Students Study vs Grades analysis and Data Visualization with Seaborn

saman aboutorab
Jan 3, 2024

Updated: Jan 8, 2024

To examine the relationship between study habits and final grades, we fit a linear regression model with study time as the independent variable and final grade as the dependent variable. The model indicates whether the two are positively or negatively associated and whether that association is statistically significant. A scatter plot with study time on the x-axis and final grade on the y-axis, overlaid with the regression line, shows the trend and helps judge whether more studying is linked to higher final grades.
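As a minimal sketch of the regression described above, the line can be fitted with ordinary least squares. The ordinal encoding of the study-time categories (1 to 4), the helper name `fit_line`, and the sample rows are assumptions made for illustration; they are not taken from the dataset or from the original analysis.

```python
# Hypothetical ordinal encoding of the study-time categories (an assumption).
STUDY_TIME_CODES = {
    "<2 hours": 1,
    "2 to 5 hours": 2,
    "5 to 10 hours": 3,
    ">10 hours": 4,
}


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up rows for illustration only
sample_study_time = ["<2 hours", "2 to 5 hours", "5 to 10 hours", ">10 hours"]
sample_grades = [8, 10, 12, 14]

codes = [STUDY_TIME_CODES[label] for label in sample_study_time]
slope, intercept = fit_line(codes, sample_grades)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=2.00, intercept=6.00
```

A positive slope means grades rise with study time, a negative slope means they fall. With the real data, the same encoding could be applied with `student_data["study_time"].map(STUDY_TIME_CODES)`, and `sns.regplot` can draw the scatter plot together with a fitted line.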

A second linear regression model looks at family relationship quality and student absences, with relationship quality as the independent variable and the number of absences as the dependent variable. A scatter plot with its regression line shows any pattern between the two. The aim is to see whether the correlation is positive or negative, which sheds light on how family dynamics may influence school attendance.
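One simple way to check the direction of such a correlation is Pearson's r, sketched below. It assumes that family relationship quality is stored as a numeric score (for example 1 to 5); the helper name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only (not from the dataset)
sample_famrel = [1, 2, 3, 4, 5]
sample_absences = [12, 4, 8, 3, 2]

r = pearson_r(sample_famrel, sample_absences)
print(f"r = {r:.2f}")  # r < 0 here: absences fall as the score rises
```

The sign of r gives the direction of the association and its magnitude (between 0 and 1) gives the strength. On its own it says nothing about statistical significance or causation.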

Libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Import median function from numpy
from numpy import median

Dataset

student_data = pd.read_csv('student-alcohol-consumption.csv')

print(student_data.head())

Plot style

sns.set_style("whitegrid")
sns.set_palette("RdBu")

Study time vs. grade

# List of categories from lowest to highest
category_order = ["<2 hours",
                  "2 to 5 hours",
                  "5 to 10 hours",
                  ">10 hours"]

# Turn off the confidence intervals
g = sns.catplot(x="study_time", y="G3",
            data=student_data,
            kind="bar",
            order=category_order, ci=None)

# Add a title
g.fig.suptitle("Study time vs. Grade")

# Show plot
plt.show()

Internet access

# Create a box plot with subgroups and omit the outliers
# (showfliers=False hides the outlier markers)
g = sns.catplot(x='internet', y='G3',
                data=student_data,
                kind='box', hue='location',
                showfliers=False)

# Add a title
g.fig.suptitle("Internet access vs. Grade")

# Show plot
plt.show()

Romantic relationships

# Set the whiskers to 0.5 * IQR
g = sns.catplot(x="romantic", y="G3",
            data=student_data,
            kind="box", whis=0.5)

# Add a title
g.fig.suptitle("Romantic Relationships vs. Grade")

# Show plot
plt.show()
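To make the effect of `whis=0.5` concrete, here is a rough sketch of how the whisker ends are derived from the quartiles. The helper name `whisker_limits` and the sample grades are hypothetical, and the quartiles use linear interpolation, which is assumed to match the plotting library's default.

```python
import statistics


def whisker_limits(values, whis=1.5):
    """Return (lower, upper) whisker ends for a box plot.

    Each whisker reaches the most extreme data point that lies within
    whis * IQR of the nearest quartile; points beyond that are outliers.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Fall back to the quartile itself if no data point lies inside the fence
    lower = min([v for v in values if v >= low_fence] + [q1])
    upper = max([v for v in values if v <= high_fence] + [q3])
    return lower, upper


# Made-up grades for illustration only
sample_grades = [0, 5, 8, 10, 10, 11, 12, 13, 15, 19]

print(whisker_limits(sample_grades))            # (5, 19) with the default 1.5 * IQR
print(whisker_limits(sample_grades, whis=0.5))  # (8, 13) with 0.5 * IQR
```

Shrinking the multiplier pulls the whiskers toward the box, so more points are treated as outliers.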

Absence vs. Relationship

# Plot the median number of absences instead of the mean
g = sns.catplot(x="romantic", y="absences",
                data=student_data,
                kind="point",
                hue="school",
                ci=None, estimator=median)

# Add a title
g.fig.suptitle("Absence vs. Relationship")

# Show plot
plt.show()

Family relationships vs. Absence

# Remove the lines joining the points
g = sns.catplot(x="famrel", y="absences",
                data=student_data,
                kind="point",
                capsize=0.2, join=False)

# Add a title
g.fig.suptitle("Family relationships vs. Absence")

# Show plot
plt.show()
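The capped error bars in this last plot are, by seaborn's default, 95% bootstrap confidence intervals around each point estimate. The sketch below illustrates the idea with a simplified percentile bootstrap. The helper name `bootstrap_ci`, the fixed seed, and the sample absences are assumptions for illustration, and the percentile lookup is cruder than what a plotting library would use.

```python
import random
import statistics


def bootstrap_ci(values, estimator=statistics.mean, n_boot=1000, level=95, seed=0):
    """Return an approximate (low, high) bootstrap confidence interval."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and recompute the estimate each time
    estimates = sorted(
        estimator(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Pick the percentile positions without interpolation (a simplification)
    tail = (100 - level) / 2 / 100
    lo_index = int(tail * (n_boot - 1))
    hi_index = int((1 - tail) * (n_boot - 1))
    return estimates[lo_index], estimates[hi_index]


# Made-up absence counts for illustration only
sample_absences = [0, 0, 2, 2, 4, 4, 6, 8, 10, 14, 20, 30]

low, high = bootstrap_ci(sample_absences)
print(f"mean={statistics.mean(sample_absences):.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Passing `estimator=statistics.median` gives the corresponding interval for the median, the estimator used in the earlier absences plot.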