Our data consists of 2012 AP and 2012 SAT results from every high school in New York. One important point to note is that SAT scoring has changed since 2012. In 2012, the SAT was scored from 200 to 800 in each of three sections, for a maximum of 2400 points. In this study, we will only look at each section individually. AP results are reported as pass (3 or higher out of 5) or fail (1 or 2 out of 5). Later in the paper, we will introduce the 2012 census report on New York high schools, which adds information for each school such as socioeconomic status, previous academic ventures, and future academic prospects.
What follows is the process to clean and load the data into proper data frames and arrays to be plotted.
#Loading data into python
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
pd.options.mode.chained_assignment = None # default='warn'
df = pd.read_csv("2012__AP_Results.csv")
df2 = pd.read_csv("2012_SAT_Results.csv")
df2 = df2.drop(labels="SCHOOL NAME", axis=1)
df_both = pd.merge(df, df2, on="DBN")
df_both.head()
| | DBN | SCHOOL NAME | Num of AP Test Takers | Num of AP Total Exams Taken | Num of AP Exams Passed | Num of SAT Test Takers | SAT Critical Reading Avg. Score | SAT Math Avg. Score | SAT Writing Avg. Score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 01M292 | HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES | s | s | s | 29 | 355 | 404 | 363 |
| 1 | 01M448 | UNIVERSITY NEIGHBORHOOD HIGH SCHOOL | 37 | 53 | 21 | 91 | 383 | 423 | 366 |
| 2 | 01M450 | EAST SIDE COMMUNITY SCHOOL | 12 | 12 | s | 70 | 377 | 402 | 370 |
| 3 | 01M458 | FORSYTH SATELLITE ACADEMY | s | s | s | 7 | 414 | 401 | 359 |
| 4 | 01M509 | MARTA VALLE HIGH SCHOOL | 14 | 15 | s | 44 | 390 | 433 | 384 |
df_both.isna().any()
# -> No Missing Values! Look for stand-ins ("s")
DBN                                False
SCHOOL NAME                        False
Num of AP Test Takers              False
Num of AP Total Exams Taken        False
Num of AP Exams Passed             False
Num of SAT Test Takers             False
SAT Critical Reading Avg. Score    False
SAT Math Avg. Score                False
SAT Writing Avg. Score             False
dtype: bool
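#Replace the "s" stand-in with NaN so that isna() and dropna() treat it as missing
#(np.nan is used because the meaning of value=None in replace() has differed between pandas versions)
df_both.replace(to_replace="s", value=np.nan, inplace=True)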
df_both.isna().any()
DBN                                False
SCHOOL NAME                        False
Num of AP Test Takers               True
Num of AP Total Exams Taken         True
Num of AP Exams Passed              True
Num of SAT Test Takers              True
SAT Critical Reading Avg. Score     True
SAT Math Avg. Score                 True
SAT Writing Avg. Score              True
dtype: bool
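Before dropping rows, it can help to confirm how many values the stand-in actually hides. The following is a minimal sketch on a toy frame; the values in `toy` are hypothetical and only mimic the layout of the merged table.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) mimicking the "s" stand-in for suppressed counts
toy = pd.DataFrame({
    "DBN": ["01M292", "01M448", "01M450"],
    "Num of AP Test Takers": ["s", "37", "12"],
    "Num of AP Exams Passed": ["s", "21", "s"],
})

# Count the stand-ins in each column before changing anything
stand_in_counts = (toy == "s").sum()

# Replace the stand-in with NaN, then count missing values per column
toy_clean = toy.replace("s", np.nan)
missing_counts = toy_clean.isna().sum()

print(stand_in_counts)
print(missing_counts)
```

If the two counts agree column by column, every missing value in the cleaned frame came from a stand-in rather than from a genuinely empty cell.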
#Initial data frame cleaning: drop rows with missing values, then cast counts and scores to int
df_clean = df_both.dropna(how="any").copy()
df_clean.drop("Num of AP Test Takers", axis=1, inplace=True)
int_cols = ['Num of AP Total Exams Taken', 'Num of AP Exams Passed', 'Num of SAT Test Takers',
            'SAT Critical Reading Avg. Score', 'SAT Math Avg. Score', 'SAT Writing Avg. Score']
df_clean = df_clean.astype({col: 'int' for col in int_cols})
In this part, we will examine the correlation between SAT and AP exams. The main questions we have to ask here are:
#Simple descriptive stats
df_clean["SAT Math Avg. Score"].describe()
count    174.000000
mean     447.816092
std       74.561303
min      323.000000
25%      396.250000
50%      434.000000
75%      478.000000
max      735.000000
Name: SAT Math Avg. Score, dtype: float64
df_clean["SAT Critical Reading Avg. Score"].describe()
count    174.000000
mean     423.270115
std       65.965848
min      300.000000
25%      383.250000
50%      408.500000
75%      444.000000
max      679.000000
Name: SAT Critical Reading Avg. Score, dtype: float64
df_clean["SAT Writing Avg. Score"].describe()
count    174.000000
mean     419.977011
std       69.047856
min      298.000000
25%      380.000000
50%      402.500000
75%      442.000000
max      682.000000
Name: SAT Writing Avg. Score, dtype: float64
#Mapping %ap test pass on X, three lines for SAT scores on Y, best fit lines included to see correlation
#-> dashed at beginning because less values, higher % of AP test pass is more correlated with higher SAT scores
fig, (ax_math, ax_read, ax_write) = plt.subplots(3, figsize = (12,12))
X = df_clean['Num of AP Exams Passed'].values / df_clean['Num of AP Total Exams Taken'].values
Y_math = df_clean['SAT Math Avg. Score'].values
Y_read = df_clean['SAT Critical Reading Avg. Score'].values
Y_write = df_clean['SAT Writing Avg. Score'].values
ax_math.scatter(X, Y_math)
ax_math.set_ylim([200,800])
a, b = np.polyfit(X, Y_math, 1)
ax_math.plot(X, a*X+b, "--", color="purple")#, "loosely dotted")
ax_read.scatter(X, Y_read)
ax_read.set_ylim([200,800])
a, b = np.polyfit(X, Y_read, 1)
ax_read.plot(X, a*X+b, "--", color="purple")
ax_write.scatter(X, Y_write)
ax_write.set_ylim([200,800])
a, b = np.polyfit(X, Y_write, 1)
ax_write.plot(X, a*X+b, "--", color="purple")
plt.xlabel("AP Test Pass %")
plt.suptitle("SAT scores vs. %AP test passed")
plt.setp(ax_math, ylabel = "SAT Math Scores")
plt.setp(ax_read, ylabel = "SAT Reading Scores")
plt.setp(ax_write, ylabel = "SAT Writing Scores")
fig.tight_layout()
plt.show()
Upon examination, the data follows a slight positive, linear trend. As the percentage of AP exams passed at a school increases, the average SAT scores for each section (math, writing, and reading) also increase. One plausible reading is that students who perform well on their exams do so because of their preparation, and students who study for one type of exam and do well are more likely to do the same for the other type. While this may seem obvious, it is also important to recognize the role the schools themselves play. If students at one school are taught how to prepare for these exams in their classes, while another school puts less emphasis on exams like the SAT, students at the second school are not going to perform as well. This could explain some of the outliers where students perform very well on their AP exams, for which they take a full AP course in preparation, yet do not perform as well as their peers on the SAT.
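The strength of a trend like this can be summarized with a Pearson correlation coefficient alongside the fitted slope. The sketch below uses hypothetical pass rates and SAT averages (not taken from the dataset) purely to show the calculation; the names `pass_rate` and `sat_math` are illustrative.

```python
import numpy as np

# Hypothetical pass rates and SAT math averages for five schools (not from the dataset)
pass_rate = np.array([0.10, 0.25, 0.40, 0.60, 0.85])
sat_math = np.array([390.0, 410.0, 430.0, 470.0, 560.0])

# Pearson correlation coefficient: the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(pass_rate, sat_math)[0, 1]

# Slope and intercept of the least-squares line, the same fit used for the dashed lines
slope, intercept = np.polyfit(pass_rate, sat_math, 1)

print(f"r = {r:.3f}, slope = {slope:.1f} SAT points per unit of pass rate")
```

For a straight-line fit the slope equals r multiplied by the ratio of the standard deviations of y and x, so a steep slope with a small r still indicates a weak relationship.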
Overall, the SAT section averages fall in the 400s and 500s, with only a few outliers breaking above the 700 mark. When the three section means are combined, the average SAT score across the 174 schools in the cleaned data is about 1290. This is a little over half of the 2400 maximum, but because the lowest possible combined score is 600, it sits only about 38% of the way up the achievable range. This places it at the 26th percentile of scores.
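The combined figure can be reproduced from the section means in the descriptive statistics. This short sketch also expresses the combined mean both as a share of the maximum and as a position within the 600 to 2400 range, which follows from the 200 to 800 scale of each section.

```python
# Section means copied from the describe() output for each SAT section
mean_math = 447.816092
mean_read = 423.270115
mean_write = 419.977011

combined = mean_math + mean_read + mean_write

# Each section is scored 200-800, so three sections span 600-2400
lowest, highest = 3 * 200, 3 * 800
share_of_max = combined / highest
position_in_range = (combined - lowest) / (highest - lowest)

print(round(combined, 1), round(share_of_max, 3), round(position_in_range, 3))
```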
In this section, we will introduce the census data.
What follows is the process to clean and load the data into proper data frames and arrays to be plotted.
df3 = pd.read_csv("2011-2012_High_School_Progress_Report.csv")
#Replacing stand-ins with missing values
#np.nan is used so that isna() and dropna() recognize every stand-in as missing
df3.replace(to_replace=["s", ".", "", " "], value=np.nan, inplace=True)
df_all = pd.merge(df_both, df3, on="DBN")
df_all.columns
Index(['DBN', 'SCHOOL NAME', 'Num of AP Test Takers', 'Num of AP Total Exams Taken', 'Num of AP Exams Passed', 'Num of SAT Test Takers', 'SAT Critical Reading Avg. Score', 'SAT Math Avg. Score', 'SAT Writing Avg. Score', 'School Name', 'School Type', 'Overall Score', 'Overall Grade', 'Percentile Rank', 'Progress Grade', 'Performance Grade', 'Environment Grade', 'College and Career Readiness Grade', 'Closing the Achievement Gap Points', 'Principal', 'Enrollment', '% Students with Disabilites', '% Students in Self-contained Settings', '% Overage', '% Free Lunch', '% Black or Hispanic', '% ELL', '8th Gr Math/ELA', 'Peer Index'], dtype='object')
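Because `pd.merge` defaults to an inner join, any school missing from either file silently disappears from the merged frame. A minimal sketch of how that could be audited is shown here; the frames `left` and `right` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical DBNs and values, invented for illustration only
left = pd.DataFrame({"DBN": ["01M292", "01M448", "01M450"],
                     "SAT Math Avg. Score": [404, 423, 402]})
right = pd.DataFrame({"DBN": ["01M448", "01M450", "02M047"],
                      "Peer Index": [2.1, 2.4, 2.8]})

# An outer merge with indicator=True labels each row by where its key was found
audit = pd.merge(left, right, on="DBN", how="outer", indicator=True)
match_counts = audit["_merge"].value_counts()

# The default inner merge keeps only the rows labelled "both"
inner = pd.merge(left, right, on="DBN")

print(match_counts)
print(len(inner))
```

Comparing the `left_only` and `right_only` counts against the size of the inner merge shows how many schools each join step removes.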
#Remove extraneous values
#Overall Grade, School Name are both repetitive
df_all.drop(df_all.loc[df_all["DBN"] == "06M462"].index, axis=0, inplace=True)
df_all.drop(["Principal", "School Name", "% ELL",
'% Students in Self-contained Settings',
"School Type", 'Progress Grade', 'Performance Grade', 'Environment Grade',
"% Overage", "Overall Grade"], axis=1, inplace=True)
#DF Cleaning, initializing test scores into numpy arrays
df_all_clean = df_all.dropna(how="any").copy()
int_cols = ['Num of AP Exams Passed', 'Num of AP Total Exams Taken',
            'SAT Critical Reading Avg. Score', 'SAT Math Avg. Score', 'SAT Writing Avg. Score']
float_cols = ['% Free Lunch', '% Black or Hispanic', '8th Gr Math/ELA', 'Peer Index',
              'Closing the Achievement Gap Points']
df_all_clean = df_all_clean.astype({col: 'int' for col in int_cols})
df_all_clean = df_all_clean.astype({col: 'float' for col in float_cols})
XAP = df_all_clean['Num of AP Exams Passed'].values / df_all_clean['Num of AP Total Exams Taken'].values
XMath = df_all_clean['SAT Math Avg. Score'].values
XRead = df_all_clean['SAT Critical Reading Avg. Score'].values
XWrite = df_all_clean['SAT Writing Avg. Score'].values
This section is dedicated to exploring the socioeconomic metrics, that is, the percentage of free lunch students (students from lower-income families) and the percentage of Black or Hispanic students at each school. We will also examine the Closing the Achievement Gap points awarded to each school, a metric that reflects how much the gap between disadvantaged students (generally from minority and low-income families) and better-off students changed over the last year. This metric should tell us how well the school is doing at offsetting the disadvantages that come with attending a lower-income or predominantly minority school.
In this section, we will ask:
#Creating 2D scatter plots with a third dimension shown as color:
#x is % free lunch, y is % minority, color is the test metric for each panel
import seaborn as sns
free_lunch = df_all_clean['% Free Lunch'].values
minority = df_all_clean['% Black or Hispanic'].values
cmap = sns.cubehelix_palette(as_cmap=True, start=1.7, rot=.75)
f, ax = plt.subplots(2, 2, figsize = (12,10))
panels = [(XAP, "AP Test Pass %"), (XMath, "SAT Math Score"),
          (XRead, "SAT Reading Score"), (XWrite, "SAT Writing Score")]
#ax.flat visits the panels in the order [0,0], [0,1], [1,0], [1,1]
for axis, (colors, label) in zip(ax.flat, panels):
    points = axis.scatter(free_lunch, minority, c=colors, s=50, cmap=cmap)
    plt.setp(axis, xlabel = "% Free Lunch", ylabel = "% Minority")
    cb = plt.colorbar(points, ax=axis)
    cb.set_label(label, rotation = 270, labelpad = 20)
plt.show()
The first striking feature of the graph is how % free lunch correlates with % minority. As the share of economically disadvantaged students goes up, the share of minority students at the same school tends to go up as well. This is most likely due to socioeconomic factors like gentrification; lower-income families, who are often Black or Hispanic, are pushed into their own communities away from richer families.
There is a visibly strong relationship with all three SAT score types. The region near the origin (low % free lunch, low % minority) holds almost all of the schools with high average SAT scores. This is most likely because the SAT is an "optional" test, so schools themselves do not provide study material for it. Students who cannot afford SAT tutoring or other test prep will therefore generally not do as well. This notion is supported by the fact that, even when % minority is held roughly constant (for example, among schools that are close to 100% minority), the higher the percentage of free lunch students, the lower the average SAT score.
Additionally, AP pass rates seem to have a weaker correlation with these factors than SAT scores do. There is a large cluster of green (high-scoring) schools near the origin (richer schools with fewer minority students), but there is also a decent spread of average and even excellent results in the top right portion of the graph. This suggests that AP tests disadvantage low-income and minority students less than the SAT does, as the spread is much more even.
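A visual comparison like this can be backed by computing the correlation of each test metric with % free lunch. The sketch below is illustrative only: the frame `toy` and all of its values are invented, and `AP Pass Rate` is a hypothetical column name for the passed-to-taken ratio.

```python
import pandas as pd

# Hypothetical school-level values, invented for illustration only
toy = pd.DataFrame({
    "% Free Lunch": [20.0, 35.0, 50.0, 65.0, 80.0, 90.0],
    "SAT Math Avg. Score": [610, 540, 470, 430, 400, 380],
    "AP Pass Rate": [0.80, 0.35, 0.60, 0.20, 0.55, 0.30],
})

# Pearson correlation of each test metric with % free lunch
corr_with_lunch = toy[["SAT Math Avg. Score", "AP Pass Rate"]].corrwith(toy["% Free Lunch"])

print(corr_with_lunch)
```

A coefficient closer to -1 for the SAT column than for the AP column would match the pattern described above; running the same call on `df_all_clean` would give the actual values.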
#x is % minority, y is Closing the Achievement Gap points, color is the test metric for each panel
minority = df_all_clean['% Black or Hispanic'].values
achievement_gap = df_all_clean["Closing the Achievement Gap Points"].values
f, ax = plt.subplots(2, 2, figsize = (12,10))
panels = [(XAP, "AP Test Pass %"), (XMath, "SAT Math Score"),
          (XRead, "SAT Reading Score"), (XWrite, "SAT Writing Score")]
for axis, (colors, label) in zip(ax.flat, panels):
    points = axis.scatter(minority, achievement_gap, c=colors, s=50, cmap=cmap)
    plt.setp(axis, xlabel = "% Minority", ylabel = "Closing the Achievement Gap Points")
    cb = plt.colorbar(points, ax=axis)
    cb.set_label(label, rotation = 270, labelpad = 20)
plt.suptitle("Closing the Achievement Gap's effect on Standardized Tests", y=0.925)
plt.show()
The achievement gap is somewhat difficult to conceptualize compared to some of the other data covered so far. It separates higher and lower performing students, and is generally a good metric for the disparity of privilege among a student population: the wider the gap, the greater the disparity. Schools should therefore aim to close this gap as much as possible to create a more equal learning environment for their students. Schools that score higher in this category made more progress in closing their gap for the year, decreasing the difference by a greater percentage. Looking at the data above, schools with a high population of minority students show wider variability than schools with lower minority populations. This makes sense, as schools with a low minority percentage would most likely not worry as much about the disparity among their minority students as a school whose student population is composed mostly of minority students.

One possible reading of the data is that schools that spend fewer resources on closing this gap can focus more on improving their test scores, as the better test scores are concentrated near the origin of each graph, where the percentage of minority students enrolled is smallest. This holds for the SAT exams; the AP exams, however, appear to be less uneven across student populations, as high passing rates are distributed more evenly across the data points. This could be because students taking AP exams are taking corresponding AP courses with full curricula, while no such preparation for the SAT is present in most schools. Because students are expected to prepare for the SAT on their own, using their own resources, this creates an unfair advantage for students who can, for example, afford a tutor or extra lessons, while less fortunate students may be unable to access these tools.
This section will cover the more academic side of the census. The three values examined will be the Eighth Grade standardized ELA/Math test, the Peer Index, and the College and Career Readiness Grade. The Eighth Grade standardized tests are tests every eighth grader has to take before high school. We will use them as a metric to see where students come from academically. The Peer Index is very similar to the metric we created above; it combines factors such as % economic disadvantage, % minority, % special education, and % English language learners. It will be our metric for the socioeconomic status students came from. The College and Career Readiness system assesses whether each individual student meets what the government considers to be readiness for college, academically or otherwise. The number of students who pass is divided by the size of the graduating class and converted to a 5-category scale with grades of F, D, C, B, and A. This will be our metric to see where students are going academically.
In this section, we will ask:
import seaborn as sns
cmap = sns.cubehelix_palette(as_cmap=True, start=1.7, rot=.75)
eighth_grade_test = df_all_clean['8th Gr Math/ELA'].values
peer_index = df_all_clean['Peer Index'].values
#x is the eighth grade Math/ELA score, y is the peer index, color is the test metric for each panel
f, ax = plt.subplots(2, 2, figsize = (12,10))
panels = [(XAP, "AP Test Pass %"), (XMath, "SAT Math Score"),
          (XRead, "SAT Reading Score"), (XWrite, "SAT Writing Score")]
for axis, (colors, label) in zip(ax.flat, panels):
    points = axis.scatter(eighth_grade_test, peer_index, c=colors, s=50, cmap=cmap)
    plt.setp(axis, xlabel = "Eighth Grade Math/ELA Test", ylabel = "Peer Index Score")
    cb = plt.colorbar(points, ax=axis)
    cb.set_label(label, rotation = 270, labelpad = 20)
plt.suptitle("Peer Index and Eighth Grade Testing's impact on Standardized Tests", y=0.925)
plt.show()
First and foremost, the eighth grade ELA and math test averages correlate directly with the peer index of the same school (we assume that most high schoolers from the same school come from the same one or two middle schools). This is both consistent and inconsistent with our previous findings. In our own socioeconomic data, we saw that among schools with low-socioeconomic-status students, results on tests based on material taught in school (AP tests) are significantly more variable than results on out-of-school tests. In the data here, the relationship is almost direct: the higher the peer index, the higher the average eighth grade standardized test score. This may offer some insight into our previous data sets. AP results may be more variable because students opted into the classes, whereas the eighth grade tests are mandatory.
When looking at the correlation with the SAT and AP tests, we see a trend very similar to the one before. Students who come from "good" middle schools (that is, high peer index and high eighth grade test scores) tend to perform much better on the SAT. These are the schools in the wealthier parts of New York, and this is clearly reflected in how test scores stay consistently high from middle school to high school.
AP tests, however, are varied yet again. Near the top of the graph, among schools fed by "good" middle schools, AP pass rates are understandably high. However, there are also schools with high AP pass rates closer to the origin and spread throughout the graph. This points to a weaker correlation between background and AP results. The students who do exceptionally well on AP tests despite coming from disadvantaged middle schools are, unfortunately, part of schools that score exceptionally poorly on the SAT.
#PLOT COLLEGE READINESS VS AP TEST AND SAT(<- IT SEEMS THAT SAT's DO NOT CORRELATE WITH GOOD AP TEST SCORES
#OR WITH ANY METRIC OF SOCIOECONOMIC STATUS, SEE IF COLLEGE READINESS IS MORE CORRELATED WITH AP TESTS OR WITH SAT;
#IF NOT CORRELATED WITH AP TESTS, MAYBE AP TEST IS BAD. IF IT IS CORRELATED, MAYBE THEY ARE GOOD AND SAT's ARE JUST A MEASURE
#OF WEALTH <- PRECONCEIVED NOTION).
#College and Career Readiness Score is incomplete for a lot of schools, but the grade is pretty complete.
#Convert letter grades to a 1-5 scale (F=1 ... A=5), touching only the readiness column
grade_col = "College and Career Readiness Grade"
df_all_clean[grade_col] = df_all_clean[grade_col].map({"A": 5, "B": 4, "C": 3, "D": 2, "F": 1})
readiness = df_all_clean[grade_col].values
grades = [1, 2, 3, 4, 5]
ap_pass_rate = df_all_clean['Num of AP Exams Passed'] / df_all_clean['Num of AP Total Exams Taken']
#Each entry is (values to plot, y-axis label, y-axis limits or None for automatic limits)
metrics = [
    (ap_pass_rate, "AP Test Pass %", None),
    (df_all_clean["SAT Math Avg. Score"], "SAT Math Score", (200,800)),
    (df_all_clean["SAT Critical Reading Avg. Score"], "SAT Reading Score", (200,800)),
    (df_all_clean["SAT Writing Avg. Score"], "SAT Writing Score", (200,800)),
]
f, ax = plt.subplots(2, 2, figsize = (12,10))
for axis, (values, label, ylim) in zip(ax.flat, metrics):
    #One box per readiness grade, from F (1) to A (5)
    box = axis.boxplot([values[df_all_clean[grade_col] == g].values for g in grades],
                       positions = grades, showmeans=True)
    plt.setp(axis, xlabel = "College and Career Readiness Grade", ylabel = label)
    if ylim is not None:
        axis.set_ylim(ylim)
    plt.setp(box["medians"], color="purple")
    axis.set_xticklabels(["F", "D", "C", "B", "A"])
plt.suptitle("College and Career Readiness's effect on Standardized Tests", y=0.925)
plt.show()
College and Career Readiness is an interesting grade to base metrics on, because one of the ways to pass the college readiness bar is to pass two AP tests. So it makes sense that, for AP pass percentage, schools with higher college readiness grades have higher average, median, and maximum values. What makes less sense is how little the average increases. Going from a grade of D to a grade of A only increases the average AP pass % by 10%, which is much lower than expected. It does widen the distribution of AP pass rates, though. Overall, the college and career readiness grade does not correlate strongly with higher scores on any of the SAT sections. It is important to note that the F schools are tightly clustered at the lowest end of SAT scores, while the A schools are much more variable. However, the increases in mean and median are not significant enough to establish a correlation between these scores (or anything more than a very slight one). There are many outliers among the A and B schools for SAT scores, meaning that a high college and career readiness grade occasionally coincides with higher SAT scores, but again, this is not consistent.
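The change in average pass rate between grades can be read off a grouped summary instead of the box plots. This is a minimal sketch on invented data; the frame `toy`, its values, and the `AP Pass Rate` column name are hypothetical.

```python
import pandas as pd

# Hypothetical grades and pass rates, invented for illustration only
toy = pd.DataFrame({
    "College and Career Readiness Grade": ["A", "A", "B", "C", "D", "D", "F"],
    "AP Pass Rate": [0.50, 0.30, 0.35, 0.25, 0.30, 0.20, 0.10],
})

# Order the grades from F to A so the summary reads like the box plots
order = ["F", "D", "C", "B", "A"]
summary = (
    toy.groupby("College and Career Readiness Grade")["AP Pass Rate"]
    .agg(["count", "mean", "median"])
    .reindex(order)
)

# Change in mean pass rate from grade D to grade A, in percentage points
d_to_a = (summary.loc["A", "mean"] - summary.loc["D", "mean"]) * 100

print(summary)
print(round(d_to_a, 1))
```

Including the count column makes it easy to see when a grade contains only a handful of schools, in which case its mean and median should be read with caution.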
Our general conclusions include the following:
Exploration into tests like these can expose significant disparities between tests that can potentially change students' lives. As it becomes increasingly apparent that AP scores depend less on a student's socioeconomic status, it might be both fairer and more pertinent for colleges to weigh AP scores more heavily than SAT scores. This would be a win-win for both sides: schools get students who are more prepared, and students get a more accurate measure of their education level.