(Prefatory note added Mar. 14, 2017): An Addendum has been added to this page to discuss the subject of this subpage with regard to some recently available data that show the rates at which persons who engaged in certain conduct receive suspensions, as distinguished from data that simply show the rates at which students in general received suspensions for certain types of conduct.
The analysis on the page would ideally be informed by the actual underlying figures from the 2002 study referenced below. I sought those numbers from the principal author by email of December 28, 2012, but did not receive a response.
One basis for assertions that racial bias is a source of racial disparities in public school discipline rates is a perception about the types of offenses for which black and white students are disciplined. That assertion appears to rest on a failure to understand the way that relative differences tend to be affected by the prevalence of an outcome.
The American Psychological Association Zero Tolerance Study that is the subject of the APA Zero Tolerance Study subpage of DD presented the referenced assertion in the following terms:
“Skiba et al. (2002) [[i]] described racial disparities in school punishments in an urban setting, and tested alternate hypotheses for that disproportionality. Discriminant function analyses by race revealed differences on only 8 of the 32 possible reasons for referral to the office; yet the group receiving the higher rate of school punishment did not show a pattern of more disruptive behavior. White students were referred to the office significantly more than Black students for offenses that can be more easily documented objectively: smoking, vandalism, leaving without permission, and obscene language. In contrast, African American students were referred for discipline more than White students for disrespect, excessive noise, threat, and loitering, behaviors that would seem to require more subjective judgment on the part of the referring agent. In summary, there is no evidence that racial disparities in school discipline can be accounted for by higher rates of African American disruption. Rather, where racial disparities exist, African American students may be subjected to office referrals or disciplinary consequences for less serious or more subjective reasons.”[ii]
To the extent that the statement “the group receiving the higher rate of school punishment did not show a pattern of more disruptive behavior” is intended to mean that the more disciplined group (blacks) was less often disruptive than whites, the statement has no foundation in the data examined and is almost certainly incorrect. An interpretive problem lies in the distinction between (a) rates at which the two student populations were referred to the office for various reasons and (b) the reasons for which students referred to the office were referred. Almost certainly, members of the black student population were referred to the office at higher rates than members of the white student population for both (a) easily documented, objectively identified (typically more serious)[iii] offenses and (b) more subjectively identified (typically less serious) offenses. But, among students who were referred to the office, the proportion of referred white students comprised by those who were referred for the easily documented offenses was larger than the proportion of referred black students comprised by those who were referred for such offenses; correspondingly, the proportion of referred black students comprised by those who were referred for subjective offenses was larger than the proportion of referred white students comprised by those who were referred for subjective offenses.
As explained in “Illusions of Job Segregation,” in order to determine whether members of either group who commit either type of offense are referred to the office more often than the other, one must have data on the populations who commit each type of offense. The patterns described in the Skiba 2002 article would be consistent with various situations where higher proportions of whites who committed each type of offense are referred than of blacks who committed each type of offense.
Further, the patterns described in the 2002 article simply reflect the type of pattern one would expect where the relative difference between the rates at which black and white members of the student body were referred to the office was larger for subjective offenses than for well-documented offenses. And there is reason to expect such a pattern simply because students are likely to be referred to the office in a higher proportion of cases of the easily documented conduct than of the subjectively identified conduct.
The point can be illustrated with data from Table 1 of the 2006 British Society for Population Studies (BSPS) paper. The two rows of data in Table 1 below are based on Rows K and L of the BSPS paper. Both rows involve a situation where mean differences in conduct differ by half a standard deviation and where the adverse outcome involves falling beyond a certain point as to the level of the conduct. As to the more serious type of conduct, all offenders falling beyond point K in terms of level of the conduct (a point benchmarked by an advantaged group adverse outcome rate of 20%) are referred to the office. As to the less serious type of conduct, only those offenders falling beyond point L in terms of the seriousness of the conduct (a point benchmarked by an advantaged group adverse outcome rate of 10%) are referred. Thus, we see larger relative differences for the latter than the former. We also observe that among persons in the advantaged and disadvantaged groups experiencing the adverse outcome, the proportion comprised by persons who engaged in the more serious conduct is larger for the advantaged group than the disadvantaged group, while the proportion comprised by persons who engaged in the less serious conduct is larger for the disadvantaged group than the advantaged group.[iv]
Table 1. Illustration of Consequence of Differing Referral Thresholds for Different Types of Conduct, Where Distributions of Advantaged Group (AG) and Disadvantaged Group (DG) Differ by Half a Standard Deviation as to Both the More Serious and Less Serious Conduct [ref b4325 a 4]
K - Easily documented/more serious
L - Subjectively identified/less serious
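The arithmetic behind rows K and L can be sketched as follows. This is only an illustration under stated assumptions, not the BSPS paper's own calculation: it assumes normally distributed conduct levels with equal variances, a disadvantaged group mean half a standard deviation above the advantaged group mean, and (for the composition figures) that referrals for the two types of conduct can simply be added together within each group. The function and variable names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1
SHIFT = 0.5                # assumed gap between group means, in SDs


def dg_rate(ag_rate: float, shift: float = SHIFT) -> float:
    """DG adverse-outcome rate at the cutoff where the AG rate is ag_rate."""
    cutoff = STD_NORMAL.inv_cdf(1 - ag_rate)   # cutoff in AG standard units
    return 1 - STD_NORMAL.cdf(cutoff - shift)  # DG mean sits `shift` SDs higher


# Benchmarks taken from the text: AG rate 20% at point K, 10% at point L.
rows = {"K": 0.20, "L": 0.10}
results = {}
for label, ag in rows.items():
    dg = dg_rate(ag)
    results[label] = {"ag": ag, "dg": dg, "ratio": dg / ag}
    print(f"{label}: AG {ag:.1%}, DG {dg:.1%}, DG/AG ratio {dg / ag:.2f}")

# Composition among those referred (assumes the two referral types add up).
ag_total = sum(r["ag"] for r in results.values())
dg_total = sum(r["dg"] for r in results.values())
ag_share_k = results["K"]["ag"] / ag_total
dg_share_k = results["K"]["dg"] / dg_total
print(f"Share of referred AG members referred for K conduct: {ag_share_k:.1%}")
print(f"Share of referred DG members referred for K conduct: {dg_share_k:.1%}")
```

Under these assumptions the disadvantaged group has the higher referral rate for both types of conduct, the ratio is larger at point L, and the more serious conduct makes up a larger share of the advantaged group's referrals.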
Another way of conceptualizing the matter, and one that would bring it closer to the testing situation that underlies BSPS Table 1, would involve referral to remedial education programs for reading and for handwriting. Let us assume that the two groups’ performances differ by half a standard deviation as to both reading and handwriting and that determinations as to reading deficiencies are more objective than determinations as to handwriting. Suppose, then, that a school determined that because of the importance of reading, anyone falling below point K would be sent to a remedial program. But because of the lesser importance of handwriting, only those falling below point L would be sent to a remedial program. The situation would then lend itself to a fair characterization that among persons sent to remedial programs whites were more likely to be sent for poor reading skills (something that could be objectively documented), while blacks were more likely to be sent for poor handwriting skills (something that involves a comparatively subjective determination).
But it is a situation where, using the manner of characterization employed in the APA Zero Tolerance Report and the 2002 Skiba study on which it relied, one could say that whites were more likely than blacks to be referred to remedial programs for things that could be documented objectively like poor reading skills, while blacks were more likely to be referred for things that were subjectively identified like poor handwriting skills. Employing that manner of characterization, one might also say that blacks were not more likely than whites to require remedial reading courses. But that obviously is not the case.
A similar way of conceptualizing the matter would involve situations where blacks and whites differ by half a standard deviation both on their math scores and on their creative writing scores but where a larger proportion of students is deemed to have failed math (where failure tends to be determined in a fairly objective manner) than is deemed to have failed creative writing (where failure tends to be determined in a fairly subjective manner). In such a situation the relative difference for failing creative writing would be larger than the relative difference for failing math, and correspondingly, a larger proportion of white students than black students who failed would have failed math, while a larger proportion of black students than white students who failed would have failed creative writing. But examined from the perspective of the Zero Tolerance Report and the 2002 Skiba study, one might say that whites tend to be failed on a subject where failure could be identified objectively while blacks tend to be failed on a subject where failure tended to be identified subjectively.
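The general tendency underlying the math and creative writing example can be sketched by sweeping the failure cutoff. This is a hedged illustration only: it assumes normally distributed scores with equal variances and a half standard deviation difference between means, and the grid of failure rates is hypothetical rather than drawn from any data.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def failure_ratio(ag_fail_rate: float, shift: float = 0.5) -> float:
    """Ratio of DG to AG failure rates when the AG failure rate is given."""
    cutoff = STD_NORMAL.inv_cdf(ag_fail_rate)      # failing = scoring below cutoff
    dg_fail_rate = STD_NORMAL.cdf(cutoff + shift)  # DG mean is `shift` SDs lower
    return dg_fail_rate / ag_fail_rate


# Hypothetical AG failure rates, from a commonly failed subject to a rarely failed one.
ag_fail_rates = [0.40, 0.30, 0.20, 0.10, 0.05]
ratios = [failure_ratio(rate) for rate in ag_fail_rates]

for rate, ratio in zip(ag_fail_rates, ratios):
    print(f"AG failure rate {rate:.0%}: DG/AG failure ratio {ratio:.2f}")
```

With the size of the difference between the groups held constant, the ratio of failure rates grows as failure becomes rarer, so the subject that fewer students fail shows the larger relative difference.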
Addendum – Seattle Data on Rates at Which Students Who Engaged in Certain Types of Conduct Were Suspended (Mar. 14, 2017)
A general problem with efforts to compare the size of racial differences regarding rates of experiencing different outcomes involves the fact that we do not usually have data on the rates at which students were engaged in the particular types of conduct. That is, for example, we usually can view only the rates at which students were suspended for particular types of conduct, not the rates at which students who engaged in the particular types of conduct were suspended for those offenses.
An exception is data from a March 2018 Community Center for Education Results (CCER) document titled “Data Brief: Discipline Practices and Disparities in South Seattle and King County.” The document presents data on types of conduct and the manner in which persons of various races were disciplined when having been found to engage in the conduct. The document highlights data on rates at which black and white students who were involved in disruptive conduct and fights without injury received out-of-school suspension. The figures are shown as 20% and 10% for the former conduct and 70% and 50% for the latter conduct in a figurative illustration on page 11, and, more precisely, as 18% and 10% for the former conduct and 76% and 55% for the latter conduct on page 20. The document describes both sets of data as being derived from a simulation that adjusted for characteristics, though it describes the characteristics no more specifically than (at 11) “characteristics such as, the number of incidents accumulated through the most recent year (2016), and gender.”
I am uncertain as to the extent to which the rates presented on page 20 depart from the actual rates. But I do not think that the departure is great or that the adjustment issue affects the point made here. Appendix Table 1 presents the rates at which black and white students who engaged in each type of conduct were suspended, as shown on page 20, along with (a) the ratio of the black suspension rate to the white suspension rate, (b) the ratio of the white rate of avoiding suspension to the black rate of avoiding suspension, (c) the absolute (percentage point) difference between suspension rates, (d) the ratio of the black odds of suspension to the white odds of suspension (which is the same as the ratio of the white odds of avoiding suspension to the black odds of avoiding suspension), and (e) the EES, for estimated effect size, which is a measure of effect size that is theoretically unaffected by the prevalence of an outcome.
Appendix Table 1. Rates at which black and white students engaged in disruptive conduct and fighting without injury were suspended, with measures of differences
WH Susp Rt
B/W Ratio Susp
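The measures listed for Appendix Table 1 can be computed from the page 20 rates along the following lines. This is a minimal sketch rather than the calculation actually used for the table. In particular, it assumes that the EES can be approximated as the difference between the probit (inverse normal) transforms of the two rates, that is, the gap in standard deviations between two equal-variance normal distributions that would produce those rates. The function and key names are hypothetical.

```python
from statistics import NormalDist


def disparity_measures(black_rate: float, white_rate: float) -> dict:
    """Measures (a) through (e) for one type of conduct, rates given as fractions."""
    probit = NormalDist().inv_cdf
    black_odds = black_rate / (1 - black_rate)
    white_odds = white_rate / (1 - white_rate)
    return {
        "ratio_suspension": black_rate / white_rate,             # (a)
        "ratio_avoidance": (1 - white_rate) / (1 - black_rate),  # (b)
        "abs_diff_points": (black_rate - white_rate) * 100,      # (c)
        "odds_ratio": black_odds / white_odds,                   # (d)
        "ees": probit(black_rate) - probit(white_rate),          # (e), assumed form
    }


# Rates from page 20 of the CCER document, as quoted in the text.
disruption = disparity_measures(black_rate=0.18, white_rate=0.10)
fighting = disparity_measures(black_rate=0.76, white_rate=0.55)

for name, measures in (("Disruptive conduct", disruption), ("Fighting", fighting)):
    formatted = ", ".join(f"{key}={value:.2f}" for key, value in measures.items())
    print(f"{name}: {formatted}")
```

On these figures only the ratio of suspension rates is larger for disruptive conduct; the other four measures are larger for fighting.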
The relative difference between black and white suspension rates is greater for disruption than for fighting. This is consistent with the claim discussed in the body of this page that racial disparities are greater for more subjectively identified offenses than for less subjectively identified offenses. But all other measures, including crucially the EES, indicate that the disparity is greater for fighting.
That is also the result of the logistic regression analysis discussed in the CCER document (at 20), which shows an odds ratio of 2.0 for fighting and 1.7 for disruptive conduct (and a still lower 1.5 for the category failure to cooperate). I am not sure why the odds ratios yielded by the CCER analysis differ from those I show. Possibly the suspension rate figures I use are the actual rates and the CCER analyses calculated the results as adjusted.
CCER’s purpose, it should be noted, is to show that racial disparities exist after adjustment for characteristics rather than to show the comparative size of disparities for different offenses. To the extent that the CCER analysis is completely valid, it would be showing that racial bias in imposing suspensions is greater for fighting than for disruptive conduct, which would be inconsistent with the claim that bias is more likely to be manifested where offenses are subjective.
But whatever the CCER analysis adjusted for, such analyses tend to fail to adequately adjust for relevant characteristics in situations like this. See, e.g., my “The Perils of Provocative Statistics,” Public Interest (Winter 1991), and “Statistical Quirks Confound Lending Bias Claims,” American Banker (August 14, 2012). Among other things, and this might be deemed either underadjustment or faulty aggregation (both involve similar statistical issues), the group that disproportionately engages in a certain type of conduct tends to engage even more disproportionately in the more serious forms of the conduct. (Here I mean that among persons who have engaged in a certain type of conduct, the group that is more likely to have engaged in the conduct will make up a higher proportion of persons who have engaged in the more serious form of the conduct than it makes up of all persons engaging in the conduct. This factor tends to cause an analysis to suggest a difference in treatment whether or not one exists.) While the CCER document does not present information on which groups are disproportionately engaged in various types of conduct, typically blacks will have higher rates than whites for most types of conduct that sometimes lead to suspensions.
An illustration of the increasing disproportionality as conduct grows more severe may be found in Table 2 of my “The Misunderstood Relationship Between Racial Differences in Conduct and Racial Differences in School Discipline and Criminal Justice Outcomes,” Federalist Society Blog (Dec. 20, 2017), which shows greater racial disproportionality for fights that lead to injury than for fighting itself. The CCER analysis is limited to fights that did not lead to injury. But the data in Table 1 suggest (as people familiar with distributions would expect) that, among blacks and whites who engage in fights that do not lead to injury, a higher proportion of blacks than whites will have engaged in the more serious of such fights.
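The distributional point can be sketched with two normal distributions. This is an illustration under assumptions of my choosing, not an analysis of the CCER or Federalist Society Blog data: the half standard deviation gap is borrowed from the earlier illustration, the groups are treated as equal in size, and the three severity cutoffs (fight, more serious fight, fight with injury) are hypothetical.

```python
from statistics import NormalDist

AG = NormalDist(mu=0.0, sigma=1.0)  # advantaged group, severity of conduct
DG = NormalDist(mu=0.5, sigma=1.0)  # disadvantaged group, assumed 0.5 SD higher

# Hypothetical severity cutoffs, in AG standard deviation units.
FIGHT, SERIOUS_FIGHT, INJURY = 1.0, 1.5, 2.0


def dg_share_beyond(cutoff: float) -> float:
    """DG share of all persons beyond the cutoff (assumes equal group sizes)."""
    ag_tail = 1 - AG.cdf(cutoff)
    dg_tail = 1 - DG.cdf(cutoff)
    return dg_tail / (ag_tail + dg_tail)


def serious_fraction_without_injury(group: NormalDist) -> float:
    """Among a group's fights without injury, the fraction that are more serious."""
    band = group.cdf(INJURY) - group.cdf(FIGHT)
    upper_band = group.cdf(INJURY) - group.cdf(SERIOUS_FIGHT)
    return upper_band / band


print(f"DG share of all fights: {dg_share_beyond(FIGHT):.1%}")
print(f"DG share of fights with injury: {dg_share_beyond(INJURY):.1%}")
print(f"AG, more serious share of no-injury fights: {serious_fraction_without_injury(AG):.1%}")
print(f"DG, more serious share of no-injury fights: {serious_fraction_without_injury(DG):.1%}")
```

Under these assumptions the disadvantaged group makes up a larger share of fights with injury than of all fights, and even within fights that do not lead to injury, a larger fraction of its fights fall in the more serious range.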
[i] Skiba, R.J., Michael, R.S., Nardo, A.C. & Peterson, R. (2002). The color of discipline: Sources of racial and gender disproportionality in school punishment. Urban Review, 34, 317-342.
[iii] The quoted material describes the subjectively identified conduct as less serious. I am not sure whether that is necessarily correct given the infractions identified as more easily documented (which include smoking). But I accept the characterization for purposes of the illustration made here. Further, a key point made infra is that it will require a more extreme form of one type of conduct than another to elicit a sanction such as being referred to the office. And that would seem to hold both for types differentiated by more serious versus less serious and differentiated as easily documented versus subjectively identified (where in each case the latter type of conduct is that for which a smaller proportion of cases is sanctioned).
[iv] The Losen study that is the subject of the NEPC National Study subpage and referenced in note ii supra relies on absolute differences between rates to measure disparities. That leads to many conclusions that are the opposite of those one would reach based on relative differences in adverse outcomes. At page 7 the study relies on Skiba to the effect that “racial disparities in discipline are larger in the offense categories that are subjective or vague, and vice versa.” As discussed in the NEPC National Study subpage, however, the Losen study measured disparities in terms of absolute differences between rates. According to the analysis of this subpage, absolute differences between rates would be smaller for subjective or vague offense categories. But one would need the underlying data to be certain about that.