After California voters passed Proposition 47 in 2014, reclassifying thefts of up to $950 in value as misdemeanors rather than felonies, the state saw an uptick in larceny theft rates even as nationwide rates continued downward. By comparison, burglary rates, whose sentencing was unaffected by Prop 47, continued downward in line with national trends. Of course, correlation does not prove causation. But there is a plausible causal theory: if you lower the “cost” of stealing $950 worth of goods, then for some cohort of people the cost/benefit ratio might shift in favor of risking petty theft where it did not before, leading to an increase in larcenies.

Therefore I was surprised to see a San Francisco Chronicle headline declaring that “Easing penalties on low-level offenses didn’t raise crime rate.” The article cites a 2018 study and quotes author Charis Kubrin, a professor of criminology at the University of California, Irvine, as saying “Our analysis tells us Prop. 47 was not responsible, so it must have been something else.”

This quote seemed striking for its certainty as much as for its conclusion. Despite the seeming correlation between the passage of the law and the spike in larcenies, proving definitively that Proposition 47 caused the observed increase in crime would be impossible. There are far too many confounding variables, especially in a state the size of California. Likewise proving that Proposition 47 **did not** cause the observed increase in larceny crime rates should be equally difficult.

Wanting to understand how this seemingly intractable problem was solved, I read through the study itself, published in the journal Criminology & Public Policy in August 2018. The authors modeled California’s crime rate in each crime category (e.g., murder, burglary, larceny) as a weighted average of the corresponding rates in the other 49 US states. Since none of these other states passed an equivalent of Proposition 47, the authors could then use this model to project what California’s crime rates would have been from 2015 to 2017 had Proposition 47 not passed.

To generate this model, the authors used FBI crime rate data from 1977 to 2014 for each state and each category of crime, with larceny being of particular interest. Each state’s crime rate in each category over these years is a potential contributing variable in the resulting model. The authors then used an algorithm to create the most accurate possible model of California’s crime rates, expressed as a weighted sum of these variables.

For example, in the case of larceny the algorithm finds that Nevada and New York have the crime rates most highly correlated with California’s, with Colorado correlating significantly less.

So the resulting function expresses California’s larceny crime rate as:

.479 * [Nevada] + .406 * [New York] + .095 * [Colorado]

The authors can then use this function with crime rates from these states from more recent years (2015-2017) to model what would have happened in “counterfactual California”.
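To make the arithmetic concrete, here is a minimal Python sketch of how such a weighted sum produces a counterfactual rate and a percentage gap. The weights are the ones quoted in the formula above; the donor-state rates, the "actual" California rate, and the function names are invented for illustration and are not FBI data or the authors' code.

```python
# Weights as quoted in the formula above.
WEIGHTS = {"Nevada": 0.479, "New York": 0.406, "Colorado": 0.095}


def synthetic_rate(donor_rates, weights=WEIGHTS):
    """Weighted sum of donor-state rates for a single year."""
    return sum(weights[state] * donor_rates[state] for state in weights)


def percent_gap(actual, synthetic):
    """How far the observed rate sits above (+) or below (-) the counterfactual."""
    return 100.0 * (actual - synthetic) / synthetic


# HYPOTHETICAL larceny rates per 100,000 residents for one post-2014 year.
# These numbers are made up purely to show the calculation.
donor_rates = {"Nevada": 2000.0, "New York": 1500.0, "Colorado": 2500.0}
actual_california = 1950.0

counterfactual = synthetic_rate(donor_rates)
gap = percent_gap(actual_california, counterfactual)
print(f"synthetic California: {counterfactual:.1f}, gap: {gap:+.1f}%")
```

The gap is measured relative to the synthetic rate, which matches the study's phrasing of crime being some percentage higher "than they would have been without Prop 47."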

The intuition underlying this model is that if something other than Prop 47 had caused a rise or fall in crime, these states would also have been affected, and the model would capture the change. In the graphs above we can see that Nevada experiences many of the same spikes and falls as California. This makes intuitive sense, since these neighboring states share a long border, similar weather, and a population that moves fluidly between the two.

Employing this methodology the authors found Proposition 47 had a meaningful impact on the observed crime rates for larceny and motor vehicle thefts: “the post-intervention gaps suggest that larceny and motor vehicle thefts were less than 10% and roughly 20% higher, respectively, in 2015 than they would have been without Prop 47.”

That is to say, the study shows a statistically significant increase in rates for larceny and car thefts in observed rates compared to the synthetic control model. This is completely at odds with the conclusions drawn in the Chronicle and in the author’s own quotes! By the authors’ own methodology, the best model for California’s non-intervention larceny theft rate was lower than the actual larceny rate by a statistically significant degree.

How then did the researchers get from this finding to the point where one of the authors would declare “When we compared crime levels between these two California’s, they were very similar, indicating that Prop. 47 was not responsible for the increase”? From the study:

To determine whether the estimated larceny effect is sensitive to changes in Synthetic California’s composition (i.e., different donor pool weights), we iteratively exclude the donor pool state with the greatest weight (ω) until all of the original donor pool states with nonzero weights have been removed. Synthetic California is composed of four donor pool states with weights that are greater than zero: New York, Michigan, Nevada, and New Jersey. The version of Synthetic California that results from this procedure is composed of a set of donor pool states that are entirely different than our original model. If the estimated impact of Prop 47 on California’s crime rate persists under both compositions, we can be confident that our larceny estimate is not dependent on the contribution of certain donor pool states to Synthetic California. If our interpretation changes under Synthetic California’s new composition, however, the estimated effect is dependent on the contribution of certain donor pool states and the finding should be interpreted cautiously.

In short, the authors removed the states with the highest correlation to California’s larceny rates from the model and created a new model from the remaining (more poorly correlated) states. The authors assert that only if this new, necessarily less accurate model **also** demonstrates that there was a statistically significant impact from Prop 47 can we conclude that this is the case.
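The effect of this procedure on model quality can be sketched with toy data. The Python below is only an illustration under stated assumptions: the four donor series are invented, a coarse grid search stands in for whatever optimizer the authors actually used, and dropping the baseline donors from heaviest to lightest is one reading of the procedure quoted above. All function names are hypothetical.

```python
import math


def simplex_grid(n_parts, total):
    """Yield every tuple of n_parts non-negative integers that sums to total."""
    if n_parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in simplex_grid(n_parts - 1, total - first):
            yield (first,) + rest


def fit_weights(target, donors, steps=20):
    """Grid-search non-negative weights summing to 1 that best match target.

    A stand-in optimizer for illustration. Returns (weights, pre-period RMSE).
    """
    names = sorted(donors)
    best_weights, best_sse = None, float("inf")
    for counts in simplex_grid(len(names), steps):
        weights = [c / steps for c in counts]
        sse = 0.0
        for t, observed in enumerate(target):
            synthetic = sum(w * donors[n][t] for w, n in zip(weights, names))
            sse += (synthetic - observed) ** 2
        if sse < best_sse:
            best_weights, best_sse = weights, sse
    return dict(zip(names, best_weights)), math.sqrt(best_sse / len(target))


def exclusion_test(target, donors, steps=20):
    """Drop baseline donors from heaviest to lightest, refitting after each drop."""
    weights, rmse = fit_weights(target, donors, steps)
    history = [((), rmse)]  # (donors dropped so far, pre-period RMSE)
    to_drop = sorted((n for n, w in weights.items() if w > 0),
                     key=weights.get, reverse=True)
    pool = dict(donors)
    dropped = []
    for name in to_drop:
        del pool[name]
        if not pool:  # nothing left to build a synthetic series from
            break
        dropped.append(name)
        _, rmse = fit_weights(target, pool, steps)
        history.append((tuple(dropped), rmse))
    return history


# Toy pre-period series, invented for illustration (not real crime data).
donors = {
    "A": [10, 12, 11, 9, 8],
    "B": [20, 18, 19, 21, 22],
    "C": [5, 6, 7, 8, 9],
    "D": [15, 15, 16, 14, 15],
}
# The "real" series is constructed to be exactly 60% A plus 40% B.
target = [0.6 * a + 0.4 * b for a, b in zip(donors["A"], donors["B"])]

history = exclusion_test(target, donors)
for dropped, rmse in history:
    print(f"dropped={list(dropped)} pre-period RMSE={rmse:.3f}")
```

In this toy setup the baseline fit is essentially perfect, and the pre-period error can only stay the same or grow with each exclusion, because every reduced donor pool is a subset of the one before it.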

These stipulations and modifications to the model are puzzling. By removing a small cohort of states that best approximate crime rates in California, the authors degrade the model and increase its error rate relative to the real California crime rates. These modifications appear to have been made with the intention of finding any paradigm in which Proposition 47’s effects on crime rates fell within the margin of error of their changing synthetic model. Once this is done, they find:

For larceny, we find that Synthetic California requires at least one of the following states be included in the donor pool in order to sustain the effect: New York, Michigan, Nevada, and New Jersey… When these four donor pool units are excluded, the post-intervention gap disappears.

So if you exclude **all four** states that originally were used as the best approximation of California’s crime rates, the gap disappears. This apparently is enough to reach the authors’ conclusion:

This suggests that our valid causal interpretation of the Prop 47 effect on larceny rests on the validity of including these four states in our donor pool. Thus, larceny, our only nonzero, nontrivial effect estimate, appears to be dependent on the contribution of four specific states from our donor pool. This finding, therefore, should be interpreted with caution.

From a statistical standpoint these conclusions are difficult to understand. While the authors state that the larceny finding’s dependence on these four states calls its validity into question, they include no similar test, or any mention of one, for their models of other crimes, such as rape and murder. The authors do not attempt to parse out how the effect on larceny varies with the exclusion of one, two, three, or all four states. Finally, the authors make no mention of applying a Bonferroni correction, or any other statistical adjustment for multiple comparisons. In other words, these methods raise concerns about p-hacking: the practice of trying different study designs until the desired outcome is found.
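For readers unfamiliar with it, a Bonferroni correction simply divides the significance threshold by the number of comparisons being made. The sketch below shows the idea in Python; the p-values and the count of five comparisons are hypothetical and are not taken from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the corrected threshold and which tests clear it."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]


# HYPOTHETICAL p-values from five alternative model specifications.
p_values = [0.03, 0.20, 0.04, 0.008, 0.60]

threshold, significant = bonferroni(p_values)
naive = [p <= 0.05 for p in p_values]
print(f"corrected threshold: {threshold:.3f}")
print(f"significant without correction: {sum(naive)}, with correction: {sum(significant)}")
```

The point of the correction is that the more specifications you try, the stricter each individual test has to be before a result counts as significant.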

The graph of this model from the study itself shows just how facile this process is:

Compare the graphs of *Synthetic California (Baseline, No Restrictions)*, *Synthetic California (NV, NY, MI, & NJ excluded)*, and *California (Actual)*. The baseline model does a good job of approximating the actual observed data. The model excluding the four states does not come close to plausibly modeling the larceny rates in California. And of course, once you remove enough of the states that correlate most closely with California’s crime rate, you will eventually come up with a model that is bad enough that you can declare its findings statistically insignificant. Without an explanation for why the methodology required the removal of four states (rather than three, or two, or none), this simply reads like bad science.

More troubling than these experimental design issues, though, are the massive discrepancies between what is actually proven in the study and the claims made by the authors in both the study’s conclusion and the media.

The study’s abstract states that “Larceny and motor vehicle thefts, however, seem to have increased moderately after Prop 47, but these results were both sensitive to alternative specifications of our synthetic control group and small enough that placebo testing cannot rule out spuriousness.” Yet by the end of the abstract they note that “our findings suggest that California can downsize its prisons and jails without compromising public safety.” This conclusion is in no way supported by the actual findings of the study, yet the authors present it as though it were.

The study **did** find that Prop 47 had a statistically significant effect on larceny rates, albeit with caveats about the fallibility of computational models. Yet the author states to the Chronicle that “Our analysis tells us Prop. 47 was not responsible, so it must have been something else.”

Again, this is in no way proven in the study. This statement is a complete contradiction of the study’s findings. And yet this conclusion is cited in multiple Chronicle articles, and even in an editorial which adds “that analysis said the jump was matched in other states, suggest[ing] Prop. 47 wasn’t a difference maker” – despite this claim never appearing in the study (and in reality being contrary to fact).

Now this untruth has entered the public discourse as a fact with the rigor of a mathematical study behind it, despite being contrary to the findings of said study. We can trace this fuzziness to some questionable but obscure modeling decisions, the implications of which are magnified by the authors in the study’s abstract (at the expense of other, more certain findings) and then further amplified in the media, first by the author and then by journalists at large.

I emailed the Chronicle months ago but have received no response; sadly this will remain part of the record. All I can recommend is that we be wary of what we accept as fact and make sure to read the fine print before admitting study conclusions as such.