
Guy Lacroix publishes an article co-authored with two former students of the department in the Journal of Quantitative Criminology
April 9, 2025
The article can be consulted here.
Here is the abstract of the article:
Title: Beyond Traditional Risk Scores: Tackling LS/CMI Offender Misclassifications with Machine Learning
Objectives. This paper investigates the accuracy of offender risk assessment scoring methods.
We study the degree of misclassification resulting from the conventional practice of aggregating
individual items to derive risk scores and categories. We document which types of offenders are
prone to misclassification, particularly in relation to age and gender.
Methods. We use a machine learning algorithm to leverage the rich set of information available in
the LS/CMI. Using all 45,535 assessments conducted between 2008 and 2015 in Quebec (Canada),
we estimate probabilities from a random forest algorithm to predict individual risks of recidivism
over a two-year follow-up. We compare the resulting probabilities to those inferred from the risk
scores or categories to document the extent of misclassification. We devise a simple algorithm
to construct alternative risk categories that reduce misclassification relative to the LS/CMI total
scores and categories.
Results. The probabilities obtained from the random forest approach accurately predict individual
probabilities to reoffend. Compared with these predictions, the traditional aggregation of items
into risk scores or categories yields substantial misclassification for certain groups of offenders. In
particular, we find that the risk associated with older individuals when using the LS/CMI risk
categories is overestimated by about 10 percentage points. Our alternative risk categories, devised
from our machine learning predictions, successfully avoid such misclassification.
Conclusions. Traditional methods of aggregating items from risk assessments into scores may
lead to substantial misclassification, especially for older offenders. Misclassification arises from 1)
items not being equally risk-relevant; 2) information collected by the LS/CMI being excluded or
overly simplified when constructing scores; and 3) age being omitted from risk scores. Machine
learning algorithms avoid these pitfalls and can be used to construct less biased categories.
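
To illustrate the comparison described in the Methods paragraph, here is a minimal Python sketch. It is not the authors' code, and it does not fit a random forest: it assumes individual predicted probabilities are already available from some model. The field names (`category`, `age_group`, `reoffended`, `model_prob`), the function names, and the ten records are all hypothetical, made up for illustration only. Each person is assigned the observed recidivism rate of their risk category, and the sketch measures how far that rate sits above or below the individual prediction, averaged by age group.

```python
from collections import defaultdict
from statistics import mean


def category_rates(records):
    """Observed recidivism rate within each risk category."""
    outcomes = defaultdict(list)
    for r in records:
        outcomes[r["category"]].append(r["reoffended"])
    return {cat: mean(vals) for cat, vals in outcomes.items()}


def overestimation_by_group(records, group_key="age_group"):
    """Mean of (category rate - individual model probability) per group.

    A positive value means the category overstates risk for that group
    relative to the individual predictions.
    """
    rates = category_rates(records)
    gaps = defaultdict(list)
    for r in records:
        gaps[r[group_key]].append(rates[r["category"]] - r["model_prob"])
    return {group: mean(vals) for group, vals in gaps.items()}


# Hypothetical toy data, NOT taken from the paper.
records = [
    {"category": "high", "age_group": "young", "reoffended": 1, "model_prob": 0.7},
    {"category": "high", "age_group": "young", "reoffended": 1, "model_prob": 0.6},
    {"category": "high", "age_group": "older", "reoffended": 0, "model_prob": 0.3},
    {"category": "high", "age_group": "older", "reoffended": 0, "model_prob": 0.4},
    {"category": "low", "age_group": "young", "reoffended": 0, "model_prob": 0.3},
    {"category": "low", "age_group": "young", "reoffended": 1, "model_prob": 0.4},
    {"category": "low", "age_group": "older", "reoffended": 0, "model_prob": 0.1},
    {"category": "low", "age_group": "older", "reoffended": 0, "model_prob": 0.1},
]

gaps = overestimation_by_group(records)
for group, gap in sorted(gaps.items()):
    print(f"{group}: {gap * 100:+.1f} percentage points")
```

In this made-up example the gap is positive for the older group and negative for the younger one, which mirrors the direction of the finding reported in the abstract but says nothing about its actual size.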