While these three models are the best-performing binary classification models tested on this dataset so far, there are still many possible avenues to explore from a classification or probability estimation point of view. Although these will not be the focus of this thesis, it remains possible that a different binary classification model, or a differently constructed feedforward Neural Network, could better predict the occurrence of reoffences on this dataset. However, with the current limitations on pre-2008 offender data in place, it may be difficult, if not impossible, to attain a much better level of predictive accuracy than the current level.

The outperformance of Random Forest over XGBoost is surprising, given the state-of-the-art nature of the XGBoost algorithm. Further parameter optimisation, including optimisation of the tree depth parameter and regularisations, would likely push the XGBoost algorithm into first place. However, the more complex the build and run processes of the model become, the more pressing the issues of comprehensibility and deployability also become. Without a coherent and fully automated parameter tuning script that has been tested within a clearly defined space over several runs, making use of a properly optimised XGBoost model in this context could prove too taxing for staff without a data science or statistical background. As such, we have left this open to further research, should Dyfed-Powys be willing to invest in this solution at a future date.

Youden statistic. Some of the models within this chapter have suffered from imbalances between sensitivity and specificity and may benefit from the addition of an analysis or training approach based on this statistic.
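As a brief illustration of how this statistic could be used, the sketch below computes Youden's J (sensitivity + specificity - 1) at each candidate threshold and selects the threshold that maximises it. The labels and scores are hypothetical toy values rather than results from this dataset, and the function names are chosen purely for illustration.

```python
def youden_j(y_true, y_score, threshold):
    """Return Youden's J = sensitivity + specificity - 1 at a given threshold."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


def best_threshold(y_true, y_score):
    """Pick the observed score that maximises Youden's J when used as a cut-off."""
    candidates = sorted(set(y_score))
    return max(candidates, key=lambda t: youden_j(y_true, y_score, t))


# Hypothetical toy data: 1 = reoffence, 0 = no reoffence.
y_true = [0, 0, 1, 1, 0, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.75]

threshold = best_threshold(y_true, y_score)
print(threshold, youden_j(y_true, y_score, threshold))
```

Choosing the cut-off in this way weights sensitivity and specificity equally, which may or may not suit the operational costs involved.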

Chapter 3

Survival Analysis for Reoffence Prediction

3.1 Survival Modelling for Recidivism

While it is certainly useful to know whether or not an individual offence is likely to lead to a reoffence from the same offender within a three-year period, it is arguably much more useful for a police force to know when in that three-year period (or beyond), and with what probability, such a reoffence is likely to occur. With a more granular set of temporal predictions at their disposal, Dyfed-Powys police can make more informed decisions about when it is prudent to start and stop monitoring an individual offender. From a cost point of view, this is highly beneficial; if an individual offender is very likely to offend within the first month following their offence and very unlikely to offend thereafter, it makes little sense for the police to continue monitoring such an offender beyond the first month. Moreover, if a crime is reported in or near an area in which several recorded criminals are resident, these probabilities of reoffence can be used to estimate which of these offenders is most likely to have committed the crime.

The aim of this chapter, therefore, will be to predict (from the series of factors outlined in Chapter 2) how long it is likely to take for a reoffence by the same offender to occur following their most recent offence. As such, the prediction that will be produced is an estimate of the time to an event, where the event is represented by a reoffence committed by the individual in question. Should a reoffence occur by the end of the monitoring period, this will be considered to be the occurrence of an event, or a "failure". Should a reoffence not occur before the end of the monitoring period, this will be considered to be a "censored event", whereby nothing is known about the subject after the time of censoring. As such, by considering a reoffence to be a "failure" and a lack of reoffence to be a "censored event", this time to event prediction problem can be taken to be one of survival, where the "survival" of an offender refers to the time it takes for that individual to either reoffend or be censored.
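To make the "failure" and "censored event" encoding concrete, the following minimal sketch represents each offence as a (duration, event) pair and computes a Kaplan-Meier estimate of the survival function. The Kaplan-Meier estimator is a standard non-parametric technique used here only for illustration; it is not necessarily the method adopted later in this chapter, and the durations (in months) are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct failure time.

    durations: time until reoffence or censoring.
    events:    1 = reoffence observed ("failure"), 0 = censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all subjects whose observation ends at time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        # Both failures and censored subjects leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical toy data: a 36-month monitoring period.
durations = [1, 3, 3, 12, 36, 36]
events = [1, 1, 0, 1, 0, 0]

for t, s in kaplan_meier(durations, events):
    print(f"month {t}: estimated survival {s:.3f}")
```

Censored subjects still contribute to the risk set up to their censoring time, which is how the information that they had not reoffended by that point is retained.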

In particular, the focus of this investigation will be to assess the effects of various variables on the time it takes for an individual to reoffend (or otherwise). Therefore, in keeping with the previous chapter, since each crime will be considered separately, only a single event will be considered for each crime and, after this event, the offender in question will be seen to have exited the monitoring period. As survival analysis is a well-researched topic that encompasses many types of problems in many different areas of research, this chapter will begin with an overview of related work in tackling survival problems for criminal datasets, then continue with a discussion of these methods and how they may be appropriately used to predict the survival of offenders in this dataset.

3.1.1 Related Work

A great deal of research has already been undertaken in this field from a criminology perspective, mainly focusing (as was the case with the classification of offenders) on the relative survival of specific populations of offenders. As in the binary classification case, the act of reoffending is often defined as simply the act of committing a further crime. In some other cases, however, a reoffence is considered to be a return to prison [82]. This distinction seems to be at the discretion of the researchers involved, and largely depends on the focus of the individual study. As such, widely varying success rates in these predictions of survival have been reported, depending on the factors, population and scope involved.

Several different models have been put forward to model the survival of offenders in this context, most often following their release from prison. One of the most popular models used for this purpose is the Cox Proportional Hazards model [23], a well-known class of proportional hazards model. Like other proportional hazards models, this model relates the hazard of the event in a multiplicative way to one or more factors that may be associated with the time to event. At the time of writing, the Cox Proportional Hazards model has been shown to be instrumental in investigating certain issues within the field of recidivism. Examples of this include the notion of gender bias [6] as it relates to recidivism, the effect of an individual's employment status [88] and the effect of racial disparity on recidivism rates [47]. However, this model seems to be best suited to problems for which the number of factors being investigated is reasonably small and the purpose of the investigation relatively specific.
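The multiplicative relationship in a Cox-style model can be illustrated with a short sketch in which the hazard takes the form h(t | x) = h0(t) * exp(beta . x). The coefficients, baseline hazard and covariate values below are all hypothetical and chosen only to show that the ratio of two individuals' hazards does not depend on time, which is the proportional hazards assumption.

```python
import math

# Hypothetical coefficients for two illustrative factors; not fitted values.
BETA = [0.5, -0.2]


def baseline_hazard(t):
    """Hypothetical baseline hazard h0(t); any positive function of time would do."""
    return 0.1 / (1.0 + t)


def hazard(t, covariates, beta=BETA):
    """Cox-style hazard: h(t | x) = h0(t) * exp(beta . x)."""
    linear_predictor = sum(b * x for b, x in zip(beta, covariates))
    return baseline_hazard(t) * math.exp(linear_predictor)


# Two hypothetical offenders differing only in the first factor.
offender_a = [1.0, 2.0]
offender_b = [0.0, 2.0]

# The hazard ratio is the same at every time point because h0(t) cancels.
ratios = [hazard(t, offender_a) / hazard(t, offender_b) for t in (1, 6, 36)]
print(ratios)
```

If the effect of a factor changed over the monitoring period, this ratio would vary with t and the assumption would be violated.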

For many typical police datasets, which are often composed of a large number of diverse predictors, the Cox Proportional Hazards model may be overwhelmed because the proportional hazards assumption does not hold for these datasets. As such, it has been necessary to investigate methods for which these assumptions need not apply. Examples of research making use of this algorithm include an investigation of the factors affecting the survival of graduates from a bootcamp [5] and the survival of a population of drug offenders [39]. In recent times, the extension of the Random Forests algorithm for survival modelling [43] has seen an upsurge in popularity, due to its lack of reliance on a probability distribution with a fixed set of parameters and its ability to handle large, complex datasets. In many cases, the offenders in question have already been incarcerated [95]. Again, the research most often focuses on predicting the survival of a small group of individuals determined to pose a risk to society, such as mentally disordered offenders [68], or on investigating the effect of a limited number of variables on the survival of individual offenders.