
The main contributions of the current investigation are discussed in detail below.

8.2.1. New Methodology for Modeling Risk Identification

Table 8-1 compares the methodology of the current research with the methodologies proposed in previous studies in terms of raw data and variable use.

Table 8-1: Difference between risk identification modeling methodologies

Studies | Traffic variables from raw data | Agg. interval | Single station?
Oh et al., 2001 | Lane-based volume, occupancy, and average speed | 10 sec | Yes
Golob et al., 2008 | Lane-based volume and occupancy | 30 sec | No
Lee et al., 2003 | Lane-based volume, occupancy, and average speed | 20 sec | No
Abdel-Aty et al., 2005 | Lane-based volume, occupancy, and average speed | 30 sec | No
Hourdakis et al., 2006 | Individual vehicle data | - | No
Hossain et al., 2010 | Station-based volume and average speed | 5 min | No
Current research | Individual vehicle data | - | Yes

Most of the studies presented in the literature consider multiple traffic detector stations as a way to create new variables. The common trend of those studies, especially those by Hossain et al. and Abdel-Aty et al., is to perform tests using different numbers of detector stations upstream and downstream of crash locations. The only single station-based study was developed by Oh et al. However, none of these studies discusses the potential variables that can be created by means of analysis. As a consequence, an important variable was ignored: the speed difference between lanes, identified in the current research as one of the main factors contributing to high crash risk.

In addition, individual vehicle data is used only by Hourdakis et al., 2006 and by the current research. However, Hourdakis et al. developed their model based on multiple traffic detector stations.
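As an illustration of how a variable such as the speed difference between lanes can be derived from individual vehicle data at a single station, a minimal sketch is given below. The function name `lane_speed_difference`, the record layout `(lane_id, speed)`, and the definition used here (highest minus lowest mean lane speed within one time interval) are assumptions made for illustration only; the exact definition used in the current research is not restated in this section.

```python
from collections import defaultdict
from statistics import mean


def lane_speed_difference(records):
    """Return the gap between the fastest and slowest lane (hypothetical definition).

    records: iterable of (lane_id, speed) pairs, one per vehicle detected
    at a single station during one time interval.
    Returns None when fewer than two lanes were observed.
    """
    speeds_by_lane = defaultdict(list)
    for lane_id, speed in records:
        speeds_by_lane[lane_id].append(speed)

    if len(speeds_by_lane) < 2:
        return None  # a difference between lanes is undefined

    lane_means = [mean(speeds) for speeds in speeds_by_lane.values()]
    return max(lane_means) - min(lane_means)


# Two vehicles per lane: lane 1 averages 95, lane 2 averages 65.
sample = [(1, 100.0), (1, 90.0), (2, 60.0), (2, 70.0)]
difference = lane_speed_difference(sample)
```

Because the computation needs only per-vehicle records tagged with a lane, it does not require data from upstream or downstream stations.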

8.2.2. Methodology for Sampling Non-crash Traffic Data

The work presented in this section is motivated by the rarity of motorway crashes, which leads to the low performance of machine learning techniques, as illustrated in section 3.3.1. Four methodologies for sampling non-crash data are tested, as follows:

i) S1 by Oh et al., 2001: non-crash cases at 30 min before crashes and pre-crash cases right before crashes,
ii) S2 by Abdel-Aty et al., 2008: matched case control, controlling the time of the day and the day of the week as well as weather conditions,
iii) S3 by Pande et al., 2007: random selection, and
iv) S4: methodology proposed in the current research.
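The three sampling methodologies taken from the literature can be sketched as follows. This is an illustrative reading of the descriptions above, not the original authors' implementations: the function names, the dictionary layout of a case (`time`, `weather`), and the choice to match the time of day at hour granularity are all assumptions. S4 is not sketched because its details are given elsewhere in the document.

```python
import random
from datetime import datetime, timedelta


def sample_s1(crash_time):
    """S1: take the non-crash case 30 minutes before the crash."""
    return crash_time - timedelta(minutes=30)


def sample_s2(crash, candidates):
    """S2: matched case control on hour of day, day of week, and weather.

    Matching by hour is an assumption; cases from the crash day are excluded.
    """
    return [
        c for c in candidates
        if c["time"].hour == crash["time"].hour
        and c["time"].weekday() == crash["time"].weekday()
        and c["weather"] == crash["weather"]
        and c["time"].date() != crash["time"].date()
    ]


def sample_s3(candidates, k, seed=0):
    """S3: random selection of k non-crash cases (seeded for repeatability)."""
    return random.Random(seed).sample(candidates, k)


crash = {"time": datetime(2010, 5, 3, 8, 30), "weather": "clear"}
candidates = [
    {"time": datetime(2010, 5, 10, 8, 10), "weather": "clear"},  # same weekday, hour, weather
    {"time": datetime(2010, 5, 11, 8, 10), "weather": "clear"},  # different weekday
    {"time": datetime(2010, 5, 10, 8, 10), "weather": "rain"},   # different weather
    {"time": datetime(2010, 5, 10, 9, 10), "weather": "clear"},  # different hour
]
matched = sample_s2(crash, candidates)
```

In this toy example only the first candidate satisfies all three controls, which shows how S2 narrows the search space compared with the random selection of S3.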

The methodology for the test includes the following steps:


• Sample non-crash data. Apply non-crash data sampling methodologies.

• Develop a risk identification model using Random Forest regression based on pre-crash data and on sampled non-crash data.

Because Random Forest regression is applied, the data is divided into three sub-sets: training, calibration, and validation data. Table 8-2 presents the validation results of the models developed using the four non-crash data sampling methodologies (S1, S2, S3, and S4).
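The division into three sub-sets can be sketched as below. The function name `split_three_way`, the 60/20/20 proportions, and the random shuffling are assumptions for illustration; the proportions actually used in the current research are not stated in this section.

```python
import random


def split_three_way(cases, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle cases and split them into training, calibration, and validation sets.

    fractions are hypothetical proportions and must sum to 1.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * fractions[0])
    n_calibration = round(len(shuffled) * fractions[1])

    training = shuffled[:n_train]
    calibration = shuffled[n_train:n_train + n_calibration]
    validation = shuffled[n_train + n_calibration:]  # remainder
    return training, calibration, validation


training, calibration, validation = split_three_way(range(10))
```

Keeping the validation set disjoint from the other two is what allows the reported accuracy to be read as performance on new data.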

Methodology S1 performs poorly because non-crash data is simply selected by taking data at 30 minutes before crashes. Applying S3 improves the model's performance, yet it remains low. This happens because irrelevant non-crash data is as likely as relevant non-crash data to be selected for comparison with pre-crash data. Performance improves further with S2: controlling the time of the day, the day of the week, and meteorological conditions reduces the search space. However, the developed model does not perform well with new traffic conditions that are not accounted for by such controls. With the proposed methodology S4, new traffic conditions are classified into existing traffic regimes. In several regimes, new conditions can be immediately declared as non-crash because crashes (rear-end or sideswipe) rarely occur under those regimes. In the remaining regimes, new traffic conditions are tested and classified as pre-crash or non-crash. As a result, models developed using the proposed non-crash data sampling methodology S4 perform considerably better (Table 8-2).

Table 8-2: Performance of data sampling methodology

Data sampling methodology | NTS (%) | PTS (%)
S1 | 51.00 | 39.00
S2 | 68.00 | 67.00
S3 | 58.45 | 61.45
S4 (Proposed method) | 89.83 | 83.62

8.2.3. Improvement of Risk Assessment Accuracy

Risk assessment relates to the capacity of models to correctly identify pre-crash and non-crash traffic conditions. As the data used in other studies is not available, there is no means to verify the accuracy of such models. Table 8-3 summarizes the best accuracy reported in those studies.

The missed alarm rate presented in Table 8-3 is the percentage of pre-crash cases incorrectly identified as non-crash, and the false alarm rate is the percentage of non-crash cases incorrectly identified as pre-crash. A model is more accurate if it has lower missed and false alarm rates.
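The two rates defined above can be computed as in the following minimal sketch. The function name `alarm_rates` and the boolean encoding (True for pre-crash, False for non-crash) are assumptions for illustration.

```python
def alarm_rates(actual, predicted):
    """Return (missed alarm %, false alarm %) for paired boolean labels.

    actual, predicted: sequences of booleans, True = pre-crash, False = non-crash.
    """
    pre_crash_predictions = [p for a, p in zip(actual, predicted) if a]
    non_crash_predictions = [p for a, p in zip(actual, predicted) if not a]
    if not pre_crash_predictions or not non_crash_predictions:
        raise ValueError("both pre-crash and non-crash cases are required")

    # Pre-crash cases identified as non-crash.
    missed = 100.0 * sum(1 for p in pre_crash_predictions if not p) / len(pre_crash_predictions)
    # Non-crash cases identified as pre-crash.
    false = 100.0 * sum(1 for p in non_crash_predictions if p) / len(non_crash_predictions)
    return missed, false


actual = [True, True, True, True, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False]
rates = alarm_rates(actual, predicted)
```

In this toy example one of four pre-crash cases is missed and one of five non-crash cases raises an alarm.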

For each study, at least one data set, called the training data set, is used to develop the risk identification model. Depending on the learning method (classification or regression), one or two other data sets can also be used: the calibration and validation data sets. The accuracy presented in Table 8-3 applies to validation data sets (i.e. data sets that represent new data).

Table 8-3: Stated accuracy of relevant studies

Studies | Missed Alarm (%) | False Alarm (%) | Note
Oh et al., 2001 | - | - | One data set for training, calibration, and validation
Lee et al., 2003 | - | - | One data set for training, calibration, and validation
Hourdakis et al., 2006 | 41.67 | 6.81 | One data set for calibration and validation
Abdel-Aty et al., 2005-2008 | 26.10 | 30.00 | Two data sets for training and validation
Pande et al., 2005-2007 | 26.00 | 34.00 | Two data sets for training and validation
Hossain et al., 2010 | 36.67 | 20.00 | One data set for calibration and validation
Proposed RIM | 10.27 | 16.38 | Three different data sets

The studies by Oh et al. 2001 and Lee et al. 2003 cannot be compared to the other studies because no validation data sets were used. Two more studies, by Hourdakis et al. 2006 and Hossain et al. 2010, employ regression methods but combine the calibration and validation data sets into a single set. Therefore, the developed models are not assessed on new traffic data. Only two data sets are used in the studies of Abdel-Aty et al. and Pande et al., as the methods used were based on classification. Therefore, among the previous studies, only the results by Abdel-Aty et al. and Pande et al. are validated. However, the accuracy reported in those studies is much lower than the accuracy obtained by applying the methodology proposed in the current research.

8.2.4. Crash Risk Prediction

Traffic crash risk assessment was well studied in previous work. A further improvement was suggested by Abdel-Aty et al. and Pande et al. for preventing crash risks: once a risk is identified, the incoming traffic conditions are also assumed to be at high risk, and preventive measures such as variable speed limits are immediately activated. In this case, the activation of preventive measures depends on the performance of the risk identification models, which is rather low (see Table 8-3).

In the current study, crash risk prediction is undertaken by testing several consecutive time intervals; their number, Lrm, is called the length of risk memory. The future crash risk is more certain if traffic conditions during those time intervals are identified as risky. As illustrated in Figure 7-8, false alarm and missed alarm rates cannot both be minimized at once, yet an optimal value of Lrm can be selected based on the location of the study site.
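The role of the length of risk memory can be sketched as follows. The function name `predict_risk` and the decision rule (a prediction is raised only when all of the last Lrm intervals were identified as risky) are assumptions for illustration; the exact rule used by the model is described elsewhere in the document.

```python
def predict_risk(risky_flags, lrm):
    """Flag interval t when the last lrm intervals ending at t were all risky.

    risky_flags: sequence of booleans, one per consecutive time interval,
    True when the risk identification model marks the interval as risky.
    lrm: length of risk memory (number of consecutive intervals tested).
    """
    if lrm < 1:
        raise ValueError("lrm must be at least 1")

    predictions = []
    consecutive_risky = 0
    for is_risky in risky_flags:
        # Count the current run of risky intervals; reset on a safe one.
        consecutive_risky = consecutive_risky + 1 if is_risky else 0
        predictions.append(consecutive_risky >= lrm)
    return predictions


flags = [True, False, True, True, True, False]
immediate = predict_risk(flags, 1)  # reacts to every risky interval
delayed = predict_risk(flags, 3)    # requires three risky intervals in a row
```

Under this rule, a larger Lrm suppresses isolated risky intervals (fewer false alarms) but delays or drops genuine warnings (more missed alarms), which is the trade-off described above.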

It is worth noting that by fixing Lrm=1, MyTRIM works exactly as the model suggested by Abdel-Aty et

