The feature-optimised logistic regression model is a function of only two features: one quantifies the duration of the vibration event (Te), and the other quantifies the proportion of the signal’s rms acceleration contained within the 5 Hz 1/3rd octave band (rms5Hz). Both features are normalised against the mean feature value for all signals recorded at the same measurement position using the same instrument, which accounts for distance attenuation and other potential differences in ground conditions between measurement positions. Using only these two features, the logistic regression model correctly classifies, on average, 96% of all signals tested. The mean precision of the model is 0.91, meaning that 91% of the signals classified as freight railway signals truly are freight railway signals, and the mean recall of the model is 0.88, meaning that 88% of all the freight railway signals are correctly classified as such. When passenger is taken as the positive class, the precision and recall are both equal to 0.98.
With the parameters of the model defined, the logistic regression was performed using all 238 known examples of freight and passenger vibration signals in order to determine the regression coefficients. The result of this logistic regression is shown in Equation 3.22, where P(Y = 1) is the predicted probability that a vibration signal with normalised event signal duration, Te, and normalised proportional 5 Hz rms acceleration, rms5Hz, is a freight railway vibration signal, i.e. its class, Y, is equal to 1. Further details of the logistic regression model are presented in Table 3.3, where standard errors are calculated using Equation 3.8 and the parameter p-values are determined from the Wald test statistic (Equation 3.9).
$$P(Y = 1) = \frac{1}{1 + \exp\left(10.7 - 5.01\,T_e - 2.25\,\mathrm{rms}_{5\mathrm{Hz}}\right)} \qquad (3.22)$$

Parameter    β Estimate    Standard Error    p-value    |  Overall Model
Intercept    -10.73        1.76              <0.001     |  N             238
Te           5.01          0.89              <0.001     |  p-value       <0.001
rms5Hz       2.25          0.70              <0.010     |  Pseudo R²     0.79

Table 3.3: Parameter estimates and other details of the optimised logistic regression model
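As a minimal sketch, Equation 3.22 and the p-values in Table 3.3 can be evaluated in Python. The function names are hypothetical. The intercept uses the value of -10.73 from Table 3.3 rather than the rounded 10.7 in the equation, and the p-value calculation assumes that the Wald statistic is the ratio of the estimate to its standard error, compared against a standard normal distribution; the exact form of Equation 3.9 is not restated in this section.

```python
import math

# Coefficients taken from Table 3.3: intercept, Te and rms5Hz.
B0, B_TE, B_RMS = -10.73, 5.01, 2.25


def p_freight(te, rms5hz):
    """Predicted probability that a signal is freight (form of Equation 3.22)."""
    z = B0 + B_TE * te + B_RMS * rms5hz
    return 1.0 / (1.0 + math.exp(-z))


def boundary_rms5hz(te):
    """rms5Hz value lying on the P(Y = 1) = 0.5 decision boundary for a given Te."""
    return (-B0 - B_TE * te) / B_RMS


def wald_p_value(estimate, std_error):
    """Two-sided p-value, ASSUMING a z-type Wald statistic (estimate / SE)."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    print(p_freight(1.0, 1.0))       # signal equal to the position mean
    print(boundary_rms5hz(1.0))      # boundary point at Te = 1
    print(wald_p_value(2.25, 0.70))  # rms5Hz coefficient
```

For a signal with Te = rms5Hz = 1, i.e. one equal to the position mean in both features, this sketch gives a freight probability of roughly 0.03. Under the stated assumption, the rms5Hz coefficient yields a p-value between 0.001 and 0.010, in line with the table.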
Figure 3.5 shows all 238 known examples of freight and passenger vibration signals plotted in the two-dimensional feature space of Te and rms5Hz. Also shown is the decision boundary, for which P(Y = 1) = 0.5 and above which signals are classified as freight vibration signals by the logistic regression model. With this fit of the regression model to the data, 4 passenger railway signals and 4 freight railway signals lie in the wrong prediction regions and would be incorrectly classified if introduced to the model as unlabelled signals. The remaining 190 passenger railway signals and 40 freight railway signals lie in the correct regions and would be correctly classified if introduced to the model as unlabelled signals. This is commensurate with the reported 96% accuracy of the model when tested on unlabelled data completely independent of that used for the training, fitting and optimising of the model. Most passenger railway vibration signals are clustered together in a region of low event signal duration and low proportional 5 Hz 1/3rd octave band energy. The freight railway vibration signals show more variation, but tend to have longer event signal durations and greater proportional rms acceleration in the 5 Hz 1/3rd octave band, allowing these signals to be classified with confidence using only these two signal properties.
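The counts quoted above (4 misclassified and 190 correctly classified passenger signals, 4 misclassified and 40 correctly classified freight signals) can be turned into accuracy, precision and recall with a few lines of Python. This is an illustrative sketch with a hypothetical helper name. It describes the fit to the 238 labelled signals, so its values need not match the mean values reported from testing on independent data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Freight as the positive class: 40 freight correct, 4 passenger signals
# in the freight region, 4 freight signals in the passenger region.
freight = binary_metrics(tp=40, fp=4, fn=4, tn=190)

# Passenger as the positive class: the same counts with the roles swapped.
passenger = binary_metrics(tp=190, fp=4, fn=4, tn=40)

if __name__ == "__main__":
    print(freight)
    print(passenger)
```

With these counts the accuracy is 230/238, approximately 0.966; freight precision and recall are both 40/44, approximately 0.909; and passenger precision and recall are both 190/194, approximately 0.979.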
[Figure 3.5: scatter plot with axes Te and rms5Hz; legend: Decision Boundary, Passenger, Freight]
Figure 3.5: Decision boundary of logistic regression model as a function of the normalised event signal duration (Te) and normalised proportional 5 Hz 1/3rd octave band rms acceleration (rms5Hz)
As a final check of the model accuracy, 500 vibration signals were randomly selected from the field study measurements of Waddington et al. (2014) and their class was predicted using the logistic regression model. These vibration signals were also visually inspected by the author, who judged their class on the basis of previous experience. The author's judgements agreed with those of the logistic regression model for 94% of the vibration signals inspected.
The optimised logistic regression model has been shown to correctly classify, on average, 96% of unknown railway vibration signals that are completely independent of the training, fitting and optimising of the model. In addition, both features of the optimised logistic regression model are normalised to other signals recorded at the same position using the same instrument. This makes the model more applicable to other sets of data in the measurement database than if absolute properties were used. For example, it avoids the misclassification that would otherwise occur when all railway traffic moves slowly close to a measurement position, perhaps due to the proximity of a station or tight bend, resulting in longer passbys relative to signals measured at other measurement positions. However, the model would not be applicable to freight-only railway lines, as this would significantly skew the normalised values of Te and rms5Hz. This is not a cause for concern in this work, as no measurements were made near freight-only railway lines. In the extreme case where there are no freight passbys in a 24-hour measurement period, each signal property will be of similar magnitude to the mean properties of the passenger passbys (since all passbys will be passenger traffic), so all the signals will cluster around Te = rms5Hz = 1 and should mostly be correctly classified.
The high level of classification accuracy and the normalised nature of the features of the model suggest that the model can be confidently applied to the remaining data from the field study by Waddington et al. (2014) in order to classify the unknown signals and determine exposure-response relationships for annoyance caused by exposure to freight and passenger railway vibration separately. This will be useful in furthering the understanding of the human response to freight railway vibration in light of current proposals to increase the proportion of freight traffic on rail.
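The normalisation of each feature against the mean for signals recorded at the same measurement position with the same instrument can be sketched as follows. The record layout, the field names (`position`, `instrument`, `duration_s`) and the numerical values are hypothetical and purely illustrative; they are not taken from the measurement database.

```python
from collections import defaultdict


def normalise_feature(records, feature):
    """Divide each value by the mean for its (position, instrument) group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        key = (rec["position"], rec["instrument"])
        totals[key] += rec[feature]
        counts[key] += 1

    normalised = []
    for rec in records:
        key = (rec["position"], rec["instrument"])
        group_mean = totals[key] / counts[key]
        normalised.append(rec[feature] / group_mean)
    return normalised


# Hypothetical example values: position A has mixed durations,
# position B has only identical passbys.
signals = [
    {"position": "A", "instrument": "acc1", "duration_s": 10.0},
    {"position": "A", "instrument": "acc1", "duration_s": 20.0},
    {"position": "A", "instrument": "acc1", "duration_s": 30.0},
    {"position": "B", "instrument": "acc2", "duration_s": 8.0},
    {"position": "B", "instrument": "acc2", "duration_s": 8.0},
]
te = normalise_feature(signals, "duration_s")

if __name__ == "__main__":
    print(te)
```

In this toy example the signals at position B all normalise to 1, which mirrors the case discussed above in which a position sees only one type of traffic and every signal clusters around the position mean.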