compared later in this dissertation, differences in the output data which are reported to be significantly different are so at a 5% level of significance. In order to determine whether there are, in fact, statistically significant differences in the algorithmic means of the PMIs described in §5.1.4, an analysis of variance (ANOVA) [104] is carried out. The ANOVA, however, only indicates whether there is at least one significant difference between two means. As a result, post hoc tests are required in order to determine where this difference actually occurs. Unfortunately, most post hoc tests assume homogeneity of sample variances, which is not necessarily the case in the PMI data of §5.1.4.
The two post hoc tests employed in this dissertation for determining where the differences in the algorithmic means of the PMIs lie are Fisher’s Least Significant Difference (LSD) test [174] and the Games-Howell test [39]. After an ANOVA has been performed and a significant difference has been identified between the means of two samples, a Levene test [146] is carried out in order to determine whether or not the corresponding variances differ significantly from one another. In the case where the variances are found not to differ statistically from one another at a 5% level of significance, the LSD test (which requires homogeneity of sample variances) is employed in order to identify the location of the differences in the PMIs. If, however, the Levene test reveals that the variances are, in fact, statistically different at a 5% level of significance, the Games-Howell test (which does not require homogeneity of sample variances) is employed in order to determine where these differences lie.
The workings of the ANOVA, Levene, LSD and Games-Howell statistical tests are reviewed briefly in this section. In all the tests described below, $n$ samples are to be compared, each containing $m$ observations. Denote the $i$-th sample by $x^{(i)}_1, \ldots, x^{(i)}_m$ for all $i \in \{1, \ldots, n\}$. Furthermore, let the mean of the $i$-th sample be denoted by $\bar{x}^{(i)}$ and let $s_i$ denote the sample’s standard deviation. Finally, let $\bar{x}$ denote the mean of all sample means $\bar{x}^{(1)}, \ldots, \bar{x}^{(n)}$.
In all of the above-mentioned statistical tests it is assumed that the data are normally distributed. According to the central limit theorem, even if samples are taken from an unknown distribution, the distribution of the sample mean will still be approximately normally distributed, if the sample size m is sufficiently large [104]. This implies that although the underlying probability distributions of the PMIs of §5.1.4 may be unknown, the sample means may be considered to be approximately normally distributed and, as a result, the requirement of the statistical tests that the data are normally distributed, is not violated, if m is large.
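In order to determine a suitable sample size $m$, the technique outlined by Lindley [87] is employed. Initially, a sample of $m$ PMI values is generated from simulation runs. An estimate of the standard deviation of the sample PMI values may be calculated as
\[ S_x = \sqrt{\frac{\sum_{i=1}^{m} (x_i - \bar{x})^2}{m - 1}}, \tag{5.4} \]
where $x_i$ denotes the $i$-th PMI value in the sample and $\bar{x}$ here denotes the mean of these $m$ values.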
Thereafter, a confidence interval (CI) around the true mean may be determined. In order to determine an accurate CI, Student’s $t$-distribution$^1$ [104, Appendix A] is employed. The CI half-width $h$, at a $(1 - \alpha)$-level of confidence, is given by
\[ h_{1-\alpha} = t_{(1-\alpha/2),(m-1)} \frac{S_x}{\sqrt{m}}. \tag{5.5} \]
1 A distribution which may be used for estimating the mean of a normally distributed population in the case where the standard deviation of the population is unknown and the sample may be considered to be small.
For the purposes of this study, a CI half-width $h$ not exceeding 5% of the sample mean, as suggested by Lindley [87], is deemed sufficiently accurate. This procedure is repeated for each PMI, after which the largest $m^*$-value is chosen.
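The accuracy check described above may be sketched as follows. This is a minimal illustration only: the function names and the PMI values are hypothetical, and the $t$-quantile is assumed to be read from a table by the caller rather than computed.

```python
import math
from statistics import mean, stdev


def ci_half_width(values, t_quantile):
    """Half-width h of the CI in (5.5) for a list of PMI values.

    t_quantile is t_{(1-alpha/2),(m-1)}, supplied by the caller
    (for example, read from a t-table).
    """
    m = len(values)
    s_x = stdev(values)  # sample standard deviation with divisor m - 1, as in (5.4)
    return t_quantile * s_x / math.sqrt(m)


def is_accurate_enough(values, t_quantile, fraction=0.05):
    """True if the half-width does not exceed `fraction` of the sample mean."""
    return ci_half_width(values, t_quantile) <= fraction * abs(mean(values))


# Hypothetical PMI values from m = 5 simulation runs (illustrative only).
pmi_values = [10.0, 12.0, 11.0, 13.0, 9.0]
# t_{0.975, 4} = 2.776 from a standard t-table.
h = ci_half_width(pmi_values, t_quantile=2.776)
print(h, is_accurate_enough(pmi_values, t_quantile=2.776))
```

In this illustrative case the half-width is well above 5% of the sample mean, so more simulation runs would be required before the sample is deemed large enough.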
The ANOVA test
The null-hypothesis $H_0$ to be tested when performing an ANOVA may be formulated as: there are no statistically significant differences between the means of any of the samples. It follows that the alternative hypothesis $H_1$ is that there are significant differences between at least two of the sample means. In the ANOVA test, both the sum of squares of observations within samples and the sum of squares between sets of the sample data are used in order to test the null-hypothesis. The sum of squares of observations within samples is calculated as
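\[ S_w = \frac{1}{mn - n} \sum_{i=1}^{n} \sum_{j=1}^{m} \left(x^{(i)}_j - \bar{x}^{(i)}\right)^2. \tag{5.7} \]
Similarly, the sum of squares between sets of data is given by
\[ S_b = \frac{m}{n - 1} \sum_{i=1}^{n} \left(\bar{x}^{(i)} - \bar{x}\right)^2. \tag{5.8} \]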
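The quantities in (5.7) and (5.8) may be computed as in the following sketch, which also forms their ratio for use as the ANOVA test statistic. The function name and the data are hypothetical, and equal sample sizes $m$ are assumed, as in the notation above.

```python
from statistics import mean


def within_and_between(samples):
    """Return (S_w, S_b) as in (5.7) and (5.8).

    `samples` is a list of n lists, each holding m observations.
    """
    n = len(samples)
    m = len(samples[0])
    sample_means = [mean(s) for s in samples]
    grand_mean = mean(sample_means)  # mean of the n sample means

    s_w = sum((x - xbar) ** 2
              for s, xbar in zip(samples, sample_means)
              for x in s) / (m * n - n)
    s_b = m * sum((xbar - grand_mean) ** 2 for xbar in sample_means) / (n - 1)
    return s_w, s_b


# Hypothetical data: n = 3 samples of m = 4 observations each.
samples = [[1.0, 2.0, 3.0, 4.0],
           [2.0, 3.0, 4.0, 5.0],
           [5.0, 6.0, 7.0, 8.0]]
s_w, s_b = within_and_between(samples)
f_ratio = s_b / s_w  # to be compared with a tabulated F critical value
print(s_w, s_b, f_ratio)
```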
The test statistic is given by the ratio $S_b/S_w$. This test statistic is compared with the critical value $F(d_1, d_2, \alpha)$ of the F-distribution, where $d_1 = n - 1$ denotes the number of degrees of freedom of the numerator, $d_2 = mn - n$ denotes the number of degrees of freedom of the denominator, and $\alpha$ denotes the level of statistical significance. The value of $F(n - 1, mn - n, \alpha)$ may be found in [104, Appendix A]. In the case where
\[ S_b/S_w > F(n - 1, mn - n, \alpha), \tag{5.9} \]
the null-hypothesis $H_0$ is rejected at an $\alpha$-level of significance. This implies that there are, in fact, significant differences between the means of at least two samples at a $(1 - \alpha)$-level of confidence. Alternatively, if the inequality in (5.9) does not hold, it may be concluded that no statistically significant differences exist between the sample means at a $(1 - \alpha)$-level of confidence.
The Levene test
The Levene test is used in order to assess whether the variances of two or more data sets are statistically different at an $\alpha$-level of significance. This is encapsulated in the null-hypothesis $H_0$ that there are no statistically significant differences between the variances of any of the original samples, while the alternative hypothesis $H_1$ is that there are statistically significant differences between at least two of the original sample variances. In order to perform the test, two values have to be determined. The first of these values, the test statistic $F_L$, is calculated as
\[ F_L = \frac{(mn - n) \sum_{i=1}^{n} m \left(\bar{z}^{(i)} - \bar{z}\right)^2}{(n - 1) \sum_{i=1}^{n} \sum_{j=1}^{m} \left(z^{(i)}_j - \bar{z}^{(i)}\right)^2}, \tag{5.10} \]
where $z^{(i)}_j = |x^{(i)}_j - \bar{x}^{(i)}|$ denotes the absolute deviation of the $j$-th observation in sample $i$ from its sample mean, $\bar{z}^{(i)}$ denotes the mean of these absolute deviations within sample $i$, and $\bar{z}$ denotes the mean of $\bar{z}^{(1)}, \ldots, \bar{z}^{(n)}$. The second value, the critical value $F(n - 1, mn - n, \alpha)$, is again obtained from the F-distribution table. If
\[ F_L > F(n - 1, mn - n, \alpha), \tag{5.11} \]
then the null-hypothesis is rejected, which implies that the variances of at least two of the data sets are statistically different at a $(1 - \alpha)$-level of confidence, and the Games-Howell test is subsequently performed in respect of each pair of samples. If, however, the inequality in (5.11) does not hold, it may be concluded that there are no statistical differences between the variances at a $(1 - \alpha)$-level of confidence, and the LSD test is performed.
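A minimal sketch of the Levene statistic and of the resulting choice of post hoc test is given below. It follows the standard form of Levene's test, in which the absolute deviations of the observations from their own sample means take the place of the observations in an ANOVA-type ratio. The function names and the data are hypothetical, equal sample sizes are assumed, and the critical value is assumed to be read from an F-table by the caller.

```python
from statistics import mean


def levene_statistic(samples):
    """Levene test statistic F_L for n samples of m observations each."""
    n = len(samples)
    m = len(samples[0])
    # Absolute deviation of each observation from its own sample mean.
    z = [[abs(x - mean(s)) for x in s] for s in samples]
    z_means = [mean(zi) for zi in z]
    z_grand = mean(z_means)

    between = sum(m * (zb - z_grand) ** 2 for zb in z_means)
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_means) for v in zi)
    return (m * n - n) * between / ((n - 1) * within)


def choose_post_hoc(f_l, critical_value):
    """Games-Howell if the variances differ significantly, otherwise LSD."""
    return "Games-Howell" if f_l > critical_value else "LSD"


# Hypothetical data: n = 2 samples of m = 4 observations each.
samples = [[1.0, 2.0, 3.0, 4.0], [0.0, 4.0, 8.0, 12.0]]
f_l = levene_statistic(samples)
# F(1, 6, 0.05) = 5.99 from a standard F-table.
print(f_l, choose_post_hoc(f_l, critical_value=5.99))
```

The sketch does not guard against a zero denominator, which arises if all absolute deviations within every sample are identical.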
The Fisher LSD post hoc test
The Fisher LSD post hoc test has proven to be a powerful parametric statistical test. Criticism has, however, been offered due to the belief that its protection against inflated Type I error$^2$ rates is insufficient, although this has only been proven to be the case when more than three data sets are being compared [51].
The null-hypothesis $H_0$ for Fisher’s LSD test is that there is no statistically significant difference between the means $\bar{x}^{(k)}$ and $\bar{x}^{(\ell)}$ of two samples. The test statistic of the LSD test is given by $|\bar{x}^{(k)} - \bar{x}^{(\ell)}|$, while the critical value at a level of significance $\alpha$ is
\[ L_\alpha = t_{\alpha/2, d_2} \sqrt{2 S_w / m}, \tag{5.12} \]
where $t_{\alpha/2, d_2}$ denotes the entry in the table corresponding to the two-sided t-distribution [104, Appendix A] at a significance level of $\alpha$ with $d_2 = mn - n$ degrees of freedom and where $S_w$ is the value of the sum of squares within samples, as determined in (5.7). If $|\bar{x}^{(k)} - \bar{x}^{(\ell)}| > L_\alpha$, the null-hypothesis is rejected at a level of confidence $1 - \alpha$ (i.e. there is a statistical difference between the means $\bar{x}^{(k)}$ and $\bar{x}^{(\ell)}$ at an $\alpha$-level of significance). Otherwise, the means may not be considered different at an $\alpha$-level of significance. This procedure has to be repeated for all $\binom{n}{2}$ pairs of samples. When performing the Fisher LSD post hoc test it is important to keep the practical significance$^3$ as well as the statistical significance of the results in mind.
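The pairwise procedure may be sketched as follows. The function name and the input values are hypothetical, and the $t$-table entry is assumed to be supplied by the caller.

```python
import math
from itertools import combinations


def lsd_pairs(sample_means, s_w, m, t_quantile):
    """Return the critical value L_alpha of (5.12) and the index pairs
    (k, l) whose sample means differ by more than L_alpha."""
    l_alpha = t_quantile * math.sqrt(2.0 * s_w / m)
    different = [(k, l)
                 for k, l in combinations(range(len(sample_means)), 2)
                 if abs(sample_means[k] - sample_means[l]) > l_alpha]
    return l_alpha, different


# Hypothetical input: n = 3 sample means, m = 4 observations per sample,
# S_w = 5/3, and t_{0.025, 9} = 2.262 from a standard t-table.
l_alpha, different = lsd_pairs([2.5, 3.5, 6.5], s_w=5.0 / 3.0, m=4,
                               t_quantile=2.262)
print(l_alpha, different)
```

Here `combinations` enumerates each unordered pair of samples exactly once, so that all pairs are tested against the same critical value.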
The Games-Howell test
The Games-Howell post hoc test [59, 60] is a non-parametric test recommended for use in cases with unequal sample sizes or if the assumption of homogeneity of variances required for Fisher’s LSD test is violated [35]. According to Armstrong and Hilton [6], the Games-Howell post hoc test is one of the most robust modern methods of post hoc testing. Furthermore, it is said to be a more conservative test than the majority of other post hoc tests [6]. The critical value required for the test employs Welch’s degrees of freedom (from Welch’s t-test$^4$) and the studentised range
2 A Type I error is the error of rejecting a null-hypothesis when it is actually true.
3 Practical significance refers to the evaluation of whether statistically significant differences are large enough to be relevant in a practical sense. As an example, consider the mean travel times of vehicles returned after the implementation of the policies as suggested by two reinforcement learning agents. Now assume that after a number of simulation runs, these means have been found to be statistically significantly different although they only differ by 0.5 seconds. While it may have been proven that these means are different from a statistical perspective, it is clear that this difference is negligible in a practical sense.
4 Welch’s t-test is a two-sample location test used for determining whether the means of two different populations are equal. In this test, homogeneity of variance is not assumed, but normality of data is assumed.
is the standard error, $d(k, \ell)$ denotes the degrees of freedom, calculated here as
\[ d(k, \ell) = \frac{m - 1}{\left(s_k^2/m\right)^2 + \left(s_\ell^2/m\right)^2} \left(\frac{s_k^2 + s_\ell^2}{m}\right)^2 \tag{5.14} \]
and $\alpha$ is again the level of statistical significance. If $|\bar{x}^{(k)} - \bar{x}^{(\ell)}| > q_{\sigma(k,\ell), d(k,\ell), \alpha}$, then there is a statistical difference between the means of the two samples at an $\alpha$-level of significance and the null-hypothesis is rejected. If, on the other hand, this inequality does not hold, then the means of the two samples do not differ at an $\alpha$-level of significance and the null-hypothesis may not be rejected at a $(1 - \alpha)$-level of confidence.
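The degrees of freedom in (5.14) may be evaluated as in the sketch below, which assumes two samples of equal size $m$ with standard deviations $s_k$ and $s_\ell$. The function name is hypothetical, and only the degrees of freedom are computed; the critical value itself is not sketched here.

```python
def welch_dof(s_k, s_l, m):
    """Welch degrees of freedom d(k, l) for two samples of equal size m."""
    v_k = s_k ** 2 / m  # squared standard error contribution of sample k
    v_l = s_l ** 2 / m  # squared standard error contribution of sample l
    return (m - 1) * (v_k + v_l) ** 2 / (v_k ** 2 + v_l ** 2)


# Equal standard deviations give the pooled value 2(m - 1).
print(welch_dof(2.0, 2.0, 10))
# A very small second standard deviation pushes the value towards m - 1.
print(welch_dof(3.0, 0.1, 10))
```

The result lies between $m - 1$ and $2(m - 1)$, the upper bound being attained when the two sample variances coincide.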
P-values in Hypothesis Tests
One method of reporting the results of a hypothesis test involves stating whether or not a null-hypothesis should be rejected at a specified level of significance $\alpha$, and is called fixed significance level testing [104]. A so-called p-value is employed in fixed significance level testing and denotes the probability that the test statistic will take on a value that is at least as extreme as the observed value in the case that the null-hypothesis is true. In other words, the p-value is the smallest level of significance which would lead to rejection of the null-hypothesis $H_0$ based on the given data. Consider, for example, the two-sided hypothesis test employed in the Fisher LSD test, where
\[ H_0 : |\bar{x}^{(k)} - \bar{x}^{(\ell)}| = 0 \quad \text{and} \quad H_1 : |\bar{x}^{(k)} - \bar{x}^{(\ell)}| \neq 0 \tag{5.15} \]
are the null and alternative hypotheses, respectively. Then the p-value is given by
\[ 1 - P\left( -\frac{|\bar{x}^{(k)} - \bar{x}^{(\ell)}|}{\sqrt{2 S_w / m}} < t_{d_2} < \frac{|\bar{x}^{(k)} - \bar{x}^{(\ell)}|}{\sqrt{2 S_w / m}} \right), \tag{5.16} \]
where $t_{d_2}$ denotes a random variable following the t-distribution with $d_2 = mn - n$ degrees of freedom.
Operationally, once the p-value has been computed it is compared with a predefined level of significance α, in order to make a decision. It is then standard practice to report the observed p-value, along with the decision made in respect of rejection of the null-hypothesis. Apart from stating this decision on the null-hypothesis, the p-value provides a measure of credibility of the null-hypothesis. More specifically, the p-value provides a measure of risk that an incorrect decision regarding the null-hypothesis has been made, as the p-value denotes the probability that the null-hypothesis is wrongly rejected [104] (in other words, it is the probability of making a Type I error). The p-values for the ANOVA, Levene and Games-Howell tests may be computed similarly, but using the appropriate probability distributions in each case, as mentioned above.
5.4 Chapter Summary
This chapter opened in §5.1 with a description of the various entities involved in the simulation model building process, culminating in a detailed description of the simple, hypothetical, benchmark highway network which will be used as a test-bed and concept demonstrator for the working of the reinforcement learning algorithms implemented in the following chapters. This was followed in §5.2 by a description of the verification and validation techniques employed so as to ensure a valid simulation. Finally, an experimental design was described in §5.3, with a specific focus on the simulation warm-up period, as well as some general parameter specifications and the statistical analysis which is to be performed in respect of the simulation model output data.
CHAPTER 6
Reinforcement Learning for Ramp Metering
Contents
6.1 ALINEA and PI-ALINEA in a Microscopic Context
6.2 Formulation as a Reinforcement Learning Problem
    6.2.1 The State Space
    6.2.2 The Action Space
    6.2.3 The Reward Function
    6.2.4 Learning Rate and Action Selection
6.3 Q-Learning for Ramp Metering
6.4 kNN-TD Learning for Ramp Metering
6.5 Computational Results
    6.5.1 Parameter Evaluation
    6.5.2 Algorithmic Comparison
6.6 Ramp Metering with a Queueing Consideration
    6.6.1 ALINEA and PI-ALINEA with Queue Limits
    6.6.2 Q-Learning and kNN-TD with Queue Limits
    6.6.3 Algorithmic Comparison
6.7 Chapter Summary
The purpose of this chapter is to provide a detailed description of the implementation of RL in the context of RM. The chapter opens in §6.1 with a description of the implementations of the well-known ALINEA and PI-ALINEA RM control strategies within the microscopic traffic benchmark model of §5.1.2. Thereafter, the RM problem is formulated in §6.2 in the context of RL, which then serves as the blueprint for the algorithmic implementations of Q-Learning and the kNN-TD RL algorithms. These algorithmic implementations are presented in §6.3 and §6.4, respectively. This is followed by an algorithmic parameter evaluation in §6.5.1, after which the relative algorithmic performances of the RM techniques are compared in §6.5.2, using suitable algorithmic parameter values. Thereafter, queueing considerations are introduced within each of the RM implementations in §6.6, so as to prevent the formation of excessively long on-ramp queues. A thorough algorithmic performance comparison of the RM implementations incorporating these queueing considerations follows in §6.6.3. The chapter finally closes in §6.7 with a brief summary of the work included in the chapter.
6.1 ALINEA and PI-ALINEA in a Microscopic Context
The ALINEA RM control strategy, widely regarded as the benchmark RM control strategy [130], has been designed for application in a macroscopic traffic modelling environment. As a result, a number of minor adjustments have to be made to the control strategy in order to facilitate its successful application within a microscopic traffic simulation model. According to the ALINEA strategy, the metering rate is adjusted based on the traffic density along the highway directly downstream of the on-ramp. In the macroscopic case, this is achieved simply by adjusting the maximum allowable flow entering the highway from the on-ramp.
As in several real-world applications of RM [52, 130], a one-vehicle-per-green-phase approach is adopted for its microscopic implementation in this dissertation. The flow of vehicles onto the highway from the on-ramp may then be controlled by adjusting the red phase duration of the traffic signal enforcing RM at the on-ramp. Since the control law only returns a metering rate, this metering rate is converted to a practically implementable red phase time
\[ R(t) = \max\left\{ 0, \frac{3600}{r(t)} - G(t) \right\}, \tag{6.1} \]
where r(t) denotes the metering rate (in veh/h) determined according to (3.17), and G(t) denotes the fixed green phase duration. It is evident that a larger metering rate (i.e. allowing more vehicles to enter the highway traffic stream) results in shorter red phase times, while a smaller metering rate restricts the traffic flow allowed to enter the highway by enforcing longer red phase durations.
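The conversion in (6.1) may be illustrated by the following sketch. The function name is hypothetical, the green phase duration is assumed to be expressed in seconds (so that 3600/r(t) is the cycle time in seconds per vehicle), and the rejection of non-positive metering rates is an assumption made here for the sake of a well-defined example.

```python
def red_phase_time(metering_rate, green_time):
    """Red phase duration in seconds for a one-vehicle-per-green-phase signal.

    metering_rate: r(t) in veh/h, as returned by the control law.
    green_time:    fixed green phase duration G(t), assumed to be in seconds.
    """
    if metering_rate <= 0:
        # Assumption: a non-positive rate has no finite cycle time.
        raise ValueError("metering rate must be positive")
    cycle_time = 3600.0 / metering_rate  # seconds available per vehicle
    return max(0.0, cycle_time - green_time)


print(red_phase_time(900.0, 2.0))   # one vehicle every 4 s, of which 2 s is green
print(red_phase_time(3600.0, 2.0))  # the rate exceeds what the green phase allows
```

The `max` operator ensures that the red phase time remains non-negative when the requested metering rate is larger than the one-vehicle-per-green-phase cycle can deliver.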
Since the ALINEA control law dates back to 1997, it may be considered outdated, especially considering the large volume of work performed since. Therefore, PI-ALINEA, a more recent extension of the ALINEA control law, first published in 2014, is also implemented as a second benchmark control strategy against which the performance of the RL implementations may be measured. In PI-ALINEA the metering rate is determined according to (3.18) and, as in ALINEA, is based on the traffic density directly downstream of the on-ramp. This metering rate is then again converted, by means of (6.1), to red phase times which may be applied in the microscopic traffic simulation model in the same manner as for ALINEA.