
The next marathon we analyzed was the New York Marathon. We began by analyzing the entire dataset using Model 1, in the same manner as the Boston and Chicago datasets. The New York data includes 55,608 unique male runners, who account for 160,127 observations, and 28,907 unique female runners, who account for 76,339 observations. This is the largest dataset we analyzed. Implementing Model 1 on this data yielded the age-performance curves seen below:

Figure 4.31 - Age-Performance Curve from Model 1 on New York data

These age-performance curves are very similar to the ones generated from the Chicago dataset. The New York marathon runners are noticeably slower than the Boston runners, but are about the same as the Chicago runners. The shape of the curves is similar across all three races. Finish times increase at an increasing rate with age. One pattern that is prevalent in both Chicago

this age range for both males and females. This contrasts with the slowly increasing pattern in the Boston Marathon curves. Perhaps this is a result of less experienced runners competing in New York and Chicago. Less experienced runners have more room for improvement, which can offset the effect of aging between ages 18 and 40. The Boston Marathon typically attracts higher-quality, more experienced runners who may be performing closer to their peak. This results in an evident aging component in this age range.

The summary output of Model 1 mirrors the previous two analyses. The αi coefficients have more variance for male runners than for female runners, which suggests that there is greater variance in the ability of male runners in the dataset. The βyij coefficient aligns reasonably with the race conditions in each year. In years with extreme race conditions, this value was positive, which indicates slower times. This pattern was persistent across all marathons we analyzed.

As with the Boston and Chicago datasets, we wanted to convert the New York age-performance curves into new qualifying standards. We repeated the analysis used for the previous races to arrive at the following standards:

Table 4.12 - Qualifying Standards produced from Model 1 on New York data

As anticipated, these new qualifying standards are very similar to those generated by the Chicago dataset. Most of the times are at least 10 minutes faster than the 2020 Boston qualifying standards for most age-groups due to the lack of aging in the curves between ages 18-45. Once again, the standards for the older age-groups seem very unreasonable. This is likely due to a lack of data (less than 1% of the data is composed of runners over the age of 65). Most of the times generated from this analysis are faster than the current qualifying standards, which suggests that the current standards are too strict on the 18-34 age-group. This conclusion is consistent with the Boston and Chicago analyses. Perhaps this is explained by the theory we proposed previously, that the New York Marathon attracts less experienced runners. These runners often improve into their forties. Another theory is that the best young runners may not be able to afford the travel and expenses associated with the New York Marathon. This means that many of the younger runners may be locals who are not as committed or experienced. The faster young runners may also be deterred by the complicated entry procedures for the New York Marathon. These entry procedures reward runners who participate in a large number of races organized by New York Road Runners. Fast, young runners may not be interested in, or capable of, gaining points in this system. Therefore, these runners could be unable to participate in the New York Marathon.

We were also curious to fit Model 1 on different performance quartiles of New York Marathon runners. We split the male and female runners into quartiles based on runner ability (αi). Then we fit Model 1 on each of these quartiles separately. This resulted in the following age-performance curves:

Figure 4.33 - Age-Performance curve for New York Male data split by Quartiles

These age-performance curves closely resemble those in the Chicago analysis. The shapes of the curves are all very similar, which suggests that runners in this dataset age similarly regardless of their ability. The curves are still very flat from ages 18-45 for all quartiles. The gap between quartiles is much narrower for quartiles 2 and 3. This signifies that this marathon has some elite runners (quartile 1) and some more casual runners (quartile 4). We were curious whether filtering this data for only runners in quartiles 1 and 2 (the top 50% of runners) could lead to qualifying suggestions more similar to those from the Boston analysis. However, this was not the case. We have shown these new qualifying standards below:

Table 4.13 - Qualifying Standards produced from Model 1 on top 50% of New York data

This analysis resulted in age-performance curves which were similar to those generated when using all of the New York data. Therefore, it makes sense that the qualifying standards (shown above) did not change much. Overall, the results of the analyses performed on the New York dataset closely resembled the results from the Chicago analyses.
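The quartile split described above can be sketched in a few lines. This is only an illustration, not the code used for the analysis: the function name `split_into_quartiles`, the runner ids, and the ability values are all hypothetical, and the sketch assumes that αi is on the finish-time scale, so that a lower value means a faster runner and quartile 1 holds the fastest runners.

```python
from statistics import quantiles


def split_into_quartiles(abilities):
    """Group runner ids into quartiles by estimated ability.

    `abilities` maps a runner id to that runner's estimated alpha_i.
    Assumption: lower alpha_i means a faster runner, so quartile 1
    contains the fastest runners and quartile 4 the slowest.
    """
    # Three cut points that divide the alpha_i values into four groups.
    cut1, cut2, cut3 = quantiles(abilities.values(), n=4, method="inclusive")
    groups = {1: [], 2: [], 3: [], 4: []}
    for runner, alpha in abilities.items():
        if alpha <= cut1:
            groups[1].append(runner)
        elif alpha <= cut2:
            groups[2].append(runner)
        elif alpha <= cut3:
            groups[3].append(runner)
        else:
            groups[4].append(runner)
    return groups


# Hypothetical ability estimates (minutes relative to an average runner).
example_abilities = {
    "r1": -30.0, "r2": -12.0, "r3": -5.0, "r4": 0.0,
    "r5": 4.0, "r6": 9.0, "r7": 20.0, "r8": 41.0,
}
example_groups = split_into_quartiles(example_abilities)
top_half = example_groups[1] + example_groups[2]  # the "top 50%" subset
```

Model 1 would then be refit on each group separately, or on `top_half` for the top 50% analysis.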

We also implemented the variable-quantile analysis, which uses dropout probability to select a subset of runners. Using a logistic regression to model the effect of age on dropout probability yielded results very similar to those from the Chicago dataset: dropout probability increased with age. When using p = 25% (the baseline percentage of top runners aged 18, as determined by the αi coefficients), we obtained a subset that captured about 42% of the runners in the dataset. The age-performance curves and suggested qualifying times from implementing Model 1 on this data are shown below:

Figure 4.34 - Age-Performance curves when using p = 25% in the variable-quantile analysis with New York data
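To make the idea of an age-dependent cutoff concrete, the sketch below shows one possible reading of the variable-quantile selection. It is not the procedure used in this analysis: the widening rule in `retained_fraction` is an assumption made for illustration, and the logistic coefficients (`intercept`, `slope`) are hypothetical values rather than fitted estimates. The assumed rule keeps the top p of runners at age 18 and widens the cutoff at older ages in proportion to the share of runners predicted to have dropped out.

```python
import math


def dropout_probability(age, intercept, slope):
    """Logistic model: P(dropout | age) = 1 / (1 + exp(-(intercept + slope * age)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * age)))


def retained_fraction(age, p, intercept, slope, base_age=18):
    """Fraction of runners of a given age to keep (assumed rule, see lead-in).

    At `base_age` the fraction equals p. At older ages the fraction grows
    as the predicted share of remaining runners shrinks, capped at 1.
    """
    stay_base = 1.0 - dropout_probability(base_age, intercept, slope)
    stay_age = 1.0 - dropout_probability(age, intercept, slope)
    return min(1.0, p * stay_base / stay_age)


# Hypothetical coefficients; a positive slope means dropout rises with age.
INTERCEPT, SLOPE = -3.0, 0.05
for age in (18, 40, 60):
    share = retained_fraction(age, p=0.25, intercept=INTERCEPT, slope=SLOPE)
    print(f"age {age}: keep top {share:.1%}")
```

Under a rule of this kind, the overall share of runners retained exceeds the baseline p, which is in line with the subset covering more than 25% of the dataset.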

As seen in the results above, the shape of the curves and the qualifying times change only slightly from the original analysis with Model 1. The curves are a bit less flat and the qualifying times are a bit more logical, but they still seem to suffer from the same problem. The percent increase in suggested finish times from the 18-34 age-group to the 50-54 age-group is 8.10% for males and 6.10% for females. While this is greater than the results from Model 1 on the entire dataset, it is still significantly less than the increase in the Boston qualifying times (or in the results from our analysis of the Boston and Chicago datasets). As mentioned, this variable-quantile analysis is still a work in progress. The results motivate further research and analysis to address the issue of dropout probability.
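For reference, the same percent-increase calculation can be applied to the 2020 Boston qualifying standards. The sketch below assumes the published 2020 standards of 3:00:00 (18-34) and 3:25:00 (50-54) for men, and 3:30:00 and 3:55:00 for women. The helper names `to_minutes` and `percent_increase` are illustrative only.

```python
def to_minutes(hms):
    """Convert an 'H:MM:SS' string to minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


def percent_increase(young, older):
    """Percent increase in time from the younger to the older age-group."""
    base = to_minutes(young)
    return 100.0 * (to_minutes(older) - base) / base


# 2020 Boston qualifying standards: (18-34 standard, 50-54 standard).
boston_2020 = {
    "male": ("3:00:00", "3:25:00"),
    "female": ("3:30:00", "3:55:00"),
}
for sex, (young, older) in boston_2020.items():
    print(f"{sex}: {percent_increase(young, older):.2f}%")
# male: 13.89%
# female: 11.90%
```

These increases of roughly 13.9% and 11.9% are well above the 8.10% and 6.10% obtained from the variable-quantile analysis.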