As you can see in the chart in Figure 10-3, there is a drop from 15 airlines to 10 between 1988 and 1992, and then the number remains stable for almost a decade until there is a slight increase to 12. The effects of the attacks of September 11, 2001, can be seen as two carriers drop out of the picture. The industry grows rapidly again starting in 2003, reaching a peak of 21 airlines in 2006, from which it slowly drops to 15. We are interested in understanding what happened right after September 11, so we modify the search to count by month instead of year and we change the time period in the time picker to go from September 2001 to June 2002. The results show that one airline ceases to operate in October 2001 and another one in December 2001. It would be interesting to know which airlines ceased to operate. The first search that comes to mind is:
* | stats values(UniqueCarrier) by Month
The stats function values returns a list of all the distinct values of the UniqueCarrier field as a multivalue entry sorted in lexicographical order. The problem with this search is that even though the output contains the results we want, it is difficult to process. You can see a part of the total output in Figure 10-4.
A better way is to use a chart, which is easier to interpret. By clicking on the chart button, as indicated in Figure 10-4, we see that the results of the search produce only empty charts for every available chart type. Let us try another search that produces results better suited to charting. Also, as we have pinpointed the demise of the two airlines before January 2002, we reduce the time period to five months, selecting from September 2001 to January 2002 in the time picker. We will use the chart command combined with sparklines:
* | chart sparkline(count,1w) by UniqueCarrier

Figure 10-4. List of airlines by month
By default, sparklines are presented using the closest time unit of the data being processed. Because we are handling five months' worth of data in the search, the sparkline defaults to monthly, but that is too coarse to be a useful visualization. Using 1w as an argument increases the resolution from monthly to weekly. The output of this search can be seen in Figure 10-5.
The output of the chart command is rather spartan. We can see that KH (Aloha) and TW (TWA) drop to zero at different points in time, so we have answers to our question. However, even though the sparklines provide a basic idea of what is happening, the result does not feel quite complete. Surely there has to be a way to provide a more comprehensive and compelling visualization. To explore a better visualization we try using the timechart command:
* | timechart count by UniqueCarrier limit=0
In this search we count all the events (flight records) and group them by airline. It is a simple way of finding out which airlines are present and of getting a count of scheduled flights for each one of them. Because the span is over five months, the timechart command will present the results broken down by month, which is what we want.
The reason we use the limit argument is that by default the maximum number of items or data series to display is 10. There are 12 airlines in the time period we are analyzing, so we can set the limit to 12, or we can be lazy and set it to zero. By doing the latter we state that timechart should accommodate all the distinct items of the UniqueCarrier field, no matter how many there are. You have to be careful using an argument of zero as a large number of items will probably produce a rather chaotic and illegible chart. Had we not specified the limit argument, Splunk would have presented only nine items and grouped the additional ones under a category called Other. Which items fall in the Other category depends on the function used with the timechart command. In this example, it would have been those with the lower counts. The useother argument, which controls if the Other grouping exists, is turned on by default, but only has effect if limit has a value different than zero.
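The other alternative mentioned above, an explicit limit, can be written out as follows. This is a sketch rather than a search taken from the analysis; it assumes the same events and the same UniqueCarrier field, sets the limit to the 12 airlines present in the period, and turns off the Other grouping with useother=f:

```
* | timechart count by UniqueCarrier limit=12 useother=f
```

With this variation the chart never shows more than 12 series. The trade-off is that if the time period is later widened and more than 12 carriers appear, the extra ones are left out of the chart instead of being grouped under Other, so the limit has to be revisited.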
Communicating results in such a way that they are easily consumable by the intended audience is challenging. Charts with colors tend to be the favored method for this. The issue is finding the appropriate chart type. In this particular example, we will see that two different chart types clearly communicate two different pieces of information. The chart type options that Splunk offers for the results of this search are column, line, and area. The column chart shown in Figure 10-6 clearly presents the airlines that went out of business during this period by the simple fact that the corresponding airline column is not there anymore. In this case the column for Aloha is no longer present starting in November and the column for TWA is not there in January.
Figure 10-7 presents the same results but using a line chart. As you can see, the chart is pretty busy and does not convey the demise of the airlines as clearly as the column chart. However, between December and January we can see a piece of information that was not obvious in the column chart. Whereas the traffic of TWA goes down to zero, the number of flights for American increases by about the same amount. What really happened was that American acquired TWA, hence the corresponding increase in flights. A completely different piece of information emerges by just changing the chart type.
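To look at the acquisition effect without the clutter of the other ten series, the search can be narrowed to the two carriers involved. The following is a sketch that assumes AA is the UniqueCarrier code for American (TW is the code for TWA, as seen earlier) and that the time picker is still set from September 2001 to January 2002:

```
UniqueCarrier=TW OR UniqueCarrier=AA | timechart span=1mon count by UniqueCarrier
```

The span=1mon argument only makes explicit the monthly breakdown that timechart already chooses for this time period. With just two series there is no need for the limit argument.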
For completeness, we present in Figure 10-8 the area chart generated with the same results as the previous two charts. You can barely distinguish five items and nothing can really be deduced from reviewing it. Although this chart type is useless for the current type of results, it can be useful for results generated by other searches.
Figure 10-8. Airlines out of business after September 11 (area chart)
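The charts answer the question visually, but a tabular cross-check is also possible. The search below is a sketch, not part of the original analysis, and it assumes that the timestamp of each event (_time) is the flight date. It lists the date of the last flight record of each carrier within the selected time period, sorted so that the carriers that stopped flying earliest appear first. The field name last_flight is only an illustrative label:

```
* | stats max(_time) as last_flight by UniqueCarrier | eval last_flight=strftime(last_flight, "%Y-%m-%d") | sort last_flight
```

Carriers that are still operating at the end of the period should all show a date close to the end of the time range, whereas those that ceased to operate stand out with an earlier date.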
Now we have a pretty good idea about the airlines over the period of available flight data. The next step is to gain an understanding of the airports.