Chapter II Reference Framework
This chapter reports various studies related to changes
2.1 INTERNATIONAL CASE STUDIES
Thus far OccApriori has only been concerned with predicting occupants’ exact future locations based on the exact locations in the past. However, there are cases where it would be useful to make predictions involving more general classes of locations. This includes being able to make a more generalised prediction about where an occupant will go, for example predicting what part of a building they will be in, whether it will be some location involved in a certain task, or even simply predicting that they will be somewhere other than their current location. It also includes making predictions based on patterns that do not reference specific locations, for example predicting an occupant’s next location based on what part of a building they were in, or what task they were involved in.
These ideas are represented using a taxonomy. In a taxonomy of locations, the leaf nodes would be the specific locations which have been learned and predicted so far, i.e. specific rooms. Nodes higher in the taxonomy would represent a more general class of locations and so are referred to hereafter as ‘generalised locations’. For example, if we classify rooms by their location, then rooms ‘A’, ‘B’ and ‘C’ might all be generalised as ‘Building_1’ as all these rooms are located in Building 1. Or, they may all be generalised as ‘Office’ as they are all offices. Using such taxonomies, it is possible to refer to occupant locations in a more general sense than exact locations, allowing for generalised rules and predictions, i.e. rules and predictions which use generalised locations instead of specific ones.
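As a minimal sketch of the idea, a taxonomy can be held as a mapping from each specific room to its generalised location. The dictionary names and the `generalise` helper below are illustrative assumptions, not part of OccApriori itself; the two mappings mirror the ‘Building_1’ and ‘Office’ classifications described above.

```python
# Hypothetical taxonomies: each maps a specific room (leaf node)
# to a generalised location (a node higher in the taxonomy).
BY_BUILDING = {"A": "Building_1", "B": "Building_1", "C": "Building_1"}
BY_TYPE = {"A": "Office", "B": "Office", "C": "Office"}


def generalise(location, taxonomy):
    """Return the generalised location for a room.

    Rooms that the taxonomy does not mention are returned unchanged.
    """
    return taxonomy.get(location, location)


if __name__ == "__main__":
    for room in ("A", "B", "C"):
        print(room, generalise(room, BY_BUILDING), generalise(room, BY_TYPE))
```

The same room can be generalised in more than one way, depending on which classification is applied.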
In some cases these generalised predictions only add value: the algorithm would predict the specific location and add the general prediction to provide extra information. However, there are cases where the ability to learn more general patterns would allow it to make predictions it would otherwise be unable to make. This chapter examines two such cases in particular. The first is difficulty predicting an occupant’s location because the algorithm cannot choose between several low-confidence options, i.e. knowing they will go somewhere but being unsure where. The second is being unable to recognise patterns which relate to classes of locations with few or no examples of those locations, for example an occupant who regularly has meetings of fixed duration but varying location.
09:00   10:00   11:00
Office  Office  A
Office  Office  B
Office  Office  C
Table 6-1 – An example occupant who remains in their office until 11:00, at which point they go to one of three locations with equal probability
Table 6-1 shows an example dataset where an occupant stays in their office until 11:00, at which point they go to one of three locations. As we have no data on what causes the occupant to go to one location rather than another, we are simply left with a 33% chance of the occupant going to each location. This will result in three rules of the form {09:00=Office, 10:00=Office} => {11:00=A}, one for each location, each of which will have 33% confidence.
We can see from the data that the occupant always leaves their office at 11:00 to go somewhere; the only doubt is where exactly they will go. However, because we are trying to predict the exact location, all we can do is make a low-confidence prediction for one of the three locations. The low confidence is correct, as we don’t know with any certainty where the occupant will go, but it is an issue if the rule is not chosen because of its low confidence. The simplest example of this is a confidence threshold that is too high to allow these rules: if our confidence threshold is 50%, then in the example above we will make no prediction for where the occupant is at 11:00.
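The effect of the threshold can be illustrated with a small sketch. It is a simplified stand-in for rule selection, not OccApriori’s actual procedure: the `predict` function and its arguments are assumptions made for illustration, and confidence is taken as the fraction of matching instances that end in a given 11:00 location.

```python
from collections import Counter


def predict(rows, antecedent, threshold):
    """Return (location, confidence) for the most frequent 11:00 location
    among rows whose 09:00 and 10:00 values match the antecedent,
    or None if no rule reaches the confidence threshold."""
    outcomes = [row[2] for row in rows if row[:2] == antecedent]
    if not outcomes:
        return None
    location, count = Counter(outcomes).most_common(1)[0]
    confidence = count / len(outcomes)
    return (location, confidence) if confidence >= threshold else None


# The data from Table 6-1 as (09:00, 10:00, 11:00) tuples.
table_6_1 = [
    ("Office", "Office", "A"),
    ("Office", "Office", "B"),
    ("Office", "Office", "C"),
]

if __name__ == "__main__":
    # Each candidate rule has confidence 1/3, so a 50% threshold rejects all of them.
    print(predict(table_6_1, ("Office", "Office"), threshold=0.5))
```

With a threshold of 0.5 the function returns `None`, mirroring the case where no prediction is made for 11:00.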
09:00   10:00   11:00
Office  Office  A
Office  Office  B
Office  Office  C
Office  Office  Office
Office  Office  Office
Table 6-2 – An example occupant who remains in their office, with the possibility of leaving at 11:00 to go to one of three locations
Something similar would occur if we were to add two extra instances to the data in Table 6-1 as shown in Table 6-2; now with two examples of the occupant staying in the office at 11:00, we would predict (with 40% confidence) that the occupant will remain in the office, even though there is a 60% chance they will leave.
09:00   10:00   11:00
Office  Office  Not-Office
Office  Office  Not-Office
Office  Office  Not-Office
Office  Office  Office
Office  Office  Office
Table 6-3 – The same example occupant as Table 6-2, with the non-office locations replaced with a single generalised location
We can avoid these failures to predict by generalising the locations outside the office. Table 6-3 replaces the locations A, B and C with the generalised location ‘Not-Office’. If we were to train on this modified dataset, we would end up with two possible predictions for where the occupant will be at 11:00, ‘Office’ with a confidence of 40% and ‘Not-Office’ with a confidence of 60%. This means that we can now correctly predict that the occupant is more likely to leave the office, without saying where specifically.
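The 40% and 60% figures can be checked with a short sketch. The function names are illustrative assumptions; because every instance shares the same 09:00 and 10:00 values, the confidence of each rule is simply the fraction of instances ending in that 11:00 location.

```python
from collections import Counter


def generalise(location):
    """Collapse every location other than 'Office' into 'Not-Office'."""
    return location if location == "Office" else "Not-Office"


def consequent_confidences(rows):
    """Fraction of rows ending in each 11:00 location.

    All rows are assumed to share the same antecedent, so this
    fraction equals the confidence of the corresponding rule.
    """
    counts = Counter(row[-1] for row in rows)
    return {location: n / len(rows) for location, n in counts.items()}


# Table 6-2 as (09:00, 10:00, 11:00) tuples.
table_6_2 = [
    ("Office", "Office", "A"),
    ("Office", "Office", "B"),
    ("Office", "Office", "C"),
    ("Office", "Office", "Office"),
    ("Office", "Office", "Office"),
]

# Table 6-3: the same data with A, B and C generalised.
table_6_3 = [tuple(generalise(loc) for loc in row) for row in table_6_2]

specific = consequent_confidences(table_6_2)
general = consequent_confidences(table_6_3)

if __name__ == "__main__":
    print(specific)  # 'Office' is the single most confident specific outcome
    print(general)   # 'Not-Office' overtakes 'Office' after generalisation
```

On the specific data the best single prediction is ‘Office’; on the generalised data it is ‘Not-Office’.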
09:00   10:00   11:00
Office  Office  Not-Office
Office  Office  Not-Office
Office  Office  Not-Office
Table 6-4 – The example occupant from Table 6-1 with locations A, B and C generalised
Returning to the example in Table 6-1, generalising locations A, B and C results in Table 6-4. We can now learn a 100% confidence rule stating the occupant will leave their office, allowing us to make the prediction with any confidence threshold.
09:00 10:00 11:00
A A A
B B B
C C C
Table 6-5 – An example training set featuring an occupant who may be in one of three locations with equal probability, and who will remain in that location
Table 6-5 shows an example of the opposite issue. In this case the occupant stays in the same location in all three timeslots. For each of these locations we can learn that the occupant will remain in that location rather than returning to the office. However, we do not learn the general pattern that there are locations which will keep the occupant away from their office.
09:00 10:00 11:00
D D D
E E E
Table 6-6 – An example test set in which the occupant is in two new locations
Table 6-6 shows an example test set which has two new locations which keep the occupant away from the office. We cannot predict anything regarding these locations as they do not appear in the training set. For example, if we wish to predict the occupant’s location at 11:00, we cannot, as there is no data, and hence there are no rules, regarding locations D or E at 09:00 or 10:00. This is similar to the issue addressed in Chapter 5, where the algorithm needed to be able to recognise patterns even if the time at which they occur changed; in this case the time remains the same but the locations involved are changing. If we have data on these locations that tells us that A through E are all locations that will keep the occupant from their office, we would like to use that data to be able to make predictions for the instances in Table 6-6.
09:00       10:00       11:00
Not-Office  Not-Office  Not-Office
Not-Office  Not-Office  Not-Office
Not-Office  Not-Office  Not-Office
Not-Office  Not-Office  Not-Office
Not-Office  Not-Office  Not-Office
Table 6-7 – The example training and test sets, now shown together and with the locations generalised
Table 6-7 replaces all the locations in the example training and test sets with the generalised ‘Not-Office’ location. With this replacement it is trivial to learn the pattern that the occupant will remain out of the office if they start the day out of the office. If we wish to predict the occupant’s location at 11:00 for example, the rule {09:00=Not-Office, 10:00=Not-Office} => {11:00=Not-Office} will be available.
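The difference between the specific and generalised rule sets can be sketched as follows. This is an illustration under stated assumptions rather than OccApriori’s rule-mining procedure: `TAXONOMY`, `generalise_row` and `learn_rules` are hypothetical names, and the taxonomy assumes we already know that A through E all keep the occupant away from the office.

```python
# Hypothetical taxonomy: A through E are all known to be non-office locations.
TAXONOMY = {location: "Not-Office" for location in "ABCDE"}


def generalise_row(row):
    """Replace each location in a row with its generalised location, if any."""
    return tuple(TAXONOMY.get(location, location) for location in row)


def learn_rules(rows):
    """Map each (09:00, 10:00) antecedent to the set of 11:00 locations seen."""
    rules = {}
    for *antecedent, consequent in rows:
        rules.setdefault(tuple(antecedent), set()).add(consequent)
    return rules


training = [("A", "A", "A"), ("B", "B", "B"), ("C", "C", "C")]  # Table 6-5
test = [("D", "D", "D"), ("E", "E", "E")]                       # Table 6-6

specific_rules = learn_rules(training)
general_rules = learn_rules([generalise_row(row) for row in training])

if __name__ == "__main__":
    for row in test:
        seen = row[:2]  # the 09:00 and 10:00 observations
        print(
            seen,
            specific_rules.get(seen),                  # no rule for unseen rooms
            general_rules.get(generalise_row(seen)),   # generalised rule applies
        )
```

The specific rules have nothing to say about D or E, whereas the single generalised rule covers both test instances.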
The following sections discuss applying taxonomies to the data used in OccApriori in order to learn general patterns like these alongside the specific patterns which it already learns.