
In order to apply association rule mining to occupancy data, the dataset must be suitably structured. The aim of OccApriori is to predict occupants’ locations at times throughout the day based on their historical locations at those times, and so each instance in the dataset is defined to be a single day for a single occupant. Since predictions require the time at which each location appears in the day, and not merely which locations appear in the same day, the items in the instances are attribute/value pairs, with the initial attributes and values being times and locations respectively. The basic set of attributes also includes the occupant’s identity, the day of the week, and the occupant’s scheduled location during the day if timetable data or similar is available.

The occupant and day are direct attributes whose values simply state which occupant and day the instance relates to, allowing the algorithm to find patterns relating to specific occupants and/or days. For the location data, actual and scheduled, the day consists of a set of equal-length timeslots. For each timeslot there is an attribute for actual location during that period and another for scheduled location during that period, with the values stating the location. Thus the full set of basic attributes in the dataset is:

$$A = \{d, o, l_1, \dots, l_n, s_1, \dots, s_n\} \tag{4-1}$$

Here $d$ is the day, $o$ is the occupant, $l_i$ is the occupant’s location at timeslot $i$, and $s_i$ is the location they were scheduled to be in at timeslot $i$. Of these attributes, $l_1, \dots, l_n$ are the ones which the algorithm aims to predict. ‘Away’ is treated as an actual location which is recorded when the occupant is not in the building. The value for any attribute can be empty; however, if absences are not explicitly recorded in the data then any patterns involving absences, such as departing the building at the end of the work day, will not be found. Table 4-1 below shows a pair of sample instances of this form, with the attributes listed in the table header.

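| Occupant | Day | 12:00 | 13:00 | 12:00 (S) | 13:00 (S) |
|----------|--------|-------------|---------|------------|-----------|
| Bob | Monday | Bob Office | Canteen | Bob Office | - |
| Bill | Monday | Bill Office | Away | - | - |

Table 4-1 – Example instances using OccApriori’s dataset structure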

While the items in the traditional market basket dataset are all simple strings which are either present or absent, the attribute/value pairs in an occupant location dataset have some additional properties which must be considered. The actual-location timeslot attributes $l_1, \dots, l_n$ represent an ordered list, the ordering being the time at which they occur. This ordering is important as only predictions of later times will be useful. The schedule attributes $s_1, \dots, s_n$ are also ordered, however this ordering is not important. This is because predicting some $l_i$ using data for some $l_j$ where $j > i$ means the time $i$ must have passed already, since there is actual location data for time $j$, thus making a prediction of $l_i$ unnecessary, whereas because schedule data is by definition available ahead of time, any slot in $s_1, \dots, s_n$ can be used to predict any slot in $l_1, \dots, l_n$.
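The ordering constraint described above can be sketched as a simple check on which items may appear in the antecedent of a rule predicting a given timeslot. The representation of an attribute as a `(kind, slot)` pair and the function name `can_predict` are assumptions for illustration; the text does not specify how OccApriori enforces this.

```python
def can_predict(antecedent_attr, target_slot):
    """Return True if an item with this attribute is useful for predicting
    the actual location at `target_slot` (an integer timeslot index).

    `antecedent_attr` is a hypothetical (kind, slot) pair where kind is
    "loc" (actual location), "sched" (scheduled location), or a meta-data
    kind such as "day" or "occupant" (slot is None for meta-data).
    """
    kind, slot = antecedent_attr
    if kind == "loc":
        # Actual location data is only useful if it comes from an earlier
        # timeslot than the one being predicted.
        return slot < target_slot
    # Schedule data and meta-data are known ahead of time, so any of them
    # may be used to predict any timeslot.
    return True


print(can_predict(("loc", 2), 3))    # earlier actual location
print(can_predict(("loc", 4), 3))    # later actual location
print(can_predict(("sched", 4), 3))  # later scheduled location
```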

The non-timeslot attributes are special in that they are actually data about the location data, i.e. meta-data. This has consequences for the frequency of the possible values: while a timeslot can, in principle, take on any location value, each day of the week occurs with fixed frequency, and each occupant’s data represents a fixed fraction of the total dataset. This in turn changes the support threshold which must be used to delineate frequent itemsets. Taking the example of the UCC dataset with 5 days and 6 occupants, and assuming an even distribution of days and occupants across the instances, a given occupant/day combination appears in only 1/(5 × 6) = 1/30 of the instances, so an itemset which includes a value for both occupant and day will have a maximum support of ~0.03, which is extremely low. This will be addressed in section 4.4.
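The support ceiling quoted above follows directly from the number of days and occupants. A minimal sketch of the arithmetic, assuming (as the text does) that instances are spread evenly over all occupant/day combinations; the function name `max_support` is illustrative:

```python
from fractions import Fraction


def max_support(n_days, n_occupants):
    """Upper bound on the support of any itemset that fixes both a day and
    an occupant, assuming instances are evenly distributed over all
    (day, occupant) combinations."""
    return Fraction(1, n_days * n_occupants)


ucc = max_support(n_days=5, n_occupants=6)
print(ucc, float(ucc))
```

Any itemset containing both meta-data items can only match instances for that one occupant on that one day of the week, so its support cannot exceed this bound regardless of how regular the occupant's behaviour is.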

As association rule mining is an unsupervised approach designed to find any patterns present in a dataset, any of the attributes listed above can be absent and the algorithm will still function. An obvious example is that the schedule data may be incomplete or empty simply because occupants may not have any scheduled activities, and the evaluation includes examining the effect of removing the timetable data where it is available. Instances missing actual location data or even the occupant and day labels can also be handled, but will obviously be of limited use in terms of making predictions. New attributes can also be freely added to the dataset, and will be picked up by the algorithm without further intervention. Examples of attributes which could be added include month, date, academic term, weather data, etc. As with the basic attributes, it must be considered whether a new attribute is a meta-data attribute or not. Table 4-2 shows the example dataset from Table 4-1 with the Month and Weather attributes added (and the schedule attributes removed due to space limitations).

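| Occupant | Day | Month | Weather | 12:00 | 13:00 |
|----------|--------|-------|---------|-------------|---------|
| Bob | Monday | July | Rain | Bob Office | Canteen |
| Bill | Monday | July | Rain | Bill Office | Away |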

Table 4-2 – Example instances using OccApriori’s dataset structure with two new attributes added – Month and Weather
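If instances are held as sets of attribute/value items, adding attributes such as Month and Weather amounts to adding further items to each instance. The sketch below assumes that set-based encoding and uses a hypothetical helper `with_attributes`; it is not taken from the OccApriori implementation.

```python
def with_attributes(instance, extra):
    """Return a copy of `instance` extended with extra attribute/value items.

    `instance` is a frozenset of (attribute, value) pairs and `extra` is a
    dict mapping new attribute names to their values for this instance.
    """
    return instance | {(attr, value) for attr, value in extra.items()}


# Bob's row from Table 4-2, starting from the basic attributes only.
bob = frozenset({
    ("occupant", "Bob"),
    ("day", "Monday"),
    ("loc@12:00", "Bob Office"),
    ("loc@13:00", "Canteen"),
})
bob_extended = with_attributes(bob, {"month": "July", "weather": "Rain"})
```

Both new attributes here describe the day rather than a location, so they would be treated as meta-data attributes in the sense discussed above.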
