In this section, we apply our change detection method to a real-world time series dataset. The case study has two objectives: (1) to determine whether our method detects changes that have occurred during some time periods; and (2) to determine whether our method produces “false alarms” when the dataset’s characteristics do not change significantly over time.

The dataset was obtained from a large manufacturing plant in Israel and represents daily production orders of products. From now on, this dataset is referred to as ‘Manufacturing’.

The candidate input attributes are:

• Catalog number group (CATGRP): a discrete categorical variable.

• Market code group (MRKTCODE): a discrete categorical variable.

• Customer code group (CUSTOMERGRP): a discrete categorical variable.

• Processing duration (DURATION): a discrete categorical variable that represents the processing times as disjoint intervals of variable size.

• Time left to operate in order to meet demand (TIME TO OPERATE): a discrete categorical variable that stands for the amount of time between the starting date of the production order and its due date. Each value represents a distinct time interval.

• Quantity (QUANTITY): a discrete categorical variable that describes the quantity of items in a production order. Each value represents a distinct quantity interval.

The target variable indicates whether the order was delivered on time or not (0 or 1).

The time series database in this case study consists of records of production orders accumulated over a period of several months. The ‘Manufacturing’ database was extracted from a continuous production sequence. Without further knowledge of the process or any other relevant information about the nature of change of that process, we may assume that no significant changes of the operation characteristics are expected over such a short period of time.

Presentation and Analysis of Results. Table 9 and Figure 2 describe the results of applying the IFN algorithm to six consecutive months in the ‘Manufacturing’ database and using our change detection methodology to detect significant changes that have occurred during these months.

The XP statistics, as described in Table 9 and Figure 2, refer only to the target variable (delivery on time). The magnitude of change in the candidate input variables as evaluated across the monthly intervals is shown in Table 10.

Table 9. Results of the CD hypothesis testing on the ‘Manufacturing’ database

         CD                                                                    XP
Month    e_{M_{K−1},K}   e_{M_{K−1},K−1}   d        H(95%)   1 − p-value       1 − p-value
1
2        14.10%          12.10%            2.00%    4.80%    58.50%            78.30%
3        11.70%          10.40%            1.30%    3.40%    54.40%            98.80%
4        10.60%          9.10%             1.50%    2.90%    68.60%            76.50%
5        11.90%          10.10%            1.80%    2.80%    78.90%            100.00%
6        6.60%           8.90%             2.30%    2.30%    95.00%            63.10%
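The relationship between d, H(95%) and the CD confidence level in Table 9 can be checked with a short sketch. It assumes that d is tested with a two-sided normal approximation, so that H(95%) equals 1.96 standard errors; this section does not state the test explicitly, so treat that as an assumption. The names `TABLE_9`, `Z_95` and `cd_confidence` are illustrative only.

```python
import math

# (month, d in %, H(95%) in %, reported CD confidence 1 - p-value in %),
# copied from Table 9.
TABLE_9 = [
    (2, 2.00, 4.80, 58.50),
    (3, 1.30, 3.40, 54.40),
    (4, 1.50, 2.90, 68.60),
    (5, 1.80, 2.80, 78.90),
    (6, 2.30, 2.30, 95.00),
]

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def cd_confidence(d, h95):
    """Two-sided confidence level (1 - p-value) in percent.

    ASSUMPTION: d is normally distributed with standard error h95 / Z_95.
    """
    z = Z_95 * d / h95
    # For a standard normal Z, P(|Z| <= z) = erf(z / sqrt(2)).
    return 100.0 * math.erf(z / math.sqrt(2.0))


for month, d, h95, reported in TABLE_9:
    estimate = cd_confidence(d, h95)
    flagged = d > h95  # a change is flagged only when d exceeds the threshold
    print(f"month {month}: estimated {estimate:.1f}%, "
          f"reported {reported:.1f}%, change flagged: {flagged}")
```

Under this assumption the estimates should agree with the reported confidence levels to within roughly one percentage point, and no month is flagged because d never exceeds H(95%).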

[Figure 2: bar chart of 1 − p-value (axis 0% to 100%) by month (2 to 6) for the CD and XP statistics.]

Fig. 2. Summary of implementing the change detection methodology on the ‘Manufacturing’ database (1 − p-value).

Table 10. XP confidence level of all independent and dependent variables in the ‘Manufacturing’ database (1 − p-value).

           CAT GRP   MRKT Code   Duration   Time to operate   Quantity   Customer GRP
Domain     18        19          19         19                15         18
Month 2    100%      100%        100%       100%              100%       100%
Month 3    100%      100%        100%       100%              100%       100%
Month 4    100%      99.8%       100%       100%              100%       100%
Month 5    100%      99.9%       100%       100%              100%       100%
Month 6    100%      100%        100%       100%              100%       100%

According to the change detection methodology, there was no significant change during the six consecutive months in the rules describing the relationships between the candidate and the target variables (which is our main interest). Nevertheless, the XP statistic has revealed major changes in the distributions of most target and candidate input variables. One can expect that variables with a large number of values need larger data sets in order to reduce the variation of their distribution across periods. However, this phenomenon has not affected the CD statistic.
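This section does not restate how XP is computed. As an illustration only, the sketch below assumes that XP is a Pearson chi-square comparison of a variable's distribution in the current period against its distribution in the preceding periods, with domain size minus one degrees of freedom. It handles only a two-valued variable such as the target (one degree of freedom). The function name `xp_binary` and all counts are hypothetical and are not taken from the ‘Manufacturing’ database.

```python
import math


def xp_binary(prev_counts, curr_counts):
    """Pearson chi-square statistic and confidence (1 - p-value) for a
    two-valued variable.

    ASSUMPTION: the proportions of the preceding periods are treated as
    the expected distribution for the current period.
    """
    n_prev = sum(prev_counts)
    n_curr = sum(curr_counts)
    xp = 0.0
    for x_prev, x_curr in zip(prev_counts, curr_counts):
        p_prev = x_prev / n_prev  # expected proportion
        p_curr = x_curr / n_curr  # observed proportion
        xp += (p_curr - p_prev) ** 2 / p_prev
    xp *= n_curr
    # Chi-square CDF with 1 degree of freedom: P(Z^2 <= x) = erf(sqrt(x / 2)).
    confidence = math.erf(math.sqrt(xp / 2.0))
    return xp, confidence


# Hypothetical counts of (on time, late) orders.
previous_periods = (9000, 1000)
current_period = (880, 120)

xp, confidence = xp_binary(previous_periods, current_period)
print(f"XP = {xp:.2f}, confidence = {confidence:.1%}")
```

With counts of this size, a shift of two percentage points in the on-time rate already exceeds the 95% confidence level, which shows how a distribution test of this kind can flag changes that leave the validation error rate essentially unchanged.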

An interesting phenomenon is the rise in the CD confidence level from month 2 to month 6. To investigate further whether a change in frequency distribution has still occurred during the six consecutive months without resulting in a significant CD confidence level, we have validated the sixth month on the fifth and the first months. Table 11 and Figure 3 describe the outcomes of the change detection methodology.

Implementing the change detection methodology by validating the sixth month on the fifth and the first month did not produce contradicting results. That is, the CD confidence level of both months ranges only within ±8% from the original CD estimation based on all five previous months. Furthermore, although XP produced extremely high confidence levels indicating a drastic change in the distribution of all candidate and target variables, the data mining model was not affected, and it kept producing similar validation error rates (which were statistically evaluated by CD).

Table 11. Outcomes of XP by validating the sixth month on the fifth and the first month in the ‘Manufacturing’ database (1 − p-value).

XP (1 − p-value)                      CAT GRP   MRKT Code   Duration   Time to Operate   Quantity   Customer GRP   Target
Domain                                18        19          19         19                15         18             2
month 5 validated by month 6          100%      100%        100%       100%              100%       100%           98.4%
month 1 validated by month 6          100%      100%        100%       100%              100%       100%           100%
months 1 to 5 validated by month 6    100%      100%        100%       100%              100%       100%           63.1%

[Figure 3: bar chart of the CD confidence level (axis 70.0% to 95.0%) for three validations: months 1 to 5 validated by month 6, month 1 validated by month 6, and month 5 validated by month 6. Data labels: 95.0%, 88.5%, 76.1%.]

Fig. 3. CD confidence level (1 − p-value) outcomes of validating the sixth month on the fifth and the first month in the ‘Manufacturing’ database.

The following statements summarize the case study’s detailed results:

• Our expectation that the ‘Manufacturing’ database contains no significant changes over time in the relationship between the candidate input variables and the target variable is validated by the change detection procedure. No value of the change detection metric (CD) exceeding the 95% confidence level was produced in any period. This means that no “false alarms” were issued by the procedure.

• Statistically significant changes in the distributions of the candidate input (independent) variables and the target (dependent) variable across monthly intervals have not generated a significant change in the rules induced from the database.

• The CD metric implemented by our method can also be used to determine whether an incrementally built model is stable. If we are applying a stable data mining algorithm, like the Info-Fuzzy Network, to an accumulated amount of data, it should produce increasing confidence levels of the CD metric over the initial periods of the time series, as more data supports the induced classification model.

Thus the results obtained from a real-world time series database confirm the conclusions of the experiments on artificial datasets with respect to reliability of the proposed change detection methodology.