The experiments presented in this section analyse the ability of the soft sensing algorithm to develop adaptive soft sensors starting from a minimal training data set. Most of the current soft sensors considered in Chapter 2 require the collection of a large amount of training data. In some extreme cases, historical data have to be collected over several months of operation of the process.
To analyse the ability of the algorithm to deal with this constraint, the soft sensors are developed using only 10% of the available data as historical data. After the initial training, the soft sensors are deployed. For the on-line data, only 25% of the target values are used for the adaptation. For all of the soft sensors, the default parameters from Table 6.1 are applied.
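The data partitioning described above can be sketched as follows. This is only an illustrative sketch: the function name `split_minimal_training` is hypothetical, and the random selection of the on-line samples that provide target feedback is an assumption, since the text does not state how the 25% of target values are chosen.

```python
import numpy as np

def split_minimal_training(n_samples, train_fraction=0.10,
                           feedback_fraction=0.25, seed=0):
    """Partition sample indices for the minimal training data experiment.

    Returns the indices of the historical (training) samples, of the
    on-line samples, and of the on-line samples whose target values are
    made available for adaptation.
    """
    n_train = int(round(train_fraction * n_samples))
    # The earliest samples serve as historical data for the initial training.
    train_idx = np.arange(n_train)
    # All remaining samples arrive during the on-line phase.
    online_idx = np.arange(n_train, n_samples)
    # ASSUMPTION: feedback samples are drawn uniformly at random.
    rng = np.random.default_rng(seed)
    n_feedback = int(round(feedback_fraction * online_idx.size))
    feedback_idx = np.sort(rng.choice(online_idx, size=n_feedback, replace=False))
    return train_idx, online_idx, feedback_idx
```

For a data set of 1000 samples this yields 100 training samples, 900 on-line samples and 225 on-line samples with target feedback.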
Industrial drier
The predictions of the industrial drier soft sensor developed using only a minimal amount of training data are shown in Figure 6.36. One can see that the on-line phase takes longer (compare e.g. with Figure 6.31) as it now covers 90% of the data set. The soft sensor is able to deliver useful predictions from the beginning of its operation (see Figure 6.37). The on-line area that is covered poorly lies between samples 130 and 180, which can probably be attributed to the high noise level of the target variable in this range.
Figure 6.36: Industrial drier: Predictions of a soft sensor developed using minimal training data (plot title: Predictions Robust On-line Soft Sensor; axes: Time vs. Target value; legend: Target, ROSS [0.00696, 0.31])
In order to analyse the influence of the limited training data, Table 6.8 shows the MSE and correlation coefficient of the full-scale soft sensor from Section 6.6.4 next to the performance of the minimal training data soft sensor, measured over the same on-line data samples. The table shows that the performance of the minimal training data soft sensor is equivalent to that of the full-scale ROSS using the full training data. This shows that the limited amount of training data is no obstacle for the soft sensor, because it maintains a stable performance by means of its adaptation.
Figure 6.37: Industrial drier: Minimal training data soft sensor predictions (detail view of the first 200 samples; axes: Time vs. Target value; legend: Target, ROSS)
Soft sensor type                     MSE            Corr. coef.
Full-scale ROSS                      5.35 × 10⁻³    0.26
Minimal training data soft sensor    5.20 × 10⁻³    0.28
Table 6.8: Industrial drier: Comparing the minimal training data soft sensor with the full-scale soft sensor from Section 6.6.4
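The two performance measures used in this comparison can be computed over a chosen sample range as sketched below. The function name `evaluate` is hypothetical, and the sketch assumes that the correlation coefficient is the Pearson correlation between targets and predictions, which the text does not state explicitly.

```python
import numpy as np

def evaluate(targets, predictions, start=0, stop=None):
    """Return (MSE, correlation coefficient) over samples start..stop."""
    t = np.asarray(targets, dtype=float)[start:stop]
    p = np.asarray(predictions, dtype=float)[start:stop]
    # Mean squared error between targets and predictions.
    mse = float(np.mean((t - p) ** 2))
    # ASSUMPTION: Pearson correlation coefficient.
    corr = float(np.corrcoef(t, p)[0, 1])
    return mse, corr
```

Restricting both soft sensors to the same `start` and `stop` indices ensures that their figures refer to identical on-line samples, as required for the comparison.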
Thermal oxidiser
The overall predictions of the thermal oxidiser soft sensor developed with minimal training data are shown in Figure 6.38. One can see that this soft sensor is also able to deal with the on-line data. However, unlike in the previous case, this soft sensor has problems at the beginning of the on-line phase, as shown in Figure 6.39. After these initial problems the model stabilises. This is confirmed by Table 6.9, which shows its performance over the last 1437 samples, i.e. the range equivalent to the on-line phase in the previous thermal oxidiser experiments. The table reveals that the full-scale soft sensor from Section 6.6.4 and the soft sensor trained with minimal training data achieve similar performance. This demonstrates the ability of the algorithm to start with only minimal training data and to continue to learn during the on-line phase.
Soft sensor type                     MSE            Corr. coef.
Full-scale ROSS                      9.91 × 10⁻⁴    0.64
Minimal training data soft sensor    8.77 × 10⁻⁴    0.69
Table 6.9: Thermal oxidiser: Comparing the minimal training data soft sensor with the full-scale soft sensor from Section 6.6.4
Figure 6.38: Thermal oxidiser: Predictions of a soft sensor developed using minimal training data (plot title: Predictions Robust On-line Soft Sensor; axes: Time vs. Target value; legend: Target, ROSS [0.000845, 0.7])
Figure 6.39: Thermal oxidiser: Minimal training data soft sensor predictions (detail view of the first 400 samples; axes: Time vs. Target value; legend: Target, ROSS)
Catalyst activation
The predictions of the catalyst activation soft sensor for the extended on-line phase are shown in Figure 6.40. The soft sensor seems to have a problem with the predictions of the first 200 samples. This is confirmed in Figure 6.41, which focuses on this part of the on-line phase. The reason is probably the extremely low number of data points available for the training, which in this experiment amounts to only 65 samples.
Figure 6.40: Catalyst activation: Predictions of a soft sensor developed using minimal training data (plot title: Predictions Robust On-line Soft Sensor; axes: Time vs. Target value; legend: Target, ROSS [0.0119, 0.92])
An interesting fact can be observed in Table 6.10 (and, to a lesser extent, in Table 6.9 and Table 6.8), which compares the performance of the soft sensor from Section 6.6.4, trained with the full training set, with the soft sensor presented in this section. It turns out that the soft sensor trained with only 10% of the samples achieves much better performance than the one trained with the full training data, i.e. 30% of the data. The reason for this paradoxical observation is probably that the on-line adaptation is more effective than the initial training process, even though only 25% of the target values are used for adaptation.
Soft sensor type                     MSE            Corr. coef.
Full-scale ROSS                      1.18 × 10⁻²    0.78
Minimal training data soft sensor    6.37 × 10⁻³    0.87
Table 6.10: Catalyst activation: Comparing the minimal training data soft sensor with the full-scale soft sensor from Section 6.6.4
Experiment conclusion
The experiments with the minimal training data have shown that the limitation can have a negative effect on the predictions for the early data points in the on-line phase. This issue is caused by the small amount of initial training data. However, all of the considered soft sensors managed to overcome the poor prediction performance for the early data points by means of on-line adaptation and, over comparable ranges, delivered similar or better performance than the soft sensors presented in Section 6.6.4, which use the full training data set.
Figure 6.41: Catalyst activation: Minimal training data soft sensor predictions (detail view of the first 200 samples; axes: Time vs. Target value; legend: Target, ROSS)
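The size of the differences reported in Tables 6.8 to 6.10 can be made explicit by expressing the MSE of the minimal training data soft sensor relative to that of the full-scale soft sensor. The short calculation below uses only the MSE values from the three tables; the helper name `relative_mse_reduction` is introduced here purely for illustration.

```python
# MSE values taken from Tables 6.8, 6.9 and 6.10:
# (full-scale ROSS, minimal training data soft sensor)
results = {
    "Industrial drier": (5.35e-3, 5.20e-3),
    "Thermal oxidiser": (9.91e-4, 8.77e-4),
    "Catalyst activation": (1.18e-2, 6.37e-3),
}

def relative_mse_reduction(full_mse, minimal_mse):
    """Fraction by which the minimal-data MSE is below the full-scale MSE."""
    return (full_mse - minimal_mse) / full_mse

reductions = {name: relative_mse_reduction(*mse) for name, mse in results.items()}

for name, reduction in reductions.items():
    print(f"{name}: MSE lower by {100 * reduction:.1f}%")
```

The MSE is lower by roughly 3% for the industrial drier, 12% for the thermal oxidiser and 46% for the catalyst activation, so only the last data set shows a large difference.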
6.7 Summary
The main contribution of this chapter is the demonstration of a practical instantiation of the abstract architecture presented in the previous chapter. The implementation, i.e. the complex soft sensing algorithm, shows that by following the structure of the architecture, a flexible, adaptive and robust algorithm for the development of soft sensors can be constructed. The core of the algorithm is formed by the adaptive local learning discussed in Chapter 4. The extended algorithm relies exclusively on established and proven machine learning principles such as cross-validation, boosting and ensemble methods. Although very complex, the algorithm is easily manageable and provides mechanisms that allow off-the-shelf deployment of soft sensors without any manual parameter or model selection.
Next, soft sensors for the three process industry data sets used in Chapter 4 were developed using the complex soft sensing algorithm. The experiments focused on aspects such as the influence of the number of local experts on the performance of the soft sensor. Another aspect that was analysed is the influence of the amount of feedback available for adaptation purposes. It turned out that for two out of the three data sets a small amount of feedback, i.e. target values, is already sufficient to maintain a stable performance level. In another set of experiments the ability of the algorithm to develop simple and transparent soft sensors was demonstrated. These soft sensors were locally valid convex combinations of a few linear models. Although the soft sensors delivered sub-optimal performance, the performance level was still acceptable and similar to that of the LWPR-based soft sensors. In a further experiment, the ability of the algorithm to develop soft sensors off-the-shelf, without any manual intervention from the user, was analysed. The conclusion of this experiment was that the algorithm is able to adapt the structure of the resulting soft sensor to the underlying data set, i.e. the modelling task. The final set of experiments examined the ability of the algorithm to develop adaptive soft sensors with a minimal amount of training data. Although two of the soft sensors had difficulties at the beginning of the on-line phase, they managed to improve their performance during the on-line phase by means of on-line adaptation.
Conclusions
7.1 Project summary
The primary target of this work was to define a concept for the development of next-generation soft sensors. The purpose of the concept is to help to overcome the major sources of frustration with practical implementations of soft sensors in the process industry.
In order to address this problem, it was first necessary to review the current state of the art in soft sensor development and application. In this work this task was approached from two different directions. On the one hand, a comprehensive review of a large number of academic articles dealing with practical implementations of soft sensors was carried out. On the other hand, valuable practical information was obtained from discussions and interviews with experienced soft sensor developers from Evonik Degussa GmbH. Both of these aspects are reflected in Chapter 2. Although there are many differences between these two viewpoints, one point on which both agree is that a large amount of a priori process knowledge is necessary to develop useful soft sensors. This knowledge has to be applied to data pre-processing, e.g. variable selection, and to model type and parameter selection, which makes this step time-consuming and costly. A particularly critical aspect that is neglected in the vast majority of publications is the issue of soft sensor maintenance. In most publications the developed soft sensor is applied to data that cover only a few months of the process operation, which is considered sufficient to demonstrate that the soft sensor performs well. However, in practical scenarios the soft sensors are required to operate over much longer periods of time, which is often prevented by effects like process dynamics or changing data quality. In fact, soft sensor maintenance can be much more expensive than soft sensor development because the model has to be re-trained or re-developed periodically. This is a very limiting factor, requiring a lot of attention from the soft sensor operator and prohibiting a wider spread of soft sensors in the process industry.
After this review, it was clear that the developed concept must not only facilitate quick and efficient development of soft sensors but must also pay at least the same attention to automating their maintenance in the form of self-adaptation.
The next step was the identification of machine learning concepts and techniques that could, theoretically, be useful for achieving the project goals. What became obvious at this stage was that the work had to be embedded in the concept of algorithm independent learning. This was supported by the previously discussed review, which has shown that there are many different methods, not only for modelling but also for data pre-processing, applied to soft sensor development. Each of these methods has its strengths and weaknesses and there is no single best method to which this work could be restricted. One of the first concepts that turned out to be potentially beneficial was local learning. By applying this technique it is possible to train models, called local experts, for different operational states of the processes. An inevitable challenge that emerges from this concept is the switching between or combining of the different local experts. In general, this goal can be approached using ensemble methods, which form the next class of techniques reviewed in Chapter 3. The effectiveness of ensemble methods is backed up by analytic and empirical evidence, which was another motivation to consider this concept. As it became apparent that the developed concept for soft sensor development would be rather complex, another technique that would handle its high-level aspects had to be found. In this context, meta-learning turned out to be very useful. Meta-learning techniques can be used, for example, for the extraction and transfer of high-level knowledge. All of the methods mentioned so far focus mainly on non-adaptive, off-line predictive modelling. To close this gap to on-line adaptation, concept drift detection and handling were reviewed. This research topic provides the theory as well as many practical techniques for dealing with adaptation aspects of predictive models. Another benefit is that it is largely independent of the underlying predictive methods and can easily be combined with techniques like local learning and ensemble methods, i.e. it fits well into the algorithm independent learning framework.
The local learning and ensemble methods, together with a specific concept drift handling method, were applied to develop the early-stage algorithm presented in Chapter 4. The goal of the algorithm was to show that these approaches are useful for soft sensor development and for dealing with the issues identified earlier in this work. The local experts applied in the algorithm were simple models consisting of local PCA pre-processing and linear regression techniques. They were trained on partitions of data that were segmented using a novel approach. The experimental evaluation was done using data sets from three different processes from the industrial partner of the project. Apart from the novel algorithm (Local learning-based Adaptive Soft Sensing Algorithm, LASSA), the applied methods included LWPR, which is another local learning-based technique, as well as traditional soft sensors based on ANN. The experiments have shown difficulties with applying ANN to the real-life data sets. It was shown that it is not only difficult to select correct parameters, but also that after the selection, the performance of the optimally parameterised ANNs can still vary significantly. This is mainly due to the random initialisation and local minima problems of the ANNs. Another important result from the experiments was the good manageability and strong performance of the local learning-based methods. A further particularly important feature of these algorithms was their modularity and the possibility to apply adaptation mechanisms to the algorithm. The LASSA adaptation mechanism was based on the modification of the combination weights of the local experts. This was shown to be effective for two of the three data sets. However, for the third one, namely the catalyst activation, this approach was not able to adapt the soft sensor adequately. This method failed because it did not support the deployment of new models (local experts) during the on-line phase and merely adapted the contributions of the existing local experts to the final prediction. The final conclusion of the experiments was that the local learning approach combined with ensemble learning is very useful for the development of adaptive soft sensors but that, in order to offer a truly adaptive soft sensor, more than this simple adaptation method had to be provided.
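The limitation of adapting only the combination weights can be illustrated with a small sketch. The update rule below (a multiplicative, exponentially weighted update with renormalisation) is a generic scheme chosen for illustration and is not the LASSA update rule; the function names, the learning rate and the numerical values are all hypothetical.

```python
import numpy as np

def combine(expert_predictions, weights):
    """Convex combination of the local experts' predictions."""
    return float(np.dot(weights, expert_predictions))

def adapt_weights(weights, expert_predictions, target, learning_rate=5.0):
    """Shift weight towards the experts with the smaller squared error.

    ASSUMPTION: generic multiplicative update, used here only to
    illustrate weight adaptation; not the rule used by LASSA.
    """
    errors = (np.asarray(expert_predictions) - target) ** 2
    updated = np.asarray(weights) * np.exp(-learning_rate * errors)
    # Renormalise so that the weights stay non-negative and sum to one.
    return updated / updated.sum()

# Two fixed local experts and a target from a process state that
# neither of them covers (hypothetical values).
expert_predictions = np.array([0.2, 0.4])
weights = np.array([0.5, 0.5])
target = 0.9

for _ in range(50):
    weights = adapt_weights(weights, expert_predictions, target)

prediction = combine(expert_predictions, weights)
print(weights, prediction)
```

Almost all weight moves to the better expert, yet the combined prediction cannot exceed the largest individual expert prediction (0.4 here), because a convex combination always stays within the range spanned by the experts. Reaching the target requires deploying a new local expert, which weight adaptation alone cannot do.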
All of the findings and experiences with soft sensor development were projected into the architecture for the development of soft sensing algorithms presented in Chapter 5. The machine learning techniques discussed in Chapter 3 were arranged into a three-level hierarchical structure where each of the levels represents a different type of information processing. The bottom level, where the actual prediction making units operate, represents the level of lowest complexity but largest diversity. At the next level, the predictions of the models, e.g. local experts, are combined and thus more complex predictors are built. Although the diversity at this level is much lower than at the bottom level, it is still present in the form of multiple combination schemes. At the top level, the complexity reaches its maximum and there is no more diversity present. From this level, the operations at the lower levels are managed. The adaptation loops of the architecture follow its three-level structure, and the implemented mechanisms can range from the adaptation of the models at the bottom level, through the adaptation of the combinations at the intermediate level, to the high-level adaptation where new combinations or local experts are deployed. The next important aspect of the architecture is the role of expert knowledge. In practical scenarios there is often some kind of expert knowledge about the underlying process available. In such cases it is crucial to provide a formal mechanism for the incorporation of this knowledge into the soft sensors. At the present stage the interface for the incorporation of expert knowledge is defined only theoretically.
The final step of this work was the presentation of a complex soft sensing algorithm that was developed by following the architecture. The core of this algorithm is the local learning algorithm presented earlier in this work. The extensions presented in Chapter 6 aimed at increased robustness and more effective adaptiveness of the resulting soft sensors. This is achieved by: (i) extending the available pre-processing and modelling techniques; (ii) building several local experts per receptive field; (iii) providing advanced selection and pruning of the local experts; (iv) using advanced data management based on two-fold cross-validation; and (v) providing several adaptation mechanisms. Special attention is paid to the adaptation mechanisms. At least one mechanism is implemented at each of the hierarchy levels, providing the algorithm with the ability to follow the changes of the on-line data as well as the ability to adapt the algorithm to the current task. The subsequent experiments focused on the following aspects of the complex soft sensing algorithm: (i) the role of the local expert population setting and its influence on the performance of the soft sensors; (ii) the behaviour of the algorithm with a varying percentage of target data available for adaptation; and (iii) the fulfilment of the project goals. Consequently, it was shown that the algorithm’s adaptation mechanisms were successful in updating the models. Another experiment demonstrated that despite the large number of input parameters the algorithm