The definition of Emergent Software Systems leaves open a range of possibilities for implementing the learning process that supports system self-composition and optimisation at runtime. A key property of the learning process of emergent systems is its ability to experiment with the live system and learn about the operating environment and the available components as the system executes. This section focuses on the challenges of implementing an online learning strategy, drawing on our experience of supporting software composition at runtime.
The reinforcement learning approach explored in this thesis consists of two phases: exploration and exploitation. In the exploration phase, the system selects an architectural composition, then waits for a period of time (the observation window) while the chosen composition executes. At the end of the observation window, the system collects the generated metrics and events, which allow it to understand how well that composition performed under a specific operating condition. This process is repeated until all compositions have been tried at runtime. At the end of the experimentation cycle, the system compares the performance of the tried compositions and selects the best-performing architecture for the perceived operating environment. Once the system has experimented and located the appropriate software assembly for a specific operating environment, it stores that information in its knowledge base, so that
3.2. Emergent Software Systems Challenges 56
[Figure 3.1, left panel: signal (event stream) level over time, divided into observation windows a to i. Right panel: data collected per window, reproduced below.]
architecture   environment           performance
a              {n.3, q.2, t.9}       100
b              {n.14, q.4, t.1}      120
c              {n.26, q.9, t.0}      60
d              {n.26, q.7, t.3}      170
e              {n.24, q.0, t.5}      34
f              {n.26, q.4, t.5}      82
Figure 3.1: The emergent software online learning problem. The way the environment changes over time is illustrated on the left. On the right, the data collected at the end of each observation window is illustrated.
if the system encounters this operating condition again, it automatically switches to the architectural composition it previously learned, without repeating the exploration process. In the exploitation phase, the system keeps the located optimal composition running whilst observing the operating environment. If new conditions emerge, the system returns to the exploration phase.
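The exploration and exploitation phases described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the names `explore`, `select_composition` and `run_window` are hypothetical, the environment is assumed to stay stable during one exploration cycle, and a higher performance value is assumed to be better (the text does not say which direction applies).

```python
from typing import Callable, Dict, Hashable, List

# Hypothetical callback: runs one composition for a full observation
# window and returns the performance measured at the end of that window.
RunWindow = Callable[[str], float]


def explore(compositions: List[str], run_window: RunWindow) -> str:
    """Try every composition for one observation window; return the best.

    Assumes the operating environment does not change during the cycle
    and that a higher performance value is better.
    """
    results: Dict[str, float] = {c: run_window(c) for c in compositions}
    return max(results, key=results.get)


def select_composition(
    environment: Hashable,
    knowledge_base: Dict[Hashable, str],
    compositions: List[str],
    run_window: RunWindow,
) -> str:
    """Reuse the learned composition for a known environment (exploitation);
    otherwise explore once and store the result in the knowledge base."""
    if environment not in knowledge_base:
        knowledge_base[environment] = explore(compositions, run_window)
    return knowledge_base[environment]
```

Calling `select_composition` a second time with the same environment key returns the stored composition without running any further observation windows, which mirrors the return to exploitation once a condition has been learned.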
Fig. 3.1 illustrates the learning problem in its abstract form. The left side of the figure shows the environment representation (event stream) to which the system is subjected. These events represent the system's perception of the environment, which is generally outside the system's control. The graph is built from a collection of events periodically generated by the components that make up the running architectural composition. The right side of Fig. 3.1 shows the data collected from the components, both events and metrics. In this example, the system changes its architecture in each successive observation window. This behaviour is common during the exploration phase of the online reinforcement learning approach. The example raises a number of challenges:
• Comparison difficulty: Comparing different architectural compositions is an essential part of the online learning approach. This comparison is difficult because the operating environment is dynamic and might change its conditions at any point during the exploration process. For instance, a change in the operating environment between successive observation windows makes the compositions tried before the change incomparable with the architectural compositions tried after it. This is illustrated on the right side of Fig. 3.1 by architectures (a), (b) and (c), which execute under different operating conditions from architectures (d), (e) and (f). Scenarios with a constantly changing operating environment pose a great challenge to online learning and are an interesting avenue for future work.
• Mid-window changes: Changes in the operating environment can also happen during an observation window. This is illustrated in architecture (g), where the graph representing the environment changes to an upward slope. Such a change is particularly difficult to detect, depending on the size of the observation window and the type of events collected. A poorly chosen observation window size or the absence of certain events may hide the transition from the system. This hinders the system's ability to compare compositions properly, reducing the quality of the learning outcome.
• Self-referentiality: The operating environment is perceived through the components that compose a given architecture. This might create a distorted perception of the environment. For example, in architecture (e) there is an apparent dip in the environment graph. If the environment is represented by the number of requests made to the system, it looks as though the system received fewer requests at that point than under architectures (d) and (f). This can happen if architecture (e) is slower than architectures (d) and (f) and therefore handles fewer requests in the same period of time. The perceived environment condition is often influenced by the executing architecture. This concept is further discussed in Sec. 3.2.2 when describing the 'Dynamic Fitness Landscape'.
• Observing the past: The system's perception, whether of its performance or of its operating environment, always reflects the past, because metrics and events are only collected at the end of the observation window (i.e. after the system has already processed them). The online learning approach is a reactive strategy: it makes decisions assuming that the system will continue to behave as it recently did. A predictive strategy, on the other hand, might consider the general trend of the operating condition, making decisions based on the direction the environment is taking rather than on the information it has just collected. This thesis explores a reactive approach, and leaves exploring both approaches in parallel as future work.
• Hidden trends: The choice of which aspects to use to characterise the environment is important for obtaining a clear picture of the environment and of how it maps onto the perceived performance of the system. An example is a system that captures only the volume of data it is handling but not the type of data. In that case, a large increase in the volume of one type of data might affect system performance less than a small increase in the volume of another type. Ignoring the data type in this situation prevents the system from establishing a precise correlation between the environment and its performance, making the learning process more challenging.
• Multi-dimensionality: The graph in Fig. 3.1 represents only one dimension of the operating environment. In reality, a proper characterisation of an operating environment considers multiple dimensions, requiring the system to report multiple event types covering different aspects of the environment. This makes all the previously described challenges multi-dimensional in nature, increasing the difficulty of implementing the learning process.
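The comparison and mid-window challenges in the list above can be made concrete with a small sketch. It is only an illustration under simplifying assumptions, not the method used in this thesis: the environment is reduced to a single numeric dimension, a mid-window change is flagged when the mean level of the two halves of a window differs by more than a hypothetical `tolerance`, and windows count as comparable when their mean level falls into the same bucket of a hypothetical `bucket_width`. The `Window` record and both function names are invented for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Window:
    composition: str
    samples: List[float]  # environment event levels seen during the window
    performance: float


def changed_mid_window(samples: List[float], tolerance: float) -> bool:
    """Flag a window whose two halves differ in mean level by more than tolerance."""
    half = len(samples) // 2
    if half == 0:
        return False  # too few samples to split the window
    return abs(mean(samples[:half]) - mean(samples[half:])) > tolerance


def comparable_groups(
    windows: List[Window], bucket_width: float, tolerance: float
) -> Dict[int, List[Window]]:
    """Group windows by environment bucket, skipping unstable windows.

    Only compositions within the same group should be compared.
    """
    groups: Dict[int, List[Window]] = {}
    for w in windows:
        if not w.samples or changed_mid_window(w.samples, tolerance):
            continue
        bucket = int(mean(w.samples) // bucket_width)
        groups.setdefault(bucket, []).append(w)
    return groups
```

A filter of this kind cannot address self-referentiality or hidden trends, since it only sees the events that the running composition chooses to report.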
In summary, the reinforcement learning approach has two main tasks. The first task is the proper characterisation of the operating environment, so that the architectural compositions tried can be compared in equivalent environments. This enables the system to establish correlations between architecture features (e.g. the presence of a certain component) and characteristics of the operating conditions (e.g. the volume level of a specific input type). This task is key to ensuring that the system is able to learn which architectural composition is most suitable for a specific operating environment. The second task is to ensure that the learning approach balances the trade-off between exploration and exploitation. Exploration is important for finding the optimal composition when the operating environment changes or when new components are added. Exploration is also important for increasing the system's knowledge about the operating environments and components, and possibly for finding better architectural assemblies. At the same time, it is desirable that the system exploits the optimal architectures as much as possible. This balance is essential because the reinforcement learning happens in the 'live system', with real consequences when the system operates sub-optimally during exploration. This is a reactive learning approach, where the system learns when new operating conditions emerge. In parallel, the system could use a predictive approach, with an offline algorithm running separately and analysing all the information gathered by the reinforcement learning process.
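One generic way to express the exploration and exploitation trade-off discussed above is an epsilon-greedy rule. It is shown here only as an illustration of the balance; it is a standard reinforcement learning technique and is not claimed to be the strategy used in this thesis. The function name, the `estimates` mapping and the assumption that a higher estimate is better are all hypothetical.

```python
import random
from typing import Dict


def epsilon_greedy(
    estimates: Dict[str, float], epsilon: float, rng: random.Random
) -> str:
    """Pick a composition for the next observation window.

    With probability epsilon, explore by choosing a composition uniformly
    at random; otherwise exploit the composition with the best current
    performance estimate (assumed here: higher is better).
    """
    if rng.random() < epsilon:
        return rng.choice(sorted(estimates))
    return max(estimates, key=estimates.get)
```

A small `epsilon` keeps the live system on its best-known composition most of the time, at the cost of reacting more slowly to new components or changed conditions.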