A Bayesian statistical model can summarise the information contained in a dataset and allow us to perform inference on these summarised (learned) statistics within a probabilistic framework. A BN offers a way to specify a probabilistic model over a collection of variables without necessarily involving data [149]. In essence, a Bayesian statistical model can be described as a BN: the marginal distribution of the parameter node is the prior; the CPT of the data node given the parameter is the likelihood function; and the posterior distribution is the conditional distribution of the parameter given the observed data. In the rest of the thesis, we use the term ‘hybrid BN models’ to encompass both types of models to avoid confusion.
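To make this correspondence concrete, the following minimal sketch treats a discretised parameter theta as a parent node whose marginal distribution is the prior, and a binomial observation node as its child, whose CPT plays the role of the likelihood. The grid of theta values, the uniform prior and the observed counts are illustrative assumptions, not values used in this thesis.

```python
from math import comb

# Parent node: a discretised parameter theta with its prior (marginal) distribution.
# The grid and the uniform prior are illustrative assumptions.
THETA_VALUES = [0.1, 0.5, 0.9]
PRIOR = {0.1: 1 / 3, 0.5: 1 / 3, 0.9: 1 / 3}


def likelihood(k: int, n: int, theta: float) -> float:
    """CPT entry of the data node: P(k occurrences in n trials | theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def posterior(k: int, n: int) -> dict[float, float]:
    """Conditional distribution of the parameter node given the observed data node."""
    unnormalised = {t: PRIOR[t] * likelihood(k, n, t) for t in THETA_VALUES}
    evidence = sum(unnormalised.values())  # P(data), obtained by marginalising theta
    return {t: p / evidence for t, p in unnormalised.items()}


# Observing 8 occurrences in 10 trials shifts the posterior mass towards theta = 0.9.
post = posterior(k=8, n=10)
```

The posterior is obtained purely by BN-style reasoning: multiply the prior by the CPT entry for the observed data and normalise by the marginal probability of that data.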
The process of constructing a hybrid BN model structure can be demanding, especially for large and complex networks (see Sections 2.3.2 and 2.3.3). Mahoney and Laskey [105] highlighted the difficulty of using BN models for large and complex systems, and a network becomes difficult to understand when too many variables are presented in a single model [45, 26].
Often, much of a model's complexity comes from many similar, repeated fragments that share the same causal (or influential) structure and variables but have different entries. Object-Oriented (OO) design, in which each object contains a fixed set of data structures together with methods that interact with those data [57], provides a suitable framework for designing models with many repeated structures. Early work by Laskey and Mahoney [90] applied the idea of OO to organise a BN into parts, each a semantically meaningful unit called a fragment that consists of a set of variables and the corresponding edges. Laskey and Mahoney [91] then built a prototype by arranging these fragments with a systems engineering approach that follows a spiral life-cycle model. Neil et al. [124] developed similar work: inspired by Pearl's [136] idea of assembling causal structures from a stock of building blocks, they proposed a set of so-called idioms to serve as these reusable building blocks in a BN. These idioms represent typical types of uncertain reasoning and can be used to construct BNs for specific cases. Adopting the concept of OO, each idiom was treated as an object, so that a complex model can be built from idioms using a few simple combination rules. A software safety case was also studied using these idioms, demonstrating their capability for formulating a large-scale BN. As recurring patterns, the idioms provide an efficient and consistent knowledge-based guideline that helps practitioners better understand both the problem domain and Bayesian modelling principles.
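The idea of assembling a network from reusable fragments can be sketched as follows: each fragment is a small set of directed edges between named variables, fragments that mention the same variable name are joined on that variable, and the composed graph must remain acyclic to be a valid BN structure. The `Fragment` type, the join-by-name rule and the fragment names are hypothetical illustrations, not the specific combination rules of Laskey and Mahoney [90] or Neil et al. [124].

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Fragment:
    """A reusable BN fragment (hypothetical type): a named set of directed edges."""

    name: str
    edges: set[tuple[str, str]] = field(default_factory=set)

    @property
    def variables(self) -> set[str]:
        return {v for edge in self.edges for v in edge}


def compose(*fragments: Fragment) -> dict[str, set[str]]:
    """Join fragments on shared variable names; return each variable's parent set."""
    parents: dict[str, set[str]] = {}
    for frag in fragments:
        for v in frag.variables:
            parents.setdefault(v, set())
        for parent, child in frag.edges:
            parents[child].add(parent)
    # The composed structure must still be a DAG to be a valid BN.
    try:
        list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        raise ValueError(f"composed fragments contain a cycle: {err}") from err
    return parents


# Hypothetical fragments that share the variable "Component Failure".
deterioration = Fragment("deterioration", {("Age", "Component Failure")})
inspection = Fragment("inspection", {("Component Failure", "Inspection Result")})
network = compose(deterioration, inspection)
```

Joining on variable names keeps each fragment independently reusable; the acyclicity check catches combinations that would not form a BN.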
At the same time, Koller and Pfeffer [82] proposed extending the concept of OO to structure BN parts as reusable objects and to instantiate these objects to form a complete (ground) model. This type of BN is called an Object-Oriented Bayesian Network (OOBN). An object in an OOBN has attributes, which can be random variables. An attribute can be private, accessible only inside the object, or it can belong to the object's external interface: it can be an input, or an output of the object that becomes an input of another object (encapsulation). An object can thus be viewed as a stochastic function that outputs a probability distribution for each value of its inputs. OOBNs also include the concept of a class: a model can contain multiple instances of any class, each described by the same probabilistic model. OOBNs therefore give us a framework for building a generalised probabilistic model that can be reused in different contexts (abstraction). Inheritance is also supported in an OOBN: a subclass inherits the stochastic function of its parent class and may modify some of its attributes or add new ones. These features enable us to instantiate a large BN in a well-defined way.
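A minimal sketch of these ideas, assuming binary variables and a single input: the class exposes an input (`overloaded`) and an output distribution, keeps a private attribute (an age distribution) that is marginalised internally, and a subclass inherits the stochastic function while overriding one attribute. The class names, the Python underscore convention for "private" and all probabilities are hypothetical.

```python
class ComponentClass:
    """OOBN-style class (hypothetical): a stochastic function from input to output."""

    # Private attribute: distribution of the internal Age variable (hypothetical values).
    _age_prior = {"new": 0.6, "old": 0.4}

    # Private CPT: P(Failure = True | Age, Overloaded) (hypothetical values).
    _failure_cpt = {
        ("new", False): 0.01, ("new", True): 0.05,
        ("old", False): 0.05, ("old", True): 0.20,
    }

    def output(self, overloaded: bool) -> dict[bool, float]:
        """Return P(Failure | Overloaded = overloaded), marginalising the private Age."""
        p_fail = sum(
            p_age * self._failure_cpt[(age, overloaded)]
            for age, p_age in self._age_prior.items()
        )
        return {True: p_fail, False: 1.0 - p_fail}


class AgedComponentClass(ComponentClass):
    """Subclass: inherits the stochastic function but modifies the Age attribute."""

    _age_prior = {"new": 0.1, "old": 0.9}


base = ComponentClass().output(overloaded=True)
aged = AgedComponentClass().output(overloaded=True)
```

Callers only see the input and the output distribution; the subclass reuses `output` unchanged and alters only the attribute it overrides.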
Assume that bridge failure is determined by the failure of its components Deck, Support 1 and Support 2, and that component failure is determined by the age of the component and by whether the bridge is overloaded (a factor). We can model each component's failure using the same probabilistic model, with an internal parent variable Age and an external parent Factor, which represents whether the bridge is overloaded and is shared between all components; all Component Failure variables are then aggregated to determine the state of Bridge Failure. This example is presented in Figure 3.7 with an object class Component, which encodes the same probabilistic model for each component and is reused three times. However, as pointed out by Pfeffer et al. [140], an OOBN can only represent a fixed set of related objects. In Figure 3.7, each Component Failure variable has exactly two parents: its age and the factor shared between all components. In maintenance modelling problems, however, the number of objects and the relationships between them can vary: for example, a factor may affect only some of the components rather than all of them.
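The bridge example can be sketched by instantiating one `Component` class three times. Each instance has its own private Age distribution, while the class-level CPT for Component Failure given Age and the shared external Factor (overloaded) is the same for every instance. As an assumed aggregation, Bridge Failure occurs if any component fails; because each instance has its own Age, the component failures are conditionally independent given Factor. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Component:
    """One instance of the Component class: internal parent Age, external parent Factor."""

    name: str
    p_old: float  # private attribute: P(Age = old), hypothetical

    # Class-level CPT shared by every instance: P(Failure | Age, Factor overloaded).
    CPT = {("new", False): 0.01, ("new", True): 0.05,
           ("old", False): 0.05, ("old", True): 0.20}

    def p_failure(self, overloaded: bool) -> float:
        """P(Component Failure | Factor) with this instance's Age marginalised out."""
        return ((1 - self.p_old) * self.CPT[("new", overloaded)]
                + self.p_old * self.CPT[("old", overloaded)])


def p_bridge_failure(components: list[Component], overloaded: bool) -> float:
    """Bridge fails if any component fails (an assumed OR aggregation)."""
    # Given the shared Factor, each failure depends only on its own Age,
    # so the failures are conditionally independent.
    return 1 - prod(1 - c.p_failure(overloaded) for c in components)


# The same class reused three times, as in the Deck / Support 1 / Support 2 example.
bridge = [Component("Deck", p_old=0.3),
          Component("Support 1", p_old=0.5),
          Component("Support 2", p_old=0.5)]
```

Note that every instance is wired to the same Factor in exactly the same way, which is precisely the fixed-structure limitation discussed above.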
This challenge can be resolved using the Probabilistic Relational Model (PRM) formalism developed by Koller [81], which extends the concept of OO in probabilistic graphical models (i.e. BNs) to handle uncertain relations. A PRM combines probabilistic dependencies with a relational schema that describes the entities in the problem domain. The uncertainties represented in the model can include attribute uncertainty, structural uncertainty and class uncertainty. The OO paradigm also brings advantages for inference: we can perform inference on the specific compiled parts of the model that are queried, rather than on the joint distribution of the whole model, some of whose variables may be irrelevant to a particular query [140, 53, 114].
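One standard way to restrict inference to the part of a model that a query needs is barren-node pruning: variables that are neither query or evidence variables nor ancestors of them can be removed without changing the posterior. The sketch below computes this relevant set from a parent-set representation; it illustrates the general principle only, not the specific inference machinery of the cited PRM work, and the node names are hypothetical.

```python
def relevant_nodes(parents: dict[str, set[str]],
                   query: set[str], evidence: set[str]) -> set[str]:
    """Return the query and evidence variables together with all of their ancestors."""
    keep: set[str] = set()
    stack = list(query | evidence)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))
    return keep


# Hypothetical ground model: parent set of every variable.
model = {
    "Overloaded": set(),
    "Deck Age": set(),
    "Deck Failure": {"Deck Age", "Overloaded"},
    "Support Age": set(),
    "Support Failure": {"Support Age", "Overloaded"},
    "Bridge Failure": {"Deck Failure", "Support Failure"},
}

# Querying Deck Failure given Overloaded only needs three of the six variables.
needed = relevant_nodes(model, query={"Deck Failure"}, evidence={"Overloaded"})
```

Here the support variables and Bridge Failure are never touched, so inference only has to be compiled over the deck part of the model.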
Figure 3.8 A PRM example about bridge failure: (a) probabilistic dependencies; (b) a relational schema; (c) the ground BN.
In addition to the assumptions made in the previous example, suppose that coastal proximity also affects component failure, but only for components made of metal [187]. This example is difficult to build with an OOBN: one would either have to develop two object classes, one for metal components and one for the others, or create a dummy variable in the Component object that indicates whether the component is made of metal and use it to adjust the component failure function. A PRM instead models this problem elegantly in two parts, as shown in Figure 3.8: (a) probabilistic dependencies, where each object class encodes a probabilistic model with the same variables and dependencies for all of its objects; and (b) a relational schema, which encodes the relational structure of the problem and whose instantiation is a relational database.
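A relational schema of this kind can be sketched as three tables in an in-memory SQLite database: Bridge, Component (with a foreign key to Bridge and a material column) and Feature (the factors attached to each bridge). The column names, identifiers and example rows are assumptions based on the description in this section, not the exact schema of Figure 3.8 (b).

```python
import sqlite3

# Hypothetical schema mirroring the Bridge / Component / Feature tables.
SCHEMA = """
CREATE TABLE Bridge (
    bridge_id TEXT PRIMARY KEY,
    loading TEXT,            -- observation for Factor overloaded
    coastal_proximity TEXT   -- observation for Factor near-coastal
);
CREATE TABLE Component (
    component_id TEXT PRIMARY KEY,
    bridge_id TEXT REFERENCES Bridge(bridge_id),
    material TEXT
);
CREATE TABLE Feature (
    feature_id TEXT PRIMARY KEY,
    bridge_id TEXT REFERENCES Bridge(bridge_id),
    factor TEXT
);
"""


def build_database() -> sqlite3.Connection:
    """Instantiate the schema as a relational database holding Bridge 1 (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Bridge VALUES ('Bridge 1', 'overloaded', 'near')")
    conn.executemany("INSERT INTO Component VALUES (?, ?, ?)", [
        ("Deck", "Bridge 1", "concrete"),
        ("Support 1", "Bridge 1", "metal"),
        ("Support 2", "Bridge 1", "metal"),
    ])
    conn.executemany("INSERT INTO Feature VALUES (?, ?, ?)", [
        ("F1", "Bridge 1", "overloaded"),
        ("F2", "Bridge 1", "near-coastal"),
    ])
    return conn
```

The probabilistic model for Component stays the same for every row; only the database contents decide how many components exist and which materials they have.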
When instantiating the variables related to Bridge 1, the PRM first queries the table Component in the relational database to find the components whose foreign key is Bridge 1: Deck, Support 1 and Support 2. Following the same principle, it queries the table Feature and finds two factors: whether the bridge is overloaded and whether it is near the coast. After the variables are instantiated, the next step is to instantiate the dependencies. The relationship between Factor overloaded and component failure depends on the bridge; therefore, these dependencies are instantiated once Bridge 1 is instantiated, and the loading value from the table Bridge gives the observation on Factor overloaded. The relationship between Factor near-coastal and component failure depends on the component's material; therefore, the table Component is checked to see whether each component is made of metal. If it is, the dependency is instantiated, and the observation on Factor near-coastal is given by the coastal proximity value from the table Bridge. The Component Failure variables are then aggregated (e.g. by mean, maximum or minimum) to determine the probability of Bridge Failure.
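The two instantiation steps for Bridge 1 can be sketched as a plain function over in-memory tables (lists of rows). Variables are created from the Component and Feature rows whose foreign key is the bridge; a dependency on Factor overloaded is added for every component of the bridge, whereas a dependency on Factor near-coastal is added only for metal components, with observations read from the Bridge row. The table layout, column names and values are hypothetical stand-ins for the schema in Figure 3.8 (b).

```python
# Hypothetical in-memory tables mirroring the relational database.
BRIDGE = [{"bridge_id": "Bridge 1", "loading": "overloaded", "coastal_proximity": "near"}]
COMPONENT = [
    {"component_id": "Deck", "bridge_id": "Bridge 1", "material": "concrete"},
    {"component_id": "Support 1", "bridge_id": "Bridge 1", "material": "metal"},
    {"component_id": "Support 2", "bridge_id": "Bridge 1", "material": "metal"},
]
FEATURE = [
    {"bridge_id": "Bridge 1", "factor": "overloaded"},
    {"bridge_id": "Bridge 1", "factor": "near-coastal"},
]


def instantiate(bridge_id: str) -> tuple[dict[str, set[str]], dict[str, str]]:
    """Return (parent sets of non-root nodes, observations) of the ground BN for one bridge."""
    bridge = next(b for b in BRIDGE if b["bridge_id"] == bridge_id)
    # Step 1: instantiate variables from the rows whose foreign key is this bridge.
    components = [c for c in COMPONENT if c["bridge_id"] == bridge_id]
    factors = {f["factor"] for f in FEATURE if f["bridge_id"] == bridge_id}

    parents: dict[str, set[str]] = {}
    observations: dict[str, str] = {}
    for c in components:
        node = f"{c['component_id']} Failure"
        parents[node] = {f"{c['component_id']} Age"}
        # Step 2: instantiate dependencies.
        # Factor overloaded depends on the bridge, so every component of it gets the edge.
        if "overloaded" in factors:
            parents[node].add("Factor overloaded")
            observations["Factor overloaded"] = bridge["loading"]
        # Factor near-coastal depends on the material, so only metal components get it.
        if "near-coastal" in factors and c["material"] == "metal":
            parents[node].add("Factor near-coastal")
            observations["Factor near-coastal"] = bridge["coastal_proximity"]
    # The component failures are then aggregated into Bridge Failure.
    parents["Bridge Failure"] = {f"{c['component_id']} Failure" for c in components}
    return parents, observations


ground_parents, ground_obs = instantiate("Bridge 1")
```

The same Component model is reused for all three rows; only the material column decides whether the near-coastal edge is created.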
Figure 3.8 (c) presents the BN instantiated by following these procedures. Because all three components belong to Bridge 1, they all depend on Factor overloaded. Because the deck is made of concrete, its relationship with Factor near-coastal is not instantiated, whereas both supports are made of metal, so they both depend on Factor near-coastal. This example shows how a PRM can instantiate non-fixed dependencies using the same object class. In Chapter 7 we explain these procedures in detail, including how to instantiate a relational schema into a relational database and how to instantiate a BN from the database and the probabilistic dependencies.
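Once the ground BN is instantiated and the factors are observed, and assuming the Age variables are independent, the component failures are conditionally independent given the factors. The distribution of an aggregate of the binary Component Failure states (maximum, minimum or mean) can then be obtained by enumerating their joint states. In the sketch below, the per-component failure probabilities are hypothetical; the maximum corresponds to "the bridge fails if any component fails", the minimum to "only if all components fail", and the mean to the fraction of failed components.

```python
from collections.abc import Callable
from itertools import product


def aggregate_distribution(
    p_fail: dict[str, float], agg: Callable[[tuple[int, ...]], float]
) -> dict[float, float]:
    """Distribution of agg(failure states) for independent binary component failures."""
    names = list(p_fail)
    dist: dict[float, float] = {}
    # Enumerate every joint state of the Component Failure variables (0 = ok, 1 = failed).
    for states in product((0, 1), repeat=len(names)):
        prob = 1.0
        for name, state in zip(names, states):
            prob *= p_fail[name] if state else 1.0 - p_fail[name]
        value = agg(states)
        dist[value] = dist.get(value, 0.0) + prob
    return dist


# Hypothetical P(Component Failure = 1 | observed factors) in the ground BN of Bridge 1.
p_fail = {"Deck": 0.1, "Support 1": 0.2, "Support 2": 0.2}

p_any = aggregate_distribution(p_fail, max)[1]  # maximum: fails if any component fails
p_all = aggregate_distribution(p_fail, min)[1]  # minimum: fails only if all fail
fraction = aggregate_distribution(p_fail, lambda s: sum(s) / len(s))  # mean: failed fraction
```

With these numbers, up to floating-point rounding, `p_any` equals 1 − 0.9 × 0.8 × 0.8 = 0.424 and `p_all` equals 0.1 × 0.2 × 0.2 = 0.004. Enumeration grows as 2^n in the number of components, which is acceptable for a handful of components per bridge.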
The PRM separates the probabilistic models from the structural relationships, giving clear semantics for describing complex problems and an effective inference structure for the underlying models. This modelling framework is adopted in this thesis: a number of generic BN models are developed in Chapter 4 to serve as reusable probabilistic models, which later form the model library in Chapter 7. To fulfil Objective VI, Chapter 7 also shows how to organise these models to create a maintenance model applicable to a particular circumstance according to its own structural relations.