
Scaling-up performance of RBF-NF models: results are shown for the three modelling approaches applied to the prediction of survival in bladder cancer; Section 4.4: Analysis of predictive performance; and Section 4.5: Summary.

4.2 Methodology

The methodology is organised in three incremental parts: an FCM-based RBF-NF modelling approach is presented first, then enhanced with weighted clustering, and finally extended with a cluster-validity approach.

4.2.1 FCM and RBF-NF function model

The data-mining workflow begins with a data pre-processing step, in which the data are normalised and a Student's t-test is then used to eliminate genes that are easily identified as irrelevant to the process. Next, Fuzzy C-Means (FCM) clustering is applied to create the initial rule-base. This rule-base is then ‘translated’ into a Radial-Basis-Function Neural-Fuzzy (RBF-NF) structure (one multi-dimensional cluster corresponds to one Fuzzy Logic rule), and the modelling structure is finally parametrically optimised via the Levenberg-Marquardt function-minimisation algorithm [144].
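The pre-processing step described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the min-max scaling, the pooled-variance form of the t statistic, the use of the survival outcome as the grouping variable, the threshold on |t| and all function names are assumptions made for the example.

```python
import math
from statistics import mean, variance


def min_max_normalise(column):
    """Scale one gene's values to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def t_statistic(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    if pooled == 0.0:
        return 0.0
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def select_genes(data, labels, t_threshold=2.0):
    """Keep the indices of genes whose |t| between the two outcome groups
    exceeds t_threshold (hypothetical cut-off, chosen for illustration)."""
    keep = []
    for g in range(len(data[0])):
        column = min_max_normalise([row[g] for row in data])
        a = [v for v, y in zip(column, labels) if y == 0]
        b = [v for v, y in zip(column, labels) if y == 1]
        if abs(t_statistic(a, b)) > t_threshold:
            keep.append(g)
    return keep


# Toy data: 6 patients x 3 genes. Only gene 0 differs between the groups.
toy_data = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.1],
    [0.9, 4.0, 1.9],
    [3.0, 4.5, 2.0],
    [3.1, 3.5, 2.2],
    [2.9, 4.0, 1.8],
]
toy_labels = [0, 0, 0, 1, 1, 1]
selected = select_genes(toy_data, toy_labels)
```

In practice the cut-off would more likely be set on a p-value than on the raw statistic; the raw statistic is used here only to keep the sketch free of external dependencies.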

As in the preceding chapters, the data to be analysed comprise all the patients and genes plus the survival outcome. Another characteristic of the weighted Fuzzy C-Means is that the number of clusters (rules) must be specified in advance; to keep the computational complexity low, the number of clusters is fixed at five. Based on the research presented in Chapter 3, five rules offer a good balance between performance and model simplicity in this case study.

4.2.2 WFCM and RBF-NF function model

FCM algorithms consider each object equally important in the cluster solution. For that reason, when FCM is applied to a large number of inputs (more than a thousand), the rule-base loses clarity because of the high-dimensional space, and the membership degrees become very small. The challenge that arises is that the FCM clusters are the initial conditions for the RBF Neural-Fuzzy model and, because of their poor quality, the optimisation algorithm fails. Applying Weighted FCM defines the relative importance of each object to the clustering solution. The weighting factor is applied to the output of the data to improve the membership degree of each cluster, which in turn improves the quality of the initial membership functions of the RBF Neural-Fuzzy model. The second contribution presented in this Chapter (Figure 4.1) consists of applying a Weighted Fuzzy C-Means clustering algorithm to create the initial rule-base and applying the rule-base directly to the RBF Neural-Fuzzy model. The rule-base is then ‘translated’ into a Radial-Basis-Function Neural-Fuzzy structure and is parametrically optimised via the Levenberg-Marquardt function-minimisation algorithm [144].

Chapter 4: Scaling-up of RBF models in bladder cancer prediction 68

The weighted FCM (WFCM) is based on the minimisation of the following objective function:

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m}\, w_i\, \lVert x_i - c_j \rVert^{2}, \qquad 1 \le m < \infty \tag{4.1}$$

where $m$ is any real number greater than 1, $u_{ij}$ is the membership degree of $x_i$ in cluster $j$, $x_i$ is the measured data, $c_j$ is the centre of the cluster, and $w_i$ is a weighting factor applied to the output of the data and is equal to the number of inputs.

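The membership $u_{ij}$ and the cluster centres $c_j$ are calculated by:

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}, \qquad c_j = \frac{\sum_{i=1}^{N} w_i\, u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} w_i\, u_{ij}^{m}} \tag{4.2}$$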

Each sample is assigned a membership ($u_{ij}$) in every cluster; a higher membership translates into a higher degree of similarity between the sample and the cluster. Each derived information granule (data cluster) depicts a process rule in the Fuzzy Logic domain. The weighted FCM is similar to the one proposed in [155, 156]; however, the novelty of the present work is that the weighting factor changes in relation to the number of genes used by the model.
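The alternating updates of Equations (4.1) and (4.2) can be sketched as below. This is an illustrative sketch under stated assumptions rather than the code used in this work: the evenly spaced choice of initial centres, the stopping tolerance, the small `eps` guard and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def update_memberships(data, centres, m):
    """Eq. (4.2), left: u_ij = 1 / sum_k (||x_i - c_j|| / ||x_i - c_k||)^(2/(m-1))."""
    eps = 1e-12  # assumption: avoids division by zero when a sample sits on a centre
    power = 1.0 / (m - 1.0)  # distances are squared, so 2/(m-1) becomes 1/(m-1)
    u = []
    for x in data:
        d2 = [sq_dist(x, c) + eps for c in centres]
        u.append([1.0 / sum((d2[j] / d2[k]) ** power for k in range(len(centres)))
                  for j in range(len(centres))])
    return u


def update_centres(data, u, weights, m, n_clusters):
    """Eq. (4.2), right: mean of the samples weighted by w_i * u_ij^m."""
    dim = len(data[0])
    centres = []
    for j in range(n_clusters):
        coeff = [w * (row[j] ** m) for w, row in zip(weights, u)]
        total = sum(coeff)
        centres.append([sum(c * x[d] for c, x in zip(coeff, data)) / total
                        for d in range(dim)])
    return centres


def objective(data, centres, u, weights, m):
    """Eq. (4.1): J_m = sum_i sum_j u_ij^m * w_i * ||x_i - c_j||^2."""
    return sum(w * (u_i[j] ** m) * sq_dist(x, c)
               for x, u_i, w in zip(data, u, weights)
               for j, c in enumerate(centres))


def wfcm(data, weights, n_clusters, m=2.0, n_iter=100, tol=1e-9):
    """Alternate the two updates until J_m stops changing."""
    step = len(data) // n_clusters  # assumption: evenly spaced samples as initial centres
    centres = [list(data[j * step]) for j in range(n_clusters)]
    previous = float("inf")
    for _ in range(n_iter):
        u = update_memberships(data, centres, m)
        centres = update_centres(data, u, weights, m, n_clusters)
        current = objective(data, centres, u, weights, m)
        if abs(previous - current) < tol:
            break
        previous = current
    return centres, update_memberships(data, centres, m)


# Toy data: two well-separated groups of three samples with two inputs each.
toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
toy_weights = [float(len(toy[0]))] * len(toy)  # w_i equal to the number of inputs
centres, u = wfcm(toy, toy_weights, n_clusters=2)
```

Note that when every $w_i$ takes the same value, as in this toy call, the weights cancel in the centre update and only scale the objective; they change the centres only when they differ between samples.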

4.2.3 WFCM, validation index and RBF-NF function model

In this section, a cluster-validity index is introduced to the data-mining process to further improve the quality of the rule-base. Figure 4.2 depicts the validity index data-mining workflow. There are multiple indices for the validation of fuzzy clusters: the partition coefficient [157], partition entropy [158], Fukuyama and Sugeno [159], and Xie-Beni [160]. Most validation indices aim to find the optimal number of clusters, but in this Chapter a modification of the Xie-Beni index, as presented in [155], is used to improve the quality of the clusters calculated by the WFCM. A reliable validation index should take into consideration both the compactness, i.e. how close the points of each cluster are, and the separation of the FCM clusters, as is the case in the Xie-Beni index:

$$I_d = \frac{\sum_{k=1}^{N} \sum_{j=1}^{C} w_k\, (u_{kj})^{m}\, \lVert x_k - c_j \rVert^{2}}{n\, \min_{j \neq i} \left\{ \lVert c_j - c_i \rVert^{2} \right\}} \tag{4.3}$$

The measure of compactness ($C_t$) is given by:

$$C_t = \frac{\sum_{k=1}^{N} \sum_{j=1}^{C} w_k\, (u_{kj})^{m}\, \lVert x_k - c_j \rVert^{2}}{n} \tag{4.4}$$

The measure of separation is given by:

$$\mathrm{Separation} = \min_{j \neq i} \left\{ \lVert c_j - c_i \rVert^{2} \right\} \tag{4.5}$$

where $C$ is the number of clusters, $u_{kj}$ is the membership degree, $w_k$ is the weight of significance assigned to $x_k$, which is the complete data, and $c_i$ are the centres of the clusters. The optimal partition would have clusters that are as compact as possible while maintaining a good balance between separation and coverage of the input space [152]; these characteristics translate into a high-quality rule-base.
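A minimal sketch of how Equations (4.3) to (4.5) could be evaluated for a given partition is shown below. The function names are hypothetical, and the text does not define $n$; it is passed in explicitly and set to the number of samples in the toy call, as in the standard Xie-Beni index, which is an assumption.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def compactness(data, centres, u, weights, m, n):
    """Eq. (4.4): weighted within-cluster scatter divided by n."""
    total = sum(w * (u_k[j] ** m) * sq_dist(x, c)
                for x, u_k, w in zip(data, u, weights)
                for j, c in enumerate(centres))
    return total / n


def separation(centres):
    """Eq. (4.5): smallest squared distance between two distinct centres."""
    return min(sq_dist(ci, cj)
               for i, ci in enumerate(centres)
               for j, cj in enumerate(centres)
               if i != j)


def validity_index(data, centres, u, weights, m, n):
    """Eq. (4.3): compactness divided by separation."""
    return compactness(data, centres, u, weights, m, n) / separation(centres)


# Toy partition: four one-dimensional samples, two clusters, crisp memberships.
toy = [[0.0], [2.0], [10.0], [12.0]]
toy_centres = [[1.0], [11.0]]
toy_u = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_weights = [1.0] * len(toy)  # one input per sample, so w_k = 1

ct = compactness(toy, toy_centres, toy_u, toy_weights, m=2.0, n=len(toy))
sep = separation(toy_centres)
index = validity_index(toy, toy_centres, toy_u, toy_weights, m=2.0, n=len(toy))
```

In this toy case each sample lies one unit from its own centre, so the compactness is 1.0, the separation is 100.0 and the index is 0.01. As with the original Xie-Beni index, a smaller value corresponds to clusters that are more compact and further apart.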

Figure 4.2: Flow chart of the processing of the data with weighted FCM and the validation index
