CHAPTER III: THE STATE AND CULTURAL HERITAGE IN PERU
3.5 Risk map for built cultural heritage
3.5.3 Fire in the Historic Centre of Lima
SVM is a machine learning technique that can be applied to both regression and pattern recognition (classification) problems. For classification, SVM constructs a decision boundary that separates two classes in the data space. To build this boundary, SVM maximizes the separating margin between the two classes while minimizing the classification error. Figure 5.1 shows a linear SVM decision boundary. Dots and stars denote the two classes in the data. The data points that lie on the margins on either side of the decision boundary are called support vectors; they are circled in Figure 5.1. $w$ is the normal to the decision boundary and $b/|w|$ is the perpendicular distance of the decision boundary from the origin [127]. When the two classes are not completely separable, some of the examples will be misclassified. In Figure 5.1, one star data point has been misclassified as a dot; the distance of this point from the decision boundary is $-\varepsilon/|w|$.
Figure 5.1-Linear SVM hyperplane (labels in the figure: $w$, margins, $-b/|w|$, $-\varepsilon/|w|$)
SVM can be applied to both linearly and non-linearly separable problems. When the two classes are not linearly separable, the kernel trick can be employed: the data is mapped, using a mapping function $\phi(\cdot)$, to a feature space of higher dimension [128]. In the feature space the two classes become linearly separable, and the problem is handled in the same way as the linearly separable case.
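The effect of such a mapping can be illustrated with a minimal sketch. The feature map `phi`, the toy data, and the helper `separable_by_threshold` below are hypothetical choices made only for illustration; they are not taken from the text.

```python
def phi(x):
    """Hypothetical feature map from R to R^2: x -> (x, x^2)."""
    return (x, x * x)


def separable_by_threshold(values, labels):
    """True if a single threshold puts every +1 value on one side and every -1 on the other."""
    pos = [v for v, y in zip(values, labels) if y == +1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)


# 1-D toy data: the +1 class sits between two groups of the -1 class,
# so no single threshold on x separates the classes.
xs = [-3.0, -2.0, -0.5, 0.5, 2.0, 3.0]
ys = [-1, -1, +1, +1, -1, -1]

in_input_space = separable_by_threshold(xs, ys)

# In the feature space, the second coordinate (x^2) alone separates the classes.
in_feature_space = separable_by_threshold([phi(x)[1] for x in xs], ys)

print(in_input_space, in_feature_space)
```

In this toy case the hyperplane $w = (0, -1)$, $b = 2$ in the feature space, that is $2 - x^2 = 0$, separates the two classes, although no linear boundary exists in the original one-dimensional space.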
We now describe the mathematical formulation of SVM. Let the training dataset for a two-class problem be $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_1, x_2, \ldots, x_N \in \mathbb{R}^m$ are the $m$-dimensional training data points and $y_1, y_2, \ldots, y_N \in \{-1, +1\}$ are their corresponding class labels ($-1$ for the majority class and $+1$ for the minority class). By solving the optimization problem in Formulation 5.1, SVM obtains a decision boundary that separates the two classes.
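$$
\begin{aligned}
\min_{w,\,b} \quad & \frac{1}{2} w^T w + C \sum_{i=1}^{N} \varepsilon_i \\
\text{s.t.} \quad & y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \varepsilon_i, \qquad \varepsilon_i \ge 0
\end{aligned} \tag{5.1}
$$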
The $\varepsilon_i$ are non-negative slack variables; when example $i$ is misclassified, $\varepsilon_i$ is greater than 1. $C$ is a user-chosen parameter that sets the error penalty and thereby controls the tradeoff between minimizing the classification error and maximizing the margin.
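To make the roles of the slack variables and of $C$ concrete, the objective of Formulation 5.1 can be evaluated for a fixed candidate $(w, b)$. The sketch below assumes the linear case ($\phi$ is the identity) and uses hypothetical toy data; the function names `slack` and `primal_objective` are illustrative only, and the code evaluates the objective rather than solving the optimization problem.

```python
def slack(w, b, x, y):
    """Smallest epsilon_i with y_i (w^T x_i + b) >= 1 - epsilon_i and epsilon_i >= 0."""
    functional_margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(0.0, 1.0 - functional_margin)


def primal_objective(w, b, X, Y, C):
    """Value of (1/2) w^T w + C * sum_i epsilon_i for a candidate (w, b)."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    total_slack = sum(slack(w, b, x, y) for x, y in zip(X, Y))
    return regularizer + C * total_slack


# Hypothetical toy data: the third point is a negative example lying on the positive side.
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
Y = [+1, -1, -1]
w, b = (1.0, 0.0), 0.0

slacks = [slack(w, b, x, y) for x, y in zip(X, Y)]
objective_small_C = primal_objective(w, b, X, Y, C=1.0)
objective_large_C = primal_objective(w, b, X, Y, C=10.0)

print(slacks, objective_small_C, objective_large_C)
```

The two correctly classified points outside the margin have zero slack, while the misclassified point has a slack greater than 1. Raising $C$ increases the weight of that slack in the objective, which is the tradeoff described above.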
Usually the Lagrangian formulation of SVM is solved (Formulation 5.2). The Lagrangian formulation is easier to handle because the constraints in Formulation 5.1 are replaced by Lagrangian multipliers [127].
$$
L_P(w, b, \varepsilon, \alpha, \mu) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \varepsilon_i - \sum_{i=1}^{N} \alpha_i \left( y_i \left( w^T \phi(x_i) + b \right) - 1 + \varepsilon_i \right) - \sum_{i=1}^{N} \mu_i \varepsilon_i \tag{5.2}
$$
where $\alpha_i$ and $\mu_i$ are non-negative Lagrangian multipliers associated with the first and second sets of constraints in Formulation 5.1, respectively. The Karush-Kuhn-Tucker conditions for the Lagrangian primal (Formulation 5.2) lead to the following relations:
$$
w = \sum_{i=1}^{N} \alpha_i y_i \phi(x_i) \tag{5.3}
$$
$$
0 \le \alpha_i \le C \tag{5.4}
$$
$$
\sum_{i=1}^{N} \alpha_i y_i = 0 \tag{5.5}
$$
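As a sketch of where Equations 5.3 to 5.5 come from, the stationarity conditions of Formulation 5.2 can be written out term by term. The intermediate relation $C - \alpha_i - \mu_i = 0$ is not stated in the text; it is included here, following the standard treatment of the soft-margin primal, only as the step that yields Equation 5.4.

```latex
\begin{aligned}
\frac{\partial L_P}{\partial w} = 0
  &\;\Rightarrow\; w - \sum_{i=1}^{N} \alpha_i y_i \phi(x_i) = 0
  &&\;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i \phi(x_i) \\
\frac{\partial L_P}{\partial b} = 0
  &\;\Rightarrow\; -\sum_{i=1}^{N} \alpha_i y_i = 0
  &&\;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0 \\
\frac{\partial L_P}{\partial \varepsilon_i} = 0
  &\;\Rightarrow\; C - \alpha_i - \mu_i = 0
  &&\;\Rightarrow\; 0 \le \alpha_i \le C
  \quad \text{since } \alpha_i \ge 0,\ \mu_i \ge 0
\end{aligned}
```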
The Lagrangian dual form of Formulation 5.2 is obtained by substituting Equation 5.3 into Formulation 5.2. Formulation 5.6 shows the Lagrangian dual [127]:
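$$
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \\
\text{s.t.} \quad & \sum_{i=1}^{N} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C
\end{aligned} \tag{5.6}
$$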
where $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ is the kernel function, which computes the inner product of two data points in the feature space.
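The identity $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ can be checked numerically for a specific kernel. The sketch below assumes a homogeneous polynomial kernel of degree 2 on $\mathbb{R}^2$ together with its well-known explicit feature map; this particular kernel is chosen only for illustration and is not necessarily the one used in this work.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for poly2_kernel on R^2 (maps into R^3)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x = (1.0, 2.0)
z = (3.0, 4.0)

kernel_value = poly2_kernel(x, z)          # computed in the input space
feature_value = dot(phi(x), phi(z))        # computed in the feature space
values_match = math.isclose(kernel_value, feature_value)

print(kernel_value, feature_value, values_match)
```

The kernel value is obtained without ever forming $\phi(x)$, which is why the dual in Formulation 5.6 can be solved even when the feature space has a very high dimension.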
Once the dual has been solved, $w$ can be written as
$$
w = \sum_{i=1}^{N_s} \alpha_i y_i \phi(x_i) \tag{5.7}
$$
Here $N_s$ is the number of support vectors. Equation 5.7 is in fact the same as Equation 5.3; however, in the optimal solution of the Lagrangian dual only the $\alpha_i$ corresponding to support vectors are non-zero, so the summation in Equation 5.7 runs over the support vectors only [127].
To determine the class of a new sample $x$, the sign function $\mathrm{sgn}(\cdot)$ is applied:
$$
y = \mathrm{sgn}\left\{ w^T \phi(x) + b \right\} \tag{5.8}
$$
or
$$
y = \mathrm{sgn}\left\{ \sum_{i=1}^{N_s} \alpha_i^* y_i K(x_i, x) + b \right\} \tag{5.9}
$$
where $\alpha^*$ is the solution of the Lagrangian dual in Formulation 5.6.
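Equation 5.9 translates directly into a short prediction routine. The sketch below is illustrative: the function names, the linear kernel, and the two hand-picked support vectors are assumptions, with $\alpha^* = (0.5, 0.5)$ and $b = 0$ chosen so that both support vectors lie exactly on the margin and $\sum_i \alpha_i y_i = 0$ holds. Returning $+1$ when the argument of $\mathrm{sgn}$ is exactly zero is also a convention chosen here.

```python
def linear_kernel(x, z):
    """K(x, z) = x . z, i.e. phi is the identity map."""
    return sum(xi * zi for xi, zi in zip(x, z))


def decision_value(x, support_vectors, labels, alphas, b, kernel):
    """Argument of sgn in Equation 5.9: sum_i alpha_i y_i K(x_i, x) + b."""
    return sum(
        a * y * kernel(sv, x)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + b


def predict(x, support_vectors, labels, alphas, b, kernel):
    """Class label of x; a decision value of exactly 0 is assigned to +1 here."""
    value = decision_value(x, support_vectors, labels, alphas, b, kernel)
    return +1 if value >= 0 else -1


# Hypothetical solved model with two support vectors.
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
labels = [+1, -1]
alphas = [0.5, 0.5]
b = 0.0

label_a = predict((2.0, 1.0), support_vectors, labels, alphas, b, linear_kernel)
label_b = predict((-0.3, 5.0), support_vectors, labels, alphas, b, linear_kernel)

print(label_a, label_b)
```

Only the support vectors, their labels, their multipliers, and $b$ are needed at prediction time; the remaining training points do not appear because their $\alpha_i$ are zero.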
SVM on Imbalanced Datasets
Although SVM performs very well on balanced datasets, its performance deteriorates dramatically on imbalanced datasets, especially on the minority class. In an imbalanced dataset, the SVM decision boundary lies closer to the minority class region than the ideal decision boundary does. As a result, a considerable number of minority class examples are misclassified as majority. Wu and Chang [129] give two reasons for this skew. The first is the imbalanced training data ratio: because the negative data points outnumber the positive ones, the positive examples lie further from the “ideal” decision boundary than the majority examples do. The second is the imbalanced support vector ratio: because the negative (majority class) support vectors far outnumber the positive (minority class) ones, a positive test data point may have more negative support vectors among its neighbors and will therefore be misclassified as negative (majority). Akbani et al. [97] point out another reason for the skewed decision boundary. The objective of the SVM model is to maximize the margin between the two classes while minimizing the classification errors, and there is a tradeoff between these two goals. When the negative examples far outnumber the positive ones, the cumulative misclassification cost of the positive points is relatively small, so SVM tends to maximize the margin as far as possible by classifying most (sometimes all) of the examples as negative. Thus, the decision boundary is shifted toward the minority class region. In the next section, we describe our proposed remedy to this problem.
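The cost argument of Akbani et al. can be illustrated by evaluating the objective of Formulation 5.1 for two hand-picked candidates on one-dimensional toy data. Everything in the sketch below (the data, the two candidates, and $C = 1$) is a hypothetical choice for illustration; the code compares only these two candidates and does not compute the SVM optimum.

```python
def primal_objective(w, b, xs, ys, C):
    """(1/2) w^2 + C * sum_i max(0, 1 - y_i (w x_i + b)) for 1-D inputs."""
    slack_sum = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in zip(xs, ys))
    return 0.5 * w * w + C * slack_sum


def make_data(n_pos, n_neg):
    """Positives at x = 0.1 and negatives at x = -0.1: separable, but only narrowly."""
    xs = [0.1] * n_pos + [-0.1] * n_neg
    ys = [+1] * n_pos + [-1] * n_neg
    return xs, ys


C = 1.0
w_sep, b_sep = 10.0, 0.0    # separates both classes with zero slack, narrow margin
w_neg, b_neg = 0.0, -1.0    # labels every point negative, no margin penalty


def compare(n_pos, n_neg):
    """Objective values (separating candidate, all-negative candidate)."""
    xs, ys = make_data(n_pos, n_neg)
    return (
        primal_objective(w_sep, b_sep, xs, ys, C),
        primal_objective(w_neg, b_neg, xs, ys, C),
    )


imbalanced = compare(n_pos=5, n_neg=100)
balanced = compare(n_pos=100, n_neg=100)

print(imbalanced, balanced)
```

With only five positive points, the all-negative candidate has the lower objective value because the total slack it pays for the positives is small compared with the margin term of the separating candidate. With one hundred positives the ordering reverses, which is the tradeoff described above.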