THE PALEOETHNOBOTANICAL APPROACH: EVOLUTION UNDER DOMESTICATION
4.1.1 GENUS CUCURBITA.
4.1.1 a Characteristics that support or refute the conception of C.
For each set of argumentation features, i.e., argument component (AC), component label (CL), argument flow (AF), relation label (RL), and typology structure (TS), Table 49 shows the 10-fold cross-validation performance of the Decision Tree classifier implemented in Scikit-learn. Compared with the Logistic Regression algorithm (see Table 37), Decision Tree yielded lower score prediction performance on each individual argumentation feature set, but obtained better κ and qwk than Logistic Regression when trained with all argumentation features. One reason is that the Decision Tree algorithm can prune ineffective features that do not help classify the data further. Unfortunately, this pruning capability does not always yield an optimal feature set: the 10-fold cross-validation performance with all argumentation features is lower than the performance with the component label features alone.
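The evaluation protocol described above can be sketched as follows. This is a minimal illustration on synthetic data, not the setup behind Table 49: the feature matrix, the score thresholds, `max_depth`, and the pooling of out-of-fold predictions via `cross_val_predict` are all assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the essay data (assumption: not the real corpus).
rng = np.random.default_rng(0)
feature_names = ["WordInArgument", "SentencewArgument", "danglingClaim", "PremisePct"]
n_essays = 120
X = rng.random((n_essays, len(feature_names)))
X[:, 0] *= 400  # hypothetical word counts in argument components

# Hypothetical scores 0 = c, 1 = b, 2 = a, driven only by the first feature.
y = np.digitize(X[:, 0], bins=[130, 270])

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Each essay is predicted by the model trained on the other nine folds.
y_pred = cross_val_predict(clf, X, y, cv=cv)

kappa = cohen_kappa_score(y, y_pred)                     # unweighted kappa
qwk = cohen_kappa_score(y, y_pred, weights="quadratic")  # quadratic weighted kappa

# Features that a tree fitted on all data actually splits on;
# negative entries in tree_.feature mark leaf nodes.
clf.fit(X, y)
used = sorted({feature_names[i] for i in clf.tree_.feature if i >= 0})

print(f"kappa={kappa:.3f} qwk={qwk:.3f} features used: {used}")
```

Features that never appear in `used` were not selected for any split, which is the practical sense in which a tree ignores uninformative features.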
To illustrate feature importance, Figure 19 visualizes the Decision Tree trained on 107 essays using all argumentation features. Five features show up in the tree:
1. WordInArgument: number of words in argument components (AC)
2. SentencewArgument: number of sentences that have AC
3. danglingClaim: number of claims that have no support premises
4. SentencewArgumentPct: percentage of argumentative sentences
5. PremisePct: percentage of premises
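As a rough illustration of how these five features could be computed from an annotated essay, consider the sketch below. The `Component` record and its fields are hypothetical, and two definitions are assumptions rather than statements from the text: PremisePct is taken as the share of premises among all argument components, and only components labeled "Claim" (not major claims) are counted as potentially dangling.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical annotation record for one argument component."""
    kind: str          # "MajorClaim", "Claim", or "Premise"
    text: str
    sentence_id: int   # index of the sentence that contains the component
    supported_by: list = field(default_factory=list)  # indices of supporting premises


def argumentation_features(components, n_sentences):
    """Compute the five features listed above for a single essay."""
    claims = [c for c in components if c.kind == "Claim"]
    premises = [c for c in components if c.kind == "Premise"]
    arg_sentences = {c.sentence_id for c in components}
    return {
        "WordInArgument": sum(len(c.text.split()) for c in components),
        "SentencewArgument": len(arg_sentences),
        "SentencewArgumentPct": len(arg_sentences) / n_sentences if n_sentences else 0.0,
        "danglingClaim": sum(1 for c in claims if not c.supported_by),
        # Assumption: percentage relative to all argument components.
        "PremisePct": len(premises) / len(components) if components else 0.0,
    }


# Made-up essay with five sentences, four of which contain a component.
essay = [
    Component("MajorClaim", "School uniforms should be optional", 0),
    Component("Claim", "Uniforms limit self expression", 1, supported_by=[2]),
    Component("Premise", "Students cannot choose their own clothes", 2),
    Component("Claim", "Uniforms are expensive", 3),  # no supporting premise
]
features = argumentation_features(essay, n_sentences=5)
print(features)
```

In this made-up essay the second claim has no supporting premise, so it is the only dangling claim.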
Figure 19: Decision tree learned using argumentation features with true labels
Each node of the tree is darker when the distribution of its data is more skewed toward the major class, and includes the following content in top-down order:
• The condition that splits its data. For example, the right branch from node #0 says that essays with more than 212 words in AC have a score of either b or a, whereas essays with at most 212 words in AC have a score of c or b.
• The Gini impurity score, which measures the probability of mislabeling a sample when labels are assigned at random according to the distribution of labels in the subset (Breiman et al., 1984). Splitting conditions that yield small Gini scores are preferred.
Set  Feature                                          Short name
AC   Number of words in AC                            WordInArgument
     Number of argumentative sentences                SentencewArgument
     Percentage of argumentative sentences            SentencewArgumentPct
CL   Number of premises                               Premise
     Premise percentage                               PremisePct
     Number of claims                                 Claim
AF   Number of paragraphs with claim and premise      ParagraphwClaim-Premise
     Percentage of typed bigram MajorClaim-Claim      MajorClaim-Claim
     Percentage of typed bigram Claim-Claim           Claim-Claim
     Percentage of typed bigram Premise-Claim         Premise-Claim
RL   Number of supporting premises                    supportPremise
     Number of dangling claims                        danglingClaim
TS   Number of paragraphs that have chain arguments   pwChainTopo
     Number of tree arguments                         treeTopo
Table 50: Most important features of each feature set
• Total samples of the subset, e.g., node #1 contains 58 essays.
• Class distribution of the subset. For instance, node #2 has all 12 essays of score c and no essays of scores a or b.
• The major class, e.g., score b in node #3.
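The Gini impurity reported in each node can be reproduced from the node's class distribution. The sketch below uses the standard definition, 1 minus the sum of squared class proportions, and the sample-weighted average over the two children of a split; the function names are illustrative and not taken from any library.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node: 1 - sum_k p_k**2, where p_k is the share of class k."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("node has no samples")
    return 1.0 - sum((n / total) ** 2 for n in class_counts)


def split_gini(left_counts, right_counts):
    """Impurity of a split: child impurities weighted by child sample counts."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return (n_left / n) * gini_impurity(left_counts) + (n_right / n) * gini_impurity(right_counts)


# A node holding only the 12 essays of score c (counts ordered a, b, c) is pure.
print(gini_impurity([0, 0, 12]))   # 0.0
# A node split evenly between two classes is maximally impure for two classes.
print(gini_impurity([5, 5]))       # 0.5
```

A candidate split is better the lower its `split_gini` value, which is why pure leaves such as node #2 have a Gini score of 0.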
As shown in the figure, the branches from nodes #0 and #5 generalize to a rule that essays with more words in argument components (AC) tend to have higher scores than those with fewer words. Among essays with more words in AC, the number of dangling claims (node #8) and the percentage of premises (node #9) further refine the essay scores. To receive a score of a, an essay should have neither many dangling claims (e.g., more than 5) nor a small percentage of premises (e.g., less than 0.3). While the conditions that make up this decision tree may be specific to the training data, e.g., WordInArgument ≤ 212, the generalized rules match the rubric statements in Table 48 well, especially the rules on dangling claims and the percentage of premises.
To examine feature importance within each argumentation feature set, Figures 20, 21, 22, 23, and 24 visualize the decision trees learned with the argumentation features of each set, respectively. Table 50 shows the important features that show up in the learned decision trees. Over the five trees, we consider the leaf nodes that have the smallest Gini scores and obtain the following rules for each of the score levels. By considering only the leaf nodes with the smallest Gini scores, we aim for the most reliable decision rules to validate the argumentation features that show up.
• Score c:
  – WordInArgument ≤ 212 AND SentencewArgument ≤ 3.5 (Figure 20, gini = 0)
  – ParagraphwClaim-Premise ≤ 1.5 AND MajorClaim-Claim > 0.415 (Figure 22, gini = 0)
• Score b:
  – WordInArgument ≤ 212 AND SentencewArgument > 3.5 AND SentencewArgumentPct ≤ 0.435 (Figure 20, gini = 0.1327)
• Score a:
  – Premise > 4.5 AND PremisePct > 0.345 AND Claim ≤ 9.5 (Figure 21, gini = 0.0986)
First, the decision rules for scores c and a have very low Gini scores and are consistent with the writing rubrics. One of the rules for score c states that essays with at most one paragraph containing both a claim and a premise, but a high ratio of MajorClaim-Claim bigrams, receive score c. On the other hand, essays that have many premises (more than 4) but not too many claims (at most 9) receive score a. However, rules for score b have higher Gini scores and their clauses are contradictory. As stated in the rule above, essays that have fewer words in argument components (at most 212) but neither too few nor too many argumentative sentences receive a score of b. The conflicting clauses of that rule, and of many other rules for score b (e.g., node #7 in Figure 21), may reveal the difficulty of classifying this score level, which is considered more ambiguous than levels a and c.
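To make the extracted rules concrete, they can be written as predicates over an essay's feature values. The sketch below only encodes the four rules and Gini scores listed above. Pooling rules that come from different trees into one list is a simplification made for illustration, and the two example essays are invented.

```python
# Each entry: (score, source figure, Gini score of the leaf, predicate over features).
RULES = [
    ("c", "Figure 20", 0.0,
     lambda f: f["WordInArgument"] <= 212 and f["SentencewArgument"] <= 3.5),
    ("c", "Figure 22", 0.0,
     lambda f: f["ParagraphwClaim-Premise"] <= 1.5 and f["MajorClaim-Claim"] > 0.415),
    ("b", "Figure 20", 0.1327,
     lambda f: f["WordInArgument"] <= 212 and f["SentencewArgument"] > 3.5
     and f["SentencewArgumentPct"] <= 0.435),
    ("a", "Figure 21", 0.0986,
     lambda f: f["Premise"] > 4.5 and f["PremisePct"] > 0.345 and f["Claim"] <= 9.5),
]


def matching_rules(features):
    """Return (score, source, gini) for every rule that fires, purest leaf first."""
    hits = [(score, src, gini) for score, src, gini, test in RULES if test(features)]
    return sorted(hits, key=lambda hit: hit[2])


# Invented feature values for two essays (not taken from the corpus).
short_essay = {
    "WordInArgument": 150, "SentencewArgument": 3, "SentencewArgumentPct": 0.3,
    "ParagraphwClaim-Premise": 2, "MajorClaim-Claim": 0.2,
    "Premise": 2, "PremisePct": 0.4, "Claim": 3,
}
rich_essay = {
    "WordInArgument": 300, "SentencewArgument": 12, "SentencewArgumentPct": 0.7,
    "ParagraphwClaim-Premise": 3, "MajorClaim-Claim": 0.1,
    "Premise": 8, "PremisePct": 0.5, "Claim": 6,
}
print(matching_rules(short_essay))  # [('c', 'Figure 20', 0.0)]
print(matching_rules(rich_essay))   # [('a', 'Figure 21', 0.0986)]
```

An essay may match no rule at all, since these are only the purest leaves of each tree and do not cover the full feature space.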
Overall, the results show that the rules extracted from decision tree models trained with argumentation features align well with the reference writing rubrics. This is expected because the argumentation features were derived from the true argumentation labels. In the next experiment, we study argumentation features computed from predicted argumentation labels.
Figure 20: Decision tree learned with TrueLabel AC features
Figure 22: Decision tree learned with TrueLabel AF features
Figure 24: Decision tree learned with TrueLabel TS features