ANSWERS TO THE RESEARCH QUESTIONS
Based on the results of the systematic study of different aspects of the methodological design presented in the previous section, this section examines three strategies for boosting the performance of automatic user profiling when only a single message per user is available: a feature union approach (Section 4.5.1), a balancing strategy (Section 4.5.2) and a cross-task classification approach (Section 4.5.3).
4.5.1 Combining Features into Complex Models
Because the experiments based on token unigram and character 4-gram features yielded the best results for both age groups, in the next step we combined each of these two feature types with the other single feature types. Interestingly, as Figures 4.2 and 4.3 show, only one combination outperformed the BOW features for both categories: when character 4-grams were merged with the sociolinguistic features, the NB classifier achieved 83.3% precision, 56.2% recall and a 67.1% F-score for the adult class, and 91.5% precision, 97.7% recall and a 94.5% F-score for the adolescent age group. Threefold combinations of single feature types did not produce higher results for either category.
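The feature union step can be illustrated with a minimal sketch, assuming each message is represented as a sparse dictionary of feature counts. The three sociolinguistic features below (emoticons, character flooding, all-caps words) are hypothetical stand-ins chosen for illustration only; the actual feature set is the one defined in Section 4.3.2, and the function names are likewise illustrative.

```python
import re
from collections import Counter


def char_ngrams(text: str, n: int = 4) -> Counter:
    """Count overlapping character n-grams (default: 4-grams) in a lowercased message."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def socioling_features(text: str) -> dict:
    """Hypothetical stand-ins for the sociolinguistic features of Section 4.3.2."""
    tokens = text.split()
    return {
        # emoticons such as :) ;-) :D :P
        "soc:emoticons": len(re.findall(r"[:;]-?[()DPp]", text)),
        # character flooding: one character repeated three or more times
        "soc:char_flooding": len(re.findall(r"(.)\1{2,}", text)),
        # words written entirely in capitals (longer than one character)
        "soc:all_caps_words": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
    }


def feature_union(text: str) -> dict:
    """Merge both feature families into a single sparse feature dictionary."""
    features = {"c4:" + gram: count for gram, count in char_ngrams(text).items()}
    features.update(socioling_features(text))
    return features
```

Prefixing each family (`c4:`, `soc:`) keeps the two feature spaces disjoint, so merging them cannot silently overwrite a count. The merged dictionaries could then be vectorized and passed to a multinomial classifier.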
Similar to the feature union experiments for age group detection described above, the best single feature types were merged with other types to examine which combinations could boost the performance of the gender classifier. Again, the best results were obtained by combining character tetragrams (4-grams) with the sociolinguistic features introduced in this chapter, resulting in 47.8% precision, 45.1% recall and a 46.6% F-score for the male category. For identifying females, the BOW model still outperformed all other combinations. An overview of the performance of the different feature types and feature type combinations is presented in Figures 4.4 and 4.5.
Figure 4.2: Precision, recall and F-score for age prediction per feature type (combination) for the PLUS20 class.
Figure 4.3: Precision, recall and F-score for age prediction per feature type (combination) for the MIN16 class.
Figure 4.4: Precision, recall and F-score for gender prediction per feature type (combination) for the MALE class.
Figure 4.5: Precision, recall and F-score for gender prediction per feature type (combination) for the FEMALE class.
4.5.2 Balancing the Data
Up to this point, every learning experiment adopted a highly skewed data distribution in order to reflect reality as closely as possible. The results discussed above showed that, based on a heavily imbalanced dataset containing only a single message per user, it is feasible to improve upon the baseline performance for the minority classes that are the focus of this study (i.e., adults and males). The series of experiments described in this section investigated whether balancing the training data, while maintaining the original skewed distribution in the test data, could further increase the performance for these categories. In addition, experiments on the completely balanced NETLOG_SUBSET2 dataset were included to enable a meaningful comparison with prior studies in automatic user profiling (see Section 4.2.1).
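Balancing only the training partition can be sketched as follows. This is a minimal sketch that assumes random undersampling of the larger groups; the text does not state whether undersampling or oversampling was used, so that choice, the function name `undersample` and the message format `(text, age, gender)` are assumptions. The `key` argument selects the label to balance on, and a compound key such as `(age, gender)` balances on both labels jointly. The test partition is simply never passed to the function, so it keeps its original skewed distribution.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def undersample(train: Sequence[T], key: Callable[[T], Hashable], seed: int = 0) -> List[T]:
    """Randomly undersample every group of the TRAINING partition to the smallest group size.

    Assumption: random undersampling (hypothetical helper). The test partition is not
    touched here, so it keeps its original, skewed distribution.
    """
    groups = defaultdict(list)
    for item in train:
        groups[key(item)].append(item)
    size = min(len(items) for items in groups.values())
    rng = random.Random(seed)  # fixed seed for reproducible folds
    balanced: List[T] = []
    for items in groups.values():
        balanced.extend(rng.sample(items, size))  # sample without replacement
    rng.shuffle(balanced)
    return balanced
```

For example, applying it with `key=lambda m: m[1]` to a training fold of 90 MIN16 and 10 PLUS20 messages returns 10 messages of each class, while `key=lambda m: (m[1], m[2])` equalizes every age and gender combination present in the fold.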
The best results for both types of balanced data experiments were again obtained by the Multinomial Naïve Bayes classifier, using a combination of character 4-grams and sociolinguistic features. Compared to the imbalanced data experiments described above, balancing only the training partitions led to a considerably higher recall for the minority classes in both age and gender prediction, but precision and F-score decreased for both the PLUS20 and the MALE category.
Finally, regarding the completely balanced learning experiments, this study’s findings show that adding the sociolinguistic features (newly introduced in Section 4.3.2) to the more traditional character n-gram features yielded an accuracy of 84.1% for age prediction, which is higher than the results reported in prior work on age prediction for blogs (e.g., [73, 116, 186]) and social network postings ([152, 204]), all of which incorporated multiple messages per user. For gender prediction, the Naïve Bayes classifier yielded a slightly lower accuracy (61.0%) than that reported by [34], who set up a learning experiment incorporating a single tweet per user. However, the authors of [34] did not include adolescent messages in their dataset.
4.5.3 Combining Age and Gender Prediction
The third part of this section investigates whether gender meta-data can be a helpful information source for constructing more accurate age group classifiers. In addition to the binary age classification experiments (see Section 4.4.1), which are referred to as BIN_EXP. in this section, three different approaches to including the gender meta-data are examined in order to investigate their effect on age group prediction. Given that one of the key objectives of this thesis is to investigate the feasibility of detecting adults posing as adolescents in social network environments, these experiments focus on the scores for the adult class.
In the first experiment (EXP_1), the data was balanced according to both age and gender in each training partition of the NETLOG_SUBSET2 dataset. The Multinomial Naïve Bayes classifier was then retrained using the 50,000 most frequent features by document frequency (DF) and binary feature values (see Section 4.4.1). Compared to the results of the imbalanced data experiments, the recall and F-score for the PLUS20 category improved from 56.2% to 81.9% and from 67.1% to 69.8%, respectively. Precision, however, decreased from 83.3% to 60.8%.
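The EXP_1 retraining step (the 50,000 most frequent features by document frequency, binary feature values) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's own code: messages are assumed to be already converted into sets of features (a set naturally yields binary values), and the names `top_k_by_df` and `BinaryMultinomialNB` as well as the Laplace smoothing value `alpha=1.0` are hypothetical choices.

```python
import math
from collections import Counter
from typing import Iterable, List, Set


def top_k_by_df(docs: List[Set[str]], k: int = 50_000) -> List[str]:
    """Keep the k features that occur in the most messages (document frequency)."""
    df = Counter(f for doc in docs for f in doc)  # each set counts a feature once
    return [f for f, _ in df.most_common(k)]


class BinaryMultinomialNB:
    """Multinomial Naive Bayes over binary (presence/absence) feature values."""

    def __init__(self, vocabulary: Iterable[str], alpha: float = 1.0):
        self.vocab = set(vocabulary)
        self.alpha = alpha  # Laplace smoothing (assumed value)

    def fit(self, docs: List[Set[str]], labels: List[str]) -> "BinaryMultinomialNB":
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            counts[y].update(doc & self.vocab)  # binary: at most one count per message
        self.log_lik = {}
        for c in self.classes:
            total = sum(counts[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {f: math.log((counts[c][f] + self.alpha) / total) for f in self.vocab}
        return self

    def predict(self, doc: Set[str]) -> str:
        present = doc & self.vocab  # features outside the selected vocabulary are ignored
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c][f] for f in present)
            for c in self.classes
        }
        return max(scores, key=scores.get)
```

Because each message is a set, a feature contributes at most once to its class's counts, which is what binary feature values amount to in a multinomial model.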
In the second experiment (EXP_2), a three-way NB model was trained on the original, imbalanced dataset, which included the MIN16 category and two complex classes for the older age group that also encode gender, namely PLUS20_MALE and PLUS20_FEMALE. Both complex classes were then mapped back to PLUS20 in the classifier’s output, in order to examine whether the extra gender information acquired during training would lead to better age prediction on the binary test sets. Although recall dropped slightly to 80.9%, EXP_2 improved upon EXP_1 in terms of precision (67.7%) and F-score (73.7%).
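The EXP_2 evaluation step, in which the two complex classes are reduced to PLUS20 before scoring on the binary test sets, can be sketched as follows. The helper names `collapse` and `prf` are hypothetical; scores are computed per class, as in Table 4.8, with the F-score taken as the harmonic mean of precision and recall, which is consistent with the figures reported in this section.

```python
from typing import Dict, List


def collapse(label: str) -> str:
    """Map the complex training classes (PLUS20_MALE, PLUS20_FEMALE) back onto PLUS20."""
    return "PLUS20" if label.startswith("PLUS20") else label


def prf(gold: List[str], predicted: List[str], positive: str) -> Dict[str, float]:
    """Precision, recall and F-score (in %) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": 100 * precision, "recall": 100 * recall, "f_score": 100 * f_score}
```

As a sanity check on the reported figures, the harmonic mean of the EXP_2 precision and recall for PLUS20, 2 × 67.7 × 80.9 / (67.7 + 80.9), is approximately 73.7, matching the F-score reported above.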
The third experiment (EXP_3) consisted of including gender as an additional feature in every instance of the original NETLOG_SUBSET2 data (i.e., a feature union approach). Again, the F-score for the adult class improved upon those of BIN. and EXP_1, but not upon that of EXP_2, with a precision of 65.8%, a recall of 81.4% and an F-score of 72.8%. Table 4.8 gives an overview of the results of the three additional experiments compared to those of the binary age classification experiments (BIN_EXP.). In EXP_2, the model showed a classification error of
Table 4.8: Results for age prediction when including gender meta-data: EXP_1 (data balanced according to age group and gender in train), EXP_2 (3 classes in train, 2 in test) and EXP_3 (gender as feature).

Scores (%)   Class    BIN.    Including gender
                              EXP_1   EXP_2   EXP_3
Precision    MIN16    91.5    96.0    95.9    95.9
             PLUS20   83.3    60.8    67.7    65.8
Recall       MIN16    97.7    89.0    92.0    91.2
             PLUS20   56.2    81.9    80.9    81.4
F-score      MIN16    94.5    92.4    93.9    93.5
             PLUS20   67.1    69.8    73.7    72.8
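The EXP_3 feature union with the gender meta-data amounts to appending one extra feature to each instance. The sketch below assumes instances are sparse feature dictionaries and that gender is known for every instance, in test as well as in train, as "every instance" suggests; the function name `add_gender_feature` and the feature name `meta:gender=...` are hypothetical.

```python
from typing import Dict


def add_gender_feature(features: Dict[str, float], gender: str) -> Dict[str, float]:
    """Return a copy of a message's feature dictionary with one binary gender feature added."""
    extended = dict(features)  # copy, so the original instance is left untouched
    extended[f"meta:gender={gender}"] = 1.0  # hypothetical feature name
    return extended
```

Encoding gender as a single indicator feature leaves the label space binary, which is the key difference from EXP_2, where gender was folded into the class labels instead.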