CAPITULO 4. RESULTADOS Y DISCUSION
4.1. Época de siembra
Analyst Rating Prediction
Analyst ratings were estimated with Linear Regression, with and without Lasso reg- ularization, Gradient Boosting Regressor and Random Forest Regressor, on the four subsets of features described in section 3.4.2. The prediction of the average ana- lyst rating is taken as benchmark. We use Mean Squared Error, Residual Sum of Squares and R squared statistics to evaluate the regression results. Mean Squared Error (MSE) measures the average squared difference between predictions and tar- get variable. It expresses the average error in the prediction of the model and it is used to asses the quality of the model. It’s value is always positive because of the squares and a value close to 0 is desired. Residual sum of squares (RSS) is a sim- ple metric that sums up the squared errors between target variable and prediction. It’s value tells us how much of the variance of the input data is not explained by the model. R squared measures the percentage of the total variation of the dependent variable that is explained by the by the independent variable. It can gave values be- tween 0 and 100, with high values expressing a good fit of the model with the given data.
Table 3.4 presents our estimation results. As benchmark we took the prediction of the average rating for all instances. The benchmark has a MSE of 0.407. All mod- els perform slightly better than the benchmark, meaning that they learn something. As expected, the best performing model is Random Forest Regressor, with a mean squared error of 0.36 on subset 4. Errors do not vary much across models and subsets. Worst performing algorithm predicts with a MSE of 0.38 in case of Linear Regression. The regularization parameter does not improve any of the models.
3.5. RESULTS 29
Data Model MSE test RSS test R2test
Set 1 LinearRegression 0.388 509.274 0.03 Lasso 0.388 508.448 0.032 GradientBoostingReg 0.381 500.057 0.056 RandomForestReg 0.373 489.027 0.074 Set 2 LinearRegression 0.385 504.919 0.043 Lasso 0.385 504.782 0.043 GradientBoostingReg 0.371 486.268 0.079 RandomForestReg 0.369 484.093 0.08 Set 3 LinearRegression 0.387 508.373 0.032 Lasso 0.387 508.375 0.032 GradientBoostingReg 0.367 481.05 0.082 RandomForestReg 0.369 484.283 0.077 Set 4 LinearRegression 0.378 495.908 0.052 Lasso 0.378 495.72 0.053 GradientBoostingReg 0.363 476.658 0.092 RandomForestReg 0.36 472.386 0.098 Average (Benchmark) 0.407 533.734 0
Table 3.4: Analyst Ratings prediction results Low residual sum of squares and R
Squared values show poor fit of the data with the model. Comparing RSS and R Squared scores of models with the ones of benchmark, our models learn very lit- tle from the data and the data itself is not very representative for the chosen models
Prediction per Sector
We validate the results of classification using F1 score. The F1 score is the har- monic mean of precision and recall met-
rics. Precision refers to the number of correctly classified instances over all in- stances classified to the respective class. Recall is the ratio between correctly classified instances over all instances in the actual class. The micro-average F1 score takes class imbalance into consideration and gives increased weights to classes with fewer samples. Table 3.5 presents the micro-average F1 score of the classification models previously mentioned. We considered a naive bench- mark the average F1-score across prediction of each class. More specifically,
given n classes (five in our case), we calculate the benchmark based on formula
Benchmark= n1 Pn
i=1f1score(predicting class i).
The majority of the models score above benchmark. Columns 2 and 3 of results table represent the micro-average F1 score of models that use a small set of features from data, comprised of adjusted beta, return last year, PE and market cap. The last 2 columns of results belong to models that contain additional polynomial features of 3rd degree. In case of decision tree based models, the polynomial features don’t improve the end result. SVM model is improved from an F1 score of 0.13 to 0.28.
Table 3.5: Results of classification per individual sector.
Model without polifeature with polifeature
F1 score without clusters F1 score with clusters F1 score without clusters F1 score with clusters Decision Tree Classifier 0.46 0.46 0.46 0.46 AdaBoost with base Decision Tree 0.48 0.46 0.46 0.43 Bagging with base Decision Tree 0.46 0.46 0.46 0.46
SVM linear kernel 0.13 0.44 0.28 0.28
OneVSRest with base linear SVM 0.13 0.62 0.32 0.32 SVM poly kernel 9d & 11d 0.02 0.57 0.03 0.03 OneVSRest with base poly SVM 0.03 0.72 0.08 0.48 Average prediction (benchmark) 0.2
As mentioned in section 3.4.3, we clustered the observations based on financial features and used the cluster number as an additional feature for classification. The
30 CHAPTER3. PREDICTION OF ANALYST RATINGS
third and last column of Table 3.5 show results of models in which we included the cluster number as feature. This does not affect the results of the decision tree classifiers, nor any of the models containing polynomial features. But in the case of SVM it makes a significant difference, bringing the F1 score from 0.03 to 0.72, the highest value.
Predicting Returns from Analyst Ratings
Further analysis was carried out in order to have a complete overview over the explanatory power of data. As analyst ratings correspond to expected future returns, we want to test whether returns are correlated with the set of features we used: volatility 360 days, P/E, returns last year.
We applied Multiple Linear Regression on features volatility 360 days, P/E, re- turns 2016 to see their cumulative linear influence over analyst ratings. The result of this method is depicted in Figure 3.5(a). Figure 3.5(b) shows the cumulative in- fluence of features volatility, P/E and returns of 2016 over returns of 2017. Lastly, Figure 3.5(c) shows how much the returns of 2017 are explained by analyst ratings emitted at the end of 2016.
(a) Correlation with ANR (b) Correlation with returns of 2016
(c) Correlation with returns of 2017
Figure 3.5: Multiple Linear Regression
In case of correlation we would expect the red data points to form a pattern. Contrary to expectations, these plots show little explanatory value in the indepen- dent variables. Figure 3.5(a) and 3.5(b) show that the set of features used do not influence the analyst rating, nor do they explain returns of 2017. Lastly, Figure 3.5(c) shows that analyst ratings emitted at the end of the year 2016 are not correlated with returns of 2017. In theory they should, which suggest that the labels we have for analyst ratings are not precise.
3.6. CONCLUSIONS& DISCUSSION 31