The outcomes of both Experiments 3 and 4 appear to support the hypotheses. The data produced through the experiments are further validated through statistical tests.
The statistical tests for the backward analysis experiments were conducted using two separate methods. The first method was based on the assumption that the only important alerts (i.e. alerts that are worth focusing on) were price jumps of 5% or higher. The three alerts “Y” (5% price hike), “A” (10% price hike) and “R” (15% price hike) were recoded in SPSS prior to the tests with a value of “1”, and all other thresholds, i.e. “C” (less than 5% price hike) and “Null”, were recoded with a value of “0”. Similarly, the “5%”, “10%” and “15%” moving average thresholds were recoded with a value of “1” and the thresholds below 5% with a value of “0”.
The variable “threshold” was renamed “Threshold_Yes_No” for better readability due to the recoding in previous steps. The variable “sma_1day_alert” was also renamed to “SMA1_Threshold_Yes_No”.
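The binary recoding described above can be illustrated with a minimal Python sketch. The recoding in this research was performed in SPSS; the function names and the sample list below are hypothetical and are used only for illustration, while the alert letters and cut-offs are those given in the text.

```python
# Illustrative sketch only: the original recoding was done in SPSS.
# Function names and sample data are hypothetical.

IMPORTANT_ALERTS = {"Y", "A", "R"}               # 5%, 10% and 15% price hikes
IMPORTANT_MA_THRESHOLDS = {"5%", "10%", "15%"}   # moving average thresholds


def recode_alert_binary(alert):
    """Return 1 for an important alert (Y, A, R); 0 for C, Null or missing."""
    return 1 if alert in IMPORTANT_ALERTS else 0


def recode_ma_threshold_binary(threshold):
    """Return 1 for the 5%, 10% and 15% moving average thresholds, else 0."""
    return 1 if threshold in IMPORTANT_MA_THRESHOLDS else 0


# Hypothetical sample of price hike alerts attached to flagged comments.
alerts = ["Y", "C", None, "R", "A", "Null"]
threshold_yes_no = [recode_alert_binary(a) for a in alerts]
print(threshold_yes_no)
```

Treating anything outside the three important levels as “0” mirrors the first method's assumption that only hikes of 5% or more are worth focusing on.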
In this discussion, “SMA1_Threshold_Yes_No” is used for explanation. A crosstabulation table was constructed as shown in Table 6.13 and a chi-square test for independence was also conducted as shown in Table 6.14.
The other crosstabulation, chi-square and symmetric measure results, which compare the other moving average thresholds against “Threshold_Yes_No”, are shown in Appendix F.
Table 6.13 Threshold_Yes_No * SMA1_Threshold_YesNo Crosstabulation – First Method
                        SMA1_Threshold_YesNo
                        0          1          Total
Threshold_Yes_No   0    37,529     964        38,493
                   1    10,361     1,004      11,365
Total                   47,890     1,968      49,858
The crosstabulation data in Table 6.13 show that, out of the total 49,858 cases, 37,529 fall into the non-important category in both “Threshold_Yes_No” and “SMA1_Threshold_YesNo”, whereas 1,004 cases fall into the important category in both variables. This means that these 1,004 flagged comments should be investigated at a higher priority for potentially illegal activities on FDBs.
Table 6.14 Chi-Square Test – First Method
                          Value       df    Asymp. Sig. (2-sided)
Pearson Chi-Square        927.245a    1     .000
Continuity Correctionb    925.576     1     .000
Likelihood Ratio          777.525     1     .000
A chi-square test was conducted to examine the independence of the two variables, i.e. “threshold” and “SMA 1 Day”. The test rejected the null hypothesis of independence (chi-square = 927.245, df = 1, p-value < 0.01). The correlation between these two variables (r = 0.136), reported in Table 6.15 below, was also found to be significant.
Table 6.15 Symmetric Measures – First Method
                                            Value    Asymp. Std. Error    Approx. T    Approx. Sig.
Ordinal by Ordinal    Spearman Correlation  .136     .005                 30.737       .000c
N of Valid Cases                            49858
This means that the two variables are dependent and that the hypotheses in Experiments 3 and 4 appeared to be statistically supported.
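As a cross-check on the first method, the Pearson chi-square statistic can be recomputed directly from the four cell counts in Table 6.13. The sketch below is not the SPSS procedure used in this research; it is a plain Python illustration, with hypothetical function names, of the standard 2 x 2 formulas. For two binary variables the phi coefficient equals the Spearman correlation, so it can be compared with the value in Table 6.15.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, Yates-corrected chi-square and phi for the table
    [[a, b], [c, d]] (rows: Threshold_Yes_No, columns: SMA1_Threshold_YesNo)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)  # product of the marginals
    diff = a * d - b * c
    pearson = n * diff ** 2 / denom
    corrected = n * max(abs(diff) - n / 2, 0) ** 2 / denom
    phi = diff / math.sqrt(denom)
    return pearson, corrected, phi


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


# Cell counts taken from Table 6.13.
pearson, corrected, phi = chi_square_2x2(37529, 964, 10361, 1004)
p_value = p_value_df1(pearson)
print(round(pearson, 3), round(corrected, 3), round(phi, 3), p_value)
```

The computed statistics should agree closely with the Pearson chi-square and continuity correction values in Table 6.14 and with the correlation in Table 6.15, and the p-value is far below the 0.01 level.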
Similarly, the “Threshold_Yes_No” variable was compared with the other moving averages and time periods, i.e. “SMA 3 Day”, “SMA 5 Day”, “WMA 1 Day”, “WMA 3 Day”, “WMA 5 Day”, “EMA 1 Day”, “EMA 3 Day” and “EMA 5 Day”. These results are shown in Appendix G. All the results show that the p-value of the chi-square test is small (i.e. p-value < 0.01) and hence the null hypothesis of no association between the two variables is rejected. In other words, the hypotheses in Experiment 3 (H1c) and Experiment 4 (H1d) are accepted.
Table 6.16 Summary of Correlations – First Method
All hypothesis tests in the first method rejected the null hypothesis of no association between the threshold variable and moving average variables. All the correlations were positive and statistically significant, although none of them was strong. The correlation results in Table 6.16 show that the highest correlation is between “threshold” and “sma3” (i.e. SMA 3 Day) alerts. Among SMA, WMA and EMA, EMA has the lowest correlations. As with the outcome of Experiment 3 (price-only experiment), SMA appears to be the more suitable moving average technique for the backward analysis. In this case, the relevant authorities could consider using just the SMA calculations (as part of the backward analysis) when running investigations.
The second method of testing the data in the backward analysis experiments is described next, and the results of the statistical tests in relation to “threshold” and “SMA 1 Day” are discussed. This test differs slightly from the first method: it is based on the assumption that the thresholds “Y” (5% price hike), “A” (10% price hike) and “R” (15% price hike) are all important alerts and should be tested separately. Hence, “Y” is recoded as value “1”, “A” as value “2” and “R” as value “3”, while “C” and “Null” are recoded as value “0”. This makes the “threshold” variable ordinal, i.e. its values have a natural order, rather than simply categorical as in the first test. Similarly, the thresholds “<5%”, “5%”, “10%” and “15%” in the moving average price calculations are recoded as values “0”, “1”, “2” and “3” respectively.
In this discussion, “SMA 1 Day” is used for the explanation. A crosstabulation table was constructed as shown in Table 6.17 and a chi-square test for independence was also conducted as shown in Table 6.18.
The other crosstabulation, chi-square and symmetric measure results, which compare the other moving average thresholds against “threshold”, are shown in Appendix G.
Table 6.17 Threshold_Recoded * SMA1_Threshold Crosstabulation – Second Method
SMA1_Threshold
Table 6.18 Chi-Square Test – Second Method
                      Value        df    Asymp. Sig. (2-sided)
Pearson Chi-Square    1319.065a    9     .000
Likelihood Ratio      932.526      9     .000
N of Valid Cases      49858
Table 6.19 Symmetric Measures – Second Method
                                            Value    Asymp. Std. Error    Approx. T    Approx. Sig.
Ordinal by Ordinal    Spearman Correlation  .141     .006                 31.843       .000c
N of Valid Cases                            49858
Table 6.20 Summary of Correlations – Second Method
Threshold                  sma1     sma3     sma5     wma1     wma3     wma5     ema1     ema3     ema5
Correlation Coefficient    .146**   .193**   .202**   .125**   .171**   .191**   .122**   .096**   .084
Sig.                       .000     .000     .000     .000     .000     .000     .000     .000     .000
N                          49858    49858    49858    49858    49858    49858    49858    49858    49858
All hypothesis tests in the second method also rejected the null hypothesis of no association between the threshold variable and moving average variables. All the correlations were positive and statistically significant, although none of them was strong. However, the chi-square test value and the correlation values were higher when each threshold level was treated separately. This suggests that it makes sense to distinguish between “Null” and “C”, “Y”, “A” and “R”, as well as between “<5%”, “5%”, “10%” and “15%” on the moving average side. The correlation results in Table 6.20 show that the highest correlation in the second method is between “threshold” and “sma5” (i.e. SMA 5 Day) alerts.
As in the first method of testing, EMA generates the lowest correlations among SMA, WMA and EMA, and SMA once again produces the highest correlation value. This is consistent with the outcome of Experiment 3 (price-only experiment), where SMA was shown to be the more suitable moving average technique for the backward analysis because it flags neither too few nor too many prices. In this case, the relevant authorities could consider using just the SMA calculations (as part of the backward analysis) when performing investigations.
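The ordinal recoding and the Spearman correlation used in the second method can be sketched in Python as follows. This is not the SPSS procedure used in this research; the function names and the four sample cases are hypothetical, while the code values follow the recoding described for the second method. Because most cases share the same code, tied values are given the average of their rank positions.

```python
import math

# Ordinal codes as described for the second method.
ALERT_CODES = {"Null": 0, "C": 0, "Y": 1, "A": 2, "R": 3}
MA_CODES = {"<5%": 0, "5%": 1, "10%": 2, "15%": 3}


def average_ranks(values):
    """Assign 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-adjusted ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical cases: price hike alert and SMA threshold per flagged comment.
threshold = [ALERT_CODES[a] for a in ["Null", "C", "Y", "R"]]
sma1 = [MA_CODES[t] for t in ["<5%", "5%", "5%", "10%"]]
rho = spearman(threshold, sma1)
print(round(rho, 3))
```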
6.8 Chapter Summary
The purpose of Chapter 6 was to conduct four experiments in order to test the two novel methodologies, namely forward analysis and backward analysis, in relation to the research question. Two experiments belonged to forward analysis and another two experiments belonged to backward analysis.
The aim of the forward analysis is to flag and filter potentially illegal Pump and Dump (P&D) comments. This analysis flags the comments against the predefined P&D Information Extraction (IE) keyword template and then compares ±2 days’ worth of per-minute share prices against the “base price” of each flagged comment. The flagged comments are then labelled with price hike thresholds (i.e. “C”, “Y”, “A” and “R”) accordingly. This allows the flagged comments to be filtered according to their price movements and the indices they belong to. The hypotheses for the two experiments under forward analysis have both been supported empirically and statistically.
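The labelling step of the forward analysis can be illustrated with a short Python sketch. The cut-offs (5%, 10% and 15%) and the labels are those used in this chapter. The function name is hypothetical. The sketch also rests on two assumptions that should be checked against the methodology chapters: that the hike is measured from the base price to the highest price in the ±2 day window, and that “Null” is assigned when no usable price data exist.

```python
def label_price_hike(base_price, window_prices):
    """Label a flagged comment with a price hike threshold.

    Assumptions (not confirmed by this chapter): the hike is the rise from
    the base price to the highest price in the window, and "Null" marks
    comments without usable price data.
    """
    if base_price is None or base_price <= 0 or not window_prices:
        return "Null"
    hike = (max(window_prices) - base_price) / base_price * 100
    if hike >= 15:
        return "R"   # 15% price hike or more
    if hike >= 10:
        return "A"   # 10% price hike or more
    if hike >= 5:
        return "Y"   # 5% price hike or more
    return "C"       # less than 5% price hike


# Hypothetical per-minute prices around a comment with a base price of 100.
print(label_price_hike(100.0, [101.0, 106.0, 103.5]))
```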
The aim of the backward analysis is to detect abnormalities in the price movements and then to match the abnormal stock prices back to the flagged comments, so that the flagged comments can be classified further and false positives reduced. This analysis flags abnormalities in the price movements using moving average techniques (i.e. SMA, WMA and EMA) with the moving average thresholds (i.e. “5%”, “10%” and “15%”) for three different time periods (i.e. 1 Day, 3 Day and 5 Day). It then labels the flagged comments backwards with the moving average thresholds, which further filters the flagged comments while resolving false positives. The hypotheses for the two backward analysis experiments have both been supported empirically and statistically.
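The three moving average techniques named above can be sketched in Python using their textbook definitions. The function names are hypothetical, the window is expressed as a number of price observations rather than days, and the way a price is compared with its moving average to obtain a threshold level is an assumption made for illustration. The exact calculation used in the experiments is described in the methodology chapters.

```python
def sma(prices, window):
    """Simple moving average of the last `window` prices."""
    recent = prices[-window:]
    return sum(recent) / len(recent)


def wma(prices, window):
    """Linearly weighted moving average; the newest price has the largest weight."""
    recent = prices[-window:]
    weights = range(1, len(recent) + 1)
    return sum(w * p for w, p in zip(weights, recent)) / sum(weights)


def ema(prices, window):
    """Exponential moving average with smoothing factor 2 / (window + 1)."""
    alpha = 2 / (window + 1)
    value = prices[0]
    for price in prices[1:]:
        value = alpha * price + (1 - alpha) * value
    return value


def ma_threshold(price, average):
    """Assumed rule: level by which the price exceeds its moving average."""
    deviation = (price - average) / average * 100
    if deviation >= 15:
        return "15%"
    if deviation >= 10:
        return "10%"
    if deviation >= 5:
        return "5%"
    return "<5%"


# Hypothetical price history followed by a new price to be checked.
history = [100.0, 101.0, 99.0, 100.0]
print(ma_threshold(112.0, sma(history, 3)))
```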
It can therefore be concluded that both the novel forward and backward analyses are feasible: comments are first flagged using the P&D IE keyword template and labelled with the price hike thresholds, and the abnormalities in the stock prices are then matched with the flagged comments to classify them further and resolve false positives.
This research also investigates whether Semantic Textual Similarity (STS) can improve the comment flagging process in the forward analysis by comparing the semantic meaning of the comments with the P&D keywords, phrases and sentences. Chapter 6 consists of three experiments for this purpose.