6.2 Future Work
Due to both time constraints and the intended scope of the project, some of the ideas that emerged during the project were not explored. The following sections summarise these ideas as suggestions for future work.
6.2.1 Alternative Approaches
There is a wealth of alternative approaches to binary classification that were not explored fully. Given the success of the Naive Bayes and WKNN style approaches, it may make sense to look into methods that build on these, for example distance metric learning for the k-nearest neighbour classifier [39] and a weighted one-order dependency extension to the Naive Bayes classifier [19].
Other possibilities include ensemble classifiers such as random forests [5] and other general ensemble techniques [31] such as AdaBoost, although these may be unsuitable given the noisy nature of the data.
6.2.2 Shorter Retraining Intervals
As noted in Section 5.4, another thing we did not investigate is the effect that shorter retraining intervals might have on classifier performance. To clarify, the retraining interval is the length of time for which a trained classifier is used. In the final test phase, we split the data into 10 equally sized blocks, but why not 100 or even 1000 blocks? The only real limitation here is the time taken to retrain classifiers. For example, if it took t units of time to train for 10 blocks, it would take 10t units of time to train for 100 blocks.
In practice, where the meta-tipster is used to predict the outcomes of events continuously, a sensible approach might be to retrain overnight using the previous day’s additional data, which would keep the computation feasible.
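As a rough illustration of the retraining-interval trade-off, the sketch below runs a walk-forward evaluation with a configurable number of blocks. It assumes an expanding training window in which the first block is used only for training; whether the final test phase worked exactly this way is an assumption. `walk_forward` and `majority_trainer` are hypothetical names, and the latter is only a toy stand-in for a real classifier.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A trained classifier maps a feature vector to a predicted label.
Classifier = Callable[[Any], Any]
Trainer = Callable[[List[Tuple[Any, Any]]], Classifier]


def walk_forward(data: Sequence[Tuple[Any, Any]], n_blocks: int,
                 train: Trainer) -> Tuple[List[Any], int]:
    """Evaluate a classifier with an expanding training window.

    `data` is a chronologically ordered list of (features, label) pairs,
    split into `n_blocks` equally sized blocks (the last block also takes
    any remainder). Block 0 is used only for training; every later block is
    predicted by a classifier retrained on all data preceding it.
    Returns the predictions and the number of retrains performed.
    """
    if not 2 <= n_blocks <= len(data):
        raise ValueError("need 2 <= n_blocks <= len(data)")
    block_size = len(data) // n_blocks
    predictions: List[Any] = []
    retrains = 0
    for b in range(1, n_blocks):
        start = b * block_size
        end = len(data) if b == n_blocks - 1 else start + block_size
        classifier = train(list(data[:start]))  # retrain on everything seen so far
        retrains += 1
        predictions.extend(classifier(x) for x, _ in data[start:end])
    return predictions, retrains


def majority_trainer(history: List[Tuple[Any, Any]]) -> Classifier:
    """Toy stand-in for a real classifier: always predict the majority label."""
    wins = sum(1 for _, label in history if label == 1)
    majority = 1 if 2 * wins >= len(history) else 0
    return lambda features: majority


# Usage: compare the number of retrains for different block counts.
data = [(i, i % 2) for i in range(1000)]
for n in (10, 100, 1000):
    _, retrains = walk_forward(data, n, majority_trainer)
    print(n, retrains)
```

Under these assumptions, moving from 10 to 100 blocks raises the number of retrains from 9 to 99, in line with the roughly tenfold cost described above; in practice each retrain may also become slightly more expensive as the training history grows.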
6.2.3 Markets With Several Outcomes
Throughout the project, we focussed on two-outcome markets, as in the TennisInsight data set, yet much of the work is applicable to markets with more than two outcomes. Both the simulated world and the data representations considered can handle markets with a variable number of outcomes.
The difference now is that the data set will be imbalanced. For a market with $m$ outcomes, each instance of that market yields $(m-1)$ “lose” data points and only one “win” data point. This also means that we go from equal prior probabilities of $\frac{1}{2}$ in the two-outcome case to prior probabilities of $\frac{1}{m}$ for a win and $1 - \frac{1}{m}$ for a lose in the $m$-outcome case.
The classifiers would have to take this imbalance into account. For example, in the standard k-nearest neighbour algorithm it no longer makes sense to give all data points equal weighting; we may instead weight win data points more heavily than lose data points.
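One way to weight win points more heavily is a class-weighted vote among the k nearest neighbours. The sketch below is a minimal illustration, not the classifier used in this project: the choice of weights (wins weighted by $m-1$ so that both classes carry equal total weight) is an assumption, the data points are invented, and `inverse_prior_weights` and `weighted_knn_predict` are hypothetical names. It applies class weighting only; distance weighting as in WKNN could be layered on top.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[Sequence[float], str]  # (features, "win" or "lose")


def inverse_prior_weights(m: int) -> Dict[str, float]:
    """Class weights for an m-outcome market. Each market instance yields one
    "win" point and m - 1 "lose" points, so weighting wins by m - 1 gives
    both classes equal total weight."""
    return {"win": float(m - 1), "lose": 1.0}


def weighted_knn_predict(train: List[Point], query: Sequence[float], k: int,
                         class_weight: Dict[str, float]) -> str:
    """Predict by a class-weighted vote among the k nearest neighbours
    (Euclidean distance)."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes: Dict[str, float] = defaultdict(float)
    for _, label in neighbours:
        votes[label] += class_weight[label]
    return max(votes, key=votes.get)


# Usage: one win and two loses among the 3 nearest neighbours.
train = [((0.0,), "win"), ((1.0,), "lose"), ((1.1,), "lose"), ((5.0,), "win")]
print(weighted_knn_predict(train, (0.2,), 3, {"win": 1.0, "lose": 1.0}))
print(weighted_knn_predict(train, (0.2,), 3, inverse_prior_weights(4)))
```

With equal weights the two nearby lose points outvote the single win point; with the weights for a four-outcome market, the win point carries three votes and the prediction flips.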
6.2. Future Work Chapter 6. Discussion
6.2.4 We Need More Data!
The main problem we faced when considering the profit streams of the classifiers is that the majority of the hypothesis tests conducted were inconclusive at the 5% level. The only real solution to this is to conduct the tests using more data. Since the original TennisInsight data collection was performed, a further six months’ worth of data (January to June 2012) has become available at the time of writing, and this has not been considered. As well as the TennisInsight data set, it is certainly worth considering other sources of data. OLBG is one possibility, although its 45-day tip history and sensitive flood detection pose problems. TennisInsight has launched another (beta) site, SportInsight¹, covering professional hockey, baseball, rugby and football. It happens to use the same site template as TennisInsight, which would make data collection easier given the current scrapers.
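To give a feel for how much more data might be needed, the sketch below computes an approximate sample size for testing whether the mean profit per bet differs from zero. It assumes a two-sided one-sample z-test on profit per unit stake, which may not match the tests actually used; the target mean profit, standard deviation and the name `bets_needed` are all hypothetical. The formula is $n \approx \left((z_{1-\alpha/2} + z_{1-\beta})\,\sigma / \delta\right)^2$, where $\delta$ is the true mean profit per bet and $\sigma$ its standard deviation.

```python
import math
from statistics import NormalDist


def bets_needed(mean_profit: float, sd_profit: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of bets for a two-sided one-sample z-test of
    H0: mean profit per bet = 0 to detect a true mean of `mean_profit`
    with the given power (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for 5%
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(((z_alpha + z_beta) * sd_profit / mean_profit) ** 2)


# Hypothetical example: a 5% edge per unit stake with a standard deviation of 1.
print(bets_needed(0.05, 1.0))
```

Because the required sample size scales with $1/\delta^2$, halving the edge that needs to be detected roughly quadruples the number of bets required, which is why small but genuine edges are hard to confirm at the 5% level.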
In addition to these kinds of tipping sites, there are hundreds of independent tipsters who run their own “blogs” providing sporting tips. Unfortunately, sourcing such sites is difficult, and the posting style is often inconsistent, which would make data collection itself a challenge. However, the site Blogabet² verifies and indexes a number of blogs run independently by tipsters using a standard post template for tips, which may make it worth investigating further. Blogabet covers a wide range of sports including tennis, football, rugby and boxing.
Collecting data from more than one source will introduce new challenges in matching records across sources. For example, what if the name of a tennis player is spelt incorrectly in one source but not in another? Such small discrepancies could be overcome by designating one “trusted” source and mapping any discrepancies in other sources to the trusted source’s identifiers.
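A minimal sketch of such a mapping is shown below, assuming fuzzy string similarity from Python’s standard `difflib` module is an acceptable way to detect small spelling discrepancies. The function name `map_to_trusted`, the identifiers and the similarity cutoff are hypothetical; names that do not match closely enough are left unmapped for manual review rather than guessed.

```python
import difflib
from typing import Dict, Iterable, Optional


def map_to_trusted(names: Iterable[str], trusted: Dict[str, int],
                   cutoff: float = 0.85) -> Dict[str, Optional[int]]:
    """Map names from a secondary source to trusted-source identifiers.

    Matching is case-insensitive and uses difflib's similarity ratio; a name
    with no trusted name scoring at least `cutoff` maps to None.
    """
    lowered = {name.lower(): ident for name, ident in trusted.items()}
    candidates = list(lowered)
    mapping: Dict[str, Optional[int]] = {}
    for name in names:
        match = difflib.get_close_matches(name.lower(), candidates, n=1, cutoff=cutoff)
        mapping[name] = lowered[match[0]] if match else None
    return mapping


# Usage: one misspelling, one case difference and one unknown player.
trusted = {"Novak Djokovic": 1, "Rafael Nadal": 2}
print(map_to_trusted(["Novak Djokovich", "rafael nadal", "Roger Federer"], trusted))
```

A high cutoff trades recall for precision: it is better to flag an unmatched name than to silently merge two different players.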
6.2.5 Bookmakers As Tipsters
Something else that may be worth investigating is treating the bookmakers themselves as tipsters. Although most of the time the odds set by different bookmakers for a market agree and do not vary a great deal, perhaps any variation that does occur carries some hidden meaning. Maybe Ladbrokes³ always has inflated opening odds for the outsiders whilst William Hill⁴ has inflated opening odds for favourites.
Alternatively, a time series analysis of how the odds set by various bookmakers tend to change from the market opening time to closing time for a specific kind of event has the potential to be revealing. If a particular combination of odds drift rates had a meaning attached to it, and this meaning could be detected early on, this information could be exploited.

¹ http://www.sport-insight.com
² http://www.blogabet.com/directory
³ http://www.ladbrokes.com/
⁴ http://www.williamhill.com/
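As a starting point for such an analysis, the sketch below computes a simple per-bookmaker drift rate for one runner: the relative change in decimal odds between the earliest and latest snapshot, divided by the elapsed hours. This definition of “drift rate”, the `OddsSnapshot` record, the bookmaker labels and the example prices are all hypothetical; a real analysis would likely use the full time series rather than just its endpoints.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class OddsSnapshot:
    bookmaker: str
    time: datetime
    decimal_odds: float  # odds quoted for a single runner


def drift_rates(snapshots: List[OddsSnapshot]) -> Dict[str, float]:
    """Per bookmaker, relative change in decimal odds per hour between the
    earliest and latest snapshot. Positive means the odds drifted out."""
    by_bookmaker: Dict[str, List[OddsSnapshot]] = {}
    for s in snapshots:
        by_bookmaker.setdefault(s.bookmaker, []).append(s)
    rates: Dict[str, float] = {}
    for bookmaker, snaps in by_bookmaker.items():
        snaps.sort(key=lambda s: s.time)
        first, last = snaps[0], snaps[-1]
        hours = (last.time - first.time).total_seconds() / 3600
        if hours <= 0:
            continue  # need at least two distinct snapshot times
        rates[bookmaker] = (last.decimal_odds / first.decimal_odds - 1) / hours
    return rates


# Usage with invented prices: one bookmaker drifts out, the other shortens.
snaps = [
    OddsSnapshot("BookmakerA", datetime(2012, 1, 1, 10), 2.0),
    OddsSnapshot("BookmakerA", datetime(2012, 1, 1, 20), 2.2),
    OddsSnapshot("BookmakerB", datetime(2012, 1, 1, 20), 2.7),
    OddsSnapshot("BookmakerB", datetime(2012, 1, 1, 10), 3.0),
]
print(drift_rates(snaps))
```

The resulting vector of rates per event could then serve as features for the same classifiers used elsewhere in the project, if such patterns turn out to carry information.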
6.2.6 A New Tipping Website
Experiments with the simulated world showed that we can extract knowledge from the tipsters more efficiently if they always tell us their predictions, even when they deem a selection to have “no value”. To my knowledge, at the time of writing, no existing tipping site implements such a tipping style.
As such, another possible extension is to set up a tipping website which does exactly that. I suggest three possible ways of achieving this:
Direct Percentages
Tipsters provide percentage predictions directly by adjusting a series of sliders corresponding to the probability they think each runner has of winning.
Lowest Odds Acceptable
Tipsters provide the lowest odds they would consider for placing a bet on the runner in question.
Be The Bookmaker
In a virtual world, each tipster lays a book on real-world events of their choice and simultaneously places bets with other user-controlled bookmakers.
Perhaps the greatest potential for entertainment for the tipsters lies with the “be the bookmaker” option, which operates almost like a non-anonymous virtual betting exchange. Since it is unrealistic for one tipster to lay a book for all events every day, tipsters could form teams that operate under a single name and compete against other teams.
All of these options would be community driven. The incentive for tipsters is partly “just for fun”, with the added potential for prizes (monetary or otherwise) to be awarded to the best-performing tipsters.
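The first two options both yield a probability estimate per runner. The sketch below shows one possible conversion, assuming decimal odds and a tipster who quotes exactly their break-even price. With a unit stake at decimal odds $o$ and win probability $p$, the expected profit is $p(o-1) - (1-p) = po - 1$, so the lowest acceptable odds satisfy $o_{\min} = 1/p$. Rescaling the sliders so they sum to one is a hypothetical design choice to keep each tipster’s probabilities coherent, and both function names are illustrative only.

```python
from typing import Dict


def normalise_sliders(sliders: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw slider positions (e.g. 0-100 per runner) so that the
    runners' probabilities sum to 1."""
    total = sum(sliders.values())
    if total <= 0:
        raise ValueError("at least one slider must be positive")
    return {runner: value / total for runner, value in sliders.items()}


def implied_probability_from_lowest_odds(lowest_decimal_odds: float) -> float:
    """A unit bet at decimal odds o has expected profit p * o - 1, which is
    non-negative iff o >= 1 / p, so the lowest acceptable odds imply
    p = 1 / o_min."""
    if lowest_decimal_odds <= 1:
        raise ValueError("decimal odds must exceed 1")
    return 1 / lowest_decimal_odds


# Usage: sliders that over-commit (60% + 60%) are rescaled to 50/50,
# and a minimum acceptable price of 1.25 implies an 80% win probability.
print(normalise_sliders({"Player A": 60.0, "Player B": 60.0}))
print(implied_probability_from_lowest_odds(1.25))
```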
Appendix A. Odds
This appendix provides details regarding odds and bookmakers. Section A.1 gives details on the different ways odds are typically represented and how to convert between them. Section A.2 discusses the concept of the over-round which is introduced by bookmakers into the odds so they can make a profit.