What to ask a model for

A fundamental question we need to answer when creating a model is what we are going to ask it for. Ideally we would just ask the model to give us the winner of the race, since this is what really matters. It is not as simple as this though. Remember that racing is a chaotic event and it is impossible to predict the winner with absolute accuracy. More than this, we as horse players are not just trying to pick the winner of the race; we are trying to capitalize on our opinion by being more precise than the public.

In theory, if we could ask the model for the probability of each horse winning the race, our job as bettors would be as easy as comparing this probability against the odds offered by the pool and betting only the overlays. In the real world it does not work like this, as we have no clue what the final odds are going to be. Even our own bet affects the pools, which means that even if we were able to find the real probability we would not be able to make a profit using it in this way. More than this, what we estimate as the real probability will most of the time be close to what is offered by the public, and when there is a large discrepancy the public will usually be correct, as our model is likely missing some information known to the other bettors.
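The overlay comparison described above can be sketched in a few lines. This is only an illustration of the idealized case: the function name, the sample field and its probabilities and odds are all hypothetical, and it assumes the quoted odds are final, which, as noted, is not true in a real pari-mutuel pool.

```python
def is_overlay(win_probability: float, odds_to_one: float) -> bool:
    """Return True when the expected return of a $1 win bet exceeds $1.

    odds_to_one is the quoted odds (4.0 means 4-1), so a winning $1 bet
    returns odds_to_one + 1 dollars including the stake.
    """
    expected_return = win_probability * (odds_to_one + 1.0)
    return expected_return > 1.0


# Hypothetical field: horse -> (model probability, quoted odds-to-one).
field = {"A": (0.30, 2.0), "B": (0.25, 4.0), "C": (0.10, 8.0)}

# Keep only the horses whose price is better than the model's estimate.
overlays = [name for name, (p, odds) in field.items() if is_overlay(p, odds)]
print(overlays)
```

In this made-up field only horse B qualifies, since 0.25 * 5 is greater than 1 while the other two return about $0.90 per dollar bet.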

This does not mean that developing such a model is useless. It can be helpful not only as a confirmation of the validity of the whole methodology of model creation, but also as a starting point for more sophisticated models that can potentially show profitability in the long run. Most likely the creation of this type of model is the best start, and we will discuss it extensively later, although we know that this will not be enough to beat the game.

Another thing we can ask a model for is an opinion about how likely this particular race is to produce a high-odds winner. In other words, what we are asking for now is a binary value serving as an indicator of what kind of winner we should be looking for. Obviously this is a much easier question than the previous one, although its answer remains open to multiple interpretations. Note though that this answer can initiate a completely new process of research, trying to identify the winner or winners using a subset of the starters of the race.

Note that our question to the model does not necessarily have to do with the final outcome of the race. For example, we can very well ask the model to identify the horse that will take the lead, or how fast a race will go, or anything else.

Given the infinite set of questions we can ask a model, it becomes obvious that asking the right question is a critical factor for its success, and we will talk about this at greater length later.

 

Presenting input to a model

When we horseplayers talk about data we immediately think of the racing form, which covers a lot of information for each starter, including his last 10 races in a very detailed format, some per-race figures like Beyer and track variant, lifetime statistics about the horse, jockey and trainer, and much more. Our first challenge towards the creation of a model is to decide how we are going to feed it with these data.

We can categorize the data into the following general groups:

– Raw data
– Derivatives
– Handicapping Factors

By raw data I am referring to basic measurements like the final time of the race, the intermediate fractions, the position of the horse in each fraction, the weight, medication and equipment information and any other type of data that can be directly measured or observed.

These data are transformed into higher-level derivatives such as the track variant and speed or pace figures. Note that these figures apply to a specific past performance. We can go a step further and create another type of derivative that applies to the horse in a more general way, measuring some attribute with a single figure. An example of this type of figure is the well-known BRIS Prime Power, which measures the ability of a horse with a single number; another example is the Quirin speed index, which measures the early speed of the horse.

There is another type of figure we can create, this time at the race level, describing a property of the race. For example, we can create a figure to measure how much early speed a race has, or another figure measuring the recency of the race, in other words the layoffs of all the horses in it.

We can go even further by creating more general figures at the race track or distance level. For example, we can create a figure to measure how much a specific track favors early or late runners.

Beyond the data that can be expressed as figures, we need to provide another type of input to the model, describing a handicapping factor such as first-time Lasix or a female facing males for the first time. This type of data can easily be described in a binary format, since a horse either satisfies a factor or does not.

Another very important procedure we need to define is the normalization of the data to a standard format that is easier for the model to process. To illustrate the need for this step, let’s assume that we are providing the model with the number of days since the last start. Note that this metric can range from very small, like 2, to very large, like 1,000 days. Instead of dealing with the absolute number of days, it is easier for the model to have some preprocessed representation of them, allowing it to easily conclude that a layoff of 800 days is much more similar to one of 1,000 than a layoff of 2 is to one of 202, although both pairs differ by 200 days. There are several ways to normalize the data, depending on their nature and range, and we will describe them later.
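One possible normalization for the layoff example is a logarithmic scale. The sketch below is only an assumption about how this could be done, not necessarily the method described later; the function name and the 1,000-day cap are hypothetical choices.

```python
import math


def normalize_layoff(days: int, max_days: int = 1000) -> float:
    """Map days since the last start to the range [0, 1] on a log scale.

    max_days is an assumed cap; longer layoffs are treated as equal to it.
    """
    days = max(1, min(days, max_days))  # clamp into [1, max_days]
    return math.log(days) / math.log(max_days)


# Both pairs differ by 200 days, but the log scale separates them very differently.
gap_long = normalize_layoff(1000) - normalize_layoff(800)
gap_short = normalize_layoff(202) - normalize_layoff(2)
print(round(gap_long, 3), round(gap_short, 3))
```

On this scale the 800 and 1,000 day layoffs end up very close to each other, while 2 and 202 days end up far apart, which is the behavior the paragraph asks for.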

At this point it becomes obvious that the main challenge is to decide what kind of data are needed to sufficiently describe the race, so the model has enough input to derive accurate results. Addressing this challenge requires handicapping expertise and knowledge of statistics, as we saw before. Of course, handicapping beliefs should be subject to some type of statistical confirmation before we decide that a certain factor really has some value and should be part of the input. Even if we present some completely random factors to a model, it is possible for it to find out that they have no value and ignore them, but this will increase the processing power needed and the complexity of the algorithm. If instead we do some preprocessing to decide which factors are valid, the model will be able to function faster and possibly more accurately.

The final question we need to answer regarding the input to the model is whether we are going to pass all the starters of the race to it at once or pass them one by one. Racing is a complex event in the sense that its outcome depends not only on the performance of each individual horse but on the combination of the past performances of all the starters in the race. For example, having multiple front runners that will contest the pace requires a completely different approach than having a single horse that will make the lead.

We all use a model

All bettors use some type of model, whether they do it consciously or not. Although "model" is not a common word in conversations among horse players, we can make a case that every opinion they state is the outcome of one, even though they might not be able to articulate it as such.

A model is an abstraction used to approximate an event that cannot be described analytically. Models are widely and successfully used to predict chaotic events, weather being the most common example. A chaotic event differs from both a purely stochastic (random) event and a deterministic one in the sense that, although in theory it can be considered deterministic, the volume of data needed to estimate it is so large that an analytical resolution becomes impossible, to the extent that it resembles a purely random event.

Horse racing is a chaotic event. If we had every possible piece of information about the starters of the race, the racetrack, the jockeys and much more, we would be able to create an extremely complicated model to predict its outcome. Something like this is of course impossible, and this is exactly why the game remains interesting as a sport and as a betting proposition.

All opinions about the game are the product of abstraction and reduction. The process of weighting each handicapping factor and composing a final opinion is called handicapping. Based on this definition, a model is nothing more than a function that receives as input the data that are important to the race and returns some numerical representation reflecting an opinion that can be translated into a bet.

To develop such a model we have to answer the following questions:

– What data do we need to provide as the input to the model?

– What will the model calculate for us?

– How will the model process the data?

These questions will be the topics of the next postings…

Losers make the best mentors

A book with the title ‘Losing at the track: How I managed to lose ten years’ worth of wages studying the race form’

The most underestimated aspect of horse betting is proper psychology. Horse players, authors, public handicappers and anyone else who is called an expert of the game tend to overestimate handicapping as the most important factor of success at the track.

The belief that he is smarter than the crowd is what keeps any horse bettor betting, despite the fact that most of them are chronic losers. This fallacy is created and cultivated by what Daniel Kahneman, in his monumental book Thinking, Fast and Slow, calls the illusion of validity and the illusion of skill.

The average horse bettor maintains a fallacious hope of being able to grind out a profit from the game, something that can be seen as the result of him viewing the outcome of the race as a deterministic event, not taking into account the effect of luck, which is what contributes the most to it. An immediate consequence of this is the self-condemnation that a horse player always feels after a losing bet, plus his effort to explain the outcome of the race and re-handicap it in retrospect, trying to find the handicapping approach that would have given him the winner.

What the average horse player (and some of those who consider themselves to be winners) fails to do is raise the correct question, which is not who the expert is or what the proper way to bet is, but rather whether an expert really exists and whether this proper way to bet is something feasible.

I do not claim to have the answer to this dilemma. At least I cannot reach a conclusion using a rational and inductive process of thought. More than this, I believe that such an answer does not exist. As difficult as it is to prove that a correct approach and real expertise exist in this game, it is equally easy to define wrong theories presenting a guaranteed way to disaster. Remember, although it might be impossible to find a sure way to beat a game, a sure way to lose is always available: try, for example, betting all the numbers in a roulette spin!

Indeed I think that we have more to learn from losers than from winners when it comes to horse racing. What they can teach us is a form of silent evidence that most likely will never become a New York Times best seller, but what they have to say constitutes valuable information if we use it and interpret it properly. By the way, can you imagine a book with the title ‘Losing at the track: How I managed to lose ten years’ worth of wages studying the race form’? Who would even buy this book? In my opinion what a loser has to teach us is worth more than what a self-proclaimed ‘expert’ preaches as the solution to the game. Avoiding the loser’s mistakes might contribute more to our bottom line than following the new handicapping wizard who fills his new book with stories and anecdotes, making fallacious claims that happen to look correct and are certainly something that the public wants to read about.

I have seen quite a few horse gamblers who start their betting day with a couple of hundred in their pockets. What most of them have in common is that by the end of the day they will leave the track broke. Besides going broke, most of the time they manage to bet the whole card, betting a few bucks on each race… This routine is repeated over and over again, for weeks, months and years. What I find strange is that they never change their bad habits. Instead, as time goes by, they become even more passive and conservative, taking very little risk each time they bet while trying to make as many bets as possible. I think this type of player represents the weakest possible gambler. He never tries to bet all his money on a single race, shooting for a good score; instead he is always thinking that he has to bet the next race and remain in the game with his limited bankroll, turning himself into a very soft opponent for someone who is willing to take some measured risks.

I think that avoiding this kind of behavior is the very first step someone can take to improve his game. It seems easier than it really is though. To get rid of this bad habit someone needs to work hard on improving the way he thinks about the game. He has to defeat his intuitive impression that the outcome of a race is a deterministic event; he has to understand that it is much preferable to bet a single race per day instead of betting the whole card, but he also needs to convince his subconscious about it. He needs to find the strength to stay out of the game for long periods of time and to be very aggressive when he decides to commit to a bet. Of course he has to make clear to himself that the purpose of his involvement with the game is only to make money and nothing else.

Without providing any guarantees, proper psychology is a very significant aspect of our performance as gamblers, and we need to work very hard in order to achieve it.

Discussing Standard Deviation

Discussing Standard Deviation In Horse Racing

It is common among horseplayers, when referring to their betting systems, to define performance by ROI, usually complemented by their win or place percentage.

Setting aside the complexities of ROI calculation and sample selection, which seem to be underestimated by many, I have to note that the standard deviation of the betting model is almost never mentioned, although it is equally important.

To make it more concrete let’s consider the following example:

An expert horse bettor has a betting model that, based on his simulations, has a 1.04 ROI (per $1) with a standard deviation of $0.25, and it has an opinion about a race (selection rate) about 30% of the time.

If this player follows 5 racetracks per day he will find approximately 15 bets per day. Let’s assume that he plays his model for a period of four weeks. Assuming five racing days per week he will bet in total 300 races. Let’s also assume that he uses a flat bet of $500 per race.

After 300 races the total amount of his bets is
300 * $500 = $150,000
Based on his 1.04 ROI he expects to win: $6,000, or $20 / race
Since he bets $500 / race his SD is 500 * 0.25 = $125 / race
The standard deviation of his average profit per race over the sequence of 300 races is going to be:
125 / SQRT(300) = 125 / 17.3 = $7.2 / race
So for the 300 races sample we expect him to be (three standard deviations around the mean contain almost the total population, that’s why we multiply the SD by 3)

Losing from 20 – 3*7.2 = -1.6 / race or -480
Winning to 20 + 3*7.2 = 41.6 / race or +12,480

It is easy to see that it is quite possible for a winning strategy to lose even after a sequence of 300 races.

Now doing the same calculations for ten months (3,000 races) we have the following results:

3,000 * $500 = $1,500,000
Expected : + 60,000
SD = 125 / SQRT(3000) = $2.28 / race

Min: 20 – 3 * 2.28 = 13.16 or TOTAL $39,480
Max: 20 + 3 * 2.28 = 26.84 or TOTAL $80,520

So the same strategy in a sequence of 3,000 races will present a profit with pretty high certainty (over 95%)!
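The arithmetic above can be reproduced with a short script. It is a sketch under the same assumptions as the example (flat stakes, independent bets, a three standard deviation band); the function and parameter names are hypothetical. Because it does not round the standard deviation along the way, its totals differ by a few dollars from the hand calculation.

```python
import math


def profit_range(n_bets, stake, roi, sd_per_dollar, k=3.0):
    """Return (expected, low, high) total profit for n_bets flat bets.

    Assumes independent bets, so the SD of the per-bet average profit
    shrinks with the square root of the number of bets.
    """
    edge_per_bet = stake * (roi - 1.0)            # expected profit per bet
    sd_per_bet = stake * sd_per_dollar            # SD of a single bet
    sd_of_mean = sd_per_bet / math.sqrt(n_bets)   # SD of the per-bet average
    low = (edge_per_bet - k * sd_of_mean) * n_bets
    high = (edge_per_bet + k * sd_of_mean) * n_bets
    return edge_per_bet * n_bets, low, high


for n in (300, 3000):
    expected, low, high = profit_range(n, 500, 1.04, 0.25)
    print(n, round(expected), round(low), round(high))
```

The lower bound is negative for 300 bets and positive for 3,000, which is the point of the comparison.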

Please note that the winning frequency or the longest losing streak is nowhere mentioned, since it is irrelevant to the calculations…

Impact of Early Speed

A common handicapping factor, widely used today, deals with the race shape and depends on the running style of the horse.

There are various ways to measure the early speed of a horse, like Quirin figures or BRIS running styles.

Each call position in every past performance of a horse is used to compose these figures, which can be in numeric format (Quirin) or in a more descriptive string format like Early, Sustained etc.

The big question is what impact such a metric has on the outcome of a race.

The usual interpretation given by most handicappers is that the more ‘early’ types we have in a race, the better the chances of a late runner to win, and vice versa.

Some database research can help us to improve our opinion about this, as with any other handicapping factor.

We assign to each race a figure (let’s call it APF for average pace figure) representing the average quirin figure for all its starters and calculate the average and standard deviation of it.

Querying my data base reveals that the average APF is close to 3.0 while its standard deviation is 1.3

Based on this we can conclude that races having APF more than 4.3 can be seen as having a lot of speed, while those having APF less than 1.7 have very little.

The maximum number for Quirin figures is 8, so let’s focus our research on this type of runner.
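The APF calculation and the one standard deviation cut-offs can be sketched as follows. The race identifiers and Quirin points are made-up sample data, and the labels "fast", "average" and "slow" are hypothetical names; on a real database the mean and standard deviation would come out close to the 3.0 and 1.3 quoted above rather than the values of this tiny sample.

```python
from statistics import mean, pstdev


def classify_races(races):
    """races maps a race id to the list of Quirin points of its starters.

    Returns (apf, labels): the average pace figure of each race, and a
    label based on one standard deviation around the mean APF.
    """
    apf = {race_id: mean(points) for race_id, points in races.items()}
    values = list(apf.values())
    mu, sigma = mean(values), pstdev(values)
    labels = {}
    for race_id, value in apf.items():
        if value > mu + sigma:
            labels[race_id] = "fast"      # a lot of early speed
        elif value < mu - sigma:
            labels[race_id] = "slow"      # very little early speed
        else:
            labels[race_id] = "average"
    return apf, labels


# Hypothetical sample of five races.
races = {
    "R1": [8, 7, 6, 5],
    "R2": [3, 2, 4, 3],
    "R3": [1, 1, 1, 1],
    "R4": [3, 3, 2, 4],
    "R5": [4, 2, 3, 3],
}
apf, labels = classify_races(races)
print(apf)
print(labels)
```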

In races having APF more than 4.3 the total number of horses having Quirin equal to 8 is 3,233, of which 483 were winners, or approximately 15%

In races having APF less than 3.0 the total number of horses having Quirin equal to 8 is 725, of which 150 were winners, or approximately 20%

In races having APF less than 1.7 the total number of horses having Quirin equal to 8 is 54, of which 16 were winners, or approximately 30%

These results indicate that this handicapping approach does seem to be correct as far as the final outcome of the race is concerned.

Our final step will be to investigate how this concept is embedded in the price of the horse:

In races having APF more than 4.3 horses having Quirin equal to 8 have a ROI of 0.78

In races having APF less than 3.0 horses having Quirin equal to 8 have a ROI of 1.03

In races having APF less than 1.7 horses having Quirin equal to 8 have a ROI of 1.11

So based on these results we can assume that the common opinion holds true.

Now let’s refine our research, extending it to report all available Quirin points (ranging from 1 to 8), and on top of that let’s see how our data perform after we break down our races into sprints and routes.
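For reference, a ROI per $1 like the ones quoted above can be computed from a list of flat win bets as in the sketch below. The function name and the sample bets are hypothetical, and the sketch assumes payoffs are expressed as odds-to-one.

```python
def roi_per_dollar(results):
    """results is a list of (won, odds_to_one) pairs for flat $1 win bets.

    Returns the total amount returned divided by the total amount bet,
    so 1.00 is break-even and 0.78 means losing 22 cents per dollar.
    """
    if not results:
        raise ValueError("no bets to evaluate")
    # A winning $1 bet returns the odds plus the original stake.
    returned = sum(odds + 1.0 for won, odds in results if won)
    return returned / len(results)


# Hypothetical sample: two winners out of five bets.
sample = [(True, 4.0), (False, 2.5), (False, 9.0), (True, 1.0), (False, 6.0)]
print(roi_per_dollar(sample))
```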

Here you can see a detailed analysis of the performance of Quirin Speed Points

Code to generate the quirin statistics

Using the data contained in these tables, we create a summarized consensus of them that can be found here:

Summarized quirin statistics

Every horse in the data base will be considered in this matrix only once. We can see that it creates the distribution of ROI for every possible combination of the following parameters:

  • Race classification
  • Distance (Route or Sprint)
  • How fast the race is, based on the average Quirin points
  • Quirin points of the horse

As we can see, this curve appears to be roughly normally distributed. It is displayed sorted by ROI and divided into top, average and low portions, using one standard deviation as the measurement… We can create a point system to represent the value of each horse and write a betting simulator to verify its performance.

The point assignment methodology I use is 2, 1, 0, -1, -2, based on the number of standard deviations that each starter is away from the mean value.

One thing we need to understand here is that it is not the absolute score of each horse that will lead us to a betting decision.

Take for example a race where all the horses happen to be assigned the same number… It could be any of the valid 2, 1, 0, -1, -2. Obviously in this particular race we cannot make a betting selection based on the Quirin points, since all of the starters appear equal.
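A minimal sketch of this point assignment follows. The exact cut-offs (whole standard deviations, with the boundaries going to the higher score) are an assumption, as are the function names and the sample mean and standard deviation; the second helper only encodes the observation that a race where every starter has the same score cannot be separated.

```python
def points_from_roi(roi, mean_roi, sd_roi):
    """Score a starter 2, 1, 0, -1 or -2 by how many standard deviations
    the ROI of its matrix cell lies from the mean ROI (assumed cut-offs)."""
    z = (roi - mean_roi) / sd_roi
    if z >= 2:
        return 2
    if z >= 1:
        return 1
    if z > -1:
        return 0
    if z > -2:
        return -1
    return -2


def can_separate(scores):
    """A race offers a selection only if the starters' scores differ."""
    return len(set(scores)) > 1


# Hypothetical matrix statistics: mean ROI 0.80 with standard deviation 0.10.
cell_rois = [1.05, 0.95, 0.80, 0.65, 0.50]
scores = [points_from_roi(r, 0.80, 0.10) for r in cell_rois]
print(scores, can_separate(scores), can_separate([0, 0, 0, 0]))
```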

This decision-making mechanism will be the topic of the next posting, which will try to turn this approach into the best possible betting strategy…