Don’t rely on intuition and logic

The most common error among horse players is to rely on intuition and logical thinking. The combination of selective memory with a thought process that tries to interpret a handicapping factor reaches, most of the time, a completely wrong conclusion. The good news is that any handicapping scenario that can be defined as a function of concrete data can easily be tested using a database. This does not mean that a database can completely substitute for the human factor, but it can certainly validate it, helping the handicapper reach more objective and accurate conclusions.

Two logical handicapping factors

In this posting I will analyze a couple of such factors that appeared recently on Pace Advantage, proposed by a couple of handicappers who, using their experience and logic, arrived at the following:

A horse claimed in his last race, where he started at more than 10-1 odds, is a good bet in his next race, as his trainer must have been convinced of the horse's ability in order to try such a risky claim.

Bet a horse when a top jockey stays on for a second time after the horse runs out of the money. It seems reasonable to assume that if a top jockey decided to stick with a horse even after he ran out of the money with him in his last race, he has a very good reason, since it would have been very easy for him to select any other mount.

Horse claimed in his last race at more than 10-1

To test this factor I will compare all the horses who were claimed last time at odds of more than 10-1 against all the others who were claimed at lower odds. To make comparisons easier I will add as a constraint the existence of precisely one such horse in a race, meaning that I will not consider races that had two or more horses from either category.

After creating these factors and running them through the database I get the following results:

factor                  winners     losers       Win%        ROI
================================================================
claimed more than 10-1       28        191       12.8       0.86
claimed less than 10-1      496       2279       17.9       0.85

Calculating Chi Square using observed results only

chi2 = 3.64
critical value = 3.84

not significant
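
The "observed results only" test compares the win frequencies of the two groups directly. A minimal Python sketch of how such a statistic could be computed, assuming a plain Pearson chi-square on the 2 X 2 winners/losers table without continuity correction. The function name and the hard-coded 5% critical value are illustrative assumptions, not the database code used for this post:

```python
def chi_square_2x2(win_a, lose_a, win_b, lose_b):
    """Pearson chi-square for a 2 x 2 winners/losers table (1 degree of freedom)."""
    n = win_a + lose_a + win_b + lose_b
    row_a, row_b = win_a + lose_a, win_b + lose_b
    col_w, col_l = win_a + win_b, lose_a + lose_b
    # shortcut formula, algebraically the same as summing (O - E)^2 / E over the cells
    return n * (win_a * lose_b - lose_a * win_b) ** 2 / (row_a * row_b * col_w * col_l)


CRITICAL_VALUE_5PCT_1DF = 3.841  # standard chi-square critical value, alpha = 0.05, df = 1

# counts taken from the claimed more / less than 10-1 table
chi2 = chi_square_2x2(28, 191, 496, 2279)
verdict = "significant" if chi2 > CRITICAL_VALUE_5PCT_1DF else "not significant"
print(round(chi2, 2), verdict)
```

With the counts from the table this gives a value that rounds to 3.64, which is below the critical value.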

Calculating Chi Square using observed results and expected values based on the odds

observed/expected values using percentages
28.00/ 26.14 191.00/ 192.86 totals = 219 219.0
496.00/ 520.76 2279.00/ 2254.24 totals = 2775 2775.0

chi2 = 1.60
critical value = 3.84

not significant
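
The second test replaces the expected counts of a contingency table with the winners the odds would predict. The post does not show how those expected values are derived from the odds, so the sketch below simply takes them from the listing as given and sums (observed - expected)^2 / expected over winners and losers of each group. The function name and the tuple layout are hypothetical:

```python
def chi_square_vs_expected(groups):
    """groups: list of (observed_winners, observed_losers, expected_winners)."""
    chi2 = 0.0
    for wins, losses, expected_wins in groups:
        # expected losers follow from the group size, which is fixed
        expected_losses = wins + losses - expected_wins
        chi2 += (wins - expected_wins) ** 2 / expected_wins
        chi2 += (losses - expected_losses) ** 2 / expected_losses
    return chi2


# observed / expected values copied from the listing for the claiming factor
claim_groups = [(28, 191, 26.14), (496, 2279, 520.76)]
print(round(chi_square_vs_expected(claim_groups), 2))
```

The same function applies unchanged to the other observed/expected listings in this post.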

Note that horses fitting the factor won close to 13% of the time while all other claims won close to 18%, with both groups showing a very similar ROI. As the chi square analysis shows, neither the winning frequency nor the results against the odds differ significantly between the two categories, so for betting purposes the factor adds nothing.

Although it looks reasonable to assume that a trainer who claims a longshot knows something that no one else does, the numbers show that this is not the case.

Top jockey stays after an out of the money race

I am using the top 10 jockeys from here, comparing horses where a top jockey stays on for a second time after an out of the money race against horses that were ridden in their last race by a top jockey who today passes on the mount.

The results are as follows:

factor                  winners     losers       Win%        ROI
================================================================
top_jockey_goes             573       3024       15.9       0.72
top_jockey_stays            123        592       17.2       0.79

Calculating Chi Square using observed results only

chi2 = 0.72
critical value = 3.841

not significant

Calculating Chi Square using observed results and expected values based on the odds

observed/expected values using percentages
573.00/ 649.29 3024.00/ 2947.71 totals = 3597 3597.0

123.00/ 131.99 592.00/ 583.01 totals = 715 715.0

chi2 = 11.69
critical value = 3.84

significant

As we can see, the factor has no value as far as winning frequency goes, but for betting purposes it is indeed significant.

To simplify the results, note that the ROI when the top jockey goes is 0.72, while when he stays it is 0.79.

This can be interpreted in two ways:

– When jockey goes the bet is very poor

– When jockey stays the bet is very good

To address this we need another test, comparing horses where the top jockey stays against all other mounts of the top jockeys.

The results for this test look like this:

factor                  winners     losers       Win%        ROI
================================================================
top_jockey                 1093       3487       23.9       0.85
top_jockey_stays            123        592       17.2       0.79

Calculating Chi Square using observed results only

chi2 = 15.52
critical value = 3.84

significant

Calculating Chi Square using observed results and expected values based on the odds

observed/expected values using percentages
1093.00/ 1129.18 3487.00/ 3450.82 totals = 4580 4580.0

123.00/ 131.99 592.00/ 583.01 totals = 715 715.0

total observed=5295
total expected=5295.0

chi2 = 2.29
critical value = 3.84

not significant

Analyzing these results we can see that, as far as winning frequency goes, when the jockey stays the horse has a significantly lower chance of winning than the jockey's other mounts, while for betting purposes there is no significant difference at all. This means that the bet is neutral: for betting purposes it makes no difference whether this factor is true or not.

Combining this result with the previous one, we conclude that the significance of the factor is to avoid horses abandoned by the top jockey rather than to bet those he stays on.

Do not rely on intuition and logic when your factor can be tested

The purpose of this posting was not so much to analyze the specific factors as to show that intuition and logic are deceiving and should not be trusted when making handicapping decisions. As a handicapper your major goal is to identify this type of factor, used by the betting crowd, and take advantage of it.

Analyzing Betting Records

delta
Keeping good records is essential for anyone who is serious about horse betting. Although most of us accept its importance, we are not disciplined enough to update our records, especially after a long losing streak. The conclusions we can reach from a complete and comprehensive view of our bets are extremely valuable, and we need to convince ourselves to keep up with it religiously, even when we have to record consecutive losses that make us look really bad as bettors.

One of the best record keepers I have seen is Ray2000, a poster on Pace Advantage who specializes in harness racing, and today I will analyze one of his files, which can be found here.

Note that for each race Ray picks three horses and based on them he plays four combinations. More specifically, he boxes his top selection with his second and third choices. The amounts he bets differ based on the combination, so A X B is always bet more than C X A, for example.

In this posting I will check whether some of the combinations occur significantly more often than the others. Note that at this point I am not considering the amounts won or lost at all; I just want to see whether some of the combinations are significantly more probable than the others.

For each row in the spreadsheet, I created a new row containing only four columns, one for each combination, and populated each with 0 if the combination lost and 1 if it won.

After summing up all the rows I get a 4 X 2 table where for each combination I have its total wins and losses.

Based on the data, this table looks like this:

Winners Losers
17 327
11 333
8 336
7 337

Analyzing this data I can see that we have 3 degrees of freedom and a critical value of 7.82, while the chi square is 5.84.

Degrees of freedom 3
Chi Square 5.84
Critical Value 7.82

not significant
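
For readers who want to reproduce this kind of test, here is a minimal Python sketch, assuming a plain Pearson chi-square on the winners/losers table. The function names and the toy indicator rows are hypothetical and are not taken from Ray's file. The critical values are the standard 5% values for 1 to 3 degrees of freedom:

```python
# Standard chi-square critical values at the 5% level, keyed by degrees of freedom.
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}


def tally(indicator_rows):
    """Sum 0/1 indicator rows (one column per combination) into [wins, losses] pairs."""
    races = len(indicator_rows)
    wins = [sum(column) for column in zip(*indicator_rows)]
    return [[w, races - w] for w in wins]


def chi_square(table):
    """Pearson chi-square and degrees of freedom for a rows x columns count table."""
    col_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(col_totals)
    chi2 = 0.0
    for row in table:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# toy indicator rows (hypothetical): three races, four combinations
print(tally([[1, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]))

# the 4 X 2 winners/losers table from this post
chi2, dof = chi_square([[17, 327], [11, 333], [8, 336], [7, 337]])
print(dof, "significant" if chi2 > CRITICAL_5PCT[dof] else "not significant")
```

Because every combination is played in every race, all four rows have the same total, so the expected number of winners is the same for each combination.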

As you can see, we can conclude from this data set that there is no significant difference in the frequency of occurrence among the four combinations (something that at first glance might look strange, as C X A occurs only 7 times while A X B occurs 17 times).

There are more interesting conclusions we can reach analyzing Ray’s file and I will get back to it later.

Validating a Speed Figure

delta
On Pace Advantage (http://www.paceadvantage.com/forum/), which happens to be the best horse racing forum on the Web and which I strongly suggest anyone interested in the game follow, the conversation often revolves around speed and pace figures.

The following two interesting recent threads discuss this topic:

http://www.paceadvantage.com/forum/showthread.php?t=101063
http://www.paceadvantage.com/forum/showthread.php?t=101422

What should we expect from a speed figure

My opinion when it comes to figure making is that the starting point should be the creation of a validation mechanism that allows us to compare two or more methodologies and decide which one is better.

It is not exactly what is measured by a figure, or how, that is really important for betting purposes. As long as we know how to measure the validity of a figure, what really matters to us as bettors is just the figures themselves; how the figure is created is a detail that should stay hidden and is irrelevant to us.

An example using Bris Speed Rating

In this posting I will show a simple mechanism I use as an indicator of the validity of a specific speed figure, which happens to be the Bris speed rating. The testing scenario I am examining here is by no means the most precise or comprehensive treatment, and we can create far more sophisticated algorithms if we need to delve into a more detailed level that might serve as the foundation for an automated betting system. Nevertheless, the approach presented here is sufficient to make my case about evaluating a concrete factor.

Although the insides of the Bris speed rating are a black box, we know that it measures the final performance of a horse on a scale where higher numbers signify a better performance. Based on this I am making the following simple handicapping assumption:

If these figures are valid, and if previous races somehow dictate future performance, I expect horses that ran faster in the past according to the figures to have an advantage in any given race.

Note that for the purpose of this exercise I will not examine how well the crowd is aware of the importance of this factor, since here I am only trying to understand whether this particular speed figure reflects the outcome of the race strictly from the point of view of predicting the winner.

For simplicity I will narrow my data set to horses who started as favorites and I will divide them into two groups:

– The first group will contain favorites having one of their last three races rated with a speed figure that is more than 4 points higher than the last three races of all the other starters in the race

– The second group will contain all other favorites

For a race to be considered, all its starters need to have at least three races; otherwise it is discarded.

Running this scenario through a universe of close to 19,000 races I get the following results:

factor                  winners     losers       Win%
=====================================================
superior_speed_figure       633        786      44.6%
all other favorites        6238      11247      35.7%

Chi Square

chi2 = 45.264
critical value = 3.841

significant

As you can see, horses falling in the first group win at a significantly higher rate than horses of group two.

Although we cannot rely on this scenario alone to assure a profit, it still serves as a strong indicator that the Bris speed rating is a significant figure that indeed works as advertised, measuring final performance with enough precision.

If I had a database containing an alternative figure of a similar kind, like Beyer or Ragozin, it would be trivial to run similar tests to detect which one performs better.

Following this approach I have tested various custom speed figure methodologies that I have developed, with the best of them performing at the same level as the convenient and inexpensive Bris figures. This is the reason I no longer go through the hassle of making custom figures and rely on Bris for most of my needs.

Always start with a concrete verification method

The most important thing when creating a figure or any other kind of metric is to have a concrete methodology for verifying its performance and significance. Once we have a clear understanding of how to do that, we should start with the simplest paradigm, adding complexity as needed until we reach an acceptable performance.

Factor code

For documentation purposes, here is the factor described in Python:

def is_favorite_with_superior_speed_figure(starter):
    # Requirements:
    # 1) every starter in the race has at least three past
    #    performances with a valid speed rating
    # 2) the best of the favorite's last three valid figures is more
    #    than 4 points higher than the best of the last three valid
    #    figures of every other starter in the race
    #
    # is_the_favorite() is a helper defined elsewhere.

    def get_highest_figure(starter1):
        # Returns a tuple: (number of valid figures found, best figure),
        # stopping as soon as three valid figures have been collected.
        figures = []
        for pp in starter1.past_performances:
            try:
                figures.append(int(pp.bris_speed_rating))
            except (AttributeError, TypeError, ValueError):
                # missing or non-numeric rating: skip this race
                continue
            if len(figures) >= 3:
                break

        if not figures:
            return 0, 0
        return len(figures), max(figures)

    if not is_the_favorite(starter):
        return False

    count, best_fig_for_favorite = get_highest_figure(starter)
    if count < 3:
        return False

    for s in starter.parent.starters:
        if s is starter:
            continue
        count, best_fig = get_highest_figure(s)
        # fail if a rival lacks three figures or is within 4 points
        if count < 3 or best_fig + 4 >= best_fig_for_favorite:
            return False

    return True
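
To go from a factor like this to a winners / losers table, each favorite has to be counted in one of two groups. A minimal sketch of that bookkeeping, assuming the factor is any function returning True or False and that each starter record says whether it won. The dictionary keys and the sample records are hypothetical and do not reflect the database schema used for this post:

```python
def tally_factor(favorites, factor, won):
    """Count [winners, losers] for favorites matching the factor and for the rest."""
    table = {"factor": [0, 0], "other": [0, 0]}
    for starter in favorites:
        group = "factor" if factor(starter) else "other"
        table[group][0 if won(starter) else 1] += 1
    return table


# hypothetical records: one dict per favorite
favorites = [
    {"superior_figure": True, "finish": 1},
    {"superior_figure": True, "finish": 4},
    {"superior_figure": False, "finish": 1},
    {"superior_figure": False, "finish": 2},
    {"superior_figure": False, "finish": 6},
]

table = tally_factor(
    favorites,
    factor=lambda s: s["superior_figure"],
    won=lambda s: s["finish"] == 1,
)
print(table)
```

Passing the factor as a function keeps the counting code independent of any particular angle, so the same tally can be reused for other factors.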

It’s a chaotic event

delta

One of the fundamental directions we have to take when forming our handicapping philosophy has to do with the abstraction levels we need to maintain when evaluating a race.

It’s chaos

Horse racing is a chaotic event and as such is dominated by an extremely large number of parameters, making an analytical approach impossible. A very common pitfall among figure makers is the belief that a very detailed algorithm will necessarily lead to better results than a more abstract and simple one. This perception is dominant not only among figure makers but among horse bettors as well, so we have professional services taking advantage of it, providing very expensive ‘numbers’ and claiming that a very extensive set of parameters is part of their methodologies.

How a speed figure should be used is more important than what it represents

In general, I think that more important than the figure making methodology itself is the definition of a mechanism to classify one figure making method as better or worse than another. With such a tool on hand it is easy to compare distinct approaches and, more than this, to improve existing ones, since we will be able to know whether the addition of something that seems reasonable from our empirical perspective indeed adds value to the method or not.

Questions like ‘What is a speed figure’ are a matter of definition and do not add much handicapping value. Far more important is to answer the question ‘how should a speed figure be used’ or the equivalent ‘what is the predictive power of this figure’. Of course we are not restricted to speed figures, as we can generalize this approach to any attribute that can be expressed in a numerical format. Having a clear understanding of how to evaluate a figure allows us to come up with pace, class, form, trainer or any other type of figure.

Each starter can be seen as a vector of figures and nothing more

Based on this approach we no longer need to distinguish each figure by the specific attribute it measures. Instead, applying a level of abstraction, we can represent each starter as a vector of figures, each of them associated with its corresponding behavior metrics. Again the challenging part is not so much the creation of each figure but the evaluation mechanism that should be followed for each one.

Using a similar approach we can create higher level composites, measuring a starter by a single number as opposed to a sequence of figures for each past performance, something that contemporary handicappers usually call a rating. Going even further, we can compose all the ratings for a specific starter into a unique, final rating which will be representative of his chances to win the race.
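
As a rough illustration of this idea, the sketch below collapses a vector of ratings into one final rating and then turns the final ratings of a field into percentages. Everything specific in it is an assumption made for the example: the weighted sum, the weights, the softmax style normalization and its `scale` parameter are not taken from any real rating product.

```python
import math


def final_rating(ratings, weights):
    """Collapse a starter's ratings (name -> value) into one number via a weighted sum."""
    return sum(weights[name] * value for name, value in ratings.items())


def win_percentages(final_ratings, scale=10.0):
    """Map final ratings (starter -> rating) to percentages that sum to 100.

    Uses a softmax with an arbitrary scale; this choice is an assumption.
    """
    top = max(final_ratings.values())
    # subtracting the top rating keeps the exponentials numerically small
    exps = {name: math.exp((r - top) / scale) for name, r in final_ratings.items()}
    total = sum(exps.values())
    return {name: 100.0 * e / total for name, e in exps.items()}


weights = {"speed": 0.7, "pace": 0.3}  # hypothetical weights
field = {
    "A": final_rating({"speed": 90, "pace": 80}, weights),
    "B": final_rating({"speed": 84, "pace": 88}, weights),
    "C": final_rating({"speed": 78, "pace": 75}, weights),
}
for name, pct in win_percentages(field).items():
    print(name, round(pct, 1))
```

Whatever mapping is chosen, it would still have to pass the same kind of validation against actual results that is applied to the individual figures.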

This unique rating should be the objective of any automated system, as it allows the creation of a breakdown into win percentages for each starter. A very good example of such a rating is Prime Power, provided by Bris, which represents an excellent model for the probability of each horse winning the race. Of course, as good as it is, Bris Prime Power or any other similar figure cannot be used for successful betting, not only because it is publicly available and thus incorporated in the odds of each horse, but also because it fails to account for the complexities of the race.

Understanding my last statement about the ‘complexities of the race’ is very important, since it asserts the chaotic nature of the event, which is what makes the game so difficult. In a nutshell, what I mean when I refer to horse racing handicapping as a complex procedure is that the final outcome depends on all the past performances of all the starters in the race.

Figures are not enough

Before we decide to add more detailed factors to our handicapping figures, we need a concrete methodology to justify and validate their results. We need to be able to distinguish a good figure from a bad one, and we should follow an incremental approach, adding more details until we reach an acceptable level. The nature of the figure, whether it measures speed, pace or something else, is of secondary importance; what really counts is to have a clear understanding of its predictive power. Even the most comprehensive and accurate set of figures still does not resolve the betting side, since the game is a highly complex event that cannot be solved by merely applying quantitative methods.

Art Or Science?

delta

An interesting dilemma regarding horse betting lies in the question of whether it should be driven by art or science.

By art I am referring to a process that relies largely on intuition, trying to decipher each race as an individual event using obscure factors that cannot easily be quantified as concrete metrics. By science, on the other hand, I am referring to a process that uses only concrete data and factors that can be processed by a strictly cognitive methodology, leaving no room for further interpretation.

Partners not Competitors

I think that both approaches should work together in a complementary fashion rather than be seen as competitors where one nullifies the other. I also see proper betting execution and correct psychology as equally important to a successful performance as bettors. For this posting, though, I will only talk about art versus science and leave betting execution and psychology for later.

For the artistic part of handicapping to function properly, it needs to be fed with input that is as accurate as possible. This can be achieved by the scientific side of handicapping, which deals with data mining and pattern recognition. A completely automated system able to make successful betting decisions on its own is something that has not been proven to exist. Although there have been several references to such systems, there does not seem to be enough evidence to prove that it is possible to beat the races following a robotic system.

Meta-Handicapping

Where the analytical part of handicapping can really help is in minimizing personal biases and opinions when it comes to data, quantities and metrics that can actually be measured and verified. To achieve this, what we really need is a meta-handicapping level where we define the proper approaches to specify and validate handicapping factors. The next step is to use a comprehensive universe of validated factors as the input to the ‘artistic’ layer of our handicapping to reach our final conclusions. Of course this ‘artistic’ layer can again use computerized mechanisms such as pattern recognition, genetic algorithms or any other method of artificial intelligence, although in most cases human participation will be needed as a vehicle to apply ‘handicapping talent’, which might not be covered even by the most sophisticated AI process.

The more extensive and accurate the universe of factors we are using, the less we will have to rely on ‘talent’, which will receive more and more help from the scientific part, making its work simpler and more efficient, without of course its impact being completely eliminated.

Both are needed

Statistics, intuition, psychology and game theory all work together in horse betting and are all needed to achieve success. The reason most of my postings here deal with the analytical part of handicapping is simply that it is much more cognitive and concrete, and easier to present and process. This does not mean that we should underestimate any of these components if we really want to become successful as horse bettors.

Blinkers On and Off

delta

Equipment changes are a handicapping factor that can easily be defined and tested. As bettors we need to know how they affect the outcome of a race and how well they are perceived by the betting crowd. For this test I will define blinkers on / off as any change since the horse's last race. Alternatively, we could use the first time such a change occurs instead of a change since the last race, although I will not test this more specific case here. What is important is to understand the methodology used to derive a conclusion rather than the specific result.

If we restrict equipment changes to only blinkers we have three cases:

– Blinkers on

– Blinkers off

– No change since last

Here I will test the behavior of these changes as they apply only to the favorite.

Of course I will follow the same method I used in ‘Leo, don’t be romantic !’ and ‘Don’t be fooled by Da Hoss’.

Blinkers off

For blinkers off we have the following results:

factor                  winners     losers       Win%        ROI
================================================================
blinkers off                 87        140      38.3%       0.91
no blinkers changes        6591      11509      36.4%       0.86

total number of starters : 18327

Calculating Chi Square using observed results only


degrees of freedom = 1
chi2               = 0.353
critical value     = 3.841

not significant

Calculating Chi Square using observed results and expected values based on the odds


observed/expected values using percentages
87.00/   83.12    140.00/  143.88  totals = 227 227.0

6591.00/ 6935.26  11509.00/11164.74  totals = 18100 18100.0

total observed=18327
total expected=18327.0


degrees of freedom = 1
chi2               = 27.99
critical value     = 3.841

significant

As you can easily see, blinkers off is not a significant predictor of the winner of the race.

Despite that, this change does offer betting value: as we can see when we use expected values based on the public odds, these horses offer a significant overlay.

This means that the crowd underestimates the value of this angle, something that can also be seen in its increased ROI.

This is a typical example of an angle that, although neutral when it comes to predicting the winner, still offers betting value.

Blinkers On

Now let’s test blinkers on:

factor                  winners     losers       Win%        ROI
================================================================
blinkers on                 193        384      33.4%       0.82
no blinkers changes        6591      11509      36.4%       0.86

total number of starters : 18677

Calculating Chi Square using observed results only


degrees of freedom = 1
chi2               = 2.126
critical value     = 3.841

not significant

Calculating Chi Square using observed results and expected values based on the odds

observed/expected values using percentages
193.00/  213.99    384.00/  363.01  totals = 577 577.0

6591.00/ 6935.26  11509.00/11164.74  totals = 18100 18100.0

total observed=18677
total expected=18677.0


degrees of freedom = 1
chi2               = 30.978
critical value     = 3.841

significant

Here we have a reversal of the previous example:

The factor still remains neutral as an absolute winner predictor, but as can be seen in the calculations using the crowd’s win percentages as expected values, it represents an underlay (note the lower ROI).

The crowd is overestimating the value of the angle, again giving us betting value but in the opposite direction this time.

This means that in races with such a favorite we should be inclined to look for a value bet in some other horse, who will potentially represent an overlay created by the public’s misjudgment of ‘blinkers on’ on the favorite of the race.
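
The chi square value says whether the results differ from what the odds predict, but not in which direction. One simple way to read the direction, offered here as an illustration rather than as the method used for the tests above, is the ratio of actual winners to the winners expected from the odds, compared against the same ratio for favorites with no blinkers change. The function name is hypothetical; the numbers are copied from the two observed/expected listings:

```python
def observed_to_expected(observed_winners, expected_winners):
    """Ratio of actual winners to the winners the public odds would predict."""
    return observed_winners / expected_winners


groups = {
    "blinkers off": (87, 83.12),
    "blinkers on": (193, 213.99),
    "no blinkers changes": (6591, 6935.26),
}

baseline = observed_to_expected(*groups["no blinkers changes"])
for name, (observed, expected) in groups.items():
    ratio = observed_to_expected(observed, expected)
    # a ratio above the baseline suggests the crowd under bets the group
    side = "baseline" if ratio == baseline else ("overlay" if ratio > baseline else "underlay")
    print(f"{name:20s} {ratio:.3f} {side}")
```

On these numbers the blinkers off favorites come out above the baseline and the blinkers on favorites below it, which matches the direction described in the text.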

Specializing blinkers on/off based on race classification

Now let’s see how this angle behaves based on race classification. For this I will compare each factor’s behavior in maiden and non maiden races to see whether race classification has a significant impact:

Blinkers Off Maidens against Non Maidens


factor             winners losers   Win%     ROI
================================================================
blinkers off maiden     30    50   37.5%    0.88
blinkers off nomaiden   57    90   38.8%    0.92

total number of starters : 227

Calculating Chi Square using observed results only


degrees of freedom = 1
chi2 = 0.035
critical value = 3.841

not significant

Calculating Chi Square using observed results and expected values based on the odds


observed/expected values using percentages
30.00/ 30.01 50.00/ 49.99 totals = 80 80.0

57.00/ 53.10 90.00/ 93.90 totals = 147 147.0


degrees of freedom = 1
chi2 = 0.448
critical value = 3.841

not significant

As you can see, there is no significant difference for blinkers off favorites between maiden and non maiden races.

Blinkers On Maidens against Non Maidens

factor                  winners     losers       Win%        ROI
================================================================
blinkers on maiden           85        190      30.9%       0.75
blinkers on nomaiden        108        194      35.8%       0.89

total number of starters : 577

Calculating Chi Square using observed results only

degrees of freedom = 1
chi2 = 1.523
critical value = 3.841

not significant

Calculating Chi Square using observed results and expected values based on the odds

observed/expected values using percentages
85.00/ 102.75 190.00/ 172.25 totals = 275 275.0

108.00/ 111.25 194.00/ 190.75 totals = 302 302.0

degrees of freedom = 1
chi2 = 5.043
critical value = 3.841

significant

For blinkers on there is again no impact on the absolute predictive power of the factor when switching from maidens to non maidens, but there is indeed some betting value in this factor, as we can see that the crowd tends to over bet this angle for maidens. This means that maiden favorites getting ‘blinkers on’ are poor bets (underlays) who tend to create value in some of the other starters of the race (always in comparison to non maiden blinkers on favorites).

Betting Strategy

Based on this analysis we can easily understand why we do not really care about the absolute predictiveness of a specific factor but about how this factor is perceived by the public…

We can use both angles (blinkers on / off) in our betting, using them in opposite ways. For blinkers off we should now be aware that it is an angle overlooked by the crowd (remember I am always talking about the favorite of the race), so it should be used as a positive sign when evaluating the favorite, while the exact opposite should be done in the case of blinkers on.

I will be glad to hear your comments about this topic so please feel free to leave a comment…

Leo, don’t be romantic !

delta

Leo is one of my track buddies and a good handicapper and player. He has the patience to wait for a race where he really has an opinion, and when this happens he really sends it in; he is one of the heaviest players I know. His handicapping is decent and I have seen him many times hitting a 4-1 shot for a grand or even more; however, one of the weaknesses of his handicapping is that he follows a very classic way of thinking, which sometimes leads him to the wrong horse…

It was a cold weekend day here at Aqueduct back in February of 2010 when one of the best horses we have seen in recent years (especially on the turf), Gio Ponti, was making his first start of the season after a very strong performance in the Breeders’ Cup Classic, where he was second only to the modern era freak filly Zenyatta.  His trainer had found a relatively soft spot for his comeback, selecting a race at Tampa Bay which looked like an easy task for the champ.

Discussing the race, Leo was convinced that Gio was not going to lose by any means. In his view he was a cinch, an opinion shared by almost everybody at the track, since minutes before the race he was an odds-on favorite, giving the impression that the race was going to be a walk in the park.  This is when I responded to him with a phrase that has since been used as a joke among my friends who happened to hear my comment:

– Leo, please don’t be romantic..  Think out of the box…  Gio Ponti is vulnerable today… Be aware…

Of course Delta was not able to convince Leo or anybody else, something that maybe is a good thing, as Karelian under Rosy finished just a nose ahead of the champ (as you can see in the picture), saving my day as he was my final pick at odds close to 7-1.

This romantic view of Gio Ponti’s chances to win is a typical example of a Type 1 factor that misleads the public.

The factor I am talking about can be described as a horse fulfilling the following requirements:

– Favorite of the race

– His last start was one of the Breeders’ Cup route races

This type of starter makes a terrible favorite, creating the opportunity for the bettor to find a good bet among the other starters of the race, who will most likely be overlays.

Following the same method we did in the Da Hoss posting we can see the following results:

Factor                  Winners     Losers       Win%        ROI
================================================================
not romantic               6860      12015      36.3%       0.86
romantic                     11         18        38%       0.81

total number of starters : 18904

Chi Square

degrees of freedom = 1
chi2               = 0.03
critical value     = 3.84

not significant

Note that what I call romantic horses do not win at a significantly higher rate than any other favorite, while, as we can see here, the crowd over bets them very significantly:


Chi Square using expected values based on the odds

degrees of freedom = 1
chi2               = 29.65
critical value     = 3.84

significant

You can see this angle even in the ROI, which for romantic horses is only 0.81 (always based on a dollar bet), while all other favorites show 0.86.
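
ROI here is quoted per dollar bet, so, as I read it, 0.81 means that every dollar wagered on these horses returned 81 cents on average. A minimal sketch of that calculation, assuming a flat $1 win bet on every qualifying horse. The function name and the sample payoffs are hypothetical:

```python
def roi_per_dollar(payoffs):
    """Average gross return per $1 win bet; payoffs holds 0 for each losing bet."""
    if not payoffs:
        raise ValueError("no bets recorded")
    return sum(payoffs) / len(payoffs)


# hypothetical sample: five $1 bets, two winners returning $2.00 each
bets = [2.0, 0.0, 0.0, 2.0, 0.0]
print(roi_per_dollar(bets))  # 0.8, i.e. a loss of 20 cents per dollar bet
```

A group can therefore win often and still show a poor ROI when its winners pay too little.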

Of course this does not mean that romantic horses never win. As you can see from the sample, almost 38% of them win, but since they are significantly over bet they are a terrible bet.

So, next time you see a romantic horse, remember Delta and try to beat him..  This is the way to go!

It is Anybody’s Race…

delta

This race looks very open to me, anyone can win..I prefer to pass it and wait for something more secure…

This is a sentence I have heard hundreds of times from my racing friends when a race with some additional handicapping complexity is coming up.

Horse players are usually reluctant to bet on a race run under unusual conditions: for example an odd distance like a mile and five sixteenths, a sprint on the turf, races that were moved from turf to a muddy track, or races filled with first time outers or shippers who have never run against each other.

In other words, horse players do not like to bet races that lack determinism. The more relevant data exist that can be matched to today’s distance and classification, the more deterministic the event becomes and the smaller the possibility of a surprise in the outcome.

Imagine for a second a race where all the starters have run against each other several times at the same distance and on the same surface as today, none of them is coming off a layoff, and two of them have always finished in the top three spots. This is a very deterministic race where very little is unknown about how the race will be run today. The bad thing, though, is that this information is known to everyone, so there will be very little margin for the public to commit a serious mistake.

As we discussed in It is not about picking winners, what we are looking for as bettors is not to pick the winner but to capitalize on the betting inefficiencies of the public. From this perspective the race I have just described offers very little interest as a betting event.

On the other hand, consider a race where shippers come from all over the country to compete against each other at a distance that none of them has raced before, with some of them coming off a layoff. Although it presents far less determinism, it can still be a good betting proposition if we have reasons to believe that the public will make a serious mistake evaluating each horse’s chance to win.

If the race indeed looks open to us while the crowd is focusing on a couple of starters that we think have exactly the same chance as a couple of others, then there is no reason not to take advantage of it by betting the latter while ignoring the former.

Of course, taking this approach we need to be ready to sustain a low strike rate, which will be compensated by higher than normal returns. The highly stochastic nature of this type of race will not allow us to find the winner very often, since we will be making our selection from a larger than usual group of contenders. Despite this, the price will be high enough to eventually allow us to show a profit.

What really counts is having the skill to correctly classify a race as open and to judge whether the public is making betting errors.

Don’t be fooled by Da Hoss

Legendary (but fragile) turf champion Da Hoss, trained by Mike Dickinson, managed to win the Breeders’ Cup Mile on the turf after a two-year layoff!

Here you can watch the race:

Although 15 years have already passed since then, it still stays vivid in my memory, as does the fact that the champ was the very first horse I eliminated when I was handicapping the Cup the night before!

Many years later this race was used by one of the handicappers on DRF seminar DVDs to make his case about how horses are more likely to win on turf than on dirt when they are coming from a layoff.

Recency is one of the fundamental Type 1 factors.

Exactly as I did when handicapping Da Hoss, one of the first things we are going to use when we are handicapping a race is the recency of each starter.

The most typical approach is to categorize a starter as coming off a layoff, second or third off the layoff, deep form cycle, long layoff, or first time out. Note that this categorization covers all the starters of the race, the importance of which we will see later.

For now I will just select an example using recency to explain the procedure I follow in general and in a next posting I will extend this to a more generic concept that can be applied in many cases.

The factor I am going to analyze is for horses coming off a layoff. I define as such any horse that has not raced in the last 45 days but whose last race was not more than 120 days ago. If the horse is a second time out, the second condition is not used.
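The layoff definition above can be written down as a small predicate. This is only a sketch: the function name and arguments are hypothetical (they are not taken from the recency module used for this post), and it assumes both day limits are inclusive, since the text does not say how the boundary days are treated.

```python
def is_layoff(days_since_last_race: int, second_time_out: bool = False) -> bool:
    """Return True if a starter counts as 'coming off a layoff'.

    Assumption: the 45-day and 120-day limits are both inclusive.
    """
    if days_since_last_race < 45:
        # Raced too recently to be considered a layoff horse.
        return False
    if second_time_out:
        # For a second time out the 120-day upper limit is not applied.
        return True
    return days_since_last_race <= 120


print(is_layoff(60))                          # inside the 45-120 day window
print(is_layoff(200))                         # beyond the 120-day limit
print(is_layoff(200, second_time_out=True))   # upper limit waived
```

Keeping the rule in one function makes it easy to change the limits later and rerun the same comparison.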

I will use as a secondary factor the surface of the race which can either be turf or dirt (including synthetic surfaces).

The reason I am selecting surface as the secondary factor is that it is a common belief among handicappers that it is easier for a horse coming off a layoff to be ready to win on turf than on dirt. I still remember hearing such a comment in one of the DRF seminars (sold on DVD), and I also remember that the speaker used as an example the legendary (but fragile) turf champion Da Hoss who, trained by Mike Dickinson, managed to win the Breeders’ Cup Mile on the turf after a two-year layoff!

Let’s see if this opinion is correct.

The tool we are going to use to reach this type of conclusion is the chi square hypothesis test. I will not describe the method in detail at this point; for a more detailed description you can read this thread, started by an expert in this field who posts under the nick TrifectaMike.

You can also find a lot of related material on the Web. A very good introduction that I found on youtube is the following:

For starters, let me just say that this method begins with a hypothesis that certain random events have the same probability and, using a statistical test based on the chi square distribution, tries to either retain or reject it. The outcome of the experiment will be either that we cannot reject the original (null) hypothesis or that we reject it at a chosen level of significance. For all my testing I will use 0.05 as the significance threshold, meaning that when I reject the null hypothesis there is at most a 5% chance of seeing a difference this large if the hypothesis were actually true.
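The critical value of 3.84 that appears in the calculations in these posts follows directly from the 0.05 threshold and one degree of freedom. A minimal sketch using only the Python standard library, relying on the fact that a chi square variable with one degree of freedom is the square of a standard normal variable (this shortcut does not work for more degrees of freedom):

```python
from statistics import NormalDist

alpha = 0.05  # the significance threshold used throughout these posts

# Two-sided normal cutoff for alpha, about 1.96
z = NormalDist().inv_cdf(1 - alpha / 2)

# Squaring it gives the chi square critical value for 1 degree of freedom
critical_value = z ** 2

print(round(critical_value, 2))  # 3.84
```

Any chi square figure above this value leads to rejecting the null hypothesis at the 0.05 level.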

Note that for this case we will only consider the favorites of the race. In other words, we will only consider races where the favorite was coming off a layoff; any other race will not be considered for our test. Our conclusions will apply only to favorites; for all other horses I will do a similar experiment later.

Using my compare factors and recency factors Python modules, I was able to query my database and get the following results:

BE CAREFUL WE ARE USING FAVORITES ONLY

Factor Winners Losers Win% ROI
layoff on turf 291 592 33% 0.83
layoff on dirt 707 1155 38% 0.88

Chi Square calculations

degrees of freedom 1
chi2 6.5
critical value 3.84

We can immediately see that the null hypothesis is rejected and indeed there is a significant difference between the two sets.

To our surprise though, it is not what we expected! Please note that the dirt starters are winning with a higher frequency!

So far we have shown that the common belief that turf starters coming off a layoff win more often than dirt starters is completely wrong, at least when we consider only the favorite of the race.
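For readers who want to check the figure themselves, here is a minimal sketch of the chi square calculation on the winners/losers table above. It is a plain Pearson test without continuity correction, written with the standard library only; the function names are mine for illustration and this is not the code of the modules used to query the database.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi square for a table of [winners, losers] rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both rows shared the same win rate
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail probability of a chi square value with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Favorites only: [winners, losers] for layoff on turf and layoff on dirt
layoff = [[291, 592], [707, 1155]]
chi2 = chi_square_2x2(layoff)
print(f"chi2 = {chi2:.2f}, p = {p_value_1df(chi2):.3f}")
```

Run on the table above, it gives a chi square of about 6.5, above the 3.84 critical value. Fed the rows of the romantic favorites table from the Gio Ponti posting, the same function gives about 0.03.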

We have not finished yet though…

The next and final step is to calculate how well this discrepancy is reflected in the pools. This is what is really important, because if the public is aware of this irregularity then we have nothing to gain from it.

For this, I am rerunning the chi square test, but now I am using as the expected value the one suggested by the odds of the horse (adjusted for the takeout). If the crowd is betting perfectly, I expect the null hypothesis to hold. The results were as follows:

Calculating Chi Square using expected values based on the odds

degrees of freedom 1
chi2 4.17
critical value 3.84

Good news! As you can see the null hypothesis is rejected and the condition is significant.
What this means is that the DRF seminars and other handicapping sources, such as books, have done a good job of completely misleading the public into constantly misjudging horses that start on turf while coming off a layoff as the favorite of the race. Note that the ROI for these starters is only 0.83, while for dirt it is 0.88.

One note here: the fact that the ROI is higher for dirt does not necessarily mean that the condition is significant. This is exactly why we need to calculate the chi square.
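The odds-based test can be sketched roughly as follows. This is my reading of "expected values based on the odds", not necessarily the exact calculation behind the figures above. It assumes that each horse's expected win probability is the share of the win pool bet on it, recovered from the final odds as (1 - takeout) / (odds + 1) with breakage ignored, and that expected winners are the sum of those probabilities. The function names, the sample starters and the 17% takeout are all invented for illustration.

```python
def implied_probability(odds_to_one, takeout):
    """Share of the win pool bet on a horse, recovered from its final odds."""
    return (1.0 - takeout) / (odds_to_one + 1.0)


def chi_square_vs_odds(starters, takeout):
    """Compare observed winners and losers with what the odds predict.

    starters: list of (odds_to_one, won) pairs for the group being tested.
    """
    n = len(starters)
    observed_wins = sum(1 for _, won in starters if won)
    expected_wins = sum(implied_probability(o, takeout) for o, _ in starters)
    observed_losses = n - observed_wins
    expected_losses = n - expected_wins
    # Two cells (winners, losers), hence 1 degree of freedom
    return ((observed_wins - expected_wins) ** 2 / expected_wins
            + (observed_losses - expected_losses) ** 2 / expected_losses)


# Made-up starters, far too few for a real test; they only show the call
sample = [(1.0, True), (1.5, False), (2.0, False), (1.2, True), (0.8, False)]
print(chi_square_vs_odds(sample, takeout=0.17))
```

A result above 3.84 on a real sample would indicate that the crowd's odds do not match the observed win rate for that group.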

I will continue with more similar examples as I think this is a very fundamental starting point to create a winning betting strategy….

An exercise for the reader

Speaking (and possibly dreaming) about Whales seems to be one of the favorite topics among horseplayers.

Despite the fact that we have close to nothing when it comes to concrete information about them, rumor has it that they exist and thrive. The numbers thrown around about them are very impressive: I have heard claims of a whale betting as much as $500M in total handle per year, which would return a total rebate in the range of $50M.

The common belief is that whales have developed very sophisticated models (despite the fact that there is no evidence of it) and are able to predict the outcome of a race with the closest possible accuracy, taking advantage of us, the naive and addicted horse bettors who remain hopeless in our battle against them!

What I find really interesting is that, since there are claims of multiple whales, they would need some kind of agreement not to bet against each other. If whales compete among themselves, then considering the size of their bets, any advantage they might have over the crowd will shortly evaporate, since the largest portion of the pool will be their own money. Of course we cannot rule out this type of silent contract among them, but in that case we have to realize that each one’s share shrinks in proportion to their number.

As a player, I am not concerned about their handicapping expertise, nor even about their betting execution plans… The only real advantage they have over a medium or large player is their rebate percentage. The whole idea of applying different rebate levels based on total handle is a very unfair concept (for the small player) that the race tracks and ADWs seem not to care about.

To understand how unfair this is let me give the following example:

The total pool for a given bet is $100K. The takeout for this particular bet happens to be 25%, so the effective pool will be $75K while $25K goes to the track.

Out of the total $100K, 30% is whale money, so we have the following breakdown of the effective pool:

Whales: $22.5K
Public: $52.5K

Let’s assume that the public receives a 3% rebate while the whales receive 10%. Based on this assumption we have the following rebates:

REBATES

Whales: $2,250 (58.8% of the total rebate)
Public: $1,575 (41.2% of the total rebate)
——
total : $3,825

As I said before the total take out is $25,000 which is distributed as follows:

TAKEOUT

Whales: $ 7,500 (30% of the total takeout)
Public: $17,500 (70% of the total takeout)
——
total : $25,000

 

So you can see that although whales pay only 30% of the total takeout, they get back almost 60% of the total rebate.
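The arithmetic of this example is easy to reproduce. The sketch below uses only the figures given above (pool, takeout, whale share and the two rebate rates) and, as in the example, takes each rebate on the group's share of the effective pool. The variable names are mine.

```python
pool = 100_000              # total pool for the bet
takeout_rate = 0.25
whale_share = 0.30          # fraction of the pool that is whale money
whale_rebate_rate = 0.10
public_rebate_rate = 0.03

takeout = pool * takeout_rate
effective_pool = pool - takeout

# Split the effective pool between whales and the public
whale_effective = effective_pool * whale_share
public_effective = effective_pool - whale_effective

# Rebates, taken on each group's share of the effective pool
whale_rebate = whale_effective * whale_rebate_rate
public_rebate = public_effective * public_rebate_rate
total_rebate = whale_rebate + public_rebate

# Each group's contribution to the takeout
whale_takeout = takeout * whale_share
public_takeout = takeout - whale_takeout

print(f"whales: rebate ${whale_rebate:,.0f}, takeout ${whale_takeout:,.0f}")
print(f"public: rebate ${public_rebate:,.0f}, takeout ${public_takeout:,.0f}")
print(f"whale share of rebate:  {whale_rebate / total_rebate:.1%}")
print(f"whale share of takeout: {whale_takeout / takeout:.1%}")
```

Changing the two rebate rates or the whale share shows how quickly the gap between the share of takeout paid and the share of rebate received moves.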

Who is actually paying for this advantage is left as an exercise for the reader!