An exercise for the reader

Speaking (and possibly dreaming) about Whales seems to be one of the favorite topics among horseplayers.

Despite the fact that we have close to nothing in terms of concrete information about them, rumor has it that they exist and thrive. The numbers thrown around about them are very impressive; I have heard claims of a whale betting as much as $500M in total handle per year, which would return a rebate in the range of $50M.

The common belief is that whales have developed very sophisticated models (despite the fact that there is no evidence of it) and are able to predict the outcome of a race with near-perfect accuracy, taking advantage of us, the naive and addicted horse bettors who remain helpless in our battle against them!

What I find really interesting is that, since there are claims of multiple whales, they would need some kind of agreement not to bet against each other. If whales were to compete among themselves, then considering the size of their bets, any advantage they might have over the crowd would quickly evaporate, since the largest portion of the pool would be their own money. Of course we cannot rule out this type of silent contract among them, but in that case we have to realize that each one's share shrinks in proportion to their count.

As a player, I am not concerned about their handicapping expertise, nor even about their betting execution plans… The only real advantage they have over a mid-sized or large player is their rebate percentage. The whole idea of applying different rebate levels based on total handle is a very unfair concept (for the small player) that the racetracks and ADWs seem not to care about.

To understand how unfair this is, let me give the following example:

The total pool for a given bet is $100K. The takeout for this particular bet happens to be 25%, so the effective pool will be $75K while $25K goes to the track.

Out of the total $100K, 30% consists of whale money, so we have the following breakdown of the effective pool:

Whales: $22.5K
Public: $52.5K

Let’s assume that the public receives a 3% rebate while the whales receive 10%. Based on this assumption we have the following rebates:

REBATES

Whales: $2,250 (58.8% of the total rebate)
Public: $1,575 (41.2% of the total rebate)
——
total : $3,825

As I said before, the total takeout is $25,000, which is distributed as follows:

TAKEOUT

Whales: $ 7,500 (30% of the total takeout)
Public: $17,500 (70% of the total takeout)
——
total : $25,000

 

So you can see that although whales pay only 30% of the total takeout, they still get back almost 60% of the total rebate.
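For anyone who wants to play with the assumptions, here is a minimal Python sketch that reproduces the numbers above; the pool size, whale share and rebate rates are just the hypothetical figures of this example and can be changed freely:

```python
# All figures are the hypothetical ones used in the example above.
pool = 100_000           # total handle for the bet
takeout_rate = 0.25      # track takeout
whale_share = 0.30       # fraction of the handle that is whale money
whale_rebate = 0.10      # rebate rate for whales
public_rebate = 0.03     # rebate rate for the public

effective_pool = pool * (1 - takeout_rate)                        # 75,000
whale_back = effective_pool * whale_share * whale_rebate          # 2,250
public_back = effective_pool * (1 - whale_share) * public_rebate  # 1,575
total_rebate = whale_back + public_back                           # 3,825

print(f"whales: {whale_back / total_rebate:.1%} of the rebate, "
      f"{whale_share:.0%} of the takeout")
# -> whales: 58.8% of the rebate, 30% of the takeout
```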

Who is actually paying for this advantage is left as an exercise for the reader!

Betting horses does not pay by the hour

Here they do not pay you by the hour…

My friend Thaskalos, who is a pro bettor and one of the top posters on PaceAdvantage, made a pretty good score parlaying a small win into a large triumph, an approach I find to be the best when you are betting for profit.

Discussing his thought process and the aftermath of his handicapping and betting, he offered a very interesting quote that I think encapsulates one of the more fundamental truths of horse betting… He told me that after collecting his winnings he was quick to make his way home, saying to his OTB buddies that here you do not get paid by the hour…

This simple sentence encapsulates the essence of gambling. A tight and aggressive strategy, with bets that are not uniformly weighted, is the way to treat this (and almost every) type of gambling. Giving yourself the opportunity to get lucky, and capitalizing on your confidence to contradict the public, is the only way to achieve a big score.

Even more important than handicapping expertise and betting skills is the proper psychological attitude needed to apply this concept successfully under real-world conditions. The gambler needs to be convinced that he is doing the right thing and never allow negativity to overshadow his plan. He must be prepared to accept both wins and losses, trying to minimize their impact on his behavior and thinking. A great gambler must view money as a mere score-keeping mechanism, similar to a video game, and nothing more than that.

The moment a gambler allows himself to think that with the money he is betting on the next race he could buy a new laptop or take a vacation on a Caribbean island, he is already too soft to be called a pro gambler.

The truth is that the majority of gamblers do not have this attitude; instead they become very protective and soft in their decisions, something that mathematically leads them to ruin. I find this behavior analogous to the boiling frog story, which describes how a frog will allow itself to be boiled alive if placed in a slowly heated pot of water, while it will immediately jump out if thrown directly into boiling water.

As Thaskalos says, in horse racing you do not get paid by the hour; instead, your strategy needs to have explosions that give you the chance to take them down, and long periods of calm where you just think about the game…

Type 1 or Obvious Handicapping Factors

Today I am going back to the categorization of handicapping data, a very interesting topic whose comprehension is essential for betting success.

Specifically, I will talk about what I call Type 1 or obvious factors; what I am referring to here are the factors commonly used by the majority of players, which represent the most basic knowledge of the game and are most of the time reflected in the odds line as formed by the crowd.

Some examples of these Type 1 factors are the following:

  • Bet live horses who managed to either win or finish close to the winner in their most recent races
  • The horse should have run within the previous 35 days
  • Require experience at the distance and surface, and avoid horses trying something for the first time
  • Do not bet on low-profile connections
  • Of two starters exiting the same race, always bet the one who finished in the better position
  • Do not bet horses who just broke their maiden or are significantly stepping up in class
  • Prefer horses showing some ‘back class’ to cheaper ones
  • For first-time starters, require good breeding and a good workout pattern…

Some of these might have predictive value, which is not necessarily the same as betting value, as we have discussed before.

The problem with this type of factor is that they represent well-known racing maxims that have been used forever, and although some of them might indeed help identify the winner of the race with relatively high frequency, on the other hand it is completely impossible to make a profit betting this way. Reading the posting ‘It is not about picking winners’ will help you understand exactly what I mean here, and I believe this is one of the most fundamental aspects of improving our game…

The good thing about this type of factor is that, although we cannot make money directly betting their matches, we can use our knowledge of their ineffectiveness to find betting value in other starters showing a completely different profile.

There are two things we need to do with regard to Type 1 factors:

  • Identify as many of them as possible
  • Try to understand to what degree, and under what circumstances, each of them creates false expectations in the crowd

After we have gathered a large enough number of these factors and know which of them are most likely to mislead the public, it will be easier for us to detect overlays using the other type of factor I described earlier, the Type 2 or hidden factors.
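To make the first step a bit more concrete, here is a minimal sketch of how one could start cataloguing Type 1 factors as simple boolean checks over a starter’s record; the field names (days_since_last, last_finish_position and so on) are purely illustrative, not a real data feed:

```python
# Hypothetical starter record; the field names are made up for illustration.
starter = {
    "days_since_last": 21,
    "last_finish_position": 2,
    "distance_starts": 4,
    "surface_starts": 6,
    "first_time_starter": False,
}

# Each Type 1 factor becomes a named predicate over the record.
type1_factors = {
    "recent_race": lambda s: s["days_since_last"] <= 35,
    "finished_close_last_out": lambda s: s["last_finish_position"] <= 3,
    "experienced_at_distance": lambda s: s["distance_starts"] > 0,
    "experienced_on_surface": lambda s: s["surface_starts"] > 0,
    "not_a_debut": lambda s: not s["first_time_starter"],
}

matches = {name: check(starter) for name, check in type1_factors.items()}
print(matches, "-> obvious-factor count:", sum(matches.values()))
```

Counting how many obvious boxes each starter ticks is only the inventory step, of course; the interesting work is measuring how much the crowd overreacts to each factor.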

In my next posting we will continue this discussion and see some of the mechanisms that we can use for this purpose…

Gambling or weather predictions

Today I am continuing the discussion about the fact that success in horse betting does not necessarily depend on handicapping skills alone. On top of this, although handicapping horses is a field where expertise can be developed, this does not mean that it can be used as the primary or only means to betting success.
I am still not sure whether I have convinced you of the truth of my statement, so let me try a hypothetical example that might illuminate my case a bit.

Weather forecasting these days is a science, applying very sophisticated mathematical and technological achievements to predict the state of the atmosphere with a very high degree of accuracy; this prediction is far from being deterministic, although the statistical error can be very low.

Let’s assume that there is a pari-mutuel type of gambling game where the players are trying to predict tomorrow’s weather.

What do you think about this game?

First of all, there is no doubt that there is a lot of skill involved in the prediction itself, which is based on a scientific discipline.

The real question, though, is what the optimal betting strategy should be so that the bettor can maximize his expected value.

One might be quick to think that just following the weather forecasters on the web or on TV every night should be enough to make him a long-term winner.

Unfortunately, this simple strategy will not allow him to make any profit, because the expertise associated with weather handicapping is so easy to achieve; even though this expertise encapsulates extremely advanced knowledge, it boils down to a simple set of decisions that will be very similar even when multiple scientists perform the calculations independently.

One way for this game to gain some betting interest is to narrow the predictions to a very precise level. Just betting on whether it is going to rain or not will not be enough; the exact inches of rain can add enough stochasticity to the game to create winners and losers. What this means is that the effect of randomness needs to be amplified for the game to become something more than just a money wash, so that winners (and losers) can exist. In other words, what it takes to make this game a real betting event is to decrease the impact of skill to the point where randomness is high enough to convince potential bettors that they might have found a good bet.

Someone might approach this game by trying to discover a better forecasting method, applying more advanced math or technology and keeping it secret. The question is how likely such an approach is to be successful. Let’s not forget that the individual researchers in the domain are pretty much equipped with the same knowledge, which means it takes a very optimistic way of thinking to believe that one specific researcher can outsmart all the rest. Even if that were the case, chances are that sooner or later his advantage would evaporate, since his competitors would eventually discover his secret methods, again resulting in a level game.

Another way would be for the bettor to pay no attention to the experts and bet against their predictions. Sure enough, most of the time he will lose his bet, since they will be correct in an overwhelming percentage of events. Remember, though, that expert opinion is not deterministic and there will be times when they are proven wrong. To illustrate this better, let’s consider the following fictitious case:

The pari-mutuel pool is created by exactly 10 players. Nine of them are expert statisticians with the best technology available, and they always bet their prediction for a large amount, let’s say $1,000. The tenth player is completely ignorant when it comes to forecasting and his bet is dictated by a random method; for example, he could be spinning a roulette wheel that has the available weather conditions instead of numbers. He is also betting a very small amount each time, for example $1.

This game goes on daily for several years. Who has the best chance of coming out a winner?

If you think about it for a few seconds you will easily see that the ignorant bettor will be the only winner (actually a very big one), since eventually, when he happens to pick the most unlikely outcome, he will take down $9,000 for his single buck!

This happens precisely because in this game, as it is structured, prediction expertise is useless, and the only thing that counts is the proper betting execution strategy.
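A quick Monte Carlo sketch makes the arithmetic concrete. The expert accuracy (90%), the number of possible weather outcomes (3) and the bet sizes are assumptions for illustration only; winners split the pool in proportion to their stakes, and days with no winner are treated as refunds:

```python
import random

random.seed(1)
OUTCOMES = [0, 1, 2]      # three possible weather states (an assumption for the toy game)
EXPERT_ACCURACY = 0.9     # the experts' shared forecast is right 90% of the time (assumption)
DAYS = 365 * 5

expert_profit = naive_profit = 0.0
for _ in range(DAYS):
    truth = random.choice(OUTCOMES)
    # The nine experts share essentially the same forecast, usually the correct one.
    if random.random() < EXPERT_ACCURACY:
        forecast = truth
    else:
        forecast = random.choice([o for o in OUTCOMES if o != truth])
    naive_pick = random.choice(OUTCOMES)          # the roulette-wheel bettor

    bets = [("expert", forecast, 1000.0)] * 9 + [("naive", naive_pick, 1.0)]
    pool = sum(stake for _, _, stake in bets)
    winning_stake = sum(stake for _, pick, stake in bets if pick == truth)
    if winning_stake == 0:
        continue                                  # nobody hit it: treat the day as a refund
    for who, pick, stake in bets:
        payoff = (stake * pool / winning_stake - stake) if pick == truth else -stake
        if who == "expert":
            expert_profit += payoff
        else:
            naive_profit += payoff

print(f"experts combined: {expert_profit:+,.0f}   naive bettor: {naive_profit:+,.0f}")
```

Under these assumptions the small random bettor ends up massively ahead, exactly because the rare days on which the experts all miss pay him the whole pool.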

Horse race betting is not as simple as the toy game described here, but the same concept of diminished value of handicapping and increased value of betting strategy holds true for it as well.

The overwhelming majority of horse handicappers are concerned with developing their expertise in picking the winner of the race, while completely ignoring the impact of randomness on their betting behavior, something that can be much more important than a slightly improved prediction ability.

In subsequent postings I will delve deeper into the concept of the optimal ‘game theory’ that needs to be followed to maximize expected value in horse betting…

Meanwhile, please feel free to express your objections, as they can initiate some interesting discussions about this topic…

It is not about picking winners

In horse betting, and in any other form of gambling based on pari-mutuel pools, the objective is not to select winners!

I know this sounds like a very bold and strange statement that one might be quick to oppose and reject. After all, one of the most influential books ever written on the subject is titled ‘Picking Winners’, and its author, Andy Beyer, is one of the monumental figures of the domain.

Still, an approach based on picking the winner of the race represents one of the worst ways to bet horses.

When betting horses, our objective is not to end up with a lot of winning tickets but to win more money than we bet.

An obvious horse that figures to win with a very high degree of certainty is useless for betting purposes if it is perceived as such by the majority of the participants in the pool.

Handicapping a race and predicting its outcome is a completely different skill from betting on it to win money. I think the former is indeed a skill-driven domain which can be mastered, while for the latter I do not have a definite answer.

The problem lies in the fact that the expertise involved in just picking the winner of a race, although a real skill, is still relatively easy either to master or to get access to. Do not forget how good the crowd seems to be at this, excelling in its ranking of the selections in such a way that lower-odds horses always win more races than higher-odds ones. Despite this, the crowd is of course a heavy loser in what we call the ‘long run’, meaning any sequence of races long enough for the law of large numbers to apply adequately.

Tomorrow I will continue this discussion, making my point clearer through some examples and further clarifications…

A categorization of handicapping data

A simple study of the behavior of the crowd will reveal that it is surprisingly good in its estimates. It is a well-known fact that favorites win more races than second choices, second choices more than third, and so on. Based on this observation we can easily conclude that the outcome of a race is highly correlated with the past performances of its participants (since these are the source the crowd uses to make its betting decisions).

Another trivial conclusion is that although the crowd can pick the most frequent winner with remarkable consistency and accuracy, it is impossible to make a profit by betting on it. A more astute researcher will add to this the fact that although you cannot make a profit by always betting the favorite, you will still lose the least by betting on it rather than on the second, third, or any other choice; this means that favorites as a group are underbet, while the opposite occurs for longer-priced horses. Actually, we can easily show that the longer the odds, the more underlaid the corresponding group of horses is.

This behavior of the public is a well-known psychological effect called risk seeking, and you can read more about it in Daniel Kahneman’s book Thinking, Fast and Slow. An example of this concept can be observed in lotto players, who are willing to risk a very small amount in the hope of winning a very large one, even if the odds are severely against them.

Having said that, in this posting I will not describe a complete betting solution, something that would need to account for randomness and variance; I will focus only on the data that, as we saw, play a dominant role in the outcome of the race.

To make my research easier, I will divide the data that can affect the outcome of a race into the following categories:

Type 1:
Obvious handicapping factors used by the crowd

Type 2:
Hidden handicapping factors that, although derived from public data, are still not known to the wider public

Type 3:
Factors that cannot be known to the public but are known to the connections of each starter or to some insiders of the game.

We can also add another type of data: those that are completely unknown. An example of this might be a pathological condition of a horse that has not been diagnosed so far but will become apparent during the race. For now I will not refer to this type of data; it is something I will speak about in a later post.

Type 1 data are the easiest to specify, as they are directly reflected in the betting pools. Traditional handicapping factors, like those found in handicapping books suggesting you only bet horses that have previously won at the distance or are coming off a big win, are examples of this type of factor, which to a great extent drives the game.

Type 2 represents a more processed state of the primitive data, one that is not widely known or is completely unknown to the public. An example could be the Sartin figures, which, although known and used by some handicappers, do not constitute a mainstream approach and, in cases where they contradict Type 1 data, might present some betting value. A similar example could be metrics that are created by an individual handicapper and used only by him.

Type 3 is a category that cannot be directly evaluated, but we can probably use some indicators, such as a layoff. It is obvious that the longer the layoff of a starter, the more likely it is that the trainer has observed some change in its condition. Somehow we need to identify as many such indicators as possible that might increase the insiders’ ability to hold hidden information. As bettors, we might prefer either for this type of information not to exist at all, or to have enough hints to make an educated guess about what the insiders are trying to do.

After these definitions, the next step is to investigate whether there exists a relationship between these three categories that can produce a measurement of how profitable a race can be.

Can we somehow create a metric that will tell us how profitable, or not, a race will be for betting purposes?

For now I will just articulate a hypothesis, leaving its proof for another posting.

Based on what I have described so far, it is natural to assume that the more dominant the presence of Type 1 data is for a specific race, the more likely it is that the crowd will form the correct opinion, shaping the pools in such a way that it is impossible to extract any value out of them.

On the contrary, the less Type 1 data is available, the more likely it is that the crowd will be confused, ending up with a weak line that offers some betting opportunity.

So, the more the outcome of a race is influenced by Type 2 data, the more interesting it becomes for the handicapper, while a race with no significant Type 2 and no Type 3 data will be handicapped correctly by the crowd, thus offering no value.

The immediate challenges we are facing at this point are the following:

– Identify what the Type 1 factors are

– Create a set of valid Type 2 factors

– Transform hints we might get from the past performances into possible Type 3 factors.

These three points will be the topics of my next posts…

Always evolve

As a programmer you should always follow the evolution of the field and try not to be conservative in your decisions about which technologies to adopt.

A very common pitfall among programmers is that after they reach a certain level of expertise in a specific technology they become bound to it, refusing to extend their skillset to parallel domains. For example, after a developer becomes fluent in BASIC he is reluctant to learn Python.

This happens for several reasons. He might be afraid that he will get confused by the additional information, or he might believe that it is a better investment of his time to delve deeper into his current language, assuming that this is the best way to become a better programmer. He might also feel a sense of completeness, in other words that he has reached his goals and pretty much covers all his needs with his current knowledge.

I think this is one of the worst mistakes that a programmer can commit during his career.

Technology moves extremely fast, and failure to evolve will rapidly turn a competent developer of the present into an outdated dinosaur of next year. The developer should resist the temptation to consider his skillset complete and stop learning new things if he wants to remain relevant in the coming years.

More than this, a horizontal expansion of a developer’s skills will improve his performance in his core technology and make his judgment calls more accurate and better justified.

For example, understanding functional programming can help a developer who specializes in an object-oriented language to apply a recursive style in his coding, improving its readability and expressiveness.
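A toy Python illustration of the point (the specific function, computing the depth of a nested structure, is made up just to show the contrast between the two styles):

```python
# Imperative style: walk the structure while mutating an accumulator.
def depth_iterative(tree):
    depth, stack = 0, [(tree, 1)]
    while stack:
        node, level = stack.pop()
        depth = max(depth, level)
        for child in node.get("children", []):
            stack.append((child, level + 1))
    return depth

# Recursive, more functional style: the definition reads almost like the specification.
def depth_recursive(tree):
    children = tree.get("children", [])
    return 1 + max((depth_recursive(c) for c in children), default=0)

tree = {"children": [{"children": [{}]}, {}]}
print(depth_iterative(tree), depth_recursive(tree))   # 3 3
```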

As a rule of thumb, I think a programmer should learn at least one new technology every year; of course he does not need to become a master of all of them, but he should get a good understanding of the design philosophy of each and, whenever possible, try to apply some of them to a real-world project.

A good developer should view the technologies he uses to build a solution as a mere detail in the whole picture and should design in such a way that his implementations are as technology-agnostic as possible. Platforms that are tightly coupled with specific technologies, although attractive for tactical solutions, will eventually become problematic from a more strategic perspective.

Based on this, it becomes clear that a developer should be liberal about the tools he uses to reach his goals, never be reluctant to adopt new approaches, and stay focused on the big picture as opposed to concentrating on the details.

Think twice before using raw percentages

When handicapping, it is a very common tactic to use percentages as an indicator of the likelihood of some event. A common fallacy among horse players is the misunderstanding and misinterpretation of a percentage figure, which can easily create wrong impressions and bad betting decisions.

For example, let’s consider the winning percentages of two trainers. Trainer A has 16% while trainer B has 33%. Which of the two seems better based on these percentages?

A lot of us would be quick to select trainer B as the better of the two. This is not necessarily the correct decision; in reality we do not have enough data to make such a judgment. Our input is incomplete, and we need further information if we want our opinion to have any value.

Why is it that the first impression of just comparing the winning percentage of each trainer is not sufficient? The following example will clarify the reason:

Let’s assume that trainer A runs in races that are always full of entrants; for the sake of the example we can further assume that each of these races consists of 15 starters. So, this trainer is winning more than double his fair share, which would be around 7%. You can easily see that it takes a very good trainer to reach such an accomplishment.

Now let’s go to trainer B who, for the needs of our example, happens to run his horses at a fictitious racetrack where only match races are permitted. Following the same logic as before, we can easily conclude that he is a rather weak trainer, since his fair share of wins would be 50% but he is only winning at a 33% rate.

This example is of course an extreme case, but it still illustrates the inadequacy of raw winning percentages as handicapping factors pretty well. In the real world we will probably not face such stark situations, but the concept is quite common.

Of course there exist several methods to improve the value of a percentage as an indicator, with impact values and the chi-square test being the most commonly used.
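A small sketch of the impact-value idea, using the two fictitious trainers above; the win and start counts are invented, and the field sizes are the ones assumed in the example:

```python
def impact_value(wins, starts, avg_field_size):
    """Actual win rate divided by the 'fair share' implied by the field size."""
    actual = wins / starts
    fair_share = 1.0 / avg_field_size
    return actual / fair_share

# Trainer A: 16% wins in 15-horse fields; Trainer B: 33% wins in match races.
print(round(impact_value(16, 100, 15), 2))   # ~2.4  -> well above his fair share
print(round(impact_value(33, 100, 2), 2))    # ~0.66 -> below his fair share
```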

For now, let’s talk a bit more about percentages, since they are the topic of this post.

It is possible to improve the expressiveness of a percentage by applying additional conditions. What does this mean exactly? Again, let me use an example to make it easy to understand.

Jockey A wins at a 28% rate while B wins at only 10%. Based on what I said above, these figures alone do not necessarily mean much, and we cannot derive valid conclusions solely from them.

But let’s now go a step further and calculate the percentage for each jockey, not over all of his mounts but only over the favorites he happened to have ridden. It is quite possible, and happens all the time, that the statistics will now look completely different. For example, they could now show A winning 29% and B winning 45% when they are on the favorite of the race. Although at this point I do not want to use this data to arrive at a handicapping conclusion (in other words, whether this discrepancy is a good or a bad thing for betting purposes), I think it is quite obvious that our view of these two jockeys may very well be different than before.

Just as we specified favoritism as an additional condition, we can create any combination we think might present some value, and by repeating this process for several such combinations we can view the same match from different points of view.
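Here is a minimal sketch of that idea, computing a jockey’s raw win rate next to his win rate under one extra condition (being on the favorite); the ride records are fabricated purely for illustration:

```python
from collections import defaultdict

# Each ride record: (jockey, was_favorite, won) — fabricated sample data.
rides = [
    ("A", True, True), ("A", False, False), ("A", True, False),
    ("B", True, True), ("B", False, False), ("B", True, True),
]

def win_rate(records):
    return sum(1 for r in records if r[2]) / len(records) if records else 0.0

by_jockey = defaultdict(list)
for ride in rides:
    by_jockey[ride[0]].append(ride)

for jockey, recs in sorted(by_jockey.items()):
    favorites_only = [r for r in recs if r[1]]   # the extra condition: on the favorite
    print(f"Jockey {jockey}: overall {win_rate(recs):.0%}, "
          f"on favorites {win_rate(favorites_only):.0%}")
```

Any other condition (distance, surface, class level, and so on) can be swapped in for the favoritism flag in exactly the same way.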

This process can be seen as the starting point for the creation of a set of statistical measurements that can improve our performance as bettors, and I will talk more about it later…

The most common fallacy

The most common fallacy among horse players is the belief that winning at the races is all about skill and that there is always a correct way to select your bets. Horse bettors tend to overestimate their understanding of the game, usually living in denial of the evidence and facts proving that they simply cannot beat it.

The horse player is constantly in search of the winning approach that will give him the edge in his battle against the crowd. All kinds of methodologies have been developed over the years, measuring track variants, speed and pace figures, race profiles, pace handicapping, even the biorhythms of the horse, most of them making a profit only for their creators and promoters. What almost all the users, and even the producers, of these kinds of products seem to be missing is that betting horses is not a productive process; it does not create wealth but instead redistributes the amount bet, and each time this happens a large percentage of it disappears from the pools, becoming real profit for the racetrack owner and all the professionals related to any extent to the industry.

So, horse betting has by default a negative expectation, which can only be turned positive if an individual player has the ability to outsmart the odds offered by the crowd. Everything starts from there. If there is no edge, it is obviously impossible to avoid loss in the long run due to the takeout.

Can it be proven that the game can be beaten? Does horse betting involve enough skill to allow an ‘expert’ to create wealth just by placing bets? Do expert horse racing handicappers who can consistently beat the public really exist?

These are not easy questions to address, and most likely there is no clear answer, especially one that can be reached through an analytical way of thinking. It seems the best we can do in trying to find an objective answer is to look at reality, analyze several winning players using some statistical methodology, and form our opinion based on that. Of course this sounds easier than it really is, considering how difficult it is to gather the necessary data to perform this type of research. I have never seen such research, or anything more than aphorisms of the type ‘only 1% of horse players are winners, all the rest end up losing their bankroll’, without any real evidence to justify such an argument.

Personally, I do not think anyone can systematically outperform the crowd, converting the pools into a form of ATM that he can grind on a regular basis. Again, I cannot prove my point, but until I have enough evidence to the contrary I will consider it valid. Despite this, I think horse racing might present some opportunity for a player to score pretty big, if he is willing to accept that this is not something that can be done regularly, and that beyond sound handicapping and skill it additionally requires being selected by luck. In other words, the type of horse player who can make a big score should be well aware of the impact of luck in this process and should try to put himself in a position to attract it. Surely this approach is by no means a recipe for guaranteed profits (such a thing does not exist), but it converts an average horse bettor who struggles to extend his betting time into a speculator who, knowing the odds are against him, still tries his best to put himself in the right spot to get lucky when the circumstances are favorable…

 

 

Data normalization in general

As we discussed before, for a model to be easily trained it needs to receive its input in a normalized form.
The most common normalization methods fall into the following broad categories:

– Min – Max

– z-score

– Decimal Scaling

– Logarithmic

– Sliding Window

Min – Max: This is one of the simpler approaches and works as a linear transformation to a given range.

Two of the problems associated with this approach have to do with the following requirements:

(1) The value range should be known and predefined both in training and in real-world use
(2) The closer the distribution is to a linear form, the better it works
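A minimal NumPy sketch of min-max scaling to a target range (here [0, 1]); in practice the minimum and maximum should be fixed from the training data and reused later:

```python
import numpy as np

def min_max(x, lo=0.0, hi=1.0):
    """Linearly rescale x into [lo, hi]; x_min/x_max should come from the training set."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return lo + (x - x_min) * (hi - lo) / (x_max - x_min)

print(min_max([10, 20, 40]))   # 10 -> 0.0, 20 -> ~0.33, 40 -> 1.0
```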

z-score: Uses a simple formula to normalize the data based on the number of sigmas (standard deviations) from the mean.

This method works well assuming the following:

(1) The data are normally distributed
(2) The mean and sigma can be considered the same for the training and the real-world data
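A small sketch of the z-score transform; the mean and sigma parameters are there so that values fitted on the training set can be reused on real-world data:

```python
import numpy as np

def z_score(x, mean=None, sigma=None):
    """Express each value as the number of standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    mean = x.mean() if mean is None else mean     # ideally fixed from the training set
    sigma = x.std() if sigma is None else sigma   # and reused on real-world data
    return (x - mean) / sigma

print(z_score([2, 4, 6]))   # roughly [-1.22, 0.0, 1.22]
```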

Decimal Scaling: Moves the decimal point of the values according to the maximum absolute value. This means that with this technique the normalized data will always fall between -1 and 1.

This approach needs (similarly to the min – max method) the maximum absolute value to be predefined.
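A short sketch of decimal scaling; the divisor is the smallest convenient power of ten derived from the maximum absolute value, which therefore has to be known in advance:

```python
import numpy as np

def decimal_scaling(x):
    """Shift the decimal point according to the maximum absolute value."""
    x = np.asarray(x, dtype=float)
    j = np.ceil(np.log10(np.abs(x).max()))   # the maximum absolute value must be known
    return x / (10 ** j)

print(decimal_scaling([35, -250, 7]))   # [0.035, -0.25, 0.007]
```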

Logarithmic: Used wherever it makes sense to use a logarithmic representation instead of a linear one.

Again, this method needs some additional normalization of the input set, otherwise it might introduce some bias.
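One common choice for the logarithmic transform is log1p, i.e. log(1 + x), which assumes non-negative inputs; this is only one possible variant, shown here as a sketch:

```python
import numpy as np

def log_normalize(x):
    # log(1 + x): compresses large values, keeps 0 mapped to 0; assumes x >= 0.
    return np.log1p(np.asarray(x, dtype=float))

print(log_normalize([0, 9, 99, 999]))   # roughly [0.0, 2.3, 4.6, 6.9]
```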

Sliding Window: Mostly used for time-series data, and it can possibly be applied to any other data that can be expressed as a function of time. What this approach does is divide the data into fragments of a specific length and normalize each window individually.
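A sketch of the window idea: the series is cut into consecutive fragments and each one is normalized on its own. Min-max is used inside each window here, but a per-window z-score would work the same way; the window length of 5 is arbitrary:

```python
import numpy as np

def sliding_window_normalize(series, window=5):
    """Split the series into consecutive fragments and min-max scale each one on its own."""
    series = np.asarray(series, dtype=float)
    out = np.empty_like(series)
    for start in range(0, len(series), window):
        chunk = series[start:start + window]
        lo, hi = chunk.min(), chunk.max()
        # Scale each window independently; a constant window maps to 0.
        out[start:start + window] = 0.0 if hi == lo else (chunk - lo) / (hi - lo)
    return out
```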

The topic of normalization is one of the most important in the creation of a model, and I will analyze it more deeply in my next postings…