A categorization of handicapping data

A simple study of the crowd's behavior reveals that it is surprisingly good in its estimates. It is a well-known fact that favorites win more races than second choices, second choices more than third choices, and so on. Based on this observation we can easily conclude that the outcome of a race is highly correlated with the past performances of its participants (since these are the source the crowd uses to make its betting decisions).

Another trivial conclusion is that although the crowd can pick the most frequent winner with remarkable consistency and accuracy, it is impossible to make a profit by betting on it. A more astute researcher will add that although you cannot make a profit by always betting the favorite, you will still lose the least amount possible by betting on it rather than on the second, third or any other choice; this means that favorites as a group seem to be underbet while the opposite occurs for longer-priced horses. In fact, we can easily show that the longer the odds, the more of an underlay the corresponding group of horses is.
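The claim about favorites versus longer-priced horses can be checked against any set of results. What follows is only a minimal sketch: the function name, the input layout (each race as a list of odds-to-1 and a win flag) and the three toy races are all my own assumptions for illustration, not real results. It ranks the starters of each race by their odds and reports, for each betting rank, the win rate and the return on a flat 1-unit win bet.

```python
from collections import defaultdict


def roi_by_odds_rank(races):
    """Win rate and flat-bet ROI per betting rank (1 = favorite).

    races: list of races, each a list of (odds_to_1, won) tuples.
    This layout is an assumption made for illustration.
    Returns {rank: (bets, win_rate, roi)}.
    """
    bets = defaultdict(int)
    wins = defaultdict(int)
    returned = defaultdict(float)
    for race in races:
        # Lowest odds first; ties keep their input order.
        ranked = sorted(race, key=lambda starter: starter[0])
        for rank, (odds, won) in enumerate(ranked, start=1):
            bets[rank] += 1
            if won:
                wins[rank] += 1
                returned[rank] += odds + 1.0  # winnings plus the 1-unit stake
    return {
        rank: (bets[rank], wins[rank] / bets[rank], returned[rank] / bets[rank] - 1.0)
        for rank in sorted(bets)
    }


# Toy data only, invented to show the call; it proves nothing about real pools.
toy_races = [
    [(1.5, True), (3.0, False), (8.0, False)],
    [(2.0, False), (2.5, True), (10.0, False)],
]
toy_result = roi_by_odds_rank(toy_races)
for rank, (n, win_rate, roi) in toy_result.items():
    print(f"rank {rank}: bets={n} win_rate={win_rate:.2f} roi={roi:+.2f}")
```

Run over a large enough sample of real races, the pattern described above would show up as an ROI that gets steadily worse as the rank increases.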

This behavior of the public is a well-known psychological effect known as risk seeking, and you can read more about it in Daniel Kahneman's book Thinking, Fast and Slow. An example of this concept can be observed in lotto players, who are willing to risk a very small amount in the hope of winning a very large one, even if the odds are severely against them.

Having said that, in this post I will not describe a complete betting solution, something that needs to account for randomness and variance; I will focus only on the data which, as we saw, play a dominant role in the outcome of the race.

To make my research easier I will divide the data that can affect the outcome of a race into the following categories:

Type 1:
Obvious handicapping factors used by the crowd

Type 2:
Hidden handicapping factors that, although derived from the public data, are not known to the wider public

Type 3:
Factors that cannot be known to the public but are known to the connections of each starter or to some insiders of the game.

We can also add another type of data: those that are completely unknown. An example might be a pathological condition of a horse that has not yet been diagnosed but will become apparent during the race. For now I will not deal with this type of data; I will speak about it in a later post.

Type 1 data are the easiest to specify, as they are directly reflected in the betting pools. Traditional handicapping factors, like those found in handicapping books that suggest betting only horses that have previously won at the distance or are coming off a big win, are some examples of this type of factor, which to a great extent drives the game.

Type 2 represents a more processed state of the primitive data, one that is not widely known or is completely unknown to the public. One such example could be the Sartin figures: although known and used by some handicappers, they do not constitute a mainstream approach, and in cases where they contradict Type 1 data they might present some betting value. A similar example could be metrics created by an individual handicapper and used only by him.

Type 3 is a category that cannot be directly evaluated, but we can probably use some indicators, layoff for example. It is obvious that the longer the layoff of a starter, the more likely it is for the trainer to have observed some change in his condition. Somehow we need to identify as many such indicators as possible, ones that might increase the ability of the insiders to hold some hidden information. As bettors we might prefer either that this type of information not be present at all, or to have enough hints to make an educated guess about what the insiders are trying to do.
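To make the layoff indicator a bit more concrete, here is a rough sketch of how it could be derived from the past performances. Everything in it is an assumption of mine for illustration: the function names, the mapping of horse name to date of last start, the made-up horses, and especially the 60-day threshold, which is an arbitrary placeholder rather than a researched cutoff.

```python
from datetime import date


def layoff_days(race_date, last_start):
    """Days since the starter's previous race, or None for a first-time starter."""
    if last_start is None:
        return None
    return (race_date - last_start).days


def long_layoff_starters(last_starts, race_date, threshold_days=60):
    """Names of starters whose layoff is at least threshold_days.

    last_starts: {horse_name: date of last start, or None}.
    threshold_days=60 is an arbitrary assumption, not a tested value.
    First-time starters are skipped, since they have no layoff to measure.
    """
    flagged = []
    for name, last_start in last_starts.items():
        days = layoff_days(race_date, last_start)
        if days is not None and days >= threshold_days:
            flagged.append(name)
    return flagged


# Hypothetical starters, invented only to show the call.
toy_last_starts = {
    "Horse A": date(2024, 5, 18),
    "Horse B": date(2024, 1, 15),
    "Horse C": None,  # first-time starter
}
toy_flagged = long_layoff_starters(toy_last_starts, date(2024, 6, 1))
print(toy_flagged)
```

A flag like this says nothing by itself about what the connections know; it only marks the starters where hidden information is more likely to exist.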

After these definitions, the next step is to investigate whether there exists a relationship among these three categories that can give us a measurement of how profitable a race can be.

Can we somehow create a metric that will tell us how profitable or not this race will be for betting purposes?

For now I will just articulate a hypothesis leaving its proof for another posting.

Based on what I have described so far, it is natural to assume that the more dominant the presence of Type 1 data is for a specific race, the more likely it is for the crowd to form the correct opinion, shaping the pools in such a way that it is impossible to extract any value out of them.

Conversely, the less Type 1 data is available, the more likely it is for the crowd to be confused, ending up with a weak line that offers some betting opportunity.

So, the more the outcome of the race is influenced by Type 2 data, the more interesting it becomes for the handicapper, while a race with no significant Type 2 and no Type 3 data will be handicapped correctly by the crowd, thus not offering any value.

The immediate challenges we are facing at this point are the following:

– Identify the Type 1 factors

– Create a set of valid Type 2 factors

– Transform hints we might get from the past performances into possible Type 3 factors.

These three points will be the topics of my next posts…