
Methodology


The KPoz sports ranking system is a linear probability model based on the logit regression technique. The model assigns a power rating to each team based on the results of games played (i.e. wins and losses) and incorporates a home field advantage term to adjust for the psychological edge from playing at home. Teams are then ranked according to their power ratings.

In addition to its high degree of accuracy, the KPoz model satisfies the required properties of statistical ranking models: transparency, verifiability, and accuracy.

Model Description


The KPoz model determines a power rating for each team from wins and losses and from game venue, using a home field advantage term to account for the psychological edge gained from playing at home. The model incorporates overall strength of schedule, the power ratings of the opponents, and home field advantage. Teams are then ranked according to their power ratings.

The KPoz regression equation is as follows:

$$Y = \beta_0 + \beta_H - \beta_A \quad (1)$$

where

$Y$ = natural logarithm of the odds ratio
$\beta_0$ = home field rating
$\beta_H$ = power rating for the home team
$\beta_A$ = power rating for the visiting team

Mathematically, we have

$$Y = \ln\left(\frac{p_{HA}}{1 - p_{HA}}\right) \quad (2)$$

where

$p_{HA}$ = probability that the home team will defeat the away team
$1 - p_{HA}$ = probability that the away team will defeat the home team
$\frac{p_{HA}}{1 - p_{HA}}$ = odds ratio (percentage of wins to percentage of losses)
$\ln\left(\frac{p_{HA}}{1 - p_{HA}}\right)$ = natural logarithm of the odds ratio

The “odds ratio” above is the ratio of wins to losses, that is, the number of wins per loss. It is typically computed as the winning percentage divided by the losing percentage. For example, a team that wins 75% of its games has an odds ratio of 0.75 / 0.25 = 3, or three wins per loss.

It is important to note in equation (1) that the sign of the home team's rating is always positive, i.e., $+\beta_H$, and the sign of the away team's rating is always negative, i.e., $-\beta_A$. The home field advantage term is also always positive, i.e., $+\beta_0$.

Example

Suppose we have the following ratings: $\beta_0 = 0.25$, $\beta_H = 3.25$, and $\beta_A = 2.75$. Then the probability that the home team will defeat the away (visiting) team is computed as follows:

First, compute the natural logarithm of the odds ratio.

$$Y = \beta_0 + \beta_H - \beta_A = 0.25 + 3.25 - 2.75 = 0.75$$

So we have

$$Y = \ln\left(\frac{p_{HA}}{1 - p_{HA}}\right) = 0.75$$

Next, compute pHA as follows:

$$\ln\left(\frac{p_{HA}}{1 - p_{HA}}\right) = 0.75$$

$$\frac{p_{HA}}{1 - p_{HA}} = e^{0.75}$$

$$p_{HA} = \frac{e^{0.75}}{1 + e^{0.75}} = 0.68$$

So the probability that the home team will win is pHA = 68%.

The probability that the visiting (away) team will win is simply 1 minus the probability that the home team will win. That is,

$$p_{AH} = 1 - p_{HA} = 1 - 68\% = 32\%$$
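The calculation above can be sketched in a few lines of Python. This is only an illustration of equations (1) and (2) using the example ratings; the function name `home_win_probability` and its parameter names are assumptions made for this sketch, not part of the KPoz system.

```python
import math


def home_win_probability(home_field, home_rating, away_rating):
    """Invert equation (2): p = e^Y / (1 + e^Y), with Y from equation (1)."""
    y = home_field + home_rating - away_rating  # equation (1)
    return 1.0 / (1.0 + math.exp(-y))           # same as e^Y / (1 + e^Y)


# Ratings from the worked example above.
p_home = home_win_probability(0.25, 3.25, 2.75)
p_away = 1.0 - p_home

print(round(p_home, 2), round(p_away, 2))  # prints 0.68 0.32
```

Writing the result as 1 / (1 + e^(-Y)) is algebraically identical to e^Y / (1 + e^Y).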

Specifying the probabilities

One difficulty that arises when estimating the parameters of the model, i.e., the power rating for each team, is that we do not know the exact probability that one team will defeat another. Therefore, we cannot compute the true “odds ratio” or Y exactly. This is resolved by estimating the probability from each observation: assign $\hat{p} = 0.90$ if the home team won the game and $\hat{p} = 0.10$ if the visiting team won the game. Thus we have,

$$Y = \begin{cases} \ln\left(\frac{0.90}{0.10}\right) = 2.20 & \text{if the home team wins} \\ \ln\left(\frac{0.10}{0.90}\right) = -2.20 & \text{if the home team loses} \end{cases}$$

So, if the home team wins we set Y = 2.20 and if the away team wins we set Y = –2.20.
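As a small illustration, the assignment of Y to each game result might be coded as follows. The helper names `logit` and `target_y` and the `p_win` parameter are hypothetical; only the 0.90 and 0.10 values come from the text.

```python
import math


def logit(p):
    """Natural logarithm of the odds ratio p / (1 - p)."""
    return math.log(p / (1.0 - p))


def target_y(home_won, p_win=0.90):
    """Y for one game: logit(0.90) for a home win, logit(0.10) for a home loss."""
    return logit(p_win) if home_won else logit(1.0 - p_win)


print(round(target_y(True), 2), round(target_y(False), 2))  # prints 2.2 -2.2
```

The two values are symmetric about zero because ln(0.10 / 0.90) = -ln(0.90 / 0.10).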

Example - Estimating the parameters

Suppose we have the following six observations.

  1. J (home) defeats K (away), so Y = 2.20.
  2. L (home) loses to J (away), so Y = –2.20.
  3. K (home) defeats L (away), so Y = 2.20.
  4. M (home) loses to K (away), so Y = –2.20.
  5. M (home) defeats L (away), so Y = 2.20.
  6. J (home) defeats M (away), so Y = 2.20.

Then the model equations (1) are written as follows:

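$\beta_0 + \beta_J - \beta_K = 2.20$
$\beta_0 + \beta_L - \beta_J = -2.20$
$\beta_0 + \beta_K - \beta_L = 2.20$
$\beta_0 + \beta_M - \beta_K = -2.20$
$\beta_0 + \beta_M - \beta_L = 2.20$
$\beta_0 + \beta_J - \beta_M = 2.20$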

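The ratings $\beta_0$, $\beta_J$, $\beta_K$, $\beta_L$, and $\beta_M$ are determined by regression analysis. This requires a correction for matrix rank, since the team ratings enter equation (1) only as differences and the system is therefore rank-deficient, and a correction for heteroscedasticity.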
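A minimal sketch of one way to fit the six equations by ordinary least squares is shown below. It is not the KPoz procedure: the rank problem is handled here by pinning one team's rating to zero (an assumption chosen for simplicity, so the resulting scale will differ from ratings such as 3.25 in the earlier example), and no heteroscedasticity correction is applied. The names `fit_ratings`, `solve_linear`, and `reference` are hypothetical.

```python
import math

# Game log from the example above: (home team, away team, home team won?)
GAMES = [
    ("J", "K", True),
    ("L", "J", False),
    ("K", "L", True),
    ("M", "K", False),
    ("M", "L", True),
    ("J", "M", True),
]
Y_WIN = math.log(0.90 / 0.10)  # about 2.20


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ratings(games, reference):
    """Least squares fit of equation (1) with the reference team pinned to 0."""
    teams = sorted({t for g in games for t in g[:2]} - {reference})
    cols = ["home_field"] + teams
    rows, ys = [], []
    for home, away, home_won in games:
        row = [0.0] * len(cols)
        row[0] = 1.0                          # + home field term
        if home != reference:
            row[cols.index(home)] += 1.0      # + home team rating
        if away != reference:
            row[cols.index(away)] -= 1.0      # - away team rating
        rows.append(row)
        ys.append(Y_WIN if home_won else -Y_WIN)
    k = len(cols)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = dict(zip(cols, solve_linear(xtx, xty)))
    beta[reference] = 0.0
    return beta


ratings = fit_ratings(GAMES, reference="M")
ranking = sorted("JKLM", key=lambda t: ratings[t], reverse=True)
print(ranking)  # prints ['J', 'K', 'M', 'L']
```

Pinning a different team, or requiring the ratings to sum to zero, would shift every team rating by the same constant and leave the ranking and the predicted probabilities unchanged.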