Elo Ranking Methodology
Original Twitter Post: https://twitter.com/scuerij/status/1527039265800085510
@TheRedVillage | Elo Ranking Methodology | 18 MAY 2022
Moving forward, my @TheRedVillage Elo leaderboard rankings will be based upon a revised K-factor of 11 (formerly 20). Accordingly, Elo rankings posted on and after May 18th will differ slightly from those posted before. For more info, read on: Thread: 1/11
TRV Elo scores are updated after each battle, and the so-called “K-factor” defines the maximum value by which a champ’s Elo score will change after a battle. With this change, a champ will increase/decrease by no more than 11 pts (formerly 20 pts) following a battle. 2/11
The Elo system has been described extensively online, but a really clear, relatively non-technical overview is provided by Franz et al. 2015. This research effort applied the Elo scoring system to study non-human primate dominance hierarchies. 3/11

The excerpt below defines how win probability and Elo updates are calculated… Note that this is a zero-sum system, and in the implementation I use, a participant will never gain points for a loss, nor lose points for a win, no matter how large the difference in Elo. 4/11

Following a battle, the participants’ Elo scores will change minimally if the outcome aligns with the win probability implied by the difference in Elo prior to the match. By contrast, Elo will change maximally if the outcome deviates from Elo expectations. 5/11

The K-factor defines the maximal amount by which Elo will change following a battle. 6/11
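
As a rough illustration, the update described above can be sketched in Python. This is not the thread author's code: it assumes the common logistic Elo formula (base 10, 400-point scale), which may differ from the exact form in the Franz et al. excerpt, and the function names `win_probability` and `update_elo` are hypothetical.

```python
from typing import Tuple


def win_probability(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Pre-fight probability that A beats B (assumed logistic form)."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))


def update_elo(winner: float, loser: float, k: float = 11.0) -> Tuple[float, float]:
    """Return (new_winner_elo, new_loser_elo) after one battle.

    The winner gains exactly what the loser drops (zero sum), and the
    change is always strictly between 0 and k.
    """
    p_win = win_probability(winner, loser)
    delta = k * (1.0 - p_win)  # small if the win was expected, near k for an upset
    return winner + delta, loser - delta


# Evenly matched champs: each moves by half of K.
print(update_elo(1000.0, 1000.0))  # (1005.5, 994.5)

# A 200-point underdog wins: the change is larger, but still below K = 11.
print(update_elo(900.0, 1100.0))
```

Because `delta` is always positive, the winner never loses points and the loser never gains them, which matches the zero-sum behavior described in the thread.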

So, TRV Elo scores are updated after each battle, and K defines the maximal value by which a champ’s Elo score will change. In this case, a champ will increase/decrease by no more than 11 pts following a battle. But how was the appropriate K-factor determined? 7/11
I used a parameter optimization method to determine the optimal K value, whereby win probabilities expected by Elo (pre-fight) were compared to post-fight outcomes. This method is outlined in Foerster et al. 2016. 8/11
K values 1-30 were tested across ~80k TRV battles, and the value that maximized the likelihood of the observed fight outcomes (excluding a 10% burn-in) was deemed optimal. Using this procedure, K=11 was chosen, per the graphic below. This value will be reevaluated periodically. 9/11
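
A minimal sketch of this kind of K search is shown below. It is not the thread author's code and makes several assumptions: the logistic win probability with a 400-point scale, a starting Elo of 1000 for every champ, and a burn-in defined as the first 10% of battles (which still update ratings but are left out of the likelihood). The names `log_likelihood` and `best_k`, and the `(winner, loser)` battle format, are hypothetical.

```python
import math


def log_likelihood(battles, k, burn_in=0.10, start=1000.0, scale=400.0):
    """Sum of log pre-fight win probabilities assigned to the actual winners.

    battles: chronological list of (winner_id, loser_id) pairs.
    """
    ratings = {}
    skip = int(len(battles) * burn_in)  # burn-in battles are not scored
    total = 0.0
    for i, (winner, loser) in enumerate(battles):
        r_w = ratings.get(winner, start)
        r_l = ratings.get(loser, start)
        p = 1.0 / (1.0 + 10.0 ** ((r_l - r_w) / scale))
        if i >= skip:
            total += math.log(p)
        delta = k * (1.0 - p)  # zero-sum update, same K for both champs
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return total


def best_k(battles, candidates=range(1, 31)):
    """Return the candidate K with the highest log-likelihood."""
    return max(candidates, key=lambda k: log_likelihood(battles, k))


# Toy data only: champ "a" beats champ "b" twenty times in a row.
toy_battles = [("a", "b")] * 20
print(best_k(toy_battles))
```

On real battle data, a K that is too small adapts too slowly and a K that is too large overreacts to single results, so the likelihood typically peaks somewhere in between.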

It is interesting to compare the optimal TRV K-factor (K=11) to that of other sports. Per FiveThirtyEight, the optimal K-factor for the NBA, NFL, and international soccer is around 20. By contrast, the optimal K-factor for MLB is closer to 4. 10/11
These K-factors loosely suggest that TRV fight outcomes are “noisier” (i.e., subject to more randomness) than NBA, NFL, or international soccer matches, but are more predictable than MLB games. So, it may not take 162 fights to identify a stud, but it takes time to cut through the noise. 11/11