Let’s Talk About Team Ratings
So a few months ago, scant (Dota Editor here at 2P) tasked me with coming up with a ranking system for the relaunch of the website, as well as doing general statistics work for other articles here at 2P. The rankings were supposed to be at least roughly in line with what people understood to be ‘right’.
For last year’s The International, I had spent quite a bit of time thinking about how hard it would be to make predictions for any hypothetical match, and to ultimately answer the question of “who will win TI3?”. I eventually came up with a stochastic head-to-head model that took the winrates between all pairs of teams, the average overall winrate of each team and the Elo value of each team, and mixed all of these together in a weighted average. I ran simulations, and Martin from datDota (who provides all the excellent data we use) published the results on the datDota blog. The results were pretty good: the model made several key predictions and was helpful enough to make some good profit betting on the TI matches. In a recent article from Goldman Sachs, they discussed how some of their economists used a similar model to make FIFA World Cup 2014 predictions. Since the Elo ratings formed such a core component of this model, I was at least slightly confident that the Elo rating idea would pan out nicely.
Elo is a chess rating system that assigns a ‘skill rating’ to everyone. Over time, people’s skill ratings go up and down based on winning or losing (if you win, your rating always goes up and your opponent’s rating goes down by the same amount that yours went up by). The amount that ratings change by for a result is based on the difference in their ratings. In cases where a team that is heavily favoured to win beats a severe underdog, they get very few points; but should an underdog beat a favourite, they will take many points away from the favourite.
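To make the point transfer concrete, here is a minimal sketch of a textbook Elo update. The 400-point logistic scale is the conventional chess one, and the K-factor of 32 is purely an assumption for illustration; neither is confirmed as the exact setting used for the 2P rankings.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the conventional Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(winner: float, loser: float, k: float = 32.0) -> Tuple[float, float]:
    """Return the new (winner, loser) ratings after one game.

    k is a hypothetical K-factor. Whatever the winner gains,
    the loser gives up, so the total number of points is unchanged.
    """
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


# A heavy favourite beating an underdog earns very little...
favourite_wins = update(1800.0, 1400.0)
# ...while an underdog beating the favourite takes a lot.
upset = update(1400.0, 1800.0)

print(favourite_wins)
print(upset)
```

With these assumed settings the favourite gains roughly 3 points for the expected win, while the underdog gains roughly 29 for the upset.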
So my first goal was to resurrect the model that I’d used, update it, and allow integration with datDota, so that the model is continuously updated. Elo ratings are a pretty stock standard, and most of the issues with the otherwise fantastic Elo system are completely non-existent in the Dota world - teams don’t refuse to play matches against some opponents to protect their ratings, they want to win (money)!
The system I’ve used has a few small deviations from normal Elo implementations. The first involves point deflation/inflation. If a team disbands, their rating under Elo remains the same forever, until they are unlisted and must start a new rating afresh. In this system, a small percentage of the rating of every team who is above the average rating is taken away each day, and distributed proportionally to the teams who have a rating below average (taking them back closer towards the average value). This amount is small: it just means that a team will return to about 10% above the average rating (no matter how high their rating is) over a period of 3 months of inactivity. Obviously this is easy to mitigate by playing games.
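The daily redistribution can be sketched roughly as follows. This is only one plausible reading of the description above: the function name `redistribute`, the `rate` value, and the choice to tax each team's surplus over the average (rather than its whole rating) are all assumptions, not the actual 2P implementation.

```python
from typing import Dict


def redistribute(ratings: Dict[str, float], rate: float = 0.01) -> Dict[str, float]:
    """Move a small share of points from above-average to below-average teams.

    rate is a hypothetical daily fraction. Points taken from teams above
    the average are pooled and handed to teams below the average in
    proportion to how far below they are, so the total is conserved.
    """
    mean = sum(ratings.values()) / len(ratings)
    surplus = {t: r - mean for t, r in ratings.items() if r > mean}
    deficit = {t: mean - r for t, r in ratings.items() if r < mean}

    pool = rate * sum(surplus.values())
    total_deficit = sum(deficit.values())

    updated = dict(ratings)
    for team, amount in surplus.items():
        updated[team] -= rate * amount
    for team, amount in deficit.items():
        updated[team] += pool * amount / total_deficit
    return updated


before = {"Alpha": 1800.0, "Bravo": 1500.0, "Charlie": 1200.0}
after = redistribute(before)
print(after)
```

Because the pool is paid out in full, the average rating stays put and only the spread around it shrinks.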
The second difference involves what unit is used for the Elo system. Most implementations of Dota 2 Elo systems involve using a match result as an Elo data point. This means that if a team wins 4-0 in a bo7, it counts the same as winning 4-3. This is a bit misleading, and I tend to avoid these systems. An Elo model should use the smallest possible significant result it can, and in the case of Dota - this is a single game, not a match. A complete underdog should logically get a significant point reward if they take a favoured team to the 7th game in a best of seven match, and to do this one has to consider each game as a separate entity. To stop the timing of matches within a day from being significant, all games for a day are processed at the same time.
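Here is a rough sketch of per-game, per-day processing, under the assumption that every game in a day is scored against the ratings as they stood at the start of that day. The K-factor, the 400-point scale, the function names and the team names are all illustrative, not taken from the real system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the conventional Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def process_day(ratings: Dict[str, float],
                games: List[Tuple[str, str]],
                k: float = 32.0) -> Dict[str, float]:
    """Apply one day's single games, given as (winner, loser) pairs.

    Every game uses the start-of-day ratings, so the order of games
    within the day does not matter. k is a hypothetical K-factor.
    """
    deltas = defaultdict(float)
    for winner, loser in games:
        gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        deltas[winner] += gain
        deltas[loser] -= gain
    return {team: r + deltas.get(team, 0.0) for team, r in ratings.items()}


start = {"Favourite": 1800.0, "Underdog": 1400.0}
# A bo7 that the favourite sweeps 4-0, versus one that goes the full 4-3.
sweep = process_day(start, [("Favourite", "Underdog")] * 4)
close = process_day(start, [("Favourite", "Underdog")] * 4
                    + [("Underdog", "Favourite")] * 3)
print(sweep)
print(close)
```

Counted game by game, the underdog comes out of the 4-3 series well ahead of where the 4-0 sweep leaves them, even though they lost the match both times.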
A big concern is having teams who are from different populations because of geographical boundaries. In Dota 2 - this is primarily ping-related, with Chinese teams mostly playing between themselves, SEA doing the same thing and then a big clump of “Western Teams” (both the Americas and Europe). As anyone who has played matchmaking in a somewhat isolated region (like South Africa) and then returned to EU will know, your rating in the smaller region becomes very inflated by winning a lot within the smaller, more isolated community; and upon returning to a larger community you are in a pool of much better players. This is because each region can reach a different dynamic equilibrium: for example, maybe a team of rating 1400 from Europe can maintain a 50% winrate against a 1600 rating SEA team. So, as a warning - cross-region comparisons should be taken with a pinch of salt, but over time the ratings slowly tend towards their true values (more and more international tournaments break down the effects of isolation).
One thing to note is that Elo ratings don’t necessarily reflect who will win between two teams. Elo ratings measure performance of true underlying skill against an arbitrarily large number of opponents - an average of sorts. A random team with a rating of 1800 is expected to lose 1 out of every 11 games against a random team with a rating of 1400, so it’s not impossible for these kinds of upsets to occur, they should just happen rarely.
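The 1-in-11 figure follows from the conventional Elo expected-score formula (assuming the usual 400-point scale, which is consistent with the number quoted above). Writing E for the expected score of the 1800-rated team:

```latex
E = \frac{1}{1 + 10^{(R_{\text{opp}} - R_{\text{team}})/400}}
  = \frac{1}{1 + 10^{(1400 - 1800)/400}}
  = \frac{1}{1 + 10^{-1}}
  = \frac{10}{11} \approx 0.909
```

So the stronger team is expected to win 10 games out of every 11, and lose the remaining one.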
At the moment, the bar on the right-hand side shows the top 10 teams per region: “East” and “West”. For a team to be listed there, they need to have played more than 50 games ever, and their last game has to be in the last 60 days. This is just to remove teams who are inactive and slowly decaying back to average. The “Rankings” page shows this information, as well as whether the Elo rating for each team has gone up or down over the past week. This may or may not indicate a change in position on the rankings list. You can click on either of the two regions to get a more verbose list of teams. The “Statistics” page allows you to see the last 10 results for any team, and there is also a leaderboard for the top 10 teams in terms of their winrate over the past month.
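The listing rule for the sidebar can be sketched as a simple filter. The function name `is_listed` and the decision to count a game played exactly 60 days ago as still recent are assumptions made for illustration.

```python
from datetime import date, timedelta


def is_listed(games_played: int, last_game: date, today: date) -> bool:
    """True if a team qualifies for the sidebar under the stated rule:
    more than 50 games ever, and a last game within the past 60 days
    (the 60-day boundary is treated as inclusive here, by assumption).
    """
    recently_active = (today - last_game) <= timedelta(days=60)
    return games_played > 50 and recently_active


today = date(2014, 7, 1)
print(is_listed(120, date(2014, 6, 15), today))  # active, plenty of games
print(is_listed(120, date(2014, 1, 10), today))  # inactive for months
print(is_listed(50, date(2014, 6, 30), today))   # exactly 50 games is not enough
```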
With new features constantly being added to the site I maintain at http://twopee.noxville.co.za/, more and more of them will filter across into the 2P site, and into articles, hopefully increasing the time you spend here!