Krzysztof z Bagien, on 25 April 2013 - 04:26 PM, said:
I meant the skill of an individual player - the more people play in a single match, the less relevant individual contribution is on average; I think we can agree on that.
Fair enough. I was mostly just picking on that because you were separating a player's individual skill from the gear he was bringing. I can agree that in a larger team a player's individual contribution is less important.
What bothers me about this is that some people (not you, actually) jump from the idea of their contribution being less effective to the idea of it being irrelevant to the win. They believe that because their team is always so bad, they can't improve their rating and are unjustly forced to play with bad players.
This is still wrong though.
Krzysztof z Bagien, on 25 April 2013 - 04:26 PM, said:
I really can't fully articulate my concerns with the Elo system in English, as it would require some mathematical terminology I only know in Polish, and I'm not sure I can properly translate it without creating more confusion.
Can you try looking up the terms on the Polish Wikipedia and then switching to the English version? I've read the paper and I can see what you might be getting at, but it'd help if you could point it out.
Krzysztof z Bagien, on 25 April 2013 - 04:26 PM, said:
Here you can read about TrueSkill (a Bayesian skill rating system which can be viewed as a generalisation of the Elo system used in chess), developed by Micro$oft for their multiplayer games. Not that I'm a great fan of them, but that work looks really solid from a mathematical point of view.
It says:
Multiplayer online games provide the following challenges:
1. Game outcomes often refer to teams of players yet a skill rating for individual players is needed for future matchmaking. (and that's the main problem I see with the Elo system being used in multiplayer games)
(...)
TrueSkill (...) addresses both these challenges in a principled Bayesian framework.
There's also a comparison to the Elo system (they ran tests on the Halo 2 beta) - for 8vs8 games the Elo system was inaccurate in almost 40% of the games.
In fact, the Elo system was less accurate than the Bayesian one even in 1vs1 games!
As I understand it, PGI implemented the Elo system pretty straightforwardly, without any significant modifications. That's why I believe it can't make any significant difference to matchmaking quality (even if it was implemented properly, and we can't be sure about that). Also, we have quite a small playerbase (100k? maybe slightly more players total, and no more than 30k-40k players at any given moment; and no, I don't have any data to prove it, it's just an educated guess), and in many cases there won't be enough players to be matched properly (skill-wise), as tonnage would be (or is it already?) one of the factors the matchmaker uses.
I hope I made myself clear
Edit: "I believe it can't", not "it can". One shouldn't try to solve complicated mathematical problems at 3 AM.
Edit 2: also, fun fact - TrueSkill is patented. You can actually patent a mathematical formula.
That was an interesting read. Thank you.
For anybody else who doesn't want to chew through the paper, here are a few friendly explanations:
The major things I took away from it are:
1. TrueSkill is based on Glicko, which is based on Elo.
2. Like Elo, the TrueSkill rating is based only on winning or losing the game. Skill in this context is also defined as the probability of winning.
3. The major improvements are in the win estimation factor. When estimating teams with an equal skill mean and variance, the system actually reduces to Elo.
I think #3 is especially important because the win estimation factor determines how much Elo you are awarded for a win or loss. Improving this would make the system converge faster.
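To make the "win estimation factor" concrete, here's a minimal Python sketch of the standard Elo formulas (not PGI's actual code; the team-averaging extension at the end is just one plausible "straightforward" approach to team games):

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model (the win estimation factor)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return new ratings for A and B after one match."""
    e_a = elo_expected(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)   # a surprising result moves ratings more
    return r_a + delta, r_b - delta

# One simple (assumed) team extension: feed average team ratings
# into the same 1vs1 formula.
def team_elo_expected(team_a, team_b):
    avg = lambda team: sum(team) / len(team)
    return elo_expected(avg(team_a), avg(team_b))
```

Note that an evenly matched game (expected score 0.5) transfers exactly half of `k` on a win, which is why the quality of the expectation directly controls how fast ratings converge.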
The results in the paper are interesting, but I wonder how they'll transfer. The paper mentions that for 4vs4 the TrueSkill system did not do significantly better than Elo. The researchers point out that the most-played game mode is capture the flag, which violates the additive performance model used. I think the same can be said about MWO, as most games are a form of capture the flag(s).
Looking at the results also shows something far more important. The matches on which TrueSkill and Elo were evaluated were produced by a matchmaker, so the teams were already quite equal. Both TrueSkill and Elo correctly predict the outcome in 70% of all these matches. Only on matches that the other system considers equal does TrueSkill seem to be better. This is somewhat expected, as it takes more data into account than Elo.
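To illustrate why matchmade games are inherently harder to predict, here's a toy simulation (all distributions and numbers are made up for illustration, not Halo data): even an oracle that knows every player's true skill calls randomly paired games far more reliably than matchmade ones, because a matchmaker deliberately produces near coin flips.

```python
import random

random.seed(0)

def play(skill_a, skill_b):
    """A beats B with logistic probability in the skill gap (Elo-style model)."""
    p_a = 1.0 / (1.0 + 10.0 ** ((skill_b - skill_a) / 400.0))
    return random.random() < p_a

def accuracy(make_pair, n=20000):
    """How often 'the higher-skilled side wins' is the correct prediction."""
    correct = 0
    for _ in range(n):
        a, b = make_pair()
        if play(a, b) == (a >= b):
            correct += 1
    return correct / n

def random_pair():
    # two players drawn from the whole population
    return random.gauss(1500, 300), random.gauss(1500, 300)

def matched_pair():
    # a matchmaker pairs each player with someone of very similar skill
    s = random.gauss(1500, 300)
    return s, s + random.gauss(0, 50)
```

Running `accuracy(random_pair)` comes out well above `accuracy(matched_pair)` under this model, which is the effect described above: a 70% hit rate on matchmade games is a statement about the matchmaker as much as about the rating system.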
So the results, regardless of which system is better, at least validate the general principle that a matchmaker plus a ranking system based on wins and losses is an effective solution. That was my starting argument, and I'll consider it sufficiently argued by now.
edit:
Going by the command chair posts, the matchmaker takes tonnage and Elo into account. When it can't find players it will relax those requirements. I don't know whether it relaxes both equally, or favours relaxing tonnage over Elo (or Elo over tonnage).
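For illustration, the relaxation described in the command chair posts could look something like this sketch (the window sizes, the doubling schedule, and the `find_match` interface are all my assumptions, not PGI's implementation):

```python
from collections import namedtuple

# hypothetical player record with the two attributes the posts mention
Player = namedtuple("Player", ["elo", "tonnage"])

def find_match(player, pool, max_wait_steps=5):
    """Search for an opponent, widening the Elo and tonnage windows over time."""
    elo_window, ton_window = 50, 10          # assumed starting windows
    for _ in range(max_wait_steps):
        for other in pool:
            if (abs(other.elo - player.elo) <= elo_window
                    and abs(other.tonnage - player.tonnage) <= ton_window):
                return other
        # no candidate fits: relax both requirements and try again
        elo_window *= 2
        ton_window *= 2
    return None
```

Whether both windows really widen at the same rate is exactly the open question above; relaxing tonnage faster than Elo (or vice versa) is just a matter of using different multipliers.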
Edited by Hauser, 26 April 2013 - 06:16 AM.