Roland, on 22 January 2014 - 08:12 PM, said:
You keep saying it, and sure it's theoretically true... Given infinite games, Elo would potentially seat you correctly (although even that may not be necessarily true, given that it really has no ability at all to account for any specific mech configuration you are dropping in, only an attachment to a weight class).
But you are incapable of identifying how many games it would take. Certainly, you will get a number, and the numbers will quickly form a curve, but that curve doesn't indicate that the players are correctly seated.
That's the thing... sure, your own performance is measured in the data, so it could eventually show itself.. but unless it does so in a reasonable amount of time, then the statement is meaningless.
This is why simple win loss ISN'T all that matters, because in many cases only taking into account win loss requires far too many games to arrive at appropriate skill ratings.
Under TrueSkill, win/loss is still all that matters. It just also accounts for the makeup of your team and the other team, and for your historical performance in related situations, not just identical ones.
It doesn't require 'infinite games'. A couple of hundred is a solid start and gets a good ballpark. All other factors being equal, you're about 8.333% of the outcome (one player out of twelve). That works to your advantage in terms of shaking out statistical irregularities (mismatched weight, wide Elo mismatch, etc.) on top of the variance between mechs within a weight class.
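To put the 8.333% in perspective, here's what that one-twelfth share does to the team's expected win odds (back-of-envelope using the standard Elo curve; MW:O's exact formula isn't public):

```python
# Back-of-envelope: one player is 1/12 of a team, so even a big personal
# misrating only nudges the team's expected win probability.
# Uses the textbook Elo logistic curve; MW:O's exact curve isn't public.

def win_probability(elo_advantage: float) -> float:
    """Expected win probability for a given Elo advantage."""
    return 1.0 / (1.0 + 10 ** (-elo_advantage / 400.0))

personal_misrating = 200.0             # say you're seated 200 points too low
team_shift = personal_misrating / 12   # ~16.7 points of team-average advantage
print(round(team_shift, 1))                    # 16.7
print(round(win_probability(team_shift), 3))   # ~0.524 instead of 0.500
```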
Again, you're mistaking precision for efficiency. You're also ignoring the K-factor: you'll gain more points for winning against a team of better players and lose more points for losing against an inferior one.
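To put rough numbers on that (this is the textbook Elo update; I don't know PGI's exact K or scale, so treat the figures as illustrative):

```python
# Textbook Elo update for one match; K=32 and the 400-point scale are the
# standard values, not necessarily what MW:O actually uses.

def expected_score(rating: float, opponent: float) -> float:
    """Win probability implied by the rating gap."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def elo_update(rating: float, opponent: float, won: bool, k: float = 32.0) -> float:
    """Return the new rating after a single win or loss."""
    actual = 1.0 if won else 0.0
    return rating + k * (actual - expected_score(rating, opponent))

# Upset win over a stronger team (you 1400, them 1600) pays out big...
print(round(elo_update(1400, 1600, won=True) - 1400, 1))    # ~ +24.3
# ...while a routine win over a weaker team (1200) pays little.
print(round(elo_update(1400, 1200, won=True) - 1400, 1))    # ~ +7.7
# Losing to that weaker team costs far more than losing to the stronger one.
print(round(elo_update(1400, 1200, won=False) - 1400, 1))   # ~ -24.3
print(round(elo_update(1400, 1600, won=False) - 1400, 1))   # ~ -7.7
```

The point is that the size of the swing depends on how surprising the result was, which speeds up convergence compared with a flat points-per-win scheme.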
200-500 matches is plenty to seat you reasonably well for a weight class regardless of your skill.
Remember: it's trying to place you within a 175-point band. If you're misplaced by 50 or even 100 points, that's not a big deal.
Elo is not that precise but it is simple and efficient. Absolutely, a system that accurately tracks your performance per chassis, loadout and team composition would be better and it would seat you faster and more accurately.
Currently, though, it takes about 3 minutes to roughly match weight and high/low Elo to a target within 175 points between teams. If everyone's score is off by 50 or even 100 points, it's not going to make a significant difference.
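For illustration only (the real matchmaker's internals aren't public), the tolerance check amounts to something like this:

```python
# Hypothetical sketch of a "within 175 points between teams" check.
# Illustrates the tolerance idea only; not PGI's actual matchmaker code.
from statistics import mean

ELO_TOLERANCE = 175  # the stated matching band

def teams_match(team_a_elos, team_b_elos):
    """Accept the pairing if the team-average Elos are within the band."""
    return abs(mean(team_a_elos) - mean(team_b_elos)) <= ELO_TOLERANCE

# With a band that wide, everyone being mis-seated by 50-100 points
# barely changes which pairings get accepted.
team_a = [1450, 1500, 1520, 1480, 1600, 1400, 1550, 1470, 1510, 1490, 1530, 1460]
team_b = [1400, 1420, 1580, 1500, 1490, 1610, 1440, 1560, 1480, 1520, 1450, 1500]
print(teams_match(team_a, team_b))  # True; the averages differ by well under 175
```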
I'm not debating that a more precise system measuring your win/loss by chassis, loadout, team composition and map would be better. It's just not going to make a difference without a ton more players.
What would help is a Gaussian score distribution; that would fatten up the Elo bands. Splitting pug and premade Elo would also help, since mixing them is probably the single biggest source of disparity in score convergence. Another big factor would be matching to a range rather than to a target score, since we know there's a lot of variance in Elo seating right now.
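Here's a rough sketch of the target-versus-range difference; the helper names and numbers are made up for illustration, not pulled from the live matchmaker:

```python
# Hypothetical contrast between matching to a single target Elo and matching
# to a range. Numbers and slack values are illustrative only.

def candidates_near_target(pool, target, slack=25):
    """Target matching: only players very close to one number qualify."""
    return [p for p in pool if abs(p - target) <= slack]

def candidates_in_range(pool, low, high):
    """Range matching: anyone inside the band qualifies."""
    return [p for p in pool if low <= p <= high]

pool = [1310, 1360, 1395, 1410, 1425, 1450, 1475, 1500, 1540, 1585, 1620, 1660]
print(len(candidates_near_target(pool, 1450)))                  # 3 candidates
print(len(candidates_in_range(pool, 1450 - 175, 1450 + 175)))   # 11 candidates
```

The wider band means more eligible candidates at any given moment, so matches form faster without going past the 175-point tolerance the system already accepts.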
When the population and telemetry are there, it'd be great to match not just on premade Elo but to vary your Elo by your premade's composition. For example, you might be a rockstar in a Raven with TAG/NARC when you're dropping in a premade with some LRM boats, but just not poptart material, and your value would adjust accordingly. The problem is that it would take too many matches in too many configurations to seat you effectively, so you'd spend months with an unconverged Elo trying to find the right seating.
Population is another big one. Something like TrueSkill or Glicko would be great once we've got CW, 250k concurrent players, fierce competition, and the telemetry to accurately identify your performance in specific mechs, loadouts and situations.
Remember, though: the more factors you try to account for, the more control you need over the whole equation (i.e. the more precision you need in selecting players with specific metrics to hit the target and match fairly). That means you need more total players, or correspondingly larger sample sizes (more matches). It puts that sort of matchmaking complexity out of reach right now.
Here's what I mean. Suppose we want to say that Roland does awesome in his 3L with ERLLs, MLs, NARC and TAG when his teammates carry LRMs and sit within a given score range, and his team composition is at a particular level against an enemy composition at a particular level.
To predict that value accurately, I need to be able to consistently recreate all of those factors except you (the premade you're with, the team you drop with, and the opposing team) with the same balance, enough times to chart your performance (for the sake of sanity, let's say 20 matches). If you dropped with the exact same people and loadouts 20 times, that would make it very easy, and even reasonably close samples would give reasonably accurate telemetry. If I can't do that, I need far more samples, because I have to widen the variance on every variable: your team, the other team, their composition, your team's composition. That means way more data, and all of it just to pin down your 3L performance when dropping with LRM users.
Without either a big sample size to average out the constant variation in other players OR a huge player base to keep the quality of those samples consistent (more players makes it easier to match on specific variables or criteria), it's tough to get results that granular.
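Back-of-envelope on why that blows up (the bin counts below are invented for illustration; the 20 samples per cell is just the figure from my example above):

```python
# Back-of-envelope: the samples you need grow multiplicatively with every
# factor you condition on. Bin counts are invented for illustration;
# 20 samples per cell is the figure from the example above.
factors = {
    "chassis_and_loadout": 10,      # distinct mech/loadout combos you run
    "own_team_composition": 5,      # e.g. LRM-heavy, brawl, poptart, mixed, scout
    "enemy_team_composition": 5,
    "elo_bracket": 4,
}
samples_per_cell = 20

cells = 1
for bins in factors.values():
    cells *= bins

print(cells)                     # 10 * 5 * 5 * 4 = 1000 distinct situations
print(cells * samples_per_cell)  # 20,000 matches just to get 20 samples in each
```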
That's why I say Elo is fine, just with the recommended changes I put forward.
Make sense? I get the article you linked, and I get how matchmaking and statistics work. I'm saying there's a huge difference between what's theoretically best and what's practically best given the limitations MW:O has.
Plus there's the man-hours needed to create and support a more complex statistical model. Every variable you add brings almost exponential extra effort to maintain.