Roland, on 22 January 2014 - 08:12 PM, said:
You keep saying it, and sure it's theoretically true... Given infinite games, Elo would potentially seat you correctly (although even that may not be necessarily true, given that it really has no ability at all to account for any specific mech configuration you are dropping in, only an attachment to a weight class).
But you are incapable of identifying how many games it would take. Certainly, you will get a number, and the numbers will quickly form a curve, but that curve doesn't indicate that the players are correctly seated.
That's the thing... sure, your own performance is measured in the data, so it could eventually show itself.. but unless it does so in a reasonable amount of time, then the statement is meaningless.
This is why simple win loss ISN'T all that matters, because in many cases only taking into account win loss requires far too many games to arrive at appropriate skill ratings.
Under TrueSkill, win/loss is still all that matters. It just also accounts for the makeup of your team and the other team, and for your historical performance in related situations, not just identical ones.
It doesn't require 'infinite games'. A couple of hundred is a solid start and gets a good ballpark. All other factors being equal, you're about 8.333% of the outcome (one player out of twelve). That works to your advantage in terms of shaking out statistical irregularities (mismatched weight, wide Elo mismatch, etc.) on top of the variance between mechs within a weight class.
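To put the 8.333% in perspective, here's what that one-twelfth share does to the team's expected win odds (back-of-envelope using the standard Elo curve; MW:O's exact formula isn't public):

```python
# Back-of-envelope: one player is 1/12 of a team, so even a big personal
# misrating only nudges the team's expected win probability.
# Uses the textbook Elo logistic curve; MW:O's exact curve isn't public.

def win_probability(elo_advantage: float) -> float:
    """Expected win probability for a given Elo advantage."""
    return 1.0 / (1.0 + 10 ** (-elo_advantage / 400.0))

personal_misrating = 200.0             # say you're seated 200 points too low
team_shift = personal_misrating / 12   # ~16.7 points of team-average advantage
print(round(team_shift, 1))                    # 16.7
print(round(win_probability(team_shift), 3))   # ~0.524 instead of 0.500
```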
Again, you're mistaking precision for efficiency. You're also ignoring the K-factor: you'll gain more points for winning against a team of better players and lose more points for losing against an inferior one.
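To put rough numbers on that (this is the textbook Elo update; I don't know PGI's exact K or scale, so treat the figures as illustrative):

```python
# Textbook Elo update for one match; K=32 and the 400-point scale are the
# standard values, not necessarily what MW:O actually uses.

def expected_score(rating: float, opponent: float) -> float:
    """Win probability implied by the rating gap."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def elo_update(rating: float, opponent: float, won: bool, k: float = 32.0) -> float:
    """Return the new rating after a single win or loss."""
    actual = 1.0 if won else 0.0
    return rating + k * (actual - expected_score(rating, opponent))

# Upset win over a stronger team (you 1400, them 1600) pays out big...
print(round(elo_update(1400, 1600, won=True) - 1400, 1))    # ~ +24.3
# ...while a routine win over a weaker team (1200) pays little.
print(round(elo_update(1400, 1200, won=True) - 1400, 1))    # ~ +7.7
# Losing to that weaker team costs far more than losing to the stronger one.
print(round(elo_update(1400, 1200, won=False) - 1400, 1))   # ~ -24.3
print(round(elo_update(1400, 1600, won=False) - 1400, 1))   # ~ -7.7
```

The point is that the size of the swing depends on how surprising the result was, which speeds up convergence compared with a flat points-per-win scheme.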
200-500 matches is plenty to seat you reasonably well for a weight class regardless of your skill.
Remember: it's trying to place you within a 175-point band. If you're misplaced by 50 or even 100 points, that's not a big deal.
Elo is not that precise but it is simple and efficient. Absolutely, a system that accurately tracks your performance per chassis, loadout and team composition would be better and it would seat you faster and more accurately.
Currently, though, it takes about 3 minutes to roughly match weight and high/low Elo to a target within 175 points between teams. If everyone's score is off by 50 or even 100 points, it's not going to make a significant difference.
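For illustration only (the real matchmaker's internals aren't public), the tolerance check amounts to something like this:

```python
# Hypothetical sketch of a "within 175 points between teams" check.
# Illustrates the tolerance idea only; not PGI's actual matchmaker code.
from statistics import mean

ELO_TOLERANCE = 175  # the stated matching band

def teams_match(team_a_elos, team_b_elos):
    """Accept the pairing if the team-average Elos are within the band."""
    return abs(mean(team_a_elos) - mean(team_b_elos)) <= ELO_TOLERANCE

# With a band that wide, everyone being mis-seated by 50-100 points
# barely changes which pairings get accepted.
team_a = [1450, 1500, 1520, 1480, 1600, 1400, 1550, 1470, 1510, 1490, 1530, 1460]
team_b = [1400, 1420, 1580, 1500, 1490, 1610, 1440, 1560, 1480, 1520, 1450, 1500]
print(teams_match(team_a, team_b))  # True; the averages differ by well under 175
```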
I'm not debating that a more precise system measuring your win/loss by chassis, loadout, team composition and map would be better. It's just not going to make a difference without a ton more players.
What would help is a Gaussian score distribution; that would fatten up the Elo bands. Splitting pug and premade Elo would also help, since mixing them is probably the single biggest source of disparity in score convergence. Another big factor would be matching to a range rather than to a target score, since we know there's a lot of variance in Elo seating right now.
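Here's a rough sketch of the target-versus-range difference; the helper names and numbers are made up for illustration, not pulled from the live matchmaker:

```python
# Hypothetical contrast between matching to a single target Elo and matching
# to a range. Numbers and slack values are illustrative only.

def candidates_near_target(pool, target, slack=25):
    """Target matching: only players very close to one number qualify."""
    return [p for p in pool if abs(p - target) <= slack]

def candidates_in_range(pool, low, high):
    """Range matching: anyone inside the band qualifies."""
    return [p for p in pool if low <= p <= high]

pool = [1310, 1360, 1395, 1410, 1425, 1450, 1475, 1500, 1540, 1585, 1620, 1660]
print(len(candidates_near_target(pool, 1450)))                  # 3 candidates
print(len(candidates_in_range(pool, 1450 - 175, 1450 + 175)))   # 11 candidates
```

The wider band means more eligible candidates at any given moment, so matches form faster without going past the 175-point tolerance the system already accepts.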
When the population and telemetry are there, it'd be great to match not just on premade Elo but to vary your Elo by your premade's composition. For example, you might be a rockstar in a Raven with TAG/NARC when you're dropping in a premade with some LRM boats, but just not poptart material, and your value would adjust accordingly. The problem is that it would take too many matches in too many configurations to seat you effectively, so you'd spend months with an unconverged Elo trying to find the right seating.
Population is another big one. Something like TrueSkill or Glicko would be great once we've got CW, 250k concurrent players, fierce competition, and the telemetry to accurately identify your performance in specific mechs, loadouts and situations.
Remember, though: the more factors you try to account for, the more control you need over the whole equation (i.e. the more precision you need in selecting players with specific metrics to hit the target and match fairly). That means you need more total players, or correspondingly larger sample sizes (more matches). It puts that sort of matchmaking complexity out of reach right now.
Here's what I mean. Suppose we want to say that Roland does awesome in his 3L with ERLLs, MLs, NARC and TAG when his teammates carry LRMs and sit within a given score range, and his team composition is at a particular level against an enemy composition at a particular level.
To predict that value accurately, I need to be able to consistently recreate all of those factors except you (the premade you're with, the team you drop with, and the opposing team) with the same balance, enough times to chart your performance (for the sake of sanity, let's say 20 matches). If you dropped with the exact same people and loadouts 20 times, that would make it very easy, and even reasonably close samples would give reasonably accurate telemetry. If I can't do that, I need far more samples, because I have to widen the variance on every variable: your team, the other team, their composition, your team's composition. That means way more data, and all of it just to pin down your 3L performance when dropping with LRM users.
Without either a big sample size to average out the constant variation in other players OR a huge player base to keep the quality of those samples consistent (more players makes it easier to match on specific variables or criteria), it's tough to get results that granular.
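Back-of-envelope on why that blows up (the bin counts below are invented for illustration; the 20 samples per cell is just the figure from my example above):

```python
# Back-of-envelope: the samples you need grow multiplicatively with every
# factor you condition on. Bin counts are invented for illustration;
# 20 samples per cell is the figure from the example above.
factors = {
    "chassis_and_loadout": 10,      # distinct mech/loadout combos you run
    "own_team_composition": 5,      # e.g. LRM-heavy, brawl, poptart, mixed, scout
    "enemy_team_composition": 5,
    "elo_bracket": 4,
}
samples_per_cell = 20

cells = 1
for bins in factors.values():
    cells *= bins

print(cells)                     # 10 * 5 * 5 * 4 = 1000 distinct situations
print(cells * samples_per_cell)  # 20,000 matches just to get 20 samples in each
```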
That's why I say Elo is fine, just with the recommended changes I put forward.
Make sense? I get the article you linked, and I get how matchmaking and statistics work. I'm saying there's a huge difference between what's theoretically best and what's practically best given the limitations MW:O has.
Plus there's the man-hours needed to create and support a more complex statistical model. Every variable you add brings almost exponential extra effort to maintain.