Why Elo Doesn't Work Here


633 replies to this topic

#161 MustrumRidcully

    Member

  • PipPipPipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • 10,644 posts

Posted 23 January 2014 - 01:09 PM

View PostRoadkill, on 23 January 2014 - 12:20 PM, said:

It's not as bad as some of the numbers floating around. Elo systems self correct rather quickly, actually, so the variance that is introduced by MWO doesn't make it that bad. You might need 100 matches (in each weight class) to reach nominal stability, but that's really not all that bad.

Remember that when your rating is really off - so badly off that you're dramatically tipping the balance in matches - that Elo is adjusting your rating after each match by the full K value. If K is 50 and the system is set up so that 2800 is intended to be the max rating, it only takes 25-30 distorted wins to correct your rating.

The main instability in PGI's implementation of Elo (IMHO) comes from the fact that we have 1 rating for each weight class. That sounds fine, and is in fact better than just having a single rating, but my performance in a Locust is nowhere near the same as it is in a Raven 3L or a Jenner D. We should really have 1 rating per Mech. Ideally per build but that's just getting crazy.

One can argue that this is also self-correcting... You're not gonna stick with mechs you do badly with; you'll tweak the build or change the mech.

But of course, that's only your Elo score - for an individual match-making effort, it can make quite a difference whether you play your long-term favorite Raven or play a Locust build you have no idea how to play yet (and may be inherently inferior).

But I think such specific events are not that important for match-making, it will probably not happen so often that it wrecks games left and right. After all, we don't expect to have one Elo score for when you had an 8h work day and a lot of coffee, another Elo score for when you fire up MW:O after you drank some beers with your friends, and yet another for after you've played 16 matches on a weekend.

#162 Abivard

    Member

  • PipPipPipPipPipPipPipPip
  • Shredder
  • 1,935 posts
  • LocationFree Rasalhague Republic

Posted 23 January 2014 - 01:11 PM

View PostGhogiel, on 23 January 2014 - 12:52 PM, said:

In both of those matches you are the only constant.


That statement is totally speculative. You have no idea what else was a constant: the map? The mode? The mech? Etc...

#163 Ghogiel

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • CS 2021 Gold Champ
  • 6,852 posts

Posted 23 January 2014 - 01:13 PM

View PostRoadkill, on 23 January 2014 - 01:08 PM, said:


The way PGI set up Elo, your rating for each weight class will stabilize rather quickly. As I explained above, you should only need 25-30 matches in each weight class to reach reasonably accurate Elo ratings. You don't have to play every other player to do that. 25-30 matches against random people is sufficient. That's just how Elo works.

It felt more like 200 matches per weight class for me to press through most of the scrub tier {Scrap}.

I personally believe 25-30 matches is a pretty wrong estimate. It's closer than however many million the other guy was saying lol.

#164 Doctor Proctor

    Member

  • PipPipPipPipPipPip
  • Bad Company
  • 343 posts
  • LocationSouth Suburbs of Chicago, IL, USA

Posted 23 January 2014 - 01:16 PM

View PostRoadkill, on 23 January 2014 - 12:39 PM, said:

View PostAbivard, on 23 January 2014 - 12:38 PM, said:



Show your work ;p

I already have. Scroll up a few posts.

Worst case it takes 25-30 matches for any given player's Elo rating to correct itself.


I'm assuming you meant to look at the following post:

View PostRoadkill, on 23 January 2014 - 12:20 PM, said:

It's not as bad as some of the numbers floating around. Elo systems self correct rather quickly, actually, so the variance that is introduced by MWO doesn't make it that bad. You might need 100 matches (in each weight class) to reach nominal stability, but that's really not all that bad.

Remember that when your rating is really off - so badly off that you're dramatically tipping the balance in matches - that Elo is adjusting your rating after each match by the full K value. If K is 50 and the system is set up so that 2800 is intended to be the max rating, it only takes 25-30 distorted wins to correct your rating.


That's not really "showing your work". You stated that it would take 100 matches per weight class to reach stability, and that a rating could be corrected within 25-30, but you're not showing how you reached these numbers. I'm not calling you wrong, but as I said earlier, no one is really showing any math here beyond "Naw dude, the math totally checks out, I know it does". An informed decision that does not make.

Edited by Doctor Proctor, 23 January 2014 - 01:16 PM.


#165 Ghogiel

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • CS 2021 Gold Champ
  • 6,852 posts

Posted 23 January 2014 - 01:17 PM

View PostAbivard, on 23 January 2014 - 01:11 PM, said:


That statement is totally speculative. You have no idea what else was a constant: the map? The mode? The mech? Etc...

The map can't be constant as it changes. The mechs and their loadouts almost certainly change as well. The odds of getting the same mechs every round are incredibly low.

The mode however could be considered a constant for that player as that is the one thing you listed he can also control and limit.

#166 Roadkill

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,610 posts

Posted 23 January 2014 - 01:22 PM

View PostMustrumRidcully, on 23 January 2014 - 01:09 PM, said:

But I think such specific events are not that important for match-making, it will probably not happen so often that it wrecks games left and right.

Actually, I'd say that such specific events are only important for matchmaking. They're not relevant for your Elo rating, as that evens out in the long term. But for any given match, those variables are a huge part of what determines whether you win or lose that match.

If I switch from a Raven 3L to a Locust 1V for a given match, I'm going to be significantly less of an asset to my team for that match despite having the same Elo rating. We'll probably lose because my performance will not be up to snuff relative to my Elo rating. So my Elo rating will drop a little bit. If I then switch back to a Jenner D, I'll now be overperforming based on my now Locust-hampered Elo.

In the long term it's just a minor fluctuation and it will even out. But for those two matches it likely affected the outcome of the match (along with everyone else's similar fluctuations).

Now... you could easily make the case that similar fluctuations are happening for every player in every match, and that those ultimately balance out as well. And in the long run, they do. But I'm a lot worse in a Locust than in a Raven 3L or Jenner D. I can easily see that difference swinging a match unless an exact similar penalty is being simultaneously paid by the other team.

#167 Doctor Proctor

    Member

  • PipPipPipPipPipPip
  • Bad Company
  • 343 posts
  • LocationSouth Suburbs of Chicago, IL, USA

Posted 23 January 2014 - 01:25 PM

View PostGhogiel, on 23 January 2014 - 01:17 PM, said:

The map can't be constant as it changes. The mechs and their loadouts almost certainly change as well. The odds of getting the same mechs every round are incredibly low.

The mode however could be considered a constant for that player as that is the one thing you listed he can also control and limit.


I think he meant that you might be running a hot build and therefore always do poorly on hot maps like Tourmaline or Terra Therma, while doing much better on colder maps like Alpine Valley and Frozen City. In that case you have two constants: the player and the map. Specifically, that the player (constant) will always play better on Alpine/Frozen (constant) and worse on Tourmaline/Terra Therma (constant).

I suppose it's true that such things will even out over time, but then again, have you looked at your map stats lately? I have a 1.53 and 1.54 win/loss ratio on Forest Colony and Crimson Straits respectively. Yet, my overall w/l ratio is only 1.12. My worst maps are Frozen City and Terra Therma, with 0.90 and 0.93 w/l ratios respectively. So while Elo might not care about what map I'm on too much, my teammates certainly will.

Edit: Also, my number of matches on those particular maps ranges anywhere from 155 to 223, which should be well within the claimed self-correcting range of Elo. So those stats should've normalized by now to match my overall ratio, but they haven't. And Crimson Straits in particular is even a relatively new map (only 155 drops), one for which my Elo should have been well established long ago.
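
To put those map ratios on the same scale as a rating, here is a rough sketch that converts a W/L ratio into a win probability and into the rating edge the textbook Elo curve would associate with it. It assumes the standard 400-point logistic scale (not confirmed as PGI's) and treats each W/L ratio as if it came from matches the matchmaker considered even, which is a simplification. The function names are made up for illustration.

```python
import math

def win_probability(wl_ratio: float) -> float:
    """Convert a win/loss ratio W/L into a win probability W / (W + L)."""
    return wl_ratio / (1.0 + wl_ratio)

def implied_elo_gap(wl_ratio: float) -> float:
    """Rating edge the textbook Elo curve associates with this W/L ratio.

    From E = 1 / (1 + 10 ** (-d / 400)) it follows that E / (1 - E) = 10 ** (d / 400),
    and E / (1 - E) is exactly the W/L ratio, so d = 400 * log10(W/L).
    """
    return 400.0 * math.log10(wl_ratio)

# W/L ratios quoted in the post above.
map_ratios = [
    ("Forest Colony", 1.53),
    ("Crimson Straits", 1.54),
    ("overall", 1.12),
    ("Terra Therma", 0.93),
    ("Frozen City", 0.90),
]

for name, ratio in map_ratios:
    print(f"{name:16s} p(win)={win_probability(ratio):.3f}  gap={implied_elo_gap(ratio):+.1f}")
```

Under those assumptions the best maps work out to an edge of roughly 74 points against roughly 20 points overall, so the map effect would be on the order of 50 rating points in either direction.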

Edited by Doctor Proctor, 23 January 2014 - 01:27 PM.


#168 Roadkill

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,610 posts

Posted 23 January 2014 - 01:25 PM

View PostDoctor Proctor, on 23 January 2014 - 01:16 PM, said:

That's not really 'showing your work'.

Fair enough. I'm making the basic assumption (which may be rash) that people here have read the rest of the thread, including the links that have been posted to how Elo ratings work in MWO, and so that listing the K value that PGI is using along with their intended max rating of 2800 would be sufficient.

(Because, if you know how Elo ratings work, that is sufficient.)
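
For readers who have not followed those links, the arithmetic can be sketched as follows. The expected-score formula and 400-point scale are the textbook chess definitions, not something confirmed for MWO; K = 50 and the 2800 ceiling are the figures quoted in this thread; the 1300 starting rating is purely an assumption for illustration.

```python
import math

def expected_score(rating: float, opponent: float) -> float:
    """Textbook Elo win probability on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def update(rating: float, opponent: float, won: bool, k: float = 50.0) -> float:
    """New rating after one match: rating + K * (actual - expected)."""
    actual = 1.0 if won else 0.0
    return rating + k * (actual - expected_score(rating, opponent))

def min_wins_to_close(gap: float, k: float = 50.0) -> int:
    """Lower bound on wins needed to close a gap if every win paid the full K."""
    return math.ceil(gap / k)

# Assumed start of 1300, and the 2800 ceiling mentioned in the thread.
print(min_wins_to_close(2800 - 1300))   # 1500 / 50 = 30

# A win against an equally rated opponent pays only half of K.
print(update(1500, 1500, won=True))     # 1525.0
```

A win pays K * (1 - expected), so the full K is only approached when the system thinks you were a heavy underdog. That makes gap / K a floor on the number of wins, not a typical figure.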

#169 MustrumRidcully

    Member

  • PipPipPipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • 10,644 posts

Posted 23 January 2014 - 01:25 PM

View PostRoadkill, on 23 January 2014 - 01:22 PM, said:

Actually, I'd say that such specific events are only important for matchmaking. They're not relevant for your Elo rating, as that evens out in the long term. But for any given match, those variables are a huge part of what determines whether you win or lose that match.

If I switch from a Raven 3L to a Locust 1V for a given match, I'm going to be significantly less of an asset to my team for that match despite having the same Elo rating. We'll probably lose because my performance will not be up to snuff relative to my Elo rating. So my Elo rating will drop a little bit. If I then switch back to a Jenner D, I'll now be overperforming based on my now Locust-hampered Elo.

In the long term it's just a minor fluctuation and it will even out. But for those two matches it likely affected the outcome of the match (along with everyone else's similar fluctuations).

Now... you could easily make the case that similar fluctuations are happening for every player in every match, and that those ultimately balance out as well. And in the long run, they do. But I'm a lot worse in a Locust than in a Raven 3L or Jenner D. I can easily see that difference swinging a match unless an exact similar penalty is being simultaneously paid by the other team.

I just realized that something like a confidence level in addition to the skill rating, as in TrueSkill, might actually be applied here. We have an idea that your usual skill level is x, but today you're mixing things up, so we lower the confidence. Though I am not sure it actually makes much difference. I haven't seen enough material on TrueSkill matchmaking to know how it uses confidence and skill rating. If it, in the end, just uses the combined value, it would probably just underestimate you, and if you move from Locust to Raven instead, that would be a change in the opposite of the desired direction.
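
A small sketch of the "combined value" worry: TrueSkill describes each player by a mean (mu) and an uncertainty (sigma), and the commonly displayed conservative rating is mu - 3 * sigma. The `Skill` class and the numbers below are hypothetical, and nothing here says how a matchmaker would actually use the two values.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    mu: float     # estimated skill
    sigma: float  # uncertainty about that estimate

    def conservative(self, k: float = 3.0) -> float:
        """Combined value: a level the player is very likely to be above."""
        return self.mu - k * self.sigma

# Same estimated skill, but confidence lowered because the player switched mechs.
usual_mech = Skill(mu=30.0, sigma=2.0)
unfamiliar_mech = Skill(mu=30.0, sigma=5.0)

print(usual_mech.conservative())       # 24.0
print(unfamiliar_mech.conservative())  # 15.0
```

Raising sigma can only lower the combined value, whichever direction the mech change actually moves the player's performance.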

Edited by MustrumRidcully, 23 January 2014 - 01:26 PM.


#170 IceCase88

    Member

  • PipPipPipPipPipPipPip
  • The 1 Percent
  • 689 posts
  • LocationDenzien of K-Town

Posted 23 January 2014 - 01:25 PM

Who cares whether your ELO is high or low? It's just a game and being the best at it will have exactly zero effect on your daily life. When the game goes away, and it will one day, what then?

No one knows what their ELO score is or how it is computed. Is it based off wins/losses, damage, KDR, is it per mech, total of all your mechs, an average of all stats, a ratio thereof, per capita of the total sum? Speculating is just dumb. Dr. Seuss hour is over.

#171 MustrumRidcully

    Member

  • PipPipPipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • 10,644 posts

Posted 23 January 2014 - 01:30 PM

View PostIceCase88, on 23 January 2014 - 01:25 PM, said:

Who cares whether your ELO is high or low? It's just a game and being the best at it will have exactly zero effect on your daily life. When the game goes away, and it will one day, what then?

No one knows what their ELO score is or how it is computed. Is it based off wins/losses, damage, KDR, is it per mech, total of all your mechs, an average of all stats, a ratio thereof, per capita of the total sum? Speculating is just dumb. Dr. Seuss hour is over.

Are you certain that no one here knows? Elo is not just a random name, it's a defined system that needs certain input, and not all the things you listed might work for it.
Also, the developers have actually made statements on what input they use.

We might not know our own Elo, but that doesn't mean we don't know anything about how Elo works. Or at least, it doesn't mean we can't learn, if we research the topic instead of just making wild assumptions.

#172 Roadkill

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,610 posts

Posted 23 January 2014 - 01:32 PM

View PostGhogiel, on 23 January 2014 - 01:13 PM, said:

It felt more like 200 matches per weight class for me to press through most of the scrub tier {Scrap}.

I personally believe 25-30 matches is a pretty wrong estimate. It's closer than however many million the other guy was saying lol.

Given what PGI has told us, 25-30 in each weight class should get you within the matchmaker's tolerance of your true rating.

That's not at all the same as getting to your actual rating, though, especially if you're at either extreme. If you're near the top (or bottom) ratings-wise, then yeah it could take a while before you finally work your way out of the matchmaker's maximum range for including you and scrubs in the same game. 200 matches doesn't sound off to me. It could vary person-to-person, too.

It's really, really hard to get to 2800 if that's intended to be the soft cap. For a player who is actually that good, it's not hard at all to get to 2450 which is within the matchmaker's tolerance range of your actual rating. (Or was, at least. They keep tweaking the tolerance.)

#173 Doctor Proctor

    Member

  • PipPipPipPipPipPip
  • Bad Company
  • 343 posts
  • LocationSouth Suburbs of Chicago, IL, USA

Posted 23 January 2014 - 01:33 PM

View PostIceCase88, on 23 January 2014 - 01:25 PM, said:

Who cares whether your ELO is high or low? It's just a game and being the best at it will have exactly zero effect on your daily life. When the game goes away, and it will one day, what then?

No one knows what their ELO score is or how it is computed. Is it based off wins/losses, damage, KDR, is it per mech, total of all your mechs, an average of all stats, a ratio thereof, per capita of the total sum? Speculating is just dumb. Dr. Seuss hour is over.


If you had bothered to read the rest of the thread then you would understand two things:
  • Elo only takes into account your wins and losses. This is how the system was designed, and this is how PGI has implemented it here.
  • This isn't about who has the bigger e-peen, but about whether using Elo as the basis for the matchmaker (again, the devs have stated that Elo is the majority factor in putting together matches) was a good idea and is working as intended to create relatively balanced matches.
Instead, you appear to not understand those things, and should therefore RTFT before posting your rant.

#174 Ghogiel

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • CS 2021 Gold Champ
  • 6,852 posts

Posted 23 January 2014 - 01:33 PM

View PostDoctor Proctor, on 23 January 2014 - 01:25 PM, said:


I think he meant that you might be running a hot build and therefore always do poorly on hot maps like Tourmaline or Terra Therma, while doing much better on colder maps like Alpine Valley and Frozen City. In that case you have two constants: the player and the map. Specifically, that the player (constant) will always play better on Alpine/Frozen (constant) and worse on Tourmaline/Terra Therma (constant).

Those aren't constants lol

#175 Roadkill

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,610 posts

Posted 23 January 2014 - 01:40 PM

View PostDoctor Proctor, on 23 January 2014 - 01:25 PM, said:

Edit: Also, my number of matches on those particular maps ranges anywhere from 155 to 223, which should be well within the claimed self-correcting range of Elo. So those stats should've normalized by now to match my overall ratio, but they haven't. And Crimson Straits in particular is even a relatively new map (only 155 drops), one for which my Elo should have been well established long ago.

W/L ratio != Elo rating.

Two people with identical W/L ratios can have dramatically different Elo ratings. Also, two people with identical Elo ratings can have dramatically different W/L ratios.

Since maps are chosen randomly, weird things can happen to the map W/L ratio. In my case, Frozen City is one of my better maps while Frozen City Night is one of my worst. What we don't know is what Mechs (read: weight classes, and therefore Elo ratings) I used during the games on those maps.

It's less complicated if you're someone who only plays one weight class, but there are still different Mechs (and different variants, with different loadouts) within each weight class so it's not trivial to figure out.
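
The W/L versus Elo point can be illustrated with the textbook expected-score formula (assumed here; PGI's exact curve is not public in this thread). The ratings below are hypothetical.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Textbook Elo win probability on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

# Same W/L, different Elo: each player only ever meets opposition at their own level.
veteran = expected_score(2200, 2200)
newcomer = expected_score(1200, 1200)
print(veteran, newcomer)  # 0.5 0.5, i.e. a W/L ratio of 1.0 for both

# Same Elo, different W/L: identical ratings, but different opposition.
fed_weaker = expected_score(1800, 1600)
fed_stronger = expected_score(1800, 2000)
print(f"{fed_weaker / (1 - fed_weaker):.2f}")      # expected W/L when facing weaker teams
print(f"{fed_stronger / (1 - fed_stronger):.2f}")  # expected W/L when facing stronger teams
```

A matchmaker that does its job pushes everyone toward a W/L of 1.0, so the W/L ratio mostly reflects who you were matched against rather than where your rating sits.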

#176 Asmudius Heng

    Member

  • PipPipPipPipPipPipPipPipPip
  • Survivor
  • 2,429 posts
  • LocationSydney, Australia

Posted 23 January 2014 - 01:48 PM

This is a fascinating thread - I am kinda lost in all the statistics but I think I have enjoyed the debate more than most threads on this forum.

It is odd that this level of intellect and in-depth discussion does not happen more often; usually it gets dragged down too much into personal attacks and ego.

Keep it up :huh:

#177 Doctor Proctor

    Member

  • PipPipPipPipPipPip
  • Bad Company
  • 343 posts
  • LocationSouth Suburbs of Chicago, IL, USA

Posted 23 January 2014 - 01:49 PM

View PostRoadkill, on 23 January 2014 - 01:40 PM, said:

W/L ratio != Elo rating.

Two people with identical W/L ratios can have dramatically different Elo ratings. Also, two people with identical Elo ratings can have dramatically different W/L ratios.

Since maps are chosen randomly, weird things can happen to the map W/L ratio. In my case, Frozen City is one of my better maps while Frozen City Night is one of my worst. What we don't know is what Mechs (read: weight classes, and therefore Elo ratings) I used during the games on those maps.

It's less complicated if you're someone who only plays one weight class, but there are still different Mechs (and different variants, with different loadouts) within each weight class so it's not trivial to figure out.


No, but all things being equal, the MM (which uses Elo to place teams and predict winners) will consistently underestimate or overestimate me on certain maps. This is because, statistically speaking, I have a better chance to win on one map than the other. Since Elo doesn't take map into account, then statistically speaking Elo assumes that I have the same chance to win or lose (whatever that chance may be).

#178 Ghogiel

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • CS 2021 Gold Champ
  • 6,852 posts

Posted 23 January 2014 - 02:14 PM

View PostDoctor Proctor, on 23 January 2014 - 01:49 PM, said:


No, but all things being equal, the MM (which uses Elo to place teams and predict winners) will consistently underestimate or overestimate me on certain maps. This is because, statistically speaking, I have a better chance to win on one map than the other. Since Elo doesn't take map into account, then statistically speaking Elo assumes that I have the same chance to win or lose (whatever that chance may be).

The map variable is just one of thousands.

The Elo rating is actually stepping over those variables straight to your personal effect on a match's outcome, irrespective of all the millions of variables besides weight class.

It ignores all the excuses, the scrub talk: "I would have won but..."

I would have won but I was sick; I didn't have my coffee; my ping; if I wasn't in a **** mech; if my pugs didn't suck; if you weren't all in meta builds; if I had a cold map; if I had a different mech etc

#179 RichAC

    Member

  • PipPipPipPipPipPipPip
  • 661 posts

Posted 23 January 2014 - 02:51 PM

View PostRoadkill, on 23 January 2014 - 12:48 PM, said:

Sorry, that's just wrong. I've implemented Elo before. It works just fine for individual ratings in a team game. It just takes longer to stabilize.


Again, wtf are you talking about? None of that has anything to do with what I'm saying.

Sorry, dude, you have me confused with someone else. I've said nothing that would prompt you to respond with these things. I have no idea what you're ranting about or why.


Well, apparently W/L ratios are very important. It is how they calculate ELO. In a previous thread you actually asked why it is important....

But I disagree about ELO working to rank individuals in team games. ELO does work for rating a team, but the problem is that in a random pug, you are never going to be on the same team twice. Which makes it meaningless for an individual. To say it will eventually even out or take longer to stabilize is a bogus myth. No one with any common sense is going to believe that.

It's also silly in a game where people are more concerned with their K/D, damage done, assists and match scores, precisely because they are on ever-changing random teams. I think MWO would be better off using a skill rating model like the one in Quake Live.

Why do they even have a match score in the game? So your match score is completely meaningless? You can be a horrible player but get a higher ELO than someone who is a better player than you, for many reasons, simply because you were on the winning team!?!? Even if they got a better score in the match than you!?!? Even though you get the lowest match score on your team every single time!?! This is ludicrous.

What fun is it going to be for someone to get a high ELO, because they are on the winning team, even though they are getting severely smashed? Oh I know, we shouldn't worry, it will just even out eventually. You really expect most people to believe this? lol.

I'm sure id Software will give PGI their ratings formula for free if they ask for it. Match score has to be accounted for. ELO is good for ranking a team in a tournament or league only, but not for individuals in random teams. This should be common sense. This is all shocking news to me.

Makes absolutely no sense at all.

(also bring back the points in conquest for capping individual bases)

Edited by RichAC, 23 January 2014 - 03:03 PM.


#180 RussianWolf

    Member

  • PipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • 2,097 posts
  • LocationWV

Posted 23 January 2014 - 03:10 PM

View PostRoadkill, on 23 January 2014 - 10:07 AM, said:

Elo doesn't need to adjust for those eventualities. The matchmaker is responsible for that.

Elo only cares if you win or lose. It doesn't care if the match was fair. It doesn't care if you did 1200 damage or 12 damage. It doesn't care if you won 12-0 or 12-11. A win is a win and a loss is a loss.

In a pretty random environment like the one that MWO creates, Elo rankings will take longer to converge on your actual skill. But they will eventually reach a stable value (within the constraints used to set up the system) and it will be accurate within the tolerances used to set up the system.

Elo rankings fluctuate by design. Every time you win or lose your ranking changes. All of the randomness that you're talking about simply increases that fluctuation, but it doesn't invalidate the system.

By that rationale, you shouldn't need 4 ELOs for the different classes of mechs. It should all even out in the end. But it doesn't. The different classes require different skill sets. Using ECM requires another, using LRMs requires another, etc. ELO works when the variables are very limited and static. The more variables you add, and the less static they are, the more problems you will run into with ELO ratings being accurate.
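
Whether win/loss-only ratings can sort individuals who keep landing on random teams is something a toy simulation can at least probe. The sketch below is not PGI's matchmaker: rating a team by the plain average of its members, the fully random 12v12 teams, the hidden "true skill" numbers, and the 1300 start rating are all assumptions made for illustration. Only K = 50 comes from this thread.

```python
import random

def expected_score(rating: float, opponent: float) -> float:
    """Textbook Elo win probability on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def team_mean(values, team):
    """Average of `values` over the player indices in `team`."""
    return sum(values[p] for p in team) / len(team)

def simulate(true_skill, matches, k=50.0, team_size=12, start=1300.0, seed=1):
    """Play random team matches and return each player's final rating."""
    rng = random.Random(seed)
    ratings = [start] * len(true_skill)
    players = list(range(len(true_skill)))
    for _ in range(matches):
        rng.shuffle(players)
        team_a = players[:team_size]
        team_b = players[team_size:2 * team_size]
        # The outcome is drawn from hidden true skill, which the ratings never see.
        p_true = expected_score(team_mean(true_skill, team_a), team_mean(true_skill, team_b))
        a_wins = rng.random() < p_true
        # The ratings learn only from win/loss versus their own prediction.
        predicted = expected_score(team_mean(ratings, team_a), team_mean(ratings, team_b))
        delta = k * ((1.0 if a_wins else 0.0) - predicted)
        for p in team_a:
            ratings[p] += delta
        for p in team_b:
            ratings[p] -= delta
    return ratings

# 48 hypothetical players with hidden skill spread evenly from 1000 to 2175.
true_skill = [1000.0 + 25.0 * i for i in range(48)]
ratings = simulate(true_skill, matches=5000)

# Print every eighth player, ordered by true skill, to compare against the rating.
for skill, rating in sorted(zip(true_skill, ratings))[::8]:
    print(f"true {skill:6.0f}  rated {rating:7.1f}")
```

Every player on a team receives the same adjustment after a match, so one player's contribution is diluted by eleven teammates. Changing `matches`, `k`, or `team_size` shows how much that dilution slows things down, and adding a per-mech or per-map skill offset to `true_skill` would be one way to test the "more variables" objection.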




