
Why Elo Doesn't Work Here


633 replies to this topic

#221 Ghogiel

    Member

  • CS 2021 Gold Champ
  • 6,852 posts

Posted 23 January 2014 - 05:05 PM

View PostRichAC, on 23 January 2014 - 05:02 PM, said:

Can you just imagine if a general manager of a pro athletic team, or pro athletic scouts, used an ELO system.


Since Elo is used in some sports to determine win probabilities I don't even have to imagine.

#222 Roadkill

    Member

  • 3,610 posts

Posted 23 January 2014 - 05:08 PM

View Post100mile, on 23 January 2014 - 04:36 PM, said:

http://mwomercs.com/...79-matchmaking/

Old Rating = 1350
Maximum Change Allowed = +50 for a win, -50 for a loss (as seen in Figure 1)
WinFlag = 1
Probability of Winning = 0.41

1350 Players new ranking = 1350 + 50 x (1 – 0.41)
= 1380

1410 Players new ranking = 1410 – 50 x (1 – 0.41)
= 1381

A player's rating will only go down if they are beaten by a player who has a lower rating than theirs. In this case, if the 1350 player lost, their score would not change since the Match Maker was correct in its prediction.


I'm just quoting 100mile because he provided the link and excerpt. The excerpt was originally posted by Paul.

This excerpt is a problem. Either:

a) The excerpt is wrong; Paul didn't understand what he was saying or mis-typed,
or
b) PGI isn't using Elo.

In a real Elo system, your rating always changes unless your rating is so much higher (or lower) than your opponent's that you're virtually guaranteed to win (or lose). If you're virtually guaranteed to win, and you do, then a real Elo system won't adjust your rating (and vice versa for a loss).

If this excerpt is correct, then what PGI has implemented is not Elo.

My guess is that Paul screwed up his explanation of how Elo works. In a proper Elo system, if the 1350 player lost, their rating would drop by about 20 (50 * 0.41) to 1330, and the 1410 player's rating would increase by about 20 (50 * 0.41) to 1430. The result was predicted, so the change is smaller than if an upset had occurred, but the Elo ratings still change.
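
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the textbook Elo update. It assumes the standard logistic expected-score formula on a 400-point scale and K = 50 (the "Maximum Change Allowed" from the excerpt). The function names are made up for illustration, and this is not PGI's actual code.

```python
def expected_score(rating, opponent_rating):
    """Textbook Elo win probability on the usual 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def elo_update(rating, opponent_rating, won, k=50.0):
    """Return the new rating after one game; k is the maximum change allowed."""
    actual = 1.0 if won else 0.0
    return rating + k * (actual - expected_score(rating, opponent_rating))


if __name__ == "__main__":
    p = expected_score(1350, 1410)
    print(f"1350 player's win probability: {p:.2f}")
    # The predicted result still moves both ratings, just by less than an upset would.
    print(f"1350 player loses: {elo_update(1350, 1410, won=False):.1f}")
    print(f"1410 player wins:  {elo_update(1410, 1350, won=True):.1f}")
```

With these assumptions the 1350 player's win probability comes out at roughly 0.41, matching the excerpt, and an expected loss still costs them about 20 points.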

#223 RichAC

    Member

  • 661 posts

Posted 23 January 2014 - 05:08 PM

View PostRichAC, on 23 January 2014 - 05:02 PM, said:

Can you just imagine if a general manager of a pro athletic team, or pro athletic scouts, used an ELO system.

ELO is for chess or 1v1.

http://en.wikipedia.org/wiki/Moneyball

http://en.wikipedia....iki/Sabermetric

Sabermetrics is what the MIT guy at the Oakland A's came up with. It's the standard used today to rate players.

But MWO doesn't have to be complicated. Basically, sabermetrics still uses individual players' stats to compute VORP for a permanent team, since in baseball you don't drop with random players on a team each match.

But rating on stats is still basically the same thing. Take LeBron James, for example. His stats are what determine his worth, or how much he's getting paid, just like stats already determine how many C-bills an MWO player is getting paid. People want to be matched up according to their match scores. That is what they are playing for.

If there is a problem with the point system. Change the point system. There have been many threads about how the point system takes away incentives to win. Takes away incentives to use other weight classes. Give more points for winning a game, give more points for capping....etc... Tweak the point system a little and match people up accordingly. After all we have skirmish now.

Otherwise the complaints will never end, and people will always feel mismatched.



Just reposting since both Ghogiel and Mischief ignored the full statement.

I can see ELO used for 1v1 sports only. People can't blame their teams.

But I only trust methods used by people who rate athletes for a living. And no individual player in a team game is rated based only on a team win or loss. It's absolutely absurd.

Sabermetrics is basically tweaking how they use stats.

I'm sorry we's all stupid, Mischief, but refer to my signature. That's all most of us care about, and we want to be matched up accordingly.

Edited by RichAC, 23 January 2014 - 05:16 PM.


#224 RussianWolf

    Member

  • Legendary Founder
  • 2,097 posts
  • LocationWV

Posted 23 January 2014 - 05:14 PM

I've played well over 6000 matches, yet my Win/Lose rate is steadily increasing. So I guess ELO can't get to my rating in over 6000 matches yet. I'll keep playing and let you know when my win/loss percentage evens out.

#225 Roadkill

    Member

  • 3,610 posts

Posted 23 January 2014 - 05:16 PM

View PostRichAC, on 23 January 2014 - 05:02 PM, said:

Otherwise the complaints will never end, and people will always feel mismatched.

The snark in me thinks that people wouldn't be complaining about Elo if the ratings were public and they could use them to measure their e-peens.

I think a huge part of the problem with the matchmaker is that a critical component of it - Elo ratings - is hidden. People want to use match score or K/D ratio or whatever because that's public data that they can flaunt at each other.

I'd love to try some other rating system, mostly because I know that any other system can probably be gamed and I'm pretty good at gaming rating systems. ;)

#226 Sug

    Member

  • The People's Hero
  • The People
  • 4,629 posts
  • LocationChicago

Posted 23 January 2014 - 05:17 PM

View PostRussianWolf, on 23 January 2014 - 05:14 PM, said:

I've played well over 6000 matches, yet my Win/Lose rate is steadily increasing. So I guess ELO can't get to my rating in over 6000 matches yet. I'll keep playing and let you know when my win/loss percentage evens out.


Unless you only drop solo we probably can't go by your WLR

#227 RussianWolf

    Member

  • Legendary Founder
  • 2,097 posts
  • LocationWV

Posted 23 January 2014 - 05:18 PM

Tell you what, take the ELO system of your choice and apply it to a March Madness sheet and see how accurate it comes out.

I'll tell you this, they are offering a Billion dollar prize to correctly guess all the games. Good Luck.

I'll wager on no one and no system collecting that prize.

View PostSug, on 23 January 2014 - 05:17 PM, said:


Unless you only drop solo we probably can't go by your WLR

Not exclusively, but never on a "competitive team", no TS for example.

#228 Ghogiel

    Member

  • CS 2021 Gold Champ
  • 6,852 posts

Posted 23 January 2014 - 05:19 PM

View PostSug, on 23 January 2014 - 05:02 PM, said:


Team A Player Elo Scores: 2000, 2000, 2000, 2000, 1900, 1800, 1800, 1800, 500, 500, 500, 500

Team B Player Elo Scores: 1700, 1700, 1600, 1600, 1500, 1500, 1400, 1400, 1400, 1300, 1200, 1200

Average Elo of Team A: 1460

Average Elo of Team B: 1460

The matchmaker sees these teams as equal so no one's Elo score is going to move very far if at all no matter what the outcome. No big deal. Elo working as intended.

Would this be a good, fun game to be in? Probably not for one of the teams. Which is what people ***** about, which is a problem with how the MM uses our Elo scores and not Elo itself.

Since 2000 is further away from 500 than the current 1400 threshold allows, I think your example is invalid.

No hypothetical will ever give me the metric that I want to know. We know in theory that teams similar to your example (though valid ones) are possible, and we have known that forever. What is new is knowing the thresholds.

But we don't know how many, how often, and how divergent the Elo ratings actually are in games for high-Elo players in practice. Is it every other game that they get 1400-rated players all up in their biznass? Every game? What's the average Elo difference between players in games with high-Elo players involved? Etc.
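
A quick way to check the quoted example is to compute each team's average and spread directly. This is only a sketch: the two lists are copied from Sug's post, `team_summary` is a made-up helper, and treating the 1400 threshold as a cap on the rating gap inside one match is an assumption about how the matchmaker applies it.

```python
from statistics import mean

# Ratings copied from the quoted example.
team_a = [2000, 2000, 2000, 2000, 1900, 1800, 1800, 1800, 500, 500, 500, 500]
team_b = [1700, 1700, 1600, 1600, 1500, 1500, 1400, 1400, 1400, 1300, 1200, 1200]


def team_summary(ratings):
    """Return (average, spread), where spread is the max minus the min rating."""
    return mean(ratings), max(ratings) - min(ratings)


if __name__ == "__main__":
    for name, team in (("A", team_a), ("B", team_b)):
        avg, spread = team_summary(team)
        print(f"Team {name}: average {avg:.0f}, spread {spread}")
```

The averages land near 1442 and 1458 rather than exactly 1460, which is close enough for the point being made. The Team A spread of 1500 is the part that exceeds a 1400 gap.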

#229 RichAC

    Member

  • 661 posts

Posted 23 January 2014 - 05:19 PM

View PostRoadkill, on 23 January 2014 - 05:16 PM, said:

The snark in me thinks that people wouldn't be complaining about Elo if the ratings were public and they could use them to measure their e-peens.

I think a huge part of the problem with the matchmaker is that a critical component of it - Elo ratings - is hidden. People want to use match score or K/D ratio or whatever because that's public data that they can flaunt at each other.

I'd love to try some other rating system, mostly because I know that any other system can probably be gamed and I'm pretty good at gaming rating systems. ;)


You're a pretty good cheater, that's nice of you to admit.

It doesn't need to be public, we all know what our win/loss ratio is.... and apparently that's all the rating system is based on...

I guess we know now why you'd want to know your own rating... it also explains why people are more concerned with damage done rather than winning, huh?

#230 A banana in the tailpipe

    Member

  • The 1 Percent
  • 2,705 posts
  • Locationbehind your mech

Posted 23 January 2014 - 05:20 PM

I've really enjoyed continuing to follow this thread despite throwing in the towel. If you guys want to use me as a standard for your theories, feel free. I was not kidding when I said I have a perfect 50/50 ratio. I'm the poster child for where the ELO/Matchmaker bar should be set in terms of balance, yet even last night I was still getting streaks of stomp after stomp after stomp after stomp, mostly due to being paired up solo against sync drops.

I do not play with premium time and have only purchased MC for paint/camo.
I have been a casual player since AUG 24th 2012

Kills / Death: 941 / 1,865
Wins / Losses: 1,144 / 1,279
Accumulative C-bills per match: 71,725.57
Avg. XP per match: 506.92

^ This is why I've noticed such a drastic change in the matchmaker lately. I'm used to the good and bad any given night. Since last weekend it has been nothing but bad over and over. It didn't matter if I dropped with clan mates in 4 mans, it didn't matter if I soloed, it didn't matter if I changed weight classes.

There's your bar gentlemen. A paragon of neutrality and balance to base your calculations off of.

Use it in good faith.

Edited by lockwoodx, 23 January 2014 - 05:34 PM.


#231 RichAC

    Member

  • 661 posts

Posted 23 January 2014 - 05:20 PM

View Postlockwoodx, on 23 January 2014 - 05:20 PM, said:

I've really enjoyed continuing to follow this thread despite throwing in the towel. If you guys want to use me as a standard for your theories, feel free. I was not kidding when I said I have a perfect 50/50 ratio. I'm the poster child for where the ELO/Matchmaker bar should be set in terms of balance, yet even last night I was still getting streaks of stomp after stomp after stomp after stomp, mostly due to being paired up solo against sync drops.

I do not play with premium time and have only purchased MC for paint/camo.
I have been a casual player since AUG 24th 2012

Kills / Death: 941 / 1,865
Wins / Losses: 1,144 / 1,279
Accumulative C-bills per match: 71,725.57
Avg. XP per match: 506.92

^ This is why I've noticed such a drastic change in the matchmaker lately. I'm used to the good and bad any given night. Since last weekend it has been nothing but bad over and over. It didn't matter if I dropped with clan mates in 4 mans, it didn't matter if I soloed, it didn't matter if I changed weight classes.

There's your bar gentlemen. A paragon of neutrality and balance to base your calculations off of. Use it in good faith.



Most people have a pretty even win/loss ratio. The problem is people also care about all their other stats as well.

I'm between 1.00 and 1.30 on all game modes. I'm not really complaining about my matchups. I'm just putting myself in other people's shoes, and want to stop this game from dying, which is the direction it seems to be heading.

Edited by RichAC, 23 January 2014 - 05:23 PM.


#232 Roadkill

    Member

  • 3,610 posts

Posted 23 January 2014 - 05:21 PM

View PostRussianWolf, on 23 January 2014 - 05:14 PM, said:

I've played well over 6000 matches, yet my Win/Lose rate is steadily increasing. So I guess ELO can't get to my rating in over 6000 matches yet. I'll keep playing and let you know when my win/loss percentage evens out.

Win/Loss != Elo

Here's one very simple way that your W/L could be rising while your Elo remains constant:

You win 2 games in a row that you were supposed to win, then you lose one that you were not supposed to lose. W/L is 2.0! Woo-hoo! But your Elo rating could very well remain neutral.
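
To put numbers on that, here is a small sketch of the three-game scenario. The predicted win probabilities (0.70 for the two wins, 0.60 for the loss) and K = 50 are invented for illustration, and `rating_change` is a hypothetical helper that applies the usual Elo change of K times (result minus prediction).

```python
def rating_change(expected, won, k=50.0):
    """Elo-style change for one game, given the predicted win probability."""
    return k * ((1.0 if won else 0.0) - expected)


# (predicted win probability, did we win?) -- made-up values for illustration
games = [(0.70, True), (0.70, True), (0.60, False)]

wins = sum(1 for _, won in games if won)
losses = len(games) - wins
net_change = sum(rating_change(p, won) for p, won in games)

if __name__ == "__main__":
    print(f"W/L ratio: {wins / losses:.1f}")
    print(f"Net rating change: {net_change:+.1f}")
```

Two expected wins earn 15 points each and the upset loss costs 30, so W/L sits at 2.0 while the rating ends up where it started.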

#233 RichAC

    Member

  • 661 posts

Posted 23 January 2014 - 05:24 PM

View Postlockwoodx, on 23 January 2014 - 05:20 PM, said:

I've really enjoyed continuing to follow this thread despite throwing in the towel. If you guys want to use me as a standard for your theories, feel free. I was not kidding when I said I have a perfect 50/50 ratio. I'm the poster child for where the ELO/Matchmaker bar should be set in terms of balance, yet even last night I was still getting streaks of stomp after stomp after stomp after stomp, mostly due to being paired up solo against sync drops.

I do not play with premium time and have only purchased MC for paint/camo.
I have been a casual player since AUG 24th 2012

Kills / Death: 941 / 1,865
Wins / Losses: 1,144 / 1,279
Accumulative C-bills per match: 71,725.57
Avg. XP per match: 506.92

^ This is why I've noticed such a drastic change in the matchmaker lately. I'm used to the good and bad any given night. Since last weekend it has been nothing but bad over and over. It didn't matter if I dropped with clan mates in 4 mans, it didn't matter if I soloed, it didn't matter if I changed weight classes.

There's your bar gentlemen. A paragon of neutrality and balance to base your calculations off of. Use it in good faith.



Most people have a pretty even win/loss ratio. The problem is people also care about all their other stats as well.

I'm between 1.00 and 1.30 on all game modes. I'm not really complaining about my matchups. I'm just putting myself in other people's shoes, and want to stop this game from dying, which is the direction it seems to be heading.

I'm always the first to say "post your w/l ratio" to all the whiners. But this revelation of how they calculate our ranks has me convinced some people out there have a reason not to be happy about it.

No one cares if their win/loss is even when the rest of their stats are in the toilet. Or let's say there are no stats, before someone brings that up. The feeling of getting smashed even though your team won is not satisfying enough for most people.

View PostSug, on 23 January 2014 - 05:17 PM, said:


Unless you only drop solo we probably can't go by your WLR


Good point that hasn't been brought up enough.

Edited by RichAC, 23 January 2014 - 05:26 PM.


#234 Sug

    Member

  • The People's Hero
  • The People
  • 4,629 posts
  • LocationChicago

Posted 23 January 2014 - 05:25 PM

View Postlockwoodx, on 23 January 2014 - 05:20 PM, said:

Kills / Death: 941 / 1,865 50%/50%


That would be a 0.5, or 1/2 ratio. 50/50 and your KDR would be 1.00.
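
Plugging the posted numbers in makes the difference between the ratios obvious. This is just arithmetic on lockwoodx's stats; the variable names are mine.

```python
# Stats as posted by lockwoodx.
kills, deaths = 941, 1865
wins, losses = 1144, 1279

kdr = kills / deaths               # kill/death ratio
wlr = wins / losses                # win/loss ratio
win_rate = wins / (wins + losses)  # share of all matches won

if __name__ == "__main__":
    print(f"KDR: {kdr:.2f}")
    print(f"WLR: {wlr:.2f}")
    print(f"Win rate: {win_rate:.1%}")
```

So the KDR is about 0.5, while the win/loss ratio is a bit under 0.9, or roughly 47% of matches won.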

Edited by Sug, 23 January 2014 - 05:38 PM.


#235 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 23 January 2014 - 05:25 PM

View PostRoland, on 23 January 2014 - 04:51 PM, said:

Dude, just because it uses wins and losses as a basis doesn't mean its the same as Elo.
The point of the system is that they IMPROVED Elo (or rather, Glicko, which was itself an improvement of Elo) to be better able to determine player skill from team results.


I would agree that in the given matchmaking situation, Trueskill would not somehow magically fix things. The matchmaker has other fundamental problems with it.

However, at the same time, simple Elo OR Trueskill may be insufficient for this game anyway.

Both Elo and Trueskill are essentially just generalized rating algorithms... For MWO, since you aren't trying to generalize across all possible games, you could come up with a BETTER rating system which actually took into account specific aspects of the game itself.

For instance, as others have pointed out... The match score at the end of the game. THAT could be leveraged in a rating system... because the folks topping the scoreboard are generally better players.

There's really no reason to limit ourselves to only using simple win/loss for rating.


Roland, I suspect we're missing each other here. Let me clarify something.

TrueSkill or a system like that does not include a metric like match score, or your damage, or anything like that. Neither does Glicko. They use your win/loss. That's it, that's the basis.

Where they expand upon that is with the ability to leverage massive sampling values (millions of players across countless games); they can then drill down to look at how likely you are to win in a specific sort of situation with specific team compositions and specific environments. That lets it more accurately seat players into the best possible matches and compare them more accurately to other players, so you can say where you fit in out of these 1 million players, pretty much exactly. They can say that *in this specific match* your impact is going to be X amount as opposed to saying in general, you're about X good.

There are some tweaks to how Elo is tracked and implemented in MW:O (split pug/premade, match range not target, Gaussian distribution) that will make a big difference.

If PGI has the people with the chops to do it and they've been collecting the telemetry for it I'm all for slicing Elo more finely - track win/loss with specific players and against specific players, track performance by chassis and loadout not just weight class, have a variable rating based on those and use that to more quickly seat all further Elo rankings.

For example I love me some AC20. I got good with Orions pretty quickly because I ran them like a mini-Atlas. A more comprehensive Elo system that tracks me by chassis and loadout wouldn't need hundreds of games to seat me for my new Orion, it could take how I perform with similar loadouts, adjust for what proficiencies I've got unlocked and stick me in reasonably balanced matches after 20 or 50 rounds instead of the 300 or 500 it took to generally seat me for heavies.

That's a hell of a lot of work though Roland. It would need a team of very competent people, we're not talking 1 or 2 and it would take a pretty sleepless 12-18 months to roll live. You'd be better off paying for licensing the closest TrueSkill type model and then painstakingly tweaking variables for the MW:O environment. Your eyes would bleed from matching charts trying to do that. You'd need 5 or 6 dedicated terminals just for running reports all day long.

Hence why I say it's outside the scope of MW:O right now. I get the value of it - I don't think you're wrong at all. Realistically though Elo is the basis of all those systems and with the pbase we have not being many hundreds of thousands the ability to make precisely seeded matches is missing anyway, so why musclefuck something of that caliber into place?

Make the tweaks I recommended. Then, when UI 2.0 is done and CW is close enough to at least dream about (if it's 2016 I'll be surprised) then seed Elo by chassis and game mode. Start each chassis at weight class value - 20%. Halve the k-value (how much it's modified by win/loss vs relative ranked opponents) for the first 10 matches to give people shake-out time. Then give them a +20% k-value for the next 40 matches so by match 50 in a chassis you should be pretty accurate.

We've already essentially got a win/loss tracking for game-mode regardless of chassis. Make a 'dumb' (no k-value at all) Elo score for game-modes and take 20% of that plus 80% of your chassis Elo to give you an actual Elo for dropping in a specific game mode with a specific chassis.

The 20/80 split is arbitrary; you could determine a more accurate impact by looking at ~100 matches by players with tons of matches who've dropped often in the same mechs with the same loadout and seeing how their performance varies by game-mode. This would give you a rough percentage of game-mode impact on overall performance against the same chassis.
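
The per-chassis seeding described above could be sketched roughly like this. Everything here is an assumption layered on the proposal: the function names are hypothetical, "weight class value - 20%" is read as 80% of the weight-class rating, the base K of 50 is borrowed from Paul's excerpt, and the 20/80 weights are the placeholder split from the post.

```python
def starting_chassis_rating(weight_class_rating):
    """Seed a new chassis at the weight-class rating minus 20% (one reading)."""
    return 0.8 * weight_class_rating


def k_for_match(matches_played, base_k=50.0):
    """K-value schedule: halved for the first 10 matches, +20% for the next 40."""
    if matches_played < 10:
        return 0.5 * base_k
    if matches_played < 50:
        return 1.2 * base_k
    return base_k


def effective_rating(chassis_rating, mode_rating, mode_weight=0.2):
    """Blend game-mode and chassis ratings, 20/80 by default."""
    return mode_weight * mode_rating + (1.0 - mode_weight) * chassis_rating


if __name__ == "__main__":
    print(starting_chassis_rating(1500))
    print([k_for_match(n) for n in (0, 9, 10, 49, 50)])
    print(effective_rating(chassis_rating=1400, mode_rating=1600))
```

The blend weight is left as a parameter so that the measured game-mode impact could replace the 0.2 placeholder once someone has the data.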

All that aside though the difference that sort of fine hair splitting in matchmaking would do is minimal - there's not enough people to give the matchmaker a huge range of options to fill matches.

#236 Roland

    Member

  • 8,260 posts

Posted 23 January 2014 - 05:30 PM

Quote

TrueSkill or a system like that does not include a metric like match score, or your damage, or anything like that. Neither does Glicko. They use your win/loss. That's it, that's the basis.

Yes. Why do you keep saying this? Is something I said making you think that I think otherwise?

I mean, in the section you just quoted there, I point out that because of exactly that, both systems are effectively poor choices for this game... because you don't have to limit yourself to only that one value.

Since you aren't trying to make a system which provides generalized ratings for ANY game, you can actually take into account for things specific to THIS game.

#237 RichAC

    Member

  • 661 posts

Posted 23 January 2014 - 05:30 PM

View PostMischiefSC, on 23 January 2014 - 05:03 PM, said:


So, the answer is yes. You believe that 'all those people at MIT' are just wrong and math is some big sneaky conspiracy that smart people use to, what, help keep people down for THE MAN?

For giggles I looked up the documentation available for the ranking system for Quake Live - it was based on Elo, more to the point it was a broken implementation of TrueSkill which is based on the Elo system. Win/loss. Clearly a conspiracy, right?

So your opinion is that we should use metrics like damage/kills/etc even though it's an absolutely unreliable basis for representing skill but it'll make you 'feel better' about how you're ranked.

I'm not going to spend the time to argue that. I'll just go with 'no thank you'.



You're wrong. Link the documentation. The rating system was made by gaimtheory. ELO is only used for duels, meaning 1v1. In the team games they use another system which is unknown, and not solely based on wins/losses. Which is why one of my ideas was ranking players on each team by points compared to teammates and not just win/loss.

Yes, some people just like to sound smart. Some people also believe the more complicated it is, the more true it has to be. It's an inferiority complex for most lol. Just because a formula was made by an MIT guy doesn't mean it's practical.

Stats certainly matter when it comes to pro athletic sports. I don't know how you can deny this. LeBron James isn't getting paid a lot of money because he has a higher ELO or because of how often his team wins. He's getting paid based off his stats. That's why players get traded all the time.

He's getting paid for the same reasons people get paid C-bills in MWO. And that is how people should be ranked and matched. That is what people are playing for. To do otherwise is PGI going against itself. Undermining its own game.

They should be ranked by how PGI designed and based their core game! It's common sense and PGI's ELO system is contradicting itself, and even us commonfolk can see that. PGI should be a leader not a follower if they want to be successful in this dying industry.

Edited by RichAC, 23 January 2014 - 05:41 PM.


#238 Roadkill

    Member

  • 3,610 posts

Posted 23 January 2014 - 05:33 PM

View PostRichAC, on 23 January 2014 - 05:19 PM, said:

You're a pretty good cheater, that's nice of you to admit.

It doesn't need to be public, we all know what our win/loss ratio is.... and apparently that's all the rating system is based on...

I guess we know now why you'd want to know your own rating... it also explains why people are more concerned with damage done rather than winning, huh?

How's that cheating? I just know how the system works, so I use it to my advantage. Or are you saying that we should just use random builds, because using the knowledge of how to build a superior Mech is cheating?

And you still have it wrong - win/loss != Elo. Elo is derived from your wins and losses, but it is unrelated to your win/loss ratio.

People are more concerned with damage and kill/death ratio because they mistakenly believe that those numbers mean they are better players. If it takes you more than 200 damage to destroy an Atlas, you're actually bad at the game. You apparently can't aim. But people still go for the big numbers because it makes them feel special.

Which is fine with me, actually. I like putting up big damage numbers and getting lots of kills, too. I actually wish they'd track Assists because I'm usually pretty good at that. But it doesn't mean that damage and kill/death ratio are valid indicators of skill.

I'd love to see something akin to Sabermetrics implemented for MWO. I actually agree with you on all of that! The only thing I've been arguing against is this mistaken belief that somehow Elo doesn't work. It works extremely well... it's just that a lot of people don't really understand what it does, so they form a mistaken belief in their heads, and then they think it's not working when the results don't match that mistaken belief.

So yeah, let's do Sabermetrics. There's got to be some way to create a KVOR or WARP for MWO. And I'm not being sarcastic at all when I say I think that would be totally cool.

Unfortunately, I'm not an MIT-trained statistician, so I can't come up with it by myself. ;)

#239 A banana in the tailpipe

    Member

  • The 1 Percent
  • 2,705 posts
  • Locationbehind your mech

Posted 23 January 2014 - 05:34 PM

View PostSug, on 23 January 2014 - 05:25 PM, said:


That would be a 0.5, or 1/2 ratio. 50/50 and your KDR would be 1.00.


My bad, I had a long day and my brain is fuzzy. Most matches I'm usually plotting, typing/calling out marks, and more occupied with the win than personal kills. Now damage I do care about, along with assists. Lately the squash matches have left teams I've pugged into with multiple players under 100 damage and 0 assists, myself included. That's how bad things have gotten and will continue to be. There is no point in participating when it is this ugly, and I've taken a stand not to. The info I provided is for the benefit of your number crunching/theory crafting. Nothing more.

#240 Sug

    Member

  • The People's Hero
  • The People
  • 4,629 posts
  • LocationChicago

Posted 23 January 2014 - 05:37 PM

View Postlockwoodx, on 23 January 2014 - 05:34 PM, said:

The info I provided is for the benefit of your number crunching/theory crafting. Nothing more.


Stats don't tell the whole story.




