Why Elo Doesn't Work Here


633 replies to this topic

#141 Doctor Proctor

    Member

  • Bad Company
  • 343 posts
  • LocationSouth Suburbs of Chicago, IL, USA

Posted 23 January 2014 - 10:23 AM

View PostRoadkill, on 23 January 2014 - 10:07 AM, said:

Elo doesn't need to adjust for those eventualities. The matchmaker is responsible for that.

Elo only cares if you win or lose. It doesn't care if the match was fair. It doesn't care if you did 1200 damage or 12 damage. It doesn't care if you won 12-0 or 12-11. A win is a win and a loss is a loss.

In a pretty random environment like the one that MWO creates, Elo rankings will take longer to converge on your actual skill. But they will eventually reach a stable value (within the constraints used to set up the system) and it will be accurate within the tolerances used to set up the system.

Elo rankings fluctuate by design. Every time you win or lose your ranking changes. All of the randomness that you're talking about simply increases that fluctuation, but it doesn't invalidate the system.


That may be all well and good in theory, but how much will those fluctuations increase the number of matches necessary to calculate your Elo correctly? I mean, what if it took 24,000 matches due to the many destabilizing factors? I've currently been in 2,935 matches; if Elo required 24,000 to rate me correctly, it would have only about 12% of the matches it needs. In other words, it would be inaccurate.

Of course, if it needed 200 matches then it would probably be relatively accurate, and would probably have picked up the natural changes in skill that I've had since I started playing this game early last year. Unless I missed it somewhere though, we don't really know how many matches Elo needs currently to calculate your true position, which means that we don't know the current accuracy of the system.
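For reference, the win/loss-only update being discussed can be sketched as below. This is the textbook Elo formula, not PGI's actual code; the 400-point scale and K = 32 are conventional chess defaults used here as assumptions, and `opponent` stands in for whatever team-level rating a matchmaker might feed in.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Probability of winning that Elo assigns, given both ratings."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update(rating: float, opponent: float, won: bool, k: float = 32.0) -> float:
    """Return the new rating after one match.

    Only the result matters: damage, kills, and margin of victory
    never enter the formula.
    """
    score = 1.0 if won else 0.0
    return rating + k * (score - expected_score(rating, opponent))


# An even match: a win and a loss move the rating by the same amount.
print(update(1500, 1500, won=True))   # 1516.0
print(update(1500, 1500, won=False))  # 1484.0
```

The size of each step depends on K and on how surprising the result was, which is why the number of matches needed to settle is not fixed.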

#142 Doctor Proctor

    Member

  • Bad Company
  • 343 posts
  • LocationSouth Suburbs of Chicago, IL, USA

Posted 23 January 2014 - 10:30 AM

View PostJoseph Mallan, on 23 January 2014 - 10:20 AM, said:

And that is the mistake. If I did 22 damage in a match and WE win, I get bumped up in rank for nothing! If we lose and I killed 8 enemy with 0 assists I still fall in the ranking. How is that a reflection of my performance?


FWIW, I agree with you completely. I understand that statistically we should eventually figure out how many 22 damage games you have versus how many 8 kill games you have, but trying to do that for everyone simultaneously could cause the sample size required to grow to untenable levels. Even if you happen to have hit that sample size and have the "correct" Elo, the people you're matched against may not have and therefore have "incorrect" Elo scores, further complicating the issue.

Personally, I would think it would be better to use your win/loss record coupled with some modifier for personal performance. Win the match but only did 22 damage? You get +5 theoretical MM points. Win the match but got 8 kills? You get +25 MM points. Lose the match while doing 22 damage? You get -25 MM points. Lose the match while getting 8 kills? You get -5 MM points, or +5 MM points, whichever seems to balance out better in the long run (I am not a math/stats major). This way at least the carried don't move up as fast as the carrier, since your individual performance carries some weight in addition to the team's performance.
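A rough sketch of that scheme, just to make the four cases concrete. The point values come from the post; the cut-off for what counts as a strong individual game (`kill_threshold`, `damage_threshold`) is a made-up assumption, as are the function names.

```python
def performed_well(kills: int, damage: int,
                   kill_threshold: int = 4,
                   damage_threshold: int = 400) -> bool:
    """Hypothetical test for a strong individual game (thresholds are guesses)."""
    return kills >= kill_threshold or damage >= damage_threshold


def mm_points(won: bool, strong_game: bool, loss_carry_points: int = -5) -> int:
    """Matchmaking points for one match under the proposed scheme.

    loss_carry_points is the open question from the post:
    -5 or +5 for losing while playing well.
    """
    if won:
        return 25 if strong_game else 5
    return loss_carry_points if strong_game else -25


print(mm_points(True, performed_well(kills=0, damage=22)))   # 5
print(mm_points(True, performed_well(kills=8, damage=0)))    # 25
print(mm_points(False, performed_well(kills=0, damage=22)))  # -25
print(mm_points(False, performed_well(kills=8, damage=0)))   # -5
```

Passing `loss_carry_points=5` gives the alternative where a strong game in a loss still moves you up slightly.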

#143 Joseph Mallan

    ForumWarrior

  • FP Veteran - Beta 1
  • 35,216 posts
  • Google+: Link
  • Facebook: Link
  • LocationMallanhold, Furillo

Posted 23 January 2014 - 10:35 AM

That does sound more robust Doc!

#144 Sug

    Member

  • The People's Hero
  • The People
  • 4,629 posts
  • LocationChicago

Posted 23 January 2014 - 10:40 AM

View PostRoadkill, on 23 January 2014 - 09:57 AM, said:

I still have no idea why you are responding to me. Nothing you're saying has anything to do with what I've been saying. I think you're confusing me with someone else earlier in the thread.

Seriously, you have me completely confused. I have no idea why you are responding to me because nothing you're saying has anything to do with the point I've been making.


I think he thinks you're Roadbeer.

#145 Krasnovian

    Member

  • Overlord
  • 40 posts

Posted 23 January 2014 - 10:59 AM

I'd be curious to see how the math would work out if the Elo system was implemented regarding kills as wins and deaths as losses. As I understand it (in a grossly simplified and possibly totally incorrect way), Elo eventually pushes W/L ratio toward 1.0. If implemented this way, would it push K/D toward 1.0, making games, win or lose, more satisfying? I do understand this is of limited value now and even more limited if they figure out role warfare for lights, e.g. scouting. Just curious how the math would work out.

#146 Doctor Proctor

    Member

  • Bad Company
  • 343 posts
  • LocationSouth Suburbs of Chicago, IL, USA

Posted 23 January 2014 - 11:08 AM

View PostKrasnovian, on 23 January 2014 - 10:59 AM, said:

I'd be curious to see how the math would work out if the Elo system was implemented regarding kills as wins and deaths as losses. As I understand it (in a grossly simplified and possibly totally incorrect way), Elo eventually pushes W/L ratio toward 1.0. If implemented this way, would it push K/D toward 1.0, making games, win or lose, more satisfying? I do understand this is of limited value now and even more limited if they figure out role warfare for lights, e.g. scouting. Just curious how the math would work out.


That probably wouldn't work out too well actually. While there are excellent pilots in all weight classes, as a general rule I tend to see the number of kills in matches go something like Lights<Mediums<Heavies<Assaults. Meaning, your high Elo ranks would mostly be Assault players and your lowest Elo ranks would mostly be Light players. You would also have wildly swinging Elo for single players as they switch between chassis in different weight classes.

Basically, it probably wouldn't be very good if that were the only metric you were using...

#147 A banana in the tailpipe

    Member

  • The 1 Percent
  • 2,705 posts
  • Locationbehind your mech

Posted 23 January 2014 - 11:42 AM

View PostJoseph Mallan, on 23 January 2014 - 04:52 AM, said:

I have 4 alts, all with varying degrees of limitations I personally placed on the account. The more flexible I was with my play style, the more successful the account is. Anton Shiningstar (a FedRat) is the most successful, followed by my DCMS Xando Parapasu, where I primarily use mobile Mediums and Heavies... Got a good AC10 Pult and a Shadow Hawk I am not ashamed of! My Clanner is modeled after a Scorpion MechWarrior, so mostly energy weapons and fast-moving Lights and Mediums. I have mentioned a few times I am not a good light pilot, right? :huh:

The last one I am not telling, but it's a Leaguer, and too new to know how it will turn out. :lol:


You should be ashamed of your Hawk and the EZ-mode they are! :D

Anyways, best of luck to you. It's dark days again for MWO and I'm taking another lengthy break, having left my clan and everything. Come to think of it, for the year+ I've played casually, half of that has been inactive due to multi-month breaks friends and I took while they slowly fixed game-breaking issues such as dragon bowling, lag shields, UAC5s, Spider hitboxes, etc.

This time it's the matchmaker, but I promise not to write another scathing review about "too little too late" because the game feels balanced for the most part, it just needs more people. Being unplayable for the average joe who wants to enjoy messing around with stompy mechs at the moment is a shame and what I fear might truly cripple MWO's ability to grow.

This reminds me of how bad Counterstrike can get when you're on a server against a team of AWP users. The user may welcome the challenge, or move on to another, more balanced server. MWO doesn't grant you that option, and even worse, the matchmaker places those AWP teams against the users trying to distance themselves from them.

That's the nature of elitism in gaming. The user desires participation in order to experience what fun the game has to offer. The elitist desires attention, thus eliminating the participation of users until only the elitist stands out. Diminishing returns are inevitable within a closed system such as the one MWO uses, and they have finally caught up. The only solution now is to pit the elitists against each other, because the thing they hate the most is genuine competition, which could rob them of the attention they seek.

When the user stops having fun they stop participating. When the elitist stops having fun they take the path of least resistance until they themselves become a user. It's a deadly cycle for competitive games. One I hope PGI has a solution for.

#148 Roadkill

    Member

  • 3,610 posts

Posted 23 January 2014 - 12:08 PM

View PostJoseph Mallan, on 23 January 2014 - 10:20 AM, said:

And that is the mistake. If I did 22 damage in a match and WE win, I get bumped up in rank for nothing! If we lose and I killed 8 enemy with 0 assists I still fall in the ranking. How is that a reflection of my performance?

Damage isn't the only indicator of performance, but you know that and I understand the question you're asking.

It's a reflection of your contribution to the team in the sense that despite single-handedly killing 8 enemies, it wasn't enough for your team to win. You didn't add enough to the team, so you get a loss.

The matchmaker may have put you on that team because your Elo was high enough that you should have theoretically made the game balanced. It said "Joe, your assignment this match is to carry the team by yourself. Sorry dude, all I have are lamers to pair with you. Good luck!" You failed to carry, so you get a loss and your Elo drops a bit.

In some other game it may say "ROFL wow you suck Joe. Here, join these 11 Ooberz and have a free win. I feel sorry for you." You drop, run into a few walls, drop off a cliff, and your team wins.

Over time, these things average out. Over time, Elo self corrects and will give you an accurate rating. (Whether or not the matchmaker puts that rating to good use is an entirely different question.)

#149 Joseph Mallan

    ForumWarrior

  • FP Veteran - Beta 1
  • 35,216 posts
  • Google+: Link
  • Facebook: Link
  • LocationMallanhold, Furillo

Posted 23 January 2014 - 12:14 PM

View PostRoadkill, on 23 January 2014 - 12:08 PM, said:

Damage isn't the only indicator of performance, but you know that and I understand the question you're asking.

It's a reflection of your contribution to the team in the sense that despite single-handedly killing 8 enemies, it wasn't enough for your team to win. You didn't add enough to the team, so you get a loss.

The matchmaker may have put you on that team because your Elo was high enough that you should have theoretically made the game balanced. It said "Joe, your assignment this match is to carry the team by yourself. Sorry dude, all I have are lamers to pair with you. Good luck!" You failed to carry, so you get a loss and your Elo drops a bit.

In some other game it may say "ROFL wow you suck Joe. Here, join these 11 Ooberz and have a free win. I feel sorry for you." You drop, run into a few walls, drop off a cliff, and your team wins.

Over time, these things average out. Over time, Elo self corrects and will give you an accurate rating. (Whether or not the matchmaker puts that rating to good use is an entirely different question.)

I am only responsible for 8% of my team's victory... that is a percentage that is easily carried by 11 others!

#150 Roadkill

    Member

  • 3,610 posts

Posted 23 January 2014 - 12:20 PM

View PostDoctor Proctor, on 23 January 2014 - 10:23 AM, said:

That may be all well and good in theory, but how much will those fluctuations increase the number of matches necessary to calculate your Elo correctly?

It's not as bad as some of the numbers floating around. Elo systems self-correct rather quickly, actually, so the variance that is introduced by MWO doesn't make it that bad. You might need 100 matches (in each weight class) to reach nominal stability, but that's really not all that bad.

Remember that when your rating is really off - so badly off that you're dramatically tipping the balance in matches - Elo is adjusting your rating after each match by nearly the full K value. If K is 50 and the system is set up so that 2800 is intended to be the max rating, it only takes 25-30 distorted wins to correct your rating.
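To put rough numbers on that claim: the sketch below assumes a rating that is off by 1400 points (half of the hypothetical 2800 scale; that gap is an illustrative assumption, not a figure from PGI). In the standard update a win earns K × (1 − expected score), so the full K is only earned when the system gave you almost no chance of winning.

```python
import math


def wins_needed(gap: float, k: float, expected: float) -> int:
    """Consecutive wins needed to close a rating gap.

    expected is the win probability the system assigned before each match;
    each win is worth k * (1 - expected) points.
    """
    gain_per_win = k * (1.0 - expected)
    return math.ceil(gap / gain_per_win)


# Full K per win (the system expected a near-certain loss every time).
print(wins_needed(1400, k=50, expected=0.0))  # 28

# Matches the system believed were even: each win is worth only K / 2.
print(wins_needed(1400, k=50, expected=0.5))  # 56
```

So the 25-30 figure matches the full-K case; if the matchmaker thinks each match is a coin flip, the same correction takes about twice as many wins.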

The main instability in PGI's implementation of Elo (IMHO) comes from the fact that we have 1 rating for each weight class. That sounds fine, and is in fact better than just having a single rating, but my performance in a Locust is nowhere near the same as it is in a Raven 3L or a Jenner D. We should really have 1 rating per Mech. Ideally per build but that's just getting crazy.

View PostJoseph Mallan, on 23 January 2014 - 12:14 PM, said:

I am only responsible for 8% of my team's victory... that is a percentage that is easily carried by 11 others!

Only true in the average case where you're all equal. If your rating is 8900 and you're paired with 11 guys all rated 100 each, then you're responsible for 89% of your team's victory. Carry harder! :huh:
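The arithmetic behind both percentages, treating a player's share of the team's total rating as their share of responsibility. That equivalence is the tongue-in-cheek model from this exchange, not something Elo itself defines, and `rating_share` is a name invented for the sketch.

```python
def rating_share(player: float, teammates: list[float]) -> float:
    """Fraction of the team's summed rating contributed by one player."""
    return player / (player + sum(teammates))


# Twelve equally rated players: each holds 1/12 of the total.
print(f"{rating_share(1500, [1500] * 11):.1%}")  # 8.3%

# One 8900-rated player with eleven 100-rated teammates.
print(f"{rating_share(8900, [100] * 11):.0%}")   # 89%
```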

View PostSug, on 23 January 2014 - 10:40 AM, said:

I think he thinks you're Roadbeer.

Not sure if flattered or insulted. :D

#151 Abivard

    Member

  • Shredder
  • 1,935 posts
  • LocationFree Rasalhague Republic

Posted 23 January 2014 - 12:32 PM

With a random 12 on 12, Elo would need 10,000,000,000,000 (approximately) games for the individual's efforts to show through. The larger the player base, the more games would be needed; for each 100 players in the pool, an additional 1 million or so games would need to be played.

#152 Roadkill

    Member

  • 3,610 posts

Posted 23 January 2014 - 12:36 PM

View PostAbivard, on 23 January 2014 - 12:32 PM, said:

With a random 12 on 12, Elo would need 10,000,000,000,000 (approximately) games for the individual's efforts to show through. The larger the player base, the more games would be needed; for each 100 players in the pool, an additional 1 million or so games would need to be played.

Show your work, because those numbers are dead wrong.

#153 Abivard

    Member

  • Shredder
  • 1,935 posts
  • LocationFree Rasalhague Republic

Posted 23 January 2014 - 12:38 PM

View PostRoadkill, on 23 January 2014 - 12:36 PM, said:

Show your work, because those numbers are dead wrong.



Show your work ;p

#154 Roadkill

    Member

  • 3,610 posts

Posted 23 January 2014 - 12:39 PM

View PostAbivard, on 23 January 2014 - 12:38 PM, said:

Show your work ;p

I already have. Scroll up a few posts.

Worst case it takes 25-30 matches for any given player's Elo rating to correct itself.

#155 RichAC

    Member

  • 661 posts

Posted 23 January 2014 - 12:41 PM

View PostRoadkill, on 23 January 2014 - 09:57 AM, said:

I still have no idea why you are responding to me. Nothing you're saying has anything to do with what I've been saying. I think you're confusing me with someone else earlier in the thread.

1. Elo is a proven system for head-to-head, whether that is 1v1 or 12v12. It doesn't use match score at all, it only uses win/loss. The matchmaker might somehow incorporate match score, but if it does PGI hasn't admitted it.

2. Um... sure? How is that relevant? Not sure why you think I care.

3. Again... sure? Not sure why you think I care?

My win/loss stats? How are my win/loss stats relevant in any way?

Seriously, you have me completely confused. I have no idea why you are responding to me because nothing you're saying has anything to do with the point I've been making. Which is thus:

If you find yourself in an unbalanced match, the fault for that lies with the matchmaker and not with Elo. (Assuming PGI implemented Elo correctly.)

Notice how I'm not saying that matches are necessarily unbalanced?



1. ELO is not a proven system for an individual. It would only work for a team, and only if you are going to keep playing on the same team. Obviously that is not the case in a random pug.

How are your win/loss stats relevant? Seriously? I think you're the one confused. You are definitely a factor in your wins and losses, no matter how little. Stop blaming the matchmaker for every loss you get.

And if they are using a true ELO system, that is all they are going by. I'd like to assume they are also using other stats like match scores when factoring skill rating, not just a win/loss. But unfortunately it seems this is not the case.

If there are not enough people at your ELO level to play with at a given time, that's not PGI's fault. What would you rather do? Wait a day for a match?

Edited by RichAC, 23 January 2014 - 12:44 PM.


#156 Roadkill

    Member

  • 3,610 posts

Posted 23 January 2014 - 12:48 PM

View PostRichAC, on 23 January 2014 - 12:41 PM, said:

1. ELO is not a proven system for an individual. It would only work for a team, and only if you are going to keep playing on the same team. Obviously that is not the case in a random pug.

Sorry, that's just wrong. I've implemented Elo before. It works just fine for individual ratings in a team game. It just takes longer to stabilize.

RichAC said:

How are your win/loss stats relevant? Seriously? I think you're the one confused. You are definitely a factor in your wins and losses, no matter how little. Stop blaming the matchmaker for every loss you get.

And if they are using a true ELO system, that is all they are going by. I'd like to assume they are also using other stats like match scores when factoring skill rating, not just a win/loss.

If there are not enough people at your ELO level to play with at a given time, that's not PGI's fault. What would you rather do? Wait a day for a match?

Again, wtf are you talking about? None of that has anything to do with what I'm saying.

Sorry, dude, you have me confused with someone else. I've said nothing that would prompt you to respond with these things. I have no idea what you're ranting about or why.

#157 Ghogiel

    Member

  • CS 2021 Gold Champ
  • 6,852 posts

Posted 23 January 2014 - 12:52 PM

View PostJoseph Mallan, on 23 January 2014 - 10:20 AM, said:

And that is the mistake. If I did 22 damage in a match and WE win, I get bumped up in rank for nothing! If we lose and I killed 8 enemy with 0 assists I still fall in the ranking. How is that a reflection of my performance?

In both of those matches you are the only constant.

#158 Abivard

    Member

  • Shredder
  • 1,935 posts
  • LocationFree Rasalhague Republic

Posted 23 January 2014 - 12:56 PM

We do not know how many players are playing, but we can put in numbers to work with. Let's use that 10 on 10 example. We will also assume that the total player base is 20 players, and we are playing one mode, on one map, and everyone is in the exact same mech.

Possible team combinations = 90, times your lowest value of matches 100 = 9000, but we have 3 modes, so 18000; we have 5 weight classes, so 90,000. But these 90,000 games only take 1 game from each possible combination: you win or lose. So say we want 10 games to round that out; now we are at 900k games. But wait, doesn't each weight class have variants? Yes it does, a minimum of 3; now we are at 2.7 MILLION games.


But Abi! There are way more than 20 players playing, at least 1000; PGI says over 1 million. So 1,350,000,000 to 1,350,000,000,000 games just to place you about where you belong. But there are other variables I haven't mentioned that will increase this number even higher, by orders of magnitude.

#159 Ghogiel

    Member

  • CS 2021 Gold Champ
  • 6,852 posts

Posted 23 January 2014 - 12:57 PM

View PostAbivard, on 23 January 2014 - 12:56 PM, said:

We do not know how many players are playing, but we can put in numbers to work with. Let's use that 10 on 10 example. We will also assume that the total player base is 20 players, and we are playing one mode, on one map, and everyone is in the exact same mech.

Possible team combinations = 90, times your lowest value of matches 100 = 9000, but we have 3 modes, so 18000; we have 5 weight classes, so 90,000. But these 90,000 games only take 1 game from each possible combination: you win or lose. So say we want 10 games to round that out; now we are at 900k games. But wait, doesn't each weight class have variants? Yes it does, a minimum of 3; now we are at 2.7 MILLION games.


But Abi! There are way more than 20 players playing, at least 1000; PGI says over 1 million. So 1,350,000,000 to 1,350,000,000,000 games just to place you about where you belong. But there are other variables I haven't mentioned that will increase this number even higher, by orders of magnitude.


lol

#160 Roadkill

    Member

  • 3,610 posts

Posted 23 January 2014 - 01:08 PM

That's what I thought. None of that makes sense.

Most obviously:
* Elo doesn't care about game mode.
* There are only 4 weight classes, not 5.
* Elo doesn't care about variants or chassis.

But the intrinsic error in your ... calculations ... is that not every possible game needs to be played before Elo ratings can stabilize.

The way PGI set up Elo, your rating for each weight class will stabilize rather quickly. As I explained above, you should only need 25-30 matches in each weight class to reach reasonably accurate Elo ratings. You don't have to play every other player to do that. 25-30 matches against random people is sufficient. That's just how Elo works.
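A toy illustration of why not every pairing has to be played. This is a deliberately optimistic sketch, not a model of MWO: it assumes 1v1 matches, an opponent always placed at the player's current rating, K = 50, and it replaces each random result with its average (so there is no noise and no dilution from 11 teammates). The ratings 1400 and 2000 and the 100-point tolerance are arbitrary.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo win probability on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def matches_to_converge(start: float, true_skill: float, k: float = 50.0,
                        tolerance: float = 100.0, max_matches: int = 10_000):
    """Count matches until the rating is within tolerance of true_skill.

    Returns None if that does not happen within max_matches.
    """
    rating = start
    for played in range(max_matches + 1):
        if abs(true_skill - rating) <= tolerance:
            return played
        opponent = rating  # matchmaker picks an opponent at the current rating
        expected = expected_score(rating, opponent)    # system's belief: 0.5
        actual = expected_score(true_skill, opponent)  # average real result
        rating += k * (actual - expected)
    return None


print(matches_to_converge(1400, 2000))
```

Under these assumptions the count comes out in the tens of matches, regardless of how many other players exist. Real results are noisy and one player is only a twelfth of a team, so the real figure would be higher; how much higher is exactly what the thread is arguing about.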