
Why Elo Doesn't Work Here


633 replies to this topic

#401 Abivard

    Member

  • Shredder
  • 1,935 posts
  • Location: Free Rasalhague Republic

Posted 25 January 2014 - 01:45 PM

If Elo as is so perfect, why are so many people trying to modify it to work better?

#402 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 25 January 2014 - 02:06 PM

Abivard, on 25 January 2014 - 01:44 PM, said:



OMFG! Are you for real? TrueSkill is not Elo under a different name. My sports car is an automobile; it runs on wheels. The fact that it uses wheels does not mean it is a rickshaw, even if the rickshaw came first.

If you try to race that rickshaw at the Indy 500, what do you think would happen?


Just read it Abivard. Then read what I just wrote to you in this last post, and my previous half dozen posts. I'm going in circles and it's not changing anything. Here's the simple summation.

Elo is a system for ranking someone's probability of winning. It's the basis of every other comparable ranking system, like Glicko and TrueSkill. It's also the basis for the matchmaker in MW:O. At no point is MW:O, or any of those systems, using just Elo - in fact chess itself doesn't use just Elo. Elo isn't a matchmaker, it's a ranking system; more to the point, it's an equation that serves as the foundation for a matchmaker.
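For readers following along, the Elo equation being argued about can be sketched in a few lines. This is the textbook logistic form; the 400-point scale and the K-factor of 32 are conventional defaults assumed here for illustration, not values PGI has published for MW:O.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Predicted chance that side A beats side B (standard logistic Elo curve)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def updated_rating(rating, expected, actual, k=32.0):
    """Move the rating toward the observed result.

    actual is 1.0 for a win and 0.0 for a loss; k (assumed 32 here)
    controls how far a single match can move the rating.
    """
    return rating + k * (actual - expected)


# Example: a 1500-rated side beats another 1500-rated side.
e = expected_score(1500, 1500)
new_rating = updated_rating(1500, e, 1.0)
```

Two equally rated sides each get an expected score of 0.5, so a win moves the winner up by half of k. Nothing in the equation looks at damage, kills or anything other than the result, which is the point being argued in this thread.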

The point of the argument has been to clarify why Elo is the basis for the matchmaker, and why win/loss can't be adjusted by damage/kills/whatever and still produce an accurate Elo rating and a valid result.


Roland, on 25 January 2014 - 01:45 PM, said:

That second part of applying additional modifications is pretty key though.

However, that is beside the point, since TrueSkill would likely also fail to perform good matchmaking in this game, due to the significant differences in capabilities stemming from the equipment rather than the player.

That is, you could take 24 pilots with identical skill, put half of them all on a team of Hunchbacks, and half on a team of Highlanders, and that game would not be even remotely close, even if Elo suggested they each had an equal chance of winning.

A market based value system could address this fairly easily, in a way that would effectively self balance.


The problem though is that not everyone performs the same, regardless of tonnage. That's where TrueSkill, or Glicko for that matter, shine with using a Bayesian breakdown of results based on unbiasable criteria.

For example, I just can't make a Shawk perform. I know some people can rock them - I suck on ice with them. However, I have a KDR over 2.0 and a win/loss of 1.67 after almost 200 matches in a Highlander 733 packing LRMs. Not poptarting but LRM boating.

So you take the person's base Elo and you modify it based on their performance by chassis, not just weight class. Then you give a smaller percentage weighting to weapon loadout. Again, for example, I do really well with AC20s. You put a 20 on something and I'll do better. I'm alright with PPCs but an AC20 and LLs? My odds of winning go up. This will let you tweak the person's Elo more accurately and more quickly to a specific chassis with fewer total matches, letting you quickly seat people.

When population levels permit you can expand that to give a modifier based on who they're dropping with. I do well supporting LRM boats - pushing people into the open, holding targets even if they're out of my weapon range to keep them viable as LRM targets. So my Elo should get a bump if my team is LRM heavy.

All of this is separate from weight matching but that's going to be irrelevant when there are weight limits for matches. Though, admittedly, we're almost certainly going to see that all divisions of a tonnage cap are not equal. That's where we go back to adjusting Elo value by player based on their performance in a given mech and loadout.
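A rough sketch of the per-chassis and per-loadout adjustment described above, under loudly stated assumptions: the function names, the point weights (400 for chassis, 100 for loadout) and the 20-match prior are all made up for illustration and are not anything PGI has described. The win rate is pulled toward 50% until enough matches exist, so a handful of lucky drops does not swing the rating.

```python
def performance_offset(wins, matches, weight, prior_matches=20, baseline=0.5):
    """Rating points earned by winning more (or less) than the baseline rate.

    The observed win rate is shrunk toward `baseline` as if the player had
    already played `prior_matches` average games (hypothetical smoothing).
    """
    shrunk_rate = (wins + baseline * prior_matches) / (matches + prior_matches)
    return weight * (shrunk_rate - baseline)


def adjusted_rating(base, chassis_record, loadout_record,
                    chassis_weight=400.0, loadout_weight=100.0):
    """Base Elo plus a chassis offset and a smaller loadout offset.

    Each record is a (wins, matches) tuple for the mech or weapon set
    the player is about to drop in.
    """
    return (base
            + performance_offset(*chassis_record, chassis_weight)
            + performance_offset(*loadout_record, loadout_weight))


# Roughly the Highlander example above: 125 wins in 200 matches (W/L about 1.67).
highlander = adjusted_rating(1500, (125, 200), (0, 0))
```

Only wins and losses feed the offsets, so the adjustment stays tied to match results rather than to damage or kill counts.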

For now though, just because I'm hoping repetition will get it back to PGI -

We need to split pug and premade Elo.

We need to match to a range and not high/low to target. Ideally leave 1 or 2 slots open on a team for outliers who can't find a match to drop in without skewing the matchmaker's view of the match balance. That can be removed later when player populations are so high that high and low ranked players have no issue finding full matches in their own range, but for now they need allowances made or they're inherently going to cause variance for everyone else.

We need to use a Gaussian distribution for rankings and not a logistic one. That means thickening up the middle and making both ends of the curve steeper.
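One reading of the Gaussian point is that it concerns the curve that turns a rating gap into a win probability. The sketch below compares the two under assumed parameters: a 400-point logistic scale, and a normal curve with a per-player standard deviation of 200 points (so the difference of two players has a standard deviation of 200 times the square root of 2). Neither figure comes from PGI.

```python
import math


def logistic_win_prob(diff, scale=400.0):
    """Win probability for a rating lead of `diff` under the logistic curve."""
    return 1.0 / (1.0 + 10 ** (-diff / scale))


def gaussian_win_prob(diff, sigma=200.0 * math.sqrt(2)):
    """Win probability for the same lead under a normal (Gaussian) curve."""
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2))))


# Print both curves side by side for a few rating gaps.
for gap in (0, 200, 400, 800):
    print(gap, round(logistic_win_prob(gap), 4), round(gaussian_win_prob(gap), 4))
```

With these parameters the two curves nearly agree for small gaps, while the Gaussian has thinner tails, so it treats a very large gap as closer to a sure thing than the logistic does.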

#403 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 25 January 2014 - 02:14 PM

Abivard, on 25 January 2014 - 01:45 PM, said:

If Elo as is so perfect, why are so many people trying to modify it to work better?


Okay. Now I get it. Let me make this clear since perhaps I haven't already, or if I did it was buried in a wall of text -

Nobody, me or PGI or anyone else, is saying a pure Elo equation is a good choice for the matchmaker in MW:O.

Elo is the equation that ranks people based on win/loss rate. That equation is critical - it's the foundation of the MW:O matchmaker, TrueSkill, Glicko, LoL's matchmaker, you name it.

Your Elo score is a prediction of your odds of winning a given match. In a team game that needs additional modifications - for example the Gaussian distribution I've mentioned. It essentially means that lower skilled players have a bigger negative impact on a team's performance than good players have a positive influence, and that being ranked REALLY high is way, way tough and being ranked REALLY low is way, way tougher, since your performance is always filtered through the performance of the team.

The key point I'm arguing with Rich and others is that you can't modify Elo with damage score, kills, assists, that sort of thing. You modify it by what chassis you're piloting, what weapons you're carrying and how likely you've been in the past to win or lose with those chassis and weapons.

Does that make sense? The argument I've been having with people thus far is twofold -

1. Win/loss (as represented by the Elo equation) is the best and most reliable starting point and the basis (but not the only factor) for ranking a player for the matchmaker.

2. You can't modify that Elo score by damage/kills/assists or the like as those numbers can be gamed and are not always accurate in reflecting how good you are at helping win a match. You need to use things like what weapon you do best with or chassis or the like (in terms of how often you win relative to how often you carry it in matches).

#404 Abivard

    Member

  • Shredder
  • 1,935 posts
  • Location: Free Rasalhague Republic

Posted 25 January 2014 - 02:14 PM

Ok, now you want to talk MM?

You do understand they are really separate issues?

Elo is used to rank players.
Match making is to match players.

The fact that MM uses Elo as a criterion means absolutely NOTHING as to Elo's fitness to be used as the criterion for MM!

Can you get this concept into your thinking or not?

#405 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 25 January 2014 - 02:17 PM

Abivard, on 25 January 2014 - 02:14 PM, said:

Ok, now you want to talk MM?

You do understand they are really separate issues?

Elo is used to rank players.
Match making is to match players.

The fact that MM uses Elo as a criterion means absolutely NOTHING as to Elo's fitness to be used as the criterion for MM!

Can you get this concept into your thinking or not?


Elo is the only viable and reliable basis for the matchmaker to use to rank players against each other. That's why it's used. That's why it's used as the basis for the TrueSkill matchmaker (which is not an Elo replacement but a matchmaker itself) and the Glicko matchmaker (which, again, is not a replacement for Elo but a means of using it in a matchmaker).

Which... brings us back to the peer reviewed document from MIT I referred you to earlier, discussing TrueSkill, how it's based on Elo, why, and the value of that.

Full circle, I'm going to go ahead and get off now.

#406 Abivard

    Member

  • Shredder
  • 1,935 posts
  • Location: Free Rasalhague Republic

Posted 25 January 2014 - 02:18 PM

Excuse me, everything is subject to gaming.

#407 RichAC

    Member

  • 661 posts

Posted 25 January 2014 - 02:43 PM

MischiefSC, on 25 January 2014 - 01:37 PM, said:


No. Read the link. It says Elo works great - that's why the Elo equation is used in TrueSkill. It then uses modifiers to apply the results of it across multiple games.

TrueSkill uses the Elo equation in the exact same manner that the MW:O matchmaker does: to establish the player's probability of winning against a set opponent based off their win/loss record as a member of other teams in other games and matches. It then modifies that ranking via a Bayesian equation to convey it to other games and other team sizes.




I agree with Abivard. In one post I commented how I thought you said skills with weapons are different depending on the player, so they shouldn't be factored in. But now you keep responding about how different weapons and loadouts should be used as a modifier?

My suggestion is to use what people themselves judge skill by as the modifier. Stats! Kills, damage, assists, etc... basically Match Score. No one is saying take wins out of the equation, I'm just saying add match score to it. Match score, which is how players are ranked on the scoreboard.

Edited by RichAC, 25 January 2014 - 02:47 PM.
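For comparison, the match-score suggestion in the post above could be sketched as a blended result fed into the usual Elo update. Everything here is hypothetical: the 25% score weight, the 100-point score cap and the function names are assumptions for illustration, and nothing indicates PGI does this.

```python
def blended_outcome(won, match_score, max_score=100.0, score_weight=0.25):
    """Mix the win/loss result with a normalised match score.

    Returns a value between 0.0 and 1.0 that replaces the plain
    1.0 (win) or 0.0 (loss) in the Elo update.
    """
    win_part = 1.0 if won else 0.0
    score_part = min(max(match_score / max_score, 0.0), 1.0)
    return (1.0 - score_weight) * win_part + score_weight * score_part


def updated_rating(rating, expected, outcome, k=32.0):
    """Standard Elo update, using the blended outcome instead of win/loss."""
    return rating + k * (outcome - expected)


# A loss with a strong personal score still counts for something here.
loss_with_good_score = blended_outcome(False, 80.0)
```

Under this sketch a player can gain rating from a lost match by scoring well, which is exactly the property the two sides of this thread disagree about.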


#408 RichAC

    Member

  • 661 posts

Posted 25 January 2014 - 02:51 PM

MischiefSC, on 25 January 2014 - 02:14 PM, said:


2. You can't modify that Elo score by damage/kills/assists or the like as those numbers can be gamed and are not always accurate in reflecting how good you are at helping win a match. You need to use things like what weapon you do best with or chassis or the like (in terms of how often you win relative to how often you carry it in matches).


This is the core of your argument now, friend. The only thing you have left to debate.

So let's address this.


1. It's a free-to-play game. People can "game" the system just by making a new account. If people never pay money into the game, these young kids with the good muscle reflexes and aim are still gonna own all the old guys with their trial accounts, or just buy that one main mech which is the only one they need. Over and over and over again.

Banning on GUID and IP address is also fruitless. It's impossible to get rid of these kids, except to have a community that disassociates itself from them and is vocal against them.

2. People can still game the system by not trying to win, which is what they already do since there is no great reward for winning in the first place. In fact there is only the chance they will face tougher competition and get less cbills lol.

3. Why, if most people only care about cbills, would they be gaming themselves out of making cbills?

4. It's hard to game a system when you don't even know what your rating is or what bracket you are in, because you can't see or determine it. It's hidden for a reason, and will still be hidden for a reason.

With your system it's easier to determine. They just look at W/L ratio and can guess in the ballpark. Factor in match score, and it's impossible for them to determine.

5. It's just nuts to think someone who averages a higher score isn't more likely to win a match. And if they are sandbagging their stats (which should be public), well, then they are getting fewer cbills and losing more games. Right now there is more incentive for them to "game" the system by not trying to win, because the only thing they will be hurting is their win/loss ratio, which is not as important to most as K/D and cbills.

Edited by RichAC, 25 January 2014 - 03:15 PM.


#409 Artgathan

    Member

  • Knight Errant
  • 1,764 posts

Posted 25 January 2014 - 03:03 PM

So the last 19 pages of this thread boil down to: abstract concepts are hard for some people to grasp.

#410 Iskareot

    Member

  • The Universe
  • 433 posts
  • Location: NW,IN

Posted 25 January 2014 - 03:40 PM

I dare say my ELO rating can change based on whether I am rolling with a premade group or solo. My success as a solo player has to be less, based on this. If there are two good premades, or even one, with good players on comms, with strats and an understanding of convergence, would that not be an advantage? Not to mention being able to direct people more easily.

In theory this should help anyone vs just rolling solo.

Edited by Iskareot, 25 January 2014 - 03:41 PM.


#411 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 25 January 2014 - 03:44 PM

Iskareot, on 25 January 2014 - 03:40 PM, said:

I dare say my ELO rating can change based on whether I am rolling with a premade group or solo. My success as a solo player has to be less, based on this. If there are two good premades, or even one, with good players on comms, with strats and an understanding of convergence, would that not be an advantage? Not to mention being able to direct people more easily.

In theory this should help anyone vs just rolling solo.


It absolutely does. That's why when the game detects you in a premade it gives your Elo a 'bump', it reads you as being higher than you are when calculating the value of your premade.

I'd say more importantly however it needs to split your Elo from playing in a premade to playing solo and keep them separate. Your performance can vary dramatically between the two and the play experience is very different.

#412 A banana in the tailpipe

    Member

  • The 1 Percent
  • 2,705 posts
  • Location: behind your mech

Posted 25 January 2014 - 03:47 PM

Artgathan, on 25 January 2014 - 03:03 PM, said:

So the last 19 pages of this thread boil down to: abstract concepts are hard for some people to grasp.


In a competitive game nothing should be abstract, it should be quantitative.

P.S. I really like you as a poster Rich, but the moment you said PC gaming is dead, so were your posts.

Edited by lockwoodx, 25 January 2014 - 03:50 PM.


#413 Iskareot

    Member

  • The Universe
  • 433 posts
  • Location: NW,IN

Posted 25 January 2014 - 03:50 PM

Agreed... it makes no sense if it HOLDS that ELO rating from being solo to premade. But see, here is an underlying issue with this.

If you mix premade and solo players, that alone is a mishmash of imbalance. HOW can our current drops be fair or even ROUGHLY fair at the start?

Who knows how this affects ELO ratings.

#414 Sug

    Member

  • The People's Hero
  • The People
  • 4,629 posts
  • Location: Chicago

Posted 25 January 2014 - 03:58 PM

MischiefSC, on 25 January 2014 - 03:44 PM, said:

That's why when the game detects you in a premade it gives your Elo a 'bump', it reads you as being higher than you are when calculating the value of your premade.


Source? I've never heard that. I just thought it tries to match groups with groups based on average Elo.

#415 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 25 January 2014 - 04:00 PM

Iskareot, on 25 January 2014 - 03:50 PM, said:

Agreed... it makes no sense if it HOLDS that ELO rating from being solo to premade. But see, here is an underlying issue with this.

If you mix premade and solo players, that alone is a mishmash of imbalance. HOW can our current drops be fair or even ROUGHLY fair at the start?

Who knows how this affects ELO ratings.


You're as likely to drop with a premade as against one. Law of averages.


As to mixing your Elo for your premade and pug, well, for most of us it's not THAT different. My understanding is that they combine and then average the Elo of everyone in your premade, then give it a 'bump' to compensate for the advantage being in a premade gives. In general that's going to give a reasonably reliable result. Remember, the matchmaker is a long way from pinpoint. All it's trying to do, all the matchmaker will currently support in fact, is generally match 12 people of approximate skill against 12 people of approximate skill and try its best to match weight. We don't seem to have the player population to accomplish more than that, so for the time being we've got a comfortable margin for error - it won't make a big difference to the match makeup.
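The average-then-bump idea described above is simple enough to sketch. Note the post presents it as the poster's understanding rather than confirmed PGI behaviour, and the 50-point bump and the function name below are invented for illustration.

```python
from statistics import mean


def premade_rating(member_ratings, bump=50.0):
    """Value the matchmaker would assign to a group.

    Averages the members' Elo, then adds a flat bump (assumed 50 points)
    when more than one player queued together. A solo player gets no bump.
    """
    average = mean(member_ratings)
    if len(member_ratings) > 1:
        return average + bump
    return average


# A four-man group averaging 1550 is treated as if it were rated 1600.
four_man = premade_rating([1400, 1500, 1600, 1700])
```

The bump only changes how the group is valued when teams are assembled; it would not by itself alter any member's stored rating.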

There's a small sliver of hardcore players though for whom it will be significant. A big part of the problem is that if you play competitively with a competitive 4man for a lot of drops it's going to push your Elo well beyond what you can effectively carry when pugging. This is exacerbated when the matchmaker then tries to compensate for that high Elo by bringing in lower Elo players to counterbalance the high ones on the same team. This is where you get a team that's nowhere close to being able to actually meet the expectations of the matchmaker.
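The effect described above can be seen with two made-up rosters that have the same average but very different spreads. The ratings are invented; `pstdev` is the population standard deviation from Python's standard library.

```python
from statistics import mean, pstdev


def team_summary(ratings):
    """Return (average rating, spread of ratings) for one team."""
    return mean(ratings), pstdev(ratings)


# Four highly rated players "balanced" by eight low rated ones...
stacked = [1900] * 4 + [1300] * 8
# ...versus twelve players who are all close to the same rating.
even = [1500] * 12

print(team_summary(stacked))
print(team_summary(even))
```

Both teams average 1500, so a matchmaker that balances on the average alone sees them as equal, even though the first roster is nothing like the second in practice.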

That's why I'm a proponent of splitting premade and pug Elo for the time being and matching players to a range, even if the variance between players on a team and between totals between teams is higher. It may have a higher variance in points but the point values currently are not precise enough to let such a method shine through. We're better off playing in ranges.
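A minimal sketch of range matching with a couple of outlier slots, as proposed in this thread. The queue format, the range bounds and the rule that outliers are left out of the balance value are assumptions made for illustration only.

```python
from statistics import mean


def build_team(queue, low, high, team_size=12, outlier_slots=2):
    """Fill a team from players inside [low, high], plus a few outliers.

    queue is a list of (name, rating) tuples in arrival order. Outliers
    take at most `outlier_slots` seats and are ignored when computing the
    value the matchmaker uses to balance the two teams.
    """
    in_range = [p for p in queue if low <= p[1] <= high]
    outliers = [p for p in queue if not (low <= p[1] <= high)]
    guests = outliers[:outlier_slots]
    core = in_range[: team_size - len(guests)]
    balance_value = mean(r for _, r in core) if core else None
    return core + guests, balance_value


# Twelve mid-range players in the queue plus one very highly rated player.
queue = [("p%d" % i, 1400 + 10 * i) for i in range(12)] + [("ace", 2200)]
team, value = build_team(queue, 1350, 1550)
```

In this example the highly rated player gets a seat, but the team is still valued at the average of its eleven in-range members, so one outlier does not drag lower rated players onto the team to compensate.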

I don't expect much to change before UI 2.0 and weight limits however since that will hugely impact everything about the matchmaker.

#416 Iskareot

    Member

  • The Universe
  • 433 posts
  • Location: NW,IN

Posted 25 January 2014 - 04:05 PM

I agree split them up - split the players up premade from pug. To me it is fair and logical. I know I am not the only one that thinks that.

However I feel the premade community might get seriously mad.

#417 RichAC

    Member

  • 661 posts

Posted 25 January 2014 - 04:08 PM

lockwoodx, on 25 January 2014 - 03:47 PM, said:


In a competitive game nothing should be abstract, it should be quantitative.

P.S. I really like you as a poster Rich, but the moment you said PC gaming is dead, so were your posts.



lol, well there are still some popular strategy and rpg games, but even they are dying.

Steam sales mean nothing to me. One guy in this thread said he bought 129 games on Steam? How many of them is he actually playing? Player bases are all that matter to me when I say dead. Maybe I judge with different criteria. But I don't see many games built around new hardware, including this one. PC sales have also been down in past years. Might as well say birds and farmville also prove PC gaming is alive.

Edited by RichAC, 25 January 2014 - 04:13 PM.


#418 RichAC

    Member

  • 661 posts

Posted 25 January 2014 - 04:14 PM

Iskareot, on 25 January 2014 - 04:05 PM, said:

I agree split them up - split the players up premade from pug. To me it is fair and logical. I know I am not the only one that thinks that.

However I feel the premade community might get seriously mad.


Public CW lobbies will address this. Minus the syncdroppers.

I don't really think the ELO should change for a player in the random pugs because of it. It's just the end result that matters. If a player is always in a premade that gives him an advantage, then his ELO will go up naturally because of this.

Yes, premades usually have an advantage. But sometimes playing with certain friends can also disadvantage you haha. Let's face it. But PGI encourages these things. It's even a tooltip, part of the selling points of the game. It's usually more fun than anything.

And if Mischief is so worried about people gaming the system, well, he just took away an incentive to team up with friends.

The only thing that matters, and the thing to judge, is end results: scores and wins.

Which is probably why PGI never bothered with weight limits in the first place. The reason to add weight limits now is not necessarily for more balanced matches, but to encourage use of some of the other mechs that aren't getting used enough. Period. People will soon realize this.

Add using a mech you're not good in as another way to "game" the system, Mischief lol. NO way will PGI ever have ELOs for every single item in the game for every player. That's psychotic.


Giving a different ELO to someone in a premade would be flawed for the same reason ELOs shouldn't be used in random team games in the first place.

Edited by RichAC, 25 January 2014 - 04:22 PM.


#419 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 25 January 2014 - 04:27 PM

Sug, on 25 January 2014 - 03:58 PM, said:


Source? I've never heard that. I just thought it tries to match groups with groups based on average Elo.


That's an excellent question. It was from a dev response to a debate on the topic prior to Elo's release and I'm trying to find the link. PGI has been unwilling to say what exact criteria they use to modify Elo rankings, other than 'without getting into the details of how that works'.

Paul talks about doing so here in Dec of 2012, as well as this:

Quote

Once we get a full understanding of how accurately the Match Maker is working, we are going to add some additional parameters to the mix. These include a more defined player skill rating and a Mech weight class balancing system. More info on these when the first pass of Elo testing is done.


I find it intriguing that they certainly do track lonewolf vs premade queueing separately, yet we don't have split Elo? Or do we?

I've been reading through all the content from Matthew Craig, who is clearly a freaking BOSS. His responses are polite, concise and concerned. He seems to work considerably with the MM.

While no details are ever given away about weighting and criteria for Elo in the MM it sounds like two things may be true -

All the stuff I've talked about may even already be in but weight matching and populations still put us where we are.

Weight matching and UI 2.0 magic are going to change the MM so much that serious coding around it prior to their implementation is unlikely.

For now I will say I may be wrong in assuming premades get an Elo 'bump'. The only actual data we have on what they were going to implement is from 2012 and is almost certainly no longer accurate. It may not be relevant for all I know. I can say after reading 6 pages of Matt's responses on the subject that he clearly gets how Elo and matchmaking work, reviews all its telemetry and really, really wants matches to come out as well as possible.

Admittedly I do worse when dropping with friends most of the time, not better. The concern about premade vs pug may be theoretically an issue but practically irrelevant, at least compared to weight matching - which, at least from what I've seen reading everything Matt and Paul have written on it, is the real bugbear.

Edited by MischiefSC, 25 January 2014 - 04:28 PM.


#420 RichAC

    Member

  • 661 posts

Posted 25 January 2014 - 04:39 PM

Yes we all know a mech weight class balancing system is coming. I look forward to community warfare. I just hope people actually play it.

As for premades getting separate ELOs, I don't see Paul saying that anywhere. Thank goodness.

The way Mischief worded it I thought both were being factored in and I was ready to just give up.

It is funny though to read the descriptions of ELO stating player vs player, when it's a team game... But Paul does clarify how they average it for the team. I have read this thread before, when bringing up this same argument when I first started playing, and was happy enough that at least they acknowledge it.

But now there are even fewer people than when I started, and now I realize no one cares about wins and losses in this game. So rating players based on wins and losses isn't going to make most players happy. They will still feel mismatched if their match scores (to clarify for roadkill: their kills, dmg and assists) are always poor. Normally ratings on wins only would be the way to go, since wins should be the only end result that matters to people. It is what matters most to me, but I'm in the minority because:

1. this is a random team game

2. the game is based on cbills.

Edited by RichAC, 25 January 2014 - 04:45 PM.





