
Please Implement Elo Or Trueskill Matchmaking


184 replies to this topic

#101 Ghogiel

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • CS 2021 Gold Champ
  • 6,852 posts

Posted 02 December 2017 - 07:29 AM

View PostExard3k, on 29 November 2017 - 09:54 AM, said:

how does your proposal influence wait times?

It doesn't, if you're OK with putting max-Elo A+ players together with complete glue-eating D- potatoes against a load of C and B players.

Otherwise we go back to the days when searching took anywhere from 10 minutes to 3 hours for solo quickplay matches.

#102 vandalhooch

    Member

  • PipPipPipPipPipPipPip
  • Ace Of Spades
  • 891 posts

Posted 02 December 2017 - 10:05 AM

View PostXavori, on 29 November 2017 - 09:48 AM, said:

I think I can safely say, with little disagreement, that our current matchmaker sucks. There are far too many 12-0 matches, and not enough 12-11 matches. It doesn't have to be this way.


Define "far too many."

Provide evidence that 12-0 matches are the result of the matchmaker.

Quote

Even with a smaller player population, it's entirely possible to create matches that are more likely to produce balanced gameplay. I've been in a number of long-running small dart leagues that had teams with players who had skill from 6-7 dart outs to lucky to hit the board with all 3 darts. But because teams had balanced ratings, the matches still were competitive.

MWO could have that. There are many options available for creating player ratings, and then using those ratings to assemble roughly equivalent teams. This would dramatically improve the quality of matches, and should make the game more enjoyable as well.


#103 Trissila

    Member

  • PipPipPipPipPipPip
  • Survivor
  • 439 posts

Posted 02 December 2017 - 10:19 AM

While I agree with the base premise that the matchmaking needs some work, your desired end state is simply not feasible.

MWO is a game that snowballs. With only one life to live and the way focus-fire drastically increases lethality (one player cannot reliably headshot their way to victory), every 'mech your team loses that the enemy does not puts you at a substantial disadvantage. When you lose a 'mech that doesn't simultaneously kill an enemy, it's now 11v12 and you're at a disadvantage. When you lose another, it's 10v12 and you're decidedly disadvantaged. Lose another, now it's 9v12 and you're on the back foot.

Thing is, almost-dead 'mechs can still contribute to lethal focus-fire, so even if you lose a 'mech while severely damaging one of theirs, it doesn't "count" until they're actually destroyed. This is where the snowballing comes from.
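To put rough numbers on the snowball described above, here is a toy focus-fire model in Python. Everything in it is an assumption made for illustration: the 100 hit points and 10 damage per tick are hypothetical, every living mech always has a shot at the enemy's focus target, and both sides' damage is resolved simultaneously. It is not MWO's damage model, only a sketch in the spirit of classic aimed-fire attrition models.

```python
def apply_focus_fire(team, damage):
    """Pour damage into the first mech until it dies, then spill onto the next."""
    team = list(team)
    while team and damage > 0:
        hit = min(team[0], damage)
        team[0] -= hit
        damage -= hit
        if team[0] == 0:
            team.pop(0)  # destroyed mechs stop contributing damage
    return team


def simulate(n_a, n_b, hp=100, dmg=10):
    """Return (survivors_a, survivors_b) for a fight between two teams.

    hp and dmg are hypothetical values. A damaged mech still fires at full
    strength, so a loss only "counts" once the mech is actually destroyed.
    """
    a = [hp] * n_a
    b = [hp] * n_b
    while a and b:
        # Both volleys are computed from who was alive at the start of the tick.
        dmg_to_b = len(a) * dmg
        dmg_to_a = len(b) * dmg
        a = apply_focus_fire(a, dmg_to_a)
        b = apply_focus_fire(b, dmg_to_b)
    return len(a), len(b)


if __name__ == "__main__":
    print("12 v 12 survivors:", simulate(12, 12))
    print("11 v 12 survivors:", simulate(11, 12))
```

In this sketch an even 12v12 trades down to nothing, while starting a single mech down leaves the larger side with several survivors, because every early kill removes outgoing damage for the rest of the fight.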

Counter Strike is a pretty close analogue due to its shared one-life-to-live system, and that game tends to be pretty snowball-y too, but it has a key difference: Even 1v6, a good player can still land headshots and kill enemies before they get a chance to attack him, and pull out a win. In MWO, 1v6 is a lost cause regardless of player skill, because there is no way to reliably, instantly kill an enemy and prevent them from damaging you. It becomes a battle of attrition that you cannot win.

A good match in MWO might be 12-6, maybe 12-7. 12-11 is an unrealistic goal as it reflects two teams that both had trouble focusing their fire.

#104 vandalhooch

    Member

  • PipPipPipPipPipPipPip
  • Ace Of Spades
  • 891 posts

Posted 02 December 2017 - 10:28 AM

View PostVlad Striker, on 29 November 2017 - 12:35 PM, said:

ELO rating must be calculated on every owned chassis separately. "Average patient temperature across the hospital" does not make any sense, but PGI made exactly this mistake in the recent past when they tried to apply an ELO system.


You can't create a true ELO system for players that are randomly assigned to teams.

ELO only works for single players vs. single players or static teams vs. static teams.

#105 vandalhooch

    Member

  • PipPipPipPipPipPipPip
  • Ace Of Spades
  • 891 posts

Posted 02 December 2017 - 11:15 AM

View PostKhobai, on 29 November 2017 - 05:16 PM, said:


PGI tried to give us separate buckets for faction play to prevent stomps

all the units cried because their wait times were longer because they had to wait for other groups for an even match instead of just stomping pugs


You do know that many of us in the forums were actually here for that phase of the game and know that you are outright lying?

The group queue was flooded by one man units because the solo queue was a ghost town.

Quote

so it lasted about one day before PGI rolled it back and let them stomp pugs again

those people have always held the game back. they contribute nothing positive to the game. stomping pugs and manipulating a broken system to pad their stats makes them feel like they're elite players.

then you have comp players that are pushing the game in stupid *** directions like comp play and solaris because russ is deluded into thinking MWO is going to be the next huge esport. but the game just isn't very entertaining to spectate either because of the whole snowballing aspect of the game. The first team that falls behind virtually always loses so there's no reason to watch beyond that.

PGI needs to focus on going back to the basics and improving the things MWO does well. Instead of trying to expand into areas it doesn't do well like 1v1. 1v1 matches in MWO are pathetic. they last like 30 seconds. nobody wants to play that.


#106 vandalhooch

    Member

  • PipPipPipPipPipPipPip
  • Ace Of Spades
  • 891 posts

Posted 02 December 2017 - 11:26 AM

View PostXavori, on 29 November 2017 - 11:34 PM, said:


You don't need tiers at all. Both ELO and TrueSkill generate individual player ratings based on winning or losing versus the skill of the players you fought against.


No they don't. Neither of those systems actually measures skill. They both rely on results of matches as a proxy value for skill. Both can be gamed and both require large population pools to be in the least bit effective.

Quote

When you win, your rating goes up, and how much it goes up is based on the rating of your opponent compared to you. When you lose, your rating goes down, and again, the amount is based on the rating of your opponent compared to you. Over time, this rating will reach a point where two players with the same rating fighting each other would be expected to win 50% of the time.


Except that in MWO two players never fight each other. Elo systems can't work for randomly assembled teams. The average Elo rating of an opposing team is a meaningless metric for determining how much to adjust one player's Elo rating after a match.

The average of several averages is NOT a universal average.

Quote

The rating is also not "You are this skilled of a pilot." The rating is "Compared to other pilots, you are expected to win x% of the time based on your rating and the other pilot's rating.


Move the apostrophe in "pilot's" to its proper location, "pilots'", and you'll see why Elo is a useless tool for this type of game.

Quote

The matchmaker can then take that rating and create two teams where the total player skill on both teams is roughly equivalent.


No it can't and your previous sentence in this post tells you why. Your words - The rating is also not "You are this skilled of a pilot."

You just said that Elo ratings are not a measurement of skill and then turned around and claimed that the matchmaker will use the ratings to create teams of matching skill.

Quote

The expectation is that this produces better matches because WE ALREADY KNOW IT DOES IN LOTS OF OTHER FORMS OF COMPETITIVE GAMING.


The threads in every forum of every other PvP game that has a matchmaker would seem to repudiate this hyperbolic claim.

Quote

The rating also lets you create actual leaderboards because the rating is entirely comparison based built on quality, rather than random, matches.


#107 Xavori

    Member

  • PipPipPipPipPipPipPip
  • The God of Death
  • 792 posts

Posted 02 December 2017 - 01:00 PM

View Postvandalhooch, on 02 December 2017 - 10:28 AM, said:


You can't create a true ELO system for players that are randomly assigned to teams.

ELO only works for single players vs. single players or static teams vs. static teams.


That is absolutely false unless you're being pedantic and going back to Arpad Elo's original system. Or maybe you meant Electric Light Orchestra which would be an awesome addition to MWO.

Team based ELO's exist all over the place. There are even a multitude of other good ranking systems for team based games. MWO could have run with any of them, but instead we got a sorta-kinda XP bar with 5 huge tiers and a matchmaker that abandons tiers in favor of speed making the already pretty meaningless tiers that much worse.

The reason I'd want ELO-ish rankings (TrueSkill, WHR, whatever) is to balance the teams and create actual leaderboards. I want matches that are more competitive than the 80-90% facerolls we get now. (as a side note, I'd love to have a respawn quickplay...or maybe quickplay with dropdecks or such, but that's not the point of this thread). The current matchmaker is practically making random teams. An ELO-based matchmaker would at least be giving us teams that have a similar combined skill which should move the number of quality matches much higher.

#108 Xavori

    Member

  • PipPipPipPipPipPipPip
  • The God of Death
  • 792 posts

Posted 02 December 2017 - 01:05 PM

View Postvandalhooch, on 02 December 2017 - 11:26 AM, said:

Except that in MWO two players never fight each other. Elo systems can't work for randomly assembled teams. The average Elo rating of an opposing team is a meaningless metric for determining how much to adjust one player's Elo rating after a match.


ELO-ish systems do work for teams. I don't care if a handful of people who wish they had higher rank b***h about rankings on other games' forums. I care about MWO getting a better matchmaker which means it needs a better player ranking system.

Here's the thing, if you produce matches where you assume the two teams had similar skill, then move everyone on the winning team up a bit and the losing team down a bit, then assemble new teams that again have similar skill, you eventually will get players playing matches where they consistently have ~50% chance to win or lose. This really for reals does happen.

#109 BlueStrat

    Member

  • PipPipPipPipPipPip
  • 239 posts

Posted 02 December 2017 - 01:18 PM

I just wish that whoever owns the IP rights to MW would take it away from PGI and give it to a competent game company. PGI is damaging the MW franchise and reducing its value.

#110 Brizna

    Member

  • PipPipPipPipPipPipPipPip
  • Liquid Metal
  • 1,367 posts
  • LocationCatalonia

Posted 02 December 2017 - 01:31 PM

Where does all this whining about 12-0 stomps being too frequent come from? I see them very rarely, just saying.

#111 MischiefSC

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPip
  • The Benefactor
  • 16,697 posts

Posted 02 December 2017 - 01:45 PM

@vandalhooch -

I've been over, repeatedly, on multiple threads, exactly why win/loss is an accurate measuring tool for a single player's performance in MWO for QP. You're 8.333% of your team, not 100%, so it's 60 matches to get close, 80 matches to be close enough for our matchmaking needs.

Please search the forums for the reams of examples, math, the formulas involved, even links to websites that go over the math in detail.

For simple evidence go to the leaderboard. Look at people's stats. See how month after month they are very similar? If it was random your win/loss would swing randomly from 10.0 to 0.01 every month. It doesn't, because the law of large numbers (LLN) washes out the impact of other people and just leaves your personal impact on your team's win/loss on average with an adequate sample size.
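As a rough illustration of the sample-size point, the Python sketch below computes the standard error of an observed win rate after a given number of matches. It assumes every match is an independent coin flip with a fixed win probability, which a real matchmaker would not strictly satisfy, and the 0.55 win probability is a hypothetical value chosen only for the example. It shows how quickly the noise shrinks; it does not by itself confirm the 60 and 80 match figures.

```python
import math


def win_rate_std_error(p, n):
    """Standard error of the observed win rate after n independent matches.

    p is the player's (hypothetical) true chance of winning any one match.
    """
    return math.sqrt(p * (1.0 - p) / n)


for n in (20, 60, 80, 300):
    se = win_rate_std_error(0.55, n)
    print(f"{n:>4} matches: observed win rate is about 0.55 +/- {se:.3f}")
```

Under these assumptions the uncertainty after 60 matches is a little over six percentage points, and quadrupling the number of matches only halves it.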

#112 Weeny Machine

    Member

  • PipPipPipPipPipPipPipPipPip
  • Ace Of Spades
  • 4,014 posts
  • LocationAiming for the flat top (B. Murray)

Posted 02 December 2017 - 01:46 PM

I would be happy if the system considered which mech class you're sitting in when you're supposed "to carry". A pro in a light has a harder time holding the line for the potatoes.

#113 ocular tb

    Member

  • PipPipPipPipPipPipPip
  • The Seeker
  • 544 posts
  • LocationCaught Somewhere in Time

Posted 02 December 2017 - 01:53 PM

I don't think these stomps are any more common now than they were when I first started playing. Yep, even back then people complained about matchmaker. I just don't see those complaints ever going away. For me, I say do whatever. Go back to Elo, keep PSR, do something completely different- I don't really care all that much. I think that regardless of the "player skill" system used very little will change with the smallish population we have. Elite players will still be forced to play with much lesser-skilled players at times otherwise the wait times will be too long.

#114 Wintersdark

    Member

  • PipPipPipPipPipPipPipPipPipPipPip
  • 13,375 posts
  • LocationCalgary, AB

Posted 02 December 2017 - 01:57 PM

View PostAsym, on 02 December 2017 - 06:56 AM, said:

Again, 8x8 only concentrates the errors we already have? Unless, you can fix MM, as many smart pilots have suggested above, reducing the scope of a match only speeds up the carnage and increases the frustration of new players...

No, it doesn't. As someone who played a LOT in 8v8, and particularly around the transition time to 12v12:

Early complaints by the "Stay at 8v8" crowd were, in particular, that 12v12 leads to more stomps and is harder on new players. I argued against this, and was wrong. See, what happens - what they said would happen - is that small mistakes are actually harder on you. Peek around a corner into the OpFor? Now there's 50% more mechs firing at you while you try to backpedal. In 8v8, you just take less punishment when you make a mistake. New players make lots of mistakes, but they tend to survive them more.

Yes, each player is a larger part of the whole team - each player "matters" more - in 8v8, but as good and poor players tend to be fairly evenly distributed, overall team effectiveness is comparable either way. Just that in 8v8, mistakes are less immediately lethal.

This is what they warned would happen, and it is exactly what happened. You can even look back in the forum threads from that time and see it spelled out verbatim. Along with my own posts of "but in 12v12 each player matters less, so a poor player dying tanks his team less" - except more people die due to single mistakes in 12v12 than 8v8, and it's death that starts the snowball effect leading to stomps.

See, this is due to thresholds, how much damage a mech can do in a brief period of time vs how much damage a mech can take. The reality is that you're basically never exposed to an entire team at once, but in 12v12 you're going to take fire from more mechs simultaneously. When this fire in a single oops costs you just armor, you still fight with full effectiveness (8v8). When it costs you armor, structure and weapons, suddenly you're less effective... or just dead (12v12). 12v12 essentially means any time you're taking damage, you're taking 50% more.

View Postocular tb, on 02 December 2017 - 01:53 PM, said:

I don't think these stomps are any more common now than they were when I first started playing. Yep, even back then people complained about matchmaker. I just don't see those complaints ever going away. For me, I say do whatever. Go back to Elo, keep PSR, do something completely different- I don't really care all that much. I think that regardless of the "player skill" system used very little will change with the smallish population we have. Elite players will still be forced to play with much lesser-skilled players at times otherwise the wait times will be too long.


Stomps are an inevitability, and the MM has very, very little to do with it. Hell, look at the world tournament match results: they're all at least decent players, but there's still a good number of stomps and those are in matches where there's consistent mech builds, player skills, etc. It's just how the game works.

#115 Maker L106

    Member

  • PipPipPipPipPipPip
  • The Raider
  • 250 posts

Posted 02 December 2017 - 02:14 PM

View PostExard3k, on 29 November 2017 - 09:54 AM, said:

how does your proposal influence wait times?


Mine will... drop ranks entirely, base it on connections. This game has a comp mode now, no need to have QP work off of some scaling system.

Also, currently, yes, it DOES function as some sort of skill-based MM; I never saw Tier 1s as a Tier 4 or 5. Or if I did, it was the rare shining diamond among the **** pile.

#116 LordNothing

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPip
  • Ace Of Spades
  • 17,657 posts

Posted 02 December 2017 - 02:54 PM

View PostMystere, on 29 November 2017 - 06:36 PM, said:


You are remembering things wrong. The alternative is unspeakable.



what really happened, was scouting. everyone was playing it to death. if you wanted to play invasion, screw it. everyone was scouting. it was the first new mode we have seen in years. also freelancer was really poorly implemented. all the solos who took loyalty had no problem finding games.

Edited by LordNothing, 02 December 2017 - 02:54 PM.


#117 vandalhooch

    Member

  • PipPipPipPipPipPipPip
  • Ace Of Spades
  • 891 posts

Posted 03 December 2017 - 09:49 AM

View PostXavori, on 02 December 2017 - 01:00 PM, said:


That is absolutely false unless you're being pedantic and going back to Arpad Elo's original system. Or maybe you meant Electric Light Orchestra which would be an awesome addition to MWO.

Team based ELO's exist all over the place.


Static teams or randomly assembled teams?

Quote

There are even a multitude of other good ranking systems for team based games. MWO could have run with any of them, but instead we got a sorta-kinda XP bar with 5 huge tiers and a matchmaker that abandons tiers in favor of speed making the already pretty meaningless tiers that much worse.


Every pilot has a single number PSR behind that Tier. The matchmaker uses the PSR's of the various pilots to assemble matches with a few restrictions in place regarding Tiers. If a match can't be assembled with the currently available pilots, then the restrictions are lifted.

Your proposed system would assign each player a single Elo or TrueSkill number. Then the matchmaker would use those numbers to assemble matches. What happens when your system can't assemble a match with the currently available pilots?

Oh, and you seem to have failed to address the mech class restriction as well. How is your system going to account for that?

Quote

The reason I'd want ELO-ish rankings (TrueSkill, WHR, whatever)


Suddenly it's "-ish?"

Quote

is to balance the teams and create actual leaderboards. I want matches that are more competitive than the 80-90% facerolls we get now.


You do know that I'm not going to accept your completely made up "80-90%" value, right?

Quote

(as a side note, I'd love to have a respawn quickplay


It's called Faction Play.

Quote

...or maybe quickplay with dropdecks or such, but that's not the point of this thread). The current matchmaker is practically making random teams. An ELO-based matchmaker would at least be giving us teams that have a similar combined skill which should move the number of quality matches much higher.


You just got done claiming that Elo scores don't actually measure skill.

For example: Assume we have two pilots who hypothetically have identical skills. Player One plays dozens of matches each day and does well against the competition (mostly newer pilots) and ends up with his/her Elo rating going up significantly. Even a small increase adds up when repeated. It's called grinding.

Player Two plays only a few matches a day and ends up playing at the same time as most of the best players in the game and ends up losing a great deal. His/her Elo score will go down. It may not drop a lot but it will still go down.

Now, both players queue up for a match at the same time, will the matchmaker accurately assess that these are players with identical skill levels? Because they earned their values from facing different pools of opponents, their ratings are not comparable. To make matters worse, you want to pool the various Elo ratings of pilots who earned them on their own into some sort of collective value. Statistical nonsense.

You cannot assign an Elo value to an individual player in a team competition. You can assign a value to a team, but that value only has meaning for that particular grouping of players.

#118 vandalhooch

    Member

  • PipPipPipPipPipPipPip
  • Ace Of Spades
  • 891 posts

Posted 03 December 2017 - 09:52 AM

View PostBrizna, on 02 December 2017 - 01:31 PM, said:

Where does all this whining about 12-0 stomps being too frequent come from? I see them very rarely, just saying.

Confirmation bias of players who exemplify Dunning-Kruger.

#119 vandalhooch

    Member

  • PipPipPipPipPipPipPip
  • Ace Of Spades
  • 891 posts

Posted 03 December 2017 - 09:58 AM

View PostMischiefSC, on 02 December 2017 - 01:45 PM, said:

@vandalhooch -

I've been over, repeatedly, on multiple threads, exactly why win/loss is an accurate measuring tool for a single player's performance in MWO for QP. You're 8.333% of your team, not 100%, so it's 60 matches to get close, 80 matches to be close enough for our matchmaking needs.


An approximation metric to make matches reasonably competitive is of course possible. It is exactly what we already have. That's not what the OP is proposing.

Quote

Please search the forums for the reams of examples, math, the formulas involved, even links to websites that go over the math in detail.


None of them claim to be measuring a player's skill in absolute terms, do they?

Quote

For simple evidence go to the leaderboard. Look at people's stats. See how month after month they are very similar? If it was random your win/loss would swing randomly from 10.0 to 0.01 every month.


Strawman is made of straw. At no time did I say that players don't vary in their skill at the game. At no point did I claim that a player's skill at the game has no effect on the outcome of an individual match.

Quote

It doesn't, because the law of large numbers (LLN) washes out the impact of other people and just leaves your personal impact on your team's win/loss on average with an adequate sample size.


Group queue vs. solo queue. Split the leaderboards and tell me that you think they'll look just like they do now.

#120 Xavori

    Member

  • PipPipPipPipPipPipPip
  • The God of Death
  • 792 posts

Posted 03 December 2017 - 03:19 PM

vandalhooch,

In reading your various responses, you're either misreading what I'm saying, or you're trolling. In the interests of discussion, I'll assume positive intent that you're just misreading me.

When I said Elo-ish, it wasn't to say -ish as something weakened or watered down. ELO is a specific player rating system created for chess, and even there modified by the various foundations. When I say ELO-ish, I mean other ranking systems that work in a similar fashion.

You keep insisting that we somehow magically rate players' skill levels. I not once suggested any such thing. You are correct in that it'd be impossible simply because so many variables go into MWO piloting, and they change based on situations. So it'd be a fool's errand to try to create a number and call it "player skill". I've actually made the point about how arbitrary and useless such a number would be in a number of other threads...

Instead, what you look for in ELO-ish (by which I mean, ELO, WHR, TrueSkill, or some other rating system) is a way to match up players so that when two players have the same rating, you'd assume that they have a 50/50 chance of beating each other. That's about as strong an approximation of equivalent skill as you are going to get, but it is important to remember that it's a comparative, not absolute, skill rating.

Now, in head to head, it's pretty easy to rate players. You just keep having people with similar current ranks play each other with the winner going up in rating and the loser going down. The actual math for most rating systems gets a bit more complicated simply because they build in the quality of the match (i.e. the difference in the players' actual ratings, if any) in order to determine how much to move each of them. Repeat this often enough, and you get a rating that ultimately will lead to that 50/50 condition (or a W/L ratio of ~1).
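For anyone who wants to see the head-to-head case concretely, here is a minimal Python sketch of the standard Elo update. The 400-point scale and K-factor of 32 are conventional chess-style values used here as assumptions; nothing about them comes from MWO or PGI.

```python
def expected_score(rating_a, rating_b):
    """Expected score for A against B on the usual Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one match.

    The winner gains exactly what the loser gives up, and the size of the
    move depends on how surprising the result was.
    """
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    print(update(1500, 1500, a_won=True))   # even match: (1516.0, 1484.0)
    print(update(1700, 1500, a_won=True))   # favourite wins: small move
    print(update(1700, 1500, a_won=False))  # upset: large move
```

Two equally rated players each have an expected score of 0.5, which is the 50/50 condition; a favourite who wins barely moves, while an upset moves both players much further.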

With teams, it's a bit more work, and takes a lot more matches, but it ultimately functions the same way. You just move all the winners up and all the losers down, and then reshuffle and have new teams of equivalent rating playing each other. And when I say it takes a bit more work, that's just in that the formulas tend to be a bit more complex and need quite a few more iterations before the rating is "solid." For example, using TrueSkill, it takes 12 head to head matches between individual players to get a solid rating. For 8v8 team play, though, it takes 91 matches. If we applied TrueSkill to MWO 12v12, it'd take hundreds of matches. But it would get there. Pretty much any ranking system is going to face the same challenge in terms of needing a bunch of matches, but the great thing about computers is that they're totally cool doing the same math problem over and over and over again.
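The "move all the winners up and all the losers down" idea can be sketched in Python as below. This is one simple team variant, assuming a team's strength is the plain average of its members' ratings and that every member moves by the same amount; it is not how TrueSkill works internally and not anything PGI has described. The function names, the K-factor of 32 and the 400-point scale are all illustrative assumptions.

```python
from statistics import mean


def expected_score(rating_a, rating_b):
    """Expected score for side A on the usual Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_teams(ratings, team_a, team_b, a_won, k=32.0):
    """Return a new ratings dict after one team match.

    ratings maps player name -> rating; team_a and team_b are lists of names.
    """
    avg_a = mean(ratings[p] for p in team_a)
    avg_b = mean(ratings[p] for p in team_b)
    delta = k * ((1.0 if a_won else 0.0) - expected_score(avg_a, avg_b))
    new_ratings = dict(ratings)
    for p in team_a:
        new_ratings[p] += delta  # every member of team A moves together
    for p in team_b:
        new_ratings[p] -= delta  # and team B moves the opposite way
    return new_ratings


if __name__ == "__main__":
    ratings = {"a1": 1500, "a2": 1600, "b1": 1550, "b2": 1550}
    print(update_teams(ratings, ["a1", "a2"], ["b1", "b2"], a_won=True))
```

Because each player is only one voice in the team average, a single result says little about any individual, which is why many more matches are needed than in the head-to-head case.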

Now I personally would prefer team ELO or WHR to TrueSkill simply because TrueSkill only has 50 ranks, and so lacks the precision in both leaderboards (TrueSkill matchmaking almost always has other stats for their leaderboards because each rank will have tons of players) as well as matchmaking. It's a lot more likely to be a quality match when using a wider range of possible values because the ratings represent a smaller difference, i.e. the best rank 50 TrueSkill player might be measurably more likely than 50% chance to win vs the worst rank 50, but a 2800 ELO player is almost certainly at 50% vs another 2800 ELO, but just above 50% against a 2799 ELO. And if you want to get way bogged down in complicated math, WHR rankings tend to be really good at prediction because they aren't incremental (i.e. moving up and down each time), but instead recalculate the rating based on the entire history of the player.

But the key point here is that the goal is to maximize the number of quality matches. To do that, you have to get teams put together that have similar total skill. Since you can't really assign an absolute number to MWO pilot skill (or lots of other games for that matter), you instead rely on a comparative skill where the assumption is that two players with the same rating have a 50/50 chance of beating each other, and then you build teams where the combined skill level is as close to equal as you can get (because we don't want people waiting for days for perfectly equal matches) using the current queued players. That is definitely something that could be implemented, and it wouldn't even be that complicated because all the theory and math and formulas for doing it are already fully developed.
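As a sketch of the team-building step, the Python below splits a set of queued players into two teams with similar rating totals. It uses a simple greedy heuristic (strongest remaining player goes to whichever team is currently weaker), which is an assumption chosen for brevity and is not guaranteed to find the best possible split. It also ignores weight classes, groups and wait time, and the function and player names are hypothetical.

```python
def split_teams(ratings, team_size):
    """Split exactly 2 * team_size players into two similarly rated teams.

    ratings maps player name -> rating.
    Returns (team_a, team_b, total_a, total_b).
    """
    if len(ratings) != 2 * team_size:
        raise ValueError("need exactly 2 * team_size queued players")
    teams = ([], [])
    totals = [0.0, 0.0]
    # Hand out players from strongest to weakest.
    for name in sorted(ratings, key=ratings.get, reverse=True):
        # Prefer the team with the lower total so far, unless it is full.
        idx = 0 if totals[0] <= totals[1] else 1
        if len(teams[idx]) == team_size:
            idx = 1 - idx
        teams[idx].append(name)
        totals[idx] += ratings[name]
    return teams[0], teams[1], totals[0], totals[1]


if __name__ == "__main__":
    queue = {"p1": 1800, "p2": 1600, "p3": 1500, "p4": 1300}
    print(split_teams(queue, team_size=2))
```

A real matchmaker would have to trade this kind of balance against queue time, for example by accepting a larger gap between the two totals the longer players have been waiting.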




