
Please Implement Elo Or TrueSkill Matchmaking


184 replies to this topic

#121 Sjorpha

    Member

  • Philanthropist
  • 4,480 posts
  • Location: Sweden

Posted 03 December 2017 - 03:42 PM

vandalhooch, on 03 December 2017 - 09:58 AM, said:

None of them claim to be measuring a player's skill in absolute terms, do they?


I'm not sure what you mean by "skill in absolute terms". Being good at a game is, as far as I can see, identical to being good at winning in that game. Being good at a team game is identical to being good at maximising your team's chances of winning. The more often you win, and the better the opposing teams you can reliably help your team beat, the better you are at the game. This is perfectly measurable: you simply look at the combination of how often, and against how good opponents, a person wins and loses.
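
The "how often, and against how good opponents" measure described above is close to what chess players call a performance rating. A minimal Python sketch of the common linear approximation (average opponent rating plus 400 × (wins - losses) / games), assuming 1v1 results on an Elo-style scale; the function name and sample numbers are hypothetical, and draws are not handled:

```python
def performance_rating(opponent_ratings, results):
    """Linear approximation of an Elo-style performance rating.

    opponent_ratings: ratings of the opponents faced (hypothetical values)
    results: 1 for a win, 0 for a loss, in the same order as opponent_ratings
    """
    if not opponent_ratings or len(opponent_ratings) != len(results):
        raise ValueError("need exactly one result per opponent")
    games = len(results)
    wins = sum(results)
    losses = games - wins
    avg_opponent = sum(opponent_ratings) / games
    # Each net win adds 400 points, spread across all games played.
    return avg_opponent + 400 * (wins - losses) / games


print(performance_rating([1500, 1500], [1, 1]))  # two wins vs 1500s
print(performance_rating([1600, 1400], [1, 0]))  # one win, one loss
```

Beating stronger opponents raises the average-opponent term, and winning more often raises the second term, which is exactly the combination the post describes.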

I really don't see any other logical interpretation of the word "skill" as a general term for being good at playing a team game, do you?

Obviously you can look at secondary skills, being good at performing different tasks within the game, but none of those secondary skills are interesting from a matchmaking or ratings perspective, it doesn't really matter why a person is good or bad at winning for the purpose of matchmaking or general skill rating.

As far as the claim that Elo-type systems can't measure skill for team games with random teams, actually the opposite is more true. You can only apply Elo ratings to the individual player if the teams are random; if they are not random, the rating will have to be for the team, not the player. With random teams it is trivial to isolate the individual over time; with static teams it's much harder.
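
One common way to apply Elo to individuals in a team game (an assumption here; neither MWO nor this thread specifies it) is to rate each team by the mean of its players' ratings, compute the usual expected score between the two team ratings, and apply the resulting change to every player on the team. With teams reshuffled between matches, each player's rating then drifts toward a value that reflects their own contribution. A minimal sketch; K = 32 and the 400-point scale are conventional chess defaults, and the sample ratings are made up:

```python
K = 32  # conventional Elo step size (assumed)


def expected_score(rating_a, rating_b):
    """Probability that side A beats side B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_teams(team_a, team_b, a_won, k=K):
    """Return new rating lists for both teams after one match.

    Each team is rated by the mean of its players, and every player on a
    team receives the same rating change (hypothetical team-Elo variant).
    """
    mean_a = sum(team_a) / len(team_a)
    mean_b = sum(team_b) / len(team_b)
    e_a = expected_score(mean_a, mean_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)  # positive if team A did better than expected
    return [r + delta for r in team_a], [r - delta for r in team_b]


new_a, new_b = update_teams([1600, 1500, 1400], [1500, 1500, 1500], a_won=True)
print(new_a, new_b)  # equal team means, so each winner gains 16 and each loser drops 16
```

A favoured team gains less for a win than it would lose for a loss, which is the asymmetry Xavori illustrates later in the thread with his made-up 3-point and 4-point adjustments.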

#122 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 03 December 2017 - 04:24 PM

vandalhooch, on 03 December 2017 - 09:58 AM, said:


An approximation metric to make matches reasonably competitive is of course possible. Is exactly what we already have. That's not what the OP is proposing.



None of them claim to be measuring a player's skill in absolute terms, do they?



Strawman is made of straw. At no time did I say that players don't vary in their skill at the game. At no point did I claim that a player's skill at the game has no effect on the outcome of an individual match.



Group queue vs. solo queue. Split the leaderboards and tell me that you think they'll look just like they do now.


Having called for, and even voted for, the GitHub request on splitting it, I'm all for the split for group/solo queue stats.

Are you trying to say that the only reason better players consistently show a better W/L is group queue? While the mix of group and solo queue muddies the water for scale on the top end, there aren't enough people in group queue to make a big difference to the 40k results in the leaderboard.

The point is that your impact on the match is reflected in the averages of your W/L, given an adequate sample size.

Elo and TrueSkill do measure skill: the skill of winning matches. That is in turn composed of numerous other skills, from communication to mech building to teamwork to positioning, all of which impact your ability to win matches. The better you are at winning matches, the more total matches you'll win on average.

Hence an Elo-based or Elo-style system would be drastically better than the PSR system we have right now, as it varies based on how good you are at winning, which is all a matchmaker is or should be concerned with: building teams with similar odds of winning.

The example you gave for a test case was also flawed. Player One will not be facing consistently inferior teams, because it will give him bigger jumps at first and then nothing if he continues fighting the same people he beat before. The matchmaker would be statistically broken if you consistently played inferior teams and you were not north of 1800 PSR, which would put you in the top percentile or two. Your example is fundamentally flawed because it assumes that the matchmaker will ignore the whole point of generating an Elo score and consistently place people in uneven matches. Why would it do that, other than to create a fake point for you to argue?

Player One would settle pretty close to his correct Elo score within 60-100 matches and play matches geared toward him. If he encounters Player Two, then Player Two will be on a team that balances his lower Elo score, giving both of them close to even odds of winning. If Player One is consistently improving and is always better than the matchmaker predicts, then he will absolutely drive wins and continue to increase, but that's because he'll be playing against comparable teams. If Player Two is bad, doesn't improve, and shows himself below average, he will continue to lose until he settles into a rank that approximates his skill at winning matches.

Nothing measures human skill in absolute terms. That's an absurd idea and an absurd argument. Your skill will vary based on sleep, focus, recent sexual activity, calories in your system; countless factors will make you rise and fall in performance over any given day. Saying that an average of someone's performance means nothing as an indicator of their average performance level, just because it's not some absolute measure, is ridiculous.

#123 Brain Cancer

    Member

  • The 1 Percent
  • 3,851 posts

Posted 03 December 2017 - 04:34 PM

Honestly, group and solo stats need to be split up. The guys who farm group with well-tuned kill squads are not the same as the carry-harders in solo who basically earn it in spite of 11 randoms per match.

#124 SFC174

    Member

  • The Pharaoh
  • 695 posts

Posted 03 December 2017 - 04:38 PM

MischiefSC, on 03 December 2017 - 04:24 PM, said:


Having called for, and even voted for, the GitHub request on splitting it, I'm all for the split for group/solo queue stats.

Are you trying to say that the only reason better players consistently show a better W/L is group queue? While the mix of group and solo queue muddies the water for scale on the top end, there aren't enough people in group queue to make a big difference to the 40k results in the leaderboard.


Group queue has to be a big factor for people with very high WLR (2:1 and above I'd say). Well, group queue and poor matchmaking in Solo I suppose.

If you look at other multiplayer team games like World of Tanks, it is very rare to find people with win rates over 65% (60%+ is already top 1%). A 60% win rate equates to a WLR of 1.5; a win rate of about 67% (two wins for every loss) is a 2.0 WLR.
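
The conversion between win rate and WLR used above is WLR = p / (1 - p), and back again p = WLR / (1 + WLR). A small Python sketch, assuming no draws; the function names are hypothetical:

```python
def wlr_from_win_rate(p):
    """Win/loss ratio for a win rate p in [0, 1), ignoring draws."""
    if not 0 <= p < 1:
        raise ValueError("win rate must be in [0, 1)")
    return p / (1 - p)


def win_rate_from_wlr(wlr):
    """Win rate implied by a win/loss ratio, ignoring draws."""
    if wlr < 0:
        raise ValueError("WLR must be non-negative")
    return wlr / (1 + wlr)


print(round(wlr_from_win_rate(0.60), 3))
print(round(wlr_from_win_rate(0.66), 3))
print(round(win_rate_from_wlr(2.0), 4))
```

A 60% win rate gives exactly 1.5, a flat 66% gives about 1.94, and a 2.0 WLR corresponds to two thirds (about 66.7%) of matches won.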

I don't doubt that the people with very high WLR in MWO are generally much better than average. But above 2.0 I'm looking at bad MM or group as the primary drivers.

#125 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 03 December 2017 - 06:05 PM

Xavori, on 03 December 2017 - 03:19 PM, said:

vandalhooch,

In reading your various responses, you're either misreading what I'm saying, or you're trolling. In the interests of discussion, I'll assume positive intent that you're just misreading me.

When I said Elo-ish, it wasn't to say -ish as something weakened or watered down. Elo is a specific player rating system created for chess, and even there it has been modified by the various federations. When I say Elo-ish, I mean other ranking systems that work in a similar fashion.


Define "similar." How different does it have to be before you stop calling it Elo-ish?

Quote

You keep insisting that we somehow magically rate players' skill levels.


No, I most certainly did not. I keep pointing out that you claim your proposed system will create teams with matching "skill levels." I keep pointing out that you have not actually shown how you are going to measure a player's skill level. It's you who keeps claiming that your system can do it.

Quote

I never once suggested any such thing. You are correct in that it'd be impossible, simply because so many variables go into MWO piloting, and they change based on situations. So it'd be a fool's errand to try to create a number and call it "player skill". I've actually made the point about how arbitrary and useless such a number would be in a number of other threads...


So why do you keep claiming that your Elo-ish system will create matches with teams of equal skill level? All your system would do is create matches with teams with equal Elo-ish rankings. And as you yourself have pointed out, that's not a measurement of skill level.

Quote

Instead, what you look for in Elo-ish (by which I mean Elo, WHR, TrueSkill, or some other rating system) is a way to match up players so that when two players have the same rating, you'd assume that they have a 50/50 chance of beating each other.


Which is perfectly fine for a 1 v 1 game or contest. MWO is not such a thing.

Quote

That's about as strong an approximation of equivalent skill as you are going to get, but it is important to remember that it's a comparative, not absolute, skill rating.


More importantly, your rating being adjusted due to the actions of other players in the match (teammates and opponents) makes your proxy metric exactly the same as what we have now.

Quote

Now, in head-to-head, it's pretty easy to rate players. You just keep having people with similar current ranks play each other, with the winner going up in rating and the loser going down. The actual math for most rating systems gets a bit more complicated simply because they build in the quality of the match (i.e., the difference in the players' actual ratings, if any) in order to determine how much to move each of them. Repeat this often enough, and you get a rating that ultimately will lead to that 50/50 condition (or a W/L ratio of ~1).

With teams, it's a bit more work, and takes a lot more matches, but it ultimately functions the same way.


No it doesn't. I don't think you quite comprehend what "a lot more matches" actually means in regards to such a system. Minimum number of matches necessary for a reasonable alpha level increases exponentially as team sizes increase.
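
The head-to-head process described in the quote above (winner up, loser down, step size scaled by the rating difference) is the standard Elo update. A deterministic Python sketch, assuming the conventional K = 32 and 400-point scale; to avoid randomness it replaces each game result with the player's true win probability, so it shows where the rating is headed rather than a real rating history. The helper names are hypothetical:

```python
K = 32  # conventional Elo step size (assumed)


def expected_score(rating_a, rating_b):
    """Elo expected score (win probability) of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating, opponent, score, k=K):
    """New rating after one game; score is 1 for a win, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))


def converge(true_strength, start, games):
    """Pair the player against an opponent rated exactly at the player's current
    rating, and use the player's *true* win probability as the game result
    (its expected value), so the run is deterministic."""
    rating = start
    for _ in range(games):
        opponent = rating
        score = expected_score(true_strength, opponent)
        rating = elo_update(rating, opponent, score)
    return rating


print(round(converge(1700, 1500, 300), 1))   # climbs toward the true 1700
print(round(expected_score(2800, 2799), 4))  # a one-point gap is barely above 50%
```

The second line is the 2800-vs-2799 comparison that comes up later in the quote: a one-point gap moves the expected score only to about 50.14%.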

Quote

You just move all the winners up and all the losers down, and then reshuffle and have new teams of equivalent rating playing each other. And when I say it takes a bit more work, that's just in that the formulas tend to be a bit more complex and need quite a few more iterations before the rating is "solid." For example, using TrueSkill, it takes 12 head to head matches between individual players to get a solid rating. For 8v8 team play, though, it takes 91 matches.


MWO has 12v12, what's the minimum? Does that minimum require the same 24 players be reshuffled each match? How do you account for mech class in your matchmaking?

Quote

If we applied TrueSkill to MWO 12v12, it'd take hundreds of matches. But it would get there.


How many hundreds? Mech classes?

Edit: Since it seems like you don't want to do the math, I went ahead and did it for you.

Last season (17) saw 28,078 different players play at least ten matches. According to the TrueSkill system, under ideal circumstances, that means each player must play 354 matches in order for the system to accurately place them in their appropriate rank, using the default 50 levels.

However, a small pool of comparable players to draw from compounded by the mech classes would create a non-idealized situation where many matches would not provide a bit for calculation. Their own data runs indicate that non-idealized situations typically increase the minimum match number necessary by a factor of 2-3.

So implementing a TrueSkill system for a two-team, twelve-players-per-team game with 28,000 players will only require that every player play between 708 and 1,062 matches to be properly rated.

No problem. Sounds like a piece of cake. You've got me convinced . . . or not.
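
The 12 (1v1) and 91 (8v8) figures quoted earlier in this post appear to be reproducible from a simple information-theoretic bound: placing a player in one of 50 levels takes about log2(50) ≈ 5.64 bits, a two-team win/loss result delivers at most one bit per match, and that bit is shared among everyone in the match. A Python sketch under that assumption (ideal conditions, no draws, perfectly informative pairings); the function is hypothetical and gives a lower bound only:

```python
import math


def min_matches_per_player(players_per_match, levels=50, outcome_orderings=2):
    """Lower bound on matches each player needs before they can be placed.

    players_per_match: everyone in one match (2 for 1v1, 24 for 12v12)
    levels: number of skill levels to distinguish (50 assumed, as in the post)
    outcome_orderings: distinct finishing orders (2 for a two-team win/loss game)
    """
    bits_per_player = math.log2(levels)
    bits_per_match = math.log2(outcome_orderings)
    return math.ceil(players_per_match * bits_per_player / bits_per_match)


print(min_matches_per_player(2))   # 1v1
print(min_matches_per_player(16))  # 8v8
print(min_matches_per_player(24))  # 12v12
```

Under this model the bound grows linearly with the number of players per match (about 136 for 12v12), and the non-ideal factors raised in the post above (small queues, weight classes, mixed group play) push the practical number higher.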

Quote

Pretty much any ranking system is going to face the same challenge in terms of needing a bunch of matches, but the great thing about computers is that they're totally cool doing the same math problem over and over and over again.

Now I personally would prefer team Elo or WHR to TrueSkill simply because TrueSkill only has 50 ranks, and so lacks precision in both leaderboards (games with TrueSkill matchmaking almost always use other stats for their leaderboards because each rank will have tons of players) and matchmaking. A quality match is a lot more likely when using a wider range of possible values, because each rating step represents a smaller difference; i.e., the best rank-50 TrueSkill player might be measurably more likely than 50% to win vs the worst rank 50, but a 2800 Elo player is almost certainly at 50% vs another 2800 Elo, and just above 50% against a 2799 Elo. And if you want to get way bogged down in complicated math, WHR rankings tend to be really good at prediction because they aren't incremental (i.e., moving up and down each time), but instead recalculate the rating based on the entire history of the player.


Group vs solo queue? How's your calculation going to handle that?

Note that non-random teams increase the minimum number of matches needed to properly rate players.

Quote

But the key point here is that the goal is to maximize the number of quality matches. To do that, you have to get teams put together that have similar total skill. Since you can't really assign an absolute number to MWO pilot skill (or lots of other games for that matter), you instead rely on a comparative skill where the assumption is that two players with the same rating have a 50/50 chance of beating each other,


You don't face one opponent at a time.

Quote

and then you build teams where the combined skill level is as close to equal as you can get (because we don't want people waiting for days for perfectly equal matches) using the currently queued players.


So, two aces and ten rookies vs twelve average players is a "good match" in your mind?

Edit: It is in Microsoft's view. But how does the TrueSkill ranking system incorporate the game outcome of a team match? In this case, the team’s skill is assumed to be the sum of the skills of the players.

That's the problem with calculating a proxy value for a player's skill based on their individual history in the game. You can't simply add up the various teammate's values to make up a team value.

The average of several averages is a useless metric.
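
To make the "team's skill is the sum of the skills of the players" model concrete, here is a Python sketch of the two-team win-probability expression commonly used with TrueSkill: P(A wins) = Φ((Σμ_A - Σμ_B) / sqrt(n·β² + Σσ²)), where n is the total number of players. The default parameters (μ = 25, σ = 25/3, β = σ/2) are assumed, and the "aces", "rookies" and "average" ratings are invented for illustration:

```python
import math

# Assumed TrueSkill defaults: mu = 25, sigma = 25/3, beta = sigma / 2
MU = 25.0
SIGMA = MU / 3
BETA = SIGMA / 2


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def team_win_probability(team_a, team_b, beta=BETA):
    """P(team A beats team B) when team skill is the *sum* of player skills.

    Each player is a (mu, sigma) pair.
    """
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    total_var = sum(s * s for _, s in team_a + team_b)
    n = len(team_a) + len(team_b)
    return normal_cdf(delta_mu / math.sqrt(n * beta * beta + total_var))


# Two aces and ten rookies vs twelve average players, with made-up,
# fully certain ratings (sigma = 0); both teams sum to 300.
aces_and_rookies = [(40.0, 0.0)] * 2 + [(22.0, 0.0)] * 10
average_players = [(25.0, 0.0)] * 12
print(team_win_probability(aces_and_rookies, average_players))
```

Because only the sums enter the formula, this model scores the two lineups as a coin flip, which is the outcome the post above objects to.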

Quote

That is definitely something that could be implemented, and it wouldn't even be that complicated because all the theory and math and formulas for doing it are already fully developed.


And they create the same level of "matchmaker broken" whining that we already get in MWO.

BTW: I'm still waiting for you to show your evidence that 80-90% of matches result in stomps, necessitating this overhaul.

Edited by vandalhooch, 03 December 2017 - 08:27 PM.


#126 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 03 December 2017 - 06:18 PM

Sjorpha, on 03 December 2017 - 03:42 PM, said:


I'm not sure what you mean by "skill in absolute terms". Being good at a game is, as far as I can see, identical to being good at winning in that game. Being good at a team game is identical to being good at maximising your team's chances of winning. The more often you win, and the better the opposing teams you can reliably help your team beat, the better you are at the game. This is perfectly measurable: you simply look at the combination of how often, and against how good opponents, a person wins and loses.


So simple. Be sure to let us know when you file for your patent.

Quote

I really don't see any other logical interpretation of the word "skill" as a general term for being good at playing a team game, do you?


Care to take a look at how many forum goers complain about people being carried in group queue?

If I always partner with the best players in my unit, I'll amass a huge W/L ratio. Is that an indication that I'm really skilled and will dominate other players when I drop in the solo queue? What if I only drop at times of the day when newer players are logging in and purposely avoid the times of day when most of the best players are online? Will my high W/L ratio be a reflection of my skill at the game?

What if a few griefers purposefully lose matches in order to drop their rating enough to be grouped in with the newer players? Will their W/L ratio be a reflection of their skill at the game?

Quote

Obviously you can look at secondary skills, being good at performing different tasks within the game, but none of those secondary skills are interesting from a matchmaking or ratings perspective, it doesn't really matter why a person is good or bad at winning for the purpose of matchmaking or general skill rating.


Your reliance on W/L ratio as a proxy for skill makes a whole bunch of assumptions that simply are not true about actual online games.

Quote

As far as the claim that Elo-type systems can't measure skill for team games with random teams, actually the opposite is more true. You can only apply Elo ratings to the individual player if the teams are random,


Will the increase/decrease in your rating be due to your skill at the game or your teammates'?

Care to notice that MWO has a mode dedicated to non-random teams?

Quote

if they are not random the rating will have to be for the team not the player. With random teams it is trivial to isolate the individual over time, with static teams it's much harder.


MWO has both. How's your algorithm doing so far?

#127 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 03 December 2017 - 06:41 PM

MischiefSC, on 03 December 2017 - 04:24 PM, said:


Having called for, and even voted for, the GitHub request on splitting it, I'm all for the split for group/solo queue stats.

Are you trying to say that the only reason better players consistently show a better W/L is group queue?


No. I'm pointing out that your claim that the leaderboards are the same season after season without the two modes being split is not evidence for anything.

Quote

While the mix of group and solo queue muddies the water for scale on the top end, there aren't enough people in group queue to make a big difference to the 40k results in the leaderboard.


You can't possibly know that. You are just assuming it's true. What you think is true and what you have evidence for are not the same thing.

Quote

The point is that your impact on the match is reflected in the averages of your W/L, given an adequate sample size.


Calculate for me the adequate sample size given the constant churn of players coming and going from the game over time.
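
One way to put a number on "adequate sample size" is the usual normal-approximation formula for estimating a proportion, n = z² · p(1 - p) / margin². A Python sketch, assuming independent matches and a player whose underlying skill stays fixed while the sample is collected (player churn and improvement, as raised above, violate both); the function name is hypothetical:

```python
import math


def games_for_margin(margin, p=0.5, z=1.96):
    """Games needed so a win-rate estimate is within +/- margin at ~95% confidence.

    Normal approximation to the binomial; p = 0.5 is the worst case.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


print(games_for_margin(0.05))  # +/- 5 percentage points
print(games_for_margin(0.03))  # +/- 3 percentage points
```

Pinning a win rate down to within 5 points takes a few hundred matches, and tightening that to 3 points takes over a thousand, before accounting for teammates and opponents adding their own variance.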

Quote

Elo and TrueSkill do measure skill: the skill of winning matches.


Elo measures the results of 1 v 1 contests under very controlled conditions.

TrueSkill measuring skill at winning matches is what Microsoft CLAIMS it does. What it actually does is give game makers a reasonable enough estimate in order to facilitate matches in a timely manner. The reasonableness of the tool drops the more variables you introduce into the matches. Larger teams make the minimum match number climb. Mech classes and non-random map selection further limit its effectiveness. Mixing solo queue and group queue further destroys the effectiveness of such a system.

Quote

That is in turn composed of numerous other skills, from communication to mech building to teamwork to positioning, all of which impact your ability to win matches. The better you are at winning matches, the more total matches you'll win on average.


That assumes all other factors are equally variable, which is patently not true.

Quote

Hence an Elo-based or Elo-style system would be drastically better than the PSR system we have right now, as it varies based on how good you are at winning, which is all a matchmaker is or should be concerned with: building teams with similar odds of winning.


We had an Elo-ish system before PSR. People complained then as much as they do now. You would think that they might eventually glom on to the fact that there might be some other factor underlying the perceived imbalance in match results besides the matchmaker.

Quote

The example you gave for a test case was also flawed. Player One will not be facing consistently inferior teams, because it will give him bigger jumps at first and then nothing if he continues fighting the same people he beat before.


Elo-ish and TrueSkill systems always increase your rating on a win. A win never, ever results in no change.
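
Both sides of this exchange are partly right under standard Elo: the points for a win shrink quickly as the winner's rating advantage grows, but they never reach exactly zero, because the expected score is always below 1. A Python sketch, assuming K = 32 and the 400-point scale; the function name is hypothetical:

```python
K = 32  # conventional Elo step size (assumed)


def win_gain(rating, opponent, k=K):
    """Points gained for beating `opponent` under standard Elo."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return k * (1.0 - expected)


for gap in (0, 200, 400, 800):
    # Gain for a player rated `gap` points above a 1500 opponent.
    print(gap, round(win_gain(1500 + gap, 1500), 2))
```

An even match pays 16 points, while beating someone 800 points below you pays roughly a third of a point: small enough to feel like "nothing", but still positive.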

Quote

The matchmaker would be statistically broken if you consistently played inferior teams and you were not north of 1800 PSR, which would put you in the top percentile or two. Your example is fundamentally flawed because it assumes that the matchmaker will ignore the whole point of generating an Elo score and consistently place people in uneven matches. Why would it do that, other than to create a fake point for you to argue?


Here comes that underlying factor you and the others have consistently ignored . . . small available player pool at the time of match creation.

Quote

Player One would settle pretty close to his correct Elo score within 60-100 matches and play matches geared toward him.


How exactly did you determine it would be 60-100 matches? You know you can't just make numbers up out of nowhere in a discussion about statistics, right?

Quote

If he encounters Player Two, then Player Two will be on a team that balances his lower Elo score, giving both of them close to even odds of winning. If Player One is consistently improving and is always better than the matchmaker predicts, then he will absolutely drive wins and continue to increase, but that's because he'll be playing against comparable teams.


Limited available player pool at the moment of match creation prevents him/her from consistently "playing against comparable teams."

Quote

If Player Two is bad, doesn't improve, and shows himself below average, he will continue to lose until he settles into a rank that approximates his skill at winning matches.


Unless they team up with Player One and join the group queue and get carried to several wins.

Quote

Nothing measures human skill in absolute terms. That's an absurd idea and an absurd argument.


And yet the OP and you keep claiming that your system will create matches with teams of equal skill facing each other. You admit you can't measure skill but your system will make teams that are equal in skill.

HOW DO YOU ACTUALLY MEASURE THE TOTAL SKILL OF EACH TEAM IF YOU CAN'T ACTUALLY MEASURE SKILL?

Quote

Your skill will vary based on sleep, focus, recent sexual activity, calories in your system; countless factors will make you rise and fall in performance over any given day. Saying that an average of someone's performance means nothing as an indicator of their average performance level, just because it's not some absolute measure, is ridiculous.


A proxy measurement of skill is fine. But you guys keep forgetting that it is just a proxy and then you go and do stupid things like total up all the proxy values for each team as if that means anything.

#128 A Headless Chicken

    Member

  • The Hungry
  • 273 posts
  • Location: Immersed in Stupid.

Posted 03 December 2017 - 08:19 PM

vandalhooch, on 03 December 2017 - 06:41 PM, said:

HOW DO YOU ACTUALLY MEASURE THE TOTAL SKILL OF EACH TEAM IF YOU CAN'T ACTUALLY MEASURE SKILL?


Easy, just count the number of potatoes peeled in a minute.

#129 Xavori

    Member

  • The God of Death
  • 792 posts

Posted 03 December 2017 - 09:36 PM

vandalhooch,

Obviously you don't understand how Elo, WHR, TrueSkill, etc. work. That's the difficulty we're having. Google them. Read up on them. All of your arguments against my suggestion seem to be born of ignorance, and ignorance is easily correctable.

The short version: you absolutely can use Elo-like rating systems in team games, and they will ultimately end up giving a rating that allows for comparison between individual skills. The evidence in support of this is overwhelming, as all kinds of team-based games do exactly that and have been doing so since before computer games.

Sjorpha, on 03 December 2017 - 03:42 PM, said:

As far as the claim that Elo-type systems can't measure skill for team games with random teams, actually the opposite is more true. You can only apply Elo ratings to the individual player if the teams are random; if they are not random, the rating will have to be for the team, not the player. With random teams it is trivial to isolate the individual over time; with static teams it's much harder.


Actually, you can't use random teams. If you use random teams, you get random results, and random numbers are random.

Here's what could happen if you had random teams. And for the purposes of this example, I'm going to use player skills of just 1-10 with 1 being mashed potato and 10 being l33tzorz. Let's say you had 8 players with ratings: 10,10,8,7,3,2,2,1. Now let's say the matchmaker randomly made these two teams:
Team A: 10,10,8,7
Team B: 3,2,2,1
Exactly what do you think is going to happen? And how then do you think you're getting any new meaningful data with which to adjust player ratings based on this massacre?

It doesn't matter how likely such a scenario is. It just matters that it's possible. It's why I keep calling our current leaderboard garbage. A W/L ratio above 1 only means that that player has played on winning teams more often than losing teams. You cannot read anything else into it, because you cannot isolate down to just that one player's skill being responsible. Yes, it's possible that in the current environment a W/L ratio above 1 means you are a good pilot, but it's not anywhere near a certainty. You could be a bad pilot who just happens to only log on when even worse players are online. You could be getting carried by your teammates (because solo and group aren't split on the leaderboards). You could just be lucky in that you end up with good teammates in solo queue more often than not. It doesn't matter how likely any of those possibilities are. All that matters for rendering the leaderboard data meaningless is that there are so very many possibilities other than just "good pilot".

Conversely, if you have a matchmaker that does not build random, but instead tries to match team skill ratings, you'd get:
Team A 10, 8, 2, 2 (22)
Team B 10, 7, 3, 1 (21)
Now you have a quality match. Team A has a very slight advantage which will end up reflected in the adjustments, but it's close enough that both teams have a good chance to win, and hence, you can expect that the winning team should move up in ratings. Team A would move up, say, 3 points with a win or 4 points down with a loss, and Team B would move up 4 points with a win and only 3 points down with a loss (these are totally made-up numbers just to illustrate the point...badly, since you don't want negative numbers in player ratings *smirk*) that reflect the small advantage Team A had. After the match, you'd adjust everyone's ratings immediately so that next time the match will reflect the most up-to-date rating possible. Repeat this 40-50 times, and you'll effectively shuffle the individuals within the teams to a rating that likely reflects their comparable skill (or 90-100 for 8v8, or hundreds for 12v12).
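
Building the balanced split in the example above is a partition problem: divide the queued players into two equal-sized teams whose rating totals are as close as possible. For 8 players it can simply be brute-forced. A Python sketch using the example's 1-10 ratings; the function is hypothetical, and a real 12v12 matchmaker would need a heuristic, since there are over 1.3 million distinct ways to split 24 players into two teams of 12:

```python
from itertools import combinations


def balance_teams(ratings):
    """Split an even-sized list of ratings into two equal-sized teams with the
    smallest possible difference in rating totals (brute force)."""
    n = len(ratings)
    if n % 2:
        raise ValueError("need an even number of players")
    indices = range(n)
    best = None
    # Keep player 0 on team A so each split is only considered once.
    for rest in combinations(indices[1:], n // 2 - 1):
        team_a_idx = (0,) + rest
        team_a = [ratings[i] for i in team_a_idx]
        team_b = [ratings[i] for i in indices if i not in team_a_idx]
        diff = abs(sum(team_a) - sum(team_b))
        if best is None or diff < best[0]:
            best = (diff, team_a, team_b)
    return best


diff, a, b = balance_teams([10, 10, 8, 7, 3, 2, 2, 1])
print(a, sum(a), b, sum(b), diff)
```

The eight ratings total 43, so a difference of 1 (as in the 22 vs 21 split above) is the best any split can achieve.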

So the goal of a better rating system is to produce better matches. This in turn has the effect of giving us a much better indication of who are the good, average, and still learning pilots. It won't be perfect because MWO has so many moving pieces and variables, but it'll be light years beyond what we have now.

#130 Ghogiel

    Member

  • CS 2021 Gold Champ
  • 6,852 posts

Posted 03 December 2017 - 10:16 PM

Xavori, on 03 December 2017 - 09:36 PM, said:


Conversely, if you have a matchmaker that does not build random, but instead tries to match team skill ratings, you'd get:
Team A 10, 8, 2, 2 (22)
Team B 10, 7, 3, 1 (21)
Now you have a quality match.

Wrong.

This is no better than the PSR problem. If we have to put the lowest rank potatoes with top rank players all in the same match there is no point in changing the MMer.

Once the spread is that high and you are tallying up Elo ratings that look like that over 12 players, the prediction breaks down.

I have never been convinced that things like having a load of 5s and 6s on the enemy team, and the MM dumping a few 3s on your team to compensate for your 8 or 10, is predictable. IMO the players in the match all need to be within a low spread of each other; the overall team Elo is a secondary factor to that.
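
The alternative proposed here (keep the spread of ratings within a match low first, and only then balance team totals) can be sketched as a selection step in Python: sort the queue and take the contiguous run of players with the smallest max-min gap. This is a hypothetical illustration on the thread's 1-10 scale; it ignores wait times, groups, and weight classes:

```python
def tightest_window(queue_ratings, match_size):
    """Pick the match_size queued ratings with the smallest max-min spread.

    After sorting, the best group is always a contiguous run of the sorted
    list, so checking each window once is enough.
    """
    if match_size > len(queue_ratings):
        raise ValueError("not enough players queued")
    ordered = sorted(queue_ratings)
    best_start = min(
        range(len(ordered) - match_size + 1),
        key=lambda i: ordered[i + match_size - 1] - ordered[i],
    )
    return ordered[best_start:best_start + match_size]


queue = [1, 2, 2, 3, 5, 5, 6, 6, 7, 8, 10, 10]
print(tightest_window(queue, 8))
```

The selected group could then be split into two teams by total rating, so the team-sum balancing Xavori describes only has to work within a narrow band of players.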

#131 Xavori

    Member

  • PipPipPipPipPipPipPip
  • The God of Death
  • The God of Death
  • 792 posts

Posted 03 December 2017 - 10:58 PM

Ghogiel, on 03 December 2017 - 10:16 PM, said:

Wrong.

This is no better than the PSR problem. If we have to put the lowest rank potatoes with top rank players all in the same match there is no point in changing the MMer.

Once the spread is that high and you are tallying up Elo ratings that look like that over 12 players, the prediction breaks down.

I have never been convinced that things like having a load of 5s and 6s on the enemy team, and the MM dumping a few 3s on your team to compensate for your 8 or 10, is predictable. IMO the players in the match all need to be within a low spread of each other; the overall team Elo is a secondary factor to that.


It doesn't matter if you have ever been convinced. You are not reality. Your opinion doesn't supersede history.

Reality is on my side. There are any number of team games that use individual rankings to build teams that show that it really does work.

A great example is a small dart league I used to play in. The total player base for the league was about 60 players. The ratings went from 0 (maybe hits the board with all three darts) to 20 (will hit 501 in 12 darts or less every single time they play). We had 4-player teams, and each team was assembled with a 30-point cap and only one 18+ player allowed per team. Every team was competitive, and at the end of each season, the ratings were redone based on performance. The truly ironic part was that even at the start of the league, the teams whose 0's played the best (because this league had so many 0's, every team had one) were the teams that won, and by the end of the season, the 0's that improved the most basically won the end-of-season tournaments.

And it's not just darts where this works. Practically every game that uses individual ratings within teams does it the way I described precisely because it's been shown to work. The math has been done and redone and refined for decades. When Microsoft put together their TrueSkill system (or really any of the quality matchmakers out there), they weren't making something totally new. They were simply taking existing work and ideas and putting them together in a way that best fit the kinds of matches they were handling.

So again, it doesn't matter if you are convinced or not. Reality has shown that it works.

#132 Ghogiel

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • CS 2021 Gold Champ
  • CS 2021 Gold Champ
  • 6,852 posts

Posted 03 December 2017 - 11:15 PM

Xavori, on 03 December 2017 - 10:58 PM, said:


It doesn't matter if you have ever been convinced. You are not reality. Your opinion doesn't supersede history.

Reality is on my side. There are any number of team games that use individual rankings to build teams that show that it really does work.

A great example is a small dart league I used to play in. The total player base for the league was about 60 players. The ratings went from 0 (maybe hits the board with all three darts) to 20 (will hit 501 in 12 darts or less every single time they play). We had 4-player teams, and each team was assembled with a 30-point cap and only one 18+ player allowed per team. Every team was competitive, and at the end of each season, the ratings were redone based on performance. The truly ironic part was that even at the start of the league, the teams whose 0's played the best (because this league had so many 0's, every team had one) were the teams that won, and by the end of the season, the 0's that improved the most basically won the end-of-season tournaments.

And it's not just darts where this works. Practically every game that uses individual ratings within teams does it the way I described precisely because it's been shown to work. The math has been done and redone and refined for decades. When Microsoft put together their TrueSkill system (or really any of the quality matchmakers out there), they weren't making something totally new. They were simply taking existing work and ideas and putting them together in a way that best fit the kinds of matches they were handling.

So again, it doesn't matter if you are convinced or not. Reality has shown that it works.

It doesn't matter whether I am convinced that they might get the points-adjustment algorithm right in MWO (history is on my side that they won't, because they didn't last time), or that fully spread rated players in random teams could potentially be accurately predicted in a game like MWO. The fact is the matches would still be utter garbage, you are wrong that they would be quality matches, and we might as well have PSR.

#133 A Headless Chicken

    Member

  • PipPipPipPipPipPip
  • The Hungry
  • 273 posts
  • LocationImmersed in Stupid.

Posted 03 December 2017 - 11:18 PM

EDIT: I'd like to point out that your individual performance actually can and will affect your win-loss ratio. Maybe your WLR is ideally supposed to be 1, but please don't forget that there are people out there who actually contribute more than the 8.3% of a match they are allocated. There are incredibly scary players out there, literally god-tier, like the gang from EmP: extreme outliers who are forced to play in your average match because there aren't enough people at their level of play.

Did you kill plenty of reds? Did you scout the reds and stay vocal about it? Did you play the objective on Conquest and Doom-ination? Did you bring good "carry" loadouts? Pet peeve: people asking why they are getting rolled 12-0 when they bring bracket builds. You shoot yourself in the foot, enough said.

Sure, sometimes your team is a spud fest, but make an effort to grow those spuds and you might just be rewarded, sometimes against the odds. Some people do indeed "luck out" on their win-losses, but not everyone is a case of inflation, especially seeing how group queue has been dead for a few months.

Xavori, on 03 December 2017 - 10:58 PM, said:

So again, it doesn't matter if you are convinced or not. Reality has shown that it works.


Small playerbase, check.
Pilot skill more or less being on the bellcurve, check.
Other games not having a perfect elo system, check.
0s playing the best amongst 0s doesn't automatically make them 100s, check.

I dunno what reality is to you but it seems edible and very delicious.

Edited by A Headless Chicken, 03 December 2017 - 11:57 PM.


#134 Xavori

    Member

  • PipPipPipPipPipPipPip
  • The God of Death
  • 792 posts

Posted 04 December 2017 - 12:19 PM

Headless Chicken,

Yes. Scary, great players exist. They are worth more to their team than the 8.3% an average pilot is. They can sway matches.

Unless...

You pair them with 11 mashed potatoes and put them up against a team of good, albeit not great, players. They can only carry so hard, and that kind of match is asking too much. I don't think I'll get much argument about this example.

But let's take it a bit further. What if instead of 11 mashed potatoes, he only gets say 2. The rest of his team is a mix of good and average players and they are fighting a team of good and average players with no potatoes. The great player is going to make up for his potatoes...maybe. He's definitely worth more than both of them put together. But he's still basically leading a team that overall skill-wise, is about the same as his competitors. If the matchmaker does this often enough, the great player paired with potatoes still gets lots of those quality matches where he is as likely to lose as to win.

What's more, if you run enough of these quality matches (and yes, we're talking several hundred), you'll get everyone sorted into a player ranking that the matchmaker can use to make sure the great players have to carry a couple of terribad potatoes every match, because otherwise it isn't fair to the other team to have to face such a killing machine as the great pilot.

Trust me. This exact type of matchmaking and team building predates computer games, and we know it works. I'm not suggesting anything earth-shattering or amazingly new. I'm suggesting the same old system that people have been using since they first started trying to put together teams of differently skilled players and still have them play competitive matches.
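The claim above, that several hundred randomly composed matches will sort players into a usable ranking, can be explored with a toy simulation. This is a hedged sketch, not PGI's or Microsoft's algorithm. It assumes each player has a hidden skill on an Elo-like scale, that each match result is drawn from the Elo logistic curve applied to the two teams' average hidden skill, and that every player on a team receives the same zero-sum Elo update computed from the teams' average ratings. The skill values, `k=8`, the 24-player pool, the seed and all function names are arbitrary illustration choices.

```python
import math
import random


def expected_score(r_a, r_b):
    """Elo expected score (win probability) of side A against side B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def team_mean(values, team):
    """Average of values[i] over the player indices in team."""
    return sum(values[i] for i in team) / len(team)


def simulate(n_players=24, team_size=12, matches=3000, k=8.0, seed=1):
    """Random 12v12 teams every match; ratings updated from team averages."""
    rng = random.Random(seed)
    # Hypothetical hidden skills, evenly spaced from 1000 to 1920.
    true_skill = [1000.0 + 40.0 * i for i in range(n_players)]
    rating = [1500.0] * n_players  # everyone starts at the same rating
    for _ in range(matches):
        picked = rng.sample(range(n_players), 2 * team_size)
        team_a, team_b = picked[:team_size], picked[team_size:]
        # The real outcome depends only on hidden skill.
        p_a = expected_score(team_mean(true_skill, team_a),
                             team_mean(true_skill, team_b))
        a_won = rng.random() < p_a
        # The rating update only sees the current ratings.
        e_a = expected_score(team_mean(rating, team_a), team_mean(rating, team_b))
        delta = k * ((1.0 if a_won else 0.0) - e_a)
        for i in team_a:
            rating[i] += delta
        for i in team_b:
            rating[i] -= delta  # equal team sizes keep the total unchanged
    return true_skill, rating


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


skill, rating = simulate()
print(round(pearson(skill, rating), 2))
```

The printed correlation between hidden skill and final rating shows how well the ratings recovered the ordering. Lowering `matches` or raising `k` is an easy way to see how sample size and update size trade off. Real MWO matches add mech choice, maps and group play, none of which this toy model includes.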

#135 MischiefSC

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPip
  • The Benefactor
  • 16,697 posts

Posted 04 December 2017 - 02:55 PM

SFC174, on 03 December 2017 - 04:38 PM, said:


Group queue has to be a big factor for people with very high WLR (2:1 and above I'd say). Well, group queue and poor matchmaking in Solo I suppose.

If you look at other multiplayer team games like World of Tanks, it is very rare to find people with win rates over 65% (60%+ is already top 1%). A 60% win rate equates to a WLR of 1.5; 66% is roughly a 2.0 WLR.

I don't doubt that the people with very high WLR in MWO are generally much better than average. But above 2.0 I'm looking at bad MM or group as the primary drivers.


I broke 2.0 last season with all but 9 matches in QP. Admittedly I almost exclusively ran long range poke meta (HLL/CERML MAD and Deathstrike stuff, just to see how it played out in the averages. If you're wondering - very, very well) and I'm not even that good.

Because of the Mech Bay variable, I think you'll find really good players absolutely can push an average over 66%. It's just not nearly as common as the Leaderboard indicates.
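The conversion between win rate and win/loss ratio used in this exchange is just p / (1 - p) and its inverse. A small sketch (the function names are illustrative) also shows that a 2.0 WLR is exactly two-thirds, about 66.7%:

```python
def wlr_from_win_rate(p):
    """Win/loss ratio for a win rate p, where 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("win rate must be in [0, 1)")
    return p / (1.0 - p)


def win_rate_from_wlr(wlr):
    """Win rate for a non-negative win/loss ratio."""
    if wlr < 0:
        raise ValueError("WLR cannot be negative")
    return wlr / (1.0 + wlr)


print(round(wlr_from_win_rate(0.60), 2))  # 1.5
print(round(win_rate_from_wlr(2.0), 3))   # 0.667
```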

vandalhooch, on 03 December 2017 - 06:41 PM, said:


No. I'm pointing out that your claim that the leaderboards are the same season after season without the two modes being split is not evidence for anything.



You can't possibly know that. You are just assuming it's true. What you think is true and what you have evidence for are not the same thing.



Calculate for me the adequate sample size given the constant churn of players coming and going from the game over time.



Elo measures the results of 1 v 1 contests under very controlled conditions.

TrueSkill measuring skill at winning matches is what Microsoft CLAIMS it does. What it actually does is give game makers a reasonable enough estimate in order to facilitate matches in a timely manner. The reasonableness of the tool drops the more variables you introduce into the matches. Larger teams make the minimum match number climb. Mech classes and non-random map selection further limit its effectiveness. Mixing solo queue and group queue further destroys the effectiveness of such a system.



Assuming all other factors being equally variable. Which is patently not true.



We had an Elo-ish system before PSR. People complained then as much as they do now. You would think that they might eventually glom on to the fact that there might be some other factor underlying the perceived imbalance in match results besides the matchmaker.

Elo-ish and Trueskill systems always increase your rating on a win. You never, ever win and result in no change.

Here comes that underlying factor you and the others have consistently ignored . . . small available player pool at the time of match creation.

How exactly did you determine it would be 60-100 matches? You know you can't just make numbers up out of nowhere in a discussion about statistics, right?

Limited available player pool at the moment of match creation prevents him/her from consistently "playing against comparable teams."

Unless they team up with Player One and join the group queue and get carried to several wins.

And yet the OP and you keep claiming that your system will create matches with teams of equal skill facing each other. You admit you can't measure skill but your system will make teams that are equal in skill.

HOW DO YOU ACTUALLY MEASURE THE TOTAL SKILL OF EACH TEAM IF YOU CAN'T ACTUALLY MEASURE SKILL?

A proxy measurement of skill is fine. But you guys keep forgetting that it is just a proxy and then you go and do stupid things like total up all the proxy values for each team as if that means anything.


You absolutely can and do have matches in both Elo and TrueSkill that result in a win with 0 change. In fact, that's usually the result for both sides when the matchmaker is unable to make a balanced match and the match plays out exactly as predicted. A difference of about 400 points between the team averages will usually result in a 0 change on a win or loss if the match goes as predicted. It's going to depend on what you use for the K factor (which both have, though the TrueSkill one is far more complicated), but both of them come to 0 on a wide enough spread. So if a team with a 1,400 average beats a team with a 1,000 average, the winning players will get 1 or 0 points, depending on K factor, and the losing team changes 0 points.
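For reference, the textbook Elo update can be checked directly. This is a sketch of the standard formula applied to team averages, not MWO's or TrueSkill's implementation, and the K values of 16 and 32 are illustrative assumptions. With a 400-point gap the favourite's expected score is 1/1.1 ≈ 0.909, so a predicted win moves each side by K × 0.091: about 1.45 points at K = 16 and 2.91 at K = 32, with the underdog losing exactly what the favourite gains. The rounded change only reaches 0 for a K of about 5 or less.

```python
def elo_expected(r_a, r_b):
    """Expected score of a side rated r_a against a side rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, a_won, k):
    """Return the new (r_a, r_b) after one result; the change is zero-sum."""
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - elo_expected(r_a, r_b))
    return r_a + delta, r_b - delta


for k in (16, 32):
    new_a, new_b = elo_update(1400, 1000, a_won=True, k=k)
    print(k, round(new_a - 1400, 2), round(new_b - 1000, 2))
# 16 1.45 -1.45
# 32 2.91 -2.91
```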

If you don't get that the leaderboards showing consistent stats for each player month after month means something, then you need an education in math and statistics that's beyond the scope of what I can do for you on the forums, other than give you Google links to both. Also look up the Law of Averages and the Law of Large Numbers. Then again, if you were willing to do the research to understand what you're talking about, we wouldn't be having this discussion, so it stands to reason that you won't; you'll just continue to argue erroneous points and keep refusing to take the steps to understand what you're discussing.

It's not a 'proxy' measure of skill. It's a performance average in the same environment with the same options available. So you take each player's average performance and generate an average for the team. Ideally you want each team to be composed of players within 150 to 200 points (on an Elo scale) of each other, but periodically you'll end up with unbalanced teams, which will absolutely affect the K factor for each player for the match. That facet, how your scores change on matches outside the desired range, is in part what makes TrueSkill special.
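The team-averaging step and the 150-200 point band can be sketched as below. It assumes each player already has an Elo-scale rating and treats a team as "in range" when its highest and lowest ratings are within the band; the 200-point default, the example ratings and the function name are illustrative, not from PGI.

```python
def team_summary(ratings, band=200.0):
    """Return (average rating, spread, spread within band) for one team."""
    average = sum(ratings) / len(ratings)
    spread = max(ratings) - min(ratings)
    return average, spread, spread <= band


# Hypothetical 12-player teams: two tight teams with equal averages, and a
# "great player plus two potatoes" team whose average looks similar.
team_a = [1500] * 11 + [1650]
team_b = [1450] * 6 + [1575] * 6
team_c = [1100, 1100] + [1500] * 9 + [2000]
for team in (team_a, team_b, team_c):
    print(team_summary(team))
# (1512.5, 150, True)
# (1512.5, 125, True)
# (1475.0, 900, False)
```

The third team shows why the spread is checked as well as the average: its average is close to the others, but it falls far outside the band.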

You are 8.333% of your team, every single game you play. The only thing having group queue in the mix does is skew results UPWARD, out of scope, at the top of the scale. For people who play a mix of group and PUG queue (a minority of players: per PGI, the last time they gave results it was less than 6% of the player population, even fewer than those who play FW), it just increases the number of samples required to give accurate results. Your W/L averaged over 3 months at over 100 matches a month, for example, would account for group queue play that's less than 50% of the player's time.
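To put "over 100 matches a month for 3 months" in perspective, a rough normal-approximation (Wald) interval for an observed win rate is sketched below. It assumes every match is an independent trial with the same win probability, which ignores team composition, group play and mech choice, so it understates the real uncertainty; the 180-of-300 record is hypothetical.

```python
import math


def win_rate_interval(wins, matches, z=1.96):
    """Approximate 95% interval for the true win rate (Wald interval)."""
    p = wins / matches
    se = math.sqrt(p * (1.0 - p) / matches)  # standard error of a proportion
    return p - z * se, p + z * se


low, high = win_rate_interval(wins=180, matches=300)
print(f"{low:.3f} to {high:.3f}")  # 0.545 to 0.655
```

Even at 300 matches the interval is about ±5.5 percentage points around 60%, which is one reason any rating needs a fair number of matches before it settles.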

If/when we do get group and PUG queues split, the only thing it's going to do is shrink the numbers, not the players, on the top few pages, and shift a small percentage of players (less than 10%; again, most never touch group queue) by a few percentage points spread over the results.

Again, it's zero-sum. It's not hard to figure out each player's value.

Xavori, on 04 December 2017 - 12:19 PM, said:

Headless Chicken,

Yes. Scary, great players exist. They are worth more to their team than the 8.3% an average pilot is. They can sway matches.

Unless...

You pair them with 11 mashed potatoes and put them up against a team of good, albeit not great, players. They can only carry so hard, and that kind of match is asking too much. I don't think I'll get much argument about this example.

But let's take it a bit further. What if instead of 11 mashed potatoes, he only gets say 2. The rest of his team is a mix of good and average players and they are fighting a team of good and average players with no potatoes. The great player is going to make up for his potatoes...maybe. He's definitely worth more than both of them put together. But he's still basically leading a team that overall skill-wise, is about the same as his competitors. If the matchmaker does this often enough, the great player paired with potatoes still gets lots of those quality matches where he is as likely to lose as to win.

What's more, if you run enough of these quality matches (and yes, we're talking several hundred), you'll get everyone sorted into a player ranking that the matchmaker can use to make sure the great players have to carry a couple of terribad potatoes every match, because otherwise it isn't fair to the other team to have to face such a killing machine as the great pilot.

Trust me. This exact type of matchmaking and team building predates computer games, and we know it works. I'm not suggesting anything earth-shattering or amazingly new. I'm suggesting the same old system that people have been using since they first started trying to put together teams of differently skilled players and still have them play competitive matches.


Ideally it wants to build teams within a range. Such matches would give a better K factor (how much your score changes based on a win or loss) than matches with a high/low-to-average makeup, but it's still going to work out. It just takes more matches to get players seated correctly.

#136 Asym

    Member

  • PipPipPipPipPipPipPipPipPip
  • Nova Captain
  • 2,186 posts

Posted 04 December 2017 - 03:42 PM

Brain Cancer, on 03 December 2017 - 04:34 PM, said:

Honestly, group and solo stats need to be split up. The guys who farm group with well tuned kill squads are not the same as carryharders in solo that basically earn it in spite of 11 randoms per match.

BINGO ! We have a winner !

Another reality check in this whole, long, convoluted and senseless discussion: PGI are not going to do squat until they have the Solaris platform figured out... By then, the entire spectrum of weapons will be degraded to the point that any statistics gathered mean almost nothing, because changes in weapon effectiveness changed the performance results... Has anyone thought of that? That every nerf will force the algorithm to change, because the damage potentials themselves are changing results? As PGI extends TTK by degrading the effectiveness of the battlespace (changes to precision, weapon effectiveness, hit box calculations, agility degradation, IDF losing even more effectiveness, etc.) and increasing survivability, the MM system will need to account for massive player changes, because some players will play a lot and some very little, and those historic numbers could be significantly unreliable or massively inconsistent...

#137 A Headless Chicken

    Member

  • PipPipPipPipPipPip
  • The Hungry
  • 273 posts
  • LocationImmersed in Stupid.

Posted 04 December 2017 - 06:21 PM

On win-loss ratio: Honestly, all it really takes to turn the tide of battle is that one player. I've seen it happen more often than not. Sure, having a bunch of potatoes on your team may hinder you, but turning it around is entirely possible. I don't mean to brag, but my W/L for Quick Play has held at 2, give or take, for the past few seasons. Group queue is dead in my APAC timezone when I play. Yes, I get decent numbers most games, except the ones where I bring spastic (read: completely unusable) 'Mechs. I don't see why it's hard for others to comprehend that. I once had 14 straight losses in reasonably well-played matches, but what the heck, my ratio is still 2, which means I won 28 elsewhere.

Xavori, on 04 December 2017 - 12:19 PM, said:

You pair them with 11 mashed potatoes and put them up against a team of good, albeit not great, players. They can only carry so hard, and that kind of match is asking too much. I don't think I'll get much argument about this example.

But let's take it a bit further. What if instead of 11 mashed potatoes, he only gets say 2. The rest of his team is a mix of good and average players and they are fighting a team of good and average players with no potatoes. The great player is going to make up for his potatoes...maybe. He's definitely worth more than both of them put together. But he's still basically leading a team that overall skill-wise, is about the same as his competitors. If the matchmaker does this often enough, the great player paired with potatoes still gets lots of those quality matches where he is as likely to lose as to win.

What's more, if you run enough of these quality matches (and yes, we're talking several hundred), you'll get everyone sorted into a player ranking that the matchmaker can use to make sure the great players have to carry a couple of terribad potatoes every match, because otherwise it isn't fair to the other team to have to face such a killing machine as the great pilot.


Kay, I have been asking many questions over and over and found I have been repeating myself and becoming more snarky as it goes.

More importantly, what makes you so sure that the people who are "potatoes", "good" and "legendary" will not end up in the same game after implementation of tr00sk1ll?

Note that we have a small playerbase - I easily see the same names in my games throughout the night. Right now your top 20% of the playerbase has a mix of the above tiers simply because they are active.

Even if PGI develops some amazing magical formula to slot people where their deserved position on the bell curve is (through what metrics is questionable and wow we r goin' 4 real esports bois), nothing much is going to change when only the top 20% of the bell curve plays regularly, which is not many people at all. They'll all either get forced into a game and you whine "potatoes!" which is this thread all over, or nobody gets a game and we all whine and start another rage post or quit.

I have been asking this with no solid answer, and all I keep hearing is TrUeSkIlL wIlL sOlVe EvErYtHInG.



#138 UnofficialOperator

    Member

  • PipPipPipPipPipPipPipPip
  • Ace Of Spades
  • 1,493 posts
  • LocationIn your head

Posted 04 December 2017 - 06:23 PM

A Headless Chicken, on 04 December 2017 - 06:11 PM, said:


Kay, I have been asking many questions over and over and found I have been repeating myself and becoming more snarky as it goes.

Honestly, all it really takes to turn the tide of battle is that one player. I've seen it happen more often than not. Sure, having a bunch of potatoes on your team may hinder you, but turning it around is entirely possible.

More importantly, what makes you so sure that the people who are "potatoes", "good" and "legendary" will not end up in the same game after implementation of tr00sk1ll?

Note that we have a small playerbase - I easily see the same names in my games throughout the night. Right now your top 20% of the playerbase has a mix of the above tiers simply because they are active.

Even if PGI develops some amazing magical formula to slot people where their deserved position on the bell curve is (through what metrics is questionable and wow we r goin' 4 real esports bois), nothing much is going to change when only the top 20% of the bell curve plays regularly, which is not many people at all. They'll all either get forced into a game and you whine "potatoes!" which is this thread all over, or nobody gets a game and we all whine and start another rage post or quit.

I have been asking this with no solid answer, and all I keep hearing is TrUeSkIlL wIlL sOlVe EvErYtHInG.


I'm ashamed to be your countryman...

What hours do you play? Do you only play oceanic/SG primetime?

Then yes of course you will see the same names. More so if you frequently get Oceanic server drops.

Even if the current player base is not optimal, it doesn't mean we shouldn't try to build a better system than this terrible PSR.

And even if it is implemented, it probably won't affect Oceanic server much if they simply loosen the MM so Oceanic can get games going.

However, it would mean better-quality gameplay for the majority of MWO's player base, i.e. the Americans, Canadians and Europeans.

They can also implement other constraints like 3v3, 4v4 or 8v8 to get games going...

Like your signature suggests, you show an amazing lack of imagination.

Edited by UnofficialOperator, 04 December 2017 - 06:23 PM.


#139 A Headless Chicken

    Member

  • PipPipPipPipPipPip
  • The Hungry
  • 273 posts
  • LocationImmersed in Stupid.

Posted 04 December 2017 - 06:32 PM

UnofficialOperator, on 04 December 2017 - 06:23 PM, said:


I'm ashamed to be your countryman...

What hours do you play? Do you only play oceanic/SG primetime?

Then yes of course you will see the same names. More so if you frequently get Oceanic server drops.

Even if the current player base is not optimal, it doesn't mean we shouldn't try to build a better system than this terrible PSR.

And even if it is implemented, it probably won't affect Oceanic server much if they simply loosen the MM so Oceanic can get games going.

However, it would mean better-quality gameplay for the majority of MWO's player base, i.e. the Americans, Canadians and Europeans.

They can also implement other constraints like 3v3, 4v4 or 8v8 to get games going...

Like your signature suggests, you show an amazing lack of imagination.


Go ahead, be ashamed; that's your business, not mine.

I don't check Oceanic because no one plays there. Do you? Are you salty because of it?

I play noon (NA primetime) and nights (OC primetime) on weekends. And yes I see the same names constantly during either session. Problem?

And what is the point of implementing a system that will not change anything? If anything, it maintains the status quo while creating wasted effort and fiddly details for the devs. You can reduce the game sizes, but players within a certain range (the top 80-100%) will still be matched together, unless you want to tier it so stringently that you can only have games with people within +/-5% of your tr00sk1ll. Then nobody gets games.

All for people to cry and nitpick about it again.
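The trade-off described above between tight skill tiers and getting games at all can be put in rough numbers. This sketch assumes "+/-5%" means five percentile points either side of a player, that online players are spread evenly across percentiles, and that a 12v12 match needs 23 other players inside the band; all of these, and the function name, are illustrative assumptions.

```python
def players_needed_online(half_width_pct, others_needed=23):
    """Online players needed so that, on average, `others_needed` fall in the band.

    A band of +/- half_width_pct percentile points covers
    2 * half_width_pct percent of an evenly spread population.
    """
    if half_width_pct <= 0:
        raise ValueError("band must be positive")
    coverage = min(2 * half_width_pct, 100)  # percent of players inside the band
    # Integer ceiling of others_needed / (coverage / 100).
    return (others_needed * 100 + coverage - 1) // coverage


for width in (5, 10, 25):
    print(width, players_needed_online(width))
# 5 230
# 10 115
# 25 46
```

These are averages; near the top or bottom of the curve the band is cut off by the end of the scale, so the real requirement for the very best and very worst players is higher.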

EDIT: If you want to make this personal, by all means, go ahead. I have been trying to be objective, and in true Singaporean style, you come in, rebut my arguments and top it off with a few personal insults. Did you honestly think every human being that is not you is stupid? Thank you for your opinions. I am ashamed to be YOUR countryman.

Edited by A Headless Chicken, 04 December 2017 - 06:48 PM.


#140 sycocys

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • Moderate Giver
  • 7,697 posts

Posted 04 December 2017 - 06:34 PM

My opinion is that the game felt the most balanced (player vs. player wise, excluding tech and bugged code) when there were completely random drops.
You generally ended up with a good mixture of skill levels other than the odd times people were sync dropping like mad.

So really, if you want a more useful MM system, go random on players and MM on the BV of each mech (summed from its base + its loadout) instead. Then you can even get rid of the 1/1/1/1 model, let people run what they like, and have the MM just round up two sides of even tech.

This would also make more sense for setting up drop decks with the current FW system: remove the tonnage restriction and limit the overall firepower/BV you can bring to a match.
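The BV-based matchmaking suggested above can be sketched as a greedy split: take the mechs of the randomly drawn players, score each as base BV plus loadout BV, then hand mechs out largest-first to whichever team currently has the lower total and still has a free slot. MWO has no BV figures of its own, so the numbers below are hypothetical, and greedy balancing is just one simple choice (fast, but not guaranteed to find the most even split). The function name and tuple layout are assumptions.

```python
def balance_by_bv(mechs, team_size=12):
    """Split mechs into two equal-size teams with similar total BV (greedy).

    mechs: list of (name, base_bv, loadout_bv) tuples; BV values are hypothetical.
    Returns ((team_a, team_b), [total_a, total_b]).
    """
    if len(mechs) != 2 * team_size:
        raise ValueError("need exactly two full teams of mechs")
    # Largest combined BV first, so the big mechs are spread out early.
    ordered = sorted(mechs, key=lambda m: m[1] + m[2], reverse=True)
    teams = ([], [])
    totals = [0, 0]
    for mech in ordered:
        open_teams = [t for t in (0, 1) if len(teams[t]) < team_size]
        t = min(open_teams, key=lambda i: totals[i])  # ties go to team A
        teams[t].append(mech)
        totals[t] += mech[1] + mech[2]
    return teams, totals


drop = [("A", 1000, 800), ("B", 1000, 700), ("C", 900, 600),
        ("D", 900, 500), ("E", 800, 400), ("F", 800, 300)]
(team_a, team_b), totals = balance_by_bv(drop, team_size=3)
print([m[0] for m in team_a], [m[0] for m in team_b], totals)
# ['A', 'D', 'E'] ['B', 'C', 'F'] [4400, 4300]
```

For a 12v12 drop you would pass all 24 mechs with the default `team_size=12`.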




