Stats Study: Matchmaker Is Unfair

Balance

344 replies to this topic

#21 Bud Crue

    Member

  • Rage
  • 9,883 posts
  • Location: On the farm in central Minnesota

Posted 19 April 2017 - 10:09 AM

SilentFenris, on 19 April 2017 - 10:00 AM, said:

First - drunkblackstar, thanks for putting the time into this and posting/sharing. I love that you chose to use 3 indicators (Win/Loss, Kill/Death and Match Score) rather than just one.

Unstated Assumptions for the study to be valid:

- Each pilot must be using a mech/build in which they perform comparably to their Leaderboard stats. If they are testing a new build or trying something different "just for fun", their Leaderboard stats would be irrelevant to that match.

- Pilot must be a primarily solo-queue player. Some unit/team players would have an "artificially pumped" Win/Loss stat if they play group queue more often than Solo queue.

- Both sides had equally proficient dropcallers. Easy to know for your team, hard to know for the enemy. The best strategist isn't always the one the team listens to. The "better" team can lose if an idiot but charismatic drop caller is running things. One order properly obeyed/executed can turn a whole match for better or worse.

- Both teams had mechs equally suited to the Map and Gametype chosen.

As far as the results, others have already said PGI's goal is 50/50, or a Win/Loss ratio of 1.00. Good business model or not, it is sustainable. Unfortunately it is not good for match QUALITY: as others have also stated, the Match Maker seems a little heavy-handed, which results in more 12-0 stomps than quality matches.


No...just...that isn't how stats work. I can't.


#22 Savage Wolf

    Member

  • The Wolf
  • 1,323 posts
  • Location: Århus, Denmark

Posted 19 April 2017 - 10:10 AM

SilentFenris, on 19 April 2017 - 10:00 AM, said:

As far as the results, others have already said PGI's goal is 50/50, or a Win/Loss ratio of 1.00. Good business model or not, it is sustainable. Unfortunately it is not good for match QUALITY: as others have also stated, the Match Maker seems a little heavy-handed, which results in more 12-0 stomps than quality matches.

Why is a Win/Loss ratio of 1.00 bad? How else would you measure being matched against people of your own skill level?

Also 12-0 has nothing to do with the matchmaker. That's because of the deathspiral nature of the game. Should naturally happen even with two evenly matched teams.
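
To make the "deathspiral" idea concrete, here is a toy Python simulation of two evenly sized teams in which the side with more mechs alive is more likely to score the next kill. The kill rule, the exponent and the function name are assumptions made up for this sketch; nothing here is measured from MWO, so it only shows how the claim could be tested, not whether it is true.

```python
import random
from collections import Counter

def simulate_match(rng, size=12, exponent=2.0):
    """Play one toy match; return how many kills the losing side scored.

    Each kill goes to a side with probability proportional to
    (mechs alive) ** exponent, so a side that falls behind keeps
    losing ground. The rule and the exponent are assumptions for
    illustration, not measured MWO behaviour.
    """
    a = b = size
    while a > 0 and b > 0:
        weight_a, weight_b = a ** exponent, b ** exponent
        if rng.random() < weight_a / (weight_a + weight_b):
            b -= 1  # side A scores the kill
        else:
            a -= 1  # side B scores the kill
    # Loser's kills = mechs the winning side lost = size - winning survivors.
    return size - max(a, b)

rng = random.Random(2017)  # fixed seed so the run is repeatable
trials = 10_000
loser_kills = Counter(simulate_match(rng) for _ in range(trials))
for kills in sorted(loser_kills):
    print(f"12-{kills}: {loser_kills[kills] / trials:.1%}")
```

Changing `exponent` changes how quickly a lead snowballs (0 makes every kill a fair coin flip), so the share of 12-0 results in this model depends entirely on that assumed value.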

#23 ScrapIron Prime

    Member

  • Ace Of Spades
  • 4,803 posts
  • Location: Smack dab in the middle of Ohio

Posted 19 April 2017 - 10:12 AM

Quick caveat here... 12 matches is not a study. It's statistically interesting, but not significant, because you can't rule out flukes with so few data points.

Now if that same method were applied to 200 games, that would be a more rock-solid study. And perhaps it would find the same thing.
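
To put rough numbers on the 12-versus-200 point, here is a small Python sketch of a Wilson score interval for a proportion. The 11-of-12 count is the one reported in the original post; the 183-of-200 count is a purely hypothetical larger sample with about the same hit rate, not real data.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# 11 of 12 is the count reported in the original post.
low_12, high_12 = wilson_interval(11, 12)
# 183 of 200 is a HYPOTHETICAL larger sample with about the same hit rate.
low_200, high_200 = wilson_interval(183, 200)
print(f"12 matches:  {low_12:.2f} to {high_12:.2f}")
print(f"200 matches: {low_200:.2f} to {high_200:.2f}")
```

The interval for 12 matches comes out several times wider than the one for 200, which is the "can't rule out flukes" caveat in numeric form. It also assumes the matches are independent, which a single player's session may not be.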

#24 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 19 April 2017 - 10:14 AM

ScrapIron Prime, on 19 April 2017 - 10:12 AM, said:

Quick caveat here... 12 matches is not a study. It's statistically interesting, but not significant, because you can't rule out flukes with so few data points.

Now if that same method were applied to 200 games, that would be a more rock-solid study. And perhaps it would find the same thing.

In fact, I wrote about that at the end of the post.

#25 ScrapIron Prime

    Member

  • Ace Of Spades
  • 4,803 posts
  • Location: Smack dab in the middle of Ohio

Posted 19 April 2017 - 10:16 AM

Yep, but I don't think most people picked up on that. =)

#26 Archer Magnus

    Member

  • Philanthropist
  • 218 posts
  • Location: FoCo

Posted 19 April 2017 - 10:26 AM

Great work mate!

#27 Maurice Thorez

    Member

  • 58 posts

Posted 19 April 2017 - 11:32 AM

drunkblackstar, on 19 April 2017 - 09:06 AM, said:

Since the very beginning of MWO there has been an ongoing discussion about the matchmaker. Is it good or bad, is it balanced or biased, does it assemble equal or unequal teams?

Not long ago player data became public via the Leaderboard. That means we can now examine how the matchmaker really works: all we need to do is compare the team compositions with the players' performance as listed on the Leaderboard.

I made such an attempt and want to share results and my conclusions.

The method of the study.

1) I took screenshots of the results screen at the end of each match

2) Using the Leaderboard I checked the stats of every player who participated in the match (the stats for the current gaming season)

3) Then I calculated the average kill/death ratio, win/loss ratio and average match score (MS) for the players of the victorious and defeated teams (of course, this was based not on their performance in this match, but on the whole season).

The scope of the study

I've analyzed 12 matches played in solo queue during 18-19 April, 2017.

The results

The study showed that in the overwhelming majority of cases the victorious team had an initial advantage. It consisted of players who had higher W\L, K\D and average MS. The opposing team had lower average W\L, K\D and MS.

In 6 matches the players of the winning team had higher W\L, K\D and MS.
In 5 matches the players of the winning team had higher values in 2 of the 3 stats (e.g. they had higher W\L and MS, but their K\D was lower).
In only 1 match did the winners have lower average W\L, K\D and MS than the defeated team.

Posted Image

The conclusions

I've reinforced my impression that the outcome of the match is determined by the matchmaker. In fact the matchmaker doesn't assemble equal teams. It makes the teams unequal. One team is set up to win, the other to lose.

Among the 12 analyzed matches there was only one exception to this rule. In roughly 90% of the matches (11 of 12) the result could easily be predicted by examining the players' stats from the Leaderboard.

The question is why the matchmaker is programmed that way. Nobody expects the teams to be 100% equal. But the differences between the teams are sometimes striking.

For example in the match №1 the winners average K\D was 1.6, the losers - 0.92. W\L - 1.3 and 1.06, MS - 260 and 208 respectively.

In the match №2 the winners average K\D was 1.31, the losers - 0.91. W\L - 1.29 and 1.01, MS - 238 and 198 respectively.

This difference is really huge. Those 24 people could have been mixed differently to smooth it out, but instead the matchmaker formed one "strong" and one "weak" team.
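
As an illustration of what mixing the 24 players differently could look like, here is a minimal Python sketch that splits them into two teams of 12 with close average match scores. It is not PGI's matchmaker: the greedy rule, the function name and the match scores are all made up for the example, and it ignores weight class and tier entirely.

```python
def split_balanced(players, team_size=12):
    """Greedy split: strongest first, each player joins the team with the lower total.

    `players` is a list of (name, avg_match_score) tuples. This is an
    illustrative heuristic, not the algorithm MWO actually uses.
    """
    if len(players) != 2 * team_size:
        raise ValueError("need exactly two full teams")
    teams = ([], [])
    totals = [0.0, 0.0]
    for name, score in sorted(players, key=lambda p: p[1], reverse=True):
        # Prefer the currently weaker team unless it is already full.
        pick = 0 if totals[0] <= totals[1] else 1
        if len(teams[pick]) == team_size:
            pick = 1 - pick
        teams[pick].append(name)
        totals[pick] += score
    return teams, [t / team_size for t in totals]

# Made-up season averages for 24 hypothetical pilots (150, 155, ..., 265).
roster = [(f"pilot{i:02d}", 150 + 5 * i) for i in range(24)]
(team_a, team_b), (avg_a, avg_b) = split_balanced(roster)
print(f"team A avg MS: {avg_a:.1f}, team B avg MS: {avg_b:.1f}")
```

A real matchmaker also has to respect queue time, tier and weight class, so a gap between team averages is not by itself proof that a better split was available.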

We can imagine some fantastic machine (one that could be built by someone with programming skills) that predicts the result of the battle at the beginning of the match. The person takes a screenshot of the participants, the screenshot is scanned, the program looks the names up on the Leaderboard, calculates each team's performance and gives the result. In 90% of cases (if not more) it would be correct.
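
Once the names have been looked up, the prediction step of such a machine reduces to a few lines. A rough Python sketch follows, assuming the season stats are typed in by hand as dictionaries; the screenshot scanning and Leaderboard lookup are not shown. The majority-vote rule and the field names are assumptions for illustration, and only the match №1 team averages are taken from the post.

```python
from statistics import mean

STATS = ("kd", "wl", "ms")  # kill/death, win/loss, average match score

def team_averages(team):
    """Average each season stat over a list of per-player dicts."""
    return {s: mean(p[s] for p in team) for s in STATS}

def predict_winner(team_1, team_2):
    """Return 1 or 2 for the team leading on more of the three averages, else None.

    The majority-vote rule is an assumption: the post counts how many
    stats the winners led on but does not define a formal predictor.
    """
    avg_1, avg_2 = team_averages(team_1), team_averages(team_2)
    lead_1 = sum(avg_1[s] > avg_2[s] for s in STATS)
    lead_2 = sum(avg_2[s] > avg_1[s] for s in STATS)
    if lead_1 == lead_2:
        return None  # no call when the stats split evenly
    return 1 if lead_1 > lead_2 else 2

# Simplification: every pilot carries the team average quoted for match No. 1.
winners = [{"kd": 1.6, "wl": 1.3, "ms": 260}] * 12
losers = [{"kd": 0.92, "wl": 1.06, "ms": 208}] * 12
print(predict_winner(winners, losers))
```

Whether such a predictor really reaches 90% would have to be checked on matches it has not seen, since the 11-of-12 figure was counted after the results were known.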

I understand that 12 matches is not enough to make a really representative sample and come to firm conclusions. But I believe it shows the trend. I encourage other players who want to spend the time and effort to make their own examination of the matchmaker.

The data that I used can be found here:
https://docs.google....xejU/edit#gid=0


Not to nitpick (I know this is not meant to be an academic study by any means), but there are a couple of statistical warning signs.

First of all, I would say you need a larger sample than 12 matches. Sure, lopsided average match scores per team are definitely an issue in any individual match, but how often this happens is important too. Those first 2 matches that you highlighted were the most lopsided (in terms of match score difference) from a brief glance at your data. The others were much more even, with only a 20-25 point difference between teams. If anything, the considerable variation in individual team average match score (190-260) indicates some noise in the data and a need for a larger sample.

Second, from double checking a couple of names, it seems you only took data from this season. I would suggest you use stats from further back (perhaps the past 6 months, from the point when they put in the 10-match limit) to get a more balanced view. My own stats can vary quite a bit from month to month based on weight class, chassis, and how meta the builds I am playing are. The same would be true of other players.

I have been tempted to do this myself, but it is very laborious to do this manually for 100+ matches. If I had the technical skills to scrape the leaderboard data, it would be at least quite a bit easier.

Edited by Maurice Thorez, 19 April 2017 - 11:39 AM.


#28 SilentFenris

    Member

  • Bridesmaid
  • 163 posts
  • Location: California

Posted 19 April 2017 - 11:34 AM

SilentFenris, on 19 April 2017 - 10:00 AM, said:

First - drunkblackstar, thanks for putting the time into this and posting/sharing. I love that you chose to use 3 indicators (Win/Loss, Kill/Death and Match Score) rather than just one.

Unstated Assumptions for the study to be valid:

- Each pilot must be using a mech/build in which they perform comparably to their Leaderboard stats. If they are testing a new build or trying something different "just for fun", their Leaderboard stats would be irrelevant to that match.

- Pilot must be a primarily solo-queue player. Some unit/team players would have an "artificially pumped" Win/Loss stat if they play group queue more often than Solo queue.

- Both sides had equally proficient dropcallers. Easy to know for your team, hard to know for the enemy. The best strategist isn't always the one the team listens to. The "better" team can lose if an idiot but charismatic drop caller is running things. One order properly obeyed/executed can turn a whole match for better or worse.

- Both teams had mechs equally suited to the Map and Gametype chosen.

As far as the results, others have already said PGI's goal is 50/50, or a Win/Loss ratio of 1.00. Good business model or not, it is sustainable. Unfortunately it is not good for match QUALITY: as others have also stated, the Match Maker seems a little heavy-handed, which results in more 12-0 stomps than quality matches.


Savage Wolf, on 19 April 2017 - 10:10 AM, said:

Why is a Win/Loss ratio of 1.00 bad? How else would you measure being matched against people of your own skill level?

Also 12-0 has nothing to do with the matchmaker. That's because of the deathspiral nature of the game. Should naturally happen even with two evenly matched teams.


I didn't say anywhere in my post that a Win/Loss of 1.00 is "good" or "bad". I did say poor QUALITY of matches results by having the Matchmaker attempt to maintain a 1.00 ratio on players.

I don't subscribe to your "deathspiral" theory for 12-0 matches. 12-3 and 12-4 losses seem more plausible outcomes when one team overpowers the other as the weaker team begins losing mechs. Even a moderate amount of cooperation and focus-fire should result in at least 1 or 2 casualties on the Winning team. To achieve a 12-0, the Losing team has to make a major blunder, typically by spreading out so their teammates can be picked off a few at a time, each doing minimal damage before their mech is reduced to a smoking wreck.

Edited by SilentFenris, 19 April 2017 - 11:48 AM.


#29 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 19 April 2017 - 11:35 AM

Maurice Thorez, on 19 April 2017 - 11:32 AM, said:

Not to nitpick (I know this is not meant to be an academic study by any means), but there are a couple of statistical warning signs.

First of all, I would say you need a larger sample than 12 matches...

I knew that there would be a lot of people who would say "this data is not enough", so I specifically underlined it in the original post:

drunkblackstar, on 19 April 2017 - 09:06 AM, said:

I understand that 12 matches is not enough to make a really representative sample and come to firm conclusions. But I believe it shows the trend. I encourage other players who want to spend the time and effort to make their own examination of the matchmaker.


#30 Maurice Thorez

    Member

  • 58 posts

Posted 19 April 2017 - 11:37 AM

Savage Wolf, on 19 April 2017 - 10:10 AM, said:

Why is a Win/Loss ratio of 1.00 bad? How else would you measure being matched against people of your own skill level?

Also 12-0 has nothing to do with the matchmaker. That's because of the deathspiral nature of the game. Should naturally happen even with two evenly matched teams.


There is at least one egregious match, the first one he cited with an average match score of 259 versus a team with one of 208. If I recall Tarogato's data set from scraping the leaderboards, that is like matching a team around the top 25 percentile of match score against one in the 75th percentile. That should not happen, that is very unbalanced in terms of skill distribution.

drunkblackstar, on 19 April 2017 - 11:35 AM, said:

I knew that there would be a lot of people who would say "this data is not enough", so I specifically underlined it in the original post:


My apologies, I missed that. Several others beat me to the punch in saying that too.

#31 Trev Firestorm

    Member

  • The Boombox
  • 1,240 posts

Posted 19 April 2017 - 11:57 AM

This sort of topic makes me think about noobmeter for WoT... a good tool for evaluating matchmaker, but creates a destabilizing/toxic atmosphere that can artificially decide matches before they begin based entirely on psychological effects.

#32 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 19 April 2017 - 11:59 AM

40 is a solid statistical sample but 12 isn't terrible. Good enough for a "guidance" look.

I'm just wondering if the virtual lobby can shuffle. As in once it pulls someone to put in a lobby does it shuffle teams or rebuild lobbies.

#33 SilentFenris

    Member

  • Bridesmaid
  • 163 posts
  • Location: California

Posted 19 April 2017 - 12:02 PM

Bud Crue, on 19 April 2017 - 10:09 AM, said:

No...just...that isn't how stats work. I can't.


Ah, I think I see what gave you a headache, Crue. I said "Assumptions for the study to be valid:" when I meant "for the study's conclusion to be valid." You are right that the stats themselves won't be changed by assumptions. I've amended my previous post.

Edited by SilentFenris, 19 April 2017 - 03:42 PM.


#34 Maurice Thorez

    Member

  • 58 posts

Posted 19 April 2017 - 12:08 PM

Trev Firestorm, on 19 April 2017 - 11:57 AM, said:

This sort of topic makes me think about noobmeter for WoT... a good tool for evaluating matchmaker, but creates a destabilizing/toxic atmosphere that can artificially decide matches before they begin based entirely on psychological effects.


Yeah, anything that fuels people suiciding at the start due to low odds is not good for a game. I saw plenty of lopsided matches turned around too, because a 20 percent chance to win is still a 20 percent chance. At least that was the way it was when I last played it 3 or so years ago.

I did find some of the underlying items built into the rating system interesting. I would love to have a large sample of Commando matches from the general player population with average k/d, damage, win/loss, and more, so I could evaluate my results on individual mechs. That data could give a lot more substance to the community's ongoing balance debates too.

#35 Jman5

    Member

  • Littlest Helper
  • 4,914 posts

Posted 19 April 2017 - 12:09 PM

Large sample sizes are more necessary when the effect is subtle. For example, if a coin came up heads 51% of the time, you would have to flip it a lot to conclude that with any confidence.

In this case 11 out of 12 games went as predicted, which isn't subtle at all!
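
As a rough check on this point, the chance of calling at least 11 of 12 matches correctly by pure coin-flip guessing can be worked out directly. The sketch below assumes each match is an independent 50/50 guess, which is a simplification and not something the thread establishes.

```python
from math import comb

def tail_probability(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 11 of 12 predicted correctly, as reported in the original post.
p_value = tail_probability(12, 11)
print(f"P(at least 11 of 12 by chance) = {p_value:.4f}")
```

That works out to 13/4096, or about 0.3%. It says the pattern is unlikely to be luck under those assumptions; it does not by itself say why the teams differed.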

#36 Champion of Khorne Lord of Blood

    Member

  • Shredder
  • 4,806 posts

Posted 19 April 2017 - 12:19 PM

I still stand by a Match maker based either purely off of KDR or at least average match score rather than the XP bar.

#37 C E Dwyer

    Member

  • Legendary Founder
  • 9,274 posts
  • Location: Hiding in the periphery, from Bounty Hunters

Posted 19 April 2017 - 12:21 PM

The matchmaker is designed to try to balance out people's win/loss ratio at 1/1.

It's the mechanic that P.G.I have said is part of the matchmaker. It's been the case since P.G.I stopped using Elo with a separate entry for each weight class and moved to this mutated thing they use now.

Its only other factors are that it tries to put the same number of mechs of each weight class on each side, and tries to keep players of the same tier together, opening the 'gates' after X amount of time to form a match.

#38 Ghogiel

    Member

  • CS 2021 Gold Champ
  • 6,852 posts

Posted 19 April 2017 - 12:27 PM

drunkblackstar, on 19 April 2017 - 10:04 AM, said:

If it was as you propose, then the average stats of team 1 and team 2 would be relatively close.

Not really, because stats have nearly nothing whatsoever to do with tier. The MM doesn't know what players' stats are, and in all likelihood one team out of the 2 will almost certainly have some advantage in stats*

And let's face it, the advantage you are talking about is like a 10-25 point difference in match score between the teams and a ~0.2 difference in W/L, which isn't anything to write home about and is about what everyone should expect.

And the MM doesn't care. But I'd bet you the number of Tier 1-3s in those matches is pretty close to the same spread between teams. And that's all the MM is doing besides matching weight classes. It's just that in T1 there are both 1.1 W/L peasants as well as 4 W/L master races, where a master race player has on average double the match score and W/L of someone the MM thinks is their equal.

Edited by Ghogiel, 19 April 2017 - 12:36 PM.


#39 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 19 April 2017 - 12:29 PM

Trev Firestorm, on 19 April 2017 - 11:57 AM, said:

This sort of topic makes me think about noobmeter for WoT... a good tool for evaluating matchmaker, but creates a destabilizing/toxic atmosphere that can artificially decide matches before they begin based entirely on psychological effects.

It would be based not on psychological effects, but on science.

drunkblackstar, on 19 April 2017 - 09:06 AM, said:

We can imagine some fantastic machine (one that could be built by someone with programming skills) that predicts the result of the battle at the beginning of the match. The person takes a screenshot of the participants, the screenshot is scanned, the program looks the names up on the Leaderboard, calculates each team's performance and gives the result. In 90% of cases (if not more) it would be correct.


#40 Ghogiel

    Member

  • CS 2021 Gold Champ
  • 6,852 posts

Posted 19 April 2017 - 12:35 PM

drunkblackstar, on 19 April 2017 - 12:29 PM, said:

It would be based not on psychological effects, but on science.

Yep, it wouldn't have any psychological impact on games if the predicted outcome was kept hidden until the game actually had finished being played, and I'd expect the result would be nigh identical with like 90% success rate.




