Stats Study: Matchmaker Is Unfair

Balance

344 replies to this topic

#161 Dimento Graven

    Member

  • Guillotine
  • 6,208 posts

Posted 22 April 2017 - 12:15 PM

OP,

This was a good post.

Kudos.

#162 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 22 April 2017 - 01:01 PM

I feel that this discussion went the wrong way.

My original intent was to show what MM is doing, not speculate on why it is doing it.

My point is that MM in its present state (deliberately or not) assembles teams so unequal that the outcome of the match is, in most cases, determined from the first second. In 11 of the 12 cases I studied, accounting for 3 variables (W\L, K\D and MS) predicted the winner. I suppose that's an important finding.
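
To put a figure like 11 of 12 in perspective, here is a minimal sketch of a Wilson score interval for a proportion. The function name is illustrative, z = 1.96 assumes the conventional 95% level, and the calculation treats every match as an independent trial, which is itself an assumption.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return center - half_width, center + half_width

# 11 correct calls out of 12 matches, as reported in this post.
low, high = wilson_interval(11, 12)
print(f"plausible accuracy range: {low:.2f} to {high:.2f}")
```

With a sample this small the range stays wide, so the true hit rate is only loosely pinned down.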

If PGI is unaware of this situation, they should take measures. Ideally, they would build the MM on these 3 variables.

If they already know of this situation, well, now we know too.

It would be nice to include data on players' tiers in the match analysis. Sadly, this can't be done until that information becomes public as a statistic on the leaderboard.

#163 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 22 April 2017 - 01:19 PM

View Postdrunkblackstar, on 22 April 2017 - 01:28 AM, said:

There are 2 possibilities:

1) The MM uses only the tier system.

2) The MM doesn't use the tier system, or uses something along with it (other variables like K\D and W\L).

Either way, the proposed study managed to show something.


What? You can't have your "study" confirm both the null hypothesis and alternative hypothesis at the same time!

Stop trying to defend this. It's nearly meaningless.

Quote

If 1) is correct and MM uses only the tier system, it means it works terribly and assembles unequal teams.


You can't possibly know this because you don't know the Tiers of the players in your sampled matches.

Quote

If 2) is correct and MM doesn't use the tier system or uses something along with it, it means that MM deliberately makes unequal teams.


A sample size of twelve matches can't find anything to be "deliberate."

Quote

I'm not 100% sure which variant is true.


You left off a whole list of other alternatives that are just as well supported by your "data."

Quote

I'm not really into, as you put it, "tinfoil plot government hides 9\11 truth theory".

But I'm far from confident that PGI has told us the whole truth about MM. For example, they were already caught "not telling the whole story" when players found out that tier 1 was matched with tier 4, despite the claim that it can be matched with tier 3 only.


They weren't "caught out" at anything. They simply explained that after a certain period of failing to fill up a match, the matchmaker relaxes its restrictions and looks to fill out the match with players beyond the starting criteria. That's something the matchmaker has always done.

Quote

The MM is not that transparent at all. They don't even publish online numbers, so why do you expect them to reveal the core mechanics of their business project to you?


Why do you think that your non-random sample of twelve matches has "discovered" some nefarious plot?

View PostXetelian, on 22 April 2017 - 02:04 AM, said:


What about the other one that used 100? Both show the same results.


No they don't. Did you even read the other write-up?

Edited by vandalhooch, 22 April 2017 - 01:19 PM.


#164 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 22 April 2017 - 01:25 PM

View PostTarogato, on 22 April 2017 - 04:08 AM, said:

Pardon, but what was cherry picked about my study?

I studied extreme match results (stomp matches, like 12-0) in a consistent environment (solo queue),


No you didn't. You even admitted you didn't. You said you included 12-0, 12-1 and SOME 12-2's. That SOME part is the problem. You only included 12-2's that you FELT were stomps. That's the very definition of cherry picking your data.

Quote

and included all of the data I could get my hands on over the course of 2-3 months. I didn't cherry pick any of my data, and I even provided a link to the data I collected if anybody else wanted to look into it.


Yes you did and you flat out told us you did in your initial post. Either you have to include all the 12-2's or exclude all the 12-2's. You can't include some and exclude others because of how you "felt" the match went.

#165 Dremnon

    Member

  • Big Daddy
  • 60 posts
  • LocationWinnipeg, Manitoba

Posted 22 April 2017 - 01:26 PM

Ratios don't work, period. Someone who has 20 games can have the same W/L or K/D as someone who has 2000. Tier structure doesn't work. Everyone given enough play time will end up at Tier 1. Match score doesn't work. A good player deciding to start over on a new account will have a MS that would be insane vs. people just starting. Someone who plays in group queue the majority of the time and only occasionally drops in solo queue will have different or skewed stats vs. someone who drops in solo queue all the time. Too many variables to make sense of anything.

First, solo queue stats need to be tracked separately from group queue stats if that's not already being done. Second, the one variable that can be controlled and that would level out the matches is the number of matches played. That is your starting point. You can't fake how many matches you've played. You might get there in different ways, but whether it's 20, 50, 100, or 15000, that learning curve (for that account) will be the same. From that point you might have to get into the mechanics of ratios for separation into teams, but you can't rely solely on ratios as the starting point.

#166 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 22 April 2017 - 01:31 PM

View Postdrunkblackstar, on 22 April 2017 - 04:50 AM, said:

Maybe you noted that in the original post I didn't look into PGI's possible intentions. That's not what I was looking for.

As Marx said about ideology, “They don't know what they are doing, but they are nonetheless doing it”. I don't know why PGI are doing it, but nonetheless they are doing it:)

My goal was to show that there are differences between skill level of opposing teams.


No one said there wouldn't be. BTW, you going to explain how your chosen metrics indicate skill level?


Quote

This spread is rather big (nobody expects perfect 100% equality), so in many cases the outcome of matches is determined.


I don't think that word means what you think it means. The outcome of every match is always determined. It's determined by the final score of the match. It's not like the match ends but our screens go blank and we never know who won, an undetermined match.

Quote

These results correlate with the results of another study that used much more data.


No it doesn't. Show me your calculation for the correlation coefficient you are basing this conclusion on.

Quote

The rest are speculations. My study can't answer, what's the meaning of life, or if God exists, or what PGI employees eat for breakfast, but it doesn't make this study less important.


It isn't "a study" at all. Studies are rigorous. Studies are well designed. Studies make use of proper statistical analysis.

#167 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 22 April 2017 - 01:39 PM

View Postvandalhooch, on 22 April 2017 - 01:19 PM, said:


What? You can't have your "study" confirm both the null hypothesis and alternative hypothesis at the same time!

Stop trying to defend this. It's nearly meaningless.

Posted Image


Posted ImagePosted Image

#168 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 22 April 2017 - 01:40 PM

View Postdrunkblackstar, on 22 April 2017 - 01:01 PM, said:

I feel that this discussion went the wrong way.

My original intent was to show what MM is doing, not speculate on why it is doing it.


1 - You didn't show anything despite your intentions.

2 - You definitely speculated on reasons for your imaginary biased matchmaker existing.

Quote

My point is that MM in its present state (deliberately or not) assembles teams so unequal that the outcome of the match is, in most cases, determined from the first second. In 11 of the 12 cases I studied, accounting for 3 variables (W\L, K\D and MS) predicted the winner.


That isn't how predictive models work. You can't use your model (W/L, K/D and MS metrics) to predict the outcomes of matches that you used to create the model.
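
A minimal sketch of that point: build or tune whatever rule you like on one set of matches, then score it only on matches it has never seen. The match records, field names and the `predict_by_wl` rule below are hypothetical placeholders, not real MWO data.

```python
import random

def split_matches(matches, holdout_fraction=0.3, seed=0):
    """Shuffle once, then split into a fitting set and a held-out set."""
    shuffled = list(matches)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def accuracy(predict, matches):
    """Fraction of matches where predict(match) equals the recorded winner."""
    if not matches:
        raise ValueError("no matches to score")
    hits = sum(1 for m in matches if predict(m) == m["winner"])
    return hits / len(matches)

def predict_by_wl(match):
    """Hypothetical rule: the side with the higher average W/L wins."""
    return "A" if match["avg_wl_a"] >= match["avg_wl_b"] else "B"

# Made-up records, purely to show the call pattern.
matches = [
    {"avg_wl_a": 1.2, "avg_wl_b": 0.9, "winner": "A"},
    {"avg_wl_a": 0.8, "avg_wl_b": 1.1, "winner": "B"},
    {"avg_wl_a": 1.0, "avg_wl_b": 1.3, "winner": "A"},
    {"avg_wl_a": 1.4, "avg_wl_b": 1.0, "winner": "A"},
    {"avg_wl_a": 0.9, "avg_wl_b": 1.0, "winner": "B"},
    {"avg_wl_a": 1.1, "avg_wl_b": 0.7, "winner": "A"},
]
fit_set, holdout_set = split_matches(matches)
# Tune weights or thresholds on fit_set only; report accuracy on holdout_set.
print("held-out accuracy:", accuracy(predict_by_wl, holdout_set))
```

Only the held-out number says anything about how the rule would do on future matches.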

Quote

I suppose that's an important finding.

If PGI is unaware of this situation, they should take measures. Ideally, they would build the MM on these 3 variables.


How exactly would that work? What is the weighting system you will use to combine the three metrics? Does it account for a new player having high ratios due to being pitted against other new players? What about players who game the system by hiding to preserve KDR or players who tend to pilot lights that result in lower average match scores?

Quote

If they already know of this situation, well, now we know too.

It would be nice to include data on players' tiers in the match analysis. Sadly, this can't be done until that information becomes public as a statistic on the leaderboard.


There are pluses and minuses to the idea of making Tiers public either before or after each match.

Edited by vandalhooch, 22 April 2017 - 01:41 PM.


#169 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 22 April 2017 - 01:45 PM

View Postvandalhooch, on 22 April 2017 - 01:40 PM, said:


1 - You didn't show anything despite your intentions.

2 - You definitely speculated on reasons for your imaginary biased matchmaker existing.

Posted Image

#170 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 22 April 2017 - 02:06 PM

View Postvandalhooch, on 22 April 2017 - 01:54 PM, said:


You failed. Nice of you to admit it in a passive aggressive manner.

Please, stop spamming this thread.

#171 Dimento Graven

    Member

  • Guillotine
  • 6,208 posts

Posted 22 April 2017 - 02:10 PM

View Postvandalhooch, on 22 April 2017 - 01:54 PM, said:

You failed. Nice of you to admit it in a passive aggressive manner.

Actually I got his point.

The point is, Match Maker is currently NOT doing its intended job.

The intended job, I believe we all agree is to create as evenly matched teams as possible, so that each team has as near a 50/50 chance of winning/losing as is reasonable.

>>IF<< what MM is doing is just taking the Tier value, and weight class of 'mech selected, and assembling teams, THAT is NOT enough to do its job.

As pointed out (badly by most, so I will put it in the proper terms), as long as you are winning with a high enough score, more often than you are losing with a bad enough score, EVENTUALLY, it is possible to grind your way to Tier 1, and still be an 'average' player.

(Note: Let's stop lying to ourselves with the BS of "eventually everyone will be Tier 1". There are people who have been playing this game for YEARS who have NOT made it to Tier 1. Some people are just bad enough, or have terribly bad ISPs, or have absolute potato computer systems, or some physical disability, that they can't help but lose badly more often than not. Those people will NEVER make it to Tier 1.)

As the OP, and others who have posted similar threads have stated, MM should be using other stats, other than just Tier, to assemble teams to create more balanced matches.

This isn't all that hard to understand, infer, or otherwise comprehend.

Edited by Dimento Graven, 22 April 2017 - 02:11 PM.


#172 Tarogato

    Member

  • Civil Servant
  • 6,558 posts
  • LocationUSA

Posted 22 April 2017 - 02:26 PM

View Postvandalhooch, on 22 April 2017 - 01:25 PM, said:

No you didn't. You even admitted you didn't. You said you included 12-0, 12-1 and SOME 12-2's. That SOME part is the problem. You only included 12-2's that you FELT were stomps. That's the very definition of cherry picking your data.

Yes you did and you flat out told us you did in your initial post. Either you have to include all the 12-2's or exclude all the 12-2's. You can't include some and exclude others because of how you "felt" the match went.


Ah, I see what you're saying. See... I considered the requirements for a match being included to be boolean. Either it felt like a stomp, or it didn't. To me, a 12-0 or a 12-1 always feels like a stomp. But a 12-2... sometimes it is, and sometimes it isn't. If it makes you feel any better... I probably had less than a dozen 12-2 matches, and they certainly would not have swayed the results by much at all. I noticed strong correlations when my sample size was only 30 matches... but I wanted to collect at least 100, so I did so, and the patterns only seemed to grow stronger.

#173 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 22 April 2017 - 02:27 PM

View PostDimento Graven, on 22 April 2017 - 02:10 PM, said:

Actually I got his point.

The point is, Match Maker is currently NOT doing its intended job.


I know you want that to be true but nothing this guy has posted supports such a conclusion.

Quote

The intended job, I believe we all agree is to create as evenly matched teams as possible, so that each team has as near a 50/50 chance of winning/losing as is reasonable.


Nope. The matchmaker's job is to create matches as quickly as possible while trying to reduce the level of mixing of experienced and inexperienced pilots in any particular match.

Your definition of "evenly matched teams" is an impossibility for a human to create even given infinite time let alone a relatively simple algorithm.

Quote

>>IF<< what MM is doing is just taking the Tier value, and weight class of 'mech selected, and assembling teams, THAT is NOT enough to do its job.


Why not?

Quote

As pointed out (badly by most, so I will put it in the proper terms), as long as you are winning with a high enough score, more often than you are losing with a bad enough score, EVENTUALLY, it is possible to grind your way to Tier 1, and still be an 'average' player.


So?

Quote

(Note: Let's stop lying to ourselves with the BS of "eventually everyone will be Tier 1". There are people who have been playing this game for YEARS who have NOT made it to Tier 1. Some people are just bad enough, or have terribly bad ISPs, or have absolute potato computer systems, or some physical disability, that they can't help but lose badly more often than not. Those people will NEVER make it to Tier 1.)

As the OP, and others who have posted similar threads have stated, MM should be using other stats, other than just Tier, to assemble teams to create more balanced matches.


When you come up with the appropriate metrics for determining a player's true skill be sure to file for a patent on it because every single game programmer on the planet will come knocking on your door.

Quote

This isn't all that hard to understand, infer, or otherwise comprehend.


It does seem hard for some to comprehend that what they think of as easy to define, player skill level, is nothing of the sort.

#174 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 22 April 2017 - 02:33 PM

View PostTarogato, on 22 April 2017 - 02:26 PM, said:

Ah, I see what you're saying. See... I considered the requirements for a match being included to be boolean. Either it felt like a stomp, or it didn't. To me, a 12-0 or a 12-1 always feels like a stomp. But a 12-2... sometimes it is, and sometimes it isn't.


Cherry picking. Now tell me why a 12-3 would never be considered a stomp. Be sure to back it up with something more than "I just feel that way." Statistics don't work off of what you "feel" about things no matter how clever you think your use of the term boolean is.

Quote

If it makes you feel any better... I probably had less than a dozen 12-2 matches, and they certainly would not have swayed the results by much at all.


You can't possibly know that because you didn't actually run those calculations did you?

Now, if you drop all the 12-2's and re-run the test, you are guilty of cherry picking your results. You need to decide if all 12-2's count or don't count BEFORE you begin collecting any data.
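
One way to take "feel" out of it is to write the inclusion rule down as code before collecting anything. A minimal sketch, where the cutoff of 2 kills for the losing side is an arbitrary assumption chosen only for illustration:

```python
MAX_LOSER_KILLS = 2  # assumed cutoff; fix it before collecting data, then never change it

def is_stomp(winner_kills, loser_kills, max_loser_kills=MAX_LOSER_KILLS):
    """A match is included only if it meets the rule set in advance."""
    return winner_kills == 12 and loser_kills <= max_loser_kills

# Made-up final scores as (winner_kills, loser_kills) pairs.
scores = [(12, 0), (12, 1), (12, 2), (12, 3), (12, 11)]
included = [s for s in scores if is_stomp(*s)]
print(included)
```

Every 12-2 is either in or out under the same rule, so nobody's impression of a particular match can move it.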

Quote

I noticed strong correlations when my sample size was only 30 matches... but I wanted to collect at least 100, so I did so, and the patterns only seemed to grow stronger.


Peeking at the data during collection is also a statistical error. Your "study" is just as meaningless as his.

#175 Dimento Graven

    Member

  • Guillotine
  • 6,208 posts

Posted 22 April 2017 - 02:44 PM

View Postvandalhooch, on 22 April 2017 - 02:27 PM, said:

I know you want that to be true but nothing this guy has posted supports such a conclusion.

Except I know how to read and comprehend a point not specifically stated.

Quote

Nope. The matchmaker's job is to create matches as quickly as possible while trying to reduce the level of mixing of experienced and inexperienced pilots in any particular match.

Your definition of "evenly matched teams" is an impossibility for a human to create even given infinite time let alone a relatively simple algorithm.

You're wrong, I'm right. I'm using PGI's own words: "as evenly matched as possible".

Yes, there's a time-limiting factor in the MM that is a portion of the calculation; however, the goal isn't just to slap some people together as fast as possible. We originally started that way in MWO, and there's a very loud and vocal minority who did nothing but ceaselessly post on the forums about it until we ended up with the original Elo-based MM abortion.

Then we all bitched about that for years until they came out with the Tier system, which was supposed to do a better job of factoring in actual individual performance but, due to its "win weighted" scoring methodology, couldn't help but fail, too.

Quote

Why not?

So?

Yeah, now it looks like you're just here to argue, not actually "discuss" anything with an intent to glean some sort of solution.

BUT I'll restate it: The Tier scoring system is slanted, GREATLY, towards heavily scoring wins much, much, MUCH more so than it penalizes losing. It makes it such that even a low-average player can, EVENTUALLY, make it to Tier 1, and be grouped with people who are SIGNIFICANTLY better than himself.

If the points lost for losing were scaled to be, at least, on par to the points awarded for winning, the skills in the various Tiers would be more stratified. People who consistently play at a Tier 4 level would probably still be Tier 4.
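
A back-of-the-envelope sketch of that argument, assuming a flat number of points per win and per loss. The point values are made up for illustration and are NOT PGI's actual numbers, and the real system also depends on match score, which this ignores.

```python
def expected_change_per_match(win_rate, points_per_win, points_per_loss):
    """Average rating movement per match; points_per_loss is a positive penalty."""
    return win_rate * points_per_win - (1 - win_rate) * points_per_loss

def break_even_win_rate(points_per_win, points_per_loss):
    """Win rate at which the expected change is exactly zero."""
    return points_per_loss / (points_per_win + points_per_loss)

# Hypothetical point values: wins weighted 3x versus evenly scaled.
print(expected_change_per_match(0.5, points_per_win=3, points_per_loss=1))
print(expected_change_per_match(0.5, points_per_win=2, points_per_loss=2))
print(break_even_win_rate(points_per_win=3, points_per_loss=1))
```

Under the lopsided values a 50% player still climbs on average, while under the even values the same player stays put.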

As it is now, you have to be incredibly bad, or have a horribly unreliable ISP, or completely craptastic computer, or some disability, to NOT go up in rank. There are people playing under these conditions, hence we have a small subset of players who have never, and probably never will (without some dramatic change) be Tier 1.

If the Tier-ing system is failing, there's no possible way for MM to do its job. MM can't do its job using Tiers, so it should be using other data, W/L, MS, etc. all seem like a good place to start.

Quote

When you come up with the appropriate metrics for determining a player's true skill be sure to file for a patent on it because every single game programmer on the planet will come knocking on your door.

You mean besides win/loss ratio, plus average damage, plus average match score?

You mean using THOSE 3 numbers might not be better than say, just ONE number of "1","2","3","4", or "5"?

As was mentioned, there was another thread by someone else that I read a long while back, and if I remember right he took the people dropped in the match, rearranged them using 'mech weight and those 3 numbers, and came out with sides that were absolutely MORE balanced than they were when MM originally assembled them.
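
A rough sketch of what such a rearrangement could look like. The single composite `rating` per pilot (however it would be weighted from W/L, average damage and match score) is an assumption, as is the greedy strategy; this is not the method from that other thread, nor anything PGI is known to use.

```python
def balance_teams(ratings, team_size):
    """Greedy split: strongest pilots first, each to the currently weaker team with room."""
    if len(ratings) != 2 * team_size:
        raise ValueError("need exactly 2 * team_size players")
    team_a, team_b = [], []
    ordered = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    for name, rating in ordered:
        a_total = sum(r for _, r in team_a)
        b_total = sum(r for _, r in team_b)
        a_has_room = len(team_a) < team_size
        b_has_room = len(team_b) < team_size
        if a_has_room and (a_total <= b_total or not b_has_room):
            team_a.append((name, rating))
        else:
            team_b.append((name, rating))
    return team_a, team_b

# Made-up composite ratings for four hypothetical pilots.
ratings = {"pilot1": 9.0, "pilot2": 7.0, "pilot3": 4.0, "pilot4": 2.0}
team_a, team_b = balance_teams(ratings, team_size=2)
print(team_a, team_b)
```

Greedy assignment does not guarantee the best possible split, and it ignores 'mech weight class entirely, so treat it as a starting point.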

Quote

It does seem hard for some to comprehend that what they think of as easy to define, player skill level, is nothing of the sort.

Uh huh, and yet a potato is very easy to spot when observing his play.

#176 Tarogato

    Member

  • Civil Servant
  • 6,558 posts
  • LocationUSA

Posted 22 April 2017 - 03:05 PM

View Postvandalhooch, on 22 April 2017 - 02:33 PM, said:

Cherry picking. Now tell me why a 12-3 would never be considered a stomp. Be sure to back it up with something more than "I just feel that way." Statistics don't work off of what you "feel" about things no matter how clever you think your use of the term boolean is.

You can't possibly know that because you didn't actually run those calculations did you?

Now, if you drop all the 12-2's and re-run the test, you are guilty of cherry picking your results. You need to decide if all 12-2's count or don't count BEFORE you begin collecting any data.


I guess it would make you feel better if I did that. I didn't note which matches were which, but I still have all of the screenshots. With a little effort, I could go back and remove the 12-2's.


And btw, cherry-picking would have been if I analysed the numbers from certain matches, and rejected them if they didn't support my hypothesis. That would be cherry picking. The 12-2 matches I included or excluded, I did so without knowing the numbers that they would provide for the data - it was all candid.


Quote

Peeking at the data during collection is also a statistical error. Your "study" is just as meaningless as his.


There is nothing wrong with peeking at the data. What is wrong is "selective stopping" - deciding to stop collecting data when you see there is a "desirable" result.

Edited by Tarogato, 22 April 2017 - 03:09 PM.


#177 Zergling

    Member

  • The Angel
  • 2,439 posts

Posted 22 April 2017 - 03:18 PM

View Postdrunkblackstar, on 22 April 2017 - 01:01 PM, said:

the outcome of the match in most cases is determined from the first second.


That's a load of bull.

Because if that were the case, W/L would largely be due to random luck, which would make it impossible for players to have high or low W/L over large numbers of battles.

E.g., if the odds of a win/loss were 50/50, then the 700 wins to 510 losses I have on the Quick Play leaderboard, which is 100% solo queue play, would have a 1 in 38.4 million chance of occurring.
And there are many players with far higher W/L and far more battles than me, which lowers the odds of their W/L occurring due to random luck dramatically.
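
For anyone who wants to check that kind of figure, here is a minimal sketch of the exact calculation, assuming every battle is an independent 50/50 coin flip. The function name is illustrative, and the result depends on how the tail is defined (this counts "700 or more" wins), so it need not match the number quoted above exactly.

```python
from math import comb

def fair_coin_upper_tail(wins, games):
    """Exact P(X >= wins) when every game is an independent 50/50 coin flip."""
    favourable = sum(comb(games, k) for k in range(wins, games + 1))
    # Integer arithmetic up to the final division avoids float overflow.
    return favourable / 2**games

p = fair_coin_upper_tail(700, 1210)
print(f"P(at least 700 wins in 1210 games) = {p:.3g}, about 1 in {1 / p:,.0f}")
```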

Edited by Zergling, 22 April 2017 - 03:29 PM.


#178 SFC174

    Member

  • The Pharaoh
  • 695 posts

Posted 22 April 2017 - 03:24 PM

Taro and Blackstar, I really don't think you're going to get anywhere with Vandal. Unless you have full access to all match data and player data from PGI and can run a full analysis on all games played over a sample period, he's going to pick holes in anything you put out there (and we all know PGI will never give us that data).

He's certainly not looking to help you do a better job of figuring out whether or not the matchmaker does a good job of balancing teams. I appreciate the efforts you've made to analyze what data you have, but don't waste your time on this argument. Keep collecting data if you can and do what analysis is possible.

#179 Tarogato

    Member

  • Civil Servant
  • 6,558 posts
  • LocationUSA

Posted 22 April 2017 - 03:29 PM

View PostZergling, on 22 April 2017 - 03:18 PM, said:


That's a load of bull.

Because if that were the case, W/L would largely be due to random luck, which would make it impossible for players to have high or low W/L over large numbers of battles.


No, it's true. It's pretty easy to predict the outcome of a match before it occurs just by looking at the cumulative WLR of both teams. Sometimes you can do it just by looking at names of higher level players you recognise.

#180 Zergling

    Member

  • The Angel
  • 2,439 posts

Posted 22 April 2017 - 03:42 PM

View PostTarogato, on 22 April 2017 - 03:29 PM, said:

No, it's true. It's pretty easy to predict the outcome of a match before it occurs just by looking at the cumulative WLR of both teams. Sometimes you can do it just by looking at names of higher level players you recognise.


Ok fair enough, I'm talking more about personal W/L than predicting individual battle results.

It certainly is possible to predict match results to some extent if the skill levels of players on each team are known, but personal W/L doesn't come down to 'pre-determined matchmaker results', as each player is able to influence the battles to cause their W/L to differ from 1.00.

I mean, if the matchmaker was pre-determining battle results, each battle would have a 50/50 chance of a win/loss, which makes players with W/L substantially different from 1.00 over a substantial number of battles impossible.

Edited by Zergling, 22 April 2017 - 03:43 PM.





