Stats Study: Matchmaker Is Unfair

Balance

344 replies to this topic

#221 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 23 April 2017 - 10:18 AM

Tarogato, on 23 April 2017 - 05:24 AM, said:



I'm just gonna leave this here...

You very clearly know a lot more about this than any of us. I'm not being facetious... I've actually already learned a few things from you and I'm not going to pretend I didn't. I'm not a statistician, or scientist, I'm just a dude with copious spare time and a big crush on curiosity - I have a lot to learn still, and I accept that I will make mistakes and can unintentionally misrepresent data.


Totally cool and a very, very healthy attitude to have.

Quote

Now, I did limit the scope of my "study" for practical purposes. I entered my data manually, and would need a working OCR to expand the scope enough to even attempt to assuage the concerns you've raised. But even if I did that, I could still fall victim to more scientific inadequacies due to my growing but decidedly limited knowledge.


Sure. Properly collecting unbiased data is a skill that young science students are specifically taught and practice. Even so, veteran scientists still fall prey to unforeseen biases from time to time.

Quote

Now, I've already spent maybe... two hours? ... just reading and replying to this thread, and I suspect perhaps you might have as well. It would be absolutely wonderful if somebody like you, with superior knowledge, experience, and ideals... spent this sort of time doing this kind of work. Showing how it's done properly, and what ACTUAL objective conclusions can be definitively drawn. I'd love that! I'd really like to see somebody one-up me. But nobody wants to spend the time! I'm fallible, duh! I wish more people cared enough to actually put in this kind of work, rather than just bickering about hypotheticals on the forums like so many do.


Man, the hours you must have spent going through and matching up all the pilots from every match with their appropriate entries from the leaderboards . . . Gives me a headache just thinking about it.

Your technique is on to something that could be useful if you set up the proper protocols for data collection from the start.

1 - solo queue only (nice job on that one)

2 - skirmish only (different modes might give advantages to, or take them away from, stronger or weaker teams)

3 - ideally one map only - however if you collect a large enough sample size (say 500 matches to be safe) then you should end up with a similar frequency of maps to the maps used in the leaderboard calculations

4 - unfortunately the last one kind of sinks the whole endeavor: leaderboard data needs to be broken into solo vs. group queue in order to be most unbiased. You can't rate players as strong or weak based on their performance in a different environment or worse based on an unknown mixing of play in both environments. If you had access to the separate group vs. solo leaderboard stats then you might be able to demonstrate that group queue data is still appropriate for assigning strength within solo queue matches. In fact, I suspect you would probably find that to be the case. But, without having done that initial analysis any conclusions drawn from the pooled leaderboard data must be held up as suspect.

Quote

What I mean to say is... the work I've done here is pretty much the best we have so far. And I know it's not great work, it's amateur, and you've pointed out flaws quite clearly. But who wants to step up and do more, better? Or the golden question... why should WE have to, when it's PGI's job? At the end of the day, I'll be happy if anybody shows conclusive enough evidence to merit PGI's attention, and prod them into investigating themselves, and addressing our concerns as a community that we feel the matchmaker could do a better job with the hand that it is dealt.


Pretty hard for anyone to do since we lack the access to the appropriate data. But that's going to be true of every gaming company in existence.

Quote

Sorry, this might have come off as a bit like "I put in the work even if it's shoddy, and you didn't put in any, therefore I'm above you". I realise that... I apologise, I don't intend that by any means. But I'm just not sure where to go from here. You're being very critical and argumentative, when you have an opportunity to be critical and contributive. If none of this research is valid, for various reasons, what *can* we do?


I'm not being critical of the hard work you did. I'm being critical of the inappropriate use to which it's being applied.

See above for a basic layout as to what can be done given the limited resources we have.

Quote

Or is everything you have in mind beyond that which is reasonable practical for us, the playerbase, and thus futile? If we have a feeling that the matchmaker could be better, how should *we* go about showing to PGI that our concerns have merit?


See above and be prepared for a very, very long slog.

#222 Dimento Graven

    Member

  • Guillotine
  • 6,208 posts

Posted 23 April 2017 - 10:41 AM

vandalhooch, on 22 April 2017 - 11:34 PM, said:

I see someone pretending that they have done some science and then trying to use their pretend "study" to sway the opinions of others, I speak up. You don't like it, don't pretend to be doing science.
Except you keep making claims about 'insufficient data' and how things haven't been done to a standard you didn't define until like, page 12? While at the same time, offer no research of your own to counter what's been brought to the fore by two different individuals working independently of one another.

Even with the small sample sizes, it shows patterns which bear some further looking into.

Quote

Define a "balanced match." Give us a definition that can objectively be measured so that we can all look at the same data and draw the same conclusion as to what is and is not a "balanced match."
LOL! See, this is the **** you do. "DEFINE THIS... DEFINE THAT..." When it's been defined ad nauseam, one more time for the 'reading comprehension impaired':

A "balanced match" is when both sides have teams that have skill levels as evenly matched as possible. If one side ends up with significantly more players who typically have higher W/L ratios, higher avg. match scores, and higher average damage dealt numbers than the other side, both sides ARE NOT as evenly matched as they could be.

And your statements about "map effects" and the like are so much obfuscatory denial BS.

Exactly what kind of map is going to significantly improve the play of BAD players??!?!? What kind of map is going to make GOOD players, do badly and let BAD players do well?!?!?!?

I believe that piece of data is irrelevant, I'd be willing to bet pretty much anything that it is a difference that makes no difference.

Quote

Do really terrible players move up in Tier rapidly or slowly? How many matches will it take a terrible player (in your opinion) to reach Tier 1? How many matches does that player play in a month? How many months to get to Tier 1? Upon reaching Tier 1 is that player the same level of terrible (in your opinion) as they were when they started out?

Got any data to back up your claim?
Yet again, another example of you arguing for the sake of argument. You don't need data, you look at how it operates:

If a player is playing badly enough to where they are more often than not losing, and when they lose they score badly enough to take the largest PSR penalty, and even when they win they are scoring only enough to come out with NO PSR bonus, they'll NEVER increase their PSR ranking, and in fact, eventually bottom out at T5, no green in the bar.

That's how the PSR system is setup, and it takes a special kind of "bad" to do it the way that the PSR system works.

Now, if a player wins and loses an equal amount, and when they win they're scoring enough to at least get the most minimum bump, and when they lose, they're scoring enough to either break even or take the minimum PSR penalty, they'll go up in rank, slowly.

It's how PGI documented the PSR system, you don't need raw data to prove it, UNLESS, you believe that PGI lied (and that's actually not an unreasonable belief).
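
The mechanism described above can be sketched as a small simulation. The point values in `PSR_DELTA` are hypothetical placeholders, not PGI's actual numbers, and the three score bands are an assumption made for illustration; only the general shape (win or loss combined with match score decides the direction of the change) comes from the description. The sketch also ignores any floor or ceiling on the rating.

```python
import random

# Hypothetical PSR deltas (NOT PGI's real values), keyed by (won, score_band).
PSR_DELTA = {
    (True, "high"): +3,   # win with a good match score: biggest bump
    (True, "mid"): +1,    # win with a mediocre score: minimum bump
    (True, "low"): 0,     # win with a poor score: no change
    (False, "high"): 0,   # loss with a good score: break even
    (False, "mid"): -1,   # loss with a mediocre score: minimum penalty
    (False, "low"): -3,   # loss with a poor score: largest penalty
}

def simulate_psr(win_rate, band_on_win, band_on_loss, matches=1000, start=0, seed=1):
    """Return the rating after `matches` games for a player with fixed habits."""
    rng = random.Random(seed)
    psr = start
    for _ in range(matches):
        won = rng.random() < win_rate
        band = band_on_win if won else band_on_loss
        psr += PSR_DELTA[(won, band)]
    return psr

# Loses more often than not, largest penalty on losses, no bonus on wins.
struggling = simulate_psr(0.4, "low", "low")
# Wins and loses equally, minimum bump on wins, break-even on losses.
average = simulate_psr(0.5, "mid", "high")
print(struggling, average)
```

Under these assumed values the first player can only ever drift down and the second can only ever drift up, which is the pattern described; whether the live system behaves this way depends on the real point values.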

Quote

Or be a new player, or play very, very infrequently, or do like some in these forums claimed to do and purposefully throw matches in order to stay in Tier 4/5.
Outliers that do more to prove my point than call it into question, but whatever.

Quote

Weight class balanced is not tonnage balanced.
True that, absolutely.

Quote

Nope, nope, nope. That 80% you are quoting from Tarogato does not mean what you think it means.

Tarogato only included stomps (12-0 and 12-1) in his analysis. The 80% was the percentage of stomps that had the "higher level" team winning. Note, 20% of those stomps were STOMPS BY THE "WEAKER TEAM." Not just that the "weaker team" won, they STOMPED the stronger team. Tarogato's data did not show that the majority of matches have such a discrepancy because he did not collect data on every match! How many matches do "weaker teams" according to Tarogato end up winning the match? We have no idea because he didn't collect that data.
Except when he included more data, it didn't reduce the trend, in fact, the trend went up, did it not?

You don't like his sample size, and later on went on to say 1,000 matches might be enough data.

I say you should get right on that and see what you come up with.

Quote

Define balanced in an objective way.
There you go again. Did so numerous times, and above, scroll up to read it.

Quote

That may be what you imagined the new matchmaker was supposed to do but you can't program any matchmaker to balance player skills. If you could, you'd make a fortune from the gaming industry and Vegas would definitely be interested in using your system for their sports betting. You can create a system that reduces the general level of imbalance by using proxy metrics for "skill" but there is no goal of "interesting" and "fun" matches because those aren't objective things that can be calculated.
WOW!!! You're telling me Vegas is NOT already making BILLIONS of dollars on sports betting?!?!?!!?!??!??

Quote

Please don't try to lecture me about how science works. Do you even know what I do for a living?
At this point I'm fairly convinced you are a professional climate change denier.

Possibly employed by Exxon...

Quote

If your observations are biased, and your data is insufficient then you end up not understanding anything beyond what you wanted to be true from the start.
The data points are:

Match Outcomes
Players performance statistics

The observation is:
Taking the average of the player statistics for both sides and comparing them against the match outcome.

Not sure how that's 'biased', it's raw numbers.

The CONCLUSION might be biased, but not unreasonably so, because the numbers we have show us that when a stomp occurs MM has inadvertently stacked one side up with higher performing players than the other.

Ergo, MM is not always doing its job, it's only using 'mech class and PSR, but that is not enough to ensure both sides of a match are balanced because PSR is flawed and not tuned to reflect player skills correctly.

Quote

The limited sample can't support anything. It's too limited. It is the very definition of observation bias at work. If the data seems to confirm what you originally believed before the analysis then GOOD SCIENCE is to be even more skeptical of the data. Humans are very, very good at lying to themselves without being consciously aware of it.
So go get your 1000 matches and show us what you get.

Quote

Which is why I said that the matchmaker builds matches as quickly as possible WHILE LIMITING THE MIXING OF EXPERIENCED AND INEXPERIENCED PILOTS as much as possible.
But it's not doing that, because people apparently don't want to wait an extra minute for a match, BUT ALSO, because PSR isn't doing its job either.

Quote

What size tin foil hat do you wear?
Sorry, I work in a field where you have to define, not just what is said, but the actual intent behind the statement.

Syntax, context, and tone all have meaning.

It allows me to face an irate person screaming at the top of his lungs cussing me and my entire lineage in the most foul and perverse manner, and find a means to provide a solution to the actual problem.

Quote

And if you were convinced by those two different "analysis" then you deserve to be ripped off by every huckster who comes along. Your critical thinking skills are nearly non-existent.
LOL...

Quote

No it doesn't. And no amount of his and your repeating that claim will ever make it true. That's not how statistics and science work.
"Do you smell smoke?" "Yeah I smell smoke, do you smell smoke?" "No, smell no smoke."

"I think something is on fire." "Yeah I think so too, I smell something burning."

"I neither see any fire, nor smoke, nothing is on fire! You people don't have enough..." KABOOM!!!

Quote

Very definition of biased observation.
Sorry, I can only report what I see. I can't report what you see, or anyone else does, unless of course, they provide the numbers...

Quote

Got a metric that does it better? We're all ears!
Yet again...

Quote

A zero sum system is the root of ELO systems. You said that didn't work when it was tried but here you are arguing in favor of it.
Yes, applied during the build of teams, not in deciding who should win and how big their elo bump should be.

That was the entire problem of PGI's implementation of elo, if I remember correctly how it was documented.

It was never used to assemble the teams, it was used to figure out who was probably going to win, and if they did win, it limited the bump in elo, and if they lost it increased the elo penalty, whereas the opposite, if the team projected to lose won, they got a much bigger elo bump, but if they lost, their elo penalty was minimal.

The way PGI set it up, it could NEVER result in balanced matches because elo wasn't being used to balance teams.

I might be remembering this wrong, it's been a few years.
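
For reference, the textbook Elo update behaves the way described: the favourite gains little for an expected win and loses a lot for an upset. The sketch below is the standard two-sided formula, not PGI's implementation; the K-factor of 32 and the example ratings are illustrative assumptions.

```python
def expected_score(rating_a, rating_b):
    """Expected score for side A under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, a_won, k=32):
    """Return new (rating_a, rating_b) after one match; k=32 is an assumed value."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    change = k * (score_a - exp_a)
    # Zero sum: whatever A gains, B loses.
    return rating_a + change, rating_b - change

# Favourite (1600) beats underdog (1400): small bump for the favourite.
fav_win = elo_update(1600, 1400, a_won=True)
# Underdog wins instead: a much larger swing in the other direction.
upset = elo_update(1600, 1400, a_won=False)
print(fav_win, upset)
```

Note that nothing in this update builds teams. It only scores the result after the fact, so using the ratings to assemble sides is a separate step.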

Quote

So? If the goal is to separate experienced from inexperienced then it works just fine.
Do new players start out in Tier 5 now? If not and they're still being lumped in at the top end of Tier 4 then, automatic fail is it not?

I don't remember reading any changes to new player PSR rating, BUT, if PGI were to increase the MM PSR range from 1-3, to 1-4, then it's reasonable that they might have made the change to new player PSR rank as well, however, over the years how often has PGI done what should have been obvious and reasonable?

Quote

Given player population sizes during some times of the day, just how long are you willing to wait for your match? Is everyone in agreement with your opinion? Why or why not?
It doesn't matter, if players want balanced teams, then they need to wait longer.

How much longer, I don't know, PGI doesn't give us the data to determine that.

Me? I'm willing to wait at least another minute or two for more interesting matches. I won't speak to anyone else's opinion on that.

Quote

So, you acknowledge that the root problem is player pool size but think that a "better" matchmaker will overcome that root problem?
When people are reporting that it seems like most of their matches are stomps, that can become boring (especially if you're typically on the receiving end of those stomps) and thus players go elsewhere for entertainment, exacerbating the population problem as the game bleeds players bored with 12-0, 12-1, 12-2, 12-3 matches lasting 6 minutes or less.

But ultimately, no, PGI needs to frickin' advertise this game like WoT or WoWP or other games have done and are doing. It needs lots and lots of new players to swell the ranks of T4 (or T5 if that's where they're starting now). Then they need to fix the PSR system so that it actually reflects skill, and not how long and how much you've been playing the game.

Quote

Those metrics are already incorporated in PSR. You want to create a more complex algorithm that attempts to balance multiple metrics simultaneously between two teams? Why? Why would you create such a grossly inefficient system?
Actually that's incorrect, as I recall from how PGI documented PSR, it's another value, mostly independent of the other values, and it grows or shrinks based off your end of match score and whether you won or lost the match.

The way it's documented as functioning it's perfectly possible to have a W/L ratio lower than 1, low average match and average damage scores and still be maxed out in Tier 1.

Quote

In your imaginary system, how does matchmaker balance match score between the two teams at the same time it's trying to balance damage per match between those teams? What does it do when those numbers are not strongly correlated within each player?

Sheer lunacy!

I'll bet that your solution ends up combining those different metrics into one overall summary score for each player and that you end up using that summary to create the teams.
Again, there you go...

Quote

No. We are NOT "fairly certain we're not getting balanced matches now." That's my entire point. Neither the OP nor Tarogato have demonstrated anything of the kind. All we have is the biased opinions of people. Nothing systematic or objective about any of it.
Tarogato is very amenable. I admire his ability to be patient with people being intentionally obtuse. I think he would do better at my job than I do (though it's very rare to have an individual be as purposefully obtuse as you have been).

Tarogato's work, plus the OP's independent sampling, small though it is, seems to be fairly indicative of something going on.

You the critic haven't brought forth one piece of data of your own to refute anything the numbers show, only complained that the sampling was too small, or they cherry picked, or that they're just biased, yadda, yadda, yadda.

You're here to argue.

Quote

And you're here to do what . . . knit puppy mittens?
Right now, I'm having fun pissing you off.

Quote

Those who don't understand statistics are the ones who are fooled by the liars. Guess which category you fit into?
LOL, yeah and "93% of statistics are made up on the spot"...

Nicely done passive aggressiveness there!

Quote

Yep. My bad. Sorry for the misquote.
Meh, it happens which is why I only elaborated, not berated.

Quote

How should they be combined?
Do they have to be?

Quote

What about new/smurf accounts that have very few matches and thus might have extreme values?
New accounts start at whatever the default is now and will be scored according to performance, for smurf accounts with extreme values, why do we have to do anything for them? Their scores are their scores, and again, eventually pilot performance will score them where they need to be.

Quote

Is a new player that got lucky in his first match going to be placed in with the best of the best in his second match? How will you prevent that?
If PGI has changed new players to T5, that won't happen until the player scores himself to T4. If PGI hasn't changed new players to start in T5, well, fail.

Quote

Group queue vs. solo?
Both actually. A player who wins a lot, gets good match scores, and does lots of damage almost every match, will rise to the top of the leaderboards. It's how they work. It doesn't matter if they're grouped or not.

Most skilled players tend to do well whether they're grouped or not.

Quote

Yes. Do you think there are enough of them to fill out a complete match every time one of them hits the Quick Play button at any time of the day? If you do manage to get them into their own matches with others like them, will their metrics remain the same high level over time? As their metrics drop do they come join the rest of us plebes down below?
If they're always against equally skilled/performing players the most likely outcome is those numbers will drop to the middle or maybe low end of the top tier. With fewer lesser-skilled players for the meat grinder, matches "should" become more balanced; as their opponents are coming at them with equal skill, matches will take longer and be less 'stompy' (barring any outlying disco/afk issues).

There'll be less opportunity for the kill/damage hoarding you see in stomps where there's a few players with high damage and kill counts.

Quote

Isn't that what we already have now?
As is evidenced by the data presented so far? No.

Quote

Why? It will produce exactly what we have now.
How do you know? We really haven't ever had it as described.

The win weighted PSR system NEVER let it be that way.

Quote

Small player pool can not be overcome by a more elaborate matchmaker.
This is absolutely true, hence my call for PGI to actually advertise this damn game some.

Hell you have other, NEWER, games advertising themselves on national TV as "...thinking man's..." games.

You fix population issues by 'butts in seats', you maintain population by ensuring your product can hold the interest of those 'butts'.

#223 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 23 April 2017 - 10:44 AM

@vandalhooch;

I get where you're coming from but there's a huge difference between "this is data I would publish in a thesis" and "this data points to a trend worth studying and leads to some potential hypotheses".

12 matches is enough to say "huh. That's anomalous, we should look at that". Tarogato's study far more so - it's enough data to point to what you can look at or at least build a hypothesis to then really test -

Which we really can't, because we don't have the real telemetry from PGI.

Even if we did it's not going to change much because the MM still has to expediently build matches out of who is available in that moment. The only things I can see that would be worthwhile to change would be to ensure that the MM can reshuffle teams completely as needed in the virtual lobby prior to launch (I don't think it does currently) and tinker with the weighting on matching tonnage vs PSR, potentially allowing more tonnage mismatch for more accurate skill match and see if that's better.

You could also look at a better PSR more like Elo but drilled down to performance per chassis, loadout and even relative team compositions. However if the population isn't thick enough to exploit that it wouldn't make much real difference.

#224 Dremnon

    Member

  • Big Daddy
  • 60 posts
  • Location: Winnipeg, Manitoba

Posted 23 April 2017 - 10:49 AM

This is so giggle worthy. Honestly the data for all of this is already with PGI, yet everyone is going to sit here for 12 pages on the forums being master debaters on whether what was collected by OP is sufficient or not. Maybe get a thread going on here or ask on twitter or on reddit, etc to PGI to show some stats over the last 2 months or out of 1000 matches played on how many were 12-0, 12-1, and so on.

Edited by Dremnon, 23 April 2017 - 10:51 AM.


#225 Dimento Graven

    Member

  • Guillotine
  • 6,208 posts

Posted 23 April 2017 - 11:16 AM

Dremnon, on 23 April 2017 - 10:49 AM, said:

This is so giggle worthy. Honestly the data for all of this is already with PGI, yet everyone is going to sit here for 12 pages on the forums being master debaters on whether what was collected by OP is sufficient or not. Maybe get a thread going on here or ask on twitter or on reddit, etc to PGI to show some stats over the last 2 months or out of 1000 matches played on how many were 12-0, 12-1, and so on.
AND along with the 12-0, 12-1, etc., the avg. player match score, avg. player W/L ratio, and avg. end of match score, for each side.

Again, I predict the winning side will have more players with historically better statistics, than the losing side.

#226 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 23 April 2017 - 11:24 AM

Dimento Graven, on 23 April 2017 - 11:16 AM, said:

AND along with the 12-0, 12-1, etc., the avg. player match score, avg. player W/L ratio, and avg. end of match score, for each side.

Again, I predict the winning side will have more players with historically better statistics, than the losing side.


That's a fine hypothesis. Testing it would be more work than I'm willing to do for free and wouldn't change much.

#227 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 23 April 2017 - 11:43 AM

What my study has definitely proven is that Vandalhooch's posts need to be measured in kilometers, not in words.

The funny part is that all these kms can be squeezed into one sentence - "if you don't have 100 Tb of all possible data you don't have the right to make any assumptions". Back to work, peasants, there are guys who have all the data necessary, they will think for you!

Copernicus was wrong with his Earth orbits around the Sun theory. He didn't have all the necessary data! Did he travel to the Moon? Did he know the exact (to millimeters) distance between Earth and Saturn? Poor guy, he was so terribly wrong.

By the way, if you haven't been to Australia, how do you know it exists? Do you have any data? Don't forget to make a chemical analysis of your milk before drinking, otherwise how can you be sure it's not poisoned.

But who are we to judge? He made some hints that he is a great scientist...

Edited by drunkblackstar, 23 April 2017 - 11:44 AM.


#228 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 23 April 2017 - 11:49 AM

Dimento Graven, on 23 April 2017 - 10:41 AM, said:

Except you keep making claims about 'insufficient data' and how things haven't been done to a standard you didn't define until like, page 12?


I admire your persistence in explaining the things to the person who pretends he doesn't hear the explanation.

#229 Jman5

    Member

  • Littlest Helper
  • 4,914 posts

Posted 23 April 2017 - 12:01 PM

So I've tried reading this thread, but my eyes kind of glaze over when folks mass quote/respond line by line. Apologies if this was addressed satisfactorily.

Seems like the premise of this is pretty straightforward: Average Matchscore is a pretty good indicator of skill. The higher skill team wins more often than not. If matchmaker did a better job of balancing the teams based on matchscore you would have more competitive matches.

"But population, but search times, etc..."

The way I would approach matchmaker is by using the average matchscore of a player's last 50-100 games in each weight class. This would give you a good approximation without being bogged down by the 500 games you played last year. Splitting it up by weight class would give the system much greater granularity. So the guy who's playing an arctic Cheetah isn't being judged for how he plays in a kodiak.
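
A rolling per-class average like this is cheap to keep. The sketch below is one possible way to do it, assuming nothing about how PGI stores match history; the class name `RecentMatchscore`, its methods, and the pilot and score values are all made up for illustration.

```python
from collections import defaultdict, deque

class RecentMatchscore:
    """Average matchscore over a player's last `window` games, per weight class."""

    def __init__(self, window=50):
        self.window = window
        # (player, weight_class) -> most recent scores; older ones drop off automatically.
        self._scores = defaultdict(lambda: deque(maxlen=self.window))

    def record(self, player, weight_class, matchscore):
        self._scores[(player, weight_class)].append(matchscore)

    def average(self, player, weight_class, default=None):
        scores = self._scores.get((player, weight_class))
        if not scores:
            return default  # no history in this weight class yet
        return sum(scores) / len(scores)

# Tiny window so the roll-off is visible; a real window would be 50-100 games.
tracker = RecentMatchscore(window=3)
for score in (100, 200, 300, 400):
    tracker.record("pilot_a", "light", score)

light_avg = tracker.average("pilot_a", "light")      # uses only the last three games
assault_avg = tracker.average("pilot_a", "assault")  # no assault games recorded
print(light_avg, assault_avg)
```

The `default` argument leaves open what to do with a player who has no recent games in a class, which is a design decision the matchmaker would still have to make.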

It's not like it's super hard on the matchmaker either. Usually it's just a matter of trading one or two good players with one or two bad players of the same weight class. I looked at one game in the spreadsheet and just switching around two Lights I was able to cut the average matchscore difference from about 50 to 12. This is with all the same players and maintaining the 1:1 weight class parity.

Of course it's impossible to prove that this hypothetical match would have resulted in a closer match, but the point is to show that you can create improved matchscore parity between teams without needing more players or a longer search time.
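
The swap idea can be sketched as a brute-force search over same-class pairs. This is a minimal illustration, not the matchmaker's actual logic: the player dictionaries, their keys, and the scores are hypothetical, and it only looks for the single best swap.

```python
from itertools import product

def avg(team):
    return sum(p["score"] for p in team) / len(team)

def best_single_swap(team_a, team_b):
    """Find the one same-weight-class swap that most reduces the gap in team averages.

    Each player is a dict with hypothetical keys "name", "cls" and "score"
    (average matchscore). Returns (gap, i, j), where i and j index into the
    two teams, or (gap, None, None) when no swap beats the current gap.
    """
    best = (abs(avg(team_a) - avg(team_b)), None, None)
    for i, j in product(range(len(team_a)), range(len(team_b))):
        a, b = team_a[i], team_b[j]
        if a["cls"] != b["cls"]:
            continue  # keep the 1:1 weight class parity
        new_a = team_a[:i] + [b] + team_a[i + 1:]
        new_b = team_b[:j] + [a] + team_b[j + 1:]
        gap = abs(avg(new_a) - avg(new_b))
        if gap < best[0]:
            best = (gap, i, j)
    return best

# Made-up two-player teams, just to exercise the function.
team_a = [{"name": "A1", "cls": "light", "score": 400},
          {"name": "A2", "cls": "heavy", "score": 300}]
team_b = [{"name": "B1", "cls": "light", "score": 200},
          {"name": "B2", "cls": "heavy", "score": 250}]

before = abs(avg(team_a) - avg(team_b))
after, i, j = best_single_swap(team_a, team_b)
print(before, after, i, j)
```

For 12 v 12 this is at most 144 candidate swaps per pass, so repeating it a couple of times costs nothing next to the time spent waiting for players.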

Edited by Jman5, 23 April 2017 - 12:10 PM.


#230 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 23 April 2017 - 12:03 PM

What would be really useful and interesting is not posting walls of "you don't have enough data" messages, but to look into the matches more closely.

Like measure the disparities between 2 teams and inside the teams.

Let's take for example match 1 from the data I've collected.

There are 2 guys in the winning team with K\D below 1. And 8 (!) guys in the defeated team with the K\D less than 1.

If we look closely, then we will find out that there are almost always more guys with less than 1 K\D in the losing team than in the winning team.

In fact it is the easiest way to make a certain team lose (by giving them more "tier 4 players").

#231 Too Much Love

    Member

  • Ace Of Spades
  • 787 posts

Posted 23 April 2017 - 12:12 PM

Jman5, on 23 April 2017 - 12:01 PM, said:

The way I would approach matchmaker is by using the average matchscore of a player's last 50-100 games in each weight class.

It's kind of like how the old ELO system worked - they had separate ELO for each class.

Your suggestion is definitely reasonable. It's better to have a more individual approach. But as you said yourself, "But population, but search times, etc..."

Edited by drunkblackstar, 23 April 2017 - 12:13 PM.


#232 Tarogato

    Member

  • Civil Servant
  • 6,558 posts
  • Location: USA

Posted 23 April 2017 - 12:45 PM

vandalhooch, on 23 April 2017 - 10:18 AM, said:



Instead of quoting and responding to what you said, I'm just going to thank you for a good set of replies. Pretty much answered all of my questions and laid out most of your expectations. I appreciate that, and don't really have anything further. If I can get a working OCR, I'll know how to better approach a revisit of this in the future.

Cheers, mate.

#233 Jman5

    Member

  • Littlest Helper
  • 4,914 posts

Posted 23 April 2017 - 12:45 PM

drunkblackstar, on 23 April 2017 - 12:12 PM, said:

It's kind of like how the old ELO system worked - they had separate ELO for each class.

Your suggestion is definitely reasonable. It's better to have a more individual approach. But as you said yourself, "But population, but search times, etc..."

Yeah kind of like old Elo except with matchscore which I think is a better metric than WLR.

#234 vandalhooch

    Member

  • Ace Of Spades
  • 891 posts

Posted 23 April 2017 - 12:58 PM

Dimento Graven, on 23 April 2017 - 10:41 AM, said:

Except you keep making claims about 'insufficient data' and how things haven't been done to a standard you didn't define until like, page 12?


My first post pointing out the biased nature of the OP's "study" was on page 6. Prior to that, I simply had not read it yet.

Quote

While at the same time, offer no research of your own to counter what's been brought to the fore by two different individuals working independently of one another.


I don't have to have analysis of my own to point out the weakness in the analysis of others. For someone who earlier tried to lecture me about how science works, you seem to be awfully ignorant about the process of peer review.

Quote

Even with the small sample sizes, it shows patterns which bear some further looking into.


No it doesn't. That's the whole point. The sample sizes are too small to draw any conclusions about. The sample sizes are too small to claim that they have told you anything meaningful. You attaching meaning to them is just you expressing your own confirmation bias on what is in essence, random noise.
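
One standard way to put rough numbers on how much a sample of a given size can support is a confidence interval for a proportion. The sketch below computes a Wilson score interval; the counts (8 of 12 and 80 of 120) are hypothetical and are not taken from either study in this thread.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("need at least one observation")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Same observed rate (two thirds), very different sample sizes.
small = wilson_interval(8, 12)
large = wilson_interval(80, 120)
print(small, large)
```

With the same observed rate, the 12-match interval still contains 50% while the 120-match interval does not. The interval only covers sampling noise; it does nothing about how the matches were selected, so it cannot rescue a biased sample.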

Quote

LOL! See, this is the **** you do. "DEFINE THIS... DEFINE THAT..." When it's been defined ad nauseam, one more time for the 'reading comprehension impaired':

A "balanced match" is when both sides have teams that have skill levels as evenly matched as possible.


That isn't an objective definition. What units are "skill levels" measured in? How is "skill level" calculated? How far apart do these "skill levels" have to be to be defined as unbalanced?

Without any actual numbers you haven't defined anything. This just makes your earlier attempts to lecture others about how science works all the more laughable.

Quote

If one side ends up with significantly more players who typically have higher W/L ratios, higher avg. match scores, and higher average damage dealt numbers than the other side, both sides ARE NOT as evenly matched as they could be.


You need actual numbers to claim that one team has "significantly more" strong players than the other. You are starting to sound like the many politicians who tried to pass anti-porn legislation: "I know it when I see it."

Quote

And your statements about "map effects" and the like are so much obfuscatory denial BS.


No they aren't. They are absolutely confounding factors. They must be accounted for in any analysis. Or, are you going to try and claim that low skill players in multiple LRM boats definitely don't have a greater chance of victory on Polar Highlands than HPG? What analysis have you done to show that maps play no role in determining the outcome of matches? Did the OP get a random sample of maps? Did the OP even record which maps the matches took place on?

Quote

Exactly what kind of map is going to significantly improve the play of BAD players??!?!? What kind of map is going to make GOOD players, do badly and let BAD players do well?!?!?!?


Don't know for sure, so we need to properly analyze the data to see if there is no effect. If there isn't any effect, then any further studies can safely ignore map variety. You can't just ignore it now because you can't imagine there is an effect. That's biased data collection.

Quote

I believe that piece of data is irrelevant, I'd be willing to bet pretty much anything that it is a difference that makes no difference.


You being confident that it's true and the data showing that it's true are two completely different things. One is pseudo-scientific nonsense and the other is how science is actually done.

Quote

Yet again, another example of you arguing for the sake of argument. You don't need data, you look at how it operates:

If a player is playing badly enough to where they are more often than not losing, and when they lose they score badly enough to take the largest PSR penalty, and even when they win they are scoring only enough to come out with NO PSR bonus, they'll NEVER increase their PSR ranking, and in fact, eventually bottom out at T5, no green in the bar.

That's how the PSR system is setup, and it takes a special kind of "bad" to do it the way that the PSR system works.

Now, if a player wins and loses an equal amount, and when they win they're scoring enough to at least get the most minimum bump, and when they lose, they're scoring enough to either break even or take the minimum PSR penalty, they'll go up in rank, slowly.

It's how PGI documented the PSR system, you don't need raw data to prove it, UNLESS, you believe that PGI lied (and that's actually not an unreasonable belief).

Outliers that do more to prove my point than call it into question, but whatever.


In other words, you have absolutely no data as to how quickly the average player moves through the Tiers. You could have just admitted you had no data. I already knew you didn't and so did everyone else.

Quote

True that, absolutely.

Except when he included more data, it didn't reduce the trend, in fact, the trend went up, did it not?


Trend? What trend? Trends are patterns in time series data. Tarogato did not do a time series analysis. There is no trend to be discovered.

Quote

You don't like his sample size, and later on went on to say 1,000 matches might be enough data.

I say you should get right on that and see what you come up with.


Why? Tarogato's data has absolutely no bearing on the question of "is the matchmaker failing to make adequately evenly matched teams?"

He only recorded data for matches that were STOMPS!

Quote

There you go again. Did so numerous times, and above, scroll up to read it.


Scroll up to what? Your completely subjective definitions of evenly matched and skill level? Do you even comprehend what the term objective means?

Quote

WOW!!! You're telling me Vegas is NOT already making BILLIONS of dollars on sports betting?!?!?!!?!??!??


Do you have the slightest clue how they do that? Hint: It does not involve their ability to accurately predict the outcomes of sporting events. Vegas casinos do not gamble!

Quote

At this point I'm fairly convinced you are a professional climate change denier.

Possibly employed by Exxon...


Bwaaaa, haaaa, haaaaa. You couldn't be more wrong.

Quote

The data points are:

Match Outcomes


Which match outcomes? Stomps only? All matches? All maps? Group only? Solo only? Group and solo? One mode only? All modes pooled together?

Quote

Players performance statistics


Which metrics (they aren't statistics until they are pooled or compared to others)? Why those? How are they to be weighted? Why? What about metrics that are derivations of the other metrics?

Quote

The observation is:
Taking the average of the player statistics for both sides and comparing them against the match outcome.


Averaging? How exactly do you mathematically average a win loss RATIO with a damage PER MATCH? What does that even mean? What units is your final number in?

Do I just take my 1.27 WLR and add it to my 325 damage per match score and divide by 2?
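To make the units problem concrete, here is a small sketch. The naive version really does just add a ratio to a damage figure. The alternative standardizes each metric against a population first (z-scores), which at least gives a unitless number. The population lists are invented for illustration, and the equal weighting is an assumption, not anything PGI or the OP specified.

```python
from statistics import mean, pstdev

def naive_average(wlr, damage):
    """Add a ratio to a damage figure and halve it: the units don't match."""
    return (wlr + damage) / 2

def z_score(value, population):
    """How many standard deviations `value` sits from the population mean."""
    sd = pstdev(population)
    if sd == 0:
        raise ValueError("population has no spread")
    return (value - mean(population)) / sd

# Hypothetical population values, invented for illustration.
pop_wlr = [0.6, 0.8, 1.0, 1.2, 1.4]
pop_damage = [150, 250, 300, 350, 450]

naive = naive_average(1.27, 325)
composite = (z_score(1.27, pop_wlr) + z_score(325, pop_damage)) / 2
print(naive)      # dominated almost entirely by the damage figure
print(composite)  # unitless, but equal weighting is still an arbitrary choice
```

Even the standardized version leaves the "how are they weighted, and why?" question wide open.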

Edited by vandalhooch, 23 April 2017 - 01:02 PM.


#235 vandalhooch

    Member

  • PipPipPipPipPipPipPip
  • Ace Of Spades
  • Ace Of Spades
  • 891 posts

Posted 23 April 2017 - 01:03 PM

Broken up to get it posted.

Quote

Not sure how that's 'biased', it's raw numbers.


It's biased because you seem to be mathematically illiterate.

Quote

The CONCLUSION might be biased, but not unreasonably so, because the numbers we have show us that when a stomp occurs MM has inadvertently stacked one side up with higher performing players than the other.


Was it inadvertent? How do you know? What is the frequency of actual matches having unbalanced teams according to Tarogato's ranking system? I'll give you a hint, it isn't 80%.

Quote

Ergo, MM is not always doing its job,


You still haven't actually measured this to draw such a conclusion.

Quote

its only using 'mech class and PSR, but that is not enough to ensure both sides of a match are balanced because PSR is flawed and not tuned to reflect player skills correctly.


This could be true, but you still don't have any actual evidence that it is. Neither the OP nor Tarogato collected enough data in a systematic manner in order to provide that evidence.

Quote

So go get your 1000 matches and show us what you get.


That's not how science works. You made the claim, you provide the evidence to support your claim. I can't search for counter-evidence to something that only exists in your imagination.

Quote

But it's not doing that, because people apparently don't want to wait an extra minute for a match, BUT ALSO, because PSR isn't doing its job either.


What should be the appropriate wait time to open valves in the current system? Should that time be calculated from the time the first pilot was slotted into the match or the latest? I think you'll find that there is absolutely no such thing as universal agreement on what those values should be.

Quote

Sorry, I work in a field where you have to define, not just what is said, but the actual intent behind the statement.

Syntax, context, and tone all have meaning.


And objectivity doesn't factor in? I thought you were trying to lecture me about how science works and yet objectivity is nowhere included in the process of defining things.

Quote

It allows me to face an irate person screaming at the top of his lungs cussing me and my entire lineage in the most foul and perverse manner, and find a means to provide a solution to the actual problem.


Sounds incredibly useful. It's not scientific.

Quote

LOL...

"Do you smell smoke?" "Yeah I smell smoke, do you smell smoke?" "No, smell no smoke."

"I think something is on fire." "Yeah I think so too, I smell something burning."

"I neither see any fire, nor smoke, nothing is on fire! You people don't have enough..." KABOOM!!!


The problem is that you are confusing dust in the air for smoke. You don't actually understand how to tell the difference between dust and smoke.

Quote

Sorry, I can only report what I see. I can't report what you see, or anyone else does, unless of course, they provide the numbers...


And humans are incredibly terrible at reporting what actually happened instead of what they saw happen. Those two things are NOT the same thing. What you "see" happening is an experience in your brain but it is not a recording of what actually happened.

That's why science has rigorous protocols to eliminate the biases of human experience from any deliberations about what is actually happening.

Quote

Yet again...


Because you have continually failed to give an objective definition for your terms.

Quote

Yes, applied during the build of teams, not in deciding who should win and how big their elo bump should be.

That was the entire problem of PGI's implementation of elo, if I remember correctly how it was documented.

It was never used to assemble the teams. It was used to figure out who was probably going to win: if the favored team won, it limited their bump in Elo, and if they lost, it increased their Elo penalty. Conversely, if the team projected to lose won, they got a much bigger Elo bump, and if they lost, their Elo penalty was minimal.

The way PGI set it up, it could NEVER result in balanced matches because elo wasn't being used to balance teams.


Is your system going to adjust players' rankings based on the results of their matches? How much will players move in your system? What about players who get pulled into higher or lower level matches due to the small player pool?

Quote

I might be remembering this wrong, it's been a few years.


Nah, you got it mostly right. Although, I think Elo ratings were also used to seed the matches in the first place, in addition to calculating movements after matches. Either way, it was an inappropriate system for essentially randomly assembled teams.
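For reference, the update being described is roughly the standard Elo formula. This is a sketch of the textbook version only: the 400-point scale and K = 32 are the conventional chess defaults, not values PGI ever documented, and the team ratings are hypothetical.

```python
def expected_score(rating_a, rating_b, scale=400):
    """Probability that side A beats side B under the standard Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / scale))

def elo_delta(rating_a, rating_b, a_won, k=32):
    """Rating change for side A after one match against side B."""
    actual = 1.0 if a_won else 0.0
    return k * (actual - expected_score(rating_a, rating_b))

# Hypothetical ratings for a favored team and an underdog team.
favorite, underdog = 1600, 1400
fav_win = elo_delta(favorite, underdog, a_won=True)    # small bump
dog_win = elo_delta(underdog, favorite, a_won=True)    # much bigger bump
fav_loss = elo_delta(favorite, underdog, a_won=False)  # large penalty
dog_loss = elo_delta(underdog, favorite, a_won=False)  # minimal penalty
print(fav_win, dog_win, fav_loss, dog_loss)
```

The asymmetry in the four numbers is the behavior the quoted post remembers: favorites gain little and lose a lot, underdogs the reverse.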

Quote

Do new players start out in Tier 5 now? If not and they're still being lumped in at the top end of Tier 4 then, automatic fail is it not?


I thought you were getting rid of the current system! I'm asking how these situations will be handled in your newer, "better" system.

BTW: New accounts start mid-level Tier 5. The first few matches have PSR gain multipliers applied to move smurf accounts out of Tiers 4 and 5 rapidly.

Quote

I don't remember reading any changes to new player PSR rating, BUT, if PGI were to increase the MM PSR range from 1-3, to 1-4, then it's reasonable that they might have made the change to new player PSR rank as well, however, over the years how often has PGI done what should have been obvious and reasonable?

It doesn't matter, if players want balanced teams, then they need to wait longer.

How much longer, I don't know, PGI doesn't give us the data to determine that.


Seems you don't know a lot about your newer, "better" system.

Quote

Me? I'm willing to wait at least another minute or two for more interesting matches. I won't speak to anyone else's opinion on that.


But if you are designing a "better" system you have to speak to everyone's opinion on it. You have to program the valve release times. If you aren't willing to even address that simple tool, why should we care that you think you have a better system?

Quote

When people are reporting that it seems like most of their matches are stomps,


People reporting something is not the same thing as it actually happening. Confirmation bias is a real thing.

Quote

that can become boring (especially if you're typically on the receiving end of those stomps) and thus players go elsewhere for entertainment, exacerbating the population problem as we bleed players bored with 12-0, 12-1, 12-2, 12-3 matches lasting 6 minutes or less.


So now 12-2 and 12-3 matches are considered stomps? Why? Does everyone agree with that? Does this match Tarogato's data you were citing in support of your position?

Why should I listen to you when you cite data that doesn't even match your definition of what constitutes a "stomp?"

Quote

But ultimately, no PGI needs to frickin' advertise this game like WoT or WoWP or other games have done and are doing. It needs lots and lots of new players to swell the ranks of T4 (or T5 if that's where they're starting now).


A completely separate issue from "is the matchmaker failing to create evenly matched teams?"

Quote

Then they need to fix the PSR system so that it actually reflects skill, and not how long and how much you've been playing the game.


Still waiting on your proposed system. You keep saying that everything would be better if we had such a system but you don't seem capable of even describing how such a system would actually work.

Quote

Actually that's incorrect, as I recall from how PGI documented PSR, it's another value, mostly independent of the other values, and it grows or shrinks based off your end of match score and whether you won or lost the match.


Win and lose are part of match score. Damage dealt is part of match score. PSR moves are based on match scores, therefore PSR incorporates WLR and damage per match.

PSR is not an independent metric.

Quote

The way it's documented as functioning it's perfectly possible to have a W/L ratio lower than 1, low average match and average damage scores and still be maxed out in Tier 1.


Theoretically possible is not the same as actually happening in reality. The matchmaker does not work off of what pilots are theoretically capable of having for metrics.

Quote

Again, there you go...


And I'll stop just as soon as you actually define your terminology in such a way that we can both understand what it is we are discussing.

Quote

Tarogato is very amenable. I admire his ability to be patient with people being intentionally obtuse. I think he would do better at my job than I do (though it's very rare to have an individual be as purposefully obtuse as you have been).


Are you trying to speak for him now? Why not just let him speak for himself?

Quote

Tarogato's work, plus the OP, independent sampling, small though it is, seems to be fairly indicative of something going on.


No it isn't. The fact that you think it's indicative of anything at all just shows how ignorant you are of statistical analysis.

Quote

You the critic haven't brought forth one piece of datum on your own to refute anything the numbers show,


I have in fact explained why "the numbers" don't really show what you claim they do. I don't need my own numbers to do that.

Quote

only complained that the sampling was too small, or they cherry picked, or that they're just biased, yadda, yadda, yadda.


Now who sounds like a climate denier?

Quote

You're here to argue.

Right now, I'm having fun pissing you off.


Bwaaaa haaaa haaaaa. Nothing you have said has "pissed me off" in the slightest. When I come across someone who repeatedly refuses to admit they are only pretending to understand something, I find a great deal of satisfaction in showing how clueless they are.

Tarogato admitted that he learned a great deal from his experience. He has earned my respect for both his initial hard work and his willingness to listen and learn from criticism. You on the other hand are a pompous know-nothing.

Quote

LOL, yeah and "93% of statistics are made up on the spot"...


I'm not the one claiming to have statistics on my side . . .

Quote

Nicely done passive aggressiveness there!


Nothing passive about it.

Quote

Meh, it happens which is why I only elaborated, not berated.

Do they have to be?


If you are going to use them in a matchmaking algorithm, yes.

Quote

New accounts start at whatever the default is now and will be scored according to performance, for smurf accounts with extreme values, why do we have to do anything for them?


So new players will instantly jump in with the best of the best because they had one good match? Sound like a recipe for good new player experience?

Quote

Their scores are their scores, and again, eventually pilot performance will score them where they need to be.
If PGI has changed new players to T5, that won't happen until the player scores himself to T4. If PGI hasn't changed new players to start in T5, well, fail.


Nice to see you feel confident enough to argue about a system that you don't even know the basics of.

Quote

Both actually. A player who wins a lot, gets good match scores, and does lots of damage almost every match, will rise to the top of the leaderboards. It's how they work. It doesn't matter if they're grouped or not.


Bwaaaaaa haaaaa haaaaaa.

For someone who was complaining about bad players being able to rise to Tier 1 in the current system you seem particularly dumb when it comes to your proposed alternative.

Quote

Most skilled players tend to do well whether they're grouped or not.


Do they do better in groups or in solo? Do you have any actual data to support such an assertion?

Quote

If they're always against equally skilled/performing players the most likely outcome is those numbers will drop to the middle or maybe low end of the top tier.


That's not how zero-sum systems work. You can't have a match of 24 killers and have all of them get high damage numbers. They can't all maintain high win-loss ratios.
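A quick simulation sketch of the win/loss side of that point, assuming 24 equally skilled pilots shuffled into two teams of 12 with the winner decided by coin flip (a deliberately crude model, not how the game resolves matches):

```python
import random

def simulate(n_players=24, n_matches=1000, seed=0):
    """Play random matches between two equal teams of equally skilled pilots."""
    rng = random.Random(seed)
    wins = [0] * n_players
    losses = [0] * n_players
    pilots = list(range(n_players))
    half = n_players // 2
    for _ in range(n_matches):
        rng.shuffle(pilots)
        team_a, team_b = pilots[:half], pilots[half:]
        # Equal skill assumed, so the winner is a coin flip.
        if rng.random() < 0.5:
            winners, losers = team_a, team_b
        else:
            winners, losers = team_b, team_a
        for p in winners:
            wins[p] += 1
        for p in losers:
            losses[p] += 1
    return wins, losses

wins, losses = simulate()
pooled_win_rate = sum(wins) / (sum(wins) + sum(losses))
print(sum(wins), sum(losses))  # equal: every match hands out 12 wins and 12 losses
print(pooled_win_rate)         # 0.5 for the pool, whatever the individuals do
```

Every win credited to one pilot is a loss charged to another, so the pool as a whole can never sit above a 1.0 WLR.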

Quote

With fewer lesser-skilled players for the meat grinder, matches "should" become more balanced; as their opponents are coming at them with equal skill, matches will take longer and be less 'stompy' (barring any outlying disco/afk issues).


You have zero evidence that stomps are the result of unevenly balanced teams. Tarogato's analysis does not address that claim. All Tarogato's data shows is that if a stomp does occur, the "stronger team" usually wins. Well, duh. It does not show that unbalanced teams result in stomps more often than balanced teams do.
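A toy version of the same point, with counts invented purely for illustration. A stomps-only sample can tell you what stomps look like, but the claim needs the stomp rate of unbalanced matches compared against the stomp rate of balanced ones.

```python
# Hypothetical match counts, invented purely for illustration.
# First key: how the teams compared on paper. Second key: how the match ended.
counts = {
    ("unbalanced", "stomp"): 80,
    ("unbalanced", "close"): 320,
    ("balanced", "stomp"): 20,
    ("balanced", "close"): 80,
}

def p_balance_given_outcome(balance, outcome):
    """Share of matches with this outcome that carried this balance label."""
    total = sum(n for (_, o), n in counts.items() if o == outcome)
    return counts[(balance, outcome)] / total

def p_outcome_given_balance(outcome, balance):
    """Share of matches with this balance label that ended in this outcome."""
    total = sum(n for (b, _), n in counts.items() if b == balance)
    return counts[(balance, outcome)] / total

seen_in_stomps = p_balance_given_outcome("unbalanced", "stomp")
stomp_rate_unbalanced = p_outcome_given_balance("stomp", "unbalanced")
stomp_rate_balanced = p_outcome_given_balance("stomp", "balanced")
print(seen_in_stomps, stomp_rate_unbalanced, stomp_rate_balanced)
```

In this made-up table 80% of stomps come from unbalanced matches, yet unbalanced and balanced matches stomp at the same 20% rate. Recording only stomps cannot tell that world apart from one where imbalance really does cause stomps.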

Quote

There'll be less opportunity for the kill/damage hoarding you see in stomps where there's a few players with high damage and kill counts.


Do you even play the game anymore? Most of the stomps I've observed result in damage being spread more evenly. Of course, I could be wrong and we'd need to go back through Tarogato's data to check. That's something that his data set could actually address already in fact.

Quote

As is evidenced by the data presented so far? No.


The data you are referring to provide no evidence as to the actual question at hand.

Quote

How do you know? We really haven't ever had it as described.

The win weighted PSR system NEVER let it be that way.


In your new system, highly ranked players will eventually be pooled in with less skilled players because there won't be enough of them to fill out matches on demand. That is exactly the thing the OP and you are complaining about right now. Your new system will end up with the same results we already have.

Quote

This is absolutely true, hence my call for PGI to actually advertise this damn game some.

Hell you have other, NEWER, games advertising themselves on national TV as "...thinking man's..." games.

You fix population issues by 'butts in seats', you maintain population by ensuring your product can hold the interest of those 'butts'.


You admit that the fundamental issue is small player pool and a new matchmaker won't change that. Then what's with all the crying about the current matchmaker?

#236 Tarogato

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • Civil Servant
  • Civil Servant
  • 6,558 posts
  • LocationUSA

Posted 23 April 2017 - 01:10 PM

View PostDimento Graven, on 23 April 2017 - 10:41 AM, said:

This is absolutely true, hence my call for PGI to actually advertise this damn game some.


You know, I really wish I could get behind this... but I just can't. Even after the Tutorial was introduced, it still isn't enough to integrate new players. With the new player experience in the state that it is, just like with steam launch, we lose so many borderline players - the ones that might like the game, might even want to like the game, but get rekt because they fail to understand it, and then lose interest, perhaps never to return even if the game *is* later advertised.

All PGI really has to do, is feature some youtube embeds in the game - let a new player browse topics, and play a video that covers it. Very much like Kanjashi's old tutorial series, but we need one that is up to date, more complete, and more professional. PGI doesn't even have to do it themselves, they can outsource it to the community (I'd actually recommend that they do... )

#237 vandalhooch

    Member

  • PipPipPipPipPipPipPip
  • Ace Of Spades
  • Ace Of Spades
  • 891 posts

Posted 23 April 2017 - 01:17 PM

View PostTarogato, on 23 April 2017 - 12:45 PM, said:

Instead of quoting and responding to what you said, I'm just going to thank you for a good set of replies. Pretty much answered all of my questions and laid out most of your expectations. I appreciate that, and don't really have anything further. If I can get a working OCR, I'll know how to better approach a revisit of this in the future.

Cheers, mate.

No problem. And hit me up if and when you decide to go for the really hard analysis. I'll lend a hand.

View Postdrunkblackstar, on 23 April 2017 - 12:03 PM, said:

What would be really useful and interesting is not posting walls of "you don't have enough data" messages, but to look into the matches more closely.

Like measure the disparities between the 2 teams and inside the teams.

Let's take for example match 1 from the data I've collected.

There are 2 guys on the winning team with K/D below 1. And 8 (!) guys on the defeated team with K/D less than 1.

If we look closely, then we will find out that there are almost always more guys with less than 1 K/D on the losing team than on the winning team.

In fact it is the easiest way to make a certain team lose (by giving them more "tier 4 players").


So you want to go from a sample size of 12 to a sample size of 1? Have you learned nothing?

#238 vandalhooch

    Member

  • PipPipPipPipPipPipPip
  • Ace Of Spades
  • Ace Of Spades
  • 891 posts

Posted 23 April 2017 - 01:28 PM

View Postdrunkblackstar, on 23 April 2017 - 11:43 AM, said:

What my study has definitely proven is that Vandalhooch's posts need to be measured in kilometers, not in words.


Words are how you convey understanding. If a topic is complex and not necessarily intuitive, like statistical analysis, then it's likely going to take more than a few words to accurately get that understanding across. This is especially true if the recipient is almost completely ignorant of the basics of the topic. Real scientific papers can be very, very brief. Watson and Crick's structure of DNA paper is exactly two pages. But, without the requisite background knowledge most readers would find it baffling.

Quote

The funny part is that all these kms can be squeezed into one sentence - "if you don't have 100 Tb of all possible data you don't have the right to make any assumptions". Back to work, peasants, there are guys who have all the data necessary, they will think for you!


Got any more anti-intellectualism to share with the class? Come now, don't hog it all for yourself.

Quote

Copernicus was wrong with his Earth orbits around the Sun theory. He didn't have all the necessary data! Did he travel to the Moon? Did he know the exact (to millimeters) distance between Earth and Saturn? Poor guy, he was so terribly wrong.


I highly doubt you have ever even cracked open a copy of De Revolutionibus. You should try it. Come back to us and explain how you understand what Copernicus was saying in it.

Quote

By the way, if you haven't been to Australia, how do you know it exists? Do you have any data? Don't forget to do a chemical analysis of your milk before drinking, otherwise how can you be sure it's not poisoned.

But who are we to judge? He made some hints that he is a great scientist...


Ad hominem. The last resort of the pathetically desperate.

BTW: Comparing me to a brilliant scientist is not nearly the insult you seem to think it is.

Edited by vandalhooch, 23 April 2017 - 01:28 PM.


#239 BLOOD WOLF

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • The Jaws
  • The Jaws
  • 6,368 posts
  • Locationnowhere

Posted 23 April 2017 - 03:11 PM

View Postvandalhooch, on 23 April 2017 - 01:28 PM, said:


Ad hominem. The last resort of the pathetically desperate.


Yeah, pointing out fallacies is the quickest way to hurt egos around here. It's a shame they are the norm for some people, and worse, some can't recognize them.

As an old associate of philosophy, nice work there.

Edited by BLOOD WOLF, 23 April 2017 - 03:11 PM.


#240 Mawai

    Member

  • PipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • Legendary Founder
  • 3,495 posts

Posted 23 April 2017 - 04:51 PM

View PostMcgral18, on 19 April 2017 - 09:47 AM, said:


Sure...but not by throwing 6 Terribads on one side, and 2 on the other
Balancing the teams properly, not choosing a default side to lose (or massive carry)


You know that it would be harder to write a matchmaker that did that than it was to code the one they have now? Trying to tailor each match made to affect an individual player's record because it is too good or too bad would be almost impossible when trying to do the same for every other player in the game.

The interesting thing that we can't do without PGI's help is to compare the actual PSR of each player as well as their recent stats, then compare how the matches are balanced in terms of PSR. A player's stats in the past play season based on W/L, K/D and MS are simply an alternative rating system.

The matchmaker is just a piece of code that places people on opposing teams and tries to make them about equal. Perhaps there is a bug in the algorithm such that matches have an inherent imbalance but if that is the case I would have hoped they would have found it by now.

More likely, the underlying player rating system upon which the matchmaker is based is basically broken ... which I think most players have agreed on for quite some time now.

Finally, the findings are actually quite reassuring ... in most of the results posted by the OP ... the team with the better players based on W/L, K/D and MS actually won ... which is what you would hope and might expect would be the case.




