Stats Study: Matchmaker Is Unfair

#201 Too Much Love

Member

Ace Of Spades
787 posts

Posted 22 April 2017 - 10:50 PM

vandalhooch, on 22 April 2017 - 10:46 PM, said:

In order to analyze how effective matchmaker is doing you need to have the relevant data. We don't have it. We can't claim to be drawing any conclusions based on data that we don't in fact have.

Ok. I got your point. Thank you for the participation.

#202 vandalhooch

Member

Ace Of Spades
891 posts

Posted 22 April 2017 - 11:34 PM

Dimento Graven, on 22 April 2017 - 09:12 PM, said:

Yes, you're only here to argue, very obvious with that.

Quote

The Tier system was originally intended for more balanced matches,

Define a "balanced match." Give us a definition that can objectively be measured so that we can all look at the same data and draw the same conclusion as to what is and is not a "balanced match."

Quote

and yes to also protect the noobs, who originally came in near the top end of Tier 4, avoid playing against vets, or being put on the side of vets who don't typically like having to carry 'potatoes'.

Just as I said.

Quote

Sure, but due to the Tiering system being slanted to favor winning over losing, only the worst of the worst will NOT be able to move rank.

Do really terrible players move up in Tier rapidly or slowly? How many matches will it take a terrible player (in your opinion) to reach Tier 1? How many matches does that player play in a month? How many months to get to Tier 1? Upon reaching Tier 1 is that player the same level of terrible (in your opinion) as they were when they started out?

Got any data to back up your claim?

Quote

So the longer it functions this way the more meaningless Tiers 1-3 are (BUT, interestingly enough, it makes Tier 5 really mean something, because man you have to play REALLY sh!tty, consistently, to deserve a spot in Tier 5).

Or be a new player, or play very, very infrequently, or do like some in these forums claimed to do and purposefully throw matches in order to stay in Tier 4/5.

Quote

So while, from a Tier and tonnage perspective it's "balanced" and quickly,

Weight class balanced is not tonnage balanced.

Quote

as soon as someone actually looks at the performance of the players, they find that the MM is regularly setting up sides where one side has more higher performing players than the other side, resulting in an upwards of 80% accuracy in predicting which side is probably going to win.

Nope, nope, nope. That 80% you are quoting from Tarogato does not mean what you think it means.

Tarogato only included stomps (12-0 and 12-1) in his analysis. The 80% was the percentage of stomps that had the "higher level" team winning. Note, 20% of those stomps were STOMPS BY THE "WEAKER TEAM." Not just that the "weaker team" won, they STOMPED the stronger team. Tarogato's data did not show that the majority of matches have such a discrepancy because he did not collect data on every match! How many matches do "weaker teams" according to Tarogato end up winning? We have no idea because he didn't collect that data.
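To make the distinction concrete, here is a minimal sketch of why a hit rate measured only on stomps says nothing by itself about the hit rate across all matches. The `Match` record and every count in it are invented purely for illustration; none of it is data from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    stronger_team_won: bool  # did the team the metric rated "stronger" win?
    is_stomp: bool           # True for a 12-0 or 12-1 result


def prediction_rate(matches):
    """Fraction of matches won by the team the metric rated as stronger."""
    matches = list(matches)
    if not matches:
        raise ValueError("no matches to summarise")
    return sum(m.stronger_team_won for m in matches) / len(matches)


# Hypothetical data (an assumption, not a measurement):
# 10 stomps, 8 won by the "stronger" team, plus
# 30 closer matches, 14 won by the "stronger" team.
matches = (
    [Match(True, True)] * 8 + [Match(False, True)] * 2
    + [Match(True, False)] * 14 + [Match(False, False)] * 16
)

stomps_only = prediction_rate(m for m in matches if m.is_stomp)
all_matches = prediction_rate(matches)
print(f"stomps only: {stomps_only:.2f}, all matches: {all_matches:.2f}")
```

With these made-up counts the stomp-only rate is 80% while the rate over every match is 55%. Which of the two figures describes the real matchmaker cannot be known without the matches that were never recorded.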

Quote

So the match was fast, but it wasn't actually balanced.

Define balanced in an objective way.

Quote

Totally agree, it was doomed to failure before it began, they wasted a LOT of effort on someone misapplying elo.

That was part of it yes, didn't think it merited mentioning because that's obvious. No one wants brand new players being butt ***** by people who have been playing for years. While it might be fun to club seals, it's no fun being the seal. BUT ALSO, it was an attempt at producing a mechanism that worked better than elo, and would create more interesting and fun matches by ensuring that both sides had as close to equal skill as reasonably possible.

That may be what you imagined the new matchmaker was supposed to do but you can't program any matchmaker to balance player skills. If you could, you'd make a fortune from the gaming industry and Vegas would definitely be interested in using your system for their sports betting. You can create a system that reduces the general level of imbalance by using proxy metrics for "skill" but there is no goal of "interesting" and "fun" matches because those aren't objective things that can be calculated.

Quote

Sorry, but that's kind of how science works. You observe, you gather data, you try and understand what the data is telling you.

Please don't try to lecture me about how science works. Do you even know what I do for a living?

If your observations are biased, and your data is insufficient then you end up not understanding anything beyond what you wanted to be true from the start.

Quote

The limited sample supports what it appears a majority post regularly about in the forums, MM is not doing its intended job.

The limited sample can't support anything. It's too limited. It is the very definition of observation bias at work. If the data seems to confirm what you originally believed before the analysis then GOOD SCIENCE is to be even more skeptical of the data. Humans are very, very good at lying to themselves without being consciously aware of it.

Quote

If you just want to make matches as quickly as possible, you don't need Tiers, or rankings, you just start slopping people together as soon as they show up.

Which is why I said that the matchmaker builds matches as quickly as possible WHILE LIMITING THE MIXING OF EXPERIENCED AND INEXPERIENCED PILOTS as much as possible.

Quote

We had that, we didn't like it, we got elo, then we got Tiers.

I'm supporting these sentences:

"In fact matchmaker doesn't assemble the equal teams. It makes teams to be unequal. "

What size tin foil hat do you wear?

Quote

This from experience and two different people doing analysis independently, appears to be proven.

And if you were convinced by those two different "analysis" then you deserve to be ripped off by every huckster who comes along. Your critical thinking skills are nearly non-existent.

Quote

Yes, yes, he goes on to say:

"The one team is determined to win, the other - to lose."

Poorly worded, perhaps a bit hyperbolic if the syntax was intended, but it doesn't invalidate the fact that it really does appear from the data,

No it doesn't. And no amount of his and your repeating that claim will ever make it true. That's not how statistics and science work.

Quote

and reported experiences, and my own personal experience,

Very definition of biased observation.

Quote

that MM is NOT creating balanced teams and if/when it does, it's more accident than intent, primarily because the key measure of each pilot is insufficient, the player's Tier ranking is not sufficient to gauge skill.

Got a metric that does it better? We're all ears!

Quote

After all, I'm fairly certain that in most circumstances any player with a W/L ratio of less than 1.0 should be going DOWN in ranking at some point.

A zero sum system is the root of ELO systems. You said that didn't work when it was tried but here you are arguing in favor of it.
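For readers unfamiliar with the term, the zero-sum property can be sketched with the classic two-player Elo update. This is the textbook form with an assumed K-factor of 32; how PGI actually applied Elo to 12 v 12 matches is not described anywhere in this thread.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return the new (rating_a, rating_b).

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss by A.
    Whatever A gains, B loses: the total rating in the pool is unchanged.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
print(new_a, new_b)  # the two ratings still sum to 3000
```

Under a rule like this a player who loses more than they win drifts downward, which is the behaviour being asked for in the quoted post.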

Quote

HOWEVER, with the Tier system as slanted to wins as it is, you can have a very low W/L ratio and STILL grind into Tier 1.

So? If the goal is to separate experienced from inexperienced then it works just fine.

Quote

Or, we Tier 1's could suck it up and let MM churn for another minute or two...

Given player population sizes during some times of the day, just how long are you willing to wait for your match? Is everyone in agreement with your opinion? Why or why not?

Quote

AND/OR BETTER YET: PGI could actually spend for some WoT level TV and magazine ads to maybe attract more players so that we could actually HAVE a decently sized player base to avoid the wait time...

So, you acknowledge that the root problem is player pool size but think that a "better" matchmaker will overcome that root problem?

Quote

I guess I wasn't clear:

"...so it should be using other data, W/L, MS, etc. all seem like a good place to start."

PGI has this information already, surely it can make sure that the average W/L, match score, and damage per match amongst the two sides is closer to even.

Those metrics are already incorporated in PSR. You want to create a more complex algorithm that attempts to balance multiple metrics simultaneously between two teams? Why? Why would you create such a grossly inefficient system?

In your imaginary system, how does matchmaker balance match score between the two teams at the same time it's trying to balance damage per match between those teams? What does it do when those numbers are not strongly correlated within each player?

Sheer lunacy!

Quote

You want me to write the code for that? Soon as PGI hires me away from my current employer, and trust me, in Canadian dollars, that's gonna be a LOT more than I bet they're willing to spend, I'll be happy to.

I'll bet that your solution ends up combining those different metrics into one overall summary score for each player and that you end up using that summary to create the teams.
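A rough sketch of what such a "one summary score per player" approach might look like. The metric names, the equal weights, the four sample players and the greedy split are all assumptions made for illustration; none of this is PGI's algorithm or anyone's actual proposal.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values so metrics on different scales can be combined."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def summary_scores(players, weights):
    """Collapse several metrics into one weighted z-score sum per player."""
    names = list(players)
    totals = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        zs = z_scores([players[n][metric] for n in names])
        for name, z in zip(names, zs):
            totals[name] += weight * z
    return totals


def split_teams(scores):
    """Greedy split: strongest first, each joins the weaker non-full team."""
    if len(scores) % 2:
        raise ValueError("need an even number of players")
    team_size = len(scores) // 2
    teams, totals = ([], []), [0.0, 0.0]
    for name in sorted(scores, key=scores.get, reverse=True):
        open_teams = [i for i in (0, 1) if len(teams[i]) < team_size]
        i = min(open_teams, key=lambda j: totals[j])
        teams[i].append(name)
        totals[i] += scores[name]
    return teams, totals


# Hypothetical players and weights, invented for the example.
players = {
    "A": {"wlr": 2.0, "match_score": 400},
    "B": {"wlr": 1.5, "match_score": 300},
    "C": {"wlr": 1.0, "match_score": 250},
    "D": {"wlr": 0.5, "match_score": 150},
}
scores = summary_scores(players, {"wlr": 0.5, "match_score": 0.5})
teams, totals = split_teams(scores)
print(teams, totals)
```

Note that the moment the metrics are folded into one number, the matchmaker is again balancing a single proxy for skill, which is the point being made above.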

Quote

Doesn't invalidate that, at least at that level, IT CAN BE DONE.

Since we're fairly certain we're not getting balanced matches now, it'd sure be nice to at least attempt it, no?

No. We are NOT "fairly certain we're not getting balanced matches now." That's my entire point. Neither the OP nor Tarogato has demonstrated anything of the kind. All we have is the biased opinions of people. Nothing systematic or objective about any of it.

Quote

Or are you just here to argue?

And you're here to do what . . . knit puppy mittens?

Quote

Yeah, yeah, yeah... "lies, damned lies, and statistics..."

<yawn>

Those who don't understand statistics are the ones who are fooled by the liars. Guess which category you fit into!

Quote

I didn't mention KDR.

I mentioned W/L ratio, match score, and damage per match.

Yep. My bad. Sorry for the misquote.

Quote

I think those are pretty good places to start measuring 'true' pilot skill.

How should they be combined? What about new/smurf accounts that have very few matches and thus might have extreme values? Is a new player that got lucky in his first match going to be placed in with the best of the best in his second match? How will you prevent that?

Quote

Are we not seeing people that are generally considered "very skilled" at the top of the leaderboards month after month? What are their W/L ratios, avg. match score, and avg. damage per match?

Group queue vs. solo?

Quote

Typically they're MUCH better than people lower down the leaderboards...

Yes. Do you think there are enough of them to fill out a complete match every time one of them hits the Quick Play button at any time of the day? If you do manage to get them into their own matches with others like them, will their metrics remain the same high level over time? As their metrics drop do they come join the rest of us plebes down below?

Isn't that what we already have now?

Quote

It seems a not unreasonable thing to try...

Why? It will produce exactly what we have now.

A small player pool cannot be overcome by a more elaborate matchmaker.

#203 Tarogato

Member

Civil Servant
6,558 posts

LocationUSA

Posted 23 April 2017 - 04:18 AM

vandalhooch, on 22 April 2017 - 08:08 PM, said:

And you were personally making the call of whether to include or not include them in your data set based on how you "felt" about the match.

Again, I didn't know what numbers each individual match would poop out until long after I decided to save the screenshot and add it to the data. It's not like I analysed each match, went "hrmmm, this one won't support my narrative, I'll throw it out."

Quote

Okay, now for the next set of problems with your data.

1 - Sample size of 71 is likely not large enough for an alpha of 5% given the inherent variance of the data.

Given how crude the whole thing was, I'd say 5% is pretty narrow, and shows that it might be worth looking into.

Quote

2 - Your sample data does not control for other factors that might increase or decrease frequency of stomps.

True. But in order to do this, I'd have to probably use OCR and collect thousands of matches. That's getting to be a bit much.

Quote

3 - What is your control for map effects? Mode effects? Group vs. solo effects? Initial drop site effects?

4 - What is the baseline rate of difference between teams for all matches? Are stomps common? Are they rare? How well does your model predict the rate of stomps?

Again, in order to go this in-depth, I'd have to up the scale and depth by an order of magnitude. Though, I already did note that this was solo queue only, so at least we can tick one off the list.

Quote

Without all this additional data, neither you nor I can really say anything at all about the strength or weakness of the matchmaker. All you have possibly shown is that unbalanced teams (according to your metric) rarely result in a reverse stomp.

But it has absolutely no relevance to the question of if the matchmaker is failing to generally make evenly matched teams. For that, you need to collect the results of thousands of matches so that you can run a proper ANOVA to account for all the factors that might affect the outcome of any particular match.

I agree, but that's PGI's job. My goal was to show that there is enough evidence to warrant a proper investigation. The outcome of a stomp match being predicted correctly 75% to 80% of the time just by the players' stats is fishy, that's the point I was trying to make. And I showed in my OP how matches could be better constructed (I checked a few actual matches in my data to confirm that my suggestion about swapping players could be viable before just throwing the idea out there).

Quote

BTW: Dimento just tried to tell me that true pilot skill is a combination of WLR, KDR, and MS. Gosh, it seems like there's no universal agreement on what metrics truly indicate an individual pilot's skill.

I mean, I agree. WLR, MS, and KDR are better measures of a player than the experience bar we have now that is PSR.

If anything, PSR should be based on

- Your average matchscore, primarily
- adjusted by what weight classes you play, proportionally
- adjusted by the number of matches you played, with a cap (so that you can't, or at least are unlikely to, be thrown into the Tier 1 sharktank unless you've played a certain number of matches, like 100, or 500, or whatever turns out to be an appropriate limit)
- WLR (and KDR) (though honestly, I haven't had the most convincing results by using these in player rating algorithms, but if done properly they should be effective at reconciling the difference between high MS pug-star heroes, and low MS but high-efficiency group-queue winners).
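One possible reading of the first three bullets, as a minimal sketch only. The function name, the per-class baseline scores and the 100-match trust threshold are hypothetical, and the WLR/KDR term is left out because the list does not say how it would be combined.

```python
def proposed_rating(avg_score_by_class, matches_by_class, class_baseline,
                    full_trust_matches=100):
    """Match-score rating, adjusted by weight class and by matches played.

    avg_score_by_class: the player's average match score per weight class
    matches_by_class:   matches the player has played in each class
    class_baseline:     assumed population-average match score per class
    Returns a value where 1.0 means "average player".
    """
    total = sum(matches_by_class.values())
    if total == 0:
        return 1.0  # no evidence yet: treat as average
    # Weight-class adjustment: score relative to the class baseline,
    # weighted by the share of matches played in that class.
    relative = sum(
        (matches_by_class[c] / total) * (avg_score_by_class[c] / class_baseline[c])
        for c in matches_by_class
        if matches_by_class[c] > 0
    )
    # Match-count cap: the rating only moves fully away from "average"
    # once the player has full_trust_matches on record.
    trust = min(total, full_trust_matches) / full_trust_matches
    return 1.0 + trust * (relative - 1.0)


# Hypothetical player: 30 light matches, 20 assault matches.
rating = proposed_rating(
    avg_score_by_class={"light": 300.0, "assault": 400.0},
    matches_by_class={"light": 30, "assault": 20},
    class_baseline={"light": 200.0, "assault": 320.0},
)
print(round(rating, 3))
```

With only 50 matches on record, this player's 40% above-baseline performance is counted at half strength, so a lucky first match cannot put a new account straight into Tier 1.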

Quote

Not sure what you mean by "went down, average of 3%." Significance is something that can be calculated given the proper data. With a sample size of 71 matches however, your conclusions are unlikely to prove significant at an alpha of 5%.

As in, previously, the measured variables agreed with the result of the match 75%-80% of the time, a pretty danged strong correlation for something as sophisticated as MWO matches. After removing the 12-2 data, those numbers only went down by like 2%, 2%, and 4%, respectively, shown on the table. For having removed nearly 40% of the data, and the result didn't even change much... I think that's pretty telling that there is an underlying pattern.

Quote

They should never have been included in the first place. They are just as much cherry picking as your "feelings" about stomps. A proper data set, properly analyzed, might be capable of supporting the metamechs opinions but without that, their inclusion in your data set is useless. Garbage in, garbage out.

I analysed them because I didn't know what would happen, and I was looking anywhere I could for patterns/predictability. They were sort of "extras", or "wildcards" that could be worthless, or could much to my surprise show a very strong correlation. The results of course turned out pretty inconclusive, but I think it's a shame to not share them, because I did the work and I wanted people to see what I found, whether it was useful or not. Hey, if I decided to not show them in my post, that would be cherry picking wouldn't it? =P

Quote

Not a study (see my previous posts). Your analysis, now, is more reliable in indicating that 12-0 and 12-1 matches often have unbalanced teams (as defined by your WLR metric). Your analysis says absolutely nothing about whether or not the matchmaker is doing a "good job" or not.

But again, it does show that in cases of stomp matches the result can be predicted up to 80% of the time by just looking at the players before they drop. You're right, I didn't show that A STOMP could be predicted with any certainty, I only showed that when stomps DO occur, they show some signs of being predetermined. At least, to a large enough extent that I feel it supports the notion that PGI should really take a fresh look at their matchmaker and actually check if they are matching players optimally. i.e., is PSR alone a good enough metric? I don't think so.

#204 Tarogato

Member

Civil Servant
6,558 posts

LocationUSA

Posted 23 April 2017 - 04:45 AM

Dimento Graven, on 22 April 2017 - 09:12 PM, said:

I'm supporting these sentences:

"In fact matchmaker doesn't assemble the equal teams. It makes teams to be unequal. "

This from experience and two different people doing analysis independently, appears to be proven.

You have to be very careful how you word these things though, and because this is worded very cynically I strongly disagree with it. It seems to me he is saying "the matchmaker intends to make imbalanced matches", or that "the matchmaker tries to force results by assembling stronger teams against weaker teams."

I don't believe this to be true. I believe the matchmaker does the best it can at creating matches where each side has equal opportunity to win. Any evidence we gather that shows unnecessary imbalances doesn't necessarily mean the matchmaker is trying to create unfair matches, but it certainly shows that the matchmaker is failing to create fair matches. This is two completely different spins on the same particle here.

#205 Shifty McSwift

Member

2,889 posts

Posted 23 April 2017 - 04:48 AM

I am starting to think the 1-2 page argument I got into about the matchmaker was well handled by comparison to some of the discussion around it.

**Pats self on back**

#206 Tarogato

Member

Civil Servant
6,558 posts

LocationUSA

Posted 23 April 2017 - 05:24 AM

vandalhooch, on 22 April 2017 - 11:34 PM, said:

I see someone pretending that they have done some science and then trying to use their pretend "study" to sway the opinions of others, I speak up. You don't like it, don't pretend to be doing science.

[...]

Nope, nope, nope. That 80% you are quoting from Tarogato does not mean what you think it means.

Tarogato only included stomps (12-0 and 12-1) in his analysis. The 80% was the percentage of stomps that had the "higher level" team winning. Note, 20% of those stomps were STOMPS BY THE "WEAKER TEAM." Not just that the "weaker team" won, they STOMPED the stronger team. Tarogato's data did not show that the majority of matches have such a discrepancy because he did not collect data on every match! How many matches do "weaker teams" according to Tarogato end up winning? We have no idea because he didn't collect that data.

[...]

Please don't try to lecture me about how science works. Do you even know what I do for a living?

If your observations are biased, and your data is insufficient then you end up not understanding anything beyond what you wanted to be true from the start.

The limited sample can't support anything. It's too limited. It is the very definition of observation bias at work. If the data seems to confirm what you originally believed before the analysis then GOOD SCIENCE is to be even more skeptical of the data. Humans are very, very good at lying to themselves without being consciously aware of it.

I'm just gonna leave this here...

You very clearly know a lot more about this than any of us. I'm not being facetious... I've actually already learned a few things from you and I'm not going to pretend I didn't. I'm not a statistician, or scientist, I'm just a dude with copious spare time and a big crush on curiosity - I have a lot to learn still, and I accept that I will make mistakes and can unintentionally misrepresent data.

Now, I did limit the scope of my "study" for practical purposes. I entered my data manually, and would need a working OCR to expand the scope enough to even attempt to assuage the concerns you've raised. But even if I did that, I could still fall victim to more scientific inadequacies due to my growing but decidedly limited knowledge.

Now, I've already spent maybe... two hours? ... just reading and replying to this thread, and I suspect perhaps you might have as well. It would be absolutely wonderful if somebody like you, with superior knowledge, experience, and ideals... spent this sort of time doing this kind of work. Showing how it's done properly, and what ACTUAL objective conclusions can be definitively drawn. I'd love that! I'd really like to see somebody one-up me. But nobody wants to spend the time! I'm fallible, duh! I wish more people cared enough to actually put in this kind of work, rather than just bickering about hypotheticals on the forums like so many do.

What I mean to say is... the work I've done here is pretty much the best we have so far. And I know it's not great work, it's amateur, and you've pointed out flaws quite clearly. But who wants to step up and do more, better? Or the golden question... why should WE have to, when it's PGI's job? At the end of the day, I'll be happy if anybody shows conclusive enough evidence to merit PGI's attention, and prod them into investigating themselves, and addressing our concerns as a community that we feel the matchmaker could do a better job with the hand that it is dealt.

Sorry, this might have come off as a bit like "I put in the work even if it's shoddy, and you didn't put in any, therefore I'm above you". I realise that... I apologise, I don't intend that by any means. But I'm just not sure where to go from here. You're being very critical and argumentative, when you have an opportunity to be critical and contributive. If none of this research is valid, for various reasons, what *can* we do? Or is everything you have in mind beyond that which is reasonably practical for us, the playerbase, and thus futile? If we have a feeling that the matchmaker could be better, how should *we* go about showing to PGI that our concerns have merit?

Edited by Tarogato, 23 April 2017 - 05:33 AM.

#207 Too Much Love

Member

Ace Of Spades
787 posts

Posted 23 April 2017 - 05:27 AM

Tarogato, on 23 April 2017 - 04:45 AM, said:

You have to be very careful how you word these things though, and because this is worded very cynically I strongly disagree with it.

"Worded very cynically"? Ok, the next time I'll add some photos of cats and doggies to make it less disturbing and appropriate for minors.

Anyway, I like how my original post gets the features of sacred text and promotes the struggles of interpretations.

#208 Too Much Love

Member

Ace Of Spades
787 posts

Posted 23 April 2017 - 05:33 AM

Tarogato, on 23 April 2017 - 05:24 AM, said:

I'm just gonna leave this here...

It would be absolutely wonderful if somebody like you, with superior knowledge, experience, and ideals... spent this sort of time doing this kinda of work. Showing how it's done properly, and what ACTUAL objective conclusions can be definitively drawn. I
What I mean to say is... the work I've done here is pretty much the best we have so far.

OMG, this is such a great mix of blunt flattery and uncovered narcissism that it has the value of its own.

Glad that my topic provided you an opportunity to meet and fruitfully exchange ideas.

Edited by drunkblackstar, 23 April 2017 - 05:35 AM.

#209 Tarogato

Member

Civil Servant
6,558 posts

LocationUSA

Posted 23 April 2017 - 05:52 AM

drunkblackstar, on 23 April 2017 - 05:33 AM, said:

OMG, this is such a great mix of blunt flattery and uncovered narcissism that it has the value of its own.

lol, perhaps taken sliiiiiiiightly out of context and certainly poorly quoted, but not entirely false I guess. =P

=/

#210 BLOOD WOLF

Member

The Jaws
6,368 posts

Locationnowhere

Posted 23 April 2017 - 06:03 AM

drunkblackstar, on 23 April 2017 - 05:33 AM, said:

OMG, this is such a great mix of blunt flattery and uncovered narcissism that it has the value of its own.

Glad that my topic provided you an opportunity to meet and fruitfully exchange ideas.

you're serious?

yea vandalhooch. That confirmation bias is strong on these forums, with people patting each other on the back when they get **** wrong. makes them want to double down. the Backfire effect.

#211 Too Much Love

Member

Ace Of Spades
787 posts

Posted 23 April 2017 - 06:17 AM

BLOOD WOLF, on 23 April 2017 - 06:03 AM, said:

you're serious?

That confirmation bias is strong on these forums

#212 BLOOD WOLF

Member

The Jaws
6,368 posts

Locationnowhere

Posted 23 April 2017 - 06:22 AM

drunkblackstar, on 23 April 2017 - 06:17 AM, said:

It's cool, I see you gotta troll me, but you got destroyed in this thread. As well as your bad attempt at gathering data to prove the MM is biased/unfair.

Then the grand argument that lasted pages on end, because a few people don't know how rigorous science works.

oh yea, and before you go around saying to yourself, There is no evidence on this forum nor do I ever search for confirmation bias at all. I go where the evidence leads. Of course you don't know me outside the forums so you don't know the academic field I am in.

Shifty McSwift, on 23 April 2017 - 04:48 AM, said:

I am starting to think the 1-2 page argument I got into about the matchmaker was well handled by comparison to some of the discussion around it.

**Pats self on back**

careful, now most of the thread is a waste. The idea was refuted around page 1-2. Some people didn't want to give up.

Edited by BLOOD WOLF, 23 April 2017 - 06:30 AM.

#213 Too Much Love

Member

Ace Of Spades
787 posts

Posted 23 April 2017 - 06:40 AM

BLOOD WOLF, on 23 April 2017 - 06:22 AM, said:

It's cool, I see you gotta troll me, but you got destroyed in this thread. As well as your bad attempt at gathering data to prove the MM is biased/unfair.

I can understand your ressentiment, that is because you were with one of the most negative abusive people on team speak.

1) Your statement is simply not true. To date my OP has gathered 37 likes. A lot of people said that they agree with me and expressed their support. I appreciate that.

2) In fact, the discussion that followed was quite useful. I would say that it affected my opinion. Previously I was almost sure that MM deliberately fixes results. Now I'm not so positive. I understand that there is a possibility that it is a flaw in the PSR system. I'd like to thank the guys who provided constructive thoughts.

3) What I didn't like is the simple posts like "you are wrong", "it's BS" etc. If you have input to make, something to say, say it. If not - better move along.

Edited by drunkblackstar, 23 April 2017 - 06:48 AM.

#214 BLOOD WOLF

Member

The Jaws
6,368 posts

Locationnowhere

Posted 23 April 2017 - 06:55 AM

drunkblackstar, on 23 April 2017 - 06:40 AM, said:

1) The number of likes doesn't mean anything. Democracy doesn't overturn empirical data.

2) I am glad you're a little more advanced than people like Carl, and are capable of changing opinion in light of the data. However, the cause could be any of a number of factors. People on this forum seem to jump to the easiest conclusions and, even worse, cling to their in-groups on certain issues, and they never get out of that bubble of same concluding thoughts.

3) Sorry but it's how I post. If you're wrong empirically I am going to say so. I will also go on to explain why or give my take. It's also entirely possible I say that and I could be wrong. I have been a few times on this forum, just like everybody else. That's what discussion is for, to root out the truth. Depending on a person's disposition towards me I respond in kind. Like how I mentioned confirmation bias and you felt the need to post a meme that presumed that I am strong with confirmation bias. Sorry to say it but that makes your number 3 a hypocritical stance. again, not making it personal

Edited by BLOOD WOLF, 23 April 2017 - 06:57 AM.

#215 KingCobra

Member

FP Veteran - Beta 1
2,726 posts

LocationUSA

Posted 23 April 2017 - 07:08 AM

Mr. OP here is your answer.

Crytek (
Advanced Modular AI System

Realistically rendered and animated characters require state-of-the-art AI systems to intelligently respond to the game environment and maintain the illusion of realism. CryENGINE 3 features powerful, scalable, and flexible AI technology to handle character behaviors with modular sensory systems, such as sight and hearing, and fully support the complex requirements of the character locomotion system.)

When PGI learned they could manipulate players' win/loss rate with the Crytek Server AI, which tries to maintain a 1.0 for everyone, they started down a dark path in this game's history. No longer was the determining factor skill based, as the Server AI will attempt to limit your offensive output so you seem to hit a target but no damage is registered by the client to the server, or the other way around, where you receive more damage than you should so you die to balance the 1.0 equation.

You can download the Crytek SDK like I have and basically make a sandbox MWO clone, play with the PVE and PVP Server AI settings, play a few games with some friends, and soon you start to understand why MWO is not player skill based at all.

http://www.crytek.co...ngine3/overview

Edited by KingCobra, 23 April 2017 - 07:08 AM.

#216 Jamun

Member

30 posts

Posted 23 April 2017 - 07:09 AM

There are lies, damned lies and statistics, so they say.

You only have to be put into multiple matches to realise that the Match 'maker' isn't doing a decent job. As a T5 player I've just been put into a match with many T2 players - not fun.

Is this a function of a small player base? Who knows. The only way we would know is if PGI release the algorithm they use to do this. Shame it isn't an Open Source bit of code.

#217 Tarogato

Member

Civil Servant
6,558 posts

LocationUSA

Posted 23 April 2017 - 07:13 AM

drunkblackstar, on 23 April 2017 - 06:40 AM, said:

1) Your statement is simply not true. To date my OP has gathered 37 likes. A lot of people said that they agree with me and expressed their support. I appreciate that.

To be fair, I didn't offer my opinion either way, and I just realised that now.

I also think your study lacks a proper sample size for conclusions. 12 matches, is ... quite bluntly... pathetic. I'd like to see at least 100 matches before I'm inclined to believe something, and closer to 1000 if you want to begin to prove it.

It's a good start, but you need a lot more before it will carry any weight at all.
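As a rough illustration of why 12, 100 and 1000 matches are so different, here is a sketch of the Wilson score interval for a proportion. The 75% hit rate fed into it is an invented figure used only to show how the interval narrows with sample size; it also assumes independent matches, which real match data may not satisfy.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical: the same 75% observed rate at three sample sizes.
intervals = {n: wilson_interval(round(0.75 * n), n) for n in (12, 100, 1000)}
for n, (low, high) in intervals.items():
    print(f"n={n:4d}: {low:.3f} to {high:.3f}")
```

With these made-up counts the 12-match interval is wide enough to include 50%, meaning a coin flip cannot be ruled out, while the 100- and 1000-match intervals sit clearly above it.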

#218 Too Much Love

Member

Ace Of Spades
787 posts

Posted 23 April 2017 - 07:46 AM

Tarogato, on 23 April 2017 - 07:13 AM, said:

I also think your study lacks a proper sample size for conclusions. 12 matches, is ... quite bluntly... pathetic. I'd like to see at least 100 matches before I'm inclined to believe something, and closer to 1000 if you want to begin to prove it.

I knew that the scientific standards on online gaming forums are considered to be among the highest. I completely agree with you. 12 is not enough. Where are we? It's not "Nature" or "Science"! It's the MWO forum, for God's sake. I had to be precise, I recognize that.

I also knew that there would be a few respectful scientific peers, who specialize in the field of statistics, who would point out that my sample is quite small (turned out about 10500+ gentlemen). That's why I made a special clause in my original post about it.

Thank you for your opinion!

#219 BLOOD WOLF

Member

The Jaws
6,368 posts

Location: nowhere

Posted 23 April 2017 - 08:39 AM

KingCobra, on 23 April 2017 - 07:08 AM, said:

When PGI learned they could manipulate players' win/loss rate with the Crytek Server AI, which tries to maintain a 1.0 for everyone, they started down a dark path in this game's history. No longer was the determining factor skill based, as the Server AI will attempt to limit your offensive output so you seem to hit a target but no damage is registered by the client to the server, or the other way around, where you receive more damage than you should so you die to balance the 1.0 equation.

http://www.crytek.co...ngine3/overview

yea.........hmmm...........no

Edited by BLOOD WOLF, 23 April 2017 - 08:39 AM.

#220 vandalhooch

Member

Ace Of Spades
891 posts

Posted 23 April 2017 - 09:59 AM

Tarogato, on 23 April 2017 - 04:18 AM, said:

I get that, but it still makes your inclusion of 12-2's biased, whether or not you were consciously aware of any bias.

Quote

Given how crude the whole thing was, I'd say 5% is pretty narrow, and shows that it might be worth looking into.

I don't disagree that it would be worth looking at in a more systematic way. The 5% alpha is just the typically acceptable error rate for most sociological and psychological research. Particle physics, like at the LHC, has a much, much more stringent acceptable error rate.
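For a sense of scale, here is a small sketch comparing the 5% alpha with the five-sigma convention commonly used for discovery claims in particle physics. The five-sigma figure is my addition for illustration (the post above only says "much more stringent"), and the function name is hypothetical.

```python
import math

def one_sided_p_from_sigma(sigma):
    """Upper-tail probability of a standard normal beyond `sigma` standard deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

ALPHA_SOCIAL = 0.05                          # threshold mentioned in the post above
p_five_sigma = one_sided_p_from_sigma(5.0)   # assumed five-sigma convention

print(f"alpha = {ALPHA_SOCIAL}: about 1 false alarm in {1 / ALPHA_SOCIAL:.0f}")
print(f"5 sigma: p = {p_five_sigma:.2e}, about 1 in {1 / p_five_sigma:,.0f}")
```

The gap is several orders of magnitude, which is why a result that just clears 5% is better read as "worth a closer look" than as proof.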

Quote

True. But in order to do this, I'd have to probably use OCR and collect thousands of matches. That's getting to be a bit much.

Again, in order to go this in-depth, I'd have to up the scale and depth by an order of magnitude. Though, I already did note that this was solo queue only, so at least we can tick one off the list.

Yep. There's a reason why databases and spreadsheets are the go-to tools of modern science.

Quote

Except that's not what you measured. In the case of a stomp you actually showed that the weaker team STOMPED THE STRONGER TEAM 20% of the time. That high of a value leads me to believe that your metric is not nearly as good at identifying stronger vs. weaker teams. None of that has anything to do with the likelihood of a weaker or stronger team WINNING the match.

Quote

And I showed in my OP how matches could be better constructed (I checked a few actual matches in my data to confirm that my suggestion about swapping players could be viable before just throwing the idea out there).

Except you never collected data on win/loss rates of your so-called strong vs weak teams. You only recorded the results of stomps, not every match.

Quote

I mean, I agree. WLR, MS, and KDR are better measures of a player than the experience bar we have now that is PSR.

Since we don't know exactly how PSR is calculated, how can you possibly know that WLR, MS and KDR aren't already included in PSR?

BTW: Those three metrics are not independent of one another. Wins and losses as well as kills and deaths are part of match score calculations. That is going to be problematic for your new player skill metric. If a player has a high KDR, then they will by default have a higher average match score.

Quote

If anything, PSR should be based on

- Your average matchscore, primarily
- adjusted by what weight classes you play, proportionally
- adjusted by the number of matches you played, with a cap (so that you can't, or at least are unlikely to, be thrown into the Tier 1 sharktank unless you've played a certain number of matches, like 100, or 500, or whatever turns out to be an appropriate limit)
- WLR (and KDR) (though honestly, I haven't had the most convincing results by using these in player rating algorithms, but if done properly they should be effective at reconciling the difference between high MS pug-star heroes, and low MS but high-efficiency group-queue winners).
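The proposal in the list above could be sketched roughly as follows. This is a toy illustration only: the field names, the `weight_class_factor` multiplier, the 500-match cap and the WLR/KDR weights are all made-up placeholders, and none of it reflects how PGI actually computes PSR.

```python
from dataclasses import dataclass

@dataclass
class PlayerStats:
    avg_match_score: float      # average match score across all matches
    weight_class_factor: float  # hypothetical multiplier for the mix of weight classes played
    matches_played: int
    wlr: float                  # win/loss ratio
    kdr: float                  # kill/death ratio

def sketch_rating(stats, match_cap=500, wlr_weight=0.1, kdr_weight=0.1):
    """Toy composite rating; every weight and the cap are placeholder assumptions."""
    # Primary term: average match score, scaled by the weight-class multiplier.
    base = stats.avg_match_score * stats.weight_class_factor
    # Experience scaling: reaches 1.0 only once match_cap matches have been played.
    experience = min(stats.matches_played, match_cap) / match_cap
    # Small adjustment from WLR and KDR, centred on 1.0 (an even record).
    adjustment = 1 + wlr_weight * (stats.wlr - 1) + kdr_weight * (stats.kdr - 1)
    return base * experience * adjustment

veteran = PlayerStats(300.0, 1.0, 2000, 1.2, 1.5)
newcomer = PlayerStats(300.0, 1.0, 50, 1.2, 1.5)
print(sketch_rating(veteran), sketch_rating(newcomer))
```

With identical per-match numbers, the newcomer's rating stays well below the veteran's until the match cap is reached, which is the "keep new players out of the Tier 1 sharktank" behaviour the list describes.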

You just described the current PSR system, with the exception of the weight class weighting. I'm not sure how you are going to incorporate that into a single value for matchmaking. What happens when that pilot decides to drop in a weight class they rarely use? Does their PSR go up? Down? How much?

Quote

But you haven't detected a pattern in the match maker. You detected a pattern that if a stomp happens, then the stronger team is usually the stomper and only 20% of the time the stompee. That's hardly an earth-shattering revelation.

You didn't show that stomps happen more often than they should if teams were "balanced."
You didn't show that matches are more often unbalanced than balanced from match to match.
You didn't actually address anything that the OP of this thread claimed, which is why I pointed out to him that citing your analysis was completely irrelevant.

Quote

I definitely appreciate the hard work you put into gathering and organizing the data. I think it was a very worthwhile effort.

However, we still have to understand what it is you actually found versus what you hoped to find out.

Your technique could be used to answer the questions most people in this thread are actually interested in, but as you noted above, it would require massive amounts of work on your part, because we are having to sift through different data sources instead of having direct access to the database itself.

Quote

Better team wins match . . . news at eleven.

Quote

At least, to a large enough extent that I feel it supports the notion that PGI should really take a fresh look at their matchmaker and actually check if they are matching players optimally. ie., is PSR alone a good enough metric? I don't think so.

Without a measurement of how often teams are unbalanced using your metrics and a strong correlation with win rates favoring the stronger team, I don't think your data really supports a claim that PSR is a bad metric. It could be as you say but you still don't have the appropriate data to back that claim up.
