Statistical Analysis Of The 12-0


187 replies to this topic

#161 Jonathan8883

    Member

  • Shredder
  • 708 posts

Posted 17 January 2018 - 09:04 AM

Fascinating thread. Great work, even if it's a year old.
Clan mechs do more damage and move faster, but IS mechs are more durable. A higher-skilled player within the same tier is going to have better positioning and will read the battle better, resulting in more opportunities to deal damage, and less damage received. I think this accounts for the Clan/IS disparity in the data.

The rest is "quality within Tiers" for the most part, and the only way PGI can fix that is by implementing a more detailed/granular system.

#162 Dimento Graven

    Member

  • Guillotine
  • 6,208 posts

Posted 17 January 2018 - 09:10 AM

Jonathan8883, on 17 January 2018 - 09:04 AM, said:

...Clan mechs do more damage and move faster, but IS mechs are more durable.

...
And no. While a very few IS 'mechs without ISXL engines can be more durable than a few of their clan weight class equivalents, IS 'mechs in general are by no means MORE durable than clan 'mechs.

Damnit, this is a discussion for another thread... I'll shuddup now...

#163 sub2000

    Member

  • Bad Company
  • 127 posts

Posted 17 January 2018 - 11:19 AM

Again averages.... I see nobody studies statistics as a proper mathematical tool anymore.
It is a frigging instrument, and it requires respect and knowledge to use properly. At least provide variance if you use averages anywhere....

12:0 is the result of the snowball effect, or as people call it here, death-ball. While it is always the result of "flow" in QP single queue, there have to be seeders to start one.

Check outliers. Look for players with w2L>1.3 and k2d~2. Especially for the situations when you have two good players on one team....
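
To make the "averages hide things" point concrete, here is a minimal sketch with invented numbers (not taken from the thread's data set). Two hypothetical teams share the same mean W/L, but one of them is two strong players plus ten weaker ones, and only the spread shows that. `TEAM_A`, `TEAM_B` and `summarize` are made-up names for illustration.

```python
from statistics import mean, pstdev

# Hypothetical W/L ratios for two 12-player teams (invented for illustration).
TEAM_A = [1.0] * 12                 # twelve average players
TEAM_B = [2.2, 2.0] + [0.78] * 10   # two strong "seeders" plus ten weaker players


def summarize(wl_ratios):
    """Return (mean, population standard deviation) of a list of W/L ratios."""
    return mean(wl_ratios), pstdev(wl_ratios)


if __name__ == "__main__":
    for name, team in (("A", TEAM_A), ("B", TEAM_B)):
        avg, spread = summarize(team)
        print(f"team {name}: mean W/L = {avg:.2f}, std dev = {spread:.2f}")
```

Both teams average out to about 1.0, so a table of team means alone would call this an even match.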

Edited by sub2000, 17 January 2018 - 11:20 AM.


#164 Dimento Graven

    Member

  • Guillotine
  • 6,208 posts

Posted 17 January 2018 - 11:23 AM

sub2000, on 17 January 2018 - 11:19 AM, said:

Again averages.... I see nobody studies statistics as a proper mathematical tool anymore.
It is a frigging instrument, and it requires respect and knowledge to use properly. At least provide variance if you use averages anywhere....

12:0 is the result of the snowball effect, or as people call it here, death-ball. While it is always the result of "flow" in QP single queue, there have to be seeders to start one.

Check outliers. Look for players with w2L>1.3 and k2d~2. Especially for the situations when you have two good players on one team....
Meh, I disagree with you on this actually.

There's good data here, but as they say, "There are lies, damned lies, and statistics..."

Anyway, from a strictly personal gaming experience, over the last few months 12-0/12-1/12-2 matches have become considerably rarer.

I don't know if that's just because I'm in a particular trend right now, or if it's due to something that PGI has actually done, in game, or with the matchmaker.

But for now, and in my own experience, it 'feels' improved (most of the time).

#165 Grus

    Member

  • Little Devil
  • 4,155 posts

Posted 17 January 2018 - 11:48 AM

Who's the necromancer? These are way old numbers...

#166 Xavori

    Member

  • The God of Death
  • 792 posts

Posted 17 January 2018 - 12:41 PM

You could have saved yourself a ton of work if the point was to critique the matchmaker :P

The goal of a working matchmaker is to give everyone an equal chance to win every match. That means everyone should have a W/L ratio that approaches 1 over time. You can just look at the leaderboard, see that that is not the case, and go, "Yup. Bad matchmaker."

#167 Xavori

    Member

  • The God of Death
  • 792 posts

Posted 17 January 2018 - 12:46 PM

sub2000, on 17 January 2018 - 11:19 AM, said:

Again averages.... I see nobody studies statistics as a proper mathematical tool anymore.
It is a frigging instrument, and it requires respect and knowledge to use properly. At least provide variance if you use averages anywhere....

12:0 is the result of the snowball effect, or as people call it here, death-ball. While it is always the result of "flow" in QP single queue, there have to be seeders to start one.

Check outliers. Look for players with w2L>1.3 and k2d~2. Especially for the situations when you have two good players on one team....


Heh. The bigger problem is people that know some statistics, but not the underlying theories.

It's practically impossible to read anything into our current leaderboard stats because the matches are effectively random. And contrary to what so many say, you can't just take a huge sample and turn random numbers of players into meaningful stats, because the required elements of such a sample don't exist. There is no normal distribution of player skill. There is no way to separate the mech from the pilot in terms of how effective it will be. If MWO has a similar distribution of player skill to other small-medium F2P games, then the casuals and potatoes so vastly outnumber the skilled players that you have a bell-bottom jeans curve, not a bell curve. And since low skill players can be expected to perform fairly randomly (sometimes they'll play well, sometimes badly, sometimes who knows), they take the randomness of the matchmaker and just make it even moar randomer :P

#168 NRP

    Member

  • Fire
  • 3,949 posts
  • Location: California

Posted 17 January 2018 - 12:54 PM

Is it true Tarogato plays an oboe?

#169 lazorbeamz

    Member

  • 567 posts

Posted 17 January 2018 - 04:30 PM

Nice topics you got there. Can you investigate the impact of different weapon types on winning or losing? Large, medium, small lasers, cannons, PPC, Gauss, etc.

We are often upset because we think that some weapon is OP or weak. Would like to see some proof of that or something that dispels the illusion.

Edited by lazorbeamz, 17 January 2018 - 04:31 PM.


#170 mouser42

    Member

  • Bad Company
  • 382 posts
  • Location: b-more

Posted 17 January 2018 - 05:13 PM

NICE! Great job. On top of the amazing work you did, I'm impressed that people are reading it too.

#171 nehebkau

    Member

  • 3,386 posts
  • Location: In a water-rights dispute with a Beaver

Posted 22 January 2018 - 07:36 AM

Xavori, on 17 January 2018 - 12:46 PM, said:


Heh. The bigger problem is people that know some statistics, but not the underlying theories.

It's practically impossible to read anything into our current leaderboard stats because the matches are effectively random. And contrary to what so many say, you can't just take a huge sample and turn random numbers of players into meaningful stats, because the required elements of such a sample don't exist. There is no normal distribution of player skill. There is no way to separate the mech from the pilot in terms of how effective it will be. If MWO has a similar distribution of player skill to other small-medium F2P games, then the casuals and potatoes so vastly outnumber the skilled players that you have a bell-bottom jeans curve, not a bell curve. And since low skill players can be expected to perform fairly randomly (sometimes they'll play well, sometimes badly, sometimes who knows), they take the randomness of the matchmaker and just make it even moar randomer :P


Uh, wrong? If you take a large enough sample your confidence in the trends goes up -- that's basic maths.
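
For what it's worth, a rough sketch of the "basic maths" part: the margin of error on an observed win rate shrinks with the square root of the number of matches. This uses the usual normal approximation and assumes independent matches drawn from the same population, which is exactly the assumption being argued about above, so it only speaks to sampling noise, not bias. The 0.59 is borrowed from the tonnage figure quoted in this thread; the match counts and the name `win_rate_margin` are made up.

```python
import math


def win_rate_margin(win_rate, n_matches, z=1.96):
    """Approximate 95% margin of error for an observed win rate.

    Uses the normal approximation to the binomial and assumes the matches
    are independent draws from the same population.
    """
    return z * math.sqrt(win_rate * (1.0 - win_rate) / n_matches)


if __name__ == "__main__":
    for n in (25, 100, 400, 1600):
        print(f"n = {n:5d}: 0.59 +/- {win_rate_margin(0.59, n):.3f}")
```

Quadrupling the sample halves the margin, so confidence does go up with sample size, just slowly.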

BTW I necro'd this thread because, well, in a year what has changed -- seriously if PGI ignores something for a year is it really dead? Maybe just in hibernation?

Edited by nehebkau, 22 January 2018 - 07:37 AM.


#172 Growlly

    Member

  • Bad Company
  • 82 posts

Posted 22 January 2018 - 10:58 AM

I know this was necro'd, but I wasn't around then, so thoughts anyway from a scientician:
1) I love the use of Kill Rate in this thread over KDR.

2) I would like to see data from a comparison group--matches that weren't stomps. It could be that some or all of these variable imbalances are everywhere, but most of the time it doesn't matter. And what do 12-11 games look like in comparison? In other words, looking at matchmaking "successes" could reveal the most important differences.

3) What about map and game mode? If you're on the bad spawn of Grim Plexus or River City Domination, you're gonna have a bad time. And then there's Escort... probably should just throw out data from that mode.

4) I agree with the apparent oversight of emphasis on Clan mechs in the analysis, especially since this was before another year's worth of nerfs.

5) I would like to see some additional examination on the WLR and Kill Rate variables. The winning team is higher on both, but only by a seemingly small amount (about 0.1). We see that the average is higher, but is there clustering? In other words, it could be that in these matches, one team has 12 decent players, and the other team has 5 amazing players and 7 potatoes. With such a small difference, it could even be that one team has 12 decent players and the other has 11 decent players and one s o l i t u d e.

6) I would like to see some interaction variables. In other words, what happens when the high WLR/Kill Rate players are in the assault mechs vs. potato assaults? That could explain everything. What happens when one side has Clan Assault mechs? What I'm saying is that it could be a combination that matters.

7) Unfortunately there's no way to quantify "the most OP thing in the game"--teamwork. It's an absent variable that could explain a lot of the variance. The easiest thing I can think of (an observer on each team that records how many messages were relayed) would still be really difficult and subject to bias.

8) I would love to see a similar approach taken to the group queue matchmaker, which I think we can agree is worse than the solo queue.

9) I would love a supplemental examination of the snowball/deathball effect. In other words, what is the likelihood of victory for a team that starts 2-0? What is the effect of AFK/disconnect players?
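
For point 9, the calculation itself would be simple once someone has the records. A minimal sketch, assuming a hypothetical `Match` record that notes which team (if any) went up 2-0 and who won; the sample data is invented, not real match data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    first_two_kills: Optional[str]  # "A" or "B" if one team went up 2-0, else None
    winner: str                     # "A" or "B"


def win_rate_after_2_0(matches: List[Match]) -> Optional[float]:
    """Fraction of matches won by the team that took a 2-0 lead.

    Matches where neither team led 2-0 are ignored. Returns None when no
    match qualifies.
    """
    leads = [m for m in matches if m.first_two_kills is not None]
    if not leads:
        return None
    held = sum(1 for m in leads if m.winner == m.first_two_kills)
    return held / len(leads)


if __name__ == "__main__":
    # Invented records, not real match data.
    sample = [Match("A", "A"), Match("B", "A"), Match(None, "B"), Match("B", "B")]
    print(win_rate_after_2_0(sample))
```

The hard part is collecting kill order per match, not the arithmetic.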

Quote

Nice topics you got there. Can you investigate the impact of different weapon types on winning or losing? Large, medium, small lasers, cannons, PPC, Gauss, etc.

We are often upset because we think that some weapon is OP or weak. Would like to see some proof of that or something that dispels the illusion.


Doubtful, since the end screens won't show loadouts.

EDIT: Also, this was astounding work. Should have said that earlier, was in science mode.

Edited by Growlly, 22 January 2018 - 11:07 AM.


#173 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 22 January 2018 - 02:09 PM

FFS.

W/L.

That's it.

If you make a matchmaker that's based around or including anything OTHER than w/l, the matchmaker is ****.

You're not trying to balance teams on KDR, you're not trying to balance teams around who farms the most damage - you're trying to balance teams based on only one thing; their odds of winning.

Every other metric, every single one, can be looked at to find patterns in WHY and HOW one person or team wins or loses but actually WINNING is only reflected, ONLY REFLECTED IN HOW OFTEN THEY WIN OR LOSE.

Your W/L *skews* your score, your KDR and your kills per match because matches you win either directly buff your score (double-dipping that result) or skew your potential for the others.

There is no statistical analysis view on a matchmaker for building teams out of pug players that's going to use anything other than w/l that's not going to be ****. Take PSR as an example. You could use matchscore; which would put people who win only 1.0 to 1.1 but do tons of useless damage because they only play LRM boats at the same performance level or higher than some of the players on the winning team of MWOWC. Go look at the Jarls List for an example; it's got people with a 1.1 w/l in the top 1/10th of 1% because they play LRMs all the time every match and so have inflated damage scores. You would have the exact same caliber of matches (or worse!) with that than we have with PSR.

W/L. That's it. That's all. You want it to be really good, you have a score for the player, a score for mech chassis and a small modifier for the weapons and equipment on the mech based on those mechs and those weapons w/l history.

#174 DAYLEET

    Member

  • Bad Company
  • 4,316 posts
  • Location: Linoleum.

Posted 22 January 2018 - 02:28 PM

Tarogato, on 30 January 2017 - 01:43 PM, said:

Team tonnage had a negative correlation. The heavier team was on the losing side of a stomp 59% of the time. I suppose this makes sense if you look at it from the perspective that the team with the heaviest and slowest fatties is more likely to have those fatties get left behind in the Nascarfest and get picked off, initiating a snowball effect. I can't assert this fully, but if there is enough interest I could investigate a little more closely and see if I can confirm these suspicions.



Pure tonnage isn't a good metric. The team that gets the assault flavor of the month always wins if the other team gets the unique snowflakes. Lesser tonnage wins? Which assaults were considered top of the line when you made your stats? 100-tonners have not been great since the KDK got balanced. In QP it's always been like this: the chassis/variant has a lot to say when one team gets all the meta.

Edited by DAYLEET, 22 January 2018 - 02:31 PM.


#175 Tarogato

    Member

  • Civil Servant
  • 6,557 posts
  • Location: USA

Posted 22 January 2018 - 04:50 PM

sub2000, on 17 January 2018 - 11:19 AM, said:

Again averages.... I see nobody studies statistics as a proper mathematical tool anymore.
It is a frigging instrument, and it requires respect and knowledge to use properly. At least provide variance if you use averages anywhere....

12:0 is the result of the snowball effect, or as people call it here, death-ball. While it is always the result of "flow" in QP single queue, there have to be seeders to start one.

Check outliers. Look for players with w2L>1.3 and k2d~2. Especially for the situations when you have two good players on one team....

Growlly, on 22 January 2018 - 10:58 AM, said:

2) I would like to see data from a comparison group--matches that weren't stomps. It could be that some or all of these variable imbalances are everywhere, but most of the time it doesn't matter. And what do 12-11 games look like in comparison? In other words, looking at matchmaking "successes" could reveal the most important differences.

Aye, when I get OCR up and running, I should be able to farm stats from *all* matches. In this study I typed in pilot names manually in the downtime between matches, which was very labour-intensive. I'd love to take a more comprehensive approach, as you've said, looking to see if the magnitude of statistical (historical) discrepancies between teams does truly seem to be a predictor of match outcome.

Of course, I'm learning as I go. I'm not a statistician or a scientist, it's just a hobby I've picked up over the years of being disappointed with aspects of this game and desiring to dig deeper. So yes, I'm an amateur. =P

Implications of variances and whatnot... all stuff I need to get a proper handle on. Anybody of course is welcome to do their own analysis based on the data I've collected, that's why I've provided it.



DAYLEET, on 22 January 2018 - 02:28 PM, said:

Pure tonnage isn't a good metric. The team that gets the assault flavor of the month always wins if the other team gets the unique snowflakes. Lesser tonnage wins? Which assaults were considered top of the line when you made your stats? 100-tonners have not been great since the KDK got balanced. In QP it's always been like this: the chassis/variant has a lot to say when one team gets all the meta.

This is why I also checked mech matchups according to how GMan rated them on his site. At the time of making this thread, I felt his tier lists were pretty accurate and reflected my own opinions of where mechs stood in terms of influence potential.



lazorbeamz, on 17 January 2018 - 04:30 PM, said:

Nice topics you got there. Can you investigate the impact of different weapon types on winning or losing? Large, medium, small lasers, cannons, PPC, Gauss, etc.

We are often upset because we think that some weapon is OP or weak. Would like to see some proof of that or something that dispels the illusion.

PGI doesn't provide us any means of gleaning what loadouts were used in a public match, so no, this is an impossibility.




MischiefSC, on 22 January 2018 - 02:09 PM, said:

FFS.

W/L.

That's it.

If you make a matchmaker that's based around or including anything OTHER than w/l, the matchmaker is ****.

You're not trying to balance teams on KDR, you're not trying to balance teams around who farms the most damage - you're trying to balance teams based on only one thing; their odds of winning.

Every other metric, every single one, can be looked at to find patterns in WHY and HOW one person or team wins or loses but actually WINNING is only reflected, ONLY REFLECTED IN HOW OFTEN THEY WIN OR LOSE.

Your W/L *skews* your score, your KDR and your kills per match because matches you win either directly buff your score (double-dipping that result) or skew your potential for the others.

There is no statistical analysis view on a matchmaker for building teams out of pug players that's going to use anything other than w/l that's not going to be ****. Take PSR as an example. You could use matchscore; which would put people who win only 1.0 to 1.1 but do tons of useless damage because they only play LRM boats at the same performance level or higher than some of the players on the winning team of MWOWC. Go look at the Jarls List for an example; it's got people with a 1.1 w/l in the top 1/10th of 1% because they play LRMs all the time every match and so have inflated damage scores. You would have the exact same caliber of matches (or worse!) with that than we have with PSR.

W/L. That's it. That's all. You want it to be really good, you have a score for the player, a score for mech chassis and a small modifier for the weapons and equipment on the mech based on those mechs and those weapons w/l history.

I wish it were that simple. We had an Elo system in the past, which was just that: your MMR was based entirely upon whether you won or lost, and how likely you were to win or lose based on the win/loss MMR of the two teams in your match. People didn't want this system, though (myself included), because in a game like MWO with 8v8 or 12v12, your win/loss record is heavily skewed by factors outside of your own control. You're only a very small part of what influences any given match, regardless of how good or terrible you are at the game, and it's a bit unfair to base your MMR mostly on the effects of the other 23 people in your matches. This is why I think that PSR is a good system in theory, but it fails to do one thing that I think is pivotal to a good over-time rating system: zero-sum modifications to the ratings of all players in a given match.
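
To illustrate what I mean by zero-sum, here's a rough sketch using the textbook Elo expectation formula applied to team averages. To be clear, this is not PGI's old Elo implementation and not PSR; the function names, the K value of 32 and the "same adjustment for every player on a team" rule are all assumptions for the sake of the example.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that side A beats side B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def zero_sum_team_update(team_a, team_b, a_won, k=32.0):
    """Update two equal-sized teams so that total rating is conserved.

    Team strength is taken as the mean rating (an assumption), and every
    player on a team receives the same adjustment.
    """
    if len(team_a) != len(team_b):
        raise ValueError("this zero-sum sketch assumes equal team sizes")
    avg_a = sum(team_a) / len(team_a)
    avg_b = sum(team_b) / len(team_b)
    delta = k * ((1.0 if a_won else 0.0) - expected_score(avg_a, avg_b))
    # Whatever team A gains, team B loses, so the rating pool stays constant.
    return [r + delta for r in team_a], [r - delta for r in team_b]
```

Whatever one side gains the other side loses, so ratings can't inflate over time just from playing lots of matches.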




Growlly, on 22 January 2018 - 10:58 AM, said:

5) I would like to see some additional examination on the WLR and Kill Rate variables. The winning team is higher on both, but only by a seemingly small amount (about 0.1). We see that the average is higher, but is there clustering? In other words, it could be that in these matches, one team has 12 decent players, and the other team has 5 amazing players and 7 potatoes. With such a small difference, it could even be that one team has 12 decent players and the other has 11 decent players and one s o l i t u d e.

6) I would like to see some interaction variables. In other words, what happens when the high WLR/Kill Rate players are in the assault mechs vs. potato assaults? That could explain everything. What happens when one side has Clan Assault mechs? What I'm saying is that it could be a combination that matters.

7) Unfortunately there's no way to quantify "the most OP thing in the game"--teamwork. It's an absent variable that could explain a lot of the variance. The easiest thing I can think of (an observer on each team that records how many messages were relayed) would still be really difficult and subject to bias.

8) I would love to see a similar approach taken to the group queue matchmaker, which I think we can agree is worse than the solo queue.

9) I would love a supplemental examination of the snowball/deathball effect. In other words, what is the likelihood of victory for a team that starts 2-0? What is the effect of AFK/disconnect players?


Group queue throws another wrench in the works however - group size. And it's not always obvious just from looking at screenshots who is in what group and which groups had more cohesion than others.

Clustering and interaction... yeah, I could probably do something to that effect with comparing averages, medians, and std deviations. That's getting to be a little beyond my understanding, but... there's work to be done there.




mouser42, on 17 January 2018 - 05:13 PM, said:

NICE! Great job. On top of the amazing work you did, I'm impressed that people are reading it too.

Growlly, on 22 January 2018 - 10:58 AM, said:

I know this was necro'd, but I wasn't around then, so thoughts anyway from a scientician...
[...]
EDIT: Also, this was astounding work. Should have said that earlier, was in science mode.

Thanks! Although, rather than astounding and complete, I hope it serves as inspiration for others to dig even deeper and do things more properly than my limited understanding of statistical analysis allows.




NRP, on 17 January 2018 - 12:54 PM, said:

Is it true Tarogato plays an oboe?

I actually do not have a, nor do I play the... oboe. Double reeds are hard to pick up as side instruments, since dedicated oboe players always make their own reeds and it's a rather expensive and very time-consuming process to produce consistent, good results. Something that multi-instrumentalists or amateurs don't have time for. I can play the bassoon a little bit though, it's slightly more forgiving in the reed department I find. I'm not very good at it, though. =3

My name, however, comes from the tárogató, which is a folk instrument in central/eastern Europe. I don't have one, I've never played one, my heritage has nothing to do with its culture... I just like its sound. It's like a marriage of rustic cultural music and a complex fully modern instrument. Somewhat eccentric and unusual, like myself. It sounds and looks like a clarinet had a baby with a soprano saxophone. It's fairly relatable to the oboe in timbre, and makes for a good jazz horn. =]

Edited by Tarogato, 22 January 2018 - 04:51 PM.


#176 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 22 January 2018 - 10:35 PM

@tarogato;

It is that simple though. Without question.

You split pug/group queue.

Your w/l is based on the average impact of your performance on w/l.

Everything else looks at HOW and WHY. I promise you -

Every single major matchmaker designed by every single group of coders and analysts isn't some study in ineptitude. The single most expensive adventure into matchmaking, TrueSkill, isn't some laughable example of 'they clearly just don't get it'.

You're arguing that LRMs are great 'if you use them right'.

If you attempt to include anything other than w/l in a matchmaker designed to match teams' ability to win or lose matches, you are designing a flawed matchmaker. If you want to match teams for build types or sniping maps or around some criteria other than their odds of winning vs each other then great - use other factors.

However if you include any other factors in it then your matchmaker is bad and the data unreliable.

Edited to add -

You're making a mistake very common in the analytics field. You've got a ton of data points and you want to USE them. It's the assumption that 'more data sources = more accurate results'.

This is only true when your end result is a mosaic answer; as in you're mining data to get, for example, profiles of shoppers and product sales related to other products or advertising. The answer you're attempting to get from your question is a wide and varied one so more data = more useful results so long as all the data is related and carefully combed.

A matchmaker isn't that. You're asking one specific question (actually making one specific prediction) with a binary answer. The results of the challenge you're predicting are binary as well. You can modify that with a confidence factor, as in how confident you are in your prediction but that's it.

The prediction itself is specific to win/loss. Because players are in teams of 4, 8 or 12, your confidence level in each sample is low; hence needing so many samples to get a good prediction. However the prediction itself, the matchmaker, can only look at w/l because that's all it is predicting about, and it's predicting the team's w/l, not the individual player's.

Make sense? If you were attempting to predict a predetermined set of players with really, really well mined data history you could build profiles for each individual person on each team and mine their performance data to get better confidence in your prediction by, perhaps, modifying your estimate of their Elo value (or equivalent) but we're not doing anything even remotely like that.

You're just building teams of pugs and attempting to get as balanced a match as possible for who is likely to win. As such w/l is all that matters. Damage, KDR, kills/match, survivability, what fixings they like on their potatoes and the weather in Spain are all unrelated to that in equal measure.

I know it doesn't *feel* that way. It *feels* like these other data points should be relevant to that, something you can mine for a modifier. They're not and you shouldn't. If it was a very closed system and you had a huge amount of data about each player and what mechs/loadouts they were taking you could make a case for confidence studies but the matchmaker itself?

W/L only. Or it's going to be as bad as (potentially worse than) PSR.

Edited by MischiefSC, 22 January 2018 - 10:49 PM.


#177 Sjorpha

    Member

  • Philanthropist
  • 4,475 posts
  • Location: Sweden

Posted 23 January 2018 - 12:30 AM

MischiefSC, on 22 January 2018 - 10:35 PM, said:

W/L only. Or it's going to be as bad as (potentially worse than) PSR.


I feel the term "W/L" is a little confusing though, if it refers to historical win/loss ratio (total win/loss since you started playing) then it isn't necessarily a good measure of your current chances to win.

Equally skilled pilots that have the same chances of winning the next match against a given opponent can have very different historical win/loss ratios depending on how fast they improved to that point. This can be illustrated by starting an alt account and playing it to tier 1: it will have a much higher W/L, since you will win a lot more matches going through tiers 5-2 than you did when you went through these tiers as a newbie. The same difference in historical w/l will be there between a fast-learning FPS prodigy and someone slowly training himself to a high level.

Elo and similar systems don't look at or even know about a player's win/loss ratio; they only look at whether you won or lost each match, compare that to the predicted result against that opponent, then modify your Elo rating based on that result. If you improve your Elo very fast this will come with a very high w/l, but slowly improving your Elo with a slightly positive w/l can also happen and reach the same rating.

In the examples above, the main account and the alt account, as well as the fast and slow learners, will reach the same Elo rating at the same skill level even though their W/L ratios are very different and it took much longer for the main/slow learner to get there.

It's true that winning and losing, modified by predicted result (if you beat a higher ranked opponent your rating goes up more), are the only relevant factors needed to create ratings and matchmaking in games, and in that sense we agree, but that is not the same as win/loss ratio and it is misleading to imply those systems use that or even know about it.

Edited by Sjorpha, 23 January 2018 - 12:36 AM.


#178 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 23 January 2018 - 12:38 AM

Sjorpha, on 23 January 2018 - 12:30 AM, said:


I feel the term "W/L" is a little confusing though, if it refers to historical win/loss ratio (total win/loss since you started playing) then it isn't necessarily a good measure of your current chances to win.

Equally skilled pilots that have the same chances of winning the next match against a given opponent can have very different historical win/loss ratios depending on how fast they improved to that point. This can be illustrated by starting an alt account and playing it to tier 1: it will have a much higher W/L, since you will win a lot more matches going through tiers 5-2 than you did when you went through these tiers as a newbie. The same difference in historical w/l will be there between a fast-learning FPS prodigy and someone slowly training himself to a high level.

Elo and similar systems don't look at or even know about a player's win/loss ratio; they only look at whether you won or lost each match and compare that to the predicted result against that opponent.

It's true that winning and losing, modified by predicted result (if you beat a higher ranked opponent your rating goes up more), are the only relevant factors needed to create ratings and matchmaking in games, and in that sense we agree, but that is not the same as win/loss ratio and it is misleading to imply those systems use that or even know about it.


At this point we're just trying to make it clear that win/loss history is all you take into account when building a matchmaker to predict people's odds of winning or losing a match. Adding k factor and how it's computed can come after we are clear on what a matchmaker is even trying to do.

W/L is used to create your Elo score but your score itself is adjusted based on the abilities of who you won or lost against. The matchmaker doesn't actually look at your w/l stats; it creates a new value and adjusts it match by match based on your win or loss and who you won or lost against. Very true.

As we don't have an Elo system in place any scraping of existing metrics would still need to focus 100% on w/l to have any relevance; especially if split pug/premade queue.

KDR, damage, match score, kills per match, favorite color and anything else has no place at all in the calculation process for predicting win/loss in this sort of environment. Until we can get even just the smart people (and I count Tarogato high on that list) to understand the how/why of that we're going to end up with stuff like PSR, which may as well split players up by total matches played in the last rolling 365 day cycle.

Edited by MischiefSC, 23 January 2018 - 12:39 AM.


#179 Sjorpha

    Member

  • Philanthropist
  • 4,475 posts
  • Location: Sweden

Posted 23 January 2018 - 12:55 AM

MischiefSC, on 23 January 2018 - 12:38 AM, said:

W/L is used to create your Elo score, but the score itself is adjusted based on the abilities of who you won or lost against. The matchmaker doesn't actually look at your w/l stats; it creates a new value and adjusts it match by match based on your win or loss and who you won or lost against. Very true.


I just explained that people with the same Elo rating can have very different w/l ratios; w/l IS NOT used to create your Elo rating.

One example from another game is a competitive chess player deliberately playing only against better opponents as a method of learning: he will have a w/l below 1.0 but rapidly improve, and his wins will also impact his Elo more. Compare that to a casual chess player who likes to win and deliberately plays worse opponents: he'll have a great w/l but his Elo won't go up. That can't be done in MWO, but it shows the disconnect between w/l and Elo, even though both are measured purely by looking at winning.
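The chess example can be put into rough numbers. This is only a sketch under made-up assumptions: both players start at 1500, the learner plays 20 games against a 1700-rated opponent and wins 8, the casual player plays 20 games against a 1200-rated opponent and wins 16, and K is 32. The result sequences are fixed by hand rather than randomly generated.

```python
def expected_score(rating, opponent):
    """Standard Elo prediction of the chance to win."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def play(start, opponent, results, k=32.0):
    """Apply a fixed list of results (True = win) against one opponent rating.

    Returns the final rating and the plain win/loss ratio.
    """
    rating = start
    for won in results:
        rating += k * ((1.0 if won else 0.0) - expected_score(rating, opponent))
    wins = sum(results)
    losses = len(results) - wins
    return rating, wins / losses


# Hypothetical schedules: 8 wins in 20 vs. stronger, 16 wins in 20 vs. weaker.
learner_results = [True, False, False, True, False] * 4
casual_results = [True, True, True, True, False] * 4

learner_rating, learner_wl = play(1500, 1700, learner_results)
casual_rating, casual_wl = play(1500, 1200, casual_results)

print(f"learner: w/l {learner_wl:.2f}, rating {learner_rating:.0f}")
print(f"casual:  w/l {casual_wl:.2f}, rating {casual_rating:.0f}")
```

The learner finishes with a losing w/l and a rating above 1500, while the casual player finishes with a 4.0 w/l and a rating slightly below 1500, because winning 80% against that opponent is a bit less than the prediction.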

You absolutely have to make this distinction to create good matchmaking and good skill ratings; it's not a minor nitpick or hair splitting. Winning or losing, and against whom, is the relevant factor for making good ratings, yes, and those good ratings are then the relevant number for the matchmaker to predict the outcome of a match and build balanced teams.

But w/l is not that; it is simply a blunt stat of your entire match history with no regard to how long you've played, how fast you improved, whether you've had long periods of not improving and so on.

For example, if you improved very rapidly in the last few months after years of casual play, this will hardly impact your w/l at all, since your total match count is so high.
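A quick bit of arithmetic shows the size of that effect. The match counts below are invented purely for illustration: 5,000 older matches at an even record, followed by 300 recent matches won at two to one.

```python
def wl_ratio(wins, losses):
    """Plain lifetime win/loss ratio."""
    return wins / losses


# Hypothetical history: years of even play, then a strong recent stretch.
old_wins, old_losses = 2500, 2500
recent_wins, recent_losses = 200, 100

recent = wl_ratio(recent_wins, recent_losses)
lifetime = wl_ratio(old_wins + recent_wins, old_losses + recent_losses)

print(f"recent W/L {recent:.2f}, lifetime W/L {lifetime:.2f}")
# prints: recent W/L 2.00, lifetime W/L 1.04
```

A player who is currently winning twice as often as losing still shows up as barely above even in the lifetime stat.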

A new player with some talent can easily have a 4.0 w/l. Is he better than you or me? Nope. That is why a new player cruising through the low tiers of a game will still have a low Elo; it will be improving rapidly at that time, but it isn't guaranteed that the player will ever reach a high level.

There are many tier 4 and 5 players with high w/l who are definitely still bad players compared to most in tiers 1 and 2, whose w/l has normalized from playing against equals. Matchmaking purely on w/l would create absolute chaos.

Edited by Sjorpha, 23 January 2018 - 01:05 AM.


#180 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 23 January 2018 - 01:08 AM

Sjorpha, on 23 January 2018 - 12:55 AM, said:


I just explained that people with the same Elo rating can have very different w/l ratios; w/l IS NOT used to create your Elo rating.

You absolutely have to make this distinction to create good matchmaking and good skill ratings; it's not a minor nitpick or hair splitting. Winning or losing, and against whom, is the relevant factor for making good ratings, yes, and those good ratings are then the relevant number for the matchmaker to predict the outcome of a match and build balanced teams.

But w/l is not that; it is simply a blunt stat of your entire match history with no regard to how long you've played, how fast you improved, whether you've had long periods of not improving and so on.

For example, if you improved very rapidly in the last few months after years of casual play, this will hardly impact your w/l at all, since your total match count is so high.

A new player with some talent can easily have a 4.0 w/l. Is he better than you or me? Nope. That is why a new player cruising through the low tiers of a game will still have a low Elo; it will be improving rapidly at that time, but it isn't guaranteed that the player will ever reach a high level.

There are many tier 4 and 5 players with high w/l who are definitely still bad players compared to most in tiers 1 and 2, whose w/l has normalized from playing against equals. Matchmaking purely on w/l would create absolute chaos.


Well aware of the difference; the point is that your winning (and losing) is the basis of a matchmaker that doesn't suck. Not even going to say a 'good' matchmaker, just any matchmaker that isn't terrible.

However, to be fair, even a straight w/l MM would normalize with sufficient sample size. After a few hundred matches anyone with a good w/l ends up in T2 and as such is playing against T1s, which will normalize them pretty quickly.

W/L is a viable mechanic on its own to use to create matchmaking tiers. It would (if you take total matches to make a confidence value) be better than PSR is. If you take Jarl's List, sort it by W/L and eliminate everyone with under 600 matches, you get a significantly better representation of player skill ranking than match score, for example. However, without question an Elo-based/style system would be vastly superior still and provide a useful matchmaking foundation.
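As a rough sketch of that filter-and-sort step: the record layout and the sample pilots below are invented for illustration and are not the actual Jarl's List format or real data. Only the 600-match cutoff comes from the post.

```python
from dataclasses import dataclass


@dataclass
class PilotStats:
    """Hypothetical per-pilot record; field names are assumptions."""
    name: str
    wins: int
    losses: int

    @property
    def matches(self) -> int:
        return self.wins + self.losses

    @property
    def wl(self) -> float:
        # Guard against division by zero for pilots with no losses.
        return self.wins / self.losses if self.losses else float("inf")


def rank_by_wl(pilots, min_matches=600):
    """Drop pilots under the match cutoff, then sort by W/L, best first."""
    eligible = [p for p in pilots if p.matches >= min_matches]
    return sorted(eligible, key=lambda p: p.wl, reverse=True)


# Made-up sample: the newcomer has the best W/L but too few matches.
sample = [
    PilotStats("veteran_b", 700, 700),
    PilotStats("new_hotshot", 80, 20),
    PilotStats("veteran_a", 900, 600),
]
ranking = [p.name for p in rank_by_wl(sample)]
print(ranking)
```

The cutoff acts as a crude confidence filter: a 4.0 w/l over 100 matches never enters the ranking, which is the problem case raised earlier in the thread.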

I don't disagree with anything you've said though and yes, absolutely, a good matchmaker isn't looking at your historic w/l; it's creating a new value and modifying it on a match by match basis depending on who you played against.

It's also important to note that the issue with w/l isn't that you drop with random players; it's that, because of existing and prior matchmakers, segments of your w/l can be skewed.

Also, at no point does damage/KMDDs/whatever else come into it as useful for matchmaking.

To reiterate, though, you are absolutely correct in that an Elo system is *not* just using your w/l record or something like that. It's an independent score that is adjusted after every match you play based on the score of who you played against, which is what makes it a very accurate and trustworthy system.

That is, if we had separate scores for the group and pug queues.

Edited by MischiefSC, 23 January 2018 - 01:08 AM.





