The Last Match Maker Thread We Need


248 replies to this topic

#1 Nightbird

    Member

  • The God of Death
  • 7,518 posts

Posted 08 June 2019 - 07:49 PM

This thread is an analysis of how to create a Match Maker (MM) for Quick Play Solo Queue that makes better quality matches than what we have today. Be warned that although I tried my best to explain it as simply as possible, it is still a tough read. Proceed if you're interested and up to the challenge.

What makes a MM good or high quality?

Some people say matches that end in stomps (lopsided scores of 12-0, 12-1) are bad, but the truth is that even when both teams have a 50/50 chance of winning at the beginning, stomps will still happen because of the snowball effect. That being said, if teams are unbalanced, for example if the chance of winning due to team imbalance is 10/90, the chance of a stomp is much higher. Therefore, I would like to define a good MM as one that makes teams with 50/50 chances of winning, as this maximizes the chances of a fair fight and at the same time minimizes the chances of a stomp.

How this thread is different from the others

Before, whenever people suggested improving the MM, it was with a method that was somewhat unscientific. This thread will substantiate its suggestion with scientific methods. We will create a simulation with rules based on how we know the current MM works, create metrics for the quality of the matches the simulated MM makes, and check that this quality corresponds to our current experiences. Then we will tweak the MM per our suggestion and rerun the simulation to see if the match quality metrics have improved.

Simulation of Current MM

This is a tough section. If you skip to the next section, just understand that the numbers picked here are meant to create simulation results similar to what we see in game today.

To simulate the current MM, first I created 100 Tier 1 players with a hidden skill level that ranges from 200 to 1800 following a bell curve (normal distribution). The skill level is hidden because it is not directly accessible by the MM. From these 100 players, the simulated MM randomly selects 24, and a random 12 are assigned to each team. This matches the current MM in that there is no consideration of past performance.
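For anyone who wants to reproduce this step, here is a minimal Python sketch of the player pool and the blind team split. It is not the code used for the thread. The post only states the 200 to 1800 range and the bell curve, so the mean of 1000, the standard deviation of 267 (which puts 200 and 1800 about three standard deviations out), the clamping, and the dictionary field names are all assumptions.

```python
import random

def make_players(n=100, mean=1000.0, sd=267.0, lo=200.0, hi=1800.0, seed=1):
    """Create n players with a hidden, bell-shaped skill kept inside [lo, hi]."""
    rng = random.Random(seed)
    players = []
    for pid in range(n):
        skill = rng.gauss(mean, sd)
        skill = min(hi, max(lo, skill))  # assumption: clamp rare outliers into the stated range
        players.append({"id": pid, "skill": skill, "wins": 0, "losses": 0})
    return players

def random_match(players, rng, team_size=12):
    """Pick 2 * team_size players at random and split them blindly into two teams."""
    picked = rng.sample(players, 2 * team_size)  # sample() returns them in random order
    return picked[:team_size], picked[team_size:]
```

The matchmaker never reads the "skill" field here, which is what "hidden" means in the post.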

To determine who wins and whether there is a stomp, I calculate the hidden skill total for each team. If the skill total for both teams is the same, the win chance is 50/50. If the skill totals are different, I estimate the win chance for Team 1 based on the difference. (tough: win chance = cumulative distribution function of the standard normal, evaluated at the skill total difference / 800, expressed as a percentage). I then generate a uniform random number between 0 and 100; if this random number is <= the win chance, Team 1 wins. A stomp occurs if the absolute difference between the random number and the win chance is >= 47.5. This means that for a balanced match there is only a 5% chance of a stomp, but if a team has a win chance of 99%, the stomp chance increases to 51.5%.
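The win and stomp rules described above can be sketched in Python as follows. The function names are made up for illustration, and the normal CDF is computed from math.erf rather than from a stats package. The last function is an addition: it works out the exact stomp probability implied by the 47.5 threshold, so the rule can be checked without running any matches.

```python
import math
import random

def win_chance_pct(skill_total_1, skill_total_2, scale=800.0):
    """Team 1's win chance in percent: standard normal CDF of (difference / scale)."""
    z = (skill_total_1 - skill_total_2) / scale
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def play_match(chance_pct, rng):
    """Roll one uniform number in [0, 100] and derive the winner and the stomp flag."""
    roll = rng.uniform(0.0, 100.0)
    team1_wins = roll <= chance_pct
    stomp = abs(roll - chance_pct) >= 47.5
    return team1_wins, stomp

def stomp_probability_pct(chance_pct):
    """Exact share of rolls in [0, 100] that land at least 47.5 away from chance_pct."""
    low_side = max(0.0, chance_pct - 47.5)             # rolls in [0, chance - 47.5]
    high_side = max(0.0, 100.0 - (chance_pct + 47.5))  # rolls in [chance + 47.5, 100]
    return low_side + high_side
```

Under this rule a 50% win chance gives a 5% stomp chance and a 100% win chance gives 52.5%, so the stomp rate grows with the imbalance. A stomp always goes in favour of the winner, because a roll that far below the win chance is a Team 1 win and a roll that far above it is a Team 2 win.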

Posted Image

After simulating each match, I record the results of the match and update the individual player stats.
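A rough sketch of that bookkeeping is below. The stat fields are hypothetical, since the thread only shows the per-player stats as an image. Counting a stomp only against the losing side is my reading of "a stomp against them" later in the post, and treating zero losses as one loss is my own guard against dividing by zero.

```python
def new_player(pid, skill):
    # Field names are hypothetical; the original stats are only shown as a screenshot.
    return {"id": pid, "skill": skill, "wins": 0, "losses": 0, "stomped": 0}

def record_result(team1, team2, team1_wins, stomp):
    """Credit a win to every winner and a loss (plus a stomp, if any) to every loser."""
    winners, losers = (team1, team2) if team1_wins else (team2, team1)
    for p in winners:
        p["wins"] += 1
    for p in losers:
        p["losses"] += 1
        if stomp:
            p["stomped"] += 1

def wlr(player):
    """Win-loss ratio; zero losses is treated as one loss to avoid dividing by zero."""
    return player["wins"] / max(1, player["losses"])
```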

Posted Image

Results of Simulation of Current MM

After running the simulation for 10,000 matches, I created some graphs to represent the quality of matches created by this simulation of the current MM. If you are interested, you can see the individual stats of all 100 players here (https://imgur.com/Dvkrq7X).

First, we have a summary of the WLR of the players based on their hidden skill level. It doesn't look bad: the best players have a WLR of 2, going down to 0.5 for the worst. (Some may think, hey, we see people with a WLR above 3 in the Jarl's list! Keep in mind that with a database of 40k players, you'll see more extreme values of skill, but they represent <0.1% of the pop. We can add higher skilled players to this sim, but with their rarity it's not necessary or helpful.)

Posted Image

Then we look at the chance of winning based on the teams created by the MM, and here it looks very bad. Only about 15% of matches have a decent win chance of 35-65. More than 50% of matches are guaranteed wins or losses (win chance of 0-15 or 85-100).
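The two shares quoted here can be computed from a list of per-match win chances with something like the sketch below. The function name is made up, and treating the band edges (35, 65, 15, 85) as inclusive is an assumption.

```python
def band_shares(win_chances_pct):
    """Return (share of close matches, share of lopsided matches) as fractions.

    Close means a Team 1 win chance of 35-65; lopsided means 0-15 or 85-100.
    """
    n = len(win_chances_pct)
    close = sum(1 for c in win_chances_pct if 35.0 <= c <= 65.0)
    lopsided = sum(1 for c in win_chances_pct if c <= 15.0 or c >= 85.0)
    return close / n, lopsided / n
```

For example, band_shares([50, 10, 90, 70]) returns (0.25, 0.5): one close match, two lopsided ones, and one in between.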

Posted Image

Finally, based on our way of calculating when stomps occur, average players experience 1/5 of matches as a stomp against them. However, lower skilled players lose to stomps 3X as often as high skilled players.

Posted Image

Hopefully these results are in the right ballpark per everyone's experiences. The goal here is not perfection, but to use these results to estimate how much of an improvement we can expect from switching out the MM.

Simulation of Win-Loss Ratio (WLR) based MM

We make one change to the simulation above. Where before we picked 24 players out of 100 and tossed them randomly into 2 teams, we now first sort the 24 players by their WLR, then put the 1st (highest WLR) player onto team 1, the 2nd and 3rd onto team 2, and so on, just as in a regular pick-up game between friends. For an example of a team being created, see (https://imgur.com/xEWJR5k) and please note that ties use a random tiebreaker.
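Here is a minimal sketch of that draft in Python. The post only spells out the first three picks, so continuing in pairs (4th and 5th to team 1, 6th and 7th to team 2, and so on, with the 24th pick going back to team 1) is an assumption based on the pick-up game comparison. The player dictionaries and the zero-loss guard are also assumptions.

```python
import random

def draft_by_wlr(players, rng):
    """Sort by WLR (highest first, random tiebreak) and deal picks 1-2-2-...-2-1."""
    def wlr(p):
        return p["wins"] / max(1, p["losses"])  # assumption: zero losses counts as one

    # The key is computed once per player, so the random tiebreak stays consistent.
    ranked = sorted(players, key=lambda p: (wlr(p), rng.random()), reverse=True)
    team1, team2 = [], []
    for rank, player in enumerate(ranked):
        # ranks 0 | 1,2 | 3,4 | 5,6 | ... alternate between the teams in pairs
        to_team1 = ((rank + 1) // 2) % 2 == 0
        (team1 if to_team1 else team2).append(player)
    return team1, team2
```

With 24 players this gives 12 per side, and each team gets one pick from every consecutive pair of ranks, which is why the WLR totals end up close.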

What happens as a result?

Results of Simulation of WLR MM

After running the simulation for another 10,000 matches, I recreated the same graphs to represent the quality of matches. If you are interested, you can see the individual stats of all 100 players here (https://imgur.com/JEsoC5Q).

First up, the WLR of players by their hidden skill level (as a reminder, hidden means unknowable by the MM). Some will question why not every WLR is 1. The reason is that if they all became 1, the WLR MM would become blind and the more skilled players would start winning more, as in the simulation of the current MM. Therefore, it is impossible for everyone's WLR to become 1.

Posted Image

Next, the chance of winning based on the teams created by the MM. For teams made with a win chance in the range of 35-65, instead of 15% of all matches, we now see 50% of all matches. Likewise, 'unwinnable' matches with a win chance of 0-15, 85-100 have dropped from >50% of all matches to 8%. The degree of improvement is extreme.

Posted Image

Lastly, the chance of stomps has dropped for everyone. It is most noticeable for lower and average skilled players where it has dropped by more than half. For the highest skilled players the change is minor. Keep in mind there is a minimum of 5% stomps even with perfectly even teams due to how we created the simulations, so it is impossible to get to 0.

Posted Image

Conclusion
I hope the graphs make it clear that we could expect a large improvement in the quality of matches by using WLR to sort players into teams.

What I Do
I analyze data from the development of various healthcare products. I program stats for a living, basically. I've worked on everything from all sorts of chemotherapy for cancer and vaccines for viruses to implants such as spinal disk replacements, knees, heart, breasts, and all sorts of other stuff.

Edited by Nightbird, 08 June 2019 - 08:16 PM.


#2 The6thMessenger

    Member

  • Nova Captain
  • 8,104 posts
  • Location: From a distance in an Urbie with a HAG, delivering righteous fury to heretics.

Posted 08 June 2019 - 08:59 PM

If only PGI actually listened to the community.

#3 Kiiyor

    Member

  • Big Daddy
  • 5,565 posts
  • Location: SCIENCE.

Posted 08 June 2019 - 10:17 PM

Nightbird, on 08 June 2019 - 07:49 PM, said:


Conclusion
I hope the graphs make it clear that we could expect a large improvement in the quality of matches by using WLR to sort players into teams.

What I Do
I analyze data from the development of various healthcare products. I program stats for a living, basically. I've worked on everything from all sorts of chemotherapy for cancer and vaccines for viruses to implants such as spinal disk replacements, knees, heart, breasts, and all sorts of other stuff.


Heck yes! This is some pretty glorious science!

Posted Image

I really like your numbers, especially the w/l ratio. The players I know with the higher w/l ratios (above 2) tend to predominately play in the group queues, whereas those decent ones I know that PUG tend to cap out around 1.5.

Man, I wish we could separate PUG and group stats.

I'd be really interested in seeing if you could simulate skill improving and decreasing across your pilots - maybe having your pilots starting out lower skilled, then rising as they get better, and plateauing as they move up tiers and meet more skilled players - as I think that's where a lot of MM complaints come from. Say there's a new-ish player who finds a mech and build that agrees with them, stomps their way from T5 to T3 or T2, and hits an enormous wall as they find players better suited to exploit their weaknesses. Maybe a skill bell curve.

There's some mad science that would need to go into that though.

...

I was actually slogging through something similar to your stats after arguments in other matchmaker threads, but I was using the Jarl's list (not perfect, but better than tier balance IMHO) as a basis for balance. Basically, people were using screens of stomps 12-3 or lower to justify their belief that the MM was random, and that too many potatoes on one side spoiled the flavor of the whole dish. My stance was that the MM could only do so much with what it had, and that the outcome of a battle was based more on how each team handled the first few minutes of contact, their play styles complementing each other (or not), or their decisions in the mechlab, rather than their overall skill levels - as I'd seen innumerable matches where great players were wrecked because of poor team decisions at the start.

Basically, I was countering their anecdotal evidence with anecdotal evidence of my own, so I decided to grab the results of some of their screens, and compare them with some of my own (I screenshot pretty much every match I'm in) to show that correlation doesn't imply causation.

So, I started painstakingly transcribing player names from each match into Jarls and getting their overall aggregate skill levels to compare to each other. Lo and behold, the first few stomp screens I had went either way as far as overall skill went. Some seemed to show skill mattered, some didn't. Kiiyor, this bears further investigation! I thought to myself.

Then I remembered how much of a pain in the *** it was to grab match results from the last time I had done it, and wandered away.

Here's a snippet:

One of the stomp screens used by others:

Spoiler


The winning team here significantly outskilled the losers. Really, it was stacked against the losers. No lube.




One of the first stomps I grabbed from my own games:
And it was a 12-0 to boot!

Spoiler


While the swing isn't as drastic, the losers of this stomp (and I was one of the vanquished) were more skilled overall, yet still couldn't muster a kill between them. I was in the match, and know we lost because we split pretty drastically at the start, and one of our blobs was caught by the enemy's bigger blob.

Still... 2 matches total, uber anecdotal, your sample sizes are small and you should feel bad blah blah totally correct blah blah.

I had transcribed a few other stomp results, and there seemed to be a lean towards a skill gap being a contributing factor, but I only had 4 or 5 matches, not nearly enough to get a decent sample size, and my free trial of my text detection software had run out since my last big batch, meaning I had to grab the player names manually...

Anyhoo, this is a really roundabout way of me saying I agree that pretty much any stat based MM algorithm is better than balancing by MWO's Tier list - but I think that true fair and balanced matchmaking in MWO is near impossible thanks to the mind boggling numbers of variables involved.
  • Are the decent w/l players rolling in meta mechs? Are they bored and experimenting? Have they made questionable decisions in the mechlab?
  • Is a player's mech and playstyle suitable for the map, and the other players on it?
  • Are the skilled pilots rolling in a mech from the class they do best in?
    • I've seen matches where players who predominately roll in Assaults and do well jump into lights and mediums and do... not so well.
  • Is there enough population to handle more complex matchmaking? Wait times are already pretty crap....
There's more, but the arguments are pretty tired at this point.


Anyhoo, kudos to you for bringing numbers to an anecdote fight. I'd be SUPER interested in checking the match results for your simulation, vs those we can get from the Jarl's list with the matches we already have through EOM screens, but it's a mad pain in the butt to transcribe them.

I'd also love to see if PGI could simulate wait times based on a more complex algorithm. I bet they already have numbers of their own for this. It seems to me that the tier MM protocols were implemented to replace the days of PSR matchmaking to speed wait times for a game with a slowly decreasing population.

#4 MrMadguy

    Member

  • 2,259 posts

Posted 08 June 2019 - 10:28 PM

I also wanted to fully simulate the current MM in order to prove that PSR is biased towards increasing, but I was too lazy to build a full simulation.

Now, what is wrong with this simulation? It doesn't take many important factors into account. For example, all 100 players aren't always available at the same time, and there is a constant flow of new and retiring players. I.e., yeah, in an all-vs-all situation a WLR matchmaker would work due to the law of large numbers. But it won't work in real conditions. That's why the AvgMS part is very important. When you compare A + B + C with X + Y + Z, it's impossible to tell what A's contribution to A + B + C is. AvgMS allows you to do that.

That's why I decided to focus on my own PSR level. The only thing we don't know about it is the actual PSR change values. Unfortunately I don't have full stats from all the matches I've ever played, so I have to guess that they follow some normal distribution. If I had full stats from at least one player, I would be able to fully crack the PSR system. What I currently suspect is that for matches below 400 MS it can be balanced. But any match above 400 will boost your PSR dramatically.

Edited by MrMadguy, 08 June 2019 - 10:38 PM.


#5 Nightbird

    Member

  • The God of Death
  • 7,518 posts

Posted 08 June 2019 - 10:45 PM

Kiiyor, on 08 June 2019 - 10:17 PM, said:

Anyhoo, this is a really roundabout way of me saying I agree that pretty much any stat based MM algorithm is better than balancing by MWO's Tier list - but I think that true fair and balanced matchmaking in MWO is near impossible thanks to the mind boggling numbers of variables involved.
  • Are the decent w/l players rolling in meta mechs? Are they bored and experimenting? Have they made questionable decisions in the mechlab?
  • Is a player's mech and playstyle suitable for the map, and the other players on it?
  • Are the skilled pilots rolling in a mech from the class they do best in?
    • I've seen matches where players who predominately roll in Assaults and do well jump into lights and mediums and do... not so well.
  • Is there enough population to handle more complex matchmaking? Wait times are already pretty crap....
One thing not to get too worked up about in stats is the sheer number of possible factors. For example, you can be tired one day and perform worse than if you're fresh and focused. Do you want the MM to take this into consideration as well? That'd be silly, right?


The simulation accounts for all the other factors by making the result random. Rather than saying the team with the higher skill total always wins, I calculate a percent. The stronger team can still lose because of variations in everyone's mech choices, cooperation in the match, emotional status, underwear color, everything. The component of all choices, all factors of a player that can positively or negatively impact the matches they are in, is rolled into one artificial construct: skill. Everything else I throw into a random number to salt the result.

Whether I salted it the right amount shows in the results of the simulation of the current MM. I can easily increase or decrease the number of stomps, for example, just by tweaking the parameters, but without stats on the actual number of stomps, I'm limited in how accurate I can make it. That being said, even if I did change it, the RELATIVE improvement from switching to a WLR MM would not change, and this is the important piece.

MrMadguy, on 08 June 2019 - 10:28 PM, said:

Now, what is wrong with this simulation? It doesn't take many important factors into account. For example, all 100 players aren't always available at the same time, and there is a constant flow of new and retiring players. I.e., yeah, in an all-vs-all situation a WLR matchmaker would work due to the law of large numbers. But it won't work in real conditions. That's why the AvgMS part is very important. When you compare A + B + C with X + Y + Z, it's impossible to tell what A's contribution to A + B + C is. AvgMS allows you to do that.


Thank you for an example of 'Before, whenever people suggested improving the MM, it was with a method that was somewhat unscientific.' If you analyze avgMS in the same way as this thread does, you'll see it provides only 20% of the benefit that WLR gives.

Edited by Nightbird, 08 June 2019 - 10:56 PM.


#6 MrMadguy

    Member

  • 2,259 posts

Posted 08 June 2019 - 11:08 PM

Nightbird, on 08 June 2019 - 10:45 PM, said:

Thank you for an example of 'Before, whenever people suggested improving the MM, it was with a method that was somewhat unscientific.' If you analyze avgMS in the same way as this thread does, you'll see it provides only 20% of the benefit that WLR gives.

No, a pure WLR MM is as bad as ranking players by Jarl's list, because skill is relative, not absolute. I.e. in an ideal situation all players would have WLR = 1 and AvgMS = 200, so an "absolute" MM wouldn't be able to work properly. That's why we have MMR, which increases when a player performs well and decreases when he performs badly. But I think that in QP AvgMS should be the most important factor, ideally the only one. WLR is only needed because some ways of contributing towards victory can't be measured by score. But in reality they're only about 5-10% of all the factors in QP. PGI still pushes WLR as the most important factor because they still think their game should be about teamplay even in QP, where teams are just bunches of randoms. It's the same reason why we still don't have deathmatch, despite the fact that it would be the most fun mode. It's the same E-Sport crap that plagues many other games. Some gamedevs are way too stubborn to realize that their vision of the game is wrong and that they should make the game for players, not for themselves.

Edited by MrMadguy, 08 June 2019 - 11:11 PM.


#7 Kiiyor

    Member

  • Big Daddy
  • 5,565 posts
  • Location: SCIENCE.

Posted 08 June 2019 - 11:12 PM

MrMadguy, on 08 June 2019 - 10:28 PM, said:

But any match above 400 will boost your PSR dramatically.


This. The complaints have been made innumerable times, that Tier gain is pretty much an XP bar. The score increase for a win is decent even if you don't contribute, and I daresay the tier decrease swing point for a loss is probably a little too generous. I find myself at least breaking even on losses to be (generally) a rule, rather than an exception. Plus, you only have to land a single decent strike to inflate your damage numbers for a match, or hang back in a multiple AMS mech and watch the dollar signs flow in for each missile you farm.

You're right that it's the contributing factors that muddy the waters. Not only does the MM have to balance by tier, it also has to *try* and fit the ideal of 4/4/4/4. Which part does the current MM prioritize? Filling the weight class brackets, or Tier?

Tough balancing act for PGI though to provide a sense of progress. I guess if you view a player's Tier level as their familiarity with game mechanics rather than overall skill, it's more palatable. Maybe there should be some form of extra skill calculation attempted for the players in each Tier match the MM tries to make as it fills the roster.

I think that the best way to SCIENCE it all would be to check the OP's findings against a sample of actual matches, to see how much influence player skill has over actual match results. The issue is sample size - I know first hand how hard it is to get a decent sample of EOM screens. There are other issues with manipulating the data also - if you're using Jarls, do you take overall stats, or the most recent X seasons? If you're using MWO vanilla stats from a player's page, is it an accurate representation of where they are now vs their overall aggregate?

Edited by Kiiyor, 08 June 2019 - 11:13 PM.


#8 Nightbird

    Member

  • The God of Death
  • 7,518 posts

Posted 08 June 2019 - 11:15 PM

MrMadguy, on 08 June 2019 - 11:08 PM, said:

No, a pure WLR MM is as bad as ranking players by Jarl's list, because skill is relative, not absolute. I.e. in an ideal situation all players would have WLR = 1 and AvgMS = 200, so an "absolute" MM wouldn't be able to work properly. That's why we have MMR, which increases when a player performs well and decreases when he performs badly. But I think that in QP AvgMS should be the most important factor, ideally the only one. WLR is only needed because some ways of contributing towards victory can't be measured by score. But in reality they're only about 5-10% of all the factors in QP. PGI still pushes WLR as the most important factor because they still think their game should be about teamplay even in QP, where teams are just bunches of randoms. It's the same reason why we still don't have deathmatch, despite the fact that it would be the most fun mode. It's the same E-Sport crap that plagues many other games. Some gamedevs are way too stubborn to realize that their vision of the game is wrong and that they should make the game for players, not for themselves.


In my simulation, skill is something which when you have more of, you win more, less of, win less. It doesn't matter where it comes from, the mech lab, twitch aim skills, situational awareness, anything and everything. AvgMS covers 20% of everything.

#9 MrMadguy

    Member

  • 2,259 posts

Posted 08 June 2019 - 11:33 PM

Ehhh. That church of the WLR MM. Same as the church of random map selection.

I personally see only two possibilities here.

1) PGI should become 100% fair, stop trying to manipulate players by withholding information from them, as many gamedevs do, and finally admit that matches just can't be balanced. They should therefore open up all information about match balance, such as player Tiers and PSRs, and provide some compensation/penalties for playing unbalanced matches, so that we don't end up with some players always playing on easy mode and earning 400k CB in every match while others barely get 50k. For example, if I had to play in a match above my skill level, I would get a compensation bonus. Or maybe it should be a call to arms system that gives me a choice between a bonus with a shorter queue and a higher quality game with a longer queue.

2) PGI should realize that the game shouldn't be 100% mathematically balanced. It should be fun. We therefore don't need a 100% mathematically correct MM that provides a pure 50/50 chance to win. We need a simple AI MM that tries to provide a good and fun gaming experience to players. For example, for me it would be much better if 50% of matches were stomps against Tier 1 and the other 50% were good fun matches against Tier 4, rather than always having a "balanced" average of 200 MS, i.e. something between 100 and 300.

Edited by MrMadguy, 08 June 2019 - 11:38 PM.


#10 Kiiyor

    Member

  • Big Daddy
  • 5,565 posts
  • Location: SCIENCE.

Posted 08 June 2019 - 11:36 PM

Nightbird, on 08 June 2019 - 11:15 PM, said:


In my simulation, skill is something which when you have more of, you win more, less of, win less. It doesn't matter where it comes from, the mech lab, twitch aim skills, situational awareness, anything and everything. AvgMS covers 20% of everything.


A good point. One of the main issues though is how much your own skill is offset by potatoes. With 11 other players, it would be interesting to see how much impact a decent player has if they're stacked with average teammates. It's probably the chief complaint in losing matches anyway. Kiiyor <all> "**** TEAM! YOU GUYS ALL SUCK OMG".

Maybe checking something like the Jarls and grabbing a sample of the leaderboards sorted by average score would show how much of an influence MS has on winrate.

Probably wouldn't be too disparate to what you have now though.

Makes me wonder what the overall match scores are now, and if there's a way to simulate the final match score with your data (12-6, 12-11 or whatever) in addition to the % chance of winning. That's one of the main perceptions of a close match. You'd be hard pressed for people to whine about GG's in chat for a 12-10, vs a 12-3.

Even that's not the closest representation of a decent match though. I'd love to be able to see the %health remaining of each player, and the team overall. Losing 12-4 would be easier to swallow if you could see that the enemy were millimeters from disaster themselves.

What were we talking about again?

#11 Nightbird

    Member

  • The God of Death
  • 7,518 posts

Posted 08 June 2019 - 11:41 PM

Kiiyor, on 08 June 2019 - 11:36 PM, said:

Makes me wonder what the overall match scores are now, and if there's a way to simulate the final match score with your data


You know how my model is Skill + Randomness, right? If you change it to Skill + avgMS + Randomness, nothing changes. Everything important in avgMS is captured in Skill, so avgMS is treated as extraneous noise.

MrMadguy, on 08 June 2019 - 11:33 PM, said:




Nice, you have a random word generator :D

#12 Jackal Noble

    Member

  • 4,863 posts
  • Location: Terra

Posted 09 June 2019 - 12:05 AM

Wow, fantastic and well presented analysis. Thank you for taking the time to break it down into an easier to digest format for us less stat inclined nerds.
Hopefully this gets implemented (or some form of) asap.

Obviously there are many other variables that are hard or impossible to account for; as I understand it, WLR is the best overall way to address that despite all else.

#13 Sjorpha

    Member

  • Philanthropist
  • 4,478 posts
  • Location: Sweden

Posted 09 June 2019 - 02:15 AM

Nice work.

Have you considered the problem that WLR can depend on how quickly you've improved? If you played for several years and improved slowly, your WLR is much lower than that of someone who improved quicker. Even if your WLR for the last 1000 games is equal or even higher, your total WLR will be lower because it is dragged down by thousands of past matches with low performance.

Maybe only the last 500 matches or so should be used?
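A windowed WLR like this is easy to sketch in Python with a fixed-length deque. The class name is made up and the window of 500 is just the figure suggested above, not something tested in the simulation. Treating zero losses as one loss is again an assumption to avoid dividing by zero.

```python
from collections import deque

class RecentRecord:
    """Keep only the most recent `window` results and compute WLR from those."""

    def __init__(self, window=500):
        self.results = deque(maxlen=window)  # True = win, False = loss

    def add(self, won):
        self.results.append(won)  # the oldest result drops out once the window is full

    def wlr(self):
        wins = sum(self.results)
        losses = len(self.results) - wins
        return wins / max(1, losses)
```

With a window of 4, a player who loses four times and then wins three times has a windowed WLR of 3.0 (three wins, one loss still in the window), while the lifetime figure would be 0.75.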

#14 Kiiyor

    Member

  • Big Daddy
  • 5,565 posts
  • Location: SCIENCE.

Posted 09 June 2019 - 02:49 AM

Nightbird, on 08 June 2019 - 11:41 PM, said:


You know how my model is Skill + Randomness, right? If you change it to Skill + avgMS + Randomness, nothing changes. Everything important in avgMS is captured in Skill, so avgMS is treated as extraneous noise.



Yeah, I was just thinking of ways to refine the numbers, or to add some result based context for players. To me, discussing WL/R is a more nebulous way to identify the 'closeness' of a decent match, as opposed to match results, for all that WL/R is probably more accurate.

I'm not attacking your numbers or methodology either, I really dig it. It's far more scientifically sound than just about anything I've seen presented here.

I did some number crunching myself aeons ago, and collected data from EOM screens for around 1000+ matches, not that long after 4x3 was first implemented. It wasn't specifically about match results, but they were mentioned and science'd a little.

The average match result then (waaaaaay back in 2014) was 12-5 (that's the average of #kills/match and #deaths/match), but the most prolific match result was 12-3. No-one likes 12-3's. 12-3's are where people get angry at people who say gg. SaltyDude <all> NO IT WASN'T A GG. SaltyDude has disconnected.

Here's a distribution chart I just whipped up from the data I have from back then:

Posted Image



I've no idea what modern match results would look like, but I daresay they wouldn't be drastically different to what we have there - they may even be better, due to balance passes and whatnot.

Using those numbers, your stomp chance (I'm assuming 12-3 or lower, or for matches that don't see one team kill all the opposition, a difference of 75% or so) is around 40%. That's insane in the membrane, and a little more bleak than the 1/5 chance you used (what do you consider to be a stomp tho? a 12-0?)

MM methodologies are great to debate, but it's harder to justify them if you can't relate them to the outcome of a match.

My point is, what are we aiming for as a 'good' match result? What should the MM be trying to organize? If the MM was able to perfectly balance every match, pitting each player against an opponent of the exact same skill level, and using whichever metrics it could to ensure each team was perfectly balanced against each other, 12-11 nailbiters should be the order of the day. That will never happen though, so what would you consider to be a good match result?

Personally, I'm happy with anything 12-8 and higher, and content with anything 12-6+, and out of those scores, I sort of... don't care if I win. If I go deeper, I'd go as far as to say that the kill-death results on the scoreboard and whether or not we won or lost are of far lower importance to me than my damage and match score. That's just me though; as a selfish pug lord, I value my own performance over the results of the team.

Using the numbers above in that chart, if I did approach my MWO match satisfaction level based on what I consider to be good scoreboard matches, I'd be not-angry with 23% of my matches.

So, how would changing the MM to a W/L ratio algorithm affect the general outcome of a match?

If you spread match results through your data with 12-0 being the worst (a mega stomp) and 12-11 being the best, and apply it to your own results, I daresay you'd see 12-6es being the most prolific match result instead of 12-3s, if the curves are similar. It's easier to get on board with match results when you're more readily able to relate them to an EOM screen, IMHO. If you could show that a WL/R MM could improve matches from 12-3s to 12-6es I'd be all over it, getting drunk and tweeting Russ and Paul.

Forgive me if I'm not making much sense, I've been up for close to 30 hours now.

#15 Roland09

    Member

  • Tai-shu
  • 474 posts
  • Location: Luthien, Draconis Combine

Posted 09 June 2019 - 03:09 AM

View PostNightbird, on 08 June 2019 - 07:49 PM, said:

Conclusion
I hope graphs make it clear we could expect a large improvement in the quality of matches made by using WLR to sort players into teams

What I Do
I analyze data from the development of various healthcare products. I program stats for a living basically. I've worked the breadth from all sorts of chemotherapy for cancer, vaccines for viruses, implants from spinal disk replacements, knees, heart, breasts, and all sorts of other stuff.


Who are you, and why is Tarogato having a vacation in your basement?

#16 General Solo

    Member

  • Legendary Founder
  • 3,625 posts

Posted 09 June 2019 - 03:42 AM

Strongly disagree with the title
We need all the match maker threads we can get
It needs to be the new Lerm/nacar/mechOP/weaponOP/MapSux/NewNewFlavourOfTheMonth thread

Been the primary cause of dying game modes all over
Let's take a count: Faction Play, Group Queue...... Don't let it happen to Quick Play, and fix FW n GQ MM pronto

If thats possible wid small pop.......I aint no mathamagician

Edited by OZHomerOZ, 09 June 2019 - 03:45 AM.


#17 Weeny Machine

    Member

  • Ace Of Spades
  • 4,014 posts
  • Location: Aiming for the flat top (B. Murray)

Posted 09 June 2019 - 04:25 AM

Do you honestly think they will spend any more resources for this game?

#18 The6thMessenger

    Member

  • Nova Captain
  • 8,104 posts
  • Location: From a distance in an Urbie with a HAG, delivering righteous fury to heretics.

Posted 09 June 2019 - 04:44 AM

Nightbird, on 08 June 2019 - 10:45 PM, said:

The one thing to not get too worked about in stats is getting overwhelmed by all the possible factors. For example, you can be tired one day and perform worse than if you're fresh and focused. Do you want the MM to take this into consideration as well? That'd be silly right?


I think it would be beneficial to have some seasonal MM reset or something. It would mean that you could build your rating back up, albeit by briefly waddling out there with your unequals, but at least if you ****** up your MM rating badly it wouldn't be much of a grind crawling out of the hole you dug yourself into.

If not a complete reset, it would probably also work if there's a gradual decrease of your Elo over time towards the mean, or an increase if you are below the mean. That means you could just stop playing for a while and your Elo would return to the mean, or if you're interested in maintaining your standing, you just have to play regularly to keep your rank.

After all, chances are, you would be playing heavily anyways if you are interested and you wouldn't feel your rank decreasing if that's the case, assuming that you have the skill to maintain it.
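The gradual drift toward the mean described above could be sketched roughly like this. It is only an illustration, not anything the game does: the function name, the mean of 1000, the weekly time step and the 5% rate are all made-up assumptions.

```python
def decay_toward_mean(rating, mean=1000.0, idle_weeks=0, weekly_rate=0.05):
    """Pull an idle player's rating toward the population mean.

    All parameters are hypothetical: each idle week closes `weekly_rate`
    of the remaining gap between the rating and the mean, so ratings
    above the mean drift down and ratings below it drift up.
    """
    if idle_weeks < 0:
        raise ValueError("idle_weeks must be non-negative")
    if not 0.0 <= weekly_rate <= 1.0:
        raise ValueError("weekly_rate must be between 0 and 1")
    gap = rating - mean
    return mean + gap * (1.0 - weekly_rate) ** idle_weeks


# A high-rated and a low-rated player who both stop playing for 10 weeks.
print(decay_toward_mean(1400, idle_weeks=10))
print(decay_toward_mean(600, idle_weeks=10))
```

Because the gap shrinks by a fixed fraction each week, a player who keeps playing and earning rating would barely notice it, while a long break moves the rating most of the way back to the mean.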

Edited by The6thMessenger, 09 June 2019 - 04:51 AM.


#19 Nightbird

    Member

  • The God of Death
  • 7,518 posts

Posted 09 June 2019 - 08:58 AM

Kiiyor, on 09 June 2019 - 02:49 AM, said:

Using those numbers, your stomp chance (I'm assuming 12-3 or lower, or for matches that don't see one team kill all the opposition, a difference of 75% or so) is around 40%. That's insane in the membrane, and a little more bleak than the 1/5 chance you used (what do you consider to be a stomp tho? a 12-0?)


I'm defining a stomp as something which has a 5% chance of happening if the teams are perfectly balanced, but the chance goes up for unbalanced teams, hitting a max of 52.5% if a team has a 100% chance of winning. With this definition, around 36% of all matches ended in a stomp with the current MM sim, decreasing to 19% with the WLR MM sim.

Compared with your numbers, this definition corresponds to scores up to 12-3, where you have around 34%. But it also depends on the skill of the teammates involved. You can see that the stomp chance decreases drastically as people become more skilled, so the data would change based on who is doing the sampling.
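To make the stomp definition above concrete, here is a minimal sketch. Only the two endpoints (5% for balanced teams, 52.5% when one team is certain to win) come from the definition; the straight-line interpolation between them and the name `stomp_chance` are assumptions for illustration, and the actual sim may use a different curve.

```python
def stomp_chance(p_win: float) -> float:
    """Chance of a stomp given one team's pre-match win probability.

    Assumption: linear interpolation between the two stated endpoints,
    5% at p_win = 0.5 and 52.5% at p_win = 0.0 or 1.0.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be between 0 and 1")
    imbalance = abs(p_win - 0.5)  # 0.0 (balanced) to 0.5 (one-sided)
    return 0.05 + 0.95 * imbalance


# Stomp chance for a balanced match, a 10/90 match and a foregone conclusion.
for p in (0.5, 0.9, 1.0):
    print(p, stomp_chance(p))
```

Under this assumed shape the result is symmetric, so a 10/90 match and a 90/10 match carry the same stomp chance.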

Edited by Nightbird, 09 June 2019 - 09:44 AM.


#20 Nightbird

    Member

  • The God of Death
  • 7,518 posts

Posted 09 June 2019 - 09:05 AM

Sjorpha, on 09 June 2019 - 02:15 AM, said:

Nice work.

Have you considered the problem that WLR can depend on how quickly you've improved? If you played for several years and improved slowly, your WLR is much lower than that of someone who improved quicker. Even if your WLR for the last 1000 games is equal or even higher, your total WLR will always be lower because it can be dragged down by thousands of past matches with low performance.

Maybe only the last 500 matches or so should be used?


Calculating from the last 500 or 1000 matches would be OK and would actually be what I recommend. That being said, it's not necessary, since even if the MM underestimated you, the only harm is getting some extra wins until you reach your true WLR. In comparison, to keep the last 1000 matches, you actually have to store the data from those matches. On your 1001st game, do you delete a W or L from your first game? Might be too much effort for the tiny improvement.
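For a sense of what that bookkeeping involves, here is a minimal sketch of a rolling WLR over the last N results. The class name `RollingWLR` and its methods are hypothetical and say nothing about how PGI stores match data; the sketch just shows that a fixed-size queue drops the oldest W or L automatically once the window is full.

```python
from collections import deque


class RollingWLR:
    """Win/loss ratio over the most recent `window` matches (hypothetical)."""

    def __init__(self, window=1000):
        # deque(maxlen=...) discards the oldest entry when a new one is
        # appended to a full window, so no manual deletion is needed.
        self.results = deque(maxlen=window)

    def record(self, won: bool) -> None:
        self.results.append(bool(won))

    def wlr(self):
        wins = sum(self.results)
        losses = len(self.results) - wins
        if losses == 0:
            return None  # ratio is undefined with no losses in the window
        return wins / losses


# Tiny window so the eviction is easy to follow.
player = RollingWLR(window=3)
for outcome in (False, False, True):   # L, L, W
    player.record(outcome)
print(player.wlr())                    # 1 win / 2 losses
player.record(True)                    # the oldest L falls out: L, W, W
print(player.wlr())                    # 2 wins / 1 loss
```

The cost is one stored result per match per player inside the window, which is the extra storage being weighed against the small gain in accuracy.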

Edited by Nightbird, 09 June 2019 - 09:27 AM.





