What makes a MM good or high quality?
Some people say matches that end in stomps (lopsided scores of 12-0, 12-1) are bad, but the truth is that even when both teams have a 50/50 chance of winning at the beginning, stomps will still happen because of the snowball effect. That being said, if teams are unbalanced, for example if the chance of winning due to team imbalance is 10/90, the chance of a stomp is much higher. Therefore, I would like to define a good MM as one that makes teams with 50/50 chances of winning, as this maximizes the chances of a fair fight and at the same time minimizes the chances of a stomp.
How this thread is different from the others
Before, whenever people suggested improving the MM, it was with a method that was somewhat unscientific. This thread will back up its suggestion with scientific methods. We will create a simulation with rules based on how we know the current MM works, create metrics for the quality of matches the simulated MM makes, and check that this quality corresponds to our current experiences. Then we will tweak the MM per our suggestion and rerun the simulation to see if the metrics on match quality have improved.
Simulation of Current MM
This is a tough section. If you want to skip to the next section, just understand that the numbers picked here are meant to create simulation results similar to what we see in game today.
To simulate the current MM, first I created 100 Tier 1 players with a hidden skill level that ranges from 200 to 1800 following a bell curve (normal distribution). The skill level is hidden because it is not directly accessible by the MM. From these 100 players, the simulated MM randomly selects 24, and a random 12 of them are assigned to each team. This matches the current MM in that there is no consideration of past performance.
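The player pool and the random team split described above could be sketched like this in Python. This is only an illustration, not the code behind the thread: the post gives the 200 to 1800 range but not the mean or spread of the bell curve, so `mean=1000` and `sd=250` are assumptions, as is redrawing any value that falls outside the range. The function names are made up.

```python
import random

def make_players(n=100, mean=1000.0, sd=250.0, lo=200.0, hi=1800.0, seed=None):
    """Return n hidden skill values from a bell curve, kept inside [lo, hi]."""
    rng = random.Random(seed)
    skills = []
    while len(skills) < n:
        s = rng.gauss(mean, sd)   # mean and sd are assumed, not from the post
        if lo <= s <= hi:         # assumption: redraw values outside the range
            skills.append(s)
    return skills

def random_teams(n_players, team_size=12, rng=random):
    """Pick 2 * team_size distinct player indices and split them in half.

    rng.sample returns the picks in random order, so cutting the list in
    half gives a random split with no regard to past performance.
    """
    picked = rng.sample(range(n_players), 2 * team_size)
    return picked[:team_size], picked[team_size:]
```

For example, `random_teams(100)` returns two lists of 12 player indices with no player on both teams.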
To determine who wins and whether there is a stomp, I calculate the hidden skill total for each team. If the skill total for both teams is the same, the win chance is 50/50. If the skill totals are different, I estimate the win chance for Team 1 based on the difference. (tough: win chance = Cumulative Distribution Function of the Standard Normal evaluated at the skill total difference / 800). I then generate a random uniform number between 0 and 100; if this random number is <= the win chance, Team 1 wins. To decide if a stomp happens, I look at the difference between the random number and the win chance: if it is >= 47.5 in either direction, a stomp occurred. This means for a balanced match, there is only a 5% chance of a stomp, but if a team has a win chance of 99%, the stomp chance increases to 51.5%.
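The win and stomp rule above can be sketched in Python as follows. This is a minimal sketch, not the code used for this thread: the function names, the `rng` parameter and the `stomp_chance` helper are made up for illustration, and the standard normal CDF is computed with `math.erf`.

```python
import math
import random

def win_chance(total1, total2, scale=800.0):
    """Team 1 win chance in percent: standard normal CDF of the
    skill total difference divided by scale (800 in the post)."""
    z = (total1 - total2) / scale
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def play_match(total1, total2, rng=random):
    """Return (team1_wins, stomp) for one simulated match."""
    p = win_chance(total1, total2)
    roll = rng.uniform(0.0, 100.0)     # uniform random number, 0 to 100
    team1_wins = roll <= p
    stomp = abs(roll - p) >= 47.5      # far from the win chance either way
    return team1_wins, stomp

def stomp_chance(p):
    """Hypothetical helper: stomp probability in percent for win chance p,
    worked out from the uniform roll rather than simulated."""
    return max(0.0, p - 47.5) + max(0.0, 52.5 - p)
```

With equal skill totals `win_chance` returns 50, and `stomp_chance(50.0)` returns 5.0, which is the 5% stomp rate for a balanced match.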
After simulating each match, I record the results of the match and update the individual player stats.
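One way to keep the per-player stats is shown below. The post does not say which fields were tracked or how WLR is handled before a player's first loss, so the field names and the `max(losses, 1)` guard are assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass
class PlayerStats:
    wins: int = 0
    losses: int = 0
    stomp_losses: int = 0   # losses that were also stomps

    @property
    def wlr(self):
        # Assumption: treat zero losses as one loss to avoid dividing by zero
        return self.wins / max(self.losses, 1)

def record_match(stats, team1, team2, team1_wins, stomp):
    """Update the stats of every player in a finished match.

    stats is a list of PlayerStats indexed by player number;
    team1 and team2 are lists of player numbers.
    """
    winners, losers = (team1, team2) if team1_wins else (team2, team1)
    for i in winners:
        stats[i].wins += 1
    for i in losers:
        stats[i].losses += 1
        if stomp:
            stats[i].stomp_losses += 1
```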
Results of Simulation of Current MM
After running the simulation for 10,000 matches, I created some graphs to represent the quality of matches created by this simulation of the current MM. If you are interested, you can see the individual stats of all 100 players here (https://imgur.com/Dvkrq7X).
First, we have a summary of the WLR of the players based on their hidden skill level. It doesn't look bad, the best players have 2WLR, going down to 0.5 for the worst. (Some may think, hey, we see people with >3WLR in the Jarl's list! Keep in mind that with a database of 40k players, you'll see more extreme values of skills, but they represent <0.1% of the pop. We can add higher skilled players to this sim, but with their rarity it's not necessary or helpful.)
Then we look at the chance of winning based on the teams created by the MM, and here it looks very bad. Only about 15% of matches have a decent win chance of 35-65. More than 50% of matches are near-guaranteed wins or losses, with a win chance of 0-15 or 85-100.
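The two shares quoted above are simple counts over the per-match win chances. A small sketch, assuming the win chances are kept as a list of percentages (the function name is made up, and whether the bucket edges are inclusive is a guess):

```python
def win_chance_shares(chances):
    """chances: Team 1 win chance in percent, one value per match.

    Returns (share of matches with win chance 35-65,
             share of matches with win chance 0-15 or 85-100),
    both in percent.
    """
    n = len(chances)
    fair = sum(1 for p in chances if 35.0 <= p <= 65.0)
    lopsided = sum(1 for p in chances if p <= 15.0 or p >= 85.0)
    return 100.0 * fair / n, 100.0 * lopsided / n
```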
Finally, based on our way of calculating when stomps occur, average players experience 1/5 of matches as a stomp against them. However, lower skilled players lose to stomps 3X as often as high skilled players.
Hopefully these results are in the right ballpark per everyone's experiences. Perfection is not the goal here; the goal is to use these results to estimate how much of an improvement we can expect from switching out the MM.
Simulation of Win-Loss Ratio (WLR) based MM
We make one change to the simulation above. Where before we picked 24 players out of 100 and tossed them randomly into 2 teams, we will now first sort the 24 players by their WLR, then put the 1st (highest WLR) player onto team 1, the 2nd and 3rd onto team 2, and so on, just as in a regular pick-up game between friends. For an example of a team being created, see (https://imgur.com/xEWJR5k) and please note that ties use a random tiebreaker.
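The draft could look like the sketch below. It assumes the "and so on" continues as a snake draft (1st to team 1, 2nd and 3rd to team 2, 4th and 5th to team 1, ...), which ends with 12 players on each side. The function name and the `wlr` lookup are made up for illustration.

```python
import random

def wlr_draft(picked, wlr, rng=random):
    """Split the picked player ids into two teams by a snake draft on WLR.

    wlr maps a player id to that player's current win-loss ratio.
    """
    # Highest WLR first; a random number per player breaks ties at random
    order = sorted(picked, key=lambda p: (-wlr[p], rng.random()))
    team1, team2 = [], []
    for i, p in enumerate(order):
        # Positions 0 | 1,2 | 3,4 | 5,6 | ... alternate between the teams
        if ((i + 1) // 2) % 2 == 0:
            team1.append(p)
        else:
            team2.append(p)
    return team1, team2
```

With 24 players, team 1 gets the players ranked 1, 4, 5, 8, 9, ..., 24 and team 2 gets those ranked 2, 3, 6, 7, ..., 23.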
What happens as a result?
Results of Simulation of WLR MM
After running the simulation for another 10,000 matches, I recreated the same graphs to represent the quality of matches. If you are interested, you can see the individual stats of all 100 players here (https://imgur.com/JEsoC5Q).
First up, the WLR of players by their hidden skill level (as a reminder, hidden means unknowable by the MM). Some will question why not everyone's WLR is 1. The reason is that if they all became 1, the WLR MM would become blind and the more skilled players would start winning more, as in the simulation of the current MM. Therefore, it is impossible for everyone's WLR to become 1.
Next, the chance of winning based on the teams created by the MM. For teams made with a win chance in the range of 35-65, instead of 15% of all matches, we now see 50% of all matches. Likewise, 'unwinnable' matches with a win chance of 0-15, 85-100 have dropped from >50% of all matches to 8%. The degree of improvement is extreme.
Lastly, the chance of stomps has dropped for everyone. It is most noticeable for lower and average skilled players where it has dropped by more than half. For the highest skilled players the change is minor. Keep in mind there is a minimum of 5% stomps even with perfectly even teams due to how we created the simulations, so it is impossible to get to 0.
Conclusion
I hope the graphs make it clear that we could expect a large improvement in the quality of matches by using WLR to sort players into teams.
What I Do
I analyze data from the development of various healthcare products. I program stats for a living, basically. I've worked across the whole breadth: all sorts of chemotherapy for cancer, vaccines for viruses, implants such as spinal disc replacements, knees, heart, breasts, and all sorts of other stuff.
Edited by Nightbird, 08 June 2019 - 08:16 PM.