
Do Heavier Teams Win More?

Balance

28 replies to this topic

#1 dubstep albatross

    Member

  • PipPipPip
  • Ace Of Spades
  • Ace Of Spades
  • 68 posts

Posted 30 August 2021 - 07:47 PM

Do heavier teams win more?

TL;DR Yes, both in the observed data and under statistical analysis, but it probably does not actually matter much. There is a fancy chart at the bottom.

MOTIVATION

The topic of tonnage imbalance and the match maker comes up time and time again. Some insist that tonnage imbalance skews win/loss; others claim that it does not matter. The former claim usually revolves around how many more assaults one team had than the other, and the latter claim usually revolves around how good players offset raw tonnage. A heavier mech does not automatically make for a more effective mech, even with all other things being equal. Every class of mech has its place on the battlefield and any mech can be effective in the right hands. Nonetheless, I wanted to find out what the data actually can tell us when we look at a large number of games played. Essentially I asked, "what is the reality of tonnage imbalances?"

With a large number of games, things like individual player skill, play style, drop composition, tactics, map choice, game mode choice, events, trolls, groups, etc. should be averaged out. If tonnage imbalance were not a factor, the proportion of wins by heavier teams should be 50% (and conversely, the win rate of lighter teams should be 50%). If tonnage imbalance is no better an indicator than flipping a coin, it likely does not influence the outcome of the match.

DATA

The data from 734 games were collected from 3/09/2021 to 8/25/2021. The vast majority of these games were played by tier 4 pilots, with some being tier 5, and some being tier 3. So a caveat might be that things are different in tier 2 or tier 1 games. Data were extracted via OCR from match summary screenshots and augmented with mech data that were extracted from https://wiki.mwomercs.com/ (with many corrections) and hand-input by me. There is no selection bias here and no cherry picking: these are my own games. I am a constant and I am nothing special, sometimes playing well-practiced builds and other times trying new stuff.

Tonnage difference is calculated by totaling the tonnage of each team and then subtracting the lower total from the higher one. The difference is unidirectional; it is never negative, and it is zero only when both drops weigh the same. Then I identify who the victor was and whether the tonnage difference was in their favor or not. In this analysis I did not determine if the heavier team was actually heavier than some kind of average or if the lighter team was below some kind of average. Heavier and lighter are simply relative to the opposing force.
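
A minimal sketch of that bookkeeping in Python, assuming each team is simply a list of mech tonnages. The function name and the two sample drops are hypothetical and are not taken from the dataset.

```python
def tonnage_outcome(team_a, team_b, winner):
    """Return (tonnage difference, result for the heavier team).

    team_a, team_b: lists of mech tonnages; winner: "a" or "b".
    The result is "heavier won", "lighter won", or "equal".
    """
    total_a, total_b = sum(team_a), sum(team_b)
    diff = abs(total_a - total_b)  # never negative, zero for equal drops
    if diff == 0:
        return diff, "equal"
    heavier = "a" if total_a > total_b else "b"
    result = "heavier won" if winner == heavier else "lighter won"
    return diff, result


# Hypothetical 12-mech drops, for illustration only.
team_a = [100, 95, 85, 75, 75, 70, 65, 55, 50, 45, 35, 30]
team_b = [90, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 25]
print(tonnage_outcome(team_a, team_b, winner="b"))
```

Equal drops get their own label because there is no heavier team to credit with the win.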

The minimum tonnage difference observed was zero and the maximum tonnage difference observed was 295. The lightest drop observed was 560 tons and the heaviest drop observed was 945 tons, with the lightest possible drop being 240 tons and the heaviest possible drop being 1200 tons. The mean team tonnage was about 779 tons and the median team tonnage was 785 tons.

FINDINGS

Out of 734 games, about 56.4% of them were won by the team that had the heavier drop. A heavier drop is where that team has more tons of mech total than the opposing team, irrespective of the actual composition (which is another analysis I am doing, but I digress). This result is statistically significantly different from 50% (proportions z-test, z-statistic 3.498, alpha 0.05, p-value < 0.001). The 95% confidence interval is about 52.8% to 60%. Yes, heavier teams tend to win more. Note that causation has not been proven here, which is why I say that heavier teams tend to win more instead of teams win more because they are heavy.
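
For anyone who wants to check the arithmetic, the sketch below runs a one-sample proportions z-test with only the Python standard library. Two things are assumptions here: the win count of 414 is back-calculated from "about 56.4%" of 734 games (the exact count is not stated), and the standard error is built from the observed proportion, which is one common convention and may or may not be what the original tool used.

```python
import math


def proportion_ztest(wins, games, p0=0.5, z_crit=1.959964):
    """One-sample z-test for a proportion plus a Wald confidence interval.

    The standard error uses the observed proportion.
    z_crit is the two-sided 95% critical value of the standard normal.
    """
    p_hat = wins / games
    se = math.sqrt(p_hat * (1 - p_hat) / games)
    z = (p_hat - p0) / se
    # Two-sided p-value: erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (p_hat - z_crit * se, p_hat + z_crit * se)
    return z, p_value, ci


# 414 is an assumed count consistent with "about 56.4%" of 734 games.
z, p_value, (low, high) = proportion_ztest(414, 734)
print(f"z = {z:.3f}, p = {p_value:.4f}, 95% CI = {low:.1%} to {high:.1%}")
```

Under those assumptions the statistic comes out at roughly 3.5 and the interval at roughly 52.8% to 60%, in line with the figures quoted in the post.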

HOW OFTEN DOES THIS HAPPEN

It should not surprise anyone that almost all games have a tonnage imbalance. Only 14 out of the 734 games were equally balanced. There are too many variables to balance to get games in a decent time. But how often are the teams imbalanced and generally by how much? The mean tonnage difference is about 72 tons, with the median being 60 tons. Of the 734 games, 454 (about 61.8%) were imbalanced by between 5 and 80 tons.

MORE DETAILS

A heavier team can be heavier by five tons or heavier by five hundred tons. Does it make a difference how heavy a team is? Yes it does. Using the same statistical method, tonnage differences were binned in 40-ton bins and each bin was analyzed for statistical significance. Why 40? I wanted a reasonable number of bins of a width that related well to mech weights, with a reasonable number of observations for statistical power.
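
The binning can be sketched as follows, assuming tonnage differences come in 5-ton steps so that the bins run 5-40, 45-80, and so on, with equal drops kept apart. The helper names and the sample records are hypothetical.

```python
from collections import defaultdict


def bin_label(diff, width=40):
    """Map a tonnage difference to its bin label, e.g. 45 -> '45-80'."""
    if diff == 0:
        return "equal"
    index = (diff - 1) // width  # 1..40 -> 0, 41..80 -> 1, ...
    return f"{index * width + 5}-{(index + 1) * width}"


def heavier_win_rate_by_bin(games):
    """games: iterable of (tonnage difference, heavier_team_won) pairs."""
    tally = defaultdict(lambda: [0, 0])  # label -> [heavier wins, games]
    for diff, heavier_won in games:
        if diff == 0:
            continue  # no heavier team to credit
        label = bin_label(diff)
        tally[label][0] += int(heavier_won)
        tally[label][1] += 1
    return {label: (wins, n, wins / n) for label, (wins, n) in tally.items()}


# Made-up records for illustration, not the real data.
sample = [(5, True), (40, False), (45, True), (80, True), (0, False), (170, True)]
print(heavier_win_rate_by_bin(sample))
```

Each bin's win count and game count can then be fed to the same z-test used for the overall result.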

It turns out that at the 5-40 bin, while the observed proportion of heavier team wins is about 53.6%, there is not enough statistical evidence to reject the null hypothesis and accept the alternative hypothesis. In other words, I cannot say that the true proportion is different than 50%.

In the 45-80 and 85-120 bins, the observed proportion does increasingly favor the heavier team, but the results are not statistically significant. They are very, very close to being so, and with only a little more data they could cross that threshold. Their 95% confidence intervals contain 50%, but only just barely. From a practical perspective, the true proportion is very likely to be above 50%.

The 165-200 bin is statistically significantly different from 50%. In fact, the true proportion for that bin could be between about 58.3% and 79.8%. The remaining bins are unreliable due to their small sample sizes: those tonnage differences do not occur often enough, so their confidence intervals are very wide. Common sense suggests that if smaller tonnage differences shift the win rate away from 50%, larger ones do as well. Still, a claim of a statistically significant difference cannot be made for those bins based on the data available.

CONCLUSION

If you are on a team with more tonnage than the opposing team, based on these 734 games, you have a higher probability of obtaining a victory. The more tonnage difference you have in your favor, generally the higher that probability is. As a practical effect, you might see one additional victory for every ten to twenty games where the tonnage was in your favor, depending on the imbalances. Given that most imbalances are mild (about 60% are 80 tons and under, with 36% being 40 tons or under), in the long run it is probably not as much of a factor on one's WLR (and PSR/tier) as one would think.
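
As a rough check on the "one additional victory for every ten to twenty games" figure: the extra wins per game are the heavier-team win rate minus the 50% baseline, and the reciprocal gives the number of games per extra win. The sketch below uses the overall 56.4% rate and, as a second illustrative input, 60%, which is roughly the top of the overall 95% confidence interval. The function name is hypothetical.

```python
def games_per_extra_win(win_rate, baseline=0.5):
    """Games needed, on average, to gain one win over the baseline."""
    edge = win_rate - baseline
    if edge <= 0:
        raise ValueError("win_rate must exceed the baseline")
    return 1 / edge


for rate in (0.564, 0.60):
    print(f"{rate:.1%}: about {games_per_extra_win(rate):.1f} games per extra win")
```

That works out to roughly 16 games per extra win at the overall rate and 10 games at 60%.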

Can you do anything about this? Other than not dropping in groups with lots of unused tonnage (or infamous and often very ineffective meme light "hunter" packs), probably not. Just play the game, enjoy yourself, and ignore the stats. If you really care about the stats, focus on winning in mechlab, playing better, or maybe find a group, I guess. Don't let your dreams be memes.

Posted Image

#2 Nightbird

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • The God of Death
  • The God of Death
  • 7,518 posts

Posted 30 August 2021 - 08:26 PM

Nice job, though could I request you recalculate the p-values after slicing the tonnage with ranges 0 to X, X to Y, Y to Z such that each slice has 1/3 of the population?

Only one of your t-test p-values is significant, which at best lets me conclude that there may be some limited significance to tonnage differences.

If you'll do the same test with the average WLR of the members of both teams (from Jarl's), I think you'll see an extremely significant result in comparison.

#3 dubstep albatross

    Member

  • PipPipPip
  • Ace Of Spades
  • Ace Of Spades
  • 68 posts

Posted 30 August 2021 - 09:10 PM

Nightbird, on 30 August 2021 - 08:26 PM, said:

Nice job, though could I request you recalculate the p-values after slicing the tonnage with ranges 0 to X, X to Y, Y to Z such that each slice has 1/3 of the population?

Only one of your t-test p-values is significant, which at best lets me conclude that there may be some limited significance to tonnage differences.


Like this below?

In the end, I believe the conclusion is fundamentally the same, though two out of three bins (excluding the equal tonnage bin) now show statistical significance. Thus, the conclusion is more supported by the data. I am sure I could arrange the bins differently with different outcomes, which is a general hazard of binning. I did not think to bin the population as such, so thank you for the suggestion. I wanted to keep the bins relatable to the average player. The 90 to 320 bin is pretty wide from that perspective.
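
For reference, equal-population slicing of this kind can be sketched by sorting the non-zero tonnage differences and cutting at the one-third and two-thirds positions. The function names and sample values are hypothetical, and ties at a cut point mean the slices are only approximately equal in practice.

```python
def tercile_cut_points(diffs):
    """Return two cut points that split diffs into three similar-sized slices.

    Values <= the first cut fall in the lowest slice, values <= the second
    cut fall in the middle slice, and the rest fall in the top slice.
    """
    ordered = sorted(d for d in diffs if d > 0)  # equal drops are kept apart
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three non-zero differences")
    return ordered[n // 3 - 1], ordered[2 * n // 3 - 1]


def slice_index(diff, cuts):
    """0, 1, or 2 for the low, middle, or high slice."""
    return sum(diff > cut for cut in cuts)


# Made-up differences for illustration only.
diffs = [5, 10, 20, 35, 45, 60, 75, 90, 120, 150, 200, 295, 0]
cuts = tercile_cut_points(diffs)
print(cuts, [slice_index(d, cuts) for d in diffs if d > 0])
```

Because the cut points depend on the data, a different set of games would give different bin edges, which is the binning hazard mentioned above.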

Posted Image

Quote

If you'll do the same test with the average WLR of the members of both teams (from Jarl's), I think you'll see an extremely significant result in comparison.


That is an interesting idea. However, I am unsure how to get data out of that site (have not looked, maybe it is easy), plus I am far too lazy. My OCR is really good with most things, but names can be a bit iffy. Based off of intuition, if a team consists of better players, and WLR might be one indicator of that in some cases, they are more likely to achieve victory. I would say that, intuitively, an imbalance in WLR between teams is going to contribute more to the outcome than the imbalance in tonnage.

#4 Capt Deadpool

    Member

  • PipPipPipPipPipPip
  • Ace Of Spades
  • 305 posts

Posted 30 August 2021 - 10:09 PM

dubstep albatross, on 30 August 2021 - 09:10 PM, said:

I would say that, intuitively, an imbalance in WLR between teams is going to contribute more to the outcome than the imbalance in tonnage.


Most certainly it will, which is why Nightbird here and I and many others wanted a MM/PSR system based upon WLR. Many people irrationally believed that if they 'did well' but their team lost then they were being punished if they didn't get PSR rewards, despite the obvious fact that losses literally happen to everyone, and winning is the only real objective of an online team-based FPS game. (Not even mentioning the fact that moving up in PSR should be considered more of a 'punishment' than moving down, since MM will ensure you are playing against people that will kill you and defeat you more frequently...)

You could ideally make a PSR/MM system where every single chassis you own gets its own PSR value, which could be modified further based on whether you are solo, in a two-man, three-man, or four-man. But seemingly such things are far too difficult to implement, despite the likely significant reduction in stomps that would result, as we can't even get step one implemented...

Edited by Capt Deadpool, 30 August 2021 - 10:26 PM.


#5 LordNothing

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPip
  • Ace Of Spades
  • Ace Of Spades
  • 17,739 posts

Posted 31 August 2021 - 12:01 AM

Didn't read, but the graphs say everything. GJ.

#6 MW Waldorf Statler

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • Legendary Founder
  • 9,459 posts
  • LocationGermany/Berlin

Posted 31 August 2021 - 02:05 AM

The heavier team wins more when the extra tonnage is in mediums or heavies, not in assaults in kids' hands. Lights and assaults are too specialized in their handling for new players to play well in MWO. Two heavies at 70 tons each are better than two assaults totaling 200 tons. On the other hand, two lights in good hands are better than two assaults in new hands that die in seconds far away from the team. In QP, the assaults are mostly the mechs with under 100 damage.

For teams that fight in tight spaces, entrances, and tunnels, extra tonnage is wasted weight.

Edited by MW Waldorf Statler, 31 August 2021 - 08:02 PM.


#7 My Lord and Saviour Jesus Christ

    Member

  • PipPipPipPipPipPip
  • Major General
  • Major General
  • 475 posts

Posted 31 August 2021 - 03:36 AM

Even if you enforced equal weight classes on each side you could have matches where it's 4/4/4/4 and has a tonnage disparity of up to 260t. The only reason it's actually noticeable now is because of groups dropping light and then not having the skill to leverage that.

Edited by My Lord and Saviour Jesus Christ, 31 August 2021 - 03:37 AM.


#8 VeeOt Dragon

    Member

  • PipPipPipPipPipPipPipPip
  • Survivor
  • Survivor
  • 1,302 posts
  • LocationHell, otherwise known as Ohio

Posted 31 August 2021 - 05:28 AM

I think it's more of a sliding scale. A few tons one way or the other won't have much effect, but if you get a match where one team gets 4 assaults and the other none, well, the outcome is inevitable. Yes, player skill comes in, but no amount of skill can save you when the dice of life roll that heavily against you. Given two teams of equal skill, tonnage, or rather firepower/armor, makes the difference.

Edited by VeeOt Dragon, 31 August 2021 - 05:29 AM.


#9 PocketYoda

    Member

  • PipPipPipPipPipPipPipPipPip
  • Shredder
  • Shredder
  • 4,147 posts
  • LocationAustralia

Posted 31 August 2021 - 05:30 AM

Not surprised by this at all. I see it daily in tiers 4 and 5. Brute force (weapon boating) tends to win over skill in most aspects of this game.

Edited by MechaGnome, 31 August 2021 - 05:30 AM.


#10 Aidan Crenshaw

    Member

  • PipPipPipPipPipPipPipPipPip
  • The Mercenary
  • The Mercenary
  • 3,650 posts

Posted 31 August 2021 - 05:35 AM

MechaGnome, on 31 August 2021 - 05:30 AM, said:

Not surprised by this at all. I see it daily in tiers 4 and 5. Brute force (weapon boating) tends to win over skill in most aspects of this game.


Of course it does. Because there's little skill involved in T5 to begin with.

#11 Thorqemada

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • 6,397 posts

Posted 31 August 2021 - 06:23 AM

I feel that in general more firepower wins over less firepower, as long as you manage to bring that firepower to fruition.

#12 Nightbird

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • The God of Death
  • The God of Death
  • 7,518 posts

Posted 31 August 2021 - 06:31 AM

dubstep albatross, on 30 August 2021 - 09:10 PM, said:


Like this below?

In the end, I believe the conclusion is fundamentally the same, though two out of three bins (excluding the equal tonnage bin) now show statistical significance. Thus, the conclusion is more supported by the data. I am sure I could arrange the bins differently with different outcomes, which is a general hazard of binning. I did not think to bin the population as such, so thank you for the suggestion. I wanted to keep the bins relatable to the average player. The 90 to 320 bin is pretty wide from that perspective.

Posted Image



That is an interesting idea. However, I am unsure how to get data out of that site (have not looked, maybe it is easy), plus I am far too lazy. My OCR is really good with most things, but names can be a bit iffy. Based off of intuition, if a team consists of better players, and WLR might be one indicator of that in some cases, they are more likely to achieve victory. I would say that, intuitively, an imbalance in WLR between teams is going to contribute more to the outcome than the imbalance in tonnage.


Thanks, I like this better. By putting more data into each bucket, you clearly show the effect on the match increasing as tonnage differences increase.

I created a thread a while back using publicly available variables (not tonnage but weight class, number of matches played, WLR, KDR) to see which had the most effect on winning probability. The finding was that they all did, but WLR had the most effect by far, and also that all factors were collinear with WLR. So just by using past WLR, you take into consideration all other variables.

https://mwomercs.com...and-suggestion/

data from https://leaderboard.isengrim.org/

#13 GARION26

    Member

  • PipPipPipPipPipPip
  • Giant Helper
  • 301 posts

Posted 31 August 2021 - 08:06 AM

Thank you for the nicely done analysis!

The comments in this thread about Tier 5 being a place where skill matters less make me wonder whether the findings vary by tier. Does the data set contain enough info about Jarl's or similar ranking data to see if this changes at higher skill levels?

#14 dubstep albatross

    Member

  • PipPipPip
  • Ace Of Spades
  • Ace Of Spades
  • 68 posts

Posted 31 August 2021 - 10:45 AM

GARION26, on 31 August 2021 - 08:06 AM, said:

Thank you for the nicely done analysis!

The comments in this thread about Tier 5 being a place where skill matters less make me wonder whether the findings vary by tier. Does the data set contain enough info about Jarl's or similar ranking data to see if this changes at higher skill levels?


I think the claim that skill matters less in tier 5 is a bit reductive and maybe even misguided. I think there's more margin for error in tier 5 and a greater skill disparity can have a disproportional effect on the outcome, both at the individual and team/match levels. A totally-clueless player in tier 5 is not likely to be very successful regardless of other tier 5 players, and certainly is going to be a victim to anyone with more skill, even in tier 5. Maybe that totally-clueless player does not encounter the more-skilled player in that round because the more-skilled player gets eliminated somewhere else, thus that totally-clueless player enjoys more success. Perhaps luck plays a bigger role in the lower tiers, then?

Regarding tier, I addressed that in the original post. Most (rough number? 75%) of the games were played by a tier 4 pilot, with some small number played by tier 5 and tier 3 pilots.

I am not sure how to identify what tier a player is in (could be easy, I have not looked into it), but that would be very interesting. The dataset used for this analysis has 10,411 unique pilot names. I am not sure tier could be reliably inferred. With reliable pilot/tier data, light could be shed regarding how well matches are balanced by tier, how often the release valves open, and how often the poor seals in tier 5 get clubbed by the certified-grade-A killers in tier 1. An initial challenge would be defining how to determine what the "tier" of the match really was. My guess is matches are not homogenous with respect to pilot tier.

#15 GARION26

    Member

  • PipPipPipPipPipPip
  • Giant Helper
  • 301 posts

Posted 31 August 2021 - 11:56 AM

You may not have the data to do it.
But if you had all 10,411 pilot names and could match them with the Jarl's rankings, you could compute the average Jarl's rank of all 24 participants in each match and then split the matches into groups based on that average, using whatever cut points make sense.
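
A minimal sketch of that grouping idea, assuming a lookup table from pilot name to a numeric rank already exists. The table, the pilot names, and the cut points below are all hypothetical; nothing here reflects how Jarl's list actually exposes its data.

```python
from statistics import mean


def match_group(pilots, rank_by_pilot, cut_points):
    """Assign a match to a group by the average rank of its known pilots.

    rank_by_pilot: dict of pilot name -> numeric rank (hypothetical lookup).
    cut_points: ascending thresholds; returns an index 0..len(cut_points).
    Returns None when no pilot in the match has a known rank.
    """
    known = [rank_by_pilot[p] for p in pilots if p in rank_by_pilot]
    if not known:
        return None
    average = mean(known)
    return sum(average > cut for cut in cut_points)


# Made-up ranks and cut points for illustration only.
ranks = {"pilot_a": 1000, "pilot_b": 3000, "pilot_c": 20000}
print(match_group(["pilot_a", "pilot_b"], ranks, [5000, 15000]))
print(match_group(["pilot_c", "unknown"], ranks, [5000, 15000]))
```

Pilots whose names cannot be matched (for example because of OCR errors) are skipped, so a match with many unknown names would have a less reliable average.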

#16 Heavy Money

    Member

  • PipPipPipPipPipPipPipPip
  • The Marauder
  • 1,275 posts

Posted 31 August 2021 - 12:39 PM

Hey, great work and thanks for doing this for all of us!

#17 Rkshz

    Member

  • PipPipPipPipPipPipPipPipPip
  • 2,866 posts
  • Twitch: Link
  • LocationOdesa, Ukraine

Posted 31 August 2021 - 03:11 PM

Capt Deadpool, on 30 August 2021 - 10:09 PM, said:

Most certainly it will, which is why Nightbird here and I and many others wanted a MM/PSR system based upon WLR. Many people irrationally believed that if they 'did well' but their team lost then they were being punished if they didn't get PSR rewards, despite the obvious fact that losses literally happen to everyone, and winning is the only real objective of an online team-based FPS game. (Not even mentioning the fact that moving up in PSR should be considered more of a 'punishment' than moving down, since MM will ensure you are playing against people that will kill you and defeat you more frequently...)

You could ideally make a PSR/MM system where every single chassis you own gets its own PSR value, which could be modified further based on whether you are solo, in a two-man, three-man, or four-man. But seemingly such things are far too difficult to implement, despite the likely significant reduction in stomps that would result, as we can't even get step one implemented...

It is not difficult to implement. For this the devs need the desire, which they did not have and do not have.

#18 Nightbird

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • The God of Death
  • The God of Death
  • 7,518 posts

Posted 31 August 2021 - 03:56 PM

Rkshz, on 31 August 2021 - 03:11 PM, said:

It is not difficult to implement. For this the devs need the desire, which they did not have and do not have.


The current implementation is from the community. It is easily exploitable, and highly advantageous to skilled players. That it hurts everyone else is just a non-factor. It's amusing how much this is like a microcosm of the capitalistic world :)

#19 MW Waldorf Statler

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • Legendary Founder
  • 9,459 posts
  • LocationGermany/Berlin

Posted 31 August 2021 - 08:07 PM

VeeOt Dragon, on 31 August 2021 - 05:28 AM, said:

I think it's more of a sliding scale. A few tons one way or the other won't have much effect, but if you get a match where one team gets 4 assaults and the other none, well, the outcome is inevitable. Yes, player skill comes in, but no amount of skill can save you when the dice of life roll that heavily against you. Given two teams of equal skill, tonnage, or rather firepower/armor, makes the difference.

In the random multiplayer of MW4, all the kids drove assaults with unlimited ammo and heat, and still only one team wins ;-)

In T5 to T3, the assaults are mostly the first teammates to die, with no great influence on the match. I see matches where the complete assault lance dies within seconds of the match start, either by running into the murderball or by being eliminated by light wolfpacks.

Edited by MW Waldorf Statler, 31 August 2021 - 08:09 PM.


#20 GentleMouse

    Member

  • Pip
  • Ace Of Spades
  • 16 posts

Posted 31 August 2021 - 11:35 PM

I love the analysis, though I'm not sure it was wise to split your mechs up. I know it can seem narcissistic but your own personal performance can have pretty drastic effects on the outcome of a game. For instance with the CTF-0XP I have a 1.27 win/loss ratio across 93 games, and with the CTF-3D only 1.03 across 118 games (ECM master race).

Statistically speaking if you're playing a mech lighter than the mean (about 65 tonnes) then you're more likely to end up on the lighter team than if you were playing a mech heavier than the mean. This means that if your most practiced, best performing mechs are above average tonnage while your worst performing ones are below average tonnage then your data will, whilst looking fine, be quite skewed. Equally this could work in the other direction.




