
IS Vs Clan A Statistical Approach: Aka Locusts Are OP, But...

Balance

85 replies to this topic

#1 Tahawus

    Member

  • PipPipPipPipPip
  • Little Helper
  • Little Helper
  • 189 posts

Posted 21 August 2016 - 11:09 AM

Edit: Go To: http://mwomercs.com/...ost__p__5414473

For most recent analysis.

Someday, I'll restructure this first post to contain everything.



Original Post Below:

A statistical analysis of Mechwarrior Online leader board data. All data are from mwomercs.com.

You have been warned. This will be a wall of text with some charts and graphs...

I did this because I was both bored and frustrated with the continual sniping about which is better (IS or Clan) and balance in the game. Lots of anecdotal evidence on both sides, but surprisingly little real analysis. The actual analysis didn't take very long; extracting the data and doing the minimal post-run graphics took most of the time (note the low effort on the pictures...).

Synopsis:

In short, overall clan mechs have a distinct and statistically significant advantage over IS mechs based on scores. Both IS and Clans have some mechs that hold advantages at the top end of the range. IS is overwhelmingly represented at the bottom end of the performance range and Clan mechs have more mechs in the top end of the ranges than the IS, some of them by large margins.

A small portion of this work was informed by a spreadsheet prepared (I'm told) by Tarogato from ISEN (Thank you). I recompiled the data on my own, though I reused his approach for integrating data for the Viper.


I'll probably update this to include the Cyclops after it has a leader board event and after the heavy event assuming that there haven't been too many changes in the game or scoring.


All data and processing are available at: https://github.com/n...MWOLeaderBoards


All results and any opinions are my work only. Please provide credit should you choose to build on my work.


Some notes and thoughts before I dive in.


The distributions of scores are non-normal and do not transform easily, so all of these results should be interpreted with caution. i.e. I believe the conclusions we can draw are valid, but if you're going to base an active handicapping system or online gambling odds on these you're on your own (unless the latter proves profitable, in which case I want royalties).


I'm making a number of other assumptions here that should be stated.


1. The top 75 results for each mech are representative of high-performing pilots in that mech, and pilot quality is even between mechs; i.e. we don't have a major effect in which mech rankings are influenced by better pilots choosing to pilot better mechs, leaving poorer pilots to dominate the lower-performing mechs.


2. That the scoring system is a good representation of the utility of the mech.


3. The non-normality of the data isn't a critical failure (I don't think it invalidates the results, just raises some questions where results are close).


4. A linear model is appropriate (It generally tests well and has reasonable distributions of residuals for the model), but given the non-normality is subject to being questioned.


Point 1 is probably the biggest reason to doubt the underlying message behind the results. The others either are uniform across mechs, or may have marginal effects that slightly change orders of mechs that are close to each other. We could debate this for pages, but I'm going to (non-scientifically) rationalize that if high performing pilots are disdaining the use of certain mechs, it's because they feel that those mechs put them at a competitive disadvantage (reinforcing the point, though not in a quantifiable way).


High scores can result from two primary mechanisms: high raw damage output or high longevity on the field. There's no good way to quantify the degree to which each influences the score from the data used.


Reporting on Statistics:


Every statistic generated is significant with p<0.001, mostly because of the number of samples (>3000 as of this writing).


The dataset used includes the results of the Light, Medium, and Assault leader boards, and the top 75 results from the Viper leader board (see the github account for the xls and csv used to prepare the data).


IS vs Clan Expectations:


In a linear regression model that accounts for the mech's tonnage and whether it is IS or Clan tech the resulting prediction for score is: (IsIS = 1 for IS mech, 0 for Clan)


Score = Tons*9.84 - 239.1*(IsIS) + 2139.3
R-Square: 0.269


Interpretation: Across the entire range being in a clan mech adds about 240 points to your score. Given the range of scores, that ranges between about 6.5 and 12.5% of the final scores.
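To make the model concrete, here is a minimal Python sketch that simply evaluates the equation above. The function names are made up for illustration and are not taken from the GitHub repository; the coefficients are the ones reported above. It expresses the Clan advantage relative to the predicted Clan score, which is a different base than the observed final scores used for the percentages quoted above, so the printed shares need not match those figures exactly.

```python
# Coefficients copied from the regression reported above.
TONS_COEF = 9.84
IS_PENALTY = 239.1
INTERCEPT = 2139.3


def predicted_score(tons: float, is_inner_sphere: bool) -> float:
    """Expected leaderboard score from the tonnage + tech-base model."""
    is_is = 1 if is_inner_sphere else 0
    return TONS_COEF * tons - IS_PENALTY * is_is + INTERCEPT


def clan_bonus_share(tons: float) -> float:
    """Clan advantage as a fraction of the predicted Clan score at this tonnage."""
    return IS_PENALTY / predicted_score(tons, is_inner_sphere=False)


for tons in (20, 50, 100):
    is_score = predicted_score(tons, True)
    clan_score = predicted_score(tons, False)
    print(f"{tons:>3} t  IS {is_score:7.1f}  Clan {clan_score:7.1f}  "
          f"share {clan_bonus_share(tons):.1%}")
```

Because the IS/Clan term is a constant offset, the gap between the two predictions is the same at every tonnage; only its size relative to the score changes.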


Using this function and computing residuals for each result, we can then look at which mechs perform above and below the expectation. Ranked by mean residual:


Posted Image


The locust is the highest performing IS mech relative to tonnage, but still lags behind several clan mechs. The Mist Lynx is the lowest.
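For anyone wanting to reproduce the ranking idea, the residual step can be sketched roughly as follows. This is an illustration only, not the code from the repository: the function names and row layout are assumptions, and the sample rows are placeholders rather than real leaderboard entries.

```python
from collections import defaultdict


def predicted_score(tons, is_inner_sphere):
    # Coefficients from the tonnage + tech-base model quoted above.
    return 9.84 * tons - 239.1 * (1 if is_inner_sphere else 0) + 2139.3


def mean_residuals(rows):
    """rows: iterable of (mech, tons, is_inner_sphere, score).

    Returns a list of (mech, mean residual), best performer first.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for mech, tons, is_is, score in rows:
        # Residual = observed score minus what the model expects.
        sums[mech] += score - predicted_score(tons, is_is)
        counts[mech] += 1
    means = {mech: sums[mech] / counts[mech] for mech in sums}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Placeholder rows, NOT real leaderboard data.
sample_rows = [
    ("Mech A", 20, True, 2300.0),
    ("Mech A", 20, True, 2200.0),
    ("Mech B", 25, False, 2250.0),
    ("Mech B", 25, False, 2150.0),
]
ranking = mean_residuals(sample_rows)
for mech, residual in ranking:
    print(f"{mech}: {residual:+.1f}")
```

A positive mean residual means the mech scores above what its tonnage and tech base would predict; a negative one means it scores below.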


Mechs by Tonnage:


The above example assumes that clan mechs will outperform IS mechs on average. Given PGI's statements on balancing, and the lack of other limitations on clan mechs (whether in numbers deployed to the field, penalties for dishonorable combat, or employing mercenaries), I don't believe that to be the intent.


If we compare purely on the mech's tonnage we get the following regression model.


Score = 9.98* Tons + 1975.4
R-Square: 0.221
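For readers unfamiliar with how a fit like this and its R-Square are obtained, below is a minimal single-variable least-squares sketch in plain Python. It is not the tooling used for the numbers above, and the (tons, score) pairs are invented purely so the example runs; the real inputs are in the linked repository.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up (tons, score) pairs, NOT leaderboard data.
tons = [20, 20, 40, 40, 60, 60, 80, 80, 100, 100]
scores = [2100, 2300, 2250, 2500, 2500, 2700, 2650, 2900, 2900, 3100]
slope, intercept, r2 = fit_line(tons, scores)
print(f"Score = {slope:.2f} * Tons + {intercept:.1f}, R-Square: {r2:.3f}")
```

An R-Square of 0.221 read this way means tonnage alone accounts for roughly 22% of the variation in scores, with the rest left in the residuals.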


The plot of this function against the input data:


Posted Image

Note high performance at some weight classes (20 and 40 tons most notably).

Using the same analysis of the residuals:

Posted Image

If we assume that there should generally be ton for ton parity between IS and Clan mechs, at least as measured by the scoring system used in the leader board, we have a clear failure. Of the top 10 performing mechs by tonnage, 3 are IS, and the top 4 are all clan mechs and by large margins. Similarly at the bottom of the list, of the bottom 10, 1 is Clan. And the worst performers are at the bottom by a pretty substantial margin.

PS. Edited for formatting. I'm sure there are other typos and updates.

If PGI would like me to do more work by giving me greater access to their data, I'm willing and able. I have a significant background in the statistical modeling of behavioral and geographic phenomena.

Also added R-Square's for the regressions

Edited by Tahawus, 28 September 2016 - 07:48 PM.


#2 Positive Mental Attitude

    Member

  • PipPipPipPipPipPip
  • Bad Company
  • Bad Company
  • 393 posts
  • LocationWAYup

Posted 21 August 2016 - 11:20 AM

phoenix hawk worse than an urbie :(

#3 S 0 L E N Y A

    Member

  • PipPipPipPipPipPipPipPipPip
  • Ace Of Spades
  • Ace Of Spades
  • 2,031 posts
  • LocationWest Side

Posted 21 August 2016 - 11:48 AM

How did you account for pilot skill?
-giggles-

#4 FupDup

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPipPipPip
  • Ace Of Spades
  • Ace Of Spades
  • 26,888 posts
  • LocationThe Keeper of Memes

Posted 21 August 2016 - 11:49 AM

I don't like the "score per ton" statistic because it implies that choosing a smaller mech should instantly make you inferior to somebody who chose a bigger mech.

#5 WANTED

    Member

  • PipPipPipPipPipPipPip
  • Knight Errant
  • Knight Errant
  • 611 posts
  • LocationFt. Worth, TX

Posted 21 August 2016 - 12:15 PM

Highlander and Victor only above Mist Lynx :((( so sad

#6 Tahawus

    Member

  • PipPipPipPipPip
  • Little Helper
  • Little Helper
  • 189 posts

Posted 21 August 2016 - 12:39 PM

Boogie138, on 21 August 2016 - 11:48 AM, said:

How did you account for pilot skill?
-giggles-

I didn't except as it is represented in the error term for score. See my discussion on point 1 for related info.

FupDup, on 21 August 2016 - 11:49 AM, said:

I don't like the "score per ton" statistic because it implies that choosing a smaller mech should instantly make you inferior to somebody who chose a bigger mech.


That is an incorrect interpretation of the results. The assumption that, on average, a higher-tonnage mech will have a higher score is supported (though causality is not demonstrated) by the models. There is substantial variability within the error (i.e. tonnage alone explains ~22% of the variation). Much of the remainder is probably related to pilot skill. So the argument that a pilot choosing one mech makes them a better pilot than someone who picked another is not valid and is not addressed here. With additional data (PGI...) we might be able to test whether objectively better pilots tend to pick different mechs. I suspect that there is a relationship.

Edit: note that I added R-Squares for the models to the original post.

Edited by Tahawus, 21 August 2016 - 12:41 PM.


#7 Duatam

    Member

  • PipPipPipPipPip
  • Ace Of Spades
  • 135 posts
  • LocationFinland

Posted 21 August 2016 - 01:44 PM

Where's Dragon?

#8 Kirkland Langue

    Member

  • PipPipPipPipPipPipPipPip
  • Bad Company
  • 1,581 posts

Posted 21 August 2016 - 01:53 PM

Boogie138, on 21 August 2016 - 11:48 AM, said:

How did you account for pilot skill?
-giggles-


Either you assume that pilot skill is randomly distributed, which would mean that you can ignore it as an impacting factor, or you assume that the skilled pilots will gravitate to the better equipment. While the latter could "negate" the results of any study, by taking that approach you've already proven the point of the study.

The only way pilot skill even matters is if you want to make the argument that skilled pilots intentionally choose weaker mechs but get them to outperform the stronger (IS) mechs.

FupDup, on 21 August 2016 - 11:49 AM, said:

I don't like the "score per ton" statistic because it implies that choosing a smaller mech should instantly make you inferior to somebody who chose a bigger mech.


And I think it should - because the event basically only measures Damage and I flatly will not accept that it's reasonable for light mechs to do damage equal to assault mechs.

#9 Fang01

    Member

  • PipPipPipPipPipPipPip
  • Fury
  • Fury
  • 993 posts
  • LocationNew Jersey

Posted 21 August 2016 - 03:30 PM

Osmobot, on 21 August 2016 - 01:44 PM, said:

Where's Dragon?


He'll add Dragon and other heavies once a data compatible leaderboard event for them is completed

#10 Big Tin Man

    Member

  • PipPipPipPipPipPipPipPip
  • Rage
  • Rage
  • 1,957 posts

Posted 21 August 2016 - 05:07 PM

Fwiw, I know Tahawus irl. He has more letters after his name than in his name. And he has a pretty long name.

Great info man

#11 Cy Mitchell

    Member

  • PipPipPipPipPipPipPipPipPip
  • The Privateer
  • The Privateer
  • 2,688 posts

Posted 21 August 2016 - 05:28 PM

The Viper was not available for the Medium event so it is an outlier. No one that has driven or fought a Viper would place it in the top half of the ratings. You can effectively eliminate it.

A better measure of each Mech's performance would be based on the present event where there is a more normal mix of weight class on the battlefield instead of the individual weight class events where the composition of the drops were badly skewed by the individual events.

Case in point: Look at where the Viper is performing there.

Edited by Rampage, 21 August 2016 - 05:30 PM.


#12 Requiemking

    Member

  • PipPipPipPipPipPipPipPipPip
  • The Solitary
  • The Solitary
  • 2,479 posts
  • LocationStationed at the Iron Dingo's Base on Dumassas

Posted 21 August 2016 - 06:18 PM

Rampage, on 21 August 2016 - 05:28 PM, said:

The Viper was not available for the Medium event so it is an outlier. No one that has driven or fought a Viper would place it in the top half of the ratings. You can effectively eliminate it.

A better measure of each Mech's performance would be based on the present event where there is a more normal mix of weight class on the battlefield instead of the individual weight class events where the composition of the drops were badly skewed by the individual events.

Case in point: Look at where the Viper is performing there.

Also, a lot of this can be explained by things other than quirks. Case in point, the Arctic Cheetah and the Kodiak. Both of them are highly successful, but not because of godquirks. They merely possess all the qualities a mech of their weightclass could ever want. Hell, the Cheetah isn't even that powerful, it simply possesses everything a Light mech could ever want.

#13 Ibrandul Mike

    Member

  • PipPipPipPipPipPipPipPip
  • The Referee
  • The Referee
  • 1,913 posts

Posted 21 August 2016 - 06:36 PM

First things first... nicely done
I can't wait to see the new results with the leaderboard going on right now. It is quite interesting.

I have a few questions, but first I have to say that I didn't go through the data myself. Just your post and the results presented. The questions are not meant as an attack to your work but out of curiosity.


How are you addressing the different potential maximum points between the events?
For example, in the Assault Leaderboard Event you had a significantly higher number of Assaults running on both teams. This means more possible damage, which means higher possible scores. On the other side, in the Light Leaderboard Event there were lights everywhere. It didn't seem as "bad" as with the assaults, but there seemed to be more than on average. So less damage potential and therefore less maximum possible points. And yes, I am well aware of the fact that damage dealt is not the primary contributor.


How are you addressing the formula used for leaderboard events?
The standard formula used for the events is this (just to have it in mind):

Quote

Score Formula: (Solo Kill x 30) + (Kill Most Damage x 20) + (Killing Blow x 10) + ( Kill Assist x 10) + (Win × 10) + (Loss × 5) + (Survive × 10) + (Dead × 5) + ((Damage done - Team Damage) ÷ 15)

I am asking, because the whole analysis is based on it. So have you thought about the formula and its weighting or did you just take the data at face value?
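For reference, the quoted formula can be written out as a small Python function. This is only a sketch: the function and parameter names are invented here, and it assumes that Win/Loss and Survive/Dead are each mutually exclusive per-match flags, which the quoted formula does not state explicitly.

```python
def match_score(solo_kills=0, kill_most_damage=0, killing_blows=0,
                kill_assists=0, win=False, survived=False,
                damage_done=0.0, team_damage=0.0):
    """Single-match score following the quoted event formula."""
    return (solo_kills * 30
            + kill_most_damage * 20
            + killing_blows * 10
            + kill_assists * 10
            + (10 if win else 5)        # Win x 10, otherwise Loss x 5 (assumed exclusive)
            + (10 if survived else 5)   # Survive x 10, otherwise Dead x 5 (assumed exclusive)
            + (damage_done - team_damage) / 15)


# Hypothetical match: 1 solo kill, 2 kills with most damage, 2 killing blows,
# 5 assists, a win, survived, 900 damage and no team damage.
example = match_score(solo_kills=1, kill_most_damage=2, killing_blows=2,
                      kill_assists=5, win=True, survived=True,
                      damage_done=900, team_damage=0)
print(example)
```

In this made-up match the 900 damage contributes 60 points, while the kill-related terms contribute 140, which illustrates why damage dealt is not the only thing driving the score.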


How are you addressing the problem that we only have the top 75 players?
Yes it is the only freely available data sample... but that doesn't mean that it is transferable to the whole system.
As an example, would you trust a statistic that shows you the 75 highest incomes in the US to have any kind of explanatory power for the income of the 75 lowest incomes or even the "average Joe"?

Thanks in advance for the answers and your work

Edited by Ibrandul Mike, 21 August 2016 - 06:37 PM.
hr's added instead of spaces.


#14 Hit the Deck

    Member

  • PipPipPipPipPipPipPipPipPip
  • 4,677 posts
  • LocationIndonesia

Posted 21 August 2016 - 07:22 PM

First, thanks for your work!

I have one request if you don't mind, could you show the predicted score line and the residuals of each 'Mech's score in one chart?

EDIT: Sorry I missed the 2nd chart!

Edited by Hit the Deck, 21 August 2016 - 07:24 PM.


#15 Tahawus

    Member

  • PipPipPipPipPip
  • Little Helper
  • Little Helper
  • 189 posts

Posted 21 August 2016 - 08:27 PM

Rampage, on 21 August 2016 - 05:28 PM, said:

The Viper was not available for the Medium event so it is an outlier. No one that has driven or fought a Viper would place it in the top half of the ratings. You can effectively eliminate it.

A better measure of each Mech's performance would be based on the present event where there is a more normal mix of weight class on the battlefield instead of the individual weight class events where the composition of the drops were badly skewed by the individual events.

Case in point: Look at where the Viper is performing there.

I know a couple viper pilots that absolutely love it and wreck face in it on a regular basis, but you're correct, the viper is a questionable inclusion. I've run models with and without it, and it does not have much leverage on the model results.

I wish I could run stats on this week's leader board, but knowing only final score, and the mech that contributed a majority of the points, we'd be having to make some pretty ugly assumptions, and will have a sample size of at most 300 data points compared to the more than 3k in the chassis level analysis.


Ibrandul Mike, on 21 August 2016 - 06:36 PM, said:

First things first... nicely done
I can't wait to see the new results with the leaderboard going on right now. It is quite interesting.

I have a few questions, but first I have to say that I didn't go through the data myself. Just your post and the results presented. The questions are not meant as an attack to your work but out of curiosity.


Thank you, if I couldn't take someone challenging my work, I picked the wrong occupation (and wasted a lot of time in school).

Ibrandul Mike, on 21 August 2016 - 06:36 PM, said:

How are you addressing the different potential maximum points between the events?
For example, in the Assault Leaderboard Event you had a significantly higher number of Assaults running on both teams. This means more possible damage, which means higher possible scores. On the other side, in the Light Leaderboard Event there were lights everywhere. It didn't seem as "bad" as with the assaults, but there seemed to be more than on average. So less damage potential and therefore less maximum possible points. And yes, I am well aware of the fact that damage dealt is not the primary contributor.

This is a really good point that I haven't addressed. I don't explicitly deal with the different potential pool of damage. Without more information from PGI, I don't think doing so quantitatively is possible. One possible implication is that the slope associated with tonnage might be steeper than it should be because the light/medium end couldn't do as much damage. My feeling (note this is conjecture) is that being able to account for it would change the absolute numbers slightly, but probably wouldn't change the relative positions except in edge cases where mechs from different classes have close statistics on the residuals.

Ibrandul Mike, on 21 August 2016 - 06:36 PM, said:

How are you addressing the formula used for leaderboard events?
The standard formula used for the events is this (just to have it in mind):

I am asking, because the whole analysis is based on it. So have you thought about the formula and its weighting or did you just take the data at face value?

I'm accepting their formula as is. Like above, I don't have data available to disaggregate it, or any other sources of data. So, lacking any other measure, I'm using what's available and trying to recognize what I'm analysing for what it is. I've tried to be precise in my wording, that we're comparing scores obtained by pilots using the mechs. Each of those is a loaded component of the analysis, with much of the result being loaded into the unexplained portion of the variation (pilot skill, available opponents, server connection quality....) I'd love to have more data, but if that were to happen, it'd be because PGI provided it.


Ibrandul Mike, on 21 August 2016 - 06:36 PM, said:

How are you addressing the problem that we only have the top 75 players?
Yes it is the only freely available data sample... but that doesn't mean that it is transferable to the whole system.
As an example, would you trust a statistic that shows you the 75 highest incomes in the US to have any kind of explanatory power for the income of the 75 lowest incomes or even the "average Joe"?


I'm actually addressing that specifically. Dealing with a sample of exactly the top 75 players makes it possible for us to do analysis based on the public data. Two assumptions that we do have to make are that each of the 75 has all 10 matches and that those 10 represent at least "good" matches. Given the aggregate score and lack of other information, we wouldn't be able to use these scores if we can't make those assumptions. If we had a long tail in the sample of less than 10 matches, or if the total number of high performing pilots were large enough that we couldn't make the two assumptions above, we wouldn't be able to do this analysis.

#16 Ibrandul Mike

    Member

  • PipPipPipPipPipPipPipPip
  • The Referee
  • The Referee
  • 1,913 posts

Posted 22 August 2016 - 02:28 AM

Thanks for the answers!

#17 SchnitzlXS

    Member

  • PipPip
  • Ace Of Spades
  • Ace Of Spades
  • 23 posts

Posted 22 August 2016 - 02:59 AM

Hey man,

Let me say, really nice work.

I'll PM you some humble suggestions.

Cheers!

Edited by SchnitzlXS, 22 August 2016 - 03:00 AM.


#18 Tahawus

    Member

  • PipPipPipPipPip
  • Little Helper
  • Little Helper
  • 189 posts

Posted 22 August 2016 - 07:34 AM

SchnitzlXS, on 22 August 2016 - 02:59 AM, said:

Hey man,

Let me say, really nice work.

I'll PM you some humble suggestions.

Cheers!


Thanks Schnitzl, always good to hear from the data scientists. I dabble in it professionally, but it's not my specialty. I'll address your suggestions tonight after work.

#19 Angel of Annihilation

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • The Infernal
  • The Infernal
  • 8,872 posts

Posted 22 August 2016 - 08:54 AM

One thing the stats don't reflect is that these contests are completely weighted toward playing as many battles as you possibly can. For example, you could suck as a player and only have 1 fantastic match for every 50 you play, and as long as you play 500 matches, you could potentially have a higher score than a great player who is only able to play 10 matches, one of which resulted in an unlucky headshot ending his match early. Also, if you compare a popular mech to an unpopular one, the popular one will have 1000s of matches played in it, versus 100s in the unpopular one.

The point I am trying to make is that a mech like the Mist Lynx isn't very popular, so it is very likely it had only a few people playing it, and even then most people who played it likely didn't fully concentrate on only playing the Mist Lynx throughout the contest. The Kodiak, on the other hand, is very popular, so there are probably tons of people who only played the Kodiak and played it exclusively for 6-8 hours a day. This trend is going to tend to skew the scores upward for popular mechs.
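The volume effect described here can be illustrated with a toy simulation. Everything in it is an assumption made for the sake of the example: that the event total is the sum of a pilot's 10 best matches, that match scores are roughly normal around 200 with a spread of 80, and that both pilots have identical skill. None of these numbers come from the leaderboards.

```python
import random


def best_ten_total(n_matches, rng, mean_score=200.0, spread=80.0):
    """Sum of the 10 highest match scores out of n_matches random draws."""
    scores = [max(0.0, rng.gauss(mean_score, spread)) for _ in range(n_matches)]
    return sum(sorted(scores, reverse=True)[:10])


def average_total(n_matches, trials=200, seed=1):
    """Average best-10 total over many simulated events."""
    rng = random.Random(seed)
    return sum(best_ten_total(n_matches, rng) for _ in range(trials)) / trials


few = average_total(10)    # pilot who plays exactly 10 matches
many = average_total(500)  # identical skill, but 500 matches to pick from
print(f"10 matches: {few:.0f}   500 matches: {many:.0f}")
```

With equal skill, the pilot with more matches ends up with the higher best-10 total simply because there are more draws to choose the top 10 from, which is the skew toward heavily played mechs being argued above.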

#20 Contrex

    Member

  • PipPipPipPipPip
  • Philanthropist
  • Philanthropist
  • 112 posts

Posted 22 August 2016 - 09:11 AM

Viktor Drake, on 22 August 2016 - 08:54 AM, said:


The point I am trying to make is that a mech like the Mist Lynx isn't very popular, so it is very likely it had only a few people playing it, and even then most people who played it likely didn't fully concentrate on only playing the Mist Lynx throughout the contest. The Kodiak, on the other hand, is very popular, so there are probably tons of people who only played the Kodiak and played it exclusively for 6-8 hours a day. This trend is going to tend to skew the scores upward for popular mechs.


What makes a Mech popular?

a) Lore... I think you won't find many of them playing an event to be in the top 75
b) Like the look? Don't think so either.
c) Because the mech is stronger than other mechs in the same class... I would bet 99% of the people in here choose the mechs they use because they are strong. All but Proton with his King Crab in the assault top 10! He just likes the lulz!

The top 75 of every class are most likely players who want to be in there. They force good games, and for that you need the stronger mechs.

Edited by Contrex, 22 August 2016 - 09:11 AM.





