Problems With Elo-Hard Stats

#1 freak

Member

14 posts

Posted 12 March 2013 - 01:52 PM

Before I begin I want to make something very clear: I am a 24-year veteran of the BattleTech tabletop game, a registered Catalyst Demo Agent, and have played every MechWarrior game since MechWarrior 2. Anyone who thinks I don't know what I'm talking about can stop now and leave, because odds are I've forgotten more about this universe and game than you know.

Recently I've been seeing a lot of commentary on the ELO matchmaking system. Many people have commented on individual matches, but I would like to present some more concrete data. I have been working on unlocking the Master Level on my Awesome-9M and decided to kill two birds with one stone by grinding the necessary XP and recording the data from a series of matches to illustrate several points.

Over the course of several hours I played a total of 20 straight matches, all but one in my custom Awesome-9M. For completeness, the mech is equipped with 3x ER PPCs set to chain fire and 3x Streak SRM2 racks for periods when cooling is necessary and for dealing with light mechs. Finally, it carries a Beagle Active Probe to aid sensors, because I had tonnage left over, and I was at Elite level with the Sensor Module for the entire test. The only match not fought in this mech was a single game (game two on the list) in the Trial Trebuchet-7M.

The win/loss results were as follows:

Wins = 6
Losses = 14

That's a loss rate of 70%. If the ELO system is supposed to provide me with balanced matches, why isn't it closer to 50%?

Most of the matches took place on the Frozen City/Frozen City Night and Forest Colony/Forest Colony Snow maps and were a mix of Conquest and Assault.

The actual win/loss ratio is not my only point, however. I also want to draw attention to the casualty figures for the 20 matches, specifically the number of mechs destroyed on each side.

Casualty Figures.

8/2
8/2
5/4
8/0
8/2
8/3
8/2
8/4 *
6/2 (Base Cap Win)
8/2
5/3 (Base Cap Win)
8/3
7/1 (Base Cap Win)
8/2
8/1
8/4
7/4 **
8/0
8/3
8/4

I will get to the two asterisk-marked matches in a minute, but a casual look at the figures illustrates a disturbing trend. Under the ELO system of matchmaking, 65% of the matches ended with a casualty rate of 8/3 or worse for one side or the other, what I would reasonably describe as a landslide victory. 50% had a casualty rate of 8/2 or worse. If that's not a landslide, I don't know what is.

* This particular match looks close initially, until you factor in that the winning side had 3x Atlas D-DC mechs. These are not only the largest, most heavily armed and armoured monsters in the game, they are also fundamentally invisible and invulnerable to LRM and Streak SRM fire, and furthermore they were fighting on the River City Night map.

** Another match that looks close on paper, but it was a Conquest game on Frozen City where one team had a Cicada, a Jenner and a Spider while the opposing side's fastest mech was a Yen Lo Wang. The only reason they didn't simply cap their way to victory was that they wanted the kills, and even then there was a difference of over 400 points at the end.
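For anyone who wants to check the percentages against the casualty list, here is a minimal Python sketch of the tally. It assumes that "8/3 or worse" means a kill margin of at least 5 and "8/2 or worse" a margin of at least 6; that reading is an interpretation, not something stated in the post, and under it the 7/1 base cap win counts as a landslide. The function name is made up for illustration.

```python
# Casualty figures from the 20 matches, in the order listed (larger number first).
results = [
    (8, 2), (8, 2), (5, 4), (8, 0), (8, 2), (8, 3), (8, 2), (8, 4),
    (6, 2), (8, 2), (5, 3), (8, 3), (7, 1), (8, 2), (8, 1), (8, 4),
    (7, 4), (8, 0), (8, 3), (8, 4),
]

def share_with_margin(matches, min_margin):
    """Fraction of matches whose kill margin is at least min_margin."""
    hits = sum(1 for high, low in matches if high - low >= min_margin)
    return hits / len(matches)

# Assumption: "8/3 or worse" is read as a margin of 5 or more kills.
print(f"margin >= 5: {share_with_margin(results, 5):.0%}")   # 65%
# Assumption: "8/2 or worse" is read as a margin of 6 or more kills.
print(f"margin >= 6: {share_with_margin(results, 6):.0%}")   # 50%
# Everything else is "8/4 or closer" under the same reading.
print(f"margin <= 4: {1 - share_with_margin(results, 5):.0%}")  # 35%
```

Under that reading the tally reproduces the 65% and 50% figures, and 35% of the matches were 8/4 or closer.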

From the various comments and posts here on the forums there is a clear indication that ELO is producing more landslide wins, and they are not fun for anyone. If, as the devs say, the system is supposed to produce more even matches, then the results should be showing at least half the games with 8/4 or closer casualty rates. Further, I would make the following contention:

ELO is fundamentally flawed because it conflicts with a primary principle of Mechwarrior Online.

Time and again PGI have stated that MWO is a game that requires teamwork, yet they have introduced a system for generating matches that is based on the skill of the INDIVIDUAL, a quality that is variable at best. Some pilots are better in light mechs than heavies. Some groups drop as teams while others have to rely on their spatial awareness to guess what the team is going to do. TeamSpeak can help, but only if the bulk of the people you drop with are using it. So if team co-operation is the key to victory, why are you matching up people based on INDIVIDUAL skill?

This also means the system is ignoring the differences between chassis, and believe me, this can make a huge difference. For example, my Assault mech is an Awesome. I like it, it suits my style, and since I am a Marik player I feel it is appropriate. But an Awesome of any type will struggle to bring down a Stalker, as the Stalker typically mounts more weapons and armour; not much more, but enough to give it an edge that a pilot must be careful of. Against an Atlas, though, an Awesome is lunch. The lighter mech may hurt it, even cripple it, but the Atlas is simply too large and well armoured to deal with.

This is not as big a problem in tabletop play, as the ranges of weapons are fixed values and certain tactics can level the playing field. But, again, as PGI have repeatedly stated, this game is not tabletop. As such, weight of fire and weight of ARMOUR make a huge difference that ELO completely ignores.

Most of the time the solution to a problem is the simplest, I would suggest PGI return to a simple tonnage based system for now (I am not going to go into things like ECM and weapons here and now; this post is long enough and is for a specific problem). Later a look at a BV balanced system may allow the DEVS to narrow the gap even more but ELO needs to go. If you want more proof of this, then I would heartily encourage more players to repeat my experiment and post the results to illustrate the problem. Give PGI all the data they can handle, if necessary until they choke on it.

#2 WolvesX

Member

The Machete
2,072 posts

Posted 12 March 2013 - 01:54 PM

ELO is not working for a team game.

#3 Hamm3r

Member

Big Brother
221 posts

Posted 12 March 2013 - 02:02 PM

I agree that ELO needs work, but your one sampling of 20 games does not a conclusion make! Show me the results of 20-30 random people doing the same thing and you'll come closer to hard data.

#4 Bubba Wilkins

Member

688 posts

Posted 12 March 2013 - 02:04 PM

You do realize that you have a separate ELO rating for each class right?

Sounds like you haven't established a solid rating in that particular class and were overly rated by default.

Edited by Bubba Wilkins, 12 March 2013 - 02:07 PM.

#5 TOGSolid

Member

1,212 posts

Location: Juneau, Alaska

Posted 12 March 2013 - 02:05 PM

Quote

Most of the time the solution to a problem is the simplest, I would suggest PGI return to a simple tonnage based system for now...Later a look at a BV balanced system may allow the DEVS to narrow the gap even more

I fully agree. As it stands the ELO matchmaker is just a giant trainwreck. The weight differences in games feels like a return to the bad old days when there was no matchmaking balance at all.

#6 Jestun

Member

1,270 posts

Posted 12 March 2013 - 02:06 PM

Hamm3r, on 12 March 2013 - 02:02 PM, said:

I agree that ELO needs work, but your one sampling of 20 games does not a conclusion make! Show me the results of 20-30 random people doing the same thing and you'll come closer to hard data.

Far too many people claim that their tiny sample of test data proves their point.

If I tossed a coin 20 times it's unlikely to actually be 10 x heads and 10 x tails. And that only has 2 potential outcomes!

A multiplayer team-based game has far more variables, 20 matches isn't even remotely enough.
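The coin example is easy to put a number on. A minimal Python sketch, assuming a fair coin and independent tosses (the function name is made up for illustration):

```python
from math import comb

def prob_exact_heads(tosses, heads):
    """Probability of exactly `heads` heads in `tosses` fair, independent coin tosses."""
    return comb(tosses, heads) / 2 ** tosses

# Chance that 20 tosses come out exactly 10 heads and 10 tails.
print(f"P(exactly 10 heads in 20) = {prob_exact_heads(20, 10):.3f}")  # 0.176
```

So an exact 10/10 split turns up less than one time in five, even for a perfectly fair coin.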

Edited by Jestun, 12 March 2013 - 02:06 PM.

#7 WardenWolf

Member

Legendary Founder
1,684 posts

Location: Terra

Posted 12 March 2013 - 02:08 PM

Hmm, thank you for presenting a lot of information, data, and your thoughts. However, I would point out a couple of things from my own experience:

1) Since ELO has been factored into matchmaking, my KDR has increased slightly (from 3.0, which I had been hovering at for a long time, to ~3.2). This surprised me, as I expected to have it go down... but talking it over with my brother, who also plays, he pointed something out which makes sense and fits my experience (see #2).

2) The matches I play now seem to be with smarter, more experienced players. When I suggest something in chat people more often listen than they used to, and folks seem to group together more and just play better. This improved teamwork may be what is leading to my personal KDR going up, and it may also lead to #3.

3) I don't think the 8:3 or even 8:2 is a landslide. You can't tell how badly damaged the remaining mechs were, or if the losing team had disconnected players, etc. Further, once the tide of battle starts to turn one direction or the other you often end up with a 'close' game where it is still a wipe on one side and only a couple losses on the other: that is the nature of numerical superiority, which can result from one side just getting lucky about where they positioned themselves compared to their enemies.

4) Now with all of that said, I am not sure that ELO is the best matchmaking system... but I don't know enough about alternatives to say what else might prove better. I can say, though, that it feels to me like it is better than it was before; so there is my subjective opinion for you.

Oh, and one more thing -

5) Remember, you have multiple ELO scores. There is at least one for each weight class, and I think it may also take into account grouped vs solo (but I'm not sure of that - it just sticks in the back of my mind). That makes your #2 game data invalid, and also eliminates some of the concerns you raised.

#8 Royalewithcheese

Member

2,342 posts

Posted 12 March 2013 - 02:08 PM

Ran a quick t-test using this tool and, assuming I'm doin it rite, it doesn't look like your results are statistically distinguishable from a 50/50 win/loss ratio.
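For anyone who wants to reproduce that kind of check without the online tool, here is a minimal sketch. Note that it swaps in an exact two-sided binomial test rather than the t-test mentioned above, and it assumes the 20 matches are independent with a fixed win chance, which real matchmaking may not satisfy. The function name is made up for illustration.

```python
from math import comb

def binom_two_sided_p(wins, games, p=0.5):
    """Exact two-sided binomial p-value: total probability of every
    outcome that is no more likely than the observed win count."""
    def pmf(k):
        return comb(games, k) * p ** k * (1 - p) ** (games - k)
    observed = pmf(wins)
    # Small relative tolerance so equally likely outcomes are not lost to rounding.
    return sum(pmf(k) for k in range(games + 1) if pmf(k) <= observed * (1 + 1e-9))

# The OP's record: 6 wins in 20 matches, tested against a 50% win chance.
print(f"p-value = {binom_two_sided_p(6, 20):.3f}")  # 0.115
```

A p-value of about 0.115 is above the usual 0.05 cut-off, so this test also fails to separate 6 wins in 20 from a 50/50 record.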

#9 Hamm3r

Member

Big Brother
221 posts

Posted 12 March 2013 - 02:10 PM

Also want to point out that, as I read it anyway, your data is inherently flawed in that you went in already assuming that ELO was bad and were going to prove it, rather than putting it to the test, analyzing the data and seeing the result.

Edited by Hamm3r, 12 March 2013 - 02:11 PM.

#10 DJMarine

Member

Elite Founder
99 posts

Posted 12 March 2013 - 02:10 PM

Couple points

1- I also recorded a handful of my games pre-ELO and my results were much the same. Lots of losses, and the majority of the matches, win or lose, were blowouts. So unfortunately it doesn't seem as though ELO has fixed that aspect yet, though I can't say for sure as my game crashes a lot since downloading one of the recent patches. But ELO definitely hasn't hurt the game either, since those types of results were already the norm.

2- ELO absolutely works for a team-based game like MWO. You may know the series front and back, but are you familiar with the system in other team-based games like CS:GO? You can absolutely be a great team player and have that reflected in your ELO ranking. ELO is more about who you beat than how many kills you score. So if you're a great teamplayer and can lead your team to victory through tactics then you will win a lot of games and have a high ELO. Unfortunately, atm the gameplay doesn't foster or encourage teamwork, that's the bigger problem.

No voip for easy coordination and communication, no lances for better organization, etc.

So, the problem definitely isn't ELO, it's just an easy scapegoat atm.

Also, no, a proper ELO system doesn't mean you should win 50% of your games; it simply means you'll be placed with other similarly skilled players.
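To make the "who you beat" point concrete, here is a minimal sketch of the standard chess-style Elo formulas. How PGI adapts Elo to 8v8 matches isn't described in this thread, so rating a team by the average of its players and using K = 32 are assumptions for illustration only, as are the function names and the sample ratings.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for side A against side B (0 to 1)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def updated_rating(rating, expected, actual, k=32):
    """Standard Elo update: move the rating by K times (actual - expected)."""
    return rating + k * (actual - expected)

def team_rating(ratings):
    """Assumption: a team is rated by the plain average of its players."""
    return sum(ratings) / len(ratings)

# Hypothetical teams: the underdogs average 1400, the favourites 1600.
underdogs = team_rating([1400] * 8)
favourites = team_rating([1600] * 8)

e_under = expected_score(underdogs, favourites)
print(f"underdog expected score: {e_under:.2f}")
# A win (actual = 1) against a stronger side earns more than a win between equals.
print(f"underdog player after a win: {updated_rating(1400, e_under, 1):.1f}")
print(f"even-match player after a win: {updated_rating(1500, 0.5, 1):.1f}")
```

Kills never enter into it; only the result and the rating gap do.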

Edited by JayTac, 12 March 2013 - 02:13 PM.

#11 FerretWithASpork

Member

65 posts

Posted 12 March 2013 - 02:11 PM

Jestun, on 12 March 2013 - 02:06 PM, said:

I apologize on the Author's behalf that he's not from PGI and doesn't have access to the data that they do.. 20 games is a decent sample size for a single person.. This guy has done his research..

Bubba Wilkins, on 12 March 2013 - 02:04 PM, said:

You do realize that you have a separate ELO rating for each class right?

Sounds like you haven't established a solid rating in that particular class and were overly rated by default.

"I have been working on unlocking the Master Level on my Awesome-9M"

Sounds to me like the OP already has a decent number of Assault matches under his belt. ELO should have balanced out by now.

#12 Mazzyplz

Member

Ace Of Spades
3,292 posts

Posted 12 March 2013 - 02:12 PM

the problem is your build sadly.
ELO may become unreliable if you're using crappy builds

i used to run a very similar build to yours and it was fun but it was really hard; i had to accept the truth: 3x ER PPC is not very good even in an awesome, unless you're only doing that specific build with single HS and no other weaps.

do THIS; drop a single ERppc for a standard ppc; instead of using streaks, get yourself 2xsrm4 and 1xsrm2 instead of the usual 3xsrm4, those 10 missiles usually hit whereas the last ones of the 12 usually miss on the 9m because of the tubes.
use that tonnage you saved from the last srm4 to put a med laser on the head;
i also add ams.

use standard engine to zombie it with the 2xsrm4 + the medlaser.

this is my own build; it's lacking a bit of armor though to get the 12 srm back
http://mwo.smurfy-ne...d35c712e7c584c3

Edited by Mazzyplz, 12 March 2013 - 02:20 PM.

#13 Matthew Craig

Technical Director

867 posts

Location: Vancouver, BC

Posted 12 March 2013 - 02:13 PM

The current iteration of Elo on production is quite loose in the games it creates with respect to Elo rating and tonnage; this was intentional to keep the time to find a match down.

We're actively gathering data from production and working towards tuning the match making system to create better matches without significantly increasing the time to find a match. We haven't even had the first balancing pass yet for Elo so I think it's premature to speculate on how well Elo does or doesn't work for MWO.

As the tuning work progresses we will be monitoring and considering adding an additional rating for Group play so players would have a separate rating for lone wolf vs. group play if the data shows that it is necessary.

#14 freak

Member

14 posts

Posted 12 March 2013 - 02:14 PM

I appreciate that 20 matches is a very small sampling but this was really a side project to getting the Awesome Mastered which is why I was encouraging people to do the same and publish their results.

According to my stats I have played 60 matches in the Awesome-9M since they started tracking the scores, not to mention the other two variants. I've been living in Awesomes for nearly a month, mostly to get it where it is now, so I would expect the ELO score to be pretty much worked out by now.

Currently I have no TeamSpeak and, as I said, I do appreciate it makes a big difference in things.

#15 Jestun

Member

1,270 posts

Posted 12 March 2013 - 02:15 PM

FerretWithASpork, on 12 March 2013 - 02:11 PM, said:

I apologize on the Author's behalf that he's not from PGI and doesn't have access to the data that they do.. 20 games is a decent sample size for a single person.. This guy has done his research..

No, it's not.

20 is not enough to draw any real conclusion from.

If I toss a coin 20 times and it lands on heads 15 times, does that coin have a 75% chance of landing on heads or did I just not do it enough times?

Now imagine I was rolling a die with every potential outcome of an MWO match instead of just a 2-sided coin... it would be even harder to get meaningful data from.
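One way to see how little 20 trials pin down is a confidence interval. A minimal sketch using the Wilson score interval at roughly 95% (z = 1.96), assuming independent trials; the function name is made up for illustration:

```python
from math import sqrt

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p_hat + z ** 2 / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half

# 15 heads in 20 tosses: the "75%" coin.
low, high = wilson_interval(15, 20)
print(f"15/20 heads: {low:.2f} to {high:.2f}")  # 0.53 to 0.89

# The OP's 6 wins in 20 matches.
low, high = wilson_interval(6, 20)
print(f"6/20 wins:   {low:.2f} to {high:.2f}")  # 0.15 to 0.52
```

Both ranges are more than 35 percentage points wide, which is the problem with drawing conclusions from 20 trials.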

#16 ViKingOmega

Member

31 posts

Location: Texas

Posted 12 March 2013 - 02:16 PM

I have had very similar results to the OP with the ELO system. When playing PUG matches I average about a 5-15 record for every 20 games. The matches are also very one-sided, similar to what he reported. I did not take down as much data, but I remember from last week: I bought a new X-5, and while trying to rank up in PUG matches my win/loss ratio was terrible. The worst problem is the tonnage mismatch. For example, when one team has 4 lightish mechs (e.g. 2 Raven 3Ls, a Cicada 3M and a Cicada X-5) and the other team has no lights but has 4 Assaults, 2 Mediums and 2 Heavies, the matches are very one-sided. I think the system should match tonnage first and then apply ELO. Or maybe use a BV system. Or match total tonnage, so that there would be a different distribution of mechs but the same total tonnage, making the teams even, and then match players by their stats.
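A rough idea of what a total-tonnage match could look like, as a minimal Python sketch. This is a simple greedy split, not anything PGI has described, and the queue of 16 tonnages below is invented for illustration; a real matchmaker would also have to respect groups, ELO and wait times.

```python
def split_by_tonnage(tonnages, team_size=8):
    """Greedy split: hand each mech, heaviest first, to whichever team
    currently has the lower total tonnage and still has a free slot."""
    if len(tonnages) != 2 * team_size:
        raise ValueError("need exactly two full teams")
    team_a, team_b = [], []
    for tons in sorted(tonnages, reverse=True):
        a_has_room = len(team_a) < team_size
        b_has_room = len(team_b) < team_size
        if a_has_room and (sum(team_a) <= sum(team_b) or not b_has_room):
            team_a.append(tons)
        else:
            team_b.append(tons)
    return team_a, team_b

# Hypothetical queue of 16 mechs by tonnage (invented for illustration).
queue = [100, 100, 100, 100, 85, 80, 65, 65, 50, 50, 40, 40, 35, 35, 35, 30]
team_a, team_b = split_by_tonnage(queue)
print(sum(team_a), team_a)
print(sum(team_b), team_b)
```

Greedy splitting does not guarantee the smallest possible gap, but it keeps the totals close and spreads the assaults and lights across both sides.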

#17 freak

Member

14 posts

Posted 12 March 2013 - 02:18 PM

Thanks for the tip, Mazzyplz, but I did try a different build on the 9M and it really wasn't working for me. This one was doing well until the patch that brought in ELO; maybe that's why I felt it more than some others.

#18 Karl Marlow

Member

Elite Founder
2,277 posts

Posted 12 March 2013 - 02:20 PM

WolvesX, on 12 March 2013 - 01:54 PM, said:

ELO is not working for a team game.

ELO also does not account for the rules changing. In chess you start the game with the exact same pieces in the exact same setup. In MWO the only thing that is the same is that there are 8 pieces on each side. Those pieces change wildly from match to match. The rules also change from game to game, as we have Assault maps and Conquest maps.

You can't rate skill in this fashion. They would be better served by implementing a BV system and keeping two teams equal in that way. Of course BV is also subjective.

#19 Heffay

Rum Runner

The Referee
6,458 posts

Location: PHX

Posted 12 March 2013 - 02:20 PM

freak, on 12 March 2013 - 01:52 PM, said:

What does your ability to ERP Battletech type stuff have to do with your lack of understanding of the Elo system?

#20 freak

Member

14 posts

Posted 12 March 2013 - 02:27 PM

I understand the ELO system fine Heffay, that statement was to try and discourage Flamers who think I'm some 16 year old who's looking for attention or who feel the need to pick a fight as opposed to having a reasonable discussion, something I find is usually necessary as I can't have these discussions face to face.
