

Problems With Elo-Hard Stats
#1
Posted 12 March 2013 - 01:52 PM
Recently I've been seeing a lot of commentary on the ELO matchmaking system. Many people have commented on individual matches, but I would like to present some more concrete data. I have been working on unlocking the Master Level on my Awesome-9M and decided to kill two birds with one stone by grinding the necessary XP and recording the data from a series of matches to illustrate several points.
Over the course of several hours I played a total of 20 straight matches, all but one in my custom Awesome-9M. For completeness, the mech is equipped with 3x ER PPCs set to chain fire and 3x Streak SRM2 racks for periods when cooling is necessary as well as for dealing with light mechs. Finally, it carries a Beagle Active Probe to aid sensors and because I had tonnage left over, and I was at Elite level with the Sensor Module for the entire test. The only match not fought in this mech was a single game (game two on the list) in the Trial Trebuchet-7M.
The win/loss results were as follows:
Wins = 6
Losses = 14
That's a loss rate of 70%. If the ELO system is supposed to provide me with balanced matches, why isn't it closer to 50%?
Most of the matches took place on the Frozen City/Frozen City Night and Forest Colony/Forest Colony Snow maps and were a mix of Conquest and Assault.
The actual win/loss ratio is not my only point, however. I also want to draw attention to the casualty figures for the 20 matches, specifically the number of mechs destroyed on each side.
Casualty Figures.
8/2
8/2
5/4
8/0
8/2
8/3
8/2
8/4 *
6/2 (Base Cap Win)
8/2
5/3 (Base Cap Win)
8/3
7/1 (Base Cap Win)
8/2
8/1
8/4
7/4 **
8/0
8/3
8/4
I will get to the two asterisk-marked matches in a minute, but a casual look at the figures illustrates a disturbing trend. Under the ELO system of matchmaking, 65% of the matches ended with a casualty rate of 8/3 or worse for one side or the other, which I would reasonably describe as a landslide victory. 50% had a casualty rate of 8/2 or worse; if that's not a landslide, I don't know what is.
* This match looks close until you factor in that the winning side had 3x Atlas D-DC mechs. These are not only the largest, most heavily armed and armoured monsters in the game, but are also fundamentally invisible and invulnerable to LRM and Streak SRM fire, and they were fighting on the River City Night map.
** Another match that looks close on paper, but it was a Conquest game on Frozen City where one team had a Cicada, a Jenner and a Spider while the opposing side's fastest mech was a Yen Lo Wang. The only reason they didn't simply cap their way to victory was that they wanted the kills; even then, the final scores were over 400 points apart.
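For anyone who wants to check or repeat the tally, here is a minimal sketch that recomputes the landslide percentages from the list above. It assumes "8/3 or worse" means a kill margin of at least 5 and "8/2 or worse" a margin of at least 6, which also counts the 7/1 base cap game; that reading is an assumption, not something stated in the post.

```python
# Casualty figures copied from the list above, one pair per match.
casualties = [
    (8, 2), (8, 2), (5, 4), (8, 0), (8, 2), (8, 3), (8, 2), (8, 4), (6, 2), (8, 2),
    (5, 3), (8, 3), (7, 1), (8, 2), (8, 1), (8, 4), (7, 4), (8, 0), (8, 3), (8, 4),
]


def share_with_margin(results, min_margin):
    """Fraction of matches whose kill difference is at least min_margin."""
    hits = sum(1 for a, b in results if abs(a - b) >= min_margin)
    return hits / len(results)


# Assumed thresholds: margin >= 5 for "8/3 or worse", margin >= 6 for "8/2 or worse".
landslide_8_3 = share_with_margin(casualties, 5)
landslide_8_2 = share_with_margin(casualties, 6)
print(f"8/3 or worse: {landslide_8_3:.0%}, 8/2 or worse: {landslide_8_2:.0%}")
```

Swapping in your own match list is enough to produce comparable figures.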
From the various comments and posts here on the forums, there is a clear indication that ELO is producing more landslide wins, and they are not fun for anyone. If, as the devs say, the system is supposed to produce more even matches, then the results should show at least half the games with casualty rates of 8/4 or closer. Further, I would make the following contention:
ELO is fundamentally flawed because it conflicts with a primary principle of Mechwarrior Online.
Time and again PGI have stated that MWO is a game that requires teamwork, yet they have introduced a system for generating matches that is based on the skill of the INDIVIDUAL, a quality that is variable at best: some pilots are better in light mechs than in heavies. Some groups drop as teams while others have to rely on their spatial awareness to guess what the team is going to do. TeamSpeak can help, but only if the bulk of the people you drop with are using it. So if team co-operation is the key to victory, why are you matching up people based on INDIVIDUAL skill?
This also means the system is ignoring the differences between chassis, and believe me, this can make a huge difference. For example, my Assault mech is an Awesome. I like it, it suits my style, and since I am a Marik player I feel it is appropriate. But an Awesome of any type will struggle to bring down a Stalker, as the Stalker typically mounts more weapons and armour; not much more, but enough to give it an edge that a pilot must be careful of. Against an Atlas, though, an Awesome is lunch: the lighter mech may hurt it, even cripple it, but the Atlas is simply too large and well armoured to deal with.
This is not as big a problem in tabletop play, as the ranges of weapons are fixed values and certain tactics can level the playing field. But, as PGI has repeatedly stated, this game is not tabletop; weight of fire and weight of ARMOUR make a huge difference that ELO completely ignores.
Most of the time the solution to a problem is the simplest one. I would suggest PGI return to a simple tonnage-based system for now (I am not going to go into things like ECM and weapons here; this post is long enough and is about a specific problem). Later, a look at a BV-balanced system may allow the devs to narrow the gap even more, but ELO needs to go. If you want more proof of this, I would heartily encourage more players to repeat my experiment and post the results to illustrate the problem. Give PGI all the data they can handle, if necessary until they choke on it.
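To make the tonnage suggestion concrete, here is a rough sketch of what a tonnage-based split could look like. It is not PGI's matchmaker: the greedy rule, the `split_by_tonnage` name and the 16-mech queue are all made up for illustration.

```python
def split_by_tonnage(tonnages, team_size=8):
    """Greedy split: heaviest mechs first, each to the lighter team that still has room."""
    team_a, team_b = [], []
    for tons in sorted(tonnages, reverse=True):
        a_open = len(team_a) < team_size
        b_open = len(team_b) < team_size
        if a_open and (not b_open or sum(team_a) <= sum(team_b)):
            team_a.append(tons)
        else:
            team_b.append(tons)
    return team_a, team_b


# Hypothetical queue of 16 mechs by tonnage (illustrative values only).
queue = [100, 100, 100, 85, 80, 80, 75, 65, 50, 50, 50, 45, 40, 35, 35, 30]
team_a, team_b = split_by_tonnage(queue)
print(sum(team_a), sum(team_b))
```

A greedy pass like this is not guaranteed to find the best possible split, but it keeps the totals close without searching every combination.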
#2
Posted 12 March 2013 - 01:54 PM
#3
Posted 12 March 2013 - 02:02 PM
#4
Posted 12 March 2013 - 02:04 PM
Sounds like you haven't established a solid rating in that particular class and were overly rated by default.
Edited by Bubba Wilkins, 12 March 2013 - 02:07 PM.
#5
Posted 12 March 2013 - 02:05 PM
Quote
I fully agree. As it stands the ELO matchmaker is just a giant trainwreck. The weight differences in games feel like a return to the bad old days when there was no matchmaking balance at all.
#6
Posted 12 March 2013 - 02:06 PM
Hamm3r, on 12 March 2013 - 02:02 PM, said:
Far too many people claim that their tiny sample of test data proves their point.
If I tossed a coin 20 times it's unlikely to actually be 10 x heads and 10 x tails. And that only has 2 potential outcomes!
A multiplayer team-based game has far more variables; 20 matches isn't even remotely enough.
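To put rough numbers on the coin example, here is a small sketch using the binomial distribution. It assumes every match is an independent 50/50 event, which is a simplification and not how real matches behave.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p=0.5):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


p_exactly_10 = binom_pmf(10, 20)   # a perfect 10/10 split
p_six_or_fewer = binom_cdf(6, 20)  # a 6-14 record or worse at a true 50% win rate
print(f"P(exactly 10 of 20) = {p_exactly_10:.3f}")
print(f"P(6 or fewer of 20) = {p_six_or_fewer:.3f}")
```

Under those assumptions a perfect 10/10 split comes up less than one time in five, and a 6-14 run or worse turns up by chance a little under 6% of the time.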
Edited by Jestun, 12 March 2013 - 02:06 PM.
#7
Posted 12 March 2013 - 02:08 PM
1) Since ELO has been factored into matchmaking, my KDR has increased slightly (from 3.0, which I had been hovering at for a long time, to ~3.2). This surprised me, as I expected to have it go down... but talking it over with my brother, who also plays, he pointed something out which makes sense and fits my experience (see #2).
2) The matches I play now seem to be with smarter, more experienced players. When I suggest something in chat people more often listen than they used to, and folks seem to group together more and just play better. This improved teamwork may be what is leading to my personal KDR going up, and it may also lead to #3.
3) I don't think the 8:3 or even 8:2 is a landslide. You can't tell how badly damaged the remaining mechs were, or if the losing team had disconnected players, etc. Further, once the tide of battle starts to turn one direction or the other you often end up with a 'close' game where it is still a wipe on one side and only a couple losses on the other: that is the nature of numerical superiority, which can result from one side just getting lucky about where they positioned themselves compared to their enemies.
4) Now with all of that said, I am not sure that ELO is the best matchmaking system... but I don't know enough about alternatives to say what else might prove better. I can say, though, that it feels to me like it is better than it was before; so there is my subjective opinion for you.

Oh, and one more thing -
5) Remember, you have multiple ELO scores. There is at least one for each weight class, and I think it may also take into account grouped vs solo (but I'm not sure of that - it just sticks in the back of my mind). That makes your #2 game data invalid, and also eliminates some of the concerns you raised.
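On point 3, the snowball effect of numerical superiority can be illustrated with Lanchester's square law. This is a textbook idealisation (every mech has equal firepower and everyone can shoot at everyone), not a model of MWO itself, so treat the numbers as a rough picture only.

```python
from math import sqrt


def lanchester_survivors(larger, smaller):
    """Survivors on the larger side under Lanchester's square law, equal firepower per mech."""
    return sqrt(larger**2 - smaller**2)


# An even 8 v 8 fight after one side loses one or two mechs early.
for remaining_enemies in (7, 6):
    left = lanchester_survivors(8, remaining_enemies)
    print(f"8 vs {remaining_enemies}: about {left:.1f} mechs left standing on the larger side")
```

In this idealised model a single early loss is enough to turn an even fight into something close to an 8/4 result, and two early losses push it towards 8/3.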
#9
Posted 12 March 2013 - 02:10 PM
Edited by Hamm3r, 12 March 2013 - 02:11 PM.
#10
Posted 12 March 2013 - 02:10 PM
1- I also recorded a handful of my games pre-ELO and my results were much the same: lots of losses, and the majority of the matches, win or lose, were blowouts. So unfortunately it doesn't seem as though ELO has fixed that aspect yet, though I can't say for sure as my game has crashed a lot since I downloaded one of the recent patches. But ELO definitely hasn't hurt the game either, since those types of results were already the norm.
2- ELO absolutely works for a team-based game like MWO. You may know the series front and back, but are you familiar with the system in other team-based games like CS:GO? You can absolutely be a great team player and have that reflected in your ELO ranking. ELO is more about who you beat than how many kills you score. So if you're a great team player and can lead your team to victory through tactics, then you will win a lot of games and have a high ELO. Unfortunately, atm the gameplay doesn't foster or encourage teamwork; that's the bigger problem.
No voip for easy coordination and communication, no lances for better organization, etc.
So, the problem definitely isn't ELO, it's just an easy scapegoat atm.
Also, no, a proper ELO system doesn't mean you should win 50% of your games; it simply means you'll be placed with other similarly skilled players.
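For reference, the "who you beat" part can be sketched with the standard Elo formulas. How PGI actually rates teams has not been described in this thread, so the team averages and the K-factor of 32 below are assumptions for illustration only.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for side A against side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_change(own, opponent, won, k=32):
    """Points gained or lost after one match; k=32 is an assumed K-factor."""
    actual = 1.0 if won else 0.0
    return k * (actual - expected_score(own, opponent))


# Hypothetical team average ratings, not real MWO values.
underdog, favourite = 1400, 1600
print(round(rating_change(underdog, favourite, won=True), 1))  # upset win
print(round(rating_change(favourite, underdog, won=True), 1))  # expected win
```

With these made-up numbers an upset win is worth roughly three times as many points as a win the rating already predicted, and kills do not enter into it at all.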
Edited by JayTac, 12 March 2013 - 02:13 PM.
#11
Posted 12 March 2013 - 02:11 PM
Jestun, on 12 March 2013 - 02:06 PM, said:
Far too many people claim that their tiny sample of test data proves their point.
If I tossed a coin 20 times it's unlikely to actually be 10 x heads and 10 x tails. And that only has 2 potential outcomes!
A multiplayer team-based game has far more variables; 20 matches isn't even remotely enough.
I apologize on the Author's behalf that he's not from PGI and doesn't have access to the data that they do.. 20 games is a decent sample size for a single person.. This guy has done his research..
Bubba Wilkins, on 12 March 2013 - 02:04 PM, said:
Sounds like you haven't established a solid rating in that particular class and were overly rated by default.
"I have been working on unlocking the Master Level on my Awesome-9M"
Sounds to me like the OP already has a decent number of Assault matches under his belt. ELO should have balanced out by now.
#12
Posted 12 March 2013 - 02:12 PM
ELO may become unreliable if you're using crappy builds
i used to run a very similar build to yours and it was fun but it was really hard; i had to accept the truth: 3x ER PPC is not very good, even in an awesome, unless you're only running that specific build with single HS and no other weapons.
do THIS: drop a single ER PPC for a standard PPC; instead of using streaks, get yourself 2x SRM4 and 1x SRM2 instead of the usual 3x SRM4. those 10 missiles usually hit, whereas the last ones of the 12 usually miss on the 9M because of the tubes.
use the tonnage you saved from the last SRM4 to put a med laser on the head;
i also add ams.
use standard engine to zombie it with the 2xsrm4 + the medlaser.
this is my own build; it's lacking a bit of armor though to get the 12 srm back
http://mwo.smurfy-ne...d35c712e7c584c3
Edited by Mazzyplz, 12 March 2013 - 02:20 PM.
#13
Posted 12 March 2013 - 02:13 PM
We're actively gathering data from production and working towards tuning the match making system to create better matches without significantly increasing the time to find a match. We haven't even had the first balancing pass yet for Elo so I think it's premature to speculate on how well Elo does or doesn't work for MWO.
As the tuning work progresses we will be monitoring and considering adding an additional rating for Group play so players would have a separate rating for lone wolf vs. group play if the data shows that it is necessary.
#14
Posted 12 March 2013 - 02:14 PM
According to my stats I have played 60 matches in the Awesome-9M since they started tracking the scores, not to mention the other two variants. I've been living in Awesomes for nearly a month, mostly to get it where it is now, so I would expect the ELO score to be pretty much worked out by now.
I currently have no TeamSpeak and, as I said, I do appreciate it makes a big difference.
#15
Posted 12 March 2013 - 02:15 PM
FerretWithASpork, on 12 March 2013 - 02:11 PM, said:
I apologize on the Author's behalf that he's not from PGI and doesn't have access to the data that they do.. 20 games is a decent sample size for a single person.. This guy has done his research..
No, it's not.
20 is not enough to draw any real conclusion from.
If I toss a coin 20 times and it lands on heads 15 times, does that coin have a 75% chance of landing on heads or did I just not do it enough times?
Now imagine I was rolling a die with every potential outcome of an MWO match instead of just a two-sided coin... it would be even harder to get meaningful data from.
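One way to see how little 20 results pin down is a confidence interval on the observed rate. The sketch below uses the Wilson score interval at roughly 95% (z = 1.96) and again treats each toss or match as independent, which is an assumption.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


for label, hits, total in (("15 heads in 20", 15, 20),
                           ("6 wins in 20", 6, 20),
                           ("60 wins in 200", 60, 200)):
    low, high = wilson_interval(hits, total)
    print(f"{label}: plausible rate between {low:.0%} and {high:.0%}")
```

At 20 trials the interval spans well over 30 percentage points, and for 6 wins in 20 it still includes 50%. The same 30% rate over 200 matches would give a far narrower range.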
#16
Posted 12 March 2013 - 02:16 PM
#17
Posted 12 March 2013 - 02:18 PM
#18
Posted 12 March 2013 - 02:20 PM
WolvesX, on 12 March 2013 - 01:54 PM, said:
ELO also does not account for the rules changing. In chess you start the game with the exact same pieces in the exact same setup. In MWO the only thing that is the same is that there are 8 pieces on each side. Those pieces change wildly from match to match. The rules also change from game to game as we have assault maps and conquest maps.
You can't rate skill in this fashion. They would be better served by implementing a BV system and keeping two teams equal in that way. Of course BV is also subjective.
#19
Posted 12 March 2013 - 02:20 PM
freak, on 12 March 2013 - 01:52 PM, said:
*snip*
That's a loss rate of 70%. If the ELO system is supposed to provide me with balanced matches, why isn't it closer to 50%?
What does your ability to ERP Battletech type stuff have to do with your lack of understanding of the Elo system?
#20
Posted 12 March 2013 - 02:27 PM