Why Elo Doesn't Work Here

#41 Sandpit

Member

Veteran Founder
17,419 posts

Location: Arkansas

Posted 22 January 2014 - 06:10 PM

I've seen details on everything EXCEPT how it's calculated. Nowhere have I seen a dev state that it is based solely on your win/loss record.

#42 MischiefSC

Member

The Benefactor
16,697 posts

Posted 22 January 2014 - 06:17 PM

Necromantion, on 22 January 2014 - 06:03 PM, said:

Yep, and stop saying games are balanced because "Elo of each team was within x amount of each other", because all that often results in is the team with more median players stomping the team with 2-3 really good players and the rest junk.

Yep, constant 700-900 dmg games with 80% of the kills at times, but the games drop my Elo 'cause I lost. Whoopty do.

It must be difficult to be immune to math. How does that work, really? That the law of averages and probability just don't apply to you? That somehow you manage to be an unstoppable killing machine almost every game and yet your team still loses, when probability dictates the caliber of people you drop with will, in the aggregate, be exactly the same as everyone else's? That even accounting for high/low Elo scores matched on each team, the other team will have the same odds of composition, and you're exactly as likely to be on the team with the superior player composition as you are to be on the one with the inferior one? If you throw up 700 to 900 damage and 80% of the kills in almost every game you play, then you are literally one of the greatest players of any game, ever, in the history of the human race and a clear paragon of eye-hand coordination.

Or... I don't believe you and instead think you're knowingly or unknowingly exaggerating.

In the last 4 or 5 threads where I argued this, I did my best to be gentle and understanding, and clearly that didn't work, so let's get right to the facts -

You are wrong. Statistically, provably, demonstrably wrong. This isn't about opinions or feelings; this is about how mathematics, statistics and probability theory work. You are playing with the same odds as everyone else. The only constant in every single game you play is you. There isn't some evil AI living in the servers hand-picking matches to make sure you always get the bad teams. Sometimes you get a good team, sometimes a bad team; the odds are the same for you as they are for everyone else. What you do control, and what does influence your odds, is how you play. If you are not skilled enough to push your team's odds of winning above 50%, then Elo is doing exactly what it's supposed to do: rank you at a position where you will win about 50% of your games. You're average. So are most people.

If you're winning 50% of your games, that's because Elo is working - not because it doesn't work. Attempting to say that your performance doesn't impact the odds of your team winning, when averaged out over hundreds of games, isn't just statistically false but implies a lack of personal accountability: "I win because I'm a great player; I lose because I'm stuck with terrible people."

No. Your performance, for better or worse, impacts the total odds of your team winning and is verifiable in the aggregate of your wins and losses over a sufficient sample size: between 200 and 500 games, depending on how good you are, for a pretty complete seating of your score, and 100-300 to get you into the right general area.
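The 200-500 game figure is the poster's own estimate. As a rough illustration of why sample sizes in the hundreds come up at all, here is a minimal Python sketch of the normal-approximation margin of error on a win rate. It assumes every match is an independent trial with the same win probability, which real matchmade games are not, so treat it as a back-of-the-envelope check rather than how MW:O actually seats players.

```python
import math


def win_rate_margin(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a win rate.

    p: observed (or assumed) win rate, n: games played,
    z: 1.96 gives roughly a 95% interval.
    Assumes independent, identically distributed matches (a simplification).
    """
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # How uncertain is a 50% win rate after a given number of games?
    for games in (100, 200, 300, 500):
        margin = win_rate_margin(0.5, games)
        print(f"{games:>4} games: 50% +/- {margin * 100:.1f} points")
```

With p = 0.5 the margin shrinks only with the square root of the number of games: roughly ±10 percentage points at 100 games versus roughly ±4.4 at 500.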

#43 TehSBGX

Member

FP Veteran - Beta 1
911 posts


Posted 22 January 2014 - 06:19 PM

Honestly, I suspect the new 'leveling' system has a lot to do with the matchmaker. So Elo might be on its way out.

#44 MischiefSC

Member

The Benefactor
16,697 posts

Posted 22 January 2014 - 06:21 PM

Sandpit, on 22 January 2014 - 06:10 PM, said:

I've seen details on everything EXCEPT how it's calculated. Nowhere have I seen a dev state that it is based solely on your win/loss record.

Here you go, because I'm on a roll on this topic tonight.

The statistics behind Elo, statistical sampling, etc. are my biznatch. I deal with them professionally, and every single effing day I hear people saying 'it's totally not my fault that I missed metrics about X, CUZ REASONS', all while hundreds of their peers hit them and I can show them specifically what they did differently.

I know this song and dance. Does the matchmaker need some work? Of course. Do we need a bigger pool of players to make it better? Absolutely.

Elo however is the best possible tool for measuring the worth of a player in a ranked system be that as a member of a team or as an individual.

Math, baby. Math is sexy.

#45 Roland

Member

8,260 posts

Posted 22 January 2014 - 06:24 PM

MischiefSC, on 22 January 2014 - 05:17 PM, said:

Elo (not ELO, it's a last name) is not 'made for 1v1'. It's not 'made for' anything.

Yes it was. It was made as a chess ranking system.

Quote

It's a system for representing in a point value the results of a 1/0 (win/loss for example) contest. 1v1, 100 v 100, it doesn't matter.

It actually matters quite a lot, if you are attempting to attribute the rating earned by a TEAM to the individuals ON that team, because that isn't how Elo's rating system was designed to work.

While it could potentially work for a team of any size, it views that team as a single individual from a ratings perspective. It is not able to accurately attribute a rating to individual players based upon the outcome of a team game, especially games where each individual player does not constitute a large portion of the team.

That is, Elo can work for team based games, if the teams are FIXED, and each TEAM gets a rating. Then, the outcomes of each match will directly reflect the competency of the team. However, in cases where the teams are randomly selected and fluctuate every time, the result of one game does not have a clear translation into a measurement of skill for any individual player (at least, not if you only view the ultimate W/L outcome of the match).
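To make the fixed-vs-fluctuating-team point concrete, here is a minimal Python sketch of one common way to bolt Elo onto team games: score each team by the mean of its members' ratings and give every member the same rating change. This is an assumption for illustration only; the thread does not establish that PGI's matchmaker works this way, and the K-factor of 32 and the 400-point logistic scale are conventional chess-style defaults, not MW:O values.

```python
from statistics import mean
from typing import List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic Elo expectation that side A beats side B (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_team_elo(
    team_a: List[float], team_b: List[float], a_won: bool, k: float = 32.0
) -> Tuple[List[float], List[float]]:
    """Hypothetical team Elo: each team is scored as the mean of its members.

    Every member of a team receives the same rating change, derived only from
    the team-vs-team result. Returns new rating lists for both teams.
    """
    exp_a = expected_score(mean(team_a), mean(team_b))
    delta = k * ((1.0 if a_won else 0.0) - exp_a)
    return [r + delta for r in team_a], [r - delta for r in team_b]


if __name__ == "__main__":
    winners, losers = update_team_elo([1500.0] * 12, [1500.0] * 12, a_won=True)
    print(winners[0], losers[0])
```

Two even 12-player teams at 1500 move to 1516 and 1484 respectively, and the player who carried the match gets exactly the same +16 as the player who contributed nothing. Only the aggregate over many matches can separate them, which is the crux of the disagreement here.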

Quote

The only difference the number of people involved in each match makes is how many total matches you need to play to get a comparably accurate reading. To put it more simply, in a 12 v 12 environment you need to play 11 times as many matches to get the same results as 1 v 1.

No, that's not true at all. That is not statistically correct by any stretch of the imagination.
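One way to put numbers on this disagreement is a toy model, offered here as an assumption rather than as how MW:O actually works: treat a team's strength as the plain mean of its members' Elo ratings, so a single player who is `edge` points better than everyone else in the lobby shifts his team's mean by only `edge / team_size`. Since the number of games needed to detect a win-rate shift grows with the inverse square of that shift, the Python sketch below compares the two cases. All function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic Elo expectation that side A beats side B (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def games_to_detect(p: float, z: float = 1.96) -> float:
    """Rough games needed before win rate p stands out from 50% by z standard errors.

    Solves z * sqrt(0.25 / n) = p - 0.5 for n (normal approximation, p > 0.5).
    """
    return (z * 0.5 / (p - 0.5)) ** 2


def dilution_ratio(edge: float, team_size: int) -> float:
    """How many times more games an edge takes to detect in team_size v team_size.

    Assumes team strength is the plain mean of member ratings, so one player
    who is `edge` points better than everyone else lifts his team's mean by
    edge / team_size.
    """
    if edge <= 0:
        raise ValueError("edge must be positive")
    p_solo = expected_score(edge, 0.0)
    p_team = expected_score(edge / team_size, 0.0)
    return games_to_detect(p_team) / games_to_detect(p_solo)


if __name__ == "__main__":
    print(dilution_ratio(200.0, 12))
```

Under this particular model, `dilution_ratio(200, 12)` comes out well above 11, because the per-game edge shrinks roughly twelvefold and the required sample grows with its square (approaching 144 for small edges). A different assumption about how individual players contribute to team results would give a different figure.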

Quote

The other 11 people on your team are irrelevant. Completely and totally and absolutely 100% irrelevant in the aggregate measure of win/loss as it relates to you.

Again, this is not at all true, especially in a complex environment such as MechWarrior, where not only skill but complex mech configuration plays into the outcome. Because different mechs do not have equal combat capabilities, each player does not have a clear "one twelfth" contribution factor to the outcome of the game.

This combines with a number of other complex factors which are all effectively ignored by Elo, because the Elo rating system was designed to rank chess players, where each player had sole control of the outcome of his games, and each game started from exactly the same starting state.

MischiefSC, on 22 January 2014 - 05:17 PM, said:

Elo is accurate and about as correct as possible. Like with any sport, it obviously can't predict your mood in the moment or your likelihood of winning a specific game, but can it predict, taken as an average, how likely having you there is to promote a win over a loss against an adversary of a given skill? Yes, it works like all other probability sampling and statistical analysis works.

No, this is clearly not true.
Elo has well established and understood weaknesses when applied to a gaming environment such as MWO PUG matches.

This is why other, BETTER rating systems have been developed.

Again, I point you to Microsoft's TrueSkill rating system, which uses a Bayesian approach to matchmaking in the type of environment we have here. And, again, it was developed in order to address the failures of an Elo type ranking system in complex group play where the teams consist of constantly fluctuating members.

If you are interested, you can read this research paper from MIT Press about TrueSkill, and how it was developed to address specific problems with applying Elo's rating system to multiplayer games.

#46 Bhael Fire

Banned - Cheating

Ace Of Spades
4,002 posts

Location: The Outback wastes of planet Outreach.

Posted 22 January 2014 - 06:25 PM

Sandpit, on 22 January 2014 - 06:01 PM, said:

Posts from the devs?

Yes. There's one from Paul (I think) back when they introduced Elo to the MM that explains how it works...And there's other posts from the devs explaining it in greater detail.

It's only based on a Win/Loss formula.

They erroneously theorized that good players will win more (which would seem logical), but failed to take into account game-changing factors like groups with VOIP, tonnage/loadouts of mechs on each team, and other factors that contribute to the effectiveness of any given player.

Ideally, the player's Win/Loss ratio would be factored in with what mech the player is using and its loadout (i.e. Battle Value) and whether the player is grouped or playing solo.

#47 Sandpit

Member

Veteran Founder
17,419 posts

Location: Arkansas

Posted 22 January 2014 - 06:25 PM

MischiefSC, on 22 January 2014 - 06:21 PM, said:

I read that, but I don't see where it states that win/loss is what is used to calculate it, or that it is the sole parameter used to calculate it.

#48 MischiefSC

Member

The Benefactor
16,697 posts

Posted 22 January 2014 - 06:51 PM

Roland, on 22 January 2014 - 06:22 PM, said:

Yes it was. It was made as a chess ranking system.

It actually matters quite a lot, if you are attempting to attribute the rating earned by a TEAM to the individuals ON that team, because that isn't how Elo's rating system was designed to work.

While it could potentially work for a team of any size, it views that team as a single individual from a ratings perspective. It is not able to accurately attribute a rating to individual players based upon the outcome of a team game, especially games where each individual player does not constitute a large portion of the team.

That is, Elo can work for team based games, if the teams are FIXED, and each TEAM gets a rating. Then, the outcomes of each match will directly reflect the competency of the team. However, in cases where the teams are randomly selected and fluctuate every time, the result of one game does not have a clear translation into a measurement of skill for any individual player (at least, not if you only view the ultimate W/L outcome of the match).

No, that's not true at all. That is not statistically correct by any stretch of the imagination.

Again, this is not at all true, especially in a complex environment such as MechWarrior, where not only skill but complex mech configuration plays into the outcome. Because different mechs do not have equal combat capabilities, each player does not have a clear "one twelfth" contribution factor to the outcome of the game.

This combines with a number of other complex factors which are all effectively ignored by Elo, because the Elo rating system was designed to rank chess players, where each player had sole control of the outcome of his games, and each game started from exactly the same starting state.

No, this is clearly not true.
Elo has well established and understood weaknesses when applied to a gaming environment such as MWO PUG matches.

This is why other, BETTER rating systems have been developed.

Again, I point you to Microsoft's TrueSkill rating system, which uses a Bayesian approach to matchmaking in the type of environment we have here. And, again, it was developed in order to address the failures of an Elo type ranking system in complex group play where the teams consist of constantly fluctuating members.

I'm very, very familiar with TrueSkill - it was not created to address flaws in Elo; it was created to deal with carrying a ranking across multiple games. TrueSkill is based on the idea of creating a ranking system and a way to predict how someone who, for example, tore it up in CoD would play in Battlefield 3, and to do so with professional accuracy for professional gaming. It's also very granular and predicts not just player skill (regardless of team composition) but the degree of uncertainty about the player's skill.

It would also require a massive playerbase to distribute accurately, and the biggest tweak it would make to the Elo system we use currently is using a Gaussian distribution instead of a logistic one - which I've already recommended. It would also split pug and premade Elo - which I've already recommended.
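For readers wondering what swapping the curve actually changes, here is a minimal Python sketch comparing the base-10 logistic expectation (the form most Elo implementations use, with a 400-point scale) against a normal-CDF expectation. The normal version assumes, as Elo's original formulation is commonly described, that each player's performance varies with a standard deviation of 200 points, so the difference between two players has a standard deviation of 200·√2. Both scale choices are assumptions, not MW:O parameters.

```python
import math

SCALE = 400.0  # assumed logistic scale: a 400-point gap means 10:1 odds


def logistic_expectation(diff: float) -> float:
    """Win probability for a rating gap `diff` using the base-10 logistic curve."""
    return 1.0 / (1.0 + 10 ** (-diff / SCALE))


def gaussian_expectation(diff: float, sigma: float = 200.0 * math.sqrt(2.0)) -> float:
    """Win probability for a rating gap using a normal CDF.

    sigma is the assumed spread of the performance difference between two
    players; 200 * sqrt(2) corresponds to a per-player sd of 200.
    """
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    for diff in (0, 100, 200, 400, 800):
        print(f"{diff:>4}: logistic {logistic_expectation(diff):.3f}  "
              f"normal {gaussian_expectation(diff):.3f}")
```

For small and moderate gaps the two curves nearly agree; at large gaps the normal curve gives the favorite a higher win probability than the logistic one, because its tails are thinner.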

What a Bayesian inference (which is what we're talking about here) would change about this game is... exactly what I've recommended. Everything beyond that (measuring how you perform based on the relative ranks of the people on your team, etc.) is just massive overkill for something like MW:O. Also, you don't need to be able to predict how a player will perform in an 8-person free-for-all or a 3 v 3, just 12 v 12. Given the low population in MW:O (relative to Xbox Live), you don't need to rank players nearly so precisely; they're going to be in matches with a wide skill distribution no matter what you do.

It's also still just basing it all on win/loss.

Love you to bits Roland, but you're wrong. Elo is the right sort of system; it just needs to be tweaked a bit (changes stated above) to make it seat more accurately. TrueSkill, Glicko, Elo: they all measure win/loss odds between players given relative skill ranges. Anything more complex than Elo (with, as I said, a Gaussian distribution curve, split pug/premade Elo, and matching first for skill range before high/low to target) is simply going to give you additional telemetry for predicting performance that isn't relevant. How well you perform on a team with people in a given skill range vs another skill range against teams of X, Y, Z provisioning doesn't matter. There are only 24 people available for the match right now in the given Elo range and approximate tonnage match-up, so tag, you're it!

Every other rating system is just Elo with additional factors to predict more granular and provisional performance across a wider set of variables. In MW:O the matchmaker just needs to find 24 people within a given skill range piloting mechs of a set tonnage range within 3 minutes.

Please provide me with an example of how TrueSkill uses something other than your win/loss results to predict your performance. Let me help you - that's not possible because TrueSkill (and systems like it) use your win/loss as the sole indicator, just like Elo does. They just let you predict performance with more precision (which we don't need) or across a wider spectrum of games (which we don't need).

What I said stands. A more complex system *might* be able to give you your ranking with fewer games to convergence, seat you more precisely, or better predict how you will perform with a specific set of players before you've even played with them, but we don't have anything like the population density to make that viable or even worthwhile. What it won't do is care about the damage you do, the kills or assists you get, or any factor other than how often you win or lose within a ranking framework.

Sandpit, on 22 January 2014 - 06:25 PM, said:

I read that, but I don't see where it states that win/loss is what is used to calculate it, or that it is the sole parameter used to calculate it.

It is the sole parameter. Here's a link to Elo and how it works. Paul's post implies that whoever reads it already understands what the Elo rating system is and how it works, so I don't think he repeats the details anywhere.

#49 MischiefSC

Member

The Benefactor
16,697 posts

Posted 22 January 2014 - 06:57 PM

Bhael Fire, on 22 January 2014 - 06:25 PM, said:

No. That's not an erroneous theory. Please show me via a mathematical model, statistical analysis or probability equation how your performance does not affect the statistical odds of your team winning, measured over hundreds of samples.

BV is absolutely untrustworthy and unreliable in accurately predicting someone's value in winning or losing. While having an Elo score tied to every build you pilot would be even more accurate, it would be almost impossible to seat everyone accurately, as that takes hundreds of matches. Also, that sort of precision is irrelevant given that there will be hundreds of points of variation between players' scores on each team, and in the end it would contribute to less accuracy, not more - at least until everyone had played 300 matches in every possible build.

I get that you don't want to feel like your skill or performance is relevant, but it is. It can be measured and quantified. I get that you want to believe that somehow matchmaking is 'harder' for you than for everyone else, but it's not. Everyone else is just as likely to be paired with, as against, the 4-man with VOIP or the window-licking mouth-breather playing with a steering wheel. What's different, measurable and relevant is your performance and how that impacts your team's performance, measured over hundreds of games.

#50 Sandpit

Member

Veteran Founder
17,419 posts

Location: Arkansas

Posted 22 January 2014 - 07:02 PM

MischiefSC, on 22 January 2014 - 06:51 PM, said:

Quote

Once we get a full understanding of how accurately the Match Maker is working, we are going to add some additional parameters to the mix. These include a more defined player skill rating

My point is that we don't know that Elo is the only parameter, or how Elo is being implemented here. And the recent post shows the fallacy of "I got stomped because I'm playing high Elo players (who are high Elo apparently because they said they were) when I shouldn't be", and just how wrong many perceptions surrounding the MM are.

What I'm getting at is that it's getting a bit tiring to see everyone running around blaming MM, premades, balance, teammates, etc. because they lost.

I had it happen with a teammate last night. He spent almost the entire match complaining about how out-tonned we were. I kept explaining to him that we weren't, and that we weren't getting beaten because of a tonnage discrepancy. He, in his "elite uber skilled" mentality, couldn't accept that it was poor strategy and gameplay that lost us the match.

Even after we counted the tonnage (which was a mere 30 tons in difference) he still couldn't accept that maybe we just got outplayed.

#51 Bhael Fire

Banned - Cheating

Ace Of Spades
4,002 posts

Location: The Outback wastes of planet Outreach.

Posted 22 January 2014 - 07:07 PM

MischiefSC, on 22 January 2014 - 06:57 PM, said:

I get that you don't want to feel like your skill or performance is relevant, but it is. It can be measured and quantified. I get that you want to believe that somehow matchmaking is 'harder' for you than for everyone else, but it's not. Everyone else is just as likely to be paired with, as against, the 4-man with VOIP or the window-licking mouth-breather playing with a steering wheel. What's different, measurable and relevant is your performance and how that impacts your team's performance, measured over hundreds of games.

You know nothing, Jon Snow.

#52 Roland

Member

8,260 posts

Posted 22 January 2014 - 07:17 PM

Quote

I'm very, very familiar with TrueSkill - it was not created to address flaws in Elo it was created to deal with carrying a ranking across multiple games.

Perhaps you should refamiliarize yourself with TrueSkill, since it was indeed created to address the problems with applying Elo to multiplayer games... which are extremely well defined and understood. And honestly, since you are a statistician, the multitude of problems with applying Elo's rating system to a multiplayer game really should be obvious to you.

The chief problem with trying to apply Elo to a game like this is the inability to infer individual player skill from team results.

Again, if you are familiar with TrueSkill, then you know this... and if you aren't, you are free to read that paper I just posted, which clearly states this challenge and how they designed TrueSkill to deal with that issue.

#53 MischiefSC

Member

The Benefactor
16,697 posts

Posted 22 January 2014 - 07:22 PM

Sandpit, on 22 January 2014 - 07:02 PM, said:

I agree completely and get exactly what you're saying. It's called self-serving bias: it's how people take credit when things go well and refuse accountability when things go wrong. Thanks to the growth of postmodernism (essentially the belief that because you saw a National Geographic special on something you know as much about it as someone with an 8 year degree and 30 years of experience), you run into self-serving bias a lot in games and business. When they win, well, that's because they're amazing. When they lose, well, that's because someone else messed up.

Bhael Fire, on 22 January 2014 - 07:07 PM, said:

You know nothing, Jon Snow.

I know statistical analysis. It pays my bills and keeps me in hookers and blow.

Look at it like this -

Every 180 seconds (3 minutes) the matchmaker does its best to grab the closest ranked group of 24 people that hit 'launch' in that 180 second period. That's not a lot to choose from. It has to try and account for their skill in their given weight-class (light/medium/heavy/assault) and the tonnage of what they're piloting.

It could be 100% perfect, accounting in every detail for your skill with that specific loadout, whether you've eaten recently or your blood sugar is low, everything - and you'd still get one-sided stomps, because out of the people available in those 180 seconds, that was the closest match possible.
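The "closest 24 out of whoever launched" idea can be sketched in a few lines of Python. This is a hypothetical illustration of the constraint described here, not PGI's algorithm: it ignores tonnage, groups and weight classes, and simply finds the 24 ratings in the window's pool that span the smallest range. The function name and `MATCH_SIZE` constant are made up for this sketch.

```python
from typing import List, Tuple

MATCH_SIZE = 24  # 12 v 12


def tightest_group(ratings: List[float], size: int = MATCH_SIZE) -> Tuple[float, List[float]]:
    """Pick the `size` players whose ratings span the smallest range.

    Hypothetical sketch of one matchmaking step: the pool is whoever hit
    'launch' during the current window. Sorting and sliding a window of
    `size` over the sorted list finds the tightest contiguous group.
    Returns (spread, chosen_ratings).
    """
    if len(ratings) < size:
        raise ValueError("not enough players in the window to fill a match")
    ordered = sorted(ratings)
    best_start = min(
        range(len(ordered) - size + 1),
        key=lambda i: ordered[i + size - 1] - ordered[i],
    )
    chosen = ordered[best_start:best_start + size]
    return chosen[-1] - chosen[0], chosen


if __name__ == "__main__":
    pool = [1000 + 10 * i for i in range(24)] + [3000, 3500]
    spread, chosen = tightest_group(pool)
    print(spread, len(chosen))
```

With a small pool, even this best-possible pick can span hundreds of points; a larger pool simply gives it tighter groups to choose from.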

The matchmaker needs 5 things to be better:

1. You need to split pug and premade Elo scores. I absolutely agree that skews results and it skews them enough to be relevant.

2. It needs to match players first within a range, not high/low for a target. It'd be a better game if everyone on both teams was within 200 Elo points of each other even if one team had a 100 point advantage than if one team had three people with a high score and 3-5 people with lower scores they're having to carry. This can account for a lot of problems - a person with a high Elo has a bad match or gets popped early and the whole team suffers exponentially for it. Match to range, not to target.

3. A Gaussian distribution curve instead of a logistic one. This is sort of the bell curve that the rankings are 'valued' at. I'll spare you the math, but for simplicity's sake: it thickens up the rankings in the middle (the bulk of players) while making the high/low more extreme. This makes it easier to do #2 above.

4. More matches. The more matches you play the better your convergence, how accurately you're ranked.

5. More players. More players = more people in every 180 seconds to pull from to make matches around.
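Point 2's distinction between matching to a target average and matching to a range can be shown with two hypothetical lobbies. The 200-point range and the 100-point average gap below are the poster's illustrative numbers; the helper names are made up for this Python sketch.

```python
from statistics import mean
from typing import List


def average_gap(team_a: List[float], team_b: List[float]) -> float:
    """'Match to target' view: how far apart the two team averages are."""
    return abs(mean(team_a) - mean(team_b))


def within_range(team_a: List[float], team_b: List[float], max_spread: float = 200.0) -> bool:
    """'Match to range' view: every player in the lobby sits within max_spread."""
    everyone = team_a + team_b
    return max(everyone) - min(everyone) <= max_spread


if __name__ == "__main__":
    # Lobby 1: three high-rated players carrying nine low-rated ones.
    carried = [1800.0] * 3 + [1200.0] * 9
    even = [1350.0] * 12
    print(average_gap(carried, even), within_range(carried, even))

    # Lobby 2: a 100-point advantage, but everyone close together.
    stronger = [1400.0] * 12
    weaker = [1300.0] * 12
    print(average_gap(stronger, weaker), within_range(stronger, weaker))
```

The first lobby looks perfect by averages (a gap of zero) yet fails the range check because three 1800s are carrying nine 1200s; the second has a 100-point average gap but keeps everyone within 100 points of each other.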

#54 Deathlike

Member

Littlest Helper
29,240 posts

Location: #NOToTaterBalance #BadBalanceOverlordIsBad

Posted 22 January 2014 - 07:24 PM

Roland, I read the doc you posted, and although my pathetic college memories of stats should disappoint people, what MischiefSC is saying is correct.

The TrueSkill thing is essentially saying "we've figured out how to correctly give you Elo/ranking" after fewer games. Congrats.

The problem with it is how it is relative to mechs (because the same pilot performs differently across all mechs, and is also loadout dependent, which is somewhat quantifiable) and the current "Elo sharing" nature between the weight classes (which is fixable).

Addressing the latter issue... you probably could be epic with an HGN-733C or an Atlas D-DC, but once you switch to an Awesome (PB/9M, it doesn't really matter), the game will quickly "auto-correct" you from your top-tier Elo to mediocre tier (well, unless you are the most elite Awesome pilot, you are very unlikely to stay at the levels of the optimal mechs).

Abuse of the mechanic with the "TrueSkill" solution could technically be possible, but the doc doesn't provide "safeguards" against "intentionally tanking" (which could also be mistaken for taking a bad mech).

Following Homeless Bill's basic idea of per-chassis and per-variant Elo is an optimal concept/solution.

The former issue is that other FPSes don't quite offer the same kind of customizability that this game provides... they tend to have the same health bars and weapons you can pick up from anywhere... and although things can be similar (weapon mechanics, weapon tracking, accuracy tracking, etc.), there are quite a few dynamics that can be tweaked that give technically infinite (but actually limited in a sense) so...

Damn numbers.

#55 Roland

Member

8,260 posts

Posted 22 January 2014 - 07:30 PM

Deathlike, on 22 January 2014 - 07:24 PM, said:

No, that is actually the Glicko rating system.
The Trueskill system goes beyond that, to use a Bayesian approach to help derive individual skill from team results.

Honestly, if you read the paper, didn't you read the part where they specifically cite this as a challenge that Elo is poorly suited for, and that TrueSkill is meant to address?
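For a sense of what "a Bayesian approach to help derive individual skill from team results" looks like, here is a heavily simplified Python sketch of the two-team TrueSkill update: each player carries a mean skill `mu` and an uncertainty `sigma`, and after a decisive match every player's `mu` moves in proportion to his own variance while his `sigma` shrinks. It omits draws, the dynamics term that keeps `sigma` from collapsing, and partial play. The starting values (mu = 25, sigma = 25/3, beta = sigma/2) are the commonly cited defaults, used here as assumptions rather than anything MW:O or Microsoft's production system is confirmed to use.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

BETA = 25.0 / 6.0  # assumed per-player performance noise (default: initial sigma / 2)


@dataclass
class Skill:
    mu: float = 25.0           # estimated skill
    sigma: float = 25.0 / 3.0  # uncertainty about that estimate


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_two_teams(winners: List[Skill], losers: List[Skill]) -> Tuple[List[Skill], List[Skill]]:
    """Simplified TrueSkill-style update for a decisive two-team match (no draws, no tau)."""
    players = winners + losers
    # Total uncertainty in the team performance difference.
    c = math.sqrt(sum(p.sigma ** 2 + BETA ** 2 for p in players))
    t = (sum(p.mu for p in winners) - sum(p.mu for p in losers)) / c
    v = _pdf(t) / _cdf(t)  # mean-shift factor: larger when the result was an upset
    w = v * (v + t)        # variance-shrink factor, between 0 and 1

    def adjust(p: Skill, sign: float) -> Skill:
        var = p.sigma ** 2
        mu = p.mu + sign * (var / c) * v
        sigma = math.sqrt(var * max(1.0 - (var / c ** 2) * w, 1e-12))
        return Skill(mu, sigma)

    return [adjust(p, +1.0) for p in winners], [adjust(p, -1.0) for p in losers]


if __name__ == "__main__":
    newcomer = Skill(sigma=10.0)
    won, lost = update_two_teams([newcomer] + [Skill() for _ in range(11)],
                                 [Skill() for _ in range(12)])
    print(won[0], won[1], lost[0])
```

As posters in this thread point out, the only observed input is still which team won. What differs from team-average Elo is the weighting: a player with a large `sigma` (for example a newcomer) moves much further on the same result than an established one.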

#56 MischiefSC

Member

The Benefactor
16,697 posts

Posted 22 January 2014 - 07:36 PM

Roland, on 22 January 2014 - 07:17 PM, said:

TrueSkill isn't simply about multiplayer games but about predicting a player's performance across different team sizes, team compositions and game types. The only problem Elo has in inferring individual player skill from team results is that it takes more games to seat accurately, and the more people on each team, the larger the sample size needed for convergence. It also will always lack fine precision, which you need for ranking high-end competitive matches. An Elo system alone absolutely would NOT work if we were trying to, say, establish who the 24 best players in MW:O were.

We're not though. We're just trying to fill matches with 24 people.

When MW:O gets competitive it'll need a more comprehensive system with more granular use of telemetry - but it will still use win/loss as the basis of it. Ideally, if they're logging it, it'll have the data to help predict performance for a player based on chassis, loadout and type of team composition. It won't have to deal with variables like varying team sizes, however, or how you'll perform in an FFA vs how you perform in team-vs-team games.

You're mistaking the inability to get accurate relativity between players (taking, for example, 100 players and ranking them accurately among themselves) with approximate value (fitting players into a 'class 1 to class 5' system) to get reasonable matchmaking in a population pool the size of MW:O.

It does what it needs to do and with a minimum of fuss. Additional parameters are exponentially more difficult to do accurately and any good statistical modeling system is no more complex than it needs to be.

Also they do, in the end, only calculate off win/loss.

That's the thing that's being missed here. Suppose PGI just paid Microsoft to modify TrueSkill for use with MW:O and implemented it: compared to just doing what I've already recommended (Gaussian, split pug/premade Elo, match range not target), you wouldn't get a huge difference in performance. You'd get a tiny difference, mostly seen at the highest ends of the scale.

TrueSkill measures off of win/loss. It just includes metrics like how likely you are to win piloting a Jenner with 4MLs, 300XL and 2SRM4s when your team is comprised of X mechs with players of Y skill against a team of composition A, B and C. It's not going to add more players to the available folks who hit 'launch' within 180 seconds of when you did. It's still going to have to pick from the same group of people available.

#57 Bhael Fire

Banned - Cheating

Ace Of Spades
4,002 posts

Location: The Outback wastes of planet Outreach.

Posted 22 January 2014 - 07:40 PM

MischiefSC, on 22 January 2014 - 06:57 PM, said:

BV is absolutely untrustworthy and unreliable in accurately predicting someone's value in winning or losing. While having an Elo score tied to every build you pilot would be even more accurate, it would be almost impossible to seat everyone accurately, as that takes hundreds of matches. Also, that sort of precision is irrelevant given that there will be hundreds of points of variation between players' scores on each team, and in the end it would contribute to less accuracy, not more - at least until everyone had played 300 matches in every possible build.

Just as Win/Loss alone is not effective in getting an accurate read on a player's skill in MWO, neither would BV alone. It's only when you factor these things together that you start to get a more accurate picture of a player's/team's ability to win.

You have to admit that the toys you bring to the party can have a substantial influence on the outcome of that party. These things simply cannot be ignored on a whim.

If you refer to Paul's recent CC post, he concedes that in all of the 12-0 stomps he reviewed, the common denominator between each match was that one side had more groups with VOIP and more tonnage...even though essentially everyone in the match had similar Elo scores (with a disparity less than 200).

They need to start taking tonnage/loadouts and groups into consideration when determining a player's/team's ability to perform in battle. Win/Loss alone just isn't working for a lot of players.

#58 Deathlike

Member

Littlest Helper
29,240 posts

Location: #NOToTaterBalance #BadBalanceOverlordIsBad

Posted 22 January 2014 - 07:43 PM

Roland, on 22 January 2014 - 07:30 PM, said:

They're saying, yes, "Elo" is poorly suited for multiple players trying to obtain a win-loss result. I don't disagree with that base assertion.

However, Elo itself isn't the actual problem (it is and it isn't at the same time)... it's how the MM goes about picking teams. It implodes when "nothing fits its criteria nicely". This is dreadfully obvious.

What they should be doing is matching within Elo brackets to a degree, instead of randomly deciding to "average" to an arbitrary number for the sake of "precious MM time". It's not doing a great job of that in the first place. The variance is simply too damn high for it to be acceptable, and that's ignoring the tonnage issue (which the tonnage limits would "technically address" [due to incompetence as we know it now], but which won't solve the problem of bad mechs and terrible skill matchups).

#59 Sug

Member

The People's Hero
4,630 posts

Location: Chicago

Posted 22 January 2014 - 07:45 PM

Bhael Fire, on 22 January 2014 - 05:55 PM, said:

It's how it works in MWO; A player's Elo score is based on their wins and their losses only. Nothing else is factored in.

It's who you won or lost against that determines your Elo.
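Sug's point is the core of the Elo update: the same win is worth more against a stronger opponent. Here is a minimal Python sketch using the standard logistic expectation on a 400-point scale and an assumed K-factor of 32 (a common chess default; MW:O's actual constants aren't given in this thread).

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectation (400-point scale) that `rating` beats `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def rating_change(rating: float, opponent: float, won: bool, k: float = 32.0) -> float:
    """Points gained (positive) or lost (negative) from one decisive result.

    k is an assumed K-factor; a larger k makes ratings move faster.
    """
    return k * ((1.0 if won else 0.0) - expected_score(rating, opponent))


if __name__ == "__main__":
    for opponent in (1100.0, 1500.0, 1900.0):
        print(opponent, round(rating_change(1500.0, opponent, won=True), 1))
```

Beating an equal opponent is worth 16 points here; beating someone 400 points higher is worth about 29, and beating someone 400 points lower only about 3.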

#60 MischiefSC

Member

The Benefactor
16,697 posts

Posted 22 January 2014 - 07:56 PM

Bhael Fire, on 22 January 2014 - 07:40 PM, said:

I never said skill doesn't affect a player's Elo score. I said using Win/Loss by itself as a means of measuring skill/effectiveness will not work in MWO without factoring in other considerations. As I pointed out, it's best suited for 1 v 1 and for matches with permanent teams, not teams that are dynamically generated on the fly. That's not to say that a skilled player won't have a higher Elo rating; it just means there are MANY more factors to consider to get an accurate representation of player skill.

Just as Win/Loss alone is not effective in getting an accurate read on a player's skill in MWO, neither would BV alone. It's only when you factor these things together that you start to get a more accurate picture of a player's/team's ability to win.

You have to admit that the toys you bring to the party can have a substantial influence on the outcome of that party. These things simply cannot be ignored on a whim.

If you refer to Paul's recent CC post, he concedes that in all of the 12-0 stomps he reviewed, the common denominator between each match was that one side had more groups with VOIP and more tonnage...even though essentially everyone in the match had similar Elo scores (with a disparity less than 200).

They need to start taking tonnage/loadouts and groups into consideration when determining a player's/team's ability to perform in battle. Win/Loss alone just isn't working for a lot of players.

It does, but again - remember, you are just as likely to drop with the high tonnage team with the premade on VOIP as you are against the team with that. It washes out in the aggregate. All that does is mean it takes more matches to accurately seat you.

What mech you play, again, comes out in the average. If you consistently play sub-par mechs you will (and should) have a lower Elo score. If you consistently play cutting edge competitive cheese it will be reflected in your Elo. If you play sometimes one and sometimes the other it will, in fact, be represented in your Elo score because you'll have better odds of helping your team win more often with piloting better mechs - especially if you do so more skillfully.

Even beyond that what win/loss helps reflect is - do you work to support your team even when pugging? Do you have good ideas and are you good at helping your team implement them? Are you good at scouting in a light, do you help your team target well? Conversely do you like to kill-steal, are you the sort who just stays quiet but is good at capitalizing on the performance of your team to boost your own damage and KDR? Do you have a high KDR because you retreat early and power down?

That's why win/loss is all that truly matters. In the end it's the result - the consequence of every choice you make and every facet of how you play. If you're great at sinking shots from half court but you're terrible on defense and don't play well with your team, you're not a valuable player - you're just good at half-court shots. If, conversely, you have great situational awareness, you know who to pass to, and you're great at reading the plays on the court and supporting your team, you may rarely sink a shot but be an incredibly valuable member of your team.

Win/loss is the only worthwhile, trustworthy and reliable factor for determining what really matters - how likely you truly are to help your team win a match. Now, refining that down to how you do in specific matches on specific maps with specific loadouts is great. I'm all for adding that in when the game has the population to make it statistically relevant. If we had 300k concurrent players, that would be great stuff to know, because there'd be more than 10,000 players hitting 'launch' every 180 seconds and the matchmaker would have the people available to really drill down and set up the best possible match.

At the moment though it's having trouble just fitting people high/low to a target within 150 points and passable weight matching.

Gaussian distribution, match to range, split pug/premade Elo. That's more than enough to give the best matches with the data currently available.
