Why Elo Doesn't Work Here


633 replies to this topic

#341 Abivard

    Member

  • PipPipPipPipPipPipPipPip
  • Shredder
  • 1,935 posts
  • Location: Free Rasalhague Republic

Posted 24 January 2014 - 01:21 PM

View PostRoadkill, on 24 January 2014 - 12:54 PM, said:

No. The math still gives the correct results. The results are just being used wrong.

Math is truth. Whether or not you can handle the truth; whether or not you can make proper use of the truth; those are different questions.

Elo works in this environment. This isn't the optimal environment, but it still works. It just takes longer to stabilize and has greater variability. But it still works.

If you use it wrong, though, or don't understand the conditions in which it has been used, then you can improperly interpret it and get bad results. Arguably, that is what's happening.


BEST POST IN THE THREAD.



Again, not one single person said Elo is a flawed formula. They simply say it cannot be applied to this game as is, nor does it seem PGI understands how to MM, yet they are combining these two things, trying to make them do what they are not meant to do.

2+2=4, but that is only true if 2 is 2. What if one of those 2's is really another number?

That is the whole point: the 2 does not have a real value of 2. Its true value is unknown, and may in fact be subject to a variable that goes anywhere from the smallest possible number to the largest possible number, so the answer is not really 4.

The true fact is it is x+2=E, but you insist on constantly using the 2+2=4 example.

#342 Abivard

    Member

  • PipPipPipPipPipPipPipPip
  • Shredder
  • 1,935 posts
  • Location: Free Rasalhague Republic

Posted 24 January 2014 - 01:26 PM

View PostRoadkill, on 24 January 2014 - 01:16 PM, said:

Yeah, actually, you can. You just need to properly vent the exhaust.

It's actually the perfect analogy. Running a diesel engine underwater isn't optimal, but it can be done and works just fine if you take the proper precautions.

Using Elo in MWO's current environment isn't optimal, but it works just fine provided you understand the constraints.


Pot, kettle, black.

Do you understand what a straw man argument is? You just made one.

You changed my statement, left off the part where I mention the snorkel, then proceeded to tell me that it can work with a snorkel, thereby avoiding the whole point that it doesn't work without it, nor can it work if the sub goes deep. You may not be aware that a sub just below the surface with something sticking above the water is not truly submerged. But that would invalidate your point, so let's just ignore that FACT!

#343 Roadkill

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,610 posts

Posted 24 January 2014 - 01:27 PM

View PostRichAC, on 24 January 2014 - 01:17 PM, said:

haha wow, did I touch a nerve or something with match score? But ya, that's what I thought you said. My highest match score is 149 I believe, with 8 kills and 1050 dmg, not sure how many assists.

To me match score is just a summary of all the other stats combined. Kills, damage, assists etc... Not probably, but most definitely they all contribute to your match score. People will say nice dmg, but sometimes not nice kills because maybe you didn't get any, or vice versa...assists etc... Saying nice score just isn't specific enough, but I for one am glad it's there and I will personally use that to judge who had the best performance in a match.

What most people can agree on is that people should not be ranked solely by one stat over the other, hence we have a match score. Unfortunately, no one in the playerbase is actually ranked by that.

They are only ranked/rated by how their random team does in general, regardless of their performance in that particular match, which is nonsense to anyone with any sports sense.

No, no nerve. I don't really care about match score at all because it isn't tracked. And as near as I can tell, no one else (except you) cares about it either.

I agree that match score could be a useful tool for the matchmaker. I disagree that it is an indicator of skill, but it doesn't need to be in order to be useful. We already have a useful indicator of skill - your Elo rating - so adding match score to that in the matchmaker's calculations could help a lot.

You keep referencing sports. I don't think pro sports players care about stats like you think they do. The only stat they care about is championships. I.e. wins.

Yeah, they compare salaries. Yeah, they compare triple-doubles or TD passes or hat tricks or whatever else. But when it comes right down to it, the ONLY stat they care about is wins. Super Bowls. NBA Championships. Majors victories. Stanley cups.

Wins.

#344 Roadkill

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,610 posts

Posted 24 January 2014 - 01:30 PM

View PostAbivard, on 24 January 2014 - 01:26 PM, said:

Do you understand what a straw man argument is? You just made one.

You changed my statement, left off the part where I mention the snorkel, then proceeded to tell me that it can work with a snorkel, thereby avoiding the whole point that it doesn't work without it, nor can it work if the sub goes deep. You may not be aware that a sub just below the surface with something sticking above the water is not truly submerged. But that would invalidate your point, so let's just ignore that FACT!

So what you're saying is that you never had a point at all?

You're admitting that you can use a diesel engine underwater?

And by the way, a snorkel isn't the only possible solution.

So what, exactly, is your point? You just admitted that your own analogy defeats your own argument. So are you admitting that Elo does, in fact, work in the current MWO environment as long as you understand the constraints? Which is what I've been saying all along?

#345 Grits N Gravy

    Member

  • PipPipPipPipPipPip
  • 287 posts

Posted 24 January 2014 - 01:32 PM

View PostRussianWolf, on 24 January 2014 - 12:05 PM, said:

If you try to use ELO to predict 2 chess opponents that have never played any of the same opponents, it gets a lot less accurate. variables = 100-1000 maybe.

Not true, as long as they have played the same number of games. The model really only breaks down if one of them has played only elite players or only poor players, or if the skill pool of a player's previous opponents is well below the standard deviation of skill. You can play online and see that, while not perfect, it's pretty decent.

View PostRussianWolf, on 24 January 2014 - 12:05 PM, said:

In MWO you are starting with 24 variables. Go to 100 matches and you are approaching 2300 variables (if you are a constant). The more variables you add, the less likely you are to get accurate resulting data. The less likely it is to be a predictor.

Statistics backs this up every time. The more variables you have, the less accurate the data.


We are not concerned with the variables which determine the outcome, only the outcome. The only single stat that correlates well to a predicted outcome is Elo. It's not perfect but it's the best out there.

View PostRussianWolf, on 24 January 2014 - 12:05 PM, said:

Secondly, no one has addressed the hypothetical that I gave.

ELO can rate an individual in an individual effort, or a team in a team effort. But I see no way for it to rate an individual in a team effort with any accuracy.

Well, you take the team average and let the central limit theorem do the rest over time. The TrueSkill white paper showed Elo was accurate in predicting the winner in closely matched games (Elo scores within 20% of each other) with large teams in Halo 60% of the time.
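The team-average idea can be sketched roughly as below. This is only an illustration under stated assumptions: it uses the standard Elo expected-score formula with a 400-point scale, a K factor of 32, and gives every player on a team the same adjustment. The function names are made up for the example, and nothing here is confirmed to be how PGI's matchmaker actually works.

```python
from statistics import mean


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for side A against side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_team(team, opponents, team_won, k=32.0):
    """Return new ratings for `team` after one match.

    Assumption: the team is treated as one player rated at the team
    average, and each member receives the same rating change.
    """
    expected = expected_score(mean(team), mean(opponents))
    delta = k * ((1.0 if team_won else 0.0) - expected)
    return [rating + delta for rating in team]


# Two teams with the same average (1500): each winner gains K/2 = 16 points.
team_a = [1400, 1500, 1600]
team_b = [1450, 1500, 1550]
new_a = update_team(team_a, team_b, team_won=True)
```

Because every member gets the same change, a single match says little about any one player; the individual rating only settles as teammates and opponents vary over many matches.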

Edited by Grits N Gravy, 24 January 2014 - 01:38 PM.


#346 Roadkill

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,610 posts

Posted 24 January 2014 - 01:36 PM

View PostAbivard, on 24 January 2014 - 01:21 PM, said:

Again, not one single person said Elo is a flawed formula. They simply say it cannot be applied to this game as is, nor does it seem PGI understands how to MM, yet they are combining these two things, trying to make them do what they are not meant to do.

The problem is that it can be applied to this game as it is. It does work in this scenario.

Does it work as well as it does for Chess? No. Elo is ideally suited for chess. It is not ideally suited for MWO (and I have not claimed that it is). But it does work, and provided you understand the constraints and use the ratings properly, it can work well.

Where we appear to agree is that PGI doesn't seem to be using it properly. The matchmaker needs to take more data into consideration in order to make better balanced matches.

#347 RussianWolf

    Member

  • PipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • 2,097 posts
  • Location: WV

Posted 24 January 2014 - 01:40 PM

View PostRoadkill, on 24 January 2014 - 01:30 PM, said:


You're admitting that you can use a diesel engine underwater?

And by the way, a snorkel isn't the only possible solution.


really, I'm all ears. We only ran them for better than 50 years with no better solution until we switched to all Nucs.

#348 RussianWolf

    Member

  • PipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • 2,097 posts
  • Location: WV

Posted 24 January 2014 - 01:45 PM

View PostRoadkill, on 24 January 2014 - 01:36 PM, said:

The problem is that it can be applied to this game as it is. It does work in this scenario.

Does it work as well as it does for Chess? No. Elo is ideally suited for chess. It is not ideally suited for MWO (and I have not claimed that it is). But it does work, and provided you understand the constraints and use the ratings properly, it can work well.

Where we appear to agree is that PGI doesn't seem to be using it properly. The matchmaker needs to take more data into consideration in order to make better balanced matches.

Say it with me now

ELO is not accurate enough to work in MWO by itself.

You said its too variable here.
That was what I said in the beginning.

As an aside, do a Google search on widely accepted mathematical proofs that were later disproven. Who is to say that ELO won't be one of those at some point?

#349 Roadkill

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,610 posts

Posted 24 January 2014 - 01:46 PM

View PostRussianWolf, on 24 January 2014 - 01:40 PM, said:

really, I'm all ears. We only ran them for better than 50 years with no better solution until we switched to all Nucs.

Ever heard of a rebreather?

#350 RussianWolf

    Member

  • PipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • 2,097 posts
  • Location: WV

Posted 24 January 2014 - 01:47 PM

View PostRoadkill, on 24 January 2014 - 01:46 PM, said:

Ever heard of a rebreather?

do you know what it does?

#351 Roadkill

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,610 posts

Posted 24 January 2014 - 01:48 PM

View PostRussianWolf, on 24 January 2014 - 01:45 PM, said:

You said its too variable here.
That was what I said in the beginning.

No, I didn't. I think it works fine here.

I think the matchmaker (not Elo, the matchmaker) could work better if we fed it more data in addition to Elo ratings, but that's not at all the same thing.

Elo is working just fine. The matchmaker needs help. Different things.

#352 RussianWolf

    Member

  • PipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • 2,097 posts
  • Location: WV

Posted 24 January 2014 - 01:48 PM

do you know what it does?


Hint hint, CO2 isn't the problem.

Edited by RussianWolf, 24 January 2014 - 01:49 PM.


#353 Grits N Gravy

    Member

  • PipPipPipPipPipPip
  • 287 posts

Posted 24 January 2014 - 01:49 PM

View PostRussianWolf, on 24 January 2014 - 01:40 PM, said:

really, I'm all ears. We only ran them for better than 50 years with no better solution until we switched to all Nucs.

Burn it in an environment of hydrogen peroxide, like Dr. Walter did.
http://en.wikipedia....dent_propulsion

#354 Roadkill

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,610 posts

Posted 24 January 2014 - 01:49 PM

View PostRussianWolf, on 24 January 2014 - 01:47 PM, said:

do you know what it does?

Yep.

#355 RussianWolf

    Member

  • PipPipPipPipPipPipPipPipPip
  • Legendary Founder
  • 2,097 posts
  • Location: WV

Posted 24 January 2014 - 01:56 PM

View PostGrits N Gravy, on 24 January 2014 - 01:49 PM, said:

Burn it in an environment of hydrogen peroxide, like Dr. Walter did.
http://en.wikipedia....dent_propulsion

That's providing O2, not getting rid of the CO.

Your link is to AIP systems which is basically the battery system on them.

Edited by RussianWolf, 24 January 2014 - 01:57 PM.


#356 Grits N Gravy

    Member

  • PipPipPipPipPipPip
  • 287 posts

Posted 24 January 2014 - 02:18 PM

View PostRussianWolf, on 24 January 2014 - 01:56 PM, said:

That's providing O2, not getting rid of the CO.

Your link is to AIP systems which is basically the battery system on them.

The exhaust is vented overboard: you build a back pressure which exceeds the pressure at depth, and the gases flow into the water. The power of this can be used to provide additional thrust.

The Walter engine would be coupled to the drive, or to the dynamo to charge the batteries while in use. Read the link. The Germans, the Brits and the Soviets all sailed peroxide boats. AIP can be used to describe any system that doesn't have access to atmospheric oxygen, like a Walter engine that uses peroxide as the oxidizing agent.

Edited by Grits N Gravy, 24 January 2014 - 02:19 PM.


#357 Abivard

    Member

  • PipPipPipPipPipPipPipPip
  • Shredder
  • 1,935 posts
  • Location: Free Rasalhague Republic

Posted 24 January 2014 - 02:30 PM

This about Elo in MWO working:

It's not a question of where he grips it! It's a simple question of weight ratios! A five ounce bird could not carry a one pound coconut.

Edited by Abivard, 24 January 2014 - 02:30 PM.


#358 MischiefSC

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPip
  • The Benefactor
  • 16,697 posts

Posted 24 January 2014 - 02:45 PM

View PostGrits N Gravy, on 24 January 2014 - 11:54 AM, said:

You could just apply a modifier to the K factor based on each piece of equipment and chassis. Easy-to-use items increase your K factor, thus your Elo goes up, so you have to play with and against superior players more quickly. Track live data in the background and make changes to dial in the system before you launch.

Doing the same thing here would help too. Apply a modifier to K factors based on group size.
The beauty of all these solutions is you can run them in the background and tweak them before they go live, or use historical data and model them too. It doesn't take a lot of effort to do so, it's just a matter of focus.


Sorta, but you'd need to keep the results of each modified K factor separate. Use the base Elo as a starting point for the weight class, but then grow and maintain a separate Elo value (or in this case, modified Elo value) for all the variables a player generates by playing different versions. That way your base Elo isn't modified by a K factor just for wins/losses, but your predicted value (your value before the win, not after) is modified for chassis and loadout in addition to the composition of both teams.
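To make the two ideas concrete, here's a minimal sketch of a K factor scaled by chassis and group size, with a separate rating kept per chassis and seeded from the base Elo. Everything in it is hypothetical: the multiplier values, the chassis labels, and the function names are invented for illustration, and real modifiers would have to come out of match telemetry.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical multipliers, not real game data.
CHASSIS_K_MODIFIER = {"easy_chassis": 1.25, "hard_chassis": 0.75}
GROUP_K_MODIFIER = {1: 1.0, 4: 1.5}


def rating_change(rating, opponent_rating, won, chassis, group_size, base_k=32.0):
    """Elo change using a K factor scaled by chassis and group size."""
    k = (base_k
         * CHASSIS_K_MODIFIER.get(chassis, 1.0)
         * GROUP_K_MODIFIER.get(group_size, 1.0))
    return k * ((1.0 if won else 0.0) - expected_score(rating, opponent_rating))


# Separate rating per (player, chassis), seeded from the base Elo on first use.
chassis_ratings = {}


def record_match(player, base_elo, chassis, opponent_rating, won, group_size=1):
    key = (player, chassis)
    current = chassis_ratings.setdefault(key, base_elo)
    chassis_ratings[key] = current + rating_change(
        current, opponent_rating, won, chassis, group_size)
    return chassis_ratings[key]


solo_easy = record_match("pilot", 1500, "easy_chassis", 1500, won=True)
group_hard = record_match("pilot", 1500, "hard_chassis", 1500, won=True, group_size=4)
```

Keeping the per-chassis values in their own table leaves the base Elo untouched, so the modifiers can be tuned or replayed against historical data without corrupting the starting point.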

View PostRussianWolf, on 24 January 2014 - 12:21 PM, said:

I realize that you don't know me. But when you start off like that, you've lost your audience. I've aced Calculus, Trigonometry, Physics, Statistics and Geometry. Understanding complex equations and systems is something I also do for a living.

I'll restate it very simply.

Math works. Always. BUT math can give flawed results when placed in the wrong environment.

Simple equation

Input a constant greater than 1. X^2 = Y and Y will always be greater than 1

Simple. Easy. Always works.

Change the environment where you allow numbers larger than 0 and the equation stops being 100% accurate.

.5^2=.25 oops, that's not greater than 1

Math is wrong? No. You just put the equation into the wrong environment for what you were working on. So it gave you flawed data. If you use the data and continue, that's your problem.


You know what, I do owe you an apology. I've been debating with Rich and Abivard and it skewed my perspective so fair enough, totally my bad.

However what you're doing is mistaking algorithms and logarithms for probability theory and statistical analysis. You can identify the impact of a single variable with an influence in the hundredths of a percentile if you've got enough telemetry and enough samples. This is, essentially, how marketing works. It's also the same tool set that lets us identify the composition of distant solar systems and galaxies by measuring influences on gravitational lensing.

The changes in the environment in MW:O are never as great as what completely random generation would create. The variability isn't anything close to random; it's not like trying to predict a lotto number. Every match results in a 1/0 result. At a granular level there's an incredible number of variables; if you want to get picky there are a good tens or even hundreds of millions of variables in a 12 v 12 match. If you want to get into the variables impacting each player, which can then theoretically impact their performance, you could get into billions of variables.

The same could be said of rolling a six-sided die. Air viscosity, surface imperfections, lunar gravity impacting the die differently based on how high it's thrown. Surface friction based on how much of the die comes into contact with the table as it bounces to a halt. Billions of variables.

Truth is that unless you've modified the die, the odds of rolling any particular number on any throw are 1 in 6.

The reality is that the performance of every player in a match is going to fall within a very narrow range (Elo average). Very high and very low performances will be statistical outliers and their impact will decline with a larger sample size. Taken as a strict aggregate you'll have an 8.33% (1 in 12) impact on your team's odds of a win. In individual instances, though, it's going to swing based on the relative ability of your teammates; your value might swing from 3% to 12%.

Where the player skill impact comes in, however, is this: suppose you're 1% better, then your impact will be from 4% to 13% compared to an average player's 3-12%. That 1% variance in your performance over average performance is magnified by the K factor. You'll increase your Elo more dramatically beating players who are better than average, and you'll lose less when they beat you.
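The asymmetry in that last point falls straight out of the standard Elo update. A quick sketch, assuming the usual 400-point scale and K = 32 (the ratings are just example numbers, not anything from MW:O):

```python
def expected_score(rating, opponent):
    """Standard Elo expected score."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def win_loss_deltas(rating, opponent, k=32.0):
    """Return (points gained on a win, points lost on a loss)."""
    e = expected_score(rating, opponent)
    return k * (1.0 - e), k * (0.0 - e)


gain_vs_stronger, loss_vs_stronger = win_loss_deltas(1500, 1600)
gain_vs_weaker, loss_vs_weaker = win_loss_deltas(1500, 1400)
```

Against the 1600 side a win is worth about 20.5 points and a loss costs about 11.5; against the 1400 side those numbers swap.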

The composition of the opposite team may seem random, but because they are balanced to an Elo target much like your own team, the range of actual variation in team compositions is very narrow.

The reality is that most people are not that good. Almost all players are average. Your 'skill' and how it impacts Elo is reflected far more by your ability to pick, build and play effective mech builds and if you want a spectacular Elo you need to hone the skill of joining and coordinating with a skilled team of players. These two skills are going to be far, far more impactful to your win/loss than your aiming ability.

What's all this equate to? In the current system your Elo rating is a good ballpark estimate of your performance and it does serve to separate players into rough bands of terrible, bad, average, good, awesome. You need hundreds of drops in a weight class to get seated around where you should be.

So if you've studied statistics, how are you saying that something like how big an impact one player has in a 12 v 12 MW:O match can't be measured after several hundred examples in a roughly skill- and weight-balanced environment? The only difference between 1 v 1 and 12 v 12 is scale. Accounting for scale is solved by increasing sample size and breadth of data. That's why statistical analysis exists as a science.
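One way to see why sample size matters is the textbook standard error of an observed win rate. The sketch below treats each match as an independent win/loss with the same underlying win probability, which is a simplifying assumption and not a claim about how MW:O matches actually behave.

```python
import math


def win_rate_standard_error(p, n):
    """Standard error of an observed win rate over n independent matches."""
    return math.sqrt(p * (1.0 - p) / n)


# Quadrupling the number of matches halves the uncertainty.
for n in (25, 100, 400):
    print(n, round(win_rate_standard_error(0.5, n), 3))
```

At 25 matches the noise is about 10 percentage points, at 100 it is about 5, and at 400 about 2.5, so a small edge in win rate only becomes visible after hundreds of drops.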

#359 MischiefSC

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPip
  • The Benefactor
  • 16,697 posts

Posted 24 January 2014 - 03:25 PM

View PostAbivard, on 24 January 2014 - 02:30 PM, said:

This about Elo in MWO working:

It's not a question of where he grips it! It's a simple question of weight ratios! A five ounce bird could not carry a one pound coconut.


Actually, that's absolutely true in the context of why we need to match to a range and not high/low for a target.

One high Elo player cannot consistently carry 1-3 lower Elo players. Don't sandbag people. Leave 1 or 2 slots on each team for higher Elo players with no possible matches in their range and essentially ignore their impact on the match. It'll be minor anyway. Don't do that with premade groups, obviously, but a solo player with a 2800 Elo almost certainly got there based on their skill in a 4man. If he's dropping solo, probably leveling Locusts or whatever, and you try to treat him like the raised middle finger of Odin and stack his team with window-licking mouth-breathers, you're just skewing the match's results. Pile him into the highest Elo matches available, but balance both teams by keeping the range of Elo variance between players on the same team as narrow as possible.
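Here's a rough sketch of the difference between matching to an average and matching to a narrow range. The sliding-window selection is an assumption made for the example, not a description of the actual matchmaker, and the ratings are invented.

```python
from statistics import mean


def elo_spread(team):
    """Gap between the highest and lowest rating on a team."""
    return max(team) - min(team)


def pick_narrow_team(pool, size):
    """Pick `size` ratings from `pool` with the smallest spread.

    Sorts the pool and slides a window of `size` across it, keeping
    the window whose highest and lowest ratings are closest together.
    """
    if size > len(pool):
        raise ValueError("pool is smaller than the requested team size")
    ordered = sorted(pool)
    best = min(range(len(ordered) - size + 1),
               key=lambda i: ordered[i + size - 1] - ordered[i])
    return ordered[best:best + size]


# Same average (1525), very different teams.
stacked = [2800, 1000, 1100, 1200]
narrow = [1500, 1525, 1550, 1525]

team = pick_narrow_team(stacked + narrow, 4)
```

Both example teams average 1525, but the stacked one has a spread of 1800 points against 50 for the narrow one, which an average-only target can't tell apart.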

Better experience for everyone.

#360 RichAC

    Member

  • PipPipPipPipPipPipPip
  • 661 posts

Posted 24 January 2014 - 06:46 PM

View PostRoadkill, on 24 January 2014 - 01:27 PM, said:

No, no nerve. I don't really care about match score at all because it isn't tracked. And as near as I can tell, no one else (except you) cares about it either.

I agree that match score could be a useful tool for the matchmaker. I disagree that it is an indicator of skill, but it doesn't need to be in order to be useful. We already have a useful indicator of skill - your Elo rating - so adding match score to that in the matchmaker's calculations could help a lot.

You keep referencing sports. I don't think pro sports players care about stats like you think they do. The only stat they care about is championships. I.e. wins.

Yeah, they compare salaries. Yeah, they compare triple-doubles or TD passes or hat tricks or whatever else. But when it comes right down to it, the ONLY stat they care about is wins. Super Bowls. NBA Championships. Majors victories. Stanley cups.

Wins.



Well, it should be tracked. I assumed it was before recent revelations, silly me.


Oh, they definitely care about stats, friend. It's also what makes sports popular to most fans. STATS. Especially fans of losing teams...lol

The general managers and owners of the teams care about wins. But the players themselves usually just care about money and their personal STATS, which get them more money.

No one uses fancy math equations that don't even take stats into account, except a bunch of nerds who don't even follow sports.

And yes, we know, you think a better indicator of skill is an individual's constantly changing random teammates..... We shouldn't worry, though, because if a player is rated higher after performing worse in a match than players who performed better, which we judge immediately by stats, we just have to feel assured that it will all eventually "correct" itself in the end. Whenever that is.... but that's ok, it will happen one day... you've done the math.

But once again, no one cares about wins.

1. It's a random team game.

2. The game is based on cbills.

Edited by RichAC, 24 January 2014 - 06:54 PM.





