
Elo Based On Win/loss (Or Anything Based On Win/loss) Is Silly


167 replies to this topic

#101 Diego Angelus

    Member

  • The Warden
  • 471 posts

Posted 10 November 2013 - 03:41 PM

View PostTooooonpie, on 10 November 2013 - 03:35 PM, said:

I agree completely - ELO should be based on damage done and capping, since those are the two things that directly contribute to both objectives in both gametypes

I'm not in a clan/lance yet, so all I do is pug - I just had 8 games in a row in my dakka hawk, 600+ damage most times, each one: lost

Does that mean I'm a bad player, or does that mean I was unlucky to be put with bad team mates?


The ELO rating system can only be used with W/L, since when you win you gain points, and when you lose against an opponent who has a higher Elo you lose fewer points. New players should have 1500 ELO points, and as you win and lose that number changes, so a new player can have a higher Elo than a player who has a lot of battles behind him. That is why you see new players in those matches.
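
For reference, the win/loss behaviour described above matches the textbook Elo update. The sketch below illustrates that textbook formula only; nothing here confirms that MWO uses these exact numbers, and the K-factor of 32 and the function names are assumptions picked for the example.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Textbook Elo win probability for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def update(rating: float, opponent_rating: float, won: bool, k: float = 32.0) -> float:
    """Return the new rating after one result. `k` is an assumed step size."""
    actual = 1.0 if won else 0.0
    return rating + k * (actual - expected_score(rating, opponent_rating))


# A 1500-rated player facing a 1700-rated opponent:
print(round(update(1500, 1700, won=True) - 1500, 1))   # points gained for an upset win
print(round(update(1500, 1700, won=False) - 1500, 1))  # points lost for an expected defeat
```

With these assumed values, a 1500-rated player who upsets a 1700-rated opponent gains roughly 24 points, while losing that same match costs only about 8.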

#102 Tooooonpie

    Member

  • 96 posts

Posted 10 November 2013 - 03:41 PM

What I'm getting at is that ELO right now is luck based - do you get put with good players, or not? Individual skill can only carry 11 other people so much, and with that paired with random DCs all the time, it's easy to see why a lot of people are put off from this game.

#103 Deathlike

    Member

  • Littlest Helper
  • 29,240 posts
  • Location: #NOToTaterBalance #BadBalanceOverlordIsBad

Posted 10 November 2013 - 03:43 PM

View PostTooooonpie, on 10 November 2013 - 03:41 PM, said:

What I'm getting at is that ELO right now is luck based - do you get put with good players, or not? Individual skill can only carry 11 other people so much, and with that paired with random DCs all the time, it's easy to see why a lot of people are put off from this game.


ELO isn't luck based... the MM is luck based at this point in time.

#104 Diego Angelus

    Member

  • The Warden
  • 471 posts

Posted 10 November 2013 - 03:55 PM

View PostDeathlike, on 10 November 2013 - 03:43 PM, said:


ELO isn't luck based... the MM is luck based at this point in time.


ELO is also luck based, at least when teams are this big. You can't possibly think that the skill of one player can make it a win every single time just because he is good.

#105 Screech

    Member

  • Knight Errant
  • 2,290 posts

Posted 10 November 2013 - 04:06 PM

Checked my pre- and post-Elo alt accounts. About 750 drops pre-Elo and 450 post-Elo, 100% solo drops, across various accounts. Win rate went up 9.8% on the post-Elo accounts.

#106 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 10 November 2013 - 05:06 PM

View PostVictor Morson, on 09 November 2013 - 11:44 PM, said:


Which both you and the people who are programming ELO have outright butchered through a fundamental misunderstanding of WHICH stats you are using to figure out your averages.

Calculating individual PLAYER worth based on TEAM outcomes is a flaw in the very basis of everything you are saying, which renders all the rest of it entirely moot. You cannot gather clean data about individual player performance from a completely random team, with every single other random factor (such as tonnages) further muddying the water.

You aren't getting good input. All the math and links about statistical analysis in the entire world can't change that fact.

Ultimately win/loss comes down more to how lucky you were in pulling an average solid team, or if you brought your own teammates into the match. Period. Even if things even out over time to a 1:1 via all the random elements, as you are suggesting, this just means that players in every game will have an artificially high win and loss rate based on their random factors. End result?

Newbie McNewbieton will likely win 50% of his games due to random factors. If he is really bad and causes some losses by himself somehow, that will only influence things a TINY amount. He'll still be lumped in with other players that had a similar team experience.

Again, bad input data, bad output data. You can't statistically analyze things clearly when there is this much out of ANYONE'S control at play.

PS: They often do grab statistics from skewed sources like this in real life..... when they want to push a specific viewpoint. They will purposely pull "muddy data" that they can skew to show a different picture. Statistical analysis is one of the very, very easiest things to get wrong... or manipulate. And the way ELO is done here is very muddy indeed.




The vast majority of the stat he listed (without any firm evidence of the 60-70 figure really) was done in 4-mans, a whole other ballgame.


A critical point in here - gaining Elo is not a reward for skill. It's just a metric reflecting how often you drive a win rather than a loss. Cbill and XP are rewards for kills, damage, assists, spotting and the like. Win/loss is the superior metric specifically because you can't game it. Right now, for example, I'm leveling my tbolt. I do better in terms of winning when I've got 2xPPCs and 2xAC5s. I get more kills and help my teammates survive longer. For XP gain though I'm actually loaded with 3xMLs, 2xSRM6 and an LB10X. While I don't win quite as often or get kills quite as often, I do more total damage, and when I do catch someone with their armor down I pile in the component destruction. My match score average is higher. This is more a byproduct of exploiting the successes of my teammates though, not some sign of my being a better player.

You'll likely find that the highest Elo scores are held by light pilots. Lights excel at disrupting enemy tactics and drawing disproportionate attention. One spider can shut down 3 LRM boats and pull several heavy mechs off the line in a match, letting the spider's teammates exploit that opportunity for a victory. The spider may have done minimal damage but his ability to help his team win is what matters to Elo.

So your main concern is the belief that your impact in 12 v 12 isn't statistically significant enough to represent you in Elo? I'd say that statistically a bigger impact than your teammates on your win/loss score is your mech build, how rested or distracted you are and other personal factors but for the sake of simplicity of this debate we'll go with the core of your concern - teammates and your impact on the odds of a win/loss.

First, the difference between small and large sample sized statistics and why Las Vegas casinos make money.

Casinos make money because 'luck', as a statistical representation of probability modeling, doesn't play the long game. Ever. So while some people may win, it is inevitable that the house wins out. So inevitable that the house wins out BIG. That's because in gambling the statistical 'average' pays the house. Win/loss might swing wildly in increments of 10 matches, but over 100, 300, 500 matches probability theory wins with relentless mathematical precision. If it didn't, no casino could make money in the long run.

For another thing, your teammates are not equally likely to be someone who downloaded MW:O by complete accident and is in fact earnestly trying to play Barbie Pet Rescue while being confused about all the weapons fire they're taking. Pretty much everyone on both sides in any match is going to be outside their first 25 matches. There's a small probability that you might get a newbie, but as a general rule if you're a couple hundred drops into any weight class you're not likely to see newbies.

For yet another, your own perception of how well you play vs how often you win is absolutely untrustworthy. That's human nature and how your brain works. It'll remember when you did well and got 4 kills but your team lost, and forget every time you got killed early and your team carried the match.

Your impact on your 12 man team in pugs is ~8.333%. 100/12 = 8.333. That's your impact on your team's overall success or failure as a statistical average if you were in fact absolutely average. What's that equate to?

Let's use an analogy. Suppose you were on a ship out in the ocean and you set a course for a port 5,000 miles away. The ship is a huge cruise ship with 12,000 people on it. Every 15 minutes you and 11 other randomly selected people are required to adjust the course either port or starboard 8.333%. If everyone was random the ship would eventually hit the port - or at least pretty close to it. While in the short run there might be stretches where the ship would end up pointed way off course, probability theory dictates that it'll average back out to stay on course.

You, however, like to go right. Out of every single 'shift' in adjusting the course you are present for every one of them - you are the only absolute constant in measuring the course correction. Some other people might show up repeatedly but you are accounted for in every match. Suppose you however ALWAYS adjust the course to starboard. That means that since every single other entry will, mathematically, average out after enough 'turns' you will cause a persistent 8.333% shift to starboard. Result? After the first 20 or 30 sets of adjustment you might notice a bit of a starboard shift. After 100 it'll be pronounced. You will in fact have driven the ship ~833.333% off course at this point. So every 25 hours, statistically you would have whipped that big cruise ship into 8 and 1/3rd circles - even though there are 11 other people making adjustments every 15 minutes.
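
The ship analogy can be checked with a small simulation. This is only a sketch of the analogy as told: each of 12 people nudges the heading by 100/12 per turn, the other 11 pick port or starboard at random, and one person always picks starboard. The function names, the seed and the number of repeated voyages are made up for the illustration.

```python
import random

STEP = 100 / 12  # each person's adjustment per turn, ~8.333 (the post's figure)


def drift(turns: int, always_starboard: bool, rng: random.Random) -> float:
    """Net heading change after `turns` rounds of 12 adjustments each."""
    heading = 0.0
    for _ in range(turns):
        others = 11 if always_starboard else 12
        # Random crew members push port (-) or starboard (+) with equal chance.
        heading += sum(rng.choice((-STEP, STEP)) for _ in range(others))
        if always_starboard:
            heading += STEP  # the one constant voter always pushes starboard
    return heading


def average_drift(turns: int, always_starboard: bool, runs: int = 500, seed: int = 1) -> float:
    """Average net drift over `runs` simulated voyages (assumed repeat count)."""
    rng = random.Random(seed)
    return sum(drift(turns, always_starboard, rng) for _ in range(runs)) / runs


print(average_drift(100, always_starboard=False))  # hovers near 0
print(average_drift(100, always_starboard=True))   # hovers near 100 * STEP, about 833
```

Averaged over many simulated voyages, the all-random crew ends up near zero net drift and the crew with one constant starboard vote ends up near 100 x 8.333, about 833. Any single voyage still wanders a few hundred either side of that, which is consistent with needing a lot of matches before the constant shows through.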

Matchmaking in MW:O is certainly more complex and it would be arrogant to assume that anyone is 100% beneficial to their team every game. That's why it takes a good 100 matches just to get you in the right zone and not unreasonable to say ~500 matches to get you well seated in your appropriate Elo weight class. If you were dropping with the same 11 other people against the same 12 other people it would go far more quickly as there is less variance to account for.

This works however in pugs for the same reason that Google and Facebook work - probability theory helps scrub out random static by dint of volume of data. The more player telemetry there is, the more accurate everyone's matchmaking is.

How about another example?

www.random.org

On the right is a random number generator. Remember how your performance on your team equates to 8.333%? Certainly not all players are equal. So generate 12 numbers between 1 and 16.666%. That's the 'value' of your team.

Do that 2 or 3 times. You'll probably see a swing of about 10 points - you'll end up with totals between 90 and 110. You might even get an outlier in the 80 to 120 range.

Do that about 100 times. Add up all 100 of them and divide by 100. You'll end up with a number ranging from 99 to 101.

Now, do that again but only randomly generate 11 numbers each time and for that 12th man always make it 10.

You'll end up with a number ranging from 101 to 102. Why? That extra 1.666% difference between you and statistical average plays out clearly. In fact the more times you roll it the more pronounced it gets. At 500 matches you'll be at about 508.33 instead of 500.
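
For anyone who would rather not click a random number generator a few hundred times, here is a rough sketch of the same experiment. It assumes each player's value is drawn uniformly between 0 and 16.666 (a lower bound of 0 rather than 1, so that the average player is worth exactly 8.333); the function names and the seed are invented for the example.

```python
import random
from typing import Optional

MAX_VALUE = 100 / 6  # 16.666..., so an average player is worth 100/12 = 8.333


def team_total(rng: random.Random, pinned: Optional[float] = None) -> float:
    """Add up 12 player values; if `pinned` is given, the 12th player always has that value."""
    random_players = 12 if pinned is None else 11
    total = sum(rng.uniform(0.0, MAX_VALUE) for _ in range(random_players))
    return total if pinned is None else total + pinned


def average_total(matches: int, pinned: Optional[float] = None, seed: int = 7) -> float:
    """Average team total over `matches` independently generated teams."""
    rng = random.Random(seed)
    return sum(team_total(rng, pinned) for _ in range(matches)) / matches


print(average_total(100))                 # 100 all-random teams
print(average_total(100, pinned=10.0))    # same, but the 12th player is always a 10
print(average_total(20000, pinned=10.0))  # long run: settles near 11 * 8.333 + 10 = 101.67
```

Under these assumptions the long-run average with the pinned player sits about 1.67 above the all-random one. Over only 100 teams the average can still land a couple of points either side, so the gap is not always visible that early.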

Does that make more sense? There are functional limits on how much any one person, for good or ill, can influence a win/loss in a match so the swing created by your teammates is certainly less than -100 to +100%. It is however perfectly direct to plot and identify, unlike KDR, damage, score or other personal metrics.

It certainly does have a margin of error but that's another place Elo shines - the equation is self-correcting. Since win/loss is a 1/0 factor and not a sliding scale if your Elo gets pushed too quickly in the wrong direction probability theory will ease it back.

Win/loss is the only viable metric for determining an Elo score. Everything else is too subjective. Win/loss in a pugging environment is the only reliable indicator that can be used to place and rank players. Everything else is less reliable and more prone to manipulation.

#107 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 10 November 2013 - 05:13 PM

View PostDeathlike, on 10 November 2013 - 03:43 PM, said:


ELO isn't luck based... the MM is luck based at this point in time.


Luck is the study of statistical anomaly. It isn't 'real' in large aggregate data. The only problem the weaknesses in the matchmaker present is a widening of the margin of error and an increase in the number of drops required to seat a player's Elo better. If we had 3 million players, all with 1000 drops in each weight class, your Elo would be so tight you could bounce a quarter off of it.

Right now however the matches you get are a sample of the best available matchup.

Your match consists of a sample taken from everyone who hit 'launch' within ~60 or ~120 or ~180 seconds of when you did, adjusted to try and find the most even total matchup for Elo rank and weight. Of course it's got a wide margin for error. That's not a byproduct of win/loss as an Elo sampling metric but availability of players. The higher your Elo the more likely you are to end up in a less well balanced match because there are fewer people in your band, so you end up with a wider disparity between people on each team trying to find a balance point.

The more you play the better your matches get. More players will do more for the MM though than any statistical tweak ever could. It's still way better than it used to be.

#108 Jonathan Paine

    Member

  • Survivor
  • 1,197 posts

Posted 10 November 2013 - 05:17 PM

In the long run, Elo works. For any given match, maybe not as much. However, the CURRENT MATCHMAKER IS BROKEN. Matches are rarely well balanced on skill and weight. Why does this not show up easily in the stats? Because it is consistently broken. Over enough matches, you get to roll as much as you are rolled. In between the really bad matches, your skill will have some impact.

As for the 12 man queue? I don't know if there are enough players there for Elo to make any difference. For Elo to work, we need tons of players playing tons of matches.

#109 Deathlike

    Member

  • Littlest Helper
  • 29,240 posts
  • Location: #NOToTaterBalance #BadBalanceOverlordIsBad

Posted 10 November 2013 - 05:21 PM

View PostJonathan Paine, on 10 November 2013 - 05:17 PM, said:

As for the 12 man queue? I don't know if there are enough players there for Elo to make any difference. For Elo to work, we need tons of players playing tons of matches.


12-mans have no restrictions. Literally.

The problem with it is that it is difficult to construct 12-mans at any given time, except at common/high playing hours.

#110 Wispsy

    Member

  • Talon
  • 2,007 posts

Posted 10 November 2013 - 05:21 PM

I think you will find the highest Elo scores are held by assault pilots...they bring the most to the table.

#111 MischiefSC

    Member

  • The Benefactor
  • 16,697 posts

Posted 10 November 2013 - 06:51 PM

View PostWispsy, on 10 November 2013 - 05:21 PM, said:

I think you will find the highest Elo scores are held by assault pilots...they bring the most to the table.


Don't mistake win/loss with firepower/kills/damage.

When a match gets turned around, is it because one good Highlander pilot killed 5 on 1, or because a Light pilot pulled off a cap or finished off damaged enemies?

Remember, win/loss is a lot more nebulous. Heavy and assault mechs excel at exploiting opportunities. People in the open, numerical advantages, concentration of firepower. Light mechs excel at *creating* opportunities. Distracting people, drawing people off, backstabbing, disrupting enemy positions. That wins more matches than having an extra PPC.

A completely inept medium pilot who always goes river on Forest Colony could end up with an above average Elo because while he plays in 3PV and can't hit {Scrap} he's gotten good at staying alive (relatively) long when being hunted by 4 or 5 mechs smelling an easy kill. Why is this good for Elo? Because he draws a disproportionate amount of attention. This leaves his fellow teammates with a 3 mech advantage against the rest of the enemy team. It's enough to skew odds in his favor after a few dozen matches.

For pugging I'd say light pilots have the biggest win/loss impact. Capping, blocking capping, disrupting formations and all that stuff that doesn't earn {Scrap} for XP/cbills but makes the difference between a win and a loss. Lights also draw and waste, by dint of mobility, an inordinate amount of enemy firepower and ammunition. A well piloted and fast moving Jenner may attract 30 or 40 shots or more in a single match and convert them into only 20 or 30 points of damage. An Atlas doing the same thing would be dead in no time.

Make sense? That's why win/loss is the right metric to base Elo off of. It most accurately reflects the impact of a player on their team's chance of success. A talented kill-stealer, someone who likes to boat big damage low kill weapons like missiles or LBX, someone who's fast to run and hide to preserve KDR when things go south, they can have great stats but be a terrible teammate.

Win/loss. It's what really matters.

#112 Wispsy

    Member

  • Talon
  • 2,007 posts

Posted 10 November 2013 - 07:03 PM

Trust me on this. It is much much easier to reliably carry a team to victory in a Highlander than anything else. Assault pilots will have the highest Elos.

#113 D04S02B04

    Member

  • 158 posts

Posted 10 November 2013 - 07:21 PM

ELO does not account for quality of game and that's why people stop playing it.

I want to win/lose a game because I had a hard fought battle that was down to the wire and could easily go either way and was down to a split second faster/more accurate alpha that decided battles.

I'm not into a game where I have to carry the useless team or a comparison of which team has the greater number of idiots. Does ELO work in that situation? Yes. Is it enjoyable? No.

For all the talk about light mechs, the game is won and lost by Assault mechs. No matter how many opportunities or how much disruption is created by light or medium mechs, or how much firepower is poured in by medium or heavy mechs... Assault mechs seal the deal.

That simply cannot happen when you have idiots in trial mechs like an Atlas firing dual LRM15s all the way at the rear.

EDIT: ELO and MM also mean that you get an equal chance to have more idiots on your side, as does the opponent. That's not what makes an enjoyable match.

Edited by D04S02B04, 10 November 2013 - 07:22 PM.


#114 Victor Morson

    Member

  • Elite Founder
  • 6,370 posts
  • Location: Ander's Moon

Posted 11 November 2013 - 12:04 AM

View PostMischiefSC, on 10 November 2013 - 05:06 PM, said:

A critical point in here - gaining Elo is not a reward for skill. It's just a metric reflecting how often you drive a win rather than a loss.


Buzzz... and that is where you're entirely wrong. The true flaw behind everything you keep saying.

It's not a reward, no, but it is an attempt to measure skill for the intent of, gasp, placing those players in matches with players of a similar skill. This is the entire purpose of ELO, even if the execution is entirely lacking.

You keep talking about how your odds even out over time and that is precisely the problem. The small % bump from your horrendous lack of skill / amazing displays of skill will get lost in a sea of drops that you won or lost based on a bum MM roll. Every bum roll, that number keeps on averaging out until it means absolutely nothing.

If you're trying to put pilots with others of equal skill you need to track the pilots and not the entire team. Any ELO system using the core logic you are putting forth is less than worthless for the goal of balancing matches.

View PostWispsy, on 10 November 2013 - 07:03 PM, said:

Trust me on this. It is much much easier to reliably carry a team to victory in a Highlander than anything else. Assault pilots will have the highest Elos.


Yes and no. They can make the biggest difference, yes, but ultimately one Highlander will not save a team of Locust pilots calling people tryhards for telling them not to charge the hill.

Again I do remind you the majority of your major ELO streaks have been in 4 mans, which would equally steamroll you if you happened to roll in alone on the other side against your teammates.

Edited by Victor Morson, 11 November 2013 - 12:06 AM.


#115 Hauser

    Member

  • Legendary Founder
  • 976 posts

Posted 11 November 2013 - 04:07 AM

View PostVictor Morson, on 11 November 2013 - 12:04 AM, said:

You keep talking about how your odds even out over time and that is precisely the problem. The small % bump from your horrendous lack of skill / amazing displays of skill will get lost in a sea of drops that you won or lost based on a bum MM roll. Every bum roll, that number keeps on averaging out until it means absolutely nothing.

If you're trying to put pilots with others of equal skill you need to track the pilots and not the entire team. Any ELO system using the core logic you are putting forth is less than worthless for the goal of balancing matches.


The whole point of Mischief's argument is that you are the only constant in your team. The effects of your teammates and your opponents cancel each other out in the long run. The matchmaker can botch the match either way. So the only thing that remains, that isn't averaged out, is your own influence.

Now Elo can handle that just fine. It only changes your score when proven wrong, and it does so in proportion to how far the result was from the expected outcome. So even if fair matches only show up 1% of the time you will get to the right Elo score.
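
A short worked version of that claim, assuming the standard Elo update (whether MWO's implementation matches it is not stated here). Let E be the win probability the ratings predict, p the real win probability, S the result (1 for a win, 0 for a loss) and K the step size; all four symbols are notation introduced for this note.

```latex
\Delta R = K\,(S - E), \qquad S \in \{0, 1\}

\mathbb{E}[\Delta R] = K\,\bigl(p \cdot 1 + (1 - p) \cdot 0 - E\bigr) = K\,(p - E)
```

So every single match still moves the rating a little, but the average movement is zero exactly when the prediction E equals the real win chance p, and it points toward the correct rating whenever they differ.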

Now you're better off making an argument against the matchmaker. But that should be fixed roundabout UI 2.0. It will remove the tonnage matching and replace it with a weight limit. So the matchmaker will only have to look at Elo, which should be easier.

View PostVictor Morson, on 11 November 2013 - 12:04 AM, said:

Again I do remind you the majority of your major ELO streaks have been in 4 mans, which would equally steamroll you if you happened to roll in alone on the other side against your teammates.


Grouping up is an effective way to win, and as such it should be reflected in your Elo rating. It doesn't require much piloting skill, but we're measuring one's ability to win games.

Edited by Hauser, 11 November 2013 - 04:10 AM.


#116 Dirkdaring

    Member

  • Survivor
  • 685 posts
  • Location: Twycross

Posted 11 November 2013 - 04:45 AM

View PostMischiefSC, on 10 November 2013 - 05:13 PM, said:

Your match consists of a sample taken from everyone who hit 'launch' within ~60 or ~120 or ~180 seconds of when you did, adjusted to try and find the most even total matchup for Elo rank and weight.


Kind of. Keep in mind the game puts a pre-made in first. Then if one is available (it should wait) it will put one on the other side. If not, it fills with high rank ELO players.

#117 Boris The Spider

    Member

  • Bad Company
  • 447 posts

Posted 11 November 2013 - 05:44 AM

View PostVictor Morson, on 11 November 2013 - 12:04 AM, said:

If you're trying to put pilots with others of equal skill you need to track the pilots and not the entire team. Any ELO system using the core logic you are putting forth is less than worthless for the goal of balancing matches.


But Victor, what are you going to track to determine my contribution? An example from the weekend: Tourmaline Desert, enemy team has the high ground, my team is trying to fight up from the drop-ship debris. I'm sitting on the right flank probing the enemy line when I notice a friendly medium, an assault and two heavies making a move to the left flank. Now I wait until the timing is exactly right, push up into the enemy's right flank and send an AC20 round scooting right through the pack to hit the CPT on the far left, give the nearest mech to me a face full of SRMs and put my mech into full reverse.

The CPT on the left turns to see what hit him. If he has seismic, it's non-functional, just as the flanking lance pushes right into his back, coring him out in under a second. Unhindered, they advance to take firing positions. By now, I'm under sustained fire from half the enemy team and my right torso is slag, taking with it my AC20 and the majority of my damage potential. The nearest enemy assault, smelling blood, pushes to finish me, but instead of backing up directly to my team I alter my direction subtly so he twists his torso to the left, allowing my teammates on the right flank to hit him in the back. This costs me my left arm and the last of my weapons, not that I need them any more: the enemy are sandwiched between 10 fresh mechs all with good firing positions, and within a few seconds all that remains is a handful of lights that managed to escape the carnage.

I end the round with about 120 damage, 3 assists and 0 kills. Team score was 12-0. Now should my Elo rating go up or down? It took me nearly 300 words to describe what I did and why I did it. You are never going to make any program understand more than numbers, and the only number that counts here is 12-0.

#118 Joseph Mallan

    ForumWarrior

  • FP Veteran - Beta 1
  • 35,216 posts
  • Location: Mallanhold, Furillo

Posted 11 November 2013 - 06:08 AM

View PostBoris The Spider, on 11 November 2013 - 05:44 AM, said:


But Victor, what are you going to track to determine my contribution? An example from the weekend: Tourmaline Desert, enemy team has the high ground, my team is trying to fight up from the drop-ship debris. I'm sitting on the right flank probing the enemy line when I notice a friendly medium, an assault and two heavies making a move to the left flank. Now I wait until the timing is exactly right, push up into the enemy's right flank and send an AC20 round scooting right through the pack to hit the CPT on the far left, give the nearest mech to me a face full of SRMs and put my mech into full reverse.

The CPT on the left turns to see what hit him. If he has seismic, it's non-functional, just as the flanking lance pushes right into his back, coring him out in under a second. Unhindered, they advance to take firing positions. By now, I'm under sustained fire from half the enemy team and my right torso is slag, taking with it my AC20 and the majority of my damage potential. The nearest enemy assault, smelling blood, pushes to finish me, but instead of backing up directly to my team I alter my direction subtly so he twists his torso to the left, allowing my teammates on the right flank to hit him in the back. This costs me my left arm and the last of my weapons, not that I need them any more: the enemy are sandwiched between 10 fresh mechs all with good firing positions, and within a few seconds all that remains is a handful of lights that managed to escape the carnage.

I end the round with about 120 damage, 3 assists and 0 kills. Team score was 12-0. Now should my Elo rating go up or down? It took me nearly 300 words to describe what I did and why I did it. You are never going to make any program understand more than numbers, and the only number that counts here is 12-0.

Your contributions were assisting kills and spotting. An assist should count towards Elo at anywhere from 10% up to 75% of a kill you get. Tracking assists is a valuable bit of data on your performance.

#119 Kunae

    Member

  • 4,303 posts

Posted 11 November 2013 - 07:03 AM

View PostJoseph Mallan, on 11 November 2013 - 06:08 AM, said:

Your contributions were assisting kills and spotting. An assist should count towards Elo at anywhere from 10% up to 75% of a kill you get. Tracking assists is a valuable bit of data on your performance.

Actually, his contributions were making the enemy move/maneuver in a way to put them out of position.

#120 Joseph Mallan

    ForumWarrior

  • FP Veteran - Beta 1
  • 35,216 posts
  • Location: Mallanhold, Furillo

Posted 11 November 2013 - 07:12 AM

View PostKunae, on 11 November 2013 - 07:03 AM, said:

Actually, his contributions were making the enemy move/maneuver in a way to put them out of position.

This too! Though it could be argued that is part of Assisting. :D




