
The Jarl's List: The Leaderboard Tool You've Been Waiting For!


349 replies to this topic

#101 Nightbird

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • The God of Death
  • The God of Death
  • 7,518 posts

Posted 14 November 2017 - 03:33 PM

View PostScurro, on 14 November 2017 - 08:52 AM, said:


Here are the last three leaderboard seasons, pulled straight from the database. They include some other data that is used in calculating scores.

Tier and mech information is not published by PGI and I am unable to pull that data.


Awesome, I'll look into it Saturday, too burned out on weekdays

#102 Nightbird

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • The God of Death
  • The God of Death
  • 7,518 posts

Posted 14 November 2017 - 03:44 PM

View PostMischiefSC, on 14 November 2017 - 02:10 PM, said:


Tonnage.



I would bet that if you put the 12 best players together in a random group of 24 and this group had 50% less tonnage, it would still win most of its games. I.e., tonnage is one predictor and might make a difference when total skill is balanced. I for one love to see plenty of 100 tonners when I play a locust :)


View PostMischiefSC, on 14 November 2017 - 02:10 PM, said:

Also the MM is not there to give a 50/50 chance; it's there to test predictions. You really want it to be off by as much as 8% to start, usually though about 4% assuming it has the confidence of 120+ matches per player involved. You want the variance to be off by a bit more than estimated margin of error. At least until you get the players to a point where your confidence in their score is 98-99%, but that's a lot of matches.


The MM is there to make balanced matches; 50/50 is the goal. In my current season, with only some 50 matches, I'm already more than 3 standard deviations away from 50/50 per the binomial distribution, so I am sure it doesn't work. Otherwise, what is the point of MM? Why not just randomly put people together?
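For anyone who wants to check a claim like this on their own record: if every match really were a coin flip, the win count over n matches would follow a binomial distribution with mean n/2 and standard deviation sqrt(n)/2. Below is a minimal sketch of that check. The 37 wins in 50 matches is a made-up record for illustration, not anyone's actual stats from this thread, and the function names are my own.

```python
import math

def z_score_vs_fair(wins: int, matches: int, p: float = 0.5) -> float:
    """Standard deviations between the observed win count and the binomial mean."""
    mean = matches * p
    sd = math.sqrt(matches * p * (1 - p))
    return (wins - mean) / sd

def binomial_tail(wins: int, matches: int, p: float = 0.5) -> float:
    """Exact probability of getting at least `wins` wins in `matches` matches."""
    return sum(
        math.comb(matches, k) * p**k * (1 - p) ** (matches - k)
        for k in range(wins, matches + 1)
    )

# Hypothetical record (not taken from the thread): 37 wins in 50 matches.
print(f"z = {z_score_vs_fair(37, 50):.2f}")
print(f"P(at least 37 wins if truly 50/50) = {binomial_tail(37, 50):.6f}")
```

With 50 matches the standard deviation is about 3.5 wins, so "more than 3 standard deviations" means roughly 36 or more wins (or 14 or fewer).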


View PostMischiefSC, on 14 November 2017 - 02:10 PM, said:

Oddly the more I think about it the more I think you sorta DO want tiers - just based off confidence in the player's ranking. So the higher the confidence level the tighter you can try to make the balance for matchmaking and still be making accurate adjustments from the results.


Tier is something to counter skill. Skill makes you win most of your matches, but if you're matched against someone of equal tier, it SHOULD lower you back to 50% again. If Tiers did their job, most people would have a WLR of 1, with rare exceptions of 1.3, 1.5, etc, but you would never see 3, 4, 5 lol!

View PostMischiefSC, on 14 November 2017 - 02:10 PM, said:

KDR/Damage/Match Score are associated metrics, not predictors. At the highest tiers of performance you'll very likely see a tight correlation because everyone already has a comparable skillset. Outside of that top few % (probably 7 or 8%; on the skill curve that's the point where it turns into more of a cliff than a curve) W/L will have a much weaker relation to other metrics.


Past performance is a predictor of future performance. If that statement is obviously true to you, then by analyzing past performance using metrics, you can build a model to predict future wins and losses. If you predict more losses than wins, then the 'tier' should be lowered; vice versa if you predict more wins. In other words, performance and tier should negate each other with a perfect matchmaker.

#103 Tarogato

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • Civil Servant
  • Civil Servant
  • 6,558 posts
  • LocationUSA

Posted 15 November 2017 - 11:22 PM

View PostNightbird, on 14 November 2017 - 01:03 PM, said:

Given any group of 24 players, it should be possible to create teams with close to 50% win loss chance, but more data means more predictors.

[...]


What I'll try to do is use one season to generate weights for stats we have, like KDR, Kills per match, avg match score, and see if I can predict the W/L ratio better than the MM. Remember, the MM always predicts 1, so I want to be on the right side of 1 and closer to the true W/L ratio. What will this accomplish? It will mean I can generate a better number to represent a player's Tier than the current PSR.

This better number can be used in place of the PSR formula, to create better matches.


You might be interested in this, though I'd hold off on reading it just yet if you want to come up with your own method independently first - findings in here *could* potentially influence what you do. https://mwomercs.com...is-of-the-12-0/

Those matches were recorded during a certain time period, I still have all the time-relevant data concerning the players involved, and the match results. Shouldn't be hard to engineer your method to predict results and compare to what actually happened.


View PostMischiefSC, on 14 November 2017 - 02:10 PM, said:

Tonnage.

Have a look at my thingy in the link above.

#104 MischiefSC

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPip
  • The Benefactor
  • The Benefactor
  • 16,697 posts

Posted 16 November 2017 - 12:51 AM

View PostNightbird, on 14 November 2017 - 03:44 PM, said:


I would bet that if you put the 12 best players together in a random group of 24 and this group had 50% less tonnage, it would still win most of its games. I.e., tonnage is one predictor and might make a difference when total skill is balanced. I for one love to see plenty of 100 tonners when I play a locust :)




The MM is there to make balanced matches; 50/50 is the goal. In my current season, with only some 50 matches, I'm already more than 3 standard deviations away from 50/50 per the binomial distribution, so I am sure it doesn't work. Otherwise, what is the point of MM? Why not just randomly put people together?




Tier is something to counter skill. Skill makes you win most of your matches, but if you're matched against someone of equal tier, it SHOULD lower you back to 50% again. If Tiers did their job, most people would have a WLR of 1, with rare exceptions of 1.3, 1.5, etc, but you would never see 3, 4, 5 lol!



Past performance is a predictor of future performance. If that statement is obviously true to you, then by analyzing past performance using metrics, you can build a model to predict future wins and losses. If you predict more losses than wins, then the 'tier' should be lowered; vice versa if you predict more wins. In other words, performance and tier should negate each other with a perfect matchmaker.


Matchmaker consists of more than one component. One is the predictive model that identifies who is worth what - that's Elo, as an example. The other is the team builder part that puts everyone together. They're two separate things.
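For readers who haven't met Elo before, here is a minimal sketch of the textbook formula, shown only to make the "predictive model" half concrete. It is not a description of what MWO's matchmaker actually does; the 400-point scale and K = 32 are the conventional chess defaults, and treating a team as the average of its members' ratings is purely my assumption for the example.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score (roughly, win probability) of side A against side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """New rating after one result; `actual` is 1.0 for a win and 0.0 for a loss."""
    return rating + k * (actual - expected)

# Hypothetical team ratings, each taken as the average of its 12 players.
team_a, team_b = 1600.0, 1400.0
p_a = elo_expected(team_a, team_b)
print(f"Predicted chance team A wins: {p_a:.3f}")
print(f"Team A rating after an upset loss: {elo_update(team_a, p_a, 0.0):.1f}")
```

The prediction (`elo_expected`) and the correction (`elo_update`) are the model; deciding which 24 players go on which side is the separate team builder step.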

A common misconception is that they're there to make matches as close to perfect as possible - the problem is that it's not really possible. Even the same person can vary by no small bit over the course of a day from mood, calories in their blood, exhaustion, focus, distractions, ping, there's absolutely some wiggle room in there.

Beyond that you have two pieces to a predictive model for something like a matchmaker. You have an estimated value and then you have a confidence level in that value. That's more than just a margin of error, you have to try and figure out how much you trust your own ranking.
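One simple way to picture "an estimated value plus how much you trust it" is a confidence interval on a win rate, which narrows as the match count grows. The sketch below uses the Wilson score interval at roughly 95% (z = 1.96). This is only an illustration of the idea, not the method any particular matchmaker uses, and the 60% win rate and match counts are made-up inputs.

```python
import math

def wilson_interval(wins: int, matches: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate; z = 1.96 is roughly 95% confidence."""
    if matches == 0:
        return (0.0, 1.0)  # no data yet, so no information
    p_hat = wins / matches
    denom = 1 + z**2 / matches
    centre = (p_hat + z**2 / (2 * matches)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / matches + z**2 / (4 * matches**2)) / denom
    return (centre - half, centre + half)

# Hypothetical player with a 60% win rate at different sample sizes.
for wins, matches in ((12, 20), (72, 120), (600, 1000)):
    low, high = wilson_interval(wins, matches)
    print(f"{matches:>4} matches: win rate between {low:.3f} and {high:.3f}")
```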

So to get rankings more precise you need to intentionally skew odds in the match to get test results you can trust enough to then modify their stats. So if you've only got a 90% confidence in a team's estimated win value but the match is only a 3% variance you can't really trust the results enough to then adjust people's values afterward. So you actually need periodic unbalanced matches to test your values. You do this until you get their confidence level to 98 or 99% (or better), at which point you can start trying to tighten match values up.

That's where I'm saying tiers need to be based on confidence level, so you play enough matches with some swing in balance that your confidence gets dialed in. Then you can have closer matches.

As to always being at a 1.0 w/l.... still not going to happen for most people. We're learning creatures. Learning curve is still a curve. What actually happens when you get something like a really good matchmaker and people play consistently with people at their own skill level and slightly better... everyone starts to improve. You stretch the curve and keep stretching it. It's why we keep breaking records in sports and what constitutes a 'good season' has gotten more and more difficult in almost everything.

The predictive statistical analysis around something like a matchmaker is actually really cool (if you're a statistics nerd, which it sounds like you are). Getting an accurate predictive model around human behavior (normally what I deal with is employee performance and customer behaviors but it's surprisingly similar) involves intentionally created inaccuracy in a carefully pre-selected amount and used to identify which direction of the inaccuracy someone consistently falls on, until you can successfully identify the bits about them you don't know.

The cunning use of math is sexy.

#105 MischiefSC

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPip
  • The Benefactor
  • The Benefactor
  • 16,697 posts

Posted 16 November 2017 - 12:56 AM

View PostTarogato, on 15 November 2017 - 11:22 PM, said:

You might be interested in this, though I'd hold off on reading it just yet if you want to come up with your own method independently first - findings in here *could* potentially influence what you do. https://mwomercs.com...is-of-the-12-0/

Those matches were recorded during a certain time period, I still have all the time-relevant data concerning the players involved, and the match results. Shouldn't be hard to engineer your method to predict results and compare to what actually happened.



Have a look at my thingy in the link above.


I just looked over it again. I enjoyed it when it first came out.

Since match score is buffed by winning it makes sense that the people with the best w/l rate will have a higher match score; match score is strongly adjusted by w/l.

What I really want to know is how big the pool is and the average tonnage breakdown, plus high/low tonnage breakdown for who's in queue. That would give us a better idea of what we're looking at.

#106 Lily from animove

    Member

  • PipPipPipPipPipPipPipPipPipPipPip
  • The Devoted
  • The Devoted
  • 13,891 posts
  • LocationOn a dropship to Terra

Posted 17 November 2017 - 09:48 AM

View PostDavegt27, on 07 September 2017 - 12:43 AM, said:

I don't see this guy he says he is the best




Where are these 0 aim people, and why does that Nova have 4 SRM 6s? Wow, that's Adder level of equipment, xD

#107 Nightbird

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • The God of Death
  • The God of Death
  • 7,518 posts

Posted 17 November 2017 - 09:57 AM

Don't worry, everything I do will come with a P-value :)

I won't look at that thread Taro linked until after I finish. I already plan to test interactions to see if they're relevant. For example, yes, score per match correlates with w/l, but I expect a much higher correlation with kills/match.
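As a rough illustration of what "a correlation with a P-value" can look like without any stats package, here is a sketch using Pearson's r and a permutation test: shuffle one column many times and count how often the shuffled correlation is at least as strong as the real one. The eight data points are invented for the example and are not leaderboard data, and the permutation test is my own choice of technique here.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # +1 keeps the estimate away from exactly 0

# Invented numbers for eight hypothetical players.
kills_per_match = [0.4, 0.6, 0.8, 1.0, 1.2, 1.5, 1.8, 2.2]
wl_ratio = [0.7, 0.8, 1.0, 0.9, 1.2, 1.4, 1.6, 2.1]
print(f"r = {pearson_r(kills_per_match, wl_ratio):.3f}")
print(f"p = {permutation_p_value(kills_per_match, wl_ratio):.4f}")
```

Running the same two functions on score per match vs. w/l would give the comparison described above.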

#108 Tarogato

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • Civil Servant
  • Civil Servant
  • 6,558 posts
  • LocationUSA

Posted 01 December 2017 - 07:15 AM

The database has been updated to Season 17 and Scurro and I have added a new feature:

Now when searching for any single player, you will get a detailed breakdown of their stats for each season/month in addition to their overall stats. This even includes %improvement in Adjusted Score, if you like to track your monthly progress.



Known issues:
- if searching for the same player twice with mixed cases (e.g., "tarogato" and "Tarogato") the entire query will be voided.
- if you are exactly on a higher percentile, such as "99.00%", it will display only as "99%".
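For the curious, both issues are the kind of thing a little input and output normalization usually handles. The sketch below is not the site's actual code (which isn't shown anywhere in this thread); it is a guess at the general approach, written in Python purely for illustration, with hypothetical function names.

```python
def dedupe_names(names):
    """Drop repeated player names, ignoring case; keeps first spelling and order."""
    seen = set()
    unique = []
    for name in names:
        cleaned = name.strip()
        key = cleaned.casefold()  # case-insensitive comparison key
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique

def format_percentile(value: float) -> str:
    """Always show two decimals, so 99 is rendered as 99.00%."""
    return f"{value:.2f}%"

print(dedupe_names(["tarogato", "Tarogato", "Scurro"]))
print(format_percentile(99.0))
```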








Inside the spoiler, a gratuitous bit of history for those so-inclined:

Spoiler


#109 Brauer

    Member

  • PipPipPipPipPipPipPipPip
  • 1,066 posts

Posted 01 December 2017 - 07:39 AM

View PostNightbird, on 17 November 2017 - 09:57 AM, said:

Don't worry, everything I do will come with a P-value :)

I won't look at that thread Taro linked until after I finish. I already plan to test interactions to see if they're relevant. For example, yes, score per match correlates with w/l, but I expect a much higher correlation with kills/match.


Thanks for providing this resource and adding the new feature! As a fairly new MWO player (started in season 12) it's interesting to see my per season stat trends. Season 14 and 15 are my statistical peak so far, and I think that coincides with my time in Tiers 3 and 2 when I started to get the hang of things and consistently perform well for the tier. I know there is plenty of subpar play in Tier 1, and I am guilty of it at times, but at least for me there has been a definite difference.

#110 MischiefSC

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPip
  • The Benefactor
  • The Benefactor
  • 16,697 posts

Posted 01 December 2017 - 07:40 AM

I am hugely grateful for this improved leaderboard, it's reams more useful than the existing one.

However I'm struggling to figure out the reasons for the weighting - for example, my best month from a population percentile perspective was season 12 - however I only had 57 matches, my w/l was 1.59 and my KDR was 1.89 and my match score was 316. I mostly played heavies.

Season 17 my win/loss was 2.4, KDR 2.02 and because there was almost no missile use of any sort my match score was 296 average. I also had 153 matches, nearly 3x as many, to give me more accurate results.

The only difference is that in season 12 I played like 6 matches in lights, whereas in season 17 I played no matches in lights but 18 matches in mediums; in both instances the rest of the season was split between heavies and assaults.

So by playing fewer total matches but half a dozen drops in lights my ranking is 6% of the population higher, even though I managed to drive wins in 71% of my QP matches in season 17 vs 61% in season 12, and improved my survivability by 25% from season 12 to 17. Match score average is 20 pts lower (dem lazors vs SRMs and dakka because KMDD ftw) but it seems like the difference between lights and assaults is just... huge.
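In case anyone wonders where the 71% and 61% come from: a win/loss ratio r converts to a win percentage as r / (1 + r), ignoring ties. A tiny sketch using the two ratios quoted above (the function name is just for illustration):

```python
def win_rate_from_wl(wl_ratio: float) -> float:
    """Convert a win/loss ratio into the fraction of matches won (ties ignored)."""
    return wl_ratio / (1.0 + wl_ratio)

# W/L ratios as quoted in the post above.
for season, wl in (("12", 1.59), ("17", 2.4)):
    print(f"Season {season}: W/L {wl} -> {win_rate_from_wl(wl):.0%} wins")
```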

Is it really that significant? So if I am a good light pilot or I shift focus away from winning and more toward big score matches it's that much more significant? What's the basis for the spread on the skew for light/med/heavy/assault?

Not saying it's not an accurate representation (though I would argue I am a significantly better player now than I was 5 months ago) but I'm trying to understand the reason for the weighting as is.

#111 NeirSolon

    Member

  • PipPip
  • Trinary Star Captain
  • 39 posts

Posted 01 December 2017 - 07:59 AM

Huge kudos to you and your team, Taro. Great work that keeps getting better. Now if only they would allow us to split solo from group queue.

#112 Xiphias

    Member

  • PipPipPipPipPipPipPip
  • Littlest Helper
  • Littlest Helper
  • 862 posts

Posted 01 December 2017 - 08:41 AM

View PostMischiefSC, on 01 December 2017 - 07:40 AM, said:

However I'm struggling to figure out the reasons for the weighting - for example, my best month from a population percentile perspective was season 12 - however I only had 57 matches, my w/l was 1.59 and my KDR was 1.89 and my match score was 316. I mostly played heavies.

Season 17 my win/loss was 2.4, KDR 2.02 and because there was almost no missile use of any sort my match score was 296 average. I also had 153 matches, nearly 3x as many, to give me more accurate results.

The only difference is that in season 12 I played like 6 matches in lights, whereas in season 17 I played no matches in lights but 18 matches in mediums; in both instances the rest of the season was split between heavies and assaults.

So by playing fewer total matches but half a dozen drops in lights my ranking is 6% of the population higher, even though I managed to drive wins in 71% of my QP matches in season 17 vs 61% in season 12, and improved my survivability by 25% from season 12 to 17. Match score average is 20 pts lower (dem lazors vs SRMs and dakka because KMDD ftw) but it seems like the difference between lights and assaults is just... huge.

Is it really that significant? So if I am a good light pilot or I shift focus away from winning and more toward big score matches it's that much more significant? What's the basis for the spread on the skew for light/med/heavy/assault?

Not saying it's not an accurate representation (though I would argue I am a significantly better player now than I was 5 months ago) but I'm trying to understand the reason for the weighting as is.

It's not the fact that you dropped in lights that increased your percentage rank significantly, it's the fact that your base score was higher. In season 12 you have a MS of 316 (adjusted to 300) with 2% (mediums) unaccounted for. In season 17 you have a MS of 296 (adjusted to 270) with 3% (lights) unaccounted for. The biggest difference is the base MS, not the adjustment. In season 16 you have a MS of 315 (adjusted to 281.5) with 15% (lights+meds) unaccounted for.

The reason for the much bigger drop is that the adjusted score does not take into account classes where you've dropped fewer than 10 matches when it calculates the score.

As for the weighting issue between classes, you can show with the stats that heavier weight classes consistently average higher matchscores. Because the leaderboard is based on MS, farming damage will always be the way to get a higher score and heavier classes are better at doing this. The weight is trying to normalize the "skill" across weight classes, since with two equally skilled pilots a 100% heavy pilot is going to outscore a 100% light pilot.
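To make the idea concrete, here is a rough sketch of a weight-class adjustment of this general shape. The multiplier values are invented for illustration, since the real ones are not given in this thread, and the real formula may well differ; only the "fewer than 10 matches in a class is ignored" rule comes from the explanation above.

```python
# Hypothetical multipliers, NOT the real values used by the Jarl's List.
CLASS_MULTIPLIER = {"light": 1.10, "medium": 1.00, "heavy": 0.95, "assault": 0.90}
MIN_MATCHES = 10  # classes with fewer matches than this are left out

def adjusted_score(avg_match_score, matches_by_class):
    """Scale a raw average match score by a play-weighted class multiplier."""
    counted = {c: n for c, n in matches_by_class.items() if n >= MIN_MATCHES}
    total = sum(counted.values())
    if total == 0:
        return avg_match_score  # nothing to weight by, so leave the score alone
    multiplier = sum(CLASS_MULTIPLIER[c] * n for c, n in counted.items()) / total
    return avg_match_score * multiplier

# Two hypothetical pilots with the same raw score in different classes.
print(adjusted_score(300, {"light": 50}))
print(adjusted_score(300, {"heavy": 30, "assault": 20, "light": 6}))
```

In the second call the 6 light matches fall under the threshold, so only the heavy and assault games shape the multiplier.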

#113 MischiefSC

    Member

  • PipPipPipPipPipPipPipPipPipPipPipPip
  • The Benefactor
  • The Benefactor
  • 16,697 posts

Posted 01 December 2017 - 09:03 AM

View PostXiphias, on 01 December 2017 - 08:41 AM, said:

It's not the fact that you dropped in lights that increased your percentage rank significantly, it's the fact that your base score was higher. In season 12 you have a MS of 316 (adjusted to 300) with 2% (mediums) unaccounted for. In season 17 you have a MS of 296 (adjusted to 270) with 3% (lights) unaccounted for. The biggest difference is the base MS, not the adjustment. In season 16 you have a MS of 315 (adjusted to 281.5) with 15% (lights+meds) unaccounted for.

The reason for the much bigger drop is that the adjusted score does not take into account classes where you've dropped fewer than 10 matches when it calculates the score.

As for the weighting issue between classes, you can show with the stats that heavier weight classes consistently average higher matchscores. Because the leaderboard is based on MS, farming damage will always be the way to get a higher score and heavier classes are better at doing this. The weight is trying to normalize the "skill" across weight classes, since with two equally skilled pilots a 100% heavy pilot is going to outscore a 100% light pilot.


That makes more sense.

I just view it as an excuse to play more MRMs.

Though at this point I'm pretty convinced that CT drilling or legging with laservomit is hands down the best option to win matches.

#114 Humpday

    Member

  • PipPipPipPipPipPipPipPip
  • The Pharaoh
  • The Pharaoh
  • 1,463 posts

Posted 01 December 2017 - 09:58 AM

W00t! This is cool!
Although...I didn't quite realize how strongly I leaned on my heavy mechs. Interesting.

#115 Xiphias

    Member

  • PipPipPipPipPipPipPip
  • Littlest Helper
  • Littlest Helper
  • 862 posts

Posted 01 December 2017 - 10:12 AM

View PostMischiefSC, on 01 December 2017 - 09:03 AM, said:

Though at this point I'm pretty convinced that CT drilling or legging with laservomit is hands down the best option to win matches.

Oh certainly. Killing things quickly is the best thing for winning matches. Blowing side torsos off on the other hand is the best way to get KMDD, high matchscore, and lots of cbills if you are farming.

#116 Humpday

    Member

  • PipPipPipPipPipPipPipPip
  • The Pharaoh
  • The Pharaoh
  • 1,463 posts

Posted 01 December 2017 - 10:35 AM

When you look yourself up and you realize....you're still a potato compared to everyone...sum {Dezgra}

#117 Tarogato

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • Civil Servant
  • Civil Servant
  • 6,558 posts
  • LocationUSA

Posted 01 December 2017 - 11:57 AM

The formula for the `Adjusted Score` has been modified slightly to better account for "missing" games in the weight class multiplier. This change will have the greatest effect on players with very few matches played, particularly when looking at the individual pilot per-season stats.


For instance, if you played 10 games each in lights and mediums, but only 5 games each in heavies and assaults, then you wouldn't show up on PGI's leaderboard for heavies and assaults, but the global leaderboard would see that you played 30 games altogether. The Jarl's List sees that 30 game total, but only sees the 10 games each in lights and mediums, so it previously assumed you played 50% lights and 50% mediums, and then applied a weightclass multiplier based on that. This is now fixed so that the Jarl's List properly sees that you played 33% lights, 33% mediums, and 33% "other", and gives you a fairer multiplier.
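The arithmetic in that example can be sketched as follows. This is only an illustration of the before and after share calculation using the numbers above, not the site's actual code; the function names are made up, and how the multiplier then treats the "other" share is not covered here.

```python
def class_shares_old(visible_games):
    """Old behaviour: shares computed only from classes that appear on the leaderboard."""
    seen = sum(visible_games.values())
    return {cls: n / seen for cls, n in visible_games.items()}

def class_shares_new(visible_games, total_games):
    """New behaviour: games missing from the per-class boards are counted as 'other'."""
    if total_games <= 0:
        raise ValueError("total_games must be positive")
    shares = {cls: n / total_games for cls, n in visible_games.items()}
    missing = total_games - sum(visible_games.values())
    if missing > 0:
        shares["other"] = missing / total_games
    return shares

# Numbers from the example: 10 lights and 10 mediums visible, 30 games overall.
visible = {"light": 10, "medium": 10}
print(class_shares_old(visible))
print(class_shares_new(visible, 30))
```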




View PostHumpday, on 01 December 2017 - 10:35 AM, said:

When you look yourself up and you realize....you're still a potato compared to everyone...sum {Dezgra}

You are where I was two years ago. The best part about being there is that you begin to realise just how much breathing room you have to improve into.

Average match score becomes increasingly meaningless the higher up you go... starting at around 350 or so. But if you can reach that milestone, and maybe extend close to 400, then you've basically achieved the highest measurable echelon. Beyond that limit you need to judge players by a heck of a lot more than just their scores.

For reference, I average just over 300 raw matchscore when playing potato mechs. If you want to see how you size up, take one month and just play the best meta builds you can muster, see what kind of scores you post. Then you can go back to doing whatever, having now discovered where you sit when you relinquish any of your mech-induced limitations. =P

#118 Bluttrunken

    Member

  • PipPipPipPipPipPipPip
  • The Patron Saint
  • The Patron Saint
  • 830 posts

Posted 01 December 2017 - 12:00 PM

Good Job!

#119 Schonnes

    Member

  • Pip
  • The Pharaoh
  • The Pharaoh
  • 11 posts

Posted 04 January 2018 - 05:23 AM

Hi Scurro,

first of all - thanks for that great tool!
I really enjoy looking at the stats and comparing the guys within our unit.
Nevertheless - I think I found a bug (at least in my stats). Even though I would be really happy with the result, I think my overall adjusted score is wrong. I can't find any sensible way of calculating it that gets to that number...
Could you check that? Or - just tell me how you calculate the average?

Best Regards - Schonnes


#120 Tarogato

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • Civil Servant
  • Civil Servant
  • 6,558 posts
  • LocationUSA

Posted 04 January 2018 - 10:16 AM

View PostSchonnes, on 04 January 2018 - 05:23 AM, said:

Hi Scurro,

first of all - thanks for that great tool!
I really enjoy looking at the stats and comparing the guys within our unit.
Nevertheless - I think I found a bug (at least in my stats). Even though I would be really happy with the result, I think my overall adjusted score is wrong. I can't find any sensible way of calculating it that gets to that number...
Could you check that? Or - just tell me how you calculate the average?

Best Regards - Schonnes


Your Overall Adjusted Score gives more weight to your recent months. For instance, in your last four months you had scores of 298, 346, 377, 316 - these are weighted more heavily because they are recent. Also, you played more lighter mechs recently, which also helps your Adjusted Score (because the expectation is that lighter mechs perform slightly worse, playing them more boosts your score a little bit).
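To give a feel for why a recency-weighted average differs from a plain one, here is a small sketch. The exponential decay and the 0.8 factor are assumptions made up for this example, and treating the four scores as listed oldest first is also an assumption; the real weights are the ones described at the bottom of the site's page.

```python
def recency_weighted_average(scores_oldest_first, decay=0.8):
    """Average monthly scores, with each older month weighted `decay` times the next."""
    n = len(scores_oldest_first)
    if n == 0:
        raise ValueError("need at least one monthly score")
    # The newest month gets weight 1, the one before it `decay`, then decay**2, ...
    weights = [decay ** (n - 1 - i) for i in range(n)]
    weighted_sum = sum(w * s for w, s in zip(weights, scores_oldest_first))
    return weighted_sum / sum(weights)

# The four monthly scores quoted above, assumed to be in oldest-to-newest order.
recent = [298, 346, 377, 316]
print(f"plain mean:       {sum(recent) / len(recent):.1f}")
print(f"recency weighted: {recency_weighted_average(recent):.1f}")
```

Months further back would be added to the front of the list and would count for progressively less.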

There are some details at the bottom of the page that explain the formulae.




