MischiefSC, on 12 April 2014 - 04:43 PM, said:
It actually wouldn't be a hard variable to solve for. Playing in a premade doesn't necessarily give an advantage; what you're really wanting to measure is synergy: how well specific players do together.
So you'd track a paired Elo for every player who plays with another player; this way it self-corrects for the size of the premade team. They get an Elo modifier based on the difference between their 'stock' Elo and their premade Elo with that player, averaged with everyone in that specific premade instance. You'd also want a minimum seed threshold before you applied the variable. Dunno what the standard deviation is for Elo in MW:O.
<snip..>
Good thoughts. I'm not sure I 100% understand your approach, so apologies if I've misinterpreted something (and please correct me if I have!).
It definitely reflects the biases that can impact player skill better than the simple per-weight-class system we currently have. However, here are some issues that would need to be considered with pairwise tracking of Elo:
This is O(N^2) in memory cost on the number of players. We currently have around 1.6 million player accounts. Not all are active, but an enormous percentage are or were active at some point, have played games, and have stats and Elo values associated with their accounts.
A non-sparse matrix of Elos would then cost around 2.56 trillion entries in the worst case. Would this be a symmetric matrix of Elos, i.e. would player A's Elo when playing with player B be the same as player B's Elo when playing with player A? If so, storage costs could be cut in half.
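For concreteness, here's a minimal sketch of what symmetric pairwise storage would look like, assuming we canonicalize each pair key. The names `pair_key` and `elo_by_pair` and the 1500.0 seed value are hypothetical, not our actual schema:

```python
# Minimal sketch: one Elo per unordered pair, via a canonicalized key.
# pair_key, elo_by_pair, and the 1500.0 seed are hypothetical names/values.

def pair_key(player_a: int, player_b: int) -> tuple[int, int]:
    """Order the two account IDs so (A, B) and (B, A) hit the same entry."""
    return (player_a, player_b) if player_a < player_b else (player_b, player_a)

elo_by_pair: dict[tuple[int, int], float] = {}

def get_pair_elo(a: int, b: int, seed: float = 1500.0) -> float:
    return elo_by_pair.get(pair_key(a, b), seed)

# Worst-case dense storage over N players:
N = 1_600_000
print(N * N)             # ~2.56 trillion ordered entries
print(N * (N - 1) // 2)  # ~1.28 trillion if symmetric
```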
For the 3-player group case we now have several Elos being tracked: A-B, B-C, and C-A, plus potentially the inverse cases B-A, C-B, and A-C.
For the 4-player group case we have A-B, A-C, A-D, B-C, B-D, and C-D, plus potentially the inverse cases.
I think this works out to N choose 2 in the general case, i.e. N(N-1)/2; so a 12-player group (the worst case) costs 66 additional stat writes, or 132 with the inverse cases. That could definitely cause us issues with additional write pressure at end of game.
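As a sanity check on those counts, a quick sketch of the end-of-match write fan-out (the group sizes are just the examples above; the account IDs are placeholders):

```python
# Sketch: how many pairwise Elo writes one match would trigger.
from itertools import combinations

def pair_updates(group: list[int]) -> list[tuple[int, int]]:
    """Every unordered player pair in the group that would need a stat write."""
    return list(combinations(sorted(group), 2))

for size in (3, 4, 12):
    group = list(range(size))  # placeholder account IDs
    print(size, len(pair_updates(group)))  # 3 -> 3, 4 -> 6, 12 -> 66 (x2 if asymmetric)
```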
Furthermore, I'm not certain this is a true general solution. Consider a case where players A and B perform *very* well when teamed with C, thanks to C's leadership abilities, but terribly with D. The pairwise Elo between A and B would then depend heavily on the presence of C or D in their group.
To fully generalize specific Elo tracking, I think we might need a separate Elo for every single group playing the game. Instead of a 2-D matrix of pairwise Elos, this takes us to an N-dimensional matrix, which is definitely outside the realm of feasibility in terms of storage and data-tracking costs.
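A quick back-of-the-envelope illustrates why, assuming group sizes of 2 through 12:

```python
# How many distinct groups of size 2..12 could form among N players?
from math import comb

N = 1_600_000
total = sum(comb(N, k) for k in range(2, 13))
print(f"{total:.2e}")  # roughly 6e65 possible groups, hopeless to track individually
```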
There's another problem with highly fractured Elos. We already see a bit of this with the 4 Elo categories we have today, as players switch weight classes and find wildly different match quality as a result. The more categories of Elo we store and track, the more Elos in our system are not seeded or updated correctly for a given player. Here is one example with pairwise tracking: A played with B a few times back in closed beta, and then B takes a long break. A continues playing and becomes very skilled indeed. If B returns and A and B decide to play together again, A's archived pairwise Elo with B still has him categorized as a new player!
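One could imagine papering over the staleness by decaying trust in an old pair Elo toward the players' current solo Elos. This is purely a hypothetical sketch (the half-life and function names are made up, not anything we've built), and it only patches one symptom:

```python
# Hypothetical mitigation sketch: age a stale pairwise Elo toward the
# mean of the players' current solo Elos.
import time

HALF_LIFE_DAYS = 90.0  # assumed decay constant, purely illustrative

def effective_pair_elo(pair_elo: float, last_played_ts: float,
                       solo_elo_a: float, solo_elo_b: float,
                       now: float | None = None) -> float:
    """Trust the archived pair Elo less the longer the pair hasn't played."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - last_played_ts) / 86400.0)
    trust = 0.5 ** (age_days / HALF_LIFE_DAYS)  # 1.0 when fresh -> 0.0 when ancient
    return trust * pair_elo + (1.0 - trust) * (solo_elo_a + solo_elo_b) / 2.0
```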
For these reasons I would really love to explore the options we have with tracking just a few Elos, say a solo Elo and a grouped Elo only, and then examine other emergent properties of team play to infer other sources of bias. This is where I would really want to pull in a data-mining / statistics expert to assist. Are there correlations with group size? With Mech tonnage? With kill/death ratios? With number of games played? Do these factors interact, e.g. group size combined with number of games played? We certainly have the terabytes of data required to curve-fit or otherwise experiment with.

I realize we won't be able to take this all the way to perfect, but I'm not sure that's a reasonable goal to set out with in the first place. The cool thing is that this is one of the places where we can actually run controlled experiments: on launch of an updated system, we would be able to say with confidence, 'this new system is X% better at predicting group player skill'.
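To make that concrete, a first exploratory pass might look something like this. The match-log columns (`group_size`, `avg_tonnage`, `kd_ratio`, `games_played`, `won`) and the file name are invented for illustration, not our real schema:

```python
# Exploratory sketch of the correlation questions above, using pandas.
import pandas as pd

matches = pd.read_csv("match_log.csv")  # hypothetical per-player match export

# Pairwise correlations between candidate bias factors and match outcome.
factors = ["group_size", "avg_tonnage", "kd_ratio", "games_played", "won"]
print(matches[factors].corr())

# Interaction check: does the group-size effect depend on experience?
matches["exp_bucket"] = pd.qcut(matches["games_played"], 4, labels=False)
print(matches.groupby(["exp_bucket", "group_size"])["won"].mean())
```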