I'd like to briefly pick up on the strength estimates I mentioned earlier.
First, 343i is likely using Microsoft's TrueSkill algorithm. Paper linked here if you're interested: Microsoft's Research TrueSkill Study. It is quite similar to the Glicko system, which is a good rating system in its own right, but Glicko is on version 2.0 now, so maybe Microsoft could update TrueSkill someday: Glicko-2.
Also, in your proposal you recommended using order statistics to solve the matching problem, but you might be confusing matchmaking with rank. Order statistics are used for determining rank over time; matchmaking is completely independent of that. What you want is for smurfs not to get paired with legitimately good players at all, or to reduce the consequences for the good players (which is where adjusting TrueSkill comes into play). That can be done in several ways, but here are two: reduce the variance tolerance in matchmaking to cut smurfs out of high-level team games entirely (Gold or higher), or favour the quantity of wins over the probability of a win when adjusting MMR. That way a single smurf game wouldn't matter if you were consistently winning otherwise.
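To make the first option concrete, here's a minimal sketch of a variance-gated matchmaking filter. All names and thresholds (GOLD_THRESHOLD, MAX_SIGMA) are my own illustrative assumptions, not anything 343i actually uses:

```python
# Hedged sketch: gate high-level matches on rating uncertainty, so that
# fresh/smurf accounts (which still carry a large sigma) are excluded
# from ranked team games at Gold level or higher.

GOLD_THRESHOLD = 25.0   # hypothetical skill level for "Gold or higher"
MAX_SIGMA = 4.0         # hypothetical variance tolerance for team games

def eligible_for_high_level_team_game(mu, sigma):
    """A new or smurf account keeps a large sigma until it has played
    enough games, so a tight sigma cap filters it out of high-level
    matches without affecting established players."""
    if mu >= GOLD_THRESHOLD and sigma > MAX_SIGMA:
        return False  # rating still too uncertain for high-stakes pairing
    return True

print(eligible_for_high_level_team_game(30.0, 8.3))  # fresh smurf -> False
print(eligible_for_high_level_team_game(30.0, 2.5))  # established -> True
```

The point is just that uncertainty, not skill estimate alone, is what identifies a smurf early on.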
Perhaps a few plots clarify my point here as I'm pretty sure the effect of smurfing could, in theory, be reduced by some minor adjustments to the ranking system.
To simplify things, I'll only talk about 2v2s in this example as it gives us some nice plots.
At the moment, I'm pretty sure we're seeing something (~) like this:
http://www.wolframalpha.com/input/?i=plot+(x%2By)%2F8,x%3D1..4,y%3D1..4
Where x is the MMR of player 1, y is the MMR of player 2, and z is the attributed team strength.
Let's take a look at 3 example teams A = (2.5, 2.5), B = (3.5, 3.5) and C = (4.0, 1.0).
Clearly, even in 2v2, pairing a good player 1 with an (apparently) bad player 2 (as in C) can easily yield the same team strength (z = 0.625) as two not-so-good players (Team A, z = 0.625), while Team B appears much stronger (z = 0.875).
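The three example teams can be checked directly against the linear model from the plot (this is just my assumed model of the current system, not confirmed behaviour):

```python
def team_strength_linear(x, y):
    """Assumed current model: team strength is the scaled sum of MMRs,
    i.e. z = (x + y) / 8 as in the Wolfram Alpha plot above."""
    return (x + y) / 8

A = team_strength_linear(2.5, 2.5)
B = team_strength_linear(3.5, 3.5)
C = team_strength_linear(4.0, 1.0)
print(A, B, C)        # 0.625 0.875 0.625
print(A == C)         # True: smurf pairing C looks identical to Team A
```

That A == C equality is exactly the smurf problem: an additive model can't distinguish one dominant player from two average ones.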
Of course, we don't know if the ranking system really works this way, but this would have been my first approach and it explains the current dilemma.
Let's assume we now add some logarithmic weight to the MMR of the dominant player.
We then would see something like this:
http://www.wolframalpha.com/input/?i=plot+(max(x,+y)+*+(log(max(x,+y))+%2F+log(6))+%2B+min(x,+y)+*+(1+-+(log(max(x,+y))+%2F+log(6))))+%2F+8,+x+%3D+1..4,+y+%3D+1..4
With the following values for z: 0.3125 (A), 0.4375 (B) and 0.415 (C).
Team B would still be better than C, but the single strong MMR of C would yield a score larger than A's and closer to B's.
This is by no means a proposal, but rather an explanation of why I think this whole problem with smurfs could be solved by more carefully chosen team ranks.
That's a very good read, thank you.
I actually never thought much about the matchmaking internals but TrueSkill seems to be a neat idea.
Perhaps my assumptions were slightly too naive, but two points come to mind after reading the TrueSkill study.
1) In the factor graph, each player exhibits a performance p_i centred around their skill s_i. The team performance t_j is then modelled as the sum of its members' performances. I wonder whether this assumption in the team strength estimation isn't quite similar to the one I talked about earlier.
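The additive team model from the paper can be sketched in a few lines; BETA and the skill values here are my own placeholder numbers, not the paper's parameters:

```python
# Sketch of the TrueSkill team model: each player's performance p_i is
# Gaussian around their skill s_i, and the team performance t_j is the
# SUM of member performances -- which is the same linearity I questioned.
import random

BETA = 1.0  # hypothetical performance variance per player

def sample_team_performance(skills, rng):
    """Draw one team performance t_j = sum of p_i, p_i ~ N(s_i, BETA^2)."""
    performances = [rng.gauss(s, BETA) for s in skills]
    return sum(performances)

smurf_team = [4.0, 1.0]  # one dominant, one (apparently) weak player
even_team = [2.5, 2.5]

# In expectation both teams sum to 5.0, so the additive model treats
# them as equally strong -- matching the first linear plot above.
print(sum(smurf_team) == sum(even_team))  # True
```

So if the paper's sum-of-performances assumption holds, the smurf ambiguity from my earlier example carries over directly into TrueSkill's team estimate.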
2) The paper does not answer how you derive the KFR adjustments from the match predictions.
You said I confused matchmaking with the KFR adjustments, but did I, really?
I would claim that the adjustments to the KFR are (almost) orthogonal to the MMR adjustments.
Regarding your two proposals:
A) I would guess that reducing the variance tolerance in matchmaking is probably not feasible, as the player base is not large enough.
B) We probably shouldn't favour quantity of wins over the probability of a win when adjusting MMR.
And except at peak times, the matchmaking system shouldn't have much of a choice in Halo Wars 2 anyway.
But if we now come to the question of how we update a KFR rank based on a game prediction, I'd still claim there is the option to focus on dominant team members.
Considering a strong Team X and a team Y containing a smurf or an inactive-but-experienced player, you might want to take the team strength distribution into account when updating the KFR, even though X might have a higher probability of winning according to TrueSkill.
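As a rough illustration of what "taking the team strength distribution into account" could mean: dampen the KFR loss when the winning team's MMR spread suggests a smurf. Every name and number here is my own invention, not anything from the game or the paper:

```python
# Purely illustrative sketch: treat a loss against a team with a very
# uneven MMR distribution as less informative, and scale down the KFR
# penalty so established players aren't punished for an unlucky pairing.

def kfr_loss(base_loss, winner_mmrs, spread_threshold=2.0, damping=0.5):
    """If the winners' MMR spread exceeds the threshold (a possible
    smurf signature), the KFR loss is scaled down by the damping factor."""
    spread = max(winner_mmrs) - min(winner_mmrs)
    if spread > spread_threshold:
        return base_loss * damping
    return base_loss

print(kfr_loss(10.0, [3.5, 3.5]))  # even winners: full loss, 10.0
print(kfr_loss(10.0, [4.0, 1.0]))  # suspicious spread: damped, 5.0
```

The same idea could be applied symmetrically to KFR gains, so a smurf-carried win is also worth less.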