Ranking & Matching in H5

OP NNMS MXMS

After realizing I've been making a false assumption (details here), I've put more thought into the problems with H4 matching and ranking. I've also spent a lot of time reading and re-reading about TrueSkill and the H2 ranking / matching system. I believe there is a better way than either of those alone.

First, it's important to list what we want a ranking system to do:

.....1. Reward behavior that contributes to a team win.
.....2. Punish behavior that contributes to a team loss.
.....3. Accurately reflect some combination of individual skill and ability to play as part of a team.
.....4. Update based on recent performance.
.....5. Be easy for players to understand.

Second, it's important to list what we want a matching system to do:

.....1. Accurately estimate a player's skill level.
.....2. Place players of nearly equal skill into matches with each other.
.....3. Match parties in such a way that each team has an equitable mix of parties and lone wolves.

These are not the same goals. They are similar - but not the same. So if you attempt to use the exact same number to accomplish both ranking and matching, you will always run into problems.

Let's say you want to punish a habitual quitter via rank reduction. In all previous systems, doing so will result in that person getting easier matches. It doesn't punish that player - it punishes all those who are forced to play against someone whose actual skill level is not properly reflected by rank.

Or let's say that you have a player to whom K/D and win percentage are more important than the rank number. That player can easily partner up with some low-level players and kill farm in CTF while allowing his team to lose. He only takes one loss (plus a boost to the K/D), but because the loss is to a much lower-ranked team, he can drop quite a few ranks. That gives him easier matches while he ranks up again, and the overall effect is to artificially inflate his win percentage.

There are other difficulties as well that do not involve intentional manipulation of rank. In the TrueSkill* system, ranks grow very stable over time. While this is good for matching purposes (since a player's skill changes only gradually), it's not ideal for ranks. We want ranks to change based on recent games to provide a constant incentive for winning every game. The H2 system definitely provides that feature - but sacrifices matching stability (which allows players relatively simple avenues to manipulate ranks and stats).

In short, there is no single system that can provide both optimal matching and optimal ranking.

H2 for Ranking

The H2 ranking system is quite beautiful, in my opinion. It is simple and easy to understand, so players will always know why they gained or lost rank. It provides a clear incentive for winning (as you cannot rank up unless your team wins). It constantly updates based on recent performance. Because it is a points-based system, it allows easy implementation of ways to punish undesirable behavior or even allowing rank to decay with time. These things make it a very elegant system for ranking. They also make it unstable and a poor estimate of true individual skill, which is an undesirable trait for a matching system.
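To make the "points-based" part concrete, here is a minimal sketch of how such a rank could be tracked. Every constant and name below (points per level, win/loss amounts, quit penalty, weekly decay, RankRecord) is an assumption picked for illustration; these are not the actual H2 values or formulas.

```python
from dataclasses import dataclass

# All constants are hypothetical, chosen only to illustrate the idea.
POINTS_PER_LEVEL = 100
WIN_POINTS = 25
LOSS_POINTS = 20
QUIT_PENALTY = 50
DECAY_PER_IDLE_WEEK = 5
MAX_LEVEL = 50


@dataclass
class RankRecord:
    """Visible 1-50 rank backed by a simple running point total."""
    points: int = 0

    @property
    def level(self) -> int:
        # Level 1 at 0 points, one more level per POINTS_PER_LEVEL, capped at 50.
        return min(MAX_LEVEL, 1 + self.points // POINTS_PER_LEVEL)

    def _add(self, delta: int) -> None:
        # Points never drop below zero.
        self.points = max(0, self.points + delta)

    def record_game(self, won: bool, quit_early: bool = False) -> None:
        # Quitting costs more than an ordinary loss; only a team win gains points.
        if quit_early:
            self._add(-QUIT_PENALTY)
        elif won:
            self._add(WIN_POINTS)
        else:
            self._add(-LOSS_POINTS)

    def decay(self, idle_weeks: int) -> None:
        # Optional rank decay for inactivity.
        self._add(-DECAY_PER_IDLE_WEEK * idle_weeks)
```

Because everything is just adding or subtracting points, a player can always trace exactly why the rank moved, and new penalties are one more line in record_game.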

TrueSkill for Matching

The TrueSkill ranking system is mathematically robust - given the right inputs. It has the ability (if used properly) to distinguish between individual skill even among members of the same team. Because it converges to a better and better (and, hence, stable) estimate of a player's skill over time, it provides an excellent means of matching players based on their actual estimated skill. The characteristics that make it good for matching, however, render it less desirable than H2 for ranking. The problems with using TrueSkill in H3/Reach/H4 were not issues with TrueSkill itself, but rather with the information that those games fed TrueSkill - a decision driven by the desire to also use TrueSkill as a ranking system.

If TrueSkill is used for matching only, it can be optimized for finding individual skill. Statistically, using individual performance (rather than team performance) provides not just more information, but also more accurate information about the true skill level of a player. Individual performance has obvious issues when used for rankings - but not for matching - so long as some measure of ability to win is included. To accomplish that, each player on the winning team should be given a win bonus prior to ordering the players by points. Points should only be awarded for activities that contribute to a win - like kills in Slayer and flag caps in CTF. Snapshot/comeback kill/assassination bonuses should never be applied. Going to individual performance and eliminating the extraneous scoring will greatly improve the matching algorithm over both H3 and H4.
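A rough sketch of the "win bonus, then order by points" step might look like the following. The size of the bonus and the names (Line, WIN_BONUS, matching_order) are made up for the example; the resulting order is what would presumably be handed to the skill update as the finishing order, which is not shown here.

```python
from typing import List, NamedTuple


class Line(NamedTuple):
    """One player's scoreboard line for a single game."""
    name: str
    team: str
    points: int  # only win-contributing actions (kills in Slayer, caps in CTF)


WIN_BONUS = 3  # hypothetical value; the right size would need tuning


def matching_order(lines: List[Line], winning_team: str,
                   win_bonus: int = WIN_BONUS) -> List[str]:
    """Return player names ordered best to worst after applying the win bonus."""
    adjusted = [
        (line.points + (win_bonus if line.team == winning_team else 0), line.name)
        for line in lines
    ]
    # Stable sort: players with equal adjusted points keep their input order.
    adjusted.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in adjusted]


game = [
    Line("A", "red", 10),
    Line("B", "red", 6),
    Line("C", "blue", 12),
    Line("D", "blue", 8),
]
order = matching_order(game, winning_team="red")
```

In this made-up game the top scorer was on the losing team, but with the bonus applied a strong player on the winning team finishes ahead of him in the order used for matching.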

Benefits

Combining the two systems as described above provides the following benefits:

.....1. A rank system that rewards winning, is easy to understand, allows punishing undesirable behavior, and updates based on recent events.

.....2. A matching system that tracks individual skill such that:

..........a. Players that sometimes play in parties and sometimes play solo can be accurately matched in both cases.
..........b. Deranking no longer has any statistical benefit (as deranking =/= getting matched against worse opponents, since TrueSkill is more stable than the ranks).
..........c. Rank punishments no longer benefit the punished player by providing easier matches (since rank and skill estimation are decoupled).
..........d. Kill farming and other means of not playing the objective result in continually harder matches with no corresponding benefit to rank.
..........e. Increased matching efficiency by starting every player at the median (rather than biased low, as in H2).
..........f. Avoids the need for complex scoring formulas (like score = K + A - 0.5 * D) because kamikaze play no longer has any rank benefit.
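
As a small illustration of points (b) and (c), here is a sketch of a player record where the visible rank and the hidden skill estimate are stored separately. The field names, the penalty size, and the "closest mean skill" matching rule are all assumptions for the example; a real TrueSkill matcher would also take the uncertainty into account.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    name: str
    rank_points: int    # visible, drives the 1-50 rank
    skill_mu: float     # hidden skill estimate, used only for matching
    skill_sigma: float  # hidden uncertainty about that estimate


def apply_quit_penalty(player: Profile, penalty: int = 50) -> None:
    """Punish quitting through the visible rank only (penalty size is hypothetical)."""
    player.rank_points = max(0, player.rank_points - penalty)


def closest_opponents(player: Profile, pool: List[Profile], n: int) -> List[Profile]:
    """Pick the n players whose mean skill is nearest; rank is never consulted."""
    return sorted(pool, key=lambda other: abs(other.skill_mu - player.skill_mu))[:n]


quitter = Profile("quitter", rank_points=400, skill_mu=30.0, skill_sigma=2.0)
pool = [
    Profile("a", rank_points=100, skill_mu=29.0, skill_sigma=2.0),
    Profile("b", rank_points=450, skill_mu=20.0, skill_sigma=2.0),
    Profile("c", rank_points=50, skill_mu=31.5, skill_sigma=2.0),
]
before = [p.name for p in closest_opponents(quitter, pool, 2)]
apply_quit_penalty(quitter)
after = [p.name for p in closest_opponents(quitter, pool, 2)]
```

The penalty lowers the rank, but the list of likely opponents is the same before and after, so the punished player does not get easier games.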

Regardless of whether you agree or disagree, hopefully this makes sense.

TL;DR: Use the H2 system for ranks. Use individual TrueSkill (adding win bonuses and removing extraneous scoring) for matching.

*Note: H3 used TrueSkill. So for those of you who pine for H3, you probably ought to know that the H3 system is exactly the same as the H4 system for team-ranked lists (objectives, proving grounds). There is zero difference between the two.
......I love you.....

Seriously though I 100% agree!
Quote:
......I love you.....
You can have my orange juice.
Thank you for this description. I think it eloquently combines the best elements of the previous Halo matching/ranking systems. One can only hope that 343i takes note :)
I love this idea!
Here's a way of visualizing both the ranking and matching systems in-game:
A player's rank will be shown as ranks 1-50, with 50 obviously being the highest (like in Halo 2, just as you mentioned). A player's TrueSkill rank, which determines how that player is matched up with others, can be shown as the military ranks that first debuted in Halo 3 (giving a nod to the Halo 3 fanatics). Just as in Halo 3, both a number rank and a military rank will be issued, but unlike in Halo 3, they will be completely independent ranks. This would also be great because if I saw a General that was only ranked, say, a 15, then I'd know that the player isn't a good team player, even though the person may very well have a large amount of skill. This would help players decide whether they want to party up, join clans, etc.

Along those lines, it'd also be awesome if ranks (1-50 and/or TrueSkill) could be displayed as a secondary emblem on a player's Spartan. Any player would have their standard Spartan emblem, an option to enable their 1-50 rank and/or TrueSkill military emblem, as well as possibly the clan emblem (should clans return; hopeful!). Again, this would be a way of showcasing any/all ranks that a player has (just as the military does IRL), and would also show allegiance to a clan, if the clan system is built into the game as it was in H2.

Just some thoughts.
I'm not sure displaying both TrueSkill and the rank is a good idea. The whole idea of the H2 ranking system is to emphasize winning as a team - and I believe that is how the majority of the fanbase would like to see games played. Establishing TrueSkill awards could detract from that, as maximizing the TrueSkill estimate necessarily means sacrificing the rank, and - by extension - the incentive to win.

Instead, I'd prefer to see icons based on playing habits. Someone who has performed well as a driver could get a wheelman icon. Someone who caps a lot of flags could get a flag icon. Things like that to indicate where the player's strengths lie. It's a slightly different take on what you wrote above, but still along the same lines.

Unless the H3 military ranks are tied directly to the 1-50 ranks, I'd rather not see them return.
.........This one is speechless....you consistently read my mind....
I'm pretty sure I've learned something here today. Not sure what it is but I learned something.

This sounds like it could work rather well... though I can't help but admit that rankings have never been my forte and this has left me kinda overwhelmed.

But this nonetheless sounds like a pretty damn solid way of building the game.
It seems like there might be a possibility of a disparity between the two numbers. If the one used for matching is then kept invisible you will get a feeling that the system isn't working.

If we're going for two numbers, then I think one should be some sort of TrueSkill-like rating and shown in some fashion - either numerically or categorically, doesn't matter.

Then players who have been active for a given period of time or number of games are ranked by percentile, creating a built-in leaderboard.

The mean and standard deviation for the active population rating can be calculated rather quickly by computer and used to make matches. Perhaps +/- 1 standard deviation for your estimated rating, as an example. Inactive players would necessarily use their previous rating for matching, but could gain a level of uncertainty that would allow that rating to become more flexible and change to fit changes in the distribution. Then a percentile could be given after a time.

I don't actually expect the distribution of skill to be normal (probably very skewed right), but it's likely a decent and functional assumption.

So, as an example, let's assume we start everyone off at a 1500 rating. Wins, losses, top 4 showings, etc. will affect that rating. After a dozen or so games the system pegs you at 1640, and assigns you to the 67th percentile. Now you know that about 67% of all active players have ratings less than 1640. If you are then paired with players around the 50th through the 75th percentile, you should be able to look at their TrueSkill-like ratings and see how you differ to verify the quality of the match.
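
A rough sketch of the percentile and the +/- 1 standard deviation window, using only the Python standard library. The function names and the tiny five-player population are invented for the example, and the window is centered on the player's own rating as suggested above.

```python
from statistics import pstdev
from typing import List, Tuple


def percentile_of(rating: float, active: List[float]) -> float:
    """Percent of active players whose rating is strictly below the given rating."""
    below = sum(1 for r in active if r < rating)
    return 100.0 * below / len(active)


def match_window(rating: float, active: List[float],
                 width: float = 1.0) -> Tuple[float, float]:
    """Rating range of +/- width standard deviations of the active population."""
    spread = pstdev(active)  # population standard deviation
    return rating - width * spread, rating + width * spread


def eligible_opponents(rating: float, active: List[float]) -> List[float]:
    """Active ratings that fall inside the player's match window."""
    low, high = match_window(rating, active)
    return [r for r in active if low <= r <= high]


# Invented population; a real one would have thousands of ratings.
active_ratings = [1300.0, 1400.0, 1500.0, 1600.0, 1700.0]
my_rating = 1640.0
my_percentile = percentile_of(my_rating, active_ratings)
my_opponents = eligible_opponents(my_rating, active_ratings)
```

With a population this small the percentile is coarse, which is one reason to only rank players by percentile once enough of them are active.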
Quote:
.........This one is speechless....you consistently read my mind....
Lol. :)

I actually posted an idea like that waaaaay back when Survivor first came out because I was so annoyed in BTB trying to get laser kills. I was tired of perks being associated with specializations. I liked the concept of some kind of medal associated with how you play . . . I just didn't want perks associated with it.
Quote:
It seems like there might be a possibility of a disparity between the two numbers. If the one used for matching is then kept invisible you will get a feeling that the system isn't working.
There will be a disparity between the two numbers. Matching needs to be based on individual skill in order to arrive at the best match. Ranking needs to be based on winning in order to incentivize the right behavior. Those are different things, so the numbers will be different.

Consider playing Snapshots in the Team Snipers list. If you play it, how many times have you seen the top players on the leaderboard end with 30+ kills . . . but only a handful are snapshots? Maximizing individual score maximizes TrueSkill . . . but leads to playing the game in a way that was not intended.

Now you could say, just count snapshots in individual score. But does that really matter? The guy with 30+ kills and no snapshots is often on the losing team. He obviously already cares more about K/D than winning. So only counting snapshots in individual score actually benefits his strategy. It ensures he remains low in TrueSkill so that he can kill at will.

If, however, you separate the two numbers and let TrueSkill reflect individual skill, he will get matched against harder and harder opponents - to the point where he won't be at the top of the scoreboard. He also won't have gained any rank. If you let the two numbers be different, then "incorrect" behavior - i.e., not playing the game the way it was intended to be played - is naturally discouraged. Sure, you could still play by not doing snapshots, but you'll end up with no rank benefit and being matched against tough opponents such that your K/D will suffer.

If you keep the numbers the same as in your proposal, you're right back to where we started: a system that is a good ranking but poor matching system, or a system that is a poor ranking but good matching system, or a system that is suboptimal for both.

Only by letting them be different can you obtain ranks that match how most people want Halo to be played and have a matching system that puts people together who have similar actual skill.

That is why there is no need to show the TrueSkill number. Knowing that number contributes nothing except the possibility that people will try to maximize it instead of rank. Only show the number that reflects playing the game as it was meant to be played: Rank.
Snapshots is a poor example to support this idea. If the game type is intended to only reward snapshots then it should only award points for them.

If you want to promote "proper" play, you make the proper way the best route to victory. Capture the flag doesn't award points for kills. If someone wants to pad K/D they can, but they'll likely lose.

It's also worth noting that KDR became a metric because a proper skill rank went away. It's a nice stat, but it mattered very little previously when a skill rank showed up next to your name.

Differentiating between the two metrics is unnecessary. A basic win/loss mitigated by individual performance and a high population is all that's required. The caveat is how individual performance is rated - highest kills might matter in a Slayer game, less so in an objective game. Which brings us back to rewarding proper play. If we're looking at an objective game, we need to consider what should count and how that changes player motivations.
I already addressed what happens if you simply don't award points for non-snapshots. Doing that makes it easier for people to play that way.

The only metric you see is rank. TrueSkill is used only for the purpose of matching people of equivalent skill. Nothing else. It's not a player stat that appears anywhere. It's only a calculation to estimate skill.