
[Locked] Services Issue 8/28/18

OP ske7ch

11:00 AM PDT Update
We believe we have resolved the issue with our game services that impacted Halo 5, Halo Wars 2, and Halo: MCC this morning. We are seeing and hearing from players that matchmaking is up and running again. Please note that it's possible we could see intermittent hiccups as things return to normal and we will continue to keep a close eye on everything. Thank you for working through this issue with us - this is why we wanted to release the update ahead of our Xbox Game Pass release on 9/1. Now jump in and let's play!

10:00 AM PDT Update
We are still investigating this issue and working to resolve it. We are seeing, though, that more players are able to matchmake at this time. We will continue to post updates here as we learn more.

9:15 AM PDT Update
We've identified a services issue that is causing the outage. Halo 5, Halo Wars 2, and MCC have all been impacted by this issue. As we work through it, we are seeing more players matchmake successfully. You may have intermittent issues while we continue to resolve this services problem.

8:30 AM PDT Initial Update
Our engineering team is currently investigating a services issue that began at approx. 6:30AM PDT this morning. Currently, MCC matchmaking is unavailable and you will be unable to retrieve playlist data or connect to servers.

You'll see the following error message:
"Contacting server to get the latest matchmaking data. Please wait."

We also have reports of impact to Halo 5 and Halo Wars 2. Both of those games' services appear to be functioning correctly at this time, though they could experience intermittent issues as the broader problem is addressed.

We'll keep this thread updated as more information becomes available.
Thank you

Some folks have been asking: what happened? Did the MCC update break the game? Is this "MCC launch all over again"?

Well, the good news is no - the update didn't break the game and there's nothing wrong with MCC itself. We did, however, experience an unfortunate issue with the game services that MCC (and Halo 5 and HW2) rely on for things like Matchmaking.

The issue we worked through today is precisely why we released the MCC update ahead of the 9/1 Xbox Game Pass launch. Thanks to this head start and players jumping in early, we were able to get in front of this before many more players were potentially impacted.

Here's the inside scoop on what happened today that resulted in a few hours of matchmaking interruptions - spoiler: it's a cache thing.

Around 4AM PDT, a new service that is being used for the first time with the new Halo: The Master Chief Collection update started experiencing issues.

This new service provides static content for the MCC title and started responding slowly to requests for this content. Typically this is not an issue because we have a caching service in front of some of our systems to protect our players from situations like this. However, in this situation, the caching service was not actually caching the static content.

We discovered that the content being served back by this new service was not including a needed cache control header that's supposed to instruct the caching service to cache it (and for how long). Thus, our caching service wasn’t actually performing the protective duties that it was built for. Instead, all requests were going back to this new content service (the “origin” in caching terminology) and were therefore susceptible to slow responses.
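
For anyone curious how a missing header can switch a cache off, here is a minimal sketch. It assumes the header in question is the standard HTTP `Cache-Control` header with a `max-age` directive; the post doesn't say which directive was involved, and the function name and policy below are made up for illustration, not taken from the actual Halo services.

```python
def cache_lifetime_seconds(headers):
    """Return how long a cache may keep a response, in seconds.

    Simplified, hypothetical policy: only an explicit max-age makes a
    response cacheable. Returns 0 when the response should not be cached.
    """
    # HTTP header names are case-insensitive, so normalize the keys.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get("cache-control", "")
    directives = [d.strip().lower() for d in value.split(",") if d.strip()]

    # These directives forbid storing the response in a shared cache.
    if "no-store" in directives or "private" in directives:
        return 0

    for directive in directives:
        if directive.startswith("max-age="):
            try:
                return max(0, int(directive.split("=", 1)[1]))
            except ValueError:
                return 0  # Malformed value: treat as not cacheable.

    # No Cache-Control header at all: under this policy the cache stores
    # nothing, so every request falls through to the origin.
    return 0
```

With this policy, a response carrying `Cache-Control: public, max-age=300` is kept for five minutes, while a response with no such header is fetched from the origin every single time.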

How did some slow responses cascade into a multi-hour outage across multiple Halo titles? As explained, the vast majority of the requests for content were routing back to the new service, which was not scaled out to handle that volume of requests (remember, the system build-out had assumed that the caching service in front would be handling most of the content requests). Since most of the calls to the caching service went back to this new content service, and this service started responding slowly, the caching service built up a growing queue as it waited for the static content service to respond. As these requests piled up, they caused a backlog for all requests going through the caching service, including requests for content from other content origins. This impacted other parts of our system, as those services could not get the content they were requesting in a timely fashion (thus the impact to Halo 5 and even Halo Wars 2).
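
The pile-up described above can be illustrated with a toy model. This is only a sketch under simple assumptions: all requests arrive at once, they are handled first-come first-served, and the caching layer has a fixed number of workers. The function name, origin labels, and numbers are hypothetical and are not measurements from the real incident.

```python
import heapq


def simulate_shared_queue(requests, workers):
    """Simulate a FIFO queue served by a fixed pool of workers.

    requests: list of (origin_name, service_seconds), all arriving at t=0.
    Returns a list of (origin_name, finish_time_seconds) in arrival order.
    """
    # Each entry is the time at which one worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)

    finished = []
    for origin, service_seconds in requests:
        start = heapq.heappop(free_at)   # Earliest available worker.
        end = start + service_seconds
        heapq.heappush(free_at, end)
        finished.append((origin, end))
    return finished


# Four slow static-content fetches queued ahead of one fast request.
mixed = [("static", 10.0)] * 4 + [("playlist", 0.01)]
for origin, finish in simulate_shared_queue(mixed, workers=2):
    print(f"{origin:>8} finished at {finish:.2f}s")
```

In this model the fast request needs only 10 ms of work but finishes after roughly 20 seconds, because both workers are tied up waiting on the slow origin. That is the same shape of problem that let one slow service affect requests bound for healthy ones.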

We alleviated the issue by updating a configuration in our caching service to time out any retrieval of un-cached content within a few seconds. Besides this new static content service, the other content origins were responding within milliseconds, so a quick timeout did not impact them. For the slow-responding static content service, this allowed our caching layer to purge its backed-up queue and prevented much of a queue from re-forming. This enabled our systems to recover and got all Halo titles back up and running except for MCC, since it still depended on content from the slow-responding static content service. We then implemented and deployed an updated build of the static content service that returns the correct cache control header, which enabled our caching service to behave as we expected and cache the content.
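
As a rough illustration of the timeout idea, here is a small sketch using Python's standard `concurrent.futures` module. It is not the actual caching service configuration: `fetch_with_timeout`, `fast_origin`, and `slow_origin` are hypothetical stand-ins, and the timings are shrunk so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FetchTimeout

# Shared worker pool standing in for the caching layer's fetch workers.
_pool = ThreadPoolExecutor(max_workers=4)


def fetch_with_timeout(fetch_from_origin, timeout_seconds):
    """Call the origin, but stop waiting after timeout_seconds.

    Returns the origin's response, or None if it did not answer in time.
    """
    future = _pool.submit(fetch_from_origin)
    try:
        return future.result(timeout=timeout_seconds)
    except FetchTimeout:
        # Best effort only: a call that is already running keeps running
        # in its thread, but the caller is no longer blocked behind it.
        future.cancel()
        return None


def fast_origin():
    """Hypothetical healthy origin that answers immediately."""
    return "playlist data"


def slow_origin():
    """Hypothetical struggling origin that takes too long to answer."""
    time.sleep(0.3)
    return "static content"
```

Calling `fetch_with_timeout(fast_origin, 1.0)` returns the data as usual, while `fetch_with_timeout(slow_origin, 0.05)` gives up and returns `None`. The healthy origins are unaffected by the limit, and waiting on the slow one no longer holds up the caller.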

By approx. 10:30AM we saw the issue clear up and players started successfully getting into Matchmaking again. We've been monitoring closely since and so far haven't noticed any lingering issues. Our services and game teams will continue to keep a close eye on everything in the coming days as we continue to see more players install the update and jump online.