The Life of a Bug
Many folks were wondering about what happens when players submit a Halo Support ticket and how that translates into helping our game development. Below is a complete breakdown of the process and what happens when an issue is reported. Nobody likes bugs, neither our players nor our teams, but in software development they are a fact of life. While the types and severity of bugs vary greatly, we hope this overview helps provide a greater understanding of how our team handles feedback and addresses issues reported by players.
From first contact to reproduction, from evaluation to implementation of a fix, the journey is considerable and each team has a different role to play.
This is the life of a bug.
Let us start by looking at how the Community Team plugs into the equation and then follow that up with information pulled from other teams including Halo Support, the Quality Assurance team, Design, Production, and Engineering. Forewarning: to help break up the length of this, we are going to sprinkle in some MCC PC H3 Screenshots. Let us know if you notice anything new!
Generally, when there is an issue, we tend to see it crop up on social media first. People will post on Reddit, tag us in a Tweet, reach out to us on Discord, or come to our forums and make a post about what they're encountering. Depending on the severity of the issue, meaning how detrimental it is, it may be flagged immediately to the development team, or additional digging will be conducted to see how many others are reporting a similar issue. With PC development in particular, there are more variables than ever that could cause an issue to manifest for one particular player and scenario and not be seen elsewhere, so this project has introduced plenty of new challenges and complexities for all of our teams to navigate. I personally took the liberty of breaking down some of the key pieces of how our Community Team and I work with bugs.
What role does the Community Team play in the lifecycle of a community bug?
Our team specifically works to help flag issues that players come across. A lot of work is done between our team and Halo Support to drive people towards the Support site to report their findings. It honestly is the best way to get your voices heard and helps us identify the problems that people are experiencing. While we may see news crop up on social media, without a detailed support ticket there’s not much further anyone can take the issue. The more people that submit a ticket on an issue, the more attention it receives, and the more likely that issue is going to be focused on for future updates. Of course all bugs and issues that negatively impact players are important, but with the realities of finite time and resources, the development team looks first to ticket volumes to help assess priority.
What represents a High Priority Issue?
To me, if an issue is impacting players, it is something we should be working to resolve. In my eyes, if there is a bug that blocks players from playing, that is the highest priority. Not all issues that are raised to our attention are game related, but as part of the Community Team, that is part of what we investigate. Whether these are crashes, connectivity restrictions, install related pieces, or errors, a lot of effort goes into finding workarounds to help players get back into game.
What does the Community Team do when bugs are fixed?
Well, a big portion of what we do is support communication and act as the studio’s liaisons to the community. This includes driving conversations internally to teams as well as sharing information (like bug fixes) with the general public via blogs, updates, or patch notes. We also update our Known Issues lists to contain the latest statuses of all big-ticket items.
Our goal is to make sure the community is heard, can self-help where possible, and continues to use Halo Support so that the wants and desires for bug fixes across MCC are clearly logged in tickets. We also help inform prioritization around what is being addressed and communicate back outwards what fixes have been made.
Halo Support is one of our greatest tools for discovering, tracking, and ultimately resolving issues encountered by the community. It’s a place where you can submit tickets with critical details of your issues and get them seen by folks within our studio and passed along to the right people for the job. The greatest benefit to our fairly new Support Site is capturing helpful necessary specifics in a procedural, scalable manner. Here is a breakdown from Conor on the Halo Support team about their role in the life of a bug.
What do you do in the lifecycle of a community bug?
As the Project Liaison, my role is to act as the bridge between the Halo Support team and other teams at the studio. So as an example, let us say that Support is seeing a lot of reports about performance issues in Halo 2: Anniversary. In that case, I gather all of the available information from Support, break it down to the relevant information, communicate with the team that could fix it, and craft a Support response that accurately portrays the status of that team’s investigation or fix.
How does Halo Support handle community reported bugs?
The first thing we do when receiving a report is identify its scope. That means we look for other players reporting the same issue or similar issues. This usually results in a huge pool of information, some useful and some… Not so useful. Once we compile all the relevant information into a single source, which we call a “bucket,” we determine what we can do to solve that issue. Sometimes that results in us forwarding the bucket to a team that can create a fix and supporting them with additional detail. More commonly though, our fixes come from outside of the game: updating drivers, changing settings on the players’ system, et cetera. In either scenario, we aim to regularly inform our players of the status of the investigation and pass along our fixes when they become available.
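The bucketing step described above can be sketched in a few lines. This is only an illustration: the `Ticket` fields and the rule for what counts as "the same issue" (same game and category) are assumptions, not how Halo Support actually groups reports.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """A hypothetical, simplified support ticket (field names are assumptions)."""
    ticket_id: int
    game: str       # e.g. "Halo 2: Anniversary"
    category: str   # e.g. "performance", "installation"
    summary: str


def bucket_tickets(tickets):
    """Group tickets that share the same game and category into one bucket."""
    buckets = defaultdict(list)
    for ticket in tickets:
        buckets[(ticket.game, ticket.category)].append(ticket)
    return dict(buckets)


def buckets_by_volume(buckets):
    """Return bucket keys ordered from most to least reported."""
    return sorted(buckets, key=lambda key: len(buckets[key]), reverse=True)


tickets = [
    Ticket(1, "Halo 2: Anniversary", "performance", "Stutter in vehicles"),
    Ticket(2, "Halo: Reach", "installation", "Install stuck at 0%"),
    Ticket(3, "Halo 2: Anniversary", "performance", "Low FPS on Outskirts"),
]
buckets = bucket_tickets(tickets)
ranked = buckets_by_volume(buckets)
print(ranked[0], len(buckets[ranked[0]]))
```

Ordering buckets by size mirrors the point made earlier that ticket volume is one of the first signals used to assess priority.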
What, to you, represents a high priority issue from the community?
First and foremost, installation issues. If someone cannot get into the game, then that is obviously a huge issue. Sometimes these issues are out of our control though and there is simply nothing that the team here can do. With that said, we often work with our outside partners to resolve these issues as fast as possible. A close second to installation issues would be anything that ruins the fun of the game. Halo should always be fun, and if there is a bug or glitch that cheats players out of a fair fight online or out of finishing a campaign on Legendary, then that is something we want to hand over to the development team as quickly as possible, for them to address sooner rather than later.
What is the most important aspect of a reported issue for you?
All the ticket forms for Halo Support include a section titled “Repro Steps.” This section is essentially asking players to provide us with a step-by-step guide on how they encountered their issue and how we might be able to recreate that process. Understanding what each player did, where they did it, and how they did it is invaluable to our investigations. The more specific someone can be, the better chance we have for reproducing the issue on our side and experiencing the error they reported.
Sometimes we get tickets that are vague, and it can kind of send us down a rabbit hole. For example, a ticket that just says “the game gets choppy when I’m in a vehicle” can mean a lot of things. With a report like that, it is up to the team here to find out which part of the level, which seat in the car is affected, or whether the car needs to be moving.
How do multiple reports of the same issue impact your workflow or handling of an issue?
I want to pass along a huge “Thank You” to all the players who have sent us screenshots or videos of their issue. Being able to see their process and know exactly what they are seeing can save us a lot of time.
The more reports we receive on a single issue, the easier it becomes for us to find a common thread. Support is looking at a lot of different types of issues on a lot of different hardware configurations. So, for a visual issue, one of the first things we look at is a user’s GPU and CPU. Are we seeing reports of the same visual issue on other GPU and CPU combos? Or is it tied to that specific combination of hardware components? If we can narrow it down there, we have a much better idea of how many users are affected and what the potential root cause may be. This logic tracks for a variety of other issue types: Are installation issues tied to a specific storefront? Are performance issues tied to a specific GPU or hard drive? Some network issues are tied to the default settings on specific router models, and we even had an input issue related to a Windows security update. Finding that common thread between each report of an issue helps us both determine the true number of impacted users and helps the rest of the team focus in on potential causes.
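As a rough sketch of the "common thread" idea above, the snippet below counts how often each hardware or storefront value appears within one bucket of reports. The report fields and the placeholder hardware names are assumptions made for illustration only.

```python
from collections import Counter


def common_thread(reports, field):
    """Return (value, share) for the most common value of `field` across reports."""
    counts = Counter(report[field] for report in reports)
    value, hits = counts.most_common(1)[0]
    return value, hits / len(reports)


# Hypothetical reports for one visual-issue bucket; hardware names are placeholders.
reports = [
    {"gpu": "GPU-A", "cpu": "CPU-X", "storefront": "Steam"},
    {"gpu": "GPU-A", "cpu": "CPU-Y", "storefront": "Windows Store"},
    {"gpu": "GPU-A", "cpu": "CPU-X", "storefront": "Steam"},
    {"gpu": "GPU-B", "cpu": "CPU-Z", "storefront": "Steam"},
]

for field in ("gpu", "cpu", "storefront"):
    value, share = common_thread(reports, field)
    print(f"{field}: {value} appears in {share:.0%} of reports")
```

A value that dominates one field while the others stay mixed is the kind of hint that points toward a specific component rather than the game as a whole.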
What happens when you have one-off reports of issues from the community?
I love one-off reports. They are either the simplest thing in the world or incredible puzzles, but we treat them the same as any other ticket. The simple one offs are usually players who have an out-of-date driver or an account issue that manifests itself in an odd way. Our team can get these players back into the game quickly. Our more unique one-offs require a lot of investigation, so they end up on my plate. I often reach out to other teams here at 343 and work with them to figure these ones out. Again, a huge “Thank You” to those team members who have helped me.
For those who would like to learn more about how to use the Halo Support site, please review this article HERE.
Bugs are found in Test through a variety of different test passes and investigations. When a bug is filed, a Test representative known as a "Redliner" will review the bug and either send it back to the Tester for more information or up to Triage for prioritization. After a bug has been prioritized as a Must Fix for release, the team must track the status of the bug through the release lifecycle. When the bug is resolved as fixed, Test must await a new build to be generated containing the fixes, which can take anywhere from one day to a week or more, depending on what is involved with the fix and if it requires the entire game's content to be rebuilt. Test then verifies that the bug is fixed appropriately and that no other areas of the title have been impacted by the change.
When a bug is reported by someone in the community, a Halo Support Agent compares the report with other similar reports and creates an issue "bucket." Buckets are then handed over to a Test liaison, who assesses the issues and identifies if the team has enough information to move forward with investigation. The Halo Support Agents then create feedback loops with the reporting parties to gather any information requested by Test. After all info has been gathered, a task is created to initiate internal investigation into the bug. From there, the bug follows the usual bug lifecycle. The largest difference with community bug lifecycles is that once the bug is fixed, the Halo Support Agent closes the feedback loop with the reporter by informing them that their issue has been addressed.
Release Testing has three parts: Release Candidate Testing (or RC for short), Certification Testing, and Selective Publish Private Flights (or SelPub for short). When the team has identified the Must Fix bug list for a release, the Test team can initiate Release Candidate testing. This is an assessment of the entire game prior to the update going live to Retail (meaning a game launch or a game update). This process can take upwards of two weeks. Once all bugs in the Must Fix list are addressed and the Test team signs off on RC testing, the build is submitted to Xbox and Windows Store certification, but not to Steam, which does not have a certification process. Certification usually takes about three days but can take longer than that for various reasons. Once the build passes certification, we initiate SelPub to a group of ~25 external users and allow them to give final feedback on the update prior to its full release. This can take around one week and is the last step before the build goes live to the world. The following is an in-depth look at the entire process.
Bugs are generated in Test in relation to various tasks and test passes. These include but are not limited to:
- Build Verification Testing (BVT)
  - Baseline test cases when new builds are generated from Engineering/Build farm
- Fix Verification Testing (FVAR)
  - Targeted testing around new changes to both the code and content daily
- Functional Test Passes
  - Targeted testing around specific areas of the title on semi-regular cadences
- Compliance Testing (CERT/Accessibility/GEOPS)
  - Tests run against the entire title to ensure we are compliant with Xbox/Windows Certification, FCC Requirements, Geopolitical requirements, etc.
- Compatibility Testing (Compat)
  - Specialized teams assess the title across a plethora of various hardware types and specs
- Milestone Acceptance Testing (MAT)
  - Tests run against code and content deliveries from our external development partners
- Release Candidate Testing (RC)
  - Large scale build assessment to lock in candidates for Retail releases
- Flight Candidate Testing
  - Build assessment for potential flight builds prior to any flight ring promotion
So now that you know where bugs come from, let's talk about their lifecycle!
A bug is identified, and an initial investigation occurs
- This investigation will lock down the most reliable reproduction steps and severity of the bug
- If the bug is initially severe, such as a crash that blocks access to the areas of the game, the tester escalates the bug to the Production team in real time
When initial investigation is complete, the bug is logged in our internal bug database
- The bug is required to have an outline of area, severity, description, reproduction steps, impact, expected results, media, and an assignment
The bug is assessed by one or more people who are deemed as "Redliners"
- Redliners will review every bug that is filed, and ensure that all the required information is present prior to sending the bug to Triage (triage is a meeting where a collection of individuals from across the teams discuss a bug and help determine what happens with it).
- Bugs that do not have all the required info get sent back to the tester for more information
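The Redliner check described above amounts to validating that every required field is filled in before a bug moves on. Here is a minimal sketch; the field list comes from the text, while the class, the string types, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class BugReport:
    """Required fields as listed above; the string types are assumptions."""
    area: str = ""
    severity: str = ""
    description: str = ""
    reproduction_steps: str = ""
    impact: str = ""
    expected_results: str = ""
    media: str = ""
    assignment: str = ""


def missing_fields(bug):
    """Names of required fields that are still empty."""
    return [f.name for f in fields(bug) if not getattr(bug, f.name).strip()]


def redline(bug):
    """Send complete bugs to Triage; return incomplete ones to the tester."""
    missing = missing_fields(bug)
    if missing:
        return "back to tester", missing
    return "to triage", []


# Hypothetical bug that is missing its media and assignment.
bug = BugReport(
    area="Campaign",
    severity="2",
    description="Game hitches when entering a vehicle",
    reproduction_steps="1. Load the mission. 2. Enter the driver seat.",
    impact="Brief stutter during gameplay",
    expected_results="No hitch when entering a vehicle",
)
print(redline(bug))
```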
One (or more) times a day, Triage will review every newly filed bug for that day
- This assessment can resolve in many ways, but usually:
- Assigned to Design for input
- Assigned to Test for more information
- Assigned to Production for impact discussion
- Assigned to Engineering for a fix
A Triage loop may occur, where it is assigned to and from Triage until the appropriate Priority is given to the bug, or until all information is clearly outlined.
- Since MCC has titles that have been released several times already, Triage often needs to know whether a bug occurs in previous MCC versions or even legacy releases
- Currently, Halo 2 takes the most time to check as we sometimes need to look back at previous MCC builds, H2V, and H2 Back Compat for a single issue
- Understanding when something became broken not only helps us prioritize, but also understand the potential risk in changing code in legacy portions of the title
- A bug may be prioritized for a fix immediately, deprioritized to a later fix date, moved to the backlog for a future reassessment, or resolved as a non-fix (such as By Design or Won't Fix).
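To illustrate why Triage asks when something became broken, the sketch below labels a bug by the oldest release in which it reproduces. The release labels and the classification names are illustrative assumptions; the text does not describe the team's actual tooling for this check.

```python
# Ordered oldest to newest. The labels are illustrative, not an official list.
RELEASES = ["original release", "MCC 2014", "previous CU", "current build"]


def first_affected(repro_results):
    """Return the oldest release where the bug reproduces, or None.

    `repro_results` maps a release label to True (bug reproduces) or False.
    """
    for release in RELEASES:
        if repro_results.get(release, False):
            return release
    return None


def classify(repro_results):
    """Label a bug by when it first appeared."""
    first = first_affected(repro_results)
    if first is None:
        return "does not reproduce"
    if first == "original release":
        return "legacy behavior"
    if first == "current build":
        return "new regression"
    return f"introduced in {first}"


print(classify({"original release": True, "current build": True}))
print(classify({"previous CU": False, "current build": True}))
```

A "legacy behavior" result signals more risk in touching old code, while a "new regression" usually points at a recent change.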
Once the bug has been resolved back to the tester who filed it, the tester must then acquire a new build which contains the fix.
- Often, this is the build that is generated the day after the fix is checked in, but can sometimes be longer due to several factors
When the build containing the fix is delivered, the tester verifies that not only the original bug is fixed but that no new bugs were created around the fix as a result
- If the bug is fixed as written and the fix is stable and healthy, the bug is closed.
- If the bug still occurs as written, the bug is reactivated and sent back to Triage with new relevant info
- If the bug no longer occurs but new or similar bugs are occurring due to the fix, new bugs are filed
For bugs found via Community reported issues, such as those reported to Halo Support or collected from Social Media, add these steps to the beginning of the process:
- A player finds a bug and reports it on Halo Support
- A Halo Support Agent collects all available information, and creates a "bucket" of similar issues
- On a regular basis, Halo Support Agents hand off these "buckets" to an internal Test liaison for investigation
- The Test liaison parses through "buckets" to identify actionable reports with the highest impact to players
- Often, the liaison must request further information, which the Halo Support Agent collects by reaching back out to the original reporter
- The actionable/high impact reports are prioritized for investigation and delivered to the Test team as a task
- From here, the normal bug lifecycle above occurs
- After the bug is resolved/fixed, the Halo Support Agent is notified and all players who filed reports into the addressed "buckets" are notified that their issue has been fixed
Release Testing! What does this mean? It gets used a lot but may not mean much until you know how it works!
- A target is set for a Title Launch or game update to Retail
- Several release conditions must be met before a build is deemed ready
- The build must meet the decided upon spec and contain the expected content
- All bugs identified as Must Fix for the release have been addressed
- < X amount of high priority bugs are being found per day
- No new critical bugs are identified
- Once enough of the conditions are met, the Production team informs the Test team that we are ready for Release Candidate Testing (RC)
- The test team runs a BVT and if it passes, testing transitions to RC
- RC Testing consists of an assessment of the entire game to ensure baseline parity to previous builds and launch readiness
- RC testing takes on average 2 weeks to complete due to the size and nature of MCC
- During RC, the build will be updated (revved) for the test team to actively check required fixes prior to launch
- In some instances, changes to the build are so large that RC testing must be restarted entirely to ensure content changes are stable in areas that had already been tested
- Often times, this occurs after a “content rebuild”, or a change that forces the actual game content to be regenerated to reflect the most recent update
- Content rebuilds are also the reason that update sizes can be so big by the time they are released.
- If new critical bugs are found during RC testing, those bugs must be prioritized for a fix by Triage prior to releasing the build
- Once the team can pass RC testing (mainly when all required active bugs are fixed or addressed), the build is sent to Xbox/Windows Store Certification
- Certification testing takes at minimum 3 days; however, this can take longer if there are changes to the Store page, new add-ons, or large update sizes
- After the build has gone to Certification, we publish the build to a private external flight group of ~25 people
- This is known as Selective Publish (SelPub) and usually lasts for about 1 week
- This 1-week window is usually the time between passing Certification and releasing to Retail
- If critical or game breaking bugs are found in SelPub, the build needs to be pulled from Certification and a new build must be submitted containing fixes
- If this occurs, repeat from Step 2
- Once we are confident with the SelPub results, Test gives sign-off and the build can be released.
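The release conditions listed above can be read as a simple checklist. The sketch below reports which conditions a build still fails; the field names are assumptions, and because the text gives the daily high-priority limit only as "X", the value used here is a placeholder rather than a real threshold.

```python
from dataclasses import dataclass


@dataclass
class BuildStatus:
    """Hypothetical snapshot of a build; field names are assumptions."""
    meets_spec: bool
    open_must_fix: int
    high_priority_found_today: int
    new_critical_bugs: int


# The source gives this limit only as "X"; 5 is a placeholder, not a real value.
MAX_HIGH_PRIORITY_PER_DAY = 5


def unmet_conditions(status, limit=MAX_HIGH_PRIORITY_PER_DAY):
    """List the release conditions the build does not yet satisfy."""
    unmet = []
    if not status.meets_spec:
        unmet.append("build does not meet spec or expected content")
    if status.open_must_fix > 0:
        unmet.append("Must Fix bugs remain open")
    if status.high_priority_found_today >= limit:
        unmet.append("too many high priority bugs found per day")
    if status.new_critical_bugs > 0:
        unmet.append("new critical bugs identified")
    return unmet


status = BuildStatus(meets_spec=True, open_must_fix=2,
                     high_priority_found_today=1, new_critical_bugs=0)
print(unmet_conditions(status))
```

The function only reports what is outstanding. As the steps above note, Production decides when enough conditions are met to start RC.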
There are a lot of individual pieces that go into this process, but the information that Test gathers helps paint the picture of what a bug is and what it impacts, and it tells everyone who opens the bug later what the real problem is.
Design is a complex discipline that spans all areas of why something does what it does in games. Design aspirations are why a weapon fires at a certain rate and why a menu acts the way it does. Every intentional action that is present in a game has ties to design. Below, Max Slzagor, Design Director on the Publishing Team, fills in some details on where design fits into bugs and what goes into the decision-making process.
What is the most important aspect of a bug to you?
The most important aspect of a bug for us is how we prioritize it and when and if we should fix it. The latter part seems crazy, so let me explain. Halo: The Master Chief Collection now contains many game engines and depending on how you count, includes over 20 years of code development. There are legacy bugs that the community prefers we leave alone along with bugs that are more recent and don’t provide the experience we want for our players. Fixing any bug also has the possibility of creating new bugs or changing gameplay in a way that diminishes the player experience. Whenever we evaluate bugs, we need to consider the severity, frequency, legacy behavior, and player experience impact of that bug against all the other development work and bugs being fixed.
How do you handle community reported bugs?
All bugs are initially screened by the triage group and assigned a variety of labels, one of which is the community tag. We carefully consider every community reported bug, especially in cases where it is being reported by multiple people. The key thing we have to consider is that there is no singular community group that plays Halo. We have a community of people who started at different points in the franchise, on different platforms, and who play in fundamentally different ways. Some community members are new and some have been playing Halo since the beginning. Some are very competitive ranked multiplayer players while others are more casual. Some community members prefer to play campaign, others co-op Firefight, and others are focused on creating new content in Forge. The design team needs to do its own form of triage that weighs the various community requests and priorities alongside our own design goals and roadmap. It’s a tough decision-making process that evolves over time. We are grateful to all the members of the community and their requests and strive to do our best at balancing decisions across all Halo players.
What, to you, represents a high priority bug?
A high priority bug is either crippling to the player experience, very frequent, or in many cases, both. Another example would be a regression in feature functionality from previous MCC releases. Examples of high priority bugs would be things like crashes, soft locks, or not being able to play part of the game.
What do you do in the lifecycle of a bug?
The triage group has built up a lot of knowledge around bugs and who to involve at each step of the way. In some cases, there isn’t a clear solution to the bug and it is assigned to design for evaluation. From there, the design team will dig into whether the bug is a legacy issue or something new. If it is a legacy issue, we make a decision about whether fixing the bug is a net improvement to the experience or whether it negatively impacts the way people remember and prefer to play the game. If it is not a legacy bug, we look to our design pillars and decide what the right solution is. Sometimes we need to come up with a shorter-term solution to improve the experience and also build a plan for a longer term fix, which could take several months. We evaluate and discuss different solutions with the team. In some cases, there is a need to update existing design documentation to cover questions that are raised by the bug so we can streamline the decision and solution process going forward. With each release, we also refine goals and collaborate with the triage and production teams to continuously improve bug evaluation going forward.
Production is the part of the team that helps make sure everything is on track in development. The project managers help determine what pieces need to be done in order to help reach milestones and complete pieces of a project. They also work with helping to assign out work relating to bugs in the process and help move items through the development life cycle for upcoming patches and releases.
This section will outline how production assesses and works in concert with other teams on the life of a bug. Michael Fahrny, Senior Producer on the Publishing Team, gave us his insights below.
What is the most important aspect of a bug?
I’ll break this down to a few sections and some questions we regularly go back and forth on from the production side:
Does the bug make sense? Can I understand what is happening here?
If I can, the next thought process is what’s the impact to the player? This is typically the most important aspect for us. As we look to the product as a whole, we want as few bugs as possible that have a negative impact on the majority of our player base.
Where does this fit in our current development roadmap?
How risky is this to fix? If it’s a legacy issue, should we even try to, is it worth the risk to potentially destabilize the entire product?
Is this something we can fix properly, or will this just add to the tech-debt bucket? (are we just kicking the can down the road?)
How do you handle community reported bugs?
The short answer here is the same way we handle all bugs. We judge them against all the questions above, weigh them against player impact, and then funnel them into our workflow where appropriate. They are on equal standing with our internal bugs as part of our daily triage process.
What, to you, represents a high priority bug?
I went into this a bit above, but the bugs that impact players the most are what really matter to me. These include things like:
- Stability and performance issues
- Functionality regressions
- Anything that causes players to take longer to get into a game mode
- Large legacy issues that take away too much from the core Halo experience
What do you do in the lifecycle of a bug?
Production is the gatekeeper here: all bugs must pass through us before anyone touches them.
We have a daily bug triage, and it is the one meeting that is never touched or moved: rain or shine, it must happen every single day. This ensures information and prioritization continue to flow properly throughout the team.
Each discipline on a team helps handle various bugs for various reasons, whether that is organizing the flow of work, conducting investigations, or deciding whether a fix is worth the time investment and the risk. Engineers are the wizards of game development. They help bring everyone’s wants and desires together and make it possible for everything in a game to happen. Without code, games simply wouldn’t exist. Here is Sean Cooper’s perspective on the role and a breakdown of where engineers fit into the life of a bug.
What is the most important aspect of an issue from an Engineering perspective?
I do not think there is an aspect that can be singled out as the most important. Rather, it is fuzzy. There are multiple aspects that need to be weighted together. From there you start to have a clearer picture of not only the impact of the issue, but also questions that other stakeholders (especially non-engineers) typically have. Remember, issues are not just the concern of engineers, but also art/design/production.
- Is this an issue, or a design that is being miscommunicated?
- This could be attributed to an issue I had in Reach: “AI Fire rate is greatly increased in Legendary Firefight”
- When you compared Reach on 360, which updates the game and frames at 30hz, to Reach in MCC, which updates instead at 60hz, it would seem like the AI were firing faster.
- However, when you slowed the games down you would start to see where the smoke and mirrors were being utilized at 30hz. To hit their designer specified firing rate for the given difficulty, every few frames the AI had to fire two shots in the same frame. With 60hz, there’s greater precision/fidelity in the game and visual updates. So, the number of shots the AI are taking is more clearly visible.
- Is this an issue, or is it now a feature :)?
- “H2: CAMP: PLAY: SKULLS - Warthog Horn and other vehicle weapons incorrectly shoot Scarab rounds when Scarab Skull is enabled"
- That was decidedly a feature. I disabled the fix that someone put in for that.
- When was the issue introduced?
- Was it present in the original release?
- Was it introduced in MCC circa 2014?
- Was it introduced recently, and not present in the previous CU (content update)?
- Have we shipped the issue to customers yet?
- What are the repro steps?
- That is, what are the steps you need to take to reproduce the thing being described as an issue?
- What is the repro rate?
- That is, when performing the repro steps X amount of times, what is the rate at which it occurs? If you perform the repro steps 5 times and it occurs only 4 times, you are looking at a repro rate of 4/5, or 80% of the time the issue occurs.
- What platforms does the issue occur on? All? PC only?
- Does the issue occur along the golden path?
- That is, will a user hit this when playing normally?
- Or do they have to stand on their head, repeat “Regret, Regret, Regret”, while they spam the X and Y buttons during a half second window of time on a loading screen?
- Is the issue blocking testing of other features?
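The Reach fire-rate example above is easier to see with numbers. The sketch below spreads a fixed number of shots per second across game ticks at 30 Hz and 60 Hz. The fire rate of 40 shots per second is a made-up value for illustration, and the scheduling rule is an assumption, not the engine's actual logic.

```python
def shots_per_tick(fire_rate, tick_rate, seconds=1):
    """Shots fired on each tick so the running total tracks `fire_rate` per second."""
    ticks = tick_rate * seconds
    # Shots that should have been fired by the end of tick n (integer math).
    due = [(n * fire_rate) // tick_rate for n in range(ticks + 1)]
    return [due[n] - due[n - 1] for n in range(1, ticks + 1)]


FIRE_RATE = 40  # hypothetical shots per second, chosen only for illustration

at_30 = shots_per_tick(FIRE_RATE, 30)
at_60 = shots_per_tick(FIRE_RATE, 60)

print(at_30[:6], "total", sum(at_30), "max per tick", max(at_30))
print(at_60[:6], "total", sum(at_60), "max per tick", max(at_60))
```

Both runs fire the same 40 shots in one second. At 30 Hz every third tick has to carry two shots, while at 60 Hz no tick carries more than one, so each shot becomes individually visible.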
Those aspects/facts and more help inform the decision on the Priority and Severity of an issue. These are typically weighted with a number ranging from 1 to 4 (we are not a house that follows Pri/Sev 0). From there you can start to balance one issue against the ever-growing list of issues and features, all of which are being scheduled for a given sprint, which relates to a specific release/CU.
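Two of the ideas above reduce to small calculations: the repro rate, and ordering issues by their Priority and Severity numbers. In this sketch the issue titles are invented, and treating 1 as the most urgent value is an assumption, since the text gives only the 1 to 4 range.

```python
def repro_rate(occurrences, attempts):
    """Fraction of attempts in which the issue occurred."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return occurrences / attempts


def order_issues(issues):
    """Sort by priority, then severity (assuming 1 is most urgent, 4 is least)."""
    return sorted(issues, key=lambda issue: (issue["priority"], issue["severity"]))


print(f"{repro_rate(4, 5):.0%}")  # the 4-out-of-5 example from the list above

# Hypothetical issues for illustration.
issues = [
    {"title": "Typo in menu text", "priority": 4, "severity": 4},
    {"title": "Crash on level load", "priority": 1, "severity": 1},
    {"title": "Audio desync in cutscene", "priority": 2, "severity": 3},
]
for issue in order_issues(issues):
    print(issue["priority"], issue["severity"], issue["title"])
```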
How do you handle community reported issues?
This is generally an area of concern for Design and Production. Sometimes I hear of issues the community is voicing before they are properly filed in our issue database and triaged. Sometimes I may know roughly or exactly what area of code may be introducing the issue and can get a head start on investigations, or even a fix.
What, to you, represents a high priority issue?
Generally, to me, if an issue is blocking a feature from being used or QA from testing something. Especially if it is something I can fix. Everyone should be free to do their jobs without friction.
An example is when we ask QA to reproduce an issue in a debug build so we have logs, only to find out they can’t because some other change went in and broke the debug build, while seemingly not impacting the shipping build.
What do you do in the lifecycle of an issue?
Sometimes I join Triage meetings to help inform on issues. This could be to note that I saw a CL (change list) that was submitted which most likely caused that. Or to note whom we could assign the issue to for further investigations or fix.
Other times, I am assigned an issue for feedback only, without making changes yet, because it needs more input before it can be addressed. This could be for a risk assessment or for initial ideas about what may be causing the issue. In other cases, I am assigned an issue to investigate and/or fix myself.
What are some of the most challenging issues you have encountered on MCC?
Pretty much anything relating to Halo 1 networking. It is a very ancient engine. There have been some improvements over time at an architecture level. E.g., last year I switched it all from C to C++, which makes integrating some changes from later games easier. However, its networking and simulation architecture does not match any of the later Halo games, and the game-oriented debug tooling is very primitive or non-existent. Which makes investigating problems even more time consuming.
Also, Halo 2 network co-op. Issues relating to this area of MCC were some of the first things I was assigned back in 2018, and then again this past month. Part of this related to H2 never having network co-op prior to MCC. The other part was the further complication introduced by the remastered game engine. Prior to the recent work I had to do, the loading process was Load Remastered Level -> Connect All Peers -> Load Legacy Level -> Play. The problem lies with how long it takes to Load Remastered Level.
One machine running on an SSD could load in, say, 30 seconds, while another machine may be rocking a very slow external HDD and take 90 seconds to load. However, we only let peers wait for one another for so long before we exit back to the frontend (I think it is around 45 seconds). So the first machine loads in 30 seconds, then times out after 45 more seconds, then 15 seconds later the other machine is like “I’m ready, how about you?” and waits 45 seconds, not realizing the other already gave up.
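The timing in that example can be worked through with a small sketch. This is only an illustration of the arithmetic, not MCC code: the type and function names are hypothetical, and the 45-second wait limit is the approximate figure quoted above.

```cpp
#include <cassert>

// Hypothetical model of the old flow: a peer finishes loading the remastered
// level, then waits up to wait_limit seconds for the other peer to be ready.
struct PeerTimeline {
    int ready_at;     // second at which this peer finishes loading
    int gives_up_at;  // second at which this peer stops waiting and exits
};

// Builds the timeline for one peer from its load time and the wait limit.
PeerTimeline make_peer(int load_seconds, int wait_limit) {
    PeerTimeline peer;
    peer.ready_at = load_seconds;
    peer.gives_up_at = load_seconds + wait_limit;
    return peer;
}

// The session only starts if each peer becomes ready before the other
// one has given up waiting.
bool session_starts(const PeerTimeline& a, const PeerTimeline& b) {
    return a.ready_at <= b.gives_up_at && b.ready_at <= a.gives_up_at;
}
```

With a 30-second load and a 90-second load, the fast peer gives up at second 75, which is 15 seconds before the slow peer is ready, so `session_starts` returns false. A pair loading in 30 and 60 seconds would still connect under the same limit.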
So, while it may not sound like it, there was quite a bit of work to instead reorder things to Connect All Peers (Pregame) -> Load Remastered Level -> Load Legacy Level -> Play (In-game).
- The legacy levels load fast, but there are many pre-existing assumptions by the remastered engine’s code that it loads first before the legacy level.
- Then you must handle the case where you play from level-to-level. That involves touching in-game lifecycle logic, which is different from the pre-game’s lifecycle.
- H2 is also not designed to run networking on a different thread unless the engine’s main loop is not running. So, there is a careful dance of ensuring networking is never being starved of time to process during the loading phases. Else one machine may not send traffic and cause the host or other machine to consider them timed out and close the connection.
- There was also a lot of duct tape that I instead replaced with tried-and-proven code/workflows already found in Halo 3.
- Then I had to verify that all my changes were playing nice with regular co-op, playlist co-op, solo campaign, and multiplayer (which has different code pathways and no remastered level loading).
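The point above about networking being starved during loading can be sketched roughly as follows. This is one generic way to keep a connection serviced during a long load, assuming the load can be split into chunks. It is not how H2 or MCC actually does it, and every name here is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the network layer. It only records how long
// it went without being serviced, using simulated milliseconds.
struct FakeNetwork {
    int last_pump_ms = 0;
    int longest_gap_ms = 0;

    void pump(int now_ms) {
        int gap = now_ms - last_pump_ms;
        if (gap > longest_gap_ms) {
            longest_gap_ms = gap;
        }
        last_pump_ms = now_ms;
    }
};

// Loads a level as a series of chunks, where chunk_ms[i] is the simulated
// cost of chunk i, and services the network after every chunk.
// Returns the total simulated load time.
int load_level_in_chunks(const std::vector<int>& chunk_ms, FakeNetwork& net) {
    int now_ms = 0;
    for (int cost : chunk_ms) {
        now_ms += cost;    // simulated work for one chunk
        net.pump(now_ms);  // keep traffic flowing between chunks
    }
    return now_ms;
}
```

In this model, a 90-second load done as one chunk leaves the network unserviced for the full 90 seconds, long enough for a peer to be considered timed out. The same load split into one-second chunks never goes more than a second between pumps.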
Thanks, Scoops, for helping break down engineering’s impact on bugs. And that is it, the life of a bug.
Everyone has their role to play, and we hope this gave you a bit more context (and clarity) around not only the complexity of how an issue gets selected and solved, but also the importance of filing that ticket on Halo Support and getting involved.
And speaking of issues that received plenty of tickets, let’s take a look at the big ones coming out of H2’s recent launch on PC.
Halo 2 Projectiles and Cursed Halo 3
With this month’s game release, we had a handful of issues that came to light within the first few hours of launch. As unfortunate as it was, the team worked diligently across all disciplines and studios to isolate, resolve, and release a hotfix for them while keeping the community in the know of the status of these changes along the way (and as you know, all while being scattered working-from-home during quarantine). Below, Scoops will walk us through these unfortunate issues we faced.
The Halo 2 projectiles issue was fun… the first time it happened.
The issue was related to interpolation code that was added fairly late and is only compiled in PC builds and, unintentionally, the dedicated server build. Xbox builds do not compile the interpolation code. One change for interpolation was in a function that gets the camera position of a biped. Players control bipeds. At most, there are four local players, or ‘users’. This function was tweaked so that it would update and read from the interpolation data for the user controlling that biped. However, it was not handling the case where the user controlling the biped did not exist on that machine. So, the user index for non-local bipeds would be -1, which then caused entirely different memory in the interpolation state to be written to and read from. This is further complicated by the fact that the same code path is used by the function that creates a weapon’s projectiles, because to do that, the game needs to know where the player’s camera was aiming. When it comes to the host and remote players, the same code path is still taken but using the predicted weapon fire state of the remote player. Not all weapons are fired in the same predictive manner, which is why some weapons would inflict damage, while other weapons would not.
The fix here was twofold: one, do not compile the code for the dedicated server, and two, guard against cases where the user index is -1.
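A guard of that kind might look something like the sketch below. It is an illustration only. The structure and function names are invented for the example and are not taken from the Halo 2 code; the only details carried over from the description are the four-user limit and the -1 index for non-local bipeds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

const int k_max_local_users = 4;  // "at most, there are four local players"
const int k_no_local_user = -1;   // index seen for bipeds controlled remotely

// Hypothetical per-user interpolation data.
struct InterpolatedCamera {
    float x = 0.0f;
    float y = 0.0f;
    float z = 0.0f;
};

struct InterpolationState {
    std::array<InterpolatedCamera, k_max_local_users> per_user{};
};

// Returns the interpolation slot for a local user, or nullptr when the index
// does not name a local user, so the caller can fall back to the
// non-interpolated camera instead of touching unrelated memory.
InterpolatedCamera* try_get_interpolated_camera(InterpolationState& state,
                                                int user_index) {
    if (user_index < 0 || user_index >= k_max_local_users) {
        return nullptr;
    }
    return &state.per_user[static_cast<std::size_t>(user_index)];
}
```

Returning a null pointer rather than clamping the index is a design choice for this sketch: it forces the caller to handle the "not a local user" case explicitly.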
Halo 3 has various achievements related to players looking at very specific points on very specific maps. One of those maps is Valhalla (internally known as Riverworld). The code that performed these checks, once again, got the user index for a given player…but failed to sanity check that it was not -1, meaning that player was not local to the machine. It would then take that user index and use it to read and write from a static array that tracked how long players were looking at the sign on Valhalla that triggered the achievement after two seconds.
As it turns out, in the Xbox build that shipped with Halo 2’s release, the data immediately before that static array was the pointer to an address where we map some physical memory. What do we put in physical memory? All teh thingz! Game data, textures, you name it! Well, things don’t work so well if you, say, zap that pointer with a zero (aka NULL) or assign it a game time, which is very much not a valid address. Or, more entertainingly, it is a valid address, and then you get some very bad things showing up before crashing.
To make matters worse, the order in which static data appears in the executable will not always be the same. New code is added, some is removed, or a different build machine on the farm is used than for the previous build. So, it could seem like it “works on my machine!”, or you load a debug build and nothing is going amiss. Now you must investigate things in a build with no debug code and with all the optimizations possible enabled. That is never fun.
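To see why an index of -1 lands on whatever sits in front of the array, here is a small sketch that computes byte offsets without reading or writing out of bounds. The struct is a hypothetical stand-in: real static data is not laid out in a struct like this, and as noted its order can change from build to build. The field names and sizes are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout standing in for "a pointer immediately before a
// static array". A 64-bit integer plays the role of the pointer.
struct NeighbouringStatics {
    std::uint64_t mapped_memory_base;  // stand-in for the mapping pointer
    std::uint32_t look_ticks[4];       // one look timer per local user
};

// Byte offset that look_ticks[index] would refer to, computed with plain
// arithmetic so that no out-of-bounds access ever happens.
std::ptrdiff_t look_ticks_offset(int index) {
    const std::ptrdiff_t base =
        static_cast<std::ptrdiff_t>(offsetof(NeighbouringStatics, look_ticks));
    const std::ptrdiff_t stride =
        static_cast<std::ptrdiff_t>(sizeof(std::uint32_t));
    return base + static_cast<std::ptrdiff_t>(index) * stride;
}

// True when the element at this index would overlap the bytes of the
// pointer stand-in.
bool overlaps_pointer(int index) {
    const std::ptrdiff_t begin = look_ticks_offset(index);
    const std::ptrdiff_t end =
        begin + static_cast<std::ptrdiff_t>(sizeof(std::uint32_t));
    const std::ptrdiff_t ptr_begin = static_cast<std::ptrdiff_t>(
        offsetof(NeighbouringStatics, mapped_memory_base));
    const std::ptrdiff_t ptr_end =
        ptr_begin + static_cast<std::ptrdiff_t>(sizeof(std::uint64_t));
    return begin < ptr_end && end > ptr_begin;
}
```

In this layout, index 0 starts at byte 8, just past the pointer stand-in, while index -1 starts at byte 4, inside it. A write through index -1 would therefore overwrite part of the neighbouring value.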
Anyway, the fix? Check for -1. Do not do all teh thingz. Problem solved.
Here is a picture of the code that was ultimately doing The Bads. This has lived in MCC since 2014. Turns out landmines are not limited to Halo 3 gameplay; they are in the code too.
It has since Got Good. Which is more than I can say for my K/D ratio. That remains to get fixed.
What about that other issue, you know, the one with ghost rides and whips? As part of efforts going on behind the scenes for some other work, we ended up making two non-obvious typos on two lines related to object replication. This is the process by which a host tracks and informs clients about game objects and their state. The host gives clients the replication data needed so the client can predictively run the game simulation. Well, the typos related to a change in how some flags were set. On both lines we ended up leaving out a preceding line which would zero out pre-existing flags. This was needed because the code re-uses replication data from previously used state (it is more efficient to allocate one large contiguous table and then use/free entries as needed without resizing it). So, the issue only really reared its head in larger games, or after a match had gone on long enough that freed state would be re-used.
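The stale-flags problem can be shown with a small sketch of a reusable table. This is a generic illustration of the pattern described above, not the actual replication code. The flag names, the slot structure, and the `zero_first` switch are all hypothetical and exist only to compare the two behaviours side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flag bits for illustration.
const std::uint32_t k_flag_position_dirty = 1u << 0;
const std::uint32_t k_flag_driver_changed = 1u << 1;

struct ReplicationSlot {
    bool in_use = false;
    std::uint32_t flags = 0;
};

// A fixed-size table whose slots are handed out and recycled, never resized.
class ReplicationTable {
public:
    explicit ReplicationTable(std::size_t capacity) : slots_(capacity) {}

    // Returns the slot index, or -1 when the table is full.
    int acquire(std::uint32_t initial_flags, bool zero_first) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i].in_use) {
                slots_[i].in_use = true;
                if (zero_first) {
                    slots_[i].flags = 0;  // the kind of line that went missing
                }
                slots_[i].flags |= initial_flags;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Marks the slot free but leaves its old flags behind, as recycled
    // state would.
    void release(int index) {
        slots_[static_cast<std::size_t>(index)].in_use = false;
    }

    std::uint32_t flags(int index) const {
        return slots_[static_cast<std::size_t>(index)].flags;
    }

private:
    std::vector<ReplicationSlot> slots_;
};
```

The first use of any slot looks correct either way, because the table starts zeroed. The stale bits only show up once a slot has been released and handed out again, which fits the issue appearing only in larger or longer matches.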
There were also some other changes which increased the size of some structures that are allocated from the network heap. Basically, this is a fixed area of memory where allocations relating to networking operations are kept. Halo 3 still had the same network heap size as it did back in the Xbox 360 days: 1.5MB. We have since bumped this up. Running out of network heap memory is not a critical error, at least it will not crash your game. But it will lead to problems with trying to allocate new simulation entities to track the state of game objects, among many other network/simulation operations.
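As a rough sketch of why larger structures matter with a fixed heap, the example below models only the byte budget. It assumes 1.5MB means 1.5 * 1024 * 1024 bytes, and the entity sizes used with it are made-up numbers, not real Halo 3 structure sizes. The class and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-budget heap that tracks bytes only.
class NetworkHeapBudget {
public:
    explicit NetworkHeapBudget(std::size_t capacity_bytes)
        : capacity_(capacity_bytes) {}

    // Reports failure instead of crashing when the budget is exhausted,
    // mirroring the "not a critical error" behaviour described above.
    bool try_allocate(std::size_t bytes) {
        if (bytes > capacity_ - used_) {
            return false;
        }
        used_ += bytes;
        return true;
    }

    std::size_t used() const { return used_; }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
};

// Counts how many entities of a given size fit before allocation fails.
std::size_t entities_that_fit(std::size_t heap_bytes, std::size_t entity_bytes) {
    if (entity_bytes == 0) {
        return 0;  // avoid looping forever on a zero-sized request
    }
    NetworkHeapBudget heap(heap_bytes);
    std::size_t count = 0;
    while (heap.try_allocate(entity_bytes)) {
        ++count;
    }
    return count;
}
```

Under those assumptions, a 1,572,864-byte heap holds 6,144 entities of 256 bytes but only 4,096 entities of 384 bytes. Growing a structure quietly lowers the number of objects the heap can track before allocations begin to fail.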
Were you walking up to a Brute Chopper, trying to get in, only to not get in and appear as if the vehicle was mimicking your movements? Well, more than likely the host, or possibly your game, ran out of network memory to track the event that said you are now driving that vehicle so your client state should get in the chopper. Or it ran into the replication flags previously mentioned that were not being reset correctly before reuse.
Cursed Halo can be fun. In moderation.