Continuing the Release Management Series , let’s talk about how we changed our bug prioritization system for 5.4 and why.
In recent times, we have been focusing a lot on quality. One common, though inaccurate, measurement of quality is the bug count. In the last two years, we’ve been crushing out over 11,000 bugs per year (!), but even so, we never came close to reaching zero bugs. We don’t make shrink wrapped software running on fixed hardware platforms with specifications that can be locked in. Aside from cases of our own optimizations or fixes creating new bugs, new issues arise daily. There are countless cases, some of them not even coming from our own code base.
So, the question we ask ourselves is, “Which bug do you fix first?” Unity has so many dimensions to work with, so is it fair to compare a “small platform bug” against a “editor usability bug”? Under the old system, we would track a priority, but the teams defined the priorities individually, so it was decidedly inconsistent. What is a shipstopper for one group does not make it a shipstopper for all. While some teams with larger overviews and ability to cross spaces had a consistent priority system, most did not and what was often organizational team priority was assigned at a global scope. You might see how this comes to be a release management problem.
So, in comes “user pain”. The “user pain” idea can be found in a lot of places, but the seeds for us started with another Director of Development in R&D, Erik Juhl, recalling a methodology used during his MMO days where the team would assess the risk of fixing a production bug. It went something like this:
In that situation there was a threshold to reach in order for the whole group to believe a bug was worth fixing immediately. This worked for its purpose of “live” issues in a service and case by case analysis, but it doesn’t exactly scale to bug databases. However, the idea was something we latched on to.
Finding A Formula
With this example in mind, our release management sampled all the old priority system “Priority 1 (Drop everything now)” and “Priority 2 (Shipstopper)” bugs marked for fixing in 5.4, which amounted to some ~150 bugs at the time in mid-development.
We started with something like this:
Here, Severity and IsRegression were information that previously existed in our bug tracking system (see below about Severity changes). “QA Risk” was an attempt to describe the risk in the ability to test the bug. “Technical Risk” was attempting to capture the complexity and “halo effect” (the likelihood the fix will influence another part of the system).
In looking at the sampling of bugs and calculated pain, the list ordering looked wrong from our perspective. Mainly, we decided to drop QA Risk & Technical Risk as they are things only for release assessment, “if we should accept a bug fix”, instead of user pain, which should represent “should I prioritize fixing this.”
So, we next refined the formula to something like:
We introduce a number of things here:
- Platform Importance – to separate individual edge platforms spanning to cross platform
|1 = default||Single platforms, i.e. Tizen, PS4, etc.|
|2 = Core Platforms||Desktops + iOS + Android + WebGL|
|3 = Editor||Greater than Core Platforms, because working on making the game trumps shipping a particular platform|
|4 = Cross-Platform||Most or all platforms|
- Prevalence to consider how widespread it is
|1 = Affects few or very select users|
|2 = Affects some, but not majority of users|
|3 = Affects a majority|
|4 = Affects everyone|
- IsEnterpriseSupport – addressing Enterprise Support requests and their importance
- IsUpcomingPlatform – working with various partners, this is a way to highlight issues that might need to be tracked
- IsInstability – is a bug causing internal development slowdowns by introducing an automated test instability. While not obviously user pain, it’s indirect and very very impactful which is why it gets such a high rating. These can cause significant slowdowns that impact how much we can fix or improve quickly.
- Vote – a simple vote counter collection from the Issue Tracker
This formula gave a nice spread of bugs that seemed to properly give weight to appropriate bugs, and we worked with this idea for a while. Over time we made the following adjustments:
- IsUpcomingPlatform was dropped, as it would carry excessive weight beyond the life of the initial release. Plus it was expected that the team in charge of the platform would be owning, raising concern and chasing these bugs.
- IsEnterpriseSupport was dropped. This was in recognition that the Enterprise Support team in conjunction with our Sustained Engineering were already on top of these marked bugs. We were running into each other looking at the same bugs with extra engineers now looking at bugs that someone was already working on.
- “Release Manager” tag was added to give weight to items like Known Issues or other items showing up lower in the ranks that were of high importance to users, but scoring did not reflect so.
So now we’ve arrived at:
We started with a threshold of 54, on the basis of:
- Severity (Major functionality impacted) >= 2
- Platform Importance (at least Editor) >= 3
- User Prevalence (Majority) >= 3
- Is Regression (True) = 1
However once we went on and removed the Enterprise weighting, we dropped the threshold to 48 by arbitrary inspection. This is the current state of things, but we recognize it’s a living idea and may change over time.
The one other adjustment was made that our Severity classification was somewhat broken. It would allow minor issues to trump functional issues on the basis of “no workaround available.” Here’s the shift of mapping we decided to implement along side of things:
|1||Crash or major loss of functionality||Crash, freeze, data loss or work stops|
|2||No workaround||Core functionality majorly impacted|
|3||Workaround is possible||Secondary functionality broken|
|4||Minor or cosmetic issue||Minor or cosmetic issue|
Across R&D there’s been appreciation of the new approach. Many teams have adopted it and are quite pleased thus far. QA members and release management have gone through lots of the existing bugs to assign appropriate values for the newer variables we had to add.
With the adoption of this process, people are starting to step away from the old priority mechanisms and view the bugs in light of how they impact the users.
What’s interesting about the graph above, is that painful bugs still counted for less than our per team assessments of priorities. The “Pain” line grew because we expanded scope of bugs being assessed far beyond 5.4 P1 + P2. It eventually encompassed any bug listed from the past 5.0 to the future 5.5 (5.x) for all old prioritizations of P1 – P3, plus our bug triage queue. The numbers still were less than the “high priority bugs” of our past weighting. Part of the up and down was the incorporation of 5.3 items and then dropping of the IsEnterpriseSupport bugs which already had coverage.
Thus far, what we’re tracking with user pain plus a couple other items like upgrade issues is showing to be largely the main user concerns.The current feeling about the release is fairly positive without correlating as strongly to the number of P1s and P2s.
So, what’s next for the “user pain” system?
Well, currently it’s a fairly manual analysis for us, and not fully integrated into our bug system. Also, we need to take a pass at all the older bugs that haven’t been assessed for platform importance and prevalence.
Additionally, we still know there’s tweaking to the formula to consider. We may have missed some aspects to include or are still giving a lot of weight to area we shouldn’t. Also, there’s a great benefit to being able to change the formulas when needed, so it better reflects the issues of the moment. We know that the pain index is not an absolute measurement, but guidance to help show how we might consider prioritization.
Lastly, we’ve started experimenting with the idea across other areas including the release risks for acceptance of fixes during RC/patch. We’re also thinking how this might be useful to prioritize features or re-factors.
It’s all still an experiment and thus far current results and perception is that a more unified system of prioritizing is helping. We’ll continue the experiment forward and update you some time in the future on how things keep going.
For the next round, we’ll talk about some internal tooling changes.