Leveraging Holistic Synergies

Triaging and squashing bugs in existing Django apps (This Old Pony #53)

We're officially in the middle of an ongoing series on how to prioritize issues in your Django application.

This week, we're going to talk about one of my favorite topics:

A very elusive bug

That's right, bugs[0]. More specifically, strategies for triaging, identifying, and resolving bugs - in your web app, at least.
 

What is a bug?

A bug is an insect with piercing and sucking mouthparts... sorry, what I mean is that it's an error, a defect, a thing that does not do the thing it was supposed to do.

Bugs can be of trivial concern and they can also cause industrial catastrophe. They can result in application errors and they can look like normal working software. They can be painfully obvious once someone has identified a problem, and they can be subtle, depending on factors no one imagined would affect the software. And perhaps most importantly, they can be as easy to fix as deleting - or adding - a delimiter, or they can require significant architectural changes to eliminate.

Initially you won't know where your bug (or bugs) falls on every one of these spectrums, but the first step in approaching any bug is to pin down as much of the above as possible.
 

Triaging bugs

There are tools for debugging, including Python- and Django-specific tools, but - so that we can belabor the point - the most important tools are strategies. You need to understand who identified the issue, when, what they were doing, and whether it recurs. A bug is a bit like a crime statistic: the evidence is likely based only on reports, not on actual incidents.

Ultimately, your goal is to figure out how critical a bug is, whether it's user facing, and what the costs of leaving it unsolved are. If it's less than a nuisance for one staff user and affects no one else, perhaps a resolution can wait. But if you discover that it's a data error, or even an application error, causing ripple effects throughout the application for customers? Then you should tackle it immediately.

This goes against the common wisdom that you always tackle bugs first. However, this common wisdom - while reasonably wise - misses the implicit economics of software development decisions. Weighing the costs and benefits of identifying and fixing a bug against the costs and benefits of working on a business-driving feature, it may make the most sense to put off going after the bug[1].

My own recommendation is to create a hybrid prioritization based on both criticality and perceived ease of identification/resolution. The latter is an element of a "snowball" strategy, where you attack small things first and build momentum toward the large issues. It tends to be more a matter of psychology (both individual and team) than of technology, but code doesn't write itself. You may be left starting with a giant ball of mud because it's so serious and urgent, but more often you'll end up rolling up at least a few good bugs before getting to something really hairy. Aside from generating happy feels from watching tickets close, you'll also likely start clearing the field for additional updates and bug fixes.
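The hybrid ordering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: the `Bug` fields, the 1 to 5 scales, and the `URGENT` threshold are all hypothetical. Anything at or above the threshold jumps the queue regardless of effort; everything else is ordered snowball-style, cheapest fix first, with ties broken by higher criticality.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    criticality: int  # hypothetical scale: 1 (cosmetic) .. 5 (customer-facing data/app errors)
    effort: int       # hypothetical scale: 1 (one-line fix) .. 5 (architectural change)


URGENT = 5  # hypothetical threshold: bugs this critical are handled first, whatever the effort


def triage_order(bugs):
    """Urgent bugs first (cheapest urgent fix first), then a 'snowball' of the
    rest: lowest effort first, ties broken by higher criticality."""
    urgent = sorted((b for b in bugs if b.criticality >= URGENT), key=lambda b: b.effort)
    rest = sorted(
        (b for b in bugs if b.criticality < URGENT),
        key=lambda b: (b.effort, -b.criticality),
    )
    return urgent + rest


# Usage with made-up tickets:
BUGS = [
    Bug("Typo in staff admin label", criticality=1, effort=1),
    Bug("Order totals wrong for customers", criticality=5, effort=4),
    Bug("Slow CSV export", criticality=3, effort=2),
    Bug("Broken password reset link", criticality=4, effort=1),
]
for bug in triage_order(BUGS):
    print(bug.title)
```

Ranking by two simple keys keeps the policy easy to argue about in a planning meeting; a weighted score is an equally reasonable choice if your team prefers a single number.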
 

Quick wins: what to look for

When you're hunting for bugs with a somewhat-known scope, there are a few things to keep an eye out for that tend to harbor or hide buggy behavior:

Tools of the trade

Debugging tools let us do several different things: observe, replicate, inspect, and test. 

In the case of application errors, the first step is making sure you have a good system set up for tracking exceptions and errors. Logging is a good start, and if you include the full stack trace, that's a significant step forward. In our own work we usually recommend a dedicated error tracking service; once integrated into your project, these can often capture the full stack trace by default, including the context for each frame, which is _exceptionally_ helpful.
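As a rough sketch of "logging with the full stack trace", the snippet below uses a config dict of the same shape as Django's `LOGGING` setting (which Django passes to `logging.config.dictConfig`), applied directly so it runs without Django. The `myapp` logger name, the in-memory `LOG_STREAM` destination, and the `average_order_value` helper are hypothetical stand-ins; in a real project the handler would point at a file, syslog, or your error tracker.

```python
import io
import logging
import logging.config

# Hypothetical stand-in for a real destination (file, syslog, error tracker).
LOG_STREAM = io.StringIO()

# Same shape as Django's LOGGING setting.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {"format": "%(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "errors": {
            "class": "logging.StreamHandler",
            "stream": LOG_STREAM,
            "formatter": "verbose",
            "level": "ERROR",  # this handler only records errors
        },
    },
    "loggers": {
        "myapp": {"handlers": ["errors"], "level": "INFO"},  # hypothetical app logger
    },
}
logging.config.dictConfig(LOGGING)
logger = logging.getLogger("myapp")


def average_order_value(total_cents, order_count):
    # Hypothetical helper with a latent bug: no guard for zero orders.
    return total_cents / order_count


try:
    average_order_value(12_000, 0)
except ZeroDivisionError:
    # logger.exception logs at ERROR level and appends the full traceback.
    logger.exception("average_order_value failed (order_count=%d)", 0)

print(LOG_STREAM.getvalue())
```

The key detail is `logger.exception` (or `exc_info=True` on any logging call) inside the `except` block; a plain `logger.error(str(exc))` records the message but loses the stack trace.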

More generic logging of minor issues and general information is also helpful. However, the advice to "use logging", while good, is more often than not a bit light on details. What should you log, when, and how? This will vary from situation to situation within an app, but there are a few good places to start:

Testing in production gets a bad rap, but you already do it and so does everyone else[4]. The referenced article is mostly about testing new changes, but production can also give you data for identifying bugs beyond what logging captures. Provided you're not changing users' data or violating privacy agreements, you can _assume the role_ of an end user and see the site as they actually see it. There are tools for this[5] which don't require that you log in with someone else's password. It can also be handy to include some data in your presentation layer, whether that's an HTML template or a JSON API response. This might include version sentinels or other identifying information that hints at the source of the current state.
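One way to expose a version sentinel is sketched below, assuming (hypothetically) that your deploy process exports an `APP_VERSION` environment variable such as the git commit SHA. `version_context` follows Django's context processor convention (a callable that takes the request and returns a dict merged into template context, enabled by adding its dotted path to `TEMPLATES[...]["OPTIONS"]["context_processors"]`); `with_version` and its `_version` field are hypothetical names for the JSON API case.

```python
import json
import os


def current_version() -> str:
    # Hypothetical: the deploy sets APP_VERSION (e.g. a git commit SHA).
    return os.environ.get("APP_VERSION", "unknown")


def version_context(request):
    """Django-style context processor: adds `app_version` to every template,
    e.g. <!-- build {{ app_version }} --> in a base template."""
    return {"app_version": current_version()}


def with_version(payload: dict) -> str:
    """Serialize an API payload with a hypothetical `_version` sentinel field."""
    return json.dumps({**payload, "_version": current_version()})
```

When a user reports something odd, a glance at the page source or the API response then tells you which build produced it, which quickly rules a stale deploy or cache in or out.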

Last but not least, tests (i.e. automated, code-based tests) are not just useful for ensuring bug fixes work; they can be handy for identifying bugs to begin with. An underused strategy in testing Python projects of all kinds is property-based testing. Hypothesis[6] is the tool of choice in Python, and in essence it allows you to run highly parameterized, randomized tests. Imagine a parameterized test (or table test) whose values for each argument span the range of allowed values, hundreds or thousands of them, letting you test a function's behavior across its whole domain. It can be tricky to wrap your head around at first because you can't fall back on testing for specific result values, but it's invaluable for finding edge cases that are otherwise difficult to think of.
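A small property-based test with Hypothesis might look like the sketch below. The `page_count` pagination helper is hypothetical; the point is that the test asserts properties that must hold for every input (the pages hold all the items, and no page is wasted) rather than a handful of specific expected values, and Hypothesis generates the inputs, leaning on boundary values like zero.

```python
from hypothesis import given, strategies as st


def page_count(total: int, per_page: int) -> int:
    """Hypothetical pagination helper: pages needed to show `total` items."""
    return -(-total // per_page)  # ceiling division without floats


@given(
    total=st.integers(min_value=0, max_value=1_000_000),
    per_page=st.integers(min_value=1, max_value=500),
)
def test_page_count_bounds(total, per_page):
    pages = page_count(total, per_page)
    # Property 1: the pages hold every item.
    assert pages * per_page >= total
    # Property 2: no page is wasted (dropping the last page would lose items).
    assert (pages - 1) * per_page < total


if __name__ == "__main__":
    test_page_count_bounds()  # a @given test can be called directly, or run by pytest
```

A tempting "simpler" implementation such as `total // per_page + 1` passes a few hand-picked examples but violates property 2 whenever `total` is an exact multiple of `per_page` (including `total == 0`), which is exactly the kind of edge case this style of testing surfaces.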

Ever on the hunt,
Ben

[0] Technically Apheloria virginiensis isn't a bug, seeing as how they're _Diplopoda_ and not Hemiptera, but colloquially, yeah.
[1] Yes, this is technical debt: not the creation of it, but the carrying forward of it. It's like not paying down a loan when you have the money to do so, because there's another use for that cash that's more critical or carries a greater return. If that sounds dangerous to you, think about it in the context of a business rather than personal finance. There can be long-term benefits to tackling bugs first as a policy, as over time it may focus you (or your team) on producing fewer blocking bugs, but getting to the long term often requires slightly different decisions.
[2] pdb: https://docs.python.org/3/library/pdb.html, variants on PyPI: https://pypi.org/search/?q=pdb
[3] Embedding IPython: https://ipython.readthedocs.io/en/stable/interactive/reference.html#embedding
[4] "Testing in production: Yes, you can (and should)": https://opensource.com/article/17/8/testing-production
[5] User hijacking: https://djangopackages.org/grids/g/user-switching/
[6] Hypothesis: https://hypothesis.readthedocs.io/en/latest/

Originally published 2018-07-03