February 05, 2019
We’ve been looking at why people erroneously think they’ve reached the limits of their software stack:
Both generally and with regard to Django, in most of the cases where I’ve seen people who think they’ve reached the limits of what their tools can do, they’re simply wrong, often by significant margins. The reasons, though, vary a lot. They’re interesting in and of themselves, but identifying them is important unless you have piles of cash to burn and don’t mind risking your entire software stack.
This week we’re going to continue looking at architecture, specifically how the architecture of data models creates problems, problems that are sometimes confused with the tech stack.
Customer accounts are often subject to an anti-pattern in data modeling, one that tends to cause all kinds of problems later.
Let’s say you want to create a web application that allows people to register for the site and pay to use the site. How would you go about doing this?
The first thing you’d probably do is allow people to register as users on the site. Give them a way to provide some basic identifying information, a password, an email address, and maybe confirm the latter, and then let them access everything behind the great login curtain.
Next you’d want to add in some hooks for payment, likely including recurring payments, otherwise known as subscriptions. So now you set up a payment backend to allow each user to provide payment information and actually pay you.
That’s fine. But now what happens when you want to allow customers to add additional users? Maybe you want to allow up to 3 people to have access to a single subscription, or you want customers to be able to add additional seats to a subscription.
This is doable, but it’s starting to get a little tricky, because the subscription is associated with the user.
Next, how about we allow users to have access to multiple accounts? I may have my own subscription, but you may want to let me create resources in your account, too. Suddenly it looks like associating everything with users isn’t such a good idea any longer. From subscriptions to API keys, the data model is missing a key layer, an account that is distinct from any one user, and to work around this all kinds of weird relationships get constructed, not to mention hard-to-test and hard-to-maintain spaghetti code.
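Here is a minimal, framework-agnostic sketch of that missing layer. It uses plain Python dataclasses rather than Django models, and every name in it (`User`, `Account`, `Subscription`, `seat_limit`, the example emails) is a hypothetical chosen for illustration. The point is only that billing and membership hang off the account, so seats and multi-account access stop being special cases.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """A person who can log in. Carries no billing information."""
    email: str


@dataclass
class Account:
    """The customer: the thing that pays and owns resources."""
    name: str
    seat_limit: int = 1
    members: list[User] = field(default_factory=list)

    def add_member(self, user: User) -> None:
        if user in self.members:
            return  # already a member; nothing to do
        if len(self.members) >= self.seat_limit:
            raise ValueError(f"{self.name} has no free seats")
        self.members.append(user)


@dataclass
class Subscription:
    """Billing is attached to the account, not to any one user."""
    account: Account
    plan: str


def accounts_for(user: User, accounts: list[Account]) -> list[str]:
    """Names of every account the user can access."""
    return [a.name for a in accounts if user in a.members]


ben = User("ben@example.com")
ana = User("ana@example.com")
bens_co = Account("Ben's Co", seat_limit=3)
anas_co = Account("Ana's Co", seat_limit=2)

bens_co.add_member(ben)
anas_co.add_member(ana)
anas_co.add_member(ben)  # Ben now has access to a second account

sub = Subscription(account=anas_co, plan="team")
print(accounts_for(ben, [bens_co, anas_co]))
```

Adding seats becomes a change to one field on the account, and a user with access to several accounts is just a user who appears in several membership lists.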
Premature optimization is the root of all evil.
One form of premature optimization is prematurely transforming data. This is something I wrote about previously, when your trusted guide did exactly this. Prematurely concerned with the future size of the database, I designed the API-facing data tables to reflect only the aspects of the data which we knew the project needed, rather than the data which would actually be coming in.
It seemed cleaner. Instead of one (or more) table for each vendor API with possibly dozens of columns for data we didn’t think we needed, the result was a single table which would map to the original application specs with not a single piece of unnecessary data.
This is a data straitjacket.
You don’t want to fill your database with unnecessary junk, but when you’re dealing with data that is going to be transformed, among other things, you first need a data model that keeps an accurate representation of what will be transformed.
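One way to picture this is to store what the vendor sent as-is and derive the application's view from it afterwards. The sketch below uses the standard library's `sqlite3` and `json` modules; the table name `vendor_raw`, the vendor name, and the payload fields are all made up for the example and are not from the project described above.

```python
import json
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_raw ("
    " id INTEGER PRIMARY KEY,"
    " vendor TEXT NOT NULL,"
    " payload TEXT NOT NULL)"  # the response as received, stored as JSON text
)


def store_raw(vendor: str, payload: dict) -> int:
    """Persist the vendor response untouched and return its row id."""
    cur = conn.execute(
        "INSERT INTO vendor_raw (vendor, payload) VALUES (?, ?)",
        (vendor, json.dumps(payload)),
    )
    return cur.lastrowid


def to_listing(raw_id: int) -> dict:
    """Derive only the fields the application needs today."""
    (payload,) = conn.execute(
        "SELECT payload FROM vendor_raw WHERE id = ?", (raw_id,)
    ).fetchone()
    data = json.loads(payload)
    return {
        "title": data["title"],
        "price_cents": int(Decimal(data["price"]) * 100),
    }


# Hypothetical vendor response with a field the original spec did not ask for.
raw_id = store_raw(
    "acme", {"title": "Widget", "price": "19.99", "warehouse": "B7"}
)
listing = to_listing(raw_id)
```

If the spec later needs `warehouse`, only `to_listing` changes; the stored rows already contain it.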
Unlike our previous two examples, generic relationships, or generic foreign keys, are largely specific to the Django ORM. A generic relationship is one that isn’t defined as a constraint in the database, but is instead resolved from the app and model name of the target plus an object ID. It’s a nice fit for very few apps, like tags, which may be used to create a taxonomy for a wide variety of types of content, from images, to books, to products for sale, and even people. For pretty much everything else it’s a shortsighted crutch that leads to slow performance and hard-to-maintain code.
Think twice, even thrice, before modeling your data with undefined relationships.
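To make the difference concrete, here is a small sketch in plain SQL via Python's `sqlite3` rather than the Django ORM. The tables (`book`, `generic_tag`, `book_tag`) and the `library.book` label are hypothetical, and the text `content_type` column is a simplification of how a generic relationship names its target. It shows the database accepting a generic row that points at nothing, while the explicit foreign key rejects the same mistake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Generic style: the target is named by a label plus an id,
    -- so there is no constraint for the database to enforce.
    CREATE TABLE generic_tag (
        id INTEGER PRIMARY KEY,
        label TEXT NOT NULL,
        content_type TEXT NOT NULL,
        object_id INTEGER NOT NULL
    );

    -- Explicit style: a real foreign key to a specific table.
    CREATE TABLE book_tag (
        id INTEGER PRIMARY KEY,
        label TEXT NOT NULL,
        book_id INTEGER NOT NULL REFERENCES book (id)
    );
    """
)
conn.execute("INSERT INTO book (id, title) VALUES (1, 'Example Book')")

# Book 999 does not exist, yet the generic row is accepted.
conn.execute(
    "INSERT INTO generic_tag (label, content_type, object_id) VALUES (?, ?, ?)",
    ("python", "library.book", 999),
)


def tag_book(book_id: int, label: str) -> bool:
    """Tag a book through the real foreign key; False if the book is missing."""
    try:
        conn.execute(
            "INSERT INTO book_tag (label, book_id) VALUES (?, ?)",
            (label, book_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With the explicit table, the dangling reference is caught at write time instead of surfacing later as a missing object in application code.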
What makes data model problems challenging is that the data model is akin to the foundation of your application. Making changes means not just making significant code changes but a long sequence of migration steps to ensure that production data is safely updated.
It can be scary to make these changes, but like any maintenance issue, the cost of making them increases the longer it’s put off.
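As a rough illustration of what such a sequence of steps can look like, the sketch below moves subscriptions from being owned by users to being owned by accounts. It is written against an in-memory SQLite database with hypothetical table and column names; in a real project the same steps would be spread across separate schema and data migrations and run against a copy of production data first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE subscriptions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),
        plan TEXT NOT NULL
    );
    INSERT INTO users (id, email) VALUES (1, 'a@example.com');
    INSERT INTO users (id, email) VALUES (2, 'b@example.com');
    INSERT INTO subscriptions (user_id, plan) VALUES (1, 'solo');
    INSERT INTO subscriptions (user_id, plan) VALUES (2, 'team');
    """
)


def migrate(conn: sqlite3.Connection) -> None:
    """Move subscription ownership from users to accounts in small steps."""
    # Step 1: add the new structures alongside the old ones.
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " owner_id INTEGER NOT NULL REFERENCES users (id))"
    )
    conn.execute(
        "ALTER TABLE subscriptions"
        " ADD COLUMN account_id INTEGER REFERENCES accounts (id)"
    )
    # Step 2: backfill one account per existing subscriber.
    conn.execute(
        "INSERT INTO accounts (owner_id)"
        " SELECT DISTINCT user_id FROM subscriptions"
    )
    # Step 3: point each subscription at its owner's new account.
    conn.execute(
        "UPDATE subscriptions SET account_id ="
        " (SELECT id FROM accounts WHERE owner_id = subscriptions.user_id)"
    )
    conn.commit()


migrate(conn)
```

The old `user_id` column is deliberately left in place here. Dropping it would be a later step, once no code reads it any longer.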
Postscript: The customer management issue is illustrative of the kind of data model challenge that you’re likely to run into working with an existing application, especially one that’s “organically” grown as a production application. It’s also specifically representative of a problem we’ve encountered again and again with SaaS applications over the years, and it’s why I created Django Organizations.
Another fine post by Ben Lopatin.
© 1997-2019 Ben Lopatin.