Leveraging Holistic Synergies: Ben Lopatin

Automated Django Deployments (This Old Pony #82)

March 26, 2019

Today we’re going to walk through a fairly typical web application deployment (obviously using Django as our example) and how to automate it.

Even if you’re not deploying to a single machine (or to multiple long-lived machines), I think you’ll find this useful in seeing how, aside from saving us time and headaches, automation can unlock features that would have been - at best - a PITA to do manually, if not infeasible.

Assume a Django application

Let’s start at the beginning.

First, we’ll assume a Django application. Let’s say it’s called broken_brackets and it’s running with Gunicorn, also Celery for asynchronous tasks, a PostgreSQL database, a modern version of pip, and, for what it’s worth, running behind Nginx.

Let’s say our app lives in /var/broken_brackets/ . How do we get changes there?

Well, one way would be to just dump your code right in there. Strictly speaking you don’t need source control on your server, since you put the code there and run it. So you could use FTP to upload the files or rsync to sync them. And then of course, since you’re not running Django’s development server, you’d need to restart the application server.

And there are plenty of simple situations where this would at least suffice. But it’s forgetting a few steps. Like collecting your static files. Migrating database changes. Updating dependencies. And you likely have to restart or reload several processes after doing this.

The simplest thing to do would be to include a deployment script, e.g. a bash script, in the project root, and have it do the things we would otherwise do manually (after syncing the files).

pip install -r requirements.txt
python manage.py collectstatic
python manage.py migrate
service gunicorn restart

Please forgive the simplifications made here. Do note that the requirement update should go first, because both static file collection and database schema migration rely on the currently installed packages.

So now you could just sync your files and then log into the server to run this script and call it a day. It’s already simpler.

But of course that’s extra steps. What if we could skip one? Provided you have the necessary configuration on the remote machine, you could pull from your source code repository (e.g. Git) in the script.

git pull origin master
pip install -r requirements.txt
python manage.py collectstatic
python manage.py migrate
service gunicorn restart

Hot damn.

But this still requires logging into the remote server, navigating to the directory with the script, and running it.

Folding servers with Fabric

What if we could do all of this with one command?

That’s where a tool like Fabric comes into play. Fabric uses SSH under the hood, but lets you sequence commands and run them from a single Python script.

The goal is something like this:

fab production deploy

Where fab is the command that calls into the installed Fabric library, and production and deploy are respectively custom defined tasks in the Fabric parlance that (a) set the environment and (b) execute the steps necessary to deploy the latest updates.
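To make the shape of this concrete, here’s a stdlib-only sketch of how those two tasks compose - not a working fabfile (Fabric itself would run each command over SSH via its `run()` and `cd()` helpers, with state on `fabric.api.env`), and the host name and paths here are hypothetical:

```python
# Illustrative sketch only: Fabric executes commands over SSH;
# here we just model how an environment task and a deploy task compose.
# Host and paths are hypothetical.

env = {}  # Fabric keeps similar shared state in fabric.api.env


def production():
    """Point subsequent tasks at the production environment."""
    env["host"] = "app.example.com"
    env["app_dir"] = "/var/broken_brackets"


def deploy():
    """Return the commands a deploy would run on env['host']."""
    return [
        f"cd {env['app_dir']}",
        "git pull origin master",
        "pip install -r requirements.txt",
        "python manage.py collectstatic",
        "python manage.py migrate",
        "service gunicorn restart",
    ]


production()
for command in deploy():
    print(command)
```

The important part is the separation: `production` only sets state, `deploy` only sequences steps, so the same `deploy` task works against staging, production, or anywhere else.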

Before we go any further, can you identify any potential weaknesses in our deployment pattern?

We’re doing everything in place. Code updates, static file updates, Python requirement changes… That means if anything fails, whether in the deployment process itself or in the deployed code, that’s just what we get. No turning back! Even to buy ourselves a few minutes. Oy. Wouldn’t it be nice to have actual releases?

So that’s what we’ll target here, something looking closer to releases that we can switch back and forth between (even if the switching isn’t entirely automated).

For our example we’ll still be pulling from Git (deploying from a tarball or zip archive is a perfectly good option too!). We don’t want to do it in place though, so we’ll create a separate directory.

/var/broken_brackets/git_cache/

That’s where the Git repo will live and we’ll pull updates. We’ll keep all of our releases in the “releases” directory, named with a timestamp (using a Git identifier is not a bad idea!).

/var/broken_brackets/releases/2019-02-23-12-45-23/

The application itself will always be served from the latest release directory.

/var/broken_brackets/app/

And that in turn will actually just be a symlink to the most recent release directory. Why? Presumably you have some kind of supervising service, like Supervisor or systemd (or upstart back in the day), and these are configured with a single path to your application. There’s no need to update this path, we just tell it “run this script here” and if we change what’s linked under the path then so be it.
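The swap itself can even be done atomically: create the new symlink under a temporary name, then rename it over the old one, so there’s never a moment when the path is missing. A minimal sketch (the paths are the hypothetical ones from above):

```python
import os


def point_app_at(release_dir, app_link="/var/broken_brackets/app"):
    """Atomically repoint app_link at release_dir.

    os.replace() over an existing symlink is a single rename() on POSIX,
    so readers of the path never see it missing mid-deploy.
    """
    tmp_link = app_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear any leftover from a failed deploy
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, app_link)  # atomic swap
```

Rolling back is then just calling the same function with the previous release directory.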

So when we deploy now we do the following:

  1. Update the server’s “cached” Git fork
  2. Create a new release directory
  3. Enter the new release directory
  4. Update Python dependencies with pip
  5. Collect static files
  6. Migrate database changes
  7. Update the “latest” release symlink
  8. Restart the services

And if something’s gone wrong we can change the symlink back to the old release, restart, and then try again. With a few exceptions of course: database changes, Python dependencies which have been upgraded, and of course new static files - I mentioned we were using Nginx, and these are all stored in /var/www/broken_bracket_static/.

We can’t solve all of these problems but we can get closer.

The point of using a modern version of pip is that pip caches downloads such that running pip again and again and again is pretty cheap. We should include the virtual environment for the project in the release directory, and the static file directory, too (the latter will just need to be readable by Nginx).

It might seem like overkill to create a new virtual environment with each new release (e.g. /var/broken_brackets/releases/2019-02-23-12-45-23/venv/), but it ensures release-specific requirements, and the problem of accumulated files is easily solved by deleting old releases, e.g. anything past the last 20.
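Pruning old releases is a one-liner’s worth of logic, and the timestamped naming makes it easy - a sketch, assuming the releases layout above:

```python
import os
import shutil


def prune_releases(releases_dir, keep=20):
    """Delete all but the newest `keep` release directories.

    Timestamped names like 2019-02-23-12-45-23 sort chronologically,
    so a plain lexicographic sort is enough.
    """
    releases = sorted(os.listdir(releases_dir), reverse=True)
    for old in releases[keep:]:
        shutil.rmtree(os.path.join(releases_dir, old))
```

Run it as the last step of the deploy and disk usage stays bounded no matter how often you ship.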

Now when you run fab production deploy you’ll:

  1. Update the server’s “cached” Git fork
  2. Create a new release directory
  3. Enter the new release directory
  4. Create a new virtual environment in the release directory
  5. Install Python dependencies with pip
  6. Create a new collected static files directory in the release directory
  7. Collect static files
  8. Migrate database changes[0]
  9. Update the “latest” release symlink
  10. Restart the services

Collectively executing these steps from a script doesn’t take that much longer than the original “upload code and restart the server,” but it’s a lot less risky, and the deployment itself is a lot more robust.
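The ten steps map onto a command sequence like this - a sketch only, using the hypothetical layout from above (exporting the cached repo with `git checkout-index` is just one way to populate the release directory):

```python
import datetime
import os

ROOT = "/var/broken_brackets"  # hypothetical layout from above


def release_commands(root=ROOT):
    """Build the ordered shell commands for one release-based deploy."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    release = os.path.join(root, "releases", stamp)
    return release, [
        f"git -C {root}/git_cache pull origin master",                     # 1. update cache
        f"git -C {root}/git_cache checkout-index -a --prefix={release}/",  # 2. new release dir
        f"cd {release}",                                                   # 3. enter it
        f"python -m venv {release}/venv",                                  # 4. fresh virtualenv
        f"{release}/venv/bin/pip install -r requirements.txt",             # 5. dependencies
        f"mkdir -p {release}/static",                                      # 6. static dir
        f"{release}/venv/bin/python manage.py collectstatic",              # 7. collect
        f"{release}/venv/bin/python manage.py migrate",                    # 8. migrate
        f"ln -sfn {release} {root}/app",                                   # 9. swap symlink
        "service gunicorn restart",                                        # 10. restart
    ]
```

Notice the ordering guarantees baked in: dependencies land before collectstatic and migrate, and the symlink only moves after everything else has succeeded.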

Of course there are other ways to deploy Django applications, including platforms as a service (e.g. Heroku), serverless, using Docker, Kubernetes… the overarching ideas are pretty consistent however.

That said, I don’t like locally executed, manually triggered deployments. They couple the deployment process to a person, to your laptop, and to your internet connection, and they don’t scale well past one person (even with two people they require coordination).

So in the next issue we’re going to examine how my favorite home appliances, like the dishwashing machine, are related to automation services (and some strategies and tips for working with them to test and deploy your Django application).

Repeatedly yours,
Ben

[0] One of the benefits here is that every other step has taken place in isolation, and if there’s a failure that failure too is isolated before any attempts are made at changing the database.
