Leveraging Holistic Synergies

When you start looking at tutorials and guides for a configuration management tool like Ansible, most of them show how to get started or provide a great picture of what a nice complete set up looks like. That's great if you're trying to figure out the basics or start building a new project, but it leaves open the question of how to start applying these tools - and concepts - to existing projects. This is a story on how to do that.

Why Ansible and not just Ansible

We'll look at how to do this using Ansible as the tool but the principles are largely tool agnostic, so if you're using Puppet or Chef or Salt or Bash scripts it shouldn't matter too much.

For smaller projects and for getting started in general the barrier to entry with Ansible is considerably lower. You can eschew needing agents on your target machines and for most configuration the playbook YAML is simpler to work with.

The existing project situation

For the example I'll use a project we've been working on. It's a site that we've been working on in maintenance mode for the past couple years after taking over that role from another agency and inheriting the existing system. We're now working with this agency to redesign and overhaul the site, which is going to include some significant upgrades to core components.

The stack includes Nginx to reverse proxy requests and serve static media, Apache running mod_wsgi to serve Django requests and to serve PHP for the WordPress site running on the same domain; PostgreSQL for the main Django site; MySQL for the WordPress blog; Memcached for caching; Postfix for email routing; and of course the assorted other system components like cron jobs and general system configuration.

The stack was overdue for an upgrade along with the design, but with an OS upgrade due itself, it was time to start updating everything else. We wanted more up-to-date versions of Python and PostgreSQL, to start with, and better control over logging and montioring.

Picking a starting point

The best advice I heard about starting testing in an existing software project it is to start with the bugs. Every time you encounter a bug, write a test to replicate the bug and solve the problem (satisfying the test). You just keep doing this, and writing tests for new features, rather than trying to approach writing a complete test suite from scratch. Eventually you'll end up with a decent test suite with reasonable coverage.

Similarly, it's simpler to start your configuration management with problem areas, and notwithstanding that, start with anything you need to immediately change. It also helps to pick things that require minimal configuration, i.e. use the snowball method. Something like memcached is a pretty good candidate here. You're installing it and then modifying one configuration file. That's a cinch.

Start with the small pieces. Don't try downloading a complete set of roles (Ansible) or modules (Puppet) to manage the entire service. Let's take Nginx, for example. It's a fairly simple configuration albeit one with several components. A full configuration will ensure that it's installed, manage the service configuration, maybe update the startup scripts, and then let you add your local sites.

By example: configuring the status quo

So in our scenario what you need to do is update some minor item in the site configuration for one site. Let's make it simple, it's just a redirect at the web server level. Now you have a staging site and a production that you need to make this change on, sequentially. You could decide to fully manage Nginx at this point, but here I'd argue that it's enough to just manage that one file.

Role call

We will go so far as to set up an Nginx role, rather than just using a one-off playbook. This will require a small folder structure and will make further work simpler, but the immediate benefit is that it'll be easier to work with files and templates. There's a staging environment and a production environment, so this difference needs to be accounted for. There are several ways to handle this, but for simplicity in this example we'll use distinct files for each environment.

Here's the role structure:

roles/
    nginx/
        files/
            staging_site.conf
            production_site.conf
        handlers/
            main.yml
        tasks/
            main.yml

The files folder should be self-explanatory. There's a site configuration file for each environment. The tasks folder is where all of the Ansible task files go. When executing a role Ansible will look for main.yml - you can include other task files to break up tasks into logical groups, but you'll still need a main.yml file to include them from.

Here's what the tasks file looks like:


---
- name: Add default site
  copy: >
    src="{{deployment}}_site.conf"
    dest=/etc/nginx/sites-available/default
  sudo: yes
  notify: Reload Nginx

- name: Enable the default site
  file: >
    state=link
    src=/etc/nginx/sites-available/default
    dest=/etc/nginx/sites-enabled/default
  sudo: true
  notify: Reload Nginx

Your site configuration file names may differ.

A handler, as referenced by notify above, is a task executed in response to another. Making the changes to Nginx's configuration isn't enough, we need to reload those changes, too. I've added it for both tasks because I want Nginx to reload under any change here, but definitely after the second task. However since the first task could result in a change without the second doing so (the link should be unchanged) I'll add it twice.


---
- name: Reload Nginx
  service: name=nginx state=reloaded
  sudo: true

For the contents of each site configuration file, start by copying the exact contents of the target file from that environment. The first step is ensuring that our process matches what's on the server. We're going to codify the existing setup before making even a tiny change.

Environments and execution

In order to deploy this change, we'll create a playbook to specify which roles to include. Here's that file, called configure.yml:


---
- hosts: all
  roles:
  - nginx

Next create a hosts file for each environment, e.g. hosts.staging, hosts.production. You can name them otherwise, of course, but that's my own convention. The hosts.staging file will include your host servers, e.g.:

[staging]
mydomain.com

Note that you can add domains, IP addresses, or named servers from your SSH configuratiNote that you can add domains, IP addresses, or named servers from your SSH configuration.

One last thing: the tasks file references the configuration file paths using a variable and that's not set anywhere. We'll use the group_vars folder to include a file for each environment with variable definitions. group_vars/staging will look like this:


---
deployment: "staging"

Pretty simple. You can also include an all file for variables which apply to all hosts. These are overridden by host specific variables.

Your file structure should look like so now:

group_vars/
    production
    staging
roles/
    nginx/
        files/
            staging_site.conf
            production_site.conf
        handlers/
            main.yml
        tasks/
            main.yml
configure.yml
hosts.production
hosts.staging

Now you're ready to execute:

ansible-playbook -K configure.yml -i hosts.staging --check

The -K option will prompt for the sudo password; the -i option is followed by the path to the hosts file we want to use; and the --check flag tells Ansible not to execute any of the tasks but just report back whether changes would be made. Our expectation is that no changes will be made - presuming, of course that the file contents are exactly the same. Go ahead and run this without the check flag on staging, and do the same sequence on production.

By example: incremental changes

Now let's add our change to the configuration. It's a simple redirect rule, e.g.:

location /redirect/ {
    rewrite ^(.*) http://www.othersite.com/ permanent;
}

Drop that into the appropriate place in the site configuration file for each environment. For good measure add some comments to the file to the effect that one should not make changes to it directly since it's under new management. Now execute the playbook again as before. Your one redirect is now active across your environments. Woohoo!

Complete configuration

Of course it'd have taken a fraction of the time to make this change directly, but now you can repeat changes, exactly, across environments. You have the benefit of being able to more easily test these changes and to retain a history of what changed, by whom, and why.

Once you've got your immediate changes out of the way, start fleshing out your configuration of the status quo. Add a task to install Nginx, and include additional configuration files in their present state. For each addition, execute the playbook across your environments in sequence, using the --check flag as desired. This is a tad cautious, but the goal is to make small, easy to track changes.

As you build out your configuration, including additional roles, the best test of your configuration's completeness is how well you can rebuild your current environment from the group up using your configuration code. A disposable development environment such as a local VM or cheap cloud instance is your friend here.

Wrap up

Your strategy should be to start small, both in terms of the scope of processes or files to manage, and to work incrementally. Work incrementally with both the changes you make and the systems against which you make your changes.

If this sounds crazy and exceptionally conservative, you're right. And that's how I like it. We have an existing system, a production system even, and our goal is not to make tons of changes at once, but incrementally in the smallest fashion possible. Eventually as you start checking more systems and making more changes you'll get more pieces of more services configured authoritatively and codified.

Originally published 2014-06-28