Managing software repositories in production environments

Introduction

This article explains the benefits of managing the propagation of software and software updates between repositories in production environments, discusses how this might be done and highlights some useful tools. It mentions Linux distributions and tools, but the concepts discussed are universal.

Symmetry

On home computer systems used by one or two people, regularly applying software updates is important in order to protect against the latest viruses, to fix bugs in applications and to make the latest features available.

On production systems running mission-critical applications, regularly applying software updates is of subordinate importance; more important is the availability of those mission-critical applications.

(Actually, even on home systems, there may be critical applications; e.g. an email client while waiting for a job offer, a web browser to do some urgent internet banking.)

When there is unscheduled downtime, then recovery will be cheaper (i.e. faster and requiring less effort) if the environment is as predictable as possible. “As predictable as possible” means that the number of differences (between system S1 and system S2, or between system S1 at time T1 and the same system at time T2) is as low as possible. We call this symmetry.
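One way to put a number on symmetry is to compare the installed package versions of two systems (or of one system at two points in time). The sketch below assumes that package versions are an adequate measure of "differences"; the function name and the sample data are hypothetical.

```python
def count_differences(packages_a, packages_b):
    """Count packages whose installed version differs between two snapshots.

    A package that is present in only one snapshot counts as a difference.
    """
    names = set(packages_a) | set(packages_b)
    return sum(1 for name in names if packages_a.get(name) != packages_b.get(name))


# Hypothetical snapshots: package name -> installed version.
s1 = {"bash": "5.1", "openssl": "1.1.1n", "apache2": "2.4.52"}
s2 = {"bash": "5.1", "openssl": "1.1.1w", "nginx": "1.18.0"}

print(count_differences(s1, s2))
```

The lower the count, the more symmetric the two systems are in the sense used here.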

In order to achieve symmetry, at least the following two things are needed:

  1. tools for automatic installation
  2. software update repositories which are:
    A. persistent
    B. stable
    C. complete

Regarding point 1: we are human; if a single person repeatedly follows a complex procedure to install a system then the results will be inconsistent; if different people follow the same complex procedure then the results will be even more inconsistent. Automatic installation is an absolute requirement for symmetry.

Pull-propagation of software updates

Software packages and updates to software packages are created by Software Authorities (e.g. GNU), from whom they are pulled by Distribution Authorities (e.g. Debian), from whom they are pulled by upstream mirrors (e.g. the official Australian Debian mirror). Where they are pulled to next depends on what system administrators want. In the simple case of a home computer system, they are probably directly pulled and installed by the home computer system itself, as illustrated below:

[Figure: pull-propagation-1]

We refer to this as the ‘pull-propagation map’. Also, we refer to both updates to software packages and also initial installation of software packages as ‘software updates’, as both take the same route across the pull-propagation map.

Regarding point 2A in the previous section: if a new system is required but the upstream mirror is no longer available, then the new system cannot be installed, which may compromise an organisation’s ability to fulfil its mission. To avoid this, a local copy of the upstream mirror should be made. Since a mirror is widely understood to be up to date, we should not deviate from this understanding and so should update this mirror on a regular basis (e.g. via cron). Also, having an up-to-date mirror will help answer the question “but does it work in the latest release?”.
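A regular mirror update could be as simple as a script run from cron. The following is a minimal sketch, assuming the upstream mirror is reachable over rsync; the upstream URL, the local path and the function names are hypothetical, and a real mirror would normally restrict the sync to the releases and architectures it needs.

```python
import subprocess


def build_rsync_command(upstream, local_dir):
    """Return an rsync command that makes local_dir match the upstream mirror."""
    return [
        "rsync",
        "--archive",       # preserve permissions, times and symlinks
        "--delete-after",  # remove files dropped upstream, once the transfer is done
        upstream.rstrip("/") + "/",   # trailing slash: copy the directory's contents
        local_dir.rstrip("/") + "/",
    ]


def update_mirror(upstream, local_dir):
    """Run the sync; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(upstream, local_dir), check=True)


# Hypothetical upstream and local path; a cron job would call update_mirror().
command = build_rsync_command("rsync://mirror.example.org/debian", "/srv/mirrors/debian")
print(" ".join(command))
```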

Regarding point 2B in the previous section: if a system is pulling software updates from a local mirror that is itself being updated, then the system will also be updated, which may break mission-critical applications. To avoid this, we need to ‘freeze’ the mirror (i.e. to make a read-only copy of the mirror at a recorded point in time) and tell the system to pull updates from the freeze. Obviously, a release that has been officially released will be considerably more stable than an experimental release, but the requirement to freeze is critical in both cases.
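A freeze can be sketched as a timestamped copy of the mirror with write permission removed from its files. This is only an illustration under simple assumptions: it makes a full copy (no space saving), the directory layout and function name are hypothetical, and exporting the freeze read-only from the server would be a stronger guarantee than file permissions alone.

```python
import os
import shutil
import stat
import tempfile
import time


def create_freeze(mirror_dir, freezes_dir, label=None):
    """Copy mirror_dir to freezes_dir/<label> and remove write permission from its files."""
    label = label or time.strftime("%Y%m%d%H%M%S")  # records when the freeze was made
    freeze_dir = os.path.join(freezes_dir, label)
    shutil.copytree(mirror_dir, freeze_dir, symlinks=True)  # fails if the freeze exists
    read_only = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    for root, _dirs, files in os.walk(freeze_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # chmod would follow the link, so skip links
                os.chmod(path, read_only)
    return freeze_dir


# Demonstration in a temporary directory: the freeze keeps the old content
# even after the mirror has moved on.
with tempfile.TemporaryDirectory() as base:
    mirror = os.path.join(base, "mirror")
    os.makedirs(mirror)
    with open(os.path.join(mirror, "Release"), "w") as f:
        f.write("version 1\n")
    freeze = create_freeze(mirror, os.path.join(base, "freezes"), label="20240101")
    with open(os.path.join(mirror, "Release"), "w") as f:
        f.write("version 2\n")
    with open(os.path.join(freeze, "Release")) as f:
        frozen_content = f.read()

print(frozen_content)
```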

Regarding point 2C in the previous section: if an organisation decides that tool X should be installed on all existing and future installations and the mirror is not complete, then tool X or its prerequisites may not be available in the versions declared compatible with each other by the Distribution Authority. To avoid this, we need to make a complete mirror, even if, initially, we do not believe it will be needed.

Generally, systems are members of large groups (e.g. a group of systems installed with the OS release required by one mission-critical application, another group of systems installed with the different OS release required by another mission-critical application). Freezes can help with managing groups, but sometimes finer granularity is needed.

Sometimes, systems need to be pointed to system-specific repositories (e.g. to answer the question “if I upgrade this system to the latest release, will it fix the problem?”). This would necessitate making a new freeze and then temporarily changing the URL of the freeze in the auto-installer programs. Likewise, if a freeze is to be rolled out to a group of existing systems, then the URL of the freeze in the auto-updater programs on each existing system would need to be changed to point to the new freeze. To avoid such changes, we direct each system to a host-specific URL, which is then mapped on the server (e.g. with a symlink or an Apache Redirect) to the appropriate freeze. We use host-specific URLs whether or not we expect to need such fine granularity, in order to avoid the cost of later adding the extra layer of redirection. At rollout time, the only thing that needs to be changed is the mapping on the server. We refer to such mappings as ‘indirects’ because requests to access a freeze are indirected through a different access point.
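With symlinks, an indirect is just a per-host link that is re-pointed at rollout time. The sketch below assumes a hypothetical layout in which each host has a link named after it in an "indirects" directory; the function name, host name and freeze labels are illustrative only.

```python
import os
import tempfile


def point_host_at_freeze(indirects_dir, hostname, freeze_dir):
    """Make indirects_dir/<hostname> a symlink to freeze_dir, replacing any old link."""
    os.makedirs(indirects_dir, exist_ok=True)
    link = os.path.join(indirects_dir, hostname)
    tmp_link = link + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(freeze_dir, tmp_link)
    os.replace(tmp_link, link)  # rename over the old link, so clients never see a gap
    return link


# Demonstration in a temporary directory: move one host from an old freeze to a new one.
with tempfile.TemporaryDirectory() as base:
    old_freeze = os.path.join(base, "freezes", "20240101")
    new_freeze = os.path.join(base, "freezes", "20240601")
    os.makedirs(old_freeze)
    os.makedirs(new_freeze)
    indirects = os.path.join(base, "indirects")
    link = point_host_at_freeze(indirects, "web01", old_freeze)
    before = os.readlink(link)
    point_host_at_freeze(indirects, "web01", new_freeze)
    after = os.readlink(link)

print(os.path.basename(before), "->", os.path.basename(after))
```

Nothing on the host itself changes: it keeps requesting the same host-specific path, and only the server-side mapping moves.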

The number of hops that a software update must now make is considerably increased. Here is a diagram illustrating them:

[Figure: pull-propagation-2]

System administrators are generally concerned only with the right half of this pull-propagation map and the corresponding tasks:

  1. creating and updating local mirrors
  2. creating freezes
  3. creating indirects and sharing them via HTTP, FTP or NFS
  4. instructing systems to pull updates from the shares
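For the last task on an APT-based system, "instructing" amounts to writing a package source entry that points at the host's own indirect. This is a sketch only: the server name, the "/indirects/<host>/debian" URL layout and the function name are assumptions, not something prescribed above.

```python
def sources_list_line(server, hostname, suite, components=("main",)):
    """Build an APT sources.list entry that pulls from this host's indirect."""
    url = "http://{}/indirects/{}/debian".format(server, hostname)
    return "deb {} {} {}".format(url, suite, " ".join(components))


# Hypothetical server and host names.
line = sources_list_line("repo.example.org", "web01", "bookworm")
print(line)
```

Because the URL is host-specific, this entry never needs to change when the host is moved to a different freeze.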

Some of these steps may be skipped for certain types of update for certain systems (e.g. security updates may be considered urgent enough for mission-critical web servers with public interfaces to warrant pulling them directly and frequently from the Distribution Authority to the computer system). Here is a diagram illustrating this:

[Figure: pull-propagation-3]

Devising a management policy

In order to maintain symmetry a policy is needed that defines, for each type of software update, when and how each hop is to be made. This amounts to filling in the table below:

| Update type | Update local mirror | Create freeze | Create indirect and share | Update system |
|---|---|---|---|---|
| Security bug fix | | | | |
| Non-security bug fix | | | | |
| New feature | | | | |
| New installation of standard package | | | | |

As an illustration, it is filled in according to how my home network is managed:

| Update type | Update local mirror | Create freeze | Create indirect and share | Update system |
|---|---|---|---|---|
| Security bug fix | never (security bug fixes are pulled directly from DA) | | | daily by cron |
| Non-security bug fix | daily by cron | stable releases: once after release by sysadmin; testing releases: when a blocking bug is fixed, by sysadmin | at system (re)installation time by auto-installer | daily by cron (but indirect is stable so this is a no-op) |
| New feature | | | | |
| New installation of standard package | | | | as required by sysadmin (and always followed by either removal of the package or update of auto-installer) |
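Such a policy can also be kept in machine-readable form, so that gaps are easy to spot. The sketch below is one possible representation, not a prescribed one; the function name is hypothetical, and only one row from the example above is filled in, for brevity.

```python
UPDATE_TYPES = [
    "Security bug fix",
    "Non-security bug fix",
    "New feature",
    "New installation of standard package",
]
HOPS = [
    "Update local mirror",
    "Create freeze",
    "Create indirect and share",
    "Update system",
]


def missing_entries(policy):
    """List the (update type, hop) pairs for which the policy says nothing."""
    return [
        (update_type, hop)
        for update_type in UPDATE_TYPES
        for hop in HOPS
        if not policy.get(update_type, {}).get(hop)
    ]


# One row only; the wording follows the example table.
policy = {
    "Non-security bug fix": {
        "Update local mirror": "daily by cron",
        "Create freeze": "stable releases: once after release by sysadmin",
        "Create indirect and share": "at system (re)installation time by auto-installer",
        "Update system": "daily by cron",
    },
}

gaps = missing_entries(policy)
print(len(gaps), "entries still to be decided")
```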

Tools

Mirrors of multiple releases of an OS, and multiple freezes of each of those releases, contain a lot of duplication. ZFS or other filesystems supporting de-duplication could save a lot of disk space.
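To get a rough idea of how much duplication there is, identical files across mirrors and freezes can be counted by hashing their contents. This is a sketch under the assumption that whole-file duplicates are a fair estimate; filesystems may de-duplicate at a different granularity (ZFS works on blocks), so real savings will differ. The function name and the sample files are hypothetical.

```python
import hashlib
import os
import tempfile


def duplicate_bytes(*trees):
    """Return the bytes held in files whose content already appeared in an earlier file."""
    seen = set()
    wasted = 0
    for tree in trees:
        for root, _dirs, files in os.walk(tree):
            for name in files:
                path = os.path.join(root, name)
                if os.path.islink(path):
                    continue  # symlinks take no real space for content
                digest = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        digest.update(chunk)
                key = digest.hexdigest()
                if key in seen:
                    wasted += os.path.getsize(path)
                else:
                    seen.add(key)
    return wasted


# Demonstration: two freezes sharing one 1000-byte package and differing in another.
with tempfile.TemporaryDirectory() as base:
    for freeze, extra in (("freeze-a", b"old"), ("freeze-b", b"new")):
        os.makedirs(os.path.join(base, freeze))
        with open(os.path.join(base, freeze, "common.deb"), "wb") as f:
            f.write(b"x" * 1000)
        with open(os.path.join(base, freeze, "changed.deb"), "wb") as f:
            f.write(extra * 100)
    saved = duplicate_bytes(os.path.join(base, "freeze-a"), os.path.join(base, "freeze-b"))

print(saved)
```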

PAA offers a complete solution for all steps in the pull-propagation map for any Linux distribution.

There are several tools to perform automatic installation on Linux systems. Anaconda/Kickstart is tuned for Red Hat-based systems (RHEL, CentOS, Scientific Linux, etc.). FAI is for Debian-based systems (Debian, Ubuntu).

See also