Configuring storage and virtualisation services (generation four)

Introduction

This page describes how Alexis Huxley installed and configured a replicated storage server that doubles as the virtualisation platform.

Completion of this procedure has been abandoned.

The page remains here because although GlusterFS is not currently mature enough to meet my needs, it might be in future, in which case this page may be useful.

Local storage volumes

  1. Create the LV:
    lvcreate --name=local --size=200g vg0
    
  2. Format for XFS:
    mkfs -t xfs -f /dev/vg0/local
  3. Add an fstab entry for it, create a mountpoint and mount it. (Note that I do the fstab entry using PCMS, because otherwise the change is reverted.)
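
    For illustration only (the mountpoint /vol/local is an assumption of mine, not something this procedure fixes), the fstab entry and mount might look like:

    /dev/vg0/local  /vol/local  xfs  noatime,nodiratime  0  2

    followed by:

    mkdir -p /vol/local
    mount /vol/local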

Dedicated network interface

You should probably use a dedicated network card for cluster communications in order to ensure that “public” traffic does not delay replication. I use ‘traditional’ NIC naming (i.e. eth0, eth1), which is not persistent. This causes me a problem because I have three NICs in each machine, and the names eth1 and eth2 are effectively randomly assigned to the second and third NICs at each reboot. Therefore persistent naming is required.

  1. Edit /etc/udev/rules.d/70-persistent-net.rules to contain something like:
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0e:0c:c5:f0:6d", ATTR{dev_id}=="0x0", ATTR{type}=="1", NAME="eth1"
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="10:fe:ed:05:92:6d", ATTR{dev_id}=="0x0", ATTR{type}=="1", NAME="eth2"

    Note that I don’t bother with an entry for eth0, because that one is always named eth0, probably because it is on the system board.

  2. Reboot a few times to ensure that the NICs are consistently named.
  3. Add a suitable entry to /etc/network/interfaces for the NIC you will use for the cluster communications and add an entry for it to /etc/hosts.
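
    For illustration (the 192.168.2.0/24 addressing is an assumption of mine; substitute whatever your cluster LAN actually uses), the /etc/network/interfaces stanza and /etc/hosts entries might look like:

    #  /etc/network/interfaces
    auto eth1
    iface eth1 inet static
        address 192.168.2.1
        netmask 255.255.255.0

    #  /etc/hosts
    192.168.2.1  fiori-backlan
    192.168.2.2  torchio-backlan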

Replicated storage volumes

  1. Create LVs:
    lvcreate --name=small    --size=200g vg0
    lvcreate --name=vmpool0  --size=500g vg0
    lvcreate --name=pub      --size=2t   vg0
  2. Format for XFS:
    mkfs -t xfs -f /dev/vg0/small
    mkfs -t xfs -f /dev/vg0/vmpool0
    mkfs -t xfs -f /dev/vg0/pub
  3. Add fstab entries for them all, create mountpoints and mount them. (Note that I do the fstab entry using PCMS, because otherwise the change is reverted.)
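
    For illustration (the /vol/bricks/<name> mountpoints are inferred from the brick paths used below, and the mount options are an assumption of mine), the fstab entries might look like:

    /dev/vg0/small    /vol/bricks/small    xfs  noatime,nodiratime  0  2
    /dev/vg0/vmpool0  /vol/bricks/vmpool0  xfs  noatime,nodiratime  0  2
    /dev/vg0/pub      /vol/bricks/pub      xfs  noatime,nodiratime  0  2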
  4. Install the GlusterFS server software:
    apt-get install glusterfs-server attr
    

    (Without installing attr, a warning will appear when starting a volume, so install all recommended packages!)

  5. When creating the first node in the cluster, create and start one-legged GlusterFS volumes:
    #  on fiori
    gluster volume create small   fiori-backlan:/vol/bricks/small/brick
    gluster volume start  small
    gluster volume create vmpool0 fiori-backlan:/vol/bricks/vmpool0/brick
    gluster volume set    vmpool0 group virt
    gluster volume start  vmpool0
    gluster volume create pub     fiori-backlan:/vol/bricks/pub/brick
    gluster volume start  pub

    (See here for a good explanation of why the volumes should be created in subdirectories of the mountpoint.)
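
    It does no harm to verify the results at this point; e.g.:

    gluster volume info
    gluster volume status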

  6. When adding more nodes (i.e. a second node to a one-legged cluster, or a third node to a two-legged cluster, etc.), complete the following sub-procedure:
    1. Note:
      • Think very carefully about how many nodes you want! Two nodes cannot withstand one node failing!
      • Remember to always reference hosts (including the host upon which you run any commands) using their name on the cluster LAN, not the public LAN!
      • Verify all cluster connections!
    2. On the older node, add the newer node; e.g. on fiori:
      gluster peer probe torchio-backlan
      gluster peer status
    3. On the older node, add replicas to the volumes; e.g. for a volume called ‘test’:
      gluster volume add-brick test replica 2 torchio-backlan:/vol/bricks/test/brick

      but note that this does not automatically trigger synchronisation, as can be verified by running:

      df /vol/bricks/test                                   #  volume is empty
      gluster v heal test statistics                        #  it thinks there is nothing to sync
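
      If you do not want to wait for self-heal to catch up, a full heal can be triggered manually (a sketch only; behaviour varies between GlusterFS versions, so check the documentation for yours):

      gluster volume heal test full
      gluster volume heal test info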
  7. For the volumes that will be used as VM storage pools, add fstab entries and mount them. E.g.:
    #  fiori
    fiori-backlan:vmpool0 /srv/vmpool0 glusterfs noauto,noatime,nodiratime 0 2

    and mount it with:

    #  fiori
    mkdir -p /srv/vmpool0
    mount /srv/vmpool0
  8. To ensure that the system boots even if there are GlusterFS problems, I set the ‘noauto’ option in /etc/fstab. This means that I need to do the mounts manually after each reboot. In fact there are several such commands, so I have a script to do these steps.
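
    A minimal sketch of such a script (the exact list of mounts is an assumption of mine) might be:

    #!/bin/sh
    #  mount the 'noauto' GlusterFS filesystems after a reboot
    set -e
    mount /srv/vmpool0
    #  add further 'noauto' mounts here as required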
  9. Temporarily mount the other volumes:
    #  on fiori
    mkdir -p /srv/{small,pub}
    mount -t glusterfs fiori-backlan:small /srv/small
    mount -t glusterfs fiori-backlan:pub /srv/pub
  10. If you have SSH keys to install for root on the nodes of your storage cluster (e.g. to allow manual replication), then install them now.
  11. If you have existing data to migrate, then migrate them now.

Shares

  1. Restrict mount rights. E.g.:
    gluster volume set small   auth.allow 192.168.1.*
    gluster volume set vmpool0 auth.allow torchio-backlan,fiori-backlan
    gluster volume set pub     auth.allow 192.168.1.*

    (Note that neither CIDR notation nor per-client read/write access control is supported yet.)
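
    The effective settings can be checked afterwards with, e.g.:

    gluster volume info small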

Virtualisation

  1. Run:
    apt-get install qemu-kvm libvirt-bin qemu-utils
  2. Define the storage pools using ‘virsh’. E.g.:
    virsh pool-define-as --name=vmpool0   --type=dir --target=/srv/vmpool0
    virsh pool-start vmpool0
    virsh pool-define-as --name=isoimages --type=dir --target=/srv/pub/computing/software/isoimages/os
    virsh pool-start isoimages
    virsh pool-destroy default

    (Since vmpool0 and isoimages are on GlusterFS storage, we choose at this time not to enable autostart on them.)

  3. Define the networks using ‘virsh’; e.g. start by removing the default NAT network:
    virsh net-destroy default
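
    If a bridged network is wanted in place of the default NAT network, then one sketch, assuming an existing host bridge called br0 and a libvirt network name of pubbr (both names are assumptions of mine), is to create /tmp/pubbr.xml containing:

    <network>
      <name>pubbr</name>
      <forward mode='bridge'/>
      <bridge name='br0'/>
    </network>

    and then run:

    virsh net-define /tmp/pubbr.xml
    virsh net-autostart pubbr
    virsh net-start pubbr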
  4. Set up SSH keys to allow the running of virt-manager from a remote system.
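
    A sketch of this (assuming virt-manager connects as root, e.g. via a URI like qemu+ssh://root@fiori/system):

    #  on the remote system running virt-manager
    ssh-keygen
    ssh-copy-id root@fiori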
  5. If you have existing VM images and definitions to migrate, then migrate them now.

See also