Introduction
This page describes how Alexis Huxley installed and configured a replicated storage server that doubles as a virtualisation platform.
Completion of this procedure has been abandoned because of:
- inadequate support for a dedicated network for cluster communications
- inadequate client access control using GlusterFS protocol
The page remains here because, although GlusterFS is not currently mature enough to meet my needs, it might be in the future, in which case this page may be useful.
Local storage volumes
- Create LVs:
lvcreate --name=local --size=200g vg0
- Format for XFS:
mkfs -t xfs -f /dev/vg0/local
- Add fstab entries for them all, create mountpoints and mount them. (Note that I make the fstab entries using PCMS, because otherwise the changes are reverted.)
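For example, an fstab entry for the local volume might look like this (the /srv/local mountpoint is purely an assumption for illustration; adjust to taste):

/dev/vg0/local /srv/local xfs noatime,nodiratime 0 2

followed by:

mkdir -p /srv/local
mount /srv/local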
Dedicated network interface
You should probably use a dedicated network card for cluster communications in order to ensure that “public” traffic does not delay replication. I use ‘traditional’ NIC naming (i.e. eth0, eth1), which is not persistent. This causes me a problem because I have three NICs in each machine, and the names eth1 and eth2 are effectively randomly assigned to the second and third NICs at each reboot. Therefore persistent naming is required.
- Edit /etc/udev/rules.d/70-persistent-net.rules to contain something like:
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0e:0c:c5:f0:6d", ATTR{dev_id}=="0x0", ATTR{type}=="1", NAME="eth1" SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="10:fe:ed:05:92:6d", ATTR{dev_id}=="0x0", ATTR{type}=="1", NAME="eth2"
Note that I don’t bother with an entry for eth0, because that is absolutely always eth0, probably because it is on the system board.
- Reboot a few times to ensure that the NICs are consistently named the same.
- Add a suitable entry to /etc/network/interfaces for the NIC you will use for the cluster communications and add an entry for it to /etc/hosts.
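A minimal sketch of what this might look like, assuming eth1 is the cluster NIC and assuming a made-up 192.168.2.0/24 cluster subnet:

# in /etc/network/interfaces
auto eth1
iface eth1 inet static
    address 192.168.2.1
    netmask 255.255.255.0

# in /etc/hosts
192.168.2.1 fiori-backlan
192.168.2.2 torchio-backlan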
Replicated storage volumes
- Create LVs:
lvcreate --name=small --size=200g vg0
lvcreate --name=vmpool0 --size=500g vg0
lvcreate --name=pub --size=2t vg0
- Format for XFS:
mkfs -t xfs -f /dev/vg0/small
mkfs -t xfs -f /dev/vg0/vmpool0
mkfs -t xfs -f /dev/vg0/pub
- Add fstab entries for them all, create mountpoints and mount them. (Note that I make the fstab entries using PCMS, because otherwise the changes are reverted.)
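A sketch of the corresponding fstab entries, assuming the bricks are mounted at /vol/bricks/&lt;volume&gt; (as the brick paths used below suggest):

/dev/vg0/small /vol/bricks/small xfs noatime,nodiratime 0 2
/dev/vg0/vmpool0 /vol/bricks/vmpool0 xfs noatime,nodiratime 0 2
/dev/vg0/pub /vol/bricks/pub xfs noatime,nodiratime 0 2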
- Install the GlusterFS server software:
apt-get install glusterfs-server attr
(Without attr installed, a warning will appear when starting a volume. Install all recommended packages!)
- When creating the first node in the cluster then create and start one-legged GlusterFS volumes:
# on fiori
gluster volume create small fiori-backlan:/vol/bricks/small/brick
gluster volume start small
gluster volume create vmpool0 fiori-backlan:/vol/bricks/vmpool0/brick
gluster volume set vmpool0 group virt
gluster volume start vmpool0
gluster volume create pub fiori-backlan:/vol/bricks/pub/brick
gluster volume start pub
(See here for a good explanation of why the volumes should be created in subdirectories of the mountpoint.)
- When adding more nodes (i.e. a second node to a one-legged cluster, or a third node to a two-legged cluster, etc) complete the following sub-procedure:
- Note:
- Think very carefully about how many nodes you want! Two nodes cannot withstand one node failing!
- Remember to always reference hosts (including the host upon which you run any commands) using their names on the cluster LAN, not the public LAN!
- Verify all cluster connections!
- On the older node, add the newer node; e.g. on fiori:
gluster peer probe torchio-backlan
gluster peer status
- On the older node, add replicas to the volumes; e.g.:
gluster volume add-brick test replica 2 torchio-backlan:/vol/bricks/test/brick
but note that this does not automatically trigger synchronisation, as can be verified by running:
df /vol/bricks/test                  # volume is empty
gluster v heal test statistics       # it thinks there is nothing to sync
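(This is not part of the original procedure, but in my understanding a full self-heal can be requested manually once the new brick is in place, e.g.:

gluster volume heal test full
gluster volume heal test info

and progress then checked by re-running the statistics command above.)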
- Note:
- For the volumes that will be used as VM storage pools, add fstab entries and mount them. E.g.:
# fiori
fiori-backlan:vmpool0 /srv/vmpool0 glusterfs noauto,noatime,nodiratime 0 2
and mount it with:
# fiori
mkdir -p /srv/vmpool0
mount /srv/vmpool0
- To ensure that the system boots even if there are GlusterFS problems, I set the ‘noauto’ option in /etc/fstab. This means that I need to do the mounts manually after each reboot. In fact there are several such commands, so I have a script to do these steps; a sketch follows.
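A minimal sketch of such a script, assuming that only vmpool0 needs mounting and that the libvirt storage pools defined later should also be started (the script name and pool list are assumptions):

#!/bin/sh
# post-boot-gluster.sh (hypothetical): mount the noauto GlusterFS volumes
# and start the libvirt storage pools that depend on them.
set -e
mount /srv/vmpool0
virsh pool-start vmpool0
virsh pool-start isoimages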
- Temporarily mount the other volumes:
# on fiori
mkdir -p /srv/{small,pub}
mount -t glusterfs fiori:small /srv/small
mount -t glusterfs fiori:pub /srv/pub
- If you have SSH keys to install for root on the nodes of your storage cluster (e.g. to allow manual replication), then install them now.
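For example, assuming root-to-root access between the two nodes is what is wanted:

# on fiori
ssh-keygen -t rsa
ssh-copy-id root@torchio-backlan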
- If you have existing data to migrate, then migrate them now.
Shares
- Restrict mount rights. E.g.:
gluster volume set small auth.allow 192.168.1.*
gluster volume set vmpool0 auth.allow torchio-backlan,fiori-backlan
gluster volume set pub auth.allow 192.168.1.*
(Note that CIDRs are not supported yet and per-client read/write access control is not supported yet.)
Virtualisation
- Run:
apt-get install qemu-kvm libvirt-bin qemu-utils
- Define the storage pools using ‘virsh’. E.g.:
virsh pool-define-as --name=vmpool0 --type=dir --target=/srv/vmpool0
virsh pool-start vmpool0
virsh pool-define-as --name=isoimages --type=dir --target=/srv/pub/computing/software/isoimages/os
virsh pool-start isoimages
virsh pool-destroy default
(Since vmpool0 and isoimages are on GlusterFS storage, we choose at this time not to enable autostart on them.)
- Define the networks using ‘virsh’. E.g.:
virsh net-destroy default
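The command above only removes the default NAT network; if VMs are to use a bridge onto the public LAN instead, then something like the following might be used (the bridge name br0, the network name pubbr and the file path are assumptions):

# /tmp/pubbr.xml
&lt;network&gt;
  &lt;name&gt;pubbr&lt;/name&gt;
  &lt;forward mode='bridge'/&gt;
  &lt;bridge name='br0'/&gt;
&lt;/network&gt;

virsh net-define /tmp/pubbr.xml
virsh net-start pubbr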
- Set up SSH keys to allow the running of virt-manager from a remote system.
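For example, once the keys are in place, virt-manager on the remote system can connect with:

virt-manager --connect qemu+ssh://root@fiori/system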
- If you have existing VM images and definitions to migrate, then migrate them now.