Debian Bullseye rollout

Last updated: 03/11/2021

Introduction

This page is an activity plan/log for migrating my systems from Debian 10 to Debian 11.

Project log

  1. prologue:
    1. order more memory? (skipped; too expensive)
    2. order a second 8TB disk? (done)
    3. await delivery of disks (done)
    4. download Debian 11.1 netinst image from here and archive it (done)
  2. proof of concept (done)
    1. generate a new keypair for pestaroli & testaroli and archive somewhere accessible (done) 
    2. migrate mafalde from pestaroli to testaroli (done)
    3. on testaroli disconnect (not down!) DRBD devices from network (so that when I come to upgrade pestaroli then testaroli doesn’t immediately try to connect/sync) (done)
    4. changes to disk arrangement (done)
      1. on torchio and fiori add 4 new DRBD devices called pestaroli_new_disk1, pestaroli_new_disk2, testaroli_new_disk1, testaroli_new_disk2, all of size 60GB (these will become the OS+containers space for the reinstalled versions of pestaroli and testaroli) and wait for the synchronisation to complete (done)
        root@fiori# create-perfect-drbd-vol -v testaroli_new_disk1 20 fiori 192.168.3.6 torchio 192.168.3.7
        root@fiori# create-perfect-drbd-vol -v testaroli_new_disk2 20 fiori 192.168.3.6 torchio 192.168.3.7
        root@fiori# create-perfect-drbd-vol -v pestaroli_new_disk1 20 fiori 192.168.3.6 torchio 192.168.3.7
        root@fiori# create-perfect-drbd-vol -v pestaroli_new_disk2 20 fiori 192.168.3.6 torchio 192.168.3.7
      2. shutdown pestaroli (done)
      3. on fiori and torchio down DRBD devices drbd_pestaroli and drbd_pestaroli_containers (done)
      4. on fiori and torchio rename the drbd_pestaroli and drbd_pestaroli_containers LVs to add ‘_old’ suffix (done)
      5. on fiori and torchio down DRBD devices drbd_pestaroli_new_disk1 and drbd_pestaroli_new_disk2 (done)
      6. on fiori and torchio rename the drbd_pestaroli_new_disk1 and drbd_pestaroli_new_disk2 LVs to drop the ‘_new’ part of the name (done)
      7. on fiori and torchio accordingly adjust the DRBD config files names and contents for DRBD devices drbd_pestaroli_new_disk1 and drbd_pestaroli_new_disk2 (done)
      8. on fiori and torchio up the DRBD devices drbd_pestaroli_new_disk1 and drbd_pestaroli_new_disk2 (done)
      9. on fiori or torchio promote to primary the DRBD devices drbd_pestaroli_new_disk1 and drbd_pestaroli_new_disk2 (done)
      10. notes about VMs meant to support nested VMs (done)
        1. to run nested VMs it is essential the VM’s CPU is declared correctly, but virt-manager’s CPU configuration menu does not include the right options. So it can only be done by editing the XML with virsh edit <domain>. The required CPU stanza is this: (done)
          <cpu mode='host-passthrough' check='partial'/>

          (There are plenty of web pages saying to set ‘Copy host CPU configuration’ option, but these did not work for me.)

        2. to check support for nested VMs verify the output of the following commands: (done)
          root# egrep -q "vmx|svm" "/proc/cpuinfo" && echo OK
          OK
          root# cat /sys/module/kvm_amd/parameters/nested 
          1
          root# virt-host-validate qemu
          QEMU: Checking for hardware virtualization : PASS
          ...
          root#
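The first two checks above can be bundled into one function. This is only a minimal sketch: the name check_nested_kvm and its two optional path arguments are assumptions added for illustration, not part of the original notes. On an Intel host the parameter file would be /sys/module/kvm_intel/parameters/nested, where the value may read Y rather than 1, so both are accepted.

```shell
# Report whether a host looks ready for nested KVM.
# $1: cpuinfo file (default /proc/cpuinfo)
# $2: KVM "nested" parameter file (default: the kvm_amd one used above)
# check_nested_kvm is a hypothetical helper name.
check_nested_kvm() {
    cpuinfo=${1:-/proc/cpuinfo}
    nested=${2:-/sys/module/kvm_amd/parameters/nested}

    # The CPU must advertise hardware virtualisation (vmx = Intel, svm = AMD).
    if ! grep -Eq 'vmx|svm' "$cpuinfo"; then
        echo "FAIL: no vmx/svm CPU flag"
        return 1
    fi

    # The KVM module must have nesting switched on.
    case "$(cat "$nested" 2>/dev/null)" in
        1|Y|y) echo "OK" ;;
        *) echo "FAIL: nested virtualisation not enabled"; return 1 ;;
    esac
}
```

Calling check_nested_kvm with no arguments checks the running host; virt-host-validate qemu is still worth running afterwards, as it covers more than these two files.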
      11. undefine the pestaroli VM (done)
      12. redefine the pestaroli VM by running:
        create-perfect-kvm-vm -v -n --name=pestaroli --mem=4 --nics=2 --disk=block:/dev/drbd_pestaroli_disk1 --disk=block:/dev/drbd_pestaroli_disk2 --remote=<other-vm-server>
      13. note partitioning within the pestaroli VM with its two disks: we choose to use LVM-over-MD-RAID0 because: (done)
        1. although the Debian installer supports creating multi-disk VGs, it does not support creating striped LVs
        2. according to this article MD-RAID0 provides slightly better performance than striped LVs

        so we go for:

        1. /dev/vda1: 550MB for EFI
        2. /dev/vda2: 1024MB for /boot (general advice is keep /boot out of LVM)
        3. /dev/vda3: remainder for MD-RAID0
        4. /dev/vdb1: 550MB for nothing except symmetry
        5. /dev/vdb2: 1024MB for nothing except symmetry
        6. /dev/vdb3: remainder for MD-RAID0
        7. /dev/md0: remainder for LVM
        8. create VG vg0 on /dev/md0
        9. create LV root
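For reference, the same layout could be built by hand outside the installer roughly as follows. This is a sketch only: it prints the commands rather than running them, plan_raid0_lvm is a hypothetical helper name, and the size of the root LV is a placeholder because the notes above do not record one.

```shell
# Print (not run) the commands that would create MD-RAID0 over the two
# third partitions and put LVM on top, matching the list above.
# $1: size for the root LV (placeholder default; not taken from the notes).
plan_raid0_lvm() {
    root_size=${1:-10G}
    echo "mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/vda3 /dev/vdb3"
    echo "pvcreate /dev/md0"
    echo "vgcreate vg0 /dev/md0"
    echo "lvcreate --name root --size $root_size vg0"
}

plan_raid0_lvm 10G
```

Printing instead of executing keeps the sketch safe to paste; the striping happens at the MD layer, so the LV itself is created as an ordinary linear LV.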
    5. reinstall according to this procedure (installing and running PCMS is included in that procedure) and update it based on the partition notes above (done)
    6. running PCMS should have been enough to set up ~root/.ssh/authorized_keys but ~/.ssh/id_rsa* are still missing; restore them (done)
    7. Set testaroli up as a VM server by completing Configuring storage and virtualisation services generation three point four (but ignore certain bits which are only relevant in the production environment on fiori and torchio). (done)
    8. for mafalde: (done)
      1. create one-legged DRBD devices on pestaroli using a higher port number so synchronisation cannot start: (done)
        mkdir -p ~/opt
        svn -q co https://svn.pasta.freemyip.com/main/virttools/trunk ~/opt/virttools
        create-perfect-drbd-vol -n mafalde 5 pestaroli 192.168.3.32 testaroli 192.168.3.31

        and archive the script file that this produces, so that we can run it later on the other DRBD node.

      2. make the device primary on pestaroli (done)
      3. shutdown the VM on testaroli (done)
      4. copy over ssh keys to allow two way communication between pestaroli and testaroli
      5. dd the DRBD device from testaroli to pestaroli: (done)
        #  on pestaroli
        ssh -n testaroli dd if=/dev/drbd_mafalde bs=1M | dd of=/dev/drbd_mafalde bs=1M
      6. copy over the VM configuration: (done)
        #  on pestaroli
        virsh --connect=lxc:/// define <(ssh -n testaroli virsh --connect=lxc:/// dumpxml mafalde)
      7. start the VM on pestaroli (done)
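The log does not record how the copy was checked. One optional way, sketched here under the assumption that the source and target devices are exactly the same size, is to compare checksums once the dd has finished. same_sha256 is a hypothetical helper, not one of the virttools scripts.

```shell
# Succeed (exit 0) only if both paths are readable and have the same SHA-256.
# Works on block devices as well as plain files; returns 2 if either path
# cannot be read.
same_sha256() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    a=$(sha256sum < "$1" | cut -d' ' -f1)
    b=$(sha256sum < "$2" | cut -d' ' -f1)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# For the remote side, the checksum can be fetched over ssh and compared
# by eye with the local one, e.g.:
#   ssh -n testaroli sha256sum /dev/drbd_mafalde
#   sha256sum /dev/drbd_mafalde
```

If the target device is larger than the source, the checksums will differ even after a good copy, so this check only makes sense for equal-sized devices.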
    9. confirm: mafalde is now running on pestaroli, right? (done)
    10. reinstall testaroli: (done)
      1. undefine the testaroli VM (done)
      2. redefine the testaroli VM by running:
        create-perfect-kvm-vm -v -n --name=testaroli --mem=4 --nics=2 --disk=block:/dev/drbd_testaroli_disk1 --disk=block:/dev/drbd_testaroli_disk2 --remote=<other-vm-server>
      3. reinstall according to this procedure (installing and running PCMS is included in that procedure) (done)
      4. running PCMS should have been enough to set up ~root/.ssh/authorized_keys but ~/.ssh/id_rsa* are still missing; restore them (done)
      5. Set testaroli up as a VM server (but ignore certain bits which are only relevant in the production environment on fiori and torchio).
    11. for mafalde: (done)
      1. create second leg of DRBD device on testaroli by running the script saved earlier; synchronisation should start automatically. (done)
      2. wait for synchronisation to finish (done)
      3. shutdown the VM on pestaroli (done)
      4. make the device secondary on pestaroli (done)
      5. make the device primary on testaroli (done)
      6. verify two way ssh between pestaroli and testaroli (done)
      7. copy over the VM configuration: (done)
        #  on testaroli
        virsh --connect=lxc:/// define <(ssh -n pestaroli virsh --connect=lxc:/// dumpxml mafalde)
      8. I needed to reboot at this point but I think a systemctl restart libvirtd probably would have been enough (done)
      9. start the VM on testaroli (done)
      10. migrate it back again using my scripts (done)
      11. run the uefi sync script to make sure VM definitions are aligned (done)
    12. rebalance VMs (not necessary; listed here only for symmetry) (skipped)
    13. cleanup (done)
      1. shut down mafalde (done)
      2. undefine the mafalde LXC container on pestaroli and testaroli (done)
      3. down the DRBD device on pestaroli and testaroli (done)
      4. undefine the DRBD device on pestaroli and testaroli (done)
      5. remove the LV on pestaroli and testaroli (done)
    14. PCMS modifications (done)
      1. modify PCMS to make congane the same as mafalde currently is (i.e. both should be LXC) (done)
      2. modify PCMS to make mafalde a KVM VM (done)
    15. for KVM-based VM mafalde on Debian 11 KVM-based VMs on Debian 10 PMs (in progress)
      1. create new 10GB DRBD volume on pestaroli and testaroli (20GB will not be possible due to size of RAID0 device on pestaroli and testaroli) (done)
        create-perfect-drbd-vol -v --remote=testaroli mafalde 10 pestaroli 192.168.3.32 testaroli 192.168.3.31
      2. create the mafalde VM (done):
        create-perfect-kvm-vm --remote=testaroli --name=mafalde --disk=block:/dev/drbd_mafalde
      3. install the VM as follows (failed)
        1. update the create-perfect-kvm-vm script for Debian 11 (done, but then reverted in case caused problem below)
          1. change video to virtio (done, but then reverted in case caused problem below)
          2. change OS to Debian 11 (done, but then reverted in case caused problem below, see also BTS#997928)
        2. complete this procedure (failed, after grub menu, mafalde VM uses 100% of CPU of pestaroli; several bug reports suggest this is due to different kernels at L0, L1, L2 level)
    16. for KVM-based VM mafalde on Debian 11 PM as possible fix for above problem (done)
      1. commit all working copies in ~root/opt/ on all recently used hosts (done)
      2. modify PCMS to make halusky a VM server (done)
      3. reinstall halusky according to this procedure (installing and running PCMS is included in that procedure) (done)
      4. configure halusky as a VM server (but ignore certain bits which are only relevant in the production environment on fiori and torchio) (done)
      5. in order to better understand how to use virsh vol-create-as, I installed virt-manager (done)
      6. on halusky create new 20GB volume in the default pool for mafalde (done)
        virsh vol-create-as --pool local --name mafalde.img --capacity 20G --format raw
      7. on halusky create the mafalde VM (done)
        IMG_PATH=$(virsh -q vol-path --pool local --vol mafalde.img)
        ./create-perfect-kvm-vm -v --name=mafalde --disk=file:$IMG_PATH
      8. install mafalde according to this procedure (installing and running PCMS is included in that procedure) (done)
      9. note that at this point we are confident that when we come to reinstall fiori and torchio we will be able to run VMs on them (done)
      10. if okay then modify create-perfect-kvm-vm to use virtio video hardware (done)
      11. if okay then modify create-perfect-kvm-vm to define the VM as Debian 11 (done)
      12. configure mafalde as a VM server (but ignore certain bits which are only relevant in the production environment on fiori and torchio) (done)
      13. install virt-manager (done)
      14. on mafalde create new 5GB volume in the default pool for mafalde2 (done)
        virsh vol-create-as --pool local --name mafalde2.img --capacity 5G --format raw
      15. on mafalde create the mafalde2 VM
        IMG_PATH=$(virsh -q vol-path --pool local --vol mafalde2.img)
        ./create-perfect-kvm-vm -v --name=mafalde2 --disk=file:$IMG_PATH
      16. start (but don’t bother completing) this procedure to install mafalde2 (done)
      17. note – solely for completeness – that at this point we are confident that when L0 and L1 and L2 are all running the same kernel then nested VMs work (done)
      18. put halusky away for the time being (done)
    17. for LXC-based VM congane (done)
      1. create new 5GB DRBD volume on pestaroli and testaroli for congane (done)
        create-perfect-drbd-vol -n --remote=testaroli --name=congane --size=5 --name1=pestaroli --ip1=192.168.3.32 --name2=testaroli --ip2=192.168.3.31

        (Note that I modified create-perfect-drbd-vol to take all arguments as options.)

      2. install new LXC container with the same name according to this procedure (which should include installing and running PCMS) (done)
    18. epilogue (done)
      1. remove obsolete DRBDs and LVs (done)
      2. commit everything (done)
      3. for halusky (done)
        1. modify PCMS to make it a laptop (done)
        2. reinstall halusky according to its own page (done)
        3. commit edits to pcms (done)
        4. restore non-dot files from backup of home (done)
        5. configure backups (done)
        6. do a backup (done)
  3. review this section and apply anything necessary to the fiori/torchio procedures below (done)
  4. prologue (done)
    1. fix get_iplayer location issue (done)
    2. modify PCMS to be able to list (skipped; difficult to integrate and there is an existing solution)
    3. make a new release of PCMS (done)
    4. remove the workaround for pcms version 2 in installing and running pcms page (done)
    5. decide when to do fiori (done)
  5. reinstall fiori (done)
    1. migrate all VMs from fiori to torchio (done)
    2. on fiori commit anything I need to commit (done)
    3. on fiori run vm-server down (done)
    4. on torchio disconnect (not down!) DRBD devices from network (so that when I come to upgrade fiori then torchio doesn’t immediately try to connect/sync) (done)
    5. on fiori power off (done)
    6. on fiori remove the 4TB disk and insert one of the new 8TB disks (done)
    7. on sugo configure fiori in PCMS as KVM+LXC server (but not torchio else it will be applied before torchio is reinstalled) (done)
    8. edit fiori’s wordpress page and install it according to its own page (making edits along the way) (done)
    9. retrieve the standard working copies for ~root/opt (done)
    10. copy ssh keys from torchio to fiori (done)
    11. adjust the port range that create-perfect-drbd-vol uses so that new devices on this node can’t talk to old devices on the other node but do not commit yet (done)
  6. The following procedure will be used for each VM to migrate (not applicable)
    1. set up some environment variables:
      VM=<vm-name>
      DRBD_DEVICES=( $(ssh -q torchio virsh domblklist $VM | sed -nr 's@.*/dev/drbd_(.*)@\1@p') )
      
    2. create one-legged DRBD devices on fiori:
      # on fiori
      for DRBD_DEVICE in "${DRBD_DEVICES[@]}"; do
          create-perfect-drbd-vol --name=$DRBD_DEVICE --size=20 --name1=fiori --ip1=192.168.3.6 --name2=torchio --ip2=192.168.3.7
          drbdadm disconnect drbd_$DRBD_DEVICE
      done
      

      (There is no point in it attempting to connect at the moment.)

    3. Archive the script that create-perfect-drbd-vol created, making sure to include the VM name in the name of the file (this file will be needed later on torchio).
    4. shutdown the VM on torchio
    5. test the ssh connection between fiori and torchio
    6. copy over the VM configuration to fiori from torchio:
      #  on fiori
      virsh define <(ssh -n torchio virsh dumpxml $VM)
      scp torchio:/var/lib/libvirt/qemu/nvram/${VM}_VARS.fd /var/lib/libvirt/qemu/nvram/
      
    7. dd the DRBD devices to fiori from torchio:
      #  on fiori
      for DRBD_DEVICE in "${DRBD_DEVICES[@]}"; do
          ssh -n torchio dd if=/dev/drbd_$DRBD_DEVICE bs=1M | dd of=/dev/drbd_$DRBD_DEVICE bs=1M
      done
      
    8. start the VM on fiori:
      virsh start $VM
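To see what the sed expression in step 1 extracts, here is the same substitution applied to canned virsh domblklist output. The sample rows and the device name drbd_penne_data are made up for illustration, and drbd_devices_from_domblklist is a hypothetical wrapper name.

```shell
# Read "virsh domblklist" output on stdin and print one DRBD resource name
# per line, i.e. the part of each source path after /dev/drbd_.
# Lines without a /dev/drbd_ source (header, rule, other disks) are dropped.
drbd_devices_from_domblklist() {
    sed -nE 's@.*/dev/drbd_(.*)@\1@p'
}

# Canned (made-up) sample in the layout virsh prints: header, rule, rows.
drbd_devices_from_domblklist <<'EOF'
 Target   Source
--------------------------------
 vda      /dev/drbd_penne
 vdb      /dev/drbd_penne_data
 sda      -
EOF
```

For this sample the function prints penne and penne_data, which are the values the loops in steps 2 and 7 then prefix with drbd_ again.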
  7. Do it for:
    1. penne (done)
    2. lagane (done)
    3. vermicelli (done)
    4. rombi (done)
    5. pestaroli (done)
    6. testaroli (done)
    7. anelli (done)
    8. ziti (done)
    9. barbine (done)
    10. nuvole (done)
    11. marille (done)
    12. stortini (done)
      1. on fiori copy over the OS disk only and do not turn on! (done)
      2. copy over the VM configuration (done)
      3. change boot order so that it boots from an installation/rescue medium (done)
      4. change the NICs’ MAC addresses (done)
      5. chroot into disk (done)
      6. on new stortini disable pcms (done)
      7. on new stortini comment out the large devices and swap from fstab (done)
      8. on new stortini change the IPs in /etc/network/interfaces (done)
      9. on new stortini shutdown (done)
      10. change boot order so it boots from hard disk (done)
      11. log into the new stortini (done)
      12. format its large devices (done)
      13. uncomment the large devices in fstab (done)
      14. mount the large devices (done)
      15. mount swap (done)
      16. reboot the new stortini (done)
      17. create temporary ssh keys (done)
      18. start rsyncs in series over the backlan (done)
      19. critical period starts here (done)
      20. save all edits in web browser and exit as much as possible (done)
      21. shutdown the mail server (done)
      22. repeat rsyncs (done)
      23. on torchio shutdown the old stortini (done)
      24. on new stortini fix the IPs in /etc/network/interfaces (27 -> 26) (done)
      25. on new stortini reboot (done)
      26. ssh into stortini (which is the new one) (done)
      27. on new stortini remove the temporary ssh keys (done)
      28. on new stortini enable PCMS (done)
      29. to work around the modified /etc/network/interfaces run pcms -- -t (done)
      30. on torchio drbdadm down the old stortini’s DRBD devices (done)
      31. start up the mail server (done)
      32. critical period ends here (done)
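The stortini steps rely on running the same rsyncs twice: once in bulk while the old system is still live, then again inside the critical period to pick up only the changes. A minimal sketch of generating those commands follows; the host alias and directories are hypothetical (the notes do not list them), plan_rsyncs is an illustrative name, and the commands are printed rather than run.

```shell
# Print one rsync command per directory, to be run in series.
# $1: source host as reachable over the backlan; remaining args: directories.
# -aHAX preserves permissions, hard links, ACLs and extended attributes;
# --delete makes the repeat pass remove files deleted on the source.
plan_rsyncs() {
    src_host=$1
    shift
    for dir in "$@"; do
        echo "rsync -aHAX --numeric-ids --delete $src_host:$dir/ $dir/"
    done
}

# Hypothetical host alias and mount points, for illustration only.
plan_rsyncs old-stortini /srv/mail /srv/files
```

Because the second pass transfers only differences, it is what keeps the critical period short.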
  8. confirm: all VMs are now running on fiori, right? (done)
  9. commit any changes made to ~root/opt (done)
  10. fix the file/block problem in the VMs’ XMLs (done)
  11. make a new PCMS (done)
  12. decide a weekend to do torchio (done: immediately)
  13. reinstall torchio: (done)
    1. on torchio commit anything I need to commit (done)
    2. remove all, LVs, VGs, PVs (done)
    3. power off (done)
    4. on torchio remove the 4TB disk and insert one of the new 8TB disks (done)
    5. on sugo configure torchio in PCMS as KVM+LXC server (done)
    6. edit torchio’s wordpress page and install it according to its own page (making edits along the way) (done)
    7. retrieve the standard working copies for ~root/opt (done)
    8. generate a new keypair for torchio and fiori,  verify two way ssh between fiori and torchio and update pcms-config accordingly (done)
  14. The following procedure will be used for each VM (not applicable)
    1. create second leg of DRBD device on torchio by running the script saved earlier and on fiori running drbdadm connect <drbd-device>
    2. copy over the VM configuration to torchio from fiori by running this on torchio:
      VM=<vm-name>
      virsh define <(ssh -n fiori virsh dumpxml $VM)
      scp fiori:/var/lib/libvirt/qemu/nvram/${VM}_VARS.fd /var/lib/libvirt/qemu/nvram/
    3. wait for synchronisation to finish
    4. migrate the VM by running:
      vm-migrate fiori torchio $VM
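Step 3 ("wait for synchronisation to finish") can be scripted rather than watched. The sketch below assumes the DRBD 8.4 kernel module, whose /proc/drbd lists one status line per device; with DRBD 9 that file no longer lists devices and drbdadm status would be the place to look. drbd_all_synced is a hypothetical helper name.

```shell
# Return 0 when every device line in a /proc/drbd-style file reports
# ds:UpToDate/UpToDate; return 1 if any device is still syncing or if
# no device lines are found at all.
# $1: status file (default /proc/drbd)
drbd_all_synced() {
    status_file=${1:-/proc/drbd}
    # Device lines look like:
    #  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    awk '
        /^ *[0-9]+: / { devices++; if ($0 !~ /ds:UpToDate\/UpToDate/) pending++ }
        END { exit (devices == 0 || pending > 0) ? 1 : 0 }
    ' "$status_file"
}

# Typical use on the node being synchronised:
#   until drbd_all_synced; do sleep 60; done
```

Note that this waits for all devices on the host, not just those of the VM being migrated, which is the cautious choice before running vm-migrate.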
  15. Do it for:
    1. penne (done)
    2. lagane (done)
    3. vermicelli (done)
    4. rombi (done)
    5. pestaroli (done)
    6. testaroli (done)
    7. anelli (done)
    8. ziti (done)
    9. barbine (done)
    10. nuvole (done)
    11. marille (done)
    12. (decide when to do it for stortini and wait until that time) (done)
    13. stortini (done)
  16. rebalance VMs (pending)
  17. erase the two 4TB disks I removed (done)
  18. jump a little below to do one KVM VM replacement
  19. jump a little below to do one LXC VM replacement (done)
  20. do work machine
  21. do delguine
  22. for cercis:
    1. review my upgrade instructions for Judith (in openproject, I think) and copy some of it to this sub-procedure
    2. prepare USB stick (done)
    3. print relevant pages from here, including this list
    4. add that USB stick and printouts to my packing list
    5. wait till 21/12/2021
    6. do cercis
  23. new key pair (done)
    1. generate a new keypair for fiori & torchio (done)
    2. archive somewhere accessible (skipped)
    3. copy it to both machines (done)
    4. on sugo update PCMS to use that public key for torchio/fiori access (PCMS doesn’t distribute the key pair so I have to do that later, but it does manage the ~/.ssh/authorized_keys) (done)
  24. The following procedure will be used for each VM:
    1. decide if KVM or LXC or decommission
    2. decide on a new name
    3. create DRBD volume(s)
    4. create its wordpress page
    5. create entry on the Computing page but don’t move the service description tag from next to the old system’s name to next to the new system’s name
    6. install new VM according to this procedure
    7. shutdown services on old VM?
    8. configure and start services on new VM and update the service page with any changes
    9. on the Computing page move the service description tag from next to the old system’s name to next to the new system’s name
    10. make sure that the new VM is being backed up
    11. power off the old VM
  25. Do it for:
    1. ziti
    2. lagane
    3. vermicelli
    4. rombi
    5. pestaroli
    6. testaroli
    7. anelli
    8. penne
    9. barbine
    10. nuvole
    11. marille (done)
    12. (decide when to do it for stortini and wait until that time)
    13. stortini
  26. wait a month
  27. on the Installing Debian 11 on a PM or KVM VM page, there is some stuff I have noted can be deleted
  28. for each VM:
    1. remove backups of old VM
    2. remove disk images of old VM
    3. virsh undefine old VM
  29. close ticket #159

    See also