Introduction
I had access to a couple of single-port PCI Infiniband cards and a suitable cable. I wanted to use them to set up a point-to-point connection for syncing DRBD devices between two Debian 11 systems.
I found a few links describing what needs to be done (Arch Linux's Infiniband page was probably the best), but none of them explained what depends on what in enough detail for me to follow. Hence this page.
This page is a work in progress! When I meet a new problem I'll come back and update it. Last edited: 13/02/2021.
Note that an Infiniband network needs at least one subnet manager, which checks the network for new adapters and adds them to the routing tables. A consequence of this is that the first node behaves slightly differently during its configuration from the second node: the first node will not yet have a subnet manager available to it, whereas the second node will. This procedure tries to take that into account.
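The difference between the two nodes shows up in the State line that ibstat prints (ibstat comes from the infiniband-diags package installed in the procedure below). As a minimal sketch, the function here reads ibstat output on stdin and labels each port. The function name ib_port_summary and the labels "ready" and "needs-sm" are my own inventions for illustration, and the reading of Initializing as "no subnet manager yet" is simply the behaviour described on this page, not a full account of IB port states.

```shell
#!/bin/sh
# Summarise the logical port state lines of ibstat output read from stdin.
# Prints one line per port:
#   ready        State: Active (a subnet manager has configured the port)
#   needs-sm     State: Initializing (link is up, no subnet manager seen yet)
#   other:<x>    any other state, passed through unchanged
# ib_port_summary is a hypothetical helper name, not part of infiniband-diags.
ib_port_summary() {
    awk '
        # Match only the "State:" line, not the "Physical state:" line.
        /^[ \t]*State:/ {
            state = $2
            if (state == "Active") print "ready"
            else if (state == "Initializing") print "needs-sm"
            else print "other:" state
        }
    '
}

# Example, on a node where ibstat is installed:
#   ibstat | ib_port_summary
```

On the first node this would be expected to print needs-sm until a subnet manager is started somewhere on the network.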
Procedure
Complete this procedure on each of the two nodes, ideally doing it in parallel.
- Insert the cards and connect with the cable.
- Use lspci to verify the card is detected, as in this output:
    torchio# lspci
    ...
    03:00.0 Infiniband controller: Mellanox Technologies MT27600 [Connect-IB]
    ...
    torchio#
- Install the following package to get access to the ibstat command:
apt-get install infiniband-diags
- Run ibstat; the state should be either Active or Initializing, depending on whether there is already a subnet manager on the network, as shown in this output:
    torchio# ibstat
    ...
    State: Initializing
    Physical state: LinkUp
    ...
    torchio#
- Without the correct modules loaded, the IB network cannot be reached (regardless of state), as shown in the output below:
    torchio# ibhosts
    ibwarn: [1267] get_abi_version: can't read ABI version from /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module loaded?
    ibwarn: [1267] mad_rpc_open_port: can't open UMAD port ((null):0)
    src/ibnetdisc.c:786; can't open MAD port ((null):0)
    /usr/sbin/ibnetdiscover: iberror: failed: discover failed
    torchio#
- Load the following modules:
modprobe -a ib_uverbs mlx5_ib mlx5_core ib_core ib_umad rdma_ucm
- Run ibhosts again; this time it should report other HCAs (cards) on the IB network, as shown in the output below:
    torchio# ibhosts
    Ca : 0x7cfe900300b82270 ports 1 "MT4113 ConnectIB Mellanox Technologies"
    Ca : 0x7cfe900300b82220 ports 1 "MT4113 ConnectIB Mellanox Technologies"
    torchio#
- However, ibstat still shows Initializing, because there is still no subnet manager, as shown in the output below:
    torchio# ibstat
    ...
    State: Initializing
    Physical state: LinkUp
    ...
    torchio#
- Regardless of whether there is already a subnet manager on the network, install and start one:
apt-get -y install opensm
- Run ibstat again; this time it should show the state as Active, as shown in this output:
    torchio# ibstat
    ...
    State: Active
    Physical state: LinkUp
    ...
    torchio#
- Even though ibstat and ibhosts now work, the infiniband IP interface is still not accessible until this is run:
    torchio# modprobe -a ib_ipoib
    torchio#
(Note that that fix is not persistent.)
- To determine the infiniband IP interface name run:
    torchio# ifconfig -a
    ...
    torchio#
(Note that the interface name can vary; in my own case it was ibp1s0, which is used in the text below.)
- Temporarily configure an IP address on the infiniband IP interface (ibp1s0 here) on both hosts with something like this:
    fiori# ifconfig ibp1s0 192.168.2.6 netmask 255.255.255.0 up
    fiori#

    torchio# ifconfig ibp1s0 192.168.2.7 netmask 255.255.255.0 up
    torchio#
- Do a ping test, as shown in this output:
    torchio# ping 192.168.2.6 -c 2
    PING 192.168.2.6 (192.168.2.6) 56(84) bytes of data.
    64 bytes from 192.168.2.6: icmp_seq=1 ttl=64 time=0.162 ms
    64 bytes from 192.168.2.6: icmp_seq=2 ttl=64 time=0.123 ms

    --- 192.168.2.6 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 8ms
    rtt min/avg/max/mdev = 0.123/0.142/0.162/0.022 ms
    torchio#
- To make all of the above survive a reboot:
- List the needed modules, one per line, in /etc/modules-load.d/ib.conf by running:
echo ib_uverbs mlx5_ib mlx5_core ib_core ib_umad rdma_ucm ib_ipoib | xargs -n 1 echo >> /etc/modules-load.d/ib.conf
- Add a suitable entry to /etc/network/interfaces or /etc/network/interfaces.d/ibp1s0 as shown in the extract below:
    auto ibp1s0
    iface ibp1s0 inet static
        address 192.168.2.7
        netmask 255.255.255.0
- Reboot and test that ping still works.
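After the reboot, one way to confirm that the module list took effect is to compare it against /proc/modules. The following is a minimal sketch rather than a tested part of my setup: the function name missing_modules is hypothetical, and it assumes every module in the list is built as a loadable module (a driver compiled into the kernel never appears in /proc/modules and would be reported as missing).

```shell
#!/bin/sh
# Print each module named in a list file that is not in the loaded-module table.
#   $1  list file, one module name per line (e.g. /etc/modules-load.d/ib.conf)
#   $2  loaded-module table, defaults to /proc/modules
# missing_modules is a hypothetical helper name used only for this sketch.
missing_modules() {
    list=$1
    loaded=${2:-/proc/modules}
    while read -r mod || [ -n "$mod" ]; do
        # Skip blank lines and comment lines.
        case $mod in ''|'#'*) continue ;; esac
        # The module name is the first field of each /proc/modules line.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$loaded"; then
            echo "$mod"
        fi
    done < "$list"
}

# Example:
#   missing_modules /etc/modules-load.d/ib.conf
```

No output means every listed module was found; any name printed is one to load by hand with modprobe and then investigate.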
Tuning and performance testing
- Install ibutils, run ibdiagnet, and examine the output for warnings and errors, as shown in the output below:
    torchio# apt-get install ibutils
    torchio# ibdiagnet
    ...
    -W- Topology file is not specified.
    ...
    -W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps
    ...
    torchio#
- I did not find a clean fix for the first warning. I could use option -wt to write a topology file:
    torchio# ibdiagnet -wt ~/pasta.top > /dev/null
    torchio#
but when I then ran ibdiagnet with that file as the topology file, it aborted:
    torchio# ibdiagnet -t ~/pasta.top -s S7cfe900300b82270
    ...
    Aborted
    torchio#
- The second warning may only apply to multicast groups. The interfaces can be run in 'datagram' or 'connected' mode; the former offers lower latency; the latter offers a higher MTU. This page contains the following table:
    Mode       MTU    MB/s  us latency
    datagram   2044   707   19.4
    connected  2044   353   18.9
    connected  65520  726   19.6
which I take to mean: the most important thing is to leave the MTU at the default for whichever mode is in use.
- I did my own performance tests as follows:
    torchio# apt-get -y install netperf
    torchio# netserver -4
    torchio#

    fiori# apt-get -y install netperf
    fiori# netperf -4 -H 192.168.3.7    # over DRBD cluster network
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.7 () port 0 AF_INET : demo
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    131072  16384  16384    10.02     936.37
    fiori# netperf -4 -H 192.168.2.7    # over IB
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.7 () port 0 AF_INET : demo
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    131072  16384  16384    10.00    3308.75
    fiori#
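At about 3.3 Gbit/s, the IB link is roughly 3.5 times faster than the cluster network (about 0.94 Gbit/s), but this is still a long way from what I was expecting.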
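To put the netperf figures next to the mode/MTU table, it helps to know which mode and MTU the interface was using during the test, and to convert netperf's 10^6 bits/sec into bytes. A small sketch follows, under two assumptions: that the IPoIB driver exposes the mode in /sys/class/net/<interface>/mode alongside the usual mtu file, and that the table's MB/s means 10^6 bytes per second (it may not). The function names ipoib_mode_mtu and mbit_to_mbyte are hypothetical.

```shell
#!/bin/sh
# Print "<mode> <mtu>" for an IPoIB interface, as read from sysfs.
#   $1  interface name (ibp1s0 on my hosts)
#   $2  sysfs net directory, defaults to /sys/class/net
# ipoib_mode_mtu is a hypothetical helper name used only for this sketch.
ipoib_mode_mtu() {
    dir=${2:-/sys/class/net}/$1
    printf '%s %s\n' "$(cat "$dir/mode")" "$(cat "$dir/mtu")"
}

# Convert a netperf throughput in 10^6 bits/sec to 10^6 bytes/sec,
# rounded to one decimal place.
mbit_to_mbyte() {
    awk -v v="$1" 'BEGIN { printf "%.1f\n", v / 8 }'
}

# Examples:
#   ipoib_mode_mtu ibp1s0
#   mbit_to_mbyte 3308.75    # prints 413.6
```

By this conversion the IB result above is about 414 MB/s, which sits between the connected-mode 2044 row and the datagram row of the table, so checking the mode and MTU in use is an obvious next step.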
To do
- Look at l-s’s IB tests – is there anything useful there?
- Look at the other scripts – I thought there was a way to get info on cables
- Switch DRBD to using the infiniband IPs