Pegasi Wiki

This wiki acts as a memo for our own work, so why not share it? Feel free to browse and use our notes, and leave a note while you are at it.

InfiniBand RDMA native setup for Linux

Overview

Native InfiniBand RDMA enables lossless storage traffic, low CPU load, high speeds and very low latency. We decided to go with 40 Gbps native InfiniBand for our new NVMe based storage backend. For the software layer we use Linstor, which supports native InfiniBand RDMA and gives us flexibility.

Here is a quick sheet on how to get native InfiniBand up and running on AlmaLinux 8 / CentOS 8 / RHEL 8.

Interfaces

We have ConnectX-3 cards and two Mellanox 56 Gbps switches, one of which will be in production while the other acts as a stand-by. We have cabled our two storage backend nodes and one of our front end nodes. The rest still operate on the legacy storage and will be upgraded once the virtual guests have been migrated to the new storage.

Setup

Here are the tasks required to set up the native InfiniBand environment. Do this on all storage servers. One server needs to be the primary subnet manager and must hold the highest PRIORITY value (see below).

  • Almalinux 8 minimal install
  • dnf install vim rdma-core libibverbs-utils librdmacm librdmacm-utils ibacm infiniband-diags opensm
  • systemctl enable rdma
  • systemctl start rdma
  • ip link show ib0
  • ip link show ib1
    • Write down the last 8 bytes of the 20-byte InfiniBand hardware addresses (a helper one-liner for extracting these follows this list)
  • vim /etc/udev/rules.d/70-persistent-ipoib.rules. Add the following lines and replace "xx:xx:xx:xx:xx:xx:xx:xx" with the bytes you copied above:
ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", ATTR{type}=="32", ATTR{address}=="?*xx:xx:xx:xx:xx:xx:xx:xx", NAME="mlx4_ib0
ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", ATTR{type}=="32", ATTR{address}=="?*xx:xx:xx:xx:xx:xx:xx:xx", NAME="mlx4_ib1
  • vim /etc/security/limits.d/rdma.conf. Add lines:
@rdma    soft    memlock     unlimited
@rdma    hard    memlock     unlimited
  • ibstat
    • Write down the mlx* Port GUIDs (a grep for these is also shown after this list)
  • do not touch /etc/rdma/opensm.conf
  • vim /etc/sysconfig/opensm. Modify the following and replace the GUIDs with the ones you wrote down (give the master server PRIORITY 15 and the others something lower):
GUIDS="0xXXXXXXXXXXXXXX 0xXXXXXXXXXXXXXX"
PRIORITY=15
  • vim /etc/rdma/partitions.conf, add your native partition definition as follows:
DataVault_A=0x0002,rate=7,mtu=4,scope=2,defmember=full:ALL=full;
  • systemctl enable opensm
  • reboot
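
Two of the steps above require copying values out of command output: the last 8 bytes of the 20-byte IPoIB link address for the udev rules, and the Port GUIDs for /etc/sysconfig/opensm. Here is a small helper sketch, assuming the stock output format of ip and ibstat on these distributions; double-check it against your own output.

# print the last 8 bytes of the 20-byte IPoIB address on ib0, for the udev rule
ip link show ib0 | awk '/link\/infiniband/ {n=split($2,a,":"); print a[n-7]":"a[n-6]":"a[n-5]":"a[n-4]":"a[n-3]":"a[n-2]":"a[n-1]":"a[n]}'
# print the Port GUIDs for the GUIDS= line in /etc/sysconfig/opensm
ibstat | grep 'Port GUID'
# after logging back in, a user in the rdma group should see "unlimited" here
ulimit -l

After the reboot you can also run sminfo (from infiniband-diags) to check which subnet manager won the election; the master should report the PRIORITY you configured. The exact output format varies between versions.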

Test

Test RDMA connectivity with ibping. First write down each server's Port GUID by issuing the command

ibstat
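
The full ibstat output is fairly verbose. If your infiniband-diags build supports the port list flag, this prints just the local Port GUIDs, which is all you need here:

ibstat -p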

Then start ibping in server mode on each server

ibping -S

From each client, run ibping with the Port GUID you wrote down earlier

ibping -G 0xXXXXX

You should see very low latency pongs as a response.
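
For a slightly heavier sanity check you can send a fixed number of pings in flood mode and look at the RTT summary afterwards. The GUID is the same placeholder as above; the -f (flood) and -c (count) flags are per the infiniband-diags ibping man page:

ibping -f -c 10000 -G 0xXXXXX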

IP setup for InfiniBand

Originally I did not want to do this, but since iSER seems to outperform SRP, why not try it.

Let's use nmcli to set up our interfaces. If for some strange reason you do not have the ib0/ib1 connections set up automatically, you can add one with this command

nmcli connection add type infiniband con-name ib0 ifname ib0 transport-mode Connected mtu 65520

Otherwise you can do

nmcli connection modify ib0 transport-mode Connected
nmcli connection modify ib0 mtu 65520
nmcli connection modify ib0 ipv4.addresses '10.0.0.1/24'
nmcli connection modify ib0 ipv4.method manual
nmcli connection up ib0
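
Once the connection is up, it is worth checking that connected mode and the large MTU actually took effect. The mode file below is exposed by the kernel IPoIB driver and should read connected, and ip link should show mtu 65520:

cat /sys/class/net/ib0/mode
ip link show ib0

Repeat the nmcli commands and the checks for ib1 if you use both ports.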

I skipped the gateway / DNS setup since I do not need either in a storage network.

Comments

All comments and corrections are welcome.
