Pegasi Wiki

This wiki acts as a memo for our own work, so why not share it? Feel free to browse and use our notes, and leave a note while you are at it.

Linux network storage tests

We have two 24 x NVME storage backends and multiple frontends. I am testing different ways to bring storage to a frontend server and might as well display the results here more publicly.

I used fio and hdparm and I have tested the following configurations so far:

  • local drives for comparison
    • frontend server local SATA SSD RAID
    • storage server local NVME
  • NVME-of, Infiniband 56/40 Gbps
    • NVME-of
    • striped LVM on top of RAID 1 MD from both storage backends, built on top of NVME-of
  • iSER RDMA enabled iSCSI, Infiniband 56/40 Gbps
    • iSER
    • striped LVM on top of RAID 1 MD from both storage backends, built on top of iSER
  • iSER RDMA enabled iSCSI, using drbd replicated device on the storage server, Infiniband 56/40 Gbps
    • iSER
    • DRBD backend
    • striped LVM on top of iSCSI devices

I am using an RDMA Infiniband 56/40Gbps network with ConnectX-3 cards. Each server has a single active connection. I could possibly speed things up by using a different connection per storage server but that may not be required.

For testing I am using two commands:

  • hdparm -Tt <device>
  • fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
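To run the same two tests against every configuration without retyping them, the commands can be wrapped in a small script. This is only a minimal sketch, assuming Python 3.8+ with fio and hdparm on the PATH and root privileges for hdparm; the function names are made up for this example and are not part of either tool.

```python
import shlex
import subprocess

# fio options as used in these tests; every option takes a double hyphen.
FIO_OPTIONS = [
    "--name=random-write",
    "--ioengine=posixaio",
    "--rw=randwrite",
    "--bs=4k",
    "--numjobs=1",
    "--size=4g",
    "--iodepth=1",
    "--runtime=60",
    "--time_based",
    "--end_fsync=1",
]


def hdparm_command(device):
    """Return the argv list for the hdparm read timing test."""
    return ["hdparm", "-Tt", device]


def fio_command():
    """Return the argv list for the 4k random write test."""
    return ["fio", *FIO_OPTIONS]


def run(command, cwd=None):
    """Print a command, run it, and return its text output."""
    print("+", shlex.join(command))
    result = subprocess.run(
        command, cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `run(hdparm_command("/dev/sda"))` followed by `run(fio_command(), cwd="/mnt/test")`, where `/mnt/test` is a hypothetical mount point on the device under test. Since no filename is given, fio lays out its test file in the working directory.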

Frontend server local SATA SSD

A Supermicro MegaRAID based RAID of two SSDs, used as the operating system disk.

hdparm -Tt /dev/sda:
 Timing cached reads:   21894 MB in  1.99 seconds = 11005.29 MB/sec
 Timing buffered disk reads: 3186 MB in  3.00 seconds = 1061.92 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][eta 00m:00s]                         
random-write: (groupid=0, jobs=1): err= 0: pid=3542: Mon Jun 28 18:39:00 2021
  write: IOPS=46.5k, BW=182MiB/s (191MB/s)(12.0GiB/67589msec); 0 zone resets
    slat (nsec): min=879, max=127715, avg=1779.53, stdev=650.57
    clat (nsec): min=202, max=1674.4k, avg=8290.97, stdev=2533.87
     lat (usec): min=7, max=1676, avg=10.07, stdev= 2.73
    clat percentiles (nsec):
     |  1.00th=[ 6688],  5.00th=[ 6944], 10.00th=[ 7136], 20.00th=[ 7328],
     | 30.00th=[ 7456], 40.00th=[ 7520], 50.00th=[ 7648], 60.00th=[ 7712],
     | 70.00th=[ 7904], 80.00th=[ 8512], 90.00th=[10816], 95.00th=[11456],
     | 99.00th=[16192], 99.50th=[24192], 99.90th=[28032], 99.95th=[36096],
     | 99.99th=[50432]
   bw (  KiB/s): min=78840, max=418144, per=100.00%, avg=363084.28, stdev=77272.87, samples=69
   iops        : min=19710, max=104536, avg=90771.12, stdev=19318.21, samples=69
  lat (nsec)   : 250=0.01%
  lat (usec)   : 10=85.06%, 20=13.99%, 50=0.94%, 100=0.01%, 250=0.01%
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=15.36%, sys=14.12%, ctx=3195292, majf=0, minf=701
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,3145729,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=182MiB/s (191MB/s), 182MiB/s-182MiB/s (191MB/s-191MB/s), io=12.0GiB (12.9GB), run=67589-67589msec

Disk stats (read/write):
    dm-0: ios=0/127661, merge=0/0, ticks=0/1201685, in_queue=1201685, util=50.69%, aggrios=0/161924, aggrmerge=0/12, aggrticks=0/2302922, aggrin_queue=2302922, aggrutil=56.08%
  sda: ios=0/161924, merge=0/12, ticks=0/2302922, in_queue=2302922, util=56.08%
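As a sanity check on how to read these reports, the headline IOPS and bandwidth in the `write:` line can be recomputed from the `issued rwts` counter and the `run=` time. A small sketch using the numbers from the SATA SSD run above; the function name is made up for this example.

```python
def fio_summary(issued_writes, runtime_msec, block_size_bytes=4096):
    """Recompute fio's headline IOPS and bandwidth from raw counters."""
    seconds = runtime_msec / 1000
    iops = issued_writes / seconds
    bytes_per_sec = iops * block_size_bytes
    return {
        "iops": iops,
        "bw_mib_s": bytes_per_sec / 2**20,  # binary units, fio's "MiB/s"
        "bw_mb_s": bytes_per_sec / 10**6,   # decimal units, fio's "MB/s"
    }


# Counters from the SATA SSD run: "issued rwts: total=0,3145729" and "run=67589".
sata = fio_summary(issued_writes=3145729, runtime_msec=67589)
print(f"{sata['iops'] / 1000:.1f}k IOPS, "
      f"{sata['bw_mib_s']:.0f} MiB/s ({sata['bw_mb_s']:.0f} MB/s)")
# prints: 46.5k IOPS, 182 MiB/s (191 MB/s)
```

The same arithmetic applies to the other runs, so the summary figures are averages over the whole reported run time rather than over the 60 second target.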

Storage server local NVME

Intel Corporation SSDPE2KE016T8O PCIe NVME local device.

hdparm -Tt /dev/nvme2n1:
 Timing cached reads:   17856 MB in  2.00 seconds = 8943.15 MB/sec
 Timing buffered disk reads: 6910 MB in  3.00 seconds = 2303.19 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=349MiB/s][w=89.2k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=3782: Mon Jun 28 18:45:44 2021
  write: IOPS=88.6k, BW=346MiB/s (363MB/s)(20.3GiB/60194msec); 0 zone resets
    slat (nsec): min=580, max=81950, avg=1524.49, stdev=380.18
    clat (nsec): min=220, max=600600, avg=7236.48, stdev=1917.52
     lat (usec): min=4, max=602, avg= 8.76, stdev= 2.18
    clat percentiles (nsec):
     |  1.00th=[ 4320],  5.00th=[ 4576], 10.00th=[ 4768], 20.00th=[ 5344],
     | 30.00th=[ 6368], 40.00th=[ 7520], 50.00th=[ 7776], 60.00th=[ 7904],
     | 70.00th=[ 8096], 80.00th=[ 8256], 90.00th=[ 8768], 95.00th=[ 9152],
     | 99.00th=[11072], 99.50th=[12224], 99.90th=[16768], 99.95th=[19072],
     | 99.99th=[34560]
   bw (  KiB/s): min=20728, max=622960, per=100.00%, avg=411116.83, stdev=106584.98, samples=103
   iops        : min= 5182, max=155740, avg=102779.18, stdev=26646.25, samples=103
  lat (nsec)   : 250=0.01%, 750=0.01%
  lat (usec)   : 4=0.01%, 10=97.95%, 20=2.00%, 50=0.04%, 100=0.01%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%
  cpu          : usr=18.80%, sys=25.26%, ctx=5379783, majf=0, minf=780
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,5333109,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=346MiB/s (363MB/s), 346MiB/s-346MiB/s (363MB/s-363MB/s), io=20.3GiB (21.8GB), run=60194-60194msec

Disk stats (read/write):
  nvme0n1: ios=1/1405100, merge=0/23, ticks=0/2932174, in_queue=2932174, util=24.13%

Frontend NVME-of bind using backend NVME

The previous Intel NVME exported over NVME-of, as seen from one frontend server.

hdparm -Tt /dev/nvme0n3:
 Timing cached reads:   21334 MB in  1.99 seconds = 10722.82 MB/sec
 Timing buffered disk reads: 5940 MB in  3.00 seconds = 1979.68 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][w=4839KiB/s][w=1209 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=3456: Mon Jun 28 18:37:08 2021
  write: IOPS=77.8k, BW=304MiB/s (319MB/s)(18.2GiB/61171msec); 0 zone resets
    slat (nsec): min=860, max=110739, avg=1750.52, stdev=316.16
    clat (usec): min=4, max=576, avg= 8.12, stdev= 2.13
     lat (usec): min=7, max=578, avg= 9.87, stdev= 2.26
    clat percentiles (nsec):
     |  1.00th=[ 6624],  5.00th=[ 6944], 10.00th=[ 7072], 20.00th=[ 7200],
     | 30.00th=[ 7328], 40.00th=[ 7392], 50.00th=[ 7456], 60.00th=[ 7584],
     | 70.00th=[ 7712], 80.00th=[ 8384], 90.00th=[10688], 95.00th=[11328],
     | 99.00th=[13632], 99.50th=[17280], 99.90th=[29824], 99.95th=[44288],
     | 99.99th=[61696]
   bw (  KiB/s): min=15976, max=430968, per=100.00%, avg=370133.55, stdev=90692.00, samples=102
   iops        : min= 3994, max=107742, avg=92533.38, stdev=22673.00, samples=102
  lat (usec)   : 10=85.87%, 20=13.78%, 50=0.32%, 100=0.02%, 250=0.01%
  lat (usec)   : 500=0.01%, 750=0.01%
  cpu          : usr=23.38%, sys=20.65%, ctx=4824959, majf=0, minf=526
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4762087,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=304MiB/s (319MB/s), 304MiB/s-304MiB/s (319MB/s-319MB/s), io=18.2GiB (19.5GB), run=61171-61171msec

Disk stats (read/write):
  nvme0n1: ios=1/738871, merge=0/28, ticks=0/279727, in_queue=279727, util=25.48%

Frontend striped LVM - RAID1 - NVME-of

  • Six NVME-of drives, 3 from each server
  • Three RAID1 devices, each consisting of 2 drives from 2 different storage servers
  • One striped LVM volume group consisting of the three RAID1 MD drives
  • One logical volume from the volume group
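
The layering in the list above can be sketched as a dry run that only prints the commands. This is not the exact command history from these tests: the device names are hypothetical, and the mdadm and LVM options are the standard ones for a two-disk mirror and a striped logical volume. The volume group and volume names are taken from the device path used in the test below.

```python
def build_stack_commands(server_a, server_b, vg="datavault", lv="testvolume"):
    """Return shell commands for RAID1 pairs striped together with LVM.

    server_a and server_b list the imported block devices from each
    storage server, in matching order.
    """
    if len(server_a) != len(server_b):
        raise ValueError("need the same number of devices from each server")

    commands = []
    md_devices = []
    # One RAID1 mirror per pair, so each mirror spans both storage servers.
    for index, (dev_a, dev_b) in enumerate(zip(server_a, server_b)):
        md = f"/dev/md{index}"
        md_devices.append(md)
        commands.append(
            f"mdadm --create {md} --level=1 --raid-devices=2 {dev_a} {dev_b}"
        )

    # Each mirror becomes an LVM physical volume in one volume group.
    commands.append("pvcreate " + " ".join(md_devices))
    commands.append(f"vgcreate {vg} " + " ".join(md_devices))
    # Stripe the logical volume across all mirrors.
    commands.append(
        f"lvcreate --stripes {len(md_devices)} --extents 100%FREE "
        f"--name {lv} {vg}"
    )
    return commands


# Hypothetical device names; check lsblk for the real ones.
for command in build_stack_commands(
    ["/dev/nvme0n1", "/dev/nvme0n2", "/dev/nvme0n3"],
    ["/dev/nvme1n1", "/dev/nvme1n2", "/dev/nvme1n3"],
):
    print(command)
```

Printing instead of executing keeps the sketch harmless; the commands would need root and would destroy data on the listed devices if run.
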
hdparm -Tt /dev/datavault/testvolume:
 Timing cached reads:   21458 MB in  1.99 seconds = 10784.51 MB/sec
 Timing buffered disk reads: 7640 MB in  3.00 seconds = 2546.47 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][w=2029KiB/s][w=507 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=4526: Tue Jun 29 08:05:20 2021
  write: IOPS=76.2k, BW=298MiB/s (312MB/s)(17.8GiB/61302msec); 0 zone resets
    slat (nsec): min=802, max=142478, avg=1831.73, stdev=521.29
    clat (usec): min=5, max=3873, avg= 8.61, stdev= 6.04
     lat (usec): min=7, max=3874, avg=10.44, stdev= 6.13
    clat percentiles (nsec):
     |  1.00th=[ 6624],  5.00th=[ 6880], 10.00th=[ 7072], 20.00th=[ 7264],
     | 30.00th=[ 7392], 40.00th=[ 7520], 50.00th=[ 7648], 60.00th=[ 7840],
     | 70.00th=[ 8768], 80.00th=[10432], 90.00th=[11072], 95.00th=[11584],
     | 99.00th=[14016], 99.50th=[22656], 99.90th=[29312], 99.95th=[43264],
     | 99.99th=[52480]
   bw (  KiB/s): min=  838, max=438626, per=100.00%, avg=349216.78, stdev=92392.10, samples=106
   iops        : min=  209, max=109656, avg=87303.93, stdev=23097.92, samples=106
  lat (usec)   : 10=74.54%, 20=24.87%, 50=0.57%, 100=0.01%, 250=0.01%
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%
  cpu          : usr=23.60%, sys=22.23%, ctx=4868012, majf=0, minf=979
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4674124,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=298MiB/s (312MB/s), 298MiB/s-298MiB/s (312MB/s-312MB/s), io=17.8GiB (19.1GB), run=61302-61302msec

Disk stats (read/write):
    dm-3: ios=1/891284, merge=0/0, ticks=0/694354549, in_queue=694354549, util=25.73%, aggrios=1/904069, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
    md0: ios=1/904069, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/903755, aggrmerge=0/188, aggrticks=0/333576, aggrin_queue=333577, aggrutil=23.45%
  nvme0n1: ios=1/903756, merge=0/188, ticks=1/338804, in_queue=338805, util=23.45%
  nvme1n1: ios=0/903755, merge=0/189, ticks=0/328349, in_queue=328349, util=23.27%

Frontend iSER single drive

I set up an iSER target on the storage backend and an iSER initiator on the frontend, then created an XFS filesystem, exactly as in the NVME-of cases. Write speed looks slightly lower, but latency is very solid.

hdparm -Tt /dev/disk/by-path/ip-xxx-lun-1:

 Timing cached reads:   21130 MB in  1.99 seconds = 10618.36 MB/sec
 Timing buffered disk reads: 3704 MB in  3.00 seconds = 1234.42 MB/sec


random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=3116: Tue Jun 29 15:52:07 2021
  write: IOPS=70.8k, BW=277MiB/s (290MB/s)(17.3GiB/63904msec); 0 zone resets
    slat (nsec): min=836, max=324487, avg=1694.97, stdev=519.52
    clat (nsec): min=267, max=938988, avg=7613.02, stdev=1547.30
     lat (usec): min=6, max=940, avg= 9.31, stdev= 1.67
    clat percentiles (nsec):
     |  1.00th=[ 6688],  5.00th=[ 6880], 10.00th=[ 7008], 20.00th=[ 7136],
     | 30.00th=[ 7264], 40.00th=[ 7328], 50.00th=[ 7392], 60.00th=[ 7520],
     | 70.00th=[ 7584], 80.00th=[ 7776], 90.00th=[ 8256], 95.00th=[ 8512],
     | 99.00th=[10432], 99.50th=[22400], 99.90th=[25728], 99.95th=[26752],
     | 99.99th=[29568]
   bw (  KiB/s): min=33408, max=423456, per=100.00%, avg=394176.32, stdev=72408.07, samples=91
   iops        : min= 8352, max=105864, avg=98544.10, stdev=18102.02, samples=91
  lat (nsec)   : 500=0.01%
  lat (usec)   : 4=0.01%, 10=98.87%, 20=0.55%, 50=0.58%, 100=0.01%
  lat (usec)   : 250=0.01%, 750=0.01%, 1000=0.01%
  cpu          : usr=21.22%, sys=20.55%, ctx=4836908, majf=0, minf=893
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4526608,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=277MiB/s (290MB/s), 277MiB/s-277MiB/s (290MB/s-290MB/s), io=17.3GiB (18.5GB), run=63904-63904msec

Disk stats (read/write):
  sdd: ios=1/620934, merge=0/5878, ticks=1/224797, in_queue=224798, util=34.91%

Frontend striped LVM - RAID1 - iSER

Similar setup to NVME-of but using iSER instead.

  • Six iSER mounted drives, 3 from each server
  • Three RAID1 devices, each consisting of 2 drives from 2 different storage servers
  • One striped LVM volume group consisting of the three RAID1 MD drives
  • One logical volume from the volume group

hdparm -Tt /dev/datavault/testvolume:
 Timing cached reads:   21368 MB in  1.99 seconds = 10739.75 MB/sec
 Timing buffered disk reads: 3898 MB in  3.00 seconds = 1298.67 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=74.6MiB/s][w=19.1k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=10059: Mon Jul  5 12:01:40 2021
  write: IOPS=69.7k, BW=272MiB/s (285MB/s)(16.1GiB/60525msec); 0 zone resets
    slat (nsec): min=839, max=261971, avg=1676.40, stdev=620.25
    clat (nsec): min=205, max=1344.2k, avg=8360.25, stdev=8926.87
     lat (usec): min=7, max=1345, avg=10.04, stdev= 8.97
    clat percentiles (usec):
     |  1.00th=[    7],  5.00th=[    7], 10.00th=[    8], 20.00th=[    8],
     | 30.00th=[    8], 40.00th=[    8], 50.00th=[    8], 60.00th=[    8],
     | 70.00th=[    8], 80.00th=[    8], 90.00th=[    9], 95.00th=[   10],
     | 99.00th=[   25], 99.50th=[   35], 99.90th=[  155], 99.95th=[  186],
     | 99.99th=[  273]
   bw (  KiB/s): min=21752, max=424928, per=100.00%, avg=367624.85, stdev=96931.23, samples=91
   iops        : min= 5440, max=106232, avg=91906.21, stdev=24232.72, samples=91
  lat (nsec)   : 250=0.01%, 500=0.01%
  lat (usec)   : 4=0.01%, 10=95.68%, 20=2.66%, 50=1.27%, 100=0.14%
  lat (usec)   : 250=0.23%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=20.85%, sys=21.57%, ctx=4476176, majf=0, minf=871
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4215656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=272MiB/s (285MB/s), 272MiB/s-272MiB/s (285MB/s-285MB/s), io=16.1GiB (17.3GB), run=60525-60525msec

Disk stats (read/write):
    dm-3: ios=1/548969, merge=0/0, ticks=1/830242240, in_queue=830242241, util=36.64%, aggrios=1/623957, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
    md0: ios=1/623957, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/623048, aggrmerge=0/526, aggrticks=0/213754, aggrin_queue=213754, aggrutil=35.33%
  sdb: ios=1/623078, merge=0/497, ticks=1/211402, in_queue=211402, util=35.33%
  sdc: ios=0/623019, merge=0/556, ticks=0/216106, in_queue=216106, util=35.32%

Single iSER - DRBD

  • One iSER mounted drive from primary server

hdparm -Tt /dev/sdb:
 Timing cached reads:   21054 MB in  1.99 seconds = 10581.09 MB/sec
 Timing buffered disk reads: 3908 MB in  3.00 seconds = 1302.55 MB/sec

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.19
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=11449: Tue Jul  6 14:14:06 2021
  write: IOPS=62.5k, BW=244MiB/s (256MB/s)(15.7GiB/65801msec); 0 zone resets
    slat (nsec): min=913, max=134212, avg=1668.50, stdev=498.55
    clat (usec): min=3, max=1588, avg= 7.58, stdev= 2.04
     lat (usec): min=7, max=1590, avg= 9.25, stdev= 2.13
    clat percentiles (nsec):
     |  1.00th=[ 6560],  5.00th=[ 6752], 10.00th=[ 6880], 20.00th=[ 7072],
     | 30.00th=[ 7200], 40.00th=[ 7328], 50.00th=[ 7456], 60.00th=[ 7520],
     | 70.00th=[ 7648], 80.00th=[ 7776], 90.00th=[ 8032], 95.00th=[ 8512],
     | 99.00th=[10176], 99.50th=[22656], 99.90th=[25984], 99.95th=[27008],
     | 99.99th=[36096]
   bw (  KiB/s): min=67536, max=451744, per=100.00%, avg=396963.38, stdev=70142.99, samples=82
   iops        : min=16884, max=112936, avg=99240.79, stdev=17535.73, samples=82
  lat (usec)   : 4=0.01%, 10=98.97%, 20=0.42%, 50=0.61%, 100=0.01%
  lat (usec)   : 250=0.01%, 500=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=20.62%, sys=16.00%, ctx=4239733, majf=0, minf=807
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4111551,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=244MiB/s (256MB/s), 244MiB/s-244MiB/s (256MB/s-256MB/s), io=15.7GiB (16.8GB), run=65801-65801msec

Disk stats (read/write):
  sdb: ios=1/189398, merge=0/3781, ticks=1/229437, in_queue=229437, util=38.13%

Frontend striped LVM - iSER - DRBD

  • Three iSER mounted drives from primary server
  • One striped LVM volume group consisting of the three iSER mounted devices
  • One logical volume from the volume group

hdparm -Tt /dev/datavault/test:
 Timing cached reads:   20898 MB in  1.99 seconds = 10501.52 MB/sec
 Timing buffered disk reads: 3812 MB in  3.00 seconds = 1270.48 MB/sec

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.19
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=11743: Tue Jul  6 14:25:35 2021
  write: IOPS=61.8k, BW=241MiB/s (253MB/s)(15.6GiB/66078msec); 0 zone resets
    slat (nsec): min=815, max=152844, avg=1670.57, stdev=552.98
    clat (usec): min=5, max=164, avg= 7.63, stdev= 1.64
     lat (usec): min=7, max=170, avg= 9.30, stdev= 1.76
    clat percentiles (nsec):
     |  1.00th=[ 6752],  5.00th=[ 6944], 10.00th=[ 7072], 20.00th=[ 7200],
     | 30.00th=[ 7264], 40.00th=[ 7392], 50.00th=[ 7456], 60.00th=[ 7520],
     | 70.00th=[ 7584], 80.00th=[ 7712], 90.00th=[ 8032], 95.00th=[ 8512],
     | 99.00th=[10944], 99.50th=[23168], 99.90th=[26240], 99.95th=[27264],
     | 99.99th=[40704]
   bw (  KiB/s): min= 1912, max=419584, per=100.00%, avg=399272.78, stdev=66461.85, samples=81
   iops        : min=  478, max=104896, avg=99818.16, stdev=16615.56, samples=81
  lat (usec)   : 10=98.81%, 20=0.42%, 50=0.76%, 100=0.01%, 250=0.01%
  cpu          : usr=18.56%, sys=16.97%, ctx=4211106, majf=0, minf=763
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4084062,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=241MiB/s (253MB/s), 241MiB/s-241MiB/s (253MB/s-253MB/s), io=15.6GiB (16.7GB), run=66078-66078msec

Disk stats (read/write):
    dm-3: ios=1/103363, merge=0/0, ticks=0/116893, in_queue=116893, util=35.07%, aggrios=1/204245, aggrmerge=0/1728, aggrticks=0/235478, aggrin_queue=235478, aggrutil=38.83%
  sdb: ios=1/204245, merge=0/1728, ticks=0/235478, in_queue=235478, util=38.83%

Conclusions

I have not yet studied hdparm / fio analysis and possible alternative analysis methods well enough to draw big conclusions. But a quick look shows the Infiniband network to be quite transparent, and performance seems good.

The latency on NVME-of stays close to local NVME (9.87 usec average total latency versus 8.76 usec), which is incredible. With RAID1 + LVM there is a slight further increase, but we are talking about well under a microsecond, from 9.87 to 10.44 usec on average, unless I am reading that fio output wrong.
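
fio switches between nsec and usec from one line to the next, which makes the latency figures easy to misread. A small sketch that normalises the `slat` / `clat` / `lat` summary lines to microseconds, using lines copied from the runs above; the regular expression and function name are made up for this example and only cover the text layout seen in these reports.

```python
import re

# Matches summary lines such as "lat (usec): min=7, max=578, avg= 9.87, ..."
LAT_LINE = re.compile(
    r"^\s*(slat|clat|lat)\s+\((nsec|usec|msec)\):.*?avg=\s*([\d.]+)"
)

TO_USEC = {"nsec": 0.001, "usec": 1.0, "msec": 1000.0}


def average_latency_usec(line):
    """Return (kind, average in microseconds), or None for other lines."""
    match = LAT_LINE.match(line)
    if match is None:
        return None
    kind, unit, avg = match.groups()
    return kind, float(avg) * TO_USEC[unit]


# Total latency lines copied from the fio reports.
runs = {
    "local NVME": "     lat (usec): min=4, max=602, avg= 8.76, stdev= 2.18",
    "NVME-of": "     lat (usec): min=7, max=578, avg= 9.87, stdev= 2.26",
    "NVME-of RAID1 + LVM": "     lat (usec): min=7, max=3874, avg=10.44, stdev= 6.13",
}
for name, line in runs.items():
    kind, usec = average_latency_usec(line)
    print(f"{name}: {usec:.2f} usec")
```

Histogram lines such as `lat (usec)   : 10=85.06%, ...` and the percentile headers are deliberately not matched, so the function returns None for them.
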

iSER surprises with its solid latency but for some reason gains no advantage from LVM striping.

My only worry would be the CPU usage of the MD RAID1. I may try DRBD backend replication and some solution on the front end to swap block devices on the fly.

So far the best performing networked storage options (roughly rounded) would be:

  • By lowest latency (average total latency from fio): iSER, 9.3us single drive, 10.0us striped raid1, beating NVME-of (9.9us / 10.4us) by ~5%
  • By read speed: NVME-of, 1980MB/s single drive, 2546MB/s striped raid1; iSER (1234MB/s / 1299MB/s) is about 40-50% slower
  • By write speed: NVME-of, 319MB/s single drive, 312MB/s striped raid1, beating iSER (290MB/s / 285MB/s) by ~10%
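
The percentages in a comparison like this depend on which side is taken as the baseline, so it is worth computing both directions. A small sketch using the hdparm buffered read figures from the runs above; the helper names are made up for this example.

```python
def percent_faster(winner, loser):
    """How much larger winner is, relative to loser."""
    return (winner / loser - 1) * 100


def percent_slower(loser, winner):
    """How much smaller loser is, relative to winner."""
    return (1 - loser / winner) * 100


# hdparm buffered disk reads in MB/s: (single drive, striped RAID1)
nvmeof_read = (1979.68, 2546.47)
iser_read = (1234.42, 1298.67)

for label, nvme, iser in zip(("single", "striped"), nvmeof_read, iser_read):
    print(f"read, {label}: NVME-of is {percent_faster(nvme, iser):.0f}% faster, "
          f"iSER is {percent_slower(iser, nvme):.0f}% slower")
```

For the write figures the two directions nearly coincide (319MB/s against 290MB/s is about 10% either way), but for reads the gap is large enough that "faster" and "slower" give quite different numbers.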

Please leave a comment if you have any ideas.

