kernel: Enable *NVMe-over-TCP* for rk35xx/rk3588/rockchip64/uefi/wsl #6368

Merged
merged 1 commit into from
Mar 7, 2024

ColorfulRhino
Collaborator

@ColorfulRhino ColorfulRhino commented Mar 7, 2024

Description

This PR enables NVMe over TCP kernel support for the board families rk35xx (vendor), rockchip-rk3588 (edge), rockchip64 (current, edge) as well as the generic uefi target (current, edge) and wsl (current, edge)¹.

I used kernel-config and only changed options under "Device drivers" --> "NVMe"; other values were changed or updated automatically by kernel-config, especially for the uefi and wsl targets, whose configs had not been updated in a while. Host used to run kernel-config: x86/AMD64 machine with gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0.
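
To double-check that a config file ends up with the relevant options, a small helper along these lines could be used. This is only a sketch and not part of the PR: the function name `check_nvme_tcp_config` is made up, and it assumes the upstream symbols `CONFIG_NVME_TCP` (host side) plus `CONFIG_NVME_TARGET` and `CONFIG_NVME_TARGET_TCP` (target side). Whether each target sets them to `y` or `m` is not stated in this description, so the check accepts both.

```shell
#!/usr/bin/env bash
# Sketch: report whether NVMe-over-TCP options are enabled in a kernel .config file.
# check_nvme_tcp_config is a hypothetical helper name, not an Armbian tool.
check_nvme_tcp_config() {
    local config_file="$1" symbol status=0
    for symbol in CONFIG_NVME_TCP CONFIG_NVME_TARGET CONFIG_NVME_TARGET_TCP; do
        # An enabled option appears as SYMBOL=y (built in) or SYMBOL=m (module).
        if grep -Eq "^${symbol}=[ym]$" "$config_file"; then
            echo "${symbol}: enabled"
        else
            echo "${symbol}: missing"
            status=1
        fi
    done
    return "$status"
}
```

Call it with the path of the config file to inspect, for example `check_nvme_tcp_config path/to/linux-family-branch.config` (placeholder path). On a host-only config such as wsl, the two target symbols would be expected to show up as missing.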

This PR also serves as a little guide/writeup/documentation since I myself struggled to find information on this topic at first😀 Let me know if this would be a good fit for the wiki or if that's out-of-scope.

¹ (Note: for wsl only host-mode is supported, target-mode is not available).

AR-2282

What is NVMe over TCP?

TL;DR: It's a fast and efficient network storage protocol and a more recent alternative to NFS or iSCSI.

NVMe over TCP is part of NVMe over Fabrics (NVMe-oF), which enables remote network access to NVMe devices (SSDs). There are other transport layers, like RDMA for enterprise networks, but kernel 5.0 introduced NVMe-oF using TCP, which means no special hardware is required. You can simply use your device's Ethernet port and your home network.

Once a host is connected to an NVMe SSD over the network, the host has direct access and the device behaves as if it were physically plugged into the host. This means you can not only transfer files, but also partition the device or send a secure erase command via nvme-cli, for example.

There are advantages to using NVMe over TCP over something like NFS: NVMe over TCP has very low overhead, so performance should be as close to the physical limits of your hardware as possible, namely your Ethernet/network speed, the speed of your SSD and the PCIe connection speed (see below for some performance benchmarks). Databases like SQLite can also run into problems on NFS, since NFS handles locking differently than a real file system.

NVMe over TCP specs and slides: https://nvmexpress.org/wp-content/uploads/March-2019-NVMe-TCP-What-You-Need-to-Know-About-the-Specification.pdf

How can I configure and use NVMe over TCP?

For example, you could use this to connect multiple NVMe SSDs on different SBCs to form a storage array, basically a JBOD.

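Here are some guides on how to export an NVMe SSD on a target machine to make it available on the network, and then how to connect to it from another host. nvme-cli is your friend :)

  • https://www.linuxjournal.com/content/data-flash-part-iii-nvme-over-fabrics-using-tcp
  • https://blogs.oracle.com/linux/post/nvme-over-tcp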
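
As a rough idea of what the export step on the target involves, here is a minimal sketch using the kernel's nvmet configfs tree. The function name `export_nvme_tcp`, the `NVMET_ROOT` override, the example NQN, the IP address and the device path are all placeholders for illustration; 4420 is the port conventionally used for NVMe-oF. On a real system this needs root and the nvmet-tcp module loaded (`modprobe nvmet-tcp`).

```shell
#!/usr/bin/env bash
# Sketch: export a block device over NVMe/TCP through the nvmet configfs tree.
# export_nvme_tcp is a hypothetical helper; NVMET_ROOT can point at a scratch
# directory for a dry run instead of the real /sys/kernel/config/nvmet.
export_nvme_tcp() {
    local device="$1" address="$2" port="$3" nqn="$4"
    local root="${NVMET_ROOT:-/sys/kernel/config/nvmet}"
    local subsys="$root/subsystems/$nqn"

    # Create the subsystem. Allowing any host is convenient, but it means
    # there is no access control at all.
    mkdir -p "$subsys/namespaces/1"
    echo 1 > "$subsys/attr_allow_any_host"

    # Back namespace 1 with the local device and enable it.
    echo -n "$device" > "$subsys/namespaces/1/device_path"
    echo 1 > "$subsys/namespaces/1/enable"

    # Create a TCP port listening on the given IPv4 address and port.
    mkdir -p "$root/ports/1/subsystems"
    echo "$address" > "$root/ports/1/addr_traddr"
    echo tcp > "$root/ports/1/addr_trtype"
    echo "$port" > "$root/ports/1/addr_trsvcid"
    echo ipv4 > "$root/ports/1/addr_adrfam"

    # Linking the subsystem into the port starts exporting it.
    ln -s "$subsys" "$root/ports/1/subsystems/$nqn"
}
```

A call could look like `export_nvme_tcp /dev/nvme0n1 192.168.1.10 4420 nqn.2024-03.example:cm3588-ssd`, with every argument adjusted to the local setup. The configuration does not survive a reboot, so it has to be reapplied (or persisted with a tool of your choice) after each boot.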

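Advanced topics:

  • NVMe-oF authentication support: https://blogs.oracle.com/linux/post/nvme-inband-authentication
  • NVMe-oF TLS support (kernel >=6.7 required): https://lwn.net/Articles/942817/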

Note regarding legacy kernels: I have not enabled NVMe over TCP on legacy kernels, since kernels <5.14 and >=5.14 are not able to connect to each other through NVMe over TCP. If you still want to export your NVMe SSD on a machine using legacy kernel 5.10, apply the patch from commit torvalds/linux@3c3ee16532c1 to the legacy kernel. I tested this method by patching it myself and it worked.
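
To spot this situation before debugging a failing connection, the two kernel versions can be compared against the 5.14 boundary. The following is a sketch under assumptions: both helper names are made up, the comparison relies on GNU `sort -V`, and it only encodes the rule stated in the note above (it knows nothing about whether a legacy kernel has already been patched).

```shell
#!/usr/bin/env bash
# Sketch: flag kernel pairs on opposite sides of the 5.14 boundary.
# Both helper names are hypothetical; version ordering uses GNU `sort -V`.

# Succeeds if the given kernel release (e.g. "5.10.160-legacy") is older than 5.14.
kernel_older_than_5_14() {
    local version="${1%%-*}"   # drop suffixes such as "-edge-rockchip64"
    [ "$version" != "5.14" ] &&
        [ "$(printf '%s\n' "$version" 5.14 | sort -V | head -n 1)" = "$version" ]
}

# Succeeds if both kernels are on the same side of 5.14.
nvme_tcp_kernels_match() {
    local a=new b=new
    kernel_older_than_5_14 "$1" && a=old
    kernel_older_than_5_14 "$2" && b=old
    [ "$a" = "$b" ]
}
```

For example, `nvme_tcp_kernels_match "$(uname -r)" 5.10.160 || echo "one side needs the patch"` would print the hint when the local kernel is 5.14 or newer and the remote one is an unpatched 5.10.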

How Has This Been Tested?

Kernel successfully compiled for the following targets:

  • rk3588-vendor
  • rockchip64-edge
  • wsl2-x86-edge
  • uefi-arm64-edge

Tested an export on the target host (FriendlyElec CM3588 NAS) using the nvmet-tcp module and connected to it with a FriendlyElec NanoPi R5C using the nvme-tcp module and nvme-cli.
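
The exact commands used for this test are not shown here. As a sketch, the host-side part with nvme-cli might look like the following. The function names and the `NVME_CMD` override are made up for illustration, and the address, port and NQN must match whatever was configured on the target. The nvme-tcp module has to be available on the host (`modprobe nvme-tcp` if it is not loaded already).

```shell
#!/usr/bin/env bash
# Sketch: connect a host to an NVMe/TCP target with nvme-cli.
# Setting NVME_CMD=echo prints the nvme-cli invocations instead of running them.
connect_nvme_tcp() {
    local address="$1" port="$2" nqn="$3"
    local nvme="${NVME_CMD:-nvme}"

    # Ask the target which subsystems it exports on this address and port.
    "$nvme" discover -t tcp -a "$address" -s "$port"

    # Attach the subsystem; its namespaces then show up as local
    # /dev/nvmeXnY block devices.
    "$nvme" connect -t tcp -a "$address" -s "$port" -n "$nqn"
}

# Detach again by subsystem NQN.
disconnect_nvme_tcp() {
    local nvme="${NVME_CMD:-nvme}"
    "$nvme" disconnect -n "$1"
}
```

After `connect_nvme_tcp 192.168.1.10 4420 nqn.2024-03.example:cm3588-ssd` (placeholder values), `nvme list` should show the remote namespace next to any local ones.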

modinfo on the various modules (NanoPi R5C edge 6.7):

root@nanopi-r5c:~# modinfo nvme_tcp
filename:       /lib/modules/6.7.9-edge-rockchip64/kernel/drivers/nvme/host/nvme-tcp.ko
license:        GPL v2
depends:        nvme-fabrics,nvme-keyring
intree:         Y
name:           nvme_tcp
vermagic:       6.7.9-edge-rockchip64 SMP preempt mod_unload aarch64
parm:           so_priority:nvme tcp socket optimize priority (int)
parm:           tls_handshake_timeout:nvme TLS handshake timeout in seconds (default 10) (int)
root@nanopi-r5c:~# modinfo nvmet
filename:       /lib/modules/6.7.9-edge-rockchip64/kernel/drivers/nvme/target/nvmet.ko
license:        GPL v2
import_ns:      NVME_TARGET_PASSTHRU
depends:        nvme-keyring
intree:         Y
name:           nvmet
vermagic:       6.7.9-edge-rockchip64 SMP preempt mod_unload aarch64
root@nanopi-r5c:~# modinfo nvmet-tcp
filename:       /lib/modules/6.7.9-edge-rockchip64/kernel/drivers/nvme/target/nvmet-tcp.ko
alias:          nvmet-transport-3
license:        GPL v2
depends:        nvmet
intree:         Y
name:           nvmet_tcp
vermagic:       6.7.9-edge-rockchip64 SMP preempt mod_unload aarch64
parm:           so_priority:nvmet tcp socket optimize priority: Default 0
parm:           idle_poll_period_usecs:nvmet tcp io_work poll till idle time period in usecs: Default 0
parm:           tls_handshake_timeout:nvme TLS handshake timeout in seconds (default 10) (int)

Benchmarks comparing NVMe over TCP with NFS

Test setup:

  • NVMe-over-TCP target/exporter: FriendlyElec CM3588 with NVMe SSD, running Armbian Bookworm
  • NVMe-over-TCP "client": FriendlyElec NanoPi R5C, running Armbian Bookworm
  • both have 2.5G ethernet, but are connected through a 1G switch, which is the bottleneck in this case (maximum physical throughput about 120 MB/s)
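
The "about 120 MB/s" figure can be sanity-checked with a bit of arithmetic. The sketch below assumes a 1500-byte MTU, IPv4 and TCP with the timestamp option, and ignores the additional NVMe/TCP or NFS protocol headers, so it is an upper bound rather than a prediction. The function name is made up.

```shell
#!/usr/bin/env bash
# Sketch: upper bound for TCP payload throughput on an Ethernet link, in MB/s.
# Assumes MTU 1500, IPv4 and TCP timestamps; storage protocol headers are not counted.
tcp_goodput_mb_s() {
    local link_bits_per_s="$1"
    awk -v bps="$link_bits_per_s" 'BEGIN {
        wire = 1500 + 14 + 4 + 8 + 12   # MTU + Ethernet header + FCS + preamble + inter-frame gap
        payload = 1500 - 20 - 20 - 12   # MTU - IPv4 header - TCP header - TCP timestamp option
        printf "%.1f\n", bps / 8 / 1e6 * payload / wire
    }'
}

tcp_goodput_mb_s 1000000000   # 1G link, as in this test setup
tcp_goodput_mb_s 2500000000   # 2.5G link, what both boards could use without the switch
```

This prints 117.7 for the 1G link and 294.2 for 2.5G.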

Read test (no real difference in speed):

Fio read benchmark using NFS
fio --bs=64k --numjobs=4 --iodepth=4 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --size=2G --name=read-phase --rw=randread
read-phase: (g=0): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=4
...
fio-3.33
Starting 4 processes
read-phase: Laying out IO file (1 file / 2048MiB)
read-phase: Laying out IO file (1 file / 2048MiB)
read-phase: Laying out IO file (1 file / 2048MiB)
read-phase: Laying out IO file (1 file / 2048MiB)
Jobs: 4 (f=4): [r(4)][100.0%][r=108MiB/s][r=1729 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=315282: Mon Mar  4 23:35:12 2024
  read: IOPS=437, BW=27.4MiB/s (28.7MB/s)(1642MiB/60003msec)
    slat (usec): min=21, max=6055, avg=91.60, stdev=130.84
    clat (usec): min=295, max=24673, avg=9017.64, stdev=1729.06
     lat (usec): min=1575, max=24773, avg=9109.24, stdev=1729.44
    clat percentiles (usec):
     |  1.00th=[ 4113],  5.00th=[ 6521], 10.00th=[ 7832], 20.00th=[ 8455],
     | 30.00th=[ 8717], 40.00th=[ 8717], 50.00th=[ 8848], 60.00th=[ 8979],
     | 70.00th=[ 9110], 80.00th=[ 9372], 90.00th=[10552], 95.00th=[12387],
     | 99.00th=[15664], 99.50th=[16581], 99.90th=[19006], 99.95th=[19530],
     | 99.99th=[21365]
   bw (  KiB/s): min=23168, max=29056, per=25.05%, avg=28063.45, stdev=858.50, samples=119
   iops        : min=  362, max=  454, avg=438.47, stdev=13.42, samples=119
  lat (usec)   : 500=0.01%
  lat (msec)   : 2=0.02%, 4=0.85%, 10=86.83%, 20=12.26%, 50=0.03%
  cpu          : usr=1.95%, sys=5.06%, ctx=26371, majf=0, minf=80
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=26270,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=315283: Mon Mar  4 23:35:12 2024
  read: IOPS=437, BW=27.3MiB/s (28.7MB/s)(1640MiB/60004msec)
    slat (usec): min=21, max=7653, avg=93.96, stdev=156.12
    clat (usec): min=127, max=25270, avg=9026.85, stdev=1712.87
     lat (usec): min=1994, max=25358, avg=9120.81, stdev=1717.58
    clat percentiles (usec):
     |  1.00th=[ 4228],  5.00th=[ 6587], 10.00th=[ 7898], 20.00th=[ 8586],
     | 30.00th=[ 8717], 40.00th=[ 8717], 50.00th=[ 8848], 60.00th=[ 8848],
     | 70.00th=[ 9110], 80.00th=[ 9372], 90.00th=[10421], 95.00th=[12387],
     | 99.00th=[15664], 99.50th=[16319], 99.90th=[18744], 99.95th=[20055],
     | 99.99th=[23987]
   bw (  KiB/s): min=23424, max=29184, per=25.02%, avg=28023.22, stdev=850.11, samples=119
   iops        : min=  366, max=  456, avg=437.84, stdev=13.30, samples=119
  lat (usec)   : 250=0.01%
  lat (msec)   : 2=0.01%, 4=0.79%, 10=86.79%, 20=12.36%, 50=0.05%
  cpu          : usr=1.94%, sys=5.11%, ctx=26444, majf=0, minf=83
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=26234,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=315284: Mon Mar  4 23:35:12 2024
  read: IOPS=437, BW=27.4MiB/s (28.7MB/s)(1641MiB/60003msec)
    slat (usec): min=21, max=7681, avg=94.27, stdev=147.04
    clat (usec): min=1464, max=23385, avg=9018.06, stdev=1708.21
     lat (usec): min=1761, max=23425, avg=9112.34, stdev=1715.44
    clat percentiles (usec):
     |  1.00th=[ 4228],  5.00th=[ 6521], 10.00th=[ 7832], 20.00th=[ 8586],
     | 30.00th=[ 8717], 40.00th=[ 8717], 50.00th=[ 8848], 60.00th=[ 8979],
     | 70.00th=[ 8979], 80.00th=[ 9372], 90.00th=[10421], 95.00th=[12256],
     | 99.00th=[15533], 99.50th=[16581], 99.90th=[18482], 99.95th=[19268],
     | 99.99th=[21365]
   bw (  KiB/s): min=23296, max=29184, per=25.04%, avg=28049.03, stdev=828.99, samples=119
   iops        : min=  364, max=  456, avg=438.24, stdev=12.97, samples=119
  lat (msec)   : 2=0.02%, 4=0.74%, 10=86.86%, 20=12.35%, 50=0.02%
  cpu          : usr=1.89%, sys=5.22%, ctx=26507, majf=0, minf=81
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=26258,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=315285: Mon Mar  4 23:35:12 2024
  read: IOPS=437, BW=27.4MiB/s (28.7MB/s)(1641MiB/60004msec)
    slat (usec): min=21, max=8658, avg=93.76, stdev=151.17
    clat (usec): min=1796, max=23939, avg=9018.15, stdev=1707.54
     lat (usec): min=1892, max=23963, avg=9111.90, stdev=1715.78
    clat percentiles (usec):
     |  1.00th=[ 4146],  5.00th=[ 6521], 10.00th=[ 7832], 20.00th=[ 8586],
     | 30.00th=[ 8717], 40.00th=[ 8717], 50.00th=[ 8848], 60.00th=[ 8979],
     | 70.00th=[ 8979], 80.00th=[ 9372], 90.00th=[10552], 95.00th=[12256],
     | 99.00th=[15533], 99.50th=[16450], 99.90th=[18744], 99.95th=[19530],
     | 99.99th=[21627]
   bw (  KiB/s): min=23552, max=28928, per=25.04%, avg=28054.38, stdev=807.24, samples=119
   iops        : min=  368, max=  452, avg=438.33, stdev=12.63, samples=119
  lat (msec)   : 2=0.02%, 4=0.84%, 10=86.73%, 20=12.38%, 50=0.05%
  cpu          : usr=1.90%, sys=5.17%, ctx=26450, majf=0, minf=80
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=26261,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=109MiB/s (115MB/s), 27.3MiB/s-27.4MiB/s (28.7MB/s-28.7MB/s), io=6564MiB (6883MB), run=60003-60004msec

Fio read benchmark using NVMe over TCP
fio --bs=64k --numjobs=4 --iodepth=4 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n3 --name=read-phase --rw=randread
read-phase: (g=0): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=4
...
fio-3.33
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=112MiB/s][r=1798 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=312297: Mon Mar  4 23:23:02 2024
  read: IOPS=445, BW=27.9MiB/s (29.2MB/s)(1672MiB/60006msec)
    slat (usec): min=19, max=11343, avg=208.49, stdev=331.71
    clat (usec): min=316, max=25848, avg=8735.55, stdev=2718.12
     lat (usec): min=1705, max=26145, avg=8944.04, stdev=2662.31
    clat percentiles (usec):
     |  1.00th=[ 2024],  5.00th=[ 3490], 10.00th=[ 5407], 20.00th=[ 6587],
     | 30.00th=[ 7504], 40.00th=[ 8225], 50.00th=[ 8848], 60.00th=[ 9503],
     | 70.00th=[10159], 80.00th=[10814], 90.00th=[11994], 95.00th=[12911],
     | 99.00th=[15008], 99.50th=[15926], 99.90th=[17957], 99.95th=[18744],
     | 99.99th=[21103]
   bw (  KiB/s): min=26112, max=32640, per=24.89%, avg=28543.61, stdev=1208.55, samples=119
   iops        : min=  408, max=  510, avg=445.92, stdev=18.91, samples=119
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.92%, 4=4.86%, 10=62.02%, 20=32.14%, 50=0.02%
  cpu          : usr=1.83%, sys=9.20%, ctx=28266, majf=0, minf=85
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=26746,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=312298: Mon Mar  4 23:23:02 2024
  read: IOPS=445, BW=27.8MiB/s (29.2MB/s)(1671MiB/60006msec)
    slat (usec): min=20, max=17913, avg=209.97, stdev=367.47
    clat (usec): min=224, max=30789, avg=8740.54, stdev=2775.01
     lat (usec): min=1649, max=31838, avg=8950.51, stdev=2720.51
    clat percentiles (usec):
     |  1.00th=[ 2024],  5.00th=[ 3392], 10.00th=[ 5276], 20.00th=[ 6587],
     | 30.00th=[ 7504], 40.00th=[ 8225], 50.00th=[ 8848], 60.00th=[ 9503],
     | 70.00th=[10159], 80.00th=[10945], 90.00th=[11994], 95.00th=[13042],
     | 99.00th=[15139], 99.50th=[16188], 99.90th=[18482], 99.95th=[21103],
     | 99.99th=[26870]
   bw (  KiB/s): min=25650, max=31297, per=24.87%, avg=28524.07, stdev=1248.97, samples=119
   iops        : min=  400, max=  489, avg=445.61, stdev=19.55, samples=119
  lat (usec)   : 250=0.01%, 500=0.02%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.88%, 4=5.14%, 10=61.75%, 20=32.10%, 50=0.07%
  cpu          : usr=1.94%, sys=9.06%, ctx=27996, majf=0, minf=86
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=26728,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=312299: Mon Mar  4 23:23:02 2024
  read: IOPS=451, BW=28.2MiB/s (29.6MB/s)(1694MiB/60008msec)
    slat (usec): min=22, max=15799, avg=202.56, stdev=305.10
    clat (usec): min=308, max=30849, avg=8622.91, stdev=2769.15
     lat (usec): min=1686, max=36425, avg=8825.47, stdev=2717.60
    clat percentiles (usec):
     |  1.00th=[ 2008],  5.00th=[ 3294], 10.00th=[ 5211], 20.00th=[ 6456],
     | 30.00th=[ 7308], 40.00th=[ 8094], 50.00th=[ 8717], 60.00th=[ 9372],
     | 70.00th=[10028], 80.00th=[10814], 90.00th=[11994], 95.00th=[12911],
     | 99.00th=[15139], 99.50th=[16057], 99.90th=[17957], 99.95th=[19268],
     | 99.99th=[30278]
   bw (  KiB/s): min=24634, max=36224, per=25.22%, avg=28929.08, stdev=1472.07, samples=119
   iops        : min=  384, max=  566, avg=451.95, stdev=23.04, samples=119
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.96%, 4=5.31%, 10=63.30%, 20=30.37%, 50=0.04%
  cpu          : usr=2.08%, sys=9.14%, ctx=28363, majf=0, minf=84
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=27098,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=312300: Mon Mar  4 23:23:02 2024
  read: IOPS=449, BW=28.1MiB/s (29.4MB/s)(1685MiB/60008msec)
    slat (usec): min=21, max=11953, avg=209.06, stdev=351.18
    clat (usec): min=54, max=30400, avg=8662.16, stdev=2759.84
     lat (usec): min=1631, max=31145, avg=8871.21, stdev=2702.00
    clat percentiles (usec):
     |  1.00th=[ 2024],  5.00th=[ 3392], 10.00th=[ 5211], 20.00th=[ 6456],
     | 30.00th=[ 7373], 40.00th=[ 8160], 50.00th=[ 8848], 60.00th=[ 9372],
     | 70.00th=[10028], 80.00th=[10814], 90.00th=[11994], 95.00th=[12911],
     | 99.00th=[15008], 99.50th=[15926], 99.90th=[17957], 99.95th=[18744],
     | 99.99th=[21890]
   bw (  KiB/s): min=26187, max=36825, per=25.10%, avg=28784.56, stdev=1471.26, samples=119
   iops        : min=  409, max=  575, avg=449.69, stdev=23.00, samples=119
  lat (usec)   : 100=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.88%, 4=5.21%, 10=62.68%, 20=31.16%, 50=0.02%
  cpu          : usr=1.88%, sys=9.35%, ctx=28265, majf=0, minf=84
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=26964,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=112MiB/s (117MB/s), 27.8MiB/s-28.2MiB/s (29.2MB/s-29.6MB/s), io=6721MiB (7047MB), run=60006-60008msec

Disk stats (read/write):
  nvme0n3: ios=107335/0, merge=0/0, ticks=919960/0, in_queue=919960, util=100.00%

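Summary: no meaningful difference in read performance with my setup (115 MB/s over NFS vs 117 MB/s over NVMe over TCP); both runs are limited by the 1G switch.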

Write test with the following command:

fio --bs=64k --numjobs=4 --iodepth=4 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --size=300M --name=read-phase --rw=randwrite

Fio write benchmark using NFS
fio --bs=64k --numjobs=4 --iodepth=4 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --size=300M --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=4
...
fio-3.33
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][w=28.8MiB/s][w=460 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=317066: Mon Mar  4 23:42:38 2024
  write: IOPS=188, BW=11.8MiB/s (12.4MB/s)(708MiB/60003msec); 0 zone resets
    slat (usec): min=29, max=2936, avg=95.00, stdev=129.42
    clat (usec): min=8263, max=66532, avg=21072.86, stdev=8077.54
     lat (usec): min=8381, max=66714, avg=21167.86, stdev=8075.75
    clat percentiles (usec):
     |  1.00th=[11469],  5.00th=[13173], 10.00th=[14091], 20.00th=[14877],
     | 30.00th=[15139], 40.00th=[15533], 50.00th=[17171], 60.00th=[19006],
     | 70.00th=[24249], 80.00th=[32637], 90.00th=[33817], 95.00th=[35390],
     | 99.00th=[38011], 99.50th=[39584], 99.90th=[45351], 99.95th=[49021],
     | 99.99th=[60031]
   bw (  KiB/s): min= 6656, max=17920, per=25.09%, avg=12126.46, stdev=4054.13, samples=119
   iops        : min=  104, max=  280, avg=189.45, stdev=63.33, samples=119
  lat (msec)   : 10=0.09%, 20=63.34%, 50=36.53%, 100=0.04%
  cpu          : usr=0.83%, sys=1.81%, ctx=9395, majf=0, minf=16
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,11326,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=317067: Mon Mar  4 23:42:38 2024
  write: IOPS=188, BW=11.8MiB/s (12.4MB/s)(708MiB/60028msec); 0 zone resets
    slat (usec): min=30, max=3539, avg=99.64, stdev=153.52
    clat (usec): min=8264, max=67411, avg=21066.61, stdev=8085.77
     lat (usec): min=8395, max=67543, avg=21166.25, stdev=8084.49
    clat percentiles (usec):
     |  1.00th=[11469],  5.00th=[13173], 10.00th=[14091], 20.00th=[14877],
     | 30.00th=[15139], 40.00th=[15533], 50.00th=[17171], 60.00th=[19006],
     | 70.00th=[24249], 80.00th=[32375], 90.00th=[33817], 95.00th=[35390],
     | 99.00th=[38536], 99.50th=[39584], 99.90th=[45351], 99.95th=[47449],
     | 99.99th=[66323]
   bw (  KiB/s): min= 6528, max=17920, per=25.10%, avg=12127.50, stdev=4060.45, samples=119
   iops        : min=  102, max=  280, avg=189.46, stdev=63.43, samples=119
  lat (msec)   : 10=0.11%, 20=63.43%, 50=36.41%, 100=0.04%
  cpu          : usr=0.81%, sys=1.84%, ctx=9632, majf=0, minf=18
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,11331,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=317068: Mon Mar  4 23:42:38 2024
  write: IOPS=188, BW=11.8MiB/s (12.4MB/s)(708MiB/60028msec); 0 zone resets
    slat (usec): min=30, max=6640, avg=99.26, stdev=158.69
    clat (usec): min=4955, max=67734, avg=21066.00, stdev=8084.00
     lat (usec): min=9226, max=67844, avg=21165.26, stdev=8082.78
    clat percentiles (usec):
     |  1.00th=[11600],  5.00th=[13173], 10.00th=[14091], 20.00th=[14877],
     | 30.00th=[15139], 40.00th=[15533], 50.00th=[17171], 60.00th=[19006],
     | 70.00th=[24249], 80.00th=[32375], 90.00th=[33817], 95.00th=[35390],
     | 99.00th=[38011], 99.50th=[40109], 99.90th=[45351], 99.95th=[47449],
     | 99.99th=[58459]
   bw (  KiB/s): min= 6528, max=17920, per=25.10%, avg=12128.59, stdev=4056.75, samples=119
   iops        : min=  102, max=  280, avg=189.48, stdev=63.37, samples=119
  lat (msec)   : 10=0.07%, 20=63.29%, 50=36.61%, 100=0.04%
  cpu          : usr=0.99%, sys=1.71%, ctx=9695, majf=0, minf=17
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,11331,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=317069: Mon Mar  4 23:42:38 2024
  write: IOPS=188, BW=11.8MiB/s (12.4MB/s)(709MiB/60030msec); 0 zone resets
    slat (usec): min=29, max=2971, avg=94.00, stdev=126.15
    clat (usec): min=8044, max=67524, avg=21061.28, stdev=8087.85
     lat (usec): min=8647, max=67627, avg=21155.28, stdev=8086.75
    clat percentiles (usec):
     |  1.00th=[11469],  5.00th=[13173], 10.00th=[14091], 20.00th=[14877],
     | 30.00th=[15139], 40.00th=[15533], 50.00th=[17171], 60.00th=[19006],
     | 70.00th=[24249], 80.00th=[32375], 90.00th=[33817], 95.00th=[35390],
     | 99.00th=[38536], 99.50th=[39584], 99.90th=[44827], 99.95th=[46924],
     | 99.99th=[58983]
   bw (  KiB/s): min= 6656, max=18048, per=25.11%, avg=12136.12, stdev=4066.86, samples=119
   iops        : min=  104, max=  282, avg=189.60, stdev=63.53, samples=119
  lat (msec)   : 10=0.14%, 20=63.56%, 50=36.28%, 100=0.03%
  cpu          : usr=0.81%, sys=1.77%, ctx=8981, majf=0, minf=16
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,11338,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: bw=47.2MiB/s (49.5MB/s), 11.8MiB/s-11.8MiB/s (12.4MB/s-12.4MB/s), io=2833MiB (2970MB), run=60003-60030msec

Fio write benchmark using NVMe over TCP
fio --bs=64k --numjobs=4 --iodepth=4 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --size=300M --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=4
...
fio-3.33
Starting 4 processes
read-phase: Laying out IO file (1 file / 300MiB)
read-phase: Laying out IO file (1 file / 300MiB)
read-phase: Laying out IO file (1 file / 300MiB)
read-phase: Laying out IO file (1 file / 300MiB)
Jobs: 4 (f=4): [w(4)][100.0%][w=110MiB/s][w=1767 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=317797: Mon Mar  4 23:45:33 2024
  write: IOPS=440, BW=27.5MiB/s (28.9MB/s)(1652MiB/60017msec); 0 zone resets
    slat (usec): min=38, max=24863, avg=328.01, stdev=595.94
    clat (usec): min=412, max=34355, avg=8716.83, stdev=2150.57
     lat (usec): min=2026, max=35145, avg=9044.84, stdev=2236.72
    clat percentiles (usec):
     |  1.00th=[ 4621],  5.00th=[ 6063], 10.00th=[ 6718], 20.00th=[ 7373],
     | 30.00th=[ 7832], 40.00th=[ 8160], 50.00th=[ 8455], 60.00th=[ 8717],
     | 70.00th=[ 9110], 80.00th=[ 9634], 90.00th=[10945], 95.00th=[12387],
     | 99.00th=[17171], 99.50th=[19006], 99.90th=[25035], 99.95th=[26346],
     | 99.99th=[33424]
   bw (  KiB/s): min=13312, max=31429, per=25.09%, avg=28208.34, stdev=2150.77, samples=119
   iops        : min=  208, max=  491, avg=440.69, stdev=33.61, samples=119
  lat (usec)   : 500=0.01%, 1000=0.01%
  lat (msec)   : 2=0.02%, 4=0.45%, 10=82.88%, 20=16.27%, 50=0.37%
  cpu          : usr=2.46%, sys=13.15%, ctx=29689, majf=0, minf=21
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,26430,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=317798: Mon Mar  4 23:45:33 2024
  write: IOPS=436, BW=27.3MiB/s (28.6MB/s)(1636MiB/60018msec); 0 zone resets
    slat (usec): min=37, max=20509, avg=343.22, stdev=661.86
    clat (usec): min=994, max=56554, avg=8793.04, stdev=2275.08
     lat (usec): min=1924, max=56700, avg=9136.27, stdev=2379.52
    clat percentiles (usec):
     |  1.00th=[ 4621],  5.00th=[ 6128], 10.00th=[ 6783], 20.00th=[ 7439],
     | 30.00th=[ 7832], 40.00th=[ 8160], 50.00th=[ 8455], 60.00th=[ 8717],
     | 70.00th=[ 9110], 80.00th=[ 9765], 90.00th=[11076], 95.00th=[12518],
     | 99.00th=[17695], 99.50th=[20055], 99.90th=[25560], 99.95th=[28181],
     | 99.99th=[55313]
   bw (  KiB/s): min=13668, max=29764, per=24.85%, avg=27939.81, stdev=2164.35, samples=119
   iops        : min=  213, max=  465, avg=436.48, stdev=33.85, samples=119
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.04%, 4=0.48%, 10=82.06%, 20=16.91%, 50=0.49%
  lat (msec)   : 100=0.02%
  cpu          : usr=2.30%, sys=13.50%, ctx=29302, majf=0, minf=21
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,26170,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=317799: Mon Mar  4 23:45:33 2024
  write: IOPS=439, BW=27.5MiB/s (28.8MB/s)(1650MiB/60017msec); 0 zone resets
    slat (usec): min=39, max=32870, avg=330.98, stdev=624.38
    clat (usec): min=346, max=55137, avg=8722.58, stdev=2165.00
     lat (usec): min=1901, max=55419, avg=9053.56, stdev=2236.39
    clat percentiles (usec):
     |  1.00th=[ 4621],  5.00th=[ 6063], 10.00th=[ 6718], 20.00th=[ 7373],
     | 30.00th=[ 7832], 40.00th=[ 8160], 50.00th=[ 8455], 60.00th=[ 8717],
     | 70.00th=[ 9110], 80.00th=[ 9765], 90.00th=[10945], 95.00th=[12256],
     | 99.00th=[17171], 99.50th=[19006], 99.90th=[25035], 99.95th=[27132],
     | 99.99th=[34866]
   bw (  KiB/s): min=13796, max=30336, per=25.07%, avg=28183.76, stdev=2023.17, samples=119
   iops        : min=  215, max=  474, avg=440.29, stdev=31.65, samples=119
  lat (usec)   : 500=0.01%, 1000=0.01%
  lat (msec)   : 2=0.04%, 4=0.51%, 10=82.86%, 20=16.27%, 50=0.31%
  lat (msec)   : 100=0.01%
  cpu          : usr=2.38%, sys=13.22%, ctx=29660, majf=0, minf=23
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,26406,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4
read-phase: (groupid=0, jobs=1): err= 0: pid=317800: Mon Mar  4 23:45:33 2024
  write: IOPS=440, BW=27.5MiB/s (28.9MB/s)(1652MiB/60017msec); 0 zone resets
    slat (usec): min=36, max=25208, avg=333.02, stdev=603.60
    clat (usec): min=743, max=59529, avg=8711.27, stdev=2229.34
     lat (usec): min=2157, max=59728, avg=9044.29, stdev=2297.89
    clat percentiles (usec):
     |  1.00th=[ 4490],  5.00th=[ 5997], 10.00th=[ 6652], 20.00th=[ 7373],
     | 30.00th=[ 7832], 40.00th=[ 8160], 50.00th=[ 8455], 60.00th=[ 8717],
     | 70.00th=[ 9110], 80.00th=[ 9634], 90.00th=[10945], 95.00th=[12256],
     | 99.00th=[17171], 99.50th=[18744], 99.90th=[24249], 99.95th=[29492],
     | 99.99th=[49546]
   bw (  KiB/s): min=13312, max=32384, per=25.09%, avg=28212.72, stdev=2143.95, samples=119
   iops        : min=  208, max=  506, avg=440.75, stdev=33.53, samples=119
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.03%, 4=0.55%, 10=82.84%, 20=16.31%, 50=0.26%
  lat (msec)   : 100=0.01%
  cpu          : usr=2.36%, sys=13.26%, ctx=29720, majf=0, minf=22
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,26433,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: bw=110MiB/s (115MB/s), 27.3MiB/s-27.5MiB/s (28.6MB/s-28.9MB/s), io=6590MiB (6910MB), run=60017-60018msec

Disk stats (read/write):
  nvme0n3: ios=12/105337, merge=0/0, ticks=112/887672, in_queue=887784, util=100.00%

Summary: 49.5 MB/s WRITE with NFS vs 115 MB/s WRITE with NVMe over TCP, which is more than double the speed of NFS in this specific scenario.
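
For anyone repeating the comparison, the aggregate figures can be pulled straight out of fio's "Run status" lines instead of being read off by hand. This is a sketch with made-up helper names; the regular expression only targets the summary format shown in the logs above, and the two input lines are copied from those logs.

```shell
#!/usr/bin/env bash
# Sketch: extract the aggregate MB/s figure from a fio "Run status" line
# and compare two runs. Helper names are hypothetical.
fio_total_mb_s() {
    sed -nE 's/^ *(READ|WRITE): bw=[^(]*\(([0-9.]+)MB\/s\).*/\2/p'
}

# Ratio of the second throughput to the first, with two decimals.
speedup() {
    awk -v base="$1" -v other="$2" 'BEGIN { printf "%.2f\n", other / base }'
}

nfs=$(echo '  WRITE: bw=47.2MiB/s (49.5MB/s), 11.8MiB/s-11.8MiB/s (12.4MB/s-12.4MB/s), io=2833MiB (2970MB), run=60003-60030msec' | fio_total_mb_s)
nvme=$(echo '  WRITE: bw=110MiB/s (115MB/s), 27.3MiB/s-27.5MiB/s (28.6MB/s-28.9MB/s), io=6590MiB (6910MB), run=60017-60018msec' | fio_total_mb_s)
speedup "$nfs" "$nvme"
```

With the two write results from this PR, the script prints 2.32.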

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

- Enable NVMe over TCP kernel support for the board families `rk35xx` (vendor), `rockchip-rk3588` (edge), `rockchip64` (current, edge) as well as the generic `uefi` target (current, edge) and `wsl` (current, edge)

- Note: for `wsl` only host-mode is supported, target-mode is not available

- Support for `legacy` kernels was not added due to incompatibilities between kernel versions <5.14 and >=5.14. Kernels <5.14 would need this commit patched in to be compatible: torvalds/linux@3c3ee16532c1

- NVMe over TCP specs and slides: https://nvmexpress.org/wp-content/uploads/March-2019-NVMe-TCP-What-You-Need-to-Know-About-the-Specification.pdf

- Guides: https://www.linuxjournal.com/content/data-flash-part-iii-nvme-over-fabrics-using-tcp and https://blogs.oracle.com/linux/post/nvme-over-tcp

- NVMe-oF authentication support: https://blogs.oracle.com/linux/post/nvme-inband-authentication

- NVMe-oF TLS support (kernel >=6.7 required): https://lwn.net/Articles/942817/
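
To see whether a given kernel config actually carries these options, something like the following could be used. It is only a sketch: `check_nvme_tcp` is a hypothetical helper, and the `/boot/config-<release>` path in the usage comment is an assumption that depends on the image. `CONFIG_NVME_TCP` is the host (initiator) driver and `CONFIG_NVME_TARGET_TCP` is the target side, so a host-only build such as `wsl` would be expected to report only the first one as enabled.

```shell
#!/bin/sh
# Report whether the NVMe over TCP host and target options are enabled
# (built-in "y" or module "m") in a kernel .config-style file.
# Prints one line per symbol.
check_nvme_tcp() {
    config_file="$1"
    for symbol in CONFIG_NVME_TCP CONFIG_NVME_TARGET_TCP; do
        if grep -Eq "^${symbol}=(y|m)$" "$config_file"; then
            echo "${symbol}: enabled"
        else
            echo "${symbol}: not enabled"
        fi
    done
}

# Example usage (the path is an assumption; adjust it to where your
# system stores the running kernel's config):
# check_nvme_tcp "/boot/config-$(uname -r)"
```
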
Member

@rpardini rpardini left a comment

Good stuff! "yes, and" for the future:

  • when faced with outdated `.config`s, use `./compile.sh BOARD=x BRANCH=y rewrite-kernel-config`, then commit; that way when you do the actual change, it's isolated.
  • BCACHEFS needs more or less the same treatment ;-)
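
The first suggestion boils down to two separate commits. Here is a minimal sketch that only prints the commands for review rather than running them. `config_change_plan` and the commit messages are hypothetical; `rewrite-kernel-config` and `kernel-config` are the build commands mentioned in this PR.

```shell
#!/bin/sh
# Print the two-step workflow for one board/branch pair:
#   1. rewrite the outdated config and commit it on its own,
#   2. make the actual change via kernel-config and commit that separately.
# Nothing is executed; the commands are only echoed.
config_change_plan() {
    board="$1"
    branch="$2"
    echo "./compile.sh BOARD=${board} BRANCH=${branch} rewrite-kernel-config"
    echo "git commit -am 'kernel: rewrite ${branch} config for ${board}'"
    echo "./compile.sh BOARD=${board} BRANCH=${branch} kernel-config"
    echo "git commit -am 'kernel: enable NVMe over TCP for ${board} ${branch}'"
}

# x and y are the same placeholders used in the comment above.
config_change_plan x y
```

Keeping the rewrite in its own commit means the second diff contains only the options that were deliberately changed.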

@rpardini rpardini merged commit 093955e into armbian:main Mar 7, 2024
@ColorfulRhino
Collaborator Author

ColorfulRhino commented Mar 7, 2024

  • when faced with outdated `.config`s, use `./compile.sh BOARD=x BRANCH=y rewrite-kernel-config`, then commit; that way when you do the actual change, it's isolated.

Ah yes, totally agree with you. This came to my mind as well, but only after I had already sent the PR and I didn't want to revert and do it again 😅 Next time!

@ColorfulRhino
Collaborator Author

@rpardini What would be the best place to copy-paste this to, so more people can see this little guide? Do you have an idea?

@rpardini
Member

rpardini commented Mar 7, 2024

Ref the contents: good write up. I think the comparison to NFS is a bit unfair, poor NFS, a more apples-to-apples opponent would be iSCSI, maybe.

I'm not the best to ask with docs and such and where to store them, but maybe in a .md file in https://github.com/armbian/documentation ? We're in sore need of updates there, see armbian/documentation#352 , there have been meetings about docs etc. @igorpecovnik ?

@igorpecovnik
Member

but maybe in a .md file in https://github.com/armbian/documentation ?

Yes, unfortunately the current docs are a bit messy, but we are working on ... IMO anywhere around here
https://docs.armbian.com/User-Guide_Getting-Started/ and we will place it better once we redesign the top-level structure.

We're in sore need of updates there, see

Yes, there are meetings; 3-5 people came around, the structure / path is set, but progress remains slow.

@ColorfulRhino
Collaborator Author

Ref the contents: good write up. I think the comparison to NFS is a bit unfair, poor NFS, a more apples-to-apples opponent would be iSCSI, maybe.

Oh yeah, totally! My writeup is indeed opinionated. I did the comparison with NFS instead of iSCSI for two reasons mostly:

  1. Reading discussions, threads and so on, I took the view that many hobbyists and home operators use NFS or SMB for network sharing. Rarely did I see iSCSI. Basically, my writeup could also have been "hey look at these reasons why you should consider using iSCSI for your network file system instead of NFS" 😆
  2. I have never used iSCSI and honestly don't know much about it. I did not set it up for testing/benchmarking.

So yeah, it's an opinionated text with some information and links I wish I had when I first learned about its existence :)

but maybe in a .md file in https://github.com/armbian/documentation ?

Yes, unfortunately the current docs are a bit messy, but we are working on ... IMO anywhere around here https://docs.armbian.com/User-Guide_Getting-Started/ and we will place it better once we redesign the top-level structure.

Alright thanks! I'll see what I can do to contribute.

  • BCACHEFS needs more or less the same treatment ;-)

I have only read a few articles about Bcachefs recently and have not tried it out. But I can have a look at what needs to be done in order to make it available for some Armbian boards.

@rpardini
Member

rpardini commented Mar 7, 2024

iSCSI wouldn't get close to NVMe's perf at (X>1) Gbit/s, due to its own overhead and queue depth limitations.
On the other hand, there are many more iSCSI initiators available, so it's useful if you have, say, Windows or macOS clients.

@ColorfulRhino
Collaborator Author

iSCSI wouldn't get close to NVMe's perf at (X>1) Gbit/s, due to its own overhead and queue depth limitations.

Oh, interesting!

On the other hand, there are many more iSCSI initiators available, so it's useful if you have, say, Windows or macOS clients.

Yeah, that's true. When I did a quick search, it seemed like there's loads of software for iSCSI on Windows. Not so much for NVMe-oF. But I guess that will develop over time :)
The good thing is, you can always run NVMe-oF and NFS/iSCSI at the same time for different clients.
