Radxa Orion O6 (Mini ITX) #62

geerlingguy opened this issue Dec 20, 2024 · 121 comments

Comments

@geerlingguy
Owner

geerlingguy commented Dec 20, 2024

Image

Basic information

  • Board URL (official): https://radxa.com/products/orion/o6
  • Board purchased from: ARACE Tech
  • Board purchase date: December 18, 2024 (pre-order, 'shipping after the Spring Festival 2025', so early Feb)
  • Board specs (as tested): Radxa Orion O6 16GB (32GB for the one I pre-ordered)
  • Board price (as tested): $251 ($299.00 for the one I pre-ordered)

NOTE: I originally tested the 0.2.x firmware with the Debian 12 Device Tree configuration the first batches of the Orion O6 shipped with. Those test results are stored for posterity in this comment below. Because Radxa and Cix have released firmware which radically alters the performance characteristics of the board post-launch (April 2025), I am re-running all benchmarks and will list those results below.

Also, GitHub user @System64fumo is maintaining a list of all the features and tested hardware that works or doesn't work in mainline Linux currently: Orion O6 mainline support.

Linux/system information

# output of `screenfetch`
                          ./+o+-       jgeerling@orion-o6
                  yyyyy- -yyyyyy+      OS: Ubuntu 25.04 plucky
               ://+//////-yyyyyyo      Kernel: aarch64 Linux 6.14.0-15-generic
           .++ .:/++++++/-.+sss/`      Uptime: 7m
         .:++o:  /++++++++/:--:/-      Packages: 1589
        o:+o+:++.`..```.-/oo+++++/     Shell: dash
       .:+o:+o/.          `+sssoo+/    Disk: 9.8G / 234G (5%)
  .++/+:+oo+o:`             /sssooo.   CPU: ARM Cortex-A720 @ 8x 2.6GHz
 /+++//+:`oo+o               /::--:.   GPU: 
 \+/+o+++`o++o               ++////.   RAM: 1365MiB / 15229MiB
  .++.o+++oo+:`             /dddhhh.  
       .+.o+oo:.          `oddhhhh+   
        \+.++o+o``-````.:ohdhhhhh+    
         `:o+++ `ohhhhhhhhyo++os:     
           .o:`.syhhhhhhh/.oo++o`     
               /osyyyyyyo++ooo+++/    
                   ````` +oo+++o\:    
                          `oo++.    

# output of `uname -a`
Linux orion-o6 6.14.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Sun Apr  6 14:37:51 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

Benchmark results

CPU

Power

  • Idle power draw (at wall): 14.2 W
  • Maximum simulated power draw (stress-ng --matrix 0): 24.9 W
  • During Geekbench multicore benchmark: 27.3 W
  • During top500 HPL benchmark: 26.4 W

Disk

Inland 256 GB PCIe Gen 3x4 NVMe SSD

| Benchmark | Result |
| --- | --- |
| iozone 4K random read | 51.67 MB/s |
| iozone 4K random write | 215.64 MB/s |
| iozone 1M random read | 1420.56 MB/s |
| iozone 1M random write | 1452.81 MB/s |
| iozone 1M sequential read | 1916.47 MB/s |
| iozone 1M sequential write | 1683.79 MB/s |

Network

iperf3 results:

  • iperf3 -c $SERVER_IP: TODO Mbps
  • iperf3 -c $SERVER_IP --reverse: TODO Mbps
  • iperf3 -c $SERVER_IP --bidir: TODO Mbps up, TODO Mbps down

(Be sure to test all interfaces, noting any that are non-functional.)
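When the TODO entries above get filled in, the numbers can be pulled out of iperf3's JSON report instead of being copied by hand. The following is a minimal sketch, assuming a plain TCP run started with `iperf3 -c $SERVER_IP --json`; the function name `summarize_iperf3` is hypothetical, and `--bidir` runs use a different report layout that is not handled here.

```python
import json


def summarize_iperf3(json_text):
    """Return sender/receiver throughput in Mbps from an `iperf3 --json` report.

    Assumes a single-direction TCP test, where the report has
    end.sum_sent and end.sum_received sections.
    """
    end = json.loads(json_text)["end"]
    return {
        "sent_mbps": end["sum_sent"]["bits_per_second"] / 1e6,
        "received_mbps": end["sum_received"]["bits_per_second"] / 1e6,
    }
```

To use it, capture the report first (for example `iperf3 -c $SERVER_IP --json > run.json`) and pass the file contents to the function.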

GPU

glmark2

glmark2-es2 / glmark2-es2-wayland results:

NOTE: The 9.0.0 firmware and ACPI mode don't seem to properly expose the GPU for Ubuntu 25.04, at least. The iGPU will need later drivers and possibly other firmware to work correctly in Linux. See:

1. Install glmark2-es2 with `sudo apt install -y glmark2-es2`
2. Run `glmark2-es2` (with `DISPLAY=:0` prepended if running over SSH)
3. Replace this block of text with the results.

vkmark

vkmark results:

1. Install vkmark with `sudo apt install -y vkmark`
2. Run `vkmark` (with `DISPLAY=:0` prepended if running over SSH)
3. Replace this block of text with the results.

Note: vkmark needs to be compiled from source on Debian 12 and earlier.

GravityMark

GravityMark results:

1. Download the latest version of GravityMark: https://gravitymark.tellusim.com
2. Run `chmod +x [downloaded_filename].run`
3. Run `sudo ./[downloaded_filename].run` and press `y` to accept the terms.
4. Open the link it prints, and run the Benchmark defaults, changing to 720p resolution and 50,000 asteroids.

Note: These benchmarks require an active display on the device. Not all devices may be able to run glmark2-es2, so in that case, make a note and move on!

Ollama

ollama LLM model inference results:

See: geerlingguy/ollama-benchmark#13

Memory

tinymembench results:

tinymembench v0.4.10 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :  11874.4 MB/s (2.1%)
 C copy backwards (32 byte blocks)                    :  11686.3 MB/s (1.4%)
 C copy backwards (64 byte blocks)                    :  11691.6 MB/s (1.4%)
 C copy                                               :  13160.3 MB/s (0.7%)
 C copy prefetched (32 bytes step)                    :  12196.7 MB/s (0.8%)
 C copy prefetched (64 bytes step)                    :  12154.2 MB/s (0.8%)
 C 2-pass copy                                        :  15616.0 MB/s (0.1%)
 C 2-pass copy prefetched (32 bytes step)             :  13435.1 MB/s (0.3%)
 C 2-pass copy prefetched (64 bytes step)             :  15813.3 MB/s (0.3%)
 C fill                                               :  40840.2 MB/s
 C fill (shuffle within 16 byte blocks)               :  40847.8 MB/s
 C fill (shuffle within 32 byte blocks)               :  40847.4 MB/s
 C fill (shuffle within 64 byte blocks)               :  40842.5 MB/s
 NEON 64x2 COPY                                       :  14618.1 MB/s (0.6%)
 NEON 64x2x4 COPY                                     :  15031.3 MB/s (0.2%)
 NEON 64x1x4_x2 COPY                                  :  14249.7 MB/s (0.2%)
 NEON 64x2 COPY prefetch x2                           :  14087.5 MB/s (0.5%)
 NEON 64x2x4 COPY prefetch x1                         :  15009.3 MB/s
 NEON 64x2 COPY prefetch x1                           :  15042.0 MB/s (0.2%)
 NEON 64x2x4 COPY prefetch x1                         :  15041.8 MB/s (0.2%)
 ---
 standard memcpy                                      :  11352.6 MB/s (1.3%)
 standard memset                                      :  48439.3 MB/s (0.9%)
 ---
 NEON LDP/STP copy                                    :  15467.9 MB/s (1.4%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :  13229.0 MB/s (0.2%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :  14515.6 MB/s (0.4%)
 NEON LDP/STP copy pldl1keep (32 bytes step)          :  15019.7 MB/s (0.2%)
 NEON LDP/STP copy pldl1keep (64 bytes step)          :  16052.8 MB/s (2.1%)
 NEON LD1/ST1 copy                                    :  15500.5 MB/s
 NEON STP fill                                        :  48873.6 MB/s (1.3%)
 NEON STNP fill                                       :  47780.1 MB/s (1.4%)
 ARM LDP/STP copy                                     :  14210.0 MB/s (0.2%)
 ARM STP fill                                         :  48795.8 MB/s (1.2%)
 ARM STNP fill                                        :  47495.6 MB/s (1.2%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    0.0 ns          /     0.0 ns 
    131072 :    1.0 ns          /     1.5 ns 
    262144 :    2.0 ns          /     2.8 ns 
    524288 :    5.8 ns          /     8.3 ns 
   1048576 :   21.4 ns          /    30.4 ns 
   2097152 :   27.5 ns          /    34.6 ns 
   4194304 :   32.9 ns          /    36.7 ns 
   8388608 :   38.3 ns          /    41.5 ns 
  16777216 :   48.5 ns          /    56.1 ns 
  33554432 :  123.8 ns          /   169.5 ns 
  67108864 :  185.7 ns          /   230.0 ns 

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    0.0 ns          /     0.0 ns 
    131072 :    1.0 ns          /     1.5 ns 
    262144 :    1.5 ns          /     2.1 ns 
    524288 :    2.0 ns          /     2.7 ns 
   1048576 :   17.9 ns          /    26.5 ns 
   2097152 :   25.6 ns          /    32.8 ns 
   4194304 :   29.4 ns          /    34.8 ns 
   8388608 :   31.3 ns          /    35.5 ns 
  16777216 :   39.5 ns          /    46.8 ns 
  33554432 :  110.6 ns          /   155.1 ns 
  67108864 :  163.5 ns          /   206.7 ns 

sbc-bench results

https://0x0.st/8WAL.bin / ThomasKaiser/sbc-bench#115

Phoronix Test Suite

Results from pi-general-benchmark.sh:

  • pts/encode-mp3: 8.724 sec
  • pts/x264 4K: 11.95 fps
  • pts/x264 1080p: 50.79 fps
  • pts/phpbench: 611861
  • pts/build-linux-kernel (defconfig): 776.221 sec
@geerlingguy geerlingguy changed the title https://radxa.com/products/orion/o6 Radxa Orion O6 (Mini ITX) Dec 20, 2024
@ThomasKaiser

Slightly/highly off-topic... but I thought I'd ask anyway since the question might interest some of your audience too (if you disagree simply mark my comment as off-topic).

To test 5GbE capabilities a TrendNet TEG-S762 just arrived (cheapest unmanaged passively-cooled switch with at least 2 5GbE ports I could find two days ago) and an RTL8126 based M.2 NIC is on its way (10 bucks taxes and shipping included, currently listed for a lot more again).

Since you recommended WisdPi equipment to me, may I ask where you ordered and, if from their store, whether you got something like a commercial invoice? I'm thinking about ordering two RTL8157 based WP-UT5 you already reviewed to test the Orion O6 against Mac clients but need a 'real' invoice :)

@geerlingguy
Owner Author

geerlingguy commented Dec 22, 2024

@ThomasKaiser - Their site uses Shopify, so it spits out the normal Shopify-style invoice (lists billing/customer address, items, shipping, tax). I presume it would also list VAT information in the EU, but I'm not sure on that.

For my most recent order, it took 3 days from departing China to clear customs in the US, then four days to hit the regional mail facility here.

I also just noticed WisdPi is selling the WP-UT5 on Amazon now, too.

@MagicAndre1981

I'm thinking about ordering two RTL8157 based WP-UT5 you already reviewed to test the Orion O6 against Mac clients but need a 'real' invoice :)

I bought an RTL8157 based adapter from Wavlink on German Amazon (your name may indicate you are German, too). On a USB 3.x Gen 1 (5 Gbit/s) port you get 3.5 Gbit/s, so around 400 MB/s, and the full 5 Gbit/s on a USB 3.x Gen 2 (10 Gbit/s) port. But I use Windows devices, so I have no idea about Mac. You may need newer drivers.
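For reference, the bit-rate to byte-rate conversion behind the "around 400 MB/s" figure can be sketched as follows. This assumes decimal units (1 MB = 10^6 bytes) and ignores protocol overhead, so real file-transfer rates will land somewhat below these values; the function name is hypothetical.

```python
def gbit_to_mbyte_per_s(gbit_per_s):
    """Convert a link rate in Gbit/s to MB/s using decimal units."""
    return gbit_per_s * 1e9 / 8 / 1e6


for rate in (3.5, 5.0):
    print(f"{rate} Gbit/s = {gbit_to_mbyte_per_s(rate):.1f} MB/s")
```

By this arithmetic 3.5 Gbit/s is 437.5 MB/s of raw throughput, so "around 400 MB/s" is a reasonable rounded-down estimate of usable payload.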

@ThomasKaiser

Shopify-style invoice (lists billing/customer address, items, shipping, tax—I presume it would also list VAT information in EU (not sure on that).

Since they listed 'included taxes' at checkout I simply gave it a try and will edit this comment later once the items and invoice have arrived.

no idea about Mac. You may need newer drivers.

RTL8157 has pretty good support by stock macOS (confirmed to work out of the box with at least macOS 13-15). I checked the macOS driver package and it only contains two directories named '10.8' and '10.9-10.15' so this is only for ancient OS versions :)

@spx86

spx86 commented Jan 15, 2025

Can you help test if this development board supports Arm CCA?
I also want to buy this development board because I want to do some development work in Realm, so I need to know if the Orion O6 supports Arm CCA. I can hardly find any information about it!

@geerlingguy
Owner Author

geerlingguy commented Jan 27, 2025

There's an Orion O6 Debug Party thread over on the Radxa community forum. I just got a special box in the mail today...

Image

@ThomasKaiser

'Mine' arrived as well:

Image

Let's hope the aforementioned thread will bundle information needed at this point (and it doesn't happen in some Discord channels)

@hrw

hrw commented Jan 27, 2025

Do tests, share data etc.

Planning to order, but winter vacations first :D

@hrw

hrw commented Jan 28, 2025

Can you add BSA ACS to list of tests?

You run it from EFI Shell: bsa.efi for normal run + bsa.efi -v 1 for verbose one (and share both).

This checks for BSA compliance.

And my ArmCpuInfo as it shows better which cpu features are supported than running Linux and checking /proc/cpuinfo ;D

@mkesper

mkesper commented Jan 28, 2025

If this thing gets mainline support it will be rocking!

@ThomasKaiser

@geerlingguy @cnxsoft: latest sbc-bench v0.9.70 should now contain all adaptations/hacks needed for dealing with CD8180.

In particular, it addresses the weird cluster setup and the mystery of why cpu11 is not only faster than its cpu9 and cpu10 siblings from the same cpufreq cluster but also faster than cpu0, with which it shares the real CPU clock speed but not the access to DRAM.

See commit comment for further details: ThomasKaiser/sbc-bench@6b0cd05#commitcomment-151945077
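To see how the kernel itself groups the cores, one option is to read each core's `related_cpus` file from the cpufreq sysfs interface. The sketch below is only an illustration of that idea, not how sbc-bench does it; the function name and the default path argument are assumptions. Note that, per the comment above, the grouping the kernel reports on this SoC does not necessarily match which cores really share a clock.

```python
from pathlib import Path


def cpufreq_clusters(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Group logical CPUs by cpufreq policy using each core's related_cpus file."""
    clusters = set()
    for f in Path(sysfs_cpu_dir).glob("cpu[0-9]*/cpufreq/related_cpus"):
        # The file holds a space-separated list of CPU numbers in the same policy.
        cpus = tuple(sorted(int(n) for n in f.read_text().split()))
        if cpus:
            clusters.add(cpus)
    return sorted(clusters)


# Only meaningful on a Linux system that exposes cpufreq in sysfs.
if Path("/sys/devices/system/cpu").is_dir():
    for cluster in cpufreq_clusters():
        print(cluster)
```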

@geerlingguy
Owner Author

@ThomasKaiser - Quite strange! I wonder if there could be any chip to chip variance here? Seems very weird to have one core behave with a measurable difference (especially wrt memory access). If that's as designed, what was the goal??

@ThomasKaiser

If that's as designed, what was the goal??

Good question. I'm hoping @wtarreau as announced will soon start his detective work about the inner workings :)

(Though $dayjob might be in the way just as in my case right now)

@wtarreau

Yeah I really want to, but indeed I didn't plan on receiving one so I haven't reserved time yet. But probably this week-end. I already downloaded the image to flash on it, and already have an SSD available.

@geerlingguy
Owner Author

I was trying to get the OS loaded on an NVMe drive, but when I flashed the B3 version of the Debian desktop image, there's no .img file included in the Home folder as per the instructions: https://docs.radxa.com/en/orion/o6/getting-started/quick-start#3-write-the-image

I was considering cloning the USB drive, but I have a 1TB USB drive and 256 GB NVMe installed, so that won't work either.

I guess I can just try downloading the B3 img.gz file to the Live install USB drive, then flash from there... but I had assumed it would be included. Maybe a change from revision B1 to B3?

@wtarreau

I had the image on mine, it was the B3. I downloaded the "usb-preinstalled-something-b3" image, uncompressed it to a USB hard drive, the image was 20 GB. Then once booted, the NVME image was in the radxa user account. I just copied it over the /dev/nvme0n1 device, rebooted and it worked first time.

Also, apparently the USB image and the NVME ones are not exactly the same. For example I noticed lscpu missing from the USB one while it was on the NVME one. Thus there's at least one difference.

BTW be careful if you ever test openssl. There is an outdated version 3.0.1 in /usr/share/cix/something that is not correct, and unfortunately that path is before /usr/bin for the radxa user. Same for their libs, so you can quickly end up with tests reporting garbage numbers.
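A quick way to check for this kind of shadowing before benchmarking is to list every copy of a tool along `PATH` in lookup order. This is a minimal sketch; `find_all_on_path` is a hypothetical helper, and it only inspects `PATH`, not the library search path that the comment also warns about.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order."""
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


for index, exe in enumerate(find_all_on_path("openssl")):
    print("used    " if index == 0 else "shadowed", exe)
```

If the first hit is not the distribution's own binary, either call the intended one by absolute path or reorder `PATH` for the benchmark session.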

And you can change the max CPU freq in the BIOS from 2.5 to 2.6, as indicated by Thomas (I did it, since better test it closer to the expected final specs).

@geerlingguy
Owner Author

That's what I downloaded (https://dl.radxa.com/orion/o6/images/debian/orion-o6-debian12-preinstalled-desktop-b3.img.gz), and I flashed that download directly with Etcher, then booted off the USB stick.

The home directory had no extra files in it (confirmed in Terminal and in the file browser, not just looking in Etcher)... so I just downloaded that entire file again on the Live booted image.

Flashed it with Etcher, and it is running now.

However, the fan was not going full blast in the Live CD boot from USB, nor during BIOS operations... but once I booted from the internal NVMe, the fan is now going at 100% continuously (and is quite loud!). Any way to force the fan to use PWM? CPU temp is reporting as 30°C so it's certainly not necessary to be going all out :D

@wtarreau

Yes, I got the same as well. I explained on the forum how I calmed it down, by echoing value 17 to pwm1 (at 16 it eventually stopped spinning). Found it, it's here: https://forum.radxa.com/t/orion-o6-debug-party-invitation/25054/126?u=willy

$ echo 17 > /sys/class/hwmon/hwmon1/pwm1
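The same write can be wrapped in a small helper with a range check. This is a sketch under a few assumptions: the hwmon sysfs `pwm` attribute accepts 0 to 255, the write needs root, and the `hwmon1` index from the command above may differ between boots or kernels, so the path is left as a parameter. The function name is hypothetical.

```python
from pathlib import Path


def set_fan_pwm(value, pwm_path="/sys/class/hwmon/hwmon1/pwm1"):
    """Write a PWM duty value (0..255) to a hwmon pwm attribute."""
    if not 0 <= value <= 255:
        raise ValueError("PWM duty must be in 0..255")
    Path(pwm_path).write_text(f"{value}\n")


# Example (run as root on the board): set_fan_pwm(17)
```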

Do not hesitate to read some contents there, as there's already quite a lot of data shared (CPU cores arrangement in clusters, RAM latency/BW etc).

@geerlingguy
Owner Author

@wtarreau Thanks, and no worries, right after commenting, I finished reading through the debug party thread and found your comment on the PWM there too.

I'll carry on with my testing here :)

@geerlingguy
Owner Author

Power consumption during a top500 HPL run:

Image

@geerlingguy
Owner Author

I posted an initial pre-release blog post about the board, as I think it's important for Radxa to tone down some of the marketing hyperbole around the board, if they don't want the first wave of testers to lambast this board.

It's good hardware, and I'd hate for it to be let down by early software issues. (And for the hundredth time, I wish Radxa would give more time between these 'debug parties' where tons of issues are found, and the public launch, when people who pre-ordered get boards that are pretty good but lacking in some essential features, or they have to do things like pin old software packages since updating would remove functionality they expect to have...).

@cnxsoft

cnxsoft commented Feb 4, 2025

I usually take my time or do a preview (as opposed to a review) for new boards from Radxa.

@geerlingguy
Owner Author

@cnxsoft - The tricky bit for that (in general) is when to actually do a full-on review, though. Do you do it when they start shipping boards to people who pre-ordered, or wait some time after that?

From prior experience, I don't have a lot of confidence we'll be in a much different place in a few weeks, when these boards start shipping :(

I want to be proven wrong, because I really like this board hardware, but as I say in the blog post, I think I need to temper my expectations!

@cnxsoft

cnxsoft commented Feb 6, 2025

My method is quite basic. I review a few other boards first, and once I've done I switch to it. I try not to delay more than one month though.

@geerlingguy
Owner Author

A couple updates over the past few weeks:

Also been monitoring the O6 Radxa forum, looks like one user has an RX 7600 running Skyrim at 4K, and another has Windows 11 up and running.

@ThomasKaiser

another has Windows 11 up and running.

The other one is Mario, known for Windows on R. And thanks for the link to the OpenBSD mailing list, great to see Jared McNeill in action (again, he did a lot of the groundwork to boot any arm64 based distro on recent Rockchip platforms).

@geerlingguy
Owner Author

@Civil - I'm currently running Ubuntu 25.04 with 9.0.0 BIOS, in ACPI mode. It has a newer kernel and later AMDGPU and Nvidia drivers without having to install backports or use Mainline Kernels to update.

$ uname -a
Linux orion-o6 6.14.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Sun Apr  6 14:37:51 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

@geerlingguy
Owner Author

Testing CUDA cores and 'AI' inference on the 3080 Ti...

$ sudo apt install nvidia-cuda-toolkit
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:10:07_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

Then I installed and compiled llama.cpp for arm64, compiling in CUDA support with `cmake -B build -DGGML_CUDA=1`.

Trying out some inference:

cd models && wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
cd ../
./build/bin/llama-cli -m "models/Llama-3.2-3B-Instruct-Q4_K_M.gguf" -p "Why is the blue sky blue?" -e -ngl 100 -t 4

That worked, and quite fast:

llama_perf_sampler_print:    sampling time =       0.33 ms /    14 runs   (    0.02 ms per token, 42042.04 tokens per second)
llama_perf_context_print:        load time =     732.10 ms
llama_perf_context_print: prompt eval time =      96.00 ms /    98 tokens (    0.98 ms per token,  1020.78 tokens per second)
llama_perf_context_print:        eval time =    5120.66 ms /  1072 runs   (    4.78 ms per token,   209.35 tokens per second)
llama_perf_context_print:       total time =   31253.99 ms /  1170 tokens
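The tokens-per-second figures in the timing lines above can be recomputed from the millisecond and token counts on each line, which is handy when collecting several runs. The sketch below is an illustration only: the regular expression is written against the output format shown here (an assumption about other llama.cpp versions), and the helper name is hypothetical. Small differences from the printed rates are expected because the printed milliseconds are rounded.

```python
import re

# Matches e.g. ": prompt eval time =   96.00 ms /   98 tokens"
PERF_LINE = re.compile(r":\s+([a-z ]+?) time =\s*([\d.]+) ms /\s*(\d+) (?:runs|tokens)")


def tokens_per_second(log_text):
    """Map each timed phase to tokens/s, recomputed from its ms and token count."""
    rates = {}
    for line in log_text.splitlines():
        match = PERF_LINE.search(line)
        if match:
            phase = match.group(1)
            ms = float(match.group(2))
            count = int(match.group(3))
            if ms > 0:
                rates[phase] = count / (ms / 1000.0)
    return rates


# Two of the lines from the run above, copied verbatim.
LOG = """\
llama_perf_context_print:        load time =     732.10 ms
llama_perf_context_print: prompt eval time =      96.00 ms /    98 tokens (    0.98 ms per token,  1020.78 tokens per second)
llama_perf_context_print:        eval time =    5120.66 ms /  1072 runs   (    4.78 ms per token,   209.35 tokens per second)
"""

for phase, rate in tokens_per_second(LOG).items():
    print(f"{phase}: {rate:.2f} tokens/s")
```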

Power consumption spiked from idle at 30W to full tilt 438W.

More benchmarking information will be in this issue: geerlingguy/ollama-benchmark#13

@HeyMeco

HeyMeco commented May 6, 2025

Power consumption spiked from idle at 30W to full tilt 438W.

The board power consumption becomes not even worth mentioning when the PCIe slot suddenly ramps up :D
Edge deployments are going to be very interesting.
Maybe one day even with Intel Arc GPUs.

@ThomasKaiser

ThomasKaiser commented May 6, 2025

One of the very few aspects of this board I still think could make it useful long-term is SystemReady support.

Sure, I just wanted to point out that the UEFI version you're running is solely meant to get this certification (by doing weird/nasty stuff like disabling the little cores for every OS to get silly Windows happy/booting) and there's another UEFI version on its way that is meant for end users then being able to also run any aarch64 OS version (maybe even Windows if Cix 'fixes' the CPU ordering).

Aside from disabling the A520 cores, your testing also revealed that the SR UEFI version has other performance impacts.

@geerlingguy
Owner Author

geerlingguy commented May 6, 2025

@ThomasKaiser - True... hopefully both will be fully supported though. If Nvidia ever decided to actually release the Windows on Arm drivers for their cards that I know exist... this board would immediately be the best value for Windows on Arm development (especially for games), and Qualcomm would be quickly in the dust for desktop use cases (IMHO).

(Ampere is still a great option but those boards are kinda hulking and the power consumption is a bit higher too. But they have their uses for development.)

@wtarreau

wtarreau commented May 6, 2025

Ampere boards are still quite expensive (count roughly $1000 for the naked board, and add at least as much for the CPU module). They're nice for enterprise usage but a bit too much for home. Also their cores are numerous but not that fast individually, so they only make sense for highly parallel workloads.

@geerlingguy
Owner Author

@wtarreau - The Neoverse N1 cores in the Altra are about on par with the A720s, at least for most of the applications in my benchmarking. You just get a loooot more of them, and consistent performance between cores (instead of the odd hodgepodge of 'big' and 'medium' cores).

I think of the Ampere chips as 'Threadripper' scale chips, versus the Cix being a low-end desktop or mobile chip. It is certainly a lot slower than Apple M-series if you just want a great desktop experience and a native Arm CPU. But if you want a good desktop experience and IO/Linux capabilities, this board is what I think will hit that sweet spot.

The Amperes get some use in home environments too (just like a Threadripper), not everyone's needs are the same :D

@bexcran

bexcran commented May 6, 2025

Ampere boards are still quite expensive (count roughly $1000 for the naked board, and add at least as much for the CPU module). They're nice for enterprise usage but a bit too much for home.

You can get a bundle with the CPU and motherboard for around $1500.
https://www.newegg.com/asrock-rack-altrad8ud-1l2t-q64-22-ampere-altra-max-ampere-altra-processors/p/13-140-134?Item=13-140-134&cm_sp=product-_-from-price-options

@wtarreau

wtarreau commented May 6, 2025

@wtarreau - The Neoverse N1 cores in the Altra are about on par with the A720s, at least for most of the applications in my benchmarking.

Oh no, they're quite far from this, roughly half at same frequency. The N1 is exactly the same as an A76 (e.g. RK3588 or RPi5). They're arranged by pairs on L2 caches.

You just get a loooot more of them, and consistent performance between cores (instead of the odd hodgepodge of 'big' and 'medium' cores).

Yes for this I agree. I'm using daily a Q80-26 (80 cores at 2.6 GHz), "make -j" is a joy on this :-) Also their PCIe controller works very well, and the memory controller as well (I get something like 90% of theoretical BW, never seen this anywhere else, even on x86).

I think of the Ampere chips as 'Threadripper' scale chips, versus the Cix being a low-end desktop or mobile chip. It is certainly a lot slower than Apple M-series if you just want a great desktop experience and a native Arm CPU. But if you want a good desktop experience and IO/Linux capabilities, this board is what I think will hit that sweet spot.

I'm precisely not convinced at all for a desktop due to each core not being that fast. Single-threaded workloads are slow. Typically the link phase of a build (particularly if you're using LTO) can be long. I guess that certain desktop applications can be slow (e.g. JS in a browser). Note that by "slow" I don't mean "ultra slow", just not what you'd get on a low-end x86. But I agree that for anything multi-threaded/multi-process it's a bomb.

The Amperes get some use in home environments too (just like a Threadripper), not everyone's needs are the same :D

I really thought about it and figured that aside from building kernels or bisecting haproxy I wouldn't really benefit from it. The Ampere One cores are said (by marketing) to be way more powerful. I have never tested them though, so I have no opinion on these.

@wtarreau

wtarreau commented May 6, 2025

You can get a bundle with the CPU and motherboard for around $1500.

Yes but you just don't want to pick a low frequency CPU. Sadly they mix high core counts with high frequencies. There are 32 cores at 1.7 and 128 cores at 3.0/3.3. I think many of us would love to get a 16/32 at 3.3/3.5 instead of having to sacrifice the frequency that way to limit the TDP. In fact contrary to the x86 world you cannot choose a balance of cores count and frequency for a given TDP. On x86 you can find super high frequencies in 4/8 cores and super low ones in 64+ cores. That's what we're missing there.

@geerlingguy
Owner Author

Oh no, they're quite far from this, roughly half at same frequency. The N1 is exactly the same as an A76 (e.g. RK3588 or RPi5). They're arranged by pairs on L2 caches.

What I mean is when I'm doing my benchmarking of a 2.8 or 3.0 GHz Ampere Altra, and comparing that in end user applications and benchmarks to the 2.4 GHz (sometimes 2.5 GHz) A720s on this system, I see similar performance. Sometimes the Ampere wins, sometimes the Cix... it is dependent on how much it relies on cache, RAM access, etc.

I don't spend as much time in the weeds of the theoretical performance, I just test how the cores perform with my workloads :)

And there, this board is great... but does not live up to the potential I initially envisioned based on specs alone!

@Civil

Civil commented May 6, 2025

Ampere boards are still quite expensive (count roughly $1000 for the naked board, and add at least as much for the CPU module). They're nice for enterprise usage but a bit too much for home. Also their cores are numerous but not that fast individually, so they only make sense for highly parallel workloads.

You can save a bit. Altra m128-30 is usually sold for 6500-650$ used on ebay. Add ASRock altrad8ud there from Newegg (just mb is 829$) and you can get mb and CPU for 1.5k$

Cheap CPUs come and go, though; roughly every other month you can get one.

@geerlingguy
Owner Author

(totally off topic now, but) I really wish Ampere would sell CPUs DTC, or at least just have some available like that on NewEgg. I hate bundles lol.

@wtarreau

wtarreau commented May 6, 2025

Also don't forget the RAM cost. The power of Altra comes from the many channels, you really need to fill them. On the Adlink model only 6 channels are populated, but a few other boards have the 8.
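To put the channel count in numbers, theoretical peak DRAM bandwidth scales linearly with populated channels. The sketch below assumes 64-bit (8-byte) channels and DDR4-3200; both are assumptions for illustration rather than figures from this thread, and the function name is hypothetical.

```python
def peak_dram_bandwidth_gbs(channels, mega_transfers_per_s=3200, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (decimal units)."""
    return channels * mega_transfers_per_s * bus_bytes / 1000


for channels in (6, 8):
    print(f"{channels} channels: {peak_dram_bandwidth_gbs(channels):.1f} GB/s")
```

Under those assumptions, populating 6 of 8 channels caps the theoretical peak at three quarters of the full configuration.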

@bexcran

bexcran commented May 6, 2025

(totally off topic now, but) I really wish Ampere would sell CPUs DTC, or at least just have some available like that on NewEgg. I hate bundles lol.

I haven't tried buying from them, but apparently they're available on https://anafrashop.com/cpu-2?filtrManufacturer[]=2623 - though it looks like they're all on demand and not kept in stock.

@wtarreau

wtarreau commented May 6, 2025

Sorry, what do you call "DTC" ? I couldn't figure what that means here.

@bexcran

bexcran commented May 6, 2025

Sorry, what do you call "DTC" ? I couldn't figure what that means here.

Direct To Consumer.

@Civil

Civil commented May 6, 2025

Also don't forget the RAM cost. The power of Altra comes from the many channels, you really need to fill them. On the Adlink model only 6 channels are populated, but a few other boards have the 8.

DDR4 RDIMMs are rather cheap nowadays if you are going for 16 or 32 GB sticks. And you might save a bit if you go for DDR4-2933. It is slightly slower, but Ampere doesn't seem to take advantage of the extra speed.

P.S. Right now, if you search for M128-30 on eBay, you'll see a lot of used, working ones shipping from the US for 440$.

@wtarreau

wtarreau commented May 6, 2025

Thanks Rebecca. I saw these CPUs for sale already but they were way too expensive for high frequencies, precisely due to the problem I mentioned above (i.e. lots of cores).

@Shivansps

Oh no, they're quite far from this, roughly half at same frequency. The N1 is exactly the same as an A76 (e.g. RK3588 or RPi5). They're arranged by pairs on L2 caches.

What I mean is when I'm doing my benchmarking of a 2.8 or 3.0 GHz Ampere Altra, and comparing that in end user applications and benchmarks to the 2.4 GHz (sometimes 2.5 GHz) A720s on this system, I see similar performance. Sometimes the Ampere wins, sometimes the Cix... it is dependent on how much it relies on cache, RAM access, etc.

I don't spend as much time in the weeds of the theoretical performance, I just test how the cores perform with my workloads :)

And there, this board is great... but does not live up to the potential I initially envisioned based on specs alone!

Just wondering, did you try compiling the benchmark tools while specifying Armv9.2 via -march? It crossed my mind that the reason some benchmarks are below expectations is that they are running on generic Armv8 instructions. But I have not had time to test this myself.

I'm not sure I would even trust -march=native here, considering some Linux distros and some tools misidentify it as Armv8.
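
One way to sanity-check this before recompiling anything is to look at what the kernel itself reports, independent of any benchmark tool. The following is a minimal Python sketch, assuming the usual arm64 `Features` line in `/proc/cpuinfo`; the helper names and the short list of flags are my own picks for illustration and are not taken from any tool mentioned in this thread.

```python
from pathlib import Path

# Flags picked for illustration only (an assumption, not an exhaustive list).
# The keys are hwcap names as the arm64 kernel prints them in /proc/cpuinfo.
FEATURES_OF_INTEREST = {
    "asimd": "Advanced SIMD (NEON)",
    "atomics": "LSE atomics",
    "sve": "SVE",
    "sve2": "SVE2",
    "i8mm": "Int8 matrix multiply",
    "bf16": "BFloat16",
}


def parse_features(cpuinfo_text):
    """Return the flags from the first 'Features' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    # No 'Features' line found (for example, not an arm64 Linux system).
    return set()


def report(flags):
    """Map each flag of interest to whether the kernel reported it."""
    return {name: name in flags for name in FEATURES_OF_INTEREST}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    for name, present in report(parse_features(text)).items():
        status = "yes" if present else "no"
        print(f"{name:8s} {status:4s} {FEATURES_OF_INTEREST[name]}")
```

If `sve2` shows up here but a tool still reports plain Armv8, the misidentification is in that tool rather than in the kernel. This only shows what the hardware and kernel expose; it says nothing about which instructions a given binary was actually compiled to use.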

@geerlingguy
Owner Author

I will have to try recompiling the top500 benchmark at least. I had hoped at least Ubuntu 25.04 being so new with 6.14 would identify Armv9 appropriately.

@RadxaYuntian

For the 9.0.0 BIOS that would be 'any arm64 Linux ISO', it seems.

It was only certified for a few distros (excluding Debian though). Others might have additional issues (for example the lack of RTL8126 driver).

One of the very few aspects of this board I still think could make it useful long-term is SystemReady support. If functionality is gated behind custom DTBs and Radxa-provided ISOs... that makes this board just a glorified SBC that'll only be supported by Armbian in 5 years time ;)

By definition, SystemReady cannot be gated behind custom DTB and our custom ISOs. The test uses unmodified upstream ISOs, just like what an end user is supposed to use.

We direly want to support SR + ACPI. Nobody wants to keep updating software forever. A standards-compliant machine might finally make this a done deal. But until we have the source code, there is not much we can do.

Cix is moving very fast on this though. We should have their initial ACPI BIOS code this month.

@Civil

Civil commented May 7, 2025

I saw these CPUs for sale already but they were way too expensive for high frequencies

Ampere is not too popular, so on the used market they tend to be available until the seller drops the price to ~400-600$ for M128-30. So if you don't mind used hw, just grab a CPU from eBay, and a brand new MB from Newegg. Cooler is a bit of a problem, but Noctua used to sell them if you ask their business support nicely, and Arctic has a mounting kit for their 4U-M cooler available for sale (that would be a cheaper option that you can grab from Amazon).

@wtarreau

wtarreau commented May 7, 2025

The thing is, in my case, since we already have a nice Q80-26, the gains to expect from upgrading to an M128-30 are not high enough to justify such a change. Plus the TDP of that beast is much higher (250W vs 150W) and ours just has the stock fan on it. That's why we never upgraded. But I agree with you on the used CPU approach, we did that already for many-core EPYCs and Xeons ;-)

@volyrique

@geerlingguy

I will have to try recompiling the top500 benchmark at least. I had hoped at least Ubuntu 25.04 being so new with 6.14 would identify Armv9 appropriately.

BLIS already detects the presence of SVE support just fine, but as I already told you elsewhere that is not necessarily a win in the HPL benchmark (given that the hardware vector size is still 128 bits, just the same as ASIMD/NEON, and that the workload is a boring FP64 one). I am somewhat skeptical that any other Armv9.2-A functionality would help much in that case (LSE/atomics could, but HPL is a multi-process instead of multi-threaded application, so I doubt it), unless @Shivansps has something specific in mind.
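
To put a number on the vector-size point, here is a small Python sketch of the lane arithmetic. The 128-bit widths are the ones stated above; the 256-bit row is a purely hypothetical wider SVE implementation added for contrast, and `lanes` is just an illustrative helper.

```python
def lanes(vector_bits, element_bits):
    """Number of elements of a given size that fit in one vector register."""
    if vector_bits % element_bits:
        raise ValueError("vector width must be a multiple of the element size")
    return vector_bits // element_bits


FP64_BITS = 64

# 128-bit ASIMD/NEON and 128-bit SVE hold the same number of FP64 values.
print("NEON 128-bit:", lanes(128, FP64_BITS), "FP64 lanes")
print("SVE  128-bit:", lanes(128, FP64_BITS), "FP64 lanes")

# Hypothetical wider SVE implementation, shown only for comparison.
print("SVE  256-bit:", lanes(256, FP64_BITS), "FP64 lanes")
```

With two FP64 lanes per instruction either way, switching the HPL kernels from NEON to SVE does not by itself increase the work done per vector instruction on this hardware.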

@geerlingguy
Owner Author

Today I got an email from ARACE with the following, regarding my original order for a 32GB model with the AI Kit case:

Due to high customs duties, our DHL and Fedex service providers have suspended shipping services to the US.

Currently, we can only offer shipping to the U.S. through the 4PX logistics channel, but prepayment of taxes is required.

If you can still accept the high tariffs, you can choose 4PX to place a new order.

We will refund the payment and it will take 3~7 business days to return the refund to your account.

I went over to the site to try a new order...

Image

From $299 for the original pre-order to $311.90... that I can stomach. From $311.90 to $1500.90? Not so much :(

@geerlingguy
Owner Author

Posted a video and blog post this morning:

Still more testing to do, of course :P

@wtarreau

I heard today that tariffs could be almost dropped again for 90 days, so maybe you should monitor the price variations over the coming days.

@geerlingguy
Owner Author

@wtarreau heh, tariff situation changes almost minute-by-minute now. Someone could probably make money being an arbitrator of when to actually 'cross the border' with a shipment of essential goods!

@wtarreau

Yeah I thought the same, but I just wanted to let you know that it's still worth doing a daily ping there ;-)
