Notes on RAIDZ expansion in TrueNAS

Posted on Jan. 24, 2025 by Ben Dickson.


I have an Asustor Flashstor 12.

Initially I had 4x 4TB drives in a RAIDZ-1. I copied about 9.4TB of data to them.

I wanted to then expand the pool with 2 additional 4TB drives.

Expanding the pool is straightforward - in TrueNAS Scale, just select the RAIDZ1 vdev, click Extend, and pick the new drive.

You can only add one drive at a time, and it takes a while. Specifically,

expand: expanded raidz1-0 copied 9.36T in 04:07:11

This works out at roughly 700MB/s - almost certainly bottlenecked by the 2GHz Intel Celeron N5105 CPU in the device. The pool is encrypted, which I assume also impacts the speed (though I haven't verified this).
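Sanity-checking that throughput figure (assuming the 9.36T reported by ZFS is TiB, and that MB/s means decimal megabytes):

```python
# Rough throughput check for the expansion: 9.36 TiB copied in 04:07:11.
copied_bytes = 9.36 * 1024**4          # 9.36 TiB in bytes
elapsed_s = 4 * 3600 + 7 * 60 + 11     # 04:07:11 in seconds
throughput_mb_s = copied_bytes / elapsed_s / 1e6
print(f"{throughput_mb_s:.0f} MB/s")   # roughly 700 MB/s
```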

After this, a scrub task ran automatically (I'm not sure whether TrueNAS or ZFS triggers it), which I waited for before expanding with the second drive. I'm not sure it's necessary to wait for the scrub to complete, but it seemed sensible. The scrub took about 1h 45m.

Repeating this process for the second drive took more or less the same amount of time.

Rebalancing

An interesting note: because existing data isn't rewritten during expansion, it keeps its old parity ratio, so you lose some storage capacity until that data is rewritten.

Specifically, I had 4x 4TB drives with about 9TB of data. Treating parity as one drive's share of each stripe, that data carries parity overhead of roughly 9TB/4 ≈ 2.25TB. On the new 6-drive pool, the old data still carries that same ~2.25TB - whereas if the pool had been 6 drives from the start, the overhead would only be about 9TB/6 = 1.5TB.

Meaning that rebalancing the data on the pool with a script like zfs-inplace-rebalancing should free up about 0.75TB.
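The back-of-envelope version of that, treating parity as simply one drive's share of each stripe (an approximation - ZFS's real allocation is more complicated):

```python
# Rough parity-overhead estimate: parity is ~1/width of the data footprint.
# This is a back-of-envelope approximation, not ZFS's actual allocation math.
def parity_overhead_tb(data_tb: float, width: int, parity: int = 1) -> float:
    """Approximate TB of parity overhead for data on a RAIDZ vdev."""
    return data_tb * parity / width

before = parity_overhead_tb(9, width=4)   # 4-wide RAIDZ1 -> 2.25 TB
after = parity_overhead_tb(9, width=6)    # 6-wide RAIDZ1 -> 1.5 TB
print(f"potential saving: {before - after:.2f} TB")  # ~0.75 TB
```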

These calculations are very likely somewhere between "wrong" and "massively oversimplified", but let's see.
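For reference, the core idea behind an in-place rebalance is just: copy each file to a temporary name (which rewrites its blocks under the new vdev geometry), verify it, then swap it over the original. A minimal sketch of that idea - not the actual zfs-inplace-rebalancing script, which also handles checksums, attributes, and restartability:

```python
import filecmp
import os
import shutil

def rebalance_file(path: str) -> None:
    """Copy a file and swap the copy over the original so its blocks get rewritten."""
    tmp = path + ".balance"
    shutil.copy2(path, tmp)           # rewrites the data under the new layout
    if not filecmp.cmp(path, tmp, shallow=False):
        os.remove(tmp)
        raise IOError(f"copy of {path} does not match the original")
    os.replace(tmp, path)             # atomic swap on the same filesystem
```

One caveat: rewriting a file that's referenced by a snapshot doesn't free any space, since the old blocks stay pinned by the snapshot.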

Checking the calculations

Running the script on a small 138G dataset didn't show any measurable difference - 138G USED before and after - not too surprising.

Running the script on a larger dataset, the USED column of zfs list went from 1.57T to 1.43T - a difference of 0.14T. The rebalance took 2h15m to run (about 175MB/s, with the script's checksum feature disabled).

Running again on a still larger dataset, 4.77T became 4.34T, taking 8h05m - a reduction of 0.43T.

So this totals about 570GB freed. There are a few other datasets I didn't bother running the script on, but the numbers aren't too far off the earlier 750GB guess - and it's worth doing to free up essentially an eighth of a drive's worth of space.
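Adding up the measured savings (dataset labels here are just mine, not the real dataset names):

```python
# Measured reductions in USED after rebalancing each dataset, in TB.
reductions = {"small": 0.0, "medium": 0.14, "large": 0.43}
total_tb = sum(reductions.values())
print(f"total freed: {total_tb:.2f} TB")  # 0.57 TB, vs the 0.75 TB guess
```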

Summary

These are a very long way from proper benchmarks - they're just notes taken while setting up a new NAS and experimenting with RAIDZ expansion before trusting it. The whole reason for buying the 12-bay Flashstor was to be able to expand the pool over time.

Unsurprisingly, the ZFS expansion works about like it should. The biggest thing to note is that it's a somewhat slow process. Expanding a vdev with 9TB of data on it took, roughly:

- ~4 hours of expansion per drive
- ~1.75 hours of scrubbing after each expansion
- ~10.5 hours of (optional) rebalancing

So on the order of 20 hours in total - and that's with NVMe drives. I imagine things would be markedly slower with spinning drives, but probably quite a bit faster with a faster CPU or without encryption.
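Reconstructing that rough total from the durations quoted earlier (the exact split is my reading of the numbers, not something I timed end to end):

```python
# Rough total time to go from 4 to 6 drives, from the durations quoted above.
expand_h = 4 + 7/60 + 11/3600   # 04:07:11 expansion per drive
scrub_h = 1.75                  # ~1h45m scrub after each expansion
rebalance_h = 2.25 + 8 + 5/60   # 2h15m + 8h05m of rebalancing
total_h = 2 * (expand_h + scrub_h) + rebalance_h
print(f"~{total_h:.0f} hours")  # on the order of 20 hours
```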

The speed isn't really an issue - it's probably about the same as replacing a failed drive - plus the data stays accessible while the expansion runs, and the rebalancing is optional and can be done at a later point.