Surprised by the number of articles all around the internet claiming that btrfs corrupts data, hangs, or adds overhead, I’ve decided to describe my really pleasant desktop experience with it.
My use case is quite simple. I have 3 drives that hold data on my computer: video and audio files, all kinds of Linux distro images, torrents, and VMs that I use for testing. I consider all of this data expendable, meaning I would not be sorry if any of it got deleted or corrupted. They are just test machines and multimedia that I can download or recreate if I need them.
However, after seeing the performance and management benefits of using btrfs to store this stuff, and testing it all thoroughly, I decided to put my root filesystem, as well as my /home directory, on it. Backups are still stored on a completely separate drive, but even after 9 months of really abusing btrfs for all these purposes, I have not had any kind of data corruption, performance degradation or overhead. At all. Not in disk space, not in IOPS.
So, here is a little scenario that might help you get started with btrfs. It’s more of a showcase for desktop use, than a performance benchmark of any kind.
First of all, let’s give credit to what is IMHO the best entry-level howto I’ve found. You might want to check it out to get familiar with btrfs usage: BTRFS Fun
We’ll be copying from /run/shm, so that we get clean performance data (reading the source file from RAM keeps source-disk latency out of the measurements). In Ubuntu Oneiric and later, you can modify its size like this:
mount -o remount,size=2048M /run/shm
Ignoring the OS drive, here is a list of devices used in this article. These are 3 really slow 2.5" laptop drives that I’m using because they are small and quiet. As we can see, there are already some btrfs partitions on them, but we’ll focus on the 5th partition of each drive: the only logical partition, currently formatted as ext4.
parted -l
Model: ATA WDC WD2500BMVS-1 (scsi)
Disk /dev/sdb: 250GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type      File system  Flags
 1      32.3kB  12.8GB  12.8GB  primary   btrfs
 2      12.8GB  25.6GB  12.8GB  primary   btrfs
 3      25.6GB  178GB   153GB   primary   btrfs
 4      179GB   250GB   70.7GB  extended
 5      179GB   197GB   17.2GB  logical   ext4

Model: ATA ST9250410AS (scsi)
Disk /dev/sdc: 250GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type      File system  Flags
 1      32.3kB  12.8GB  12.8GB  primary   btrfs
 2      12.8GB  25.6GB  12.8GB  primary   btrfs
 3      25.6GB  178GB   153GB   primary   btrfs
 4      179GB   250GB   70.8GB  extended
 5      179GB   196GB   17.2GB  logical   ext4

Model: ATA MAXTOR STM316081 (scsi)
Disk /dev/sdd: 160GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type      File system  Flags
 1      32.3kB  10.7GB  10.7GB  primary   btrfs
 2      10.7GB  21.5GB  10.7GB  primary   btrfs
 3      21.5GB  88.2GB  66.7GB  primary   btrfs
 4      89.2GB  160GB   70.8GB  extended
 5      89.2GB  106GB   17.2GB  logical   ext4
mkdir /mnt/sdb5 /mnt/sdc5 /mnt/sdd5 /mnt/raid
ls -lah /mnt/
total 0
drwxr-xr-x 1 root root  32 Mar 16 09:08 .
drwxr-xr-x 1 root root 302 Mar  6 17:59 ..
drwxr-xr-x 1 root root   0 Mar 16 09:08 raid
drwxr-xr-x 1 root root   0 Mar 16 09:08 sdb5
drwxr-xr-x 1 root root   0 Mar 16 09:08 sdc5
drwxr-xr-x 1 root root   0 Mar 16 09:08 sdd5
Ext4 single device
Let’s do some benchmarks using ext4 on just one device, to get a feel for the performance we can expect. All 3 drives are more or less the same (actually not, but it does not really matter).
mkfs.ext4 /dev/sdb5
tune2fs -o journal_data_writeback /dev/sdb5
mount -t ext4 -o defaults,noatime,data=writeback /dev/sdb5 /mnt/sdb5
mount | grep sdb5
/dev/sdb5 on /mnt/sdb5 type ext4 (rw,noatime,data=writeback)
df -h | grep sdb5
/dev/sdb5        16G  369M   15G   3% /mnt/sdb5
And here is our video file to be copied:
ls -lah /run/shm/video.m4v
-rw-r--r-- 1 root root 1.8G Mar 16 09:12 /run/shm/video.m4v
In these examples, I’ll use “pv” instead of “cp”, so that we can see the copy progress. Note that “cp” is some 10% faster in my case, but if we use the same tool all the time, we should get consistent results.
time pv /run/shm/video.m4v > /mnt/sdb5/video.m4v
1.73GB 0:00:43 [40.4MB/s] [==================>] 100%

real    0m45.696s
user    0m0.036s
sys     0m2.728s
OK, so that’s our starting value, using only one drive formatted as ext4. That was my typical setup before I moved to multiple-device setups like the btrfs one below. I’ve run this test several times, and it always produces those numbers.
Ext4 on software RAID0
Now, let’s see what this test looks like with my old setup: a big ext4 partition on software raid0 consisting of 3 partitions, one on each drive (sdb5, sdc5 and sdd5). Those are the same partitions that we’ll use in our btrfs test. Keep in mind that software (kernel) raid0 and raid1 perform better than any affordable raid controller you can buy. Performance benefits from a dedicated controller show up only with a real hardware RAID controller (usually $1k or more), and even then the difference is minimal. Hardware controllers do, however, have better rebuild rates.
mdadm --create /dev/md0 --chunk=4 --level=0 --raid-devices=3 /dev/sdb5 /dev/sdc5 /dev/sdd5
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 sdd5 sdc5 sdb5
      50336508 blocks super 1.2 4k chunks
tune2fs -o journal_data_writeback /dev/md0p1
mount -t ext4 -o defaults,noatime,data=writeback /dev/md0p1 /mnt/raid
df -h | grep raid
/dev/md0p1       48G  853M   45G   2% /mnt/raid
time pv /run/shm/video.m4v > /mnt/raid/video.m4v
1.73GB 0:00:16 [ 107MB/s] [==================>] 100%

real    0m16.789s
user    0m0.000s
sys     0m2.860s
OK, so as expected, writing to 3 striped (raid0) devices is much faster than writing to just one. And it’s a nearly linear performance gain, so everything is working as expected.
btrfs in raid0 setup
Now for the fun part…
WARNING: do not try to specify a nodesize, leafsize or sectorsize larger than 4KB. Although mkfs will create the filesystem and it will mount, the kernel will lock up when you try to do anything else with it (including unmounting it or shutting the machine down). This is just a wild guess, but it might have something to do with the kernel page size. Not sure if it’s a feature; for what it’s worth, I’m running Linux 3.2.0-18-generic #29-Ubuntu SMP Fri Mar 9 21:36:08 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
mkfs.btrfs -m raid0 -d raid0 -n 4096 -l 4096 -s 4096 -L testing /dev/sdb5 /dev/sdc5 /dev/sdd5
adding device /dev/sdc5 id 2
adding device /dev/sdd5 id 3
fs created label testing on /dev/sdb5
        nodesize 4096 leafsize 4096 sectorsize 4096 size 48.01GB
Btrfs Btrfs v0.19
btrfs filesystem show testing
Label: 'testing'  uuid: f765325b-fa2e-4df3-8242-8c101a914f5f
        Total devices 3 FS bytes used 28.00KB
        devid    3 size 16.00GB used 2.00GB path /dev/sdd5
        devid    1 size 16.00GB used 2.02GB path /dev/sdb5
        devid    2 size 16.00GB used 2.00GB path /dev/sdc5
mount -o noatime,compress=lzo,space_cache /dev/sdb5 /mnt/raid
df -h | grep raid
/dev/sdb5        49G   28K   45G   1% /mnt/raid
btrfs filesystem df /mnt/raid
Data, RAID0: total=3.00GB, used=0.00
Data: total=8.00MB, used=0.00
System, RAID0: total=15.94MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID0: total=3.00GB, used=24.00KB
Metadata: total=8.00MB, used=0.00
time pv /run/shm/video.m4v > /mnt/raid/video.m4v
1.73GB 0:00:14 [ 125MB/s] [==================>] 100%

real    0m14.571s
user    0m0.000s
sys     0m2.200s
So, results are on par with kernel raid0, and write time drops from 45.7s (single device) to 14.5s. That’s quite linear, and that’s what I call cool.
I’ve repeated these tests over 9 months, with many small files, large files, concurrent processes and everything else I could think of. Results are always the same, or even much better with btrfs. I have not noticed any performance drops, no matter what kind of file it is.
My point is that btrfs raid0 is no slower than software raid0, which is no slower than hardware raid0, so we pay no performance penalty for using it. So what do we gain, and why do I prefer it to any other solution?
Simplicity. I have 3 btrfs filesystems at this time. Root, home and media (media actually has 2 subvolumes, vm and Videos with different mount options).
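As an illustration of that layout (the device path, mount points and subvolume names here are hypothetical, not my actual setup), subvolumes with their own mount points might look like this:

```shell
# Create two subvolumes inside an already-mounted btrfs filesystem
btrfs subvolume create /mnt/media/vm
btrfs subvolume create /mnt/media/Videos

# Mount each subvolume at its own mount point. Note that the kernel
# honours only some mount options per subvolume; options like
# compress may effectively apply to the whole filesystem.
mount -o noatime,subvol=vm /dev/sdb3 /srv/vm
mount -o noatime,compress=lzo,subvol=Videos /dev/sdb3 /srv/Videos
```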
And just for comparison, simplicity really comes into play with more complicated setups, be it several arrays or more complex levels such as raid10.
For example, to add a device to software raid0 setup, we have to:
- create that partition or device
- add that device to raid0
- extend ext4 volume to utilize the new size
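Sketched as commands (device names are examples, and note that reshaping a raid0 array by adding a member needs a reasonably recent mdadm and kernel):

```shell
# Add the new partition to the running array and reshape to 4 members
mdadm /dev/md0 --add /dev/sde5
mdadm --grow /dev/md0 --raid-devices=4

# Once the reshape finishes, grow ext4 to fill the larger array
resize2fs /dev/md0p1
```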
With btrfs, we only have to:
- create that partition or device
- add that device to raid0
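The btrfs side boils down to a single command against the mounted filesystem (the new device name here is an example); the extra capacity is available immediately:

```shell
# Add the new partition to the mounted btrfs raid0 filesystem
btrfs device add /dev/sde5 /mnt/raid
```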
With raid10, it gets a bit more complicated. Typical raid10 setup that many people use on linux is:
- raid1 volume 1 (2 redundant devices)
- raid1 volume 2 (2 redundant devices)
- and then create an LVM volume spanning those 2 raid arrays. This gives us speed, redundancy and easy recovery.
So, to add another partition to that array, we would:
- create the new partitions or devices
- add those devices to raid1
- extend the volume group to include the new raid1 array
- extend LV to utilize the new PV space
- extend ext4 partition to utilize the new LV size
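Spelled out with hypothetical device, VG and LV names, those steps look roughly like this:

```shell
# Build a new raid1 pair from two fresh partitions
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sde5 /dev/sdf5

# Turn it into a PV, then grow the volume group and logical volume
pvcreate /dev/md2
vgextend vg_data /dev/md2
lvextend -l +100%FREE /dev/vg_data/lv_media

# Finally grow the ext4 filesystem itself
resize2fs /dev/vg_data/lv_media
```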
With btrfs, we only need 2 or 3 steps:
- create that partition or device
- add that device to raid10
- rebalance the array (or skip this step: existing data stays where it is, but new writes will start using the new device right away)
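With the btrfs tools of this era (v0.19), the whole procedure looks like this (device names are examples):

```shell
# Add the new partitions to the mounted raid10 filesystem
btrfs device add /dev/sde5 /mnt/raid
btrfs device add /dev/sdf5 /mnt/raid

# Optionally restripe existing data across all devices
btrfs filesystem balance /mnt/raid
```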
And this is just when we consider RAID setups. btrfs comes with much, much more, and eliminates the need for RAID/LVM completely. For example, it’s got built-in subvolume and snapshot management. And it’s CoW (copy-on-write), so snapshots perform really well. But that’s not covered in this blog entry.
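Just for a taste of what that looks like (paths here are illustrative):

```shell
# A snapshot is itself a subvolume; it is created instantly and
# shares all unmodified data with its source thanks to CoW
btrfs subvolume snapshot /mnt/media/vm /mnt/media/vm-snap

# List subvolumes and snapshots on the filesystem
btrfs subvolume list /mnt/media
```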
Also, notice that I’ve used the “compress=lzo,space_cache” mount options to mount btrfs? Yup, it comes with lzo and zlib compression, and space_cache is just there to make allocation faster. Note that “compress” does not mean “slow”. Quite the contrary: it’s got a smart little algorithm that applies compression only to files that benefit from it. And if you use this mount option on filesystems full of small files (such as root and home), you gain a LOT of performance. Also, I’ve found that using “lzo” instead of “zlib” compression makes things less CPU-intensive, at least for my use case. More about that can be found in Phoronix’s article.
BTW, if you prefer a btrfs crash-course video, here’s one you might find entertaining: