Hello,
TL;DR are there particular workloads that suffer from having to access a RAID0 array?
I've currently got my /home partition in a BTRFS RAID0 array with two 1 TB mechanical drives, and I'm considering getting SSDs for /home instead. I could get one 2 TB SSD and be happy with it, but I could instead get two 1 TB SSDs and make a RAID0 array again. The latter option would of course get me better overall throughput, but I'm wondering whether there are workloads that might suffer from being run from a RAID0 array vs. just running on a "bare" disk.
On Thu, 2022-02-10 at 18:11 +0200, Matti Pulkkinen wrote:
Hello,
TL;DR are there particular workloads that suffer from having to access a RAID0 array?
I've currently got my /home partition in a BTRFS RAID0 array with two 1 TB mechanical drives, and I'm considering getting SSDs for /home instead. I could get one 2 TB SSD and be happy with it, but I could instead get two 1 TB SSDs and make a RAID0 array again. The latter option would of course get me better overall throughput, but I'm wondering whether there are workloads that might suffer from being run from a RAID0 array vs. just running on a "bare" disk.
What I did was convert my 2x1TB HDDs (rescued from a dead NAS) to a BTRFS RAID1 array and attach them via a USB3 port for use as backup (using BorgBackup, which dedupes and compresses). I installed a single 2TB SSD (on a SATA3 port) for active use, i.e. /root, /home, /boot etc., also under BTRFS but with no RAID. My workload is nothing special, so YMMV, but I find the throughput of the SSD so dramatically better than an HDD that I don't care that it's not striped.
poc
On 2/10/22 10:11, Matti Pulkkinen wrote:
Hello,
TL;DR are there particular workloads that suffer from having to access a RAID0 array?
I've currently got my /home partition in a BTRFS RAID0 array with two 1 TB mechanical drives, and I'm considering getting SSDs for /home instead. I could get one 2 TB SSD and be happy with it, but I could instead get two 1 TB SSDs and make a RAID0 array again. The latter option would of course get me better overall throughput, but I'm wondering whether there are workloads that might suffer from being run from a RAID0 array vs. just running on a "bare" disk.
Remember that with RAID0, if you lose ANY drive, you lose the whole volume. RAID0 is great for increasing throughput, but it is the most risky RAID configuration possible. I would never run /home on RAID0 unless I was doing something like two drives in RAID0 but doing nightly backups to a third drive in case my RAID0 volume broke. You're flirting with disaster running /home on RAID0.
For my home machine, I have an nvme drive for the OS, including /home. I have a nightly rsync job that syncs my /home directory to an NFS server that has multiple drives in it. So if I lose my nvme drive, I still have /home backed up. If I lose a single drive on my NFS server, I still have my backups. Basically I would have to lose three drives in two machines at the same time before I lose my /home directory.
Just my two cents.
Thomas
On 2022-02-10 11:11 a.m., Matti Pulkkinen wrote:
TL;DR are there particular workloads that suffer from having to access a RAID0 array?
I've currently got my /home partition in a BTRFS RAID0 array with two 1 TB mechanical drives, and I'm considering getting SSDs for /home instead. I could get one 2 TB SSD and be happy with it, but I could instead get two 1 TB SSDs and make a RAID0 array again. The latter option would of course get me better overall throughput, but I'm wondering whether there are workloads that might suffer from being run from a RAID0 array vs. just running on a "bare" disk.
Read the SSD reviews before picking one. There is quite a lot of variation in burst and sustained write speeds, number of rewrites before failure, etc. There is even one out there that has performance issues when you set the stupid flashy lights on it to a particular colour (!!!).
SSDs have no appreciable seek time and have much faster read rates than spinning rust. Depending upon how much onboard RAM cache they provide (some provide none), you may also see considerably better burst write speed, although sustained write speeds are generally no better than a disk's. So, unless you are doing something that requires sustained intense writes, moving to an SSD is a no-brainer. I would not bother with RAID, as eliminating the seek times will speed up virtually any app. Go with a single SSD: unless you are writing many TB/month, a single SSD will also probably outlast your spinning disks. They also use less power and are quieter. IMHO, LVM and/or MD add a lot of extra and unnecessary complexity that is more useful on larger servers (at least dozens of disks) and buys you very little on a smaller server or personal workstation. If drive failure is a new concern (remembering that you're using non-redundant RAID-0), then get a second one and run them as a BTRFS or ZFS RAID-1 set.
P.S: Why are you using a RAID-0 array? You have no redundancy, higher software complexity, somewhat better read speeds and much slower write speeds, and a much higher chance of hardware error. RAID-0 is generally used for things like very short-lived DB caches and not much else. If you have a hardware RAID controller, trash it: its IOPS are inferior to the software solutions already in the kernel and FS logic. RAID rebuild times with a controller are also generally so bad that, with the drive sizes sold today, you have good odds of experiencing a second drive failure while rebuilding.
--
John Mellor
On 2/10/22 08:11, Matti Pulkkinen wrote:
TL;DR are there particular workloads that suffer from having to access a RAID0 array?
If uptime is excluded as a factor, then I'm not aware of any.
I've currently got my /home partition in a BTRFS RAID0 array with two 1 TB mechanical drives, and I'm considering getting SSDs for /home instead. I could get one 2 TB SSD and be happy with it, but I could instead get two 1 TB SSDs and make a RAID0 array again. The latter option would of course get me better overall throughput
It *probably* will, but I think there are conditions under which it wouldn't. One SSD might be able to saturate your controller for reads. Interleaved writes will probably improve, but that might depend on how many cells are in each SSD, and how the SSD's controller spreads writes among them. If you're looking at QLC drives, it might depend somewhat on whether the 1TB drives together have more SLC cells than the 2TB drive has.
Which is to say that if you haven't actually tested *your workload* on it, then there's some risk. You're probably not going to save much on the purchase, you're going to have slightly less reliable storage, and there's a small chance that performance won't be much better than a single drive.
On 2/10/22 08:35, Thomas Cameron wrote:
RAID0 is great for increasing throughput, but it is the most risky RAID configuration possible. I would never run /home on RAID0 unless I was doing something like two drives in RAID0 but doing nightly backups to a third drive in case my RAID0 volume broke. You're flirting with disaster running /home on RAID0.
As we often say, RAID isn't backup. Or in other words, RAID6 is (essentially) no safer than RAID0 if you don't have backups. I would never run /home on any storage configuration without regular backups, either. From that perspective, it seems odd to warn someone about RAID0. Non-redundant arrays increase the risk of down time, but recovery from data loss is an orthogonal concern.
On 2/10/22 11:19, John Mellor wrote:
SSDs have no appreciable seek time and have much faster read rates than spinning rust. Depending upon how much onboard RAM cache they provide (some provide none), you may also see considerably better burst write speed, although sustained write speeds are generally no better than a disk's.
Hard disk drives usually write sequentially at around 130MB/s. I've seen QLC drives that aren't better than HDD, but unless you're buying literally the slowest drives on the market, SSDs will probably write *much* faster than HDD.
P.S: Why are you using a RAID-0 array? You have no redundancy, higher software complexity, somewhat better read speeds and much slower write speeds
Most RAID levels will have slower writes than single disks for some workload (usually small random writes on arrays with small numbers of members). RAID0 is the only level I'd expect to be faster than a single disk for all workloads.
When would you see much slower write speeds with RAID0?
to, 2022-02-10 kello 10:35 -0600, Thomas Cameron kirjoitti:
Remember that with RAID0, if you lose ANY drive, you lose the whole volume. RAID0 is great for increasing throughput, but it is the most risky RAID configuration possible. I would never run /home on RAID0 unless I was doing something like two drives in RAID0 but doing nightly backups to a third drive in case my RAID0 volume broke. You're flirting with disaster running /home on RAID0.
I'm not concerned about losing drives, because I have Déjà Dup push daily backups automatically to a different machine. If I lose a drive, I can just replace it and then restore a backup.
to, 2022-02-10 kello 19:12 -0800, Gordon Messmer kirjoitti:
It *probably* will, but I think there are conditions under which it wouldn't. One SSD might be able to saturate your controller for reads. Interleaved writes will probably improve, but that might depend on how many cells are in each SSD, and how the SSD's controller spreads writes among them. If you're looking at QLC drives, it might depend somewhat on whether the 1TB drives together have more SLC cells than the 2TB drive has.
Interesting. I'm looking at the Crucial MX500 in either a 2x1TB or a 1x2TB configuration. They appear to be "TLC" drives rather than QLC or SLC, but I would imagine that both configurations would have more or less the same number of cells.
Which is to say that if you haven't actually tested *your workload* on it, then there's some risk. You're probably not going to save much on the purchase, you're going to have slightly less reliable storage, and there's a small chance that performance won't be much better than a single drive.
I'm not concerned about reliability in this case, and the price is more or less the same, but if the RAID0 array isn't going to be any faster – and might even be slower – then I suppose there wouldn't be much point to it.
Mostly trivia, but might help someone one day...
It's true that raid0 is basically for data you don't care about: if any drive in the array dies, you lose everything. Except on Btrfs...
If the metadata profile is raid1 (or raid1c3/raid1c4), you will still lose the data that was striped onto the failed drive, but you will be able to mount the remaining drive(s) using `mount -o ro,degraded`, thanks to the raid1 metadata profile: the metadata is not striped but mirrored (two copies with raid1, no matter how many drives). You can't mount it read-write because the raid0 data profile is below its minimum number of drives.
If you copy the files out, you'll have quite a mess, because obviously most files are missing or damaged (swiss cheese). You'll need a tool that tolerates I/O errors by continuing to read the rest of the file rather than giving up on the first error. ddrescue does this (it works on block devices or files; in this case you'd use it on files).
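The salvage steps above can be sketched as commands. The device name and paths here are hypothetical examples, and the function only prints the plan so it can be reviewed before running anything:

```shell
#!/bin/sh
# Print a salvage plan for the surviving member of a degraded btrfs
# (raid0 data, raid1 metadata). Device and paths are hypothetical.
salvage_plan() {
    dev="$1"; mnt="$2"; out="$3"
    # raid1 metadata lets the fs mount read-only even though the
    # raid0 data profile is below its minimum device count.
    echo "mount -o ro,degraded $dev $mnt"
    # ddrescue continues past I/O errors instead of aborting, so
    # partially readable files come out as far as they can.
    echo "ddrescue $mnt/home/user/bigfile $out/bigfile $out/bigfile.map"
}
salvage_plan /dev/sdb /mnt/rescue /srv/salvage
```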
The default metadata profile at mkfs time is raid1 if you include 2 or more disks in the mkfs command; otherwise you get the DUP profile for metadata and single for data. If you add a second drive to a single-drive Btrfs, you need to convert manually, e.g. `btrfs balance start -mconvert=raid1 -dconvert=raid0 /mountpoint`
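The two scenarios above can be sketched as commands. Device names and the mountpoint are hypothetical, and the function only prints the plan rather than executing anything:

```shell
#!/bin/sh
# Print (not run) the commands for the two profile scenarios:
# a fresh 2-device fs, and converting an existing single-device fs.
btrfs_profiles_plan() {
    # Fresh 2-device fs: raid1 metadata is already the default with
    # two or more devices, but being explicit costs nothing.
    echo "mkfs.btrfs -m raid1 -d raid0 /dev/sda /dev/sdb"
    # Existing single-device fs mounted at /mnt: add the second
    # drive, then rebalance to convert both profiles.
    echo "btrfs device add /dev/sdb /mnt"
    echo "btrfs balance start -mconvert=raid1 -dconvert=raid0 /mnt"
}
btrfs_profiles_plan
```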
This same behavior happens with, e.g., a 2-disk Btrfs with single profile data and raid1 profile metadata. You can mount it ro,degraded and get the files off the surviving drive. In this case you'll get both more completely lost files and more completely intact files, because the single profile doesn't stripe data.
-- Chris Murphy
Not familiar with DejaDup, but with this setup on RAID0 I would do an rsync every 15 minutes to the backup system.
Regards, -Jamie
On Fri, Feb 11, 2022 at 8:38 AM Matti Pulkkinen mkjpul@utu.fi wrote:
to, 2022-02-10 kello 10:35 -0600, Thomas Cameron kirjoitti:
Remember that with RAID0, if you lose ANY drive, you lose the whole volume. RAID0 is great for increasing throughput, but it is the most risky RAID configuration possible. I would never run /home on RAID0 unless I was doing something like two drives in RAID0 but doing nightly backups to a third drive in case my RAID0 volume broke. You're flirting with disaster running /home on RAID0.
I'm not concerned about losing drives, because I have Déjà Dup push daily backups automatically to a different machine. If I lose a drive, I can just replace it and then restore a backup.
-- Terveisin / Regards, Matti Pulkkinen
la, 2022-02-12 kello 11:32 -0400, Jamie Fargen kirjoitti:
Not familiar with DejaDup, but with this setup on RAID0 I would do an rsync every 15 minutes to the backup system.
No. I've considered the options, and decided that a backup with Déjà Dup once a day is good enough for me.
On Sat, Feb 12, 2022 at 8:32 AM Jamie Fargen jamie@fargenable.com wrote:
Not familiar with DejaDup, but with this setup on RAID0 I would do an rsync every 15 minutes to the backup system.
rsync has some advantages: destination does not need to be btrfs, the --inplace option for VM images
But for such very frequent backups, it's more like a replication use case, and btrfs send/receive is very efficient at this because, unlike rsync, no deep traversal of either the source or the destination is required.

Btrfs increments a generation number any time a file is modified: in the leaf that contains the inode, in the node that references that leaf, and so on all the way up to the file tree root. This makes it very cheap for btrfs to diff two snapshots and figure out what has changed without having to look at every inode. It just skips all the parts of the tree that haven't changed, in effect building a "replay" list between the two generations. An incremental send contains just the changes, and it knows when files are renamed or moved, so their data doesn't need to be sent again.
So if you were to change just one file in 15 minutes, a btrfs send -p stream (an incremental stream produced as a "diff" between two snapshots) and receive will take a few seconds, even if the snapshot contains millions of files: the incremented generations trace a single path through the nodes and leaves straight to the one changed file.
(You could use 'btrfs send -f' and place the stream as a file on a non-btrfs file system. But you can't really look inside of it like a snapshot received on a btrfs file system.)
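The per-interval replication described above can be sketched as commands. The subvolume paths, snapshot names, and backup host are hypothetical, and the function only prints the plan rather than touching any file system:

```shell
#!/bin/sh
# Print the steps for one replication interval: take a read-only
# snapshot of /home, then send only the diff against the previous
# snapshot. All paths and the host name are hypothetical.
replicate_plan() {
    prev="$1"; new="$2"; host="$3"; dest="$4"
    # -r makes the snapshot read-only, which send requires.
    echo "btrfs subvolume snapshot -r /home $new"
    # -p makes the stream an incremental diff against the parent
    # snapshot, so only changed extents cross the wire.
    echo "btrfs send -p $prev $new | ssh $host btrfs receive $dest"
}
replicate_plan /home/.snap/prev /home/.snap/new backuphost /backup/home
```

After a successful receive, the new snapshot becomes the parent for the next interval's send.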