this post was submitted on 30 Jan 2024

cross-posted from: https://programming.dev/post/9319044

Hey,

I am planning to implement authenticated boot, inspired by Pid Eins' blog. I'll be using pam_mount for /home/user. I need to check the integrity of all partitions.

I have been using luks+ext4 till now. I am hesitant to switch to zfs/btrfs, afraid I might fuck up. A while back I accidentally purged '/' trying out timeshift, which was my fault.

Should I use zfs/btrfs for /home/user? As for root, I'm considering luks+(zfs/btrfs) to be restorable to blank state.

[–] [email protected] 10 points 9 months ago* (last edited 9 months ago) (4 children)

My experiences:

ZFS: never even tried because it's not integrated (license).

Btrfs: iirc I've tried it three times. Several years ago now. On at least two of those tries, after maybe a month or so of daily driving, suddenly the fs goes totally unresponsive and because it's the entire system, could only reboot. FS is corrupted and won't recover. There is no fsck. There is no recovery. Total data loss. Start again from last backup. Haven't seen that since reiserfs around 2000. Found lots of posts with a similar error message. Took btrfs off the list of things I'll be using in production.

I like both from a distance, but still use ext*. Never had total data loss that wasn't a completely electrically dead drive with any version I've used since 1995.

[–] [email protected] 8 points 9 months ago (1 children)

Ouch, that must have been a pain to recover from...

I've had almost the opposite experience to yours, funnily. Several years ago my HDDs would drop out at random during heavy write loads. After a while I narrowed down the cause to some dodgy SATA power cables, which sadly I could not replace at the time. Due to the hardware issue I could not scrub the filesystem successfully either. However, I managed to recover all my data to a separate BTRFS filesystem, using some "restore" utility that was mentioned in the docs, and to the best of my knowledge all the recovered data was intact.

While that past error required a separate filesystem to perform the recovery, my most recent hardware issue with drives dropping out didn't need any recovery at all - after resolving the hardware issue (a loose power connection) BTRFS pretty much fixed itself during a scheduled scrub and spat out all the repairs in dmesg.

I would suggest enabling some kind of monitoring on BTRFS's counters if you haven't, because the fs will do whatever it can to prevent interruption to operations. In my previous two cases, performance was pretty much unaffected, and I only noticed the hardware problems due to the scheduled scrub & balance taking longer or failing.
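For anyone wanting a starting point for that monitoring, here is a minimal sketch in Python. It assumes the counters come from `btrfs device stats <mountpoint>` and that each line looks like `[/dev/sda1].read_io_errs    3`. The function names and the sample text are made up for illustration, and the command may need root on your system.

```python
import re
import subprocess

# Matches lines such as "[/dev/sda1].write_io_errs    0"
STAT_LINE = re.compile(r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {(device, counter): value} for every non-zero error counter."""
    nonzero = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match["value"]) > 0:
            nonzero[(match["device"], match["counter"])] = int(match["value"])
    return nonzero


def check_mount(mountpoint):
    """Run `btrfs device stats` on a mounted filesystem and parse the result."""
    result = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True,
    )
    return parse_device_stats(result.stdout)


# Hypothetical sample output, only used to show the parser without touching a disk.
SAMPLE = """\
[/dev/sda1].write_io_errs    0
[/dev/sda1].read_io_errs     3
[/dev/sda1].flush_io_errs    0
[/dev/sda1].corruption_errs  12
[/dev/sda1].generation_errs  0
"""

if __name__ == "__main__":
    for (device, counter), value in parse_device_stats(SAMPLE).items():
        print(f"{device}: {counter} = {value}")
```

Running `check_mount("/")` from a cron job or systemd timer and alerting when the returned dict is not empty would be one way to notice a flaky drive before the next scheduled scrub does.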

Don't run a fsck - BTRFS essentially does this to itself during filesystem operations, such as a scrub or a file read. The provided btrfs check tool (fsck) is for the internal B-tree structure specifically AFAIK, and irreversibly modifies the filesystem internally in a way that can cause unrecoverable data loss if the user does not know what they are doing. Instead of running fsck, run a scrub - it's an online operation that can be done while the filesystem is still mounted.
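A rough sketch of what kicking off a scrub from a script could look like, in Python. It assumes the `btrfs scrub start` subcommand with `-B` (stay in the foreground until the scrub finishes) and `-r` (read-only, report errors without repairing them). The helper names are hypothetical, and the scrub itself normally needs root.

```python
import subprocess


def scrub_command(mountpoint, read_only=False):
    """Build the argv for a foreground scrub of a mounted btrfs filesystem."""
    cmd = ["btrfs", "scrub", "start", "-B"]
    if read_only:
        # -r only reports problems; nothing on disk is rewritten.
        cmd.append("-r")
    cmd.append(mountpoint)
    return cmd


def run_scrub(mountpoint, read_only=False):
    """Run the scrub and return (exit code, combined output) for logging."""
    completed = subprocess.run(
        scrub_command(mountpoint, read_only),
        capture_output=True, text=True,
    )
    return completed.returncode, completed.stdout + completed.stderr


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(" ".join(scrub_command("/home", read_only=True)))
```

Starting with `read_only=True` is the more cautious choice if you are not yet sure the hardware is healthy, since it only reads and verifies checksums.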

[–] [email protected] 4 points 9 months ago

DO NOT RUN A SCRUB IF YOU SUSPECT HARDWARE FAILURE.

No, seriously. If you are having hardware issues a scrub could make the corruption much worse. You should first make a complete copy of your data and then run btrfs check. Sorry for shouting but it is really important you don't scrub a bad disk.

[–] [email protected] 7 points 9 months ago (1 children)

Btrfs has come a long way in the last few years. I have been using it for a little over 5 years and it's rock solid. It now powers all my bare metal machines and I use RAID 1 on my servers.

There was one time I had a disk unexpectedly go bad (it started returning bad data on read), which led to the system going read-only. It took me about 5 min to swap disks and it was fine. Needless to say, I was impressed that no data was lost.

Btrfs normally won't get corrupted unless you have a hardware issue. It uses CoW (copy-on-write), so writes can never be half completed. If you do manage to get corruption you can use btrfs check.

[–] [email protected] 3 points 9 months ago* (last edited 9 months ago)

Btrfs normally won't get corrupted unless you have a hardware issue. It uses CoW (copy-on-write), so writes can never be half completed. If you do manage to get corruption you can use btrfs check.

From my experience BTRFS is way more reliable against hardware failure than Ext4 ever was. Ext* filesystems tend to go corrupt on the first and smallest power loss or hardware failure.

[–] [email protected] 4 points 9 months ago (1 children)

There is btrfs check --repair to fix corruption

[–] [email protected] 4 points 9 months ago* (last edited 9 months ago) (1 children)

https://www.suse.com/support/kb/doc/?id=000018769

WARNING: Using '--repair' can further damage a filesystem instead of helping if it can't fix your particular issue.

Edit:

It is extremely important that you ensure a backup has been created before invoking '--repair'.
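To make the "backup first" rule hard to skip in a script, something like the Python sketch below could work. The helper name and the `backup_confirmed` flag are made up for illustration. It only assumes that `btrfs check` accepts `--readonly` and `--repair` and is run against an unmounted device; `/dev/sdX1` is a placeholder.

```python
def btrfs_check_command(device, repair=False, backup_confirmed=False):
    """Build a `btrfs check` command line for an unmounted device.

    A read-only check is the default. A --repair command is only built
    when the caller states explicitly that a backup already exists.
    """
    if repair and not backup_confirmed:
        raise ValueError("refusing to build a --repair command without a confirmed backup")
    mode = "--repair" if repair else "--readonly"
    return ["btrfs", "check", mode, device]


if __name__ == "__main__":
    # Placeholder device name; only prints the command, nothing is executed.
    print(" ".join(btrfs_check_command("/dev/sdX1")))
```

The read-only pass tells you what is wrong without changing anything, so it makes sense to look at its output before deciding whether --repair is worth the risk at all.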

[–] [email protected] 5 points 9 months ago* (last edited 9 months ago)

That is a caveat with OS disk tools. Even partition resizing gives this warning, as does Windows checkdisk... something about unnecessary disk checks should be avoided as they can create issues where none might have existed, so only run when you suspect a problem.

But as lemann pointed out in this thread btrfs scrub is less risky

[–] [email protected] 3 points 9 months ago (1 children)

Several years ago now. On at least two of those tries, after maybe a month or so of daily driving, suddenly the fs goes totally unresponsive and because it’s the entire system, could only reboot. FS is corrupted and won’t recover. There is no fsck. There is no recovery. Total data loss.

Could you narrow it down to just how long ago? BTRFS took a very long time to stabilise, so that could possibly make a difference here. Also, do you remember if you were using any special features, especially RAID, and if RAID, which level?

[–] [email protected] 2 points 9 months ago

I could see if there's notes somewhere. Very plain desktop and laptop. Probably encrypted LVM. At least one was doing a lot of software builds with big system image trees and snapshots.