this post was submitted on 21 Oct 2024

Should I be worried? (lemmy.ml)

submitted 2 weeks ago by [email protected] to c/[email protected]

[–] [email protected] 46 points 2 weeks ago (1 children)

Not if you have backed up your data. You do have a backup of your data, right?

[–] [email protected] 19 points 2 weeks ago (2 children)

Yeah the important stuff is backed up, but I am still concerned my entire OS will suddenly go kaput. How fucked am I?

[–] [email protected] 14 points 2 weeks ago (1 children)

The OS is the least important part of your computer.

[–] [email protected] 1 points 1 week ago (1 children)

It'd be bad if I were working on something and the entire thing just suddenly broke down before I have the time to save and backup 😅

[–] [email protected] 3 points 1 week ago* (last edited 1 week ago)

If it's your OS drive that dies, nothing important is lost except a few minutes of work. You can boot from a variety of media (CD, USB...) for recovery or drive replacement. Worst case, you'll have to reinstall a few things over the following days.

It's also why it's not a bad idea to keep the different parts of the system on separate drives.

[–] [email protected] 8 points 2 weeks ago

If you have everything you need backed up, you can reinstall on a new hard drive and restore it. So you should not be completely fucked; it's just an inconvenience you might have to go through. You will lose whatever is not backed up, so if any of that is a pain to get again, the restore will be more painful.

Others have suggested some things you might want to try. But having a spare disk you can swap to is never a bad idea. Disks do fail, and you should plan for what to do when they do. Backing up your data is a good first step.

I would say it is not a bad idea to just get a new disk now and go through the process of restoring everything anyway - you can treat it like your disk has failed and do what you would need to do to restore, with the ability to swap back if you need to.

This is a good way to find things you might have missed in your backups.

[–] [email protected] 26 points 2 weeks ago (1 children)

Back up your data now
Reseat the cables for the drive
Run a long self-test on the drive with smartctl -t long. If it doesn't pass, the drive is trash. If it does, it might limp along a bit longer before catastrophically failing
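For anyone who would rather script the "did it pass?" check in that last step, here is a rough Python sketch. It assumes an ATA/SATA drive and the plain-text layout that `smartctl -l selftest` prints, where the newest result is the row starting with `# 1`. The function names and the status labels ("passed", "failed", and so on) are made up for this example, and NVMe drives report their self-test log differently, so treat it as a starting point only.

```python
import re
import subprocess


def read_selftest_log(device):
    """Return the text of `smartctl -l selftest <device>` (usually needs root)."""
    # smartctl uses its exit status as a bitmask, so a non-zero code is not
    # treated as an error here.
    result = subprocess.run(
        ["smartctl", "-l", "selftest", device],
        capture_output=True,
        text=True,
    )
    return result.stdout


def latest_selftest_status(log_text):
    """Classify the most recent entry (the '# 1' row) of an ATA self-test log."""
    for line in log_text.splitlines():
        if re.match(r"#\s*1\s", line):
            if "Completed without error" in line:
                return "passed"
            if "in progress" in line:
                return "running"
            if "Completed:" in line:
                # e.g. "Completed: read failure"
                return "failed"
            # Aborted or interrupted tests say nothing about drive health.
            return "inconclusive"
    return "no entries"
```

Start the test with `smartctl -t long /dev/sdX` (where `/dev/sdX` is a placeholder for your drive), wait for it to finish, then call `latest_selftest_status(read_selftest_log("/dev/sdX"))`.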

[–] [email protected] 2 points 1 week ago (2 children)

I used the GUI program for SMART, and every issue in the list got marked as "old age".

[–] [email protected] 2 points 1 week ago

IIRC, old age is the best it can be.

[–] [email protected] 1 points 1 week ago (1 children)

They meant the SMART self-test, not the SMART data readout. The readout is not meant to be interpreted by laymen, and often not even by experts.

[–] [email protected] 1 points 1 week ago* (last edited 1 week ago) (1 children)

I did run the self-test, the long version that says it will take tens of minutes. Some of the errors were displayed in red text before the test. After the self-test, it said that my drive passed, and every single one of the red errors showed up as "Old age" in black text.

(This is in the GUI app for smartctl)

[–] [email protected] 2 points 1 week ago* (last edited 1 week ago)

Please stop trying to interpret the SMART data report. Even if you're knowledgeable, it can easily mislead you, because this is vendor-specific data that follows no standard and is frequently misinterpreted even by the program displaying it.

If the self-test passed, it's likely the cable or the controller. Try a different cable.

[–] [email protected] 19 points 2 weeks ago

Looks like either a bad cable or a failing drive.

[–] [email protected] 11 points 2 weeks ago

What kind of machine is this: laptop or desktop? If it's a desktop, check the cables. Otherwise I'd swap out the drive.

[–] [email protected] 6 points 2 weeks ago

You are probably fine. Check your cables, as this is either buggy firmware or a flaky connection.

[–] [email protected] 5 points 2 weeks ago* (last edited 2 weeks ago)

No need to worry, disk failures almost never result in fires or hazardous conditions.

A-yuk-yuk-yuk.

Seriously: based just on that little snippet of the logs (ICRC ABRT), you have a disk that has failed internally. You can either replace it or try to repair it with a tool like SpinRite, though you may lose all the data in the process.

A user suggested bad cabling, and that’s a possibility. If the error is reproducible, you can check it easily by swapping the cable. Before I swap cables, I’ll often confirm the diagnosis using smartctl and look for whatever the drive manufacturer calls the errors that happen between the media and the disk controller chip on the drive. If it has those, there’s no point in trying a cable swap; the problem is not happening there.
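A rough Python sketch of that smartctl check, in case it helps. It assumes the ten-column attribute table that `smartctl -A` prints for ATA drives. The attribute names below are common smartmontools labels that I'm using as guesses for "interface" and "media" counters; they are vendor-specific, your drive may call them something else or not report them at all, and the raw values are not standardized. The function and constant names are made up for this example.

```python
# Assumed attribute names; vendors differ, so adjust for your drive.
INTERFACE_ATTRS = ("UDMA_CRC_Error_Count",)
MEDIA_ATTRS = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_attributes(text):
    """Map attribute name -> raw value string from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        parts = line.split(None, 9)
        if len(parts) == 10 and parts[0].isdigit():
            attrs[parts[1]] = parts[9].strip()
    return attrs


def raw_count(attrs, name):
    """Return the leading integer of a raw value, or None if absent or not numeric."""
    raw = attrs.get(name)
    if raw is None:
        return None
    first = raw.split()[0]
    return int(first) if first.isdigit() else None


def error_counters(text):
    """Group the counters of interest so they can be compared side by side."""
    attrs = parse_attributes(text)
    return {
        "interface": {n: raw_count(attrs, n) for n in INTERFACE_ATTRS if n in attrs},
        "media": {n: raw_count(attrs, n) for n in MEDIA_ATTRS if n in attrs},
    }
```

Feed it the saved output of `smartctl -A /dev/sdX` (with `/dev/sdX` as a placeholder for your drive). It only collects the counters in one place; deciding what they mean is still on you.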

People will say that you can’t “fix” bad disks with tools like SpinRite or smartctl. I’ve found that to be incorrect. There are certainly times when the disk is kaput, but most of the time it’ll work fine and can go back into service.

Of course, that experience comes from recovering from errors the first time I get an email or text about them and returning the disk to service in a multi-parity array, so the lowered criticality and early detection could have a lot to do with it.