Here is a fairly robust way to ensure a drive is safe to put into service. I have used this process before and caught drives that would have failed shortly after being put into prod, and some that would have failed once they were more than half full. (A sketch of the corresponding commands follows the list.)

  1. Check S.M.A.R.T. info: confirm zero (0) raw values for Seek Error Rate, Read Error Rate, Reallocated Sector Count, and Uncorrectable Sector Count

  2. Run Short S.M.A.R.T. test

  3. Repeat Step 1

  4. Run Conveyance S.M.A.R.T. test

  5. Repeat Step 1

  6. Run Destructive Badblocks test (read and write)

  7. Repeat Step 1

  8. Perform a FULL Format (Overwrite with Zeros)

  9. Repeat Step 1

  10. Run Extended S.M.A.R.T. test

  11. Repeat Step 1
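Here is a minimal shell sketch of the sequence above. The device node /dev/sdX is a placeholder, and the sleep durations and badblocks block size are assumptions to adjust for your drive; run it as root, and treat it as a starting point rather than a turnkey script:

```bash
#!/usr/bin/env bash
# Burn-in sketch -- DESTRUCTIVE: wipes /dev/sdX completely.
# Assumes smartmontools, badblocks, and coreutils are installed.
set -euo pipefail
DRIVE=/dev/sdX   # hypothetical device node -- change before running

check_attrs() {
  # Steps 1/3/5/7/9/11: all raw values here should stay at 0
  smartctl -A "$DRIVE" | grep -E \
    'Raw_Read_Error_Rate|Seek_Error_Rate|Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}

check_attrs                                   # 1. baseline
smartctl -t short "$DRIVE"; sleep 180         # 2. short self-test (~2 min; assumed wait)
check_attrs                                   # 3.
smartctl -t conveyance "$DRIVE"; sleep 360    # 4. conveyance self-test (~5 min; assumed wait)
check_attrs                                   # 5.
badblocks -wsv -b 4096 "$DRIVE"               # 6. destructive write+read-back (days on large drives)
check_attrs                                   # 7.
dd if=/dev/zero of="$DRIVE" bs=1M status=progress || true  # 8. zero fill; dd exits nonzero at end of device
check_attrs                                   # 9.
smartctl -t long "$DRIVE"                     # 10. extended self-test (hours)
# Poll 'smartctl -l selftest /dev/sdX' until the long test finishes, then:
check_attrs                                   # 11.
```

The self-test commands return immediately, which is why the script sleeps (or, for the long test, tells you to poll the self-test log) before re-checking attributes.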

Return the drive if either of the following is true (one way to monitor both is sketched below):

A) The write speed during the full format drops more than 10MB/s below the expected ~80MB/s (my defective one was ~40MB/s from first power-on)

B) The S.M.A.R.T. tests show any error count increasing at any step
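Criterion A can be read straight off the MB/s readout that `dd ... status=progress` prints during the full format (step 8 above). For criterion B, diffing before/after snapshots of the attribute table makes a creeping counter hard to miss. A small sketch, reusing the hypothetical /dev/sdX:

```bash
DRIVE=/dev/sdX   # hypothetical device node

# Criterion B: snapshot the attribute table around each step and diff.
smartctl -A "$DRIVE" > /tmp/smart-before.txt
# ... run one test step (self-test, badblocks, or the zero fill) ...
smartctl -A "$DRIVE" > /tmp/smart-after.txt
diff /tmp/smart-before.txt /tmp/smart-after.txt
# Expect benign churn (temperature, power-on hours); any increase in an
# error counter such as Reallocated_Sector_Ct means the drive goes back.
```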

It is also highly advisable to stagger the testing (and repeat some of it) if you plan on using multiple drives in a pool/RAID config. That way the wear on the drives differs, reducing the likelihood of them failing at the same time. For example, I re-ran either the full format or the badblocks test on some of the drives, so some have 48 hours of testing, some 72, and some 96. This lowers the chance of multiple drives failing during a rebuild.

[email protected] 2 points 11 months ago

Jeez, you're burning through so much of the drive's lifespan just checking the damn thing. If a failed drive would cause problems worthy of this much burn-in time, you need a more robust setup.

I run all used eBay drives. Except for a glance at the SMART data before adding them to the array, I don't test them at all. I just keep an extra drive or two on hand as spares. Life's easier when you plan for failure instead of fighting it.

[email protected] 1 points 11 months ago

Same, except I also use Scrutiny to flag drives for my attention. It makes an educated pass/fail guess by applying vendor-specific interpretations of the SMART values and matching them against the failure thresholds from the Backblaze drive-stats survey. It can tell you things like "the current value of the Command Timeout attribute for this drive falls into the 1-10% failure-probability bracket according to Backblaze".
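For anyone curious, Scrutiny ships an all-in-one Docker image (web UI plus collector). The sketch below is adapted from my recollection of the project's README at https://github.com/AnalogJ/scrutiny, so treat the image tag and flags as assumptions and check the README before running; the device paths are examples, one --device flag per drive you want monitored:

```bash
# All-in-one Scrutiny container (web UI + collector); device paths are examples.
docker run -d --name scrutiny \
  -p 8080:8080 \
  -v /run/udev:/run/udev:ro \
  --cap-add SYS_RAWIO \
  --device=/dev/sda \
  --device=/dev/sdb \
  ghcr.io/analogj/scrutiny:master-omnibus
# Web UI at http://localhost:8080 after the collector's first run.
```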

It helps me plan ahead. If, for example, I have three drives that Scrutiny says "smell funny", it would be nice to have 2-3 spares on hand rather than just one. Or if two of those drives happen to be paired in the same mirror, perhaps I can swap one of them somewhere else.