this post was submitted on 04 Dec 2023
Data Hoarder
I realize there are existing solutions, but I wanted my own for various reasons (mainly a better fit for the peculiar way I store and back up my data).
It was straightforward to write a Python script that crawls a directory tree and adds files to an SQLite database. The script has a few commands:
- "check" computes checksums on files whose modification times have changed since last check, or on any file whose checksum is older than X days (find bitrot this way).
- "parity" uses par2 to compute parity files for all files in the database. These are stored in a ".par2" directory in the directory tree root so they don't clutter the directory tree.
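For anyone curious, here is a minimal sketch of what the "check" decision could look like. This is not the actual script: the table layout, the function names (`sha256_of`, `check_file`), the choice of SHA-256, and the 90-day default are all assumptions made for illustration.

```python
import hashlib
import os
import sqlite3
import time

# Hypothetical schema: one row per file.
SCHEMA = """CREATE TABLE IF NOT EXISTS files (
    path TEXT PRIMARY KEY,
    mtime REAL NOT NULL,
    checksum TEXT NOT NULL,
    checked_at REAL NOT NULL
)"""


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_file(db, path, max_age_days=90, now=None):
    """Return 'new', 'updated', 'ok', 'BITROT', or 'skipped' for one file."""
    now = time.time() if now is None else now
    mtime = os.stat(path).st_mtime
    row = db.execute(
        "SELECT mtime, checksum, checked_at FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is not None:
        old_mtime, old_sum, checked_at = row
        # Unchanged mtime and a recent checksum: nothing to do.
        if old_mtime == mtime and now - checked_at < max_age_days * 86400:
            return "skipped"
    digest = sha256_of(path)
    if row is None:
        status = "new"
    elif old_mtime != mtime:
        status = "updated"  # file was legitimately modified
    elif digest == old_sum:
        status = "ok"
    else:
        # Same mtime, different content: keep the old checksum as evidence.
        return "BITROT"
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
            (path, mtime, digest, now),
        )
    return status
```

The key idea is that a changed checksum only counts as bitrot when the modification time has not changed; otherwise it is treated as a normal edit and the stored checksum is refreshed.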
I like this because I can compute checksums and parity files per directory tree (movies, music, photos, etc), and by disk (no raid here, just JBOD + mergerfs). Each disk corresponds exactly to a backup set kept in a pelican case.
The SQLite database has the nice side effect that checksum / parity computation can run in the background and be interrupted at any time (it takes a loooooooong time). The commits are atomic, so if the machine crashes or has to shut down, it's easy to resume from the previous point.
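The resume behaviour described above could be sketched roughly like this, assuming one commit per finished file. The `done` table, the `crawl` function, and the `process` / `should_stop` callbacks are hypothetical names for illustration, not the real script's.

```python
import os
import sqlite3


def crawl(db, root, process, should_stop=lambda: False):
    """Process every file under root once; safe to interrupt and re-run.

    Returns the number of files processed in this run.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS done (path TEXT PRIMARY KEY, result TEXT)"
    )
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip the parity directory and walk in a stable order.
        dirnames[:] = sorted(d for d in dirnames if d != ".par2")
        for name in sorted(filenames):
            if should_stop():
                return count
            path = os.path.join(dirpath, name)
            already = db.execute(
                "SELECT 1 FROM done WHERE path = ?", (path,)
            ).fetchone()
            if already:
                continue  # finished in an earlier run
            result = process(path)  # the slow part (checksum, parity, ...)
            # One transaction per file: a crash loses at most the file
            # that was in progress, never the ones already recorded.
            with db:
                db.execute("INSERT INTO done VALUES (?, ?)", (path, result))
            count += 1
    return count
```

Because a row is only written after `process` returns, an interrupted run leaves the database describing exactly the files that were fully handled, and the next run picks up with the rest.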
Surely.... SURELY... someone has already written this. But it only took me a couple of afternoons to roll my own. Now I have parity and the ability to detect bitrot on all live disks and backup sets.
Mind sharing on github or something?