this post was submitted on 25 May 2024
Isn't there a software issue if it breaks with time unless you maintain it? What has happened more specifically? Memory leaks?
The server's hard drive filled up completely from image uploads, which in turn corrupted the database.
By the time the admin noticed it and had time to troubleshoot, the automated backup process had replicated the corrupted database, overwriting all backups that had a still-functioning database.
There was some personal event in the admin's life as well as a long planned vacation at that time.
The lesson to be learned here is that a private server administered by a single person in their spare time isn't something you can rely on.
The Feddit community is currently trying to found an organisation that can share the administrative load and is able to receive donations.
This is so baffling to me. How does this happen? There needs to be checks in place so that this can't happen ffs lol. No space left on the device to complete the write? Abort. Or like, starting to run out of space => stop accepting new content until fixed, to protect the integrity of the data. Something.
Anyway, I hope they manage to find a solution to sharing the load of work!
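The "stop accepting new content when space runs low" idea could look roughly like the sketch below. It is not how Lemmy actually handles uploads; the 10% threshold and the function names are made up for illustration, and only `shutil.disk_usage` is a real standard-library call.

```python
import shutil

# Hypothetical threshold: refuse uploads once less than 10% of the disk would stay free.
MIN_FREE_FRACTION = 0.10


def has_headroom(free, total, upload_size, min_free=MIN_FREE_FRACTION):
    """Return True if storing `upload_size` bytes leaves at least `min_free` of the disk free."""
    if total <= 0:
        return False
    return (free - upload_size) / total >= min_free


def can_accept_upload(upload_size, path="/", min_free=MIN_FREE_FRACTION):
    """Check the filesystem that contains `path` before accepting an upload."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return has_headroom(usage.free, usage.total, upload_size, min_free)
```

A server would call `can_accept_upload(len(data))` before writing an image and reject the request if it returns False, so the database keeps some room for its own writes.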
You can set up safeguards against this. You can also make sure some of your backups are never overwritten. But you have to do it in advance.
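One way to set that up in advance is to verify the database before each backup and to prune only old copies, so a corrupted database can never replace the last good one. The sketch below uses SQLite and its `PRAGMA integrity_check` purely as a stand-in (Feddit's actual database and backup tooling are not described in this thread), and `back_up`, `stamp` and `keep` are hypothetical names. Copying a live database file like this is also a simplification.

```python
import shutil
import sqlite3
from pathlib import Path


def is_healthy(db_path):
    """Run SQLite's built-in integrity check; True only if it reports 'ok'."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute("PRAGMA integrity_check").fetchone()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        return False
    return row is not None and row[0] == "ok"


def back_up(db_path, backup_dir, stamp, keep=7):
    """Copy db_path to backup_dir/<stamp>.db if it is healthy, then prune old copies.

    Returns the new backup path, or None if the check failed. On failure,
    nothing is copied and no existing backup is deleted.
    """
    if not is_healthy(db_path):
        return None
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{stamp}.db"
    shutil.copy2(db_path, target)
    # Sortable stamps (e.g. 2024-05-25) make lexical order match age.
    backups = sorted(backup_dir.glob("*.db"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```

With a date-based `stamp`, each run produces a new file rather than overwriting the previous one, and the newest `keep` verified copies always survive.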
Memory leaks?
Possible, but a full disk is much more likely. Not a bug, just something that happens...
Good point, didn't think of that. That's not an issue with the software. Although, one could argue that it should not break down and become unresponsive.
A lot of software writes to log files or temporary files or lock files or database transaction logs as part of its normal function and when those writes fail due to a full disk the software doesn't work anymore.
That's bad software then, right? The inability to write to disk shouldn't cause the software to lose all functionality. Unless that's its only function, or it somehow depends on it for proper functioning. 🤷‍♂️
No. Every good software program should write at least logs to disk. Every good database writes to disk. Add a new post, and the database commits it to disk and grows in size.
Name any decent sized program where new content is added and I guarantee it writes to disk and will fail eventually if not maintained.
Nice downvote. Let's discuss instead.
I'm saying that the server shouldn't go down just because new content can't be added. You should get maybe a 500-series HTTP response or something. Not... nothing. Ideally it should write to disk. Ideally it should allow new content to be added. But uptime and content access are still more important than being able to write to disk. It should warn the admin of the serious errors, and explain to the user in some diplomatic/apologetic manner. But never go down completely. That's not resilient at all.
That's my opinion.
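To make the "keep serving reads, answer writes with a 500-series error" idea concrete, here is a minimal, framework-free sketch. The `ReadOnlyStore` class and its `writer` callback are hypothetical, and the choice of 507 Insufficient Storage and 503 Service Unavailable is just one reasonable reading of "500-series". It also sidesteps the harder problem raised in the thread, which is that the database itself may need to write in order to stay up.

```python
import errno


class ReadOnlyStore:
    """Toy post store that falls back to read-only mode when the disk is full."""

    def __init__(self, writer):
        self.posts = []
        self._writer = writer  # callable that persists a post; may raise OSError
        self.read_only = False

    def get_posts(self):
        # Reads never touch the disk here, so they keep working in read-only mode.
        return 200, list(self.posts)

    def add_post(self, text):
        if self.read_only:
            return 503, "Sorry, posting is temporarily disabled."
        try:
            self._writer(text)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:  # "No space left on device"
                self.read_only = True  # a real server would also alert the admin here
                return 507, "Sorry, the server is out of storage."
            raise
        self.posts.append(text)
        return 201, "created"
```

Once a write fails with `ENOSPC`, later writes are refused immediately with 503 while `get_posts` still returns 200.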
Fun fact: Old school admins used to write a large-ish (~5% hdd space) file of random data to the drive right after installing the server. If the hard drive ran full without anyone noticing, you could just delete the file to get some breathing room to deal with the issue. It's a very crude alarm system, but one you WILL notice when it goes off even if you ignore all emails.
I'm slightly confused. They will notice it, but somehow it still happened without anyone noticing? I feel like I lost something in that text. What makes that trick such a good alarm system?
The alarm is that your server stops working, which you will definitely notice.
But you can get it working again simply by deleting the file.
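For anyone who wants to try the ballast-file trick, a rough sketch follows. The function name, the chunk size and the `total_bytes` override are assumptions for illustration; the 5% default and the use of random data come from the comment above.

```python
import os
import shutil


def create_ballast(path, fraction=0.05, chunk_size=1024 * 1024, total_bytes=None):
    """Write a file of random bytes sized as `fraction` of the disk; return its size.

    `total_bytes` overrides the detected disk size (handy for testing).
    """
    if total_bytes is None:
        directory = os.path.dirname(os.path.abspath(path))
        total_bytes = shutil.disk_usage(directory).total
    size = int(total_bytes * fraction)
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            # Random data, written in chunks so memory use stays small.
            f.write(os.urandom(n))
            remaining -= n
    return size
```

When the disk fills up, deleting the file at `path` frees that space again so there is room to log in, clean up and restart services.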
lol okay. So that fixes nothing, except you'll be able to get it running a bit faster, provided you're even there to quickly react. But if this isn't the case, like it wasn't in the case of feddit, it's exactly the same. No maintenance: server becomes unreachable/crashes just because there's no disk space. Very fragile.
Oh well. Status quo, I guess.
For the record I did not downvote.
But I capitulate on your point. It would be great if every piece of software was written with resilience and uptime in mind.
As a former sysadmin that sounds like a dream. But I don't think I have ever seen that with any mainstream program that I've had responsibility for. Does that mean all those programs were bad? I don't think so. We wouldn't need sysadmins if all programs were written the way you describe.
Programs can be written to auto rotate their logs, compact and reindex their db's. Using browser updates as an example, they can even safely auto update and revert back on failure.
How many programs actually do these things? My experience is next to 0. But I wouldn't call them all bad or poorly written programs.
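Log auto-rotation, at least, is cheap to get. As a small illustration, Python's standard library ships `logging.handlers.RotatingFileHandler`, which caps a log at a size limit and keeps a fixed number of old files. The helper name, the logger name and the default sizes below are assumptions chosen for the example.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_logger(path, max_bytes=1_000_000, backups=3):
    """Return a logger whose file is rotated at `max_bytes`, keeping `backups` old files."""
    logger = logging.getLogger(f"rotating:{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With these settings the log files together take up roughly `max_bytes * (backups + 1)` at most, so the logs alone cannot fill the disk.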
Thank you. It's alright, just aimed at whoever did. Sorry if it came off salty!
I meeeaaan... I hear what you're saying. I think your definition of "bad" is a bit stricter than what I was going for. So, you're right. They're not bad in some sense that they're useless or something along those lines. But if I were to write server software, my main goals I guess would be security, performance, and resiliency to failure. Stay alive at all costs (within reason).
But I think we're both on the same side there on all counts.
I'm almost positive it did warn the admin in some way, but the admin was afk for weeks and didn't see the warnings.
That's good. Just one piece of the puzzle though. 😬