this post was submitted on 07 Mar 2024
208 points (98.1% liked)


Follow-up from https://lemmy.dbzer0.com/post/15792108

I've spent a ton of hours trying to troubleshoot why lemmy.dbzer0.com is falling behind lemmy.world in federation. We have a good idea where it's going wrong, but we have no idea why.

The problem is that once I receive an apub sync from lemmy.world and send the request to the lemmy backend, it takes about 1 second to process, which is way too long (it should typically be < 100ms).

We had a look at the DB and it was somewhat slow because it was syncing commits to disk. OK, we disabled that, and now the DB is much faster (and less safe, but whatever), but the sync times have not improved at all.

I've also run a lot of tests to ensure the problem is not coming from my load balancers, and I am certain I've removed them from the equation. The issue is somewhere within the lemmy docker stuff and/or the postgresql DB.
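For anyone wanting to repeat this kind of check, here is a minimal sketch of how a per-request timing could be taken and compared against the ~100ms target. It is not the tooling actually used here. `median_latency_ms` and `fake_backend` are hypothetical names, and the stand-in call would need to be swapped for a request sent straight to the backend, bypassing the load balancer.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(call: Callable[[], object], runs: int = 20) -> float:
    """Run `call` several times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fake_backend() -> None:
    # Hypothetical stand-in for one request to the backend (assumption:
    # replace with a real request that skips the load balancer).
    time.sleep(0.02)


if __name__ == "__main__":
    latency = median_latency_ms(fake_backend, runs=10)
    verdict = "ok" if latency < 100 else "too slow"
    print(f"median latency: {latency:.1f} ms ({verdict})")
```

If the median stays near 1 second when the backend is hit directly, the load balancer is not the cause.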

Unfortunately, I'm relying solely on other admins' help on matrix, and at this point I'm being asked to recompile lemmy from scratch or deploy my own docker container with more debug instructions. Neither of these is within my skillset, so I'm struggling to make progress on this.

In the meantime, we're falling further and further behind in the lemmy.world federation queue (along with a lot of other instances). To clarify, the problem is not lemmy.world. It takes my instance the same time to receive apub syncs from every other server. It's just that the other servers don't have as much traffic, so 1/s is enough to keep up. But lemmy.world has so many constant changes that 1/s is not nearly fast enough.
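The arithmetic behind this can be sketched roughly as follows, assuming syncs are processed one at a time at a constant rate. The arrival rate of 3 syncs per second is a made-up illustration, not a measured figure for lemmy.world, and `backlog_after` is a hypothetical helper name.

```python
def backlog_after(
    arrival_per_s: float,
    seconds_per_sync: float,
    duration_s: float,
    start_backlog: float = 0.0,
) -> float:
    """Queue length after `duration_s`, assuming sequential processing
    and constant arrival and processing rates."""
    processed_per_s = 1.0 / seconds_per_sync
    net_growth_per_s = arrival_per_s - processed_per_s
    # A queue cannot go below empty.
    return max(0.0, start_backlog + net_growth_per_s * duration_s)


if __name__ == "__main__":
    hour = 3600
    # Hypothetical sender producing 3 syncs/s; receiver takes 1 s per sync.
    print(backlog_after(3.0, 1.0, hour))
    # Same sender; receiver at the expected 100 ms per sync.
    print(backlog_after(3.0, 0.1, hour))
```

With these assumed numbers, the queue grows by 2 syncs every second at 1 s per sync, while at 100ms per sync the instance keeps up easily.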

I'm continuing to dig into this as much as I can. But I won't lie, I could use some help.

I'll keep you all updated in this thread.

[–] [email protected] 103 points 8 months ago* (last edited 8 months ago) (10 children)

Final Update: The problem has been resolved by migrating my lemmy backend. We are currently catching up to lemmy.world, which will probably take all of the next day. But the "distance" is shrinking by hundreds of syncs per minute.
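As a rough sanity check on that estimate, the catch-up time is just the backlog divided by the net reduction rate. The backlog of 300,000 syncs below is a hypothetical figure, since the actual queue size isn't stated here, and 300 per minute is only one reading of "hundreds".

```python
def catch_up_hours(backlog: float, net_reduction_per_min: float) -> float:
    """Hours until the backlog reaches zero at a constant net reduction rate."""
    if net_reduction_per_min <= 0:
        raise ValueError("backlog is not shrinking, so it never catches up")
    return backlog / net_reduction_per_min / 60.0


if __name__ == "__main__":
    # Hypothetical backlog of 300,000 syncs, shrinking by 300 per minute.
    print(f"{catch_up_hours(300_000, 300):.1f} hours")
```

Under those assumed numbers the catch-up takes a bit under 17 hours.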

I will write a post-mortem soon.

[–] [email protected] 22 points 8 months ago (1 children)

HACKERPERSON-Energy. Hope you feel relieved now.

[–] [email protected] 26 points 8 months ago (1 children)
[–] [email protected] 9 points 8 months ago

Appreciate everything you do! Thanks

[–] [email protected] 21 points 8 months ago
[–] [email protected] 10 points 8 months ago (1 children)

I'm interviewing for a job today that could drastically change my life. If I get it, I'll be setting aside some $ to help maintain this instance. Your work here and beyond (e.g. AI Horde) is incredible and I'm looking forward to helping.

[–] [email protected] 6 points 8 months ago

much appreciated mate!

[–] [email protected] 8 points 8 months ago
[–] [email protected] 6 points 8 months ago

Congrats! As a self-hoster myself, I understand the feeling of getting something up and running to your liking. It's the best feeling.

[–] [email protected] 5 points 8 months ago

Thank you so much 🙏

[–] [email protected] 5 points 8 months ago

Thank you for your continuous work!

[–] [email protected] 4 points 8 months ago

Thanks a lot for putting the energy and time into fixing this.

[–] [email protected] 2 points 8 months ago