Overview
So I made this forum to work on one specific piece of software that I
think could benefit Lemmy (and the overall fediverse community)
substantially. I'll lay out what I want to make and why, in some
detail. I apologize for the length, but I can't really do this without some level of support and agreement from the community, so hopefully the wall of text is worth it
if it resonates with some people and they're swayed to support the
idea.
If something like this already exists please let me know. I looked
and couldn't find it, which is why I'm making this extensive pitch
about it being a good idea. But, if it's already in the works, I'd be
just as happy working on existing tech instead of reinventing it.
So:
The Problem
In short, the problem is that you have to pay for hosting. Reddit
started as a great community, just like Lemmy is now, but because it
was great it got huge, which meant they had to pay millions of
dollars to run their infrastructure, and now all of a sudden they're
not a community site anymore. They're a business, whether they like
that or not. Fast forward fifteen years and look how that turned out.
I think this will impact Lemmy in the future, in very different ways
but still substantially. It's actually already, at this very early
stage, impacting Lemmy: There are popular instances that are
struggling under the load, and people are asking for donations because
they have hosting bills. Sure, donations are great, and I'm sure these
particular load problems will get solved -- but the underlying
conflict, that someone who wants to run a substantial part of the
network has to make a substantial financial investment, will remain.
Because of its federated nature, Lemmy is actually a lot better
positioned to resist this problem. But, it'll still be a problem on
some level (esp. for big instances), and wouldn't it be better if we
just didn't have to worry about it?
The Solution
Basically, I propose that all users help run the network. Lemmy is a
big step forward because a lot more of the users can help than before, but
even in Lemmy, only a small fraction of people will choose to make
instances, and you'll still have big instances serving lots of
content. I propose to make it trivially easy for the end-users to
carry the load. They can install an app on their phones, or a browser
plugin, or run something on their home computer, but they have
absolutely trivial ways to use their hardware to add load capacity. I
think instance load will drop substantially just from that option
existing. I would actually argue for taking it a
step further and having instance operators be able to require
load-carrying by their users, but that's a choice for the individual
operators and the community, based on observation of how this all
plays out in practice.
One Implementation
It's easy to talk in generalities. I'm going to describe one
particular way I could envision this being implemented. This proposed
approach is actually not specific to Lemmy -- it would benefit Lemmy
quite a lot I think, but you could just as easily use this technology to
carry load for a Mastodon instance or a traditional siloed web
site. It's complementary to Lemmy, but not specific to it. Also, this
is going to be somewhat technical, so feel free to just skip to the
next section if you're just interested in the broad picture.
So like I said, I propose to make peer software that provides capacity
to the system to balance out the load you're causing as an
end-user. The peer is extremely simple -- mostly it runs a node in a
shared data store like IPFS or Holepunch, and it serves
content-addressable chunks of data to other users. You can run it as
an app on your phone if you have unlimited data, you can run it as a
browser plugin (which speeds up your experience as a user, since it'll
have precached some of the data the app will need), you can run it on your
computer back at home while you access Lemmy from the road, etc. The
peer doesn't need to be trusted (since it's serving
content-addressable data that gets double-checked), and it doesn't
need to be reliable or always on. The system keeps rough track of
how much capacity your peer(s) have added, and as long as that roughly
balances what you've consumed as a user, you're fine if your peer goes
away for a couple of days or something.
When you, as a user, open your Lemmy page served by the instance, what
you get served back is tiny: Just a static chunk of bootstrapping javascript, a
list of good peers you can talk to, and a content hash of the "root"
of the data store. What the bootstrapping code does is start at
the "root" of what it got told was the current state of the content,
and walk down from there through the namespace, fetching everything it
needs (both the data and the Lemmy app to render it and interact with
it) by making content-addressable requests to peers. Since it all
started with a verified content hash, it's all trustable.
It's important that the bootstrapping code in the browser verifies
everything that it gets from every peer. You can't trust anything you
get from the peers, so you verify it all. Also, you don't trust the
peers to be available -- the bootstrapping code keeps track of which
ones are providing good performance, and doesn't talk to just a single
one, so if one is overloaded or suddenly drops out, the user's
experience isn't ruined. Also, you're able to configure a peer you're
running to always keep a full mirror of some part of the data store
that you're invested in. That's vital, because this system can't
magically make all data always available without anyone
thinking about it -- it just decouples (1) an instance you can always
reach, which is probably on paid hosting, from (2) a peer which
provides the heavy lifting of load capacity, but might drop out at any
time, i.e. can run on unmetered consumer internet. You as a moderator
still need to ensure that (1) and (2) are both present if you want to
ensure that your content is going to exist on the system.
The end result of this is that the end-user's interaction with the
system only places load on the instance when it first fetches the
bootstrapping package. My hope is that this load is small enough that you can run a fairly
busy instance on a $20/month hosting package, instead of paying
hundreds or thousands of dollars a month. Also, like I said, I think culturally it would be way better if running a peer
was a requirement to access the instance. That's up to the individual
instance operators, obviously, but to me people shouldn't just be
entitled to use the system. They have to help support it if they're
going to add load (since it's become trivial enough that that's
reasonable to ask). Aside from ensuring load capacity, I actually
think that would be a big step up culturally -- look at the moderation
problems every online forum has right now because people are empowered
to come onto shared systems and be dicks. I think having your use
of the system contingent on fulfilling a social contract is going to
empower the operators of the system a lot. If someone's being
malicious, you don't have to play whack-a-mole with their IP addresses
to try to revoke their entitlement to be there -- you just remove
their status as a peer and their privilege to even use the system
you've volunteered to make available in the first place.
I've handwaved aside some important details to paint the broad
picture. How do updates to the content happen? How do you index the
data or make it relational so you make real apps on top of this? How
do you prevent malicious changes to the data store? How is a peer that's
port-restricted or behind NAT still able to help? These are obviously
not minor issues, but they're also not new or extraordinary
challenges. This is already long enough, so I'll make a separate post
addressing more of the nitty-gritty details.
What's the Result?
So to zoom back out: One result, hopefully, is that the experience
becomes faster from the end-user perspective. Hopefully. I believe
that the increase in capacity will more than make up for the slowness
introduced by distributing the data store, but that's just theory at
this point. I would also argue that this will start to open up
possibilities like video streaming that are hard to do if instances host all the content. But regardless of
that, I think big popular instances not having to pay ever-increasing
hosting costs is huge. It's necessary. It's not a trivial
benefit. And, in addition to that and the cultural issues, I think
this improves the overall architecture of the system in one more very
significant way:
Because the Lemmy app itself becomes static (AJAX-utilizing javascript
which exists fully within the shared data store), it becomes trivial
to make your own custom changes to the app even if you don't want to
run an instance. You
can clone the Lemmy app in the data store, make revisions, and then
tell the system that you want to see your same data but rendered with
the new version of the web app. Ultimately the entire system becomes a
lot more transparent and flexible from a tech-savvy user's
perspective. You don't have to interact with "the Lemmy API" in the
same way people had to interact with "the Reddit API" -- your modified or independent app just
interacts directly with the data. This is a huge shift further in the same
direction that started with federating the servers in the first
place. Part of the further future, beyond this document, is the
possibility of opening up a lot more tinkering for
tech-savvy end users, and expanding what even non-techy end users
would be able to do with the apps they're interacting with.
Getting It Done
I think I'm hitting a length limit, so I'll fill in the details of the first steps I want to take down in the comments.