this post was submitted on 20 Aug 2023
lemmy.ml meta
Anything about the lemmy.ml instance and its moderation.
For discussion about the Lemmy software project, go to [email protected].
founded 3 years ago
Couldn't they, in theory, set up their own instance, federate with all the larger ones, and scrape the data that way? I'm not sure blocking them via robots.txt is the most effective barrier if they really want the data.
Robots.txt is more of an honor system. If they respect it, they won't pull that trick.
Robots.txt is just a notice anyways. Your scraper could just ignore it, no workaround necessary.
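To make the "just a notice" point concrete, here is a rough sketch of what honoring robots.txt looks like from the crawler's side, using Python's standard `urllib.robotparser`. The bot names, the robots.txt content, and the `is_allowed` helper are made up for illustration. The check runs entirely inside the client, so a scraper that skips this call is never stopped by the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and bot name, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler asks before fetching and backs off when told to.
polite = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.org/post/1")

# A bot that the file does not mention is allowed by default.
other = is_allowed(ROBOTS_TXT, "OtherBot", "https://example.org/post/1")
```

Nothing on the server enforces either result. Actually keeping a scraper out would take something like user-agent or IP blocking at the server, and as noted above, even that doesn't cover data that gets federated out to another instance.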