this post was submitted on 19 Oct 2023
91 points (100.0% liked)

Technology


Related:

Major cyber attack could cost the world $3.5 trillion - Power Grid, Internet Outage

The one database/file/zip to save humanity, what is it?

Show Lemmy the downloadable URL of a Database or AI you know of so we can have a local backup copy that will improve the resilience and availability of Human Knowledge.

Given the state of AI being Corporatized I think we could definitely use links for whatever comes closest to a fully usable Open Source, fully self-contained downloadable AI.

Starter Pack:

★ Lemmy List

Databases

AI

top 14 comments
[–] [email protected] 30 points 1 year ago* (last edited 1 year ago) (3 children)

This is too much catastrophism for my taste, but if I wanted to start archiving, I'd start by downloading Wikipedia, Library Genesis and Project Gutenberg.

Videos are too heavy to archive with ease, and they probably hold much less actual knowledge for their size.

[–] [email protected] 9 points 1 year ago (2 children)

Haven't heard about the Gutenberg project before, seems pretty neat!

I'd probably add repair.wiki to a list of things I'd archive, although some of that content is picture-heavy, so it's not as easily compressible as Wikipedia.

There was a project that lets you download Wikipedia and some other online resources into an easy to search and navigate UI. I think it was called Kiwi something, but I can't remember. It was targeted at regions with poor internet coverage.

[–] [email protected] 6 points 1 year ago

Yup, Kiwix, an app available for Android, iOS, Linux and possibly other OSs too.

[–] [email protected] 4 points 1 year ago

Project Gutenberg has been a thing for a couple of decades. I think they are also starting to create free audiobooks from books they have in their collection. There is a TTS AI service that I checked out a week ago (play.ht) that voiced the text I gave it very realistically, and I might spend $40 for a month of that service and build some audiobooks. The paid version gives access to more voices and will do 1 million characters of text a year.

Or if anyone knows a good open source online alternative, I'm all ears. I'd prefer to go that route but did not find anything that was a very good solution.

[–] [email protected] 8 points 1 year ago (2 children)

Humanity has been using writing for millennia. It's a proven technology. Photographs and video don't tend to last longer than the one institution or family that cares about them.

[–] [email protected] 2 points 1 year ago

Mostly due to previous physical constraints, I would argue. Thankfully there are fewer chances your hard drive is going to decompose into vinegar while sitting in your cupboard, and even if it does, it's likely not the only copy.

They're also more limited for current data because they're harder to parse and convert into other usable formats, but thankfully that will get better over time too.

I still prefer text-first data for various reasons, but let's not dismiss the leagues of potential video has for communication and archival value, both intentional and unintentional.

[–] Taleya 2 points 1 year ago

Plus, writing dgaf if you get hit with a Carrington event.

[–] [email protected] 7 points 1 year ago

Perhaps think of it more as knowledge decentralization as a form of resiliency for unplanned network outages. Sometimes the library of Alexandria just happens to catch fire, and it might be nobody's fault at all.

Besides, plenty of people grew up in families with a basic encyclopaedia or dictionary or a repair manual. This is essentially the same thing, just with less paper.

[–] [email protected] 8 points 1 year ago

I'm particularly looking for anyone that already has a collection of arXiv and Sci-Hub papers. Please curate your collection and make it available here!

We also need a brief, catchy hashtag/topic/keyword for this project that we can also use for a GitHub search, etc. Anyone?

[–] [email protected] 4 points 1 year ago (2 children)

Is it possible to download an archive of Sci-Hub?

[–] [email protected] 12 points 1 year ago (2 children)

Sci-Hub is ENORMOUS, about 100TB. If you want to help preserve it, you can torrent and seed one of their many 100GB chunks.
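
For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 1000 GB) and treats the "about 100TB" and "100GB" figures above as exact, which they are not; the function names and the 1 TB of spare disk are made up for illustration.

```python
# Rough sizing only: the figures are the approximate ones quoted above,
# and decimal units are an assumption.
GB = 10**9
TB = 10**12

def chunk_count(archive_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to cover the archive (rounded up)."""
    return -(-archive_bytes // chunk_bytes)

def chunks_that_fit(spare_bytes: int, chunk_bytes: int) -> int:
    """Number of whole chunks a seeder can hold in the given spare space."""
    return spare_bytes // chunk_bytes

total = chunk_count(100 * TB, 100 * GB)       # chunks in the whole archive
mine = chunks_that_fit(1 * TB, 100 * GB)      # hypothetical seeder with 1 TB spare

print(f"{total} chunks in total, {mine} fit in 1 TB ({mine / total:.1%} of the archive)")
```

So under these assumptions, a single seeder with 1 TB to spare covers roughly one percent of the collection, which is why spreading the chunks across many people matters.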

[–] [email protected] 1 points 1 year ago

Super cool, never knew about this. I've got probably 1-2 TB I can spare for the effort.

[–] [email protected] 1 points 1 year ago

What a fantastic resource, this is exactly what is needed. I also found out about The Standard Template Construct Library:

"Learn about how to access large corpus of high-quality scholarly texts using Python and use them in AI apps"

[–] [email protected] 1 points 1 year ago

Does anyone know if an LLM has been trained on something like Sci-Hub?