Preface: This is a fascinating and helpful article written by a user named Lokhir, who had a very short stay on the Mariana Bay forum: he vanished shortly after posting it. Little did he know this article would go on to be one of the best on the site. Wherever you are, Lokhir, I wish you well, and I hope one day you come back to the site to share your wisdom with us once more. Here’s a link to the original thread for those interested (https://forum.marianabay.com/threads/data-hoarding-thread-guide.162/). You’ll notice the current author is listed as ThirdyAughtSix; this was done for preservation’s sake.
Anyways, without further ado, onto the original guide… (Lokhir used a lot of unique formatting in this article, so forgive me for not perfectly replicating it here; this guide is best experienced through the forum link above.)
Welcome to the data-hoarding thread (and guide), where I will explain what data-hoarding is, why you should do it, and how.
(This guide will be edited over time. Consider it not entirely finished. The thread is also for general discussion of data-hoarding; the guide is just a bonus.)
You might wonder:
What is data-hoarding?
If you type “data-hoarding definition” on Goolag™, you’ll be given a somewhat negative definition, going as far as to describe data-hoarding as a “mental disorder” or “compulsive behavior”.
In reality, data-hoarding is, in my own definition, the process of downloading and storing every single tidbit of internet media you like or come across. It is more of a value/principle and a hobby.
I didn’t get into data-hoarding because I had some form of “compulsive disorder”, but who knows. In simpler words: if you like something on a platform like TheyTube or on any website (series, movies, anime, imageboard threads, blogs, texts and such), you download it and store it on a hard drive or similar (USB sticks are rarely used, since few have terabytes of capacity, those that do are much more expensive, and they are easier to lose in a corner of your room and tend to fail more often).
But that raises the following question:
Why should you data-hoard?
“Why would I want to download EVERYTHING (or not) I like on the Internet?” you might ask yourself. And that is a valid question.
Well, for several reasons:
1. Because everything on the Internet gets deleted at some point.
That’s right: contrary to the (false) saying that “nothing disappears from the Internet”, everything does disappear at some point. It may be to save storage, or because the person(s) who uploaded something suddenly want it deleted (which happens much, much more often than you think), for example because of a controversy related to it.
Or because the service that hosted it went bankrupt or faced legal issues (many file hosts have recently gone offline, as have torrent websites like rarbg, which was recently shut down), or because the owners grew tired of it, or for obscure IRL reasons… there are millions of reasons, really.
Or sometimes because things (such as software) could only run or were only available on a very old OS/machine, and are today nearly impossible to find or do not work even on emulators (such is the case for the iMac TTS, a text-to-speech software famous for its use in the anime Serial Experiments Lain [which I’m not necessarily a fan of]).
Literal hundreds of petabytes (if not more) have been lost over time since the beginning of the Internet. And to show that you never know when things might get deleted: at one point I data-hoarded a TheyTube channel, and merely a week later the owner deleted all of the videos with no prior warning. I now only have 50% of the videos on my hard drive, which sucks, although I am probably the only person who still has them.
You never know when the things you like can disappear.
Another example is the way imageboards handle threads. They keep a limited number of threads active, retaining only those that are bumped or get the most interaction, and deleting threads that have been inactive for a long time to save disk space. As a result, an insane number of threads have been deleted to this day; fortunately some are available on archive websites or in torrent collections of threads, or, rarely, survive as screenshots reposted somewhat randomly by other people.
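As a side note, Goolag™ has announced that it will start deleting accounts that have been inactive for more than 2 years in December 2023. It later said that accounts with TheyTube videos would be spared for now, but policies change (goodbye petabytes of Internet memes and Y2K history if they do). So, another reason to start data-hoarding.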
Data-hoarding is also often the only way lost media ever gets found (another interesting topic you might like, which I will not cover in this guide, however).
2. Because of censorship.
As most of you already know, platforms like TheyTube have changed their policies over time, becoming progressively more restrictive and pushing their own agenda. As a result, many TheyTube channels (among others) were banned, videos were censored (and, more recently, age-restricted), and some were deleted for copyright infringement, with no hope of ever recovering them (unless they were archived, but more often than not they are not if the channel has fewer than 5k or 10k subscribers).
On top of that, TheyTubers generally do not keep all of their videos on their own hard drives. To give you an example, I have some videos I recorded with OBS on my hard drive, and something as simple as 12 minutes of HD gameplay of trolling in L4D2 takes nearly 3 gigabytes, though that’s just a single case and the result varies in general. Imagine the TheyTube channels that have hundreds of these videos, with every TheyTube video defaulting to 720p and some having 2K or even 4K quality available.
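To put rough numbers on that, here is a minimal sketch in Python. It assumes every video is about the size of the 12-minute, 3 GB recording mentioned above; the count of 300 videos is a made-up figure for illustration, not a measurement of any real channel. What a platform serves after re-encoding will usually be smaller than a raw OBS recording, so treat this as an upper-end estimate.

```python
def gb_per_minute(size_gb: float, minutes: float) -> float:
    """Average size of one minute of footage, in gigabytes."""
    return size_gb / minutes


def estimate_storage_gb(gb_per_video: float, video_count: int) -> float:
    """Rough total size, assuming every video is about the same size."""
    return gb_per_video * video_count


# Figures from the example above: a 12-minute recording of about 3 GB.
rate = gb_per_minute(3.0, 12.0)

# Hypothetical channel with 300 similar videos (made-up count).
total = estimate_storage_gb(3.0, 300)

print(f"{rate:.2f} GB/min, about {total:.0f} GB for 300 videos")
# prints: 0.25 GB/min, about 900 GB for 300 videos
```

In other words, a single channel of that kind could fill most of a 1 TB drive on its own.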
And anything that is remotely politically incorrect or contains swear words or “nudity” (more of an excuse, since videos banned for that reason generally do not contain explicit nudity, while far more suggestive channels never seem to have any problems) gets banned too.
3. Because you can watch/access the things you data-hoard whenever you want.
With data-hoarding, you can access everything you downloaded whenever you want, using no bandwidth, and without depending on some corporation’s platform, website or ToS. (Generally) no one can delete what you have on your hard drive, no matter what it is. You don’t need to send information to anyone, not even your ISP, when you look at the things you data-hoard (with the caveat that your OS technically could know through data collection, or possibly three-letter agencies, given the closed management firmware that ships in most CPUs and motherboards).
4. Because you can transfer the things you data-hoard to whatever devices you want.
If you downloaded some .mp4 or .mkv files (videos), you can transfer them from one disk to another, to your phone or your tablet, and share them with your friends if you want, since the things you have on your disk drive cannot be deleted by anyone else (the only exception being that you can still lose them if your hard drive is damaged, which I’ll talk about later on).
5. Because it stands for web preservation and archiving.
If you’re more of a moral/values person, it also stands for web preservation and archiving, allowing media, history and such to be kept. Most of the internet’s history, drama and so on is documented by data-hoarders, who are (to various degrees depending on the person) archivists.
??. And hundreds of other reasons.
There are many other reasons one might want to data-hoard, which either do not come to my mind right now or are unknown to me. The reasons I gave are those that generally apply to me; other people may have many other reasons.
Convinced? If so, onto the next and last section of the guide.
How to data-hoard?
“Data-hoarding is cool and all but you still didn’t tell us how to do it.” Yup. And that’s what I’m going to do. Keep in mind that I am nowhere near a professional data-hoarder; I only started data-hoarding about 1-2 years ago, which is VERY late by data-hoarding standards. What I’ll give you is more of the way *I* data-hoard, and there may be many more programs or websites I do not use or talk about. This guide is more of an introduction to data-hoarding, a beginner’s guide really (which will get updated with time).
Data-hoarding is primarily done with:
– Software, be it web crawlers, torrent clients or download managers
– Archive websites, sometimes torrent/file-sharing websites, and a bunch of others.
I will now list the websites and software that I use to data-hoard and briefly explain how they work, with a few screenshots and small tutorials to get you started.
DISCLAIMER: I data-hoard on Windows 10, so there may be small differences in performance, installation process or even available software on Linux.
Jdownloader, or the data-hoarder’s best friend:
Jdownloader is the software I use most often. It is free, open-source, and with it you can download nearly everything you wish. It is written in Java (ew, I know). It can download TheyTube links and files from file-hosting websites, and it supports accounts (for example for premium file-hosting websites, private trackers and the like). You can tinker with the settings to choose where downloads go, how folders are created, and what quality to use for TheyTube videos (by default it takes 720p, if I remember correctly). It even downloads descriptions as .txt files, as well as subtitles as .srt files (generally the automatically generated ones).
All in all, a very good data-hoarding software. It also has a clipboard function (enabled by default) that grabs every link you copy and automatically adds it to the list of links to download (you can turn that off if you want). It’s perfect for downloading multiple TheyTube videos: you just copy the link and it grabs it. That works well for your liked-videos playlist (which is private by default), where you can simply right-click, copy, and repeat (still quite a long process, but much shorter than using third-party websites to “convert” each video), although I never tried putting my account login into the software.
Link for Jdownloader: https://jdownloader.org/jdownloader2
Choose your OS; it will redirect you to MEGA, and from there you just download and run the installer.
Don’t mind the very old layout and UI; it is legit and good software.
How to use it:
Jdownloader is very easy to use and only revolves around three parts:
– Downloads section
Here is what the downloads look like (with copious amounts of sometimes unnecessary captions):
*Editor’s note: It’s occurred to me that the original images from this article have been lost, and I can’t find a backup of them in the article’s history on the forum either. This is an unfortunate loss, but the guide is still awesome nevertheless.*
Then, the settings. You don’t really need to tinker with them if you just want to download links and whatnot, but I changed one setting so that the file name includes the date on which the original file (the one I downloaded) was uploaded to the internet, which is useful for dating TheyTube videos. If you want to do that, go to the Settings tab, then Plugins, choose the TheyTube.com plugin, scroll down until you reach “filename & packagername”, scroll some more, and in the filename field for video files make sure the following is written:
*3D* *360* *VIDEO_NAME* (*H*p_*FPS*fps_*VIDEO_CODEC*-*AUDIO_BITRATE*kbit_*AUDIO_CODEC*)*DATE_UPLOAD* *DATE*.*EXT*
and you should be good. But again, that’s not necessary, and I’ve had a few problems with it, so make a backup of the original line in case you run into trouble.
Now that Jdownloader is out of the way, let’s talk about the second thing I use the most. It technically isn’t made strictly for data-hoarding.
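To see what a pattern like the one above produces, here is a small Python sketch that fills in the *TOKEN* placeholders. It is only an illustration: the `expand` function and every sample value are hypothetical, this is not how Jdownloader itself is implemented, and what each date token actually resolves to has not been verified here. It also assumes the *3D* and *360* tokens are empty for ordinary videos.

```python
import re

PATTERN = (
    "*3D* *360* *VIDEO_NAME* "
    "(*H*p_*FPS*fps_*VIDEO_CODEC*-*AUDIO_BITRATE*kbit_*AUDIO_CODEC*)"
    "*DATE_UPLOAD* *DATE*.*EXT*"
)


def expand(pattern: str, values: dict) -> str:
    """Replace each *TOKEN* with its value; unknown tokens become empty."""
    def substitute(match: re.Match) -> str:
        return str(values.get(match.group(1), ""))

    filled = re.sub(r"\*([A-Z0-9_]+)\*", substitute, pattern)
    # Empty tokens leave stray spaces behind, so collapse and trim them.
    return re.sub(r" {2,}", " ", filled).strip()


# Hypothetical sample values, chosen only to show the shape of the result.
sample = {
    "VIDEO_NAME": "L4D2 trolling",
    "H": "720",
    "FPS": "30",
    "VIDEO_CODEC": "h264",
    "AUDIO_BITRATE": "128",
    "AUDIO_CODEC": "aac",
    "DATE_UPLOAD": "2012-05-01",
    "DATE": "2023-07-12",
    "EXT": "mp4",
}

print(expand(PATTERN, sample))
# prints: L4D2 trolling (720p_30fps_h264-128kbit_aac)2012-05-01 2023-07-12.mp4
```

The point is simply that the upload date ends up inside the file name, so a folder of downloads can be dated at a glance.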
Qbittorrent, or the free sailor’s ship:
Qbittorrent, which most of you probably already know, is a torrent client. It’s once again free and open-source, and without the sketchiness of the garbage-tier ex-crypto-miner malware that is utorrent.
But why would we want to use a torrent software for data-hoarding?
Well, because there are many torrents out there that are collections of all sorts of stuff, for example 20 GB of 4chan threads from 2009-2012. If you search well, you can download multiple collections of archived stuff, TheyTube videos, etc.
Or you can simply download series, anime and such, which TECHNICALLY still counts as data-hoarding.
Link for Qbittorrent: https://www.qbittorrent.org/download
Qbittorrent is even easier to use than Jdownloader, so I won’t be providing a guide here. Just one tip, however: once you have installed Qbittorrent, I recommend that you go to Tools -> Options (or just click the cogwheel, easier like that), open the BitTorrent section and tick “Enable anonymous mode”.
Then, onto the third thing I use the most to data-hoard.
Archive.org, the heaven of all data-hoarders:
archive.org/ (<- link) is a website devoted to web preservation, unfortunately facing legal problems (lawsuits) over ‘copyright issues’. Hopefully it will stick around, but I still recommend you download as much stuff as you can from there (you can even pair it with Jdownloader). Always start from the principle that if something can be deleted, it will be at some point.
You can find an INSANE amount of things there, be it books or TheyTube videos. It even includes the Wayback Machine, which lets you browse snapshots of websites (if users were kind enough to take snapshots of them), some dating back to the early 2000s.
The only problem with archive.org is that it is difficult to navigate and to find what you are looking for: the search bar isn’t that great, and you’ll definitely have to be patient (or lucky) to some degree.
Therefore, you can also use an alternative, which is to type into Goolag™, in quotation marks, the things you want to find, for example “lost media” “playlist” site:archive.org. This only shows pages that contain the words in quotation marks and limits results to the website archive.org. A much quicker alternative indeed.
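If you run this kind of search often, the query can be assembled programmatically. The Python below is a minimal sketch: the helper names `build_site_query` and `search_url` are hypothetical, and it only builds the query string and a search URL, it does not fetch or parse any results.

```python
from urllib.parse import urlencode


def build_site_query(phrases, site):
    """Wrap each phrase in quotation marks and restrict results to one site."""
    quoted = " ".join(f'"{phrase}"' for phrase in phrases)
    return f"{quoted} site:{site}"


def search_url(query, base="https://www.google.com/search"):
    """Turn a query string into a URL you can paste into a browser."""
    return f"{base}?{urlencode({'q': query})}"


query = build_site_query(["lost media", "playlist"], "archive.org")
print(query)
# prints: "lost media" "playlist" site:archive.org
print(search_url(query))
# prints: https://www.google.com/search?q=%22lost+media%22+%22playlist%22+site%3Aarchive.org
```

Swapping the list of phrases or the site name gives the same exact-phrase search for any other archive.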
HTTrack, the website downloading software:
Did you ever tell yourself “man, I’d really like to have this whole website available at any time on my PC” and think it impossible? Sure, you could use archive.org, but it is clunky and slow to use, doesn’t let you access the website offline or without third parties, and gives you no guarantee that archive.org itself will be around forever.
I present to you WinHTTrack (the Windows version of HTTrack), an open-source offline browser (and web crawler/downloader) for websites. It is quite old, and its last update dates back to 2017, but it still works perfectly.
Link for HTTrack: https://www.httrack.com/page/2/en/index.html
Both the website and the software UI are very old-looking, but it still works just fine and is legit. Do note that you need to make a separate folder for each website you wish to download. Since this software isn’t too hard to use either, I’m not going to post a guide for the time being; it’s very accessible anyway. Also, if you can’t decide between the actions “download web site(s)” and “download web site(s) + questions”, just choose the first option.
Tips for data-hoarding:
– ALWAYS back up the stuff you download to other disk drives. You never know when a disk drive might decide to die for some random reason.
– IF you decide to use servers instead of disk drive storage (which is also possible in data-hoarding), you need to be prepared to handle security and set a solid password, otherwise there is a chance that you will lose your files or possibly more. You’d be amazed at how many servers have been public for a decade and contain a lot of stuff anyone can download (even if it was not supposed to be so) because of security issues. In my opinion, disk drive storage without internet access is the most secure and efficient way, but whatever floats your boat of course.
– Consider buying additional disk drives (externals are a good way to do so) if you aim to download a lot of stuff.
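The first tip can be partly automated. As a minimal sketch (not a replacement for proper backup software), the Python below copies a folder to a second drive and compares SHA-256 checksums to confirm that each copy is intact. The function names and the example paths in the comments are hypothetical, and it assumes the destination is not located inside the source folder.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large videos do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(source, destination):
    """Copy every file under source into destination.

    Returns the list of source files whose copy does not match the original.
    """
    source, destination = Path(source), Path(destination)
    mismatched = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        dst_file = destination / src_file.relative_to(source)
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also keeps timestamps
        if sha256_of(src_file) != sha256_of(dst_file):
            mismatched.append(src_file)
    return mismatched


# Example call with hypothetical paths (an internal drive and an external one):
# failed = backup_tree("D:/hoard", "E:/hoard-backup")
# print("all copies verified" if not failed else f"check these: {failed}")
```

An empty result means every copied file hashed the same as its original; anything listed should be copied again before you trust the backup.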
And that’s basically it for the guide!
Don’t hesitate to ask questions or even just talk about data-hoarding in general; that’s why I made this thread after all, it’s not just about the guide.
Originally posted on the Mariana Bay Forum on 12/7/2023 (https://forum.marianabay.com/threads/data-hoarding-thread-guide.162/)
Credit again to Lokhir, the OG author of this piece.