As the internet keeps growing at an exponential rate, it is also imploding in on itself at an even faster one. Companies go out of business, their servers are shut down, and with them websites, important history and a part of the internet die.
We have all seen them. The horrible walls. The end of the internet. A hole. A page which says “404”.
That beautiful picture of your family you took on your trip to Gran Canaria last year and uploaded to that file sharing website can be gone in an instant. Poof.
How can we prevent such important parts of our lives and history from disappearing in front of our eyes? By archiving these websites and creating mirror websites. The Wayback Machine, run by the non-profit Internet Archive, is just one example of how we can make copies of the internet. However, we need to do this at a much higher rate.
Websites close down so often, and are archived so infrequently, that it is impossible today to archive them all without missing a whole bunch which just… disappeared.
Please donate to organizations such as the Internet Archive to support their work archiving the web of information. For everyone.
FindArticles.com was a great website which was functional until late 2012. It archived journal articles, newspapers and books of all sorts. At this point in time Wikipedia has over 20 000 links to FindArticles.com, most of them as sources and references. The only downside is that the website is dead. Most of the time when a link dies, the page has been archived on other websites such as archive.org (the Wayback Machine), and this website was, until September 2012. At that time the website’s robots.txt was changed, and all archived copies on websites which follow web etiquette were deleted. The website doesn’t return a 404 either, which makes it hard for tools to mark the links as dead.
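Because a dead site like this does not answer with a 404, a link checker has to look beyond the status code. The following is only a minimal sketch of that idea, not how any existing Wikipedia tool works: the function names, the phrase list and the redirect-to-homepage heuristic are all assumptions made for illustration. The second function reads a response shaped like the one the Internet Archive's availability endpoint (https://archive.org/wayback/available?url=...) returns, to see whether an archived copy is on offer.

```python
import json
from urllib.parse import urlparse

# Phrases that often show up on "not found" pages served with status 200.
# This list is an illustrative assumption, not an exhaustive one.
SOFT_404_PHRASES = ("page not found", "no longer available")


def looks_dead(status, requested_url, final_url, body):
    """Guess whether a link is dead, even if the server did not say 404."""
    if status >= 400:
        return True
    requested = urlparse(requested_url)
    final = urlparse(final_url)
    # A deep link that ends up on the bare homepage has usually lost its content.
    if requested.path not in ("", "/") and final.path in ("", "/"):
        return True
    text = body.lower()
    return any(phrase in text for phrase in SOFT_404_PHRASES)


def closest_snapshot(availability_json):
    """Return the archived URL from an availability response, or None."""
    data = json.loads(availability_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Both functions work on data that has already been fetched, so they can be tried without touching the network. A link for which looks_dead returns True and closest_snapshot returns None is the worst case described above: gone, and with no archived copy to fall back on.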
Most of the links from Wikipedia to FindArticles are for journal articles, which most likely have DOIs or PMIDs, but these are not mentioned in the articles. This causes some problems. Either we remove all links to FindArticles.com without any thought of the consequences, or we look at each article one by one, try to find another copy of the journal article online or its identification numbers, and then remove the links.
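The one-by-one approach at least starts with a mechanical step: finding the citations that point at FindArticles and carry no identifier yet. Here is a rough sketch of that step, assuming the references use a flat {{cite journal}} template with url, doi and pmid parameters. The function name is made up for this example, and the regular expressions are deliberately simple, so they will miss nested templates and other citation styles.

```python
import re

# Matches a flat {{cite journal ...}} template (no nested templates inside).
CITE_RE = re.compile(r"\{\{\s*cite journal\b[^{}]*\}\}", re.IGNORECASE)
# Matches a |doi= or |pmid= parameter that actually has a value.
ID_RE = re.compile(r"\|\s*(doi|pmid)\s*=\s*[^\s|}]", re.IGNORECASE)


def citations_needing_ids(wikitext):
    """Return cite journal templates that link to FindArticles and lack a DOI or PMID."""
    hits = []
    for match in CITE_RE.finditer(wikitext):
        template = match.group(0)
        if "findarticles.com" in template.lower() and not ID_RE.search(template):
            hits.append(template)
    return hits
```

The templates it returns are the ones that still need a human, or a bot, to look up a DOI or PMID before the dead link can safely be dropped.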
Normally you could run Citation bot on an incomplete citation and have it find and fill in information which is not already in the reference. However, the bot is currently blocked due to Wikimedia’s decision to switch to HTTPS without real consideration of how it would break all the tools currently in place for day-to-day operations.