What is the Wayback Machine? Explaining how to utilize it
contents
- 1 What is the Wayback Machine?
- 1.1 How to Use the Wayback Machine
- 1.2 Searching by URL
- 1.3 Searching by Keyword
- 1.4 Searching for Image, Book, and Video Content
- 1.5 How to Archive a Website on the Wayback Machine
- 1.6 How to Remove Pages Stored in the Wayback Machine
- 1.7 Preventing Archival by the Wayback Machine
- 1.8 How to Restrict Access
- 2 How to Utilize the Wayback Machine
- 3 Summary
The Wayback Machine is a web tool, operated by the non-profit organization Internet Archive, for viewing archived versions of web pages.
In the fast-paced world of the internet, sites are constantly being updated, and older articles may be replaced or removed. However, thanks to the Wayback Machine, which automatically saves site information, users can freely browse past content at no cost.
What is the Wayback Machine?
The Wayback Machine is a free tool that enables users to view and save old web pages. Internet Archive, its parent organization, is a non-profit entity that collects vast amounts of data from the internet and stores it in its database. While it’s commonly thought of as a repository for web pages, it also archives data from platforms like Twitter, as well as music, movies, and books.
Above the site’s search input field, you’ll find the statement ‘Explore more than 685 billion web pages saved over time,’ indicating that over 685 billion pages were archived as of May 2022.
How to Use the Wayback Machine
There are three main ways to use the Wayback Machine:
- Search by URL
- Search by keyword
- Search for images, music, videos, etc.
Given the vast amount of data stored, it’s advisable to use the method that best fits what you’re looking for. However, searching for past sites by URL and browsing through their historical records is particularly popular.
Searching by URL
The image above shows the results when ‘https://www.yahoo.co.jp/’ was entered and searched.
As expected of such a massive site, it was archived a whopping 244,190 times between 1996 and 2022. Whether you look at the timeline or the calendar, you can see a considerable number of captures.
In the example above, we entered the URL of the homepage, but you can also input the URL of a specific page you want to see the past versions of.
Clicking on the timeline allows you to browse through the calendar for that particular year, and from the calendar, you can select the date and time of the archived data you want to see (refer to the image below).
For instance, the left image below is Yahoo! Japan’s homepage from 2000, and the right image is from 2010. While the basic functions remain the same, you can see significant changes in design compared to the current one.
Additionally, while Yahoo! JAPAN, being a large-scale site, is saved very frequently, it’s common for smaller sites to be saved only once or twice a month, as shown in the diagram below.
Although the Wayback Machine stores massive amounts of data from around the world, that doesn’t mean past data for every site is always available.
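The URL search described in this section can also be reached by typing an address directly. The sketch below assumes the commonly seen address pattern for archived captures (a prefix, a 14-digit timestamp, then the original URL). The prefix constant and both helper names are illustrative assumptions, not something the Wayback Machine help page is quoted as specifying here.

```python
from datetime import datetime

# Assumed address prefix for archived captures (not stated in the article).
WAYBACK_PREFIX = "https://web.archive.org/web"


def capture_url(page_url: str, when: datetime) -> str:
    """Return the address of the capture nearest to `when` (hypothetical helper)."""
    # Wayback addresses carry a 14-digit timestamp: YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def overview_url(page_url: str) -> str:
    """Return the address of the timeline/calendar overview for a page."""
    # A "*" in place of the timestamp asks for the list of all captures.
    return f"{WAYBACK_PREFIX}/*/{page_url}"


print(capture_url("https://www.yahoo.co.jp/", datetime(2010, 1, 1)))
print(overview_url("https://www.yahoo.co.jp/"))
```

If no capture exists at exactly the requested time, the Wayback Machine generally redirects to the nearest one, so an approximate date is usually enough.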
Searching by Keyword
You can also perform keyword searches on the Wayback Machine. Simply enter your keyword in the search box to look up information. However, the search results may look different from what you’re used to, as shown in the example above (searching for ‘soccer’). The Wayback Machine is mainly used for URL searches, so keyword search is best treated as a supplementary tool.
Searching for Image, Book, and Video Content
At the top of the page, to the right of the logo, you’ll find a row of icons. Most people browse past data through URL searches and, secondarily, keyword searches. However, these icons also let you explore data other than ordinary websites, such as books, images, and music.
How to Archive a Website on the Wayback Machine
The Wayback Machine is an exceptional tool that automatically archives websites from around the world. However, due to limited archival resources, it’s entirely possible that new sites may not be archived at all, or if they are, the frequency of archiving may be very low.
To address this, you can input the URL of the page you want to save into the ‘Save Page Now’ feature, located at the bottom right of the Wayback Machine’s homepage (see the red box in the image above).
Below the input window, you’ll see the message ‘Capture a web page as it appears now for use as a trusted citation in the future.’ This emphasizes the potential value of saving a page for someone’s reference in the future.
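The ‘Save Page Now’ step can also be triggered without the form. The sketch below assumes that prefixing a page address with `https://web.archive.org/save/` requests a capture. The function names and the `User-Agent` string are hypothetical, and anonymous requests may be rate-limited, so treat this as an illustration rather than a supported client.

```python
import urllib.request

# Assumed prefix for requesting a capture (not stated in the article).
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(page_url: str) -> str:
    """Build the address that asks the Wayback Machine to capture page_url."""
    return SAVE_PREFIX + page_url


def request_capture(page_url: str, timeout: float = 60.0) -> int:
    """Send the capture request and return the HTTP status code.

    Not called in this sketch, because it needs network access.
    """
    request = urllib.request.Request(
        save_page_now_url(page_url),
        headers={"User-Agent": "save-page-now-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


print(save_page_now_url("https://example.com/"))
```

Calling `request_capture("https://example.com/")` would send the actual request. It is left out of the sketch so the example runs offline.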
How to Remove Pages Stored in the Wayback Machine
Regarding the removal of pages stored in the Wayback Machine, the help page states the following:
“How can I exclude or remove my site’s pages from the Wayback Machine?
You can send an email request for us to review to info@archive.org with the URL (web address) in the text of your message.”
Source: USING THE WAYBACK MACHINE
In other words, there isn’t a specific deletion request form. You’ll need to email info@archive.org with the URL you want to be removed, along with a request for deletion in the body of the email.
However, since the operating entity, Internet Archive, is based in the United States, all correspondence must be conducted in English, so take care to word your request clearly.
Preventing Archival by the Wayback Machine
The Wayback Machine is a highly useful tool and is typically used just for browsing.
However, if you need something deleted, you must email the URL you want removed and communicate in English, which can be time-consuming. If you wish to avoid this, you’ll need to take steps beforehand to ensure your content isn’t archived.
However, note that altering the robots.txt file to prevent archival may inadvertently block access from search engines like Google, potentially causing significant negative SEO impacts.
Generally, being archived by the Wayback Machine doesn’t have adverse effects. Change robots.txt only if necessary, and seek assistance from someone knowledgeable about it when you do.
What is robots.txt?
robots.txt is a file that lets you control which parts of your website crawlers, such as those of search engines, may or may not access.
Some people consider using robots.txt to keep their site out of search engine results, but the usual way to do that is to use the ‘noindex’ setting or set up password protection.
How to Restrict Access
By rewriting robots.txt, you can prevent the Wayback Machine’s crawler from accessing your site, thus preventing it from being archived.
The syntax itself is very simple. Adding just the following two lines prevents your site from being archived:
User-agent: ia_archiver
Disallow: /
However, the above settings block archiving of your entire site. If you want to block specific directories only, specify them as follows:
User-agent: ia_archiver
Disallow: /sample-directory/
And if you want to prevent archiving of specific pages only, narrow the path further like this:
User-agent: ia_archiver
Disallow: /sample-directory/sample-file/
To repeat: a mistake in the robots.txt settings can have a negative impact on SEO, so make sure someone who understands the settings handles it.
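One way to reduce the risk of an SEO accident is to test the rules before deploying them. The sketch below uses Python’s standard `urllib.robotparser` to check that a draft robots.txt blocks `ia_archiver` while leaving a search engine crawler untouched. The helper name `is_allowed`, the `example.com` addresses, and the use of `Googlebot` as the search engine user agent are illustrative assumptions, and a real crawler may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Draft rules taken from the directory example in this section.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /sample-directory/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


checks = [
    ("ia_archiver", "https://example.com/sample-directory/page.html"),
    ("ia_archiver", "https://example.com/other/"),
    ("Googlebot", "https://example.com/sample-directory/page.html"),
]
for agent, url in checks:
    print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

If the last check ever reports that the search engine crawler is blocked, the draft contains a rule that reaches further than intended and should be fixed before it goes live.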
How to Utilize the Wayback Machine
While the Wayback Machine is primarily used for viewing past web data, there are various ways to utilize it.
One of the most common uses is for SEO purposes (competitive analysis) and checking for expired domains. With its extensions, you can effectively use it to improve your own site.
Extensions
The Wayback Machine offers various APIs and add-ons, including:
- Wayback Machine Availability API
- Chrome Extension
- Firefox Add-on
- Safari Extension
- MS Edge Add-on
- iOS app
- Android app
Add-ons are particularly useful for frequent users.
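As an illustration of the Wayback Machine Availability API listed above, the sketch below builds a query and reads the closest snapshot from a response. It assumes the endpoint `https://archive.org/wayback/available` with `url` and `timestamp` parameters and a JSON reply containing `archived_snapshots.closest`. The helper names are hypothetical, and `SAMPLE_RESPONSE` is an illustrative payload written for this example, not real output.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Availability API (verify against its documentation).
API_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the query address; `timestamp` is YYYYMMDD or longer."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the address of the closest available snapshot, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Illustrative payload only; a real reply comes from fetching the query address.
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101064348/http://www.example.com:80/",
            "timestamp": "20060101064348",
            "status": "200",
        }
    }
})

print(availability_query("example.com", "20060101"))
print(closest_snapshot(SAMPLE_RESPONSE))
```

An empty `archived_snapshots` object means no capture was found, which is why `closest_snapshot` returns `None` instead of raising an error.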
SEO Strategies (Competitive Analysis)
The Wayback Machine allows you to browse past websites. This means you can see how competitor sites or your own site have changed over time and compare those changes with search rankings to understand why a site rose or fell.
Especially when competitor sites undergo significant changes, being able to see exactly what was changed can help you understand their intentions and consider how to improve your own site.
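To see exactly what changed between two captures, a plain text diff is often enough. The sketch below uses Python’s standard `difflib` on two pieces of page text. The helper name `snapshot_diff` and both sample texts are made up for illustration, and it assumes you have already saved the text of the two captures yourself.

```python
import difflib
from typing import List


def snapshot_diff(old_text: str, new_text: str,
                  old_label: str = "old", new_label: str = "new") -> List[str]:
    """Return a unified diff between the text of two captures."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    ))


# Hypothetical page text from two captures of the same page.
capture_2020 = "Title: Soccer News\nTop stories\n"
capture_2022 = "Title: Soccer News and Scores\nTop stories\n"

for line in snapshot_diff(capture_2020, capture_2022, "2020", "2022"):
    print(line)
```

Lines starting with `-` appear only in the older capture and lines starting with `+` only in the newer one, which makes title or heading changes easy to spot.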
Checking Past Data for Expired Domains
When purchasing expired domains, it’s crucial to know their history. The previous site’s theme may have attracted unexpected external links, or the domain may have been penalized by Google for being used as a spam site.
You won’t know if a domain has been penalized until you purchase it and register it with Google Search Console. However, you can use the Wayback Machine to check what kind of site was run on the domain before you purchase it.
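A domain’s capture history can also be summarized programmatically. The sketch below assumes the Wayback Machine’s CDX server endpoint (`https://web.archive.org/cdx/search/cdx`) with `url`, `output`, `fl`, and `limit` parameters, and a JSON reply whose first row holds the field names. The helper names and the sample rows are hypothetical, so check the parameters against the CDX documentation before relying on them.

```python
from collections import Counter
from typing import Dict, List
from urllib.parse import urlencode

# Assumed CDX server endpoint (verify against its documentation).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query(domain: str, limit: int = 1000) -> str:
    """Build a query that lists capture timestamps and status codes."""
    params = {
        "url": domain,
        "output": "json",
        "fl": "timestamp,statuscode",
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def captures_per_year(rows: List[List[str]]) -> Dict[str, int]:
    """Count captures per year; the first row is expected to be the header."""
    if not rows:
        return {}
    header, *records = rows
    ts_index = header.index("timestamp")
    counts = Counter(record[ts_index][:4] for record in records)
    return dict(sorted(counts.items()))


# Illustrative rows only; real rows come from fetching the query address.
sample_rows = [
    ["timestamp", "statuscode"],
    ["20050101000000", "200"],
    ["20050601000000", "200"],
    ["20190301000000", "301"],
]

print(cdx_query("example.com"))
print(captures_per_year(sample_rows))
```

A long gap between years, or a sudden change in status codes, may hint that the domain changed hands. That is a cue to open those captures and look at the content, not proof of a penalty.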
Viewing Deleted Pages
Because it stores snapshots, the Wayback Machine lets you view not only past versions of existing sites or pages but also ones that have been deleted.
However, keep in mind that sites that no longer exist may have been public only briefly, so they may not have been archived. Consider it something you can view for reference purposes only.
Alternatives to the Wayback Machine
While the Wayback Machine is the most well-known tool for checking web archives, there may be cases where it doesn’t provide the information you need. In such instances, consider using the following alternative tools:
WebGyotaku
WebGyotaku is a website archiving tool similar to the Wayback Machine. The main difference is that while the Wayback Machine automatically registers websites using crawlers, WebGyotaku requires users to input URLs for saving.
However, it can preserve image data and even Flash content almost entirely, so saved pages are reproduced with high fidelity.
Stillio
Stillio is an automated screenshot service. It allows you to capture screenshots of websites at regular intervals.
Archive.today
Archive.today is also a web archive tool. It’s notable for saving an image of each page it archives, but it’s best treated as a supplement to the Wayback Machine.
PageFreezer
PageFreezer is also well-known as an alternative tool to the Wayback Machine. While it is automated, the main difference from the Wayback Machine is that it is a paid service.
Time Travel
Time Travel is unique in that it allows you to search across archive sites. Even for the same domain, the captured appearance may vary from one archive site to another, making it useful when you need information from different perspectives.
Summary
The Wayback Machine is invaluable for accessing a vast archive of past websites for free. Not only can you view information about your own site, but you can also explore the past data of competitor sites, making it incredibly useful for understanding the evolution of websites. While typically used for browsing past information, comparing old and current data can also aid in SEO strategies and competitor analysis.