The amount of information on the Internet is growing daily; it is also changing. For many purposes, including cultural and scientific research, storing old versions of news or web pages, audio, or video is essential.
For that reason, many public and private web archives were created to preserve the digital heritage and transfer it to future generations.
The Wayback Machine is the best-known web archive, holding the history of many websites from their launch. Many alternative services can be used alongside or instead of the Wayback Machine, and this article reviews them.
What is the Wayback Machine?
The Wayback Machine is one of the oldest digital archives of websites and is available worldwide, except in China and Bahrain. It was founded in 1996 as a private archive and opened to the public only in 2001. The Wayback Machine is the web archive service of the Internet Archive, a non-profit organization. The Internet Archive now holds 735 billion web pages, 41 million books and texts, 14.7 million audio recordings, 8.4 million videos, and much more, and the numbers keep growing. The project is free, but the developers appreciate donations. The archive is a member of many library associations and federations. Any user can upload media or add a web page to the Wayback Machine.
The Wayback Machine is available as a browser extension (Firefox, Chrome, Safari, and Edge) and as a mobile app (Android and iOS). After registration, you can access web pages, books, and audio, and download some of the content. Books can be borrowed for 14 days and read in Adobe Digital Editions; music can be listened to for free on the website. The website limits archive requests to 15 per minute and book loans to 5 at a time.
The Wayback Machine collects information by crawling the Internet, but only content that the publisher has not restricted and that carries no rights obligations. How often a web page is captured varies from daily to yearly. The home page presents the most popular collections, along with shortcut buttons for books and music. Newcomers to the Internet Archive can visit the FAQs or the Help Center. The service appreciates not only donations but also volunteer work, for example as an Open Library Developer or Open Librarian. The Internet Archive also hosts more than thirty special events a year, where volunteers are welcome. The complete list of projects and opportunities is available on the official website.
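To make the lookup of a captured page more concrete, here is a minimal Python sketch built around the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`). The helper names and the sample response are illustrative assumptions, not part of the original article; the sample is hand-written in the documented response shape and is not a live result. The sketch only builds the query URL and parses a response, so it makes no network calls.

```python
import json
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL asking for the capture closest to `timestamp`.

    `timestamp` uses the form YYYYMMDDhhmmss and may be shortened
    (for example "20060101").
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from a response body, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


# Hand-written sample in the documented response shape (assumption, not live data).
SAMPLE_RESPONSE = """
{"archived_snapshots": {"closest": {
    "available": true,
    "url": "http://web.archive.org/web/20060101000000/http://example.com/",
    "timestamp": "20060101000000",
    "status": "200"}}}
"""

print(availability_url("example.com", "20060101"))
print(closest_snapshot(SAMPLE_RESPONSE))
```

A script that sends such queries in a loop should pause between calls so that it stays within the request limit mentioned above.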
Pros:
- Free usage
- Enormous worldwide archive of websites, books, audio, video, and images
- Help center
- Easy access to archives
- Access to books, web pages, and audio
- Possibility to archive websites and media on request
- Separate applications for iOS and Android

Cons:
- Limits on archive requests
- Some data is stored in violation of copyright laws
- Some web pages may be incomplete while crawling is still in progress
- Slow processing
The Wayback Machine archive can have gaps, or its functionality may not be sufficient for your purposes. It may also hold only part of a web page or process your request too slowly. That is why you may want to look for alternatives. A quick comparison of the eight top alternatives is presented in the comparison table.
Service | Free version | Web page storage | Media storage | Review media on the Archive | Review captured websites on the Archive | Archive per request | Mobile version | Website monitoring | Integration with other Archive resources | Extensions |
---|---|---|---|---|---|---|---|---|---|---|
Wayback Machine | | | | | | | | | | |
Fluxguard | | | | | | | | | | |
Archive.today | | | | | | | | | | Chrome |
Perma.cc | Only for registrar organizations | | | | | | | | | |
Pagefreezer | | | | | | | | | | |
Smarsh | | | | | | | | | | |
GitHub | | | | | | | | | | |
Memento Time Travel | | | | | | | | | | Mozilla Firefox / Google Chrome |
UK Web Archive | | | | | | | | | | |
Fluxguard
Fluxguard is a simple tool for monitoring website changes using artificial intelligence. It can summarize all critical new changes into reports, which you can filter to separate real changes from false positives. It uses GPT-4 for precise change detection. As a cloud-based solution, it takes up little local disk space, yet it still checks millions of web pages. With Fluxguard, you can look for information leaks, errors, and compliance threats, analyze competitors, or find valuable integrations. All monitoring tasks can be automated with simple rules by setting keywords, alerts, and DOM elements. You can set several authorization levels for your team or ask Fluxguard’s Solution Architects to prepare a report for you.
Generated reports include detailed information about the monitored source, including the URL, a screenshot, or the extracted text with the change. You can switch reports to side-by-side HTML views or short case descriptions. Fluxguard also crawls password-protected web pages; following a simple tutorial, you can easily monitor changes there as well. For crawling, Fluxguard renders every page with Google Chrome (dynamic crawling). To better understand the report format, you can check the demos available on the official website; a report may include changes in the code as well as the source code itself, timing, and other detailed information. Upon request, Fluxguard can create a website archive covering five years or more with all changes, texts, screenshots, and HTML.
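The idea of reporting "extracted text with a change" can be illustrated with a small Python sketch that compares two snapshots of a page's text using the standard `difflib` module. This is not Fluxguard's method (the article says it relies on AI-based detection); the function name and sample strings are assumptions chosen only to show what a basic added/removed report contains.

```python
import difflib


def text_changes(old_text, new_text):
    """Return (added, removed) lines between two snapshots of extracted page text."""
    added, removed = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " are unchanged; "? " lines are diff hints.
    return added, removed


# Two hypothetical snapshots of the same page taken at different times.
old_snapshot = "Price: $10\nContact us"
new_snapshot = "Price: $12\nContact us"

added, removed = text_changes(old_snapshot, new_snapshot)
print("added:", added)
print("removed:", removed)
```

A plain line diff like this flags every cosmetic edit, which is why monitoring tools add filtering on top of it to suppress false positives.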
Fluxguard lets everyone start for free by monitoring 50 pages. For more extensive work, there are four other plans. Periscope ($99/month) allows crawling up to 10,000 pages every 5 minutes; Telescope ($199/month) monitors 25,000 pages; Horoscope ($499/month) crawls 100,000 pages; and the Enterprise plan (priced individually) is meant for large teams. The plans differ in the number of users allowed per account, the availability of ChatGPT, and technical support. All paid plans have a 7-day free trial.
Pros:
- Free account with 50 pages to monitor
- 7-day free trials
- Cloud-based tool
- User-friendly
- SaaS infrastructure
- Several training options
- Dynamic website monitoring
- Website archive creation on demand
- Proxy network
- Customizable reports

Cons:
- No firewalls or threat response
- No clear information about privacy protection
- No mobile or desktop versions (Windows, Mac, Linux, Chromebook)
Archive.today
Archive.today is a straightforward tool for storing web pages and checking their history. It was founded in 2012 as a web archiving site. It is reachable under several domains: archive.today (the main one) and the same name with the extensions ph, is, li, vn, fo, and md. All of them lead to the same website. Unfortunately, the tool has no web crawlers; it archives a page only when a user requests it, saving the page content, including images, but not dynamic content. To start archiving, you only need to enter the URL of the page. When the process finishes, you will see the overall history of the page with the changes and their dates. Archive.today provides links to where the snapshots were taken. You can also check the history of a web page if other users archived it months or years ago.
The tool is free and has no usage limits, but the developers appreciate donations. Some archive requests are performed immediately; others are queued, and you need to wait. An archived page cannot be larger than 50 MB. All data is stored in HDFS (the Hadoop Distributed File System). The service is subject to restrictions in some countries (Australia, China, Finland, and Russia).
Once a page has been archived, no one can delete it from the Internet. Archive.today uses the Google and Yandex search engines. The website is helpful for people who want to protect Internet content from blocking or changes. Other users have already saved thousands of pages, and the site had half a million visitors in the last month. It also does not store malicious content, so you can browse website snapshots without worrying about viruses. You can download the free Archive Page extension for Google Chrome to record your bookmarks automatically. You can search all content through the context menu on any archived page.
Pros:
- Free tool
- Records websites on request
- Possibility to check the history of pages that were archived before
- Archived pages cannot be deleted
- Does not record viruses
- Google Chrome extension for automatic archiving
- All recorded data can be indexed
- Stores data in HDFS

Cons:
- A lot of ads
- Does not use web crawlers
- Archived pages cannot be larger than 50 MB
- Records only text and images
- Does not record web pages that require authorization
Perma.cc
Perma.cc is not a conventional archive website; its primary goal is to freeze a web page that your website links to. The tool is very effective against link rot, preventing your website from pointing to invalid or malicious pages. Perma.cc supports Chrome, Safari, Firefox, and IE10+. Archived pages are stored in WARC format. Perma.cc creates an archived copy of the linked page (blog, article, etc.), and the link on your website will always point to this recorded copy. Article authors no longer need to check their links regularly.
Perma.cc is free for all authorized users from registrar organizations, including public libraries, academic law journals, courts, and other legal organizations. Any library can become a Perma.cc partner library, as Harvard Library created this tool for Law School users. Other organizations pay a monthly fee: $10/month (for 10 new links), $25/month (for 100 new links), or $100/month (for 500 new links). A free trial with 10 Perma Links is available to all organizations and individuals. Access to and visibility of Perma Links are the same for all users; the only difference is the number of links.
A detailed guide to creating Perma Links is available on the official website. All links are created using browser tools. To readers of your content, Perma Links look like regular URLs. They lead to the Perma.cc website, which presents the original archived version alongside the current version of the linked page, together with the recording date. In articles, Perma Links can be referenced by a unique number. Be aware that Perma.cc does not record the pages that a referenced page links to (deep links). If you want to set specific rights for archived pages, you can contact the developers’ team.
Pros:
- Free for registrar organizations
- Possibility to delete a link within 24 hours of its creation
- Creates an archived copy of web pages
- Archives are not exposed to search engines
- The user has access to both the archived web page and the current live version
- Easy way to create a record of a web page

Cons:
- No free version for personal use
- No guarantees for how long Perma Links are kept
- Paid subscriptions strictly limit the number of Perma Links
Pagefreezer
Pagefreezer is a multifunctional solution for website archiving and social media monitoring. It also has a separate solution for enterprises to monitor data breaches and misuse of corporate information. To receive prices or detailed information on the Pagefreezer tools, you need to register on the official website. Prices are quoted based on your actual needs and requirements.
Regarding website archiving, Pagefreezer provides automated tools for capturing all changes. The tool automatically takes snapshots of the chosen web pages, building a reliable archive with the date and time of each recording. You can log into your Pagefreezer account at any time and use the calendar to find the saved version from the date you need. The view can be split in two to compare two versions, with deletions highlighted in red and additions highlighted in green. If required, you can run an advanced search across all archived websites and web pages; multiple filters simplify the research. For legal purposes, all archive files carry metadata with SHA-256 digital signatures to prove the authenticity of web pages.
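The role of SHA-256 in proving that a snapshot has not been altered can be shown with a short Python sketch. It is an illustration under stated assumptions, not Pagefreezer's implementation: the function names and record layout are hypothetical, and the sketch computes only the digest. A full digital signature would additionally sign that digest with a private key.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint_snapshot(html_bytes, captured_at):
    """Build a metadata record holding the capture time and SHA-256 digest."""
    return {
        "captured_at": captured_at.isoformat(),
        "sha256": hashlib.sha256(html_bytes).hexdigest(),
    }


def is_unaltered(html_bytes, record):
    """Check that the stored bytes still match the digest taken at capture time."""
    return hashlib.sha256(html_bytes).hexdigest() == record["sha256"]


# Hypothetical snapshot content and capture time.
snapshot = b"<html><body>Terms of service, version 1</body></html>"
record = fingerprint_snapshot(snapshot, datetime(2024, 1, 1, tzinfo=timezone.utc))

print(record["captured_at"])
print(is_unaltered(snapshot, record))                 # same bytes
print(is_unaltered(snapshot + b"<!-- edit -->", record))  # modified bytes
```

Because changing even one byte of the page produces a different digest, a stored digest lets a third party confirm that the archived file is the one originally captured.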
Regarding social media monitoring, Pagefreezer provides archiving tools that capture every post, deleted comment, and reaction, and that monitor real-time conversations and posts based on chosen keywords, numbers, or patterns. The archiving tool works the same way for social media as for any other website; in your Pagefreezer account, you always have access to edited and deleted information. By setting multiple flagged keywords, you can protect your company from offensive behavior by some participants or from the sharing of sensitive information. Artificial intelligence keeps an eye on all potential threats and sends alert messages so that you can react immediately. You can also use advanced search to find the information you need across all archived social media pages. Pagefreezer records content from Facebook, Twitter, Pinterest, Instagram, LinkedIn, YouTube, Tumblr, and other popular social networks.
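Monitoring based on "keywords, numbers, or patterns" can be sketched with regular expressions. The following Python example is only an illustration of the general idea; the function name, patterns, and sample posts are assumptions, and it does not reflect how Pagefreezer's own detection works.

```python
import re


def flag_posts(posts, patterns):
    """Return (post, matched_patterns) for every post that matches a pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    flagged = []
    for post in posts:
        hits = [c.pattern for c in compiled if c.search(post)]
        if hits:
            flagged.append((post, hits))
    return flagged


# Hypothetical watch list: one keyword and one number pattern.
watch_list = [r"\bpassword\b", r"\b\d{3}-\d{4}\b"]
posts = [
    "Our password is hunter2",
    "Nice photo!",
    "Call 555-0100 for details",
]

for post, hits in flag_posts(posts, watch_list):
    print(post, "->", hits)
```

In a monitoring setup, each flagged post would trigger an alert message instead of a `print` call.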
Pros:
- Captures web pages built with JavaScript/Ajax frameworks
- Real-time monitoring of social networks
- Possibility to check the history for a required date
- Comparison of archived web pages for deletions and additions
- Advanced search across all recorded websites
- Digital signature on all captured websites and web pages
- Export of web pages to PDF
- Separate solutions for enterprises

Cons:
- Detailed information about the tools is provided only to users registered on the official website
- No fixed prices or subscription plans; all information is provided on request
- Lack of integration with third-party software
- Limited functionality
Smarsh (Actiance)
Smarsh has been one of the leaders in information archiving services for many years. In 2017, Actiance and Smarsh merged to operate under the Smarsh brand. It provides services for capturing and archiving digital information for financial institutions, governments, law firms, medical care providers, and other interested users. All captured information is stored in its native format and looks like the original. The tool was created primarily to help companies comply with applicable regulations and legal requirements. There is no fixed price for Smarsh services; you need to contact the sales team, describe your current needs, and leave your company details.
It helps to supervise and capture all electronic communication channels in one place to reduce risks and speed up internal investigations and legal cases. Smarsh monitors more than 100 channels, including the encrypted messengers that companies use most, WhatsApp and WeChat, and records content in native formats on mobile and desktop. By setting keywords, you will be quickly alerted to any potential threats to your company’s reputation and your partners’ trust.
Meanwhile, no personal information is disclosed to or recorded for third parties; the storage is encrypted and certified under SSAE-16 SOC II. You can build dashboards with visualizations based on all captured information. You can also find any required data by searching by user, channel, or other criteria.
Another service from Smarsh is dynamic archiving of all changes on your website. You can compare changes side by side and export all data to PDF or JPEG. This feature can be used to restore any version of your website after an outage. For other valuable features, you can visit Smarsh’s official website, which has many blog posts, podcasts, press releases, and webinars describing your options. The technical support team is available by phone, or you can check the community for the related subject.
Pros:
- Electronic communication monitoring
- Intuitive navigation
- Customizable policy engine for classifying flagged messages
- Boolean search in the captured files
- High security level
- Dynamic website archiving
- Integration with Teams
- 24/7 technical support

Cons:
- It takes some time to learn all of Smarsh’s capabilities
- No clear subscription plans
- The exporting process is time-consuming
GitHub
GitHub is the most feature-rich tool in this review, with offerings for increasing your team’s productivity, collaboration, and security.
Regarding archiving capabilities, GitHub provides repositories that hold the versions of your project’s code. GitHub partners with the Internet Archive to preserve open-source software for future generations. For your work to be archived, you need to add an open-source license to your project. Repositories give you easier control over and access to your software versions. They are built on the Git version control system, so you can review the branches and see who committed to them and when.
But GitHub’s main focus lies elsewhere. It is a developer platform with many features, and we will describe the three most popular. The first is a virtual machine or container in which you can try out and run any web service. All actions are performed according to the matrix workflow you define across several operating systems (Linux, macOS, Windows) to save you time. GitHub supports all languages, from Python to Rust. The second is Packages, for software development and integration with APIs, Actions, and other services. The third automatically checks your source code for vulnerabilities. All GitHub users can join the Open Source Security Foundation to discuss the latest achievements in security protection.
Unlimited private and public repositories are available for free. You also receive 500 MB of Packages storage and access to GitHub tools such as Copilot and Codespaces. Copilot helps you write source code using artificial intelligence, and Codespaces is a cloud development environment. The paid plans, Team ($3.67 per user/month) and Enterprise ($19.25 per user/month), give you more Packages storage, web-based support, pull request features, and more. To start working with GitHub, you need to sign up for a free account. To better understand what the platform can do, we recommend following its numerous guides and tutorials.
Pros:
- Free archive storage for open-source software projects
- Platform for developing and testing software
- Possibility to assign different access levels to team members
- User-friendly interface
- Easy access to version history
- Review of collaboration progress
- Integration with third-party applications
- AI for checking source code

Cons:
- Archiving only for open-source software
- Only two paid subscription plans
- Complicated code visualization
- Few customization options
Memento Time Travel
With Memento Time Travel, you become a traveler through the history of website versions. Memento is part of the US National Digital Information Infrastructure and Preservation Program. It is a research project designed by Los Alamos National Laboratory. To access a page’s history, you only need to enter the URL and the time (you can search for several versions within a set time frame). You will receive links to all available archived versions. Memento Time Travel collects versions from the Stanford Web Archive, Wikipedia, the Icelandic Web Archive, the Canadian Archive, Perma.cc, Archive.is, GitHub, the National Library of Israel, the UK National Archives Web Archive, and more than ten other international archives.
If Memento cannot find a version for the requested time, it will suggest the closest one (you need to activate the option “get near current time”). It also reconstructs all versions so that they look similar to the current one. For easier navigation between versions, the tool has a TimeMap that lists the available Mementos on different remote servers. For quick searches, you can install the Google Chrome extension, which has full functionality and even navigates between language versions of Wikipedia. All links on old versions remain functional. Unfortunately, the service cannot reconstruct future versions of web pages.
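The "closest version" behavior described above can be sketched in a few lines of Python. The Memento protocol (RFC 7089) lets a client state the desired time in an `Accept-Datetime` request header; the sketch formats that header and picks the nearest capture from a list. The function names and the sample list of captures are illustrative assumptions, and the selection logic is a simplified stand-in for what the service does on its servers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when):
    """Format a timezone-aware datetime as an RFC 7089 Accept-Datetime value."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def closest_memento(mementos, when):
    """Return the (datetime, uri) pair nearest to `when`, or None if empty."""
    return min(mementos, key=lambda m: abs(m[0] - when), default=None)


# Hypothetical captures of one page, as a TimeMap might list them.
captures = [
    (datetime(2014, 1, 1, tzinfo=timezone.utc), "https://archive.example/2014/page"),
    (datetime(2015, 5, 20, tzinfo=timezone.utc), "https://archive.example/2015/page"),
    (datetime(2016, 3, 3, tzinfo=timezone.utc), "https://archive.example/2016/page"),
]
wanted = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)

print(accept_datetime_header(wanted))
print(closest_memento(captures, wanted)[1])
```

Since only existing captures can be returned, a requested time later than the newest capture simply resolves to that newest capture.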
The extension and the web service are free for all users. However, you should read the privacy policy and terms of use, as the website collects information about visits, such as IP address, software, hardware, and requested websites. This information is used for analyzing traffic and monitoring proper usage. None of it is transferred to third parties, but these rules do not apply if you follow links from an archived version to third-party sites. If you do not follow the rules or the policy, your access to the services can be restricted.
Pros:
- Free access for all purposes
- More than 20 archives for checking version history
- Possibility to choose the required time
- Possibility to choose a specific archive
- Extensions available for Google Chrome and Mozilla Firefox
- User-friendly

Cons:
- Limited functionality
- Low speed
- Does not create archives; only searches existing ones
UK Web Archive
The UK Web Archive is an automated archive of UK websites, of particular interest to scholars and professionals. It also collects important international websites and websites dedicated to special events or interests. The curators maintain 138 collections, all of which you can browse on the official website. The UK Web Archive uses crawlers that visit websites at different frequencies (daily, weekly, or monthly). Access is free; you only need to enter the URL in the search field and wait. You will receive a list of old versions of the website grouped by year. By 2017, the archive had already collected 500 TB of data. Only 558 news website titles are available. The UK Web Archive is run as a partnership of the six UK Legal Deposit Libraries.
If you want your UK website to be crawled by the UK Web Archive, you need to create a site map and check for areas that block crawlers’ access to the website. You can ask the developers’ team what content will be archived. You can also contact the team to nominate a website that you think is important to archive; the contact form is in the “Save a UK website” section. However, social networks, private data, and platforms where audio or video is the predominant content are not archived.
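Checking for areas that block crawlers can be partly automated. The Python sketch below assumes the blocking rules live in a `robots.txt` file, which the article does not state explicitly, and uses the standard `urllib.robotparser` module to list the URLs a crawler would be refused. The sample rules, URLs, and the wildcard user agent are hypothetical; the actual user agent of the UK Web Archive crawler is not given in the article.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt rules forbid for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content and pages of a UK website.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
pages = [
    "https://example.co.uk/",
    "https://example.co.uk/news/2024.html",
    "https://example.co.uk/private/report.html",
]

print(blocked_urls(ROBOTS_TXT, pages))
```

Any page that appears in the result would need its rule relaxed before an archive crawler that honors `robots.txt` could capture it.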
The storage for the website archives, the Digital Library System, is protected by antivirus software and firewalls and has no public Internet access. Rights holders of private information on websites can ask for its removal from the archive, but before archiving starts, the developers’ team will contact them to obtain the appropriate permissions. The website also collects a lot of personal information (name, postal address, email address, gender, social media accounts, etc.) to personalize the service, ensure legitimate use, and comply with governmental requirements.
Pros:
- Free access to the website archives
- Easy navigation with the keyboard
- Search engines index the UK Web Archive
- The most comprehensive collection of UK websites
- Possibility to use a screen reader for most parts of the website
- Possibility to suggest your own website, or one that interests you, for the collection

Cons:
- Mainly UK websites are collected
- Some websites cannot be fully archived
- Personal information given at registration can be provided to third parties
Conclusion
This review covers eight services that help you review, monitor, and capture essential Internet data. Some monitor all communication channels (Smarsh), others preserve links from your website to other resources so that they always stay valid (Perma.cc), another stores open-source software (GitHub), and the rest provide a standard archive service similar to the Wayback Machine (UK Web Archive, Archive.today). You can compare them and pick the one that best fits your needs. Keep in mind that some tools are free, while others require payment.