Heritrix: Internet Archive Web Crawler
Heritrix: Internet Archive Web Crawler is a web-related tool. Its latest release can be downloaded as heritrix-1.8.0.jar from redcoolmedia.net.
APP DESCRIPTION:
The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
Features
- deeply and thoroughly harvests website content
- works on any Java platform (Linux recommended)
- stores content to ARC or ISO WARC aggregate/transcript format
- web interface for operator control and monitoring of crawls
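To make the WARC storage feature more concrete, the following is a minimal sketch of how a single WARC/1.0 record is laid out: a version line, named header fields, a blank line, the content block, and two trailing CRLFs. It is not Heritrix's own writer code. The class name WarcRecordSketch and the method buildResourceRecord are hypothetical, and the sketch assumes the simple "resource" record type, whereas a crawler would normally also store the HTTP response headers in a "response" record.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.UUID;

// Hypothetical illustration class; not part of Heritrix.
final class WarcRecordSketch {

    private static final String CRLF = "\r\n";

    // Builds one WARC/1.0 "resource" record for an already-fetched payload.
    static byte[] buildResourceRecord(String targetUri, String contentType,
                                      byte[] payload, Instant fetchedAt) {
        String header = "WARC/1.0" + CRLF
                + "WARC-Type: resource" + CRLF
                + "WARC-Record-ID: <urn:uuid:" + UUID.randomUUID() + ">" + CRLF
                // WARC 1.0 dates use second precision, e.g. 2020-01-02T03:04:05Z
                + "WARC-Date: " + fetchedAt.truncatedTo(ChronoUnit.SECONDS) + CRLF
                + "WARC-Target-URI: " + targetUri + CRLF
                + "Content-Type: " + contentType + CRLF
                + "Content-Length: " + payload.length + CRLF
                + CRLF; // blank line separates the header from the content block

        byte[] headerBytes = header.getBytes(StandardCharsets.UTF_8);
        byte[] trailer = (CRLF + CRLF).getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headerBytes, 0, headerBytes.length);
        out.write(payload, 0, payload.length);
        out.write(trailer, 0, trailer.length); // every record ends with two CRLFs
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] record = buildResourceRecord(
                "http://example.com/", "text/plain", body, Instant.now());
        System.out.print(new String(record, StandardCharsets.UTF_8));
    }
}
```

A WARC file is a concatenation of such records, which is why Content-Length must count exactly the bytes of the content block.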
Audience
Advanced End Users, Developers, Education, Government, Information Technology, Non-Profit Organizations
User interface
Web-based
Programming Language
Java
Database Environment
Berkeley/Sleepycat/Gdbm (DBM)