WebCorpus
WebCorpus is a web-related tool. Its latest release can be downloaded as webcorpus-1.0.1.jar from redcoolmedia.net.
APP DESCRIPTION:
WebCorpus is a Hadoop-based framework that enables you to calculate statistics on large web corpora extracted from web crawls.
Features
- linguistic processing of text corpora several gigabytes or terabytes in size using Apache Hadoop
- extracts and counts sentences, word n-grams (with or without POS tags) and co-occurrences
- reads popular web crawl formats (ARC and WARC)
- filters input data by language, duplicate URLs, duplicate content and encoding errors
- can be extended with further linguistic counts based on custom UIMA annotations
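
To make the n-gram and co-occurrence counts listed above more concrete, here is a minimal sketch in plain Java. It is not WebCorpus code and does not use its API or Hadoop: the class NGramCounter and its methods are hypothetical names chosen for illustration, the tokenizer simply splits on whitespace, and everything runs in a single JVM. It assumes Java 9 or later. In a Hadoop setting the same counting would typically be split into a map step that emits each n-gram and a reduce step that sums the counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of WebCorpus.
public class NGramCounter {

    // Naive tokenizer (an assumption): lowercase, split on whitespace.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts every word n-gram of length n across the given sentences.
    static Map<String, Integer> countNGrams(List<String> sentences, int n) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String sentence : sentences) {
            List<String> tokens = tokenize(sentence);
            for (int i = 0; i + n <= tokens.size(); i++) {
                String gram = String.join(" ", tokens.subList(i, i + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Counts unordered pairs of distinct words that appear in the same
    // sentence; each pair is counted at most once per sentence.
    static Map<String, Integer> countCooccurrences(List<String> sentences) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String sentence : sentences) {
            List<String> words = new ArrayList<>(new TreeSet<>(tokenize(sentence)));
            for (int i = 0; i < words.size(); i++) {
                for (int j = i + 1; j < words.size(); j++) {
                    String pair = words.get(i) + "\t" + words.get(j);
                    counts.merge(pair, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("the cat sat", "the cat ran");
        System.out.println(countNGrams(sentences, 2));
        System.out.println(countCooccurrences(sentences));
    }
}
```

The sketch uses sorted maps so that output order is stable. A real pipeline over gigabytes or terabytes of text would also need proper sentence splitting and tokenization, which are out of scope here.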
Programming Language
Java