WebCorpus

The latest release of WebCorpus can be downloaded as webcorpus-1.0.1.jar from redcoolmedia.net.

APP DESCRIPTION:

WebCorpus is a Hadoop-based framework for computing statistics on large web corpora extracted from web crawls.

Features

  • linguistic processing of text corpora several gigabytes or terabytes in size, using Apache Hadoop
  • extracts and counts sentences, word n-grams (with or without POS tags) and co-occurrences
  • reads popular web crawl formats (ARC and WARC)
  • filters input data by language, duplicate URLs, duplicate content and encoding errors
  • can be extended with further linguistic counts based on custom UIMA annotations
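
On Hadoop, counts like the n-gram counts above are typically computed with the MapReduce pattern. A map step emits each n-gram found in a sentence, and a reduce step sums the occurrences of each distinct n-gram. The following is a minimal single-machine sketch of that data flow in plain Java. It assumes the sentences have already been extracted and uses naive whitespace tokenization, without POS tags. The names NGramCountSketch, mapNGrams and countNGrams are hypothetical and are not part of the WebCorpus API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical single-machine sketch of MapReduce-style word n-gram counting.
// This is NOT the WebCorpus API; the names and the tokenization are assumptions.
class NGramCountSketch {

    // "Map" step: emit every word n-gram of one sentence.
    // Assumes naive whitespace tokenization (no POS tags, no normalization).
    static List<String> mapNGrams(String sentence, int n) {
        List<String> out = new ArrayList<>();
        if (n < 1 || sentence.isBlank()) {
            return out; // nothing to emit
        }
        String[] tokens = sentence.trim().split("\\s+");
        for (int i = 0; i + n <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return out;
    }

    // "Shuffle" + "reduce" steps: group identical n-grams and count them.
    // A TreeMap keeps the result sorted by n-gram for readable output.
    static Map<String, Long> countNGrams(List<String> sentences, int n) {
        return sentences.stream()
                .flatMap(s -> mapNGrams(s, n).stream())
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("the cat sat", "the cat ran");
        System.out.println(countNGrams(sentences, 2));
        // prints {cat ran=1, cat sat=1, the cat=2}
    }
}
```

In an actual Hadoop job, the map step would usually be a subclass of org.apache.hadoop.mapreduce.Mapper and the summation a subclass of Reducer. Hadoop's shuffle then does the grouping across many machines. This sketch only mimics that data flow in memory.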


Programming Language

Java


