Project Description

Nutch is highly scalable Web search software
built on top of Apache Hadoop and Lucene Java.
Key features include a Web crawler, an indexer,
crawl management tools, parsers for HTML, PDF,
DOC, and several other document formats, and an
extensible architecture that lets you plug in
additional functionality such as document and
content parsers, custom scoring algorithms,
protocols, and more.
