public class LanguageIndexingFilter extends Object implements IndexingFilter
An IndexingFilter that adds a lang (language) field to the document.
It tries to find the language of the document by checking if HTMLLanguageParser has added some language information.
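The behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Nutch implementation: plain maps stand in for NutchDocument and for the page metadata, and the metadata key LANGUAGE_KEY, the helper addLangField, and the "unknown" fallback value are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class LangFieldSketch {
    // Hypothetical metadata key; the key the real parser writes may differ.
    static final String LANGUAGE_KEY = "language";

    // Copies language information left by an earlier parsing step into
    // a "lang" field on the document that is about to be indexed.
    static Map<String, String> addLangField(Map<String, String> doc,
                                            Map<String, String> pageMetadata) {
        String lang = pageMetadata.get(LANGUAGE_KEY);
        if (lang == null || lang.isEmpty()) {
            lang = "unknown"; // assumed fallback when nothing was detected
        }
        doc.put("lang", lang);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(LANGUAGE_KEY, "en");
        Map<String, String> doc = addLangField(new HashMap<>(), metadata);
        System.out.println(doc.get("lang")); // prints "en"
    }
}
```

The filter itself does no language detection in this sketch; it only forwards what an earlier step recorded, which matches the description of relying on HTMLLanguageParser.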
Constructor and Description |
---|
LanguageIndexingFilter() Constructs a new Language Indexing Filter. |
Modifier and Type | Method and Description |
---|---|
void | addIndexBackendOptions(org.apache.hadoop.conf.Configuration conf) |
NutchDocument | filter(NutchDocument doc, String url, WebPage page) Adds fields or otherwise modifies the document that will be indexed for a parse. |
org.apache.hadoop.conf.Configuration | getConf() |
Collection<WebPage.Field> | getFields() |
void | setConf(org.apache.hadoop.conf.Configuration conf) |
public LanguageIndexingFilter()
public NutchDocument filter(NutchDocument doc, String url, WebPage page) throws IndexingException
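Description copied from interface: IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse.
Specified by: filter in interface IndexingFilter
Parameters:
doc - document instance for collecting fields
url - page url
Throws:
IndexingException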
public Collection<WebPage.Field> getFields()
Specified by: getFields in interface FieldPluggable
public void addIndexBackendOptions(org.apache.hadoop.conf.Configuration conf)
public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by: setConf in interface org.apache.hadoop.conf.Configurable
public org.apache.hadoop.conf.Configuration getConf()
Specified by: getConf in interface org.apache.hadoop.conf.Configurable
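The setConf/getConf pair follows the usual configurable-object pattern: the framework hands a configuration to the object once, and the object returns the same instance on request. The sketch below illustrates that contract only. It assumes java.util.Properties as a stand-in for org.apache.hadoop.conf.Configuration, and the names ConfiguredFilterSketch, ConfigurableSketch, and example.key are hypothetical.

```java
import java.util.Properties;

class ConfiguredFilterSketch implements ConfiguredFilterSketch.ConfigurableSketch {

    // Stand-in for the Configurable interface; Properties replaces Configuration.
    interface ConfigurableSketch {
        void setConf(Properties conf);
        Properties getConf();
    }

    private Properties conf;

    @Override
    public void setConf(Properties conf) {
        // Keep a reference so later calls can read settings from it.
        this.conf = conf;
    }

    @Override
    public Properties getConf() {
        return conf;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("example.key", "value"); // hypothetical setting
        ConfiguredFilterSketch filter = new ConfiguredFilterSketch();
        filter.setConf(conf);
        System.out.println(filter.getConf().getProperty("example.key")); // prints "value"
    }
}
```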
Copyright © 2015 The Apache Software Foundation