de.l3s.boilerpipe.filters.english
Class MinFulltextWordsFilter

java.lang.Object
  extended by de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
All Implemented Interfaces:
BoilerpipeFilter

public final class MinFulltextWordsFilter
extends java.lang.Object
implements BoilerpipeFilter

Keeps only those content blocks that contain at least k full-text words, as measured by HeuristicFilterBase.getNumFullTextWords(TextBlock). k is supplied through the minWords constructor argument and is 30 by default.
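
The rule described above can be sketched as follows. This is a minimal illustration, not boilerpipe's source: MinWordsSketch, Block, and the fields fullTextWords and content are hypothetical stand-ins for the filter, TextBlock, and the count returned by getNumFullTextWords. It also assumes that a block failing the threshold is flagged as non-content rather than removed from the document.

```java
import java.util.List;

/** Illustrative stand-in only; not the boilerpipe implementation. */
final class MinWordsSketch {

    /** Hypothetical minimal text block: a full-text word count plus a content flag. */
    static final class Block {
        final int fullTextWords; // stand-in for getNumFullTextWords(TextBlock)
        boolean content;         // stand-in for the block's content flag

        Block(int fullTextWords, boolean content) {
            this.fullTextWords = fullTextWords;
            this.content = content;
        }
    }

    /**
     * Demotes every content block that has fewer than minWords full-text words.
     * Mirrors the documented contract of process: returns true only if
     * at least one block was changed.
     */
    static boolean process(List<Block> blocks, int minWords) {
        boolean changed = false;
        for (Block b : blocks) {
            // Non-content blocks are left alone; the filter only narrows content.
            if (b.content && b.fullTextWords < minWords) {
                b.content = false; // assumption: flagged, not removed
                changed = true;
            }
        }
        return changed;
    }
}
```

With k = 30 in this sketch, a content block of 12 words is demoted while blocks of 30 or 45 words are kept. Running the same pass a second time changes nothing and returns false.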

Author:
Christian Kohlschütter

Field Summary
static MinFulltextWordsFilter DEFAULT_INSTANCE
           
 
Constructor Summary
MinFulltextWordsFilter(int minWords)
           
 
Method Summary
static MinFulltextWordsFilter getDefaultInstance()
           
protected static int getNumFullTextWords(TextBlock tb)
           
protected static int getNumFullTextWords(TextBlock tb, float minTextDensity)
           
 boolean process(TextDocument doc)
          Processes the given document doc.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_INSTANCE

public static final MinFulltextWordsFilter DEFAULT_INSTANCE

Constructor Detail

MinFulltextWordsFilter

public MinFulltextWordsFilter(int minWords)

Method Detail

getDefaultInstance

public static MinFulltextWordsFilter getDefaultInstance()

process

public boolean process(TextDocument doc)
                throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.

Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException

getNumFullTextWords

protected static int getNumFullTextWords(TextBlock tb)

getNumFullTextWords

protected static int getNumFullTextWords(TextBlock tb,
                                         float minTextDensity)