de.l3s.boilerpipe.filters.english
Class IgnoreBlocksAfterContentFilter

java.lang.Object
  extended by de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
All Implemented Interfaces:
BoilerpipeFilter

public final class IgnoreBlocksAfterContentFilter
extends java.lang.Object
implements BoilerpipeFilter

Marks as "non-content" all blocks that occur after a block labeled DefaultLabels.INDICATES_END_OF_TEXT. Such a label is ignored unless a minimum number of words in content blocks (default: 60) occurs before it. This filter can be used in conjunction with an upstream TerminatingBlocksFinder, which assigns these labels.
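The rule above can be sketched in a few lines. This is a minimal, self-contained illustration, not the library's implementation: SketchBlock is a hypothetical stand-in for TextBlock, its endOfText flag stands in for the INDICATES_END_OF_TEXT label, and it counts plain word totals where the real filter counts "full-text" words. The sketch reads "after" strictly, so the qualifying marker block itself keeps its content flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for boilerpipe's TextBlock.
final class SketchBlock {
    final int numWords;       // words in the block (plain count in this sketch)
    final boolean endOfText;  // stands in for the INDICATES_END_OF_TEXT label
    boolean isContent;

    SketchBlock(int numWords, boolean isContent, boolean endOfText) {
        this.numWords = numWords;
        this.isContent = isContent;
        this.endOfText = endOfText;
    }
}

final class IgnoreAfterEndOfTextSketch {
    /**
     * Marks every block after the first end-of-text marker that is preceded by at
     * least minNumWords words in content blocks as non-content.
     * Returns true if any block's content flag changed.
     */
    static boolean process(List<SketchBlock> blocks, int minNumWords) {
        int wordsSoFar = 0;      // words in content blocks seen before the cut-off
        boolean cutOff = false;  // set once a marker qualifies
        boolean changed = false;
        for (SketchBlock b : blocks) {
            if (cutOff) {
                if (b.isContent) {
                    b.isContent = false;
                    changed = true;
                }
                continue;
            }
            // A marker only takes effect if enough content words occurred before it.
            if (b.endOfText && wordsSoFar >= minNumWords) {
                cutOff = true;
            }
            if (b.isContent) {
                wordsSoFar += b.numWords;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<SketchBlock> doc = new ArrayList<>();
        doc.add(new SketchBlock(30, true, false));
        doc.add(new SketchBlock(40, true, false));
        doc.add(new SketchBlock(5, true, true));   // marker, preceded by 70 content words
        doc.add(new SketchBlock(10, true, false)); // after the marker: dropped
        boolean changed = process(doc, 60);
        System.out.println(changed + " " + doc.get(3).isContent); // prints "true false"
    }
}
```

A marker preceded by fewer than 60 content words leaves the document untouched, and scanning continues so that a later marker can still qualify.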

Author:
Christian Kohlschütter
See Also:
TerminatingBlocksFinder

Field Summary
static IgnoreBlocksAfterContentFilter DEFAULT_INSTANCE
           
 
Constructor Summary
IgnoreBlocksAfterContentFilter(int minNumWords)
           
 
Method Summary
static IgnoreBlocksAfterContentFilter getDefaultInstance()
          Returns the singleton instance for IgnoreBlocksAfterContentFilter.
protected static int getNumFullTextWords(TextBlock tb)
           
protected static int getNumFullTextWords(TextBlock tb, float minTextDensity)
           
 boolean process(TextDocument doc)
          Processes the given document doc.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_INSTANCE

public static final IgnoreBlocksAfterContentFilter DEFAULT_INSTANCE

Constructor Detail

IgnoreBlocksAfterContentFilter

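public IgnoreBlocksAfterContentFilter(int minNumWords)
Parameters:
minNumWords - The minimum number of words in content blocks that must occur before a DefaultLabels.INDICATES_END_OF_TEXT label for that label to take effect.

Method Detail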

getDefaultInstance

public static IgnoreBlocksAfterContentFilter getDefaultInstance()
Returns the singleton instance for IgnoreBlocksAfterContentFilter.
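With the real class, the two entry points are IgnoreBlocksAfterContentFilter.getDefaultInstance().process(doc) for the default threshold and new IgnoreBlocksAfterContentFilter(minNumWords).process(doc) for a custom one. The sketch below shows how a shared default instance, the int constructor, and the static accessor typically fit together. MinWordsFilterSketch and its getMinNumWords() accessor are hypothetical, and the value 60 is taken from the class description's stated default.

```java
// Hypothetical sketch of the configuration surface; not the library's source.
final class MinWordsFilterSketch {
    // Mirrors DEFAULT_INSTANCE; 60 is the default threshold stated in the class description.
    static final MinWordsFilterSketch DEFAULT_INSTANCE = new MinWordsFilterSketch(60);

    // Final field: the filter holds no per-document state, so one instance can be shared.
    private final int minNumWords;

    MinWordsFilterSketch(int minNumWords) {
        this.minNumWords = minNumWords;
    }

    // Mirrors getDefaultInstance(): always returns the same shared object.
    static MinWordsFilterSketch getDefaultInstance() {
        return DEFAULT_INSTANCE;
    }

    // Hypothetical accessor, not part of the documented API.
    int getMinNumWords() {
        return minNumWords;
    }
}
```

Sharing one instance across documents or threads is only safe if the filter keeps no mutable per-document state, which this sketch guarantees by holding a single final field.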


process

public boolean process(TextDocument doc)
                throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.

Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException
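The class description suggests running this filter after a TerminatingBlocksFinder, and process() reports whether it changed the document. The self-contained sketch below models that arrangement with hypothetical stand-ins: ChainBlock replaces TextBlock, ChainFilter mirrors the shape of BoilerpipeFilter.process, and the "finder" is a toy heuristic that labels any block starting with "Comments" as end of text (the real TerminatingBlocksFinder uses its own rules). As in the class-level reading, "after" is taken strictly, so the marker block keeps its content flag.

```java
import java.util.List;

// Hypothetical stand-in for a text block: text, word count, content flag, end-of-text label.
final class ChainBlock {
    final String text;
    final int numWords;
    boolean isContent = true;
    boolean endOfText = false;

    ChainBlock(String text) {
        this.text = text;
        String t = text.trim();
        this.numWords = t.isEmpty() ? 0 : t.split("\\s+").length;
    }
}

// Mirrors the shape of BoilerpipeFilter.process: returns true if the document changed.
interface ChainFilter {
    boolean process(List<ChainBlock> doc);
}

final class ChainSketch {
    // Toy upstream step (hypothetical heuristic): label blocks starting with "Comments".
    static final ChainFilter TERMINATING_FINDER = doc -> {
        boolean changed = false;
        for (ChainBlock b : doc) {
            if (!b.endOfText && b.text.startsWith("Comments")) {
                b.endOfText = true;
                changed = true;
            }
        }
        return changed;
    };

    // Downstream step: drop content after the first marker preceded by enough content words.
    static ChainFilter ignoreAfterContent(int minNumWords) {
        return doc -> {
            int words = 0;
            boolean found = false;
            boolean changed = false;
            for (ChainBlock b : doc) {
                if (found) {
                    if (b.isContent) {
                        b.isContent = false;
                        changed = true;
                    }
                } else {
                    if (b.endOfText && words >= minNumWords) {
                        found = true;
                    }
                    if (b.isContent) {
                        words += b.numWords;
                    }
                }
            }
            return changed;
        };
    }

    // Runs filters in order; returns true if any filter reported a change.
    static boolean runChain(List<ChainBlock> doc, List<ChainFilter> filters) {
        boolean changed = false;
        for (ChainFilter f : filters) {
            changed |= f.process(doc); // |= always evaluates the call, so every filter runs
        }
        return changed;
    }
}
```

Combining results with |= rather than || keeps later filters running even after an earlier one has already reported a change.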

getNumFullTextWords

protected static int getNumFullTextWords(TextBlock tb)

getNumFullTextWords

protected static int getNumFullTextWords(TextBlock tb,
                                         float minTextDensity)
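This page gives no description of the two getNumFullTextWords overloads. Judging only by the names and the minTextDensity parameter, one plausible reading is that a block's words count as "full text" only when its text density reaches a threshold, with the one-argument overload supplying a default threshold. The sketch below encodes that reading; DensityBlock and the default value 9 are assumptions for illustration, not documented behavior.

```java
// Hypothetical block with only the two fields the helper needs.
final class DensityBlock {
    final int numWords;
    final float textDensity;

    DensityBlock(int numWords, float textDensity) {
        this.numWords = numWords;
        this.textDensity = textDensity;
    }
}

final class FullTextWordsSketch {
    // ASSUMPTION: default cut-off for the one-argument overload; not stated on this page.
    static final float ASSUMED_DEFAULT_MIN_TEXT_DENSITY = 9f;

    // One-argument overload delegates with the assumed default threshold.
    static int getNumFullTextWords(DensityBlock tb) {
        return getNumFullTextWords(tb, ASSUMED_DEFAULT_MIN_TEXT_DENSITY);
    }

    // Counts a block's words only if its density meets the threshold; otherwise 0.
    static int getNumFullTextWords(DensityBlock tb, float minTextDensity) {
        return tb.textDensity >= minTextDensity ? tb.numWords : 0;
    }
}
```

Gating the word count on density keeps short, link-heavy or list-like blocks from pushing the running total past minNumWords.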