de.l3s.boilerpipe.filters.heuristics
Class KeepLargestBlockFilter

java.lang.Object
  extended by de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
All Implemented Interfaces:
BoilerpipeFilter

public final class KeepLargestBlockFilter
extends java.lang.Object
implements BoilerpipeFilter

Keeps only the largest TextBlock, measured by number of words. If several blocks share the same word count, the first of them is kept. All discarded blocks are marked as "not content" and labeled with DefaultLabels.MIGHT_BE_CONTENT. Note that, by default, only TextBlocks already marked as "content" are considered as candidates.
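The selection rule described above can be sketched in plain Java. This is an illustration under stated assumptions, not the library's implementation: Block is a hypothetical stand-in for TextBlock, word counts are taken by splitting on whitespace, the label is a plain string, and the sketch returns true whenever at least one block was discarded. The expandToSameLevelText option is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeepLargestBlockSketch {

    /** Hypothetical stand-in for boilerpipe's TextBlock (not the library class). */
    static final class Block {
        final String text;
        boolean content;
        final List<String> labels = new ArrayList<>();

        Block(String text, boolean content) {
            this.text = text;
            this.content = content;
        }

        /** Word count by whitespace splitting; an assumption made for this sketch. */
        int numWords() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    /** Plain-string stand-in for DefaultLabels.MIGHT_BE_CONTENT. */
    static final String MIGHT_BE_CONTENT = "MIGHT_BE_CONTENT";

    /**
     * Keeps the largest "content" block and discards all others.
     * Returns true if at least one block was discarded (assumed semantics).
     */
    static boolean keepLargest(List<Block> blocks) {
        Block largest = null;
        int maxWords = -1;
        for (Block b : blocks) {
            // Only blocks already marked as content are candidates.
            // The strict '>' means the first block wins on a tie.
            if (b.content && b.numWords() > maxWords) {
                largest = b;
                maxWords = b.numWords();
            }
        }
        if (largest == null) {
            return false; // no candidate found, nothing is touched
        }
        boolean changed = false;
        for (Block b : blocks) {
            if (b != largest) {
                b.content = false;
                b.labels.add(MIGHT_BE_CONTENT);
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block("nav home about", true),
                new Block("one two three four five", true),
                new Block("aa bb cc dd ee", true),        // same size, appears later
                new Block("a much longer block that is not content", false));
        keepLargest(blocks);
        for (Block b : blocks) {
            if (b.content) {
                System.out.println(b.text); // prints "one two three four five"
            }
        }
    }
}
```

With the real library, the equivalent step would be a call to process on one of the shared instances, for example KeepLargestBlockFilter.INSTANCE.process(doc).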

Author:
Christian Kohlschütter

Field Summary
static KeepLargestBlockFilter INSTANCE
           
static KeepLargestBlockFilter INSTANCE_EXPAND_TO_SAME_TAGLEVEL
           
 
Constructor Summary
KeepLargestBlockFilter(boolean expandToSameLevelText)
           
 
Method Summary
 boolean process(TextDocument doc)
          Processes the given document doc.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INSTANCE

public static final KeepLargestBlockFilter INSTANCE

INSTANCE_EXPAND_TO_SAME_TAGLEVEL

public static final KeepLargestBlockFilter INSTANCE_EXPAND_TO_SAME_TAGLEVEL

Constructor Detail

KeepLargestBlockFilter

public KeepLargestBlockFilter(boolean expandToSameLevelText)

Method Detail

process

public boolean process(TextDocument doc)
                throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.

Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException