de.l3s.boilerpipe.filters.heuristics
Class KeepLargestBlockFilter
java.lang.Object
de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- All Implemented Interfaces:
- BoilerpipeFilter
public final class KeepLargestBlockFilter
- extends java.lang.Object
- implements BoilerpipeFilter
Keeps the largest TextBlock only (by the number of words). In case of more than one block with the same number of words, the first block is chosen.
All discarded blocks are marked "not content" and flagged as DefaultLabels.MIGHT_BE_CONTENT.
Note that, by default, only TextBlocks marked as "content" are taken into consideration.
- Author:
- Christian Kohlschütter
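The selection rule described above can be illustrated with a small, self-contained sketch. It does not use the boilerpipe classes: Block, its fields, numWords() and KeepLargestSketch.keepLargest are hypothetical stand-ins introduced only for this example, and the whitespace-based word count and the handling of a document without any content block are assumptions rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a text block; NOT the boilerpipe TextBlock class. */
final class Block {
    final String text;
    boolean isContent;
    // Stands in for the DefaultLabels.MIGHT_BE_CONTENT label.
    boolean mightBeContent;

    Block(String text, boolean isContent) {
        this.text = text;
        this.isContent = isContent;
    }

    /** Whitespace-separated word count (an assumption about how words are counted). */
    int numWords() {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

final class KeepLargestSketch {
    /** Keeps only the largest content block; returns true if any block was changed. */
    static boolean keepLargest(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            // Only blocks currently marked as content are candidates.
            // The strict '>' keeps the first block when word counts tie.
            if (b.isContent && (largest == null || b.numWords() > largest.numWords())) {
                largest = b;
            }
        }
        if (largest == null) {
            // Assumption for this sketch: nothing to do without a content block.
            return false;
        }
        boolean changed = false;
        for (Block b : blocks) {
            if (b == largest) {
                continue;
            }
            // Discarded blocks become "not content" and are flagged.
            if (b.isContent || !b.mightBeContent) {
                changed = true;
            }
            b.isContent = false;
            b.mightBeContent = true;
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("home about contact", true));
        blocks.add(new Block("one two three four five", true));
        blocks.add(new Block("alpha beta gamma delta epsilon", true));
        keepLargest(blocks);
        for (Block b : blocks) {
            if (b.isContent) {
                System.out.println("kept: " + b.text);
            }
        }
    }
}
```

With the real library, the equivalent step would be a call to process(TextDocument) on one of the filter instances listed below.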
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
INSTANCE
public static final KeepLargestBlockFilter INSTANCE
INSTANCE_EXPAND_TO_SAME_TAGLEVEL
public static final KeepLargestBlockFilter INSTANCE_EXPAND_TO_SAME_TAGLEVEL
KeepLargestBlockFilter
public KeepLargestBlockFilter(boolean expandToSameLevelText)
process
public boolean process(TextDocument doc)
throws BoilerpipeProcessingException
- Description copied from interface: BoilerpipeFilter
- Processes the given document doc.
- Specified by:
- process in interface BoilerpipeFilter
- Parameters:
- doc - The TextDocument that is to be processed.
- Returns:
- true if changes have been made to the TextDocument.
- Throws:
- BoilerpipeProcessingException