de.l3s.boilerpipe.filters.english
Class IgnoreBlocksAfterContentFilter
java.lang.Object
de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- All Implemented Interfaces:
- BoilerpipeFilter
public final class IgnoreBlocksAfterContentFilter
- extends java.lang.Object
- implements BoilerpipeFilter
Marks all blocks as "non-content" that occur after blocks that have been marked TextBlockLabel.INDICATES_END_OF_TEXT. Such a mark is ignored unless a minimum number of words in content blocks occurs before it (default: 60).
This can be used in conjunction with an upstream TerminatingBlocksFinder.
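The rule described above can be sketched without the library. The following is a simplified model, not the boilerpipe source: `Block` is a hypothetical stand-in for `TextBlock`, its `endOfText` flag stands in for the INDICATES_END_OF_TEXT label, and the sketch assumes that a marked block's own words count toward the threshold and that the marked block itself is also demoted when it is content.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the documented behavior; not the boilerpipe implementation. */
class IgnoreAfterContentSketch {

    /** Hypothetical, minimal replacement for boilerpipe's TextBlock. */
    static final class Block {
        final int numWords;
        final boolean endOfText; // models the INDICATES_END_OF_TEXT label
        boolean content;

        Block(int numWords, boolean content, boolean endOfText) {
            this.numWords = numWords;
            this.content = content;
            this.endOfText = endOfText;
        }
    }

    /** Returns true if at least one block was changed from content to non-content. */
    static boolean process(List<Block> blocks, int minNumWords) {
        boolean changes = false;
        boolean cutOff = false;
        int wordsSeen = 0;
        for (Block b : blocks) {
            if (b.content) {
                wordsSeen += b.numWords; // only content blocks count toward the threshold
            }
            if (b.endOfText && wordsSeen >= minNumWords) {
                cutOff = true; // mark honored: enough content precedes it
            }
            if (cutOff && b.content) {
                b.content = false;
                changes = true;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block(10, true, true));  // early mark, too few words so far: ignored
        blocks.add(new Block(80, true, false)); // main text
        blocks.add(new Block(5, false, true));  // mark after 90 content words: honored
        blocks.add(new Block(30, true, false)); // trailing block: demoted to non-content
        System.out.println("changed=" + process(blocks, 60));
        for (Block b : blocks) {
            System.out.println(b.numWords + " words, content=" + b.content);
        }
    }
}
```

With the default threshold of 60, the first mark in this example is skipped because only 10 content words precede it, which is the safeguard against a terminating phrase that appears near the top of a page.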
- Author:
- Christian Kohlschütter
- See Also:
TerminatingBlocksFinder
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_INSTANCE
public static final IgnoreBlocksAfterContentFilter DEFAULT_INSTANCE
IgnoreBlocksAfterContentFilter
public IgnoreBlocksAfterContentFilter(int minNumWords)
getDefaultInstance
public static IgnoreBlocksAfterContentFilter getDefaultInstance()
- Returns the singleton instance for IgnoreBlocksAfterContentFilter.
process
public boolean process(TextDocument doc)
throws BoilerpipeProcessingException
- Description copied from interface: BoilerpipeFilter
- Processes the given document doc.
- Specified by:
process in interface BoilerpipeFilter
- Parameters:
doc - The TextDocument that is to be processed.
- Returns:
true if changes have been made to the TextDocument.
- Throws:
BoilerpipeProcessingException
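Because process reports whether the document changed, a caller that runs several filters in sequence can combine their results. The sketch below illustrates one way to do that; `Filter` is a hypothetical stand-in for BoilerpipeFilter that works on a list of strings rather than a TextDocument, and `DROP_BLANK` and `TRIM` are invented example filters. The point it shows is the non-short-circuit `|`: with `||`, a later filter would be skipped as soon as an earlier one returned true.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: combining the boolean results of chained filters. */
class FilterChainSketch {

    /** Hypothetical stand-in for BoilerpipeFilter. */
    interface Filter {
        boolean process(List<String> doc);
    }

    // Removes blank entries; returns true if anything was removed.
    static final Filter DROP_BLANK = doc -> doc.removeIf(s -> s.trim().isEmpty());

    // Trims each entry; returns true if any entry changed.
    static final Filter TRIM = doc -> {
        boolean changed = false;
        for (int i = 0; i < doc.size(); i++) {
            String trimmed = doc.get(i).trim();
            if (!trimmed.equals(doc.get(i))) {
                doc.set(i, trimmed);
                changed = true;
            }
        }
        return changed;
    };

    static boolean runAll(List<String> doc) {
        // Non-short-circuit | so that TRIM runs even when DROP_BLANK returned true.
        return DROP_BLANK.process(doc) | TRIM.process(doc);
    }

    public static void main(String[] args) {
        List<String> doc = new ArrayList<>(List.of("  a ", "", "b"));
        System.out.println("first pass changed=" + runAll(doc));
        System.out.println("second pass changed=" + runAll(doc));
        System.out.println(doc);
    }
}
```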
getNumFullTextWords
protected static int getNumFullTextWords(TextBlock tb)
getNumFullTextWords
protected static int getNumFullTextWords(TextBlock tb,
float minTextDensity)