Class | Summary |
---|---|
BoilerplateBlockFilter | Removes TextBlocks which have explicitly been marked as "not content". |
InvertedFilter | Inverts the "isContent" flag for all TextBlocks. |
LabelToBoilerplateFilter | Marks all blocks that contain a given label as "boilerplate". |
LabelToContentFilter | Marks all blocks that contain a given label as "content". |
MarkEverythingContentFilter | Marks all blocks as content. |
MinClauseWordsFilter | Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5). |
MinWordsFilter | Keeps only those content blocks which contain at least k words. |
SplitParagraphBlocksFilter | Splits TextBlocks at paragraph boundaries. |
The BoilerpipeFilters in this package are straightforward and probably not specific to English.
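The mark-then-remove pattern that several of these filters suggest can be sketched as below. This is a minimal, self-contained illustration and not the library's implementation: `Block`, `minWords`, `invert`, and `removeBoilerplate` are hypothetical stand-ins written for this example, and the sketch assumes that a word-count filter clears the content flag rather than deleting the block, leaving removal to a separate step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterSketch {

    /** Hypothetical stand-in for a text block: some text plus a content flag. */
    static final class Block {
        final String text;
        boolean isContent;

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
        }

        /** Counts whitespace-separated tokens. */
        int numWords() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }
    }

    /**
     * Sketch of a minimum-word filter: content blocks with fewer than
     * minWords words lose their content flag (assumption: mark, do not delete).
     * Returns true if any block changed.
     */
    static boolean minWords(List<Block> blocks, int minWords) {
        boolean changed = false;
        for (Block b : blocks) {
            if (b.isContent && b.numWords() < minWords) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    /** Sketch of an inverting filter: flips the content flag of every block. */
    static boolean invert(List<Block> blocks) {
        for (Block b : blocks) {
            b.isContent = !b.isContent;
        }
        return !blocks.isEmpty();
    }

    /** Sketch of a boilerplate-removal filter: drops blocks not marked as content. */
    static boolean removeBoilerplate(List<Block> blocks) {
        return blocks.removeIf(b -> !b.isContent);
    }

    /** Runs the word-count filter, then the removal filter, on sample blocks. */
    static List<String> demo() {
        List<Block> blocks = new ArrayList<>(Arrays.asList(
                new Block("Home | About | Contact", true),
                new Block("The quick brown fox jumps over the lazy dog today", true),
                new Block("Copyright 2010", false)));
        minWords(blocks, 6);        // the navigation line has only 5 tokens
        removeBoilerplate(blocks);  // drop everything flagged as non-content
        List<String> kept = new ArrayList<>();
        for (Block b : blocks) {
            kept.add(b.text);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Separating marking from removal keeps each step simple and lets filters be chained in any order, for example inverting the flags before removal to keep only the boilerplate.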