Uses of Interface
de.l3s.boilerpipe.BoilerpipeFilter

Packages that use BoilerpipeFilter
de.l3s.boilerpipe The Boilerpipe top-level package. 
de.l3s.boilerpipe.extractors This package contains some standard extractors (i.e., completely piped BoilerpipeFilters)  
de.l3s.boilerpipe.filters.english The BoilerpipeFilters in this package have only been tested on English text. 
de.l3s.boilerpipe.filters.heuristics The BoilerpipeFilters in this package are pure heuristics. 
de.l3s.boilerpipe.filters.simple The BoilerpipeFilters in this package are straightforward and probably not really specific to English. 
 

Uses of BoilerpipeFilter in de.l3s.boilerpipe
 

Subinterfaces of BoilerpipeFilter in de.l3s.boilerpipe
 interface BoilerpipeExtractor
          Describes a complete filter pipeline.
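Because BoilerpipeExtractor is a subinterface of BoilerpipeFilter, a complete pipeline can be used anywhere a single filter can. The sketch below models that idea with hypothetical, simplified types (`Block`, `BlockFilter`, `FilterPipeline`) rather than the library's real document and block classes. It assumes that a filter reports whether it modified the document, and it uses a non-short-circuit `|=` so every stage runs even after an earlier stage has reported a change.

```java
import java.util.List;

/** Hypothetical, simplified stand-in for a text block: its text and a content flag. */
final class Block {
    final String text;
    boolean isContent;

    Block(String text, boolean isContent) {
        this.text = text;
        this.isContent = isContent;
    }

    /** Counts whitespace-separated tokens (a simplifying assumption). */
    int numWords() {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }
}

/** Hypothetical filter contract: returns true if it modified the document. */
interface BlockFilter {
    boolean process(List<Block> doc);
}

/** A pipeline is itself a filter that runs its stages in order. */
final class FilterPipeline implements BlockFilter {
    private final List<BlockFilter> stages;

    FilterPipeline(List<BlockFilter> stages) {
        this.stages = List.copyOf(stages);
    }

    @Override
    public boolean process(List<Block> doc) {
        boolean changed = false;
        for (BlockFilter stage : stages) {
            // Non-short-circuit OR: every stage runs, even after a change was reported.
            changed |= stage.process(doc);
        }
        return changed;
    }
}
```

For example, `new FilterPipeline(List.of(a, b)).process(doc)` runs `a` and then `b`, and returns `true` if either of them changed a block.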
 

Uses of BoilerpipeFilter in de.l3s.boilerpipe.extractors
 

Classes in de.l3s.boilerpipe.extractors that implement BoilerpipeFilter
 class ArticleExtractor
          A full-text extractor which is tuned towards news articles.
 class ArticleSentencesExtractor
          A full-text extractor which is tuned towards extracting sentences from news articles.
 class CanolaExtractor
          A full-text extractor trained on the krdwrd Canola corpus.
 class DefaultExtractor
          A quite generic full-text extractor.
 class ExtractorBase
          The base class of Extractors.
 class KeepEverythingExtractor
          Marks everything as content.
 class KeepEverythingWithMinKWordsExtractor
          Marks everything as content, then keeps only those blocks which contain at least k words.
 class LargestContentExtractor
          A full-text extractor which extracts the largest text component of a page.
 class NumWordsRulesExtractor
          A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block).
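NumWordsRulesExtractor decides about each block by looking at the word counts of the previous, the current and the next block. A minimal sketch of that kind of neighborhood rule follows. The `WordBlock` record, the `NumWordsSketch` class and both thresholds are hypothetical and chosen only for illustration; they are not the rules or values the library actually uses.

```java
import java.util.List;

/** Hypothetical block: only its word count matters for this sketch. */
record WordBlock(int numWords) {}

final class NumWordsSketch {
    // Illustrative thresholds only; NOT the values used by boilerpipe.
    static final int LONG_BLOCK = 15;
    static final int MIN_SHORT_CONTENT = 4;

    /** Returns one content/not-content decision per block. */
    static boolean[] classify(List<WordBlock> blocks) {
        boolean[] isContent = new boolean[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {
            // Missing neighbors at the document edges count as empty blocks.
            int prev = i > 0 ? blocks.get(i - 1).numWords() : 0;
            int curr = blocks.get(i).numWords();
            int next = i + 1 < blocks.size() ? blocks.get(i + 1).numWords() : 0;

            if (curr > LONG_BLOCK) {
                // Long blocks count as content on their own.
                isContent[i] = true;
            } else {
                // Shorter blocks need some words AND a long neighbor on either side.
                isContent[i] = curr > MIN_SHORT_CONTENT
                        && (prev > LONG_BLOCK || next > LONG_BLOCK);
            }
        }
        return isContent;
    }
}
```

With word counts `3, 40, 10, 2, 8`, only the second block (long) and the third block (medium length, next to a long block) are classified as content.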
 

Fields in de.l3s.boilerpipe.extractors declared as BoilerpipeFilter
static BoilerpipeFilter CanolaExtractor.CLASSIFIER
          The actual classifier, exposed.
 

Uses of BoilerpipeFilter in de.l3s.boilerpipe.filters.english
 

Classes in de.l3s.boilerpipe.filters.english that implement BoilerpipeFilter
 class DensityRulesClassifier
          Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features", particularly using text densities and link densities.
 class IgnoreBlocksAfterContentFilter
          Marks as "non-content" all blocks that occur after blocks marked DefaultLabels.INDICATES_END_OF_TEXT.
 class IgnoreBlocksAfterContentFromEndFilter
          Marks as "non-content" all blocks that occur after blocks marked DefaultLabels.INDICATES_END_OF_TEXT and after any content block.
 class KeepLargestFulltextBlockFilter
          Keeps the largest TextBlock only (by the number of words).
 class MinFulltextWordsFilter
          Keeps only those content blocks which contain at least k full-text words (measured by HeuristicFilterBase.getNumFullTextWords(TextBlock)). k is 30 by default.
 class NumWordsRulesClassifier
          Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using number of words per block and link density per block.
 class TerminatingBlocksFinder
          Finds blocks which are potentially indicating the end of an article text and marks them with DefaultLabels.INDICATES_END_OF_TEXT.
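TerminatingBlocksFinder and IgnoreBlocksAfterContentFilter work as a pair: the first labels blocks that look like the end of the article text, and the second discards what follows. The sketch below models that two-step idea with hypothetical types (`LabeledBlock`, `EndOfTextSketch`) and a made-up list of cue phrases. The real finder uses its own heuristics, and this sketch also marks the labeled block itself as non-content, which is a simplifying assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical block with a content flag and a set of string labels. */
final class LabeledBlock {
    final String text;
    boolean isContent;
    final Set<String> labels = new HashSet<>();

    LabeledBlock(String text, boolean isContent) {
        this.text = text;
        this.isContent = isContent;
    }
}

final class EndOfTextSketch {
    static final String END_OF_TEXT = "INDICATES_END_OF_TEXT";
    // Hypothetical cue phrases; the library's own heuristics differ.
    static final List<String> CUES = List.of("comments", "post a comment", "related articles");

    /** Step 1: label blocks whose text starts with a cue phrase. */
    static boolean markTerminatingBlocks(List<LabeledBlock> doc) {
        boolean changed = false;
        for (LabeledBlock b : doc) {
            String t = b.text.trim().toLowerCase(Locale.ROOT);
            for (String cue : CUES) {
                // Set.add returns false if the label was already present.
                if (t.startsWith(cue) && b.labels.add(END_OF_TEXT)) {
                    changed = true;
                }
            }
        }
        return changed;
    }

    /** Step 2: from the first labeled block onward, mark every block as non-content. */
    static boolean ignoreBlocksAfterEnd(List<LabeledBlock> doc) {
        boolean changed = false;
        boolean seenEnd = false;
        for (LabeledBlock b : doc) {
            seenEnd |= b.labels.contains(END_OF_TEXT);
            if (seenEnd && b.isContent) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }
}
```

Running step 1 and then step 2 on an article followed by a "Comments (12)" block keeps the article and drops the comment section.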
 

Uses of BoilerpipeFilter in de.l3s.boilerpipe.filters.heuristics
 

Classes in de.l3s.boilerpipe.filters.heuristics that implement BoilerpipeFilter
 class AddPrecedingLabelsFilter
          Adds the labels of the preceding block to the current block, optionally adding a prefix.
 class ArticleMetadataFilter
           
 class BlockProximityFusion
          Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit.
 class ContentFusion
           
 class DocumentTitleMatchClassifier
          Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain.
 class ExpandTitleToContentFilter
          Marks as "content" all TextBlocks between the headline and the part that has already been marked content, provided they are marked DefaultLabels.MIGHT_BE_CONTENT.
 class KeepLargestBlockFilter
          Keeps the largest TextBlock only (by the number of words).
 class LabelFusion
          Fuses adjacent blocks if their labels are equal.
 class SimpleBlockFusionProcessor
          Merges two subsequent blocks if their text densities are equal.
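LabelFusion fuses adjacent blocks whose labels are equal. A minimal sketch of that merge over a hypothetical immutable `FBlock` record follows. How the text, content flags and labels of fused blocks are combined is an assumption of this sketch, not taken from the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical immutable block: text, content flag and label set. */
record FBlock(String text, boolean isContent, Set<String> labels) {}

final class LabelFusionSketch {
    /**
     * Returns a new list in which runs of adjacent blocks with equal label sets
     * are fused into one block. Assumptions of this sketch: fused text is joined
     * with a newline, and the fused block is content if any of its parts was.
     */
    static List<FBlock> fuse(List<FBlock> blocks) {
        List<FBlock> out = new ArrayList<>();
        for (FBlock b : blocks) {
            if (!out.isEmpty()) {
                FBlock last = out.get(out.size() - 1);
                if (last.labels().equals(b.labels())) {
                    // Replace the previous block by the fused one.
                    out.set(out.size() - 1, new FBlock(
                            last.text() + "\n" + b.text(),
                            last.isContent() || b.isContent(),
                            last.labels()));
                    continue;
                }
            }
            out.add(b);
        }
        return out;
    }
}
```

The input list is left untouched; fusing "Title", two unlabeled paragraphs and a footer yields three blocks, with the two paragraphs merged.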
 

Uses of BoilerpipeFilter in de.l3s.boilerpipe.filters.simple
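Most filters in this package are simple single passes over the blocks. The sketch below approximates three of them (MinWordsFilter, InvertedFilter and BoilerplateBlockFilter) over a hypothetical mutable `SBlock` type. Counting words by splitting on whitespace is an assumption, and each method returns whether it changed the document.

```java
import java.util.List;

/** Hypothetical mutable block: text plus content flag. */
final class SBlock {
    final String text;
    boolean isContent;

    SBlock(String text, boolean isContent) {
        this.text = text;
        this.isContent = isContent;
    }

    /** Counts whitespace-separated tokens (a simplifying assumption). */
    int numWords() {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }
}

final class SimpleFiltersSketch {
    /** Like MinWordsFilter: content blocks with fewer than k words become non-content. */
    static boolean minWords(List<SBlock> doc, int k) {
        boolean changed = false;
        for (SBlock b : doc) {
            if (b.isContent && b.numWords() < k) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    /** Like InvertedFilter: flips the content flag of every block. */
    static boolean invert(List<SBlock> doc) {
        for (SBlock b : doc) {
            b.isContent = !b.isContent;
        }
        // Any non-empty document is changed by a flip.
        return !doc.isEmpty();
    }

    /** Like BoilerplateBlockFilter: removes blocks not marked as content (needs a mutable list). */
    static boolean removeBoilerplate(List<SBlock> doc) {
        return doc.removeIf(b -> !b.isContent);
    }
}
```

`removeBoilerplate` changes the list's structure, so it must be given a mutable list such as an `ArrayList`; the other two methods only update flags.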
 

Classes in de.l3s.boilerpipe.filters.simple that implement BoilerpipeFilter
 class BoilerplateBlockFilter
          Removes TextBlocks which have explicitly been marked as "not content".
 class InvertedFilter
          Inverts the "isContent" flag of all TextBlocks.
 class LabelToBoilerplateFilter
          Marks all blocks that contain a given label as "boilerplate".
 class LabelToContentFilter
          Marks all blocks that contain a given label as "content".
 class MarkEverythingContentFilter
          Marks all blocks as content.
 class MinClauseWordsFilter
          Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5).
 class MinWordsFilter
          Keeps only those content blocks which contain at least k words.
 class SplitParagraphBlocksFilter
          Splits TextBlocks at paragraph boundaries.
 class SurroundingToContentFilter