de.l3s.boilerpipe.extractors
Class CommonExtractors

java.lang.Object
  extended by de.l3s.boilerpipe.extractors.CommonExtractors

public final class CommonExtractors
extends java.lang.Object

Provides quick access to common BoilerpipeExtractors.

Author:
Christian Kohlschütter

Field Summary
static ArticleExtractor ARTICLE_EXTRACTOR
          Works very well for most types of article-like HTML.
static CanolaExtractor CANOLA_EXTRACTOR
          Trained on krdwrd Canola (different definition of "boilerplate").
static DefaultExtractor DEFAULT_EXTRACTOR
          Usually performs worse than ArticleExtractor, but is simpler and uses no heuristics.
static KeepEverythingExtractor KEEP_EVERYTHING_EXTRACTOR
          Dummy extractor; should return the input text.
 
Method Summary
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ARTICLE_EXTRACTOR

public static final ArticleExtractor ARTICLE_EXTRACTOR
Works very well for most types of article-like HTML.


DEFAULT_EXTRACTOR

public static final DefaultExtractor DEFAULT_EXTRACTOR
Usually performs worse than ArticleExtractor, but is simpler and uses no heuristics.


CANOLA_EXTRACTOR

public static final CanolaExtractor CANOLA_EXTRACTOR
Trained on krdwrd Canola (different definition of "boilerplate"). You may give it a try.


KEEP_EVERYTHING_EXTRACTOR

public static final KeepEverythingExtractor KEEP_EVERYTHING_EXTRACTOR
Dummy extractor; should return the input text. Use this to check whether a problem lies within a particular BoilerpipeExtractor or somewhere else (for example, in the input or in HTML parsing).
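
The check described for KEEP_EVERYTHING_EXTRACTOR can be sketched as a small comparison: if a phrase you expect is already missing from the keep-everything output, the problem is probably not in the extractor you are testing; if it is present there but missing from that extractor's output, the extractor is the likely cause. The class ExtractorDiagnosis, the enum Verdict and the method diagnose are hypothetical names introduced for illustration and are not part of boilerpipe. The sketch works on plain strings so that it runs without the library; the comments show where the real extractor calls would presumably go.

```java
public class ExtractorDiagnosis {

    /** Possible outcomes of the comparison (illustrative names, not part of boilerpipe). */
    enum Verdict { OK, WITHIN_EXTRACTOR, SOMEWHERE_ELSE }

    /**
     * Compares the keep-everything output with the output of the extractor under test.
     *
     * @param keepEverythingText text returned by the keep-everything extractor
     * @param extractorText      text returned by the extractor under test
     * @param expectedPhrase     a phrase that should survive extraction
     */
    static Verdict diagnose(String keepEverythingText, String extractorText, String expectedPhrase) {
        // If even the dummy extractor lost the phrase, the extractor under test is not to blame.
        if (!keepEverythingText.contains(expectedPhrase)) {
            return Verdict.SOMEWHERE_ELSE;
        }
        // The phrase reached the pipeline; see whether the extractor under test kept it.
        return extractorText.contains(expectedPhrase) ? Verdict.OK : Verdict.WITHIN_EXTRACTOR;
    }

    public static void main(String[] args) {
        // Assumption: with boilerpipe on the classpath, the two strings would come from
        //   CommonExtractors.KEEP_EVERYTHING_EXTRACTOR.getText(html)
        //   CommonExtractors.ARTICLE_EXTRACTOR.getText(html)
        // (both calls throw the checked BoilerpipeProcessingException).
        // Literal stand-ins are used here so the sketch runs without the library.
        String keepEverythingText = "Home | About\nThe quick brown fox jumps.\nCopyright";
        String articleText = "";

        System.out.println(diagnose(keepEverythingText, articleText, "quick brown fox"));
    }
}
```

A substring test is a deliberately crude criterion; comparing text lengths or diffing the two outputs would serve the same purpose when no single expected phrase is known.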