de.l3s.boilerpipe.extractors
Class LargestContentExtractor

java.lang.Object
  extended by de.l3s.boilerpipe.extractors.ExtractorBase
      extended by de.l3s.boilerpipe.extractors.LargestContentExtractor
All Implemented Interfaces:
BoilerpipeExtractor, BoilerpipeFilter

public final class LargestContentExtractor
extends ExtractorBase

A full-text extractor that extracts the largest text component of a page. For news articles, it may perform better than the DefaultExtractor but usually worse than the ArticleExtractor.
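As a rough illustration of "the largest text component", the sketch below models a page as a list of text blocks, each with a content flag, and keeps only the content block with the most words. LargestBlockSketch, Block, numWords and keepLargest are hypothetical names. boilerpipe's own TextDocument and TextBlock types carry far more information, and this is not the library's implementation.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of "keep the largest text component".
 * Not boilerpipe's implementation.
 */
final class LargestBlockSketch {

    /** Hypothetical stand-in for one text block of a parsed page. */
    static final class Block {
        final String text;
        boolean isContent;

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
        }

        /** Counts whitespace-separated tokens as a rough word count. */
        int numWords() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    /**
     * Keeps only the largest block (by word count) among those currently marked as content.
     * Returns true if any block's label changed, in the spirit of process(...).
     */
    static boolean keepLargest(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            // strict '>' means the earliest block wins a tie
            if (b.isContent && (largest == null || b.numWords() > largest.numWords())) {
                largest = b;
            }
        }
        if (largest == null) {
            return false; // nothing is marked as content, so nothing changes
        }
        boolean changed = false;
        for (Block b : blocks) {
            if (b != largest && b.isContent) {
                b.isContent = false; // demote every other content block
                changed = true;
            }
        }
        return changed;
    }
}
```

In application code the extractor itself is normally called through one of its inherited getText overloads, for example LargestContentExtractor.INSTANCE.getText(html) for an HTML string.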

Author:
Christian Kohlschütter

Field Summary
static LargestContentExtractor INSTANCE
           
 
Method Summary
static LargestContentExtractor getInstance()
          Returns the singleton instance for LargestContentExtractor.
 boolean process(TextDocument doc)
          Processes the given document doc.
 
Methods inherited from class de.l3s.boilerpipe.extractors.ExtractorBase
getText, getText, getText, getText, getText
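The five inherited getText overloads differ only in what input they accept. One plausible reading of how they relate to process is "run the filter on the document, then return the text of the blocks still marked as content". The sketch below shows that shape using hypothetical names (ExtractorSketch, Doc, getContent); it is an assumption for illustration, not the source of ExtractorBase.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a getText-style convenience method layered on a process(...) filter.
 * Names are illustrative, not boilerpipe's API.
 */
abstract class ExtractorSketch {

    /** Hypothetical document: parallel lists of block text and content flags. */
    static final class Doc {
        final List<String> texts = new ArrayList<>();
        final List<Boolean> content = new ArrayList<>();

        void add(String text) {
            texts.add(text);
            content.add(Boolean.TRUE); // every block starts out labelled as content
        }

        /** Joins the text of all blocks still labelled as content, one per line. */
        String getContent() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < texts.size(); i++) {
                if (content.get(i)) {
                    if (sb.length() > 0) {
                        sb.append('\n');
                    }
                    sb.append(texts.get(i));
                }
            }
            return sb.toString();
        }
    }

    /** Relabels blocks in place; returns true if any label changed. */
    abstract boolean process(Doc doc);

    /** Runs the filter, then returns only the text that survived it. */
    final String getText(Doc doc) {
        process(doc);
        return doc.getContent();
    }
}
```

Under this reading, each overload would only differ in how it turns its input (a string, a URL, a reader, and so on) into a document before delegating.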
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INSTANCE

public static final LargestContentExtractor INSTANCE
Method Detail

getInstance

public static LargestContentExtractor getInstance()
Returns the singleton instance for LargestContentExtractor.
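INSTANCE and getInstance() point to the eager-singleton pattern, and this page lists no public constructor. A minimal sketch of that pattern follows, using a hypothetical SingletonFilterSketch class; it mirrors the shape of the API above but is not the boilerpipe source.

```java
/**
 * Hypothetical class mirroring the INSTANCE / getInstance() pattern:
 * one eagerly created instance shared by all callers.
 */
final class SingletonFilterSketch {

    /** Created once during class initialization, which the JVM performs thread-safely. */
    public static final SingletonFilterSketch INSTANCE = new SingletonFilterSketch();

    /** Private constructor: callers cannot create additional instances. */
    private SingletonFilterSketch() {
    }

    /** Accessor equivalent to reading the public field. */
    public static SingletonFilterSketch getInstance() {
        return INSTANCE;
    }
}
```

Both access paths return the same object. Sharing one instance across threads is only safe if the filter keeps no per-document state, which this sketch assumes.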


process

public boolean process(TextDocument doc)
                throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.

Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException
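process modifies the document in place and returns true only if it changed something. The sketch below illustrates that contract with a hypothetical Filter interface over a List<String>, standing in for the real BoilerpipeFilter and TextDocument. It also shows one natural way to combine such filters: the non-short-circuit | operator runs every filter and still reports true if any of them changed the document.

```java
import java.util.List;

/**
 * Hypothetical sketch of the process(...) contract: each filter mutates the document
 * and reports whether it changed anything. Names are illustrative, not boilerpipe's API.
 */
final class FilterChainSketch {

    /** Hypothetical stand-in for BoilerpipeFilter. */
    interface Filter {
        boolean process(List<String> doc);
    }

    /** Removes blank blocks; reports a change only if something was removed. */
    static final Filter DROP_BLANK = doc -> doc.removeIf(s -> s.trim().isEmpty());

    /** Trims surrounding whitespace; reports a change only if some block differed. */
    static final Filter TRIM = doc -> {
        boolean changed = false;
        for (int i = 0; i < doc.size(); i++) {
            String trimmed = doc.get(i).trim();
            if (!trimmed.equals(doc.get(i))) {
                doc.set(i, trimmed);
                changed = true;
            }
        }
        return changed;
    };

    /** Composite filter: true if either step changed the document. */
    static boolean process(List<String> doc) {
        // '|' evaluates both operands, so TRIM still runs when DROP_BLANK returns true;
        // '||' would skip TRIM in that case.
        return DROP_BLANK.process(doc) | TRIM.process(doc);
    }
}
```

Running the composite a second time on its own output returns false, which is a convenient way to check that a filter chain has reached a fixed point.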