de.l3s.boilerpipe.filters.english
Class DensityRulesClassifier

java.lang.Object
  extended by de.l3s.boilerpipe.filters.english.DensityRulesClassifier
All Implemented Interfaces:
BoilerpipeFilter

public class DensityRulesClassifier
extends java.lang.Object
implements BoilerpipeFilter

Classifies TextBlocks as content or not-content using rules derived with the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features". The rules rely mainly on text density and link density.
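
To make the two features concrete, here is a minimal sketch of how a link density and a text density could be computed for one block. It assumes the common reading of these terms (share of words inside anchor text, and words per wrapped line). The class name DensitySketch and both method names are hypothetical and are not part of this library, and the exact line-wrapping rules the library applies are not reproduced here.

```java
public class DensitySketch {

    /**
     * Share of a block's words that sit inside anchor (link) text.
     * Returns 0 for an empty block to avoid dividing by zero.
     */
    public static double linkDensity(int numWords, int numWordsInAnchorText) {
        return numWords == 0 ? 0.0 : (double) numWordsInAnchorText / numWords;
    }

    /**
     * Words per wrapped line. This is a simplification: the line count is
     * taken as given, so how a block is wrapped is left to the caller.
     */
    public static double textDensity(int numWords, int numWrappedLines) {
        return numWrappedLines == 0 ? 0.0 : (double) numWords / numWrappedLines;
    }

    public static void main(String[] args) {
        // A navigation-like block: few words, most of them linked.
        System.out.println(linkDensity(10, 8));
        // A paragraph-like block: many words spread over a few lines.
        System.out.println(textDensity(120, 8));
    }
}
```

Under this reading, navigation bars and link lists tend to show a high link density, while running prose tends to show a high text density and a low link density.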

Author:
Christian Kohlschütter

Field Summary
static DensityRulesClassifier INSTANCE
           
 
Constructor Summary
DensityRulesClassifier()
           
 
Method Summary
protected  boolean classify(TextBlock prev, TextBlock curr, TextBlock next)
           
static DensityRulesClassifier getInstance()
          Returns the singleton instance of DensityRulesClassifier.
 boolean process(TextDocument doc)
          Processes the given document doc.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INSTANCE

public static final DensityRulesClassifier INSTANCE
Constructor Detail

DensityRulesClassifier

public DensityRulesClassifier()
Method Detail

getInstance

public static DensityRulesClassifier getInstance()
Returns the singleton instance of DensityRulesClassifier.


process

public boolean process(TextDocument doc)
                throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.

Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException

classify

protected boolean classify(TextBlock prev,
                           TextBlock curr,
                           TextBlock next)
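
The signatures above suggest that process walks the document's blocks and calls classify on each block together with its neighbours. The sketch below illustrates that pattern in a self-contained form. Everything in it is an assumption made for illustration: the Block class is a hypothetical stand-in for TextBlock, the two thresholds are invented and are not the rules learned in the paper, and padding the ends of the list with an empty block is a guess about how boundaries might be handled.

```java
import java.util.List;

public class DensityRulesSketch {

    /** Hypothetical stand-in for TextBlock: two features and a label. */
    public static final class Block {
        public final double textDensity;
        public final double linkDensity;
        public boolean isContent; // defaults to false

        public Block(double textDensity, double linkDensity) {
            this.textDensity = textDensity;
            this.linkDensity = linkDensity;
        }
    }

    // Illustrative thresholds only; NOT the values used by the library.
    static final double MAX_LINK_DENSITY = 0.33;
    static final double MIN_TEXT_DENSITY = 10.0;

    /**
     * Labels curr using its own densities and those of its neighbours.
     * Returns true if the label of curr changed.
     */
    public static boolean classify(Block prev, Block curr, Block next) {
        boolean content;
        if (curr.linkDensity > MAX_LINK_DENSITY) {
            content = false; // mostly links: treat as boilerplate
        } else if (curr.textDensity >= MIN_TEXT_DENSITY) {
            content = true;  // dense prose with few links
        } else {
            // Sparse block: keep it only if a neighbour is dense prose.
            content = prev.textDensity >= MIN_TEXT_DENSITY
                    || next.textDensity >= MIN_TEXT_DENSITY;
        }
        boolean changed = curr.isContent != content;
        curr.isContent = content;
        return changed;
    }

    /** Slides a (prev, curr, next) window over the blocks; true if any label changed. */
    public static boolean process(List<Block> blocks) {
        Block empty = new Block(0.0, 0.0); // assumed padding at both ends
        boolean changed = false;
        for (int i = 0; i < blocks.size(); i++) {
            Block prev = i > 0 ? blocks.get(i - 1) : empty;
            Block next = i + 1 < blocks.size() ? blocks.get(i + 1) : empty;
            changed |= classify(prev, blocks.get(i), next);
        }
        return changed;
    }
}
```

With the real library, the equivalent step would be a call to process on a TextDocument through INSTANCE or getInstance(); the return value then reports whether any block's label was changed.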