de.l3s.boilerpipe.sax
Class BoilerpipeSAXInput

java.lang.Object
  extended by de.l3s.boilerpipe.sax.BoilerpipeSAXInput
All Implemented Interfaces:
BoilerpipeInput

public final class BoilerpipeSAXInput
extends java.lang.Object
implements BoilerpipeInput

Parses an InputSource using SAX and returns a TextDocument.

Author:
Christian Kohlschütter

Constructor Summary
BoilerpipeSAXInput(org.xml.sax.InputSource is)
          Creates a new instance of BoilerpipeSAXInput for the given InputSource.
 
Method Summary
 TextDocument getTextDocument()
          Retrieves the TextDocument using a default HTML parser.
 TextDocument getTextDocument(BoilerpipeHTMLParser parser)
          Retrieves the TextDocument using the given HTML parser.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BoilerpipeSAXInput

public BoilerpipeSAXInput(org.xml.sax.InputSource is)
                   throws org.xml.sax.SAXException
Creates a new instance of BoilerpipeSAXInput for the given InputSource.

Parameters:
is - The InputSource to parse.
Throws:
org.xml.sax.SAXException
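
The constructor takes a standard SAX InputSource, so a caller first has to wrap the HTML in one. The following is a minimal sketch of two common ways to do that using only JDK classes. The class name InputSourceExample and its two helper methods are hypothetical and are not part of boilerpipe.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

import org.xml.sax.InputSource;

// Hypothetical helper class; the name and methods are illustrative only
// and are not part of the boilerpipe API.
final class InputSourceExample {

    // Wraps already-decoded HTML text in an InputSource backed by a character stream.
    static InputSource fromHtml(String html) {
        return new InputSource(new StringReader(html));
    }

    // Wraps raw bytes in an InputSource and records the encoding a parser should assume.
    static InputSource fromBytes(byte[] bytes, String encoding) {
        InputSource is = new InputSource(new ByteArrayInputStream(bytes));
        is.setEncoding(encoding);
        return is;
    }

    public static void main(String[] args) {
        InputSource fromText = fromHtml("<html><body><p>Hello</p></body></html>");
        InputSource fromRaw = fromBytes("<p>Hi</p>".getBytes(StandardCharsets.UTF_8), "UTF-8");

        // Either object could then be handed to the BoilerpipeSAXInput constructor.
        System.out.println(fromText.getCharacterStream() != null);
        System.out.println(fromRaw.getEncoding());
    }
}
```

With boilerpipe on the classpath, either InputSource could be passed to new BoilerpipeSAXInput(is), and getTextDocument() could then be called on the resulting instance. Both steps declare checked exceptions (org.xml.sax.SAXException and BoilerpipeProcessingException respectively), so the caller has to catch or propagate them.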

Method Detail

getTextDocument

public TextDocument getTextDocument()
                             throws BoilerpipeProcessingException
Retrieves the TextDocument using a default HTML parser.

Specified by:
getTextDocument in interface BoilerpipeInput
Returns:
A TextDocument.
Throws:
BoilerpipeProcessingException

getTextDocument

public TextDocument getTextDocument(BoilerpipeHTMLParser parser)
                             throws BoilerpipeProcessingException
Retrieves the TextDocument using the given HTML parser.

Parameters:
parser - The parser used to transform the input into boilerpipe's internal representation.
Returns:
The retrieved TextDocument.
Throws:
BoilerpipeProcessingException