de.l3s.boilerpipe.sax
Class BoilerpipeHTMLContentHandler

java.lang.Object
  extended by de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
All Implemented Interfaces:
org.xml.sax.ContentHandler

public class BoilerpipeHTMLContentHandler
extends java.lang.Object
implements org.xml.sax.ContentHandler

A simple SAX ContentHandler, used by BoilerpipeSAXInput. Can be used by different parser implementations, e.g. NekoHTML and TagSoup.

Author:
Christian Kohlschütter

Constructor Summary
BoilerpipeHTMLContentHandler()
           
 
Method Summary
 void characters(char[] ch, int start, int length)
           
 void endDocument()
           
 void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
           
 void endPrefixMapping(java.lang.String prefix)
           
 java.lang.String getTitle()
           
 void ignorableWhitespace(char[] ch, int start, int length)
           
 void processingInstruction(java.lang.String target, java.lang.String data)
           
 void setDocumentLocator(org.xml.sax.Locator locator)
           
 void setTitle(java.lang.String s)
           
 void skippedEntity(java.lang.String name)
           
 void startDocument()
           
 void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
           
 void startPrefixMapping(java.lang.String prefix, java.lang.String uri)
           
 TextDocument toTextDocument()
          Returns a TextDocument containing the extracted TextBlocks.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BoilerpipeHTMLContentHandler

public BoilerpipeHTMLContentHandler()
Method Detail

endDocument

public void endDocument()
                 throws org.xml.sax.SAXException
Specified by:
endDocument in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

endPrefixMapping

public void endPrefixMapping(java.lang.String prefix)
                      throws org.xml.sax.SAXException
Specified by:
endPrefixMapping in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

ignorableWhitespace

public void ignorableWhitespace(char[] ch,
                                int start,
                                int length)
                         throws org.xml.sax.SAXException
Specified by:
ignorableWhitespace in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

processingInstruction

public void processingInstruction(java.lang.String target,
                                  java.lang.String data)
                           throws org.xml.sax.SAXException
Specified by:
processingInstruction in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

setDocumentLocator

public void setDocumentLocator(org.xml.sax.Locator locator)
Specified by:
setDocumentLocator in interface org.xml.sax.ContentHandler

skippedEntity

public void skippedEntity(java.lang.String name)
                   throws org.xml.sax.SAXException
Specified by:
skippedEntity in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

startDocument

public void startDocument()
                   throws org.xml.sax.SAXException
Specified by:
startDocument in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

startPrefixMapping

public void startPrefixMapping(java.lang.String prefix,
                               java.lang.String uri)
                        throws org.xml.sax.SAXException
Specified by:
startPrefixMapping in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

startElement

public void startElement(java.lang.String uri,
                         java.lang.String localName,
                         java.lang.String qName,
                         org.xml.sax.Attributes atts)
                  throws org.xml.sax.SAXException
Specified by:
startElement in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

endElement

public void endElement(java.lang.String uri,
                       java.lang.String localName,
                       java.lang.String qName)
                throws org.xml.sax.SAXException
Specified by:
endElement in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Specified by:
characters in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

getTitle

public java.lang.String getTitle()

setTitle

public void setTitle(java.lang.String s)

toTextDocument

public TextDocument toTextDocument()
Returns a TextDocument containing the extracted TextBlocks. NOTE: Only call this after #parse(org.xml.sax.InputSource).

Returns:
The TextDocument