java.lang.Object
  de.l3s.boilerpipe.sax.HTMLHighlighter
public final class HTMLHighlighter
Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
Method Summary | |
---|---|
java.lang.String | getExtraStyleSheet() - Returns the extra stylesheet definition that will be inserted in the HEAD element. |
java.lang.String | getPostHighlight() - Returns the string that will be inserted after any highlighted HTML block. |
java.lang.String | getPreHighlight() - Returns the string that will be inserted before any highlighted HTML block. |
boolean | isOutputHighlightOnly() - If true, only HTML enclosed within highlighted content will be returned. |
static HTMLHighlighter | newExtractingInstance() - Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup. |
static HTMLHighlighter | newHighlightingInstance() - Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted. |
java.lang.String | process(TextDocument doc, org.xml.sax.InputSource is) - Processes the given TextDocument and the original HTML text (as an InputSource). |
java.lang.String | process(TextDocument doc, java.lang.String origHTML) - Processes the given TextDocument and the original HTML text (as a String). |
java.lang.String | process(java.net.URL url, BoilerpipeExtractor extractor) |
void | setExtraStyleSheet(java.lang.String extraStyleSheet) - Sets the extra stylesheet definition that will be inserted in the HEAD element. |
void | setOutputHighlightOnly(boolean outputHighlightOnly) - Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document. |
void | setPostHighlight(java.lang.String postHighlight) - Sets the string that will be inserted after any highlighted HTML block. |
void | setPreHighlight(java.lang.String preHighlight) - Sets the string that will be inserted prior to any highlighted HTML block. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static HTMLHighlighter newHighlightingInstance()
Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted.
public static HTMLHighlighter newExtractingInstance()
Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup.
public java.lang.String process(TextDocument doc, java.lang.String origHTML) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as a String).
Parameters:
doc - The processed TextDocument.
origHTML - The original HTML document.
Throws:
BoilerpipeProcessingException
public java.lang.String process(TextDocument doc, org.xml.sax.InputSource is) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as an InputSource).
Parameters:
doc - The processed TextDocument.
is - The original HTML document.
Throws:
BoilerpipeProcessingException
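A caller that holds the original HTML in memory but wants to use the InputSource overload could wrap the text as sketched below. This uses only the JDK's org.xml.sax.InputSource and java.io.StringReader; the class name InputSourceSketch and both helper methods are hypothetical names chosen for illustration, and nothing here is taken from boilerpipe itself. Whether the String overload performs the same wrapping internally is not stated on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.xml.sax.InputSource;

// Illustrative helper, not part of boilerpipe.
final class InputSourceSketch {

    // Wraps in-memory HTML text in an InputSource backed by a character stream.
    static InputSource fromHtml(String html) {
        return new InputSource(new StringReader(html));
    }

    // Reads the character stream of an InputSource back into a String,
    // only to show that the wrapped text is what a consumer would see.
    static String readAll(InputSource is) {
        Reader reader = is.getCharacterStream();
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        InputSource is = fromHtml("<html><body><p>Hello</p></body></html>");
        System.out.println(readAll(is));
    }
}
```

An InputSource built this way can be consumed only once, because the underlying reader is exhausted after the first pass; create a fresh one for each call.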
public java.lang.String process(java.net.URL url, BoilerpipeExtractor extractor) throws java.io.IOException, BoilerpipeProcessingException, org.xml.sax.SAXException
Throws:
java.io.IOException
BoilerpipeProcessingException
org.xml.sax.SAXException
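public boolean isOutputHighlightOnly()
If true, only HTML enclosed within highlighted content will be returned.
public void setOutputHighlightOnly(boolean outputHighlightOnly)
Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
public java.lang.String getExtraStyleSheet()
Returns the extra stylesheet definition that will be inserted in the HEAD element.
public void setExtraStyleSheet(java.lang.String extraStyleSheet)
Sets the extra stylesheet definition that will be inserted in the HEAD element.
Parameters:
extraStyleSheet - Plain HTML
public java.lang.String getPreHighlight()
Returns the string that will be inserted before any highlighted HTML block.
Default: <span class="x-boilerpipe-mark1">
public void setPreHighlight(java.lang.String preHighlight)
Sets the string that will be inserted prior to any highlighted HTML block.
public java.lang.String getPostHighlight()
Returns the string that will be inserted after any highlighted HTML block.
Default: </span>
public void setPostHighlight(java.lang.String postHighlight)
Sets the string that will be inserted after any highlighted HTML block.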
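To make the interplay of the pre-highlight string, the post-highlight string, and the highlight-only flag concrete, here is a toy model. It is a sketch under loud assumptions: the HighlightSketch class, its Block type, and the render method are hypothetical and do not reproduce boilerpipe's implementation. The model only mirrors the behaviour described above: content blocks are wrapped in the pre and post strings, and non-content HTML is dropped when highlight-only output is requested.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT boilerpipe's implementation.
final class HighlightSketch {

    // Hypothetical stand-in for a fragment of the original HTML,
    // flagged as content or non-content.
    static final class Block {
        final String html;
        final boolean isContent;

        Block(String html, boolean isContent) {
            this.html = html;
            this.isContent = isContent;
        }
    }

    // Wraps content blocks in pre/post; keeps non-content blocks
    // only when highlightOnly is false.
    static String render(List<Block> blocks, String pre, String post, boolean highlightOnly) {
        StringBuilder out = new StringBuilder();
        for (Block b : blocks) {
            if (b.isContent) {
                out.append(pre).append(b.html).append(post);
            } else if (!highlightOnly) {
                out.append(b.html);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Block> page = Arrays.asList(
                new Block("<div>Menu</div>", false),
                new Block("<p>Article text</p>", true));

        // Whole document, content wrapped in the wrapper strings.
        System.out.println(render(page, "<span class=\"x-boilerpipe-mark1\">", "</span>", false));

        // Content only; the empty wrapper strings are an arbitrary choice for this sketch.
        System.out.println(render(page, "", "", true));
    }
}
```

In the real class the same three settings are exposed through setPreHighlight, setPostHighlight, and setOutputHighlightOnly; the sketch passes them as plain arguments to keep the example self-contained.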