A B C D E G H I K L M N P S T U

A

addLabel(String) - Method in class de.l3s.boilerpipe.document.TextBlock
Adds an arbitrary String label to this TextBlock.
addLabels(Set<String>) - Method in class de.l3s.boilerpipe.document.TextBlock
Adds a set of labels to this TextBlock.
ArticleExtractor - Class in de.l3s.boilerpipe.extractors
A full-text extractor which is tuned towards news articles.
ArticleExtractor() - Constructor for class de.l3s.boilerpipe.extractors.ArticleExtractor
 
ArticleSentencesExtractor - Class in de.l3s.boilerpipe.extractors
A full-text extractor which is tuned towards extracting sentences from news articles.
ArticleSentencesExtractor() - Constructor for class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
 

B

BlockProximityFusion - Class in de.l3s.boilerpipe.filters.heuristics
Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit.
BlockProximityFusion(int, boolean) - Constructor for class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
Creates a new BlockProximityFusion instance.
BoilerpipeExtractor - Interface in de.l3s.boilerpipe
Describes a complete filter pipeline.
BoilerpipeFilter - Interface in de.l3s.boilerpipe
A generic BoilerpipeFilter.
BoilerpipeHTMLContentHandler - Class in de.l3s.boilerpipe.sax
A simple SAX ContentHandler, used by BoilerpipeSAXInput.
BoilerpipeHTMLContentHandler() - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
BoilerpipeHTMLParser - Class in de.l3s.boilerpipe.sax
A simple SAX Parser, used by BoilerpipeSAXInput.
BoilerpipeHTMLParser() - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
 
BoilerpipeInput - Interface in de.l3s.boilerpipe
A source that returns TextDocuments.
BoilerpipeProcessingException - Exception in de.l3s.boilerpipe
Exception for signaling failure in the processing pipeline.
BoilerpipeProcessingException() - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(String, Throwable) - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(String) - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(Throwable) - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeSAXInput - Class in de.l3s.boilerpipe.sax
Parses an InputSource using SAX and returns a TextDocument.
BoilerpipeSAXInput(InputSource) - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeSAXInput
Creates a new instance of BoilerpipeSAXInput for the given InputSource.
BoilerplateBlockFilter - Class in de.l3s.boilerpipe.filters.simple
Removes TextBlocks which have explicitly been marked as "not content".
BoilerplateBlockFilter() - Constructor for class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
 

C

characters(char[], int, int) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
classify(TextBlock, TextBlock, TextBlock) - Method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
 
classify(TextBlock, TextBlock, TextBlock) - Method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
 

D

de.l3s.boilerpipe - package de.l3s.boilerpipe
The Boilerpipe top-level package.
de.l3s.boilerpipe.document - package de.l3s.boilerpipe.document
The classes in this package represent the simple Boilerpipe document model.
de.l3s.boilerpipe.extractors - package de.l3s.boilerpipe.extractors
This package contains some standard extractors (i.e., completely piped BoilerpipeFilters).
de.l3s.boilerpipe.filters.english - package de.l3s.boilerpipe.filters.english
The BoilerpipeFilters in this package have only been tested on English text.
de.l3s.boilerpipe.filters.heuristics - package de.l3s.boilerpipe.filters.heuristics
The BoilerpipeFilters in this package are pure heuristics.
de.l3s.boilerpipe.filters.simple - package de.l3s.boilerpipe.filters.simple
The BoilerpipeFilters in this package are straightforward and probably not specific to English.
de.l3s.boilerpipe.sax - package de.l3s.boilerpipe.sax
Classes related to parsing HTML into Boilerpipe TextDocuments and producing HTML from them.
de.l3s.boilerpipe.util - package de.l3s.boilerpipe.util
Some helper classes.
debugString() - Method in class de.l3s.boilerpipe.document.TextDocument
Returns detailed debugging information about the contained TextBlocks.
DEFAULT_INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
DEFAULT_INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
 
DefaultExtractor - Class in de.l3s.boilerpipe.extractors
A quite generic full-text extractor.
DefaultExtractor() - Constructor for class de.l3s.boilerpipe.extractors.DefaultExtractor
 
DensityRulesClassifier - Class in de.l3s.boilerpipe.filters.english
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features", particularly using text densities and link densities.
DensityRulesClassifier() - Constructor for class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
 
DocumentTitleMatchClassifier - Class in de.l3s.boilerpipe.filters.heuristics
Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain.
DocumentTitleMatchClassifier(String) - Constructor for class de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 

E

EMPTY_END - Static variable in class de.l3s.boilerpipe.document.TextBlock
 
EMPTY_START - Static variable in class de.l3s.boilerpipe.document.TextBlock
 
endDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endElement(String, String, String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endPrefixMapping(String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
ExpandTitleToContentFilter - Class in de.l3s.boilerpipe.filters.heuristics
Marks as "content" all TextBlocks between the headline and the part that has already been marked content, provided they are labeled TextBlockLabel.MIGHT_BE_CONTENT.
ExpandTitleToContentFilter() - Constructor for class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
ExtractorBase - Class in de.l3s.boilerpipe.extractors
The base class of Extractors.
ExtractorBase() - Constructor for class de.l3s.boilerpipe.extractors.ExtractorBase
 

G

getContainedTextElements() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getContent() - Method in class de.l3s.boilerpipe.document.TextDocument
Returns the TextDocument's content.
getDefaultInstance() - Static method in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
Returns the singleton instance for IgnoreBlocksAfterContentFilter.
getDefaultInstance() - Static method in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
 
getHTML() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Returns the highlighted HTML code.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.ArticleExtractor
Returns the singleton instance for ArticleExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
Returns the singleton instance for ArticleSentencesExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.DefaultExtractor
Returns the singleton instance for DefaultExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.LargestContentExtractor
Returns the singleton instance for LargestContentExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
Returns the singleton instance for NumWordsRulesExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
Returns the singleton instance for DensityRulesClassifier.
getInstance() - Static method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
Returns the singleton instance for NumWordsRulesClassifier.
getInstance() - Static method in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
Returns the singleton instance for TerminatingBlocksFinder.
getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
Returns the singleton instance for ExpandTitleToContentFilter.
getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
Returns the singleton instance for SimpleBlockFusionProcessor.
getInstance() - Static method in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
Returns the singleton instance for BoilerplateBlockFilter.
getInstance() - Static method in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
Returns the singleton instance for SplitParagraphBlocksFilter.
getLabels() - Method in class de.l3s.boilerpipe.document.TextBlock
Returns the labels associated with this TextBlock, or null if no such labels exist.
getLinkDensity() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getNumWords() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getNumWordsInAnchorText() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getOffsetBlocksEnd() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getOffsetBlocksStart() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getText(String) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code given as a String.
getText(InputSource) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code available from the given InputSource.
getText(Reader) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code available from the given Reader.
getText(TextDocument) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
Extracts text from the given TextDocument object.
getText() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getText(boolean, boolean) - Method in class de.l3s.boilerpipe.document.TextDocument
Returns the TextDocument's content, non-content, or both.
getText(String) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code given as a String.
getText(InputSource) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given InputSource.
getText(URL) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given URL.
getText(Reader) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given Reader.
getText(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
Extracts text from the given TextDocument object.
getTextBlocks() - Method in class de.l3s.boilerpipe.document.TextDocument
Returns the TextBlocks of this document.
getTextDensity() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getTextDocument() - Method in interface de.l3s.boilerpipe.BoilerpipeInput
Returns (somehow) a TextDocument.
getTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeSAXInput
 
getTitle() - Method in class de.l3s.boilerpipe.document.TextDocument
Returns the "main" title for this document, or null if no such title has been set.
getTitle() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 

H

hasLabel(String) - Method in class de.l3s.boilerpipe.document.TextBlock
Checks whether this TextBlock has the given label.
HTMLHighlighter - Class in de.l3s.boilerpipe.sax
Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
HTMLHighlighter(TextDocument, String) - Constructor for class de.l3s.boilerpipe.sax.HTMLHighlighter
Prepares the HTMLHighlighter for the given TextDocument and the original HTML text (as a String).
HTMLHighlighter(TextDocument, InputSource) - Constructor for class de.l3s.boilerpipe.sax.HTMLHighlighter
Prepares the HTMLHighlighter for the given TextDocument and the original HTML text (as an InputSource).
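
The label entries in this index (addLabel, addLabels, getLabels, hasLabel) describe a small contract: labels are arbitrary Strings, and getLabels() returns null when no labels exist. The following is a minimal sketch of that contract only. LabeledBlockSketch is a hypothetical stand-in, not the library's TextBlock, and the lazy allocation of the set is an assumption suggested by the "or null" wording.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a labeled block; NOT de.l3s.boilerpipe.document.TextBlock.
class LabeledBlockSketch {
    // Assumption: the set is created lazily, so it stays null until a label is added.
    private Set<String> labels;

    // Adds a single arbitrary String label.
    void addLabel(String label) {
        if (labels == null) {
            labels = new HashSet<>();
        }
        labels.add(label);
    }

    // Adds a whole set of labels; an empty or null set changes nothing.
    void addLabels(Set<String> more) {
        if (more == null || more.isEmpty()) {
            return;
        }
        if (labels == null) {
            labels = new HashSet<>();
        }
        labels.addAll(more);
    }

    // Returns the labels, or null if none have been added.
    Set<String> getLabels() {
        return labels;
    }

    // Null-safe membership check.
    boolean hasLabel(String label) {
        return labels != null && labels.contains(label);
    }

    public static void main(String[] args) {
        LabeledBlockSketch block = new LabeledBlockSketch();
        System.out.println(block.getLabels());            // no labels added yet
        block.addLabel("EXAMPLE_LABEL");                  // hypothetical label value
        block.addLabels(new HashSet<>(Arrays.asList("A", "B")));
        System.out.println(block.hasLabel("EXAMPLE_LABEL"));
    }
}
```

Callers of a getLabels()-style method with this contract should check for null before iterating, or use hasLabel(String) for membership tests.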

I

ignorableWhitespace(char[], int, int) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
IgnoreBlocksAfterContentFilter - Class in de.l3s.boilerpipe.filters.english
Marks all blocks as "non-content" that occur after blocks that have been marked TextBlockLabel.INDICATES_END_OF_TEXT.
IgnoreBlocksAfterContentFilter(int) - Constructor for class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
INDICATES_END_OF_TEXT - Static variable in class de.l3s.boilerpipe.document.TextBlockLabel
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.ArticleExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.DefaultExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.KeepEverythingExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.LargestContentExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.InvertedFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.MarkEverythingContentFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
InvertedFilter - Class in de.l3s.boilerpipe.filters.simple
Inverts the "isContent" flag for all TextBlocks.
isContent() - Method in class de.l3s.boilerpipe.document.TextBlock
 

K

KeepEverythingExtractor - Class in de.l3s.boilerpipe.extractors
Marks everything as content.
KeepEverythingWithMinKWordsExtractor - Class in de.l3s.boilerpipe.extractors
A full-text extractor which marks everything as content that has at least k words.
KeepEverythingWithMinKWordsExtractor(int) - Constructor for class de.l3s.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
 
KeepLargestBlockFilter - Class in de.l3s.boilerpipe.filters.heuristics
Keeps the largest TextBlock only (by the number of words).
KeepLargestBlockFilter() - Constructor for class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
KeepLargestFulltextBlockFilter - Class in de.l3s.boilerpipe.filters.english
Keeps the largest TextBlock only (by the number of words).
KeepLargestFulltextBlockFilter() - Constructor for class de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 

L

LargestContentExtractor - Class in de.l3s.boilerpipe.extractors
A full-text extractor which extracts the largest text component of a page.
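
The idea of keeping only the largest block "by the number of words", as described for KeepLargestBlockFilter, can be sketched in plain Java. This is an illustration under stated assumptions, not the library's implementation: Block is a hypothetical stand-in for TextBlock, words are counted by splitting on whitespace, and the index does not say which other filters LargestContentExtractor applies.

```java
import java.util.ArrayList;
import java.util.List;

class LargestBlockSketch {
    // Hypothetical stand-in for a text block; NOT the library's TextBlock.
    static final class Block {
        final String text;
        boolean isContent;

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
        }

        // Assumption: a "word" is a whitespace-separated token.
        int numWords() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    // Keeps the content flag only on the content block with the most words.
    // Returns true if any block's flag was changed.
    static boolean keepLargest(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            if (b.isContent && (largest == null || b.numWords() > largest.numWords())) {
                largest = b;
            }
        }
        boolean changed = false;
        for (Block b : blocks) {
            if (b.isContent && b != largest) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("nav home", true));
        blocks.add(new Block("one two three four five", true));
        blocks.add(new Block("footer", false));
        keepLargest(blocks);
        for (Block b : blocks) {
            System.out.println(b.isContent + "\t" + b.text);
        }
    }
}
```

Returning a boolean that reports whether anything changed mirrors the filter style of the process(TextDocument) entries, although the return type of those methods is not shown in this index.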

M

MarkEverythingContentFilter - Class in de.l3s.boilerpipe.filters.simple
Marks all blocks as content.
MAX_DISTANCE_1 - Static variable in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
 
MAX_DISTANCE_1_CONTENT_ONLY - Static variable in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
 
mergeNext(TextBlock) - Method in class de.l3s.boilerpipe.document.TextBlock
 
MIGHT_BE_CONTENT - Static variable in class de.l3s.boilerpipe.document.TextBlockLabel
 
MinClauseWordsFilter - Class in de.l3s.boilerpipe.filters.simple
Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5).
MinClauseWordsFilter(int) - Constructor for class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
 
MinClauseWordsFilter(int, boolean) - Constructor for class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
 
MinFulltextWordsFilter - Class in de.l3s.boilerpipe.filters.english
Keeps only those content blocks which contain at least k full-text words (measured by TextBlock.getNumFullTextWords()). k is 30 by default.
MinFulltextWordsFilter(int) - Constructor for class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
 
MinWordsFilter - Class in de.l3s.boilerpipe.filters.simple
Keeps only those content blocks which contain at least k words.
MinWordsFilter(int) - Constructor for class de.l3s.boilerpipe.filters.simple.MinWordsFilter
 

N

NumWordsRulesClassifier - Class in de.l3s.boilerpipe.filters.english
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using number of words per block and link density per block.
NumWordsRulesClassifier() - Constructor for class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
 
NumWordsRulesExtractor - Class in de.l3s.boilerpipe.extractors
A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block).
NumWordsRulesExtractor() - Constructor for class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
 

P

process(TextDocument) - Method in interface de.l3s.boilerpipe.BoilerpipeFilter
Processes the given document doc.
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ArticleExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.DefaultExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.KeepEverythingExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.LargestContentExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.InvertedFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.MarkEverythingContentFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.MinWordsFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
processingInstruction(String, String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 

S

setDocumentLocator(Locator) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
setIsContent(boolean) - Method in class de.l3s.boilerpipe.document.TextBlock
 
setTitle(String) - Method in class de.l3s.boilerpipe.document.TextDocument
Updates the "main" title for this document.
setTitle(String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
SimpleBlockFusionProcessor - Class in de.l3s.boilerpipe.filters.heuristics
Merges two subsequent blocks if their text densities are equal.
SimpleBlockFusionProcessor() - Constructor for class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
skippedEntity(String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
SplitParagraphBlocksFilter - Class in de.l3s.boilerpipe.filters.simple
Splits TextBlocks at paragraph boundaries.
SplitParagraphBlocksFilter() - Constructor for class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
startDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startElement(String, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startPrefixMapping(String, String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 

T

TerminatingBlocksFinder - Class in de.l3s.boilerpipe.filters.english
Finds blocks that potentially indicate the end of an article text and marks them with TextBlockLabel.INDICATES_END_OF_TEXT.
TerminatingBlocksFinder() - Constructor for class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
 
TextBlock - Class in de.l3s.boilerpipe.document
Describes a block of text.
TextBlock(String) - Constructor for class de.l3s.boilerpipe.document.TextBlock
 
TextBlock(String, BitSet, int, int, int, int, int) - Constructor for class de.l3s.boilerpipe.document.TextBlock
 
TextBlockLabel - Class in de.l3s.boilerpipe.document
Some pre-defined labels which can be used in conjunction with TextBlock.addLabel(String) and TextBlock.hasLabel(String).
TextBlockLabel() - Constructor for class de.l3s.boilerpipe.document.TextBlockLabel
 
TextDocument - Class in de.l3s.boilerpipe.document
A text document, consisting of one or more TextBlocks.
TextDocument(List<TextBlock>) - Constructor for class de.l3s.boilerpipe.document.TextDocument
Creates a new TextDocument with given TextBlocks, and no title.
TextDocument(String, List<TextBlock>) - Constructor for class de.l3s.boilerpipe.document.TextDocument
Creates a new TextDocument with given TextBlocks and given title.
TITLE - Static variable in class de.l3s.boilerpipe.document.TextBlockLabel
 
tokenize(CharSequence) - Static method in class de.l3s.boilerpipe.util.UnicodeTokenizer
Tokenizes the text and returns an array of tokens.
toString() - Method in class de.l3s.boilerpipe.document.TextBlock
 
toTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Returns a TextDocument containing the extracted TextBlocks.
toTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
Returns a TextDocument containing the extracted TextBlocks.
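
The tokenize(CharSequence) entry above returns an array of tokens, and UnicodeTokenizer is described as splitting on Unicode word boundaries and stripping non-word characters. A rough approximation of that behavior using only the JDK is sketched below. It relies on java.text.BreakIterator and is not the library's implementation; the class name WordTokenizerSketch and the rule "keep a segment only if it contains a letter or digit" are assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; NOT de.l3s.boilerpipe.util.UnicodeTokenizer.
class WordTokenizerSketch {

    // Splits the text at word boundaries and drops segments without letters or digits.
    static String[] tokenize(CharSequence text) {
        String s = text.toString();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(s);

        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String candidate = s.substring(start, end);
            // Assumption: punctuation-only and whitespace-only segments are "non-word".
            if (containsLetterOrDigit(candidate)) {
                tokens.add(candidate);
            }
        }
        return tokens.toArray(new String[0]);
    }

    static boolean containsLetterOrDigit(String s) {
        return s.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        for (String token : tokenize("Hello, world!")) {
            System.out.println(token);
        }
    }
}
```

The exact token boundaries produced by the library may differ from this sketch, for example around apostrophes or hyphens.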

U

UnicodeTokenizer - Class in de.l3s.boilerpipe.util
Tokenizes text according to Unicode word boundaries and strips off non-word characters.
UnicodeTokenizer() - Constructor for class de.l3s.boilerpipe.util.UnicodeTokenizer
 
