Packages that use de.l3s.boilerpipe.filters.english | Description
---|---
de.l3s.boilerpipe.filters.english | The BoilerpipeFilters in this package have only been tested on English text.
Classes in de.l3s.boilerpipe.filters.english used by de.l3s.boilerpipe.filters.english
DensityRulesClassifier
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), in particular using text densities and link densities.
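To make the two features concrete, here is a minimal Java sketch. `DensityBlock`, `DensityRulesSketch`, and both thresholds are hypothetical: they are not boilerpipe's `TextBlock` API and not the rules actually learned with C4.8. The sketch assumes that text density means words per line after word-wrapping the block at a fixed column width and that link density means the fraction of a block's words that are link text. Consulting the previous and next block is also a simplifying assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a text block; not boilerpipe's TextBlock. */
final class DensityBlock {
    final int numWords;        // all words in the block
    final int numLinkedWords;  // words that appear inside links
    final int numWrappedLines; // lines the text covers after word-wrapping

    DensityBlock(int numWords, int numLinkedWords, int numWrappedLines) {
        this.numWords = numWords;
        this.numLinkedWords = numLinkedWords;
        this.numWrappedLines = numWrappedLines;
    }

    /** Words per wrapped line (0 for an empty block). */
    double textDensity() {
        return numWrappedLines == 0 ? 0.0 : (double) numWords / numWrappedLines;
    }

    /** Fraction of the block's words that are link text (0 for an empty block). */
    double linkDensity() {
        return numWords == 0 ? 0.0 : (double) numLinkedWords / numWords;
    }

    /** Counts the lines needed to word-wrap text at the given column width. */
    static int wrappedLines(String text, int width) {
        int lines = 0;
        int lineLen = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            if (lineLen > 0 && lineLen + 1 + word.length() <= width) {
                lineLen += 1 + word.length(); // word fits on the current line
            } else {
                lines++; // start a new line (including the very first one)
                lineLen = word.length();
            }
        }
        return lines;
    }
}

final class DensityRulesSketch {
    // Illustrative thresholds only; NOT the values learned with C4.8 in the paper.
    static final double MAX_LINK_DENSITY = 0.33;
    static final double MIN_TEXT_DENSITY = 9.0;

    static boolean isContent(DensityBlock prev, DensityBlock curr, DensityBlock next) {
        if (curr.linkDensity() > MAX_LINK_DENSITY) {
            return false; // mostly link text: navigation, link lists
        }
        if (curr.textDensity() > MIN_TEXT_DENSITY) {
            return true; // dense running text
        }
        // Sparse block: keep it only when a neighbour is dense text.
        return prev.textDensity() > MIN_TEXT_DENSITY || next.textDensity() > MIN_TEXT_DENSITY;
    }

    static List<Boolean> classify(List<DensityBlock> blocks) {
        DensityBlock none = new DensityBlock(0, 0, 0); // stands in for a missing neighbour
        List<Boolean> labels = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            DensityBlock prev = i > 0 ? blocks.get(i - 1) : none;
            DensityBlock next = i + 1 < blocks.size() ? blocks.get(i + 1) : none;
            labels.add(isContent(prev, blocks.get(i), next));
        }
        return labels;
    }
}
```

In practice `numWrappedLines` would come from a call such as `DensityBlock.wrappedLines(text, 80)`; the width is a parameter because the 80-column choice is itself an assumption here.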
IgnoreBlocksAfterContentFilter
Marks as "non-content" all blocks that occur after blocks that have been marked TextBlockLabel.INDICATES_END_OF_TEXT.
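A minimal sketch of this behaviour, assuming a hypothetical `LabeledBlock` type whose boolean `indicatesEndOfText` stands in for the `TextBlockLabel.INDICATES_END_OF_TEXT` label (this is not boilerpipe's API). The description does not say whether the marked block itself stays content, so the sketch leaves it untouched; that is an assumption.

```java
import java.util.List;

/** Hypothetical stand-in for a text block with a content flag and one label. */
final class LabeledBlock {
    final String text;
    final boolean indicatesEndOfText; // stands in for TextBlockLabel.INDICATES_END_OF_TEXT
    boolean isContent;

    LabeledBlock(String text, boolean isContent, boolean indicatesEndOfText) {
        this.text = text;
        this.isContent = isContent;
        this.indicatesEndOfText = indicatesEndOfText;
    }
}

final class IgnoreBlocksAfterContentSketch {
    /** Marks every block after the first end-of-text block as non-content; returns true if anything changed. */
    static boolean process(List<LabeledBlock> blocks) {
        boolean seenEnd = false;
        boolean changed = false;
        for (LabeledBlock block : blocks) {
            if (seenEnd && block.isContent) {
                block.isContent = false;
                changed = true;
            }
            if (block.indicatesEndOfText) {
                seenEnd = true; // the marker block itself is left as it is (an assumption)
            }
        }
        return changed;
    }
}
```

A second call on the same list changes nothing, so running the filter repeatedly is harmless.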
KeepLargestFulltextBlockFilter
Keeps only the largest TextBlock (by number of words).
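The selection step can be sketched as follows. `WordBlock` and `KeepLargestFulltextBlockSketch` are hypothetical names; the sketch assumes that only blocks already marked as content compete and that ties go to the earliest block.

```java
import java.util.List;

/** Hypothetical stand-in for a text block; not boilerpipe's TextBlock. */
final class WordBlock {
    final int numWords;
    boolean isContent;

    WordBlock(int numWords, boolean isContent) {
        this.numWords = numWords;
        this.isContent = isContent;
    }
}

final class KeepLargestFulltextBlockSketch {
    /** Keeps the content block with the most words and demotes all others; returns true if anything changed. */
    static boolean process(List<WordBlock> blocks) {
        WordBlock largest = null;
        for (WordBlock block : blocks) {
            // Only content blocks compete; strict '>' means ties keep the earliest block.
            if (block.isContent && (largest == null || block.numWords > largest.numWords)) {
                largest = block;
            }
        }
        if (largest == null) {
            return false; // no content blocks, nothing to do
        }
        boolean changed = false;
        for (WordBlock block : blocks) {
            if (block != largest && block.isContent) {
                block.isContent = false;
                changed = true;
            }
        }
        return changed;
    }
}
```

Comparing by reference (`block != largest`) rather than by word count keeps exactly one block even when several share the maximum.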
MinFulltextWordsFilter
Keeps only those content blocks that contain at least k full-text words (as measured by TextBlock#getNumFullTextWords()). k is 30 by default.
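A minimal sketch of the threshold filter, assuming a hypothetical `FullTextBlock` whose `numFullTextWords` has already been computed. It stands in for `TextBlock#getNumFullTextWords()`, whose exact definition is not reproduced here. Because the rule is "at least k", a block with exactly 30 full-text words survives the default setting.

```java
import java.util.List;

/** Hypothetical stand-in for a text block; not boilerpipe's TextBlock. */
final class FullTextBlock {
    final int numFullTextWords; // precomputed; stands in for TextBlock#getNumFullTextWords()
    boolean isContent;

    FullTextBlock(int numFullTextWords, boolean isContent) {
        this.numFullTextWords = numFullTextWords;
        this.isContent = isContent;
    }
}

final class MinFulltextWordsSketch {
    static final int DEFAULT_MIN_WORDS = 30; // k = 30 by default, per the description

    /** Demotes content blocks with fewer than minWords full-text words; returns true if anything changed. */
    static boolean process(List<FullTextBlock> blocks, int minWords) {
        boolean changed = false;
        for (FullTextBlock block : blocks) {
            if (block.isContent && block.numFullTextWords < minWords) {
                block.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    static boolean process(List<FullTextBlock> blocks) {
        return process(blocks, DEFAULT_MIN_WORDS);
    }
}
```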
NumWordsRulesClassifier
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), in particular using the number of words per block and the link density per block.
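The shape of such word-count rules can be sketched as a small hand-written decision tree. Everything here is hypothetical: `WordsLinkBlock`, `NumWordsRulesSketch`, the thresholds, and the tree itself are illustrative and are not the rules learned in the paper. Looking at the previous and next block in addition to the current one is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a text block; not boilerpipe's TextBlock. */
final class WordsLinkBlock {
    final int numWords;       // all words in the block
    final int numLinkedWords; // words that appear inside links

    WordsLinkBlock(int numWords, int numLinkedWords) {
        this.numWords = numWords;
        this.numLinkedWords = numLinkedWords;
    }

    /** Fraction of the block's words that are link text (0 for an empty block). */
    double linkDensity() {
        return numWords == 0 ? 0.0 : (double) numLinkedWords / numWords;
    }
}

final class NumWordsRulesSketch {
    // Illustrative thresholds; NOT the values learned in the paper.
    static final double MAX_LINK_DENSITY = 0.33;
    static final int LONG_BLOCK_WORDS = 16;

    static boolean isContent(WordsLinkBlock prev, WordsLinkBlock curr, WordsLinkBlock next) {
        if (curr.linkDensity() > MAX_LINK_DENSITY) {
            return false; // link-heavy: navigation, tag clouds
        }
        if (curr.numWords > LONG_BLOCK_WORDS) {
            return true; // long running text
        }
        if (prev.linkDensity() > MAX_LINK_DENSITY) {
            return false; // short block directly after a link list
        }
        return next.numWords > LONG_BLOCK_WORDS; // short lead-in to a long block
    }

    static List<Boolean> classify(List<WordsLinkBlock> blocks) {
        WordsLinkBlock none = new WordsLinkBlock(0, 0); // stands in for a missing neighbour
        List<Boolean> labels = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            WordsLinkBlock prev = i > 0 ? blocks.get(i - 1) : none;
            WordsLinkBlock next = i + 1 < blocks.size() ? blocks.get(i + 1) : none;
            labels.add(isContent(prev, blocks.get(i), next));
        }
        return labels;
    }
}
```

Word counts are cheaper to obtain than wrapped-line text densities, which is one practical reason to prefer rules of this kind when both perform similarly.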
TerminatingBlocksFinder
Finds blocks that potentially indicate the end of an article's text and marks them with TextBlockLabel.INDICATES_END_OF_TEXT.
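A rough sketch of such a finder, which matches lowercase cue phrases at the start of short blocks. The `TextBlockStub` type, the cue list, and the 15-word limit are hypothetical choices made for illustration, not the heuristics boilerpipe actually uses; the boolean field stands in for `TextBlockLabel.INDICATES_END_OF_TEXT`.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a text block; not boilerpipe's TextBlock. */
final class TextBlockStub {
    final String text;
    boolean indicatesEndOfText; // stands in for TextBlockLabel.INDICATES_END_OF_TEXT

    TextBlockStub(String text) {
        this.text = text;
    }
}

final class TerminatingBlocksSketch {
    // Hypothetical cue phrases and word limit, chosen only for illustration.
    static final List<String> END_CUES =
            List.of("comments", "post a comment", "all rights reserved", "related articles");
    static final int MAX_WORDS = 15;

    /** Flags short blocks that start with an end-of-article cue; returns true if any flag changed. */
    static boolean process(List<TextBlockStub> blocks) {
        boolean changed = false;
        for (TextBlockStub block : blocks) {
            String text = block.text.trim().toLowerCase(Locale.ROOT);
            int numWords = text.isEmpty() ? 0 : text.split("\\s+").length;
            if (numWords > MAX_WORDS) {
                continue; // long blocks are assumed to be running text, not page furniture
            }
            for (String cue : END_CUES) {
                if (text.startsWith(cue)) {
                    if (!block.indicatesEndOfText) {
                        block.indicatesEndOfText = true;
                        changed = true;
                    }
                    break;
                }
            }
        }
        return changed;
    }
}
```

Blocks flagged this way are what IgnoreBlocksAfterContentFilter, as described above, keys on when it marks the rest of the page as non-content.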