de.l3s.boilerpipe.document
Class TextBlock

java.lang.Object
  extended by de.l3s.boilerpipe.document.TextBlock
All Implemented Interfaces:
java.lang.Cloneable

public class TextBlock
extends java.lang.Object
implements java.lang.Cloneable

Describes a block of text. A block can be an "atomic" text element (i.e., a sequence of text that is not interrupted by any HTML markup) or a compound of such atomic elements.

Author:
Christian Kohlschütter

Field Summary
static TextBlock EMPTY_END
           
static TextBlock EMPTY_START
           
 
Constructor Summary
TextBlock(java.lang.String text)
           
TextBlock(java.lang.String text, java.util.BitSet containedTextElements, int numWords, int numWordsInAnchorText, int numWordsInWrappedLines, int numWrappedLines, int offsetBlocks)
           
 
Method Summary
 void addLabel(java.lang.String label)
          Adds an arbitrary String label to this TextBlock.
 void addLabels(java.util.Set<java.lang.String> l)
          Adds a set of labels to this TextBlock.
 void addLabels(java.lang.String... l)
          Adds a set of labels to this TextBlock.
protected  java.lang.Object clone()
           
 java.util.BitSet getContainedTextElements()
          Returns the containedTextElements BitSet, or null.
 java.util.Set<java.lang.String> getLabels()
          Returns the labels associated with this TextBlock, or null if no such labels exist.
 float getLinkDensity()
           
 int getNumWords()
           
 int getNumWordsInAnchorText()
           
 int getOffsetBlocksEnd()
           
 int getOffsetBlocksStart()
           
 java.lang.String getText()
           
 float getTextDensity()
           
 boolean hasLabel(java.lang.String label)
          Checks whether this TextBlock has the given label.
 boolean isContent()
           
 void mergeNext(TextBlock other)
           
 boolean setIsContent(boolean isContent)
           
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

EMPTY_START

public static final TextBlock EMPTY_START

EMPTY_END

public static final TextBlock EMPTY_END
Constructor Detail

TextBlock

public TextBlock(java.lang.String text)

TextBlock

public TextBlock(java.lang.String text,
                 java.util.BitSet containedTextElements,
                 int numWords,
                 int numWordsInAnchorText,
                 int numWordsInWrappedLines,
                 int numWrappedLines,
                 int offsetBlocks)
Method Detail

isContent

public boolean isContent()

setIsContent

public boolean setIsContent(boolean isContent)

getText

public java.lang.String getText()

getNumWords

public int getNumWords()

getNumWordsInAnchorText

public int getNumWordsInAnchorText()

getTextDensity

public float getTextDensity()

getLinkDensity

public float getLinkDensity()

mergeNext

public void mergeNext(TextBlock other)

getOffsetBlocksStart

public int getOffsetBlocksStart()

getOffsetBlocksEnd

public int getOffsetBlocksEnd()

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

addLabel

public void addLabel(java.lang.String label)
Adds an arbitrary String label to this TextBlock.

Parameters:
label - The label
See Also:
DefaultLabels

hasLabel

public boolean hasLabel(java.lang.String label)
Checks whether this TextBlock has the given label.

Parameters:
label - The label
Returns:
true if this block is marked by the given label.

getLabels

public java.util.Set<java.lang.String> getLabels()
Returns the labels associated with this TextBlock, or null if no such labels exist. NOTE: The returned instance is the one used directly in TextBlock, so you have full access to the underlying data structure. However, it is recommended to use the label-specific methods in TextBlock whenever possible.

Returns:
The set of labels, or null if no labels have been added yet.
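
Because getLabels() may return null and hands out the live set, callers often want a null-safe, read-only view. The following is a minimal sketch of that pattern. LabeledBlockSketch is a hypothetical stand-in that mimics only the label contract documented here (it is not the boilerpipe class), and labelsOrEmpty and the label "example" are illustrative names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in: models only the documented label contract of TextBlock.
class LabeledBlockSketch {
    // Stays null until the first label is added, matching "or null if no such labels exist".
    private Set<String> labels;

    void addLabel(String label) {
        if (labels == null) {
            labels = new HashSet<>();
        }
        labels.add(label);
    }

    boolean hasLabel(String label) {
        return labels != null && labels.contains(label);
    }

    // Returns the live internal set, or null if no label has been added yet.
    Set<String> getLabels() {
        return labels;
    }

    // Illustrative helper: never returns null and never exposes the mutable set.
    static Set<String> labelsOrEmpty(LabeledBlockSketch block) {
        Set<String> l = block.getLabels();
        if (l == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(l);
    }

    public static void main(String[] args) {
        LabeledBlockSketch block = new LabeledBlockSketch();
        // Safe to iterate even though getLabels() is still null here.
        System.out.println(labelsOrEmpty(block).isEmpty());
        block.addLabel("example");
        System.out.println(block.hasLabel("example"));
    }
}
```

With the real class, the same helper shape would wrap TextBlock.getLabels(); for simple membership tests, hasLabel avoids the null check altogether.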

addLabels

public void addLabels(java.util.Set<java.lang.String> l)
Adds a set of labels to this TextBlock. null-references are silently ignored.

Parameters:
l - The labels to be added.

addLabels

public void addLabels(java.lang.String... l)
Adds a set of labels to this TextBlock. null-references are silently ignored.

Parameters:
l - The labels to be added.
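
The two addLabels overloads can be sketched as follows, assuming that "null-references are silently ignored" refers to a null argument (the description does not say whether null elements are also skipped). AddLabelsSketch is a hypothetical stand-in, not the boilerpipe class, and the label names are illustrative. Note that a bare addLabels(null) call would be ambiguous between the two overloads, so the sketch casts explicitly.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in: models the documented null handling of addLabels.
class AddLabelsSketch {
    private Set<String> labels; // null until something is added

    void addLabels(Set<String> l) {
        if (l == null) {
            return; // assumption: a null set is ignored without error
        }
        if (labels == null) {
            labels = new HashSet<>(l);
        } else {
            labels.addAll(l);
        }
    }

    void addLabels(String... l) {
        if (l == null) {
            return; // assumption: a null array is ignored without error
        }
        if (labels == null) {
            labels = new HashSet<>();
        }
        for (String label : l) {
            labels.add(label);
        }
    }

    Set<String> getLabels() {
        return labels;
    }

    public static void main(String[] args) {
        AddLabelsSketch block = new AddLabelsSketch();
        // Casts select an overload; without them the call does not compile.
        block.addLabels((Set<String>) null);
        block.addLabels((String[]) null);
        block.addLabels("first", "second");
        System.out.println(block.getLabels().size());
    }
}
```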

getContainedTextElements

public java.util.BitSet getContainedTextElements()
Returns the containedTextElements BitSet, or null.

Returns:
the containedTextElements BitSet, or null

clone

protected java.lang.Object clone()
Overrides:
clone in class java.lang.Object