de.l3s.boilerpipe.util
Class UnicodeTokenizer

java.lang.Object
  extended by de.l3s.boilerpipe.util.UnicodeTokenizer

public class UnicodeTokenizer
extends java.lang.Object

Tokenizes text according to Unicode word boundaries and strips off non-word characters.

Author:
Christian Kohlschütter

Constructor Summary
UnicodeTokenizer()
 
Method Summary
static java.lang.String[] tokenize(java.lang.CharSequence text)
          Tokenizes the text and returns an array of tokens.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UnicodeTokenizer

public UnicodeTokenizer()
Method Detail

tokenize

public static java.lang.String[] tokenize(java.lang.CharSequence text)
Tokenizes the text and returns an array of tokens.

Parameters:
text - The text to tokenize
Returns:
The tokens extracted from text, with non-word characters stripped off
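To illustrate the idea in the class description (split at Unicode word boundaries, then drop non-word characters), here is a minimal sketch. It is not boilerpipe's implementation, whose internals this page does not document. It substitutes the JDK's java.text.BreakIterator word iterator for boundary detection and assumes that a "non-word" segment is one that contains no letter or digit. The class name WordBoundaryTokenizerSketch is hypothetical. It only mirrors the shape of the documented tokenize(CharSequence) signature.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, NOT boilerpipe's UnicodeTokenizer: uses the JDK's
// BreakIterator for word boundaries instead of whatever the library does internally.
final class WordBoundaryTokenizerSketch {

    // Splits text at word boundaries and keeps only segments that contain
    // at least one letter or digit (assumed definition of a "word" token).
    static String[] tokenize(CharSequence text) {
        String s = text.toString();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(s);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = s.substring(start, end);
            // Whitespace and punctuation segments have no letters/digits and are dropped.
            if (containsWordChar(segment)) {
                tokens.add(segment);
            }
        }
        return tokens.toArray(new String[0]);
    }

    // True if the segment has at least one letter or digit code point.
    private static boolean containsWordChar(String segment) {
        return segment.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("Hello, world! Foo bar.");
        System.out.println(String.join("|", tokens)); // prints Hello|world|Foo|bar
    }
}
```

Because the boundary detection is delegated to BreakIterator, non-ASCII words such as "Grüße" or "Köln" stay intact, while commas, exclamation marks and runs of spaces fall out as separate segments and are filtered away.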