uk.ac.shef.dcs.oak.jate.core.npextractor
Class NGramExtractor
java.lang.Object
uk.ac.shef.dcs.oak.jate.core.npextractor.CandidateTermExtractor
uk.ac.shef.dcs.oak.jate.core.npextractor.NGramExtractor
public class NGramExtractor extends CandidateTermExtractor
An n-gram extractor that extracts n-grams from texts. The maximum n-gram length (n) defaults to 5 words; to change it, reset the property jate.system.term.maxwords in the property file.
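A minimal sketch of what the n limit means: the Java below reads jate.system.term.maxwords from a java.util.Properties object, falls back to 5 when the property is absent, and lists every contiguous word sequence of 1 to n tokens. It assumes n is an upper bound on n-gram length rather than a fixed length. NGramSketch and its methods are hypothetical names, not part of JATE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class NGramSketch {
    // Property name and default value taken from the class description above.
    static final String MAX_WORDS_KEY = "jate.system.term.maxwords";
    static final int DEFAULT_MAX_WORDS = 5;

    // Read the maximum n-gram length, falling back to the default when unset.
    static int maxWords(Properties props) {
        String value = props.getProperty(MAX_WORDS_KEY, String.valueOf(DEFAULT_MAX_WORDS));
        return Integer.parseInt(value.trim());
    }

    // Enumerate every contiguous word sequence of length 1..maxN.
    static List<String> ngrams(List<String> tokens, int maxN) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int len = 1; len <= maxN && start + len <= tokens.size(); len++) {
                out.add(String.join(" ", tokens.subList(start, start + len)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MAX_WORDS_KEY, "2");
        List<String> tokens = Arrays.asList("term", "extraction", "works");
        // prints [term, term extraction, extraction, extraction works, works]
        System.out.println(ngrams(tokens, maxWords(props)));
    }
}
```

The number of candidates grows roughly as (number of tokens) × n, which is one reason to keep the maximum length small.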
Method Summary
Return type | Method
java.util.Map<java.lang.String,java.util.Set<java.lang.String>> | extract(Corpus c)
java.util.Map<java.lang.String,java.util.Set<java.lang.String>> | extract(Document d)
java.util.Map<java.lang.String,java.util.Set<java.lang.String>> | extract(java.lang.String content)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
NGramExtractor
public NGramExtractor(StopList stop, Normalizer normaliser) throws java.io.IOException
Throws:
java.io.IOException
extract
public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> extract(Corpus c) throws JATEException
Specified by:
extract in class CandidateTermExtractor
Parameters:
c - the corpus to extract candidate terms from
Returns:
a map from each term's canonical form to the set of its variants found in the corpus
Throws:
JATEException
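One plausible way to build the corpus-level map is to merge per-document maps, taking the union of the variant sets for each canonical form. The sketch below assumes that behaviour; CorpusMergeSketch and merge are hypothetical names, and JATE's own implementation may work differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CorpusMergeSketch {
    // Merge per-document maps (canonical form -> variants) into one corpus-level map.
    static Map<String, Set<String>> merge(List<Map<String, Set<String>>> perDocument) {
        Map<String, Set<String>> corpus = new HashMap<>();
        for (Map<String, Set<String>> doc : perDocument) {
            for (Map.Entry<String, Set<String>> e : doc.entrySet()) {
                // Copy into a fresh mutable set so the input sets are never modified.
                corpus.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
            }
        }
        return corpus;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> d1 = Map.of("term", Set.of("Term"));
        Map<String, Set<String>> d2 = Map.of("term", Set.of("term"), "corpus", Set.of("corpus"));
        Map<String, Set<String>> merged = merge(List.of(d1, d2));
        System.out.println(merged.get("term").size()); // prints 2
    }
}
```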
extract
public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> extract(Document d) throws JATEException
Specified by:
extract in class CandidateTermExtractor
Parameters:
d - the document to extract candidate terms from
Returns:
a map from each term's canonical form to the set of its variants found in the document
Throws:
JATEException
extract
public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> extract(java.lang.String content) throws JATEException
Specified by:
extract in class CandidateTermExtractor
Parameters:
content - the text to extract candidate terms from
Returns:
a map from each term's canonical form to the set of its variants found in the string
Throws:
JATEException
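The following sketch illustrates the shape of the map returned for a single string. It is not JATE's implementation, and every piece of it is an assumption: a plain Set<String> stands in for StopList, a UnaryOperator<String> (here simple lowercasing) stands in for Normalizer, tokenisation splits on non-alphanumeric characters, and n-grams that begin or end with a stop word are discarded. CandidateMapSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

class CandidateMapSketch {
    // Hypothetical stand-ins for JATE's StopList and Normalizer types.
    private final Set<String> stopWords;
    private final UnaryOperator<String> normaliser;
    private final int maxWords;

    CandidateMapSketch(Set<String> stopWords, UnaryOperator<String> normaliser, int maxWords) {
        this.stopWords = stopWords;
        this.normaliser = normaliser;
        this.maxWords = maxWords;
    }

    private boolean isStop(String token) {
        return stopWords.contains(token.toLowerCase(Locale.ROOT));
    }

    // Returns canonical form -> set of surface variants seen in the content.
    Map<String, Set<String>> extract(String content) {
        // Naive tokenisation: split on runs of characters that are not letters or digits.
        List<String> tokens = new ArrayList<>();
        for (String t : content.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        Map<String, Set<String>> result = new HashMap<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int len = 1; len <= maxWords && start + len <= tokens.size(); len++) {
                List<String> gram = tokens.subList(start, start + len);
                // Assumed filter: drop n-grams whose first or last word is a stop word.
                if (isStop(gram.get(0)) || isStop(gram.get(len - 1))) continue;
                String variant = String.join(" ", gram);
                result.computeIfAbsent(normaliser.apply(variant), k -> new TreeSet<>()).add(variant);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "of"));
        CandidateMapSketch extractor =
                new CandidateMapSketch(stop, s -> s.toLowerCase(Locale.ROOT), 2);
        Map<String, Set<String>> map = extractor.extract("The Term extraction of term Extraction");
        System.out.println(map.get("term extraction")); // prints [Term extraction, term Extraction]
    }
}
```

Lowercasing is only a placeholder for the normaliser; a lemmatising normaliser would additionally group inflected forms such as "terms" and "term" under one canonical form.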