uk.ac.shef.dcs.oak.jate.core.npextractor
Class NGramExtractor

java.lang.Object
  extended by uk.ac.shef.dcs.oak.jate.core.npextractor.CandidateTermExtractor
      extended by uk.ac.shef.dcs.oak.jate.core.npextractor.NGramExtractor

public class NGramExtractor
extends CandidateTermExtractor

An n-gram extractor that extracts word n-grams from text. By default n = 5; to change this, set the property jate.system.term.maxwords in the property file.
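
As a rough illustration of the setting described above, the sketch below reads jate.system.term.maxwords with java.util.Properties and falls back to 5 when the key is absent. The class name MaxWordsConfig, the method maxWords and the inline property text are hypothetical; how JATE itself locates and loads its property file is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of JATE.
class MaxWordsConfig {
    // Property key named in the class description.
    static final String KEY = "jate.system.term.maxwords";
    // Default value of n stated in the class description.
    static final int DEFAULT_MAX_WORDS = 5;

    // Parses property-file text and returns the configured n,
    // or the default when the key is missing.
    static int maxWords(String propertyText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertyText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY, String.valueOf(DEFAULT_MAX_WORDS));
        return Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        System.out.println(maxWords(""));                            // prints 5
        System.out.println(maxWords("jate.system.term.maxwords=3")); // prints 3
    }
}
```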


Field Summary
 
Fields inherited from class uk.ac.shef.dcs.oak.jate.core.npextractor.CandidateTermExtractor
_normaliser, _stoplist
 
Constructor Summary
NGramExtractor(StopList stop, Normalizer normaliser)
           
 
Method Summary
 java.util.Map<java.lang.String,java.util.Set<java.lang.String>> extract(Corpus c)
           
 java.util.Map<java.lang.String,java.util.Set<java.lang.String>> extract(Document d)
           
 java.util.Map<java.lang.String,java.util.Set<java.lang.String>> extract(java.lang.String content)
           
 
Methods inherited from class uk.ac.shef.dcs.oak.jate.core.npextractor.CandidateTermExtractor
applyCharacterReplacement, applySplitList, applyTrimStopwords, containsDigit, containsLetter, hasReasonableNumChars
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NGramExtractor

public NGramExtractor(StopList stop,
                      Normalizer normaliser)
               throws java.io.IOException
Throws:
java.io.IOException

Method Detail

extract

public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> extract(Corpus c)
                                                                        throws JATEException
Specified by:
extract in class CandidateTermExtractor
Parameters:
c - the corpus to extract n-grams from
Returns:
a map from each term's canonical form to the set of its variants found in the corpus
Throws:
JATEException
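
One plausible way to picture the corpus-level result is as the union of per-document results, so that each canonical form keeps every variant seen in any document. The sketch below shows only that merging step. It is an assumption about the behaviour, not JATE's confirmed implementation, and the class VariantMerge and method merge are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of JATE.
class VariantMerge {
    // Unions per-document maps: canonical form -> all variants seen in any document.
    static Map<String, Set<String>> merge(List<Map<String, Set<String>>> perDocument) {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Map<String, Set<String>> doc : perDocument) {
            for (Map.Entry<String, Set<String>> e : doc.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new HashSet<>())
                      .addAll(e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> d1 = new HashMap<>();
        d1.put("cell", new HashSet<>(Arrays.asList("cell")));
        Map<String, Set<String>> d2 = new HashMap<>();
        d2.put("cell", new HashSet<>(Arrays.asList("cells", "Cell")));
        // The merged entry for "cell" holds the variants from both documents.
        System.out.println(merge(Arrays.asList(d1, d2)).get("cell").size()); // prints 3
    }
}
```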

extract

public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> extract(Document d)
                                                                        throws JATEException
Specified by:
extract in class CandidateTermExtractor
Parameters:
d - the document to extract n-grams from
Returns:
a map from each term's canonical form to the set of its variants found in the document
Throws:
JATEException

extract

public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> extract(java.lang.String content)
                                                                        throws JATEException
Specified by:
extract in class CandidateTermExtractor
Parameters:
content - the text to extract n-grams from
Returns:
a map from each term's canonical form to the set of its variants found in the string
Throws:
JATEException
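
To make the return shape concrete, here is a minimal, self-contained sketch of word n-gram extraction that produces a map from canonical form to variants. It is not the JATE implementation. It assumes that n is an upper bound on the number of words (as the property name maxwords suggests), that words are separated by whitespace, and that the canonical form is simply the lower-cased n-gram, standing in for whatever the supplied Normalizer does. Stop-word trimming and the other filters inherited from CandidateTermExtractor are omitted. The names NGramSketch and maxWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not part of JATE.
class NGramSketch {
    // Returns canonical form -> surface variants for every word n-gram
    // containing between 1 and maxWords words.
    static Map<String, Set<String>> extract(String content, int maxWords) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        List<String> words = new ArrayList<>();
        for (String w : content.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        for (int start = 0; start < words.size(); start++) {
            int maxEnd = Math.min(words.size(), start + maxWords);
            for (int end = start + 1; end <= maxEnd; end++) {
                String variant = String.join(" ", words.subList(start, end));
                // Assumption: lower-casing stands in for real normalisation.
                String canonical = variant.toLowerCase(Locale.ROOT);
                result.computeIfAbsent(canonical, k -> new LinkedHashSet<>())
                      .add(variant);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "Term extraction" and "term Extraction" share one canonical form.
        System.out.println(extract("Term extraction term Extraction", 2));
    }
}
```

In this sketch, variants that differ only in case are grouped under one key, which mirrors the canonical-form-to-variants contract described in the Returns sections.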