public class JahiaExtendedSpellChecker extends Object implements Closeable
Jahia Spell Checker class (Main class)
(initially inspired by the David Spencer code).
Example Usage:
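JahiaExtendedSpellChecker spellchecker = new JahiaExtendedSpellChecker(spellIndexDirectory);
// mergeFactor, ramMB, siteKey and langCode are placeholders for your own values.
// To index a field of a user index:
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field), mergeFactor, ramMB, siteKey, langCode);
// To index a file containing words:
spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")), mergeFactor, ramMB, siteKey, langCode);
// Unrestricted mode: no user index reader, so morePopular has no effect.
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5, null, false, new String[] { siteKey }, langCode);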
| Modifier and Type | Field and Description |
|---|---|
| static String | F_LANGUAGE |
| static String | F_SITE |
| static String | F_WORD<br>Field name for each word in the ngram index. |
| Constructor and Description |
|---|
| JahiaExtendedSpellChecker(org.apache.lucene.store.Directory spellIndex) |
| JahiaExtendedSpellChecker(org.apache.lucene.store.Directory spellIndex, org.apache.lucene.search.spell.StringDistance sd)<br>Use the given directory as a spell checker index. |
| Modifier and Type | Method and Description |
|---|---|
| void | clearIndex()<br>Removes all terms from the spell check index. |
| void | close() |
| boolean | exist(String word, String langCode, String site)<br>Check whether the word exists in the index. |
| org.apache.lucene.search.spell.StringDistance | getStringDistance()<br>Returns the StringDistance instance used by this SpellChecker instance. |
| void | indexDictionary(org.apache.lucene.search.spell.Dictionary dict, int mergeFactor, int ramMB, String site, String langCode)<br>Indexes the data from the given Dictionary. |
| void | setAccuracy(float minScore)<br>Sets the accuracy 0 < minScore < 1; default 0.5 |
| void | setSpellIndex(org.apache.lucene.store.Directory spellIndexDir)<br>Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor. |
| void | setStringDistance(org.apache.lucene.search.spell.StringDistance sd)<br>Sets the StringDistance implementation for this SpellChecker instance. |
| String[] | suggestSimilar(String word, int numSug, org.apache.lucene.index.IndexReader ir, boolean morePopular, String[] sites, String language)<br>Suggest similar words (optionally restricted to a field of an index). |
| String[] | suggestSimilar(String word, int numSug, org.apache.lucene.index.IndexReader ir, String field, boolean morePopular, String sites, String language)<br>Deprecated. |
public static final String F_WORD
public static final String F_LANGUAGE
public static final String F_SITE
public JahiaExtendedSpellChecker(org.apache.lucene.store.Directory spellIndex,
org.apache.lucene.search.spell.StringDistance sd)
throws IOException
Use the given directory as a spell checker index.
Parameters:
spellIndex - the directory to use as the spell checker index
Throws:
IOException

public JahiaExtendedSpellChecker(org.apache.lucene.store.Directory spellIndex)
throws IOException
Throws:
IOException

public void setSpellIndex(org.apache.lucene.store.Directory spellIndexDir)
throws IOException
Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor.
Parameters:
spellIndexDir - the spell directory to use
Throws:
org.apache.lucene.store.AlreadyClosedException - if the Spellchecker is already closed
IOException - if spellchecker can not open the directory

public void setStringDistance(org.apache.lucene.search.spell.StringDistance sd)
Sets the StringDistance implementation for this SpellChecker instance.
Parameters:
sd - the StringDistance implementation for this SpellChecker instance

public org.apache.lucene.search.spell.StringDistance getStringDistance()
Returns the StringDistance instance used by this SpellChecker instance.
Returns:
the StringDistance instance used by this SpellChecker instance.

public void setAccuracy(float minScore)
Sets the accuracy 0 < minScore < 1; default 0.5

public String[] suggestSimilar(String word, int numSug, org.apache.lucene.index.IndexReader ir, String field, boolean morePopular, String sites, String language) throws IOException
Deprecated.
Throws:
IOException

public String[] suggestSimilar(String word, int numSug, org.apache.lucene.index.IndexReader ir, boolean morePopular, String[] sites, String language) throws IOException
Suggest similar words (optionally restricted to a field of an index).
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching spell-checked word from the hits that Lucene found, so one usually has to request several suggestions (a numSug greater than 1) in order to get the true best match.
I.e. if numSug == 1, don't count on that suggestion being the best one. Thus, you should set this value to at least 5 for a good suggestion.
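
The effect described above can be illustrated with a small, self-contained sketch. It does not use Lucene or this class: the bigram-overlap score is only a crude stand-in for the n-gram index lookup, the normalized Levenshtein similarity is an assumed example of an edit distance strategy, and every name in it (TwoStageSuggestSketch, sharedGrams, similarity, suggest) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the JahiaExtendedSpellChecker implementation.
class TwoStageSuggestSketch {

    /** Splits a word into its overlapping character n-grams of length n. */
    static List<String> grams(String word, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    /** Stage 1 score: how many bigrams of the candidate also occur in the word. */
    static int sharedGrams(String word, String candidate) {
        Set<String> wordGrams = new HashSet<>(grams(word, 2));
        int shared = 0;
        for (String gram : grams(candidate, 2)) {
            if (wordGrams.contains(gram)) {
                shared++;
            }
        }
        return shared;
    }

    /** Classic dynamic-programming Levenshtein edit distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Stage 2 score in [0, 1]; 1 means the two strings are identical. */
    static float similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1f : 1f - (float) levenshtein(a, b) / longest;
    }

    static List<String> suggest(String word, List<String> dictionary, int numSug, float minScore) {
        // Stage 1: keep only the numSug candidates with the largest bigram overlap.
        List<String> hits = new ArrayList<>(dictionary);
        hits.sort(Comparator.comparingInt((String c) -> sharedGrams(word, c)).reversed());
        hits = new ArrayList<>(hits.subList(0, Math.min(numSug, hits.size())));
        // Stage 2: drop weak matches, then re-rank the survivors by edit-distance similarity.
        hits.removeIf(c -> similarity(word, c) < minScore);
        hits.sort(Comparator.comparingDouble((String c) -> similarity(word, c)).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("address", "dress", "actress", "addresses");
        System.out.println(suggest("adress", dictionary, 1, 0.5f)); // [addresses]
        System.out.println(suggest("adress", dictionary, 4, 0.5f)); // [address, dress, actress, addresses]
    }
}
```

With numSug = 1 the only candidate that survives the first stage is addresses, even though address is the closer match by edit distance. With numSug = 4 all candidates reach the second stage and address is ranked first.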
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
ir - the indexReader of the user index (can be null; see the field param)
morePopular - return only the suggested words that are as frequent or more frequent than the searched word (only in restricted mode, i.e. indexReader!=null and field!=null)
sites - an array of site keys to search in
language - the current language, used for the search
Throws:
IOException - in case of index read error

public void clearIndex()
throws IOException
Removes all terms from the spell check index.
Throws:
IOException
org.apache.lucene.store.AlreadyClosedException - if the Spellchecker is already closed

public boolean exist(String word, String langCode, String site) throws IOException
Check whether the word exists in the index.
Parameters:
word - the word to check
Throws:
IOException

public void indexDictionary(org.apache.lucene.search.spell.Dictionary dict,
int mergeFactor,
int ramMB,
String site,
String langCode)
throws IOException
Indexes the data from the given Dictionary.
Parameters:
dict - Dictionary to index
mergeFactor - mergeFactor to use when indexing
ramMB - the max amount of memory in MB to use
Throws:
IOException

public void close()
throws IOException
Specified by:
close in interface Closeable
close in interface AutoCloseable
Throws:
IOException

Copyright © 2004–2020 Jahia Solutions Group SA. All rights reserved.