WO2000057291A1 - Spelling correction method using improved minimum edit distance algorithm

Info

Publication number
WO2000057291A1
Authority
WO
WIPO (PCT)
Application number
PCT/US2000/000260
Other languages
French (fr)
Inventor
Mark Kantrowitz
Original Assignee
Justsystem Corporation
Application filed by Justsystem Corporation
Priority to AU24922/00A
Publication of WO2000057291A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/232: Orthographic correction, e.g. spell checking or vowelisation

Definitions

  • The algorithm disclosed includes substitutions for non-alphabetic characters, such as replacing a semicolon with the letter l, the digit 3 with the letter e, or the digit 5 with the letter s.
  • The most frequent deletions were: e, i, l, s, t, r, n, a, o, u, c and m.
  • The most frequent insertions were: e, s, i, n, r, t, l, p, g, a, c and space.
  • The most frequent transpositions were: ei, ie, le, re, ne, el, ro, er, al, na, it and si.
  • The most frequent larger substitutions were as follows: y for ie; te for ght; f for ph; ie for y; urns for a; e for ia; al for le.
  • the algorithm includes 65 common spelling patterns, such as the prefix un becoming im before p.
  • Misspelling "beginning" as "beggining" is equivalent to substituting "gi" for "in".
  • Permitted edits can be restricted not just by the letters affected by the edit, but also by zero or more letters of context on either side of the edit.
  • The ie --> ei transposition is a common edit.
  • transpose(ie) would be an allowed edit and the computation would proceed, but we could, if we wished, restrict whether this edit was allowed based on the context in which it appears. For example, we might only allow it if the previous letter was a "c".
  • ie --> ei as an allowed edit
  • The transposition ne --> en could be restricted, if desired, to mnet --> ment.
  • Standard minimum edit distance algorithms are driven by the letters in the words being compared, so when they consider an edit, they know exactly what letters are involved. They do need to consider different possible edits. For example, if the current position in the misspelled word starts with an R and the dictionary word starts with a P, it could be that the R is an insertion (e.g., if the next letter after the R is a P), or it could be a transposition, a substitution or a deletion. Each possibility leads to a branch in the minimum edit distance computation.
  • Some of the branches may be pruned by considering only the most common edits, as with the reverse minimum edit distance algorithm. For example, since a P/R substitution is not very common, that possibility can be skipped.
  • the same kind of restricted set of edits can be used with standard edit distance algorithms. Moreover, if the number of edits is cut by a factor of three, that leads to a significant speedup in computing the distance between two words.
  • The method according to this invention demonstrates a speed increase of 13 to 26% for edit distance one and a speed increase of 44 to 50% for edit distance two.
  • The edit distance one method is fast enough to be useful for correcting the spelling of documents and queries in an information retrieval system.
  • the method according to this invention increases the number of cases in which there is only one correction in the candidate list and the percentage of those for which this unique candidate is the correct correction. If there is more than one candidate, sorting the list by word length, word frequency and the frequency of the edit tends to move the correction to the top of the candidate list.
  • the method recognizes all of the nonword errors by checking whether the word is present in a valid dictionary.
  • the method according to this invention demonstrates a first guess accuracy of about 75%, far beyond the state of the art.
  • the first guess accuracy improved to about 95%.
  • the speed and accuracy of the algorithm when there is only one candidate correction makes it possible to use it for automatic substitution of corrections as the user types.
  • edist(word1, i1, k1, word2, i2, k2) + cost(delete, word1, k1, k1+1) + edist(word1, k1+1, j1, word2, k2, j2),
  • edist(word1, i1, k1, word2, i2, k2) + cost(insert, word1, k1, word2, k2, k2+1) + edist(word1, k1, j1, word2, k2+1, j2),
  • edist(word1, i1, k1, word2, i2, k2) + cost(transpose, word1, k1, k1+2, word2, k2, k2+2) + edist(word1, k1+2, j1, word2, k2+2, j2)
  • The above is computing the minimum edit distance between the portions of word1 and word2 designated by indices i1 to j1 and i2 to j2, respectively.
  • The simplest edit distance implementation has the costs set to 1 for nontrivial edits (e.g., substituting P for R) and 0 for trivial edits (e.g., substituting P for itself).
  • More complex edit distance algorithms will use other cost figures to reflect the frequency of a given edit, for example.
  • the overall structure of the algorithm above is to split the input and target words each into three parts: the part containing the potential edit, the part before the edit and the part after the edit.
  • The parts before the edit are compared recursively using the same algorithm, and likewise for the parts after the edit. The resulting scores are added to the score for the current edit to compute an overall score for that edit, and the minimum score over all possible types of edits at all possible positions is returned as the result.
  • The restricted set of edits may be applied to this algorithm as follows. First, additional clauses are added to the min list corresponding to the more complex edits. The form of the clauses is similar. In fact, all edits may be treated as just different complex substitutions. For example, transposing "i" and "e" in "wierd" could be thought of as substituting "ei" for "ie". All insertions, deletions, substitutions and transpositions, as well as our more complex edits, are nothing more than substitutions of one n-gram for another.
  • The way one calculates the speed of standard edit distance is to realize that the recursive process is essentially filling a table based on all possible values of the indices i1, j1, i2 and j2.
  • the running time of the algorithm is the size of the table.
  • Other common optimizations can avoid the need to fill the entire table (e.g., if only words expected to be within edit distance 3 are compared, the words can be processed iteratively instead of recursively, leading to a semi-linear algorithm).
  • If the possible edits are restricted, the amount of computation is cut down by a factor of 3 to 4.
  • When computing the minimum, a variable is maintained with the current minimum value. Call it minval. The first time a recursive computation is performed, minval is set to the result. Every subsequent time, the result is compared to minval. If it is lower than minval, minval is set to it. All of the possible ways of decomposing the computation are iterated and, at the end, the then-current value of minval is returned as the result of the edist computation.
  • # accuracy figure is the percentage of errors for which SMART1 comes up
  • # error corpus prints a list of the misses.
  • # -f file Checks spelling of every word in a file.
  • $arg2 = substr($arg, 2); } if ($arg2 eq "2") {
  • Sorev $word; } } } close(ERRFILE);
  • ($correction, $numcor) = &spellcor($1); if ($correction eq $r
  • ($loose_count && (($firstguess && $correction =~ /^$r/i)
  • (!$firstguess && $correction =~ /$r/i)))) { $success++; } else {
  • $length = length($word);
  • &spellcor_pos2($word, $i, $length);
  • $correction = $tmp; push(@corrections, $correction); $numcor++; } } } else {
  • $left = substr($word, 0, $i);
  • $middle = substr($word, $i, 1);
  • $right = substr($word, $i+1);
  • $m3 = substr($right, 1, 1);
  • $left = substr($word, 0, $i);
  • $middle = substr($word, $i, 1);
  • $right = substr($word, $i+1);
  • &check2($left, "n", $right); &check2($left, "r", $right); &check2($left, "s", $right); &check($left, "t", $right); } elsif ($middle eq "e") { &check2($left, "a", $right); &check2($left, "c", $right); &check2($left, "d", $right); &check2($left, "g", $right); &check2($left, "i", $right); &check2($left, "l", $right); &check2($left, "o", $right); &check2($left, "r", $right); &check2($left, "s", $right); &check2($left, "t",
  • &check2($left, $middle."b", $right); &check2($left, $middle."c", $right); &check2($left, $middle."d", $right); &check2($left, $middle."e", $right); &check2($left, $middle."f", $right); &check2($left, $middle."g", $right); &check2($left, $middle."h", $right); &check2($left, $middle. , $right); &check2($left, $middle.
  • $m2 = substr($right, 0, 1);
  • $m3 = substr($right, 1, 1);
  • $r2 = substr($right, 2);
  • # pell > ppel &check($left, "pe", $right2); &check($left, "al", $right2);
  • &check2($left, "ey", $right2); } elsif ($middle2 eq "in") { # from cinn --> ccin &check2($left, "ci", $right2); } elsif ($middle2 eq "if") { &check2($left, "ate", $right2); &check2($left, "ute", $right2); &check2($left, "mi", $right2); &check2($left, "te", $right2); } elsif ($middle2 eq "le") { &check2($left, "al", $right2);
  • $wfirst = substr($word, 0, 1);
  • $wrest = substr($word, 1);
  • $wfirst = substr($word, 0, 1);
  • $wsecond = substr($word, 1, 1);
  • $wthird = substr($word, 2, 1);
  • $wrest = substr($word, 3);
  • $wrest =~ tr/A-Z/a-z/; return($wfirst . $wsecond . $wthird . $wrest);

Abstract

A computer method of spelling correction comprises the steps of: a) storing a dictionary of valid words; b) for each input string to be checked, comparing the input string to words in the stored dictionary to identify input strings not in the dictionary; c) for each input string not found in the preceding step, generating test words by a restricted set of edit operations which correct the most common errors comprising insertion, deletion, transposition and/or substitution; d) comparing the edited input string generated in the preceding step with words stored in the dictionary; and e) generating a candidate word or candidate list of the words.

Description

SPELLING CORRECTION METHOD USING IMPROVED MINIMUM EDIT DISTANCE ALGORITHM
BACKGROUND
Current spelling correction software detects nonword spelling errors by checking whether the word or text string appears in a dictionary of valid words. Once a misspelled word is detected, it is either automatically corrected or a candidate list of possible corrections is displayed. Algorithms for selecting the correction or displaying a candidate list of possible corrections use a word similarity metric to measure the distance from the misspelled word to words in the dictionary. The closest matches are treated as candidates.
The most popular word similarity metric is minimum edit distance, that is, the minimum number of insertions, deletions, transpositions and substitutions required to transform the misspelled word into a valid word. Computing the edit distance to every word in the dictionary is time consuming. To reduce the number of required comparisons, candidate generation algorithms typically partition the dictionary according to word length and the first two letters of the word. Edit distances are calculated only for selected dictionary partitions. Stepping through the dictionary partition, each word is compared to the misspelled word and the edit distance between them is calculated. However, the dictionary partitioning used with standard edit distance reduces accuracy. For example, partitioning on the first letter means that errors occurring in the first letter (about 7% of all spelling errors) cannot be corrected.
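The standard approach described above can be sketched as follows. This is a minimal illustration, not the patent's code: it assumes unit costs for all four edit types, uses the optimal string alignment form of the distance, and partitions only on the first letter. The names `edit_distance` and `closest_in_partition` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, substitutions and adjacent
    transpositions turning a into b (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def closest_in_partition(nonword, dictionary):
    """Compare the nonword only against words sharing its first letter
    (an assumed, simplified partitioning scheme)."""
    partition = [w for w in dictionary if w[:1] == nonword[:1]]
    if not partition:
        return []
    best = min(edit_distance(nonword, w) for w in partition)
    return sorted(w for w in partition if edit_distance(nonword, w) == best)
```

With the dictionary `["hello", "xenon"]`, the nonword "xello" is compared only with "xenon", so the intended "hello" is never considered. This is the first-letter limitation noted above.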
There is another approach. Reverse minimum edit distance is a candidate generation algorithm which applies possible edits to the misspelled word and then compares the edited word to words in the dictionary to discover which words are within a given number of edits from the misspelled word. For an n-letter nonword, there are 25n possible substitutions, 26(n+1) possible insertions, n possible deletions, and n-1 possible transpositions, for a total of 53n + 25 possible edits. For a seven-letter nonword, that means a total of 396 possible words just for an edit distance of one. For an edit distance of two, the number of possible words goes up by the square, yielding 156,816 possible words (not counting the edit distance one possibilities). This is a much more time consuming algorithm than the candidate generation algorithms based on word similarity metrics described in the preceding paragraph. Hence, modern word processing programs do not use reverse minimum edit distance algorithms.
The standard minimum edit distance algorithm is generally preferred over the reverse minimum edit distance algorithm. The standard minimum edit distance algorithm computes the edit distance between the misspelled word and every word in the applicable dictionary partition. The number of minimum edit distance calculations is equal to the number of words in the partition. The cost of computing edit distances is only manageable because the set of potential corrections is limited. The reverse minimum edit distance algorithm applies all possible edits at distances 1 or 2 and so on to a misspelled word, blindly generating a large list of candidates, each of which must then be tested against the valid dictionary. The number of candidates generated and the dictionary references required is normally considered prohibitive.
Reverse minimum edit distance was described by Ralph E. Gorin, in SPELL: Spell check and correction program, Stanford University, 1971. His implementation was limited to edit distance one. He applied all possible single errors (insertions, deletions, substitutions and transpositions) to the input string and proposed as candidate corrections the results that yielded valid words.
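The 53n + 25 count and the unrestricted distance-one search can be checked with a short sketch. It assumes a 26-letter lowercase alphabet and counts edits without removing duplicates, as the text does. The function names are hypothetical and this is not Gorin's implementation.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed 26-letter alphabet


def all_single_edits(word):
    """Yield every string one unrestricted edit away (duplicates included)."""
    n = len(word)
    for i in range(n):                       # 25n substitutions
        for c in ALPHABET:
            if c != word[i]:
                yield word[:i] + c + word[i + 1:]
    for i in range(n + 1):                   # 26(n+1) insertions
        for c in ALPHABET:
            yield word[:i] + c + word[i:]
    for i in range(n):                       # n deletions
        yield word[:i] + word[i + 1:]
    for i in range(n - 1):                   # n-1 adjacent transpositions
        yield word[:i] + word[i + 1] + word[i] + word[i + 2:]


def unrestricted_candidates(word, dictionary):
    """Keep only the edited strings that are valid dictionary words."""
    return sorted({e for e in all_single_edits(word) if e in dictionary})
```

For the seven-letter nonword "recieve", `all_single_edits` generates 53*7 + 25 = 396 strings, each requiring a dictionary lookup, and squaring that figure gives the 156,816 quoted for distance two.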
Mor and Fraenkel in "A hash code method for detecting and correcting spelling errors", Communications of the ACM 25(12), pp. 935-938 (1982) disclose a hashing method for efficiently retrieving all words within edit distance one of the misspelled word. The hash table is too big to be practical, however.
Mays, Damerau, and Mercer in "Context based spelling correction", Information Processing & Management 27(5), pp. 517-522 (1991) disclose reverse minimum edit distance with an edit distance of one to test an algorithm for valid word spelling correction.
Kernighan, Church and Gale in "A spelling correction program based on error frequencies", Proceedings of COLING-90, 2, pp. 205-210 (1990) used a corpus of spelling errors for which only a single correction existed to compute the frequency of occurrence for every correction and used these to rank candidate corrections.
It is an object of the present invention to implement spelling correction with a reverse minimum edit distance algorithm which is not nearly as time consuming as prior algorithms of this type, which allows for more complex edits than those used in previous word similarity metrics and reverse minimum edit distance algorithms, for example, long-distance transpositions and larger substitutions, and which actually yields more accurate results.
SUMMARY OF THE INVENTION
Briefly, according to this invention, there is provided a computer method of spelling correction which comprises a step for calculating minimum edit distances using a restricted set of edit operations which correct the most common errors comprising insertion, deletion, transposition and/or substitution. The restricted set of edit operations consists of only the most common edits (generally at distance 1 or 2) required to correct errors based upon a training corpus of documents with uncorrected spelling errors. However, the set of edits may also include common complex edits such as long-distance transpositions, multiple letter corrections and missing space errors.
According to one embodiment of this invention, a computer method of spelling correction comprises the steps of: a) storing a dictionary of valid words; b) for each input string to be checked, comparing the input string to words in the stored dictionary to identify input strings not in the dictionary; c) for each input string not found in the preceding step, generating test words by a restricted set of edit operations which correct the most common errors comprising insertion, deletion, transposition and/or substitution; d) comparing the edited input string generated in the preceding step with words stored in the dictionary; and e) generating a candidate word or list of candidate words from edited input strings that are found in the dictionary. The members of the restricted set of edit operations are selected based upon a training set of the most common spelling errors. The members of the restricted set of edit operations may be selected based on the letter n-grams containing more than the letter or letters to be edited. A unique feature according to this invention is the use of edit operations that consist of only the most common edits to correct errors and at the same time allow more complex edits than used in prior algorithms, although these more complex edits relate to common errors.
According to another embodiment, the edit operations are restricted to distance one, and edits at distance two are allowed only if no valid edited input strings are found at edit distance one. According to another embodiment, the edit operations are restricted to distances one and two. According to yet another embodiment, all possible edits are allowed if no valid edited input strings are found at edit distances one or two. Preferably, the edit operations include long-distance transpositions, multiple letter insertions, multiple letter substitutions, multiple letter deletions and missing space errors at edit distance one. The substitution edits may include non-alphabetic characters.
The dictionary may be stored in a data structure selected from hash tables, binary trees, or tries, for example. The candidate list may be sorted by combinations of word length, word frequency or error frequency. According to one preferred embodiment, a search is made for missing space errors by testing complementary portions of a nonword for being valid words with a frequency above a given threshold. Particularly useful applications of the computer methods disclosed herein are spelling correction in text files (documents), command lines and query statements.
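The missing space search mentioned above might look like the following sketch. It assumes word frequencies are held in a plain dict and that both halves must exceed the threshold. The minimum part length of three and the default threshold are illustrative parameters, and `split_missing_space` is a hypothetical name.

```python
def split_missing_space(nonword, word_freq, min_len=3, min_freq=5):
    """Return 'left right' splits where both complementary parts are words
    whose corpus frequency is above min_freq (an assumed threshold)."""
    results = []
    # Every cut leaves at least min_len characters on each side.
    for cut in range(min_len, len(nonword) - min_len + 1):
        left, right = nonword[:cut], nonword[cut:]
        if word_freq.get(left, 0) > min_freq and word_freq.get(right, 0) > min_freq:
            results.append(left + " " + right)
    return results
```

The frequency threshold keeps rare dictionary entries from producing implausible splits. In the sketch, a part such as "spe" with a count of 1 is rejected even though it appears in the frequency table.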
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The preferred computer method according to this invention comprises testing an input string against a dictionary to determine if it is a valid word. If the input string is a nonword because it is not found in the dictionary, for example, because it is misspelled or two words run together, a reverse minimum edit algorithm is implemented to find every word that is edit distance one away from the input string where the possible edits are limited to only those that are common spelling errors.
Spelling errors are considered common, for example, based upon experience and/or a statistical study of errors found in a corpus of documents that have not had spellings corrected. The corpus of documents used to identify common spelling errors is preferably selected from documents relating to the specific academic or business field in which this reverse minimum edit algorithm is used. Moreover, the corpus of documents may be typist specific. If a valid word is not found at edit distance one, the next step is to look for valid words at edit distance two. If a valid word still has not been found, a search is made for missing space errors (two words run together). The final step is to return a correct word or a list of possible correct words. This method has a number of applications ranging from correcting words provided in the command line to correcting errors in a text document.
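The fallback order described above (distance one, then distance two, then missing spaces) can be expressed as a small driver. This is only a control-flow sketch: `correct` is a hypothetical name, and each stage is assumed to be a callable that takes the nonword and returns a possibly empty list of candidates.

```python
def correct(word, dictionary, stages):
    """Return [word] if it is valid; otherwise the candidates from the
    first stage that finds any, trying stages in the order given."""
    if word in dictionary:
        return [word]
    for stage in stages:
        candidates = stage(word)
        if candidates:
            return candidates   # later, more expensive stages are skipped
    return []
```

A caller would pass the stages in increasing order of cost, for example `[distance_one, distance_two, missing_space]`, where those three names stand for whatever implementations are available.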
More specifically, the computer method according to this invention involves a number of substeps. It first classifies the case of an input string as uppercase, lowercase, initial-caps or "McDonald" style and then converts the string to all lowercase letters. The original case is later restored to the corrected word. The lowercase string is then tested for membership in a dictionary. If the string is found in the dictionary, but only in non-lowercase, the case of the input string is changed to match that in the dictionary. If the string matches a word in the dictionary, it is accepted as correct. If the string is not present in the dictionary, its case will be applied to corrections, except if the input string is lowercase and the correction is not lowercase.
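The case handling can be sketched as below. The four style labels follow the text, but how "McDonald" style is restored is an assumption here: uppercase positions are copied from the original string. Both function names are hypothetical.

```python
def classify_case(s):
    """Classify a string as 'lower', 'upper', 'initial' or 'mixed'
    ('mixed' stands in for the "McDonald" style)."""
    if s.islower():
        return "lower"
    if s.isupper():
        return "upper"
    if s[:1].isupper() and s[1:].islower():
        return "initial"
    return "mixed"


def apply_case(style, original, correction):
    """Apply the input's case style to a correction taken from the dictionary."""
    if style == "lower":
        # Exception from the text: a lowercase input does not force a
        # non-lowercase dictionary entry (e.g. a proper noun) to lowercase.
        return correction
    if style == "upper":
        return correction.upper()
    if style == "initial":
        return correction[:1].upper() + correction[1:]
    # Assumption: copy uppercase positions from the original string.
    return "".join(
        ch.upper() if i < len(original) and original[i].isupper() else ch
        for i, ch in enumerate(correction)
    )
```

For example, the input "TEH" is classified as uppercase, corrected in lowercase to "the", and returned as "THE".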
The reverse minimum edit distance algorithm for edit distance one then iterates over the letters of the input string, attempting at each position to find a correction at edit distance one away. It does this by applying each allowable edit to the input string at that position and checking whether the result is a word in the dictionary. Allowable edits are a subset of all possible edits chosen to correct common spelling errors. If a valid word is found it is put in a candidate list.
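A rough sketch of this position-by-position search is given below, in Python rather than the Perl of the appended listing. The names `corrections_at_distance_one`, `ALLOWED_SUBS` and `check` are illustrative. The substitution table is only a two-letter excerpt, and the set of edits tried (replace, drop, add, swap neighbours) is an assumption about what a full implementation would cover.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Illustrative excerpt only: allowed replacements for two letters.
ALLOWED_SUBS = {"a": "eiosuz", "i": "aelnosuy"}


def corrections_at_distance_one(word, dictionary):
    """Collect dictionary words reachable from `word` by one allowed edit."""
    found = []

    def check(left, middle, right):
        candidate = left + middle + right
        if candidate in dictionary and candidate not in found:
            found.append(candidate)

    for i in range(len(word)):
        left, middle, right = word[:i], word[i], word[i + 1:]
        for c in ALLOWED_SUBS.get(middle, ""):
            check(left, c, right)                # replace the letter
        check(left, "", right)                   # drop the letter
        for c in ALPHABET:
            check(left, c + middle, right)       # add a letter before it
        if i == len(word) - 1:
            for c in ALPHABET:
                check(left, middle + c, right)   # add a letter at the end
        if right:
            check(left, right[0] + middle, right[1:])  # swap neighbours
    return found
```

Because each edit is applied to the input and then looked up, the cost depends on the length of the input and the number of allowed edits, not on the size of the dictionary.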
The edit distance two reverse minimum edit distance algorithm is similar, but after making the first edit to the string, it repeats the process on the resulting string looking for another possible edit starting after the current position. This is more efficient than the naïve method, which would apply every possible edit to the resulting string using the code implemented for edit distance one. There is no need to check for edits before the current position because they will have been checked in previous iterations. There is no need to check for edits at the current position since they would undo or replace edits just completed.
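The "second edit only after the current position" idea can be sketched as below. This is an illustrative Python version, not the listing's code. The names `edits_from` and `corrections_at_distance_two` are hypothetical, and for brevity the sketch tries every letter rather than a restricted table.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits_from(word, start):
    """Yield (edited_word, resume_index) for each single edit at index >= start."""
    for i in range(start, len(word)):
        left, middle, right = word[:i], word[i], word[i + 1:]
        yield left + right, i                           # drop the letter at i
        for c in ALPHABET:
            if c != middle:
                yield left + c + right, i + 1           # replace the letter at i
            yield left + c + middle + right, i + 2      # add a letter before i
        if right:
            yield left + right[0] + middle + right[1:], i + 2  # swap i and i+1
    for c in ALPHABET:
        yield word + c, len(word) + 1                   # add a letter at the end


def corrections_at_distance_two(word, dictionary):
    """Apply one edit, then a second edit strictly to the right of the first."""
    found = set()
    for once, resume in edits_from(word, 0):
        for twice, _ in edits_from(once, resume):
            if twice in dictionary and twice != word:
                found.add(twice)
    return found
```

The `resume_index` returned with each edited string is what prevents the second pass from revisiting positions at or before the first edit.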
When checking for missing space errors, the input string is split into two words of no fewer than three characters each, and each of the words is tested for a frequency of occurrence above a certain threshold. The frequency of occurrence information is computed using a training corpus. Essentially, a large collection of documents is assembled and the frequency of occurrence of every word in the collection is computed. The collection of documents could be a set of documents from the user's academic or business field, a generic set such as a large collection of newswire articles, or even generated from the user's own past writings. The frequency information may be used in several places, including when sorting candidate corrections.

The set of allowable edits may be selected using a program that analyzes a corpus of spelling errors and their corrections to identify the frequency of all single edits present in the corpus. The source code for testing the analysis program is included immediately before the claims. For example, the analysis program for a particular corpus of documents tested found the following substitutions for the letter a.
  Letter:  A   c   e    h   i    l   n   o    r   s    t   u    v   y   z
  Count:   1   1   344  1   195  2   1   90   2   22   1   17   1   1   1
From this we see that the letter a is most often improperly replaced with the letters e, i, o, s and u. If we eliminate all substitutions with a frequency count of 1 or 0, we are left with 7 transformations instead of 25, a three-fold reduction. In some cases, we allowed low frequency substitutions if they involved adjacent keys on the keyboard. Overall, this resulted in 181 substitutions for all letters, instead of the original 650. Weighting the letters by frequency of occurrence yields a weighted average of 8.65 substitutions per letter. This should result in a three-fold speedup.
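The counting step for the letter a can be checked with a few lines of Python. The dictionary below simply transcribes the counts given above for the letter a (reading the letter "l" with count 2 and "y" with count 1). The names `counts_for_a` and `kept` are illustrative, and the keyboard-adjacency exception is not modeled.

```python
# Substitution counts observed for the letter "a", as listed in the text.
counts_for_a = {
    "A": 1, "c": 1, "e": 344, "h": 1, "i": 195, "l": 2, "n": 1, "o": 90,
    "r": 2, "s": 22, "t": 1, "u": 17, "v": 1, "y": 1, "z": 1,
}

# Keep only substitutions seen more than once in the corpus.
kept = {letter: n for letter, n in counts_for_a.items() if n > 1}

# Seven substitutions survive, against 25 possible replacements for one letter.
reduction = 25 / len(kept)
```

Here `len(kept)` is 7 and `reduction` is a little over 3.5, in line with the roughly three-fold reduction described.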
The algorithm disclosed includes substitutions for non-alphabetic characters, such as replacing a semicolon with the letter l and the digit 3 with the letter e or the digit 5 with the letter s.
Of course there are other edits. The most frequent deletions were: e, i, l, s, t, r, n, a, o, u, c and m. The most frequent insertions were e, s, i, n, r, t, l, p, g, a, c and space. The most frequent transpositions were ei, ie, le, re, ne, el, ro, er, al, na, it and si. The most frequent larger substitutions were as follows:

  y for ie
  te for ght
  f for ph
  ie for y
  urns for a
  e for ia
  al for le
Larger substitutions were also found to be useful in improving accuracy of the edit distance one algorithm.
All told, the algorithm includes 65 common spelling patterns, such as the prefix un becoming im before p. Others capture the confusions in words with double letters. For example, misspelling beginning as beggining is equivalent to substituting gi for in. The restrictions on permitted edits can be limited not just on the letters affected by the edits, but also on zero or more letters of context on either side of the edit. For example, the ie --> ei transposition is a common edit. In a simple implementation, transpose("ie") would be an allowed edit and the computation would proceed, but we could, if we wished, restrict whether this edit was allowed based on the context in which it appears. For example, we might only allow it if the previous letter was a "c". Thus, instead of including ie --> ei as an allowed edit, we would include cie --> cei as an allowed edit. Similarly, the transposition ne --> en could be restricted, if desired, to mnet --> ment.
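One way to picture context-restricted edits is as a small table of string rewrites in which the context letters are part of the pattern. The Python sketch below is illustrative only. `CONTEXT_RULES` holds just the two context-restricted examples from the text, and `apply_context_rules` is a hypothetical helper name.

```python
# (pattern in the misspelling, replacement), with context letters included.
CONTEXT_RULES = [("cie", "cei"), ("mnet", "ment")]


def apply_context_rules(word, rules, dictionary):
    """Try each rule at every position where its pattern occurs."""
    found = []
    for wrong, right in rules:
        start = word.find(wrong)
        while start != -1:
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
            start = word.find(wrong, start + 1)
    return found
```

With the context included, "recieve" is corrected to "receive", but "thier" produces no candidate because its "ie" is not preceded by "c". A rule set without the context letter would accept both.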
Since long-distance transpositions are much less likely, these edits were limited to the exchange of vowels around consonants d, g, l, n, r, s, t, v and the exchange of the consonants l, m, n around a vowel. The set of possible edits is restricted to the most common edits and more complex edits are added in order to improve the efficiency of reverse minimum edit distance by limiting the number of generated candidates. The limited set of edits could be applied to other spelling correction algorithms.
For example, if the limited set of edits and the more complex edits are used with standard edit distance algorithms, it would have the following consequences. The cost of computing the distance between two words would be reduced. Standard minimum edit distance algorithms are driven by the letters in the words being compared, so when they consider an edit, they know exactly what letters are involved. They do need to consider different possible edits. For example, if the current position in the misspelled word starts with an R and the dictionary word starts with a P, it could be that the R is an insertion (e.g., if the next letter after the R is a P), or it could be a transposition, a substitution or a deletion. Each possibility leads to a branch in the minimum edit distance computation. (The computation increase is quadratic, not exponential, due to the use of dynamic programming, but there is still extra computation for each such branching point.) Some of the branches may be pruned by considering only the most common edits, as with the reverse minimum edit distance algorithm. For example, since a P/R substitution is not very common, that possibility can be skipped. The same kind of restricted set of edits can be used with standard edit distance algorithms. Moreover, if the number of edits is cut by a factor of three, that leads to a significant speedup in computing the distance between two words. Since the edits are limited to only the most common, there is a reduction in the number of words that will be considered as close, but the set of close words will be the same as the set generated with the reverse minimum edit distance algorithm disclosed herein, so the same increase in accuracy applies here as well. Moreover, if no close candidates are found, it is not necessary to recompute everything from scratch in order to allow all possible edits to find a close match. The method simply backtracks to the point where edits were disallowed by saving the partial computation, thereby avoiding the need to start from scratch. Statistics on the number of times each uncommon edit was disallowed could be kept and used to prioritize which uncommon edits to allow first.
Various data structures may be used to store the valid word dictionary including binary trees, hash tables and "tries". The latter is a data structure described in detail by Donald Knuth in The Art of Computer Programming, Vol. 3 (Addison Wesley).
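For readers unfamiliar with tries, a minimal Python sketch is shown below. It is a generic illustration of the data structure, not code from the listing (which uses a hash table), and the class and method names are hypothetical.

```python
class Trie:
    """A letter-by-letter tree of words; each node maps a letter to a child."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def contains(self, word):
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word
```

A lookup walks one node per letter, so its cost depends on the length of the word being tested rather than on the number of words stored.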
The source code immediately before the claims contains the complete listing of two programs written in the Perl language which is described, for example, in Learning Perl, 2nd Edition, by Schwartz and Christiansen (O'Reilly & Associates Inc. 1997). One listing implements the reverse minimum edit algorithm as disclosed herein and the other permits statistical testing of a corpus of documents to identify common spelling errors.
The method according to this invention demonstrates a speed increase of 13 to 26% for edit distance one and a speed increase of 44 to 50% for edit distance two. The edit distance one method is fast enough to be useful for correcting the spelling of documents and queries in an information retrieval system. The method according to this invention increases the number of cases in which there is only one correction in the candidate list and the percentage of those for which this unique candidate is the correct correction. If there is more than one candidate, sorting the list by word length, word frequency and the frequency of the edit tends to move the correction to the top of the candidate list. The method recognizes all of the nonword errors by checking whether the word is present in a valid dictionary.
The method according to this invention demonstrates a first guess accuracy of about 75%, far beyond the state of the art. When only one candidate correction was proposed by the algorithm, the first guess accuracy improved to about 95%. The speed and accuracy of the algorithm when there is only one candidate correction makes it possible to use it for automatic substitution of corrections as the user types.
The edit distance metric is usually implemented using dynamic programming (bottom-up) or memoization (top-down) with the following recursion:

  edist(word1, i1, j1, word2, i2, j2) =
    if (word1(i1,j1) = word2(i2,j2)) {
      return 0
    } else {
      return min {
        edist(word1, i1, k1, word2, i2, k2)
          + cost(subst, word1, k1, k1+1, word2, k2, k2+1)
          + edist(word1, k1+1, j1, word2, k2+1, j2),
        edist(word1, i1, k1, word2, i2, k2)
          + cost(delete, word1, k1, k1+1)
          + edist(word1, k1+1, j1, word2, k2, j2),
        edist(word1, i1, k1, word2, i2, k2)
          + cost(insert, word1, k1, word2, k2, k2+1)
          + edist(word1, k1, j1, word2, k2+1, j2),
        edist(word1, i1, k1, word2, i2, k2)
          + cost(transpose, word1, k1, k1+2, word2, k2, k2+2)
          + edist(word1, k1+2, j1, word2, k2+2, j2)
      }
      for all k1 such that i1 <= k1 <= j1
      and for all k2 such that i2 <= k2 <= j2
    }
where the above is computing the minimum edit distance between the portions of word1 and word2 designated by indices i1 to j1 and i2 to j2, respectively. The simplest edit distance implementation has the costs set to 1 for nontrivial edits (e.g., substituting P for R) and 0 for trivial edits (e.g., substituting P for itself). More complex edit distance algorithms will use other cost figures to reflect the frequency of a given edit, for example.
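For comparison, the familiar bottom-up table form of this metric, with unit costs and adjacent transpositions, can be sketched in Python as follows. This is the common prefix-based formulation (often called optimal string alignment), shown here as an illustration. It is not a transcription of the interval-based recursion above, and the function name is hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance with substitution, insertion, deletion
    and transposition of adjacent letters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]
```

Filling the table takes time proportional to the product of the two word lengths, which is the quadratic behaviour the text refers to.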
The overall structure of the algorithm above is to split the input and target words each into three parts: the part containing the potential edit, the part before the edit and the part after the edit. The parts before the edit are compared recursively using the same algorithm and likewise for the parts after the edit, and the resulting scores are added to the score for the current edit to compute an overall score for that edit, and the minimum score over all possible types of edits at all possible positions is returned as the result. Although this may seem computation intensive, efficiencies are gained because much of the computation overlaps. Saving partial computations makes the resulting algorithm quadratic instead of exponential.
The restricted set of edits may be applied to this algorithm as follows. First, additional clauses are added to the min list corresponding to the more complex edits. The form of the clauses is similar. In fact, all edits may be treated as just different complex substitutions. For example, transposing "i" and "e" in "wierd" could be thought of as substituting "ei" for "ie". All insertions, deletions, substitutions and transpositions, as well as our more complex edits, are nothing more than substitutions of one n-gram for another. Thus, we minimize

  edist(word1, i1, k1, word2, i2, k2)
    + cost(subst, word1, k1, k1+m, word2, k2, k2+n)
    + edist(word1, k1+m, j1, word2, k2+n, j2)

over the range of values of m and n. Standard edit distance allows m and n to each be 0 or 1 or 2 (with 2 restricted to cases where the two substrings are transpositions of each other). More complex edits might allow m and n to be 3 or even 4. Second, the summand is only computed when the substitution described by the cost line is one of the restricted set of common edits. Thus, if m and n are both 1 (a substitution), we check whether the substitution is one of the list of common substitutions before doing the recursive edist computations for the parts before and after the edit. (The recursive computations are the expensive part.) A control statement is added into the minimization process, doing a test for the edit corresponding to each of the summands before executing the recursive sums. By restricting the allowed edits to the most common (about 1/3 of the possible edits), the test succeeds only about 1/3 of the time. This will lead to a speedup of better than a factor of three.
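The "test the edit before recursing" idea can be sketched with a memoized recursion in which every edit is an n-gram substitution drawn from an allowed set. The Python below is a simplified illustration under stated assumptions: it works left to right over suffixes rather than over the interval indices used above, each allowed edit costs 1, and the names `restricted_distance`, `allowed` and `max_len` are hypothetical.

```python
import math
from functools import lru_cache


def restricted_distance(wrong, right, allowed, max_len=4):
    """Distance from `wrong` to `right` using only n-gram substitutions
    listed in `allowed`; returns math.inf if no such sequence exists."""

    @lru_cache(maxsize=None)
    def go(i, j):
        if i == len(wrong) and j == len(right):
            return 0
        best = math.inf
        # Matching letters cost nothing.
        if i < len(wrong) and j < len(right) and wrong[i] == right[j]:
            best = go(i + 1, j + 1)
        for m in range(max_len + 1):          # letters consumed from `wrong`
            for n in range(max_len + 1):      # letters consumed from `right`
                if m == 0 and n == 0:
                    continue
                if i + m > len(wrong) or j + n > len(right):
                    continue
                # Cheap membership test first; recurse only for allowed edits.
                if (wrong[i:i + m], right[j:j + n]) in allowed:
                    best = min(best, 1 + go(i + m, j + n))
        return best

    return go(0, 0)
```

For example, with `allowed = {("ie", "ei"), ("", "c"), ("", "m")}`, the pair "recieve" and "receive" scores 1, while "pat" and "rat" has no allowed path because a P/R substitution is not in the set.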
The way one calculates the speed of standard edit distance is to realize that the recursive process is essentially filling a table based on all possible values of the indices il, jl, i2 and j2. The running time of the algorithm is the size of the table. Other common optimizations can avoid the need to fill the entire table (e.g., if only words expected to be within edit distance 3 are compared, the words can be processed iteratively instead of recursively, leading to a semi-linear algorithm) . When the possible edits are restricted, the amount of computation is cut down by a factor of 3 to 4. (If all misspelled words involved only one error, the reduction would be a factor of three speedup, but since some misspelled words involve multiple errors, the speedup in those cases is greater. Assuming that 80% of all spelling errors involve a single edit, this means an estimated speedup of a factor of . )
When computing the minimum, a variable is maintained with the current minimum value. Call it minval. The first time a recursive computation is performed, minval is set to the result. Every subsequent time, the result is compared to minval. If it is lower than minval, minval is set to it. All of the possible ways of decomposing the computation are iterated and at the end, the then current value of minval is returned as the result of the edist computation.

Before another summation for possible comparison with minval (each iteration), the edit under consideration is first compared with the set of allowed possible edits. (There are many possible ways to compare an edit with a list of possible edits. Representing the list of possible edits as a linear list would have a running time equal to half the number of possible edits. A much better representation would be a binary tree which would have a running time equal to the length of the longest edit, effectively a constant.) If the edit is allowed, the summand is computed, including the recursive edist computations. If the edit is not allowed, the program skips to the next iteration.
Having thus defined the invention in the detail and particularity required by the patent laws, what is desired to be protected by Letters Patent is set forth in the following claims.

spell.pl 1/35 p:/SpellGrams/revminedist/ 98/04/21
#!/usr/local/bin/perl
#
# Copyright (c) 1998 by Justsystem Pittsburgh Research Center, Inc.
# Written by Mark Kantrowitz, Research Scientist, mkant@jprc.com.
# Start date: April 7, 1998.
# Last modification: April 16, 1998.
#
# This is a variation on reverse minimum edit distance that limits
# the collection of edits to only the most frequent. The intention
# is to significantly speed up the algorithm. However, it also
# winds up improving the accuracy as well, a counter-intuitive result.
#
# Average # of substitutions is 6.96 (average) or 8.65 (freq weighted).
# This is compared to 26 for full minimum edit distance. So a factor of
# 3-4 speedup.
#
#    3994 substitutions     24.8%
#    3674 deletions         22.8%
#    6191 insertions        38.5%
#    2225 transpositions    13.8%
#   16084 TOTAL
#
# So 83.3% of errors in the corpus are single errors.
#
#   19298 total errors in corpus.
#
# Assumes 5 seconds of startup time (to load dictionaries).
#
#             edist=1            edist=2
#   NAIVE     57.0% (19.7 ms)    59.5% (702 ms)
#   SMART     59.8% (14.7 ms)    62.7% (372 ms)
#   SMART1    60.6% (15.1 ms)    63.5% (348 ms)
#   SMART2    61.1% (17.0 ms)    64.0% (374 ms)
#   SMART3    61.6% (17.3 ms)    64.2% (373 ms)
#   SMART4    61.6% (17.3 ms)    64.2% (388 ms)
#
# SMART2 fixed a few slight bugs in SMART1 (e.g., insertions as last letter)
# SMART3 added long distance transpositions and longer substitutions.
# SMART4 is like SMART3, but adds them to edist=2 as well.
#
# The following table shows the overall accuracy for errors for which
# the algorithm provided a unique correction.
#
#             edist=1    edist=2
#   SMART1    94.1%      92.4%
#   SMART2    94.4%      93.0%
#   SMART3    94.9%      93.5%

# Set this variable to 1, 2, 3, or 4 to choose the appropriate version.
$version = 3;

# Set loose_count to 1 to count membership, firstguess to 1 to count
# membership where the correction is the first in the list.
# For example, SMART1 with edist=1 contains the correct correction 82%
# of the time, and has an overall first-guess accuracy of 73.9%. The 60.5%
# accuracy figure is the percentage of errors for which SMART1 comes up
# with the correction as the only answer. (Need to rerun the overall
# first-guess stats with other sorting orders, to see if we can improve
# the accuracy to the theoretical maximum of 82%. Also get first-guess
# accuracy scores for edist=2 and the other versions.)
$loose_count = 0;
$firstguess = 0;

# If this variable is set to 1, it forces the edit distance 2 computation,
# even if it finds a correction at edit distance 1.
$force_e2 = 0;

# TODO:
#   0. Integrate into ps2ascii.
#
#   1. When there is more than one correction, they are currently
#      sorted by frequency, with word length disambiguating between
#      equal frequency. Perhaps they should be sorted first by length,
#      second by frequency.
#   2. Replace hash table with a trie?
#   3. Review code for possible efficiency hacks.
#   4. Test sensitivity of algorithm to dictionary size.
#   5. Review the list of the unique corrections that it gets wrong.

# USAGE:
#
#   spell.pl -d -v4 -stats -e 2 -split
#
#   -d        Turns on debug mode. When measuring performance on the
#             error corpus, prints a list of the misses. When doing
#             generic spellcor, returns all matches when there's more
#             than one.
#   -stats    Measures performance on the error corpus.
#
#   -e #
#             If # is 2, does edist of 2, otherwise edist of 1.
#
#   -split    If present, also looks for word boundary errors.
#   -f file   Checks spelling of every word in a file.
#   -v #      Version of code to use. Default = 4.

# Typical invocations:
#   spell.pl -d -e2 -split recieve
#   spell.pl -stats
#   spell.pl -stats -e2
#   spell.pl -f tmp.txt

$debug = 0;
$stats = 0;
$statsfile = "errors.txt";
$edist2 = 0;
$splitw = 0;
$finderrs = 0;
$finderrfile = "";
$cmdline = 0;

# This variable controls whether the frequencies are updated
# using the frequencies from the document. If a docword is
# in the dictionary, it increments the count. If a docword is
# not in the dictionary and is not found to be an error, it is
# added to the dictionary. This will enable the program to
# correct misspelled versions of the document's words.
#
# Note that we're using a greedy approximation. What we should
# be doing is compiling frequency statistics on the document
# words which don't appear in the dictionary, cross-compare the
# words in the list, and for any pair pick the more frequent as
# the correct spelling. Then do the spellcor. But this approximation
# is easier to implement.
#
# Also, we should be increasing the dictionary frequency of the
# corrections, but that isn't important for this application.
$frequpdt = 1;

# This controls whether file spell checking is conservative in
# the treatment of words with an initial capital or not.
$conservcase = 1;

# This controls whether it counts the number of unique corrections
# that were not the correct correction.
$unique_accuracy = 1;

# This controls whether it prints out the unique corrections that
# were incorrect.
$print_unique_fail = 0;

while (@ARGV) {
    $arg = shift @ARGV;
    # print "$arg\n";
    if ($arg =~ /^\W/) {
        if ($arg =~ /^-d/i) {
            $debug = 1;
        } elsif ($arg =~ /^-stats/i) {
            $stats = 1;
        } elsif ($arg =~ /^-split/i) {
            $splitw = 1;
        } elsif ($arg =~ /^-e/i) {
            if (length($arg) == 2) {
                $arg2 = shift @ARGV;
            } else {
                $arg2 = substr($arg, 2);
            }
            if ($arg2 eq "2") {
                $edist2 = 1;
            } else {
                $edist2 = 0;
            }
        } elsif ($arg =~ /^-v/i) {
            if (length($arg) == 2) {
                $arg2 = shift @ARGV;
            } else {
                $arg2 = substr($arg, 2);
            }
            if ($arg2 eq "1") {
                $version = 1;
            } elsif ($arg2 eq "2") {
                $version = 2;
            } elsif ($arg2 eq "3") {
                $version = 3;
            } elsif ($arg2 eq "4") {
                $version = 4;
            } else {
                $version = 4;
            }
            print "Running version $version.\n";
        } elsif ($arg =~ /^-f/i) {
            $arg2 = shift @ARGV;
            $finderrs = 1;
            $finderrfile = $arg2;
        }
    } else {
        unshift(@ARGV, ($arg));
        last;
    }
}
$cmdline = 1 if (!$finderrs && !$stats);
# print "$cmdline $debug $finderrs $stats $edist $splitw @ARGV\n";

# Load in the spelling dictionary.
open(DICT, "words.txt");
while (<DICT>) {
    chomp;
    $word = $_;
    $word =~ tr/A-Z/a-z/;
    $dict{$word} = 1;
}
close(DICT);

# The following variable is the minimum frequency required for
# each of the two parts of a split. This eliminates many spurious splits.
# To replace this constraint with a simple requirement that the two
# parts be words, set it to 1.
$minsplitfreq = 10;

# Load in the frequency statistics. These are based on the Tipster
# article word frequency statistics, culled of Tipster-specific
# aspects. But any frequency stats should do as well.
open(DICT, "tfreq.txt");
while (<DICT>) {
    chomp;
    ($count, $word) = split(/\t/);
    $word =~ tr/A-Z/a-z/;
    $dict{$word} += $count;
}
close(DICT);

if ($cmdline) {
    foreach $word (@ARGV) {
        ($corr, $numcor) = &spellcor($word);
        print "$word --> $corr\n";
    }
}

if ($finderrs) {
    open(ERRFILE, "$finderrfile");
    $lasteos = 0;
    $prev = "";
    while (<ERRFILE>) {
        chomp;
        foreach $word (split(/\s+/)) {
            if ($prev =~ /[\.\?\!]$/) {
                $lasteos = 1;
            } else {
                $lasteos = 0;
            }
            $word =~ s/\W+$//;
            $word =~ s/^\W+//;
            $tmp = $word;
            $tmp =~ tr/A-Z/a-z/;
            if ($tmp eq $word || $lasteos || !$conservcase) {
                if ($dict{$tmp} == 0 && length($tmp) > 4) {
                    ($corr, $numcor) = &spellcor($word);
                    if ($word ne $corr && $corr ne "") {
                        printf "%s --> %s\n", $word, $corr;
                    } elsif ($frequpdt) {
                        $dict{$tmp}++;
                    }
                } elsif ($frequpdt && length($tmp) > 4) {
                    $dict{$tmp}++;
                }
                $prev = $word;
            }
        }
    }
    close(ERRFILE);
}

if ($stats) {
    $errcount = 0;
    $success = 0;
    $failure = 0;
    open(ERRS, "$statsfile");
    while (<ERRS>) {
        chomp;
        ($l, $r) = split(/ --> /);
        # $tmp = $l; $tmp =~ tr/A-Z/a-z/;
        $errcount++;
        ($correction, $numcor) = &spellcor($l);
        if ($correction eq $r ||
            ($loose_count &&
             (($firstguess && $correction =~ /^$r/i) ||
              (!$firstguess && $correction =~ /$r/i)))) {
            $success++;
        } else {
            $failure++ if ($unique_accuracy && $numcor == 1);
            if ($print_unique_fail && $numcor == 1) {
                printf "$l --> $r (%s)\n", $correction;
            } elsif ($debug) {
                printf "$l --> $r (%s)\n", $correction;
            }
        }
    }
    close(ERRS);
    printf "Accuracy: %.3f ($success/$errcount)\n", $success/$errcount;
    if ($unique_accuracy) {
        printf "Unique Correction Accuracy: %.3f (%d/%d)\n",
            $success/($success+$failure), $success, ($success+$failure);
    }
}

sub spellcor {
    local ($word) = @_;
    local ($length, $i, $unmod_word, $tmp);
    local ($correction, @corrections, $numcor, $case);
    if (($word =~ /^[a-z-\_\;\'\,\"\,\.\?\!\=\/]+$/i ||
         $word =~ /^[a-z-\_\;\'\,\"\,\.\?\!\=\/]+\d[a-z-\_\;\'\,\"\,\.\?\!\=\/]+$/i ||
         $word =~ /^[a-z-\_\;\'\,\"\,\.\?\!\=\/]+\d$/i ||
         $word =~ /^\d[a-z-\;\'\,\"\,\.\?\!\=\/]+$/i) &&
        $word !~ /\'s$/i && $word !~ /\'$/i) {
        $numcor = 0;
        $correction = "";
        @corrections = ();
        $length = length($word);
        $case = &id_case($word);
        $unmod_word = $word;
        $word =~ tr/A-Z/a-z/;
        for ($i = 0; $i < $length; $i++) {
            &spellcor_pos($word, $i, $length);
        }
        if ($force_e2 || ($numcor == 0 && $edist2)) {
            for ($i = 0; $i < $length; $i++) {
                &spellcor_pos2($word, $i, $length);
            }
        }
        if ($splitw && $numcor != 1) {
            $tmp = &split_word($word);
            if ($tmp ne $word) {
                $correction = $tmp;
                push(@corrections, $correction);
                $numcor++;
            }
        }
    } else {
        # print "$word slipped through...\n";
    }
    if ($numcor == 1) {
        return ($correction, $numcor);
    } else {
        if ($debug == 1 || ($loose_count && $stats)) {
            return (join(", ", sort(by_freq @corrections)), $numcor);
        } else {
            return ($unmod_word, $numcor);
        }
    }
}

sub spellcor_pos {
    local ($word, $i, $length) = @_;
    local ($left, $middle, $right, $m2, $r2);
    $left = substr($word, 0, $i);
    $middle = substr($word, $i, 1);
    $right = substr($word, $i+1);

    # substitutions
    if ($middle eq ";") {
        &check($left, "\'", $right) if ($i < $length - 1);
        &check($left, "l", $right);
    } elsif ($middle eq "\") {
        &check($left, "V", $right) if ($i < $length - 1);
    } elsif ($middle eq "\_") {
        &check($left, "\-", $right) if ($i < $length - 1);
    } elsif ($middle eq "\$") {
        &check($left, "s", $right) if ($i < $length - 1 && $i > 0);
    } elsif ($middle eq "\=") {
        &check($left, "\", $right) if ($i < $length - 1 && $i > 0);
    } elsif ($middle eq "V") {
        &check($left, " ", $right) if ($i < $length - 1 && $i > 0);
    } elsif ($middle =~ /^\d$/) {
        if ($middle eq "0") {
            &check($left, "o", $right);
        } elsif ($middle eq "1") {
            &check($left, "l", $right);
            &check($left, "i", $right);
            &check($left, "e", $right);
        } elsif ($middle eq "3") {
            &check($left, "e", $right);
        } elsif ($middle eq "9") {
            &check($left, "o", $right);
        }
    } elsif ($middle eq "a") {
        &check($left, "e", $right);
        &check($left, "i", $right);
        &check($left, "o", $right);
        &check($left, "s", $right);
        &check($left, "u", $right);
        &check($left, "z", $right);
    } elsif ($middle eq "b") {
        &check($left, "d", $right);
        &check($left, "g", $right);
        &check($left, "h", $right);
        &check($left, "l", $right);
        &check($left, "n", $right);
        &check($left, "p", $right);
        &check($left, "t", $right);
        &check($left, "v", $right);
    } elsif ($middle eq "c") {
        &check($left, "d", $right);
        &check($left, "e", $right);
        &check($left, "g", $right);
        &check($left, "k", $right);
        &check($left, "n", $right);
        &check($left, , $right);
        &check($left, "t", $right);
        &check($left, "v", $right);
        &check($left, "x", $right);
    } elsif ($middle eq "d") {
        &check($left, "b", $right);
        &check($left, "c", $right);
        &check($left, "e", $right);
        &check($left, "f", $right);
        &check($left, "g", $right);
        &check($left, "n", $right);
        &check($left, , $right);
        &check($left, "s", $right);
        &check($left, , $right);
    } elsif ($middle eq "e") {
        &check($left, "a", $right);
        &check($left, "c", $right);
        &check($left, "d", $right);
        &check($left, "g", $right);
        &check($left, , $right);
        &check($left, , $right);
        &check($left, "o", $right);
        &check($left, "r", $right);
        &check($left, "s", $right);
        &check($left, "t", $right);
        &check($left, , $right);
        &check($left, "w", $right);
        &check($left, "y", $right);
    } elsif ($middle eq "f") {
        &check($left, "d", $right);
        &check($left, "g", $right);
        &check($left, "o", $right);
        &check($left, "p", $right);
        &check($left, "r", $right);
        &check($left, "t", $right);
        &check($left, "v", $right);
    } elsif ($middle eq "g") {
        &check($left, "b", $right);
        &check($left, "c", $right);
        &check($left, "d", $right);
        &check($left, "e", $right);
        &check($left, "a", $right);
        &check($left, "h", $right);
        &check($left, "j", $right);
        &check($left, "n", $right);
        &check($left, "q", $right);
        &check($left, "t", $right);
        &check($left, "v", $right);
    } elsif ($middle eq "h") {
        &check($left, "c", $right);
        &check($left, "g", $right);
        &check($left, "j", $right);
        &check($left, "k", $right);
        &check($left, "l", $right);
        &check($left, , $right);
        &check($left, , $right);
    } elsif ($middle eq "i") {
        &check($left, "a", $right);
        &check($left, "e", $right);
        &check($left, "l", $right);
        &check($left, "n", $right);
        &check($left, "o", $right);
        &check($left, "s", $right);
        &check($left, "u", $right);
        &check($left, "y", $right);
    } elsif ($middle eq "j") {
        &check($left, "g", $right);
        &check($left, "h", $right);
        &check($left, "n", $right);
    } elsif ($middle eq "k") {
        &check($left, "c", $right);
        &check($left, "g", $right);
        &check($left, "i", $right);
        &check($left, "l", $right);
        &check($left, "n", $right);
        &check($left, "o", $right);
        &check($left, "t", $right);
    } elsif ($middle eq "l") {
        &check($left, "d", $right);
        &check($left, "i", $right);
        &check($left, "k", $right);
        &check($left, "n", $right);
        &check($left, "o", $right);
        &check($left, , $right);
        &check($left, , $right);
        &check($left, , $right);
    } elsif ($middle eq "m") {
        # m is the last character
        if ($i == $length - 1 && $dict{$left} != 0) {
            $correction = $left."\,";
            push(@corrections, $left."\,");
            $numcor++;
        }
        &check($left, "b", $right);
        &check($left, "l", $right);
        &check($left, "n", $right);
        &check($left, "o", $right);
        &check($left, "t", $right);
    } elsif ($middle eq "n") {
        &check($left, "b", $right);
        &check($left, "c", $right);
        &check($left, "d", $right);
        &check($left, "g", $right);
        &check($left, "h", $right);
        &check($left, "l", $right);
        &check($left, " ", $right);
        &check($left, "r", $right);
        &check($left, "t", $right);
        &check($left, "u", $right);
    } elsif ($middle eq "o") {
        &check($left, "a", $right);
        &check($left, "e", $right);
        &check($left, "i", $right);
        &check($left, "l", $right);
        &check($left, "n", $right);
        &check($left, "p", $right);
        &check($left, , $right);
        &check($left, "u", $right);
    } elsif ($middle eq "p") {
        &check($left, "b", $right);
        &check($left, "e", $right);
        &check($left, "o", $right);
        &check($left, , $right);
    } elsif ($middle eq "q") {
        &check($left, "a", $right);
        &check($left, "c", $right);
        &check($left, "g", $right);
        &check($left, "w", $right);
    } elsif ($middle eq "r") {
        &check($left, "b", $right);
        &check($left, "c", $right);
        &check($left, "d", $right);
        &check($left, "e", $right);
        &check($left, "g", $right);
        &check($left, "l", $right);
        &check($left, "n", $right);
        &check($left, "o", $right);
        &check($left, "p", $right);
        &check($left, "t", $right);
    } elsif ($middle eq "s") {
        &check($left, "a", $right);
        &check($left, "c", $right);
        &check($left, "d", $right);
        &check($left, "e", $right);
        &check($left, "l", $right);
        &check($left, "m", $right);
        &check($left, "n", $right);
        &check($left, "t", $right);
        &check($left, "w", $right);
        &check($left, "x", $right);
        &check($left, "z", $right);
    } elsif ($middle eq "t") {
        &check($left, "V", $right);
        &check($left, "b", $right);
        &check($left, "c", $right);
        &check($left, "d", $right);
        &check($left, "e", $right);
        &check($left, "f", $right);
        &check($left, "g", $right);
        &check($left, "n", $right);
        &check($left, "r", $right);
        &check($left, "s", $right);
        &check($left, "y", $right);
    } elsif ($middle eq "u") {
        &check($left, "a", $right);
        &check($left, , $right);
        &check($left, "i", $right);
        &check($left, "n", $right);
        &check($left, "o", $right);
        &check($left, , $right);
        &check($left, , $right);
        &check($left, "y", $right);
    } elsif ($middle eq "v") {
        &check($left, "b", $right);
        &check($left, "c", $right);
        &check($left, , $right);
        &check($left, , $right);
        &check($left, "n", $right);
        &check($left, "w", $right);
    } elsif ($middle eq "w") {
        &check($left, "a", $right);
        &check($left, , $right);
        &check($left, "q", $right);
        &check($left, , $right);
        &check($left, "s", $right);
        &check($left, , $right);
        &check($left, "u", $right);
    } elsif ($middle eq "x") {
        &check($left, "c", $right);
        &check($left, "d", $right);
        &check($left, "s", $right);
    } elsif ($middle eq "y") {
        &check($left, "a", $right);
        &check($left, "e", $right);
        &check($left, , $right);
        &check($left, "i", $right);
        &check($left, "o", $right);
        &check($left, "t", $right);
        &check($left, "u", $right);
    } elsif ($middle eq "z") {
        &check($left, "a", $right);
        &check($left, "c", $right);
        &check($left, "s", $right);
        &check($left, "x", $right);
    }
# deletions
icheck($left,"",$right);
# insertions
icheck($left,"a".$middle,$right);
icheck($left,"b".$middle,$right);
icheck($left,"c".$middle,$right);
icheck($left,"d".$middle,$right);
icheck($left,"e".$middle,$right);
icheck($left,"f".$middle,$right);
icheck($left,"g".$middle,$right);
icheck($left,"h".$middle,$right);
icheck($left,"i".$middle,$right);
icheck($left,"j".$middle,$right);
icheck($left,"k".$middle,$right);
icheck($left,"l".$middle,$right);
icheck($left,"m".$middle,$right);
icheck($left,"n".$middle,$right);
icheck($left,"o".$middle,$right);
icheck($left,"p".$middle,$right);
icheck($left,"q".$middle,$right);
icheck($left,"r".$middle,$right);
icheck($left,"s".$middle,$right);
icheck($left,"t".$middle,$right);
icheck($left,"u".$middle,$right);
icheck($left,"v".$middle,$right);
icheck($left,"w".$middle,$right);
icheck($left,"x".$middle,$right);
icheck($left,"y".$middle,$right);
icheck($left,"z".$middle,$right);
icheck($left,"\'".$middle,$right);
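The insertion step in the listing tries every lowercase letter, plus the apostrophe, in front of the character at position $i. As a minimal sketch of the same enumeration written as a loop (the function name insertion_candidates is hypothetical, and the dictionary lookup that icheck presumably performs is left out):

```perl
use strict;
use warnings;

# Hypothetical helper: build every string obtained by inserting one
# character (a-z or an apostrophe) in front of position $i of $word.
# Dictionary filtering is deliberately omitted from this sketch.
sub insertion_candidates {
    my ($word, $i) = @_;
    my $left   = substr($word, 0, $i);
    my $middle = substr($word, $i, 1);
    my $right  = substr($word, $i + 1);
    return map { $left . $_ . $middle . $right } ('a' .. 'z', "'");
}

my @candidates = insertion_candidates("wrd", 1);
print scalar(@candidates), "\n";   # prints 27
```

In the listing each of these 27 strings is handed to icheck; the sketch only generates them.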
# special case for last letter insertions
if ($version > 1) {
if ($i == $length - 1) {
    icheck($left,$middle."a",$right);
    icheck($left,$middle."b",$right);
    icheck($left,$middle."c",$right);
    icheck($left,$middle."d",$right);
    icheck($left,$middle."e",$right);
    icheck($left,$middle."f",$right);
    icheck($left,$middle."g",$right);
    icheck($left,$middle."h",$right);
    icheck($left,$middle."i",$right);
    icheck($left,$middle."j",$right);
    icheck($left,$middle."k",$right);
    icheck($left,$middle."l",$right);
    icheck($left,$middle."m",$right);
    icheck($left,$middle."n",$right);
    icheck($left,$middle."o",$right);
    icheck($left,$middle."p",$right);
    icheck($left,$middle."q",$right);
    icheck($left,$middle."r",$right);
    icheck($left,$middle."s",$right);
    icheck($left,$middle."t",$right);
    icheck($left,$middle."u",$right);
    icheck($left,$middle."v",$right);
    icheck($left,$middle."w",$right);
    icheck($left,$middle."x",$right);
    icheck($left,$middle."y",$right);
    icheck($left,$middle."z",$right);
}
# transpositions
if ($i != $length - 1) {
$m2 = substr($right,0,1);
$r2 = substr($right,1);
if ($middle eq "\'") {
    icheck($left,"n\'",$r2) if ($m2 eq "n");
} elsif ($middle eq "a") {
    icheck($left,"ca",$r2) if ($m2 eq "c");
    icheck($left,"ea",$r2) if ($m2 eq "e");
    icheck($left,"ga",$r2) if ($m2 eq "g");
    icheck($left,"ha",$r2) if ($m2 eq "h");
    icheck($left,"ia",$r2) if ($m2 eq "i");
    icheck($left,"ka",$r2) if ($m2 eq "k");
    icheck($left,"la",$r2) if ($m2 eq "l");
    icheck($left,"ma",$r2) if ($m2 eq "m");
    icheck($left,"na",$r2) if ($m2 eq "n");
    icheck($left,"oa",$r2) if ($m2 eq "o");
    icheck($left,"pa",$r2) if ($m2 eq "p");
    icheck($left,"ra",$r2) if ($m2 eq "r");
    icheck($left,"sa",$r2) if ($m2 eq "s");
    icheck($left,"ta",$r2) if ($m2 eq "t");
    icheck($left,"ua",$r2) if ($m2 eq "u");
} elsif ($middle eq "b") {
    icheck($left,"ab",$r2) if ($m2 eq "a");
    icheck($left,"ib",$r2) if ($m2 eq "i");
    icheck($left,"mb",$r2) if ($m2 eq "m");
} elsif ($middle eq "c") {
    icheck($left,"ac",$r2) if ($m2 eq "a");
    icheck($left,"ec",$r2) if ($m2 eq "e");
    icheck($left,"ic",$r2) if ($m2 eq "i");
    icheck($left,"nc",$r2) if ($m2 eq "n");
    icheck($left,"oc",$r2) if ($m2 eq "o");
    icheck($left,"rc",$r2) if ($m2 eq "r");
    icheck($left,"sc",$r2) if ($m2 eq "s");
    icheck($left,"tc",$r2) if ($m2 eq "t");
    icheck($left,"uc",$r2) if ($m2 eq "u");
    icheck($left,"yc",$r2) if ($m2 eq "y");
} elsif ($middle eq "d") {
    icheck($left,"ad",$r2) if ($m2 eq "a");
    icheck($left,"ed",$r2) if ($m2 eq "e");
    icheck($left,"id",$r2) if ($m2 eq "i");
    icheck($left,"ld",$r2) if ($m2 eq "l");
    icheck($left,"nd",$r2) if ($m2 eq "n");
    icheck($left,"od",$r2) if ($m2 eq "o");
} elsif ($middle eq "e") {
    icheck($left,"ae",$r2) if ($m2 eq "a");
    icheck($left,"be",$r2) if ($m2 eq "b");
    icheck($left,"ce",$r2) if ($m2 eq "c");
    icheck($left,"de",$r2) if ($m2 eq "d");
    icheck($left,"fe",$r2) if ($m2 eq "f");
    icheck($left,"ge",$r2) if ($m2 eq "g");
    icheck($left,"he",$r2) if ($m2 eq "h");
    icheck($left,"ie",$r2) if ($m2 eq "i");
    icheck($left,"ke",$r2) if ($m2 eq "k");
    icheck($left,"le",$r2) if ($m2 eq "l");
    icheck($left,"me",$r2) if ($m2 eq "m");
    icheck($left,"ne",$r2) if ($m2 eq "n");
    icheck($left,"oe",$r2) if ($m2 eq "o");
    icheck($left,"pe",$r2) if ($m2 eq "p");
    icheck($left,"re",$r2) if ($m2 eq "r");
    icheck($left,"se",$r2) if ($m2 eq "s");
    icheck($left,"te",$r2) if ($m2 eq "t");
    icheck($left,"ue",$r2) if ($m2 eq "u");
    icheck($left,"ve",$r2) if ($m2 eq "v");
    icheck($left,"ye",$r2) if ($m2 eq "y");
} elsif ($middle eq "f") {
    icheck($left,"ef",$r2) if ($m2 eq "e");
    icheck($left,"lf",$r2) if ($m2 eq "l");
    icheck($left,"nf",$r2) if ($m2 eq "n");
    icheck($left,"of",$r2) if ($m2 eq "o");
    icheck($left,"rf",$r2) if ($m2 eq "r");
} elsif ($middle eq "g") {
    icheck($left,"ag",$r2) if ($m2 eq "a");
    icheck($left,"ig",$r2) if ($m2 eq "i");
    icheck($left,"ng",$r2) if ($m2 eq "n");
    icheck($left,"og",$r2) if ($m2 eq "o");
    icheck($left,"rg",$r2) if ($m2 eq "r");
    icheck($left,"ug",$r2) if ($m2 eq "u");
} elsif ($middle eq "h") {
    icheck($left,"ch",$r2) if ($m2 eq "c");
    icheck($left,"gh",$r2) if ($m2 eq "g");
    icheck($left,"ph",$r2) if ($m2 eq "p");
    icheck($left,"rh",$r2) if ($m2 eq "r");
    icheck($left,"sh",$r2) if ($m2 eq "s");
    icheck($left,"th",$r2) if ($m2 eq "t");
    icheck($left,"wh",$r2) if ($m2 eq "w");
} elsif ($middle eq "i") {
    icheck($left,"ai",$r2) if ($m2 eq "a");
    icheck($left,"ci",$r2) if ($m2 eq "c");
    icheck($left,"di",$r2) if ($m2 eq "d");
    icheck($left,"ei",$r2) if ($m2 eq "e");
    icheck($left,"gi",$r2) if ($m2 eq "g");
    icheck($left,"hi",$r2) if ($m2 eq "h");
    icheck($left,"li",$r2) if ($m2 eq "l");
    icheck($left,"ni",$r2) if ($m2 eq "n");
    icheck($left,"oi",$r2) if ($m2 eq "o");
    icheck($left,"ri",$r2) if ($m2 eq "r");
    icheck($left,"si",$r2) if ($m2 eq "s");
    icheck($left,"ti",$r2) if ($m2 eq "t");
    icheck($left,"ui",$r2) if ($m2 eq "u");
    icheck($left,"vi",$r2) if ($m2 eq "v");
    icheck($left,"wi",$r2) if ($m2 eq "w");
} elsif ($middle eq "j") {
    # icheck($left,"g",$r2) if ($m2 eq "n");
} elsif ($middle eq "k") {
    icheck($left,"ak",$r2) if ($m2 eq "a");
    icheck($left,"ck",$r2) if ($m2 eq "c");
    icheck($left,"nk",$r2) if ($m2 eq "n");
    icheck($left,"rk",$r2) if ($m2 eq "r");
    icheck($left,"sk",$r2) if ($m2 eq "s");
} elsif ($middle eq "l") {
    icheck($left,"al",$r2) if ($m2 eq "a");
    icheck($left,"bl",$r2) if ($m2 eq "b");
    icheck($left,"cl",$r2) if ($m2 eq "c");
    icheck($left,"el",$r2) if ($m2 eq "e");
    icheck($left,"il",$r2) if ($m2 eq "i");
    icheck($left,"ol",$r2) if ($m2 eq "o");
    icheck($left,"pl",$r2) if ($m2 eq "p");
    icheck($left,"rl",$r2) if ($m2 eq "r");
    icheck($left,"sl",$r2) if ($m2 eq "s");
    icheck($left,"tl",$r2) if ($m2 eq "t");
    icheck($left,"ul",$r2) if ($m2 eq "u");
    icheck($left,"yl",$r2) if ($m2 eq "y");
} elsif ($middle eq "m") {
    icheck($left,"am",$r2) if ($m2 eq "a");
    icheck($left,"em",$r2) if ($m2 eq "e");
    icheck($left,"nm",$r2) if ($m2 eq "n");
    icheck($left,"om",$r2) if ($m2 eq "o");
    icheck($left,"rm",$r2) if ($m2 eq "r");
    icheck($left,"sm",$r2) if ($m2 eq "s");
} elsif ($middle eq "n") {
    icheck($left,"an",$r2) if ($m2 eq "a");
    icheck($left,"en",$r2) if ($m2 eq "e");
    icheck($left,"gn",$r2) if ($m2 eq "g");
    icheck($left,"in",$r2) if ($m2 eq "i");
    icheck($left,"kn",$r2) if ($m2 eq "k");
    icheck($left,"on",$r2) if ($m2 eq "o");
    icheck($left,"rn",$r2) if ($m2 eq "r");
    icheck($left,"sn",$r2) if ($m2 eq "s");
    icheck($left,"un",$r2) if ($m2 eq "u");
    icheck($left,"wn",$r2) if ($m2 eq "w");
    icheck($left,"yn",$r2) if ($m2 eq "y");
} elsif ($middle eq "o") {
    icheck($left,"ao",$r2) if ($m2 eq "a");
    icheck($left,"co",$r2) if ($m2 eq "c");
    icheck($left,"eo",$r2) if ($m2 eq "e");
    icheck($left,"fo",$r2) if ($m2 eq "f");
    icheck($left,"go",$r2) if ($m2 eq "g");
    icheck($left,"ho",$r2) if ($m2 eq "h");
    icheck($left,"io",$r2) if ($m2 eq "i");
    icheck($left,"lo",$r2) if ($m2 eq "l");
    icheck($left,"mo",$r2) if ($m2 eq "m");
    icheck($left,"no",$r2) if ($m2 eq "n");
    icheck($left,"ro",$r2) if ($m2 eq "r");
    icheck($left,"so",$r2) if ($m2 eq "s");
    icheck($left,"to",$r2) if ($m2 eq "t");
    icheck($left,"uo",$r2) if ($m2 eq "u");
    icheck($left,"wo",$r2) if ($m2 eq "w");
} elsif ($middle eq "p") {
    icheck($left,"ap",$r2) if ($m2 eq "a");
    icheck($left,"ep",$r2) if ($m2 eq "e");
    icheck($left,"ip",$r2) if ($m2 eq "i");
    icheck($left,"lp",$r2) if ($m2 eq "l");
    icheck($left,"mp",$r2) if ($m2 eq "m");
    icheck($left,"op",$r2) if ($m2 eq "o");
    icheck($left,"rp",$r2) if ($m2 eq "r");
    icheck($left,"sp",$r2) if ($m2 eq "s");
    icheck($left,"up",$r2) if ($m2 eq "u");
} elsif ($middle eq "q") {
    icheck($left,"cq",$r2) if ($m2 eq "c");
} elsif ($middle eq "r") {
    icheck($left,"ar",$r2) if ($m2 eq "a");
    icheck($left,"cr",$r2) if ($m2 eq "c");
    icheck($left,"er",$r2) if ($m2 eq "e");
    icheck($left,"gr",$r2) if ($m2 eq "g");
    icheck($left,"hr",$r2) if ($m2 eq "h");
    icheck($left,"ir",$r2) if ($m2 eq "i");
    icheck($left,"or",$r2) if ($m2 eq "o");
    icheck($left,"pr",$r2) if ($m2 eq "p");
    icheck($left,"tr",$r2) if ($m2 eq "t");
    icheck($left,"ur",$r2) if ($m2 eq "u");
    icheck($left,"yr",$r2) if ($m2 eq "y");
} elsif ($middle eq "s") {
    icheck($left,"as",$r2) if ($m2 eq "a");
    icheck($left,"bs",$r2) if ($m2 eq "b");
    icheck($left,"es",$r2) if ($m2 eq "e");
    icheck($left,"is",$r2) if ($m2 eq "i");
    icheck($left,"ks",$r2) if ($m2 eq "k");
    icheck($left,"ls",$r2) if ($m2 eq "l");
    icheck($left,"ms",$r2) if ($m2 eq "m");
    icheck($left,"ns",$r2) if ($m2 eq "n");
    icheck($left,"os",$r2) if ($m2 eq "o");
    icheck($left,"ps",$r2) if ($m2 eq "p");
    icheck($left,"rs",$r2) if ($m2 eq "r");
    icheck($left,"ts",$r2) if ($m2 eq "t");
    icheck($left,"us",$r2) if ($m2 eq "u");
    icheck($left,"ys",$r2) if ($m2 eq "y");
} elsif ($middle eq "t") {
    icheck($left,"at",$r2) if ($m2 eq "a");
    icheck($left,"ct",$r2) if ($m2 eq "c");
    icheck($left,"et",$r2) if ($m2 eq "e");
    icheck($left,"ht",$r2) if ($m2 eq "h");
    icheck($left,"it",$r2) if ($m2 eq "i");
    icheck($left,"lt",$r2) if ($m2 eq "l");
    icheck($left,"nt",$r2) if ($m2 eq "n");
    icheck($left,"ot",$r2) if ($m2 eq "o");
    icheck($left,"pt",$r2) if ($m2 eq "p");
    icheck($left,"rt",$r2) if ($m2 eq "r");
    icheck($left,"st",$r2) if ($m2 eq "s");
    icheck($left,"ut",$r2) if ($m2 eq "u");
} elsif ($middle eq "u") {
    icheck($left,"au",$r2) if ($m2 eq "a");
    icheck($left,"bu",$r2) if ($m2 eq "b");
    icheck($left,"cu",$r2) if ($m2 eq "c");
    icheck($left,"eu",$r2) if ($m2 eq "e");
    icheck($left,"gu",$r2) if ($m2 eq "g");
    icheck($left,"lu",$r2) if ($m2 eq "l");
    icheck($left,"nu",$r2) if ($m2 eq "n");
    icheck($left,"ou",$r2) if ($m2 eq "o");
    icheck($left,"pu",$r2) if ($m2 eq "p");
    icheck($left,"ru",$r2) if ($m2 eq "r");
    icheck($left,"su",$r2) if ($m2 eq "s");
    icheck($left,"tu",$r2) if ($m2 eq "t");
} elsif ($middle eq "v") {
    icheck($left,"av",$r2) if ($m2 eq "a");
    icheck($left,"ev",$r2) if ($m2 eq "e");
    icheck($left,"iv",$r2) if ($m2 eq "i");
    icheck($left,"lv",$r2) if ($m2 eq "l");
} elsif ($middle eq "w") {
    icheck($left,"ew",$r2) if ($m2 eq "e");
    icheck($left,"ow",$r2) if ($m2 eq "o");
    icheck($left,"sw",$r2) if ($m2 eq "s");
} elsif ($middle eq "x") {
    # icheck($left,"s",$r2) if ($m2 eq "n");
} elsif ($middle eq "y") {
    icheck($left,"ay",$r2) if ($m2 eq "a");
    icheck($left,"hy",$r2) if ($m2 eq "h");
    icheck($left,"ly",$r2) if ($m2 eq "l");
    icheck($left,"ry",$r2) if ($m2 eq "r");
    icheck($left,"sy",$r2) if ($m2 eq "s");
} elsif ($middle eq "z") {
    icheck($left,"iz",$r2) if ($m2 eq "i");
    icheck($left,"yz",$r2) if ($m2 eq "y");
}
}
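The transposition branch above swaps the letter at position $i with the letter after it, but only when the resulting two-letter sequence is one of the enumerated pairs. A minimal sketch of that restriction using a lookup hash instead of an elsif chain follows. The names %allowed_pair and transposition_candidates are hypothetical, the table holds only a small subset of the pairs in the listing, and the dictionary lookup that icheck presumably performs is omitted.

```perl
use strict;
use warnings;

# Hypothetical table of permitted swapped bigrams. Every entry appears
# in the listing, but this is only a small illustrative subset.
my %allowed_pair = map { $_ => 1 } qw(ca ea ha he ie ei th ch er re);

# Swap each adjacent pair of letters and keep the result only when the
# swapped bigram is in the table.
sub transposition_candidates {
    my ($word) = @_;
    my @candidates;
    for my $i (0 .. length($word) - 2) {
        my $left    = substr($word, 0, $i);
        my $middle  = substr($word, $i, 1);
        my $m2      = substr($word, $i + 1, 1);
        my $r2      = substr($word, $i + 2);
        my $swapped = $m2 . $middle;
        push @candidates, $left . $swapped . $r2 if $allowed_pair{$swapped};
    }
    return @candidates;
}

print join(" ", transposition_candidates("teh")), "\n";   # prints "the"
```

Restricting the swaps this way keeps the number of dictionary probes per position small compared with trying every adjacent transposition.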
# Long Distance Transpositions
if ($i <= $length - 2) {
    $m2 = substr($right,0,1);
    $m3 = substr($right,1,1);
    $r2 = substr($right,2);
    $mid = $middle.$m2.$m3;
# Not including
if ($mid =~ /^[aeiouy][dglnrstv][aeiouy]$/ && $middle ne $m3) {
    icheck($left,$m3.$m2.$middle,$r2);
} elsif ($mid =~ /^[lmn][aeiou][lmn]$/ && $middle ne $m3) {
    icheck($left,$m3.$m2.$middle,$r2);
} elsif ($mid eq "vel" || $mid eq "lev" || $mid eq "ton" || $mid eq "not") {
    icheck($left,$m3.$m2.$middle,$r2);
} elsif ($mid =~ /^[cdfgrst][aieu][cdfgrst]$/ && $middle ne $m3) {
    icheck($left,$m3.$m2.$middle,$r2);
}
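The long-distance transposition step exchanges the first and third letters of a three-letter window when the window matches one of a few patterns. The following is a minimal sketch of that step on its own. The function name long_distance_candidates is hypothetical, the character classes are read from the OCR-damaged listing and may not match the original exactly, and dictionary filtering is omitted.

```perl
use strict;
use warnings;

# Hypothetical helper: for each three-letter window, swap the outer two
# letters when the window matches one of the patterns from the listing.
sub long_distance_candidates {
    my ($word) = @_;
    my @candidates;
    for my $i (0 .. length($word) - 3) {
        my $left = substr($word, 0, $i);
        my ($middle, $m2, $m3) = split //, substr($word, $i, 3);
        my $r2  = substr($word, $i + 3);
        my $mid = $middle . $m2 . $m3;
        next if $middle eq $m3;    # swapping equal letters changes nothing
        if (   $mid =~ /^[aeiouy][dglnrstv][aeiouy]$/
            || $mid =~ /^[lmn][aeiou][lmn]$/
            || $mid =~ /^(?:vel|lev|ton|not)$/
            || $mid =~ /^[cdfgrst][aieu][cdfgrst]$/)
        {
            push @candidates, $left . $m3 . $m2 . $middle . $r2;
        }
    }
    return @candidates;
}

print join(" ", long_distance_candidates("revelant")), "\n";
```

For "revelant" the window "vel" is reversed to "lev", which yields "relevant" among the candidates; in the listing a dictionary check via icheck would discard the non-words.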
if ($version >= 3) {
# Multiple Character Substitutions
# y/ie, f/ph, al/le,
# not doing pneu/ne, ant/ent, ance/ence, aly/ally, eu/ea, oe/ow,
# pre/pro, ious/uous, pre/per, ceed/sede, ament/ement, eous/ious,
# sh/sc, ghth/ght, all/al, c/sc/s/ss, m/gm, ss/s, eorg/orge, ene/ean
# uf/ough, mce/cem, eat/ate, tui/uit, al/ile, ash/has, fea/afe,
# rau/ura
if ($middle eq "e") {
    icheck($left,"ia",$right);
    icheck($left,"ai",$right);
} elsif ($middle eq "f") {
    icheck($left,"ph",$right);
    icheck($left,"ve",$right);
} elsif ($middle eq "i") {
    icheck($left,"ea",$right);
} elsif ($middle eq "y") {
    icheck($left,"ie",$right);
}
if ($i == $length - 1) {
    if ($middle eq "t") {
        icheck($left,"ed",$right);
        icheck($left,"led",$right);
    }
}
if ($i <= $length - 2) {
$middle2 = substr (Sword, $i,2) ; $right2 = substr (Sword, $i+2) ; if ($middle2 eq "al") { ichec (Sleft, "le" , $right2) ; } elsif ($middle2 eq "as") { # from cassi —> ccasi icheck($left, "ca" , $right2) ; } elsif (Smiddle2 eq "a *) { icheckfSleft, "o",$right2) ; } elsif (Smiddle2 eq "ce") { icheckfSleft, "es" , $right2) ; } elsif (Smiddle2 eq "co") { icheckfSleft, "om" , Sright2) ; } elsif (Smiddle2 eq "de" ii Si == 0) ( icheckfSleft, "un" , $right2) ; } elsif ($middle2 eq "ea") { icheckfSleft, "i",$right2) ; icheckfSleft, "ie" , $right2) ; spell.pl /21 p:/SpellG rams/revminedist/
} elsif ($middle2 eq "el") {
    # pell --> ppel
    icheck($left,"pe",$right2);
    icheck($left,"al",$right2);
    # icheck($left,"le",$right2);
} elsif ($middle2 eq "en") {
    icheck($left,"ine",$right2);
} elsif ($middle2 eq "ey") {
    icheck($left,"ie",$right2);
} elsif ($middle2 eq "fe") {
# from ffering —> farring and ttereα —> r= - ichecktSleft, "εr" , $right2) ;
# from ffes —> fess icheck($lef , " s " , Sright2 ) ; elsif ($midάle2 eq "gi*) {
    # ggin --> ginn
    icheck($left,"in",$right2);
} elsif ($middle2 eq "ia") {
    icheck($left,"e",$right2);
} elsif ($middle2 eq "ie") {
    icheck($left,"y",$right2);
    icheck($left,"ey",$right2);
} elsif ($middle2 eq "in") {
# from cinn —> ccin icheck(Sleft , "ci" , Sright2 ) ; elsif ($miαdle2 eq if) { ichecktSleft, "ate ' ,$right2); ichecktSleft, "ute \$right2); ichec fSleft, "mi" .$right2); ichec ( Sleft, " te" , $right2 ) ; elsif ($middle2 eq "le*) { icheckfSleft, "al" , $right2) ;
# icheck (Sleft, "el * , $right2 ) elsif ($middle2 eq "lo") { icheckfSleft, "os" ,$right2) ; elsif fSmiddle2 eq "mn") { ichecktSleft, "um" , $right2) ; elsif ($middle2 eq "oo*) { icheckfSleft, "u" , $right2) ; elsif ($middle2 eq "ph*) C icheckfSleft, "f $right2); elsif ($middle2 eq *qu*) { icheckfSleft, "ck' ,$right2); elsif ($middle2 eq "ra") C ichecktSleft, *al" ,$right2); ichecktSleft, "as" , $right2) elsif ($middle2 eq "ri*l { icheckfSleft, "ib ' , $right2 ) ; icheckfSleft, "if ' ,$right2) ; elsif ($middle2 eq "ro'l C icheckfSleft, "er , $right2 ) ; el ") {
Figure imgf000034_0001
ichec tSleft, "ic" ,Sright2) , elsif (Smiddle2 eq "te") ( icheck($left, "ght" , Srignt2) ; elsif (Smiάdle2 eq "ye") { ichec fSleft, "i" , Sright2 ) ; spell. pi 19/35 pr/SpellGrams/revminedist 98/04/21
}
}
if ($i <= $length - 3) {
$middle3 = substr($word,$i,3);
$right3 = substr($word,$i+3);
if ($middle3 eq "age") {
    icheck($left,"edge",$right3);
} elsif ($middle3 eq "acy") {
    icheck($left,"isy",$right3);
} elsif ($middle3 eq "ase") {
    icheck($left,"sea",$right3);
} elsif ($middle3 eq "ded") {
    icheck($left,"t",$right3);
} elsif ($middle3 eq "ear") {
    icheck($left,"ere",$right3);
} elsif ($middle3 eq "evi") {
    icheck($left,"iev",$right3);
} elsif ($middle3 eq "exi") {
    # from exion --> ection
    icheck($left,"ecti",$right3);
} elsif ($middle3 eq "gin") {
    icheck($left,"ing",$right3);
} elsif ($middle3 eq "ine") {
    icheck($left,"ein",$right3);
} elsif ($middle3 eq "isy") {
    icheck($left,"acy",$right3);
} elsif ($middle3 eq "nts") {
    icheck($left,"nce",$right3);
} elsif ($middle3 eq "ons") {
    icheck($left,"a",$right3);
} elsif ($middle3 eq "que") {
    icheck($left,"ck",$right3);
} elsif ($middle3 eq "sci") {
    icheck($left,"cil",$right3);
} elsif ($middle3 eq "tio") {
    icheck($left,"cea",$right3);
} elsif ($middle3 eq "unp" && $i == 0) {
    # un --> im before a p
    icheck($left,"imp",$right3);
} elsif ($middle3 eq "ums") {
    icheck($left,"a",$right3);
} elsif ($middle3 eq "ure") {
    icheck($left,"eur",$right3);
    icheck($left,"er",$right3);
}
sub spellcor_pos2 {
local($word,$i,$length) = @_;
local($left,$middle,$right,$m2,$r2);
$left = substr($word,0,$i);
$middle = substr($word,$i,1);
$right = substr($word,$i+1);
substitutions if (Smiddle eq " ; " ) { icheck2 ($left, "\", Sright) if (Si < Slength - 1); icheck2 (Sleft, "1", Sright) ; } elsif (Smiddle eq *\*") { icheck2 (Sleft, "\", Sright) if (Si < Slength - 1) ; } elsif (Smiddle eq "\_") { icheck2 (Slef , "\-", Sright) if (Si < Slength - 1) ; } elsif (Smiddle eq "\S") { icheck2 (Sleft, "s", Sright) if (Si < Slength - 1 ii $i > 0); } elsif (Smiddle eq "\=") { icheck2 (Sleft, "\", Sright) if ($i < Slength - 1 ii $i > 0) } elsif (Smiddle eq *V) { icheck2 (Sleft, "\'", Sright) if ($i < Slength - 1 ii $i > 0), } elsif (Smiddle =- /Λ\d$/) { if (Smiddle eq "0") { icheck2 (Sleft, "o", Sright) ;
} elsif ($middle eq "1") {
    icheck2($left,"l",$right);
    icheck2($left,"i",$right);
    icheck2($left,"e",$right);
} elsif ($middle eq "3") {
    icheck2($left,"e",$right);
} elsif ($middle eq "9") {
    icheck2($left,"o",$right);
} } elsif (Smiddle eq "a") { icheck2 (Sleft, "e", Sright) ; icheck2 (Sleft, "i", Sright) ; icheck2 (Sleft, "o", Sright) ; icheck2 (Sleft, "s", Sright) ; icheck2(Sleft, "u", Sright) ; icheck2 (Sleft, "z", Sright) ; } elsif (Smiddle eq "b") { icheck2 (Sleft, "d", Sright) ; icheck2 (Sleft, "g", Sright) ; icheck2 (Sleft, "h" , Sright) ; icheck2 (Slef , "1* , Sright) ; icheck2( Sleft, "n", Sright); icheck2 (Sleft, " ", Sright); icheck2 (Sleft, *t" , Sright) ; icheck (Sleft, "v" , Sright) ; } elsif (Smiddle eq "c") { icheck2 (Sleft, "d" , Sright) icheck2 (Sleft, "e", Sright) icheck2 (Sleft, "g", Sright) icheck2 (Sleft, "k", Sright) icheck2 (Sleft, "n", Sright) icheck2 (Sleft, "s" .Sright) icheck2(Sleft, "t", Sright) icheck2 (Sleft, "v", Sright) icheck2(Sleft, "x" , Sright) } elsif (Smiddle eq "d") { icheck2 (Sleft, "b1 , Sright) ; icheck2 (Sleft, "c", Sright) ; icheck2 (Slef , "e" , Sright) ; icheck (Sleft, "f" , Sright) ; icheck2 (Sleft, "g" , Sright) ,- spell.pl 21/35 p:/Spe!IGrams/revminedis 98/04/21
    icheck2($left,"n",$right);
    icheck2($left,"r",$right);
    icheck2($left,"s",$right);
    icheck2($left,"t",$right);
} elsif ($middle eq "e") {
    icheck2($left,"a",$right);
    icheck2($left,"c",$right);
    icheck2($left,"d",$right);
    icheck2($left,"g",$right);
    icheck2($left,"i",$right);
    icheck2($left,"l",$right);
    icheck2($left,"o",$right);
    icheck2($left,"r",$right);
    icheck2($left,"s",$right);
    icheck2($left,"t",$right);
    icheck2($left,"u",$right);
    icheck2($left,"w",$right);
    icheck2($left,"y",$right);
} elsif ($middle eq "f") {
    icheck2($left,"d",$right);
    icheck2($left,"g",$right);
    icheck2($left,"o",$right);
    icheck2($left,"p",$right);
    icheck2($left,"r",$right);
    icheck2($left,"t",$right);
    icheck2($left,"v",$right);
} elsif ($middle eq "g") {
    icheck2($left,"b",$right);
    icheck2($left,"c",$right);
    icheck2($left,"d",$right);
    icheck2($left,"e",$right);
    icheck2($left,"f",$right);
    icheck2($left,"h",$right);
    icheck2($left,"j",$right);
    icheck2($left,"n",$right);
    icheck2($left,"q",$right);
    icheck2($left,"t",$right);
    icheck2($left,"v",$right);
} elsif ($middle eq "h") {
    icheck2($left,"c",$right);
    icheck2($left,"g",$right);
    icheck2($left,"j",$right);
    icheck2($left,"k",$right);
    icheck2($left,"l",$right);
    icheck2($left,"n",$right);
    icheck2($left,"s",$right);
} elsif ($middle eq "i") {
    icheck2($left,"a",$right);
    icheck2($left,"e",$right);
    icheck2($left,"l",$right);
    icheck2($left,"n",$right);
    icheck2($left,"o",$right);
    icheck2($left,"s",$right);
    icheck2($left,"u",$right);
    icheck2($left,"y",$right);
} elsif ($middle eq "j") {
    icheck2($left,"g",$right);
    icheck2($left,"h",$right);
icheck2 (Sleft, "n", Sright) elsif (Smiddle eq "k" ) { icheck2( Sleft, "c", Sright) icheck2( Sleft "g", Sright) icheck2 (Sleft "i" .Sright) icheck2 (Sleft "1", Sright) icheck2 (Sleft "n" , Sright) icheck2 (Sleft, ""oo" , Sright) icheck2 ( Sleft , ""tt" , Sright) elsif (Smiddle eq 1") { icheck2 (Sleft , "d .Sright) ,- icheck2 (Slef , "i .Sright) ; icheck2 (Sleft "k1 , Sright) ; icheck2 (Sleft "n' .Sright) ; icheck2 (Sleft "o' .Sright) ; icheck2 (Sleft, "D1 .Sright) ; icheck2(Sleft, .Sright) ; icheck2 (Sleft t' .Sright) ; elsif (Smiddle eq " ") {
# m is the last character
(Si == Slength - 1 ii SdictCSleft} 0) { Scorrection = $left."\,"; pushfScorrections, Sleft. " \ , ") ; $numcor*÷ ; icheck2( Sleft b" ,$right) ; icheck2 (Sleft 1", Sright); icheck2(Sleft n" , Sright) ; icheck2 (Slef , " o , Sright) ; icheck2( Sleft, "t .Sright) ; elsif (Smiddle eq n") { icheck2( Sleft, "b .Sright) icheck2 (Sleft, , S ight) icheck2 (Sleft, .Sright) icheck2( Sleft, .Sright) icheck2 (Sleft, , Sright) icheck2 (Sleft, ,$right) icheck2 (Sleft, "m .Sright) icheck2 (Sleft, "r ,$right) icheck2( Sleft, "t", Sright) icheck (Sleft, "u", Sright) elsif (Smiddle eq "o") { icheck2 (Sleft, "a", Sright) icheck2 (Sleft, "e", Sright) icheck2( Sleft, "i", Sright) icheck2(Sleft, "1' , Sright) ; icheck2 (Sleft, "n . Sright) , icheck (Sleft, "p . Sright) , icheck2 (Sleft, "r . Sright) icheck2 (Sleft, "u . Sright) elsif (Smiddle eq P" ) ( icheck2 (Sleft, "b , $riςht) icheck2 ( Sleft , " a , Sright) icheck2 ( Sle , " o , Sright) icheck (Sleft , " r , Sright) Lsif ( Smiddle eq q" ) ( icheck2 ( Sleft , "a , Sright) icheck2 ( Sleft , " c 1 , Sright ) speli.pl 23/35 p:/SpelIGrams/revminedist 98/04/21
    icheck2($left,"g",$right);
    icheck2($left,"w",$right);
} elsif ($middle eq "r") {
    icheck2($left,"b",$right);
    icheck2($left,"c",$right);
    icheck2($left,"d",$right);
    icheck2($left,"e",$right);
    icheck2($left,"g",$right);
    icheck2($left,"l",$right);
    icheck2($left,"n",$right);
    icheck2($left,"o",$right);
    icheck2($left,"p",$right);
    icheck2($left,"t",$right);
} elsif ($middle eq "s") {
    icheck2($left,"a",$right);
    icheck2($left,"c",$right);
    icheck2($left,"d",$right);
    icheck2($left,"e",$right);
    icheck2($left,"l",$right);
    icheck2($left,"m",$right);
    icheck2($left,"n",$right);
    icheck2($left,"t",$right);
    icheck2($left,"w",$right);
    icheck2($left,"x",$right);
    icheck2($left,"z",$right);
} elsif ($middle eq "t") {
    icheck2($left,"\'",$right);
    icheck2($left,"b",$right);
    icheck2($left,"c",$right);
    icheck2($left,"d",$right);
    icheck2($left,"e",$right);
    icheck2($left,"f",$right);
    icheck2($left,"g",$right);
    icheck2($left,"n",$right);
    icheck2($left,"r",$right);
    icheck2($left,"s",$right);
    icheck2($left,"y",$right);
} elsif ($middle eq "u") {
    icheck2($left,"a",$right);
    icheck2($left,"e",$right);
    icheck2($left,"i",$right);
    icheck2($left,"n",$right);
    icheck2($left,"o",$right);
    icheck2($left,"r",$right);
    icheck2($left,"w",$right);
    icheck2($left,"y",$right);
} elsif ($middle eq "v") {
    icheck2($left,"b",$right);
    icheck2($left,"c",$right);
    icheck2($left,"f",$right);
    icheck2($left,"g",$right);
    icheck2($left,"n",$right);
    icheck2($left,"w",$right);
} elsif ($middle eq "w") {
    icheck2($left,"a",$right);
    icheck2($left,"e",$right);
    icheck2($left,"q",$right);
    icheck2($left,"r",$right);
    icheck2($left,"s",$right);
    icheck2($left,"t",$right);
    icheck2($left,"u",$right);
} elsif ($middle eq "x") {
    icheck2($left,"c",$right);
    icheck2($left,"d",$right);
    icheck2($left,"s",$right);
} elsif ($middle eq "y") {
    icheck2($left,"a",$right);
    icheck2($left,"e",$right);
    icheck2($left,"h",$right);
    icheck2($left,"i",$right);
    icheck2($left,"o",$right);
    icheck2($left,"t",$right);
    icheck2($left,"u",$right);
} elsif ($middle eq "z") {
    icheck2($left,"a",$right);
    icheck2($left,"c",$right);
    icheck2($left,"s",$right);
    icheck2($left,"x",$right);
}
# deletions
icheck2($left,"",$right);
# insertions
icheck2($left,"a".$middle,$right);
icheck2($left,"b".$middle,$right);
icheck2($left,"c".$middle,$right);
icheck2($left,"d".$middle,$right);
icheck2($left,"e".$middle,$right);
icheck2($left,"f".$middle,$right);
icheck2($left,"g".$middle,$right);
icheck2($left,"h".$middle,$right);
icheck2($left,"i".$middle,$right);
icheck2($left,"j".$middle,$right);
icheck2($left,"k".$middle,$right);
icheck2($left,"l".$middle,$right);
icheck2($left,"m".$middle,$right);
icheck2($left,"n".$middle,$right);
icheck2($left,"o".$middle,$right);
icheck2($left,"p".$middle,$right);
icheck2($left,"q".$middle,$right);
icheck2($left,"r".$middle,$right);
icheck2($left,"s".$middle,$right);
icheck2($left,"t".$middle,$right);
icheck2($left,"u".$middle,$right);
icheck2($left,"v".$middle,$right);
icheck2($left,"w".$middle,$right);
icheck2($left,"x".$middle,$right);
icheck2($left,"y".$middle,$right);
icheck2($left,"z".$middle,$right);
icheck2($left,"\'".$middle,$right);
# special case for last letter insertion
if ($version > 1) {
if ($i == $length - 1) {
    icheck2($left,$middle."a",$right);
    icheck2($left,$middle."b",$right);
    icheck2($left,$middle."c",$right);
    icheck2($left,$middle."d",$right);
    icheck2($left,$middle."e",$right);
    icheck2($left,$middle."f",$right);
    icheck2($left,$middle."g",$right);
    icheck2($left,$middle."h",$right);
    icheck2($left,$middle."i",$right);
    icheck2($left,$middle."j",$right);
    icheck2($left,$middle."k",$right);
    icheck2($left,$middle."l",$right);
    icheck2($left,$middle."m",$right);
    icheck2($left,$middle."n",$right);
    icheck2($left,$middle."o",$right);
    icheck2($left,$middle."p",$right);
    icheck2($left,$middle."q",$right);
    icheck2($left,$middle."r",$right);
    icheck2($left,$middle."s",$right);
    icheck2($left,$middle."t",$right);
    icheck2($left,$middle."u",$right);
    icheck2($left,$middle."v",$right);
    icheck2($left,$middle."w",$right);
    icheck2($left,$middle."x",$right);
    icheck2($left,$middle."y",$right);
    icheck2($left,$middle."z",$right);
}
# transpositions
if ($i != $length - 1) {
$m2 = substr($right,0,1);
$r2 = substr($right,1);
if ($middle eq "\'") {
    icheck2($left,"n\'",$r2) if ($m2 eq "n");
} elsif ($middle eq "a") {
    icheck2($left,"ca",$r2) if ($m2 eq "c");
    icheck2($left,"ea",$r2) if ($m2 eq "e");
    icheck2($left,"ga",$r2) if ($m2 eq "g");
    icheck2($left,"ha",$r2) if ($m2 eq "h");
    icheck2($left,"ia",$r2) if ($m2 eq "i");
    icheck2($left,"ka",$r2) if ($m2 eq "k");
    icheck2($left,"la",$r2) if ($m2 eq "l");
    icheck2($left,"ma",$r2) if ($m2 eq "m");
    icheck2($left,"na",$r2) if ($m2 eq "n");
    icheck2($left,"oa",$r2) if ($m2 eq "o");
    icheck2($left,"pa",$r2) if ($m2 eq "p");
    icheck2($left,"ra",$r2) if ($m2 eq "r");
    icheck2($left,"sa",$r2) if ($m2 eq "s");
    icheck2($left,"ta",$r2) if ($m2 eq "t");
    icheck2($left,"ua",$r2) if ($m2 eq "u");
} elsif ($middle eq "b") {
    icheck2($left,"ab",$r2) if ($m2 eq "a");
    icheck2($left,"ib",$r2) if ($m2 eq "i");
    icheck2($left,"mb",$r2) if ($m2 eq "m");
} elsif ($middle eq "c") {
    icheck2($left,"ac",$r2) if ($m2 eq "a");
    icheck2($left,"ec",$r2) if ($m2 eq "e");
    icheck2($left,"ic",$r2) if ($m2 eq "i");
    icheck2($left,"nc",$r2) if ($m2 eq "n");
    icheck2($left,"oc",$r2) if ($m2 eq "o");
    icheck2($left,"rc",$r2) if ($m2 eq "r");
    icheck2($left,"sc",$r2) if ($m2 eq "s");
    icheck2($left,"tc",$r2) if ($m2 eq "t");
    icheck2($left,"uc",$r2) if ($m2 eq "u");
    icheck2($left,"yc",$r2) if ($m2 eq "y");
} elsif ($middle eq "d") {
    icheck2($left,"ad",$r2) if ($m2 eq "a");
    icheck2($left,"ed",$r2) if ($m2 eq "e");
    icheck2($left,"id",$r2) if ($m2 eq "i");
    icheck2($left,"ld",$r2) if ($m2 eq "l");
    icheck2($left,"nd",$r2) if ($m2 eq "n");
    icheck2($left,"od",$r2) if ($m2 eq "o");
} elsif ($middle eq "e") {
    icheck2($left,"ae",$r2) if ($m2 eq "a");
    icheck2($left,"be",$r2) if ($m2 eq "b");
    icheck2($left,"ce",$r2) if ($m2 eq "c");
    icheck2($left,"de",$r2) if ($m2 eq "d");
    icheck2($left,"fe",$r2) if ($m2 eq "f");
    icheck2($left,"ge",$r2) if ($m2 eq "g");
    icheck2($left,"he",$r2) if ($m2 eq "h");
    icheck2($left,"ie",$r2) if ($m2 eq "i");
    icheck2($left,"ke",$r2) if ($m2 eq "k");
    icheck2($left,"le",$r2) if ($m2 eq "l");
    icheck2($left,"me",$r2) if ($m2 eq "m");
    icheck2($left,"ne",$r2) if ($m2 eq "n");
    icheck2($left,"oe",$r2) if ($m2 eq "o");
    icheck2($left,"pe",$r2) if ($m2 eq "p");
    icheck2($left,"re",$r2) if ($m2 eq "r");
    icheck2($left,"se",$r2) if ($m2 eq "s");
    icheck2($left,"te",$r2) if ($m2 eq "t");
    icheck2($left,"ue",$r2) if ($m2 eq "u");
    icheck2($left,"ve",$r2) if ($m2 eq "v");
    icheck2($left,"ye",$r2) if ($m2 eq "y");
} elsif ($middle eq "f") {
    icheck2($left,"ef",$r2) if ($m2 eq "e");
    icheck2($left,"lf",$r2) if ($m2 eq "l");
    icheck2($left,"nf",$r2) if ($m2 eq "n");
    icheck2($left,"of",$r2) if ($m2 eq "o");
    icheck2($left,"rf",$r2) if ($m2 eq "r");
} elsif ($middle eq "g") {
    icheck2($left,"ag",$r2) if ($m2 eq "a");
    icheck2($left,"ig",$r2) if ($m2 eq "i");
    icheck2($left,"ng",$r2) if ($m2 eq "n");
    icheck2($left,"og",$r2) if ($m2 eq "o");
    icheck2($left,"rg",$r2) if ($m2 eq "r");
    icheck2($left,"ug",$r2) if ($m2 eq "u");
} elsif ($middle eq "h") {
    icheck2($left,"ch",$r2) if ($m2 eq "c");
    icheck2($left,"gh",$r2) if ($m2 eq "g");
    icheck2($left,"ph",$r2) if ($m2 eq "p");
    icheck2($left,"rh",$r2) if ($m2 eq "r");
    icheck2($left,"sh",$r2) if ($m2 eq "s");
    icheck2($left,"th",$r2) if ($m2 eq "t");
    icheck2($left,"wh",$r2) if ($m2 eq "w");
} elsif ($middle eq "i") {
    icheck2($left,"ai",$r2) if ($m2 eq "a");
    icheck2($left,"ci",$r2) if ($m2 eq "c");
    icheck2($left,"di",$r2) if ($m2 eq "d");
    icheck2($left,"ei",$r2) if ($m2 eq "e");
    icheck2($left,"gi",$r2) if ($m2 eq "g");
    icheck2($left,"hi",$r2) if ($m2 eq "h");
    icheck2($left,"li",$r2) if ($m2 eq "l");
    icheck2($left,"ni",$r2) if ($m2 eq "n");
    icheck2($left,"oi",$r2) if ($m2 eq "o");
    icheck2($left,"ri",$r2) if ($m2 eq "r");
    icheck2($left,"si",$r2) if ($m2 eq "s");
    icheck2($left,"ti",$r2) if ($m2 eq "t");
    icheck2($left,"ui",$r2) if ($m2 eq "u");
    icheck2($left,"vi",$r2) if ($m2 eq "v");
    icheck2($left,"wi",$r2) if ($m2 eq "w");
} elsif ($middle eq "j") {
    # icheck2($left,"g",$r2) if ($m2 eq "n");
} elsif ($middle eq "k") {
    icheck2($left,"ak",$r2) if ($m2 eq "a");
    icheck2($left,"ck",$r2) if ($m2 eq "c");
    icheck2($left,"nk",$r2) if ($m2 eq "n");
    icheck2($left,"rk",$r2) if ($m2 eq "r");
    icheck2($left,"sk",$r2) if ($m2 eq "s");
} elsif ($middle eq "l") {
    icheck2($left,"al",$r2) if ($m2 eq "a");
    icheck2($left,"bl",$r2) if ($m2 eq "b");
    icheck2($left,"cl",$r2) if ($m2 eq "c");
    icheck2($left,"el",$r2) if ($m2 eq "e");
    icheck2($left,"il",$r2) if ($m2 eq "i");
    icheck2($left,"ol",$r2) if ($m2 eq "o");
    icheck2($left,"pl",$r2) if ($m2 eq "p");
    icheck2($left,"rl",$r2) if ($m2 eq "r");
    icheck2($left,"sl",$r2) if ($m2 eq "s");
    icheck2($left,"tl",$r2) if ($m2 eq "t");
    icheck2($left,"ul",$r2) if ($m2 eq "u");
    icheck2($left,"yl",$r2) if ($m2 eq "y");
} elsif ($middle eq "m") {
    icheck2($left,"am",$r2) if ($m2 eq "a");
    icheck2($left,"em",$r2) if ($m2 eq "e");
    icheck2($left,"nm",$r2) if ($m2 eq "n");
    icheck2($left,"om",$r2) if ($m2 eq "o");
    icheck2($left,"rm",$r2) if ($m2 eq "r");
    icheck2($left,"sm",$r2) if ($m2 eq "s");
} elsif ($middle eq "n") {
    icheck2($left,"an",$r2) if ($m2 eq "a");
    icheck2($left,"en",$r2) if ($m2 eq "e");
    icheck2($left,"gn",$r2) if ($m2 eq "g");
    icheck2($left,"in",$r2) if ($m2 eq "i");
    icheck2($left,"kn",$r2) if ($m2 eq "k");
    icheck2($left,"on",$r2) if ($m2 eq "o");
    icheck2($left,"rn",$r2) if ($m2 eq "r");
    icheck2($left,"sn",$r2) if ($m2 eq "s");
    icheck2($left,"un",$r2) if ($m2 eq "u");
    icheck2($left,"wn",$r2) if ($m2 eq "w");
    icheck2($left,"yn",$r2) if ($m2 eq "y");
} elsif ($middle eq "o") {
    &check2($left,"ao",$r2) if ($m2 eq "a");
    &check2($left,"co",$r2) if ($m2 eq "c");
    &check2($left,"eo",$r2) if ($m2 eq "e");
    &check2($left,"fo",$r2) if ($m2 eq "f");
    &check2($left,"go",$r2) if ($m2 eq "g");
    &check2($left,"ho",$r2) if ($m2 eq "h");
    &check2($left,"io",$r2) if ($m2 eq "i");
    &check2($left,"lo",$r2) if ($m2 eq "l");
    &check2($left,"mo",$r2) if ($m2 eq "m");
    &check2($left,"no",$r2) if ($m2 eq "n");
    &check2($left,"ro",$r2) if ($m2 eq "r");
    &check2($left,"so",$r2) if ($m2 eq "s");
    &check2($left,"to",$r2) if ($m2 eq "t");
    &check2($left,"uo",$r2) if ($m2 eq "u");
    &check2($left,"wo",$r2) if ($m2 eq "w");
} elsif ($middle eq "p") {
    &check2($left,"ap",$r2) if ($m2 eq "a");
    &check2($left,"ep",$r2) if ($m2 eq "e");
    &check2($left,"ip",$r2) if ($m2 eq "i");
    &check2($left,"lp",$r2) if ($m2 eq "l");
    &check2($left,"mp",$r2) if ($m2 eq "m");
    &check2($left,"op",$r2) if ($m2 eq "o");
    &check2($left,"rp",$r2) if ($m2 eq "r");
    &check2($left,"sp",$r2) if ($m2 eq "s");
    &check2($left,"up",$r2) if ($m2 eq "u");
} elsif ($middle eq "q") {
    &check2($left,"cq",$r2) if ($m2 eq "c");
} elsif ($middle eq "r") {
    &check2($left,"ar",$r2) if ($m2 eq "a");
    &check2($left,"cr",$r2) if ($m2 eq "c");
    &check2($left,"er",$r2) if ($m2 eq "e");
    &check2($left,"gr",$r2) if ($m2 eq "g");
    &check2($left,"hr",$r2) if ($m2 eq "h");
    &check2($left,"ir",$r2) if ($m2 eq "i");
    &check2($left,"or",$r2) if ($m2 eq "o");
    &check2($left,"pr",$r2) if ($m2 eq "p");
    &check2($left,"tr",$r2) if ($m2 eq "t");
    &check2($left,"ur",$r2) if ($m2 eq "u");
    &check2($left,"yr",$r2) if ($m2 eq "y");
} elsif ($middle eq "s") {
    &check2($left,"as",$r2) if ($m2 eq "a");
    &check2($left,"bs",$r2) if ($m2 eq "b");
    &check2($left,"es",$r2) if ($m2 eq "e");
    &check2($left,"is",$r2) if ($m2 eq "i");
    &check2($left,"ks",$r2) if ($m2 eq "k");
    &check2($left,"ls",$r2) if ($m2 eq "l");
    &check2($left,"ms",$r2) if ($m2 eq "m");
    &check2($left,"ns",$r2) if ($m2 eq "n");
    &check2($left,"os",$r2) if ($m2 eq "o");
    &check2($left,"ps",$r2) if ($m2 eq "p");
    &check2($left,"rs",$r2) if ($m2 eq "r");
    &check2($left,"ts",$r2) if ($m2 eq "t");
    &check2($left,"us",$r2) if ($m2 eq "u");
    &check2($left,"ys",$r2) if ($m2 eq "y");
} elsif ($middle eq "t") {
    &check2($left,"at",$r2) if ($m2 eq "a");
    &check2($left,"ct",$r2) if ($m2 eq "c");
    &check2($left,"et",$r2) if ($m2 eq "e");
    &check2($left,"ht",$r2) if ($m2 eq "h");
    &check2($left,"it",$r2) if ($m2 eq "i");
    &check2($left,"lt",$r2) if ($m2 eq "l");
    &check2($left,"nt",$r2) if ($m2 eq "n");
    &check2($left,"ot",$r2) if ($m2 eq "o");
    &check2($left,"pt",$r2) if ($m2 eq "p");
    &check2($left,"rt",$r2) if ($m2 eq "r");
    &check2($left,"st",$r2) if ($m2 eq "s");
    &check2($left,"ut",$r2) if ($m2 eq "u");
} elsif ($middle eq "u") {
    &check2($left,"au",$r2) if ($m2 eq "a");
    &check2($left,"bu",$r2) if ($m2 eq "b");
    &check2($left,"cu",$r2) if ($m2 eq "c");
    &check2($left,"eu",$r2) if ($m2 eq "e");
    &check2($left,"gu",$r2) if ($m2 eq "g");
    &check2($left,"lu",$r2) if ($m2 eq "l");
    &check2($left,"nu",$r2) if ($m2 eq "n");
    &check2($left,"ou",$r2) if ($m2 eq "o");
    &check2($left,"pu",$r2) if ($m2 eq "p");
    &check2($left,"ru",$r2) if ($m2 eq "r");
    &check2($left,"su",$r2) if ($m2 eq "s");
    &check2($left,"tu",$r2) if ($m2 eq "t");
} elsif ($middle eq "v") {
    &check2($left,"av",$r2) if ($m2 eq "a");
    &check2($left,"ev",$r2) if ($m2 eq "e");
    &check2($left,"iv",$r2) if ($m2 eq "i");
    &check2($left,"lv",$r2) if ($m2 eq "l");
} elsif ($middle eq "w") {
    &check2($left,"ew",$r2) if ($m2 eq "e");
    &check2($left,"ow",$r2) if ($m2 eq "o");
    &check2($left,"sw",$r2) if ($m2 eq "s");
} elsif ($middle eq "x") {
    # &check2($left,"s",$r2) if ($m2 eq "n");
} elsif ($middle eq "y") {
    &check2($left,"ay",$r2) if ($m2 eq "a");
    &check2($left,"hy",$r2) if ($m2 eq "h");
    &check2($left,"ly",$r2) if ($m2 eq "l");
    &check2($left,"ry",$r2) if ($m2 eq "r");
    &check2($left,"sy",$r2) if ($m2 eq "s");
} elsif ($middle eq "z") {
    &check2($left,"iz",$r2) if ($m2 eq "i");
    &check2($left,"yz",$r2) if ($m2 eq "y");
}
# Long Distance Transpositions
if ($i <= $length - 2) {
    $m2 = substr($right, 0, 1);
    $m3 = substr($right, 1, 1);
    $r2 = substr($right, 2);
    $mid = $middle.$m2.$m3;
    # Not including
    if ($mid =~ /^[aeiouy][dglnrstv][aeiouy]$/ && $middle ne $m3) {
        &check2($left, $m3.$m2.$middle, $r2);
    } elsif ($mid =~ /^[lmn][aeiou][lmn]$/ && $middle ne $m3) {
        &check2($left, $m3.$m2.$middle, $r2);
    } elsif ($mid eq "vel" || $mid eq "lev" || $mid eq "ton" || $mid eq "net") {
        &check2($left, $m3.$m2.$middle, $r2);
    } elsif ($mid =~ /^[cdf rst][aieu][cdfgrs
Figure imgf000045_0001
        &check2($left, $m3.$m2.$middle, $r2);
if ($version >= 4) {
    # Multiple Character Substitutions ...
    # y/ia, f/ph, al/le,
    # not doing pneu/ne, ant/ent, ance/ence, aiy/ally, eu/ea, oe/ow,
    # pre/pro, ious/uous, pre/per, caed/sede, ament/ement, eous/ious,
    # sh/sc, ghth/ght, all/al, c/sc/s/ss, m/gm, ss/s, eorg/orge, ene/e=
    # uf/ough, mce/cem, eat/ate, tui/uit, ai/ile, ash/has, fea/afe, rau/ura
    if ($middle eq "e") {
        &check2($left,"ia",$right);
        &check2($left,"ai",$right);
    } elsif ($middle eq "f") {
        &check2($left,"ph",$right);
        &check2($left,"ve",$right);
    } elsif ($middle eq "i") {
        &check2($left,"ea",$right);
    } elsif ($middle eq "y") {
        &check2($left,"ie",$right);
    }
    if ($i == $length - 1) {
        if ($middle eq "t") {
            &check2($left,"ed",$right);
            &check2($left,"led",$right);
        }
    }
    if ($i <= $length - 2) {
        $middle2 = substr($word, $i, 2);
        $right2 = substr($word, $i+2);
        if ($middle2 eq "al") {
            &check2($left,"le",$right2);
        } elsif ($middle2 eq "as") {
            # from cassi --> ccasi
            &check2($left,"ca",$right2);
        } elsif ($middle2 eq "aw") {
            &check2($left,"o",$right2);
        } elsif ($middle2 eq "ce") {
            &check2($left,"es",$right2);
        } elsif ($middle2 eq "co") {
            &check2($left,"om",$right2);
        } elsif ($middle2 eq "de" && $i == 0) {
            &check2($left,"un",$right2);
        } elsif ($middle2 eq "ea") {
            &check2($left,"i",$right2);
            &check2($left,"ie",$right2);
        } elsif ($middle2 eq "el") {
            # pell --> ppel
            &check2($left,"pe",$right2);
            &check2($left,"al",$right2);
            # &check2($left,"ia",$right2);
        } elsif ($middle2 eq "en") {
            &check2($left,"ine",$right2);
        } elsif ($middle2 eq "ey") {
            &check2($left,"ie",$right2);
        } elsif ($middle2 eq "fe") {
            # from ffering --> ferring and ffered --> ferred
            &check2($left,"er",$right2);
            # from ffes --> fess
            &check2($left,"es",$right2);
        } elsif ($middle2 eq "gi") {
            # ggin --> ginn
            &check2($left,"in",$right2);
        } elsif ($middle2 eq "ia") {
            &check2($left,"e",$right2);
        } elsif ($middle2 eq "ie") {
            &check2($left,"y",$right2);
            &check2($left,"ey",$right2);
        } elsif ($middle2 eq "in") {
            # from cinn --> ccin
            &check2($left,"ci",$right2);
        } elsif ($middle2 eq "it") {
            &check2($left,"ate",$right2);
            &check2($left,"ute",$right2);
            &check2($left,"mi",$right2);
            &check2($left,"te",$right2);
        } elsif ($middle2 eq "le") {
            &check2($left,"al",$right2);
            # &check2($left,"el",$right2);
        } elsif ($middle2 eq "lo") {
            &check2($left,"os",$right2);
        } elsif ($middle2 eq "inn") {
            &check2($left,"urn",$right2);
        } elsif ($middle2 eq "oo") {
            &check2($left,"u",$right2);
        } elsif ($middle2 eq "ph") {
            &check2($left,"f",$right2);
        } elsif ($middle2 eq "qu") {
            &check2($left,"ck",$right2);
        } elsif ($middle2 eq "ra") {
            &check2($left,"al",$right2);
            &check2($left,"as",$right2);
        } elsif ($middle2 eq "ri") {
            &check2($left,"ib",$right2);
            &check2($left,"if",$right2);
        } elsif ($middle2 eq "rq") {
            &check2($left,"er",$right2);
        } elsif ($middle2 eq "si") {
            # from ssic --> sice
            &check2($left,"ic",$right2);
        } elsif ($middle2 eq "te") {
            &check2($left,"ght",$right2);
        } elsif ($middle2 eq "ye") {
            &check2($left,"i",$right2);
        }
    }
    if ($i <= $length - 3) {
        $middle3 = substr($word, $i, 3);
        $right3 = substr($word, $i+3);
        if ($middle3 eq "aga") {
            &check2($left,"adge",$right3);
        } elsif ($middle3 eq "acy") {
            &check2($left,"isy",$right3);
        } elsif ($middle3 eq "asa") {
            &check2($left,"sea",$right2);
        } elsif ($middle3 eq "dec") {
            &check2($left,"t",$right3);
        } elsif ($middle3 eq "ear") {
            &check2($left,"ere",$right3);
        } elsif ($middle3 eq "evi") {
            &check2($left,"iev",$right3);
        } elsif ($middle3 eq "exi") {
            # from exion --> ection
            &check2($left,"ecti",$right3);
        } elsif ($middle3 eq "gin") {
            &check2($left,"ing",$right3);
        } elsif ($middle3 eq "ine") {
            &check2($left,"ein",$right3);
        } elsif ($middle3 eq "isy") {
            &check2($left,"acy",$right3);
        } elsif ($middle3 eq "nts") {
            &check2($left,"nce",$right3);
        } elsif ($middle3 eq "ons") {
            &check2($left,"a",$right3);
        } elsif ($middle3 eq "que") {
            &check2($left,"ck",$right3);
        } elsif ($middle3 eq "sci") {
            &check2($left,"cil",$right3);
        } elsif ($middle3 eq "tio") {
            &check2($left,"cea",$right3);
        } elsif ($middle3 eq "unp" && $i == 0) {
            # un --> im before a p
            &check2($left,"imp",$right3);
        } elsif ($middle3 eq "ums") {
            &check2($left,"a",$right3);
        } elsif ($middle3 eq "ure") {
            &check2($left,"eur",$right3);
            &check2($left,"er",$right3);
}
sub check {
    local($left, $middle, $right) = @_;
    local($word);
    $word = $left.$middle.$right;
    if ($dict{$word} != 0) {
        $correction = &apply_case($word, $case);
        if (!grep(/^$correction$/, @corrections)) {
            push(@corrections, $correction);
            $numcor++;
        }
    }
}
sub check2 {
    local($left, $middle, $right) = @_;
    local($word, $length, $i);
    $word = $left.$middle.$right;
    $length = length($word);
    for ($i = length($left.$middle); $i < $length; $i++) {
        &spellcor_pos($word, $i, $length);
    }
}
# Code for Sorting
sub by_freq {
    if ($dict{$b} == $dict{$a}) {
        # this gives a preference to insertions over subst/trans over deletions
        length($b) <=> length($a);
    } else {
        $dict{$b} <=> $dict{$a};
    }
}
# The following code relates to the case (UPPER, lower, Initial, etc.)
# of the word.
sub id_case {
    local($word) = @_;
    local($wfirst, $wrest, $lcfirst, $lcrest);
    $wfirst = substr($word, 0, 1);
    $wrest = substr($word, 1);
    $lcfirst = $wfirst; $lcfirst =~ tr/A-Z/a-z/;
    $lcrest = $wrest; $lcrest =~ tr/A-Z/a-z/;
    if ($wfirst eq $lcfirst) {
        # First Letter is Lowercase
        if ($wrest eq $lcrest) {
            # Rest of Word is Lowercase
            return(0);
        } elsif (length($wrest) >= 2 &&
                 substr($wrest, 0, 1) ne substr($lcrest, 0, 1) &&
                 substr($wrest, 1) eq substr($lcrest, 1)) {
            # mIstake style
            return(0);
        } elsif (length($wrest) >= 2 &&
                 substr($wrest, 0, 1) ne substr($lcrest, 0, 1) &&
                 substr($wrest, 1, 1) ne substr($lcrest, 1, 1) &&
                 substr($wrest, 2) ne substr($lcrest, 2)) {
            # mISTAKE style (inverted caps lock)
            return(2);
        } else {
            # Rest of Word is Uppercase
            return(1);
        }
    } else {
        # First Letter is Uppercase
        if ($wrest eq $lcrest) {
            # Rest of Word is Lowercase
            return(2);
        } elsif (length($wrest) >= 2 &&
                 substr($wrest, 0, 1) eq substr($lcrest, 0, 1) &&
                 substr($wrest, 1, 1) ne substr($lcrest, 1, 1) &&
                 substr($wrest, 2) eq substr($lcrest, 2)) {
            # McDonald style
            return(4);
        } elsif (length($wrest) >= 2 &&
                 substr($wrest, 0, 1) ne substr($lcrest, 0, 1) &&
                 substr($wrest, 1) eq substr($lcrest, 1)) {
            # MIstake style
            return(2);
        } elsif ($wrest =~ /-/) {
            # Foo-Bar style, ignore
            return(-1);
        } else {
            # Rest of Word is Uppercase
            return(3);
        }
    }
}
sub apply_case {
    local($word, $case) = @_;
    local($wfirst, $wrest);
    if ($case == 0) {
        # word
        $word =~ tr/A-Z/a-z/;
        return($word);
    } elsif ($case == 1) {
        # wORD
        $wfirst = substr($word, 0, 1);
        $wrest = substr($word, 1);
        $wfirst =~ tr/A-Z/a-z/;
        $wrest =~ tr/a-z/A-Z/;
        return($wfirst.$wrest);
    } elsif ($case == 2) {
        # Word
        $wfirst = substr($word, 0, 1);
        $wrest = substr($word, 1);
        $wfirst =~ tr/a-z/A-Z/;
        $wrest =~ tr/A-Z/a-z/;
        return($wfirst.$wrest);
    } elsif ($case == 3) {
        # WORD
        $word =~ tr/a-z/A-Z/;
        return($word);
    } elsif ($case == 4) {
        # McDonald
        $wfirst = substr($word, 0, 1);
        $wsecond = substr($word, 1, 1);
        $wthird = substr($word, 2, 1);
        $wrest = substr($word, 3);
        $wfirst =~ tr/a-z/A-Z/;
        $wsecond =~ tr/A-Z/a-z/;
        $wthird =~ tr/a-z/A-Z/;
        $wrest =~ tr/A-Z/a-z/;
        return($wfirst.$wsecond.$wthird.$wrest);
    }
}
# Correcting word boundary errors.
sub split_word {
    local($word) = @_;
    local($found, $i, $length, $spoint, $left, $right);
    if ($word =~ /\w+,\w+/) {
        if ($word !~ /\d+,\d+/ && $word !~ /\(.+\)/) {
            $word =~ s/,/, /g;
        } else {
            $word = $word;
        }
    } elsif (length($word) >= 6) {
        # search for potential split point
        $found = 0;
        $spoint = 0;
        $length = length($word);
        for ($i = 3; $i <= $length - 3; $i++) {
            $left = substr($word, 0, $i);
            $right = substr($word, $i);
            # note that we're not changing the case,
            # so FooBar will not be split
            if ($dict{$left} >= $minsplitfreq && $dict{$right} >= $minsplitfreq) {
                $found++;
                $spoint = $i;
            }
        }
        if ($found == 1) {
            $left = substr($word, 0, $spoint);
            $right = substr($word, $spoint);
            $word = "$left $right";
        } elsif ($found > 1) {
            # if more than one, return word unchanged for now
            $word = $word;
        } else {
            # if no splits found, return word unchanged
            $word = $word;
        }
    } else {
        # leave the word alone
        $word = $word;
    }
    return($word);
}
# *EOF*
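The missing-space test performed by split_word above can be illustrated with a short, self-contained sketch. It is written in modern Perl rather than the Perl 4 style of the listing, and the names %freq, $min_split_freq and split_missing_space, as well as the frequency values, are illustrative assumptions and not part of the listing. As in split_word, a word is split only when exactly one split point yields two sufficiently frequent words.

```perl
use strict;
use warnings;

# Hypothetical word frequencies and threshold (illustrative values only).
my %freq = (spell => 40, check => 60);
my $min_split_freq = 10;

# Return "left right" if exactly one split point yields two words whose
# frequencies reach the threshold; otherwise return the word unchanged.
sub split_missing_space {
    my ($word) = @_;
    my $len = length $word;
    return $word if $len < 6;               # too short to split
    my @points;
    for my $i (3 .. $len - 3) {             # both halves have 3+ letters
        my $left  = substr($word, 0, $i);
        my $right = substr($word, $i);
        my $lf = $freq{$left}  // 0;
        my $rf = $freq{$right} // 0;
        push @points, $i if $lf >= $min_split_freq && $rf >= $min_split_freq;
    }
    return $word unless @points == 1;       # none or ambiguous: leave alone
    return substr($word, 0, $points[0]) . " " . substr($word, $points[0]);
}

print split_missing_space("spellcheck"), "\n";   # prints: spell check
```

Leaving ambiguous words unchanged mirrors the "return word unchanged for now" branch of the listing.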
errstat.pl 1/3 p:/SpellGrams/revminedist/ 98/04/21
#!/usr/local/bin/perl
$misses = 0;
while (<>) {
    chomp;
    s/\s+$//;
    ($left, $right) = split(/ --> /);
    $mismatch = &find_mismatch($left, $right);
    if (substr($left,$mismatch+1) eq substr($right,$mismatch+1) &&
        length($left) > $mismatch && length($right) > $mismatch) {
        $sub{substr($left,$mismatch,1).substr($right,$mismatch,1)}++;
    } elsif (substr($left,$mismatch+1) eq substr($right,$mismatch)) {
        $del{substr($left,$mismatch,1)}++;
    } elsif (substr($left,$mismatch) eq substr($right,$mismatch+1)) {
        $ins{substr($right,$mismatch,1)}++;
    } elsif (substr($left,$mismatch+2) eq substr($right,$mismatch+2) &&
             substr($left,$mismatch,1) eq substr($right,$mismatch+1,1) &&
             substr($left,$mismatch+1,1) eq substr($right,$mismatch,1)) {
        $trans{substr($left,$mismatch,2)}++;
    } elsif (substr($left,$mismatch+3) eq substr($right,$mismatch+3) &&
             substr($left,$mismatch,1) eq substr($right,$mismatch+2,1) &&
             substr($left,$mismatch+1,1) eq substr($right,$mismatch+1,1) &&
             substr($left,$mismatch+2,1) eq substr($right,$mismatch,1)) {
        $trans2{substr($left,$mismatch,3)}++;
    } elsif (substr($left,$mismatch+4) eq substr($right,$mismatch+4) &&
             substr($left,$mismatch,1) eq substr($right,$mismatch+3,1) &&
             substr($left,$mismatch+1,2) eq substr($right,$mismatch+1,2) &&
             substr($left,$mismatch+3,1) eq substr($right,$mismatch,1)) {
        $trans2{substr($left,$mismatch,4)}++;
    } elsif (substr($left,$mismatch+2) eq substr($right,$mismatch) &&
             length($left) > $mismatch+1) {
        $del2{substr($left,$mismatch,2)}++;
    } elsif (substr($left,$mismatch+3) eq substr($right,$mismatch) &&
             length($left) > $mismatch+2) {
        $del2{substr($left,$mismatch,3)}++;
    } elsif (substr($left,$mismatch+4) eq substr($right,$mismatch) &&
             length($left) > $mismatch+3) {
        $del2{substr($left,$mismatch,4)}++;
    } elsif (substr($left,$mismatch) eq substr($right,$mismatch+2) &&
             length($right) > $mismatch+1) {
        $ins2{substr($right,$mismatch,2)}++;
    } elsif (substr($left,$mismatch) eq substr($right,$mismatch+3) &&
             length($right) > $mismatch+2) {
        $ins2{substr($right,$mismatch,3)}++;
    } elsif (substr($left,$mismatch) eq substr($right,$mismatch+4) &&
             length($right) > $mismatch+3) {
        $ins2{substr($right,$mismatch,4)}++;
    } elsif (substr($left,$mismatch+2) eq substr($right,$mismatch+1) &&
             length($left) > $mismatch && length($right) > $mismatch) {
        $sub2{substr($left,$mismatch,2)." --> ".substr($right,$mismatch,1)}++;
    } elsif (substr($left,$mismatch+1) eq substr($right,$mismatch+2) &&
             length($left) > $mismatch && length($right) > $mismatch) {
        $sub2{substr($left,$mismatch,1)." --> ".substr($right,$mismatch,2)}++;
    } elsif (substr($left,$mismatch+2) eq substr($right,$mismatch+2) &&
             length($left) > $mismatch && length($right) > $mismatch) {
        $sub2{substr($left,$mismatch,2)." --> ".substr($right,$mismatch,2)}++;
    } elsif (substr($left,$mismatch+1) eq substr($right,$mismatch+3) &&
             length($left) > $mismatch && length($right) > $mismatch+2) {
        $sub2{substr($left,$mismatch,1)." --> ".substr($right,$mismatch,3)}++;
    } elsif (substr($left,$mismatch+2) eq substr($right,$mismatch+3) &&
             length($left) > $mismatch+1 && length($right) > $mismatch+2) {
        $sub2{substr($left,$mismatch,2)." --> ".substr($right,$mismatch,3)}++;
    } elsif (substr($left,$mismatch+3) eq substr($right,$mismatch+1) &&
             length($left) > $mismatch+2 && length($right) > $mismatch) {
        $sub2{substr($left,$mismatch,3)." --> ".substr($right,$mismatch,1)}++;
    } elsif (substr($left,$mismatch+3) eq substr($right,$mismatch+2) &&
             length($left) > $mismatch+2 && length($right) > $mismatch+1) {
        $sub2{substr($left,$mismatch,3)." --> ".substr($right,$mismatch,2)}++;
    } elsif (substr($left,$mismatch+3) eq substr($right,$mismatch+3) &&
             length($left) > $mismatch+2 && length($right) > $mismatch+2) {
        $sub2{substr($left,$mismatch,3)." --> ".substr($right,$mismatch,3)}++;
    } else {
        print "$left --> $right\n";
        $misses++;
    }
}
print "\nMisses: $misses\n";
print "\nSubstitutions:\n";
foreach $corr (keys(%sub)) {
    printf "%6d\t%s --> %s\n", $sub{$corr}, substr($corr,0,1), substr($corr,1);
}
print "\nDeletions:\n";
foreach $corr (keys(%del)) {
    printf "%6d\t%s\n", $del{$corr}, $corr;
}
print "\nInsertions:\n";
foreach $corr (keys(%ins)) {
    printf "%6d\t%s\n", $ins{$corr}, $corr;
}
print "\nTranspositions:\n";
foreach $corr (keys(%trans)) {
    printf "%6d\t%s\n", $trans{$corr}, $corr;
}
print "\nLong Distance Transpositions:\n";
foreach $corr (keys(%trans2)) {
    printf "%6d\t%s\n", $trans2{$corr}, $corr;
}
print "\nLarger Substitutions:\n";
foreach $corr (keys(%sub2)) {
    printf "%6d\t%s\n", $sub2{$corr}, $corr;
}
print "\nLarger Deletions:\n";
foreach $corr (keys(%del2)) {
    printf "%6d\t%s\n", $del2{$corr}, $corr;
}
print "\nLarger Insertions:\n";
foreach $corr (keys(%ins2)) {
    printf "%6d\t%s\n", $ins2{$corr}, $corr;
}
sub find_mismatch {
    local($word1, $word2) = @_;
    local($w1, $w2, @word1, @word2, $count);
    @word1 = split(//, $word1);
    @word2 = split(//, $word2);
    for ($count = 0; $count < &min(length($word1), length($word2)); $count++) {
        $w1 = shift(@word1);
        $w2 = shift(@word2);
        if ($w1 ne $w2) {
            last;
        }
    }
    return($count);
}
sub min {
    local($val1, $val2) = @_;
    if ($val1 <= $val2) {
        return($val1);
    } else {
        return($val2);
    }
}
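The first four tests in errstat.pl above (single substitution, deletion, insertion and adjacent transposition) can be restated as a small stand-alone sketch. It is a simplified illustration in modern Perl, not the listing itself: the names common_prefix and classify are hypothetical, the multi-letter categories are omitted, and it assumes, as the listing appears to, that the left-hand string is the typed word and the right-hand string is the intended word.

```perl
use strict;
use warnings;

# Length of the common prefix of two strings (the role of find_mismatch).
sub common_prefix {
    my ($x, $y) = @_;
    my $n = 0;
    $n++ while $n < length($x) && $n < length($y)
            && substr($x, $n, 1) eq substr($y, $n, 1);
    return $n;
}

# Classify a typed/intended pair as one single-character edit, if possible.
sub classify {
    my ($typed, $intended) = @_;
    my $m = common_prefix($typed, $intended);
    my ($t, $c) = (substr($typed, $m), substr($intended, $m));
    return "none" if $t eq "" && $c eq "";
    # one wrong letter, identical remainders
    return "substitution" if length($t) && length($c)
        && substr($t, 1) eq substr($c, 1);
    # typed word carries one extra letter
    return "deletion" if length($t) && substr($t, 1) eq $c;
    # typed word lacks one letter
    return "insertion" if length($c) && $t eq substr($c, 1);
    # two adjacent letters swapped
    return "transposition" if length($t) >= 2 && length($c) >= 2
        && substr($t, 0, 1) eq substr($c, 1, 1)
        && substr($t, 1, 1) eq substr($c, 0, 1)
        && substr($t, 2) eq substr($c, 2);
    return "other";
}

for my $pair (["cat", "cot"], ["helllo", "hello"], ["helo", "hello"], ["teh", "the"]) {
    printf "%s --> %s: %s\n", @$pair, classify(@$pair);
}
# prints:
# cat --> cot: substitution
# helllo --> hello: deletion
# helo --> hello: insertion
# teh --> the: transposition
```

Tallying these labels over a corpus of observed errors is what lets the most common edits be identified.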

Claims

I CLAIM:
1. A computer method of spelling correction which comprises a step of testing a word against a valid dictionary and, if the word is not in the dictionary, calculating the edit distance to at least one valid word using a restricted set of edit operations that correct the most common errors comprising insertion, deletion, transposition and/or substitution errors and displaying at least one valid word.
2. The computer method of spelling correction according to claim 1 wherein statistical techniques are used to identify the most common spelling errors and/or the restricted set of edit operations.
3. The computer method of spelling correction according to claim 1 wherein the calculation of edit distance is a step in a standard minimum edit distance algorithm.
4. The computer method of spelling correction according to claim 1 wherein the calculation of edit distance is a step in a reverse minimum edit distance algorithm.
5. The computer method of spelling correction according to claim 1 wherein the valid word is displayed in a list of candidate words.
6. The computer method of spelling correction according to claim 1 wherein the valid word is displayed by replacing it for the misspelled word.
7. The computer method of spelling correction according to claim 1 wherein the restricted set of edit operations includes the most common edits at distance one to correct errors based upon a training corpus of documents with uncorrected spelling errors.
8. The computer method of spelling correction according to claim 1 wherein the restricted set of edit operations includes common complex edits selected from the group comprising long-distance transpositions, multiple letter corrections and missing space errors.
9. The computer method of spelling correction according to claim 1 implemented in the automatic spelling correction function of a word processing program.
10. The computer method of spelling correction according to claim 1 implemented in the batch spelling correction function of a word processing program.
11. The computer method of spelling correction according to claim 1 implemented to correct the spelling on an input line in a computer interface such as a command line for an operating system, a data base query or the like.
12. A computer method of spelling correction comprising the steps of:
a) storing a dictionary of valid words;
b) for each input string to be checked comparing the input string to words in the stored dictionary to identify input strings not in the dictionary;
c) for each input string not found in the preceding step, generating test words by a restricted set of edit operations which correct the most common errors comprising insertion, deletion, transposition and/or substitution;
d) comparing the edited input string generated in the preceding step with words stored in the dictionary; and
e) generating a candidate word or list of candidate words from edited input strings that are found in the dictionary.
13. A computer method according to claim 12 wherein the members of the restricted set of edit operations are selected based upon the most common spelling errors.
14. A computer method according to claim 13 wherein statistical techniques are used to identify the most common spelling errors and/or the restricted set of edit operations.
15. A computer method according to claim 14 wherein a corpus of documents used to determine the most common spelling errors is selected from the academic or business field with which the method will be used.
16. A computer method according to claim 15 wherein the corpus of documents used to determine the most common spelling errors is selected from documents generated by the individual who will use the method.
17. A computer method according to claim 12 wherein the members of the restricted set of edit operations are selected based on the letter n-grams containing the letter or letters to be edited.
18. A computer method according to claim 12 wherein the edit operations are restricted to distance one.
19. A computer method according to claim 12 wherein if no valid edited input strings are found at edit distance one, allowing edits of distance two.
20. A computer method according to claim 12 wherein the edit operations are restricted to distances one and two.
21. A computer method according to claim 12 allowing all possible edits if no valid edited input strings are found at edit distances one or two.
22. A computer method according to claim 12 wherein the edit operations include long-distance transpositions, multiple letter substitutions, multiple letter deletions and missing space errors at edit distance one.
23. A computer method according to claim 12 wherein the dictionary is stored in a data structure selected from the group hash tables, binary trees and tries.
24. A computer method according to claim 12 wherein the substitution edits may include non-alphabetic characters.
25. A computer method according to claim 12 wherein a candidate list is sorted by combinations of word length, typed edit, word frequency or edit frequency.
26. A computer method according to claim 12 further comprising searching for missing space errors by testing complementary portions of a nonword for being valid words with a frequency above a given threshold.
27. A computer method of spelling correction in input lines comprising the steps of:
a) storing a dictionary of valid words;
b) for each word in the command line comparing the word to words in the stored dictionary to identify misspelled words;
c) for each misspelled word found in the preceding step, generating test words by a restricted set of edit operations which correct the most common errors comprising insertion, deletion, transposition and/or substitution;
d) comparing the edited words generated in the preceding step with words stored in the dictionary; and
e) substituting a candidate word in the command line.
28. A computer method of spelling correction of words in a computer document comprising the steps of:
a) storing a dictionary of valid words;
b) for each word in the computer document comparing the word to words in the stored dictionary to identify misspelled words;
c) for each misspelled word found in the preceding step, generating test words by a restricted set of edit operations which correct the most common errors comprising insertion, deletion, transposition and/or substitution;
d) comparing the edited words generated in the preceding step with words stored in the dictionary; and
e) substituting a unique candidate word for misspelled words in the computer document.
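Steps (b) through (e) of claim 12, together with the frequency ordering mentioned in claim 25, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the dictionary contents, the frequency values, the restricted letter set @letters and the function names edits1 and candidates are all hypothetical, and a real restricted edit set would be derived from error statistics rather than fixed by hand.

```perl
use strict;
use warnings;

# Toy dictionary: word => frequency (hypothetical values).
my %dict = (the => 500, then => 120, than => 90, hen => 5, ten => 40);

# Hypothetical restricted letter set for insertions and substitutions.
my @letters = qw(a e h n t);

# Generate every string reachable from $word by one restricted edit.
sub edits1 {
    my ($word) = @_;
    my %out;
    my $len = length $word;
    for my $i (0 .. $len) {
        my $left  = substr($word, 0, $i);
        my $right = substr($word, $i);
        if ($i < $len) {
            # typed word has an extra letter: drop it
            $out{ $left . substr($right, 1) } = 1;
            # typed word has a wrong letter: replace it
            $out{ $left . $_ . substr($right, 1) } = 1 for @letters;
        }
        if ($i < $len - 1) {
            # two adjacent letters were swapped: swap them back
            $out{ $left . substr($right, 1, 1) . substr($right, 0, 1)
                        . substr($right, 2) } = 1;
        }
        # typed word lacks a letter: insert one
        $out{ $left . $_ . $right } = 1 for @letters;
    }
    delete $out{$word};
    return keys %out;
}

# Return the word itself if valid, else valid edits sorted by frequency.
sub candidates {
    my ($word) = @_;
    return ($word) if $dict{$word};
    my @found = grep { $dict{$_} } edits1($word);
    return sort { $dict{$b} <=> $dict{$a} || $a cmp $b } @found;
}

print join(", ", candidates("thn")), "\n";   # prints: the, then, than, ten
```

Because edits are applied to the misspelled word and the results are looked up in a hash, the cost depends on the size of the edit set rather than on the size of the dictionary.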
PCT/US2000/000260 1999-03-24 2000-01-06 Spelling correction method using improved minimum edit distance algorithm WO2000057291A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU24922/00A AU2492200A (en) 1999-03-24 2000-01-06 Spelling correction method using improved minimum edit distance algorithm

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US27570199A 1999-03-24 1999-03-24
US09/275,701 1999-03-24

Publications (1)

Publication Number Publication Date
WO2000057291A1 true WO2000057291A1 (en) 2000-09-28

Family

ID=23053449

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/000260 WO2000057291A1 (en) 1999-03-24 2000-01-06 Spelling correction method using improved minimum edit distance algorithm

Country Status (2)

Country Link
AU (1) AU2492200A (en)
WO (1) WO2000057291A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5261112A (en) * 1989-09-08 1993-11-09 Casio Computer Co., Ltd. Spelling check apparatus including simple and quick similar word retrieval operation
US5485372A (en) * 1994-06-01 1996-01-16 Mitsubishi Electric Research Laboratories, Inc. System for underlying spelling recovery
US5576955A (en) * 1993-04-08 1996-11-19 Oracle Corporation Method and apparatus for proofreading in a computer system
US5604897A (en) * 1990-05-18 1997-02-18 Microsoft Corporation Method and system for correcting the spelling of misspelled words
US5649223A (en) * 1988-12-21 1997-07-15 Freeman; Alfred B. Word based text producing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KUKICH K.: "Technique for automatically correcting words in text", ACM COMPUTING SURVEYS, vol. 24, no. 4, December 1992 (1992-12-01), pages 377 - 439, XP002925921 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1288790A1 (en) * 2001-08-29 2003-03-05 Tarchon BV Method of analysing a text corpus and information analysis system
EP1686492A1 (en) * 2005-01-26 2006-08-02 Research In Motion Limited Method and Apparatus for Correction of Spelling Errors in Text Composition
WO2007079571A1 (en) * 2006-01-13 2007-07-19 Research In Motion Limited Handheld electronic device and method for disambiguation of text input and providing spelling substitution
US8537118B2 (en) 2006-01-13 2013-09-17 Blackberry Limited Handheld electronic device and method for disambiguation of text input and providing spelling substitution
GB2449019A (en) * 2006-01-13 2008-11-05 Research In Motion Ltd Handheld electronic device and method disambiguation of text input and providing spelling substitution
US9442573B2 (en) 2006-01-13 2016-09-13 Blackberry Limited Handheld electronic device and method for disambiguation of text input and providing spelling substitution
GB2449019B (en) * 2006-01-13 2011-02-09 Research In Motion Ltd Handheld electronic device and method disambiguation of text input and providing spelling substitution
US7786979B2 (en) 2006-01-13 2010-08-31 Research In Motion Limited Handheld electronic device and method for disambiguation of text input and providing spelling substitution
US8854311B2 (en) 2006-01-13 2014-10-07 Blackberry Limited Handheld electronic device and method for disambiguation of text input and providing spelling substitution
EP2021731A4 (en) * 2006-05-08 2010-07-21 Telecomm Systems Inc Location input mistake correction
EP2021731A2 (en) * 2006-05-08 2009-02-11 Telecommunication Systems, Inc. Location input mistake correction
US8370339B2 (en) 2006-05-08 2013-02-05 Rajat Ahuja Location input mistake correction
US9244904B2 (en) 2006-05-11 2016-01-26 Dassault Systemes Software-implemented method and computerized system for spell checking
EP1855210A1 (en) * 2006-05-11 2007-11-14 Exalead Spell checking
US8577328B2 (en) 2006-08-21 2013-11-05 Telecommunication Systems, Inc. Associating metro street address guide (MSAG) validated addresses with geographic map data
US9275073B2 (en) 2006-08-21 2016-03-01 Telecommunication Systems, Inc. Associating metro street address guide (MSAG) validated addresses with geographic map data
WO2008155503A1 (en) * 2007-06-18 2008-12-24 France Telecom Method for entering information on an electronic form
EP2169562A1 (en) * 2008-09-30 2010-03-31 BRITISH TELECOMMUNICATIONS public limited company Partial parsing method, based on calculation of string membership in a fuzzy grammar fragment
WO2010038017A1 (en) * 2008-09-30 2010-04-08 British Telecommunications Public Limited Company Partial parsing method, based on calculation of string membership in a fuzzy grammar fragment
US10204362B2 (en) * 2012-02-08 2019-02-12 Ebay Inc. Marketplace listing analysis systems and methods
WO2014143350A1 (en) * 2013-03-15 2014-09-18 Apple Inc. Web-based spell checker
US9489372B2 (en) 2013-03-15 2016-11-08 Apple Inc. Web-based spell checker
CN105183732A (en) * 2014-06-04 2015-12-23 广州市动景计算机科技有限公司 Method and device for processing webpage
WO2016110455A1 (en) * 2015-01-06 2016-07-14 What3Words Limited A method for suggesting candidate words as replacements for an input string received at an electronic device
KR20170122727A (en) * 2015-01-06 2017-11-06 와트3워즈 리미티드 A method for presenting candidate words as replacements for an input string received at an electronic device
CN107408108A (en) * 2015-01-06 2017-11-28 三词有限公司 Method for suggesting candidate words as replacements for an input string received at an electronic device
US20170364502A1 (en) * 2015-01-06 2017-12-21 What3Words Limited A Method For Suggesting Candidate Words As Replacements For An Input String Received At An Electronic Device
US11017169B2 (en) 2015-01-06 2021-05-25 What3Words Limited Method for suggesting candidate words as replacements for an input string received at an electronic device
KR102482391B1 (en) * 2015-01-06 2022-12-29 와트3워즈 리미티드 A method for presenting candidate words as substitutes for an input string received at an electronic device
US10754880B2 (en) 2017-07-27 2020-08-25 Yandex Europe Ag Methods and systems for generating a replacement query for a user-entered query

Also Published As

Publication number Publication date
AU2492200A (en) 2000-10-09

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase