CN109033065A - English word spell-checking method - Google Patents
- Publication number: CN109033065A
- Authority: CN (China)
- Prior art keywords: word, editing distance, distance, vision, english
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
Abstract
The invention relates to an English word spell-checking method and belongs to the technical field of natural language processing. First, the conventional Levenshtein distance is used to compute the edit distance between the input word and the words of an English dictionary, and a set of similar words is filtered out according to a threshold. Next, a key edit distance model is introduced to compute the key edit distance between the input word and every word in the word set. Then a visual edit distance model is used to compute the visual edit distance between the input word and every word in the word set. Finally, the distances computed above are each given a corresponding weight and combined into a weighted edit distance. Compared with the prior art, the invention mainly addresses the inaccuracy and excessive redundancy of the spell checking that current text editors perform on English words, and narrows the matched set of approximate words to a more accurate range.
Description
Technical field
The invention relates to an English word spell-checking method and belongs to the technical field of natural language processing.
Background art
More and more people use text editors such as Word or WPS. For office workers in particular, working with such text-editing software is a central part of the job, yet spelling mistakes occur frequently during typing.
The Levenshtein distance, also known as the edit distance, is the minimum number of edit operations needed to convert one string into another. Most current spell-checking methods correct errors using the edit distance. Although this approach can match the words that lie within a given edit distance, it has a limitation: it is hard to set a single uniform threshold, so the results may contain errors or omissions. Moreover, if many words fall within the uniform threshold, all of those valid words are listed, which makes it hard for the user to choose.
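To make the definition above concrete, here is a minimal Python sketch of the standard Levenshtein distance with unit cost for insertion, deletion and substitution. The function name `levenshtein` is an illustrative choice, not something the patent specifies.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Because every operation costs the same, a typo caused by a neighbouring key and an unrelated substitution are scored identically, which is the limitation the background section describes.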
Summary of the invention
The technical problem to be solved by the invention is to provide an English word spell-checking method that addresses the deviations that arise when the Levenshtein distance threshold is set too large or too small, and that narrows the word set to a more accurate range.
The technical solution of the invention is an English word spell-checking method. First, the conventional Levenshtein distance is used to compute the edit distance between the input word and the words of an English dictionary, and a set of similar words is filtered out according to a threshold. Then a key edit distance model is introduced to compute the key edit distance between the input word and every word in the word set. Next, a visual edit distance model is used to compute the visual edit distance between the input word and every word in the word set. Finally, the distances computed above are each given a corresponding weight and combined into a weighted edit distance.
The specific steps of the method are as follows:
Step0.1: Build the key-letter proximity database. Based on how each finger controls each letter key on the keyboard, formulate a rule that reflects the degree of proximity between any two letter keys; compute the proximity between every pair of letters according to the rule and store it in a database, thereby building the key-letter proximity database.
Step0.2: Build the letter visual-similarity database. Manually inspect how similar every pair of letters looks on the screen, design a rule that reflects this similarity, compute the visual error distance between every pair of letters according to the rule and store it in a database, thereby building the letter visual-similarity database.
Step1: Select the word A to be spell-checked.
Step2: Traverse the English dictionary and approximately match word A against the dictionary entries, using the edit distance as the measure. With the edit distance threshold set to X, filter out a word set B = {w_1, w_2, w_3, …, w_n}, where the size n is determined by the threshold X.
Step3: Using the key-letter proximity database, compute the edit distance I(A, B_i) based on key-letter proximity between word A and each element w_i, i ∈ [1, n], of the word set B = {w_1, w_2, w_3, …, w_n}.
Step4: Using the letter visual-similarity database, compute the edit distance J(A, B_i) based on letter visual similarity between word A and each element w_i, i ∈ [1, n], of the word set B = {w_1, w_2, w_3, …, w_n}.
Step5: Let the weights of the edit distances computed in Step3 and Step4 be i and j respectively, with i + j = 1. From the edit distance I(A, B) with weight i and the edit distance J(A, B) with weight j, compute the weighted edit distance between word A and word B, R(A, B) = I(A, B) × i + J(A, B) × j, and further screen the elements of word set B according to the weighted edit distance and a threshold Y.
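Steps 1 through 5 can be sketched end to end in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the tables `ADJACENT_KEYS` and `LOOKALIKES`, the cost value 0.3, the defaults for Y, i and j, and the choice to keep candidates with R ≤ Y are all hypothetical, since the text does not publish the database contents or the comparison used with Y. The two databases of Step0.1 and Step0.2 are stood in for by two tiny substitution-cost functions.

```python
from typing import Callable, Iterable, List

Cost = Callable[[str, str], float]


def edit_distance(a: str, b: str, sub_cost: Cost) -> float:
    """Edit distance with unit insert/delete cost and a pluggable substitution cost."""
    prev = [float(k) for k in range(len(b) + 1)]
    for row, ca in enumerate(a, start=1):
        curr = [float(row)]
        for col, cb in enumerate(b, start=1):
            curr.append(min(prev[col] + 1.0,                    # delete ca
                            curr[col - 1] + 1.0,                # insert cb
                            prev[col - 1] + sub_cost(ca, cb)))  # substitute or match
        prev = curr
    return prev[-1]


def unit_cost(x: str, y: str) -> float:
    """Plain Levenshtein substitution cost, used for the Step2 pre-filter."""
    return 0.0 if x == y else 1.0


# Hypothetical stand-ins for the two databases (values are illustrative only).
ADJACENT_KEYS = {frozenset("er"), frozenset("hj")}
LOOKALIKES = {frozenset("il"), frozenset("ce")}


def key_cost(x: str, y: str) -> float:
    if x == y:
        return 0.0
    return 0.3 if frozenset((x, y)) in ADJACENT_KEYS else 1.0


def visual_cost(x: str, y: str) -> float:
    if x == y:
        return 0.0
    return 0.3 if frozenset((x, y)) in LOOKALIKES else 1.0


def check_spelling(word_a: str, dictionary: Iterable[str],
                   x: float = 3, y: float = 1.0,
                   i: float = 0.5, j: float = 0.5) -> List[str]:
    if abs(i + j - 1.0) > 1e-9:
        raise ValueError("weights i and j must sum to 1")
    # Step2: keep dictionary words whose plain edit distance to A is below X.
    set_b = [w for w in dictionary if edit_distance(word_a, w, unit_cost) < x]
    scored = []
    for w in set_b:
        dist_i = edit_distance(word_a, w, key_cost)     # Step3: I(A, B_i)
        dist_j = edit_distance(word_a, w, visual_cost)  # Step4: J(A, B_i)
        r = dist_i * i + dist_j * j                     # Step5: R(A, B_i)
        if r <= y:                                      # assumed comparison with Y
            scored.append((r, w))
    return [w for _, w in sorted(scored)]


print(check_spelling("hrllo", ["hello", "help", "hollow", "jello", "world"]))  # ['hello']
```

With these toy tables, plain Levenshtein with X = 3 admits "hello", "hollow" and "jello", while the weighted distance keeps only "hello", because "r" and "e" are neighbouring keys.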
Further, in Step0.1, the degree of proximity between letters is derived from how the hands control the keyboard. From this proximity a letter-to-letter key edit distance table, i.e. the key-letter proximity database, can be drawn up; Step3 then uses it to compute the key edit distance I(A, B_i) between word A and B_i, i ∈ [1, n].
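One way such a letter-to-letter table could be built is sketched below in Python. The patent does not publish its rule, so everything here is an assumption made for illustration: the grouping `FINGER_KEYS` follows the common touch-typing finger assignment on a QWERTY keyboard, and the distances 0.3, 0.6 and 1.0 are hypothetical values.

```python
from itertools import product
import string

# Letters grouped by the finger that types them in standard touch typing,
# ordered from left little finger to right little finger (assumed layout: QWERTY).
FINGER_KEYS = ["qaz", "wsx", "edc", "rfvtgb", "yhnujm", "ik", "ol", "p"]
FINGER_OF = {ch: idx for idx, keys in enumerate(FINGER_KEYS) for ch in keys}


def key_distance(x: str, y: str) -> float:
    """Hypothetical rule: the closer the controlling fingers, the smaller the distance."""
    if x == y:
        return 0.0
    gap = abs(FINGER_OF[x] - FINGER_OF[y])
    if gap == 0:
        return 0.3  # both letters are typed by the same finger
    if gap == 1:
        return 0.6  # letters are typed by neighbouring fingers
    return 1.0      # unrelated keys cost as much as a plain substitution


# The "database": a lookup table over all ordered pairs of lowercase letters.
KEY_TABLE = {(x, y): key_distance(x, y)
             for x, y in product(string.ascii_lowercase, repeat=2)}

print(KEY_TABLE[("r", "f")], KEY_TABLE[("e", "r")], KEY_TABLE[("q", "p")])  # 0.3 0.6 1.0
```

A table like this can be stored once and then used as the substitution cost when computing I(A, B_i).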
Further, in Step0.2, the similarity between two letters, or between a letter and a digit, is judged by human vision. From the rule that reflects this similarity a visual edit distance table, i.e. the letter visual-similarity database, can be drawn up; Step4 then uses it to compute the visual edit distance J(A, B_i) between word A and B_i, i ∈ [1, n].
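A visual table of this kind might look like the following Python sketch. The pairs and scores in `VISUAL_PAIRS` are hypothetical hand-assigned values; the patent derives its own values by manual inspection and does not list them. The helper `substitution_only_distance` is likewise an illustrative simplification that compares equal-length strings position by position, not a full edit distance.

```python
# Hypothetical look-alike scores: smaller means the two characters are
# easier to confuse on screen. Letter-digit pairs are included.
VISUAL_PAIRS = {
    frozenset(("l", "1")): 0.1,
    frozenset(("o", "0")): 0.1,
    frozenset(("i", "l")): 0.2,
    frozenset(("c", "e")): 0.4,
    frozenset(("u", "v")): 0.4,
    frozenset(("m", "n")): 0.5,
}


def visual_distance(x: str, y: str) -> float:
    """Look up the visual distance of two characters; unknown pairs cost 1.0."""
    if x == y:
        return 0.0
    return VISUAL_PAIRS.get(frozenset((x, y)), 1.0)


def substitution_only_distance(a: str, b: str) -> float:
    """Sum of per-position visual distances for two strings of equal length."""
    if len(a) != len(b):
        raise ValueError("this simplified helper needs equal-length strings")
    return sum(visual_distance(x, y) for x, y in zip(a, b))


print(visual_distance("l", "1"))  # 0.1
print(visual_distance("k", "z"))  # 1.0
```

Storing the pairs as unordered sets keeps the table symmetric, so the distance from "l" to "1" equals the distance from "1" to "l".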
Further, the threshold X in Step2 is generally 3, but it can be adjusted slightly according to the actual situation; n is the total number of words in the English dictionary whose edit distance to word A is less than X.
Further, the weighted edit distance in Step5 is expressed as follows:
R(A, B) = I(A, B) × i + J(A, B) × j
where R(A, B) is the weighted edit distance of replacing A with B, I(A, B) is the edit distance from A to B based on key-letter proximity, J(A, B) is the edit distance from A to B based on letter visual similarity, and i and j are the weights of the key edit distance and the visual edit distance, respectively.
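The effect of the weights can be seen in a small Python sketch. The distances assigned to the candidates `w_1` and `w_2` are made-up numbers chosen only to show how the ranking shifts as i and j change; they are not results from the patent.

```python
def weighted_edit_distance(dist_i: float, dist_j: float, i: float, j: float) -> float:
    """R(A, B) = I(A, B) * i + J(A, B) * j, with the constraint i + j = 1."""
    if abs(i + j - 1.0) > 1e-9:
        raise ValueError("weights i and j must sum to 1")
    return dist_i * i + dist_j * j


# Hypothetical (I, J) pairs for two candidate words.
candidates = {
    "w_1": (0.2, 1.0),  # close on the keyboard, visually dissimilar
    "w_2": (1.0, 0.3),  # far apart on the keyboard, visually similar
}


def rank(i: float, j: float) -> list:
    """Order the candidates by increasing weighted edit distance."""
    return sorted(candidates,
                  key=lambda w: weighted_edit_distance(*candidates[w], i, j))


print(rank(0.8, 0.2))  # ['w_1', 'w_2']
print(rank(0.2, 0.8))  # ['w_2', 'w_1']
```

A larger i favours candidates that differ by neighbouring keys, while a larger j favours candidates that differ by look-alike characters.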
The beneficial effects of the invention are that it addresses the inaccuracy and excessive redundancy of the spell checking that current text editors perform on English words, and that it narrows the matched set of approximate words to a more accurate range.
Brief description of the drawings
Fig. 1 is the overall flow chart of the invention;
Fig. 2 is an example diagram of the key-letter proximity database of Step0.1;
Fig. 3 is an example diagram of the letter visual-similarity database of Step0.2.
Specific embodiments
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Embodiment 1: An English word spell-checking method. Its specific steps, Step0.1 through Step5, together with the further details of Step0.1, Step0.2, Step2 and Step5, are identical to those set out in the Summary of the invention above.
The embodiments of the invention have been explained in detail above with reference to the accompanying drawings, but the invention is not limited to these embodiments; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the concept of the invention.
Claims (2)
1. An English word spell-checking method, characterized by the following steps:
Step1: Select the word A to be spell-checked.
Step2: Traverse the English dictionary and approximately match word A against the dictionary entries, using the edit distance as the measure. With the edit distance threshold set to X, filter out a word set B = {w_1, w_2, w_3, …, w_n}, where the size n is determined by the threshold X and n is the total number of words in the English dictionary whose edit distance to word A is less than X.
Step3: Using the key-letter proximity database, compute the edit distance I(A, B_i) based on key-letter proximity between word A and each element w_i, i ∈ [1, n], of the word set B = {w_1, w_2, w_3, …, w_n}.
Step4: Using the letter visual-similarity database, compute the edit distance J(A, B_i) based on letter visual similarity between word A and each element w_i, i ∈ [1, n], of the word set B = {w_1, w_2, w_3, …, w_n}.
Step5: Let the weights of the edit distances computed in Step3 and Step4 be i and j respectively, with i + j = 1. From the edit distance I(A, B) with weight i and the edit distance J(A, B) with weight j, compute the weighted edit distance between word A and word B, R(A, B) = I(A, B) × i + J(A, B) × j, and further screen the elements of word set B according to the weighted edit distance and a threshold Y.
2. The English word spell-checking method according to claim 1, characterized in that the weighted edit distance in Step5 is expressed as follows:
R(A, B) = I(A, B) × i + J(A, B) × j
where R(A, B) is the weighted edit distance of replacing A with B, I(A, B) is the edit distance from A to B based on key-letter proximity, J(A, B) is the edit distance from A to B based on letter visual similarity, and i and j are the weights of the key edit distance and the visual edit distance, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810555195.4A CN109033065A (en) | 2018-06-01 | 2018-06-01 | A kind of English- word spelling inspection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109033065A true CN109033065A (en) | 2018-12-18 |
Family
ID=64611938
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810555195.4A Pending CN109033065A (en) | 2018-06-01 | 2018-06-01 | A kind of English- word spelling inspection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109033065A (en) |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181218 |