CN109033065A - An English word spell-checking method - Google Patents

Info

Publication number: CN109033065A
Authority: CN (China)
Prior art keywords: word, editing distance, distance, vision, english
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201810555195.4A
Other languages: Chinese (zh)
Inventors: 邵玉斌, 王林坪, 龙华, 杜庆治
Current Assignee: Kunming University of Science and Technology (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Kunming University of Science and Technology
Priority date: 2018-06-01 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2018-06-01
Publication date: 2018-12-18
Application filed by Kunming University of Science and Technology
Priority to CN201810555195.4A
Publication of CN109033065A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to an English word spell-checking method and belongs to the technical field of natural language processing. First, the conventional Levenshtein distance is used to compute the edit distance between the input word and the entries of an English dictionary, and a set of similar words is filtered out according to a threshold. Next, a keyboard edit distance model is introduced to compute the keyboard edit distance between the input word and every word in the set. Then a visual edit distance model is used to compute the visual edit distance between the input word and every word in the set. Finally, the resulting similarities are assigned corresponding weights and combined into a weighted edit distance. Compared with the prior art, the invention mainly addresses the inaccuracy and excessive redundancy of spell checking of English words in current text editors, and can narrow the matched set of approximate words to a more accurate range.

Description

An English word spell-checking method
Technical field
The invention relates to an English word spell-checking method and belongs to the technical field of natural language processing.
Background
More and more users now work with text editors such as Word or WPS. For office workers in particular, using such text-editing software is a central part of their work, yet spelling mistakes occur frequently while typing.
Levenshtein distance, also known as edit distance, is the minimum number of edit operations required to convert one string into another. Most current spell-checking methods use edit distance for error correction. Although this approach can match the words that fall within a given edit distance, it has a limitation: a single uniform threshold is hard to set, so the results may contain errors or omissions. Moreover, if many words fall under the uniform threshold, all of those legal words are listed, which makes it hard for the user to choose.
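The edit distance described above can be illustrated with a minimal Python sketch of the standard dynamic-programming algorithm. The function name `levenshtein` is chosen here for illustration and does not come from the patent; insertions, deletions, and substitutions are each assumed to cost 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of the shorter loop dimension rather than with the product of both lengths.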
Summary of the invention
The technical problem to be solved by the invention is to provide an English word spell-checking method that addresses the deviations arising when the Levenshtein distance threshold is set too large or too small, so that the candidate word set can be narrowed to a more accurate range.
The technical solution of the invention is an English word spell-checking method. First, the conventional Levenshtein distance is used to compute the edit distance between the input word and the entries of an English dictionary, and a set of similar words is filtered out according to a threshold. Next, a keyboard edit distance model is introduced to compute the keyboard edit distance between the input word and every word in the set. Then a visual edit distance model is used to compute the visual edit distance between the input word and every word in the set. Finally, the resulting similarities are assigned corresponding weights and combined into a weighted edit distance.
The specific steps of the method are as follows:
Step0.1: Build the keyboard letter proximity database. Based on how each finger controls each letter key on the keyboard, formulate a rule that reflects the degree of proximity between any two letter keys; compute the proximity between every pair of letters according to this rule and store it in a database, thereby establishing the keyboard letter proximity database;
Step0.2: Build the letter visual-similarity database. Manually inspect how similar every pair of letters looks on screen, design a rule that reflects this similarity, compute the visual error distance between every pair of letters according to the rule, and store it in a database, thereby establishing the letter visual-similarity database;
Step1: Select the word A to be spell-checked;
Step2: Traverse the English dictionary and approximately match word A against its entries, using edit distance as the measure. With the edit distance threshold set to X, filter out the candidate word set B = {w_1, w_2, w_3, …, w_n}, where the size n is determined by the threshold X;
Step3: Using the keyboard letter proximity database, compute the keyboard-proximity edit distance I(A, B_i) between word A and each element w_i of the word set B = {w_1, w_2, w_3, …, w_n}, i ∈ [1, n];
Step4: Using the letter visual-similarity database, compute the visual-similarity edit distance J(A, B_i) between word A and each element w_i of the word set B = {w_1, w_2, w_3, …, w_n}, i ∈ [1, n];
Step5: Let the weights of the edit distances computed in Step3 and Step4 be i and j respectively, with i + j = 1. From the edit distance I(A, B) with weight i and the edit distance J(A, B) with weight j, compute the weighted edit distance between word A and word B, R(A, B) = I(A, B) × i + J(A, B) × j, and further filter the elements of the word set B according to the weighted edit distance and a threshold Y.
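Steps 1 to 5 can be sketched end to end in Python as follows. This is a rough illustration under several assumptions that the text does not fix: I and J are computed as edit distances in which insertions and deletions cost 1 and a substitution costs the value looked up in the corresponding table (1 for unlisted pairs); a candidate is kept in Step2 when its plain distance is less than X and in Step5 when R is at most Y; and the two small tables are hypothetical toy values, not the patent's databases. All names (`edit_distance`, `table_cost`, `spell_check`, `w_key`, `w_vis`) are invented for this sketch.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

SubCost = Callable[[str, str], float]


def edit_distance(a: str, b: str, sub_cost: SubCost) -> float:
    """Edit distance with unit insert/delete cost and a pluggable substitution cost."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1.0,                    # delete ca
                            curr[j - 1] + 1.0,                # insert cb
                            prev[j - 1] + sub_cost(ca, cb)))  # substitute or match
        prev = curr
    return prev[-1]


def table_cost(table: Dict[FrozenSet[str], float]) -> SubCost:
    """Turn a symmetric letter-pair table into a substitution cost (unlisted pairs cost 1)."""
    def cost(x: str, y: str) -> float:
        return 0.0 if x == y else table.get(frozenset((x, y)), 1.0)
    return cost


# Hypothetical toy tables; the databases of Step0.1 and Step0.2 are not reproduced here.
KEY_TABLE = {frozenset("op"): 0.5, frozenset("as"): 0.5, frozenset("er"): 0.5}
VISUAL_TABLE = {frozenset("il"): 0.25, frozenset("ao"): 0.5}


def spell_check(word: str, dictionary: List[str], x: int = 3, y: float = 1.0,
                w_key: float = 0.5, w_vis: float = 0.5) -> List[Tuple[str, float]]:
    """Return (candidate, R) pairs sorted by weighted edit distance R."""
    plain, key, vis = table_cost({}), table_cost(KEY_TABLE), table_cost(VISUAL_TABLE)
    # Step2: keep dictionary words whose plain Levenshtein distance is below X.
    candidates = [w for w in dictionary if edit_distance(word, w, plain) < x]
    scored = []
    for w in candidates:
        i_dist = edit_distance(word, w, key)   # Step3: keyboard edit distance I
        j_dist = edit_distance(word, w, vis)   # Step4: visual edit distance J
        r = i_dist * w_key + j_dist * w_vis    # Step5: weighted edit distance R
        if r <= y:                             # assumed comparison against threshold Y
            scored.append((w, r))
    return sorted(scored, key=lambda pair: pair[1])


print(spell_check("wprd", ["word", "work", "ward", "world", "banana"]))
# [('word', 0.75), ('ward', 1.0)]
```

In this toy run "word" ranks ahead of "ward" because the keys o and p are listed as neighbours in the hypothetical keyboard table, which is the kind of narrowing the weighted distance is meant to provide.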
Further, in Step0.1, the proximity between letters is derived from how the hands control the keyboard. From these proximities a letter-to-letter keyboard edit distance table can be drawn up, which is the keyboard letter proximity database; Step3 then uses it to compute the keyboard edit distance I(A, B_i), i ∈ [1, n], between word A and B_i.
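The text does not spell out the rule behind the keyboard edit distance table. One possible stand-in, shown purely as an assumption, is to place the letters on an approximate QWERTY grid and use the capped, normalised Euclidean distance between two keys as their substitution cost. The row offsets, the `radius` parameter, and the names `POS`, `key_cost`, and `KEY_TABLE` are all hypothetical; a rule based on finger assignment, as the patent describes, would give different values.

```python
import math
from itertools import combinations

# Approximate QWERTY layout; the horizontal row offsets are an assumption.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSET = [0.0, 0.25, 0.75]

# Map each letter to an (x, y) position measured in key widths.
POS = {ch: (col + ROW_OFFSET[r], float(r))
       for r, row in enumerate(ROWS)
       for col, ch in enumerate(row)}


def key_cost(x: str, y: str, radius: float = 3.0) -> float:
    """Substitution cost in [0, 1]: physically close keys are cheap to confuse."""
    if x == y:
        return 0.0
    (x1, y1), (x2, y2) = POS[x], POS[y]
    return min(1.0, math.hypot(x1 - x2, y1 - y2) / radius)


# Precompute the symmetric letter-to-letter table (the "database" of Step0.1).
KEY_TABLE = {frozenset(pair): key_cost(*pair) for pair in combinations(POS, 2)}

print(len(KEY_TABLE))                            # 325 unordered letter pairs
print(key_cost("o", "p") < key_cost("a", "p"))   # True
```

Storing the table under unordered pairs keeps it symmetric, so looking up (o, p) and (p, o) yields the same cost.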
Further, in Step0.2, the similarity between two letters, or between a letter and a digit, is judged by human vision. From the resulting rule a visual edit distance table can be drawn up, which is the letter visual-similarity database; Step4 then uses it to compute the visual edit distance J(A, B_i), i ∈ [1, n], between word A and B_i.
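Because the visual table is built by manual inspection, a sketch can only show its shape. The Python fragment below stores hand-assigned costs for a few look-alike pairs, including letter and digit pairs, and falls back to a cost of 1 for every pair not listed. Every value and every name (`VISUAL_PAIRS`, `build_visual_table`, `visual_cost`) is hypothetical and is not taken from the patent.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical hand-assigned costs in [0, 1]: lower means the glyphs look more alike.
VISUAL_PAIRS: Dict[Tuple[str, str], float] = {
    ("l", "1"): 0.1,
    ("o", "0"): 0.1,
    ("i", "l"): 0.2,
    ("u", "v"): 0.3,
    ("c", "e"): 0.4,
    ("m", "n"): 0.4,
    ("b", "d"): 0.5,
}


def build_visual_table(pairs: Dict[Tuple[str, str], float]) -> Dict[FrozenSet[str], float]:
    """Store each pair under an unordered key so that lookups are symmetric."""
    return {frozenset(pair): cost for pair, cost in pairs.items()}


VISUAL_TABLE = build_visual_table(VISUAL_PAIRS)


def visual_cost(x: str, y: str, default: float = 1.0) -> float:
    """Substitution cost between two characters; unlisted pairs get the default cost."""
    if x == y:
        return 0.0
    return VISUAL_TABLE.get(frozenset((x, y)), default)


print(visual_cost("1", "l"))  # 0.1
print(visual_cost("x", "o"))  # 1.0
```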
Further, the threshold X in Step2 is generally 3 but may be adjusted slightly as circumstances require; n is the total number of words in the English dictionary whose edit distance from word A is less than X.
Further, the weighted edit distance in Step5 is expressed as follows:
R(A, B) = I(A, B) × i + J(A, B) × j
where R(A, B) is the weighted edit distance of replacing A with B, I(A, B) is the keyboard-proximity edit distance from A to B, J(A, B) is the visual-similarity edit distance from A to B, and i and j are the weights of the keyboard edit distance and the visual edit distance, respectively.
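A short worked example may help make the weighting concrete. The numbers are invented for illustration: the weights are assumed to be i = 0.6 and j = 0.4, candidate B_1 is assumed to have I = 0.5 and J = 1.0, and candidate B_2 is assumed to have I = 1.0 and J = 1.0.

```latex
\begin{align*}
R(A, B)   &= I(A, B) \times i + J(A, B) \times j, \qquad i + j = 1 \\
R(A, B_1) &= 0.5 \times 0.6 + 1.0 \times 0.4 = 0.7 \\
R(A, B_2) &= 1.0 \times 0.6 + 1.0 \times 0.4 = 1.0
\end{align*}
```

With a threshold Y between 0.7 and 1.0, B_1 would be kept and B_2 dropped under these assumed values, even if both lie at the same plain Levenshtein distance from A.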
The beneficial effect of the invention is that it addresses the inaccuracy and excessive redundancy of spell checking of English words in current text editors: the matched set of approximate words can be narrowed to a more accurate range.
Brief description of the drawings
Fig. 1 is the overall flow chart of the invention;
Fig. 2 is an example of the keyboard letter proximity database of Step0.1;
Fig. 3 is an example of the letter visual-similarity database of Step0.2.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments.
Embodiment 1: An English word spell-checking method, the specific steps of which are as follows:
Step0.1: Build the keyboard letter proximity database. Based on how each finger controls each letter key on the keyboard, formulate a rule that reflects the degree of proximity between any two letter keys; compute the proximity between every pair of letters according to this rule and store it in a database, thereby establishing the keyboard letter proximity database;
Step0.2: Build the letter visual-similarity database. Manually inspect how similar every pair of letters looks on screen, design a rule that reflects this similarity, compute the visual error distance between every pair of letters according to the rule, and store it in a database, thereby establishing the letter visual-similarity database;
Step1: Select the word A to be spell-checked;
Step2: Traverse the English dictionary and approximately match word A against its entries, using edit distance as the measure. With the edit distance threshold set to X, filter out the candidate word set B = {w_1, w_2, w_3, …, w_n}, where the size n is determined by the threshold X;
Step3: Using the keyboard letter proximity database, compute the keyboard-proximity edit distance I(A, B_i) between word A and each element w_i of the word set B = {w_1, w_2, w_3, …, w_n}, i ∈ [1, n];
Step4: Using the letter visual-similarity database, compute the visual-similarity edit distance J(A, B_i) between word A and each element w_i of the word set B = {w_1, w_2, w_3, …, w_n}, i ∈ [1, n];
Step5: Let the weights of the edit distances computed in Step3 and Step4 be i and j respectively, with i + j = 1. From the edit distance I(A, B) with weight i and the edit distance J(A, B) with weight j, compute the weighted edit distance between word A and word B, R(A, B) = I(A, B) × i + J(A, B) × j, and further filter the elements of the word set B according to the weighted edit distance and a threshold Y.
Further, in Step0.1, the proximity between letters is derived from how the hands control the keyboard. From these proximities a letter-to-letter keyboard edit distance table can be drawn up, which is the keyboard letter proximity database; Step3 then uses it to compute the keyboard edit distance I(A, B_i), i ∈ [1, n], between word A and B_i.
Further, in Step0.2, the similarity between two letters, or between a letter and a digit, is judged by human vision. From the resulting rule a visual edit distance table can be drawn up, which is the letter visual-similarity database; Step4 then uses it to compute the visual edit distance J(A, B_i), i ∈ [1, n], between word A and B_i.
Further, the threshold X in Step2 is generally 3 but may be adjusted slightly as circumstances require; n is the total number of words in the English dictionary whose edit distance from word A is less than X.
Further, the weighted edit distance in Step5 is expressed as follows:
R(A, B) = I(A, B) × i + J(A, B) × j
where R(A, B) is the weighted edit distance of replacing A with B, I(A, B) is the keyboard-proximity edit distance from A to B, J(A, B) is the visual-similarity edit distance from A to B, and i and j are the weights of the keyboard edit distance and the visual edit distance, respectively.
The embodiments of the invention have been explained in detail above with reference to the drawings, but the invention is not limited to these embodiments; within the knowledge of a person skilled in the art, various changes may also be made without departing from the concept of the invention.

Claims (2)

1. An English word spell-checking method, characterized in that:
Step1: Select the word A to be spell-checked;
Step2: Traverse the English dictionary and approximately match word A against its entries, using edit distance as the measure. With the edit distance threshold set to X, filter out the candidate word set B = {w_1, w_2, w_3, …, w_n}, where the size n is determined by the threshold X and n is the total number of words in the English dictionary whose edit distance from word A is less than X;
Step3: Using the keyboard letter proximity database, compute the keyboard-proximity edit distance I(A, B_i) between word A and each element w_i of the word set B = {w_1, w_2, w_3, …, w_n}, i ∈ [1, n];
Step4: Using the letter visual-similarity database, compute the visual-similarity edit distance J(A, B_i) between word A and each element w_i of the word set B = {w_1, w_2, w_3, …, w_n}, i ∈ [1, n];
Step5: Let the weights of the edit distances computed in Step3 and Step4 be i and j respectively, with i + j = 1. From the edit distance I(A, B) with weight i and the edit distance J(A, B) with weight j, compute the weighted edit distance between word A and word B, R(A, B) = I(A, B) × i + J(A, B) × j, and further filter the elements of the word set B according to the weighted edit distance and a threshold Y.
2. The English word spell-checking method according to claim 1, characterized in that the weighted edit distance in Step5 is expressed as follows:
R(A, B) = I(A, B) × i + J(A, B) × j
where R(A, B) is the weighted edit distance of replacing A with B, I(A, B) is the keyboard-proximity edit distance from A to B, J(A, B) is the visual-similarity edit distance from A to B, and i and j are the weights of the keyboard edit distance and the visual edit distance, respectively.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810555195.4A 2018-06-01 2018-06-01 An English word spell-checking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810555195.4A 2018-06-01 2018-06-01 An English word spell-checking method

Publications (1)

Publication Number Publication Date
CN109033065A (en) 2018-12-18

Family

Family ID: 64611938

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN201810555195.4A Pending CN109033065A (en) 2018-06-01 2018-06-01 An English word spell-checking method

Country Status (1)

Country Link
CN (1) CN109033065A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916263A (en) * 2010-07-27 2010-12-15 武汉大学 Fuzzy keyword query method and system based on weighing edit distance
CN103299550A (en) * 2010-11-04 2013-09-11 纽昂斯通讯公司 Spell-check for a keyboard system with automatic correction
CN103885938A (en) * 2014-04-14 2014-06-25 东南大学 Industry spelling mistake checking method based on user feedback
CN105975625A (en) * 2016-05-26 2016-09-28 同方知网数字出版技术股份有限公司 Chinglish inquiring correcting method and system oriented to English search engine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIRUIDUNI: ""最小编辑距离,键盘距离与拼写纠正"", 《HTTPS://BLOG.CSDN.NET/QIRUIDUNI/ARTICLE/DETAILS/25861799》, 15 May 2014 (2014-05-15), pages 1 - 3 *

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 2018-12-18)