CN108628827A - Candidate word evaluation method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN108628827A (application number CN201810321061.6A)
- Authority
- CN
- China
- Prior art keywords
- word
- candidate
- candidate word
- wrong
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a candidate word evaluation method, a candidate word evaluation device, computer equipment and a storage medium, which are applied to the field of data processing. The method comprises the following steps: detecting wrong words, and acquiring a plurality of candidate words corresponding to the wrong words; determining the editing distance between each candidate word and the wrong word; respectively replacing the wrong words with the candidate words to obtain candidate sentences, and determining the evaluation probability of the corresponding candidate words according to the candidate sentences, wherein the evaluation probability is obtained according to the language environment probability of the candidate words in the candidate sentences and the language environment probability of the adjacent words of the candidate words; acquiring error information of the wrong word relative to each candidate word; and determining the evaluation score corresponding to each candidate word according to the editing distance, the evaluation probability and the error information. The embodiment of the invention solves the problem of low evaluation reliability of the candidate words and is beneficial to improving the reliability of the evaluation result of the candidate words.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a candidate word evaluation method and device, computer equipment, and a storage medium.
Background
Currently popular word processors, such as Word, WPS and WordPerfect, embed an English spell-checking function. When this function finds a misspelled word, it provides a prompt or a corresponding correction suggestion.
In implementing the present invention, the inventor found the following problem in the prior art: existing error correction methods mainly rely on dictionary-based detection and, once a misspelling is found, evaluate the candidate words for the wrong word by edit distance alone. This approach is too simple and rigid, and the reliability of its candidate word evaluation results is not satisfactory.
Summary of the invention
On this basis, to address the problem that existing approaches do not evaluate candidate words accurately enough, it is necessary to provide a candidate word evaluation method and device, computer equipment, and a storage medium.
The solution provided by the embodiments of the present invention includes:
A candidate word evaluation method, comprising the following steps: detecting a wrong word and obtaining a plurality of candidate words corresponding to the wrong word; determining the edit distance between each candidate word and the wrong word; replacing the wrong word with each candidate word in turn to obtain candidate sentences, and determining the evaluation probability of the corresponding candidate word according to each candidate sentence, where the evaluation probability is obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probability of the words adjacent to the candidate word; obtaining error information of the wrong word relative to each candidate word; and determining the evaluation score corresponding to each candidate word according to the edit distance, the evaluation probability and the error information.
A candidate word evaluation device, comprising: a candidate word acquisition module, configured to detect a wrong word and obtain a plurality of candidate words corresponding to the wrong word; a distance determining module, configured to determine the edit distance between each candidate word and the wrong word; a second probability determining module, configured to replace the wrong word with each candidate word in turn to obtain candidate sentences, and to determine the evaluation probability of the corresponding candidate word according to each candidate sentence, where the evaluation probability is obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probability of the words adjacent to the candidate word; an error information acquisition module, configured to obtain error information of the wrong word relative to each candidate word; and a ninth evaluation module, configured to determine the evaluation score corresponding to each candidate word according to the edit distance, the evaluation probability and the error information.
The above candidate word evaluation method and device determine the edit distance between each candidate word and the wrong word, calculate the evaluation probability of each candidate word at the position of the wrong word, and determine the score of each candidate word according to the edit distance, the evaluation probability and the error information, so that the correction word can then be determined from the candidate words. Both the way words are miswritten and the contextual language environment are taken into account, which helps improve the accuracy of the candidate word evaluation result.
Computer equipment, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above candidate word evaluation method when executing the computer program.
Through the computer program running on the processor, the above computer equipment helps improve the accuracy of the candidate word evaluation result.
A computer storage medium on which a computer program is stored, where the program implements the above candidate word evaluation method when executed by a processor.
Through the computer program it stores, the above computer storage medium helps improve the accuracy of the candidate word evaluation result.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the candidate word evaluation method in one embodiment;
Fig. 2 is a schematic flow chart of the candidate word evaluation method of the first embodiment;
Fig. 3 is a schematic flow chart of the candidate word evaluation method of the second embodiment;
Fig. 4 is a schematic flow chart of the candidate word evaluation method of the third embodiment;
Fig. 5 is a schematic flow chart of the candidate word evaluation method of the fourth embodiment;
Fig. 6 is a schematic flow chart of the candidate word evaluation method of the fifth embodiment;
Fig. 7 is a schematic flow chart of the candidate word evaluation method of the sixth embodiment;
Fig. 8 is a schematic flow chart of the candidate word evaluation method of the seventh embodiment;
Fig. 9 is a schematic flow chart of the candidate word evaluation method of the eighth embodiment;
Fig. 10 is a schematic flow chart of the candidate word evaluation method of the ninth embodiment;
Fig. 11 is a schematic flow chart of the candidate word evaluation method of the tenth embodiment;
Fig. 12 is a schematic flow chart of the candidate word evaluation method of the eleventh embodiment;
Fig. 13 is a schematic flow chart of the candidate word evaluation method of the twelfth embodiment;
Fig. 14 is a schematic flow chart of the candidate word evaluation method of the thirteenth embodiment;
Fig. 15 is a schematic diagram of the candidate word evaluation device of the fourteenth embodiment;
Fig. 16 is a schematic diagram of the candidate word evaluation device of the fifteenth embodiment;
Fig. 17 is a schematic diagram of the candidate word evaluation device of the sixteenth embodiment;
Fig. 18 is a schematic diagram of the candidate word evaluation device of the seventeenth embodiment;
Fig. 19 is a schematic diagram of the candidate word evaluation device of the eighteenth embodiment;
Fig. 20 is a schematic diagram of the candidate word evaluation device of the nineteenth embodiment;
Fig. 21 is a schematic diagram of the candidate word evaluation device of the twentieth embodiment;
Fig. 22 is a schematic diagram of the candidate word evaluation device of the twenty-first embodiment;
Fig. 23 is a schematic diagram of the candidate word evaluation device of the twenty-second embodiment;
Fig. 24 is a schematic diagram of the candidate word evaluation device of the twenty-third embodiment;
Fig. 25 is a schematic diagram of the candidate word evaluation device of the twenty-fourth embodiment;
Fig. 26 is a schematic diagram of the candidate word evaluation device of the twenty-fifth embodiment;
Fig. 27 is a schematic diagram of the candidate word evaluation device of the twenty-sixth embodiment.
Detailed description
In order to make the purpose, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The candidate word evaluation method provided by the present application can be applied in the application environment shown in Fig. 1. The internal structure of the computer equipment may be as shown in Fig. 1 and includes a processor, a memory, a display screen and an input device connected through a system bus. The processor of the computer equipment provides computing and control capabilities. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. When executed by the processor, the computer program implements a candidate word evaluation method. The display screen of the computer equipment may be a liquid crystal display or an electronic ink display. The input device of the computer equipment may be a touch layer covering the display screen; a button, trackball or trackpad arranged on the housing of the computer equipment; or an external keyboard, trackpad or mouse.
The computer equipment may be a terminal, including but not limited to various personal computers, laptops, smartphones and intelligent interactive tablets. An intelligent interactive tablet can detect and recognize the user's writing operations, perform error detection on the written content, and even automatically correct miswritten words.
Those skilled in the art will understand that the structure shown in Fig. 1 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer equipment to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
As shown in Fig. 2, in the first embodiment, a candidate word evaluation method is provided, comprising the following steps:
Step S11: detect a wrong word and obtain a plurality of candidate words corresponding to the wrong word.
A wrong word is a misspelled or miswritten word, that is, a word that does not exist in the corresponding dictionary. A wrong word may be any text item, such as a symbol, a character, a word or an English word. Candidate words are words that are similar to the wrong word and are determined from it; each is potentially the correct word (the correction word) for the wrong word. There may be one, two, three or more candidate words; the embodiments of the present invention do not limit their number. If only one candidate word is determined, that candidate word is used directly as the correction word.
The wrong word may be detected in various ways, for example by dictionary lookup; the embodiments of the present invention do not limit how the wrong word is detected.
Step S12: determine the edit distance between each candidate word and the wrong word.
The edit distance between a candidate word and the wrong word measures how much the candidate word differs from the wrong word; it may refer, for example, to the difference in the number of letters or the difference in strokes between them. For character strings or English words, the edit distance is the minimum number of edit operations required to change one string into the other, where the permitted edit operations are substituting one character, inserting one character and deleting one character.
In one embodiment, the edit distance values are stored in a two-dimensional array d[i, j], which represents the minimum number of steps required to convert the string s[1…i] into the string t[1…j]. When i equals 0, the string s is empty, and d[0, j] corresponds to inserting j characters so that s is converted into t; when j equals 0, the string t is empty, and d[i, 0] corresponds to deleting i characters so that s is converted into t. For example, the edit distance between kitten and sitting is 3, because at least the following 3 steps are needed: sitten (k → s), sittin (e → i), sitting (→ g, inserting g at the end).
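The d[i, j] table described above can be sketched in Python as follows. This is a minimal illustration of the standard dynamic-programming recurrence for edit distance under the three permitted operations, not code taken from the patent; the function name `edit_distance` is a hypothetical choice.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of single-character substitutions, insertions
    and deletions needed to turn s into t."""
    # d[i][j] is the distance between the first i characters of s
    # and the first j characters of t.
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i  # t is empty: delete all i characters of s
    for j in range(len(t) + 1):
        d[0][j] = j  # s is empty: insert all j characters of t
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete s[i-1]
                d[i][j - 1] + 1,         # insert t[j-1]
                d[i - 1][j - 1] + cost,  # substitute, or keep if equal
            )
    return d[len(s)][len(t)]


print(edit_distance("kitten", "sitting"))  # the example from the text
```

The full table is kept here to mirror the description; an implementation concerned with memory could keep only the previous row.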
For example, for a wrong word, all known words in the dictionary whose edit distance from it is 1 or 2 are selected to build the candidate word set for that wrong word. Depending on the actual situation, the candidate word set may also be determined using other edit distance conditions. Screening the candidate words for the wrong word in this way improves the reliability of the candidate word evaluation result and also helps reduce subsequent computation time.
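One possible way to build such a candidate set is sketched below. It assumes lowercase English words and a dictionary held as a Python set; instead of computing the edit distance to every dictionary entry, it enumerates every string within one or two edits of the wrong word and intersects the result with the dictionary. The names `single_edits` and `candidate_set` and the toy dictionary are hypothetical, and the patent does not prescribe this enumeration strategy.

```python
import string


def single_edits(word: str, alphabet: str = string.ascii_lowercase) -> set:
    """All strings reachable from word by exactly one substitution,
    insertion or deletion of a character."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutes = {left + c + right[1:]
                   for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return (deletes | substitutes | inserts) - {word}


def candidate_set(wrong_word: str, dictionary: set) -> set:
    """Dictionary words whose edit distance from wrong_word is 1 or 2."""
    one_edit = single_edits(wrong_word)
    two_edits = {w2 for w1 in one_edit for w2 in single_edits(w1)}
    return ((one_edit | two_edits) & dictionary) - {wrong_word}


# Toy dictionary, for illustration only.
toy_dictionary = {"some", "same", "one", "have", "apples"}
print(sorted(candidate_set("sone", toy_dictionary)))
```

For long words or large alphabets the two-edit enumeration grows quickly, so filtering the dictionary by computed edit distance may be preferable in that case.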
Step S13: obtain error information of the wrong word relative to each candidate word.
The error information of the wrong word relative to each candidate word characterizes the difference between the wrong word and that candidate word. Optionally, the error information can be determined from the user's usual habits when writing words, for example that the user is especially prone to omitting characters, or that a particular part of a word is most error-prone.
In one embodiment, the error information of the wrong word relative to each candidate word may be whether the first letters of the wrong word and the candidate word are the same, whether the wrong word and the candidate word have the same number of characters, whether the wrong word and the candidate word have the same radical, or whether the wrong word contains illegal symbols, and so on.
Step S14: determine the evaluation score corresponding to each candidate word according to the edit distance and the error information.
The evaluation score obtained in this step reflects how likely each candidate word is to be the correction word (the correct word) for the wrong word. In effect, this step uses the edit distance and the error information between the candidate word and the wrong word to assess the likelihood that the candidate word is the correction word for the wrong word.
In the candidate word evaluation method of the above embodiment, when a wrong word is detected, a plurality of corresponding candidate words are first obtained; the edit distance between each candidate word and the wrong word and the error information of the wrong word relative to each candidate word are determined; and the evaluation score of each candidate word is determined from the edit distance and the error information. Both the user's writing habits and the edit distance between words are taken into account, which helps improve the reliability of the candidate word evaluation result.
In one embodiment, determining the evaluation score of each candidate word according to the edit distance and the error information includes: determining the evaluation score of each candidate word according to the reciprocal of the edit distance and the error information.
In general, the larger the edit distance, the more the candidate word differs from the wrong word and the less likely it is to be the correction word. Using the reciprocal of the edit distance together with the error information therefore helps ensure the reliability of the evaluation score.
In one embodiment, the error information of the wrong word relative to each candidate word includes whether the wrong word and the candidate word have the same first letter. Correspondingly, the above step of determining the evaluation score of each candidate word according to the edit distance and the error information includes: if the wrong word and the candidate word have the same first letter, calculating the evaluation score of the candidate word from the reciprocal of the edit distance and a first coefficient; if the first letters differ, calculating the evaluation score of the candidate word from the reciprocal of the edit distance and a second coefficient.
For example, assume that the error information of the wrong word relative to each candidate word is denoted by K, the edit distance by $D_{edit}$, and the evaluation score of the candidate word by $score_{word}$. The score of each candidate word can then be calculated as:
$score_{word} = K \times 1/D_{edit}$;
where the value of K is chosen according to whether the candidate word and the wrong word have the same first letter: if they do, K takes the value K1; otherwise, K takes the value K2. K1 and K2 are preset values. For example, K is 1 when the first letters are the same and 0.5 when they differ.
For English words or character strings, given users' writing habits, the first letter is generally not wrong. Assessing the likelihood that each candidate word is the correction word on the basis of the above embodiment therefore yields a more reasonable evaluation score: other conditions being equal, a candidate word with the same first letter as the wrong word is more likely to be the correction word, and one with a different first letter is less likely. It should be understood that the values of K include but are not limited to the example values above.
For example, suppose that a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, in the context I have sone apples. The candidate words for sone are evaluated as follows:
1. Check against an English dictionary; sone is not found in the dictionary, so sone is determined to be a wrong word.
2. Obtain the candidate words for sone; assume they are some, same and one.
3. Determine the edit distances between sone and some, same and one, which are 1, 2 and 1 respectively.
4. Calculate the score of each candidate word using the formula $score_{word} = K \times 1/D_{edit}$:
$score_{some} = 1 \times 1 = 1$;
$score_{same} = 1 \times 1/2 = 0.5$;
$score_{one} = 0.5 \times 1 = 0.5$.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (the correction word) for the wrong word sone.
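The scoring in step 4 can be reproduced with a short Python sketch. It takes the edit distances stated in step 3 as given inputs and uses the example coefficients K1 = 1 and K2 = 0.5; the function name `evaluation_score` and its parameter names are hypothetical.

```python
def evaluation_score(wrong_word: str, candidate: str, edit_dist: int,
                     k_same: float = 1.0, k_diff: float = 0.5) -> float:
    """score_word = K * 1 / D_edit, where K depends on whether the first
    letters of the wrong word and the candidate word match."""
    # edit_dist is assumed to be at least 1: the wrong word is not in the
    # dictionary, so no candidate is identical to it.
    k = k_same if candidate[:1] == wrong_word[:1] else k_diff
    return k * (1.0 / edit_dist)


# Edit distances between "sone" and each candidate, as stated in step 3.
distances = {"some": 1, "same": 2, "one": 1}
scores = {c: evaluation_score("sone", c, d) for c, d in distances.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Note that same and one tie at 0.5 under these example coefficients; the first embodiment's score alone does not separate them.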
The evaluation score finally obtained for each candidate word thus takes into account both the user's writing habits and the edit distance between words. Compared with traditional candidate word evaluation based on edit distance alone, this helps improve the reliability of the candidate word evaluation result and the accuracy of the correction word.
As shown in Fig. 3, in the second embodiment, a candidate word evaluation method includes the following steps:
Step S21: detect a wrong word and obtain a plurality of candidate words corresponding to the wrong word.
For the implementation of this step, refer to the corresponding step S11 of the above embodiment; it is not repeated here.
Step S22: determine the similarity between each candidate word and the wrong word, where the similarity is obtained from the longest common subsequence and/or the longest common substring of the candidate word and the wrong word.
The longest common subsequence problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (usually two). A sequence that is a subsequence of each of two or more known sequences, and is the longest of all sequences satisfying this condition, is called the longest common subsequence of the known sequences.
The longest common substring problem is the problem of finding the longest substring common to all sequences in a set of sequences (usually two). A string that is a substring of each of two or more known strings, and is the longest of all strings satisfying this condition, is called the longest common substring of the known strings.
A subsequence is defined on sequences; it preserves order but need not be contiguous. For example, the sequence X = <B, C, D, B> is a subsequence of the sequence Y = <A, B, C, B, D, A, B>, with corresponding index sequence <2, 3, 5, 7>. A substring is defined on character strings; it preserves order and must be contiguous. For example, the string a = abcd is a substring of the string c = aaabcdddd, but the string b = acdddd is not a substring of c.
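The difference between the two notions can be checked with a small Python sketch using the examples above. The helper names `is_subsequence` and `is_substring` are hypothetical and are introduced only for illustration.

```python
def is_subsequence(x, y) -> bool:
    """True if the items of x appear in y in the same order,
    not necessarily contiguously."""
    remaining = iter(y)
    # `item in remaining` advances the iterator past the first match,
    # so each later item must be found after the previous one.
    return all(item in remaining for item in x)


def is_substring(a: str, c: str) -> bool:
    """True if a appears in c as one contiguous block."""
    return a in c


print(is_subsequence("BCDB", "ABCBDAB"))    # X is a subsequence of Y
print(is_substring("abcd", "aaabcdddd"))    # a is a substring of c
print(is_substring("acdddd", "aaabcdddd"))  # b is not a substring of c
print(is_subsequence("acdddd", "aaabcdddd"))
```

The last call shows that b = acdddd, although not a substring of c, is still a subsequence of it.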
The longest common subsequence and/or longest common substring reflects, to some extent, how many characters the candidate word and the wrong word have in common, so a similarity between the candidate word and the wrong word determined on this basis has a concrete physical meaning.
Step S23: obtain error information of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment above.
Step S24: determine the evaluation score corresponding to each candidate word according to the similarity and the error information.
The evaluation score obtained in this step reflects how likely each candidate word is to be the correction word for the wrong word.
In the candidate word evaluation method of the above embodiment, when a wrong word is detected, a plurality of corresponding candidate words are first obtained; the similarity between each candidate word and the wrong word is determined, where the similarity is obtained from the longest common subsequence and/or the longest common substring of the candidate word and the wrong word; the error information of the wrong word relative to each candidate word is determined; and the evaluation score of each candidate word is determined from the similarity and the error information. Both the user's writing habits and the similarity between words are taken into account, which helps improve the reliability of the candidate word evaluation result.
In one embodiment, the step of determining the similarity between each candidate word and the wrong word in the above embodiment may specifically be: calculating the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of the candidate word and the wrong word.
The longest common subsequence rate, denoted $D_{lcs1}$, is the length of the longest common subsequence divided by the mean length of the sequences in the set. The longest common substring rate, denoted $D_{lcs2}$, is the length of the longest common substring divided by the mean length of the sequences in the set. For example, the two rates can be calculated as follows: for the sequence set {dbcabca, cbaba}, the longest common subsequence is baba, of length 4, so the longest common subsequence rate is 4 / ((7 + 5) / 2) ≈ 0.67; for the sequence set {babca, cbaba}, the longest common substring is bab, of length 3, so the longest common substring rate is 3 / ((5 + 5) / 2) = 0.6.
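The two rates can be computed with the standard dynamic-programming recurrences for the longest common subsequence and the longest common substring, as in the following Python sketch. The function names are hypothetical, and the patent does not specify which algorithm is used to find the common subsequence or substring.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the match diagonally, or carry over the best so far.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous block shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # A mismatch resets the run length, because substrings
            # must be contiguous.
            cur.append(prev[j - 1] + 1 if x == y else 0)
            best = max(best, cur[j])
        prev = cur
    return best


def rate(common_length: int, a: str, b: str) -> float:
    """Common length divided by the mean length of the two sequences.
    Assumes a and b are not both empty."""
    return common_length / ((len(a) + len(b)) / 2)


d_lcs1 = rate(lcs_length("dbcabca", "cbaba"), "dbcabca", "cbaba")
d_lcs2 = rate(longest_common_substring_length("babca", "cbaba"), "babca", "cbaba")
print(round(d_lcs1, 2), round(d_lcs2, 2))
```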
The longest common subsequence rate and the longest common substring rate consider not only the number of characters shared by the candidate word and the wrong word but also the proportion of each word that the shared characters make up, which helps further improve the accuracy of the similarity between the candidate word and the wrong word.
Optionally, if the longest common subsequence rate is denoted by $D_{lcs1}$, the longest common substring rate by $D_{lcs2}$, and the similarity between each candidate word and the wrong word by S, the similarity can be calculated in any of the following ways:
$S = w_1 D_{lcs1}$;
or $S = w_2 D_{lcs2}$;
or $S = w_1 D_{lcs1} + w_2 D_{lcs2}$.
Here $w_1$ and $w_2$ are preset weight coefficients whose sum is 1.
Determining the similarity between the candidate word and the wrong word as a weighted sum of the longest common subsequence rate and the longest common substring rate gives an accurate similarity. In general, the longest common subsequence rate $D_{lcs1}$ has a greater influence on the similarity of two words, so its weight coefficient is larger than that of the longest common substring rate $D_{lcs2}$, for example:
$S = 0.7 D_{lcs1} + 0.3 D_{lcs2}$.
In another embodiment, the step of determining the similarity between each candidate word and the wrong word may also be: calculating the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of the candidate word and the wrong word, together with the edit distance between the candidate word and the wrong word.
Optionally, the similarity between each candidate word and the wrong word may be calculated as:
$S = 1/D_{edit} + w_1 D_{lcs1}$;
or $S = 1/D_{edit} + w_2 D_{lcs2}$;
or $S = 1/D_{edit} + w_1 D_{lcs1} + w_2 D_{lcs2}$.
Specifically, for example:
$S = 1/D_{edit} + 0.7 D_{lcs1}$;
or $S = 1/D_{edit} + 0.3 D_{lcs2}$;
or $S = 1/D_{edit} + 0.7 D_{lcs1} + 0.3 D_{lcs2}$.
Here $D_{edit}$ is the edit distance, that is, the minimum number of edit operations required to change one string into the other, where the permitted edit operations are substituting one character, inserting one character and deleting one character. In general, the smaller the edit distance, the more similar the two strings.
The similarity between two words can thus be determined by combining the edit distance and the number of shared characters; adding edit distance as an evaluation dimension helps improve the validity of the similarity calculation.
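As an illustrative calculation that is not part of the original text, consider the wrong word sone and the candidate word some with the example weights 0.7 and 0.3. Their edit distance is 1 (n → m), their longest common subsequence is s, o, e (length 3) and their longest common substring is so (length 2); both words have length 4, so the mean length is 4:

```latex
\begin{aligned}
D_{edit} &= 1, \qquad
D_{lcs1} = \frac{3}{(4+4)/2} = 0.75, \qquad
D_{lcs2} = \frac{2}{(4+4)/2} = 0.5 \\
S &= \frac{1}{D_{edit}} + 0.7\,D_{lcs1} + 0.3\,D_{lcs2}
   = 1 + 0.7 \times 0.75 + 0.3 \times 0.5 = 1.675
\end{aligned}
```

Because the reciprocal of the edit distance is added without a weight, it dominates the sum when the edit distance is 1; this follows from the formula as given rather than from any additional assumption.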
It is appreciated that longest common subsequence rate D in above-mentioned formulalsc1, Longest Common Substring rate Dlsc2Weight system before
Number is also to take other numerical value, including but not limited to above example.
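As an illustration of the edit-distance term, the sketch below computes the edit distance with the three permitted operations and combines it with the two rates. The function names are hypothetical, and the rates are passed in as plain numbers so that the block stands on its own.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character replacements, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # replace (or keep when equal)
            ))
        prev = curr
    return prev[-1]


def similarity_with_edit_distance(d_edit: int, d_lcs1: float = 0.0, d_lcs2: float = 0.0,
                                  w1: float = 0.7, w2: float = 0.3) -> float:
    """S = 1/Dedit + w1*Dlcs1 + w2*Dlcs2; pass 0.0 for a rate that is not used."""
    if d_edit <= 0:
        raise ValueError("edit distance must be positive: a candidate differs from the wrong word")
    return 1 / d_edit + w1 * d_lcs1 + w2 * d_lcs2


print(edit_distance("same", "sone"))
print(similarity_with_edit_distance(2, d_lcs1=0.5, d_lcs2=0.25))
```

The guard on `d_edit` reflects that a candidate word is by construction different from the wrong word, so a zero distance would indicate a caller error rather than a valid input.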
In one embodiment, the error information of the wrong word relative to each candidate word includes information on whether the wrong word and the candidate word have the same first letter. Accordingly, the step of determining the assessment score of each candidate word from the similarity and the error information includes: if the wrong word and the candidate word have the same first letter, calculating the assessment score of the candidate word from the similarity and a first coefficient; if their first letters differ, calculating the assessment score of the candidate word from the similarity and a second coefficient.
Optionally, let S denote the similarity and K denote the error information of the wrong word relative to each candidate word. The score of each candidate word may then be calculated as:
scoreword=K × S;
The value of K depends on whether the candidate word and the wrong word have the same first letter: if they do, K takes the value K1; otherwise K takes the value K2, where K1 and K2 are preset values. For example, K is 1 when the first letters are the same and 0.5 when they differ.
For English words or character strings, users' writing habits mean that the first letter is rarely wrong. Taking this into account when assessing how likely each candidate word is to be the correction of the wrong word gives a more reasonable assessment score: other conditions being equal, a candidate word with the same first letter as the wrong word is more likely to be the error correction word, and one with a different first letter is less likely. It should be understood that the value of K is not limited to the example values above.
For example, suppose a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, in the context I have sone apples. The candidate words for sone are assessed as follows:
1. A lookup in the English dictionary finds that sone is not present, so sone is determined to be a wrong word.
2. The words in the English dictionary whose edit distance from sone is 1 or 2 are obtained; suppose these are some, same and one, which serve as the candidate words.
3. The longest common subsequence rate and the longest common substring rate of some, same and one with sone are determined, and the similarity of each candidate word is calculated as follows:
For the candidate word some and the wrong word sone, the longest common subsequence is soe, with length 3, so the longest common subsequence rate is 3/((4+4)/2)=0.75; the longest common substring is so, with length 2, so the longest common substring rate is 2/((4+4)/2)=0.5;
For the candidate word same and the wrong word sone, the longest common subsequence is se, with length 2, so the longest common subsequence rate is 2/((4+4)/2)=0.5; the longest common substring is s or e, with length 1, so the longest common substring rate is 1/((4+4)/2)=0.25;
For the candidate word one and the wrong word sone, the longest common subsequence is one, with length 3, so the longest common subsequence rate is 3/((3+4)/2)=0.86; the longest common substring is one, with length 3, so the longest common substring rate is 3/((3+4)/2)=0.86;
According to the formula S=1/Dedit+0.7Dlcs1+0.3Dlcs2, the similarity S of each candidate word to the wrong word is:
Some:1+0.7 × 0.75+0.3 × 0.5=1.675;
Same:1/2+0.7 × 0.5+0.3 × 0.25=0.925;
One:1+0.7 × 0.86+0.3 × 0.86=1.86.
4, according to formula scoreword=K × S calculates the assessment score of each candidate word:
scoresome=1 × 1.675=1.675;
scoresame=1 × 0.925=0.925;
scoreone=0.5 × 1.86=0.93.
It follows that, compared with the other candidate words, some is the most likely to be the correct word (error correction word) for the wrong word sone.
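The final scoring step of this example can be reproduced with a short sketch. The function names are hypothetical, the similarity values are those computed in step 3, and comparing first letters case-insensitively is an assumption not stated in the text.

```python
def first_letter_coefficient(candidate: str, wrong: str,
                             k_same: float = 1.0, k_diff: float = 0.5) -> float:
    """K1 when the first letters match, K2 otherwise (case-insensitive by assumption)."""
    same = candidate[:1].lower() == wrong[:1].lower()
    return k_same if same else k_diff


def assessment_score(candidate: str, wrong: str, similarity: float) -> float:
    """score = K * S."""
    return first_letter_coefficient(candidate, wrong) * similarity


wrong_word = "sone"
similarities = {"some": 1.675, "same": 0.925, "one": 1.86}  # values from step 3
scores = {w: assessment_score(w, wrong_word, s) for w, s in similarities.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Although one has the highest raw similarity, its different first letter halves its score, which is what moves some to the top.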
In the candidate word assessment method of the above embodiment, the assessment score of each candidate word is positively correlated with its similarity to the wrong word and is also affected by whether its first letter differs from that of the wrong word. The resulting assessment score therefore reflects both the user's writing habits and the similarity between the words; compared with the traditional candidate word assessment method based on edit distance alone, this helps improve the reliability of the assessment result.
As shown in Fig. 4, in a third embodiment, a candidate word assessment method is provided, comprising the following steps:
Step S31: a wrong word is detected, and multiple candidate words corresponding to the wrong word are obtained.
For the implementation of this step, refer to the corresponding step S11 of the above embodiment; it is not repeated here.
Step S32: the language environment probability of each candidate word at the position of the wrong word is determined.
The language environment probability of a candidate word at the position of the wrong word is the plausibility of the candidate word in the surrounding context when it replaces the wrong word; the more plausible the candidate word is in context, the higher its language environment probability.
In one embodiment, the probability of each candidate word at the position of the wrong word is calculated with a preset language model, and the log of this probability is used as the language environment probability of the candidate word.
Step S33: the error information of the wrong word relative to each candidate word is obtained.
For the implementation of this step, refer to the description of the first embodiment above.
Step S34: the assessment score of each candidate word is determined from the language environment probability and the error information.
The assessment score obtained in this step reflects how likely each candidate word is to be the error correction word (correct word) for the wrong word.
In the candidate word assessment method of the above embodiment, when a wrong word is detected, the corresponding candidate words are obtained, the probability of each candidate word at the position of the wrong word is determined, and the error information of the wrong word relative to each candidate word is determined; the assessment score of each candidate word is then determined from the language environment probability and the error information. This takes into account both the user's writing habits and the context, which helps improve the reliability of the candidate word assessment result.
In one embodiment, the language model used to determine the language environment probability of each candidate word at the position of the wrong word includes, but is not limited to, an N-Gram model, a BiLSTM model or an LSTM model.
The N-Gram model is a statistical language model that predicts the n-th item from the preceding (n-1) items. Depending on the application, the items may be phonemes (speech recognition), characters (input methods), words (word segmentation) or base pairs (genetic information). The idea of the N-Gram model is: given a string of letters such as "for ex", which letter is most likely to come next? From training corpus data, the probabilities of the N possible next letters can be obtained by maximum likelihood estimation: the probability of a is 0.4, the probability of b is 0.0001, the probability of c is ..., subject to the constraint that all N probabilities sum to 1.
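The maximum likelihood estimate described above amounts to counting how often each letter follows a given context and normalising the counts. The sketch below shows this on a toy corpus; the corpus, the function names and the choice of n=3 are assumptions made only for illustration.

```python
from collections import Counter, defaultdict


def train_char_ngram(corpus: str, n: int):
    """Count, for every (n-1)-character context, which character follows it."""
    counts = defaultdict(Counter)
    for i in range(len(corpus) - n + 1):
        context = corpus[i:i + n - 1]
        next_char = corpus[i + n - 1]
        counts[context][next_char] += 1
    return counts


def next_char_distribution(counts, context: str) -> dict:
    """Maximum likelihood estimate P(next | context) = count(context + next) / count(context)."""
    following = counts.get(context)
    if not following:
        return {}
    total = sum(following.values())
    return {ch: c / total for ch, c in following.items()}


toy_corpus = "example exam exit"          # hypothetical training text
model = train_char_ngram(toy_corpus, n=3)
dist = next_char_distribution(model, "ex")
print(dist)                                # probabilities over the next letter
print(sum(dist.values()))                  # the estimates sum to 1
```

A real model would be trained on a large corpus and would normally add smoothing so that unseen continuations do not receive probability zero.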
The long short-term memory network model, commonly called the LSTM model, is a special kind of recurrent neural network; it can predict the next character likely to appear from an input character-level sequence.
The bidirectional long short-term memory network model, commonly called the BiLSTM model, has the same structure as the LSTM model, except that a BiLSTM is connected not only to past states but also to future states. For example, a unidirectional LSTM can be trained to predict "fish" by feeding in the letters one at a time, with the recurrent connections along the time axis remembering past state values. In a BiLSTM, the following letters of the sequence are also fed in on the backward pass, which gives the network access to future information. Training in this way lets the network fill in gaps in the information rather than only predict what comes next.
In one embodiment, when the language environment probability is expressed as the log of the probability of each candidate word at the position of the wrong word, the assessment score of each candidate word can be determined from the reciprocal of the language environment probability and the error information.
The assessment score of each candidate word is thus inversely related to the corresponding log probability, and is also affected by the information on how the candidate word differs from the wrong word. In other words, the resulting assessment score takes into account both the user's writing habits and the language environment; compared with traditional candidate word assessment methods, this helps improve the reliability of candidate word assessment.
In one embodiment, the error information of the wrong word relative to each candidate word includes information on whether the wrong word and the candidate word have the same first letter. Accordingly, the step of determining the assessment score of each candidate word from the language environment probability and the error information includes: if the wrong word and the candidate word have the same first letter, calculating the assessment score of the candidate word from the language environment probability and a first coefficient; if their first letters differ, calculating the assessment score of the candidate word from the language environment probability and a second coefficient.
Based on any of the above embodiments, if the language environment probability of a candidate word at the position of the wrong word is denoted Pmx(word), where word denotes the candidate word and mx denotes the language model, the assessment score of each candidate word is calculated by the following formula:
scoreword=-K × 1/Pmx(word);
where K takes the value K1 if the candidate word and the wrong word have the same first letter and K2 otherwise, K1 and K2 being preset values.
For example, if the language environment probability determined by the N-Gram language model is denoted PN-Gram(word), the score of each candidate word may be calculated as:
scoreword=-K × 1/PN-Gram(word);
If the language environment probabilities determined by the BiLSTM model and the LSTM model are denoted PBiLSTM(word) and PLSTM(word), the score of each candidate word may be calculated respectively as:
scoreword=-K × 1/PBiLSTM(word);
scoreword=-K × 1/PLSTM(word);
Here the value of K is chosen according to whether the candidate word and the wrong word have the same first letter: K is 1 when they are the same and 0.5 when they differ. It should be understood that the value of K is not limited to these example values.
For English words or character strings, users' writing habits mean that the first letter is rarely wrong. Taking this into account when assessing how likely each candidate word is to be the correction of the wrong word gives a more reasonable assessment score: other conditions being equal, a candidate word with the same first letter as the wrong word is more likely to be the error correction word, and one with a different first letter is less likely.
For example, suppose a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, in the context I have sone apples.
For the N-Gram model, with N taken as 3:
1. A lookup in the English dictionary finds that sone is not present, so sone is determined to be a wrong word.
2. The words in the English dictionary whose edit distance from sone is 1 or 2 are obtained; suppose these are some, same, son and one, which serve as the candidate words.
3. some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated with the N-Gram model. The log values obtained, as used in step 4 below, are -5.7 for some, -6 for same, -6 for son and -4.8 for one.
Here, P(I, have, some)=c(I, have, some)/c(3-Gram), where c denotes a count: the number of occurrences of the 3-tuple I, have, some in the corpus divided by the count of all 3-tuples;
P(have|I)=c(I, have)/c(I): the number of occurrences of the 2-tuple I, have in the corpus divided by the count of I;
P(I)=c(I)/c(1-Gram): the number of occurrences of the 1-tuple I in the corpus divided by the count of all 1-tuples.
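4. The assessment score of each candidate word is calculated by the formula scoreword=-K × 1/P, where P is the language environment probability (log value) obtained in step 3 and K is 1 when the first letters are the same and 0.5 otherwise: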
scoresome=-1 × 1/-5.7=0.175;
scoresame=-1 × 1/-6=0.167;
scoreson=-1 × 1/-6=0.167;
scoreone=-0.5 × 1/-4.8=0.104.
It follows that, compared with the other candidate words, some is the most likely to be the correct word (error correction word) for the wrong word sone.
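The arithmetic of step 4 can be checked with the sketch below. The function name is hypothetical; the log values are the ones that appear in the score calculations above, and K is taken as 1 or 0.5 as in the example.

```python
def score_from_log_probability(log_prob: float, same_first_letter: bool,
                               k_same: float = 1.0, k_diff: float = 0.5) -> float:
    """score = -K * (1 / log_prob); log_prob is negative, so the score is positive."""
    if log_prob >= 0:
        raise ValueError("expected a negative log probability")
    k = k_same if same_first_letter else k_diff
    return -k * (1.0 / log_prob)


wrong_word = "sone"
log_probs = {"some": -5.7, "same": -6.0, "son": -6.0, "one": -4.8}
scores = {
    word: score_from_log_probability(lp, word[0] == wrong_word[0])
    for word, lp in log_probs.items()
}
best = max(scores, key=scores.get)
for word, value in scores.items():
    print(word, round(value, 3))
```

Taking the negative reciprocal turns a log probability closer to zero (a more plausible word) into a larger positive score, so that it can be scaled directly by K.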
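In another embodiment, for the BiLSTM model, steps 1 and 2 are the same as above. Step 3 is replaced with: some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated with the BiLSTM model. The log values obtained, as used in step 4 below, are -4.9 for some, -5.07 for same, -7.6 for son and -4.66 for one.
Step 4 is replaced with:
The assessment score of each candidate word is calculated by the formula scoreword=-K × 1/P, where P is the language environment probability (log value) obtained in step 3 and K is 1 when the first letters are the same and 0.5 otherwise: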
scoresome=-1 × 1/-4.9=0.204;
scoresame=-1 × 1/-5.07=0.197;
scoreson=-1 × 1/-7.6=0.132;
scoreone=-0.5 × 1/-4.66=0.107.
Here too, compared with the other candidate words, some is the most likely to be the correct word (error correction word) for the wrong word sone.
In another embodiment, for the LSTM model, steps 1 and 2 are the same as above. Step 3 is replaced with: some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated with the LSTM model for use in step 4.
Step 4 is replaced with:
The assessment score of each candidate word is calculated by the formula scoreword=-K × 1/P, where P is the language environment probability (log value) obtained in step 3 and K is 1 when the first letters are the same and 0.5 otherwise:
scoresome=-1 × 1/-5.6=0.179;
scoresame=-1 × 1/-6.3=0.159;
scoreson=-1 × 1/-6=0.116;
scoreone=-0.5 × 1/-5.1=0.098.
Here too, compared with the other candidate words, some is the most likely to be the correct word (error correction word) for the wrong word sone.
Calculating the candidate word assessment scores under the three language models above takes into account both the user's writing habits and the language environment, so the candidate words of a wrong word can be assessed more effectively and the accuracy of error correction improved.
As shown in Fig. 5, in a fourth embodiment, a candidate word assessment method is provided, comprising the following steps:
Step S41: a wrong word is detected, and multiple candidate words corresponding to the wrong word are obtained.
For the implementation of this step, refer to step S11 of the above embodiment; it is not repeated here.
Step S42: the wrong word is replaced with each candidate word in turn to obtain candidate sentences, and the assessment probability of the corresponding candidate word is determined from each candidate sentence; the assessment probability is obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word.
Replacing the wrong word with each candidate word yields the corresponding candidate sentence, which may contain multiple words, one of which is the candidate word. The language environment probability of a word at its position in the sentence is the plausibility of that word in its context; the more plausible the word is in context, the higher its language environment probability.
In one embodiment, the candidate word may have one or more neighboring words, which may include words directly adjacent to the candidate word as well as words separated from it by other words. The probabilities of the candidate word and of its neighboring words at their respective positions in the candidate sentence can be calculated with a preset language model, and the log of each probability is used as the language environment probability of the corresponding word; the language environment probability of the candidate word and those of its neighboring words are then averaged to obtain the assessment probability of the candidate word in the candidate sentence. The average may be either a simple arithmetic average or a weighted average.
Step S43: the error information of the wrong word relative to each candidate word is obtained.
For the implementation of this step, refer to the description of the first embodiment above.
Step S44: the assessment score of each candidate word is determined from the assessment probability and the error information.
The assessment score obtained in this step reflects how likely each candidate word is to be the error correction word for the wrong word.
In the candidate word assessment method of the above embodiment, when a wrong word is detected, the corresponding candidate words are first obtained, the assessment probability of each candidate word is determined, and the error information of the wrong word relative to each candidate word is determined; the assessment score of each candidate word is then determined from the assessment probability and the error information. This takes into account both the user's writing habits and the context, which helps improve the reliability of the candidate word assessment result.
In one embodiment, the language model used to calculate the probabilities of the candidate word and of its neighboring words at their respective positions in the candidate sentence includes, but is not limited to, an N-Gram model, a BiLSTM model or an LSTM model. For a description of each language model, see the third embodiment.
In one embodiment, the language environment probability of each word is expressed as the log of the probability of the word at its position, and the assessment probability of the candidate word is obtained on this basis; the assessment score of each candidate word can then be determined from the reciprocal of the assessment probability of the candidate word and the error information.
For example, if the assessment probability of a candidate word is denoted P(word), the error information of the wrong word relative to each candidate word is denoted K, and the assessment score of the candidate word is denoted scoreword, the assessment score of each candidate word can be calculated by the following formula:
scoreword=-K × 1/P(word);
where K takes the value K1 if the candidate word and the wrong word have the same first letter and K2 otherwise; K1 and K2 are preset values.
The assessment score of each candidate word is thus inversely related to its assessment probability, and is also affected by the first-letter information of the candidate word and the wrong word. In other words, the resulting assessment score takes into account both the user's writing habits and the contextual information; compared with the traditional method of assessing candidate words by edit distance, this improves the reliability of the candidate word assessment result.
In one embodiment, the error information of the wrong word relative to each candidate word includes information on whether the wrong word and the candidate word have the same first letter. Accordingly, the step of determining the assessment score of each candidate word from the assessment probability and the error information includes: if the wrong word and the candidate word have the same first letter, calculating the assessment score of the candidate word from the assessment probability and a first coefficient; if their first letters differ, calculating the assessment score of the candidate word from the assessment probability and a second coefficient.
Optionally, if the assessment probability determined with the N-Gram language model is denoted PN-Gram(word), the score of each candidate word may be calculated as:
scoreword=-K × 1/PN-Gram(word);
Here the value of K is chosen according to whether the candidate word and the wrong word have the same first letter: K is 1 when they are the same and 0.5 when they differ. It should be understood that the value of K is not limited to these example values.
For English words or character strings, users' writing habits mean that the first letter is rarely wrong. Taking this into account when assessing how likely each candidate word is to be the correction of the wrong word gives a more reasonable assessment score: other conditions being equal, a candidate word with the same first letter as the wrong word is more likely to be the error correction word, and one with a different first letter is less likely.
For example, suppose a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, in the context I have sone apples.
1. A lookup in the English dictionary finds that sone is not present, so sone is determined to be a wrong word.
2. The words in the English dictionary whose edit distance from sone is 1 or 2 are obtained; suppose these are some, same, son and one, which serve as the candidate words.
3. some, same, son and one each replace sone in I have sone apples, giving:
Candidate sentence one: I have some apples;
Candidate sentence two: I have same apples;
Candidate sentence three: I have son apples;
Candidate sentence four: I have one apples.
With the N-Gram model and N taken as 3, the neighboring word of the candidate word in each of the four candidate sentences (some, same, son and one respectively) is apples. On this basis the assessment probability of the candidate word in each candidate sentence is calculated; the values obtained, as used in step 4 below, are -5.6 for some, -6.35 for same, -7.58 for son and -5.26 for one.
4. The assessment score of each candidate word is calculated by the formula scoreword=-K × 1/P, where P is the assessment probability from step 3 and K is 1 when the first letters are the same and 0.5 otherwise:
scoresome=-1 × 1/-5.6=0.179;
scoresame=-1 × 1/-6.35=0.157;
scoreson=-1 × 1/-7.58=0.132;
scoreone=-0.5 × 1/-5.26=0.095.
It follows that, compared with the other candidate words, some is the most likely to be the correct word (error correction word) for the wrong word sone.
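Building the candidate sentences and picking out the neighboring words can be sketched as follows. The function names and the `before`/`after` window parameters are assumptions introduced for illustration; they simply reproduce the two neighbor choices used in the examples (only apples, or I, have and apples).

```python
from typing import Dict, List


def candidate_sentences(tokens: List[str], wrong_index: int,
                        candidates: List[str]) -> Dict[str, List[str]]:
    """Replace the wrong word with each candidate word in turn."""
    return {
        cand: tokens[:wrong_index] + [cand] + tokens[wrong_index + 1:]
        for cand in candidates
    }


def neighboring_words(tokens: List[str], index: int,
                      before: int, after: int) -> List[str]:
    """Up to `before` words to the left and `after` words to the right of tokens[index]."""
    left = tokens[max(0, index - before):index]
    right = tokens[index + 1:index + 1 + after]
    return left + right


tokens = "I have sone apples".split()
wrong_index = tokens.index("sone")
sentences = candidate_sentences(tokens, wrong_index, ["some", "same", "son", "one"])

for cand, sent in sentences.items():
    following_only = neighboring_words(sent, wrong_index, before=0, after=1)
    whole_context = neighboring_words(sent, wrong_index, before=2, after=1)
    print(" ".join(sent), following_only, whole_context)
```

Each candidate sentence, together with its chosen neighbors, would then be passed to the language model to obtain the log probabilities that are averaged into the assessment probability.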
In another embodiment, for the BiLSTM model, the neighboring words of the candidate word in each of the four candidate sentences (some, same, son and one respectively) are I, have and apples. On this basis the assessment probability of the candidate word in each candidate sentence is calculated; the values obtained, as used in step 4 below, are -2.88 for some, -3.62 for same, -4.1 for son and -2.62 for one.
Step 4 is replaced with:
The assessment score of each candidate word is calculated by the formula scoreword=-K × 1/P, where P is the assessment probability from step 3 and K is 1 when the first letters are the same and 0.5 otherwise:
scoresome=-1 × 1/-2.88=0.347;
scoresame=-1 × 1/-3.62=0.276;
scoreson=-1 × 1/-4.1=0.244;
scoreone=-0.5 × 1/-2.62=0.191.
Likewise, compared with the other candidate words, some is the most likely to be the correct word (error correction word) for the wrong word sone.
In another embodiment, for the LSTM model, the neighboring words of the candidate word in each of the four candidate sentences (some, same, son and one respectively) are I, have and apples. On this basis the assessment probability of the candidate word in each candidate sentence is calculated; the values obtained, as used in step 4 below, are -3.175 for some, -3.75 for same, -4.6 for son and -3.45 for one.
Step 4 is replaced with:
The assessment score of each candidate word is calculated by the formula scoreword=-K × 1/P, where P is the assessment probability from step 3 and K is 1 when the first letters are the same and 0.5 otherwise:
scoresome=-1 × 1/-3.175=0.315;
scoresame=-1 × 1/-3.75=0.267;
scoreson=-1 × 1/-4.6=0.217;
scoreone=-0.5 × 1/-3.45=0.145.
Likewise, compared with the other candidate words, some is the most likely to be the correct word (error correction word) for the wrong word sone.
Assessing the candidate words under the three language models above takes into account the user's writing habits and adds an assessment of the language environment, which improves the reliability of candidate word assessment.
As shown in Fig. 6, in a fifth embodiment, a candidate word assessment method is provided, comprising the following steps:
Step S51: when a wrong word is detected, multiple candidate words corresponding to the wrong word are obtained.
For the implementation of this step, refer to step S11 of the first embodiment above; it is not repeated here.
Step S52: the edit distance between each candidate word and the wrong word is determined.
For the implementation of this step, refer to the description of the first embodiment.
Step S53: the wrong word is replaced with each candidate word in turn to obtain candidate sentences, and the assessment probability of the corresponding candidate word is determined from each candidate sentence; the assessment probability is obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word.
For the implementation of this step, refer to the description of the fourth embodiment.
Step S54: the assessment score of each candidate word is determined from the edit distance and the assessment probability.
The assessment score obtained in this step characterizes how likely each candidate word is to be the error correction word (correct word) for the wrong word.
In the candidate word assessment method of the above embodiment, when a wrong word is detected, the corresponding candidate words are obtained, and the likelihood that each candidate word is the error correction word for the wrong word is assessed from both the edit distance between the candidate word and the wrong word and the contextual language environment information. Compared with the traditional candidate word assessment method that relies on edit distance alone, this improves the reliability of candidate word assessment.
In one embodiment, the probabilities of the candidate word and of its neighboring words at their respective positions in the candidate sentence are calculated with a preset language model, and the log of each probability is used as the language environment probability of the corresponding word; the language environment probability of the candidate word and those of its neighboring words are then averaged to obtain the assessment probability of the candidate word in the candidate sentence.
The language model includes, but is not limited to, an N-Gram model, a BiLSTM model or an LSTM model. For a description of each language model, see the third embodiment above.
In one embodiment, the language environment probability of each word is expressed as the log of the probability of the word at its position, and the assessment probability of the candidate word is obtained on this basis; the assessment score of each candidate word can then be determined from the reciprocal of the edit distance and the reciprocal of the assessment probability.
For example, the assessment score of each candidate word can be calculated by the following formula:
scoreword=-(1/Dedit) × 1/P(word);
where Dedit denotes the edit distance between the candidate word and the wrong word, word denotes the candidate word, P(word) denotes the assessment probability of the candidate word, and scoreword denotes the assessment score of the candidate word.
The assessment score of each candidate word is thus inversely related to its assessment probability, and is also affected by its edit distance from the wrong word. In other words, the resulting assessment score takes into account both the characteristics of the written word and the contextual information; compared with the traditional method of assessing candidate words by edit distance alone, this improves the reliability of candidate word assessment.
For example, suppose a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, in the context I have sone apples. The candidate words for sone are assessed as follows:
1. A lookup in the English dictionary finds that sone is not present, so sone is determined to be a wrong word.
2. The candidate words of sone are obtained; suppose these are some, same, one and son.
3. The edit distances of some, same, one and son from sone are determined; they are 1, 2, 1 and 1 respectively.
4. some, same, son and one each replace sone in I have sone apples, giving:
Candidate sentence one: I have some apples;
Candidate sentence two: I have same apples;
Candidate sentence three: I have son apples;
Candidate sentence four: I have one apples;
With the N-Gram model and N taken as 3, the neighboring word of the candidate word in each of the four candidate sentences (some, same, son and one respectively) is apples. On this basis the assessment probability of the candidate word in each candidate sentence is calculated; the values obtained, as used in step 5 below, are -5.6 for some, -6.35 for same, -7.58 for son and -5.26 for one.
5. The assessment score of each candidate word is calculated by the formula scoreword=-(1/Dedit) × 1/P, where Dedit is the edit distance from step 3 and P is the assessment probability from step 4:
scoresome=-1 × 1/-5.6=0.179;
scoresame=-1/2 × 1/-6.35=0.0787;
scoreson=-1 × 1/-7.58=0.132;
scoreone=-1 × 1/-5.26=0.19.
It follows that candidate word some is relative to other candidate words, it is the corresponding correct words (error correction term) of wrong word sone
Possibility bigger.
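Steps 1 to 3 of the example above (dictionary detection, candidate retrieval and editing distance) can be sketched as follows. This is a minimal illustration only: the toy `DICTIONARY`, the function names and the choice of Levenshtein distance with a threshold of 2 are assumptions made for the sketch, not details confirmed by the text.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


# Hypothetical toy dictionary; a real system would load a full word list.
DICTIONARY = {"i", "have", "some", "same", "one", "son", "apples", "banana"}


def is_wrong_word(word: str) -> bool:
    """A word is treated as wrong when it is absent from the dictionary."""
    return word.lower() not in DICTIONARY


def candidates(wrong: str, max_distance: int = 2) -> dict:
    """Dictionary words within max_distance edits of the wrong word."""
    result = {}
    for entry in sorted(DICTIONARY):
        distance = edit_distance(wrong, entry)
        if 0 < distance <= max_distance:
            result[entry] = distance
    return result


if __name__ == "__main__":
    if is_wrong_word("sone"):
        print(candidates("sone"))
```

With this toy dictionary the candidates for sone are some, same, one and son, with the editing distances 1, 2, 1 and 1 listed in step 3.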
In another embodiment, for the BiLSTM model, the neighboring words of the candidate word in each of candidate sentences one to four (some, same, son and one, respectively) are I, have and apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis.
Step 5 is replaced with:
The assessment score of each candidate word is calculated according to the formula $score_{word} = -\frac{1}{D_{edit}} \times \frac{1}{\bar{P}_{mx}(word)}$, where $\bar{P}_{mx}(word)$ is the assessment probability of the candidate word under language model mx (here the BiLSTM model):
$score_{some} = -1 \times 1/(-2.88) = 0.347$;
$score_{same} = -1/2 \times 1/(-3.62) = 0.138$;
$score_{son} = -1 \times 1/(-4.1) = 0.244$;
$score_{one} = -1 \times 1/(-2.62) = 0.382$.
It is similarly obtained, candidate word some is relative to other candidate words, for the corresponding correct words (error correction term) of wrong word sone
Possibility bigger.
In another embodiment, for the LSTM model, the neighboring words of the candidate word in each of candidate sentences one to four (some, same, son and one, respectively) are I, have and apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis.
Step 5 is replaced with:
The assessment score of each candidate word is calculated according to the formula $score_{word} = -\frac{1}{D_{edit}} \times \frac{1}{\bar{P}_{mx}(word)}$, where $\bar{P}_{mx}(word)$ is the assessment probability of the candidate word under language model mx (here the LSTM model):
$score_{some} = -1 \times 1/(-3.175) = 0.315$;
$score_{same} = -1/2 \times 1/(-3.75) = 0.133$;
$score_{son} = -1 \times 1/(-4.6) = 0.217$;
$score_{one} = -1 \times 1/(-3.45) = 0.29$.
Likewise, relative to the other candidate words, some is more likely to be the correct word (error-correction word) for the wrong word sone.
Thus, the assessment scores of the candidate words obtained under the three language models above take both the context information and the editing distance information into account, which improves the reliability of candidate word assessment.
As shown in Fig. 7, the sixth embodiment provides a candidate word assessment method including the following steps:
Step S61: when a wrong word is detected, obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to step S11 of the first embodiment above; it is not repeated here.
Step S62: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the candidate word and the wrong word.
For the implementation of this step, refer to the description of the second embodiment above; it is not repeated here.
Step S63: replace the wrong word with each candidate word in turn to obtain candidate sentences, and determine the assessment probability of the corresponding candidate word from each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of its neighboring words.
For the implementation of this step, refer to the description of the fourth embodiment above; it is not repeated here.
Step S64: determine the assessment score of each candidate word according to the similarity and the assessment probability.
The assessment score obtained in this step characterizes the possibility that each candidate word is the error-correction word (correct word) for the wrong word.
With the candidate word assessment method of this embodiment, when a wrong word is detected, multiple candidate words corresponding to it are obtained, and the possibility that each candidate word is the error-correction word is assessed comprehensively from the similarity between the candidate word and the wrong word and from the context information. Compared with traditional candidate word assessment methods that rely only on editing distance, this improves the reliability of the assessment result.
In one embodiment, the probabilities of the candidate word and of each of its neighboring words at their respective positions in the candidate sentence are calculated according to a preset language model, and the log value of each probability is taken as the language environment probability of that word. The language environment probabilities of the candidate word and of its neighboring words are then averaged to obtain the assessment probability of the candidate word in the candidate sentence. The language model includes, but is not limited to, an N-Gram model, a BiLSTM model or an LSTM model.
For details of each language model, see the embodiments described above.
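The averaging described above can be sketched as follows. This is an illustrative sketch only: the text does not state the base of the logarithm, so the natural logarithm is assumed here, and the function name and the input probabilities are hypothetical.

```python
import math


def assessment_probability(position_probs):
    """Mean of the log probabilities of the candidate word and its
    neighboring words at their positions in the candidate sentence.

    position_probs: probabilities in (0, 1] produced by some language model
    (hypothetical inputs; natural log is an assumption of this sketch).
    """
    if not position_probs:
        raise ValueError("at least one probability is required")
    log_probs = [math.log(p) for p in position_probs]
    return sum(log_probs) / len(log_probs)


if __name__ == "__main__":
    # Candidate word plus one neighboring word, as in the 3-gram case.
    print(assessment_probability([0.01, 0.001]))
```

Because each log probability is negative (or zero), the assessment probability is negative as well, which is why the scoring formulas in the examples carry a leading minus sign.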
In one embodiment, the language environment probability of each word is expressed as the log value of the probability of that word at its position, and the assessment probability of the candidate word is obtained on this basis. The assessment score of each candidate word can then be determined from the reciprocal of the assessment probability and the similarity.
Optionally, the assessment score of each candidate word is calculated according to the following formula:
$score_{word} = -S \times \frac{1}{\bar{P}_{mx}(word)}$
where word denotes the candidate word, $score_{word}$ denotes the assessment score of the candidate word, $\bar{P}_{mx}(word)$ denotes the assessment probability of the candidate word, mx denotes the language model, and S denotes the similarity between the candidate word and the wrong word.
Thus, the assessment score of each candidate word is inversely related to its assessment probability and is also influenced by its similarity to the wrong word. That is, the final assessment score of each candidate word is obtained by combining the similarity and the context information; compared with the traditional method of assessing candidate words by editing distance, this improves the reliability of candidate word assessment.
For example, suppose a user writing an English sentence on an intelligent interactive tablet mistakenly writes some as sone, and the context is I have sone apples. The candidate words corresponding to sone are assessed as follows:
1. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The candidate words of sone are obtained; assume they are some, same and son.
3. The longest common subsequence rate and the longest common substring rate of some, same and son with sone are determined, and the similarity of each candidate word is calculated as follows:
The longest common subsequence of the candidate word some and the wrong word sone is soe, with length 3, so the longest common subsequence rate is 3/((4+4)/2) = 0.75; the longest common substring is so, with length 2, so the longest common substring rate is 2/((4+4)/2) = 0.5;
The longest common subsequence of the candidate word same and the wrong word sone is se, with length 2, so the longest common subsequence rate is 2/((4+4)/2) = 0.5; the longest common substring is s or e, with length 1, so the longest common substring rate is 1/((4+4)/2) = 0.25;
The longest common subsequence of the candidate word son and the wrong word sone is son, with length 3, so the longest common subsequence rate is 3/((3+4)/2) = 0.86; the longest common substring is son, with length 3, so the longest common substring rate is 3/((3+4)/2) = 0.86.
According to the formula $S = \frac{1}{D_{edit}} + 0.7 D_{lcs1} + 0.3 D_{lcs2}$, where $D_{lcs1}$ is the longest common subsequence rate and $D_{lcs2}$ is the longest common substring rate, the similarity S of each candidate word to the wrong word is:
some: 1 + 0.7 × 0.75 + 0.3 × 0.5 = 1.675;
same: 1/2 + 0.7 × 0.5 + 0.3 × 0.25 = 0.925;
son: 1 + 0.7 × 0.86 + 0.3 × 0.86 = 1.86.
4. The sone in I have sone apples is replaced with some, same and son in turn, giving:
Candidate sentence one: I have some apples;
Candidate sentence two: I have same apples;
Candidate sentence three: I have son apples.
Based on the N-Gram model with N = 3, the neighboring word of the candidate word in each of candidate sentences one to three (some, same and son, respectively) is apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis, where:
P(I, have, some) = c(I, have, some)/c(3-Gram), c denoting a count, i.e. the count of the 3-tuple (I, have, some) in the corpus divided by the count of all 3-tuples;
P(have | I) = c(have, I)/c(I), i.e. the count of the 2-tuple (have, I) in the corpus divided by the count of I;
P(I) = c(I)/c(1-Gram), i.e. the count of the 1-tuple I in the corpus divided by the count of all 1-tuples.
5. The assessment score of each candidate word is calculated according to the formula $score_{word} = -S \times \frac{1}{\bar{P}_{mx}(word)}$:
$score_{some} = -1.675 \times 1/(-5.6) = 0.299$;
$score_{same} = -0.925 \times 1/(-6.35) = 0.146$;
$score_{son} = -1.86 \times 1/(-7.58) = 0.245$.
It follows that, relative to the other candidate words, some is more likely to be the correct word (error-correction word) for the wrong word sone.
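The similarity calculation in step 3 and the scoring in step 5 can be reproduced with the sketch below. It follows the formulas as they appear in the worked example (rates normalized by the mean word length, weights 0.7 and 0.3); the function names are hypothetical, and the assessment probabilities are passed in as plain numbers because the language model itself is outside the scope of the sketch.

```python
def lcs_subsequence_len(a: str, b: str) -> int:
    """Length of the longest common subsequence (characters need not be adjacent)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, start=1):
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def lcs_substring_len(a: str, b: str) -> int:
    """Length of the longest common substring (characters must be adjacent)."""
    best = 0
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, start=1):
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                dp[i][j] = dp[i - 1][j - 1] + 1
                best = max(best, dp[i][j])
    return best


def rate(common_len: int, a: str, b: str) -> float:
    """Common length divided by the mean length of the two words."""
    return common_len / ((len(a) + len(b)) / 2)


def similarity(edit_dist: int, candidate: str, wrong: str) -> float:
    """S = 1/D_edit + 0.7 * subsequence rate + 0.3 * substring rate."""
    d_lcs1 = rate(lcs_subsequence_len(candidate, wrong), candidate, wrong)
    d_lcs2 = rate(lcs_substring_len(candidate, wrong), candidate, wrong)
    return 1.0 / edit_dist + 0.7 * d_lcs1 + 0.3 * d_lcs2


def assessment_score(sim: float, assess_prob: float) -> float:
    """score = -S * 1 / assessment probability (a negative log-based value)."""
    return -sim * (1.0 / assess_prob)


if __name__ == "__main__":
    for word, dist, prob in [("some", 1, -5.6), ("same", 2, -6.35), ("son", 1, -7.58)]:
        s = similarity(dist, word, "sone")
        print(word, round(s, 3), round(assessment_score(s, prob), 3))
```

For son the sketch uses the unrounded rate 3/3.5, so its similarity comes out slightly below the 1.86 obtained in the text from the rounded value 0.86.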
In another embodiment, for the BiLSTM model, the neighboring words of the candidate word in each of candidate sentences one to three (some, same and son, respectively) are I, have and apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis.
Step 5 is replaced with:
The assessment score of each candidate word is calculated according to the formula $score_{word} = -S \times \frac{1}{\bar{P}_{mx}(word)}$, where S is the similarity and $\bar{P}_{mx}(word)$ is the assessment probability under language model mx (here the BiLSTM model):
$score_{some} = -1.675 \times 1/(-2.88) = 0.582$;
$score_{same} = -0.925 \times 1/(-3.62) = 0.256$;
$score_{son} = -1.86 \times 1/(-4.1) = 0.454$.
Likewise, relative to the other candidate words, some is more likely to be the correct word (error-correction word) for the wrong word sone.
In another embodiment, for the LSTM model, the neighboring words of the candidate word in each of candidate sentences one to three (some, same and son, respectively) are I, have and apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis.
Step 5 is replaced with:
The assessment score of each candidate word is calculated according to the formula $score_{word} = -S \times \frac{1}{\bar{P}_{mx}(word)}$, where S is the similarity and $\bar{P}_{mx}(word)$ is the assessment probability under language model mx (here the LSTM model):
$score_{some} = -1.675 \times 1/(-3.175) = 0.528$;
$score_{same} = -0.925 \times 1/(-3.75) = 0.247$;
$score_{son} = -1.86 \times 1/(-4.6) = 0.404$.
Likewise, relative to the other candidate words, some is more likely to be the correct word (error-correction word) for the wrong word sone.
Thus, the assessment scores of the candidate words obtained under the three language models above take into account both the context information and the similarity between words, which improves the reliability of candidate word assessment.
As shown in Fig. 8, the seventh embodiment provides a candidate word assessment method including the following steps:
Step S71: when a wrong word is detected, obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of step S11 of the first embodiment.
Step S72: determine the editing distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment.
Step S73: determine the language environment probability of each candidate word at the position of the wrong word.
For the implementation of this step, refer to the description of the third embodiment.
Step S74: obtain the error information of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment above.
Step S75: determine the assessment score of each candidate word according to the editing distance, the language environment probability and the error information.
The assessment score obtained in this step reflects the possibility that each candidate word is the error-correction word (correct word) for the wrong word.
With the candidate word assessment method of this embodiment, when a wrong word is detected, the corresponding candidate words are first obtained; the editing distance between each candidate word and the wrong word and the probability of each candidate word at the position of the wrong word are determined, as is the error information of the wrong word relative to each candidate word; and the assessment score of each candidate word is determined according to the editing distance, the language environment probability and the error information. This takes into account the editing distance and the user's writing habits as well as the context information, which helps improve the reliability of the assessment result.
In one embodiment, the language model used to determine the language environment probability of each candidate word at the position of the wrong word includes, but is not limited to, an N-Gram model, a BiLSTM model or an LSTM model. For a description of each model, see the related embodiments above.
In one embodiment, when the language environment probability is expressed as the log value of the probability of each candidate word at the position of the wrong word, the assessment score of each candidate word can be determined from the reciprocal of the editing distance, the reciprocal of the language environment probability, and the error information.
Thus, the assessment score of each candidate word is inversely related to its editing distance from the wrong word and inversely related to its language environment probability, and is also influenced by the error information of the wrong word relative to it. That is, the final assessment score of each candidate word is obtained by taking into account the user's writing habits, the editing distance and the language environment information; compared with traditional candidate word assessment methods, this helps improve the reliability of candidate word assessment.
In one embodiment, the error information of the wrong word relative to each candidate word includes information on whether the wrong word and the candidate word have the same initial letter. Accordingly, the step of determining the assessment score of each candidate word according to the editing distance, the language environment probability and the error information includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the editing distance, the language environment probability and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the editing distance, the language environment probability and a second coefficient.
Optionally, the assessment score of each candidate word is calculated according to the following formula:
$score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{P_{mx}(word)}$
where word denotes the candidate word, $D_{edit}$ denotes the editing distance between the candidate word and the wrong word, $P_{mx}(word)$ denotes the language environment probability of the candidate word, mx denotes the language model, $score_{word}$ denotes the assessment score of the candidate word, and K denotes the error information of the wrong word relative to the candidate word: if the candidate word and the wrong word have the same initial letter, K takes the value K1; otherwise K takes the value K2. K1 and K2 are preset values.
For example, assuming the language environment probability determined by the N-Gram language model is $P_{ngram}(word)$, the formula for scoring each candidate word can be:
$score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{P_{ngram}(word)}$
where K is selected according to whether the candidate word and the wrong word have the same initial letter: K is 1 when they are the same and 0.5 when they differ. It should be understood that the values of K include, but are not limited to, those in this example.
For English words or character strings, users generally do not get the initial letter wrong when writing. Assessing the possibility that each candidate word is the error-correction word on the basis of the above embodiment therefore yields a more reasonable assessment score: other conditions being equal, a candidate word with the same initial letter as the wrong word is more likely to be the error-correction word, and otherwise less likely.
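The role of the coefficient K can be illustrated with a short sketch. The function names are hypothetical, the default values K1 = 1 and K2 = 0.5 are the example values given in the text, and the log probability of -5.0 used in the demonstration is an arbitrary illustrative number rather than a value from the document.

```python
def error_coefficient(wrong: str, candidate: str, k1: float = 1.0, k2: float = 0.5) -> float:
    """K1 when wrong word and candidate share the initial letter, otherwise K2."""
    return k1 if wrong[:1] == candidate[:1] else k2


def assessment_score(wrong: str, candidate: str, edit_dist: int, log_prob: float,
                     k1: float = 1.0, k2: float = 0.5) -> float:
    """score = -K * (1 / D_edit) * (1 / P), with P a negative log probability."""
    k = error_coefficient(wrong, candidate, k1, k2)
    return -k * (1.0 / edit_dist) * (1.0 / log_prob)


if __name__ == "__main__":
    # Same editing distance and same (illustrative) log probability:
    # only the initial letter differs, so only K changes the score.
    print(assessment_score("sone", "some", 1, -5.0))  # same initial letter
    print(assessment_score("sone", "one", 1, -5.0))   # different initial letter
```

With all other inputs equal, the candidate that shares the initial letter with the wrong word scores twice as high under these example coefficients.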
For example, suppose a user writing an English sentence on an intelligent interactive tablet mistakenly writes some as sone, and the context is I have sone apples.
1. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume they are some, same, son and one, which serve as the candidate words.
3. The sone in I have sone apples is replaced with some, same, son and one in turn, and the language environment probability $P_{mx}(word)$ of each candidate word is calculated according to the N-Gram model with N = 3, where:
P(I, have, some) = c(I, have, some)/c(3-Gram), c denoting a count, i.e. the count of the 3-tuple (I, have, some) in the corpus divided by the count of all 3-tuples;
P(have | I) = c(have, I)/c(I), i.e. the count of the 2-tuple (have, I) in the corpus divided by the count of I;
P(I) = c(I)/c(1-Gram), i.e. the count of the 1-tuple I in the corpus divided by the count of all 1-tuples.
4. The assessment score of each candidate word is calculated according to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{P_{mx}(word)}$:
$score_{some} = -1 \times 1 \times 1/(-5.7) = 0.175$;
$score_{same} = -1 \times 0.5 \times 1/(-6) = 0.083$;
$score_{son} = -1 \times 1 \times 1/(-6) = 0.167$;
$score_{one} = -0.5 \times 1 \times 1/(-4.8) = 0.104$.
It follows that, relative to the other candidate words, some is more likely to be the correct word (error-correction word) for the wrong word sone.
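The count-based probabilities used in step 3 can be sketched as relative frequencies over a corpus. The tiny corpus below is invented purely for illustration, the helper names are hypothetical, and the conditional probability is computed from the left-to-right 2-tuple (I, have), which is an assumption about the intended tuple order.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-tuple in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def relative_frequency(counts, gram):
    """Count of one n-tuple divided by the count of all n-tuples of that order."""
    total = sum(counts.values())
    return counts[gram] / total if total else 0.0


# Hypothetical toy corpus, not taken from the document.
CORPUS = "I have some apples I have some pears I have one apple".split()

unigrams = ngram_counts(CORPUS, 1)
bigrams = ngram_counts(CORPUS, 2)
trigrams = ngram_counts(CORPUS, 3)

# P(I, have, some) = c(I, have, some) / c(3-Gram)
p_trigram = relative_frequency(trigrams, ("I", "have", "some"))
# P(I) = c(I) / c(1-Gram)
p_unigram = relative_frequency(unigrams, ("I",))
# P(have | I) = c(I, have) / c(I)
p_conditional = bigrams[("I", "have")] / unigrams[("I",)]

if __name__ == "__main__":
    print(p_trigram, p_unigram, p_conditional)
```

A production language model would normally add smoothing so that unseen tuples do not receive a probability of zero; that is left out here to keep the correspondence with the counting formulas visible.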
In another embodiment, for the BiLSTM model, steps 1 and 2 above are unchanged.
Step 3 is replaced with: the sone in I have sone apples is replaced with some, same, son and one in turn, and the language environment probability of each candidate word is calculated with the BiLSTM model.
Step 4 is replaced with:
The assessment score of each candidate word is calculated according to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{P_{mx}(word)}$, where $P_{mx}(word)$ is the language environment probability under language model mx (here the BiLSTM model):
$score_{some} = -1 \times 1 \times 1/(-4.9) = 0.204$;
$score_{same} = -1 \times 1/2 \times 1/(-5.07) = 0.099$;
$score_{son} = -1 \times 1 \times 1/(-7.6) = 0.132$;
$score_{one} = -0.5 \times 1 \times 1/(-4.66) = 0.107$.
Likewise, relative to the other candidate words, some is more likely to be the correct word (error-correction word) for the wrong word sone.
In another embodiment, for the LSTM model, steps 1 and 2 above are unchanged.
Step 3 is replaced with: the sone in I have sone apples is replaced with some, same, son and one in turn, and the language environment probability of each candidate word is calculated with the LSTM model.
Step 4 is replaced with:
The assessment score of each candidate word is calculated according to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{P_{mx}(word)}$, where $P_{mx}(word)$ is the language environment probability under language model mx (here the LSTM model):
$score_{some} = -1 \times 1 \times 1/(-5.6) = 0.179$;
$score_{same} = -1 \times 1/2 \times 1/(-6.3) = 0.079$;
$score_{son} = -1 \times 1 \times 1/(-6) = 0.167$;
$score_{one} = -0.5 \times 1 \times 1/(-5.1) = 0.098$.
Likewise, relative to the other candidate words, some is more likely to be the correct word (error-correction word) for the wrong word sone.
The above embodiment takes into account the editing distance information and the context information, and also the habit that users rarely get the initial letter wrong when writing English words, so that a more accurate error-correction word can be determined.
As shown in Fig. 9, the eighth embodiment provides a candidate word assessment method including the following steps:
Step S81: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S82: determine the editing distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S83: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the candidate word and the wrong word.
For the implementation of this step, refer to the description of the second embodiment; it is not repeated here.
Step S84: obtain the error information of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S85: determine the assessment score of each candidate word according to the editing distance, the similarity and the error information.
Specifically, this step scores each candidate word according to its editing distance from the wrong word, the error information, and how close the candidate word is to the wrong word, combining these three aspects to assess the possibility that the candidate word is the error-correction word for the wrong word.
This embodiment determines the assessment score of each candidate word according to the editing distance, similarity and error information between the candidate word and the wrong word. Taking the editing distance, the similarity and the user's writing habits into account together improves the reliability of the assessment result.
In one embodiment, the step of determining the assessment score of each candidate word according to the editing distance, the similarity and the error information includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the editing distance, the similarity and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the editing distance, the similarity and a second coefficient.
In one embodiment, the step of calculating the assessment score of the candidate word according to the editing distance, the similarity and the first coefficient includes: calculating the assessment score of the candidate word according to the reciprocal of the editing distance, the similarity and the first coefficient.
In one embodiment, the step of calculating the assessment score of the candidate word according to the editing distance, the similarity and the second coefficient includes: calculating the assessment score of the candidate word according to the reciprocal of the editing distance, the similarity and the second coefficient.
In one embodiment, the assessment score of each candidate word can be calculated according to the following formula:
$score_{word} = K \times S \times \frac{1}{D_{edit}}$
The calculation of the assessment score is illustrated by the following example:
Suppose a user writing an English sentence on an intelligent interactive tablet mistakenly writes some as sone, and the context is I have sone apples. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word. The candidate words obtained for sone are some, same and one.
The editing distances of some, same and one from the wrong word sone are 1, 2 and 1, respectively.
1. According to the formula $S = \frac{1}{D_{edit}} + 0.7 D_{lcs1} + 0.3 D_{lcs2}$, the similarity of each candidate word to the wrong word is:
$S_{some}$: 1 + 0.7 × 0.75 + 0.3 × 0.5 = 1.675;
$S_{same}$: 1/2 + 0.7 × 0.5 + 0.3 × 0.25 = 0.925;
$S_{one}$: 1 + 0.7 × 0.86 + 0.3 × 0.86 = 1.86.
2. According to the formula $score_{word} = K \times S \times \frac{1}{D_{edit}}$, the assessment score of each candidate word is:
$score_{some} = 1 \times 1.675 \times 1/1 = 1.675$;
$score_{same} = 1 \times 0.925 \times 1/2 = 0.463$;
$score_{one} = 0.5 \times 1.86 \times 1/1 = 0.93$.
These scores show that the assessment score of the candidate word some is higher than those of same and one; the assessment result therefore provides a useful reference for determining the error-correction word.
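The arithmetic of this example can be checked with a few lines of code. The sketch simply plugs in the K, S and editing distance values stated above; the table name `EXAMPLE` and the function name are hypothetical.

```python
def assessment_score(k: float, similarity: float, edit_dist: int) -> float:
    """score = K * S * 1 / D_edit."""
    return k * similarity * (1.0 / edit_dist)


# (K, S, D_edit) per candidate, taken from the worked example in the text.
EXAMPLE = {
    "some": (1.0, 1.675, 1),
    "same": (1.0, 0.925, 2),
    "one": (0.5, 1.86, 1),
}

scores = {word: assessment_score(k, s, d) for word, (k, s, d) in EXAMPLE.items()}
best = max(scores, key=scores.get)

if __name__ == "__main__":
    for word, value in scores.items():
        print(word, round(value, 3))
    print("highest:", best)
```

Note that one has the highest similarity of the three candidates, and it is the coefficient 0.5 for the differing initial letter that moves it below some.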
As shown in Fig. 10, the ninth embodiment provides a candidate word assessment method including the following steps:
Step S91: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S92: determine the editing distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S93: replace the wrong word with each candidate word in turn to obtain candidate sentences, and determine the assessment probability of the corresponding candidate word from each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of its neighboring words.
For the implementation of this step, refer to the description of the fourth embodiment; it is not repeated here.
Step S94: obtain the error information of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S95: determine the assessment score of each candidate word according to the editing distance, the assessment probability and the error information.
This embodiment determines the assessment score of each candidate word according to the editing distance, assessment probability and error information between the candidate word and the wrong word. It takes into account the editing distance and the user's writing habits and also adds the assessment information of the language model, which improves the reliability of the assessment result and helps improve the efficiency and accuracy of text editing.
In one embodiment, the language environment probability of each word is expressed as the log value of the probability of that word at its position, and the assessment probability of the candidate word is obtained on this basis. The assessment score of each candidate word is then determined from the reciprocal of the editing distance, the reciprocal of the assessment probability, and the error information.
In one embodiment, the step of determining the assessment score of each candidate word according to the editing distance, the assessment probability and the error information includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the editing distance, the assessment probability and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the editing distance, the assessment probability and a second coefficient.
In one embodiment, the assessment score of each candidate word is calculated according to the following formula: $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{\bar{P}_{mx}(word)}$
The calculation of the assessment score is illustrated by the following example:
Suppose a user writing an English sentence on an intelligent interactive tablet mistakenly writes some as sone, and the context is I have sone apples.
For the N-Gram model with N = 3:
1. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume they are some, same, son and one, which serve as the candidate words. The editing distances of some, same, son and one are 1, 2, 1 and 1, respectively.
3. The sone in I have sone apples is replaced with some, same, son and one in turn to obtain the corresponding candidate sentences, and the assessment probability of each candidate word is calculated from the N-Gram model and the candidate sentences; the resulting values are the ones used in step 4.
4. According to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{\bar{P}_{mx}(word)}$, where $\bar{P}_{mx}(word)$ is the assessment probability of the candidate word, the assessment score of each candidate word is:
$score_{some} = -1 \times 1/1 \times 1/(-5.6) = 0.179$;
$score_{same} = -1 \times 1/2 \times 1/(-6.35) = 0.079$;
$score_{son} = -1 \times 1/1 \times 1/(-7.58) = 0.132$;
$score_{one} = -0.5 \times 1/1 \times 1/(-5.26) = 0.095$.
It follows that, relative to the other candidate words, some is more likely to be the correct word (error-correction word) for the wrong word sone.
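The ranking in this example can be reproduced with the sketch below. The editing distances and assessment probabilities are the values from the N-Gram example above, K1 = 1 and K2 = 0.5 follow the example coefficients used in the document, and the function and variable names are hypothetical.

```python
def assessment_score(wrong: str, candidate: str, edit_dist: int, assess_prob: float,
                     k1: float = 1.0, k2: float = 0.5) -> float:
    """score = -K * (1 / D_edit) * (1 / assessment probability)."""
    k = k1 if wrong[:1] == candidate[:1] else k2
    return -k * (1.0 / edit_dist) * (1.0 / assess_prob)


# candidate -> (editing distance, assessment probability) from the N-Gram example.
EXAMPLE = {
    "some": (1, -5.6),
    "same": (2, -6.35),
    "son": (1, -7.58),
    "one": (1, -5.26),
}

scores = {word: assessment_score("sone", word, dist, prob)
          for word, (dist, prob) in EXAMPLE.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    for word in ranking:
        print(word, round(scores[word], 3))
```

The candidate one has the least negative assessment probability, so the coefficient for the differing initial letter is what places some first in this ranking.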
In another embodiment, for the BiLSTM model, steps 1 and 2 above are unchanged. Step 3 is replaced with: the sone in I have sone apples is replaced with some, same, son and one in turn, and the assessment probability of each candidate word is calculated from the BiLSTM model and the candidate sentences.
Step 4 is replaced with:
According to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{\bar{P}_{mx}(word)}$, where $\bar{P}_{mx}(word)$ is the assessment probability under language model mx (here the BiLSTM model), the assessment score of each candidate word is:
$score_{some} = -1 \times 1/1 \times 1/(-2.88) = 0.347$;
$score_{same} = -1 \times 1/2 \times 1/(-3.62) = 0.138$;
$score_{son} = -1 \times 1/1 \times 1/(-4.1) = 0.244$;
$score_{one} = -0.5 \times 1/1 \times 1/(-2.62) = 0.191$.
Likewise, relative to the other candidate words, some is more likely to be the correct word (error-correction word) for the wrong word sone.
In another embodiment, for the LSTM model, steps 1 and 2 above are the same. Step 3 is replaced with: according to the LSTM model and the assessment sentences, the assessment probability corresponding to each candidate word is calculated, and the obtained assessment probabilities are as follows:
Step 4 is replaced with:
According to the formula (the product of the coefficient, the reciprocal of the editing distance and the reciprocal of the assessment probability), the assessment score of each candidate word is calculated as:
score_some = -1 × 1/1 × 1/(-3.175) = 0.315;
score_same = -1 × 1/2 × 1/(-3.75) = 0.133;
score_son = -1 × 1/1 × 1/(-4.6) = 0.217;
score_one = -0.5 × 1/1 × 1/(-3.45) = 0.145.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
Calculating the assessment scores of the candidate words under the above three language models takes into account the user's writing habits as well as the editing distance and the language environment information, so that the candidate words of the wrong word can be assessed more effectively.
As shown in Figure 11, in the tenth embodiment, a candidate word evaluation method is provided, which includes the following steps:
Step S101: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the corresponding step of the first embodiment; it is not repeated here.
Step S102: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the candidate word and the wrong word.
For the implementation of this step, refer to the corresponding step of the second embodiment; it is not repeated here.
Step S103: determine the language environment probability of each candidate word at the position of the wrong word.
For the implementation of this step, refer to the description of the third embodiment; it is not repeated here.
Step S104: obtain the error message of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S105: determine the assessment score corresponding to each candidate word according to the similarity, the language environment probability and the error message.
In this embodiment, the assessment score corresponding to each candidate word is determined according to the similarity between the candidate word and the wrong word, the language environment probability and the error message. This takes into account both the similarity between the candidate word and the wrong word and the user's writing habits, and also incorporates the assessment information of the language model, which improves the reliability of the candidate word assessment result and helps improve the efficiency and accuracy of text editing.
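Step S102 relies on the longest common subsequence and the longest common substring of a candidate word and the wrong word. The sketch below only computes the two lengths with standard dynamic programming; the function names are illustrative assumptions, and how the lengths are combined into the similarity value is defined in the second embodiment and is not reproduced here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (characters need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring_length(a, b):
    """Length of the longest run of adjacent characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


for candidate in ["some", "same", "son", "one"]:
    print(candidate,
          lcs_length("sone", candidate),
          longest_common_substring_length("sone", candidate))
```

For the wrong word sone, the candidates son and one share both a longer subsequence and a longer substring than same does, which is in line with the ordering of the similarity values used in the examples.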
In one embodiment, when the language environment probability is expressed as the log value of the probability of each candidate word at the position of the wrong word, the assessment score corresponding to each candidate word can be determined according to the similarity, the reciprocal of the language environment probability and the error message.
In one embodiment, the step of determining the assessment score corresponding to each candidate word according to the similarity, the language environment probability and the error message includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the similarity, the language environment probability and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the similarity, the language environment probability and a second coefficient.
In one embodiment, the assessment score of each candidate word is calculated according to a formula that multiplies the coefficient, the similarity and the reciprocal of the language environment probability.
In one embodiment, the language model is an N-Gram model, a BiLSTM model or an LSTM model.
Optionally, the language environment probability corresponding to a candidate word may be calculated by combining several of the above language models.
Optionally, with the N-Gram model, the assessment score corresponding to each candidate word can be calculated by the same formula, where the language environment probability is the one calculated for the candidate word by the N-Gram model.
For the N-Gram model, N takes the value 3.
A concrete example is as follows. Suppose that, when writing an English sentence on an intelligent interactive tablet, the user miswrites some as sone, and the context is I have sone apples.
1. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as the candidate words. The similarities between the candidate words and the wrong word are determined to be: S_some = 1.675; S_same = 0.925; S_son = 1.86; S_one = 1.86.
3. The sone in I have sone apples is replaced with some, same, son and one respectively, and the language environment probability corresponding to each candidate word is calculated from the replaced sentence.
The language environment probability of each candidate word at the position of sone, calculated with the N-Gram model, is:
where P(I, have, some) = c(I, have, some)/c(3-Gram), with c denoting a count; that is, the count of the 3-tuple (I, have, some) in the corpus divided by the count of all 3-tuples;
where P(have | I) = c(I, have)/c(I); that is, the count of the 2-tuple (I, have) in the corpus divided by the count of I;
where P(I) = c(I)/c(1-Gram); that is, the count of the 1-tuple I in the corpus divided by the count of all 1-tuples.
4. According to the formula (the product of the coefficient, the similarity and the reciprocal of the language environment probability), the assessment score corresponding to each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-5.7) = 0.294;
score_same = -1 × 0.925 × 1/(-6) = 0.154;
score_son = -1 × 1.86 × 1/(-8) = 0.233;
score_one = -0.5 × 1.86 × 1/(-4.8) = 0.194.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
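The count-based quantities used in step 3 can be illustrated as follows. This is a minimal sketch on a toy corpus invented purely for illustration; the helper name `ngram_counts` is an assumption, and the way the application combines these quantities into the language environment probability is not reproduced here.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count all n-tuples of adjacent words in a tokenized corpus."""
    counts = Counter()
    for words in sentences:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


# Toy corpus (hypothetical data, not from the application).
corpus = [
    "I have some apples".split(),
    "I have some books".split(),
    "I have one apple".split(),
    "we have some apples".split(),
]

uni = ngram_counts(corpus, 1)
bi = ngram_counts(corpus, 2)
tri = ngram_counts(corpus, 3)

# P(I) = c(I) / c(1-Gram): count of "I" over the count of all 1-tuples.
p_i = uni[("I",)] / sum(uni.values())
# P(have | I) = c(I, have) / c(I)
p_have_given_i = bi[("I", "have")] / uni[("I",)]
# P(I, have, some) = c(I, have, some) / c(3-Gram)
p_i_have_some = tri[("I", "have", "some")] / sum(tri.values())

print(p_i, p_have_given_i, p_i_have_some)
```

In practice the counts would come from a large corpus, and the resulting probabilities would be converted to log values before entering the score.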
In another embodiment, for the BiLSTM model, steps 1 and 2 above are the same. Step 3 is replaced with: the sone in I have sone apples is replaced with some, same, son and one respectively, and the language environment probability corresponding to each candidate word is calculated; the results for each candidate word are as follows:
Step 4 is replaced with:
According to the formula (the product of the coefficient, the similarity and the reciprocal of the language environment probability), the assessment score corresponding to each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-5.6) = 0.299;
score_same = -1 × 0.925 × 1/(-6.3) = 0.147;
score_son = -1 × 1.86 × 1/(-8.6) = 0.216;
score_one = -0.5 × 1.86 × 1/(-5.1) = 0.182.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
In another embodiment, for the LSTM model, steps 1 and 2 above are the same. Step 3 is replaced with: the sone in I have sone apples is replaced with some, same, son and one respectively, and the language environment probability corresponding to each candidate word is calculated; the results for each candidate word are as follows:
Step 4 is replaced with:
According to the formula (the product of the coefficient, the similarity and the reciprocal of the language environment probability), the assessment score corresponding to each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-5.6) = 0.299;
score_same = -1 × 0.925 × 1/(-6.3) = 0.147;
score_son = -1 × 1.86 × 1/(-8.6) = 0.216;
score_one = -0.5 × 1.86 × 1/(-5.1) = 0.182.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
The candidate word assessment under the above three language models takes into account the user's writing habits and also incorporates the similarity information and the assessment information of the language environment, thereby improving the reliability of the candidate word assessment.
As shown in Figure 12, in the eleventh embodiment, a candidate word evaluation method is provided, which includes the following steps:
Step S111: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S112: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the candidate word and the wrong word.
For the implementation of this step, refer to the description of the second embodiment; it is not repeated here.
Step S113: replace the wrong word with each candidate word respectively to obtain candidate sentences, and determine the assessment probability of the corresponding candidate word according to each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the neighboring words of the candidate word.
For the implementation of this step, refer to the description of the fourth embodiment; it is not repeated here.
Step S114: obtain the error message of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S115: determine the assessment score corresponding to each candidate word according to the similarity, the assessment probability and the error message.
In this embodiment, the assessment score corresponding to each candidate word is determined according to the similarity between the candidate word and the wrong word, the assessment probability and the error message. This takes into account both the similarity between the candidate word and the wrong word and the user's writing habits, and also incorporates the assessment information of the language model, which improves the reliability of the candidate word assessment result.
In one embodiment, the language environment probability of each word is expressed as the log value of the probability of that word at its position, and the assessment probability of the candidate word is obtained on this basis; further, the assessment score corresponding to each candidate word can be determined according to the similarity, the reciprocal of the assessment probability and the error message.
In one embodiment, the step of determining the assessment score corresponding to each candidate word according to the similarity, the assessment probability and the error message includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the similarity, the assessment probability and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the similarity, the assessment probability and a second coefficient.
In one embodiment, the assessment score of each candidate word is calculated according to a formula that multiplies the coefficient, the similarity and the reciprocal of the assessment probability.
The specific calculation of the assessment score is illustrated below.
Suppose that, when writing an English sentence on an intelligent interactive tablet, the user miswrites some as sone, and the context is I have sone apples.
1. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as the candidate words. The similarities between the candidate words and the wrong word are determined to be: S_some = 1.675; S_same = 0.925; S_son = 1.86; S_one = 1.86.
3. The sone in I have sone apples is replaced with some, same, son and one respectively, which gives:
Candidate sentence one: I have some apples;
Candidate sentence two: I have same apples;
Candidate sentence three: I have son apples;
Candidate sentence four: I have one apples.
Based on the N-Gram model, with N taking the value 3, the neighboring word of the candidate word some in candidate sentence one is apples, the neighboring word of the candidate word same in candidate sentence two is apples, the neighboring word of the candidate word son in candidate sentence three is apples, and the neighboring word of the candidate word one in candidate sentence four is apples. The assessment probability corresponding to each candidate word is calculated with the N-Gram model, and the obtained assessment probabilities are as follows:
4. According to the formula (the product of the coefficient, the similarity and the reciprocal of the assessment probability), the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-5.6) = 0.299;
score_same = -1 × 0.925 × 1/(-6.35) = 0.146;
score_son = -1 × 1.86 × 1/(-7.58) = 0.245;
score_one = -0.5 × 1.86 × 1/(-5.26) = 0.177.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
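Steps 3 and 4 of this example can be sketched as follows. The helper names `candidate_sentences` and `similarity_score` are illustrative assumptions; the similarities and the log-valued assessment probabilities are copied from the worked example rather than computed by a language model.

```python
def candidate_sentences(sentence, wrong_word, candidates):
    """Replace the wrong word with each candidate; returns {candidate: sentence}."""
    words = sentence.split()
    position = words.index(wrong_word)
    return {
        cand: " ".join(words[:position] + [cand] + words[position + 1:])
        for cand in candidates
    }


def similarity_score(similarity, assessment_prob, same_initial):
    """Hypothetical helper: coefficient x similarity x 1/assessment_prob."""
    coefficient = -1.0 if same_initial else -0.5
    return coefficient * similarity * (1.0 / assessment_prob)


wrong_word = "sone"
sentences = candidate_sentences(
    "I have sone apples", wrong_word, ["some", "same", "son", "one"]
)

# Values taken from the worked example (N-Gram model).
similarity = {"some": 1.675, "same": 0.925, "son": 1.86, "one": 1.86}
assessment_prob = {"some": -5.6, "same": -6.35, "son": -7.58, "one": -5.26}

scores = {
    cand: round(
        similarity_score(similarity[cand], assessment_prob[cand],
                         cand[0] == wrong_word[0]),
        3,
    )
    for cand in sentences
}
print(sentences)
print(scores)
```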
In another embodiment, for the BiLSTM model, the neighboring words of the candidate word some in candidate sentence one are I, have and apples; the neighboring words of the candidate word same in candidate sentence two are I, have and apples; the neighboring words of the candidate word son in candidate sentence three are I, have and apples; and the neighboring words of the candidate word one in candidate sentence four are I, have and apples. On this basis, the assessment probability corresponding to the candidate word in each candidate sentence is calculated as follows:
Step 4 is replaced with:
According to the formula (the product of the coefficient, the similarity and the reciprocal of the assessment probability), the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-2.88) = 0.582;
score_same = -1 × 0.925 × 1/(-3.62) = 0.256;
score_son = -1 × 1.86 × 1/(-4.1) = 0.454;
score_one = -0.5 × 1.86 × 1/(-2.62) = 0.355.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
In another embodiment, for the LSTM model, the neighboring words of the candidate word some in candidate sentence one are I, have and apples; the neighboring words of the candidate word same in candidate sentence two are I, have and apples; the neighboring words of the candidate word son in candidate sentence three are I, have and apples; and the neighboring words of the candidate word one in candidate sentence four are I, have and apples. On this basis, the assessment probability corresponding to the candidate word in each candidate sentence is calculated as follows:
Step 4 is replaced with:
According to the formula (the product of the coefficient, the similarity and the reciprocal of the assessment probability), the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-3.175) = 0.528;
score_same = -1 × 0.925 × 1/(-3.75) = 0.247;
score_son = -1 × 1.86 × 1/(-4.6) = 0.404;
score_one = -0.5 × 1.86 × 1/(-3.45) = 0.270.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
The candidate word assessment under the above three language models takes into account the user's writing habits and also incorporates the similarity information and the assessment information of the language environment, thereby improving the reliability of the candidate word assessment.
As shown in Figure 13, in the twelfth embodiment, a candidate word evaluation method is provided, which includes the following steps:
Step S121: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S122: determine the editing distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S123: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the candidate word and the wrong word.
For the implementation of this step, refer to the description of the second embodiment; it is not repeated here.
Step S124: determine the language environment probability of each candidate word at the position of the wrong word.
For the implementation of this step, refer to the description of the third embodiment; it is not repeated here.
Step S125: obtain the error message of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S126: determine the assessment score corresponding to each candidate word according to the editing distance, the similarity, the language environment probability and the error message.
In this embodiment, the assessment score corresponding to each candidate word is determined according to the editing distance and the similarity between the candidate word and the wrong word, the language environment probability and the error message. This takes into account the editing distance and the similarity between the candidate word and the wrong word as well as the user's writing habits, and also incorporates the assessment information of the language model, which improves the reliability of the candidate word assessment result and further helps improve the efficiency and accuracy of text editing.
In one embodiment, when the language environment probability is expressed as the log value of the probability of each candidate word at the position of the wrong word, the assessment score corresponding to each candidate word can be determined according to the reciprocal of the editing distance, the similarity, the reciprocal of the language environment probability and the error message.
In one embodiment, the step of determining the assessment score corresponding to each candidate word according to the editing distance, the similarity, the language environment probability and the error message includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the editing distance, the similarity, the language environment probability and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the editing distance, the similarity, the language environment probability and a second coefficient.
In one embodiment, the assessment score of each candidate word is calculated according to a formula that multiplies the coefficient, the similarity, the reciprocal of the editing distance and the reciprocal of the language environment probability.
The specific calculation of the assessment score is illustrated below.
Suppose that, when writing an English sentence on an intelligent interactive tablet, the user miswrites some as sone, and the context is I have sone apples.
1. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as the candidate words. The similarities between the candidate words and the wrong word are determined to be: S_some = 1.675; S_same = 0.925; S_son = 1.86; S_one = 1.86. The editing distances corresponding to the candidate words some, same, son and one are 1, 2, 1 and 1 respectively.
3. The sone in I have sone apples is replaced with some, same, son and one respectively, and the language environment probability corresponding to each candidate word is calculated with the N-Gram model, with N taking the value 3. The obtained language environment probabilities of the candidate words are as follows:
4. According to the formula (the product of the coefficient, the similarity, the reciprocal of the editing distance and the reciprocal of the language environment probability), the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-5.7) = 0.294;
score_same = -1 × 0.925 × 1/2 × 1/(-6) = 0.077;
score_son = -1 × 1.86 × 1/1 × 1/(-8) = 0.233;
score_one = -0.5 × 1.86 × 1/1 × 1/(-4.8) = 0.194.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
In another embodiment, for the BiLSTM model, steps 1 and 2 above are the same. Step 3 is replaced with: the sone in I have sone apples is replaced with some, same, son and one respectively, and the language environment probability corresponding to each candidate word is calculated; the results for each candidate word are as follows:
Step 4 is replaced with:
According to the formula (the product of the coefficient, the similarity, the reciprocal of the editing distance and the reciprocal of the language environment probability), the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-4.9) = 0.342;
score_same = -1 × 0.925 × 1/2 × 1/(-5.07) = 0.091;
score_son = -1 × 1.86 × 1/1 × 1/(-7.6) = 0.245;
score_one = -0.5 × 1.86 × 1/1 × 1/(-4.66) = 0.200.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
In another embodiment, for the LSTM model, steps 1 and 2 above are the same. Step 3 is replaced with: the sone in I have sone apples is replaced with some, same, son and one respectively, and the language environment probability corresponding to each candidate word is calculated; the results for each candidate word are as follows:
Step 4 is replaced with:
According to the formula (the product of the coefficient, the similarity, the reciprocal of the editing distance and the reciprocal of the language environment probability), the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-5.6) = 0.299;
score_same = -1 × 0.925 × 1/2 × 1/(-6.3) = 0.073;
score_son = -1 × 1.86 × 1/1 × 1/(-8.6) = 0.216;
score_one = -0.5 × 1.86 × 1/1 × 1/(-5.1) = 0.182.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
The candidate word assessment under the above three language models takes into account the user's writing habits and also incorporates the editing distance, the similarity and the assessment information of the language environment, thereby improving the reliability of the candidate word assessment.
As shown in Figure 14, in the thirteenth embodiment, a candidate word evaluation method is provided, which includes the following steps:
Step S131: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S132: determine the editing distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S133: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the candidate word and the wrong word.
For the implementation of this step, refer to the description of the second embodiment; it is not repeated here.
Step S134: replace the wrong word with each candidate word respectively to obtain candidate sentences, and determine the assessment probability of the corresponding candidate word according to each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the neighboring words of the candidate word.
For the implementation of this step, refer to the description of the fourth embodiment; it is not repeated here.
Step S135: obtain the error message of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S136: determine the assessment score corresponding to each candidate word according to the editing distance, the similarity, the assessment probability and the error message.
In this embodiment, the assessment score corresponding to each candidate word is determined according to the editing distance and the similarity between the candidate word and the wrong word, the assessment probability and the error message. This takes into account the editing distance and the similarity between the candidate word and the wrong word as well as the user's writing habits, and also incorporates the assessment information of the language model, which improves the reliability of the candidate word assessment result and helps improve the efficiency and accuracy of text editing.
In one embodiment, the language environment probability of each word is expressed as the log value of the probability of that word at its position, and the assessment probability of the candidate word is obtained on this basis; further, the assessment score corresponding to each candidate word can be determined according to the reciprocal of the editing distance, the similarity, the reciprocal of the assessment probability and the error message.
In one embodiment, the step of determining the assessment score corresponding to each candidate word according to the editing distance, the similarity, the assessment probability and the error message includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the editing distance, the similarity, the assessment probability and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the editing distance, the similarity, the assessment probability and a second coefficient.
In one embodiment, the assessment score of each candidate word is calculated according to a formula that multiplies the coefficient, the similarity, the reciprocal of the editing distance and the reciprocal of the assessment probability.
The specific calculation of the assessment score is illustrated below.
Suppose that, when writing an English sentence on an intelligent interactive tablet, the user miswrites some as sone, and the context is I have sone apples.
1. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as the candidate words. The similarities between the candidate words and the wrong word are calculated to be: S_some = 1.675; S_same = 0.925; S_son = 1.86; S_one = 1.86. The editing distances corresponding to the candidate words some, same, son and one are 1, 2, 1 and 1 respectively.
3. The sone in I have sone apples is replaced with some, same, son and one respectively, and the assessment probability corresponding to each candidate word is calculated with the N-Gram model; the obtained assessment probabilities are as follows:
4. According to the formula (the product of the coefficient, the similarity, the reciprocal of the editing distance and the reciprocal of the assessment probability), the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-5.6) = 0.299;
score_same = -1 × 0.925 × 1/2 × 1/(-6.35) = 0.073;
score_son = -1 × 1.86 × 1/1 × 1/(-7.58) = 0.245;
score_one = -0.5 × 1.86 × 1/1 × 1/(-5.26) = 0.177.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
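The full combination used in this embodiment (coefficient, similarity, reciprocal of the editing distance, reciprocal of the assessment probability) can be sketched as below. The `Candidate` class, the function name and the parameter names `first_coeff` and `second_coeff` are illustrative assumptions; the values -1 and -0.5 are the coefficients appearing in the worked example, and the inputs are copied from steps 2 to 4 above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    editing_distance: int
    similarity: float
    assessment_prob: float  # log value, hence negative


def assessment_score(candidate, wrong_word, first_coeff=-1.0, second_coeff=-0.5):
    """Hypothetical helper combining all four factors of the thirteenth embodiment."""
    # Error message: whether the wrong word and the candidate share an initial letter.
    same_initial = candidate.word[0] == wrong_word[0]
    coefficient = first_coeff if same_initial else second_coeff
    return (coefficient
            * candidate.similarity
            * (1.0 / candidate.editing_distance)
            * (1.0 / candidate.assessment_prob))


candidates = [
    Candidate("some", 1, 1.675, -5.6),
    Candidate("same", 2, 0.925, -6.35),
    Candidate("son", 1, 1.86, -7.58),
    Candidate("one", 1, 1.86, -5.26),
]

scores = {c.word: round(assessment_score(c, "sone"), 3) for c in candidates}
print(scores)
```

Keeping the coefficients as parameters makes it easy to try other weightings of the initial-letter information without touching the rest of the calculation.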
In another embodiment, for the BiLSTM model, the assessment probability corresponding to the candidate word in each candidate sentence is calculated as follows:
Step 4 is replaced with:
According to the formula (the product of the coefficient, the similarity, the reciprocal of the editing distance and the reciprocal of the assessment probability), the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-2.88) = 0.582;
score_same = -1 × 0.925 × 1/2 × 1/(-3.62) = 0.128;
score_son = -1 × 1.86 × 1/1 × 1/(-4.1) = 0.454;
score_one = -0.5 × 1.86 × 1/1 × 1/(-2.62) = 0.355.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
In another embodiment, for the LSTM model, the assessment probability corresponding to the candidate word in each candidate sentence is calculated as follows:
Step 4 is replaced with:
According to the formula (the product of the coefficient, the similarity, the reciprocal of the editing distance and the reciprocal of the assessment probability), the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-3.175) = 0.528;
score_same = -1 × 0.925 × 1/2 × 1/(-3.75) = 0.123;
score_son = -1 × 1.86 × 1/1 × 1/(-4.6) = 0.404;
score_one = -0.5 × 1.86 × 1/1 × 1/(-3.45) = 0.270.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) corresponding to the wrong word sone.
The candidate word assessment under the above three language models takes into account the user's writing habits and also incorporates the editing distance, the similarity and the assessment information of the language environment, thereby improving the reliability of the candidate word assessment result.
In one embodiment, after the assessment score corresponding to each candidate word has been obtained according to any of the above embodiments, the candidate word evaluation method further includes the step of: determining, from the multiple candidate words and according to the assessment scores, the error correction term corresponding to the wrong word, and correcting the wrong word with the error correction term. The wrong word can thus be corrected more accurately.
Optionally, the candidate word with the highest assessment score is selected from the multiple candidate words of the wrong word as the error correction term corresponding to the wrong word. Further, the wrong word can be replaced with the error correction term, so that the wrong word is corrected automatically.
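Selecting the highest-scoring candidate and substituting it for the wrong word might look like the following minimal sketch. The helper name `correct_sentence` is an assumption, and the scores are the N-Gram scores from the thirteenth embodiment's example.

```python
def correct_sentence(sentence, wrong_word, scores):
    """Replace wrong_word with the highest-scoring candidate (illustrative helper)."""
    correction = max(scores, key=scores.get)
    words = [correction if w == wrong_word else w for w in sentence.split()]
    return correction, " ".join(words)


scores = {"some": 0.299, "same": 0.073, "son": 0.245, "one": 0.177}
correction, fixed = correct_sentence("I have sone apples", "sone", scores)
print(correction, "->", fixed)
```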
In addition, in one embodiment, after the assessment score corresponding to each candidate word has been obtained according to any of the above embodiments, the candidate word evaluation method further includes the step of: sorting the multiple candidate words according to their assessment scores and displaying them in the sorted order, so that candidate words with higher assessment scores are displayed nearer the front, which gives the user a better prompt.
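The ordering for display can be sketched in a few lines. The function name `rank_candidates` and the numbered output format are assumptions made for illustration; the scores are again those of the thirteenth embodiment's N-Gram example.

```python
def rank_candidates(scores):
    """Return candidate words ordered from highest to lowest assessment score."""
    return sorted(scores, key=scores.get, reverse=True)


scores = {"some": 0.299, "same": 0.073, "son": 0.245, "one": 0.177}
ranking = rank_candidates(scores)

# Show the candidates as a numbered suggestion list, best first.
for position, word in enumerate(ranking, start=1):
    print(f"{position}. {word} ({scores[word]})")
```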
In one embodiment, on the basis of any of the above embodiments, the candidate word evaluation method further includes the step of: detecting whether a word to be detected is in a preset dictionary and, if not, determining that the word to be detected is a wrong word. For example, each word is scanned and checked against the dictionary; a word that is not in the dictionary is determined to be a wrong word.
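A dictionary lookup of this kind can be sketched as follows. The helper name, the whitespace tokenization, the lower-casing and the tiny dictionary are all assumptions made for illustration.

```python
def find_wrong_words(sentence, dictionary):
    """Return the words of the sentence that are not found in the dictionary."""
    return [word for word in sentence.split() if word.lower() not in dictionary]


# Tiny stand-in for a preset dictionary (hypothetical contents).
dictionary = {"i", "have", "some", "same", "son", "one", "apples"}
wrong_words = find_wrong_words("I have sone apples", dictionary)
print(wrong_words)
```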
It should be understood that the preset dictionary may be a general English dictionary, a Chinese dictionary or the like, or another specific dictionary; the dictionary can be selected according to the actual situation.
In one embodiment, optionally, after a wrong word is detected, the candidate word evaluation method further includes the step of determining the candidate words corresponding to the wrong word. This step may be: calculating the editing distance between the wrong word and the known words in the dictionary, and selecting the known words whose editing distance is within a set range to obtain the multiple candidate words corresponding to the wrong word. For example, known words whose editing distance is less than 3 are selected as candidate words, which improves the effectiveness of the candidate word assessment.
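Candidate selection by editing distance can be sketched as below, using the standard Levenshtein distance as one common definition of editing distance; whether the application uses exactly this variant is an assumption, as are the helper names and the toy dictionary.

```python
def editing_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidates_for(wrong_word, dictionary, max_distance=3):
    """Known words whose editing distance to wrong_word is below max_distance."""
    return sorted(
        word for word in dictionary
        if 0 < editing_distance(wrong_word, word) < max_distance
    )


# Toy dictionary (hypothetical contents).
dictionary = {"some", "same", "son", "one", "apples", "have"}
candidates = candidates_for("sone", dictionary)
print(candidates)
```

With this toy dictionary the result matches the candidates assumed in the worked examples, with editing distances 1, 2, 1 and 1 for some, same, son and one.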
It should be understood that, for each of the foregoing method embodiments, although the steps in the flowcharts are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the method embodiments may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor are these sub-steps or stages necessarily executed in sequence, as they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same idea as the candidate word evaluation methods of the first to thirteenth embodiments above, the embodiments of the present application also provide corresponding candidate word evaluation devices.
As shown in Figure 15, in the fourteenth embodiment, a candidate word evaluation device includes:
a candidate word acquisition module 101, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word;
a distance determining module 102, configured to determine the editing distance between each candidate word and the wrong word;
an error message acquisition module 103, configured to obtain the error message of the wrong word relative to each candidate word;
and a first evaluation module 104, configured to determine the assessment score corresponding to each candidate word according to the editing distance and the error message.
When a wrong word is detected, the candidate word evaluation device of the above embodiment first obtains the corresponding multiple candidate words, determines the edit distance between each candidate word and the wrong word, and determines the error information of the wrong word relative to each candidate word. The evaluation score corresponding to each candidate word is then determined according to the edit distance and the error information. Because this takes into account both the error patterns that occur when words are written and the edit distance between words, the reliability of the candidate word evaluation result can be improved.
In one embodiment, the first evaluation module 104 is configured to determine the evaluation score corresponding to each candidate word according to the reciprocal of the edit distance and the error information.
In one embodiment, the error information of the wrong word relative to each candidate word includes: information on whether the wrong word and the candidate word have the same first letter;
The first evaluation module 104 includes: a first scoring submodule, configured to calculate the evaluation score corresponding to the candidate word according to the edit distance and a first coefficient if the wrong word and the candidate word have the same first letter; and a second scoring submodule, configured to calculate the evaluation score corresponding to the candidate word according to the edit distance and a second coefficient if the wrong word and the candidate word have different first letters.
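The two scoring submodules can be illustrated with a short sketch. It assumes that the score is the product of a coefficient and the reciprocal of the edit distance; the multiplication, the function name `score_by_distance`, and the coefficient values 1.0 and 0.5 are illustrative assumptions that this section does not specify.

```python
def score_by_distance(wrong_word: str, candidate: str, distance: int,
                      k_same: float = 1.0, k_diff: float = 0.5) -> float:
    """Score a candidate from the reciprocal of its edit distance.

    k_same plays the role of the first coefficient (same first letter) and
    k_diff the role of the second coefficient (different first letter);
    both values are hypothetical.
    """
    if distance <= 0:
        raise ValueError("a candidate must differ from the wrong word")
    same_first_letter = wrong_word[:1] == candidate[:1]
    coefficient = k_same if same_first_letter else k_diff
    return coefficient * (1.0 / distance)


print(score_by_distance("aple", "apple", 1))  # same first letter
print(score_by_distance("aple", "maple", 1))  # different first letter
```

With these assumed coefficients, a candidate that shares its first letter with the wrong word scores higher than an equally distant candidate that does not.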
As shown in Figure 16, in the fifteenth embodiment, the candidate word evaluation device includes:
Candidate word acquisition module 201, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word;
Similarity determining module 202, configured to determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word;
Error information acquisition module 203, configured to obtain error information of the wrong word relative to each candidate word;
And second evaluation module 204, configured to determine the evaluation score corresponding to each candidate word according to the similarity and the error information.
When a wrong word is detected, the candidate word evaluation device of the above embodiment first obtains the corresponding multiple candidate words, determines the similarity between each candidate word and the wrong word, and determines the error information of the wrong word relative to each candidate word. The evaluation score corresponding to each candidate word is then determined according to the similarity and the error information. Because this takes into account both the error patterns that occur when words are written and the similarity between words, the reliability of the candidate word evaluation result can be improved.
In one embodiment, the similarity determining module 202 includes: a first similarity calculation submodule or a second similarity calculation submodule.
The first similarity calculation submodule is configured to calculate the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word. Alternatively, the second similarity calculation submodule is configured to calculate the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word, together with the edit distance between each candidate word and the wrong word.
In one embodiment, the second similarity calculation submodule is configured to calculate the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word, together with the reciprocal of the edit distance between each candidate word and the wrong word.
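A possible reading of the two similarity calculation submodules is sketched below. Several details are assumptions made only for illustration, since this section does not define them: each "rate" is taken as the matched length divided by the length of the longer word, and the terms are simply added. The function names are hypothetical.

```python
from typing import Optional


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (characters need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest run of adjacent characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            run = prev[j - 1] + 1 if ca == cb else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


def similarity(wrong_word: str, candidate: str, distance: Optional[int] = None) -> float:
    """Sum of the two rates; adds 1/distance when an edit distance is supplied."""
    longest = max(len(wrong_word), len(candidate))
    if longest == 0:
        return 0.0
    value = (lcs_length(wrong_word, candidate) / longest
             + longest_common_substring_length(wrong_word, candidate) / longest)
    if distance is not None and distance > 0:
        value += 1.0 / distance  # second submodule: reciprocal of the edit distance
    return value


print(similarity("aple", "apple"))              # first submodule (rates only)
print(similarity("aple", "apple", distance=1))  # second submodule
```

Calling `similarity` without `distance` corresponds to the first submodule; supplying it corresponds to the second.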
In one embodiment, the error information of the wrong word relative to each candidate word includes: information on whether the wrong word and the candidate word have the same first letter. Optionally, the second evaluation module 204 includes: a first scoring submodule, configured to calculate the evaluation score corresponding to the candidate word according to the similarity and a first coefficient if the wrong word and the candidate word have the same first letter; and a second scoring submodule, configured to calculate the evaluation score corresponding to the candidate word according to the similarity and a second coefficient if the wrong word and the candidate word have different first letters.
As shown in Figure 17, in the sixteenth embodiment, the candidate word evaluation device includes:
Candidate word acquisition module 301, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word;
First probability determination module 302, configured to determine the language environment probability of each candidate word at the position of the wrong word;
Error information acquisition module 303, configured to obtain error information of the wrong word relative to each candidate word;
And third evaluation module 304, configured to determine the evaluation score corresponding to each candidate word according to the language environment probability and the error information.
When a wrong word is detected, the candidate word evaluation device of the above embodiment first obtains the corresponding multiple candidate words, determines the probability of each candidate word at the position of the wrong word, and determines the error information of the wrong word relative to each candidate word. The evaluation score corresponding to each candidate word is then determined according to the language environment probability and the error information. Because this takes into account both the user's habits when writing words and the context in which the word appears, the reliability of the candidate word evaluation result can be improved.
In one embodiment, the first probability determination module 302 is configured to calculate, according to a preset language model, the probability of each candidate word at the position of the wrong word, and to take the logarithm of that probability as the language environment probability of the candidate word.
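The log-probability computation can be sketched with a toy bigram model standing in for the preset language model. The corpus, the add-one smoothing, and all function names are hypothetical; the section itself names N-Gram, BiLSTM, and LSTM models without fixing any of these details.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams, with "<s>" marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def language_environment_probability(model, tokens, position, candidate):
    """Log of P(candidate | previous word) at the wrong-word position (add-one smoothed)."""
    unigrams, bigrams = model
    previous = tokens[position - 1] if position > 0 else "<s>"
    vocabulary_size = len(unigrams)
    probability = (bigrams[(previous, candidate)] + 1) / (unigrams[previous] + vocabulary_size)
    return math.log(probability)


# Hypothetical training corpus; "aple" at index 2 is the detected wrong word.
model = train_bigram(["i like apples", "i like apples", "i like maple syrup"])
tokens = "i like aple".split()
for word in ("apples", "maple"):
    print(word, language_environment_probability(model, tokens, 2, word))
```

Because the value is the logarithm of a probability below 1, it is negative, and a more probable candidate yields a value closer to zero.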
In one embodiment, the third evaluation module 304 is configured to determine the evaluation score corresponding to each candidate word according to the reciprocal of the language environment probability and the error information. The language model includes, but is not limited to, an N-Gram model, a BiLSTM model, or an LSTM model.
In one embodiment, the error information of the wrong word relative to each candidate word includes: information on whether the wrong word and the candidate word have the same first letter. Accordingly, the third evaluation module 304 includes: a first scoring submodule, configured to calculate the evaluation score of the candidate word according to the reciprocal of the language environment probability and a first coefficient if the wrong word and the candidate word have the same first letter; and a second scoring submodule, configured to calculate the evaluation score of the candidate word according to the reciprocal of the language environment probability and a second coefficient if the wrong word and the candidate word have different first letters.
As shown in Figure 18, in the seventeenth embodiment, the candidate word evaluation device includes:
Candidate word acquisition module 401, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word;
Second probability determination module 402, configured to replace the wrong word with each candidate word respectively to obtain candidate sentences, and to determine the evaluation probability of the corresponding candidate word according to each candidate sentence, the evaluation probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word;
Error information acquisition module 403, configured to obtain error information of the wrong word relative to each candidate word;
And fourth evaluation module 404, configured to determine the evaluation score corresponding to each candidate word according to the evaluation probability and the error information.
When a wrong word is detected, the candidate word evaluation device of the above embodiment first obtains the corresponding multiple candidate words, determines the evaluation probability corresponding to each candidate word, and determines the error information of the wrong word relative to each candidate word. The evaluation score corresponding to each candidate word is then determined according to the evaluation probability and the error information. Because this takes into account both the user's habits when writing words and the context in which the word appears, the reliability of the candidate word evaluation result can be improved.
In one embodiment, the second probability determination module 402 is further configured to calculate, according to a preset language model, the probabilities of the candidate word and of its neighboring words at their respective positions in the candidate sentence, and to take the logarithm of each probability as the language environment probability of the corresponding word; the language environment probabilities of the candidate word and of its neighboring words in the candidate sentence are then averaged to obtain the evaluation probability of the candidate word in that candidate sentence.
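The averaging step can be sketched as follows. The sketch assumes that the neighboring words are the immediate left and right neighbors (`window=1`) and uses a hand-written bigram table in place of a trained language model; the table, its fallback probability, and the function names are all hypothetical.

```python
import math

# Hypothetical bigram probabilities standing in for a preset language model.
TOY_BIGRAMS = {
    ("<s>", "i"): 0.9, ("i", "like"): 0.5,
    ("like", "apples"): 0.4, ("apples", "today"): 0.3,
    ("like", "maple"): 0.1, ("maple", "today"): 0.01,
}


def toy_log_prob(sentence, index):
    """Log probability of the word at `index` given the previous word."""
    previous = sentence[index - 1] if index > 0 else "<s>"
    return math.log(TOY_BIGRAMS.get((previous, sentence[index]), 1e-3))


def evaluation_probability(tokens, position, candidate, log_prob, window=1):
    """Replace the wrong word, then average the log probabilities of the
    candidate word and of its neighbors within `window` positions."""
    sentence = tokens[:position] + [candidate] + tokens[position + 1:]
    start = max(0, position - window)
    stop = min(len(sentence), position + window + 1)
    values = [log_prob(sentence, i) for i in range(start, stop)]
    return sentence, sum(values) / len(values)


tokens = "i like aple today".split()
for word in ("apples", "maple"):
    print(evaluation_probability(tokens, 2, word, toy_log_prob))
```

Including the neighbors lets a candidate be penalized when it makes the following word unlikely, which a single-position probability would not capture.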
In one embodiment, the fourth evaluation module 404 is specifically configured to determine the evaluation score corresponding to each candidate word according to the reciprocal of the evaluation probability and the error information. The language model includes, but is not limited to, an N-Gram model, a BiLSTM model, or an LSTM model.
In one embodiment, the error information of the wrong word relative to each candidate word includes: information on whether the wrong word and the candidate word have the same first letter. The fourth evaluation module 404 includes: a first scoring submodule, configured to calculate the evaluation score corresponding to the candidate word according to the evaluation probability and a first coefficient if the wrong word and the candidate word have the same first letter; and a second scoring submodule, configured to calculate the evaluation score corresponding to the candidate word according to the evaluation probability and a second coefficient if the wrong word and the candidate word have different first letters.
As shown in Figure 19, in the eighteenth embodiment, the candidate word evaluation device includes:
Candidate word acquisition module 501, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word;
Distance determining module 502, configured to determine the edit distance between each candidate word and the wrong word;
Second probability determination module 503, configured to replace the wrong word with each candidate word respectively to obtain candidate sentences, and to determine the evaluation probability of the corresponding candidate word according to each candidate sentence, the evaluation probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word;
And fifth evaluation module 504, configured to determine the evaluation score of each candidate word according to the edit distance and the evaluation probability.
In one embodiment, the second probability determination module 503 is configured to replace the wrong word with each candidate word respectively to obtain candidate sentences; to calculate, according to a preset language model, the probabilities of the candidate word and of its neighboring words at their respective positions in the candidate sentence; and to take the logarithm of each probability as the language environment probability of the corresponding word. The language environment probabilities of the candidate word and of its neighboring words in the candidate sentence are averaged to obtain the evaluation probability of the candidate word in that candidate sentence.
The language model includes, but is not limited to, an N-Gram model, a BiLSTM model, or an LSTM model.
Based on the above embodiment, the fifth evaluation module 504 is specifically configured to determine the evaluation score corresponding to each candidate word according to the reciprocal of the edit distance and the reciprocal of the evaluation probability.
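How the two reciprocals might be combined is sketched below, assuming they are multiplied; this section does not state the combining operation, and the function name is hypothetical. The evaluation probability is an average of log probabilities and is therefore negative, so the product is negative as well.

```python
def combined_score(edit_distance: int, evaluation_probability: float) -> float:
    """Product of 1/edit_distance and 1/evaluation_probability (assumed combination)."""
    if edit_distance <= 0:
        raise ValueError("edit distance must be positive")
    if evaluation_probability == 0:
        raise ValueError("evaluation probability (a log value) must be non-zero")
    return (1.0 / edit_distance) * (1.0 / evaluation_probability)


print(combined_score(1, -1.0))  # close and probable candidate
print(combined_score(2, -4.0))  # distant and improbable candidate
```

Under this assumed convention, a candidate that is closer to the wrong word and more probable in context receives a more negative score. Whether the application ranks by the smallest or the largest score is not stated in this section.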
As shown in Figure 20, in the nineteenth embodiment, the candidate word evaluation device includes:
Candidate word acquisition module 601, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word;
Similarity determining module 602, configured to determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word;
Second probability determination module 603, configured to replace the wrong word with each candidate word respectively to obtain candidate sentences, and to determine the evaluation probability of the corresponding candidate word according to each candidate sentence, the evaluation probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word;
And sixth evaluation module 604, configured to determine the evaluation score of each candidate word according to the similarity and the evaluation probability.
In one embodiment, the second probability determination module 603 is configured to replace the wrong word with each candidate word respectively to obtain candidate sentences; to calculate, according to a preset language model, the probabilities of the candidate word and of its neighboring words at their respective positions in the candidate sentence; and to take the logarithm of each probability as the language environment probability of the corresponding word. The language environment probabilities of the candidate word and of its neighboring words in the candidate sentence are averaged to obtain the evaluation probability of the candidate word in that candidate sentence.
The language model includes, but is not limited to, an N-Gram model, a BiLSTM model, or an LSTM model.
Based on the above embodiment, the sixth evaluation module 604 is specifically configured to determine the evaluation score corresponding to each candidate word according to the reciprocal of the evaluation probability and the similarity.
As shown in Figure 21, in the twentieth embodiment, the candidate word evaluation device includes:
Candidate word acquisition module 701, configured to obtain, when a wrong word is detected, multiple candidate words corresponding to the wrong word;
Distance determining module 702, configured to determine the edit distance between each candidate word and the wrong word;
First probability determination module 703, configured to determine the language environment probability of each candidate word at the position of the wrong word;
Error information acquisition module 704, configured to obtain error information of the wrong word relative to each candidate word;
And seventh evaluation module 705, configured to determine the evaluation score corresponding to each candidate word according to the edit distance, the language environment probability, and the error information.
When a wrong word is detected, the candidate word evaluation device of the above embodiment first obtains the corresponding multiple candidate words, determines the edit distance between each candidate word and the wrong word and the probability of each candidate word at the position of the wrong word, and determines the error information of the wrong word relative to each candidate word. The evaluation score corresponding to each candidate word is then determined according to the edit distance, the language environment probability, and the error information. Because this takes into account the edit distance, the context in which the word appears, and the user's habits when writing words, the reliability of the candidate word evaluation result can be improved.
In one embodiment, the first probability determination module 703 is configured to calculate, according to a preset language model, the probability of each candidate word at the position of the wrong word, and to take the logarithm of that probability as the language environment probability of the candidate word.
In one embodiment, the seventh evaluation module 705 is specifically configured to determine the evaluation score corresponding to each candidate word according to the reciprocal of the edit distance, the reciprocal of the language environment probability, and the error information. The language model includes, but is not limited to, an N-Gram model, a BiLSTM model, or an LSTM model.
In one embodiment, the error information of the wrong word relative to each candidate word includes: information on whether the wrong word and the candidate word have the same first letter. Accordingly, the seventh evaluation module 705 includes: a first scoring submodule, configured to calculate the evaluation score corresponding to the candidate word according to the edit distance, the language environment probability, and a first coefficient if the wrong word and the candidate word have the same first letter; and a second scoring submodule, configured to calculate the evaluation score corresponding to the candidate word according to the edit distance, the language environment probability, and a second coefficient if the wrong word and the candidate word have different first letters.
As shown in Figure 22, in the twenty-first embodiment, the candidate word evaluation device includes:
Candidate word acquisition module 801, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
Distance determining module 802, configured to determine the edit distance between each candidate word and the wrong word.
Similarity determining module 803, configured to determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
Error information acquisition module 804, configured to obtain error information of the wrong word relative to each candidate word.
And eighth evaluation module 805, configured to determine the evaluation score corresponding to each candidate word according to the edit distance, the similarity, and the error information.
When a wrong word is detected, the candidate word evaluation device of the above embodiment first obtains the corresponding multiple candidate words, determines the edit distance and the similarity between each candidate word and the wrong word, and determines the error information of the wrong word relative to each candidate word. The evaluation score corresponding to each candidate word is then determined according to the edit distance, the similarity, and the error information. Because this takes into account the edit distance, the similarity, and the user's habits when writing words, the reliability of the candidate word evaluation result can be improved.
In one embodiment, the eighth evaluation module 805 includes: a first scoring submodule, configured to calculate the evaluation score corresponding to the candidate word according to the edit distance, the similarity, and a first coefficient if the wrong word and the candidate word have the same first letter; and a second scoring submodule, configured to calculate the evaluation score corresponding to the candidate word according to the edit distance, the similarity, and a second coefficient if the wrong word and the candidate word have different first letters.
As shown in Figure 23, in the twenty-second embodiment, a candidate word evaluation device is provided. The candidate word evaluation device of this embodiment includes:
Candidate word acquisition module 901, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
Distance determining module 902, configured to determine the edit distance between each candidate word and the wrong word.
Second probability determination module 903, configured to replace the wrong word with each candidate word respectively to obtain candidate sentences, and to determine the evaluation probability of the corresponding candidate word according to each candidate sentence, the evaluation probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word.
Error information acquisition module 904, configured to obtain error information of the wrong word relative to each candidate word.
And ninth evaluation module 905, configured to determine the evaluation score corresponding to each candidate word according to the edit distance, the evaluation probability, and the error information.
When a wrong word is detected, the candidate word evaluation device of the above embodiment first obtains the corresponding multiple candidate words, determines the edit distance between each candidate word and the wrong word and the evaluation probability of each candidate word at the position of the wrong word, and determines the error information of the wrong word relative to each candidate word. The evaluation score corresponding to each candidate word is then determined according to the edit distance, the evaluation probability, and the error information. Because this takes into account the edit distance, the context in which the word appears, and the user's habits when writing words, the reliability of the candidate word evaluation result can be improved.
In one embodiment, the second probability determination module 903 is specifically configured to calculate, according to a preset language model, the probabilities of the candidate word and of its neighboring words at their respective positions in the candidate sentence, and to take the logarithm of each probability as the language environment probability of the corresponding word; the language environment probabilities of the candidate word and of its neighboring words in the candidate sentence are then averaged to obtain the evaluation probability of the candidate word in that candidate sentence.
In one embodiment, the ninth evaluation module 905 includes: a first scoring submodule, configured to calculate the evaluation score of the candidate word according to the edit distance, the evaluation probability, and a first coefficient if the wrong word and the candidate word have the same first letter; and a second scoring submodule, configured to calculate the evaluation score of the candidate word according to the edit distance, the evaluation probability, and a second coefficient if the wrong word and the candidate word have different first letters.
As shown in Figure 24, in the twenty-third embodiment, a candidate word evaluation device is provided. The candidate word evaluation device of this embodiment includes:
Candidate word acquisition module 1001, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
Similarity determining module 1002, configured to determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
First probability determination module 1003, configured to determine the language environment probability of each candidate word at the position of the wrong word.
Error information acquisition module 1004, configured to obtain error information of the wrong word relative to each candidate word.
Tenth evaluation module 1005, configured to determine the evaluation score corresponding to each candidate word according to the similarity, the language environment probability, and the error information.
When a wrong word is detected, the candidate word evaluation device of the above embodiment first obtains the corresponding multiple candidate words, determines the similarity between each candidate word and the wrong word and the language environment probability of each candidate word at the position of the wrong word, and determines the error information of the wrong word relative to each candidate word. The evaluation score corresponding to each candidate word is then determined according to the similarity, the language environment probability, and the error information. Because this takes into account the similarity, the context in which the word appears, and the user's habits when writing words, the reliability of the candidate word evaluation result can be improved.
In one embodiment, the first probability determination module 1003 is configured to calculate, according to a preset language model, the probability of each candidate word at the position of the wrong word, and to take the logarithm of that probability as the language environment probability of the candidate word.
In one embodiment, the tenth evaluation module 1005 includes: a first scoring submodule, configured to calculate the evaluation score of the candidate word according to the similarity, the language environment probability, and a first coefficient if the wrong word and the candidate word have the same first letter; and a second scoring submodule, configured to calculate the evaluation score of the candidate word according to the similarity, the language environment probability, and a second coefficient if the wrong word and the candidate word have different first letters.
As shown in Figure 25, in the twenty-fourth embodiment, a candidate word evaluation device is provided. The candidate word evaluation device of this embodiment includes:
Candidate word acquisition module 1101, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
Similarity determining module 1102, configured to determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
Second probability determination module 1103, configured to replace the wrong word with each candidate word respectively to obtain candidate sentences, and to determine the evaluation probability of the corresponding candidate word according to each candidate sentence, the evaluation probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word.
Error information acquisition module 1104, configured to obtain error information of the wrong word relative to each candidate word.
And eleventh evaluation module 1105, configured to determine the evaluation score corresponding to each candidate word according to the similarity, the evaluation probability, and the error information.
When a wrong word is detected, the candidate word evaluation device of the above embodiment first obtains the corresponding multiple candidate words, determines the similarity between each candidate word and the wrong word and the evaluation probability of each candidate word at the position of the wrong word, and determines the error information of the wrong word relative to each candidate word. The evaluation score corresponding to each candidate word is then determined according to the similarity, the evaluation probability, and the error information. Because this takes into account the similarity, the context in which the word appears, and the user's habits when writing words, the reliability of the candidate word evaluation result can be improved.
In one embodiment, the second probability determination module 1103 is specifically configured to calculate, according to a preset language model, the probabilities of the candidate word and of its neighboring words at their respective positions in the candidate sentence, and to take the logarithm of each probability as the language environment probability of the corresponding word; the language environment probabilities of the candidate word and of its neighboring words in the candidate sentence are then averaged to obtain the evaluation probability of the candidate word in that candidate sentence.
In one embodiment, the eleventh evaluation module 1105 includes: a first scoring submodule, configured to calculate the evaluation score of the candidate word according to the similarity, the evaluation probability, and a first coefficient if the wrong word and the candidate word have the same first letter; and a second scoring submodule, configured to calculate the evaluation score of the candidate word according to the similarity, the evaluation probability, and a second coefficient if the wrong word and the candidate word have different first letters.
As shown in Figure 26, in the twenty-fifth embodiment, a candidate word evaluation device is provided. The candidate word evaluation device of this embodiment includes:
Candidate word acquisition module 1201, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
Distance determining module 1202, configured to determine the edit distance between each candidate word and the wrong word.
Similarity determining module 1203, configured to determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
First probability determination module 1204, configured to determine the language environment probability of each candidate word at the position of the wrong word.
Error information acquisition module 1205, configured to obtain error information of the wrong word relative to each candidate word.
And twelfth evaluation module 1206, configured to determine the evaluation score corresponding to each candidate word according to the edit distance, the similarity, the language environment probability, and the error information.
When a wrong word is detected, the candidate word evaluation device of the above embodiment first obtains the corresponding multiple candidate words, determines the edit distance and the similarity between each candidate word and the wrong word and the language environment probability of each candidate word at the position of the wrong word, and determines the error information of the wrong word relative to each candidate word. The evaluation score corresponding to each candidate word is then determined according to the edit distance, the similarity, the language environment probability, and the error information. Because this takes into account the edit distance, the similarity, the context in which the word appears, and the user's habits when writing words, the reliability of the candidate word evaluation result can be improved.
In one embodiment, the first probability determination module 1204 is specifically configured to calculate, according to a preset language model, the probability of each candidate word at the position of the wrong word, and to take the logarithm of that probability as the language environment probability of the candidate word.
In one embodiment, the twelfth evaluation module 1206 includes: a first scoring submodule, configured to calculate the evaluation score of the candidate word according to the edit distance, the similarity, the language environment probability, and a first coefficient if the wrong word and the candidate word have the same first letter; and a second scoring submodule, configured to calculate the evaluation score of the candidate word according to the edit distance, the similarity, the language environment probability, and a second coefficient if the wrong word and the candidate word have different first letters.
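A ranking that uses all four inputs might look like the sketch below. It assumes the score is the product of the coefficient, the reciprocal of the edit distance, the similarity, and the reciprocal of the language environment probability; the product form, the coefficient values, the `Features` container, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Features:
    word: str
    edit_distance: int      # positive integer
    similarity: float       # larger means more similar
    log_probability: float  # language environment probability (a log value, negative)


def evaluation_score(wrong_word: str, f: Features,
                     k_same: float = 1.0, k_diff: float = 0.5) -> float:
    """Assumed product form: K * (1/distance) * similarity * (1/log_probability)."""
    coefficient = k_same if wrong_word[:1] == f.word[:1] else k_diff
    return coefficient * (1.0 / f.edit_distance) * f.similarity * (1.0 / f.log_probability)


def rank(wrong_word: str, candidates: list[Features]) -> list[Features]:
    """Sort so that the most negative score comes first (assumed convention)."""
    return sorted(candidates, key=lambda f: evaluation_score(wrong_word, f))


# Hypothetical feature values for three candidates of the wrong word "aple".
candidates = [
    Features("maple", 1, 1.4, -2.0),
    Features("apple", 1, 1.4, -1.0),
    Features("ample", 1, 1.2, -3.0),
]
print([f.word for f in rank("aple", candidates)])
```

Because the log probability is negative, every factor that makes a candidate more plausible (smaller distance, higher similarity, higher probability, matching first letter) pushes the product further below zero, which is why the sketch sorts in ascending order.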
As shown in Figure 27, in the twenty-sixth embodiment, a candidate word evaluation device is provided. The candidate word evaluation device of this embodiment includes:
Candidate word acquisition module 1301, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
Distance determining module 1302, configured to determine the edit distance between each candidate word and the wrong word.
Similarity determining module 1303, configured to determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
Second probability determination module 1304, configured to replace the wrong word with each candidate word respectively to obtain candidate sentences, and to determine the evaluation probability of the corresponding candidate word according to each candidate sentence, the evaluation probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word.
Error information acquisition module 1305, configured to obtain error information of the wrong word relative to each candidate word.
And thirteenth evaluation module 1306, configured to determine the evaluation score corresponding to each candidate word according to the edit distance, the similarity, the evaluation probability, and the error information.
When a wrong word is detected, the candidate word evaluation apparatus of the above embodiment first obtains the corresponding multiple candidate words. For each candidate word it determines the edit distance to the wrong word, the similarity with the wrong word, and the assessment probability of the candidate word at the position of the wrong word, and it determines the error information of the wrong word relative to each candidate word. The assessment score corresponding to each candidate word is then determined according to the edit distance, similarity, assessment probability and error information. Because this takes into account not only the edit distance, the similarity and the contextual language environment but also the user's habits when writing words, the reliability of the candidate word assessment result can be improved.
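The similarity used by the similarity determination module is described only as being derived from the longest common subsequence and/or the longest common substring. As a minimal sketch, the Python below computes both lengths by dynamic programming and combines them. The normalization by the longer word's length and the equal-weight average are assumptions made for illustration, and the function names are hypothetical.

```python
def lcs_subsequence_len(a: str, b: str) -> int:
    """Length of the longest common subsequence (characters need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest common substring (characters must be adjacent)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(wrong: str, candidate: str) -> float:
    """Assumed combination: average of both lengths, each divided by the longer word's length."""
    longest = max(len(wrong), len(candidate))
    if longest == 0:
        return 1.0
    subseq = lcs_subsequence_len(wrong, candidate) / longest
    substr = longest_common_substring_len(wrong, candidate) / longest
    return (subseq + substr) / 2
```

For example, "appel" and "apple" share the subsequence "appe" (length 4) and the substring "app" (length 3), so this particular combination yields a similarity of about 0.7.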
In one embodiment, the second probability determination module 1304 is specifically configured to calculate, according to a preset language model, the probability of the candidate word and of each word neighboring the candidate word at its respective position in the candidate sentence, and to take the log value of each probability as the language environment probability of the corresponding word; and to average the language environment probability of the candidate word and the language environment probabilities of its neighboring words in the candidate sentence to obtain the assessment probability of the candidate word in that candidate sentence.
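The averaging step can be sketched as follows. The language model here is a toy bigram model with add-one smoothing, chosen only so that the example is self-contained; the neighborhood is assumed to be one word on each side of the candidate. The function names, the `window` parameter and the smoothing scheme are all illustrative assumptions rather than details taken from the description.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(tokens, i, unigrams, bigrams):
    """Log of the add-one-smoothed bigram probability of tokens[i] given the previous token."""
    prev = tokens[i - 1] if i > 0 else "<s>"
    vocab = len(unigrams)
    return math.log((bigrams[(prev, tokens[i])] + 1) / (unigrams[prev] + vocab))


def assessment_probability(sentence, pos, candidate, unigrams, bigrams, window=1):
    """Replace the wrong word at `pos` with `candidate`, then average the log
    probabilities of the candidate and of its neighbors within `window` words."""
    tokens = sentence[:pos] + [candidate] + sentence[pos + 1:]
    lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
    scores = [log_prob(tokens, i, unigrams, bigrams) for i in range(lo, hi)]
    return sum(scores) / len(scores)
```

Because the values are averaged logarithms of probabilities, they are negative, and a value closer to zero indicates a candidate that fits its context better.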
In one embodiment, the thirteenth evaluation module 1306 is further configured to determine the assessment score corresponding to each candidate word according to the reciprocal of the edit distance, the similarity, the reciprocal of the assessment probability, and the error information.
In one embodiment, the thirteenth evaluation module 1306 includes: a first scoring submodule, configured to calculate the assessment score corresponding to the candidate word according to the edit distance, similarity, assessment probability and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, configured to calculate the assessment score corresponding to the candidate word according to the edit distance, similarity, assessment probability and a second coefficient if the wrong word and the candidate word have different initial letters.
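The two scoring submodules differ only in the coefficient they apply. The sketch below shows one way such a score could be assembled from the reciprocal of the edit distance, the similarity, the reciprocal of the assessment probability, and a coefficient chosen by comparing initial letters. The product form, the sign flip on the (negative) log-based assessment probability, and the default values of `k1` and `k2` are assumptions for illustration only and should not be read as the exact formula of this embodiment.

```python
def assessment_score(wrong, candidate, edit_distance, similarity,
                     assessment_prob, k1=1.0, k2=0.5):
    """Hypothetical combination of the four inputs named in the text.

    assessment_prob is assumed to be an average of log probabilities, hence negative;
    -1 / assessment_prob is then positive and grows as the candidate fits better.
    k1 applies when the initial letters match, k2 otherwise (both values are made up).
    """
    if edit_distance <= 0:
        raise ValueError("edit distance must be positive for a wrong word")
    if assessment_prob >= 0:
        raise ValueError("assessment probability is expected to be a negative log value")
    same_initial = wrong[:1].lower() == candidate[:1].lower()
    k = k1 if same_initial else k2
    return k * similarity * (1.0 / edit_distance) * (-1.0 / assessment_prob)
```

With these assumed values, a candidate sharing the wrong word's initial letter scores twice as high as an otherwise identical candidate that does not, which reflects the idea that users rarely mistype the first letter.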
In one embodiment, on the basis of the candidate word evaluation apparatus of any of the above embodiments, the candidate word evaluation apparatus further includes: a candidate word determination module, configured to calculate the edit distance between the wrong word and the known words in a preset dictionary, and to select the known words whose edit distance is within a set range as the multiple candidate words corresponding to the wrong word.
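Candidate selection by edit distance can be sketched with a standard Levenshtein distance (insertions, deletions and substitutions, each costing 1). The range of 1 to 2 used as the default below is an assumed setting, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidate_words(wrong, dictionary, max_distance=2):
    """Known words whose edit distance to the wrong word lies within the set range."""
    return [w for w in dictionary if 1 <= edit_distance(wrong, w) <= max_distance]
```

For the wrong word "appel" and the dictionary `["apple", "apply", "banana", "ape"]`, this keeps "apple", "apply" and "ape" (each at distance 2) and discards "banana".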
In one embodiment, on the basis of the candidate word evaluation apparatus of any of the above embodiments, the candidate word evaluation apparatus further includes: a correction module, configured to determine the correction word corresponding to the wrong word from the multiple candidate words according to the assessment scores, and to correct the wrong word with the correction word. Optionally, the correction module is configured to determine, from the multiple candidate words, the candidate word with the highest assessment score as the correction word corresponding to the wrong word.
In one embodiment, on the basis of the candidate word evaluation apparatus of any of the above embodiments, the candidate word evaluation apparatus further includes: a sorting module, configured to sort the multiple candidate words according to the assessment scores and to display the sorted candidate words.
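Once every candidate has an assessment score, correction and sorting reduce to a maximum and a descending sort. The following is a minimal sketch, assuming the scores are held in a dictionary keyed by candidate word; the function names are hypothetical.

```python
def rank_candidates(scores):
    """Candidate words ordered from highest to lowest assessment score."""
    return sorted(scores, key=scores.get, reverse=True)


def correction_word(scores):
    """The candidate word with the highest assessment score."""
    if not scores:
        raise ValueError("no candidate words to choose from")
    return max(scores, key=scores.get)
```

Ties are resolved by insertion order in both functions, since `sorted` is stable and `max` returns the first maximal element.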
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the steps of the candidate word evaluation method of any of the above embodiments when executing the computer program.
With this computer device, the candidate words of a wrong word can be assessed more effectively than with the traditional approach of assessing candidate words by edit distance alone.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the candidate word evaluation method of any of the above embodiments.
With this computer-readable storage medium, the candidate words of a wrong word can be assessed more effectively than with the traditional approach of assessing candidate words by edit distance alone.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to fall within the scope of this specification.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
The terms "including" and "having" in the embodiments herein, and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units (modules) is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product or device.
"Multiple" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that A and B exist at the same time, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
"First" and "second" as used herein only distinguish similar objects and do not imply a specific order of the objects. It should be understood that, where permitted, "first" and "second" may be interchanged in specific order or sequence, so that the embodiments described herein can be implemented in an order other than the one illustrated or described here.
The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the scope of protection of this application. The scope of protection of this patent application shall therefore be determined by the appended claims.
Claims (15)
1. A candidate word evaluation method, characterized by comprising:
detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word;
determining the edit distance between each candidate word and the wrong word;
replacing the wrong word with each candidate word in turn to obtain candidate sentences, and determining the assessment probability of the corresponding candidate word from each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word;
obtaining the error information of the wrong word relative to each candidate word;
determining the assessment score corresponding to each candidate word according to the edit distance, assessment probability and error information.
2. The candidate word evaluation method according to claim 1, characterized in that the step of determining the assessment probability of the corresponding candidate word from the candidate sentence comprises:
calculating, according to a preset language model, the probability of the candidate word and of each word neighboring the candidate word at its respective position in the candidate sentence, and taking the log value of each probability as the language environment probability of the corresponding word;
averaging the language environment probability of the candidate word and the language environment probabilities of its neighboring words in the candidate sentence to obtain the assessment probability of the candidate word in the candidate sentence.
3. The candidate word evaluation method according to claim 2, characterized in that the step of determining the assessment score corresponding to each candidate word according to the edit distance, assessment probability and error information comprises:
determining the assessment score corresponding to each candidate word according to the reciprocal of the edit distance, the reciprocal of the assessment probability, and the error information;
and/or
the language model comprises an N-Gram model, a BiLSTM model or an LSTM model.
4. The candidate word evaluation method according to any one of claims 1 to 3, characterized in that the error information of the wrong word relative to each candidate word comprises: information on whether the wrong word and the candidate word have the same initial letter;
the step of determining the assessment score corresponding to each candidate word according to the edit distance, assessment probability and error information comprises:
if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the edit distance, the assessment probability and a first coefficient;
if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the edit distance, the assessment probability and a second coefficient.
5. The candidate word evaluation method according to claim 4, characterized by further comprising the step of:
upon detecting that a word to be detected is not in a preset dictionary, determining that the word to be detected is a wrong word.
6. The candidate word evaluation method according to claim 5, characterized by further comprising, after the wrong word is detected, the step of:
calculating the edit distance between the wrong word and the known words in the dictionary, and selecting the known words whose edit distance is within a set range as the multiple candidate words corresponding to the wrong word.
7. The candidate word evaluation method according to any one of claims 1, 2, 3, 5 and 6, characterized by further comprising the step of:
determining the correction word corresponding to the wrong word from the multiple candidate words according to the assessment scores, and correcting the wrong word with the correction word;
and/or
sorting the multiple candidate words according to the assessment scores, and displaying the sorted candidate words.
8. The candidate word evaluation method according to claim 7, characterized in that the step of determining the correction word corresponding to the wrong word from the multiple candidate words according to the assessment scores comprises:
determining, from the multiple candidate words, the candidate word with the highest assessment score as the correction word corresponding to the wrong word.
9. The candidate word evaluation method according to claim 3, characterized in that the assessment score of each candidate word is calculated according to the following formula:
wherein word denotes the candidate word, D_edit denotes the edit distance between the candidate word and the wrong word, the probability term denotes the assessment probability of the candidate word, score_word denotes the assessment score corresponding to the candidate word, and K denotes the error information of the wrong word relative to each candidate word; if the candidate word and the wrong word have the same initial letter, K takes the value K1, and otherwise K takes the value K2; K1 and K2 are preset values.
10. A candidate word evaluation apparatus, characterized by comprising:
a candidate word acquisition module, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word;
a distance determination module, configured to determine the edit distance between each candidate word and the wrong word;
a second probability determination module, configured to replace the wrong word with each candidate word in turn to obtain candidate sentences, and to determine the assessment probability of the corresponding candidate word from each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word;
an error information acquisition module, configured to obtain the error information of the wrong word relative to each candidate word;
and a ninth evaluation module, configured to determine the assessment score corresponding to each candidate word according to the edit distance, assessment probability and error information.
11. The candidate word evaluation apparatus according to claim 10, characterized in that the second probability determination module comprises:
a probability determination submodule, configured to calculate, according to a preset language model, the probability of the candidate word and of each word neighboring the candidate word at its respective position in the candidate sentence, and to take the log value of each probability as the language environment probability of the corresponding word;
an assessment probability determination submodule, configured to average the language environment probability of the candidate word and the language environment probabilities of its neighboring words in the candidate sentence to obtain the assessment probability of the candidate word in the candidate sentence.
12. The candidate word evaluation apparatus according to claim 10 or 11, characterized in that the error information of the wrong word relative to each candidate word comprises: information on whether the wrong word and the candidate word have the same initial letter;
the ninth evaluation module comprises:
a first scoring submodule, configured to calculate the assessment score of the candidate word according to the edit distance, the assessment probability and a first coefficient if the wrong word and the candidate word have the same initial letter;
a second scoring submodule, configured to calculate the assessment score of the candidate word according to the edit distance, the assessment probability and a second coefficient if the wrong word and the candidate word have different initial letters.
13. The candidate word evaluation apparatus according to claim 10, characterized by further comprising:
a candidate word determination module, configured to calculate the edit distance between the wrong word and the known words in a preset dictionary, and to select the known words whose edit distance is within a set range as the multiple candidate words corresponding to the wrong word;
and/or
a correction word determination module, configured to determine the correction word corresponding to the wrong word from the multiple candidate words according to the assessment scores, and to correct the wrong word with the correction word;
and/or
a sorting module, configured to sort the multiple candidate words according to the assessment scores and to display the sorted candidate words.
14. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 9 when executing the program.
15. A computer-readable storage medium on which a computer program is stored, characterized in that the program implements the steps of the method of any one of claims 1 to 9 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810321061.6A CN108628827A (en) | 2018-04-11 | 2018-04-11 | Candidate word evaluation method and device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108628827A true CN108628827A (en) | 2018-10-09 |
Family
ID=63705000
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810321061.6A Pending CN108628827A (en) | 2018-04-11 | 2018-04-11 | Candidate word evaluation method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108628827A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103885938A (en) * | 2014-04-14 | 2014-06-25 | 东南大学 | Industry spelling mistake checking method based on user feedback |
CN105550173A (en) * | 2016-02-06 | 2016-05-04 | 北京京东尚科信息技术有限公司 | Text correction method and device |
CN106843523A (en) * | 2016-12-12 | 2017-06-13 | 百度在线网络技术(北京)有限公司 | Character input method and device based on artificial intelligence |
CN107621891A (en) * | 2017-09-28 | 2018-01-23 | 北京新美互通科技有限公司 | A kind of text entry method, device and electronic equipment |
Non-Patent Citations (2)
Title |
---|
张扬: "拼写校正技术在信息检索和文本处理领域的应用", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
郑文曦 等: "自动拼写校对的算法设计和系统实现", 《科技和产业》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111310440A (en) * | 2018-11-27 | 2020-06-19 | 阿里巴巴集团控股有限公司 | Text error correction method, device and system |
CN111310440B (en) * | 2018-11-27 | 2023-05-30 | 阿里巴巴集团控股有限公司 | Text error correction method, device and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5752150B2 (en) | Context-sensitive automatic language correction using an Internet corpus specifically for small keyboard devices | |
US8554537B2 (en) | Method and device for transliteration | |
Luyckx | Scalability issues in authorship attribution | |
CN102831177B (en) | Statement error correction and system thereof | |
CN111241824B (en) | Method for identifying Chinese metaphor information | |
Yazdani et al. | Sentiment classification of financial news using statistical features | |
CN108628826A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108694167A (en) | Candidate word evaluation method, candidate word sorting method and device | |
CN101369285B (en) | Spell emendation method for query word in Chinese search engine | |
CN108681533A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
KR102552811B1 (en) | System for providing cloud based grammar checker service | |
CN114328798A (en) | Processing method, device, equipment, storage medium and program product for searching text | |
CN108628827A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108664466A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108647202A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108595419A (en) | Candidate word evaluation method, candidate word sorting method and device | |
CN108664467A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108681534A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108681535A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108733645A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108733646A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108694166A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN114896382A (en) | Artificial intelligent question-answering model generation method, question-answering method, device and storage medium | |
CN114548049A (en) | Digital regularization method, device, equipment and storage medium | |
CN110457695B (en) | Online text error correction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181009 |