CN108681533A - Candidate word evaluation method, apparatus, computer device and storage medium

Info

Publication number: CN108681533A
Authority: CN (China)
Prior art keywords: word, candidate, candidate word, wrong, probability
Legal status: Granted, Active (the listed status is an assumption, not a legal conclusion)
Application number: CN201810320351.9A
Other languages: Chinese (zh)
Other versions: CN108681533B (en)
Inventor: 李贤�
Current Assignee: Guangzhou Shiyuan Electronics Technology Co Ltd (the listed assignee may be inaccurate)
Original Assignee: Guangzhou Shiyuan Electronics Technology Co Ltd
Application filed by Guangzhou Shiyuan Electronics Technology Co Ltd
Priority to CN201810320351.9A
Publication of CN108681533A
Application granted
Publication of CN108681533B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/232: Orthographic correction, e.g. spell checking or vowelisation

Abstract

The present invention relates to a candidate word evaluation method, apparatus, computer device and storage medium, applied to the field of data processing. The method includes: detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word; determining the edit distance between each candidate word and the wrong word; determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or longest common substring of each candidate word and the wrong word; replacing the wrong word with each candidate word in turn to obtain candidate sentences, and determining the evaluation probability of the corresponding candidate word from each candidate sentence; obtaining error information of the wrong word relative to each candidate word; and determining the evaluation score of each candidate word from the edit distance, similarity, evaluation probability and error information. Embodiments of the invention address the low reliability of candidate word evaluation and help improve the reliability of the candidate word evaluation result.

Description

Candidate word evaluation method, apparatus, computer device and storage medium
Technical field
The present invention relates to the technical field of data processing, and in particular to a candidate word evaluation method, apparatus, computer device and storage medium.
Background
Popular word processors such as Word, WPS and WordPerfect embed an English spell-checking function. When a misspelled word is found, the function displays a prompt or offers corresponding correction suggestions.
In the course of implementing the present invention, the inventor found the following problem in the prior art: existing error correction methods mainly rely on dictionary-based detection, and once a misspelling is found the candidate words for the wrong word are evaluated by edit distance alone. This approach is too simple and rigid, and the reliability of the resulting candidate word evaluation is unsatisfactory.
Summary of the invention
In view of this, and to address the insufficient accuracy of existing approaches to evaluating candidate words, it is necessary to provide a candidate word evaluation method, apparatus, computer device and storage medium.
The scheme provided by embodiments of the present invention includes:
A candidate word evaluation method, including the following steps: detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word; determining the edit distance between each candidate word and the wrong word; determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or longest common substring of each candidate word and the wrong word; replacing the wrong word with each candidate word in turn to obtain candidate sentences, and determining the evaluation probability of the corresponding candidate word from each candidate sentence, the evaluation probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probability of the words adjacent to the candidate word; obtaining error information of the wrong word relative to each candidate word; and determining the evaluation score of each candidate word from the edit distance, similarity, evaluation probability and error information.
A candidate word evaluation apparatus, including: a candidate word acquisition module, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word; a distance determination module, for determining the edit distance between each candidate word and the wrong word; a similarity determination module, for determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or longest common substring of each candidate word and the wrong word; a second probability determination module, for replacing the wrong word with each candidate word in turn to obtain candidate sentences and determining the evaluation probability of the corresponding candidate word from each candidate sentence, the evaluation probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probability of the words adjacent to the candidate word; an error information acquisition module, for obtaining error information of the wrong word relative to each candidate word; and a thirteenth evaluation module, for determining the evaluation score of each candidate word from the edit distance, similarity, evaluation probability and error information.
The above candidate word evaluation method and apparatus determine the edit distance and similarity between each candidate word and the wrong word, determine the evaluation probability of each candidate word, and determine the evaluation score of each candidate word from the edit distance, similarity, evaluation probability and error information, thereby evaluating each candidate word. This takes into account both the writing errors in the word itself and the information in the surrounding language context, which helps improve the accuracy of the candidate word evaluation result.
A computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the above candidate word evaluation method when executing the computer program.
Through the computer program running on the processor, the above computer device helps improve the accuracy of the candidate word evaluation result.
A computer storage medium storing a computer program that, when executed by a processor, implements the above candidate word evaluation method.
Through the computer program it stores, the above computer storage medium helps improve the accuracy of the candidate word evaluation result.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the candidate word evaluation method in one embodiment;
Fig. 2 is a schematic flowchart of the candidate word evaluation method of the first embodiment;
Fig. 3 is a schematic flowchart of the candidate word evaluation method of the second embodiment;
Fig. 4 is a schematic flowchart of the candidate word evaluation method of the third embodiment;
Fig. 5 is a schematic flowchart of the candidate word evaluation method of the fourth embodiment;
Fig. 6 is a schematic flowchart of the candidate word evaluation method of the fifth embodiment;
Fig. 7 is a schematic flowchart of the candidate word evaluation method of the sixth embodiment;
Fig. 8 is a schematic flowchart of the candidate word evaluation method of the seventh embodiment;
Fig. 9 is a schematic flowchart of the candidate word evaluation method of the eighth embodiment;
Fig. 10 is a schematic flowchart of the candidate word evaluation method of the ninth embodiment;
Fig. 11 is a schematic flowchart of the candidate word evaluation method of the tenth embodiment;
Fig. 12 is a schematic flowchart of the candidate word evaluation method of the eleventh embodiment;
Fig. 13 is a schematic flowchart of the candidate word evaluation method of the twelfth embodiment;
Fig. 14 is a schematic flowchart of the candidate word evaluation method of the thirteenth embodiment;
Fig. 15 is a schematic structural diagram of the candidate word evaluation apparatus of the fourteenth embodiment;
Fig. 16 is a schematic structural diagram of the candidate word evaluation apparatus of the fifteenth embodiment;
Fig. 17 is a schematic structural diagram of the candidate word evaluation apparatus of the sixteenth embodiment;
Fig. 18 is a schematic structural diagram of the candidate word evaluation apparatus of the seventeenth embodiment;
Fig. 19 is a schematic structural diagram of the candidate word evaluation apparatus of the eighteenth embodiment;
Fig. 20 is a schematic structural diagram of the candidate word evaluation apparatus of the nineteenth embodiment;
Fig. 21 is a schematic structural diagram of the candidate word evaluation apparatus of the twentieth embodiment;
Fig. 22 is a schematic structural diagram of the candidate word evaluation apparatus of the twenty-first embodiment;
Fig. 23 is a schematic structural diagram of the candidate word evaluation apparatus of the twenty-second embodiment;
Fig. 24 is a schematic structural diagram of the candidate word evaluation apparatus of the twenty-third embodiment;
Fig. 25 is a schematic structural diagram of the candidate word evaluation apparatus of the twenty-fourth embodiment;
Fig. 26 is a schematic structural diagram of the candidate word evaluation apparatus of the twenty-fifth embodiment;
Fig. 27 is a schematic structural diagram of the candidate word evaluation apparatus of the twenty-sixth embodiment.
Detailed description
To make the purpose, technical scheme and advantages of the present invention clearer, the invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The candidate word evaluation method provided by the application can be applied in the application environment shown in Fig. 1. The internal structure of the computer device may be as shown in Fig. 1, including a processor, a memory, a display screen and an input device connected through a system bus. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The computer program, when executed by the processor, implements a candidate word evaluation method. The display screen of the computer device may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen; a button, trackball or trackpad provided on the housing of the computer device; or an external keyboard, trackpad or mouse.
The computer device may be a terminal, including but not limited to personal computers, laptops, smartphones and intelligent interactive tablets. An intelligent interactive tablet can detect and recognize the user's writing operations, perform error detection on the written content, and even automatically correct miswritten words.
Those skilled in the art will understand that the structure shown in Fig. 1 is only a block diagram of the part of the structure relevant to the scheme of the application and does not limit the computer devices to which the scheme is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
As shown in Fig. 2, the first embodiment provides a candidate word evaluation method including the following steps:
Step S11: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
A wrong word is a misspelled or miswritten word, that is, a word that does not exist in the corresponding dictionary.
A wrong word can be any text-type unit such as a symbol, a character, a word or an English word. A candidate word is a word determined from the wrong word that is similar to it and is likely to be the correct word (correction word) corresponding to the wrong word. There may be one, two, three or more candidate words; embodiments of the present invention do not limit their number. If only one candidate word is determined, that candidate word is used directly as the correction word.
The wrong word can be detected in various ways, for example by dictionary lookup; embodiments of the present invention do not limit the way the wrong word is detected.
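The dictionary lookup mentioned above can be sketched as follows. This is a minimal illustration, not the detection procedure of the patent: the function name `find_wrong_words`, the regular-expression tokenizer and the toy dictionary are all assumptions made for the example.

```python
import re

def find_wrong_words(sentence, dictionary):
    """Return the tokens of `sentence` that are absent from `dictionary`."""
    # Assumption: tokens are runs of ASCII letters, compared case-insensitively.
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return [token for token in tokens if token.lower() not in dictionary]

# Toy dictionary, for illustration only.
TOY_DICTIONARY = {"i", "have", "some", "same", "one", "apples"}
wrong = find_wrong_words("I have sone apples", TOY_DICTIONARY)
```

A real system would use a full dictionary and a tokenizer suited to the language being checked.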
Step S12: determine the edit distance between each candidate word and the wrong word.
The edit distance between a candidate word and the wrong word measures how much the two differ; it may refer to the difference in the number of letters, the difference in strokes, and so on. For character strings or English words, the edit distance is the minimum number of edit operations needed to turn one string into the other, where the permitted operations are substituting one character, inserting one character and deleting one character.
In one embodiment, the edit distance values are stored in a two-dimensional array d[i, j], where d[i, j] is the minimum number of steps needed to convert the string s[1…i] into the string t[1…j]. When i equals 0, the string s is empty, so d[0, j] corresponds to inserting j characters to convert s into t; when j equals 0, the string t is empty, so d[i, 0] corresponds to deleting i characters to convert s into t. For example, the edit distance between kitten and sitting is 3, because at least the following 3 steps are needed: kitten → sitten (k → s), sitten → sittin (e → i), sittin → sitting (insert g).
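The d[i, j] table described above corresponds to the standard Levenshtein dynamic program. The sketch below is one possible implementation under that reading; the function name `edit_distance` is chosen for illustration and the indices are 0-based as usual in Python.

```python
def edit_distance(s, t):
    """Levenshtein distance with unit-cost substitution, insertion and deletion."""
    rows, cols = len(s) + 1, len(t) + 1
    # d[i][j] = minimum number of operations turning s[:i] into t[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # t is empty: delete the i characters of s[:i]
    for j in range(cols):
        d[0][j] = j  # s is empty: insert the j characters of t[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution, or match at no cost
            )
    return d[-1][-1]
```

Calling `edit_distance("kitten", "sitting")` reproduces the distance of 3 from the example in the text.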
For example, for a wrong word, all known words in the dictionary whose edit distance to it is 1 or 2 are selected to build the candidate word set of the wrong word. Depending on the actual situation, the candidate word set can also be determined using other edit distance conditions. Screening the candidate words for a wrong word in this way improves the reliability of the candidate word evaluation result and also helps reduce subsequent computation time.
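The screening step can be sketched as a scan over the dictionary that keeps words within the chosen distance range. The names `candidate_set` and `max_distance` and the toy dictionary are assumptions for this example; the linear scan is only a simple illustration and may be too slow for a large dictionary.

```python
def edit_distance(s, t):
    """Levenshtein distance, keeping only one row of the table in memory."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]

def candidate_set(wrong_word, dictionary, max_distance=2):
    """Map each dictionary word within 1..max_distance edits to its distance."""
    result = {}
    for word in dictionary:
        distance = edit_distance(wrong_word, word)
        if 1 <= distance <= max_distance:
            result[word] = distance
    return result

# Toy dictionary, for illustration only.
TOY_DICTIONARY = {"some", "same", "one", "apples", "have"}
candidates = candidate_set("sone", TOY_DICTIONARY)
```

With this toy dictionary, `candidates` keeps some, same and one and drops apples and have, whose distance to sone exceeds 2.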
Step S13: obtain error information of the wrong word relative to each candidate word.
The error information of the wrong word relative to each candidate word characterizes how the wrong word differs from that candidate word. Optionally, the error information can be determined from the user's usual habits when writing words, for example that the user tends to omit characters, or that a particular part of a word is most prone to error.
In one embodiment, the error information of the wrong word relative to each candidate word can be whether the wrong word and the candidate word have the same initial letter, whether they have the same number of characters, whether they have the same radical, or whether the wrong word contains illegal symbols.
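Some of the error information listed above can be collected as simple boolean features. The sketch below is an assumption-laden illustration: the function name `error_info`, the dictionary keys, and the reading of "illegal symbol" as "any non-alphabetic character" are not from the patent, and the radical comparison (which applies to Chinese characters) is omitted.

```python
def error_info(wrong_word, candidate):
    """Collect simple difference features between a wrong word and a candidate."""
    return {
        # Whether both words start with the same letter (case-insensitive).
        "same_initial": wrong_word[:1].lower() == candidate[:1].lower(),
        # Whether both words contain the same number of characters.
        "same_length": len(wrong_word) == len(candidate),
        # Assumption: any non-alphabetic character counts as an illegal symbol.
        "has_illegal_symbol": not wrong_word.isalpha(),
    }

info = error_info("sone", "some")
```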
Step S14: determine the evaluation score of each candidate word from the edit distance and the error information.
The evaluation score obtained in this step reflects how likely each candidate word is to be the correction word (correct word) for the wrong word. In effect, this step uses the edit distance and error information between the candidate word and the wrong word to assess that likelihood.
In the candidate word evaluation method of the above embodiment, when a wrong word is detected, multiple corresponding candidate words are first obtained; the edit distance between each candidate word and the wrong word is determined, along with the error information of the wrong word relative to each candidate word; and the evaluation score of each candidate word is determined from the edit distance and error information. This takes into account both the user's habits when writing words and the edit distance between words, which helps improve the reliability of the candidate word evaluation result.
In one embodiment, determining the evaluation score of each candidate word from the edit distance and error information includes: determining the evaluation score of each candidate word from the reciprocal of the edit distance and the error information.
In general, the larger the edit distance, the more the candidate word differs from the wrong word and the less likely it is to be the correction word. Using the reciprocal of the edit distance together with the error information therefore helps ensure the reliability of the candidate word's evaluation score.
In one embodiment, the error information of the wrong word relative to each candidate word includes whether the wrong word and the candidate word have the same initial letter. Accordingly, the step of determining the evaluation score of each candidate word from the edit distance and error information includes: if the wrong word and the candidate word have the same initial letter, calculating the evaluation score of the candidate word from the reciprocal of the edit distance and a first coefficient; if they have different initial letters, calculating the evaluation score of the candidate word from the reciprocal of the edit distance and a second coefficient.
For example, suppose the error information of the wrong word relative to each candidate word is denoted by $K$, the edit distance by $D_{edit}$, and the evaluation score of a candidate word by $\mathrm{score}_{word}$. The score of each candidate word can then be calculated as:
$\mathrm{score}_{word} = K \times \frac{1}{D_{edit}}$
The value of $K$ depends on whether the candidate word and the wrong word have the same initial letter: if they do, $K$ takes the value $K_1$; otherwise $K$ takes the value $K_2$, where $K_1$ and $K_2$ are preset values. For example, $K$ is 1 when the initial letters are the same and 0.5 when they differ.
For English words or character strings, users' writing habits mean that the initial letter is rarely wrong. Evaluating how likely each candidate word is to be the correction word on the basis of the above embodiment therefore yields a more reasonable evaluation score: with other conditions equal, a candidate word with the same initial letter as the wrong word is more likely to be the correction word, and one with a different initial letter is less likely. It should be understood that the values of K include but are not limited to the example values above.
For example, suppose a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, in the context "I have sone apples". The candidate words for sone are evaluated as follows:
1. Dictionary detection finds that sone does not exist in the English dictionary, so sone is determined to be a wrong word.
2. The candidate words of sone are obtained; assume they are some, same and one.
3. The edit distances from some, same and one to sone are determined to be 1, 2 and 1 respectively.
4. The score of each candidate word is calculated using $\mathrm{score}_{word} = K \times \frac{1}{D_{edit}}$:
$\mathrm{score}_{some} = 1 \times 1 = 1$;
$\mathrm{score}_{same} = 1 \times \frac{1}{2} = 0.5$;
$\mathrm{score}_{one} = 0.5 \times 1 = 0.5$.
It follows that, compared with the other candidate words, some is more likely to be the correct word (correction word) for the wrong word sone.
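The worked example can be reproduced with a few lines of code. This is a sketch only: the function name `evaluation_score` and the parameter names `k_same` and `k_diff` (standing for K1 and K2) are chosen for illustration, and the edit distances are the ones stated in the example.

```python
def evaluation_score(wrong_word, candidate, distance, k_same=1.0, k_diff=0.5):
    """score = K * (1 / D_edit), with K chosen by whether the initial letters match."""
    k = k_same if wrong_word[:1] == candidate[:1] else k_diff
    return k / distance

# Edit distances to "sone", as given in the worked example.
distances = {"some": 1, "same": 2, "one": 1}
scores = {word: evaluation_score("sone", word, d) for word, d in distances.items()}
best = max(scores, key=scores.get)
```

Here `scores` holds 1.0 for some and 0.5 for both same and one, so `best` is some, matching the conclusion of the example.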
The evaluation score finally obtained for each candidate word thus takes into account both the user's habits when writing words and the edit distance between words. Compared with traditional candidate word evaluation based on edit distance alone, this helps improve the reliability of the candidate word evaluation result and the accuracy of the correction word.
As shown in Fig. 3, in the second embodiment, a candidate word evaluation method includes the following steps:
Step S21: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the corresponding step S11 of the above embodiment; it is not repeated here.
Step S22: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or longest common substring of each candidate word and the wrong word.
The longest common subsequence problem is that of finding the longest subsequence common to all sequences in a set of sequences (usually two). A sequence that is a subsequence of each of two or more known sequences, and is the longest of all sequences satisfying this condition, is called the longest common subsequence of the known sequences.
The longest common substring problem is that of finding the longest substring common to all sequences in a set of sequences (usually two). A string that is a substring of each of two or more known sequences, and is the longest of all substrings satisfying this condition, is called the longest common substring of the known sequences.
A subsequence is defined on sequences; it preserves order but need not be contiguous. For example, the sequence X = <B, C, D, B> is a subsequence of the sequence Y = <A, B, C, B, D, A, B>, with corresponding index sequence <2, 3, 5, 7>. A substring is defined on character strings; it preserves order and must be contiguous. For example, the string a = abcd is a substring of the string c = aaabcdddd, but the string b = acdddd is not a substring of c.
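The difference between the two notions can be checked directly in code. In this sketch the helper name `is_subsequence` is an illustrative choice; the substring check simply uses Python's built-in `in` operator on strings.

```python
def is_subsequence(x, y):
    """True if x can be obtained from y by deleting elements, keeping their order."""
    remaining = iter(y)
    # `item in remaining` advances the iterator until it finds item, so each
    # element of x must be found after the position of the previous match.
    return all(item in remaining for item in x)

X = ["B", "C", "D", "B"]
Y = ["A", "B", "C", "B", "D", "A", "B"]
x_in_y = is_subsequence(X, Y)              # X matches Y at positions 2, 3, 5, 7

c = "aaabcdddd"
a_is_substring = "abcd" in c               # contiguous, so a substring
b_is_substring = "acdddd" in c             # not contiguous in c
b_is_subsequence = is_subsequence("acdddd", c)
```

The string acdddd is a subsequence of aaabcdddd even though it is not a substring, which is exactly the distinction drawn in the text.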
The longest common subsequence and/or longest common substring reflect, to some extent, how many characters the candidate word and the wrong word have in common, so a similarity determined on this basis has a concrete physical meaning.
Step S23: obtain error information of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment above.
Step S24: determine the evaluation score of each candidate word from the similarity and the error information.
The evaluation score obtained in this step reflects how likely each candidate word is to be the correction word for the wrong word.
In the candidate word evaluation method of the above embodiment, when a wrong word is detected, multiple corresponding candidate words are first obtained; the similarity between each candidate word and the wrong word is determined from their longest common subsequence and/or longest common substring, along with the error information of the wrong word relative to each candidate word; and the evaluation score of each candidate word is determined from the similarity and error information. This takes into account both the user's habits when writing words and the similarity between words, which helps improve the reliability of the candidate word evaluation result.
In one embodiment, the step of determining the similarity between each candidate word and the wrong word can specifically be: calculating the similarity from at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word.
The longest common subsequence rate, denoted $D_{lcs1}$, is the length of the longest common subsequence divided by the mean length of the sequences in the set. The longest common substring rate, denoted $D_{lcs2}$, is the length of the longest common substring divided by the mean length of the sequences in the set. For example, for the sequence set {dbcabca, cbaba}, the longest common subsequence is baba with length 4, so the longest common subsequence rate is 4 / ((7 + 5) / 2) = 0.67. For the sequence set {babca, cbaba}, the longest common substring is bab with length 3, so the longest common substring rate is 3 / ((5 + 5) / 2) = 0.6.
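The two rates can be computed with standard dynamic programs for the longest common subsequence and longest common substring. The sketch below handles two sequences only; the function names `lcs_length`, `common_substring_length` and `rate` are chosen for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def common_substring_length(a, b):
    """Length of the longest common (contiguous) substring of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            # A run of matching characters extends the run ending one step back.
            curr.append(prev[j - 1] + 1 if ca == cb else 0)
            best = max(best, curr[j])
        prev = curr
    return best

def rate(common_length, a, b):
    """Common length divided by the mean length of the two sequences."""
    return common_length / ((len(a) + len(b)) / 2)

lcs_rate = rate(lcs_length("dbcabca", "cbaba"), "dbcabca", "cbaba")
substring_rate = rate(common_substring_length("babca", "cbaba"), "babca", "cbaba")
```

Rounded to two decimals, `lcs_rate` and `substring_rate` give the 0.67 and 0.6 of the examples in the text.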
The longest common subsequence rate and longest common substring rate consider not only the number of characters the candidate word and the wrong word have in common but also the proportion those characters represent, which further improves the accuracy of the similarity between the candidate word and the wrong word.
Optionally, with the longest common subsequence rate denoted $D_{lcs1}$, the longest common substring rate denoted $D_{lcs2}$, and the similarity between each candidate word and the wrong word denoted $S$, the similarity can be calculated as:
$S = w_1 D_{lcs1}$
or $S = w_2 D_{lcs2}$
or $S = w_1 D_{lcs1} + w_2 D_{lcs2}$
where $w_1$ and $w_2$ are preset weight coefficients whose sum is 1.
Determining the similarity between the candidate word and the wrong word as a weighted sum of the longest common subsequence rate and the longest common substring rate gives an accurate similarity. In general, the longest common subsequence rate $D_{lcs1}$ has the greater influence on the similarity of two words, so its weight coefficient is larger than that of the longest common substring rate $D_{lcs2}$, for example:
$S = 0.7 D_{lcs1} + 0.3 D_{lcs2}$
In another embodiment, the step of determining the similarity between each candidate word and the wrong word can also be: calculating the similarity from at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word, together with the edit distance between each candidate word and the wrong word.
Optionally, the similarity between each candidate word and the wrong word can be calculated as:
$S = \frac{1}{D_{edit}} + w_1 D_{lcs1}$
or $S = \frac{1}{D_{edit}} + w_2 D_{lcs2}$
or $S = \frac{1}{D_{edit}} + w_1 D_{lcs1} + w_2 D_{lcs2}$
Specifically, for example:
$S = \frac{1}{D_{edit}} + 0.7 D_{lcs1}$
or $S = \frac{1}{D_{edit}} + 0.3 D_{lcs2}$
or $S = \frac{1}{D_{edit}} + 0.7 D_{lcs1} + 0.3 D_{lcs2}$
Here $D_{edit}$ is the edit distance, that is, the minimum number of edit operations needed to turn one string into the other, where the permitted operations are substituting one character, inserting one character and deleting one character. In general, the smaller the edit distance, the more similar the two strings.
The similarity between two words can thus be determined from both the edit distance and the number of common characters; adding edit distance as an evaluation dimension helps improve the validity of the similarity calculation.
It will be appreciated that the weight coefficients of the longest common subsequence rate $D_{lcs1}$ and the longest common substring rate $D_{lcs2}$ in the above formulas may also take other values, including but not limited to the examples above.
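The similarity formulas above differ only in which terms are present, so they can be sketched as a single function with an optional edit distance term. The function and parameter names are assumptions for this example, and the input rates 0.75 and 0.5 are sample values used only to exercise the function.

```python
def similarity(lcs_rate, substring_rate, edit_distance=None, w1=0.7, w2=0.3):
    """S = w1 * D_lcs1 + w2 * D_lcs2, plus 1 / D_edit when a distance is given."""
    s = w1 * lcs_rate + w2 * substring_rate
    if edit_distance:
        # Optional term of the second family of formulas; omitted when the
        # distance is None (or 0, which would mean the words are identical).
        s += 1 / edit_distance
    return s

s_without_distance = similarity(0.75, 0.5)
s_with_distance = similarity(0.75, 0.5, edit_distance=1)
```

Setting `w1` or `w2` to 0 gives the single-term variants of the formulas.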
In one embodiment, the error information of the wrong word relative to each candidate word includes whether the wrong word and the candidate word have the same initial letter. Accordingly, the step of determining the evaluation score of each candidate word from the similarity and error information includes: if the wrong word and the candidate word have the same initial letter, calculating the evaluation score of the candidate word from the similarity and a first coefficient; if they have different initial letters, calculating the evaluation score of the candidate word from the similarity and a second coefficient.
Optionally, with the similarity denoted $S$ and the error information of the wrong word relative to each candidate word denoted $K$, the score of each candidate word can be calculated as:
$\mathrm{score}_{word} = K \times S$
The value of $K$ depends on whether the candidate word and the wrong word have the same initial letter: if they do, $K$ takes the value $K_1$; otherwise $K$ takes the value $K_2$, where $K_1$ and $K_2$ are preset values. For example, $K$ is 1 when the initial letters are the same and 0.5 when they differ.
For English words or character strings, users' writing habits mean that the initial letter is rarely wrong. Evaluating how likely each candidate word is to be the correction word by means of the above embodiment therefore yields a more reasonable evaluation score: with other conditions equal, a candidate word with the same initial letter as the wrong word is more likely to be the correction word, and one with a different initial letter is less likely. It should be understood that the values of K include but are not limited to the example values above.
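Scoring and ranking with score = K × S can be sketched as below. The function name `rank_candidates` and the parameters `k_same` and `k_diff` (standing for K1 and K2) are illustrative assumptions; the similarity values are the ones this document computes for the wrong word sone with the formula that includes the edit distance term.

```python
def rank_candidates(wrong_word, similarities, k_same=1.0, k_diff=0.5):
    """Return (candidate, score) pairs sorted by score = K * S, highest first."""
    scored = []
    for candidate, s in similarities.items():
        k = k_same if candidate[:1] == wrong_word[:1] else k_diff
        scored.append((candidate, k * s))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Similarity values S for the candidates of "sone", taken from the document's example.
similarities = {"some": 1.675, "same": 0.925, "one": 1.86}
ranking = rank_candidates("sone", similarities)
top_candidate = ranking[0][0]
```

Although one has the highest similarity, its different initial letter halves its score, so some ranks first under these coefficients.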
For example, suppose a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, the context being I have sone apples. The method of assessing the candidate words for sone includes:
1. Checking against an English dictionary shows that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose edit distance from sone is 1 or 2 are obtained; assume these are some, same and one, which serve as the candidate words.
3. The longest common subsequence rate and the longest common substring rate of some, same and one with respect to sone are determined, and the similarity of each candidate word is calculated as follows:
For candidate word some and wrong word sone, the longest common subsequence is soe, of length 3, so the longest common subsequence rate is 3/((4+4)/2) = 0.75; the longest common substring is so, of length 2, so the longest common substring rate is 2/((4+4)/2) = 0.5;
For candidate word same and wrong word sone, the longest common subsequence is se, of length 2, so the longest common subsequence rate is 2/((4+4)/2) = 0.5; the longest common substring is s or e, of length 1, so the longest common substring rate is 1/((4+4)/2) = 0.25;
For candidate word one and wrong word sone, the longest common subsequence is one, of length 3, so the longest common subsequence rate is 3/((3+4)/2) = 0.86; the longest common substring is one, of length 3, so the longest common substring rate is 3/((3+4)/2) = 0.86;
According to the formula $S = 1/D_{edit} + 0.7 D_{lcs1} + 0.3 D_{lcs2}$, the similarity S of each candidate word to the wrong word is calculated as follows:
some: 1 + 0.7 × 0.75 + 0.3 × 0.5 = 1.675;
same: 1/2 + 0.7 × 0.5 + 0.3 × 0.25 = 0.925;
one: 1 + 0.7 × 0.86 + 0.3 × 0.86 = 1.86.
4. According to the formula $score_{word} = K \times S$, the assessment score of each candidate word is calculated:
$score_{some} = 1 \times 1.675 = 1.675$;
$score_{same} = 1 \times 0.925 = 0.925$;
$score_{one} = 0.5 \times 1.86 = 0.93$.
It follows that candidate word some is more likely than the other candidate words to be the correct word (corrected word) for the wrong word sone.
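The worked example above can be reproduced with a short sketch. The function names, the default weights 0.7 and 0.3, and the coefficients 1 and 0.5 are taken from the example values or chosen for illustration; this is one possible way to compute the quantities, not the implementation of the embodiment. It assumes the candidate word differs from the wrong word, so the edit distance is never 0.

```python
def edit_distance(a, b):
    """Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def lcs_subsequence_len(a, b):
    """Length of the longest common subsequence (characters need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_substring_len(a, b):
    """Length of the longest common substring (characters must be adjacent)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else 0)
            best = max(best, cur[-1])
        prev = cur
    return best


def similarity(candidate, wrong, w1=0.7, w2=0.3):
    """S = 1/D_edit + w1 * D_lcs1 + w2 * D_lcs2, rates normalised by the mean length."""
    mean_len = (len(candidate) + len(wrong)) / 2
    d_lcs1 = lcs_subsequence_len(candidate, wrong) / mean_len
    d_lcs2 = lcs_substring_len(candidate, wrong) / mean_len
    return 1 / edit_distance(candidate, wrong) + w1 * d_lcs1 + w2 * d_lcs2


def score(candidate, wrong, k_same=1.0, k_diff=0.5):
    """score = K * S, with K chosen by whether the first letters match."""
    k = k_same if candidate[0] == wrong[0] else k_diff
    return k * similarity(candidate, wrong)


scores = {w: score(w, "sone") for w in ("some", "same", "one")}
best = max(scores, key=scores.get)
```

Because the sketch does not round the rates to two decimals, the score for one comes out slightly below the 0.93 given above; the ranking of the three candidates is unchanged.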
With the candidate word assessment method of the above embodiment, the assessment score of each candidate word is positively correlated with its similarity to the wrong word and is also affected by whether its first letter differs from that of the wrong word. The final assessment score of each candidate word therefore reflects both the user's writing habits and the similarity between the words, which, compared with traditional candidate word assessment methods based on edit distance, helps improve the reliability of the candidate word assessment results.
As shown in Fig. 4, in a third embodiment, a candidate word assessment method is provided, including the following steps:
Step S31: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the corresponding step S11 of the above embodiment; it is not repeated here.
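As an illustration of this step, the following sketch detects a wrong word by dictionary lookup and collects the dictionary words within an edit distance of 2. The mini dictionary, the function names and the threshold parameter are hypothetical and chosen only to mirror the examples in this description.

```python
def edit_distance(a, b):
    """Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


# Hypothetical mini dictionary; a real system would load a full English word list.
DICTIONARY = {"i", "have", "some", "same", "son", "one", "apples"}


def find_candidates(word, dictionary, max_distance=2):
    """Return [] if the word is in the dictionary; otherwise the nearby dictionary words."""
    if word in dictionary:
        return []  # the word is not a wrong word, so there is nothing to correct
    return sorted(w for w in dictionary
                  if 1 <= edit_distance(w, word) <= max_distance)


candidates = find_candidates("sone", DICTIONARY)
```

A linear scan over the dictionary is adequate for a sketch; a large dictionary would normally call for an index so that not every word has to be compared.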
Step S32: determine the language environment probability of each candidate word at the position of the wrong word.
The language environment probability of a candidate word at the position of the wrong word refers to how reasonable the candidate word is in the corresponding context when it replaces the wrong word; the more reasonable the candidate word is in context, the higher its language environment probability.
In one embodiment, the probability of each candidate word at the position of the wrong word is calculated according to a preset language model, and the log value of that probability is taken as the language environment probability of the candidate word.
Step S33: obtain the error information of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the above first embodiment.
Step S34: determine the assessment score of each candidate word according to the language environment probability and the error information.
The assessment score obtained in this step reflects how likely each candidate word is to be the corrected word (correct word) for the wrong word.
With the candidate word assessment method of the above embodiment, when a wrong word is detected, the multiple candidate words corresponding to it are obtained; the probability of each candidate word at the position of the wrong word is determined, as is the error information of the wrong word relative to each candidate word; and the assessment score of each candidate word is determined according to the language environment probability and the error information. Both the user's writing habits and the context are thus taken into account, which helps improve the reliability of the candidate word assessment results.
In one embodiment, the language model used to determine the language environment probability of each candidate word at the position of the wrong word includes but is not limited to an N-Gram model, a BiLSTM model or an LSTM model.
The N-Gram model is a statistical language model that predicts the n-th item from the preceding (n-1) items. In applications, these items can be phonemes (speech recognition), characters (input methods), words (word segmentation) or base pairs (genetic information). The idea of the N-Gram model is as follows: given a string of letters such as "for ex", which letter is most likely to come next? From training corpus data, a probability distribution over the N possible next letters can be obtained by maximum likelihood estimation: the probability of a is 0.4, the probability of b is 0.0001, the probability of c is ..., subject to the constraint that the N probabilities sum to 1.
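The maximum likelihood estimate described above amounts to counting how often each letter follows a given context. The following sketch shows this for a character-level model; the toy corpus, the function names and the choice of a three-character context are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def train_char_ngram(corpus, n):
    """Count, for every (n-1)-character context, which character follows it."""
    counts = defaultdict(Counter)
    for i in range(len(corpus) - n + 1):
        context = corpus[i:i + n - 1]
        next_char = corpus[i + n - 1]
        counts[context][next_char] += 1
    return counts


def next_char_distribution(counts, context):
    """Maximum likelihood estimate: P(next | context) = c(context, next) / c(context)."""
    followers = counts.get(context, Counter())
    total = sum(followers.values())
    return {ch: c / total for ch, c in followers.items()}


# Hypothetical toy corpus; a real model would be trained on a large text collection.
corpus = "for example for example for extra for exit"
model = train_char_ngram(corpus, n=4)

# Distribution of the letter that follows the context " ex".
dist = next_char_distribution(model, " ex")
most_likely = max(dist, key=dist.get)
```

The probabilities returned for a context always sum to 1, which is the constraint stated above. A context that never occurs in the corpus yields an empty distribution here; practical models add smoothing for that case.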
The long short-term memory neural network model, commonly called the LSTM model, is a special kind of recurrent neural network; it can predict the next likely character from an input character-level sequence.
The bidirectional long short-term memory network model, commonly called the BiLSTM model, has the same structure as the LSTM model, except that the BiLSTM model is connected not only to past states but also to future states. For example, a unidirectional LSTM can be trained to predict "fish" by feeding it the letters one at a time, with the recurrent connections along the time axis remembering past state values. A BiLSTM is additionally fed the next letter of the input sequence on its backward pass, which gives it access to future information. This form of training lets the network fill in gaps between pieces of information rather than predict the information that follows.
In one embodiment, when the language environment probability is expressed as the log value of the probability of each candidate word at the position of the wrong word, the assessment score of each candidate word can be determined according to the reciprocal of the language environment probability and the error information. The assessment score of each candidate word is thus inversely related to the corresponding log probability value and is also affected by the difference information between the candidate word and the wrong word. That is, the final assessment score of each candidate word takes into account both the user's writing habits and the language environment information, which, compared with traditional candidate word assessment methods, helps improve the reliability of candidate word assessment.
In one embodiment, the error information of the wrong word relative to each candidate word includes information on whether the wrong word and the candidate word have the same first letter. Accordingly, the step of determining the assessment score of each candidate word according to the language environment probability and the error information includes: if the wrong word and the candidate word have the same first letter, calculating the assessment score of the candidate word according to the language environment probability and a first coefficient; if the wrong word and the candidate word have different first letters, calculating the assessment score of the candidate word according to the language environment probability and a second coefficient.
Based on any of the above embodiments, let $\log P_{mx}(word)$ denote the language environment probability of a candidate word at the position of the wrong word, where $word$ denotes the candidate word and $mx$ denotes the language model. The assessment score of each candidate word is then calculated according to the following formula:
$score_{word} = -K \times \frac{1}{\log P_{mx}(word)}$
where, if the candidate word and the wrong word have the same first letter, K takes the value K1; otherwise, K takes the value K2. K1 and K2 are preset values.
For example, when the language environment probability is determined by the N-Gram language model, $mx$ is the N-Gram model and the score of each candidate word can be calculated with the above formula.
Likewise, when the language environment probability is determined by the BiLSTM model or the LSTM model, $mx$ is the BiLSTM model or the LSTM model respectively, and the score of each candidate word can be calculated with the above formula.
Here the value of K is selected according to whether the candidate word and the wrong word have the same first letter: K is 1 when they are the same and 0.5 when they differ. It should be understood that the value of K includes but is not limited to the above example values.
For English words or character strings, users' writing habits mean that the first letter is rarely wrong. The above embodiment therefore takes the first letter into account when assessing how likely each candidate word is to be the corrected word for the wrong word, so that the assessment score is determined more reasonably: other conditions being equal, a candidate word whose first letter matches that of the wrong word is more likely to be the corrected word, and one whose first letter differs is less likely.
For example, suppose a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, the context being I have sone apples.
For the N-Gram model, with N = 3:
1. Checking against an English dictionary shows that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose edit distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as the candidate words.
3. some, same, son and one are substituted in turn for sone in I have sone apples, and the language environment probability of each candidate word is calculated according to the N-Gram model; the resulting values appear in the score calculations of step 4. In this calculation:
$P(I, have, some) = c(I, have, some)/c(3\text{-}Gram)$, where c denotes a count, i.e. the number of occurrences of the 3-tuple (I, have, some) in the corpus divided by the count of all 3-tuples;
$P(have \mid I) = c(I, have)/c(I)$, i.e. the number of occurrences of the 2-tuple (I, have) in the corpus divided by the count of all occurrences of I;
$P(I) = c(I)/c(1\text{-}Gram)$, i.e. the number of occurrences of the 1-tuple I in the corpus divided by the count of all 1-tuples.
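4. According to the formula $score_{word} = -K \times \frac{1}{\log P_{mx}(word)}$, where $\log P_{mx}(word)$ is the language environment probability of the candidate word under the N-Gram model, the assessment score of each candidate word is calculated:
$score_{some} = -1 \times 1/(-5.7) = 0.175$;
$score_{same} = -1 \times 1/(-6) = 0.167$;
$score_{son} = -1 \times 1/(-6) = 0.167$;
$score_{one} = -0.5 \times 1/(-4.8) = 0.104$.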
It follows that candidate word some is more likely than the other candidate words to be the correct word (corrected word) for the wrong word sone.
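The score calculation of step 4 can be sketched as follows. The log probability values are those used in the N-Gram example above; the function name and parameter names are illustrative assumptions. The sketch assumes a strictly negative log value, i.e. a probability below 1, so that the reciprocal is defined.

```python
def score_from_log_prob(candidate, wrong, log_prob, k_same=1.0, k_diff=0.5):
    """score = -K * 1 / log_prob, with K chosen by whether the first letters match."""
    k = k_same if candidate[0] == wrong[0] else k_diff
    return -k * (1.0 / log_prob)


# Language environment probabilities (log values) from the N-Gram example above.
log_probs = {"some": -5.7, "same": -6.0, "son": -6.0, "one": -4.8}

scores = {w: round(score_from_log_prob(w, "sone", lp), 3)
          for w, lp in log_probs.items()}
best = max(scores, key=scores.get)
```

Since the log values are negative, a log value closer to 0 (a more probable word) gives a larger score, and the leading minus sign keeps the score positive.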
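In another embodiment, for the BiLSTM model, steps 1 and 2 above are the same. Step 3 is replaced with: some, same, son and one are substituted in turn for sone in I have sone apples, and the language environment probability of each candidate word is calculated with the BiLSTM model; the resulting values appear in the score calculations of step 4.
Step 4 is replaced with:
According to the formula $score_{word} = -K \times \frac{1}{\log P_{mx}(word)}$, where $\log P_{mx}(word)$ is the language environment probability of the candidate word under the BiLSTM model, the assessment score of each candidate word is calculated:
$score_{some} = -1 \times 1/(-4.9) = 0.204$;
$score_{same} = -1 \times 1/(-5.07) = 0.197$;
$score_{son} = -1 \times 1/(-7.6) = 0.132$;
$score_{one} = -0.5 \times 1/(-4.66) = 0.107$.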
Here too, candidate word some is more likely than the other candidate words to be the correct word (corrected word) for the wrong word sone.
In another embodiment, for the LSTM model, steps 1 and 2 above are the same. Step 3 is replaced with: some, same, son and one are substituted in turn for sone in I have sone apples, and the language environment probability of each candidate word is calculated with the LSTM model; the resulting values appear in the score calculations of step 4.
Step 4 is replaced with:
According to the formula $score_{word} = -K \times \frac{1}{\log P_{mx}(word)}$, where $\log P_{mx}(word)$ is the language environment probability of the candidate word under the LSTM model, the assessment score of each candidate word is calculated:
scoresome=-1 × 1/-5.6=0.179;
scoresame=-1 × 1/-6.3=0.159;
scoreson=-1 × 1/-6=0.116;
scoreone=-0.5 × 1/-5.1=0.098.
Here too, candidate word some is more likely than the other candidate words to be the correct word (corrected word) for the wrong word sone.
Calculating the candidate word assessment scores under the above three language models takes into account both the user's writing habits and the language environment information, so the candidate words of the wrong word can be assessed more effectively and the accuracy of error correction improved.
As shown in Fig. 5, in a fourth embodiment, a candidate word assessment method is provided, including the following steps:
Step S41: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the specific implementation of this step, refer to step S11 of the above embodiment; it is not repeated here.
Step S42: replace the wrong word with each candidate word in turn to obtain candidate sentences, and determine the assessment probability of the corresponding candidate word from each candidate sentence; the assessment probability is obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word.
Replacing the wrong word with each candidate word yields the corresponding candidate sentences. A candidate sentence may contain multiple words, one of which is the candidate word. The language environment probability of a word at its position in the sentence refers to how reasonable the word is in the corresponding context; the more reasonable the word is in context, the higher its language environment probability.
In one embodiment, a candidate word may have one or more neighboring words, which may include words directly adjacent to the candidate word as well as words separated from it by an interval. The probability of the candidate word and of each of its neighboring words at their respective positions in the candidate sentence can be calculated according to a preset language model, and the log value of each probability is taken as the language environment probability of the corresponding word. The language environment probabilities of the candidate word and of its neighboring words in the candidate sentence are then averaged to obtain the assessment probability of the candidate word in that candidate sentence. The mean of these language environment probabilities may be either an arithmetic mean or a weighted mean.
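The averaging described above can be sketched as follows. The log values and the weights are hypothetical numbers chosen for illustration, and the function name is an assumption; in practice the log values would come from the language model.

```python
def assessment_probability(log_probs, weights=None):
    """Arithmetic mean of the log values, or their weighted mean if weights are given."""
    if weights is None:
        return sum(log_probs) / len(log_probs)
    return sum(w * lp for w, lp in zip(weights, log_probs)) / sum(weights)


# Hypothetical language environment probabilities (log values) for the candidate
# word and one neighboring word in a candidate sentence.
candidate_log_prob = -5.2
neighbor_log_prob = -6.0

plain_mean = assessment_probability([candidate_log_prob, neighbor_log_prob])

# Weighted variant: the candidate word counts twice as much as its neighbor.
weighted_mean = assessment_probability(
    [candidate_log_prob, neighbor_log_prob], weights=[2, 1])
```

A weighted mean lets the candidate word itself dominate the result while still penalising a candidate that makes its neighbors improbable.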
Step S43: obtain the error information of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the above first embodiment.
Step S44: determine the assessment score of each candidate word according to the assessment probability and the error information.
The assessment score obtained in this step reflects how likely each candidate word is to be the corrected word for the wrong word.
With the candidate word assessment method of the above embodiment, when a wrong word is detected, the multiple candidate words corresponding to it are first obtained; the assessment probability of each candidate word is determined, as is the error information of the wrong word relative to each candidate word; and the assessment score of each candidate word is determined according to the assessment probability and the error information. Both the user's writing habits and the context are thus taken into account, which helps improve the reliability of the candidate word assessment results.
In one embodiment, the language model used to calculate the probability of the candidate word and of each of its neighboring words at their respective positions in the candidate sentence includes but is not limited to an N-Gram model, a BiLSTM model or an LSTM model. For a description of each language model, see the third embodiment.
In one embodiment, the language environment probability of each word is expressed as the log value of the probability of that word at its position, and the assessment probability of the candidate word is obtained on this basis; further, the assessment score of each candidate word can be determined according to the reciprocal of the assessment probability of the candidate word and the error information.
For example, let $\overline{\log P}_{mx}(word)$ denote the assessment probability of a candidate word, K the error information of the wrong word relative to each candidate word, and $score_{word}$ the assessment score of the candidate word. The assessment score of each candidate word can then be calculated according to the following formula:
$score_{word} = -K \times \frac{1}{\overline{\log P}_{mx}(word)}$
where, if the candidate word and the wrong word have the same first letter, K takes the value K1; otherwise, K takes the value K2. K1 and K2 are preset values.
The assessment score of each candidate word is thus inversely related to its assessment probability and is also affected by the first-letter information of the candidate word and the wrong word. That is, the final assessment score of each candidate word takes into account both the user's writing habits and the context information, which improves the reliability of the candidate word assessment results compared with traditional methods that assess candidate words by edit distance.
In one embodiment, the error information of the wrong word relative to each candidate word includes information on whether the wrong word and the candidate word have the same first letter. Accordingly, the step of determining the assessment score of each candidate word according to the assessment probability and the error information includes: if the wrong word and the candidate word have the same first letter, calculating the assessment score of the candidate word according to the assessment probability and a first coefficient; if the wrong word and the candidate word have different first letters, calculating the assessment score of the candidate word according to the assessment probability and a second coefficient.
Optionally, if the assessment probability determined by the N-Gram language model is denoted $\overline{\log P}_{mx}(word)$ with $mx$ being the N-Gram model, the score of each candidate word can be calculated as:
$score_{word} = -K \times \frac{1}{\overline{\log P}_{mx}(word)}$
where the value of K is selected according to whether the candidate word and the wrong word have the same first letter: K is 1 when they are the same and 0.5 when they differ. It should be understood that the value of K includes but is not limited to the above example values.
For English words or character strings, users' writing habits mean that the first letter is rarely wrong. The above embodiment therefore takes the first letter into account when assessing how likely each candidate word is to be the corrected word for the wrong word, so that the assessment score is determined more reasonably: other conditions being equal, a candidate word whose first letter matches that of the wrong word is more likely to be the corrected word, and one whose first letter differs is less likely.
For example, suppose a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, the context being I have sone apples.
1. Checking against an English dictionary shows that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose edit distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as the candidate words.
3. some, same, son and one are substituted in turn for sone in I have sone apples, giving:
Candidate sentence one: I have some apples;
Candidate sentence two: I have same apples;
Candidate sentence three: I have son apples;
Candidate sentence four: I have one apples.
Based on the N-Gram model with N = 3, the neighboring word of the candidate word in each of candidate sentences one to four (some, same, son and one respectively) is apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis; the resulting values appear in the score calculations of step 4.
4. According to the formula $score_{word} = -K \times \frac{1}{\overline{\log P}_{mx}(word)}$, where $\overline{\log P}_{mx}(word)$ is the assessment probability of the candidate word under the N-Gram model, the assessment score of each candidate word is calculated:
$score_{some} = -1 \times 1/(-5.6) = 0.179$;
$score_{same} = -1 \times 1/(-6.35) = 0.157$;
$score_{son} = -1 \times 1/(-7.58) = 0.132$;
$score_{one} = -0.5 \times 1/(-5.26) = 0.095$.
It follows that candidate word some is more likely than the other candidate words to be the correct word (corrected word) for the wrong word sone.
In another embodiment, for the BiLSTM model, the neighboring words of the candidate word in each of candidate sentences one to four (some, same, son and one respectively) are I, have and apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis; the resulting values appear in the score calculations of step 4.
Step 4 is replaced with:
According to the formula $score_{word} = -K \times \frac{1}{\overline{\log P}_{mx}(word)}$, where $\overline{\log P}_{mx}(word)$ is the assessment probability of the candidate word under the BiLSTM model, the assessment score of each candidate word is calculated:
$score_{some} = -1 \times 1/(-2.88) = 0.347$;
$score_{same} = -1 \times 1/(-3.62) = 0.276$;
$score_{son} = -1 \times 1/(-4.1) = 0.244$;
$score_{one} = -0.5 \times 1/(-2.62) = 0.191$.
Likewise, candidate word some is more likely than the other candidate words to be the correct word (corrected word) for the wrong word sone.
In another embodiment, for the LSTM model, the neighboring words of the candidate word in each of candidate sentences one to four (some, same, son and one respectively) are I, have and apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis; the resulting values appear in the score calculations of step 4.
Step 4 is replaced with:
According to the formula $score_{word} = -K \times \frac{1}{\overline{\log P}_{mx}(word)}$, where $\overline{\log P}_{mx}(word)$ is the assessment probability of the candidate word under the LSTM model, the assessment score of each candidate word is calculated:
$score_{some} = -1 \times 1/(-3.175) = 0.315$;
$score_{same} = -1 \times 1/(-3.75) = 0.267$;
$score_{son} = -1 \times 1/(-4.6) = 0.217$;
$score_{one} = -0.5 \times 1/(-3.45) = 0.145$.
Likewise, candidate word some is more likely than the other candidate words to be the correct word (corrected word) for the wrong word sone.
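The three worked examples of this embodiment can be compared side by side with a small sketch. The assessment probabilities are the values used in the score calculations above; the function names and the keyword parameters are illustrative assumptions.

```python
# Assessment probabilities (mean log values) used in the three examples above.
ASSESSMENT = {
    "N-Gram": {"some": -5.6, "same": -6.35, "son": -7.58, "one": -5.26},
    "BiLSTM": {"some": -2.88, "same": -3.62, "son": -4.1, "one": -2.62},
    "LSTM": {"some": -3.175, "same": -3.75, "son": -4.6, "one": -3.45},
}


def score(candidate, wrong, assessment, k_same=1.0, k_diff=0.5):
    """score = -K / assessment probability, K chosen by the first-letter comparison."""
    k = k_same if candidate[0] == wrong[0] else k_diff
    return -k / assessment


def rank(model, wrong="sone", **coefficients):
    """Return the candidate words of one model, best score first."""
    values = ASSESSMENT[model]
    return sorted(values,
                  key=lambda w: score(w, wrong, values[w], **coefficients),
                  reverse=True)


rankings = {model: rank(model) for model in ASSESSMENT}

# Same ranking with the first-letter coefficient switched off (K = 1 for all words).
rankings_without_k = {model: rank(model, k_diff=1.0) for model in ASSESSMENT}
```

With the example coefficients, some ranks first under all three models. If the coefficient for a differing first letter is set to 1, one overtakes some under the N-Gram and BiLSTM values, which illustrates how much the first-letter information contributes in this example.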
Assessing the candidate words under the above three language models takes into account both the user's writing habits and the language environment information, which improves the reliability of candidate word assessment.
As shown in Fig. 6, in a fifth embodiment, a candidate word assessment method is provided, including the following steps:
Step S51: when a wrong word is detected, obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to step S11 of the above first embodiment; it is not repeated here.
Step S52: determine the edit distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment.
Step S53: replace the wrong word with each candidate word in turn to obtain candidate sentences, and determine the assessment probability of the corresponding candidate word from each candidate sentence; the assessment probability is obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word.
For the implementation of this step, refer to the description of the fourth embodiment.
Step S54: determine the assessment score of each candidate word according to the edit distance and the assessment probability.
The assessment score obtained in this step characterizes how likely each candidate word is to be the corrected word (correct word) for the wrong word.
With the candidate word assessment method of the above embodiment, when a wrong word is detected, the multiple candidate words corresponding to it are obtained; the likelihood that each candidate word is the corrected word for the wrong word is then assessed from both the edit distance between the candidate word and the wrong word and the context information. Compared with traditional candidate word assessment methods that rely on edit distance alone, this improves the reliability of candidate word assessment.
In one embodiment, the probability of the candidate word and of each of its neighboring words at their respective positions in the candidate sentence is calculated according to a preset language model, and the log value of each probability is taken as the language environment probability of the corresponding word; the language environment probabilities of the candidate word and of its neighboring words in the candidate sentence are then averaged to obtain the assessment probability of the candidate word in that candidate sentence.
The language model includes but is not limited to an N-Gram model, a BiLSTM model or an LSTM model. For each language model, see the description of the above third embodiment.
In one embodiment, the language environment probability of each word is expressed as the log value of the probability of that word at its position, and the assessment probability of the candidate word is obtained on this basis; further, the assessment score of each candidate word can be determined according to the reciprocal of the edit distance and the reciprocal of the assessment probability.
For example, the assessment score of each candidate word can be calculated according to the following formula:
$score_{word} = -\frac{1}{D_{edit}} \times \frac{1}{\overline{\log P}_{mx}(word)}$
where $D_{edit}$ denotes the edit distance between the candidate word and the wrong word, $word$ denotes the candidate word, $\overline{\log P}_{mx}(word)$ denotes the assessment probability of the candidate word, and $score_{word}$ denotes the assessment score of the candidate word.
The assessment score of each candidate word is thus inversely related to its assessment probability and is also affected by its edit distance from the wrong word. That is, the final assessment score of each candidate word takes into account both the characteristics of written words and the context information, which improves the reliability of candidate word assessment compared with traditional methods that assess candidate words by edit distance.
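The formula of this embodiment can be sketched as follows. The edit distances and assessment probabilities are the N-Gram figures used in the worked example of this embodiment; the function name is an illustrative assumption, and the sketch assumes a nonzero edit distance and a negative assessment probability.

```python
def score(edit_distance, assessment_probability):
    """score = -(1 / D_edit) * (1 / assessment probability)."""
    return -(1.0 / edit_distance) * (1.0 / assessment_probability)


# (edit distance from "sone", assessment probability) for each candidate word,
# using the N-Gram values of the worked example of this embodiment.
inputs = {
    "some": (1, -5.6),
    "same": (2, -6.35),
    "son": (1, -7.58),
    "one": (1, -5.26),
}

scores = {word: score(d, p) for word, (d, p) in inputs.items()}
```

Dividing by the edit distance halves the score of same, whose edit distance is 2, relative to what its assessment probability alone would give.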
For example, suppose a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, the context being I have sone apples. The method of assessing the candidate words for sone includes:
1. Checking against an English dictionary shows that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The candidate words of sone are obtained; assume these are some, same, one and son.
3. The edit distances of some, same, one and son from sone are determined to be 1, 2, 1 and 1 respectively.
4. some, same, son and one are substituted in turn for sone in I have sone apples, giving:
Candidate sentence one: I have some apples;
Candidate sentence two: I have same apples;
Candidate sentence three: I have son apples;
Candidate sentence four: I have one apples.
Based on the N-Gram model with N = 3, the neighboring word of the candidate word in each of candidate sentences one to four (some, same, son and one respectively) is apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis; the resulting values appear in the score calculations of step 5.
5. According to the formula $score_{word} = -\frac{1}{D_{edit}} \times \frac{1}{\overline{\log P}_{mx}(word)}$, where $\overline{\log P}_{mx}(word)$ is the assessment probability of the candidate word under the N-Gram model, the assessment score of each candidate word is calculated:
$score_{some} = -1 \times 1/(-5.6) = 0.179$;
$score_{same} = -1/2 \times 1/(-6.35) = 0.0787$;
$score_{son} = -1 \times 1/(-7.58) = 0.132$;
$score_{one} = -1 \times 1/(-5.26) = 0.19$.
It follows that candidate word some is relative to other candidate words, it is the corresponding correct words (error correction term) of wrong word sone Possibility bigger.
In another embodiment, for the BiLSTM model, the neighboring words of the candidate word in each of candidate sentences one to four (some, same, son and one respectively) are I, have and apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis; the resulting values appear in the score calculations of step 5.
Step 5 is replaced with:
According to the formula $score_{word} = -\frac{1}{D_{edit}} \times \frac{1}{\overline{\log P}_{mx}(word)}$, where $\overline{\log P}_{mx}(word)$ is the assessment probability of the candidate word under the BiLSTM model, the assessment score of each candidate word is calculated:
$score_{some} = -1 \times 1/(-2.88) = 0.347$;
$score_{same} = -1/2 \times 1/(-3.62) = 0.138$;
$score_{son} = -1 \times 1/(-4.1) = 0.244$;
$score_{one} = -1 \times 1/(-2.62) = 0.382$.
It is similarly obtained, candidate word some is relative to other candidate words, for the corresponding correct words (error correction term) of wrong word sone Possibility bigger.
In another embodiment, for the LSTM model, the neighboring words of the candidate word in each of candidate sentences one to four (some, same, son and one respectively) are I, have and apples. The assessment probability of the candidate word in each candidate sentence is calculated on this basis; the resulting values appear in the score calculations of step 5.
Step 5 is replaced with:
According to the formula $score_{word} = -\frac{1}{D_{edit}} \times \frac{1}{\overline{\log P}_{mx}(word)}$, where $\overline{\log P}_{mx}(word)$ is the assessment probability of the candidate word under the LSTM model, the assessment score of each candidate word is calculated:
$score_{some} = -1 \times 1/(-3.175) = 0.315$;
$score_{same} = -1/2 \times 1/(-3.75) = 0.133$;
$score_{son} = -1 \times 1/(-4.6) = 0.217$;
$score_{one} = -1 \times 1/(-3.45) = 0.29$.
Likewise, candidate word some is more likely than the other candidate words to be the correct word (error-correction word) for the wrong word sone.
Thus, the assessment score finally obtained for each candidate word under the above three language models takes both context language environment information and editing distance information into account, which improves the reliability of candidate word assessment.
As shown in Fig. 7, the sixth embodiment provides a candidate word assessment method, including the following steps:
Step S61: when a wrong word is detected, obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to step S11 of the first embodiment above; it is not repeated here.
Step S62: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
For the implementation of this step, refer to the description of the second embodiment above; it is not repeated here.
Step S63: replace the wrong word with each candidate word in turn to obtain candidate sentences, and determine the assessment probability of the corresponding candidate word from each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the neighboring words of the candidate word.
For the implementation of this step, refer to the description of the fourth embodiment above; it is not repeated here.
Step S64: determine the assessment score of each candidate word according to the similarity and the assessment probability.
The assessment score obtained in this step characterizes the likelihood that each candidate word is the error-correction word (correct word) for the wrong word.
With the candidate word assessment method of the above embodiment, when a wrong word is detected, multiple candidate words corresponding to the wrong word are obtained; the likelihood that each candidate word is the error-correction word for the wrong word is then assessed jointly from the similarity between the candidate word and the wrong word and from context language environment information. Compared with traditional candidate word assessment methods that rely only on editing distance, this improves the reliability of the assessment result.
In one embodiment, a preset language model is used to calculate the probability of the candidate word and of each of its neighboring words at their respective positions in the candidate sentence, and the log value of each probability is taken as the language environment probability of that word. The language environment probability of the candidate word and the language environment probabilities of its neighboring words are then averaged to obtain the assessment probability of the candidate word in the candidate sentence. The language model includes, but is not limited to, an N-Gram model, a BiLSTM model or an LSTM model; each language model is as described in the above embodiments.
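As a minimal sketch of the averaging described above, the following Python function takes the positional probabilities of the candidate word and its neighboring words, converts each to a log value, and averages them. The function name and the sample probabilities are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not specified here.

```python
import math
from typing import Sequence


def assessment_probability(word_probs: Sequence[float]) -> float:
    """Average the log values of the positional probabilities of the words."""
    if not word_probs:
        raise ValueError("at least one probability is required")
    if any(p <= 0.0 or p > 1.0 for p in word_probs):
        raise ValueError("each probability must lie in (0, 1]")
    # Assumption: natural logarithm; another base only rescales the result.
    return sum(math.log(p) for p in word_probs) / len(word_probs)


# Hypothetical probabilities for a candidate word and one neighboring word.
probs = [0.01, 0.002]
avg = assessment_probability(probs)
print(avg)
```

The result is negative whenever any probability is below 1, which is why the score formulas in these embodiments carry a leading minus sign.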
In one embodiment, the language environment probability of each word is expressed as the log value of the probability of that word at its position, and the assessment probability of the candidate word is obtained on this basis; the assessment score of each candidate word can then be determined from the reciprocal of the assessment probability and the similarity.
Optionally, the assessment score of each candidate word is calculated according to the following formula: $score_{word} = -S \times \frac{1}{\bar{P}_{mx}(word)}$
Wherein, word denotes the candidate word, $score_{word}$ denotes the assessment score of the candidate word, $\bar{P}_{mx}(word)$ denotes the assessment probability of the candidate word, mx denotes the language model, and S denotes the similarity between the candidate word and the wrong word.
Thus, the assessment score of each candidate word is in an inverse relationship with its assessment probability and is also affected by its similarity to the wrong word. That is, the final assessment score of each candidate word combines similarity and context information, which improves the reliability of candidate word assessment compared with traditional methods that assess candidate words by editing distance alone.
For example, suppose that the user, when writing an English sentence on an intelligent interactive tablet, miswrites some as sone, and the context language environment is I have sone apples. The method of assessing the candidate words corresponding to sone includes:
1. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The candidate words of sone are obtained; assume these are some, same and son.
3. The longest common subsequence rate and the longest common substring rate of some, same and son with respect to sone are determined, and the similarity of each candidate word is calculated as follows:
The longest common subsequence of candidate word some and wrong word sone is soe, its length is 3, and the longest common subsequence rate is 3/((4+4)/2) = 0.75; the longest common substring is so, its length is 2, and the longest common substring rate is 2/((4+4)/2) = 0.5;
The longest common subsequence of candidate word same and wrong word sone is se, its length is 2, and the longest common subsequence rate is 2/((4+4)/2) = 0.5; the longest common substring is s or e, its length is 1, and the longest common substring rate is 1/((4+4)/2) = 0.25;
The longest common subsequence of candidate word son and wrong word sone is son, its length is 3, and the longest common subsequence rate is 3/((3+4)/2) = 0.86; the longest common substring is son, its length is 3, and the longest common substring rate is 3/((3+4)/2) = 0.86.
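According to the formula $S = \frac{1}{D_{edit}} + 0.7 D_{lcs1} + 0.3 D_{lcs2}$, where $D_{lcs1}$ is the longest common subsequence rate and $D_{lcs2}$ is the longest common substring rate, the similarity S of each candidate word to the wrong word is calculated as follows:
some: 1 + 0.7 × 0.75 + 0.3 × 0.5 = 1.675;
same: 1/2 + 0.7 × 0.5 + 0.3 × 0.25 = 0.925;
son: 1 + 0.7 × 0.86 + 0.3 × 0.86 = 1.86.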
4. some, same and son each replace sone in I have sone apples, which gives:
Candidate sentence one:I have some apples;
Candidate sentence two:I have same apples;
Candidate sentence three:I have son apples.
Based on the N-Gram model with N = 3, the neighboring word of candidate word some in candidate sentence one is apples, the neighboring word of candidate word same in candidate sentence two is apples, and the neighboring word of candidate word son in candidate sentence three is apples. On this basis, the assessment probability of the candidate word in each candidate sentence is calculated; the resulting log-valued probabilities are the values substituted into the scores in step 5.
In this calculation, P(I, have, some) = c(I, have, some)/c(3-Gram), where c denotes a count; that is, the count of the 3-tuple (I, have, some) in the corpus divided by the count of all 3-tuples;
P(have | I) = c(have, I)/c(I), that is, the count of the 2-tuple formed by I and have in the corpus divided by the count of I;
P(I) = c(I)/c(1-Gram), that is, the count of the 1-tuple I in the corpus divided by the count of all 1-tuples.
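5. According to the formula $score_{word} = -S \times \frac{1}{\bar{P}_{mx}(word)}$, where S is the similarity and $\bar{P}_{mx}(word)$ denotes the assessment probability of the candidate word, calculate the assessment score of each candidate word:
score_some = -1.675 × 1/(-5.6) = 0.299;
score_same = -0.925 × 1/(-6.35) = 0.146;
score_son = -1.86 × 1/(-7.58) = 0.245.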
It follows that candidate word some is more likely than the other candidate words to be the correct word (error-correction word) for the wrong word sone.
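Steps 3 and 5 of the example above can be sketched in Python as follows. This is an illustrative sketch only: the helper names are hypothetical, the dynamic-programming routines are standard textbook versions rather than an implementation prescribed here, and the assessment probabilities are the N-Gram values from the example. The sketch keeps full precision for the rates, so the similarity of son comes out near 1.857 rather than the rounded 1.86.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lcs_subsequence_len(a: str, b: str) -> int:
    """Length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_substring_len(a: str, b: str) -> int:
    """Length of the longest common (contiguous) substring."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else 0)
            best = max(best, cur[j])
        prev = cur
    return best


def similarity(candidate: str, wrong: str) -> float:
    """S = 1/D_edit + 0.7 * subsequence rate + 0.3 * substring rate."""
    d = edit_distance(candidate, wrong)
    if d == 0:
        raise ValueError("candidate must differ from the wrong word")
    mean_len = (len(candidate) + len(wrong)) / 2
    rate_subseq = lcs_subsequence_len(candidate, wrong) / mean_len
    rate_substr = lcs_substring_len(candidate, wrong) / mean_len
    return 1 / d + 0.7 * rate_subseq + 0.3 * rate_substr


def assessment_score(sim: float, assessment_log_prob: float) -> float:
    """score = -S * 1 / (log-valued assessment probability)."""
    return -sim / assessment_log_prob


# Assessment probabilities from the N-Gram example.
log_probs = {"some": -5.6, "same": -6.35, "son": -7.58}
scores = {w: assessment_score(similarity(w, "sone"), p) for w, p in log_probs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```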
In another embodiment, for the BiLSTM model, the neighboring words of candidate word some in candidate sentence one are I, have and apples; the neighboring words of candidate word same in candidate sentence two are I, have and apples; and the neighboring words of candidate word son in candidate sentence three are I, have and apples. On this basis, the assessment probability of the candidate word in each candidate sentence is calculated; the resulting values are those substituted into the scores below.
Step 5 is replaced with:
According to the formula $score_{word} = -S \times \frac{1}{\bar{P}_{mx}(word)}$, where $\bar{P}_{mx}(word)$ denotes the assessment probability of the candidate word, calculate the assessment score of each candidate word:
score_some = -1.675 × 1/(-2.88) = 0.582;
score_same = -0.925 × 1/(-3.62) = 0.256;
score_son = -1.86 × 1/(-4.1) = 0.454.
Here too, candidate word some is more likely than the other candidate words to be the correct word (error-correction word) for the wrong word sone.
In another embodiment, for the LSTM model, the neighboring words of candidate word some in candidate sentence one are I, have and apples; the neighboring words of candidate word same in candidate sentence two are I, have and apples; and the neighboring words of candidate word son in candidate sentence three are I, have and apples. On this basis, the assessment probability of the candidate word in each candidate sentence is calculated; the resulting values are those substituted into the scores below.
Step 5 is replaced with:
According to the formula $score_{word} = -S \times \frac{1}{\bar{P}_{mx}(word)}$, where $\bar{P}_{mx}(word)$ denotes the assessment probability of the candidate word, calculate the assessment score of each candidate word:
score_some = -1.675 × 1/(-3.175) = 0.528;
score_same = -0.925 × 1/(-3.75) = 0.247;
score_son = -1.86 × 1/(-4.6) = 0.404.
Here too, candidate word some is more likely than the other candidate words to be the correct word (error-correction word) for the wrong word sone.
Thus, the assessment score finally obtained for each candidate word under the above three language models takes into account both context language environment information and the similarity between words, which improves the reliability of candidate word assessment.
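The count-based quantities named in the N-Gram example above, P(I, have, some), P(have | I) and P(I), can be illustrated with a toy corpus. The sketch below is an assumption-laden illustration: the three-sentence corpus, whitespace tokenization and the helper `ngram_counts` are all hypothetical, and no smoothing is applied.

```python
from collections import Counter
from typing import Iterable


def ngram_counts(sentences: Iterable[str], n: int) -> Counter:
    """Count every n-tuple of consecutive whitespace-separated tokens."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical toy corpus, used only to make the counts concrete.
corpus = [
    "I have some apples",
    "I have some pears",
    "I have an apple",
]

uni = ngram_counts(corpus, 1)
bi = ngram_counts(corpus, 2)
tri = ngram_counts(corpus, 3)

# P(I, have, some): count of the 3-tuple divided by the count of all 3-tuples.
p_trigram = tri[("I", "have", "some")] / sum(tri.values())
# P(have | I): count of the 2-tuple (I, have) divided by the count of I.
p_have_given_i = bi[("I", "have")] / uni[("I",)]
# P(I): count of the 1-tuple I divided by the count of all 1-tuples.
p_i = uni[("I",)] / sum(uni.values())

print(p_trigram, p_have_given_i, p_i)
```

In practice an unseen tuple would yield a zero count and an undefined log value, so a real system would need some form of smoothing; that choice is outside what is described here.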
As shown in Fig. 8, the seventh embodiment provides a candidate word assessment method, including the following steps:
Step S71: when a wrong word is detected, obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of step S11 of the first embodiment.
Step S72: determine the editing distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment.
Step S73: determine the language environment probability of each candidate word at the position of the wrong word.
For the implementation of this step, refer to the description of the third embodiment.
Step S74: obtain the error information of the wrong word relative to each candidate word.
For the specific implementation of this step, refer to the description of the first embodiment above.
Step S75: determine the assessment score of each candidate word according to the editing distance, the language environment probability and the error information.
The assessment score obtained in this step reflects the likelihood that each candidate word is the error-correction word (correct word) for the wrong word.
In the candidate word assessment method of the above embodiment, when a wrong word is detected, the corresponding candidate words are first obtained; the editing distance between each candidate word and the wrong word, the probability of each candidate word at the position of the wrong word, and the error information of the wrong word relative to each candidate word are then determined; and the assessment score of each candidate word is determined according to the editing distance, the language environment probability and the error information. This takes into account the editing distance, the user's writing habits and the context language environment, which helps improve the reliability of the candidate word assessment result.
In one embodiment, the language model used to determine the language environment probability of each candidate word at the position of the wrong word includes, but is not limited to, an N-Gram model, a BiLSTM model or an LSTM model. For the description of each model, refer to the related embodiments above.
In one embodiment, when the language environment probability is expressed as the log value of the probability of each candidate word at the position of the wrong word, the assessment score of each candidate word can be determined according to the reciprocal of the editing distance, the reciprocal of the language environment probability, and the error information.
Thus, the assessment score of each candidate word is in an inverse relationship with its editing distance from the wrong word and with its language environment probability, and is also affected by the error information of the wrong word relative to the candidate word. That is, the final assessment score of each candidate word is obtained by jointly considering the user's writing habits, the editing distance and the language environment information, which helps improve the reliability of candidate word assessment compared with traditional candidate word assessment methods.
In one embodiment, the error information of the wrong word relative to each candidate word includes information on whether the wrong word and the candidate word have the same first letter. Accordingly, the step of determining the assessment score of each candidate word according to the editing distance, the language environment probability and the error information includes: if the wrong word and the candidate word have the same first letter, calculating the assessment score of the candidate word according to the editing distance, the language environment probability and a first coefficient; if the wrong word and the candidate word have different first letters, calculating the assessment score of the candidate word according to the editing distance, the language environment probability and a second coefficient.
Optionally, the assessment score of each candidate word is calculated according to the following formula: $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{P_{mx}(word)}$
Wherein, word denotes the candidate word, $D_{edit}$ denotes the editing distance between the candidate word and the wrong word, $P_{mx}(word)$ denotes the language environment probability of the candidate word, mx denotes the language model, $score_{word}$ denotes the assessment score of the candidate word, and K denotes the error information of the wrong word relative to the candidate word: if the candidate word and the wrong word have the same first letter, K takes the value K1; otherwise, K takes the value K2, where K1 and K2 are preset values.
For example, assuming the language environment probability determined by the N-Gram language model is $P_{N\text{-}Gram}(word)$, the formula for calculating the score of each candidate word can be: $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{P_{N\text{-}Gram}(word)}$
The value of K is selected according to whether the candidate word and the wrong word have the same first letter: K is 1 when they are the same and 0.5 when they differ. It should be understood that the values of K include, but are not limited to, the values in this example.
For English words or character strings, given users' writing habits, the first letter is generally not written incorrectly. Assessing the likelihood that each candidate word is the error-correction word of the wrong word on the basis of the above embodiment therefore yields a more reasonable assessment score: other conditions being equal, a candidate word whose first letter matches that of the wrong word is more likely to be the error-correction word, and one whose first letter differs is less likely to be.
For example, suppose that the user, when writing an English sentence on an intelligent interactive tablet, miswrites some as sone, and the context language environment is I have sone apples.
1. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. Words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as candidate words.
3. some, same, son and one each replace sone in I have sone apples, and the language environment probability $P_{N\text{-}Gram}(word)$ of each candidate word is calculated with the N-Gram model (N = 3); the resulting log-valued probabilities are the values substituted into the scores in step 4.
In this calculation, P(I, have, some) = c(I, have, some)/c(3-Gram), where c denotes a count; that is, the count of the 3-tuple (I, have, some) in the corpus divided by the count of all 3-tuples;
P(have | I) = c(have, I)/c(I), that is, the count of the 2-tuple formed by I and have in the corpus divided by the count of I;
P(I) = c(I)/c(1-Gram), that is, the count of the 1-tuple I in the corpus divided by the count of all 1-tuples.
4. According to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{P_{N\text{-}Gram}(word)}$, calculate the assessment score of each candidate word:
score_some = -1 × 1 × 1/(-5.7) = 0.175;
score_same = -1 × 0.5 × 1/(-6) = 0.083;
score_son = -1 × 1 × 1/(-6) = 0.167;
score_one = -0.5 × 1 × 1/(-4.8) = 0.104.
It follows that candidate word some is more likely than the other candidate words to be the correct word (error-correction word) for the wrong word sone.
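The selection of K and the score of step 4 can be sketched as follows. The function names are hypothetical, the default coefficients 1 and 0.5 are the example values given above rather than required values, and the case-insensitive comparison of first letters is an added assumption.

```python
def first_letter_coefficient(candidate: str, wrong_word: str,
                             k1: float = 1.0, k2: float = 0.5) -> float:
    """Return k1 when the first letters match, otherwise k2."""
    # Assumption: letters are compared case-insensitively.
    same = candidate[:1].casefold() == wrong_word[:1].casefold()
    return k1 if same else k2


def assessment_score(candidate: str, wrong_word: str,
                     edit_distance: int, env_log_prob: float) -> float:
    """score = -K * (1 / D_edit) * (1 / log-valued language environment probability)."""
    k = first_letter_coefficient(candidate, wrong_word)
    return -k * (1.0 / edit_distance) * (1.0 / env_log_prob)


# (editing distance, language environment probability) from the N-Gram example.
example = {
    "some": (1, -5.7),
    "same": (2, -6.0),
    "son": (1, -6.0),
    "one": (1, -4.8),
}

scores = {w: assessment_score(w, "sone", d, p) for w, (d, p) in example.items()}
best = max(scores, key=scores.get)
print(best, {w: round(s, 3) for w, s in scores.items()})
```

Without the coefficient, one would score 1/4.8 ≈ 0.208 and outrank some; halving it for the mismatched first letter is what moves some to the top in this example.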
In another embodiment, for the BiLSTM model, steps 1 and 2 above are unchanged;
Step 3 is replaced with: some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated with the BiLSTM model; the resulting values are those substituted into the scores below.
Step 4 is replaced with:
According to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{P_{BiLSTM}(word)}$, calculate the assessment score of each candidate word:
score_some = -1 × 1 × 1/(-4.9) = 0.204;
score_same = -1 × 1/2 × 1/(-5.07) = 0.099;
score_son = -1 × 1 × 1/(-7.6) = 0.132;
score_one = -0.5 × 1 × 1/(-4.66) = 0.107.
Here too, candidate word some is more likely than the other candidate words to be the correct word (error-correction word) for the wrong word sone.
In another embodiment, for the LSTM model, steps 1 and 2 above are unchanged;
Step 3 is replaced with: some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated with the LSTM model; the resulting values are those substituted into the scores below.
Step 4 is replaced with:
According to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{P_{LSTM}(word)}$, calculate the assessment score of each candidate word:
score_some = -1 × 1 × 1/(-5.6) = 0.179;
score_same = -1 × 1/2 × 1/(-6.3) = 0.079;
score_son = -1 × 1 × 1/(-6) = 0.167;
score_one = -0.5 × 1 × 1/(-5.1) = 0.098.
Here too, candidate word some is more likely than the other candidate words to be the correct word (error-correction word) for the wrong word sone.
The above embodiment takes into account editing distance information and context language environment information, as well as the habit that the first letter is rarely miswritten when a user writes an English word, so that a more accurate error-correction word can be determined.
As shown in Fig. 9, the eighth embodiment provides a candidate word assessment method, including the following steps:
Step S81: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S82: determine the editing distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S83: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
For the implementation of this step, refer to the description of the second embodiment; it is not repeated here.
Step S84: obtain the error information of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S85: determine the assessment score of each candidate word according to the editing distance, the similarity and the error information.
Specifically, this step scores each candidate word according to the editing distance between the candidate word and the wrong word, the error information, and how close the candidate word is to the wrong word, and combines these three aspects to assess the likelihood that the candidate word is the error-correction word for the wrong word.
This embodiment determines the assessment score of each candidate word according to the editing distance, the similarity and the error information between each candidate word and the wrong word. Considering the editing distance and similarity between the candidate word and the wrong word together with the user's writing habits improves the reliability of the candidate word assessment result.
In one embodiment, the step of determining the assessment score of each candidate word according to the editing distance, the similarity and the error information includes: if the wrong word and the candidate word have the same first letter, calculating the assessment score of the candidate word according to the editing distance, the similarity and a first coefficient; if the wrong word and the candidate word have different first letters, calculating the assessment score of the candidate word according to the editing distance, the similarity and a second coefficient.
In one embodiment, the step of calculating the assessment score of the candidate word according to the editing distance, the similarity and the first coefficient includes: calculating the assessment score of the candidate word according to the reciprocal of the editing distance, the similarity and the first coefficient.
In one embodiment, the step of calculating the assessment score of the candidate word according to the editing distance, the similarity and the second coefficient includes: calculating the assessment score of the candidate word according to the reciprocal of the editing distance, the similarity and the second coefficient.
In one embodiment, the assessment score of each candidate word can be calculated according to the following formula:
$score_{word} = K \times S \times \frac{1}{D_{edit}}$
The specific calculation of the assessment score is illustrated as follows:
Suppose that the user, when writing an English sentence on an intelligent interactive tablet, miswrites some as sone, and the context language environment is I have sone apples. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word. The candidate words of sone are obtained: some, same and one.
The editing distances of some, same and one from the wrong word sone are 1, 2 and 1, respectively.
1. According to the formula $S = \frac{1}{D_{edit}} + 0.7 D_{lcs1} + 0.3 D_{lcs2}$, the similarity of each candidate word to the wrong word is calculated as:
S_some: 1 + 0.7 × 0.75 + 0.3 × 0.5 = 1.675;
S_same: 1/2 + 0.7 × 0.5 + 0.3 × 0.25 = 0.925;
S_one: 1 + 0.7 × 0.86 + 0.3 × 0.86 = 1.86.
2. According to the formula $score_{word} = K \times S \times \frac{1}{D_{edit}}$, the assessment score of each candidate word is calculated as:
score_some = 1 × 1.675 × 1/1 = 1.675;
score_same = 1 × 0.925 × 1/2 = 0.463;
score_one = 0.5 × 1.86 × 1/1 = 0.93.
These scores show that candidate word some has a higher assessment score than same and one; the assessment result therefore provides a useful reference for determining the error-correction word.
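The two-step calculation above can be sketched in Python as follows. The `Candidate` class and `score` function are hypothetical names introduced for illustration; the similarity values are copied from step 1 rather than recomputed, and the coefficients 1 and 0.5 are the example values used in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    word: str
    edit_distance: int  # D_edit between the candidate and the wrong word
    similarity: float   # S, as obtained in step 1 of the example


def score(candidate: Candidate, wrong_word: str,
          k1: float = 1.0, k2: float = 0.5) -> float:
    """score = K * S * (1 / D_edit), with K chosen by first-letter match."""
    k = k1 if candidate.word[:1] == wrong_word[:1] else k2
    return k * candidate.similarity / candidate.edit_distance


candidates = [
    Candidate("some", 1, 1.675),
    Candidate("same", 2, 0.925),
    Candidate("one", 1, 1.86),
]

# Rank the candidates from most to least likely error-correction word.
ranked = sorted(candidates, key=lambda c: score(c, "sone"), reverse=True)
for c in ranked:
    print(c.word, round(score(c, "sone"), 3))
```

No language model is involved in this embodiment, so the score is positive without any sign correction.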
As shown in Fig. 10, the ninth embodiment provides a candidate word assessment method, including the following steps:
Step S91: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S92: determine the editing distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S93: replace the wrong word with each candidate word in turn to obtain candidate sentences, and determine the assessment probability of the corresponding candidate word from each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the neighboring words of the candidate word.
For the implementation of this step, refer to the description of the fourth embodiment; it is not repeated here.
Step S94: obtain the error information of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S95: determine the assessment score of each candidate word according to the editing distance, the assessment probability and the error information.
This embodiment determines the assessment score of each candidate word according to the editing distance, the assessment probability and the error information between each candidate word and the wrong word. It considers the editing distance between the candidate word and the wrong word and the user's writing habits, and also adds the assessment information of the language model, which improves the reliability of the candidate word assessment result and helps improve the efficiency and accuracy of text editing.
In one embodiment, the language environment probability of each word is expressed as the log value of the probability of that word at its position, and the assessment probability of the candidate word is obtained on this basis; the assessment score of each candidate word is then determined according to the reciprocal of the editing distance, the reciprocal of the assessment probability, and the error information.
In one embodiment, the step of determining the assessment score of each candidate word according to the editing distance, the assessment probability and the error information includes: if the wrong word and the candidate word have the same first letter, calculating the assessment score of the candidate word according to the editing distance, the assessment probability and a first coefficient; if the wrong word and the candidate word have different first letters, calculating the assessment score of the candidate word according to the editing distance, the assessment probability and a second coefficient.
In one embodiment, the assessment score of each candidate word is calculated according to the following formula: $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{\bar{P}_{mx}(word)}$, where K is the first or second coefficient, $D_{edit}$ is the editing distance and $\bar{P}_{mx}(word)$ denotes the assessment probability of the candidate word under language model mx.
The specific calculation of the assessment score is illustrated as follows:
Suppose that the user, when writing an English sentence on an intelligent interactive tablet, miswrites some as sone, and the context language environment is I have sone apples.
For the N-Gram model, with N = 3:
1. Detection against an English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. Words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as candidate words. The editing distances of candidate words some, same, son and one are 1, 2, 1 and 1, respectively.
3. some, same, son and one each replace sone in I have sone apples to obtain the corresponding assessment sentences; the assessment probability of each candidate word is calculated from the N-Gram model and the assessment sentences, and the resulting values are those substituted into the scores in step 4.
4. According to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{\bar{P}_{mx}(word)}$, where $\bar{P}_{mx}(word)$ denotes the assessment probability of the candidate word, the assessment score of each candidate word is calculated as follows:
score_some = -1 × 1/1 × 1/(-5.6) = 0.179;
score_same = -1 × 1/2 × 1/(-6.35) = 0.079;
score_son = -1 × 1/1 × 1/(-7.58) = 0.132;
score_one = -0.5 × 1/1 × 1/(-5.26) = 0.095.
It follows that candidate word some is more likely than the other candidate words to be the correct word (error-correction word) for the wrong word sone.
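Step 2 of the example above, which collects dictionary words whose editing distance from the wrong word is 1 or 2, can be sketched as follows. This is a minimal sketch under stated assumptions: the six-word dictionary is hypothetical, the editing distance is taken to be the Levenshtein distance, and a linear scan stands in for whatever lookup structure a real system would use.

```python
from typing import Dict, Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def candidates_within(wrong_word: str, dictionary: Iterable[str],
                      max_distance: int = 2) -> Dict[str, int]:
    """Map each dictionary word within 1..max_distance edits to its distance."""
    found = {}
    for word in dictionary:
        d = edit_distance(word, wrong_word)
        if 1 <= d <= max_distance:
            found[word] = d
    return found


# Hypothetical miniature dictionary for illustration.
dictionary = ["some", "same", "son", "one", "apples", "have"]
found = candidates_within("sone", dictionary)
print(found)
```

The distances returned for some, same, son and one match the values 1, 2, 1 and 1 listed in step 2, and they can be passed directly into the score of step 4.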
In another embodiment, for the BiLSTM model, steps 1 and 2 above are unchanged. Step 3 is replaced with: some, same, son and one each replace sone in I have sone apples; the assessment probability of each candidate word is calculated from the BiLSTM model and the assessment sentences, and the resulting values are those substituted into the scores below.
Step 4 is replaced with:
According to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{\bar{P}_{mx}(word)}$, where $\bar{P}_{mx}(word)$ denotes the assessment probability of the candidate word, the assessment score of each candidate word is calculated as:
score_some = -1 × 1/1 × 1/(-2.88) = 0.347;
score_same = -1 × 1/2 × 1/(-3.62) = 0.138;
score_son = -1 × 1/1 × 1/(-4.1) = 0.244;
score_one = -0.5 × 1/1 × 1/(-2.62) = 0.191.
Likewise, candidate word some is more likely than the other candidate words to be the correct word (error-correction word) for the wrong word sone.
In another embodiment, for the LSTM model, steps 1 and 2 above are unchanged. Step 3 is replaced with: the assessment probability of each candidate word is calculated from the LSTM model and the assessment sentences, and the resulting values are those substituted into the scores below.
Step 4 is replaced with:
According to the formula $score_{word} = -K \times \frac{1}{D_{edit}} \times \frac{1}{\bar{P}_{mx}(word)}$, where $\bar{P}_{mx}(word)$ denotes the assessment probability of the candidate word, the assessment score of each candidate word is calculated as:
score_some = -1 × 1/1 × 1/(-3.175) = 0.315;
score_same = -1 × 1/2 × 1/(-3.75) = 0.133;
score_son = -1 × 1/1 × 1/(-4.6) = 0.217;
score_one = -0.5 × 1/1 × 1/(-3.45) = 0.145.
Likewise, candidate word some is more likely than the other candidate words to be the correct word (error-correction word) for the wrong word sone.
The assessment scores calculated under the above three language models take into account the user's writing habits as well as editing distance and language environment information, so that the candidate words of the wrong word can be assessed more effectively.
As shown in Fig. 11, the tenth embodiment provides a candidate word assessment method, including the following steps:
Step S101: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the corresponding step of the first embodiment; it is not repeated here.
Step S102: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
For the implementation of this step, refer to the corresponding step of the second embodiment; it is not repeated here.
Step S103: determine the language environment probability of each candidate word at the position of the wrong word.
For the implementation of this step, refer to the description of the third embodiment; it is not repeated here.
Step S104: obtain the error information of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S105: determine the assessment score of each candidate word according to the similarity, the language environment probability and the error information.
The present embodiment determines each time according to each candidate word and similarity, language environment probability and the error message of wrong word Select the corresponding assessment score of word.Both the phenomenon that having considered similarity and the user's writing of candidate word and wrong word problem, will also The assessment information of language model is added, and can improve the reliability of candidate word assessment result, is conducive to improve text editing Efficiency and accuracy.
In one embodiment, when the language environment probability is expressed as the log value of the probability of each candidate word at the position of the wrong word, the assessment score of each candidate word can be determined according to the similarity, the reciprocal of the language environment probability, and the error message.
In one embodiment, the step of determining the assessment score of each candidate word according to the similarity, the language environment probability and the error message includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the similarity, the language environment probability and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the similarity, the language environment probability and a second coefficient.
In one embodiment, the assessment score of each candidate word is calculated according to the following formula:
In one embodiment, the language model is an N-Gram model, a BiLSTM model or an LSTM model.
Optionally, the language environment probability of a candidate word can be calculated by combining several of the above language models.
Optionally, with the N-Gram model, the assessment score of each candidate word can be calculated by the following formula:
where the probability term is the language environment probability of a given candidate word calculated by the N-Gram model.
For the N-Gram model, N is set to 3.
A concrete example follows. Suppose that a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, and the context is I have sone apples.
1. A lookup in the English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as candidate words. The similarity between each candidate word and the wrong word is determined as: S_some = 1.675; S_same = 0.925; S_son = 1.86; S_one = 1.86.
3. some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated from the resulting sentence.
The language environment probability of each candidate word at the position of sone, calculated with the N-Gram model, is:
where P(I, have, some) = c(I, have, some)/c(3-Gram); c denotes a count, i.e. the number of occurrences of the 3-tuple (I, have, some) in the corpus divided by the count of all 3-tuples;
P(have | I) = c(I, have)/c(I), i.e. the count of the 2-tuple (I, have) in the corpus divided by the count of I;
P(I) = c(I)/c(1-Gram), i.e. the count of the 1-tuple I in the corpus divided by the count of all 1-tuples.
4. According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-5.7) = 0.294;
score_same = -1 × 0.925 × 1/(-6) = 0.154;
score_son = -1 × 1.86 × 1/(-8) = 0.233;
score_one = -0.5 × 1.86 × 1/(-4.8) = 0.194.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
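The arithmetic of this example can be sketched in Python. The function and parameter names are hypothetical, and the form score = -K × S × 1/log P is an assumption read off the worked lines above, with K taken as 1 when the initial letters match and 0.5 otherwise. The similarities and log probabilities are the N-Gram values used in the example.

```python
def assessment_score(similarity, log_prob, same_initial, k_same=1.0, k_diff=0.5):
    """Score a candidate from its similarity, log probability and initial-letter check.

    Assumed form: -K * S * (1 / log P). log_prob is negative, so the score is positive.
    """
    k = k_same if same_initial else k_diff
    return -k * similarity * (1.0 / log_prob)


wrong_word = "sone"
# candidate -> (similarity to the wrong word, log probability at its position)
candidates = {
    "some": (1.675, -5.7),
    "same": (0.925, -6.0),
    "son": (1.86, -8.0),
    "one": (1.86, -4.8),
}

scores = {
    word: assessment_score(sim, log_p, word[0] == wrong_word[0])
    for word, (sim, log_p) in candidates.items()
}
best = max(scores, key=scores.get)
```

Because the log probability is negative, a candidate that fits the context better has a smaller absolute log value and therefore a larger score, while the coefficient halves the score of a candidate whose initial letter differs from that of the wrong word.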
In another embodiment, for the BiLSTM model, steps 1 and 2 above are unchanged. Step 3 is replaced with: some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated. The results for each candidate word are as follows:
Step 4 is replaced with:
According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-5.6) = 0.299;
score_same = -1 × 0.925 × 1/(-6.3) = 0.147;
score_son = -1 × 1.86 × 1/(-8.6) = 0.216;
score_one = -0.5 × 1.86 × 1/(-5.1) = 0.182.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
In another embodiment, for the LSTM model, steps 1 and 2 above are unchanged. Step 3 is replaced with: some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated. The results for each candidate word are as follows:
Step 4 is replaced with:
According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-5.6) = 0.299;
score_same = -1 × 0.925 × 1/(-6.3) = 0.147;
score_son = -1 × 1.86 × 1/(-8.6) = 0.216;
score_one = -0.5 × 1.86 × 1/(-5.1) = 0.182.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
Candidate word assessment under the above three language models considers the user's habits when writing words and also incorporates the similarity information and the assessment information of the language environment, which improves the reliability of candidate word assessment.
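The count-based estimates described for the N-Gram model in this embodiment can be illustrated with a minimal Python sketch. The toy corpus, the function names and the whitespace tokenization are assumptions made for illustration only; a real system would use a large corpus and would normally add smoothing for unseen tuples.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-tuple in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def relative_frequency(ngram, counts):
    """c(ngram) divided by the count of all tuples of the same order."""
    total = sum(counts.values())
    return counts[ngram] / total if total else 0.0


def conditional(word, history, bigrams, unigrams):
    """P(word | history) = c(history, word) / c(history)."""
    denominator = unigrams[(history,)]
    return bigrams[(history, word)] / denominator if denominator else 0.0


# Hypothetical toy corpus, tokenized on whitespace.
corpus = "I have some apples . I have one apple . I have some pears .".split()
unigrams, bigrams, trigrams = (ngram_counts(corpus, n) for n in (1, 2, 3))

p_trigram = relative_frequency(("I", "have", "some"), trigrams)  # c(I, have, some) / c(3-Gram)
p_have_given_i = conditional("have", "I", bigrams, unigrams)     # c(I, have) / c(I)
p_i = relative_frequency(("I",), unigrams)                       # c(I) / c(1-Gram)
```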
As shown in Figure 12, the eleventh embodiment provides a candidate word assessment method including the following steps:
Step S111: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S112: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
For the implementation of this step, refer to the description of the second embodiment; it is not repeated here.
Step S113: replace the wrong word with each candidate word in turn to obtain candidate sentences, and determine the assessment probability of the corresponding candidate word from each candidate sentence. The assessment probability is obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probability of the neighboring words of the candidate word.
For the implementation of this step, refer to the description of the fourth embodiment; it is not repeated here.
Step S114: obtain the error message of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S115: determine the assessment score of each candidate word according to the similarity, the assessment probability and the error message.
This embodiment determines the assessment score of each candidate word according to its similarity to the wrong word, its assessment probability and the error message. It considers both the similarity between the candidate word and the wrong word and the characteristics of the user's writing, and also incorporates the assessment information of the language model, which improves the reliability of the candidate word assessment result.
In one embodiment, the language environment probability of each word is expressed as the log value of the probability of that word at its position, and the assessment probability of the candidate word is obtained on this basis. Further, the assessment score of each candidate word can be determined according to the similarity, the reciprocal of the assessment probability, and the error message.
In one embodiment, the step of determining the assessment score of each candidate word according to the similarity, the assessment probability and the error message includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the similarity, the assessment probability and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the similarity, the assessment probability and a second coefficient.
In one embodiment, the assessment score of each candidate word is calculated according to the following formula:
A specific calculation of the assessment score is exemplified below:
Suppose that a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, and the context is I have sone apples.
1. A lookup in the English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as candidate words. The similarity between each candidate word and the wrong word is determined as: S_some = 1.675; S_same = 0.925; S_son = 1.86; S_one = 1.86.
3. some, same, son and one each replace sone in I have sone apples, giving:
Candidate sentence one: I have some apples;
Candidate sentence two: I have same apples;
Candidate sentence three: I have son apples;
Candidate sentence four: I have one apples.
Based on the N-GRAM model with N set to 3, the neighboring word of candidate word some in candidate sentence one is apples, the neighboring word of candidate word same in candidate sentence two is apples, the neighboring word of candidate word son in candidate sentence three is apples, and the neighboring word of candidate word one in candidate sentence four is apples. The assessment probability of each candidate word is calculated with the N-GRAM model, giving the following assessment probabilities:
4. According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-5.6) = 0.299;
score_same = -1 × 0.925 × 1/(-6.35) = 0.146;
score_son = -1 × 1.86 × 1/(-7.58) = 0.245;
score_one = -0.5 × 1.86 × 1/(-5.26) = 0.177.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
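This section does not state how the candidate word's own language environment probability and those of its neighboring words are combined into the assessment probability. The following minimal sketch assumes a plain mean of the log probabilities; the function name, the averaging rule and the per-word values are all hypothetical and serve only to show the role of the neighboring words.

```python
def assessment_probability(log_probs, candidate_index, neighbor_indices):
    """Combine the candidate's log probability with those of its neighbours.

    Assumption: the combination is a plain arithmetic mean of log values.
    """
    picked = [log_probs[candidate_index]] + [log_probs[i] for i in neighbor_indices]
    return sum(picked) / len(picked)


# Hypothetical per-word log probabilities for the candidate sentence
# "I have some apples" (positions 0..3).
log_probs = [-2.1, -3.4, -5.7, -5.5]

# N-Gram case described above: the only neighbouring word of "some" is "apples".
ngram_value = assessment_probability(log_probs, 2, [3])

# BiLSTM/LSTM case: the neighbouring words are "I", "have" and "apples".
lstm_value = assessment_probability(log_probs, 2, [0, 1, 3])
```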
In another embodiment, for the BiLSTM model, the neighboring words of candidate word some in candidate sentence one are I, have and apples; the neighboring words of candidate word same in candidate sentence two are I, have and apples; the neighboring words of candidate word son in candidate sentence three are I, have and apples; and the neighboring words of candidate word one in candidate sentence four are I, have and apples. The assessment probabilities of the candidate words in the candidate sentences calculated on this basis are as follows:
Step 4 is replaced with:
According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-2.88) = 0.582;
score_same = -1 × 0.925 × 1/(-3.62) = 0.256;
score_son = -1 × 1.86 × 1/(-4.1) = 0.454;
score_one = -0.5 × 1.86 × 1/(-2.62) = 0.355.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
In another embodiment, for the LSTM model, the neighboring words of candidate word some in candidate sentence one are I, have and apples; the neighboring words of candidate word same in candidate sentence two are I, have and apples; the neighboring words of candidate word son in candidate sentence three are I, have and apples; and the neighboring words of candidate word one in candidate sentence four are I, have and apples. The assessment probabilities of the candidate words in the candidate sentences calculated on this basis are as follows:
Step 4 is replaced with:
According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/(-3.175) = 0.528;
score_same = -1 × 0.925 × 1/(-3.75) = 0.247;
score_son = -1 × 1.86 × 1/(-4.6) = 0.404;
score_one = -0.5 × 1.86 × 1/(-3.45) = 0.27.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
Candidate word assessment under the above three language models considers the user's habits when writing words and also incorporates the similarity information and the assessment information of the language environment, which improves the reliability of candidate word assessment.
As shown in Figure 13, the twelfth embodiment provides a candidate word assessment method including the following steps:
Step S121: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S122: determine the editing distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S123: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
For the implementation of this step, refer to the description of the second embodiment; it is not repeated here.
Step S124: determine the language environment probability of each candidate word at the position of the wrong word.
For the implementation of this step, refer to the description of the third embodiment; it is not repeated here.
Step S125: obtain the error message of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S126: determine the assessment score of each candidate word according to the editing distance, the similarity, the language environment probability and the error message.
This embodiment determines the assessment score of each candidate word according to its editing distance from the wrong word, its similarity to the wrong word, its language environment probability and the error message. It considers the editing distance and similarity between the candidate word and the wrong word as well as the characteristics of the user's writing, and also incorporates the assessment information of the language model. This improves the reliability of the candidate word assessment result and, further, helps improve the efficiency and accuracy of text editing.
In one embodiment, when the language environment probability is expressed as the log value of the probability of each candidate word at the position of the wrong word, the assessment score of each candidate word can be determined according to the reciprocal of the editing distance, the similarity, the reciprocal of the language environment probability, and the error message.
In one embodiment, the step of determining the assessment score of each candidate word according to the editing distance, the similarity, the language environment probability and the error message includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the editing distance, the similarity, the language environment probability and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the editing distance, the similarity, the language environment probability and a second coefficient.
In one embodiment, the assessment score of each candidate word is calculated according to the following formula:
A specific calculation of the assessment score is exemplified below:
Suppose that a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, and the context is I have sone apples.
1. A lookup in the English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as candidate words. The similarity between each candidate word and the wrong word is determined as: S_some = 1.675; S_same = 0.925; S_son = 1.86; S_one = 1.86. The editing distances of candidate words some, same, son and one are 1, 2, 1 and 1, respectively.
3. some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated with the N-GRAM model, with N set to 3. The language environment probability of each candidate word is as follows:
4. According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-5.7) = 0.294;
score_same = -1 × 0.925 × 1/2 × 1/(-6) = 0.077;
score_son = -1 × 1.86 × 1/1 × 1/(-8) = 0.233;
score_one = -0.5 × 1.86 × 1/1 × 1/(-4.8) = 0.194.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
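The way the editing distance enters the score in this example can be sketched as follows. The function and parameter names are hypothetical, and the form score = -K × S × 1/D × 1/log P is an assumption read off the worked lines above (K is 1 for a matching initial letter and 0.5 otherwise); the input values are those of the N-GRAM example.

```python
def assessment_score(similarity, edit_distance, log_prob, same_initial,
                     k_same=1.0, k_diff=0.5):
    """Assumed form: -K * S * (1 / D) * (1 / log P)."""
    if edit_distance <= 0:
        raise ValueError("a candidate must differ from the wrong word")
    k = k_same if same_initial else k_diff
    return -k * similarity * (1.0 / edit_distance) * (1.0 / log_prob)


wrong_word = "sone"
# candidate -> (similarity, editing distance, log probability)
candidates = {
    "some": (1.675, 1, -5.7),
    "same": (0.925, 2, -6.0),
    "son": (1.86, 1, -8.0),
    "one": (1.86, 1, -4.8),
}

scores = {
    word: assessment_score(sim, dist, log_p, word[0] == wrong_word[0])
    for word, (sim, dist, log_p) in candidates.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Dividing by the editing distance penalizes same, the only candidate that is two edits away from sone, so it drops to the bottom of the ranking.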
In another embodiment, for the BiLSTM model, steps 1 and 2 above are unchanged. Step 3 is replaced with: some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated. The results for each candidate word are as follows:
Step 4 is replaced with:
According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-4.9) = 0.342;
score_same = -1 × 0.925 × 1/2 × 1/(-5.07) = 0.091;
score_son = -1 × 1.86 × 1/1 × 1/(-7.6) = 0.245;
score_one = -0.5 × 1.86 × 1/1 × 1/(-4.66) = 0.200.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
In another embodiment, for the LSTM model, steps 1 and 2 above are unchanged. Step 3 is replaced with: some, same, son and one each replace sone in I have sone apples, and the language environment probability of each candidate word is calculated. The results for each candidate word are as follows:
Step 4 is replaced with:
According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-5.6) = 0.299;
score_same = -1 × 0.925 × 1/2 × 1/(-6.3) = 0.073;
score_son = -1 × 1.86 × 1/1 × 1/(-8.6) = 0.216;
score_one = -0.5 × 1.86 × 1/1 × 1/(-5.1) = 0.182.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
Candidate word assessment under the above three language models considers the user's habits when writing words and also incorporates the editing distance, the similarity and the assessment information of the language environment, which improves the reliability of candidate word assessment.
As shown in Figure 14, the thirteenth embodiment provides a candidate word assessment method including the following steps:
Step S131: detect a wrong word and obtain multiple candidate words corresponding to the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S132: determine the editing distance between each candidate word and the wrong word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S133: determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
For the implementation of this step, refer to the description of the second embodiment; it is not repeated here.
Step S134: replace the wrong word with each candidate word in turn to obtain candidate sentences, and determine the assessment probability of the corresponding candidate word from each candidate sentence. The assessment probability is obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probability of the neighboring words of the candidate word.
For the implementation of this step, refer to the description of the fourth embodiment; it is not repeated here.
Step S135: obtain the error message of the wrong word relative to each candidate word.
For the implementation of this step, refer to the description of the first embodiment; it is not repeated here.
Step S136: determine the assessment score of each candidate word according to the editing distance, the similarity, the assessment probability and the error message.
This embodiment determines the assessment score of each candidate word according to its editing distance from the wrong word, its similarity to the wrong word, its assessment probability and the error message. It considers the editing distance and similarity between the candidate word and the wrong word as well as the characteristics of the user's writing, and also incorporates the assessment information of the language model. This improves the reliability of the candidate word assessment result and helps improve the efficiency and accuracy of text editing.
In one embodiment, the language environment probability of each word is expressed as the log value of the probability of that word at its position, and the assessment probability of the candidate word is obtained on this basis. Further, the assessment score of each candidate word can be determined according to the reciprocal of the editing distance, the similarity, the reciprocal of the assessment probability, and the error message.
In one embodiment, the step of determining the assessment score of each candidate word according to the editing distance, the similarity, the assessment probability and the error message includes: if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the editing distance, the similarity, the assessment probability and a first coefficient; if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the editing distance, the similarity, the assessment probability and a second coefficient.
In one embodiment, the assessment score of each candidate word is calculated according to the following formula:
A specific calculation of the assessment score is exemplified below:
Suppose that a user writing an English sentence on an intelligent interactive tablet miswrites some as sone, and the context is I have sone apples.
1. A lookup in the English dictionary finds that sone is not in the dictionary, so sone is determined to be a wrong word.
2. The words in the English dictionary whose editing distance from sone is 1 or 2 are obtained; assume these are some, same, son and one, which serve as candidate words. The similarity between each candidate word and the wrong word is calculated as: S_some = 1.675; S_same = 0.925; S_son = 1.86; S_one = 1.86. The editing distances of candidate words some, same, son and one are 1, 2, 1 and 1, respectively.
3. some, same, son and one each replace sone in I have sone apples, and the assessment probability of each candidate word is calculated with the N-GRAM model, giving the following assessment probabilities:
4. According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-5.6) = 0.299;
score_same = -1 × 0.925 × 1/2 × 1/(-6.35) = 0.073;
score_son = -1 × 1.86 × 1/1 × 1/(-7.58) = 0.245;
score_one = -0.5 × 1.86 × 1/1 × 1/(-5.26) = 0.177.
It follows that, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
In another embodiment, for the BiLSTM model, the assessment probabilities of the candidate words in the candidate sentences are calculated as follows:
Step 4 is replaced with:
According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-2.88) = 0.582;
score_same = -1 × 0.925 × 1/2 × 1/(-3.62) = 0.128;
score_son = -1 × 1.86 × 1/1 × 1/(-4.1) = 0.454;
score_one = -0.5 × 1.86 × 1/1 × 1/(-2.62) = 0.355.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
In another embodiment, for the LSTM model, the assessment probabilities of the candidate words in the candidate sentences are calculated as follows:
Step 4 is replaced with:
According to the formula, the assessment score of each candidate word is calculated as:
score_some = -1 × 1.675 × 1/1 × 1/(-3.175) = 0.528;
score_same = -1 × 0.925 × 1/2 × 1/(-3.75) = 0.123;
score_son = -1 × 1.86 × 1/1 × 1/(-4.6) = 0.404;
score_one = -0.5 × 1.86 × 1/1 × 1/(-3.45) = 0.27.
Likewise, compared with the other candidate words, the candidate word some is more likely to be the correct word (error correction term) for the wrong word sone.
Candidate word assessment under the above three language models considers the user's habits when writing words and also incorporates the editing distance, the similarity and the assessment information of the language environment, which improves the reliability of the candidate word assessment result.
On the basis of the assessment scores obtained for the candidate words in any of the above embodiments, in one embodiment the candidate word assessment method further includes the step of: determining the error correction term for the wrong word from the multiple candidate words according to the assessment scores, and correcting the wrong word with the error correction term. The wrong word can thereby be corrected more accurately.
Optionally, the candidate word with the highest assessment score is selected from the multiple candidate words of the wrong word as the error correction term for the wrong word. Further, the wrong word can be replaced with the error correction term, achieving automatic correction of the wrong word.
In addition, in one embodiment, on the basis of the assessment scores obtained for the candidate words in any of the above embodiments, the candidate word assessment method further includes the step of: sorting the multiple candidate words by assessment score and displaying them in that order, so that candidate words with higher assessment scores are displayed earlier, to better prompt the user.
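Both uses of the assessment scores, automatic correction and ranked display, reduce to ordering the candidates by score. A minimal sketch follows; the function names are hypothetical, and the scores are those of the thirteenth embodiment's N-GRAM example.

```python
def rank_candidates(scores):
    """Return candidate words ordered from highest to lowest assessment score."""
    return sorted(scores, key=scores.get, reverse=True)


def auto_correct(tokens, wrong_index, scores):
    """Replace the wrong word with the highest-scoring candidate word."""
    corrected = list(tokens)
    corrected[wrong_index] = rank_candidates(scores)[0]
    return corrected


scores = {"some": 0.299, "same": 0.073, "son": 0.245, "one": 0.177}
ranking = rank_candidates(scores)  # order in which candidates would be displayed
fixed = auto_correct("I have sone apples".split(), 2, scores)
```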
In one embodiment, on the basis of any of the above embodiments, the candidate word assessment method further includes the step of: detecting whether a word to be checked is in a preset dictionary and, if not, determining that the word is a wrong word. For example, each word is scanned and checked against the dictionary, and any word not found in the dictionary is determined to be a wrong word.
It should be understood that the preset dictionary may be a general English dictionary, a Chinese dictionary or the like, or another specific dictionary; the dictionary can be selected according to actual conditions.
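The dictionary check described above can be sketched in a few lines. The toy dictionary, the function name, the whitespace tokenization and the lower-casing are assumptions made for illustration.

```python
def find_wrong_words(sentence, dictionary):
    """Return (position, word) pairs for words that are absent from the dictionary."""
    return [
        (index, word)
        for index, word in enumerate(sentence.split())
        if word.lower() not in dictionary
    ]


# Hypothetical preset dictionary, stored in lower case.
toy_dictionary = {"i", "have", "some", "same", "son", "one", "apples"}
wrong = find_wrong_words("I have sone apples", toy_dictionary)
```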
In one embodiment, optionally, after a wrong word is detected, the candidate word assessment method further includes the step of determining the set of candidate words corresponding to the wrong word. This step may be: calculating the editing distance between the wrong word and the known words in the dictionary, and choosing the known words whose editing distance is within a set range as the multiple candidate words corresponding to the wrong word. For example, known words whose editing distance is less than 3 are chosen as candidate words, which improves the validity of candidate word assessment.
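Candidate generation by editing distance can be sketched as follows. The sketch uses the standard Levenshtein distance (insertions, deletions and substitutions each cost 1), which is an assumption since this section does not name the variant; the toy dictionary and function names are likewise hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        cur = [i]
        for j, ch_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                   # delete ch_a
                cur[j - 1] + 1,                # insert ch_b
                prev[j - 1] + (ch_a != ch_b),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def candidate_words(wrong_word, dictionary, max_distance=2):
    """Known words whose editing distance from the wrong word is 1..max_distance."""
    return sorted(
        word for word in dictionary
        if 0 < edit_distance(wrong_word, word) <= max_distance
    )


toy_dictionary = {"some", "same", "son", "one", "have", "apples"}
candidates = candidate_words("sone", toy_dictionary)
```

With max_distance=2 the filter corresponds to the "editing distance less than 3" example; scanning the whole dictionary is the simplest approach, and larger dictionaries would usually need an index to avoid comparing against every word.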
It should be understood that, for each of the foregoing method embodiments, although the steps in the flowcharts are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in the flowcharts of the method embodiments may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same idea as the candidate word assessment methods of the first to thirteenth embodiments above, the embodiments of the present application also provide corresponding candidate word assessment apparatuses.
As shown in Figure 15, in the fourteenth embodiment, the candidate word assessment apparatus includes:
a candidate word acquisition module 101, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word;
a distance determining module 102, configured to determine the editing distance between each candidate word and the wrong word;
an error message acquisition module 103, configured to obtain the error message of the wrong word relative to each candidate word;
and a first evaluation module 104, configured to determine the assessment score of each candidate word according to the editing distance and the error message.
When a wrong word is detected, the candidate word assessment apparatus of the above embodiment first obtains the corresponding multiple candidate words, determines the editing distance between each candidate word and the wrong word, and determines the error message of the wrong word relative to each candidate word. It then determines the assessment score of each candidate word according to the editing distance and the error message. This considers the characteristics of how users write words and also takes the editing distance between words into account, which improves the reliability of the candidate word assessment result.
In one embodiment, the first evaluation module 104 is configured to determine the assessment score of each candidate word according to the reciprocal of the editing distance and the error message.
In one embodiment, the error message of the wrong word relative to each candidate word includes information on whether the wrong word and the candidate word have the same initial letter;
the first evaluation module 104 includes: a first scoring submodule, configured to calculate the assessment score of the candidate word according to the editing distance and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, configured to calculate the assessment score of the candidate word according to the editing distance and a second coefficient if the wrong word and the candidate word have different initial letters.
As shown in figure 16, in the 15th embodiment, candidate word apparatus for evaluating includes:
Candidate word acquisition module 201 obtains the corresponding multiple candidate words of wrong word for detecting wrong word;
Similarity determining module 202, the similarity for determining each candidate word and wrong word, the similarity is according to each candidate The longest common subsequence and/or Longest Common Substring of word and wrong word obtain;
Error message acquisition module 203, for obtaining error message of the wrong word relative to each candidate word;
And second evaluation module 204, for determining the assessment score corresponding to each candidate word according to the similarity and the error message.
The candidate word evaluation apparatus of the above embodiment first obtains the corresponding multiple candidate words when a wrong word is detected, separately determines the similarity between each candidate word and the wrong word, and determines the error message of the wrong word relative to each candidate word; it then determines the assessment score corresponding to each candidate word according to the similarity and the error message. Because both the phenomena observed in word writing and the similarity between words are taken into account, the reliability of the candidate word assessment result can be improved.
In one embodiment, the similarity determining module 202 includes: a first similarity calculation submodule or a second similarity calculation submodule.
Wherein, the first similarity calculation submodule is for calculating the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word; alternatively, the second similarity calculation submodule is for calculating the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word, together with the editing distance between each candidate word and the wrong word.
In one embodiment, the second similarity calculation submodule is for calculating the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word, together with the reciprocal of the editing distance between each candidate word and the wrong word.
In one embodiment, the error message of the wrong word relative to each candidate word includes: information on whether the initial letters of the wrong word and the candidate word are identical. Optionally, the second evaluation module 204 includes: a first scoring submodule, for calculating the assessment score corresponding to the candidate word according to the similarity and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, for calculating the assessment score corresponding to the candidate word according to the similarity and a second coefficient if the wrong word and the candidate word have different initial letters.
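The similarity calculation of this embodiment can be sketched as follows. The sketch assumes that each "rate" is the corresponding length divided by the length of the longer word, that the rates are summed, and that the second submodule adds the reciprocal of the editing distance; these combination rules and all function names are assumptions made for illustration, and both words are assumed to be non-empty.

```python
from typing import Optional


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (characters need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest run of adjacent characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            run = prev[j - 1] + 1 if ca == cb else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


def similarity(wrong: str, candidate: str,
               edit_distance: Optional[int] = None) -> float:
    """Sum of the two rates; optionally add 1 / editing distance
    (the second-submodule variant). The normalisation is an assumption."""
    longest = max(len(wrong), len(candidate))
    lcs_rate = lcs_length(wrong, candidate) / longest
    substring_rate = longest_common_substring_length(wrong, candidate) / longest
    value = lcs_rate + substring_rate
    if edit_distance:  # skipped when None or 0
        value += 1.0 / edit_distance
    return value
```

For the wrong word "sone" and the candidate "some", the longest common subsequence is "soe" and the longest common substring is "so".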
As shown in Figure 17, in the 16th embodiment, the candidate word evaluation apparatus includes:
Candidate word acquisition module 301, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word;
First probability determination module 302, for determining the language environment probability of each candidate word at the wrong word position;
Error message acquisition module 303, for obtaining error message of the wrong word relative to each candidate word;
And third evaluation module 304, for determining the assessment score corresponding to each candidate word according to the language environment probability and the error message.
The candidate word evaluation apparatus of the above embodiment first obtains the corresponding multiple candidate words when a wrong word is detected, separately determines the language environment probability of each candidate word at the wrong word position, and determines the error message of the wrong word relative to each candidate word; it then determines the assessment score corresponding to each candidate word according to the language environment probability and the error message. Because both the user's habits when writing words and the context language environment are taken into account, the reliability of the candidate word assessment result can be improved.
In one embodiment, the first probability determination module 302 is for calculating, according to a preset language model, the probability of each candidate word at the wrong word position, and taking the log value of the probability as the language environment probability of the candidate word.
In one embodiment, the third evaluation module 304 is for determining the assessment score corresponding to each candidate word according to the reciprocal of the language environment probability and the error message; wherein the language model includes but is not limited to an N-Gram model, a BiLSTM model or an LSTM model.
In one embodiment, the error message of the wrong word relative to each candidate word includes: information on whether the initial letters of the wrong word and the candidate word are identical. Accordingly, the third evaluation module 304 includes: a first scoring submodule, for calculating the assessment score of the candidate word according to the reciprocal of the language environment probability and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, for calculating the assessment score of the candidate word according to the reciprocal of the language environment probability and a second coefficient if the wrong word and the candidate word have different initial letters.
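As an illustration of the language environment probability, the following sketch uses a toy bigram (N-Gram with N = 2) model with add-one smoothing. The corpus, the smoothing, the function names and the coefficient values are all hypothetical. Because the log of a probability below 1 is negative, the sketch negates the reciprocal so that a more probable candidate receives a higher positive score; this sign convention is an assumption.

```python
import math
from collections import Counter

# Toy training corpus (hypothetical); a real system would use a large corpus.
CORPUS = "i want some tea . i want some milk . we want sole fish".split()
UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB_SIZE = len(UNIGRAMS)


def bigram_probability(previous: str, word: str) -> float:
    """P(word | previous) with add-one smoothing, so it is never 0 or 1."""
    return (BIGRAMS[(previous, word)] + 1) / (UNIGRAMS[previous] + VOCAB_SIZE)


def language_environment_probability(previous: str, candidate: str) -> float:
    """Log value of the candidate's probability at the wrong word position."""
    return math.log(bigram_probability(previous, candidate))


def assessment_score(wrong: str, previous: str, candidate: str,
                     k_same: float = 1.0, k_diff: float = 0.5) -> float:
    """Coefficient times the negated reciprocal of the log-probability."""
    log_p = language_environment_probability(previous, candidate)
    coefficient = k_same if wrong[:1] == candidate[:1] else k_diff
    return -coefficient / log_p
```

In this toy corpus "some" follows "want" more often than "sole" does, so "some" receives the higher score for the wrong word "sone" after "want".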
As shown in Figure 18, in the 17th embodiment, the candidate word evaluation apparatus includes:
Candidate word acquisition module 401, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word;
Second probability determination module 402, for replacing the wrong word with each candidate word respectively to obtain candidate sentences, and determining the assessment probability of the corresponding candidate word according to each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words adjacent to the candidate word;
Error message acquisition module 403, for obtaining error message of the wrong word relative to each candidate word;
And the 4th evaluation module 404, for determining the assessment score corresponding to each candidate word according to the assessment probability and the error message.
The candidate word evaluation apparatus of the above embodiment first obtains the corresponding multiple candidate words when a wrong word is detected, separately determines the assessment probability corresponding to each candidate word, and determines the error message of the wrong word relative to each candidate word; it then determines the assessment score corresponding to each candidate word according to the assessment probability and the error message. Because both the user's habits when writing words and the context language environment are taken into account, the reliability of the candidate word assessment result can be improved.
In one embodiment, the second probability determination module 402 is further for calculating, according to a preset language model, the probability of the candidate word and of each word adjacent to the candidate word at its respective position in the candidate sentence, and taking the log value of each probability as the language environment probability of the corresponding word; and for averaging the language environment probability of the candidate word and the language environment probabilities of its adjacent words in the candidate sentence to obtain the assessment probability of the candidate word in the candidate sentence.
In one embodiment, the 4th evaluation module 404 is specifically for determining the assessment score corresponding to each candidate word according to the reciprocal of the assessment probability and the error message; wherein the language model includes but is not limited to an N-Gram model, a BiLSTM model or an LSTM model.
In one embodiment, the error message of the wrong word relative to each candidate word includes: information on whether the initial letters of the wrong word and the candidate word are identical. The 4th evaluation module 404 includes: a first scoring submodule, for calculating the assessment score corresponding to the candidate word according to the assessment probability and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, for calculating the assessment score corresponding to the candidate word according to the assessment probability and a second coefficient if the wrong word and the candidate word have different initial letters.
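The assessment probability of this embodiment (the mean of the log-probabilities of the candidate word and its adjacent words in the candidate sentence) can be sketched as below. The language model is abstracted as a caller-supplied function `prob_at(sentence, i)`, and the `window` parameter, which sets how many adjacent words are included on each side, is likewise a hypothetical name introduced for illustration.

```python
import math
from typing import Callable, List, Sequence


def assessment_probability(tokens: Sequence[str], wrong_index: int, candidate: str,
                           prob_at: Callable[[List[str], int], float],
                           window: int = 1) -> float:
    """Replace the wrong word with the candidate, then average the log
    probabilities of the candidate and of its adjacent words."""
    # Build the candidate sentence without modifying the caller's tokens.
    candidate_sentence = list(tokens)
    candidate_sentence[wrong_index] = candidate
    # Positions of the candidate and its neighbours, clipped to the sentence.
    start = max(0, wrong_index - window)
    stop = min(len(candidate_sentence), wrong_index + window + 1)
    log_probs = [math.log(prob_at(candidate_sentence, i)) for i in range(start, stop)]
    return sum(log_probs) / len(log_probs)
```

Any model that returns a positive probability for a word at a given position (an N-Gram, LSTM or BiLSTM model, for instance) could be plugged in as `prob_at`.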
As shown in Figure 19, in the 18th embodiment, the candidate word evaluation apparatus includes:
Candidate word acquisition module 501, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word.
Distance determining module 502, for determining the editing distance between each candidate word and the wrong word;
Second probability determination module 503, for replacing the wrong word with each candidate word respectively to obtain candidate sentences, and determining the assessment probability of the corresponding candidate word according to each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words adjacent to the candidate word;
And the 5th evaluation module 504, for determining the assessment score of each candidate word according to the editing distance and the assessment probability.
In one embodiment, the second probability determination module 503 is for replacing the wrong word with each candidate word respectively to obtain candidate sentences; calculating, according to a preset language model, the probability of the candidate word and of each word adjacent to the candidate word at its respective position in the candidate sentence, and taking the log value of each probability as the language environment probability of the corresponding word; and averaging the language environment probability of the candidate word and the language environment probabilities of its adjacent words in the candidate sentence to obtain the assessment probability of the candidate word in the candidate sentence. Wherein, the language model includes but is not limited to: an N-Gram model, a BiLSTM model or an LSTM model.
Based on the above embodiment, the 5th evaluation module 504 is specifically for determining the assessment score corresponding to each candidate word according to the reciprocal of the editing distance and the reciprocal of the assessment probability.
As shown in Figure 20, in the 19th embodiment, the candidate word evaluation apparatus includes:
Candidate word acquisition module 601, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word.
Similarity determining module 602, for determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word;
Second probability determination module 603, for replacing the wrong word with each candidate word respectively to obtain candidate sentences, and determining the assessment probability of the corresponding candidate word according to each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words adjacent to the candidate word;
And the 6th evaluation module 604, for determining the assessment score of each candidate word according to the similarity and the assessment probability.
In one embodiment, the second probability determination module 603 is for replacing the wrong word with each candidate word respectively to obtain candidate sentences; calculating, according to a preset language model, the probability of the candidate word and of each word adjacent to the candidate word at its respective position in the candidate sentence, and taking the log value of each probability as the language environment probability of the corresponding word; and averaging the language environment probability of the candidate word and the language environment probabilities of its adjacent words in the candidate sentence to obtain the assessment probability of the candidate word in the candidate sentence. Wherein, the language model includes but is not limited to: an N-Gram model, a BiLSTM model or an LSTM model.
Based on the above embodiment, the 6th evaluation module 604 is specifically for determining the assessment score corresponding to each candidate word according to the reciprocal of the assessment probability and the similarity.
As shown in Figure 21, in the 20th embodiment, the candidate word evaluation apparatus includes:
Candidate word acquisition module 701, for obtaining multiple candidate words corresponding to a wrong word when the wrong word is detected;
Distance determining module 702, for determining the editing distance between each candidate word and the wrong word;
First probability determination module 703, for determining the language environment probability of each candidate word at the wrong word position;
Error message acquisition module 704, for obtaining error message of the wrong word relative to each candidate word;
And the 7th evaluation module 705, for determining the assessment score corresponding to each candidate word according to the editing distance, the language environment probability and the error message.
The candidate word evaluation apparatus of the above embodiment first obtains the corresponding multiple candidate words when a wrong word is detected, separately determines the editing distance between each candidate word and the wrong word and the language environment probability of each candidate word at the wrong word position, and determines the error message of the wrong word relative to each candidate word; it then determines the assessment score corresponding to each candidate word according to the editing distance, the language environment probability and the error message. Because the editing distance information, the context language environment and the user's habits when writing words are all taken into account, the reliability of the candidate word assessment result can be improved.
In one embodiment, the first probability determination module 703 is for calculating, according to a preset language model, the probability of each candidate word at the wrong word position, and taking the log value of the probability as the language environment probability of the candidate word.
In one embodiment, the 7th evaluation module 705 is specifically for determining the assessment score corresponding to each candidate word according to the reciprocal of the editing distance, the reciprocal of the language environment probability and the error message; the language model includes but is not limited to an N-Gram model, a BiLSTM model or an LSTM model.
In one embodiment, the error message of the wrong word relative to each candidate word includes: information on whether the initial letters of the wrong word and the candidate word are identical. Accordingly, the 7th evaluation module 705 includes: a first scoring submodule, for calculating the assessment score corresponding to the candidate word according to the editing distance, the language environment probability and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, for calculating the assessment score corresponding to the candidate word according to the editing distance, the language environment probability and a second coefficient if the wrong word and the candidate word have different initial letters.
As shown in Figure 22, in the 21st embodiment, the candidate word evaluation apparatus includes:
Candidate word acquisition module 801, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word.
Distance determining module 802, for determining the editing distance between each candidate word and the wrong word.
Similarity determining module 803, for determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
Error message acquisition module 804, for obtaining error message of the wrong word relative to each candidate word.
And the 8th evaluation module 805, for determining the assessment score corresponding to each candidate word according to the editing distance, the similarity and the error message.
The candidate word evaluation apparatus of the above embodiment first obtains the corresponding multiple candidate words when a wrong word is detected, separately determines the editing distance and the similarity between each candidate word and the wrong word, and determines the error message of the wrong word relative to each candidate word; it then determines the assessment score corresponding to each candidate word according to the editing distance, the similarity and the error message. Because the editing distance information, the similarity and the user's habits when writing words are all taken into account, the reliability of the candidate word assessment result can be improved.
In one embodiment, the 8th evaluation module 805 includes: a first scoring submodule, for calculating the assessment score corresponding to the candidate word according to the editing distance, the similarity and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, for calculating the assessment score corresponding to the candidate word according to the editing distance, the similarity and a second coefficient if the wrong word and the candidate word have different initial letters.
As shown in Figure 23, in the 22nd embodiment, a candidate word evaluation apparatus is provided; the candidate word evaluation apparatus of this embodiment includes:
Candidate word acquisition module 901, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word.
Distance determining module 902, for determining the editing distance between each candidate word and the wrong word.
Second probability determination module 903, for replacing the wrong word with each candidate word respectively to obtain candidate sentences, and determining the assessment probability of the corresponding candidate word according to each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words adjacent to the candidate word.
Error message acquisition module 904, for obtaining error message of the wrong word relative to each candidate word.
And the 9th evaluation module 905, for determining the assessment score corresponding to each candidate word according to the editing distance, the assessment probability and the error message.
The candidate word evaluation apparatus of the above embodiment first obtains the corresponding multiple candidate words when a wrong word is detected, separately determines the editing distance between each candidate word and the wrong word and the assessment probability of each candidate word at the wrong word position, and determines the error message of the wrong word relative to each candidate word; it then determines the assessment score corresponding to each candidate word according to the editing distance, the assessment probability and the error message. Because the editing distance information, the context language environment and the user's habits when writing words are all taken into account, the reliability of the candidate word assessment result can be improved.
In one embodiment, the second probability determination module 903 is specifically for calculating, according to a preset language model, the probability of the candidate word and of each word adjacent to the candidate word at its respective position in the candidate sentence, and taking the log value of each probability as the language environment probability of the corresponding word; and for averaging the language environment probability of the candidate word and the language environment probabilities of its adjacent words in the candidate sentence to obtain the assessment probability of the candidate word in the candidate sentence.
In one embodiment, the 9th evaluation module 905 includes: a first scoring submodule, for calculating the assessment score of the candidate word according to the editing distance, the assessment probability and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, for calculating the assessment score of the candidate word according to the editing distance, the assessment probability and a second coefficient if the wrong word and the candidate word have different initial letters.
As shown in Figure 24, in the 23rd embodiment, a candidate word evaluation apparatus is provided; the candidate word evaluation apparatus of this embodiment includes:
Candidate word acquisition module 1001, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word.
Similarity determining module 1002, for determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
First probability determination module 1003, for determining the language environment probability of each candidate word at the wrong word position.
Error message acquisition module 1004, for obtaining error message of the wrong word relative to each candidate word.
Tenth evaluation module 1005, for determining the assessment score corresponding to each candidate word according to the similarity, the language environment probability and the error message.
The candidate word evaluation apparatus of the above embodiment first obtains the corresponding multiple candidate words when a wrong word is detected, separately determines the similarity between each candidate word and the wrong word and the language environment probability of each candidate word at the wrong word position, and determines the error message of the wrong word relative to each candidate word; it then determines the assessment score corresponding to each candidate word according to the similarity, the language environment probability and the error message. Because the similarity information, the context language environment and the user's habits when writing words are all taken into account, the reliability of the candidate word assessment result can be improved.
In one embodiment, the first probability determination module 1003 is for calculating, according to a preset language model, the probability of each candidate word at the wrong word position, and taking the log value of the probability as the language environment probability of the candidate word.
In one embodiment, the tenth evaluation module 1005 includes: a first scoring submodule, for calculating the assessment score of the candidate word according to the similarity, the language environment probability and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, for calculating the assessment score of the candidate word according to the similarity, the language environment probability and a second coefficient if the wrong word and the candidate word have different initial letters.
As shown in Figure 25, in the 24th embodiment, a candidate word evaluation apparatus is provided; the candidate word evaluation apparatus of this embodiment includes:
Candidate word acquisition module 1101, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word.
Similarity determining module 1102, for determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
Second probability determination module 1103, for replacing the wrong word with each candidate word respectively to obtain candidate sentences, and determining the assessment probability of the corresponding candidate word according to each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words adjacent to the candidate word.
Error message acquisition module 1104, for obtaining error message of the wrong word relative to each candidate word.
And the 11th evaluation module 1105, for determining the assessment score corresponding to each candidate word according to the similarity, the assessment probability and the error message.
The candidate word evaluation apparatus of the above embodiment first obtains the corresponding multiple candidate words when a wrong word is detected, separately determines the similarity between each candidate word and the wrong word and the assessment probability of each candidate word at the wrong word position, and determines the error message of the wrong word relative to each candidate word; it then determines the assessment score corresponding to each candidate word according to the similarity, the assessment probability and the error message. Because the similarity, the context language environment and the user's habits when writing words are all taken into account, the reliability of the candidate word assessment result can be improved.
In one embodiment, the second probability determination module 1103 is specifically for calculating, according to a preset language model, the probability of the candidate word and of each word adjacent to the candidate word at its respective position in the candidate sentence, and taking the log value of each probability as the language environment probability of the corresponding word; and for averaging the language environment probability of the candidate word and the language environment probabilities of its adjacent words in the candidate sentence to obtain the assessment probability of the candidate word in the candidate sentence.
In one embodiment, the 11th evaluation module 1105 includes: a first scoring submodule, for calculating the assessment score of the candidate word according to the similarity, the assessment probability and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, for calculating the assessment score of the candidate word according to the similarity, the assessment probability and a second coefficient if the wrong word and the candidate word have different initial letters.
As shown in Figure 26, in the 25th embodiment, a candidate word evaluation apparatus is provided; the candidate word evaluation apparatus of this embodiment includes:
Candidate word acquisition module 1201, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word.
Distance determining module 1202, for determining the editing distance between each candidate word and the wrong word.
Similarity determining module 1203, for determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
First probability determination module 1204, for determining the language environment probability of each candidate word at the wrong word position.
Error message acquisition module 1205, for obtaining error message of the wrong word relative to each candidate word.
And the 12th evaluation module 1206, for determining the assessment score corresponding to each candidate word according to the editing distance, the similarity, the language environment probability and the error message.
The candidate word evaluation apparatus of the above embodiment first obtains the corresponding multiple candidate words when a wrong word is detected, separately determines the editing distance and the similarity between each candidate word and the wrong word and the language environment probability of each candidate word at the wrong word position, and determines the error message of the wrong word relative to each candidate word; it then determines the assessment score corresponding to each candidate word according to the editing distance, the similarity, the language environment probability and the error message. Because the editing distance information, the similarity, the context language environment and the user's habits when writing words are all taken into account, the reliability of the candidate word assessment result can be improved.
In one embodiment, the first probability determination module 1204 is specifically for calculating, according to a preset language model, the probability of each candidate word at the wrong word position, and taking the log value of the probability as the language environment probability of the candidate word.
In one embodiment, the 12th evaluation module 1206 includes: a first scoring submodule, for calculating the assessment score of the candidate word according to the editing distance, the similarity, the language environment probability and a first coefficient if the wrong word and the candidate word have the same initial letter; and a second scoring submodule, for calculating the assessment score of the candidate word according to the editing distance, the similarity, the language environment probability and a second coefficient if the wrong word and the candidate word have different initial letters.
As shown in Figure 27, in the 26th embodiment, a candidate word evaluation apparatus is provided; the candidate word evaluation apparatus of this embodiment includes:
Candidate word acquisition module 1301, for detecting a wrong word and obtaining multiple candidate words corresponding to the wrong word.
Distance determining module 1302, for determining the editing distance between each candidate word and the wrong word.
Similarity determining module 1303, for determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word.
Second probability determination module 1304, for replacing the wrong word with each candidate word respectively to obtain candidate sentences, and determining the assessment probability of the corresponding candidate word according to each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words adjacent to the candidate word.
Error message acquisition module 1305, for obtaining error message of the wrong word relative to each candidate word.
And the 13rd evaluation module 1306, for according to the editing distance, similarity, assessment probability and error Information determines the corresponding assessment score of each candidate word.
The candidate word apparatus for evaluating of above-described embodiment obtains corresponding multiple candidate words first when detecting wrong word, point Not Que Ding the assessment set in the wrong lexeme of each candidate word and the editing distance and similarity of the wrong word and each candidate word it is general Rate, and determine error message of the wrong word relative to each candidate word;According to the editing distance, similarity, assessment probability And error message, determine the corresponding assessment score of each candidate word;Both editing distance, similarity and context language had been considered Say environment, it is also contemplated that habit problem when user's written word, thus, it is possible to improve the reliability of candidate word assessment result.
In one embodiment, the second probability determination module 1304 is specifically used for being calculated according to preset language model and wait The probability for closing on its each comfortable position of word for selecting candidate word in sentence, candidate word, using the log values of the probability as the language of each word Say ambient probability;Flat is asked to the language environment probability for closing on word of the language environment probability of candidate word in candidate sentence, candidate word , the assessment probability of candidate word in the candidate sentence is obtained.
In one embodiment, the thirteenth evaluation module 1306 is further configured to determine the assessment score corresponding to each candidate word according to the reciprocal of the edit distance, the similarity, the reciprocal of the assessment probability and the error information.
In one embodiment, the thirteenth evaluation module 1306 includes: a first scoring submodule, configured to calculate, if the wrong word and the candidate word have the same initial letter, the assessment score corresponding to the candidate word according to the edit distance, the similarity, the assessment probability and a first coefficient; and a second scoring submodule, configured to calculate, if the wrong word and the candidate word have different initial letters, the assessment score corresponding to the candidate word according to the edit distance, the similarity, the assessment probability and a second coefficient.
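One way the scoring in the two embodiments above might be combined is sketched below. The text names the ingredients (reciprocal of the edit distance, similarity, reciprocal of the assessment probability, and a coefficient chosen by comparing initial letters) but not how they are combined here, so the product form, the sign flip that makes a better-fitting candidate score higher, and the coefficient values `k1=1.0` and `k2=0.5` are assumptions for illustration only.

```python
def assessment_score(wrong, candidate, edit_distance, similarity,
                     assessment_prob, k1=1.0, k2=0.5):
    """Illustrative score; the product form and default coefficients are assumptions."""
    if edit_distance <= 0:
        raise ValueError("edit distance must be positive for a wrong word")
    if assessment_prob >= 0:
        raise ValueError("expected a negative average log probability")
    # Error information: choose the first coefficient when the initial letters match,
    # otherwise the second coefficient.
    k = k1 if wrong[:1] == candidate[:1] else k2
    # The assessment probability is an average of logs (negative), so the sign is
    # flipped to make a candidate that fits its context better score higher.
    return -k * similarity * (1.0 / edit_distance) * (1.0 / assessment_prob)


same_initial = assessment_score("sone", "some", 1, 0.625, -2.0)
different_initial = assessment_score("sone", "bone", 1, 0.625, -2.0)
```

With all other inputs equal, the candidate sharing the wrong word's initial letter receives the larger score whenever the first coefficient exceeds the second, which reflects the idea that users rarely mistype the first letter of a word.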
In one embodiment, on the basis of the candidate word evaluation apparatus of any of the above embodiments, the candidate word evaluation apparatus further includes: a candidate word determining module, configured to calculate the edit distance between the wrong word and the known words in a preset dictionary, and to select the known words whose edit distance falls within a set range, obtaining the multiple candidate words corresponding to the wrong word.
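Candidate selection by edit distance can be sketched as below. The text does not specify which edit distance is used or what the set range is; Levenshtein distance (insertions, deletions and substitutions, each costing 1) and a range of 1 to `max_distance` are assumptions, and the function names and sample dictionary are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def find_candidates(wrong: str, dictionary, max_distance: int = 2) -> dict:
    """Return known words whose edit distance from the wrong word is within range."""
    candidates = {}
    for known in dictionary:
        distance = edit_distance(wrong, known)
        if 1 <= distance <= max_distance:
            candidates[known] = distance
    return candidates


words = ["some", "son", "one", "stone", "apple"]
nearby = find_candidates("sone", words, max_distance=1)
```

A linear scan over the dictionary is the simplest approach; a large dictionary would likely call for an index, which is outside the scope of this sketch.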
In one embodiment, on the basis of the candidate word evaluation apparatus of any of the above embodiments, the candidate word evaluation apparatus further includes: a correction module, configured to determine, from the multiple candidate words according to the assessment score, the error correction word corresponding to the wrong word, and to correct the wrong word with the error correction word. Optionally, the correction module is configured to determine the candidate word with the highest assessment score among the multiple candidate words as the error correction word corresponding to the wrong word.
In one embodiment, on the basis of the candidate word evaluation apparatus of any of the above embodiments, the candidate word evaluation apparatus further includes: a sorting module, configured to sort the multiple candidate words according to the assessment score and to display the sorted candidate words.
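The correction and sorting behavior described in the two embodiments above reduces to ordering candidates by assessment score and taking the top one. A minimal sketch follows; the function names and the sample scores are hypothetical and are not values from the source.

```python
from __future__ import annotations


def rank_candidates(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort candidate words from highest to lowest assessment score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def correct(sentence: list[str], position: int, scores: dict[str, float]) -> list[str]:
    """Replace the wrong word at `position` with the highest-scoring candidate."""
    best_word, _ = rank_candidates(scores)[0]
    return sentence[:position] + [best_word] + sentence[position + 1:]


# Illustrative scores only.
example_scores = {"some": 0.31, "son": 0.05, "stone": 0.12}
ranking = rank_candidates(example_scores)
fixed = correct(["I", "have", "sone", "apples"], 2, example_scores)
```

The sorted list serves the display use case, while `correct` covers automatic replacement with the top candidate.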
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program; when the processor executes the computer program, the steps of the candidate word evaluation method of any of the above embodiments are implemented.
With this computer device, the candidate words of a wrong word can be assessed more effectively than with the traditional approach of assessing candidate words according to edit distance.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps of the candidate word evaluation method of any of the above embodiments are implemented.
With this computer-readable storage medium, the candidate words of a wrong word can be assessed more effectively than with the traditional approach of assessing candidate words according to edit distance.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.
The terms "comprising" and "having" in the embodiments herein, and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or (module) units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product or device.
"Multiple" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B can mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
"First" and "second" herein merely distinguish similar objects and do not represent a specific ordering of the objects; it is understood that, where permitted, "first" and "second" may be interchanged in a specific order or sequence. It should be understood that objects distinguished by "first" and "second" can be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described here.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be determined by the appended claims.

Claims (19)

1. A candidate word evaluation method, characterized by comprising:
detecting a wrong word, and obtaining multiple candidate words corresponding to the wrong word;
determining the edit distance between each candidate word and the wrong word;
determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word;
replacing the wrong word with each candidate word respectively to obtain candidate sentences, and determining the assessment probability of the corresponding candidate word according to each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word;
obtaining error information of the wrong word relative to each candidate word; and
determining the assessment score corresponding to each candidate word according to the edit distance, the similarity, the assessment probability and the error information.
2. The candidate word evaluation method according to claim 1, characterized in that the step of determining the similarity between each candidate word and the wrong word comprises:
calculating the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word;
or,
calculating the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word, together with the edit distance between each candidate word and the wrong word.
3. The candidate word evaluation method according to claim 2, characterized in that the step of calculating the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word, together with the edit distance between each candidate word and the wrong word, comprises:
calculating the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word, together with the reciprocal of the edit distance between each candidate word and the wrong word.
4. The candidate word evaluation method according to claim 1, characterized in that the step of determining the assessment probability of the corresponding candidate word according to the candidate sentence comprises:
calculating, according to a preset language model, the probability of the candidate word and of each word neighboring the candidate word at its respective position in the candidate sentence, and taking the logarithm of each probability as the language environment probability of that word;
averaging the language environment probability of the candidate word and the language environment probabilities of its neighboring words in the candidate sentence, to obtain the assessment probability of the candidate word in the candidate sentence.
5. The candidate word evaluation method according to claim 4, characterized in that the step of determining the assessment score corresponding to each candidate word according to the edit distance, the similarity, the assessment probability and the error information comprises:
determining the assessment score corresponding to each candidate word according to the reciprocal of the edit distance, the similarity, the reciprocal of the assessment probability and the error information;
and/or
the language model comprises an N-Gram model, a BiLSTM model or an LSTM model.
6. The candidate word evaluation method according to any one of claims 1 to 5, characterized in that the error information of the wrong word relative to each candidate word comprises: information on whether the wrong word and the candidate word have the same initial letter;
the step of determining the assessment score corresponding to each candidate word according to the edit distance, the similarity, the assessment probability and the error information comprises:
if the wrong word and the candidate word have the same initial letter, calculating the assessment score of the candidate word according to the edit distance, the similarity, the assessment probability and a first coefficient;
if the wrong word and the candidate word have different initial letters, calculating the assessment score of the candidate word according to the edit distance, the similarity, the assessment probability and a second coefficient.
7. The candidate word evaluation method according to claim 6, characterized by further comprising the step of:
upon detecting that a word to be detected is not in a preset dictionary, determining that the word to be detected is a wrong word.
8. The candidate word evaluation method according to claim 7, characterized by further comprising, after the wrong word is detected, the step of:
calculating the edit distance between the wrong word and the known words in the dictionary, and selecting the known words whose edit distance falls within a set range, to obtain the multiple candidate words corresponding to the wrong word.
9. The candidate word evaluation method according to any one of claims 1, 2, 3, 4, 5, 7 and 8, characterized by further comprising the step of:
determining, from the multiple candidate words according to the assessment score, the error correction word corresponding to the wrong word, and correcting the wrong word with the error correction word;
and/or
sorting the multiple candidate words according to the assessment score, and displaying the sorted candidate words.
10. The candidate word evaluation method according to claim 9, characterized in that the step of determining, from the multiple candidate words according to the assessment score, the error correction word corresponding to the wrong word comprises:
determining the candidate word with the highest assessment score among the multiple candidate words as the error correction word corresponding to the wrong word.
11. The candidate word evaluation method according to claim 5, characterized in that the assessment score of each candidate word is calculated according to the following formula:
[formula not reproduced in this text]
wherein word denotes a candidate word, D_edit denotes the edit distance between the candidate word and the wrong word, [symbol not reproduced in this text] denotes the assessment probability of the candidate word, score_word denotes the assessment score corresponding to the candidate word, S denotes the similarity between the candidate word and the wrong word, and K denotes the error information of the wrong word relative to each candidate word; if the candidate word and the wrong word have the same initial letter, K takes the value K1; otherwise, K takes the value K2; K1 and K2 are preset values.
12. A candidate word evaluation apparatus, characterized by comprising:
a candidate word acquisition module, configured to detect a wrong word and obtain multiple candidate words corresponding to the wrong word;
a distance determining module, configured to determine the edit distance between each candidate word and the wrong word;
a similarity determining module, configured to determine the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of each candidate word and the wrong word;
a second probability determination module, configured to replace the wrong word with each candidate word respectively to obtain candidate sentences, and to determine the assessment probability of the corresponding candidate word according to each candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of the words neighboring the candidate word;
an error information acquisition module, configured to obtain error information of the wrong word relative to each candidate word;
and a thirteenth evaluation module, configured to determine the assessment score corresponding to each candidate word according to the edit distance, the similarity, the assessment probability and the error information.
13. The candidate word evaluation apparatus according to claim 12, characterized in that the similarity determining module comprises:
a first similarity calculation submodule, configured to calculate the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word;
or,
a second similarity calculation submodule, configured to calculate the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word, together with the edit distance between each candidate word and the wrong word.
14. The candidate word evaluation apparatus according to claim 13, characterized in that the second similarity calculation submodule is further configured to calculate the similarity between each candidate word and the wrong word according to at least one of the longest common subsequence rate and the longest common substring rate of each candidate word and the wrong word, together with the reciprocal of the edit distance between each candidate word and the wrong word.
15. The candidate word evaluation apparatus according to claim 12, characterized in that the second probability determination module comprises:
a probability determination submodule, configured to calculate, according to a preset language model, the probability of the candidate word and of each word neighboring the candidate word at its respective position in the candidate sentence, and to take the logarithm of each probability as the language environment probability of that word;
an assessment probability determination submodule, configured to average the language environment probability of the candidate word and the language environment probabilities of its neighboring words in the candidate sentence, to obtain the assessment probability of the candidate word in the candidate sentence.
16. The candidate word evaluation apparatus according to any one of claims 12 to 15, characterized in that the error information of the wrong word relative to each candidate word comprises: information on whether the wrong word and the candidate word have the same initial letter;
the thirteenth evaluation module comprises:
a first scoring submodule, configured to calculate, if the wrong word and the candidate word have the same initial letter, the assessment score corresponding to the candidate word according to the edit distance, the similarity, the assessment probability and a first coefficient;
a second scoring submodule, configured to calculate, if the wrong word and the candidate word have different initial letters, the assessment score corresponding to the candidate word according to the edit distance, the similarity, the assessment probability and a second coefficient.
17. The candidate word evaluation apparatus according to claim 12, characterized by further comprising:
a candidate word determining module, configured to calculate the edit distance between the wrong word and the known words in a preset dictionary, and to select the known words whose edit distance falls within a set range, to obtain the multiple candidate words corresponding to the wrong word;
and/or
an error correction word determining module, configured to determine, from the multiple candidate words according to the assessment score, the error correction word corresponding to the wrong word, and to correct the wrong word with the error correction word;
and/or
a sorting module, configured to sort the multiple candidate words according to the assessment score and to display the sorted candidate words.
18. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 11 when executing the program.
19. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 11.
CN201810320351.9A 2018-04-11 2018-04-11 Candidate word evaluation method and device, computer equipment and storage medium Active CN108681533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810320351.9A CN108681533B (en) 2018-04-11 2018-04-11 Candidate word evaluation method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810320351.9A CN108681533B (en) 2018-04-11 2018-04-11 Candidate word evaluation method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108681533A (en) 2018-10-19
CN108681533B (en) 2022-04-19

Family

ID=63799847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810320351.9A Active CN108681533B (en) 2018-04-11 2018-04-11 Candidate word evaluation method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108681533B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11176321B2 (en) 2019-05-02 2021-11-16 International Business Machines Corporation Automated feedback in online language exercises

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550173A (en) * 2016-02-06 2016-05-04 北京京东尚科信息技术有限公司 Text correction method and device
CN105975625A (en) * 2016-05-26 2016-09-28 同方知网数字出版技术股份有限公司 Chinglish inquiring correcting method and system oriented to English search engine
CN106202153A (en) * 2016-06-21 2016-12-07 广州智索信息科技有限公司 The spelling error correction method of a kind of ES search engine and system
CN106847288A (en) * 2017-02-17 2017-06-13 上海创米科技有限公司 The error correction method and device of speech recognition text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
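Zheng Wenxi et al.: "Algorithm design and system implementation of automatic spelling proofreading", 《科技和产业》 (Science Technology and Industry) *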

Also Published As

Publication number Publication date
CN108681533B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
JP5752150B2 (en) Context-sensitive automatic language correction using an Internet corpus specifically for small keyboard devices
US20190087084A1 (en) User-centric soft keyboard predictive technologies
US8554537B2 (en) Method and device for transliteration
Luyckx Scalability issues in authorship attribution
CN102831177B (en) Statement error correction and system thereof
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
JP5809381B1 (en) Natural language processing system, natural language processing method, and natural language processing program
Yazdani et al. Sentiment classification of financial news using statistical features
CN108694167A (en) Candidate word appraisal procedure, candidate word sort method and device
CN108563634A (en) Recognition methods, system, computer equipment and the storage medium of word misspelling
CN106610990A (en) Emotional tendency analysis method and apparatus
CN110991181B (en) Method and apparatus for enhancing labeled samples
CN108628826A (en) Candidate word appraisal procedure, device, computer equipment and storage medium
CN101369285B (en) Spell emendation method for query word in Chinese search engine
CN114328798A (en) Processing method, device, equipment, storage medium and program product for searching text
CN108681533A (en) Candidate word appraisal procedure, device, computer equipment and storage medium
CN108664466A (en) Candidate word appraisal procedure, device, computer equipment and storage medium
CN108628827A (en) Candidate word appraisal procedure, device, computer equipment and storage medium
CN108647202A (en) Candidate word appraisal procedure, device, computer equipment and storage medium
CN108681534A (en) Candidate word appraisal procedure, device, computer equipment and storage medium
CN108664467A (en) Candidate word appraisal procedure, device, computer equipment and storage medium
CN108595419A (en) Candidate word appraisal procedure, candidate word sort method and device
CN108681535A (en) Candidate word appraisal procedure, device, computer equipment and storage medium
CN108733645A (en) Candidate word appraisal procedure, device, computer equipment and storage medium
CN108694166A (en) Candidate word appraisal procedure, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant