CN106598939B - A kind of text error correction method and device, server, storage medium - Google Patents
- Publication number
- CN106598939B (application CN201610922072.0A)
- Authority
- CN
- China
- Prior art keywords
- word segment
- pinyin
- error correction
- segment pair
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a text error correction method and device. The method comprises: collecting a first corpus in the form of word-segment pairs; annotating both word segments of each pair with pinyin; determining the similarity between the pinyin of the two word segments of the pair, the similarity indicating how similar the pinyin of the first word segment is to the pinyin of the second word segment; and, if the similarity meets a preset condition, determining that the two word segments of the pair are error-correction segments for each other, or that the first word segment is an error-correction segment for the second word segment.
Description
Technical field
The present invention relates to electronic technology, and in particular to a text error correction method and device, a server, and a storage medium.
Background
Text error correction is widely used in text-input scenarios such as input methods, search engines, and speech recognition. It attempts to correct errors that may exist in text entered by a user (for example, keywords in Chinese or English) and to recommend the likely correct input to the user. For Chinese, text error correction also needs to detect word-choice errors, phonetic errors, character-form errors, and other errors in the user's input, and to recommend the correct keyword that the user probably intended. Error correction can thus effectively guide keyword entry and correct frequently occurring keyword errors.
In text error correction, the similarity between the keyword being corrected and the correct keyword determines the accuracy of the correction. Similarity is currently computed mainly as follows: the initials and finals of Mandarin are divided into several groups according to pronunciation type; the similarity between phonetic symbols in the same group is defined as 1 and the similarity between symbols in different groups as 0; the Chinese characters are aligned by position, the pronunciation similarity at each position is computed, and the average is taken as the resulting similarity. The drawback of this scheme is that defining similarity as 1 within a group and 0 across groups cannot accurately describe how similar two phonetic symbols are. It ignores differences in similarity within a group: for example, among the labials, the pair b [glass] and p [slope] and the pair b [glass] and m [touching] differ in how similar they sound. It also treats symbols from different groups as having zero similarity, although, for example, the labial b [glass] and the velar g [brother] do sound somewhat alike.
Summary of the invention
In view of this, embodiments of the present invention provide a text error correction method and device, a server, and a storage medium to solve at least one problem in the prior art. Transition probabilities between phonetic symbols are mined from similar-sounding text and used as the phonetic-symbol similarity, which can improve the probability of successful error correction.
The technical solutions of the embodiments of the present invention are implemented as follows:
In a first aspect, an embodiment of the present invention provides a text error correction method, the method comprising:
collecting a first corpus in the form of word-segment pairs;
annotating both word segments of each pair in the first corpus with pinyin;
determining the similarity between the pinyin of the two word segments of the pair, the similarity indicating how similar the pinyin of the first word segment is to the pinyin of the second word segment;
if the similarity meets a preset condition, determining that the two word segments of the pair are error-correction segments for each other, or that the first word segment is an error-correction segment for the second word segment.
In a second aspect, an embodiment of the present invention provides a text error correction device comprising a first forming unit, an annotation unit, a first determination unit, and a second determination unit, wherein:
the first forming unit is configured to collect a first corpus in the form of word-segment pairs;
the annotation unit is configured to annotate both word segments of each pair in the first corpus with pinyin;
the first determination unit is configured to determine the similarity between the pinyin of the two word segments of the pair, the similarity indicating how similar the pinyin of the first word segment is to the pinyin of the second word segment;
the second determination unit is configured to determine, if the similarity meets a preset condition, that the two word segments of the pair are error-correction segments for each other, or that the first word segment is an error-correction segment for the second word segment.
In a third aspect, an embodiment of the present invention provides a server comprising a processor and an external communication interface, the processor being configured to:
collect a first corpus in the form of word-segment pairs;
annotate both word segments of each pair in the first corpus with pinyin;
determine the similarity between the pinyin of the two word segments of the pair, the similarity indicating how similar the pinyin of the first word segment is to the pinyin of the second word segment;
if the similarity meets a preset condition, determine that the two word segments of the pair are error-correction segments for each other, or that the first word segment is an error-correction segment for the second word segment;
form an error-correction dictionary from the segment pairs whose similarity meets the preset condition;
send the error-correction dictionary to a terminal through the external communication interface.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium storing computer-executable instructions for executing the text error correction method provided in the first aspect.
Embodiments of the present invention provide a text error correction method and device, a server, and a storage medium, in which: a first corpus is collected in the form of word-segment pairs; both word segments of each pair in the first corpus are annotated with pinyin; the similarity between the pinyin of the two word segments of the pair is determined, the similarity indicating how similar the pinyin of the first word segment is to the pinyin of the second word segment; and, if the similarity meets a preset condition, the two word segments of the pair are determined to be error-correction segments for each other, or the first word segment is determined to be an error-correction segment for the second word segment. In this way, transition probabilities between phonetic symbols are mined from similar-sounding text and used as the phonetic-symbol similarity, which can improve the probability of successful error correction.
Brief description of the drawings
Fig. 1 is a first flow diagram of the text error correction method according to an embodiment of the present invention;
Fig. 2-1 is a second flow diagram of the text error correction method according to an embodiment of the present invention;
Fig. 2-2 is a schematic diagram of the relationship between the first computing device and the second computing device according to an embodiment of the present invention;
Fig. 2-3 is a third flow diagram of the text error correction method according to an embodiment of the present invention;
Fig. 3-1 is a fourth flow diagram of the text error correction method according to an embodiment of the present invention;
Fig. 3-2 is a flow diagram of step S301 in Fig. 3-1;
Fig. 3-3 is a flow diagram of step S302 in Fig. 3-1;
Fig. 3-4 is a flow diagram of step S324 in Fig. 3-3;
Fig. 4 is a first structural diagram of the text error correction device according to an embodiment of the present invention;
Fig. 5 is a second structural diagram of the text error correction device according to an embodiment of the present invention;
Fig. 6 is a structural diagram of the server according to an embodiment of the present invention.
Detailed description of embodiments
The technical solution of the present invention is further elaborated below with reference to the drawings and specific embodiments.
To solve the foregoing technical problem, an embodiment of the present invention provides a text error correction method for forming the error-correction segments that correspond to a word segment. In practice, the method can be implemented by a processor of a first computing device invoking program code, and the program code can be stored in a computer storage medium; the first computing device therefore includes at least a processor and a storage medium. The first computing device may be any type of electronic device with information-processing capability, for example a mobile phone, tablet computer, desktop computer, personal digital assistant, navigator, digital telephone, video telephone, or television set.
Fig. 1 is a first flow diagram of the text error correction method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S101: collect a first corpus in the form of word-segment pairs.
Step S101 is a corpus-collection step. In practice, the corpus can be collected from the following sources: a dictionary of Chinese near-homophones, dictionaries of confusable dialect pronunciations and standard pronunciations, annotated speech-recognition errors, and annotated errors from an online input method. The corpus is collected as word-segment pairs (phrase segment pairs), for example: "logging off"-"leg is slightly imperial", "coupons"-"cash equivalent volume", "comrades"-"bobbins", "comrades"-"notice door", "dried shrimp"-"villagers", "salted vegetables are too expensive"-"let's start the meeting", "sausage pickled melon"-"chief of township's speech", and "comrades"-"let's start the meeting". It should be noted that the first corpus is allowed to contain wrong pairs; for example, the pinyin of "comrades" and of "let's start the meeting" differ considerably, and the two would not normally be regarded as a pair with similar pinyin.
Other embodiments of the present invention also use a second corpus to build the initial-final similarity matrix. The second corpus may differ from the first: it can be regarded as a standard corpus and should not contain wrong pairs, whereas the first corpus may. The embodiment shown in Fig. 1 uses the first corpus to form the word-segment sets provided by this embodiment.
Step S102: annotate both word segments of each pair in the first corpus with pinyin.
Continuing the example from step S101, "salted vegetables are too expensive" is annotated "xian-cai-tai-gui", "let's start the meeting" is annotated "xian-zai-kai-hui", "sausage pickled melon" is annotated "xiang-chang-jiang-gua", and "chief of township's speech" is annotated "xiang-zhang-jiang-hua".
Step S103: determine the similarity between the pinyin of the two word segments of the pair, the similarity indicating how similar the pinyin of the first word segment is to the pinyin of the second word segment.
Continuing the example from step S101, this step determines, for instance, the pinyin similarity between "logging off" and "leg is slightly imperial", between "comrades" and "bobbins", and between "comrades" and "let's start the meeting".
Determining the similarity between the pinyin of the two word segments includes using a preset initial-final similarity matrix to determine that similarity. The process of determining the initial-final similarity matrix is described in other embodiments.
Step S104: judge whether the similarity meets a preset condition.
The preset condition may be a threshold. Judging whether the similarity meets the preset condition then includes judging whether the similarity is greater than the threshold: if the similarity is greater than the threshold, the similarity meets the preset condition; if the similarity is less than or equal to the threshold, it does not.
Step S105: if the similarity meets the preset condition, determine that the two word segments of the pair are error-correction segments for each other, or that the first word segment is an error-correction segment for the second word segment.
Continuing the example from step S103, step S105 may determine that the two word segments are error-correction segments for each other. For example, "salted vegetables are too expensive" is determined to be an error-correction segment for "let's start the meeting", and "let's start the meeting" an error-correction segment for "salted vegetables are too expensive"; likewise, "sausage pickled melon" is determined to be an error-correction segment for "chief of township's speech", and vice versa. Step S105 may also determine that the first word segment is an error-correction segment for the second. For example, "comrades" may be determined to be an error-correction segment for "bobbins", and "logging off" an error-correction segment for "leg is slightly imperial".
As can be seen, determining mutual error-correction segments is in effect a two-way correction mechanism, whereas determining that the first segment is an error-correction segment for the second is a one-way mechanism. In the two-way mechanism, both segments are everyday expressions used in work, life, or study, and only their contexts of use differ: "let's start the meeting" and "salted vegetables are too expensive" are mutual error-correction segments, the former generally used at work and the latter in daily life. In the one-way mechanism, the segment being corrected is generally regarded as a wrong word. For example, "logging off" is the error-correction segment for "leg is slightly imperial"; that is, "logging off" is used to correct "leg is slightly imperial", which is generally regarded as a wrong segment. Likewise, "comrades" is used to correct "bobbins" or "notice door", which are generally regarded as wrong segments. It should be noted that a one-way mechanism can become a two-way mechanism under certain conditions; in some situations, "bobbins" or "notice door" may also be regarded as a correct word.
In this embodiment, the error-correction segments determined in step S105 can form an error-correction dictionary; that is, the method further includes: forming an error-correction dictionary from the segment pairs whose similarity meets the preset condition, and sending the error-correction dictionary to a terminal. The dictionary contains several word-segment sets, and each set contains at least one error-correction segment, obtained from the computed pinyin similarity, for correcting the segment to be corrected. For example, the set corresponding to "comrades" contains "bobbins" and "notice door"; the set corresponding to "let's start the meeting" contains "salted vegetables are too expensive"; the set corresponding to "notice door" contains "bobbins" and "comrades"; and the set corresponding to "leg is slightly imperial" contains "logging off".
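As a rough illustration of steps S104 and S105 and of the dictionary described above, the following Python sketch keeps only the pairs whose similarity exceeds a threshold and records which segments may correct which. The function name `build_error_dictionary`, the `mutual` flag, the threshold, and all similarity scores are assumptions made for this example; none of them come from the patent text.

```python
from collections import defaultdict

def build_error_dictionary(scored_pairs, threshold):
    """Map each segment to the set of segments that may correct it.

    scored_pairs: iterable of (first, second, similarity, mutual), where
    `first` is a candidate correction for `second` and `mutual` marks
    two-way pairs. This record layout is hypothetical.
    """
    dictionary = defaultdict(set)
    for first, second, similarity, mutual in scored_pairs:
        if similarity <= threshold:      # step S104: preset condition not met
            continue
        dictionary[second].add(first)    # one-way: first corrects second
        if mutual:                       # two-way: each corrects the other
            dictionary[first].add(second)
    return dict(dictionary)

# Scores below are invented purely for illustration.
pairs = [
    ("logging off", "leg is slightly imperial", 0.75, False),
    ("let's start the meeting", "salted vegetables are too expensive", 0.80, True),
    ("comrades", "let's start the meeting", 0.05, False),
]
error_dictionary = build_error_dictionary(pairs, threshold=0.6)
```

The low-scoring pair "comrades"-"let's start the meeting" is dropped, which mirrors the remark that the first corpus may contain wrong pairs that the similarity check filters out.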
In the above embodiment, step S102 annotates the two word segments of the pair with pinyin. When annotating pinyin, step S102 includes the following steps:
Step S121: judge whether the segment pair contains Arabic numerals.
Step S122: if the segment pair contains Arabic numerals, convert the Arabic numerals to the corresponding Chinese characters.
For example, if a segment is "speed 8" or "Mo Tai 168", converting the Arabic numerals to Chinese characters gives "speed eight" or "Mo Tai one six eight".
Step S123: annotate the word segments of the pair, after conversion to Chinese characters, with pinyin.
Continuing the example from step S122, for the segments "speed 8" and "Mo Tai 168", the pinyin of "speed 8" is "su-ba" and the pinyin of "Mo Tai 168" is "mo-tai-yi-liu-ba" or "mo-tai-yao-liu-ba". The character "one" in "Mo Tai one six eight" is a polyphone and can be annotated as "yi" or "yao".
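The digit conversion in step S122 can be sketched in a few lines of Python. This is only an illustration: the function name `digits_to_hanzi` is hypothetical, the conversion is digit by digit as in the "168" to "one six eight" example, and the sample string is an assumed Chinese rendering of "Mo Tai 168".

```python
# Digit-by-digit mapping from Arabic numerals to Chinese numeral characters.
DIGIT_TO_HANZI = str.maketrans("0123456789", "零一二三四五六七八九")

def digits_to_hanzi(segment: str) -> str:
    """Replace every Arabic digit in the segment with its Chinese character."""
    return segment.translate(DIGIT_TO_HANZI)

# Assumed rendering of the "Mo Tai 168" example.
converted = digits_to_hanzi("莫泰168")
print(converted)  # 莫泰一六八
```

Choosing between the readings "yi" and "yao" for the character "one" is left to the pinyin annotation step that follows.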
Step S124: judge whether the segment pair contains a polyphone (a character with more than one pronunciation).
Step S125: if the pair contains no polyphone, annotate both word segments of the pair with pinyin.
For example, the pair "leg is slightly imperial"-"logging off" contains no polyphone, so both segments are annotated directly: the pinyin of "leg is slightly imperial" is "tui-cu-long" and the pinyin of "logging off" is "tui-chu-xi-tong".
Step S126: if the pair contains a polyphone, further judge whether either of the two word segments contains a polyphonic word.
For example, consider the pair "PianYiFang"-"variation side" or "PianYiFang"-"derogatory sense side". The first character of "PianYiFang" is a polyphone with the readings "pián" (second tone) and "biàn" (fourth tone). The method therefore goes on to judge whether either segment contains a polyphonic word; here the word "cheap" in "PianYiFang" is a polyphonic word whose pinyin includes "bian-yi" and "pian-yi". Because the corpus is generally collected as segment pairs, this embodiment improves efficiency by judging directly whether the pair contains a polyphonic word. For example, the character "single" is a polyphone: it is read "shàn" (fourth tone) as a surname, "chán" in the title of the ancient Xiongnu rulers, and "dān" (first tone) in ordinary words such as "lonely". When collecting the corpus, if the pair is "loneliness"-"fighting single-handed", then although "single" is a polyphone, "loneliness" is not a polyphonic word, so its pinyin is unique and there is no need to annotate "single" with all three readings "chán", "shàn", and "dān". The method provided by this embodiment can therefore significantly improve computational efficiency. If a polyphone in the pair occurs as a single character rather than inside a phrase, every reading of that character must be annotated; since, as noted above, the corpus is collected as segment pairs, this case is very rare.
Step S127: if at least one of the two word segments of the pair contains a polyphonic word, annotate the two or more pinyin readings of the polyphonic word as part or all of the pinyin of the corresponding segment.
Continuing the example above, "PianYiFang" is annotated with both "bian-yi-fang" and "pian-yi-fang".
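One possible way to enumerate the candidate readings described in steps S126 and S127 is sketched below in Python. It looks words up before single characters, so that a word with a unique reading never expands into the readings of its polyphonic characters. The table `READINGS`, the function `candidate_pinyins`, and the Chinese strings (assumed renderings of "PianYiFang", "loneliness", and "single") are illustrative assumptions, and tones are omitted.

```python
from itertools import product

# Hypothetical toy table: words and single characters with their readings.
READINGS = {
    "便宜": ["bian-yi", "pian-yi"],   # polyphonic word ("cheap")
    "坊": ["fang"],
    "孤单": ["gu-dan"],               # word with a unique reading
    "单": ["dan", "shan", "chan"],    # polyphone as a lone character
}

def candidate_pinyins(segment):
    """Return every candidate pinyin string for the segment."""
    options, i = [], 0
    while i < len(segment):
        # Try the longest table entry starting at position i first.
        for length in range(len(segment) - i, 0, -1):
            readings = READINGS.get(segment[i:i + length])
            if readings:
                options.append(readings)
                i += length
                break
        else:
            raise KeyError(f"no reading for {segment[i]!r}")
    return ["-".join(parts) for parts in product(*options)]

print(candidate_pinyins("便宜坊"))  # ['bian-yi-fang', 'pian-yi-fang']
```

With this lookup order, "孤单" yields the single reading "gu-dan", while the lone character "单" yields all three readings, which matches the efficiency argument made above.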
As can be seen from the above, step S102 is in effect a process of converting segment pairs to pinyin. In practice, this step can be implemented by looking up a table of Chinese characters and pinyin, in which every character of the Chinese dictionary has its corresponding pinyin. The processing steps are as follows:
1) For a character that is not a polyphone, look up the table directly and convert it to pinyin.
2) For a polyphone, look up the words it forms with the neighbouring characters. If such a word exists and has a unique pronunciation, convert it to pinyin. If the word exists but still has several pronunciations, use a language model to determine the pronunciation (for example, "cheap": pian-yi, bian-yi).
3) If no such word exists, use the default pronunciation (every polyphone has a default pronunciation in the table).
4) For Arabic numerals, convert them to the corresponding Chinese characters and then look up the table.
5) For English characters, which cannot be annotated with pinyin, do nothing.
6) For a Chinese character that is not in the table, skip the character and leave the pinyin at that position empty.
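The lookup steps listed above can be sketched as follows. This Python example is a simplified illustration under several assumptions: the tables `CHAR_TABLE` and `WORD_TABLE` are tiny hypothetical stand-ins for a real character-pinyin table, only two-character words are checked, the first listed reading is treated as the default, and the language-model fallback and numeral conversion are left out. The Chinese string is an assumed rendering of "let's start the meeting".

```python
# Hypothetical toy tables; a real system would load a full pinyin table.
CHAR_TABLE = {
    "现": ["xian"], "在": ["zai"], "开": ["kai"],
    "会": ["hui", "kuai"],            # polyphone, default reading listed first
}
WORD_TABLE = {"开会": ("kai", "hui")}  # words with a unique pronunciation

def polyphone_reading(text, i):
    """Resolve a polyphone via a neighbouring two-character word, else default."""
    for start in (i - 1, i):
        word = text[start:start + 2]
        if start >= 0 and word in WORD_TABLE:
            return WORD_TABLE[word][i - start]
    return CHAR_TABLE[text[i]][0]      # default pronunciation

def annotate(text):
    """Return one pinyin syllable per character ('' when none is available)."""
    result = []
    for i, char in enumerate(text):
        readings = CHAR_TABLE.get(char)
        if readings is None:           # English letter or character not in table
            result.append("")
        elif len(readings) == 1:       # not a polyphone: direct lookup
            result.append(readings[0])
        else:                          # polyphone: check neighbouring words
            result.append(polyphone_reading(text, i))
    return result

print(annotate("现在开会"))  # ['xian', 'zai', 'kai', 'hui']
```

Returning an empty string for unknown characters keeps the output aligned with the input positions, which is convenient for the alignment step that follows.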
It should be noted that, among steps S121 to S127, there is no strict execution order between steps S121 to S123 and steps S124 to S127. In practice, steps S121 to S123 may be executed first and then steps S124 to S127, or steps S124 to S127 may be executed first and then steps S121 to S123.
In other embodiments of the present invention, step S103, which determines the similarity between the pinyin of the two word segments of the pair, includes:
Step S131: align the initials of the pinyin of the two word segments of the pair, and align the finals of the pinyin of the two word segments.
In other embodiments of the present invention, the initials and the finals of the two word segments are aligned so as to maximize the number of identical pronunciations. For example, the pinyin of "logging off"-"leg is slightly imperial" is aligned as follows:
"logging off" --- t-ui-ch-u-x-i-t-ong;
"leg is slightly imperial" --- t-ui-c-u-_-_-l-ong;
Here "_" marks a default (empty) position. To obtain the most identical pronunciations, the final "ong" of "dragon" (long) is aligned with the final "ong" of "system" (tong), rather than aligning the pinyin of "dragon" with that of "being" (xi). In this example, "logging off" has four characters and "leg is slightly imperial" has three. The alignment first proceeds in order: the pinyin of "leg" corresponds to that of "moving back", "thick" to "out", and "dragon" to "being", leaving the pinyin of "system" as default. The first pair ("leg" and "moving back") and the second pair ("thick" and "out") are highly similar, but the third pair ("dragon" and "being") is not. The embodiment therefore shifts the alignment: in the third position the counterpart of "being" becomes default, and in the fourth position the pinyin of "dragon" corresponds to that of "system". After the shift, the first, second, and fourth positions all have relatively high similarity.
In the related art, pronunciation-based text similarity aligns the two texts strictly by character position. Its drawback is that when one text has an extra or missing character, every later position is misaligned. By aligning for the most identical pronunciations, the embodiment of the present invention keeps the two texts aligned even when one of them has extra or missing characters.
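The text does not specify the algorithm used to find the alignment with the most identical pronunciations. One plausible realization, shown here only as a sketch, is a global-alignment dynamic programme that maximizes the number of identical initials and finals and allows gaps at no cost. The names `shared_symbols` and `align` are hypothetical, and each syllable is represented as an (initial, final) tuple.

```python
def shared_symbols(a, b):
    """Number of identical components (initial, final) between two syllables."""
    return int(a[0] == b[0]) + int(a[1] == b[1])

def align(first, second):
    """Align two syllable lists, maximizing identical initials and finals.

    Returns a list of (syllable_or_None, syllable_or_None); None is a gap.
    """
    n, m = len(first), len(second)
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + shared_symbols(first[i - 1], second[j - 1]),
                best[i - 1][j],      # gap in `second`
                best[i][j - 1],      # gap in `first`
            )
    pairs, i, j = [], n, m
    while i > 0 or j > 0:            # trace back one optimal path
        if (i > 0 and j > 0 and best[i][j]
                == best[i - 1][j - 1] + shared_symbols(first[i - 1], second[j - 1])):
            pairs.append((first[i - 1], second[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and best[i][j] == best[i - 1][j]:
            pairs.append((first[i - 1], None))
            i -= 1
        else:
            pairs.append((None, second[j - 1]))
            j -= 1
    return pairs[::-1]

logging_off = [("t", "ui"), ("ch", "u"), ("x", "i"), ("t", "ong")]
leg = [("t", "ui"), ("c", "u"), ("l", "ong")]
alignment = align(logging_off, leg)
```

For the example above, this sketch leaves "xi" unpaired and pairs "tong" with "long", reproducing the shifted alignment described in the text.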
Step S132: compute the transition probability with which the pinyin of the first word segment of the pair is converted to the pinyin of the second word segment.
Step S133: determine the similarity between the pinyin of the two word segments of the pair according to the transition probability.
In other embodiments of the present invention, two ways of implementing step S132 are provided below.
Mode one: the first way is relatively simple. Determine the number of differing phonetic symbols between the first word segment and the second word segment, then determine the transition probability from that number and the length of the phonetic-symbol string of the first or second word segment. The length of the phonetic-symbol string may be twice the number of characters in the first segment, twice the number of characters in the second segment, or twice the sum of the two; because the pinyin of one Chinese character consists of an initial and a final, the string length is twice the number of characters.
Take the pair "logging off"-"leg is slightly imperial" as an example. Suppose the segment being corrected (the first segment) is "leg is slightly imperial" and the error-correction segment (the second segment) is "logging off". The number of differing phonetic symbols between them is 4, namely "ch", "x", "i", and "t" (phonetic symbols include both initials and finals). The transition probability may then be computed as 4 ÷ 6 (6 is the string length of the first segment), 4 ÷ 8 (8 is the string length of the second segment), or 4 ÷ (6+8) (14 is the sum of the two string lengths). Conversely, suppose the segment being corrected (the first segment) is "logging off" and the error-correction segment (the second segment) is "leg is slightly imperial". The number of differing phonetic symbols is 2, namely "c" and "l", and the transition probability may be computed as 2 ÷ 8 (8 is the string length of the first segment), 2 ÷ 6 (6 is the string length of the second segment), or 2 ÷ (8+6) (14 is the sum of the two string lengths).
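The arithmetic of mode one can be checked with a short Python sketch. It assumes that the two segments have already been aligned, that each syllable is an (initial, final) tuple with None marking a gap, and that a symbol of the second segment counts as differing when it does not equal the aligned symbol of the first segment. This counting rule is inferred from the worked example, and the function names are hypothetical.

```python
def count_different(aligned):
    """Count symbols of the second segment that differ from the first."""
    different = 0
    for first, second in aligned:
        if second is None:               # nothing to convert to at this position
            continue
        for k in range(2):               # 0 = initial, 1 = final
            if first is None or first[k] != second[k]:
                different += 1
    return different

def transition_probability(aligned, denominator="first"):
    """Differing symbols divided by a phonetic-symbol string length."""
    n_first = sum(1 for first, _ in aligned if first is not None)
    n_second = sum(1 for _, second in aligned if second is not None)
    lengths = {
        "first": 2 * n_first,            # two symbols per character
        "second": 2 * n_second,
        "sum": 2 * (n_first + n_second),
    }
    return count_different(aligned) / lengths[denominator]

# First segment "leg is slightly imperial", second segment "logging off".
aligned = [
    (("t", "ui"), ("t", "ui")),
    (("c", "u"), ("ch", "u")),
    (None, ("x", "i")),
    (("l", "ong"), ("t", "ong")),
]
print(count_different(aligned))                    # 4
print(transition_probability(aligned, "second"))   # 0.5
```

Swapping the two columns of `aligned` gives 2 differing symbols ("c" and "l"), as in the reverse direction of the example.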
It should be noted that the transition probability computed above is inversely related to the similarity: the smaller the transition probability, the greater the similarity, and the greater the transition probability, the smaller the similarity. The transition probability lies in [0, 1], that is, it is greater than or equal to 0 and less than or equal to 1. A transition probability of 0 indicates that the phonetic symbols of the first and second word segments are identical, as for "comrades"-"notice door"; a transition probability of 1 indicates that the phonetic symbols of the first and second word segments are entirely different, as for "comrades"-"let's start the meeting". To obtain a more intuitive correspondence, the similarity can be computed with the following relation: similarity = 1 - transition probability. The similarity computed in this way also lies in [0, 1]: a similarity of 0 indicates that the pinyin of the two word segments of the pair is entirely different, and a similarity of 1 indicates that the pinyin of the two word segments is identical.
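The relation between transition probability and similarity, together with the threshold test of step S104, can be written as a minimal Python sketch. The function names and the threshold value used in the example are assumptions for illustration only.

```python
def similarity_from_transition(probability: float) -> float:
    """similarity = 1 - transition probability, both within [0, 1]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("transition probability must lie in [0, 1]")
    return 1.0 - probability

def meets_preset_condition(similarity: float, threshold: float) -> bool:
    """Step S104: the condition holds only if similarity exceeds the threshold."""
    return similarity > threshold

# Mode-one value 4 / 14 from the example; the threshold 0.6 is hypothetical.
similarity = similarity_from_transition(4 / 14)
accepted = meets_preset_condition(similarity, threshold=0.6)
```

A similarity exactly equal to the threshold does not satisfy the condition, consistent with the "less than or equal to" case in step S104.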
Mode two: the second way uses a preset initial-final similarity matrix to calculate the transition probability of converting the pinyin of the first participle of the participle pair into the pinyin of the second participle. In step S132, calculating the transition probability of converting the pinyin of the first participle of the participle pair into the pinyin of the second participle comprises:
Step S1321, if the characters at the same position of the two aligned participles are homophones, adding 1 to the score Score, and adding 1 to both the position of the first participle and the position of the second participle of the participle pair;
Step S1322, if the characters at the same position of the two aligned participles are not homophones, determining the score Score of the pinyin of the first participle and the pinyin of the second participle according to the preset initial-final similarity matrix;
Step S1323, determining a normalized final score Sf according to the score Score, the number of characters of the first participle and the number of characters of the second participle;
Step S1324, determining, according to the final score Sf, the transition probability of converting the pinyin of the first participle of the participle pair into the pinyin of the second participle.
In other embodiments of the invention, determining the score Score of the pinyin of the first participle and the pinyin of the second participle according to the preset initial-final similarity matrix comprises:
Step S13221, obtaining, from the preset initial-final similarity matrix, the similarity between the initials and the similarity between the finals of the characters at the same position of the two participles;
Step S13222, if the product S of the similarity between the initials and the similarity between the finals is greater than a first preset value, adding S to the score Score, and adding 1 to both the position of the first participle and the position of the second participle of the participle pair;
Step S13223, if the product S of the similarity between the initials and the similarity between the finals of the characters at the same position of the two participles is less than or equal to the first preset value, obtaining from the initial-final similarity matrix, for the character at the current position of the first participle and the character at the next position of the second participle, the similarity between the initials, the similarity between the finals, the similarity between the initial and the final, and the similarity between the final and the initial;
Here, the first preset value and the second and third preset values mentioned below may be empirical values. The three preset values may be identical, for example all equal to 0.8, and may of course also differ.
Step S13224, determining a first maximum value, the first maximum value being the maximum of the following three: the product S of the similarity between the initials and the similarity between the finals of the character at the current position of the first participle and the character at the next position of the second participle; the similarity between the initial of the character at the current position of the first participle and the final of the character at the next position of the second participle; and the similarity between the final of the character at the current position of the first participle and the initial of the character at the next position of the second participle;
Step S13225, judging whether the first maximum value is greater than the second preset value; if so, the score Score is the previously obtained score plus the first maximum value, and 1 is added to both the position of the first participle and the position of the second participle of the participle pair;
Step S13226, if the first maximum value is less than or equal to the second preset value, obtaining from the initial-final similarity matrix, for the character at the next position of the first participle and the character at the current position of the second participle, the similarity between the initials, the similarity between the finals, the similarity between the initial and the final, and the similarity between the final and the initial;
Step S13227, determining a second maximum value, the second maximum value being the maximum of the following three: the product S of the similarity between the initials and the similarity between the finals of the character at the next position of the first participle and the character at the current position of the second participle; the similarity between the initial of the character at the next position of the first participle and the final of the character at the current position of the second participle; and the similarity between the final of the character at the next position of the first participle and the initial of the character at the current position of the second participle;
Step S13228, judging whether the second maximum value is greater than the third preset value; if so, the score Score is the previously obtained score plus the second maximum value, and 1 is added to both the position of the first participle and the position of the second participle of the participle pair; then judging whether the characters at the updated positions of the first participle and the second participle are homophones. The pinyin of the first participle and the pinyin of the second participle are traversed through the above steps and the score Score is calculated.
The embodiments of the present invention also provide a method for determining the initial-final similarity matrix. Determining the initial-final similarity matrix comprises:
Step S140, collecting a second corpus in the form of participle pairs, and annotating both participles of each participle pair in the second corpus with pinyin;
Here, the second corpus is used to form the initial-final similarity matrix. The second corpus may differ from the aforementioned first corpus: the second corpus can be regarded as a standard corpus, that is, it should not contain erroneous participle pairs, whereas the first corpus may contain erroneous participle pairs. The first corpus is used, through the embodiment shown in Fig. 1, to form the participle set provided by the present invention.
Here, the pinyin annotation can use the aforementioned annotation method, for example the method based on the greatest number of identical pronunciations.
Step S141, determining the total number of times a first phonetic symbol is mispronounced in the second corpus, the first phonetic symbol being an initial or a final;
Here, the second corpus can be a standard corpus, so the richness of the corpus determines the accuracy of error correction to a certain extent. In this embodiment, in addition to the aforementioned dictionaries, the corpus is also collected from session logs between users, that is, the corpus is mined from online session logs. The main purpose of this step is to take the application domain into account and to mine, from the users' history logs, error correction candidates that meet the application's goals. Mining session logs likewise uses the pronunciation similarity of texts as the similarity measure. In general there are two main methods: a) user sessions (for example customer service sessions): mining the cases in which the user actively repairs a pronunciation error, that is, mining from the session context the repeated inputs whose pronunciations are similar; b) manually customized key phrases of the domain, for mining error-prone candidates: key phrases of the domain are customized manually according to the business goals, and results whose pronunciation is similar to the customized phrases are mined from a large number of logs.
Step S142, determining the number of times the first phonetic symbol is mispronounced as a second phonetic symbol;
Step S143, determining the probability that the first phonetic symbol transfers to the second phonetic symbol according to the total number of times the first phonetic symbol is mispronounced and the number of times the first phonetic symbol is mispronounced as the second phonetic symbol;
Step S144, determining the total number of times the second phonetic symbol is mispronounced in the second corpus, the second phonetic symbol being an initial or a final;
Step S145, determining the number of times the second phonetic symbol is mispronounced as the first phonetic symbol;
Step S146, determining the probability that the second phonetic symbol transfers to the first phonetic symbol according to the total number of times the second phonetic symbol is mispronounced and the number of times the second phonetic symbol is mispronounced as the first phonetic symbol;
Step S147, determining the similarity between the first phonetic symbol and the second phonetic symbol according to the probability that the first phonetic symbol transfers to the second phonetic symbol and the probability that the second phonetic symbol transfers to the first phonetic symbol.
Here, "let's start the meeting"-"salted vegetables are too expensive" is taken as an example. The participle pair is first annotated with pinyin as follows:
Let's start the meeting: x ian z ai t ai h ui;
Salted vegetables are too expensive: x ian c ai t ai g ui;
After alignment, the inconsistent phonetic symbols are z and c, and h and g.
The probability that the first phonetic symbol z transfers to the second phonetic symbol c, that is, the transition probability p(c|z), is now calculated. All the acquired participle pairs with confused pronunciations are aligned, the number of occurrences of each pair of inconsistently pronounced phonetic symbols is counted, and the transition probability p(c|z) is calculated:
p(c|z) = count(z->c) / count(z) (1);
In formula (1), p(c|z) is the transition probability that phonetic symbol z is mispronounced as phonetic symbol c; count(z->c) is the number of times phonetic symbol z is mispronounced as c in the second corpus; count(z) is the total number of times phonetic symbol z is mispronounced in the second corpus;
The probability p(z|c) that the second phonetic symbol c is mispronounced as the first phonetic symbol z is calculated in the same way.
Then, according to the probability p(c|z) that the first phonetic symbol z transfers to the second phonetic symbol c and the probability p(z|c) that the second phonetic symbol c is mispronounced as the first phonetic symbol z, the pronunciation similarity Sim(c,z) between phonetic symbol z and phonetic symbol c is determined. In implementation it can be obtained with formula (2):
Sim(c,z) = (p(c|z) + p(z|c)) / 2 (2).
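Formulas (1) and (2) can be illustrated with a short sketch. It assumes the mispronunciations observed in the aligned corpus are available as a list of (intended symbol, pronounced symbol) pairs; the sample counts below are invented for illustration and the function name is hypothetical.

```python
from collections import Counter


def symbol_similarity(confusions, a, b):
    """Sim(a, b) = (p(b|a) + p(a|b)) / 2, with p(y|x) = count(x->y) / count(x)."""
    confusions = list(confusions)
    pair_counts = Counter(confusions)                  # count(x -> y)
    total_errors = Counter(x for x, _ in confusions)   # count(x): all mispronunciations of x

    def p(y, x):
        if total_errors[x] == 0:
            return 0.0                                 # x was never mispronounced in the corpus
        return pair_counts[(x, y)] / total_errors[x]

    return (p(b, a) + p(a, b)) / 2


# Invented sample: z is mispronounced 3 times (twice as c), c twice (once as z).
sample = [("z", "c"), ("z", "c"), ("z", "zh"), ("c", "z"), ("c", "ch")]
sim_zc = symbol_similarity(sample, "z", "c")           # (2/3 + 1/2) / 2
```

Averaging the two directions makes the measure symmetric, so Sim(c,z) and Sim(z,c) give the same value even when p(c|z) and p(z|c) differ.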
It should be noted that phonetic symbols include initials and finals, so the initial-final similarity matrix actually comprises at least three matrices: the similarity matrix between initials and initials, the similarity matrix between initials and finals, and the similarity matrix between finals and finals. Assuming there are 21 initials, the similarity matrix between initials and initials is a 21 × 21 square matrix; assuming there are 39 finals, the similarity matrix between finals and finals is a 39 × 39 square matrix, and the similarity matrix between initials and finals is a 21 × 39 matrix. In later embodiments, if the transition probability between two initials needs to be determined, the similarity matrix between initials and initials can be queried directly; if the transition probability between two finals needs to be determined, the similarity matrix between finals and finals can be queried directly; if the transition probability between an initial and a final needs to be determined, the similarity matrix between initials and finals can be queried directly.
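One possible in-memory layout for the three matrices is sketched below. The class and method names are hypothetical, the symbol lists are truncated to a few entries, and the stored values are illustrative only; a full table would hold all initials and finals (21 and 39 in the example above).

```python
def square(symbols):
    """Square matrix with 1.0 on the diagonal (identical symbols) and 0.0 elsewhere."""
    return {a: {b: 1.0 if a == b else 0.0 for b in symbols} for a in symbols}


def rect(rows, cols):
    """Rectangular initial-by-final matrix, initialised to 0.0."""
    return {r: {c: 0.0 for c in cols} for r in rows}


class InitialFinalSimilarity:
    def __init__(self, initials, finals):
        self.initials, self.finals = set(initials), set(finals)
        self.ss = square(initials)          # initial-initial matrix
        self.yy = square(finals)            # final-final matrix
        self.sy = rect(initials, finals)    # initial-final matrix

    def _cell(self, a, b):
        # Pick the matrix that matches the kinds of the two symbols.
        if a in self.initials and b in self.initials:
            return self.ss, a, b
        if a in self.finals and b in self.finals:
            return self.yy, a, b
        if a in self.initials and b in self.finals:
            return self.sy, a, b
        if a in self.finals and b in self.initials:
            return self.sy, b, a
        raise KeyError((a, b))

    def set(self, a, b, value):
        table, r, c = self._cell(a, b)
        table[r][c] = value
        if table is not self.sy:            # keep the square matrices symmetric
            table[c][r] = value

    def lookup(self, a, b):
        table, r, c = self._cell(a, b)
        return table[r][c]


m = InitialFinalSimilarity(["z", "c", "h", "g"], ["ai", "ui", "ian"])
m.set("z", "c", 0.7)    # illustrative value, not taken from the text
m.set("ai", "z", 0.1)   # stored in the initial-final matrix as sy["z"]["ai"]
```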
Based on the foregoing embodiments, the embodiments of the present invention further provide a text error correction method. In implementation, the method can be carried out by a processor of a second computing device calling program code, and the program code can of course be stored in a computer storage medium. It can be seen that the second computing device comprises at least a processor and a storage medium. The second computing device may be any of various types of electronic device with information processing capability, for example a mobile phone, tablet computer, desktop computer, personal digital assistant, navigator, digital telephone, video telephone, television set, etc.
Fig. 2-1 is a second schematic flowchart of the text error correction method according to the embodiments of the present invention. As shown in Fig. 2-1, the method comprises:
Step S201, determining a participle to be corrected, the participle to be corrected being a participle in a sentence input by the user;
Here, in implementation, the text input by the user is often a whole sentence or several consecutive participles, so the text input by the user needs to be split, and it can be split in the form of participles. For example, the user may input "bobbins, let's start the meeting for we"; when splitting, function words such as modal particles and auxiliary words can be removed and the text is split into participles, giving "bobbins-we-let's start the meeting". After splitting, the participles to be corrected are determined to be "bobbins", "we" and "let's start the meeting".
Step S202, judging whether there is a participle set corresponding to the participle to be corrected, the participle set comprising at least one error correction participle, obtained by calculation based on pinyin similarity, for correcting the participle to be corrected;
Here, taking "bobbins" as an example, it is judged whether "bobbins" has a corresponding participle set; as described above, the participle set of "bobbins" comprises "comrades" and "notice door". As another example, taking "let's start the meeting", it is judged whether "let's start the meeting" has a corresponding participle set; as described above, the participle set of "let's start the meeting" comprises "salted vegetables are too expensive".
Step S203, determining a first language model score, the first language model score being the language model score of the participle to be corrected in the sentence;
Here, continuing the example in step S202, the language model score of "bobbins" and the language model score of "let's start the meeting" are determined.
Step S204, determining second language model scores, the second language model scores being the language model scores, in the sentence, of each error correction participle in the participle set of the participle to be corrected;
Here, continuing the example in step S202, the language model scores of "comrades" and "notice door" are determined, and the language model score of "salted vegetables are too expensive" is determined.
Step S205, judging whether any of the second language model scores is greater than the first language model score, to obtain a judgment result;
Here, continuing the example in the above steps, "let's start the meeting" has only one corresponding second language model score, namely the language model score of "salted vegetables are too expensive", whereas "bobbins" has two corresponding second language model scores, namely the language model score of "comrades" and the language model score of "notice door". The judgment here therefore consists of judging whether the language model score of "salted vegetables are too expensive" is greater than that of "let's start the meeting", whether the language model score of "comrades" is greater than that of "bobbins", and whether the language model score of "notice door" is greater than that of "bobbins". Assuming that the language model score of "comrades" is higher than those of "bobbins" and "notice door", and that the language model score of "let's start the meeting" is higher than that of "salted vegetables are too expensive", then for "let's start the meeting" the judgment result is that no second language model score is greater than the first language model score, and for "bobbins" the judgment result is that there is a second language model score greater than the first language model score.
Step S206, performing error correction on the participle to be corrected according to the judgment result.
Here, step S206, performing error correction on the participle to be corrected according to the judgment result, comprises:
Step S2061, if any of the second language model scores is greater than the first language model score, determining the error correction participle with the highest language model score as the error correction word of the participle to be corrected. Here, step S206 further comprises: replacing the first participle with the error correction word of the first participle, and outputting the result.
Step S2062, if none of the second language model scores is greater than the first language model score, not performing error correction on the participle to be corrected.
In this example, assuming that the language model score of "comrades" is higher than those of "bobbins" and "notice door", "bobbins" is corrected to "comrades"; assuming that the language model score of "let's start the meeting" is higher than that of "salted vegetables are too expensive", no error correction is performed on "let's start the meeting".
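The decision made in steps S205 and S206 can be sketched as follows. The `lm_score` callable stands in for whatever language model scores a participle within the sentence; it, the function name and the numeric scores are assumptions made for illustration.

```python
def choose_correction(original, candidates, lm_score):
    """Return the best-scoring candidate only if it beats the original participle."""
    best = max(candidates, key=lm_score, default=None)
    if best is not None and lm_score(best) > lm_score(original):
        return best          # S2061: replace with the highest-scoring error correction participle
    return original          # S2062: no candidate scores higher, so leave the input unchanged


# Invented scores that mirror the assumptions of the worked example.
scores = {
    "bobbins": 0.2, "comrades": 0.9, "notice door": 0.4,
    "let's start the meeting": 0.8, "salted vegetables are too expensive": 0.3,
}
fixed_1 = choose_correction("bobbins", ["comrades", "notice door"], scores.get)
fixed_2 = choose_correction("let's start the meeting",
                            ["salted vegetables are too expensive"], scores.get)
```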
It should be noted that, in implementation, step S202 can judge whether there is a participle set corresponding to the participle to be corrected by querying preset association information. In implementation, the association information can be realized as a list, an association relationship, etc., and is used to show the correspondence between a participle to be corrected and its participle set. The association information may be preset (obtained from the first computing device); it may of course also be issued by the first computing device to the second computing device, or requested by the second computing device from the first computing device. In other words, referring to Fig. 2-2, the first computing device 10 that implements the technical solution shown in Fig. 1 can be regarded as a server for the second computing devices 21 and 22 that implement the technical solution shown in Fig. 2-1, and the second computing devices can be regarded as terminals of the first computing device. The first computing device 10 can also update the association information on the users' second computing devices 21 and 22 regularly or irregularly.
In other embodiments of the invention, referring to Fig. 2-3, on the basis of the method shown in Fig. 1, the method further comprises:
Step S230, the terminal sends an error correction request to the server, the error correction request carrying the sentence input by the user;
Here, a client is installed on the terminal side, and the client can take the form of an application program (App, Application). The user inputs a sentence (or text) at the terminal; the client detects the sentence input by the user, carries the sentence in an error correction request, and sends the error correction request to the server.
Step S231, the server receives the error correction request sent by the terminal;
Step S232, the server determines a participle to be corrected, the participle to be corrected being a participle in the sentence input by the user;
Here, the text input by the user is in general often a whole sentence or several consecutive participles, so the text input by the user needs to be split, and it can be split in the form of participles. For example, the user may input "bobbins, let's start the meeting for we"; when splitting, function words such as modal particles and auxiliary words can be removed and the text is split into participles, giving "bobbins-we-let's start the meeting". After splitting, the participles to be corrected are determined to be "bobbins", "we" and "let's start the meeting".
Step S233, the server judges whether a participle set corresponding to the participle to be corrected exists in the error correction dictionary;
Step S234, if a participle set corresponding to the participle to be corrected exists, the server determines a first language model score and second language model scores, the first language model score being the language model score of the participle to be corrected in the sentence, the participle set comprising at least one error correction participle for correcting the participle to be corrected, and the second language model scores being the language model scores, in the sentence, of each error correction participle in the participle set of the participle to be corrected;
Step S235, judging whether any of the second language model scores is greater than the first language model score;
Step S236, if any of the second language model scores is greater than the first language model score, the server determines the error correction participle with the highest language model score as the error correction word of the participle to be corrected;
Here, the above steps S232 to S236 are similar to steps S201 to S206 in the embodiment shown in Fig. 2-1, and those skilled in the art can understand steps S232 to S236 with reference to the embodiment shown in Fig. 2-1.
Step S237, the server carries the error correction word in a first error correction response and sends the first error correction response to the terminal.
Step S238, if none of the second language model scores is greater than the first language model score, or if no participle set corresponding to the participle to be corrected exists, the server sends a second error correction response, the second error correction response being used to show that no error correction is performed on the participle to be corrected.
Step S239, the terminal receives the error correction response sent by the server; when the received error correction response is determined to be the first error correction response, the terminal corrects the sentence input by the user according to the first error correction response; when the received error correction response is determined to be the second error correction response, the terminal does not correct the sentence input by the user.
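The terminal-side handling in step S239 can be sketched as below. The text does not specify a message format, so the `ErrorCorrectionResponse` type and its field names are purely hypothetical; only the distinction between the first and second error correction responses comes from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorCorrectionResponse:
    kind: str                          # "first": carries an error correction word; "second": no correction
    target: Optional[str] = None       # participle to be corrected (hypothetical field)
    correction: Optional[str] = None   # error correction word (hypothetical field)


def apply_response(sentence: str, response: ErrorCorrectionResponse) -> str:
    """Correct the user's sentence for a first response, leave it unchanged for a second."""
    if response.kind == "first" and response.target and response.correction:
        return sentence.replace(response.target, response.correction)
    return sentence


user_input = "bobbins, let's start the meeting"
corrected = apply_response(user_input, ErrorCorrectionResponse("first", "bobbins", "comrades"))
unchanged = apply_response(user_input, ErrorCorrectionResponse("second"))
```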
In the embodiment shown in Fig. 2-1, language model scoring is completed on the terminal side, whereas in this embodiment the server completes the language model scoring at the request of the terminal. It can be seen that when the error correction method places a relatively low load on the terminal's hardware, the method shown in Fig. 2-1 can be used; text error correction can then be completed without a network connection, that is, the method can be carried out offline. When the error correction method places a relatively high load on hardware, the method shown in Fig. 2-3 can be used, which saves the terminal's hardware resources but requires the terminal to be connected to the server over a network.
Based on the foregoing embodiments, the embodiments of the present invention provide a text error correction method based on Chinese pronunciation similarity, which can be applied to correcting Chinese speech recognition results and Chinese pinyin input method results, and can also be used directly as a feature in Chinese semantic similarity calculation. Fig. 3-1 is a fourth schematic flowchart of the text error correction method according to the embodiments of the present invention. As shown in Fig. 3-1, the method comprises:
Step S301, mining a pronunciation similarity dictionary;
Here, as shown in Fig. 3-2, step S301 further comprises the following steps:
Step S311, collecting phrase pairs whose pronunciations are easily confused;
Here, in this corpus collection step, the corpus can be collected from the following channels: dictionaries of near-homophones in Chinese; dictionaries of dialect and standard pronunciations whose phonetic symbols are easily confused; annotated speech recognition errors; annotated errors of online input methods.
Here, the corpus is collected in the form of phrase pairs, such as "logging off"-"leg is slightly imperial", "generation gold note"-"cash equivalent volume", "comrades"-"bobbins", "dried shrimp"-"villagers", "salted vegetables are too expensive"-"let's start the meeting", and "sausage pickled melon"-"chief of township's speech".
Step S312, converting the phrase pairs to pinyin;
This step is implemented by looking up a table of Chinese characters and their pinyin, obtaining for each character its corresponding pinyin in a Chinese dictionary. The processing steps are as follows: 1) on encountering a non-polyphonic character, look it up in the table directly and convert it to pinyin; 2) on encountering a polyphonic character, look up in the table the words formed by the character and its neighbouring characters: if such a word exists and has a unique pronunciation, convert it to pinyin; if such a word exists but its pronunciation is still ambiguous, use a language model to determine the pronunciation (for example, the pronunciations of "cheap" include "pian-yi" and "bian-yi"); if no such word exists, use the default pronunciation (polyphonic characters have a default pronunciation in the table); 3) on encountering Arabic numerals, convert them to the corresponding Chinese characters and then look them up in the table; 4) on encountering English characters, do no processing; 5) on encountering a Chinese character that is not in the table, skip the character and set its position to empty.
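The five lookup rules can be sketched as below. The tables are tiny hypothetical stand-ins for a real character-pinyin dictionary, the language-model fallback for words whose reading remains ambiguous is omitted, and passing English letters through unchanged is one reading of "no processing".

```python
def resolve_polyphone(text, i, word_table, default):
    """Rule 2: try the two-character word starting at i, then the one ending at i."""
    for start in (i, i - 1):
        if start < 0:
            continue
        word = text[start:start + 2]
        if len(word) == 2 and word in word_table:
            return word_table[word][i - start]
    return default                                   # no known word: default reading


def to_pinyin(text, char_table, word_table, digit_map):
    out = []
    for i, ch in enumerate(text):
        ch = digit_map.get(ch, ch)                   # rule 3: digit -> Chinese character
        if ch.isascii() and ch.isalpha():            # rule 4: English letters are not converted
            out.append(ch)
            continue
        readings = char_table.get(ch)
        if readings is None:                         # rule 5: unknown character -> empty slot
            out.append(None)
        elif len(readings) == 1:                     # rule 1: single reading
            out.append(readings[0])
        else:                                        # rule 2: polyphonic character
            out.append(resolve_polyphone(text, i, word_table, readings[0]))
    return out


# Hypothetical miniature tables; the first reading of a polyphone is its default.
CHAR_TABLE = {"便": ["bian", "pian"], "宜": ["yi"], "方": ["fang"], "二": ["er"]}
WORD_TABLE = {"便宜": ["pian", "yi"], "方便": ["fang", "bian"]}
DIGIT_MAP = {"2": "二"}
```

For instance, `to_pinyin("便宜", CHAR_TABLE, WORD_TABLE, DIGIT_MAP)` resolves the polyphonic first character through the word table rather than through its default reading.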
Step S313, splitting the pinyin into initials and finals and aligning them;
Here, because in a pair of similarly pronounced phrases only a few phonetic symbols are mispronounced, alignment by the greatest number of identical pronunciations is used, for example:
Let's start the meeting: x ian z ai t ai h ui;
Salted vegetables are too expensive: x ian c ai t ai g ui;
After alignment, the inconsistent phonetic symbols are z and c, and h and g.
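The splitting and comparison in this example can be sketched as follows. This is a simplification that assumes the two phrases have the same number of syllables and compares them position by position, rather than searching for the alignment with the most identical pronunciations; the helper names are hypothetical, and syllables beginning with y or w are treated as having no initial.

```python
# The 21 initials of Hanyu Pinyin; two-letter initials come first so they match before z, c, s.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def split_syllable(syllable):
    """Split one pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable                      # zero-initial syllable such as "an"


def mismatches(first, second):
    """Return the (symbol, symbol) pairs that differ at the same position."""
    pairs = []
    for a, b in zip(first, second):
        (ia, fa), (ib, fb) = split_syllable(a), split_syllable(b)
        if ia != ib:
            pairs.append((ia, ib))
        if fa != fb:
            pairs.append((fa, fb))
    return pairs


# The syllables of the example pair as given in the text.
diff = mismatches(["xian", "zai", "tai", "hui"], ["xian", "cai", "tai", "gui"])
```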
Step S314, calculating the transition probabilities between initials and finals;
Here, all the acquired pairs with confused pronunciations are aligned according to the above method, the number of occurrences of each pair of inconsistent phonetic symbols is counted, and the transition probability p(c|z) that phonetic symbol z is mispronounced as phonetic symbol c is calculated:
p(c|z) = count(z->c) / count(z);
where p(c|z) is the transition probability that phonetic symbol z is mispronounced as phonetic symbol c, count(z->c) is the number of times phonetic symbol z is mispronounced as c in the corpus, and count(z) is the total number of times phonetic symbol z is mispronounced in the corpus.
Step S315, calculating the similarity score between any two phonetic symbols;
Here, from the p(c|z) and p(z|c) calculated in the previous step, the pronunciation similarity between phonetic symbol z and phonetic symbol c is defined as: Sim(c,z) = (p(c|z) + p(z|c)) / 2;
By calculating the similarity between any two phonetic symbols, an initial-final similarity matrix between phonetic symbols is obtained, which comprises the similarity matrices between initials and initials, between initials and finals, and between finals and finals.
Step S302, calculating phrase pronunciation similarity;
Based on the pronunciation similarity between phonetic symbols calculated in step S301, this step calculates the pronunciation similarity between any two given phrases. The detailed process is shown in Fig. 3-3 and comprises:
Step S321, preprocessing Arabic numerals, for example converting "2" to "two", to facilitate extracting pinyin;
Step S322, converting Chinese characters to pinyin, as in step S312;
Step S323, splitting the pronunciation of each character in the pinyin string into initial and final;
Step S324, traversing the two pinyin strings character by character and calculating the similarity score;
Here, assume that the current position in the first participle is pos1 and the current position in the second participle is pos2; ScoreSS, ScoreYY and ScoreSY are the similarity scores between initial and initial, between final and final, and between initial and final respectively, and can be obtained by querying the above initial-final similarity matrix; Score is the score. Calculating the similarity score, referring to Fig. 3-4, comprises:
Step S3241, start: set pos1=1, pos2=1;
Step S3242, judging whether the character at the current position of the first participle is identical to the character at the current position of the second participle; if identical, Score+=1, pos1+=1, pos2+=1, and return to step S3242; if not identical, go to step S3243;
Step S3243, judging whether S=ScoreSS*ScoreYY is greater than 0.8; if S>0.8, then Score+=S, pos1+=1, pos2+=1, and return to step S3242; if S≤0.8, the similarity with the adjacent character of the first participle and of the second participle is determined, and go to step S3244;
Step S3244, if S=ScoreSS*ScoreYY≤0.8, judging whether, at pos1 and pos2+1, S=max(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2)>0.8; if so, then Score+=S, pos1+=1, pos2+=2, and return to step S3242; if, at pos1 and pos2+1, S=max(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2)≤0.8, go to step S3245;
Step S3245, judging whether, at pos1+1 and pos2, S=max(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2)>0.8; if so, then Score+=S, pos1+=2, pos2+=1, and return to step S3242;
When the traversal through steps S3242 to S3245 ends, Score is the similarity score of the two participles.
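The traversal of steps S3241 to S3245 can be sketched as follows. Several points are assumptions rather than statements of the text: positions are 0-based here, syllables are (initial, final) tuples, the similarity lookups are plain callables, and when neither window test succeeds both positions are advanced without adding to the score, a case the steps above leave open. The similarity values are invented.

```python
THRESHOLD = 0.8  # example preset value given in the text


def make_lookup(table):
    """Symmetric lookup: identical symbols score 1.0, unknown pairs 0.0."""
    def lookup(a, b):
        if a == b:
            return 1.0
        return table.get((a, b), table.get((b, a), 0.0))
    return lookup


def window_max(x, y, ss, yy, sy):
    """max(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2) for syllables x and y."""
    return max(ss(x[0], y[0]) * yy(x[1], y[1]), sy(x[0], y[1]), sy(y[0], x[1]))


def traverse_score(first, second, ss, yy, sy, threshold=THRESHOLD):
    score, pos1, pos2 = 0.0, 0, 0
    while pos1 < len(first) and pos2 < len(second):
        a, b = first[pos1], second[pos2]
        if a == b:                                   # S3242: same pronunciation
            score, pos1, pos2 = score + 1, pos1 + 1, pos2 + 1
            continue
        s = ss(a[0], b[0]) * yy(a[1], b[1])          # S3243: same-position similarity
        if s > threshold:
            score, pos1, pos2 = score + s, pos1 + 1, pos2 + 1
            continue
        if pos2 + 1 < len(second):                   # S3244: pos1 against pos2+1
            s = window_max(a, second[pos2 + 1], ss, yy, sy)
            if s > threshold:
                score, pos1, pos2 = score + s, pos1 + 1, pos2 + 2
                continue
        if pos1 + 1 < len(first):                    # S3245: pos1+1 against pos2
            s = window_max(first[pos1 + 1], b, ss, yy, sy)
            if s > threshold:
                score, pos1, pos2 = score + s, pos1 + 2, pos2 + 1
                continue
        pos1, pos2 = pos1 + 1, pos2 + 1              # assumption: skip both, add nothing
    return score


ss = make_lookup({("z", "c"): 0.9, ("h", "g"): 0.5})   # invented initial-initial similarities
yy = make_lookup({})
sy = make_lookup({})
first = [("x", "ian"), ("z", "ai"), ("t", "ai"), ("h", "ui")]
second = [("x", "ian"), ("c", "ai"), ("t", "ai"), ("g", "ui")]
raw_score = traverse_score(first, second, ss, yy, sy)
```

With these invented values the z/c pair passes the 0.8 threshold and contributes 0.9, while the h/g pair (0.5) contributes nothing.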
Step S325, normalizing the similarity score, as follows:
Sf=Score*2/(Size1*Size2)
where Sf is the final score after normalization, Score is the score obtained from the traversal in the previous step, Size1 is the number of characters of the first Chinese character string, and Size2 is the number of characters of the second Chinese character string;
Step S303, mining error correction candidates;
Based on the similarity calculation method of the previous step, error correction candidates are mined from online interaction logs. The main purpose of this step is to take the application domain into account and to mine, from the users' history logs, error correction candidates that meet the application's goals.
The approach to mining error correction candidates is the same as for the conventional error correction problem, except that the pronunciation similarity of texts is used as the similarity measure. There are two main methods: a) user sessions (for example customer service sessions): mining the cases in which the user actively repairs a pronunciation error, that is, mining from the session context the repeated inputs whose pronunciations are similar; b) manually customized key phrases of the domain, for mining error-prone candidates: key phrases of the domain are customized manually according to the business goals, and results whose pronunciation is similar to the customized phrases are mined from a large number of logs.
Step S304: error correction;
Online error correction is performed based on the error correction candidate pairs (participle sets). The approach of the embodiment of the present invention is as follows:
1) The user input S0 is segmented into participles;
Phrases formed by combining adjacent words are looked up in the candidates to check whether an error correction candidate exists (phrases of 1 to 4 adjacent words are tried in turn). If an error correction candidate exists, it replaces the corresponding phrase in the original input, yielding a possible user input Si (i = 1, 2, ...).
2) The language model scores of the original user input S0 and of each possible input Si are calculated (the language model score measures how fluent a sentence is);
3) The score of S0 is compared with the scores of the multiple Si;
If the score of S0 is higher, no error correction is performed; if the score of some Si is higher, error correction is performed using the replacement in Si.
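The three online steps above might look like the following sketch. It is illustrative only: `candidate_inputs` and `correct` are hypothetical names, `candidates` is assumed to be a dictionary from an error-prone phrase to its mined corrections, and `lm_score` stands in for whatever language model is used, which the text does not specify.

```python
from typing import Callable, Dict, List


def candidate_inputs(tokens: List[str], candidates: Dict[str, List[str]],
                     max_n: int = 4) -> List[List[str]]:
    """Build each possible input Si by replacing one phrase of 1..max_n adjacent tokens."""
    results = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrase = "".join(tokens[i:i + n])
            for replacement in candidates.get(phrase, []):
                results.append(tokens[:i] + [replacement] + tokens[i + n:])
    return results


def correct(tokens: List[str], candidates: Dict[str, List[str]],
            lm_score: Callable[[List[str]], float]) -> List[str]:
    """Return the original input S0 unless some Si gets a strictly higher score."""
    best, best_score = tokens, lm_score(tokens)
    for alt in candidate_inputs(tokens, candidates):
        score = lm_score(alt)
        if score > best_score:
            best, best_score = alt, score
    return best
```

Requiring a strictly higher score keeps the user's original wording whenever the language model sees no improvement, which matches the "no error correction" branch of step 3.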
As can be seen from the above embodiments, the embodiment of the present invention mines note transition probabilities from text with similar pronunciation and uses them as note similarities; it relaxes the alignment requirement on the pinyin, that is, the most similar note may be found within a window; and, when processing a participle containing Arabic numerals, it first converts the Arabic numerals into Chinese characters, so that the similarity between a participle containing Arabic numerals and other participles can be calculated. Through the above technical means, the technical solution provided by the embodiment of the present invention has the following technical advantages: 1) the pronunciation similarity is obtained by a statistics-based method whose data come from user behavior, so it better represents the similarity between notes in real application scenarios and gives more accurate results; 2) the degree of pronunciation similarity between notes of different pronunciation types, and between notes of the same pronunciation type, is obtained as a floating-point value, so the similarities between different notes are more comparable; 3) when positions are aligned to calculate the pronunciation similarity of text, the optimal alignment may be sought within a window, which makes the similarity calculation robust to missing or extra characters.
Based on the foregoing embodiments, an embodiment of the present invention provides a text error correction device. Each unit included in the device, each module included in each unit, and each submodule included in each module can be implemented by a processor in a first computing device, or of course by a specific logic circuit. In a specific embodiment, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
Fig. 4 is a first schematic diagram of the composition of a text error correction device according to an embodiment of the present invention. As shown in Fig. 4, the device includes a first forming unit 401, a marking unit 402, a first determination unit 403, a first judging unit 404 and a second determination unit 405, wherein:
The first forming unit 401 is configured to collect a first corpus in the form of participle pairs;
The marking unit 402 is configured to mark both participles of a participle pair in the form of pinyin;
The first determination unit 403 is configured to determine the pinyin similarity between the two participles of the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the pair;
The first judging unit 404 is configured to judge whether the similarity meets a preset condition;
The second determination unit 405 is configured to, if the similarity meets the preset condition, identify the two participles of the participle pair as error correction participles of each other.
In other embodiments of the present invention, the marking unit includes a first judgment module and a first labeling module, wherein:
The first judgment module is configured to judge whether the participle pair includes a polyphone;
The first labeling module is configured to, if the participle pair does not include a polyphone, mark both participles of the participle pair in the form of pinyin.
In other embodiments of the present invention, the marking unit further includes a second judgment module and a second labeling module, wherein:
The second judgment module is configured to, if the participle pair includes a polyphone, further judge whether either of the two participles of the pair contains a polyphonic word;
The second labeling module is configured to, if at least one of the two participles of the pair contains a polyphonic word, mark the two or more pinyin corresponding to the polyphonic word as part or all of the pinyin of the corresponding participle in the pair.
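One way to picture marking a participle that contains a polyphone with two or more pinyin is to enumerate every combination of per-character readings. The sketch below assumes a hypothetical `lexicon` that maps each character to its list of pinyin readings; neither the lexicon nor the function name `pinyin_readings` comes from the text.

```python
from itertools import product
from typing import Dict, List, Tuple


def pinyin_readings(word: str, lexicon: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """List every pinyin reading of a word, one tuple of syllables per reading."""
    # Raises KeyError for characters missing from the (hypothetical) lexicon.
    per_char = [lexicon[ch] for ch in word]
    return list(product(*per_char))


toy_lexicon = {"银": ["yin"], "行": ["hang", "xing"]}
readings = pinyin_readings("银行", toy_lexicon)
```

A word without polyphones simply yields a single reading, so the same function covers both labeling branches described above.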
In other embodiments of the present invention, the marking unit includes a third judgment module, a conversion module and a third labeling module, wherein:
The third judgment module is configured to judge whether the participle pair includes Arabic numerals;
The conversion module is configured to, if the participle pair includes Arabic numerals, convert the Arabic numerals into the corresponding Chinese characters;
The third labeling module is configured to mark, in the form of pinyin, the participles of the pair after conversion into Chinese characters.
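The conversion step can be illustrated with a simple digit-by-digit mapping. Whether the patent converts digit by digit or reads the numeral as a whole number is not stated, so the digit-by-digit choice and the name `digits_to_hanzi` are assumptions made for this sketch.

```python
# Assumed digit-by-digit mapping from Arabic numerals to Chinese characters.
DIGIT_TO_HANZI = {
    "0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
    "5": "五", "6": "六", "7": "七", "8": "八", "9": "九",
}


def digits_to_hanzi(text: str) -> str:
    """Replace each Arabic digit with its Chinese character; leave other characters alone."""
    return "".join(DIGIT_TO_HANZI.get(ch, ch) for ch in text)
```

After this step a participle such as "360浏览器" consists only of Chinese characters and can be marked with pinyin like any other participle.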
In other embodiments of the present invention, the first determination unit includes an alignment module, a computing module and a first determining module, wherein:
The alignment module is configured to align the initials of the pinyin of the two participles of the pair and to align the finals of the pinyin of the two participles;
The computing module is configured to calculate the transition probability of the pinyin of the first participle of the pair being converted into the pinyin of the second participle;
The first determining module is configured to determine the pinyin similarity between the two participles of the pair according to the transition probability.
In other embodiments of the present invention, the alignment module is configured to align the initials of the pinyin of the two participles of the pair, and the finals of the pinyin of the two participles, using the alignment that yields the most identical pronunciations.
In other embodiments of the present invention, the computing module includes a computational submodule, a first determining submodule, a second determining submodule and a transform submodule, wherein:
The computational submodule is configured to, if the words at the same position of the two aligned participles are homophones, add 1 to the score Score and add 1 to both the position in the first participle and the position in the second participle of the pair;
The first determining submodule is configured to, if the words at the same position of the two aligned participles are not homophones, determine the score Score of the pinyin of the first participle and the pinyin of the second participle according to a preset initial-final similarity matrix;
The second determining submodule is configured to determine a normalized final score Sf according to the score Score, the number of characters of the first participle and the number of characters of the second participle;
The transform submodule is configured to determine, according to the final score Sf, the transition probability of the pinyin of the first participle of the pair being converted into the pinyin of the second participle.
In other embodiments of the present invention, the second determining submodule is configured to:
Obtain, according to the preset initial-final similarity matrix, the similarity between the initials and the similarity between the finals of the words at the same position of the two participles;
If the product S of the similarity between the initials and the similarity between the finals is greater than a first preset value, add S to the score Score, and add 1 to both the position in the first participle and the position in the second participle of the pair;
If the product S of the similarity between the initials and the similarity between the finals of the words at the same position of the two participles is less than or equal to the first preset value, obtain, according to the initial-final similarity matrix, the initial-to-initial similarity, the final-to-final similarity, the initial-to-final similarity and the final-to-initial similarity between the word at the current position of the first participle and the word at the next position of the second participle;
Determine a first maximum, the first maximum being the largest of the following three values: the product S of the initial-to-initial similarity and the final-to-final similarity between the word at the current position of the first participle and the word at the next position of the second participle; the similarity between the initial of the word at the current position of the first participle and the final of the word at the next position of the second participle; and the similarity between the final of the word at the current position of the first participle and the initial of the word at the next position of the second participle;
Judge whether the first maximum is greater than a second preset value; if so, set the score Score to the previously obtained Score plus the first maximum, and add 1 to both the position in the first participle and the position in the second participle of the pair;
If the first maximum is less than or equal to the second preset value, obtain, according to the initial-final similarity matrix, the initial-to-initial similarity, the final-to-final similarity, the initial-to-final similarity and the final-to-initial similarity between the word at the next position of the first participle and the word at the current position of the second participle;
Determine a second maximum, the second maximum being the largest of the following three values: the product S of the initial-to-initial similarity and the final-to-final similarity between the word at the next position of the first participle and the word at the current position of the second participle; the similarity between the initial of the word at the next position of the first participle and the final of the word at the current position of the second participle; and the similarity between the final of the word at the next position of the first participle and the initial of the word at the current position of the second participle;
Judge whether the second maximum is greater than a third preset value; if so, set the score Score to the previously obtained Score plus the second maximum, and add 1 to both the position in the first participle and the position in the second participle of the pair; then judge whether the words at the updated positions of the first participle and the second participle are homophones, traversing the pinyin of the first participle and the pinyin of the second participle through the above steps and calculating the score Score.
In other embodiments of the present invention, the device further includes a third determination unit configured to determine the initial-final similarity matrix. The third determination unit further includes a second determining module, a third determining module, a fourth determining module, a fifth determining module, a sixth determining module, a seventh determining module and an eighth determining module, wherein:
The second determining module is configured to determine the total number of times a first note is mispronounced in the second corpus, the first note including an initial or a final;
The third determining module is configured to determine the number of times the first note is mispronounced as a second note;
The fourth determining module is configured to determine the probability of the first note being transferred to the second note according to the total number of times the first note is mispronounced and the number of times the first note is mispronounced as the second note;
The fifth determining module is configured to determine the total number of times the second note is mispronounced in the second corpus, the second note including an initial or a final;
The sixth determining module is configured to determine the number of times the second note is mispronounced as the first note;
The seventh determining module is configured to determine the probability of the second note being transferred to the first note according to the total number of times the second note is mispronounced and the number of times the second note is mispronounced as the first note;
The eighth determining module is configured to determine the similarity between the first note and the second note according to the probability of the first note being transferred to the second note and the probability of the second note being transferred to the first note.
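The counting performed by the second to eighth determining modules can be sketched as below. The input is assumed to be a list of (intended note, mispronounced note) observations from the second corpus. The text only says the similarity is determined "according to" the two transfer probabilities, so taking their average is an assumption made for this illustration, as is the function name `note_similarity`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def note_similarity(confusions: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Estimate a symmetric note similarity from observed mispronunciations."""
    pair_counts = Counter(confusions)      # (intended, produced) -> occurrences
    totals: Counter = Counter()            # intended note -> total mispronunciations
    for (src, _), n in pair_counts.items():
        totals[src] += n

    sim: Dict[Tuple[str, str], float] = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / totals[a]               # probability that a is transferred to b
        p_ba = pair_counts.get((b, a), 0) / totals[b] if totals[b] else 0.0
        # Assumption: combine the two directions by simple averaging.
        sim[(a, b)] = sim[(b, a)] = (p_ab + p_ba) / 2
    return sim


observations = [("zh", "z")] * 3 + [("zh", "ch")] + [("z", "zh")] * 2
matrix = note_similarity(observations)
```

Because the counts come from logged user behavior, the resulting values are floating-point similarities that can be compared across different note pairs.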
It should be noted that the description of the above device embodiment is similar to the description of the above method embodiment and has beneficial effects similar to those of the method embodiment. For technical details not disclosed in the device embodiment of the present invention, please refer to the description of the method embodiment of the present invention.
Based on the foregoing embodiments, an embodiment of the present invention provides a text error correction device. Each unit included in the device can be implemented by a processor in a second computing device, or of course by a specific logic circuit. In a specific embodiment, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
Fig. 5 is a second schematic diagram of the composition of a text error correction device according to an embodiment of the present invention. As shown in Fig. 5, the device includes a fourth determination unit 501, a second judging unit 502, a fifth determination unit 503, a sixth determination unit 504, a third judging unit 505 and an error correction unit 506, wherein:
The fourth determination unit 501 is configured to determine a participle to be corrected, the participle to be corrected being a participle in a sentence input by the user;
The second judging unit 502 is configured to judge whether there is a participle set corresponding to the participle to be corrected, the participle set including at least one error correction participle that is obtained by calculation based on pinyin similarity and is used to correct the participle to be corrected;
The fifth determination unit 503 is configured to determine a first language model score, the first language model score being the language model score of the participle to be corrected in the sentence;
The sixth determination unit 504 is configured to determine second language model scores, the second language model scores being the language model scores, in the sentence, of the respective error correction participles in the participle set;
The third judging unit 505 is configured to judge whether any of the second language model scores is greater than the first language model score, and to obtain a judgment result;
The error correction unit 506 is configured to correct the participle to be corrected according to the judgment result.
In other embodiments of the present invention, the error correction unit is configured to: if any of the second language model scores is greater than the first language model score, determine the error correction participle with the highest language model score as the correction for the participle to be corrected; if none of the second language model scores is greater than the first language model score, perform no error correction on the participle to be corrected.
In other embodiments of the present invention, the device further includes a first forming unit, a marking unit, a first determination unit, a first judging unit, a second determination unit and a second forming unit, wherein:
The first forming unit is configured to collect a first corpus in the form of participle pairs;
The marking unit is configured to mark both participles of a participle pair in the form of pinyin;
The first determination unit is configured to determine the pinyin similarity between the two participles of the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the pair;
The first judging unit is configured to judge whether the similarity meets a preset condition;
The second determination unit is configured to, if the similarity meets the preset condition, identify the two participles of the participle pair as error correction participles of each other;
The second forming unit is configured to form the participle set from the error correction participles.
It should be noted that the description of the above device embodiment is similar to the description of the above method embodiment and has beneficial effects similar to those of the method embodiment. For technical details not disclosed in the device embodiment of the present invention, please refer to the description of the method embodiment of the present invention.
Based on the foregoing embodiments, an embodiment of the present invention provides a computing device. Fig. 6 is a schematic diagram of the composition of a server according to an embodiment of the present invention. As shown in Fig. 6, the computing device 600 may include components such as at least one processor 601, at least one communication bus 602, a user interface 603, at least one external communication interface 604 and a memory 605 for storing an executable program. The communication bus 602 is used to implement connection and communication among the processor 601, the user interface 603, the external communication interface 604 and the memory 605. The user interface 603 may include a display screen and a keyboard. The external communication interface 604 may optionally include a wired interface and a wireless interface. The processor 601 is configured to:
Collect a first corpus in the form of participle pairs;
Mark both participles of each participle pair in the first corpus in the form of pinyin;
Determine the pinyin similarity between the two participles of the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the pair;
If the similarity meets a preset condition, identify the two participles of the pair as error correction participles of each other, or identify the first participle as the error correction participle of the second participle;
Form an error correction dictionary from the participle pairs whose similarity meets the preset condition;
Send the error correction dictionary to a terminal through the external communication interface 604.
It should be noted that the description of the above server embodiment is similar to the description of the above method and has the same beneficial effects as the method embodiment. For technical details not disclosed in the server embodiment of the present invention, those skilled in the art should refer to the description of the method embodiment of the present invention.
It should be noted that, in the embodiments of the present invention, if the above text error correction method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read Only Memory), a magnetic disk or an optical disc. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software. Correspondingly, an embodiment of the present invention further provides a computer storage medium in which computer-executable instructions are stored, the computer-executable instructions being used to execute the text error correction method in the embodiments of the present invention.
It should be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of the present invention. Therefore, occurrences of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. In addition, these particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present invention, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention. The sequence numbers of the embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or device. Without further restriction, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of other forms. The units described above as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units, and some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (Read Only Memory, ROM), a magnetic disk or an optical disc. Alternatively, if the above integrated unit of the present invention is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk or an optical disc.
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (15)
1. A text error correction method, characterized in that the method comprises:
Collecting a first corpus in the form of participle pairs;
Marking both participles of each participle pair in the first corpus in the form of pinyin;
Determining the pinyin similarity between the two participles of the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the pair;
If the similarity meets a preset condition, identifying the two participles of the pair as error correction participles of each other, or identifying the first participle as the error correction participle of the second participle;
Wherein the preset condition is a threshold, and the similarity meets the preset condition when the similarity is greater than the threshold;
Said marking both participles of each participle pair in the first corpus in the form of pinyin comprises: if the participle pair includes a polyphone and at least one of the two participles of the pair contains a polyphonic word, marking the participle that contains the polyphonic word in the form of two or more pinyin.
2. The method according to claim 1, characterized in that said marking both participles of the participle pair in the form of pinyin further comprises:
If the participle pair does not include a polyphone, marking both participles of the participle pair in the form of pinyin.
3. The method according to claim 1, characterized in that said marking both participles of the participle pair in the form of pinyin comprises:
If the participle pair includes Arabic numerals, converting the Arabic numerals into the corresponding Chinese characters;
Marking, in the form of pinyin, the participles of the pair after conversion into Chinese characters.
4. The method according to any one of claims 1 to 3, characterized in that said determining the pinyin similarity between the two participles of the participle pair comprises:
Aligning the initials of the pinyin of the two participles of the pair and aligning the finals of the pinyin of the two participles;
Calculating the transition probability of the pinyin of the first participle of the pair being converted into the pinyin of the second participle;
Determining the pinyin similarity between the two participles of the pair according to the transition probability.
5. The method according to claim 4, characterized in that said aligning the initials of the pinyin of the two participles of the pair and aligning the finals of the pinyin of the two participles comprises:
Aligning the initials of the pinyin of the two participles of the pair, and the finals of the pinyin of the two participles, using the alignment that yields the most identical pronunciations.
6. The method according to claim 4, characterized in that said calculating the transition probability of the pinyin of the first participle of the pair being converted into the pinyin of the second participle comprises:
Determining the number of notes that differ between the first participle and the second participle;
Determining the transition probability according to the number of differing notes and the length of the note string of the first participle or of the second participle.
7. The method according to claim 4, characterized in that calculating the transition probability of the pinyin of the first participle in the participle pair being converted into the pinyin of the second participle comprises:
If the characters at the same position of the two aligned participles are homophones, adding 1 to the score Score, and adding 1 to both the position in the first participle and the position in the second participle of the participle pair;
If the characters at the same position of the two aligned participles are not homophones, determining the score Score of the pinyin of the first participle and the pinyin of the second participle according to a preset initial-final similarity matrix;
Determining a normalized final score Sf according to the score Score, the number of characters of the first participle, and the number of characters of the second participle;
Determining, according to the final score Sf, the transition probability of the pinyin of the first participle in the participle pair being converted into the pinyin of the second participle.
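The scoring loop of claim 7 can be sketched in Python as below. Several details are not specified by the claim and are assumptions here: the non-homophone score is taken as the mean of the initial similarity and the final similarity, both positions also advance by one in the non-homophone case, and Sf is obtained by dividing Score by the larger character count. The names `sound_similarity` and `normalized_final_score` are hypothetical.

```python
from typing import Dict, List, Tuple

Syllable = Tuple[str, str]                       # (initial, final) of one character
SimilarityMatrix = Dict[Tuple[str, str], float]  # (sound_a, sound_b) -> similarity


def sound_similarity(a: str, b: str, matrix: SimilarityMatrix) -> float:
    """Identical sounds score 1.0; other pairs are looked up, defaulting to 0.0."""
    if a == b:
        return 1.0
    return matrix.get((a, b), 0.0)


def normalized_final_score(first: List[Syllable], second: List[Syllable],
                           matrix: SimilarityMatrix) -> float:
    """Accumulate Score position by position and normalize it to Sf."""
    if not first or not second:
        raise ValueError("both participles need at least one character")
    score = 0.0
    i = j = 0
    while i < len(first) and j < len(second):
        if first[i] == second[j]:
            # Homophones at this position: Score is increased by 1.
            score += 1.0
        else:
            (ini_a, fin_a), (ini_b, fin_b) = first[i], second[j]
            # Assumed combination: mean of initial and final similarity.
            score += (sound_similarity(ini_a, ini_b, matrix)
                      + sound_similarity(fin_a, fin_b, matrix)) / 2
        i += 1
        j += 1
    # Assumed normalization: divide by the larger number of characters.
    return score / max(len(first), len(second))
```

In this sketch Sf already lies between 0 and 1, so it could be used directly as the transition probability; a real system might apply a further mapping.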
8. The method according to claim 1, characterized in that determining the similarity between the pinyin of the two participles in the participle pair comprises:
Determining the similarity between the pinyin of the two participles in the participle pair using a preset initial-final similarity matrix.
9. The method according to claim 7 or 8, characterized in that determining the initial-final similarity matrix comprises:
Collecting a second corpus in the form of participle pairs;
Marking both participles of each participle pair in the second corpus in the form of pinyin;
Determining the total number of times a first sound is mispronounced in the second corpus, the first sound comprising an initial or a final;
Determining the number of times the first sound is mispronounced as a second sound;
Determining the probability that the first sound shifts to the second sound according to the total number of times the first sound is mispronounced and the number of times the first sound is mispronounced as the second sound;
Determining the total number of times the second sound is mispronounced in the second corpus, the second sound comprising an initial or a final;
Determining the number of times the second sound is mispronounced as the first sound;
Determining the probability that the second sound shifts to the first sound according to the total number of times the second sound is mispronounced and the number of times the second sound is mispronounced as the first sound;
Determining the similarity between the first sound and the second sound according to the probability that the first sound shifts to the second sound and the probability that the second sound shifts to the first sound, and forming the initial-final similarity matrix from the similarities between first sounds and second sounds.
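The construction of the initial-final similarity matrix in claim 9 can be sketched as follows. The input is assumed to be a list of `(intended_sound, mispronounced_sound)` observations extracted from the second corpus. The claim does not say how the two directional probabilities are combined, so the arithmetic mean used here is an assumption, as is the name `build_similarity_matrix`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def build_similarity_matrix(confusions: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Build a symmetric sound-similarity matrix from mispronunciation counts.

    P(a -> b) = count(a mispronounced as b) / count(a mispronounced at all).
    similarity(a, b) = (P(a -> b) + P(b -> a)) / 2   (assumed combination)
    """
    confusions = list(confusions)
    pair_counts = Counter(confusions)                  # (a, b) -> times a became b
    totals = Counter(src for src, _ in confusions)     # a -> times a was mispronounced

    def shift_probability(a: str, b: str) -> float:
        return pair_counts[(a, b)] / totals[a] if totals[a] else 0.0

    sounds = {sound for pair in pair_counts for sound in pair}
    matrix: Dict[Tuple[str, str], float] = {}
    for a in sounds:
        for b in sounds:
            if a == b:
                continue
            similarity = (shift_probability(a, b) + shift_probability(b, a)) / 2
            if similarity > 0:
                matrix[(a, b)] = similarity
    return matrix
```

Given the observations `zh→z` twice, `zh→ch` once, and `z→zh` once, P(zh→z) is 2/3 and P(z→zh) is 1, so this sketch assigns the pair `(zh, z)` a similarity of 5/6 in both directions.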
10. The method according to claim 1, characterized in that the method further comprises:
Forming an error correction dictionary from the participle pairs whose similarity meets the preset condition;
Sending the error correction dictionary to a terminal.
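A small Python sketch of the dictionary-building step follows. It assumes the preset condition is the threshold test recited elsewhere in the claims (similarity greater than the threshold) and that each qualifying pair is stored in both directions as mutual error correction participles. The function name, the JSON serialization, and the sample data are hypothetical; the claims do not specify a transport format.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_correction_dictionary(scored_pairs: Iterable[Tuple[str, str, float]],
                                threshold: float) -> Dict[str, List[str]]:
    """Keep participle pairs whose similarity exceeds the threshold.

    Each kept pair is recorded in both directions, so either participle
    can be looked up to find the other as a correction candidate.
    """
    candidates = defaultdict(set)
    for first, second, similarity in scored_pairs:
        if similarity > threshold:
            candidates[first].add(second)
            candidates[second].add(first)
    # Sorted lists make the result deterministic and JSON-serializable.
    return {word: sorted(others) for word, others in candidates.items()}


def serialize_for_terminal(dictionary: Dict[str, List[str]]) -> str:
    """Encode the dictionary as JSON text (an assumed wire format)."""
    return json.dumps(dictionary, ensure_ascii=False, sort_keys=True)
```

A pair whose similarity equals the threshold is not kept, matching the "greater than" wording of the preset condition.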
11. The method according to claim 1, characterized in that the method further comprises:
Receiving an error correction request sent by a terminal, the error correction request carrying a sentence input by a user;
Determining a participle to be corrected, the participle to be corrected being a participle in the sentence input by the user;
If a participle set corresponding to the participle to be corrected exists, determining a first language model score and second language model scores, the first language model score being the language model score of the participle to be corrected in the sentence, the participle set including at least one error correction participle for correcting the participle to be corrected, and the second language model scores being the language model scores, in the sentence, of the respective error correction participles in the participle set;
If any of the second language model scores is greater than the first language model score, determining the error correction participle with the highest language model score as the error correction word for the participle to be corrected;
Carrying the error correction word in a first error correction response, and sending the first error correction response to the terminal.
12. The method according to claim 11, characterized in that the method further comprises:
If none of the second language model scores is greater than the first language model score, or if no participle set corresponding to the participle to be corrected exists, sending a second error correction response, the second error correction response indicating that the participle to be corrected is not corrected.
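The decision logic of claims 11 and 12 can be sketched as below. The language model is represented by a caller-supplied scoring function, since the claims do not name a particular model; `lm_score`, `choose_correction`, and the convention of returning `None` for the "no correction" case (the second error correction response) are assumptions of this example.

```python
from typing import Callable, Dict, List, Optional, Set


def choose_correction(tokens: List[str], index: int,
                      participle_sets: Dict[str, Set[str]],
                      lm_score: Callable[[List[str]], float]) -> Optional[str]:
    """Return the correction word for tokens[index], or None if no correction applies.

    tokens          -- the user's sentence, already split into participles
    participle_sets -- participle to be corrected -> its error correction participles
    lm_score        -- hypothetical scorer: higher means a more plausible sentence
    """
    target = tokens[index]
    candidates = participle_sets.get(target)
    if not candidates:
        return None  # no participle set exists: second error correction response

    # First language model score: the sentence as the user typed it.
    first_score = lm_score(tokens)

    # Second language model scores: the sentence with each candidate substituted.
    scored = [(lm_score(tokens[:index] + [cand] + tokens[index + 1:]), cand)
              for cand in candidates]
    best_score, best_candidate = max(scored)

    # Correct only if some candidate beats the original participle.
    return best_candidate if best_score > first_score else None
```

With a toy scorer that rewards only the bigram ("去", "北京"), the tokens ["我", "去", "背景"], and a participle set mapping "背景" to {"北京"}, this sketch returns "北京"; if the participle set is missing, it returns `None`.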
13. A text error correction device, characterized in that the device comprises a first forming unit, a marking unit, a first determination unit and a second determination unit, wherein:
The first forming unit is configured to collect a first corpus in the form of participle pairs;
The marking unit is configured to mark both participles of the participle pair in the form of pinyin;
The first determination unit is configured to determine the similarity between the pinyin of the two participles in the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the participle pair;
The second determination unit is configured to, if the similarity meets a preset condition, determine the two participles in the participle pair as error correction participles of each other;
Wherein the preset condition is a threshold, and the similarity meets the preset condition when the similarity is greater than the threshold;
The marking unit includes a second marking module, the second marking module being configured to, if the participle pair contains a polyphonic character and at least one of the two participles in the participle pair contains a polyphonic character, mark the participle containing the polyphonic character in the form of two or more pinyin.
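Marking a participle that contains a polyphonic character with two or more pinyin can be illustrated by enumerating every combination of per-character readings. The readings table passed in is a hypothetical, hand-written stand-in for a real pronunciation lexicon, and `pinyin_readings` is a name chosen for this sketch.

```python
from itertools import product
from typing import Dict, List


def pinyin_readings(participle: str, readings: Dict[str, List[str]]) -> List[str]:
    """List every pinyin marking of a participle.

    `readings` maps each character to its possible pronunciations; a
    polyphonic character has more than one entry, so a participle that
    contains one yields two or more markings.
    Raises KeyError if a character is missing from the table.
    """
    per_character = [readings[character] for character in participle]
    return [" ".join(combination) for combination in product(*per_character)]
```

For instance, with the illustrative table `{"重": ["zhong", "chong"], "庆": ["qing"]}`, the participle "重庆" is marked as both `zhong qing` and `chong qing`, and both markings can then take part in the similarity comparison.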
14. A server, characterized in that the server comprises a processor and an external communication interface, the processor being configured to:
Collect a first corpus in the form of participle pairs;
Mark both participles of each participle pair in the first corpus in the form of pinyin;
Determine the similarity between the pinyin of the two participles in the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the participle pair;
If the similarity meets a preset condition, determine the two participles in the participle pair as error correction participles of each other, or determine the first participle as the error correction participle of the second participle;
Form an error correction dictionary from the participle pairs whose similarity meets the preset condition;
Send the error correction dictionary to a terminal through the external communication interface;
Wherein the preset condition is a threshold, and the similarity meets the preset condition when the similarity is greater than the threshold;
Marking both participles of each participle pair in the first corpus in the form of pinyin comprises: if the participle pair contains a polyphonic character and at least one of the two participles in the participle pair contains a polyphonic character, marking the participle containing the polyphonic character in the form of two or more pinyin.
15. A computer storage medium, characterized in that computer-executable instructions are stored in the computer storage medium, the computer-executable instructions being used to execute the text error correction method according to any one of claims 1 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610922072.0A CN106598939B (en) | 2016-10-21 | 2016-10-21 | A kind of text error correction method and device, server, storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610922072.0A CN106598939B (en) | 2016-10-21 | 2016-10-21 | A kind of text error correction method and device, server, storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106598939A (en) | 2017-04-26 |
CN106598939B (en) | 2019-09-17 |
Family
ID=58555570
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610922072.0A Active CN106598939B (en) | 2016-10-21 | 2016-10-21 | A kind of text error correction method and device, server, storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106598939B (en) |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107357775A (en) | 2017-06-05 | 2017-11-17 | 百度在线网络技术(北京)有限公司 | The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence |
CN107301866B (en) * | 2017-06-23 | 2021-01-05 | 北京百度网讯科技有限公司 | Information input method |
CN107564528B (en) * | 2017-09-20 | 2020-12-15 | 广东惠禾科技发展有限公司 | Method and equipment for matching voice recognition text with command word text |
CN109753636A (en) * | 2017-11-01 | 2019-05-14 | 阿里巴巴集团控股有限公司 | Machine processing and text error correction method and device calculate equipment and storage medium |
CN108052499B (en) * | 2017-11-20 | 2021-06-11 | 北京百度网讯科技有限公司 | Text error correction method and device based on artificial intelligence and computer readable medium |
CN108091328B (en) * | 2017-11-20 | 2021-04-16 | 北京百度网讯科技有限公司 | Speech recognition error correction method and device based on artificial intelligence and readable medium |
CN107958039A (en) * | 2017-11-21 | 2018-04-24 | 北京百度网讯科技有限公司 | A kind of term error correction method, device and server |
CN108563632A (en) * | 2018-03-29 | 2018-09-21 | 广州视源电子科技股份有限公司 | Modification method, system, computer equipment and the storage medium of word misspelling |
CN108519973A (en) * | 2018-03-29 | 2018-09-11 | 广州视源电子科技股份有限公司 | Detection method, system, computer equipment and the storage medium of word spelling |
CN108491392A (en) * | 2018-03-29 | 2018-09-04 | 广州视源电子科技股份有限公司 | Modification method, system, computer equipment and the storage medium of word misspelling |
CN108647346B (en) * | 2018-05-15 | 2021-10-29 | 苏州东巍网络科技有限公司 | Old people voice interaction method and system for wearable electronic equipment |
CN110019684B (en) * | 2018-08-17 | 2021-06-15 | 武汉斗鱼网络科技有限公司 | Method, device, terminal and storage medium for correcting search text |
CN111079412B (en) * | 2018-10-18 | 2024-01-23 | 北京嘀嘀无限科技发展有限公司 | Text error correction method and device |
CN109376362A (en) * | 2018-11-30 | 2019-02-22 | 武汉斗鱼网络科技有限公司 | A kind of the determination method and relevant device of corrected text |
CN111339755A (en) * | 2018-11-30 | 2020-06-26 | 中国移动通信集团浙江有限公司 | Automatic error correction method and device for office data |
CN109858473B (en) * | 2018-12-28 | 2023-03-07 | 天津幸福生命科技有限公司 | Self-adaptive deviation rectifying method and device, readable medium and electronic equipment |
CN109739368A (en) * | 2018-12-29 | 2019-05-10 | 咪咕文化科技有限公司 | A kind of method, apparatus of the fractionation of the Chinese phonetic alphabet |
CN111462748B (en) * | 2019-01-22 | 2023-09-26 | 北京猎户星空科技有限公司 | Speech recognition processing method and device, electronic equipment and storage medium |
CN109901727A (en) * | 2019-03-06 | 2019-06-18 | 上海依智医疗技术有限公司 | A kind of method and apparatus obtaining text error correction information |
CN110399608B (en) * | 2019-06-04 | 2023-04-25 | 深思考人工智能机器人科技(北京)有限公司 | Text error correction system and method for dialogue system based on pinyin |
CN110276077A (en) * | 2019-06-25 | 2019-09-24 | 上海应用技术大学 | The method, device and equipment of Chinese error correction |
CN110516248A (en) * | 2019-08-27 | 2019-11-29 | 出门问问(苏州)信息科技有限公司 | Method for correcting error of voice identification result, device, storage medium and electronic equipment |
CN112668311A (en) * | 2019-09-29 | 2021-04-16 | 北京国双科技有限公司 | Text error detection method and device |
CN111783433A (en) * | 2019-12-26 | 2020-10-16 | 北京沃东天骏信息技术有限公司 | Text retrieval error correction method and device |
CN111078898B (en) * | 2019-12-27 | 2023-08-08 | 出门问问创新科技有限公司 | Multi-tone word annotation method, device and computer readable storage medium |
CN111639495A (en) * | 2020-04-28 | 2020-09-08 | 深圳壹账通智能科技有限公司 | Parallel corpus generation method, device, equipment and storage medium |
CN111611792B (en) * | 2020-05-21 | 2023-05-23 | 全球能源互联网研究院有限公司 | Entity error correction method and system for voice transcription text |
CN112509581B (en) * | 2020-11-20 | 2024-03-01 | 北京有竹居网络技术有限公司 | Error correction method and device for text after voice recognition, readable medium and electronic equipment |
CN112417867B (en) * | 2020-12-07 | 2022-10-18 | 四川长虹电器股份有限公司 | Method and system for correcting video title error after voice recognition |
CN112560493B (en) * | 2020-12-17 | 2024-04-30 | 金蝶软件(中国)有限公司 | Named entity error correction method, named entity error correction device, named entity error correction computer equipment and named entity error correction storage medium |
CN112836039B (en) * | 2021-01-27 | 2023-04-21 | 成都网安科技发展有限公司 | Voice data processing method and device based on deep learning |
CN116013278B (en) * | 2023-01-06 | 2023-08-08 | 杭州健海科技有限公司 | Speech recognition multi-model result merging method and device based on pinyin alignment algorithm |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101080875A (en) * | 2005-09-01 | 2007-11-28 | 日本电信电话株式会社 | Error correcting method and apparatus |
US7565348B1 (en) * | 2005-03-24 | 2009-07-21 | Palamida, Inc. | Determining a document similarity metric |
CN102915314A (en) * | 2011-08-05 | 2013-02-06 | 腾讯科技(深圳)有限公司 | Automatic error correction pair generation method and system |
CN103365925A (en) * | 2012-04-09 | 2013-10-23 | 高德软件有限公司 | Method for acquiring polyphone spelling, method for retrieving based on spelling, and corresponding devices |
CN103914444A (en) * | 2012-12-29 | 2014-07-09 | 高德软件有限公司 | Error correction method and device thereof |
CN104750672A (en) * | 2013-12-27 | 2015-07-01 | 重庆新媒农信科技有限公司 | Chinese word error correction method used in search and device thereof |
CN104991889A (en) * | 2015-06-26 | 2015-10-21 | 江苏科技大学 | Fuzzy word segmentation based non-multi-character word error automatic proofreading method |
Also Published As
Publication number | Publication date |
---|---|
CN106598939A (en) | 2017-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106598939B (en) | A kind of text error correction method and device, server, storage medium | |
US11416679B2 (en) | System and method for inputting text into electronic devices | |
US11614862B2 (en) | System and method for inputting text into electronic devices | |
US10402493B2 (en) | System and method for inputting text into electronic devices | |
Kim et al. | Two-stage multi-intent detection for spoken language understanding | |
US9471566B1 (en) | Method and apparatus for converting phonetic language input to written language output | |
CN1918578B (en) | Handwriting and voice input with automatic correction | |
EP1686493A2 (en) | Dictionary learning method and device using the same, input method and user terminal device using the same | |
US20130041647A1 (en) | Method for disambiguating multiple readings in language conversion | |
CN105404621B (en) | A kind of method and system that Chinese character is read for blind person | |
CN103578464A (en) | Language model establishing method, speech recognition method and electronic device | |
JP2003514304A (en) | A linguistic input architecture that converts from one text format to another and is resistant to spelling, typing, and conversion errors | |
JP2003527676A (en) | A linguistic input architecture that converts one text format to the other text format with modeless input | |
CN111783443B (en) | Text disturbance detection method, disturbance recovery method, disturbance processing method and device | |
CN102915122B (en) | Based on the intelligent family moving platform spelling input method of language model | |
CN102214238A (en) | Device and method for matching similarity of Chinese words | |
CN112346696A (en) | Speech comparison of virtual assistants | |
CN1965349A (en) | Multimodal disambiguation of speech recognition | |
KR101777141B1 (en) | Apparatus and method for inputting chinese and foreign languages based on hun min jeong eum using korean input keyboard | |
Celikkaya et al. | A mobile assistant for Turkish | |
Saraçlar et al. | Utterance classification with discriminative language modeling | |
Perez-Cortes et al. | Improvement of embedded human-machine interfaces combining language, hypothesis and error models | |
JP2019159118A (en) | Output program, information processing device, and output control method | |
Yang et al. | A New Hybrid Method for Machine Transliteration | |
Ma et al. | Phrase-based approach for adaptive tokenization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||