CN106598939A

CN106598939A - Method and device for text error correction, server and storage medium

Info

Publication number: CN106598939A
Application number: CN201610922072.0A
Authority: CN
Inventors: 焦增涛
Original assignee: Beijing Sankuai Online Technology Co Ltd
Current assignee: Beijing Sankuai Online Technology Co Ltd
Priority date: 2016-10-21
Filing date: 2016-10-21
Publication date: 2017-04-26
Anticipated expiration: 2036-10-21
Also published as: CN106598939B

Abstract

The invention discloses a method and a device for text error correction. The method comprises: collecting a first corpus in the form of participle pairs; labeling the two participles of each participle pair in pinyin form; determining the similarity of the pinyin between the two participles in the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the participle pair; and, if the similarity meets a preset condition, determining the two participles in the participle pair to be error correction participles of each other, or determining the first participle to be an error correction participle of the second participle.

Description

Text error correction method and device, server, and storage medium

Technical field

The present invention relates to electronic technology, and in particular to a text error correction method and device, a server, and a storage medium.

Background

Text error correction is widely used in text input scenarios such as input methods, search engines, and speech recognition. It attempts to correct errors that may exist in text entered by the user (for example, keywords in Chinese or English) and recommends the likely correct input to the user. For Chinese, text error correction also needs to detect wrong word choices, wrong pinyin, wrong character forms, and other errors in the user's input, and to recommend the correct keyword the user probably intended to enter. Error correction can therefore guide users effectively as they enter keywords and correct the keyword errors that frequently occur in use. In text error correction, the similarity between the keyword being corrected and the correct keyword determines the accuracy of the correction. Similarity is currently computed mainly as follows: based on the pronunciation types of Mandarin initials and finals, the initials and finals are divided into several groups; the similarity between notes (that is, initials or finals) in the same group is defined as 1, and the similarity between notes in different groups is defined as 0; the Chinese characters are aligned by position, the pronunciation similarity at each position is computed one by one, and the average is taken as the resulting similarity. The shortcoming of this scheme is that defining the similarity as 1 within a group and 0 across groups cannot accurately describe how similar two notes are, and so ignores differences in similarity between notes. Within a group, for example, the degree of similarity between the labials b (as in "bo") and p (as in "po") differs from that between b and m (as in "mo"); and the similarity between notes in different groups is not entirely zero, for example the labial b and the velar g (as in "ge") are somewhat similar in pronunciation.
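The grouped 0/1 scheme described above can be illustrated with a minimal sketch. The two groups listed here (labials and velars) are illustrative assumptions only, since the source does not enumerate the groups, and the function names are hypothetical.

```python
# Illustrative grouping only; the source does not list the actual groups.
GROUPS = {
    "labial": {"b", "p", "m"},
    "velar": {"g", "k", "h"},
}

def group_of(note):
    """Return the name of the group containing the note, or None."""
    for name, members in GROUPS.items():
        if note in members:
            return name
    return None

def group_similarity(a, b):
    """Return 1 if both notes fall in the same group, otherwise 0."""
    ga, gb = group_of(a), group_of(b)
    return 1 if ga is not None and ga == gb else 0

def average_similarity(notes_a, notes_b):
    """Position-by-position average of the 0/1 similarities."""
    pairs = list(zip(notes_a, notes_b))
    if not pairs:
        return 0.0
    return sum(group_similarity(x, y) for x, y in pairs) / len(pairs)
```

Under this scheme b/p and b/m both score 1 and b/g scores 0, which is exactly the loss of gradation the paragraph above criticizes.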

Summary of the invention

In view of this, to solve at least one problem in the prior art, embodiments of the present invention provide a text error correction method and device, a server, and a storage medium, which mine note transition probabilities from texts with similar pronunciation and use them as note similarity, and can thereby increase the probability of successful error correction.

The technical solutions of the embodiments of the present invention are implemented as follows:

In a first aspect, an embodiment of the present invention provides a text error correction method, the method comprising:

collecting a first corpus in the form of participle pairs;

labeling both participles of each participle pair in the first corpus in pinyin form;

determining the similarity of the pinyin between the two participles in the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the participle pair;

if the similarity meets a preset condition, determining the two participles in the participle pair to be error correction participles of each other, or determining the first participle to be an error correction participle of the second participle.

In a second aspect, an embodiment of the present invention provides a text error correction device, the device comprising a first forming unit, a labeling unit, a first determining unit, and a second determining unit, wherein:

the first forming unit is configured to collect a first corpus in the form of participle pairs;

the labeling unit is configured to label both participles of each participle pair in the first corpus in pinyin form;

the first determining unit is configured to determine the similarity of the pinyin between the two participles in the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the participle pair;

the second determining unit is configured to, if the similarity meets a preset condition, determine the two participles in the participle pair to be error correction participles of each other, or determine the first participle to be an error correction participle of the second participle.

In a third aspect, an embodiment of the present invention provides a server, the server comprising a processor and an external communication interface, the processor being configured to:

collect a first corpus in the form of participle pairs;

if the similarity meets a preset condition, determine the two participles in the participle pair to be error correction participles of each other, or determine the first participle to be an error correction participle of the second participle;

form an error correction dictionary from the participle pairs whose similarity meets the preset condition;

send the error correction dictionary to a terminal through the external communication interface.

In a fourth aspect, an embodiment of the present invention provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the text error correction method provided in the first aspect above.

Embodiments of the present invention provide a text error correction method and device, a server, and a storage medium, in which: a first corpus is collected in the form of participle pairs; both participles of each participle pair in the first corpus are labeled in pinyin form; the similarity of the pinyin between the two participles in the participle pair is determined, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle; and, if the similarity meets a preset condition, the two participles in the participle pair are determined to be error correction participles of each other, or the first participle is determined to be an error correction participle of the second participle. In this way, note transition probabilities mined from texts with similar pronunciation are used as note similarity, which can increase the probability of successful error correction.

Description of the drawings

Fig. 1 is a first schematic flowchart of an implementation of the text error correction method according to an embodiment of the present invention;

Fig. 2-1 is a second schematic flowchart of an implementation of the text error correction method according to an embodiment of the present invention;

Fig. 2-2 is a schematic diagram of the relationship between the first computing device and the second computing device according to an embodiment of the present invention;

Fig. 2-3 is a third schematic flowchart of an implementation of the text error correction method according to an embodiment of the present invention;

Fig. 3-1 is a fourth schematic flowchart of an implementation of the text error correction method according to an embodiment of the present invention;

Fig. 3-2 is a schematic flowchart of the implementation of step S301 in Fig. 3-1;

Fig. 3-3 is a schematic flowchart of the implementation of step S302 in Fig. 3-1;

Fig. 3-4 is a schematic flowchart of the implementation of step S324 in Fig. 3-3;

Fig. 4 is a first schematic structural diagram of the text error correction device according to an embodiment of the present invention;

Fig. 5 is a second schematic structural diagram of the text error correction device according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of the server according to an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.

To solve the foregoing technical problem, an embodiment of the present invention provides a text error correction method used to form the error correction participles corresponding to a participle. In implementation, the method can be carried out by a processor of a first computing device invoking program code, and the program code can be stored in a computer storage medium; the first computing device therefore includes at least a processor and a storage medium. The first computing device can be any type of electronic device with information processing capability, for example a mobile phone, tablet computer, desktop computer, personal digital assistant, navigator, digital telephone, video telephone, or television set.

Fig. 1 is a first schematic flowchart of an implementation of the text error correction method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:

Step S101, collecting a first corpus in the form of participle pairs;

Here, step S101 is a corpus collection step. In implementation, step S101 can collect corpus from the following channels: dictionaries of Chinese near-homophones, dictionaries of dialect and standard pronunciations whose notes are easily confused, annotated speech recognition errors, and annotated errors from online input methods. The corpus is collected in the form of participle pairs (phrase-fragment pairs), for example: "logging off"-"lower limb slightly dragon", "coupons"-"cash equivalent volume", "comrades"-"bobbins", "comrades"-"notice door", "dried shrimps"-"villagers", "brined vegetable are too expensive"-"let's start the meeting", "sausage pickled melon"-"chief of township's speech", and "comrades"-"let's start the meeting". It should be noted that the first corpus is allowed to contain erroneous participle pairs; for example, the pinyin of the two participles in "comrades"-"let's start the meeting" differ considerably, and the pair would generally not be regarded as having similar pinyin. Other embodiments of the present invention also involve a second corpus used to form the initial/final similarity matrix. The second corpus can differ from the first: it can in effect be regarded as a standard corpus, that is, it should not contain erroneous participle pairs, whereas the first corpus may contain them. The first corpus is processed by the embodiment shown in Fig. 1 to form the participle sets provided by the present invention.

Step S102, labeling both participles of each participle pair in the first corpus in pinyin form;

Here, continuing the example in step S101 above, the pinyin labeled for "brined vegetable are too expensive" is "xian-cai-tai-gui", the pinyin labeled for "let's start the meeting" is "xian-zai-kai-hui", the pinyin labeled for "sausage pickled melon" is "xiang-chang-jiang-gua", and the pinyin labeled for "chief of township's speech" is "xiang-zhang-jiang-hua".

Step S103, determining the similarity of the pinyin between the two participles in the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the participle pair;

Here, continuing the example in step S101 above, the similarity of the pinyin is determined, for example, between the two participles "logging off" and "lower limb slightly dragon", between "comrades" and "bobbins", and between "comrades" and "let's start the meeting".

Here, determining the similarity of the pinyin between the two participles in the participle pair includes: using a preset initial/final similarity matrix to determine the similarity of the pinyin between the two participles in the participle pair. The process of determining the initial/final similarity matrix is described in other embodiments.

Step S104, judging whether the similarity meets a preset condition;

Here, the preset condition can be a threshold, and judging whether the similarity meets the preset condition includes: judging whether the similarity is greater than the threshold; if the similarity is greater than the threshold, determining that the similarity meets the preset condition; if the similarity is less than or equal to the threshold, determining that the similarity does not meet the preset condition.

Step S105, if the similarity meets the preset condition, determining the two participles in the participle pair to be error correction participles of each other, or determining the first participle to be an error correction participle of the second participle.

Here, continuing the example in step S103 above, step S105 can determine the two participles to be error correction participles of each other. For example, "brined vegetable are too expensive" is determined as an error correction participle of "let's start the meeting", and "let's start the meeting" as an error correction participle of "brined vegetable are too expensive"; likewise, "sausage pickled melon" is determined as an error correction participle of "chief of township's speech", and vice versa. Step S105 can also determine the first participle to be the error correction participle of the second participle; for example, "comrades" can be determined as the error correction participle of "bobbins", and "logging off" as the error correction participle of "lower limb slightly dragon". As can be seen, determining the participles as error correction participles of each other is in effect a bidirectional error correction mechanism, whereas determining the first participle as the error correction participle of the second is in effect a unidirectional one. In the bidirectional mechanism, both participles are common expressions in work, life, or study, and the two differ only in the context in which they are used. For example, "let's start the meeting" and "brined vegetable are too expensive" are error correction participles of each other: the former is generally used at work and the latter in daily life. In the unidirectional mechanism, the participle being corrected is generally regarded as an erroneous word. For example, "logging off" is the error correction participle of "lower limb slightly dragon", that is, "logging off" is used to correct "lower limb slightly dragon", which is generally regarded as an erroneous participle. Similarly, "comrades" is the error correction participle of "bobbins" and of "notice door", both of which are generally regarded as erroneous participles. It should be noted that the unidirectional mechanism can be converted into a bidirectional one under certain conditions; in some cases, for instance, "bobbins" or "notice door" may itself be considered a correct word.

In the embodiment of the present invention, an error correction dictionary can be formed from the error correction participles determined in step S105. That is, the method further includes: forming an error correction dictionary from the participle pairs whose similarity meets the preset condition; and sending the error correction dictionary to a terminal. The error correction dictionary includes a number of participle sets; each participle set includes at least one error correction participle, obtained by calculation from the pinyin similarity, for correcting the participle to be corrected. For example, the participle set corresponding to "comrades" includes "bobbins" and "notice door"; the set corresponding to "let's start the meeting" includes "brined vegetable are too expensive"; the set corresponding to "notice door" includes "bobbins" and "comrades"; and the set corresponding to "lower limb slightly dragon" includes "logging off".
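Steps S104 and S105, together with the dictionary formation described above, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the tuple layout, and the default threshold of 0.8 are hypothetical, and the similarity values are assumed to have been computed already.

```python
from collections import defaultdict

def build_correction_dictionary(scored_pairs, threshold=0.8):
    """Map each participle to be corrected to its set of error correction participles.

    scored_pairs: iterable of (first, second, similarity, bidirectional) tuples,
    where `first` is a candidate error correction participle of `second`.
    The threshold value is an assumption, not a value given by the source.
    """
    dictionary = defaultdict(set)
    for first, second, similarity, bidirectional in scored_pairs:
        if similarity <= threshold:
            continue  # preset condition not met: similarity must exceed the threshold
        dictionary[second].add(first)       # unidirectional: first corrects second
        if bidirectional:
            dictionary[first].add(second)   # bidirectional: second also corrects first
    return dict(dictionary)
```

Keying the dictionary by the participle to be corrected keeps lookup at correction time to a single access; whether a pair is treated as bidirectional is left to the caller, since the source describes that decision only qualitatively.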

In the above embodiment, step S102 needs to label both participles of the participle pair with pinyin. In labeling the pinyin, step S102 includes the following steps:

Step S121, judging whether the participle pair includes Arabic numerals;

Step S122, if the participle pair includes Arabic numerals, converting the Arabic numerals into the corresponding Chinese characters;

Here, suppose the participle is "speed 8" or "Mo Tai 168"; converting the Arabic numerals in the participle into Chinese characters gives "speed eight" or "Mo Tai one six eight".

Step S123, labeling the participles of the participle pair, after conversion to Chinese characters, in pinyin form;

Here, continuing the example in step S122 above, suppose the participle is "speed 8" or "Mo Tai 168"; when the pinyin of the two participles is labeled, the pinyin of "speed 8" is "su-ba", and the pinyin of "Mo Tai 168" is "mo-tai-yi-liu-ba" or "mo-tai-yao-liu-ba". The "one" in "Mo Tai one six eight" is a polyphone and can be labeled "yi" or "yao".

Step S124, judging whether the participle pair includes a polyphone;

Step S125, if the participle pair does not include a polyphone, labeling both participles of the participle pair in pinyin form.

Here, for example, the participle pair "lower limb slightly dragon"-"logging off" does not include a polyphone, so the two participles are labeled with pinyin directly: the pinyin of "lower limb slightly dragon" is "tui-cu-long", and the pinyin of "logging off" is "tui-chu-xi-tong".

Step S126, if the participle pair includes a polyphone, continuing to judge whether either of the two participles in the participle pair contains a polyphonic word;

Here, for example, the participle pair is "PianYiFang"-"variation side" or "PianYiFang"-"derogatory sense side". The first character of "PianYiFang" is a polyphone, with the readings "pián" (second tone) and "biàn" (fourth tone). The method therefore continues to judge whether either of the two participles contains a polyphonic word; in "PianYiFang", for example, "cheap" is a polyphonic word whose pinyin includes "bian-yi" and "pian-yi". In general, the corpus is collected in the form of participle pairs, so to improve efficiency this embodiment judges directly whether the participle pair contains a polyphonic word. For example, the character "single" is a polyphone: it is read "shàn" (fourth tone) as a surname, "chán" in the title of the ancient Xiongnu monarch, and "dān" (first tone) in ordinary phrases such as "loneliness". When the corpus is collected, if the participle pair is "loneliness"-"fighting single-handed", then although "single" is a polyphone, "loneliness" is not a polyphonic word; the pinyin of "loneliness" is unique, and there is no need to label "single" with the three readings "chán", "shàn" and "dān". The method provided by this embodiment can thus significantly improve computational efficiency. If a polyphone appears in the participle pair as a single character rather than within a phrase, every reading of that character needs to be labeled; since, as noted above, the corpus is collected in the form of participle pairs, this situation is very rare.

Step S127, if at least one of the two participles in the participle pair contains a polyphonic word, labeling the two or more pinyin readings of the polyphonic word as part or all of the pinyin of the corresponding participle in the participle pair.

Here, continuing the above example, the pinyin labeled for "PianYiFang" includes "bian-yi-fang" and "pian-yi-fang".
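The labeling in step S127 amounts to enumerating every combination of readings. A minimal sketch follows, assuming the readings of each word or character have already been looked up; the function name and the input layout are hypothetical.

```python
from itertools import product

def candidate_pinyins(readings_per_unit):
    """Enumerate every pinyin labeling of a participle.

    readings_per_unit: one list of possible readings per word or character,
    in order; a polyphonic word contributes two or more readings.
    """
    return ["-".join(combo) for combo in product(*readings_per_unit)]

# "PianYiFang": the polyphonic word "cheap" has two readings, the last character has one.
print(candidate_pinyins([["bian-yi", "pian-yi"], ["fang"]]))
# ['bian-yi-fang', 'pian-yi-fang']
```

A participle without polyphonic words yields exactly one labeling, which corresponds to the direct labeling of step S125.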

As can be seen from the above, step S102 is in effect a process of converting participle pairs into pinyin. In implementation, this step can be carried out by looking up a table of Chinese characters and pinyin, in which every character of the Chinese dictionary has its corresponding pinyin. The processing steps are as follows: 1) on encountering a non-polyphone, look it up in the table and convert it to pinyin; 2) on encountering a polyphone, look up the word that the character forms with its neighboring characters: if the word exists and has a unique pronunciation, convert it to pinyin; if the word exists but is itself still polyphonic, use a language model to determine the pronunciation (for example, "cheap": pian-yi, bian-yi); 3) if the word does not exist, use the default pronunciation (every polyphone in the table has a default pronunciation); 4) on encountering Arabic numerals, convert them to the corresponding Chinese characters and look those up; 5) on encountering English characters, pinyin labeling may be skipped; 6) on encountering a Chinese character that is not in the table, skip the character and set the pinyin at that position to empty.
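The table lookup described above can be sketched as follows. This is a simplified illustration, not the patented implementation: the miniature table, the function name, and the Chinese characters in the table (assumed originals of the translated "speed 8" example) are hypothetical, and the word-level and language-model handling of polyphones is omitted, so every character simply takes its default reading.

```python
# Digits are first converted to the corresponding Chinese numerals.
DIGIT_TO_HANZI = {"0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
                  "5": "五", "6": "六", "7": "七", "8": "八", "9": "九"}

# Hypothetical miniature table: character -> default reading.
PINYIN_TABLE = {"一": "yi", "六": "liu", "八": "ba", "速": "su"}

def to_pinyin(text):
    """Return one pinyin entry per character of `text`."""
    syllables = []
    for ch in text:
        if ch.isascii() and ch.isdigit():
            ch = DIGIT_TO_HANZI[ch]          # Arabic numeral -> Chinese character
        if ch.isascii() and ch.isalpha():
            syllables.append(ch)             # English character: left unlabeled
        else:
            # Character not in the table: the pinyin at this position is empty.
            syllables.append(PINYIN_TABLE.get(ch, ""))
    return syllables
```

With this table, `to_pinyin("速8")` gives `["su", "ba"]`, matching the "su-ba" labeling in the example of step S123.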

It should be noted that, among steps S121 to S127 above, there is no strict order of execution between steps S121 to S123 and steps S124 to S127. In implementation, steps S121 to S123 can be performed first, followed by steps S124 to S127; alternatively, steps S124 to S127 can be performed first, followed by steps S121 to S123.

In other embodiments of the present invention, step S103, which determines the similarity of the pinyin between the two participles in the participle pair, includes:

Step S131, aligning the initials of the pinyin of the two participles in the participle pair, and aligning the finals of the pinyin of the two participles;

Here, in other embodiments of the present invention, the initials of the pinyin of the two participles in the participle pair are aligned, and the finals are aligned, in the way that yields the most identical pronunciations. For example, the pinyin of "logging off"-"lower limb slightly dragon" is aligned as follows:

"logging off" --- t-ui-ch-u-x-i-t-ong;

"lower limb slightly dragon" --- t-ui-c-u-_-_-l-ong;

During alignment, to obtain the most identical pronunciations, the final "ong" of "long" is aligned with the final "ong" of "tong", rather than aligning the pinyin "long" with the pinyin "xi"; "_" denotes a gap. In this example, "logging off" (tui-chu-xi-tong) has four characters and "lower limb slightly dragon" (tui-cu-long) has three. The first alignment attempt simply follows character order: "tui" corresponds to "tui", "cu" to "chu", and "long" to "xi", while "tong" has no counterpart. The first group ("tui" and "tui") and the second group ("cu" and "chu") are highly similar, but the third group ("long" and "xi") has very low similarity. In this case the embodiment applies an offset: the third group becomes "xi" with no counterpart, and the fourth group becomes "long" corresponding to "tong". After the offset, the similarity of the first, second, and fourth groups is comparatively high. In the related art, when pronunciation-based text similarity is used, the two texts are aligned by character position; the drawback is that when a sentence has an extra or a missing character, all subsequent positions are misaligned. The embodiment of the present invention instead adopts the most-identical-pronunciations approach, which keeps the two texts aligned even when one of them has an extra or a missing character.

Step S132, calculating the transition probability of converting the pinyin of the first participle in the participle pair into the pinyin of the second participle;

Step S133, determining the similarity of the pinyin between the two participles in the participle pair according to the transition probability.
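The most-identical-pronunciations alignment of step S131 can be sketched with a small dynamic program. This is one possible realization under stated assumptions, not necessarily the procedure used in the embodiment: each character is represented as an (initial, final) tuple, a gap is represented by None, and the function name is hypothetical.

```python
def align(a, b):
    """Align two lists of (initial, final) syllables so that the number of
    identical notes is maximal; a missing counterpart is shown as None."""
    def same(x, y):
        # Number of identical notes (0, 1, or 2) between two syllables.
        return (x[0] == y[0]) + (x[1] == y[1])

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + same(a[i - 1], b[j - 1]),
                              score[i - 1][j], score[i][j - 1])

    # Trace back from the end to recover one optimal alignment.
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + same(a[i - 1], b[j - 1]):
            out.append((a[i - 1], b[j - 1])); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j]:
            out.append((a[i - 1], None)); i -= 1
        else:
            out.append((None, b[j - 1])); j -= 1
    while i > 0:
        out.append((a[i - 1], None)); i -= 1
    while j > 0:
        out.append((None, b[j - 1])); j -= 1
    return out[::-1]
```

For tui-chu-xi-tong against tui-cu-long this pairs "tui" with "tui", "chu" with "cu", leaves "xi" without a counterpart, and pairs "tong" with "long", which is the offset alignment described for step S131.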

In other embodiments of the present invention, two ways of implementing step S132 are provided below:

Mode one: the first way is relatively simple. The number of differing notes between the first participle and the second participle is determined, and the transition probability is then determined from that number and the length of the note string of the first participle or the second participle. The length of the note string can be the number of characters of the first participle multiplied by 2, or the number of characters of the second participle multiplied by 2, or the sum of the numbers of characters of the first and second participles multiplied by 2; because the pinyin of one Chinese character includes an initial and a final, the length of the note string is exactly twice the number of Chinese characters. Take the participle pair "logging off"-"lower limb slightly dragon" as an example. Assume the participle being corrected (the first participle) is "lower limb slightly dragon" and the error correction participle (the second participle) is "logging off". The number of differing notes between the pair is 4, namely "ch", "x", "i" and "t" (a note being an initial or a final), so the transition probability can be calculated as 4 ÷ 6 (6 being the length of the note string of the first participle), 4 ÷ 8 (8 being the length of the note string of the second participle), or 4 ÷ (6+8) (14 being the sum of the lengths of the note strings of the first and second participles). Conversely, assume the participle being corrected (the first participle) is "logging off" and the error correction participle (the second participle) is "lower limb slightly dragon". The number of differing notes between the pair is 2, namely "c" and "l", so the transition probability can be calculated as 2 ÷ 8 (8 being the length of the note string of the first participle), 2 ÷ 6 (6 being the length of the note string of the second participle), or 2 ÷ (8+6) (14 being the sum of the lengths of the note strings of the first and second participles).
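The arithmetic of mode one can be reproduced with a short sketch. One interpretation is assumed here, inferred from the worked numbers rather than stated by the source: the differing notes are the notes of the second participle that find no match in the first, counted with a longest-common-subsequence match. The function names and the `length` parameter are hypothetical.

```python
from fractions import Fraction

def lcs_length(a, b):
    """Length of the longest common subsequence of two note lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def transition_probability(first, second, length="first"):
    """Differing notes divided by a note-string length.

    first, second: note lists (initial, final, initial, final, ...).
    length: "first", "second", or "sum", selecting the denominator.
    """
    differing = len(second) - lcs_length(first, second)  # assumed interpretation
    denominators = {"first": len(first),
                    "second": len(second),
                    "sum": len(first) + len(second)}
    return Fraction(differing, denominators[length])

tui_cu_long = ["t", "ui", "c", "u", "l", "ong"]
tui_chu_xi_tong = ["t", "ui", "ch", "u", "x", "i", "t", "ong"]
print(transition_probability(tui_cu_long, tui_chu_xi_tong, "first"))  # 2/3, i.e. 4 ÷ 6
```

Syllables without an initial would need an explicit empty-initial placeholder for the note-string length to stay at twice the number of characters.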

It should be noted that the transition probability calculated above is inversely related to the similarity: the smaller the transition probability, the greater the similarity, and the larger the transition probability, the smaller the similarity. The transition probability lies in [0, 1], that is, it is greater than or equal to 0 and less than or equal to 1. A transition probability of 0 indicates that the notes of the first participle and the second participle are identical, as in "comrades"-"notice door"; a transition probability of 1 indicates that the notes of the two participles are completely different, as in "comrades"-"let's start the meeting". To give a more intuitive correspondence, that is, to make the transition probability easier to interpret, the similarity can be calculated with the following relation: similarity = 1 - transition probability. The similarity calculated in this way lies in [0, 1], that is, it is greater than or equal to 0 and less than or equal to 1. A similarity of 0 means that the pinyin of the two participles in the participle pair is entirely different, and a similarity of 1 means that the pinyin of the two participles is identical.
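The relation between the two quantities is a one-line conversion; the sketch below only adds a range check. The function name is hypothetical.

```python
def similarity_from_transition(p):
    """Convert a transition probability in [0, 1] into a similarity in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("transition probability must lie in [0, 1]")
    return 1.0 - p
```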

Mode two: the second way is to use a preset initial/final similarity matrix to calculate the transition probability of converting the pinyin of the first participle in the participle pair into the pinyin of the second participle. Step S132, calculating the transition probability of converting the pinyin of the first participle in the participle pair into the pinyin of the second participle, includes:

Step S1321, if the characters at the same position of the two aligned participles are homophones, adding 1 to the score Score, and advancing both the position in the first participle and the position in the second participle by 1;

Step S1322, if the characters at the same position of the two aligned participles are not homophones, determining the score Score of the pinyin of the first participle and the pinyin of the second participle according to the preset initial/final similarity matrix;

Step S1323, determining a normalized final score Sf according to the score Score, the number of characters of the first participle, and the number of characters of the second participle;

Step S1324, determining, according to the final score Sf, the transition probability of converting the pinyin of the first participle in the participle pair into the pinyin of the second participle.
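The scoring loop of steps S1321 to S1323 can be sketched in simplified form. Several points are assumptions rather than facts from the source: the two participles are taken to be already aligned and equally long, the handling of the non-homophone case covers only the thresholded product of initial and final similarity that this document describes for step S1322, the similarity lookups are passed in as functions, and the normalization (division by the larger character count) is a placeholder, since the passage above does not give the formula for Sf.

```python
def score_aligned(first, second, initial_sim, final_sim, first_preset=0.8):
    """Accumulate Score over two aligned lists of (initial, final) syllables.

    initial_sim, final_sim: hypothetical lookup functions into an
    initial/final similarity matrix, each returning a value in [0, 1].
    """
    score = 0.0
    for (i1, f1), (i2, f2) in zip(first, second):
        if (i1, f1) == (i2, f2):
            score += 1.0                      # step S1321: homophones add 1
        else:
            s = initial_sim(i1, i2) * final_sim(f1, f2)
            if s > first_preset:
                score += s                    # similar enough: add the product S
    return score

def final_score(score, len_first, len_second):
    """Normalize Score by the larger character count (assumed formula)."""
    return score / max(len_first, len_second)
```

Passing the matrix lookups as functions keeps the sketch independent of how the similarity matrix is stored.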

In other embodiments of the invention, determining the score Score of the pinyin of the first participle and the pinyin of the second participle according to the preset initial-final similarity matrix includes:

Step S13221, obtain, from the preset initial-final similarity matrix, the similarity between the initials (initial consonants) and the similarity between the finals (simple or compound vowels) of the characters at the same position of the two participles;

Step S13222, if the product S of the similarity between the initials and the similarity between the finals is greater than the first preset value, add S to the score Score, and advance both the position in the first participle and the position in the second participle of the participle pair by 1;

Step S13223, if the product S of the similarity between the initials and the similarity between the finals of the characters at the same position of the two participles is less than or equal to the first preset value, obtain, from the initial-final similarity matrix, for the character at the current position of the first participle and the character at the next position of the second participle: the similarity between their initials, the similarity between their finals, the similarity between the initial of the former and the final of the latter, and the similarity between the final of the former and the initial of the latter;

Here, the first preset value and the second and third preset values below can be empirical values; the three values can be identical, for example all 0.8, and can of course also differ.

Step S13224, determine the first maximum, which is the largest of the following three values: the product S of the similarity between the initials and the similarity between the finals of the character at the current position of the first participle and the character at the next position of the second participle; the similarity between the initial of the character at the current position of the first participle and the final of the character at the next position of the second participle; and the similarity between the final of the character at the current position of the first participle and the initial of the character at the next position of the second participle;

Step S13225, judge whether the first maximum is greater than the second preset value; if so, set the score Score to the previous score Score plus the first maximum, and advance both the position in the first participle and the position in the second participle of the participle pair by 1;

Step S13226, if the first maximum is less than or equal to the second preset value, obtain, from the initial-final similarity matrix, for the character at the next position of the first participle and the character at the current position of the second participle: the similarity between their initials, the similarity between their finals, the similarity between the initial of the former and the final of the latter, and the similarity between the final of the former and the initial of the latter;

Step S13227, determine the second maximum, which is the largest of the following three values: the product S of the similarity between the initials and the similarity between the finals of the character at the next position of the first participle and the character at the current position of the second participle; the similarity between the initial of the character at the next position of the first participle and the final of the character at the current position of the second participle; and the similarity between the final of the character at the next position of the first participle and the initial of the character at the current position of the second participle;

Step S13228, judge whether the second maximum is greater than the third preset value; if so, set the score Score to the previous score Score plus the second maximum, and advance both the position in the first participle and the position in the second participle of the participle pair by 1; then judge whether the characters at the updated positions of the first participle and the second participle are homophones. The pinyin of the first participle and the pinyin of the second participle are traversed through the above steps to calculate the score Score.

In embodiments of the present invention, a method of determining the initial-final similarity matrix is also provided. Determining the initial-final similarity matrix includes:

Step S140, collect a second corpus in the form of participle pairs, and annotate both participles of each participle pair in the second corpus in pinyin form;

Here, the second corpus is used to form the initial-final similarity matrix. The second corpus can differ from the aforementioned first corpus: the second corpus can be regarded as a standard corpus, i.e., it should not contain erroneous participle pairs, whereas the first corpus can contain erroneous participle pairs. The first corpus is processed by the embodiment shown in Fig. 1 to form the participle set provided by the present invention.

Here, the pinyin annotation can use the aforementioned annotation method, for example the method of aligning by the most identical pronunciations.

Step S141, determine the total number of times the first note is mispronounced in the second corpus, the first note being an initial or a final;

Here, the second corpus can be a standard corpus; to a certain extent, the richness of the corpus determines the accuracy of error correction. In the present embodiment, in addition to the aforementioned dictionaries, corpus collection also includes session logs between users, and corpus material is mined from online session logs. The main purpose of this step is to mine, in combination with the application domain, error correction candidates that meet the application goal from the users' history logs. The approach to mining session logs likewise uses the pronunciation similarity of text as the similarity measure. In general, there are two main methods: a) user sessions (e.g., customer service sessions): mine the pronunciation errors that users actively repair, i.e., from the session context, mine results with similar pronunciation among repeated, differing inputs; b) manually customized key domain phrases: mine error-prone candidates, i.e., in combination with business goals, manually customize key domain phrases and mine from a large volume of logs the results whose pronunciation is similar to the customized phrases.

Step S142, determine the number of times the first note is mispronounced as the second note;

Step S143, determine the probability that the first note transfers to the second note according to the total number of times the first note is mispronounced and the number of times the first note is mispronounced as the second note;

Step S144, determine the total number of times the second note is mispronounced in the second corpus, the second note being an initial or a final;

Step S145, determine the number of times the second note is mispronounced as the first note;

Step S146, determine the probability that the second note transfers to the first note according to the total number of times the second note is mispronounced and the number of times the second note is mispronounced as the first note;

Step S147, determine the similarity between the first note and the second note according to the probability that the first note transfers to the second note and the probability that the second note transfers to the first note.

Here, "let's start the meeting"-"brined vegetable are too expensive" is taken as an example. First, the participle pair is annotated with pinyin as follows:

Let's start the meeting: x ian z ai t ai h ui;

Brined vegetable are too expensive: x ian c ai t ai g ui;

After alignment, the inconsistent notes are z and c, and h and g.

Now calculate the probability that the first note z transfers to the second note c (i.e., the note transition probability p(c|z)): align all the collected pronunciation-confusable participle pairs, count the occurrences of each kind of inconsistent note, and calculate the note transition probability p(c|z):

p(c|z) = count(z->c)/count(z) (1);

In formula (1): p(c|z) is the transition probability that note z is mispronounced as note c; count(z->c) is the number of times note z is mispronounced as c in the second corpus; count(z) is the total number of times note z is mispronounced in the second corpus;

The probability p(z|c) that the second note c is mispronounced as the first note z is calculated in the same way.

Then the pronunciation similarity Sim(c, z) between note z and note c is determined from the probability p(c|z) that the first note z transfers to the second note c and the probability p(z|c) that the second note c is mispronounced as the first note z. In implementation, formula (2) can be used:

Sim(c, z) = (p(c|z) + p(z|c))/2 (2).
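Formulas (1) and (2) can be illustrated with a minimal Python sketch. It assumes that each corpus entry is an already aligned pair of note sequences ordered as (intended, observed), an ordering the text does not state explicitly; the function names and the two extra toy pairs are hypothetical and serve only to make the arithmetic visible.

```python
from collections import Counter

def count_confusions(aligned_pairs):
    """Count note confusions in aligned (intended, observed) note sequences."""
    confusions = Counter()  # (src, dst) -> times src was mispronounced as dst
    totals = Counter()      # src -> total times src was mispronounced
    for intended, observed in aligned_pairs:
        for src, dst in zip(intended, observed):
            if src != dst:
                confusions[(src, dst)] += 1
                totals[src] += 1
    return confusions, totals

def transition_probability(confusions, totals, src, dst):
    """Formula (1): p(dst|src) = count(src->dst) / count(src)."""
    if totals[src] == 0:
        return 0.0  # assumption: a note never seen mispronounced gets 0
    return confusions[(src, dst)] / totals[src]

def note_similarity(confusions, totals, a, b):
    """Formula (2): Sim(a, b) = (p(b|a) + p(a|b)) / 2."""
    return (transition_probability(confusions, totals, a, b)
            + transition_probability(confusions, totals, b, a)) / 2

pairs = [
    # The pair from the text, already split into notes.
    ("x ian z ai t ai h ui".split(), "x ian c ai t ai g ui".split()),
    ("z ao".split(), "zh ao".split()),  # made-up toy pair
    ("c ai".split(), "z ai".split()),   # made-up toy pair
]
confusions, totals = count_confusions(pairs)
print(transition_probability(confusions, totals, "z", "c"))  # p(c|z)
print(note_similarity(confusions, totals, "c", "z"))         # Sim(c, z)
```

In this toy corpus z is mispronounced twice (once as c, once as zh) and c once (as z), so p(c|z) = 1/2, p(z|c) = 1 and Sim(c, z) = 0.75.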

It should be noted that notes include initials and finals, so the initial-final similarity matrix actually comprises at least three matrices: the similarity matrix between initials and initials, the similarity matrix between initials and finals, and the similarity matrix between finals and finals. Assuming there are 21 initials, the similarity matrix between initials and initials is a 21 × 21 square matrix; assuming there are 39 finals, the similarity matrix between finals and finals is a 39 × 39 square matrix, and the similarity matrix between initials and finals is a 21 × 39 matrix. In later embodiments, if the transition probability between two initials needs to be determined, the similarity matrix between initials and initials can be queried directly; if the transition probability between two finals needs to be determined, the similarity matrix between finals and finals can be queried directly; if the transition probability between an initial and a final needs to be determined, the similarity matrix between initials and finals can be queried directly.

Based on the foregoing embodiments, the embodiment of the present invention further provides a text error correction method. In implementation, the method can be realized by a processor of a second computing device calling program code, and the program code can of course be stored in a computer storage medium; it can be seen that the second computing device includes at least a processor and a storage medium. The second computing device can be any of various types of electronic devices with information processing capability, for example a mobile phone, tablet computer, desktop computer, personal digital assistant, navigator, digital telephone, video telephone, television set, etc.

Fig. 2-1 is implementation flow diagram 2 of the text error correction method of the embodiment of the present invention. As shown in Fig. 2-1, the method includes:

Step S201, determine the participle to be corrected, which is a participle in the sentence input by the user;

Here, in implementation, the text input by the user is often a sentence or several consecutive participles, so the text input by the user needs to be segmented, and it can be broken in the form of participles. For example, the user input may be "bobbins, let's start the meeting for we"; during segmentation, function words such as modal particles and auxiliary words can be removed, and the text is broken in the form of participles, the result being "bobbins-we-let's start the meeting". After the break, the participles to be corrected are determined to include: "bobbins", "we" and "let's start the meeting".

Step S202, judge whether there is a participle set corresponding to the participle to be corrected, the participle set including at least one error correction participle, derived from pinyin similarity calculation, for correcting the participle to be corrected;

Here, taking "bobbins" as an example, this means judging whether "bobbins" has a corresponding participle set; as known from the above, the participle set of "bobbins" includes "comrades" and "notice door". As another example, taking "let's start the meeting", this means judging whether "let's start the meeting" has a corresponding participle set; as known from the above, the participle set of "let's start the meeting" includes "brined vegetable are too expensive".

Step S203, determine the first language model score, which is the language model score of the participle to be corrected in the sentence;

Here, continuing the example in step S202, this means determining the language model score of "bobbins" and the language model score of "let's start the meeting".

Step S204, determine the second language model scores, which are the language model scores, in the sentence, of the respective error correction participles in the participle set of the participle to be corrected;

Here, continuing the example in step S202, this means determining the language model scores of "comrades" and "notice door", and the language model score of "brined vegetable are too expensive".

Step S205, judge whether any of the second language model scores is greater than the first language model score, and obtain a judgment result;

Here, continuing the example in the above steps, "let's start the meeting" has only one corresponding second language model score, namely the language model score of "brined vegetable are too expensive", while "bobbins" has two corresponding second language model scores, namely the language model scores of "comrades" and "notice door". The judgment here is therefore whether the language model score of "brined vegetable are too expensive" is greater than that of "let's start the meeting", whether the language model score of "comrades" is greater than that of "bobbins", and whether the language model score of "notice door" is greater than that of "bobbins". Assume the language model score of "comrades" is higher than those of "bobbins" and "notice door", and the language model score of "let's start the meeting" is higher than that of "brined vegetable are too expensive". Then for "let's start the meeting", the judgment result is that no second language model score is greater than the first language model score; for "bobbins", the judgment result is that there is a second language model score greater than the first language model score.

Step S206, perform error correction on the participle to be corrected according to the judgment result.

Here, step S206, performing error correction on the participle to be corrected according to the judgment result, includes:

Step S2061, if any of the second language model scores is greater than the first language model score, determine the error correction participle with the highest language model score as the error correction word for the participle to be corrected. Here, step S206 also includes: replacing the first participle with the error correction word of the first participle, and outputting the result.

Step S2062, if none of the second language model scores is greater than the first language model score, do not perform error correction on the participle to be corrected.

In this example, assuming the language model score of "comrades" is higher than those of "bobbins" and "notice door", "bobbins" is corrected to "comrades"; assuming the language model score of "let's start the meeting" is higher than that of "brined vegetable are too expensive", no error correction is performed on "let's start the meeting".
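The decision in steps S203 to S206 can be sketched in Python as follows. This is only an illustration under stated assumptions: `score` stands in for a language model that returns the score of the sentence with the given participle in place, and the numeric scores in `TOY_SCORES` are invented to reproduce the example; neither is specified by the text.

```python
def correct_participle(word, candidates, score):
    """Return the replacement for `word`, or `word` itself if no candidate wins.

    word       -- the participle to be corrected
    candidates -- error correction participles from its participle set
    score      -- hypothetical callable: participle -> language model score
    """
    first_score = score(word)                      # step S203
    best = max(candidates, key=score, default=None)  # step S204
    if best is not None and score(best) > first_score:  # steps S205, S2061
        return best
    return word                                    # step S2062: leave unchanged

# Invented scores, chosen only to match the example in the text.
TOY_SCORES = {
    "bobbins": 0.2,
    "comrades": 0.9,
    "notice door": 0.1,
    "let's start the meeting": 0.8,
    "brined vegetable are too expensive": 0.3,
}

print(correct_participle("bobbins", ["comrades", "notice door"], TOY_SCORES.get))
print(correct_participle("let's start the meeting",
                         ["brined vegetable are too expensive"], TOY_SCORES.get))
```

With these toy scores, "bobbins" is replaced by "comrades", while "let's start the meeting" is returned unchanged.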

It should be noted that, in implementation, step S202 can judge whether there is a participle set corresponding to the participle to be corrected by querying preset association information. In implementation, the association information can be realized as a list, an association relation, etc., and is used to indicate the correspondence between participles to be corrected and participle sets. The association information can be preset (originating from the first computing device); it can of course also be issued by the first computing device to the second computing device, or requested by the second computing device from the first computing device. In other words, referring to Fig. 2-2, the first computing device 10 that implements the technical solution shown in Fig. 1 can be regarded as a server for the second computing devices 21 and 22 that implement the method shown in Fig. 2-1, and the second computing devices can be regarded as terminals of the first computing device. The first computing device 10 can also update the association information on users' second computing devices 21 and 22 regularly or irregularly.

In other embodiments of the invention, referring to Fig. 2-3, on the basis of the method shown in Fig. 1, the method also includes:

Step S230, the terminal sends an error correction request to the server, the error correction request carrying the sentence input by the user;

Here, a client is installed on the terminal side, and the client can take the form of an application program (App, Application). The user inputs a sentence (or text) at the terminal; the client detects the sentence input by the user, carries the sentence in an error correction request, and then sends the error correction request to the server.

Step S231, the server receives the error correction request sent by the terminal;

Step S232, the server determines the participle to be corrected, which is a participle in the sentence input by the user;

Here, in general, the text input by the user is often a sentence or several consecutive participles, so the text input by the user needs to be segmented, and it can be broken in the form of participles. For example, the user input may be "bobbins, let's start the meeting for we"; during segmentation, function words such as modal particles and auxiliary words can be removed, and the text is broken in the form of participles, the result being "bobbins-we-let's start the meeting". After the break, the participles to be corrected are determined to include: "bobbins", "we" and "let's start the meeting".

Step S233, the server judges whether the error correction dictionary contains a participle set corresponding to the participle to be corrected;

Step S234, if there is a participle set corresponding to the participle to be corrected, the server determines the first language model score and the second language model scores. The first language model score is the language model score of the participle to be corrected in the sentence; the participle set includes at least one error correction participle for correcting the participle to be corrected; the second language model scores are the language model scores, in the sentence, of the respective error correction participles in the participle set;

Step S235, judge whether any of the second language model scores is greater than the first language model score;

Step S236, if any of the second language model scores is greater than the first language model score, the server determines the error correction participle with the highest language model score as the error correction word for the participle to be corrected;

Here, the above steps S232 to S236 are similar to steps S201 to S206 in the embodiment shown in Fig. 2-1, and those skilled in the art can understand steps S232 to S236 with reference to that embodiment.

Step S237, the server carries the error correction word in a first error correction response and sends the first error correction response to the terminal.

Step S238, if none of the second language model scores is greater than the first language model score, or if there is no participle set corresponding to the participle to be corrected, the server sends a second error correction response, which indicates that no error correction is performed on the participle to be corrected.

Step S239, the terminal receives the error correction response sent by the server. When the received response is determined to be a first error correction response, the terminal corrects the sentence input by the user according to the first error correction response; when the received response is determined to be a second error correction response, the terminal does not correct the sentence input by the user.
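The exchange in steps S230 to S239 can be sketched as two plain Python functions, one for each side. The response layout (a dict with a `kind` field), the function names and the toy scores are all assumptions made for illustration; the text does not define a message format or a transport.

```python
def handle_correction_request(participles, correction_dict, score):
    """Server side (steps S232 to S238): build an error correction response.

    correction_dict -- hypothetical mapping: participle -> list of candidates
    score           -- hypothetical stand-in for the language model score
    """
    corrections = {}
    for word in participles:
        candidates = correction_dict.get(word, [])
        if not candidates:
            continue                      # no participle set for this word
        best = max(candidates, key=score)
        if score(best) > score(word):
            corrections[word] = best      # error correction word found
    if corrections:
        return {"kind": "first", "corrections": corrections}
    return {"kind": "second"}             # nothing to correct

def apply_response(participles, response):
    """Terminal side (step S239): apply a first response, ignore a second."""
    if response["kind"] != "first":
        return list(participles)
    return [response["corrections"].get(w, w) for w in participles]

# Invented data matching the running example.
scores = {"bobbins": 0.2, "comrades": 0.9, "notice door": 0.1, "we": 0.5,
          "let's start the meeting": 0.8,
          "brined vegetable are too expensive": 0.3}
dictionary = {"bobbins": ["comrades", "notice door"],
              "let's start the meeting": ["brined vegetable are too expensive"]}
sentence = ["bobbins", "we", "let's start the meeting"]

response = handle_correction_request(sentence, dictionary, scores.get)
print(apply_response(sentence, response))
```

Keeping the two response kinds distinct mirrors the text: the terminal only rewrites the sentence when the server explicitly returns an error correction word.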

In the embodiment shown in Fig. 2-1, language model scoring is completed on the terminal side, whereas in the present embodiment the server completes language model scoring based on the terminal's request. It can be seen that, when the error correction method's consumption of terminal hardware is relatively low, the method shown in Fig. 2-1 can be used, so that text error correction can be completed without networking, i.e., the method can be completed offline; when the error correction method's consumption of hardware is relatively high, the method shown in Fig. 2-3 can be used, which saves hardware resources on the terminal but requires the terminal to be networked with the server.

Based on the foregoing embodiments, the embodiment of the present invention provides a text error correction method based on Chinese pronunciation similarity, which can be applied to correcting Chinese speech recognition results and Chinese pinyin input method results, and can also be used directly as a feature in Chinese semantic similarity calculation. Fig. 3-1 is implementation flow diagram 4 of the text error correction method of the embodiment of the present invention. As shown in Fig. 3-1, the method includes:

Step S301, pronunciation similarity dictionary mining;

Here, as shown in Fig. 3-2, step S301 further comprises the following steps:

Step S311, collect phrase pairs whose pronunciations are easily confused;

Here, this step is a corpus collection step; corpus material can be collected from the following channels: dictionaries of Chinese near-homophones; dictionaries of dialect notes that are easily confused with standard pronunciation; speech recognition error annotation results; online input method error annotation results.

Here, the corpus is collected in the form of phrase fragment pairs, such as "logging off" --- "lower limb is slightly imperial", "cash equivalent volume" --- "generation gold note", "comrades" --- "bobbins", "dried shrimps" --- "villagers", "brined vegetable are too expensive" --- "let's start the meeting", "sausage pickled melon" --- "chief of township's speech".

Step S312, convert the phrase pairs to pinyin;

This step is realized by looking up a Chinese character-pinyin table to obtain the pinyin corresponding to each character in the Chinese dictionary. The processing steps are as follows: 1) for a non-polyphonic character, look up the table and convert it to pinyin; 2) for a polyphonic character, look up the table for the words formed by the character and its neighboring characters: if such a word exists and has a unique pronunciation, convert it to pinyin; if such a word exists but still has multiple pronunciations, use a language model to determine the pronunciation (for example, the pronunciations of the word rendered as "cheap" include "pian-yi" and "bian-yi"); if no such word exists, use the default pronunciation (each polyphonic character in the table has a default pronunciation); 3) for Arabic numerals, convert them to the corresponding Chinese characters and then look up the table; 4) for English characters, do no processing; 5) for a Chinese character not in the table, skip the character and set its position to empty.
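The five lookup rules can be sketched in Python as follows. The two tiny tables, the two-character word window, and the `first_option` function that stands in for the language model are all assumptions for illustration; a real system would use a full character-pinyin table and an actual language model. "Do no processing" for English characters is read here as leaving them unchanged.

```python
DIGITS = dict(zip("0123456789", "零一二三四五六七八九"))
# Toy tables (assumptions): the first reading listed is the default one.
CHAR_TABLE = {"方": ["fang"], "便": ["bian", "pian"], "宜": ["yi"], "二": ["er"]}
WORD_TABLE = {"方便": [["fang", "bian"]],
              "便宜": [["pian", "yi"], ["bian", "yi"]]}

def first_option(word, options):
    """Stand-in for the language model: simply takes the first reading."""
    return options[0]

def reading_of(chars, i, pick):
    readings = CHAR_TABLE[chars[i]]
    if len(readings) == 1:              # rule 1: non-polyphonic character
        return readings[0]
    for start in (i, i - 1):            # rule 2: words formed with neighbors
        if start < 0:
            continue
        word = "".join(chars[start:start + 2])
        options = WORD_TABLE.get(word)
        if options:
            chosen = options[0] if len(options) == 1 else pick(word, options)
            return chosen[i - start]
    return readings[0]                  # no such word: default reading

def to_pinyin(text, pick=first_option):
    chars = [DIGITS.get(ch, ch) for ch in text]  # rule 3: digits -> characters
    out = []
    for i, ch in enumerate(chars):
        if ch.isascii() and ch.isalpha():
            out.append(ch)              # rule 4: English letters left as they are
        elif ch in CHAR_TABLE:
            out.append(reading_of(chars, i, pick))
        else:
            out.append(None)            # rule 5: unknown character -> empty slot
    return out

print(to_pinyin("方便"))
print(to_pinyin("便宜"))
```

Passing a different `pick` callable is where a language model would plug in when a word still has several readings.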

Step S313, split the pinyin into initials and finals and align;

Here, because only a minority of notes are mispronounced in a phrase pair with close pronunciation, alignment by the most identical pronunciations is used here, for example:

Let's start the meeting: x ian z ai t ai h ui;

Brined vegetable are too expensive: x ian c ai t ai g ui;

After alignment, the inconsistent notes are z and c, and h and g.
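Splitting and comparing the aligned syllables can be sketched as follows. The sketch assumes the two phrases have the same number of syllables and are compared position by position, which is a simplification of alignment by the most identical pronunciations; the list of 21 initials follows standard Hanyu Pinyin, and the function names are hypothetical.

```python
# The 21 initials of standard Hanyu Pinyin; two-letter initials come first
# so that "zh" is not mistaken for "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

def split_syllable(syllable):
    """Split a pinyin syllable into (initial, final); initial may be empty."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable

def mismatched_notes(first, second):
    """List (note_in_first, note_in_second) pairs that differ, by position."""
    result = []
    for a, b in zip(first, second):
        (ia, fa), (ib, fb) = split_syllable(a), split_syllable(b)
        if ia != ib:
            result.append((ia, ib))
        if fa != fb:
            result.append((fa, fb))
    return result

print(mismatched_notes(["xian", "zai", "tai", "hui"],
                       ["xian", "cai", "tai", "gui"]))
```

For the example above this yields the pairs (z, c) and (h, g), which are the inconsistent notes named in the text.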

Step S314, calculate the transition probabilities between initials and finals;

Here, all the collected pronunciation-confusable pairs are aligned as described above, the occurrences of each kind of inconsistent note are counted, and the transition probability p(c|z) that note z is mispronounced as note c is calculated:

p(c|z) = count(z->c)/count(z);

Where p(c|z) is the transition probability that note z is mispronounced as note c; count(z->c) is the number of times note z is mispronounced as c in the corpus; count(z) is the total number of times note z is mispronounced in the corpus.

Step S315, calculate the similarity score between any two notes;

Here, using p(c|z) and p(z|c) calculated in the previous step, the pronunciation similarity between note z and note c is defined as: Sim(c, z) = (p(c|z) + p(z|c))/2;

By calculating the similarity between any two notes, the initial-final similarity matrix between notes can be obtained, which includes the similarity matrices between initials and initials, between initials and finals, and between finals and finals.

Step S302, phrase pronunciation similarity calculation;

Based on the pronunciation similarity between notes calculated in step S301, this step calculates the pronunciation similarity between any two given phrases. The specific flow is shown in Fig. 3-3 and includes:

Step S321, preprocessing, such as converting Arabic numerals ("2" is converted to "two") to make pinyin extraction easier;

Step S322, convert Chinese characters to pinyin, as in step S312;

Step S323, split the pronunciation of each character in the pinyin string into initial and final;

Step S324, traverse the two pinyin strings character by character and calculate the pronunciation similarity score;

Here, first let the current position of the first participle be pos₁ and the current position of the second participle be pos₂; ScoreSS, ScoreYY and ScoreSY are respectively the similarity scores between initial and initial, between final and final, and between initial and final, and can be obtained by querying the above initial-final similarity matrix; Score is the score. The calculation of the pronunciation similarity score, referring to Fig. 3-4, includes:

Step S3241, start: set pos₁=1, pos₂=1;

Step S3242, judge whether the character at the current position of the first participle is identical to the character at the current position of the second participle; if identical, Score+=1, pos₁+=1, pos₂+=1, and continue by returning to step S3242; if not identical, go to step S3243;

Step S3243, judge whether S=ScoreSS*ScoreYY is greater than 0.8; if S>0.8, then Score+=S, pos₁+=1, pos₂+=1, and continue by returning to step S3242; if S≤0.8, the similarity with the adjacent characters of the first participle and the second participle is to be determined, and go to step S3244.

Step S3244, if S=ScoreSS*ScoreYY≤0.8, judge whether at pos₁ and pos₂+1 there is S=max(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2)>0.8;

If at pos₁ and pos₂+1 there is S=max(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2)>0.8, then Score+=S, pos₁+=1, pos₂+=2, and continue by returning to step S3242; if at pos₁ and pos₂+1 there is S=max(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2)≤0.8, go to step S3245;

Step S3245, judge whether at pos₁+1 and pos₂ there is S=max(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2)>0.8;

If at pos₁+1 and pos₂ there is S=max(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2)>0.8, then Score+=S, pos₁+=2, pos₂+=1, and continue by returning to step S3242;

When the traversal by the above steps S3242 to S3245 ends, Score is the similarity score of the two participles.
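The traversal in steps S3241 to S3245 can be sketched in Python as follows. Several points are assumptions rather than statements of the text: syllables are represented as (initial, final) tuples and compared by pinyin, the matrices are plain dicts keyed by note pairs, identical notes score 1 and unknown pairs 0, and when none of the three tests succeeds both positions advance by one (the text does not say what happens in that case).

```python
def sim(matrix, a, b):
    """Symmetric lookup; identical notes score 1, unknown pairs 0 (assumption)."""
    if a == b:
        return 1.0
    return matrix.get((a, b), matrix.get((b, a), 0.0))

def window_sim(x, y, ss, yy, sy):
    """max(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2) for syllables x and y."""
    return max(sim(ss, x[0], y[0]) * sim(yy, x[1], y[1]),
               sy.get((x[0], y[1]), 0.0),   # initial of x vs final of y
               sy.get((y[0], x[1]), 0.0))   # final of x vs initial of y

def traverse_score(a, b, ss, yy, sy, threshold=0.8):
    """a, b: lists of (initial, final); ss, yy, sy: similarity matrices."""
    pos1 = pos2 = 0                          # 0-based; the text counts from 1
    score = 0.0
    while pos1 < len(a) and pos2 < len(b):
        x, y = a[pos1], b[pos2]
        if x == y:                           # S3242: same pronunciation
            score += 1.0
            pos1, pos2 = pos1 + 1, pos2 + 1
            continue
        s = sim(ss, x[0], y[0]) * sim(yy, x[1], y[1])
        if s > threshold:                    # S3243: similar at same position
            score += s
            pos1, pos2 = pos1 + 1, pos2 + 1
            continue
        if pos2 + 1 < len(b):                # S3244: pos1 against pos2 + 1
            s = window_sim(x, b[pos2 + 1], ss, yy, sy)
            if s > threshold:
                score += s
                pos1, pos2 = pos1 + 1, pos2 + 2
                continue
        if pos1 + 1 < len(a):                # S3245: pos1 + 1 against pos2
            s = window_sim(a[pos1 + 1], y, ss, yy, sy)
            if s > threshold:
                score += s
                pos1, pos2 = pos1 + 2, pos2 + 1
                continue
        pos1, pos2 = pos1 + 1, pos2 + 1      # assumption: skip unmatched pair
    return score

# Invented similarity values for the running example.
ss = {("z", "c"): 0.9, ("h", "g"): 0.85}
first = [("x", "ian"), ("z", "ai"), ("t", "ai"), ("h", "ui")]
second = [("x", "ian"), ("c", "ai"), ("t", "ai"), ("g", "ui")]
print(traverse_score(first, second, ss, {}, {}))
```

The window tests in S3244 and S3245 are what let the traversal step over one extra or missing character in either participle instead of misaligning everything that follows.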

Step S325, normalize the similarity score as follows:

Sf=Score*2/ (Size1*Size2)

Where: Sf is the final score after normalization, Score is the score from the traversal in the previous step, Size1 is the number of characters of the first Chinese character string, and Size2 is the number of characters of the second Chinese character string;

Step S303, error correction candidate excavates；

Based on the similarity calculating method of upper step, error correction candidate is excavated in interactive log from line.The main mesh of this step Be connected applications field, the error correction candidate for meeting application target is excavated from user's history daily record.

The thinking of error correction candidate is excavated as conventional error correction problem thinking, difference is using the pronunciation similarity of text As measuring similarity.Main method has two：A) user conversation (for example customer service is to session), digging user actively repairs pronunciation Mistake, from session context, excavates repeatedly the pronunciation analog result between different inputs；B) artificial customization field emphasis is short Language, excavates fallibility candidate；With reference to business objective, artificial customization field emphasis phrase is excavated and customization phrase from a large amount of daily records The similar result of pronunciation.

Step S304: error correction；

Online error correction is performed on the basis of the error correction candidate pairs (participle sets). The approach of the embodiment of the present invention is as follows：

1) The user input S0 is segmented into participles；

For phrases formed by combining adjacent words (phrases of 1 to 4 adjacent words are tried in turn), the candidates are searched for an error correction candidate. If an error correction candidate exists, the corresponding phrase in the original input is replaced, which yields one possible user input Si (i = 1, 2, ...).

2) The language model scores of the user's original input S0 and of each possible input Si are calculated (a language model score measures the fluency of a sentence)；

3) The score of S0 is compared with the scores of the Si；

If the score of S0 is the highest, no error correction is performed; if the score of some Si is higher, error correction is performed using the replacement of that Si.
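The online correction flow of step S304 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `generate_variants`, `correct` and `toy_lm_score`, the dictionary layout of the candidates, and the toy fluency weights are all assumptions. In particular, `toy_lm_score` is only a stand-in for a real language model.

```python
def generate_variants(words, candidates, max_span=4):
    """Return every variant Si obtained by replacing one run of
    1..max_span adjacent words that has an error correction candidate."""
    variants = []
    for start in range(len(words)):
        for span in range(1, max_span + 1):
            end = start + span
            if end > len(words):
                break
            phrase = "".join(words[start:end])
            for replacement in candidates.get(phrase, ()):
                variants.append(words[:start] + [replacement] + words[end:])
    return variants


def correct(words, candidates, lm_score, max_span=4):
    """Keep S0 unless some Si scores strictly higher under the language model."""
    best, best_score = list(words), lm_score(words)
    for variant in generate_variants(words, candidates, max_span):
        score = lm_score(variant)
        if score > best_score:
            best, best_score = variant, score
    return best


# Hypothetical per-word fluency weights standing in for a language model.
TOY_WEIGHTS = {"订": 2.0, "定": 1.0}


def toy_lm_score(words):
    return sum(TOY_WEIGHTS.get(w, 0.0) for w in words)


s0 = ["我", "想", "定", "外卖"]
print(correct(s0, {"定": ["订"]}, toy_lm_score))
```

Requiring a strictly higher score means that a tie leaves the user's original input unchanged, which matches the rule that no correction is made when S0 scores highest.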

As can be seen from the above embodiments, the embodiment of the present invention mines note transition probabilities from text with similar pronunciation and uses them as note similarities, and it relaxes the alignment requirement on pinyin, that is, it allows the most similar note to be found within a window. When a participle containing Arabic numerals is processed, the Arabic numerals are first converted to Chinese characters, so that the similarity between a participle containing Arabic numerals and other participles can be calculated. Through the above technical means, the technical solution provided by the embodiment of the present invention has the following technical advantages: 1) the pronunciation similarity is obtained by a statistics-based method whose data come from user behavior, so it better represents the similarity between notes in real application scenarios and gives more accurate results；2) the degree of pronunciation similarity between notes of different pronunciation types, and between notes of the same pronunciation type, is obtained as a floating-point value, which makes the similarities between different notes more comparable；3) during position alignment when the pronunciation similarity of text is calculated, the best alignment is allowed to be found within a window, which makes the similarity calculation robust to missing or extra characters.

Based on the foregoing embodiments, an embodiment of the present invention provides a text error correction device. Each unit included in the device, each module included in each unit, and each sub-module included in each module may be implemented by a processor in a first computing device, or of course by a specific logic circuit. In a specific embodiment, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.

Fig. 4 is a first schematic structural diagram of the text error correction device of an embodiment of the present invention. As shown in Fig. 4, the device 400 includes a first forming unit 401, a marking unit 402, a first determining unit 403, a first judging unit 404 and a second determining unit 405, wherein：

The first forming unit 401 is configured to collect a first corpus in the form of participle pairs；

The marking unit 402 is configured to mark both participles in the participle pair in pinyin form；

The first determining unit 403 is configured to determine the pinyin similarity between the two participles in the participle pair, where the similarity indicates the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the participle pair；

The first judging unit 404 is configured to judge whether the similarity meets a preset condition；

The second determining unit 405 is configured to, if the similarity meets the preset condition, determine the two participles in the participle pair as error correction participles of each other.

In other embodiments of the present invention, the marking unit includes a first judging module and a first marking module, wherein：

The first judging module is configured to judge whether the participle pair contains a polyphonic character；

The first marking module is configured to, if the participle pair does not contain a polyphonic character, mark both participles in the participle pair in pinyin form.

In other embodiments of the present invention, the marking unit further includes a second judging module and a second marking module, wherein：

The second judging module is configured to, if the participle pair contains a polyphonic character, further judge whether either of the two participles in the participle pair contains a polyphonic word；

The second marking module is configured to, if at least one of the two participles in the participle pair contains a polyphonic word, mark the two or more pinyin readings of the polyphonic word as part or all of the pinyin of the corresponding participle in the participle pair.
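One way to picture the marking of polyphonic words is to enumerate every pinyin sequence a participle can take. The sketch below is only an assumption-laden illustration: the reading table `READINGS` is a hypothetical toy dictionary, and enumerating the Cartesian product of readings is one possible interpretation of marking two or more pinyin readings for a participle; the patent text does not prescribe it.

```python
from itertools import product

# Hypothetical toy reading table; a real system would use a full pinyin dictionary.
READINGS = {
    "银": ["yin"],
    "行": ["hang", "xing"],
    "长": ["chang", "zhang"],
}


def pinyin_variants(word):
    """Return every pinyin sequence obtained by choosing one reading per character.

    Characters missing from the table are passed through unchanged.
    """
    per_char = [READINGS.get(ch, [ch]) for ch in word]
    return [list(combo) for combo in product(*per_char)]


print(pinyin_variants("银行"))
```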

In other embodiments of the present invention, the marking unit includes a third judging module, a conversion module and a third marking module, wherein：

The third judging module is configured to judge whether the participle pair contains Arabic numerals；

The conversion module is configured to, if the participle pair contains Arabic numerals, convert the Arabic numerals into the corresponding Chinese characters；

The third marking module is configured to mark the participles of the participle pair, after conversion to Chinese characters, in pinyin form.
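The conversion of Arabic numerals before pinyin marking can be sketched as follows. This is a minimal illustration under a stated assumption: digits are converted one at a time (so "110" is read digit by digit), because the text does not say how multi-digit numbers should be read. The names `DIGIT_TO_HANZI` and `digits_to_chinese` are hypothetical.

```python
# Digit-by-digit mapping; reading "10" as 十 rather than 一零 is NOT handled here.
DIGIT_TO_HANZI = {
    "0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
    "5": "五", "6": "六", "7": "七", "8": "八", "9": "九",
}


def digits_to_chinese(text: str) -> str:
    """Replace each Arabic digit with its Chinese character; keep other characters."""
    return "".join(DIGIT_TO_HANZI.get(ch, ch) for ch in text)


print(digits_to_chinese("3号线"))
```

After this step the participle contains only Chinese characters, so it can be marked in pinyin form and compared with other participles in the usual way.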

In other embodiments of the present invention, the first determining unit includes an alignment module, a computing module and a first determining module, wherein：

The alignment module is configured to align the initials of the pinyin of the two participles in the participle pair and to align the finals of the pinyin of the two participles；

The computing module is configured to calculate the transition probability of converting the pinyin of the first participle in the participle pair into the pinyin of the second participle；

The first determining module is configured to determine the pinyin similarity between the two participles in the participle pair according to the transition probability.

In other embodiments of the present invention, the alignment module is configured to align the initials of the pinyin of the two participles in the participle pair and to align the finals of the pinyin of the two participles, using the alignment that yields the most identical pronunciations.

In other embodiments of the present invention, the computing module includes a calculating sub-module, a first determining sub-module, a second determining sub-module and a conversion sub-module, wherein：

The calculating sub-module is configured to, if the characters at the same position of the two aligned participles are homophones, add 1 to the score Score and advance by 1 both the position in the first participle and the position in the second participle of the participle pair；

The first determining sub-module is configured to, if the characters at the same position of the two aligned participles are not homophones, determine the score Score of the pinyin of the first participle and the pinyin of the second participle according to a preset initial-final similarity matrix；

The second determining sub-module is configured to determine the normalized final score Sf according to the score Score, the number of characters in the first participle and the number of characters in the second participle；

The conversion sub-module is configured to determine, according to the final score Sf, the transition probability of converting the pinyin of the first participle in the participle pair into the pinyin of the second participle.

In other embodiments of the present invention, the second determining sub-module is configured to：

obtain, according to the preset initial-final similarity matrix, the similarity between the initials and the similarity between the finals of the characters at the same position of the two participles；

if the product S of the similarity between the initials and the similarity between the finals is greater than a first preset value, add S to the score Score and advance by 1 both the position in the first participle and the position in the second participle of the participle pair；

if the product S of the similarity between the initials and the similarity between the finals of the characters at the same position of the two participles is less than or equal to the first preset value, obtain, according to the initial-final similarity matrix, for the character at the current position of the first participle and the character at the next position of the second participle: the similarity between the initials, the similarity between the finals, the similarity between the initial and the final, and the similarity between the final and the initial；

determine a first maximum, which is the largest of the following three values for the character at the current position of the first participle and the character at the next position of the second participle: the product S of the similarity between their initials and the similarity between their finals; the similarity between the initial of the former and the final of the latter; and the similarity between the final of the former and the initial of the latter；

judge whether the first maximum is greater than a second preset value; if so, the score Score is the previously obtained Score plus the first maximum, and both the position in the first participle and the position in the second participle of the participle pair are advanced by 1；

if the first maximum is less than or equal to the second preset value, obtain, according to the initial-final similarity matrix, for the character at the next position of the first participle and the character at the current position of the second participle: the similarity between the initials, the similarity between the finals, the similarity between the initial and the final, and the similarity between the final and the initial；

determine a second maximum, which is the largest of the following three values for the character at the next position of the first participle and the character at the current position of the second participle: the product S of the similarity between their initials and the similarity between their finals; the similarity between the initial of the former and the final of the latter; and the similarity between the final of the former and the initial of the latter；

judge whether the second maximum is greater than a third preset value; if so, the score Score is the previously obtained Score plus the second maximum, and both the position in the first participle and the position in the second participle of the participle pair are advanced by 1; then judge whether the characters at the updated positions of the first participle and the second participle are homophones. The pinyin of the first participle and the pinyin of the second participle are traversed by the above steps and the score Score is calculated.
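The traversal described above can be sketched as follows. This is a simplified reading, not the patented procedure: each character is represented as an (initial, final) tuple; the matrix is a plain dictionary keyed by note pairs; the thresholds `t1`, `t2`, `t3` default to a hypothetical 0.5; after a match found by looking one position ahead, the sketch moves past the looked-ahead character; and when nothing matches, both positions advance without adding to the score. The last two behaviors are assumptions made so that the loop terminates.

```python
def sim(matrix, a, b):
    """Look up note similarity; identical notes score 1, unknown pairs 0."""
    if a == b:
        return 1.0
    return matrix.get((a, b), matrix.get((b, a), 0.0))


def best_match(matrix, c1, c2):
    """Largest of: initial*final product, initial-vs-final, final-vs-initial."""
    (i1, f1), (i2, f2) = c1, c2
    return max(sim(matrix, i1, i2) * sim(matrix, f1, f2),
               sim(matrix, i1, f2),
               sim(matrix, f1, i2))


def traverse_score(w1, w2, matrix, t1=0.5, t2=0.5, t3=0.5):
    """Accumulate Score over two pinyin-marked participles with a one-step window."""
    i = j = 0
    score = 0.0
    while i < len(w1) and j < len(w2):
        if w1[i] == w2[j]:                      # homophones at this position
            score += 1.0
            i, j = i + 1, j + 1
            continue
        (i1, f1), (i2, f2) = w1[i], w2[j]
        s = sim(matrix, i1, i2) * sim(matrix, f1, f2)
        if s > t1:                              # similar enough in place
            score += s
            i, j = i + 1, j + 1
            continue
        if j + 1 < len(w2):                     # current of w1 vs next of w2
            m1 = best_match(matrix, w1[i], w2[j + 1])
            if m1 > t2:
                score += m1
                i, j = i + 1, j + 2             # assumption: skip the extra character
                continue
        if i + 1 < len(w1):                     # next of w1 vs current of w2
            m2 = best_match(matrix, w1[i + 1], w2[j])
            if m2 > t3:
                score += m2
                i, j = i + 2, j + 1             # assumption: skip the extra character
                continue
        i, j = i + 1, j + 1                     # assumption: no match, move on
    return score


matrix = {("b", "p"): 0.6}                      # hypothetical similarity value
print(traverse_score([("b", "o")], [("p", "o")], matrix))
```

The window makes the score tolerant of a single extra character: a participle with one inserted character can still be matched against its shorter counterpart.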

In other embodiments of the present invention, the device further includes a third determining unit configured to determine the initial-final similarity matrix. The third determining unit further includes a second determining module, a third determining module, a fourth determining module, a fifth determining module, a sixth determining module, a seventh determining module and an eighth determining module, wherein：

The second determining module is configured to determine the total number of times a first note is mispronounced in a second corpus, the first note being an initial or a final；

The third determining module is configured to determine the number of times the first note is mispronounced as a second note；

The fourth determining module is configured to determine the probability that the first note transfers to the second note according to the total number of times the first note is mispronounced and the number of times the first note is mispronounced as the second note；

The fifth determining module is configured to determine the total number of times the second note is mispronounced in the second corpus, the second note being an initial or a final；

The sixth determining module is configured to determine the number of times the second note is mispronounced as the first note；

The seventh determining module is configured to determine the probability that the second note transfers to the first note according to the total number of times the second note is mispronounced and the number of times the second note is mispronounced as the first note；

The eighth determining module is configured to determine the similarity between the first note and the second note according to the probability that the first note transfers to the second note and the probability that the second note transfers to the first note.
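The counting performed by the second to eighth determining modules can be sketched as follows. The transfer probability follows the description above (the number of times a note is mispronounced as another note, divided by the total number of times it is mispronounced). How the two directional probabilities are combined into one similarity is not stated in this passage, so the arithmetic mean used in `note_similarity` is purely an assumption, as are the function names and the toy confusion data.

```python
from collections import Counter


def transfer_probabilities(confusions):
    """confusions: iterable of (intended_note, mispronounced_as) pairs.

    Returns {(a, b): P(a -> b)} where P(a -> b) is the count of a being
    mispronounced as b divided by the total count of a being mispronounced.
    """
    pair_counts = Counter(confusions)
    totals = Counter()
    for (src, _dst), n in pair_counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in pair_counts.items()}


def note_similarity(probs, a, b):
    """Assumption: combine P(a -> b) and P(b -> a) by their arithmetic mean."""
    if a == b:
        return 1.0
    return (probs.get((a, b), 0.0) + probs.get((b, a), 0.0)) / 2


# Hypothetical confusion observations mined from a second corpus.
observed = [("b", "p"), ("b", "p"), ("b", "m"), ("p", "b")]
probs = transfer_probabilities(observed)
print(note_similarity(probs, "b", "p"))
```

Filling a table with `note_similarity` for every pair of initials and finals would give one possible form of the initial-final similarity matrix, with floating-point entries rather than the 0/1 values of group-based schemes.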

It should be noted that the description of the above device embodiment is similar to the description of the above method embodiment and has beneficial effects similar to those of the method embodiment. For technical details not disclosed in the device embodiment of the present invention, refer to the description of the method embodiment of the present invention.

Based on the foregoing embodiments, an embodiment of the present invention provides a text error correction device. Each unit included in the device may be implemented by a processor in a second computing device, or of course by a specific logic circuit. In a specific embodiment, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.

Fig. 5 is a second schematic structural diagram of the text error correction device of an embodiment of the present invention. As shown in Fig. 5, the device 500 includes a fourth determining unit 501, a second judging unit 502, a fifth determining unit 503, a sixth determining unit 504, a third judging unit 505 and an error correction unit 506, wherein：

The fourth determining unit 501 is configured to determine a participle to be corrected, the participle to be corrected being a participle in a sentence input by the user；

The second judging unit 502 is configured to judge whether there is a participle set corresponding to the participle to be corrected, the participle set containing at least one error correction participle that is derived from a pinyin similarity calculation and is used to correct the participle to be corrected；

The fifth determining unit 503 is configured to determine a first language model score, which is the language model score of the participle to be corrected in the sentence；

The sixth determining unit 504 is configured to determine second language model scores, which are the language model scores, in the sentence, of each error correction participle in the participle set for the participle to be corrected；

The third judging unit 505 is configured to judge whether any of the second language model scores is greater than the first language model score, and to obtain a judgment result；

The error correction unit 506 is configured to correct the participle to be corrected according to the judgment result.

In other embodiments of the present invention, the error correction unit is configured to: if any of the second language model scores is greater than the first language model score, determine the error correction participle with the highest language model score as the error correction word for the participle to be corrected; if none of the second language model scores is greater than the first language model score, leave the participle to be corrected uncorrected.

In other embodiments of the present invention, the device further includes a first forming unit, a marking unit, a first determining unit, a first judging unit, a second determining unit and a second forming unit, wherein：

The marking unit is configured to mark both participles in the participle pair in pinyin form；

The first judging unit is configured to judge whether the similarity meets a preset condition；

The second determining unit is configured to, if the similarity meets the preset condition, determine the two participles in the participle pair as error correction participles of each other；

The second forming unit is configured to form the participle set according to the error correction participles.

Based on the foregoing embodiments, an embodiment of the present invention provides a computing device. Fig. 6 is a schematic structural diagram of the server of an embodiment of the present invention. As shown in Fig. 6, the computing device 600 may include components such as at least one processor 601, at least one communication bus 602, a user interface 603, at least one external communication interface 604, and a memory 605 for storing an executable program. The communication bus 602 is used to implement connection and communication among the processor 601, the user interface 603, the external communication interface 604 and the memory 605. The user interface 603 may include a display screen and a keyboard. The external communication interface 604 may optionally include a wired interface and a wireless interface.

The processor 601 is configured to：

collect a first corpus in the form of participle pairs；

send the error correction dictionary to a terminal through the external communication interface 604.

It should be noted that the description of the above server embodiment is similar to the description of the above method and has the same beneficial effects as the method embodiment. For technical details not disclosed in the server embodiment of the present invention, those skilled in the art may refer to the description of the method embodiment of the present invention.

It should be noted that, in the embodiment of the present invention, if the above text error correction method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiment of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM, Read Only Memory), a magnetic disk or an optical disc. Thus, the embodiment of the present invention is not limited to any specific combination of hardware and software. Correspondingly, an embodiment of the present invention further provides a computer storage medium in which computer-executable instructions are stored, the computer-executable instructions being used to perform the text error correction method in the embodiment of the present invention.

It should be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of the present invention. Therefore, occurrences of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. In addition, these particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention. The sequence numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.

It should be noted that, herein, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.

In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms. The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units；they may be located in one place or distributed over multiple network units；some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit；the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (Read Only Memory, ROM), a magnetic disk or an optical disc. Alternatively, if the above integrated unit of the present invention is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiment of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk or an optical disc.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A text error correction method, characterized in that the method comprises：

collecting a first corpus in the form of participle pairs；

determining the pinyin similarity between the two participles in the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the participle pair；

if the similarity meets a preset condition, determining the two participles in the participle pair as error correction participles of each other, or determining the first participle as an error correction participle of the second participle.

2. The method according to claim 1, characterized in that marking both participles in the participle pair in pinyin form comprises：

if the participle pair does not contain a polyphonic character, marking both participles in the participle pair in pinyin form.

3. The method according to claim 2, characterized in that marking both participles in the participle pair in pinyin form further comprises：

if the participle pair contains a polyphonic character and at least one of the two participles in the participle pair contains a polyphonic word, marking the two or more pinyin readings of the polyphonic word as part or all of the pinyin of the corresponding participle in the participle pair.

4. The method according to claim 1, characterized in that marking both participles in the participle pair in pinyin form comprises：

if the participle pair contains Arabic numerals, converting the Arabic numerals into the corresponding Chinese characters；

marking the participles of the participle pair, after conversion to Chinese characters, in pinyin form.

5. The method according to any one of claims 1 to 4, characterized in that determining the pinyin similarity between the two participles in the participle pair comprises：

aligning the initials of the pinyin of the two participles in the participle pair and aligning the finals of the pinyin of the two participles；

calculating the transition probability of converting the pinyin of the first participle in the participle pair into the pinyin of the second participle；

determining the pinyin similarity between the two participles in the participle pair according to the transition probability.

6. The method according to claim 5, characterized in that aligning the initials of the pinyin of the two participles in the participle pair and aligning the finals of the pinyin of the two participles comprises：

aligning the initials of the pinyin of the two participles in the participle pair and aligning the finals of the pinyin of the two participles, using the alignment that yields the most identical pronunciations.

7. The method according to claim 5, characterized in that calculating the transition probability of converting the pinyin of the first participle in the participle pair into the pinyin of the second participle comprises：

determining the number of notes that differ between the first participle and the second participle；

determining the transition probability according to the number of differing notes and the length of the note string of the first participle or the second participle.

8. The method according to claim 5, characterized in that calculating the transition probability of converting the pinyin of the first participle in the participle pair into the pinyin of the second participle comprises：

if the characters at the same position of the two aligned participles are homophones, adding 1 to the score Score and advancing by 1 both the position in the first participle and the position in the second participle of the participle pair；

if the characters at the same position of the two aligned participles are not homophones, determining the score Score of the pinyin of the first participle and the pinyin of the second participle according to a preset initial-final similarity matrix；

determining the normalized final score Sf according to the score Score, the number of characters in the first participle and the number of characters in the second participle；

determining, according to the final score Sf, the transition probability of converting the pinyin of the first participle in the participle pair into the pinyin of the second participle.

9. The method according to claim 1, characterized in that determining the pinyin similarity between the two participles in the participle pair comprises：

determining the pinyin similarity between the two participles in the participle pair using a preset initial-final similarity matrix.

10. The method according to claim 8 or 9, characterized in that determining the initial-final similarity matrix comprises：

collecting a second corpus in the form of participle pairs；

marking both participles of each participle pair in the second corpus in pinyin form；

determining the total number of times a first note is mispronounced in the second corpus, the first note being an initial or a final；

determining the number of times the first note is mispronounced as a second note；

determining the probability that the first note transfers to the second note according to the total number of times the first note is mispronounced and the number of times the first note is mispronounced as the second note；

determining the total number of times the second note is mispronounced in the second corpus, the second note being an initial or a final；

determining the number of times the second note is mispronounced as the first note；

determining the probability that the second note transfers to the first note according to the total number of times the second note is mispronounced and the number of times the second note is mispronounced as the first note；

determining the similarity between the first note and the second note according to the probability that the first note transfers to the second note and the probability that the second note transfers to the first note, and forming the initial-final similarity matrix according to the similarity between the first note and the second note.

11. The method according to claim 1, characterized in that the method further comprises：

sending the error correction dictionary to a terminal.

12. The method according to claim 1, characterized in that the method further comprises：

receiving an error correction request sent by a terminal, the error correction request carrying a sentence input by a user；

determining a participle to be corrected, the participle to be corrected being a participle in the sentence input by the user；

if there is a participle set corresponding to the participle to be corrected, determining a first language model score and second language model scores, the first language model score being the language model score of the participle to be corrected in the sentence, the participle set containing at least one error correction participle used to correct the participle to be corrected, and the second language model scores being the language model scores, in the sentence, of each error correction participle in the participle set for the participle to be corrected；

if any of the second language model scores is greater than the first language model score, determining the error correction participle with the highest language model score as the error correction word for the participle to be corrected；

carrying the error correction word in a first error correction response and sending the first error correction response to the terminal.

13. The method according to claim 12, characterized in that the method further comprises:

If none of the second language model scores is greater than the first language model score, or if no participle set corresponding to the participle to be corrected exists, sending a second error correction response, the second error correction response indicating that the participle to be corrected is not corrected.
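The decision flow of claims 12 and 13 can be sketched as follows. This is a sketch under stated assumptions rather than the claimed implementation: the function names, the toy bigram table, the default penalty of -5.0 and the English example words are all hypothetical, and the language model is reduced to a sum of bigram scores purely for illustration.

```python
def correct_word(sentence_words, index, correction_dict, lm_score):
    """Return (word, corrected_flag) for the participle at position `index`."""
    target = sentence_words[index]
    candidates = correction_dict.get(target)
    if not candidates:
        return target, False  # no participle set exists: second response

    def score_with(candidate):
        # Score the sentence with the candidate substituted at `index`.
        replaced = sentence_words[:index] + [candidate] + sentence_words[index + 1:]
        return lm_score(replaced)

    first_score = lm_score(sentence_words)  # score of the original sentence
    best = max(candidates, key=score_with)  # highest-scoring candidate
    if score_with(best) > first_score:
        return best, True  # first response carries the error correction word
    return target, False  # no candidate beats the original: second response


# Hypothetical bigram log-scores standing in for a real language model.
BIGRAMS = {
    ("order", "pizza"): -1.0,
    ("pizza", "online"): -1.5,
    ("order", "visa"): -3.0,
    ("visa", "online"): -2.0,
}


def toy_lm_score(words):
    """Sum bigram scores; unseen bigrams get an assumed penalty of -5.0."""
    return sum(BIGRAMS.get(pair, -5.0) for pair in zip(words, words[1:]))


result = correct_word(["order", "pisa", "online"], 1,
                      {"pisa": ["pizza", "visa"]}, toy_lm_score)
```

Returning a flag together with the word lets the caller choose between the first and the second error correction response without scoring the sentence again.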

14. A text error correction device, characterized in that the device comprises a first forming unit, a labelling unit, a first determining unit and a second determining unit, wherein:

The first determining unit is configured to determine the similarity of the pinyin between the two participles in the participle pair, the similarity indicating the degree of similarity between the pinyin of the first participle and the pinyin of the second participle in the participle pair;

The second determining unit is configured to, if the similarity meets a preset condition, determine the two participles in the participle pair as error correction participles of each other.
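The behaviour of the two determining units can be sketched as follows. This is a minimal sketch with several assumptions: the preset condition is taken to be "similarity is at least a threshold", the threshold of 0.8 is arbitrary, and the string ratio from `difflib` is only a stand-in for the pinyin similarity, which the method derives from the initial-and-final similarity matrix. All names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def toy_pinyin_similarity(pinyin_a, pinyin_b):
    """Stand-in similarity: plain string ratio, not the matrix-based measure."""
    return SequenceMatcher(None, pinyin_a, pinyin_b).ratio()


def build_correction_dict(pairs, similarity, threshold=0.8):
    """Map each participle to the participles accepted as its corrections.

    `pairs` holds participle pairs already labelled with pinyin, as
    ((word_a, pinyin_a), (word_b, pinyin_b)).
    """
    corrections = defaultdict(set)
    for (word_a, pinyin_a), (word_b, pinyin_b) in pairs:
        # Assumed preset condition: similarity reaches the threshold.
        if similarity(pinyin_a, pinyin_b) >= threshold:
            corrections[word_a].add(word_b)  # each becomes a correction
            corrections[word_b].add(word_a)  # candidate for the other
    return dict(corrections)


# Hypothetical labelled pairs: one homophone pair, one unrelated pair.
pairs = [
    (("北京", "beijing"), ("背景", "beijing")),
    (("北京", "beijing"), ("上海", "shanghai")),
]
dictionary = build_correction_dict(pairs, toy_pinyin_similarity)
```

Passing the similarity in as a parameter keeps the thresholding step independent of how the pinyin similarity is actually computed.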

15. A server, characterized in that the server comprises a processor and an external communication interface, the processor being configured to:

Collect a first corpus in the form of participle pairs;

If the similarity meets a preset condition, determine the two participles in the participle pair as error correction participles of each other, or determine the first participle as an error correction participle of the second participle.

16. A computer-readable storage medium, characterized in that computer-executable instructions are stored in the computer-readable storage medium, the computer-executable instructions being used to execute the text error correction method according to any one of claims 1 to 13.