CN106598939A - Method and device for text error correction, server and storage medium - Google Patents
- Publication number
- CN106598939A (application CN201610922072.0A)
- Authority
- CN
- China
- Prior art keywords
- participle
- pinyin
- error correction
- participle pair
- similarity
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a method and a device for text error correction. The method comprises: collecting a first corpus in the form of participle pairs; labelling both participles of each participle pair in pinyin form; determining the pinyin similarity between the two participles of the pair, where the similarity indicates how closely the pinyin of the first participle matches the pinyin of the second participle; and, if the similarity meets a preset condition, either determining the two participles as error correction participles of each other or determining the first participle as the error correction participle of the second participle.
Description
Technical field
The present invention relates to electronic technology, and more particularly to a text error correction method and device, a server, and a storage medium.
Background
Text error correction technology is widely used in text input scenarios such as input methods, search engines and speech recognition. It tries to correct errors in the text a user enters (for example, Chinese or English keywords) and recommends the likely correct input to the user. For Chinese, error correction must also detect wrong characters, wrong pinyin, wrong glyphs and similar mistakes in the user's input, and then recommend the keyword the user probably meant to enter. Error correction therefore guides users effectively when they enter keywords, and it can fix the keyword errors that occur frequently in practice.
In text error correction, the accuracy of a correction depends on the similarity between the keyword being corrected and the correct keyword. Current similarity calculation mainly works as follows. The Mandarin initials and finals are divided into several groups according to their articulation type. Two sounds in the same group are given a similarity of 1, and two sounds in different groups a similarity of 0. The Chinese characters of the two texts are then aligned by position, the pronunciation similarity at each position is computed in turn, and the average is taken as the result. The drawback of this scheme is that a 1-within-group, 0-across-groups definition cannot describe accurately how similar two sounds are, so real differences are ignored. Within the labial group, for example, the similarity between b (as in 玻 bō) and p (as in 坡 pō) differs from the similarity between b and m (as in 摸 mō), yet both pairs score 1. Conversely, every cross-group similarity is zero, even though the labial b and the velar g (as in 哥 gē) do sound somewhat alike.
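The following minimal Python sketch of the grouped 0/1 baseline makes the drawback concrete. The initial groups follow the usual place-of-articulation classification. Three further choices are illustrative assumptions, because the text does not specify them: finals are grouped by their -n/-ng/open ending, each character's score is the average of its initial and final scores, and unmatched positions score 0.

```python
from typing import List, Tuple

Syllable = Tuple[str, str]  # (initial, final), e.g. ("t", "ui")

# Initials grouped by place of articulation (prior-art style grouping).
INITIAL_GROUPS = [
    {"b", "p", "m", "f"},     # labial (f is labiodental)
    {"d", "t", "n", "l"},     # alveolar
    {"g", "k", "h"},          # velar
    {"j", "q", "x"},          # palatal
    {"zh", "ch", "sh", "r"},  # retroflex
    {"z", "c", "s"},          # dental sibilant
]


def initial_sim(a: str, b: str) -> int:
    """Prior-art rule: 1 if both initials fall in the same group, else 0."""
    if a == b:
        return 1
    return int(any(a in g and b in g for g in INITIAL_GROUPS))


def final_sim(a: str, b: str) -> int:
    """Assumed grouping of finals by their ending: -ng, -n, or open."""
    def coda(f: str) -> str:
        return "ng" if f.endswith("ng") else "n" if f.endswith("n") else "open"
    return int(coda(a) == coda(b))


def positional_similarity(x: List[Syllable], y: List[Syllable]) -> float:
    """Align characters strictly by position and average the per-position scores."""
    n = max(len(x), len(y))
    if n == 0:
        return 1.0
    total = 0.0
    for (ia, fa), (ib, fb) in zip(x, y):  # extra positions of the longer text add 0
        total += (initial_sim(ia, ib) + final_sim(fa, fb)) / 2
    return total / n


tuichuxitong = [("t", "ui"), ("ch", "u"), ("x", "i"), ("t", "ong")]  # 退出系统
tuiculong = [("t", "ui"), ("c", "u"), ("l", "ong")]                  # 腿粗龙
print(initial_sim("b", "p"), initial_sim("b", "m"), initial_sim("b", "g"))  # 1 1 0
print(positional_similarity(tuichuxitong, tuiculong))                       # 0.375
```

The example shows both weaknesses. The pairs b/p and b/m both score 1 while b/g scores 0. Positional alignment also compares 龙 (long) with 系 (xi) instead of with 统 (tong).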
Summary of the invention
In view of this, and to solve at least one problem in the prior art, embodiments of the present invention provide a text error correction method and device, a server, and a storage medium. They mine phonetic-unit transition probabilities from similar-sounding text and use these probabilities as phonetic-unit similarities, which can improve the error correction probability.
The technical solutions of the embodiments of the present invention are implemented as follows:
In a first aspect, an embodiment of the present invention provides a text error correction method, the method including:
collecting a first corpus in the form of participle pairs;
labelling both participles of each participle pair in the first corpus in pinyin form;
determining the pinyin similarity between the two participles of the participle pair, the similarity indicating how closely the pinyin of the first participle in the pair matches the pinyin of the second participle;
if the similarity meets a preset condition, determining the two participles of the pair as error correction participles of each other, or determining the first participle as the error correction participle of the second participle.
In a second aspect, an embodiment of the present invention provides a text error correction device, the device including a first forming unit, a labelling unit, a first determining unit and a second determining unit, wherein:
the first forming unit is configured to collect a first corpus in the form of participle pairs;
the labelling unit is configured to label both participles of each participle pair in the first corpus in pinyin form;
the first determining unit is configured to determine the pinyin similarity between the two participles of the participle pair, the similarity indicating how closely the pinyin of the first participle in the pair matches the pinyin of the second participle;
the second determining unit is configured to, if the similarity meets a preset condition, determine the two participles of the pair as error correction participles of each other, or determine the first participle as the error correction participle of the second participle.
In a third aspect, an embodiment of the present invention provides a server, the server including a processor and an external communication interface, the processor being configured to:
collect a first corpus in the form of participle pairs;
label both participles of each participle pair in the first corpus in pinyin form;
determine the pinyin similarity between the two participles of the participle pair, the similarity indicating how closely the pinyin of the first participle in the pair matches the pinyin of the second participle;
if the similarity meets a preset condition, determine the two participles of the pair as error correction participles of each other, or determine the first participle as the error correction participle of the second participle;
form an error correction dictionary from the participle pairs whose similarity meets the preset condition;
send the error correction dictionary to a terminal through the external communication interface.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the text error correction method provided in the first aspect.
Embodiments of the present invention provide a text error correction method and device, a server, and a storage medium. A first corpus is collected in the form of participle pairs, and both participles of each pair are labelled in pinyin form. The pinyin similarity between the two participles of the pair is then determined; it indicates how closely the pinyin of the first participle matches the pinyin of the second participle. If the similarity meets a preset condition, the two participles are determined as error correction participles of each other, or the first participle is determined as the error correction participle of the second participle. Because phonetic-unit transition probabilities are mined from similar-sounding text and used as phonetic-unit similarities, the error correction probability can be improved.
Brief description of the drawings
Fig. 1 is a first schematic flowchart of a text error correction method according to an embodiment of the present invention;
Fig. 2-1 is a second schematic flowchart of a text error correction method according to an embodiment of the present invention;
Fig. 2-2 is a schematic diagram of the relationship between a first computing device and a second computing device according to an embodiment of the present invention;
Fig. 2-3 is a third schematic flowchart of a text error correction method according to an embodiment of the present invention;
Fig. 3-1 is a fourth schematic flowchart of a text error correction method according to an embodiment of the present invention;
Fig. 3-2 is a schematic flowchart of the implementation of step S301 in Fig. 3-1;
Fig. 3-3 is a schematic flowchart of the implementation of step S302 in Fig. 3-1;
Fig. 3-4 is a schematic flowchart of the implementation of step S324 in Fig. 3-3;
Fig. 4 is a first schematic structural diagram of a text error correction device according to an embodiment of the present invention;
Fig. 5 is a second schematic structural diagram of a text error correction device according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.
To solve the foregoing technical problem, an embodiment of the present invention provides a text error correction method that forms the error correction participles corresponding to a participle. In implementation, a processor of a first computing device can carry out the method by calling program code, and the program code can be stored in a computer storage medium. The first computing device therefore includes at least a processor and a storage medium. It can be any electronic device with information processing capability, such as a mobile phone, tablet computer, desktop computer, personal digital assistant, navigator, digital telephone, video telephone or television set.
Fig. 1 is the first schematic flowchart of the text error correction method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S101: collect a first corpus in the form of participle pairs.
Step S101 is the corpus collection step. In implementation, step S101 can collect the corpus from the following channels: a Chinese near-homophone dictionary, a dictionary of easily confused dialect and standard pronunciations, speech recognition error annotation results, and online input method error annotation results. The corpus is collected in the form of participle pairs (pairs of phrase fragments), for example: "logging off"-"lower limb slightly dragon", "coupons"-"cash equivalent volume", "comrades"-"bobbins", "comrades"-"notice door", "dried shrimps"-"villagers", "brined vegetable are too expensive"-"let's start the meeting", "sausage pickled melon"-"chief of township's speech", and "comrades"-"let's start the meeting". These English renderings are literal glosses of Chinese phrases, and it is their pinyin that matters: for example, "brined vegetable are too expensive" is 咸菜太贵 and "let's start the meeting" is 现在开会. The first corpus is allowed to contain erroneous participle pairs. For example, the pinyin of "comrades" and "let's start the meeting" differs considerably, so that pair would generally not be regarded as a pinyin-similar pair.
In other embodiments of the present invention, a second corpus is also used to form the initial/final similarity matrix. The second corpus may differ from the first. It can be regarded as a standard corpus, so it should not contain erroneous participle pairs, whereas the first corpus may contain them. Processing the first corpus through the embodiment shown in Fig. 1 yields the participle set provided by the present invention.
Step S102: label both participles of each participle pair in the first corpus in pinyin form.
Continuing the example of step S101, the pinyin labels are as follows:
- "brined vegetable are too expensive": "xian-cai-tai-gui"
- "let's start the meeting": "xian-zai-kai-hui"
- "sausage pickled melon": "xiang-chang-jiang-gua"
- "chief of township's speech": "xiang-zhang-jiang-hua"
Step S103: determine the pinyin similarity between the two participles of the participle pair. The similarity indicates how closely the pinyin of the first participle in the pair matches the pinyin of the second participle.
Continuing the example of step S101, this step determines, for instance, the pinyin similarity between "logging off" and "lower limb slightly dragon", between "comrades" and "bobbins", and between "comrades" and "let's start the meeting".
The pinyin similarity between the two participles of the pair is determined using a preset initial/final similarity matrix. The process of building this matrix is described in other embodiments.
Step S104: judge whether the similarity meets the preset condition.
The preset condition can be a threshold. In that case, the step judges whether the similarity is greater than the threshold. If it is, the similarity meets the preset condition; if it is less than or equal to the threshold, the similarity does not meet the preset condition.
Step S105: if the similarity meets the preset condition, determine the two participles of the pair as error correction participles of each other, or determine the first participle as the error correction participle of the second participle.
Continuing the examples of step S103, step S105 can determine the two participles as error correction participles of each other. For example, "brined vegetable are too expensive" is the error correction participle of "let's start the meeting" and vice versa, and "sausage pickled melon" is the error correction participle of "chief of township's speech" and vice versa. Step S105 can also determine the first participle as the error correction participle of the second. For example, "comrades" can be the error correction participle of "bobbins", and "logging off" the error correction participle of "lower limb slightly dragon".
Determining two participles as error correction participles of each other is thus a bidirectional correction mechanism, whereas determining the first participle as the error correction participle of the second is a unidirectional one. In the bidirectional mechanism, both participles are everyday expressions used at work, in daily life or in study, and only their contexts differ. "Let's start the meeting" and "brined vegetable are too expensive", for instance, correct each other: the former is mainly used at work and the latter in daily life. In the unidirectional mechanism, the participle being corrected is generally regarded as a wrong word. "Logging off" is the error correction participle of "lower limb slightly dragon", i.e. it is used to correct "lower limb slightly dragon", which is generally regarded as an erroneous participle. Likewise, "comrades" is used to correct "bobbins" or "notice door", which are generally regarded as erroneous participles. Under certain conditions the unidirectional mechanism can become bidirectional; in some situations, for example, "bobbins" or "notice door" may also be regarded as correct words.
In this embodiment, an error correction dictionary can be formed from the error correction participles determined in step S105. That is, the method further includes forming an error correction dictionary from the participle pairs whose similarity meets the preset condition, and sending the dictionary to a terminal. The dictionary contains a number of participle sets. Each set includes at least one error correction participle, obtained by the pinyin similarity calculation, for correcting the participle to be corrected. For example:
- the set for "comrades" includes "bobbins" and "notice door";
- the set for "let's start the meeting" includes "brined vegetable are too expensive";
- the set for "notice door" includes "bobbins" and "comrades";
- the set for "lower limb slightly dragon" includes "logging off".
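Steps S104 and S105, together with the dictionary-building step, can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation. The function name `build_correction_dictionary`, the tuple layout of the input, the similarity values and the threshold of 0.6 are all hypothetical. The sketch keeps only pairs whose similarity is strictly greater than the threshold (step S104), records the first participle as a correction of the second, and adds the reverse entry only for pairs marked bidirectional.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (first participle, second participle, pinyin similarity, bidirectional?)
ScoredPair = Tuple[str, str, float, bool]


def build_correction_dictionary(pairs: Iterable[ScoredPair],
                                threshold: float) -> Dict[str, Set[str]]:
    """Map each participle to the set of participles proposed as its correction."""
    dictionary: Dict[str, Set[str]] = defaultdict(set)
    for first, second, similarity, bidirectional in pairs:
        if similarity <= threshold:  # step S104: must be strictly greater than the threshold
            continue
        dictionary[second].add(first)  # unidirectional: `first` corrects `second`
        if bidirectional:
            dictionary[first].add(second)  # bidirectional: the two correct each other
    return dict(dictionary)


# Hypothetical similarity values, for illustration only.
pairs = [
    ("现在开会", "咸菜太贵", 0.8, True),   # "let's start the meeting" / "brined vegetable are too expensive"
    ("退出系统", "腿粗龙", 0.75, False),   # "logging off" corrects "lower limb slightly dragon"
    ("现在开会", "退出系统", 0.2, True),   # dissimilar pair, filtered out by the threshold
]
print(build_correction_dictionary(pairs, threshold=0.6))
```

Keeping the direction flag on each pair lets the same routine produce both the bidirectional entries (咸菜太贵 and 现在开会) and the one-way entry (腿粗龙 corrected to 退出系统).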
In the above embodiment, step S102 labels both participles of the pair with pinyin. This labelling includes the following steps:
Step S121: judge whether the participle pair contains Arabic numerals.
Step S122: if the participle pair contains Arabic numerals, convert them into the corresponding Chinese characters.
For example, if the participles are "speed 8" (速8) and "Mo Tai 168" (莫泰168), converting the Arabic numerals gives 速八 and "Mo Tai 一六八".
Step S123: label the converted participles of the pair in pinyin form.
Continuing the example of step S122, the pinyin of "speed 8" is "su-ba", and the pinyin of "Mo Tai 168" is "mo-tai-yi-liu-ba" or "mo-tai-yao-liu-ba". There are two labels because the character 一 ("one") in "Mo Tai 一六八" is a polyphone that can be read either "yi" or "yao".
Step S124: judge whether the participle pair contains a polyphone, i.e. a character with more than one reading.
Step S125: if the participle pair contains no polyphone, label both participles of the pair in pinyin form.
For example, for the pair "lower limb slightly dragon"-"logging off", the pinyin of "lower limb slightly dragon" is labelled "tui-cu-long" and that of "logging off" "tui-chu-xi-tong".
Step S126: if the participle pair contains a polyphone, further judge whether either participle of the pair contains a polyphonic word, i.e. a multi-character word with more than one reading.
For example, consider the pair "PianYiFang" (便宜坊)-"variation side" or "PianYiFang"-"derogatory sense side". The character 便 in "PianYiFang" is a polyphone that can be read pián (second tone) or biàn (fourth tone). The method therefore judges whether the participles contain a polyphonic word. Here the word 便宜 ("cheap") in "PianYiFang" is one, with readings including "bian-yi" and "pian-yi".
Because the corpus is collected in the form of participle pairs, this embodiment improves efficiency by judging directly whether the pair contains a polyphonic word. Consider the character 单, which is a polyphone with three readings:
- shàn, as a surname;
- chán, in the title of the ancient Xiongnu monarch (chanyu);
- dān, in general phrases such as 孤单 ("lonely").
If the collected pair is "loneliness"-"fighting single-handed", then although 单 is a polyphone, 孤单 is not a polyphonic word. Its pinyin is therefore unique, and 单 does not need all three readings (chán, shàn and dān) labelled. This is why the method provided by this embodiment can significantly improve computational efficiency. If a polyphone occurs in the pair as a single character rather than inside a phrase, every reading of that character must be labelled. Since the corpus is collected as participle pairs, however, this case is very rare.
Step S127: if at least one of the two participles of the pair contains a polyphonic word, label the two or more readings of that word as part or all of the pinyin of the corresponding participle.
Continuing the above example, the pinyin labelled for "PianYiFang" includes both "bian-yi-fang" and "pian-yi-fang".
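Steps S126 and S127 work at word level rather than character level. The following is a minimal sketch, assuming the participle has already been split into words and that a hypothetical word-reading table `WORD_READINGS` is available. A word with a single reading, such as 孤单 (even though 单 is a polyphone), contributes one pinyin. A polyphonic word, such as 便宜, contributes all of its readings, and the participle is labelled with every combination.

```python
from itertools import product
from typing import Dict, List

# Hypothetical word-level reading table; a real system would use a full lexicon.
WORD_READINGS: Dict[str, List[str]] = {
    "便宜": ["pian-yi", "bian-yi"],  # polyphonic word
    "孤单": ["gu-dan"],              # contains the polyphone 单 but has one reading
    "坊": ["fang"],
}


def label_pinyin(words: List[str]) -> List[str]:
    """Label a pre-segmented participle with all of its possible pinyin strings."""
    options = [WORD_READINGS[w] for w in words]  # raises KeyError for unknown words
    return ["-".join(combo) for combo in product(*options)]


print(label_pinyin(["便宜", "坊"]))  # ['pian-yi-fang', 'bian-yi-fang']
print(label_pinyin(["孤单"]))        # ['gu-dan']
```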
As the above shows, step S102 is essentially the conversion of a participle pair into pinyin. In implementation, the step can be carried out by looking up a Chinese character-pinyin table, in which every character of a Chinese dictionary has its corresponding pinyin. The processing is as follows:
1) For a non-polyphone, look it up in the table and convert it to pinyin.
2) For a polyphone, look up the words the character forms with its neighbouring characters. If such a word exists and has a unique pronunciation, use that pinyin. If such a word exists but is itself polyphonic, determine the pronunciation with a language model (e.g. 便宜: pian-yi or bian-yi).
3) If no such word exists, use the default pronunciation; each polyphone in the table has one.
4) For Arabic numerals, convert them to the corresponding Chinese characters and then look those up.
5) For English characters, do not label pinyin; leave them unprocessed.
6) For a Chinese character that is not in the table, skip it and set the pinyin at that position to empty.
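The six lookup rules above can be sketched as a per-character function. Everything here is illustrative: the tiny tables, the default readings and the function names are hypothetical. `pick_by_language_model` is a stand-in that simply returns the first candidate, whereas the described method would consult a language model at that point.

```python
from typing import Dict, List

DIGIT_TO_HANZI = dict(zip("0123456789", "零一二三四五六七八九"))
# Hypothetical character table: more than one entry means the character is a polyphone.
CHAR_TABLE: Dict[str, List[str]] = {
    "便": ["bian", "pian"], "宜": ["yi"], "坊": ["fang"], "孤": ["gu"],
    "单": ["dan", "shan", "chan"], "速": ["su"], "八": ["ba"],
}
DEFAULT_READING = {"便": "bian", "单": "dan"}  # rule 3: default pronunciation of polyphones
WORD_TABLE = {"便宜": ["pian-yi", "bian-yi"], "孤单": ["gu-dan"]}  # two-character words


def pick_by_language_model(candidates: List[str]) -> str:
    """Placeholder for rule 2's language model; here it just takes the first candidate."""
    return candidates[0]


def char_pinyin(text: str, i: int) -> str:
    ch = text[i]
    if ch.isascii() and ch.isalpha():
        return ch                       # rule 5: English characters are left as-is
    readings = CHAR_TABLE.get(ch)
    if readings is None:
        return ""                       # rule 6: unknown character -> empty pinyin
    if len(readings) == 1:
        return readings[0]              # rule 1: non-polyphone
    for start in (i - 1, i):            # rule 2: words formed with a neighbouring character
        word = text[start:start + 2] if start >= 0 else ""
        if len(word) == 2 and word in WORD_TABLE:
            # Syllable of this character in each reading of the word, de-duplicated in order.
            syllables = list(dict.fromkeys(r.split("-")[i - start] for r in WORD_TABLE[word]))
            return syllables[0] if len(syllables) == 1 else pick_by_language_model(syllables)
    return DEFAULT_READING[ch]          # rule 3: no matching word, use the default


def to_pinyin(text: str) -> List[str]:
    text = "".join(DIGIT_TO_HANZI.get(c, c) for c in text)  # rule 4: digits -> Chinese first
    return [char_pinyin(text, i) for i in range(len(text))]


print(to_pinyin("便宜坊"))  # ['pian', 'yi', 'fang']
print(to_pinyin("孤单"))    # ['gu', 'dan']
print(to_pinyin("速8"))     # ['su', 'ba']
```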
Among steps S121 to S127, there is no strict execution order between steps S121-S123 and steps S124-S127. In implementation, steps S121-S123 can run first, followed by steps S124-S127, or steps S124-S127 can run first, followed by steps S121-S123.
In other embodiments of the present invention, step S103, which determines the pinyin similarity between the two participles of the pair, includes:
Step S131: align the initials of the pinyin of the two participles of the pair, and align their finals.
In other embodiments of the present invention, the initials and finals of the two participles' pinyin are aligned so as to maximize the number of identical pronunciations. For example, the pinyin of "logging off"-"lower limb slightly dragon" is aligned as follows (_ denotes a gap):
"logging off" (退出系统): t-ui-ch-u-x-i-t-ong
"lower limb slightly dragon" (腿粗龙): t-ui-c-u-_-_-l-ong
To obtain the most identical pronunciations, the final "ong" of 龙 (long, "dragon") is aligned with the final "ong" of 统 (tong, the second character of 系统, "system"). 龙 is not aligned with 系 (xi).
In this example, "logging off" has four characters and "lower limb slightly dragon" has three. The initial alignment simply pairs the characters in order:
- first group: 腿 (tui) with 退 (tui);
- second group: 粗 (cu) with 出 (chu);
- third group: 龙 (long) with 系 (xi);
- 统 (tong) with a gap.
The first and second groups are highly similar, but the third is not. This embodiment therefore shifts the alignment: the third group becomes 系 paired with a gap, and the fourth group becomes 龙 paired with 统. After the shift, the first, second and fourth groups all have relatively high similarity.
In the related art, two texts are aligned by character position when pronunciation similarity is computed. The drawback is that when one sentence has an extra or a missing character, every subsequent position is misaligned and produces errors. Aligning for the most identical pronunciations, as this embodiment does, keeps the two texts aligned even when one of them has an extra or missing character.
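The text does not spell out how the alignment with the most identical pronunciations is found. One plausible way, shown here purely as an assumption, is a Needleman-Wunsch-style dynamic programme over syllables. Pairing two syllables scores the number of identical units (initial and final). Each gap costs a small penalty, the hypothetical `GAP_PENALTY`, so that gaps are used only when they buy extra matches. For simplicity, the sketch aligns whole syllables rather than initials and finals separately.

```python
from typing import List, Optional, Tuple

Syllable = Tuple[str, str]  # (initial, final)
Pair = Tuple[Optional[Syllable], Optional[Syllable]]  # None marks a gap

GAP_PENALTY = 0.1  # hypothetical cost of leaving a syllable unmatched


def match_score(a: Syllable, b: Syllable) -> int:
    """Number of identical phonetic units (initial, final) between two syllables."""
    return int(a[0] == b[0]) + int(a[1] == b[1])


def align(x: List[Syllable], y: List[Syllable]) -> List[Pair]:
    n, m = len(x), len(y)
    # dp[i][j] = best score for aligning x[:i] with y[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - GAP_PENALTY
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - GAP_PENALTY
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + match_score(x[i - 1], y[j - 1]),
                           dp[i - 1][j] - GAP_PENALTY,
                           dp[i][j - 1] - GAP_PENALTY)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + match_score(x[i - 1], y[j - 1]):
            pairs.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif j == 0 or (i > 0 and dp[i][j] == dp[i - 1][j] - GAP_PENALTY):
            pairs.append((x[i - 1], None))
            i -= 1
        else:
            pairs.append((None, y[j - 1]))
            j -= 1
    return pairs[::-1]


tuichuxitong = [("t", "ui"), ("ch", "u"), ("x", "i"), ("t", "ong")]  # 退出系统
tuiculong = [("t", "ui"), ("c", "u"), ("l", "ong")]                  # 腿粗龙
for a, b in align(tuichuxitong, tuiculong):
    print(a, b)
```

On the example, this pairs 退 with 腿 and 出 with 粗, leaves 系 against a gap, and pairs 统 with 龙, which is the shifted alignment described above.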
Step S132: calculate the transition probability of converting the pinyin of the first participle of the pair into the pinyin of the second participle.
Step S133: determine the pinyin similarity between the two participles of the pair according to the transition probability.
In other embodiments of the present invention, two ways of implementing step S132 are provided below.
Mode one: the first way is relatively simple. Determine the number of differing phonetic units (initials and finals) between the first participle and the second participle. Then determine the transition probability from this number and the length of the phonetic-unit string of the first or second participle. The pinyin of a Chinese character consists of an initial and a final, so the length of a phonetic-unit string is twice the number of characters. The length used can be twice the number of characters of the first participle, twice that of the second participle, or twice the sum of the two.
Take the pair "logging off"-"lower limb slightly dragon" as an example. Suppose the participle being corrected (the first participle) is "lower limb slightly dragon" and the error correction participle (the second participle) is "logging off". The number of differing units is 4, namely "ch", "x", "i" and "t" of "logging off". The transition probability can therefore be calculated in any of three ways:
- 4 ÷ 6, where 6 is the length of the first participle's unit string;
- 4 ÷ 8, where 8 is the length of the second participle's unit string;
- 4 ÷ (6 + 8) = 4 ÷ 14, where 14 is the sum of the two lengths.
Conversely, suppose the participle being corrected (the first participle) is "logging off" and the error correction participle (the second participle) is "lower limb slightly dragon". The number of differing units is 2, namely "c" and "l". The transition probability can then be calculated as:
- 2 ÷ 8, where 8 is the length of the first participle's unit string;
- 2 ÷ 6, where 6 is the length of the second participle's unit string;
- 2 ÷ (8 + 6) = 2 ÷ 14.
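Mode one can be sketched as follows, working from an alignment like the one produced in step S131 (hard-coded here). The sketch counts the units of the second participle that do not match the aligned units of the first. This is our reading of the worked example, and it reproduces both counts (4 in one direction, 2 in the other). The function names are hypothetical, and `Fraction` keeps the ratios exact. In this sketch only the "second" and "sum" denominators guarantee a value no greater than 1.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

Syllable = Tuple[str, str]  # (initial, final)
Pair = Tuple[Optional[Syllable], Optional[Syllable]]  # (first, second); None marks a gap


def differing_units(aligned: List[Pair]) -> int:
    """Count units of the second participle that differ from the aligned first participle."""
    count = 0
    for first, second in aligned:
        if second is None:
            continue
        for k in (0, 1):  # 0 = initial, 1 = final
            if first is None or first[k] != second[k]:
                count += 1
    return count


def transition_probability(aligned: List[Pair], chars_first: int, chars_second: int,
                           norm: str = "sum") -> Fraction:
    """Differing units divided by a unit-string length (2 units per character)."""
    length = {"first": 2 * chars_first,
              "second": 2 * chars_second,
              "sum": 2 * (chars_first + chars_second)}[norm]
    return Fraction(differing_units(aligned), length)


# First = 腿粗龙 (being corrected), second = 退出系统 (the correction).
aligned = [(("t", "ui"), ("t", "ui")), (("c", "u"), ("ch", "u")),
           (None, ("x", "i")), (("l", "ong"), ("t", "ong"))]
p = transition_probability(aligned, 3, 4, norm="first")
print(differing_units(aligned), p, 1 - p)  # 4 2/3 1/3
```

The last value printed is the similarity, computed as 1 minus the transition probability, as described next. Swapping the roles of the two participles (first = 退出系统) gives 2 differing units, matching the second half of the example.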
Note that the transition probability computed above is inversely related to similarity: the smaller the transition probability, the greater the similarity, and the larger the transition probability, the smaller the similarity. The transition probability lies in [0, 1], i.e. it is greater than or equal to 0 and less than or equal to 1. A transition probability of 0 means the phonetic units of the first and second words are identical, e.g. "comrades" vs. "notice"; a transition probability of 1 means they are completely different, e.g. "comrades" vs. "let's start the meeting". To obtain a measure with a more intuitive correspondence, similarity can be computed as: similarity = 1 - transition probability. The resulting similarity also lies in [0, 1]: a similarity of 0 means the pinyin of the two words in the pair is entirely different, and a similarity of 1 means it is identical.
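The following Python sketch illustrates method one, assuming the two phonetic-unit strings have already been aligned so that the differing units can be counted. The function names are illustrative, not part of the described method. `Fraction` is used so that the three normalization options (first length, second length, or their sum) give exact values.

```python
from fractions import Fraction
from typing import Sequence


def count_differences(units1: Sequence[str], units2: Sequence[str]) -> int:
    """Number of positions at which two already-aligned unit strings differ."""
    if len(units1) != len(units2):
        raise ValueError("this sketch expects position-aligned strings of equal length")
    return sum(a != b for a, b in zip(units1, units2))


def transition_probability(num_diff: int, len1: int, len2: int, mode: str = "first") -> Fraction:
    """Method one: divide the number of differing units by a string length.

    mode selects the denominator described in the text:
    "first" (len1), "second" (len2), or "sum" (len1 + len2).
    """
    denominators = {"first": len1, "second": len2, "sum": len1 + len2}
    return Fraction(num_diff, denominators[mode])


def similarity(num_diff: int, len1: int, len2: int, mode: str = "first") -> Fraction:
    # Similarity = 1 - transition probability.
    return 1 - transition_probability(num_diff, len1, len2, mode)
```

For the example above (2 differing units, lengths 6 and 8), the three options give 1/3, 1/4 and 1/7. For the aligned strings "x ian z ai t ai h ui" and "x ian c ai t ai g ui", `count_differences` returns 2 and `similarity(2, 8, 8)` returns 3/4.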
Method two: the second approach uses a preset initial-final similarity matrix to compute the transition probability from the pinyin of the first word in the pair to the pinyin of the second word. In this case, step S132 (computing the transition probability from the pinyin of the first word in the pair to the pinyin of the second word) includes:
Step S1321: if the characters at the same position in the two aligned words are homophones, add 1 to the score Score and advance the positions in both the first and the second word by 1;
Step S1322: if the characters at the same position in the two aligned words are not homophones, determine the score Score between the pinyin of the first word and the pinyin of the second word according to the preset initial-final similarity matrix;
Step S1323: determine the normalized final score Sf from Score, the number of characters in the first word, and the number of characters in the second word;
Step S1324: determine, from the final score Sf, the transition probability from the pinyin of the first word in the pair to the pinyin of the second word.
In other embodiments of the invention, determining the score Score between the pinyin of the first word and the pinyin of the second word according to the preset initial-final similarity matrix includes:
Step S13221: obtain from the preset initial-final similarity matrix the similarity between the initials and the similarity between the finals of the characters at the same position in the two words;
Step S13222: if the product S of the initial similarity and the final similarity exceeds a first preset value, add S to Score and advance the positions in both the first and the second word by 1;
Step S13223: if the product S of the initial similarity and the final similarity of the characters at the same position is less than or equal to the first preset value, obtain from the initial-final similarity matrix, for the character at the current position of the first word and the character at the next position of the second word, the initial-initial similarity, the final-final similarity, the initial-final similarity, and the final-initial similarity.
Here, the first preset value and the second and third preset values below can be empirical values. They may all be equal (for example, all 0.8) or may differ.
Step S13224: determine a first maximum, namely the largest of three quantities: the product S of the initial similarity and the final similarity between the character at the current position of the first word and the character at the next position of the second word; the similarity between the initial of the first word's current character and the final of the second word's next character; and the similarity between the final of the first word's current character and the initial of the second word's next character;
Step S13225: determine whether the first maximum exceeds the second preset value; if so, set Score to its previous value plus the first maximum, and advance the positions in both the first and the second word by 1;
Step S13226: if the first maximum is less than or equal to the second preset value, obtain from the initial-final similarity matrix, for the character at the next position of the first word and the character at the current position of the second word, the initial-initial similarity, the final-final similarity, the initial-final similarity, and the final-initial similarity;
Step S13227: determine a second maximum, namely the largest of three quantities: the product S of the initial similarity and the final similarity between the character at the next position of the first word and the character at the current position of the second word; the similarity between the initial of the first word's next character and the final of the second word's current character; and the similarity between the final of the first word's next character and the initial of the second word's current character;
Step S13228: determine whether the second maximum exceeds the third preset value; if so, set Score to its previous value plus the second maximum and advance the positions in both the first and the second word by 1. Then check whether the characters at the updated positions of the first and second words are homophones. Repeating the steps above traverses the pinyin of the first word and of the second word and yields Score.
An embodiment of the present invention also provides a method for determining the initial-final similarity matrix, which includes:
Step S140: collect a second corpus in the form of word pairs, and annotate both words of every pair in the second corpus with pinyin.
Here, the second corpus is used to build the initial-final similarity matrix and may differ from the first corpus described earlier. The second corpus can in effect be regarded as a standard corpus, i.e. it should not contain erroneous word pairs, whereas the first corpus may contain them; the first corpus, processed through the embodiment shown in Fig. 1, forms the word set provided by the present invention.
Pinyin annotation can use the annotation method described earlier, for example the method based on the largest number of identical pronunciations.
Step S141: determine the total number of times a first phonetic unit is mispronounced in the second corpus; the first phonetic unit is an initial or a final.
Here, the second corpus can be a standard corpus, so how rich it is largely determines the accuracy of error correction. In this embodiment, in addition to the dictionaries mentioned earlier, the corpus also includes users' session logs, and material is mined from online session logs. The main purpose of this step is to connect with the application domain and mine, from users' historical logs, error-correction candidates that serve the application's goals. Mining session logs likewise uses the pronunciation similarity of text as the similarity measure. There are two main methods: a) from user sessions (for example customer-service sessions), mine cases in which users actively repair pronunciation errors, i.e. repeatedly extract pronunciation-similar results between different inputs within the session context; b) manually customize key domain phrases and mine error-prone candidates: guided by business objectives, the customized key phrases are used to mine, from large volumes of logs, results whose pronunciation is similar to them.
Step S142: determine the number of times the first phonetic unit is mispronounced as a second phonetic unit;
Step S143: determine the probability that the first phonetic unit transfers to the second phonetic unit from the total number of times the first unit is mispronounced and the number of times it is mispronounced as the second unit;
Step S144: determine the total number of times the second phonetic unit is mispronounced in the second corpus; the second phonetic unit is an initial or a final;
Step S145: determine the number of times the second phonetic unit is mispronounced as the first phonetic unit;
Step S146: determine the probability that the second phonetic unit transfers to the first phonetic unit from the total number of times the second unit is mispronounced and the number of times it is mispronounced as the first unit;
Step S147: determine the similarity between the first and second phonetic units from the probability that the first transfers to the second and the probability that the second transfers to the first.
Take "let's start the meeting" and "brined vegetable are too expensive" as an example. First annotate the pair with pinyin:
let's start the meeting: x ian z ai t ai h ui
brined vegetable are too expensive: x ian c ai t ai g ui
After alignment, the mismatched phonetic units are z/c and h/g.
To compute the probability that the first unit z transfers to the second unit c (the unit transition probability P(c | z)), align all the pronunciation-confusable word pairs obtained, count the occurrences of each mismatched unit, and compute P(c | z):
$$P(c \mid z) = \frac{\operatorname{count}(z \to c)}{\operatorname{count}(z)} \tag{1}$$
In formula (1), P(c | z) is the transition probability that unit z is mispronounced as unit c; count(z → c) is the number of times z is mispronounced as c in the second corpus; and count(z) is the total number of times z is mispronounced in the second corpus.
The probability P(z | c) that the second unit c is mispronounced as the first unit z is computed in the same way.
Then, from the probability P(c | z) that z transfers to c and the probability P(z | c) that c is mispronounced as z, the pronunciation similarity Sim(c, z) between units z and c is determined. In practice it can be obtained with formula (2):
$$\operatorname{Sim}(c, z) = \frac{P(c \mid z) + P(z \mid c)}{2} \tag{2}$$
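A minimal Python sketch of steps S141 to S147 and formulas (1) and (2) follows. It assumes each training pair is given as two position-aligned lists of phonetic units with the intended pronunciation first; that direction convention and all function names are assumptions of the sketch.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def confusion_counts(aligned_pairs: Iterable[Tuple[List[str], List[str]]]):
    """Count unit confusions over aligned word pairs.

    Each pair is (intended_units, confused_units), position-aligned, equal length.
    Returns (pair_counts, error_totals): pair_counts[(z, c)] = count(z -> c) and
    error_totals[z] = count(z), the number of times z was mispronounced.
    """
    pair_counts: Counter = Counter()
    error_totals: Counter = Counter()
    for intended, confused in aligned_pairs:
        for z, c in zip(intended, confused):
            if z != c:
                pair_counts[(z, c)] += 1
                error_totals[z] += 1
    return pair_counts, error_totals


def transfer_prob(pair_counts: Counter, error_totals: Counter, z: str, c: str) -> float:
    # Formula (1): P(c | z) = count(z -> c) / count(z); 0.0 if z was never mispronounced.
    total = error_totals[z]
    return pair_counts[(z, c)] / total if total else 0.0


def unit_similarity(pair_counts: Counter, error_totals: Counter, c: str, z: str) -> float:
    # Formula (2): Sim(c, z) = (P(c | z) + P(z | c)) / 2, symmetric in c and z.
    return (transfer_prob(pair_counts, error_totals, z, c)
            + transfer_prob(pair_counts, error_totals, c, z)) / 2
```

If confusable pairs come without a known direction, one option is to feed each pair in both orders; that choice changes the counts and is left to the implementer.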
Note that, since phonetic units include both initials and finals, the initial-final similarity matrix actually comprises at least three matrices: a similarity matrix between initials, a similarity matrix between initials and finals, and a similarity matrix between finals. Assuming there are 21 initials, the initial-initial similarity matrix is a 21 × 21 square matrix; assuming there are 39 finals, the final-final similarity matrix is a 39 × 39 square matrix; and the initial-final similarity matrix is a 21 × 39 matrix. In later embodiments, the transition probability between two initials can be looked up directly in the initial-initial matrix, that between two finals in the final-final matrix, and that between an initial and a final in the initial-final matrix.
Building on the foregoing embodiments, an embodiment of the present invention further provides a text error correction method. In practice, the method can be implemented by a processor of a second computing device calling program code, which can be stored in a computer storage medium; the second computing device therefore includes at least a processor and a storage medium. The second computing device may be any of various electronic devices with information-processing capability, such as a mobile phone, tablet computer, desktop computer, personal digital assistant, navigator, digital telephone, video telephone, or television set.
Fig. 2-1 is a second schematic implementation flowchart of the text error correction method of an embodiment of the present invention. As shown in Fig. 2-1, the method includes:
Step S201: determine the words to be corrected, i.e. the words in the sentence entered by the user.
Here, in practice, the user's input is often a sentence or several consecutive words, so it has to be segmented into words. For example, the user might enter "bobbins, we let's start the meeting"; during segmentation, function words such as modal particles and auxiliary words can be removed and the remainder split into words, giving "bobbins - we - let's start the meeting". After segmentation, the words to be corrected are "bobbins", "we", and "let's start the meeting".
Step S202: determine whether there is a word set corresponding to the word to be corrected; the word set contains at least one correction word for that word, computed from pinyin similarity.
Taking "bobbins" as an example, determine whether "bobbins" has a corresponding word set; as described above, the word set of "bobbins" includes "comrades" and "notice door". Likewise, for "let's start the meeting", determine whether it has a corresponding word set; as described above, its word set includes "brined vegetable are too expensive".
Step S203: determine a first language model score, i.e. the language model score of the word to be corrected in the sentence.
Continuing the example of step S202, determine the language model score of "bobbins" and that of "let's start the meeting".
Step S204: determine second language model scores, i.e. the language model score of each correction word in the word set when placed in the sentence.
Continuing the example of step S202, determine the language model scores of "comrades" and "notice door", and that of "brined vegetable are too expensive".
Step S205: determine whether any of the second language model scores is greater than the first language model score, obtaining a judgment result.
Continuing the example above, "let's start the meeting" has only one second language model score, that of "brined vegetable are too expensive", while "bobbins" has two, those of "comrades" and "notice door". The judgment therefore checks whether the score of "brined vegetable are too expensive" exceeds that of "let's start the meeting", whether the score of "comrades" exceeds that of "bobbins", and whether the score of "notice door" exceeds that of "bobbins". Suppose the score of "comrades" is higher than those of "bobbins" and "notice door", and the score of "let's start the meeting" is higher than that of "brined vegetable are too expensive". Then for "let's start the meeting" the result is that no second language model score exceeds the first language model score, and for "bobbins" the result is that some second language model score does exceed it.
Step S206: correct the word to be corrected according to the judgment result.
Here, step S206 includes:
Step S2061: if some second language model score exceeds the first language model score, take the correction word with the highest language model score as the correction for the word to be corrected; step S206 then further includes replacing the word with its correction word and outputting the result.
Step S2062: if no second language model score exceeds the first language model score, do not correct the word.
In this example, since the score of "comrades" is assumed higher than those of "bobbins" and "notice door", "bobbins" is corrected to "comrades"; since the score of "let's start the meeting" is assumed higher than that of "brined vegetable are too expensive", "let's start the meeting" is left uncorrected.
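Steps S202 to S206 can be sketched in Python as follows. The sketch assumes the sentence is already segmented (step S201), the word sets are available as a dictionary, and `lm_score` is a hypothetical callable returning a fluency score for a word list (higher is better). Scoring each candidate by substituting it into the sentence is one reading of "the language model score of the correction word in the sentence".

```python
from typing import Callable, Dict, List, Sequence


def correct_sentence(tokens: List[str],
                     word_sets: Dict[str, Sequence[str]],
                     lm_score: Callable[[List[str]], float]) -> List[str]:
    """Correct each segmented word whose word set contains a better-scoring candidate."""
    # Corrections are applied left to right, so later words are scored in the
    # partially corrected sentence (an assumption of this sketch).
    result = list(tokens)
    for i, word in enumerate(tokens):
        candidates = word_sets.get(word)          # S202: does a word set exist?
        if not candidates:
            continue
        first_score = lm_score(result)            # S203: sentence with the original word
        best_word, best_score = None, first_score
        for cand in candidates:                   # S204: sentence with each candidate
            score = lm_score(result[:i] + [cand] + result[i + 1:])
            if score > best_score:                # S205: must beat the first score
                best_word, best_score = cand, score
        if best_word is not None:                 # S2061: take the highest-scoring candidate
            result[i] = best_word
        # S2062: otherwise the word is left unchanged
    return result
```

With a toy scorer that sums per-word weights and rates "comrades" above "notice door" and "bobbins", and "let's start the meeting" above "brined vegetable are too expensive", the sketch reproduces the outcome of the example: only "bobbins" is replaced.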
Note that, in practice, step S202 can determine whether a word set corresponding to the word to be corrected exists by querying preset association information. The association information can be implemented as a list, an association relationship, or the like, and indicates the correspondence between words to be corrected and word sets. It can be preset (coming from the first computing device); it can also be pushed by the first computing device to the second computing device, or requested by the second computing device from the first computing device. In other words, referring to Fig. 2-2, the first computing device 10 that implements the technical solution shown in Fig. 1 can be regarded as a server for the second computing devices 21 and 22 that implement the method shown in Fig. 2-1, and the second computing devices can be regarded as terminals of the first computing device. The first computing device 10 can also update the association information on the users' second computing devices 21 and 22 periodically or aperiodically.
In other embodiments of the invention, referring to Fig. 2-3, on the basis of the method shown in Fig. 1, the method further includes:
Step S230: the terminal sends an error correction request to the server, the request carrying the sentence entered by the user.
Here, a client is installed on the terminal side and may take the form of an application (App). When the user enters a sentence (or text) on the terminal, the client detects the input, places the sentence in an error correction request, and sends the request to the server.
Step S231: the server receives the error correction request sent by the terminal.
Step S232: the server determines the words to be corrected, i.e. the words in the sentence entered by the user.
Here, the user's input is usually a sentence or several consecutive words, so it must be segmented into words. For example, the user might enter "bobbins, we let's start the meeting"; during segmentation, function words such as modal particles and auxiliary words can be removed and the remainder split into words, giving "bobbins - we - let's start the meeting". After segmentation, the words to be corrected are "bobbins", "we", and "let's start the meeting".
Step S233: the server determines whether an error-correction dictionary contains a word set corresponding to the word to be corrected;
Step S234: if such a word set exists, the server determines a first language model score and second language model scores. The first language model score is the language model score of the word to be corrected in the sentence; the word set contains at least one correction word for the word to be corrected; and the second language model scores are the language model scores of each correction word in the word set when placed in the sentence;
Step S235: determine whether any of the second language model scores is greater than the first language model score;
Step S236: if so, the server takes the correction word with the highest language model score as the correction for the word to be corrected.
Steps S232 to S236 here are similar to steps S201 to S206 in the embodiment shown in Fig. 2-1; those skilled in the art can understand steps S232 to S236 by referring to that embodiment.
Step S237: the server places the correction word in a first error correction response and sends the response to the terminal.
Step S238: if none of the second language model scores is greater than the first language model score, or if no word set corresponding to the word to be corrected exists, the server sends a second error correction response, which indicates that the word to be corrected is not corrected.
Step S239: the terminal receives the error correction response from the server. If it is a first error correction response, the terminal corrects the user's sentence according to it; if it is a second error correction response, the user's sentence is left uncorrected.
In the embodiment shown in Fig. 2-1, language model scoring is performed on the terminal, whereas in this embodiment the server computes the language model scores at the terminal's request. Thus, when the error correction method places relatively light demands on the terminal's hardware, the method of Fig. 2-1 can be used: text error correction is completed without a network connection, i.e. the method can run offline. When the method's hardware demands are high, the method of Fig. 2-3 can be used, which saves the terminal's hardware resources but requires the terminal to be connected to the server.
Building on the foregoing embodiments, an embodiment of the present invention provides a text error correction method based on Chinese pronunciation similarity. It can be applied to correcting Chinese speech recognition results and Chinese pinyin input method results, and can also be used directly as a feature in Chinese semantic similarity computation. Fig. 3-1 is a fourth schematic implementation flowchart of the text error correction method of an embodiment of the present invention. As shown in Fig. 3-1, the method includes:
Step S301: mine a pronunciation-similarity dictionary.
As shown in Fig. 3-2, step S301 further comprises the following steps:
Step S311: collect phrase pairs whose pronunciations are easily confused.
This is the corpus-collection step; material can be collected from the following sources: dictionaries of near-homophones in Chinese; dictionaries of dialect pronunciations easily confused with standard pronunciation; annotated speech-recognition errors; and annotated online input-method errors.
The corpus is collected as pairs of phrase fragments, for example "logging off" - "lower limb is thick dragon", "voucher" - "cash equivalent volume", "comrades" - "bobbins", "dried shrimps" - "villagers", "brined vegetable are too expensive" - "let's start the meeting", and "sausage pickled melon" - "chief of township's speech".
Step S312: convert the phrase pairs to pinyin.
This step looks up a Chinese character-pinyin table to obtain the pinyin of each character in the Chinese dictionary. The procedure is as follows: 1) for a character that is not a polyphone, convert it by table lookup; 2) for a polyphone, look up the word formed by the character and its neighbouring characters: if the word exists and has a unique pronunciation, convert accordingly; if the word exists but itself has several pronunciations, determine the pronunciation with a language model (for example, "cheap" (便宜) can be read "pian-yi" or "bian-yi"); if no such word exists, use the default pronunciation (every polyphone in the table has a default reading); 3) for Arabic numerals, convert them to the corresponding Chinese characters and then look them up; 4) leave English characters unprocessed; 5) for a Chinese character not in the table, skip it and leave its position empty.
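The five conversion rules of step S312 are sketched below in Python over tiny hypothetical lookup tables; a real system would load full character, word, and default-reading dictionaries. Two simplifications are assumptions: only two-character words around a polyphone are checked, and the language-model choice for an ambiguous word is delegated to an optional `choose` callable (without it, the default reading is used).

```python
from typing import Callable, Dict, List, Optional

# Hypothetical miniature tables for illustration only.
CHAR_PINYIN: Dict[str, List[str]] = {
    "现": ["xian"], "在": ["zai"], "开": ["kai"], "二": ["er"], "宜": ["yi"],
    "会": ["hui", "kuai"], "便": ["bian", "pian"],          # polyphones
}
DEFAULT_PINYIN: Dict[str, str] = {"会": "hui", "便": "bian"}   # default reading per polyphone
WORD_PINYIN: Dict[str, List[List[str]]] = {                     # readings of known words
    "开会": [["kai", "hui"]],
    "便宜": [["pian", "yi"], ["bian", "yi"]],                   # the word itself is ambiguous
}
DIGITS = dict(zip("0123456789", "零一二三四五六七八九"))

Chooser = Callable[[str, List[List[str]]], List[str]]


def _polyphone(chars: List[str], i: int, choose: Optional[Chooser]) -> str:
    """Rule 2: resolve a polyphone via a two-character word containing it."""
    for start in (i - 1, i):
        if start < 0 or start + 2 > len(chars):
            continue
        word = "".join(chars[start:start + 2])
        readings = WORD_PINYIN.get(word)
        if readings is None:
            continue
        if len(readings) == 1:                 # word with a unique reading
            return readings[0][i - start]
        if choose is not None:                 # ambiguous word: defer to e.g. a language model
            return choose(word, readings)[i - start]
        break                                  # no chooser supplied: fall back to the default
    return DEFAULT_PINYIN[chars[i]]


def to_pinyin(text: str, choose: Optional[Chooser] = None) -> List[str]:
    chars = [DIGITS.get(c, c) for c in text]   # rule 3: Arabic digits -> Chinese numerals
    out: List[str] = []
    for i, ch in enumerate(chars):
        if ch.isascii():                       # rule 4: English letters etc. left unprocessed
            out.append(ch)
        elif ch not in CHAR_PINYIN:            # rule 5: unknown character -> empty position
            out.append("")
        elif len(CHAR_PINYIN[ch]) == 1:        # rule 1: not a polyphone
            out.append(CHAR_PINYIN[ch][0])
        else:                                  # rule 2: polyphone
            out.append(_polyphone(chars, i, choose))
    return out
```

For example, `to_pinyin("现在开会")` returns `["xian", "zai", "kai", "hui"]`, with 会 resolved through the word 开会.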
Step S313: split the pinyin into initials and finals and align.
Because only a few phonetic units are mispronounced in a pair of similar-sounding phrases, an alignment that maximizes the number of identical pronunciations is used here, for example:
let's start the meeting: x ian z ai t ai h ui
brined vegetable are too expensive: x ian c ai t ai g ui
After alignment, the mismatched phonetic units are z/c and h/g.
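The following Python sketch splits toneless pinyin syllables into initial and final and lists the mismatched units. It only handles phrase pairs with the same number of syllables and aligns them position by position, which is a simplification of the alignment that maximizes identical pronunciations; the zero-initial handling is also an assumption.

```python
from typing import List, Tuple

TWO_LETTER_INITIALS = ("zh", "ch", "sh")
ONE_LETTER_INITIALS = frozenset("bpmfdtnlgkhjqxrzcs")


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split a toneless pinyin syllable into (initial, final).

    Zero-initial syllables (including y/w spellings such as "yi") get "" as the
    initial; this treatment is an assumption of the sketch.
    """
    if syllable[:2] in TWO_LETTER_INITIALS:
        return syllable[:2], syllable[2:]
    if syllable[:1] in ONE_LETTER_INITIALS:
        return syllable[:1], syllable[1:]
    return "", syllable


def to_units(syllables: List[str]) -> List[str]:
    """Flatten syllables into an initial/final unit string, e.g. xian -> x, ian."""
    units: List[str] = []
    for syllable in syllables:
        units.extend(split_syllable(syllable))
    return units


def mismatched_units(syllables1: List[str], syllables2: List[str]) -> List[Tuple[str, str]]:
    """Align two phrases syllable by syllable and list the differing units."""
    if len(syllables1) != len(syllables2):
        raise ValueError("this sketch only aligns phrases with the same number of syllables")
    return [(a, b) for a, b in zip(to_units(syllables1), to_units(syllables2)) if a != b]
```

Applied to the example above, `mismatched_units(["xian", "zai", "tai", "hui"], ["xian", "cai", "tai", "gui"])` yields `[("z", "c"), ("h", "g")]`.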
Step S314: compute transition probabilities between initials and finals.
All the confusable pairs obtained are aligned as described above, the occurrences of each mismatched phonetic unit are counted, and the transition probability P(c | z) that unit z is mispronounced as unit c is computed:
$$P(c \mid z) = \frac{\operatorname{count}(z \to c)}{\operatorname{count}(z)}$$
where P(c | z) is the transition probability that unit z is mispronounced as unit c, count(z → c) is the number of times z is mispronounced as c in the corpus, and count(z) is the total number of times z is mispronounced in the corpus.
Step S315: compute the similarity score between any two phonetic units.
From P(c | z) and P(z | c) computed in the previous step, the pronunciation similarity between units z and c is defined as:
$$\operatorname{Sim}(c, z) = \frac{P(c \mid z) + P(z \mid c)}{2}$$
Computing the similarity between every pair of units yields the initial-final similarity matrix, which comprises the similarity matrices between initials, between initials and finals, and between finals.
Step S302: compute phrase pronunciation similarity.
Based on the pronunciation similarities between phonetic units computed in step S301, this step computes the pronunciation similarity between any two given phrases. The flow, shown in Fig. 3-3, includes:
Step S321: preprocessing, for example converting Arabic numerals such as "2" to "two" so that pinyin can be extracted;
Step S322: convert Chinese characters to pinyin, as in step S312;
Step S323: split the pinyin of each character in the pinyin string into initial and final;
Step S324: traverse the two pinyin strings character by character and compute the pronunciation similarity score.
Here, let pos1 be the current position in the first word and pos2 the current position in the second word. ScoreSS, ScoreYY, and ScoreSY are the similarity scores between initial and initial, between final and final, and between initial and final, respectively; ScoreSY1 and ScoreSY2 denote the two cross scores (initial of the first word's character vs. final of the second word's character, and final of the first vs. initial of the second). All of them can be obtained by looking up the initial-final similarity matrix described above, and Score is the accumulated score. The pronunciation similarity score is then computed as shown in Fig. 3-4:
Step S3241, starts, and arranges pos1=1, pos2=1;
Step S3242, judges whether the word of the current location of the first participle is identical with the word of the current location of the second participle,
If identical, Score+=1, pos1+=1, pos2+=1, continue, returns to step S3242;If it is not the same, then entering
Enter step S3243;
Step S3243, judges whether (S=ScoreSS*ScoreYY) is more than 0.8, if (S=ScoreSS*ScoreYY)>
0.8, then Score+=S, pos2+=1, pos2+=1, continue, returns to step S3242;If (S=ScoreSS*
ScoreYY)≤0.8, it is determined that the similarity for facing a word of the first participle and the second participle, into step S3244.
Step S3244, if (S=ScoreSS*ScoreYY)≤0.8, judges pos1With pos2Whether+1 place has S=max
(ScoreSS*ScoreYY,ScoreSY1,ScoreSY2)>0.8;
If pos1With pos2There is S=max (ScoreSS*ScoreYY, ScoreSY1, ScoreSY2) at+1 place>0.8, then
Score+=S, pos1+=1, pos2+=2, continue, return to step S3242;If pos1With pos2There is S=max at+1 place
(ScoreSS*ScoreYY, ScoreSY1, ScoreSY2)≤0.8, then into step S3245;
Step S3245, judges pos1+ 1 and pos2Place whether have (S=max (ScoreSS*ScoreYY, ScoreSY1,
ScoreSY2)>0.8;
If pos1+ 1 and pos2Place, (S=max (ScoreSS*ScoreYY, ScoreSY1, ScoreSY2)>0.8, then
Score+=S, pos1+=2, pos2+=1, continue, returns to step S3242;
Terminated to step S3245 traversal by above-mentioned step S3242, Score is the similarity score of two participles.
Step S325: normalize the similarity score as follows:
Sf = Score * 2 / (Size1 * Size2)
where Sf is the final normalized score, Score is the score obtained by the traversal in the previous step, Size1 is the number of characters in the first Chinese character string, and Size2 is the number of characters in the second Chinese character string;
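The walk of steps S3241 to S3245 and the normalization of step S325 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation. Each character is represented by its (initial, final) pair, so "identical" in S3242 is read as "same pronunciation", which is how the device description below phrases it. ScoreSY1 is assumed to compare the initial of one character with the final of the other, and ScoreSY2 the reverse. The similarity matrix is a hypothetical toy lookup (`make_sim`). The text does not say what happens when S3245 also fails, so the sketch assumes one character is skipped in each word without scoring. The normalization is reproduced as printed; note that with a product in the denominator two identical words of n characters score 2/n, whereas a sum (Size1 + Size2) would map every identical pair to 1.

```python
from typing import Callable, Dict, Sequence, Tuple

Syllable = Tuple[str, str]           # (initial, final); "" marks a zero initial
SimFn = Callable[[str, str], float]  # lookup into the initial/final similarity matrix

THRESHOLD = 0.8  # the 0.8 cut-off used in steps S3243 to S3245


def cross_score(a: Syllable, b: Syllable, sim: SimFn) -> float:
    """S = max(ScoreSS * ScoreYY, ScoreSY1, ScoreSY2) for one syllable pair."""
    ss_yy = sim(a[0], b[0]) * sim(a[1], b[1])
    sy1 = sim(a[0], b[1])  # assumed meaning: initial of a vs final of b
    sy2 = sim(a[1], b[0])  # assumed meaning: final of a vs initial of b
    return max(ss_yy, sy1, sy2)


def step(w1: Sequence[Syllable], w2: Sequence[Syllable],
         p1: int, p2: int, sim: SimFn) -> Tuple[float, int, int]:
    """One pass of S3242 to S3245: (score gain, advance in w1, advance in w2)."""
    a, b = w1[p1], w2[p2]
    if a == b:                                  # S3242: same pronunciation
        return 1.0, 1, 1
    s = sim(a[0], b[0]) * sim(a[1], b[1])       # S3243: ScoreSS * ScoreYY
    if s > THRESHOLD:
        return s, 1, 1
    if p2 + 1 < len(w2):                        # S3244: pos1 vs pos2 + 1
        s = cross_score(a, w2[p2 + 1], sim)
        if s > THRESHOLD:
            return s, 1, 2
    if p1 + 1 < len(w1):                        # S3245: pos1 + 1 vs pos2
        s = cross_score(w1[p1 + 1], b, sim)
        if s > THRESHOLD:
            return s, 2, 1
    return 0.0, 1, 1  # assumption: no match inside the window, skip both


def pinyin_similarity(w1: Sequence[Syllable], w2: Sequence[Syllable],
                      sim: SimFn) -> float:
    score, p1, p2 = 0.0, 0, 0  # 0-based positions; the text starts at 1
    while p1 < len(w1) and p2 < len(w2):
        gain, d1, d2 = step(w1, w2, p1, p2, sim)
        score += gain
        p1 += d1
        p2 += d2
    # Step S325 as printed: Sf = Score * 2 / (Size1 * Size2)
    return score * 2 / (len(w1) * len(w2))


def make_sim(table: Dict[Tuple[str, str], float]) -> SimFn:
    """Symmetric toy matrix: identical units score 1.0, unlisted pairs 0.0."""
    def sim(x: str, y: str) -> float:
        if x == y:
            return 1.0
        return table.get((x, y), table.get((y, x), 0.0))
    return sim


sim = make_sim({("zh", "z"): 0.9})
print(pinyin_similarity([("zh", "ang"), ("s", "an")], [("z", "ang"), ("s", "an")], sim))
# about 0.95
```

Splitting one iteration into `step` keeps each branch of the flowchart next to its step number, and returning the two advances makes the pos2 += 2 and pos1 += 2 cases explicit.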
Step S303: mine error correction candidates.
Using the similarity calculation of the previous step, error correction candidates are mined from online interaction logs. The main purpose of this step is to mine, from users' historical logs and with the application domain in view, error correction candidates that serve the application goal.
The approach to mining candidates is the same as for conventional error correction, except that the pronunciation similarity of the text is used as the similarity measure. There are two main methods: a) from user sessions (for example, customer-service sessions), mine pronunciation errors that users actively corrected themselves, by repeatedly mining pronunciation-similar pairs among the different inputs within the session context; b) manually define key phrases for the domain and mine their error-prone variants: according to the business objective, key domain phrases are customized by hand, and results whose pronunciation is similar to the customized phrases are mined from a large volume of logs.
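Method a) can be sketched as counting sound-alike re-entries between consecutive turns of a session. This is a hedged illustration only: the log format `(text, pinyin syllables)`, the restriction to consecutive turns, the 0.8 threshold and the `min_count` cut-off are all hypothetical, and `difflib.SequenceMatcher.ratio` over syllable lists stands in for the pronunciation similarity of step S325.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Sequence, Tuple

Query = Tuple[str, Sequence[str]]  # hypothetical log record: (raw text, pinyin syllables)
Similarity = Callable[[Sequence[str], Sequence[str]], float]


def syllable_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """Stand-in pronunciation similarity: difflib ratio over syllable lists."""
    return SequenceMatcher(None, a, b).ratio()


def mine_session_candidates(sessions: Iterable[List[Query]],
                            similarity: Similarity = syllable_ratio,
                            threshold: float = 0.8,
                            min_count: int = 2) -> List[Tuple[str, str]]:
    """Return (typed, retyped) pairs where the user re-entered a different text
    that sounds alike in the next turn, keeping pairs seen at least min_count times."""
    counts: Counter = Counter()
    for session in sessions:
        for (text1, py1), (text2, py2) in zip(session, session[1:]):
            if text1 != text2 and similarity(py1, py2) >= threshold:
                counts[(text1, text2)] += 1
    return [pair for pair, n in counts.most_common() if n >= min_count]


sessions = [
    [("再现客服", ["zai", "xian", "ke", "fu"]), ("在线客服", ["zai", "xian", "ke", "fu"])],
    [("再现客服", ["zai", "xian", "ke", "fu"]), ("在线客服", ["zai", "xian", "ke", "fu"])],
    [("查话费", ["cha", "hua", "fei"]), ("改密码", ["gai", "mi", "ma"])],
]
print(mine_session_candidates(sessions))  # [('再现客服', '在线客服')]
```

Requiring a pair to recur reflects the "repeatedly" in method a): a single sound-alike re-entry may be noise, while the same substitution across many sessions is a stronger signal.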
Step S304: perform error correction.
Online error correction is performed on the input using the error correction candidates (the word set). The approach of this embodiment is as follows:
1) Segment the user input S0 into words, and look up in the candidate set each phrase formed from adjacent words (combinations of 1 to 4 adjacent words are tried). If an error correction candidate exists, the corresponding phrase in the original input is replaced with it, yielding a possible intended input Si (i = 1, 2, ...).
2) Calculate the language model score of the original input S0 and of each possible input Si (the language model score measures the fluency of a sentence).
3) Compare the score of S0 with the scores of the Si.
If S0 has the highest score, no correction is made; if some Si has a higher score, the input is corrected using the replacement that produced that Si.
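Steps 1) to 3) can be sketched as below, assuming the candidate set is a dictionary from a phrase to its correction words and that adjacent Chinese words are joined without spaces to form a phrase. The "language model" here is a hypothetical toy that counts known bigrams; a real system would plug in a trained model through the same `lm_score` callable. Ties keep S0, which is an assumption since the text only says S0 is kept when it scores higher.

```python
from typing import Callable, Dict, List, Sequence

LmScore = Callable[[Sequence[str]], float]


def candidate_inputs(words: List[str], candidates: Dict[str, List[str]],
                     max_n: int = 4) -> List[List[str]]:
    """Build every Si by replacing one run of 1..max_n adjacent words of S0
    with an error correction candidate for that run."""
    variants = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            phrase = "".join(words[i:i + n])  # Chinese words join without spaces
            for fix in candidates.get(phrase, []):
                variants.append(words[:i] + [fix] + words[i + n:])
    return variants


def correct(words: List[str], candidates: Dict[str, List[str]],
            lm_score: LmScore) -> List[str]:
    """Keep S0 unless some Si has a strictly higher language model score."""
    best, best_score = words, lm_score(words)
    for si in candidate_inputs(words, candidates):
        score = lm_score(si)
        if score > best_score:
            best, best_score = si, score
    return best


# Hypothetical toy "language model": counts adjacent word pairs it knows.
KNOWN = {("我", "要"), ("要", "在线"), ("在线", "客服")}


def toy_lm(words: Sequence[str]) -> float:
    return float(sum((a, b) in KNOWN for a, b in zip(words, words[1:])))


print(correct(["我", "要", "再现", "客服"], {"再现": ["在线"]}, toy_lm))
# ['我', '要', '在线', '客服']
```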
As can be seen from the above embodiments, the embodiments of the present invention mine pronunciation-similar texts to obtain transition probabilities between phonetic units and use them as the similarity between phonetic units, and relax the pinyin alignment requirement, that is, the most similar phonetic unit may be searched for within a window. When a word containing Arabic numerals is processed, the numerals are first converted into Chinese characters, so that the similarity between that word and other words can be calculated. Through these technical means, the technical solution provided by the embodiments of the present invention has the following advantages: 1) the pronunciation similarity is obtained by a statistics-based method from data that come from real user behavior, so it better represents the similarity between phonetic units in actual use, and the results are more accurate; 2) a pronunciation similarity can be obtained between any two phonetic units, whether of different pronunciation types or of the same type, as a floating-point value, which makes the similarities between different phonetic units more comparable; 3) when positions are aligned in calculating the similarity of pronounced text, the optimal alignment may be searched for within a window, which makes the similarity calculation robust to missing or extra characters.
Based on the foregoing embodiments, an embodiment of the present invention provides a text error correction device. Each unit included in the device, each module included in each unit, and each sub-module included in each module may be implemented by a processor in a first computing device, or, of course, by a specific logic circuit. In specific embodiments, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
Fig. 4 is a first schematic structural diagram of a text error correction device according to an embodiment of the present invention. As shown in Fig. 4, the device 400 includes a first forming unit 401, a labeling unit 402, a first determining unit 403, a first judging unit 404, and a second determining unit 405, wherein:
The first forming unit 401 is configured to collect a first corpus in the form of word pairs;
The labeling unit 402 is configured to label both words of each word pair with their pinyin;
The first determining unit 403 is configured to determine the pinyin similarity between the two words of the word pair, the similarity indicating how similar the pinyin of the first word of the pair is to the pinyin of the second word;
The first judging unit 404 is configured to judge whether the similarity meets a preset condition;
The second determining unit 405 is configured to determine, if the similarity meets the preset condition, the two words of the word pair as correction words of each other.
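The pipeline formed by units 401 to 405 can be sketched as building a lookup table from word pairs. This is a minimal sketch: the "preset condition" is assumed to be similarity ≥ 0.8, `to_pinyin` is backed by a hypothetical lexicon, and `difflib.SequenceMatcher.ratio` stands in for the pinyin similarity described above. The `mutual` flag covers both variants in the claims: the two words correct each other, or only the first word corrects the second.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, Sequence, Set, Tuple


def build_correction_dict(
    pairs: Iterable[Tuple[str, str]],
    to_pinyin: Callable[[str], Sequence[str]],
    similarity: Callable[[Sequence[str], Sequence[str]], float],
    threshold: float = 0.8,
    mutual: bool = True,
) -> Dict[str, Set[str]]:
    """Map each word to the set of words that may correct it.

    mutual=True: the two words of a qualifying pair correct each other;
    mutual=False: only the first word corrects the second."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for first, second in pairs:
        if first == second:
            continue
        if similarity(to_pinyin(first), to_pinyin(second)) >= threshold:
            table[second].add(first)
            if mutual:
                table[first].add(second)
    return dict(table)


# Hypothetical pinyin lexicon and stand-in similarity.
PINYIN = {"在线": ["zai", "xian"], "再现": ["zai", "xian"], "客服": ["ke", "fu"]}


def ratio(a: Sequence[str], b: Sequence[str]) -> float:
    return SequenceMatcher(None, a, b).ratio()


pairs = [("在线", "再现"), ("在线", "客服")]
print(build_correction_dict(pairs, PINYIN.__getitem__, ratio))
# {'再现': {'在线'}, '在线': {'再现'}}
```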
In other embodiments of the present invention, the labeling unit includes a first judging module and a first labeling module, wherein:
The first judging module is configured to judge whether the word pair contains a polyphonic character;
The first labeling module is configured to label both words of the word pair with their pinyin if the word pair contains no polyphonic character.
In other embodiments of the present invention, the labeling unit further includes a second judging module and a second labeling module, wherein:
The second judging module is configured to further judge, if the word pair contains a polyphonic character, whether either of the two words of the word pair contains a polyphonic word;
The second labeling module is configured to label, if at least one of the two words of the word pair contains a polyphonic word, the two or more pinyin readings of that polyphonic word as part or all of the pinyin of the corresponding word in the pair.
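When a word carries several pinyin labels, the similarity between two words has to be taken over their readings. The text does not say how; the sketch below assumes the pair scores as its most similar combination of readings. Both lexicons are hypothetical stand-ins (朝阳 is read zhāoyáng or cháoyáng depending on meaning), and `difflib.SequenceMatcher.ratio` stands in for the pinyin similarity.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Dict, List

# Hypothetical lexicons: polyphonic words with all their readings,
# and a single reading for every other character.
POLYPHONIC_WORDS: Dict[str, List[List[str]]] = {
    "朝阳": [["zhao", "yang"], ["chao", "yang"]],
}
CHAR_PINYIN: Dict[str, str] = {"潮": "chao", "阳": "yang"}


def readings(word: str) -> List[List[str]]:
    """All pinyin labels of a word: every listed reading if it is a polyphonic word."""
    if word in POLYPHONIC_WORDS:
        return POLYPHONIC_WORDS[word]
    return [[CHAR_PINYIN[ch] for ch in word]]


def best_similarity(w1: str, w2: str) -> float:
    """Assumption: a multi-reading pair scores as its most similar reading pair."""
    return max(SequenceMatcher(None, r1, r2).ratio()
               for r1, r2 in product(readings(w1), readings(w2)))


print(best_similarity("朝阳", "潮阳"))  # 1.0
```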
In other embodiments of the present invention, the labeling unit includes a third judging module, a conversion module, and a third labeling module, wherein:
The third judging module is configured to judge whether the word pair contains Arabic numerals;
The conversion module is configured to convert the Arabic numerals into the corresponding Chinese characters if the word pair contains Arabic numerals;
The third labeling module is configured to label, with pinyin, the word of the pair that has been converted into Chinese characters.
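The text does not specify how numerals are read, and Chinese uses both a digit-by-digit reading (common for phone and train numbers) and a place-value reading. The sketch below shows one hypothetical helper for each; the place-value helper is deliberately limited to 0..99.

```python
DIGITS = "零一二三四五六七八九"


def digits_to_hanzi(text: str) -> str:
    """Replace each ASCII digit with its Chinese numeral, one digit at a time."""
    return "".join(DIGITS[ord(ch) - ord("0")] if "0" <= ch <= "9" else ch
                   for ch in text)


def small_number_to_hanzi(n: int) -> str:
    """Place-value reading for 0..99, e.g. 12 -> 十二, 40 -> 四十."""
    if not 0 <= n <= 99:
        raise ValueError("sketch only handles 0..99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    head = "" if tens == 1 else DIGITS[tens]
    return head + "十" + (DIGITS[ones] if ones else "")


print(digits_to_hanzi("12306"))   # 一二三零六
print(small_number_to_hanzi(12))  # 十二
```

Checking `"0" <= ch <= "9"` rather than `str.isdigit()` avoids converting other Unicode digit characters, such as superscripts, that `int()` cannot parse.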
In other embodiments of the present invention, the first determining unit includes an alignment module, a calculation module, and a first determining module, wherein:
The alignment module is configured to align the initials of the pinyin of the two words in the word pair and to align the finals of the pinyin of the two words;
The calculation module is configured to calculate the transition probability of converting the pinyin of the first word of the pair into the pinyin of the second word;
The first determining module is configured to determine the pinyin similarity between the two words of the pair according to the transition probability.
In other embodiments of the present invention, the alignment module is configured to align the initials of the pinyin of the two words in the word pair, and the finals of the pinyin of the two words, in the manner that yields the largest number of identical pronunciations.
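Aligning initials with initials and finals with finals first requires splitting each syllable. The sketch below assumes toneless pinyin strings and the standard initial inventory; treating y and w as initials is a common tooling convention rather than something the text states.

```python
from typing import List, Tuple

# Two-letter initials come first so "zh" is not read as "z" + "h...".
# Treating y and w as initials is an assumed tooling convention.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllables such as "an" or "er"


def initials_and_finals(syllables: List[str]) -> Tuple[List[str], List[str]]:
    """Separate a word's pinyin into its initial sequence and final sequence,
    so that initials can be aligned with initials and finals with finals."""
    parts = [split_syllable(s) for s in syllables]
    return [i for i, _ in parts], [f for _, f in parts]


print(initials_and_finals(["zhang", "an"]))  # (['zh', ''], ['ang', 'an'])
```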
In other embodiments of the present invention, the calculation module includes a calculation sub-module, a first determining sub-module, a second determining sub-module, and a conversion sub-module, wherein:
The calculation sub-module is configured to, if the characters at the same position of the two aligned words have the same pronunciation, add 1 to the score Score and add 1 to both the position in the first word and the position in the second word;
The first determining sub-module is configured to, if the characters at the same position of the two aligned words do not have the same pronunciation, determine the score Score between the pinyin of the first word and the pinyin of the second word according to a preset initial/final similarity matrix;
The second determining sub-module is configured to determine the normalized final score Sf according to the score Score, the number of characters in the first word, and the number of characters in the second word;
The conversion sub-module is configured to determine, according to the final score Sf, the transition probability of converting the pinyin of the first word of the pair into the pinyin of the second word.
In other embodiments of the present invention, the second determining sub-module is configured to:
obtain, according to the preset initial/final similarity matrix, the similarity between the initials and the similarity between the finals of the characters at the same position of the two words;
if the product S of the initial similarity and the final similarity is greater than a first preset value, add S to the score Score, and add 1 to both the position in the first word and the position in the second word;
if the product S of the initial similarity and the final similarity of the characters at the same position of the two words is less than or equal to the first preset value, obtain, according to the initial/final similarity matrix, for the character at the current position of the first word and the character at the next position of the second word, the similarity between their initials, the similarity between their finals, the similarity between initial and final, and the similarity between final and initial;
determine a first maximum, which is the largest of three values: the product S of the initial similarity and the final similarity between the character at the current position of the first word and the character at the next position of the second word; the similarity between the initial of the character at the current position of the first word and the final of the character at the next position of the second word; and the similarity between the final of the character at the current position of the first word and the initial of the character at the next position of the second word;
judge whether the first maximum is greater than a second preset value; if so, the score Score becomes the previously obtained Score plus the first maximum, and 1 is added to both the position in the first word and the position in the second word;
if the first maximum is less than or equal to the second preset value, obtain, according to the initial/final similarity matrix, for the character at the next position of the first word and the character at the current position of the second word, the similarity between their initials, the similarity between their finals, the similarity between initial and final, and the similarity between final and initial;
determine a second maximum, which is the largest of three values: the product S of the initial similarity and the final similarity between the character at the next position of the first word and the character at the current position of the second word; the similarity between the initial of the character at the next position of the first word and the final of the character at the current position of the second word; and the similarity between the final of the character at the next position of the first word and the initial of the character at the current position of the second word;
judge whether the second maximum is greater than a third preset value; if so, the score Score becomes the previously obtained Score plus the second maximum, and 1 is added to both the position in the first word and the position in the second word; then judge whether the characters at the updated positions of the first word and the second word have the same pronunciation. Through the above steps, the pinyin of the first word and the pinyin of the second word are traversed and the score Score is calculated.
In other embodiments of the present invention, the device further includes a third determining unit configured to determine the initial/final similarity matrix. The third determining unit further includes a second determining module, a third determining module, a fourth determining module, a fifth determining module, a sixth determining module, a seventh determining module, and an eighth determining module, wherein:
The second determining module is configured to determine the total number of times a first phonetic unit is mispronounced in the second corpus, the first phonetic unit being an initial or a final;
The third determining module is configured to determine the number of times the first phonetic unit is mispronounced as a second phonetic unit;
The fourth determining module is configured to determine the probability that the first phonetic unit transfers to the second phonetic unit according to the total number of times the first phonetic unit is mispronounced and the number of times it is mispronounced as the second phonetic unit;
The fifth determining module is configured to determine the total number of times the second phonetic unit is mispronounced in the second corpus, the second phonetic unit being an initial or a final;
The sixth determining module is configured to determine the number of times the second phonetic unit is mispronounced as the first phonetic unit;
The seventh determining module is configured to determine the probability that the second phonetic unit transfers to the first phonetic unit according to the total number of times the second phonetic unit is mispronounced and the number of times it is mispronounced as the first phonetic unit;
The eighth determining module is configured to determine the similarity between the first phonetic unit and the second phonetic unit according to the probability that the first phonetic unit transfers to the second phonetic unit and the probability that the second phonetic unit transfers to the first phonetic unit.
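The second to eighth determining modules amount to estimating directional confusion probabilities from counts and then combining the two directions. The sketch below assumes the mismatches have already been extracted from the aligned word pairs of the second corpus as `(intended unit, unit actually written)` tuples. The text does not state how the two directions are combined, so averaging them is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Unit = str  # an initial or a final


def transfer_probabilities(errors: Iterable[Tuple[Unit, Unit]]) -> Dict[Tuple[Unit, Unit], float]:
    """errors holds (intended unit, unit actually written) for each aligned
    mismatch. Returns P(a -> b) = (times a was mispronounced as b)
    / (total times a was mispronounced)."""
    mistakes = [(a, b) for a, b in errors if a != b]
    totals = Counter(a for a, _ in mistakes)
    pair_counts = Counter(mistakes)
    return {(a, b): n / totals[a] for (a, b), n in pair_counts.items()}


def unit_similarity(p: Dict[Tuple[Unit, Unit], float], a: Unit, b: Unit) -> float:
    """Combine both directions; averaging is an assumption, the text only
    says the similarity is determined from P(a -> b) and P(b -> a)."""
    if a == b:
        return 1.0
    return (p.get((a, b), 0.0) + p.get((b, a), 0.0)) / 2


p = transfer_probabilities([("zh", "z"), ("zh", "z"), ("zh", "ch"), ("z", "zh")])
print(unit_similarity(p, "zh", "z"))  # about 0.833
```

Because each direction is normalized by its own error total, a unit that is rarely mispronounced can still show a high transfer probability toward the one unit it is confused with.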
It should be noted that the above description of the device embodiments is similar to that of the method embodiments and has similar beneficial effects. For technical details not disclosed in the device embodiments of the present invention, refer to the description of the method embodiments.
Based on the foregoing embodiments, an embodiment of the present invention provides a text error correction device. Each unit included in the device may be implemented by a processor in a second computing device, or, of course, by a specific logic circuit. In specific embodiments, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
Fig. 5 is a second schematic structural diagram of a text error correction device according to an embodiment of the present invention. As shown in Fig. 5, the device 500 includes a fourth determining unit 501, a second judging unit 502, a fifth determining unit 503, a sixth determining unit 504, a third judging unit 505, and an error correction unit 506, wherein:
The fourth determining unit 501 is configured to determine a word to be corrected, the word to be corrected being a word in a sentence input by a user;
The second judging unit 502 is configured to judge whether there is a word set corresponding to the word to be corrected, the word set including at least one correction word for the word to be corrected, calculated from pinyin similarity;
The fifth determining unit 503 is configured to determine a first language model score, which is the language model score of the sentence containing the word to be corrected;
The sixth determining unit 504 is configured to determine second language model scores, which are the language model scores of the sentence when the word to be corrected is replaced by each correction word in the word set;
The third judging unit 505 is configured to judge whether any of the second language model scores is greater than the first language model score, obtaining a judgment result;
The error correction unit 506 is configured to correct the word to be corrected according to the judgment result.
In other embodiments of the present invention, the error correction unit is configured to: if any of the second language model scores is greater than the first language model score, determine the correction word with the highest language model score as the correction for the word to be corrected; if none of the second language model scores is greater than the first language model score, leave the word to be corrected unchanged.
In other embodiments of the present invention, the device further includes a first forming unit, a labeling unit, a first determining unit, a first judging unit, a second determining unit, and a second forming unit, wherein:
The first forming unit is configured to collect a first corpus in the form of word pairs;
The labeling unit is configured to label both words of each word pair with their pinyin;
The first determining unit is configured to determine the pinyin similarity between the two words of the word pair, the similarity indicating how similar the pinyin of the first word of the pair is to the pinyin of the second word;
The first judging unit is configured to judge whether the similarity meets a preset condition;
The second determining unit is configured to determine, if the similarity meets the preset condition, the two words of the word pair as correction words of each other;
The second forming unit is configured to form the word set from the correction words.
It should be noted that the above description of the device embodiments is similar to that of the method embodiments and has similar beneficial effects. For technical details not disclosed in the device embodiments of the present invention, refer to the description of the method embodiments.
Based on the foregoing embodiments, an embodiment of the present invention provides a computing device. Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention. As shown in Fig. 6, the computing device 600 may include components such as at least one processor 601, at least one communication bus 602, a user interface 603, at least one external communication interface 604, and a memory 605 for storing an executable program. The communication bus 602 enables connection and communication among the processor 601, the user interface 603, the external communication interface 604, and the memory 605. The user interface 603 may include a display screen and a keyboard. The external communication interface 604 may optionally include a wired interface and a wireless interface.
The processor 601 is configured to:
Collect a first corpus in the form of word pairs;
Label both words of each word pair in the first corpus with their pinyin;
Determine the pinyin similarity between the two words of the word pair, the similarity indicating how similar the pinyin of the first word of the pair is to the pinyin of the second word;
If the similarity meets a preset condition, determine the two words of the word pair as correction words of each other, or determine the first word as the correction word of the second word;
Form an error correction dictionary from the word pairs whose similarity meets the preset condition;
Send the error correction dictionary to a terminal through the external communication interface 604.
It should be noted that the above description of the server embodiment is similar to the description of the method and has the same beneficial effects as the method embodiments. For technical details not disclosed in the server embodiment of the present invention, those skilled in the art may refer to the description of the method embodiments.
It should be noted that, in the embodiments of the present invention, if the above text error correction method is implemented in the form of software functional modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software. Correspondingly, an embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the text error correction method in the embodiments of the present invention.
It should be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present invention. Therefore, occurrences of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention. The sequence numbers of the above embodiments are for description only and do not indicate the relative merit of the embodiments.
It should be noted that, herein, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections between the components shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms. The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiments. In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disc. Alternatively, if the above integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
The foregoing is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that readily occurs to those skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (16)
1. A text error correction method, characterized in that the method comprises:
collecting a first corpus in the form of word pairs;
labeling both words of each word pair in the first corpus with their pinyin;
determining the pinyin similarity between the two words of the word pair, the similarity indicating how similar the pinyin of the first word of the pair is to the pinyin of the second word;
if the similarity meets a preset condition, determining the two words of the word pair as correction words of each other, or determining the first word as the correction word of the second word.
2. The method according to claim 1, characterized in that labeling both words of the word pair with their pinyin comprises:
if the word pair contains no polyphonic character, labeling both words of the word pair with their pinyin.
3. The method according to claim 2, characterized in that labeling both words of the word pair with their pinyin further comprises:
if the word pair contains a polyphonic character and at least one of the two words of the word pair contains a polyphonic word, labeling the two or more pinyin readings of the polyphonic word as part or all of the pinyin of the corresponding word in the pair.
4. The method according to claim 1, characterized in that labeling both words of the word pair with their pinyin comprises:
if the word pair contains Arabic numerals, converting the Arabic numerals into the corresponding Chinese characters;
labeling, with pinyin, the word of the pair that has been converted into Chinese characters.
5. The method according to any one of claims 1 to 4, characterized in that determining the pinyin similarity between the two words of the word pair comprises:
aligning the initials of the pinyin of the two words in the word pair and aligning the finals of the pinyin of the two words;
calculating the transition probability of converting the pinyin of the first word of the pair into the pinyin of the second word;
determining the pinyin similarity between the two words of the word pair according to the transition probability.
6. The method according to claim 5, characterized in that aligning the initials of the pinyin of the two words in the word pair and aligning the finals of the pinyin of the two words comprises:
aligning the initials of the pinyin of the two words in the word pair, and the finals of the pinyin of the two words, in the manner that yields the largest number of identical pronunciations.
7. The method according to claim 5, characterized in that calculating the transition probability of converting the pinyin of the first word of the pair into the pinyin of the second word comprises:
determining the number of phonetic units that differ between the first word and the second word;
determining the transition probability according to the number of differing phonetic units and the length of the phonetic-unit string of the first word or the second word.
8. The method according to claim 5, wherein calculating the transition probability that the pinyin of the first word segment in the pair converts into the pinyin of the second word segment comprises:
if the characters at the same position in the two aligned word segments are homophones, adding 1 to a score Score, and advancing both the position in the first word segment and the position in the second word segment by 1;
if the characters at the same position in the two aligned word segments are not homophones, determining the score Score of the pinyin of the first word segment and the pinyin of the second word segment according to a preset initial-final similarity matrix;
determining a normalized final score Sf according to the score Score, the number of characters in the first word segment, and the number of characters in the second word segment;
determining, according to the final score Sf, the transition probability that the pinyin of the first word segment in the pair converts into the pinyin of the second word segment.
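The scoring loop of claim 8 can be sketched as follows. The sketch makes these assumptions:

- A non-homophone position contributes the mean of the similarity of its initials and the similarity of its finals, taken from the matrix.
- Both positions advance by one in either case.
- Sf = 2 * Score / (n1 + n2), where n1 and n2 are the character counts. This Dice-style normalization is chosen here for illustration.
- Sf itself is used as the transition probability.

The matrix is passed as a dict keyed by unit pairs. The example value 0.8 is made up.

```python
from typing import Dict, List, Tuple

Syllable = Tuple[str, str]          # (initial, final)
SimMatrix = Dict[Tuple[str, str], float]


def unit_similarity(x: str, y: str, sim: SimMatrix) -> float:
    """Look up an initial/final similarity; identical units score 1.0."""
    if x == y:
        return 1.0
    return sim.get((x, y), sim.get((y, x), 0.0))


def final_score(word_a: List[Syllable], word_b: List[Syllable], sim: SimMatrix) -> float:
    """Normalized score Sf for two aligned words (sketch of claim 8)."""
    if not word_a and not word_b:
        return 1.0
    score = 0.0
    # Both positions advance together over the aligned syllables.
    for syl_a, syl_b in zip(word_a, word_b):
        if syl_a == syl_b:
            score += 1.0  # homophones: Score + 1
        else:
            score += (unit_similarity(syl_a[0], syl_b[0], sim)
                      + unit_similarity(syl_a[1], syl_b[1], sim)) / 2
    # Assumed normalization by both word lengths (number of characters).
    return 2 * score / (len(word_a) + len(word_b))


sim = {("zh", "z"): 0.8}
print(final_score([("zh", "ong"), ("g", "uo")], [("z", "ong"), ("g", "uo")], sim))
```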
9. The method according to claim 1, wherein determining the pinyin similarity between the two word segments in the pair comprises:
determining the pinyin similarity between the two word segments in the pair using a preset initial-final similarity matrix.
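Claim 9 reads the similarity directly from the initial-final similarity matrix. The sketch below shows one possible matrix structure. The matrix is symmetric, with 1.0 on the diagonal, and it returns 0.0 for units it does not contain. Word-level similarity is computed as the mean matrix entry over aligned units. The class name, the method names, and the averaging rule are all assumptions.

```python
from typing import Dict, List, Sequence


class UnitSimilarityMatrix:
    """Symmetric similarity matrix over initials and finals (hypothetical layout)."""

    def __init__(self, units: Sequence[str]):
        self.index: Dict[str, int] = {u: k for k, u in enumerate(units)}
        size = len(units)
        # 1.0 on the diagonal, 0.0 elsewhere until values are filled in.
        self.values = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]

    def set(self, x: str, y: str, value: float) -> None:
        i, j = self.index[x], self.index[y]
        self.values[i][j] = self.values[j][i] = value

    def get(self, x: str, y: str) -> float:
        if x == y:
            return 1.0
        if x not in self.index or y not in self.index:
            return 0.0  # unknown units are treated as dissimilar
        return self.values[self.index[x]][self.index[y]]


def word_similarity(units_a: List[str], units_b: List[str], matrix: UnitSimilarityMatrix) -> float:
    """Mean matrix similarity over aligned units; missing units score 0."""
    length = max(len(units_a), len(units_b))
    if length == 0:
        return 1.0
    return sum(matrix.get(x, y) for x, y in zip(units_a, units_b)) / length


m = UnitSimilarityMatrix(["zh", "z", "ong", "g", "uo"])
m.set("zh", "z", 0.8)
print(word_similarity(["zh", "ong", "g", "uo"], ["z", "ong", "g", "uo"], m))
```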
10. The method according to claim 8 or 9, wherein determining the initial-final similarity matrix comprises:
collecting a second corpus in the form of word-segment pairs;
labeling both word segments of each pair in the second corpus in pinyin form;
determining the total number of times a first phonetic unit is mispronounced in the second corpus, the first phonetic unit being an initial or a final;
determining the number of times the first phonetic unit is mispronounced as a second phonetic unit;
determining the probability that the first phonetic unit transfers to the second phonetic unit according to the total number of times the first phonetic unit is mispronounced and the number of times it is mispronounced as the second phonetic unit;
determining the total number of times the second phonetic unit is mispronounced in the second corpus, the second phonetic unit being an initial or a final;
determining the number of times the second phonetic unit is mispronounced as the first phonetic unit;
determining the probability that the second phonetic unit transfers to the first phonetic unit according to the total number of times the second phonetic unit is mispronounced and the number of times it is mispronounced as the first phonetic unit;
determining the similarity between the first phonetic unit and the second phonetic unit according to the probability that the first phonetic unit transfers to the second and the probability that the second phonetic unit transfers to the first, and forming the initial-final similarity matrix from the similarities between phonetic units.
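The counting in claim 10 can be sketched as follows. The sketch assumes the second corpus has already been reduced to aligned (intended unit, pronounced unit) observations. P(a to b) is the number of times a was pronounced as b, divided by the total number of times a was mispronounced, as the claim describes. The claim says only that the similarity is determined from both directional probabilities. Combining them by taking their mean is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def build_similarity(observations: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Derive unit similarities from (intended, pronounced) observations.

    Each observation pairs the initial or final that should have been
    pronounced with the one actually produced. Keys are sorted unit pairs.
    """
    confusions: Counter = Counter()   # (a, b) -> times a was pronounced as b
    errors: Counter = Counter()       # a -> total times a was mispronounced
    for intended, pronounced in observations:
        if intended != pronounced:
            confusions[(intended, pronounced)] += 1
            errors[intended] += 1

    def transfer(a: str, b: str) -> float:
        # P(a -> b) = count(a pronounced as b) / count(a mispronounced)
        return confusions[(a, b)] / errors[a] if errors[a] else 0.0

    similarity: Dict[Tuple[str, str], float] = {}
    for a, b in list(confusions):
        x, y = sorted((a, b))
        # Assumed combination rule: mean of the two directional probabilities.
        similarity[(x, y)] = (transfer(x, y) + transfer(y, x)) / 2
    return similarity


obs = [("zh", "z")] * 3 + [("zh", "ch")] + [("z", "zh")] * 2 + [("z", "c")] * 2
print(build_similarity(obs)[("z", "zh")])  # 0.625
```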
11. The method according to claim 1, further comprising:
forming an error correction dictionary from the word-segment pairs whose similarity meets the preset condition;
sending the error correction dictionary to a terminal.
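Claim 11 turns the qualifying pairs into a dictionary that is sent to the terminal. The sketch below makes three assumptions. The "preset condition" is a similarity threshold of 0.8. Mutual correction entries are recorded in both directions. JSON is used as the wire format, which is a hypothetical choice.

```python
import json
from typing import Dict, Iterable, List, Tuple


def build_correction_dict(scored_pairs: Iterable[Tuple[str, str, float]],
                          threshold: float = 0.8) -> Dict[str, List[str]]:
    """Keep pairs whose pinyin similarity meets the (assumed) threshold."""
    dictionary: Dict[str, List[str]] = {}
    for first, second, similarity in scored_pairs:
        if similarity >= threshold:
            # Mutual correction words: record the pair in both directions.
            dictionary.setdefault(first, []).append(second)
            dictionary.setdefault(second, []).append(first)
    return dictionary


def to_payload(dictionary: Dict[str, List[str]]) -> str:
    """Serialize the dictionary for the terminal (JSON is an assumed format)."""
    return json.dumps(dictionary, ensure_ascii=False, sort_keys=True)


d = build_correction_dict([("设置", "色织", 0.9), ("咖啡", "哈飞", 0.5)])
print(to_payload(d))
```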
12. The method according to claim 1, further comprising:
receiving an error correction request sent by a terminal, the error correction request carrying a sentence input by a user;
determining a word segment to be corrected, the word segment to be corrected being a word segment in the sentence input by the user;
if there is a word-segment set corresponding to the word segment to be corrected, determining a first language model score and second language model scores, wherein the first language model score is the language model score of the word segment to be corrected in the sentence, the word-segment set contains at least one error correction word segment for correcting the word segment to be corrected, and the second language model scores are the language model scores of the respective error correction word segments in the set when placed in the sentence;
if any of the second language model scores is greater than the first language model score, determining the error correction word segment with the highest language model score as the correction word for the word segment to be corrected;
carrying the correction word in a first error correction response, and sending the first error correction response to the terminal.
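Claim 12's selection rule compares the language model score of the original sentence with the score obtained by substituting each candidate. The sketch below uses a toy bigram model with add-one smoothing as a stand-in, because the patent does not say which language model it uses. `BigramLM` and `pick_correction` are hypothetical names. A candidate is returned only if its score is strictly higher than the original score.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence


class BigramLM:
    """Toy add-one-smoothed bigram model standing in for the real language model."""

    def __init__(self, sentences: Sequence[List[str]]):
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def score(self, sentence: List[str]) -> float:
        """Log-probability of a segmented sentence."""
        tokens = ["<s>"] + sentence
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + self.vocab_size))
            for prev, cur in zip(tokens, tokens[1:])
        )


def pick_correction(sentence: List[str], position: int,
                    candidates: Sequence[str], lm: BigramLM) -> Optional[str]:
    """Return the best-scoring candidate if it beats the original word, else None."""
    first_score = lm.score(sentence)           # first language model score
    best_word, best_score = None, first_score
    for candidate in candidates:
        trial = sentence[:position] + [candidate] + sentence[position + 1:]
        second_score = lm.score(trial)         # one second language model score
        if second_score > best_score:
            best_word, best_score = candidate, second_score
    return best_word


lm = BigramLM([["打开", "设置"], ["打开", "设置", "页面"], ["关闭", "设置"]])
print(pick_correction(["打开", "色织"], 1, ["设置", "社址"], lm))  # 设置
```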
13. The method according to claim 12, further comprising:
if none of the second language model scores is greater than the first language model score, or if there is no word-segment set corresponding to the word segment to be corrected, sending a second error correction response, the second error correction response indicating that the word segment to be corrected is not corrected.
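Claim 13 covers the fallback branch. The sketch below shows how a server might choose between the first and second error correction responses. The field names (`type`, `original`, `correction`) are hypothetical. The `second_scores` argument is `None` when no candidate set exists.

```python
from typing import Dict, Optional


def build_response(word: str, first_score: float,
                   second_scores: Optional[Dict[str, float]]) -> dict:
    """Choose between the first and second error correction responses.

    second_scores maps each candidate correction to its language model score
    in the sentence; it is None when no candidate set exists for `word`.
    """
    if second_scores:
        best = max(second_scores, key=second_scores.get)
        if second_scores[best] > first_score:
            # First error correction response: carries the correction word.
            return {"type": "first", "original": word, "correction": best}
    # Second error correction response: leave the word uncorrected.
    return {"type": "second", "original": word, "correction": None}


print(build_response("色织", -5.0, None)["type"])  # second
```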
14. A text error correction device, comprising a first forming unit, a labeling unit, a first determining unit, and a second determining unit, wherein:
the first forming unit is configured to collect a first corpus in the form of word-segment pairs;
the labeling unit is configured to label both word segments of each pair in pinyin form;
the first determining unit is configured to determine the pinyin similarity between the two word segments in the pair, the similarity indicating the degree of similarity between the pinyin of the first word segment and the pinyin of the second word segment in the pair;
the second determining unit is configured to determine, if the similarity meets a preset condition, the two word segments in the pair as error correction word segments of each other.
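The four units of claim 14 map naturally onto a small class. In this sketch, the pinyin lookup and the similarity function are passed in from outside. The threshold 0.8 stands in for the unspecified preset condition. The toy pinyin table and the simple same-syllable ratio are made up for illustration, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


@dataclass
class TextErrorCorrectionDevice:
    """Hypothetical mapping of claim 14's units onto methods."""
    to_pinyin: Callable[[str], List[str]]                 # backend for the labeling unit
    similarity: Callable[[List[str], List[str]], float]  # backend for the first determining unit
    threshold: float = 0.8                                # assumed "preset condition"

    def form_corpus(self, pairs: Iterable[Pair]) -> List[Pair]:
        """First forming unit: collect the first corpus as word-segment pairs."""
        return [(a, b) for a, b in pairs if a and b and a != b]

    def label(self, pair: Pair) -> Tuple[List[str], List[str]]:
        """Labeling unit: annotate both word segments with pinyin."""
        return self.to_pinyin(pair[0]), self.to_pinyin(pair[1])

    def determine_similarity(self, pair: Pair) -> float:
        """First determining unit: pinyin similarity of the pair."""
        return self.similarity(*self.label(pair))

    def run(self, pairs: Iterable[Pair]) -> List[Pair]:
        """Second determining unit: keep pairs that are mutual correction words."""
        return [p for p in self.form_corpus(pairs) if self.determine_similarity(p) >= self.threshold]


PINYIN = {"设置": ["she", "zhi"], "社址": ["she", "zhi"], "电脑": ["dian", "nao"], "天气": ["tian", "qi"]}


def same_syllable_ratio(a: List[str], b: List[str]) -> float:
    """Fraction of positions with identical syllables (toy similarity)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b), 1)


device = TextErrorCorrectionDevice(PINYIN.__getitem__, same_syllable_ratio)
print(device.run([("设置", "社址"), ("电脑", "天气")]))  # [('设置', '社址')]
```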
15. A server, comprising a processor and an external communication interface, wherein the processor is configured to:
collect a first corpus in the form of word-segment pairs;
label both word segments of each pair in the first corpus in pinyin form;
determine the pinyin similarity between the two word segments in the pair, the similarity indicating the degree of similarity between the pinyin of the first word segment and the pinyin of the second word segment in the pair;
if the similarity meets a preset condition, determine the two word segments in the pair as error correction word segments of each other, or determine the first word segment as an error correction word segment of the second word segment;
form an error correction dictionary from the word-segment pairs whose similarity meets the preset condition;
send the error correction dictionary to a terminal through the external communication interface.
16. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the text error correction method according to any one of claims 1 to 13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610922072.0A CN106598939B (en) | 2016-10-21 | 2016-10-21 | Text error correction method and device, server, and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106598939A true CN106598939A (en) | 2017-04-26 |
CN106598939B CN106598939B (en) | 2019-09-17 |
Family ID: 58555570
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610922072.0A Active CN106598939B (en) | 2016-10-21 | 2016-10-21 | A kind of text error correction method and device, server, storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106598939B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7565348B1 (en) * | 2005-03-24 | 2009-07-21 | Palamida, Inc. | Determining a document similarity metric |
CN101080875A (en) * | 2005-09-01 | 2007-11-28 | 日本电信电话株式会社 | Error correcting method and apparatus |
US20090055704A1 (en) * | 2005-09-01 | 2009-02-26 | Nippon Telegraph And Telephone Corporation | Error correction method and apparatus |
CN102915314A (en) * | 2011-08-05 | 2013-02-06 | 腾讯科技(深圳)有限公司 | Automatic error correction pair generation method and system |
CN103365925A (en) * | 2012-04-09 | 2013-10-23 | 高德软件有限公司 | Method for acquiring polyphone spelling, method for retrieving based on spelling, and corresponding devices |
CN103914444A (en) * | 2012-12-29 | 2014-07-09 | 高德软件有限公司 | Error correction method and device thereof |
CN104750672A (en) * | 2013-12-27 | 2015-07-01 | 重庆新媒农信科技有限公司 | Chinese word error correction method used in search and device thereof |
CN104991889A (en) * | 2015-06-26 | 2015-10-21 | 江苏科技大学 | Fuzzy word segmentation based non-multi-character word error automatic proofreading method |
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107357775A (en) * | 2017-06-05 | 2017-11-17 | 百度在线网络技术(北京)有限公司 | The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence |
US11314921B2 (en) | 2017-06-05 | 2022-04-26 | Baidu Online Network Technology (Beijing) Co., Ltd. | Text error correction method and apparatus based on recurrent neural network of artificial intelligence |
CN107301866B (en) * | 2017-06-23 | 2021-01-05 | 北京百度网讯科技有限公司 | Information input method |
CN107301866A (en) * | 2017-06-23 | 2017-10-27 | 北京百度网讯科技有限公司 | Data inputting method |
CN107564528A (en) * | 2017-09-20 | 2018-01-09 | 深圳市空谷幽兰人工智能科技有限公司 | A kind of speech recognition text and the method and apparatus of order word text matches |
CN107564528B (en) * | 2017-09-20 | 2020-12-15 | 广东惠禾科技发展有限公司 | Method and equipment for matching voice recognition text with command word text |
CN109753636A (en) * | 2017-11-01 | 2019-05-14 | 阿里巴巴集团控股有限公司 | Machine processing and text error correction method and device calculate equipment and storage medium |
CN108052499A (en) * | 2017-11-20 | 2018-05-18 | 北京百度网讯科技有限公司 | Text error correction method, device and computer-readable medium based on artificial intelligence |
CN108091328A (en) * | 2017-11-20 | 2018-05-29 | 北京百度网讯科技有限公司 | Speech recognition error correction method, device and readable medium based on artificial intelligence |
CN108052499B (en) * | 2017-11-20 | 2021-06-11 | 北京百度网讯科技有限公司 | Text error correction method and device based on artificial intelligence and computer readable medium |
CN108091328B (en) * | 2017-11-20 | 2021-04-16 | 北京百度网讯科技有限公司 | Speech recognition error correction method and device based on artificial intelligence and readable medium |
CN107958039A (en) * | 2017-11-21 | 2018-04-24 | 北京百度网讯科技有限公司 | A kind of term error correction method, device and server |
CN108563632A (en) * | 2018-03-29 | 2018-09-21 | 广州视源电子科技股份有限公司 | Modification method, system, computer equipment and the storage medium of word misspelling |
CN108491392A (en) * | 2018-03-29 | 2018-09-04 | 广州视源电子科技股份有限公司 | Modification method, system, computer equipment and the storage medium of word misspelling |
CN108519973A (en) * | 2018-03-29 | 2018-09-11 | 广州视源电子科技股份有限公司 | Detection method, system, computer equipment and the storage medium of word spelling |
CN108647346A (en) * | 2018-05-15 | 2018-10-12 | 苏州东巍网络科技有限公司 | A kind of the elderly's voice interactive method and system for wearable electronic |
CN108647346B (en) * | 2018-05-15 | 2021-10-29 | 苏州东巍网络科技有限公司 | Old people voice interaction method and system for wearable electronic equipment |
CN110019684A (en) * | 2018-08-17 | 2019-07-16 | 武汉斗鱼网络科技有限公司 | A kind of correcting method, device, terminal and storage medium for searching for text |
CN110019684B (en) * | 2018-08-17 | 2021-06-15 | 武汉斗鱼网络科技有限公司 | Method, device, terminal and storage medium for correcting search text |
CN111079412B (en) * | 2018-10-18 | 2024-01-23 | 北京嘀嘀无限科技发展有限公司 | Text error correction method and device |
CN111079412A (en) * | 2018-10-18 | 2020-04-28 | 北京嘀嘀无限科技发展有限公司 | Text error correction method and device |
CN111339755A (en) * | 2018-11-30 | 2020-06-26 | 中国移动通信集团浙江有限公司 | Automatic error correction method and device for office data |
CN109376362A (en) * | 2018-11-30 | 2019-02-22 | 武汉斗鱼网络科技有限公司 | A kind of the determination method and relevant device of corrected text |
CN109858473B (en) * | 2018-12-28 | 2023-03-07 | 天津幸福生命科技有限公司 | Self-adaptive deviation rectifying method and device, readable medium and electronic equipment |
CN109858473A (en) * | 2018-12-28 | 2019-06-07 | 天津幸福生命科技有限公司 | A kind of adaptive method for correcting error, device, readable medium and electronic equipment |
CN109739368A (en) * | 2018-12-29 | 2019-05-10 | 咪咕文化科技有限公司 | A kind of method, apparatus of the fractionation of the Chinese phonetic alphabet |
CN111462748B (en) * | 2019-01-22 | 2023-09-26 | 北京猎户星空科技有限公司 | Speech recognition processing method and device, electronic equipment and storage medium |
CN111462748A (en) * | 2019-01-22 | 2020-07-28 | 北京猎户星空科技有限公司 | Voice recognition processing method and device, electronic equipment and storage medium |
CN109901727A (en) * | 2019-03-06 | 2019-06-18 | 上海依智医疗技术有限公司 | A kind of method and apparatus obtaining text error correction information |
CN110399608B (en) * | 2019-06-04 | 2023-04-25 | 深思考人工智能机器人科技(北京)有限公司 | Text error correction system and method for dialogue system based on pinyin |
CN110399608A (en) * | 2019-06-04 | 2019-11-01 | 深思考人工智能机器人科技(北京)有限公司 | A kind of conversational system text error correction system and method based on phonetic |
CN110276077A (en) * | 2019-06-25 | 2019-09-24 | 上海应用技术大学 | The method, device and equipment of Chinese error correction |
CN110516248A (en) * | 2019-08-27 | 2019-11-29 | 出门问问(苏州)信息科技有限公司 | Method for correcting error of voice identification result, device, storage medium and electronic equipment |
CN112668311A (en) * | 2019-09-29 | 2021-04-16 | 北京国双科技有限公司 | Text error detection method and device |
CN111783433A (en) * | 2019-12-26 | 2020-10-16 | 北京沃东天骏信息技术有限公司 | Text retrieval error correction method and device |
CN111078898A (en) * | 2019-12-27 | 2020-04-28 | 出门问问信息科技有限公司 | Polyphone marking method and device and computer readable storage medium |
CN111078898B (en) * | 2019-12-27 | 2023-08-08 | 出门问问创新科技有限公司 | Multi-tone word annotation method, device and computer readable storage medium |
WO2021218329A1 (en) * | 2020-04-28 | 2021-11-04 | 深圳壹账通智能科技有限公司 | Parallel corpus generation method, apparatus and device, and storage medium |
CN111611792B (en) * | 2020-05-21 | 2023-05-23 | 全球能源互联网研究院有限公司 | Entity error correction method and system for voice transcription text |
CN111611792A (en) * | 2020-05-21 | 2020-09-01 | 全球能源互联网研究院有限公司 | Entity error correction method and system for voice transcription text |
CN112509581A (en) * | 2020-11-20 | 2021-03-16 | 北京有竹居网络技术有限公司 | Method and device for correcting text after speech recognition, readable medium and electronic equipment |
CN112509581B (en) * | 2020-11-20 | 2024-03-01 | 北京有竹居网络技术有限公司 | Error correction method and device for text after voice recognition, readable medium and electronic equipment |
CN112417867B (en) * | 2020-12-07 | 2022-10-18 | 四川长虹电器股份有限公司 | Method and system for correcting video title error after voice recognition |
CN112417867A (en) * | 2020-12-07 | 2021-02-26 | 四川长虹电器股份有限公司 | Method and system for correcting video title error after voice recognition |
CN112560493A (en) * | 2020-12-17 | 2021-03-26 | 金蝶软件(中国)有限公司 | Named entity error correction method, named entity error correction device, computer equipment and storage medium |
CN112836039A (en) * | 2021-01-27 | 2021-05-25 | 成都网安科技发展有限公司 | Voice data processing method and device based on deep learning |
CN116013278A (en) * | 2023-01-06 | 2023-04-25 | 杭州健海科技有限公司 | Speech recognition multi-model result merging method and device based on pinyin alignment algorithm |
CN116013278B (en) * | 2023-01-06 | 2023-08-08 | 杭州健海科技有限公司 | Speech recognition multi-model result merging method and device based on pinyin alignment algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN106598939B (en) | 2019-09-17 |
Similar Documents
Publication | Title |
---|---|
CN106598939B (en) | Text error correction method and device, server, and storage medium |
US11416679B2 (en) | System and method for inputting text into electronic devices |
CN102682763B (en) | Method, device and terminal for correcting named entity vocabularies in voice input text |
EP3633521A1 (en) | Knowledge-based question answering system for the diy domain |
JP4829901B2 (en) | Method and apparatus for confirming manually entered indeterminate text input using speech input |
US9424246B2 (en) | System and method for inputting text into electronic devices |
TWI437449B (en) | Multi-mode input method and input method editor system |
CN103578465B (en) | Speech identifying method and electronic installation |
CN103578467B (en) | Acoustic model building method, voice recognition method and electronic device |
RU2377664C2 (en) | Text input method |
CN111369996A (en) | Method for correcting text error in speech recognition in specific field |
CN1918578B (en) | Handwriting and voice input with automatic correction |
CN109637537B (en) | Method for automatically acquiring annotated data to optimize user-defined awakening model |
EP1686493A2 (en) | Dictionary learning method and device using the same, input method and user terminal device using the same |
CN105404621B (en) | A kind of method and system that Chinese character is read for blind person |
CN103578464A (en) | Language model establishing method, speech recognition method and electronic device |
US11907671B2 (en) | Role labeling method, electronic device and storage medium |
JP2005084681A (en) | Method and system for semantic language modeling and reliability measurement |
JP7266683B2 (en) | Information verification method, apparatus, device, computer storage medium, and computer program based on voice interaction |
CN110070855B (en) | Voice recognition system and method based on migrating neural network acoustic model |
CN102915122B (en) | Based on the intelligent family moving platform spelling input method of language model |
CN101276245A (en) | Reminding method and system for coding to correct error in input process |
Dinarelli et al. | Discriminative reranking for spoken language understanding |
CN111489746A (en) | Power grid dispatching voice recognition language model construction method based on BERT |
CN112528605B (en) | Text style processing method, device, electronic equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |