CN104464736A - Error correction method and device for voice recognition text - Google Patents
- Publication number
- CN104464736A (application number CN201410778108.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- candidate
- editing distance
- corrected text
- error correction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Document Processing Apparatus (AREA)
Abstract
The embodiment of the invention discloses an error correction method and device for a voice recognition text. The error correction method for the voice recognition text comprises the steps that at least one candidate error correction text used for error correction of a result text is obtained according to the multi-layer K-Gram index of the voice recognition result text, the fuzzy sound editing distance matrix between the at least one candidate error correction text and the result text is determined, the fuzzy sound editing distance between the at least one candidate error correction text and the result text and a candidate error correction boundary are obtained according to the determined fuzzy sound editing distance matrix, an error correction text is selected according to the fuzzy sound editing distance corresponding to the at least one candidate error correction text, and error correction is conducted on the result text according to the candidate error correction boundary corresponding to the error correction text. By the adoption of the error correction method and device for the voice recognition text, accurate error correction of the voice recognition result text is achieved.
Description
Technical field
The embodiments of the present invention relate to the technical field of speech recognition, and in particular to an error correction method and device for speech recognition text.
Background technology
As speech recognition technology matures, its applications are becoming increasingly widespread. Compared with other text input methods, voice input based on speech recognition better matches people's everyday habits and makes the input process more efficient. Speech recognition technology can be expected to be widely used in many fields, such as commercial production, communications, medical care, and household services.
In practical applications of speech recognition technology, factors such as ambient noise and dialects often cause the recognition result to differ from what the user actually said. Recognition errors are especially common in everyday spoken-language scenarios. The prior art lacks a way to correct such recognition errors, which hinders the wider adoption of speech recognition technology.
Summary of the invention
In view of this, the embodiments of the present invention propose an error correction method and device for speech recognition text, so as to accurately correct errors in the result text of speech recognition.
In a first aspect, an embodiment of the present invention provides an error correction method for speech recognition text, the method comprising:
retrieving, according to a multi-level K-Gram index of the result text of speech recognition, at least one candidate error correction text for correcting the result text;
determining a fuzzy sound editing distance matrix between each of the at least one candidate error correction text and the result text;
obtaining, according to the determined fuzzy sound editing distance matrices, the fuzzy sound editing distance and the candidate error correction boundary between each of the at least one candidate error correction text and the result text; and
selecting an error correction text according to the fuzzy sound editing distances corresponding to the at least one candidate error correction text, and correcting the result text according to the candidate error correction boundary corresponding to the selected error correction text.
In a second aspect, an embodiment of the present invention further provides an error correction device for speech recognition text, the device comprising:
an error correction text retrieval module, configured to retrieve, according to a multi-level K-Gram index of the result text of speech recognition, at least one candidate error correction text for correcting the result text;
an editing distance matrix calculation module, configured to determine a fuzzy sound editing distance matrix between each of the at least one candidate error correction text and the result text;
a path backtracking module, configured to obtain, according to the determined fuzzy sound editing distance matrices, the fuzzy sound editing distance and the candidate error correction boundary between each of the at least one candidate error correction text and the result text; and
an error correction module, configured to select an error correction text according to the fuzzy sound editing distances corresponding to the at least one candidate error correction text, and to correct the result text according to the candidate error correction boundary corresponding to the selected error correction text.
The error correction method and device for speech recognition text provided by the embodiments of the present invention retrieve at least one candidate error correction text according to the multi-level K-Gram index of the result text of speech recognition, determine the fuzzy sound editing distance matrix between each candidate error correction text and the result text, obtain from each matrix the fuzzy sound editing distance and the candidate error correction boundary, select an error correction text according to the fuzzy sound editing distances, and correct the result text according to the candidate error correction boundary corresponding to the selected error correction text. In this way, the result text of speech recognition is corrected accurately.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flowchart of the error correction method for speech recognition text provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of the error correction method for speech recognition text provided by the second embodiment of the present invention;
Fig. 3 is a flowchart of the editing distance matrix calculation in the error correction method for speech recognition text provided by the second embodiment of the present invention;
Fig. 4 is a flowchart of the path backtracking in the error correction method for speech recognition text provided by the second embodiment of the present invention;
Fig. 5 is a flowchart of the error correction method for speech recognition text provided by the third embodiment of the present invention;
Fig. 6 is a flowchart of the error correction text retrieval in the error correction method for speech recognition text provided by the third embodiment of the present invention;
Fig. 7 is a flowchart of the editing distance matrix calculation in the error correction method for speech recognition text provided by the third embodiment of the present invention;
Fig. 8 is a flowchart of the path backtracking in the error correction method for speech recognition text provided by the third embodiment of the present invention;
Fig. 9 is a flowchart of the error correction method for speech recognition text provided by the fourth embodiment of the present invention;
Fig. 10 is a flowchart of the error correction step in the error correction method for speech recognition text provided by the fifth embodiment of the present invention;
Fig. 11 is a structural diagram of the error correction device for speech recognition text provided by the sixth embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire content.
First embodiment
Fig. 1 is a flowchart of the error correction method for speech recognition text provided by the first embodiment of the present invention. Referring to Fig. 1, the error correction method for speech recognition text comprises:
S110: according to the multi-level K-Gram index of the result text of speech recognition, retrieve at least one candidate error correction text for correcting the result text.
Before the result text of speech recognition is corrected, a multi-level K-Gram index of the result text is first built. Once the index has been built, the candidate error correction texts most similar to the result text are retrieved from a preset corpus according to the multi-level K-Gram index.
Specifically, the multi-level K-Gram index comprises any one of the following: a Chinese-character-level K-Gram index, a pinyin-syllable-level K-Gram index, a full-pinyin or abbreviated-pinyin-level K-Gram index, and an initial-and-final-level K-Gram index.
The Chinese-character-level K-Gram index is built using the Chinese characters in the result text as the elements of the K-Gram index. The pinyin-syllable-level K-Gram index is built using the pinyin syllables corresponding to the Chinese characters in the result text as the elements. The full-pinyin or abbreviated-pinyin-level K-Gram index is built by obtaining the full pinyin or abbreviated pinyin corresponding to the Chinese characters in the result text and using it as the elements. The initial-and-final-level K-Gram index is built by splitting the full pinyin corresponding to the Chinese characters in the result text into initials and finals and using these initials and finals as the elements.
The retrieved candidate error correction texts are the alternatives from which the error correction text used to correct the result text is selected. To correct the result text more accurately, at least one candidate error correction text should be retrieved.
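As a rough illustration of retrieval through a K-Gram index, the sketch below builds an inverted index from k-grams to corpus entries and ranks entries by the number of k-grams they share with the result text, at the character level and at the pinyin-syllable level. The patent does not spell out the index layout or the ranking rule, so `build_index`, `retrieve`, the overlap-count ranking, the sample strings, and the tiny `TOY_PINYIN` table are all assumptions made for this example.

```python
from collections import defaultdict


def kgrams(tokens, k):
    """Return the contiguous k-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def build_index(corpus, tokenize, k):
    """Map each k-gram to the set of corpus entries that contain it."""
    index = defaultdict(set)
    for entry in corpus:
        for gram in kgrams(tokenize(entry), k):
            index[gram].add(entry)
    return index


def retrieve(result_text, index, tokenize, k, top_n=3):
    """Rank corpus entries by how many k-grams they share with the result text."""
    hits = defaultdict(int)
    for gram in set(kgrams(tokenize(result_text), k)):
        for entry in index.get(gram, ()):
            hits[entry] += 1
    return sorted(hits, key=lambda e: (-hits[e], e))[:top_n]


# Character level: every character is one index element.
char_level = list

# Pinyin-syllable level. TOY_PINYIN is a hypothetical lookup used only here;
# a real system would need a complete character-to-pinyin dictionary.
TOY_PINYIN = {"不": "bu", "寒": "han", "含": "han", "而": "er",
              "栗": "li", "阿": "a", "狸": "li"}


def syllable_level(text):
    return [TOY_PINYIN.get(ch, ch) for ch in text]


corpus = ["不寒而栗", "而立之年"]
recognized = "不含阿狸"

char_hits = retrieve(recognized, build_index(corpus, char_level, 2), char_level, 2)
syllable_hits = retrieve(recognized, build_index(corpus, syllable_level, 2), syllable_level, 2)
```

In this toy run the character-level index finds no shared 2-gram, while the syllable-level index retrieves the homophonous entry through the shared ("bu", "han") 2-gram, which suggests why indexing at several levels can help with misrecognized homophones.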
S120: determine the fuzzy sound editing distance matrix between each of the at least one candidate error correction text and the result text.
After the at least one candidate error correction text has been determined, the fuzzy sound editing distance matrix between each candidate error correction text and the result text is determined.
The editing distance (edit distance) between two character strings is the minimum number of editing operations needed to convert one string into the other. The editing operations are replacement, insertion, and deletion. A replacement substitutes one character for another; an insertion adds a character that was not originally in the string; a deletion removes an existing character from the string.
An editing distance matrix is a matrix used to calculate the editing distance between two character strings. Table 1 shows the editing distance matrix between the character string "kitten" and the character string "sitting".
Table 1
| | | k | i | t | t | e | n |
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| s | 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| i | 2 | 2 | 1 | 2 | 3 | 4 | 5 |
| t | 3 | 3 | 2 | 1 | 2 | 3 | 4 |
| t | 4 | 4 | 3 | 2 | 1 | 2 | 3 |
| i | 5 | 5 | 4 | 3 | 2 | 2 | 3 |
| n | 6 | 6 | 5 | 4 | 3 | 3 | 2 |
| g | 7 | 7 | 6 | 5 | 4 | 4 | 3 |
Given two character strings, the editing distance matrix between them can be computed with a dynamic programming algorithm.
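The dynamic programming computation can be sketched as follows. This is the standard unit-cost (Levenshtein) recurrence rather than anything specific to the patent, and the function name `edit_distance_matrix` is chosen only for this example; rows follow "sitting" and columns follow "kitten", as in Table 1.

```python
def edit_distance_matrix(column_text, row_text):
    """Build the unit-cost editing distance matrix between two strings.

    Entry d[i][j] is the editing distance between the first i characters of
    row_text and the first j characters of column_text.
    """
    rows, cols = len(row_text) + 1, len(column_text) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete i characters to reach the empty string
    for j in range(cols):
        d[0][j] = j  # insert j characters starting from the empty string
    for i in range(1, rows):
        for j in range(1, cols):
            # A replacement costs 0 when the two characters already match.
            cost = 0 if row_text[i - 1] == column_text[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # replacement or match
    return d


matrix = edit_distance_matrix("kitten", "sitting")
distance = matrix[-1][-1]  # bottom-right entry is the editing distance
```

The bottom-right entry is the editing distance between the two full strings, which is 3 for this pair.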
After the editing distance matrix between the two character strings has been obtained by dynamic programming, each element of the matrix that corresponds to a replacement operation is replaced with the fuzzy sound similarity between the character of the current candidate error correction text and the character of the result text that correspond to that element. The result is the fuzzy sound editing distance matrix between the current candidate error correction text and the result text. The fuzzy sound similarity characterizes how similar two character strings are in pronunciation; in this embodiment, it characterizes how similar the current candidate error correction text and the result text are in pronunciation.
The fuzzy sound similarity between the two characters corresponding to an element is obtained by looking it up in a preset fuzzy sound matrix, which records pairs of different characters together with the fuzzy sound similarity between them. The required fuzzy sound similarity can therefore be obtained by a lookup in the fuzzy sound matrix.
S130: according to the determined fuzzy sound editing distance matrices, obtain the fuzzy sound editing distance and the candidate error correction boundary between each of the at least one candidate error correction text and the result text.
After the fuzzy sound editing distance matrix between each candidate error correction text and the result text has been determined, the following are obtained for each candidate from its matrix: the fuzzy sound editing distance between the current candidate error correction text and the result text, and the candidate error correction boundary to be used when the current candidate is applied to correct the result text.
The fuzzy sound editing distance measures how similar the current candidate error correction text and the result text are in pronunciation. The larger the fuzzy sound editing distance between two texts, the less similar they are in pronunciation. The less similar a candidate error correction text is to the result text in pronunciation, the lower the probability that this candidate is finally adopted as the error correction text for the result text.
In general, correcting the result text with a candidate error correction text means choosing an error correction substring from the candidate and using it to replace the erroneous substring in the result text. The candidate error correction boundary denotes the start and end boundaries of the error correction substring within the candidate error correction text, together with the start and end boundaries of the erroneous substring within the result text. For example, suppose the candidate error correction text is "APEC" and the result text of speech recognition is "Asia-Pacific Organization for Economic Co-operation is the most influential official economic cooperation forum in the Asia-Pacific region". If the error correction substring is identified as "APEC" and the erroneous substring as "Asia-Pacific Organization for Economic Co-operation", then the start and end boundaries of the error correction substring "APEC", together with the start and end boundaries of the erroneous substring "Asia-Pacific Organization for Economic Co-operation", form the candidate error correction boundary.
S140: select an error correction text according to the fuzzy sound editing distances corresponding to the at least one candidate error correction text, and correct the result text according to the candidate error correction boundary corresponding to the selected error correction text.
After the fuzzy sound editing distance between each candidate error correction text and the result text has been obtained, the error correction text is selected from the at least one candidate according to these distances. Because a larger fuzzy sound editing distance indicates that the candidate and the result text are less similar in pronunciation, the candidate with the smaller fuzzy sound editing distance to the result text should generally be selected as the final error correction text.
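The selection and replacement step can be sketched as below, assuming each candidate already comes with its fuzzy sound editing distance and its boundary expressed as half-open index spans. The `Candidate` record, the English sample strings, and the distance values are hypothetical and serve only to make the step concrete; the patent does not prescribe this data layout.

```python
from typing import NamedTuple, Tuple


class Candidate(NamedTuple):
    text: str                        # candidate error correction text
    distance: float                  # fuzzy sound editing distance to the result text
    result_span: Tuple[int, int]     # [start, end) of the erroneous substring
    candidate_span: Tuple[int, int]  # [start, end) of the error correction substring


def correct(result_text, candidates):
    """Pick the candidate with the smallest distance and splice it into the result text."""
    best = min(candidates, key=lambda c: c.distance)
    r_start, r_end = best.result_span
    c_start, c_end = best.candidate_span
    return result_text[:r_start] + best.text[c_start:c_end] + result_text[r_end:]


recognized = "call jon smith now"
candidates = [
    Candidate("joan smythe", 1.5, (5, 14), (0, 11)),  # made-up distance
    Candidate("john smith", 0.2, (5, 14), (0, 10)),   # made-up distance
]
corrected = correct(recognized, candidates)
```

Here the second candidate has the smaller distance, so its substring replaces characters 5 to 13 of the recognized text and the text outside the boundary is left untouched.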
In this embodiment, at least one candidate error correction text is retrieved according to the multi-level K-Gram index of the result text of speech recognition, the fuzzy sound editing distance matrix between each candidate and the result text is determined, the fuzzy sound editing distance and the candidate error correction boundary are obtained from each matrix, an error correction text is selected according to the fuzzy sound editing distances, and the result text is corrected according to the candidate error correction boundary corresponding to the selected error correction text. The speech recognition text is thereby corrected accurately.
Second embodiment
Fig. 2 is a flowchart of the error correction method for speech recognition text provided by the second embodiment of the present invention. This method builds on the first embodiment and further comprises: retrieving, according to the multi-level K-Gram index of the result text of speech recognition, at least one non-template candidate error correction text for correcting the result text; determining the fuzzy sound editing distance matrix between each non-template candidate error correction text and the result text; obtaining, according to the determined matrices, the fuzzy sound editing distance and the candidate error correction boundary between each non-template candidate error correction text and the result text; and selecting an error correction text according to the fuzzy sound editing distances corresponding to the non-template candidate error correction texts, and correcting the result text according to the candidate error correction boundary corresponding to the selected error correction text.
Referring to Fig. 2, the error correction method for speech recognition text comprises:
S210: according to the multi-level K-Gram index of the result text of speech recognition, retrieve at least one non-template candidate error correction text for correcting the result text.
Specifically, retrieving at least one candidate error correction text according to the multi-level K-Gram index of the result text comprises: retrieving at least one non-template candidate error correction text for correcting the result text according to the K-Gram index at the Chinese character level, the pinyin syllable level, the full-pinyin or abbreviated-pinyin level, or the initial-and-final level.
In this embodiment, the candidate error correction texts retrieved in this way are non-template candidate error correction texts. A non-template candidate error correction text is a candidate error correction text that contains no wildcard.
S220: determine the fuzzy sound editing distance matrix between each of the at least one non-template candidate error correction text and the result text.
S230: according to the determined fuzzy sound editing distance matrices, obtain the fuzzy sound editing distance and the candidate error correction boundary between each of the at least one non-template candidate error correction text and the result text.
S240: select an error correction text according to the fuzzy sound editing distances corresponding to the at least one non-template candidate error correction text, and correct the result text according to the candidate error correction boundary corresponding to the selected error correction text.
Fig. 3 is a flowchart of the fuzzy sound editing distance matrix calculation in the error correction method for speech recognition text provided by the second embodiment of the present invention. Referring to Fig. 3, determining the fuzzy sound editing distance matrix between each non-template candidate error correction text and the result text comprises:
S221: for each retrieved non-template candidate error correction text, set the value of each element corresponding to a replacement operation in the initialized fuzzy sound editing distance matrix to the fuzzy sound similarity between the character of the current non-template candidate error correction text and the character of the result text that correspond to that element.
In this embodiment, after the at least one non-template candidate error correction text has been retrieved, the fuzzy sound editing distance matrix between each of them and the result text is calculated. When the matrix between the current non-template candidate error correction text and the result text is calculated, the value of each element corresponding to a replacement operation is first set to the fuzzy sound similarity between the two characters corresponding to that element.
The positions corresponding to replacement operations can be identified by comparing the non-template candidate error correction text and the result text, either as text or by the pronunciations corresponding to the text. For example, the position in the fuzzy sound editing distance matrix of the two characters, one from each text, whose pronunciations are most strongly correlated can be taken as a position corresponding to a replacement operation.
Further, if the value of the element at a replacement position determined in this way is smaller than the value of the element at the previous replacement position in the fuzzy sound editing distance matrix, the value of the element at the previous replacement position is used as the value of this element. The values of the elements at the replacement positions therefore never decrease from one replacement position to the next.
S222: determine, according to a dynamic programming algorithm, the values of the elements corresponding to non-replacement operations in the fuzzy sound editing distance matrix, thereby obtaining the fuzzy sound editing distance matrix between the current non-template candidate error correction text and the result text.
After the values of the elements corresponding to replacement operations have been set, the values of the elements corresponding to non-replacement operations, that is, all other elements of the fuzzy sound editing distance matrix, are determined.
Specifically, the values of the elements corresponding to non-replacement operations are determined by dynamic programming. When either the horizontal-axis index or the vertical-axis index of an element is 0, the value of the element is the other, non-zero index. When neither the horizontal-axis index nor the vertical-axis index is 0, the value of the element is determined by the following formula:
d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + θ[i][j])
Here, d[i][j] is the value of the element whose horizontal-axis index is i and whose vertical-axis index is j, and θ[i][j] is the fuzzy sound similarity between the character of the non-template candidate error correction text and the character of the result text that correspond to that element.
It should be noted that, while the values of the elements corresponding to non-replacement operations are being calculated, the values of the elements corresponding to replacement operations must also be updated with the above formula.
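The recurrence can be sketched as follows. Although the text calls θ a similarity, the formula adds it as a replacement cost, so this sketch assumes 0 for characters that sound identical and 1 for unrelated characters. The `FUZZY_COST` table is a hypothetical stand-in for the preset fuzzy sound matrix; only the value 0.3369 is taken from Table 2, and the Chinese strings are our reading of the machine-translated example.

```python
# Hypothetical excerpt of a fuzzy sound matrix: replacement costs for
# similar-sounding character pairs (0.3369 is the value shown in Table 2).
FUZZY_COST = {("含", "寒"): 0.0, ("阿", "而"): 0.3369, ("狸", "栗"): 0.0}


def theta(a, b):
    """Replacement cost: 0 if identical, table value if listed, otherwise 1."""
    if a == b:
        return 0.0
    return FUZZY_COST.get((a, b), FUZZY_COST.get((b, a), 1.0))


def fuzzy_edit_matrix(candidate, result, cost=theta):
    """Editing distance matrix with fuzzy-sound replacement costs.

    Rows follow the result text and columns follow the candidate text.
    """
    rows, cols = len(result) + 1, len(candidate) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = float(i)  # boundary: the non-zero index
    for j in range(cols):
        d[0][j] = float(j)
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost(result[i - 1], candidate[j - 1]))
    return d


d = fuzzy_edit_matrix("不寒而栗", "不含阿狸的成语解释")
```

With these toy costs the diagonal entries come out as 0, 0, 0.3369, 0.3369 and the bottom-right entry as 5.3369, in line with the values listed in Table 2.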
Table 2 shows the fuzzy sound editing distance matrix between the non-template candidate error correction text "不寒而栗" ("tremble with fear") and the result text "不含阿狸的成语解释" (a misrecognized query for the explanation of that idiom, shown character by character in the table). Referring to Table 2, at the positions corresponding to replacement operations between the non-template candidate error correction text and the result text, that is, at the positions of Table 2 where the horizontal-axis index equals the vertical-axis index, the element of the fuzzy sound editing distance matrix is the fuzzy sound similarity between the character of the non-template candidate error correction text and the character of the result text that correspond to that element.
Table 2
| | | 不 (No) | 寒 (Cold) | 而 (And) | 栗 (Chestnut) |
| | 0 | 1 | 2 | 3 | 4 |
| 不 (No) | 1 | 0 | 1 | 2 | 3 |
| 含 (Contain) | 2 | 1 | 0 | 1 | 2 |
| 阿 (Ah) | 3 | 2 | 1 | 0.3369 | 1.3369 |
| 狸 (Leopard cat) | 4 | 3 | 2 | 1.3369 | 0.3369 |
| 的 ('s) | 5 | 4 | 3 | 2.3369 | 1.3369 |
| 成 (Become) | 6 | 5 | 4 | 3.3369 | 2.3369 |
| 语 (Language) | 7 | 6 | 5 | 4.3369 | 3.3369 |
| 解 (Separate) | 8 | 7 | 6 | 5.3369 | 4.3369 |
| 释 (Release) | 9 | 8 | 7 | 6.3369 | 5.3369 |
Fig. 4 is a flowchart of the path backtracking in the error correction method for speech recognition text provided by the second embodiment of the present invention. Referring to Fig. 4, obtaining the fuzzy sound editing distance and the candidate error correction boundary between each non-template candidate error correction text and the result text according to the determined fuzzy sound editing distance matrices comprises:
S231: for each determined fuzzy sound editing distance matrix, obtain the fuzzy sound editing distance of the current matrix and the corresponding candidate error correction boundary by path backtracking.
During path backtracking, the path moves along the shortest path from the first element of the fuzzy sound editing distance matrix to the element corresponding to the first replacement operation, and at the same time along the shortest path from the last element of the matrix to the element corresponding to the last replacement operation. Table 3 illustrates the path backtracking on the fuzzy sound editing distance matrix between the non-template candidate error correction text "不寒而栗" ("tremble with fear") and the result text "不含阿狸的成语解释". The arrows in Table 3 mark the path backtracking operations performed on the matrix.
Table 3
| | | 不 (No) | 寒 (Cold) | 而 (And) | 栗 (Chestnut) |
| | 0↘ | 1 | 2 | 3 | 4 |
| 不 (No) | 1 | 0 | 1 | 2 | 3 |
| 含 (Contain) | 2 | 1 | 0 | 1 | 2 |
| 阿 (Ah) | 3 | 2 | 1 | 0.3369 | 1.3369 |
| 狸 (Leopard cat) | 4 | 3 | 2 | 1.3369 | 0.3369 |
| 的 ('s) | 5 | 4 | 3 | 2.3369 | 1.3369↑ |
| 成 (Become) | 6 | 5 | 4 | 3.3369 | 2.3369↑ |
| 语 (Language) | 7 | 6 | 5 | 4.3369 | 3.3369↑ |
| 解 (Separate) | 8 | 7 | 6 | 5.3369 | 4.3369↑ |
| 释 (Release) | 9 | 8 | 7 | 6.3369 | 5.3369↑ |
S232: take the fuzzy sound editing distance and the corresponding candidate error correction boundary of the current fuzzy sound editing distance matrix as the fuzzy sound editing distance and the candidate error correction boundary between the result text and the non-template candidate error correction text to which the current matrix corresponds.
Specifically, the value of the element corresponding to the last replacement operation is taken as the fuzzy sound editing distance between the non-template candidate error correction text and the result text. The candidate error correction boundary consists of the boundaries of the characters, in the non-template candidate error correction text and in the result text, that correspond to the element of the first replacement operation, together with the boundaries of the characters, in both texts, that correspond to the element of the last replacement operation. In the example above, the start and end boundaries of the candidate error correction text "不寒而栗" ("tremble with fear"), together with the start and end boundaries of the substring "不含阿狸" in the result text, form the candidate error correction boundary of that non-template candidate error correction text.
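One way to read the backtracking step is a standard traceback from the bottom-right element that records the replacement (diagonal) moves; the first and last of these give the boundary, and the value at the last one gives the distance. The sketch below follows that reading. The traceback order, the tie-breaking, the half-open spans, and the toy `FUZZY_COST` table are assumptions of this example, not details stated in the patent.

```python
FUZZY_COST = {("含", "寒"): 0.0, ("阿", "而"): 0.3369, ("狸", "栗"): 0.0}  # toy values


def theta(a, b):
    """Replacement cost: 0 if identical, table value if listed, otherwise 1."""
    if a == b:
        return 0.0
    return FUZZY_COST.get((a, b), FUZZY_COST.get((b, a), 1.0))


def fuzzy_edit_matrix(candidate, result):
    """Rows follow the result text, columns follow the candidate text."""
    d = [[float(i + j) if i == 0 or j == 0 else 0.0
          for j in range(len(candidate) + 1)] for i in range(len(result) + 1)]
    for i in range(1, len(result) + 1):
        for j in range(1, len(candidate) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + theta(result[i - 1], candidate[j - 1]))
    return d


def backtrack(d, candidate, result, eps=1e-9):
    """Return (distance, result_span, candidate_span), or None if no replacement move exists."""
    i, j = len(result), len(candidate)
    diagonal = []  # replacement cells, collected from last to first
    while i > 0 and j > 0:
        if abs(d[i][j] - (d[i - 1][j - 1] + theta(result[i - 1], candidate[j - 1]))) < eps:
            diagonal.append((i, j))
            i, j = i - 1, j - 1
        elif abs(d[i][j] - (d[i - 1][j] + 1)) < eps:
            i -= 1  # skip a character of the result text
        else:
            j -= 1  # skip a character of the candidate text
    if not diagonal:
        return None
    (last_i, last_j), (first_i, first_j) = diagonal[0], diagonal[-1]
    return d[last_i][last_j], (first_i - 1, last_i), (first_j - 1, last_j)


candidate, result = "不寒而栗", "不含阿狸的成语解释"
distance, result_span, candidate_span = backtrack(
    fuzzy_edit_matrix(candidate, result), candidate, result)
```

For the example pair, the traceback climbs the last column until it reaches the element for the last replacement and then follows the diagonal, so the distance is 0.3369 and both spans cover the first four characters, matching the boundary described in the text.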
In this embodiment, at least one non-template candidate error correction text is retrieved according to the multi-level K-Gram index of the result text of speech recognition, the fuzzy sound editing distance matrix between each non-template candidate and the result text is determined, the fuzzy sound editing distance and the candidate error correction boundary are obtained from each matrix, an error correction text is selected according to the fuzzy sound editing distances, and the result text is corrected according to the candidate error correction boundary corresponding to the selected error correction text. The result text of speech recognition is thereby corrected accurately.
3rd embodiment
Fig. 5 is the process flow diagram of the error correction method of the speech recognition text that third embodiment of the invention provides.The error correction method of described speech recognition text is based on first embodiment of the invention, further, the error correction method of described speech recognition text comprises: according to the multi-level K-Gram index of the resulting text of speech recognition, pulls at least one the template candidate corrected text for carrying out error correction to described resulting text; Determine the described fuzzy phoneme editing distance matrix of at least one template candidate corrected text respectively and between described resulting text; The fuzzy phoneme editing distance of at least one template candidate corrected text described respectively and between described resulting text and candidate's error correction border is obtained according to the fuzzy phoneme editing distance matrix determined; The fuzzy phoneme editing distance corresponding respectively according at least one template candidate corrected text described chooses corrected text, and error correction is carried out to described resulting text in the candidate's error correction border corresponding to described corrected text.
See Fig. 5, the error correction method of described speech recognition text comprises:
S510: according to the multi-level K-Gram index of the result text of speech recognition, retrieve at least one template candidate corrected text for correcting the result text.
S520: determine the fuzzy phoneme edit distance matrix between each template candidate corrected text and the result text.
S530: obtain, from the determined fuzzy phoneme edit distance matrix, the fuzzy phoneme edit distance and the candidate error correction boundary between each template candidate corrected text and the result text.
S540: choose a corrected text according to the fuzzy phoneme edit distances of the template candidate corrected texts, and correct the result text according to the candidate error correction boundary corresponding to the chosen corrected text.
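As an illustration of the retrieval in step S510, the following Python sketch builds a small K-Gram inverted index at two levels and retrieves candidates that share K-Grams with a query. It is only a sketch under stated assumptions: the class `MultiLevelKGramIndex`, the toy pinyin table `TOY_PINYIN`, the choice of two levels, and the ranking by number of shared K-Grams are all hypothetical and are not specified by the patent.

```python
from collections import defaultdict

# Toy pinyin table (tones ignored); a real system would use a full lexicon.
TOY_PINYIN = {"我": "wo", "想": "xiang", "听": "ting", "挺": "ting",
              "歌": "ge", "哥": "ge"}

def char_level(text):
    """Chinese character level: one unit per character."""
    return list(text)

def pinyin_level(text):
    """Pinyin syllable level: one unit per syllable (unknown chars kept as-is)."""
    return [TOY_PINYIN.get(ch, ch) for ch in text]

def kgrams(units, k=2):
    """Return the set of contiguous k-grams of a sequence of units."""
    units = list(units)
    if len(units) < k:
        return {tuple(units)} if units else set()
    return {tuple(units[i:i + k]) for i in range(len(units) - k + 1)}

class MultiLevelKGramIndex:
    def __init__(self, levels, k=2):
        self.levels = levels  # level name -> function mapping text to units
        self.k = k
        self.postings = {name: defaultdict(set) for name in levels}
        self.texts = []

    def add(self, text):
        """Index one candidate text at every level."""
        doc_id = len(self.texts)
        self.texts.append(text)
        for name, to_units in self.levels.items():
            for gram in kgrams(to_units(text), self.k):
                self.postings[name][gram].add(doc_id)

    def pull(self, query, min_hits=1):
        """Return indexed texts sharing at least min_hits k-grams with query."""
        hits = defaultdict(int)
        for name, to_units in self.levels.items():
            for gram in kgrams(to_units(query), self.k):
                for doc_id in self.postings[name].get(gram, ()):
                    hits[doc_id] += 1
        ranked = sorted(hits, key=lambda d: (-hits[d], d))
        return [self.texts[d] for d in ranked if hits[d] >= min_hits]
```

With this toy index, a query that differs from an indexed text only by homophones still matches at the pinyin level even where the character-level K-Grams differ, which is the motivation for indexing at more than one level.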
Fig. 6 is a flowchart of corrected text retrieval in the error correction method for speech recognition text provided by the third embodiment of the invention. Referring to Fig. 6, retrieving at least one template candidate corrected text for correcting the result text according to the multi-level K-Gram index of the result text of speech recognition comprises:
S511: according to a K-Gram index at the Chinese character level, the pinyin syllable level, the full or abbreviated pinyin spelling level, or the initial-and-final level, retrieve at least one candidate corrected text for correcting the result text.
S512: identify the proper nouns contained in each candidate corrected text, and replace each proper noun with a wildcard to obtain at least one template candidate corrected text.
After a candidate corrected text is retrieved, it is checked for proper nouns. Proper nouns include place names, country names, organization names, and personal names. For example, "Liu Dehua" is a personal name and can be identified as a proper noun.
After a proper noun is identified in a candidate corrected text, it is replaced with a wildcard, which yields the template candidate corrected text corresponding to that candidate corrected text. For example, for the candidate corrected text "I want to listen to the song of Liu Dehua", identifying the proper noun "Liu Dehua" and replacing it with a wildcard yields the template candidate corrected text "I want to listen to the song of *". In this example, "*" is the wildcard in the template candidate corrected text.
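The template construction of step S512 can be sketched in Python as follows. The sketch assumes that proper noun recognition has already produced a set of proper noun strings; the function name `to_template` and the simple string replacement are illustrative assumptions, not the recognition method of the patent.

```python
def to_template(text, proper_nouns, wildcard="*"):
    """Replace every recognized proper noun in text with a wildcard.

    proper_nouns is assumed to come from a separate recognizer
    (for example a gazetteer lookup); that step is not shown here.
    """
    # Replace longer nouns first so a noun containing a shorter one is
    # substituted as a whole.
    for noun in sorted(proper_nouns, key=len, reverse=True):
        text = text.replace(noun, wildcard)
    return text
```

For instance, `to_template("I want to listen to the song of Liu Dehua", {"Liu Dehua"})` yields the template string with "*" in place of the name.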
Fig. 7 is a flowchart of the edit distance matrix calculation in the error correction method for speech recognition text provided by the third embodiment of the invention. Referring to Fig. 7, determining the fuzzy phoneme edit distance matrix between each template candidate corrected text and the result text comprises:
S521: for each retrieved template candidate corrected text, set the value of each element corresponding to a replacement operation in the initialized fuzzy phoneme edit distance matrix to the fuzzy phoneme similarity between the character of the current template candidate corrected text corresponding to that element and the character of the result text corresponding to that element.
For a template candidate corrected text containing a wildcard, the matrix is determined in a way similar to that used for a non-template candidate corrected text: the value of each element corresponding to a replacement operation is set to the fuzzy phoneme similarity between the character of the current template candidate corrected text corresponding to that element and the character of the result text corresponding to that element.
Table 4 shows the fuzzy phoneme edit distance matrix between the template candidate corrected text "I want to listen to the song of *" and the result text "I thinks very Liu De China taxi driver brother".
Table 4
Result \ Template | (empty) | I | Think | Listen | * | 's | Song |
(empty) | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
I | 1 | 0 | 1 | 2 | 3 | 4 | 5 |
Think | 2 | 1 | 0 | 1 | 2 | 3 | 4 |
Very | 3 | 2 | 1 | 0 | 1 | 2 | 3 |
Liu | 4 | 3 | 2 | 1 | 1 | 1.7 | 2.7 |
Moral | 5 | 4 | 3 | 2 | 2 | 1 | 2 |
China | 6 | 5 | 4 | 3 | 3 | 2 | 1.8 |
's | 7 | 6 | 5 | 4 | 4 | 3 | 2.8 |
Brother | 8 | 7 | 6 | 5 | 5 | 4 | 3 |
Referring to Table 4, at each position corresponding to a replacement operation between the template candidate corrected text and the result text, the element of the fuzzy phoneme edit distance matrix is the fuzzy phoneme similarity between the character of the template candidate corrected text corresponding to that element and the character of the result text corresponding to that element.
As with the fuzzy phoneme edit distance matrix of a non-template candidate corrected text, the positions corresponding to replacement operations can be identified by comparing the candidate corrected text with the result text, either as text or as the speech corresponding to the text.
Unlike the fuzzy phoneme edit distance matrix of a non-template candidate corrected text, the characters of a template candidate corrected text do not correspond one-to-one to the characters of the result text, because the template contains a wildcard. Normally, one wildcard corresponds to at least two characters of the result text. In the example shown in Table 4, the wildcard corresponds to three characters of the result text: "Liu", "Moral" and "China".
For the elements at positions corresponding to the replacement operations associated with the wildcard, no fuzzy phoneme similarity can be obtained, so the value of each such element is the value of the element at the position of its preceding replacement operation plus one.
S522: determine the values of the elements not corresponding to replacement operations in the fuzzy phoneme edit distance matrix according to a dynamic programming algorithm, and obtain the fuzzy phoneme edit distance matrix between the current template candidate corrected text and the result text.
The elements of the fuzzy phoneme edit distance matrix that do not correspond to replacement operations, that is, all elements other than those corresponding to replacement operations, are determined according to a dynamic programming algorithm. In addition, while the values of these elements are determined by dynamic programming, the values of the elements corresponding to replacement operations also need to be updated.
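The dynamic programming fill can be illustrated with a generic weighted edit distance in Python. This is a sketch under assumptions, not the patent's exact procedure: it uses the standard recurrence with unit insertion and deletion costs, the substitution cost function `toy_cost` and the confusable pair in `FUZZY` are invented for illustration, and the special handling of wildcard positions described above is not reproduced.

```python
# Hypothetical set of confusable syllable pairs with reduced substitution cost.
FUZZY = {frozenset(("tin", "ting"))}

def toy_cost(a, b):
    """Substitution cost: 0 for identical units, 0.5 for a fuzzy pair, else 1."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in FUZZY:
        return 0.5
    return 1.0

def fuzzy_edit_matrix(candidate, result, sub_cost):
    """Fill an edit distance matrix with rows for result and columns for candidate."""
    rows, cols = len(result) + 1, len(candidate) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = float(i)  # delete i units of the result text
    for j in range(cols):
        d[0][j] = float(j)  # insert j units of the candidate text
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + sub_cost(result[i - 1], candidate[j - 1]),
            )
    return d
```

The matrix has the same orientation as Table 4: one row per unit of the result text and one column per unit of the candidate text, with the bottom-right element holding the overall distance in this simplified variant.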
Fig. 8 is a flowchart of path backtracking in the error correction method for speech recognition text provided by the third embodiment of the invention. Referring to Fig. 8, obtaining, from the determined fuzzy phoneme edit distance matrix, the fuzzy phoneme edit distance and the candidate error correction boundary between each template candidate corrected text and the result text comprises:
S531: for each determined fuzzy phoneme edit distance matrix, obtain by path backtracking the fuzzy phoneme edit distance of the current matrix and the corresponding candidate error correction boundary.
For a template candidate corrected text, the process of obtaining the fuzzy phoneme edit distance of the current matrix and the corresponding candidate error correction boundary by path backtracking is similar to that for a non-template candidate corrected text and is not repeated here.
S532: determine the difference between the fuzzy phoneme edit distance of the current matrix and the edit distance corresponding to the wildcard in the template candidate corrected text corresponding to the current matrix.
Unlike the procedure for a non-template candidate corrected text, for a template candidate corrected text, after the fuzzy phoneme edit distance of its matrix is obtained, the edit distance corresponding to the wildcard in the template candidate corrected text must be subtracted from it.
The edit distance corresponding to the wildcard in the template candidate corrected text is also obtained by path backtracking. Table 5 shows how the edit distance corresponding to the wildcard is obtained by path backtracking in the fuzzy phoneme edit distance matrix of the template candidate corrected text. In Table 5, the arrows mark the backtracked path.
Table 5
Result \ Template | (empty) | I | Think | Listen | * | 's | Song |
(empty) | 0↘ | 1 | 2 | 3 | 4 | 5 | 6 |
I | 1 | 0↘ | 1 | 2 | 3 | 4 | 5 |
Think | 2 | 1 | 0↘ | 1 | 2 | 3 | 4 |
Very | 3 | 2 | 1 | 0↘ | 1 | 2 | 3 |
Liu | 4 | 3 | 2 | 1 | 1 | 1.7 | 2.7 |
Moral | 5 | 4 | 3 | 2 | 2 | 1 | 2 |
China | 6 | 5 | 4 | 3 | 3 | 2 | 1.8 |
's | 7 | 6 | 5 | 4 | 4 | 3↖ | 2.8 |
Brother | 8 | 7 | 6 | 5 | 5 | 4 | 3↖ |
Along the backtracked path shown above, the edit distance corresponding to the wildcard is the value of the last element corresponding to the wildcard minus the value of the element at the position of the replacement operation that precedes the first element corresponding to the wildcard. In the above example, the edit distance corresponding to the wildcard is 3. Since the fuzzy phoneme edit distance of the matrix is also 3, in the example shown in Table 4 and Table 5 the difference for the template candidate corrected text "I want to listen to the song of *" and the result text "I thinks very Liu De China taxi driver brother" is 0.
S533: take the difference as the fuzzy phoneme edit distance between the template candidate corrected text corresponding to the current matrix and the result text.
In the above example, the difference is 0. Therefore, the fuzzy phoneme edit distance between the template candidate corrected text "I want to listen to the song of *" and the result text "I thinks very Liu De China taxi driver brother" is 0.
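The subtraction in steps S532 and S533 can be sketched in Python. The sketch assumes the backtracked path is given as a list of `(value, is_wildcard)` pairs from start to end; this representation, the function name `template_fuzzy_distance`, and the reading of which cells of Table 5 lie on the path (the wildcard column values 1, 2, 3 for "Liu", "Moral", "China") are assumptions made for illustration.

```python
def template_fuzzy_distance(path):
    """Distance of a template candidate with the wildcard's share removed.

    path: list of (value, is_wildcard) pairs along the backtracked path,
    ordered from the start of the texts to the end (hypothetical layout).
    """
    total = path[-1][0]  # distance of the whole matrix
    wild = [i for i, (_, is_wc) in enumerate(path) if is_wc]
    if not wild:
        return total  # no wildcard on the path, nothing to subtract
    first, last = wild[0], wild[-1]
    # Value at the replacement position preceding the first wildcard element.
    before = path[first - 1][0] if first > 0 else 0
    wildcard_distance = path[last][0] - before
    return total - wildcard_distance

# Path read from Table 5 under the assumption stated above.
TABLE5_PATH = [(0, False), (0, False), (0, False), (0, False),
               (1, True), (2, True), (3, True),
               (3, False), (3, False)]
```

Applied to `TABLE5_PATH`, the wildcard accounts for 3 - 0 = 3 of the total distance 3, so the function returns 0, in agreement with the example in the text.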
In this embodiment, at least one template candidate corrected text for correcting the result text is retrieved according to the multi-level K-Gram index of the result text of speech recognition; the fuzzy phoneme edit distance matrix between each template candidate corrected text and the result text is determined; the fuzzy phoneme edit distance and the candidate error correction boundary between each template candidate corrected text and the result text are obtained from the determined matrix; a corrected text is chosen according to the fuzzy phoneme edit distances of the template candidate corrected texts; and the result text is corrected according to the candidate error correction boundary corresponding to the chosen corrected text. Accurate error correction of the result text of speech recognition is thereby achieved.
Fourth embodiment
Fig. 9 is a flowchart of the error correction method for speech recognition text provided by the fourth embodiment of the invention. The method is based on the first embodiment of the invention. Further, after at least one candidate corrected text for correcting the result text is retrieved, and before the fuzzy phoneme edit distance matrix between each candidate corrected text and the result text is determined, the method also comprises: screening the candidate corrected texts according to the user's location or frequently visited places, so as to select at least one place name candidate corrected text relevant to the user.
Referring to Fig. 9, the error correction method for speech recognition text comprises:
S910: according to the multi-level K-Gram index of the result text of speech recognition, retrieve at least one candidate corrected text for correcting the result text.
S920: screen the candidate corrected texts according to the user's location or frequently visited places, so as to select at least one place name candidate corrected text relevant to the user.
Suppose the result text of speech recognition is the place name "Shi Gezhuan", and the retrieved candidate corrected texts include the "Shi Gezhuan" in Beijing, the "Shi Gezhuan" in Qingdao and the "Shi Gezhuan" in Qinhuangdao. If a query of the user's location shows that the user is in Qingdao, the "Shi Gezhuan" in Qingdao is selected from these candidates as the place name candidate corrected text.
S930: determine the fuzzy phoneme edit distance matrix between each place name candidate corrected text and the result text.
S940: obtain, from the determined fuzzy phoneme edit distance matrix, the fuzzy phoneme edit distance and the candidate error correction boundary between each place name candidate corrected text and the result text.
S950: choose a corrected text according to the fuzzy phoneme edit distances of the place name candidate corrected texts, and correct the result text according to the candidate error correction boundary corresponding to the chosen corrected text.
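The screening of step S920 can be sketched in Python as a simple filter. The sketch assumes each candidate carries the city it belongs to and that the user's location and frequently visited places are available as a set of city names; the function name `filter_place_candidates` and the tuple layout are hypothetical.

```python
def filter_place_candidates(candidates, user_places):
    """Keep the place name candidates located in a city relevant to the user.

    candidates: list of (place_name, city) tuples.
    user_places: set of cities (current location or frequently visited).
    """
    return [(name, city) for name, city in candidates if city in user_places]

# The "Shi Gezhuan" example from the text.
CANDIDATES = [("Shi Gezhuan", "Beijing"),
              ("Shi Gezhuan", "Qingdao"),
              ("Shi Gezhuan", "Qinhuangdao")]
```

With `user_places = {"Qingdao"}`, only the Qingdao entry remains and is passed on to the matrix calculation of step S930.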
In this embodiment, after at least one candidate corrected text for correcting the result text is retrieved, and before the fuzzy phoneme edit distance matrix between each candidate corrected text and the result text is determined, the candidate corrected texts are screened according to the user's location or frequently visited places to select at least one place name candidate corrected text relevant to the user. Candidate corrected texts are thus obtained with regard to the user's own location or frequently visited places, which achieves personalized error correction of the result text of speech recognition.
Fifth embodiment
Fig. 10 is a flowchart of error correction in the error correction method for speech recognition text provided by the fifth embodiment of the invention. The method is based on the first embodiment of the invention. Further, choosing a corrected text according to the fuzzy phoneme edit distances of the candidate corrected texts comprises: if the number of candidate corrected texts is greater than one, selecting the candidate corrected text with the smallest fuzzy phoneme edit distance as the corrected text; if the number of candidate corrected texts is one, deciding whether to take that candidate corrected text as the corrected text according to how its fuzzy phoneme edit distance compares with a preset fuzzy phoneme edit distance threshold.
Referring to Fig. 10, choosing a corrected text according to the fuzzy phoneme edit distances of the candidate corrected texts comprises:
S141: if the number of candidate corrected texts is greater than one, select the candidate corrected text with the smallest fuzzy phoneme edit distance as the corrected text.
The larger the fuzzy phoneme edit distance between two texts, the less similar the two texts are in pronunciation; the smaller the distance, the more similar they are in pronunciation. Therefore, when there is more than one candidate corrected text, the one with the smallest fuzzy phoneme edit distance, that is, the one most similar in pronunciation to the result text, should be selected as the corrected text.
S142: if the number of candidate corrected texts is one, decide whether to take that candidate corrected text as the corrected text according to how its fuzzy phoneme edit distance compares with a preset fuzzy phoneme edit distance threshold.
Specifically, when there is exactly one candidate corrected text, it is determined whether the fuzzy phoneme edit distance between the candidate corrected text and the result text is less than the preset fuzzy phoneme edit distance threshold. If the distance is less than the threshold, the candidate corrected text may be taken as the corrected text; if the distance is greater than the threshold, the candidate corrected text is not taken as the corrected text.
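The selection rule of steps S141 and S142 can be sketched in Python as follows. The function name `choose_correction` and the `(text, distance)` tuple layout are hypothetical. The text does not say what happens when the distance equals the threshold; this sketch assumes that case is rejected.

```python
def choose_correction(candidates, threshold):
    """Pick the corrected text from (text, fuzzy_distance) pairs, or None.

    More than one candidate: take the one with the smallest distance (S141).
    Exactly one candidate: accept it only if its distance is below the
    threshold (S142); equality is rejected here as an assumption.
    """
    if not candidates:
        return None
    if len(candidates) > 1:
        return min(candidates, key=lambda c: c[1])[0]
    text, distance = candidates[0]
    return text if distance < threshold else None
```

Returning `None` signals that the result text should be left uncorrected.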
In this embodiment, when there is more than one candidate corrected text, the one with the smallest fuzzy phoneme edit distance is selected as the corrected text; when there is exactly one candidate corrected text, whether it is taken as the corrected text is decided according to how its fuzzy phoneme edit distance compares with the preset fuzzy phoneme edit distance threshold. Accurate error correction of the result text of speech recognition is thereby achieved.
Sixth embodiment
Fig. 11 is a structural diagram of the error correction device for speech recognition text provided by the sixth embodiment of the invention. Referring to Fig. 11, the device comprises: a corrected text retrieval module 1110, an edit distance matrix calculation module 1130, a path backtracking module 1140 and a correction module 1150.
The corrected text retrieval module 1110 is configured to retrieve, according to the multi-level K-Gram index of the result text of speech recognition, at least one candidate corrected text for correcting the result text.
The edit distance matrix calculation module 1130 is configured to determine the fuzzy phoneme edit distance matrix between each candidate corrected text and the result text.
The path backtracking module 1140 is configured to obtain, from the determined fuzzy phoneme edit distance matrix, the fuzzy phoneme edit distance and the candidate error correction boundary between each candidate corrected text and the result text.
The correction module 1150 is configured to choose a corrected text according to the fuzzy phoneme edit distances of the candidate corrected texts, and to correct the result text according to the candidate error correction boundary corresponding to the chosen corrected text.
Preferably, the corrected text retrieval module 1110 comprises: a first multi-level retrieval unit 1111.
The first multi-level retrieval unit 1111 is configured to retrieve, according to a K-Gram index at the Chinese character level, the pinyin spelling level or the initial-and-final level, at least one non-template candidate corrected text for correcting the result text.
Preferably, the edit distance matrix calculation module 1130 comprises: a first matrix element setting unit 1131 and a first matrix calculation unit 1132.
The first matrix element setting unit 1131 is configured to, for each retrieved non-template candidate corrected text, set the value of each element corresponding to a replacement operation in the initialized fuzzy phoneme edit distance matrix to the fuzzy phoneme similarity between the character of the current non-template candidate corrected text corresponding to that element and the character of the result text corresponding to that element.
The first matrix calculation unit 1132 is configured to determine the values of the elements not corresponding to replacement operations in the fuzzy phoneme edit distance matrix according to a dynamic programming algorithm, and to obtain the fuzzy phoneme edit distance matrix between the current non-template candidate corrected text and the result text.
Preferably, the path backtracking module 1140 comprises: a first path backtracking unit 1141 and a first edit distance calculation unit 1142.
The first path backtracking unit 1141 is configured to, for each determined fuzzy phoneme edit distance matrix, obtain by path backtracking the fuzzy phoneme edit distance of the current matrix and the corresponding candidate error correction boundary.
The first edit distance calculation unit 1142 is configured to take the fuzzy phoneme edit distance of the current matrix and the corresponding candidate error correction boundary as the fuzzy phoneme edit distance and the candidate error correction boundary between the non-template candidate corrected text corresponding to the current matrix and the result text.
Preferably, the corrected text retrieval module 1110 comprises: a second multi-level retrieval unit 1112 and a wildcard replacement unit 1113.
The second multi-level retrieval unit 1112 is configured to retrieve, according to a K-Gram index at the Chinese character level, the pinyin spelling level or the initial-and-final level, at least one candidate corrected text for correcting the result text.
The wildcard replacement unit 1113 is configured to identify the proper nouns contained in each candidate corrected text and to replace each proper noun with a wildcard, so as to obtain at least one template candidate corrected text.
Preferably, the edit distance matrix calculation module 1130 comprises: a second matrix element setting unit 1133 and a second matrix calculation unit 1134.
The second matrix element setting unit 1133 is configured to, for each retrieved template candidate corrected text, set the value of each element corresponding to a replacement operation in the initialized fuzzy phoneme edit distance matrix to the fuzzy phoneme similarity between the character of the current template candidate corrected text corresponding to that element and the character of the result text corresponding to that element.
The second matrix calculation unit 1134 is configured to determine the values of the elements not corresponding to replacement operations in the fuzzy phoneme edit distance matrix according to a dynamic programming algorithm, and to obtain the fuzzy phoneme edit distance matrix between the current template candidate corrected text and the result text.
Preferably, the path backtracking module 1140 comprises: a second path backtracking unit 1143, a difference acquisition unit 1144 and a second edit distance calculation unit 1145.
The second path backtracking unit 1143 is configured to, for each determined fuzzy phoneme edit distance matrix, obtain by path backtracking the fuzzy phoneme edit distance of the current matrix and the corresponding candidate error correction boundary.
The difference acquisition unit 1144 is configured to determine the difference between the fuzzy phoneme edit distance of the current matrix and the edit distance corresponding to the wildcard in the template candidate corrected text corresponding to the current matrix.
The second edit distance calculation unit 1145 is configured to take the difference as the fuzzy phoneme edit distance between the template candidate corrected text corresponding to the current matrix and the result text.
Preferably, the error correction device for speech recognition text also comprises: a place name text replacement module 1120.
The place name text replacement module 1120 is configured to, after at least one candidate corrected text for correcting the result text is retrieved and before the fuzzy phoneme edit distance matrix between each candidate corrected text and the result text is determined, screen the candidate corrected texts according to the user's location or frequently visited places, so as to select at least one place name candidate corrected text relevant to the user.
Preferably, choosing a corrected text according to the fuzzy phoneme edit distances of the candidate corrected texts comprises:
when the number of candidate corrected texts is greater than one, selecting the candidate corrected text with the smallest fuzzy phoneme edit distance as the corrected text;
when the number of candidate corrected texts is one, deciding whether to take that candidate corrected text as the corrected text according to how its fuzzy phoneme edit distance compares with a preset fuzzy phoneme edit distance threshold.
The sequence numbers of the above embodiments of the invention are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art should understand that the modules or steps of the invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; or they can each be made into a separate integrated circuit module; or several of the modules or steps can be made into a single integrated circuit module. The invention is therefore not restricted to any specific combination of hardware and software.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
The foregoing describes only the preferred embodiments of the invention and is not intended to limit the invention. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principle of the invention shall be included within the protection scope of the invention.
Claims (18)
1. An error correction method for speech recognition text, characterized by comprising:
retrieving, according to the multi-level K-Gram index of the result text of speech recognition, at least one candidate corrected text for correcting the result text;
determining the fuzzy phoneme edit distance matrix between each candidate corrected text and the result text;
obtaining, from the determined fuzzy phoneme edit distance matrix, the fuzzy phoneme edit distance and the candidate error correction boundary between each candidate corrected text and the result text;
choosing a corrected text according to the fuzzy phoneme edit distances of the candidate corrected texts, and correcting the result text according to the candidate error correction boundary corresponding to the chosen corrected text.
2. method according to claim 1, is characterized in that, according to the multi-level K-Gram index of the resulting text of speech recognition, at least one the candidate's corrected text pulled for carrying out error correction to described resulting text comprises:
According to the K-Gram index of Chinese character level, pinyin syllable level, spelling or simplicity level or the initial and the final level, pull at least one non-template candidate corrected text for carrying out error correction to described resulting text.
3. method according to claim 2, is characterized in that, determines that the described fuzzy phoneme editing distance matrix of at least one candidate's corrected text respectively and between described resulting text comprises:
For each non-template candidate corrected text pulled, by the value of replacement operation corresponding element in initialized fuzzy phoneme editing distance matrix, be set to the fuzzy phoneme similarity between the character in the character in the current non-template candidate corrected text corresponding to described element and the resulting text corresponding to described element;
Determine the value of the non-replaced operation corresponding element in described fuzzy phoneme editing distance matrix according to dynamic programming algorithm, obtain the fuzzy phoneme editing distance matrix between current non-template candidate corrected text and described resulting text.
4. The method according to claim 2, characterized in that obtaining the fuzzy phoneme editing distance and the candidate error correction boundary between each of the at least one candidate corrected text and the resulting text comprises:
For each determined fuzzy phoneme editing distance matrix, obtaining, by path backtracking, the fuzzy phoneme editing distance of the current fuzzy phoneme editing distance matrix and the corresponding candidate error correction boundary;
Taking the fuzzy phoneme editing distance of the current fuzzy phoneme editing distance matrix and the corresponding candidate error correction boundary as the fuzzy phoneme editing distance and the candidate error correction boundary between the non-template candidate corrected text corresponding to the current fuzzy phoneme editing distance matrix and the resulting text.
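One possible reading of the path backtracking in claim 4 is sketched below. It assumes a matrix whose last row holds the cost of aligning the whole candidate with a substring of the resulting text ending at each column, and a table `sub_cost` of replacement costs; the function name `backtrack`, both inputs, and the unit cost of non-replacement steps are hypothetical. The example matrix is written out by hand for a two-syllable candidate ("bei", "jing") against the resulting text ("wo", "qu", "bei", "jin").

```python
def backtrack(D, sub_cost):
    """Return (distance, (start, end)); result[start:end] is the span to be corrected."""
    last = len(D) - 1
    # The best alignment ends at the cheapest cell of the last row.
    end = min(range(len(D[last])), key=lambda j: D[last][j])
    distance = D[last][end]
    i, j = last, end
    while i > 0:
        # Exact float comparison is safe here because all costs are multiples of 0.5;
        # real-valued costs would need a tolerance.
        if j > 0 and D[i][j] == D[i - 1][j - 1] + sub_cost[i - 1][j - 1]:
            i, j = i - 1, j - 1   # replacement (or exact match)
        elif D[i][j] == D[i - 1][j] + 1.0:
            i -= 1                # candidate unit without counterpart
        else:
            j -= 1                # result unit without counterpart
    return distance, (j, end)


# Hand-written example inputs (assumed values, consistent with unit skip costs).
D = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0, 1.0],
    [2.0, 2.0, 2.0, 1.0, 0.5],
]
sub_cost = [
    [1.0, 1.0, 0.0, 1.0],   # "bei"  against wo, qu, bei, jin
    [1.0, 1.0, 1.0, 0.5],   # "jing" against wo, qu, bei, jin
]
distance, boundary = backtrack(D, sub_cost)
```

Here the boundary (2, 4) marks the third and fourth units of the resulting text as the span that the candidate would replace.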
5. The method according to claim 1, characterized in that pulling, according to the multi-level K-Gram index of the resulting text of speech recognition, at least one candidate corrected text for correcting the resulting text comprises:
Pulling, according to a K-Gram index at the Chinese character level, the pinyin syllable level, the full-pinyin or abbreviated-pinyin level, or the initial-and-final level, at least one candidate corrected text for correcting the resulting text;
Identifying the proper nouns contained in each candidate corrected text, and replacing the proper nouns with a wildcard, to obtain at least one template candidate corrected text.
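The wildcard replacement of claim 5 might look like the following minimal sketch. How proper nouns are identified is not specified in this claim, so the sketch simply assumes a given set of known proper nouns; the names `WILDCARD` and `to_template` and the example sentence are hypothetical.

```python
WILDCARD = "*"


def to_template(candidate, proper_nouns):
    """Replace each known proper noun in the candidate text with a wildcard."""
    # Longest names first, so a long name is not split by a shorter name it contains.
    for noun in sorted(proper_nouns, key=len, reverse=True):
        candidate = candidate.replace(noun, WILDCARD)
    return candidate


# "导航到北京西站" (navigate to Beijing West Railway Station) becomes "导航到*".
template = to_template("导航到北京西站", {"北京", "北京西站"})
```

A template such as "导航到*" can then match a resulting text regardless of which place name the speaker actually said.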
6. The method according to claim 5, characterized in that determining the fuzzy phoneme editing distance matrix between each of the at least one candidate corrected text and the resulting text comprises:
For each pulled template candidate corrected text, setting the value of each element corresponding to a replacement operation in an initialized fuzzy phoneme editing distance matrix to the fuzzy phoneme similarity between the character of the current template candidate corrected text corresponding to that element and the character of the resulting text corresponding to that element;
Determining, according to a dynamic programming algorithm, the values of the elements corresponding to non-replacement operations in the fuzzy phoneme editing distance matrix, to obtain the fuzzy phoneme editing distance matrix between the current template candidate corrected text and the resulting text.
7. The method according to claim 5, characterized in that obtaining the fuzzy phoneme editing distance and the candidate error correction boundary between each of the at least one candidate corrected text and the resulting text comprises:
For each determined fuzzy phoneme editing distance matrix, obtaining, by path backtracking, the fuzzy phoneme editing distance of the current fuzzy phoneme editing distance matrix and the corresponding candidate error correction boundary;
Determining the difference between the fuzzy phoneme editing distance of the current fuzzy phoneme editing distance matrix and the editing distance corresponding to the wildcard in the template candidate corrected text corresponding to the current fuzzy phoneme editing distance matrix;
Taking the difference as the fuzzy phoneme editing distance between the template candidate corrected text corresponding to the current fuzzy phoneme editing distance matrix and the resulting text.
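The subtraction in claim 7 can be illustrated with a small sketch. The claim does not state how the editing distance attributable to the wildcard is identified, so the sketch assumes that backtracking yields a list of alignment steps, each tagged with the template unit it is attributed to and its cost; this representation, the function name `template_distance`, and the sample costs are all hypothetical.

```python
def template_distance(path, wildcard="*"):
    """Total alignment cost minus the cost of steps attributed to the wildcard."""
    total = sum(cost for _, cost in path)
    wildcard_cost = sum(cost for unit, cost in path if unit == wildcard)
    return total - wildcard_cost


# Template "dao hang dao *" aligned with the resulting text "dao han dao bei jing":
# the wildcard absorbs "bei" and "jing", and only the hang/han mismatch should count.
path = [
    ("dao", 0.0),
    ("hang", 0.5),   # fuzzy-similar to "han"
    ("dao", 0.0),
    ("*", 1.0),      # wildcard against "bei"
    ("*", 1.0),      # "jing", attributed to the wildcard as well
]
distance = template_distance(path)
```

With the wildcard cost removed, the template is scored only on its fixed part, so a long place name in the resulting text does not penalize the template.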
8. The method according to claim 1, characterized in that, after pulling the at least one candidate corrected text for correcting the resulting text and before determining the fuzzy phoneme editing distance matrix between each of the at least one candidate corrected text and the resulting text, the method further comprises:
Screening the at least one candidate corrected text according to the user's location or frequently visited places, so as to select at least one place name candidate corrected text relevant to the user.
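As a hedged illustration of the screening in claim 8, the sketch below keeps only place name candidates tagged with a city that matches the user's current location or a frequently visited place. The `PlaceCandidate` record, its `city` tag, and the function `screen_by_user_places` are assumptions of this sketch; the claim does not say how candidates are associated with places.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceCandidate:
    text: str   # candidate corrected text (a place name)
    city: str   # city the place name belongs to (hypothetical tag)


def screen_by_user_places(candidates, user_city, frequent_cities=()):
    """Keep candidates located in the user's city or a frequently visited city."""
    relevant = {user_city, *frequent_cities}
    return [c for c in candidates if c.city in relevant]


candidates = [
    PlaceCandidate("中山路", "南京"),
    PlaceCandidate("中山路", "厦门"),
    PlaceCandidate("钟山路", "上海"),
]
kept = screen_by_user_places(candidates, user_city="南京", frequent_cities=["上海"])
```

Screening before the matrix computation reduces the number of matrices that have to be built for candidates the user is unlikely to mean.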
9. The method according to claim 1, characterized in that selecting the corrected text according to the fuzzy phoneme editing distance corresponding to each of the at least one candidate corrected text comprises:
If the number of the at least one candidate corrected text is greater than one, selecting, from the at least one candidate corrected text, the one with the smallest fuzzy phoneme editing distance as the corrected text;
If the number of the at least one candidate corrected text is one, determining whether to take the candidate corrected text as the corrected text according to the magnitude relationship between a preset fuzzy phoneme editing distance threshold and the fuzzy phoneme editing distance of that candidate corrected text.
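The selection rule of claim 9 might be sketched as follows. The claim only says that a single candidate is judged by the "magnitude relationship" with a preset threshold; accepting the candidate when its distance does not exceed the threshold is an assumption of this sketch, as are the function name `choose_correction` and the sample values.

```python
def choose_correction(scored, threshold):
    """scored: list of (candidate, distance). Return the chosen candidate, or None."""
    if not scored:
        return None
    if len(scored) > 1:
        # Several candidates: take the one with the smallest distance.
        return min(scored, key=lambda pair: pair[1])[0]
    # Exactly one candidate: accept it only if it is close enough (assumed direction).
    candidate, distance = scored[0]
    return candidate if distance <= threshold else None


best = choose_correction([("北京", 0.5), ("背景", 1.0)], threshold=0.6)
only = choose_correction([("北京", 0.9)], threshold=0.6)
```

Note that, as the claim is worded, the threshold applies only when a single candidate exists; the sketch follows that wording rather than thresholding in both branches.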
10. An error correction device for speech recognition text, characterized by comprising:
A corrected text pulling module, configured to pull, according to the multi-level K-Gram index of the resulting text of speech recognition, at least one candidate corrected text for correcting the resulting text;
An editing distance matrix computation module, configured to determine the fuzzy phoneme editing distance matrix between each of the at least one candidate corrected text and the resulting text;
A path backtracking module, configured to obtain, according to the determined fuzzy phoneme editing distance matrix, the fuzzy phoneme editing distance and the candidate error correction boundary between each of the at least one candidate corrected text and the resulting text;
A correction module, configured to select a corrected text according to the fuzzy phoneme editing distance corresponding to each of the at least one candidate corrected text, and to correct the resulting text according to the candidate error correction boundary corresponding to the selected corrected text.
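The module structure of claim 10 could be wired together roughly as in the sketch below, which treats the first three modules as injected callables and implements only the correction module's final step: replacing the span of the resulting text marked by the candidate error correction boundary. The class name `ErrorCorrectionDevice`, the field names, the signatures, and the stub modules in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Boundary = Tuple[int, int]


@dataclass
class ErrorCorrectionDevice:
    pull: Callable[[str], List[str]]                        # corrected text pulling module
    build_matrix: Callable[[str, str], Matrix]              # matrix computation module
    backtrack: Callable[[Matrix], Tuple[float, Boundary]]   # path backtracking module

    def correct(self, result_text: str) -> str:
        """Correction module: pick the closest candidate and splice it in."""
        scored = []
        for candidate in self.pull(result_text):
            matrix = self.build_matrix(candidate, result_text)
            distance, boundary = self.backtrack(matrix)
            scored.append((distance, candidate, boundary))
        if not scored:
            return result_text  # nothing to correct with
        _, best, (start, end) = min(scored, key=lambda item: item[0])
        return result_text[:start] + best + result_text[end:]


# Stub modules, for illustration only.
device = ErrorCorrectionDevice(
    pull=lambda text: ["北京"],
    build_matrix=lambda candidate, text: [[0.0]],
    backtrack=lambda matrix: (0.5, (2, 4)),
)
corrected = device.correct("我去背景")
```

Passing the modules in as callables keeps the sketch independent of any particular index or distance implementation.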
11. The device according to claim 10, characterized in that the corrected text pulling module comprises:
A first multi-level pulling unit, configured to pull, according to a K-Gram index at the Chinese character level, the pinyin syllable level, the full-pinyin or abbreviated-pinyin level, or the initial-and-final level, at least one non-template candidate corrected text for correcting the resulting text.
12. The device according to claim 11, characterized in that the editing distance matrix computation module comprises:
A first matrix element setting unit, configured to, for each pulled non-template candidate corrected text, set the value of each element corresponding to a replacement operation in an initialized fuzzy phoneme editing distance matrix to the fuzzy phoneme similarity between the character of the current non-template candidate corrected text corresponding to that element and the character of the resulting text corresponding to that element;
A first matrix calculation unit, configured to determine, according to a dynamic programming algorithm, the values of the elements corresponding to non-replacement operations in the fuzzy phoneme editing distance matrix, to obtain the fuzzy phoneme editing distance matrix between the current non-template candidate corrected text and the resulting text.
13. The device according to claim 11, characterized in that the path backtracking module comprises:
A first path backtracking unit, configured to, for each determined fuzzy phoneme editing distance matrix, obtain, by path backtracking, the fuzzy phoneme editing distance of the current fuzzy phoneme editing distance matrix and the corresponding candidate error correction boundary;
A first editing distance computing unit, configured to take the fuzzy phoneme editing distance of the current fuzzy phoneme editing distance matrix and the corresponding candidate error correction boundary as the fuzzy phoneme editing distance and the candidate error correction boundary between the non-template candidate corrected text corresponding to the current fuzzy phoneme editing distance matrix and the resulting text.
14. The device according to claim 10, characterized in that the corrected text pulling module comprises:
A second multi-level pulling unit, configured to pull, according to a K-Gram index at the Chinese character level, the pinyin syllable level, the full-pinyin or abbreviated-pinyin level, or the initial-and-final level, at least one candidate corrected text for correcting the resulting text;
A wildcard replacement unit, configured to identify the proper nouns contained in each candidate corrected text and to replace the proper nouns with a wildcard, to obtain at least one template candidate corrected text.
15. The device according to claim 14, characterized in that the editing distance matrix computation module comprises:
A second matrix element setting unit, configured to, for each pulled template candidate corrected text, set the value of each element corresponding to a replacement operation in an initialized fuzzy phoneme editing distance matrix to the fuzzy phoneme similarity between the character of the current template candidate corrected text corresponding to that element and the character of the resulting text corresponding to that element;
A second matrix calculation unit, configured to determine, according to a dynamic programming algorithm, the values of the elements corresponding to non-replacement operations in the fuzzy phoneme editing distance matrix, to obtain the fuzzy phoneme editing distance matrix between the current template candidate corrected text and the resulting text.
16. The device according to claim 14, characterized in that the path backtracking module comprises:
A second path backtracking unit, configured to, for each determined fuzzy phoneme editing distance matrix, obtain, by path backtracking, the fuzzy phoneme editing distance of the current fuzzy phoneme editing distance matrix and the corresponding candidate error correction boundary;
A difference acquiring unit, configured to determine the difference between the fuzzy phoneme editing distance of the current fuzzy phoneme editing distance matrix and the editing distance corresponding to the wildcard in the template candidate corrected text corresponding to the current fuzzy phoneme editing distance matrix;
A second editing distance computing unit, configured to take the difference as the fuzzy phoneme editing distance between the template candidate corrected text corresponding to the current fuzzy phoneme editing distance matrix and the resulting text.
17. The device according to claim 10, characterized by further comprising:
A place name text replacement module, configured to, after the at least one candidate corrected text for correcting the resulting text is pulled and before the fuzzy phoneme editing distance matrix between each of the at least one candidate corrected text and the resulting text is determined, screen the at least one candidate corrected text according to the user's location or frequently visited places, so as to select at least one place name candidate corrected text relevant to the user.
18. The device according to claim 10, characterized in that selecting the corrected text according to the fuzzy phoneme editing distance corresponding to each of the at least one candidate corrected text comprises:
When the number of the at least one candidate corrected text is greater than one, selecting, from the at least one candidate corrected text, the one with the smallest fuzzy phoneme editing distance as the corrected text;
When the number of the at least one candidate corrected text is one, determining whether to take the candidate corrected text as the corrected text according to the magnitude relationship between a preset fuzzy phoneme editing distance threshold and the fuzzy phoneme editing distance of that candidate corrected text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410778108.3A CN104464736B (en) | 2014-12-15 | 2014-12-15 | The error correction method and device of speech recognition text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104464736A (en) | 2015-03-25 |
CN104464736B (en) | 2018-02-02 |
Family
ID=52910683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410778108.3A Active CN104464736B (en) | 2014-12-15 | 2014-12-15 | The error correction method and device of speech recognition text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104464736B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101923854A (en) * | 2010-08-31 | 2010-12-22 | 中国科学院计算技术研究所 | Interactive speech recognition system and method |
CN102999483A (en) * | 2011-09-16 | 2013-03-27 | 北京百度网讯科技有限公司 | Method and device for correcting text |
US20130158995A1 (en) * | 2009-11-24 | 2013-06-20 | Sorenson Communications, Inc. | Methods and apparatuses related to text caption error correction |
US20130311182A1 (en) * | 2012-05-16 | 2013-11-21 | Gwangju Institute Of Science And Technology | Apparatus for correcting error in speech recognition |
CN104021786A (en) * | 2014-05-15 | 2014-09-03 | 北京中科汇联信息技术有限公司 | Speech recognition method and speech recognition device |
Non-Patent Citations (1)
Title |
---|
Wu Bin: "Research on Post-Processing Techniques in Speech Recognition" (语音识别中的后处理技术研究), doctoral dissertation * |
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105869642A (en) * | 2016-03-25 | 2016-08-17 | 海信集团有限公司 | Voice text error correction method and device |
CN105869642B (en) * | 2016-03-25 | 2019-09-20 | 海信集团有限公司 | A kind of error correction method and device of speech text |
CN105869634A (en) * | 2016-03-31 | 2016-08-17 | 重庆大学 | Field-based method and system for feeding back text error correction after speech recognition |
CN105869634B (en) * | 2016-03-31 | 2019-11-19 | 重庆大学 | It is a kind of based on field band feedback speech recognition after text error correction method and system |
CN105976818A (en) * | 2016-04-26 | 2016-09-28 | Tcl集团股份有限公司 | Instruction identification processing method and apparatus thereof |
CN105976818B (en) * | 2016-04-26 | 2020-12-25 | Tcl科技集团股份有限公司 | Instruction recognition processing method and device |
CN106448675A (en) * | 2016-10-21 | 2017-02-22 | 科大讯飞股份有限公司 | Recognition text correction method and system |
CN106448675B (en) * | 2016-10-21 | 2020-05-01 | 科大讯飞股份有限公司 | Method and system for correcting recognition text |
CN106534548A (en) * | 2016-11-17 | 2017-03-22 | 科大讯飞股份有限公司 | Voice error correction method and device |
CN106534548B (en) * | 2016-11-17 | 2020-06-12 | 科大讯飞股份有限公司 | Voice error correction method and device |
CN106847288B (en) * | 2017-02-17 | 2020-12-25 | 上海创米科技有限公司 | Error correction method and device for voice recognition text |
CN106847288A (en) * | 2017-02-17 | 2017-06-13 | 上海创米科技有限公司 | The error correction method and device of speech recognition text |
CN107544726A (en) * | 2017-07-04 | 2018-01-05 | 百度在线网络技术(北京)有限公司 | Method for correcting error of voice identification result, device and storage medium based on artificial intelligence |
CN107741928A (en) * | 2017-10-13 | 2018-02-27 | 四川长虹电器股份有限公司 | A kind of method to text error correction after speech recognition based on field identification |
CN107741928B (en) * | 2017-10-13 | 2021-01-26 | 四川长虹电器股份有限公司 | Method for correcting error of text after voice recognition based on domain recognition |
CN107729321A (en) * | 2017-10-23 | 2018-02-23 | 上海百芝龙网络科技有限公司 | A kind of method for correcting error of voice identification result |
US10546062B2 (en) | 2017-11-15 | 2020-01-28 | International Business Machines Corporation | Phonetic patterns for fuzzy matching in natural language processing |
CN108694942A (en) * | 2018-04-02 | 2018-10-23 | 浙江大学 | A kind of smart home interaction question answering system based on home furnishings intelligent service robot |
CN109710904A (en) * | 2018-11-13 | 2019-05-03 | 平安科技(深圳)有限公司 | Text accuracy rate calculation method, device, computer equipment based on semanteme parsing |
CN109710904B (en) * | 2018-11-13 | 2023-11-14 | 平安科技(深圳)有限公司 | Text accuracy rate calculation method and device based on semantic analysis and computer equipment |
CN111832554A (en) * | 2019-04-15 | 2020-10-27 | 顺丰科技有限公司 | Image detection method, device and storage medium |
CN110033769A (en) * | 2019-04-23 | 2019-07-19 | 努比亚技术有限公司 | A kind of typing method of speech processing, terminal and computer readable storage medium |
CN110415679B (en) * | 2019-07-25 | 2021-12-17 | 北京百度网讯科技有限公司 | Voice error correction method, device, equipment and storage medium |
CN110415679A (en) * | 2019-07-25 | 2019-11-05 | 北京百度网讯科技有限公司 | Voice error correction method, device, equipment and storage medium |
US11328708B2 (en) | 2019-07-25 | 2022-05-10 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Speech error-correction method, device and storage medium |
CN110442876A (en) * | 2019-08-09 | 2019-11-12 | 深圳前海微众银行股份有限公司 | Text mining method, apparatus, terminal and storage medium |
CN110442853A (en) * | 2019-08-09 | 2019-11-12 | 深圳前海微众银行股份有限公司 | Text positioning method, device, terminal and storage medium |
CN110442876B (en) * | 2019-08-09 | 2023-09-05 | 深圳前海微众银行股份有限公司 | Text mining method, device, terminal and storage medium |
CN110992956A (en) * | 2019-11-11 | 2020-04-10 | 上海市研发公共服务平台管理中心 | Information processing method, device, equipment and storage medium for voice conversion |
CN111382562A (en) * | 2020-03-05 | 2020-07-07 | 百度在线网络技术(北京)有限公司 | Text similarity determination method and device, electronic equipment and storage medium |
CN111382562B (en) * | 2020-03-05 | 2024-03-01 | 百度在线网络技术(北京)有限公司 | Text similarity determination method and device, electronic equipment and storage medium |
CN111862955A (en) * | 2020-06-23 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Voice recognition method, terminal and computer readable storage medium |
CN111862955B (en) * | 2020-06-23 | 2024-04-23 | 北京嘀嘀无限科技发展有限公司 | Speech recognition method and terminal, and computer readable storage medium |
CN112382289B (en) * | 2020-11-13 | 2024-03-22 | 北京百度网讯科技有限公司 | Speech recognition result processing method and device, electronic equipment and storage medium |
CN112382289A (en) * | 2020-11-13 | 2021-02-19 | 北京百度网讯科技有限公司 | Method and device for processing voice recognition result, electronic equipment and storage medium |
CN112560493B (en) * | 2020-12-17 | 2024-04-30 | 金蝶软件(中国)有限公司 | Named entity error correction method, named entity error correction device, named entity error correction computer equipment and named entity error correction storage medium |
CN112560493A (en) * | 2020-12-17 | 2021-03-26 | 金蝶软件(中国)有限公司 | Named entity error correction method, named entity error correction device, computer equipment and storage medium |
CN112836497A (en) * | 2021-01-29 | 2021-05-25 | 上海寻梦信息技术有限公司 | Address correction method, device, electronic equipment and storage medium |
US11810558B2 (en) | 2021-05-26 | 2023-11-07 | International Business Machines Corporation | Explaining anomalous phonetic translations |
CN113781998A (en) * | 2021-09-10 | 2021-12-10 | 未鲲(上海)科技服务有限公司 | Dialect correction model-based voice recognition method, device, equipment and medium |
CN113781998B (en) * | 2021-09-10 | 2024-06-07 | 河南松音科技有限公司 | Speech recognition method, device, equipment and medium based on dialect correction model |
WO2023173533A1 (en) * | 2022-03-17 | 2023-09-21 | 平安科技(深圳)有限公司 | Text error correction method and apparatus, device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104464736B (en) | 2018-02-02 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104464736A (en) | Error correction method and device for voice recognition text | |
JP6675463B2 (en) | Bidirectional stochastic rewriting and selection of natural language | |
JP4580885B2 (en) | Scene information extraction method, scene extraction method, and extraction apparatus | |
CN106534548B (en) | Voice error correction method and device | |
CN106570180B (en) | Voice search method and device based on artificial intelligence | |
CN109637537B (en) | Method for automatically acquiring annotated data to optimize user-defined awakening model | |
US7810030B2 (en) | Fault-tolerant romanized input method for non-roman characters | |
KR101359718B1 (en) | Conversation Managemnt System and Method Thereof | |
CN105550171B (en) | A kind of the Query Information error correction method and system of vertical search engine | |
US9747893B2 (en) | Unsupervised training method, training apparatus, and training program for an N-gram language model based upon recognition reliability | |
US20170103061A1 (en) | Interaction apparatus and method | |
WO2015169134A1 (en) | Method and apparatus for phonetically annotating text | |
US9779728B2 (en) | Systems and methods for adding punctuations by detecting silences in a voice using plurality of aggregate weights which obey a linear relationship | |
EP3213227A1 (en) | Contextual search disambiguation | |
US20070179777A1 (en) | Automatic Grammar Generation Using Distributedly Collected Knowledge | |
CN103678684A (en) | Chinese word segmentation method based on navigation information retrieval | |
CN102193646B (en) | Method and device for generating personal name candidate words | |
CN112861521B (en) | Speech recognition result error correction method, electronic device and storage medium | |
US20150228273A1 (en) | Automated generation of phonemic lexicon for voice activated cockpit management systems | |
WO2014036827A1 (en) | Text correcting method and user equipment | |
CN104007836A (en) | Handwriting input processing method and terminal device | |
CN111125438A (en) | Entity information extraction method and device, electronic equipment and storage medium | |
JP2001229180A (en) | Contents retrieval device | |
CN115858733A (en) | Cross-language entity word retrieval method, device, equipment and storage medium | |
Rehbein | POS error detection in automatically annotated corpora |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||