CN104464736A - Error correction method and device for voice recognition text - Google Patents

Info

Publication number
CN104464736A
Authority
CN
China
Prior art keywords
text
candidate
editing distance
corrected text
error correction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410778108.3A
Other languages
Chinese (zh)
Other versions
CN104464736B (en)
Inventor
时迎超
周晓
张海雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410778108.3A priority Critical patent/CN104464736B/en
Publication of CN104464736A publication Critical patent/CN104464736A/en
Application granted granted Critical
Publication of CN104464736B publication Critical patent/CN104464736B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

An embodiment of the invention discloses an error correction method and device for speech recognition text. The method comprises: retrieving, according to a multi-level K-Gram index of the result text of speech recognition, at least one candidate correction text for correcting the result text; determining a fuzzy-sound edit distance matrix between each candidate correction text and the result text; obtaining, from each determined matrix, the fuzzy-sound edit distance between the corresponding candidate correction text and the result text, together with a candidate correction boundary; selecting a correction text according to the fuzzy-sound edit distances of the candidate correction texts; and correcting the result text according to the candidate correction boundary of the selected correction text. The method and device thereby achieve accurate error correction of the result text of speech recognition.

Description

Error correction method and device for speech recognition text
Technical field
Embodiments of the present invention relate to the technical field of speech recognition, and in particular to an error correction method and device for speech recognition text.
Background
As speech recognition technology matures, its applications are becoming increasingly widespread. Compared with other text input methods, voice input based on speech recognition better matches people's everyday habits and makes the input process more efficient. Speech recognition can be expected to see wide use in fields such as commercial production, communications, medical care, and household services.
In practical applications, factors such as ambient noise and dialect often cause the recognition result to differ from what the user actually said. Recognition errors are especially common in everyday spoken-language scenarios. The prior art lacks a way to correct such recognition errors, which hinders the wider adoption of speech recognition technology.
Summary of the invention
In view of this, embodiments of the present invention propose an error correction method and device for speech recognition text, so as to correct the result text of speech recognition accurately.
In a first aspect, an embodiment of the present invention provides an error correction method for speech recognition text, the method comprising:
retrieving, according to a multi-level K-Gram index of the result text of speech recognition, at least one candidate correction text for correcting the result text;
determining a fuzzy-sound edit distance matrix between each of the at least one candidate correction text and the result text;
obtaining, according to the determined fuzzy-sound edit distance matrices, the fuzzy-sound edit distance between each candidate correction text and the result text, together with a candidate correction boundary;
selecting a correction text according to the fuzzy-sound edit distances of the at least one candidate correction text, and correcting the result text according to the candidate correction boundary corresponding to the selected correction text.
In a second aspect, an embodiment of the present invention further provides an error correction device for speech recognition text, the device comprising:
a correction text retrieval module, configured to retrieve, according to a multi-level K-Gram index of the result text of speech recognition, at least one candidate correction text for correcting the result text;
an edit distance matrix calculation module, configured to determine a fuzzy-sound edit distance matrix between each of the at least one candidate correction text and the result text;
a path backtracking module, configured to obtain, according to the determined fuzzy-sound edit distance matrices, the fuzzy-sound edit distance between each candidate correction text and the result text, together with a candidate correction boundary;
a correction module, configured to select a correction text according to the fuzzy-sound edit distances of the at least one candidate correction text, and to correct the result text according to the candidate correction boundary corresponding to the selected correction text.
The method and device provided by the embodiments of the present invention retrieve candidate correction texts through a multi-level K-Gram index of the result text, compute a fuzzy-sound edit distance matrix between each candidate and the result text, derive from each matrix a fuzzy-sound edit distance and a candidate correction boundary, select a correction text by fuzzy-sound edit distance, and correct the result text within the corresponding candidate correction boundary, thereby achieving accurate error correction of the result text of speech recognition.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flowchart of the error correction method for speech recognition text provided by the first embodiment of the invention;
Fig. 2 is a flowchart of the error correction method for speech recognition text provided by the second embodiment of the invention;
Fig. 3 is a flowchart of the edit distance matrix calculation in the error correction method provided by the second embodiment of the invention;
Fig. 4 is a flowchart of the path backtracking in the error correction method provided by the second embodiment of the invention;
Fig. 5 is a flowchart of the error correction method for speech recognition text provided by the third embodiment of the invention;
Fig. 6 is a flowchart of the correction text retrieval in the error correction method provided by the third embodiment of the invention;
Fig. 7 is a flowchart of the edit distance matrix calculation in the error correction method provided by the third embodiment of the invention;
Fig. 8 is a flowchart of the path backtracking in the error correction method provided by the third embodiment of the invention;
Fig. 9 is a flowchart of the error correction method for speech recognition text provided by the fourth embodiment of the invention;
Fig. 10 is a flowchart of the error correction step in the error correction method provided by the fifth embodiment of the invention;
Fig. 11 is a structural diagram of the error correction device for speech recognition text provided by the sixth embodiment of the invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire content.
First embodiment
Fig. 1 is a flowchart of the error correction method for speech recognition text provided by the first embodiment of the invention. Referring to Fig. 1, the method comprises:
S110: according to a multi-level K-Gram index of the result text of speech recognition, retrieve at least one candidate correction text for correcting the result text.
Before the result text is corrected, a multi-level K-Gram index of the result text is first built. Once the index has been built, the candidate correction texts most similar to the result text are retrieved from a preset corpus according to that index.
Specifically, the multi-level K-Gram index includes any one of the following: a K-Gram index at the Chinese character level, a K-Gram index at the pinyin syllable level, a K-Gram index at the full-pinyin or abbreviated-pinyin level, and a K-Gram index at the initial/final level.
The Chinese-character-level K-Gram index is built with the Chinese characters of the result text as its elements. The pinyin-syllable-level K-Gram index is built with the pinyin syllables of those characters as its elements. The full-pinyin or abbreviated-pinyin level K-Gram index is built by obtaining the full or abbreviated pinyin spelling of the characters in the result text and using those spellings as its elements. The initial/final-level K-Gram index is built by splitting the pinyin spelling of each character in the result text into its initial and final, and using the resulting initials and finals as its elements.
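As a rough illustration of the index levels just described, the sketch below derives 2-grams at four levels for a short string. Everything specific in it is an assumption made for the example: the tiny `PINYIN` table, treating `y` and `w` as initials, taking the first letter of each syllable as the abbreviated spelling, and the level names themselves. The patent text does not specify these details.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-written pinyin table covering only the demo characters;
# a real system would need a full pronunciation lexicon.
PINYIN: Dict[str, str] = {"不": "bu", "寒": "han", "而": "er", "栗": "li"}

# Mandarin initials, two-letter ones first so that "zh" wins over "z".
# Treating "y" and "w" as initials is a simplification for this sketch.
INITIALS: Tuple[str, ...] = (
    "zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "r", "z", "c", "s", "y", "w",
)


def split_initial_final(syllable: str) -> List[str]:
    """Split a syllable into [initial, final]; zero-initial syllables give [final]."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return [initial, syllable[len(initial):]]
    return [syllable]


def k_grams(units: List[str], k: int) -> List[Tuple[str, ...]]:
    """Return all contiguous k-grams over a sequence of index elements."""
    return [tuple(units[i:i + k]) for i in range(len(units) - k + 1)]


def multi_level_units(text: str) -> Dict[str, List[str]]:
    """Produce the element sequence for each (assumed) index level."""
    syllables = [PINYIN[ch] for ch in text]
    return {
        "hanzi": list(text),
        "syllable": syllables,
        "abbrev": [s[0] for s in syllables],  # assumed: first letter per syllable
        "initial_final": [p for s in syllables for p in split_initial_final(s)],
    }


units = multi_level_units("不寒而栗")
index = {level: k_grams(seq, 2) for level, seq in units.items()}
for level, grams in index.items():
    print(level, grams)
```

Looking up the same k-grams in a corpus index at several levels is what lets a phonetically similar but differently written candidate be found even when the character-level k-grams do not match.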
The retrieved candidate correction texts are the alternatives from which the correction text used to correct the result text is chosen. To correct the result text more accurately, at least one candidate correction text should be retrieved.
S120: determine a fuzzy-sound edit distance matrix between each of the at least one candidate correction text and the result text.
After the at least one candidate correction text has been determined, the fuzzy-sound edit distance matrix between each candidate correction text and the result text is determined.
The edit distance between two strings is the minimum number of edit operations needed to convert one string into the other. The edit operations are replacement, insertion, and deletion. A replacement substitutes one character for another; an insertion adds a character that was not in the string; a deletion removes an existing character from the string.
An edit distance matrix is a matrix used to calculate the edit distance between two strings. Table 1 shows the edit distance matrix between the strings "kitten" and "sitting".
Table 1
         k  i  t  t  e  n
      0  1  2  3  4  5  6
   s  1  1  2  3  4  5  6
   i  2  2  1  2  3  4  5
   t  3  3  2  1  2  3  4
   t  4  4  3  2  1  2  3
   i  5  5  4  3  2  2  3
   n  6  6  5  4  3  3  2
   g  7  7  6  5  4  4  3
Given two strings, the edit distance matrix between them can be solved with a dynamic programming algorithm.
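The dynamic programming computation can be sketched as follows. This is the standard Levenshtein recurrence, shown here only to make the "kitten" / "sitting" matrix concrete; the function name and the row/column orientation (rows follow the second string, columns the first) are choices made for this example.

```python
from typing import List


def edit_distance_matrix(source: str, target: str) -> List[List[int]]:
    """Build the edit distance matrix; rows follow `target`, columns follow `source`."""
    rows, cols = len(target) + 1, len(source) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # i deletions turn a prefix of length i into the empty string
    for j in range(cols):
        d[0][j] = j  # j insertions build a prefix of length j from the empty string
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if target[i - 1] == source[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement, or match when cost is 0
            )
    return d


matrix = edit_distance_matrix("kitten", "sitting")
for row in matrix:
    print(row)
print("edit distance:", matrix[-1][-1])
```

The bottom-right element is the edit distance between the two full strings, which is 3 for this pair.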
After the edit distance matrix between the two strings has been solved with the dynamic programming algorithm, each element of the matrix that corresponds to a replacement operation is replaced with the fuzzy-sound similarity between the character of the current candidate correction text and the character of the result text that correspond to that element. The result is the fuzzy-sound edit distance matrix between the current candidate correction text and the result text. Fuzzy-sound similarity characterizes how similar two strings are in pronunciation; in this embodiment it characterizes the phonetic similarity between the current candidate correction text and the result text.
The fuzzy-sound similarity between the character of the current candidate correction text and the character of the result text corresponding to an element is obtained by looking it up in a preset fuzzy-sound matrix. The fuzzy-sound matrix records pairs of different characters together with the fuzzy-sound similarity between them, so the required similarity can be obtained by a lookup in that matrix.
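A preset fuzzy-sound matrix of this kind might be stored as a symmetric lookup table. The sketch below is hypothetical throughout: keying the table by pinyin syllable, the default cost of 1.0 for unlisted pairs, and both table entries are assumptions. The value 0.3369 appears in Table 2 of this document, but which pair it belongs to is guessed here. Note that the value acts as a replacement cost, so a smaller number means the two sounds are closer.

```python
from typing import Dict, FrozenSet

# Hypothetical entries; an unordered pair is used as the key so that
# lookup is symmetric in its two arguments.
FUZZY_SOUND: Dict[FrozenSet[str], float] = {
    frozenset(("a", "er")): 0.3369,   # assumed pairing for the Table 2 value
    frozenset(("han", "hang")): 0.2,  # invented value for illustration only
}


def fuzzy_similarity(syllable_a: str, syllable_b: str, default: float = 1.0) -> float:
    """Return the replacement cost between two syllables (0.0 when identical)."""
    if syllable_a == syllable_b:
        return 0.0
    return FUZZY_SOUND.get(frozenset((syllable_a, syllable_b)), default)


print(fuzzy_similarity("er", "a"))
print(fuzzy_similarity("li", "li"))
print(fuzzy_similarity("bu", "li"))
```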
S130: according to the determined fuzzy-sound edit distance matrices, obtain the fuzzy-sound edit distance between each of the at least one candidate correction text and the result text, together with a candidate correction boundary.
After the fuzzy-sound edit distance matrix between each candidate correction text and the result text has been determined, the following are obtained for each candidate from its matrix: the fuzzy-sound edit distance between that candidate and the result text, and the candidate correction boundary that should be used when the result text is corrected with that candidate.
The fuzzy-sound edit distance measures how similar the current candidate correction text and the result text are in pronunciation. The larger the fuzzy-sound edit distance between two texts, the lower their phonetic similarity. The lower the phonetic similarity between a candidate correction text and the result text, the lower the probability that this candidate is finally adopted as the correction text for the result text.
In general, correcting the result text with a candidate correction text means choosing a correction substring from the candidate and using it to replace the erroneous substring of the result text. The candidate correction boundary denotes the upper and lower boundaries of the correction substring within the candidate correction text, and the upper and lower boundaries of the erroneous substring within the result text. For example, suppose the candidate correction text is "APEC" and the result text of speech recognition is "Asia-Pacific Organization for Economic Co-operation is the most influential official economic cooperation forum of the Asia-Pacific region" (both strings are translations of the original Chinese example). After identification, the correction substring is "APEC" and the erroneous substring is "Asia-Pacific Organization for Economic Co-operation". The upper and lower boundaries of the correction substring "APEC", together with the upper and lower boundaries of the erroneous substring "Asia-Pacific Organization for Economic Co-operation", then form the candidate correction boundary.
S140: select a correction text according to the fuzzy-sound edit distances of the at least one candidate correction text, and correct the result text according to the candidate correction boundary corresponding to the selected correction text.
After the fuzzy-sound edit distance between each candidate correction text and the result text has been obtained, the correction text is chosen from the at least one candidate correction text according to those distances. Because a larger fuzzy-sound edit distance indicates lower phonetic similarity between the candidate and the result text, the candidate with the smaller fuzzy-sound edit distance to the result text should generally be chosen as the final correction text.
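The selection and replacement in S140 can be sketched as below, assuming the distances and boundaries have already been computed. The `Candidate` structure, the half-open `(start, end)` span convention, and the second candidate with its distance of 2.0 are all hypothetical; the first candidate and the result text are assumed to be the Chinese strings behind the example of the second embodiment.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    text: str                     # candidate correction text
    distance: float               # fuzzy-sound edit distance to the result text
    cand_span: Tuple[int, int]    # correction substring, as [start, end) in `text`
    result_span: Tuple[int, int]  # erroneous substring, as [start, end) in the result


def correct(result: str, candidates: List[Candidate]) -> str:
    """Pick the candidate with the smallest distance and splice it into the result."""
    best = min(candidates, key=lambda c: c.distance)
    c_start, c_end = best.cand_span
    r_start, r_end = best.result_span
    return result[:r_start] + best.text[c_start:c_end] + result[r_end:]


result_text = "不含阿狸的成语解释"
candidates = [
    Candidate("不寒而栗", 0.3369, (0, 4), (0, 4)),
    Candidate("不含酒精", 2.0, (0, 4), (0, 4)),  # invented competitor
]
print(correct(result_text, candidates))
```

A production system would probably also reject every candidate when even the smallest distance is too large; the text here only states that the smaller distance should generally be preferred.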
In this embodiment, at least one candidate correction text is retrieved according to the multi-level K-Gram index of the result text of speech recognition; a fuzzy-sound edit distance matrix is determined between each candidate and the result text; the fuzzy-sound edit distance and the candidate correction boundary of each candidate are obtained from its matrix; and a correction text is selected by fuzzy-sound edit distance and applied to the result text within its candidate correction boundary. The speech recognition text is thereby corrected accurately.
Second embodiment
Fig. 2 is a flowchart of the error correction method for speech recognition text provided by the second embodiment of the invention. This method builds on the first embodiment. Further, it comprises: retrieving, according to the multi-level K-Gram index of the result text of speech recognition, at least one non-template candidate correction text for correcting the result text; determining a fuzzy-sound edit distance matrix between each non-template candidate correction text and the result text; obtaining, from the determined matrices, the fuzzy-sound edit distance between each non-template candidate correction text and the result text, together with a candidate correction boundary; and selecting a correction text according to the fuzzy-sound edit distances of the non-template candidate correction texts, then correcting the result text according to the candidate correction boundary corresponding to the selected correction text.
Referring to Fig. 2, the method comprises:
S210: according to the multi-level K-Gram index of the result text of speech recognition, retrieve at least one non-template candidate correction text for correcting the result text.
Specifically, retrieving at least one candidate correction text according to the multi-level K-Gram index of the result text comprises: retrieving at least one non-template candidate correction text for correcting the result text according to the K-Gram index at the Chinese character level, the pinyin syllable level, the full-pinyin or abbreviated-pinyin level, or the initial/final level.
In this embodiment, the candidate correction texts retrieved through these index levels are non-template candidate correction texts. A non-template candidate correction text is a candidate correction text that contains no wildcard.
S220: determine a fuzzy-sound edit distance matrix between each of the at least one non-template candidate correction text and the result text.
S230: according to the determined fuzzy-sound edit distance matrices, obtain the fuzzy-sound edit distance between each non-template candidate correction text and the result text, together with a candidate correction boundary.
S240: select a correction text according to the fuzzy-sound edit distances of the at least one non-template candidate correction text, and correct the result text according to the candidate correction boundary corresponding to the selected correction text.
Fig. 3 is a flowchart of the fuzzy-sound edit distance matrix calculation in the error correction method provided by the second embodiment of the invention. Referring to Fig. 3, determining the fuzzy-sound edit distance matrix between each non-template candidate correction text and the result text comprises:
S221: for each retrieved non-template candidate correction text, set the value of each element of the initialized fuzzy-sound edit distance matrix that corresponds to a replacement operation to the fuzzy-sound similarity between the character of the current non-template candidate correction text and the character of the result text that correspond to that element.
In this embodiment, after at least one non-template candidate correction text has been retrieved, the fuzzy-sound edit distance matrix between each of them and the result text is calculated. When calculating the matrix for the current non-template candidate correction text, the elements corresponding to replacement operations are set first, each to the fuzzy-sound similarity between the character of the candidate and the character of the result text that correspond to that element.
The positions corresponding to replacement operations can be identified by comparing the non-template candidate correction text with the result text, either as text or through the speech corresponding to the text. For example, the matrix position corresponding to the two characters, one from the candidate and one from the result text, that are most closely related phonetically can be taken as a position corresponding to a replacement operation.
Further, if the value determined in this way for an element at a replacement position is smaller than the value of the element at the previous replacement position in the matrix, the value at the previous replacement position is used as the value of this element. The values of the elements at all replacement positions therefore never decrease from one replacement position to the next.
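Read literally, the rule in the preceding paragraph is a running maximum over the values at successive replacement positions. The sketch below shows only that one rule; the input values are illustrative and the function name is made up for the example.

```python
from itertools import accumulate
from typing import List


def enforce_non_decreasing(values: List[float]) -> List[float]:
    """Replace each value by the largest value seen so far along the replacement positions."""
    return list(accumulate(values, max))


# Illustrative per-position similarities along the replacement positions.
print(enforce_non_decreasing([0.0, 0.0, 0.3369, 0.0]))  # [0.0, 0.0, 0.3369, 0.3369]
```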
S222: determine, according to the dynamic programming algorithm, the values of the elements of the fuzzy-sound edit distance matrix that do not correspond to replacement operations, thereby obtaining the fuzzy-sound edit distance matrix between the current non-template candidate correction text and the result text.
After the values of the elements corresponding to replacement operations have been set, the values of the non-replacement elements, that is, all elements other than those corresponding to replacement operations, are determined.
Specifically, the values of the non-replacement elements are determined by dynamic programming. When either the horizontal-axis index or the vertical-axis index of an element is 0, the value of the element equals its other, non-zero index. When neither index is 0, the value of the element is determined by the following formula:
d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + θ[i][j])
Here d[i][j] is the value of the element with horizontal-axis index i and vertical-axis index j, and θ[i][j] is the fuzzy-sound similarity between the character of the non-template candidate correction text and the character of the result text that correspond to the element with horizontal-axis index i and vertical-axis index j.
It should be noted that, while the values of the non-replacement elements are being calculated, the values of the elements corresponding to replacement operations must be updated synchronously using the same formula.
Table 2 shows the fuzzy-sound edit distance matrix between the non-template candidate correction text "不寒而栗" ("tremble with fear"; character by character: no, cold, and, chestnut) and the result text "不含阿狸的成语解释" (character by character: no, contain, ah, leopard cat, 's, become, language, separate, release). Referring to Table 2, at the positions corresponding to replacement operations between the candidate and the result text, that is, at the positions of Table 2 whose horizontal-axis and vertical-axis indexes are equal, the element of the fuzzy-sound edit distance matrix is the fuzzy-sound similarity between the character of the candidate and the character of the result text that correspond to that element.
Table 2
                       不 (no)   寒 (cold)   而 (and)   栗 (chestnut)
                  0    1         2           3          4
不 (no)           1    0         1           2          3
含 (contain)      2    1         0           1          2
阿 (ah)           3    2         1           0.3369     1.3369
狸 (leopard cat)  4    3         2           1.3369     0.3369
的 ('s)           5    4         3           2.3369     1.3369
成 (become)       6    5         4           3.3369     2.3369
语 (language)     7    6         5           4.3369     3.3369
解 (separate)     8    7         6           5.3369     4.3369
释 (release)      9    8         7           6.3369     5.3369
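The values in Table 2 can be reproduced with the recurrence d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + θ[i][j]) under a few assumptions, all of which are mine rather than the patent's: the glossed rows and columns stand for the characters of "不含阿狸的成语解释" and "不寒而栗", characters with the same toneless pinyin have similarity 0, the pair 阿/而 has similarity 0.3369, and every other pair costs 1. The `PINYIN` table and the `theta` function are hypothetical stand-ins for the preset fuzzy-sound matrix.

```python
from typing import Callable, Dict, List

# Assumed toneless pronunciations for the characters that matter in the example.
PINYIN: Dict[str, str] = {
    "不": "bu", "寒": "han", "含": "han", "而": "er", "阿": "a", "栗": "li", "狸": "li",
}


def theta(result_char: str, cand_char: str) -> float:
    """Hypothetical fuzzy-sound similarity (a cost: 0 means same sound)."""
    pa = PINYIN.get(result_char, result_char)
    pb = PINYIN.get(cand_char, cand_char)
    if pa == pb:
        return 0.0
    if {pa, pb} == {"a", "er"}:
        return 0.3369  # assumed pairing for the value seen in Table 2
    return 1.0


def fuzzy_edit_matrix(candidate: str, result: str,
                      sim: Callable[[str, str], float]) -> List[List[float]]:
    """Rows follow the result text, columns follow the candidate, as in Table 2."""
    rows, cols = len(result) + 1, len(candidate) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = float(i)  # an index of 0 on one axis gives the other index
    for j in range(cols):
        d[0][j] = float(j)
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + 1,
                d[i][j - 1] + 1,
                d[i - 1][j - 1] + sim(result[i - 1], candidate[j - 1]),
            )
    return d


matrix = fuzzy_edit_matrix("不寒而栗", "不含阿狸的成语解释", theta)
for row in matrix:
    print([round(v, 4) for v in row])
```

Under these assumptions a single dynamic programming pass gives the same numbers as Table 2, which suggests that setting the replacement elements first and then updating them synchronously amounts, for this example at least, to the ordinary weighted edit distance computation.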
Fig. 4 is a flowchart of the path backtracking in the error correction method provided by the second embodiment of the invention. Referring to Fig. 4, obtaining the fuzzy-sound edit distance between each non-template candidate correction text and the result text, together with the candidate correction boundary, from the determined fuzzy-sound edit distance matrices comprises:
S231: for each determined fuzzy-sound edit distance matrix, obtain the fuzzy-sound edit distance of the current matrix and the corresponding candidate correction boundary by path backtracking.
During path backtracking, the path moves from the first element of the fuzzy-sound edit distance matrix along the shortest path to the element corresponding to the first replacement operation, and at the same time from the last element of the matrix along the shortest path to the element corresponding to the last replacement operation. Table 3 illustrates path backtracking on the fuzzy-sound edit distance matrix between the non-template candidate correction text "不寒而栗" ("tremble with fear") and the result text "不含阿狸的成语解释". The arrows in Table 3 mark the backtracking operations performed on the matrix.
Table 3
                       不 (no)   寒 (cold)   而 (and)   栗 (chestnut)
                  0↘   1         2           3          4
不 (no)           1    0         1           2          3
含 (contain)      2    1         0           1          2
阿 (ah)           3    2         1           0.3369     1.3369
狸 (leopard cat)  4    3         2           1.3369     0.3369
的 ('s)           5    4         3           2.3369     1.3369↑
成 (become)       6    5         4           3.3369     2.3369↑
语 (language)     7    6         5           4.3369     3.3369↑
解 (separate)     8    7         6           5.3369     4.3369↑
释 (release)      9    8         7           6.3369     5.3369↑
S232: take the fuzzy-sound edit distance and the corresponding candidate correction boundary of the current fuzzy-sound edit distance matrix as the fuzzy-sound edit distance and the candidate correction boundary between the result text and the non-template candidate correction text to which the current matrix corresponds.
Specifically, the value of the element corresponding to the last replacement operation is taken as the fuzzy-sound edit distance between the non-template candidate correction text and the result text. The candidate correction boundary is formed by the boundary of the candidate character and the result-text character corresponding to the element of the first replacement operation, together with the boundary of the candidate character and the result-text character corresponding to the element of the last replacement operation. In the example above, the upper and lower boundaries of the candidate correction text "不寒而栗" ("tremble with fear"), together with the upper and lower boundaries of the substring "不含阿狸" (the first four characters) of the result text, form the candidate correction boundary of that non-template candidate correction text.
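One simplified reading of the backtracking step is sketched below; it is not claimed to be the patent's exact procedure. It walks back from the last matrix element, prefers diagonal (replacement) steps when they explain the current value, and reports the value at the last replacement element as the distance and the first and last replacement elements as the boundary. The `PINYIN` table, the `theta` costs, and the half-open span convention are assumptions made for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

PINYIN: Dict[str, str] = {  # assumed toneless pronunciations
    "不": "bu", "寒": "han", "含": "han", "而": "er", "阿": "a", "栗": "li", "狸": "li",
}


def theta(a: str, b: str) -> float:
    """Hypothetical fuzzy-sound similarity used as a replacement cost."""
    pa, pb = PINYIN.get(a, a), PINYIN.get(b, b)
    if pa == pb:
        return 0.0
    return 0.3369 if {pa, pb} == {"a", "er"} else 1.0


def fuzzy_matrix(cand: str, res: str, sim: Callable[[str, str], float]) -> List[List[float]]:
    """Rows follow the result text, columns follow the candidate."""
    d = [[float(i + j) if i == 0 or j == 0 else 0.0
          for j in range(len(cand) + 1)] for i in range(len(res) + 1)]
    for i in range(1, len(res) + 1):
        for j in range(1, len(cand) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + sim(res[i - 1], cand[j - 1]))
    return d


def backtrack(cand: str, res: str, sim: Callable[[str, str], float]
              ) -> Tuple[float, Tuple[int, int], Tuple[int, int]]:
    """Return (distance, result span, candidate span), spans as [start, end)."""
    d = fuzzy_matrix(cand, res, sim)
    i, j = len(res), len(cand)
    diagonal: List[Tuple[int, int]] = []  # replacement cells, last one first
    while i > 0 and j > 0:
        if math.isclose(d[i][j], d[i - 1][j - 1] + sim(res[i - 1], cand[j - 1])):
            diagonal.append((i, j))
            i, j = i - 1, j - 1
        elif math.isclose(d[i][j], d[i - 1][j] + 1):
            i -= 1  # skip a result-text character outside the corrected span
        else:
            j -= 1  # skip a candidate character
    if not diagonal:
        raise ValueError("no replacement position found on the backtracking path")
    (last_i, last_j), (first_i, first_j) = diagonal[0], diagonal[-1]
    return d[last_i][last_j], (first_i - 1, last_i), (first_j - 1, last_j)


candidate, result = "不寒而栗", "不含阿狸的成语解释"
distance, res_span, cand_span = backtrack(candidate, result, theta)
print(round(distance, 4), result[res_span[0]:res_span[1]],
      candidate[cand_span[0]:cand_span[1]])
```

For this example the upward moves from the last element correspond to the ↑ arrows of Table 3, and the first replacement element sits right after the top-left element marked ↘.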
In this embodiment, at least one non-template candidate corrected text for correcting the resulting text is pulled according to the multi-level K-Gram index of the resulting text of speech recognition; the fuzzy phoneme editing distance matrix between each non-template candidate corrected text and the resulting text is determined; the fuzzy phoneme editing distance and the candidate error correction boundary between each non-template candidate corrected text and the resulting text are obtained from the determined matrix; a corrected text is chosen according to the fuzzy phoneme editing distances of the non-template candidate corrected texts; and the resulting text is corrected according to the candidate error correction boundary corresponding to the chosen corrected text. Accurate error correction of the resulting text of speech recognition is thereby achieved.
3rd embodiment
Fig. 5 is a flowchart of the error correction method for speech recognition text provided by the third embodiment of the invention. The method is based on the first embodiment of the invention. Further, the method comprises: pulling, according to the multi-level K-Gram index of the resulting text of speech recognition, at least one template candidate corrected text for correcting the resulting text; determining the fuzzy phoneme editing distance matrix between each template candidate corrected text and the resulting text; obtaining, from the determined matrices, the fuzzy phoneme editing distance and the candidate error correction boundary between each template candidate corrected text and the resulting text; and choosing a corrected text according to the fuzzy phoneme editing distances of the template candidate corrected texts, and correcting the resulting text according to the candidate error correction boundary corresponding to the chosen corrected text.
Referring to Fig. 5, the error correction method for speech recognition text comprises:
S510, pull, according to the multi-level K-Gram index of the resulting text of speech recognition, at least one template candidate corrected text for correcting the resulting text.
S520, determine the fuzzy phoneme editing distance matrix between each of the at least one template candidate corrected text and the resulting text.
S530, obtain, from the determined fuzzy phoneme editing distance matrices, the fuzzy phoneme editing distance and the candidate error correction boundary between each of the at least one template candidate corrected text and the resulting text.
S540, choose a corrected text according to the fuzzy phoneme editing distance of each of the at least one template candidate corrected text, and correct the resulting text according to the candidate error correction boundary corresponding to the chosen corrected text.
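As a rough illustration of the pulling in step S510, the sketch below builds a K-Gram inverted index over token sequences and retrieves every stored text that shares at least one K-Gram with the query. It is only a minimal sketch under assumptions: the function names, the token ids ("s1", "s2", ...) and the tiny corpus are hypothetical, and a multi-level index is assumed here to mean one such index per level (for example one over characters and one over syllables).

```python
from collections import defaultdict


def kgrams(tokens, k):
    """Return every run of k consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def build_index(corpus, k):
    """Map each K-Gram to the set of text ids whose token list contains it."""
    index = defaultdict(set)
    for text_id, tokens in corpus.items():
        for gram in kgrams(tokens, k):
            index[gram].add(text_id)
    return index


def pull_candidates(index, query_tokens, k):
    """Collect the ids of all texts sharing at least one K-Gram with the query."""
    hits = set()
    for gram in kgrams(query_tokens, k):
        hits |= index.get(gram, set())
    return hits


# Hypothetical syllable-level corpus; the token ids are placeholders.
CORPUS = {
    "candidate_a": ["s1", "s2", "s3", "s4"],
    "candidate_b": ["s7", "s8"],
}
index = build_index(CORPUS, k=2)
print(pull_candidates(index, ["s1", "s2", "s9", "s4"], k=2))  # prints: {'candidate_a'}
```

The query shares the 2-gram ("s1", "s2") with candidate_a only, so that text alone is pulled and then passed on to the distance computation.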
Fig. 6 is a flowchart of corrected text pulling in the error correction method for speech recognition text provided by the third embodiment of the invention. Referring to Fig. 6, pulling at least one template candidate corrected text for correcting the resulting text according to the multi-level K-Gram index of the resulting text of speech recognition comprises:
S511, pull, according to the K-Gram index at the Chinese character level, the pinyin syllable level, the full or abbreviated spelling level, or the initial-and-final level, at least one candidate corrected text for correcting the resulting text.
S512, identify the proper nouns contained in each candidate corrected text and replace them with a wildcard, to obtain at least one template candidate corrected text.
After a candidate corrected text has been pulled, it is checked for proper nouns. Proper nouns include place names, country names, organization names and the names of people. For example, "Liu Dehua" is the name of a person and can be identified as a proper noun.
Once a proper noun has been identified in a candidate corrected text, it is replaced with a wildcard, which yields the template candidate corrected text corresponding to that candidate corrected text. For example, for the candidate corrected text "I want to listen to the songs of Liu Dehua", identifying the proper noun "Liu Dehua" and replacing it with a wildcard gives the template candidate corrected text "I want to listen to the songs of *". In this example, "*" is the wildcard in the template candidate corrected text.
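The wildcard replacement of step S512 can be sketched as below. The source does not say how proper nouns are recognized, so this sketch simply assumes a known set of proper nouns; make_template and its parameters are hypothetical names.

```python
def make_template(text, proper_nouns, wildcard="*"):
    """Replace every known proper noun in text with the wildcard.

    Longer names are replaced first so that a short name never splits a
    longer name that contains it.
    """
    for noun in sorted(proper_nouns, key=len, reverse=True):
        text = text.replace(noun, wildcard)
    return text


candidate = "I want to listen to the songs of Liu Dehua"
print(make_template(candidate, {"Liu Dehua"}))
# prints: I want to listen to the songs of *
```

A text without any known proper noun is returned unchanged, which corresponds to a non-template candidate corrected text.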
Fig. 7 is a flowchart of the editing distance matrix computation in the error correction method for speech recognition text provided by the third embodiment of the invention. Referring to Fig. 7, determining the fuzzy phoneme editing distance matrix between each of the at least one template candidate corrected text and the resulting text comprises:
S521, for each pulled template candidate corrected text, set the value of each element corresponding to a replacement operation in the initialized fuzzy phoneme editing distance matrix to the fuzzy phoneme similarity between the character of the current template candidate corrected text corresponding to that element and the character of the resulting text corresponding to that element.
For a template candidate corrected text containing a wildcard, the matrix is determined in a way similar to that used for a non-template candidate corrected text: the value of each element corresponding to a replacement operation is set to the fuzzy phoneme similarity between the character of the current template candidate corrected text corresponding to that element and the character of the resulting text corresponding to that element.
Table 4 shows the fuzzy phoneme editing distance matrix between the template candidate corrected text "I want to listen to the songs of *" and the resulting text "I think very Liu Dehua's brother".
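Table 4
Columns: characters of the template candidate corrected text; rows: characters of the resulting text.
        |   | I | Think | Listen | * | 's  | Song
        | 0 | 1 | 2     | 3      | 4 | 5   | 6
I       | 1 | 0 | 1     | 2      | 3 | 4   | 5
Think   | 2 | 1 | 0     | 1      | 2 | 3   | 4
Very    | 3 | 2 | 1     | 0      | 1 | 2   | 3
Liu     | 4 | 3 | 2     | 1      | 1 | 1.7 | 2.7
Moral   | 5 | 4 | 3     | 2      | 2 | 1   | 2
China   | 6 | 5 | 4     | 3      | 3 | 2   | 1.8
's      | 7 | 6 | 5     | 4      | 4 | 3   | 2.8
Brother | 8 | 7 | 6     | 5      | 5 | 4   | 3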
Referring to Table 4, at each position corresponding to a replacement operation between the template candidate corrected text and the resulting text, the element of the fuzzy phoneme editing distance matrix is the fuzzy phoneme similarity between the character of the template candidate corrected text corresponding to that element and the character of the resulting text corresponding to that element.
As with the fuzzy phoneme editing distance matrix of a non-template candidate corrected text, the positions corresponding to replacement operations can be identified by comparing the candidate corrected text with the resulting text, either as text or as the speech corresponding to the text.
Unlike the fuzzy phoneme editing distance matrix of a non-template candidate corrected text, the characters of a template candidate corrected text do not correspond one-to-one to the characters of the resulting text, because the template contains a wildcard. A wildcard usually corresponds to at least two characters of the resulting text. In the example of Table 4, the wildcard corresponds to three characters of the resulting text: "Liu", "De" and "Hua" (the rows labeled "Liu", "Moral" and "China").
For the elements at positions of replacement operations that involve the wildcard, no fuzzy phoneme similarity can be obtained. Each such element therefore takes the value of the element at the position of the preceding replacement operation plus one.
S522, determine the values of the elements of the fuzzy phoneme editing distance matrix that do not correspond to replacement operations according to a dynamic programming algorithm, to obtain the fuzzy phoneme editing distance matrix between the current template candidate corrected text and the resulting text.
The elements that do not correspond to replacement operations, that is, all elements of the fuzzy phoneme editing distance matrix other than those corresponding to replacement operations, have their values determined by the dynamic programming algorithm. While these values are being determined, the values of the elements corresponding to replacement operations must be updated as well.
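The dynamic programming fill can be sketched for the simpler wildcard-free case as an ordinary weighted edit distance whose substitution cost is the fuzzy phoneme similarity. This is a sketch under assumptions rather than the patented computation: insertions and deletions are assumed to cost 1, the special handling of wildcard columns is omitted, and the cost table COSTS is hypothetical apart from its three values, which are read off the diagonal of Table 3. The token labels are the row and column labels of Table 3.

```python
def fuzzy_edit_matrix(result, candidate, sub_cost):
    """Fill an edit distance matrix with unit insert/delete costs and a
    caller-supplied substitution cost (rows: result, columns: candidate)."""
    rows, cols = len(result) + 1, len(candidate) + 1
    m = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        m[i][0] = float(i)  # i resulting-text characters against nothing
    for j in range(cols):
        m[0][j] = float(j)  # j candidate characters against nothing
    for i in range(1, rows):
        for j in range(1, cols):
            m[i][j] = min(
                m[i - 1][j] + 1.0,  # skip a resulting-text character
                m[i][j - 1] + 1.0,  # skip a candidate character
                m[i - 1][j - 1] + sub_cost(result[i - 1], candidate[j - 1]),
            )
    return m


# Hypothetical fuzzy phoneme costs for similar-sounding pairs.
COSTS = {
    ("Contain", "Cold"): 0.0,
    ("Ah", "And"): 0.3369,
    ("Leopard cat", "Chestnut"): 0.0,
}


def cost(a, b):
    """Identical tokens cost 0; unlisted pairs cost a full edit of 1."""
    return 0.0 if a == b else COSTS.get((a, b), 1.0)


RESULT = ["No", "Contain", "Ah", "Leopard cat", "'s",
          "Become", "Language", "Separate", "Release"]
CANDIDATE = ["No", "Cold", "And", "Chestnut"]

matrix = fuzzy_edit_matrix(RESULT, CANDIDATE, cost)
print([round(v, 4) for v in matrix[4]])  # prints: [4.0, 3.0, 2.0, 1.3369, 0.3369]
```

Under these assumptions the filled matrix agrees with the values listed in Table 3, for instance 0.3369 in the "Leopard cat"/"Chestnut" cell and 5.3369 in the last cell.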
Fig. 8 is a flowchart of path backtracking in the error correction method for speech recognition text provided by the third embodiment of the invention. Referring to Fig. 8, obtaining the fuzzy phoneme editing distance and the candidate error correction boundary between each of the at least one template candidate corrected text and the resulting text from the determined fuzzy phoneme editing distance matrices comprises:
S531, for each determined fuzzy phoneme editing distance matrix, obtain by path backtracking the fuzzy phoneme editing distance of the current matrix and the corresponding candidate error correction boundary.
For a template candidate corrected text, obtaining the fuzzy phoneme editing distance of the current matrix and the corresponding candidate error correction boundary by path backtracking is similar to the process for a non-template candidate corrected text and is not repeated here.
S532, determine the difference between the fuzzy phoneme editing distance of the current matrix and the editing distance corresponding to the wildcard in the template candidate corrected text to which the current matrix belongs.
Unlike the process for a non-template candidate corrected text, once the fuzzy phoneme editing distance of the matrix of a template candidate corrected text has been obtained, the editing distance corresponding to the wildcard in that template must be subtracted from it.
The editing distance corresponding to the wildcard is also obtained by path backtracking. Table 5 shows how the editing distance corresponding to the wildcard is obtained by path backtracking in the fuzzy phoneme editing distance matrix of the template candidate corrected text. The arrows in Table 5 mark the backtracking steps.
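Table 5
Columns: characters of the template candidate corrected text; rows: characters of the resulting text.
        |    | I  | Think | Listen | * | 's  | Song
        | 0↘ | 1  | 2     | 3      | 4 | 5   | 6
I       | 1  | 0↘ | 1     | 2      | 3 | 4   | 5
Think   | 2  | 1  | 0↘    | 1      | 2 | 3   | 4
Very    | 3  | 2  | 1     | 0↘     | 1 | 2   | 3
Liu     | 4  | 3  | 2     | 1      | 1 | 1.7 | 2.7
Moral   | 5  | 4  | 3     | 2      | 2 | 1   | 2
China   | 6  | 5  | 4     | 3      | 3 | 2   | 1.8
's      | 7  | 6  | 5     | 4      | 4 | 3↖  | 2.8
Brother | 8  | 7  | 6     | 5      | 5 | 4   | 3↖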
In the path backtracking illustrated above, the editing distance corresponding to the wildcard is the value of the last element corresponding to the wildcard minus the value of the element at the replacement position immediately preceding the first element corresponding to the wildcard. In the example above, the editing distance corresponding to the wildcard is 3. Since the fuzzy phoneme editing distance given by the matrix is also 3, the difference for the template candidate corrected text "I want to listen to the songs of *" and the resulting text "I think very Liu Dehua's brother" in Tables 4 and 5 is 0.
S533, take the difference as the fuzzy phoneme editing distance between the resulting text and the template candidate corrected text corresponding to the current fuzzy phoneme editing distance matrix.
In the example above, the difference is 0. The fuzzy phoneme editing distance between the template candidate corrected text "I want to listen to the songs of *" and the resulting text "I think very Liu Dehua's brother" is therefore 0.
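The subtraction in steps S532 and S533 can be sketched with the values of Table 5. This is a minimal sketch under assumptions: the rows covered by the wildcard are taken as already known from backtracking, the replacement position preceding the wildcard is assumed to be the diagonal neighbor above and to the left of the first wildcard element, and template_distance and its parameters are hypothetical names.

```python
# Table 5 as a list of rows; row 0 and column 0 are the empty-prefix entries.
TABLE_5 = [
    [0, 1, 2, 3, 4, 5, 6],
    [1, 0, 1, 2, 3, 4, 5],
    [2, 1, 0, 1, 2, 3, 4],
    [3, 2, 1, 0, 1, 2, 3],
    [4, 3, 2, 1, 1, 1.7, 2.7],
    [5, 4, 3, 2, 2, 1, 2],
    [6, 5, 4, 3, 3, 2, 1.8],
    [7, 6, 5, 4, 4, 3, 2.8],
    [8, 7, 6, 5, 5, 4, 3],
]


def template_distance(matrix, wildcard_col, first_row, last_row):
    """Total distance minus the distance attributed to the wildcard.

    wildcard_col: column index of the wildcard.
    first_row, last_row: first and last resulting-text rows it covers.
    """
    total = matrix[-1][-1]
    # Element of the replacement operation just before the wildcard
    # (assumed here to be the diagonal neighbor).
    before = matrix[first_row - 1][wildcard_col - 1]
    wildcard_distance = matrix[last_row][wildcard_col] - before
    return total - wildcard_distance


# The wildcard is column 4 and covers rows 4 to 6 ("Liu", "Moral", "China").
print(template_distance(TABLE_5, wildcard_col=4, first_row=4, last_row=6))  # prints: 0
```

Here the wildcard accounts for 3 - 0 = 3 of the total distance 3, which leaves 0, the value stated in the example above.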
In this embodiment, at least one template candidate corrected text for correcting the resulting text is pulled according to the multi-level K-Gram index of the resulting text of speech recognition; the fuzzy phoneme editing distance matrix between each template candidate corrected text and the resulting text is determined; the fuzzy phoneme editing distance and the candidate error correction boundary between each template candidate corrected text and the resulting text are obtained from the determined matrix; a corrected text is chosen according to the fuzzy phoneme editing distances of the template candidate corrected texts; and the resulting text is corrected according to the candidate error correction boundary corresponding to the chosen corrected text. Accurate error correction of the resulting text of speech recognition is thereby achieved.
4th embodiment
Fig. 9 is a flowchart of the error correction method for speech recognition text provided by the fourth embodiment of the invention. The method is based on the first embodiment of the invention. Further, after at least one candidate corrected text for correcting the resulting text has been pulled, and before the fuzzy phoneme editing distance matrix between each candidate corrected text and the resulting text is determined, the method also comprises: screening the at least one candidate corrected text according to the user's location or frequently visited places, to select at least one place name candidate corrected text relevant to the user.
Referring to Fig. 9, the error correction method for speech recognition text comprises:
S910, pull, according to the multi-level K-Gram index of the resulting text of speech recognition, at least one candidate corrected text for correcting the resulting text.
S920, screen the at least one candidate corrected text according to the user's location or frequently visited places, to select at least one place name candidate corrected text relevant to the user.
Suppose the resulting text of speech recognition is the place name "Shi Gezhuan", and the pulled candidate corrected texts include the "Shi Gezhuan" in Beijing, the "Shi Gezhuan" in Qingdao and the "Shi Gezhuan" in Qinhuangdao. If a query of the user's location shows that the user is in Qingdao, the "Shi Gezhuan" in Qingdao is selected from these candidates as the place name candidate corrected text.
S930, determine the fuzzy phoneme editing distance matrix between each of the at least one place name candidate corrected text and the resulting text.
S940, obtain, from the determined fuzzy phoneme editing distance matrices, the fuzzy phoneme editing distance and the candidate error correction boundary between each of the at least one place name candidate corrected text and the resulting text.
S950, choose a corrected text according to the fuzzy phoneme editing distance of each of the at least one place name candidate corrected text, and correct the resulting text according to the candidate error correction boundary corresponding to the chosen corrected text.
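The screening of step S920 can be sketched as a filter over place name candidates. The source does not specify how a candidate is linked to a city or how the user's places are stored, so the PlaceCandidate record, the filter_by_location function and the set of user places below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceCandidate:
    name: str  # place name as it would appear in the corrected text
    city: str  # city in which this place is located


def filter_by_location(candidates, user_places):
    """Keep only candidates located in the user's current city or in a
    city the user frequently passes through."""
    return [c for c in candidates if c.city in user_places]


candidates = [
    PlaceCandidate("Shi Gezhuan", "Beijing"),
    PlaceCandidate("Shi Gezhuan", "Qingdao"),
    PlaceCandidate("Shi Gezhuan", "Qinhuangdao"),
]
print(filter_by_location(candidates, {"Qingdao"}))
# prints: [PlaceCandidate(name='Shi Gezhuan', city='Qingdao')]
```

Only the surviving candidates go on to steps S930 to S950, so fewer distance matrices need to be computed.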
In this embodiment, after at least one candidate corrected text for correcting the resulting text has been pulled, and before the fuzzy phoneme editing distance matrix between each candidate corrected text and the resulting text is determined, the candidate corrected texts are screened according to the user's location or frequently visited places to select at least one place name candidate corrected text relevant to the user. Candidate corrected texts are thus chosen with regard to where the user is or regularly passes through, which achieves personalized error correction of the resulting text of speech recognition.
5th embodiment
Figure 10 is a flowchart of error correction in the error correction method for speech recognition text provided by the fifth embodiment of the invention. The method is based on the first embodiment of the invention. Further, choosing a corrected text according to the fuzzy phoneme editing distance of each of the at least one candidate corrected text comprises: if there is more than one candidate corrected text, selecting the one with the smallest fuzzy phoneme editing distance as the corrected text; if there is exactly one candidate corrected text, deciding whether to use it as the corrected text by comparing its fuzzy phoneme editing distance with a preset fuzzy phoneme editing distance threshold.
Referring to Figure 10, choosing a corrected text according to the fuzzy phoneme editing distance of each of the at least one candidate corrected text comprises:
S141, if there is more than one candidate corrected text, select the one with the smallest fuzzy phoneme editing distance as the corrected text.
The larger the fuzzy phoneme editing distance between two texts, the less similar they sound; the smaller the distance, the more similar they sound. Therefore, when there is more than one candidate corrected text, the candidate with the smallest fuzzy phoneme editing distance, that is, the one that sounds most similar to the resulting text, should be selected as the corrected text.
S142, if there is exactly one candidate corrected text, decide whether to use it as the corrected text by comparing its fuzzy phoneme editing distance with a preset fuzzy phoneme editing distance threshold.
Specifically, when there is exactly one candidate corrected text, it is judged whether the fuzzy phoneme editing distance between that candidate and the resulting text is less than the preset fuzzy phoneme editing distance threshold. If the distance is less than the threshold, the candidate can be used as the corrected text; if the distance is greater than the threshold, the candidate is not used as the corrected text.
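The selection rule of steps S141 and S142 can be sketched as follows. The function name, the (text, distance) pair layout and the example values are hypothetical. The source does not say what happens when the distance equals the threshold; this sketch rejects the candidate in that case, which is an assumption.

```python
def choose_correction(scored, threshold):
    """Pick the corrected text from a list of (text, distance) pairs.

    Several candidates: the one with the smallest distance is chosen.
    One candidate: it is accepted only if its distance is below threshold.
    Returns None when no candidate is accepted.
    """
    if not scored:
        return None
    if len(scored) > 1:
        return min(scored, key=lambda pair: pair[1])[0]
    text, distance = scored[0]
    # Equality is treated as a rejection (assumption, not from the source).
    return text if distance < threshold else None


print(choose_correction([("text A", 0.3369), ("text B", 2.0)], threshold=1.0))  # prints: text A
print(choose_correction([("text B", 2.0)], threshold=1.0))  # prints: None
```

Note that, as described, the threshold applies only to the single-candidate case; with several candidates the smallest distance wins regardless of its size.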
In this embodiment, when there is more than one candidate corrected text, the one with the smallest fuzzy phoneme editing distance is selected as the corrected text; when there is exactly one candidate corrected text, whether it is used as the corrected text is decided by comparing its fuzzy phoneme editing distance with the preset fuzzy phoneme editing distance threshold. Accurate error correction of the resulting text of speech recognition is thereby achieved.
6th embodiment
Figure 11 is a structural diagram of the error correction device for speech recognition text provided by the sixth embodiment of the invention. Referring to Figure 11, the device comprises: a corrected text pulling module 1110, an editing distance matrix computation module 1130, a path backtracking module 1140 and a correction module 1150.
The corrected text pulling module 1110 is configured to pull, according to the multi-level K-Gram index of the resulting text of speech recognition, at least one candidate corrected text for correcting the resulting text.
The editing distance matrix computation module 1130 is configured to determine the fuzzy phoneme editing distance matrix between each of the at least one candidate corrected text and the resulting text.
The path backtracking module 1140 is configured to obtain, from the determined fuzzy phoneme editing distance matrices, the fuzzy phoneme editing distance and the candidate error correction boundary between each of the at least one candidate corrected text and the resulting text.
The correction module 1150 is configured to choose a corrected text according to the fuzzy phoneme editing distance of each of the at least one candidate corrected text, and to correct the resulting text according to the candidate error correction boundary corresponding to the chosen corrected text.
Preferably, the corrected text pulling module 1110 comprises a first multi-level pulling unit 1111.
The first multi-level pulling unit 1111 is configured to pull, according to the K-Gram index at the Chinese character level, the spelling level or the initial-and-final level, at least one non-template candidate corrected text for correcting the resulting text.
Preferably, the editing distance matrix computation module 1130 comprises a first matrix element setting unit 1131 and a first matrix calculation unit 1132.
The first matrix element setting unit 1131 is configured to, for each pulled non-template candidate corrected text, set the value of each element corresponding to a replacement operation in the initialized fuzzy phoneme editing distance matrix to the fuzzy phoneme similarity between the character of the current non-template candidate corrected text corresponding to that element and the character of the resulting text corresponding to that element.
The first matrix calculation unit 1132 is configured to determine, according to a dynamic programming algorithm, the values of the elements of the fuzzy phoneme editing distance matrix that do not correspond to replacement operations, to obtain the fuzzy phoneme editing distance matrix between the current non-template candidate corrected text and the resulting text.
Preferably, the path backtracking module 1140 comprises a first path backtracking unit 1141 and a first editing distance computing unit 1142.
The first path backtracking unit 1141 is configured to, for each determined fuzzy phoneme editing distance matrix, obtain by path backtracking the fuzzy phoneme editing distance of the current matrix and the corresponding candidate error correction boundary.
The first editing distance computing unit 1142 is configured to take the fuzzy phoneme editing distance and the corresponding candidate error correction boundary of the current matrix as the fuzzy phoneme editing distance and candidate error correction boundary between the resulting text and the non-template candidate corrected text corresponding to the current matrix.
Preferably, the corrected text pulling module 1110 comprises a second multi-level pulling unit 1112 and a wildcard replacement unit 1113.
The second multi-level pulling unit 1112 is configured to pull, according to the K-Gram index at the Chinese character level, the spelling level or the initial-and-final level, at least one candidate corrected text for correcting the resulting text.
The wildcard replacement unit 1113 is configured to identify the proper nouns contained in each candidate corrected text and replace them with a wildcard, to obtain at least one template candidate corrected text.
Preferably, the editing distance matrix computation module 1130 comprises a second matrix element setting unit 1133 and a second matrix calculation unit 1134.
The second matrix element setting unit 1133 is configured to, for each pulled template candidate corrected text, set the value of each element corresponding to a replacement operation in the initialized fuzzy phoneme editing distance matrix to the fuzzy phoneme similarity between the character of the current template candidate corrected text corresponding to that element and the character of the resulting text corresponding to that element.
The second matrix calculation unit 1134 is configured to determine, according to a dynamic programming algorithm, the values of the elements of the fuzzy phoneme editing distance matrix that do not correspond to replacement operations, to obtain the fuzzy phoneme editing distance matrix between the current template candidate corrected text and the resulting text.
Preferably, the path backtracking module 1140 comprises a second path backtracking unit 1143, a difference acquiring unit 1144 and a second editing distance computing unit 1145.
The second path backtracking unit 1143 is configured to, for each determined fuzzy phoneme editing distance matrix, obtain by path backtracking the fuzzy phoneme editing distance of the current matrix and the corresponding candidate error correction boundary.
The difference acquiring unit 1144 is configured to determine the difference between the fuzzy phoneme editing distance of the current matrix and the editing distance corresponding to the wildcard in the template candidate corrected text to which the current matrix belongs.
The second editing distance computing unit 1145 is configured to take the difference as the fuzzy phoneme editing distance between the resulting text and the template candidate corrected text corresponding to the current matrix.
Preferably, the error correction device for speech recognition text also comprises a place name text replacement module 1120.
The place name text replacement module 1120 is configured to, after at least one candidate corrected text for correcting the resulting text has been pulled and before the fuzzy phoneme editing distance matrix between each candidate corrected text and the resulting text is determined, screen the at least one candidate corrected text according to the user's location or frequently visited places, to select at least one place name candidate corrected text relevant to the user.
Preferably, choosing a corrected text according to the fuzzy phoneme editing distance of each of the at least one candidate corrected text comprises:
when there is more than one candidate corrected text, selecting the one with the smallest fuzzy phoneme editing distance as the corrected text;
when there is exactly one candidate corrected text, deciding whether to use it as the corrected text by comparing its fuzzy phoneme editing distance with a preset fuzzy phoneme editing distance threshold.
The sequence numbers of the above embodiments of the invention are for description only and do not indicate the relative merit of the embodiments.
Those of ordinary skill in the art should understand that each module or step of the invention described above can be implemented with a general-purpose computing device. The modules or steps can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; or they can each be made into an individual integrated circuit module; or several of the modules or steps can be made into a single integrated circuit module. The invention is therefore not restricted to any specific combination of hardware and software.
The embodiments in this specification are described progressively. Each embodiment emphasizes its differences from the other embodiments, and the parts that are the same or similar between embodiments may be cross-referenced.
The foregoing describes only the preferred embodiments of the invention and does not limit the invention. For those skilled in the art, the invention may have various changes and modifications. Any modification, equivalent replacement or improvement made within the spirit and principle of the invention shall fall within the protection scope of the invention.

Claims (18)

1. an error correction method for speech recognition text, is characterized in that, comprising:
According to the multi-level K-Gram index of the resulting text of speech recognition, pull at least one the candidate's corrected text for carrying out error correction to described resulting text;
Determine the described fuzzy phoneme editing distance matrix of at least one candidate's corrected text respectively and between described resulting text;
The fuzzy phoneme editing distance of at least one candidate's corrected text described respectively and between described resulting text and candidate's error correction border is obtained according to the fuzzy phoneme editing distance matrix determined;
The fuzzy phoneme editing distance corresponding respectively according at least one candidate's corrected text described chooses corrected text, and error correction is carried out to described resulting text in the candidate's error correction border corresponding to described corrected text.
2. method according to claim 1, is characterized in that, according to the multi-level K-Gram index of the resulting text of speech recognition, at least one the candidate's corrected text pulled for carrying out error correction to described resulting text comprises:
According to the K-Gram index of Chinese character level, pinyin syllable level, spelling or simplicity level or the initial and the final level, pull at least one non-template candidate corrected text for carrying out error correction to described resulting text.
3. The method according to claim 2, characterized in that determining the fuzzy sound editing distance matrix between each of the at least one candidate error correction text and the result text comprises:
For each retrieved non-template candidate error correction text, setting the value of each element corresponding to a substitution operation in an initialized fuzzy sound editing distance matrix to the fuzzy sound similarity between the character of the current non-template candidate error correction text corresponding to that element and the character of the result text corresponding to that element;
Determining, according to a dynamic programming algorithm, the values of the elements corresponding to non-substitution operations in the fuzzy sound editing distance matrix, to obtain the fuzzy sound editing distance matrix between the current non-template candidate error correction text and the result text.
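One way to read this claim is as a Levenshtein-style dynamic program in which the substitution cost is replaced by a fuzzy sound score. The sketch below works on pinyin syllables and is only an illustration: the confusable pairs, the cost values 0, 0.5 and 1, the unit cost for insertions and deletions, and all function names are assumptions rather than values given in the claims.

```python
# Assumed sets of easily confused initials and finals (illustrative only).
FUZZY_INITIALS = {frozenset(p) for p in
                  [("n", "l"), ("z", "zh"), ("c", "ch"), ("s", "sh"), ("f", "h")]}
FUZZY_FINALS = {frozenset(p) for p in
                [("an", "ang"), ("en", "eng"), ("in", "ing")]}
# Two-letter initials come first so that "zh" is matched before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g",
            "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split a pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def fuzzy_sub_cost(a, b):
    """Substitution cost: 0 if identical, 0.5 if confusable, otherwise 1."""
    if a == b:
        return 0.0
    (ia, fa), (ib, fb) = split_syllable(a), split_syllable(b)
    initial_ok = ia == ib or frozenset((ia, ib)) in FUZZY_INITIALS
    final_ok = fa == fb or frozenset((fa, fb)) in FUZZY_FINALS
    return 0.5 if initial_ok and final_ok else 1.0


def fuzzy_matrix(candidate, result):
    """Fill the editing distance matrix between two syllable sequences."""
    rows, cols = len(candidate) + 1, len(result) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = float(i)   # delete i candidate syllables
    for j in range(1, cols):
        d[0][j] = float(j)   # insert j result syllables
    for i in range(1, rows):
        for j in range(1, cols):
            substitute = d[i - 1][j - 1] + fuzzy_sub_cost(candidate[i - 1], result[j - 1])
            d[i][j] = min(substitute, d[i - 1][j] + 1.0, d[i][j - 1] + 1.0)
    return d


matrix = fuzzy_matrix(["liu", "de", "hua"], ["niu", "de", "hua"])
distance = matrix[-1][-1]
```

Because "niu" and "liu" differ only by a confusable initial, the pair is charged less than a full substitution, so phonetically close candidates end up with smaller distances than unrelated ones.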
4. The method according to claim 2, characterized in that obtaining the fuzzy sound editing distance between each of the at least one candidate error correction text and the result text, together with the candidate error correction boundary, comprises:
For each determined fuzzy sound editing distance matrix, obtaining, by path backtracking, the fuzzy sound editing distance of the current fuzzy sound editing distance matrix and the corresponding candidate error correction boundary;
Taking the fuzzy sound editing distance of the current fuzzy sound editing distance matrix and the corresponding candidate error correction boundary as the fuzzy sound editing distance and the candidate error correction boundary between the non-template candidate error correction text corresponding to the current fuzzy sound editing distance matrix and the result text.
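The path backtracking step can be illustrated with a matrix that lets the candidate align to any substring of the result text, so that the backtracked path yields both a distance and a boundary. This is a sketch under assumptions: the free start and end in the result text, the default exact-match substitution cost, and the name `align_with_boundary` are hypothetical choices, not details stated in the claims.

```python
def align_with_boundary(candidate, result,
                        sub_cost=lambda a, b: 0.0 if a == b else 1.0):
    """Return (distance, (start, end)) for the best-matching span of `result`."""
    rows, cols = len(candidate) + 1, len(result) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = float(i)
    # Row 0 stays at 0: the match may start anywhere in the result text.
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(d[i - 1][j - 1] + sub_cost(candidate[i - 1], result[j - 1]),
                          d[i - 1][j] + 1.0,
                          d[i][j - 1] + 1.0)
    # The match may also end anywhere: take the smallest value in the last row.
    end = min(range(cols), key=lambda j: d[rows - 1][j])
    distance = d[rows - 1][end]
    # Backtrack along the optimal path until the candidate is fully consumed.
    i, j = rows - 1, end
    while i > 0:
        if j > 0 and d[i][j] == d[i - 1][j - 1] + sub_cost(candidate[i - 1], result[j - 1]):
            i, j = i - 1, j - 1      # substitution or exact match
        elif d[i][j] == d[i - 1][j] + 1.0:
            i -= 1                   # candidate unit with no counterpart
        else:
            j -= 1                   # extra unit in the result text
    return distance, (j, end)


result_text = "我想听牛德华的歌"
candidate_text = "刘德华"
distance, (start, end) = align_with_boundary(list(candidate_text), list(result_text))
corrected = result_text[:start] + candidate_text + result_text[end:]
```

With the boundary in hand, only the span between `start` and `end` is replaced and the rest of the recognized sentence is left untouched. Passing a fuzzy sound cost as `sub_cost` would turn the plain distance into a fuzzy sound editing distance.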
5. The method according to claim 1, characterized in that retrieving, according to the multi-level K-Gram index of the result text of speech recognition, at least one candidate error correction text for performing error correction on the result text comprises:
Retrieving, according to a K-Gram index at the Chinese character level, the pinyin syllable level, the full-pinyin or abbreviated-pinyin level, or the initial-and-final level, at least one candidate error correction text for performing error correction on the result text;
Identifying the proper nouns contained in each candidate error correction text, and replacing the proper nouns with a wildcard, to obtain at least one template candidate error correction text.
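The wildcard replacement can be sketched as follows. The claim does not say how proper nouns are identified; this example assumes a simple lexicon lookup, and the `WILDCARD` symbol, the lexicon contents, and the function name are all hypothetical.

```python
WILDCARD = "*"  # assumed wildcard symbol


def to_template(candidate, proper_nouns):
    """Replace every known proper noun in `candidate` with the wildcard."""
    template = candidate
    # Longest nouns first, so "北京大学" is replaced before "北京".
    for noun in sorted(proper_nouns, key=len, reverse=True):
        template = template.replace(noun, WILDCARD)
    return template


template = to_template("导航到北京大学", {"北京", "北京大学"})
```

Different candidates that vary only in the proper noun, such as navigation requests to different destinations, collapse to the same template in this way.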
6. The method according to claim 5, characterized in that determining the fuzzy sound editing distance matrix between each of the at least one candidate error correction text and the result text comprises:
For each retrieved template candidate error correction text, setting the value of each element corresponding to a substitution operation in an initialized fuzzy sound editing distance matrix to the fuzzy sound similarity between the character of the current template candidate error correction text corresponding to that element and the character of the result text corresponding to that element;
Determining, according to a dynamic programming algorithm, the values of the elements corresponding to non-substitution operations in the fuzzy sound editing distance matrix, to obtain the fuzzy sound editing distance matrix between the current template candidate error correction text and the result text.
7. The method according to claim 5, characterized in that obtaining the fuzzy sound editing distance between each of the at least one candidate error correction text and the result text, together with the candidate error correction boundary, comprises:
For each determined fuzzy sound editing distance matrix, obtaining, by path backtracking, the fuzzy sound editing distance of the current fuzzy sound editing distance matrix and the corresponding candidate error correction boundary;
Determining the difference between the fuzzy sound editing distance of the current fuzzy sound editing distance matrix and the editing distance corresponding to the wildcard in the template candidate error correction text corresponding to the current fuzzy sound editing distance matrix;
Taking the difference as the fuzzy sound editing distance between the template candidate error correction text corresponding to the current fuzzy sound editing distance matrix and the result text.
8. The method according to claim 1, characterized in that, after retrieving the at least one candidate error correction text for performing error correction on the result text and before determining the fuzzy sound editing distance matrix between each of the at least one candidate error correction text and the result text, the method further comprises:
Screening the at least one candidate error correction text according to the user's location or frequently visited places, so as to select at least one place-name candidate error correction text relevant to the user.
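As a rough illustration of this screening step, the sketch below keeps only candidates that mention a place name associated with the user. How such place names are derived from the user's location or frequently visited places is not specified in the claim; the `user_places` set and the substring test are assumptions made for the example.

```python
def filter_by_user_places(candidates, user_places):
    """Keep candidates containing at least one place name tied to the user."""
    return [c for c in candidates if any(place in c for place in user_places)]


# Hypothetical place names near the user's location or often visited.
user_places = {"中关村", "西二旗"}
kept = filter_by_user_places(["导航到中关村", "导航到钟楼", "去西二旗"], user_places)
```

Screening before the distance matrices are built means that the more expensive dynamic programming step only runs on candidates that are plausible for this user.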
9. The method according to claim 1, characterized in that selecting an error correction text according to the fuzzy sound editing distances respectively corresponding to the at least one candidate error correction text comprises:
If the number of the at least one candidate error correction text is greater than one, selecting the candidate error correction text with the smallest fuzzy sound editing distance as the error correction text;
If the number of the at least one candidate error correction text is one, determining whether to use that candidate error correction text as the error correction text according to how its fuzzy sound editing distance compares with a preset fuzzy sound editing distance threshold.
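The selection rule in this claim can be sketched in a few lines. The threshold value and the direction of the comparison (accept when the distance does not exceed the threshold) are assumptions, since the claim only says the decision depends on how the two values compare; the function name is also hypothetical.

```python
def choose_correction(scored, threshold=1.0):
    """Pick an error correction text from (candidate, distance) pairs.

    Returns None when there is nothing acceptable to choose.
    """
    if not scored:
        return None
    if len(scored) > 1:
        # Several candidates: take the one with the smallest distance.
        return min(scored, key=lambda pair: pair[1])[0]
    # Exactly one candidate: accept it only if it is close enough (assumed <=).
    candidate, distance = scored[0]
    return candidate if distance <= threshold else None


best = choose_correction([("刘德华", 0.5), ("牛得华", 2.0)])
single_ok = choose_correction([("刘德华", 0.5)])
single_rejected = choose_correction([("刘德华", 3.0)])
```

The threshold only applies in the single-candidate case because there is no competing candidate to compare against, so an absolute cut-off is needed to avoid replacing correct text with a poor match.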
10. An error correction device for speech recognition text, characterized by comprising:
An error correction text retrieval module, configured to retrieve, according to a multi-level K-Gram index of the result text of speech recognition, at least one candidate error correction text for performing error correction on the result text;
An editing distance matrix calculation module, configured to determine a fuzzy sound editing distance matrix between each of the at least one candidate error correction text and the result text;
A path backtracking module, configured to obtain, according to the determined fuzzy sound editing distance matrix, the fuzzy sound editing distance between each of the at least one candidate error correction text and the result text, together with a candidate error correction boundary;
An error correction module, configured to select an error correction text according to the fuzzy sound editing distances respectively corresponding to the at least one candidate error correction text, and to perform error correction on the result text according to the candidate error correction boundary corresponding to the selected error correction text.
11. The device according to claim 10, characterized in that the error correction text retrieval module comprises:
A first multi-level retrieval unit, configured to retrieve, according to a K-Gram index at the Chinese character level, the pinyin syllable level, the full-pinyin or abbreviated-pinyin level, or the initial-and-final level, at least one non-template candidate error correction text for performing error correction on the result text.
12. The device according to claim 11, characterized in that the editing distance matrix calculation module comprises:
A first matrix element setting unit, configured to, for each retrieved non-template candidate error correction text, set the value of each element corresponding to a substitution operation in an initialized fuzzy sound editing distance matrix to the fuzzy sound similarity between the character of the current non-template candidate error correction text corresponding to that element and the character of the result text corresponding to that element;
A first matrix calculation unit, configured to determine, according to a dynamic programming algorithm, the values of the elements corresponding to non-substitution operations in the fuzzy sound editing distance matrix, to obtain the fuzzy sound editing distance matrix between the current non-template candidate error correction text and the result text.
13. The device according to claim 11, characterized in that the path backtracking module comprises:
A first path backtracking unit, configured to, for each determined fuzzy sound editing distance matrix, obtain, by path backtracking, the fuzzy sound editing distance of the current fuzzy sound editing distance matrix and the corresponding candidate error correction boundary;
A first editing distance calculation unit, configured to take the fuzzy sound editing distance of the current fuzzy sound editing distance matrix and the corresponding candidate error correction boundary as the fuzzy sound editing distance and the candidate error correction boundary between the non-template candidate error correction text corresponding to the current fuzzy sound editing distance matrix and the result text.
14. The device according to claim 10, characterized in that the error correction text retrieval module comprises:
A second multi-level retrieval unit, configured to retrieve, according to a K-Gram index at the Chinese character level, the pinyin syllable level, the full-pinyin or abbreviated-pinyin level, or the initial-and-final level, at least one candidate error correction text for performing error correction on the result text;
A wildcard replacement unit, configured to identify the proper nouns contained in each candidate error correction text and to replace the proper nouns with a wildcard, to obtain at least one template candidate error correction text.
15. The device according to claim 14, characterized in that the editing distance matrix calculation module comprises:
A second matrix element setting unit, configured to, for each retrieved template candidate error correction text, set the value of each element corresponding to a substitution operation in an initialized fuzzy sound editing distance matrix to the fuzzy sound similarity between the character of the current template candidate error correction text corresponding to that element and the character of the result text corresponding to that element;
A second matrix calculation unit, configured to determine, according to a dynamic programming algorithm, the values of the elements corresponding to non-substitution operations in the fuzzy sound editing distance matrix, to obtain the fuzzy sound editing distance matrix between the current template candidate error correction text and the result text.
16. The device according to claim 14, characterized in that the path backtracking module comprises:
A second path backtracking unit, configured to, for each determined fuzzy sound editing distance matrix, obtain, by path backtracking, the fuzzy sound editing distance of the current fuzzy sound editing distance matrix and the corresponding candidate error correction boundary;
A difference acquiring unit, configured to determine the difference between the fuzzy sound editing distance of the current fuzzy sound editing distance matrix and the editing distance corresponding to the wildcard in the template candidate error correction text corresponding to the current fuzzy sound editing distance matrix;
A second editing distance calculation unit, configured to take the difference as the fuzzy sound editing distance between the template candidate error correction text corresponding to the current fuzzy sound editing distance matrix and the result text.
17. The device according to claim 10, characterized by further comprising:
A place name text replacement module, configured to, after the at least one candidate error correction text for performing error correction on the result text is retrieved and before the fuzzy sound editing distance matrix between each of the at least one candidate error correction text and the result text is determined, screen the at least one candidate error correction text according to the user's location or frequently visited places, so as to select at least one place-name candidate error correction text relevant to the user.
18. The device according to claim 10, characterized in that selecting an error correction text according to the fuzzy sound editing distances respectively corresponding to the at least one candidate error correction text comprises:
When the number of the at least one candidate error correction text is greater than one, selecting the candidate error correction text with the smallest fuzzy sound editing distance as the error correction text;
When the number of the at least one candidate error correction text is one, determining whether to use that candidate error correction text as the error correction text according to how its fuzzy sound editing distance compares with a preset fuzzy sound editing distance threshold.
CN201410778108.3A 2014-12-15 2014-12-15 Error correction method and device for voice recognition text Active CN104464736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410778108.3A CN104464736B (en) 2014-12-15 2014-12-15 Error correction method and device for voice recognition text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410778108.3A CN104464736B (en) 2014-12-15 2014-12-15 Error correction method and device for voice recognition text

Publications (2)

Publication Number Publication Date
CN104464736A (en) 2015-03-25
CN104464736B (en) 2018-02-02

Family

ID=52910683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410778108.3A Active CN104464736B (en) 2014-12-15 2014-12-15 Error correction method and device for voice recognition text

Country Status (1)

Country Link
CN (1) CN104464736B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130158995A1 (en) * 2009-11-24 2013-06-20 Sorenson Communications, Inc. Methods and apparatuses related to text caption error correction
CN101923854A (en) * 2010-08-31 2010-12-22 中国科学院计算技术研究所 Interactive speech recognition system and method
CN102999483A (en) * 2011-09-16 2013-03-27 北京百度网讯科技有限公司 Method and device for correcting text
US20130311182A1 (en) * 2012-05-16 2013-11-21 Gwangju Institute Of Science And Technology Apparatus for correcting error in speech recognition
CN104021786A (en) * 2014-05-15 2014-09-03 北京中科汇联信息技术有限公司 Speech recognition method and speech recognition device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Bin (吴斌): "Research on Post-Processing Techniques in Speech Recognition" (语音识别中的后处理技术研究), doctoral dissertation *

Also Published As

Publication number Publication date
CN104464736B (en) 2018-02-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant