CN101650943A - Non-native speech recognition system and method thereof - Google Patents

Non-native speech recognition system and method thereof

Publication number
CN101650943A
CN101650943A CN200810239892A CN200810239892A CN101650943A CN 101650943 A CN101650943 A CN 101650943A CN 200810239892 A CN200810239892 A CN 200810239892A CN 200810239892 A CN200810239892 A CN 200810239892A CN 101650943 A CN101650943 A CN 101650943A
Authority
CN
China
Prior art keywords
mother tongue
state
pronunciation
mother
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200810239892A
Other languages
Chinese (zh)
Inventor
颜永红
潘接林
张晴晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd
Priority to CN200810239892A
Publication of CN101650943A

Abstract

The invention relates to a non-native speech recognition system based on mixed-model state correction and a method thereof. The non-native speech recognition system comprises a non-native speech interface, a native model module, a non-native model module, a native state decoding module, a non-native state forced alignment module, a native-non-native state similarity matrix computation module, a native-non-native state mapping table computation module and a non-native state correction model decoding module. In the system and the method, a non-native acoustic model is corrected at the state level using the acoustic model of the speaker's native language and a state mapping between the two models, yielding a model that better matches non-native pronunciation characteristics. Relying only on native training data and without adding any non-native speech training data, the system and the method clearly improve recognition performance over a recognition system that is not corrected by the method, cause no obvious drop in recognition speed, and are highly practical.

Description

Non-native speech recognition system and method
Technical field
The present invention relates to speech recognition systems and methods, and in particular to a non-native speech recognition system and method based on mixed-model state correction.
Background art
With the globalization of information in modern society, non-native speech recognition has become one of the research focuses of speech recognition technology. Compared with native speech recognition, recognition performance on non-native speech drops significantly, particularly for speakers with a heavy accent. How to improve recognition performance on non-native speech with varying degrees of accent when only a small amount of non-native training data is available is a key problem in non-native speech recognition research.
The literature (Bohn, O.-S., Flege, J.E., "The production of new and similar vowels by adult German learners of English." [M] Stud. Second Lang. Acquis. 14, 131-158, 1992.) points out that, when pronouncing the target language, a non-native speaker may substitute sounds of the speaker's own native language, or may produce sounds that combine characteristics of the native language and the target language. This conclusion suggests that training data in the speaker's native language may help recognition of that speaker's non-native speech, particularly for heavily accented speech data.
At present, speaker adaptation techniques (such as MAP and MLLR) are widely used in non-native speech recognition (Z. Wang, T. Schultz, A. Waibel, "Comparison of acoustic model adaptation techniques on non-native speech" [C], Proc. ICASSP 2003.). These methods adapt on a small amount of non-native speech data so that the native acoustic model approximates, to some extent, the characteristics of non-native pronunciation. In these methods, the similarity between the adaptation data and the test data is the key factor determining recognition performance. Although adaptation techniques do improve non-native speech recognition, the performance of the adapted model on non-native speech is still lower than the performance of the native acoustic model on native speech. The literature (J. Humphries, P. Woodland, and D. Pearce. "Using accent-specific pronunciation modeling for robust speech recognition." [C] In Proc. ICSLP '96, pages 2324-2327, Philadelphia, PA, October 1996.) studied the limitations of adaptation algorithms and pointed out that the main cause of the low recognition rate on non-native speech lies in target-language sounds that are not covered by the speaker's native language. How to make the non-native acoustic model better model such sounds is a focus of non-native speech recognition research.
Summary of the invention
To overcome the above deficiencies of the prior art, the invention provides a non-native speech recognition system and method based on state correction. Through a state mapping between different models, the system and method use the acoustic model of the speaker's native language to correct the non-native acoustic model at the state level, thereby obtaining a model that better matches non-native pronunciation characteristics.
To achieve this, the non-native speech recognition system proposed by the present invention comprises:
A non-native speech interface, configured to collect non-native speech data and send the data to the non-native state forced alignment module and the native state decoding module.
A native model module, configured to provide the native acoustic model to the native state decoding module and the non-native state correction model decoding module.
A non-native model module, configured to provide the non-native acoustic model to the non-native state forced alignment module and the non-native state correction model decoding module.
A native state decoding module, configured to decode the non-native speech data with the standard native acoustic model to obtain native state-level segmentation information, i.e. the native state decoding information, and to send this information to the native-non-native state similarity matrix computation module.
A non-native state forced alignment module, configured to force-align the non-native speech data with the non-native acoustic model to obtain non-native state-level segmentation information, i.e. the non-native state reference information, and to send this information to the native-non-native state similarity matrix computation module.
A native-non-native state similarity matrix computation module, configured to align the native and non-native state-level segmentation information in time; when the overlap time of a native state and a non-native state exceeds a predefined threshold, the two states are considered to have "co-occurred" once; the module counts all co-occurrences, computes the similarity matrix of non-native states with respect to native states, and sends the similarity matrix to the native-non-native state mapping table computation module.
A native-non-native state mapping table computation module, configured to compute the state mapping table from the similarity matrix. And
A non-native state correction model decoding module, configured to correct, during speech recognition decoding, each non-native acoustic model state with the corresponding native acoustic model states found in the state mapping table, yielding a corrected non-native acoustic model; this corrected model is finally used to perform non-native speech recognition.
The similarity matrix of non-native states with respect to native states is obtained by the following formula:
$$A_{i,j} = \frac{\mathrm{count}(t_j \mid s_i)}{\sum_{n=1}^{N} \mathrm{count}(t_n \mid s_i)}$$
where M and N are the numbers of Chinese and English states respectively; $A_{S,T}(M,N)$ is the similarity matrix; $A_{i,j}$ is the element in row i, column j of the matrix; $t_j$ is a non-native state and $s_i$ is a native state; $A_{i,j} \in A_{S,T}(M,N)$, $i = 1 \ldots M$, $j = 1 \ldots N$; and $\mathrm{count}(t_j \mid s_i)$ is the number of co-occurrences between native state $s_i$ and non-native state $t_j$.
The state mapping table is obtained as follows:
If the element in row i, column j of the matrix is the n-th largest element of column j, then state i is the n-th candidate correction state of state j. For example, if the element in row i, column j is the largest element of column j, then state i of the corresponding language is the most similar to state j, and state i is the first candidate correction state of state j. If the element in row k, column j is the second largest element of column j, then state k is the second candidate correction state of state j. By analogy, each non-native state can find n candidate correction states of the native language in the matrix (n < M).
During speech recognition decoding, for an observation $o_t$, the observation probability $p_{j}(o_t)$ of non-native state j after correction with n candidate states becomes:
$$p_{j}(o_t) = \alpha\, p_{j}^{\mathrm{tar}}(o_t) + (1-\alpha) \sum_{l=1}^{n} a_{lj}\, p_{l}^{\mathrm{sou}}(o_t)$$
where $\alpha$ is the correction weight of the non-native state, $a_{lj}$ is the coefficient corresponding to the l-th candidate state, and n is the number of candidates; $p_{j}^{\mathrm{tar}}(o_t)$ and $p_{l}^{\mathrm{sou}}(o_t)$ are the original observation probabilities of non-native state j and native state l, respectively, for observation $o_t$.
The non-native speech recognition method provided by the invention comprises the following steps:
(1) The non-native speech interface collects a certain amount of non-native speech data, used to obtain the model state mapping table.
(2) The non-native state forced alignment module force-aligns the non-native speech data with the non-native acoustic model to obtain non-native state-level segmentation information, i.e. the non-native state reference information.
(3) The native state decoding module decodes the non-native speech data with the standard native acoustic model to obtain native state-level segmentation information, i.e. the native state decoding information.
(4) The native-non-native state similarity matrix computation module aligns the non-native and native state-level segmentation information in time; when the overlap time of two states exceeds a predefined threshold, the two states have "co-occurred" once.
(5) The native-non-native state similarity matrix computation module counts all co-occurrences and computes the similarity matrix of non-native states with respect to native states.
(6) The native-non-native state mapping table computation module obtains the state mapping table from this similarity matrix.
(7) Using the state mapping table, the non-native state correction model decoding module corrects, during speech recognition decoding, each non-native acoustic model state with the corresponding native acoustic model states found in the mapping table, yielding a corrected non-native acoustic model.
The advantages of the invention are:
(1) The non-native speech recognition system and method of the invention make the non-native acoustic model better suited to non-native speech that carries a native-language accent.
(2) The system and method use a state mapping between different models and the acoustic model of the speaker's native language to correct the non-native acoustic model at the state level; the decoder performs non-native speech recognition according to the resulting state mapping table. Without adding any non-native speech training data and relying only on standard native speech training data, the system with mixed-model state correction recognizes non-native speech carrying a native-language accent significantly better than a recognition system not corrected by the method, while the recognition speed of the system does not drop noticeably.
(3) The system and method use the native acoustic model to correct the non-native acoustic model and thereby improve recognition performance on non-native speech. Compared with speech adaptation techniques, the correction method based on mixed-model states does not require additional non-native training data. Because the size of the native acoustic model used for correction can be kept very small, the corrected model does not add significantly to the computation time. Extensive tests were carried out on real network data. Because the state-level mapping information in this system is obtained from statistics over the state sequences output by the decoder, the criterion is fairly direct and essentially reflects the true similarity between the states of the two languages. Compared with a system that does not use the method, the non-native speech recognition system based on mixed-model state correction achieves a relative reduction of 5-10% in recognition error rate on non-native speech.
Description of drawings
Fig. 1 is a basic block diagram of the non-native speech recognition system and method based on the mixed-model state correction algorithm of the present invention;
Fig. 2 is a flow block diagram of the non-native speech recognition system of the present invention;
Fig. 3 illustrates a "co-occurrence" between a native state and a non-native state in a specific embodiment of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings:
Fig. 1 is a basic block diagram of the non-native speech recognition system based on the mixed-model state correction algorithm. It shows the core components of the algorithm, which mainly consist of: non-native speech decoding, the native and non-native acoustic models, forced alignment, decoding, state mapping table generation, and the non-native state correction model. Fig. 2 is a detailed flow block diagram of the implementation of the non-native speech recognition system based on the mixed-model state correction algorithm.
The core technique of non-native speech recognition with the mixed-model state correction algorithm of the present invention is obtaining the state mapping table (modules 1 to 7 in Fig. 2). The mixed-model state correction algorithm is a novel state mapping algorithm based on a similarity matrix: by counting co-occurrences, it obtains the correspondence between the states of the two languages and uses this correspondence to determine the state pairs used for correction.
The system of the present invention comprises: a non-native speech interface, configured to collect non-native speech data and send the data to the non-native state forced alignment module and the native state decoding module.
A native model module, configured to provide the native acoustic model to the native state decoding module and the non-native state correction model decoding module.
A non-native model module, configured to provide the non-native acoustic model to the non-native state forced alignment module and the non-native state correction model decoding module.
A native state decoding module, configured to decode the non-native speech data with the standard native acoustic model to obtain native state-level segmentation information, i.e. the native state decoding information, and to send this information to the native-non-native state similarity matrix computation module.
A non-native state forced alignment module, configured to force-align the non-native speech data with the non-native acoustic model to obtain non-native state-level segmentation information, i.e. the non-native state reference information, and to send this information to the native-non-native state similarity matrix computation module.
A native-non-native state similarity matrix computation module, configured to align the native and non-native state-level segmentation information in time; when the overlap time of a native state and a non-native state exceeds a predefined threshold, the two states are considered to have "co-occurred" once; the module counts all co-occurrences, computes the similarity matrix of non-native states with respect to native states, and sends the similarity matrix to the native-non-native state mapping table computation module.
A native-non-native state mapping table computation module, configured to compute the state mapping table from the similarity matrix. And
A non-native state correction model decoding module, configured to correct, during speech recognition decoding, each non-native acoustic model state with the corresponding native acoustic model states found in the state mapping table, yielding a corrected non-native acoustic model; this corrected model is finally used to perform non-native speech recognition.
The detailed computation flow of the present invention is as follows (see Fig. 2):
Step 1, non-native state reference: select a certain amount of non-native speech data (non-native speech interface 1); these data are used to generate the native-to-non-native similarity matrix. Force-align these data with the non-native acoustic model (non-native model module 2) to obtain the non-native state sequence, and record the time information of each state (non-native state forced alignment module 4).
Step 2, native state recognition result: decode the non-native speech data of step 1 with the native acoustic model (native model module 3) to obtain the native state sequence, and record the time information of each state (native state decoding module 5).
Step 3, "co-occurrence" criterion: steps 1 and 2 yield, for the same batch of non-native speech data, the non-native and native state sequences together with their time segments. The "co-occurrence" of two states is defined from the relative positions of these two state sequences on the time axis. Before counting co-occurrences, a co-occurrence matrix is defined with dimensions (number of native states × number of non-native states); each element records the number of co-occurrences between the states of its row and column. In the experiments with the method of the invention, two states are counted as co-occurring once when their overlap in time exceeds half of the duration of the non-native state. When native state i and non-native state j co-occur once, 1 is added at row i, column j of the co-occurrence matrix. Fig. 3 shows one co-occurrence between the native state "aa_native" and the non-native state "ae_nonnative".
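The co-occurrence criterion of step 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the segment format `(state, start, end)` in frames, and the strict "more than half of the non-native state's duration" comparison are assumptions made here for illustration.

```python
from collections import defaultdict

def overlap(a_start, a_end, b_start, b_end):
    """Length of the time overlap between two segments (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def count_cooccurrences(native_segs, nonnative_segs, ratio=0.5):
    """Count co-occurrences between native and non-native states.

    Each segment is a hypothetical (state, start_frame, end_frame) tuple.
    A pair co-occurs once when the overlap exceeds `ratio` times the
    duration of the non-native state (assumed reading of the criterion).
    """
    counts = defaultdict(int)
    for t_state, t_start, t_end in nonnative_segs:
        threshold = ratio * (t_end - t_start)
        for s_state, s_start, s_end in native_segs:
            if overlap(s_start, s_end, t_start, t_end) > threshold:
                # Row key: native state, column key: non-native state.
                counts[(s_state, t_state)] += 1
    return dict(counts)

# Toy segmentations of the same utterance, in frames (illustrative only).
native = [("aa_native", 0, 12), ("b_native", 12, 20)]
nonnative = [("ae_nonnative", 2, 10), ("b_nonnative", 10, 20)]
counts = count_cooccurrences(native, nonnative)
print(counts)
```

The nested loop is quadratic in the number of segments per utterance; since both segmentations are time-ordered, a two-pointer sweep would give the same counts in linear time.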
Step 4, state similarity matrix computation (native-non-native state similarity matrix computation module 6): let M and N be the numbers of native-language and non-native-language states respectively, and let $A_{S,T}(M,N)$ be the similarity matrix derived from the co-occurrence matrix, whose elements record the similarity between native and non-native states. $A_{i,j}$ is the element in row i, column j of the matrix. With $t_j$ a non-native state and $s_i$ a native state, the similarity between the two is computed as:
$$A_{i,j} = \frac{\mathrm{count}(t_j \mid s_i)}{\sum_{n=1}^{N} \mathrm{count}(t_n \mid s_i)} \qquad (1)$$
where $A_{i,j} \in A_{S,T}(M,N)$, $i = 1 \ldots M$, $j = 1 \ldots N$.
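Formula (1) amounts to normalizing each row of the co-occurrence matrix by its row sum. The sketch below assumes the counts are held as a list of rows (one row per native state, one column per non-native state); the function name and the handling of rows with no co-occurrences are assumptions, since the source does not say how an all-zero row is treated.

```python
def similarity_matrix(cooccurrence):
    """Row-normalize a co-occurrence matrix as in formula (1).

    cooccurrence[i][j] is the number of co-occurrences of native state
    s_i with non-native state t_j. Rows with no co-occurrences are left
    as zeros (an assumption; the source does not cover this case).
    """
    matrix = []
    for row in cooccurrence:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Toy counts for M = 2 native states and N = 3 non-native states.
counts = [[8, 2, 0],
          [1, 3, 4]]
A = similarity_matrix(counts)
print(A)
```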
Step 5: after the similarity matrix is obtained, the state mapping table is derived from it (native-non-native state mapping table computation module 7). If the element in row i, column j of the matrix is the largest element of column j, then state i of the corresponding language is the most similar to state j, and state i is the first candidate correction state of state j. If the element in row k, column j is the second largest element of column j, then state k is the second candidate correction state of state j. By analogy, each non-native state can find n candidate correction states of the native language in the matrix (n < M).
At this point, the mapping information between native and non-native states has been obtained with the mixed-model state correction algorithm.
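The column-wise ranking of step 5 can be sketched as below. The function name, the dictionary layout of the table and the use of 0-based state indices are illustrative assumptions; ties are broken by the lower row index, which the source does not specify.

```python
def state_mapping_table(A, n):
    """For each non-native state j (column), list the n native states
    (rows) with the largest similarity A[i][j], best candidate first."""
    num_native = len(A)
    num_nonnative = len(A[0])
    table = {}
    for j in range(num_nonnative):
        # sorted() is stable, so equal similarities keep ascending row order.
        ranked = sorted(range(num_native), key=lambda i: A[i][j], reverse=True)
        table[j] = ranked[:n]
    return table

# Toy similarity matrix with M = 3 native and N = 3 non-native states.
A = [[0.8,   0.2,   0.0],
     [0.125, 0.375, 0.5],
     [0.3,   0.1,   0.6]]
table = state_mapping_table(A, n=2)
print(table)
```

Here `table[j][0]` plays the role of the first candidate correction state of non-native state j, `table[j][1]` the second, and so on.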
Step 6: during decoding, the non-native acoustic model is corrected with the native acoustic model according to the state mapping obtained above (non-native state correction model decoding module 8). Using the state mapping table from step 5, during speech recognition decoding, for an observation $o_t$, the observation probability $p_{j}(o_t)$ of non-native state j after correction with n candidate states becomes:
$$p_{j}(o_t) = \alpha\, p_{j}^{\mathrm{tar}}(o_t) + (1-\alpha) \sum_{l=1}^{n} a_{lj}\, p_{l}^{\mathrm{sou}}(o_t) \qquad (2)$$
Here $\alpha$ is the correction weight of the non-native state, $a_{lj}$ is the coefficient corresponding to the l-th candidate state, and n is the number of candidates. $p_{j}^{\mathrm{tar}}(o_t)$ and $p_{l}^{\mathrm{sou}}(o_t)$ are the original observation probabilities of non-native state j and native state l, respectively, for observation $o_t$.
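Formula (2) is a weighted interpolation between the original non-native state likelihood and a weighted sum over its candidate native states. The sketch below evaluates it for a single state and a single observation; the function and parameter names are hypothetical, and the coefficients $a_{lj}$ are simply passed in by the caller, since the source does not spell out how they are set.

```python
def corrected_observation_prob(p_target, p_sources, coeffs, alpha):
    """Evaluate formula (2) for one non-native state j and one observation.

    p_target  -- original probability of non-native state j (the "tar" term)
    p_sources -- original probabilities of the n candidate native states
    coeffs    -- coefficients a_lj, one per candidate (supplied by the caller)
    alpha     -- correction weight of the non-native state
    """
    if len(p_sources) != len(coeffs):
        raise ValueError("one coefficient is needed per candidate state")
    native_term = sum(a * p for a, p in zip(coeffs, p_sources))
    return alpha * p_target + (1.0 - alpha) * native_term

# Toy values, chosen only to illustrate the arithmetic.
p = corrected_observation_prob(
    p_target=0.25, p_sources=[0.5, 0.25], coeffs=[0.5, 0.5], alpha=0.5
)
print(p)  # 0.3125
```

With `alpha=1.0` the function returns the uncorrected non-native probability, so the correction can be switched off without changing the decoder.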
Characteristics of the present invention: (1) it makes the non-native acoustic model better suited to non-native speech that carries a native-language accent; (2) it proposes a novel non-native speech recognition method based on mixed-model state correction; (3) it improves the system's recognition performance on non-native speech while relying only on the standard native acoustic model.
Tests on real network data of Chinese-accented English, carried out with a given grammar, show that the non-native speech recognition system based on the mixed-model state correction algorithm reduces the error rate on the Chinese-accented English test set by 5%-10% (relative) compared with a non-native speech recognition system that does not use the method; computation speed is reduced by 20%-25% (relative) compared with the system that does not use the method.

Claims (8)

1. A non-native speech recognition system, characterized in that the system comprises:
a non-native speech interface, configured to collect non-native speech data and send the data to the non-native state forced alignment module and the native state decoding module;
a native model module, configured to provide the native acoustic model to the native state decoding module and the non-native state correction model decoding module;
a non-native model module, configured to provide the non-native acoustic model to the non-native state forced alignment module and the non-native state correction model decoding module;
a native state decoding module, configured to decode the non-native speech data with the standard native acoustic model to obtain native state-level segmentation information, i.e. the native state decoding information, and to send this information to the native-non-native state similarity matrix computation module;
a non-native state forced alignment module, configured to force-align the non-native speech data with the non-native acoustic model to obtain non-native state-level segmentation information, i.e. the non-native state reference information, and to send this information to the native-non-native state similarity matrix computation module;
a native-non-native state similarity matrix computation module, configured to align the native and non-native state-level segmentation information in time, wherein when the overlap time of a native state and a non-native state exceeds a predefined threshold the two states are considered to have "co-occurred" once, to count all co-occurrences, to compute the similarity matrix of non-native states with respect to native states, and to send the similarity matrix to the native-non-native state mapping table computation module;
a native-non-native state mapping table computation module, configured to compute the state mapping table from the similarity matrix; and
a non-native state correction model decoding module, configured to correct, during speech recognition decoding, each non-native acoustic model state with the corresponding native acoustic model states found in the state mapping table, yielding a corrected non-native acoustic model, the corrected non-native acoustic model finally being used to perform non-native speech recognition.
2. The non-native speech recognition system according to claim 1, characterized in that the similarity matrix of non-native states with respect to native states is obtained by the following formula:
$$A_{i,j} = \frac{\mathrm{count}(t_j \mid s_i)}{\sum_{n=1}^{N} \mathrm{count}(t_n \mid s_i)}$$
where M and N are the numbers of Chinese and English states respectively; $A_{S,T}(M,N)$ is the similarity matrix; $A_{i,j}$ is the element in row i, column j of the matrix; $t_j$ is a non-native state and $s_i$ is a native state; $A_{i,j} \in A_{S,T}(M,N)$, $i = 1 \ldots M$, $j = 1 \ldots N$; and $\mathrm{count}(t_j \mid s_i)$ is the number of co-occurrences between native state $s_i$ and non-native state $t_j$.
3. The non-native speech recognition system according to claim 1, characterized in that the state mapping table is obtained as follows:
if the element in row i, column j of the matrix is the n-th largest element of column j, then state i is the n-th candidate correction state of state j; each non-native state can find n candidate correction states of the native language in the matrix, where n < M.
4. The non-native speech recognition system according to claim 1, characterized in that, in the non-native state correction model decoding module, during speech recognition decoding, for an observation $o_t$, the observation probability $p_{j}(o_t)$ of non-native state j after correction with n candidate states becomes:
$$p_{j}(o_t) = \alpha\, p_{j}^{\mathrm{tar}}(o_t) + (1-\alpha) \sum_{l=1}^{n} a_{lj}\, p_{l}^{\mathrm{sou}}(o_t)$$
where $\alpha$ is the correction weight of the non-native state, $a_{lj}$ is the coefficient corresponding to the l-th candidate state, and n is the number of candidates; $p_{j}^{\mathrm{tar}}(o_t)$ and $p_{l}^{\mathrm{sou}}(o_t)$ are the original observation probabilities of non-native state j and native state l, respectively, for observation $o_t$.
5. A non-native speech recognition method, comprising the following steps:
(1) the non-native speech interface collects a certain amount of non-native speech data, used to obtain the model state mapping table;
(2) the non-native state forced alignment module force-aligns the non-native speech data with the non-native acoustic model to obtain state-level segmentation information for the non-native speech, i.e. the non-native speech state reference information;
(3) the native state decoding module decodes the non-native speech data with the standard native acoustic model to obtain state-level segmentation information for the native speech, i.e. the native speech state decoding information;
(4) the native-non-native state similarity matrix computation module aligns the obtained non-native and native state-level segmentation information in time; when the overlap time of two states exceeds a preset threshold, those two states are counted as one "co-occurrence";
(5) the native-non-native state similarity matrix computation module counts all "co-occurrences" and computes the similarity matrix of non-native speech states with respect to native speech states;
(6) the native-non-native state mapping table computation module obtains the state mapping table from this similarity matrix;
(7) according to the obtained state mapping table, the non-native state correction model decoding module, in the decoding process of speech recognition, corrects each non-native acoustic model state with the corresponding native acoustic model states found in the mapping table, obtaining a corrected non-native acoustic model.
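The co-occurrence counting of steps (4) and (5) can be sketched as follows, assuming each state-level segment is available as a `(state_id, start, end)` tuple. The segment format, the function name `count_cooccurrences`, the state labels, and the frame-based times are all hypothetical.

```python
def count_cooccurrences(native_segments, non_native_segments, min_overlap):
    """Count pairs of native and non-native states whose segments overlap
    in time by more than min_overlap.

    Each segment is a (state_id, start, end) tuple on a shared time axis.
    """
    counts = {}
    for s_state, s_start, s_end in native_segments:
        for t_state, t_start, t_end in non_native_segments:
            # Length of the time interval shared by the two segments
            # (negative when they do not overlap at all).
            overlap = min(s_end, t_end) - max(s_start, t_start)
            if overlap > min_overlap:
                key = (s_state, t_state)
                counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up segmentations of one utterance, times in frames.
decoded = [("s1", 0, 10), ("s2", 10, 25)]   # native-model decoding output
aligned = [("t1", 0, 12), ("t2", 12, 25)]   # non-native forced alignment
counts = count_cooccurrences(decoded, aligned, min_overlap=3)
```

Accumulating these counts over all collected utterances would give the `count(t_j | s_i)` values that the similarity matrix is computed from.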
6. The non-native speech recognition method according to claim 5, characterized in that the similarity matrix of non-native speech states with respect to native speech states is obtained by the following formula:
$$A_{i,j} = \frac{\mathrm{count}(t_j \mid s_i)}{\sum_{n=1}^{N} \mathrm{count}(t_n \mid s_i)}$$
where $M$ and $N$ are the numbers of Chinese and English states, respectively; $A_{S,T}(M, N)$ is the similarity matrix; $A_{i,j}$ is the element in row $i$, column $j$ of the matrix; $t_j$ is a non-native speech state and $s_i$ is a native speech state; $A_{i,j} \in A_{S,T}(M, N)$, $i = 1 \ldots M$, $j = 1 \ldots N$; and $\mathrm{count}(t_j \mid s_i)$ is the number of "co-occurrences" between native speech state $s_i$ and non-native speech state $t_j$.
7. The non-native speech recognition method according to claim 5, characterized in that the state mapping table is obtained in the following manner:
If the element in row $n^{th}$, column $j^{th}$ of matrix $A_{i,j}$ is the $n$-th largest element of column $j^{th}$, then state $n^{th}$ is the $n$-th candidate correction state of state $j^{th}$; every non-native state can find $n$ native candidate correction states in the matrix, where $n < M$.
8. The non-native speech recognition method according to claim 5, characterized in that, in the speech recognition decoding process, for an observation $o_t$, the non-native state correction model decoding module changes the observation probability $p_{j^{th}}(o_t)$ of non-native state $j^{th}$, after correction with $n$ candidate states, to:
$$p_{j^{th}}(o_t) = \alpha\, p_{j^{th}}(o_t)_{tar} + (1 - \alpha) \sum_{l=1}^{n} a_{lj}\, p_{l^{th}}(o_t)_{sou}$$
where $\alpha$ is the correction weight of the non-native state, $a_{lj}$ corresponds to the $l^{th}$ candidate state, and $n$ is the number of candidates; $p_{j^{th}}(o_t)_{tar}$ and $p_{l^{th}}(o_t)_{sou}$ are the original observation probabilities of non-native state $j^{th}$ and native state $l^{th}$, respectively, for observation $o_t$.
CN200810239892A 2008-12-19 2008-12-19 Non-native speech recognition system and method thereof Pending CN101650943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810239892A CN101650943A (en) 2008-12-19 2008-12-19 Non-native speech recognition system and method thereof

Publications (1)

Publication Number Publication Date
CN101650943A true CN101650943A (en) 2010-02-17

Family

ID=41673164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810239892A Pending CN101650943A (en) 2008-12-19 2008-12-19 Non-native speech recognition system and method thereof

Country Status (1)

Country Link
CN (1) CN101650943A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254553A (en) * 2010-05-17 2011-11-23 阿瓦雅公司 Automatic normalization of spoken syllable duration
CN103632668A (en) * 2012-08-21 2014-03-12 北京百度网讯科技有限公司 Method and apparatus for training English voice model based on Chinese voice information
CN106663422A (en) * 2014-07-24 2017-05-10 哈曼国际工业有限公司 Text rule based multi-accent speech recognition with single acoustic model and automatic accent detection
CN107958666A (en) * 2017-05-11 2018-04-24 小蚁科技(香港)有限公司 Method for the constant speech recognition of accent
WO2018086033A1 (en) * 2016-11-10 2018-05-17 Nuance Communications, Inc. Techniques for language independent wake-up word detection
CN108352127A (en) * 2015-09-22 2018-07-31 旺多姆咨询私人有限公司 Method, automatic accents recognition and the quantization of score and improved speech recognition are produced for automatically generating speech samples assets for the user of distributed language learning system
US10224023B2 (en) 2016-12-13 2019-03-05 Industrial Technology Research Institute Speech recognition system and method thereof, vocabulary establishing method and computer program product
CN110199348A (en) * 2016-12-21 2019-09-03 亚马逊技术股份有限公司 Accent conversion
WO2021000068A1 (en) * 2019-06-29 2021-01-07 播闪机械人有限公司 Speech recognition method and apparatus used by non-native speaker
US11087750B2 (en) 2013-03-12 2021-08-10 Cerence Operating Company Methods and apparatus for detecting a voice command
US11437020B2 (en) 2016-02-10 2022-09-06 Cerence Operating Company Techniques for spatially selective wake-up word recognition and related systems and methods
US11600269B2 (en) 2016-06-15 2023-03-07 Cerence Operating Company Techniques for wake-up word recognition and related systems and methods

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254553A (en) * 2010-05-17 2011-11-23 阿瓦雅公司 Automatic normalization of spoken syllable duration
CN102254553B (en) * 2010-05-17 2016-05-11 阿瓦雅公司 The automatic normalization of spoken syllable duration
CN103632668A (en) * 2012-08-21 2014-03-12 北京百度网讯科技有限公司 Method and apparatus for training English voice model based on Chinese voice information
CN103632668B (en) * 2012-08-21 2018-07-27 北京百度网讯科技有限公司 A kind of method and apparatus for training English speech model based on Chinese voice information
US11393461B2 (en) 2013-03-12 2022-07-19 Cerence Operating Company Methods and apparatus for detecting a voice command
US11676600B2 (en) 2013-03-12 2023-06-13 Cerence Operating Company Methods and apparatus for detecting a voice command
US11087750B2 (en) 2013-03-12 2021-08-10 Cerence Operating Company Methods and apparatus for detecting a voice command
CN106663422A (en) * 2014-07-24 2017-05-10 哈曼国际工业有限公司 Text rule based multi-accent speech recognition with single acoustic model and automatic accent detection
CN106663422B (en) * 2014-07-24 2021-03-30 哈曼国际工业有限公司 Speech recognition system and speech recognition method thereof
CN108352127A (en) * 2015-09-22 2018-07-31 旺多姆咨询私人有限公司 Method, automatic accents recognition and the quantization of score and improved speech recognition are produced for automatically generating speech samples assets for the user of distributed language learning system
US11437020B2 (en) 2016-02-10 2022-09-06 Cerence Operating Company Techniques for spatially selective wake-up word recognition and related systems and methods
US11600269B2 (en) 2016-06-15 2023-03-07 Cerence Operating Company Techniques for wake-up word recognition and related systems and methods
CN111971742A (en) * 2016-11-10 2020-11-20 赛轮思软件技术(北京)有限公司 Techniques for language independent wake word detection
US11545146B2 (en) 2016-11-10 2023-01-03 Cerence Operating Company Techniques for language independent wake-up word detection
WO2018086033A1 (en) * 2016-11-10 2018-05-17 Nuance Communications, Inc. Techniques for language independent wake-up word detection
US10224023B2 (en) 2016-12-13 2019-03-05 Industrial Technology Research Institute Speech recognition system and method thereof, vocabulary establishing method and computer program product
CN110199348A (en) * 2016-12-21 2019-09-03 亚马逊技术股份有限公司 Accent conversion
CN107958666A (en) * 2017-05-11 2018-04-24 小蚁科技(香港)有限公司 Method for the constant speech recognition of accent
WO2021000068A1 (en) * 2019-06-29 2021-01-07 播闪机械人有限公司 Speech recognition method and apparatus used by non-native speaker

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20100217