CN105280181A - Training method for language recognition model and language recognition method - Google Patents

Training method for language recognition model and language recognition method

Info

Publication number
CN105280181A
CN105280181A CN201410336650.3A
Authority
CN
China
Prior art keywords
phoneme
model
recognition
training
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410336650.3A
Other languages
Chinese (zh)
Other versions
CN105280181B (en)
Inventor
周若华
王宪亮
颜永红
索宏彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd filed Critical Institute of Acoustics CAS
Priority to CN201410336650.3A priority Critical patent/CN105280181B/en
Publication of CN105280181A publication Critical patent/CN105280181A/en
Application granted granted Critical
Publication of CN105280181B publication Critical patent/CN105280181B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Abstract

The invention relates to a training method for a language recognition model and a language identification method. The language identification method comprises the steps of: extracting the phoneme posterior probabilities of speech data, transforming them into the log domain, and performing dimensionality reduction and mean and variance normalization to obtain phoneme-related features; computing Baum-Welch statistics from the phoneme-related features and extracting a phoneme variability factor from these statistics; modeling the phoneme variability factors to build SVM models (the language recognition model); and scoring the SVM models with the phoneme variability factor of the speech data to be recognized, applying mean and variance normalization to the scores, and applying linear discriminant analysis and Gaussian back-end normalization to the normalized scores for score calibration, finally yielding the recognition result. Compared with conventional language identification methods, the method of the invention reduces computational complexity while markedly improving recognition performance, and is highly practical.

Description

Training method for a language recognition model and language identification method
Technical field
The present invention relates to methods for recognizing the language of speech data, and more particularly to a language identification method based on phoneme-related features.
Background technology
With the globalization of information in modern society, language identification has become one of the hot topics in speech recognition research. The goal of language identification technology is to build a machine that, imitating human reasoning to some extent, identifies the language of an utterance: it extracts the information that distinguishes each language from the speech signal and determines the language on that basis. The speech features extracted directly affect the result of language identification.
Mainstream language identification technology falls into two broad classes: recognition based on acoustic spectral features and recognition based on phoneme features.
Acoustic spectral features refer to shifted delta cepstral features of the Mel cepstrum (document [1]: P.A. Torres-Carrasquillo, E. Singer, M.A. Kohler, R.J. Greene, D.A. Reynolds, and J.R. Deller Jr., "Approaches to language identification using Gaussian mixture models and shifted delta cepstral features," in Seventh International Conference on Spoken Language Processing, Citeseer, 2002). Modeling methods based on acoustic spectral features take the cepstral features extracted from speech as the representation of that speech and then model those features, without involving any pronunciation information. Modeling typically uses Gaussian mixture models (GMM) (document [2]: L. Burget, P. Matejka and J. Cernocky, "Discriminative training techniques for acoustic language identification", International Conference on Acoustics, Speech, and Signal Processing, vol. 1, 2006) and support vector machine models (SVM) (document [3]: W.M. Campbell, J.P. Campbell, D.A. Reynolds, E. Singer and P.A. Torres-Carrasquillo, "Support vector machines for speaker and language recognition", Computer Speech & Language, vol. 20, no. 2-3, pp. 210-229, 2006). The ivector system based on factor analysis (document [4]: Najim Dehak, Pedro A. Torres-Carrasquillo, Douglas A. Reynolds, and Reda Dehak, "Language recognition via i-vectors and dimensionality reduction," in INTERSPEECH, 2011, pp. 857-860) has achieved good performance in language identification and is widely used. The ivector method defines a low-dimensional space called the total variability factor space, which jointly contains the speaker space and the channel space; the high-dimensional Gaussian supervector is then expressed as a low-dimensional total variability factor, and experiments show that the low-dimensional total variability factor fully characterizes the high-dimensional Gaussian supervector. After its introduction to language identification this method quickly became the mainstream approach to acoustic modeling, and much language identification research builds on it. However, in language identification, research on the ivector method has been confined to acoustic spectral features and has not been extended to phoneme features, which carry rich pronunciation information.
Language identification systems based on phoneme features use a phoneme recognizer to decode speech into phoneme sequences or phoneme lattices and then model each language with grammatical features (document [5]: W.M. Campbell, F. Richardson and D.A. Reynolds, "Language recognition with word lattices and support vector machines", International Conference on Acoustics, Speech, and Signal Processing, vol. 4, 2007). PPRVSM (H. Li, B. Ma, C.-H. Lee, "A vector space modeling approach to spoken language identification", IEEE Transactions on Audio, Speech, and Language Processing 15 (1) (2007) 271-284) introduced the vector space model into phoneme-recognition-based language identification: phoneme sequences or lattices are treated as "text", discriminative phone strings are extracted from them as feature terms to form feature vectors, and support vector machines are then used for classification, achieving good language identification performance.
Traditional phoneme-feature-based recognition systems take the pronunciation characteristics of speech into account and outperform systems based on acoustic spectral features, but decoding phoneme sequences is computationally expensive and slow, so such systems are rarely used in practice.
Summary of the invention
The object of the invention is to overcome the defect of traditional acoustic-spectral-feature methods, which ignore pronunciation information, and the defects of traditional phoneme-feature methods, whose phoneme-sequence decoding is computationally complex and slow, and thereby to provide a language identification method that reduces computational complexity while improving recognition performance.
To this end, the invention provides a training method for a language recognition model and a language identification method, wherein the training method comprises the following steps:
Step 1-1): collect a quantity of target-language speech data as training utterances and extract the phoneme posterior probabilities of the training utterances;
Step 1-2): transform the phoneme posterior probabilities into the log domain, reduce their dimensionality, and apply mean and variance normalization (MVN) to the reduced features to obtain phoneme-related features;
Let x_{it} be the log-domain phoneme posterior probability vector of frame t of the i-th training utterance, T_i the number of frames of the i-th utterance, and m_i the mean over all frame-level log-domain phoneme posteriors of the i-th utterance:

m_i = \frac{1}{T_i} \sum_{t=1}^{T_i} x_{it} \qquad (1)

The covariance matrix C is computed from the utterance means:

C = \frac{1}{N} \sum_{i=1}^{N} \left( m_i - \frac{1}{N} \sum_{j=1}^{N} m_j \right) \left( m_i - \frac{1}{N} \sum_{j=1}^{N} m_j \right)^T \qquad (2)

where N is the number of training utterances;
The PCA transform matrix A_{PCA} is generated from the eigenvectors corresponding to the L largest eigenvalues of the covariance matrix (L is chosen close to the number of phonemes). Applying A_{PCA} to the log-domain phoneme posterior vectors gives the reduced features:

y_{it} = A_{PCA}^T x_{it} \qquad (3)

Mean and variance normalization (MVN) of the reduced features y_{it} yields the phoneme-related feature vectors z_{it}.
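As an illustration of step 1-2), the PCA projection and MVN above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions; the function and variable names are illustrative and not from the patent.

```python
import numpy as np

def train_pca(log_posteriors, L=56):
    """Build the PCA projection from per-utterance mean log-posterior vectors.

    log_posteriors: list of arrays, one per training utterance,
    each of shape (T_i, D) -- frame-wise log-domain phoneme posteriors.
    L is the target dimensionality, chosen close to the phoneme count.
    """
    # Eq. (1): per-utterance mean over frames
    means = np.stack([x.mean(axis=0) for x in log_posteriors])
    # Eq. (2): covariance of the utterance means
    centered = means - means.mean(axis=0)
    C = centered.T @ centered / len(log_posteriors)
    # Eigenvectors of the L largest eigenvalues form A_PCA
    eigvals, eigvecs = np.linalg.eigh(C)
    A_pca = eigvecs[:, np.argsort(eigvals)[::-1][:L]]
    return A_pca

def phoneme_features(x, A_pca):
    """Eq. (3) projection followed by per-utterance mean/variance normalization."""
    y = x @ A_pca                                   # (T, L) reduced features
    return (y - y.mean(axis=0)) / (y.std(axis=0) + 1e-10)
```

The per-utterance MVN at the end corresponds to the normalization that produces the phoneme-related feature vectors z_{it}.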
Step 1-3): compute the Baum-Welch statistics from the phoneme-related features:

N_c = \sum_{t=1}^{T_i} P(c \mid z_{it}, \Omega) \qquad (4)

F_c = \sum_{t=1}^{T_i} P(c \mid z_{it}, \Omega) (z_{it} - \mu_c) \qquad (5)

where c indexes the Gaussian components, Ω denotes the universal background model (UBM), P(c | z_{it}, Ω) is the probability that frame t belongs to the c-th Gaussian component, and μ_c is the mean vector of the c-th Gaussian component.
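One possible implementation of the Baum-Welch statistics of equations (4) and (5), assuming a diagonal-covariance UBM (the UBM parameters here are placeholders for trained values, not part of the patent):

```python
import numpy as np

def baum_welch_stats(z, weights, means, variances):
    """Zeroth- and first-order Baum-Welch statistics against a diagonal UBM.

    z:         (T, D) phoneme-related features of one utterance
    weights:   (C,)   UBM mixture weights
    means:     (C, D) UBM component means (mu_c)
    variances: (C, D) UBM diagonal covariances
    Returns N of shape (C,) and F of shape (C, D), per Eqs. (4)-(5).
    """
    # log N(z_t | mu_c, var_c) for every frame/component pair
    diff = z[:, None, :] - means[None, :, :]                  # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff**2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                    # unnormalized log P(c | z_t)
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)                                   # (T, C), rows sum to 1
    N = post.sum(axis=0)                                      # Eq. (4)
    F = post.T @ z - N[:, None] * means                       # Eq. (5): sum_t P(c|z_t)(z_t - mu_c)
    return N, F
```

Normalizing the posteriors in the log domain avoids numerical underflow when the number of Gaussian components is large.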
Step 1-4): extract the phoneme variability factor from the Baum-Welch statistics;
The phoneme variability factor w of the i-th training utterance is obtained by:

w = (I + T^T \Sigma^{-1} N(i) T)^{-1} T^T \Sigma^{-1} F(i) \qquad (6)

where N(i) is a diagonal matrix whose diagonal entries are the N_c, F(i) is formed by stacking the first-order Baum-Welch statistics F_c, and Σ and T are trained with the EM algorithm, as in factor analysis.
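Given the statistics and the trained factor-analysis parameters, equation (6) can be computed as follows. This sketch assumes a diagonal Σ; T_mat and Sigma stand in for the EM-trained parameters, which are outside the scope of the example.

```python
import numpy as np

def extract_factor(N, F, T_mat, Sigma):
    """Eq. (6): w = (I + T' Sigma^-1 N(i) T)^-1 T' Sigma^-1 F(i).

    N:     (C,)     zeroth-order statistics; expanded to the diagonal of N(i)
    F:     (C, D)   first-order statistics, stacked into the supervector F(i)
    T_mat: (C*D, R) total-variability matrix (trained by EM, assumed given)
    Sigma: (C*D,)   diagonal of the residual covariance (assumed diagonal)
    """
    C, D = F.shape
    Nvec = np.repeat(N, D)           # diagonal of N(i): one entry per supervector dim
    Fvec = F.reshape(-1)             # stacked first-order statistics F(i)
    Tt_Sinv = T_mat.T / Sigma        # T' Sigma^-1 for diagonal Sigma
    A = np.eye(T_mat.shape[1]) + Tt_Sinv @ (Nvec[:, None] * T_mat)
    # Solve A w = T' Sigma^-1 F(i) rather than forming the inverse explicitly
    return np.linalg.solve(A, Tt_Sinv @ Fvec)
```

Solving the linear system instead of inverting the matrix is the usual numerically stable choice for this posterior-mean computation.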
Step 1-5): model the phoneme variability factors to build the language recognition model;
Using one-vs-one and one-vs-rest strategies, SVM models are trained on the phoneme variability factors; these SVM models constitute the language recognition model.
The language identification method provided by the invention, based on the training method for the language recognition model of the above scheme, comprises the steps of:
Step 2-1): extract the phoneme posterior probabilities of the speech data to be identified;
Step 2-2): transform the phoneme posterior probabilities into the log domain, reduce their dimensionality, and apply mean and variance normalization (MVN) to the reduced features to obtain phoneme-related features;
Step 2-3): compute Baum-Welch statistics from the phoneme-related features;
Step 2-4): extract the phoneme variability factor from the Baum-Welch statistics;
Step 2-5): score the SVM models with the phoneme variability factor, apply mean and variance normalization to the scores, and apply linear discriminant analysis (LDA) and Gaussian back-end normalization to the normalized scores for score calibration, obtaining the final recognition result;
The mean and variance normalization of the scores is computed as:

s'_m = \exp\left( \frac{s_m - \mu}{k \sigma} \right) \qquad (7)

s''_m = \log \frac{(M-1) \, s'_m}{\sum_{j=1}^{M} s'_j - s'_m} \qquad (8)

where M is the number of support vector machine models, s_m is the raw score of the m-th SVM model, μ and σ are the mean and standard deviation of all SVM model scores for the test utterance, k is an adjustable parameter, and s''_m is the normalized score.
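Equations (7) and (8) translate directly into code; a minimal sketch (the default for the adjustable parameter k follows the k = 100 used in the embodiment):

```python
import numpy as np

def normalize_scores(s, k=100.0):
    """Mean/variance score normalization, Eqs. (7)-(8).

    s: (M,) raw scores of one test utterance against the M SVM models.
    k: adjustable scaling parameter.
    """
    M = len(s)
    s1 = np.exp((s - s.mean()) / (k * s.std()))        # Eq. (7)
    s2 = np.log((M - 1) * s1 / (s1.sum() - s1))        # Eq. (8)
    return s2
```

Since equation (8) is monotonically increasing in s'_m, the normalization rescales the scores without changing their ranking.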
The advantages of the invention are:
1. the pronunciation characteristics of speech are taken into account, making the discriminative information between languages more salient;
2. phoneme-related features are used for factor analysis, improving the language identification performance of the system;
3. the decoding step of traditional phoneme-feature-based recognition systems is eliminated, greatly reducing the computational complexity of the system.
Brief description of the drawings
Fig. 1 is a flowchart of the training method for the language recognition model;
Fig. 2 is a flowchart of the language identification method.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings.
With reference to Fig. 1, the training method for the language recognition model comprises:
Step 1-1): collect a quantity of target-language speech data as training data and extract the phoneme posterior probabilities;
A quantity of target-language speech data is collected as training data and passed through conventional front-end processing: silence, music and other invalid speech are removed and only valid speech is retained. Temporal pattern (TRAP) features are then extracted separately from the left and right frequency bands; each frame is 25 ms long with a 10 ms shift, and 15 frames of context are taken on each side in each band, so the feature of each frame covers roughly 310 ms of surrounding speech. The TRAP features of the left and right bands are fed into separate artificial neural networks to obtain the phoneme posterior probabilities of the two bands; these are concatenated and processed by a further artificial neural network, finally yielding a 159-dimensional phoneme posterior probability vector.
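The temporal context described above (15 frames on each side of the current frame, about 310 ms at a 10 ms shift) amounts to splicing a window of frames; a minimal sketch of that splicing step (band-energy extraction and the neural networks are out of scope here, and the function name is illustrative):

```python
import numpy as np

def trap_context(frames, left=15, right=15):
    """Stack a temporal context window around every frame (TRAP-style splicing).

    frames: (T, B) per-frame band features; edges are padded by repetition.
    Returns (T, (left + right + 1) * B) spliced features, so with the
    10 ms frame shift of the embodiment each output row covers ~310 ms.
    """
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    T = frames.shape[0]
    # one block per temporal offset, concatenated along the feature axis
    return np.hstack([padded[i:i + T] for i in range(left + right + 1)])
```

The central block of each output row is the original frame, flanked by its left and right context.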
Step 1-2): transform the phoneme posterior probabilities into the log domain, reduce their dimensionality with principal component analysis (PCA), and apply mean and variance normalization (MVN) to the reduced features to obtain phoneme-related features;
Let x_{it} be the log-domain phoneme posterior probability vector of frame t of the i-th training utterance, T_i the number of frames of the i-th utterance, and m_i the mean over all frame-level log-domain phoneme posteriors of the i-th utterance:

m_i = \frac{1}{T_i} \sum_{t=1}^{T_i} x_{it}

The covariance matrix C is computed from the utterance means m_i:

C = \frac{1}{N} \sum_{i=1}^{N} \left( m_i - \frac{1}{N} \sum_{j=1}^{N} m_j \right) \left( m_i - \frac{1}{N} \sum_{j=1}^{N} m_j \right)^T

where N is the number of training utterances;
The PCA transform matrix A_{PCA} is generated from the eigenvectors corresponding to the 56 largest eigenvalues of the covariance matrix. Applying A_{PCA} to the log-domain phoneme posterior vectors gives the reduced features:

y_{it} = A_{PCA}^T x_{it}

Mean and variance normalization (MVN) of the reduced features y_{it} removes the effect of pronunciation variation and yields the phoneme-related features z_{it}, with a feature dimensionality of 56.
Step 1-3): compute the Baum-Welch statistics from the phoneme-related features:

N_c = \sum_{t=1}^{T_i} P(c \mid z_{it}, \Omega)

F_c = \sum_{t=1}^{T_i} P(c \mid z_{it}, \Omega) (z_{it} - \mu_c)

where c indexes the Gaussian components, Ω denotes the universal background model (UBM), P(c | z_{it}, Ω) is the probability that frame t belongs to the c-th Gaussian component, and μ_c is the mean vector of the c-th Gaussian component. The number of Gaussians is 1024.
Step 1-4): extract the phoneme variability factor from the Baum-Welch statistics;
The phoneme variability factor w of the i-th training utterance is:

w = (I + T^T \Sigma^{-1} N(i) T)^{-1} T^T \Sigma^{-1} F(i)

where N(i) is a diagonal matrix whose diagonal entries are the N_c, F(i) is formed by stacking the first-order Baum-Welch statistics F_c, and Σ and T are trained with the EM algorithm, as in factor analysis.
Step 1-5): model the phoneme variability factors to build the language recognition model;
Using one-vs-one and one-vs-rest strategies, M SVM models are trained on the phoneme variability factors; these SVM models constitute the language recognition model.
With reference to Fig. 2, the language identification method comprises the steps of:
Step 2-1): extract the temporal pattern features of the speech data to be identified and obtain the phoneme posterior probabilities with the artificial neural networks;
Step 2-2): transform the phoneme posterior probabilities into the log domain, reduce their dimensionality with principal component analysis (PCA), and apply mean and variance normalization (MVN) to the reduced features to obtain phoneme-related features;
Step 2-3): compute Baum-Welch statistics from the phoneme-related features;
Step 2-4): extract the phoneme variability factor from the Baum-Welch statistics;
Step 2-5): score the M SVM models with the phoneme variability factor, apply mean and variance normalization to the scores, and apply linear discriminant analysis (LDA) and Gaussian back-end normalization to the normalized scores for score calibration, obtaining the final recognition result;
The mean and variance normalization of the scores is computed as:

s'_m = \exp\left( \frac{s_m - \mu}{k \sigma} \right)

s''_m = \log \frac{(M-1) \, s'_m}{\sum_{j=1}^{M} s'_j - s'_m}

where s_m is the raw score of the m-th SVM model, μ and σ are the mean and standard deviation of all SVM model scores for the test utterance, k = 100, and s''_m is the normalized score.
Tests were run on the 2011 Language Recognition Evaluation (LRE) data set of the U.S. National Institute of Standards and Technology (NIST), covering 24 target languages. The performance metrics are EER (equal error rate), minDCF (minimum detection cost), minCavg276 (minimum average cost over the 276 language pairs), actCavg276 (actual average cost over the 276 language pairs), minCavg24 (minimum average cost over the 24 worst language pairs) and actCavg24 (actual average cost over the 24 worst language pairs). S1 denotes the traditional Ivector method, S2 the traditional phoneme-feature-based PPRVSM method, and S3 the language identification method proposed by the present invention. A Russian phoneme recognizer was used in the tests; Table 1 compares the performance metrics of the three methods.
Table 1
Method EER minDCF minCavg276 actCavg276 minCavg24 actCavg24
S1 6.65 6.86 2.45 3.45 11.88 14.52
S2 7.62 8.04 2.67 4.68 11.68 14.09
S3 5.52 5.75 1.45 2.68 9.00 12.01
The proposed language identification method applies factor analysis to phoneme-related features tied to pronunciation content. The test results show that, compared with the traditional Ivector method, its recognition performance improves by a relative 16%-41%, and compared with the traditional phoneme-feature-based language identification method PPRVSM, by a relative 15%-46%.
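The stated relative gains can be reproduced from Table 1; for each metric the relative improvement is (baseline - S3) / baseline:

```python
# Relative improvement of S3 over S1 (Ivector) and S2 (PPRVSM), from Table 1.
metrics = ["EER", "minDCF", "minCavg276", "actCavg276", "minCavg24", "actCavg24"]
s1 = [6.65, 6.86, 2.45, 3.45, 11.88, 14.52]
s2 = [7.62, 8.04, 2.67, 4.68, 11.68, 14.09]
s3 = [5.52, 5.75, 1.45, 2.68, 9.00, 12.01]

for name, a, b, c in zip(metrics, s1, s2, s3):
    print(f"{name}: vs S1 {100 * (a - c) / a:.0f}%, vs S2 {100 * (b - c) / b:.0f}%")
```

The improvements versus S1 range from 16% (minDCF) to 41% (minCavg276), and versus S2 from 15% (actCavg24) to 46% (minCavg276), matching the 16%-41% and 15%-46% figures in the text.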

Claims (7)

1. A training method for a language recognition model, comprising:
Step 1-1): collecting a quantity of target-language speech data as training utterances and extracting the phoneme posterior probabilities of the training utterances;
Step 1-2): transforming the phoneme posterior probabilities into the log domain, reducing their dimensionality, and applying mean and variance normalization to the reduced features to obtain phoneme-related features;
Step 1-3): computing Baum-Welch statistics from the phoneme-related features;
Step 1-4): extracting the phoneme variability factor from the Baum-Welch statistics;
Step 1-5): modeling the phoneme variability factors to build the language recognition model.
2. The training method for a language recognition model according to claim 1, characterized in that the computation in step 1-2) is:
The PCA transform matrix A_{PCA} is generated from the eigenvectors corresponding to the L largest eigenvalues of the covariance matrix, the covariance matrix being defined as:

C = \frac{1}{N} \sum_{i=1}^{N} \left( m_i - \frac{1}{N} \sum_{j=1}^{N} m_j \right) \left( m_i - \frac{1}{N} \sum_{j=1}^{N} m_j \right)^T

where N is the number of training utterances and m_i is the mean over all frame-level log-domain phoneme posteriors of the i-th training utterance:

m_i = \frac{1}{T_i} \sum_{t=1}^{T_i} x_{it}

x_{it} being the log-domain phoneme posterior probability vector of frame t of the i-th training utterance and T_i the number of frames of the i-th training utterance; the reduced feature is:

y_{it} = A_{PCA}^T x_{it}

Mean and variance normalization of the reduced features y_{it} yields the phoneme-related feature vectors z_{it}.
3. The training method for a language recognition model according to claim 1, characterized in that the Baum-Welch statistics in step 1-3) are computed as:

N_c = \sum_{t=1}^{T_i} P(c \mid z_{it}, \Omega)

F_c = \sum_{t=1}^{T_i} P(c \mid z_{it}, \Omega) (z_{it} - \mu_c)

where c indexes the Gaussian components, Ω denotes the universal background model, P(c | z_{it}, Ω) is the probability that frame t belongs to the c-th Gaussian component, and μ_c is the mean vector of the c-th Gaussian component.
4. The training method for a language recognition model according to claim 1, characterized in that the computation in step 1-4) is:
The phoneme variability factor w of the i-th training utterance is:

w = (I + T^T \Sigma^{-1} N(i) T)^{-1} T^T \Sigma^{-1} F(i)

where N(i) is a diagonal matrix whose diagonal entries are the N_c, F(i) is formed by stacking the first-order Baum-Welch statistics F_c, and Σ and T are trained with the expectation-maximization algorithm, as in factor analysis.
5. The training method for a language recognition model according to claim 1, characterized in that step 1-5) comprises: using one-vs-one and one-vs-rest strategies, modeling the phoneme variability factors with support vector machines to obtain support vector machine models, the support vector machine models being the language recognition model.
6. A language identification method, based on the training method for a language recognition model according to any one of claims 1-5, the method comprising the steps of:
Step 2-1): extracting the phoneme posterior probabilities of the utterance to be identified;
Step 2-2): transforming the phoneme posterior probabilities into the log domain, reducing their dimensionality, and applying mean and variance normalization to the reduced features to obtain phoneme-related features;
Step 2-3): computing Baum-Welch statistics from the phoneme-related features;
Step 2-4): extracting the phoneme variability factor from the Baum-Welch statistics;
Step 2-5): scoring the language recognition models with the phoneme variability factor, applying mean and variance normalization to the scores, and applying linear discriminant analysis and Gaussian back-end normalization to the normalized scores for score calibration, to obtain the final recognition result.
7. The language identification method according to claim 6, characterized in that the mean and variance normalization of the scores in step 2-5) is computed as:

s'_m = \exp\left( \frac{s_m - \mu}{k \sigma} \right)

s''_m = \log \frac{(M-1) \, s'_m}{\sum_{j=1}^{M} s'_j - s'_m}

where M is the number of language recognition models, s_m is the raw score of the m-th language recognition model, μ and σ are the mean and standard deviation of all language recognition model scores for the test utterance, k is an adjustable parameter, and s''_m is the normalized score.
CN201410336650.3A 2014-07-15 2014-07-15 Training method for a language recognition model and language identification method Active CN105280181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410336650.3A CN105280181B (en) 2014-07-15 2014-07-15 Training method for a language recognition model and language identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410336650.3A CN105280181B (en) 2014-07-15 2014-07-15 Training method for a language recognition model and language identification method

Publications (2)

Publication Number Publication Date
CN105280181A true CN105280181A (en) 2016-01-27
CN105280181B CN105280181B (en) 2018-11-13

Family

ID=55149073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410336650.3A Active CN105280181B (en) 2014-07-15 2014-07-15 Training method for a language recognition model and language identification method

Country Status (1)

Country Link
CN (1) CN105280181B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369440A (en) * 2017-08-02 2017-11-21 北京灵伴未来科技有限公司 Training method and device for a short-utterance speaker identification model
CN108269574A (en) * 2017-12-29 2018-07-10 安徽科大讯飞医疗信息技术有限公司 Voice signal processing method and device, storage medium and electronic equipment
CN108510977A (en) * 2018-03-21 2018-09-07 清华大学 Language Identification and computer equipment
CN108648747A (en) * 2018-03-21 2018-10-12 清华大学 Language recognition system
CN110858477A (en) * 2018-08-13 2020-03-03 中国科学院声学研究所 Language identification and classification method and device based on noise reduction automatic encoder
CN112270923A (en) * 2020-10-22 2021-01-26 江苏峰鑫网络科技有限公司 Semantic recognition system based on neural network
CN113744717A (en) * 2020-05-15 2021-12-03 阿里巴巴集团控股有限公司 Language identification method and device
CN115394288A (en) * 2022-10-28 2022-11-25 成都爱维译科技有限公司 Language identification method and system for civil aviation multi-language radio land-air conversation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599126A (en) * 2009-04-22 2009-12-09 哈尔滨工业大学 Utilize the support vector machine classifier of overall intercommunication weighting
CN101702314A (en) * 2009-10-13 2010-05-05 清华大学 Method for establishing identified type language recognition model based on language pair
CN103077709A (en) * 2012-12-28 2013-05-01 中国科学院声学研究所 Method and device for identifying languages based on common identification subspace mapping

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599126A (en) * 2009-04-22 2009-12-09 哈尔滨工业大学 Utilize the support vector machine classifier of overall intercommunication weighting
CN101702314A (en) * 2009-10-13 2010-05-05 清华大学 Method for establishing identified type language recognition model based on language pair
CN103077709A (en) * 2012-12-28 2013-05-01 中国科学院声学研究所 Method and device for identifying languages based on common identification subspace mapping

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIANLIANG WANG ET AL.: "Language recognition system using language branch discriminative information", ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014 IEEE INTERNATIONAL CONFERENCE ON *
ZHONG HAIBING: "Language identification based on phoneme-level information", CHINA MASTER'S THESES FULL-TEXT DATABASE *
WANG XIANLIANG ET AL.: "Language identification method based on SVM one-vs-one classification", JOURNAL OF TSINGHUA UNIVERSITY (SCIENCE AND TECHNOLOGY) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369440A (en) * 2017-08-02 2017-11-21 北京灵伴未来科技有限公司 Training method and device for a short-utterance speaker identification model
CN108269574A (en) * 2017-12-29 2018-07-10 安徽科大讯飞医疗信息技术有限公司 Voice signal processing method and device, storage medium and electronic equipment
CN108269574B (en) * 2017-12-29 2021-05-25 安徽科大讯飞医疗信息技术有限公司 Method and device for processing voice signal to represent vocal cord state of user, storage medium and electronic equipment
CN108510977A (en) * 2018-03-21 2018-09-07 清华大学 Language Identification and computer equipment
CN108648747A (en) * 2018-03-21 2018-10-12 清华大学 Language recognition system
CN108510977B (en) * 2018-03-21 2020-05-22 清华大学 Language identification method and computer equipment
CN108648747B (en) * 2018-03-21 2020-06-02 清华大学 Language identification system
CN110858477A (en) * 2018-08-13 2020-03-03 中国科学院声学研究所 Language identification and classification method and device based on noise reduction automatic encoder
CN113744717A (en) * 2020-05-15 2021-12-03 阿里巴巴集团控股有限公司 Language identification method and device
CN112270923A (en) * 2020-10-22 2021-01-26 江苏峰鑫网络科技有限公司 Semantic recognition system based on neural network
CN115394288A (en) * 2022-10-28 2022-11-25 成都爱维译科技有限公司 Language identification method and system for civil aviation multi-language radio land-air conversation
CN115394288B (en) * 2022-10-28 2023-01-24 成都爱维译科技有限公司 Language identification method and system for civil aviation multi-language radio land-air conversation

Also Published As

Publication number Publication date
CN105280181B (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN105280181A (en) Training method for language recognition model and language recognition method
CN107492382B (en) Voiceprint information extraction method and device based on neural network
McLaren et al. Advances in deep neural network approaches to speaker recognition
JP6954680B2 (en) Speaker confirmation method and speaker confirmation device
US9355642B2 (en) Speaker recognition method through emotional model synthesis based on neighbors preserving principle
CN104732978B Text-dependent speaker recognition method based on combined deep learning
CN107146601A A back-end i-vector enhancement method for speaker recognition systems
CN109637545B Voiceprint recognition method based on one-dimensional convolution asymmetric bidirectional long-short-time memory network
CN103456302B An emotional speaker recognition method based on emotion GMM model weight synthesis
CN105261367B A speaker recognition method
Rouvier et al. Speaker diarization through speaker embeddings
CN102664010B Robust speaker recognition method based on multifactor frequency-shift invariant features
CN104240706B A speaker recognition method that corrects scores using GMM token ratio similarity
CN103985381A Audio indexing method based on parameter-fusion optimized decision
CN106601258A Channel-compensated speaker identification method based on an improved LSDA algorithm
CN106297769B A discriminative feature extraction method applied to language identification
CN111599344A Language identification method based on spliced features
CN102237089B Method for reducing the misidentification rate of a text-independent speaker identification system
CN111091809B Regional accent recognition method and device based on deep feature fusion
CN104464738A Voiceprint recognition method for smart mobile devices
Kudashev et al. A Speaker Recognition System for the SITW Challenge.
CN106486114A Method and apparatus for improving a language model, and speech recognition method and apparatus
Diez et al. New insight into the use of phone log-likelihood ratios as features for language recognition
Wu et al. Joint nonnegative matrix factorization for exemplar-based voice conversion
Rouvier et al. Investigation of speaker embeddings for cross-show speaker diarization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant