CN105280181B - Training method for a language identification model, and language identification method - Google Patents

Info

Publication number
CN105280181B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410336650.3A
Other languages
Chinese (zh)
Other versions
CN105280181A (en)
Inventor
周若华
王宪亮
颜永红
索宏彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd

Abstract

The present invention relates to a training method for a language identification model and to a language identification method, comprising: extracting the phoneme posterior probabilities of the training speech data, transforming them to the log domain, and applying dimensionality reduction and mean-variance normalization to obtain phoneme-related features; computing Baum-Welch statistics from the phoneme-related features and extracting phoneme variability factors from those statistics; modeling the phoneme variability factors to build SVM models (the language identification model); scoring the phoneme variability factor of the speech data to be identified against the SVM models, applying mean-variance normalization to the scores, and calibrating the normalized scores with linear discriminant analysis and Gaussian back-end normalization to obtain the final recognition result. Compared with traditional language identification methods, this method reduces computational complexity and markedly improves recognition performance, which makes it highly practical.

Description

Training method for a language identification model, and language identification method
Technical field
The present invention relates to methods for identifying the language of speech data, and more particularly to a language identification method based on phoneme-related features.
Background art
With the globalization of information in modern society, language identification has become one of the research hotspots of speech technology. The goal of language identification technology is to build machines that, to some extent, imitate human thinking when identifying the language of speech: they extract the information that distinguishes each language from the speech signal and determine the language on that basis. The features extracted from the speech signal directly affect the identification result.
Mainstream language identification techniques fall into two broad classes: recognition based on acoustic spectral features and recognition based on phoneme features.
Acoustic spectral features refers to shifted delta cepstral features of the Mel cepstrum (MSDC) (document [1] P. A. Torres-Carrasquillo, E. Singer, M. A. Kohler, R. J. Greene, D. A. Reynolds, and J. R. Deller Jr., "Approaches to language identification using Gaussian mixture models and shifted delta cepstral features," in Seventh International Conference on Spoken Language Processing. Citeseer, 2002.). Methods based on acoustic spectral features take the cepstral features extracted from the speech as its features and then model them, without reference to the pronunciation information of the speech. Modeling typically uses Gaussian mixture models (GMM) (document [2] L. Burget, P. Matejka and J. Cernocky, "Discriminative training techniques for acoustic language identification", International Conference on Acoustics, Speech, and Signal Processing, vol. 1, 2006.) and support vector machine models (SVM) (document [3] W. M. Campbell, J. P. Campbell, D. A. Reynolds, E. Singer and P. A. Torres-Carrasquillo, "Support vector machines for speaker and language recognition", Computer Speech and Language, vol. 20, no. 2-3, pp. 210-229, 2006.). The i-vector system based on factor analysis (document [4] Najim Dehak, Pedro A. Torres-Carrasquillo, Douglas A. Reynolds, and Reda Dehak, "Language recognition via i-vectors and dimensionality reduction," in INTERSPEECH, 2011, pp. 857-860.) has achieved good performance in language identification and is widely used.
The i-vector method defines a low-dimensional space called the total variability space, which contains both the speaker space and the channel space, and represents the high-dimensional Gaussian supervector as a low-dimensional total variability factor. Experiments show that the low-dimensional total variability factor can fully characterize the high-dimensional Gaussian supervector. After it was introduced into language identification, the method quickly became the mainstream approach to acoustic modeling, and much language identification research builds on it. However, research on the i-vector method in language identification has been limited to acoustic spectral features and has not been extended to phoneme features, which carry rich pronunciation information.
Language identification systems based on phoneme features use a phoneme recognizer to decode the speech into phoneme sequences or phoneme lattices and then model the language with grammatical features (document [5] W. M. Campbell, F. Richardson and D. A. Reynolds, "Language recognition with word lattices and support vector machines", International Conference on Acoustics, Speech, and Signal Processing, vol. 4, 2007.). PPRVSM (H. Li, B. Ma, C.-H. Lee, A vector space modeling approach to spoken language identification, IEEE Transactions on Audio, Speech, and Language Processing 15 (1) (2007) 271-284) introduces the vector space model into phoneme-recognition-based language identification: phoneme sequences or phoneme lattices are treated as "text", discriminative phoneme strings are extracted from them as feature terms to form feature vectors, and the vectors are classified with support vector machines, which achieves good language identification performance.
Traditional recognition systems based on phoneme features take the pronunciation characteristics of speech into account and outperform systems based on acoustic spectral features, but because decoding phoneme sequences is computationally complex and slow, they are seldom used in real systems.
Summary of the invention
The object of the invention is to overcome the defect that traditional methods based on acoustic spectral features do not include pronunciation information, and the defect that traditional methods based on phoneme features have high computational complexity and long running time when decoding phoneme sequences, and thereby to provide a language identification method that reduces computational complexity and improves recognition performance.
To achieve this object, the present invention provides a training method for a language identification model and a language identification method, wherein the training method for the language identification model includes the following steps:
Step 1-1): collect a certain amount of target-language speech data as training sentences, and extract the phoneme posterior probabilities of the training sentences;
Step 1-2): transform the phoneme posterior probabilities to the log domain, perform dimensionality reduction, and apply mean-variance normalization (MVN) to the dimension-reduced features to obtain phoneme-related features;
Let $x_{it}$ be the log-domain phoneme posterior probability vector of frame t of the i-th training sentence, $T_i$ the number of frames of the i-th training sentence, and $m_i$ the mean of the log-domain phoneme posterior probabilities over all frames of the i-th training sentence, obtained by the following formula:
$$m_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}$$
The covariance matrix C can be computed from the means of the log-domain phoneme posterior probabilities over all frames of the training sentences, as in the following formula:
where N is the number of training sentences;
The PCA transformation matrix $A_{PCA}$ is formed from the eigenvectors corresponding to the L largest eigenvalues of the covariance matrix (the value of L is close to the number of phonemes). The dimension-reduced feature is obtained from the PCA transformation matrix $A_{PCA}$ and the log-domain phoneme posterior probability vector, with the expression:
Mean-variance normalization (MVN) is applied to the dimension-reduced feature $y_{it}$ to obtain the phoneme-related feature vector $z_{it}$.
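The feature pipeline of step 1-2 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: since the formulas for C and for the projection are not reproduced here, the sketch assumes that C is the covariance of the per-sentence means $m_i$ around their average and that $y_{it}$ is the projection of $x_{it}$ onto the leading eigenvectors. The function name `phoneme_features`, the `floor` constant, and the choice of per-sentence MVN are hypothetical.

```python
import numpy as np

def phoneme_features(posteriors, L, floor=1e-10):
    """posteriors: list of (T_i, D) arrays of per-frame phoneme posteriors."""
    # Log-domain transform; the floor (an assumption) avoids log(0).
    logs = [np.log(np.maximum(p, floor)) for p in posteriors]
    # m_i: mean log-posterior vector of each sentence.
    means = np.stack([x.mean(axis=0) for x in logs])
    # Assumed covariance: scatter of the sentence means around their average.
    centered = means - means.mean(axis=0)
    C = centered.T @ centered / len(logs)
    # Eigenvectors of the L largest eigenvalues form A_PCA (D x L).
    eigvals, eigvecs = np.linalg.eigh(C)
    A_pca = eigvecs[:, np.argsort(eigvals)[::-1][:L]]
    feats = []
    for x in logs:
        y = x @ A_pca  # dimension-reduced features y_it
        # Per-sentence mean-variance normalization gives z_it.
        z = (y - y.mean(axis=0)) / (y.std(axis=0) + 1e-12)
        feats.append(z)
    return feats, A_pca
```

Each returned array has one row per frame and L columns, with zero mean and unit variance per dimension within a sentence.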
Step 1-3): compute the Baum-Welch statistics from the phoneme-related features;
$$N_c = \sum_{t=1}^{T_i} p(c \mid z_{it}, \Omega)$$
$$F_c = \sum_{t=1}^{T_i} p(c \mid z_{it}, \Omega)\,(z_{it} - \mu_c)$$
where c indexes the Gaussian components, Ω is the variance of the universal background model (UBM), $p(c \mid z_{it}, \Omega)$ is the probability that frame t belongs to the c-th Gaussian component, and $\mu_c$ is the mean vector of the c-th Gaussian component.
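A minimal sketch of the statistics computation follows, assuming the usual zero-order and centered first-order definitions, $N_c = \sum_t p(c \mid z_{it}, \Omega)$ and $F_c = \sum_t p(c \mid z_{it}, \Omega)(z_{it} - \mu_c)$, and a UBM made of diagonal-covariance Gaussians. The function name `baum_welch_stats` and the argument layout are hypothetical; the patent does not specify the covariance structure of the UBM.

```python
import numpy as np

def baum_welch_stats(z, weights, means, variances):
    """z: (T, D) features; UBM with C diagonal Gaussians:
    weights (C,), means (C, D), variances (C, D)."""
    diff = z[:, None, :] - means[None, :, :]            # (T, C, D)
    # Log-density of every frame under every diagonal Gaussian.
    log_pdf = -0.5 * (np.sum(diff**2 / variances, axis=2)
                      + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_pdf                 # (T, C)
    # Normalize per frame to get the posteriors p(c | z_t).
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    N = post.sum(axis=0)                                 # zero-order, (C,)
    F = np.einsum("tc,tcd->cd", post, diff)              # centered first-order, (C, D)
    return N, F
```

Because the posteriors of each frame sum to one, the zero-order statistics sum to the number of frames, which is a convenient sanity check.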
Step 1-4): extract the phoneme variability factor from the Baum-Welch statistics;
The phoneme variability factor w of the i-th training sentence is obtained by the following formula:
$$w = (I + T^{T}\Sigma^{-1}N(i)T)^{-1}T^{T}\Sigma^{-1}F(i) \qquad (6)$$
where N(i) is a diagonal matrix whose diagonal elements are $N_c I$, F(i) is obtained by concatenating the first-order Baum-Welch statistics $F_c$, and Σ and T are obtained by training with the EM algorithm in the factor analysis process.
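The closed-form expression for w can be evaluated as in the sketch below. It assumes that Σ is diagonal and stored as a vector, and that T has one row per supervector dimension (C Gaussians times D feature dimensions); these storage choices and the name `extract_factor` are assumptions for illustration. A linear solve replaces the explicit matrix inverse, which gives the same result with better numerical behavior.

```python
import numpy as np

def extract_factor(N, F, T, sigma):
    """N: (C,) zero-order stats; F: (C, D) centered first-order stats;
    T: (C*D, R) total variability matrix; sigma: (C*D,) diagonal of Sigma."""
    C, D = F.shape
    R = T.shape[1]
    n_diag = np.repeat(N, D)             # diagonal of N(i): each N_c repeated D times
    f = F.reshape(C * D)                 # F(i): concatenated F_c
    Tt_sinv = T.T / sigma                # T^T Sigma^{-1}, shape (R, C*D)
    precision = np.eye(R) + (Tt_sinv * n_diag) @ T
    # Solve (I + T^T Sigma^{-1} N T) w = T^T Sigma^{-1} F for w.
    return np.linalg.solve(precision, Tt_sinv @ f)
```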
Step 1-5): model the phoneme variability factors to build the language identification model;
Using one-versus-one and one-versus-rest strategies, the phoneme variability factors are modeled with SVMs to build SVM models; these SVM models are the language identification model.
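One possible reading of combining the two strategies is that a binary SVM is trained for every language pair and for every language against all the others. The sketch below only enumerates those binary tasks and does not train any classifier; the function name `binary_tasks` and the language codes in the usage note are hypothetical.

```python
from itertools import combinations

def binary_tasks(languages):
    """Return the (positive, negative) label sets of each binary SVM."""
    tasks = []
    # One-versus-one: one classifier per unordered language pair.
    for a, b in combinations(languages, 2):
        tasks.append(({a}, {b}))
    # One-versus-rest: one classifier per language against all others.
    for a in languages:
        tasks.append(({a}, set(languages) - {a}))
    return tasks
```

Under this reading, K languages give K(K-1)/2 + K binary tasks; for example, `binary_tasks(["ru", "zh", "en", "fr"])` yields 6 pairwise tasks and 4 one-versus-rest tasks.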
The language identification method provided by the invention, based on the training method for the language identification model described above, includes the following steps:
Step 2-1): extract the phoneme posterior probabilities of the speech data to be identified;
Step 2-2): transform the phoneme posterior probabilities to the log domain, perform dimensionality reduction, and apply mean-variance normalization (MVN) to the dimension-reduced features to obtain phoneme-related features;
Step 2-3): compute the Baum-Welch statistics from the phoneme-related features;
Step 2-4): extract the phoneme variability factor from the Baum-Welch statistics;
Step 2-5): score the phoneme variability factor against the SVM models, apply mean-variance normalization to the scores, and calibrate the normalized scores with linear discriminant analysis (LDA) and Gaussian back-end normalization to obtain the final recognition result;
The mean-variance normalization is computed as:
where M is the number of support vector machine models, $s_m$ is the initial score of the m-th SVM model, μ and σ are respectively the mean and standard deviation of all SVM model scores for the test data, k is an adjustable parameter, and $s''_m$ is the normalized score.
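As a rough illustration of score normalization, the sketch below applies plain mean-variance (z-score) normalization to the M scores of one test utterance. It is not the patent's formula: the role of the adjustable parameter k cannot be recovered from the text, so k is left out, and the function name `normalize_scores` is hypothetical.

```python
from statistics import fmean, pstdev

def normalize_scores(scores):
    """Mean-variance normalize the M SVM scores of one test utterance."""
    mu = fmean(scores)
    sigma = pstdev(scores)       # population standard deviation over the M scores
    if sigma == 0:
        # All scores identical: nothing to scale, return zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]
```

After this step the scores of an utterance have zero mean and unit standard deviation, which makes scores from different utterances comparable before LDA and the Gaussian back end.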
The advantages of the invention are:
1. the pronunciation characteristics of the language are taken into account, so the differences between languages are more pronounced;
2. phoneme-related features are used for factor analysis, which improves the language identification performance of the system;
3. the decoding process of traditional recognition systems based on phoneme features is eliminated, which greatly reduces the computational complexity of the system.
Description of the drawings
Fig. 1 is a flowchart of the training method for the language identification model;
Fig. 2 is a flowchart of the language identification method.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings.
With reference to Fig. 1, the training method for the language identification model proceeds as follows:
Step 1-1): collect a certain amount of target-language speech data as training data, and extract the phoneme posterior probabilities;
A certain amount of target-language speech data is collected as training data. Conventional speech front-end processing removes invalid segments such as silence and music from the training data and retains the valid speech. Temporal pattern (TRAP) features are then extracted separately for the left band and the right band. Each frame is 25 ms long with a frame shift of 10 ms, and the left and right bands each take the features of 15 frames, so the features of each frame cover the surrounding 310 ms. The TRAP features of the left and right bands are fed to separate artificial neural networks to obtain the phoneme posterior probabilities of the two bands; the phoneme posterior probabilities of the two bands are concatenated and processed by another artificial neural network, which finally yields 159-dimensional phoneme posterior probabilities.
Step 1-2): transform the phoneme posterior probabilities to the log domain, perform dimensionality reduction with principal component analysis (PCA), and apply mean-variance normalization (MVN) to the dimension-reduced features to obtain phoneme-related features;
Let $x_{it}$ be the log-domain phoneme posterior probability vector of frame t of the i-th training sentence, $T_i$ the number of frames of the i-th training sentence, and $m_i$ the mean of the log-domain phoneme posterior probabilities over all frames of the i-th training sentence, obtained by the following formula:
$$m_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}$$
The covariance matrix C can be computed from the means $m_i$ of the log-domain phoneme posterior probabilities over all frames of the training data, as in the following formula:
where N is the number of training sentences;
The PCA transformation matrix $A_{PCA}$ is formed from the eigenvectors corresponding to the 56 largest eigenvalues of the covariance matrix. The dimension-reduced feature is obtained from the PCA transformation matrix $A_{PCA}$ and the log-domain phoneme posterior probability vector, with the expression:
Mean-variance normalization (MVN) is applied to the dimension-reduced feature $y_{it}$ to remove the influence of pronunciation variation, giving the phoneme-related feature $z_{it}$, whose dimensionality is 56.
Step 1-3): compute the Baum-Welch statistics from the phoneme-related features;
$$N_c = \sum_{t=1}^{T_i} p(c \mid z_{it}, \Omega)$$
$$F_c = \sum_{t=1}^{T_i} p(c \mid z_{it}, \Omega)\,(z_{it} - \mu_c)$$
where c indexes the Gaussian components, Ω is the variance of the universal background model (UBM), $p(c \mid z_{it}, \Omega)$ is the probability that frame t belongs to the c-th Gaussian component, and $\mu_c$ is the mean vector of the c-th Gaussian component. The number of Gaussians is 1024.
Step 1-4): extract the phoneme variability factor from the Baum-Welch statistics;
The phoneme variability factor w of the i-th training sentence is:
$$w = (I + T^{T}\Sigma^{-1}N(i)T)^{-1}T^{T}\Sigma^{-1}F(i)$$
where N(i) is a diagonal matrix whose diagonal elements are $N_c I$, F(i) is obtained by concatenating the first-order Baum-Welch statistics $F_c$, and Σ and T are obtained by training with the EM algorithm in the factor analysis process.
Step 1-5): model the phoneme variability factors to build the language identification model;
Using one-versus-one and one-versus-rest strategies, the phoneme variability factors are modeled with SVMs to build M SVM models; these SVM models are the language identification model.
With reference to Fig. 2, the language identification method includes the following steps:
Step 2-1): extract the temporal pattern features of the speech data to be identified, and extract the phoneme posterior probabilities with artificial neural networks;
Step 2-2): transform the phoneme posterior probabilities to the log domain, perform dimensionality reduction with principal component analysis (PCA), and apply mean-variance normalization (MVN) to the dimension-reduced features to obtain phoneme-related features;
Step 2-3): compute the Baum-Welch statistics from the phoneme-related features;
Step 2-4): extract the phoneme variability factor from the Baum-Welch statistics;
Step 2-5): score the phoneme variability factor against the M SVM models, apply mean-variance normalization to the scores, and calibrate the normalized scores with linear discriminant analysis (LDA) and Gaussian back-end normalization to obtain the final recognition result;
The mean-variance normalization is computed as:
where $s_m$ is the initial score of the m-th SVM model, μ and σ are respectively the mean and standard deviation of all SVM model scores for the test data, k = 100, and $s''_m$ is the normalized score.
The method was tested on the 2011 Language Recognition Evaluation (LRE) data set of the US National Institute of Standards and Technology (NIST). The test set covers 24 target languages. The performance metrics are EER (equal error rate), minDCF (minimum detection cost), minCavg276 (minimum average cost over the 276 language pairs), actCavg276 (actual average cost over the 276 language pairs), minCavg24 (minimum average cost over the 24 worst language pairs) and actCavg24 (actual average cost over the 24 worst language pairs). S1 denotes the traditional i-vector method, S2 the traditional PPRVSM method based on phoneme features, and S3 the language identification method proposed by the invention. A Russian phoneme recognizer was used in the tests. The performance metrics of each method are compared in Table 1.
Table 1
Method  EER   minDCF  minCavg276  actCavg276  minCavg24  actCavg24
S1      6.65  6.86    2.45        3.45        11.88      14.52
S2      7.62  8.04    2.67        4.68        11.68      14.09
S3      5.52  5.75    1.45        2.68        9.00       12.01
The language identification method proposed by the invention uses phoneme-related features, which are tied to pronunciation content, for factor analysis. The test results show that, compared with the traditional i-vector method, the proposed method improves recognition performance by a relative 16%-41%, and compared with PPRVSM, the traditional language identification method based on phoneme features, it improves recognition performance by a relative 15%-46%.
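The relative improvements quoted above can be checked against Table 1 with a few lines of Python. The sketch assumes that every metric is a cost for which lower is better and that "relative improvement" means (baseline - proposed) / baseline; the names `results` and `relative_gains` are illustrative.

```python
# Table 1 values, in column order:
# EER, minDCF, minCavg276, actCavg276, minCavg24, actCavg24
results = {
    "S1": [6.65, 6.86, 2.45, 3.45, 11.88, 14.52],
    "S2": [7.62, 8.04, 2.67, 4.68, 11.68, 14.09],
    "S3": [5.52, 5.75, 1.45, 2.68, 9.00, 12.01],
}

def relative_gains(baseline, proposed):
    """Relative reduction, in percent, of each metric (lower is better)."""
    return [100.0 * (b - p) / b for b, p in zip(baseline, proposed)]

for name in ("S1", "S2"):
    gains = relative_gains(results[name], results["S3"])
    print(name, f"{min(gains):.1f}% to {max(gains):.1f}%")
```

Rounded to whole percentages, the smallest and largest gains are 16% and 41% against S1 and 15% and 46% against S2, which matches the ranges stated in the text.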

Claims (5)

1. A training method for a language identification model, comprising:
Step 1-1): collecting a certain amount of target-language speech data as training sentences, and extracting the phoneme posterior probabilities of the training sentences;
Step 1-2): transforming the phoneme posterior probabilities to the log domain, performing dimensionality reduction, and applying mean-variance normalization to the dimension-reduced features to obtain phoneme-related features;
Step 1-3): computing Baum-Welch statistics from the phoneme-related features;
Step 1-4): extracting phoneme variability factors from the Baum-Welch statistics;
Step 1-5): modeling the phoneme variability factors to build the language identification model;
wherein the calculation of step 1-4) is:
the phoneme variability factor w of the i-th training sentence is:
$$w = (I + T^{T}\Sigma^{-1}N(i)T)^{-1}T^{T}\Sigma^{-1}F(i)$$
where N(i) is a diagonal matrix whose diagonal elements are $N_c I$, F(i) is obtained by concatenating the first-order Baum-Welch statistics $F_c$, and Σ and T are obtained by training with the expectation-maximization algorithm in the factor analysis process;
and the process of step 1-5) is: using one-versus-one and one-versus-rest strategies, modeling the phoneme variability factors with support vector machines to build support vector machine models, the support vector machine models being the language identification model.
2. The training method for a language identification model according to claim 1, characterized in that the calculation of step 1-2) is:
the transformation matrix $A_{PCA}$ is formed from the eigenvectors corresponding to the L largest eigenvalues of the covariance matrix, the covariance matrix being defined by the following formula:
where N is the number of training sentences and $m_i$ is the mean of the log-domain phoneme posterior probabilities over all frames of the i-th training sentence, obtained by the following formula:
$$m_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}$$
$x_{it}$ is the log-domain phoneme posterior probability vector of frame t of the i-th training sentence, $T_i$ is the number of frames of the i-th training sentence, and the dimension-reduced feature is:
mean-variance normalization is applied to the dimension-reduced feature $y_{it}$ to obtain the phoneme-related feature vector $z_{it}$.
3. The training method for a language identification model according to claim 1, characterized in that the Baum-Welch statistics in step 1-3) are computed as:
$$N_c = \sum_{t=1}^{T_i} p(c \mid z_{it}, \Omega)$$
$$F_c = \sum_{t=1}^{T_i} p(c \mid z_{it}, \Omega)\,(z_{it} - \mu_c)$$
where c indexes the Gaussian components, Ω is the variance of the universal background model, $p(c \mid z_{it}, \Omega)$ is the probability that frame t belongs to the c-th Gaussian component, and $\mu_c$ is the mean vector of the c-th Gaussian component.
4. A language identification method, based on the training method for a language identification model according to any one of claims 1-3, the method comprising the following steps:
Step 2-1): extracting the phoneme posterior probabilities of the sentence to be identified;
Step 2-2): transforming the phoneme posterior probabilities to the log domain, performing dimensionality reduction, and applying mean-variance normalization to the dimension-reduced features to obtain phoneme-related features;
Step 2-3): computing Baum-Welch statistics from the phoneme-related features;
Step 2-4): extracting the phoneme variability factor from the Baum-Welch statistics;
Step 2-5): scoring the phoneme variability factor against the language identification model, applying mean-variance normalization to the scores, and calibrating the normalized scores with linear discriminant analysis and Gaussian back-end normalization to obtain the final recognition result.
5. The language identification method according to claim 4, characterized in that the mean-variance normalization in step 2-5) is computed as:
where M is the number of language identification models, $s_m$ is the initial score of the m-th language identification model, μ and σ are respectively the mean and standard deviation of all language identification model scores, k is an adjustable parameter, and $s''_m$ is the normalized score.
CN201410336650.3A 2014-07-15 2014-07-15 Training method for a language identification model, and language identification method Active CN105280181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410336650.3A CN105280181B (en) 2014-07-15 2014-07-15 Training method for a language identification model, and language identification method

Publications (2)

Publication Number Publication Date
CN105280181A CN105280181A (en) 2016-01-27
CN105280181B true CN105280181B (en) 2018-11-13

Family

ID=55149073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410336650.3A Active CN105280181B (en) 2014-07-15 2014-07-15 Training method for a language identification model, and language identification method

Country Status (1)

Country Link
CN (1) CN105280181B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369440B (en) * 2017-08-02 2021-04-09 北京灵伴未来科技有限公司 Training method and device of speaker recognition model for short voice
CN108269574B (en) * 2017-12-29 2021-05-25 安徽科大讯飞医疗信息技术有限公司 Method and device for processing voice signal to represent vocal cord state of user, storage medium and electronic equipment
CN108648747B (en) * 2018-03-21 2020-06-02 清华大学 Language identification system
CN108510977B (en) * 2018-03-21 2020-05-22 清华大学 Language identification method and computer equipment
CN110858477B (en) * 2018-08-13 2022-05-03 中国科学院声学研究所 Language identification and classification method and device based on noise reduction automatic encoder
CN113744717A (en) * 2020-05-15 2021-12-03 阿里巴巴集团控股有限公司 Language identification method and device
CN112270923A (en) * 2020-10-22 2021-01-26 江苏峰鑫网络科技有限公司 Semantic recognition system based on neural network
CN115394288B (en) * 2022-10-28 2023-01-24 成都爱维译科技有限公司 Language identification method and system for civil aviation multi-language radio land-air conversation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599126A (en) * 2009-04-22 2009-12-09 哈尔滨工业大学 Utilize the support vector machine classifier of overall intercommunication weighting
CN101702314A (en) * 2009-10-13 2010-05-05 清华大学 Method for establishing identified type language recognition model based on language pair
CN103077709A (en) * 2012-12-28 2013-05-01 中国科学院声学研究所 Method and device for identifying languages based on common identification subspace mapping

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Language recognition system using language branch discriminative information;Xianliang Wang etc;《Acoustics,Speech and Signal Processing(ICASSP),2014 IEEE International Conference on》;20140714;第5364-5368页 *
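Language identification method based on SVM one-versus-one classification;王宪亮 et al.;《Journal of Tsinghua University (Science and Technology)》;20130630;Vol. 53 (No. 6);pp. 808-812 *
Language identification based on phoneme-level information;仲海兵;《China Master's Theses Full-text Database》;20110915(No. 09);pp. 1-42 *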

Also Published As

Publication number Publication date
CN105280181A (en) 2016-01-27


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant