CN105280181B - Training method for a language identification model and language identification method - Google Patents
Training method for a language identification model and language identification method
- Publication number: CN105280181B
- Application number: CN201410336650.3A
- Authority: CN (China)
- Prior art keywords: phoneme, normalization, language, identification model, variability factor
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The present invention relates to a training method for a language identification model and to a language identification method, comprising: extracting the phoneme posterior probabilities of training speech data, transforming the phoneme posterior probabilities into the log domain, and applying dimensionality reduction and mean-variance normalization to obtain phoneme-related features; computing Baum-Welch statistics from the phoneme-related features, and extracting phoneme variability factors from the Baum-Welch statistics; modeling the phoneme variability factors to build SVM models (the language identification model); scoring the phoneme variability factor of the speech data to be identified against the SVM models, applying mean-variance normalization to the scores, and calibrating the normalized scores using linear discriminant analysis and Gaussian backend normalization to obtain the final identification result. Compared with traditional language identification methods, this method reduces computational complexity, significantly improves language identification performance, and is highly practical.
Description
Technical field
The present invention relates to methods for identifying the language of speech data, and more particularly to a language identification method based on phoneme-related features.
Background art
With the globalization of information in modern society, language identification has become one of the research hotspots of speech recognition technology. The goal of language identification technology is to build machines that can, to some extent, imitate human thinking in identifying the language of speech, that is, extract the distinguishing information of each language from the speech signal and determine the language on that basis. The features extracted from the speech signal directly affect the result of language identification.
Mainstream language identification techniques fall into two broad classes: those based on acoustic spectral features and those based on phoneme features. The acoustic spectral feature is the shifted delta cepstral feature of the Mel cepstrum (MSDC) (document [1]: P. A. Torres-Carrasquillo, E. Singer, M. A. Kohler, R. J. Greene, D. A. Reynolds, and J. R. Deller Jr., "Approaches to language identification using Gaussian mixture models and shifted delta cepstral features," in Seventh International Conference on Spoken Language Processing, Citeseer, 2002). Methods based on acoustic spectral features take the cepstral features extracted from speech as the features of that speech and then model them, without using any pronunciation information. Modeling usually uses Gaussian mixture models (GMM) (document [2]: L. Burget, P. Matejka, and J. Cernocky, "Discriminative training techniques for acoustic language identification," International Conference on Acoustics, Speech, and Signal Processing, vol. 1, 2006) and support vector machines (SVM) (document [3]: W. M. Campbell, J. P. Campbell, D. A. Reynolds, E. Singer, and P. A. Torres-Carrasquillo, "Support vector machines for speaker and language recognition," Computer Speech & Language, vol. 20, no. 2-3, pp. 210-229, 2006). The i-vector system based on factor analysis (document [4]: Najim Dehak, Pedro A. Torres-Carrasquillo, Douglas A. Reynolds, and Reda Dehak, "Language recognition via i-vectors and dimensionality reduction," in INTERSPEECH, 2011, pp. 857-860) has achieved good performance in language identification and is widely used. The i-vector method defines a low-dimensional space called the total variability space, which contains both the speaker space and the channel space, and represents the high-dimensional Gaussian supervector as a low-dimensional total variability factor. Experiments show that the low-dimensional total variability factor can fully characterize the high-dimensional Gaussian supervector. After being introduced into language identification, this method quickly became the mainstream approach to acoustic modeling, and much language identification research builds on it. However, research on the i-vector method in language identification has been limited to acoustic spectral features and has not been extended to phoneme features, which carry rich pronunciation information.
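For reference, the i-vector model described above is usually written as follows. This is the standard formulation associated with document [4], not a formula reproduced from this patent; the symbols $M$ and $m$ are introduced here for illustration.

```latex
M = m + T w
```

Here $M$ is the utterance-dependent Gaussian supervector, $m$ is the supervector of universal background model means, $T$ is the low-rank total variability matrix, and $w$ is the low-dimensional total variability factor.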
Language identification systems based on phoneme features use a phoneme recognizer to decode speech into a phoneme sequence or phoneme lattice, and then model the languages using grammatical features (document [5]: W. M. Campbell, F. Richardson, and D. A. Reynolds, "Language recognition with word lattices and support vector machines," International Conference on Acoustics, Speech, and Signal Processing, vol. 4, 2007). PPRVSM (H. Li, B. Ma, and C.-H. Lee, "A vector space modeling approach to spoken language identification," IEEE Transactions on Audio, Speech, and Language Processing, 15 (1) (2007) 271-284) introduces the vector space model into phoneme-recognition-based language identification: the phoneme sequence or phoneme lattice is treated as "text", discriminative phoneme strings are extracted from it as feature items to form a feature vector, and a support vector machine is then used for classification, yielding good language identification performance.
Traditional identification systems based on phoneme features take the pronunciation characteristics of speech into account and outperform systems based on acoustic spectral features in recognition performance. However, because decoding phoneme sequences is computationally expensive and slow, they are seldom used in real systems.
Summary of the invention
The object of the present invention is to overcome the defect that traditional methods based on acoustic spectral features lack pronunciation information, and the defect that traditional methods based on phoneme features incur high computational complexity and long running time when decoding phoneme sequences, thereby providing a language identification method that reduces computational complexity and improves recognition performance.
To achieve the above object, the present invention provides a training method for a language identification model and a language identification method, wherein the training method for the language identification model comprises the following steps:
Step 1-1), collect a certain amount of target-language speech data as training utterances, and extract the phoneme posterior probabilities of the training utterances;
Step 1-2), transform the phoneme posterior probabilities into the log domain and reduce their dimensionality, then apply mean-variance normalization (MVN) to the reduced features to obtain phoneme-related features;
Let $x_{it}$ be the log-domain phoneme posterior probability vector of the $t$-th frame of the $i$-th training utterance, $T_i$ the number of frames of the $i$-th training utterance, and $m_i$ the mean of the log-domain phoneme posterior probabilities over all frames of the $i$-th training utterance, given by:
$$m_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}$$
The covariance matrix $C$ is computed from the means of the log-domain phoneme posterior probabilities over all frames of the training utterances, according to the following formula:
where $N$ is the number of training utterances;
The PCA transformation matrix $A_{PCA}$ is formed from the eigenvectors corresponding to the $L$ largest eigenvalues of the covariance matrix ($L$ is chosen close to the number of phonemes). The reduced feature is obtained from the PCA transformation matrix $A_{PCA}$ and the log-domain phoneme posterior probability vector, and is expressed as:
Mean-variance normalization (MVN) is applied to the reduced feature $y_{it}$ to obtain the phoneme-related feature vector $z_{it}$.
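The covariance and projection formulas are not legible in this text. A plausible reconstruction from the definitions above, offered as an assumption rather than as the patent's exact formulas, is the following, where $\bar{m}$ (the mean of the per-utterance means) is a symbol introduced here:

```latex
\bar{m} = \frac{1}{N}\sum_{i=1}^{N} m_i, \qquad
C = \frac{1}{N}\sum_{i=1}^{N} (m_i - \bar{m})(m_i - \bar{m})^{\mathsf{T}}, \qquad
y_{it} = A_{PCA}\, x_{it}
```

Under this reading the rows of $A_{PCA}$ are the $L$ leading eigenvectors of $C$. Whether the original centres the means or uses a different normalizing constant cannot be recovered from the text.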
Step 1-3), compute Baum-Welch statistics from the phoneme-related features;
where $c$ indexes the Gaussian components, $\Omega$ is the variance of the universal background model (UBM), $p(c \mid z_{it}, \Omega)$ is the probability that the $t$-th frame belongs to the $c$-th Gaussian component, and $\mu_c$ is the mean vector of the $c$-th Gaussian component.
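The definitions above match the zeroth- and first-order Baum-Welch statistics commonly used in factor analysis. A sketch of that standard form, assumed here rather than taken from the patent's own formulas, is:

```latex
N_c = \sum_{t=1}^{T_i} p(c \mid z_{it}, \Omega), \qquad
F_c = \sum_{t=1}^{T_i} p(c \mid z_{it}, \Omega)\,(z_{it} - \mu_c)
```

$N_c$ is the soft count of frames assigned to component $c$, and $F_c$ is the first-order statistic centred on the component mean $\mu_c$.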
Step 1-4), extract the phoneme variability factor from the Baum-Welch statistics;
The phoneme variability factor $w$ of the $i$-th training utterance is obtained by:
$$w = \left(I + T^{\mathsf{T}}\Sigma^{-1}N(i)\,T\right)^{-1} T^{\mathsf{T}}\Sigma^{-1}F(i) \tag{6}$$
where $N(i)$ is a diagonal matrix whose diagonal blocks are $N_c I$, $F(i)$ is formed by concatenating the first-order Baum-Welch statistics $F_c$, and $\Sigma$ and $T$ are trained by the EM algorithm during factor analysis.
Step 1-5), model the phoneme variability factors to build the language identification model;
Using the one-versus-one and one-versus-rest strategies, the phoneme variability factors are modeled with SVMs to build SVM models; the SVM models constitute the language identification model.
The language identification method provided by the present invention is based on the above training method for the language identification model and comprises the following steps:
Step 2-1), extract the phoneme posterior probabilities of the speech data to be identified;
Step 2-2), transform the phoneme posterior probabilities into the log domain and reduce their dimensionality, then apply mean-variance normalization (MVN) to the reduced features to obtain phoneme-related features;
Step 2-3), compute Baum-Welch statistics from the phoneme-related features;
Step 2-4), extract the phoneme variability factor from the Baum-Welch statistics;
Step 2-5), score the phoneme variability factor against the SVM models, apply mean-variance normalization to the scores, and calibrate the normalized scores using linear discriminant analysis (LDA) and Gaussian backend normalization to obtain the final identification result;
The mean-variance normalization is computed as:
where $M$ is the number of SVM models, $s_m$ is the raw score of the $m$-th SVM model, $\mu$ and $\sigma$ are respectively the mean and standard deviation of all SVM model scores for the test data, $k$ is an adjustable parameter, and $s''_m$ is the normalized score.
The advantages of the present invention are:
1. The pronunciation characteristics of the language are taken into account, so the differences between languages are more pronounced;
2. Phoneme-related features are used for factor analysis, which improves the language identification performance of the system;
3. The decoding process of traditional phoneme-feature-based identification systems is eliminated, which greatly reduces the computational complexity of the system.
Brief description of the drawings
Fig. 1 is a flow chart of the training method for the language identification model;
Fig. 2 is a flow chart of the language identification method.
Detailed description
The present invention is described in further detail below with reference to the drawings.
Referring to Fig. 1, the training method for the language identification model comprises:
Step 1-1), collect a certain amount of target-language speech data as training data and extract the phoneme posterior probabilities;
A certain amount of target-language speech data is collected as training data. Conventional speech front-end processing removes invalid segments such as silence and music from the training data and retains the valid speech. Temporal pattern (TRAP) features are then extracted separately for the left band and the right band. Each frame is 25 ms long with a frame shift of 10 ms, and the left and right bands each take the features of 15 frames, so the feature of each frame covers the surrounding 310 ms. The TRAP features of the left and right bands are fed to separate artificial neural networks to obtain the phoneme posterior probabilities of the two bands; these are concatenated and processed by another artificial neural network, finally yielding 159-dimensional phoneme posterior probabilities.
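The 310 ms figure can be checked with a short calculation. The sketch below assumes that the context is the current frame plus 15 frames on each side, which is one reading of the passage above; the variable names are illustrative.

```python
FRAME_LENGTH_MS = 25   # length of one analysis frame
FRAME_SHIFT_MS = 10    # shift between consecutive frames
CONTEXT_FRAMES = 15    # assumption: frames taken on each side of the current frame

frames_in_window = 2 * CONTEXT_FRAMES + 1               # 31 frames in total
span_by_shift_ms = frames_in_window * FRAME_SHIFT_MS    # counts one shift per frame
span_with_last_frame_ms = (frames_in_window - 1) * FRAME_SHIFT_MS + FRAME_LENGTH_MS

print(frames_in_window, span_by_shift_ms, span_with_last_frame_ms)
```

Counting one 10 ms shift per frame gives 310 ms, which agrees with the text; counting the full 25 ms length of the last frame would give 325 ms instead.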
Step 1-2), transform the phoneme posterior probabilities into the log domain, reduce their dimensionality using principal component analysis (PCA), and apply mean-variance normalization (MVN) to the reduced features to obtain phoneme-related features;
Let $x_{it}$ be the log-domain phoneme posterior probability vector of the $t$-th frame of the $i$-th training utterance, $T_i$ the number of frames of the $i$-th training utterance, and $m_i$ the mean of the log-domain phoneme posterior probabilities over all frames of the $i$-th training utterance, given by:
$$m_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}$$
The covariance matrix $C$ is computed from the means $m_i$ of the log-domain phoneme posterior probabilities over all frames of the training data, according to the following formula:
where $N$ is the number of training utterances;
The PCA transformation matrix $A_{PCA}$ is formed from the eigenvectors corresponding to the 56 largest eigenvalues of the covariance matrix. The reduced feature is obtained from the PCA transformation matrix $A_{PCA}$ and the log-domain phoneme posterior probability vector, and is expressed as:
Mean-variance normalization (MVN) is applied to the reduced feature $y_{it}$ to remove the influence of pronunciation variation, yielding the phoneme-related feature $z_{it}$, which has 56 dimensions.
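Step 1-2) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the posterior floor, the centring of the per-utterance means before the covariance is computed, and the per-utterance scope of MVN are all assumptions, and the function names are hypothetical.

```python
import numpy as np

LOG_FLOOR = 1e-10  # assumption: floor posteriors so that log() stays finite


def to_log_domain(posteriors):
    """Map a (frames, dims) array of phoneme posteriors to the log domain."""
    return np.log(np.maximum(posteriors, LOG_FLOOR))


def train_pca(utterances, n_components):
    """Estimate A_PCA from the per-utterance means m_i of the log posteriors."""
    means = np.stack([to_log_domain(u).mean(axis=0) for u in utterances])
    centered = means - means.mean(axis=0)          # assumption: centre the m_i
    cov = centered.T @ centered / len(utterances)  # covariance matrix C
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    leading = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, leading].T                   # shape (n_components, dims)


def phoneme_features(posteriors, a_pca):
    """Log transform, PCA projection (y_it), then per-utterance MVN (z_it)."""
    y = to_log_domain(posteriors) @ a_pca.T
    return (y - y.mean(axis=0)) / (y.std(axis=0) + 1e-8)
```

With the settings of this embodiment, the input arrays would have 159 columns and `n_components` would be 56.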
Step 1-3), compute Baum-Welch statistics from the phoneme-related features;
where $c$ indexes the Gaussian components, $\Omega$ is the variance of the universal background model (UBM), $p(c \mid z_{it}, \Omega)$ is the probability that the $t$-th frame belongs to the $c$-th Gaussian component, and $\mu_c$ is the mean vector of the $c$-th Gaussian component. The number of Gaussian components is 1024.
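The statistics of step 1-3) can be sketched as below, assuming a diagonal-covariance GMM as the UBM and a first-order statistic centred on the component means. Both are common choices but are assumptions here, and `baum_welch_stats` is a hypothetical name.

```python
import numpy as np


def baum_welch_stats(z, weights, means, variances):
    """Zeroth- and first-order statistics of one utterance.

    z: (frames, dims) phoneme-related features
    weights: (components,), means and variances: (components, dims)
    """
    diff = z[:, None, :] - means[None, :, :]                     # (frames, comps, dims)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)              # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                      # p(c | z_t, UBM)
    n = post.sum(axis=0)                                         # N_c
    f = post.T @ z - n[:, None] * means                          # F_c, centred on mu_c
    return n, f
```

The zeroth-order statistics sum to the number of frames, which is a quick sanity check when wiring this into a larger pipeline.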
Step 1-4), extract the phoneme variability factor from the Baum-Welch statistics;
The phoneme variability factor $w$ of the $i$-th training utterance is:
$$w = \left(I + T^{\mathsf{T}}\Sigma^{-1}N(i)\,T\right)^{-1} T^{\mathsf{T}}\Sigma^{-1}F(i)$$
where $N(i)$ is a diagonal matrix whose diagonal blocks are $N_c I$, $F(i)$ is formed by concatenating the first-order Baum-Welch statistics $F_c$, and $\Sigma$ and $T$ are trained by the EM algorithm during factor analysis.
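The formula for $w$ translates directly into a linear solve. The sketch below assumes that $\Sigma$ is diagonal and stored as a vector, and that the supervector is ordered component by component; the function and argument names are hypothetical.

```python
import numpy as np


def extract_factor(n, f, t_matrix, sigma_diag):
    """Compute w = (I + T^T S^-1 N T)^-1 T^T S^-1 F for one utterance.

    n: (components,) zeroth-order statistics N_c
    f: (components, dims) first-order statistics F_c
    t_matrix: (components * dims, rank) matrix T
    sigma_diag: (components * dims,) diagonal of Sigma (assumed diagonal)
    """
    n_comp, dims = f.shape
    n_diag = np.repeat(n, dims)               # diagonal of N(i): each N_c repeated dims times
    f_super = f.reshape(n_comp * dims)        # F(i): the F_c concatenated
    t_sigma_inv = t_matrix.T / sigma_diag     # T^T Sigma^-1
    precision = np.eye(t_matrix.shape[1]) + (t_sigma_inv * n_diag) @ t_matrix
    return np.linalg.solve(precision, t_sigma_inv @ f_super)
```

Solving the linear system avoids forming the explicit inverse, which is usually cheaper and numerically better behaved.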
Step 1-5), model the phoneme variability factors to build the language identification model;
Using the one-versus-one and one-versus-rest strategies, the phoneme variability factors are modeled with SVMs to build $M$ SVM models; the SVM models constitute the language identification model.
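The two strategies determine which binary classification tasks are trained. The sketch below only enumerates those tasks for a hypothetical list of 24 language labels; it does not train any SVM, and whether the patent's $M$ equals the total shown here is not stated in the text.

```python
from itertools import combinations


def svm_tasks(languages):
    """List the binary tasks implied by the two strategies."""
    one_vs_one = list(combinations(languages, 2))            # one model per language pair
    one_vs_rest = [(lang, "rest") for lang in languages]     # one model per language
    return one_vs_one, one_vs_rest


languages = [f"lang{i:02d}" for i in range(24)]  # placeholder labels
ovo, ovr = svm_tasks(languages)
print(len(ovo), len(ovr))  # prints "276 24"
```

For 24 languages there are 276 language pairs, the same count that appears in the pair-based evaluation metrics.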
Referring to Fig. 2, the language identification method comprises the following steps:
Step 2-1), extract the temporal pattern features of the speech data to be identified, and extract the phoneme posterior probabilities using artificial neural networks;
Step 2-2), transform the phoneme posterior probabilities into the log domain, reduce their dimensionality using principal component analysis (PCA), and apply mean-variance normalization (MVN) to the reduced features to obtain phoneme-related features;
Step 2-3), compute Baum-Welch statistics from the phoneme-related features;
Step 2-4), extract the phoneme variability factor from the Baum-Welch statistics;
Step 2-5), score the phoneme variability factor against the $M$ SVM models, apply mean-variance normalization to the scores, and calibrate the normalized scores using linear discriminant analysis (LDA) and Gaussian backend normalization to obtain the final identification result;
The mean-variance normalization is computed as:
where $s_m$ is the raw score of the $m$-th SVM model, $\mu$ and $\sigma$ are respectively the mean and standard deviation of all SVM model scores for the test data, $k = 100$, and $s''_m$ is the normalized score.
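The role of the parameter $k$ cannot be recovered from this text, so the sketch below covers only the plain standardization by $\mu$ and $\sigma$ over the $M$ model scores of one test utterance. It is an assumption about the first stage of the normalization, not the patent's full formula, and the function name is hypothetical.

```python
import numpy as np


def mean_variance_normalize(scores, eps=1e-12):
    """Standardize the M raw SVM scores of one test utterance."""
    scores = np.asarray(scores, dtype=float)
    mu = scores.mean()      # mean over the M model scores
    sigma = scores.std()    # standard deviation over the M model scores
    return (scores - mu) / (sigma + eps)


normalized = mean_variance_normalize([0.3, -1.2, 2.1, 0.4])
print(normalized)
```

After this step the scores of each utterance have zero mean and unit standard deviation, which makes scores from different utterances comparable before calibration.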
Experiments were conducted on the 2011 Language Recognition Evaluation (LRE) data set of the U.S. National Institute of Standards and Technology (NIST). The test covers 24 target languages. The performance metrics are EER (equal error rate), minDCF (minimum detection cost), minCavg276 (minimum average cost over the 276 language pairs), actCavg276 (actual average cost over the 276 language pairs), minCavg24 (minimum average cost over the 24 worst language pairs), and actCavg24 (actual average cost over the 24 worst language pairs). S1 denotes the traditional i-vector method, S2 denotes the traditional phoneme-feature-based PPRVSM method, and S3 denotes the language identification method proposed by the present invention. A Russian phoneme recognizer was used in the actual tests. The performance metrics of each method are compared in Table 1.
Table 1
Method | EER | minDCF | minCavg276 | actCavg276 | minCavg24 | actCavg24
---|---|---|---|---|---|---
S1 | 6.65 | 6.86 | 2.45 | 3.45 | 11.88 | 14.52
S2 | 7.62 | 8.04 | 2.67 | 4.68 | 11.68 | 14.09
S3 | 5.52 | 5.75 | 1.45 | 2.68 | 9.00 | 12.01
The language identification method proposed by the present invention applies phoneme-related features, which are tied to the pronunciation content, to factor analysis. The test results show that, compared with the traditional i-vector method, the proposed method achieves a relative improvement of 16%-41% in recognition performance, and compared with the traditional phoneme-feature-based PPRVSM method, a relative improvement of 15%-46%.
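The quoted ranges can be reproduced from Table 1 by computing, for each metric, the relative reduction achieved by S3 over a baseline. The sketch below assumes that "relative improvement" means (baseline - proposed) / baseline; the dictionary and function names are illustrative.

```python
# Columns: EER, minDCF, minCavg276, actCavg276, minCavg24, actCavg24 (from Table 1)
TABLE_1 = {
    "S1": [6.65, 6.86, 2.45, 3.45, 11.88, 14.52],
    "S2": [7.62, 8.04, 2.67, 4.68, 11.68, 14.09],
    "S3": [5.52, 5.75, 1.45, 2.68, 9.00, 12.01],
}


def relative_reduction(baseline, proposed):
    """Relative reduction in percent for each metric (lower values are better)."""
    return [100.0 * (b - p) / b for b, p in zip(baseline, proposed)]


ranges = {}
for name in ("S1", "S2"):
    gains = relative_reduction(TABLE_1[name], TABLE_1["S3"])
    ranges[name] = (round(min(gains)), round(max(gains)))
    print(name, ranges[name])
# prints:
# S1 (16, 41)
# S2 (15, 46)
```

The smallest gain over S1 is on minDCF and the largest on minCavg276; over S2 the smallest is on actCavg24 and the largest on minCavg276.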
Claims (5)
1. A training method for a language identification model, comprising:
Step 1-1), collecting a certain amount of target-language speech data as training utterances, and extracting the phoneme posterior probabilities of the training utterances;
Step 1-2), transforming the phoneme posterior probabilities into the log domain and reducing their dimensionality, then applying mean-variance normalization to the reduced features to obtain phoneme-related features;
Step 1-3), computing Baum-Welch statistics from the phoneme-related features;
Step 1-4), extracting phoneme variability factors from the Baum-Welch statistics;
Step 1-5), modeling the phoneme variability factors to build the language identification model;
the computation of step 1-4) being:
the phoneme variability factor $w$ of the $i$-th training utterance is:
$$w = \left(I + T^{\mathsf{T}}\Sigma^{-1}N(i)\,T\right)^{-1} T^{\mathsf{T}}\Sigma^{-1}F(i)$$
where $N(i)$ is a diagonal matrix whose diagonal blocks are $N_c I$, $F(i)$ is formed by concatenating the first-order Baum-Welch statistics $F_c$, and $\Sigma$ and $T$ are trained by the expectation-maximization algorithm during factor analysis;
the process of step 1-5) being: using the one-versus-one and one-versus-rest strategies, the phoneme variability factors are modeled with support vector machines to build support vector machine models, the support vector machine models being the language identification model.
2. The training method for a language identification model according to claim 1, characterized in that the computation of step 1-2) is:
the transformation matrix $A_{PCA}$ is formed from the eigenvectors corresponding to the $L$ largest eigenvalues of the covariance matrix, the covariance matrix being defined by the following formula:
where $N$ is the number of training utterances and $m_i$ is the mean of the log-domain phoneme posterior probabilities over all frames of the $i$-th training utterance, given by:
$$m_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}$$
$x_{it}$ is the log-domain phoneme posterior probability vector of the $t$-th frame of the $i$-th training utterance, and $T_i$ is the number of frames of the $i$-th training utterance; the reduced feature is:
mean-variance normalization is applied to the reduced feature $y_{it}$ to obtain the phoneme-related feature vector $z_{it}$.
3. The training method for a language identification model according to claim 1, characterized in that the Baum-Welch statistics in step 1-3) are computed as:
where $c$ indexes the Gaussian components, $\Omega$ is the variance of the universal background model, $p(c \mid z_{it}, \Omega)$ is the probability that the $t$-th frame belongs to the $c$-th Gaussian component, and $\mu_c$ is the mean vector of the $c$-th Gaussian component.
4. A language identification method based on the training method for a language identification model according to any one of claims 1-3, the method comprising the following steps:
Step 2-1), extracting the phoneme posterior probabilities of the utterance to be identified;
Step 2-2), transforming the phoneme posterior probabilities into the log domain and reducing their dimensionality, then applying mean-variance normalization to the reduced features to obtain phoneme-related features;
Step 2-3), computing Baum-Welch statistics from the phoneme-related features;
Step 2-4), extracting the phoneme variability factor from the Baum-Welch statistics;
Step 2-5), scoring the phoneme variability factor against the language identification model, applying mean-variance normalization to the scores, and calibrating the normalized scores using linear discriminant analysis and Gaussian backend normalization to obtain the final identification result.
5. The language identification method according to claim 4, characterized in that the mean-variance normalization in step 2-5) is computed as:
where $M$ is the number of language identification models, $s_m$ is the raw score of the $m$-th language identification model, $\mu$ and $\sigma$ are respectively the mean and standard deviation of all language identification model scores, $k$ is an adjustable parameter, and $s''_m$ is the normalized score.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410336650.3A CN105280181B (en) | 2014-07-15 | 2014-07-15 | Training method for a language identification model and language identification method |
Publications (2)
Publication Number | Publication Date
---|---
CN105280181A | 2016-01-27
CN105280181B | 2018-11-13
Family
ID=55149073
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410336650.3A Active CN105280181B (en) | Training method for a language identification model and language identification method | 2014-07-15 | 2014-07-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105280181B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107369440B (en) * | 2017-08-02 | 2021-04-09 | 北京灵伴未来科技有限公司 | Training method and device of speaker recognition model for short voice |
CN108269574B (en) * | 2017-12-29 | 2021-05-25 | 安徽科大讯飞医疗信息技术有限公司 | Method and device for processing voice signal to represent vocal cord state of user, storage medium and electronic equipment |
CN108648747B (en) * | 2018-03-21 | 2020-06-02 | 清华大学 | Language identification system |
CN108510977B (en) * | 2018-03-21 | 2020-05-22 | 清华大学 | Language identification method and computer equipment |
CN110858477B (en) * | 2018-08-13 | 2022-05-03 | 中国科学院声学研究所 | Language identification and classification method and device based on noise reduction automatic encoder |
CN113744717A (en) * | 2020-05-15 | 2021-12-03 | 阿里巴巴集团控股有限公司 | Language identification method and device |
CN112270923A (en) * | 2020-10-22 | 2021-01-26 | 江苏峰鑫网络科技有限公司 | Semantic recognition system based on neural network |
CN115394288B (en) * | 2022-10-28 | 2023-01-24 | 成都爱维译科技有限公司 | Language identification method and system for civil aviation multi-language radio land-air conversation |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101599126A (en) * | 2009-04-22 | 2009-12-09 | 哈尔滨工业大学 | Utilize the support vector machine classifier of overall intercommunication weighting |
CN101702314A (en) * | 2009-10-13 | 2010-05-05 | 清华大学 | Method for establishing identified type language recognition model based on language pair |
CN103077709A (en) * | 2012-12-28 | 2013-05-01 | 中国科学院声学研究所 | Method and device for identifying languages based on common identification subspace mapping |
Non-Patent Citations (3)
Title |
---|
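Language recognition system using language branch discriminative information; Xianliang Wang et al.; Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on; 2014-07-14; pp. 5364-5368 * |
Language identification method based on SVM one-versus-one classification; Wang Xianliang et al.; Journal of Tsinghua University (Science and Technology); 2013-06-30; vol. 53, no. 6; pp. 808-812 * |
Language identification based on phoneme-level information; Zhong Haibing; China Master's Theses Full-text Database; 2011-09-15 (no. 09); pp. 1-42 * |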
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105280181B (en) | Training method for a language identification model and language identification method | |
CN105632501B (en) | A kind of automatic accent classification method and device based on depth learning technology | |
Singer et al. | The MITLL NIST LRE 2011 language recognition system | |
US8478591B2 (en) | Phonetic variation model building apparatus and method and phonetic recognition system and method thereof | |
US9355642B2 (en) | Speaker recognition method through emotional model synthesis based on neighbors preserving principle | |
Lozano-Diez et al. | Analysis and Optimization of Bottleneck Features for Speaker Recognition. | |
CN107342077A (en) | A kind of speaker segmentation clustering method and system based on factorial analysis | |
EP2888669B1 (en) | Method and system for selectively biased linear discriminant analysis in automatic speech recognition systems | |
CN104240706B (en) | It is a kind of that the method for distinguishing speek person that similarity corrects score is matched based on GMM Token | |
CN103456302B (en) | A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight | |
CN107093422A (en) | A kind of audio recognition method and speech recognition system | |
Franco et al. | Adaptive and discriminative modeling for improved mispronunciation detection | |
Shekofteh et al. | Feature extraction based on speech attractors in the reconstructed phase space for automatic speech recognition systems | |
CN106297769B (en) | A kind of distinctive feature extracting method applied to languages identification | |
Kockmann et al. | Investigations into prosodic syllable contour features for speaker recognition | |
Maghsoodi et al. | Speaker recognition with random digit strings using uncertainty normalized HMM-based i-vectors | |
CN104376850B (en) | A kind of fundamental frequency estimation method of Chinese ear voice | |
Wang et al. | CUHK System for the Spoken Web Search task at Mediaeval 2012. | |
Zheng et al. | Exploring robustness of DNN/RNN for extracting speaker baum-welch statistics in mismatched conditions. | |
CN104240699A (en) | Simple and effective phrase speech recognition method | |
Diez et al. | New insight into the use of phone log-likelihood ratios as features for language recognition | |
Suo et al. | Using SVM as back-end classifier for language identification | |
Laskar et al. | HiLAM-state discriminative multi-task deep neural network in dynamic time warping framework for text-dependent speaker verification | |
Sarkar et al. | Fast approach to speaker identification for large population using MLLR and sufficient statistics | |
Diez et al. | On the use of dot scoring for speaker diarization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||