CN105280181A - Training method for language recognition model and language recognition method - Google Patents
- Publication number
- CN105280181A (application number CN201410336650.3A)
- Authority
- CN
- China
- Prior art keywords
- phoneme
- model
- recognition
- training
- languages
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention relates to a training method for a language recognition model and a language recognition method. The language recognition method comprises the steps of: extracting phoneme posterior probabilities from speech data, converting them to the log domain, and applying dimensionality reduction and mean-variance normalization to obtain phoneme-correlated features; computing Baum-Welch statistics from the phoneme-correlated features and extracting a phoneme variability factor from those statistics; modeling the phoneme variability factors to build an SVM model (the language recognition model); and scoring the SVM model with the phoneme variability factor of the speech data to be recognized, applying mean-variance normalization to the scores, and performing linear discriminant analysis and Gaussian back-end normalization on the normalized scores for score calibration, yielding the final recognition result. Compared with conventional language recognition methods, the method of the invention reduces computational complexity, markedly improves language recognition performance, and is highly practical.
Description
Technical field
The present invention relates to methods for recognizing the language of speech data, and more particularly to a language recognition method based on phoneme-correlated features.
Background technology
With the globalization of information in modern society, language recognition has become one of the research hotspots in speech technology. The aim of language recognition technology is to build machines that, imitating human reasoning to some extent, identify the language of speech: that is, to extract from the speech signal the information that distinguishes each language and to judge the language on that basis. The speech features extracted directly affect the result of language recognition.
Mainstream language recognition techniques fall into two broad classes: recognition based on acoustic spectral features and recognition based on phoneme features.
Acoustic spectral features refer to the shifted delta cepstral (SDC) features of Mel-frequency cepstra (document [1]: P.A. Torres-Carrasquillo, E. Singer, M.A. Kohler, R.J. Greene, D.A. Reynolds, and J.R. Deller Jr, "Approaches to language identification using Gaussian mixture models and shifted delta cepstral features," in Seventh International Conference on Spoken Language Processing, Citeseer, 2002). Modeling methods based on acoustic spectral features take the cepstral features extracted from speech as the representation of that speech and then model those features, without involving pronunciation information. Modeling usually uses Gaussian mixture models (GMM) (document [2]: L. Burget, P. Matejka and J. Cernocky, "Discriminative training techniques for acoustic language identification", International Conference on Acoustics, Speech, and Signal Processing, vol. 1, 2006) and support vector machine (SVM) models (document [3]: W.M. Campbell, J.P. Campbell, D.A. Reynolds, E. Singer and P.A. Torres-Carrasquillo, "Support vector machines for speaker and language recognition", Computer Speech & Language, vol. 20, no. 2-3, pp. 210-229, 2006). The i-vector approach based on factor analysis (document [4]: Najim Dehak, Pedro A. Torres-Carrasquillo, Douglas A. Reynolds, and Reda Dehak, "Language recognition via i-vectors and dimensionality reduction," in INTERSPEECH, 2011, pp. 857-860) has achieved good performance in language recognition and is widely used. The i-vector method defines a low-dimensional space called the total variability space, which contains both the speaker space and the channel space; the high-dimensional Gaussian supervector is then expressed as a low-dimensional total variability factor, and experiments show that this low-dimensional factor fully characterizes the high-dimensional supervector. After its introduction to language recognition, the method quickly became the mainstream approach to acoustic modeling, and much language recognition research builds on it. However, in language recognition, research on the i-vector method has been confined to acoustic spectral features and has not been generalized to phoneme features, which carry rich pronunciation information.
Language recognition systems based on phoneme features use a phoneme recognizer to decode speech into phoneme sequences or phoneme lattices and then model the languages with grammatical features, as in document [5] (W.M. Campbell, F. Richardson and D.A. Reynolds, "Language recognition with word lattices and support vector machines", International Conference on Acoustics, Speech, and Signal Processing, vol. 4, 2007). PPRVSM (H. Li, B. Ma, C.-H. Lee, "A vector space modeling approach to spoken language identification", IEEE Transactions on Audio, Speech, and Language Processing, 15(1) (2007) 271-284) introduces a vector space model into phoneme-recognition-based language recognition: phoneme sequences or lattices are treated as "text", discriminative phone strings are extracted from them as feature items to form feature vectors, and a support vector machine is then used for classification, achieving good language recognition performance.
Traditional phoneme-feature-based recognition systems take the pronunciation characteristics of speech into account and outperform systems based on acoustic spectral features, but decoding phoneme sequences is computationally expensive and slow, so such systems are seldom used in practice.
Summary of the invention
The object of the invention is to overcome the defect of traditional acoustic-spectral-feature methods, which ignore pronunciation information, and the defect of traditional phoneme-feature methods, whose phoneme-sequence decoding is computationally complex and slow, and thus to provide a language recognition method that reduces computational complexity while improving recognition performance.
To achieve these goals, the invention provides a training method for a language recognition model and a language recognition method, wherein the training method for the language recognition model comprises the following steps:
Step 1-1), collect a quantity of target-language speech data as training utterances and extract the phoneme posterior probabilities of the training utterances;
Step 1-2), transform the phoneme posterior probabilities to the log domain, perform dimensionality reduction, and apply mean-variance normalization (MVN) to the reduced features to obtain phoneme-correlated features;
Let x_it be the t-th frame log-domain phoneme posterior probability vector of the i-th training utterance, T_i the number of frames of the i-th training utterance, and m_i the mean of all frame log-domain phoneme posterior probabilities of the i-th training utterance, obtained as

m_i = (1/T_i) Σ_{t=1}^{T_i} x_it (1)

The covariance matrix C is computed from the per-utterance means of the log-domain phoneme posterior probabilities:

C = (1/N) Σ_{i=1}^{N} (1/T_i) Σ_{t=1}^{T_i} (x_it − m_i)(x_it − m_i)^T (2)

where N is the number of training utterances.

The PCA transformation matrix A_PCA is generated from the eigenvectors corresponding to the L largest eigenvalues of the covariance matrix (L is chosen close to the number of phonemes); the dimensionality-reduced feature is obtained from A_PCA and the log-domain phoneme posterior probability vector:

y_it = A_PCA^T x_it (3)

Mean-variance normalization (MVN) is applied to the reduced feature y_it to obtain the phoneme-correlated feature vector z_it.
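As a minimal sketch of steps 1-1) and 1-2) in NumPy (not the patent's implementation; the toy data, array shapes, and the epsilon guard in the MVN step are assumptions), the PCA transform is estimated from the training utterances' log-domain posteriors and each utterance is then projected and normalized:

```python
import numpy as np

def pca_transform(utterances, L):
    """Estimate the PCA matrix A_PCA from log-domain phoneme posteriors.

    utterances: list of (T_i, D) arrays of log-domain posteriors.
    L: target dimensionality, chosen close to the phoneme count.
    """
    D = utterances[0].shape[1]
    C = np.zeros((D, D))
    for x in utterances:
        d = x - x.mean(axis=0)            # x_it - m_i
        C += d.T @ d / len(x)             # per-utterance covariance
    C /= len(utterances)                  # average over N utterances
    # Eigenvectors of the L largest eigenvalues form A_PCA.
    w, v = np.linalg.eigh(C)
    return v[:, np.argsort(w)[::-1][:L]]  # (D, L)

def phoneme_features(x, A_pca, eps=1e-10):
    """Project one utterance (y_it = A_PCA^T x_it) and apply MVN."""
    y = x @ A_pca
    return (y - y.mean(axis=0)) / (y.std(axis=0) + eps)

# Toy data: 5 utterances of 200 frames of 159-dim log posteriors.
rng = np.random.default_rng(0)
train = [np.log(rng.dirichlet(np.ones(159), size=200) + 1e-10) for _ in range(5)]
A = pca_transform(train, L=56)
z = phoneme_features(train[0], A)
```

After MVN each feature dimension of `z` has zero mean and unit variance over the utterance, matching the role of z_it above.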
Step 1-3), compute Baum-Welch statistics from the phoneme-correlated features;
The zeroth- and first-order Baum-Welch statistics are

N_c(i) = Σ_{t=1}^{T_i} p(c | z_it, Ω) (4)

F_c(i) = Σ_{t=1}^{T_i} p(c | z_it, Ω)(z_it − μ_c) (5)

where c indexes the Gaussian components, Ω denotes the universal background model (UBM), p(c | z_it, Ω) is the probability that frame t belongs to the c-th Gaussian component, and μ_c is the mean vector of the c-th Gaussian component.
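Under the assumption of a diagonal-covariance GMM-UBM (the weights, means, and variances below are illustrative placeholders, and the toy component count stands in for the 1024 Gaussians used in the embodiment), the statistics can be sketched as:

```python
import numpy as np

def baum_welch_stats(Z, weights, means, variances):
    """Zeroth- and first-order Baum-Welch statistics of one utterance.

    Z: (T, D) phoneme-correlated features; the UBM has C diagonal Gaussians.
    Returns N of shape (C,) and F of shape (C, D).
    """
    # Log-likelihood of each frame under each diagonal Gaussian.
    ll = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                 + ((Z[:, None, :] - means) ** 2 / variances).sum(axis=2))
    ll += np.log(weights)
    # Posterior p(c | z_it, Omega), normalized per frame.
    post = np.exp(ll - ll.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    N = post.sum(axis=0)                 # N_c(i) = sum_t p(c | z_it)
    F = post.T @ Z - N[:, None] * means  # F_c(i) = sum_t p(c | z_it)(z_it - mu_c)
    return N, F

rng = np.random.default_rng(1)
C, D, T = 8, 56, 100                     # toy sizes; the patent uses C=1024, D=56
Z = rng.standard_normal((T, D))
weights = np.full(C, 1.0 / C)
means = rng.standard_normal((C, D))
variances = np.ones((C, D))
N, F = baum_welch_stats(Z, weights, means, variances)
```

Because the per-frame posteriors sum to one, the zeroth-order statistics sum to the number of frames T.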
Step 1-4), extract the phoneme variability factor from the Baum-Welch statistics;
The phoneme variability factor w of the i-th training utterance is obtained by

w = (I + T^T Σ^{-1} N(i) T)^{-1} T^T Σ^{-1} F(i) (6)

where N(i) is a diagonal matrix whose diagonal elements are N_c(i), F(i) is obtained by concatenating the first-order Baum-Welch statistics F_c(i), and Σ and T are trained by the EM algorithm as in factor analysis.
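With Σ and T assumed already trained (random placeholders below, with an assumed factor dimension R), equation (6) reduces to one linear solve per utterance; a sketch:

```python
import numpy as np

def variability_factor(N, F, T_mat, Sigma):
    """w = (I + T' Σ^{-1} N(i) T)^{-1} T' Σ^{-1} F(i), as in equation (6).

    N: (C,) zeroth-order stats; F: (C, D) first-order stats.
    T_mat: (C*D, R) total variability matrix; Sigma: (C*D,) diagonal covariance.
    """
    C, D = F.shape
    R = T_mat.shape[1]
    # N(i): diagonal matrix with each N_c repeated over the D feature dims.
    N_diag = np.repeat(N, D)                  # (C*D,)
    TS = T_mat.T / Sigma                      # T' Σ^{-1}, shape (R, C*D)
    A = np.eye(R) + (TS * N_diag) @ T_mat     # I + T' Σ^{-1} N(i) T
    b = TS @ F.reshape(-1)                    # T' Σ^{-1} F(i), F concatenated
    return np.linalg.solve(A, b)

rng = np.random.default_rng(2)
C, D, R = 8, 56, 40                            # R: factor dimension (illustrative)
N = rng.random(C) * 10
F = rng.standard_normal((C, D))
T_mat = rng.standard_normal((C * D, R)) * 0.1
Sigma = np.ones(C * D)
w = variability_factor(N, F, T_mat, Sigma)
```

Because Σ is diagonal here, T' Σ^{-1} is computed by row-wise division rather than an explicit matrix inverse, which keeps the solve cheap even for C = 1024 components.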
Step 1-5), model the phoneme variability factors to build the language recognition model;
Using one-versus-one and one-versus-rest strategies, SVMs are trained on the phoneme variability factors; the resulting SVM models constitute the language recognition model.
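A sketch of this modeling step using scikit-learn (the library choice, the linear kernel, and the toy clustered data are assumptions; the patent only specifies SVM modeling with one-versus-one and one-versus-rest strategies):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(3)
n_lang, per_lang, R = 4, 30, 40           # 4 toy languages, 40-dim factors
# Toy phoneme variability factors: one well-separated cluster per language.
X = np.vstack([rng.standard_normal((per_lang, R)) + 3 * i
               for i in range(n_lang)])
y = np.repeat(np.arange(n_lang), per_lang)

# One-versus-one (SVC's native multiclass scheme) and one-versus-rest models.
ovo = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

pred = ovr.predict(X)
```

In the one-versus-rest case each binary SVM plays the role of one language recognition model, whose decision scores feed the score normalization of step 2-5).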
The language recognition method provided by the invention, based on the training method for the language recognition model of the above technical scheme, comprises the following steps:
Step 2-1), extract the phoneme posterior probabilities of the speech data to be recognized;
Step 2-2), transform the phoneme posterior probabilities to the log domain, perform dimensionality reduction, and apply mean-variance normalization (MVN) to the reduced features to obtain phoneme-correlated features;
Step 2-3), compute Baum-Welch statistics from the phoneme-correlated features;
Step 2-4), extract the phoneme variability factor from the Baum-Welch statistics;
Step 2-5), score the SVM models with the phoneme variability factor, apply mean-variance normalization to the scores, and perform linear discriminant analysis (LDA) and Gaussian back-end normalization on the normalized scores for score calibration to obtain the final recognition result;
The mean-variance normalization of the scores is computed as

s''_m = k (s_m − μ) / σ

where M is the number of support vector machine models, s_m is the initial score of the m-th SVM model, μ and σ are respectively the mean and standard deviation of all M SVM model scores for the test data, k is a tunable parameter, and s''_m is the normalized score.
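The score normalization above takes a few lines (the sample scores are illustrative; k = 100 as used later in the embodiment):

```python
import numpy as np

def mvn_scores(scores, k=100.0):
    """s''_m = k * (s_m - mu) / sigma over the M model scores of one utterance."""
    mu, sigma = scores.mean(), scores.std()
    return k * (scores - mu) / sigma

s = np.array([0.7, -1.2, 0.1, 2.3, -0.4])   # raw scores from M=5 SVM models
s_norm = mvn_scores(s)
```

After normalization the M scores of each test utterance have zero mean and standard deviation k, so scores from different utterances become comparable before the LDA and Gaussian back-end calibration.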
The invention has the following advantages:
1. the pronunciation characteristics of speech are taken into account, making the discriminative information between languages more prominent;
2. phoneme-correlated features are used for factor analysis, improving the language recognition performance of the system;
3. the decoding step of traditional phoneme-feature-based recognition systems is eliminated, greatly reducing the computational complexity of the system.
Accompanying drawing explanation
Fig. 1 is a flowchart of the training method for the language recognition model;
Fig. 2 is a flowchart of the language recognition method.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings.
With reference to Fig. 1, the training method for the language recognition model comprises:
Step 1-1), collect a quantity of target-language speech data as training data and extract phoneme posterior probabilities;
A quantity of target-language speech data is collected as training data. Conventional front-end processing removes silence, music, and other invalid segments from the training data, retaining valid speech. TempoRAl Pattern (TRAP) features are then extracted separately from the left and right frequency bands; each frame is 25 ms long with a 10 ms shift, and 15 frames of context are taken on each side for each band, so each frame's feature spans roughly 310 ms of surrounding speech. The TRAP features of the left and right bands are fed into separate artificial neural networks to obtain per-band phoneme posterior probabilities; the posteriors of the two bands are concatenated and processed by another artificial neural network, finally yielding a 159-dimensional phoneme posterior probability.
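The context-stacking part of this TRAP front end can be sketched as follows (the filterbank computation and the neural networks are omitted; the 15-frame context on each side follows the description above, while the per-band dimensionality and zero padding at the edges are assumptions):

```python
import numpy as np

def stack_context(band_feats, context=15):
    """Stack +/- `context` frames of one band's per-frame features.

    band_feats: (T, B) per-frame energies for one frequency band.
    Returns (T, B * (2*context + 1)); edges are zero-padded.
    """
    T, B = band_feats.shape
    padded = np.pad(band_feats, ((context, context), (0, 0)))
    # Each output frame concatenates the 31-frame window (about 310 ms
    # at a 10 ms frame shift, as in the embodiment).
    return np.stack([padded[t:t + 2 * context + 1].reshape(-1)
                     for t in range(T)])

rng = np.random.default_rng(4)
T, B = 50, 8                 # 50 frames, 8 bands per half (assumed)
left = stack_context(rng.standard_normal((T, B)))
right = stack_context(rng.standard_normal((T, B)))
# After per-band neural networks, the two bands' posteriors would be
# concatenated and merged by a third network into 159-dim posteriors.
```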
Step 1-2), transform the phoneme posterior probabilities to the log domain, apply principal component analysis (PCA) for dimensionality reduction, and apply mean-variance normalization (MVN) to the reduced features to obtain phoneme-correlated features;
Let x_it be the t-th frame log-domain phoneme posterior probability vector of the i-th training utterance, T_i the number of frames of the i-th training utterance, and m_i the mean of all frame log-domain phoneme posterior probabilities of the i-th training utterance:

m_i = (1/T_i) Σ_{t=1}^{T_i} x_it

The covariance matrix C is computed from the per-utterance means m_i:

C = (1/N) Σ_{i=1}^{N} (1/T_i) Σ_{t=1}^{T_i} (x_it − m_i)(x_it − m_i)^T

where N is the number of training utterances.

The PCA transformation matrix A_PCA is generated from the eigenvectors corresponding to the 56 largest eigenvalues of the covariance matrix; the dimensionality-reduced feature is obtained from A_PCA and the log-domain phoneme posterior probability vector:

y_it = A_PCA^T x_it

Mean-variance normalization (MVN) is applied to y_it to remove the influence of pronunciation variability, yielding the 56-dimensional phoneme-correlated feature z_it.
Step 1-3), compute Baum-Welch statistics from the phoneme-correlated features;
The zeroth- and first-order Baum-Welch statistics are

N_c(i) = Σ_{t=1}^{T_i} p(c | z_it, Ω)

F_c(i) = Σ_{t=1}^{T_i} p(c | z_it, Ω)(z_it − μ_c)

where c indexes the Gaussian components, Ω denotes the universal background model (UBM), p(c | z_it, Ω) is the probability that frame t belongs to the c-th Gaussian component, and μ_c is the mean vector of the c-th Gaussian component. The number of Gaussian components is 1024.
Step 1-4), extract the phoneme variability factor from the Baum-Welch statistics;
The phoneme variability factor w of the i-th training utterance is

w = (I + T^T Σ^{-1} N(i) T)^{-1} T^T Σ^{-1} F(i)

where N(i) is a diagonal matrix whose diagonal elements are N_c(i), F(i) is obtained by concatenating the first-order Baum-Welch statistics F_c(i), and Σ and T are trained by the EM algorithm as in factor analysis.
Step 1-5), model the phoneme variability factors to build the language recognition model;
Using one-versus-one and one-versus-rest strategies, SVMs are trained on the phoneme variability factors to build M SVM models; these SVM models constitute the language recognition model.
With reference to Fig. 2, the language recognition method comprises the following steps:
Step 2-1), extract temporal pattern features from the speech data to be recognized and use artificial neural networks to extract phoneme posterior probabilities;
Step 2-2), transform the phoneme posterior probabilities to the log domain, apply principal component analysis (PCA) for dimensionality reduction, and apply mean-variance normalization (MVN) to the reduced features to obtain phoneme-correlated features;
Step 2-3), compute Baum-Welch statistics from the phoneme-correlated features;
Step 2-4), extract the phoneme variability factor from the Baum-Welch statistics;
Step 2-5), score the M SVM models with the phoneme variability factor, apply mean-variance normalization to the scores, and perform linear discriminant analysis (LDA) and Gaussian back-end normalization on the normalized scores for score calibration to obtain the final recognition result;
The mean-variance normalization of the scores is computed as

s''_m = k (s_m − μ) / σ

where s_m is the initial score of the m-th SVM model, μ and σ are respectively the mean and standard deviation of all SVM model scores for the test data, k = 100, and s''_m is the normalized score.
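The back-end calibration of step 2-5) can be sketched with scikit-learn; the toy score vectors and dimensions are placeholders, and using `LinearDiscriminantAnalysis` followed by its shared-covariance Gaussian class model is one reasonable reading of "LDA plus a Gaussian back end", not the patent's exact recipe:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
n_lang, per_lang = 4, 50
# Normalized score vectors (one entry per SVM model) for development data,
# with each language's scores clustered around its own model's dimension.
S = np.vstack([rng.standard_normal((per_lang, n_lang)) + 4 * np.eye(n_lang)[i]
               for i in range(n_lang)])
y = np.repeat(np.arange(n_lang), per_lang)

# LDA projects the score vectors; its class-conditional Gaussians with a
# shared covariance then yield calibrated log-posteriors (Gaussian back end).
lda = LinearDiscriminantAnalysis().fit(S, y)
calibrated = lda.predict_log_proba(S)      # corrected scores
decision = calibrated.argmax(axis=1)       # final language decision
```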
Tests were run on the United States National Institute of Standards and Technology (NIST) 2011 Language Recognition Evaluation (LRE) data set, covering 24 target languages. The performance metrics are EER (equal error rate), minDCF (minimum detection cost), minCavg276 (minimum average cost over the 276 language pairs), actCavg276 (actual average cost over the 276 language pairs), minCavg24 (minimum average cost over the 24 worst language pairs), and actCavg24 (actual average cost over the 24 worst language pairs). S1 denotes the traditional i-vector method, S2 the traditional phoneme-feature-based PPRVSM method, and S3 the language recognition method proposed by the invention. A Russian phoneme recognizer was used in the tests; the comparison of the methods on these metrics is shown in Table 1.
Table 1
Method | EER | minDCF | minCavg276 | actCavg276 | minCavg24 | actCavg24 |
---|---|---|---|---|---|---|
S1 | 6.65 | 6.86 | 2.45 | 3.45 | 11.88 | 14.52 |
S2 | 7.62 | 8.04 | 2.67 | 4.68 | 11.68 | 14.09 |
S3 | 5.52 | 5.75 | 1.45 | 2.68 | 9.00 | 12.01 |
The proposed language recognition method applies factor analysis to phoneme-correlated features that reflect pronunciation content. The test results show that, compared with the traditional i-vector method, the proposed method improves recognition performance by a relative 16%-41%, and compared with the traditional phoneme-feature-based PPRVSM method, by a relative 15%-46%.
Claims (7)
1. A training method for a language recognition model, comprising:
Step 1-1), collecting a quantity of target-language speech data as training utterances and extracting the phoneme posterior probabilities of the training utterances;
Step 1-2), transforming the phoneme posterior probabilities to the log domain, performing dimensionality reduction, and applying mean-variance normalization to the reduced features to obtain phoneme-correlated features;
Step 1-3), computing Baum-Welch statistics from the phoneme-correlated features;
Step 1-4), extracting the phoneme variability factor from the Baum-Welch statistics;
Step 1-5), modeling the phoneme variability factors to build the language recognition model.
2. The training method for a language recognition model according to claim 1, wherein the computation of step 1-2) is:
A transformation matrix A_PCA is generated from the eigenvectors corresponding to the L largest eigenvalues of the covariance matrix, the covariance matrix being defined as

C = (1/N) Σ_{i=1}^{N} (1/T_i) Σ_{t=1}^{T_i} (x_it − m_i)(x_it − m_i)^T

where N is the number of training utterances and m_i is the mean of all frame log-domain phoneme posterior probabilities of the i-th training utterance, obtained as

m_i = (1/T_i) Σ_{t=1}^{T_i} x_it

x_it is the t-th frame log-domain phoneme posterior probability vector of the i-th training utterance and T_i is the number of frames of the i-th training utterance; the dimensionality-reduced feature is

y_it = A_PCA^T x_it

Mean-variance normalization is applied to y_it to obtain the phoneme-correlated feature vector z_it.
3. The training method for a language recognition model according to claim 1, wherein the Baum-Welch statistics of step 1-3) are computed as

N_c(i) = Σ_{t=1}^{T_i} p(c | z_it, Ω)

F_c(i) = Σ_{t=1}^{T_i} p(c | z_it, Ω)(z_it − μ_c)

where c indexes the Gaussian components, Ω denotes the universal background model, p(c | z_it, Ω) is the probability that frame t belongs to the c-th Gaussian component, and μ_c is the mean vector of the c-th Gaussian component.
4. The training method for a language recognition model according to claim 1, wherein the computation of step 1-4) is:
The phoneme variability factor w of the i-th training utterance is

w = (I + T^T Σ^{-1} N(i) T)^{-1} T^T Σ^{-1} F(i)

where N(i) is a diagonal matrix whose diagonal elements are N_c(i), F(i) is obtained by concatenating the first-order Baum-Welch statistics F_c(i), and Σ and T are trained by the expectation-maximization algorithm as in factor analysis.
5. The training method for a language recognition model according to claim 1, wherein step 1-5) comprises: using one-versus-one and one-versus-rest strategies, training support vector machines on the phoneme variability factors to build support vector machine models, the support vector machine models being the language recognition model.
6. A language recognition method, based on the training method for a language recognition model according to any one of claims 1-5, the method comprising the following steps:
Step 2-1), extracting the phoneme posterior probabilities of the utterance to be recognized;
Step 2-2), transforming the phoneme posterior probabilities to the log domain, performing dimensionality reduction, and applying mean-variance normalization to the reduced features to obtain phoneme-correlated features;
Step 2-3), computing Baum-Welch statistics from the phoneme-correlated features;
Step 2-4), extracting the phoneme variability factor from the Baum-Welch statistics;
Step 2-5), scoring the language recognition models with the phoneme variability factor, applying mean-variance normalization to the scores, and performing linear discriminant analysis and Gaussian back-end normalization on the normalized scores for score calibration to obtain the final recognition result.
7. The language recognition method according to claim 6, wherein the mean-variance normalization of the scores in step 2-5) is computed as

s''_m = k (s_m − μ) / σ

where M is the number of language recognition models, s_m is the initial score of the m-th language recognition model, μ and σ are respectively the mean and standard deviation of all language recognition model scores for the test data, k is a tunable parameter, and s''_m is the normalized score.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410336650.3A CN105280181B (en) | 2014-07-15 | 2014-07-15 | A kind of training method and Language Identification of languages identification model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410336650.3A CN105280181B (en) | 2014-07-15 | 2014-07-15 | A kind of training method and Language Identification of languages identification model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105280181A true CN105280181A (en) | 2016-01-27 |
CN105280181B CN105280181B (en) | 2018-11-13 |
Family
ID=55149073
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410336650.3A Active CN105280181B (en) | 2014-07-15 | 2014-07-15 | A kind of training method and Language Identification of languages identification model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105280181B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107369440A (en) * | 2017-08-02 | 2017-11-21 | 北京灵伴未来科技有限公司 | The training method and device of a kind of Speaker Identification model for phrase sound |
CN108269574A (en) * | 2017-12-29 | 2018-07-10 | 安徽科大讯飞医疗信息技术有限公司 | Voice signal processing method and device, storage medium and electronic equipment |
CN108510977A (en) * | 2018-03-21 | 2018-09-07 | 清华大学 | Language Identification and computer equipment |
CN108648747A (en) * | 2018-03-21 | 2018-10-12 | 清华大学 | Language recognition system |
CN110858477A (en) * | 2018-08-13 | 2020-03-03 | 中国科学院声学研究所 | Language identification and classification method and device based on noise reduction automatic encoder |
CN112270923A (en) * | 2020-10-22 | 2021-01-26 | 江苏峰鑫网络科技有限公司 | Semantic recognition system based on neural network |
CN113744717A (en) * | 2020-05-15 | 2021-12-03 | 阿里巴巴集团控股有限公司 | Language identification method and device |
CN115394288A (en) * | 2022-10-28 | 2022-11-25 | 成都爱维译科技有限公司 | Language identification method and system for civil aviation multi-language radio land-air conversation |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101599126A (en) * | 2009-04-22 | 2009-12-09 | 哈尔滨工业大学 | Utilize the support vector machine classifier of overall intercommunication weighting |
CN101702314A (en) * | 2009-10-13 | 2010-05-05 | 清华大学 | Method for establishing identified type language recognition model based on language pair |
CN103077709A (en) * | 2012-12-28 | 2013-05-01 | 中国科学院声学研究所 | Method and device for identifying languages based on common identification subspace mapping |
- 2014
- 2014-07-15 CN CN201410336650.3A patent/CN105280181B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101599126A (en) * | 2009-04-22 | 2009-12-09 | 哈尔滨工业大学 | Utilize the support vector machine classifier of overall intercommunication weighting |
CN101702314A (en) * | 2009-10-13 | 2010-05-05 | 清华大学 | Method for establishing identified type language recognition model based on language pair |
CN103077709A (en) * | 2012-12-28 | 2013-05-01 | 中国科学院声学研究所 | Method and device for identifying languages based on common identification subspace mapping |
Non-Patent Citations (3)
Title |
---|
XIANLIANG WANG ET AL.: "Language recognition system using language branch discriminative information", ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014 IEEE INTERNATIONAL CONFERENCE ON * 
ZHONG HAIBING: "Language identification based on phoneme-level information" (in Chinese), China Master's Theses Full-text Database * 
WANG XIANLIANG ET AL.: "A language identification method based on SVM one-versus-one classification" (in Chinese), Journal of Tsinghua University (Science and Technology) *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107369440A (en) * | 2017-08-02 | 2017-11-21 | 北京灵伴未来科技有限公司 | The training method and device of a kind of Speaker Identification model for phrase sound |
CN108269574A (en) * | 2017-12-29 | 2018-07-10 | 安徽科大讯飞医疗信息技术有限公司 | Voice signal processing method and device, storage medium and electronic equipment |
CN108269574B (en) * | 2017-12-29 | 2021-05-25 | 安徽科大讯飞医疗信息技术有限公司 | Method and device for processing voice signal to represent vocal cord state of user, storage medium and electronic equipment |
CN108510977A (en) * | 2018-03-21 | 2018-09-07 | 清华大学 | Language Identification and computer equipment |
CN108648747A (en) * | 2018-03-21 | 2018-10-12 | 清华大学 | Language recognition system |
CN108510977B (en) * | 2018-03-21 | 2020-05-22 | 清华大学 | Language identification method and computer equipment |
CN108648747B (en) * | 2018-03-21 | 2020-06-02 | 清华大学 | Language identification system |
CN110858477A (en) * | 2018-08-13 | 2020-03-03 | 中国科学院声学研究所 | Language identification and classification method and device based on noise reduction automatic encoder |
CN113744717A (en) * | 2020-05-15 | 2021-12-03 | 阿里巴巴集团控股有限公司 | Language identification method and device |
CN112270923A (en) * | 2020-10-22 | 2021-01-26 | 江苏峰鑫网络科技有限公司 | Semantic recognition system based on neural network |
CN115394288A (en) * | 2022-10-28 | 2022-11-25 | 成都爱维译科技有限公司 | Language identification method and system for civil aviation multi-language radio land-air conversation |
CN115394288B (en) * | 2022-10-28 | 2023-01-24 | 成都爱维译科技有限公司 | Language identification method and system for civil aviation multi-language radio land-air conversation |
Also Published As
Publication number | Publication date |
---|---|
CN105280181B (en) | 2018-11-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105280181A (en) | Training method for language recognition model and language recognition method | |
CN107492382B (en) | Voiceprint information extraction method and device based on neural network | |
McLaren et al. | Advances in deep neural network approaches to speaker recognition | |
JP6954680B2 (en) | Speaker confirmation method and speaker confirmation device | |
US9355642B2 (en) | Speaker recognition method through emotional model synthesis based on neighbors preserving principle | |
CN104732978B (en) | The relevant method for distinguishing speek person of text based on combined depth study | |
CN107146601A (en) | A kind of rear end i vector Enhancement Methods for Speaker Recognition System | |
CN109637545B (en) | Voiceprint recognition method based on one-dimensional convolution asymmetric bidirectional long-short-time memory network | |
CN103456302B (en) | A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight | |
CN105261367B (en) | A kind of method for distinguishing speek person | |
Rouvier et al. | Speaker diarization through speaker embeddings | |
CN102664010B (en) | Robust speaker distinguishing method based on multifactor frequency displacement invariant feature | |
CN104240706B (en) | It is a kind of that the method for distinguishing speek person that similarity corrects score is matched based on GMM Token | |
CN103985381A (en) | Voice frequency indexing method based on parameter fusion optimized decision | |
CN106601258A (en) | Speaker identification method capable of information channel compensation based on improved LSDA algorithm | |
CN106297769B (en) | A kind of distinctive feature extracting method applied to languages identification | |
CN111599344A (en) | Language identification method based on splicing characteristics | |
CN102237089B (en) | Method for reducing error identification rate of text irrelevant speaker identification system | |
CN111091809B (en) | Regional accent recognition method and device based on depth feature fusion | |
CN104464738A (en) | Vocal print recognition method oriented to smart mobile device | |
Kudashev et al. | A Speaker Recognition System for the SITW Challenge. | |
CN106486114A (en) | Improve method and apparatus and audio recognition method and the device of language model | |
Diez et al. | New insight into the use of phone log-likelihood ratios as features for language recognition | |
Wu et al. | Joint nonnegative matrix factorization for exemplar-based voice conversion | |
Rouvier et al. | Investigation of speaker embeddings for cross-show speaker diarization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |