CN106531157B - Regularization accent adaptive approach in speech recognition - Google Patents

Regularization accent adaptive approach in speech recognition

Info

Publication number
CN106531157B
CN106531157B (application CN201610971766.3A)
Authority
CN
China
Prior art keywords
accent
acoustic model
regularization
loss function
characteristic parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610971766.3A
Other languages
Chinese (zh)
Other versions
CN106531157A (en)
Inventor
陶建华
易江燕
温正棋
刘斌
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority: CN201610971766.3A
Publication of CN106531157A
Application granted
Publication of CN106531157B
Legal status: Active

Links

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/065: Adaptation
    • G10L15/07: Adaptation to the speaker

Abstract

The invention discloses a regularized accent adaptation method in speech recognition, the method comprising the following steps: step S100, extract characteristic parameters from the collected accent data; step S101, train an accent-independent baseline acoustic model using the extracted characteristic parameters; step S102, identify the accent category of the accent data with a classifier, using the extracted characteristic parameters; step S103, calculate a softened probability distribution; step S104, regularize the objective function; step S105, adapt the accent-independent baseline acoustic model using the regularized loss function, generating an accent-dependent acoustic model. In the present invention, regularized adaptation of the acoustic model improves the accuracy of speech recognition for accented speech.

Description

Regularization accent adaptive approach in speech recognition
Technical field
The present invention relates to the field of signal processing in the electronics industry, and more particularly to a regularized accent adaptation method in speech recognition.
Background technique
Voice is the most natural and efficient medium of human-to-human communication, and speech recognition is an important channel for natural interaction between people and machines. In recent years, with the deep application of deep learning technology in speech recognition, speech recognition has achieved remarkable results. In particular, the rise of the recently proposed end-to-end training method for long short-term memory acoustic models trained with connectionist temporal classification not only greatly simplifies the acoustic modeling pipeline and improves decoding speed, but also improves recognition accuracy. However, when a speaker's pronunciation is nonstandard or heavily accented, recognition accuracy drops sharply.
Summary of the invention
In view of the above problems in the prior art, the present invention proposes a regularized accent adaptation method in speech recognition, so as to improve the recognition accuracy of accented speech.
The regularized accent adaptation method in speech recognition of the present invention comprises the following steps:
Step S100: extract characteristic parameters from the collected accent data;
Step S101: train an accent-independent baseline acoustic model using the extracted characteristic parameters;
Step S102: identify the accent category of the accent data with a classifier, using the extracted characteristic parameters;
Step S103: calculate a softened probability distribution;
Step S104: regularize the loss function of the baseline acoustic model;
Step S105: adapt the accent-independent baseline acoustic model using the regularized loss function, generating an accent-dependent acoustic model.
Further, the characteristic parameters are Mel-spectrum features or Mel-frequency cepstral features.
Further, the static parameters of the accent data are extracted first, and the first-order and second-order differences of the static parameters are then calculated separately to obtain the characteristic parameters.
Further, the baseline acoustic model is a long short-term memory (LSTM) neural network model.
Further, the classifier is a feedforward neural network classifier.
Further, the softened probability distribution is calculated using the forward algorithm.
Further, the loss function is the connectionist temporal classification (CTC) loss function.
Further, in step S104, the loss function of the baseline acoustic model is treated as a regularization term and added to the standard loss function of the accent-dependent model. For an input target utterance x with corresponding label sequence z, the standard and the regularized connectionist temporal classification (CTC) loss functions are:

L(S) = −ln ∏_{(x,z)∈S} p(z|x) = −∑_{(x,z)∈S} ln p(z|x)

L̂(S) = −∑_{(x,z)∈S} [(1−ρ) ln p(z|x) + ρ ln p_AI(z|x)]

where ρ is the regularization parameter, S is the training sample set, L(S) is the standard CTC loss function of the accent-dependent acoustic model, and L̂(S) is the regularized CTC loss function of the accent-dependent acoustic model. ln p(z|x) is the log probability of the correct label z under the accent-dependent acoustic model; ln p_AI(z|x) is the softened log probability distribution of label z, computed on the accent-independent LSTM baseline acoustic model with the forward algorithm; (1−ρ) ln p(z|x) + ρ ln p_AI(z|x) is the linear combination of the correct log probability and the softened log probability.
Further, in step S105, only the last layer of the baseline acoustic model is adapted to obtain the accent-dependent acoustic model.
Further, in step S105, the adaptation of the baseline acoustic model is carried out using the back-propagation algorithm.
In the present invention, regularized adaptation of the acoustic model improves the accuracy of speech recognition for accented speech.
Brief description of the drawings
Fig. 1 is a flow diagram of the regularized accent adaptation method in speech recognition according to an embodiment of the present invention;
Fig. 2 is a flow diagram of accent recognition in the regularized accent adaptation method in speech recognition according to an embodiment of the present invention;
Fig. 3 is a flow diagram of softened-probability generation in the regularized accent adaptation method in speech recognition according to an embodiment of the present invention;
Fig. 4 is a flow diagram of accent-dependent acoustic model generation in the regularized accent adaptation method in speech recognition according to an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will appreciate that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit its scope.
As shown in Fig. 1, the regularized accent adaptation method of the embodiment of the present invention mainly comprises the following steps:
Step S100: extract characteristic parameters from the collected accent data.
Mandarin audio data with various dialectal accents can be collected across regions, ages, and genders to form an accent database for training the accent-independent baseline acoustic model.
In this embodiment, Mel-spectrum features or Mel-frequency cepstral coefficients (MFCC) are used. MFCC is modeled on the human auditory system, offers good recognition performance, and is widely used throughout speech signal processing. Here, static parameters can be extracted first, and their first-order and second-order differences then calculated separately; the finally extracted parameters are, for example, 39-dimensional, and subsequent recognition uses these 39-dimensional features.
In other embodiments, methods such as LPCC (linear prediction cepstral coefficients), HMM (hidden Markov models), or DTW (dynamic time warping) may also be used for characteristic parameter extraction.
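The 39-dimensional feature construction described in step S100 can be sketched in plain NumPy. This sketch assumes 13 static MFCC coefficients (a common choice; the patent only fixes the final 39 dimensions) and a standard regression-window delta formula, which the patent does not specify:

```python
import numpy as np

def delta(feat, N=2):
    """Window-based first-order difference of a (frames, dims) feature matrix."""
    T = feat.shape[0]
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")  # replicate edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feat, dtype=float)
    for t in range(T):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)
        ) / denom
    return out

def add_deltas(static):
    """Stack static coefficients with delta and delta-delta: 13 dims -> 39 dims."""
    d1 = delta(static)        # first-order difference
    d2 = delta(d1)            # second-order difference
    return np.hstack([static, d1, d2])
```

Applying the delta of a delta gives the second-order difference, so a (T, 13) static matrix becomes the (T, 39) feature matrix used for the subsequent recognition steps.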
Step S101: train the accent-independent baseline acoustic model using the extracted characteristic parameters.
In this embodiment, a model based on a long short-term memory (LSTM) neural network is used as the baseline acoustic model, and the loss function is the connectionist temporal classification (CTC) loss function.
In other embodiments, other models may be used to train the acoustic model, including hidden Markov model-Gaussian mixture models, hidden Markov model-feedforward neural network models, hidden Markov model-LSTM neural network models, hidden Markov model-convolutional neural network models, etc.
Specifically, an accent-independent deep LSTM recurrent neural network baseline acoustic model can be trained with the CTC loss function on the extracted acoustic characteristic parameters. This CTC loss function is the standard loss function.
Step S102: as shown in Fig. 2, identify the accent category of the accent data with a classifier, using the extracted characteristic parameters.
In the present invention, the classifier can be any classifier capable of classifying accent data. This embodiment uses a feedforward neural network classifier built on a deep neural network; it can have 4 categories and 2 hidden layers, each containing 1024 nodes, with cross-entropy as the loss function.
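A minimal NumPy sketch of the forward pass of such a classifier (2 hidden layers of 1024 units, 4 accent categories, cross-entropy loss). The ReLU activation, the input dimension of 39, and the small-Gaussian weight initialization are illustrative assumptions; the patent does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # stabilized
    return e / e.sum(axis=-1, keepdims=True)

def init_classifier(in_dim=39, hidden=1024, n_classes=4):
    """Two hidden layers of 1024 units, 4 accent categories, as in the embodiment."""
    dims = [in_dim, hidden, hidden, n_classes]
    return [(rng.normal(0, 0.01, (a, b)), np.zeros(b)) for a, b in zip(dims, dims[1:])]

def forward(params, x):
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:   # ReLU on hidden layers, softmax on output
            h = relu(h)
    return softmax(h)

def cross_entropy(probs, labels):
    """Mean negative log probability of the true accent category."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
```

The classifier's argmax over the 4 output probabilities gives the accent category used later when generating the accent-dependent model.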
Step S103: calculate the softened probability distribution.
As shown in Fig. 3, according to the extracted characteristic parameters, the softened probability distribution of the accent data is calculated with the accent-independent baseline acoustic model constructed in step S101.
The softened probabilities are calculated with the forward algorithm: a probability value is computed for each label of the acoustic model's output layer, i.e., the softened probability.
Evidently, steps S102 and S103 can be carried out simultaneously, or successively in either order.
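The forward algorithm referred to here is, in the CTC setting, the standard forward recursion that sums the probabilities of all frame-level paths collapsing to a label sequence. A minimal NumPy sketch under the usual CTC conventions (a blank symbol, labels interleaved with blanks) is:

```python
import numpy as np

def ctc_forward(probs, labels, blank=0):
    """Standard CTC forward algorithm.

    probs:  (T, C) per-frame label posteriors from the acoustic model.
    labels: target label sequence z (without blanks).
    Returns p(z | x): the total probability of all frame-level paths
    that collapse to `labels`.
    """
    # Extended sequence with blanks between and around labels: ^ a ^ b ^
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    T, L = probs.shape[0], len(ext)

    alpha = np.zeros((T, L))
    alpha[0, 0] = probs[0, ext[0]]
    if L > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(L):
            a = alpha[t - 1, s]
            if s >= 1:
                a += alpha[t - 1, s - 1]
            # Skip transition: allowed when the current label is not blank
            # and differs from the label two positions back.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    return alpha[T - 1, L - 1] + (alpha[T - 1, L - 2] if L > 1 else 0.0)
```

In practice the recursion is run in log space for numerical stability; the plain-probability form above is kept for readability.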
Step S104: regularize the loss function of the baseline acoustic model.
Specifically, the loss function of the accent-independent baseline acoustic model is treated as a regularization term and added to the standard loss function of the accent-dependent model, so as to prevent the adaptation process from destroying the parameters of the neural network acoustic model or causing the training process to over-fit. In this embodiment, the loss function is the CTC loss function.
For an input target utterance x with corresponding label sequence z, the standard and the regularized CTC loss functions are:

L(S) = −ln ∏_{(x,z)∈S} p(z|x) = −∑_{(x,z)∈S} ln p(z|x)

L̂(S) = −∑_{(x,z)∈S} [(1−ρ) ln p(z|x) + ρ ln p_AI(z|x)]

where ρ is the regularization parameter, S is the training sample set, L(S) is the standard CTC loss function of the accent-dependent acoustic model, and L̂(S) is the regularized CTC loss function. ln p(z|x) is the log probability of the correct label z under the accent-dependent acoustic model; ln p_AI(z|x) is the softened log probability distribution of label z, computed on the accent-independent LSTM baseline acoustic model with the forward algorithm; (1−ρ) ln p(z|x) + ρ ln p_AI(z|x) is the final log probability, i.e., the linear combination of the correct log probability and the softened log probability.
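In step S104 the accent-dependent model's log probability and the baseline's softened log probability are combined linearly with weight ρ, and the negated sum over the training set S gives the regularized loss. A minimal sketch, assuming the per-utterance log probabilities have already been computed (e.g. with the forward algorithm):

```python
import numpy as np

def regularized_ctc_loss(log_p_dep, log_p_indep, rho=0.5):
    """Regularized CTC-style loss over a sample set S.

    log_p_dep:   array of ln p(z|x) from the accent-dependent model.
    log_p_indep: array of ln p_AI(z|x), the softened log probabilities
                 from the accent-independent baseline model.
    rho:         regularization parameter in [0, 1]; rho = 0 recovers
                 the standard loss of the accent-dependent model.
    """
    log_p_dep = np.asarray(log_p_dep, dtype=float)
    log_p_indep = np.asarray(log_p_indep, dtype=float)
    combined = (1.0 - rho) * log_p_dep + rho * log_p_indep
    return -combined.sum()
```

The rho = 0 endpoint makes the regularization term vanish, and rho = 1 keeps the adapted model pinned to the baseline's softened distribution, which is what prevents the adaptation from over-fitting or destroying the baseline parameters.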
Step S105: adapt the accent-independent baseline acoustic model using the regularized loss function, generating the accent-dependent acoustic model.
As shown in Fig. 4, the characteristic parameters extracted in step S100, the accent category produced in step S102, the softened probabilities calculated in step S103, and the accent data are taken as input, and the accent-independent acoustic model is adapted with the regularized loss function derived in step S104 to generate the accent-dependent acoustic model.
In this embodiment, the adaptation can be carried out with the back-propagation algorithm, ultimately producing the accent-dependent acoustic model. The back-propagation algorithm is particularly suitable for neural network training.
Preferably, only the last layer of the baseline acoustic model is adapted to obtain the accent-dependent acoustic model. Compared with adapting all layers of the acoustic model, adapting only the last layer improves adaptation speed.
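Last-layer-only adaptation can be sketched as follows. For brevity this sketch uses a softmax output layer with cross-entropy rather than the patent's CTC loss, and plain gradient descent; the point it illustrates is that back-propagation stops at the output layer, so the frozen lower layers only supply activations:

```python
import numpy as np

def adapt_last_layer(W_out, b_out, hidden, targets, lr=0.1, steps=50):
    """Adapt only the output layer of an otherwise frozen network.

    hidden:  (N, H) activations produced by the frozen lower layers.
    targets: (N,) integer labels for the adaptation data.
    Only W_out and b_out receive gradient updates; this mirrors the
    last-layer-only adaptation that speeds up the procedure.
    """
    W, b = W_out.copy(), b_out.copy()
    N = hidden.shape[0]
    for _ in range(steps):
        logits = hidden @ W + b
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs = e / e.sum(axis=1, keepdims=True)
        grad = probs.copy()                      # d(mean CE)/d(logits)
        grad[np.arange(N), targets] -= 1.0
        grad /= N
        W -= lr * (hidden.T @ grad)              # gradient step on W only
        b -= lr * grad.sum(axis=0)
    return W, b
```

Because the lower layers are never touched, each adaptation step costs one matrix product per batch instead of a full backward pass through the network, which is the speed advantage the embodiment claims.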
With the method of the present invention, regularized adaptation of the acoustic model improves the accuracy of speech recognition for accented speech, and regularizing the loss function simplifies the adaptation procedure.
It should be noted that the above definitions of the elements are not limited to the specific structures or shapes mentioned in the embodiments; those skilled in the art may simply substitute known alternatives for them.
Heretofore, the technical solutions of the present invention have been described in conjunction with the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will all fall within the protection scope of the present invention.

Claims (8)

1. A regularized accent adaptation method in speech recognition, characterized in that the method comprises the following steps:
Step S100: extract characteristic parameters from the collected accent data;
Step S101: train an accent-independent baseline acoustic model using the extracted characteristic parameters;
Step S102: identify the accent category of the accent data with a classifier, using the extracted characteristic parameters;
Step S103: calculate a softened probability distribution using the forward algorithm;
Step S104: regularize the loss function of the baseline acoustic model;
the loss function being the connectionist temporal classification (CTC) loss function, and the loss function of the baseline acoustic model being treated as a regularization term added to the standard loss function of the accent-dependent model; for an input target utterance x with corresponding label sequence z, the standard and the regularized CTC loss functions are:

L(S) = −ln ∏_{(x,z)∈S} p(z|x) = −∑_{(x,z)∈S} ln p(z|x)

L̂(S) = −∑_{(x,z)∈S} [(1−ρ) ln p(z|x) + ρ ln p_AI(z|x)]

where ρ is the regularization parameter, S is the training sample set, L(S) is the standard CTC loss function of the accent-dependent acoustic model, and L̂(S) is the regularized CTC loss function of the accent-dependent acoustic model; ln p(z|x) is the log probability of the correct label z under the accent-dependent acoustic model; ln p_AI(z|x) is the softened log probability distribution of label z, computed on the accent-independent LSTM baseline acoustic model with the forward algorithm; (1−ρ) ln p(z|x) + ρ ln p_AI(z|x) is the linear combination of the correct log probability and the softened log probability;
Step S105: adapt the accent-independent baseline acoustic model using the regularized loss function, generating an accent-dependent acoustic model.
2. The method according to claim 1, characterized in that the characteristic parameters are Mel-spectrum features or Mel-frequency cepstral features.
3. The method according to claim 2, characterized in that the static parameters of the accent data are extracted first, and the first-order and second-order differences of the static parameters are then calculated separately to obtain the characteristic parameters.
4. The method according to claim 1, characterized in that the baseline acoustic model is a long short-term memory neural network model.
5. The method according to claim 1, characterized in that the classifier is a feedforward neural network classifier.
6. The method according to claim 1, characterized in that the softened probability distribution is calculated using the forward algorithm.
7. The method according to any one of claims 1 to 6, characterized in that, in step S105, only the last layer of the baseline acoustic model is adapted to obtain the accent-dependent acoustic model.
8. The method according to any one of claims 1 to 6, characterized in that, in step S105, the adaptation of the baseline acoustic model is carried out using the back-propagation algorithm.
CN201610971766.3A 2016-10-28 2016-10-28 Regularization accent adaptive approach in speech recognition Active CN106531157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610971766.3A CN106531157B (en) 2016-10-28 2016-10-28 Regularization accent adaptive approach in speech recognition

Publications (2)

Publication Number Publication Date
CN106531157A CN106531157A (en) 2017-03-22
CN106531157B (en) 2019-10-22

Family

ID=58326772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610971766.3A Active CN106531157B (en) 2016-10-28 2016-10-28 Regularization accent adaptive approach in speech recognition

Country Status (1)

Country Link
CN (1) CN106531157B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102208030A (en) * 2011-06-03 2011-10-05 天津大学 Bayesian-model-averaging-based model combing method on regularization path of support vector machine
CN102405495A (en) * 2009-03-11 2012-04-04 谷歌公司 Audio classification for information retrieval using sparse features
EP2996045A1 (en) * 2014-09-10 2016-03-16 Xerox Corporation Language model with structured penalty

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7890328B1 (en) * 2006-09-07 2011-02-15 At&T Intellectual Property Ii, L.P. Enhanced accuracy for speech recognition grammars
US10373054B2 (en) * 2015-04-19 2019-08-06 International Business Machines Corporation Annealed dropout training of neural networks


Also Published As

Publication number Publication date
CN106531157A (en) 2017-03-22


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant