CN106531157B - Regularization accent adaptive approach in speech recognition - Google Patents
- Publication number
- CN106531157B (Application No. CN201610971766.3A, CN201610971766A)
- Authority
- CN
- China
- Prior art keywords
- accent
- acoustic model
- regularization
- loss function
- characteristic parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
Abstract
The invention discloses a regularized accent adaptation method in speech recognition, comprising the following steps: step S100, extracting characteristic parameters from collected accent data; step S101, training an accent-independent baseline acoustic model using the extracted characteristic parameters; step S102, identifying the accent category of the accent data with a classifier, using the extracted characteristic parameters; step S103, computing a softened probability distribution; step S104, regularizing the objective function; step S105, adapting the accent-independent baseline acoustic model using the regularized loss function to generate an accent-dependent acoustic model. By performing regularized adaptation of the acoustic model, the invention improves the accuracy of speech recognition for accented speech.
Description
Technical field
The present invention relates to the field of signal processing in the electronics industry, and in particular to a regularized accent adaptation method in speech recognition.
Background technique
Speech is the most natural and efficient medium of human communication, and speech recognition is an important channel for natural human-machine interaction. In recent years, with the deep application of deep learning techniques, speech recognition has achieved remarkable results. In particular, the recently proposed end-to-end acoustic model training methods, based on long short-term memory (LSTM) networks trained with connectionist temporal classification (CTC), not only greatly simplify acoustic modeling and improve decoding speed, but also improve recognition accuracy. However, when a speaker's pronunciation is non-standard or heavily accented, recognition accuracy drops sharply.
Summary of the invention
In view of the above problems in the prior art, the present invention proposes a regularized accent adaptation method in speech recognition, in order to improve the recognition accuracy of accented speech.
The regularized accent adaptation method in speech recognition of the invention comprises the following steps:
Step S100: extract characteristic parameters from the collected accent data.
Step S101: train an accent-independent baseline acoustic model using the extracted characteristic parameters.
Step S102: identify the accent category of the accent data with a classifier, using the extracted characteristic parameters.
Step S103: compute the softened probability distribution.
Step S104: regularize the loss function of the baseline acoustic model.
Step S105: adapt the accent-independent baseline acoustic model using the regularized loss function, generating an accent-dependent acoustic model.
Further, the characteristic parameters are Mel spectral features or Mel-frequency cepstral features.
Further, the static parameters of the accent data are extracted first, and their first-order and second-order differences are then computed separately to obtain the characteristic parameters.
Further, the baseline acoustic model is a long short-term memory (LSTM) neural network model.
Further, the classifier is a feedforward neural network classifier.
Further, the softened probability distribution is computed using the forward algorithm.
Further, the loss function is the connectionist temporal classification (CTC) loss function.
Further, in step S104, the loss function of the baseline acoustic model is treated as a regularization term and added to the accent-dependent standard loss function. For an input target utterance x with corresponding label sequence z, the standard CTC loss over the training set S is
L(S) = −ln ∏_{(x,z)∈S} p(z|x) = −∑_{(x,z)∈S} ln p(z|x)
and the regularized CTC loss is
L̃(S) = −∑_{(x,z)∈S} ln p̃(z|x), with ln p̃(z|x) = (1 − ρ) ln p(z|x) + ρ ln p_AI(z|x)
where ρ is the regularization parameter, S is the training sample set, L(S) is the standard CTC loss of the accent-dependent acoustic model, and L̃(S) is the regularized CTC loss of the accent-dependent acoustic model. ln p(z|x) is the log probability of the correct label z under the accent-dependent model; ln p_AI(z|x) is the softened log probability distribution of label z, computed with the forward algorithm on the accent-independent LSTM baseline acoustic model; ln p̃(z|x) is the linear combination of the correct log probability and the softened log probability.
Further, in step S105, only the last layer of the baseline acoustic model is adapted, to obtain the accent-dependent acoustic model.
Further, in step S105, the adaptation of the baseline acoustic model is performed using the backpropagation algorithm.
In the present invention, regularized adaptation of the acoustic model improves the accuracy of speech recognition for accented speech.
Detailed description of the invention
Fig. 1 is a flow diagram of the regularized accent adaptation method in speech recognition according to an embodiment of the invention;
Fig. 2 is a flow diagram of accent recognition in the method according to an embodiment of the invention;
Fig. 3 is a flow diagram of softened probability generation in the method according to an embodiment of the invention;
Fig. 4 is a flow diagram of the generation of the accent-dependent acoustic model in the method according to an embodiment of the invention.
Specific embodiment
Preferred embodiments of the invention are described below with reference to the accompanying drawings. Those skilled in the art will appreciate that these embodiments serve only to explain the technical principles of the invention and are not intended to limit its scope.
As shown in Fig. 1, the regularized accent adaptation method of this embodiment mainly comprises the following steps.
Step S100: extract characteristic parameters from the collected accent data.
Mandarin audio data with various dialectal accents can be collected across regions, ages, and genders to form an accent database for training the accent-independent baseline acoustic model.
This embodiment uses Mel spectral features or Mel-frequency cepstral coefficients (MFCC). MFCCs are derived from models of human auditory perception, offer good recognition performance, and are widely used throughout speech signal processing. Here, the static parameters can be extracted first, and their first-order and second-order differences then computed separately; the final extracted parameter is, for example, a 39-dimensional vector, and this 39-dimensional feature is used for subsequent recognition.
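The stacking of static coefficients with their first- and second-order differences can be sketched as follows. This is an illustrative NumPy sketch rather than the patent's implementation: the static MFCC matrix here is random placeholder data, and the regression-window delta formula is a common convention assumed for illustration.

```python
import numpy as np

def delta(feat, N=2):
    """First-order regression delta over a window of +-N frames."""
    T, D = feat.shape
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")  # replicate edge frames
    d = np.zeros_like(feat)
    for t in range(T):
        d[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)
        ) / denom
    return d

# 13 static MFCCs per frame (placeholder values; a real system would
# compute them from audio, e.g. with a feature-extraction library)
static = np.random.randn(100, 13)
d1 = delta(static)                    # first-order differences
d2 = delta(d1)                        # second-order differences
features = np.hstack([static, d1, d2])
assert features.shape == (100, 39)    # the 39-dimensional vector described above
```

With 13 static coefficients, appending the two difference orders yields exactly the 39-dimensional vector mentioned in the text.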
In other embodiments, methods such as LPCC (linear prediction cepstral coefficients), HMM (hidden Markov models), or DTW (dynamic time warping) may be used for characteristic parameter extraction.
Step S101: train the accent-independent baseline acoustic model using the extracted characteristic parameters.
This embodiment uses a model based on a long short-term memory (LSTM) neural network as the baseline acoustic model; the loss function is the connectionist temporal classification (CTC) loss.
In other embodiments, other models may be used to train the acoustic model, including hidden Markov-Gaussian mixture models, hidden Markov-feedforward neural network models, hidden Markov-LSTM models, and hidden Markov-convolutional neural network models.
Specifically, an accent-independent deep recurrent LSTM baseline acoustic model can be trained with the CTC loss function on the extracted acoustic characteristic parameters. The CTC loss here is the standard (unregularized) loss function.
Step S102: as shown in Fig. 2, identify the accent category of the accent data with a classifier, using the extracted characteristic parameters.
In the present invention, the classifier may be any classifier capable of classifying accent data. This embodiment uses a feedforward neural network classifier built on a deep neural network; it may have 4 accent categories and 2 hidden layers of 1024 nodes each, and its loss function is the cross entropy.
Step S103: compute the softened probability distribution.
As shown in Fig. 3, the softened probability distribution of the accent data is computed from the extracted characteristic parameters using the accent-independent baseline acoustic model constructed in step S101.
The softened probabilities are computed with the forward algorithm: a probability value is computed for each label at the acoustic model's output layer, giving the softened probability.
Note that steps S102 and S103 can be carried out simultaneously, or sequentially in either order.
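The forward (alpha) recursion of CTC, which this method uses both for the standard loss and for the softened probabilities, can be sketched as follows. This is a textbook-style NumPy implementation under the usual CTC conventions (blank interleaving, skip transitions between distinct non-blank labels), not code from the patent:

```python
import numpy as np

def ctc_forward_prob(y, labels, blank=0):
    """p(z|x) via the CTC forward (alpha) recursion.
    y: (T, K) per-frame posteriors from the acoustic model; labels: target ids."""
    ext = [blank]
    for l in labels:
        ext += [l, blank]                 # interleave blanks: [1, 2] -> [_, 1, _, 2, _]
    T, S = y.shape[0], len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = y[0, blank]
    if S > 1:
        alpha[0, 1] = y[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]           # stay on the same extended symbol
            if s > 0:
                a += alpha[t - 1, s - 1]  # advance by one
            # skip transition allowed between distinct non-blank labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * y[t, ext[s]]
    # total probability: end in the final label or the final blank
    return alpha[-1, -1] + (alpha[-1, -2] if S > 1 else 0.0)

# Toy check: uniform posteriors over blank + 2 symbols, target sequence [1, 2]
T, K = 4, 3
y = np.full((T, K), 1.0 / K)
p = ctc_forward_prob(y, [1, 2])
assert 0.0 < p < 1.0
```

Taking `-np.log(p)` of this quantity gives the per-utterance CTC loss; evaluating it with the baseline model's outputs gives the softened probability used below.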
Step S104: regularize the loss function of the baseline acoustic model.
Specifically, the loss function of the accent-independent baseline acoustic model is treated as a regularization term and added to the accent-dependent standard loss function, to prevent the adaptation process from destroying the parameters of the neural network acoustic model or causing the training process to overfit. In this embodiment, the loss function is the connectionist temporal classification (CTC) loss.
For an input target utterance x with corresponding label sequence z, the standard CTC loss over the training set S is
L(S) = −ln ∏_{(x,z)∈S} p(z|x) = −∑_{(x,z)∈S} ln p(z|x)
and the regularized CTC loss is
L̃(S) = −∑_{(x,z)∈S} ln p̃(z|x), with ln p̃(z|x) = (1 − ρ) ln p(z|x) + ρ ln p_AI(z|x)
where ρ is the regularization parameter, S is the training sample set, L(S) is the standard CTC loss of the accent-dependent acoustic model, and L̃(S) is the regularized CTC loss of the accent-dependent acoustic model. ln p(z|x) is the log probability of the correct label z under the accent-dependent model; ln p_AI(z|x) is the softened log probability distribution of label z, computed with the forward algorithm on the accent-independent LSTM baseline acoustic model; ln p̃(z|x) is the final log probability, i.e. the linear combination of the correct log probability and the softened log probability.
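Under this reconstruction, the per-utterance regularized loss is the negated linear interpolation of the two log probabilities. A tiny sketch; note that the (1 − ρ)/ρ weighting is reconstructed from the surrounding description, not quoted from the patent:

```python
def regularized_ctc_loss(logp_correct, logp_soft, rho):
    """Per-utterance regularized loss: negated linear combination of the
    accent-dependent log probability and the baseline's softened log probability."""
    logp_tilde = (1.0 - rho) * logp_correct + rho * logp_soft
    return -logp_tilde

# rho = 0 recovers the standard CTC loss; rho = 1 trusts only the baseline model.
assert regularized_ctc_loss(-2.0, -3.0, 0.0) == 2.0
assert regularized_ctc_loss(-2.0, -3.0, 1.0) == 3.0
assert regularized_ctc_loss(-2.0, -3.0, 0.5) == 2.5
```

The interpolation pulls the adapted model's distribution toward the baseline's, which is exactly the anti-overfitting role the regularization term plays here.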
Step S105: adapt the accent-independent baseline acoustic model using the regularized loss function, generating the accent-dependent acoustic model.
As shown in Fig. 4, taking as input the characteristic parameters extracted in step S100, the accent category produced in step S102, the softened probabilities computed in step S103, and the accent data, the accent-independent acoustic model is adapted with the regularized loss function derived in step S104 to generate the accent-dependent acoustic model.
In this embodiment the adaptation can be performed with the backpropagation algorithm, which is particularly suitable for neural network training, ultimately producing the accent-dependent acoustic model.
Preferably, only the last layer of the baseline acoustic model is adapted, to obtain the accent-dependent acoustic model. Compared with adapting all layers of the acoustic model, adapting only the last layer improves adaptation speed.
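Last-layer-only adaptation can be sketched as follows: a toy stand-in network whose hidden weights are frozen while one backpropagation-style gradient step updates only the output layer. The network shapes, learning rate, and gradient values are illustrative assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-layer network standing in for the baseline acoustic model.
W_hidden = rng.standard_normal((39, 64)) * 0.1   # frozen during adaptation
W_out = rng.standard_normal((64, 10)) * 0.1      # the only layer we adapt

def forward(x):
    h = np.tanh(x @ W_hidden)
    return h, h @ W_out

def adapt_last_layer(x, grad_out, lr=0.01):
    """One backpropagation step applied only to the output layer."""
    global W_out
    h, _ = forward(x)
    W_out = W_out - lr * np.outer(h, grad_out)   # W_hidden is left untouched

before = W_hidden.copy()
adapt_last_layer(rng.standard_normal(39), rng.standard_normal(10))
assert np.array_equal(W_hidden, before)          # hidden layers unchanged
```

Freezing all but the output layer means far fewer parameters are updated per step, which is why adaptation speed improves as the text notes.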
By performing regularized adaptation of the acoustic model, the method of the invention improves the accuracy of speech recognition for accented speech. Regularizing the loss function also simplifies the adaptation procedure.
It should be noted that the above definitions of the elements are not limited to the specific structures or shapes mentioned in the embodiments; those skilled in the art may substitute known equivalents for them.
The technical solution of the invention has thus been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the invention is clearly not limited to these specific embodiments. Without departing from the principles of the invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the invention.
Claims (8)
1. A regularized accent adaptation method in speech recognition, characterized in that the method comprises the following steps:
step S100: extracting characteristic parameters from collected accent data;
step S101: training an accent-independent baseline acoustic model using the extracted characteristic parameters;
step S102: identifying the accent category of the accent data with a classifier, using the extracted characteristic parameters;
step S103: computing a softened probability distribution using the forward algorithm;
step S104: regularizing the loss function of the baseline acoustic model;
wherein the loss function is the connectionist temporal classification (CTC) loss function, and the loss function of the baseline acoustic model is treated as a regularization term added to the accent-dependent standard loss function; for an input target utterance x with corresponding label sequence z, the standard CTC loss over the training set S is
L(S) = −ln ∏_{(x,z)∈S} p(z|x) = −∑_{(x,z)∈S} ln p(z|x)
and the regularized CTC loss is
L̃(S) = −∑_{(x,z)∈S} ln p̃(z|x), with ln p̃(z|x) = (1 − ρ) ln p(z|x) + ρ ln p_AI(z|x)
wherein ρ is the regularization parameter, S is the training sample set, L(S) is the standard CTC loss of the accent-dependent acoustic model, L̃(S) is the regularized CTC loss of the accent-dependent acoustic model, ln p(z|x) is the log probability of the correct label z under the accent-dependent acoustic model, ln p_AI(z|x) is the softened log probability distribution of label z, computed with the forward algorithm on the accent-independent long short-term memory baseline acoustic model, and ln p̃(z|x) is the linear combination of the correct log probability and the softened log probability;
step S105: adapting the accent-independent baseline acoustic model using the regularized loss function, generating the accent-dependent acoustic model.
2. The method according to claim 1, characterized in that the characteristic parameters are Mel spectral features or Mel-frequency cepstral features.
3. The method according to claim 2, characterized in that the static parameters of the accent data are extracted first, and their first-order and second-order differences are then computed separately to obtain the characteristic parameters.
4. The method according to claim 1, characterized in that the baseline acoustic model is a long short-term memory neural network model.
5. The method according to claim 1, characterized in that the classifier is a feedforward neural network classifier.
6. The method according to claim 1, characterized in that the softened probability distribution is computed using the forward algorithm.
7. The method according to any one of claims 1 to 6, characterized in that in step S105, only the last layer of the baseline acoustic model is adapted, to obtain the accent-dependent acoustic model.
8. The method according to any one of claims 1 to 6, characterized in that in step S105, the adaptation of the baseline acoustic model is performed using the backpropagation algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610971766.3A CN106531157B (en) | 2016-10-28 | 2016-10-28 | Regularization accent adaptive approach in speech recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106531157A CN106531157A (en) | 2017-03-22 |
CN106531157B true CN106531157B (en) | 2019-10-22 |
Family
ID=58326772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610971766.3A Active CN106531157B (en) | 2016-10-28 | 2016-10-28 | Regularization accent adaptive approach in speech recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106531157B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102339716B1 (en) * | 2017-06-30 | 2021-12-14 | 삼성에스디에스 주식회사 | Method for recognizing speech and Apparatus thereof |
CN108108357B (en) * | 2018-01-12 | 2022-08-09 | 京东方科技集团股份有限公司 | Accent conversion method and device and electronic equipment |
CN108564134B (en) * | 2018-04-27 | 2021-07-06 | 网易(杭州)网络有限公司 | Data processing method, device, computing equipment and medium |
CN109102037B (en) * | 2018-06-04 | 2024-03-05 | 平安科技(深圳)有限公司 | Chinese model training and Chinese image recognition method, device, equipment and medium |
CN110706710A (en) * | 2018-06-25 | 2020-01-17 | 普天信息技术有限公司 | Voice recognition method and device, electronic equipment and storage medium |
CN108877784B (en) * | 2018-09-05 | 2022-12-06 | 河海大学 | Robust speech recognition method based on accent recognition |
CN110895935B (en) * | 2018-09-13 | 2023-10-27 | 阿里巴巴集团控股有限公司 | Speech recognition method, system, equipment and medium |
CN109410911A (en) * | 2018-09-13 | 2019-03-01 | 何艳玲 | Artificial intelligence learning method based on speech recognition |
CN109685671A (en) * | 2018-12-13 | 2019-04-26 | 平安医疗健康管理股份有限公司 | Medical data exception recognition methods, equipment and storage medium based on machine learning |
CN111461155A (en) * | 2019-01-18 | 2020-07-28 | 富士通株式会社 | Apparatus and method for training classification model |
CN111128229A (en) * | 2019-08-05 | 2020-05-08 | 上海海事大学 | Voice classification method and device and computer storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102208030A (en) * | 2011-06-03 | 2011-10-05 | 天津大学 | Bayesian-model-averaging-based model combing method on regularization path of support vector machine |
CN102405495A (en) * | 2009-03-11 | 2012-04-04 | 谷歌公司 | Audio classification for information retrieval using sparse features |
EP2996045A1 (en) * | 2014-09-10 | 2016-03-16 | Xerox Corporation | Language model with structured penalty |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7890328B1 (en) * | 2006-09-07 | 2011-02-15 | At&T Intellectual Property Ii, L.P. | Enhanced accuracy for speech recognition grammars |
US10373054B2 (en) * | 2015-04-19 | 2019-08-06 | International Business Machines Corporation | Annealed dropout training of neural networks |
-
2016
- 2016-10-28 CN CN201610971766.3A patent/CN106531157B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102405495A (en) * | 2009-03-11 | 2012-04-04 | 谷歌公司 | Audio classification for information retrieval using sparse features |
CN102208030A (en) * | 2011-06-03 | 2011-10-05 | 天津大学 | Bayesian-model-averaging-based model combing method on regularization path of support vector machine |
EP2996045A1 (en) * | 2014-09-10 | 2016-03-16 | Xerox Corporation | Language model with structured penalty |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||