CN106531157B - Regularization accent adaptive approach in speech recognition - Google Patents

Regularization accent adaptive approach in speech recognition

Info

Publication number
CN106531157B
CN106531157B (application CN201610971766.3A)
Authority
CN
China
Prior art keywords
accent
acoustic model
regularization
loss function
characteristic parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610971766.3A
Other languages
Chinese (zh)
Other versions
CN106531157A (en)
Inventor
陶建华
易江燕
温正棋
刘斌
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority: CN201610971766.3A
Publication of CN106531157A
Application granted
Publication of CN106531157B
Legal status: Active

Links

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/065: Adaptation
    • G10L15/07: Adaptation to the speaker

Abstract

The invention discloses a regularized accent adaptation method in speech recognition, the method comprising the following steps: step S100, extract characteristic parameters from the collected accent data; step S101, train an accent-independent baseline acoustic model using the extracted characteristic parameters; step S102, identify the accent category of the accent data with a classifier, using the extracted characteristic parameters; step S103, calculate a softened probability distribution; step S104, regularize the objective function; step S105, adapt the accent-independent baseline acoustic model using the regularized loss function, generating an accent-dependent acoustic model. In the present invention, regularized adaptation of the acoustic model improves the accuracy of speech recognition for accented speech.

Description

Regularization accent adaptive approach in speech recognition
Technical field
The present invention relates to the field of signal processing in the electronics industry, and more particularly to a regularized accent adaptation method in speech recognition.
Background technique
Voice is the most natural and efficient medium of human-to-human communication, and speech recognition is an important channel for natural interaction between people and machines. In recent years, with the deep application of deep learning technology in speech recognition, speech recognition has achieved remarkable results. In particular, the rise of the recently proposed end-to-end training method for long short-term memory acoustic models trained with connectionist temporal classification not only greatly simplifies the acoustic modeling pipeline and improves decoding speed, but also improves recognition accuracy. However, when a speaker's pronunciation is nonstandard or heavily accented, recognition accuracy drops sharply.
Summary of the invention
In view of the above problems in the prior art, the present invention proposes a regularized accent adaptation method in speech recognition, so as to improve the recognition accuracy of accented speech.
The regularized accent adaptation method in speech recognition of the present invention comprises the following steps:
Step S100: extract characteristic parameters from the collected accent data;
Step S101: train an accent-independent baseline acoustic model using the extracted characteristic parameters;
Step S102: identify the accent category of the accent data with a classifier, using the extracted characteristic parameters;
Step S103: calculate a softened probability distribution;
Step S104: regularize the loss function of the baseline acoustic model;
Step S105: adapt the accent-independent baseline acoustic model using the regularized loss function, generating an accent-dependent acoustic model.
Further, the characteristic parameters are Mel-spectrum features or Mel-frequency cepstral features.
Further, the static parameters of the accent data are extracted first, and the first-order and second-order differences of the static parameters are then calculated separately to obtain the characteristic parameters.
Further, the baseline acoustic model is a long short-term memory (LSTM) neural network model.
Further, the classifier is a feedforward neural network classifier.
Further, the softened probability distribution is calculated using the forward algorithm.
Further, the loss function is the connectionist temporal classification (CTC) loss function.
Further, in step S104, the loss function of the baseline acoustic model is treated as a regularization term and added to the standard loss function of the accent-dependent model. For an input target utterance x with corresponding label sequence z, the standard and the regularized connectionist temporal classification (CTC) loss functions are:

L(S) = −ln ∏_{(x,z)∈S} p(z|x) = −∑_{(x,z)∈S} ln p(z|x)

L̂(S) = −∑_{(x,z)∈S} [(1−ρ) ln p(z|x) + ρ ln p_AI(z|x)]

where ρ is the regularization parameter, S is the training sample set, L(S) is the standard CTC loss function of the accent-dependent acoustic model, and L̂(S) is the regularized CTC loss function of the accent-dependent acoustic model. ln p(z|x) is the log probability of the correct label z under the accent-dependent acoustic model; ln p_AI(z|x) is the softened log probability distribution of label z, computed on the accent-independent LSTM baseline acoustic model with the forward algorithm; (1−ρ) ln p(z|x) + ρ ln p_AI(z|x) is the linear combination of the correct log probability and the softened log probability.
Further, in step S105, only the last layer of the baseline acoustic model is adapted to obtain the accent-dependent acoustic model.
Further, in step S105, the adaptation of the baseline acoustic model is carried out using the back-propagation algorithm.
In the present invention, regularized adaptation of the acoustic model improves the accuracy of speech recognition for accented speech.
Brief description of the drawings
Fig. 1 is a flow diagram of the regularized accent adaptation method in speech recognition according to an embodiment of the present invention;
Fig. 2 is a flow diagram of accent recognition in the regularized accent adaptation method in speech recognition according to an embodiment of the present invention;
Fig. 3 is a flow diagram of softened-probability generation in the regularized accent adaptation method in speech recognition according to an embodiment of the present invention;
Fig. 4 is a flow diagram of accent-dependent acoustic model generation in the regularized accent adaptation method in speech recognition according to an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will appreciate that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit its scope.
As shown in Fig. 1, the regularized accent adaptation method of the embodiment of the present invention mainly comprises the following steps:
Step S100: extract characteristic parameters from the collected accent data.
Mandarin audio data with various dialectal accents can be collected across regions, ages, and genders to form an accent database for training the accent-independent baseline acoustic model.
In this embodiment, Mel-spectrum features or Mel-frequency cepstral coefficients (MFCC) are used. MFCC is modeled on the human auditory system, offers good recognition performance, and is widely used throughout speech signal processing. Here, static parameters can be extracted first, and their first-order and second-order differences then calculated separately; the finally extracted parameters are, for example, 39-dimensional, and subsequent recognition uses these 39-dimensional features.
In other embodiments, methods such as LPCC (linear prediction cepstral coefficients), HMM (hidden Markov models), or DTW (dynamic time warping) may also be used for characteristic parameter extraction.
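The 39-dimensional feature construction described in step S100 can be sketched in plain NumPy. This sketch assumes 13 static MFCC coefficients (a common choice; the patent only fixes the final 39 dimensions) and a standard regression-window delta formula, which the patent does not specify:

```python
import numpy as np

def delta(feat, N=2):
    """Window-based first-order difference of a (frames, dims) feature matrix."""
    T = feat.shape[0]
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")  # replicate edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feat, dtype=float)
    for t in range(T):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)
        ) / denom
    return out

def add_deltas(static):
    """Stack static coefficients with delta and delta-delta: 13 dims -> 39 dims."""
    d1 = delta(static)        # first-order difference
    d2 = delta(d1)            # second-order difference
    return np.hstack([static, d1, d2])
```

Applying the delta of a delta gives the second-order difference, so a (T, 13) static matrix becomes the (T, 39) feature matrix used for the subsequent recognition steps.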
Step S101: train the accent-independent baseline acoustic model using the extracted characteristic parameters.
In this embodiment, a model based on a long short-term memory (LSTM) neural network is used as the baseline acoustic model, and the loss function is the connectionist temporal classification (CTC) loss function.
In other embodiments, other models may be used to train the acoustic model, including hidden Markov model-Gaussian mixture models, hidden Markov model-feedforward neural network models, hidden Markov model-LSTM neural network models, hidden Markov model-convolutional neural network models, etc.
Specifically, an accent-independent deep LSTM recurrent neural network baseline acoustic model can be trained with the CTC loss function on the extracted acoustic characteristic parameters. This CTC loss function is the standard loss function.
Step S102: as shown in Fig. 2, identify the accent category of the accent data with a classifier, using the extracted characteristic parameters.
In the present invention, the classifier can be any classifier capable of classifying accent data. This embodiment uses a feedforward neural network classifier built on a deep neural network; it can have 4 categories and 2 hidden layers, each containing 1024 nodes, with cross-entropy as the loss function.
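A minimal NumPy sketch of the forward pass of such a classifier (2 hidden layers of 1024 units, 4 accent categories, cross-entropy loss). The ReLU activation, the input dimension of 39, and the small-Gaussian weight initialization are illustrative assumptions; the patent does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # stabilized
    return e / e.sum(axis=-1, keepdims=True)

def init_classifier(in_dim=39, hidden=1024, n_classes=4):
    """Two hidden layers of 1024 units, 4 accent categories, as in the embodiment."""
    dims = [in_dim, hidden, hidden, n_classes]
    return [(rng.normal(0, 0.01, (a, b)), np.zeros(b)) for a, b in zip(dims, dims[1:])]

def forward(params, x):
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:   # ReLU on hidden layers, softmax on output
            h = relu(h)
    return softmax(h)

def cross_entropy(probs, labels):
    """Mean negative log probability of the true accent category."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
```

The classifier's argmax over the 4 output probabilities gives the accent category used later when generating the accent-dependent model.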
Step S103: calculate the softened probability distribution.
As shown in Fig. 3, according to the extracted characteristic parameters, the softened probability distribution of the accent data is calculated with the accent-independent baseline acoustic model constructed in step S101.
The softened probabilities are calculated with the forward algorithm: a probability value is computed for each label of the acoustic model's output layer, i.e., the softened probability.
Evidently, steps S102 and S103 can be carried out simultaneously, or successively in either order.
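The forward algorithm referred to here is, in the CTC setting, the standard forward recursion that sums the probabilities of all frame-level paths collapsing to a label sequence. A minimal NumPy sketch under the usual CTC conventions (a blank symbol, labels interleaved with blanks) is:

```python
import numpy as np

def ctc_forward(probs, labels, blank=0):
    """Standard CTC forward algorithm.

    probs:  (T, C) per-frame label posteriors from the acoustic model.
    labels: target label sequence z (without blanks).
    Returns p(z | x): the total probability of all frame-level paths
    that collapse to `labels`.
    """
    # Extended sequence with blanks between and around labels: ^ a ^ b ^
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    T, L = probs.shape[0], len(ext)

    alpha = np.zeros((T, L))
    alpha[0, 0] = probs[0, ext[0]]
    if L > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(L):
            a = alpha[t - 1, s]
            if s >= 1:
                a += alpha[t - 1, s - 1]
            # Skip transition: allowed when the current label is not blank
            # and differs from the label two positions back.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    return alpha[T - 1, L - 1] + (alpha[T - 1, L - 2] if L > 1 else 0.0)
```

In practice the recursion is run in log space for numerical stability; the plain-probability form above is kept for readability.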
Step S104: regularize the loss function of the baseline acoustic model.
Specifically, the loss function of the accent-independent baseline acoustic model is treated as a regularization term and added to the standard loss function of the accent-dependent model, so as to prevent the adaptation process from destroying the parameters of the neural network acoustic model or causing the training process to over-fit. In this embodiment, the loss function is the CTC loss function.
For an input target utterance x with corresponding label sequence z, the standard and the regularized CTC loss functions are:

L(S) = −ln ∏_{(x,z)∈S} p(z|x) = −∑_{(x,z)∈S} ln p(z|x)

L̂(S) = −∑_{(x,z)∈S} [(1−ρ) ln p(z|x) + ρ ln p_AI(z|x)]

where ρ is the regularization parameter, S is the training sample set, L(S) is the standard CTC loss function of the accent-dependent acoustic model, and L̂(S) is the regularized CTC loss function. ln p(z|x) is the log probability of the correct label z under the accent-dependent acoustic model; ln p_AI(z|x) is the softened log probability distribution of label z, computed on the accent-independent LSTM baseline acoustic model with the forward algorithm; (1−ρ) ln p(z|x) + ρ ln p_AI(z|x) is the final log probability, i.e., the linear combination of the correct log probability and the softened log probability.
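In step S104 the accent-dependent model's log probability and the baseline's softened log probability are combined linearly with weight ρ, and the negated sum over the training set S gives the regularized loss. A minimal sketch, assuming the per-utterance log probabilities have already been computed (e.g. with the forward algorithm):

```python
import numpy as np

def regularized_ctc_loss(log_p_dep, log_p_indep, rho=0.5):
    """Regularized CTC-style loss over a sample set S.

    log_p_dep:   array of ln p(z|x) from the accent-dependent model.
    log_p_indep: array of ln p_AI(z|x), the softened log probabilities
                 from the accent-independent baseline model.
    rho:         regularization parameter in [0, 1]; rho = 0 recovers
                 the standard loss of the accent-dependent model.
    """
    log_p_dep = np.asarray(log_p_dep, dtype=float)
    log_p_indep = np.asarray(log_p_indep, dtype=float)
    combined = (1.0 - rho) * log_p_dep + rho * log_p_indep
    return -combined.sum()
```

The rho = 0 endpoint makes the regularization term vanish, and rho = 1 keeps the adapted model pinned to the baseline's softened distribution, which is what prevents the adaptation from over-fitting or destroying the baseline parameters.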
Step S105: adapt the accent-independent baseline acoustic model using the regularized loss function, generating the accent-dependent acoustic model.
As shown in Fig. 4, the characteristic parameters extracted in step S100, the accent category produced in step S102, the softened probabilities calculated in step S103, and the accent data are taken as input, and the accent-independent acoustic model is adapted with the regularized loss function derived in step S104 to generate the accent-dependent acoustic model.
In this embodiment, the adaptation can be carried out with the back-propagation algorithm, ultimately producing the accent-dependent acoustic model. The back-propagation algorithm is particularly suitable for neural network training.
Preferably, only the last layer of the baseline acoustic model is adapted to obtain the accent-dependent acoustic model. Compared with adapting all layers of the acoustic model, adapting only the last layer improves adaptation speed.
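Last-layer-only adaptation can be sketched as follows. For brevity this sketch uses a softmax output layer with cross-entropy rather than the patent's CTC loss, and plain gradient descent; the point it illustrates is that back-propagation stops at the output layer, so the frozen lower layers only supply activations:

```python
import numpy as np

def adapt_last_layer(W_out, b_out, hidden, targets, lr=0.1, steps=50):
    """Adapt only the output layer of an otherwise frozen network.

    hidden:  (N, H) activations produced by the frozen lower layers.
    targets: (N,) integer labels for the adaptation data.
    Only W_out and b_out receive gradient updates; this mirrors the
    last-layer-only adaptation that speeds up the procedure.
    """
    W, b = W_out.copy(), b_out.copy()
    N = hidden.shape[0]
    for _ in range(steps):
        logits = hidden @ W + b
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs = e / e.sum(axis=1, keepdims=True)
        grad = probs.copy()                      # d(mean CE)/d(logits)
        grad[np.arange(N), targets] -= 1.0
        grad /= N
        W -= lr * (hidden.T @ grad)              # gradient step on W only
        b -= lr * grad.sum(axis=0)
    return W, b
```

Because the lower layers are never touched, each adaptation step costs one matrix product per batch instead of a full backward pass through the network, which is the speed advantage the embodiment claims.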
With the method of the present invention, regularized adaptation of the acoustic model improves the accuracy of speech recognition for accented speech, and regularizing the loss function simplifies the adaptation procedure.
It should be noted that the above definitions of the elements are not limited to the specific structures or shapes mentioned in the embodiments; those skilled in the art may simply substitute known alternatives for them.
Heretofore, the technical solutions of the present invention have been described in conjunction with the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will all fall within the protection scope of the present invention.

Claims (8)

1. A regularized accent adaptation method in speech recognition, characterized in that the method comprises the following steps:
Step S100: extract characteristic parameters from the collected accent data;
Step S101: train an accent-independent baseline acoustic model using the extracted characteristic parameters;
Step S102: identify the accent category of the accent data with a classifier, using the extracted characteristic parameters;
Step S103: calculate a softened probability distribution using the forward algorithm;
Step S104: regularize the loss function of the baseline acoustic model;
the loss function being the connectionist temporal classification (CTC) loss function, and the loss function of the baseline acoustic model being treated as a regularization term added to the standard loss function of the accent-dependent model; for an input target utterance x with corresponding label sequence z, the standard and the regularized CTC loss functions are:

L(S) = −ln ∏_{(x,z)∈S} p(z|x) = −∑_{(x,z)∈S} ln p(z|x)

L̂(S) = −∑_{(x,z)∈S} [(1−ρ) ln p(z|x) + ρ ln p_AI(z|x)]

where ρ is the regularization parameter, S is the training sample set, L(S) is the standard CTC loss function of the accent-dependent acoustic model, and L̂(S) is the regularized CTC loss function of the accent-dependent acoustic model; ln p(z|x) is the log probability of the correct label z under the accent-dependent acoustic model; ln p_AI(z|x) is the softened log probability distribution of label z, computed on the accent-independent LSTM baseline acoustic model with the forward algorithm; (1−ρ) ln p(z|x) + ρ ln p_AI(z|x) is the linear combination of the correct log probability and the softened log probability;
Step S105: adapt the accent-independent baseline acoustic model using the regularized loss function, generating an accent-dependent acoustic model.
2. The method according to claim 1, characterized in that the characteristic parameters are Mel-spectrum features or Mel-frequency cepstral features.
3. The method according to claim 2, characterized in that the static parameters of the accent data are extracted first, and the first-order and second-order differences of the static parameters are then calculated separately to obtain the characteristic parameters.
4. The method according to claim 1, characterized in that the baseline acoustic model is a long short-term memory neural network model.
5. The method according to claim 1, characterized in that the classifier is a feedforward neural network classifier.
6. The method according to claim 1, characterized in that the softened probability distribution is calculated using the forward algorithm.
7. The method according to any one of claims 1 to 6, characterized in that, in step S105, only the last layer of the baseline acoustic model is adapted to obtain the accent-dependent acoustic model.
8. The method according to any one of claims 1 to 6, characterized in that, in step S105, the adaptation of the baseline acoustic model is carried out using the back-propagation algorithm.
CN201610971766.3A 2016-10-28 2016-10-28 Regularization accent adaptive approach in speech recognition Active CN106531157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610971766.3A CN106531157B (en) 2016-10-28 2016-10-28 Regularization accent adaptive approach in speech recognition

Publications (2)

Publication Number Publication Date
CN106531157A CN106531157A (en) 2017-03-22
CN106531157B (en) 2019-10-22

Family

ID=58326772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610971766.3A Active CN106531157B (en) 2016-10-28 2016-10-28 Regularization accent adaptive approach in speech recognition

Country Status (1)

Country Link
CN (1) CN106531157B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102208030A (en) * 2011-06-03 2011-10-05 天津大学 Bayesian-model-averaging-based model combing method on regularization path of support vector machine
CN102405495A (en) * 2009-03-11 2012-04-04 谷歌公司 Audio classification for information retrieval using sparse features
EP2996045A1 (en) * 2014-09-10 2016-03-16 Xerox Corporation Language model with structured penalty

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7890328B1 (en) * 2006-09-07 2011-02-15 At&T Intellectual Property Ii, L.P. Enhanced accuracy for speech recognition grammars
US10373054B2 (en) * 2015-04-19 2019-08-06 International Business Machines Corporation Annealed dropout training of neural networks


Also Published As

Publication number Publication date
CN106531157A (en) 2017-03-22


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant