CN111933123A - Acoustic modeling method based on gated cyclic unit - Google Patents
Acoustic modeling method based on gated cyclic unit
- Publication number
- CN111933123A (application CN202010966498.2A)
- Authority
- CN
- China
- Prior art keywords
- unit
- model
- gate
- state vector
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/148—Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Abstract
Step 1, extracting the corresponding acoustic features from the original audio data; step 2, improving the gated recurrent unit with layer normalization and computing the forward output of the neural network with the improved gated recurrent unit; step 3, training the model on the state vector for the current time computed in step 2; and step 4, decoding with the trained model, namely finding the output sequence with the maximum probability. The invention applies the layer normalization technique to the gated recurrent unit, which normalizes the activations of the neurons and speeds up network convergence, thereby reducing the network training time; it replaces the activation function of the traditional gated recurrent unit with the ELU activation function, which improves robustness to noisy data; and by optimizing the computation formulas of the gate structure it reduces the model parameters of the traditional gated recurrent unit while improving the recognition performance of the model.
Description
Technical Field
The invention belongs to the technical field of speech recognition, relates to an acoustic modeling method, and particularly relates to an acoustic modeling method based on a gated recurrent unit.
Background
In recent years, with the continuous development of artificial intelligence and computer technology, deep learning has been widely applied in fields such as images and speech. As one of the most natural forms of human-machine interaction, speech has become a hot research direction in both academia and industry.
The acoustic model is one of the core modules of a speech recognition system, and its performance directly affects the whole system. Before 2009 the basic structure of an acoustic model was the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), but with the successful application of neural networks to speech recognition, the conventional GMM-HMM was gradually replaced by the Deep Neural Network-Hidden Markov Model (DNN-HMM). However, since speech is essentially a continuous signal and a DNN sees only a fixed window of the input, a DNN cannot model context information efficiently. The Recurrent Neural Network (RNN), by recurrently connecting hidden-layer nodes across time steps, captures the dynamics of sequential data well and therefore models speech information better.
However, standard RNNs suffer from vanishing and exploding gradients during training. To address this, researchers proposed the Long Short-Term Memory network (LSTM), whose gating mechanism introduces input, forget and output gates to control the flow of information, alleviating the vanishing-gradient problem well and allowing longer history to be learned. Although the LSTM structure is very effective, its complex gating structure also makes implementation more difficult. To simplify the network structure, Cho et al. therefore proposed the Gated Recurrent Unit (GRU), and subsequent speech studies demonstrated that the GRU performs comparably to the LSTM.
In practical applications, however, such methods remain far from the requirements of large-scale commercial use: the GRU still suffers from an excessive number of model parameters, long training times and insufficient robustness to noisy data, all of which greatly limit the performance of a speech recognition system.
Disclosure of Invention
To overcome these defects in the prior art, the invention discloses an acoustic modeling method based on a gated recurrent unit.
The acoustic modeling method based on the gated recurrent unit comprises the following steps:

step 1, extracting the corresponding acoustic features $x_t$ from the original audio data, where the subscript $t = 1, 2, \ldots, T$ and $T$ is the number of frames of the speech signal;

step 2, improving the gated recurrent unit with layer normalization and replacing the tanh activation function of the traditional gated recurrent unit with the ELU activation function; computing the forward output of the neural network with the improved gated recurrent unit function, the forward output including the state vector $h_t$ for the current time $t$;

step 3, training the model on the state vector $h_t$ computed in step 2;

and step 4, decoding with the trained model, namely finding the output sequence with the maximum probability.
Preferably, in step 3 the state vector $h_t$ is normalized to obtain the output probability of each neuron; a corresponding CTC loss function is then constructed in combination with the CTC algorithm, and the model is trained by the backpropagation-through-time (BPTT) algorithm.

The normalization computes

$$p(k \mid x, t) = \frac{\exp\left(y_t^k\right)}{\sum_{k'} \exp\left(y_t^{k'}\right)}$$

where $y_t^k$ is the $k$-th element of $y_t$, $h_t$ is the output state vector at the current time $t$, $p(k \mid x, t)$ is the probability that the network outputs label $k$ at time $t$, and $x$ denotes the current frame input.
Preferably, in step 2 the activation vectors $z_t$ and $r_t$ of the update gate and the reset gate are computed as

$$z_t = \sigma\big(\mathrm{LN}(w_z \odot x_t) + \mathrm{LN}(U_z h_{t-1}) + b_z\big)$$
$$r_t = \sigma\big(\mathrm{LN}(w_r \odot x_t) + \mathrm{LN}(U_r h_{t-1}) + b_r\big)$$

where $x_t$ is the input feature data at time $t$, $h_{t-1}$ is the state vector at the time immediately preceding $t$, $\sigma$ is the logistic sigmoid function, $b_r$ and $b_z$ denote the offset vectors of the reset gate and the update gate respectively, $w_z$ and $w_r$ denote the feedforward weights of the update gate and the reset gate respectively, $U_z$ and $U_r$ denote the recursive weights of the update gate and the reset gate respectively, and $\mathrm{LN}$ is the layer normalization function.
The acoustic modeling method based on the gated recurrent unit has the following advantages:

1. The invention applies the layer normalization technique to the gated recurrent unit, which normalizes the activations of the neurons and improves the network convergence speed, thereby reducing the network training time.

2. The tanh activation function in the traditional gated recurrent unit is replaced with the ELU activation function, which improves robustness to noisy data.

3. To reduce the model parameters of the GRU, the invention replaces the matrix multiplications related to the input in the update gate and the reset gate of the traditional gated recurrent unit with multiplications between elements, which reduces the model parameters of the traditional gated recurrent unit and improves the recognition performance of the model.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
The following provides a more detailed description of the present invention.
The acoustic modeling method based on the gated recurrent unit can be used for continuous speech recognition and for modeling in other scenarios related to speech recognition; the procedure is shown in FIG. 1.
Step 1, extracting the corresponding acoustic features $x_t$ from the original audio data, where the subscript $t = 1, 2, \ldots, T$ and $T$ is the number of frames of the speech signal.
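The patent leaves the feature type unspecified ("corresponding acoustic features"). The following is a minimal sketch only, assuming MFCC features extracted with the librosa library; the library, sample rate and window sizes are all assumptions, not taken from the patent:

```python
# Sketch under stated assumptions: the patent does not name a feature type;
# MFCCs via librosa are used here purely for illustration.
import librosa

def extract_features(wav_path, n_mfcc=13):
    """Return a (T, n_mfcc) matrix whose row t is the feature vector x_t."""
    y, sr = librosa.load(wav_path, sr=16000)                # mono waveform, 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160)  # 25 ms window, 10 ms shift
    return mfcc.T                                           # T frames, one per time step
```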
Step 2, improving the gated recurrent unit with layer normalization, computing the forward output of the neural network with the improved gated recurrent unit function, and normalizing the forward output to obtain the output probability of each neuron.

The normalization may use a softmax function; specifically,

$$p(k \mid x, t) = \frac{\exp\left(y_t^k\right)}{\sum_{k'} \exp\left(y_t^{k'}\right)}$$

where $y_t^k$ is the $k$-th element of $y_t$, $p(k \mid x, t)$ is the probability that the network outputs label $k$ at time $t$, $k$ and $k'$ index the labels summed over, $h_t$ is the output state vector at the current time $t$, and $x$ denotes the current frame input.
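Rendered directly in NumPy, the normalization looks as follows; this is a sketch, and the max subtraction is a standard numerical-stability detail rather than part of the patent text:

```python
import numpy as np

def softmax(y_t):
    """p(k | x, t) = exp(y_t[k]) / sum over k' of exp(y_t[k'])."""
    e = np.exp(y_t - np.max(y_t))   # subtract the max so exp cannot overflow
    return e / e.sum()
```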
The improved gated recurrent unit function LN-SGRU is:

$$z_t = \sigma\big(\mathrm{LN}(w_z \odot x_t) + \mathrm{LN}(U_z h_{t-1}) + b_z\big)$$
$$r_t = \sigma\big(\mathrm{LN}(w_r \odot x_t) + \mathrm{LN}(U_r h_{t-1}) + b_r\big)$$
$$\tilde{h}_t = \mathrm{ELU}\big(\mathrm{LN}(W_h x_t) + \mathrm{LN}(U_h (r_t \odot h_{t-1})) + b_h\big)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where $x_t$ is the input feature data at time $t$; $r_t$, $z_t$ and $\tilde{h}_t$ correspond to the activation vectors of the reset gate, the update gate and the candidate state; $h_t$ is the output state vector at the current time and $h_{t-1}$ is the state vector at the previous time; $\sigma$ is the logistic sigmoid function, which constrains $z_t$ and $r_t$ to the range 0 to 1; $\odot$ denotes multiplication between elements; $W$ and $U$ denote the feedforward and recursive weights respectively, and $b$ is the corresponding offset vector. The subscripts $z$, $r$ and $h$ denote the weights associated with the update gate, the reset gate and the candidate state respectively.
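Read together, these four equations give a compact forward step. The NumPy sketch below is one possible rendering under stated assumptions: layer normalization is applied separately to the feedforward and recurrent terms with its gain and bias at their initial values, and the input dimension equals the hidden dimension so that $w_z \odot x_t$ is well defined. It illustrates the equations rather than reproducing the patented implementation:

```python
import numpy as np

def ln(z, eps=1e-5):
    """Layer normalization with gain 1 and bias 0 (their initial values)."""
    return (z - z.mean()) / np.sqrt(z.var() + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * np.expm1(x))

def ln_sgru_step(x_t, h_prev, w_z, w_r, U_z, U_r, W_h, U_h, b_z, b_r, b_h):
    """One forward step of the improved unit (argument names are assumptions)."""
    z_t = sigmoid(ln(w_z * x_t) + ln(U_z @ h_prev) + b_z)          # update gate
    r_t = sigmoid(ln(w_r * x_t) + ln(U_r @ h_prev) + b_r)          # reset gate
    h_cand = elu(ln(W_h @ x_t) + ln(U_h @ (r_t * h_prev)) + b_h)   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand                     # new state h_t

# Toy usage: hidden size 4, one random input frame.
rng = np.random.default_rng(0)
D = 4
vectors = [rng.standard_normal(D) for _ in range(2)]               # w_z, w_r
matrices = [rng.standard_normal((D, D)) for _ in range(4)]         # U_z, U_r, W_h, U_h
biases = [np.zeros(D) for _ in range(3)]                           # b_z, b_r, b_h
h_t = ln_sgru_step(rng.standard_normal(D), np.zeros(D),
                   *vectors, *matrices, *biases)
```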
Step 3, based on the state vector $h_t$ for the current time $t$ computed in step 2, constructing a corresponding CTC loss function in combination with the CTC algorithm, and training the model by the backpropagation-through-time (BPTT) algorithm.
the way of constructing the CTC loss function can be performed with reference to the existing literature such as the labeling of the unsegmented sequence data with the recovery neural networks (Graves A, Fern' dez S, Gomez F, et al. connection temporary classification [ C ]// Proceedings of the 23rd international reference on Machine learning. 2006: 369-.
Step 4, decoding with the trained model to find the output sequence with the maximum probability.
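The patent does not say how the maximum-probability output sequence is found; a common, minimal choice is best-path (greedy) CTC decoding, sketched below, while beam search or WFST decoding would be stronger alternatives:

```python
import numpy as np

def greedy_ctc_decode(log_probs, blank=0):
    """Best-path decoding of a (T, n_labels) matrix of per-frame log-probabilities."""
    path = log_probs.argmax(axis=-1)          # most likely label at each frame
    decoded, prev = [], blank
    for label in path:
        if label != prev and label != blank:  # collapse repeats, then drop blanks
            decoded.append(int(label))
        prev = label
    return decoded
```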
The improved gated recurrent unit function starts from the traditional gated recurrent unit equations; adopting the layer normalization method, the gated recurrent unit equations become

$$z_t = \sigma\big(\mathrm{LN}(W_z x_t) + \mathrm{LN}(U_z h_{t-1}) + b_z\big) \quad (1.1)$$
$$r_t = \sigma\big(\mathrm{LN}(W_r x_t) + \mathrm{LN}(U_r h_{t-1}) + b_r\big) \quad (1.2)$$
$$\tilde{h}_t = \tanh\big(\mathrm{LN}(W_h x_t) + \mathrm{LN}(U_h (r_t \odot h_{t-1})) + b_h\big) \quad (1.3)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \quad (1.4)$$
The layer normalization function LN is defined as follows; reference may be made to the corresponding literature, such as: Ba J L, Kiros J R, Hinton G E. Layer normalization [J]. arXiv preprint arXiv:1607.06450, 2016.

$$\mathrm{LN}(z; \alpha, \beta) = \frac{z - \mu}{\sigma} \odot \alpha + \beta$$
$$\mu = \frac{1}{D}\sum_{i=1}^{D} z_i, \qquad \sigma = \sqrt{\frac{1}{D}\sum_{i=1}^{D}\left(z_i - \mu\right)^2}$$

where $\mu$ and $\sigma$ correspond respectively to the mean and the standard deviation of the summed inputs of each layer, and $D$ is the number of neurons in the current layer; $\beta$ and $\alpha$ are respectively the adaptive bias and gain of the neurons, with initialization values of 0 and 1 respectively; $z_i$ denotes the $i$-th element of the vector $z$, and $z$ is the input vector of each layer of neurons.
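With the gain at 1 and the bias at 0 (their initialization values), LN simply standardizes each layer's summed inputs. A quick NumPy check of the definition follows; the small `eps` is an added numerical safeguard that the formula above omits:

```python
import numpy as np

def layer_norm(z, gain=1.0, bias=0.0, eps=1e-5):
    mu = z.mean()                                    # mean over the D neurons
    sigma = np.sqrt(((z - mu) ** 2).mean() + eps)    # std over the D neurons
    return gain * (z - mu) / sigma + bias

z = np.random.randn(256) * 7.0 + 3.0                 # arbitrary pre-activations
out = layer_norm(z)
print(float(out.mean()), float(out.std()))           # ~0.0 and ~1.0
```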
The tanh activation function in formula (1.3) is replaced by the ELU activation function, which makes the network more robust to noisy data and fully exploits the benefits brought by the layer normalization technique, so that the network converges faster. Formula (1.3) therefore becomes:

$$\tilde{h}_t = \mathrm{ELU}\big(\mathrm{LN}(W_h x_t) + \mathrm{LN}(U_h (r_t \odot h_{t-1})) + b_h\big)$$
The ELU activation function is defined as formula (2.3), where the invention sets $\alpha$ to 1:

$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\left(e^{x} - 1\right), & x \le 0 \end{cases} \quad (2.3)$$
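A direct rendering of formula (2.3) with $\alpha = 1$, as a sketch:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU per formula (2.3): x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(x))

print(elu(np.array([-2.0, 0.0, 2.0])))   # [-0.8647  0.  2.]
```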
in the calculation formula of gate structure due to gated circulation cellAndthere is a certain redundancy of the information of (1), so that it is possible to reduce the redundancy by an appropriate amountAnd the information carried by the model parameters are fully utilized, so that the recognition effect of the model is better. In this respect, the invention changes the calculation formulas of the update gate and the reset gate, namely, in the formulas (1.1) and (1.2),Become into,The matrix multiplication is changed into element corresponding multiplication, obviously, the number of model parameters can be greatly reduced by the multiplication among elements, and further, the calculation is simplified.
Combining the above improvements, the improved gated recurrent unit function is:

$$z_t = \sigma\big(\mathrm{LN}(w_z \odot x_t) + \mathrm{LN}(U_z h_{t-1}) + b_z\big)$$
$$r_t = \sigma\big(\mathrm{LN}(w_r \odot x_t) + \mathrm{LN}(U_r h_{t-1}) + b_r\big)$$
$$\tilde{h}_t = \mathrm{ELU}\big(\mathrm{LN}(W_h x_t) + \mathrm{LN}(U_h (r_t \odot h_{t-1})) + b_h\big)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
as will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing describes preferred embodiments of the present invention. Provided the preferred embodiments are not obviously contradictory, they may be combined in any manner. The specific parameters in the embodiments and examples serve only to illustrate the inventors' verification process clearly and are not intended to limit the scope of the invention, which is defined by the claims; equivalent structural changes made according to the description and drawings of the present invention are likewise included in the scope of the present invention.
Claims (4)
1. An acoustic modeling method based on a gated recurrent unit, characterized by comprising the following steps:

step 1, extracting the corresponding acoustic features $x_t$ from the original audio data, where the subscript $t = 1, 2, \ldots, T$ and $T$ is the number of frames of the speech signal;

step 2, improving the gated recurrent unit with layer normalization and replacing the tanh activation function of the traditional gated recurrent unit with the ELU activation function; computing the forward output of the neural network with the improved gated recurrent unit function, the forward output including the state vector $h_t$ for the current time $t$;

step 3, training the model on the state vector $h_t$ computed in step 2;

and step 4, decoding with the trained model, namely finding the output sequence with the maximum probability.
2. The acoustic modeling method based on a gated recurrent unit of claim 1, wherein in step 3 the state vector $h_t$ is normalized to obtain the output probability of each neuron, a corresponding CTC loss function is constructed in combination with the CTC algorithm, and the model is trained by the backpropagation-through-time (BPTT) algorithm.

3. The acoustic modeling method based on a gated recurrent unit of claim 2, wherein step 2 further comprises normalizing the forward output by

$$p(k \mid x, t) = \frac{\exp\left(y_t^k\right)}{\sum_{k'} \exp\left(y_t^{k'}\right)}$$

where $y_t^k$ is the $k$-th element of $y_t$, $p(k \mid x, t)$ is the probability that the network outputs label $k$ at time $t$, and $x$ denotes the current frame input.
4. The acoustic modeling method based on a gated recurrent unit of claim 1, wherein in step 2 the activation vectors $z_t$ and $r_t$ of the update gate and the reset gate are computed as

$$z_t = \sigma\big(\mathrm{LN}(w_z \odot x_t) + \mathrm{LN}(U_z h_{t-1}) + b_z\big)$$
$$r_t = \sigma\big(\mathrm{LN}(w_r \odot x_t) + \mathrm{LN}(U_r h_{t-1}) + b_r\big)$$

where $x_t$ is the input feature data at time $t$, $h_{t-1}$ is the state vector at the time immediately preceding $t$, $\sigma$ is the logistic sigmoid function, $b_r$ and $b_z$ denote the offset vectors of the reset gate and the update gate respectively, $w_z$ and $w_r$ denote the feedforward weights of the update gate and the reset gate respectively, $U_z$ and $U_r$ denote the recursive weights of the update gate and the reset gate respectively, and $\mathrm{LN}$ is the layer normalization function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010966498.2A CN111933123A (en) | 2020-09-15 | 2020-09-15 | Acoustic modeling method based on gated cyclic unit |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010966498.2A CN111933123A (en) | 2020-09-15 | 2020-09-15 | Acoustic modeling method based on gated cyclic unit |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111933123A true CN111933123A (en) | 2020-11-13 |
Family
ID=73334646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010966498.2A Pending CN111933123A (en) | 2020-09-15 | 2020-09-15 | Acoustic modeling method based on gated cyclic unit |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111933123A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105976808A (en) * | 2016-04-18 | 2016-09-28 | 成都启英泰伦科技有限公司 | Intelligent speech recognition system and method |
CA3005241A1 (en) * | 2017-05-19 | 2018-11-19 | Salesforce.Com, Inc. | Domain specific language for generation of recurrent neural network architectures |
CN110738983A (en) * | 2018-07-02 | 2020-01-31 | 成都启英泰伦科技有限公司 | Multi-neural-network model voice recognition method based on equipment working state switching |
Non-Patent Citations (5)
Title |
---|
DJORK-ARNÉ CLEVERT: "Fast and accurate deep network learning by exponential linear units (ELUs)", 《ICLR 2016》 *
ELSAYED N: "Empirical activation function effects on unsupervised convolutional LSTM learning", 《ICTAI》 *
MARTIN SCHRIMPF: "A flexible approach to automated RNN architecture generation", 《arXiv:1712.07316v1 [cs.CL]》 *
TAESUP KIM: "Dynamic layer normalization for adaptive neural acoustic modeling in speech recognition", 《Proc. Interspeech 2017》 *
WEN DENGFENG: "Research on acoustic modeling for speech recognition based on recurrent neural networks", 《China Master's Theses Full-text Database, Information Science and Technology》 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906887A (en) * | 2021-02-20 | 2021-06-04 | 上海大学 | Sparse GRU neural network acceleration realization method and device |
CN113707135A (en) * | 2021-10-27 | 2021-11-26 | 成都启英泰伦科技有限公司 | Acoustic model training method for high-precision continuous speech recognition |
CN113707135B (en) * | 2021-10-27 | 2021-12-31 | 成都启英泰伦科技有限公司 | Acoustic model training method for high-precision continuous speech recognition |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20201113 |