CN111933123A - Acoustic modeling method based on gated cyclic unit - Google Patents
Acoustic modeling method based on gated cyclic unit
- Publication number
- CN111933123A (application CN202010966498.2A)
- Authority
- CN
- China
- Prior art keywords
- unit
- model
- gate
- state vector
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/148—Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Abstract
Step 1, extracting the corresponding acoustic features from the original audio data; step 2, improving the gated recurrent unit with layer normalization and computing the forward output of the neural network with the improved gated recurrent unit; step 3, training the model on the state vector for the current time computed in step 2; and step 4, decoding with the trained model, namely finding the output sequence with the maximum probability. The invention applies the layer normalization technique to the gated recurrent unit, which normalizes the activations of the neurons and speeds up network convergence, thereby reducing the network training time; it replaces the activation function of the traditional gated recurrent unit with the ELU activation function, which improves robustness to noisy data; and by optimizing the computation formulas of the gate structure it reduces the model parameters of the traditional gated recurrent unit while improving the recognition performance of the model.
Description
Technical Field
The invention belongs to the technical field of speech recognition, relates to an acoustic modeling method, and particularly relates to an acoustic modeling method based on a gated recurrent unit.
Background
In recent years, with the continuous development of artificial intelligence and computer technology, deep learning has been widely applied in fields such as images and speech. As one of the most natural forms of human-machine interaction, speech has become a hot research direction in both academia and industry.
The acoustic model is one of the core modules of a speech recognition system, and its performance directly affects the whole system. Before 2009 the basic structure of an acoustic model was the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), but with the successful application of neural networks to speech recognition, the conventional GMM-HMM was gradually replaced by the Deep Neural Network-Hidden Markov Model (DNN-HMM). However, since speech is essentially a continuous signal and a DNN sees only a fixed window of the input, a DNN cannot model context information efficiently. The Recurrent Neural Network (RNN), by recurrently connecting hidden-layer nodes across time steps, captures the dynamics of sequential data well and therefore models speech information better.
However, standard RNNs suffer from vanishing and exploding gradients during training. To address this, researchers proposed the Long Short-Term Memory network (LSTM), whose gating mechanism introduces input, forget and output gates to control the flow of information, alleviating the vanishing-gradient problem well and allowing longer history to be learned. Although the LSTM structure is very effective, its complex gating structure also makes implementation more difficult. To simplify the network structure, Cho et al. therefore proposed the Gated Recurrent Unit (GRU), and subsequent speech studies demonstrated that the GRU performs comparably to the LSTM.
In practical applications, however, such methods remain far from the requirements of large-scale commercial use: the GRU still suffers from an excessive number of model parameters, long training times and insufficient robustness to noisy data, all of which greatly limit the performance of a speech recognition system.
Disclosure of Invention
To overcome these defects in the prior art, the invention discloses an acoustic modeling method based on a gated recurrent unit.
The acoustic modeling method based on the gated recurrent unit comprises the following steps:

step 1, extracting the corresponding acoustic features $x_t$ from the original audio data, where the subscript $t = 1, 2, \ldots, T$ and $T$ is the number of frames of the speech signal;

step 2, improving the gated recurrent unit with layer normalization and replacing the tanh activation function of the traditional gated recurrent unit with the ELU activation function; computing the forward output of the neural network with the improved gated recurrent unit function, the forward output including the state vector $h_t$ for the current time $t$;

step 3, training the model on the state vector $h_t$ computed in step 2;

and step 4, decoding with the trained model, namely finding the output sequence with the maximum probability.
Preferably, in step 3 the state vector $h_t$ is normalized to obtain the output probability of each neuron; a corresponding CTC loss function is then constructed in combination with the CTC algorithm, and the model is trained by the backpropagation-through-time (BPTT) algorithm.

The normalization computes

$$p(k \mid x, t) = \frac{\exp\left(y_t^k\right)}{\sum_{k'} \exp\left(y_t^{k'}\right)}$$

where $y_t^k$ is the $k$-th element of $y_t$, $h_t$ is the output state vector at the current time $t$, $p(k \mid x, t)$ is the probability that the network outputs label $k$ at time $t$, and $x$ denotes the current frame input.
Preferably, in step 2 the activation vectors $z_t$ and $r_t$ of the update gate and the reset gate are computed as

$$z_t = \sigma\big(\mathrm{LN}(w_z \odot x_t) + \mathrm{LN}(U_z h_{t-1}) + b_z\big)$$
$$r_t = \sigma\big(\mathrm{LN}(w_r \odot x_t) + \mathrm{LN}(U_r h_{t-1}) + b_r\big)$$

where $x_t$ is the input feature data at time $t$, $h_{t-1}$ is the state vector at the time immediately preceding $t$, $\sigma$ is the logistic sigmoid function, $b_r$ and $b_z$ denote the offset vectors of the reset gate and the update gate respectively, $w_z$ and $w_r$ denote the feedforward weights of the update gate and the reset gate respectively, $U_z$ and $U_r$ denote the recursive weights of the update gate and the reset gate respectively, and $\mathrm{LN}$ is the layer normalization function.
The acoustic modeling method based on the gated recurrent unit has the following advantages:

1. The invention applies the layer normalization technique to the gated recurrent unit, which normalizes the activations of the neurons and improves the network convergence speed, thereby reducing the network training time.

2. The tanh activation function in the traditional gated recurrent unit is replaced with the ELU activation function, which improves robustness to noisy data.

3. To reduce the model parameters of the GRU, the invention replaces the matrix multiplications related to the input in the update gate and the reset gate of the traditional gated recurrent unit with multiplications between elements, which reduces the model parameters of the traditional gated recurrent unit and improves the recognition performance of the model.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
The following provides a more detailed description of the present invention.
The acoustic modeling method based on the gated recurrent unit can be used for continuous speech recognition and for modeling in other scenarios related to speech recognition; the procedure is shown in FIG. 1.
Step 1, extracting the corresponding acoustic features $x_t$ from the original audio data, where the subscript $t = 1, 2, \ldots, T$ and $T$ is the number of frames of the speech signal.
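The patent leaves the feature type unspecified ("corresponding acoustic features"). The following is a minimal sketch only, assuming MFCC features extracted with the librosa library; the library, sample rate and window sizes are all assumptions, not taken from the patent:

```python
# Sketch under stated assumptions: the patent does not name a feature type;
# MFCCs via librosa are used here purely for illustration.
import librosa

def extract_features(wav_path, n_mfcc=13):
    """Return a (T, n_mfcc) matrix whose row t is the feature vector x_t."""
    y, sr = librosa.load(wav_path, sr=16000)                # mono waveform, 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160)  # 25 ms window, 10 ms shift
    return mfcc.T                                           # T frames, one per time step
```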
Step 2, improving the gated recurrent unit with layer normalization, computing the forward output of the neural network with the improved gated recurrent unit function, and normalizing the forward output to obtain the output probability of each neuron.

The normalization may use a softmax function; specifically,

$$p(k \mid x, t) = \frac{\exp\left(y_t^k\right)}{\sum_{k'} \exp\left(y_t^{k'}\right)}$$

where $y_t^k$ is the $k$-th element of $y_t$, $p(k \mid x, t)$ is the probability that the network outputs label $k$ at time $t$, $k$ and $k'$ index the labels summed over, $h_t$ is the output state vector at the current time $t$, and $x$ denotes the current frame input.
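Rendered directly in NumPy, the normalization looks as follows; this is a sketch, and the max subtraction is a standard numerical-stability detail rather than part of the patent text:

```python
import numpy as np

def softmax(y_t):
    """p(k | x, t) = exp(y_t[k]) / sum over k' of exp(y_t[k'])."""
    e = np.exp(y_t - np.max(y_t))   # subtract the max so exp cannot overflow
    return e / e.sum()
```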
The improved gated recurrent unit function LN-SGRU is:

$$z_t = \sigma\big(\mathrm{LN}(w_z \odot x_t) + \mathrm{LN}(U_z h_{t-1}) + b_z\big)$$
$$r_t = \sigma\big(\mathrm{LN}(w_r \odot x_t) + \mathrm{LN}(U_r h_{t-1}) + b_r\big)$$
$$\tilde{h}_t = \mathrm{ELU}\big(\mathrm{LN}(W_h x_t) + \mathrm{LN}(U_h (r_t \odot h_{t-1})) + b_h\big)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where $x_t$ is the input feature data at time $t$; $r_t$, $z_t$ and $\tilde{h}_t$ correspond to the activation vectors of the reset gate, the update gate and the candidate state; $h_t$ is the output state vector at the current time and $h_{t-1}$ is the state vector at the previous time; $\sigma$ is the logistic sigmoid function, which constrains $z_t$ and $r_t$ to the range 0 to 1; $\odot$ denotes multiplication between elements; $W$ and $U$ denote the feedforward and recursive weights respectively, and $b$ is the corresponding offset vector. The subscripts $z$, $r$ and $h$ denote the weights associated with the update gate, the reset gate and the candidate state respectively.
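Read together, these four equations give a compact forward step. The NumPy sketch below is one possible rendering under stated assumptions: layer normalization is applied separately to the feedforward and recurrent terms with its gain and bias at their initial values, and the input dimension equals the hidden dimension so that $w_z \odot x_t$ is well defined. It illustrates the equations rather than reproducing the patented implementation:

```python
import numpy as np

def ln(z, eps=1e-5):
    """Layer normalization with gain 1 and bias 0 (their initial values)."""
    return (z - z.mean()) / np.sqrt(z.var() + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * np.expm1(x))

def ln_sgru_step(x_t, h_prev, w_z, w_r, U_z, U_r, W_h, U_h, b_z, b_r, b_h):
    """One forward step of the improved unit (argument names are assumptions)."""
    z_t = sigmoid(ln(w_z * x_t) + ln(U_z @ h_prev) + b_z)          # update gate
    r_t = sigmoid(ln(w_r * x_t) + ln(U_r @ h_prev) + b_r)          # reset gate
    h_cand = elu(ln(W_h @ x_t) + ln(U_h @ (r_t * h_prev)) + b_h)   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand                     # new state h_t

# Toy usage: hidden size 4, one random input frame.
rng = np.random.default_rng(0)
D = 4
vectors = [rng.standard_normal(D) for _ in range(2)]               # w_z, w_r
matrices = [rng.standard_normal((D, D)) for _ in range(4)]         # U_z, U_r, W_h, U_h
biases = [np.zeros(D) for _ in range(3)]                           # b_z, b_r, b_h
h_t = ln_sgru_step(rng.standard_normal(D), np.zeros(D),
                   *vectors, *matrices, *biases)
```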
Step 3, based on the state vector $h_t$ for the current time $t$ computed in step 2, constructing a corresponding CTC loss function in combination with the CTC algorithm, and training the model by the backpropagation-through-time (BPTT) algorithm.
the way of constructing the CTC loss function can be performed with reference to the existing literature such as the labeling of the unsegmented sequence data with the recovery neural networks (Graves A, Fern' dez S, Gomez F, et al. connection temporary classification [ C ]// Proceedings of the 23rd international reference on Machine learning. 2006: 369-.
Step 4, decoding with the trained model to find the output sequence with the maximum probability.
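The patent does not say how the maximum-probability output sequence is found; a common, minimal choice is best-path (greedy) CTC decoding, sketched below, while beam search or WFST decoding would be stronger alternatives:

```python
import numpy as np

def greedy_ctc_decode(log_probs, blank=0):
    """Best-path decoding of a (T, n_labels) matrix of per-frame log-probabilities."""
    path = log_probs.argmax(axis=-1)          # most likely label at each frame
    decoded, prev = [], blank
    for label in path:
        if label != prev and label != blank:  # collapse repeats, then drop blanks
            decoded.append(int(label))
        prev = label
    return decoded
```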
The improved gated recurrent unit function starts from the traditional gated recurrent unit equations; adopting the layer normalization method, the gated recurrent unit equations become

$$z_t = \sigma\big(\mathrm{LN}(W_z x_t) + \mathrm{LN}(U_z h_{t-1}) + b_z\big) \quad (1.1)$$
$$r_t = \sigma\big(\mathrm{LN}(W_r x_t) + \mathrm{LN}(U_r h_{t-1}) + b_r\big) \quad (1.2)$$
$$\tilde{h}_t = \tanh\big(\mathrm{LN}(W_h x_t) + \mathrm{LN}(U_h (r_t \odot h_{t-1})) + b_h\big) \quad (1.3)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \quad (1.4)$$
The layer normalization function LN is defined as follows; reference may be made to the corresponding literature, such as: Ba J L, Kiros J R, Hinton G E. Layer normalization [J]. arXiv preprint arXiv:1607.06450, 2016.

$$\mathrm{LN}(z; \alpha, \beta) = \frac{z - \mu}{\sigma} \odot \alpha + \beta$$
$$\mu = \frac{1}{D}\sum_{i=1}^{D} z_i, \qquad \sigma = \sqrt{\frac{1}{D}\sum_{i=1}^{D}\left(z_i - \mu\right)^2}$$

where $\mu$ and $\sigma$ correspond respectively to the mean and the standard deviation of the summed inputs of each layer, and $D$ is the number of neurons in the current layer; $\beta$ and $\alpha$ are respectively the adaptive bias and gain of the neurons, with initialization values of 0 and 1 respectively; $z_i$ denotes the $i$-th element of the vector $z$, and $z$ is the input vector of each layer of neurons.
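With the gain at 1 and the bias at 0 (their initialization values), LN simply standardizes each layer's summed inputs. A quick NumPy check of the definition follows; the small `eps` is an added numerical safeguard that the formula above omits:

```python
import numpy as np

def layer_norm(z, gain=1.0, bias=0.0, eps=1e-5):
    mu = z.mean()                                    # mean over the D neurons
    sigma = np.sqrt(((z - mu) ** 2).mean() + eps)    # std over the D neurons
    return gain * (z - mu) / sigma + bias

z = np.random.randn(256) * 7.0 + 3.0                 # arbitrary pre-activations
out = layer_norm(z)
print(float(out.mean()), float(out.std()))           # ~0.0 and ~1.0
```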
The tanh activation function in formula (1.3) is replaced by the ELU activation function, which makes the network more robust to noisy data and fully exploits the benefits brought by the layer normalization technique, so that the network converges faster. Formula (1.3) therefore becomes:

$$\tilde{h}_t = \mathrm{ELU}\big(\mathrm{LN}(W_h x_t) + \mathrm{LN}(U_h (r_t \odot h_{t-1})) + b_h\big)$$
The ELU activation function is defined as formula (2.3), where the invention sets $\alpha$ to 1:

$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\left(e^{x} - 1\right), & x \le 0 \end{cases} \quad (2.3)$$
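A direct rendering of formula (2.3) with $\alpha = 1$, as a sketch:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU per formula (2.3): x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(x))

print(elu(np.array([-2.0, 0.0, 2.0])))   # [-0.8647  0.  2.]
```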
in the calculation formula of gate structure due to gated circulation cellAndthere is a certain redundancy of the information of (1), so that it is possible to reduce the redundancy by an appropriate amountAnd the information carried by the model parameters are fully utilized, so that the recognition effect of the model is better. In this respect, the invention changes the calculation formulas of the update gate and the reset gate, namely, in the formulas (1.1) and (1.2),Become into,The matrix multiplication is changed into element corresponding multiplication, obviously, the number of model parameters can be greatly reduced by the multiplication among elements, and further, the calculation is simplified.
Combining the above improvements, the improved gated recurrent unit function is:

$$z_t = \sigma\big(\mathrm{LN}(w_z \odot x_t) + \mathrm{LN}(U_z h_{t-1}) + b_z\big)$$
$$r_t = \sigma\big(\mathrm{LN}(w_r \odot x_t) + \mathrm{LN}(U_r h_{t-1}) + b_r\big)$$
$$\tilde{h}_t = \mathrm{ELU}\big(\mathrm{LN}(W_h x_t) + \mathrm{LN}(U_h (r_t \odot h_{t-1})) + b_h\big)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
as will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing describes preferred embodiments of the present invention. Provided the preferred embodiments are not obviously contradictory, they may be combined in any manner. The specific parameters in the embodiments and examples serve only to illustrate the inventors' verification process clearly and are not intended to limit the scope of the invention, which is defined by the claims; equivalent structural changes made according to the description and drawings of the present invention are likewise included in the scope of the present invention.
Claims (4)
1. An acoustic modeling method based on a gated recurrent unit, characterized by comprising the following steps:

step 1, extracting the corresponding acoustic features $x_t$ from the original audio data, where the subscript $t = 1, 2, \ldots, T$ and $T$ is the number of frames of the speech signal;

step 2, improving the gated recurrent unit with layer normalization and replacing the tanh activation function of the traditional gated recurrent unit with the ELU activation function; computing the forward output of the neural network with the improved gated recurrent unit function, the forward output including the state vector $h_t$ for the current time $t$;

step 3, training the model on the state vector $h_t$ computed in step 2;

and step 4, decoding with the trained model, namely finding the output sequence with the maximum probability.
2. The acoustic modeling method based on a gated recurrent unit of claim 1, wherein in step 3 the state vector $h_t$ is normalized to obtain the output probability of each neuron, a corresponding CTC loss function is constructed in combination with the CTC algorithm, and the model is trained by the backpropagation-through-time (BPTT) algorithm.

3. The acoustic modeling method based on a gated recurrent unit of claim 2, wherein step 2 further comprises normalizing the forward output by

$$p(k \mid x, t) = \frac{\exp\left(y_t^k\right)}{\sum_{k'} \exp\left(y_t^{k'}\right)}$$

where $y_t^k$ is the $k$-th element of $y_t$, $p(k \mid x, t)$ is the probability that the network outputs label $k$ at time $t$, and $x$ denotes the current frame input.
4. The acoustic modeling method based on a gated recurrent unit of claim 1, wherein in step 2 the activation vectors $z_t$ and $r_t$ of the update gate and the reset gate are computed as

$$z_t = \sigma\big(\mathrm{LN}(w_z \odot x_t) + \mathrm{LN}(U_z h_{t-1}) + b_z\big)$$
$$r_t = \sigma\big(\mathrm{LN}(w_r \odot x_t) + \mathrm{LN}(U_r h_{t-1}) + b_r\big)$$

where $x_t$ is the input feature data at time $t$, $h_{t-1}$ is the state vector at the time immediately preceding $t$, $\sigma$ is the logistic sigmoid function, $b_r$ and $b_z$ denote the offset vectors of the reset gate and the update gate respectively, $w_z$ and $w_r$ denote the feedforward weights of the update gate and the reset gate respectively, $U_z$ and $U_r$ denote the recursive weights of the update gate and the reset gate respectively, and $\mathrm{LN}$ is the layer normalization function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010966498.2A CN111933123A (en) | 2020-09-15 | 2020-09-15 | Acoustic modeling method based on gated cyclic unit |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010966498.2A CN111933123A (en) | 2020-09-15 | 2020-09-15 | Acoustic modeling method based on gated cyclic unit |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111933123A true CN111933123A (en) | 2020-11-13 |
Family
ID=73334646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010966498.2A Pending CN111933123A (en) | 2020-09-15 | 2020-09-15 | Acoustic modeling method based on gated cyclic unit |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111933123A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105976808A (en) * | 2016-04-18 | 2016-09-28 | 成都启英泰伦科技有限公司 | Intelligent speech recognition system and method |
CA3005241A1 (en) * | 2017-05-19 | 2018-11-19 | Salesforce.Com, Inc. | Domain specific language for generation of recurrent neural network architectures |
CN110738983A (en) * | 2018-07-02 | 2020-01-31 | 成都启英泰伦科技有限公司 | Multi-neural-network model voice recognition method based on equipment working state switching |
Non-Patent Citations (5)
Title |
---|
DJORK-ARNÉ CLEVERT: "Fast and accurate deep network learning by exponential linear units (ELUs)", 《ICLR 2016》 *
ELSAYED N: "Empirical activation function effects on unsupervised convolutional LSTM learning", 《ICTAI》 *
MARTIN SCHRIMPF: "A flexible approach to automated RNN architecture generation", 《arXiv:1712.07316v1 [cs.CL]》 *
TAESUP KIM: "Dynamic layer normalization for adaptive neural acoustic modeling in speech recognition", 《Proc. Interspeech 2017》 *
WEN DENGFENG: "Research on acoustic modeling for speech recognition based on recurrent neural networks", 《China Master's Theses Full-text Database, Information Science and Technology》 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906887A (en) * | 2021-02-20 | 2021-06-04 | 上海大学 | Sparse GRU neural network acceleration realization method and device |
CN113707135A (en) * | 2021-10-27 | 2021-11-26 | 成都启英泰伦科技有限公司 | Acoustic model training method for high-precision continuous speech recognition |
CN113707135B (en) * | 2021-10-27 | 2021-12-31 | 成都启英泰伦科技有限公司 | Acoustic model training method for high-precision continuous speech recognition |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20201113 |