WO2016101688A1 - Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network

Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network

Info

Publication number
WO2016101688A1
WO2016101688A1 (PCT/CN2015/092380)
Authority
WO
WIPO (PCT)
Prior art keywords
output, neural network, short-term memory
Prior art date
Application number
PCT/CN2015/092380
Other languages
French (fr)
Chinese (zh)
Inventor
杨毅 (Yang Yi)
孙甲松 (Sun Jiasong)
Original Assignee
Tsinghua University (清华大学)
Priority date
Filing date
Publication date
Application filed by Tsinghua University (清华大学)
Publication of WO2016101688A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/08 — Speech classification or search
    • G10L 15/16 — Speech classification or search using artificial neural networks

Abstract

A continuous speech recognition method based on a deep long short-term memory (LSTM) recurrent neural network. The method uses a noisy speech signal (302) and the original clean speech signal (301) as training samples and constructs two structurally identical deep LSTM recurrent neural network modules (303, 305). A cross-entropy is computed between each pair of corresponding deep LSTM layers (102) of the two modules (303, 305) to measure the difference between them, and the cross-entropy parameters are updated through a linear recurrent projection layer (108), finally yielding a deep LSTM recurrent neural network acoustic model that is robust to environmental noise. The constructed acoustic model improves the recognition rate for continuous noisy speech and avoids the problem that, because of the large parameter scale of deep neural networks (DNN), most computation must be done on GPU devices. With low computational complexity and fast convergence, the method is broadly applicable to machine-learning fields related to speech recognition, such as speaker recognition, keyword recognition, and human-machine interaction.

Description

Continuous speech recognition method based on a deep long short-term memory recurrent neural network

Technical field
The invention belongs to the field of audio technology and in particular relates to a continuous speech recognition method based on a deep long short-term memory (LSTM) recurrent neural network.
Background art
With the rapid development of information technology, speech recognition technology is ready for large-scale commercialization. Current speech recognition mainly uses continuous speech recognition based on statistical models, whose main goal is to find the word sequence with the highest probability given a speech sequence. A continuous speech recognition system usually comprises an acoustic model, a language model, and a decoding method; acoustic modeling, the core technology of continuous speech recognition, has developed rapidly in recent years. The commonly used acoustic model is the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM). Its principle is as follows: the Gaussian mixture model is trained to obtain the probability that each frame of features belongs to each phoneme state, and the hidden Markov model is trained to obtain the transition probabilities between phoneme states (including self-transitions), from which the probability that each phoneme-state sequence produces the current sequence of speech feature vectors is obtained. To account for coarticulation, phonemes are further divided into context-dependent modeling units, which is known as the CD-GMM-HMM method.
In 2011, Microsoft proposed replacing the Gaussian mixture model in the traditional acoustic model with a deep neural network (DNN), forming the new CD-DNN-HMM model, which combines the representational power of the DNN with the sequential modeling capability of the CD-HMM. Its core idea is to apply multi-layer transformations to the acoustic features and to optimize feature extraction and acoustic modeling within the same network. Compared with the traditional GMM-HMM framework, the DNN-HMM model reduced the error rate on an English continuous speech recognition corpus by about 30%. However, each layer of a DNN has on the order of millions of parameters, and the input of each layer is the output of the previous one, so the computational cost is generally high, and performance degrades when speaking rates vary or long temporal sequences must be processed.
A recurrent neural network (RNN) is a neural network with directed cycles between units that express the network's internal dynamic temporal behavior; it is widely used in handwriting recognition and language modeling. Speech signals are complex time-varying signals with complex correlations at different time scales, so the recurrent connections of an RNN are better suited to such complex sequential data than a deep neural network. As a kind of recurrent neural network, the long short-term memory (LSTM) model is better suited than a plain RNN to processing and predicting long sequences whose events are lagged by uncertain intervals. The deep LSTM-RNN acoustic model proposed by the University of Toronto, which adds memory blocks, combines the multi-level representational power of deep neural networks with the RNN's ability to flexibly exploit long-span context, reducing the phoneme recognition error rate on the TIMIT corpus to 17.1%.
However, the gradient descent method used in recurrent neural networks suffers from the vanishing gradient problem: as the number of network layers increases, the gradients dissipate layer by layer while the network weights are being adjusted, so their effect on the weight updates becomes smaller and smaller. Google's proposed two-layer deep LSTM-RNN acoustic model adds a linear recurrent projection layer to the earlier deep LSTM-RNN model to address this problem. Comparative experiments show that the frame accuracy and convergence speed of a plain RNN are clearly inferior to those of the LSTM-RNN and the DNN. In terms of word error rate and convergence speed, the best DNN reached a word error rate of 11.3% after several weeks of training, while the two-layer deep LSTM-RNN model reduced the word error rate to 10.9% after 48 hours of training and to 10.7%/10.5% after 100/200 hours of training.
However, the complexity of real acoustic environments still seriously degrades the performance of continuous speech recognition systems: even with the best current deep neural network methods, only about a 70% recognition rate can be obtained on continuous speech recognition data sets recorded under complex conditions including noise, music, spontaneous speech, and repetition, so the noise immunity and robustness of the acoustic model in continuous speech recognition systems still need improvement. In addition, deep neural network methods have large parameter scales, and most of the computation must be done on GPU devices, which ordinary CPUs cannot match, so such methods are still some distance from the requirements of large-scale commercialization.
Summary of the invention
To overcome the above shortcomings of the prior art, the object of the present invention is to provide a continuous speech recognition method based on a deep long short-term memory recurrent neural network that improves the speech recognition rate for noisy continuous speech signals and features low computational complexity and fast convergence, making it suitable for implementation on an ordinary CPU.
To achieve the above object, the technical solution adopted by the present invention is:

A continuous speech recognition method based on a deep long short-term memory recurrent neural network, comprising:

Step 1: establishing two structurally identical deep LSTM recurrent neural network modules, each comprising multiple long short-term memory layers and linear recurrent projection layers;

Step 2: feeding the original clean speech signal and the noisy signal, respectively, as inputs to the two modules of step 1;

Step 3: computing the cross-entropy over all parameters of the corresponding long short-term memory layers of the two modules to measure the difference in information distribution between them, and updating the cross-entropy parameters through a second linear recurrent projection layer;

Step 4: achieving continuous speech recognition by comparing the final update result with the final output of the deep LSTM recurrent neural network module that takes the original clean speech signal as input.
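The four steps above can be sketched numerically. The sketch below is an illustration only: the toy dimensions, the replacement of each LSTM-plus-projection pair by a simple tanh affine map (the real layers are defined later in the text), and the discretization of the cross-entropy integral as a sum with the inputs shifted positive so the logarithms are defined are all assumptions, not part of the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_forward(x, params):
    # Stand-in for one "LSTM layer + linear recurrent projection layer" pair;
    # here it is just a fixed affine map with tanh, enough to show the wiring.
    W, b = params
    return np.tanh(x @ W + b)

def cross_entropy(x1, x2, eps=1e-8):
    # d(x1, x2) = integral(x1 ln x2) - integral(x2 ln x1), discretized as a sum;
    # inputs are shifted to positive values so the logarithms are defined.
    p1 = np.abs(x1) + eps
    p2 = np.abs(x2) + eps
    return float(np.sum(p1 * np.log(p2)) - np.sum(p2 * np.log(p1)))

T, D, n_layers = 20, 8, 3
clean = rng.standard_normal((T, D))                  # original clean speech features
noisy = clean + 0.1 * rng.standard_normal((T, D))    # noise-corrupted copy

# Step 1: two structurally identical stacks share the same layer parameters.
params = [(rng.standard_normal((D, D)) / np.sqrt(D), np.zeros(D))
          for _ in range(n_layers)]

# Steps 2-3: feed both signals through the stacks and measure, layer by layer,
# the divergence between the corresponding layer outputs.
h_clean, h_noisy, divergences = clean, noisy, []
for p in params:
    h_clean = layer_forward(h_clean, p)
    h_noisy = layer_forward(h_noisy, p)
    divergences.append(cross_entropy(h_clean, h_noisy))

# Step 4 compares the update result against the clean branch's final output;
# here we simply report the per-layer divergences.
print(divergences)
```

Note that d(x, x) = 0 by the symmetry of the formula, so identical clean and noisy branches would produce zero divergence at every layer.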
In the deep LSTM recurrent neural network module, the speech signal x = [x_1, ..., x_T] is the input of the whole module and also the input of the first long short-term memory layer. The output of the first long short-term memory layer is the input of the first linear recurrent projection layer; the output of the first linear recurrent projection layer is the input of the next linear recurrent projection layer, whose output in turn is the input of the one after that, and so on. In the module that takes the original clean speech signal as input, the output of the last linear recurrent projection layer is the output y = [y_1, ..., y_T] of the whole deep LSTM recurrent neural network module, where T is the time length of the speech signal; in the module that takes the noisy signal as input, the output of the last linear recurrent projection layer is discarded.
The long short-term memory layer consists of memory cells, an input gate, an output gate, a forget gate, tanh functions, and multipliers; the long short-term memory layer is the LSTM neural network sub-module. At time t ∈ [1, T], the parameters of the LSTM sub-module are computed as follows:

G_input = sigmoid(W_ix · x + W_ic · Cell' + b_i)

G_forget = sigmoid(W_fx · x + W_fc · Cell' + b_f)

Cell = m' + G_forget ⊙ Cell' + G_input ⊙ tanh(W_cx · x) ⊙ m' + b_c

G_output = sigmoid(W_ox · x + W_oc · Cell' + b_o)

m = tanh(G_output ⊙ Cell ⊙ m')

y = softmax_k(W_ym · m + b_y)

where G_input is the output of the input gate, G_forget the output of the forget gate, Cell the output of the memory cell, Cell' the output of the memory cell at time t-1, G_output the output of the output gate, G'_output the output of the output gate at time t-1, m the output of the linear recurrent projection layer, and m' the output of the linear recurrent projection layer at time t-1; x is the input of the whole deep LSTM recurrent neural network module and y the output of one LSTM sub-module; b_i is the bias of input gate i, b_f the bias of forget gate f, b_c the bias of memory cell c, b_o the bias of output gate o, and b_y the bias of output y, different b denoting different biases; W_ix is the weight between input gate i and input x, W_ic the weight between input gate i and memory cell c, W_fx the weight between forget gate f and input x, W_fc the weight between forget gate f and memory cell c, W_oc the weight between output gate o and memory cell c, and W_ym the weight between output y and output m, with

softmax_k(x) = exp(x_k) / Σ_{l=1}^{K} exp(x_l)

where x_k denotes the input of the k-th (k ∈ [1, K]) softmax function and l ∈ [1, K] indexes the sum over all exp(x_l); ⊙ denotes element-wise multiplication of matrices.
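The sub-module equations can be transcribed into a short numerical sketch. This is an illustration only: the matrix shapes, the random parameter values, and the choice of a nonzero initial m' (so the multiplicative m' terms do not vanish) are assumptions not specified in the text, and the step function follows the patent's formulas literally rather than the standard LSTM update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def lstm_step(x, cell_prev, m_prev, p):
    # One time step, transcribed literally from the formulas above (note they
    # differ from the standard LSTM: m' enters the Cell update and the
    # projection output multiplicatively).
    g_in = sigmoid(p["Wix"] @ x + p["Wic"] @ cell_prev + p["bi"])
    g_fg = sigmoid(p["Wfx"] @ x + p["Wfc"] @ cell_prev + p["bf"])
    cell = m_prev + g_fg * cell_prev + g_in * np.tanh(p["Wcx"] @ x) * m_prev + p["bc"]
    g_out = sigmoid(p["Wox"] @ x + p["Woc"] @ cell_prev + p["bo"])
    m = np.tanh(g_out * cell * m_prev)
    y = softmax(p["Wym"] @ m + p["by"])
    return cell, m, y

rng = np.random.default_rng(1)
D, H, K = 4, 5, 3            # input, hidden/cell, and output (softmax) sizes
p = {"Wix": rng.standard_normal((H, D)), "Wic": rng.standard_normal((H, H)),
     "Wfx": rng.standard_normal((H, D)), "Wfc": rng.standard_normal((H, H)),
     "Wcx": rng.standard_normal((H, D)),
     "Wox": rng.standard_normal((H, D)), "Woc": rng.standard_normal((H, H)),
     "Wym": rng.standard_normal((K, H)),
     "bi": np.zeros(H), "bf": np.zeros(H), "bc": np.zeros(H),
     "bo": np.zeros(H), "by": np.zeros(K)}

cell, m = np.zeros(H), np.full(H, 0.1)   # Cell' and m' at t = 0
for x in rng.standard_normal((6, D)):    # run T = 6 frames through the step
    cell, m, y = lstm_step(x, cell, m, p)

print(y)                                 # a K-dim probability vector
```

Because y is produced by the softmax, each step's output sums to one, matching the formula y = softmax_k(W_ym · m + b_y).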
From the two deep LSTM recurrent neural network modules, the outputs of a pair of LSTM sub-modules at the same level are taken as the two inputs of an update sub-module. An update sub-module consists of a cross-entropy unit and a second linear recurrent projection layer; multiple update sub-modules connected in series form the update module, the output of one update sub-module serving as the input of the next, and the output of the last sub-module being the output of the whole update module.
The cross-entropy in the update sub-module is computed as:

d(x_1, x_2) = ∫ x_1 ln x_2 dt − ∫ x_2 ln x_1 dt

where d is the cross-entropy and x_1 and x_2 are the two inputs of this update sub-module, i.e., the outputs of the LSTM sub-modules of the modules that take the original clean speech signal and the noisy signal as input;

the output of the second linear recurrent projection layer is computed as:

y' = softmax_k(W_y' · d + b_y')

where y' is the output vector of the whole update module, W_y' is the weight from the cross-entropy output to the linear recurrent projection layer output, d is the cross-entropy, and b_y' is the bias.
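The two update formulas can be exercised together in a small sketch. Treating d as a scalar and W_y', b_y' as a K-dimensional weight and bias is an assumption made for illustration, as are the Riemann-sum discretization of the integral and the eps shift that keeps the logarithms defined.

```python
import numpy as np

def cross_entropy(x1, x2, dt=1.0, eps=1e-8):
    # d(x1, x2) = integral(x1 ln x2) - integral(x2 ln x1), with the integral
    # taken as a Riemann sum over the frames.
    p1 = np.abs(x1) + eps
    p2 = np.abs(x2) + eps
    return dt * float(np.sum(p1 * np.log(p2)) - np.sum(p2 * np.log(p1)))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
T, K = 50, 4
x_clean = np.abs(rng.standard_normal(T))                   # clean-branch layer output
x_noisy = x_clean + 0.05 * np.abs(rng.standard_normal(T))  # noisy-branch layer output

d = cross_entropy(x_clean, x_noisy)    # scalar divergence between the branches

# y' = softmax_k(W_y' * d + b_y'): the divergence is projected to a K-dim output.
W_yp = rng.standard_normal(K)
b_yp = rng.standard_normal(K)
y_prime = softmax(W_yp * d + b_yp)

print(d, y_prime)
```

By the antisymmetry of the formula, d(x, x) = 0, so two identical layer outputs contribute no update; y' is always a probability vector because of the softmax.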
Existing deep neural network acoustic models perform well in quiet environments but fail when heavy environmental noise sharply reduces the signal-to-noise ratio. Compared with a deep neural network acoustic model, the recurrent neural network acoustic model of the present invention has directed cycles between its units, which effectively describe the dynamic temporal behavior inside the network and are better suited to speech data with complex timing. The long short-term memory network, in turn, is better suited than a plain recurrent network to processing and predicting long sequences with lagged events of uncertain timing, so an acoustic model built on it for speech recognition achieves better results. Furthermore, the deep LSTM recurrent neural network acoustic model structure reduces the influence of noise features on the network parameters, improving the noise immunity and robustness of the speech recognition system under environmental noise interference.
Brief description of the drawings
Figure 1 is a flow chart of the deep long short-term memory neural network model of the present invention.
Figure 2 is a flow chart of the deep LSTM recurrent neural network update module of the present invention.
Figure 3 is a flow chart of the robust deep LSTM neural network acoustic model of the present invention.
Detailed description
Embodiments of the present invention are described in detail below with reference to the drawings and examples.
The present invention proposes a method and apparatus for a robust deep LSTM neural network acoustic model, in particular for continuous speech recognition scenarios. These methods and apparatus are not limited to continuous speech recognition and may be any method and apparatus related to speech recognition.
Step 1: establish two structurally identical deep LSTM recurrent neural network modules, each comprising multiple long short-term memory layers and linear recurrent projection layers, and feed the original clean speech signal and the noisy signal, respectively, as inputs to the two modules.
Figure 1 is a flow chart of the deep LSTM recurrent neural network module of the present invention, comprising the following:
The input 101 is the speech signal x = [x_1, ..., x_T] (T is the time length of the speech signal). Inside the box is the long short-term memory layer 102, i.e., the LSTM neural network sub-module, which consists of memory cells 103, an input gate 104, an output gate 105, a forget gate 106, a tanh function 107, and multipliers. The output of the LSTM sub-module is the input of the linear recurrent projection layer 108, whose output is y = [y_1, ..., y_T], i.e., the output 109 of the LSTM recurrent neural network sub-module; 109 serves as the input of the next LSTM sub-module, and the cycle repeats several times.
At time t ∈ [1, T], the parameters of the LSTM neural network sub-module are computed as follows:

G_input = sigmoid(W_ix · x + W_ic · Cell' + b_i)

G_forget = sigmoid(W_fx · x + W_fc · Cell' + b_f)

Cell = m' + G_forget ⊙ Cell' + G_input ⊙ tanh(W_cx · x) ⊙ m' + b_c

G_output = sigmoid(W_ox · x + W_oc · Cell' + b_o)

m = tanh(G_output ⊙ Cell ⊙ m')

y = softmax_k(W_ym · m + b_y)

where G_input is the output of the input gate, G_forget the output of the forget gate, Cell the output of the memory cell, Cell' the output of the memory cell at time t-1, G_output the output of the output gate, G'_output the output of the output gate at time t-1, m the output of the linear recurrent projection layer, and m' the output of the linear recurrent projection layer at time t-1; x is the input of the whole deep LSTM recurrent neural network module and y the output of one LSTM sub-module; b_i is the bias of input gate i, b_f the bias of forget gate f, b_c the bias of memory cell c, b_o the bias of output gate o, and b_y the bias of output y, different b denoting different biases; W_ix is the weight between input gate i and input x, W_ic the weight between input gate i and memory cell c, W_fx the weight between forget gate f and input x, W_fc the weight between forget gate f and memory cell c, W_oc the weight between output gate o and memory cell c, and W_ym the weight between output y and output m, with

softmax_k(x) = exp(x_k) / Σ_{l=1}^{K} exp(x_l)

where x_k denotes the input of the k-th (k ∈ [1, K]) softmax function and l ∈ [1, K] indexes the sum over all exp(x_l); ⊙ denotes element-wise multiplication of matrices.
Step 2: compute the cross-entropy over all parameters of the corresponding long short-term memory layers of the two modules to measure the difference in information distribution between them, and update the cross-entropy parameters through the second linear recurrent projection layer.
Figure 2 is a flow chart of the deep LSTM recurrent neural network update module of the present invention, comprising the following: the original clean speech signal and the noisy signal (i.e., the original clean speech signal corrupted by environmental noise) are fed separately as inputs to the deep LSTM recurrent neural network module of Figure 1, yielding the outputs of two LSTM sub-modules (the box of Figure 1), which serve as the inputs 201 of this update module. Inside the dashed box is the update sub-module 202, which consists of the cross-entropy 203 and the second linear recurrent projection layer 204. The output of the update sub-module 202 serves as the input of the next update sub-module, and the cycle repeats several times; the output of the last update sub-module is the output 205 of the whole update module.
The cross-entropy 203 in the update sub-module 202 is computed as:

d(x_1, x_2) = ∫ x_1 ln x_2 dt − ∫ x_2 ln x_1 dt

where d is the cross-entropy and x_1 and x_2 are the two inputs of the update module, i.e., the outputs of the two deep LSTM recurrent neural networks obtained by feeding in the original clean speech signal and the noisy signal, respectively.

The output of the linear recurrent projection layer 204 is computed as:

y' = softmax_k(W_y' · d + b_y')

where y' is the output 205 of the whole module, W_y' is the weight from the cross-entropy 203 output to the linear recurrent projection layer 204, d is the cross-entropy, and b_y' is the bias, with

softmax_k(x) = exp(x_k) / Σ_{l=1}^{K} exp(x_l)

where x_k denotes the input of the k-th (k ∈ [1, K]) softmax function and l ∈ [1, K] indexes the sum over all exp(x_l).
Step 3: achieve continuous speech recognition by comparing the final update result with the final output of the deep LSTM recurrent neural network module that takes the original clean speech signal as input.
Figure 3 is a flow chart of the robust deep LSTM neural network acoustic model of the present invention, comprising the following:
From left to right: the deep LSTM recurrent neural network module 303 with the original clean speech signal 301 as input; the deep LSTM recurrent neural network update module 304; and the deep LSTM recurrent neural network module 305 with the noisy signal (i.e., the original clean speech signal corrupted by environmental noise) 302 as input. The parameters are computed as in steps 1 and 2; the final outputs are the output 306 of the module that takes the original clean speech signal as input and the output 307 of the deep LSTM recurrent neural network update module.

Claims (5)

  1. A continuous speech recognition method based on a deep long short-term memory recurrent neural network, characterized by comprising:
    Step 1: establishing two structurally identical deep LSTM recurrent neural network modules, each comprising multiple long short-term memory layers and linear recurrent projection layers;
    Step 2: feeding the original clean speech signal and the noisy signal, respectively, as inputs to the two modules of step 1;
    Step 3: computing the cross-entropy over all parameters of the corresponding long short-term memory layers of the two modules to measure the difference in information distribution between them, and updating the cross-entropy parameters through a second linear recurrent projection layer;
    Step 4: achieving continuous speech recognition by comparing the final update result with the final output of the deep LSTM recurrent neural network module that takes the original clean speech signal as input.
  2. The continuous speech recognition method based on a deep long short-term memory recurrent neural network according to claim 1, characterized in that, in the deep LSTM recurrent neural network module, the speech signal x = [x_1, ..., x_T] is the input of the whole module and also the input of the first long short-term memory layer; the output of the first long short-term memory layer is the input of the first linear recurrent projection layer; the output of the first linear recurrent projection layer is the input of the next linear recurrent projection layer, whose output in turn is the input of the one after that, and so on; wherein, in the module that takes the original clean speech signal as input, the output of the last linear recurrent projection layer is the output y = [y_1, ..., y_T] of the whole deep LSTM recurrent neural network module, T being the time length of the speech signal, while in the module that takes the noisy signal as input, the output of the last linear recurrent projection layer is discarded.
  3. 根据权利要求1或2所述基于深度长短期记忆循环神经网络的连续语音识别方法,其特征在于,所述长短期记忆层由记忆细胞、输入门、输出门、遗忘门、tanh函数以及乘法器组成,其中长短期记忆层即长短期记忆神经网 络子模块,在t∈[1,T]时刻长短期记忆神经网络子模块中的参数按照如下公式计算:The continuous speech recognition method based on the deep long-term and short-term memory cycle neural network according to claim 1 or 2, wherein the long-term and short-term memory layer is composed of a memory cell, an input gate, an output gate, an forgetting gate, a tanh function, and a multiplier Long-term and short-term memory neural network The complex module, the parameters in the short-term memory neural network sub-module at t∈[1,T] time is calculated according to the following formula:
    G_input = sigmoid(W_ix x + W_ic Cell′ + b_i)
    G_forget = sigmoid(W_fx x + W_fc Cell′ + b_f)
    Cell = m′ + G_forget ⊙ Cell′ + G_input ⊙ tanh(W_cx x) ⊙ m′ + b_c
    G_output = sigmoid(W_ox x + W_oc Cell′ + b_o)
    m = tanh(G_output ⊙ Cell ⊙ m′)
    y = softmax_k(W_ym m + b_y)
    where G_input is the output of the input gate, G_forget the output of the forget gate, Cell the output of the memory cell, Cell′ the output of the memory cell at time t−1, G_output the output of the output gate, G′_output the output of the output gate at time t−1, m the output of the linear recurrent projection layer, and m′ the output of the linear recurrent projection layer at time t−1; x is the input of the entire long short-term memory recurrent neural network module and y is the output of one long short-term memory neural network sub-module; b_i is the bias of the input gate i, b_f the bias of the forget gate f, b_c the bias of the memory cell c, b_o the bias of the output gate o, and b_y the bias of the output y, different b denoting different biases; W_ix is the weight between the input gate i and the input x, W_ic the weight between the input gate i and the memory cell c, W_fx the weight between the forget gate f and the input x, W_fc the weight between the forget gate f and the memory cell c, W_cx the weight between the memory cell c and the input x, W_ox the weight between the output gate o and the input x, W_oc the weight between the output gate o and the memory cell c, and W_ym the weight between the output y and the output m; and
    softmax_k(x) = exp(x_k) / Σ_{l=1}^{K} exp(x_l)
    where x_k denotes the input of the k-th (k∈[1,K]) softmax function and l∈[1,K] indexes the summation over all exp(x_l); ⊙ denotes element-wise multiplication of matrices.
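The claim-3 formulas can be transcribed directly into a single time step. Note this variant differs from the standard LSTM cell: the previous projection output m′ enters the cell update multiplicatively, and the output activation multiplies G_output, Cell, and m′ together. The dictionary-based parameter naming (`"W_ix"` etc.) and the dimensions in the usage below are assumptions of this sketch, not part of the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def lstm_submodule_step(x, cell_prev, m_prev, p):
    """One t-step of the long short-term memory sub-module, following
    the claim-3 formulas; p maps names like "W_ix" to weight arrays
    and "b_i" to bias vectors (naming is an assumption)."""
    g_in  = sigmoid(p["W_ix"] @ x + p["W_ic"] @ cell_prev + p["b_i"])
    g_fg  = sigmoid(p["W_fx"] @ x + p["W_fc"] @ cell_prev + p["b_f"])
    cell  = (m_prev + g_fg * cell_prev
             + g_in * np.tanh(p["W_cx"] @ x) * m_prev + p["b_c"])
    g_out = sigmoid(p["W_ox"] @ x + p["W_oc"] @ cell_prev + p["b_o"])
    m     = np.tanh(g_out * cell * m_prev)
    y     = softmax(p["W_ym"] @ m + p["b_y"])
    return cell, m, y
```

With a hidden size H, input size D, and K softmax classes, the gate weights are (H, D) or (H, H) matrices and W_ym is (K, H); iterating the step over t∈[1,T] produces the sub-module's output sequence.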
  4. The continuous speech recognition method based on a deep long short-term memory recurrent neural network according to claim 3, wherein, from the two deep long short-term memory recurrent neural network modules, the outputs of a pair of long short-term memory neural network sub-modules located at the same level are taken as the two inputs of an update sub-module; an update sub-module consists of a cross-entropy unit and linear recurrent projection layer two; multiple update sub-modules connected in series form the update module, the output of one update sub-module serving as the input of the next update sub-module, and the output of the last sub-module being the output of the entire update module.
  5. The continuous speech recognition method based on a deep long short-term memory recurrent neural network according to claim 4, wherein the cross entropy in the update sub-module is calculated according to the following formula:
    d(x_1, x_2) = ∫ x_1 ln x_2 dt − ∫ x_2 ln x_1 dt
    where d is the cross entropy, and x_1 and x_2 are the two inputs of this update sub-module, i.e. the outputs of the long short-term memory neural network sub-modules in the modules whose inputs are the original clean speech signal and the noisy signal respectively;
    the output of linear recurrent projection layer two is calculated as follows:
    y′ = softmax_k(W_y′ d + b_y′)
    where d is the cross entropy, y′ is the output vector of the entire update module, W_y′ is the weight from the cross-entropy output to the output of linear recurrent projection layer two, and b_y′ is the bias.
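A sketch of the claim-5 computation under two stated assumptions: the integrals are discretized as sums over frames with step dt, and the sub-module inputs are positive-valued sequences (required for the logarithms). The function names and the scalar-d-times-weight-vector reading of W_y′ d are this sketch's assumptions.

```python
import numpy as np

def symmetric_cross_entropy(x1, x2, dt=1.0):
    """Discretization of d(x1,x2) = ∫x1 ln x2 dt − ∫x2 ln x1 dt.

    x1, x2: positive-valued sequences sampled at step dt (assumption).
    The expression is antisymmetric: d(x1,x2) = −d(x2,x1), and
    d(x,x) = 0, so identical clean/noisy sub-module outputs yield
    zero difference."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return dt * (np.sum(x1 * np.log(x2)) - np.sum(x2 * np.log(x1)))

def update_submodule_output(d, w_y, b_y):
    """Linear recurrent projection layer two: y' = softmax_k(W_y' d + b_y'),
    reading d as a scalar scaled into the weight vector w_y."""
    z = w_y * d + b_y
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()
```

Chaining these per level, as claim 4 prescribes, propagates the clean/noisy divergence through the series of update sub-modules.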
PCT/CN2015/092380 2014-12-25 2015-10-21 Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network WO2016101688A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410821646.6 2014-12-25
CN201410821646.6A CN104538028B (en) 2014-12-25 2014-12-25 A kind of continuous speech recognition method that Recognition with Recurrent Neural Network is remembered based on depth shot and long term

Publications (1)

Publication Number Publication Date
WO2016101688A1 true WO2016101688A1 (en) 2016-06-30

Family

ID=52853544

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/092380 WO2016101688A1 (en) 2014-12-25 2015-10-21 Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network

Country Status (2)

Country Link
CN (1) CN104538028B (en)
WO (1) WO2016101688A1 (en)


Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104538028B (en) * 2014-12-25 2017-10-17 清华大学 A kind of continuous speech recognition method that Recognition with Recurrent Neural Network is remembered based on depth shot and long term
CN104952448A (en) * 2015-05-04 2015-09-30 张爱英 Method and system for enhancing features by aid of bidirectional long-term and short-term memory recurrent neural networks
US10909329B2 (en) * 2015-05-21 2021-02-02 Baidu Usa Llc Multilingual image question answering
CN106611599A (en) * 2015-10-21 2017-05-03 展讯通信(上海)有限公司 Voice recognition method and device based on artificial neural network and electronic equipment
KR102494139B1 (en) * 2015-11-06 2023-01-31 삼성전자주식회사 Apparatus and method for training neural network, apparatus and method for speech recognition
CN105389980B (en) * 2015-11-09 2018-01-19 上海交通大学 Short-time Traffic Flow Forecasting Methods based on long short-term memory recurrent neural network
CN105469065B (en) * 2015-12-07 2019-04-23 中国科学院自动化研究所 A kind of discrete emotion identification method based on recurrent neural network
CN105513591B (en) * 2015-12-21 2019-09-03 百度在线网络技术(北京)有限公司 The method and apparatus for carrying out speech recognition with LSTM Recognition with Recurrent Neural Network model
EP3398118B1 (en) * 2016-02-04 2023-07-12 Deepmind Technologies Limited Associative long short-term memory neural network layers
US10235994B2 (en) 2016-03-04 2019-03-19 Microsoft Technology Licensing, Llc Modular deep learning model
CN105559777B (en) * 2016-03-17 2018-10-12 北京工业大学 Electroencephalogramrecognition recognition method based on wavelet packet and LSTM type RNN neural networks
KR102151682B1 (en) * 2016-03-23 2020-09-04 구글 엘엘씨 Adaptive audio enhancement for multi-channel speech recognition
CN111784348A (en) * 2016-04-26 2020-10-16 阿里巴巴集团控股有限公司 Account risk identification method and device
EP3451239A4 (en) 2016-04-29 2020-01-01 Cambricon Technologies Corporation Limited Apparatus and method for executing recurrent neural network and lstm computations
CN106096729B (en) * 2016-06-06 2018-11-20 天津科技大学 A kind of depth-size strategy learning method towards complex task in extensive environment
CN106126492B (en) * 2016-06-07 2019-02-05 北京高地信息技术有限公司 Sentence recognition methods and device based on two-way LSTM neural network
US11449744B2 (en) 2016-06-23 2022-09-20 Microsoft Technology Licensing, Llc End-to-end memory networks for contextual language understanding
CN107808664B (en) * 2016-08-30 2021-07-30 富士通株式会社 Sparse neural network-based voice recognition method, voice recognition device and electronic equipment
US10366163B2 (en) 2016-09-07 2019-07-30 Microsoft Technology Licensing, Llc Knowledge-guided structural attention processing
CN106383888A (en) * 2016-09-22 2017-02-08 深圳市唯特视科技有限公司 Method for positioning and navigation by use of picture retrieval
CN108461080A (en) * 2017-02-21 2018-08-28 中兴通讯股份有限公司 A kind of Acoustic Modeling method and apparatus based on HLSTM models
CN116702843A (en) 2017-05-20 2023-09-05 谷歌有限责任公司 Projection neural network
CN107293288B (en) * 2017-06-09 2020-04-21 清华大学 Acoustic model modeling method of residual long-short term memory recurrent neural network
CN107633842B (en) 2017-06-12 2018-08-31 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN107301864B (en) * 2017-08-16 2020-12-22 重庆邮电大学 Deep bidirectional LSTM acoustic model based on Maxout neuron
CN107657313B (en) * 2017-09-26 2021-05-18 上海数眼科技发展有限公司 System and method for transfer learning of natural language processing task based on field adaptation
CN107993636B (en) * 2017-11-01 2021-12-31 天津大学 Recursive neural network-based music score modeling and generating method
CN108364634A (en) * 2018-03-05 2018-08-03 苏州声通信息科技有限公司 Spoken language pronunciation evaluating method based on deep neural network posterior probability algorithm
CN108831450A (en) * 2018-03-30 2018-11-16 杭州鸟瞰智能科技股份有限公司 A kind of virtual robot man-machine interaction method based on user emotion identification
US10885277B2 (en) 2018-08-02 2021-01-05 Google Llc On-device neural networks for natural language understanding
CN109243494B (en) * 2018-10-30 2022-10-11 南京工程学院 Children emotion recognition method based on multi-attention mechanism long-time memory network
CN110517679B (en) * 2018-11-15 2022-03-08 腾讯科技(深圳)有限公司 Artificial intelligence audio data processing method and device and storage medium
CN111368996B (en) 2019-02-14 2024-03-12 谷歌有限责任公司 Retraining projection network capable of transmitting natural language representation
CN110570845B (en) * 2019-08-15 2021-10-22 武汉理工大学 Voice recognition method based on domain invariant features
CN111429938B (en) * 2020-03-06 2022-09-13 江苏大学 Single-channel voice separation method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5133012A (en) * 1988-12-02 1992-07-21 Kabushiki Kaisha Toshiba Speech recognition system utilizing both a long-term strategic and a short-term strategic scoring operation in a transition network thereof
CN101937675A (en) * 2009-06-29 2011-01-05 展讯通信(上海)有限公司 Voice detection method and equipment thereof
CN102122507A (en) * 2010-01-08 2011-07-13 龚澍 Speech error detection method by front-end processing using artificial neural network (ANN)
US8005674B2 (en) * 2006-11-29 2011-08-23 International Business Machines Corporation Data modeling of class independent recognition models
CN104538028A (en) * 2014-12-25 2015-04-22 清华大学 Continuous voice recognition method based on deep long and short term memory recurrent neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9235799B2 (en) * 2011-11-26 2016-01-12 Microsoft Technology Licensing, Llc Discriminative pretraining of deep neural networks


Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086865A (en) * 2018-06-11 2018-12-25 上海交通大学 A kind of series model method for building up based on cutting Recognition with Recurrent Neural Network
CN109086865B (en) * 2018-06-11 2022-01-28 上海交通大学 Sequence model establishing method based on segmented recurrent neural network
CN110147284A (en) * 2019-05-24 2019-08-20 湖南农业大学 Supercomputer workload prediction method based on two-dimentional shot and long term Memory Neural Networks
CN110377889A (en) * 2019-06-05 2019-10-25 安徽继远软件有限公司 A kind of method for editing text and system based on feedforward sequence Memory Neural Networks
CN110377889B (en) * 2019-06-05 2023-06-20 安徽继远软件有限公司 Text editing method and system based on feedforward sequence memory neural network
CN110705743A (en) * 2019-08-23 2020-01-17 国网浙江省电力有限公司 New energy consumption electric quantity prediction method based on long-term and short-term memory neural network
CN110705743B (en) * 2019-08-23 2023-08-18 国网浙江省电力有限公司 New energy consumption electric quantity prediction method based on long-term and short-term memory neural network
CN112488286A (en) * 2019-11-22 2021-03-12 大唐环境产业集团股份有限公司 MBR membrane pollution online monitoring method and system
CN111191559A (en) * 2019-12-25 2020-05-22 国网浙江省电力有限公司泰顺县供电公司 Overhead line early warning system obstacle identification method based on time convolution neural network
CN111191559B (en) * 2019-12-25 2023-07-11 国网浙江省电力有限公司泰顺县供电公司 Overhead line early warning system obstacle recognition method based on time convolution neural network
CN111079906A (en) * 2019-12-30 2020-04-28 燕山大学 Cement product specific surface area prediction method and system based on long-time and short-time memory network
CN111079906B (en) * 2019-12-30 2023-05-05 燕山大学 Cement finished product specific surface area prediction method and system based on long-short-term memory network
CN111241466A (en) * 2020-01-15 2020-06-05 上海海事大学 Ship flow prediction method based on deep learning
CN111241466B (en) * 2020-01-15 2023-10-03 上海海事大学 Ship flow prediction method based on deep learning
CN111414478A (en) * 2020-03-13 2020-07-14 北京科技大学 Social network emotion modeling method based on deep cycle neural network
CN111414478B (en) * 2020-03-13 2023-11-17 北京科技大学 Social network emotion modeling method based on deep cyclic neural network
CN112001482A (en) * 2020-08-14 2020-11-27 佳都新太科技股份有限公司 Vibration prediction and model training method and device, computer equipment and storage medium
CN112466056A (en) * 2020-12-01 2021-03-09 上海旷日网络科技有限公司 Self-service cabinet pickup system and method based on voice recognition
CN112714130A (en) * 2020-12-30 2021-04-27 南京信息工程大学 Big data-based adaptive network security situation sensing method

Also Published As

Publication number Publication date
CN104538028A (en) 2015-04-22
CN104538028B (en) 2017-10-17

Similar Documents

Publication Publication Date Title
WO2016101688A1 (en) Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network
TWI692751B (en) Voice wake-up method, device and electronic equipment
JP7109302B2 (en) Text generation model update method and text generation device
Nakkiran et al. Compressing deep neural networks using a rank-constrained topology
US20190034784A1 (en) Fixed-point training method for deep neural networks based on dynamic fixed-point conversion scheme
US20180018555A1 (en) System and method for building artificial neural network architectures
US9728183B2 (en) System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification
WO2016145850A1 (en) Construction method for deep long short-term memory recurrent neural network acoustic model based on selective attention principle
CN109065032B (en) External corpus speech recognition method based on deep convolutional neural network
KR20160069329A (en) Method and apparatus for training language model, method and apparatus for recognizing speech
JP2019159654A (en) Time-series information learning system, method, and neural network model
CN108831445A (en) Sichuan dialect recognition methods, acoustic training model method, device and equipment
US20140142929A1 (en) Deep neural networks training for speech and pattern recognition
US20230196202A1 (en) System and method for automatic building of learning machines using learning machines
KR20160089210A (en) Method and apparatus for training language model, method and apparatus for recognizing language
CN109147774B (en) Improved time-delay neural network acoustic model
CN110853630B (en) Lightweight speech recognition method facing edge calculation
CN106340297A (en) Speech recognition method and system based on cloud computing and confidence calculation
KR20220130565A (en) Keyword detection method and apparatus thereof
CN111144124A (en) Training method of machine learning model, intention recognition method, related device and equipment
CN113987179A (en) Knowledge enhancement and backtracking loss-based conversational emotion recognition network model, construction method, electronic device and storage medium
Zhang et al. High order recurrent neural networks for acoustic modelling
CN108461080A (en) A kind of Acoustic Modeling method and apparatus based on HLSTM models
Kang et al. Advanced recurrent network-based hybrid acoustic models for low resource speech recognition
Li et al. Improving long short-term memory networks using maxout units for large vocabulary speech recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15871761

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15871761

Country of ref document: EP

Kind code of ref document: A1