CN104700828A - Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle - Google Patents

Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle

Info

Publication number
CN104700828A
CN104700828A
Authority
CN
China
Prior art keywords
neural network
input
recurrent neural
long term
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510122982.6A
Other languages
Chinese (zh)
Other versions
CN104700828B (en)
Inventor
杨毅
孙甲松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201510122982.6A
Publication of CN104700828A
Priority to PCT/CN2015/092381
Application granted
Publication of CN104700828B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

Abstract

Disclosed is a construction method for a deep long short-term memory (LSTM) recurrent neural network acoustic model based on the selective attention principle. Attention gate units are added inside a deep LSTM recurrent neural network acoustic model to represent the instantaneous functional changes of auditory cortex neurons. These attention gates differ from the other gate units: the other gates correspond one-to-one with the time series, whereas the attention gates embody a short-term plasticity effect and therefore appear only at intervals in the time series. By training this neural network acoustic model on a large amount of speech data containing cross-talk noise, robust feature extraction under cross-talk noise and the construction of robust acoustic models can be achieved; suppressing the influence of non-target streams on feature extraction achieves the goal of improving the robustness of the acoustic model. The method can be widely applied in machine-learning fields related to speech recognition, such as speaker recognition, keyword spotting, and human-computer interaction.

Description

Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle
Technical field
The invention belongs to the field of audio technology, and in particular relates to a construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle.
Background art
With the rapid development of information technology, speech recognition technology has reached the conditions for large-scale commercial use. Current speech recognition mainly adopts continuous speech recognition based on statistical models, whose main goal is to find the most probable word sequence represented by the speech signal; this generally involves building an acoustic model and a language model together with the corresponding search and decoding methods. With the rapid development of acoustic and language models, the performance of speech recognition systems has greatly improved under ideal acoustic conditions. The existing Deep Neural Network-Hidden Markov Model (DNN-HMM) approach is preliminarily mature: it can automatically extract effective features through machine learning and can model the contextual information spanning multiple speech frames. However, each layer of such a model has parameters on the order of millions, and the input of each layer is the output of the previous one, so GPU devices are required to train DNN acoustic models, training times are long, and the strong nonlinearity and parameter sharing also make DNNs difficult to adapt through parameter adaptation.
A Recurrent Neural Network (RNN) is a neural network with directed cycles between its units that express the internal dynamic temporal behavior of the network; it has been widely applied in handwriting recognition, language modeling, and related fields. Speech is a complex time-varying signal with complicated correlations across different time scales, so compared with deep neural networks, the recurrent connections of an RNN are better suited to processing such complex time-series data.
As one kind of recurrent neural network, the Long Short-Term Memory (LSTM) model is better suited than a plain RNN to processing and predicting events in long time series with indefinite delays. The deep LSTM-RNN acoustic model with memory blocks proposed by the University of Toronto combines the multi-level representation ability of deep neural networks with the RNN's flexible use of long-span context, reducing the phoneme recognition error rate on the TIMIT corpus to 17.1%.
However, the gradient descent method used in recurrent neural networks suffers from the vanishing gradient problem: as the weights of the network are adjusted, the gradient dissipates layer by layer as the number of layers increases, and its effect on weight adjustment becomes smaller and smaller. The two-layer deep LSTM-RNN acoustic model proposed by Google adds a linear Recurrent Projection Layer to solve the vanishing gradient problem of the original deep LSTM-RNN model. Comparative experiments show that the frame accuracy and convergence speed of a plain RNN are clearly inferior to those of LSTM-RNN and DNN. In terms of word error rate and convergence speed, the best DNN reaches a word error rate of 11.3% after several weeks of training, whereas the two-layer deep LSTM-RNN model reaches 10.9% after 48 hours of training, and 10.7% and 10.5% after 100 and 200 hours, respectively.
The Deep Bidirectional Long Short-Term Memory Recurrent Neural Network (DBLSTM-RNN) acoustic model proposed by the University of Munich defines separate forward and backward layers in each recurrent layer of the network, uses multi-hidden-layer acoustic feature inputs for higher-level representation, and applies supervised feature projection and enhancement against noise and reverberation. On the 2013 PASCAL CHiME dataset, this method reduced the word error rate from the 55% baseline to 22% over the signal-to-noise-ratio range [-6 dB, 9 dB].
Nevertheless, the complexity of real acoustic environments still severely degrades and disturbs the performance of continuous speech recognition systems. Even the current mainstream DNN acoustic model methods achieve only about 70% recognition rate on continuous speech recognition datasets recorded under complex conditions that include noise, music, spontaneous speech, and repetitions; the noise immunity and robustness of acoustic models in continuous speech recognition systems still leave much room for improvement.
With the rapid development of acoustic and language models, the performance of speech recognition systems has greatly improved under ideal acoustic conditions, and the existing DNN-HMM model is preliminarily mature, able to automatically extract effective features through machine learning and to model the contextual information spanning multiple speech frames. However, most recognition systems remain very sensitive to changes in the acoustic environment, and in particular cannot meet practical performance requirements under cross-talk noise (two or more people speaking simultaneously). Compared with deep neural network acoustic models, the units of a recurrent neural network acoustic model have directed cycles that effectively describe the internal dynamic temporal behavior of the network, making it more suitable for speech data with complex time series. The LSTM network is in turn better suited than a plain RNN to processing and predicting events in long time series with indefinite delays, so acoustic models built on it for speech recognition can achieve better results.
The human brain exhibits a selective attention phenomenon when processing speech in complex scenes. Its basic principle is that the brain has the ability of auditory selective attention: through a top-down control mechanism in the auditory cortex region, it suppresses non-target streams and enhances the target stream. Research shows that during selective attention, the short-term plasticity effect of the auditory cortex increases the ability to discriminate sounds. When attention is highly focused, the primary auditory cortex can begin enhancing the target sound within 50 milliseconds.
Summary of the invention
In order to overcome the shortcomings of the above prior art, the object of the present invention is to provide a construction method for a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle. Attention gate units are added inside the deep LSTM recurrent neural network acoustic model to represent the instantaneous functional changes of auditory cortex neurons. The attention gates differ from the other gate units in that the other gates correspond one-to-one with the time series, whereas the attention gates embody a short-term plasticity effect and therefore appear only at intervals in the time series. By training this neural network acoustic model on a large amount of speech data containing cross-talk noise, robust feature extraction under cross-talk noise and the construction of robust acoustic models can be achieved, and suppressing the influence of non-target streams on feature extraction achieves the goal of improving the robustness of the acoustic model.
To achieve these goals, the technical solution adopted by the present invention is as follows.
A continuous speech recognition method based on the selective attention principle comprises the following steps.
Step 1: build a deep long short-term memory recurrent neural network based on the selective attention principle.
A long short-term memory (LSTM) recurrent neural network is defined from the input to the hidden layer; "deep" means that the output of each LSTM recurrent network is the input of the next LSTM recurrent network, and so on, with the output of the last LSTM recurrent network serving as the output of the whole system. In each LSTM recurrent network, the speech signal x_t is the input at time t and x_{t-1} is the input at time t-1; over the whole utterance the input is x = [x_1, ..., x_T], where t ∈ [1, T] and T is the total duration of the speech signal. The LSTM recurrent network at time t consists of an attention gate, an input gate, an output gate, a forget gate, a memory cell, tanh functions, a hidden layer, and multipliers; the LSTM recurrent network at time t-1 consists of an input gate, an output gate, a forget gate, a memory cell, tanh functions, a hidden layer, and multipliers. The hidden-layer output over the whole utterance is y = [y_1, ..., y_T].
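For illustration only, the following is a minimal numpy sketch of the deep stacking just described, in which each layer's output sequence becomes the next layer's input sequence. A plain LSTM step is used here for brevity (the attention-gated step of the invention is sketched after the formulas in the detailed description); all function names, the dict-based parameter layout, and the dimensions are assumptions of the sketch rather than part of the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, m_prev, c_prev, p):
    """One plain LSTM time step (peephole terms omitted for brevity)."""
    i = sigmoid(p["W_ix"] @ x_t + p["W_im"] @ m_prev + p["b_i"])
    f = sigmoid(p["W_fx"] @ x_t + p["W_fm"] @ m_prev + p["b_f"])
    g = np.tanh(p["W_cx"] @ x_t + p["W_cm"] @ m_prev + p["b_c"])
    o = sigmoid(p["W_ox"] @ x_t + p["W_om"] @ m_prev + p["b_o"])
    c = f * c_prev + i * g        # memory cell update
    m = o * np.tanh(c)            # hidden-layer output of this timestep
    return m, c

def deep_lstm_forward(x, layer_params, hidden_dim):
    """'Deep' stacking: the output sequence of each LSTM network is the
    input sequence of the next, and the output of the last network is
    the output of the whole system."""
    seq = list(x)                 # x = [x_1, ..., x_T]
    for p in layer_params:        # one parameter dict per layer
        m = np.zeros(hidden_dim)
        c = np.zeros(hidden_dim)
        out = []
        for x_t in seq:
            m, c = lstm_step(x_t, m, c, p)
            out.append(m)
        seq = out                 # feed this layer's outputs to the next
    return seq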
Step 2: build the deep long short-term memory recurrent neural network acoustic model based on the selective attention principle.
On the basis of step 1, the deep LSTM recurrent neural network corresponding to every s-th timestep has an attention gate, while the deep LSTM recurrent networks at all other timesteps have no attention gate; that is, the deep long short-term memory recurrent neural network acoustic model based on the selective attention principle is composed of deep LSTM recurrent networks in which attention gates appear only at intervals.
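As a sketch of the interval rule only, assuming timesteps are indexed from t = 1 and the attention gate recurs exactly every s steps (one possible reading of the interval described above):

```python
def has_attention_gate(t, s):
    """True at the timesteps whose deep LSTM network has an attention
    gate; the other gates (input, forget, output) exist at every step."""
    return t % s == 0
```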
Recognition in complex environments, and in particular under cross-talk noise, has always been one of the difficult problems of speech recognition and has hindered its large-scale application. Compared with the prior art, the present invention draws on the selective attention phenomenon that the human brain exhibits when processing speech in complex scenes, which suppresses non-target streams and enhances the target stream. Attention gate units are added inside the deep LSTM recurrent neural network acoustic model to represent the instantaneous functional changes of auditory cortex neurons; the attention gates differ from the other gates in that the other gates correspond one-to-one with the time series, whereas the attention gates embody a short-term plasticity effect and therefore appear only at intervals in the time series. Applied to continuous speech recognition datasets containing cross-talk noise, this method can achieve better performance than deep neural network methods.
Description of the drawings
Fig. 1 is a flow diagram of the deep long short-term memory recurrent neural network based on the selective attention principle of the present invention.
Fig. 2 is a flow diagram of the deep long short-term memory neural network acoustic model based on the selective attention principle of the present invention.
Detailed description
Embodiments of the present invention are described in detail below with reference to the drawings and examples.
The present invention uses a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle to achieve continuous speech recognition. However, the models and methods provided by the invention are not limited to continuous speech recognition and can be used in any method or device related to speech recognition.
The present invention mainly comprises the following steps.
Step 1: build the deep long short-term memory recurrent neural network based on the selective attention principle.
As shown in Fig. 1, inputs 101 and 102 are the speech signal inputs x_t and x_{t-1} at times t and t-1 (t ∈ [1, T], where T is the total duration of the speech signal). The LSTM recurrent network at time t consists of the attention gate 103, input gate 104, forget gate 105, memory cell 106, output gate 107, tanh function 108, tanh function 109, hidden layer 110, multiplier 122, multiplier 123, and multiplier 124. The LSTM recurrent network at time t-1 consists of the input gate 112, forget gate 113, memory cell 114, output gate 115, tanh function 116, tanh function 117, hidden layer 118, multiplier 120, and multiplier 121. The hidden-layer outputs at times t and t-1 are output 111 and output 119, respectively.
Input 102 serves simultaneously as the input of the input gate 112, the forget gate 113, the output gate 115, and the tanh function 116. The outputs of the input gate 112 and the tanh function 116 are fed to multiplier 120, whose result is the input of memory cell 114. The output of memory cell 114 is the input of the tanh function 117; the outputs of the tanh function 117 and the output gate 115 are fed to multiplier 121, whose result is the input of hidden layer 118; and the output of hidden layer 118 is output 119.
Input 101, the output of memory cell 114, and the output of multiplier 121 together form the input of the attention gate 103. The output of the attention gate 103 and the output of multiplier 121 together form the input of the tanh function 108. The output of the attention gate 103, the output of memory cell 114, and the output of multiplier 121 together form the inputs of the input gate 104, the forget gate 105, and the output gate 107, respectively. The outputs of the forget gate 105 and memory cell 114 are fed to multiplier 124; the outputs of the input gate 104 and the tanh function 108 are fed to multiplier 122. The outputs of multiplier 124 and multiplier 122 form the input of memory cell 106; the output of memory cell 106 is the input of the tanh function 109; the outputs of the tanh function 109 and the output gate 107 are fed to multiplier 123; the output of multiplier 123 is the input of hidden layer 110; and the output of hidden layer 110 is output 111.
That is, the parameters at each time t ∈ [1, T] are computed according to the following formulas:

$$G_{atten,t} = \mathrm{sigmoid}(W_{ax} x_t + W_{am} m_{t-1} + W_{ac} Cell_{t-1} + b_a)$$

$$G_{input,t} = \mathrm{sigmoid}(W_{ia} G_{atten,t} + W_{im} m_{t-1} + W_{ic} Cell_{t-1} + b_i)$$

$$G_{forget,t} = \mathrm{sigmoid}(W_{fa} G_{atten,t} + W_{fm} m_{t-1} + W_{fc} Cell_{t-1} + b_f)$$

$$Cell_t = G_{forget,t} \odot Cell_{t-1} + G_{input,t} \odot \tanh(W_{ca} G_{atten,t} + W_{cm} m_{t-1} + b_c)$$

$$G_{output,t} = \mathrm{sigmoid}(W_{oa} G_{atten,t} + W_{om} m_{t-1} + W_{oc} Cell_{t-1} + b_o)$$

$$m_t = G_{output,t} \odot \tanh(Cell_t)$$

$$y_t = \mathrm{softmax}_k(W_{ym} m_t + b_y)$$

where G_atten,t is the output of the attention gate 103 at time t; G_input,t is the output of the input gate 104; G_forget,t is the output of the forget gate 105; Cell_t is the output of the memory cell 106; G_output,t is the output of the output gate 107; m_t is the input of the hidden layer 110; and y_t is the output 111 at time t. x_t is the input 101 at time t; m_{t-1} is the input of the hidden layer 118 at time t-1; and Cell_{t-1} is the output of the memory cell 114 at time t-1. W_ax, W_am, and W_ac are the weights between the attention gate a at time t and, respectively, the input x at time t, the hidden-layer input m at time t-1, and the memory cell c at time t-1; W_ia, W_im, and W_ic are the weights between the input gate i at time t and, respectively, the attention gate a at time t, the hidden-layer input m at time t-1, and the memory cell c at time t-1; W_fa, W_fm, and W_fc are the corresponding weights for the forget gate f; W_ca and W_cm are the weights between the memory cell c at time t and, respectively, the attention gate a at time t and the hidden-layer input m at time t-1; and W_oa, W_om, and W_oc are the corresponding weights for the output gate o. b_a, b_i, b_f, b_c, b_o, and b_y are the bias terms of the attention gate a, the input gate i, the forget gate f, the memory cell c, the output gate o, and the output y, respectively; different b denote different biases. The nonlinearities are

$$\mathrm{sigmoid}(x) = \frac{1}{1+e^{-x}}, \qquad \tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}, \qquad \mathrm{softmax}_k(x) = \frac{e^{x_k}}{\sum_{l=1}^{K} e^{x_l}}$$

where x_k denotes the input of the k-th softmax unit, k ∈ [1, K], the sum runs over all l ∈ [1, K], and ⊙ denotes element-wise multiplication.
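For concreteness, here is a minimal numpy sketch of one attention-gated time step implementing the seven formulas above. The weight and bias names mirror the W and b symbols defined in the text and ⊙ becomes element-wise *; everything else (the dict-based parameter layout, vector shapes, and the matrix-product form of the Cell_{t-1} terms) is an assumption of the sketch, not a prescription of the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))      # shift for numerical stability
    return e / e.sum()

def attention_lstm_step(x_t, m_prev, cell_prev, p):
    """One attention-gated LSTM time step.

    x_t       : input x_t at time t
    m_prev    : hidden-layer input m_{t-1}
    cell_prev : memory cell output Cell_{t-1}
    p         : dict of weight matrices W_* and bias vectors b_*
    Returns (y_t, m_t, Cell_t).
    """
    g_atten = sigmoid(p["W_ax"] @ x_t + p["W_am"] @ m_prev
                      + p["W_ac"] @ cell_prev + p["b_a"])
    g_input = sigmoid(p["W_ia"] @ g_atten + p["W_im"] @ m_prev
                      + p["W_ic"] @ cell_prev + p["b_i"])
    g_forget = sigmoid(p["W_fa"] @ g_atten + p["W_fm"] @ m_prev
                       + p["W_fc"] @ cell_prev + p["b_f"])
    cell = (g_forget * cell_prev
            + g_input * np.tanh(p["W_ca"] @ g_atten
                                + p["W_cm"] @ m_prev + p["b_c"]))
    g_output = sigmoid(p["W_oa"] @ g_atten + p["W_om"] @ m_prev
                       + p["W_oc"] @ cell_prev + p["b_o"])
    m_t = g_output * np.tanh(cell)             # hidden-layer input m_t
    y_t = softmax(p["W_ym"] @ m_t + p["b_y"])  # output y_t
    return y_t, m_t, cell
```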
Step 2: build the deep long short-term memory recurrent neural network acoustic model based on the selective attention principle.
On the basis of step 1, the deep LSTM recurrent neural network corresponding to every s-th timestep (here s = 5) has an attention gate, while the deep LSTM recurrent networks at all other timesteps have no attention gate; that is, the acoustic model based on the selective attention principle is composed of deep LSTM recurrent networks in which attention gates appear only at intervals. Fig. 2 shows the constructed deep long short-term memory recurrent neural network acoustic model based on the selective attention principle: the deep LSTM recurrent network at time t has an attention gate 201, the network at time t-s has an attention gate 202, and so on.
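Continuing the sketch, and reusing sigmoid, softmax, and attention_lstm_step from above, one layer run over the whole utterance might look as follows. At timesteps without an attention gate the gates here read the raw input x_t in place of G_atten,t, which is one reading consistent with the wiring of the time t-1 network in Fig. 1; the exact fallback, the parameter sharing, and the dimensions are assumptions of the sketch:

```python
import numpy as np

def plain_lstm_step(x_t, m_prev, cell_prev, p):
    """A timestep without the attention gate: identical to
    attention_lstm_step except that the gates read x_t directly
    (assumes the input and attention-gate dimensions match)."""
    g_input = sigmoid(p["W_ia"] @ x_t + p["W_im"] @ m_prev
                      + p["W_ic"] @ cell_prev + p["b_i"])
    g_forget = sigmoid(p["W_fa"] @ x_t + p["W_fm"] @ m_prev
                       + p["W_fc"] @ cell_prev + p["b_f"])
    cell = (g_forget * cell_prev
            + g_input * np.tanh(p["W_ca"] @ x_t
                                + p["W_cm"] @ m_prev + p["b_c"]))
    g_output = sigmoid(p["W_oa"] @ x_t + p["W_om"] @ m_prev
                       + p["W_oc"] @ cell_prev + p["b_o"])
    m_t = g_output * np.tanh(cell)
    y_t = softmax(p["W_ym"] @ m_t + p["b_y"])
    return y_t, m_t, cell

def run_layer(x, p, s=5, hidden_dim=64):
    """Run one layer over x = [x_1, ..., x_T], applying the
    attention-gated step every s timesteps (s = 5 in the embodiment,
    cf. Fig. 2) and the plain step at all other timesteps."""
    m = np.zeros(hidden_dim)
    cell = np.zeros(hidden_dim)
    outputs = []
    for t, x_t in enumerate(x, start=1):   # t in [1, T]
        step = attention_lstm_step if t % s == 0 else plain_lstm_step
        y_t, m, cell = step(x_t, m, cell, p)
        outputs.append(y_t)
    return outputs
```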

Claims (2)

1. A construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle, comprising the following steps:
Step 1: build a deep long short-term memory recurrent neural network based on the selective attention principle.
A long short-term memory (LSTM) recurrent neural network is defined from the input to the hidden layer; "deep" means that the output of each LSTM recurrent network is the input of the next LSTM recurrent network, and so on, with the output of the last LSTM recurrent network serving as the output of the whole system. In each LSTM recurrent network, the speech signal x_t is the input at time t and x_{t-1} is the input at time t-1; over the whole utterance the input is x = [x_1, ..., x_T], where t ∈ [1, T] and T is the total duration of the speech signal. The LSTM recurrent network at time t consists of an attention gate, an input gate, an output gate, a forget gate, a memory cell, tanh functions, a hidden layer, and multipliers; the LSTM recurrent network at time t-1 consists of an input gate, an output gate, a forget gate, a memory cell, tanh functions, a hidden layer, and multipliers. The hidden-layer output over the whole utterance is y = [y_1, ..., y_T].
The parameters at each time t ∈ [1, T] are computed according to the following formulas:

$$G_{atten,t} = \mathrm{sigmoid}(W_{ax} x_t + W_{am} m_{t-1} + W_{ac} Cell_{t-1} + b_a)$$

$$G_{input,t} = \mathrm{sigmoid}(W_{ia} G_{atten,t} + W_{im} m_{t-1} + W_{ic} Cell_{t-1} + b_i)$$

$$G_{forget,t} = \mathrm{sigmoid}(W_{fa} G_{atten,t} + W_{fm} m_{t-1} + W_{fc} Cell_{t-1} + b_f)$$

$$Cell_t = G_{forget,t} \odot Cell_{t-1} + G_{input,t} \odot \tanh(W_{ca} G_{atten,t} + W_{cm} m_{t-1} + b_c)$$

$$G_{output,t} = \mathrm{sigmoid}(W_{oa} G_{atten,t} + W_{om} m_{t-1} + W_{oc} Cell_{t-1} + b_o)$$

$$m_t = G_{output,t} \odot \tanh(Cell_t)$$

$$y_t = \mathrm{softmax}_k(W_{ym} m_t + b_y)$$

where G_atten,t is the output of the attention gate at time t; G_input,t is the output of the input gate at time t; G_forget,t is the output of the forget gate at time t; Cell_t is the output of the memory cell at time t; G_output,t is the output of the output gate at time t; m_t is the input of the hidden layer at time t; y_t is the output at time t; x_t is the input at time t; m_{t-1} is the input of the hidden layer at time t-1; and Cell_{t-1} is the output of the memory cell at time t-1. W_ax, W_am, and W_ac are the weights between the attention gate a at time t and, respectively, the input x at time t, the hidden-layer input m at time t-1, and the memory cell c at time t-1; W_ia, W_im, and W_ic are the weights between the input gate i at time t and, respectively, the attention gate a at time t, the hidden-layer input m at time t-1, and the memory cell c at time t-1; W_fa, W_fm, and W_fc are the corresponding weights for the forget gate f; W_ca and W_cm are the weights between the memory cell c at time t and, respectively, the attention gate a at time t and the hidden-layer input m at time t-1; and W_oa, W_om, and W_oc are the corresponding weights for the output gate o. b_a, b_i, b_f, b_c, b_o, and b_y are the bias terms of the attention gate a, the input gate i, the forget gate f, the memory cell c, the output gate o, and the output y, respectively; different b denote different biases. The nonlinearities are

$$\mathrm{sigmoid}(x) = \frac{1}{1+e^{-x}}, \qquad \tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}, \qquad \mathrm{softmax}_k(x) = \frac{e^{x_k}}{\sum_{l=1}^{K} e^{x_l}}$$

where x_k denotes the input of the k-th softmax unit, k ∈ [1, K], the sum runs over all l ∈ [1, K], and ⊙ denotes element-wise multiplication.
Step 2: build the deep long short-term memory recurrent neural network acoustic model based on the selective attention principle.
On the basis of step 1, the deep LSTM recurrent neural network corresponding to every s-th timestep has an attention gate, while the deep LSTM recurrent networks at all other timesteps have no attention gate; that is, the deep long short-term memory recurrent neural network acoustic model based on the selective attention principle is composed of deep LSTM recurrent networks in which attention gates appear only at intervals.
2. The construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle according to claim 1, characterized in that s = 5.
CN201510122982.6A 2015-03-19 2015-03-19 Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle Active CN104700828B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510122982.6A CN104700828B (en) 2015-03-19 2015-03-19 Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle
PCT/CN2015/092381 WO2016145850A1 (en) 2015-03-19 2015-10-21 Construction method for deep long short-term memory recurrent neural network acoustic model based on selective attention principle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510122982.6A CN104700828B (en) 2015-03-19 2015-03-19 Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle

Publications (2)

Publication Number Publication Date
CN104700828A true CN104700828A (en) 2015-06-10
CN104700828B CN104700828B (en) 2018-01-12

Family

ID=53347887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510122982.6A Active CN104700828B (en) 2015-03-19 2015-03-19 Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle

Country Status (2)

Country Link
CN (1) CN104700828B (en)
WO (1) WO2016145850A1 (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105185374A (en) * 2015-09-11 2015-12-23 百度在线网络技术(北京)有限公司 Prosodic hierarchy annotation method and device
CN105354277A (en) * 2015-10-30 2016-02-24 中国船舶重工集团公司第七0九研究所 Recommendation method and system based on recurrent neural network
CN105513591A (en) * 2015-12-21 2016-04-20 百度在线网络技术(北京)有限公司 Method and device for speech recognition by use of LSTM recurrent neural network model
CN105956469A (en) * 2016-04-27 2016-09-21 百度在线网络技术(北京)有限公司 Method and device for identifying file security
WO2016145850A1 (en) * 2015-03-19 2016-09-22 清华大学 Construction method for deep long short-term memory recurrent neural network acoustic model based on selective attention principle
CN106096729A * 2016-06-06 2016-11-09 天津科技大学 Deep policy learning method for complex tasks in large-scale environments
CN106652999A (en) * 2015-10-29 2017-05-10 三星Sds株式会社 System and method for voice recognition
CN106650789A * 2016-11-16 2017-05-10 同济大学 Image description generation method based on a deep LSTM network
CN106683663A (en) * 2015-11-06 2017-05-17 三星电子株式会社 Neural network training apparatus and method, and speech recognition apparatus and method
CN107293291A * 2016-03-30 2017-10-24 中国科学院声学研究所 End-to-end speech recognition method based on an adaptive learning rate
CN107293288A * 2017-06-09 2017-10-24 清华大学 Acoustic model modeling method based on a residual long short-term memory recurrent neural network
CN107484017A * 2017-07-25 2017-12-15 天津大学 Supervised video summarization method based on an attention model
CN107492121A * 2017-07-03 2017-12-19 广州新节奏智能科技股份有限公司 Two-dimensional human skeleton point localization method for monocular depth video
CN107563122A * 2017-09-20 2018-01-09 长沙学院 Crime prediction method based on an interleaved-time-series locally connected recurrent neural network
CN107993636A (en) * 2017-11-01 2018-05-04 天津大学 Music score modeling and generation method based on recurrent neural network
CN108269569A (en) * 2017-01-04 2018-07-10 三星电子株式会社 Audio recognition method and equipment
CN108304914A (en) * 2017-01-12 2018-07-20 三星电子株式会社 System and method for high-order shot and long term memory network
CN108475505A (en) * 2015-11-12 2018-08-31 谷歌有限责任公司 Using partial condition target sequence is generated from list entries
CN108780521A * 2016-02-04 2018-11-09 渊慧科技有限公司 Associative long short-term memory neural network layers
CN109155132A (en) * 2016-03-21 2019-01-04 亚马逊技术公司 Speaker verification method and system
CN109243494A * 2018-10-30 2019-01-18 南京工程学院 Child emotion recognition method based on a multi-attention long short-term memory network
CN109243493A * 2018-10-30 2019-01-18 南京工程学院 Infant cry emotion recognition method based on an improved long short-term memory network
CN109460812A * 2017-09-06 2019-03-12 富士通株式会社 Average-information analysis device, optimization device, and feature visualization device for a neural network
CN109523995A * 2018-12-26 2019-03-26 出门问问信息科技有限公司 Speech recognition method, speech recognition device, readable storage medium, and electronic device
CN109614485A * 2018-11-19 2019-04-12 中山大学 Sentence matching method and device using hierarchical attention based on syntactic structure
CN109866713A (en) * 2019-03-21 2019-06-11 斑马网络技术有限公司 Safety detection method and device, vehicle
CN110085249A * 2019-05-09 2019-08-02 南京工程学院 Single-channel speech enhancement method based on an attention-gated recurrent neural network
CN110135634A * 2019-04-29 2019-08-16 广东电网有限责任公司电网规划研究中心 Medium- and long-term power load forecasting device
CN110192204A * 2016-11-03 2019-08-30 易享信息技术有限公司 Deep neural network model for processing data through multiple linguistic task hierarchies
CN110473529A * 2019-09-09 2019-11-19 极限元(杭州)智能科技股份有限公司 Streaming speech transcription system based on a self-attention mechanism
CN111081231A (en) * 2016-03-23 2020-04-28 谷歌有限责任公司 Adaptive audio enhancement for multi-channel speech recognition
US11003949B2 (en) 2016-11-09 2021-05-11 Microsoft Technology Licensing, Llc Neural network-based action detection

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102100977B1 * 2016-02-03 2020-04-14 구글 엘엘씨 Compressed recurrent neural network model
US9799327B1 (en) * 2016-02-26 2017-10-24 Google Inc. Speech recognition with attention-based recurrent neural networks
US10769522B2 (en) 2017-02-17 2020-09-08 Wipro Limited Method and system for determining classification of text
CN109543165B (en) * 2018-11-21 2022-09-23 中国人民解放军战略支援部队信息工程大学 Text generation method and device based on circular convolution attention model
CN110473554B (en) * 2019-08-08 2022-01-25 Oppo广东移动通信有限公司 Audio verification method and device, storage medium and electronic equipment
CN111079906B (en) * 2019-12-30 2023-05-05 燕山大学 Cement finished product specific surface area prediction method and system based on long-short-term memory network
CN111314345B (en) * 2020-02-19 2022-09-16 安徽大学 Method and device for protecting sequence data privacy, computer equipment and storage medium
CN111311009B (en) * 2020-02-24 2023-05-26 广东工业大学 Pedestrian track prediction method based on long short-term memory
CN111429938B (en) * 2020-03-06 2022-09-13 江苏大学 Single-channel voice separation method and device and electronic equipment
CN111709754B (en) * 2020-06-12 2023-08-25 中国建设银行股份有限公司 User behavior feature extraction method, device, equipment and system
CN111814849B (en) * 2020-06-22 2024-02-06 浙江大学 DA-RNN-based wind turbine generator set key component fault early warning method
CN111985610A (en) * 2020-07-15 2020-11-24 中国石油大学(北京) System and method for predicting pumping efficiency of oil pumping well based on time sequence data
CN111930602B (en) * 2020-08-13 2023-09-22 中国工商银行股份有限公司 Performance index prediction method and device
CN112001482A (en) * 2020-08-14 2020-11-27 佳都新太科技股份有限公司 Vibration prediction and model training method and device, computer equipment and storage medium
CN112214852B (en) * 2020-10-09 2022-10-14 电子科技大学 Turbine mechanical performance degradation prediction method considering degradation rate
CN112382265A * 2020-10-21 2021-02-19 西安交通大学 Active noise reduction method based on deep recurrent neural network, storage medium and system
CN112434784A (en) * 2020-10-22 2021-03-02 暨南大学 Deep student performance prediction method based on multilayer LSTM
CN112906291B (en) * 2021-01-25 2023-05-19 武汉纺织大学 Modeling method and device based on neural network
CN112784472B (en) * 2021-01-27 2023-03-24 电子科技大学 Simulation method for simulating quantum condition principal equation in quantum transport process by using cyclic neural network
CN113792772B (en) * 2021-09-01 2023-11-03 中国船舶重工集团公司第七一六研究所 Cold and hot data identification method for data hierarchical hybrid storage

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080172349A1 (en) * 2007-01-12 2008-07-17 Toyota Engineering & Manufacturing North America, Inc. Neural network controller with fixed long-term and adaptive short-term memory
CN102983819A * 2012-11-08 2013-03-20 南京航空航天大学 Power amplifier simulation method and simulation device
CN103049792A (en) * 2011-11-26 2013-04-17 微软公司 Discriminative pretraining of Deep Neural Network
CN103680496A (en) * 2013-12-19 2014-03-26 百度在线网络技术(北京)有限公司 Deep-neural-network-based acoustic model training method, hosts and system
CN104217226A (en) * 2014-09-09 2014-12-17 天津大学 Dialogue act identification method based on deep neural networks and conditional random fields

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700828B (en) * 2015-03-19 2018-01-12 清华大学 Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080172349A1 (en) * 2007-01-12 2008-07-17 Toyota Engineering & Manufacturing North America, Inc. Neural network controller with fixed long-term and adaptive short-term memory
CN103049792A (en) * 2011-11-26 2013-04-17 微软公司 Discriminative pretraining of Deep Neural Network
CN102983819A * 2012-11-08 2013-03-20 南京航空航天大学 Power amplifier simulation method and simulation device
CN103680496A (en) * 2013-12-19 2014-03-26 百度在线网络技术(北京)有限公司 Deep-neural-network-based acoustic model training method, hosts and system
CN104217226A (en) * 2014-09-09 2014-12-17 天津大学 Dialogue act identification method based on deep neural networks and conditional random fields

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ALEX GRAVES et al.: "Towards end-to-end speech recognition with recurrent neural networks", Proceedings of the 31st International Conference on Machine Learning *

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016145850A1 (en) * 2015-03-19 2016-09-22 清华大学 Construction method for deep long short-term memory recurrent neural network acoustic model based on selective attention principle
CN105185374A (en) * 2015-09-11 2015-12-23 百度在线网络技术(北京)有限公司 Prosodic hierarchy annotation method and device
CN105185374B (en) * 2015-09-11 2017-03-29 百度在线网络技术(北京)有限公司 Prosodic hierarchy annotation method and device
CN106652999A (en) * 2015-10-29 2017-05-10 三星Sds株式会社 System and method for voice recognition
CN105354277A (en) * 2015-10-30 2016-02-24 中国船舶重工集团公司第七0九研究所 Recommendation method and system based on recurrent neural network
CN105354277B (en) * 2015-10-30 2020-11-06 中国船舶重工集团公司第七0九研究所 Recommendation method and system based on recurrent neural network
CN106683663B (en) * 2015-11-06 2022-01-25 三星电子株式会社 Neural network training apparatus and method, and speech recognition apparatus and method
CN106683663A (en) * 2015-11-06 2017-05-17 三星电子株式会社 Neural network training apparatus and method, and speech recognition apparatus and method
CN108475505A * 2015-11-12 2018-08-31 谷歌有限责任公司 Generating target sequences from input sequences using partial conditioning
CN108475505B (en) * 2015-11-12 2023-03-17 谷歌有限责任公司 Generating a target sequence from an input sequence using partial conditions
CN105513591B (en) * 2015-12-21 2019-09-03 百度在线网络技术(北京)有限公司 Method and apparatus for speech recognition with an LSTM recurrent neural network model
CN105513591A (en) * 2015-12-21 2016-04-20 百度在线网络技术(北京)有限公司 Method and device for speech recognition by use of LSTM recurrent neural network model
CN108780521B (en) * 2016-02-04 2023-05-26 渊慧科技有限公司 Associative long short-term memory neural network layers
CN108780521A * 2016-02-04 2018-11-09 渊慧科技有限公司 Associative long short-term memory neural network layers
CN109155132B (en) * 2016-03-21 2023-05-30 亚马逊技术公司 Speaker verification method and system
CN109155132A (en) * 2016-03-21 2019-01-04 亚马逊技术公司 Speaker verification method and system
CN111081231A (en) * 2016-03-23 2020-04-28 谷歌有限责任公司 Adaptive audio enhancement for multi-channel speech recognition
CN111081231B (en) * 2016-03-23 2023-09-05 谷歌有限责任公司 Adaptive audio enhancement for multi-channel speech recognition
CN107293291A * 2016-03-30 2017-10-24 中国科学院声学研究所 End-to-end speech recognition method based on an adaptive learning rate
CN105956469B (en) * 2016-04-27 2019-04-26 百度在线网络技术(北京)有限公司 File security identification method and device
CN105956469A (en) * 2016-04-27 2016-09-21 百度在线网络技术(北京)有限公司 Method and device for identifying file security
CN106096729B (en) * 2016-06-06 2018-11-20 天津科技大学 Deep policy learning method for complex tasks in large-scale environments
CN106096729A * 2016-06-06 2016-11-09 天津科技大学 Deep policy learning method for complex tasks in large-scale environments
CN110192204A * 2016-11-03 2019-08-30 易享信息技术有限公司 Deep neural network model for processing data through multiple linguistic task hierarchies
US11797825B2 (en) 2016-11-03 2023-10-24 Salesforce, Inc. Training a joint many-task neural network model using successive regularization
US11783164B2 (en) 2016-11-03 2023-10-10 Salesforce.Com, Inc. Joint many-task neural network model for multiple natural language processing (NLP) tasks
CN110192204B (en) * 2016-11-03 2023-09-29 硕动力公司 Deep neural network model for processing data through multiple language task hierarchies
US11003949B2 (en) 2016-11-09 2021-05-11 Microsoft Technology Licensing, Llc Neural network-based action detection
CN106650789A * 2016-11-16 2017-05-10 同济大学 Image description generation method based on a deep LSTM network
CN106650789B (en) * 2016-11-16 2023-04-07 同济大学 Image description generation method based on a deep LSTM network
CN108269569B (en) * 2017-01-04 2023-10-27 三星电子株式会社 Speech recognition method and device
CN108269569A * 2017-01-04 2018-07-10 三星电子株式会社 Speech recognition method and apparatus
CN108304914A * 2017-01-12 2018-07-20 三星电子株式会社 System and method for high-order long short-term memory network
CN108304914B (en) * 2017-01-12 2023-12-05 三星电子株式会社 System and method for high-order long-short-term memory network
CN107293288A * 2017-06-09 2017-10-24 清华大学 Acoustic model modeling method based on a residual long short-term memory recurrent neural network
CN107492121A * 2017-07-03 2017-12-19 广州新节奏智能科技股份有限公司 Two-dimensional human skeleton point localization method for monocular depth video
CN107492121B (en) * 2017-07-03 2020-12-29 广州新节奏智能科技股份有限公司 Two-dimensional human body bone point positioning method of monocular depth video
CN107484017A * 2017-07-25 2017-12-15 天津大学 Supervised video summarization method based on an attention model
CN107484017B (en) * 2017-07-25 2020-05-26 天津大学 Supervised video abstract generation method based on attention model
CN109460812A * 2017-09-06 2019-03-12 富士通株式会社 Average-information analysis device, optimization device, and feature visualization device for a neural network
CN107563122B (en) * 2017-09-20 2020-05-19 长沙学院 Crime prediction method based on an interleaved-time-series locally connected recurrent neural network
CN107563122A * 2017-09-20 2018-01-09 长沙学院 Crime prediction method based on an interleaved-time-series locally connected recurrent neural network
CN107993636B (en) * 2017-11-01 2021-12-31 天津大学 Recursive neural network-based music score modeling and generating method
CN107993636A (en) * 2017-11-01 2018-05-04 天津大学 Music score modeling and generation method based on recurrent neural network
CN109243493A * 2018-10-30 2019-01-18 南京工程学院 Infant cry emotion recognition method based on an improved long short-term memory network
CN109243493B (en) * 2018-10-30 2022-09-16 南京工程学院 Infant crying emotion recognition method based on improved long-time and short-time memory network
CN109243494B (en) * 2018-10-30 2022-10-11 南京工程学院 Children emotion recognition method based on multi-attention mechanism long-time memory network
CN109243494A * 2018-10-30 2019-01-18 南京工程学院 Child emotion recognition method based on a multi-attention long short-term memory network
CN109614485A * 2018-11-19 2019-04-12 中山大学 Sentence matching method and device using hierarchical attention based on syntactic structure
CN109614485B (en) * 2018-11-19 2023-03-14 中山大学 Sentence matching method and device of hierarchical Attention based on grammar structure
CN109523995B (en) * 2018-12-26 2019-07-09 出门问问信息科技有限公司 Speech recognition method, speech recognition device, readable storage medium, and electronic device
CN109523995A * 2018-12-26 2019-03-26 出门问问信息科技有限公司 Speech recognition method, speech recognition device, readable storage medium, and electronic device
CN109866713A (en) * 2019-03-21 2019-06-11 斑马网络技术有限公司 Safety detection method and device, vehicle
CN110135634A * 2019-04-29 2019-08-16 广东电网有限责任公司电网规划研究中心 Medium- and long-term power load forecasting device
CN110085249A * 2019-05-09 2019-08-02 南京工程学院 Single-channel speech enhancement method based on an attention-gated recurrent neural network
CN110473529B (en) * 2019-09-09 2021-11-05 北京中科智极科技有限公司 Streaming speech transcription system based on a self-attention mechanism
CN110473529A * 2019-09-09 2019-11-19 极限元(杭州)智能科技股份有限公司 Streaming speech transcription system based on a self-attention mechanism

Also Published As

Publication number Publication date
CN104700828B (en) 2018-01-12
WO2016145850A1 (en) 2016-09-22

Similar Documents

Publication Publication Date Title
CN104700828A 2015-06-10 Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle
CN104538028B 2017-07-28 Continuous speech recognition method based on a deep long short-term memory recurrent neural network
Nakkiran et al. Compressing deep neural networks using a rank-constrained topology
Gelly et al. Optimization of RNN-based speech activity detection
Prabhavalkar et al. On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition
Sainath et al. Auto-encoder bottleneck features using deep belief networks
CN107293288B (en) Acoustic model modeling method of residual long-short term memory recurrent neural network
US10783900B2 (en) Convolutional, long short-term memory, fully connected deep neural networks
CN106919977B (en) Feedforward sequence memory neural network and construction method and system thereof
CN105139864B (en) Audio recognition method and device
Huang et al. Sndcnn: Self-normalizing deep cnns with scaled exponential linear units for speech recognition
CN107301864A (en) A kind of two-way LSTM acoustic models of depth based on Maxout neurons
Guiming et al. Speech recognition based on convolutional neural networks
CN108847244A (en) Voiceprint recognition method and system based on MFCC and improved BP neural network
CN110853668B (en) Voice tampering detection method based on multi-feature fusion
Guo et al. Time-delayed bottleneck highway networks using a DFT feature for keyword spotting
CN104464727A (en) Single-channel music singing separation method based on deep belief network
CN110223714A (en) A kind of voice-based Emotion identification method
CN109036467A (en) CFFD extracting method, speech-emotion recognition method and system based on TF-LSTM
CN109410974A (en) Sound enhancement method, device, equipment and storage medium
Liu et al. Pruning deep neural networks by optimal brain damage.
Zhang et al. High order recurrent neural networks for acoustic modelling
Huang et al. Beyond cross-entropy: Towards better frame-level objective functions for deep neural network training in automatic speech recognition
Li et al. Improving long short-term memory networks using maxout units for large vocabulary speech recognition
Cai et al. Convolutional maxout neural networks for low-resource speech recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant