CN104700828A - Deep long-term and short-term memory recurrent neural network acoustic model establishing method based on selective attention principles - Google Patents
- Publication number
- CN104700828A CN104700828A CN201510122982.6A CN201510122982A CN104700828A CN 104700828 A CN104700828 A CN 104700828A CN 201510122982 A CN201510122982 A CN 201510122982A CN 104700828 A CN104700828 A CN 104700828A
- Authority
- CN
- China
- Prior art keywords
- neural network
- input
- recurrent neural
- long term
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Abstract
Disclosed is a method for constructing a deep long short-term memory (LSTM) recurrent neural network acoustic model based on the selective attention principle. Attention gate units are added inside a deep LSTM recurrent neural network acoustic model to represent the transient functional changes of auditory cortex neurons. The attention gates differ from the other gate units in that the other gates are in one-to-one correspondence with the time series, whereas the attention gates model short-term plasticity effects and therefore appear only at intervals in the time series. By training this neural network acoustic model on a large amount of speech data containing cross-talk noise, robust feature extraction under cross-talk noise and the construction of robust acoustic models can be achieved; suppressing the influence of non-target streams on feature extraction achieves the goal of improving the robustness of the acoustic model. The method can be widely applied to many machine learning fields related to speech recognition, such as speaker recognition, keyword spotting, and human-computer interaction.
Description
Technical field
The invention belongs to the field of audio technology, and particularly relates to a method for constructing a deep long short-term memory (LSTM) recurrent neural network acoustic model based on the selective attention principle.
Background technology
With the rapid development of information technology, speech recognition technology has reached the conditions for large-scale commercial use. Current speech recognition mainly adopts continuous speech recognition techniques based on statistical models, whose main goal is to find the word sequence with the maximum probability represented by the speech. A continuous speech recognition system based on statistical models generally includes the construction of an acoustic model and a language model, together with the corresponding search and decoding method. With the rapid development of acoustic models and language models, the performance of speech recognition systems has improved greatly under ideal acoustic environments. The existing deep neural network-hidden Markov model (DNN-HMM) approach is relatively mature: it can automatically extract effective features through machine learning and can model the contextual information of multiple frames of speech. However, each layer of such a model has parameters on the order of one million, and the input of each layer is the output of the previous one, so GPU devices are needed to train a DNN acoustic model and the training time is long; moreover, the strong nonlinearity and parameter sharing make it difficult to perform parameter adaptation on a DNN.
A recurrent neural network (RNN) is a neural network in which directed cycles between units express the internal dynamic temporal behavior of the network; it has been widely applied in handwriting recognition, language modeling, and other areas. Speech is a complex time-varying signal with complex correlations at different time scales, so compared with deep neural networks, the recurrent connections of an RNN are better suited to processing this kind of complex time-series data.
As a kind of recurrent neural network, the long short-term memory (LSTM) model is better suited than a plain RNN to processing and predicting events with indefinite delays in long time series. The deep LSTM-RNN acoustic model with memory blocks proposed by the University of Toronto combines the multi-level representational ability of deep neural networks with the RNN's flexible use of long-span context, reducing the phoneme recognition error rate on the TIMIT corpus to 17.1%.
However, the gradient descent method used in recurrent neural networks suffers from the vanishing gradient problem: as the number of layers increases, the gradient dissipates layer by layer while the network weights are being adjusted, so its effect on the weight updates becomes smaller and smaller. The two-layer deep LSTM-RNN acoustic model proposed by Google adds a linear recurrent projection layer to address the vanishing gradient problem of the original deep LSTM-RNN model. Comparative experiments show that the frame accuracy and convergence speed of a plain RNN are clearly inferior to those of LSTM-RNN and DNN. In terms of word error rate and convergence speed, the best DNN reaches a word error rate of 11.3% after several weeks of training, while the two-layer deep LSTM-RNN model reduces the word error rate to 10.9% after 48 hours of training, and to 10.7%/10.5% after 100/200 hours.
The deep bidirectional long short-term memory recurrent neural network (DBLSTM-RNN) acoustic model proposed by the University of Munich defines separate forward and backward layers in each recurrent layer of the network, uses multi-hidden-layer acoustic feature input for higher-level representation, and applies supervised learning to realize feature projection and enhancement against noise and reverberation. On the 2013 PASCAL CHiME data set, this method reduced the word error rate from the baseline's 55% to 22% over the signal-to-noise ratio range [-6 dB, 9 dB].
However, the complexity of practical acoustic environments still seriously affects and disturbs the performance of continuous speech recognition systems. Even with the currently mainstream DNN acoustic model methods, only about a 70% recognition rate can be obtained on continuous speech recognition data sets recorded under complex environmental conditions including noise, music, spontaneous speech, and repetition; the noise immunity and robustness of the acoustic model in continuous speech recognition systems still leave much room for improvement.
With the rapid development of acoustic models and language models, the performance of speech recognition systems has improved greatly under ideal acoustic environments; the existing DNN-HMM model is relatively mature, can automatically extract effective features through machine learning, and can model the contextual information of multiple frames of speech. However, most recognition systems are still very sensitive to changes in the acoustic environment and in particular cannot meet practical performance requirements under cross-talk noise (two or more people speaking at the same time). Compared with deep neural network acoustic models, the units in a recurrent neural network acoustic model contain directed cycles that can effectively describe the internal dynamic temporal behavior of the network, making it better suited to processing speech data with complex time series. Furthermore, the long short-term memory network is better suited than a plain RNN to processing and predicting events with indefinite delays in long time series, so using it to build the acoustic model for speech recognition can achieve better results.
The human brain exhibits a selective attention phenomenon when processing speech in complex scenes. Its general principle is that the brain possesses the ability of auditory selective attention: through a top-down control mechanism in the auditory cortex region, it suppresses non-target streams and enhances the target stream. Research shows that during selective attention, the short-term plasticity effect of the auditory cortex increases the ability to discriminate sounds. When attention is highly concentrated, enhancement of the sound target can begin in the primary auditory cortex within 50 milliseconds.
Summary of the invention
In order to overcome the shortcomings of the above prior art, the object of the present invention is to provide a method for constructing a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle. Attention gate units are added inside the deep LSTM recurrent neural network acoustic model to represent the transient functional changes of auditory cortex neurons. The attention gates differ from the other gate units in that the other gates are in one-to-one correspondence with the time series, whereas the attention gates embody the short-term plasticity effect and therefore appear only at intervals in the time series. By training this neural network acoustic model on a large amount of speech data containing cross-talk noise, robust feature extraction under cross-talk noise and the construction of robust acoustic models can be achieved; suppressing the influence of non-target streams on feature extraction achieves the goal of improving the robustness of the acoustic model.
To achieve the above objects, the technical solution adopted by the present invention is as follows:
A continuous speech recognition method based on the selective attention principle comprises the following steps:
Step 1: build the deep long short-term memory recurrent neural network based on the selective attention principle.
An LSTM recurrent neural network is defined from the input to the hidden layer; "deep" means that the output of each LSTM recurrent neural network is the input of the next LSTM recurrent neural network, and so on, with the output of the last LSTM recurrent neural network serving as the output of the whole system. In each LSTM recurrent neural network, the speech signal x_t is the input at time t and x_(t-1) is the input at time t-1; by analogy, over the whole time length the input is x = [x_1, ..., x_T], where t ∈ [1, T] and T is the total time length of the speech signal. The LSTM recurrent neural network at time t consists of an attention gate, an input gate, an output gate, a forget gate, a memory cell, tanh functions, a hidden layer, and multipliers; the LSTM recurrent neural network at time t-1 consists of an input gate, an output gate, a forget gate, a memory cell, tanh functions, a hidden layer, and multipliers. Over the whole time length the hidden layer output is y = [y_1, ..., y_T].
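The stacking rule above (each LSTM layer's hidden-output sequence feeding the next layer, with the last layer's outputs forming the system output) can be sketched as follows; the `stack_forward` helper and its interface are our own illustration, not part of the patent, and assume each layer's input dimension equals the hidden size:

```python
import numpy as np

def stack_forward(x_seq, layer_steps, hidden_size):
    """Deep stacking as defined in step one: each LSTM layer's hidden-output
    sequence becomes the input sequence of the next layer, and the last
    layer's outputs form the system output y = [y_1, ..., y_T].
    layer_steps holds one per-layer step function with the (hypothetical)
    signature (x_t, m_prev, cell_prev) -> (m_t, cell_t)."""
    seq = list(x_seq)
    for step in layer_steps:
        m = np.zeros(hidden_size)      # per-layer hidden state, zero-initialized
        cell = np.zeros(hidden_size)   # per-layer memory cell, zero-initialized
        out = []
        for x_t in seq:
            m, cell = step(x_t, m, cell)
            out.append(m)
        seq = out                      # this layer's outputs feed the next layer
    return seq
```

Any concrete LSTM cell (including the attention-gated cell defined later) can be plugged in as a `step` function.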
Step 2: build the deep LSTM recurrent neural network acoustic model based on the selective attention principle.
On the basis of the first step, an attention gate exists in the deep LSTM recurrent neural network corresponding to every s-th time step, while the deep LSTM recurrent neural networks at the other time steps have no attention gate; that is, the deep LSTM recurrent neural network acoustic model based on the selective attention principle is composed of deep LSTM recurrent neural networks in which attention gates appear at intervals.
Recognition under complex environmental interference, especially under cross-talk noise, has always been one of the difficulties of speech recognition and hinders its large-scale application. Compared with the prior art, the present invention draws on the selective attention phenomenon exhibited by the human brain when processing speech in complex scenes, which suppresses non-target streams and enhances the target stream: attention gate units are added inside the deep long short-term memory recurrent neural network acoustic model to represent the transient functional changes of auditory cortex neurons. The attention gates differ from the other gate units in that the other gates are in one-to-one correspondence with the time series, whereas the attention gates embody the short-term plasticity effect and therefore appear only at intervals in the time series. Applying this method to continuous speech recognition data sets containing cross-talk noise can achieve better performance than deep neural network methods.
Brief description of the drawings
Fig. 1 is a flow diagram of the deep long short-term memory recurrent neural network based on the selective attention principle of the present invention.
Fig. 2 is a flow diagram of the deep long short-term memory neural network acoustic model based on the selective attention principle of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the drawings and examples.
The present invention uses the deep LSTM recurrent neural network acoustic model based on the selective attention principle to achieve continuous speech recognition. However, the models and methods provided by the invention are not limited to continuous speech recognition, and can be used in any method and apparatus related to speech recognition.
The present invention mainly comprises the following steps:
Step 1: build the deep long short-term memory recurrent neural network based on the selective attention principle.
As shown in Fig. 1, inputs 101 and 102 are the speech signal inputs x_t and x_(t-1) at times t and t-1 (t ∈ [1, T], where T is the total time length of the speech signal). The LSTM recurrent neural network at time t consists of attention gate 103, input gate 104, forget gate 105, memory cell 106, output gate 107, tanh function 108, tanh function 109, hidden layer 110, multiplier 122, and multiplier 123. The LSTM recurrent neural network at time t-1 consists of input gate 112, forget gate 113, memory cell 114, output gate 115, tanh function 116, tanh function 117, hidden layer 118, multiplier 120, and multiplier 121. The hidden layer outputs at times t and t-1 are output 111 and output 119, respectively.
Input 102 serves simultaneously as the input of input gate 112, forget gate 113, output gate 115, and tanh function 116. The outputs of input gate 112 and tanh function 116 are sent to multiplier 120, whose result is the input of memory cell 114; the output of memory cell 114 is the input of tanh function 117; the outputs of tanh function 117 and output gate 115 are sent to multiplier 121, whose result is the input of hidden layer 118; the output of hidden layer 118 is output 119.
Input 101, the output of memory cell 114, and the output of multiplier 121 together form the input of attention gate 103. The output of attention gate 103 and the output of multiplier 121 together form the input of tanh function 108. The output of attention gate 103, the output of memory cell 114, and the output of multiplier 121 together form the inputs of input gate 104, forget gate 105, and output gate 107, respectively. The outputs of forget gate 105 and memory cell 114 are sent to multiplier 124; the outputs of input gate 104 and tanh function 108 are sent to multiplier 122; the outputs of multiplier 124 and multiplier 122 are the inputs of memory cell 106; the output of memory cell 106 is the input of tanh function 109; the outputs of tanh function 109 and output gate 107 are sent to multiplier 123; the output of multiplier 123 is the input of hidden layer 110; and the output of hidden layer 110 is output 111.
That is, the parameters at time t ∈ [1, T] are calculated according to the following formulas:

G_atten_t = sigmoid(W_ax · x_t + W_am · m_(t-1) + W_ac · Cell_(t-1) + b_a)
G_input_t = sigmoid(W_ia · G_atten_t + W_im · m_(t-1) + W_ic · Cell_(t-1) + b_i)
G_forget_t = sigmoid(W_fa · G_atten_t + W_fm · m_(t-1) + W_fc · Cell_(t-1) + b_f)
Cell_t = G_forget_t ⊙ Cell_(t-1) + G_input_t ⊙ tanh(W_ca · G_atten_t + W_cm · m_(t-1) + b_c)
G_output_t = sigmoid(W_oa · G_atten_t + W_om · m_(t-1) + W_oc · Cell_(t-1) + b_o)
m_t = G_output_t ⊙ tanh(Cell_t)
y_t = softmax_k(W_ym · m_t + b_y)

where G_atten_t is the output of attention gate 103 at time t, G_input_t is the output of input gate 104 at time t, G_forget_t is the output of forget gate 105 at time t, Cell_t is the output of memory cell 106 at time t, G_output_t is the output of output gate 107 at time t, m_t is the input of hidden layer 110 at time t, and y_t is output 111 at time t; x_t is input 101 at time t, m_(t-1) is the input of hidden layer 118 at time t-1, and Cell_(t-1) is the output of memory cell 114 at time t-1. W_ax is the weight between attention gate a at time t and input x at time t; W_am is the weight between attention gate a at time t and hidden layer input m at time t-1; W_ac is the weight between attention gate a at time t and memory cell c at time t-1; W_ia is the weight between input gate i at time t and attention gate a at time t; W_im is the weight between input gate i at time t and hidden layer input m at time t-1; W_ic is the weight between input gate i at time t and memory cell c at time t-1; W_fa is the weight between forget gate f at time t and attention gate a at time t; W_fm is the weight between forget gate f at time t and hidden layer input m at time t-1; W_fc is the weight between forget gate f at time t and memory cell c at time t-1; W_ca is the weight between memory cell c at time t and attention gate a at time t; W_cm is the weight between memory cell c at time t and hidden layer input m at time t-1; W_oa is the weight between output gate o at time t and attention gate a at time t; W_om is the weight between output gate o at time t and hidden layer input m at time t-1; W_oc is the weight between output gate o at time t and memory cell c at time t-1. b_a is the bias of attention gate a, b_i is the bias of input gate i, b_f is the bias of forget gate f, b_c is the bias of memory cell c, b_o is the bias of output gate o, and b_y is the bias of output y; different b denote different biases. In addition, softmax_k(x) = exp(x_k) / Σ_(l ∈ [1, K]) exp(x_l), where x_k is the input of the k-th (k ∈ [1, K]) softmax component and the summation runs over all l ∈ [1, K]; ⊙ denotes element-wise multiplication.
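A minimal NumPy sketch of one time step of the attention-gated cell defined by the formulas above. The dictionary key names (`W['ax']`, `b['a']`, and so on) merely mirror the subscripts in the text and are our own naming choice, not part of the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

def lstm_attention_step(x_t, m_prev, cell_prev, W, b):
    """One time step of the attention-gated LSTM cell. W is a dict of
    weight matrices and b a dict of bias vectors, keyed by the subscripts
    used in the formulas. Returns (y_t, m_t, Cell_t)."""
    # Attention gate: driven by the raw input x_t, the previous hidden
    # input m_{t-1}, and the previous memory cell state.
    g_atten = sigmoid(W['ax'] @ x_t + W['am'] @ m_prev + W['ac'] @ cell_prev + b['a'])
    # The input, forget, and output gates see the attention gate's output
    # instead of x_t directly.
    g_input = sigmoid(W['ia'] @ g_atten + W['im'] @ m_prev + W['ic'] @ cell_prev + b['i'])
    g_forget = sigmoid(W['fa'] @ g_atten + W['fm'] @ m_prev + W['fc'] @ cell_prev + b['f'])
    cell = g_forget * cell_prev + g_input * np.tanh(W['ca'] @ g_atten + W['cm'] @ m_prev + b['c'])
    g_output = sigmoid(W['oa'] @ g_atten + W['om'] @ m_prev + W['oc'] @ cell_prev + b['o'])
    m_t = g_output * np.tanh(cell)
    y_t = softmax(W['ym'] @ m_t + b['y'])
    return y_t, m_t, cell
```

Here the element-wise product ⊙ becomes NumPy's `*` operator, and the weighted sums become matrix-vector products.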
Step 2: build the deep LSTM recurrent neural network acoustic model based on the selective attention principle.
On the basis of the first step, an attention gate exists in the deep LSTM recurrent neural network corresponding to every s-th time step (s = 5), while the deep LSTM recurrent neural networks at the other time steps have no attention gate; that is, the deep LSTM recurrent neural network acoustic model based on the selective attention principle is composed of deep LSTM recurrent neural networks in which attention gates appear at intervals. Fig. 2 shows the established deep LSTM recurrent neural network acoustic model based on the selective attention principle: attention gate 201 exists in the deep LSTM recurrent neural network at time t, attention gate 202 exists in the deep LSTM recurrent neural network at time t-s, and so on cyclically.
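The interval-s structure of the second step can be illustrated with a toy forward loop; `attn_step` and `plain_step` are hypothetical per-frame functions standing in for the cells with and without the attention gate:

```python
import numpy as np

def interval_attention_forward(x_seq, attn_step, plain_step, hidden, s=5):
    """Second-step sketch: only every s-th frame (s = 5 in the embodiment)
    is processed by the cell containing the attention gate; all other
    frames use the ordinary LSTM cell. attn_step and plain_step share the
    (hypothetical) signature (x_t, m_prev, cell_prev) -> (m_t, cell_t)."""
    m = np.zeros(hidden)
    cell = np.zeros(hidden)
    outputs = []
    for t, x_t in enumerate(x_seq, start=1):
        step = attn_step if t % s == 0 else plain_step  # attention gate at intervals
        m, cell = step(x_t, m, cell)
        outputs.append(m)
    return outputs
```

For a 12-frame input with s = 5, the attention-gated cell fires at frames 5 and 10 and the plain cell handles the remaining 10 frames.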
Claims (2)
1. A method for constructing a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle, comprising the following steps:
Step 1: build the deep long short-term memory recurrent neural network based on the selective attention principle.
An LSTM recurrent neural network is defined from the input to the hidden layer; "deep" means that the output of each LSTM recurrent neural network is the input of the next LSTM recurrent neural network, and so on, with the output of the last LSTM recurrent neural network serving as the output of the whole system. In each LSTM recurrent neural network, the speech signal x_t is the input at time t and x_(t-1) is the input at time t-1; by analogy, over the whole time length the input is x = [x_1, ..., x_T], where t ∈ [1, T] and T is the total time length of the speech signal. The LSTM recurrent neural network at time t consists of an attention gate, an input gate, an output gate, a forget gate, a memory cell, tanh functions, a hidden layer, and multipliers; the LSTM recurrent neural network at time t-1 consists of an input gate, an output gate, a forget gate, a memory cell, tanh functions, a hidden layer, and multipliers. Over the whole time length the hidden layer output is y = [y_1, ..., y_T].
The parameters at time t ∈ [1, T] are calculated according to the following formulas:

G_atten_t = sigmoid(W_ax · x_t + W_am · m_(t-1) + W_ac · Cell_(t-1) + b_a)
G_input_t = sigmoid(W_ia · G_atten_t + W_im · m_(t-1) + W_ic · Cell_(t-1) + b_i)
G_forget_t = sigmoid(W_fa · G_atten_t + W_fm · m_(t-1) + W_fc · Cell_(t-1) + b_f)
Cell_t = G_forget_t ⊙ Cell_(t-1) + G_input_t ⊙ tanh(W_ca · G_atten_t + W_cm · m_(t-1) + b_c)
G_output_t = sigmoid(W_oa · G_atten_t + W_om · m_(t-1) + W_oc · Cell_(t-1) + b_o)
m_t = G_output_t ⊙ tanh(Cell_t)
y_t = softmax_k(W_ym · m_t + b_y)

where G_atten_t is the output of the attention gate at time t, G_input_t is the output of the input gate at time t, G_forget_t is the output of the forget gate at time t, Cell_t is the output of the memory cell at time t, G_output_t is the output of the output gate at time t, m_t is the input of the hidden layer at time t, and y_t is the output at time t; x_t is the input at time t, m_(t-1) is the input of the hidden layer at time t-1, and Cell_(t-1) is the output of the memory cell at time t-1. W_ax is the weight between attention gate a at time t and input x at time t; W_am is the weight between attention gate a at time t and hidden layer input m at time t-1; W_ac is the weight between attention gate a at time t and memory cell c at time t-1; W_ia is the weight between input gate i at time t and attention gate a at time t; W_im is the weight between input gate i at time t and hidden layer input m at time t-1; W_ic is the weight between input gate i at time t and memory cell c at time t-1; W_fa is the weight between forget gate f at time t and attention gate a at time t; W_fm is the weight between forget gate f at time t and hidden layer input m at time t-1; W_fc is the weight between forget gate f at time t and memory cell c at time t-1; W_ca is the weight between memory cell c at time t and attention gate a at time t; W_cm is the weight between memory cell c at time t and hidden layer input m at time t-1; W_oa is the weight between output gate o at time t and attention gate a at time t; W_om is the weight between output gate o at time t and hidden layer input m at time t-1; W_oc is the weight between output gate o at time t and memory cell c at time t-1. b_a is the bias of attention gate a, b_i is the bias of input gate i, b_f is the bias of forget gate f, b_c is the bias of memory cell c, b_o is the bias of output gate o, and b_y is the bias of output y; different b denote different biases. In addition, softmax_k(x) = exp(x_k) / Σ_(l ∈ [1, K]) exp(x_l), where x_k is the input of the k-th (k ∈ [1, K]) softmax component and the summation runs over all l ∈ [1, K]; ⊙ denotes element-wise multiplication;
Step 2: build the deep LSTM recurrent neural network acoustic model based on the selective attention principle.
On the basis of the first step, an attention gate exists in the deep LSTM recurrent neural network corresponding to every s-th time step, while the deep LSTM recurrent neural networks at the other time steps have no attention gate; that is, the deep LSTM recurrent neural network acoustic model based on the selective attention principle is composed of deep LSTM recurrent neural networks in which attention gates appear at intervals.
2. The method for constructing a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle according to claim 1, characterized in that s = 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510122982.6A CN104700828B (en) | 2015-03-19 | 2015-03-19 | The construction method of depth shot and long term memory Recognition with Recurrent Neural Network acoustic model based on selective attention principle |
PCT/CN2015/092381 WO2016145850A1 (en) | 2015-03-19 | 2015-10-21 | Construction method for deep long short-term memory recurrent neural network acoustic model based on selective attention principle |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510122982.6A CN104700828B (en) | 2015-03-19 | 2015-03-19 | The construction method of depth shot and long term memory Recognition with Recurrent Neural Network acoustic model based on selective attention principle |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104700828A true CN104700828A (en) | 2015-06-10 |
CN104700828B CN104700828B (en) | 2018-01-12 |
Family
ID=53347887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510122982.6A Active CN104700828B (en) | 2015-03-19 | 2015-03-19 | The construction method of depth shot and long term memory Recognition with Recurrent Neural Network acoustic model based on selective attention principle |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104700828B (en) |
WO (1) | WO2016145850A1 (en) |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105185374A (en) * | 2015-09-11 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Prosodic hierarchy annotation method and device |
CN105354277A (en) * | 2015-10-30 | 2016-02-24 | 中国船舶重工集团公司第七0九研究所 | Recommendation method and system based on recurrent neural network |
CN105513591A (en) * | 2015-12-21 | 2016-04-20 | 百度在线网络技术(北京)有限公司 | Method and device for speech recognition by use of LSTM recurrent neural network model |
CN105956469A (en) * | 2016-04-27 | 2016-09-21 | 百度在线网络技术(北京)有限公司 | Method and device for identifying file security |
WO2016145850A1 (en) * | 2015-03-19 | 2016-09-22 | 清华大学 | Construction method for deep long short-term memory recurrent neural network acoustic model based on selective attention principle |
CN106096729A (en) * | 2016-06-06 | 2016-11-09 | 天津科技大学 | A kind of towards the depth-size strategy learning method of complex task in extensive environment |
CN106652999A (en) * | 2015-10-29 | 2017-05-10 | 三星Sds株式会社 | System and method for voice recognition |
CN106650789A (en) * | 2016-11-16 | 2017-05-10 | 同济大学 | Image description generation method based on depth LSTM network |
CN106683663A (en) * | 2015-11-06 | 2017-05-17 | 三星电子株式会社 | Neural network training apparatus and method, and speech recognition apparatus and method |
CN107293291A (en) * | 2016-03-30 | 2017-10-24 | 中国科学院声学研究所 | A kind of audio recognition method end to end based on autoadapted learning rate |
CN107293288A (en) * | 2017-06-09 | 2017-10-24 | 清华大学 | A kind of residual error shot and long term remembers the acoustic model modeling method of Recognition with Recurrent Neural Network |
CN107484017A (en) * | 2017-07-25 | 2017-12-15 | 天津大学 | Supervision video abstraction generating method is had based on attention model |
CN107492121A (en) * | 2017-07-03 | 2017-12-19 | 广州新节奏智能科技股份有限公司 | A kind of two-dimension human body bone independent positioning method of monocular depth video |
CN107563122A (en) * | 2017-09-20 | 2018-01-09 | 长沙学院 | Crime prediction method based on an interleaved-time-sequence locally connected recurrent neural network |
CN107993636A (en) * | 2017-11-01 | 2018-05-04 | 天津大学 | Music score modeling and generation method based on recurrent neural network |
CN108269569A (en) * | 2017-01-04 | 2018-07-10 | 三星电子株式会社 | Speech recognition method and device |
CN108304914A (en) * | 2017-01-12 | 2018-07-20 | 三星电子株式会社 | System and method for high-order long short-term memory networks |
CN108475505A (en) * | 2015-11-12 | 2018-08-31 | 谷歌有限责任公司 | Generating a target sequence from an input sequence using partial conditions |
CN108780521A (en) * | 2016-02-04 | 2018-11-09 | 渊慧科技有限公司 | Associative long short-term memory neural network layers |
CN109155132A (en) * | 2016-03-21 | 2019-01-04 | 亚马逊技术公司 | Speaker verification method and system |
CN109243494A (en) * | 2018-10-30 | 2019-01-18 | 南京工程学院 | Child emotion recognition method based on a multi-attention long short-term memory network |
CN109243493A (en) * | 2018-10-30 | 2019-01-18 | 南京工程学院 | Infant cry emotion recognition method based on an improved long short-term memory network |
CN109460812A (en) * | 2017-09-06 | 2019-03-12 | 富士通株式会社 | Information-entropy analysis apparatus, optimization apparatus, and feature visualization apparatus for neural networks |
CN109523995A (en) * | 2018-12-26 | 2019-03-26 | 出门问问信息科技有限公司 | Speech recognition method, speech recognition apparatus, readable storage medium, and electronic device |
CN109614485A (en) * | 2018-11-19 | 2019-04-12 | 中山大学 | Sentence matching method and device using hierarchical attention based on syntactic structure |
CN109866713A (en) * | 2019-03-21 | 2019-06-11 | 斑马网络技术有限公司 | Safety detection method and device, vehicle |
CN110085249A (en) * | 2019-05-09 | 2019-08-02 | 南京工程学院 | Single-channel speech enhancement method using a recurrent neural network based on attention gating |
CN110135634A (en) * | 2019-04-29 | 2019-08-16 | 广东电网有限责任公司电网规划研究中心 | Medium- and long-term power load forecasting device |
CN110192204A (en) * | 2016-11-03 | 2019-08-30 | 易享信息技术有限公司 | Deep neural network model for processing data through multiple language task hierarchies |
CN110473529A (en) * | 2019-09-09 | 2019-11-19 | 极限元(杭州)智能科技股份有限公司 | Streaming speech transcription system based on a self-attention mechanism |
CN111081231A (en) * | 2016-03-23 | 2020-04-28 | 谷歌有限责任公司 | Adaptive audio enhancement for multi-channel speech recognition |
US11003949B2 (en) | 2016-11-09 | 2021-05-11 | Microsoft Technology Licensing, Llc | Neural network-based action detection |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102100977B1 (en) * | 2016-02-03 | 2020-04-14 | 구글 엘엘씨 | Compressed recurrent neural network models
US9799327B1 (en) * | 2016-02-26 | 2017-10-24 | Google Inc. | Speech recognition with attention-based recurrent neural networks |
US10769522B2 (en) | 2017-02-17 | 2020-09-08 | Wipro Limited | Method and system for determining classification of text |
CN109543165B (en) * | 2018-11-21 | 2022-09-23 | 中国人民解放军战略支援部队信息工程大学 | Text generation method and device based on circular convolution attention model |
CN110473554B (en) * | 2019-08-08 | 2022-01-25 | Oppo广东移动通信有限公司 | Audio verification method and device, storage medium and electronic equipment |
CN111079906B (en) * | 2019-12-30 | 2023-05-05 | 燕山大学 | Cement finished product specific surface area prediction method and system based on long-short-term memory network |
CN111314345B (en) * | 2020-02-19 | 2022-09-16 | 安徽大学 | Method and device for protecting sequence data privacy, computer equipment and storage medium |
CN111311009B (en) * | 2020-02-24 | 2023-05-26 | 广东工业大学 | Pedestrian track prediction method based on long-term and short-term memory |
CN111429938B (en) * | 2020-03-06 | 2022-09-13 | 江苏大学 | Single-channel voice separation method and device and electronic equipment |
CN111709754B (en) * | 2020-06-12 | 2023-08-25 | 中国建设银行股份有限公司 | User behavior feature extraction method, device, equipment and system |
CN111814849B (en) * | 2020-06-22 | 2024-02-06 | 浙江大学 | DA-RNN-based wind turbine generator set key component fault early warning method |
CN111985610A (en) * | 2020-07-15 | 2020-11-24 | 中国石油大学(北京) | System and method for predicting pumping efficiency of oil pumping well based on time sequence data |
CN111930602B (en) * | 2020-08-13 | 2023-09-22 | 中国工商银行股份有限公司 | Performance index prediction method and device |
CN112001482A (en) * | 2020-08-14 | 2020-11-27 | 佳都新太科技股份有限公司 | Vibration prediction and model training method and device, computer equipment and storage medium |
CN112214852B (en) * | 2020-10-09 | 2022-10-14 | 电子科技大学 | Turbine mechanical performance degradation prediction method considering degradation rate |
CN112382265A (en) * | 2020-10-21 | 2021-02-19 | 西安交通大学 | Active noise reduction method based on deep cycle neural network, storage medium and system |
CN112434784A (en) * | 2020-10-22 | 2021-03-02 | 暨南大学 | Deep student performance prediction method based on multilayer LSTM |
CN112906291B (en) * | 2021-01-25 | 2023-05-19 | 武汉纺织大学 | Modeling method and device based on neural network |
CN112784472B (en) * | 2021-01-27 | 2023-03-24 | 电子科技大学 | Method for simulating the quantum conditional master equation in quantum transport processes using a recurrent neural network
CN113792772B (en) * | 2021-09-01 | 2023-11-03 | 中国船舶重工集团公司第七一六研究所 | Cold and hot data identification method for data hierarchical hybrid storage |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080172349A1 (en) * | 2007-01-12 | 2008-07-17 | Toyota Engineering & Manufacturing North America, Inc. | Neural network controller with fixed long-term and adaptive short-term memory |
CN102983819A (en) * | 2012-11-08 | 2013-03-20 | 南京航空航天大学 | Power amplifier simulation method and device
CN103049792A (en) * | 2011-11-26 | 2013-04-17 | 微软公司 | Discriminative pretraining of Deep Neural Network |
CN103680496A (en) * | 2013-12-19 | 2014-03-26 | 百度在线网络技术(北京)有限公司 | Deep-neural-network-based acoustic model training method, hosts and system |
CN104217226A (en) * | 2014-09-09 | 2014-12-17 | 天津大学 | Dialogue act identification method based on deep neural networks and conditional random fields |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104700828B (en) * | 2015-03-19 | 2018-01-12 | 清华大学 | Construction method of a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle
- 2015
- 2015-03-19: CN application CN201510122982.6A, patent CN104700828B (en), status: Active
- 2015-10-21: WO application PCT/CN2015/092381, publication WO2016145850A1 (en), status: Application Filing
Non-Patent Citations (1)
Title |
---|
ALEX GRAVES et al.: "Towards end-to-end speech recognition with recurrent neural networks", Proceedings of the 31st International Conference on Machine Learning *
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016145850A1 (en) * | 2015-03-19 | 2016-09-22 | 清华大学 | Construction method for deep long short-term memory recurrent neural network acoustic model based on selective attention principle |
CN105185374A (en) * | 2015-09-11 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Prosodic hierarchy annotation method and device |
CN105185374B (en) * | 2015-09-11 | 2017-03-29 | 百度在线网络技术(北京)有限公司 | Prosodic hierarchy annotation method and device
CN106652999A (en) * | 2015-10-29 | 2017-05-10 | 三星Sds株式会社 | System and method for voice recognition |
CN105354277A (en) * | 2015-10-30 | 2016-02-24 | 中国船舶重工集团公司第七0九研究所 | Recommendation method and system based on recurrent neural network |
CN105354277B (en) * | 2015-10-30 | 2020-11-06 | 中国船舶重工集团公司第七0九研究所 | Recommendation method and system based on recurrent neural network |
CN106683663B (en) * | 2015-11-06 | 2022-01-25 | 三星电子株式会社 | Neural network training apparatus and method, and speech recognition apparatus and method |
CN106683663A (en) * | 2015-11-06 | 2017-05-17 | 三星电子株式会社 | Neural network training apparatus and method, and speech recognition apparatus and method |
CN108475505A (en) * | 2015-11-12 | 2018-08-31 | 谷歌有限责任公司 | Generating a target sequence from an input sequence using partial conditions
CN108475505B (en) * | 2015-11-12 | 2023-03-17 | 谷歌有限责任公司 | Generating a target sequence from an input sequence using partial conditions |
CN105513591B (en) * | 2015-12-21 | 2019-09-03 | 百度在线网络技术(北京)有限公司 | Method and apparatus for speech recognition using an LSTM recurrent neural network model
CN105513591A (en) * | 2015-12-21 | 2016-04-20 | 百度在线网络技术(北京)有限公司 | Method and device for speech recognition by use of LSTM recurrent neural network model |
CN108780521B (en) * | 2016-02-04 | 2023-05-26 | 渊慧科技有限公司 | Associative long short-term memory neural network layers
CN108780521A (en) * | 2016-02-04 | 2018-11-09 | 渊慧科技有限公司 | Associative long short-term memory neural network layers
CN109155132B (en) * | 2016-03-21 | 2023-05-30 | 亚马逊技术公司 | Speaker verification method and system |
CN109155132A (en) * | 2016-03-21 | 2019-01-04 | 亚马逊技术公司 | Speaker verification method and system |
CN111081231A (en) * | 2016-03-23 | 2020-04-28 | 谷歌有限责任公司 | Adaptive audio enhancement for multi-channel speech recognition |
CN111081231B (en) * | 2016-03-23 | 2023-09-05 | 谷歌有限责任公司 | Adaptive audio enhancement for multi-channel speech recognition |
CN107293291A (en) * | 2016-03-30 | 2017-10-24 | 中国科学院声学研究所 | End-to-end speech recognition method based on an adaptive learning rate
CN105956469B (en) * | 2016-04-27 | 2019-04-26 | 百度在线网络技术(北京)有限公司 | Method and device for identifying file security
CN105956469A (en) * | 2016-04-27 | 2016-09-21 | 百度在线网络技术(北京)有限公司 | Method and device for identifying file security
CN106096729B (en) * | 2016-06-06 | 2018-11-20 | 天津科技大学 | Deep policy learning method for complex tasks in large-scale environments
CN106096729A (en) * | 2016-06-06 | 2016-11-09 | 天津科技大学 | Deep policy learning method for complex tasks in large-scale environments
CN110192204A (en) * | 2016-11-03 | 2019-08-30 | 易享信息技术有限公司 | Deep neural network model for processing data through multiple language task hierarchies
US11797825B2 (en) | 2016-11-03 | 2023-10-24 | Salesforce, Inc. | Training a joint many-task neural network model using successive regularization |
US11783164B2 (en) | 2016-11-03 | 2023-10-10 | Salesforce.Com, Inc. | Joint many-task neural network model for multiple natural language processing (NLP) tasks |
CN110192204B (en) * | 2016-11-03 | 2023-09-29 | 硕动力公司 | Deep neural network model for processing data through multiple language task hierarchies |
US11003949B2 (en) | 2016-11-09 | 2021-05-11 | Microsoft Technology Licensing, Llc | Neural network-based action detection |
CN106650789A (en) * | 2016-11-16 | 2017-05-10 | 同济大学 | Image description generation method based on a deep LSTM network
CN106650789B (en) * | 2016-11-16 | 2023-04-07 | 同济大学 | Image description generation method based on a deep LSTM network
CN108269569B (en) * | 2017-01-04 | 2023-10-27 | 三星电子株式会社 | Speech recognition method and device
CN108269569A (en) * | 2017-01-04 | 2018-07-10 | 三星电子株式会社 | Speech recognition method and device
CN108304914A (en) * | 2017-01-12 | 2018-07-20 | 三星电子株式会社 | System and method for high-order long short-term memory networks
CN108304914B (en) * | 2017-01-12 | 2023-12-05 | 三星电子株式会社 | System and method for high-order long short-term memory networks
CN107293288A (en) * | 2017-06-09 | 2017-10-24 | 清华大学 | Acoustic model modeling method based on a residual long short-term memory recurrent neural network
CN107492121A (en) * | 2017-07-03 | 2017-12-19 | 广州新节奏智能科技股份有限公司 | Two-dimensional human skeleton point positioning method for monocular depth video
CN107492121B (en) * | 2017-07-03 | 2020-12-29 | 广州新节奏智能科技股份有限公司 | Two-dimensional human skeleton point positioning method for monocular depth video
CN107484017A (en) * | 2017-07-25 | 2017-12-15 | 天津大学 | Supervised video summary generation method based on an attention model
CN107484017B (en) * | 2017-07-25 | 2020-05-26 | 天津大学 | Supervised video summary generation method based on an attention model
CN109460812A (en) * | 2017-09-06 | 2019-03-12 | 富士通株式会社 | Information-entropy analysis apparatus, optimization apparatus, and feature visualization apparatus for neural networks
CN107563122B (en) * | 2017-09-20 | 2020-05-19 | 长沙学院 | Crime prediction method based on an interleaved-time-sequence locally connected recurrent neural network
CN107563122A (en) * | 2017-09-20 | 2018-01-09 | 长沙学院 | Crime prediction method based on an interleaved-time-sequence locally connected recurrent neural network
CN107993636B (en) * | 2017-11-01 | 2021-12-31 | 天津大学 | Recursive neural network-based music score modeling and generating method |
CN107993636A (en) * | 2017-11-01 | 2018-05-04 | 天津大学 | Music score modeling and generation method based on recurrent neural network |
CN109243493A (en) * | 2018-10-30 | 2019-01-18 | 南京工程学院 | Infant cry emotion recognition method based on an improved long short-term memory network
CN109243493B (en) * | 2018-10-30 | 2022-09-16 | 南京工程学院 | Infant cry emotion recognition method based on an improved long short-term memory network
CN109243494B (en) * | 2018-10-30 | 2022-10-11 | 南京工程学院 | Child emotion recognition method based on a multi-attention long short-term memory network
CN109243494A (en) * | 2018-10-30 | 2019-01-18 | 南京工程学院 | Child emotion recognition method based on a multi-attention long short-term memory network
CN109614485A (en) * | 2018-11-19 | 2019-04-12 | 中山大学 | Sentence matching method and device using hierarchical attention based on syntactic structure
CN109614485B (en) * | 2018-11-19 | 2023-03-14 | 中山大学 | Sentence matching method and device using hierarchical attention based on syntactic structure
CN109523995B (en) * | 2018-12-26 | 2019-07-09 | 出门问问信息科技有限公司 | Speech recognition method, speech recognition apparatus, readable storage medium, and electronic device
CN109523995A (en) * | 2018-12-26 | 2019-03-26 | 出门问问信息科技有限公司 | Speech recognition method, speech recognition apparatus, readable storage medium, and electronic device
CN109866713A (en) * | 2019-03-21 | 2019-06-11 | 斑马网络技术有限公司 | Safety detection method and device, vehicle |
CN110135634A (en) * | 2019-04-29 | 2019-08-16 | 广东电网有限责任公司电网规划研究中心 | Medium- and long-term power load forecasting device
CN110085249A (en) * | 2019-05-09 | 2019-08-02 | 南京工程学院 | Single-channel speech enhancement method using a recurrent neural network based on attention gating
CN110473529B (en) * | 2019-09-09 | 2021-11-05 | 北京中科智极科技有限公司 | Streaming speech transcription system based on a self-attention mechanism
CN110473529A (en) * | 2019-09-09 | 2019-11-19 | 极限元(杭州)智能科技股份有限公司 | Streaming speech transcription system based on a self-attention mechanism
Also Published As
Publication number | Publication date |
---|---|
CN104700828B (en) | 2018-01-12 |
WO2016145850A1 (en) | 2016-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104700828A (en) | Deep long short-term memory recurrent neural network acoustic model establishing method based on the selective attention principle | |
CN104538028B (en) | Continuous speech recognition method based on a deep long short-term memory recurrent neural network | |
Nakkiran et al. | Compressing deep neural networks using a rank-constrained topology | |
Gelly et al. | Optimization of RNN-based speech activity detection | |
Prabhavalkar et al. | On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition | |
Sainath et al. | Auto-encoder bottleneck features using deep belief networks | |
CN107293288B (en) | Acoustic model modeling method of residual long-short term memory recurrent neural network | |
US10783900B2 (en) | Convolutional, long short-term memory, fully connected deep neural networks | |
CN106919977B (en) | Feedforward sequence memory neural network and construction method and system thereof | |
CN105139864B (en) | Speech recognition method and device | |
Huang et al. | Sndcnn: Self-normalizing deep cnns with scaled exponential linear units for speech recognition | |
CN107301864A (en) | Deep bidirectional LSTM acoustic model based on Maxout neurons | |
Guiming et al. | Speech recognition based on convolutional neural networks | |
CN108847244A (en) | Voiceprint recognition method and system based on MFCC and improved BP neural network | |
CN110853668B (en) | Voice tampering detection method based on multi-feature fusion | |
Guo et al. | Time-delayed bottleneck highway networks using a DFT feature for keyword spotting | |
CN104464727A (en) | Single-channel music singing separation method based on deep belief network | |
CN110223714A (en) | Voice-based emotion recognition method | |
CN109036467A (en) | CFFD extracting method, speech-emotion recognition method and system based on TF-LSTM | |
CN109410974A (en) | Speech enhancement method, apparatus, device and storage medium | |
Liu et al. | Pruning deep neural networks by optimal brain damage. | |
Zhang et al. | High order recurrent neural networks for acoustic modelling | |
Huang et al. | Beyond cross-entropy: Towards better frame-level objective functions for deep neural network training in automatic speech recognition | |
Li et al. | Improving long short-term memory networks using maxout units for large vocabulary speech recognition | |
Cai et al. | Convolutional maxout neural networks for low-resource speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |