CN110033766A - Speech recognition method based on a binarized recurrent neural network - Google Patents
Speech recognition method based on a binarized recurrent neural network Download PDF Info
- Publication number
- CN110033766A (application CN201910310341.1A)
- Authority
- CN
- China
- Prior art keywords
- unit
- binaryzation
- state
- hidden
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The present invention relates to a speech recognition method based on a binarized recurrent neural network, belonging to the field of artificial intelligence. The method comprises: S1: receiving the speech input and vectorizing the audio; S2: constructing a binarized recurrent neural network model, and decoding and encoding the vectorized audio; S3: outputting the encoded result, i.e. the text corresponding to the speech. The binarized recurrent neural network model includes network structures such as a single-layer unidirectional binarized RNN model, a bidirectional binarized RNN model, and a bidirectional binarized LSTM model. The invention preserves the speed and energy-consumption advantages of binarized networks while improving model accuracy, making it feasible to run well-performing LSTM/RNN models for natural language processing and speech processing applications on embedded devices.
Description
Technical field
The invention belongs to the field of artificial intelligence and relates to a speech recognition method based on a binarized recurrent neural network.
Background technique
Classical LSTM/RNN models achieve good performance in sequence-processing applications such as named-entity recognition in natural language processing and speech-to-text conversion. However, because the weight matrices and gate parameters of an LSTM/RNN model are real-valued, a classical implementation consumes large amounts of computing and storage resources. It is therefore difficult to deploy on resource-constrained embedded devices, or only simpler LSTM/RNN models with poorer performance can be realized on such devices.
At present, the LSTM/RNN models commonly used in applications such as natural language processing and speech recognition are complex and are usually deployed on servers with abundant computing and storage capacity. As users pay increasing attention to privacy, however, they increasingly prefer to run LSTM/RNN models offline on the device. The traditional server-side approach carries a risk of privacy leakage and is limited by network bandwidth; in application scenarios with high real-time requirements, a server-side implementation finds it harder to guarantee prompt responses than an implementation on an embedded device.
Common binarization methods directly quantize the floating-point weights on either side of a given threshold to 0 and 1. Although this significantly reduces memory and computation overhead, the loss of precision is large.
Summary of the invention
In view of this, the purpose of the present invention is to provide a speech recognition method based on a binarized recurrent neural network that preserves the speed and energy-consumption advantages of binarized networks while improving model accuracy, making it feasible to run well-performing LSTM/RNN models for natural language processing and speech processing applications on embedded devices.
To achieve the above objectives, the invention provides the following technical solution:
A speech recognition method based on a binarized recurrent neural network, comprising the following steps:
S1: receive the speech input and vectorize the audio;
S2: construct a binarized recurrent neural network model, and decode and encode the vectorized audio;
S3: output the encoded result, i.e. the text corresponding to the speech.
Further, in step S2, the binarized recurrent neural network model includes the following types: a single-layer unidirectional binarized recurrent neural network (RNN) model, a bidirectional binarized RNN model, and a bidirectional binarized long short-term memory (LSTM) model.
Further, the single-layer unidirectional binarized RNN model comprises an input unit, a hidden unit, and an output unit.
Further, the state-transfer formulas of the single-layer unidirectional binarized RNN model are:
a(t) = b + α_W (B_W o(t-1)) + α_U (B_U x(t))
h(t) = tanh(a(t))
o(t) = c + α_V (B_V h(t))
ŷ(t) = softmax(o(t))
where b and c are bias vectors; U, V, and W are the weight matrices of the connections from the input unit to the hidden unit, from the hidden unit to the output unit, and from the output unit to the hidden unit, respectively. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W. a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) denotes the hidden-unit state at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
Further, the bidirectional binarized RNN model comprises an input unit, hidden units, and an output unit; the hidden units consist of an h hidden unit and a g hidden unit operating in opposite directions. The h hidden unit represents the circuit that obtains information from states formed by past inputs; the g hidden unit represents the circuit that obtains information from states formed by future inputs.
Further, the state-transfer formulas of the bidirectional binarized RNN model are:
a1(t) = b1 + α_W1 (B_W1 o(t-1)) + α_U1 (B_U1 x(t))
a2(t) = b2 + α_W2 (B_W2 o(t+1)) + α_U2 (B_U2 x(t))
h(t) = tanh(a1(t))
g(t) = tanh(a2(t))
o(t) = c + α_V1 (B_V1 h(t)) + α_V2 (B_V2 g(t))
ŷ(t) = softmax(o(t))
where b1, b2 and c are bias vectors; U1, V1, W1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U2, V2, W2 are the weight matrices of the connections from the input unit to the g hidden unit, from the g hidden unit to the output unit, and from the output unit to the g hidden unit, respectively. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W. a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) and g(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
Further, the bidirectional binarized LSTM model comprises an input unit, hidden units, and an output unit; the hidden units consist of an h hidden unit and a g hidden unit operating in opposite directions. The h hidden unit represents the circuit that obtains information from states formed by past inputs; the g hidden unit represents the circuit that obtains information from states formed by future inputs. Each of the h and g hidden units is provided with an input gate G_i, a forget gate G_f, and an output gate G_o.
Further, the state-transfer formulas of the bidirectional binarized LSTM model are:
h circuit, i.e. the state transfer from the previous time to the current time:
G_i1(t) = σ(b_i1 + α_Wi1 (B_Wi1 o(t-1)) + α_Ui1 (B_Ui1 x(t)))
G_f1(t) = σ(b_f1 + α_Wf1 (B_Wf1 o(t-1)) + α_Uf1 (B_Uf1 x(t)))
G_o1(t) = σ(b_o1 + α_Wo1 (B_Wo1 o(t-1)) + α_Uo1 (B_Uo1 x(t)))
s1(t) = G_f1(t) s1(t-1) + G_i1(t) tanh(b1 + α_Wh1 (B_Wh1 o(t-1)) + α_Uh1 (B_Uh1 x(t)))
h(t) = G_o1(t) tanh(s1(t))
g circuit, the state transfer from the future time to the current time: the same formulas with subscript 2 and with o(t+1) in place of o(t-1), yielding the cell state s2(t) and the hidden state g(t).
Output:
o(t) = c + α_V1 (B_V1 h(t)) + α_V2 (B_V2 g(t))
ŷ(t) = softmax(o(t))
Here the parameters of the h circuit carry subscript 1 and those of the g circuit carry subscript 2. G_i, G_f, G_o denote the gating parameters of the input gate, forget gate, and output gate, respectively; σ is the sigmoid activation function, an S-shaped function whose output lies between 0 and 1; b and c are bias vectors. U1, V1, W1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U2, V2, W2 are the weight matrices of the connections from the input unit to the g hidden unit, from the g hidden unit to the output unit, and from the output unit to the g hidden unit, respectively. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W, where U = U_h ∪ U_i ∪ U_f ∪ U_o and V = V_h ∪ V_i ∪ V_f ∪ V_o. a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) and g(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
The beneficial effects of the present invention are: after the LSTM/RNN weights are binarized, the storage and computation overhead of speech recognition applications is drastically reduced. In the bidirectional LSTM case described here, storage is reduced by about 93%, and experiments show that the accuracy of the LSTM/RNN model typically declines by no more than 1% after binarization.
Compared with traditional LSTM/RNN models, the present invention can be effectively deployed in embedded speech recognition scenarios that are highly sensitive to storage and computation resources; it runs with higher performance on low-power embedded devices and avoids the privacy concerns of the traditional approach of uploading speech to a cloud server.
Other advantages, objectives, and features of the invention will be set forth to some extent in the following description and, to some extent, will be apparent to those skilled in the art upon examination of what follows, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following specification.
Brief description of the drawings
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings, in which:
Fig. 1 is the processing flow diagram of the speech recognition system of the present invention;
Fig. 2 is a schematic structural diagram of the single-layer unidirectional binarized RNN model;
Fig. 3 is a schematic structural diagram of the bidirectional binarized RNN model;
Fig. 4 is a schematic structural diagram of the bidirectional binarized LSTM model;
Fig. 5 is a schematic diagram of the internal structure of the gating system.
Specific embodiment
The embodiments of the present invention are described below by way of specific examples; those skilled in the art can easily understand other advantages and effects of the invention from the contents disclosed in this specification. The invention can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or altered in various ways based on different viewpoints and applications without departing from the spirit of the invention. It should be noted that the diagrams provided in the following embodiments only illustrate the basic concept of the invention schematically, and the features of the following embodiments can be combined with each other provided there is no conflict.
Referring to Figs. 1 to 5, Fig. 1 is the flow chart of the speech recognition method based on a binarized recurrent neural network according to the present invention. The speech input module uses a microphone to acquire the analog audio signal, which is sampled and quantized into a digital signal. The audio vectorization module filters the audio and vectorizes it by means of entity embedding. The signal then passes through an encoder-decoder model built from a pair of bidirectional binarized recurrent neural networks, in which the encoder extracts the hidden feature representation of the audio signal and the decoder maps the hidden feature representation to the corresponding text. Finally, the text output module delivers the text converted from the speech to the human-computer interface.
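The patent does not specify the vectorization scheme beyond filtering and entity embedding. As a rough illustration only (the function name, frame length, and hop size below are assumptions, not from the patent), raw audio samples can be split into overlapping windowed frames to form the vector sequence fed to the recurrent network:

```python
import numpy as np

def frame_audio(signal, frame_len=400, hop=160):
    """Split a sampled audio signal into overlapping, Hann-windowed frames.

    One simple way to turn raw audio into a vector sequence; the patent
    itself does not prescribe this scheme.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return frames  # shape: (n_frames, frame_len)

# 1 second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
frames = frame_audio(np.sin(2 * np.pi * 440 * t))
print(frames.shape)  # (98, 400)
```

Each row of `frames` would then play the role of the input-unit state x(t) at one time step.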
The binarized recurrent neural network models used in this embodiment mainly include the following types: the single-layer unidirectional binarized RNN model, the bidirectional binarized RNN model, and the bidirectional binarized LSTM model.
1) Single-layer unidirectional binarized RNN model
As shown in Fig. 2, this model applies an efficient binarization method to a single-layer unidirectional RNN. The difference between the single-layer unidirectional binarized RNN and an ordinary single-layer unidirectional RNN is that each floating-point weight matrix is replaced by the product of a binarized weight matrix B (i.e., integer data of various bit widths) and an optimal approximation factor α. As shown in Fig. 2, the weights on the data-flow paths of the single-layer unidirectional binarized RNN are denoted αB. The corresponding state-transfer formulas are the following four:
a(t) = b + α_W (B_W o(t-1)) + α_U (B_U x(t))
h(t) = tanh(a(t))
o(t) = c + α_V (B_V h(t))
ŷ(t) = softmax(o(t))
where b and c are bias vectors; U, V, and W are the weight matrices of the connections from the input unit to the hidden unit, from the hidden unit to the output unit, and from the output unit to the hidden unit, respectively. a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t. The superimposed state is then passed through an activation function, for example the hyperbolic tangent tanh, to obtain the hidden-unit state h(t) at time t. The hidden-unit state, transformed by the V matrix and added to the bias vector c, gives the output-unit state o(t). Finally, applying the softmax function yields the output vector ŷ(t) of normalized posterior probabilities.
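The four state-transfer formulas above can be sketched in NumPy. This is an illustrative reading of the formulas, not code from the patent; the helper names, dimensions, and random seed are assumptions:

```python
import numpy as np

def binarize(W):
    """B = sign(W), alpha = mean(|W|): minimizes ||W - alpha*B||^2."""
    return np.where(W >= 0, 1.0, -1.0), np.abs(W).mean()

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def rnn_step(x_t, o_prev, params):
    """One step of the binarized single-layer unidirectional RNN.

    Implements:
      a(t) = b + alpha_W * (B_W @ o(t-1)) + alpha_U * (B_U @ x(t))
      h(t) = tanh(a(t))
      o(t) = c + alpha_V * (B_V @ h(t))
      y(t) = softmax(o(t))
    """
    b, c, (B_U, a_U), (B_V, a_V), (B_W, a_W) = params
    a_t = b + a_W * (B_W @ o_prev) + a_U * (B_U @ x_t)
    h_t = np.tanh(a_t)
    o_t = c + a_V * (B_V @ h_t)
    return o_t, softmax(o_t)

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 8, 16, 5
U = rng.standard_normal((n_hid, n_in))   # input -> hidden
V = rng.standard_normal((n_out, n_hid))  # hidden -> output
W = rng.standard_normal((n_hid, n_out))  # output -> hidden (recurrent)
params = (np.zeros(n_hid), np.zeros(n_out),
          binarize(U), binarize(V), binarize(W))

o_t, y_t = rnn_step(rng.standard_normal(n_in), np.zeros(n_out), params)
print(y_t.sum())  # ≈ 1.0
```

Note that each matrix product uses only a {-1, +1} matrix plus one scalar multiply, which is what makes the binarized model cheap on embedded hardware.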
To reduce storage and computation overhead while avoiding loss of model accuracy as far as possible, each original floating-point weight matrix is approximated by the product of a binarized weight matrix B and an optimal approximation factor α. After the original floating-point matrices U, V, W are binarized, the three binarized weight matrices B_U, B_V, B_W and the three corresponding optimal approximation factors α_U, α_V, α_W are obtained. The degree of approximation of B and α, which also serves as the evaluation criterion for the binarization quality of binarized RNNs, is measured by:
J(B_W, α_W) = ‖W − α_W B_W‖²
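The minimizer of J(B_W, α_W) over B ∈ {−1, +1} and α has a known closed form: B = sign(W) and α = mean(|W|). A small NumPy sketch (illustrative, not from the patent) of this binarization step:

```python
import numpy as np

def binarize(W):
    """Binarize a floating-point weight matrix W.

    Minimizes J(B, alpha) = ||W - alpha*B||^2 over B in {-1, +1} and
    alpha; the closed-form solution is B = sign(W) and
    alpha = mean(|W|) (the L1 norm of W divided by its element count).
    """
    B = np.where(W >= 0, 1.0, -1.0)  # binarized weight matrix
    alpha = np.abs(W).mean()         # optimal approximation factor
    return B, alpha

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
B, alpha = binarize(W)

# J at the optimal alpha is never worse than at any other alpha.
J_opt = np.linalg.norm(W - alpha * B) ** 2
J_other = np.linalg.norm(W - 0.5 * B) ** 2
print(J_opt <= J_other)  # True
```

For a fixed B = sign(W), J(α) = Σ(|W_ij| − α)² is a quadratic in α whose minimum is the mean of |W|, which is why the approximation factor takes this simple form.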
Algorithm: the specific steps for training the single-layer unidirectional RNN with binarized weights are as follows:
2) Bidirectional binarized RNN model
As shown in Fig. 3, in practical applications a more effective bidirectional RNN model realizes the recurrent loop through connections from the output unit to the hidden units. The state of the h unit at time t, h(t), is determined jointly by the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t. The state of the g unit at time t, g(t), is determined jointly by the output-unit state o(t+1) at time t+1 and the input-unit state x(t) at time t. The state of the output unit o(t) is determined jointly by the bias vector c and the states h(t) and g(t).
Because there are two groups of hidden units running in opposite directions, the weight matrices U, V, W, for the connections from the input unit to the hidden units, from the hidden units to the output unit, and from the output unit to the hidden units, also come in two groups: for the circuit containing the h unit they are denoted U1, V1, W1, and for the circuit containing the g unit they are denoted U2, V2, W2.
After binarization, U1, V1, W1 and U2, V2, W2 on the circuits of h and g are replaced by α_U1 B_U1, α_V1 B_V1, α_W1 B_W1 and α_U2 B_U2, α_V2 B_V2, α_W2 B_W2, respectively. As shown in Fig. 3, the state-transfer formulas of the bidirectional binarized RNN model are the following six:
a1(t) = b1 + α_W1 (B_W1 o(t-1)) + α_U1 (B_U1 x(t))
a2(t) = b2 + α_W2 (B_W2 o(t+1)) + α_U2 (B_U2 x(t))
h(t) = tanh(a1(t))
g(t) = tanh(a2(t))
o(t) = c + α_V1 (B_V1 h(t)) + α_V2 (B_V2 g(t))
ŷ(t) = softmax(o(t))
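A NumPy sketch of a bidirectional binarized RNN forward pass follows. It is illustrative only, and makes one simplification relative to the text: to keep the pass computable in two sweeps, each circuit here feeds back its own hidden state rather than the output unit's state; everything else (binarized matrices scaled by α, tanh, combined output) follows the formulas above:

```python
import numpy as np

def binarize(W):
    """B = sign(W), alpha = mean(|W|)."""
    return np.where(W >= 0, 1.0, -1.0), np.abs(W).mean()

def birnn_forward(xs, p):
    """Bidirectional binarized RNN over a sequence.

    The h circuit runs past -> future, the g circuit future -> past;
    the output at each step combines both hidden states.
    """
    (B_U1, aU1), (B_W1, aW1), (B_U2, aU2), (B_W2, aW2), \
        (B_V1, aV1), (B_V2, aV2), b1, b2, c = p
    T, n_hid = len(xs), B_W1.shape[0]
    hs, gs = np.zeros((T, n_hid)), np.zeros((T, n_hid))
    h = np.zeros(n_hid)
    for t in range(T):                 # past -> future
        h = np.tanh(b1 + aW1 * (B_W1 @ h) + aU1 * (B_U1 @ xs[t]))
        hs[t] = h
    g = np.zeros(n_hid)
    for t in reversed(range(T)):       # future -> past
        g = np.tanh(b2 + aW2 * (B_W2 @ g) + aU2 * (B_U2 @ xs[t]))
        gs[t] = g
    return np.array([c + aV1 * (B_V1 @ hs[t]) + aV2 * (B_V2 @ gs[t])
                     for t in range(T)])

rng = np.random.default_rng(2)
n_in, n_hid, n_out, T = 6, 10, 4, 5
p = (binarize(rng.standard_normal((n_hid, n_in))),
     binarize(rng.standard_normal((n_hid, n_hid))),
     binarize(rng.standard_normal((n_hid, n_in))),
     binarize(rng.standard_normal((n_hid, n_hid))),
     binarize(rng.standard_normal((n_out, n_hid))),
     binarize(rng.standard_normal((n_out, n_hid))),
     np.zeros(n_hid), np.zeros(n_hid), np.zeros(n_out))
outs = birnn_forward(rng.standard_normal((T, n_in)), p)
print(outs.shape)  # (5, 4)
```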
3) Bidirectional binarized LSTM model
As shown in Fig. 4, the state transfer from the previous time to the current time is referred to as the h circuit, and the state transfer from the future time to the current time is referred to as the g circuit. The black squares denote a delay of one time step. Since the bidirectional LSTM has the two circuits h and g, the parameters of the h circuit carry subscript 1 and those of the g circuit carry subscript 2 to distinguish them symbolically; for example, the input-gate gating parameter of the h circuit at time t is denoted G_i1(t). In the bidirectional LSTM, the output of the h circuit and the output of the g circuit are accumulated in the o unit. The internal structure of the LSTM unit is shown in Fig. 5.
In the bidirectional LSTM, the circuits containing h and g are binarized and approximated separately. For example, the binarized matrix of the U_i matrix in the h circuit is denoted B_Ui1, with corresponding optimal approximation factor α_Ui1. Likewise, the U, V, W matrices in the two circuits of h and g, and their sub-matrices such as U_i, are denoted in the same manner. The approximate substitution is performed in the same way as for the single-layer unidirectional case, and the complete state-transfer formulas are as follows:
h circuit:
G_i1(t) = σ(b_i1 + α_Wi1 (B_Wi1 o(t-1)) + α_Ui1 (B_Ui1 x(t)))
G_f1(t) = σ(b_f1 + α_Wf1 (B_Wf1 o(t-1)) + α_Uf1 (B_Uf1 x(t)))
G_o1(t) = σ(b_o1 + α_Wo1 (B_Wo1 o(t-1)) + α_Uo1 (B_Uo1 x(t)))
s1(t) = G_f1(t) s1(t-1) + G_i1(t) tanh(b1 + α_Wh1 (B_Wh1 o(t-1)) + α_Uh1 (B_Uh1 x(t)))
h(t) = G_o1(t) tanh(s1(t))
g circuit: the same formulas with subscript 2 and with o(t+1) in place of o(t-1), yielding the cell state s2(t) and the hidden state g(t).
Output:
o(t) = c + α_V1 (B_V1 h(t)) + α_V2 (B_V2 g(t))
ŷ(t) = softmax(o(t))
Here G_i, G_f, G_o denote the gating parameters of the input gate, forget gate, and output gate, respectively; σ is the sigmoid activation function, an S-shaped function whose output lies between 0 and 1; b and c are bias vectors. U1, V1, W1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U2, V2, W2 are the weight matrices of the connections from the input unit to the g hidden unit, from the g hidden unit to the output unit, and from the output unit to the g hidden unit, respectively. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W, where U = U_h ∪ U_i ∪ U_f ∪ U_o and V = V_h ∪ V_i ∪ V_f ∪ V_o. a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) and g(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
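A single step of one LSTM circuit with binarized weights might look as follows in NumPy. This is an illustrative sketch under the standard LSTM gating equations, with all names and shapes assumed rather than taken from the patent:

```python
import numpy as np

def binarize(W):
    """B = sign(W), alpha = mean(|W|)."""
    return np.where(W >= 0, 1.0, -1.0), np.abs(W).mean()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def blstm_step(x_t, h_prev, s_prev, gates):
    """One binarized LSTM step for a single circuit (h or g).

    `gates` maps each name ('i', 'f', 'o', 'h') to its binarized
    input matrix, binarized recurrent matrix, and bias; every matrix
    product uses a binary matrix scaled by its approximation factor.
    """
    def pre(name):
        (B_U, aU), (B_W, aW), b = gates[name]
        return b + aW * (B_W @ h_prev) + aU * (B_U @ x_t)
    G_i = sigmoid(pre('i'))                         # input gate
    G_f = sigmoid(pre('f'))                         # forget gate
    G_o = sigmoid(pre('o'))                         # output gate
    s_t = G_f * s_prev + G_i * np.tanh(pre('h'))    # cell (long-term) state
    h_t = G_o * np.tanh(s_t)                        # hidden (short-term) state
    return h_t, s_t

rng = np.random.default_rng(3)
n_in, n_hid = 6, 8
gates = {k: (binarize(rng.standard_normal((n_hid, n_in))),
             binarize(rng.standard_normal((n_hid, n_hid))),
             np.zeros(n_hid)) for k in 'ifoh'}
h, s = blstm_step(rng.standard_normal(n_in),
                  np.zeros(n_hid), np.zeros(n_hid), gates)
print(h.shape, (np.abs(h) <= 1).all())  # (8,) True
```

Running this step forward over the sequence gives the h circuit; running it backward over the reversed sequence gives the g circuit, after which the two hidden states are combined at the output unit.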
Algorithm: the specific steps for training the bidirectional RNN network with binarized weights are as follows:
The highly approximate binarized LSTM/RNN algorithm and implementation adopted by the present invention greatly reduce the storage and computation overhead of speech recognition applications after the LSTM/RNN weights are binarized. In the bidirectional LSTM case described here, storage is reduced by about 93%, and experiments show that the accuracy of the LSTM/RNN model typically declines by no more than 1% after binarization.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from the purpose and scope of the technical solution, and all such modifications shall fall within the scope of the claims of the present invention.
Claims (8)
1. A speech recognition method based on a binarized recurrent neural network, characterized in that the method comprises the following steps:
S1: receive the speech input and vectorize the audio;
S2: construct a binarized recurrent neural network model, and decode and encode the vectorized audio;
S3: output the encoded result, i.e. the text corresponding to the speech.
2. The speech recognition method based on a binarized recurrent neural network according to claim 1, characterized in that in step S2 the binarized recurrent neural network model includes the following types: a single-layer unidirectional binarized recurrent neural network (RNN) model, a bidirectional binarized RNN model, and a bidirectional binarized long short-term memory (LSTM) model.
3. The speech recognition method based on a binarized recurrent neural network according to claim 2, characterized in that the single-layer unidirectional binarized RNN model comprises an input unit, a hidden unit, and an output unit.
4. The speech recognition method based on a binarized recurrent neural network according to claim 3, characterized in that the state-transfer formulas of the single-layer unidirectional binarized RNN model are:
a(t) = b + α_W (B_W o(t-1)) + α_U (B_U x(t))
h(t) = tanh(a(t))
o(t) = c + α_V (B_V h(t))
ŷ(t) = softmax(o(t))
wherein b and c are bias vectors; U, V, W are the weight matrices of the connections from the input unit to the hidden unit, from the hidden unit to the output unit, and from the output unit to the hidden unit, respectively; after binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W; a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) denotes the hidden-unit state at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
5. The speech recognition method based on a binarized recurrent neural network according to claim 2, characterized in that the bidirectional binarized RNN model comprises an input unit, hidden units, and an output unit; the hidden units consist of an h hidden unit and a g hidden unit operating in opposite directions; the h hidden unit represents the circuit that obtains information from states formed by past inputs, and the g hidden unit represents the circuit that obtains information from states formed by future inputs.
6. The speech recognition method based on a binarized recurrent neural network according to claim 5, characterized in that the state-transfer formulas of the bidirectional binarized RNN model are:
a1(t) = b1 + α_W1 (B_W1 o(t-1)) + α_U1 (B_U1 x(t))
a2(t) = b2 + α_W2 (B_W2 o(t+1)) + α_U2 (B_U2 x(t))
h(t) = tanh(a1(t))
g(t) = tanh(a2(t))
o(t) = c + α_V1 (B_V1 h(t)) + α_V2 (B_V2 g(t))
ŷ(t) = softmax(o(t))
wherein b1, b2 and c are bias vectors; U1, V1, W1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U2, V2, W2 are the weight matrices of the connections from the input unit to the g hidden unit, from the g hidden unit to the output unit, and from the output unit to the g hidden unit, respectively; after binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W; a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) and g(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
7. The speech recognition method based on a binarized recurrent neural network according to claim 2, characterized in that the bidirectional binarized LSTM model comprises an input unit, hidden units, and an output unit; the hidden units consist of an h hidden unit and a g hidden unit operating in opposite directions; the h hidden unit represents the circuit that obtains information from states formed by past inputs, and the g hidden unit represents the circuit that obtains information from states formed by future inputs; each of the h and g hidden units is provided with an input gate G_i, a forget gate G_f, and an output gate G_o.
8. a kind of audio recognition method based on binaryzation recurrent neural network according to claim 7, which is characterized in that
The state of the two-way LSTM model of binaryzation shifts formula are as follows:
The circuit h, i.e., the state transfer from last time to current time:
The circuit g, the state transfer from future time instance to current time:
Output:
where the parameters of the h circuit carry the subscript 1 and the parameters of the g circuit carry the subscript 2; G_i, G_f and G_o denote the gating parameters of the input gate, forget gate and output gate, respectively; σ is the sigmoid activation function; b and c are bias vectors; U_1, V_1 and W_1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U_2, V_2 and W_2 are the weight matrices of the connections from the input unit to the g hidden unit, from the g hidden unit to the output unit, and from the output unit to the g hidden unit, respectively; after binarization, the floating-point matrices U, V and W yield the corresponding binarized weight matrices B_U, B_V and B_W and the corresponding optimal approximation factors α_U, α_V and α_W; U = U_h ∪ U_i ∪ U_f ∪ U_o, V = V_h ∪ V_i ∪ V_f ∪ V_o; a^(t) denotes the superposed state formed from the inputs at time t, namely the state o^(t-1) of the output unit at time t-1 and the state x^(t) of the input unit at time t; h^(t) and g^(t) denote the states of the h and g hidden units at time t, respectively; tanh is the hyperbolic tangent function; the output vector of normalized posterior probabilities is obtained by applying the softmax function.
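The claim names the binarized matrices B and optimal approximation factors α but does not reproduce the binarization formula here. A minimal sketch, assuming the standard closed-form solution from the binary-network literature (B = sign(W), α = mean(|W|), which minimizes ‖W − αB‖ for a fixed sign pattern), would be:

```python
import numpy as np

def binarize(W):
    """Binarize a floating-point weight matrix W into a {-1, +1} matrix B
    and a scalar approximation factor alpha so that alpha * B approximates W.
    Closed form: B = sign(W), alpha = mean(|W|) minimizes ||W - alpha*B||_F
    for that choice of B. (Assumed form; the patent does not state it here.)"""
    B = np.where(W >= 0, 1.0, -1.0)
    alpha = np.abs(W).mean()
    return B, alpha

# usage: alpha * B replaces W, so matrix multiplies reduce to sign
# arithmetic plus one scalar multiplication
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
B, alpha = binarize(W)
approx = alpha * B
```

Storing B (1 bit per weight) plus one α per matrix is what yields the memory and energy savings claimed for the binarized network.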
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910310341.1A CN110033766A (en) | 2019-04-17 | 2019-04-17 | A kind of audio recognition method based on binaryzation recurrent neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110033766A true CN110033766A (en) | 2019-07-19 |
Family
ID=67238818
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910310341.1A Pending CN110033766A (en) | 2019-04-17 | 2019-04-17 | A kind of audio recognition method based on binaryzation recurrent neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110033766A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674265A (en) * | 2019-08-06 | 2020-01-10 | 上海孚典智能科技有限公司 | Unstructured information oriented feature discrimination and information recommendation system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106816147A (en) * | 2017-01-25 | 2017-06-09 | 上海交通大学 | Speech recognition system based on binary neural network acoustic model |
CN107293291A (en) * | 2016-03-30 | 2017-10-24 | 中国科学院声学研究所 | A kind of audio recognition method end to end based on autoadapted learning rate |
CN108765506A (en) * | 2018-05-21 | 2018-11-06 | 上海交通大学 | Compression method based on successively network binaryzation |
CN109086866A (en) * | 2018-07-02 | 2018-12-25 | 重庆大学 | A kind of part two-value convolution method suitable for embedded device |
US10192327B1 (en) * | 2016-02-04 | 2019-01-29 | Google Llc | Image compression with recurrent neural networks |
Non-Patent Citations (1)
Title |
---|
GDTOP818: "[ICML14] Towards End-to-End Speech Recognition with Recurrent Neural Networks", CSDN blog, https://blog.csdn.net/weixin_37993251/article/details/88129893 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108171701B (en) | Significance detection method based on U network and counterstudy | |
CN109543838B (en) | Image increment learning method based on variational self-encoder | |
CN108280233A (en) | A kind of VideoGIS data retrieval method based on deep learning | |
CN110851760B (en) | Human-computer interaction system for integrating visual question answering in web3D environment | |
CN110533570A (en) | A kind of general steganography method based on deep learning | |
CN111581383A (en) | Chinese text classification method based on ERNIE-BiGRU | |
CN111931814A (en) | Unsupervised anti-domain adaptation method based on intra-class structure compactness constraint | |
CN112733965A (en) | Label-free image classification method based on small sample learning | |
CN114565808B (en) | Double-action contrast learning method for unsupervised visual representation | |
CN110472255A (en) | Neural network machine interpretation method, model, electric terminal and storage medium | |
CN110263164A (en) | A kind of Sentiment orientation analysis method based on Model Fusion | |
CN114581341A (en) | Image style migration method and system based on deep learning | |
CN116935292B (en) | Short video scene classification method and system based on self-attention model | |
CN110033766A (en) | A kind of audio recognition method based on binaryzation recurrent neural network | |
CN112417118A (en) | Dialog generation method based on marked text and neural network | |
Li et al. | Towards communication-efficient digital twin via ai-powered transmission and reconstruction | |
LU503098B1 (en) | A method and system for fused subspace clustering based on graph autoencoder | |
CN114333069B (en) | Object posture processing method, device, equipment and storage medium | |
CN116245106A (en) | Cross-domain named entity identification method based on autoregressive model | |
CN115270917A (en) | Two-stage processing multi-mode garment image generation method | |
Cheng et al. | Solving monocular sensors depth prediction using MLP-based architecture and multi-scale inverse attention | |
Babaheidarian et al. | Decode and transfer: A new steganalysis technique via conditional generative adversarial networks | |
CN112598065A (en) | Memory-based gated convolutional neural network semantic processing system and method | |
Kouvakis et al. | Semantic communications for image-based sign language transmission | |
Yuan et al. | A review on generative adversarial networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-07-19 |