CN110033766A - A speech recognition method based on binarized recurrent neural networks - Google Patents

A speech recognition method based on binarized recurrent neural networks

Info

Publication number
CN110033766A
CN110033766A
Authority
CN
China
Prior art keywords
unit
binarization
state
hidden
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910310341.1A
Other languages
Chinese (zh)
Inventor
李坤平
张帅
周喜川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201910310341.1A priority Critical patent/CN110033766A/en
Publication of CN110033766A publication Critical patent/CN110033766A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a speech recognition method based on binarized recurrent neural networks, and belongs to the field of artificial intelligence. The method comprises: S1: receiving voice input and vectorizing the audio; S2: constructing a binarized recurrent neural network model, and decoding and encoding the vectorized audio; S3: outputting the encoded audio, i.e. outputting the audio as text. The binarized recurrent neural network model includes network structures such as a binarized single-layer unidirectional RNN model, a binarized bidirectional RNN model and a binarized bidirectional LSTM model. The present invention preserves the speed and energy-consumption advantages of binarized networks while improving model accuracy, making it feasible to run well-performing LSTM/RNN models for natural language processing and speech processing applications on embedded devices.

Description

A speech recognition method based on binarized recurrent neural networks
Technical field
The invention belongs to the field of artificial intelligence and relates to a speech recognition method based on binarized recurrent neural networks.
Background art
Sequence-processing applications built on classical LSTM/RNN models, such as named entity recognition in natural language processing or speech-to-text conversion, can achieve good performance. However, because the parameters of the weight matrices and the various gates of an LSTM/RNN model are real-valued, a classical LSTM/RNN implementation consumes large amounts of computing and storage resources. It is therefore difficult to deploy on resource-limited embedded devices, or only simpler but worse-performing LSTM/RNN models can be realized on such devices.
At present, in applications such as natural language processing and speech recognition, the commonly used LSTM/RNN models are complex and are usually deployed on servers with abundant computing and storage capacity. As users pay increasing attention to privacy, however, they increasingly prefer to run LSTM/RNN models offline on the device. The traditional server-side approach carries a risk of privacy leakage and is limited by network bandwidth; in application scenarios with high real-time requirements, a server-side implementation finds it harder to guarantee prompt responses than an implementation on the embedded device.
A common binarization method directly quantizes the floating-point weights on either side of a threshold to 0 and 1. Although this greatly reduces memory and computation overhead, the accuracy loss is large.
Summary of the invention
In view of this, the purpose of the present invention is to provide a speech recognition method based on binarized recurrent neural networks that preserves the speed and energy-consumption advantages of binarized networks while improving model accuracy, making it feasible to run well-performing LSTM/RNN models for natural language processing and speech processing applications on embedded devices.
To achieve the above purpose, the invention provides the following technical scheme:
A speech recognition method based on binarized recurrent neural networks, comprising the following steps:
S1: receive voice input and vectorize the audio;
S2: construct a binarized recurrent neural network model, and decode and encode the vectorized audio;
S3: output the encoded audio, i.e. output the audio as text.
Further, in step S2, the binarized recurrent neural network models include the following types: a binarized single-layer unidirectional recurrent neural network (RNN) model, a binarized bidirectional RNN model, and a binarized bidirectional long short-term memory (LSTM) model.
Further, the binarized single-layer unidirectional RNN model comprises: an input unit, a hidden unit and an output unit.
Further, the state transition formulas of the binarized single-layer unidirectional RNN model are:

a^(t) = b + α_W B_W o^(t-1) + α_U B_U x^(t)
h^(t) = tanh(a^(t))
o^(t) = c + α_V B_V h^(t)
ŷ^(t) = softmax(o^(t))

where b and c are bias vectors; U, V and W are the weight matrices of the connections from the input unit to the hidden unit, from the hidden unit to the output unit, and from the output unit to the hidden unit, respectively. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and the corresponding optimal approximation factors α_U, α_V, α_W. a^(t) denotes the superposition of the state inputs at time t, which include the state o^(t-1) of the output unit at time t-1 and the state x^(t) of the input unit at time t. h^(t) denotes the state of the hidden unit at time t, and tanh is the hyperbolic tangent function. ŷ^(t) denotes the output vector of normalized posterior probabilities obtained by applying the softmax function.
Further, the binarized bidirectional RNN model comprises: an input unit, hidden units and an output unit; the hidden units consist of an h hidden unit and a g hidden unit running in opposite directions; the h hidden unit represents the circuit that obtains information from states formed by past inputs, and the g hidden unit represents the circuit that obtains information from states formed by future inputs.
Further, the state transition formulas of the binarized bidirectional RNN model are:

a_1^(t) = b + α_{W_1} B_{W_1} o^(t-1) + α_{U_1} B_{U_1} x^(t)
a_2^(t) = b + α_{W_2} B_{W_2} o^(t+1) + α_{U_2} B_{U_2} x^(t)
h^(t) = tanh(a_1^(t))
g^(t) = tanh(a_2^(t))
o^(t) = c + α_{V_1} B_{V_1} h^(t) + α_{V_2} B_{V_2} g^(t)
ŷ^(t) = softmax(o^(t))

where b and c are bias vectors; U_1, V_1, W_1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U_2, V_2, W_2 are the corresponding weight matrices for the g hidden unit. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and the corresponding optimal approximation factors α_U, α_V, α_W. a^(t) denotes the superposition of the state inputs at time t, which for the h circuit include the state o^(t-1) of the output unit at time t-1 and the state x^(t) of the input unit at time t. h^(t) and g^(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function. ŷ^(t) denotes the output vector of normalized posterior probabilities obtained by applying the softmax function.
Further, the binarized bidirectional LSTM model comprises: an input unit, hidden units and an output unit; the hidden units consist of an h hidden unit and a g hidden unit running in opposite directions; the h hidden unit represents the circuit that obtains information from states formed by past inputs, and the g hidden unit represents the circuit that obtains information from states formed by future inputs; an input gate G_i, a forget gate G_f and an output gate G_o are provided in each of the h and g hidden units.
Further, the state transition formulas of the binarized bidirectional LSTM model are:

The h circuit, i.e. the state transfer from the previous time step to the current one:

G_i^1(t) = σ(b_i^1 + α_{U_i^1} B_{U_i^1} x^(t) + α_{W_i^1} B_{W_i^1} h^(t-1))
G_f^1(t) = σ(b_f^1 + α_{U_f^1} B_{U_f^1} x^(t) + α_{W_f^1} B_{W_f^1} h^(t-1))
G_o^1(t) = σ(b_o^1 + α_{U_o^1} B_{U_o^1} x^(t) + α_{W_o^1} B_{W_o^1} h^(t-1))
s^1(t) = G_f^1(t) ⊙ s^1(t-1) + G_i^1(t) ⊙ tanh(b_h^1 + α_{U_h^1} B_{U_h^1} x^(t) + α_{W_h^1} B_{W_h^1} h^(t-1))
h^(t) = G_o^1(t) ⊙ tanh(s^1(t))

The g circuit, the state transfer from the future time step to the current one, is given by the same five formulas with the marker 1 replaced by 2, h^(t-1) replaced by g^(t+1), and the cell state s^1 replaced by s^2, yielding g^(t) = G_o^2(t) ⊙ tanh(s^2(t)).

Output:

o^(t) = c + α_{V_1} B_{V_1} h^(t) + α_{V_2} B_{V_2} g^(t)
ŷ^(t) = softmax(o^(t))

where the parameters of the h circuit carry the marker 1 and those of the g circuit carry the marker 2; G_i, G_f and G_o denote the gating parameters of the input gate, forget gate and output gate, respectively; σ is the sigmoid activation function, an S-shaped function whose output lies between 0 and 1; ⊙ denotes element-wise multiplication; b and c are bias vectors; U_1, V_1, W_1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U_2, V_2, W_2 are the corresponding weight matrices for the g hidden unit. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and the corresponding optimal approximation factors α_U, α_V, α_W, where U = U_h ∪ U_i ∪ U_f ∪ U_o and V = V_h ∪ V_i ∪ V_f ∪ V_o. a^(t) denotes the superposition of the state inputs at time t, including the state o^(t-1) of the output unit at time t-1 and the state x^(t) of the input unit at time t. h^(t) and g^(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function. ŷ^(t) denotes the output vector of normalized posterior probabilities obtained by applying the softmax function.
The beneficial effects of the present invention are as follows. After the LSTM/RNN weights are binarized, the storage and computing overhead of a speech recognition application is drastically reduced; in the bidirectional LSTM case described herein, storage is reduced by about 93%, and experiments show that the accuracy drop of the binarized LSTM/RNN model is typically no more than 1%.
Compared with traditional LSTM/RNN models, the present invention can be deployed effectively in embedded speech recognition scenarios that are highly sensitive to storage and computing resources, runs with good performance on low-power embedded devices, and avoids the privacy concerns raised by the traditional approach of uploading voice to a cloud server.
Other advantages, objects and features of the invention will be set forth to some extent in the following description and, to some extent, will be apparent to those skilled in the art upon examination of the following, or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and attained by the following specification.
Detailed description of the invention
To make the objects, technical solutions and advantages of the present invention clearer, a preferred detailed description of the invention is given below with reference to the accompanying drawings, in which:
Fig. 1 is the processing flow diagram of the speech recognition system implementing the present invention;
Fig. 2 is a schematic structural diagram of the binarized single-layer unidirectional RNN model;
Fig. 3 is a schematic structural diagram of the binarized bidirectional RNN model;
Fig. 4 is a schematic structural diagram of the binarized bidirectional LSTM model;
Fig. 5 is a schematic diagram of the internal structure of the gating system.
Specific embodiment
Embodiments of the present invention are illustrated below by specific examples; those skilled in the art can easily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the invention. It should be noted that the figures provided in the following embodiments only schematically illustrate the basic concept of the invention, and, in the absence of conflict, the features of the following embodiments can be combined with each other.
Referring to Figs. 1 to 5: Fig. 1 is the flow chart of the speech recognition method of the present invention based on binarized recurrent neural networks. The voice input module uses a microphone to acquire the analog audio signal and samples and quantizes it into a digital signal. The audio vectorization module filters the audio and vectorizes it by means of entity embedding. The signal then passes through a decoder-encoder model composed of a pair of binarized bidirectional recurrent neural networks, in which the decoder model extracts the hidden feature representation of the audio signal and the encoder model maps the hidden feature representation to the corresponding text representation. Finally, the text output module outputs the text converted from the voice to the human-computer interaction interface.
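The vectorization step of S1 is described only at a high level (sampling, quantization, filtering and entity embedding). As a rough sketch of how a quantized signal becomes the input sequence x^(1)…x^(T), the snippet below cuts a waveform into overlapping frame vectors; the frame and hop sizes are illustrative assumptions, not values given in the text.

```python
import numpy as np

def frame_audio(signal, frame_len=400, hop=160):
    """Cut a 1-D quantized audio signal into overlapping frame vectors.

    Each frame becomes one input vector x^(t) for the recurrent model.
    frame_len=400 / hop=160 correspond to 25 ms / 10 ms at 16 kHz
    (illustrative choices; the patent does not fix these values).
    """
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames  # shape: (T, frame_len)

# one second of a synthetic 16 kHz signal
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
x_seq = frame_audio(sig)
print(x_seq.shape)  # (98, 400)
```

In a real pipeline the frames would typically be turned into spectral features before entering the network, but any fixed-length vector per time step fits the model interface described here.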
The binarized recurrent neural network models used in this embodiment mainly include the following types: the binarized single-layer unidirectional RNN model, the binarized bidirectional RNN model, and the binarized bidirectional LSTM model.
1) Binarized single-layer unidirectional RNN model
As shown in Fig. 2, this model applies an efficient binarization method to a single-layer unidirectional RNN. The difference between the binarized single-layer unidirectional RNN and an ordinary single-layer unidirectional RNN is that each floating-point weight matrix is replaced by the product of a binarized weight matrix B (i.e. integer data of various bit widths) and an optimal approximation factor α. As shown in Fig. 2, the weights on the corresponding data-flow paths of the binarized single-layer unidirectional RNN are denoted αB. The corresponding state transition formulas are the following four:

a^(t) = b + α_W B_W o^(t-1) + α_U B_U x^(t)
h^(t) = tanh(a^(t))
o^(t) = c + α_V B_V h^(t)
ŷ^(t) = softmax(o^(t))

where b and c are bias vectors, and U, V, W are the weight matrices of the connections from the input unit to the hidden unit, from the hidden unit to the output unit, and from the output unit to the hidden unit, respectively. a^(t) denotes the superposition of the state inputs at time t, which include the state o^(t-1) of the output unit at time t-1 and the state x^(t) of the input unit at time t. The superposed state then passes through an activation function, for which the hyperbolic tangent tanh can be chosen here, giving the hidden-unit state h^(t) at time t. The hidden-unit state is transformed by the V matrix and the bias vector c is added, giving the output-unit state o^(t). Finally, applying the softmax function yields the output vector ŷ^(t) of normalized posterior probabilities.
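The four state transition formulas map directly onto code. The sketch below is ours, not the patent's implementation: the shapes, the {-1, +1} binarization and the α = mean(|W|) scale are assumptions made for illustration.

```python
import numpy as np

def binarize(W):
    """Approximate W ≈ alpha * B with B in {-1, +1} (one standard closed form;
    an assumption here, since the patent only says B holds integer data)."""
    B = np.where(W >= 0, 1.0, -1.0)
    alpha = np.abs(W).mean()
    return alpha, B

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, o_prev, params):
    """One step of the binarized single-layer unidirectional RNN:
    a = b + aW*BW@o_prev + aU*BU@x_t ; h = tanh(a) ; o = c + aV*BV@h."""
    (aU, BU), (aV, BV), (aW, BW), b, c = params
    a_t = b + aW * (BW @ o_prev) + aU * (BU @ x_t)
    h_t = np.tanh(a_t)
    o_t = c + aV * (BV @ h_t)
    return o_t, softmax(o_t)

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 16, 5
U = rng.normal(size=(n_hid, n_in))   # input  -> hidden
V = rng.normal(size=(n_out, n_hid))  # hidden -> output
W = rng.normal(size=(n_hid, n_out))  # output -> hidden (recurrent path)
params = (binarize(U), binarize(V), binarize(W),
          np.zeros(n_hid), np.zeros(n_out))
o, y = rnn_step(rng.normal(size=n_in), np.zeros(n_out), params)
print(y.sum())  # ~1.0 (softmax normalizes the posterior vector)
```

At inference time the matrix-vector products involve only ±1 entries, which is what makes the scheme attractive on embedded hardware; the float math above only emulates that arithmetic.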
To reduce storage and computing overhead while avoiding loss of model accuracy as far as possible, the binarized weight matrix B and the optimal approximation factor α are used to approximately replace the original floating-point weight matrix. After the original floating-point matrices U, V and W are binarized respectively, the three binarized weight matrices B_U, B_V, B_W and the three corresponding optimal approximation factors α_U, α_V, α_W are obtained. The degree of approximation of B and α is evaluated as follows (this is also the evaluation standard for the binarization effect of binarized RNNs); the specific formula is:

J(B_W, α_W) = ‖W - α_W B_W‖²
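The text gives the criterion J but not how B and α are chosen to minimize it. Under the common {-1, +1} binarization used in XNOR-Net-style methods (our assumption; the patent only says B holds integer data of various bit widths), the minimizer has the closed form B = sign(W), α = ‖W‖₁/n, which the sketch below checks numerically:

```python
import numpy as np

def optimal_binarization(W):
    """Minimize J(B, alpha) = ||W - alpha*B||^2 over B in {-1,+1} and alpha > 0.
    Closed form (for the assumed sign binarization): B = sign(W), alpha = mean(|W|)."""
    B = np.sign(W)
    B[B == 0] = 1.0            # break ties toward +1
    alpha = np.abs(W).mean()
    return B, alpha

def J(W, B, alpha):
    """The approximation criterion from the text."""
    return np.sum((W - alpha * B) ** 2)

rng = np.random.default_rng(42)
W = rng.normal(size=(64, 64))
B, alpha = optimal_binarization(W)
j_star = J(W, B, alpha)
# the closed-form alpha beats slightly perturbed alternatives
assert j_star <= J(W, B, alpha * 1.05)
assert j_star <= J(W, B, alpha * 0.95)
print(alpha, j_star)
```

For fixed B = sign(W), J(α) = Σ(|W_ij| - α)² is a quadratic in α, so α = mean(|W|) is its unique minimum, which is why the perturbation checks pass.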
Algorithm: the specific steps of training the single-layer unidirectional RNN network with binarized weights are as follows:
2) Binarized bidirectional RNN model
As shown in Fig. 3, in practical applications the more efficient bidirectional RNN model realizes its circulation loop through connections from the output unit to the hidden units. For the h unit h^(t) at time t, the state is jointly determined by the output-unit state o^(t-1) at time t-1 and the input-unit state x^(t) at time t. For the g unit g^(t) at time t, the state is jointly determined by the output-unit state o^(t+1) at time t+1 and the input-unit state x^(t) at time t. The state of the output unit o^(t) at time t is jointly determined by the bias vector c and the states h^(t) and g^(t).
Since there are two groups of hidden units with opposite directions, the weight matrices U, V, W from the input unit to the hidden units, from the hidden units to the output unit, and from the output unit to the hidden units also come in two groups: for the circuit containing the h unit they are denoted U_1, V_1, W_1, and for the circuit containing the g unit they are denoted U_2, V_2, W_2.
After binarization, U_1, V_1, W_1 and U_2, V_2, W_2 in the circuits of h and g are replaced by α_{U_1}B_{U_1}, α_{V_1}B_{V_1}, α_{W_1}B_{W_1} and α_{U_2}B_{U_2}, α_{V_2}B_{V_2}, α_{W_2}B_{W_2}, respectively. As shown in Fig. 3, the state transition formulas of the binarized bidirectional RNN model are the following six:

a_1^(t) = b + α_{W_1} B_{W_1} o^(t-1) + α_{U_1} B_{U_1} x^(t)
a_2^(t) = b + α_{W_2} B_{W_2} o^(t+1) + α_{U_2} B_{U_2} x^(t)
h^(t) = tanh(a_1^(t))
g^(t) = tanh(a_2^(t))
o^(t) = c + α_{V_1} B_{V_1} h^(t) + α_{V_2} B_{V_2} g^(t)
ŷ^(t) = softmax(o^(t))
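The bidirectional transition can be exercised in code, with one caveat: read literally, the output-unit recurrence makes h^(t) depend on o^(t-1), which in turn depends on g^(t-1) and hence on future outputs, a circular dependency. The sketch below therefore substitutes the standard hidden-to-hidden bidirectional recurrence (h feeds h forward in time, g feeds g backward) while keeping the binarized αB weights; this is a named substitution, and all names and shapes are illustrative assumptions.

```python
import numpy as np

def binarize(W):
    """W ≈ alpha * B with B in {-1,+1} (assumed sign binarization)."""
    B = np.where(W >= 0, 1.0, -1.0)
    return np.abs(W).mean(), B

def bidir_rnn(xs, p1, p2, pv, c):
    """Bidirectional RNN forward pass with binarized weights.

    p1/p2: ((aU, BU), (aW, BW), b) for the h (forward) and g (backward)
    circuits; pv: ((aV1, BV1), (aV2, BV2)) for the output projections.
    Recurrence is hidden-to-hidden, not through o as the patent writes it.
    """
    (aU1, BU1), (aW1, BW1), b1 = p1
    (aU2, BU2), (aW2, BW2), b2 = p2
    (aV1, BV1), (aV2, BV2) = pv
    T, n_hid = len(xs), BW1.shape[0]
    hs, gs = np.zeros((T, n_hid)), np.zeros((T, n_hid))
    h = np.zeros(n_hid)
    for t in range(T):                    # past -> future (h circuit)
        h = np.tanh(b1 + aW1 * (BW1 @ h) + aU1 * (BU1 @ xs[t]))
        hs[t] = h
    g = np.zeros(n_hid)
    for t in reversed(range(T)):          # future -> past (g circuit)
        g = np.tanh(b2 + aW2 * (BW2 @ g) + aU2 * (BU2 @ xs[t]))
        gs[t] = g
    os = c + aV1 * (hs @ BV1.T) + aV2 * (gs @ BV2.T)
    return os                             # (T, n_out)

rng = np.random.default_rng(1)
n_in, n_hid, n_out, T = 6, 8, 4, 5
p1 = (binarize(rng.normal(size=(n_hid, n_in))),
      binarize(rng.normal(size=(n_hid, n_hid))), np.zeros(n_hid))
p2 = (binarize(rng.normal(size=(n_hid, n_in))),
      binarize(rng.normal(size=(n_hid, n_hid))), np.zeros(n_hid))
pv = (binarize(rng.normal(size=(n_out, n_hid))),
      binarize(rng.normal(size=(n_out, n_hid))))
out = bidir_rnn(rng.normal(size=(T, n_in)), p1, p2, pv, np.zeros(n_out))
print(out.shape)  # (5, 4)
```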
3) Binarized bidirectional LSTM model
As shown in Fig. 4, the state transfer from the previous time step to the current one is called the h circuit, and the state transfer from the future time step to the current one is called the g circuit. The black squares denote a delay of one time step. Since the bidirectional LSTM has the two circuits h and g, the parameters of the h circuit carry the marker 1 and those of the g circuit carry the marker 2 to distinguish them notationally; for example, the input-gate gating parameter of the h circuit at time t is denoted G_i^1(t). In the bidirectional LSTM, the output of the h circuit and the output of the g circuit must be accumulated in the o unit, so o^(t) depends on both h^(t) and g^(t). The internal structure of the LSTM unit is shown in Fig. 5.
In the bidirectional LSTM, the circuits of h and g are binarized and approximated separately. For example, the binarized matrix of the U_i matrix in the h circuit is denoted B_{U_i^1}, and the corresponding optimal approximation factor is denoted α_{U_i^1}; the U, V, W matrices and their submatrices (U_i etc.) in the two circuits of h and g are denoted in the same manner. The approximation scheme is the same as for the single-layer unidirectional model, and the complete state transition formulas are as follows:

The h circuit:

G_i^1(t) = σ(b_i^1 + α_{U_i^1} B_{U_i^1} x^(t) + α_{W_i^1} B_{W_i^1} h^(t-1))
G_f^1(t) = σ(b_f^1 + α_{U_f^1} B_{U_f^1} x^(t) + α_{W_f^1} B_{W_f^1} h^(t-1))
G_o^1(t) = σ(b_o^1 + α_{U_o^1} B_{U_o^1} x^(t) + α_{W_o^1} B_{W_o^1} h^(t-1))
s^1(t) = G_f^1(t) ⊙ s^1(t-1) + G_i^1(t) ⊙ tanh(b_h^1 + α_{U_h^1} B_{U_h^1} x^(t) + α_{W_h^1} B_{W_h^1} h^(t-1))
h^(t) = G_o^1(t) ⊙ tanh(s^1(t))

The g circuit: the same five formulas with the marker 1 replaced by 2, h^(t-1) replaced by g^(t+1), and the cell state s^1 replaced by s^2, yielding g^(t) = G_o^2(t) ⊙ tanh(s^2(t)).

Output:

o^(t) = c + α_{V_1} B_{V_1} h^(t) + α_{V_2} B_{V_2} g^(t)
ŷ^(t) = softmax(o^(t))

where G_i, G_f and G_o denote the gating parameters of the input gate, forget gate and output gate, respectively; σ is the sigmoid activation function, an S-shaped function whose output lies between 0 and 1; ⊙ denotes element-wise multiplication; b and c are bias vectors; U_1, V_1, W_1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U_2, V_2, W_2 are the corresponding weight matrices for the g hidden unit. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and the corresponding optimal approximation factors α_U, α_V, α_W, where U = U_h ∪ U_i ∪ U_f ∪ U_o and V = V_h ∪ V_i ∪ V_f ∪ V_o. a^(t) denotes the superposition of the state inputs at time t, including the state o^(t-1) of the output unit at time t-1 and the state x^(t) of the input unit at time t. h^(t) and g^(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function. ŷ^(t) denotes the output vector of normalized posterior probabilities obtained by applying the softmax function.
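One forward step of the h circuit with binarized gate matrices can be sketched as follows. The recurrence here is taken through the hidden state h itself (the standard LSTM form) rather than through the output unit, and all shapes and parameter names are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def binarize(W):
    """W ≈ alpha * B with B in {-1,+1} (assumed sign binarization)."""
    B = np.where(W >= 0, 1.0, -1.0)
    return np.abs(W).mean(), B

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, P):
    """One h-circuit step of a binarized LSTM.

    P maps each gate name ("i", "f", "o") and the cell input ("h")
    to its ((aU, BU), (aW, BW), b) binarized parameters.
    """
    pre = {}
    for gate in ("i", "f", "o", "h"):
        (aU, BU), (aW, BW), b = P[gate]
        pre[gate] = b + aU * (BU @ x_t) + aW * (BW @ h_prev)
    Gi, Gf, Go = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
    s_t = Gf * s_prev + Gi * np.tanh(pre["h"])   # cell state s^(t)
    h_t = Go * np.tanh(s_t)                      # hidden state h^(t)
    return h_t, s_t

rng = np.random.default_rng(7)
n_in, n_hid = 6, 8
P = {g: (binarize(rng.normal(size=(n_hid, n_in))),
         binarize(rng.normal(size=(n_hid, n_hid))),
         np.zeros(n_hid)) for g in ("i", "f", "o", "h")}
h, s = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), P)
print(h.shape, np.all(np.abs(h) <= 1.0))  # (8,) True
```

Running the same step over the sequence backward with a second parameter set gives the g circuit, and the two hidden sequences are then combined at the output unit as in the formulas above.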
Algorithm: the specific steps of training the bidirectional RNN network with binarized weights are as follows:
The highly approximate binarized LSTM/RNN algorithm and implementation design adopted by the present invention greatly reduce the storage and computing overhead of a speech recognition application once the LSTM/RNN weights are binarized. In the bidirectional LSTM case described, storage is reduced by about 93%, and experiments show that the accuracy drop of the binarized LSTM/RNN model is usually no more than 1%.
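The roughly 93% figure can be sanity-checked with simple arithmetic: replacing 32-bit floating-point weights with 1 bit each saves 31/32 ≈ 96.9% on the weight matrices alone, and an overall figure near 93% follows if a few percent of the parameters (biases, approximation factors, and so on) stay in floating point. The 5% float share below is our illustrative assumption, not a breakdown given in the text.

```python
# Back-of-the-envelope storage saving from binarizing weight matrices.
def saving(n_weight, n_float_kept, bits_float=32):
    """Fraction of storage saved when n_weight entries go from 32-bit
    floats to 1 bit each, while n_float_kept parameters stay as floats."""
    before = (n_weight + n_float_kept) * bits_float
    after = n_weight * 1 + n_float_kept * bits_float
    return 1 - after / before

# weight matrices alone: 31/32 of the bits saved
print(round(saving(1_000_000, 0), 4))        # 0.9688
# with 5% of parameters kept in float32 (assumption), near the quoted 93%
print(round(saving(1_000_000, 50_000), 4))   # 0.9226
```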
Finally, it is noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from the object and scope of the technical solution, and all such modifications should be covered by the scope of the claims of the present invention.

Claims (8)

1. A speech recognition method based on binarized recurrent neural networks, characterized in that the method comprises the following steps:
S1: receive voice input and vectorize the audio;
S2: construct a binarized recurrent neural network model, and decode and encode the vectorized audio;
S3: output the encoded audio, i.e. output the audio as text.
2. The speech recognition method based on binarized recurrent neural networks according to claim 1, characterized in that, in step S2, the binarized recurrent neural network models include the following types: a binarized single-layer unidirectional recurrent neural network (RNN) model, a binarized bidirectional RNN model, and a binarized bidirectional long short-term memory (LSTM) model.
3. The speech recognition method based on binarized recurrent neural networks according to claim 2, characterized in that the binarized single-layer unidirectional RNN model comprises: an input unit, a hidden unit and an output unit.
4. The speech recognition method based on binarized recurrent neural networks according to claim 3, characterized in that the state transition formulas of the binarized single-layer unidirectional RNN model are:

a^(t) = b + α_W B_W o^(t-1) + α_U B_U x^(t)
h^(t) = tanh(a^(t))
o^(t) = c + α_V B_V h^(t)
ŷ^(t) = softmax(o^(t))

where b and c are bias vectors; U, V and W are the weight matrices of the connections from the input unit to the hidden unit, from the hidden unit to the output unit, and from the output unit to the hidden unit, respectively; after binarization the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W; a^(t) denotes the superposition of the state inputs at time t, which include the state o^(t-1) of the output unit at time t-1 and the state x^(t) of the input unit at time t; h^(t) denotes the state of the hidden unit at time t, and tanh is the hyperbolic tangent function; ŷ^(t) denotes the output vector of normalized posterior probabilities obtained by applying the softmax function.
5. The speech recognition method based on binarized recurrent neural networks according to claim 2, characterized in that the binarized bidirectional RNN model comprises: an input unit, hidden units and an output unit; the hidden units consist of an h hidden unit and a g hidden unit running in opposite directions; the h hidden unit represents the circuit that obtains information from states formed by past inputs, and the g hidden unit represents the circuit that obtains information from states formed by future inputs.
6. The speech recognition method based on binarized recurrent neural networks according to claim 5, characterized in that the state transition formulas of the binarized bidirectional RNN model are:

a_1^(t) = b + α_{W_1} B_{W_1} o^(t-1) + α_{U_1} B_{U_1} x^(t)
a_2^(t) = b + α_{W_2} B_{W_2} o^(t+1) + α_{U_2} B_{U_2} x^(t)
h^(t) = tanh(a_1^(t))
g^(t) = tanh(a_2^(t))
o^(t) = c + α_{V_1} B_{V_1} h^(t) + α_{V_2} B_{V_2} g^(t)
ŷ^(t) = softmax(o^(t))

where b and c are bias vectors; U_1, V_1, W_1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U_2, V_2, W_2 are the corresponding weight matrices for the g hidden unit; after binarization the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W; a^(t) denotes the superposition of the state inputs at time t, which for the h circuit include the state o^(t-1) of the output unit at time t-1 and the state x^(t) of the input unit at time t; h^(t) and g^(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function; ŷ^(t) denotes the output vector of normalized posterior probabilities obtained by applying the softmax function.
7. The speech recognition method based on binarized recurrent neural networks according to claim 2, characterized in that the binarized bidirectional LSTM model comprises: an input unit, hidden units and an output unit; the hidden units consist of an h hidden unit and a g hidden unit running in opposite directions; the h hidden unit represents the circuit that obtains information from states formed by past inputs, and the g hidden unit represents the circuit that obtains information from states formed by future inputs; an input gate G_i, a forget gate G_f and an output gate G_o are provided in each of the h and g hidden units.
8. The speech recognition method based on binarized recurrent neural networks according to claim 7, characterized in that the state transition formulas of the binarized bidirectional LSTM model are:

The h circuit, i.e. the state transfer from the previous time step to the current one:

G_i^1(t) = σ(b_i^1 + α_{U_i^1} B_{U_i^1} x^(t) + α_{W_i^1} B_{W_i^1} h^(t-1))
G_f^1(t) = σ(b_f^1 + α_{U_f^1} B_{U_f^1} x^(t) + α_{W_f^1} B_{W_f^1} h^(t-1))
G_o^1(t) = σ(b_o^1 + α_{U_o^1} B_{U_o^1} x^(t) + α_{W_o^1} B_{W_o^1} h^(t-1))
s^1(t) = G_f^1(t) ⊙ s^1(t-1) + G_i^1(t) ⊙ tanh(b_h^1 + α_{U_h^1} B_{U_h^1} x^(t) + α_{W_h^1} B_{W_h^1} h^(t-1))
h^(t) = G_o^1(t) ⊙ tanh(s^1(t))

The g circuit, the state transfer from the future time step to the current one, is given by the same five formulas with the marker 1 replaced by 2, h^(t-1) replaced by g^(t+1), and the cell state s^1 replaced by s^2, yielding g^(t) = G_o^2(t) ⊙ tanh(s^2(t)).

Output:

o^(t) = c + α_{V_1} B_{V_1} h^(t) + α_{V_2} B_{V_2} g^(t)
ŷ^(t) = softmax(o^(t))

where the parameters of the h circuit carry the marker 1 and those of the g circuit carry the marker 2; G_i, G_f and G_o denote the gating parameters of the input gate, forget gate and output gate, respectively; σ is the sigmoid activation function; ⊙ denotes element-wise multiplication; b and c are bias vectors; U_1, V_1, W_1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U_2, V_2, W_2 are the corresponding weight matrices for the g hidden unit; after binarization the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W, where U = U_h ∪ U_i ∪ U_f ∪ U_o and V = V_h ∪ V_i ∪ V_f ∪ V_o; a^(t) denotes the superposition of the state inputs at time t, including the state o^(t-1) of the output unit at time t-1 and the state x^(t) of the input unit at time t; h^(t) and g^(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function; ŷ^(t) denotes the output vector of normalized posterior probabilities obtained by applying the softmax function.
CN201910310341.1A 2019-04-17 2019-04-17 A kind of audio recognition method based on binaryzation recurrent neural network Pending CN110033766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910310341.1A CN110033766A (en) 2019-04-17 2019-04-17 A kind of audio recognition method based on binaryzation recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910310341.1A CN110033766A (en) 2019-04-17 2019-04-17 A kind of audio recognition method based on binaryzation recurrent neural network

Publications (1)

Publication Number Publication Date
CN110033766A (en) 2019-07-19

Family

ID=67238818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910310341.1A Pending CN110033766A (en) 2019-04-17 2019-04-17 A kind of audio recognition method based on binaryzation recurrent neural network

Country Status (1)

Country Link
CN (1) CN110033766A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674265A (en) * 2019-08-06 2020-01-10 上海孚典智能科技有限公司 Unstructured information oriented feature discrimination and information recommendation system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106816147A (en) * 2017-01-25 2017-06-09 上海交通大学 Speech recognition system based on binary neural network acoustic model
CN107293291A (en) * 2016-03-30 2017-10-24 中国科学院声学研究所 A kind of audio recognition method end to end based on autoadapted learning rate
CN108765506A (en) * 2018-05-21 2018-11-06 上海交通大学 Compression method based on successively network binaryzation
CN109086866A (en) * 2018-07-02 2018-12-25 重庆大学 A kind of part two-value convolution method suitable for embedded device
US10192327B1 (en) * 2016-02-04 2019-01-29 Google Llc Image compression with recurrent neural networks


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GDTOP818: "[ICML14] Towards End-to-End Speech Recognition with Recurrent Neural Networks", CSDN blog, https://blog.csdn.net/weixin_37993251/article/details/88129893 *


Similar Documents

Publication Publication Date Title
CN108171701B (en) Significance detection method based on U network and counterstudy
CN109543838B (en) Image increment learning method based on variational self-encoder
CN108280233A (en) A kind of VideoGIS data retrieval method based on deep learning
CN110851760B (en) Human-computer interaction system for integrating visual question answering in web3D environment
CN110533570A (en) A kind of general steganography method based on deep learning
CN111581383A (en) Chinese text classification method based on ERNIE-BiGRU
CN111931814A (en) Unsupervised anti-domain adaptation method based on intra-class structure compactness constraint
CN112733965A (en) Label-free image classification method based on small sample learning
CN114565808B (en) Double-action contrast learning method for unsupervised visual representation
CN110472255A (en) Neural network machine interpretation method, model, electric terminal and storage medium
CN110263164A (en) A kind of Sentiment orientation analysis method based on Model Fusion
CN114581341A (en) Image style migration method and system based on deep learning
CN116935292B (en) Short video scene classification method and system based on self-attention model
CN110033766A (en) A kind of audio recognition method based on binaryzation recurrent neural network
CN112417118A (en) Dialog generation method based on marked text and neural network
Li et al. Towards communication-efficient digital twin via ai-powered transmission and reconstruction
LU503098B1 (en) A method and system for fused subspace clustering based on graph autoencoder
CN114333069B (en) Object posture processing method, device, equipment and storage medium
CN116245106A (en) Cross-domain named entity identification method based on autoregressive model
CN115270917A (en) Two-stage processing multi-mode garment image generation method
Cheng et al. Solving monocular sensors depth prediction using MLP-based architecture and multi-scale inverse attention
Babaheidarian et al. Decode and transfer: A new steganalysis technique via conditional generative adversarial networks
CN112598065A (en) Memory-based gated convolutional neural network semantic processing system and method
Kouvakis et al. Semantic communications for image-based sign language transmission
Yuan et al. A review on generative adversarial networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190719