CN110033766A - Speech recognition method based on a binarized recurrent neural network - Google Patents
Speech recognition method based on a binarized recurrent neural network Download PDF Info
- Publication number
- CN110033766A (application CN201910310341.1A)
- Authority
- CN
- China
- Prior art keywords
- unit
- binaryzation
- state
- hidden
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The present invention relates to a speech recognition method based on a binarized recurrent neural network, belonging to the field of artificial intelligence. The method comprises: S1: receiving the speech input and vectorizing the audio; S2: constructing a binarized recurrent neural network model, and decoding and encoding the vectorized audio; S3: outputting the encoded result, i.e. the text corresponding to the speech. The binarized recurrent neural network model includes network structures such as a single-layer unidirectional binarized RNN model, a bidirectional binarized RNN model, and a bidirectional binarized LSTM model. The invention preserves the speed and energy-consumption advantages of binarized networks while improving model accuracy, making it feasible to run well-performing LSTM/RNN models for natural language processing and speech processing applications on embedded devices.
Description
Technical field
The invention belongs to the field of artificial intelligence and relates to a speech recognition method based on a binarized recurrent neural network.
Background technique
Classical LSTM/RNN models achieve good performance in sequence-processing applications such as named-entity recognition in natural language processing and speech-to-text conversion. However, because the weight matrices and gate parameters of an LSTM/RNN model are real-valued, a classical implementation consumes large amounts of computing and storage resources. It is therefore difficult to deploy on resource-constrained embedded devices, or only simpler LSTM/RNN models with poorer performance can be realized on such devices.
At present, the LSTM/RNN models commonly used in applications such as natural language processing and speech recognition are complex and are usually deployed on servers with abundant computing and storage capacity. As users pay increasing attention to privacy, however, they increasingly prefer to run LSTM/RNN models offline on the device. The traditional server-side approach carries a risk of privacy leakage and is limited by network bandwidth; in application scenarios with high real-time requirements, a server-side implementation finds it harder to guarantee prompt responses than an implementation on an embedded device.
Common binarization methods directly quantize the floating-point weights on either side of a given threshold to 0 and 1. Although this significantly reduces memory and computation overhead, the loss of precision is large.
Summary of the invention
In view of this, the purpose of the present invention is to provide a speech recognition method based on a binarized recurrent neural network that preserves the speed and energy-consumption advantages of binarized networks while improving model accuracy, making it feasible to run well-performing LSTM/RNN models for natural language processing and speech processing applications on embedded devices.
To achieve the above objectives, the invention provides the following technical solution:
A speech recognition method based on a binarized recurrent neural network, comprising the following steps:
S1: receive the speech input and vectorize the audio;
S2: construct a binarized recurrent neural network model, and decode and encode the vectorized audio;
S3: output the encoded result, i.e. the text corresponding to the speech.
Further, in step S2, the binarized recurrent neural network model includes the following types: a single-layer unidirectional binarized recurrent neural network (RNN) model, a bidirectional binarized RNN model, and a bidirectional binarized long short-term memory (LSTM) model.
Further, the single-layer unidirectional binarized RNN model comprises an input unit, a hidden unit, and an output unit.
Further, the state-transfer formulas of the single-layer unidirectional binarized RNN model are:
a(t) = b + α_W (B_W o(t-1)) + α_U (B_U x(t))
h(t) = tanh(a(t))
o(t) = c + α_V (B_V h(t))
ŷ(t) = softmax(o(t))
where b and c are bias vectors; U, V, and W are the weight matrices of the connections from the input unit to the hidden unit, from the hidden unit to the output unit, and from the output unit to the hidden unit, respectively. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W. a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) denotes the hidden-unit state at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
Further, the bidirectional binarized RNN model comprises an input unit, hidden units, and an output unit; the hidden units consist of an h hidden unit and a g hidden unit operating in opposite directions. The h hidden unit represents the circuit that obtains information from states formed by past inputs; the g hidden unit represents the circuit that obtains information from states formed by future inputs.
Further, the state-transfer formulas of the bidirectional binarized RNN model are:
a1(t) = b1 + α_W1 (B_W1 o(t-1)) + α_U1 (B_U1 x(t))
a2(t) = b2 + α_W2 (B_W2 o(t+1)) + α_U2 (B_U2 x(t))
h(t) = tanh(a1(t))
g(t) = tanh(a2(t))
o(t) = c + α_V1 (B_V1 h(t)) + α_V2 (B_V2 g(t))
ŷ(t) = softmax(o(t))
where b1, b2 and c are bias vectors; U1, V1, W1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U2, V2, W2 are the weight matrices of the connections from the input unit to the g hidden unit, from the g hidden unit to the output unit, and from the output unit to the g hidden unit, respectively. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W. a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) and g(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
Further, the bidirectional binarized LSTM model comprises an input unit, hidden units, and an output unit; the hidden units consist of an h hidden unit and a g hidden unit operating in opposite directions. The h hidden unit represents the circuit that obtains information from states formed by past inputs; the g hidden unit represents the circuit that obtains information from states formed by future inputs. Each of the h and g hidden units is provided with an input gate G_i, a forget gate G_f, and an output gate G_o.
Further, the state-transfer formulas of the bidirectional binarized LSTM model are:
h circuit, i.e. the state transfer from the previous time to the current time:
G_i1(t) = σ(b_i1 + α_Wi1 (B_Wi1 o(t-1)) + α_Ui1 (B_Ui1 x(t)))
G_f1(t) = σ(b_f1 + α_Wf1 (B_Wf1 o(t-1)) + α_Uf1 (B_Uf1 x(t)))
G_o1(t) = σ(b_o1 + α_Wo1 (B_Wo1 o(t-1)) + α_Uo1 (B_Uo1 x(t)))
s1(t) = G_f1(t) s1(t-1) + G_i1(t) tanh(b1 + α_Wh1 (B_Wh1 o(t-1)) + α_Uh1 (B_Uh1 x(t)))
h(t) = G_o1(t) tanh(s1(t))
g circuit, the state transfer from the future time to the current time: the same formulas with subscript 2 and with o(t+1) in place of o(t-1), yielding the cell state s2(t) and the hidden state g(t).
Output:
o(t) = c + α_V1 (B_V1 h(t)) + α_V2 (B_V2 g(t))
ŷ(t) = softmax(o(t))
Here the parameters of the h circuit carry subscript 1 and those of the g circuit carry subscript 2. G_i, G_f, G_o denote the gating parameters of the input gate, forget gate, and output gate, respectively; σ is the sigmoid activation function, an S-shaped function whose output lies between 0 and 1; b and c are bias vectors. U1, V1, W1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U2, V2, W2 are the weight matrices of the connections from the input unit to the g hidden unit, from the g hidden unit to the output unit, and from the output unit to the g hidden unit, respectively. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W, where U = U_h ∪ U_i ∪ U_f ∪ U_o and V = V_h ∪ V_i ∪ V_f ∪ V_o. a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) and g(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
The beneficial effects of the present invention are: after the LSTM/RNN weights are binarized, the storage and computation overhead of speech recognition applications is drastically reduced. In the bidirectional LSTM case described here, storage is reduced by about 93%, and experiments show that the accuracy of the LSTM/RNN model typically declines by no more than 1% after binarization.
Compared with traditional LSTM/RNN models, the present invention can be effectively deployed in embedded speech recognition scenarios that are highly sensitive to storage and computation resources; it runs with higher performance on low-power embedded devices and avoids the privacy concerns of the traditional approach of uploading speech to a cloud server.
Other advantages, objectives, and features of the invention will be set forth to some extent in the following description and, to some extent, will be apparent to those skilled in the art upon examination of what follows, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following specification.
Brief description of the drawings
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings, in which:
Fig. 1 is the processing flow diagram of the speech recognition system of the present invention;
Fig. 2 is a schematic structural diagram of the single-layer unidirectional binarized RNN model;
Fig. 3 is a schematic structural diagram of the bidirectional binarized RNN model;
Fig. 4 is a schematic structural diagram of the bidirectional binarized LSTM model;
Fig. 5 is a schematic diagram of the internal structure of the gating system.
Specific embodiment
The embodiments of the present invention are described below by way of specific examples; those skilled in the art can easily understand other advantages and effects of the invention from the contents disclosed in this specification. The invention can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or altered in various ways based on different viewpoints and applications without departing from the spirit of the invention. It should be noted that the diagrams provided in the following embodiments only illustrate the basic concept of the invention schematically, and the features of the following embodiments can be combined with each other provided there is no conflict.
Referring to Figs. 1 to 5, Fig. 1 is the flow chart of the speech recognition method based on a binarized recurrent neural network according to the present invention. The speech input module uses a microphone to acquire the analog audio signal, which is sampled and quantized into a digital signal. The audio vectorization module filters the audio and vectorizes it by means of entity embedding. The signal then passes through an encoder-decoder model built from a pair of bidirectional binarized recurrent neural networks, in which the encoder extracts the hidden feature representation of the audio signal and the decoder maps the hidden feature representation to the corresponding text. Finally, the text output module delivers the text converted from the speech to the human-computer interface.
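The patent does not specify the vectorization scheme beyond filtering and entity embedding. As a rough illustration only (the function name, frame length, and hop size below are assumptions, not from the patent), raw audio samples can be split into overlapping windowed frames to form the vector sequence fed to the recurrent network:

```python
import numpy as np

def frame_audio(signal, frame_len=400, hop=160):
    """Split a sampled audio signal into overlapping, Hann-windowed frames.

    One simple way to turn raw audio into a vector sequence; the patent
    itself does not prescribe this scheme.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return frames  # shape: (n_frames, frame_len)

# 1 second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
frames = frame_audio(np.sin(2 * np.pi * 440 * t))
print(frames.shape)  # (98, 400)
```

Each row of `frames` would then play the role of the input-unit state x(t) at one time step.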
The binarized recurrent neural network models used in this embodiment mainly include the following types: the single-layer unidirectional binarized RNN model, the bidirectional binarized RNN model, and the bidirectional binarized LSTM model.
1) Single-layer unidirectional binarized RNN model
As shown in Fig. 2, this model applies an efficient binarization method to a single-layer unidirectional RNN. The difference between the single-layer unidirectional binarized RNN and an ordinary single-layer unidirectional RNN is that each floating-point weight matrix is replaced by the product of a binarized weight matrix B (i.e., integer data of various bit widths) and an optimal approximation factor α. As shown in Fig. 2, the weights on the data-flow paths of the single-layer unidirectional binarized RNN are denoted αB. The corresponding state-transfer formulas are the following four:
a(t) = b + α_W (B_W o(t-1)) + α_U (B_U x(t))
h(t) = tanh(a(t))
o(t) = c + α_V (B_V h(t))
ŷ(t) = softmax(o(t))
where b and c are bias vectors; U, V, and W are the weight matrices of the connections from the input unit to the hidden unit, from the hidden unit to the output unit, and from the output unit to the hidden unit, respectively. a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t. The superimposed state is then passed through an activation function, for example the hyperbolic tangent tanh, to obtain the hidden-unit state h(t) at time t. The hidden-unit state, transformed by the V matrix and added to the bias vector c, gives the output-unit state o(t). Finally, applying the softmax function yields the output vector ŷ(t) of normalized posterior probabilities.
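The four state-transfer formulas above can be sketched in NumPy. This is an illustrative reading of the formulas, not code from the patent; the helper names, dimensions, and random seed are assumptions:

```python
import numpy as np

def binarize(W):
    """B = sign(W), alpha = mean(|W|): minimizes ||W - alpha*B||^2."""
    return np.where(W >= 0, 1.0, -1.0), np.abs(W).mean()

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def rnn_step(x_t, o_prev, params):
    """One step of the binarized single-layer unidirectional RNN.

    Implements:
      a(t) = b + alpha_W * (B_W @ o(t-1)) + alpha_U * (B_U @ x(t))
      h(t) = tanh(a(t))
      o(t) = c + alpha_V * (B_V @ h(t))
      y(t) = softmax(o(t))
    """
    b, c, (B_U, a_U), (B_V, a_V), (B_W, a_W) = params
    a_t = b + a_W * (B_W @ o_prev) + a_U * (B_U @ x_t)
    h_t = np.tanh(a_t)
    o_t = c + a_V * (B_V @ h_t)
    return o_t, softmax(o_t)

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 8, 16, 5
U = rng.standard_normal((n_hid, n_in))   # input -> hidden
V = rng.standard_normal((n_out, n_hid))  # hidden -> output
W = rng.standard_normal((n_hid, n_out))  # output -> hidden (recurrent)
params = (np.zeros(n_hid), np.zeros(n_out),
          binarize(U), binarize(V), binarize(W))

o_t, y_t = rnn_step(rng.standard_normal(n_in), np.zeros(n_out), params)
print(y_t.sum())  # ≈ 1.0
```

Note that each matrix product uses only a {-1, +1} matrix plus one scalar multiply, which is what makes the binarized model cheap on embedded hardware.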
To reduce storage and computation overhead while avoiding loss of model accuracy as far as possible, each original floating-point weight matrix is approximated by the product of a binarized weight matrix B and an optimal approximation factor α. After the original floating-point matrices U, V, W are binarized, the three binarized weight matrices B_U, B_V, B_W and the three corresponding optimal approximation factors α_U, α_V, α_W are obtained. The degree of approximation of B and α, which also serves as the evaluation criterion for the binarization quality of binarized RNNs, is measured by:
J(B_W, α_W) = ‖W − α_W B_W‖²
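The minimizer of J(B_W, α_W) over B ∈ {−1, +1} and α has a known closed form: B = sign(W) and α = mean(|W|). A small NumPy sketch (illustrative, not from the patent) of this binarization step:

```python
import numpy as np

def binarize(W):
    """Binarize a floating-point weight matrix W.

    Minimizes J(B, alpha) = ||W - alpha*B||^2 over B in {-1, +1} and
    alpha; the closed-form solution is B = sign(W) and
    alpha = mean(|W|) (the L1 norm of W divided by its element count).
    """
    B = np.where(W >= 0, 1.0, -1.0)  # binarized weight matrix
    alpha = np.abs(W).mean()         # optimal approximation factor
    return B, alpha

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
B, alpha = binarize(W)

# J at the optimal alpha is never worse than at any other alpha.
J_opt = np.linalg.norm(W - alpha * B) ** 2
J_other = np.linalg.norm(W - 0.5 * B) ** 2
print(J_opt <= J_other)  # True
```

For a fixed B = sign(W), J(α) = Σ(|W_ij| − α)² is a quadratic in α whose minimum is the mean of |W|, which is why the approximation factor takes this simple form.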
Algorithm: the specific steps for training the single-layer unidirectional RNN with binarized weights are as follows:
2) Bidirectional binarized RNN model
As shown in Fig. 3, in practical applications a more effective bidirectional RNN model realizes the recurrent loop through connections from the output unit to the hidden units. The state of the h unit at time t, h(t), is determined jointly by the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t. The state of the g unit at time t, g(t), is determined jointly by the output-unit state o(t+1) at time t+1 and the input-unit state x(t) at time t. The state of the output unit o(t) is determined jointly by the bias vector c and the states h(t) and g(t).
Because there are two groups of hidden units running in opposite directions, the weight matrices U, V, W, for the connections from the input unit to the hidden units, from the hidden units to the output unit, and from the output unit to the hidden units, also come in two groups: for the circuit containing the h unit they are denoted U1, V1, W1, and for the circuit containing the g unit they are denoted U2, V2, W2.
After binarization, U1, V1, W1 and U2, V2, W2 on the circuits of h and g are replaced by α_U1 B_U1, α_V1 B_V1, α_W1 B_W1 and α_U2 B_U2, α_V2 B_V2, α_W2 B_W2, respectively. As shown in Fig. 3, the state-transfer formulas of the bidirectional binarized RNN model are the following six:
a1(t) = b1 + α_W1 (B_W1 o(t-1)) + α_U1 (B_U1 x(t))
a2(t) = b2 + α_W2 (B_W2 o(t+1)) + α_U2 (B_U2 x(t))
h(t) = tanh(a1(t))
g(t) = tanh(a2(t))
o(t) = c + α_V1 (B_V1 h(t)) + α_V2 (B_V2 g(t))
ŷ(t) = softmax(o(t))
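A NumPy sketch of a bidirectional binarized RNN forward pass follows. It is illustrative only, and makes one simplification relative to the text: to keep the pass computable in two sweeps, each circuit here feeds back its own hidden state rather than the output unit's state; everything else (binarized matrices scaled by α, tanh, combined output) follows the formulas above:

```python
import numpy as np

def binarize(W):
    """B = sign(W), alpha = mean(|W|)."""
    return np.where(W >= 0, 1.0, -1.0), np.abs(W).mean()

def birnn_forward(xs, p):
    """Bidirectional binarized RNN over a sequence.

    The h circuit runs past -> future, the g circuit future -> past;
    the output at each step combines both hidden states.
    """
    (B_U1, aU1), (B_W1, aW1), (B_U2, aU2), (B_W2, aW2), \
        (B_V1, aV1), (B_V2, aV2), b1, b2, c = p
    T, n_hid = len(xs), B_W1.shape[0]
    hs, gs = np.zeros((T, n_hid)), np.zeros((T, n_hid))
    h = np.zeros(n_hid)
    for t in range(T):                 # past -> future
        h = np.tanh(b1 + aW1 * (B_W1 @ h) + aU1 * (B_U1 @ xs[t]))
        hs[t] = h
    g = np.zeros(n_hid)
    for t in reversed(range(T)):       # future -> past
        g = np.tanh(b2 + aW2 * (B_W2 @ g) + aU2 * (B_U2 @ xs[t]))
        gs[t] = g
    return np.array([c + aV1 * (B_V1 @ hs[t]) + aV2 * (B_V2 @ gs[t])
                     for t in range(T)])

rng = np.random.default_rng(2)
n_in, n_hid, n_out, T = 6, 10, 4, 5
p = (binarize(rng.standard_normal((n_hid, n_in))),
     binarize(rng.standard_normal((n_hid, n_hid))),
     binarize(rng.standard_normal((n_hid, n_in))),
     binarize(rng.standard_normal((n_hid, n_hid))),
     binarize(rng.standard_normal((n_out, n_hid))),
     binarize(rng.standard_normal((n_out, n_hid))),
     np.zeros(n_hid), np.zeros(n_hid), np.zeros(n_out))
outs = birnn_forward(rng.standard_normal((T, n_in)), p)
print(outs.shape)  # (5, 4)
```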
3) Bidirectional binarized LSTM model
As shown in Fig. 4, the state transfer from the previous time to the current time is referred to as the h circuit, and the state transfer from the future time to the current time is referred to as the g circuit. The black squares denote a delay of one time step. Since the bidirectional LSTM has the two circuits h and g, the parameters of the h circuit carry subscript 1 and those of the g circuit carry subscript 2 to distinguish them symbolically; for example, the input-gate gating parameter of the h circuit at time t is denoted G_i1(t). In the bidirectional LSTM, the output of the h circuit and the output of the g circuit are accumulated in the o unit. The internal structure of the LSTM unit is shown in Fig. 5.
In the bidirectional LSTM, the circuits containing h and g are binarized and approximated separately. For example, the binarized matrix of the U_i matrix in the h circuit is denoted B_Ui1, with corresponding optimal approximation factor α_Ui1. Likewise, the U, V, W matrices in the two circuits of h and g, and their sub-matrices such as U_i, are denoted in the same manner. The approximate substitution is performed in the same way as for the single-layer unidirectional case, and the complete state-transfer formulas are as follows:
h circuit:
G_i1(t) = σ(b_i1 + α_Wi1 (B_Wi1 o(t-1)) + α_Ui1 (B_Ui1 x(t)))
G_f1(t) = σ(b_f1 + α_Wf1 (B_Wf1 o(t-1)) + α_Uf1 (B_Uf1 x(t)))
G_o1(t) = σ(b_o1 + α_Wo1 (B_Wo1 o(t-1)) + α_Uo1 (B_Uo1 x(t)))
s1(t) = G_f1(t) s1(t-1) + G_i1(t) tanh(b1 + α_Wh1 (B_Wh1 o(t-1)) + α_Uh1 (B_Uh1 x(t)))
h(t) = G_o1(t) tanh(s1(t))
g circuit: the same formulas with subscript 2 and with o(t+1) in place of o(t-1), yielding the cell state s2(t) and the hidden state g(t).
Output:
o(t) = c + α_V1 (B_V1 h(t)) + α_V2 (B_V2 g(t))
ŷ(t) = softmax(o(t))
Here G_i, G_f, G_o denote the gating parameters of the input gate, forget gate, and output gate, respectively; σ is the sigmoid activation function, an S-shaped function whose output lies between 0 and 1; b and c are bias vectors. U1, V1, W1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U2, V2, W2 are the weight matrices of the connections from the input unit to the g hidden unit, from the g hidden unit to the output unit, and from the output unit to the g hidden unit, respectively. After binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W, where U = U_h ∪ U_i ∪ U_f ∪ U_o and V = V_h ∪ V_i ∪ V_f ∪ V_o. a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) and g(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
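A single step of one LSTM circuit with binarized weights might look as follows in NumPy. This is an illustrative sketch under the standard LSTM gating equations, with all names and shapes assumed rather than taken from the patent:

```python
import numpy as np

def binarize(W):
    """B = sign(W), alpha = mean(|W|)."""
    return np.where(W >= 0, 1.0, -1.0), np.abs(W).mean()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def blstm_step(x_t, h_prev, s_prev, gates):
    """One binarized LSTM step for a single circuit (h or g).

    `gates` maps each name ('i', 'f', 'o', 'h') to its binarized
    input matrix, binarized recurrent matrix, and bias; every matrix
    product uses a binary matrix scaled by its approximation factor.
    """
    def pre(name):
        (B_U, aU), (B_W, aW), b = gates[name]
        return b + aW * (B_W @ h_prev) + aU * (B_U @ x_t)
    G_i = sigmoid(pre('i'))                         # input gate
    G_f = sigmoid(pre('f'))                         # forget gate
    G_o = sigmoid(pre('o'))                         # output gate
    s_t = G_f * s_prev + G_i * np.tanh(pre('h'))    # cell (long-term) state
    h_t = G_o * np.tanh(s_t)                        # hidden (short-term) state
    return h_t, s_t

rng = np.random.default_rng(3)
n_in, n_hid = 6, 8
gates = {k: (binarize(rng.standard_normal((n_hid, n_in))),
             binarize(rng.standard_normal((n_hid, n_hid))),
             np.zeros(n_hid)) for k in 'ifoh'}
h, s = blstm_step(rng.standard_normal(n_in),
                  np.zeros(n_hid), np.zeros(n_hid), gates)
print(h.shape, (np.abs(h) <= 1).all())  # (8,) True
```

Running this step forward over the sequence gives the h circuit; running it backward over the reversed sequence gives the g circuit, after which the two hidden states are combined at the output unit.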
Algorithm: the specific steps for training the bidirectional RNN network with binarized weights are as follows:
The highly approximate binarized LSTM/RNN algorithm and implementation adopted by the present invention greatly reduce the storage and computation overhead of speech recognition applications after the LSTM/RNN weights are binarized. In the bidirectional LSTM case described here, storage is reduced by about 93%, and experiments show that the accuracy of the LSTM/RNN model typically declines by no more than 1% after binarization.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from the purpose and scope of the technical solution, and all such modifications shall fall within the scope of the claims of the present invention.
Claims (8)
1. A speech recognition method based on a binarized recurrent neural network, characterized in that the method comprises the following steps:
S1: receive the speech input and vectorize the audio;
S2: construct a binarized recurrent neural network model, and decode and encode the vectorized audio;
S3: output the encoded result, i.e. the text corresponding to the speech.
2. The speech recognition method based on a binarized recurrent neural network according to claim 1, characterized in that in step S2 the binarized recurrent neural network model includes the following types: a single-layer unidirectional binarized recurrent neural network (RNN) model, a bidirectional binarized RNN model, and a bidirectional binarized long short-term memory (LSTM) model.
3. The speech recognition method based on a binarized recurrent neural network according to claim 2, characterized in that the single-layer unidirectional binarized RNN model comprises an input unit, a hidden unit, and an output unit.
4. The speech recognition method based on a binarized recurrent neural network according to claim 3, characterized in that the state-transfer formulas of the single-layer unidirectional binarized RNN model are:
a(t) = b + α_W (B_W o(t-1)) + α_U (B_U x(t))
h(t) = tanh(a(t))
o(t) = c + α_V (B_V h(t))
ŷ(t) = softmax(o(t))
wherein b and c are bias vectors; U, V, W are the weight matrices of the connections from the input unit to the hidden unit, from the hidden unit to the output unit, and from the output unit to the hidden unit, respectively; after binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W; a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) denotes the hidden-unit state at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
5. The speech recognition method based on a binarized recurrent neural network according to claim 2, characterized in that the bidirectional binarized RNN model comprises an input unit, hidden units, and an output unit; the hidden units consist of an h hidden unit and a g hidden unit operating in opposite directions; the h hidden unit represents the circuit that obtains information from states formed by past inputs, and the g hidden unit represents the circuit that obtains information from states formed by future inputs.
6. The speech recognition method based on a binarized recurrent neural network according to claim 5, characterized in that the state-transfer formulas of the bidirectional binarized RNN model are:
a1(t) = b1 + α_W1 (B_W1 o(t-1)) + α_U1 (B_U1 x(t))
a2(t) = b2 + α_W2 (B_W2 o(t+1)) + α_U2 (B_U2 x(t))
h(t) = tanh(a1(t))
g(t) = tanh(a2(t))
o(t) = c + α_V1 (B_V1 h(t)) + α_V2 (B_V2 g(t))
ŷ(t) = softmax(o(t))
wherein b1, b2 and c are bias vectors; U1, V1, W1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U2, V2, W2 are the weight matrices of the connections from the input unit to the g hidden unit, from the g hidden unit to the output unit, and from the output unit to the g hidden unit, respectively; after binarization, the floating-point matrices U, V, W yield the corresponding binarized weight matrices B_U, B_V, B_W and optimal approximation factors α_U, α_V, α_W; a(t) denotes the superposition of the state inputs at time t, the state inputs comprising the output-unit state o(t-1) at time t-1 and the input-unit state x(t) at time t; h(t) and g(t) denote the states of the h and g hidden units at time t, and tanh is the hyperbolic tangent function; ŷ(t) is the output vector of normalized posterior probabilities obtained by applying the softmax function.
7. The speech recognition method based on a binarized recurrent neural network according to claim 2, characterized in that the bidirectional binarized LSTM model comprises an input unit, hidden units, and an output unit; the hidden units consist of an h hidden unit and a g hidden unit operating in opposite directions; the h hidden unit represents the circuit that obtains information from states formed by past inputs, and the g hidden unit represents the circuit that obtains information from states formed by future inputs; each of the h and g hidden units is provided with an input gate G_i, a forget gate G_f, and an output gate G_o.
8. a kind of audio recognition method based on binaryzation recurrent neural network according to claim 7, which is characterized in that
The state of the two-way LSTM model of binaryzation shifts formula are as follows:
The circuit h, i.e., the state transfer from last time to current time:
The circuit g, the state transfer from future time instance to current time:
Output:
where the parameters of the h circuit carry the subscript 1 and the parameters of the g circuit carry the subscript 2; G_i, G_f and G_o denote the gating parameters of the input gate, forget gate and output gate, respectively; σ is the sigmoid activation function; b and c are bias vectors; U_1, V_1 and W_1 are the weight matrices of the connections from the input unit to the h hidden unit, from the h hidden unit to the output unit, and from the output unit to the h hidden unit, respectively; U_2, V_2 and W_2 are the weight matrices of the connections from the input unit to the g hidden unit, from the g hidden unit to the output unit, and from the output unit to the g hidden unit, respectively; after binarization, the floating-point matrices U, V and W yield the corresponding binarized weight matrices B_U, B_V and B_W and the corresponding optimal approximation factors α_U, α_V and α_W; U = U_h ∪ U_i ∪ U_f ∪ U_o, V = V_h ∪ V_i ∪ V_f ∪ V_o; a^(t) denotes the superposed state formed from the inputs at time t, namely the state o^(t-1) of the output unit at time t-1 and the state x^(t) of the input unit at time t; h^(t) and g^(t) denote the states of the h and g hidden units at time t, respectively; tanh is the hyperbolic tangent function; the output vector of normalized posterior probabilities is obtained by applying the softmax function.
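The claim names the binarized matrices B and optimal approximation factors α but does not reproduce the binarization formula here. A minimal sketch, assuming the standard closed-form solution from the binary-network literature (B = sign(W), α = mean(|W|), which minimizes ‖W − αB‖ for a fixed sign pattern), would be:

```python
import numpy as np

def binarize(W):
    """Binarize a floating-point weight matrix W into a {-1, +1} matrix B
    and a scalar approximation factor alpha so that alpha * B approximates W.
    Closed form: B = sign(W), alpha = mean(|W|) minimizes ||W - alpha*B||_F
    for that choice of B. (Assumed form; the patent does not state it here.)"""
    B = np.where(W >= 0, 1.0, -1.0)
    alpha = np.abs(W).mean()
    return B, alpha

# usage: alpha * B replaces W, so matrix multiplies reduce to sign
# arithmetic plus one scalar multiplication
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
B, alpha = binarize(W)
approx = alpha * B
```

Storing B (1 bit per weight) plus one α per matrix is what yields the memory and energy savings claimed for the binarized network.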
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910310341.1A CN110033766A (en) | 2019-04-17 | 2019-04-17 | A kind of audio recognition method based on binaryzation recurrent neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110033766A true CN110033766A (en) | 2019-07-19 |
Family
ID=67238818
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910310341.1A Pending CN110033766A (en) | 2019-04-17 | 2019-04-17 | A kind of audio recognition method based on binaryzation recurrent neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110033766A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674265A (en) * | 2019-08-06 | 2020-01-10 | 上海孚典智能科技有限公司 | Unstructured information oriented feature discrimination and information recommendation system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106816147A (en) * | 2017-01-25 | 2017-06-09 | 上海交通大学 | Speech recognition system based on binary neural network acoustic model |
CN107293291A (en) * | 2016-03-30 | 2017-10-24 | 中国科学院声学研究所 | A kind of audio recognition method end to end based on autoadapted learning rate |
CN108765506A (en) * | 2018-05-21 | 2018-11-06 | 上海交通大学 | Compression method based on successively network binaryzation |
CN109086866A (en) * | 2018-07-02 | 2018-12-25 | 重庆大学 | A kind of part two-value convolution method suitable for embedded device |
US10192327B1 (en) * | 2016-02-04 | 2019-01-29 | Google Llc | Image compression with recurrent neural networks |
Non-Patent Citations (1)
Title |
---|
GDTOP818: "[ICML14] Towards End-to-End Speech Recognition with Recurrent Neural Networks", CSDN blog, https://blog.csdn.net/weixin_37993251/article/details/88129893 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108171701B (en) | Significance detection method based on U network and counterstudy | |
CN109543838B (en) | Image increment learning method based on variational self-encoder | |
CN108280233A (en) | A kind of VideoGIS data retrieval method based on deep learning | |
CN110851760B (en) | Human-computer interaction system for integrating visual question answering in web3D environment | |
CN110533570A (en) | A kind of general steganography method based on deep learning | |
CN111581383A (en) | Chinese text classification method based on ERNIE-BiGRU | |
CN111931814A (en) | Unsupervised anti-domain adaptation method based on intra-class structure compactness constraint | |
CN112733965A (en) | Label-free image classification method based on small sample learning | |
CN114565808B (en) | Double-action contrast learning method for unsupervised visual representation | |
CN110472255A (en) | Neural network machine interpretation method, model, electric terminal and storage medium | |
CN110263164A (en) | A kind of Sentiment orientation analysis method based on Model Fusion | |
CN114581341A (en) | Image style migration method and system based on deep learning | |
CN116935292B (en) | Short video scene classification method and system based on self-attention model | |
CN110033766A (en) | A kind of audio recognition method based on binaryzation recurrent neural network | |
CN112417118A (en) | Dialog generation method based on marked text and neural network | |
Li et al. | Towards communication-efficient digital twin via ai-powered transmission and reconstruction | |
LU503098B1 (en) | A method and system for fused subspace clustering based on graph autoencoder | |
CN114333069B (en) | Object posture processing method, device, equipment and storage medium | |
CN116245106A (en) | Cross-domain named entity identification method based on autoregressive model | |
CN115270917A (en) | Two-stage processing multi-mode garment image generation method | |
Cheng et al. | Solving monocular sensors depth prediction using MLP-based architecture and multi-scale inverse attention | |
Babaheidarian et al. | Decode and transfer: A new steganalysis technique via conditional generative adversarial networks | |
CN112598065A (en) | Memory-based gated convolutional neural network semantic processing system and method | |
Kouvakis et al. | Semantic communications for image-based sign language transmission | |
Yuan et al. | A review on generative adversarial networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-07-19 |