Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an SVD-based recurrent neural network acceleration method for handwritten Chinese character recognition, which reduces the computational cost of the network while preserving its recognition rate.
The technical scheme of the invention is realized as follows:
An SVD (Singular Value Decomposition)-based recurrent neural network acceleration method for handwritten Chinese character recognition comprises the following steps:
S1: designing and training a recurrent neural network for online handwritten Chinese characters;
S2: performing SVD on the parameter matrices and computing the decomposed parameter matrices according to the required acceleration factor;
S3: initializing the network with the parameter matrices obtained from the decomposition;
S4: retraining the whole network on the online handwriting recognition task to achieve a fine-tuning effect;
S5: optimizing the forward implementation and dynamically setting the time node length of the network in the forward pass.
Further, the specific steps of S1 are as follows:
S11: designing the structure of the recurrent neural network, setting the parameters of the LSTM layers and the fully connected layer, and selecting the time node length;
S12: feeding the training set data into the recurrent neural network, training it with an adaptive gradient descent method, terminating training once the error on the training set has converged, and saving the network parameters.
Further, the specific steps of S2 are as follows:
S21: performing SVD on the parameter matrices of the LSTM layers and computing the retained parameter matrices according to the required acceleration factor;
S22: performing SVD on the parameter matrix of the fully connected layer and computing the retained parameter matrix according to the required acceleration factor.
Further, the specific steps of S3 are as follows:
S31: initializing the LSTM layers according to the parameter matrices calculated in step S2;
S32: initializing the fully connected layer according to the parameter matrix calculated in step S2.
Further, in step S4, the network obtained after SVD decomposition is retrained on the handwritten Chinese character recognition task at a learning rate roughly ten times smaller than the original, to achieve a fine-tuning effect.
Further, the specific steps of S5 are as follows:
S51: calculating the time node length of each input character;
S52: dynamically setting the time node length of the recurrent neural network in the forward pass according to the time node length of the input character.
Compared with the prior art, the principle and the advantages of the scheme are as follows:
1. The method applies a recurrent neural network to online handwritten character recognition, effectively exploiting the temporal ordering among the strokes of online Chinese characters and significantly improving the recognition rate of the network.
2. The SVD-based acceleration method decomposes the parameter matrices of the LSTM layers and the fully connected layer, greatly reducing matrix redundancy, so that recognition performance is preserved even as computational complexity is reduced; in the forward pass, the time node length of the recurrent neural network is set dynamically according to the input character, effectively accelerating forward computation.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention mainly addresses the problem that online handwritten Chinese character recognition based on a recurrent neural network is too slow. It analyzes the computational characteristics of the LSTM layers and the fully connected layer, proposes a corresponding strategy, decomposes each parameter matrix into two smaller matrices through SVD, modifies the implementation code accordingly, and performs the computation as two successive matrix multiplications. The whole flow is shown in Figure 1:
The embodiment of the invention comprises the following steps. S1: designing and training a recurrent neural network for online handwritten Chinese characters. S2: performing SVD on the parameter matrices and computing the decomposed parameter matrices according to the required acceleration factor. S3: initializing the network with the parameter matrices obtained from the decomposition. S4: retraining the whole network on the online handwriting recognition task to achieve a fine-tuning effect. S5: optimizing the forward implementation and dynamically setting the time node length of the network in the forward pass. Specifically, a network is first designed and trained to obtain an initial model; the parameters retained after SVD are then calculated from the size of each parameter matrix and the desired acceleration factor; the network is initialized with the post-SVD parameters and retrained on the handwriting recognition task; finally, the forward implementation is optimized and the time node length of the forward pass is set dynamically, shortening forward computation time.
The main steps of the embodiments of the present invention are described in detail below.
Step S1: designing and training a recurrent neural network for online handwritten Chinese characters, comprising the following steps.
S11: designing the structure of the recurrent neural network, setting parameters of an LSTM layer and a full connection layer, and selecting the length of a time node;
In an embodiment of the present invention, each input online handwritten Chinese character is preprocessed into a sequence of 150 coordinate points, each represented by a 6-dimensional vector. The deep recurrent neural network model comprises 2 LSTM layers with output dimensions of 100 and 512, respectively, and a time node length of 150; a fully connected layer with 512 output neurons, followed by a ReLU activation function; and a final output layer with 3755 classes. The overall structure of the initial network is:
Input--100LSTM--512LSTM--512FC--Output
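The parameter budget of the structure above can be estimated with a short sketch. The counting formulas below assume a standard LSTM variant (four gates, each with input weights, recurrent weights, and a bias); the exact count for the patent's implementation may differ slightly.

```python
# Parameter-count estimate for Input--100LSTM--512LSTM--512FC--Output.
# A standard LSTM layer with input size C and hidden size N has
# 4 * (N*C + N*N + N) parameters: four gates, each with an input weight
# matrix (N x C), a recurrent weight matrix (N x N), and a bias (N).
# This is an illustrative estimate, not a figure quoted from the patent.

def lstm_params(c, n):
    return 4 * (n * c + n * n + n)

def fc_params(m, n):
    return m * n + n  # weight matrix plus bias

layers = [
    ("LSTM-100", lstm_params(6, 100)),     # 6-dim input points
    ("LSTM-512", lstm_params(100, 512)),
    ("FC-512", fc_params(512, 512)),
    ("Output-3755", fc_params(512, 3755)),  # 3755 character classes
]

total = sum(p for _, p in layers)
for name, p in layers:
    print(f"{name:>12}: {p:,} parameters")
print(f"{'total':>12}: {total:,} parameters")
```

Under these assumptions the two LSTM layers and the output layer dominate the cost, which motivates decomposing their matrices in steps S2 and S3.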
S12: training the designed network;
The network is trained for classification using a gradient descent method with adaptive momentum. Training comprises forward propagation and backward propagation: the forward pass computes the network's error, and the backward pass updates the parameters of each layer, continuously optimizing the network. Every ten thousand iterations, the current model is evaluated on the full test set, and the model with the highest test result is ultimately kept.
Step S2: performing SVD on the parameter matrices and computing the decomposed parameter matrices according to the required acceleration factor, comprising the following steps.
S21: performing SVD on the parameter matrices of the LSTM layers and computing the decomposed parameter matrices according to the required acceleration factor.
A schematic diagram of the SVD decomposition is shown in FIG. 2. Assume an input vector I ∈ R^m, an output vector O ∈ R^n, and a parameter matrix W ∈ R^{m×n}. Applying SVD to W gives W = U S V^T. Sorting the singular values from large to small and keeping the first r singular values and their corresponding singular vectors yields W ≈ U_r S_r V_r^T. Let P = U_r S_r and Q = V_r^T; then W ≈ PQ. Thus one parameter matrix can be decomposed by SVD into two smaller matrices, with the computational complexity reduced from O(mn) to O(r(m + n)).
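The factorization described above can be sketched in a few lines of numpy. The shapes and the rank r below are arbitrary illustration values, not parameters from the patent.

```python
import numpy as np

# Minimal sketch of the rank-r SVD factorization: W (m x n) ~= P Q, with
# P = U_r * S_r (m x r) and Q = V_r^T (r x n). Applying W to a vector then
# costs r*(m + n) multiply-adds instead of m*n.
rng = np.random.default_rng(0)
m, n, r = 64, 48, 8

W = rng.standard_normal((m, n))
U, S, Vt = np.linalg.svd(W, full_matrices=False)  # singular values sorted descending

P = U[:, :r] * S[:r]          # m x r: columns of U_r scaled by the singular values
Q = Vt[:r, :]                 # r x n: first r right singular vectors
W_approx = P @ Q

x = rng.standard_normal(n)
y_full = W @ x                # m*n multiply-adds
y_fast = P @ (Q @ x)          # r*(m + n) multiply-adds

print("rank-r relative error:",
      np.linalg.norm(W - W_approx) / np.linalg.norm(W))
print("multiply-adds:", m * n, "->", r * (m + n))
```

Because numpy returns the singular values already sorted from large to small, keeping the leading r columns of U and rows of V^T is exactly the truncation the text describes.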
A schematic diagram of the LSTM layer is shown in FIG. 3. The layer is formed by 4 gates: an input gate i_t ∈ R^N, a forget gate f_t ∈ R^N, an output gate o_t ∈ R^N, and an input modulation gate g_t ∈ R^N. The input at each time node comprises the input vector of the current time node, x_t ∈ R^C, and the hidden state of the previous time node, h_{t-1} ∈ R^N. The forward calculation formulas are:
i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i) (1)
f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f) (2)
o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o) (3)
g_t = φ(W_xc x_t + W_hc h_{t-1} + b_c) (4)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t (5)
h_t = o_t ⊙ φ(c_t) (6)
where σ is the sigmoid function, φ is the tanh function, and ⊙ denotes element-wise multiplication.
According to the above formulas, the 4 matrices W_xi, W_xf, W_xo, W_xc used to transform the input vector x_t can be viewed as one large matrix W_x ∈ R^{C×4N}, and the 4 matrices W_hi, W_hf, W_ho, W_hc used to transform the hidden state h_{t-1} can be viewed as one large matrix W_h ∈ R^{N×4N}. The entire LSTM layer can thus be seen as comprising two parameter matrices, W_x ∈ R^{C×4N} and W_h ∈ R^{N×4N}, and the computational complexity of each time node is O(4(N^2 + NC)). Performing SVD on the two parameter matrices separately and keeping the first r singular values and singular vectors gives W_x ≈ P_x Q_x and W_h ≈ P_h Q_h; each parameter matrix is decomposed into two smaller matrices, and the total computational complexity is reduced to O(4(3N + C)r). Therefore, to accelerate the LSTM by a factor of d, r should be set to:
r = N(N + C) / (d(3N + C)) (7)
s22: and carrying out SVD on the parameter matrix of the full connection layer, and calculating the parameter matrix after decomposition according to the multiple of acceleration required.
Assume the input of the fully connected layer is x ∈ R^M and the output is f ∈ R^N; the calculation formula is:
f = W_f x + b, (8)
where the parameter matrix W_f ∈ R^{M×N} and the bias b ∈ R^N; the computational complexity is O(MN).
Performing SVD on the parameter matrix W_f and keeping the first r singular values and corresponding singular vectors gives W_f ≈ P_f Q_f; the parameter matrix is decomposed into two smaller matrices with computational complexity O(r(M + N)). Therefore, to accelerate the fully connected layer by a factor of d, r should be set to:
r = MN / (d(M + N)) (9)
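The same cost-equating argument applies to the fully connected layer: setting r(M + N) = MN/d gives r = MN / (d(M + N)). The sketch below follows that derivation; the 512→512 layer sizes are taken from the example network.

```python
# Rank for a d-fold fully connected speedup: original cost O(M*N),
# decomposed cost O(r*(M + N)); solving r*(M + N) = M*N/d for r.

def fc_rank_for_speedup(m_in, n_out, d):
    return int((m_in * n_out) / (d * (m_in + n_out)))

# The 512 -> 512 fully connected layer of the example network:
for d in (2, 4):
    r = fc_rank_for_speedup(512, 512, d)
    achieved = (512 * 512) / (r * (512 + 512))
    print(f"d={d}: r={r}, achieved speedup ~{achieved:.2f}x")
```

For a square layer (M = N) the formula simplifies to r = N/(2d), so a 2x speedup of the 512-unit layer keeps 128 of its 512 singular values.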
step S3: and initializing the network according to the parameter matrix obtained by decomposition. Comprises the steps of
S31: the LSTM layer is initialized according to the parameter matrix calculated in step S2.
As shown in FIG. 2, the SVD decomposition turns one parameter matrix into two smaller parameter matrices. Therefore, in the LSTM implementation, the single matrix multiplication W_x x_t is decomposed into two matrix multiplications, P_x (Q_x x_t), which are initialized according to the parameter matrices calculated in step S2. The implementation of W_h h_{t-1} is modified and initialized in the same way.
S32: initializing the full connection layer according to the parameter matrix calculated in step S2.
For the fully connected layer, the implementation is modified so that the single matrix multiplication W_f x is decomposed into two matrix multiplications, P_f (Q_f x), initialized according to the parameter matrices calculated in step S2.
Step S4: and (4) retraining the whole network aiming at the online handwriting recognition task so as to achieve the fine tuning effect.
After SVD, only the first r largest singular values and singular vectors are retained, so parameter precision is lost, which affects the overall performance of the network to some extent. Therefore, the whole recognition network is fine-tuned at a low learning rate on the handwriting recognition task to recover recognition accuracy.
Step S5: optimizing forward implementation and dynamically setting the time node length of the forward process network. Comprises the steps of
S51: calculating the time node length of each input character;
the time node length of each input character can be obtained by counting the raw input data.
S52: dynamically setting the time node length of a recurrent neural network in the forward process according to the time node length of the input character;
During training, the time node length of the recurrent neural network must be fixed, so the network's time node length T needs to be preset. According to the LSTM forward calculation formulas in step S2, the total computational complexity of the LSTM layer is O(T · 4(N^2 + NC)), which is linear in the time node length. Suppose the time node length of the i-th character in the Chinese character set is T_i; then the time node length T of the network during training is determined by:
T = max{T_1, T_2, T_3, ..., T_N} (10)
During training, characters whose time node length is less than T are zero-padded up to T. Thus, for most characters, the network's time node length is greater than the character's own, which greatly increases the computational cost. In the forward implementation of the present invention, the time node length of the forward pass is set dynamically according to the time node length of each input character. Suppose the input character has time node length T_i; the time complexity of the recurrent neural network is then reduced to O(T_i · 4(N^2 + NC)), with T_i ≤ T. Using a dynamically varying time node length effectively reduces computational complexity and accelerates forward computation without affecting recognition accuracy.
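Step S5 can be sketched as a recurrent loop that simply stops after T_i steps instead of the padded length T. The LSTM cell below is a generic textbook implementation with arbitrary random weights, shown only to illustrate the dynamic-length idea; it is not the patent's exact implementation.

```python
import numpy as np

# Dynamic time node length: training uses zero-padded sequences of length T,
# but the forward pass runs only T_i steps for a character of length T_i.

def lstm_forward(x_seq, Wx, Wh, b, n_steps=None):
    """Run a generic LSTM over the first n_steps time nodes of x_seq."""
    T, C = x_seq.shape
    N = Wh.shape[0]
    steps = T if n_steps is None else n_steps
    h = np.zeros(N)
    c = np.zeros(N)
    for t in range(steps):
        z = x_seq[t] @ Wx + h @ Wh + b            # all four gates in one product
        i, f, o = (1 / (1 + np.exp(-z[k * N:(k + 1) * N])) for k in range(3))
        g = np.tanh(z[3 * N:])                    # input modulation gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(0)
C, N, T = 6, 16, 150                              # toy sizes for illustration
Wx = rng.standard_normal((C, 4 * N))
Wh = rng.standard_normal((N, 4 * N))
b = np.zeros(4 * N)

T_i = 40                                          # actual length of this character
x = np.zeros((T, C))                              # zero-padded training layout
x[:T_i] = rng.standard_normal((T_i, C))

h_dynamic = lstm_forward(x, Wx, Wh, b, n_steps=T_i)  # 40 steps instead of 150
print("forward steps reduced:", T, "->", T_i)
```

Since per-step cost is fixed, running T_i = 40 steps instead of T = 150 cuts the LSTM forward time of this character by nearly 4x, matching the O(T_i · 4(N^2 + NC)) bound in the text.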
This embodiment applies a recurrent neural network to online handwritten character recognition, effectively exploiting the temporal ordering among the strokes of online Chinese characters and significantly improving the recognition rate of the network. In addition, an SVD-based acceleration method decomposes the parameter matrices of the LSTM layers and the fully connected layer, greatly reducing matrix redundancy, so that recognition performance is preserved even as computational complexity is reduced; in the forward pass, the time node length of the recurrent neural network is set dynamically according to the input character, effectively accelerating forward computation.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.