Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an SVD-based recurrent neural network acceleration method for handwritten Chinese character recognition, which reduces the computational cost of the network while preserving its recognition rate.
The technical scheme of the invention is realized as follows:
An SVD (Singular Value Decomposition)-based recurrent neural network acceleration method for handwritten Chinese character recognition comprises the following steps:
S1: designing and training a recurrent neural network for online handwritten Chinese characters;
S2: performing SVD on the parameter matrices and computing the decomposed parameter matrices according to the required acceleration factor;
S3: initializing the network with the parameter matrices obtained from the decomposition;
S4: retraining the whole network on the online handwriting recognition task to achieve a fine-tuning effect;
S5: optimizing the forward implementation and dynamically setting the time node length of the network in the forward pass.
Further, the specific steps of S1 are as follows:
S11: designing the structure of the recurrent neural network, setting the parameters of the LSTM layers and the fully connected layer, and selecting the time node length;
S12: feeding the training set data into the recurrent neural network, training it with an adaptive gradient descent method, terminating training once the error on the training set has converged, and saving the network parameters.
Further, the specific steps of S2 are as follows:
S21: performing SVD on the parameter matrices of the LSTM layers and computing the retained parameter matrices according to the required acceleration factor;
S22: performing SVD on the parameter matrix of the fully connected layer and computing the retained parameter matrix according to the required acceleration factor.
Further, the specific steps of S3 are as follows:
S31: initializing the LSTM layers according to the parameter matrices calculated in step S2;
S32: initializing the fully connected layer according to the parameter matrix calculated in step S2.
Further, in step S4, the network obtained after SVD decomposition is retrained on the handwritten Chinese character recognition task at a learning rate roughly ten times smaller than the original, to achieve a fine-tuning effect.
Further, the specific steps of S5 are as follows:
S51: calculating the time node length of each input character;
S52: dynamically setting the time node length of the recurrent neural network in the forward pass according to the time node length of the input character.
Compared with the prior art, the principle and the advantages of the scheme are as follows:
1. The method applies a recurrent neural network to online handwritten character recognition, effectively exploiting the temporal ordering among the strokes of online Chinese characters and significantly improving the recognition rate of the network.
2. The SVD-based acceleration method decomposes the parameter matrices of the LSTM layers and the fully connected layer, greatly reducing matrix redundancy, so that recognition performance is preserved even as computational complexity is reduced; in the forward pass, the time node length of the recurrent neural network is set dynamically according to the input character, effectively accelerating forward computation.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention mainly addresses the problem that online handwritten Chinese character recognition based on a recurrent neural network is too slow. It analyzes the computational characteristics of the LSTM layers and the fully connected layer, proposes a corresponding strategy, decomposes each parameter matrix into two smaller matrices through SVD, modifies the implementation code accordingly, and performs the computation as two successive matrix multiplications. The whole flow is shown in Figure 1:
The embodiment of the invention comprises the following steps. S1: designing and training a recurrent neural network for online handwritten Chinese characters. S2: performing SVD on the parameter matrices and computing the decomposed parameter matrices according to the required acceleration factor. S3: initializing the network with the parameter matrices obtained from the decomposition. S4: retraining the whole network on the online handwriting recognition task to achieve a fine-tuning effect. S5: optimizing the forward implementation and dynamically setting the time node length of the network in the forward pass. Specifically, a network is first designed and trained to obtain an initial model; the parameters retained after SVD are then calculated from the size of each parameter matrix and the desired acceleration factor; the network is initialized with the post-SVD parameters and retrained on the handwriting recognition task; finally, the forward implementation is optimized and the time node length of the forward pass is set dynamically, shortening forward computation time.
The main steps of the embodiments of the present invention are described in detail below.
Step S1: designing and training a recurrent neural network for online handwritten Chinese characters, comprising the following steps.
S11: designing the structure of the recurrent neural network, setting parameters of an LSTM layer and a full connection layer, and selecting the length of a time node;
In an embodiment of the present invention, each input online handwritten Chinese character is preprocessed into a sequence of 150 coordinate points, each represented by a 6-dimensional vector. The deep recurrent neural network model comprises 2 LSTM layers with output dimensions of 100 and 512, respectively, and a time node length of 150; a fully connected layer with 512 output neurons, followed by a ReLU activation function; and a final output layer with 3755 classes. The overall structure of the initial network is:
Input--100LSTM--512LSTM--512FC--Output
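The parameter budget of the structure above can be estimated with a short sketch. The counting formulas below assume a standard LSTM variant (four gates, each with input weights, recurrent weights, and a bias); the exact count for the patent's implementation may differ slightly.

```python
# Parameter-count estimate for Input--100LSTM--512LSTM--512FC--Output.
# A standard LSTM layer with input size C and hidden size N has
# 4 * (N*C + N*N + N) parameters: four gates, each with an input weight
# matrix (N x C), a recurrent weight matrix (N x N), and a bias (N).
# This is an illustrative estimate, not a figure quoted from the patent.

def lstm_params(c, n):
    return 4 * (n * c + n * n + n)

def fc_params(m, n):
    return m * n + n  # weight matrix plus bias

layers = [
    ("LSTM-100", lstm_params(6, 100)),     # 6-dim input points
    ("LSTM-512", lstm_params(100, 512)),
    ("FC-512", fc_params(512, 512)),
    ("Output-3755", fc_params(512, 3755)),  # 3755 character classes
]

total = sum(p for _, p in layers)
for name, p in layers:
    print(f"{name:>12}: {p:,} parameters")
print(f"{'total':>12}: {total:,} parameters")
```

Under these assumptions the two LSTM layers and the output layer dominate the cost, which motivates decomposing their matrices in steps S2 and S3.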
S12: training the designed network;
The network is trained for classification using a gradient descent method with adaptive momentum. Training comprises forward propagation and backward propagation: the forward pass computes the network's error, and the backward pass updates the parameters of each layer, continuously optimizing the network. Every ten thousand iterations, the current model is evaluated on the full test set, and the model with the highest test result is ultimately kept.
Step S2: performing SVD on the parameter matrices and computing the decomposed parameter matrices according to the required acceleration factor, comprising the following steps.
S21: performing SVD on the parameter matrices of the LSTM layers and computing the decomposed parameter matrices according to the required acceleration factor.
A schematic diagram of the SVD decomposition is shown in FIG. 2. Assume an input vector I ∈ R^m, an output vector O ∈ R^n, and a parameter matrix W ∈ R^{m×n}. Applying SVD to W gives W = U S V^T. Sorting the singular values from large to small and keeping the first r singular values and their corresponding singular vectors yields W ≈ U_r S_r V_r^T. Let P = U_r S_r and Q = V_r^T; then W ≈ PQ. Thus one parameter matrix can be decomposed by SVD into two smaller matrices, with the computational complexity reduced from O(mn) to O(r(m + n)).
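The factorization described above can be sketched in a few lines of numpy. The shapes and the rank r below are arbitrary illustration values, not parameters from the patent.

```python
import numpy as np

# Minimal sketch of the rank-r SVD factorization: W (m x n) ~= P Q, with
# P = U_r * S_r (m x r) and Q = V_r^T (r x n). Applying W to a vector then
# costs r*(m + n) multiply-adds instead of m*n.
rng = np.random.default_rng(0)
m, n, r = 64, 48, 8

W = rng.standard_normal((m, n))
U, S, Vt = np.linalg.svd(W, full_matrices=False)  # singular values sorted descending

P = U[:, :r] * S[:r]          # m x r: columns of U_r scaled by the singular values
Q = Vt[:r, :]                 # r x n: first r right singular vectors
W_approx = P @ Q

x = rng.standard_normal(n)
y_full = W @ x                # m*n multiply-adds
y_fast = P @ (Q @ x)          # r*(m + n) multiply-adds

print("rank-r relative error:",
      np.linalg.norm(W - W_approx) / np.linalg.norm(W))
print("multiply-adds:", m * n, "->", r * (m + n))
```

Because numpy returns the singular values already sorted from large to small, keeping the leading r columns of U and rows of V^T is exactly the truncation the text describes.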
A schematic diagram of the LSTM layer is shown in FIG. 3. The layer is formed by 4 gates: an input gate i_t ∈ R^N, a forget gate f_t ∈ R^N, an output gate o_t ∈ R^N, and an input modulation gate g_t ∈ R^N. The input at each time node comprises the input vector of the current time node, x_t ∈ R^C, and the hidden state of the previous time node, h_{t-1} ∈ R^N. The forward calculation formulas are:
i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i) (1)
f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f) (2)
o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o) (3)
g_t = φ(W_xc x_t + W_hc h_{t-1} + b_c) (4)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t (5)
h_t = o_t ⊙ φ(c_t) (6)
where σ is the sigmoid function, φ is the tanh function, and ⊙ denotes element-wise multiplication.
According to the above formulas, the 4 matrices W_xi, W_xf, W_xo, W_xc used to transform the input vector x_t can be viewed as one large matrix W_x ∈ R^{C×4N}, and the 4 matrices W_hi, W_hf, W_ho, W_hc used to transform the hidden state h_{t-1} can be viewed as one large matrix W_h ∈ R^{N×4N}. The entire LSTM layer can thus be seen as comprising two parameter matrices, W_x ∈ R^{C×4N} and W_h ∈ R^{N×4N}, and the computational complexity of each time node is O(4(N^2 + NC)). Performing SVD on the two parameter matrices separately and keeping the first r singular values and singular vectors gives W_x ≈ P_x Q_x and W_h ≈ P_h Q_h; each parameter matrix is decomposed into two smaller matrices, and the total computational complexity is reduced to O(4(3N + C)r). Therefore, to accelerate the LSTM by a factor of d, r should be set to:
r = N(N + C) / (d(3N + C)) (7)
s22: and carrying out SVD on the parameter matrix of the full connection layer, and calculating the parameter matrix after decomposition according to the multiple of acceleration required.
Assume the input of the fully connected layer is x ∈ R^M and the output is f ∈ R^N; the calculation formula is:
f = W_f x + b, (8)
where the parameter matrix W_f ∈ R^{M×N} and the bias b ∈ R^N; the computational complexity is O(MN).
Performing SVD on the parameter matrix W_f and keeping the first r singular values and corresponding singular vectors gives W_f ≈ P_f Q_f; the parameter matrix is decomposed into two smaller matrices with computational complexity O(r(M + N)). Therefore, to accelerate the fully connected layer by a factor of d, r should be set to:
r = MN / (d(M + N)) (9)
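The same cost-equating argument applies to the fully connected layer: setting r(M + N) = MN/d gives r = MN / (d(M + N)). The sketch below follows that derivation; the 512→512 layer sizes are taken from the example network.

```python
# Rank for a d-fold fully connected speedup: original cost O(M*N),
# decomposed cost O(r*(M + N)); solving r*(M + N) = M*N/d for r.

def fc_rank_for_speedup(m_in, n_out, d):
    return int((m_in * n_out) / (d * (m_in + n_out)))

# The 512 -> 512 fully connected layer of the example network:
for d in (2, 4):
    r = fc_rank_for_speedup(512, 512, d)
    achieved = (512 * 512) / (r * (512 + 512))
    print(f"d={d}: r={r}, achieved speedup ~{achieved:.2f}x")
```

For a square layer (M = N) the formula simplifies to r = N/(2d), so a 2x speedup of the 512-unit layer keeps 128 of its 512 singular values.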
step S3: and initializing the network according to the parameter matrix obtained by decomposition. Comprises the steps of
S31: the LSTM layer is initialized according to the parameter matrix calculated in step S2.
As shown in FIG. 2, the SVD decomposition turns one parameter matrix into two smaller parameter matrices. Therefore, in the LSTM implementation, the single matrix multiplication W_x x_t is decomposed into two matrix multiplications, P_x (Q_x x_t), which are initialized according to the parameter matrices calculated in step S2. The implementation of W_h h_{t-1} is modified and initialized in the same way.
S32: initializing the full connection layer according to the parameter matrix calculated in step S2.
For the fully connected layer, the implementation is modified so that the single matrix multiplication W_f x is decomposed into two matrix multiplications, P_f (Q_f x), initialized according to the parameter matrices calculated in step S2.
Step S4: and (4) retraining the whole network aiming at the online handwriting recognition task so as to achieve the fine tuning effect.
After SVD, only the first r largest singular values and singular vectors are retained, so parameter precision is lost, which affects the overall performance of the network to some extent. Therefore, the whole recognition network is fine-tuned at a low learning rate on the handwriting recognition task to recover recognition accuracy.
Step S5: optimizing forward implementation and dynamically setting the time node length of the forward process network. Comprises the steps of
S51: calculating the time node length of each input character;
the time node length of each input character can be obtained by counting the raw input data.
S52: dynamically setting the time node length of a recurrent neural network in the forward process according to the time node length of the input character;
During training, the time node length of the recurrent neural network must be fixed, so the network's time node length T needs to be preset. According to the LSTM forward calculation formulas in step S2, the total computational complexity of the LSTM layer is O(T · 4(N^2 + NC)), which is linear in the time node length. Suppose the time node length of the i-th character in the Chinese character set is T_i; then the time node length T of the network during training is determined by:
T = max{T_1, T_2, T_3, ..., T_N} (10)
During training, characters whose time node length is less than T are zero-padded up to T. Thus, for most characters, the network's time node length is greater than the character's own, which greatly increases the computational cost. In the forward implementation of the present invention, the time node length of the forward pass is set dynamically according to the time node length of each input character. Suppose the input character has time node length T_i; the time complexity of the recurrent neural network is then reduced to O(T_i · 4(N^2 + NC)), with T_i ≤ T. Using a dynamically varying time node length effectively reduces computational complexity and accelerates forward computation without affecting recognition accuracy.
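Step S5 can be sketched as a recurrent loop that simply stops after T_i steps instead of the padded length T. The LSTM cell below is a generic textbook implementation with arbitrary random weights, shown only to illustrate the dynamic-length idea; it is not the patent's exact implementation.

```python
import numpy as np

# Dynamic time node length: training uses zero-padded sequences of length T,
# but the forward pass runs only T_i steps for a character of length T_i.

def lstm_forward(x_seq, Wx, Wh, b, n_steps=None):
    """Run a generic LSTM over the first n_steps time nodes of x_seq."""
    T, C = x_seq.shape
    N = Wh.shape[0]
    steps = T if n_steps is None else n_steps
    h = np.zeros(N)
    c = np.zeros(N)
    for t in range(steps):
        z = x_seq[t] @ Wx + h @ Wh + b            # all four gates in one product
        i, f, o = (1 / (1 + np.exp(-z[k * N:(k + 1) * N])) for k in range(3))
        g = np.tanh(z[3 * N:])                    # input modulation gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(0)
C, N, T = 6, 16, 150                              # toy sizes for illustration
Wx = rng.standard_normal((C, 4 * N))
Wh = rng.standard_normal((N, 4 * N))
b = np.zeros(4 * N)

T_i = 40                                          # actual length of this character
x = np.zeros((T, C))                              # zero-padded training layout
x[:T_i] = rng.standard_normal((T_i, C))

h_dynamic = lstm_forward(x, Wx, Wh, b, n_steps=T_i)  # 40 steps instead of 150
print("forward steps reduced:", T, "->", T_i)
```

Since per-step cost is fixed, running T_i = 40 steps instead of T = 150 cuts the LSTM forward time of this character by nearly 4x, matching the O(T_i · 4(N^2 + NC)) bound in the text.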
This embodiment applies a recurrent neural network to online handwritten character recognition, effectively exploiting the temporal ordering among the strokes of online Chinese characters and significantly improving the recognition rate of the network. In addition, an SVD-based acceleration method decomposes the parameter matrices of the LSTM layers and the fully connected layer, greatly reducing matrix redundancy, so that recognition performance is preserved even as computational complexity is reduced; in the forward pass, the time node length of the recurrent neural network is set dynamically according to the input character, effectively accelerating forward computation.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.