CN107590497A

CN107590497A - Off-line Handwritten Chinese Recognition method based on depth convolutional neural networks

Info

Publication number: CN107590497A
Application number: CN201710855035.7A
Authority: CN
Inventors: 赵辉; 王艳美; 刘真三
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2017-09-20
Filing date: 2017-09-20
Publication date: 2018-01-16

Abstract

The present invention proposes an off-line handwritten Chinese character method using a deep convolutional neural network. Specifically, a spatial transformation network is first used to rotate, translate and scale the handwritten Chinese characters in the HWDB1.1 database. The original arbitrary position, proportion and direction of the Chinese character pictures are corrected. The corrected handwriting dataset is then fed into a convolutional neural network for recognition and classification. The twist parameter increments for rotating, translating, and scaling the original image are calculated by a convolutional neural network or neural network that is continuously adjusted through backpropagation. The delta of the warp parameter and the warp parameter update parameters in a composite way. Finally, we built the network framework of the handwritten Chinese character recognition of the present invention on the TensorFlow deep learning framework platform. And compared with the network framework of handwritten Chinese character recognition using convolutional neural network alone, after training with a large number of data sets, the test results show that the recognition rate of handwritten Chinese characters has been significantly improved.

Description

Offline Handwritten Chinese Character Recognition Method Based on Deep Convolutional Neural Network

技术领域technical field

本发明属于图像分类技术领域，具体是基于深度卷积神经网络的脱机手写汉字识别方法。The invention belongs to the technical field of image classification, in particular to an off-line handwritten Chinese character recognition method based on a deep convolutional neural network.

背景技术Background technique

脱机手写汉字识别是模式识别领域中的一个子方向。脱机是指所处理的手写文字是通过扫描仪或摄像头等图像捕捉设备采集到的手写文字二维图，以下均简称手写汉字识别。2011年以前发表的研究论文和技术报告大都着重讨论如何选择特征和匹配方法以适应手写体汉字字形的变化。就是在特征提取算法以及分类器的设计。传统HCCR步骤包括：图形归一化、特征提取、降维、分类器训练。而且由于汉字数量多，结构复杂、相似字多和书写风格多变等问题，这些传统方法不仅步骤复杂，而且在特征提取步骤中如果选取的特征不适合将会严重影响识别效果。虽然基于MQDF和DLQDF的方法已经取得了不错的效果，但是目前已经达到了他们的瓶颈。Off-line handwritten Chinese character recognition is a sub-direction in the field of pattern recognition. Offline means that the handwritten characters to be processed are two-dimensional images of handwritten characters collected by image capture devices such as scanners or cameras, hereinafter referred to as handwritten Chinese character recognition. Most of the research papers and technical reports published before 2011 focus on how to select features and matching methods to adapt to the variation of handwritten Chinese characters. It is in the design of feature extraction algorithms and classifiers. Traditional HCCR steps include: graph normalization, feature extraction, dimensionality reduction, and classifier training. Moreover, due to the large number of Chinese characters, complex structures, many similar characters, and changing writing styles, these traditional methods not only have complicated steps, but also if the selected features are not suitable in the feature extraction step, the recognition effect will be seriously affected. Although the methods based on MQDF and DLQDF have achieved good results, they have reached their bottleneck.

近几年由于基于深度学习的图像识别取得了重大的突破，深度卷积神经网络(Deep Convolutional Neural Network,DCNN)也应用在了手写汉字识别领域。在2013ICDAR国际比赛中，Fujitsu团队采用CNN模型以94.77％的识别率获得冠军。之后出现了采用改进的模型以及预处理手段的手写汉字识别模型。最常见的改进方法是使网络增加对不同手写汉字的形变的容忍度，其中就包括数据增强技术和空间池化，这两个方法有以下缺点，前者，根据对原始数据几何形变产生新的样本的方法是在手写汉字识别中应用最多的一种方法，如果只有样本成指数级增长的模型的学习能力才得以提升，则也是个不容忽视的问题。后者，池化操作会损坏图像细节。而且随着数据的爆炸式增长深度卷积神经网络虽然获得很大的成功，但是还没有一种准则解决同一种类数据的几何变化。比如同一个手写的“汉”字，不同的人写出来的大小和笔画的形状都是不尽相同的。数据的变换多样是影响深度卷积神经网络识别效果的关键因素。In recent years, due to the major breakthroughs in image recognition based on deep learning, Deep Convolutional Neural Network (DCNN) has also been applied in the field of handwritten Chinese character recognition. In the 2013 ICDAR international competition, the Fujitsu team won the championship with a recognition rate of 94.77% using the CNN model. Later, a handwritten Chinese character recognition model using improved models and preprocessing methods appeared. The most common improvement method is to increase the tolerance of the network to the deformation of different handwritten Chinese characters, including data enhancement technology and spatial pooling. These two methods have the following disadvantages. The former generates new samples based on the geometric deformation of the original data. The method is the most widely used method in handwritten Chinese character recognition. If only the learning ability of the model with exponential growth of samples can be improved, it is also a problem that cannot be ignored. In the latter, pooling operations can corrupt image details. Moreover, with the explosive growth of data, although the deep convolutional neural network has achieved great success, there is no criterion to solve the geometric changes of the same type of data. For example, the same handwritten "Han" character has different sizes and stroke shapes written by different people. The variety of data transformation is the key factor affecting the recognition effect of deep convolutional neural network.

改进的方法有：预处理方式，对汉字进行扭曲形变来扩充数据集，提高卷积神经网络的泛化能力；特征提取，结合传统Gabor方向特征、HoG特征等，采用不同DCNN模型。目前已经采用的模型有Szegedy等组成的GoogLeNet，K He,X Zhang等人设计的ResLeNet等模型。这些模型都是为图片分类设计。与手写汉字识别还是有一定的区别，虽然达到了不错的效果但是网络结构较深，模型复杂，调试困难。而且对汉字进行扭曲形变的程度不好把握，不能自适应的选取角度。2015年，来自于Google旗下的新锐AI公司DeepMind的四位剑桥PhdThe improved methods include: preprocessing method, distorting and deforming Chinese characters to expand the data set, and improving the generalization ability of the convolutional neural network; feature extraction, combining traditional Gabor direction features, HoG features, etc., using different DCNN models. The currently used models include GoogLeNet composed of Szegedy et al., ResLeNet designed by K He, X Zhang et al. These models are all designed for image classification. There is still a certain difference from handwritten Chinese character recognition. Although it has achieved good results, the network structure is deep, the model is complex, and debugging is difficult. Moreover, the degree of distortion and deformation of Chinese characters is not easy to grasp, and the angle cannot be selected adaptively. In 2015, four Cambridge PhDs from DeepMind, a cutting-edge AI company under Google,

研究员设计了空间变换网络(Spatial Transformer Network)，可以实现自适应的旋转、平移、缩放。但是直接将反向空间变换网络与卷积神经网络构成的框架用于手写汉字识别，也有一些问题。比如，经过反向空间变换网络之后，虽然汉字的方向得到纠正，但是纠正过的汉字笔画较粗。而且处理过程中样本是经过多层放缩、旋转平移的样本，然后直接输入到卷积神经网络识别。因为经过放缩的样本裁剪掉了原图片的边缘信息，用在手写汉字识别中会造成字体残缺，严重影响识别效果。The researchers designed a spatial transformation network (Spatial Transformer Network), which can achieve adaptive rotation, translation, and scaling. However, there are also some problems in directly using the framework composed of reverse spatial transformation network and convolutional neural network for handwritten Chinese character recognition. For example, after the reverse space transformation network, although the direction of the Chinese characters is corrected, the strokes of the corrected Chinese characters are thicker. Moreover, the samples in the processing process are samples that have been scaled, rotated and translated by multiple layers, and then directly input to the convolutional neural network for recognition. Because the scaled sample cuts out the edge information of the original image, it will cause incomplete fonts when used in handwritten Chinese character recognition, which will seriously affect the recognition effect.

因为目前大部分网络忽略了实际手写汉字的空间畸变，因为在实际书写环境、风格、方向、位置、大小的不同导致样本集变化多样。然而，CNN仍然缺乏对输入样本的空间变化的鲁棒性。传统的归一化方法仅把样本转化为规定大小的规范化汉字，虽然它对分类任务起到了举足轻重的作用，但是归一化方法不能保证HCCR任务是最佳的。而反向合成空间变换网络能够在没有标注关键点的情况下，根据任务学习图片的特征变换参数，将输入图片或者学习的特征空间上进行对齐，从而减少由于空间的旋转、平移、尺度、扭曲等几何变换对分类和识别的影响。Because most of the current networks ignore the spatial distortion of actual handwritten Chinese characters, because the actual writing environment, style, direction, position, and size are different, resulting in a variety of sample sets. However, CNNs still lack robustness to spatial variations of input samples. The traditional normalization method only converts samples into normalized Chinese characters of a specified size. Although it plays a pivotal role in the classification task, the normalization method cannot guarantee that the HCCR task is optimal. The reverse synthesis space transformation network can align the input picture or the learned feature space according to the feature transformation parameters of the task learning picture without labeling the key points, thereby reducing the spatial rotation, translation, scale, and distortion. The effect of isogeometric transformations on classification and recognition.

发明内容Contents of the invention

本发明针对上面存在的问题以及汉字的特点，采用反向合成空间变换算法以及深度卷积神经网络等算法对脱机手写汉字进行识别。本方法对不同的书写风格以及不同的书写环境有良好的鲁棒性。其中我们采用了反向合成空间变换网络来解决来因书写风格多样带来的字体扭曲、变形、倾斜等问题。并设计了与之对应的深度卷积神经网络。深度卷积神经网络可以有效提取特征并分类。最后反向合成空间变换网络的输出作为卷积神经网路的输入，实现对于手写汉字的识别。Aiming at the above problems and the characteristics of Chinese characters, the present invention adopts algorithms such as reverse synthesis space transformation algorithm and deep convolutional neural network to recognize offline handwritten Chinese characters. This method is robust to different writing styles and different writing environments. Among them, we use the reverse synthesis space transformation network to solve the problems of font distortion, deformation, and tilt caused by various writing styles. And designed the corresponding deep convolutional neural network. Deep convolutional neural networks can efficiently extract features and classify them. Finally, the output of the reverse synthesis space transformation network is used as the input of the convolutional neural network to realize the recognition of handwritten Chinese characters.

本发明的技术方案及流程顺序如下：Technical scheme of the present invention and flow sequence are as follows:

(1)搭建深度卷积神经网络的TensorFlow深度学习框架平台；(1) Build a TensorFlow deep learning framework platform for deep convolutional neural networks;

(2)将HWDB1.1数据集的GNT格式数据转化为二进制并存储为PKL格式(2) Convert the GNT format data of the HWDB1.1 dataset into binary and store it in PKL format

(3)读取PKL格式数据并进行归一化处理，并将其转换为训练集、交叉验证集和测试集；(3) Read the PKL format data and perform normalization processing, and convert it into training set, cross-validation set and test set;

(4)在TensorFlow框架平台中实现反向合成空间变换网络和卷积神经网络，并将反相合成空间变换网络的输出作为卷积神经网络的输入；(4) Realize the reverse synthesis space transformation network and convolutional neural network in the TensorFlow framework platform, and use the output of the reverse synthesis space transformation network as the input of the convolutional neural network;

(5)设置方向合成空间变换网络框架模型并将其加入本发明人自己设计的卷积神经网络框架中，形成反相合成空间变换网络加卷积神经网络的模型结构；(5) direction synthesis space transformation network frame model is set and added in the convolutional neural network framework designed by the inventor himself, forming the model structure of inverse synthesis space transformation network plus convolutional neural network;

(6)训练和测试构建好的网络框架。(6) Train and test the constructed network framework.

(7)采用TensorBoard工具可视化中间结果、训练和测试的结果。对比分析本发明网络结构的有效性。(7) Use the TensorBoard tool to visualize intermediate results, training and testing results. Comparatively analyze the effectiveness of the network structure of the present invention.

所述步骤2)的详细过程：HWDB1.1是CASIA-HWDB数据库的一个脱机手写汉字的数据集。其是中科院在2007-2010年间收集。存储的格式为独有的二进制GNT格式。因此我们需要首先将此数据集转化为可用的格式。因为此数据集书写字体多样，是目前验证模型识别效果最多也是最权威的数据集之一。The detailed process of said step 2): HWDB1.1 is a dataset of off-line handwritten Chinese characters of the CASIA-HWDB database. It was collected by the Chinese Academy of Sciences during 2007-2010. The storage format is the unique binary GNT format. So we need to convert this dataset into a usable format first. Because this data set has a variety of writing fonts, it is currently one of the most authoritative data sets that verify the model's recognition effect.

所述步骤3)：将图片进行反转并归一化，前者可以使图片大部分数据为零，加快运算速度，后者可以提高网络的收敛速度。Said step 3): reverse and normalize the picture, the former can make most of the data in the picture zero, speed up the calculation speed, and the latter can improve the convergence speed of the network.

所述步骤4)：反向合成空间变换网络的具体过程为：训练样本图像与初始化的扭曲参数p相乘，得到第一次纠正的图片。将其输入到一个几何预测器，几何预测器是由一个小型卷积神经网络或者神经网络构成。网络的前向传播预测扭曲增量Δp，反向传播BP算法来更新几何预测器的参数，进一步迭代更新扭曲参数。每更新Δp之后，用合成的方式更新参数p。之后将P组成的矩阵与训练样本相乘结果为纠正过的图片，将经过纠正的图片再此输入的几何预测器，这样迭代更新Δp、p。直到满足提前给定的循环次数，将最后的P值与原训练样本相乘，将得到的结果作为本发明的设计的卷积神经网络的输入进一步的提取特征和分类。The step 4): the specific process of inversely synthesizing the space transformation network is: multiplying the training sample image with the initialized distortion parameter p to obtain the first corrected image. Feed this into a geometric predictor, which is built from a small convolutional neural network, or neural network. The forward propagation of the network predicts the distortion increment Δp, and the backpropagation BP algorithm updates the parameters of the geometric predictor, and further iteratively updates the distortion parameters. After every update of Δp, the parameter p is updated synthetically. Afterwards, the matrix composed of P is multiplied by the training sample, and the result is a corrected picture, and the corrected picture is input to the geometric predictor, so that Δp and p are iteratively updated. Until the number of cycles given in advance is satisfied, the final P value is multiplied by the original training sample, and the obtained result is used as the input of the convolutional neural network designed by the present invention for further feature extraction and classification.

所述步骤5)：卷积神经网络采用前两个卷积加一个池化层，之后两个卷积层，卷积层之后是两个全连接层。Said step 5): the convolutional neural network adopts the first two convolutions plus a pooling layer, followed by two convolutional layers, and two fully connected layers after the convolutional layer.

与目前的基于深度卷积神经网络的手写汉字相比，1)本发明采用反向合成空间变换网络可以将手写汉字集经过扭曲形变而转化为易于识别的汉字图片有效的提高了手写汉字的识别效果.而且避免的人为设计参数对图片的扭曲形变等操作。2)卷积神经网络采用两个卷积层加一个池化层加两个卷积层的网络结构。采用较多的卷积层和小的卷积核来有效的提取汉字笔画特征。Compared with the current handwritten Chinese characters based on the deep convolutional neural network, 1) the present invention adopts the reverse synthesis space transformation network to convert the handwritten Chinese character set into easy-to-recognize Chinese character pictures through twisting and deformation, effectively improving the recognition of handwritten Chinese characters Effect. And avoid man-made design parameters such as distortion and deformation of the picture. 2) The convolutional neural network adopts a network structure of two convolutional layers plus one pooling layer plus two convolutional layers. More convolutional layers and small convolution kernels are used to effectively extract the stroke features of Chinese characters.

附图说明Description of drawings

图1反向合成空间变换网络流程图Figure 1 Flowchart of reverse synthesis space transformation network

图2CNN_4网络结构Figure 2 CNN_4 network structure

图3方案一经过反向空间变换网络前后对比图Figure 3 Comparison diagram of scheme 1 before and after the reverse space transformation network

(a)原手写汉字(a) Original handwritten Chinese characters

(b)经过ICSTN_2纠正的手写汉字(b) Handwritten Chinese characters corrected by ICSTN_2

图4方案二经过反向空间变换网络前后对比图Figure 4: Comparison of scheme 2 before and after the reverse space transformation network

(a)原手写汉字(a) Original handwritten Chinese characters

图5方案三经过反向空间变换网络前后对比图Figure 5. Comparison of scheme three before and after the reverse space transformation network

(a)原手写汉字(a) Original handwritten Chinese characters

(b)经过ICSTN_4纠正的手写汉字(b) Handwritten Chinese characters corrected by ICSTN_4

图6方案一代价函数和测试准确率变化曲线图Figure 6 Scheme 1 Cost function and test accuracy change curve

(a)采用ICSTN_1+CNN_4损失函数(a) Using ICSTN_1+CNN_4 loss function

(b)采用ICSTN_1+CNN_4测试准确率(b) Test accuracy using ICSTN_1+CNN_4

图7方案二代价函数和测试准确率变化曲线图Figure 7 The cost function and test accuracy change curve of scheme 2

(a)采用ICSTN_2+CNN_4损失函数(a) Using ICSTN_2+CNN_4 loss function

(b)采用ICSTN_2+CNN_4测试准确率(b) Test accuracy using ICSTN_2+CNN_4

图8方案三代价函数和测试准确率变化曲线图Figure 8 The three-cost function and test accuracy change curve of the scheme

(a)ICSTN_4+CNN_4损失函数(a) ICSTN_4+CNN_4 loss function

(b)采用ICSTN_4+CNN_4测试准确率(b) Test accuracy using ICSTN_4+CNN_4

图9CNN_4代价函数和测试准确率变化曲线图Figure 9 CNN_4 cost function and test accuracy change curve

(a)采用CNN_4损失函数(a) Using CNN_4 loss function

(b)CNN_4测试准确率(b) CNN_4 test accuracy

具体的实施方式specific implementation

下面结合附图对本发明进一步的详细说明.Below in conjunction with accompanying drawing, the present invention is described in further detail.

以下给出网络的具体构造方法，以及本发明采用的新理论的算法原理及其效果。对本发明的实施进一步的说明。本发明所采用的实例为HWDB1.1的200个种类的汉字，每个类别300个汉字，共60000个样本，其中训练集，交叉验证集和测试机分别为：48000、6000，6000。本发明所设计的网络运行的具体流程为：The specific construction method of the network, as well as the algorithm principle and effect of the new theory adopted by the present invention are given below. The implementation of the present invention is further described. The example that the present invention adopts is 200 kinds of Chinese characters of HWDB1.1, and each category has 300 Chinese characters, totally 60000 samples, wherein training set, cross-validation set and testing machine are respectively: 48000, 6000, 6000. The specific process of the network operation designed by the present invention is:

(1)数据预处理，将获取的二进制图像反转，归一化处理，以提高网络的收敛性。并将图像尺寸统一调整为32*32，然后分配为训练集、交叉验证集和测试集；(1) Data preprocessing, inverting the acquired binary image and normalizing it to improve the convergence of the network. And uniformly adjust the image size to 32*32, and then distribute it as training set, cross-validation set and test set;

(2)将训练样本输入到下一步形变参数，记为I(x)，x＝(x,y)表示图像中任意像素点得坐。p＝[p₁ p₂ … p_n]表示扭曲参数。本发明都以仿射变换为p＝[p₁ p₂ p₃ p₄ p₅ p₆],相关的其次坐标表变换矩阵为：(2) Input the training sample to the next step deformation parameter, denoted as I(x), x=(x,y) means that any pixel point in the image is located. p=[p ₁ p ₂ . . . p _n ] represents a twist parameter. In the present invention, the affine transformation is p=[p ₁ p ₂ p ₃ p ₄ p ₅ p ₆ ], and the relevant secondary coordinate table transformation matrix is:

(3)W(p)对原图像进行变换：(3) W(p) transforms the original image:

ImWarp(x)＝I(x)·W(p) (2)ImWarp(x)=I(x)·W(p) (2)

(4)因为经过扭曲变换图像坐标是非整的导致图像不连续，本发明将经过变换的图像ImWarp(x)进行双线性插值：(4) Because image coordinates are non-integral and cause image discontinuity through twisting and transforming, the present invention carries out bilinear interpolation through transformed image ImWarp(x):

V_i ^c：表示第c个样本的第i个像素点；V _i ^c : represents the i-th pixel of the c-th sample;

(x_i,y_i)：表示训练样本ImWarp的第i个像素点坐标；(x _i , y _i ): Indicates the i-th pixel coordinate of the training sample ImWarp;

W：表示图像的宽；W: Indicates the width of the image;

H：表示图像的高；H: Indicates the height of the image;

(5)将图像V(x)输入到卷积神经网络或者神经网络中，本发明称此网络为几何预测器(后面将会详细的推导为何用神经网络或者卷积神经网络学习扭曲增量)，几何预测器用来学习图像的扭曲增量Δp，其网络结构的组成共有三种方案。最后本发明将会逐一仿真实现每个方案的效果。三种方案分别为：(5) Input the image V(x) into the convolutional neural network or neural network. The present invention refers to this network as a geometric predictor (why the neural network or convolutional neural network is used to learn the distortion increment will be deduced in detail later) , the geometric predictor is used to learn the distortion increment Δp of the image, and there are three schemes for the composition of its network structure. Finally, the present invention will simulate and realize the effect of each scheme one by one. The three options are:

a)方案一：网络的结构为:Conv(7×7,4)+Conv(7×7,8)+P+FC(48)+FC(8),其中以Conv表示卷积层，P表示池化层，FC表示全连接层，括号内分别为卷积核的大小和个数，全连接层括号内为神经元个数；a) Scheme 1: The structure of the network is: Conv(7×7,4)+Conv(7×7,8)+P+FC(48)+FC(8), where Conv represents the convolutional layer, and P represents Pooling layer, FC means fully connected layer, the size and number of convolution kernels in parentheses, and the number of neurons in parentheses in fully connected layer;

b)方案二：网络的结构为:Conv(9×9,8)+FC(48)+FC(8)；b) Scheme 2: The structure of the network is: Conv(9×9,8)+FC(48)+FC(8);

c)方案三：网络的结构为:FC(48)+FC(8)c) Scheme 3: The structure of the network is: FC(48)+FC(8)

几何预测器的前向传播输出扭曲增量Δp，反向传播更新网络参数，进一步产生新的Δp。The forward propagation of the geometric predictor outputs the warp increment Δp, and the backpropagation updates the network parameters to further generate a new Δp.

(6)更新扭曲参数，扭曲参数p的更新方式为合成。具体公式下：(6) Updating the distortion parameter, the update method of the distortion parameter p is synthesis. Under the specific formula:

与之对应的变换矩阵为：The corresponding transformation matrix is:

W(p_out)＝W(ΔP)·W(p_in) (5)W(p _out )＝W(ΔP)·W(p _in ) (5)

(7)如果循环次数recurN＞1,转到步骤(3)直到循环结束。反向合成空间变换网络可以多次循环以获得对齐和扭曲图像最好的效果。(7) If the number of cycles recurN>1, go to step (3) until the end of the cycle. The inverse synthesis spatial transformation network can be cycled multiple times to achieve the best results in aligning and warping images.

(8)循环结束后，将最终更新的扭曲参数p组成的变换矩阵W(p_out)作用在原图像j即：(8) After the loop ends, the transformation matrix W(p _out ) composed of the finally updated distortion parameter p is applied to the original image j, namely:

Im＝I(x)·W(P_out) (6)Im=I(x)·W(P _out ) (6)

，之后将图像输入到卷积神经网络中；, and then input the image into the convolutional neural network;

(9)卷积神经网络的框架如图2，网络结构为Conv(3×3,8)+Conv(3×3,16)+P+Conv(3×3,32)+Conv(3×3,64)+P+FC(100)+FC(200)，每个参数具体的代表什么与上述的相同。接下来我们以一个卷积层一个池化层和一个全连接层进行实例讲解。(9) The framework of the convolutional neural network is shown in Figure 2. The network structure is Conv(3×3,8)+Conv(3×3,16)+P+Conv(3×3,32)+Conv(3×3 ,64)+P+FC(100)+FC(200), what each parameter represents is the same as above. Next, we will use a convolutional layer, a pooling layer, and a fully connected layer to give an example.

卷积层，积层的主要内容是采用可训练的卷积核来对输入数据进行卷积操作，并将结果以某种组合形式输出，其实质上就是对输入数据的特征提取。一般情况下，为了使模型获得非线性特性并把输出限制在给定的范围内，会将卷积的输出用一个非线性函数(Non-linear Function)来转换。这个函数也称为激活函数(Activation Function)。本实例采用的是收敛速度较快的Relu激活函数。卷积层的具体运算公式为:The convolutional layer, the main content of the convolutional layer is to use a trainable convolution kernel to perform convolution operations on the input data, and output the results in a certain combination form, which is essentially the feature extraction of the input data. In general, in order to obtain nonlinear characteristics of the model and limit the output within a given range, the output of the convolution is converted with a non-linear function (Non-linear Function). This function is also called the activation function (Activation Function). This example uses the Relu activation function with a faster convergence speed. The specific operation formula of the convolutional layer is:

其中Im为反向空间变换网络的输出，就是经过反向空间变换网络纠正的扭曲汉字的训练样本，W为卷积核，b为卷积层偏置；U为卷积层输出，f即为Relu激活函数，Y为卷积输出U经过激活函数的输出，是整个卷积层的输出，也是下一层的输入。Among them, Im is the output of the reverse space transformation network, which is the training sample of distorted Chinese characters corrected by the reverse space transformation network, W is the convolution kernel, b is the convolution layer bias; U is the convolution layer output, and f is Relu activation function, Y is the output of the convolution output U through the activation function, which is the output of the entire convolution layer and the input of the next layer.

池化层，主要任务是在二维空间对输入的数据样本进行采样操作，也称为下采样操作，具体的计算过程如下：The main task of the pooling layer is to sample the input data samples in two-dimensional space, also known as the downsampling operation. The specific calculation process is as follows:

池化层可使网络对平移，旋转等具有鲁棒性。使网络有更好的容忍度。Pooling layers make the network robust to translation, rotation, etc. Make the network more tolerant.

全连接层，全连接层的作用是将输入的二维特征矩阵降维到一维特征向量,便于输出层进行分类处理。Fully connected layer, the function of the fully connected layer is to reduce the input two-dimensional feature matrix to a one-dimensional feature vector, which is convenient for the output layer to perform classification processing.

输出层，输出层的作用是根据上面全连接层的输出的一维向量进行分类，本列采用的是softmax交叉熵损失Output layer, the function of the output layer is to classify according to the one-dimensional vector output of the above fully connected layer. This column uses the softmax cross entropy loss

图像的扭曲偏移量Δp可以由卷积神经网络模型求取是因为对于反向合成KL图像对齐算法：The distortion offset Δp of the image can be calculated by the convolutional neural network model because for the reverse synthesis KL image alignment algorithm:

式(9)反向合成图像对齐的目标函数，其中I是原图像，T是目标图像，p：扭曲参数、Δp：要求取的扭曲增量；I(p)和T(Δp)分别表示经过扭曲参数作用的图像。然后，将上式(9)经过一阶Taylor近似得:Equation (9) is the objective function of reverse composite image alignment, where I is the original image, T is the target image, p: the distortion parameter, Δp: the required distortion increment; I(p) and T(Δp) respectively represent the Image of the effect of warping parameters. Then, the above formula (9) is approximated by the first-order Taylor:

其最小二乘解为：Its least squares solution is:

其中+符号代表广义逆。接下来参数p的更新方式为：where the + sign represents the generalized inverse. Next, the parameter p is updated in the following way:

其中，为扭曲参数p的合成运算，式(10)的解就是个线性回归形式，可以将其转化为一般式：Among them, is the synthesis operation of the distortion parameter p, the solution of formula (10) is a linear regression form, which can be transformed into a general formula:

Δp＝R·I(p)+b (13)Δp=R·I(p)+b (13)

其中R是估计图像和图像的几何形状之间的线性关系的线性回归，b为偏置。主要是求取参数R和b。线性回归模型我们可以将其参数化为神经网络或者卷积神经网络，网络的输出即为Δp。where R is a linear regression that estimates the linear relationship between the image and the geometry of the image, and b is the bias. Mainly to obtain parameters R and b. We can parameterize the linear regression model as a neural network or a convolutional neural network, and the output of the network is Δp.

反向合成空间变换网络不是由网络直接预测图像的几何变换，而是在分类之前多次循环使用几何预测器更新扭曲参数p，而且在每次循环之后，扭曲参数p在输出图像后得以保存，这将使得在重新采样原图像时避免边缘效应。几何预测子{R,b},用监督梯度下降优化和学习的目标函数，其目标函数为：The reverse synthetic spatial transformation network does not directly predict the geometric transformation of the image by the network, but uses the geometric predictor to update the warp parameter p multiple times before classification, and after each cycle, the warp parameter p is saved after outputting the image, This will avoid edge effects when resampling the original image. The geometric predictor {R,b}, the objective function optimized and learned with supervised gradient descent, its objective function is:

Δp:扭曲增量，是几何预测器的输出Δp: Warp increment, is the output of the geometry predictor

M：图像的样本数M: the number of samples of the image

由公式，(12)和(13)可得：From the formula, (12) and (13) can get:

采用合成算法更新参数，不用每次都更新图像对参数的一阶偏导数，反向传播中求取梯度为中，扭曲参数的偏导数是个闭式数学表达式，P_in，P_out分别为输入输出的扭曲参数：Use the synthetic algorithm to update the parameters, instead of updating the first-order partial derivative of the image to the parameter every time, the gradient is calculated in the backpropagation, and the partial derivative of the distortion parameter is a closed mathematical expression, P _in and P _out are the input respectively Warp parameters for output:

I是单位矩阵，它使得梯度可以反向传播到几何预测器。I is the identity matrix, which enables gradients to be backpropagated to the geometric predictor.

为了直观地说明我们构造的网络框架的可行性和有效性，本发明通过实验仿真进行了验证。并逐一验证了我们设计的几何预测器的网络的三个方案。每个方案所组成的框架在训练时采用相同的参数配置：epoch＝16，初始学习率为0.001，每两个epoch学习率乘以0.8的因子。为了对比采用反相合成空间的有效性，本发明单独仿真实现了仅有卷积神经网络的识别效果。In order to intuitively illustrate the feasibility and effectiveness of the network framework we constructed, the present invention is verified through experimental simulation. And verified the three schemes of the geometric predictor network we designed one by one. The framework composed of each scheme adopts the same parameter configuration during training: epoch=16, the initial learning rate is 0.001, and the learning rate is multiplied by a factor of 0.8 every two epochs. In order to compare the effectiveness of using the inverse synthesis space, the present invention realizes only the recognition effect of the convolutional neural network through separate simulation.

首先本发明可视化了经过反向空间变换网络之前和之后的训练样本。方案一、二、三所对应的模型分别为：ICSTN_1+CNN_4、ICSTN_2+CNN_4和ICSTN_4+CNN_4。其中ICSTN_后面的数字为模型深度。如图3、4、5分别为三个不同的几何预测器模型对样本变换前后对比图。从整体看，在经过反向合成空间变换网络之后，每个汉字大小相同、笔画均匀、排列整齐，一些扭曲的笔画得到纠正，比原汉字辨识度高了很多。不同模型间的差异也相当显著，整体趋势是几何预测器的网络越深，经过空间变换的汉字笔画就越清晰，而且旋转过度情况较少，越多的扭曲笔画得到纠正。如图5(b)中汉字“巾”、“臼”、“尽”等字得到恰当的纠正变形。而图3效果最差，有较多的汉字产生了倾斜。综上可得出，反向合成空间变换网络的几何预测器的网络越深越能够有效的纠正扭曲形变的汉字。Firstly, the present invention visualizes the training samples before and after the inverse spatial transformation network. The models corresponding to schemes 1, 2, and 3 are: ICSTN_1+CNN_4, ICSTN_2+CNN_4, and ICSTN_4+CNN_4. The number after ICSTN_ is the model depth. Figures 3, 4, and 5 are the comparison charts of three different geometric predictor models before and after sample transformation. On the whole, after the reverse synthesis space transformation network, each Chinese character has the same size, uniform strokes, and neat arrangement, and some distorted strokes have been corrected, which is much more recognizable than the original Chinese characters. The difference between different models is also quite significant, and the overall trend is that the deeper the geometric predictor network, the clearer the spatially transformed Chinese character strokes, and the less over-rotation, the more distorted strokes are corrected. As shown in Figure 5(b), Chinese characters such as "jin", "jiu", and "jin" have been properly corrected and deformed. And the effect of Fig. 3 is the worst, there are more Chinese characters to produce inclination. In summary, it can be concluded that the deeper the geometric predictor network of the reverse synthesis space transformation network, the more effective it is to correct the distorted Chinese characters.

其次分析各网络框架的迭代曲线图：图6、7、8分别为三个方案所组成三个模型的损失函数曲线和测试准确率曲线。纵坐标的迭代次数是以50为单位，即每50次训练迭代的结果显示一次，所以共50000次迭代，10次epoch。从三个图的(a)图来看ICSTN_2+CNN_4和ICSTN_4+CNN_4模型收敛较快，模型ICSTN_4+CNN_4的模型稳定性最好，因为曲线的尖点较少。但是ICSTN_2+CNN_4模型的loss最接近零点。从三个图中的(b)图中可看得ICSTN_2+CNN_4的识别率最高。那为什么方案三形成的模型纠正汉字的效果最好，却没有得到最好的识别率呢？原因主要是ICSTN_4+CNN_4模型的参数较多在相同的epoch下没有得到充分的训练。CNN模型与方案一，二、三组成的模型相比差距较大，图9中的(a)(b)与图6、7、8的(a)(b)图相比损失函数收敛的值最大，识别率仅有82.08％。CNN_4的识别效果不好是因为模型较浅。Next, analyze the iteration curves of each network framework: Figures 6, 7, and 8 are the loss function curves and test accuracy curves of the three models composed of the three schemes. The number of iterations on the ordinate is in units of 50, that is, the results of every 50 training iterations are displayed, so there are a total of 50,000 iterations and 10 epochs. From (a) of the three figures, it can be seen that the ICSTN_2+CNN_4 and ICSTN_4+CNN_4 models converge faster, and the model ICSTN_4+CNN_4 has the best model stability because the curve has fewer sharp points. But the loss of the ICSTN_2+CNN_4 model is closest to zero. From (b) of the three figures, it can be seen that ICSTN_2+CNN_4 has the highest recognition rate. Then why does the model formed by Scheme 3 have the best effect of correcting Chinese characters, but does not get the best recognition rate? The main reason is that the ICSTN_4+CNN_4 model has many parameters and has not been fully trained under the same epoch. Compared with the model composed of schemes 1, 2, and 3, the CNN model has a large gap. Compared with (a) (b) in Figure 9 and (a) (b) in Figures 6, 7, and 8, the value of the loss function convergence The largest, the recognition rate is only 82.08%. The recognition effect of CNN_4 is not good because the model is shallow.

最后从表1可得，从训练模型消耗的时间上来看，反向空间变换网络越深所消耗的时间越多。所以综合效率以及识别率等因素，ICSTN_2+CNN_4的效果和料率最好。从测试的准确率来看。本发明有明显的优势。而CNN_4模型与ICSTN_1+CNN_4模型训练时间相差较小但是识别效果差距明显。ICSTN_1仅采用了一个一层的神经网路。所以本发明实现了反向空间变换网络和较浅的卷积神经网路相结合却得到了一个较高的识别的效果。Finally, it can be seen from Table 1 that from the perspective of the time consumed by training the model, the deeper the inverse spatial transformation network, the more time it consumes. Therefore, factors such as comprehensive efficiency and recognition rate, ICSTN_2+CNN_4 have the best effect and material rate. from the accuracy of the test. The present invention has clear advantages. The training time difference between the CNN_4 model and the ICSTN_1+CNN_4 model is small, but the difference in recognition effect is obvious. ICSTN_1 uses only one layer of neural network. Therefore, the present invention realizes the combination of the reverse spatial transformation network and the shallower convolutional neural network but obtains a higher recognition effect.

表1不同网络结构的识别效果Table 1 Recognition effect of different network structures

模型类别model class 识别准确率(％)Recognition accuracy (%) 耗时(h)Time-consuming (h) CNN_4CNN_4 82.0882.08 5.165.16 ICSTN_4+CNN_4ICSTN_4+CNN_4 95.8395.83 17.7217.72 ICSTN_2+CNN_4ICSTN_2+CNN_4 96.3896.38 13.4013.40 ICSTN_1+CNN_4ICSTN_1+CNN_4 95.3295.32 6.926.92

Claims

A kind of 1. method of the Off-line Handwritten Chinese Recognition based on depth convolutional neural networks, it is characterised in that including following step Suddenly：

TensorFlow deep learning frameworks are built on Windows；

Prepare data set, data set is converted into TensorFlow input form, and pretreatment is normalized in picture；

Inverse composition spatial alternation network and the convolutional neural networks of depth are built under TensorFlow environment, using reverse Blended space network carries out twist distortion to handwritten Chinese character, is corrected and alignd.And output it as convolutional neural networks Input；

Finally the network of inverse composition spatial alternation network and convolutional neural networks formation is trained using substantial amounts of data, Test.And the picture during network processes and final testing result are visualized by TensorBoard instruments, analyze its work With, and contrast recognition effect.
2. according to step described in claims 1, handwritten Chinese character is corrected using inverse composition spatial alternation, its feature exists In：

The homogeneous matrix formed using warp parameters enters line translation alignment, exemplified by warp parameters are in a manner of affine, ginseng to original image Number p=[p₁ p₂ p₃ p₄ p₅ p₆], corresponding homogeneous transform matrix：

<mrow> <mi>W</mi> <mrow> <mo>(</mo> <mi>p</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfenced open = "[" close = "]"> <mtable> <mtr> <mtd> <mrow> <mn>1</mn> <mo>+</mo> <msub> <mi>p</mi> <mn>1</mn> </msub> </mrow> </mtd> <mtd> <msub> <mi>p</mi> <mn>2</mn> </msub> </mtd> <mtd> <msub> <mi>p</mi> <mn>3</mn> </msub> </mtd> </mtr> <mtr> <mtd> <msub> <mi>p</mi> <mn>4</mn> </msub> </mtd> <mtd> <mrow> <mn>1</mn> <mo>+</mo> <msub> <mi>p</mi> <mn>1</mn> </msub> </mrow> </mtd> <mtd> <msub> <mi>p</mi> <mn>6</mn> </msub> </mtd> </mtr> <mtr> <mtd> <mn>0</mn> </mtd> <mtd> <mn>0</mn> </mtd> <mtd> <mn>1</mn> </mtd> </mtr> </mtable> </mfenced> </mrow>

Acted on image, twist distortion is carried out to image I (x), include the operation such as the rotation of image, translation, scaling.

ImWarp (x)=I (x) W (p)

ImWarp (x) is the picture corrected.Parameter p is to be updated to ask for by the way of iteration, and p distortion increases Amount Δ p is that this network is referred to as into geometry fallout predictor by a convolutional neural networks or neural computing, the present invention.By Inspired in inverse composition spatial alternation principle by inverse composition Lucas-Kanade algorithms：

<mrow> <munder> <mrow> <mi>m</mi> <mi>i</mi> <mi>n</mi> </mrow> <mrow> <mi>&Delta;</mi> <mi>p</mi> </mrow> </munder> <mo>|</mo> <mo>|</mo> <mi>I</mi> <mrow> <mo>(</mo> <mi>P</mi> <mo>)</mo> </mrow> <mo>-</mo> <mi>T</mi> <mrow> <mo>(</mo> <mi>&Delta;</mi> <mi>P</mi> <mo>)</mo> </mrow> <mo>|</mo> <msubsup> <mo>|</mo> <mn>2</mn> <mn>2</mn> </msubsup> </mrow>

Shift onto and convert by a series of, its solution is：

Δ p=RI (p)+b

Last solution is linear regression forms, it is possible to which its solution procedure parameter is turned into neutral net or convolutional Neural net Network；

Seeking the network structure of distortion increment Delta p geometry fallout predictor has three kinds of schemes, and two schemes are convolutional neural networks structures, One neutral net for individual layer.There is detailed structure explanation in specification.We eventually verify each scheme and divided one by one Analyse its validity.

The Δ p of geometry fallout predictor renewal synthesizes with p before, undated parameter p.The p value of renewal can be existed from new role afterwards On image, loop iteration asks for optimal p value.Defined according to synthesis, update mode is as follows：

By the way of synthesis it is the gradient that need not remove to ask sample image every time come the benefit of undated parameter.Greatly accelerate anti- To the calculating speed of propagation.

It will be finally input to by the picture corrected in convolutional neural networks and carry out feature extraction and classifying.Convolutional neural networks Structural order is respectively：Two convolutional layers, then a pond layer, two convolutional layers, a pond, it is finally two full connections Layer.Final output neuron number is identical with the classification of present example.