CN106469304A - Position locating method of handwritten signature in bill based on deep convolution neural network - Google Patents


Info

Publication number
CN106469304A
CN106469304A (application CN201610841643.8A)
Authority
CN
China
Prior art keywords
step
neural network
layer
image data
position
Prior art date
Application number
CN201610841643.8A
Other languages
Chinese (zh)
Inventor
张二虎
李雪薇
Original Assignee
西安理工大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西安理工大学 filed Critical 西安理工大学
Priority to CN201610841643.8A priority Critical patent/CN106469304A/en
Publication of CN106469304A publication Critical patent/CN106469304A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00154Reading or verifying signatures; Writer recognition
    • G06K9/00161Reading or verifying signatures; Writer recognition based only on signature image, e.g. static signature recognition

Abstract

The invention discloses a method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network. The method is implemented in the following steps. Step 1: build a platform based on the caffe deep learning framework, which includes a variety of convolutional neural network models. Step 2: prepare a data set of bill images. Step 3: train the network to obtain a localization and detection model. Step 4: use the localization and detection model obtained in step 3 to locate the position of the handwritten signature in a bill to be examined. With this method, the position of a handwritten signature in a bill can be calibrated accurately.

Description

Method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network

TECHNICAL FIELD

[0001] The present invention belongs to the technical field of image localization and detection, and in particular relates to a method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network.

BACKGROUND

[0002] At present, research on automatic bill inspection systems in China is still mostly at the development stage, with few practical deployments, and the high cost of configuring a system that detects the designated handwritten-signature position on a bill has also limited the development of automated bill inspection technology. This creates a demand among research institutions and scholars for an automated bill inspection technique. A handwritten signature in a bill has the characteristics of text, and the main approaches to character recognition are: recognition based on statistical features, recognition based on structural features, and recognition based on neural networks. Statistical features include the two-dimensional position of a character, the histogram of its horizontal or vertical projection, and so on; character recognition based on statistical features discriminates poorly between similar-looking characters and is suitable only for coarse classification. Structural features include stroke direction, isolated dots, and the presence or absence of closed strokes; this approach makes it easy to distinguish characters whose shapes vary greatly. Research on neural networks is currently at a new peak, and neural networks are already widely used in pattern recognition. As academia has studied deep learning, deep learning algorithms have matured and found more and more applications. However, most neural networks can only be used to extract features of a target and cannot be used for target localization.

SUMMARY

[0003] The object of the present invention is to provide a method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network, which can accurately calibrate the position in a bill that contains a handwritten signature.

[0004] The technical scheme adopted by the present invention is a method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network, implemented in the following steps:

[0005] Step 1: build a platform based on the caffe deep learning framework, which includes a variety of convolutional neural network models;

[0006] Step 2: prepare a data set of bill images;

[0007] Step 3: train the network to obtain a localization and detection model;

[0008] Step 4: use the localization and detection model obtained in step 3 to locate the position of the handwritten signature in a bill to be examined.

[0009] The present invention is further characterized as follows:

[0010] Step 2 is specifically:

[0011] Step 2.1: photograph bills to obtain original bill image data, and augment the original image data with additional samples;

[0012] Step 2.2: number and annotate all image data obtained in step 2.1, marking the coordinates of the handwritten-signature position in each bill image to obtain the top-left coordinates (Xmin, Ymin) and bottom-right coordinates (Xmax, Ymax) of the signature position, and write the index of every image together with the corresponding coordinate information into an xml file;

[0013] Step 2.3: divide all image data into a training data set and a test data set, and further divide the training data set into training data and validation data.

[0014] The sample augmentation of the original image data in step 2.1 includes:

[0015] ① rotating the original images by different angles and translating them in different directions;

[0016] ② scaling the original images with linear interpolation;

[0017] ③ adding salt-and-pepper noise and Gaussian noise of different intensities to the original images.

[0018] Step 3 is specifically:

[0019] Step 3.1: resize the images in the data set obtained in step 2 to W1*H1, feed them into the first 5 layers of the ZF network for feature extraction, and output 256 feature maps of size (W1/16)*(H1/16);

[0020] Step 3.2: convolve the 256 feature maps obtained in step 3.1 with a 3*3 convolution kernel to obtain a 256-dimensional feature vector, which serves as the first layer of the RPN;

[0021] Step 3.3: feed the 256-dimensional feature vector obtained in step 3.2 into two parallel convolutional layers, a classification layer and a regression layer, and select the 300 highest-scoring candidate boxes according to the foreground-probability scores of the positive samples;

[0022] Step 3.4: use a ROI_Pooling layer to map the 300 candidate boxes from step 3.3 onto the 256 feature maps produced by the fifth convolutional layer of the ZF network, obtaining pooled, normalized feature maps of size 6*6;

[0023] Step 3.5: feed each 6*6 feature map into two consecutive fully connected layers fc6 and fc7; fc6 first produces a 4096-dimensional feature, which is then fed into fc7 to finally obtain a 1*4096-dimensional feature;

[0024] Step 3.6: feed the 1*4096-dimensional feature into two parallel fully connected layers, a cls_score layer and a bbox_predict layer; the cls_score layer performs classification and outputs the probability of the background and the probabilities of the K sample classes, where K is the number of sample classes; the bbox_predict layer adjusts the position of the candidate region and outputs (x', y', w', h') for each candidate box, where x' is the adjusted abscissa of the top-left corner of the candidate box, y' is the adjusted ordinate of the top-left corner, w' is the adjusted width of the candidate box, and h' is the adjusted height of the candidate box;

[0025] Step 3.7: judge whether the threshold, the total number of iterations, has been reached; if not, go to step 3.2; otherwise, terminate.

[0026] The target size W1*H1 in step 3.1 is computed from the data-set image size W*H so that the aspect ratio is preserved, i.e.:

[0027] W1/W = H1/H.

[0028] When training the network in step 3, the initial learning rate is set to lr = 0.01; whenever the current iteration count reaches an integer multiple of the step value, the learning rate is decayed once, and training ends when the iteration count reaches the total number of iterations; after decay, lr = lr*gamma, where gamma = 0.1 and the iteration count ≤ the total number of iterations.

[0029] When training the network in step 3, the mini-batch size is set to 256.

[0030] The beneficial effects of the present invention are as follows: the present invention locates the position of a handwritten signature in a bill with a deep convolutional neural network, which improves both the speed and the accuracy of localization over traditional methods; the accuracy already reaches 90.9%, the speed is essentially real-time, and locating one image takes 0.3 s. Moreover, the bill image database of the present invention contains a wide variety of data samples, and this diversity raises the localization accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS

[0031] FIG. 1 is a normal bill image collected in the method of the present invention;

[0032] FIG. 2 is a bill image rotated by 45° collected in the method of the present invention;

[0033] FIG. 3 is a bill image with added salt-and-pepper noise collected in the method of the present invention;

[0034] FIG. 4 is a bill image to be examined, rotated by 90°, in the method of the present invention;

[0035] FIG. 5 shows the detection result of FIG. 4;

[0036] FIG. 6 is a bill image to be examined, rotated by 180°, in the method of the present invention;

[0037] FIG. 7 shows the detection result of FIG. 6;

[0038] FIG. 8 is a bill image to be examined, rotated by 45°, in the method of the present invention;

[0039] FIG. 9 shows the detection result of FIG. 8;

[0040] FIG. 10 is a bill image to be examined with added Gaussian noise in the method of the present invention;

[0041] FIG. 11 shows the detection result of FIG. 10;

[0042] FIG. 12 is a bill image to be examined with added salt-and-pepper noise in the method of the present invention;

[0043] FIG. 13 shows the detection result of FIG. 12.

DETAILED DESCRIPTION

[0044] The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

[0045] The method of the present invention for locating the position of a handwritten signature in a bill based on a deep convolutional neural network is implemented in the following steps:

[0046] Step 1: build, in an ubuntu or Windows environment, a platform based on the caffe deep learning framework, which includes a variety of convolutional neural network models;

[0047] Step 2: prepare a data set of bill images, specifically:

[0048] Step 2.1: since in a real environment the images are photographed by users themselves on mobile phones or cameras and uploaded to the bill system, photographs taken by phones of different resolutions and under different ambient-lighting conditions must be considered when preparing the images. The present invention photographs bills with a variety of phones of different resolutions, and the images captured in this way are called the original image data. To make the image data sufficiently plentiful and representative of various practical situations, the present invention augments the original image data with additional samples: ① rotating the original images by different angles and translating them in different directions; ② scaling the original images with linear interpolation, to account for the different image sizes produced by different cameras; ③ adding salt-and-pepper noise and Gaussian noise of different intensities to the original images;
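The augmentations of step 2.1 can be sketched as follows. This is an illustrative Python/NumPy sketch only, not part of the patent disclosure; the function names are hypothetical, and arbitrary-angle rotation, which in practice would use an image library such as OpenCV, is omitted here.

```python
import numpy as np

def translate(img, dx, dy):
    """Shift an image by (dx, dy) pixels, filling vacated areas with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    ys = slice(max(dy, 0), min(h + dy, h))
    xs = slice(max(dx, 0), min(w + dx, w))
    ys_src = slice(max(-dy, 0), min(h - dy, h))
    xs_src = slice(max(-dx, 0), min(w - dx, w))
    out[ys, xs] = img[ys_src, xs_src]
    return out

def scale_bilinear(img, new_h, new_w):
    """Resize a grayscale image with (bi)linear interpolation."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    a = img[np.ix_(y0, x0)]; b = img[np.ix_(y0, x1)]
    c = img[np.ix_(y1, x0)]; d = img[np.ix_(y1, x1)]
    top = a * (1 - wx) + b * wx
    bot = c * (1 - wx) + d * wx
    return top * (1 - wy) + bot * wy

def salt_pepper(img, amount, rng):
    """Flip a fraction `amount` of the pixels to 0 or 255."""
    out = img.copy()
    mask = rng.random(img.shape) < amount
    out[mask] = rng.choice([0, 255], size=mask.sum())
    return out

def gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to [0, 255]."""
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(img.dtype)
```

Each augmented copy keeps the original image's annotation semantics, so the signature coordinates of step 2.2 must be transformed consistently when translation or scaling is applied.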

[0049] FIGs. 1-3 show some of the collected bill sample images, including a normal bill image, a rotated bill image, and a bill image containing noise.

[0050] Step 2.2: number and annotate all image data obtained in step 2.1, marking the coordinates of the handwritten-signature position in each bill image to obtain the top-left coordinates (Xmin, Ymin) and bottom-right coordinates (Xmax, Ymax) of the signature position, and write every image index together with the corresponding coordinate information into an xml file;

[0051] Step 2.3: randomly divide all the prepared image data into two parts, a trainval data set and a test data set, where the trainval set makes up 8/10 of the whole data set and the test set makes up 2/10. The trainval set is further divided into train data, used for training and making up 4/5 of the trainval set, and val data, used for validation and making up 1/5 of the trainval set.
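The split ratios of step 2.3 can be sketched as follows (an illustrative sketch; the patent does not publish code, and the seeded shuffle is an assumption added for reproducibility):

```python
import random

def split_dataset(image_ids, seed=0):
    """Randomly split ids into trainval (8/10) and test (2/10),
    then split trainval into train (4/5) and val (1/5)."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_trainval = len(ids) * 8 // 10
    trainval, test = ids[:n_trainval], ids[n_trainval:]
    n_train = len(trainval) * 4 // 5
    return {"train": trainval[:n_train],
            "val": trainval[n_train:],
            "test": test}
```

With 100 images this yields 64 train, 16 val, and 20 test images, i.e., the 8/10 and 4/5 proportions stated above.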

[0052] Step 3: train the network to obtain the localization and detection model, and optimize the training parameters.

[0053] Step 3.1: resize the images in the data set obtained in step 2 to W1*H1 (600*800 in the present invention), feed them into the first 5 layers of the ZF network for feature extraction, and output 256 feature maps of size (W1/16)*(H1/16) (37*50 in the present invention). The resized dimensions are computed from the aspect ratio of the data-set images (longest side of the image / shortest side of the image); with the input image size W*H and the resized image size W1*H1, the relation is:

[0054] W1/W = H1/H.
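One way to satisfy the relation W1/W = H1/H while reaching the 600*800 working size is to scale by the shorter side, as sketched below. The choice of 600 as the target short side is an assumption inferred from the 600*800 figure; the patent itself fixes the output size directly.

```python
def resized_dims(w, h, target_short=600):
    """Scale (w, h) by one factor so the shorter side becomes
    target_short, keeping W1/W == H1/H."""
    scale = target_short / min(w, h)
    return round(w * scale), round(h * scale)
```

For a 1200*1600 or 3000*4000 input this gives 600*800, and dividing by the ZF network's stride of 16 gives the 37*50 feature-map size quoted in step 3.1.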

[0055] The reason the data-set images are resized to 600*800:

[0056] The images in the data set vary in size. When computing the normalized image size, the image sizes accounting for the largest share of the data set were selected, together with the longest-side to shortest-side ratios most common in the data set; the qualifying sizes include 600*800, 1200*1600, 1500*2000, 2000*2600, and 3000*4000, and these were taken as candidate resized dimensions. Since the resized images must undergo convolution, and considering the amount of computation and the GPU memory size, the smaller candidates are preferred as far as possible; 600*800 and 1200*1600 were therefore tried for training. Training with 600*800 images gives an accuracy of 90.84%, while training with 1200*1600 images gives 90.86%; the latter improves accuracy by only 0.02% but is far more costly in training time and computational complexity, so 600*800 was finally chosen as the resized dimensions.

[0057] Step 3.2: convolve the 256 feature maps obtained in step 3.1 with a 3*3 convolution kernel (sliding window); over each 3*3 region, one value is obtained from each feature map, so the 256 feature maps yield a 256-dimensional feature vector, which serves as the first layer of the RPN (region proposal network);

[0058] Each 3*3 sliding-window center position corresponds, in the input image, to target-region candidate boxes of 3 scales (128, 256, 512) and 3 aspect ratios (1:2, 2:1, 7:1); this mapping mechanism is called an anchor, and it produces k = 9 anchors. That is, each 3*3 region can produce 9 target-region candidate boxes. For the 37*50 feature map of the present invention there are therefore about 20,000 (37*50*9) anchors in total, i.e., about 20,000 target-region candidate boxes are predicted on the input image.

[0059] The 3 scales (128, 256, 512) of the target-region candidate boxes essentially mean that the candidate-box areas are 128*128, 256*256, and 512*512. These areas are related to the normalized image size of step 3.1; as far as possible, the candidate-box area should enclose the target in the image.

[0060] The aspect ratios 1:2, 2:1, and 7:1 of the target-region candidate boxes were chosen from the ratios of the width (Wbox) to the height (Hbox) of the target candidate boxes in each image of the data set; the three ratios covering the largest numbers of images were taken as the candidate-box aspect ratios Wbox:Hbox, where Wbox = Xmax - Xmin, Hbox = Ymax - Ymin, and (Xmin, Ymin), (Xmax, Ymax) are the top-left and bottom-right coordinate values of the handwritten-signature positions annotated in step 2.2.
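Anchor shapes for the scales and aspect ratios above can be sketched as follows, using the usual Faster R-CNN convention that each anchor keeps the area scale*scale while taking the width:height ratio of its group (a sketch under that assumption; the patent does not give the exact construction):

```python
import itertools
import math

def make_anchors(scales=(128, 256, 512),
                 ratios=((1, 2), (2, 1), (7, 1))):
    """One (w, h) pair per scale/ratio combination, 9 anchors in total.
    Solve w*h = s^2 and w/h = rw/rh for each combination."""
    anchors = []
    for s, (rw, rh) in itertools.product(scales, ratios):
        h = math.sqrt(s * s * rh / rw)
        w = s * s / h
        anchors.append((w, h))
    return anchors
```

Centering these 9 shapes on every one of the 37*50 sliding-window positions is what yields the roughly 20,000 candidate boxes quoted above.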

[0061] Step 3.3: feed the 256-dimensional feature vector into two parallel convolutional layers, a classification layer and a regression layer, used for classification and for bounding-box regression respectively. Locally, these two layers are fully connected networks; globally, since the network uses the same parameters at every position (37*50 positions in total), they are in practice implemented as convolutional networks with 1*1 kernels. Note that no candidate window is extracted explicitly; the network itself performs the judgment and the refinement.

[0062] For each candidate box, the classification layer outputs from the 256-dimensional feature the probabilities of belonging to the foreground and to the background, and each candidate box is labeled: a positive sample overlaps the ground-truth region by more than 0.7, a negative sample overlaps it by less than 0.3, and the positive samples are retained;

[0063] at the same time, the regression layer outputs from the 256-dimensional feature 4 translation-and-scaling parameters (x, y, w, h), where x is the abscissa of the top-left corner of the candidate box, y is the ordinate of the top-left corner, w is the width of the candidate box, and h is its height; these four coordinate elements determine the target position.

[0064] After step 3.2, the present invention predicts 20,000 candidate boxes on the input image; after step 3.3, about 2,000 of the 20,000 predicted candidate boxes remain, and finally the 300 highest-scoring boxes are selected according to the foreground-probability scores of the positive samples.

[0065] Step 3.4: use a ROI_Pooling layer to map the 300 candidate boxes from step 3.3 onto the 256 feature maps produced by the fifth convolutional layer of the ZF network, obtaining pooled, normalized feature maps of size 6*6.

[0066] The ROI_Pooling layer implements the mapping from a region of the original image to the corresponding conv5 region, followed by pooling to a fixed size.

[0067] First the coordinates of the predicted candidate boxes are mapped onto the feature maps, i.e., the original coordinates are multiplied by one sixteenth; then the computation is performed per output: the 300 candidate boxes of different sizes already mapped onto the feature maps are pooled, and the pooled results are uniformly normalized into feature maps of size 6*6.
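The coordinate mapping and fixed-size pooling of steps 3.4 and [0067] can be sketched for a single channel as follows (an illustrative max-pooling sketch; the patent names the layer but not its internals, and details such as the rounding of cell boundaries are assumptions here):

```python
import numpy as np

def roi_pool(feature_map, box, out_size=6, stride=16):
    """Map an image-space box (x1, y1, x2, y2) onto the feature map
    by dividing by the stride (16), then max-pool the region into a
    fixed out_size x out_size grid.
    feature_map: (H, W) array, e.g. one of the 256 conv5 channels."""
    x1, y1, x2, y2 = (int(v / stride) for v in box)
    x2 = max(x2, x1 + 1); y2 = max(y2, y1 + 1)   # keep at least 1 cell
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max()
    return out
```

Applying this to each of the 256 channels of each of the 300 candidate boxes yields the 6*6 feature maps that feed fc6.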

[0068] Step 3.5: feed each 6*6 feature map (300 feature maps in total) into two consecutive fully connected layers fc6 and fc7 (these two fully connected layers are consecutive, not parallel); fc6 first produces a 4096-dimensional feature, which is then fed into fc7 to finally obtain a 1*4096-dimensional feature.

[0069] Step 3.6: feed the 1*4096-dimensional feature into two parallel fully connected layers, a cls_score layer and a bbox_predict layer; the cls_score layer performs classification and outputs the probability of the background and the probabilities of the K sample classes, where K is the number of sample classes; the bbox_predict layer adjusts the position of the candidate region and outputs (x', y', w', h') for each candidate box, where x' is the adjusted abscissa of the top-left corner of the candidate box, y' is the adjusted ordinate of the top-left corner, w' is the adjusted width of the candidate box, and h' is the adjusted height of the candidate box;

[0070] Step 3.7: judge whether the threshold, the total number of iterations (8000 in the present invention), has been reached; if not, go to step 3.2; otherwise, terminate.

[0071] Choice of the total number of iterations: observe the value of the loss during training; when the loss no longer drops sharply and stabilizes, the current iteration count can be chosen as the final number of iterations.

[0072] When training the network, the initial learning rate is set to lr = 0.01; whenever the current iteration count reaches an integer multiple of the step value (6000 in the present invention), the learning rate is decayed once, and training ends when the iteration count reaches the total number of iterations; after decay, lr = lr*gamma, where gamma = 0.1 and the iteration count ≤ the total number of iterations.

[0073] When optimizing with the gradient descent algorithm, the weight-update rule multiplies the gradient term by a coefficient, and this coefficient is called the learning rate lr. If the learning rate is too small, the network converges too slowly; if it is too large, the cost function oscillates. A good strategy in practice is to first set the learning rate to 0.01 and then watch the training cost: if the training cost is decreasing, the learning rate can be increased step by step, e.g. 0.1, 1.0, ...; if the training cost is increasing, the learning rate is decreased, e.g. 0.001, 0.0001, .... The value of the learning rate is determined by this method.

[0074] When the learning rate decays is governed by the step value; by how much it decays is governed by gamma. The step value can be chosen as close as possible to the total number of iterations.
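The schedule of paragraphs [0072]-[0074] is the standard step-decay policy (in caffe terms, lr_policy "step" with base_lr 0.01, gamma 0.1, stepsize 6000), and can be sketched as:

```python
def learning_rate(iteration, base_lr=0.01, gamma=0.1, step=6000):
    """Step-decay schedule: the learning rate is multiplied by gamma
    each time the iteration count passes a multiple of `step`."""
    return base_lr * gamma ** (iteration // step)
```

With the parameters of this patent (total iterations 8000), the rate is 0.01 for iterations 0-5999 and 0.001 for iterations 6000-7999, so only one decay occurs before training ends.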

[0075] When training the network, the mini-batch size is set to 256, and the weight-update rule with a mini-batch is:

[0076] w → w' = w - (lr/m) * Σj ∇w C_Xj, with m = 256, where C_Xj is the cost on the j-th sample of the mini-batch;

[0077] that is, the gradients of the 256 samples are averaged.

[0078] When mini-batches are used, all samples of a batch can be placed in one matrix, and a linear-algebra library can be used to accelerate the gradient computation; this is an optimization of the engineering implementation.

[0079] A large batch size can take full advantage of matrices and linear-algebra libraries to accelerate the computation; the smaller the batch size, the less pronounced the acceleration. But a larger batch size is not always better: if it is too large, the weights are updated less frequently and the optimization process takes too long. A batch size of 256 is typical; if the image data set is small or GPU memory is 4 GB or less, a smaller batch size may be considered.
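The matrix trick of paragraphs [0078]-[0079] can be illustrated on a toy linear least-squares model (a hypothetical stand-in, not the patent's network): averaging per-sample gradients one by one and computing the same average with a single batched matrix product give identical results.

```python
import numpy as np

def grad_per_sample(W, X, Y):
    """Average the per-sample gradients of 0.5*||W x - y||^2
    one sample at a time."""
    g = np.zeros_like(W)
    for x, y in zip(X, Y):
        g += np.outer(W @ x - y, x)   # gradient for one sample
    return g / len(X)

def grad_batched(W, X, Y):
    """Same average gradient via one matrix product over the batch."""
    E = X @ W.T - Y                   # (m, out) residuals, whole batch
    return E.T @ X / len(X)           # sum of outer products, averaged
```

The batched form lets an optimized BLAS handle all 256 samples at once, which is exactly why a larger batch benefits more from the linear-algebra library.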

[0080] Step 4: use the localization and detection model obtained in step 3 to locate the handwritten-signature position in a bill to be examined, i.e., the top-left and bottom-right coordinates of the rectangular box enclosing the handwritten-signature position.

[0081] FIGs. 4-13 show bill images to be examined with different rotation angles and different noise types, together with the handwritten-signature positions located by the method of the present invention, marked in the figures with rectangular boxes. It can be seen that the method of the present invention is very robust: it overcomes the effects of rotation, noise, and the like on the localization result, and it locates the signature accurately and quickly.

Claims (7)

1. A method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network, characterized in that it is implemented in the following steps: Step 1: build a platform based on the caffe deep learning framework, which includes a variety of convolutional neural network models; Step 2: prepare a data set of bill images; Step 3: train the network to obtain a localization and detection model; Step 4: use the localization and detection model obtained in step 3 to locate the position of the handwritten signature in a bill to be examined.
2. The method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network according to claim 1, characterized in that step 2 is specifically: Step 2.1: photograph bills to obtain original bill image data, and augment the original image data with additional samples; Step 2.2: number and annotate all image data obtained in step 2.1, marking the coordinates of the handwritten-signature position in each bill image to obtain the top-left coordinates (Xmin, Ymin) and bottom-right coordinates (Xmax, Ymax) of the signature position, and write the index of every image together with the corresponding coordinate information into an xml file; Step 2.3: divide all image data into a training data set and a test data set, and further divide the training data set into training data and validation data.
3. The method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network according to claim 2, characterized in that the sample augmentation of the raw image data in Step 2.1 comprises: ① rotating the raw image data by different angles and translating it in different directions; ② scaling the raw image data by linear interpolation; ③ adding salt-and-pepper noise and Gaussian noise of different intensities to the raw image data.
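The two noise augmentations of item ③ can be sketched in a few lines. This is a minimal illustration on a grayscale image held as a list of rows (values 0..255); the patent does not specify noise densities or standard deviations, so the parameters below are placeholders:

```python
import random

def salt_pepper(img, density):
    """Flip roughly `density` * H * W randomly chosen pixels to 0 or 255
    (salt-and-pepper noise); returns a new image, leaving `img` untouched."""
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    for _ in range(int(density * h * w)):
        y, x = random.randrange(h), random.randrange(w)
        out[y][x] = random.choice((0, 255))
    return out

def gaussian(img, sigma):
    """Add zero-mean Gaussian noise of standard deviation `sigma` to every
    pixel, clamping the result back into the valid range 0..255."""
    return [[min(255, max(0, int(v + random.gauss(0, sigma)))) for v in row]
            for row in img]

img = [[128] * 8 for _ in range(8)]          # a flat 8x8 test image
noisy = gaussian(salt_pepper(img, 0.05), 10.0)
```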
4. The method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network according to claim 1, characterized in that Step 3 specifically comprises: Step 3.1: resize the images of the data set obtained in Step 2 to size W1*H1 and feed them into the first five layers of the ZF network for feature extraction, outputting 256 feature maps of size (W1/16)*(H1/16); Step 3.2: convolve the 256 feature maps obtained in Step 3.1 with a 3*3 convolution kernel to obtain a 256-dimensional feature vector, which serves as the first layer of the RPN; Step 3.3: feed the 256-dimensional feature vector obtained in Step 3.2 into two parallel convolutional layers, a classification layer and a regression layer, and select the top 300 candidate boxes according to the foreground-probability scores of the positive samples; Step 3.4: use an ROI_Pooling layer to map the 300 candidate boxes from Step 3.3 onto the 256-dimensional feature maps obtained after the fifth convolutional layer of the ZF network, yielding pooled and normalized feature maps of size 6*6; Step 3.5: feed each 6*6 feature map into two successive fully connected layers fc6 and fc7; fc6 first produces a 4096-dimensional feature, which is then fed into fc7, finally yielding a 1*4096-dimensional feature; Step 3.6: feed the 1*4096-dimensional feature into two parallel fully connected layers, the cls_score layer and the bbox_predict layer; the cls_score layer is used for classification and outputs the background probability and the probabilities of the K sample classes, where K is the number of sample classes; the bbox_predict layer is used to adjust the candidate-region positions and outputs (x', y', w', h') for each candidate box, where x' is the adjusted abscissa of the top-left corner of the candidate box, y' is the adjusted ordinate of the top-left corner of the candidate box, w' is the adjusted width of the candidate box, and h' is the adjusted height of the candidate box; Step 3.7: determine whether the total number of iterations exceeds a threshold; if not, go to Step 3.2; if it does, terminate.
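The ROI pooling of Step 3.4 — mapping an arbitrarily sized candidate box to a fixed 6*6 grid by max-pooling — can be sketched for a single feature-map channel. This is an illustration of the technique only, not the caffe ROI_Pooling layer itself; it assumes the box, in feature-map coordinates, is at least 6 pixels on each side:

```python
def roi_pool(feat, box, out=6):
    """Max-pool the sub-region box=(x1, y1, x2, y2) of a 2-D feature map
    `feat` into a fixed out*out grid, one bin at a time (single channel)."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    pooled = []
    for gy in range(out):
        row = []
        for gx in range(out):
            # Bin boundaries split the region into `out` nearly equal slices.
            ys = range(y1 + gy * h // out, y1 + (gy + 1) * h // out)
            xs = range(x1 + gx * w // out, x1 + (gx + 1) * w // out)
            row.append(max(feat[y][x] for y in ys for x in xs))
        pooled.append(row)
    return pooled

# A 16x16 map whose value encodes its position, and a 12x12 candidate box.
feat = [[y * 16 + x for x in range(16)] for y in range(16)]
pooled = roi_pool(feat, (2, 2, 14, 14))   # -> 6x6 grid of 2x2-bin maxima
```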
5. The method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network according to claim 4, characterized in that in Step 3.1 the target size W1*H1 to which the images of the data set are resized is computed from the original image size W*H while preserving the aspect ratio, i.e. W/H = W1/H1.
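The concrete resize formula is garbled in the published text; what survives is that W1*H1 is derived from W*H. A common convention for ZF/Faster R-CNN pipelines is an aspect-preserving resize that scales the shorter side to a fixed target while capping the longer side, so the sketch below uses that convention as an assumption (the targets 600 and 1000 are the usual Faster R-CNN defaults, not values stated in the patent):

```python
def resize_dims(w, h, short=600, long_cap=1000):
    """Aspect-preserving target size for a W*H image: scale the shorter side
    to `short`, but reduce the scale if the longer side would exceed
    `long_cap`. Returns (W1, H1)."""
    scale = short / min(w, h)
    if scale * max(w, h) > long_cap:
        scale = long_cap / max(w, h)
    return round(w * scale), round(h * scale)
```

Because a single scale factor is applied to both dimensions, the claimed relation W/H = W1/H1 holds up to rounding.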
6. The method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network according to claim 4, characterized in that the initial value of the learning rate when training the network in Step 3 is set to lr = 0.01; whenever the current iteration count reaches an integer multiple of the step-size value, the learning rate decays once, and training ends when the iteration count reaches the total number of iterations; the decayed rate is lr = lr*gamma, where gamma = 0.1 and iteration count < total number of iterations.
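The schedule of claim 6 is caffe's standard "step" learning-rate policy and can be written in closed form. The step-size value itself is not given in the claim, so the default below is a placeholder:

```python
def learning_rate(iteration, base_lr=0.01, gamma=0.1, stepsize=50000):
    """Step-decay schedule as in claim 6: lr starts at base_lr = 0.01 and is
    multiplied by gamma = 0.1 each time the iteration count passes an integer
    multiple of the step size. `stepsize` is an assumed placeholder; the
    patent claim does not state its value."""
    return base_lr * gamma ** (iteration // stepsize)
```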
7. The method for locating the position of a handwritten signature in a bill based on a deep convolutional neural network according to claim 4, characterized in that the mini-batch size when training the network in Step 3 is set to 256.
CN201610841643.8A 2016-09-22 2016-09-22 Position locating method of handwritten signature in bill based on deep convolution neural network CN106469304A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610841643.8A CN106469304A (en) 2016-09-22 2016-09-22 Position locating method of handwritten signature in bill based on deep convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610841643.8A CN106469304A (en) 2016-09-22 2016-09-22 Position locating method of handwritten signature in bill based on deep convolution neural network

Publications (1)

Publication Number Publication Date
CN106469304A true CN106469304A (en) 2017-03-01

Family

ID=58230165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610841643.8A CN106469304A (en) 2016-09-22 2016-09-22 Position locating method of handwritten signature in bill based on deep convolution neural network

Country Status (1)

Country Link
CN (1) CN106469304A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960186A (en) * 2017-03-17 2017-07-18 王宇宁 Ammunition identification method based on depth convolution nerve network
CN107977665A (en) * 2017-12-15 2018-05-01 北京科摩仕捷科技有限公司 The recognition methods of key message and computing device in a kind of invoice

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298976A (en) * 2014-10-16 2015-01-21 电子科技大学 License plate detection method based on convolutional neural network
CN104966097A (en) * 2015-06-12 2015-10-07 成都数联铭品科技有限公司 Complex character recognition method based on deep learning
CN105488468A (en) * 2015-11-26 2016-04-13 浙江宇视科技有限公司 Method and device for positioning target area
CN105550701A (en) * 2015-12-09 2016-05-04 福州华鹰重工机械有限公司 Real-time image extraction and recognition method and device
US9400919B2 (en) * 2014-05-27 2016-07-26 Beijing Kuangshi Technology Co., Ltd. Learning deep face representation
CN105868774A (en) * 2016-03-24 2016-08-17 西安电子科技大学 Selective search and convolutional neural network based vehicle logo recognition method
CN105893952A (en) * 2015-12-03 2016-08-24 无锡度维智慧城市科技股份有限公司 Hand-written signature identifying method based on PCA method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9400919B2 (en) * 2014-05-27 2016-07-26 Beijing Kuangshi Technology Co., Ltd. Learning deep face representation
CN104298976A (en) * 2014-10-16 2015-01-21 电子科技大学 License plate detection method based on convolutional neural network
CN104966097A (en) * 2015-06-12 2015-10-07 成都数联铭品科技有限公司 Complex character recognition method based on deep learning
CN105488468A (en) * 2015-11-26 2016-04-13 浙江宇视科技有限公司 Method and device for positioning target area
CN105893952A (en) * 2015-12-03 2016-08-24 无锡度维智慧城市科技股份有限公司 Hand-written signature identifying method based on PCA method
CN105550701A (en) * 2015-12-09 2016-05-04 福州华鹰重工机械有限公司 Real-time image extraction and recognition method and device
CN105868774A (en) * 2016-03-24 2016-08-17 西安电子科技大学 Selective search and convolutional neural network based vehicle logo recognition method

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
LUIZ G. HAFEMANN等: "Writer-independent feature learning for Offline Signature Verification using Deep Convolutional Neural Networks", 《2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)》 *
MATTHEW D. ZEILER等: "Visualizing and Understanding Convolutional Networks", 《ECCV 2014》 *
ROSS GIRSHICK: "Fast R-CNN", 《ICCV 2015》 *
ROSS GIRSHICK等: "Region-Based Convolutional Networks for Accurate Object Detection and Segmentation", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
SHAOQING REN等: "Faster R-CNN: towards real-time object detection with region proposal networks", 《NIPS"15》 *
何柳: "表单识别中的关键问题研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
杨名等: "甚高速区域卷积神经网络的船舶视频目标识别算法", 《2016年全国通信软件学术会议程序册与交流文集》 *
陈致远: "基于区域梯度统计分析与卷积神经网络的条码定位算法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960186A (en) * 2017-03-17 2017-07-18 王宇宁 Ammunition identification method based on depth convolution nerve network
CN107977665A (en) * 2017-12-15 2018-05-01 北京科摩仕捷科技有限公司 The recognition methods of key message and computing device in a kind of invoice

Similar Documents

Publication Publication Date Title
CN102609942B (en) Depth map using a mobile camera location
CN103093490B Real-time video cameras single person's face based animation
CN102622762B (en) Real-time camera tracking using depth maps
CN104573731B (en) Fast target detection method based on convolutional neural networks
CN105787439B (en) A kind of depth image human synovial localization method based on convolutional neural networks
CN103810699B (en) Sar image change detection method based on an unsupervised neural network depth
CN105069413B A kind of human posture's recognition methods based on depth convolutional neural networks
Zhong et al. An adaptive subpixel mapping method based on MAP model and class determination strategy for hyperspectral remote sensing imagery
CN101465002B (en) Method for orientating secondary pixel edge of oval-shaped target
CN106504233A (en) Unmanned aerial vehicle patrol detection image power small component identification method and system based on Faster R-CNN
JP5778237B2 (en) Backfill points in point cloud
CN103363962B (en) Remote sensing evaluation method of lake water reserves based on multispectral images
CN1786980A Method for realizing searching new position of person's face feature point by two-dimensional profile
DE112016004535T5 (en) Universal Compliance Network
CN106157307A (en) Monocular image depth estimation method based on multi-scale CNN and continuous CRF
CN105139395A (en) SAR image segmentation method based on wavelet pooling convolutional neural networks
WO2015188445A1 (en) Point cloud three-dimensional model reconstruction method and system
CN102436589B (en) Complex object automatic recognition method based on multi-category primitive self-learning
CN102982559A (en) Method and system for vehicle tracking
CN104182970B (en) A souvenir stations portrait photography composition recommendation method based Rule
CN102542302B (en) Automatic complicated target identification method based on hierarchical object semantic graph
CN103632167B (en) Monocular visual recognition space under terrestrial gravity field environment
CN104079827B (en) A kind of optical field imaging weighs focusing method automatically
CN102831427B (en) Texture feature extraction method fused with visual significance and gray level co-occurrence matrix (GLCM)
CN102169581A (en) Feature vector-based fast and high-precision robustness matching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination