CN107220655A - A deep-learning-based method for classifying handwritten and printed text - Google Patents


Info

Publication number
CN107220655A
CN107220655A (application CN201610168622.4A)
Authority
CN
China
Prior art keywords
picture
printed text
handwritten
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610168622.4A
Other languages
Chinese (zh)
Inventor
金连文
冯子勇
阳赵阳
孙俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Fujitsu Ltd
Original Assignee
South China University of Technology SCUT
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, Fujitsu Ltd filed Critical South China University of Technology SCUT
Priority to CN201610168622.4A priority Critical patent/CN107220655A/en
Publication of CN107220655A publication Critical patent/CN107220655A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Abstract

The invention describes a deep-learning-based method for classifying handwritten and printed text. The method comprises the following steps: (1) data acquisition: collect images of handwritten and printed text to form a training set; (2) binarize and height-normalize the training-set images; (3) sample expansion: crop the training-set images and add noise; (4) construct a deep convolutional neural network and train it on the training-set images; (5) crop the text image to be classified and feed the crops into the deep convolutional neural network constructed in step (4); average the resulting probability distributions and output the classification result. Through a deep learning algorithm, the invention automatically learns from samples the features that distinguish handwritten text from printed text, enabling a computer to discriminate intelligently between handwritten-text and printed-text images.

Description

A deep-learning-based method for classifying handwritten and printed text
Technical field
The invention belongs to the fields of pattern recognition and artificial intelligence, and more particularly relates to a method for classifying handwritten and printed text.
Background technology
With the rapid development of computer technology, document analysis techniques are increasingly widely applied in daily life, for example in the storage and retrieval of paper documents. Digital documents have evolved from initially plain text to documents that mix text and images, mix print and handwriting, mix multiple languages, and so on.
In real life, a large number of documents mix handwriting with print. Handwritten and printed text each play their own role in a document, so detecting, distinguishing and processing these different types of text is of real significance. In particular, the handwritten content in a document often carries additional important information, so separating out the handwritten text also helps subsequent, more targeted data processing and algorithm research.
Convolutional neural networks are a kind of artificial neural network and have become a research hotspot in speech analysis and image recognition. Their weight-sharing network structure makes them more similar to biological neural networks, reducing the complexity of the network model and the number of weights. This advantage is more pronounced when the network input is a multi-dimensional image: the image can be fed directly into the network, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional neural network is a multilayer perceptron specially designed to recognize two-dimensional shapes, and this network structure is invariant to translation, scaling, skew and other forms of deformation.
In recent decades, research on artificial neural networks, and on convolutional neural networks in particular, has deepened continuously and made great progress. They have successfully solved many practical problems in fields such as speech analysis and image recognition that are otherwise hard for modern computers, and exhibit good intelligent characteristics.
Summary of the invention
To overcome the technical problems in the prior art, the present invention provides a deep-learning-based method for classifying handwritten and printed text. The method can effectively learn features that distinguish handwriting from print and thereby obtain better classification performance; it is characterized by high efficiency and a high recognition rate.
The present invention is realized by the following technical scheme. A deep-learning-based method for classifying handwritten and printed text comprises the following steps: (1) data acquisition: collect images of handwritten and printed text to form a training set; (2) binarize and height-normalize the training-set images; (3) sample expansion: crop the training-set images and add noise; (4) construct a deep convolutional neural network and train it on the training-set images; (5) crop the text image to be classified, feed the crops into the deep convolutional neural network constructed in step (4), average the resulting probability distributions, and output the classification result.
Preferably, step (2) comprises the following steps: (2-1) convert the training-set images to grayscale; (2-2) normalize the grayscale images to a height of H pixels; (2-3) binarize the height-normalized images.
Preferably, the binarization method is global mean binarization: using the mean pixel value of the image as the threshold, binarize the height-normalized image, assigning 255 to pixels above the threshold and 0 to pixels below it.
Preferably, step (3) comprises the following steps: (3-1) with stride S, cut the binarized image into crops of width W; if the image width is less than W, enlarge the width to W; (3-2) after step (3-1), one binarized image yields N crops of size W × H; each crop is noise-processed to obtain M noisy images, for N × M noisy images in total, expanding the sample space. H is the pixel height after grayscale height normalization.
Preferably, step (4) comprises the following steps:
(4-1) Construct the deep convolutional neural network:
Input(96x32)->50C(7x3)S1->ReLU->MP2->80C(6x6)S1->ReLU->MP2->500N->ReLU->Dropout(0.5)->2N->Softmax/Output(2x1)
where Input(96x32) means the input layer accepts images of 96x32 pixels; 50C(7x3)S1 denotes a convolutional layer that extracts features from the input image, with kernel size 7x3 and stride 1, outputting 50 feature maps; ReLU denotes a rectified-linear-unit activation layer applied to the convolved features; MP2 denotes a max-pooling layer, with kernel size 2x2 and stride 2, that takes the maximum over the activated features; 500N denotes a fully connected layer that learns weights over the previous layer's features and outputs a 500-dimensional feature vector; Dropout(0.5) is a random suppression layer, with suppression ratio 50%, that prevents the network from overfitting the training samples and losing classification ability; Softmax/Output(2x1) means the output layer is a Softmax layer that outputs the probability distribution of the input image being classified as handwritten or printed text;
(4-2) Train the deep convolutional neural network on the training-set images:
(4-2-1) Let the number of images in each training batch be BS. The image produced by the cropping in steps (3-1) and (3-2), together with the M noisy images produced by noise processing, M+1 images in total, is treated as one group of preprocessed samples img_M+1. Each time the deep convolutional neural network of step (4-1) is trained, one image is randomly drawn from each of BS groups of preprocessed samples to form a training batch img_BS for batch training;
(4-2-2) Train the deep convolutional neural network of step (4-1) by stochastic gradient descent. Set the initial learning rate lr_0, the rate at which the neural network iterates through the training-sample space in search of the optimal solution; the weight penalty coefficient λ, a parameter that prevents the neural network from overfitting the training samples; and the maximum number of training iterations iters_max, the number of learning iterations required for the classification accuracy of the neural network to reach the required threshold. The learning rate is updated as follows:
lr_iter = lr_0 × 1 / (1 + e^(−γ × (iter − stepsize)))
where lr_0 takes the value 0.01, 0.003 or 0.005; λ takes the value 0.01, 0.005 or 0.001; iters_max ranges from 10000 to 15000; iter is the current iteration number; lr_iter is the current learning rate; γ ranges from 0.0001 to 0.0003; and stepsize ranges from 2000 to 3000.
Preferably, step (5) comprises the following steps:
(5-1) any image img_test to be classified is cut with a sliding window of size W × H, yielding N_test crops img_split of size W × H;
(5-2) the N_test crops are fed into the deep convolutional neural network constructed in step (4), giving N_test probability distributions over the handwritten-text and printed-text classes; these N_test distributions are averaged, and the class with the highest mean probability is output as the final classification.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) Because the text-feature learning algorithm uses a deep network structure, effective text-feature representations can be learned well from the data, improving the accuracy of the classification method.
(2) Compared with traditional geometric text features, more appearance features can be extracted and a better text-feature description obtained, yielding better recognition results than traditional geometric text features.
(3) The classification method has a high recognition rate, strong robustness, high efficiency and high speed; it can effectively learn features that distinguish handwriting from print and thus achieve better classification performance.
Brief description of the drawings
Fig. 1 is a flow chart of the classification method of the present invention;
Fig. 2 is the preprocessing flow chart of the present invention;
Fig. 3 is an example of the preprocessing process of the present invention;
Fig. 4 is the structure chart of the deep convolutional neural network of the present invention;
Fig. 5 is the classification and recognition flow chart of the present invention;
Fig. 6 is an example of the classification and recognition process of the present invention.
Detailed description of the embodiments
The present invention is described further below with reference to an embodiment and the accompanying drawings, but embodiments of the present invention are not limited thereto.
Embodiment
The handwritten/printed text classification method of the present invention, whose flow chart is shown in Fig. 1, comprises the following steps:
(1) Data acquisition: collect images of handwritten and printed text to form a text-image training set.
Data can be obtained by photographing documents, generating text images from fonts (for example, generating printed English text images with the Times New Roman font), and so on, to form the text-image training set, in which printed-text images and handwritten-text images each make up half.
(2) Data preprocessing: image binarization and image height normalization.
Step (2) comprises the following steps:
(2-1) Convert the printed-text and handwritten-text images in the training set to grayscale;
(2-2) Normalize the grayscale images to a height of 32 pixels;
(2-3) Binarize the height-normalized images. Global mean binarization is preferred: using the mean pixel value of the image as the threshold, pixels above the threshold are assigned 255 (white) and pixels below the threshold are assigned 0 (black).
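The grayscale height normalization and global-mean binarization of step (2) can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: nearest-neighbour resizing and the synthetic gradient test image are my own choices, since the patent does not specify a resampling method.

```python
import numpy as np

def preprocess(gray, target_h=32):
    """Height-normalize a grayscale image to target_h pixels (nearest-neighbour,
    to stay dependency-free) and binarize with the global pixel mean."""
    h, w = gray.shape
    new_w = max(1, round(w * target_h / h))  # keep the aspect ratio
    rows = (np.arange(target_h) * h / target_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = gray[rows][:, cols]
    # Global mean binarization: above threshold -> 255 (white), else 0 (black).
    thr = resized.mean()
    return np.where(resized > thr, 255, 0).astype(np.uint8)

img = np.tile(np.linspace(0, 255, 64, dtype=np.uint8), (48, 1))  # synthetic gradient
binary = preprocess(img)
print(binary.shape)                           # (32, 43)
print(sorted(np.unique(binary).tolist()))     # [0, 255]
```

Any grayscale image of shape (height, width) can be passed in; only the height is fixed to 32, matching step (2-2).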
(3) Sample expansion: crop the training-set images and add noise.
Steps (2) and (3) form the preprocessing flow of the present invention, shown in Fig. 2. Step (3) specifically comprises the following steps:
(3-1) With a stride of 24 pixels, cut the binarized image into crops 96 pixels wide; if the image is narrower than 96 pixels, enlarge its width to 96 pixels;
(3-2) After step (3-1), one binarized image yields 3 crops of size 96x32. Each crop is noise-processed (rotation, line interference, noise disturbance, Gaussian blur, etc.) to obtain 3 noisy images, 3x3 noisy images in total, as shown in Fig. 3.
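Steps (3-1) and (3-2) can be sketched as follows. Two assumptions are mine, not the patent's: salt-and-pepper pixel flipping stands in for the listed noise operations (the patent also names rotation, line interference and Gaussian blur), and padding narrow images with white is one possible reading of "enlarging" the width to 96 pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

def crop_windows(binary, win_w=96, step=24):
    """Cut a height-normalized binary image into width-96 crops at stride 24.
    Images narrower than 96 px are padded with white (an assumption)."""
    h, w = binary.shape
    if w < win_w:
        pad = np.full((h, win_w - w), 255, np.uint8)
        return [np.hstack([binary, pad])]
    return [binary[:, x:x + win_w] for x in range(0, w - win_w + 1, step)]

def add_noise(crop, n=3):
    """Produce n noisy variants per crop; salt-and-pepper flipping only,
    as a stand-in for the patent's fuller noise-processing list."""
    out = []
    for _ in range(n):
        noisy = crop.copy()
        mask = rng.random(crop.shape) < 0.02  # flip ~2% of pixels
        noisy[mask] = 255 - noisy[mask]
        out.append(noisy)
    return out

img = np.zeros((32, 144), np.uint8)  # a 144-px-wide binarized line image
crops = crop_windows(img)
augmented = [v for c in crops for v in add_noise(c)]
print(len(crops), len(augmented))  # 3 9
```

With a 144-pixel-wide input this reproduces the embodiment's numbers: 3 crops and 3x3 = 9 noisy images.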
(4) Network training: construct and train the deep convolutional neural network.
Step (4) comprises the following steps:
(4-1) Construct the following deep convolutional neural network (shown in Fig. 4):
Input(96x32)->50C(7x3)S1->ReLU->MP2->80C(6x6)S1->ReLU->MP2->500N->ReLU->Dropout(0.5)->2N->Softmax/Output(2x1)
where Input(96x32) means the input layer accepts images of 96x32 pixels; 50C(7x3)S1 denotes a convolutional layer that extracts features from the input image, with kernel size 7x3 and stride 1, outputting 50 feature maps; ReLU denotes a rectified-linear-unit activation layer applied to the convolved features; MP2 denotes a max-pooling layer, with kernel size 2x2 and stride 2, that takes the maximum over the activated features; 500N denotes a fully connected layer that learns weights over the previous layer's features and outputs a 500-dimensional feature vector; Dropout(0.5) is a random suppression layer, with suppression ratio 50%, that prevents the network from overfitting the training samples and losing classification ability; Softmax/Output(2x1) means the output layer is a Softmax layer that outputs the probability distribution of the input image being classified as handwritten or printed text;
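The feature-map sizes implied by this architecture string can be checked with a simple shape trace. Two assumptions are mine: convolutions use no padding, and the 7x3 kernel spans 7 pixels in width and 3 in height (the patent does not state the orientation).

```python
def conv_out(size, kernel, stride=1):
    """Output size of a valid (no-padding) convolution or pooling window."""
    return (size - kernel) // stride + 1

# Trace feature-map sizes (width x height) through the patent's network.
w, h = 96, 32
w, h = conv_out(w, 7), conv_out(h, 3)        # 50C(7x3)S1 -> 90 x 30, 50 maps
w, h = conv_out(w, 2, 2), conv_out(h, 2, 2)  # MP2        -> 45 x 15
w, h = conv_out(w, 6), conv_out(h, 6)        # 80C(6x6)S1 -> 40 x 10, 80 maps
w, h = conv_out(w, 2, 2), conv_out(h, 2, 2)  # MP2        -> 20 x 5
flat = 80 * w * h                            # features flattened into the 500N layer
print(w, h, flat)  # 20 5 8000
```

Under these assumptions the 500N fully connected layer would therefore receive an 8000-dimensional input.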
(4-2) Train the deep convolutional neural network as follows:
(4-2-1) Let the number of images in each training batch be 100. The image produced by the cropping in steps (3-1) and (3-2), together with the M noisy images produced by noise processing, M+1 images in total, is treated as one group of preprocessed samples img_M+1. Each time the neural network designed in step (4-1) is trained, one image is randomly drawn from each of 100 groups of preprocessed samples to form a training batch img_BS for batch training;
(4-2-2) Train the deep convolutional neural network of step (4-1) by stochastic gradient descent. Set the initial learning rate lr_0, the rate at which the neural network iterates through the training-sample space in search of the optimal solution; the weight penalty coefficient λ, a parameter that prevents the neural network from overfitting the training samples; and the maximum number of training iterations iters_max, the number of learning iterations required for the classification accuracy of the neural network to reach the required threshold. The learning rate is updated as follows:
lr_iter = lr_0 × 1 / (1 + e^(−γ × (iter − stepsize)))
where lr_0 = 0.01, λ = 0.005, iters_max = 10000, iter is the current iteration number, lr_iter is the current learning rate, γ = 0.0001 and stepsize = 2500.
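With these concrete values, the sigmoid learning-rate schedule of step (4-2-2) can be evaluated directly. This sketch follows the claimed formula verbatim; note that with positive γ, as written, the rate rises toward lr_0 as iter grows past stepsize.

```python
import math

def learning_rate(iter_, lr0=0.01, gamma=0.0001, stepsize=2500):
    """Sigmoid schedule from the patent:
    lr_iter = lr0 * 1 / (1 + exp(-gamma * (iter - stepsize)))."""
    return lr0 / (1.0 + math.exp(-gamma * (iter_ - stepsize)))

for it in (0, 2500, 10000):
    print(it, round(learning_rate(it), 6))
# At iter = stepsize the rate is exactly lr0 / 2 = 0.005.
```

The same function covers the claimed parameter ranges (lr_0 in {0.01, 0.003, 0.005}, γ in [0.0001, 0.0003], stepsize in [2000, 3000]) by passing different arguments.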
(5) Crop the text image to be classified and feed the crops into the deep convolutional neural network designed in step (4); average the resulting probability distributions and output the classification result.
Step (5) comprises the following steps (shown in Fig. 5 and Fig. 6):
(5-1) Each image to be classified is cut with a sliding window (window size 96x32, stride 24), yielding 4 crops of size 96x32;
(5-2) The 4 crops are fed into the deep convolutional neural network of step (4-1), giving 4 probability distributions over the handwritten-text and printed-text classes; these 4 distributions are averaged, and the class with the highest mean probability is output as the final classification.
In the example shown in Fig. 6, the text image to be classified is a handwritten-text image. After sliding-window cutting, 4 crops are obtained. The 4 crops are fed into the deep convolutional neural network designed by the present invention; from the convolution results of the 4 crops, the printed-text and handwritten-text probabilities are computed and the probability distributions are averaged. The mean handwritten-text probability is the larger, so the output classification result is a handwritten-text image.
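The probability averaging of steps (5-1) and (5-2) reduces to the following; the four per-window distributions are illustrative numbers of my own, not values from the patent.

```python
import numpy as np

def classify(window_probs):
    """Average the per-window [printed, handwritten] probability distributions
    and pick the class with the highest mean probability."""
    mean = np.asarray(window_probs).mean(axis=0)
    label = ("printed", "handwritten")[int(np.argmax(mean))]
    return label, mean

# Four sliding-window softmax outputs for one image (illustrative numbers only).
probs = [[0.30, 0.70], [0.40, 0.60], [0.20, 0.80], [0.45, 0.55]]
label, mean = classify(probs)
print(label, mean)  # handwritten [0.3375 0.6625]
```

Averaging before the argmax, rather than voting per window, lets a few confident windows outweigh several uncertain ones, which matches the Fig. 6 example.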
Embodiments of the present invention are not limited to the embodiment above; any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention is an equivalent alternative and falls within the protection scope of the present invention.

Claims (10)

1. A deep-learning-based method for classifying handwritten and printed text, characterized by comprising the following steps:
(1) data acquisition: collecting images of handwritten and printed text to form a training set;
(2) binarizing and normalizing the training-set images;
(3) sample expansion: cropping the training-set images and adding noise;
(4) constructing a deep convolutional neural network and training it on the training-set images;
(5) cropping the text image to be classified, feeding the crops into the deep convolutional neural network constructed in step (4), averaging the resulting probability distributions, and outputting the classification result.
2. The method for classifying handwritten and printed text according to claim 1, characterized in that in step (1) data are obtained by photographing documents or generating text images from fonts, and in the resulting training set printed-text images and handwritten-text images each make up half.
3. The method for classifying handwritten and printed text according to claim 1, characterized in that step (2) comprises the following steps:
(2-1) converting the training-set images to grayscale;
(2-2) normalizing the grayscale images to a height of H pixels;
(2-3) binarizing the height-normalized images.
4. The method for classifying handwritten and printed text according to claim 3, characterized in that the binarization method is global mean binarization: using the mean pixel value of the image as the threshold, the height-normalized image is binarized, pixels above the threshold being assigned 255 and pixels below the threshold being assigned 0.
5. The method for classifying handwritten and printed text according to claim 1, characterized in that step (3) comprises the following steps:
(3-1) with stride S, cutting the binarized image into crops of width W; if the image width is less than W, enlarging the width to W;
(3-2) after step (3-1), one binarized image yields N crops of size W × H; each crop is noise-processed to obtain M noisy images, N × M noisy images in total, to expand the sample space, where H is the pixel height after grayscale height normalization.
6. The method for classifying handwritten and printed text according to claim 5, characterized in that the noise processing includes: line interference, noise disturbance, Gaussian blur and rotation.
7. The method for classifying handwritten and printed text according to claim 5, characterized in that H ranges from 28 to 34 pixels, S from 23 to 25 pixels, and W from 92 to 100 pixels.
8. The method for classifying handwritten and printed text according to claim 5, characterized in that step (4) comprises the following steps:
(4-1) constructing the deep convolutional neural network:
Input(96x32)->50C(7x3)S1->ReLU->MP2->80C(6x6)S1->ReLU->MP2->500N->ReLU->Dropout(0.5)->2N->Softmax/Output(2x1)
where Input(96x32) means the input layer accepts images of 96x32 pixels; 50C(7x3)S1 denotes a convolutional layer that extracts features from the input image, with kernel size 7x3 and stride 1, outputting 50 feature maps; ReLU denotes a rectified-linear-unit activation layer applied to the convolved features; MP2 denotes a max-pooling layer, with kernel size 2x2 and stride 2, that takes the maximum over the activated features; 500N denotes a fully connected layer that learns weights over the previous layer's features and outputs a 500-dimensional feature vector; Dropout(0.5) is a random suppression layer, with suppression ratio 50%, that prevents the network from overfitting the training samples and losing classification ability; Softmax/Output(2x1) means the output layer is a Softmax layer that outputs the probability distribution of the input image being classified as handwritten or printed text;
(4-2) training the deep convolutional neural network on the training-set images:
(4-2-1) letting the number of images in each training batch be BS, the image produced by the cropping in steps (3-1) and (3-2), together with the M noisy images produced by noise processing, M+1 images in total, is treated as one group of preprocessed samples img_M+1; each time the deep convolutional neural network of step (4-1) is trained, one image is randomly drawn from each of BS groups of preprocessed samples to form a training batch img_BS;
(4-2-2) training the deep convolutional neural network of step (4-1) by stochastic gradient descent, with the initial learning rate lr_0, the rate at which the neural network iterates through the training-sample space in search of the optimal solution; the weight penalty coefficient λ, a parameter that prevents the neural network from overfitting the training samples; and the maximum number of training iterations iters_max, the number of learning iterations required for the classification accuracy of the neural network to reach the required threshold; the learning rate being updated as follows:
lr_iter = lr_0 × 1 / (1 + e^(−γ × (iter − stepsize)))
where lr_0 takes the value 0.01, 0.003 or 0.005; λ takes the value 0.01, 0.005 or 0.001; iters_max ranges from 10000 to 15000; iter is the current iteration number; lr_iter is the current learning rate; γ ranges from 0.0001 to 0.0003; and stepsize ranges from 2000 to 3000.
9. The method for classifying handwritten and printed text according to claim 5, characterized in that step (5) comprises the following steps:
(5-1) any image img_test to be classified is cut with a sliding window of size W × H, yielding N_test crops img_split of size W × H;
(5-2) the N_test crops are fed into the deep convolutional neural network constructed in step (4), giving N_test probability distributions over the handwritten-text and printed-text classes; these N_test distributions are averaged, and the class with the highest mean probability is output as the final classification.
10. The method for classifying handwritten and printed text according to claim 9, characterized in that the sliding window size is 96x32.
CN201610168622.4A 2016-03-22 2016-03-22 A kind of hand-written, printed text sorting technique based on deep learning Pending CN107220655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610168622.4A CN107220655A (en) 2016-03-22 2016-03-22 A kind of hand-written, printed text sorting technique based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610168622.4A CN107220655A (en) 2016-03-22 2016-03-22 A kind of hand-written, printed text sorting technique based on deep learning

Publications (1)

Publication Number Publication Date
CN107220655A (en) 2017-09-29

Family

ID=59928104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610168622.4A Pending CN107220655A (en) 2016-03-22 2016-03-22 A kind of hand-written, printed text sorting technique based on deep learning

Country Status (1)

Country Link
CN (1) CN107220655A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108109124A (en) * 2017-12-27 2018-06-01 北京诸葛找房信息技术有限公司 Indefinite position picture watermark restorative procedure based on deep learning
CN108364036A (en) * 2017-12-28 2018-08-03 顺丰科技有限公司 A kind of modeling method, recognition methods, device, storage medium and equipment
CN108364037A (en) * 2017-12-28 2018-08-03 顺丰科技有限公司 Method, system and the equipment of Handwritten Chinese Character Recognition
CN109493400A (en) * 2018-09-18 2019-03-19 平安科技(深圳)有限公司 Handwriting samples generation method, device, computer equipment and storage medium
CN109858521A (en) * 2018-12-29 2019-06-07 国际竹藤中心 A kind of bamboo category identification method based on artificial intelligence deep learning
CN110598691A (en) * 2019-08-01 2019-12-20 广东工业大学 Medicine character label identification method based on improved multilayer perceptron
CN110991439A (en) * 2019-12-09 2020-04-10 南京红松信息技术有限公司 Method for extracting handwritten characters based on pixel-level multi-feature joint classification
CN112862024A (en) * 2021-04-28 2021-05-28 明品云(北京)数据科技有限公司 Text recognition method and system
CN112927254A (en) * 2021-02-26 2021-06-08 华南理工大学 Single word tombstone image binarization method, system, device and storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1147652A (en) * 1995-06-30 1997-04-16 财团法人工业技术研究院 Construction method of data base for writing identifying system
CN1438604A (en) * 2002-12-23 2003-08-27 北京邮电大学 Character written-form judgement apparatus and method based on Bayes classification device
CN1538342A * 2003-02-19 2004-10-20 Process using several images for optical recognition of mail
US20060124727A1 (en) * 2004-12-10 2006-06-15 Nikolay Kotovich System and method for check fraud detection using signature validation
US20070065003A1 (en) * 2005-09-21 2007-03-22 Lockheed Martin Corporation Real-time recognition of mixed source text
CN101414378A (en) * 2008-11-24 2009-04-22 罗向阳 Hidden blind detection method for image information with selective characteristic dimensionality
CN101460960A (en) * 2006-05-31 2009-06-17 微软公司 Combiner for improving handwriting recognition
CN102156876A (en) * 2011-03-31 2011-08-17 华中科技大学 Symbol identification method based on hexadecimal conversion
CN102944418A (en) * 2012-12-11 2013-02-27 东南大学 Wind turbine generator group blade fault diagnosis method
CN103996057A (en) * 2014-06-12 2014-08-20 武汉科技大学 Real-time handwritten digital recognition method based on multi-feature fusion
US8990132B2 (en) * 2010-01-19 2015-03-24 James Ting-Ho Lo Artificial neural networks based on a low-order model of biological neural networks
CN104834941A (en) * 2015-05-19 2015-08-12 重庆大学 Offline handwriting recognition method of sparse autoencoder based on computer input
CN105005975A (en) * 2015-07-08 2015-10-28 南京信息工程大学 Image de-noising method based on anisotropic diffusion of image entropy and PCNN
CN105247540A (en) * 2013-06-09 2016-01-13 苹果公司 Managing real-time handwriting recognition

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1147652A (en) * 1995-06-30 1997-04-16 财团法人工业技术研究院 Construction method of data base for writing identifying system
CN1438604A (en) * 2002-12-23 2003-08-27 北京邮电大学 Character written-form judgement apparatus and method based on Bayes classification device
CN1538342A (en) * 2003-02-19 2004-10-20 Solystic Process using several images for optical recognition of mail
US20060124727A1 (en) * 2004-12-10 2006-06-15 Nikolay Kotovich System and method for check fraud detection using signature validation
US20070065003A1 (en) * 2005-09-21 2007-03-22 Lockheed Martin Corporation Real-time recognition of mixed source text
CN101460960A (en) * 2006-05-31 2009-06-17 Microsoft Corporation Combiner for improving handwriting recognition
CN101414378A (en) * 2008-11-24 2009-04-22 Luo Xiangyang Hidden blind detection method for image information with selective characteristic dimensionality
US8990132B2 (en) * 2010-01-19 2015-03-24 James Ting-Ho Lo Artificial neural networks based on a low-order model of biological neural networks
CN102156876A (en) * 2011-03-31 2011-08-17 Huazhong University of Science and Technology Symbol identification method based on hexadecimal conversion
CN102944418A (en) * 2012-12-11 2013-02-27 Southeast University Wind turbine generator group blade fault diagnosis method
CN105247540A (en) * 2013-06-09 2016-01-13 Apple Inc. Managing real-time handwriting recognition
CN103996057A (en) * 2014-06-12 2014-08-20 Wuhan University of Science and Technology Real-time handwritten digit recognition method based on multi-feature fusion
CN104834941A (en) * 2015-05-19 2015-08-12 Chongqing University Offline handwriting recognition method using a sparse autoencoder based on computer input
CN105005975A (en) * 2015-07-08 2015-10-28 Nanjing University of Information Science and Technology Image de-noising method based on anisotropic diffusion of image entropy and PCNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ding Xiaogang: "Research on the application of BP neural networks and convolutional neural networks in character recognition", Wanfang Data Knowledge Service Platform *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108109124A (en) * 2017-12-27 2018-06-01 Beijing Zhuge Zhaofang Information Technology Co., Ltd. Deep-learning-based method for repairing picture watermarks at arbitrary positions
CN108364036A (en) * 2017-12-28 2018-08-03 SF Technology Co., Ltd. Modeling method, recognition method, apparatus, storage medium and device
CN108364037A (en) * 2017-12-28 2018-08-03 SF Technology Co., Ltd. Method, system and device for handwritten Chinese character recognition
CN109493400A (en) * 2018-09-18 2019-03-19 Ping An Technology (Shenzhen) Co., Ltd. Handwriting sample generation method, apparatus, computer device and storage medium
CN109493400B (en) * 2018-09-18 2024-01-19 Ping An Technology (Shenzhen) Co., Ltd. Handwriting sample generation method, apparatus, computer device and storage medium
CN109858521A (en) * 2018-12-29 2019-06-07 International Centre for Bamboo and Rattan Bamboo species identification method based on artificial intelligence deep learning
CN110598691A (en) * 2019-08-01 2019-12-20 Guangdong University of Technology Drug text label recognition method based on an improved multilayer perceptron
CN110598691B (en) * 2019-08-01 2023-05-02 Guangdong University of Technology Drug text label recognition method based on an improved multilayer perceptron
CN110991439A (en) * 2019-12-09 2020-04-10 Nanjing Hongsong Information Technology Co., Ltd. Method for extracting handwritten characters based on pixel-level multi-feature joint classification
CN112927254A (en) * 2021-02-26 2021-06-08 South China University of Technology Single-character tombstone image binarization method, system, device and storage medium
CN112862024A (en) * 2021-04-28 2021-05-28 Mingpinyun (Beijing) Data Technology Co., Ltd. Text recognition method and system

Similar Documents

Publication Publication Date Title
CN107220655A (en) A handwritten and printed text classification method based on deep learning
CN109800754B (en) Ancient font classification method based on convolutional neural network
CN107316307B (en) Automatic segmentation method of traditional Chinese medicine tongue image based on deep convolutional neural network
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
CN104834922B (en) Gesture identification method based on hybrid neural networks
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN107844740A (en) Offline handwritten and printed Chinese character recognition method and system
CN107220641B (en) Multi-language text classification method based on deep learning
CN111652332B (en) Deep learning handwritten Chinese character recognition method and system based on two classifications
Tsai Recognizing handwritten Japanese characters using deep convolutional neural networks
Banumathi et al. Handwritten Tamil character recognition using artificial neural networks
Srihari et al. Role of automation in the examination of handwritten items
CN113128442A (en) Chinese character calligraphy style identification method and scoring method based on convolutional neural network
CN109710804B (en) Teaching video image knowledge point dimension reduction analysis method
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN107958219A (en) Image scene classification method based on multi-model and Analysis On Multi-scale Features
CN110414513A (en) Vision significance detection method based on semantically enhancement convolutional neural networks
Noor et al. Handwritten bangla numeral recognition using ensembling of convolutional neural network
Song et al. Occluded offline handwritten Chinese character inpainting via generative adversarial network and self-attention mechanism
Zhuang et al. A handwritten Chinese character recognition based on convolutional neural network and median filtering
Mariyathas et al. Sinhala handwritten character recognition using convolutional neural network
Gandhi et al. An attempt to recognize handwritten Tamil character using Kohonen SOM
CN105809200A (en) Biologically-inspired image meaning information autonomous extraction method and device
CN105844299B (en) An image classification method based on bag-of-words
Basha et al. A novel approach for optical character recognition (OCR) of handwritten Telugu alphabets using convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170929