CN107273897A - A kind of character recognition method based on deep learning - Google Patents
A kind of character recognition method based on deep learning
- Publication number
- CN107273897A CN107273897A CN201710538785.1A CN201710538785A CN107273897A CN 107273897 A CN107273897 A CN 107273897A CN 201710538785 A CN201710538785 A CN 201710538785A CN 107273897 A CN107273897 A CN 107273897A
- Authority
- CN
- China
- Prior art keywords
- input
- layer
- feature map
- feature vector
- pixel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/287—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Evolutionary Biology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Character Discrimination (AREA)
Abstract
The invention discloses a character recognition method based on deep learning. The method comprises two stages: constructing a spatial transformer layer, and building and training a deep convolutional neural network. The spatial transformer layer consists of three parts: a localization network receives a feature map as input, passes it through a series of hidden layers, and outputs the parameters of a spatial transformation to be applied to the feature map; a grid generator uses the parameters produced by the first part to generate a sampling grid; and a sampler takes the feature map and the sampling grid as input, samples the feature map at the grid points, and produces the output feature map. The spatial transformer layer is differentiable and performs spatial manipulation of the image data inside the network, so the network can learn invariance to spatial warping, avoiding the need to generate large numbers of deformed samples by hand when training a conventional convolutional network. In addition, by building a deeper convolutional neural network, better recognition performance is achieved across the many classes of Chinese characters.
Description
Technical field
The invention belongs to the field of character recognition within pattern recognition, and more particularly relates to a character recognition method based on deep learning.
Background technology
With the continuing development of modern science and technology and the wide availability of the Internet, we encounter massive information resources presented in many forms every day. In daily life, study and work in particular, it is often unavoidable to process large amounts of text and enter it into a computer. How to enter this text information quickly and accurately into computers and other electronic devices has therefore become a pressing problem. Optical character recognition (OCR) is a technology by which a machine automatically extracts the text in a picture and converts it into machine-editable text.
In general, a traditional Chinese character recognition method consists of three parts: data preprocessing, feature extraction, and classification.
(1) Preprocessing. The purpose of preprocessing is to enhance the useful image information and remove noise, which benefits feature extraction. It is performed by means such as binarization, smoothing and denoising, and normalization. Binarization converts a grayscale text image into a binary text image; denoising removes the isolated points (speckles) remaining in the image after binarization; normalization regularizes the size, position and shape of the characters so as to reduce variation between instances of the same character.
(2) Feature extraction. Feature extraction falls into two broad classes: structural features and statistical features. Structural feature extraction derives character pixel information from the character outline or skeleton, such as stroke features, contours, peripheral features and local features. Such methods adapt well to font variation and are strong at distinguishing similar characters, but images of text suffer many kinds of interference, such as tilt, distortion, breakage and adhesion, against which these methods are weak. Features extracted after applying a mathematical transform to a sample are called statistical features; commonly used transforms include the wavelet transform, Fourier transform, frequency-domain transforms, moments and the discrete cosine transform. The extracted features are usually fed to a statistical classifier. In general, statistical features have weaker fine-discrimination ability than structural features and are poorer at separating similar characters.
(3) Classification. During classification, the features extracted from a sample are matched against established classification rules to identify the character. The classifier is the key component: its role is to speed up matching, improve recognition efficiency, and achieve the recognition result.
Traditional Chinese character recognition methods nevertheless have shortcomings. Because of the complexity of Chinese characters, such feature extraction methods cannot handle highly variable character outlines; key-point extraction requires human experts to define the positions of the important feature points, and no unified standard exists for weighting the importance of those points, so recognition accuracy remains relatively low.
Summary of the invention
In view of the above shortcomings of and improvement needs in the prior art, the object of the present invention is to provide a character recognition method based on deep learning, thereby solving the technical problem that current character recognition methods have relatively low recognition accuracy.
To achieve the above object, according to one aspect of the present invention, there is provided a character recognition method based on deep learning, comprising a spatial transformer layer construction stage and a deep convolutional neural network construction and training stage.

The spatial transformer layer construction stage comprises:

a localization network receives an input feature map and, through a series of hidden layers, outputs spatial transformation parameters, i.e. the parameters with which a transformation function acts on the feature map;

a grid generator produces a sampling grid from the spatial transformation parameters output by the localization network;

a sampler takes the input feature map and the sampling grid as input, samples the input feature map at the grid points, and produces the output feature map.

The deep convolutional neural network construction and training stage comprises:

constructing the structure of a deep convolutional neural network and placing the constructed spatial transformer layer at the very beginning of the network to obtain the target deep convolutional neural network;

training the target deep convolutional neural network with stochastic gradient descent to obtain a character recognition model, which is used to perform character recognition on an input character image to be recognized.
Preferably, the localization network comprises two convolutional layers, each with M convolution kernels of size N and stride s; a max-pooling layer of size L and stride t follows each convolutional layer, and a ReLU layer follows each pooling layer. A fully connected layer follows the second ReLU layer, another ReLU layer follows it, and the final layer is again a fully connected layer that outputs the spatial transformation parameters, of dimension d.
Preferably, the grid generator produces the sampling grid from the spatial transformation parameters output by the localization network by:

obtaining from $\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}$ the output pixel corresponding to each pixel of the input feature map, all output pixels together forming the sampling grid of the output feature map, where $(x_i^s, y_i^s)$ are the source coordinates of the i-th pixel in the input feature map, $(x_i^t, y_i^t)$ are the target coordinates of the i-th pixel of the sampling grid in the output feature map, $A_\theta$ is the affine transformation matrix, i.e. the spatial transformation parameters output by the localization network, and $G_i$ denotes the i-th point of the sampling grid.
Preferably, the sampler takes the input feature map and the sampling grid as input, samples the input feature map at the grid points, and obtains the output feature map by:

obtaining from $V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c \, k(x_i^s - m; \Phi_x)\, k(y_i^s - n; \Phi_y)$ the pixel value at each coordinate $(x_i^t, y_i^t)$ of the output feature map, where $\Phi_x$ and $\Phi_y$ are the parameters of the sampling kernel $k(\cdot)$, $U_{nm}^c$ is the pixel value at coordinates $(n, m)$ in channel c of the input feature map, $V_i^c$ is the output pixel value at the coordinates $(x_i^t, y_i^t)$ of the i-th pixel in channel c of the output feature map, W is the width of the input feature map, H is its height, and C is its number of channels.
Preferably, the sampler takes the input feature map and the sampling grid as input, samples the input feature map at the grid points, and obtains the output feature map by:

obtaining from $V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c\, \delta(\lfloor x_i^s + 0.5\rfloor - m)\, \delta(\lfloor y_i^s + 0.5\rfloor - n)$ the pixel value at each coordinate $(x_i^t, y_i^t)$ of the output feature map, where $\lfloor\cdot\rfloor$ denotes rounding down, $\delta(\cdot)$ is the Kronecker delta, $U_{nm}^c$ is the pixel value at coordinates $(n, m)$ in channel c of the input feature map, $V_i^c$ is the output pixel value at the coordinates $(x_i^t, y_i^t)$ of the i-th pixel in channel c of the output feature map, W is the width of the input feature map, H is its height, and C is its number of channels.
Preferably, the sampler takes the input feature map and the sampling grid as input, samples the input feature map at the grid points, and obtains the output feature map by:

obtaining from $V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c\, \max(0, 1-|x_i^s - m|)\, \max(0, 1-|y_i^s - n|)$ the pixel value at each coordinate $(x_i^t, y_i^t)$ of the output feature map, where $U_{nm}^c$ is the pixel value at coordinates $(n, m)$ in channel c of the input feature map, $V_i^c$ is the output pixel value at the coordinates $(x_i^t, y_i^t)$ of the i-th pixel in channel c of the output feature map, W is the width of the input feature map, H is its height, and C is its number of channels.
In general, compared with the prior art, the above technical scheme of the present invention achieves the following beneficial effects: the proposed character recognition method based on deep learning incorporates a spatial transformer layer into a convolutional neural network, so the network can actively apply various spatial transformations to the input character image, without any extra training supervision or modification of the optimization process. Results show that with a spatial transformer layer the model learns invariance to translation, scaling, rotation and more general spatial warping, and better recognizes characters exhibiting substantial deformation.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the character recognition method based on deep learning disclosed in an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the spatial transformer layer disclosed in an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical scheme and advantages of the present invention clearer, the invention is further elaborated below in conjunction with the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. Moreover, the technical features involved in the embodiments described below may be combined with each other as long as they do not conflict.
The character recognition method based on deep learning disclosed by the invention designs a deep spatial transformer convolutional neural network that can actively apply various spatial transformations to the input character picture, thereby achieving data augmentation while improving the spatial invariance of the network, and thus attaining relatively higher recognition accuracy for Chinese characters.

Fig. 1 is a schematic flow chart of the character recognition method based on deep learning disclosed in an embodiment of the present invention. The method comprises two stages, namely the spatial transformer layer construction stage and the deep convolutional neural network construction and training stage; the two stages are described in detail below.
(A) The spatial transformer layer construction stage comprises:

The localization network receives the input feature map and, through a series of hidden layers, outputs spatial transformation parameters, i.e. the parameters with which a transformation function acts on the feature map.

Specifically, the localization network takes a feature map $U \in \mathbb{R}^{H \times W \times C}$ as input, with width W, height H and C channels, and outputs θ, the parameters of the transformation $\mathcal{T}_\theta$ applied to the feature map: $\theta = f_{loc}(U)$. The form of θ depends on the transformation type being parameterized; for an affine transformation, θ is a 6-dimensional output.

The localization network function $f_{loc}(\cdot)$ can take any form, such as a fully connected network or a convolutional neural network, but the final layer must include a regression layer that generates the transformation parameters θ.
In the present invention, the localization network comprises two convolutional layers, each with M convolution kernels of size N and stride s. A max-pooling layer of size L and stride t follows each convolutional layer, and a rectified linear unit (ReLU) layer follows each pooling layer. A fully connected layer follows the second ReLU layer, another ReLU layer follows it, and the final layer is again a fully connected layer that outputs the spatial transformation parameters, of dimension d. Preferably, M = 20, N = 5, s = 1, L = 2, t = 2, and the output size of the fully connected layer is preferably 20.
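As a rough sanity check on this architecture, the spatial size reaching the first fully connected layer can be worked out with standard convolution/pooling arithmetic. The sketch below assumes zero padding and the preferred values M = 20, N = 5, s = 1, L = 2, t = 2; the helper names are ours, not the patent's:

```python
def conv_out(size, kernel, stride, pad=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

def loc_net_flat_size(h, w, M=20, N=5, s=1, L=2, t=2):
    """Flattened size entering the first fully connected layer of the
    localization network: two blocks of conv(N, stride s) followed by
    max-pool(L, stride t), with M feature maps after the second conv."""
    for _ in range(2):
        h, w = conv_out(h, N, s), conv_out(w, N, s)   # convolution
        h, w = conv_out(h, L, t), conv_out(w, L, t)   # max pooling
    return M * h * w
```

For a hypothetical 28 × 28 input this gives 20 × 4 × 4 = 320 inputs to the fully connected layer.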
The grid generator produces a sampling grid from the spatial transformation parameters output by the localization network.

To apply an arbitrary deformation to the input feature map, each output pixel is computed with a sampling kernel centred at a particular location in the input feature map. Here an input pixel means a pixel of a generic feature map, not necessarily of the original image. In general the output pixels are defined on a regular grid of pixels $G = \{G_i\}$, producing the output feature map $V \in \mathbb{R}^{H' \times W' \times C}$, where H' and W' are the height and width of the sampling grid and C is the number of channels.
Suppose $\mathcal{T}_\theta$ is a 2D affine transformation $A_\theta$; the pixel-wise transformation is then given by formula (1):

$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = \mathcal{T}_\theta(G_i) = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix} \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} \quad (1)$

where $(x_i^t, y_i^t)$ are the target coordinates of the i-th pixel of the sampling grid in the output feature map, $(x_i^s, y_i^s)$ are the source coordinates in the input feature map defining the i-th sample point, and $A_\theta$ is the affine transformation matrix. We use height- and width-normalized coordinates, so that $-1 \le x_i^t, y_i^t \le 1$ within the spatial bounds of the output and $-1 \le x_i^s, y_i^s \le 1$ within the spatial bounds of the input. The source/target transformation and sampling are equivalent to standard texture mapping and coordinate handling in graphics.
More restrictive classes of transformation $\mathcal{T}_\theta$ can also be used. For example, with a transformation matrix of the form $A_\theta = \begin{bmatrix} s & 0 & t_x \\ 0 & s & t_y \end{bmatrix}$, operations such as cropping, translation and scaling can be realized by adjusting $s$, $t_x$ and $t_y$. In fact the transformation may take any parameterized form, subject to one condition: it must be differentiable with respect to its parameters. This point is crucial, because it allows gradients to be back-propagated from the sample points $\mathcal{T}_\theta(G_i)$ to the localization network, and thus to the parameters θ.
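Under the same assumptions (a 2 × 3 affine matrix acting on normalized homogeneous target coordinates, as in formula (1)), the grid generator can be sketched in numpy; the function name and argument layout are illustrative, not from the patent:

```python
import numpy as np

def affine_grid(theta, H_out, W_out):
    """Map each normalized target coordinate (x_t, y_t) in [-1, 1]
    through the 2x3 affine matrix theta to a source coordinate.
    Returns an (H_out, W_out, 2) array of (x_s, y_s) coordinates."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H_out),
                         np.linspace(-1, 1, W_out), indexing="ij")
    ones = np.ones_like(xs)
    # Homogeneous target coordinates, shape (3, H_out * W_out)
    tgt = np.stack([xs.ravel(), ys.ravel(), ones.ravel()])
    src = theta @ tgt                      # (2, H_out * W_out)
    return src.T.reshape(H_out, W_out, 2)  # (x_s, y_s) per grid point
```

With the identity matrix the source grid coincides with the target grid; a matrix of the constrained form above would crop, translate or scale it.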
The sampler takes the input feature map and the sampling grid as input, samples the input feature map at the grid points, and produces the output feature map.

To perform the spatial transformation on the input feature map, the sampler samples the feature map U at the set of sampling points $\mathcal{T}_\theta(G)$ to obtain the sampled output feature map V. Each coordinate $(x_i^s, y_i^s)$ in $\mathcal{T}_\theta(G)$ defines a spatial location in the input feature map at which a sampling kernel is applied to obtain the value of a particular pixel of the output feature map, as in formula (2):

$V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c \, k(x_i^s - m; \Phi_x)\, k(y_i^s - n; \Phi_y) \quad (2)$

where $\Phi_x$ and $\Phi_y$ are the parameters of a generic sampling kernel $k(\cdot)$ defining the image interpolation (e.g. linear interpolation), $U_{nm}^c$ is the value at coordinates $(n, m)$ in channel c of the input feature map, and $V_i^c$ is the output value of the pixel at coordinates $(x_i^t, y_i^t)$ in channel c. Note that the same sampling is applied to every channel of the input feature map, so every channel is transformed in an identical way (which preserves spatial consistency between channels).
In theory any sampling kernel may be used, as long as (sub-)gradients can be defined with respect to $x_i^s$ and $y_i^s$. For example, using an integer sampling kernel, formula (2) reduces to formula (3):

$V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c\, \delta(\lfloor x_i^s + 0.5\rfloor - m)\, \delta(\lfloor y_i^s + 0.5\rfloor - n) \quad (3)$

where $\lfloor x + 0.5\rfloor$ rounds x to the nearest integer and $\delta(\cdot)$ is the Kronecker delta. This kernel simply copies the value of the pixel nearest to $(x_i^s, y_i^s)$ to the output position $(x_i^t, y_i^t)$. Alternatively, a bilinear sampling kernel can be used, as in formula (4):

$V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c\, \max(0, 1-|x_i^s - m|)\, \max(0, 1-|y_i^s - n|) \quad (4)$

To allow back-propagation of the loss through this sampling mechanism, we define the gradients with respect to U and G. For the bilinear kernel (4), the partial derivatives are given by formulas (5) and (6):

$\frac{\partial V_i^c}{\partial U_{nm}^c} = \max(0, 1-|x_i^s - m|)\, \max(0, 1-|y_i^s - n|) \quad (5)$

$\frac{\partial V_i^c}{\partial x_i^s} = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c\, \max(0, 1-|y_i^s - n|) \begin{cases} 0 & |m - x_i^s| \ge 1 \\ 1 & m \ge x_i^s \\ -1 & m < x_i^s \end{cases} \quad (6)$

and $\partial V_i^c / \partial y_i^s$ is computed analogously to (6).
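A direct, unoptimized single-channel transcription of the bilinear kernel of formula (4) might look as follows; the normalized-coordinate convention and function name are our assumptions:

```python
import numpy as np

def bilinear_sample(U, grid):
    """Sample a single-channel feature map U (H x W) at a grid of
    normalized (x_s, y_s) source coordinates in [-1, 1], using the
    bilinear kernel max(0, 1-|x_s - m|) * max(0, 1-|y_s - n|)."""
    H, W = U.shape
    xs = (grid[..., 0] + 1) * (W - 1) / 2  # to pixel coordinates
    ys = (grid[..., 1] + 1) * (H - 1) / 2
    V = np.zeros(grid.shape[:2])
    for i in np.ndindex(V.shape):
        x, y = xs[i], ys[i]
        for n in range(H):          # the double sum of formula (4)
            for m in range(W):
                w_x = max(0.0, 1 - abs(x - m))
                w_y = max(0.0, 1 - abs(y - n))
                if w_x and w_y:
                    V[i] += U[n, m] * w_x * w_y
    return V
```

An identity grid reproduces the input, and a grid point midway between pixels returns their bilinear average; in a real network the same kernel also yields the gradients of formulas (5) and (6).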
The localization network, grid generator and sampler together form the spatial transformer layer, whose structure is shown in Fig. 2. It is a fully self-contained module that can be placed in any quantity at any position in a convolutional neural network, yielding a spatial transformer network. The module is fast to compute, does not affect training speed, and incurs little time overhead.
(B) The deep convolutional neural network construction and training stage comprises:

Constructing the structure of the deep convolutional neural network and placing the constructed spatial transformer layer at the very beginning of the network to obtain the target deep convolutional neural network.

Building the deep convolutional neural network includes defining the number of layers, the convolution window sizes and the numbers of nodes. As an optional embodiment, in the embodiment of the present invention the finished network contains 14 parameterized layers (19 layers if the input layer, pooling layers and softmax output are also counted). The network includes 4 inception modules, which increase the number of network nodes while keeping the network effectively under control, so that the computational complexity of training does not grow explosively. Each inception module consists of convolutional layers of sizes 1 × 1, 3 × 3 and 5 × 5 and a 3 × 3 max-pooling layer.
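The patent does not give per-branch kernel counts, so the following only sketches the channel bookkeeping of such a module: with "same" padding in the 1 × 1, 3 × 3 and 5 × 5 branches and a stride-1 3 × 3 pooling branch, the spatial size is preserved and the branch outputs concatenate along the channel axis. The function and the example counts are assumptions:

```python
def inception_output_shape(h, w, c1, c3, c5, cp):
    """Output shape of an inception module whose four branches
    (1x1 conv, 3x3 conv, 5x5 conv, pooling projection) preserve the
    h x w spatial size and are concatenated channel-wise."""
    return (h, w, c1 + c3 + c5 + cp)
```

For example, hypothetical branch widths of 64, 128, 32 and 32 on a 28 × 28 map give a 28 × 28 × 256 output.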
The target deep convolutional neural network is trained with stochastic gradient descent to obtain the character recognition model, which is used to perform character recognition on the input character image to be recognized.

Because the spatial transformer layer is fully self-contained, it can be placed in any quantity at any position in the network. In the present invention the spatial transformer designed in the previous stage is placed at the beginning of the network, i.e. immediately after the data input layer.

Combining the spatial transformer layer with the deep convolutional neural network yields a spatial transformer network that can actively apply spatial transformations to feature maps, enhancing the network's invariance to translation, scaling, rotation and more general spatial warping.
The network model is trained with stochastic gradient descent to obtain the target network, with the following parameter settings: batch size 256, base learning rate 0.01, no weight decay, and the learning rate reduced by a factor of 10 every 50k iterations. The network weights are randomly initialized, except for the final regression layer of the localization network, which is initialized to regress the identity transformation. The target network is used to perform character recognition on the input character image.
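The stated schedule (base learning rate 0.01, reduced tenfold every 50k iterations) amounts to a standard step schedule; a minimal sketch, with a function name of our choosing:

```python
def step_learning_rate(iteration, base_lr=0.01, step=50_000, gamma=0.1):
    """Learning rate at a given SGD iteration: start at base_lr and
    multiply by gamma after every `step` iterations."""
    return base_lr * gamma ** (iteration // step)
```

So iterations 0 through 49,999 train at 0.01, the next 50k at 0.001, and so on.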
Those skilled in the art will readily understand that the above is merely a description of preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (6)
1. A character recognition method based on deep learning, characterized by comprising a spatial transformer layer construction stage and a deep convolutional neural network construction and training stage;
the spatial transformer layer construction stage comprises:
a localization network receives an input feature map and, through a series of hidden layers, outputs spatial transformation parameters, i.e. the parameters with which a transformation function acts on the feature map;
a grid generator produces a sampling grid from the spatial transformation parameters output by the localization network;
a sampler takes the input feature map and the sampling grid as input, samples the input feature map at the grid points, and produces the output feature map;
the deep convolutional neural network construction and training stage comprises:
constructing the structure of a deep convolutional neural network and placing the constructed spatial transformer layer at the very beginning of the network to obtain a target deep convolutional neural network;
training the target deep convolutional neural network with stochastic gradient descent to obtain a character recognition model, the character recognition model being used to perform character recognition on an input character image to be recognized.
2. The method according to claim 1, characterized in that the localization network comprises two convolutional layers, each with M convolution kernels of size N and stride s; a max-pooling layer of size L and stride t follows each convolutional layer, and a ReLU layer follows each pooling layer; a fully connected layer follows the second ReLU layer, another ReLU layer follows it, and the final layer is again a fully connected layer that outputs the spatial transformation parameters, of dimension d.
3. The method according to claim 1, characterized in that the grid generator produces the sampling grid from the spatial transformation parameters output by the localization network by:
obtaining from $\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = \mathcal{T}_\theta(G_i) = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}$ the output pixel corresponding to each pixel of the input feature map, all output pixels together forming the sampling grid of the output feature map, where $(x_i^s, y_i^s)$ are the source coordinates of the i-th pixel in the input feature map, $(x_i^t, y_i^t)$ are the target coordinates of the i-th pixel of the sampling grid in the output feature map, $A_\theta$ is the affine transformation matrix, i.e. the spatial transformation parameters output by the localization network, $\mathcal{T}_\theta$ is the transformation function, and $G_i$ denotes the i-th point of the sampling grid.
4. The method according to claim 3, characterized in that the sampler takes the input feature map and the sampling grid as input, samples the input feature map at the grid points, and obtains the output feature map by:
obtaining from $V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c\, k(x_i^s - m; \Phi_x)\, k(y_i^s - n; \Phi_y)$ the pixel value at each coordinate $(x_i^t, y_i^t)$ of the output feature map, where $\Phi_x$ and $\Phi_y$ are the parameters of the sampling kernel $k(\cdot)$, $U_{nm}^c$ is the pixel value at coordinates $(n, m)$ in channel c of the input feature map, $V_i^c$ is the output pixel value at the coordinates $(x_i^t, y_i^t)$ of the i-th pixel in channel c of the output feature map, W is the width of the input feature map, H is its height, and C is its number of channels.
5. The method according to claim 3, characterized in that the sampler takes the input feature map and the sampling grid as input, samples the input feature map at the grid points, and obtains the output feature map by:
obtaining from $V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c\, \delta(\lfloor x_i^s + 0.5\rfloor - m)\, \delta(\lfloor y_i^s + 0.5\rfloor - n)$ the pixel value at each coordinate $(x_i^t, y_i^t)$ of the output feature map, where $\lfloor\cdot\rfloor$ denotes rounding down, $\delta(\cdot)$ is the Kronecker delta, $U_{nm}^c$ is the pixel value at coordinates $(n, m)$ in channel c of the input feature map, $V_i^c$ is the output pixel value at the coordinates $(x_i^t, y_i^t)$ of the i-th pixel in channel c of the output feature map, W is the width of the input feature map, H is its height, and C is its number of channels.
6. The method according to claim 3, characterized in that the sampler takes the input feature map and the sampling grid as input, samples the input feature map at the grid points, and obtains the output feature map by:
obtaining from $V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c\, \max(0, 1-|x_i^s - m|)\, \max(0, 1-|y_i^s - n|)$ the pixel value at each coordinate $(x_i^t, y_i^t)$ of the output feature map, where $U_{nm}^c$ is the pixel value at coordinates $(n, m)$ in channel c of the input feature map, $V_i^c$ is the output pixel value at the coordinates $(x_i^t, y_i^t)$ of the i-th pixel in channel c of the output feature map, W is the width of the input feature map, H is its height, and C is its number of channels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710538785.1A CN107273897A (en) | 2017-07-04 | 2017-07-04 | A kind of character recognition method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710538785.1A CN107273897A (en) | 2017-07-04 | 2017-07-04 | A kind of character recognition method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107273897A true CN107273897A (en) | 2017-10-20 |
Family
ID=60071378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710538785.1A Pending CN107273897A (en) | 2017-07-04 | 2017-07-04 | A kind of character recognition method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107273897A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105205448A (en) * | 2015-08-11 | 2015-12-30 | 中国科学院自动化研究所 | Character recognition model training method based on deep learning and recognition method thereof |
CN105184312A (en) * | 2015-08-24 | 2015-12-23 | 中国科学院自动化研究所 | Character detection method and device based on deep learning |
CN105335754A (en) * | 2015-10-29 | 2016-02-17 | 小米科技有限责任公司 | Character recognition method and device |
CN105809164A (en) * | 2016-03-11 | 2016-07-27 | 北京旷视科技有限公司 | Character identification method and device |
Non-Patent Citations (3)
Title |
---|
CHRISTIAN SZEGEDY et al.: "Going Deeper with Convolutions", 2015 IEEE * |
MAX JADERBERG et al.: "Spatial Transformer Networks", arXiv * |
樊重俊 et al.: "Big Data Analysis and Applications" (《大数据分析与应用》), 31 January 2016, Lixin Accounting Publishing House * |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704859A (en) * | 2017-11-01 | 2018-02-16 | 哈尔滨工业大学深圳研究生院 | A kind of character recognition method based on deep learning training framework |
CN108229474A (en) * | 2017-12-29 | 2018-06-29 | 北京旷视科技有限公司 | Licence plate recognition method, device and electronic equipment |
CN108229474B (en) * | 2017-12-29 | 2019-10-01 | 北京旷视科技有限公司 | Licence plate recognition method, device and electronic equipment |
CN108509881A (en) * | 2018-03-22 | 2018-09-07 | 五邑大学 | A kind of the Off-line Handwritten Chinese text recognition method of no cutting |
CN108681735A (en) * | 2018-03-28 | 2018-10-19 | 中科博宏(北京)科技有限公司 | Optical character recognition method based on convolutional neural networks deep learning model |
CN110147785A (en) * | 2018-03-29 | 2019-08-20 | 腾讯科技(深圳)有限公司 | Image-recognizing method, relevant apparatus and equipment |
CN110147785B (en) * | 2018-03-29 | 2023-01-10 | 腾讯科技(深圳)有限公司 | Image recognition method, related device and equipment |
CN110619325B (en) * | 2018-06-20 | 2024-03-08 | 北京搜狗科技发展有限公司 | Text recognition method and device |
CN110619325A (en) * | 2018-06-20 | 2019-12-27 | 北京搜狗科技发展有限公司 | Text recognition method and device |
CN108932494A (en) * | 2018-06-29 | 2018-12-04 | 北京字节跳动网络技术有限公司 | Number identification method, system, equipment and computer readable storage medium |
US11423634B2 (en) | 2018-08-03 | 2022-08-23 | Huawei Cloud Computing Technologies Co., Ltd. | Object detection model training method, apparatus, and device |
US11605211B2 (en) | 2018-08-03 | 2023-03-14 | Huawei Cloud Computing Technologies Co., Ltd. | Object detection model training method and apparatus, and device |
CN109886077A (en) * | 2018-12-28 | 2019-06-14 | 北京旷视科技有限公司 | Image-recognizing method, device, computer equipment and storage medium |
CN109801234B (en) * | 2018-12-28 | 2023-09-22 | 南京美乐威电子科技有限公司 | Image geometry correction method and device |
CN109801234A (en) * | 2018-12-28 | 2019-05-24 | 南京美乐威电子科技有限公司 | Geometric image correction method and device |
TWI734085B (en) * | 2019-03-13 | 2021-07-21 | 中華電信股份有限公司 | Dialogue system using intention detection ensemble learning and method thereof |
CN110738188A (en) * | 2019-10-24 | 2020-01-31 | 程少轩 | Ancient character recognition system based on presorting |
CN110766020A (en) * | 2019-10-30 | 2020-02-07 | 哈尔滨工业大学 | System and method for detecting and identifying multi-language natural scene text |
CN111429580A (en) * | 2020-02-17 | 2020-07-17 | 浙江工业大学 | Space omnibearing simulation system and method based on virtual reality technology |
CN111783761A (en) * | 2020-06-30 | 2020-10-16 | 苏州科达科技股份有限公司 | Certificate text detection method and device and electronic equipment |
WO2022047662A1 (en) * | 2020-09-02 | 2022-03-10 | Intel Corporation | Method and system of neural network object recognition for warpable jerseys with multiple attributes |
CN112801088A (en) * | 2020-12-31 | 2021-05-14 | 科大讯飞股份有限公司 | Method and related device for correcting distorted text line image |
CN112801088B (en) * | 2020-12-31 | 2024-05-31 | 科大讯飞股份有限公司 | Method and related device for correcting distorted text line image |
CN113657364A (en) * | 2021-08-13 | 2021-11-16 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for recognizing character mark |
CN113657364B (en) * | 2021-08-13 | 2023-07-25 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for identifying text mark |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107273897A (en) | A kind of character recognition method based on deep learning | |
CN109948165B (en) | Fine granularity emotion polarity prediction method based on mixed attention network | |
CN111753828B (en) | Natural scene horizontal character detection method based on deep convolutional neural network | |
CN107464210A (en) | A kind of image Style Transfer method based on production confrontation network | |
CN113221639B (en) | Micro-expression recognition method for representative AU (AU) region extraction based on multi-task learning | |
CN104217214B (en) | RGB D personage's Activity recognition methods based on configurable convolutional neural networks | |
CN105469047B (en) | Chinese detection method and system based on unsupervised learning deep learning network | |
CN106778835A (en) | The airport target by using remote sensing image recognition methods of fusion scene information and depth characteristic | |
CN110110599B (en) | Remote sensing image target detection method based on multi-scale feature fusion | |
CN108304357A (en) | A kind of Chinese word library automatic generation method based on font manifold | |
CN106022355B (en) | High spectrum image sky based on 3DCNN composes joint classification method | |
CN110334584B (en) | Gesture recognition method based on regional full convolution network | |
CN111242241A (en) | Method for amplifying etched character recognition network training sample | |
CN112069900A (en) | Bill character recognition method and system based on convolutional neural network | |
CN109255339B (en) | Classification method based on self-adaptive deep forest human gait energy map | |
CN112347970A (en) | Remote sensing image ground object identification method based on graph convolution neural network | |
CN111127360A (en) | Gray level image transfer learning method based on automatic encoder | |
CN113392244A (en) | Three-dimensional model retrieval method and system based on depth measurement learning | |
Manandhar et al. | Magic layouts: Structural prior for component detection in user interface designs | |
CN114299578A (en) | Dynamic human face generation method based on facial emotion analysis | |
CN116933141B (en) | Multispectral laser radar point cloud classification method based on multicore graph learning | |
CN110866552B (en) | Hyperspectral image classification method based on full convolution space propagation network | |
Du et al. | CAPTCHA recognition based on faster R-CNN | |
CN105844299B (en) | A kind of image classification method based on bag of words | |
CN114627312B (en) | Zero sample image classification method, system, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171020 |