CN108399408A - Deformed-character correction method based on a deep spatial transformer network - Google Patents
Deformed-character correction method based on a deep spatial transformer network Download PDF Info
- Publication number
- CN108399408A (application CN201810181595.3A)
- Authority
- CN
- China
- Prior art keywords
- character
- network
- spatial transformer
- deformed characters
- transformation parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1475—Inclination or skew detection or correction of characters or of image to be recognised
- G06V30/1478—Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Image Processing (AREA)
Abstract
Optical character recognition (OCR) is a classical problem in the field of pattern recognition and image processing, and a basic research direction of current interest in artificial intelligence. The technology is widely used in industries such as banking, transportation, customs, public security, and logistics, greatly reducing labor costs and improving working efficiency. In order to automatically rectify characters subjected to various affine transformations such as translation, skew, rotation, and scale change, the invention discloses a deformed-character correction method based on a deep spatial transformer network: the correction of deformed characters is realized by introducing a spatial transformer network, where the spatial transformer network is a character spatial transformer network that specifically comprises a character localization network, a grid generator, and a character pixel sampler. The invention is applicable to various deformations of character images, reduces the constraints on the input image, can be applied to both grayscale and RGB images, and has strong noise robustness.
Description
Technical field
The invention belongs to the technical field of character image processing, and in particular relates to a deformed-character correction method based on a deep spatial transformer network.
Background technology
Optical character recognition (OCR) is a classical problem in the field of pattern recognition and image processing, and a basic research direction of current interest in artificial intelligence. The technology is widely used in industries such as banking, transportation, customs, public security, and logistics, greatly reducing labor costs and improving working efficiency. In practical character recognition scenarios, however, character images are more or less deformed due to shooting angle, distance, and similar factors, e.g. by translation, skew, rotation, or scale change, and such deformations reduce the recognition accuracy of OCR to some extent. Correcting deformed characters is therefore an important way to improve OCR recognition accuracy. Deformed-character correction is generally realized by image processing and related technical means, so that computer algorithms can recognize characters better and recognition accuracy is further improved.
Existing deformed-character correction techniques based on image processing mainly fall into the following three kinds:
(1) Character correction algorithms based on the bounding rectangle. This scheme determines the degree of rotation of a character image by computing the height-to-width ratio of the character's minimum bounding rectangle, and computes the angle by which the image needs to be corrected from the correspondence between that ratio and the character rotation angle. The scheme can only solve the rotation correction of character images at the same scale, and is not applicable to other deformations of character images such as scaling, skew, or translation.
(2) Character correction techniques based on the parallel splitting augmented Lagrange multiplier method. This scheme models character correction as a low-rank matrix recovery problem and combines a parallel splitting method with the Lagrange multiplier method to realize restorative correction of grayscale images. The scheme is designed for grayscale images and is hard to apply to the RGB images that are widespread in real life.
(3) Character correction methods based on the Hough transform. After detecting the skew angle with the Hough transform, this scheme performs projection-based localization of the characters on the skewed plane and applies the corresponding rotation correction to the segmented characters one by one. Its limitation is similar to that of scheme (1): it can only handle rotational deformation and is not applicable to other deformations.
The spatial transformer network was first proposed in 2015 to solve the problem of adapting to local transformations of the input image in deep convolutional neural networks, so as to improve classification accuracy. A spatial transformer network is a deep network, built on top of a deep neural network, that can adaptively learn local spatial transformations of an image; using the affine transformation of a coordinate matrix, it can realize a series of classical image deformations such as translation, scaling, rotation, skew, and other geometric transformations.
The invention discloses a deformed-character correction method based on a deep spatial transformer network, which uses the network to automatically regress the affine transformation parameters of a deformed character image and thereby restore and correct the deformed characters. The scheme considers the various affine transformations of characters, such as rotation, translation, scaling, and skew, and is therefore applicable to many kinds of deformation of character images. Because the scheme uses a deep spatial transformer network to automatically regress the transformation parameters, the constraints on the input image are reduced; it can be applied to both grayscale and RGB images, and it has strong noise robustness.
Summary of the invention
Purpose of the invention: in order to automatically rectify characters subjected to various affine transformations such as translation, skew, rotation, and scale change, the invention discloses a deformed-character correction method based on a deep spatial transformer network.
Technical scheme of the present invention:
A deformed-character correction method based on a deep spatial transformer network, characterized in that the correction of deformed characters is realized by introducing a spatial transformer network, where the spatial transformer network is a character spatial transformer network that specifically comprises a character localization network, a grid generator, and a character pixel sampler.
Realizing the correction of deformed characters by introducing the spatial transformer network specifically includes the following steps:
Step 1, character image data preprocessing: generate affine-transformed character samples, covering translation, rotation, scaling, and skew, from a character image training set; record the corresponding affine transformation parameters; and pair each generated deformed character sample with its original character sample for supervised training;
Step 2, design the character spatial transformer network according to the principle of spatial transformer networks;
Step 3, training of the character spatial transformer network: use the paired samples prepared in step 1 and the affine transformation parameters of the deformed characters in those pairs to train the designed character spatial transformer network in a supervised way;
Step 4, affine transformation parameter regression: use the character localization network to regress the affine transformation parameters of the deformed characters;
Step 5, sampling grid generation: from the affine transformation parameters θ, the grid generator computes, for each grid point of the output image, the corresponding grid coordinate position in the input image;
Step 6, character pixel interpolation sampling: according to the correspondence between the input grid computed in step 5 and the output grid, interpolate character pixels in the input image to determine the pixel value of each point of the output grid, completing the rectification of the characters.
The character image training set described in step 1 may use MNIST.
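As a minimal illustrative sketch of the step-1 preprocessing (not the patent's actual code), a known affine transform can be applied to a tiny binary "character" and stored together with its parameters as a supervised training pair. The 5x5 stroke image, the 90-degree rotation, and the helper names `make_affine` and `warp_nearest` are all hypothetical, and nearest-neighbor sampling stands in here for whatever interpolation a real pipeline would use:

```python
import math

def make_affine(angle_deg, tx=0.0, ty=0.0, scale=1.0):
    """Build the 6 affine parameters [a, b, tx, c, d, ty] for a
    rotation + uniform scale + translation about the image center."""
    a = math.radians(angle_deg)
    return [scale * math.cos(a), -scale * math.sin(a), tx,
            scale * math.sin(a),  scale * math.cos(a), ty]

def warp_nearest(img, theta):
    """Warp a 2D list `img` by the affine `theta`: for each output pixel,
    sample the INPUT image at the inverse-mapped location (nearest
    neighbor). Coordinates are normalized to [-1, 1], as in step 5."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # normalized output-grid coordinates (x right, y down)
            xt = 2.0 * j / (w - 1) - 1.0
            yt = 2.0 * i / (h - 1) - 1.0
            # source coordinates: [xs, ys] = A_theta @ [xt, yt, 1]
            xs = theta[0] * xt + theta[1] * yt + theta[2]
            ys = theta[3] * xt + theta[4] * yt + theta[5]
            # back to pixel indices of the input image
            m = round((xs + 1.0) * (w - 1) / 2.0)
            n = round((ys + 1.0) * (h - 1) / 2.0)
            if 0 <= m < w and 0 <= n < h:
                out[i][j] = img[n][m]
    return out

# a 5x5 "character": a vertical stroke in column 2
char = [[1 if j == 2 else 0 for j in range(5)] for i in range(5)]
theta = make_affine(90)  # rotate by 90 degrees
pair = (char, warp_nearest(char, theta), theta)  # (original, deformed, label)
```

Repeating this over a training set such as MNIST, with randomly drawn translation, rotation, scale, and skew parameters, yields exactly the paired samples and parameter labels that the supervised training of step 3 requires.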
The character localization network is designed according to the size and difficulty of the character images to be handled, and specifically includes convolutional layers, pooling layers, nonlinear activation layers, fully connected layers, and a regression layer.
The supervised training of the affine transformation parameters described in step 3 is realized by computing the mean squared error between the network's regressed values and the ground-truth label values and performing backward gradient propagation;
the supervised training on paired samples described in step 3 is realized by computing the mean squared error between the pixel values of the original character sample and the corrected character sample and performing backward gradient propagation.
In the supervised training described in step 3, the training loss function consists of two parts, the paired-sample loss and the transformation-parameter loss, expressed mathematically as:
Loss = MSE(I_in, I_out) + MSE(θ_evl, θ_gt)
where I_in and I_out denote the input's paired original character sample and the corrected image output by the network, respectively; θ_evl and θ_gt denote the affine transformation parameters regressed by the deep spatial transformer network and the ground-truth transformation parameters, respectively; and MSE denotes the mean squared error. Under this loss function, the parameter values of the model are optimized by backward gradient propagation so that the model reaches a near-ideal state.
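The two-part loss above can be sketched in plain Python as follows. This is illustrative only: `mse` and `correction_loss` are hypothetical helper names, the pixel sequences are assumed to be flattened, and in a real implementation the MSE would be computed inside a deep-learning framework so that gradients can propagate backward:

```python
def mse(a, b):
    """Mean squared error over two equal-length flat sequences."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def correction_loss(original_pixels, corrected_pixels, theta_pred, theta_gt):
    """Combined training loss of the patent:
    Loss = MSE(I_in, I_out) + MSE(theta_evl, theta_gt)."""
    return mse(original_pixels, corrected_pixels) + mse(theta_pred, theta_gt)

loss = correction_loss(
    [1.0, 0.0, 1.0, 0.0],              # original character sample (flattened)
    [0.5, 0.0, 1.0, 0.0],              # network's corrected output
    [1.0, 0.0, 0.1, 0.0, 1.0, 0.0],    # regressed theta (6 parameters)
    [1.0, 0.0, 0.0, 0.0, 1.0, 0.0],    # ground-truth theta
)
```

Both terms pull the model in the same direction: the parameter term supervises the localization network directly, while the paired-sample term supervises the end-to-end corrected image.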
The affine transformation parameter regression described in step 4 is realized as follows:
regard the convolutional layers, pooling layers, and nonlinear unit layers of the character localization network as feature extraction units; the deformed character image then passes through several feature extraction units in turn and, after several fully connected layers, is fed into the regression layer to produce the six affine transformation parameters θ, which can be expressed as:
θ = F_loc(C_in)
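The six regressed parameters can be read as a 2x3 affine matrix acting on homogeneous normalized coordinates. A minimal sketch (the helper name `affine_apply` and the row-wise parameter ordering [a, b, tx, c, d, ty] are illustrative assumptions, not fixed by the patent text):

```python
def affine_apply(theta, xt, yt):
    """Apply the 6-parameter affine transform (read row-wise as a 2x3
    matrix A_theta) to a normalized output-grid point (xt, yt), yielding
    the corresponding input-grid point (xs, ys):
    [xs, ys] = A_theta @ [xt, yt, 1]."""
    a, b, tx, c, d, ty = theta
    return a * xt + b * yt + tx, c * xt + d * yt + ty

# identity parameters map every point to itself
assert affine_apply([1, 0, 0, 0, 1, 0], 0.3, -0.7) == (0.3, -0.7)
```

With this reading, pure translation lives in (tx, ty), while rotation, scale, and skew are encoded in the 2x2 block (a, b, c, d), which is why six parameters cover all the deformations the patent lists.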
The specific method of character pixel interpolation sampling described in step 6 is:
perform bilinear character pixel interpolation sampling in the input image, of the following form:
V_i = Σ_{n=1..H} Σ_{m=1..W} U_{nm} · max(0, 1 − |x_i^s − m|) · max(0, 1 − |y_i^s − n|)
where H and W denote the height and width of the input character image, U_{nm} denotes the pixel value of the input character image at coordinate point (n, m), and V_i denotes the pixel value of the i-th coordinate point of the output character image. In particular, when a solved input grid point (x_i^s, y_i^s) falls outside the interval (−1, 1), some coordinate point of the output grid has no corresponding input coordinate point to map to, and a nearby-border filling strategy can be adopted.
Beneficial effects of the invention: the deformed-character correction method based on a deep spatial transformer network designed by the invention uses the network to regress and estimate the affine transformation parameters of deformed characters, effectively covering various deformations of character images such as translation, skew, scaling, and rotation, and is therefore more widely applicable than traditional character correction solutions. The deep neural network accepts both grayscale and RGB character images as input, and the character localization network based on deep convolutional networks gives the method strong noise robustness. In addition, the scheme needs no hand-engineered feature extraction module for character images; instead, the deep spatial transformer network automatically discovers the deformation state of the image and estimates its deformation parameters, realizing "end-to-end" deformed-character correction, which greatly reduces the tedious steps of hand-engineering algorithms to extract transformation parameters. Finally, the network structure can be adjusted according to the complexity of the task, which also improves the generalization ability of the character correction technique.
Description of the drawings:
Fig. 1 is the flow chart of the deformed-character correction method based on a deep spatial transformer network. The dashed arrows indicate the loss functions used during model training to realize supervised training of the model; the solid arrows indicate the direction of data flow.
Fig. 2 is a schematic diagram of the specific correction flow of one embodiment of the invention.
Detailed description of the embodiments
Specific embodiments of the invention are introduced with reference to the accompanying drawings. In Fig. 1, the dashed arrows indicate the loss functions used during model training to realize supervised training of the model; the solid arrows indicate the direction of data flow.
The complete design and correction implementation steps of the invention are as follows:
Step 1, character image data preprocessing: generate affine-transformed character samples, covering translation, rotation, scaling, and skew, from a character image training set; record the corresponding affine transformation parameters; and pair each generated deformed character sample with its original character sample for supervised training. The character image training set is chosen according to the target character classes to be corrected; for example, if the characters to be corrected are numeric, MNIST can be used as the character image training set.
Step 2, design the character spatial transformer network according to the principle of spatial transformer networks; it specifically includes a character localization network, a grid generator, and a character pixel sampler. The character localization network is designed according to the size and difficulty of the character images to be handled, and includes convolutional layers, pooling layers, nonlinear activation layers, fully connected layers, and a regression layer.
Step 3, training of the character spatial transformer network: use the paired samples prepared in step 1 and the affine transformation parameters of the deformed characters in those pairs to train the designed character spatial transformer network in a supervised way. The supervised training of the affine transformation parameters is realized by computing the mean squared error between the network's regressed values and the ground-truth label values and performing backward gradient propagation; the supervised training on paired samples is realized by computing the mean squared error between the pixel values of the original character sample and the corrected character sample and performing backward gradient propagation. The training loss function consists of two parts, the paired-sample loss and the transformation-parameter loss, expressed mathematically as:
Loss = MSE(I_in, I_out) + MSE(θ_evl, θ_gt)
where I_in and I_out denote the input's paired original character sample and the corrected image output by the network, respectively; θ_evl and θ_gt denote the affine transformation parameters regressed by the deep spatial transformer network and the ground-truth transformation parameters, respectively; and MSE denotes the mean squared error. Under this loss function, the parameter values of the model are optimized by backward gradient propagation so that the model reaches a near-ideal state.
Step 4, affine transformation parameter regression: use the character localization network to regress the affine transformation parameters of the deformed characters. For example, the convolutional layers, pooling layers, and nonlinear unit layers of the character localization network can be regarded as feature extraction units; the deformed character image then passes through several feature extraction units in turn and, after several fully connected layers, is fed into the regression layer to produce the six affine transformation parameters θ, which can be expressed as:
θ = F_loc(C_in)
Step 5, sampling grid generation: from the affine transformation parameters θ, the grid generator computes, for each grid point of the output image, the corresponding grid coordinate position in the input image. In a specific implementation, to normalize the dimensions, the grid coordinates of the output image are first linearly normalized into the range (−1, 1); then, for each grid point (x_i^t, y_i^t) of the output image, the parameters θ regressed in the previous step are used to solve the corresponding grid point coordinates (x_i^s, y_i^s) of the input image. These input-image grid point coordinates are likewise normalized in the range (−1, 1) and serve as the sampling grid for the next step.
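The grid-generation step above can be sketched in plain Python: normalize each output grid point into (−1, 1), then inverse-map it through θ to obtain the input-image sampling coordinate. `sampling_grid` is a hypothetical helper name, and the row-wise ordering of the six parameters is an assumption:

```python
def sampling_grid(theta, h, w):
    """For each output pixel (i, j) of an h-by-w image, compute the
    normalized input-image coordinates (xs, ys) it should be sampled
    from. Output-grid coordinates are first normalized into [-1, 1]."""
    a, b, tx, c, d, ty = theta
    grid = []
    for i in range(h):
        row = []
        for j in range(w):
            xt = 2.0 * j / (w - 1) - 1.0  # normalized x of the output point
            yt = 2.0 * i / (h - 1) - 1.0  # normalized y of the output point
            # [xs, ys] = A_theta @ [xt, yt, 1]
            row.append((a * xt + b * yt + tx, c * xt + d * yt + ty))
        grid.append(row)
    return grid

g = sampling_grid([1, 0, 0, 0, 1, 0], 3, 3)  # identity theta
```

With identity parameters the sampling grid coincides with the normalized output grid, so the sampler of step 6 reproduces the input image unchanged; any other θ warps the grid and hence un-warps the character.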
Step 6, character pixel interpolation sampling: according to the correspondence between the input grid computed in step 5 and the output grid, interpolate character pixels in the input image to determine the pixel value of each point of the output grid. For example, bilinear character pixel interpolation sampling of the following form can be performed in the input image:
V_i = Σ_{n=1..H} Σ_{m=1..W} U_{nm} · max(0, 1 − |x_i^s − m|) · max(0, 1 − |y_i^s − n|)
where H and W denote the height and width of the input character image, U_{nm} denotes the pixel value of the input character image at coordinate point (n, m), and V_i denotes the pixel value of the i-th coordinate point of the output character image. In particular, when a solved input grid point (x_i^s, y_i^s) falls outside the interval (−1, 1), some coordinate point of the output grid has no corresponding input coordinate point to map to, and a nearby-border filling strategy can be adopted at this time.
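The bilinear sampling of step 6 can be sketched in plain Python as follows; this is an illustrative implementation under the normalized-coordinate convention described above, not the patent's code, and `bilinear_sample` is a hypothetical helper name. Clamping out-of-range coordinates to the nearest border pixel stands in here for the patent's "nearby-border filling strategy":

```python
def bilinear_sample(img, grid):
    """For each grid point (xs, ys) in normalized [-1, 1] coordinates,
    bilinearly interpolate the input image `img` (a 2D list). Points
    outside [-1, 1] are clamped to the nearest border pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * len(grid[0]) for _ in range(len(grid))]
    for i, row in enumerate(grid):
        for j, (xs, ys) in enumerate(row):
            # denormalize into pixel space and clamp to the image border
            x = min(max((xs + 1.0) * (w - 1) / 2.0, 0.0), w - 1.0)
            y = min(max((ys + 1.0) * (h - 1) / 2.0, 0.0), h - 1.0)
            m0, n0 = int(x), int(y)                    # top-left neighbor
            m1, n1 = min(m0 + 1, w - 1), min(n0 + 1, h - 1)
            fx, fy = x - m0, y - n0                    # fractional offsets
            out[i][j] = (img[n0][m0] * (1 - fx) * (1 - fy) +
                         img[n0][m1] * fx * (1 - fy) +
                         img[n1][m0] * (1 - fx) * fy +
                         img[n1][m1] * fx * fy)
    return out

img = [[1.0, 2.0], [3.0, 4.0]]
identity_grid = [[(-1.0, -1.0), (1.0, -1.0)], [(-1.0, 1.0), (1.0, 1.0)]]
restored = bilinear_sample(img, identity_grid)  # identity grid: image unchanged
```

Because each output pixel is a weighted average of at most four input pixels, the operation is differentiable in both the pixel values and the grid coordinates, which is what lets the loss of step 3 propagate gradients back through the sampler into the localization network.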
A specific embodiment of the technical solution of the invention is as follows:
suppose the input character image is a deformed character "去" at 80*80 resolution. The image first passes through the character localization network, which regresses the transformation parameters θ through the feed-forward propagation of the deep convolutional network and the fully connected network; the values are shown in the matrix at the lower right of Fig. 2. Then, from this θ value and the output grid coordinates, the grid generator inversely derives the sampling grid of the input image. Finally, according to the generated sampling grid, the character pixel sampler performs interpolation sampling (e.g. bilinear interpolation) on the input deformed character image, ultimately generating the corrected character image "去" shown at the lower left of Fig. 2.
Claims (9)
1. A deformed-character correction method based on a deep spatial transformer network, characterized in that the correction of deformed characters is realized by introducing a spatial transformer network, where the spatial transformer network is a character spatial transformer network that specifically comprises a character localization network, a grid generator, and a character pixel sampler.
2. The deformed-character correction method based on a deep spatial transformer network according to claim 1, characterized in that realizing the correction of deformed characters by introducing the spatial transformer network specifically includes the following steps:
Step 1, character image data preprocessing: generate affine-transformed character samples, covering translation, rotation, scaling, and skew, from a character image training set; record the corresponding affine transformation parameters; and pair each generated deformed character sample with its original character sample for supervised training;
Step 2, design the character spatial transformer network according to the principle of spatial transformer networks;
Step 3, training of the character spatial transformer network: use the paired samples prepared in step 1 and the affine transformation parameters of the deformed characters in those pairs to train the designed character spatial transformer network in a supervised way;
Step 4, affine transformation parameter regression: use the character localization network to regress the affine transformation parameters of the deformed characters;
Step 5, sampling grid generation: from the affine transformation parameters θ, the grid generator computes, for each grid point of the output image, the corresponding grid coordinate position in the input image;
Step 6, character pixel interpolation sampling: according to the correspondence between the input grid computed in step 5 and the output grid, interpolate character pixels in the input image to determine the pixel value of each point of the output grid, completing the rectification of the characters.
3. The deformed-character correction method based on a deep spatial transformer network according to claim 2, characterized in that the character image training set described in step 1 may use MNIST.
4. The deformed-character correction method based on a deep spatial transformer network according to claim 2, characterized in that the character localization network is designed according to the size and difficulty of the character images to be handled, and specifically includes convolutional layers, pooling layers, nonlinear activation layers, fully connected layers, and a regression layer.
5. The deformed-character correction method based on a deep spatial transformer network according to claim 2, characterized in that the supervised training of the affine transformation parameters described in step 3 is realized by computing the mean squared error between the network's regressed values and the ground-truth label values and performing backward gradient propagation.
6. The deformed-character correction method based on a deep spatial transformer network according to claim 2, characterized in that the supervised training on paired samples described in step 3 is realized by computing the mean squared error between the pixel values of the original character sample and the corrected character sample and performing backward gradient propagation.
7. The deformed-character correction method based on a deep spatial transformer network according to claim 2, characterized in that in the supervised training described in step 3, the training loss function consists of two parts, the paired-sample loss and the transformation-parameter loss, expressed mathematically as:
Loss = MSE(I_in, I_out) + MSE(θ_evl, θ_gt)
where I_in and I_out denote the input's paired original character sample and the corrected image output by the network, respectively; θ_evl and θ_gt denote the affine transformation parameters regressed by the deep spatial transformer network and the ground-truth transformation parameters, respectively; and MSE denotes the mean squared error. Under this loss function, the parameter values of the model are optimized by backward gradient propagation so that the model reaches a near-ideal state.
8. The deformed-character correction method based on a deep spatial transformer network according to claim 2, characterized in that the affine transformation parameter regression described in step 4 is realized as follows:
regard the convolutional layers, pooling layers, and nonlinear unit layers of the character localization network as feature extraction units; the deformed character image then passes through several feature extraction units in turn and, after several fully connected layers, is fed into the regression layer to produce the six affine transformation parameters θ, which can be expressed as:
θ = F_loc(C_in).
9. The deformed-character correction method based on a deep spatial transformer network according to claim 2, characterized in that the specific method of character pixel interpolation sampling described in step 6 is:
perform bilinear character pixel interpolation sampling in the input image, of the following form:
V_i = Σ_{n=1..H} Σ_{m=1..W} U_{nm} · max(0, 1 − |x_i^s − m|) · max(0, 1 − |y_i^s − n|)
where H and W denote the height and width of the input character image, U_{nm} denotes the pixel value of the input character image at coordinate point (n, m), and V_i denotes the pixel value of the i-th coordinate point of the output character image. In particular, when a solved input grid point (x_i^s, y_i^s) falls outside the interval (−1, 1), some coordinate point of the output grid has no corresponding input coordinate point to map to, and a nearby-border filling strategy can be adopted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810181595.3A CN108399408A (en) | 2018-03-06 | 2018-03-06 | Deformed-character correction method based on a deep spatial transformer network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810181595.3A CN108399408A (en) | 2018-03-06 | 2018-03-06 | Deformed-character correction method based on a deep spatial transformer network
Publications (1)
Publication Number | Publication Date |
---|---|
CN108399408A true CN108399408A (en) | 2018-08-14 |
Family
ID=63091899
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810181595.3A Pending CN108399408A (en) | 2018-03-06 | 2018-03-06 | Deformed-character correction method based on a deep spatial transformer network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108399408A (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109447911A (en) * | 2018-10-18 | 2019-03-08 | 百度在线网络技术(北京)有限公司 | Method, apparatus, storage medium and the terminal device of image restoration |
CN109801234A (en) * | 2018-12-28 | 2019-05-24 | 南京美乐威电子科技有限公司 | Geometric image correction method and device |
CN109829848A (en) * | 2019-01-17 | 2019-05-31 | 柳州康云互联科技有限公司 | A kind of system and method for Image space transformation neural network based in internet detection |
CN109886264A (en) * | 2019-01-08 | 2019-06-14 | 深圳禾思众成科技有限公司 | A kind of character detecting method, equipment and computer readable storage medium |
CN110321894A (en) * | 2019-04-23 | 2019-10-11 | 浙江工业大学 | A kind of library book method for rapidly positioning based on deep learning OCR |
CN110443782A (en) * | 2019-07-03 | 2019-11-12 | 杭州深睿博联科技有限公司 | Chest x-ray piece model alignment schemes and device, storage medium |
CN110929784A (en) * | 2019-11-21 | 2020-03-27 | 上海智臻智能网络科技股份有限公司 | Method for recognizing characters in picture, computer equipment and storage medium |
CN110956133A (en) * | 2019-11-29 | 2020-04-03 | 上海眼控科技股份有限公司 | Training method of single character text normalization model, text recognition method and device |
CN111046859A (en) * | 2018-10-11 | 2020-04-21 | 杭州海康威视数字技术股份有限公司 | Character recognition method and device |
CN111402156A (en) * | 2020-03-11 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Restoration method and device for smear image, storage medium and terminal equipment |
CN111539184A (en) * | 2020-04-29 | 2020-08-14 | 上海眼控科技股份有限公司 | Text data manufacturing method and device based on deep learning, terminal and storage medium |
CN111583099A (en) * | 2020-04-14 | 2020-08-25 | 上海联影智能医疗科技有限公司 | Image rectification method, computer device, and storage medium |
CN111783761A (en) * | 2020-06-30 | 2020-10-16 | 苏州科达科技股份有限公司 | Certificate text detection method and device and electronic equipment |
CN113642573A (en) * | 2021-07-20 | 2021-11-12 | 南京红松信息技术有限公司 | Picture separation method based on grids |
CN114782961A (en) * | 2022-03-23 | 2022-07-22 | 华南理工大学 | Character image augmentation method based on shape transformation |
WO2024045442A1 (en) * | 2022-08-30 | 2024-03-07 | 青岛云天励飞科技有限公司 | Image correction model training method, image correction method, device and storage medium |
- 2018-03-06 CN CN201810181595.3A patent/CN108399408A/en active Pending
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046859B (en) * | 2018-10-11 | 2023-09-29 | 杭州海康威视数字技术股份有限公司 | Character recognition method and device |
CN111046859A (en) * | 2018-10-11 | 2020-04-21 | 杭州海康威视数字技术股份有限公司 | Character recognition method and device |
CN109447911A (en) * | 2018-10-18 | 2019-03-08 | 百度在线网络技术(北京)有限公司 | Method, apparatus, storage medium and the terminal device of image restoration |
CN109801234A (en) * | 2018-12-28 | 2019-05-24 | 南京美乐威电子科技有限公司 | Geometric image correction method and device |
CN109801234B (en) * | 2018-12-28 | 2023-09-22 | 南京美乐威电子科技有限公司 | Image geometry correction method and device |
CN109886264A (en) * | 2019-01-08 | 2019-06-14 | 深圳禾思众成科技有限公司 | A kind of character detecting method, equipment and computer readable storage medium |
CN109829848A (en) * | 2019-01-17 | 2019-05-31 | 柳州康云互联科技有限公司 | A kind of system and method for Image space transformation neural network based in internet detection |
CN110321894A (en) * | 2019-04-23 | 2019-10-11 | 浙江工业大学 | A kind of library book method for rapidly positioning based on deep learning OCR |
CN110443782A (en) * | 2019-07-03 | 2019-11-12 | 杭州深睿博联科技有限公司 | Chest x-ray piece model alignment schemes and device, storage medium |
CN110929784A (en) * | 2019-11-21 | 2020-03-27 | 上海智臻智能网络科技股份有限公司 | Method for recognizing characters in picture, computer equipment and storage medium |
CN110956133A (en) * | 2019-11-29 | 2020-04-03 | 上海眼控科技股份有限公司 | Training method of single character text normalization model, text recognition method and device |
CN111402156A (en) * | 2020-03-11 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Restoration method and device for smear image, storage medium and terminal equipment |
CN111583099A (en) * | 2020-04-14 | 2020-08-25 | 上海联影智能医疗科技有限公司 | Image rectification method, computer device, and storage medium |
CN111539184A (en) * | 2020-04-29 | 2020-08-14 | 上海眼控科技股份有限公司 | Text data manufacturing method and device based on deep learning, terminal and storage medium |
CN111783761A (en) * | 2020-06-30 | 2020-10-16 | 苏州科达科技股份有限公司 | Certificate text detection method and device and electronic equipment |
CN113642573A (en) * | 2021-07-20 | 2021-11-12 | 南京红松信息技术有限公司 | Picture separation method based on grids |
CN113642573B (en) * | 2021-07-20 | 2023-10-13 | 南京红松信息技术有限公司 | Picture separation method based on grids |
CN114782961A (en) * | 2022-03-23 | 2022-07-22 | 华南理工大学 | Character image augmentation method based on shape transformation |
CN114782961B (en) * | 2022-03-23 | 2023-04-18 | 华南理工大学 | Character image augmentation method based on shape transformation |
WO2024045442A1 (en) * | 2022-08-30 | 2024-03-07 | 青岛云天励飞科技有限公司 | Image correction model training method, image correction method, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108399408A (en) | A kind of deformed characters antidote based on deep space converting network | |
US11556797B2 (en) | Systems and methods for polygon object annotation and a method of training an object annotation system | |
He et al. | Svga-net: Sparse voxel-graph attention network for 3d object detection from point clouds | |
CN105740909B (en) | Text recognition method under a kind of natural scene based on spatial alternation | |
Wang et al. | Deformable non-local network for video super-resolution | |
CN111259936B (en) | Image semantic segmentation method and system based on single pixel annotation | |
CN114255238A (en) | Three-dimensional point cloud scene segmentation method and system fusing image features | |
CN112330719B (en) | Deep learning target tracking method based on feature map segmentation and self-adaptive fusion | |
CN113159466B (en) | Short-time photovoltaic power generation prediction system and method | |
CN111862289A (en) | Point cloud up-sampling method based on GAN network | |
US9934553B2 (en) | Method for upscaling an image and apparatus for upscaling an image | |
CN109300128B (en) | Transfer learning image processing method based on convolution neural network hidden structure | |
CN113312973B (en) | Gesture recognition key point feature extraction method and system | |
CN112560865B (en) | Semantic segmentation method for point cloud under outdoor large scene | |
CN110197255A (en) | A kind of deformable convolutional network based on deep learning | |
CN114757904A (en) | Surface defect detection method based on AI deep learning algorithm | |
CN115376024A (en) | Semantic segmentation method for power accessory of power transmission line | |
CN111340011B (en) | Self-adaptive time sequence shift neural network time sequence behavior identification method | |
CN111861886A (en) | Image super-resolution reconstruction method based on multi-scale feedback network | |
CN116030498A (en) | Virtual garment running and showing oriented three-dimensional human body posture estimation method | |
CN109389607A (en) | Ship Target dividing method, system and medium based on full convolutional neural networks | |
CN115588237A (en) | Three-dimensional hand posture estimation method based on monocular RGB image | |
Wang | An augmentation small object detection method based on NAS-FPN | |
CN116385466A (en) | Method and system for dividing targets in image based on boundary box weak annotation | |
Xuan et al. | Maskvo: Self-supervised visual odometry with a learnable dynamic mask |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180814 |