CN110674815A - Invoice image distortion correction method based on deep learning key point detection - Google Patents

Invoice image distortion correction method based on deep learning key point detection

Info

Publication number
CN110674815A
Authority
CN
China
Prior art keywords
data
invoice
key point
training
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910932792.9A
Other languages
Chinese (zh)
Inventor
池明辉
肖欣庭
梁欢
罗珊珊
赵冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN201910932792.9A priority Critical patent/CN110674815A/en
Publication of CN110674815A publication Critical patent/CN110674815A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/243Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an invoice image distortion correction method based on deep learning key point detection, and belongs to the technical field of image processing. The invention solves the problem of correcting distortion in invoice images, and the key points of the technical scheme are as follows: first, the training data are annotated and augmented; second, the network structure and training parameters are set; next, a key point detection model is trained with this network structure and these training parameters, and the trained model is saved; then, the trained model is used to detect the invoice key points; finally, the invoice is aligned with the detected key points. The method is fast, accurate and suitable for natural scenes; recognizing the corrected picture greatly improves OCR accuracy, reduces the labor and material investment of downstream OCR applications, and saves resources.

Description

Invoice image distortion correction method based on deep learning key point detection
Technical Field
The invention relates to the technical field of image processing, in particular to an invoice image distortion correction method based on deep learning key point detection.
Background
In recent years, AI technology has developed rapidly and its fields of application have grown accordingly, including robotics, speech recognition, image recognition, computer vision, automatic driving and so on. In image recognition, deep-learning-based OCR is widely adopted in industry because of its high recognition accuracy and speed. As is well known, OCR technology generally divides into two technical branches, text detection and text recognition; although end-to-end OCR based on neural networks has recently been proposed, its effect in specific scenarios is not yet ideal, so mainstream OCR technology is still divided into the two directions of text detection and text recognition. OCR accuracy is limited not only by the quality of the recognition algorithm; the text detection result also plays a decisive role, and image quality has an even more obvious influence on text detection. In particular, in the era of the mobile internet, the rise of mobile devices has created an ever-increasing demand for OCR applications from ordinary users, yet because user behavior is uncontrollable, the images captured by mobile devices in various scenes differ greatly. Image distortion (non-frontal shooting) has an especially obvious influence on locating and recognizing text regions in the image; if the acquired image can be corrected before recognition, the accuracy of character recognition can be effectively improved at the source.
Disclosure of Invention
The invention aims to provide an invoice image distortion correction method based on deep learning key point detection, which solves the problem of correcting distortion in invoice images.
The invention solves this technical problem with the following technical scheme: the invoice image distortion correction method based on deep learning key point detection comprises the following steps:
step 1, annotating and augmenting the training data;
step 2, setting the network structure and training parameters;
step 3, training a key point detection model with the network structure and training parameters, and saving the trained model;
step 4, detecting invoice key points with the trained model;
and step 5, aligning the invoice with the detected key points.
Specifically, in step 1, part of the data is annotated manually and a large amount of training data is then generated with a data augmentation strategy, comprising the following steps:
step 101, preparing the annotation data;
step 102, annotating the data;
step 103, augmenting the data;
step 104, converting the data format;
and step 105, splitting the data set.
Further, in step 101, when preparing the annotation data, the different types of pictures to be annotated are collected, about 1000 pictures per type; the key point positions and names are defined for each invoice type, and the number of key points is greater than 4;
the key points are defined according to the following criteria:
if the invoice picture contains a table with a fixed layout, the corner points of the table are used as the reference, and the selected key points are distributed across the whole invoice face;
if the invoice picture contains no table, the key points are defined according to the positions of fixed text regions of the invoice;
if an actual invoice picture is irregular so that some of the defined key points are invisible, only the corresponding visible key points are annotated when the annotation task is carried out.
Specifically, in step 102, during data annotation, invoice pictures containing a table are annotated with the points task type of the VIA tool, and invoice pictures without a table are annotated with the rect task type of the VIA tool.
Further, in step 103, during data augmentation, the manually annotated subset is taken and image augmentation is applied to it according to the conditions of real business invoice images; the augmentation strategy for the training data uses the Python imgaug image augmentation library in the following ways:
random affine transformation of the image, where the scaling range is (0.5, 2), the rotation angle ranges are [(-15, 15), (75, 105), (165, 195), (255, 285)] in degrees, and the translation range is (-200, 200) in pixels;
random perspective transformation of the image, where the scale parameter is drawn from the range (0.025, 0.15);
randomly adding noise to the image;
stretching the image contrast;
adding shadow noise to the image;
the annotated training data are expanded and augmented by combining the above augmentation modes.
Specifically, in step 104, during format conversion, the annotation data of the different invoice types are fused and converted into a single data set; for the annotation data of a given invoice type, the corresponding annotations are left unchanged, while for the other invoice types the annotations of the corresponding points are filled with (0, 0, 0).
Further, in step 105, when splitting the data set, the assembled data set is randomly divided into a training set and a test set at a 9:1 ratio.
Specifically, step 2 comprises the following steps:
step 201, setting the training network structure: basic feature extraction uses resnet101 as the feature extraction module, followed by GlobalNet and RefineNet, where GlobalNet detects all key points and RefineNet predicts corrections to the key points;
step 202, initializing the model parameters: the basic feature extraction module uses the resnet101 weights pre-trained on ImageNet; the total number of training epochs (EPOCHS) is 30, the batch size on a single GPU is 4, the input picture size is initialized to 512 x 512, the Adam optimizer is used with a learning rate of 0.0001, and the loss function is the sum of the distances between each predicted point coordinate and the corresponding ground-truth point coordinate.
Further, step 3 comprises the following steps:
step 301, setting the algorithm parameters, i.e. the model initialization parameters of step 202;
step 302, building the network model;
step 303, judging whether the current epoch (EPOCH) is smaller than EPOCHS; if so, going to step 304, otherwise going to step 4;
step 304, taking a batch of batch-size invoice pictures from the training set, training the model, and updating the model parameters with Adam;
step 305, judging whether all pictures in the training set have been used in the current epoch; if so, going to step 306, otherwise going to step 304;
step 306, verifying the accuracy of the model on the test set, saving the trained model, and going to step 303.
Specifically, in step 4, when detecting invoice key points, the trained model performs key point detection on the image to be corrected; the detected key points are recorded as predicted_points and the corresponding key point set in the template picture is recorded as template_points; the perspective transformation matrix between the detected key points and the key points of the standard invoice picture, recorded as homography, is then computed, specifically with the findHomography method of opencv;
in step 5, when aligning the invoice, the obtained perspective transformation matrix homography is applied to the image to be corrected, src, to obtain the corrected image dst; the correction uses the warpPerspective method of opencv.
The beneficial effects of the method are that, with the invoice image distortion correction method based on deep learning key point detection, key points in an invoice image can be detected accurately; the detected key points and the key points of the standard invoice picture are then used to compute the corresponding perspective transformation matrix, and this matrix is applied to the image to be processed to obtain the corrected image. The method is fast, accurate and suitable for natural scenes, and recognizing the corrected picture greatly improves OCR accuracy, reducing the labor and material investment of downstream OCR applications and saving resources.
Drawings
FIG. 1 is a flowchart of an invoice image distortion correction method based on deep learning keypoint detection according to the present invention.
Detailed Description
The technical scheme of the invention is described in detail below with reference to the accompanying drawings.
The invention discloses an invoice image distortion correction method based on deep learning key point detection, the flow chart of which is shown in FIG. 1; the method comprises the following steps:
Step 1, annotating and augmenting the training data. Effective training of a neural network requires a large amount of annotated data, and manual annotation is very time-consuming. To ensure annotation quality while keeping the annotation cost as low as possible, the invention annotates part of the data manually and then generates a large amount of training data with a data augmentation strategy, as follows:
Step 101, preparing the annotation data. The different types of pictures to be annotated are collected, about 1000 pictures per type; the key point positions and names are defined for each invoice type, and the number of key points is greater than 4. The key points are defined according to the following criteria: 1. if the invoice picture contains a table with a fixed layout, the corner points of the table are used as the reference, and the selected key points are distributed across the whole invoice face; 2. if the invoice picture contains no table, the key points are defined according to the positions of fixed text regions of the invoice. It should also be noted that actual invoice pictures may be irregular, for example damaged, occluded, distorted or severely over-exposed, so that some of the defined key points are invisible; in such cases only the corresponding visible key points are annotated when the annotation task is carried out.
Step 102, annotating the data. When the annotation task is carried out, invoice pictures containing a table are annotated with the points task type of the VIA tool; pictures without a table can be annotated with the rect task type of VIA, for example a fixed-amount (quota) invoice, where the corresponding key points are taken as the four vertices or the midpoints of the four edges of the rectangle.
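As a hedged illustration of this annotation step, the sketch below reads key points back from a VIA JSON export. The field layout follows the VIA 2.x region export (shape_attributes of type "point" or "rect"); the exact keys and the rule of expanding a rect into its four corners are assumptions for illustration, not something specified in the patent.

import json

def load_via_keypoints(json_path):
    """Return {filename: [(x, y), ...]} from a VIA JSON export (assumed VIA 2.x layout)."""
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    keypoints = {}
    for entry in data.values():
        points = []
        for region in entry.get("regions", []):
            shape = region["shape_attributes"]
            if shape["name"] == "point":
                # Table invoices: each annotated point is a key point.
                points.append((shape["cx"], shape["cy"]))
            elif shape["name"] == "rect":
                # Table-less invoices: take the four corners of the rectangle as key points.
                x, y, w, h = shape["x"], shape["y"], shape["width"], shape["height"]
                points.extend([(x, y), (x + w, y), (x + w, y + h), (x, y + h)])
        keypoints[entry["filename"]] = points
    return keypoints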
Step 103, augmenting the data. Because the annotation budget is limited, only the manually annotated subset is used, and image augmentation is then applied to it according to the conditions of real business invoice images; this saves annotation cost while improving the robustness and accuracy of the trained model. The augmentation strategy for the training data uses the Python imgaug image augmentation library in the following ways: 1. random affine transformation of the image (scaling, rotation, translation), where the scaling range is (0.5, 2), the rotation angle ranges are [(-15, 15), (75, 105), (165, 195), (255, 285)] in degrees, and the translation range is (-200, 200) in pixels; 2. random perspective transformation of the image, where the scale parameter is drawn from the range (0.025, 0.15); 3. randomly adding noise to the image; 4. stretching the image contrast; 5. adding shadow noise to the image. The annotated training data are expanded and augmented by combining these augmentation modes.
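The following sketch expresses this augmentation strategy with the imgaug library named above. The composition of the augmenters and the shadow-noise approximation (a coarse dropout) are assumptions; the patent specifies only the transform types and their parameter ranges.

import imgaug.augmenters as iaa
from imgaug.augmentables.kps import Keypoint, KeypointsOnImage

# The four admissible rotation bands from step 103, in degrees.
ROTATION_RANGES = [(-15, 15), (75, 105), (165, 195), (255, 285)]

augmenter = iaa.Sequential([
    iaa.OneOf([  # random affine transform, restricted to one rotation band
        iaa.Affine(scale=(0.5, 2.0),
                   rotate=r,
                   translate_px={"x": (-200, 200), "y": (-200, 200)})
        for r in ROTATION_RANGES
    ]),
    iaa.PerspectiveTransform(scale=(0.025, 0.15)),     # random perspective transform
    iaa.AdditiveGaussianNoise(scale=(0, 0.05 * 255)),  # random noise
    iaa.LinearContrast((0.75, 1.5)),                   # contrast stretching
    iaa.CoarseDropout(0.02, size_percent=0.3),         # rough stand-in for shadow noise
])

def augment_sample(image, points):
    """Apply one random augmentation to an image and its annotated key points."""
    kps = KeypointsOnImage([Keypoint(x=x, y=y) for x, y in points], shape=image.shape)
    image_aug, kps_aug = augmenter(image=image, keypoints=kps)
    return image_aug, [(kp.x, kp.y) for kp in kps_aug.keypoints]

Because the key points are transformed together with the image, the augmented samples remain valid training labels without any additional manual annotation.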
Step 104, converting the data format. To support key point detection for multiple invoice types, the invention fuses and converts the annotation data of the different invoice types into a common format, using a layout similar to one-hot encoding: for the annotation data of a given invoice type, the corresponding annotations are left unchanged, while for the other invoice types the annotations of the corresponding points are filled with (0, 0, 0).
Step 105, splitting the data set. The data set assembled in step 104 is randomly divided into a training set and a test set at a 9:1 ratio.
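A minimal sketch of steps 104 and 105 follows: annotations of the different invoice types are fused into a one-hot-like layout in which the point slots of the other types are filled with (0, 0, 0), and the result is split 9:1 into training and test sets. The per-type key point counts and the (x, y, visibility) reading of the filler triple are illustrative assumptions; the patent only states that absent points are filled with (0, 0, 0).

import random

# Assumed key point counts per invoice type, for illustration only.
KEYPOINTS_PER_TYPE = {"vat_invoice": 14, "quota_invoice": 4, "taxi_receipt": 6}

def fuse_annotation(invoice_type, points):
    """Concatenate the key point slots of all types; only the sample's own type is filled."""
    fused = []
    for t, count in KEYPOINTS_PER_TYPE.items():
        if t == invoice_type:
            fused.extend([(x, y, 1) for x, y in points])
        else:
            fused.extend([(0, 0, 0)] * count)
    return fused

def split_dataset(samples, ratio=0.9, seed=0):
    """Randomly split the fused samples into a 9:1 training/test partition."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * ratio)
    return samples[:cut], samples[cut:]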
Step 2, setting the network structure and training parameters.
Step 201, setting the training network structure. Training uses a model structure similar to CPN (Cascaded Pyramid Network); basic feature extraction uses resnet101 as the feature extraction module, followed by GlobalNet and RefineNet, where GlobalNet detects all key points and RefineNet predicts corrections to the key points.
Step 202, initializing the model parameters. The basic feature extraction module uses the resnet101 weights pre-trained on ImageNet; the total number of training epochs (EPOCHS) is 30, the batch size on a single GPU is 4, the input picture size is initialized to 512 x 512, the Adam optimizer is used with a learning rate of 0.0001, and the loss function is the sum of the distances between each predicted point coordinate and the corresponding ground-truth point coordinate.
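To make the configuration of steps 201 and 202 concrete, the sketch below sets up a simplified PyTorch version: a resnet101 backbone pre-trained on ImageNet followed by a "global" head and a "refine" head that regress key point coordinates, trained with Adam at learning rate 0.0001 under a loss equal to the sum of distances between predicted and ground-truth points. The real GlobalNet and RefineNet of CPN are heatmap-based pyramid modules; the coordinate-regression heads and the key point count used here are simplifying assumptions.

import torch
import torch.nn as nn
import torchvision

NUM_KEYPOINTS = 8          # assumed; the patent only requires more than 4 key points per type
INPUT_SIZE = 512
EPOCHS = 30
BATCH_SIZE = 4
LEARNING_RATE = 1e-4

class KeypointNet(nn.Module):
    def __init__(self, num_keypoints=NUM_KEYPOINTS):
        super().__init__()
        # resnet101 with ImageNet pre-trained weights (torchvision >= 0.13 weights API).
        backbone = torchvision.models.resnet101(weights="IMAGENET1K_V1")
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool and fc
        self.pool = nn.AdaptiveAvgPool2d(1)
        # "GlobalNet": coarse prediction of all key points.
        self.global_head = nn.Linear(2048, num_keypoints * 2)
        # "RefineNet": predicts a correction added to the coarse prediction.
        self.refine_head = nn.Linear(2048, num_keypoints * 2)

    def forward(self, x):
        f = self.pool(self.features(x)).flatten(1)
        coarse = self.global_head(f)
        refined = coarse + self.refine_head(f)
        return refined.view(-1, NUM_KEYPOINTS, 2)

def keypoint_loss(pred, target):
    """Sum of Euclidean distances between predicted and ground-truth key points."""
    return torch.linalg.norm(pred - target, dim=-1).sum(dim=-1).mean()

model = KeypointNet()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)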
and 3, setting a training key point detection model by using the network structure and the training parameters, and storing the trained model. And (3) training a key point detection model by using the network structure and parameter setting in the step (2), and storing the trained model parameters for key point detection. The specific training steps are as follows:
step 301, setting algorithm parameters, and setting model initialization parameters according to the step 202;
step 302, establishing a network model;
step 303, judging whether the current EPOCH is smaller than EPOCHS, if so, turning to step 304, otherwise, turning to step 4;
304, taking a batch-size invoice picture from the training set, training the model, and updating the model algorithm by using Adam;
step 305, judging whether all pictures in the training set are trained, if so, turning to step 306, otherwise, turning to step 304;
step 306, verifying the accuracy of the model by using the test set, storing the trained model, and going to step 303.
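A minimal sketch of this training loop (steps 301 to 306) is given below, reusing KeypointNet, keypoint_loss and the constants from the previous sketch. train_loader and test_loader are assumed to yield (image batch, key point batch) pairs; the test-set metric shown is only a placeholder, since the patent does not specify how accuracy is measured.

def train(model, optimizer, train_loader, test_loader, epochs=EPOCHS):
    for epoch in range(epochs):                        # step 303: loop over epochs
        model.train()
        for images, keypoints in train_loader:         # steps 304-305: one pass over the training set
            optimizer.zero_grad()
            loss = keypoint_loss(model(images), keypoints)
            loss.backward()
            optimizer.step()                           # Adam update
        # Step 306: verify on the test set and save the trained model.
        model.eval()
        with torch.no_grad():
            test_loss = sum(keypoint_loss(model(x), y).item()
                            for x, y in test_loader) / max(len(test_loader), 1)
        print(f"epoch {epoch + 1}/{epochs}, test loss {test_loss:.4f}")
        torch.save(model.state_dict(), "keypoint_model.pth")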
Step 4, detecting invoice key points with the trained model. The model trained in step 3 performs key point detection on the image to be corrected; the detected key points are recorded as predicted_points, and the corresponding key point set in the template picture is recorded as template_points. The perspective transformation matrix between the detected key points and the key points of the standard invoice picture, recorded as homography, is then computed, specifically with the findHomography method of opencv.
Step 5, aligning the invoice with the detected key points. The perspective transformation matrix homography obtained in step 4 is applied to the image to be corrected, src, to obtain the corrected image dst. The correction uses the warpPerspective method of opencv.
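Steps 4 and 5 can be sketched directly with the OpenCV functions named in the text: findHomography estimates the perspective transformation from the detected key points to the template key points, and warpPerspective applies it to the distorted image. The RANSAC flag and the template_size argument are additions assumed here for robustness and completeness; they are not specified in the patent.

import cv2
import numpy as np

def correct_invoice(src, predicted_points, template_points, template_size):
    """src: distorted invoice image; points: matching Nx2 key point arrays; template_size: (width, height)."""
    predicted_points = np.asarray(predicted_points, dtype=np.float32)
    template_points = np.asarray(template_points, dtype=np.float32)
    # RANSAC makes the estimate robust to occasional mis-detected key points.
    homography, _ = cv2.findHomography(predicted_points, template_points, cv2.RANSAC)
    # Warp the distorted image onto the template geometry to obtain the corrected image dst.
    dst = cv2.warpPerspective(src, homography, template_size)
    return dst

The corrected image dst can then be passed to the downstream text detection and recognition stages described in the background section.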

Claims (10)

1. The invoice image distortion correction method based on deep learning key point detection is characterized by comprising the following steps:
step 1, annotating and augmenting the training data;
step 2, setting the network structure and training parameters;
step 3, training a key point detection model with the network structure and training parameters, and saving the trained model;
step 4, detecting invoice key points with the trained model;
and step 5, aligning the invoice with the detected key points.
2. The invoice image distortion correction method based on deep learning key point detection as claimed in claim 1, wherein in step 1, part of the data is annotated manually and a large amount of training data is then generated with a data augmentation strategy, comprising the following steps:
step 101, preparing the annotation data;
step 102, annotating the data;
step 103, augmenting the data;
step 104, converting the data format;
and step 105, splitting the data set.
3. The invoice image distortion correction method based on deep learning key point detection as claimed in claim 2, wherein in step 101, when preparing the annotation data, the different types of pictures to be annotated are collected, about 1000 pictures per type; the key point positions and names are defined for each invoice type, and the number of key points is greater than 4;
the key points are defined according to the following criteria:
if the invoice picture contains a table with a fixed layout, the corner points of the table are used as the reference, and the selected key points are distributed across the whole invoice face;
if the invoice picture contains no table, the key points are defined according to the positions of fixed text regions of the invoice;
if an actual invoice picture is irregular so that some of the defined key points are invisible, only the corresponding visible key points are annotated when the annotation task is carried out.
4. The invoice image distortion correction method based on deep learning key point detection as claimed in claim 2, wherein in step 102, during data annotation, invoice pictures containing a table are annotated with the points task type of the VIA tool, and invoice pictures without a table are annotated with the rect task type of the VIA tool.
5. The invoice image distortion correction method based on deep learning key point detection as claimed in claim 2, wherein in step 103, during data augmentation, the manually annotated subset is taken and image augmentation is applied to it according to the conditions of real business invoice images; the augmentation strategy for the training data uses the Python imgaug image augmentation library in the following ways:
random affine transformation of the image, where the scaling range is (0.5, 2), the rotation angle ranges are [(-15, 15), (75, 105), (165, 195), (255, 285)] in degrees, and the translation range is (-200, 200) in pixels;
random perspective transformation of the image, where the scale parameter is drawn from the range (0.025, 0.15);
randomly adding noise to the image;
stretching the image contrast;
adding shadow noise to the image;
the annotated training data are expanded and augmented by combining the above augmentation modes.
6. The invoice image distortion correction method based on deep learning key point detection as claimed in claim 2, wherein in step 104, during data format conversion, the annotation data of the different invoice types are fused and converted into a single data set; for the annotation data of a given invoice type, the corresponding annotations are left unchanged, while for the other invoice types the annotations of the corresponding points are filled with (0, 0, 0).
7. The invoice image distortion correction method based on deep learning key point detection as claimed in claim 2 or 6, wherein in step 105, when splitting the data set, the assembled data set is randomly divided into a training set and a test set at a 9:1 ratio.
8. The invoice image distortion correction method based on deep learning key point detection as claimed in claim 1, wherein step 2 specifically comprises the following steps:
step 201, setting the training network structure: basic feature extraction uses resnet101 as the feature extraction module, followed by GlobalNet and RefineNet, where GlobalNet detects all key points and RefineNet predicts corrections to the key points;
step 202, initializing the model parameters: the basic feature extraction module uses the resnet101 weights pre-trained on ImageNet; the total number of training epochs (EPOCHS) is 30, the batch size on a single GPU is 4, the input picture size is initialized to 512 x 512, the Adam optimizer is used with a learning rate of 0.0001, and the loss function is the sum of the distances between each predicted point coordinate and the corresponding ground-truth point coordinate.
9. The invoice image distortion correction method based on deep learning key point detection as claimed in claim 1 or 8, wherein step 3 specifically comprises the following steps:
step 301, setting the algorithm parameters, i.e. the model initialization parameters of step 202;
step 302, building the network model;
step 303, judging whether the current epoch (EPOCH) is smaller than EPOCHS; if so, going to step 304, otherwise going to step 4;
step 304, taking a batch of batch-size invoice pictures from the training set, training the model, and updating the model parameters with Adam;
step 305, judging whether all pictures in the training set have been used in the current epoch; if so, going to step 306, otherwise going to step 304;
step 306, verifying the accuracy of the model on the test set, saving the trained model, and going to step 303.
10. The invoice image distortion correction method based on deep learning key point detection as claimed in claim 1 or 8, wherein in step 4, when detecting invoice key points, the trained model performs key point detection on the image to be corrected; the detected key points are recorded as predicted_points and the corresponding key point set in the template picture is recorded as template_points; the perspective transformation matrix between the detected key points and the key points of the standard invoice picture, recorded as homography, is then computed, specifically with the findHomography method of opencv;
in step 5, when aligning the invoice, the obtained perspective transformation matrix homography is applied to the image to be corrected, src, to obtain the corrected image dst; the correction uses the warpPerspective method of opencv.
CN201910932792.9A 2019-09-29 2019-09-29 Invoice image distortion correction method based on deep learning key point detection Pending CN110674815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910932792.9A CN110674815A (en) 2019-09-29 2019-09-29 Invoice image distortion correction method based on deep learning key point detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910932792.9A CN110674815A (en) 2019-09-29 2019-09-29 Invoice image distortion correction method based on deep learning key point detection

Publications (1)

Publication Number Publication Date
CN110674815A true CN110674815A (en) 2020-01-10

Family

ID=69080020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910932792.9A Pending CN110674815A (en) 2019-09-29 2019-09-29 Invoice image distortion correction method based on deep learning key point detection

Country Status (1)

Country Link
CN (1) CN110674815A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102074001A (en) * 2010-11-25 2011-05-25 上海合合信息科技发展有限公司 Method and system for stitching text images
US20180114084A1 (en) * 2015-07-13 2018-04-26 Baidu Online Network Technology (Beijing) Co., Ltd Method for recognizing picture, method and apparatus for labelling picture, and storage medium
CN108171669A (en) * 2017-12-29 2018-06-15 星阵(广州)基因科技有限公司 A kind of image correction method based on OpenCV algorithms
CN108647681A (en) * 2018-05-08 2018-10-12 重庆邮电大学 A kind of English text detection method with text orientation correction
CN108596867A (en) * 2018-05-09 2018-09-28 五邑大学 A kind of picture bearing calibration and system based on ORB algorithms
CN108876858A (en) * 2018-07-06 2018-11-23 北京字节跳动网络技术有限公司 Method and apparatus for handling image
CN109583408A (en) * 2018-12-07 2019-04-05 高新兴科技集团股份有限公司 A kind of vehicle key point alignment schemes based on deep learning
CN109784350A (en) * 2018-12-29 2019-05-21 天津大学 In conjunction with the dress ornament key independent positioning method of empty convolution and cascade pyramid network
CN109871744A (en) * 2018-12-29 2019-06-11 新浪网技术(中国)有限公司 A kind of VAT invoice method for registering images and system
CN110008956A (en) * 2019-04-01 2019-07-12 深圳市华付信息技术有限公司 Invoice key message localization method, device, computer equipment and storage medium
CN110032990A (en) * 2019-04-23 2019-07-19 杭州智趣智能信息技术有限公司 A kind of invoice text recognition method, system and associated component
CN110263694A (en) * 2019-06-13 2019-09-20 泰康保险集团股份有限公司 A kind of bank slip recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
余小宝 (Yu Xiaobao): "Image skew correction method for character recognition of VAT invoice deduction copies", 《电脑编程技巧与维护》 (Computer Programming Skills and Maintenance) *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI775038B (en) * 2020-01-21 2022-08-21 群邁通訊股份有限公司 Method and device for recognizing character and storage medium
US11876945B2 (en) 2020-01-21 2024-01-16 Mobile Drive Netherlands B.V. Device and method for acquiring shadow-free images of documents for scanning purposes
US11605210B2 (en) 2020-01-21 2023-03-14 Mobile Drive Netherlands B.V. Method for optical character recognition in document subject to shadows, and device employing method
TWI775039B (en) * 2020-01-21 2022-08-21 群邁通訊股份有限公司 Method and device for removing document shadow
CN111382742A (en) * 2020-03-15 2020-07-07 策拉人工智能科技(云南)有限公司 Method for integrating OCR recognition software on cloud financial platform
CN111507349A (en) * 2020-04-15 2020-08-07 深源恒际科技有限公司 Dynamic data enhancement method in OCR (optical character recognition) model training
CN111507349B (en) * 2020-04-15 2023-05-23 北京深智恒际科技有限公司 Dynamic data enhancement method in OCR recognition model training
CN111507265A (en) * 2020-04-17 2020-08-07 北京百度网讯科技有限公司 Form key point detection model training method, device, equipment and storage medium
CN111582153A (en) * 2020-05-07 2020-08-25 北京百度网讯科技有限公司 Method and device for determining document orientation
CN111582153B (en) * 2020-05-07 2023-06-30 北京百度网讯科技有限公司 Method and device for determining orientation of document
CN111881882A (en) * 2020-08-10 2020-11-03 晶璞(上海)人工智能科技有限公司 Medical bill rotation correction method and system based on deep learning
CN112115943A (en) * 2020-09-16 2020-12-22 四川长虹电器股份有限公司 Bill rotation angle detection method based on deep learning
CN112347865A (en) * 2020-10-21 2021-02-09 四川长虹电器股份有限公司 Bill correction method based on key point detection
CN112347994A (en) * 2020-11-30 2021-02-09 四川长虹电器股份有限公司 Invoice image target detection and angle detection method based on deep learning
CN112633275A (en) * 2020-12-22 2021-04-09 航天信息股份有限公司 Multi-bill mixed-shooting image correction method and system based on deep learning
CN112633275B (en) * 2020-12-22 2023-07-18 航天信息股份有限公司 Multi-bill mixed shooting image correction method and system based on deep learning
CN112507973B (en) * 2020-12-29 2022-09-06 中国电子科技集团公司第二十八研究所 Text and picture recognition system based on OCR technology
CN112507973A (en) * 2020-12-29 2021-03-16 中国电子科技集团公司第二十八研究所 Text and picture recognition system based on OCR technology
CN113974828A (en) * 2021-09-30 2022-01-28 西安交通大学第二附属医院 Operation reference scheme generation method and device
CN113974828B (en) * 2021-09-30 2024-02-09 西安交通大学第二附属医院 Surgical reference scheme generation method and device
CN114494038B (en) * 2021-12-29 2024-03-29 扬州大学 Target surface perspective distortion correction method based on improved YOLOX-S

Similar Documents

Publication Publication Date Title
CN110674815A (en) Invoice image distortion correction method based on deep learning key point detection
CN110689037B (en) Method and system for automatic object annotation using deep networks
CN110738602B (en) Image processing method and device, electronic equipment and readable storage medium
CN106156761B (en) Image table detection and identification method for mobile terminal shooting
CN110008956B (en) Invoice key information positioning method, invoice key information positioning device, computer equipment and storage medium
CN111160352B (en) Workpiece metal surface character recognition method and system based on image segmentation
CN110827247A (en) Method and equipment for identifying label
US10984284B1 (en) Synthetic augmentation of document images
CN109034155A (en) A kind of text detection and the method and system of identification
CN110516554A (en) A kind of more scene multi-font Chinese text detection recognition methods
CN111191649A (en) Method and equipment for identifying bent multi-line text image
CN109886257B (en) Method for correcting invoice image segmentation result by adopting deep learning in OCR system
CN112883926B (en) Identification method and device for form medical images
CN113903024A (en) Handwritten bill numerical value information identification method, system, medium and device
CN113592735A (en) Text page image restoration method and system, electronic equipment and computer readable medium
CN111414905B (en) Text detection method, text detection device, electronic equipment and storage medium
CN110956147B (en) Method and device for generating training data
CN113902402A (en) Document auxiliary filling method, system, storage medium and device based on AR technology
CN111784587A (en) Invoice photo position correction method based on deep learning network
CN104077557B (en) A kind of method and apparatus obtaining card image
CN111783763A (en) Text positioning box correction method and system based on convolutional neural network
CN112508000B (en) Method and equipment for generating OCR image recognition model training data
CN111325106B (en) Method and device for generating training data
Zhang et al. Key point localization and recurrent neural network based water meter reading recognition
CN111666882A (en) Method for extracting answers of handwritten test questions

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200110