CN114418869A - Method, system, device and medium for geometric correction of document image - Google Patents
- Publication number
- Publication number: CN114418869A; Application number: CN202111584077.4A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/80—Geometric correction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30176—Document
Abstract
The invention discloses a method, a system, a device and a medium for geometric correction of a document image, wherein the method comprises the following steps: acquiring a first document image, classifying the pixels in the first document image to distinguish the foreground document area from the environment boundary area, and acquiring a mask map of the foreground document area; extracting control points on the mask map, preliminarily correcting the first document image according to the control points and deleting the environment boundary, obtaining a second document image that is preliminarily corrected and has the environment boundary deleted; and acquiring a first coordinate offset matrix of the second document image, and offsetting the second document image according to the first coordinate offset matrix to obtain a corrected third document image. The invention can handle photographed document images having different environment boundary regions, including the cases of a small environment boundary region, a large environment boundary region, or no environment boundary region at all. The invention can be widely applied in the technical field of pattern recognition and artificial intelligence.
Description
Technical Field
The invention relates to the technical field of pattern recognition and artificial intelligence, in particular to a method, a system, a device and a medium for geometric correction of a document image.
Background
With the development of semiconductor technology, the built-in cameras of mobile devices have become increasingly capable, and their imaging quality increasingly high. Digitizing documents by photographing them with a built-in camera has therefore become a convenient practice. However, perspective deformation caused by an improper camera position and angle during photographing, together with physical deformation of the document such as bending, folding and creasing, leaves the photographed document image geometrically deformed. These deformations degrade the performance of optical character recognition systems as well as the aesthetics and readability of the document image. Document image geometric correction methods based on deep learning have greatly improved correction performance and robustness across different document layouts. However, existing deep learning correction methods focus only on correcting cropped document images, i.e. document images with a small environment boundary region, and require a complete document boundary. In practice, environment boundary conditions vary widely: some document images have a large environment boundary region in which the foreground document occupies only a small part, and some have no environment boundary region at all, so that no complete document boundary exists. The aforementioned deep learning correction methods do not work well on such images.
Disclosure of Invention
To solve at least one of the technical problems in the prior art to some extent, the present invention provides a method, a system, a device and a medium for geometric correction of a document image.
The technical scheme adopted by the invention is as follows:
a geometrical correction method for document images comprises the following steps:
acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image, and acquiring a mask image of the foreground document area;
extracting control points on the mask map, performing preliminary correction on the first document image according to the control points, deleting the environment boundary, and obtaining a second document image that is preliminarily corrected and has the environment boundary deleted;
and acquiring a first coordinate offset matrix of the second document image, and offsetting the second document image according to the first coordinate offset matrix to obtain a corrected third document image.
Further, the obtaining the corrected third document image includes:
judging whether an iteration step is executed or not according to the first coordinate offset matrix, and if the iteration step is not required to be executed, taking the third document image as an output image; otherwise, executing an iteration step;
the step of iterating comprises:
acquiring a second coordinate offset matrix of the third document image, updating the corrected image into the third document image after offsetting the third document image according to the second coordinate offset matrix, and recording the second coordinate offset matrix in the iteration step;
judging whether to continue to execute the iteration step according to the second coordinate offset matrix, if so, returning to execute the previous step; and on the contrary, the second document image is shifted according to the first coordinate shift matrix and all second coordinate shift matrices in the record, and a corrected image is obtained and used as an output image.
Further, the classifying the pixels in the first document image includes:
obtaining the classification confidence of each pixel position in the first document image by adopting a first deep convolutional neural network, and classifying according to the classification confidence;
the obtaining of the first coordinate offset matrix of the second document image includes:
and acquiring a first coordinate offset matrix of the second document image by adopting a second deep convolution neural network.
Further, the extracting a control point on the mask map, performing preliminary correction on the first document image according to the control point, deleting an environment boundary, and obtaining a second document image with the preliminary correction and the deletion of the environment boundary, includes:
extracting four corner points of the document on a mask image of the foreground document area by using a polygon fitting algorithm;
drawing a vertical bisector on a line segment formed by taking adjacent corner points as end points according to a preset bisection proportion, and taking the intersection point of the bisector and the mask image boundary of the foreground document region as a bisection point of the document boundary;
drawing a quadrilateral mask map from the four corner points, and calculating an intersection-over-union ratio between the quadrilateral mask map and the mask map of the foreground document area;
if the intersection-over-union ratio is smaller than a first preset threshold, performing no correction and taking the first document image as the second document image;
and if the intersection-over-union ratio is larger than the first preset threshold, taking the four corner points and the several bisection points of the boundary as control points, preliminarily correcting the first document image with a thin-plate spline interpolation algorithm, deleting the environment boundary, and obtaining a second document image that is preliminarily corrected and has the environment boundary deleted.
Further, after the second document image is shifted according to the first coordinate shift matrix, obtaining a corrected third document image, including:
the first coordinate offset matrix designates a two-dimensional offset vector for each pixel point position in the second document image, and each pixel is offset according to the corresponding offset vector to obtain a corrected third document image;
the offset vector is used for representing the offset direction and distance on the two-dimensional plane.
Further, the determining whether to continue to perform the iteration step according to the second coordinate offset matrix includes:
calculating a standard deviation of the second coordinate offset matrix;
if the standard deviation of the second coordinate offset matrix is larger than a second preset threshold value, continuing to execute the iteration step;
and if the standard deviation of the second coordinate offset matrix is smaller than a second preset threshold value, stopping executing the iteration step.
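The standard-deviation stopping criterion above can be sketched in a few lines. A minimal NumPy sketch; the threshold value is illustrative, since the patent leaves the second preset threshold unspecified:

```python
import numpy as np

def should_continue(offset_matrix: np.ndarray, threshold: float = 1.5) -> bool:
    """Iterate while the standard deviation of the predicted coordinate
    offsets is above the threshold; a small deviation means the document
    is already flat enough. The value 1.5 (pixels) is illustrative only."""
    return float(np.std(offset_matrix)) > threshold
```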
Further, the shifting the second document image according to the first coordinate shift matrix and all second coordinate shift matrices in the record to obtain a corrected image as an output image includes:
summing the first coordinate offset matrix F1 and the second coordinate offset matrices F2(1), ..., F2(n) recorded in the iteration steps to obtain a final coordinate offset matrix F = F1 + F2(1) + ... + F2(n), and shifting the second document image according to the final coordinate offset matrix F to obtain the corrected image as the output image.
The other technical scheme adopted by the invention is as follows:
a document image geometry correction system, comprising:
the pixel classification module is used for acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image and acquiring a mask map of the foreground document area;
the preliminary correction module is used for extracting control points on the mask map, preliminarily correcting the first document image according to the control points, and deleting the environment boundary, obtaining a second document image that is preliminarily corrected and has the environment boundary deleted;
and the offset correction module is used for acquiring a first coordinate offset matrix of the second document image, and acquiring a corrected third document image after offsetting the second document image according to the first coordinate offset matrix.
The other technical scheme adopted by the invention is as follows:
a document image geometry correction apparatus comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The other technical scheme adopted by the invention is as follows:
a computer readable storage medium in which a processor executable program is stored, which when executed by a processor is for performing the method as described above.
The invention has the beneficial effects that: the present invention is capable of handling captured document images having different environmental boundary regions, including situations with smaller environmental boundary regions, with larger environmental boundary regions, or without environmental boundary regions.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings of the embodiments or of the related prior art are described below. It should be understood that the drawings in the following description serve only to describe some embodiments of the technical solutions of the present invention conveniently and clearly, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating an embodiment of a geometric correction method for a captured document image suitable for various environmental boundary conditions;
FIG. 2 is a schematic diagram of a document image with an environment boundary region removed by control point extraction and preliminary correction according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of iterative correction in an embodiment of the present invention;
FIG. 4 is a diagram illustrating the rectification effect on document images with different environmental boundary conditions according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating steps of a method for geometrically correcting a document image according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more and "a plurality" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including it. Where "first" and "second" are used only to distinguish technical features, they are not to be understood as indicating or implying relative importance, the number of the technical features indicated, or their precedence.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
As shown in fig. 5, the present embodiment provides a document image geometric correction method, including the steps of:
s101, acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image, and acquiring a mask map of the foreground document area;
s102, extracting control points on the mask map, preliminarily correcting the first document image according to the control points, and deleting the environment boundary, obtaining a second document image that is preliminarily corrected and has the environment boundary deleted;
s103, obtaining a first coordinate offset matrix of the second document image, and obtaining a corrected third document image after offsetting the second document image according to the first coordinate offset matrix.
In the embodiment, the first document image may be a document image obtained by shooting through a shooting device (such as a smart terminal, a camera, etc.), or may be a document image obtained by scanning, etc.
In some embodiments, the step of obtaining the corrected third document image in step S103 specifically includes:
judging whether an iteration step is executed or not according to the first coordinate offset matrix, and if the iteration step is not required to be executed, taking the third document image as an output image; otherwise, executing an iteration step;
wherein the iteration step comprises A1-A2:
a1, acquiring a second coordinate offset matrix of the third document image, updating the corrected image into the third document image after offsetting the third document image according to the second coordinate offset matrix, and recording the second coordinate offset matrix in the iteration step;
a2, judging whether to continue to execute the iteration step according to the second coordinate offset matrix, if so, returning to execute the previous step; and otherwise, shifting the second document image according to the first coordinate shift matrix and all second coordinate shift matrices in the record to obtain a corrected image as an output image.
In the embodiment, the statistical information of the coordinate offset matrix reflects the flatness of the input image; a low response indicates that the input image is already relatively flat. When the statistic (e.g. the standard deviation) of the first coordinate offset matrix is smaller than a preset threshold, the third document image is taken directly as the output image. When it is larger than the preset threshold, the iteration step is performed; as the number of iterations increases, the response of the coordinate offset matrix becomes lower and lower, until the statistic of the coordinate offset matrix falls below the preset threshold. All the obtained coordinate offset matrices are then summed and used to offset the second document image, giving the corrected image as the output image.
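The adaptive iteration just described can be sketched as follows. This is a simplified sketch: `predict_offsets` and `apply_offsets` are hypothetical stand-ins (not names from the patent) for the second deep network and the warping step, and the threshold and iteration cap are illustrative.

```python
import numpy as np

def iterative_rectify(image, predict_offsets, apply_offsets,
                      std_threshold=1.0, max_iters=5):
    """Predict a coordinate offset matrix, warp, and repeat while the
    offsets' standard deviation stays above the threshold. All recorded
    matrices are summed and applied to the *original* input once, so the
    final output is resampled a single time."""
    offsets = predict_offsets(image)           # first coordinate offset matrix
    recorded = [offsets]
    current = apply_offsets(image, offsets)    # the "third document image"
    while float(np.std(offsets)) > std_threshold and len(recorded) < max_iters:
        offsets = predict_offsets(current)     # a second coordinate offset matrix
        recorded.append(offsets)
        current = apply_offsets(current, offsets)
    total = np.sum(recorded, axis=0)           # final coordinate offset matrix
    return apply_offsets(image, total)         # single resampling of the input
```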
The above method is explained in detail below with reference to specific embodiments and the accompanying drawings.
As shown in fig. 1, the present embodiment provides a geometric correction method of a photographed document image suitable for various environmental boundary conditions, for solving the problem of geometric correction of a photographed document image having different environmental boundary conditions in an actual scene. The method specifically comprises the following steps:
s1, classifying each pixel of the input shot document image (namely the first document image), distinguishing a foreground document area and an environment boundary area on the image, and obtaining a mask image of the foreground document area. This allows for accurate separation of foreground document regions from the document image.
And S2, extracting control points on the mask map, performing preliminary correction on the input document image using the control points, and removing the environment boundary at the same time, obtaining a preliminarily corrected document image with the environment boundary removed (namely, the second document image) as the input of the next step.
Because the control points are extracted from the mask map, which is a binary image, the extraction is much easier than extracting them directly from the input photographed document image.
Specifically, as shown in FIG. 2, the step S2 includes steps S21-S24:
s21, extracting four corner points of the document on the foreground document region mask image by using a Douglas-Peucker polygon fitting algorithm;
s22, distinguishing upper left corner points, upper right corner points, lower right corner points and lower left corner points according to the relative position relation of the four corner points;
s23, drawing a vertical bisector on a line segment formed by taking adjacent corner points as end points according to a preset bisection proportion, and taking the intersection point of the bisector and the foreground document area mask map boundary as a bisection point of the document boundary;
s24, drawing a quadrilateral mask map from the four corner points and calculating its intersection-over-union ratio with the foreground document area mask map. If the ratio is smaller than a preset threshold, it can be judged that the input photographed document image does not contain a complete document boundary; no correction is performed, and the input document image is passed to the next step as-is. If the ratio is larger than the preset threshold, the input image can be judged to contain a complete document boundary; the four corner points and the several bisection points of the boundary are taken as control points, the input document image is preliminarily corrected with a thin-plate spline interpolation algorithm, and the environment boundary is removed at the same time, yielding a preliminarily corrected document image with the environment boundary removed as the input of the next step. Because the preliminary correction relies on the document boundary, thin-plate spline correction is unreasonable for an input image without a complete document boundary; the intersection-over-union threshold screens out such images, which are sent directly to the next step without thin-plate spline correction.
And S3, predicting a coordinate offset matrix for the document image, obtaining a corrected document image after the document image is offset according to the coordinate offset matrix, performing iterative correction on the corrected document image as the input of the step S3 again, and determining whether the iterative correction is performed or not in a self-adaptive mode according to the statistical information of the coordinate offset matrix. And when the iteration is stopped, obtaining a final corrected document image according to the obtained plurality of coordinate offset matrixes.
In some optional embodiments, in step S3, the coordinate offset matrix specifies a two-dimensional offset vector for each pixel position in the input image, where the offset vector indicates an offset direction and a distance on a two-dimensional plane, the pixel is offset according to the corresponding offset vector to obtain a corrected document image, and a linear interpolation is used in the offset process.
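A minimal NumPy sketch of offsetting each pixel by its two-dimensional offset vector with linear (bilinear) interpolation. It assumes single-channel images and backward mapping (each output pixel samples the input at its offset location), which is one reasonable reading of the step:

```python
import numpy as np

def apply_offset(image: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """Warp a single-channel image by a (2, H, W) coordinate offset matrix:
    offsets[0] is the y-shift and offsets[1] the x-shift, sampled with
    bilinear interpolation. Sample positions are clipped to the image."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    src_y = np.clip(ys + offsets[0], 0, h - 1)
    src_x = np.clip(xs + offsets[1], 0, w - 1)
    y0 = np.floor(src_y).astype(int); x0 = np.floor(src_x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1);   x1 = np.minimum(x0 + 1, w - 1)
    wy = src_y - y0; wx = src_x - x0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bot = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```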
In some alternative embodiments, step S3 uses the standard deviation as the statistical information of the coordinate offset matrix: as the number of iterations increases, the input image becomes flatter and flatter, the magnitude of the predicted coordinate offset matrix becomes smaller, and its standard deviation decreases accordingly. When the standard deviation falls below a certain threshold, the corresponding input document image can be considered sufficiently flattened and the iteration stops; otherwise the iteration continues. In this way a balance between correction quality and correction efficiency is achieved, making the system more efficient.
In some alternative embodiments, as shown in fig. 3, performing rectification based on the coordinate offset matrices in step S3 includes: first, the recorded coordinate offset matrices F1, F2, ..., Fn are summed to obtain the final coordinate offset matrix F; then coordinate offset is performed on the output of step S2 according to F, obtaining the final corrected document image. The last intermediate image of the iteration is not taken directly as the final output because, at that point, the document image has been resampled many times, which may cause blurring. By instead applying F to the output of step S2, the corrected document image is sampled only once, effectively avoiding the blurring problem.
In some optional embodiments, in step S1, a deep convolutional neural network is used to obtain the classification confidence of each pixel position, parameters of the network are trained and optimized in advance with synthetic data, and binarization is performed through a threshold of 0.5 during prediction, so as to obtain a final foreground document region mask map. The network parameter optimization specifically comprises the following steps:
(1) data acquisition: 100000 data samples (each comprising an input photographed document image and its corresponding foreground document region mask map) from the Doc3D public synthetic data set are used as training data (90000 samples) and verification data (10000 samples);
(2) network training:
(2-1) constructing a deep neural network: the DeepLabv3+ segmentation model is used as a network structure, the number of output categories is set to be 1, namely, the number of channels of the output result of the last layer of the network is 1.
(2-2) training mode: the training uses a gradient descent algorithm; gradients are computed from the last layer and propagated layer by layer to update all parameters, thereby training the network. The loss function during training is the binary cross-entropy loss.
(2-3) setting of training parameters:
Number of iterations: 50 epochs
Optimizer: Adam
Learning rate: 0.0001 (update strategy: the learning rate decays to 1/2 every 5 iterations)
Weight decay: 0.0005
And (2-4) starting to train the deep neural network under random initialization parameters.
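The prediction-time binarization at threshold 0.5 can be sketched as follows, assuming the single output channel holds raw logits that are passed through a sigmoid to obtain the per-pixel classification confidence:

```python
import numpy as np

def logits_to_mask(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn the single-channel network output into the foreground document
    mask: sigmoid gives the per-pixel confidence, which is then binarized
    at the 0.5 threshold described in the embodiment."""
    confidence = 1.0 / (1.0 + np.exp(-logits))
    return (confidence > threshold).astype(np.uint8)
```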
In some optional embodiments, in step S3, a deep convolutional neural network is used to obtain a coordinate offset matrix, and parameters of the network are trained and optimized in advance through synthetic data, which specifically includes:
(1) data acquisition: 100000 data samples (each comprising an input photographed document image and its corresponding coordinate offset matrix) from the Doc3D public synthetic data set are used as training data (90000 samples) and verification data (10000 samples); before training, the samples are processed by steps S1 and S2 to remove the environment boundaries;
(2) network training:
(2-1) constructing a deep neural network: the network employs a down-sampling-then-up-sampling encoder-decoder structure, with jumping (skip) connections for preserving detail features and facilitating gradient backpropagation, as shown in table 1 below:
TABLE 1
Network layer | Configuration | Feature size |
Input layer | - | 3*448*448 |
Convolutional layer | Number of kernels 32, convolution kernel 3 x 3, step 1 x 1, edge filling | 32*448*448 |
Non-linear layer | - | 32*448*448 |
Pooling layer | Pooling kernel 2 x 2, step size 2 x 2 | 32*224*224 |
Convolutional layer | Number of kernels 64, convolution kernel 3 x 3, step 1 x 1, edge filling | 64*224*224 |
Non-linear layer | - | 64*224*224 |
Pooling layer | Pooling kernel 2 x 2, step size 2 x 2 | 64*112*112 |
Convolutional layer | Number of kernels 128, convolution kernel 3 x 3, step 1 x 1, edge filling | 128*112*112 |
Non-linear layer | - | 128*112*112 |
Pooling layer | Pooling kernel 2 x 2, step size 2 x 2 | 128*56*56 |
Convolutional layer | Number of kernels 256, convolution kernel 3 x 3, step 1 x 1, edge filling | 256*56*56 |
Non-linear layer | - | 256*56*56 |
Pooling layer | Pooling kernel 2 x 2, step size 2 x 2 | 256*28*28 |
Convolutional layer | Number of kernels 512, convolution kernel 3 x 3, step 1 x 1, edge filling | 512*28*28 |
Non-linear layer | - | 512*28*28 |
Transposed convolution layer | Number of kernels 256, convolution kernel 4 x 4, step 2 x 2, edge filling | 256*56*56 |
Non-linear layer | - | 256*56*56 |
Jumping connection layers | Splicing corresponding feature maps in the down-sampling path on the channel | 512*56*56 |
Convolutional layer | Number of kernels 256, convolution kernel 3 x 3, step 1 x 1, edge filling | 256*56*56 |
Non-linear layer | - | 256*56*56 |
Transposed convolution layer | Number of kernels 128, convolution kernel 4 x 4, step 2 x 2, edge filling | 128*112*112 |
Non-linear layer | - | 128*112*112 |
Jumping connection layers | Splicing corresponding feature maps in the down-sampling path on the channel | 256*112*112 |
Convolutional layer | Number of kernels 128, convolution kernel 3 x 3, step 1 x 1, edge filling | 128*112*112 |
Non-linear layer | - | 128*112*112 |
Transposed convolution layer | Number of kernels 64, convolution kernel 4 x 4, step 2 x 2, edge filling | 64*224*224 |
Non-linear layer | - | 64*224*224 |
Jumping connection layers | Splicing corresponding feature maps in the down-sampling path on the channel | 128*224*224 |
Convolutional layer | Number of kernels 64, convolution kernel 3 x 3, step 1 x 1, edge filling | 64*224*224 |
Non-linear layer | - | 64*224*224 |
Transposed convolution layer | Number of kernels 32, convolution kernels 4 x 4, step size 2 x 2, edge filling | 32*448*448 |
Non-linear layer | - | 32*448*448 |
Jumping connection layers | Splicing corresponding feature maps in the down-sampling path on the channel | 64*448*448 |
Convolutional layer | Number of kernels 32, convolution kernel 3 x 3, step 1 x 1, edge filling | 32*448*448 |
Non-linear layer | - | 32*448*448 |
Convolutional layer | Number of kernels 2, convolution kernels 3 x 3, step size 1 x 1, edge filling | 2*448*448 |
Non-linear layer | - | 2*448*448 |
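The spatial sizes in Table 1 follow mechanically from the layer hyperparameters and can be sanity-checked with a short sketch (a pure-Python illustration, not the patented implementation; the layer sequence is transcribed from the table above): a padded 3×3 stride-1 convolution keeps the spatial size, a 2×2 stride-2 pooling halves it, and a padded 4×4 stride-2 transposed convolution doubles it.

```python
# Trace the feature-map spatial size through the encoder-decoder of Table 1.
# Conventions assumed from the table: "conv" = 3x3, stride 1, padded (size kept);
# "pool" = 2x2, stride 2 (size halved); "tconv" = 4x4, stride 2, padded (size doubled).

def trace_shapes(layers, size=448):
    """Return the spatial size after each layer of the codec structure."""
    sizes = []
    for kind in layers:
        if kind == "pool":
            size //= 2
        elif kind == "tconv":
            size *= 2
        # "conv" keeps the spatial size unchanged
        sizes.append(size)
    return sizes

encoder = ["conv", "pool", "conv", "pool", "conv", "pool", "conv", "pool", "conv"]
decoder = ["tconv", "conv", "tconv", "conv", "tconv", "conv", "tconv", "conv", "conv"]

sizes = trace_shapes(encoder + decoder)
print(sizes[-1])   # the decoder returns to the 448x448 input resolution
```

The trace confirms the table: the encoder bottoms out at 28×28 and the decoder restores the full 448×448 resolution for the two-channel output map.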
(2-2) training mode: the network is trained with gradient descent; gradients are computed starting from the last layer and backpropagated layer by layer to update all parameters. The training loss function is the mean squared error.
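As a toy illustration of this training rule (a hedged sketch in which a single scalar weight stands in for the network parameters; it is not the patented training code), gradient descent on a mean-squared-error loss looks like:

```python
# Toy illustration of the training rule described above: gradient descent on a
# mean-squared-error loss. A single scalar weight stands in for the network
# parameters; real training backpropagates this update through every layer.

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# hypothetical model: pred = w * x  (a stand-in for the deep network)
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # data generated with true w = 2
w, lr = 0.0, 0.05
for _ in range(200):
    # dL/dw for the MSE of (w*x - y): mean of 2*(w*x - y)*x
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad   # gradient descent step

print(round(w, 3))  # converges toward 2.0
```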
(2-3) training parameter settings:
Number of iterations: 50 epochs
Optimizer: Adam
Learning rate: 0.0001 (update strategy: the learning rate decays to 1/2 every 5 epochs)
Weight decay: 0.0005
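The stated decay strategy can be written as a one-line schedule (a sketch; the function name and per-epoch granularity are illustrative assumptions, not from the patent):

```python
# Sketch of the stated schedule: initial learning rate 1e-4, halved every
# 5 epochs, over the 50-epoch training run.

def learning_rate(epoch, base=1e-4, step=5):
    """Learning rate at a given epoch: decays to 1/2 every `step` epochs."""
    return base * 0.5 ** (epoch // step)

schedule = [learning_rate(e) for e in range(50)]
print(schedule[0], schedule[5], schedule[49])
```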
(2-4) The deep neural network is then trained from randomly initialized parameters.
As shown in FIG. 4, the method provided by this embodiment can process captured document images under a variety of environmental boundary conditions and achieves good correction results. In summary, the method can handle captured document images with environmental boundary areas of different sizes, including a small environmental boundary area, a large environmental boundary area, and no environmental boundary area at all. Moreover, for document images with different degrees of geometric deformation, the method adaptively determines the number of iterations and obtains good correction results.
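The adaptive iteration behind this result can be sketched as follows (a hedged pure-Python illustration: `predict_offsets` is a synthetic stand-in for the second deep network, and short scalar lists stand in for the H×W×2 coordinate offset matrices; the stopping rule follows the standard-deviation criterion of the claims):

```python
# Sketch of the adaptive iteration: keep predicting coordinate offset matrices
# and recording them until the offsets' standard deviation drops below a
# threshold, then sum all recorded offsets into one final correction.
from statistics import pstdev

def predict_offsets(step):
    # synthetic stand-in: offsets shrink as the image gets straighter
    return [v * 0.5 ** step for v in (4.0, -2.0, 1.0, -3.0)]

THRESHOLD = 0.5
recorded, step = [], 0
while True:
    offsets = predict_offsets(step)
    if pstdev(offsets) <= THRESHOLD:      # offsets nearly uniform: stop
        break
    recorded.append(offsets)              # record this iteration's matrix
    step += 1

# final correction = elementwise sum of all recorded offset matrices
final = [sum(col) for col in zip(*recorded)]
print(step, final)
```

With the synthetic offsets above, the loop stops after three iterations, illustrating how the iteration count adapts to the residual deformation rather than being fixed in advance.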
The present embodiment also provides a document image geometric correction system, including:
the pixel classification module is used for acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image and acquiring a mask map of the foreground document area;
the preliminary correction module is used for extracting control points on the mask image, preliminarily correcting the first document image according to the control points, deleting the environment boundary and obtaining a second document image with preliminarily corrected and deleted environment boundary;
and the offset correction module is used for acquiring a first coordinate offset matrix of the second document image, and acquiring a corrected third document image after offsetting the second document image according to the first coordinate offset matrix.
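The offset correction step, which moves every pixel by its 2-D offset vector, can be illustrated minimally (a sketch using integer offsets and nearest-neighbour sampling; the actual module would apply sub-pixel interpolation over a 2×H×W offset matrix):

```python
# Minimal sketch of the offset-correction step: each pixel position gets a
# 2-D offset vector (dy, dx) and samples its value from the shifted location.
# Integer offsets only; the real module would use sub-pixel interpolation.

def apply_offsets(image, offsets):
    """image: H x W grid; offsets[y][x] = (dy, dx) sampling displacement."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = offsets[y][x]
            sy = min(max(y + dy, 0), h - 1)   # clamp to the image border
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = image[sy][sx]
    return out

img = [[1, 2], [3, 4]]
shift_left = [[(0, 1), (0, 1)], [(0, 1), (0, 1)]]  # sample one column to the right
print(apply_offsets(img, shift_left))  # → [[2, 2], [4, 4]]
```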
The document image geometric correction system of this embodiment can execute the document image geometric correction method provided by the method embodiments of the invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.
The present embodiment also provides a document image geometry correction apparatus, including:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method shown in fig. 5.
The document image geometric correction apparatus of this embodiment can execute the document image geometric correction method provided by the method embodiments of the invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor, to cause the computer device to perform the method illustrated in fig. 5.
This embodiment also provides a storage medium storing instructions or a program capable of executing the document image geometric correction method provided by the method embodiments of the invention; when the instructions or program are executed, any combination of the implementation steps of the method embodiments can be performed, with the corresponding functions and beneficial effects of the method.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A geometrical correction method for a document image is characterized by comprising the following steps:
acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image, and acquiring a mask image of the foreground document area;
extracting control points on the mask image, performing primary correction on the first document image according to the control points, deleting an environment boundary, and obtaining a second document image with the primary correction and the deletion of the environment boundary;
and acquiring a first coordinate offset matrix of the second document image, and offsetting the second document image according to the first coordinate offset matrix to obtain a corrected third document image.
2. The method for geometrically correcting the document image according to claim 1, wherein the obtaining the corrected third document image comprises:
determining, according to the first coordinate offset matrix, whether to execute an iteration step; if the iteration step does not need to be executed, taking the third document image as an output image; otherwise, executing the iteration step;
the step of iterating comprises:
acquiring a second coordinate offset matrix of the third document image, updating the corrected image into the third document image after offsetting the third document image according to the second coordinate offset matrix, and recording the second coordinate offset matrix in the iteration step;
judging, according to the second coordinate offset matrix, whether to continue executing the iteration step, and if so, returning to the previous step; otherwise, shifting the second document image according to the first coordinate offset matrix and all recorded second coordinate offset matrices to obtain a corrected image as an output image.
3. The method for geometrically correcting the document image according to claim 1, wherein the classifying the pixels in the first document image comprises:
obtaining the classification confidence of each pixel position in the first document image by adopting a first deep convolutional neural network, and classifying according to the classification confidence;
the obtaining of the first coordinate offset matrix of the second document image includes:
and acquiring a first coordinate offset matrix of the second document image by adopting a second deep convolution neural network.
4. The method according to claim 1, wherein the extracting control points on the mask map, performing preliminary correction on the first document image according to the control points, deleting the environmental boundary, and obtaining a second document image with the preliminary correction and the environmental boundary deleted comprises:
extracting four corner points of the document on a mask image of the foreground document area by using a polygon fitting algorithm;
drawing perpendiculars, at preset division proportions, to the line segments whose end points are adjacent corner points, and taking the intersections of these perpendiculars with the boundary of the foreground document region mask map as division points of the document boundary;
drawing a quadrilateral mask map from the four corner points, and computing the intersection-over-union between the quadrilateral mask map and the mask map of the foreground document region;
if the intersection-over-union is smaller than a first preset threshold, performing no correction and taking the first document image as the second document image;
and if the intersection-over-union is larger than the first preset threshold, taking the four corner points and the boundary division points as control points, performing preliminary correction on the first document image with a thin-plate spline interpolation algorithm, deleting the environmental boundary, and obtaining a second document image that is preliminarily corrected with the environmental boundary deleted.
5. The method for geometrically correcting the document image according to claim 1, wherein the obtaining a corrected third document image after the shifting of the second document image according to the first coordinate shifting matrix comprises:
the first coordinate offset matrix designates a two-dimensional offset vector for each pixel point position in the second document image, and each pixel is offset according to the corresponding offset vector to obtain a corrected third document image;
the offset vector is used for representing the offset direction and distance on the two-dimensional plane.
6. The method for geometrically correcting a document image according to claim 2, wherein said determining whether to continue the iteration step according to the second coordinate offset matrix comprises:
calculating a standard deviation of the second coordinate offset matrix;
if the standard deviation of the second coordinate offset matrix is larger than a second preset threshold value, continuing to execute the iteration step;
and if the standard deviation of the second coordinate offset matrix is smaller than a second preset threshold value, stopping executing the iteration step.
7. The method for geometrically correcting the document image according to claim 2, wherein the shifting the second document image according to the first coordinate shift matrix and all the second coordinate shift matrices in the record to obtain a corrected image as an output image comprises:
summing the first coordinate offset matrix and all the second coordinate offset matrices recorded in the iteration steps to obtain a final coordinate offset matrix, and shifting the second document image according to the final coordinate offset matrix to obtain a corrected image as an output image.
8. A document image geometry correction system, comprising:
the pixel classification module is used for acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image and acquiring a mask map of the foreground document area;
the preliminary correction module is used for extracting control points on the mask image, preliminarily correcting the first document image according to the control points, deleting the environment boundary and obtaining a second document image with preliminarily corrected and deleted environment boundary;
and the offset correction module is used for acquiring a first coordinate offset matrix of the second document image, and acquiring a corrected third document image after offsetting the second document image according to the first coordinate offset matrix.
9. A document image geometry correction apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, in which a program executable by a processor is stored, wherein the program executable by the processor is adapted to perform the method according to any one of claims 1 to 7 when executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111584077.4A CN114418869B (en) | 2021-12-22 | 2021-12-22 | Document image geometric correction method, system, device and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114418869A true CN114418869A (en) | 2022-04-29 |
CN114418869B CN114418869B (en) | 2024-08-13 |
Family
ID=81267830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111584077.4A Active CN114418869B (en) | 2021-12-22 | 2021-12-22 | Document image geometric correction method, system, device and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114418869B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115187995A (en) * | 2022-07-08 | 2022-10-14 | 北京百度网讯科技有限公司 | Document correction method, device, electronic equipment and storage medium |
CN116030120A (en) * | 2022-09-09 | 2023-04-28 | 北京市计算中心有限公司 | Method for identifying and correcting hexagons |
CN117853382A (en) * | 2024-03-04 | 2024-04-09 | 武汉人工智能研究院 | Sparse marker-based image correction method, device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190294970A1 (en) * | 2018-03-23 | 2019-09-26 | The Governing Council Of The University Of Toronto | Systems and methods for polygon object annotation and a method of training an object annotation system |
CN111401371A (en) * | 2020-06-03 | 2020-07-10 | 中邮消费金融有限公司 | Text detection and identification method and system and computer equipment |
CN111414915A (en) * | 2020-02-21 | 2020-07-14 | 华为技术有限公司 | Character recognition method and related equipment |
CN112766266A (en) * | 2021-01-29 | 2021-05-07 | 云从科技集团股份有限公司 | Text direction correction method, system and device based on staged probability statistics |
CN112767270A (en) * | 2021-01-19 | 2021-05-07 | 中国科学技术大学 | Fold document image correction system |
KR20210112992A (en) * | 2020-03-06 | 2021-09-15 | 주식회사 테스트웍스 | System and method of quality adjustment of object detection based on polyggon |
Non-Patent Citations (1)
Title |
---|
JIAXIN ZHANG ET AL.: "Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in theWild", 《ARXIV:2207.11515V1》, 23 July 2022 (2022-07-23), pages 1 - 11 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||