CN114418869A - Method, system, device and medium for geometric correction of document image

Method, system, device and medium for geometric correction of document image

Info

Publication number
CN114418869A
Authority
CN
China
Prior art keywords
document image
document
image
boundary
coordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111584077.4A
Other languages
Chinese (zh)
Other versions
CN114418869B (en)
Inventor
Jin Lianwen (金连文)
Zhang Jiaxin (张家鑫)
Luo Canjie (罗灿杰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Original Assignee
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT and Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority to CN202111584077.4A
Publication of CN114418869A
Application granted
Publication of CN114418869B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/80 - Geometric correction
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30176 - Document

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method, a system, a device and a medium for geometric correction of a document image. The method comprises the following steps: acquiring a first document image, classifying the pixels in the first document image to distinguish the foreground document area from the environment boundary area in the document image, and acquiring a mask map of the foreground document area; extracting control points on the mask map, preliminarily correcting the first document image according to the control points, and deleting the environment boundary to obtain a preliminarily corrected second document image with the environment boundary removed; and acquiring a first coordinate offset matrix of the second document image, and offsetting the second document image according to the first coordinate offset matrix to obtain a corrected third document image. The invention can handle captured document images with different environment boundary regions, including cases with a small environment boundary region, a large environment boundary region, or no environment boundary region at all, and can be widely applied in the technical field of pattern recognition and artificial intelligence.

Description

Method, system, device and medium for geometric correction of document image
Technical Field
The invention relates to the technical field of pattern recognition and artificial intelligence, in particular to a method, a system, a device and a medium for geometric correction of a document image.
Background
With the development of semiconductor technology, the built-in cameras of mobile devices have become increasingly capable and their imaging quality increasingly high. Digitizing documents by photographing them with a built-in camera has therefore become a convenient option. However, perspective distortion caused by an improper camera position and angle during photographing, together with physical deformations of the document such as bending, folding and creasing, leaves the captured document image geometrically deformed. These deformations degrade the performance of optical character recognition systems as well as the aesthetics and readability of the document image. Deep-learning-based geometric correction methods have greatly improved correction performance and robustness to different document layouts. However, existing deep learning correction methods focus only on cropped document images, i.e. document images with a small environment boundary region, and require a complete document boundary. In practice, the environment boundary conditions vary widely: some document images have a large environment boundary region in which the foreground document occupies only a small part, and others have no environment boundary region at all, so no complete document boundary is present. The aforementioned deep learning correction methods do not work well on such images.
Disclosure of Invention
To solve at least one of the technical problems in the prior art to some extent, the present invention provides a method, a system, a device and a medium for geometric correction of a document image.
The technical scheme adopted by the invention is as follows:
a geometrical correction method for document images comprises the following steps:
acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image, and acquiring a mask image of the foreground document area;
extracting control points on the mask image, performing preliminary correction on the first document image according to the control points, and deleting the environment boundary to obtain a preliminarily corrected second document image with the environment boundary removed;
and acquiring a first coordinate offset matrix of the second document image, and offsetting the second document image according to the first coordinate offset matrix to obtain a corrected third document image.
Further, the obtaining the corrected third document image includes:
judging whether an iteration step is executed or not according to the first coordinate offset matrix, and if the iteration step is not required to be executed, taking the third document image as an output image; otherwise, executing an iteration step;
the step of iterating comprises:
acquiring a second coordinate offset matrix of the third document image, updating the corrected image into the third document image after offsetting the third document image according to the second coordinate offset matrix, and recording the second coordinate offset matrix in the iteration step;
judging whether to continue to execute the iteration step according to the second coordinate offset matrix, if so, returning to execute the previous step; and on the contrary, the second document image is shifted according to the first coordinate shift matrix and all second coordinate shift matrices in the record, and a corrected image is obtained and used as an output image.
Further, the classifying the pixels in the first document image includes:
obtaining the classification confidence of each pixel position in the first document image by adopting a first deep convolutional neural network, and classifying according to the classification confidence;
the obtaining of the first coordinate offset matrix of the second document image includes:
and acquiring a first coordinate offset matrix of the second document image by adopting a second deep convolution neural network.
Further, the extracting of control points on the mask map, the preliminary correction of the first document image according to the control points, and the deletion of the environment boundary to obtain a preliminarily corrected second document image with the environment boundary removed include:
extracting four corner points of the document on a mask image of the foreground document area by using a polygon fitting algorithm;
drawing a vertical bisector on a line segment formed by taking adjacent corner points as end points according to a preset bisection proportion, and taking the intersection point of the bisector and the mask image boundary of the foreground document region as a bisection point of the document boundary;
drawing a quadrilateral mask map from the four corner points, and calculating an intersection-over-union (IoU) ratio between the quadrilateral mask map and the mask map of the foreground document area;
if the intersection-over-union ratio is smaller than a first preset threshold value, performing no correction and taking the first document image as the second document image;
and if the intersection-over-union ratio is larger than the first preset threshold value, taking the four corner points and a plurality of bisection points of the boundary as control points, preliminarily correcting the first document image by using a thin plate spline interpolation algorithm, deleting the environment boundary, and obtaining a preliminarily corrected second document image with the environment boundary removed.
Further, after the second document image is shifted according to the first coordinate shift matrix, obtaining a corrected third document image, including:
the first coordinate offset matrix designates a two-dimensional offset vector for each pixel point position in the second document image, and each pixel is offset according to the corresponding offset vector to obtain a corrected third document image;
the offset vector is used for representing the offset direction and distance on the two-dimensional plane.
Further, the determining whether to continue to perform the iteration step according to the second coordinate offset matrix includes:
calculating a standard deviation of the second coordinate offset matrix;
if the standard deviation of the second coordinate offset matrix is larger than a second preset threshold value, continuing to execute the iteration step;
and if the standard deviation of the second coordinate offset matrix is smaller than a second preset threshold value, stopping executing the iteration step.
Further, the shifting the second document image according to the first coordinate shift matrix and all second coordinate shift matrices in the record to obtain a corrected image as an output image includes:
summing the first coordinate offset matrix f_1 and the coordinate offset matrices f_2, ..., f_N recorded in the iteration step to obtain a final coordinate offset matrix f_total = f_1 + f_2 + ... + f_N;
and shifting the second document image according to the final coordinate offset matrix f_total to obtain the corrected image as the output image.
The other technical scheme adopted by the invention is as follows:
a document image geometry correction system, comprising:
the pixel classification module is used for acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image and acquiring a mask map of the foreground document area;
the preliminary correction module is used for extracting control points on the mask image, preliminarily correcting the first document image according to the control points, deleting the environment boundary and obtaining a second document image with preliminarily corrected and deleted environment boundary;
and the offset correction module is used for acquiring a first coordinate offset matrix of the second document image, and acquiring a corrected third document image after offsetting the second document image according to the first coordinate offset matrix.
The other technical scheme adopted by the invention is as follows:
a document image geometry correction apparatus comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the method described above.
The other technical scheme adopted by the invention is as follows:
a computer readable storage medium in which a processor executable program is stored, which when executed by a processor is for performing the method as described above.
The invention has the beneficial effects that: the present invention is capable of handling captured document images having different environmental boundary regions, including situations with smaller environmental boundary regions, with larger environmental boundary regions, or without environmental boundary regions.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description is made on the drawings of the embodiments of the present invention or the related technical solutions in the prior art, and it should be understood that the drawings in the following description are only for convenience and clarity of describing some embodiments in the technical solutions of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart illustrating an embodiment of a geometric correction method for a captured document image suitable for various environmental boundary conditions;
FIG. 2 is a schematic diagram of a document image with an environment boundary region removed by control point extraction and preliminary correction according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of iterative correction in an embodiment of the present invention;
FIG. 4 is a diagram illustrating the rectification effect on document images with different environmental boundary conditions according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating steps of a method for geometrically correcting a document image according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more, "a plurality of" means two or more, and terms such as greater than, less than and exceeding are understood to exclude the stated number, while above, below and within are understood to include it. If "first" and "second" are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or the precedence of the indicated technical features.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
As shown in fig. 5, the present embodiment provides a document image geometric correction method, including the steps of:
s101, acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image, and acquiring a mask map of the foreground document area;
S102, extracting control points on the mask image, preliminarily correcting the first document image according to the control points, and deleting the environment boundary to obtain a preliminarily corrected second document image with the environment boundary removed;
s103, obtaining a first coordinate offset matrix of the second document image, and obtaining a corrected third document image after offsetting the second document image according to the first coordinate offset matrix.
In the embodiment, the first document image may be a document image obtained by shooting through a shooting device (such as a smart terminal, a camera, etc.), or may be a document image obtained by scanning, etc.
In some embodiments, the step of obtaining the corrected third document image in step S103 specifically includes:
judging whether an iteration step is executed or not according to the first coordinate offset matrix, and if the iteration step is not required to be executed, taking the third document image as an output image; otherwise, executing an iteration step;
wherein the iteration step comprises A1-A2:
a1, acquiring a second coordinate offset matrix of the third document image, updating the corrected image into the third document image after offsetting the third document image according to the second coordinate offset matrix, and recording the second coordinate offset matrix in the iteration step;
a2, judging whether to continue to execute the iteration step according to the second coordinate offset matrix, if so, returning to execute the previous step; and otherwise, shifting the second document image according to the first coordinate shift matrix and all second coordinate shift matrices in the record to obtain a corrected image as an output image.
In this embodiment, the statistical information of the coordinate offset matrix reflects how flat the input image is; a low response indicates that the input image is relatively flat. When the statistic of the first coordinate offset matrix is smaller than a preset threshold value, the third document image is directly taken as the output image. When it is larger than the preset threshold value, the iteration step is performed; as the number of iterations increases, the response of the predicted coordinate offset matrix becomes lower and lower, until the statistic of the coordinate offset matrix falls below the preset threshold value. All obtained coordinate offset matrices are then summed, and the second document image is offset accordingly to obtain the corrected image as the output image.
The above method is explained in detail below with reference to specific embodiments and the accompanying drawings.
As shown in fig. 1, the present embodiment provides a geometric correction method of a photographed document image suitable for various environmental boundary conditions, for solving the problem of geometric correction of a photographed document image having different environmental boundary conditions in an actual scene. The method specifically comprises the following steps:
s1, classifying each pixel of the input shot document image (namely the first document image), distinguishing a foreground document area and an environment boundary area on the image, and obtaining a mask image of the foreground document area. This allows for accurate separation of foreground document regions from the document image.
And S2, extracting control points on the mask image, performing initial correction on the input document image by using the control points, and removing the environmental boundary at the same time to obtain a document image (namely, a second document image) with the initial correction and the environmental boundary removal as the input of the next step.
Because the mask map is a binary image, extracting the control points from it is much easier than extracting them directly from the captured input document image.
Specifically, as shown in FIG. 2, step S2 includes steps S21 to S24 (an illustrative code sketch of these steps is given after the list):
s21, extracting four corner points of the document on the foreground document region mask image by using a Douglas-Peucker polygon fitting algorithm;
s22, distinguishing upper left corner points, upper right corner points, lower right corner points and lower left corner points according to the relative position relation of the four corner points;
s23, drawing a vertical bisector on a line segment formed by taking adjacent corner points as end points according to a preset bisection proportion, and taking the intersection point of the bisector and the foreground document area mask map boundary as a bisection point of the document boundary;
s24, drawing a quadrilateral mask image by four corner points, calculating an intersection ratio with the foreground document area mask image, if the intersection ratio is smaller than a preset threshold value (according to the small intersection ratio, the input shot document image does not contain a complete document boundary), not correcting, and taking the input document image as the input of the next step; if the cross-over ratio is larger than a preset threshold value (the input shot document image can be judged to contain a complete document boundary according to the large cross-over ratio), taking four corner points and a plurality of equally divided points of the boundary as control points, primarily correcting the input document image by using a thin plate spline interpolation algorithm, and removing the environmental boundary at the same time to obtain a document image which is primarily corrected and is subjected to environmental boundary removal and is used as the input of the next step. Because the preliminary correction is realized by utilizing the document boundary, if the input document image does not have a complete document boundary, the thin plate spline correction is still unreasonable, and the input document image without the complete document boundary is removed by setting an intersection ratio threshold value, and is directly sent to the next step without the thin plate spline correction.
And S3, predicting a coordinate offset matrix for the document image, obtaining a corrected document image after the document image is offset according to the coordinate offset matrix, performing iterative correction on the corrected document image as the input of the step S3 again, and determining whether the iterative correction is performed or not in a self-adaptive mode according to the statistical information of the coordinate offset matrix. And when the iteration is stopped, obtaining a final corrected document image according to the obtained plurality of coordinate offset matrixes.
In some optional embodiments, in step S3 the coordinate offset matrix specifies a two-dimensional offset vector for each pixel position in the input image, where the offset vector indicates an offset direction and distance on the two-dimensional plane. Each pixel is offset according to its corresponding offset vector to obtain the corrected document image, and linear interpolation is used during the offset.
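For illustration, the per-pixel offset warp with linear interpolation can be realized with OpenCV's remap. The sketch below treats the offset matrix as a backward mapping (each output pixel samples from its own position plus the offset); the forward/backward convention is not spelled out above, so that choice is an assumption.

```python
import cv2
import numpy as np

def apply_offsets(image: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """Warp `image` according to a per-pixel coordinate offset matrix of shape
    (H, W, 2), offsets[y, x] = (dx, dy), using bilinear (linear) interpolation."""
    h, w = image.shape[:2]
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    map_x = xs + offsets[..., 0].astype(np.float32)   # sampling position, x component
    map_y = ys + offsets[..., 1].astype(np.float32)   # sampling position, y component
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)
```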
In some alternative embodiments, step S3 uses the standard deviation as the statistical information of the coordinate offset matrix. As the number of iterations increases, the input image becomes flatter, the predicted coordinate offset matrix has a lower response, and its standard deviation becomes smaller. When the standard deviation falls below a certain threshold, the input document image can be considered sufficiently flattened and the iteration stops; otherwise the iteration continues. In this way a balance between correction quality and correction efficiency is achieved, making the system more efficient.
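The stopping test then reduces to a standard-deviation check on the predicted offset field; a minimal sketch, with the threshold left as a hyperparameter to be tuned on validation data (no value is disclosed here):

```python
import numpy as np

def flat_enough(offsets: np.ndarray, std_threshold: float) -> bool:
    """A nearly flat page yields an almost uniform (near-zero) offset field,
    so its standard deviation falls below the threshold and iteration stops."""
    return float(np.std(offsets)) < std_threshold
```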
In some alternative embodiments, as shown in fig. 3, performing rectification based on the coordinate offset matrices in step S3 includes: first summing the obtained coordinate offset matrices f_1, f_2, ..., f_N to obtain a final coordinate offset matrix f_total = f_1 + f_2 + ... + f_N, and then performing a coordinate offset on the output of step S2 according to f_total to obtain the final corrected document image. The document image rectified iteratively with each f_i is not used directly as the final output because it has already been resampled many times, which causes blurring; by using f_total instead, the rectified document image is obtained from the output of step S2 with only a single resampling, which effectively avoids the blurring problem.
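Putting the pieces together, the iterative branch of step S3 can be sketched as follows. `predict_offsets` and `apply_offsets` are passed in as callables (for instance the regression network and the remap-based warp sketched above); the standard-deviation threshold and the iteration cap are assumed hyperparameters rather than disclosed values.

```python
import numpy as np

def iterative_rectify(second_image, predict_offsets, apply_offsets,
                      std_threshold=1.0, max_iters=5):
    """Sketch of step S3: predict an offset matrix, warp, and repeat until the
    offset field is nearly flat; then sum all recorded offset matrices and warp
    the step-S2 output only once to avoid repeated-resampling blur."""
    recorded, current = [], second_image
    for _ in range(max_iters):
        offsets = predict_offsets(current)           # f_i with shape (H, W, 2)
        if float(np.std(offsets)) < std_threshold:   # already flat enough
            break
        recorded.append(offsets)
        current = apply_offsets(current, offsets)    # intermediate result only
    if not recorded:                                 # flat right after step S2
        return second_image
    f_total = np.sum(recorded, axis=0)               # f_total = f_1 + ... + f_N
    return apply_offsets(second_image, f_total)      # single resampling
```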
In some optional embodiments, in step S1, a deep convolutional neural network is used to obtain the classification confidence of each pixel position, parameters of the network are trained and optimized in advance with synthetic data, and binarization is performed through a threshold of 0.5 during prediction, so as to obtain a final foreground document region mask map. The network parameter optimization specifically comprises the following steps:
(1) data acquisition: using 100000 data samples (one data sample includes an input shot document image and its corresponding foreground document region mask map) in the Doc3D public synthesis data set as training (90000 data samples) and verification data (10000 data samples);
(2) network training:
(2-1) constructing a deep neural network: the DeepLabv3+ segmentation model is used as a network structure, the number of output categories is set to be 1, namely, the number of channels of the output result of the last layer of the network is 1.
(2-2) training mode: training uses a gradient descent algorithm; the gradient is computed from the last layer and back-propagated layer by layer to update all parameters. The loss function during training is the binary cross entropy loss.
(2-3) training parameter settings:
Number of epochs: 50
Optimizer: Adam
Learning rate: 0.0001 (learning rate schedule: halved every 5 epochs)
Weight decay: 0.0005
(2-4) Training of the deep neural network then starts from randomly initialized parameters.
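An illustrative PyTorch training sketch for (2-1) to (2-4) is given below. torchvision ships DeepLabv3 rather than DeepLabv3+, so it is used here only as a stand-in with a single output channel (torchvision 0.13+ API assumed), and the data loader yielding (image, mask) pairs from the Doc3D samples is assumed to exist elsewhere.

```python
import torch
from torch import nn
from torchvision.models.segmentation import deeplabv3_resnet50

# Stand-in for the DeepLabv3+ model of (2-1): one output channel, random init per (2-4).
model = deeplabv3_resnet50(weights=None, weights_backbone=None, num_classes=1)

criterion = nn.BCEWithLogitsLoss()     # binary cross entropy loss of (2-2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

def train_segmenter(loader, epochs=50, device="cuda"):
    """Trains with the (2-3) settings; `loader` yields images (B, 3, H, W) and
    binary masks (B, 1, H, W)."""
    model.to(device).train()
    for _ in range(epochs):
        for images, masks in loader:
            images, masks = images.to(device), masks.to(device)
            logits = model(images)["out"]            # (B, 1, H, W)
            loss = criterion(logits, masks.float())
            optimizer.zero_grad()
            loss.backward()                          # back-propagate layer by layer
            optimizer.step()
        scheduler.step()                             # halve the learning rate every 5 epochs

# At prediction time, threshold the sigmoid output at 0.5 to obtain the mask:
# mask = (torch.sigmoid(model(x)["out"]) > 0.5)
```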
In some optional embodiments, in step S3, a deep convolutional neural network is used to obtain a coordinate offset matrix, and parameters of the network are trained and optimized in advance through synthetic data, which specifically includes:
(1) data acquisition: 100000 data samples (one data sample comprises an input captured document image and its corresponding coordinate offset matrix) from the Doc3D public synthetic data set are used as training data (90000 samples) and verification data (10000 samples); before training, the samples are processed by steps S1 and S2 to remove the environment boundary;
(2) network training:
(2-1) constructing a deep neural network: the network employs an encoder-decoder structure that first down-samples and then up-samples, with skip connections used to preserve detail features and to facilitate gradient back-propagation, as shown in Table 1 below (a PyTorch sketch of this structure is given after the table):
TABLE 1

Network layer | Details | Feature size
Input layer | - | 3*448*448
Convolutional layer | 32 kernels, 3 x 3, stride 1 x 1, with padding | 32*448*448
Non-linear layer | - | 32*448*448
Pooling layer | pooling kernel 2 x 2, stride 2 x 2 | 32*224*224
Convolutional layer | 64 kernels, 3 x 3, stride 1 x 1, with padding | 64*224*224
Non-linear layer | - | 64*224*224
Pooling layer | pooling kernel 2 x 2, stride 2 x 2 | 64*112*112
Convolutional layer | 128 kernels, 3 x 3, stride 1 x 1, with padding | 128*112*112
Non-linear layer | - | 128*112*112
Pooling layer | pooling kernel 2 x 2, stride 2 x 2 | 128*56*56
Convolutional layer | 256 kernels, 3 x 3, stride 1 x 1, with padding | 256*56*56
Non-linear layer | - | 256*56*56
Pooling layer | pooling kernel 2 x 2, stride 2 x 2 | 256*28*28
Convolutional layer | 128 kernels, 3 x 3, stride 1 x 1, with padding | 512*28*28
Non-linear layer | - | 512*28*28
Transposed convolution layer | 256 kernels, 4 x 4, stride 2 x 2, with padding | 256*56*56
Non-linear layer | - | 256*56*56
Skip connection layer | concatenates the corresponding down-sampling feature map along the channel dimension | 512*56*56
Convolutional layer | 256 kernels, 3 x 3, stride 1 x 1, with padding | 256*56*56
Non-linear layer | - | 256*56*56
Transposed convolution layer | 128 kernels, 4 x 4, stride 2 x 2, with padding | 128*112*112
Non-linear layer | - | 128*112*112
Skip connection layer | concatenates the corresponding down-sampling feature map along the channel dimension | 256*112*112
Convolutional layer | 128 kernels, 3 x 3, stride 1 x 1, with padding | 128*112*112
Non-linear layer | - | 128*112*112
Transposed convolution layer | 64 kernels, 4 x 4, stride 2 x 2, with padding | 64*224*224
Non-linear layer | - | 64*224*224
Skip connection layer | concatenates the corresponding down-sampling feature map along the channel dimension | 128*224*224
Convolutional layer | 64 kernels, 3 x 3, stride 1 x 1, with padding | 64*224*224
Non-linear layer | - | 64*224*224
Transposed convolution layer | 32 kernels, 4 x 4, stride 2 x 2, with padding | 32*448*448
Non-linear layer | - | 32*448*448
Skip connection layer | concatenates the corresponding down-sampling feature map along the channel dimension | 64*448*448
Convolutional layer | 32 kernels, 3 x 3, stride 1 x 1, with padding | 32*448*448
Non-linear layer | - | 32*448*448
Convolutional layer | 2 kernels, 3 x 3, stride 1 x 1, with padding | 2*448*448
Non-linear layer | - | 2*448*448
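Read as a whole, Table 1 describes a U-Net-style encoder-decoder; a PyTorch sketch of it follows. The activation type is not specified in the table (ReLU is assumed for the hidden layers, and the final non-linear layer is omitted so that offsets may take negative values), and the bottleneck uses 512 kernels to match the listed 512*28*28 output even though that table row lists 128 kernels.

```python
import torch
from torch import nn

class OffsetNet(nn.Module):
    """Sketch of the Table 1 encoder-decoder with skip connections."""

    def __init__(self):
        super().__init__()
        def down(cin, cout):   # 3x3 conv, stride 1, padded, + assumed ReLU
            return nn.Sequential(nn.Conv2d(cin, cout, 3, 1, 1), nn.ReLU(inplace=True))
        def up(cin, cout):     # 4x4 transposed conv, stride 2, padded, + assumed ReLU
            return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, 2, 1), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(2, 2)
        self.enc1, self.enc2 = down(3, 32), down(32, 64)
        self.enc3, self.enc4 = down(64, 128), down(128, 256)
        self.bottleneck = down(256, 512)                     # assumption: 512 kernels
        self.up4, self.dec4 = up(512, 256), down(512, 256)   # concat doubles channels
        self.up3, self.dec3 = up(256, 128), down(256, 128)
        self.up2, self.dec2 = up(128, 64), down(128, 64)
        self.up1, self.dec1 = up(64, 32), down(64, 32)
        self.head = nn.Conv2d(32, 2, 3, 1, 1)                # 2-channel offset map

    def forward(self, x):                                    # x: (B, 3, 448, 448)
        e1 = self.enc1(x)                                    # 32 x 448 x 448
        e2 = self.enc2(self.pool(e1))                        # 64 x 224 x 224
        e3 = self.enc3(self.pool(e2))                        # 128 x 112 x 112
        e4 = self.enc4(self.pool(e3))                        # 256 x 56 x 56
        b = self.bottleneck(self.pool(e4))                   # 512 x 28 x 28
        d4 = self.dec4(torch.cat([self.up4(b), e4], dim=1))  # 256 x 56 x 56
        d3 = self.dec3(torch.cat([self.up3(d4), e3], dim=1)) # 128 x 112 x 112
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1)) # 64 x 224 x 224
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1)) # 32 x 448 x 448
        return self.head(d1)                                 # 2 x 448 x 448 offsets
```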
(2-2) training mode: training uses a gradient descent algorithm; the gradient is computed from the last layer and back-propagated layer by layer to update all parameters. The loss function during training is the mean squared error loss.
(2-3) training parameter settings:
Number of epochs: 50
Optimizer: Adam
Learning rate: 0.0001 (learning rate schedule: halved every 5 epochs)
Weight decay: 0.0005
(2-4) Training of the deep neural network then starts from randomly initialized parameters.
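A matching training sketch for the offset-regression network, using the mean squared error loss and the (2-3) settings; the model (for instance the OffsetNet sketch above) and a loader yielding (image, coordinate offset matrix) pairs are passed in as assumptions about how the pieces fit together, not as disclosed code.

```python
import torch
from torch import nn

def train_offset_net(model: nn.Module, loader, epochs=50, device="cuda"):
    """Adam, lr 1e-4 halved every 5 epochs, weight decay 5e-4, MSE loss.
    `loader` is assumed to yield (image, offset_target) pairs with offsets
    of shape (B, 2, 448, 448)."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
    model.to(device).train()
    for _ in range(epochs):
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            loss = criterion(model(images), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
```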
As shown in FIG. 4, the method provided by this embodiment can process captured document images under various environment boundary conditions and achieves a good correction effect. In summary, the method can handle captured document images with different environment boundary regions, including cases with a small environment boundary region, a large environment boundary region, or no environment boundary region. Meanwhile, for document images with different degrees of geometric deformation, the method adaptively determines the number of iterations and obtains a good correction result.
The present embodiment also provides a document image geometric correction system, including:
the pixel classification module is used for acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image and acquiring a mask map of the foreground document area;
the preliminary correction module is used for extracting control points on the mask image, preliminarily correcting the first document image according to the control points, deleting the environment boundary and obtaining a second document image with preliminarily corrected and deleted environment boundary;
and the offset correction module is used for acquiring a first coordinate offset matrix of the second document image, and acquiring a corrected third document image after offsetting the second document image according to the first coordinate offset matrix.
The document image geometric correction system of the embodiment can execute the document image geometric correction method provided by the embodiment of the method of the invention, can execute any combination implementation steps of the embodiment of the method, and has corresponding functions and beneficial effects of the method.
The present embodiment also provides a document image geometry correction apparatus, including:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the method shown in fig. 5.
The document image geometric correction device of this embodiment can execute the document image geometric correction method provided by the method embodiment of the invention, can execute any combination of the implementation steps of the method embodiment, and has the corresponding functions and beneficial effects of the method.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor, to cause the computer device to perform the method illustrated in fig. 5.
The embodiment also provides a storage medium, which stores an instruction or a program capable of executing the document image geometric correction method provided by the embodiment of the method of the invention, and when the instruction or the program is executed, the method can be executed by any combination of the embodiment of the method, and the method has corresponding functions and beneficial effects.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A geometrical correction method for a document image is characterized by comprising the following steps:
acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image, and acquiring a mask image of the foreground document area;
extracting control points on the mask image, performing primary correction on the first document image according to the control points, deleting an environment boundary, and obtaining a second document image with the primary correction and the deletion of the environment boundary;
and acquiring a first coordinate offset matrix of the second document image, and offsetting the second document image according to the first coordinate offset matrix to obtain a corrected third document image.
2. The method for geometrically correcting the document image according to claim 1, wherein the obtaining the corrected third document image comprises:
judging whether an iteration step is executed or not according to the first coordinate offset matrix, and if the iteration step is not required to be executed, taking the third document image as an output image; otherwise, executing an iteration step;
the step of iterating comprises:
acquiring a second coordinate offset matrix of the third document image, updating the corrected image into the third document image after offsetting the third document image according to the second coordinate offset matrix, and recording the second coordinate offset matrix in the iteration step;
judging whether to continue to execute the iteration step according to the second coordinate offset matrix, if so, returning to execute the previous step; and on the contrary, the second document image is shifted according to the first coordinate shift matrix and all second coordinate shift matrices in the record, and a corrected image is obtained and used as an output image.
3. The method for geometrically correcting the document image according to claim 1, wherein the classifying the pixels in the first document image comprises:
obtaining the classification confidence of each pixel position in the first document image by adopting a first deep convolutional neural network, and classifying according to the classification confidence;
the obtaining of the first coordinate offset matrix of the second document image includes:
and acquiring a first coordinate offset matrix of the second document image by adopting a second deep convolution neural network.
4. The method according to claim 1, wherein the extracting control points on the mask map, performing preliminary correction on the first document image according to the control points, deleting the environmental boundary, and obtaining a second document image with the preliminary correction and the environmental boundary deleted comprises:
extracting four corner points of the document on a mask image of the foreground document area by using a polygon fitting algorithm;
drawing a vertical bisector on a line segment formed by taking adjacent corner points as end points according to a preset bisection proportion, and taking the intersection point of the bisector and the mask image boundary of the foreground document region as a bisection point of the document boundary;
drawing a quadrilateral mask map from the four corner points, and calculating an intersection-over-union ratio between the quadrilateral mask map and the mask map of the foreground document area;
if the intersection-over-union ratio is smaller than a first preset threshold value, performing no correction and taking the first document image as the second document image;
and if the intersection-over-union ratio is larger than the first preset threshold value, taking the four corner points and a plurality of bisection points of the boundary as control points, preliminarily correcting the first document image by using a thin plate spline interpolation algorithm, deleting the environmental boundary, and obtaining a preliminarily corrected second document image with the environmental boundary removed.
5. The method for geometrically correcting the document image according to claim 1, wherein the obtaining a corrected third document image after the shifting of the second document image according to the first coordinate shifting matrix comprises:
the first coordinate offset matrix designates a two-dimensional offset vector for each pixel point position in the second document image, and each pixel is offset according to the corresponding offset vector to obtain a corrected third document image;
the offset vector is used for representing the offset direction and distance on the two-dimensional plane.
6. The method for geometrically correcting a document image according to claim 2, wherein said determining whether to continue the iteration step according to the second coordinate offset matrix comprises:
calculating a standard deviation of the second coordinate offset matrix;
if the standard deviation of the second coordinate offset matrix is larger than a second preset threshold value, continuing to execute the iteration step;
and if the standard deviation of the second coordinate offset matrix is smaller than a second preset threshold value, stopping executing the iteration step.
7. The method for geometrically correcting the document image according to claim 2, wherein the shifting the second document image according to the first coordinate shift matrix and all the second coordinate shift matrices in the record to obtain a corrected image as an output image comprises:
summing the first coordinate offset matrix f_1 and the coordinate offset matrices f_2, ..., f_N recorded in the iteration step to obtain a final coordinate offset matrix f_total = f_1 + f_2 + ... + f_N;
and shifting the second document image according to the final coordinate offset matrix f_total to obtain the corrected image as the output image.
8. A document image geometry correction system, comprising:
the pixel classification module is used for acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image and acquiring a mask map of the foreground document area;
the preliminary correction module is used for extracting control points on the mask image, preliminarily correcting the first document image according to the control points, deleting the environment boundary and obtaining a second document image with preliminarily corrected and deleted environment boundary;
and the offset correction module is used for acquiring a first coordinate offset matrix of the second document image, and acquiring a corrected third document image after offsetting the second document image according to the first coordinate offset matrix.
9. A document image geometry correction apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, in which a program executable by a processor is stored, wherein the program executable by the processor is adapted to perform the method according to any one of claims 1 to 7 when executed by the processor.
CN202111584077.4A (priority date 2021-12-22; filing date 2021-12-22) Document image geometric correction method, system, device and medium; status: Active; granted as CN114418869B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111584077.4A CN114418869B (en) 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111584077.4A CN114418869B (en) 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium

Publications (2)

Publication Number Publication Date
CN114418869A true CN114418869A (en) 2022-04-29
CN114418869B CN114418869B (en) 2024-08-13

Family

ID=81267830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111584077.4A Active CN114418869B (en) 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium

Country Status (1)

Country Link
CN (1) CN114418869B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187995A (en) * 2022-07-08 2022-10-14 北京百度网讯科技有限公司 Document correction method, device, electronic equipment and storage medium
CN116030120A (en) * 2022-09-09 2023-04-28 北京市计算中心有限公司 Method for identifying and correcting hexagons
CN117853382A (en) * 2024-03-04 2024-04-09 武汉人工智能研究院 Sparse marker-based image correction method, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190294970A1 (en) * 2018-03-23 2019-09-26 The Governing Council Of The University Of Toronto Systems and methods for polygon object annotation and a method of training an object annotation system
CN111401371A (en) * 2020-06-03 2020-07-10 中邮消费金融有限公司 Text detection and identification method and system and computer equipment
CN111414915A (en) * 2020-02-21 2020-07-14 华为技术有限公司 Character recognition method and related equipment
CN112766266A (en) * 2021-01-29 2021-05-07 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
CN112767270A (en) * 2021-01-19 2021-05-07 中国科学技术大学 Fold document image correction system
KR20210112992A (en) * 2020-03-06 2021-09-15 주식회사 테스트웍스 System and method of quality adjustment of object detection based on polyggon

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190294970A1 (en) * 2018-03-23 2019-09-26 The Governing Council Of The University Of Toronto Systems and methods for polygon object annotation and a method of training an object annotation system
CN111414915A (en) * 2020-02-21 2020-07-14 华为技术有限公司 Character recognition method and related equipment
KR20210112992A (en) * 2020-03-06 2021-09-15 주식회사 테스트웍스 System and method of quality adjustment of object detection based on polyggon
CN111401371A (en) * 2020-06-03 2020-07-10 中邮消费金融有限公司 Text detection and identification method and system and computer equipment
CN112767270A (en) * 2021-01-19 2021-05-07 中国科学技术大学 Fold document image correction system
CN112766266A (en) * 2021-01-29 2021-05-07 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiaxin Zhang et al., "Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild", arXiv:2207.11515v1, 23 July 2022 (2022-07-23), pages 1-11 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187995A (en) * 2022-07-08 2022-10-14 北京百度网讯科技有限公司 Document correction method, device, electronic equipment and storage medium
CN116030120A (en) * 2022-09-09 2023-04-28 北京市计算中心有限公司 Method for identifying and correcting hexagons
CN116030120B (en) * 2022-09-09 2023-11-24 北京市计算中心有限公司 Method for identifying and correcting hexagons
CN117853382A (en) * 2024-03-04 2024-04-09 武汉人工智能研究院 Sparse marker-based image correction method, device and storage medium
CN117853382B (en) * 2024-03-04 2024-05-28 武汉人工智能研究院 Sparse marker-based image correction method, device and storage medium

Also Published As

Publication number Publication date
CN114418869B (en) 2024-08-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant