CN116403226A - Unconstrained fold document image correction method, system, equipment and storage medium - Google Patents
- Publication number
- CN116403226A (application CN202310392392.XA)
- Authority
- CN
- China
- Prior art keywords
- document image
- unconstrained
- fold
- mapping matrix
- coordinate mapping
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/43—Editing text-bitmaps, e.g. alignment, spacing; Semantic analysis of bitmaps of text without OCR
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/16—Image preprocessing
- G06V30/1607—Correcting image deformation, e.g. trapezoidal deformation caused by perspective
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19147—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a method, a system, equipment and a storage medium for correcting unconstrained folded document images, all of which implement the same scheme. The scheme addresses the limited application scenarios of existing approaches, which cannot correct deformed document images that contain no document boundary or only part of the document boundary. The invention also improves the correction and recovery of images that do contain the complete document boundary. Compared with traditional methods, the method places no constraint on the form of the input folded document image, corrects the various deformed document images captured in daily life more robustly and accurately, and can be widely applied to camera-equipped portable devices such as smartphones. The invention therefore promotes the digitization of document images and provides technical support for converting paper documents to digital form.
Description
Technical Field
The invention relates to the technical field of fold document image correction, in particular to an unconstrained fold document image correction method, an unconstrained fold document image correction system, unconstrained fold document image correction equipment and a storage medium.
Background
With the rapid progress and popularity of portable cameras and smartphones, more and more people use them to photograph paper documents instead of scanning the documents with a dedicated flatbed scanner as in the past. However, because of uncertain factors in the shooting environment, such as the camera position, the illumination, and the type and degree of paper deformation, document images captured by these devices tend to be distorted and deformed to varying degrees. This makes downstream tasks such as automated text recognition, content analysis, editing and understanding more difficult, and it hinders the propagation and communication of information and knowledge in daily life. Correcting folded document images has therefore become an important research topic in computer vision.
Traditional solutions are mainly based on 3D reconstruction techniques. These methods typically rely on additional hardware devices (e.g., laser scanners, depth cameras, etc.) or by capturing multi-view images around the pleated paper to reconstruct the three-dimensional structure of the paper and performing flattening correction based thereon. However, the popularization and use of these techniques are greatly limited due to high hardware costs or cumbersome shooting requirements.
Currently, many smartphones have built-in document correction algorithms. These algorithms are mostly based on projective transformation: first, the four straight edges or four corner points of the paper document are detected in the captured image to form the quadrilateral region where the document is located; then a projective transformation maps that region to a regular rectangular image, which completes the correction. However, this solution requires that the complete document appear in the captured image, and it cannot rectify the image if the document itself is deformed. The first requirement is also inconvenient, because the user is often interested in only part of the document.
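The four-corner projective correction described above can be sketched as follows. This is a minimal NumPy illustration, not the algorithm of any particular smartphone: the function names `homography_from_corners` and `apply_homography`, the corner coordinates, and the 200x280 target rectangle are all assumptions made for the example.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 projective transform H with dst ~ H @ src (direct linear transform)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    a = np.asarray(rows, dtype=np.float64)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(a)
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def apply_homography(h, pts):
    """Map an (N, 2) array of points through H, including the perspective divide."""
    pts = np.asarray(pts, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical detected document corners (x, y) and the target rectangle.
corners = [(12.0, 20.0), (205.0, 8.0), (230.0, 290.0), (5.0, 300.0)]
rect = [(0.0, 0.0), (200.0, 0.0), (200.0, 280.0), (0.0, 280.0)]
H = homography_from_corners(corners, rect)
mapped = apply_homography(H, corners)
```

A single 3x3 matrix can only describe a planar sheet seen in perspective, which is why this approach fails once the paper is folded or a corner lies outside the frame.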
In recent years, deep learning has been introduced into the correction of wrinkled document images. Compared with the traditional methods, deep-learning-based methods achieve similar performance at a lower computational cost. By training on a large number of deformed and deformation-free image pairs synthesized by a rendering engine, the neural network learns to correct document wrinkles. In the inference stage, a single wrinkled RGB document image is input, the neural network outputs a pixel-by-pixel coordinate mapping matrix, and the pixels of the wrinkled document area in the input image are sampled into an empty image according to that matrix to obtain the complete corrected image.
In general, whether a document correction algorithm is built in a smart phone or an existing deep learning method, the method mainly has the following defects:
(1) Current document image correction algorithms based on deep learning generally correct only a document image with complete boundaries, i.e. the input image must contain a complete document. However, in a practical application scenario, the user may only want to pay attention to or share a partial region or text in the document. Therefore, there may be a case where the document boundary is missing in the photographed image. In addition, there is often a case where an edge portion of a document image photographed by a mobile phone is missing. In this case, the existing document image correction method will fail, and a normal correction result cannot be obtained. The current technical solution lacks effective research for correcting the document image without document boundaries or only including partial document boundaries, and needs further exploration and improvement.
(2) The current smart phone has limited applicable scenarios for built-in document image correction algorithms. These algorithms are only applicable to complete, deformation-free document images, i.e. the paper document is free of folds, bends and wrinkles and appears completely in the captured image. In short, these algorithms simply switch the image projection plane of the paper document to a regular rectangular shape, and once the shape of the paper document is not a regular quadrilateral, these algorithms cannot normally complete document image correction.
(3) Document images corrected by the existing deep-learning-based algorithms still show a certain degree of distortion. This is because these algorithms consider only document images with complete boundaries when training the model and ignore document images that have no document boundary or only part of one. Incorporating the latter into training can effectively improve the accuracy and robustness of the model: it improves generalization and lets the model learn to correct an image from cues available inside the image alone, such as deformed text lines.
In view of this, the present invention has been made.
Disclosure of Invention
The invention aims to provide a method, a system, equipment and a storage medium for correcting an unconstrained fold document image, which can correct a deformed document image without document boundaries or only including partial document boundaries and can also promote the correction effect of a complete document boundary image. In summary, the invention can effectively correct and recover various deformed document images regardless of the constraint of the integrity of the document boundary and the deformation degree of the input folded document image, and can effectively improve the practicability and the practical application effect of document image correction.
The invention aims at realizing the following technical scheme:
an unconstrained pleated document image correction method comprising:
modeling pixel mapping relation from the fold document image to the deformation-free document image, and generating sample pairs, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
constructing an unconstrained document image correction network, and training it with a training data set formed from a plurality of sample pairs;
and inputting the unconstrained fold document image into a trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and correcting the unconstrained fold document image by using the predicted coordinate mapping matrix to obtain a corrected image.
An unconstrained pleated document image correction system comprising:
the pixel mapping relation modeling and sample pair generating unit is used for modeling the pixel mapping relation from the fold document image to the deformation-free document image and generating sample pairs, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
the network construction and training unit is used for constructing an unconstrained document image correction network and training it with a training data set formed from a plurality of sample pairs;
and the image correction unit is used for inputting the unconstrained fold document image into a trained unconstrained document image correction network to obtain a prediction coordinate mapping matrix, and correcting the unconstrained fold document image by using the prediction coordinate mapping matrix to obtain a corrected image.
A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned methods.
A readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
The technical scheme provided by the invention can solve the problem that the application scene of the existing scheme is limited, namely the deformed document image without document boundaries or only containing partial document boundaries can not be corrected. Meanwhile, the invention also improves the correction and recovery effects on the image with the complete document boundary. Compared with the traditional method, the method has no constraint on any form of the input fold document image, can more robustly and accurately correct various deformation document images shot in daily life, can be widely applied to portable equipment with cameras such as smart phones, and has wider application scenes and higher accuracy. Therefore, the invention greatly promotes the popularization of the digitization of the document image and provides powerful technical support for the digitization conversion of the paper document.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an unconstrained fold document image correction method provided by an embodiment of the present invention;
FIG. 2 is a schematic modeling diagram of a pixel mapping relationship between an input deformed document image and an output non-deformed document according to an embodiment of the present invention;
FIG. 3 is a flowchart for implementing deformation image correction based on an unconstrained document image correction network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an unconstrained pleated document image correction system according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a processing apparatus according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The terms that may be used herein will first be described as follows:
the terms "comprises," "comprising," "includes," "including," "has," "having" or other similar referents are to be construed to cover a non-exclusive inclusion. For example: including a particular feature (e.g., a starting material, component, ingredient, carrier, formulation, material, dimension, part, means, mechanism, apparatus, step, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product or article of manufacture, etc.), should be construed as including not only a particular feature but also other features known in the art that are not explicitly recited.
The term "consisting of … …" is meant to exclude any technical feature element not explicitly listed. If such term is used in a claim, the term will cause the claim to be closed, such that it does not include technical features other than those specifically listed, except for conventional impurities associated therewith. If the term is intended to appear in only a clause of a claim, it is intended to limit only the elements explicitly recited in that clause, and the elements recited in other clauses are not excluded from the overall claim.
The following describes in detail a method, a system, a device and a storage medium for correcting an unconstrained folded document image. What is not described in detail in the embodiments of the present invention belongs to the prior art known to those skilled in the art. Where specific conditions are not noted in the embodiments, the conditions conventional in the art or suggested by the manufacturer apply.
Example 1
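The embodiment of the invention provides a method for correcting an unconstrained folded document image, which mainly comprises the following steps as shown in FIG. 1:
Step 1, modeling the pixel mapping relation from the folded document image to the deformation-free document image and generating sample pairs, wherein each sample pair comprises an unconstrained folded document image block and a coordinate mapping matrix from that block to the deformation-free document image block.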
As shown in fig. 2, the preferred embodiment of this step is as follows:
(1) Global correction. A folded document image with a complete boundary is acquired and corrected into a deformation-free document image by using its corresponding coordinate mapping matrix.
In the embodiment of the invention, the folded document images with complete boundaries and their coordinate mapping matrices all come from existing public data sets. The coordinate mapping matrix describes the coordinate mapping between the pixels of the folded document image and the corresponding deformation-free document image; that is, for each pixel of the deformation-free document image it gives the position of that pixel in the folded document image.
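The semantics of the coordinate mapping matrix can be illustrated with a toy case. The sketch below assumes a layout that the text does not specify: an array of shape (H, W, 2) in which entry [y, x] stores the (column, row) position in the folded image. The name `rectify_nearest` and the use of nearest-neighbour lookup are choices made for the illustration only.

```python
import numpy as np

def rectify_nearest(warped, backward_map):
    """backward_map[y, x] = (col, row) in `warped` to copy into output pixel (y, x)."""
    cols = np.clip(np.rint(backward_map[..., 0]).astype(int), 0, warped.shape[1] - 1)
    rows = np.clip(np.rint(backward_map[..., 1]).astype(int), 0, warped.shape[0] - 1)
    return warped[rows, cols]

# Toy case: the "deformation" is a horizontal flip of a 3x4 image.
flat = np.arange(12).reshape(3, 4)
warped = flat[:, ::-1]
ys, xs = np.mgrid[0:3, 0:4]
backward_map = np.stack([3 - xs, ys], axis=-1).astype(np.float32)
restored = rectify_nearest(warped, backward_map)
```

Because the matrix is indexed by output (deformation-free) pixels, every output pixel receives exactly one value and no holes appear in the corrected image.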
(2) Modeling the local coordinate mapping relation. An image block of one area is randomly cropped from the folded document image with the complete boundary; this is the unconstrained folded document image block. The corresponding area in the deformation-free document image, namely the deformation-free document image block, is found according to the coordinate mapping matrix, and the part of the coordinate mapping matrix covering that area is cropped out; this is the coordinate mapping matrix from the unconstrained folded document image block to the deformation-free document image block.
As shown in FIG. 2, the dotted frame at the lower left is the randomly cropped area, i.e., the unconstrained folded document image block, and the dotted frame at the lower right is the corresponding deformation-free document image block. Because the area is cropped at random, the resulting block may contain no document boundary, only part of the document boundary, or the complete document boundary; it is therefore called an unconstrained folded document image block.
In the embodiment of the invention, for each folded document image with a complete boundary, modeling the pixel mapping relation from the folded document image to the deformation-free document image yields an unconstrained folded document image block and the coordinate mapping matrix from that block to the deformation-free document image block, and the two form a sample pair.
In the embodiment of the invention, a training data set can be formed through a plurality of sample pairs; for each fold document image with a complete boundary, performing one or more times of local coordinate mapping relation modeling after performing global correction to obtain one or more sample pairs; of course, modeling as shown in fig. 2 may also be performed on a plurality of folded document images having complete boundaries to obtain corresponding sample pairs. The specific number of sample pairs may be set according to actual conditions or experience.
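The local modeling in (2) can be sketched in NumPy as follows. This is one possible reading rather than the patented procedure: it assumes the same (H, W, 2) map layout with (column, row) entries, takes the bounding box of the matching deformation-free pixels as the "corresponding area", and re-expresses the cropped map relative to the block's top-left corner. The name `make_sample_pair` and the crop size of 64 are hypothetical.

```python
import numpy as np

def make_sample_pair(warped, backward_map, rng, crop=64):
    """Return (warped_block, local_map), or None if the crop holds no document pixels."""
    h, w = warped.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    block = warped[top:top + crop, left:left + crop]

    # Deformation-free pixels whose source position lies inside the random crop.
    sx, sy = backward_map[..., 0], backward_map[..., 1]
    inside = (sx >= left) & (sx <= left + crop - 1) & (sy >= top) & (sy <= top + crop - 1)
    if not inside.any():
        return None
    rows, cols = np.nonzero(inside)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1

    # Cut the same region out of the map and shift its coordinates so that
    # they are relative to the block's own top-left corner.
    offset = np.array([left, top], dtype=np.float32)
    local_map = backward_map[r0:r1, c0:c1].astype(np.float32) - offset
    return block, local_map

# Toy data: a 100x100 flat document placed at offset (10, 14) in a 128x128 photo.
rng = np.random.default_rng(0)
warped = rng.random((128, 128))
ys, xs = np.mgrid[0:100, 0:100]
backward_map = np.stack([xs + 10, ys + 14], axis=-1).astype(np.float32)
pair = make_sample_pair(warped, backward_map, rng)
```

For a genuinely folded page the bounding box can include pixels whose source lies outside the block, so a real pipeline would probably need a validity mask or a tighter region; that detail is not covered by the text.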
It should be noted that FIG. 2 mainly illustrates the principle of modeling the pixel mapping relation from the folded document image to the deformation-free document image. For privacy reasons the text in the document image has been blurred; this does not affect the implementation of the scheme, and in practical application no particular sharpness of the document image is required.
Step 2, constructing an unconstrained document image correction network, and training it with a training data set formed from a plurality of sample pairs.
In the embodiment of the invention, the document image correction network may be a full convolution neural network, such as UNet network, and mainly comprises a feature extractor and a feature decoder.
During training, the unconstrained folded document image block of a sample pair is input, the feature extractor extracts features, and the feature decoder outputs a predicted coordinate mapping matrix. The coordinate mapping matrix from the unconstrained folded document image block to the deformation-free document image block in the sample pair serves as supervision information, and a loss function built between it and the predicted coordinate mapping matrix is used to train the unconstrained document image correction network.
The training process can be implemented with reference to conventional techniques and is not described in detail here. Training stops when a set stopping condition is satisfied, for example when the number of training iterations reaches a set value or the loss function converges.
Step 3, inputting the unconstrained folded document image into the trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and correcting the unconstrained folded document image by using the predicted coordinate mapping matrix to obtain a corrected image.
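The supervision and stopping condition can be sketched as below. The text does not name the loss, so the mean absolute error used here is an assumption, as are the function names `map_l1_loss` and `should_stop` and the default values of `max_steps`, `window` and `tol`.

```python
import numpy as np

def map_l1_loss(pred_map, gt_map):
    """Mean absolute error between predicted and ground-truth coordinate maps."""
    return float(np.mean(np.abs(pred_map - gt_map)))

def should_stop(step, losses, max_steps=100_000, window=100, tol=1e-4):
    """Stop on an iteration budget or when the loss has stopped improving."""
    if step >= max_steps:
        return True
    if len(losses) < 2 * window:
        return False
    recent = np.mean(losses[-window:])
    previous = np.mean(losses[-2 * window:-window])
    return bool(previous - recent < tol)

# Example: a prediction that is off by exactly one pixel everywhere.
gt = np.zeros((8, 8, 2), dtype=np.float32)
pred = np.ones((8, 8, 2), dtype=np.float32)
loss_value = map_l1_loss(pred, gt)
```

In an actual training loop the same quantities would be computed on framework tensors so that gradients can flow back into the feature extractor and decoder.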
In the embodiment of the invention, the unconstrained folded document image can be a deformed image I_d with any form of fold. As shown in FIG. 3, it may be a folded document image with a complete boundary as in part (a), a folded document image without document boundaries as in part (b), or a folded document image with incomplete document boundaries as in part (c). The trained unconstrained document image correction network performs feature extraction and feature decoding and outputs a predicted coordinate mapping matrix f_b; an up-sampling algorithm (e.g., a bilinear interpolation algorithm) then corrects the unconstrained folded document image according to f_b to obtain the corrected image I_r.
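Bilinear sampling of the deformed image through a predicted map can be sketched in NumPy as follows. The (H, W, 2) layout with (column, row) entries and the name `remap_bilinear` are assumptions for the illustration; the toy image is a linear ramp, for which bilinear interpolation is exact.

```python
import numpy as np

def remap_bilinear(image, backward_map):
    """Sample `image` at the fractional (col, row) positions stored in backward_map."""
    h, w = image.shape[:2]
    x = np.clip(backward_map[..., 0], 0, w - 1)
    y = np.clip(backward_map[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    if image.ndim == 3:
        # Broadcast the interpolation weights over the channel axis.
        wx = wx[..., None]
        wy = wy[..., None]
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

# Toy deformed image: a linear ramp with value x + 10 * y at pixel (y, x).
iy, ix = np.mgrid[0:5, 0:6]
deformed = (ix + 10 * iy).astype(np.float64)
oy, ox = np.mgrid[0:4, 0:5]
pred_map = np.stack([ox + 0.5, oy + 0.25], axis=-1)
rectified = remap_bilinear(deformed, pred_map)
```

If the network predicts the map at a lower resolution than the input photograph, the map itself could first be resized to the output size and its coordinates rescaled accordingly; whether the embodiment does this is not stated.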
The scheme provided by the embodiment of the invention can solve the problem that the application scene of the existing scheme is limited, namely, the deformed document image without document boundaries or only containing partial document boundaries can not be corrected. Meanwhile, the invention also improves the correction and recovery effects on the image with the complete document boundary. Compared with the traditional method, the method has no constraint on any form of the input fold document image, can more robustly and accurately correct various deformation document images shot in daily life, can be widely applied to portable equipment with cameras such as smart phones, and has wider application scenes and higher accuracy. Therefore, the invention greatly promotes the popularization of the digitization of the document image and provides powerful technical support for the digitization conversion of the paper document.
Example two
The embodiment of the invention provides an unconstrained fold document image correction system, as shown in fig. 4, which mainly comprises:
the pixel mapping relation modeling and sample pair generating unit is used for modeling the pixel mapping relation from the fold document image to the deformation-free document image and generating sample pairs, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
the network construction and training unit is used for constructing an unconstrained document image correction network and training it with a training data set formed from a plurality of sample pairs;
and the image correction unit is used for inputting the unconstrained fold document image into a trained unconstrained document image correction network to obtain a prediction coordinate mapping matrix, and correcting the unconstrained fold document image by using the prediction coordinate mapping matrix to obtain a corrected image.
In the embodiment of the present invention, the modeling the pixel mapping relationship from the folded document image to the deformation-free document image, and generating the sample pair includes:
acquiring a fold document image with a complete boundary, and correcting the fold document image with the complete boundary into a deformation-free document image by using a corresponding coordinate mapping matrix;
randomly intercepting an image block of an area in the folded document image with the complete boundary, namely an unconstrained folded document image block, finding a corresponding area in the undeformed document image according to a coordinate mapping matrix of the area, namely an undeformed document image block, intercepting a matrix of the area in the coordinate mapping matrix, namely a coordinate mapping matrix from the unconstrained folded document image block to the undeformed document image block;
the obtained unconstrained folded document image block and the coordinate mapping matrix from that block to the deformation-free document image block form a sample pair.
In the embodiment of the invention, constructing the unconstrained document image correction network and training it with a training data set formed from a plurality of sample pairs comprises the following steps:
constructing an unconstrained document image rectification network comprising a feature extractor and a feature decoder;
during training, the unconstrained folded document image block of a sample pair is input, the feature extractor extracts features, the feature decoder outputs a predicted coordinate mapping matrix, the coordinate mapping matrix from the unconstrained folded document image block to the deformation-free document image block in the sample pair serves as supervision information, and a loss function built between it and the predicted coordinate mapping matrix is used to train the unconstrained document image correction network.
In the embodiment of the present invention, the correcting the unconstrained pleated document image by using the prediction coordinate mapping matrix, to obtain a corrected image includes:
and correcting the unconstrained fold document image through a predictive coordinate mapping matrix by utilizing an up-sampling algorithm to obtain a corrected image.
Example III
The present invention also provides a processing apparatus, as shown in fig. 5, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.
Further, the processing device further comprises at least one input device and at least one output device; in the processing device, the processor, the memory, the input device and the output device are connected through buses.
In the embodiment of the invention, the specific types of the memory, the input device and the output device are not limited; for example:
the input device can be a touch screen, an image acquisition device, a smart phone, a physical key or a mouse and the like;
the output device may be a display terminal;
the memory may be random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as disk memory.
Example IV
The invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.
The readable storage medium according to the embodiment of the present invention may be provided as a computer readable storage medium in the aforementioned processing apparatus, for example, as a memory in the processing apparatus. The readable storage medium may be any of various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.
Claims (10)
1. An unconstrained fold document image correction method, comprising:
modeling pixel mapping relation from the fold document image to the deformation-free document image, and generating sample pairs, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
constructing an unconstrained document image correction network, and training it with a training data set formed from a plurality of sample pairs;
and inputting the unconstrained fold document image into a trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and correcting the unconstrained fold document image by using the predicted coordinate mapping matrix to obtain a corrected image.
2. The method of claim 1, wherein modeling the pixel mapping relationship of the folded document image to the deformation-free document image, generating the sample pair comprises:
acquiring a fold document image with a complete boundary, and correcting the fold document image with the complete boundary into a deformation-free document image by using a corresponding coordinate mapping matrix;
randomly cropping an image block of a region from the fold document image with the complete boundary, namely an unconstrained fold document image block; finding the corresponding region in the deformation-free document image according to the coordinate mapping matrix of that region, namely a deformation-free document image block; and cropping the matrix of that region from the coordinate mapping matrix, namely the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
the unconstrained fold document image block and the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block form a sample pair.
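The sample-pair generation described in this claim can be sketched as below. This is an illustrative sketch under assumptions, not the applicant's code. The coordinate mapping matrix is assumed to have the same height and width as the fold image and to store, for each fold pixel, its `(x, y)` position in the deformation-free image. The corresponding flat region is approximated by the bounding box of the mapped coordinates. The names `make_sample_pair`, `patch_h`, `patch_w`, and `flat_region` are hypothetical.

```python
import random


def make_sample_pair(fold_image, mapping, patch_h, patch_w, rng=random):
    """Randomly crop a fold-image block and the matching mapping-matrix block.

    fold_image: list of rows of pixel values (fold image with complete boundary).
    mapping:    same size; mapping[r][c] = (x, y) of that pixel in the
                deformation-free image (assumed convention).
    """
    h, w = len(fold_image), len(fold_image[0])
    # Pick the top-left corner so that the block stays inside the image.
    top = rng.randint(0, h - patch_h)
    left = rng.randint(0, w - patch_w)
    patch = [row[left:left + patch_w] for row in fold_image[top:top + patch_h]]
    patch_map = [row[left:left + patch_w] for row in mapping[top:top + patch_h]]
    # Bounding box of the corresponding region in the deformation-free image.
    xs = [x for row in patch_map for (x, _) in row]
    ys = [y for row in patch_map for (_, y) in row]
    flat_region = (min(xs), min(ys), max(xs), max(ys))
    return patch, patch_map, flat_region
```

The pair `(patch, patch_map)` corresponds to the sample pair of the claim; `flat_region` is returned only to show how the deformation-free block could be located.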
3. The method of claim 1, wherein constructing the unconstrained document image correction network and training it with the training data set formed from the plurality of sample pairs comprises:
constructing an unconstrained document image rectification network comprising a feature extractor and a feature decoder;
during training, the unconstrained fold document image block in a sample pair is input, feature extraction is carried out by the feature extractor, and a predicted coordinate mapping matrix is output by the feature decoder; the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block in the sample pair is used as supervision information, and a loss function is constructed between it and the predicted coordinate mapping matrix to train the unconstrained document image correction network.
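The supervision described here can be illustrated with a simple loss. The claim does not state the form of the loss function, so the mean absolute error (L1) below is an assumption chosen for illustration, and the function name `l1_mapping_loss` is hypothetical. Each matrix is represented as a list of rows of `(x, y)` coordinate pairs.

```python
def l1_mapping_loss(predicted, target):
    """Mean absolute error between two coordinate mapping matrices.

    predicted, target: lists of rows, each entry an (x, y) coordinate pair.
    The L1 form is an assumed choice; the source only says a loss is built.
    """
    total, count = 0.0, 0
    for pred_row, tgt_row in zip(predicted, target):
        for (px, py), (tx, ty) in zip(pred_row, tgt_row):
            total += abs(px - tx) + abs(py - ty)
            count += 2  # two coordinates per entry
    return total / count
```

In a real training loop this scalar would be computed by a differentiable framework and back-propagated through the feature decoder and feature extractor.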
4. The method of claim 1, wherein correcting the unconstrained fold document image by using the predicted coordinate mapping matrix comprises:
correcting the unconstrained fold document image through the predicted coordinate mapping matrix by using an up-sampling algorithm to obtain the corrected image.
5. An unconstrained fold document image correction system, comprising:
the pixel mapping relation modeling and sample pair generating unit is used for modeling the pixel mapping relation from the fold document image to the deformation-free document image and generating sample pairs, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
the network construction and training unit is used for constructing an unconstrained document image correction network and training it with a training data set formed from a plurality of sample pairs;
and the image correction unit is used for inputting the unconstrained fold document image into a trained unconstrained document image correction network to obtain a prediction coordinate mapping matrix, and correcting the unconstrained fold document image by using the prediction coordinate mapping matrix to obtain a corrected image.
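The three units of the system can be pictured as a small pipeline. The skeleton below is a hypothetical sketch: the class name `CorrectionSystem`, its field names, and the callable signatures are assumptions introduced for illustration, and each unit is left as a pluggable function instead of a concrete implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class CorrectionSystem:
    # Unit 1: models the pixel mapping and produces (image block, mapping) pairs.
    generate_pairs: Callable[[], List[Tuple[Any, Any]]]
    # Unit 2: builds the network, trains it on the pairs, returns a predictor.
    build_and_train: Callable[[List[Tuple[Any, Any]]], Callable[[Any], Any]]
    # Unit 3 (second half): applies a predicted mapping matrix to an image.
    rectify: Callable[[Any, Any], Any]

    def run(self, fold_image: Any) -> Any:
        """Generate pairs, train, predict the mapping, and correct the image."""
        pairs = self.generate_pairs()
        network = self.build_and_train(pairs)
        predicted_map = network(fold_image)
        return self.rectify(fold_image, predicted_map)
```

In practice the training would run once and the trained network would be reused across images; `run` chains all three units only to make the data flow explicit.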
6. The unconstrained folded document image rectification system of claim 5, wherein said modeling pixel mappings of folded document images to non-deformed document images, generating pairs of samples comprises:
acquiring a fold document image with a complete boundary, and correcting the fold document image with the complete boundary into a deformation-free document image by using a corresponding coordinate mapping matrix;
randomly cropping an image block of a region from the fold document image with the complete boundary, namely an unconstrained fold document image block; finding the corresponding region in the deformation-free document image according to the coordinate mapping matrix of that region, namely a deformation-free document image block; and cropping the matrix of that region from the coordinate mapping matrix, namely the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
the unconstrained fold document image block and the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block form a sample pair.
7. The unconstrained fold document image correction system of claim 5, wherein constructing the unconstrained document image correction network and training it with the training data set formed from the plurality of sample pairs comprises:
constructing an unconstrained document image rectification network comprising a feature extractor and a feature decoder;
during training, the unconstrained fold document image block in a sample pair is input, feature extraction is carried out by the feature extractor, and a predicted coordinate mapping matrix is output by the feature decoder; the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block in the sample pair is used as supervision information, and a loss function is constructed between it and the predicted coordinate mapping matrix to train the unconstrained document image correction network.
8. The unconstrained fold document image correction system of claim 5, wherein correcting the unconstrained fold document image by using the predicted coordinate mapping matrix comprises:
correcting the unconstrained fold document image through the predicted coordinate mapping matrix by using an up-sampling algorithm to obtain the corrected image.
9. A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.
10. A readable storage medium storing a computer program, characterized in that the method according to any one of claims 1-4 is implemented when the computer program is executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310392392.XA CN116403226A (en) | 2023-04-13 | 2023-04-13 | Unconstrained fold document image correction method, system, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310392392.XA CN116403226A (en) | 2023-04-13 | 2023-04-13 | Unconstrained fold document image correction method, system, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116403226A (en) | 2023-07-07 |
Family
ID=87008754
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310392392.XA Pending CN116403226A (en) | 2023-04-13 | 2023-04-13 | Unconstrained fold document image correction method, system, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116403226A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116912831A (en) * | 2023-09-15 | 2023-10-20 | 东莞市将为防伪科技有限公司 | Method and system for processing acquired information of letter code anti-counterfeiting printed matter |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102697331B1 (en) | Method, device, electronic device, storage medium and program product for restoring human image | |
Li et al. | Document rectification and illumination correction using a patch-based CNN | |
Zhang et al. | Framebreak: Dramatic image extrapolation by guided shift-maps | |
Piva | An overview on image forensics | |
JP4556813B2 (en) | Image processing apparatus and program | |
RU2368006C1 (en) | Method and system for adaptive reformatting of digital images | |
RU2631765C1 (en) | Method and system of correcting perspective distortions in images occupying double-page spread | |
US8503813B2 (en) | Image rectification method | |
CN112767270B (en) | Fold document image correction system | |
US11620730B2 (en) | Method for merging multiple images and post-processing of panorama | |
CN107749986B (en) | Teaching video generation method and device, storage medium and computer equipment | |
JP2007074578A (en) | Image processor, photography instrument, and program | |
CN103019537A (en) | Image preview method and image preview device | |
CN114615480B (en) | Projection screen adjustment method, apparatus, device, storage medium, and program product | |
CN116403226A (en) | Unconstrained fold document image correction method, system, equipment and storage medium | |
JP2017120503A (en) | Information processing device, control method and program of information processing device | |
JP4898655B2 (en) | Imaging apparatus and image composition program | |
CN115761827A (en) | Cosmetic progress detection method, device, equipment and storage medium | |
CN109359652A (en) | A method of the fast automatic extraction rectangular scanning part from digital photograph | |
Zhang et al. | Nonlocal edge-directed interpolation | |
CN112036342A (en) | Document snapshot method, device and computer storage medium | |
Dey | Image Processing Masterclass with Python: 50+ Solutions and Techniques Solving Complex Digital Image Processing Challenges Using Numpy, Scipy, Pytorch and Keras (English Edition) | |
CN113837018B (en) | Cosmetic progress detection method, device, equipment and storage medium | |
US20210281742A1 (en) | Document detections from video images | |
CN113837019B (en) | Cosmetic progress detection method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||