CN116403226A - Unconstrained fold document image correction method, system, equipment and storage medium - Google Patents

Publication number
CN116403226A
Authority
CN
China
Prior art keywords
document image
unconstrained
fold
mapping matrix
coordinate mapping
Legal status
Pending
Application number
CN202310392392.XA
Other languages
Chinese (zh)
Inventor
李厚强
周文罡
冯浩
刘绍锴
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Application filed by University of Science and Technology of China USTC
Priority to CN202310392392.XA
Publication of CN116403226A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/43 Editing text-bitmaps, e.g. alignment, spacing; Semantic analysis of bitmaps of text without OCR
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G06V30/1607 Correcting image deformation, e.g. trapezoidal deformation caused by perspective
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method, a system, equipment and a storage medium for correcting unconstrained fold document images; the four schemes correspond to one another. They address the limited application scenarios of existing schemes, which cannot correct deformed document images that contain no document boundary or only part of the document boundary. The invention also improves the correction and recovery of images with a complete document boundary. Compared with traditional methods, it places no constraint on the form of the input fold document image, corrects the variety of deformed document images shot in daily life more robustly and accurately, and can be widely applied to camera-equipped portable devices such as smartphones. The invention therefore promotes the digitization of document images and provides technical support for the digital conversion of paper documents.

Description

Unconstrained fold document image correction method, system, equipment and storage medium
Technical Field
The invention relates to the technical field of fold document image correction, in particular to an unconstrained fold document image correction method, an unconstrained fold document image correction system, unconstrained fold document image correction equipment and a storage medium.
Background
With the rapid progress and popularity of portable cameras and smartphones, more and more people digitize paper documents by photographing them rather than using a dedicated flatbed scanner. However, because of uncertain factors in the shooting environment, such as the camera position, the illumination, and the type and degree of paper deformation, document images shot by these devices tend to be distorted to various degrees. This makes downstream tasks, such as automated text recognition, content analysis, editing and understanding, more difficult, and it hinders the sharing of information in daily life. The correction of folded document images is therefore an important research topic in computer vision.
Traditional solutions are mainly based on 3D reconstruction. These methods typically reconstruct the three-dimensional structure of the folded paper, either with additional hardware (e.g., laser scanners or depth cameras) or from multi-view images captured around the paper, and then flatten it. However, high hardware costs or cumbersome shooting requirements greatly limit the adoption of these techniques.
Currently, many smartphones have built-in document correction algorithms, most of which are based on projective transformation. First, the four straight edges or four corner points of the paper document are detected in the captured image to form the quadrilateral region where the document lies; then a projective transformation maps that region to a regular rectangular image, completing the correction. However, this solution requires that the complete document appear in the captured image, and it cannot rectify distortion of the paper itself. The first requirement is also inconvenient, because users often care about only part of a document.
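To make the projective-transformation step concrete, the following is a minimal sketch and not the algorithm of any particular phone. It assumes the four corner points have already been detected; the function names and the small pure-Python linear solver are illustrative choices that are not part of the patent.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_corners(src, dst):
    """Return the 3x3 projective transform that maps four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map one (x, y) point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Because a single homography is a global, planar model, it can square up a flat sheet seen at an angle but cannot undo folds or curls in the paper, which is the limitation the text describes.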
In recent years, deep learning has been introduced into the correction of folded document images. Compared with traditional methods, deep-learning-based methods achieve similar performance at lower computational cost. By training on a large number of image pairs (deformed and deformation-free) synthesized by a rendering engine, a neural network learns to correct document folds. In the inference stage, a single folded RGB document image is input, the network outputs a pixel-wise coordinate mapping matrix, and pixels in the document region of the input image are sampled into an empty image according to that matrix to obtain the complete corrected image.
In general, both the document correction algorithms built into smartphones and existing deep learning methods have the following main defects:
(1) Current deep-learning-based document image correction algorithms generally correct only document images with complete boundaries; that is, the input image must contain the complete document. In practical applications, however, the user may want to capture or share only a partial region or passage of the document, so the document boundary may be missing from the photographed image. In addition, an edge portion of a document photographed with a mobile phone is often cut off. In these cases, existing document image correction methods fail and cannot produce a normal correction result. Correcting document images that contain no document boundary or only part of the boundary has not been studied effectively and needs further exploration.
(2) The document image correction algorithms built into current smartphones apply to a limited range of scenarios. They are suitable only for complete, deformation-free documents, i.e., the paper has no folds, bends or wrinkles and appears completely in the captured image. In short, these algorithms simply re-project the imaged plane of the paper document onto a regular rectangle; once the document's outline in the image is not a regular quadrilateral, they cannot complete the correction.
(3) Document images corrected by existing deep-learning-based algorithms still show a certain degree of distortion. This is because these algorithms consider only document images with complete boundaries when training the model and ignore images that contain no document boundary or only part of it. Incorporating the latter into training can improve the accuracy and robustness of the model: it improves generalization and lets the model learn to correct an image from cues inside the image alone, such as deformed text lines.
In view of this, the present invention has been made.
Disclosure of Invention
The invention aims to provide a method, a system, equipment and a storage medium for correcting an unconstrained fold document image, which can correct deformed document images that contain no document boundary or only part of the document boundary and can also improve the correction of images with a complete document boundary. In summary, the invention corrects and recovers deformed document images regardless of the integrity of the document boundary and the degree of deformation of the input folded document image, which improves the practicality of document image correction.
The invention aims at realizing the following technical scheme:
an unconstrained pleated document image correction method comprising:
modeling pixel mapping relation from the fold document image to the deformation-free document image, and generating sample pairs, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
constructing an unconstrained document image correction network, and training it on a training data set formed from a plurality of sample pairs;
and inputting the unconstrained fold document image into a trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and correcting the unconstrained fold document image by using the predicted coordinate mapping matrix to obtain a corrected image.
An unconstrained pleated document image correction system comprising:
the pixel mapping relation modeling and sample pair generating unit is used for modeling the pixel mapping relation from the fold document image to the deformation-free document image and generating sample pairs, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
the network construction and training unit is used for constructing an unconstrained document image correction network and training it on a training data set formed from a plurality of sample pairs;
and the image correction unit is used for inputting the unconstrained fold document image into a trained unconstrained document image correction network to obtain a prediction coordinate mapping matrix, and correcting the unconstrained fold document image by using the prediction coordinate mapping matrix to obtain a corrected image.
A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned methods.
A readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
The technical scheme provided by the invention addresses the limited application scenarios of existing schemes, which cannot correct deformed document images that contain no document boundary or only part of the document boundary. The invention also improves the correction and recovery of images with a complete document boundary. Compared with traditional methods, it places no constraint on the form of the input fold document image, corrects the variety of deformed document images shot in daily life more robustly and accurately, and can be widely applied to camera-equipped portable devices such as smartphones. The invention therefore promotes the digitization of document images and provides technical support for the digital conversion of paper documents.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an unconstrained fold document image correction method provided by an embodiment of the present invention;
FIG. 2 is a schematic modeling diagram of a pixel mapping relationship between an input deformed document image and an output non-deformed document according to an embodiment of the present invention;
FIG. 3 is a flowchart for implementing deformation image correction based on an unconstrained document image correction network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an unconstrained pleated document image correction system according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a processing apparatus according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The terms that may be used herein will first be described as follows:
the terms "comprises," "comprising," "includes," "including," "has," "having" or other similar referents are to be construed to cover a non-exclusive inclusion. For example: including a particular feature (e.g., a starting material, component, ingredient, carrier, formulation, material, dimension, part, means, mechanism, apparatus, step, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product or article of manufacture, etc.), should be construed as including not only a particular feature but also other features known in the art that are not explicitly recited.
The term "consisting of ..." is meant to exclude any technical feature element not explicitly listed. If such a term is used in a claim, it closes the claim, so that the claim does not include technical features other than those specifically listed, except for conventional impurities associated therewith. If the term appears in only one clause of a claim, it limits only the elements explicitly recited in that clause, and elements recited in other clauses are not excluded from the overall claim.
The following describes in detail the method, system, device and storage medium for correcting an unconstrained fold document image. Matters not described in detail in the embodiments of the present invention belong to the prior art known to those skilled in the art. Where specific conditions are not noted in the embodiments, they follow the conditions conventional in the art or suggested by the manufacturer.
Example 1
The embodiment of the invention provides a method for correcting an unconstrained fold document image, which mainly comprises the following steps as shown in fig. 1:
Step 1, generating sample pairs by modeling the pixel mapping relation from the fold document image to the deformation-free document image, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block.
As shown in fig. 2, the preferred embodiment of this step is as follows:
(1) Global correction. A folded document image with a complete boundary is acquired and corrected into a deformation-free document image by using its corresponding coordinate mapping matrix.
In the embodiment of the invention, the folded document images with complete boundaries and their coordinate mapping matrices all come from existing public data sets. The coordinate mapping matrix describes the coordinate mapping between the folded document image and the corresponding deformation-free document image; that is, for each pixel of the deformation-free document image, it gives the position of that pixel in the folded document image.
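The role of the coordinate mapping matrix in global correction can be illustrated with a minimal sketch. It assumes that the matrix stores, for every pixel of the deformation-free image, a (row, column) position in the folded image, and it uses nearest-neighbor lookup for brevity; the names `rectify_nearest` and `bm` are hypothetical and the public data sets may store the mapping differently (for example, normalized).

```python
def rectify_nearest(warped, bm):
    """Build the deformation-free image from a folded one.

    warped: 2D list of pixel values (the folded document image).
    bm:     2D list the size of the output; bm[r][c] = (y, x) is the
            position in `warped` of output pixel (r, c). This layout is
            an assumption made for the example.
    """
    h, w = len(warped), len(warped[0])
    out = []
    for row in bm:
        out_row = []
        for (y, x) in row:
            # Clamp to the image so slightly out-of-range entries stay valid.
            yi = min(max(int(round(y)), 0), h - 1)
            xi = min(max(int(round(x)), 0), w - 1)
            out_row.append(warped[yi][xi])
        out.append(out_row)
    return out
```

For example, if the "folded" image is simply a horizontally mirrored copy of the flat one, a mapping of `bm[r][c] = (r, width - 1 - c)` restores the original.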
(2) Modeling the local coordinate mapping relation. An image block of one region is randomly cropped from the folded document image with the complete boundary; this is the unconstrained folded document image block. The corresponding region in the deformation-free document image, the deformation-free document image block, is found according to the coordinate mapping matrix, and the part of the coordinate mapping matrix belonging to that region is cropped out; this is the coordinate mapping matrix from the unconstrained folded document image block to the deformation-free document image block.
As shown in fig. 2, the dashed frame at the lower left is the randomly cropped region, i.e., the unconstrained folded document image block, and the dashed frame at the lower right is the corresponding deformation-free document image block. Because the region is cropped at random, the block may contain no document boundary, only part of the document boundary, or the complete document boundary; it is therefore called an unconstrained folded document image block.
In the embodiment of the invention, for each folded document image with a complete boundary, modeling the pixel mapping relation from the folded document image to the deformation-free document image yields an unconstrained folded document image block and the coordinate mapping matrix from that block to the deformation-free document image block; the two form a sample pair.
In the embodiment of the invention, a training data set is formed from a plurality of sample pairs. For each folded document image with a complete boundary, local coordinate mapping relation modeling can be performed one or more times after global correction to obtain one or more sample pairs; the modeling shown in fig. 2 can likewise be performed on a plurality of folded document images with complete boundaries to obtain the corresponding sample pairs. The specific number of sample pairs may be set according to actual conditions or experience.
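One possible reading of the sample-pair generation in step 1 is sketched here. The patent does not spell out how the cropped mapping is re-indexed, so two details are assumptions of this example: the corresponding flat region is taken as the bounding box of all mapping entries that point into the crop window, and the cropped coordinates are shifted so that they are relative to the block's top-left corner. All function names are hypothetical.

```python
import random


def crop_sample_pair(warped, bm, top, left, size):
    """Return (block, local_bm) for a size x size window, or None if empty.

    warped: 2D list, the folded image with a complete boundary.
    bm:     bm[r][c] = (y, x), position in `warped` of flat pixel (r, c).
    """
    block = [row[left:left + size] for row in warped[top:top + size]]
    # Flat-image pixels whose source position falls inside the window.
    hits = [(r, c)
            for r, row in enumerate(bm)
            for c, (y, x) in enumerate(row)
            if top <= y < top + size and left <= x < left + size]
    if not hits:
        return None
    r0, r1 = min(r for r, _ in hits), max(r for r, _ in hits)
    c0, c1 = min(c for _, c in hits), max(c for _, c in hits)
    # Crop the mapping to that region and express it in block coordinates
    # (assumption); entries outside [0, size) point outside the block.
    local_bm = [[(y - top, x - left) for (y, x) in row[c0:c1 + 1]]
                for row in bm[r0:r1 + 1]]
    return block, local_bm


def random_sample_pair(warped, bm, size, rng=random):
    """Pick a random window position and build one sample pair from it."""
    top = rng.randrange(len(warped) - size + 1)
    left = rng.randrange(len(warped[0]) - size + 1)
    return crop_sample_pair(warped, bm, top, left, size)
```

Calling `random_sample_pair` several times on the same image yields several sample pairs, some with full boundaries, some with partial boundaries, and some with none, depending on where the window lands.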
It should be noted that fig. 2 mainly illustrates the principle of modeling the pixel mapping relation from a folded document image to a deformation-free document image. For privacy reasons, the text in the document image has been blurred, which does not affect the implementation of the scheme; in practical application, no particular sharpness of the document image is required.
Step 2, constructing an unconstrained document image correction network, and training it on a training data set formed from a plurality of sample pairs.
In the embodiment of the invention, the document image correction network may be a fully convolutional neural network, such as a U-Net, and mainly comprises a feature extractor and a feature decoder.
During training, the unconstrained folded document image block of a sample pair is input, features are extracted by the feature extractor, and the feature decoder outputs a predicted coordinate mapping matrix. The coordinate mapping matrix from the unconstrained folded document image block to the deformation-free document image block in the sample pair serves as the supervision information, and a loss function is constructed between it and the predicted coordinate mapping matrix to train the unconstrained document image correction network.
The training process can be implemented with conventional techniques and is not described in detail here. Training stops when a set stopping condition is satisfied (for example, the number of training iterations reaches a set number, or the loss function converges).
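The supervision signal and the stopping condition can be sketched as follows. The patent does not name a specific loss, so the mean absolute (L1) difference used here is an assumption, as are the function names, the plateau test, and the default thresholds.

```python
def l1_map_loss(pred, target):
    """Mean absolute difference between two coordinate mapping matrices.

    Both arguments are 2D lists of (y, x) pairs with the same shape.
    The L1 form is an illustrative choice, not taken from the patent.
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for (py, px), (ty, tx) in zip(pred_row, target_row):
            total += abs(py - ty) + abs(px - tx)
            count += 2
    return total / count


def should_stop(step, losses, max_steps=1000, tol=1e-4, window=5):
    """Stop when the step budget is spent or the loss has plateaued.

    `losses` is the history of loss values; the loss counts as converged
    when it changed by less than `tol` over the last `window` steps.
    """
    if step >= max_steps:
        return True
    if len(losses) > window:
        return abs(losses[-1] - losses[-1 - window]) < tol
    return False
```

In a real training loop, `l1_map_loss` would be replaced by the differentiable loss of the chosen deep learning framework, and `should_stop` would be checked once per iteration or epoch.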
Step 3, inputting the unconstrained fold document image into the trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and correcting the unconstrained fold document image by using the predicted coordinate mapping matrix to obtain a corrected image.
In the embodiment of the invention, the unconstrained fold document image can be a deformed image I_d in any fold form. As shown in FIG. 3, it may be a folded document image with a complete boundary, as in part (a), a folded document image without document boundaries, as in part (b), or a folded document image without complete document boundaries, as in part (c). The trained unconstrained document image correction network performs feature extraction and feature decoding and outputs a predicted coordinate mapping matrix f_b; then, using an up-sampling algorithm (e.g., bilinear interpolation), the unconstrained fold document image is corrected through the predicted coordinate mapping matrix f_b to obtain a corrected image I_r.
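The correction in step 3 can be sketched as two bilinear operations: resizing the predicted map f_b to the output resolution, then sampling I_d at the mapped positions to fill I_r. This is an illustrative sketch under stated assumptions: f_b is taken to hold absolute (row, column) positions in I_d, the image is single-channel, the resize uses an align-corners convention, and all function names are hypothetical.

```python
def bilinear(grid, y, x):
    """Bilinearly interpolate a 2D grid of numbers at fractional (y, x)."""
    h, w = len(grid), len(grid[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
    bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def upsample_map(f_b, out_h, out_w):
    """Resize a coarse map of (y, x) pairs to out_h x out_w."""
    ys = [[p[0] for p in row] for row in f_b]
    xs = [[p[1] for p in row] for row in f_b]
    h, w = len(f_b), len(f_b[0])
    out = []
    for r in range(out_h):
        gy = r * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for c in range(out_w):
            gx = c * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            row.append((bilinear(ys, gy, gx), bilinear(xs, gy, gx)))
        out.append(row)
    return out


def rectify(i_d, f_b, out_h, out_w):
    """Produce I_r by sampling I_d at the positions given by upsampled f_b."""
    full_map = upsample_map(f_b, out_h, out_w)
    return [[bilinear(i_d, y, x) for (y, x) in row] for row in full_map]
```

Predicting the map at a lower resolution and up-sampling it keeps the network output small while still producing a dense, smooth mapping at full resolution.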
The scheme provided by the embodiment of the invention addresses the limited application scenarios of existing schemes, which cannot correct deformed document images that contain no document boundary or only part of the document boundary. The invention also improves the correction and recovery of images with a complete document boundary. Compared with traditional methods, it places no constraint on the form of the input fold document image, corrects the variety of deformed document images shot in daily life more robustly and accurately, and can be widely applied to camera-equipped portable devices such as smartphones. The invention therefore promotes the digitization of document images and provides technical support for the digital conversion of paper documents.
Example 2
The embodiment of the invention provides an unconstrained fold document image correction system, as shown in fig. 4, which mainly comprises:
the pixel mapping relation modeling and sample pair generating unit is used for modeling the pixel mapping relation from the fold document image to the deformation-free document image and generating sample pairs, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
the network construction and training unit is used for constructing an unconstrained document image correction network and training it on a training data set formed from a plurality of sample pairs;
and the image correction unit is used for inputting the unconstrained fold document image into a trained unconstrained document image correction network to obtain a prediction coordinate mapping matrix, and correcting the unconstrained fold document image by using the prediction coordinate mapping matrix to obtain a corrected image.
In the embodiment of the present invention, the modeling the pixel mapping relationship from the folded document image to the deformation-free document image, and generating the sample pair includes:
acquiring a fold document image with a complete boundary, and correcting the fold document image with the complete boundary into a deformation-free document image by using a corresponding coordinate mapping matrix;
randomly intercepting an image block of an area in the folded document image with the complete boundary, namely an unconstrained folded document image block, finding a corresponding area in the undeformed document image according to a coordinate mapping matrix of the area, namely an undeformed document image block, intercepting a matrix of the area in the coordinate mapping matrix, namely a coordinate mapping matrix from the unconstrained folded document image block to the undeformed document image block;
the obtained unconstrained folded document image block and the coordinate mapping matrix from the unconstrained folded document image block to the deformation-free document image block form a sample pair.
In the embodiment of the invention, constructing the unconstrained document image correction network and training it on a training data set formed from a plurality of sample pairs comprises the following steps:
constructing an unconstrained document image rectification network comprising a feature extractor and a feature decoder;
during training, the unconstrained fold document image block of a sample pair is input, feature extraction is carried out through the feature extractor, a predicted coordinate mapping matrix is output through the feature decoder, the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block in the sample pair is used as supervision information, and a loss function is built with the predicted coordinate mapping matrix to train the unconstrained document image correction network.
In the embodiment of the present invention, the correcting the unconstrained pleated document image by using the prediction coordinate mapping matrix, to obtain a corrected image includes:
and correcting the unconstrained fold document image through a predictive coordinate mapping matrix by utilizing an up-sampling algorithm to obtain a corrected image.
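The three units of the system can be wired together as in the following structural sketch. It is only an illustration of the division of responsibilities; the class name, field names, and the choice to pass each unit in as a callable are assumptions of this example rather than anything specified in the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class CorrectionSystem:
    # Unit 1: models the pixel mapping relation and generates sample pairs.
    make_sample_pairs: Callable[[Any], List[Any]]
    # Unit 2: builds and trains the network; returns a predictor that maps
    # an image to a predicted coordinate mapping matrix.
    train_network: Callable[[List[Any]], Callable[[Any], Any]]
    # Unit 3: corrects an image with a predicted coordinate mapping matrix.
    correct_image: Callable[[Any, Any], Any]
    predictor: Optional[Callable[[Any], Any]] = None

    def fit(self, source_data):
        """Run units 1 and 2 and keep the trained predictor."""
        pairs = self.make_sample_pairs(source_data)
        self.predictor = self.train_network(pairs)
        return self

    def correct(self, image):
        """Run unit 3 on one unconstrained fold document image."""
        if self.predictor is None:
            raise RuntimeError("call fit() before correct()")
        return self.correct_image(image, self.predictor(image))
```

Keeping the units as interchangeable callables mirrors the patent's description of them as separate functional modules.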
Example 3
The present invention also provides a processing apparatus, as shown in fig. 5, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.
Further, the processing device further comprises at least one input device and at least one output device; in the processing device, the processor, the memory, the input device and the output device are connected through buses.
In the embodiment of the invention, the specific types of the memory, the input device and the output device are not limited; for example:
the input device can be a touch screen, an image acquisition device, a smart phone, a physical key or a mouse and the like;
the output device may be a display terminal;
the memory may be random access memory (RAM) or non-volatile memory, such as disk memory.
Example 4
The invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.
The readable storage medium according to the embodiment of the present invention may be provided as a computer readable storage medium in the aforementioned processing apparatus, for example, as a memory in the processing apparatus. The readable storage medium may be any of various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (10)

1. An unconstrained pleated document image correction method, comprising:
modeling pixel mapping relation from the fold document image to the deformation-free document image, and generating sample pairs, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
constructing an unconstrained document image correction network, and training it on a training data set formed from a plurality of sample pairs;
and inputting the unconstrained fold document image into a trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and correcting the unconstrained fold document image by using the predicted coordinate mapping matrix to obtain a corrected image.
2. The method of claim 1, wherein modeling the pixel mapping relation from the fold document image to the deformation-free document image and generating the sample pairs comprises:
acquiring a fold document image with a complete boundary, and correcting the fold document image with the complete boundary into a deformation-free document image by using a corresponding coordinate mapping matrix;
randomly cropping an image block of a region from the fold document image with the complete boundary, the image block being the unconstrained fold document image block; finding the corresponding region in the deformation-free document image according to the coordinate mapping matrix of the region, that region being the deformation-free document image block; and cropping the matrix of the region from the coordinate mapping matrix, the cropped matrix being the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
the obtained unconstrained fold document image block and the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block form a sample pair.
3. The method of claim 1, wherein constructing the unconstrained document image correction network and training it on the training data set formed from the plurality of sample pairs comprises:
constructing an unconstrained document image correction network comprising a feature extractor and a feature decoder;
during training, inputting the unconstrained fold document image block of a sample pair, performing feature extraction with the feature extractor, and outputting a predicted coordinate mapping matrix through the feature decoder, wherein the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block in the sample pair serves as supervision information and is used together with the predicted coordinate mapping matrix to construct a loss function for training the unconstrained document image correction network.
4. The method of claim 1, wherein correcting the unconstrained fold document image by using the predicted coordinate mapping matrix comprises:
correcting the unconstrained fold document image through the predicted coordinate mapping matrix by using an up-sampling algorithm to obtain the corrected image.
5. An unconstrained fold document image correction system, comprising:
a pixel mapping relation modeling and sample pair generating unit, configured to model the pixel mapping relation from a fold document image to a deformation-free document image and to generate sample pairs, wherein each sample pair comprises an unconstrained fold document image block and a coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
a network construction and training unit, configured to construct an unconstrained document image correction network and to train it on a training data set formed from a plurality of the sample pairs;
and an image correction unit, configured to input the unconstrained fold document image into the trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and to correct the unconstrained fold document image by using the predicted coordinate mapping matrix to obtain a corrected image.
6. The unconstrained fold document image correction system of claim 5, wherein modeling the pixel mapping relation from the fold document image to the deformation-free document image and generating the sample pairs comprises:
acquiring a fold document image with a complete boundary, and correcting the fold document image with the complete boundary into a deformation-free document image by using a corresponding coordinate mapping matrix;
randomly cropping an image block of a region from the fold document image with the complete boundary, the image block being the unconstrained fold document image block; finding the corresponding region in the deformation-free document image according to the coordinate mapping matrix of the region, that region being the deformation-free document image block; and cropping the matrix of the region from the coordinate mapping matrix, the cropped matrix being the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block;
the obtained unconstrained fold document image block and the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block form a sample pair.
7. The unconstrained fold document image correction system of claim 5, wherein constructing the unconstrained document image correction network and training it on the training data set formed from the plurality of sample pairs comprises:
constructing an unconstrained document image correction network comprising a feature extractor and a feature decoder;
during training, inputting the unconstrained fold document image block of a sample pair, performing feature extraction with the feature extractor, and outputting a predicted coordinate mapping matrix through the feature decoder, wherein the coordinate mapping matrix from the unconstrained fold document image block to the deformation-free document image block in the sample pair serves as supervision information and is used together with the predicted coordinate mapping matrix to construct a loss function for training the unconstrained document image correction network.
8. The unconstrained fold document image correction system of claim 5, wherein correcting the unconstrained fold document image by using the predicted coordinate mapping matrix comprises:
correcting the unconstrained fold document image through the predicted coordinate mapping matrix by using an up-sampling algorithm to obtain the corrected image.
9. A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.
10. A readable storage medium storing a computer program, characterized in that the method according to any one of claims 1-4 is implemented when the computer program is executed by a processor.
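As a non-limiting illustration of the sample pair generation recited in the claims above, the cropping of an image block together with the matching part of the coordinate mapping matrix can be sketched as follows. The sketch rests on assumptions that the text does not confirm: the helper name `make_sample_pair` is hypothetical, the matrix is assumed to be stored on the pixel grid of the deformation-free image with each entry giving an (x, y) position in the fold document image, and the corresponding deformation-free region is approximated by the bounding box of all entries that point into the crop.

```python
import numpy as np


def make_sample_pair(folded, backward_map, block_h, block_w, rng):
    """Crop one training pair (hypothetical helper, assumptions as stated).

    folded:       (H, W) or (H, W, C) fold document image with a complete boundary.
    backward_map: (h, w, 2) array; entry [i, j] = (x, y) is the ASSUMED position
                  in `folded` of pixel (i, j) of the deformation-free image.
    Returns (block, block_map, bbox), or None if the crop holds no document pixels.
    """
    H, W = folded.shape[:2]

    # Randomly choose the crop window inside the fold document image.
    top = int(rng.integers(0, H - block_h + 1))
    left = int(rng.integers(0, W - block_w + 1))
    block = folded[top:top + block_h, left:left + block_w]

    # Entries of the matrix whose target position falls inside the crop.
    x, y = backward_map[..., 0], backward_map[..., 1]
    inside = ((x >= left) & (x <= left + block_w - 1) &
              (y >= top) & (y <= top + block_h - 1))
    if not inside.any():
        return None

    # Bounding box of those entries = corresponding deformation-free region.
    rows = np.flatnonzero(inside.any(axis=1))
    cols = np.flatnonzero(inside.any(axis=0))
    r0, r1 = int(rows[0]), int(rows[-1]) + 1
    c0, c1 = int(cols[0]), int(cols[-1]) + 1

    # Crop the matrix and express its coordinates relative to the block origin.
    # Entries inside the box that point outside the crop are kept as they are.
    offset = np.array([left, top], dtype=np.float64)
    block_map = backward_map[r0:r1, c0:c1].astype(np.float64) - offset
    return block, block_map, (r0, r1, c0, c1)
```

Calling the helper repeatedly with `rng = np.random.default_rng(seed)` on images with complete boundaries would yield a set of sample pairs; how blocks with too little document content are filtered is a design choice that the sketch leaves open.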
CN202310392392.XA 2023-04-13 2023-04-13 Unconstrained fold document image correction method, system, equipment and storage medium Pending CN116403226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310392392.XA CN116403226A (en) 2023-04-13 2023-04-13 Unconstrained fold document image correction method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310392392.XA CN116403226A (en) 2023-04-13 2023-04-13 Unconstrained fold document image correction method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116403226A true CN116403226A (en) 2023-07-07

Family

ID=87008754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310392392.XA Pending CN116403226A (en) 2023-04-13 2023-04-13 Unconstrained fold document image correction method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116403226A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912831A (en) * 2023-09-15 2023-10-20 东莞市将为防伪科技有限公司 Method and system for processing acquired information of letter code anti-counterfeiting printed matter

Similar Documents

Publication Publication Date Title
KR102697331B1 (en) Method, device, electronic device, storage medium and program product for restoring human image
Li et al. Document rectification and illumination correction using a patch-based CNN
Zhang et al. Framebreak: Dramatic image extrapolation by guided shift-maps
Piva An overview on image forensics
JP4556813B2 (en) Image processing apparatus and program
RU2368006C1 (en) Method and system for adaptive reformatting of digital images
RU2631765C1 (en) Method and system of correcting perspective distortions in images occupying double-page spread
US8503813B2 (en) Image rectification method
CN112767270B (en) Fold document image correction system
US11620730B2 (en) Method for merging multiple images and post-processing of panorama
CN107749986B (en) Teaching video generation method and device, storage medium and computer equipment
JP2007074578A (en) Image processor, photography instrument, and program
CN103019537A (en) Image preview method and image preview device
CN114615480B (en) Projection screen adjustment method, apparatus, device, storage medium, and program product
CN116403226A (en) Unconstrained fold document image correction method, system, equipment and storage medium
JP2017120503A (en) Information processing device, control method and program of information processing device
JP4898655B2 (en) Imaging apparatus and image composition program
CN115761827A (en) Cosmetic progress detection method, device, equipment and storage medium
CN109359652A (en) A method of the fast automatic extraction rectangular scanning part from digital photograph
Zhang et al. Nonlocal edge-directed interpolation
CN112036342A (en) Document snapshot method, device and computer storage medium
Dey Image Processing Masterclass with Python: 50+ Solutions and Techniques Solving Complex Digital Image Processing Challenges Using Numpy, Scipy, Pytorch and Keras (English Edition)
CN113837018B (en) Cosmetic progress detection method, device, equipment and storage medium
US20210281742A1 (en) Document detections from video images
CN113837019B (en) Cosmetic progress detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination