CN116403226A - Unconstrained fold document image correction method, system, equipment and storage medium - Google Patents
- Publication number
- CN116403226A (CN202310392392.XA)
- Authority
- CN
- China
- Prior art keywords
- document image
- unconstrained
- wrinkled
- document
- mapping matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/43—Editing text-bitmaps, e.g. alignment, spacing; Semantic analysis of bitmaps of text without OCR
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/16—Image preprocessing
- G06V30/1607—Correcting image deformation, e.g. trapezoidal deformation caused by perspective
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19147—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
Description
Technical Field
The present invention relates to the technical field of wrinkled document image correction, and in particular to an unconstrained wrinkled document image correction method, system, device and storage medium.
Background
With the rapid advancement and spread of portable cameras and smartphones, more and more people use them to photograph and scan paper documents instead of relying on a dedicated flatbed scanner. However, because of many uncertain factors in the shooting environment, such as the camera position, the lighting conditions, and the type and degree of paper deformation, document images captured by these devices often exhibit distortion and deformation of various kinds and degrees. This makes downstream tasks such as automated text recognition, content analysis, editing and understanding more difficult, and it also hinders the dissemination and exchange of information and knowledge in daily life. To address this problem, wrinkled document image correction has become an important research topic in computer vision.
Traditional solutions are mainly based on 3D reconstruction. These methods usually rely on additional hardware (such as laser scanners or depth cameras), or on multi-view images captured around the wrinkled paper, to reconstruct the three-dimensional structure of the paper and then flatten it. However, the high hardware cost or the cumbersome capture requirements have greatly limited the adoption of these techniques.
At present, many smartphones have built-in document correction algorithms. Most of them are based on projective transformation: they first detect the four straight edges or four corner points of the paper document in the captured image to obtain the quadrilateral region occupied by the document, and then apply a projective transformation that maps this region to a regular rectangular image, which completes the correction. However, this solution requires the complete document to appear in the captured image, and if the document itself is deformed, the deformation cannot be corrected, which degrades the result. The completeness requirement is also inconvenient, because users are often interested in only part of a document.
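The projective-transformation approach described above can be illustrated with a minimal NumPy sketch. It is not taken from any particular smartphone implementation: the function names, the corner coordinates and the target rectangle size are all hypothetical, and the four corners are assumed to be already detected.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve the 3x3 projective transform H (with H[2, 2] = 1) that maps
    four source points onto four destination points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), same form for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, points):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical detected corners (clockwise from top-left) and target rectangle.
corners = [(12.0, 20.0), (205.0, 8.0), (220.0, 290.0), (5.0, 270.0)]
target = [(0.0, 0.0), (200.0, 0.0), (200.0, 280.0), (0.0, 280.0)]
H = homography_from_corners(corners, target)
mapped_corners = apply_homography(H, corners)
```

A single 3x3 matrix can only describe a planar perspective change, which is why this kind of correction cannot remove folds or curls in the paper itself.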
In recent years, deep learning has been introduced into wrinkled document image correction. Compared with traditional methods, deep-learning-based methods achieve similar performance with less computational overhead. By training on a large number of deformed/undeformed image pairs synthesized with a rendering engine, a neural network learns to correct document wrinkles. At inference time, given a single wrinkled RGB document image, the network outputs a per-pixel coordinate mapping matrix, which is used to sample pixels from the wrinkled document region of the input image into an empty image, yielding the complete corrected image.
In general, both the document correction algorithms built into smartphones and the existing deep learning methods have the following main defects:
(1) Current deep-learning-based document image correction algorithms can generally only correct wrinkled document images with complete boundaries, that is, the input image must contain a complete document. In practical scenarios, however, users may want to focus on or share only some regions or text of a document, so the captured image may lack document boundaries. In addition, document images taken with mobile phones often have part of the edges missing. In these cases, existing document image correction methods fail and cannot produce a normal correction result. The correction of document images that have no document boundary, or only partial document boundaries, has not been studied effectively by current technical solutions and needs further exploration and improvement.
(2) The document image correction algorithms built into current smartphones have limited applicability. They only work on complete, undeformed documents, that is, paper documents that are free of folds, bends and wrinkles and that appear in full in the captured image. Put simply, these algorithms only switch the imaging projection plane of the paper document to a regular rectangle; once the shape of the paper document is not a regular quadrilateral, they cannot complete the correction properly.
(3) With existing deep-learning-based document image correction algorithms, the corrected document image still shows a certain degree of distortion. This is because these algorithms consider only document images with complete boundaries during model training and ignore document images that have no document boundary or only partial document boundaries. Including the latter in training can effectively improve the accuracy and robustness of the model, because it improves generalization and lets the model learn more effectively how to use the features that remain in the image, such as deformed text lines, to correct it.
In view of this, the present invention is proposed.
Summary of the Invention
The purpose of the present invention is to provide an unconstrained wrinkled document image correction method, system, device and storage medium that can correct deformed document images having no document boundary or only partial document boundaries, and that also improves the correction of images with complete document boundaries. In short, the present invention places no constraint on the completeness of the document boundary or on the degree of deformation in the input wrinkled document image, can effectively correct and restore all kinds of deformed document images, and thereby improves the practicality and real-world effectiveness of document image correction.
The purpose of the present invention is achieved by the following technical solutions:
An unconstrained wrinkled document image correction method, comprising:
modeling the pixel mapping relationship from wrinkled document images to undeformed document images and generating sample pairs, wherein each sample pair contains an unconstrained wrinkled document image block and the coordinate mapping matrix from the unconstrained wrinkled document image block to the undeformed document image block;
constructing an unconstrained document image correction network and training it with a training data set formed from a plurality of sample pairs;
inputting an unconstrained wrinkled document image into the trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and correcting the unconstrained wrinkled document image with the predicted coordinate mapping matrix to obtain a corrected image.
An unconstrained wrinkled document image correction system, comprising:
a pixel mapping relationship modeling and sample pair generation unit, configured to model the pixel mapping relationship from wrinkled document images to undeformed document images and generate sample pairs, wherein each sample pair contains an unconstrained wrinkled document image block and the coordinate mapping matrix from the unconstrained wrinkled document image block to the undeformed document image block;
a network construction and training unit, configured to construct an unconstrained document image correction network and train it with a training data set formed from a plurality of sample pairs;
an image correction unit, configured to input an unconstrained wrinkled document image into the trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and to correct the unconstrained wrinkled document image with the predicted coordinate mapping matrix to obtain a corrected image.
A processing device, comprising: one or more processors; and a memory for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the aforementioned method.
A readable storage medium storing a computer program, wherein the aforementioned method is implemented when the computer program is executed by a processor.
As can be seen from the technical solution provided above, the present invention addresses the limited applicability of existing solutions, namely their inability to correct deformed document images that have no document boundary or only partial document boundaries. It also improves the correction and restoration of images with complete document boundaries. Compared with traditional methods, the present invention places no formal constraint on the input wrinkled document image and can correct the various deformed document images captured in daily life more robustly and accurately. It can be widely applied to smartphones and other portable devices with cameras, covering more application scenarios with higher accuracy. The present invention will therefore greatly promote the digitization of document images and provide strong technical support for the digital conversion of paper documents.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an unconstrained wrinkled document image correction method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of modeling the pixel mapping relationship between an input deformed document image and an output undeformed document image, provided by an embodiment of the present invention;
FIG. 3 is a flowchart of correcting a deformed image with the unconstrained document image correction network, provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of an unconstrained wrinkled document image correction system provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a processing device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The terms that may be used herein are first explained as follows:
The terms "comprise", "include", "contain", "have" and other expressions with similar meaning shall be construed as non-exclusive inclusion. For example, comprising a certain technical feature element (such as a raw material, component, ingredient, carrier, dosage form, material, dimension, part, assembly, mechanism, device, step, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product or article) shall be construed as including not only the explicitly listed technical feature element but also other technical feature elements known in the art that are not explicitly listed.
The term "consisting of" excludes any technical feature element that is not explicitly listed. If the term is used in a claim, it makes the claim closed, so that the claim contains no technical feature elements other than those explicitly listed, except for the conventional impurities associated with them. If the term appears only in one clause of a claim, it limits only the elements explicitly listed in that clause, and elements recited in other clauses are not excluded from the claim as a whole.
The unconstrained wrinkled document image correction method, system, device and storage medium provided by the present invention are described in detail below. Content not described in detail in the embodiments belongs to the prior art known to those skilled in the art. Where no specific conditions are indicated in the embodiments, the conventional conditions in the art or the conditions recommended by the manufacturer are followed.
Embodiment 1
An embodiment of the present invention provides an unconstrained wrinkled document image correction method which, as shown in FIG. 1, mainly includes:
Step 1. Generate sample pairs by modeling the pixel mapping relationship from wrinkled document images to undeformed document images, wherein each sample pair contains an unconstrained wrinkled document image block and the coordinate mapping matrix from the unconstrained wrinkled document image block to the undeformed document image block.
As shown in FIG. 2, a preferred implementation of this step is as follows:
(1) Global correction. Obtain a wrinkled document image with complete boundaries, and then use its corresponding coordinate mapping matrix to correct it into an undeformed document image.
In this embodiment, the wrinkled document images with complete boundaries and their coordinate mapping matrices both come from existing public data sets. The coordinate mapping matrix describes the coordinate correspondence between every pixel of the wrinkled document image and the corresponding undeformed document image; specifically, for each pixel of the undeformed document image it stores that pixel's position in the wrinkled document image.
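To make the meaning of the coordinate mapping matrix concrete, the following minimal NumPy sketch applies such a matrix with nearest-neighbour lookup. The array layout (an `(h, w, 2)` array holding `(x, y)` positions), the function name `rectify_nearest` and the toy horizontally flipped "deformation" are assumptions made for illustration; the patent does not prescribe a storage format.

```python
import numpy as np

def rectify_nearest(warped, bm):
    """warped: (H, W) or (H, W, C) wrinkled image.
    bm: (h, w, 2) array; bm[i, j] = (x, y) position in `warped` of the pixel
    that belongs at row i, column j of the undeformed image."""
    h_src, w_src = warped.shape[:2]
    xs = np.clip(np.rint(bm[..., 0]).astype(int), 0, w_src - 1)
    ys = np.clip(np.rint(bm[..., 1]).astype(int), 0, h_src - 1)
    return warped[ys, xs]

# Toy data: the "deformation" is a horizontal flip of a 3x4 image.
flat = np.arange(12).reshape(3, 4)
warped = flat[:, ::-1]
jj, ii = np.meshgrid(np.arange(4), np.arange(3))  # column and row index grids
bm = np.stack([3 - jj, ii], axis=-1).astype(float)
restored = rectify_nearest(warped, bm)
```

Because the matrix is indexed by undeformed-image pixels, every output pixel receives exactly one value and the corrected image has no holes.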
(2) Local coordinate mapping modeling. Randomly crop the image block of one region from the wrinkled document image with complete boundaries; this block is called the unconstrained wrinkled document image block. Use the coordinate mapping matrix to find the corresponding region in the undeformed document image, called the undeformed document image block, and then crop the same region of the coordinate mapping matrix, which gives the coordinate mapping matrix from the unconstrained wrinkled document image block to the undeformed document image block.
As shown in FIG. 2, the dashed box in the lower-left corner is the randomly cropped region, that is, the unconstrained wrinkled document image block, and the dashed box in the lower-right corner is the corresponding undeformed document image block. Because the region is cropped at random, the result is usually a wrinkled document image with no document boundary or with incomplete document boundaries, although it may also be one with complete document boundaries; it is therefore called an unconstrained wrinkled document image block.
In this embodiment, for each wrinkled document image with complete boundaries, modeling the pixel mapping relationship from the wrinkled document image to the undeformed document image yields an unconstrained wrinkled document image block and the coordinate mapping matrix from that block to the undeformed document image block; the two form one sample pair.
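The local modeling step can be sketched as follows. This is one possible reading of the description rather than the patent's exact procedure: it assumes the coordinate mapping matrix is an `(h, w, 2)` array of `(x, y)` positions in the wrinkled image, takes the bounding box of the matching undeformed pixels as the "corresponding region", and shifts the cropped coordinates so that they are relative to the block. The name `make_block_pair` and the toy flipped image are hypothetical.

```python
import numpy as np

def make_block_pair(warped, bm, box):
    """box = (x0, y0, x1, y1), half-open, in wrinkled-image pixel coordinates.
    Returns the wrinkled block, its coordinate mapping matrix, and the
    matching box (x0, y0, x1, y1) in the undeformed image."""
    x0, y0, x1, y1 = box
    block = warped[y0:y1, x0:x1]
    # Undeformed-image pixels whose source position lies inside the crop box.
    inside = ((bm[..., 0] >= x0) & (bm[..., 0] <= x1 - 1) &
              (bm[..., 1] >= y0) & (bm[..., 1] <= y1 - 1))
    rows = np.flatnonzero(inside.any(axis=1))
    cols = np.flatnonzero(inside.any(axis=0))
    if rows.size == 0:
        raise ValueError("crop box does not overlap the document")
    r0, r1 = int(rows[0]), int(rows[-1]) + 1
    c0, c1 = int(cols[0]), int(cols[-1]) + 1
    # Same region of the mapping matrix, expressed relative to the block origin.
    block_bm = bm[r0:r1, c0:c1] - np.array([x0, y0], dtype=bm.dtype)
    return block, block_bm, (c0, r0, c1, r1)

# Toy data: a 6x8 image whose "deformation" is a horizontal flip.
flat = np.arange(48).reshape(6, 8)
warped = flat[:, ::-1]
jj, ii = np.meshgrid(np.arange(8), np.arange(6))
bm = np.stack([7 - jj, ii], axis=-1).astype(float)

block, block_bm, flat_box = make_block_pair(warped, bm, (2, 1, 6, 4))
# Sampling the block with its own mapping recovers the undeformed block.
restored = block[block_bm[..., 1].astype(int), block_bm[..., 0].astype(int)]
```

With a real wrinkle the bounding box may include undeformed pixels whose source falls slightly outside the crop; how those are handled (clipping, masking) is left open here.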
In this embodiment, a training data set is formed from a plurality of sample pairs. For each wrinkled document image with complete boundaries, local coordinate mapping modeling can be performed one or more times after global correction to obtain one or more sample pairs; the modeling shown in FIG. 2 can of course also be applied to a plurality of wrinkled document images with complete boundaries to obtain the corresponding sample pairs. The specific number of sample pairs can be set according to the actual situation or experience.
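Drawing several random crop regions from one image could look like the sketch below. The sampling policy is an assumption: the patent only states that regions are cropped at random, so the uniform size and position distribution and the parameters `count`, `min_size` and `seed` are hypothetical choices.

```python
import numpy as np

def sample_crop_boxes(height, width, count, min_size=32, seed=0):
    """Return `count` random boxes (x0, y0, x1, y1), half-open, that lie
    fully inside a height x width image."""
    rng = np.random.default_rng(seed)
    boxes = []
    for _ in range(count):
        # Crop sizes range from min_size up to the full image, so a crop
        # may still contain the complete document boundary.
        h = int(rng.integers(min(min_size, height), height + 1))
        w = int(rng.integers(min(min_size, width), width + 1))
        y0 = int(rng.integers(0, height - h + 1))
        x0 = int(rng.integers(0, width - w + 1))
        boxes.append((x0, y0, x0 + w, y0 + h))
    return boxes

boxes = sample_crop_boxes(height=100, width=80, count=5)
```

Each box would then be turned into one sample pair, so `count` controls how many pairs a single source image contributes.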
It should be noted that FIG. 2 mainly illustrates the principle of modeling the pixel mapping relationship from a wrinkled document image to an undeformed document image. For privacy reasons the text in the document images has been blurred, which does not affect the implementation of the scheme; in actual use the sharpness of the document image is not altered.
Step 2. Construct an unconstrained document image correction network and train it with the training data set formed from a plurality of sample pairs.
In this embodiment, the document image correction network may be a fully convolutional neural network, such as a UNet, which mainly consists of a feature extractor and a feature decoder.
During training, the input is the unconstrained wrinkled document image block of a sample pair. The feature extractor extracts features and the feature decoder outputs a predicted coordinate mapping matrix. The coordinate mapping matrix from the unconstrained wrinkled document image block to the undeformed document image block in the sample pair serves as the supervision signal, and a loss function built from it and the predicted coordinate mapping matrix is used to train the unconstrained document image correction network.
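The patent does not specify the form of the loss function. As one common choice for regressing coordinate maps, the sketch below uses a mean absolute (L1) error between the predicted and ground-truth coordinate mapping matrices; the function name and the `(h, w, 2)` array layout are assumptions.

```python
import numpy as np

def map_l1_loss(pred_bm, target_bm):
    """Mean absolute difference between a predicted and a ground-truth
    coordinate mapping matrix of identical shape (h, w, 2)."""
    pred = np.asarray(pred_bm, dtype=float)
    target = np.asarray(target_bm, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("prediction and target must have the same shape")
    return float(np.abs(pred - target).mean())

target = np.ones((2, 3, 2))
loss_perfect = map_l1_loss(target, target)
loss_offset = map_l1_loss(np.zeros((2, 3, 2)), target)
```

In an actual training loop the same quantity would be computed in the deep learning framework in use so that gradients can flow back to the network.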
The training process can follow conventional techniques and is not described further here. Training stops once a preset stopping condition is met, for example when the number of training iterations reaches a set value or the loss function converges.
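The two example stopping conditions can be expressed as a small check. The convergence test used here (the loss changing by less than `tol` over `patience` consecutive steps) is only one hypothetical way to define convergence, and all parameter names are illustrative.

```python
def should_stop(loss_history, max_steps, tol=1e-4, patience=3):
    """Stop when the step budget is used up, or when the loss has changed
    by less than `tol` for `patience` consecutive steps."""
    if len(loss_history) >= max_steps:
        return True
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    return all(abs(recent[i] - recent[i + 1]) < tol for i in range(patience))

budget_reached = should_stop([1.0, 0.5], max_steps=2)
still_improving = should_stop([1.0, 0.5, 0.4, 0.4, 0.4], max_steps=10)
converged = should_stop([1.0, 0.5, 0.5, 0.5, 0.5], max_steps=10)
```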
Step 3. Input an unconstrained wrinkled document image into the trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and correct the unconstrained wrinkled document image with the predicted coordinate mapping matrix to obtain a corrected image.
In this embodiment, the unconstrained wrinkled document image can be a deformed image I_d with any form of wrinkle. As shown in FIG. 3, it can be a wrinkled document image with complete boundaries as in part (a), a wrinkled document image with no document boundary as in part (b), or a wrinkled document image with incomplete document boundaries as in part (c). The trained unconstrained document image correction network performs feature extraction and feature decoding and outputs the predicted coordinate mapping matrix f_b. A sampling algorithm (for example, bilinear interpolation) then corrects the unconstrained wrinkled document image according to f_b, producing the corrected image I_r.
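The final sampling step can be sketched with bilinear interpolation in NumPy. This is a minimal illustration, assuming the predicted matrix f_b is an `(h, w, 2)` array of possibly fractional `(x, y)` positions in the input image I_d; the function name `bilinear_remap` and the border clamping are choices made here, not details given in the patent.

```python
import numpy as np

def bilinear_remap(image, bm):
    """Sample `image` at the fractional (x, y) positions stored in `bm`.
    image: (H, W) or (H, W, C); bm: (h, w, 2). Returns a float array."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape[:2]
    # Clamp coordinates so that samples outside the image reuse border pixels.
    x = np.clip(bm[..., 0], 0, w - 1)
    y = np.clip(bm[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    if img.ndim == 3:  # broadcast the weights over the channel axis
        wx = wx[..., None]
        wy = wy[..., None]
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

I_d = np.array([[0.0, 10.0], [20.0, 30.0]])
f_b = np.array([[[0.5, 0.5], [1.0, 0.0]]])  # one output row, two pixels
I_r = bilinear_remap(I_d, f_b)
```

The first output pixel is the average of all four input pixels and the second is an exact copy of the top-right pixel, which shows how fractional and integer coordinates are treated.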
The above scheme provided by the embodiment of the present invention addresses the limited applicability of existing solutions, namely their inability to correct deformed document images that have no document boundary or only partial document boundaries. It also improves the correction and restoration of images with complete document boundaries. Compared with traditional methods, the present invention places no formal constraint on the input wrinkled document image and can correct the various deformed document images captured in daily life more robustly and accurately. It can be widely applied to smartphones and other portable devices with cameras, covering more application scenarios with higher accuracy. The present invention will therefore greatly promote the digitization of document images and provide strong technical support for the digital conversion of paper documents.
实施例二Embodiment two
本发明实施例提供一种无约束褶皱文档图像矫正系统,如图4所示,该系统主要包括:An embodiment of the present invention provides an unconstrained wrinkled document image correction system, as shown in Figure 4, the system mainly includes:
像素映射关系建模与样本对生成单元,用于建模褶皱文档图像到无形变文档图像的像素映射关系,生成样本对,其中,每一样本对包含无约束褶皱文档图像块以及无约束褶皱文档图像块至无形变文档图像块的坐标映射矩阵;The pixel mapping relationship modeling and sample pair generation unit is used to model the pixel mapping relationship from the wrinkled document image to the undeformed document image, and generate sample pairs, wherein each sample pair includes an unconstrained wrinkled document image block and an unconstrained wrinkled document A coordinate mapping matrix from the image block to the image block of the non-deformed document;
网络构建与训练单元,用于构建无约束的文档图像矫正网络,并利用多个样本对形成的训练数据集进行训练;The network construction and training unit is used to construct an unconstrained document image correction network, and use multiple samples to train the formed training data set;
图像矫正单元,用于将无约束褶皱文档图像输入至训练后的无约束的文档图像矫正网络,获得预测坐标映射矩阵,利用所述预测坐标映射矩阵对所述无约束褶皱文档图像矫正,获得矫正图像。The image correction unit is configured to input the unconstrained wrinkled document image to the trained unconstrained document image correction network to obtain a predicted coordinate mapping matrix, and use the predicted coordinate mapping matrix to correct the unconstrained wrinkled document image to obtain the corrected image.
In this embodiment, modeling the pixel mapping from the wrinkled document image to the undeformed document image and generating sample pairs includes:
obtaining a wrinkled document image with a complete boundary, then using its corresponding coordinate mapping matrix to correct it into an undeformed document image;
randomly cropping an image block from a region of the wrinkled document image with a complete boundary, called the unconstrained wrinkled document image block; locating the corresponding region in the undeformed document image according to the coordinate mapping matrix of that region, called the undeformed document image block; and cropping the same region from the coordinate mapping matrix, which gives the coordinate mapping matrix from the unconstrained wrinkled document image block to the undeformed document image block;
forming a sample pair from the resulting unconstrained wrinkled document image block and the coordinate mapping matrix from that block to the undeformed document image block.
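The cropping steps above can be illustrated with a minimal pure-Python sketch. It is not the patent's implementation: the function name `make_sample_pair`, the nested-list image representation, and the convention that `coord_map[y][x]` stores the target position `(u, v)` of wrinkled pixel `(x, y)` in the undeformed image are all assumptions made for illustration.

```python
import random


def make_sample_pair(warped, coord_map, patch_h, patch_w, rng=random):
    """Crop one (image block, coordinate map block) pair.

    warped    : H x W nested list of pixel values (full-boundary wrinkled image)
    coord_map : H x W nested list of (u, v) tuples. Assumed convention:
                coord_map[y][x] is where wrinkled pixel (x, y) lands in the
                undeformed image.
    """
    height, width = len(warped), len(warped[0])

    # Pick a random top-left corner so the patch stays inside the image.
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)

    # Crop the same region from the image and from the coordinate map.
    rows = range(top, top + patch_h)
    cols = slice(left, left + patch_w)
    patch = [warped[y][cols] for y in rows]
    patch_map = [coord_map[y][cols] for y in rows]

    # The mapped coordinates of the patch delimit the corresponding
    # region of the undeformed image (here: its bounding box).
    us = [u for row in patch_map for (u, _) in row]
    vs = [v for row in patch_map for (_, v) in row]
    flat_box = (min(us), min(vs), max(us), max(vs))
    return patch, patch_map, flat_box
```

The returned `patch` and `patch_map` together play the role of one sample pair; `flat_box` only shows how the corresponding undeformed region could be located. Whether the cropped coordinates are re-based to the undeformed block's origin is not stated in the text and is left out here.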
In this embodiment, constructing the unconstrained document image correction network and training it on the training data set formed from multiple sample pairs includes:
constructing an unconstrained document image correction network that contains a feature extractor and a feature decoder;
during training, taking the unconstrained wrinkled document image block of a sample pair as input, extracting features with the feature extractor, and outputting a predicted coordinate mapping matrix through the feature decoder; the coordinate mapping matrix from the unconstrained wrinkled document image block to the undeformed document image block in the sample pair serves as supervision, and a loss function built from it and the predicted coordinate mapping matrix is used to train the unconstrained document image correction network.
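The supervision described above compares the predicted coordinate mapping matrix with the one stored in the sample pair. The text does not name the loss, so the sketch below assumes a mean absolute (L1) error purely for illustration; the function name `l1_map_loss` and the nested-list layout of `(u, v)` tuples are hypothetical as well.

```python
def l1_map_loss(pred_map, target_map):
    """Mean absolute error between two coordinate maps.

    Both maps are H x W nested lists of (u, v) tuples. The L1 form is an
    assumption; the source only says a loss is built from the two maps.
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred_map, target_map):
        for (pu, pv), (tu, tv) in zip(pred_row, target_row):
            total += abs(pu - tu) + abs(pv - tv)
            count += 2  # two coordinates per pixel
    return total / count
```

In a real training loop this value would be computed by a deep learning framework on tensors so that gradients can flow back through the feature decoder and feature extractor.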
In this embodiment, correcting the unconstrained wrinkled document image with the predicted coordinate mapping matrix to obtain the corrected image includes:
using an up-sampling algorithm, correcting the unconstrained wrinkled document image through the predicted coordinate mapping matrix to obtain the corrected image.
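One way to read this step is sketched below under explicit assumptions: the network is taken to predict a coarse coordinate map, the up-sampling is taken to be bilinear interpolation of that map to the input resolution, and the map is taken to store, for each wrinkled pixel, its target position in the corrected image (in output pixel units). None of these choices is fixed by the text, and the names `upsample_map` and `rectify` are hypothetical.

```python
def upsample_map(coord_map, out_h, out_w):
    """Bilinearly resize an h x w map of (u, v) tuples to out_h x out_w."""
    in_h, in_w = len(coord_map), len(coord_map[0])
    out = []
    for y in range(out_h):
        # Source row position (corner-aligned grid).
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            pair = []
            for c in (0, 1):  # interpolate u, then v
                top = coord_map[y0][x0][c] * (1 - fx) + coord_map[y0][x1][c] * fx
                bot = coord_map[y1][x0][c] * (1 - fx) + coord_map[y1][x1][c] * fx
                pair.append(top * (1 - fy) + bot * fy)
            row.append(tuple(pair))
        out.append(row)
    return out


def rectify(image, coarse_map, out_h, out_w, fill=None):
    """Move every input pixel to its mapped position (nearest-pixel scatter)."""
    height, width = len(image), len(image[0])
    full_map = upsample_map(coarse_map, height, width)
    output = [[fill] * out_w for _ in range(out_h)]
    for y in range(height):
        for x in range(width):
            u, v = full_map[y][x]
            ui, vi = round(u), round(v)
            if 0 <= ui < out_w and 0 <= vi < out_h:
                output[vi][ui] = image[y][x]
    return output
```

A nearest-pixel scatter can leave unfilled positions (`fill`) where no input pixel lands; a practical system would more likely resample with interpolation, but the scatter keeps the sketch short and consistent with the assumed mapping direction.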
Embodiment three
The present invention also provides a processing device. As shown in Figure 5, it mainly includes: one or more processors; and a memory for storing one or more programs; where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method provided in the foregoing embodiments.
Further, the processing device also includes at least one input device and at least one output device; in the processing device, the processor, memory, input device, and output device are connected through a bus.
In this embodiment, the specific types of the memory, input device, and output device are not limited; for example:
the input device may be a touch screen, an image acquisition device, a smartphone, a physical button, a mouse, or the like;
the output device may be a display terminal;
the memory may be random access memory (RAM) or non-volatile memory, such as disk storage.
Embodiment four
The present invention also provides a readable storage medium storing a computer program; when the computer program is executed by a processor, the method provided in the foregoing embodiments is implemented.
In this embodiment, the readable storage medium is a computer-readable storage medium and may be provided in the foregoing processing device, for example as the memory of the processing device. The readable storage medium may also be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), a magnetic disk, or an optical disc.
The above is only a preferred specific embodiment of the present invention, and the scope of protection of the invention is not limited to it. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310392392.XA CN116403226A (en) | 2023-04-13 | 2023-04-13 | Unconstrained fold document image correction method, system, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310392392.XA CN116403226A (en) | 2023-04-13 | 2023-04-13 | Unconstrained fold document image correction method, system, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116403226A true CN116403226A (en) | 2023-07-07 |
Family ID: 87008754
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310392392.XA Pending CN116403226A (en) | 2023-04-13 | 2023-04-13 | Unconstrained fold document image correction method, system, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116403226A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116912831A (en) * | 2023-09-15 | 2023-10-20 | 东莞市将为防伪科技有限公司 | Method and system for processing acquired information of letter code anti-counterfeiting printed matter |
2023-04-13: CN application CN202310392392.XA (CN116403226A), status: active, pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022110638A1 (en) | | Human image restoration method and apparatus, electronic device, storage medium and program product |
CN110033410B (en) | | Image reconstruction model training method, image super-resolution reconstruction method and device |
WO2023212997A1 (en) | | Knowledge distillation based neural network training method, device, and storage medium |
CN108765479A (en) | | Using deep learning to monocular view estimation of Depth optimization method in video sequence |
CN109472249A (en) | | A kind of method and device of determining script superiority and inferiority grade |
JP2019117577A (en) | | Program, learning processing method, learning model, data structure, learning device and object recognition device |
CN116664397B (en) | | TransSR-Net structured image super-resolution reconstruction method |
CN112767270B (en) | | Fold document image correction system |
WO2023284401A1 (en) | | Image beautification processing method and apparatus, storage medium, and electronic device |
CN110599411A (en) | | Image restoration method and system based on condition generation countermeasure network |
CN110418139B (en) | | A kind of video super-resolution repair method, device, equipment and storage medium |
WO2024099026A1 (en) | | Image processing method and apparatus, device, storage medium and program product |
JP2023541351A (en) | | Character erasure model training method and device, translation display method and device, electronic device, storage medium, and computer program |
CN115187978A (en) | | A method for recognition of complex background seals based on deep learning |
WO2024027583A1 (en) | | Image processing method and apparatus, and electronic device and readable storage medium |
Verhoeven et al. | | UVDoc: neural grid-based document unwarping |
CN116403226A (en) | | Unconstrained fold document image correction method, system, equipment and storage medium |
CN112651911A (en) | | High dynamic range imaging generation method based on polarization image |
CN112288626A (en) | | A face illusion method and system based on dual-path deep fusion |
CN114612798B (en) | | Satellite image tampering detection method based on Flow model |
CN113240584B (en) | | Multitasking gesture picture super-resolution method based on picture edge information |
Wang et al. | | Perception-guided multi-channel visual feature fusion for image retargeting |
CN114241167A (en) | | A template-free virtual dressing method and device from video to video |
CN115731591A (en) | | Method, device and equipment for detecting makeup progress and storage medium |
CN113591846A (en) | | Image distortion coefficient extraction method, distortion correction method and system, and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||