CN114418869A - Method, system, device and medium for geometric correction of document image - Google Patents

Info

Publication number
CN114418869A
Authority
CN
China
Prior art keywords
document image
document
image
coordinate offset
offset matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111584077.4A
Other languages
Chinese (zh)
Other versions
CN114418869B (en)
Inventor
金连文
张家鑫
罗灿杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Original Assignee
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT and Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority to CN202111584077.4A
Publication of CN114418869A
Application granted
Publication of CN114418869B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30176 Document

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method, system, device and medium for geometric correction of a document image. The method comprises the following steps: acquiring a first document image, classifying the pixels in the first document image to distinguish the foreground document area from the environmental boundary area, and obtaining a mask map of the foreground document area; extracting control points on the mask map, preliminarily correcting the first document image according to the control points and deleting the environmental boundary, to obtain a second document image that is preliminarily corrected and free of the environmental boundary; and acquiring a first coordinate offset matrix of the second document image and offsetting the second document image according to the first coordinate offset matrix to obtain a corrected third document image. The invention can handle photographed document images with different environmental boundary areas, including images with a small environmental boundary area, a large environmental boundary area, or no environmental boundary area. The invention can be widely applied in the technical field of pattern recognition and artificial intelligence.

Description

A document image geometric correction method, system, device and medium

Technical field

The invention relates to the technical field of pattern recognition and artificial intelligence, and in particular to a method, system, device and medium for geometric correction of a document image.

Background

With the development of semiconductor technology, the built-in cameras of mobile devices have become increasingly advanced and their imaging quality increasingly high. Photographing a document with a built-in camera has become a convenient way to digitize it. However, perspective distortion caused by an unsuitable camera position or angle, together with deformations of the document itself such as bending, folding and wrinkling, gives the captured document image geometric distortion. This distortion degrades the performance of optical character recognition systems and harms the appearance and readability of the document image. Deep-learning-based methods for geometric correction of document images have made great progress in both correction performance and robustness to different document layouts. However, existing deep-learning correction methods focus only on cropped document images, that is, images with a small environmental boundary area, and they require a complete document boundary. In practice the environmental boundary varies: some document images have a large environmental boundary area in which the foreground document occupies only a small part, while others have no environmental boundary area and therefore no complete document boundary. The aforementioned deep-learning correction methods perform poorly on such images.

Summary of the invention

In order to solve, at least to some extent, one of the technical problems in the prior art, the purpose of the present invention is to provide a method, system, device and medium for geometric correction of a document image.

The technical solution adopted by the present invention is:

A document image geometric correction method, comprising the following steps:

acquiring a first document image, classifying the pixels in the first document image to distinguish the foreground document area from the environmental boundary area in the document image, and obtaining a mask map of the foreground document area;

extracting control points on the mask map, preliminarily correcting the first document image according to the control points and deleting the environmental boundary, to obtain a second document image that is preliminarily corrected and free of the environmental boundary;

acquiring a first coordinate offset matrix of the second document image, and offsetting the second document image according to the first coordinate offset matrix to obtain a corrected third document image.

Further, obtaining the corrected third document image includes:

determining, according to the first coordinate offset matrix, whether to perform an iterative step; if the iterative step is not needed, taking the third document image as the output image; otherwise, performing the iterative step;

the iterative step includes:

acquiring a second coordinate offset matrix of the third document image, offsetting the third document image according to the second coordinate offset matrix, updating the third document image to the resulting corrected image, and recording the second coordinate offset matrix of this iteration;

determining, according to the second coordinate offset matrix, whether to continue the iterative step; if so, returning to the previous step; otherwise, offsetting the second document image according to the first coordinate offset matrix and all recorded second coordinate offset matrices, and taking the corrected image as the output image.

Further, classifying the pixels in the first document image includes:

using a first deep convolutional neural network to obtain a classification confidence for each pixel position in the first document image, and classifying according to the classification confidence;

acquiring the first coordinate offset matrix of the second document image includes:

using a second deep convolutional neural network to obtain the first coordinate offset matrix of the second document image.
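As a minimal illustration of classifying by confidence, the sketch below turns a per-pixel confidence map into a binary foreground mask. The 0.5 threshold is the value the detailed embodiment gives for prediction; the function name, the list-of-rows representation and the strict comparison are assumptions made for this example only.

```python
def binarize_confidence(confidence, threshold=0.5):
    """Convert a per-pixel confidence map (rows of floats in [0, 1]) into a 0/1 mask.

    Assumption: a pixel counts as foreground when its confidence is strictly
    above the threshold; the source only states that 0.5 is used.
    """
    return [[1 if p > threshold else 0 for p in row] for row in confidence]


# Toy 2x3 confidence map standing in for the network output.
confidence = [[0.1, 0.7, 0.9],
              [0.4, 0.6, 0.2]]
mask = binarize_confidence(confidence)
```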

Further, extracting control points on the mask map, preliminarily correcting the first document image according to the control points, deleting the environmental boundary, and obtaining the second document image includes:

extracting the four corner points of the document on the mask map of the foreground document area using a polygon fitting algorithm;

drawing, according to a preset equal-division ratio, perpendicular dividing lines on each line segment whose endpoints are adjacent corner points, and taking the intersections of the dividing lines with the boundary of the foreground document area mask map as equal-division points of the document boundary;

drawing a quadrilateral mask map from the four corner points, and calculating the intersection over union (IoU) between the quadrilateral mask map and the mask map of the foreground document area;

if the IoU is smaller than a first preset threshold, performing no correction and taking the first document image as the second document image;

if the IoU is greater than the first preset threshold, taking the four corner points and several boundary equal-division points as control points, preliminarily correcting the first document image using a thin-plate spline interpolation algorithm, and deleting the environmental boundary, to obtain the second document image that is preliminarily corrected and free of the environmental boundary.
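The IoU check in the steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the patented implementation: the helper names, the half-plane rasterization of a convex quadrilateral, and the threshold value 0.9 are all assumptions, since the text does not state a value for the first preset threshold.

```python
def point_in_convex_quad(x, y, quad):
    """True if (x, y) is inside or on a convex quadrilateral.

    quad holds four (x, y) corners in consecutive order; the point is inside
    when it lies on the same side of all four edges.
    """
    crosses = []
    for i in range(4):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % 4]
        crosses.append((x2 - x1) * (y - y1) - (y2 - y1) * (x - x1))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)


def quad_mask(quad, height, width):
    """Rasterize the quadrilateral into a 0/1 mask of the given size."""
    return [[1 if point_in_convex_quad(x, y, quad) else 0 for x in range(width)]
            for y in range(height)]


def iou(mask_a, mask_b):
    """Intersection over union of two 0/1 masks of equal size."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 0.0


def has_complete_boundary(quad, fg_mask, iou_threshold):
    """Gate for the thin-plate spline step: IoU above the threshold."""
    height, width = len(fg_mask), len(fg_mask[0])
    return iou(quad_mask(quad, height, width), fg_mask) > iou_threshold


# Toy 6x6 foreground mask that is exactly the quadrilateral spanned by the corners.
fg_mask = [[1 if 1 <= x <= 4 and 1 <= y <= 4 else 0 for x in range(6)]
           for y in range(6)]
quad = [(1, 1), (4, 1), (4, 4), (1, 4)]
gate = has_complete_boundary(quad, fg_mask, iou_threshold=0.9)  # 0.9 is hypothetical
```

When the gate is false, the first document image would be passed on unchanged as the second document image, as the text specifies.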

Further, offsetting the second document image according to the first coordinate offset matrix to obtain the corrected third document image includes:

the first coordinate offset matrix assigns a two-dimensional offset vector to each pixel position in the second document image, and each pixel is shifted according to its offset vector to obtain the corrected third document image;

wherein the offset vector represents the offset direction and distance in the two-dimensional plane.

Further, determining according to the second coordinate offset matrix whether to continue the iterative step includes:

calculating the standard deviation of the second coordinate offset matrix;

if the standard deviation of the second coordinate offset matrix is greater than a second preset threshold, continuing the iterative step;

if the standard deviation of the second coordinate offset matrix is smaller than the second preset threshold, stopping the iterative step.
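The stopping rule above can be sketched as below. The text does not say how the standard deviation of a matrix of two-dimensional vectors is computed, so this example assumes a population standard deviation over all x and y components pooled together; the function names and the threshold 0.1 are likewise hypothetical.

```python
from statistics import pstdev


def offset_std(offsets):
    """Population standard deviation over every component of an offset field.

    offsets[y][x] is a (dx, dy) tuple. Pooling dx and dy is an assumption.
    """
    values = [component for row in offsets for vec in row for component in vec]
    return pstdev(values)


def should_continue(offsets, std_threshold):
    """Continue iterating while the offsets still vary more than the threshold."""
    return offset_std(offsets) > std_threshold


# A flat page yields near-zero offsets; a warped page yields varied offsets.
flat = [[(0.0, 0.0)] * 2 for _ in range(2)]
curved = [[(2.0, -2.0), (0.0, 0.0)],
          [(0.0, 0.0), (-2.0, 2.0)]]
```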

Further, offsetting the second document image according to the first coordinate offset matrix and all recorded second coordinate offset matrices, and taking the corrected image as the output image, includes:

summing the first coordinate offset matrix and the coordinate offset matrices recorded in the iterative step to obtain a final coordinate offset matrix, and offsetting the second document image according to the final coordinate offset matrix to obtain the corrected image as the output image.
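The summation described above is an element-wise addition of offset fields. A minimal sketch, assuming each field is a list of rows of (dx, dy) tuples with identical shape (the representation and the function name are choices made for this example):

```python
def sum_offset_fields(fields):
    """Add several offset fields element-wise into one final offset field."""
    height, width = len(fields[0]), len(fields[0][0])
    return [[(sum(f[y][x][0] for f in fields), sum(f[y][x][1] for f in fields))
             for x in range(width)]
            for y in range(height)]


# First offset field plus one field recorded during iteration (toy 1x2 example).
first = [[(1.0, 0.0), (0.5, 0.5)]]
recorded = [[(0.25, -0.5), (0.0, 0.25)]]
final = sum_offset_fields([first, recorded])
```

Applying the single summed field to the second document image means the image is resampled once rather than once per iteration.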

Another technical solution adopted by the present invention is:

A document image geometric correction system, comprising:

a pixel classification module, configured to acquire a first document image, classify the pixels in the first document image to distinguish the foreground document area from the environmental boundary area in the document image, and obtain a mask map of the foreground document area;

a preliminary correction module, configured to extract control points on the mask map, preliminarily correct the first document image according to the control points and delete the environmental boundary, to obtain a second document image that is preliminarily corrected and free of the environmental boundary;

an offset correction module, configured to acquire a first coordinate offset matrix of the second document image, and offset the second document image according to the first coordinate offset matrix to obtain a corrected third document image.

Another technical solution adopted by the present invention is:

A document image geometric correction device, comprising:

at least one processor;

at least one memory for storing at least one program;

when the at least one program is executed by the at least one processor, the at least one processor implements the method described above.

Another technical solution adopted by the present invention is:

A computer-readable storage medium storing a processor-executable program which, when executed by a processor, performs the method described above.

The beneficial effects of the present invention are: the present invention can process photographed document images with different environmental boundary areas, including images with a small environmental boundary area, a large environmental boundary area, or no environmental boundary area.

Description of drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings are introduced below. It should be understood that the drawings described below only serve to present some embodiments of the technical solutions of the present invention conveniently and clearly; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is an overall flow chart of a method for geometric correction of photographed document images suitable for multiple environmental boundary conditions in an embodiment of the present invention;

FIG. 2 is a schematic diagram of control point extraction and preliminary correction, yielding a document image with the environmental boundary area removed, in an embodiment of the present invention;

FIG. 3 is a schematic diagram of iterative correction in an embodiment of the present invention;

FIG. 4 shows the correction results for document images with different environmental boundary conditions in an embodiment of the present invention;

FIG. 5 is a flow chart of the steps of a document image geometric correction method in an embodiment of the present invention.

Detailed description

The embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and should not be construed as limiting it. The step numbers in the following embodiments are set only for ease of explanation and do not restrict the order of the steps; the execution order of the steps in the embodiments can be adjusted as understood by those skilled in the art.

In the description of the present invention, it should be understood that orientation descriptions such as up, down, front, rear, left and right indicate orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they therefore should not be construed as limiting the present invention.

In the description of the present invention, "several" means one or more and "a plurality" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including it. Where "first" and "second" are used, they serve only to distinguish technical features and should not be understood as indicating or implying relative importance, the number of the indicated technical features, or their order.

In the description of the present invention, unless otherwise expressly defined, words such as "setting", "installation" and "connection" should be understood in a broad sense, and those skilled in the art can reasonably determine their specific meanings in the present invention in light of the specific content of the technical solution.

As shown in FIG. 5, this embodiment provides a document image geometric correction method, including the following steps:

S101. Acquire a first document image, classify the pixels in the first document image to distinguish the foreground document area from the environmental boundary area in the document image, and obtain a mask map of the foreground document area;

S102. Extract control points on the mask map, preliminarily correct the first document image according to the control points and delete the environmental boundary, to obtain a second document image that is preliminarily corrected and free of the environmental boundary;

S103. Acquire a first coordinate offset matrix of the second document image, and offset the second document image according to the first coordinate offset matrix to obtain a corrected third document image.

In this embodiment, the first document image may be a document image captured by a photographing device (such as a smart terminal or a camera), or a document image obtained by scanning.

In some embodiments, obtaining the corrected third document image in step S103 specifically includes:

determining, according to the first coordinate offset matrix, whether to perform the iterative step; if the iterative step is not needed, taking the third document image as the output image; otherwise, performing the iterative step;

The iterative step includes A1-A2:

A1. Acquire a second coordinate offset matrix of the third document image, offset the third document image according to the second coordinate offset matrix, update the third document image to the resulting corrected image, and record the second coordinate offset matrix of this iteration;

A2. Determine, according to the second coordinate offset matrix, whether to continue the iterative step; if so, return to the previous step; otherwise, offset the second document image according to the first coordinate offset matrix and all recorded second coordinate offset matrices, and take the corrected image as the output image.

In this embodiment, statistical information of the coordinate offset matrix reflects how flat the input image is; a low response means the input image is relatively flat. When the statistic of the first coordinate offset matrix is below the preset threshold, the third document image is output directly. When it is above the preset threshold, the iterative step is performed. As the number of iterations grows, the response of the coordinate offset matrix becomes lower and lower, until the statistic for the current document image falls below the preset threshold; all obtained coordinate offset matrices are then added together and used to offset the second document image, giving the corrected image as the output image.
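The control flow of this iterative correction can be sketched as follows. Everything here is a stand-in chosen to keep the example runnable: `predict`, `warp` and `statistic` are hypothetical callables (in the described method they would be the second deep convolutional neural network, the pixel offset operation and the offset-matrix statistic), the "image" is a toy list of residual distortions, and `max_iters` is a safety cap that the source does not mention.

```python
def iterative_rectify(second_image, predict, warp, statistic, threshold, max_iters=10):
    """Predict offsets, warp, and repeat until the offset statistic is small.

    Returns the output image, obtained by warping the ORIGINAL second image
    once with the sum of all recorded offsets, and the number of predictions.
    """
    recorded = []
    current = second_image
    for _ in range(max_iters):
        offsets = predict(current)
        recorded.append(offsets)
        current = warp(current, offsets)      # intermediate "third image"
        if statistic(offsets) <= threshold:   # flat enough: stop iterating
            break
    total = [sum(values) for values in zip(*recorded)]
    return warp(second_image, total), len(recorded)


# Toy stand-ins: each entry is a residual distortion, and the hypothetical
# predictor removes half of the remaining residual on every call.
def toy_predict(image):
    return [-v / 2 for v in image]


def toy_warp(image, offsets):
    return [v + o for v, o in zip(image, offsets)]


def toy_statistic(offsets):
    return max(abs(o) for o in offsets)


output, steps = iterative_rectify([4.0, -4.0], toy_predict, toy_warp,
                                  toy_statistic, threshold=0.6)
```

In this toy run the statistic drops from 2.0 to 1.0 to 0.5, so the loop stops after the third prediction and the summed offsets are applied once to the starting image.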

The above method is explained in detail below with reference to specific embodiments and the accompanying drawings.

As shown in FIG. 1, this embodiment provides a geometric correction method for photographed document images suitable for various environmental boundary conditions, which addresses the geometric correction of photographed document images with different environmental boundary conditions in real scenes. The method specifically includes the following steps:

S1. Classify each pixel of the input photographed document image (that is, the first document image) to distinguish the foreground document area from the environmental boundary area, and obtain a mask map of the foreground document area. This separates the foreground document area precisely from the document image.

S2. Extract control points on the mask map, use the control points to preliminarily correct the input document image while removing the environmental boundary, and pass the resulting document image (that is, the second document image) to the next step as its input.

Because the mask map is a binary map, extracting control points on it is much easier than extracting them directly from the input photographed document image.

Specifically, as shown in FIG. 2, step S2 includes steps S21-S24:

S21. Use the Douglas-Peucker polygon fitting algorithm to extract the four corner points of the document on the foreground document area mask map;

S22. Identify the upper-left, upper-right, lower-right and lower-left corner points according to the relative positions of the four corner points;

S23. According to a preset equal-division ratio, draw perpendicular dividing lines on each line segment whose endpoints are adjacent corner points, and take the intersections of the dividing lines with the boundary of the foreground document area mask map as equal-division points of the document boundary;

S24. Draw a quadrilateral mask map from the four corner points and compute its intersection over union (IoU) with the foreground document area mask map. If the IoU is smaller than the preset threshold (a small IoU indicates that the input photographed document image does not contain a complete document boundary), no correction is performed and the input document image is passed to the next step as its input. If the IoU is greater than the preset threshold (a large IoU indicates that the input photographed document image contains a complete document boundary), the four corner points and several boundary equal-division points are taken as control points, the input document image is preliminarily corrected using the thin-plate spline interpolation algorithm while the environmental boundary is removed, and the result is passed to the next step as its input. The preliminary correction relies on the document boundary, so applying the thin-plate spline correction to an input document image without a complete document boundary would be unreasonable. Setting an IoU threshold screens out such images, which skip the thin-plate spline correction and go directly to the next step.
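Step S22 does not spell out how the corner points are labelled. One common heuristic, shown here purely as an assumption, uses the coordinate sum and difference in image coordinates (x to the right, y downward): the upper-left corner has the smallest x + y, the lower-right the largest, the upper-right the largest x - y, and the lower-left the smallest. The heuristic can fail for strongly rotated documents.

```python
def order_corners(corners):
    """Return (upper_left, upper_right, lower_right, lower_left).

    corners is a list of four (x, y) points in image coordinates. This is a
    sum/difference heuristic, not necessarily the rule used in the patent.
    """
    upper_left = min(corners, key=lambda p: p[0] + p[1])
    lower_right = max(corners, key=lambda p: p[0] + p[1])
    upper_right = max(corners, key=lambda p: p[0] - p[1])
    lower_left = min(corners, key=lambda p: p[0] - p[1])
    return upper_left, upper_right, lower_right, lower_left


# Four corner points in arbitrary order, as a polygon fit might return them.
corners = [(90, 80), (10, 12), (95, 8), (5, 85)]
ordered = order_corners(corners)
```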

S3. Predict a coordinate offset matrix for the document image, and offset the document image according to it to obtain a corrected document image. The corrected document image can be fed back as the input of step S3 for iterative correction; whether to iterate is determined adaptively from statistical information of the coordinate offset matrix. When the iteration stops, the final corrected document image is obtained from the coordinate offset matrices collected.

In some optional embodiments, the coordinate offset matrix in step S3 assigns a two-dimensional offset vector to each pixel position in the input image; the offset vector indicates the offset direction and distance in the two-dimensional plane. The corrected document image is obtained after the pixels are shifted by their offset vectors, and the shifting uses linear interpolation.
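A minimal pure-Python sketch of offsetting an image with per-pixel offset vectors and linear (bilinear) interpolation is given below. It assumes backward sampling, meaning output pixel (x, y) reads the input at (x + dx, y + dy), and clamps samples at the image border; the text fixes neither convention, and the function names are hypothetical.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a grayscale image (rows of floats) at a real-valued position."""
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)    # clamp to the image border
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp(image, offsets):
    """Backward-warp: output (x, y) takes the input value at (x + dx, y + dy)."""
    return [[bilinear_sample(image, x + offsets[y][x][0], y + offsets[y][x][1])
             for x in range(len(image[0]))]
            for y in range(len(image))]


# Toy 2x3 image sampled half a pixel to the right everywhere.
image = [[0.0, 10.0, 20.0],
         [30.0, 40.0, 50.0]]
half_pixel_right = [[(0.5, 0.0)] * 3 for _ in range(2)]
warped = warp(image, half_pixel_right)
```

Each interpolated resampling smooths the image slightly, which is the effect behind the blur concern raised for repeated warping.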

在一些可选的实施例中,步骤S3采用标准差作为坐标偏移矩阵的统计信息,因为随着迭代次数的增加,输入图像越来越平整,相对应预测得到的坐标偏移矩阵的相应就会越来越低,其标准差也会越来越小,标准差小于一定阈值时,我们可以认为其对应的输入文档图像已经足够平整,因此可以停止迭代,反之则迭代继续。通过这种方式可以取得矫正性能和矫正效率的平衡,使得系统更加高效。In some optional embodiments, the standard deviation is used as the statistical information of the coordinate offset matrix in step S3, because as the number of iterations increases, the input image becomes more and more flat, and the corresponding predicted coordinate offset matrix is It will become lower and lower, and its standard deviation will also become smaller and smaller. When the standard deviation is less than a certain threshold, we can consider that the corresponding input document image is flat enough, so the iteration can be stopped, otherwise, the iteration will continue. In this way, a balance between correction performance and correction efficiency can be achieved, making the system more efficient.

In some optional embodiments, as shown in FIG. 3, performing correction based on the several coordinate offset matrices in step S3 includes: first summing the obtained coordinate offset matrices [Figure BDA0003427328280000061] to obtain the final coordinate offset matrix [Figure BDA0003427328280000062], and then applying the final coordinate offset matrix [Figure BDA0003427328280000063] to the output of step S2 to obtain the final corrected document image. The document image obtained by correcting with [Figure BDA0003427328280000071] is not used directly as the final output because, at that point, the document image has been resampled several times, which causes blurring. The corrected document image obtained by applying [Figure BDA0003427328280000072] to the output of step S2 has been sampled only once, which effectively avoids blurring.

In some optional embodiments, step S1 uses a deep convolutional neural network to obtain the classification confidence of each pixel position. The network parameters are trained in advance on synthetic data; at prediction time, the confidence map is binarized with a threshold of 0.5 to obtain the final foreground document area mask map. The network parameter optimization specifically includes:

(1) Data acquisition: 100,000 data samples from the public Doc3D synthetic dataset (each sample consists of an input captured document image and its corresponding foreground document area mask map) are used, 90,000 for training and 10,000 for validation;

(2) Network training:

(2-1) Building the deep neural network: the DeepLabv3+ segmentation model is used as the network structure, with the number of output classes set to 1, i.e. the output of the last layer of the network has one channel.

(2-2) Training method: the network is trained with the gradient descent algorithm; gradients are computed at the last layer and propagated backward layer by layer to update all parameters. The loss function during training is the binary cross-entropy loss.

(2-3) Training parameter settings:

Number of iterations: 50 epochs

Optimizer: Adam

Learning rate: 0.0001 (update strategy: the learning rate is halved after every 5 iterations)

Weight decay: 0.0005

(2-4) Training of the deep neural network starts from randomly initialized parameters.
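Two of the settings above are simple enough to write down directly: the step-wise learning-rate decay and the 0.5 binarization of the confidence map. The following is a minimal sketch, not the patent's training code. The function names are hypothetical, epochs are assumed to be counted from zero, the "every 5 iterations" in the decay rule is assumed to mean every 5 epochs, and a pixel is assumed to be foreground when its confidence is strictly greater than the threshold (the text does not say how ties are handled).

```python
def learning_rate(epoch, base_lr=1e-4, step=5, factor=0.5):
    """Step decay: start at base_lr and multiply by factor after every `step` epochs."""
    return base_lr * factor ** (epoch // step)


def binarize(confidence, threshold=0.5):
    """Turn a 2-D confidence map into a 0/1 foreground mask."""
    return [[1 if p > threshold else 0 for p in row] for row in confidence]
```

Under these assumptions the learning rate is 0.0001 for epochs 0 to 4, 0.00005 for epochs 5 to 9, and so on, reaching 0.0001 * 0.5^9 during the last five of the 50 epochs.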

In some optional embodiments, step S3 uses a deep convolutional neural network to obtain the coordinate offset matrix. The network parameters are trained in advance on synthetic data, which specifically includes:

(1) Data acquisition: 100,000 data samples from the public Doc3D synthetic dataset (each sample consists of an input captured document image and its corresponding coordinate offset matrix) are used, 90,000 for training and 10,000 for validation. Before training, the samples are processed by the aforementioned steps S1 and S2 to remove the environmental boundary;

(2) Network training:

(2-1) Building the deep neural network: the network adopts an encoder-decoder structure that first downsamples and then upsamples, with skip connections to preserve detail features and ease gradient back-propagation, as shown in Table 1 below:

Table 1

Network layer | Operation | Feature map size
Input layer | - | 3*448*448
Convolutional layer | 32 kernels, 3*3 kernel, stride 1*1, padded | 32*448*448
Nonlinear layer | - | 32*448*448
Pooling layer | 2*2 pooling kernel, stride 2*2 | 32*224*224
Convolutional layer | 64 kernels, 3*3 kernel, stride 1*1, padded | 64*224*224
Nonlinear layer | - | 64*224*224
Pooling layer | 2*2 pooling kernel, stride 2*2 | 64*112*112
Convolutional layer | 128 kernels, 3*3 kernel, stride 1*1, padded | 128*112*112
Nonlinear layer | - | 128*112*112
Pooling layer | 2*2 pooling kernel, stride 2*2 | 128*56*56
Convolutional layer | 256 kernels, 3*3 kernel, stride 1*1, padded | 256*56*56
Nonlinear layer | - | 256*56*56
Pooling layer | 2*2 pooling kernel, stride 2*2 | 256*28*28
Convolutional layer | 512 kernels, 3*3 kernel, stride 1*1, padded | 512*28*28
Nonlinear layer | - | 512*28*28
Transposed convolutional layer | 256 kernels, 4*4 kernel, stride 2*2, padded | 256*56*56
Nonlinear layer | - | 256*56*56
Skip connection layer | Concatenate the corresponding feature map from the downsampling path along the channel dimension | 512*56*56
Convolutional layer | 256 kernels, 3*3 kernel, stride 1*1, padded | 256*56*56
Nonlinear layer | - | 256*56*56
Transposed convolutional layer | 128 kernels, 4*4 kernel, stride 2*2, padded | 128*112*112
Nonlinear layer | - | 128*112*112
Skip connection layer | Concatenate the corresponding feature map from the downsampling path along the channel dimension | 256*112*112
Convolutional layer | 128 kernels, 3*3 kernel, stride 1*1, padded | 128*112*112
Nonlinear layer | - | 128*112*112
Transposed convolutional layer | 64 kernels, 4*4 kernel, stride 2*2, padded | 64*224*224
Nonlinear layer | - | 64*224*224
Skip connection layer | Concatenate the corresponding feature map from the downsampling path along the channel dimension | 128*224*224
Convolutional layer | 64 kernels, 3*3 kernel, stride 1*1, padded | 64*224*224
Nonlinear layer | - | 64*224*224
Transposed convolutional layer | 32 kernels, 4*4 kernel, stride 2*2, padded | 32*448*448
Nonlinear layer | - | 32*448*448
Skip connection layer | Concatenate the corresponding feature map from the downsampling path along the channel dimension | 64*448*448
Convolutional layer | 32 kernels, 3*3 kernel, stride 1*1, padded | 32*448*448
Nonlinear layer | - | 32*448*448
Convolutional layer | 2 kernels, 3*3 kernel, stride 1*1, padded | 2*448*448
Nonlinear layer | - | 2*448*448

(2-2) Training method: the network is trained with the gradient descent algorithm; gradients are computed at the last layer and propagated backward layer by layer to update all parameters. The loss function during training is the mean squared error loss.

(2-3) Training parameter settings:

Number of iterations: 50 epochs

Optimizer: Adam

Learning rate: 0.0001 (update strategy: the learning rate is halved after every 5 iterations)

Weight decay: 0.0005

(2-4) Training of the deep neural network starts from randomly initialized parameters.
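The feature-map sizes listed in Table 1 can be checked with a short shape trace. The sketch below is a minimal illustration rather than the patent's implementation: the helper names are hypothetical, and the padding amounts (1 for the 3*3 convolutions and 1 for the 4*4 transposed convolutions) are assumptions, since Table 1 only states that padding is applied. The 512-channel bottleneck is taken from the feature map size column of Table 1.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size, kernel, stride, pad):
    """Spatial output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * pad + kernel


def trace_shapes(in_size=448, widths=(32, 64, 128, 256), bottleneck=512):
    """Return (layer, channels, size) triples for the encoder-decoder of Table 1."""
    shapes, skips = [], []
    size = in_size
    for width in widths:                       # encoder: conv then 2*2 pooling
        size = conv_out(size, 3, 1, 1)
        shapes.append(("conv", width, size))
        skips.append((width, size))            # kept for the skip connection
        size = conv_out(size, 2, 2, 0)
        shapes.append(("pool", width, size))
    size = conv_out(size, 3, 1, 1)
    shapes.append(("conv", bottleneck, size))
    for width in reversed(widths):             # decoder: upsample, concat, conv
        size = deconv_out(size, 4, 2, 1)
        shapes.append(("deconv", width, size))
        skip_channels, skip_size = skips.pop()
        assert skip_size == size               # concatenation needs equal spatial size
        shapes.append(("concat", width + skip_channels, size))
        size = conv_out(size, 3, 1, 1)
        shapes.append(("conv", width, size))
    shapes.append(("conv", 2, conv_out(size, 3, 1, 1)))  # 2-channel offset output
    return shapes
```

The two output channels correspond to the two components of the per-pixel offset vector, and the trace ends at the same 448*448 resolution as the input.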

As shown in FIG. 4, the method provided in this embodiment can process captured document images under a variety of environmental boundary conditions and achieves good correction results in each case. In summary, the method proposed in this embodiment can process captured document images with different environmental boundary areas, including images with a small environmental boundary area, a large environmental boundary area, or no environmental boundary area. In addition, for document images with different degrees of geometric deformation, the method can adaptively determine the number of iterations to achieve a better correction result.

This embodiment also provides a document image geometric correction system, including:

a pixel classification module, configured to acquire a first document image, classify the pixels in the first document image to distinguish the foreground document area from the environmental boundary area in the document image, and obtain a mask map of the foreground document area;

a preliminary correction module, configured to extract control points on the mask map, perform preliminary correction on the first document image according to the control points and remove the environmental boundary, to obtain a second document image that is preliminarily corrected and free of the environmental boundary;

an offset correction module, configured to acquire a first coordinate offset matrix of the second document image and offset the second document image according to the first coordinate offset matrix to obtain a corrected third document image.
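The data flow between the three modules can be summarised as a small composition. The sketch below is only a structural illustration: the class name, field names, and `run` method are hypothetical, and each module is represented by an arbitrary callable rather than by the networks and algorithms described in the method embodiments.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class GeometricCorrectionSystem:
    """Hypothetical wiring of the three modules; each field is a stand-in callable."""

    classify_pixels: Callable[[Any], Any]            # first image -> foreground mask
    preliminary_correct: Callable[[Any, Any], Any]   # (first image, mask) -> second image
    offset_correct: Callable[[Any], Any]             # second image -> third image

    def run(self, first_image: Any) -> Any:
        mask = self.classify_pixels(first_image)
        second_image = self.preliminary_correct(first_image, mask)
        return self.offset_correct(second_image)
```

Keeping the modules as separate callables mirrors the description: the mask is consumed only by the preliminary correction module, and the offset correction module sees only the boundary-free second image.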

The document image geometric correction system of this embodiment can perform the document image geometric correction method provided by the method embodiments of the present invention, can perform any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.

This embodiment also provides a document image geometric correction device, including:

at least one processor;

at least one memory for storing at least one program;

when the at least one program is executed by the at least one processor, the at least one processor implements the method shown in FIG. 5.

The document image geometric correction device of this embodiment can perform the document image geometric correction method provided by the method embodiments of the present invention, can perform any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.

The embodiments of the present application further disclose a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium and execute them, causing the computer device to perform the method shown in FIG. 5.

This embodiment also provides a storage medium storing instructions or a program capable of executing the document image geometric correction method provided by the method embodiments of the present invention. When the instructions or program are run, any combination of the implementation steps of the method embodiments can be performed, with the corresponding functions and beneficial effects of the method.

In some alternative embodiments, the functions/operations noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functions/operations involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example, in order to give a more complete understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of the various operations is altered and in which sub-operations described as part of a larger operation are performed independently.

Furthermore, although the invention is described in the context of functional modules, it should be understood that, unless stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for understanding the present invention. Rather, given the attributes, functions, and internal relationships of the various functional modules in the devices disclosed herein, the actual implementation of the modules is within the routine skill of an engineer. Accordingly, those skilled in the art can, using ordinary skill, implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to limit the scope of the invention, which is determined by the full scope of the appended claims and their equivalents.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that the various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.

In the above description of this specification, references to the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention. The scope of the invention is defined by the claims and their equivalents.

The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to these embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the present invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of the present application.

Claims (10)

1. A document image geometric correction method, characterized by comprising the following steps:
acquiring a first document image, classifying the pixels in the first document image to distinguish the foreground document area from the environmental boundary area in the document image, and obtaining a mask map of the foreground document area;
extracting control points on the mask map, performing preliminary correction on the first document image according to the control points and removing the environmental boundary, to obtain a second document image that is preliminarily corrected and free of the environmental boundary;
acquiring a first coordinate offset matrix of the second document image, and offsetting the second document image according to the first coordinate offset matrix to obtain a corrected third document image.

2. The document image geometric correction method according to claim 1, characterized in that obtaining the corrected third document image comprises:
determining, according to the first coordinate offset matrix, whether to perform an iterative step; if the iterative step is not needed, taking the third document image as the output image; otherwise, performing the iterative step;
the iterative step comprising:
acquiring a second coordinate offset matrix of the third document image, offsetting the third document image according to the second coordinate offset matrix, updating the third document image to the corrected image, and recording the second coordinate offset matrix of the iterative step;
determining, according to the second coordinate offset matrix, whether to continue the iterative step; if so, returning to the previous step; otherwise, offsetting the second document image according to the first coordinate offset matrix and all recorded second coordinate offset matrices, and taking the corrected image as the output image.

3. The document image geometric correction method according to claim 1, characterized in that classifying the pixels in the first document image comprises:
using a first deep convolutional neural network to obtain the classification confidence of each pixel position in the first document image, and classifying according to the classification confidence;
and acquiring the first coordinate offset matrix of the second document image comprises:
using a second deep convolutional neural network to obtain the first coordinate offset matrix of the second document image.

4. The document image geometric correction method according to claim 1, characterized in that extracting control points on the mask map, performing preliminary correction on the first document image according to the control points and removing the environmental boundary, to obtain the second document image that is preliminarily corrected and free of the environmental boundary, comprises:
extracting the four corner points of the document on the mask map of the foreground document area using a polygon fitting algorithm;
drawing perpendicular dividing lines, according to a preset division ratio, on the line segments whose endpoints are adjacent corner points, and taking the intersections of the dividing lines with the boundary of the foreground document area mask map as division points of the document boundary;
drawing a quadrilateral mask map from the four corner points, and computing the intersection-over-union of the quadrilateral mask map and the mask map of the foreground document area;
if the intersection-over-union is less than a first preset threshold, performing no correction and taking the first document image as the second document image;
if the intersection-over-union is greater than the first preset threshold, taking the four corner points and the boundary division points as control points, and using a thin-plate spline interpolation algorithm to perform preliminary correction on the first document image and remove the environmental boundary, to obtain the second document image that is preliminarily corrected and free of the environmental boundary.

5. The document image geometric correction method according to claim 1, characterized in that offsetting the second document image according to the first coordinate offset matrix to obtain the corrected third document image comprises:
the first coordinate offset matrix specifying a two-dimensional offset vector for each pixel position in the second document image, and each pixel being offset according to its corresponding offset vector to obtain the corrected third document image;
wherein the offset vector represents the direction and distance of the offset in the two-dimensional plane.

6. The document image geometric correction method according to claim 2, characterized in that determining, according to the second coordinate offset matrix, whether to continue the iterative step comprises:
computing the standard deviation of the second coordinate offset matrix;
if the standard deviation of the second coordinate offset matrix is greater than a second preset threshold, continuing the iterative step;
if the standard deviation of the second coordinate offset matrix is less than the second preset threshold, stopping the iterative step.

7. The document image geometric correction method according to claim 2, characterized in that offsetting the second document image according to the first coordinate offset matrix and all recorded second coordinate offset matrices, and taking the corrected image as the output image, comprises:
summing the first coordinate offset matrix [Figure FDA0003427328270000021] and the coordinate offset matrices recorded in the iterative step [Figure FDA0003427328270000022] to obtain the final coordinate offset matrix [Figure FDA0003427328270000023];
offsetting the second document image according to the final coordinate offset matrix [Figure FDA0003427328270000024], and taking the corrected image as the output image.

8. A document image geometric correction system, characterized by comprising:
a pixel classification module, configured to acquire a first document image, classify the pixels in the first document image to distinguish the foreground document area from the environmental boundary area in the document image, and obtain a mask map of the foreground document area;
a preliminary correction module, configured to extract control points on the mask map, perform preliminary correction on the first document image according to the control points and remove the environmental boundary, to obtain a second document image that is preliminarily corrected and free of the environmental boundary;
an offset correction module, configured to acquire a first coordinate offset matrix of the second document image and offset the second document image according to the first coordinate offset matrix to obtain a corrected third document image.

9. A document image geometric correction device, characterized by comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor implements the method of any one of claims 1-7.

10. A computer-readable storage medium storing a processor-executable program, characterized in that the processor-executable program, when executed by a processor, is used to perform the method of any one of claims 1-7.
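For illustration only, the intersection-over-union test recited in claim 4 can be sketched as follows. This is not part of the claims and not the patent's implementation: the function names are hypothetical, masks are assumed to be equally sized nested lists of 0/1 values, and the threshold is left as a required argument because the claims do not give its value.

```python
def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two equally sized binary masks."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    return intersection / union if union else 0.0


def needs_preliminary_correction(quad_mask, document_mask, iou_threshold):
    """True when the quadrilateral mask agrees well enough with the foreground mask."""
    return mask_iou(quad_mask, document_mask) > iou_threshold
```

When the function returns False, the first document image would simply be passed on as the second document image, matching the no-correction branch of claim 4.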
CN202111584077.4A 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium Active CN114418869B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111584077.4A CN114418869B (en) 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111584077.4A CN114418869B (en) 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium

Publications (2)

Publication Number Publication Date
CN114418869A true CN114418869A (en) 2022-04-29
CN114418869B CN114418869B (en) 2024-08-13

Family

ID=81267830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111584077.4A Active CN114418869B (en) 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium

Country Status (1)

Country Link
CN (1) CN114418869B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190294970A1 (en) * 2018-03-23 2019-09-26 The Governing Council Of The University Of Toronto Systems and methods for polygon object annotation and a method of training an object annotation system
CN111401371A (en) * 2020-06-03 2020-07-10 中邮消费金融有限公司 Text detection and identification method and system and computer equipment
CN111414915A (en) * 2020-02-21 2020-07-14 华为技术有限公司 Character recognition method and related equipment
CN112767270A (en) * 2021-01-19 2021-05-07 中国科学技术大学 Fold document image correction system
CN112766266A (en) * 2021-01-29 2021-05-07 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
KR20210112992A (en) * 2020-03-06 2021-09-15 주식회사 테스트웍스 System and method of quality adjustment of object detection based on polyggon

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAXIN ZHANG ET AL.: "Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild", 《ARXIV:2207.11515V1》, 23 July 2022 (2022-07-23), pages 1 - 11 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187995A (en) * 2022-07-08 2022-10-14 北京百度网讯科技有限公司 Document correction method, device, electronic equipment and storage medium
CN116030120A (en) * 2022-09-09 2023-04-28 北京市计算中心有限公司 Method for identifying and correcting hexagons
CN116030120B (en) * 2022-09-09 2023-11-24 北京市计算中心有限公司 Method for identifying and correcting hexagons
CN117853382A (en) * 2024-03-04 2024-04-09 武汉人工智能研究院 Sparse marker-based image correction method, device and storage medium
CN117853382B (en) * 2024-03-04 2024-05-28 武汉人工智能研究院 Sparse marker-based image correction method, device and storage medium

Also Published As

Publication number Publication date
CN114418869B (en) 2024-08-13

Similar Documents

Publication Publication Date Title
CN107704857B (en) An end-to-end lightweight license plate recognition method and device
CN114418869A (en) Method, system, device and medium for geometric correction of document image
CN109753971B (en) Correction method and device for distorted text lines, character recognition method and device
CN111209858B (en) Real-time license plate detection method based on deep convolutional neural network
CN115331245B (en) Table structure identification method based on image instance segmentation
CN115272691A (en) A training method, identification method and equipment for detecting model of steel bar binding state
CN105701770B (en) A kind of human face super-resolution processing method and system based on context linear model
CN116342600B (en) Segmentation method of cell nuclei in thymoma histopathological image
WO2021147437A1 (en) Identity card edge detection method, device, and storage medium
CN111444923A (en) Method and device for image semantic segmentation in natural scenes
CN113436220B (en) An Image Background Estimation Method Based on Depth Map Segmentation
CN113744142A (en) Image restoration method, electronic device and storage medium
CN110503651A (en) Method and device for image salient object segmentation
CN113033558B (en) Text detection method and device for natural scene and storage medium
CN116740528A (en) A method and system for target detection in side scan sonar images based on shadow features
CN112686247A (en) Identification card number detection method and device, readable storage medium and terminal
CN115984666A (en) Cross-channel pyramid pooling method, system, convolutional neural network and processing method
CN112132054A (en) Document positioning and segmenting method based on deep learning
CN114627484A (en) A complex multi-scene document segmentation method, system, device and medium
CN113808033A (en) Image document correction method, system, terminal and medium
WO2024174726A1 (en) Handwritten and printed text detection method and device based on deep learning
CN117409057A (en) Panorama depth estimation method, equipment and medium
CN110826564A (en) Small target semantic segmentation method and system in complex scene image
CN111666949A (en) Image semantic segmentation method based on iterative segmentation
CN113159020B (en) Text Detection Method Based on Kernel Scale Expansion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant