WO2023123763A1 - Method and device for correcting the direction of a document image - Google Patents

Method and device for correcting the direction of a document image

Info

Publication number
WO2023123763A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
degrees
angle
document
trimming
Prior art date
Application number
PCT/CN2022/088550
Inventor
刘鹏伟
郭丰俊
龙腾
丁凯
张彬
镇立新
Original Assignee
上海合合信息科技股份有限公司
上海临冠数据科技有限公司
上海生腾数据科技有限公司
上海盈五蓄数据科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海合合信息科技股份有限公司, 上海临冠数据科技有限公司, 上海生腾数据科技有限公司, 上海盈五蓄数据科技有限公司
Publication of WO2023123763A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition

Definitions

  • the present application relates to a method for correcting the direction of a document image.
  • a document image refers to a document in an image format, and is usually a document converted from a paper document into an image format by means of photographing, scanning, or the like.
  • the direction in which the document can be read correctly is regarded as the correct direction; some document images are not in the correct direction, for example because they are turned upside down by 180 degrees.
  • OCR: optical character recognition
  • the Chinese invention patent application "Text Recognition Method, Device, Equipment and Medium Based on Direction Detection" (application publication number CN112329777A, application publication date February 5, 2021) discloses: rotating slice samples to obtain first training samples; training a MobileNet-v2 network with the first training samples to obtain a text direction detection model; upon receiving a picture to be detected, performing text position detection on it to obtain at least one text slice; and feeding each preprocessed text slice into the text direction detection model and taking the model's output as the text direction of that slice.
  • This literature mainly involves the reading direction detection of a single line of text in a document image, rather than the direction detection of the entire document image.
  • the present application proposes a method for correcting the direction of a document image, which includes the following steps.
  • Step S10: Find the edges and four corner points of the document area in the input image, and use a perspective transformation to perform trimming and small-angle direction correction on the input image. If the input image is a document image, the trimmed image is the document area of the input image. Small-angle direction correction means correcting the document area of the input image to one of four forms whose deviation angle from the correct direction is 0, 90, 180, or 270 degrees; the document area is corrected to whichever of the four forms it is closest to.
  • Step S20: Pass the trimmed and small-angle direction-corrected image through an angle classification model to obtain a deviation angle detection value, which takes only one of four values: 0, 90, 180, or 270 degrees.
  • the angle classification model is obtained as follows: (1) collect multiple trimmed document images and background images containing no document, together with their direction labels, as a training data set; the deviation angle between the actual direction and the correct direction of each trimmed document image is 0, 90, 180, or 270 degrees; the direction label of a trimmed document image records that deviation angle; the direction label of a background image indicates that the image is a background image; (2) randomly rotate some or all of the images in the training data set in units of 90 degrees, and update the direction labels of the rotated trimmed document images accordingly, to obtain an enhanced training data set; (3) train an angle classification model on the enhanced training data set; the model is used to distinguish document images from background images, and to identify which of the four values the deviation angle of each document image takes.
  • Step S30 correcting the direction of the image after trimming and small-angle direction correction according to the deviation angle detection value.
  • the present application also proposes a device for correcting the direction of a document image, including a trimming and small-angle direction correction unit, a deviation angle detection unit, and a large-angle direction correction unit.
  • the trimming and small-angle direction correction unit is used to find the edges and four corner points of the document area in the input image, and to perform trimming and small-angle direction correction on the input image using a perspective transformation. If the input image is a document image, the trimmed image is the document area of the input image. Small-angle direction correction means correcting the document area of the input image to one of four forms whose deviation angle from the correct direction is 0, 90, 180, or 270 degrees; the document area is corrected to whichever of the four forms it is closest to.
  • the deviation angle detection unit is used to pass the trimmed and small-angle direction-corrected image through an angle classification model to obtain a deviation angle detection value, which takes only one of four values: 0, 90, 180, or 270 degrees. The angle classification model is obtained as follows: (1) collect multiple trimmed document images and background images containing no document, together with their direction labels, as a training data set; the deviation angle between the actual direction and the correct direction of each trimmed document image is 0, 90, 180, or 270 degrees; the direction label of a trimmed document image records that deviation angle; the direction label of a background image indicates that the image is a background image; (2) randomly rotate some or all of the images in the training data set in units of 90 degrees, and update the direction labels of the rotated trimmed document images accordingly, to obtain an enhanced training data set; (3) train an angle classification model on the enhanced training data set; the model is used to distinguish document images from background images, and to identify which of the four values the deviation angle of each document image takes.
  • FIG. 1 is a schematic flow diagram of a method for correcting the direction of a document image proposed in the present application
  • Figures 2 to 5 are schematic diagrams of several trimmed images before perspective transformation for small-angle direction correction
  • Figures 6 to 9 are schematic diagrams of several trimmed images after perspective transformation for small-angle direction correction
  • FIG. 10 is a schematic structural diagram of a device for correcting the direction of a document image proposed by the present application.
  • 10 is an edge trimming and small angle direction correction unit
  • 20 is a deviation angle detection unit
  • 30 is a large angle direction correction unit.
  • the technical problem to be solved in this application is to provide a method for correcting the direction of a document image, which uses the information of the document area in the document image to judge the direction of the image and correct it quickly and accurately.
  • the technical effect obtained by the application is: a set of solutions for quickly and accurately trimming edges and correcting directions is proposed for document images.
  • after a picture is input, the system automatically detects the document area in the picture with a detection algorithm and outputs the four corner points of the document area, cuts out the document area by perspective transformation while performing small-angle direction correction, and then uses the angle classification model to detect and perform large-angle direction correction, which makes browsing the document image and other post-processing operations more convenient.
  • This application addresses the direction detection and correction of the entire image, rather than the direction of a single line of text in it.
  • edge trimming is performed before the orientation of the document image is corrected, so as to improve the accuracy of the orientation of the document image.
  • background images are added in the orientation classification of document images, which improves the accuracy of orientation classification of document images.
  • the method for correcting the direction of a document image proposed in this application includes the following steps.
  • Step S10: Find the edges and four corner points of the document area in the input image, and use a perspective transformation to perform trimming and small-angle direction correction on the input image, obtaining the trimmed and small-angle direction-corrected image. If the input image is a document image, the trimmed image is the document area of the input image. Small-angle direction correction means correcting the document area of the input image to one of four forms whose deviation angle from the correct direction is 0, 90, 180, or 270 degrees; the document area is corrected to whichever of the four forms it is closest to.
  • FIG. 2 to FIG. 5 show trimmed images before small-angle direction correction by perspective transformation.
  • FIG. 6 to FIG. 9 show trimmed images after small-angle direction correction by perspective transformation.
  • the dotted line indicates the correct direction of the trimmed image
  • the solid line indicates the actual direction of the trimmed image.
  • the deviation angle α is defined as the angle measured clockwise from the correct direction of the trimmed image to its actual direction; its value range is 0° ≤ α < 360°.
  • the edge-trimmed image shown in FIG. 2 is subjected to perspective transformation to perform small-angle direction correction to obtain the edge-trimmed image shown in FIG. 6 .
  • the edge-trimmed image shown in FIG. 3 is subjected to perspective transformation to perform small-angle direction correction to obtain the edge-trimmed image shown in FIG. 7 .
  • the edge-trimmed image shown in FIG. 4 is subjected to perspective transformation for small-angle direction correction to obtain the edge-trimmed image shown in FIG. 8 .
  • the edge-trimmed image shown in FIG. 5 is subjected to perspective transformation for small-angle direction correction to obtain the edge-trimmed image shown in FIG. 9 .
  • when α = 0 degrees, no small-angle correction is required.
  • the deviation angle α of the trimmed image obtained after small-angle direction correction takes only four values (0, 90, 180, and 270 degrees), as shown in FIGS. 6 to 9 respectively.
  • Step S20: Pass the trimmed and small-angle direction-corrected image through an angle classification model to obtain a deviation angle detection value.
  • the deviation angle detection value takes only one of four values: 0, 90, 180, or 270 degrees.
  • the angle classification model is obtained as follows. (1) Collect multiple images and their direction labels as a training data set. Most of the images are trimmed document images, and the remaining small portion are background images.
  • a trimmed document image is an image from which the blank border area has been removed so that only the document area remains; the deviation angle between the actual direction and the correct direction of each trimmed document image is 0, 90, 180, or 270 degrees.
  • a background image is an image without a document.
  • the direction label of a trimmed document image records the deviation angle between the actual direction and the correct direction of that image.
  • all background images share a single fixed direction label, which indicates that the image is a background image.
  • the training data set processed in this way is called an enhanced training data set, and its purpose is to make the distribution of the edge-cut document images in different directions as uniform as possible.
  • the angle classification model is used to distinguish document images from background images, and is also used to identify the deviation angle between the actual direction of each document image and the correct direction.
  • the angle classification model is trained using a lightweight neural network (NN), such as SqueezeNet, MobileNet, ShuffleNet, or EffNet, so that it can be deployed on mobile terminals such as smartphones.
  • all images in the enhanced training data set are uniformly scaled to a fixed input size during training, so as to obtain a better training effect.
  • the trimmed and small-angle direction-corrected image is first scaled to the fixed input size used when training the angle classification model, and the scaled image is then sent to the angle classification model.
  • Step S30: If the angle classification model gives a deviation angle detection value for the trimmed and small-angle direction-corrected image, that image is a document image. Its direction is then corrected according to the deviation angle detection value, rotating it to the correct direction.
  • This step is to perform rotation correction on the trimmed and small-angle direction-corrected image, so as to facilitate reading and printing of the direction-corrected image.
  • the device for correcting the direction of a document image proposed in this application includes a trimming and small-angle direction correction unit 10 , a deviation angle detection unit 20 , and a large-angle direction correction unit 30 .
  • the trimming and small-angle direction correction unit 10 is used to find the edges and four corner points of the document area in the input image, and to perform trimming and small-angle direction correction on the input image using a perspective transformation, obtaining the trimmed and small-angle direction-corrected image.
  • the deviation angle detection unit 20 is used to pass the trimmed and small-angle direction-corrected image through an angle classification model to obtain a deviation angle detection value.
  • the deviation angle detection value has only four values—0 degree, or 90 degree, or 180 degree, or 270 degree.
  • the angle classification model is obtained as follows. (1) Collect multiple images and corresponding orientation labels as the training data set. Among the multiple images, most of them are document images after trimming, and the rest are background images. The angle of deviation between the actual direction of the document image after edge trimming and the correct direction is either 0 degrees, or 90 degrees, or 180 degrees, or 270 degrees. A background image is an image without a document.
  • the orientation tag of the edge-trimmed document image is used to record the deviation angle between the actual orientation and the correct orientation of the edge-trimmed document image.
  • all background images share a single fixed direction label, which indicates that the image is a background image.
  • the training data set processed in this way is called an enhanced training data set.
  • the angle classification model is used to distinguish document images from background images, and to identify which of the four values the deviation angle between the actual direction and the correct direction of each document image takes.
  • the large-angle direction correction unit 30 is used for correcting the direction of the image after trimming and small-angle direction correction according to the deviation angle detection value, and rotating the image after trimming and small-angle direction correction to the correct direction.
  • the method and device for correcting the direction of a document image proposed in this application have the following beneficial effects.
  • this application is based on deep learning technology and is highly robust.
  • the application changes the direction calculation problem of the document image into a classification problem of background images and four large-angle directions (0 degrees, 90 degrees, 180 degrees, 270 degrees), and the calculation speed is fast.
  • This application does not process every small angle, which simplifies the complexity of orientation correction of document images and facilitates neural network learning.
  • this application uses lightweight neural network training to obtain an angle classification model, which has fast calculation speed and small size, and is especially suitable for deployment on mobile terminals.

Abstract

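The present application discloses a method for correcting the direction of a document image. The edges and four corner points of the document area in an input image are found, and a perspective transformation is used to perform trimming and small-angle direction correction on the input image. The trimmed and small-angle direction-corrected image is passed through an angle classification model to obtain a deviation angle detection value, which takes only one of four values: 0, 90, 180, or 270 degrees. The direction of the trimmed and small-angle direction-corrected image is then corrected according to the deviation angle detection value.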

Description

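Method and device for correcting the direction of a document image
Cross-reference
This application is based on and claims priority to Chinese patent application No. 202111679610.5, filed on December 31, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to a method for correcting the direction of a document image.
Background
A document image is a document in image format, usually produced from a paper document by photographing, scanning, or similar means. The direction in which the document can be read correctly is regarded as the correct direction. Some document images are not in the correct direction; for example, they are turned upside down by 180 degrees. For browsing, reading, OCR (optical character recognition), and similar operations, the direction of a document image needs to be corrected to the correct direction.
The Chinese invention patent application "Text Recognition Method, Device, Equipment and Medium Based on Direction Detection" (application publication number CN112329777A, application publication date February 5, 2021) discloses: rotating slice samples to obtain first training samples; training a MobileNet-v2 network with the first training samples to obtain a text direction detection model; upon receiving a picture to be detected, performing text position detection on it to obtain at least one text slice; and feeding each preprocessed text slice into the text direction detection model and taking the model's output as the text direction of that slice. That document mainly concerns detecting the reading direction of a single line of text in a document image, not detecting the direction of the whole document image.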
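Summary
The present application proposes a method for correcting the direction of a document image, comprising the following steps. Step S10: find the edges and four corner points of the document area in the input image, and use a perspective transformation to perform trimming and small-angle direction correction on the input image. If the input image is a document image, the trimmed image is the document area of the input image. Small-angle direction correction means correcting the document area of the input image to one of four forms whose deviation angle from the correct direction is 0, 90, 180, or 270 degrees; the document area is corrected to whichever of the four forms it is closest to. Step S20: pass the trimmed and small-angle direction-corrected image through an angle classification model to obtain a deviation angle detection value, which takes only one of four values: 0, 90, 180, or 270 degrees. The angle classification model is obtained as follows: (1) collect multiple trimmed document images and background images containing no document, together with their direction labels, as a training data set; the deviation angle between the actual direction and the correct direction of each trimmed document image is 0, 90, 180, or 270 degrees; the direction label of a trimmed document image records that deviation angle; the direction label of a background image indicates that the image is a background image; (2) randomly rotate some or all of the images in the training data set in units of 90 degrees, and update the direction labels of the rotated trimmed document images accordingly, to obtain an enhanced training data set; (3) train an angle classification model on the enhanced training data set; the model is used to distinguish document images from background images, and to identify which of the four values the deviation angle of each document image takes. Step S30: correct the direction of the trimmed and small-angle direction-corrected image according to the deviation angle detection value.
The present application also proposes a device for correcting the direction of a document image, comprising a trimming and small-angle direction correction unit, a deviation angle detection unit, and a large-angle direction correction unit. The trimming and small-angle direction correction unit is used to find the edges and four corner points of the document area in the input image, and to perform trimming and small-angle direction correction on the input image using a perspective transformation. If the input image is a document image, the trimmed image is the document area of the input image. Small-angle direction correction means correcting the document area of the input image to one of four forms whose deviation angle from the correct direction is 0, 90, 180, or 270 degrees; the document area is corrected to whichever of the four forms it is closest to. The deviation angle detection unit is used to pass the trimmed and small-angle direction-corrected image through an angle classification model to obtain a deviation angle detection value, which takes only one of four values: 0, 90, 180, or 270 degrees. The angle classification model is obtained as follows: (1) collect multiple trimmed document images and background images containing no document, together with their direction labels, as a training data set; the deviation angle between the actual direction and the correct direction of each trimmed document image is 0, 90, 180, or 270 degrees; the direction label of a trimmed document image records that deviation angle; the direction label of a background image indicates that the image is a background image; (2) randomly rotate some or all of the images in the training data set in units of 90 degrees, and update the direction labels of the rotated trimmed document images accordingly, to obtain an enhanced training data set; (3) train an angle classification model on the enhanced training data set; the model is used to distinguish document images from background images, and to identify which of the four values the deviation angle of each document image takes. The large-angle direction correction unit is used to correct the direction of the trimmed and small-angle direction-corrected image according to the deviation angle detection value.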
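Brief description of the drawings
FIG. 1 is a schematic flow diagram of the method for correcting the direction of a document image proposed in the present application;
FIG. 2 to FIG. 5 are schematic diagrams of several trimmed images before small-angle direction correction by perspective transformation;
FIG. 6 to FIG. 9 are schematic diagrams of several trimmed images after small-angle direction correction by perspective transformation;
FIG. 10 is a schematic structural diagram of the device for correcting the direction of a document image proposed in the present application;
Reference numerals in the figures: 10 is the trimming and small-angle direction correction unit, 20 is the deviation angle detection unit, and 30 is the large-angle direction correction unit.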
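Detailed description
The technical problem to be solved by the present application is to provide a method for correcting the direction of a document image that uses information about the document area in the document image to judge the image direction and correct it quickly and accurately.
The technical effect achieved by the present application is a complete solution for trimming document images and correcting their direction quickly and accurately. After a picture is input, the system automatically detects the document area in the picture with a detection algorithm and outputs the four corner points of the document area, cuts out the document area by perspective transformation while performing small-angle direction correction, and then uses the angle classification model to detect and perform large-angle direction correction, which makes browsing the document image and other subsequent processing more convenient. The present application addresses direction detection and correction for the whole image, not the direction of a single line of text within it. The present application trims the document image before correcting its direction, which improves the accuracy of the direction correction. The present application adds background images to the direction classification of document images, which improves the accuracy of that classification.
Referring to FIG. 1, the method for correcting the direction of a document image proposed in the present application includes the following steps.
Step S10: Find the edges and four corner points of the document area in the input image, and use a perspective transformation to perform trimming and small-angle direction correction on the input image, obtaining the trimmed and small-angle direction-corrected image. If the input image is a document image, the trimmed image is the document area of the input image. Small-angle direction correction means correcting the document area of the input image to one of four forms whose deviation angle from the correct direction is 0, 90, 180, or 270 degrees; the document area is corrected to whichever of these four forms it is closest to.
If the edges and four corner points of the document area cannot be found in this step, the input image is a background image, and the whole process exits.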
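The corner handling in step S10 can be illustrated with a minimal sketch. It orders four detected corner points and derives the size of the rectangle onto which the document area would be mapped. The function names `order_corners` and `target_size`, and the sum/difference ordering heuristic, are assumptions made for illustration; the application does not say how the corners are ordered or how the output size is chosen. Mapping the corner nearest the top-left of the input to the top-left of the output is one possible way to realize "the closest of the four forms". The warp itself would then be a perspective transformation, for example via OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`, which is not shown here.

```python
import math


def order_corners(points):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left.

    Assumes image coordinates (y grows downward) and a quadrilateral tilted
    by less than 45 degrees. Illustrative heuristic only.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four corner points")
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return top_left, top_right, bottom_right, bottom_left


def target_size(top_left, top_right, bottom_right, bottom_left):
    """Return (width, height) of the output rectangle, using the longer opposite edges."""
    def length(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    width = round(max(length(top_left, top_right), length(bottom_left, bottom_right)))
    height = round(max(length(top_left, bottom_left), length(top_right, bottom_right)))
    return width, height


# Hypothetical corner points of a slightly tilted document.
corners = order_corners([(90, 10), (12, 8), (95, 140), (8, 150)])
size = target_size(*corners)
```

The ordered corners and the rectangle `(0, 0)`, `(width, 0)`, `(width, height)`, `(0, height)` would form the source and destination point sets for the perspective transformation.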
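Refer to FIG. 2 to FIG. 5, which show trimmed images before small-angle direction correction by perspective transformation, and to FIG. 6 to FIG. 9, which show trimmed images after it. The dotted line indicates the correct direction of the trimmed image, and the solid line indicates its actual direction. The deviation angle α is defined as the angle measured clockwise from the correct direction of the trimmed image to its actual direction, with 0 degrees ≤ α < 360 degrees. Small-angle direction correction specifically means the following. (1) Trimmed images with 0 degrees < α < 45 degrees or 315 degrees < α < 360 degrees are corrected to trimmed images with α = 0 degrees. The trimmed image shown in FIG. 2 becomes, after small-angle direction correction by perspective transformation, the trimmed image shown in FIG. 6. (2) Trimmed images with 45 degrees < α < 135 degrees are corrected to trimmed images with α = 90 degrees. The trimmed image shown in FIG. 3 becomes the trimmed image shown in FIG. 7. (3) Trimmed images with 135 degrees < α < 225 degrees are corrected to trimmed images with α = 180 degrees. The trimmed image shown in FIG. 4 becomes the trimmed image shown in FIG. 8. (4) Trimmed images with 225 degrees < α < 315 degrees are corrected to trimmed images with α = 270 degrees. The trimmed image shown in FIG. 5 becomes the trimmed image shown in FIG. 9. Several special cases are as follows. When α = 0 degrees, no small-angle correction is needed. When α = 45 degrees, the image may be corrected to a trimmed image with either α = 0 degrees or α = 90 degrees. When α = 135 degrees, it may be corrected to either α = 90 degrees or α = 180 degrees. When α = 225 degrees, it may be corrected to either α = 180 degrees or α = 270 degrees. When α = 315 degrees, it may be corrected to either α = 270 degrees or α = 0 degrees. After small-angle direction correction, the deviation angle α of the trimmed image takes only four values (0, 90, 180, and 270 degrees), as shown in FIG. 6 to FIG. 9 respectively.
Further, small-angle direction correction also includes: when α = 0 degrees, no small-angle correction is performed; when α = 45 degrees, the image is corrected to a trimmed image with either α = 0 degrees or α = 90 degrees; when α = 135 degrees, to either α = 90 degrees or α = 180 degrees; when α = 225 degrees, to either α = 180 degrees or α = 270 degrees; when α = 315 degrees, to either α = 270 degrees or α = 0 degrees.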
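The snapping rule above amounts to rounding α to the nearest multiple of 90 degrees. The following is a minimal sketch of that arithmetic; the function name `snap_deviation_angle` is hypothetical, and sending the tie cases (45, 135, 225, 315 degrees) to the larger neighbour is an arbitrary choice, since the text allows either neighbour.

```python
def snap_deviation_angle(alpha):
    """Return the nearest of 0, 90, 180, 270 degrees for a deviation angle alpha.

    alpha is measured clockwise from the correct direction, in degrees.
    Ties (45, 135, 225, 315) go to the larger neighbour, wrapping 315 to 0;
    the source text permits either choice.
    """
    alpha = alpha % 360
    return int((alpha + 45) // 90) % 4 * 90
```

For example, `snap_deviation_angle(10)` and `snap_deviation_angle(350)` both give 0, while `snap_deviation_angle(200)` gives 180.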
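Step S20: Pass the trimmed and small-angle direction-corrected image through an angle classification model to obtain a deviation angle detection value. The deviation angle detection value takes only one of four values: 0, 90, 180, or 270 degrees.
The angle classification model is obtained as follows. (1) Collect multiple images and their direction labels as a training data set (training dataset). Most of the images are trimmed document images, and the remaining small portion are background images. A trimmed document image is an image from which the blank border area has been removed so that only the document area remains; the deviation angle between the actual direction and the correct direction of each trimmed document image is 0, 90, 180, or 270 degrees. A background image is an image without a document. The direction label of a trimmed document image records the deviation angle between its actual direction and the correct direction. All background images share a single fixed direction label, which indicates that the image is a background image. (2) Randomly rotate some or all of the images in the training data set in units of 90 degrees, and update the direction labels of the rotated trimmed document images accordingly. The training data set processed in this way is called an enhanced training data set; its purpose is to make the distribution of trimmed document images across the different directions as uniform as possible. (3) Train an angle classification model on the enhanced training data set. The model is used to distinguish document images from background images, and to identify which of the four values the deviation angle between the actual direction and the correct direction of each document image takes. Preferably, the angle classification model is trained using a lightweight neural network (NN), such as SqueezeNet, MobileNet, ShuffleNet, or EffNet, so that it can be deployed on mobile terminals such as smartphones. Preferably, all images in the enhanced training data set are uniformly scaled to a fixed input size during training, so as to obtain a better training effect.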
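The data-enhancement step (2) can be sketched as follows. This is a minimal illustration, not the applicants' implementation: images are represented as row-major nested lists, the label value `"background"` and the function names are hypothetical, and the label update assumes that rotating an image clockwise by 90 degrees increases its deviation angle by 90 degrees, which follows from the clockwise definition of the deviation angle.

```python
import random

BACKGROUND = "background"  # hypothetical label for images without a document


def rotate_clockwise(image, quarter_turns):
    """Rotate a row-major nested-list image clockwise by quarter_turns * 90 degrees."""
    for _ in range(quarter_turns % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image


def augment(image, label, quarter_turns=None, rng=random):
    """Rotate one training sample by a random multiple of 90 degrees.

    Document labels (0, 90, 180, 270) are shifted by the applied rotation;
    the background label is left unchanged.
    """
    if quarter_turns is None:
        quarter_turns = rng.randrange(4)
    rotated = rotate_clockwise(image, quarter_turns)
    if label == BACKGROUND:
        return rotated, BACKGROUND
    return rotated, (label + 90 * quarter_turns) % 360
```

Applying `augment` to every sample with a uniformly random `quarter_turns` tends to even out the four direction classes, which is the stated purpose of the enhanced training data set.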
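Preferably, in this step the trimmed and small-angle direction-corrected image is first scaled to the fixed input size used when training the angle classification model, and the scaled image is then sent to the angle classification model.
In this step, if the angle classification model determines that the trimmed and small-angle direction-corrected image is a background image, no deviation angle detection value can be given, and the whole process exits.
In this step, if the angle classification model determines that the deviation angle detection value of the trimmed and small-angle direction-corrected image is 0 degrees, its direction does not need to be corrected, and the whole process exits.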
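The three outcomes of step S20 (background image, already correct, needs rotation) can be sketched as a decoding step over the classifier's output. The five-way class order in `CLASSES`, the score-vector input, and the function names are assumptions for illustration; the application only states that the model distinguishes background images from the four deviation angles.

```python
CLASSES = ("background", 0, 90, 180, 270)  # assumed output order of the model


def decode(scores):
    """Return the class with the highest score from a five-element score vector."""
    if len(scores) != len(CLASSES):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSES[best]


def next_action(scores):
    """Map the classifier output to the branch taken in the described flow."""
    label = decode(scores)
    if label == "background":
        return "exit: background image"
    if label == 0:
        return "exit: direction already correct"
    return f"rotate: deviation angle is {label} degrees"
```

Treating the background class as just another output keeps the model a single classifier, which matches the description of one model serving both purposes.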
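Step S30: If the angle classification model gives a deviation angle detection value for the trimmed and small-angle direction-corrected image, that image is a document image. Its direction is then corrected according to the deviation angle detection value, rotating it to the correct direction. This step applies a rotation correction to the trimmed and small-angle direction-corrected image, so that the direction-corrected image is convenient to read and print.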
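A minimal sketch of the large-angle correction in step S30, under the assumption that the deviation angle is measured clockwise (as defined for α), so that the correction is a counter-clockwise rotation by the same amount. Images are nested lists and the function names are hypothetical.

```python
def rotate_clockwise(image, quarter_turns):
    """Rotate a row-major nested-list image clockwise by quarter_turns * 90 degrees."""
    for _ in range(quarter_turns % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image


def correct_direction(image, deviation_angle):
    """Rotate an image back to the correct direction.

    deviation_angle must be 0, 90, 180, or 270 (clockwise deviation).
    Undoing it means rotating counter-clockwise by the same angle, which is
    the same as rotating clockwise by the complementary number of quarter turns.
    """
    if deviation_angle not in (0, 90, 180, 270):
        raise ValueError("deviation angle must be 0, 90, 180, or 270")
    return rotate_clockwise(image, (4 - deviation_angle // 90) % 4)
```

Because only quarter turns are involved, the correction is lossless: no interpolation or resampling of pixels is needed.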
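Referring to FIG. 10, the device for correcting the direction of a document image proposed in the present application includes a trimming and small-angle direction correction unit 10, a deviation angle detection unit 20, and a large-angle direction correction unit 30.
The trimming and small-angle direction correction unit 10 is used to find the edges and four corner points of the document area in the input image, and to perform trimming and small-angle direction correction on the input image using a perspective transformation, obtaining the trimmed and small-angle direction-corrected image.
The deviation angle detection unit 20 is used to pass the trimmed and small-angle direction-corrected image through an angle classification model to obtain a deviation angle detection value. The deviation angle detection value takes only one of four values: 0, 90, 180, or 270 degrees. The angle classification model is obtained as follows. (1) Collect multiple images and their direction labels as a training data set. Most of the images are trimmed document images, and the remaining small portion are background images. The deviation angle between the actual direction and the correct direction of each trimmed document image is 0, 90, 180, or 270 degrees. A background image is an image without a document. The direction label of a trimmed document image records the deviation angle between its actual direction and the correct direction. All background images share a single fixed direction label, which indicates that the image is a background image. (2) Randomly rotate some or all of the images in the training data set in units of 90 degrees, and update the direction labels of the rotated trimmed document images accordingly. The training data set processed in this way is called an enhanced training data set. (3) Train an angle classification model on the enhanced training data set. The model is used to distinguish document images from background images, and to identify which of the four values the deviation angle between the actual direction and the correct direction of each document image takes.
The large-angle direction correction unit 30 is used to correct the direction of the trimmed and small-angle direction-corrected image according to the deviation angle detection value, rotating it to the correct direction.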
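How the three units might be chained can be sketched as below. This is an illustrative composition only: the class name `DirectionCorrectionDevice`, the use of `None` to signal "background image, exit", and the callable signatures are all assumptions and do not come from the application.

```python
class DirectionCorrectionDevice:
    """Chains hypothetical stand-ins for units 10, 20, and 30."""

    def __init__(self, trim_and_small_angle, detect_deviation, rotate_large_angle):
        self.trim_and_small_angle = trim_and_small_angle  # unit 10: image -> image or None
        self.detect_deviation = detect_deviation          # unit 20: image -> angle or None
        self.rotate_large_angle = rotate_large_angle      # unit 30: (image, angle) -> image

    def run(self, image):
        trimmed = self.trim_and_small_angle(image)
        if trimmed is None:      # no edges or corners found: treated as background
            return None
        angle = self.detect_deviation(trimmed)
        if angle is None:        # classifier reports a background image
            return None
        if angle == 0:           # already in the correct direction
            return trimmed
        return self.rotate_large_angle(trimmed, angle)
```

Passing the units in as callables keeps the sketch independent of any particular corner detector or neural network.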
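The method and device for correcting the direction of a document image proposed in the present application have the following beneficial effects.
First, the present application is based on deep learning technology and is highly robust.
Second, the present application turns the problem of computing the direction of a document image into a classification problem over background images and four large-angle directions (0, 90, 180, and 270 degrees), which is fast to compute. The present application does not handle every small angle individually, which reduces the complexity of direction correction for document images and makes the task easier for a neural network to learn.
Third, the present application obtains the angle classification model by training a lightweight neural network, which is fast and small and is especially suitable for deployment on mobile terminals.
The above are only preferred embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.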

Claims (10)

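  1. A method for correcting the direction of a document image, comprising the following steps:
    Step S10: finding the edges and four corner points of the document area in an input image, and using a perspective transformation to perform trimming and small-angle direction correction on the input image; if the input image is a document image, the trimmed image is the document area of the input image; the small-angle direction correction means correcting the document area of the input image to one of four forms whose deviation angle from the correct direction is 0, 90, 180, or 270 degrees, the document area of the input image being corrected to whichever of the four forms it is closest to;
    Step S20: passing the trimmed and small-angle direction-corrected image through an angle classification model to obtain a deviation angle detection value; the deviation angle detection value takes only one of four values: 0, 90, 180, or 270 degrees;
    the angle classification model is obtained as follows: (1) collecting multiple trimmed document images and background images containing no document, together with their direction labels, as a training data set; the deviation angle between the actual direction and the correct direction of each trimmed document image is 0, 90, 180, or 270 degrees; the direction label of a trimmed document image records the deviation angle between the actual direction and the correct direction of that image; the direction label of a background image indicates that the image is a background image; (2) randomly rotating some or all of the images in the training data set in units of 90 degrees, and updating the direction labels of the rotated trimmed document images accordingly, to obtain an enhanced training data set; (3) training an angle classification model on the enhanced training data set, the angle classification model being used to distinguish document images from background images, and to identify which of the four values the deviation angle between the actual direction and the correct direction of each document image takes;
    Step S30: correcting the direction of the trimmed and small-angle direction-corrected image according to the deviation angle detection value.
  2. The method for correcting the direction of a document image according to claim 1, wherein in step S10, if the edges and four corner points of the document area cannot be found in the input image, the input image is not a document image, and the whole process exits.
  3. The method for correcting the direction of a document image according to claim 1 or 2, wherein in step S10, the deviation angle α is defined as the angle measured clockwise from the correct direction of the trimmed image to its actual direction, with 0 degrees ≤ α < 360 degrees; the small-angle direction correction comprises: correcting trimmed images with 0 degrees < α < 45 degrees or 315 degrees < α < 360 degrees to trimmed images with α = 0 degrees; correcting trimmed images with 45 degrees < α < 135 degrees to trimmed images with α = 90 degrees; correcting trimmed images with 135 degrees < α < 225 degrees to trimmed images with α = 180 degrees; and correcting trimmed images with 225 degrees < α < 315 degrees to trimmed images with α = 270 degrees.
  4. The method for correcting the direction of a document image according to any one of claims 1 to 3, wherein the small-angle direction correction further comprises: when α = 0 degrees, performing no small-angle correction; when α = 45 degrees, correcting to a trimmed image with either α = 0 degrees or α = 90 degrees; when α = 135 degrees, correcting to a trimmed image with either α = 90 degrees or α = 180 degrees; when α = 225 degrees, correcting to a trimmed image with either α = 180 degrees or α = 270 degrees; when α = 315 degrees, correcting to a trimmed image with either α = 270 degrees or α = 0 degrees.
  5. The method for correcting the direction of a document image according to any one of claims 1 to 4, wherein in step S20, the angle classification model is obtained by training a lightweight neural network.
  6. The method for correcting the direction of a document image according to any one of claims 1 to 5, wherein in step S20, when the angle classification model is trained, all images in the enhanced training data set are uniformly scaled to a fixed input size.
  7. The method for correcting the direction of a document image according to claim 6, wherein in step S20, the trimmed and small-angle direction-corrected image is first scaled to the fixed input size used when training the angle classification model, and the scaled image is then sent to the angle classification model.
  8. The method for correcting the direction of a document image according to any one of claims 1 to 7, wherein in step S20, if the angle classification model determines that the trimmed and small-angle direction-corrected image is a background image, the whole process exits.
  9. The method for correcting the direction of a document image according to any one of claims 1 to 8, wherein in step S20, if the angle classification model determines that the deviation angle detection value of the trimmed and small-angle direction-corrected image is 0 degrees, the whole process exits.
  10. A device for correcting the direction of a document image, comprising a trimming and small-angle direction correction unit, a deviation angle detection unit, and a large-angle direction correction unit;
    the trimming and small-angle direction correction unit is used to find the edges and four corner points of the document area in an input image, and to perform trimming and small-angle direction correction on the input image using a perspective transformation; if the input image is a document image, the trimmed image is the document area of the input image; the small-angle direction correction means correcting the document area of the input image to one of four forms whose deviation angle from the correct direction is 0, 90, 180, or 270 degrees, the document area of the input image being corrected to whichever of the four forms it is closest to;
    the deviation angle detection unit is used to pass the trimmed and small-angle direction-corrected image through an angle classification model to obtain a deviation angle detection value; the deviation angle detection value takes only one of four values: 0, 90, 180, or 270 degrees; the angle classification model is obtained as follows: (1) collecting multiple trimmed document images and background images containing no document, together with their direction labels, as a training data set; the deviation angle between the actual direction and the correct direction of each trimmed document image is 0, 90, 180, or 270 degrees; the direction label of a trimmed document image records the deviation angle between the actual direction and the correct direction of that image; the direction label of a background image indicates that the image is a background image; (2) randomly rotating some or all of the images in the training data set in units of 90 degrees, and updating the direction labels of the rotated trimmed document images accordingly, to obtain an enhanced training data set; (3) training an angle classification model on the enhanced training data set, the angle classification model being used to distinguish document images from background images, and to identify which of the four values the deviation angle between the actual direction and the correct direction of each document image takes;
    the large-angle direction correction unit is used to correct the direction of the trimmed and small-angle direction-corrected image according to the deviation angle detection value.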
PCT/CN2022/088550 2021-12-31 2022-04-22 Method and device for correcting the direction of a document image WO2023123763A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111679610.5 2021-12-31
CN202111679610.5A CN114267046A (zh) 2021-12-31 2021-12-31 Method and device for correcting the direction of a document image

Publications (1)

Publication Number Publication Date
WO2023123763A1 (zh) 2023-07-06

Family

ID=80832566

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/088550 WO2023123763A1 (zh) 2021-12-31 2022-04-22 Method and device for correcting the direction of a document image

Country Status (2)

Country Link
CN (1) CN114267046A (zh)
WO (1) WO2023123763A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
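CN114267046A (zh) * 2021-12-31 2022-04-01 上海合合信息科技股份有限公司 Method and device for correcting the direction of a document image
CN115457559B (zh) * 2022-08-19 2024-01-16 上海通办信息服务有限公司 Method, apparatus, and device for intelligently straightening text and certificate images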

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108681729A (zh) * 2018-05-08 2018-10-19 腾讯科技(深圳)有限公司 Text image correction method, apparatus, storage medium, and device
CN112101367A (zh) * 2020-09-15 2020-12-18 杭州睿琪软件有限公司 Text recognition method, image recognition and classification method, and document recognition processing method
CN112419207A (zh) * 2020-11-17 2021-02-26 苏宁金融科技(南京)有限公司 Image correction method, apparatus, and system
WO2021221614A1 (en) * 2020-04-28 2021-11-04 Hewlett-Packard Development Company, L.P. Document orientation detection and correction
CN114267046A (zh) * 2021-12-31 2022-04-01 上海合合信息科技股份有限公司 Method and device for correcting the direction of a document image

Also Published As

Publication number Publication date
CN114267046A (zh) 2022-04-01

Similar Documents

Publication Publication Date Title
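WO2023123763A1 (zh) Method and device for correcting the direction of a document image
CN109993112B (zh) Method and apparatus for recognizing tables in pictures
CN110569832B (zh) Real-time text localization and recognition method based on a deep learning attention mechanism
CN110032938B (zh) Tibetan script recognition method, apparatus, and electronic device
US20210342571A1 (en) Automated signature extraction and verification
CN114299528B (zh) Information extraction and structuring method for scanned documents
US6345130B1 (en) Method and arrangement for ensuring quality during scanning/copying of images/documents
WO2020097909A1 (zh) Text detection method, apparatus, and storage medium
CN101719142B (zh) Method for detecting text in pictures using sparse representation based on classification dictionaries
US20220222284A1 (en) System and method for automated information extraction from scanned documents
WO2021047484A1 (zh) Character recognition method and terminal device
CN102663379A (zh) Examination paper marking method and system based on image recognition
US20210334529A1 (en) On-device partial recognition systems and methods
CN110738238B (zh) Method and apparatus for classifying and locating certificate information
US11893765B2 (en) Method and apparatus for recognizing imaged information-bearing medium, computer device and medium
Li et al. Automatic comic page segmentation based on polygon detection
CN110717492A (zh) Method for correcting the direction of character strings in drawings based on joint features
CN112686257A (zh) OCR-based storefront text recognition method and system
CN111062317A (zh) Edge trimming method and system for scanned documents
CN113221897B (zh) Image correction method, image text recognition method, identity verification method, and apparatus
WO2020244076A1 (zh) Face recognition method, apparatus, electronic device, and storage medium
CN116524508A (zh) Correction method and apparatus for table images, storage medium, and computer device
CN115588024A (zh) Artificial-intelligence-based method and apparatus for edge extraction from complex industrial images
CN115457559A (zh) Method, apparatus, and device for intelligently straightening text and certificate images
CN114359931A (zh) Express waybill recognition method, apparatus, computer device, and storage medium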

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22913064; Country of ref document: EP; Kind code of ref document: A1