JP4847592B2

JP4847592B2 - Method and system for correcting distorted document images

Info

Publication number: JP4847592B2
Application number: JP2010009524A
Authority: JP
Inventors: シュリーフェン; ウェンドンチャオ
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2009-01-22
Filing date: 2010-01-19
Publication date: 2011-12-28
Anticipated expiration: 2030-01-19
Also published as: CN101789122B; CN101789122A; JP2010171976A

Description

The present invention relates to a method and system for correcting a distorted document image, and more particularly to a method and system for correcting a distorted document image captured from a book or other bound document with a digital camera.

Digital cameras are becoming increasingly popular, not only as consumer products but also as convenient tools in business and professional fields. For OCR (optical character recognition), digital cameras offer a potential replacement for scanners as document imaging devices. However, current OCR technology is designed primarily for digitally scanned images of flat documents and cannot handle general document images captured by a camera.

An image captured by a scanner is viewed head-on, with a viewing angle of approximately 0 degrees, and the document is generally flat, so there is almost no geometric distortion caused by perspective or warping. However, when a digital camera captures an image of a book or another kind of bound document, the viewing angle is somewhat non-zero, and the book or bound document has a certain amount of warp that depends on its thickness. As a result, document images captured by a digital camera suffer from geometric distortion caused by perspective and warping. FIG. 8 shows an example of a document image captured by a digital camera that contains obvious distortion caused by perspective and warping. If such a distorted document image is used directly for OCR, recognition accuracy will be low.

Many methods have been proposed for correcting the distortion in distorted document images. One category of methods uses special 3D scanning equipment. In the method of Non-Patent Document 1, a laser projector projects a 2D light network onto the 3D surface of the document, a mesh is built to represent that 3D surface, and the mesh is either flattened directly or converted into a developable mesh.

Alternatively, the 3D shape of the surface is estimated from the document image. There are parametric methods that estimate the 3D shape and non-parametric methods that avoid the shape estimation process.

Non-Patent Document 2 introduced a method that uses a combination of a cylinder and a plane to simulate the 3D surface model of a book, but the problems of how to estimate the parameters of this model and how to use the model when correcting distortion remain unsolved. Furthermore, this method is expensive because special equipment must be used. This method is also applicable to images scanned by a scanner.

In the method introduced in Non-Patent Document 3, the depth of the document at each point of the document image is obtained by a particular stereo vision method to create a depth image, and the document image is then rectified onto a plane according to the depth image. In principle any kind of image distortion can be corrected this way, but how to map points on the noisy, rough document surface defined by the depth image to points on a plane remains a problem.

In Non-Patent Document 4, scanned images of bound books are corrected by a character segmentation process. Characters in the shaded region (where the surface curls) are segmented, their orientation and original position are estimated, and the characters are adjusted accordingly.

Non-Patent Document 5 proposed a model-based method for rectifying images of bound documents captured by a camera. The document surface is represented by a general cylindrical surface. Clearly, other kinds of warp, such as folds, cannot be handled.

In general, when a document image is distorted only by perspective, the direction of the page edges can be used to approximate the direction of the characters. However, when the distortion is caused by warping as well as perspective, the characters on a single page are distorted differently in different directions, so this approximation is not effective.

Therefore, there is a need for a technique that can handle the distortion caused by the warping and perspective that inevitably accompany camera-captured document images, and that is both easy to implement and effective.

A. Doncescu, A. Bouju, V. Quillet, "Former Books Digital Processing: Image Warping," Proc. Workshop of Document Image Analysis, pages 5-9, 1997
T. Kanungo, R. Haralick, I. Philips, "Global and Local Document Degradation Models," in Proc. 2nd International Conference on Document Analysis and Recognition, 1993
M. S. Brown, W. B. Seales, "Document Restoration Using 3D Shape: a General Deskewing Algorithm for Arbitrarily Warped Documents," Proc. International Conference on Computer Vision, July 2001
Z. Zhang, C. L. Tan, "Restoration of Images Scanned from Thick Bound Documents," in Proc. 6th International Conference on Document Analysis and Recognition, 2001
Huaigu Cao, Xiaoqing Ding, Changsong Liu, "Rectifying the Bound Document Image Captured by the Camera: A Model Based Approach," Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003)

In view of the technical problems in the prior art described above, a new method for correcting distorted document images is provided. The present invention is based on the basic idea that the curved document page always found in a naturally opened book or a naturally opened stack of paper can be approximated by a group of planar stripes, each extending perpendicular to the text line direction or parallel to the binding direction, and arranged side by side along the text line direction. That is, these image stripes are obtained by dividing the distorted document image along at least one vertical line. Within each stripe, warping is ignored and perspective distortion becomes the dominant distortion. A complicated nonlinear problem is thus converted into several simpler locally linear problems. The present invention further relies on two important text features: the local orientation of text lines and vertical character strokes. These features are used to identify the local linear distortion.

The present invention requires no auxiliary device and can handle a wide range of distortion types, such as distortion caused by binding, folding, and perspective. Although the invention is aimed primarily at correcting distortion in document images captured by a digital camera, it is also applicable to document images from other devices, such as images of thick books scanned on a flatbed scanner.

According to one aspect of the present invention, there is provided a method for correcting geometric deformation in a distorted document image of an original document, the method comprising:
a vertical vanishing point detection step of detecting a vertical vanishing point of the distorted document image, the vertical vanishing point being the vanishing point of the vertical direction perpendicular to the text lines of the original document;
an image dividing step of dividing the entire region of the distorted document image into a plurality of image stripes using a plurality of vertical lines derived from the detected vertical vanishing point;
a horizontal vanishing point detection step of detecting a horizontal vanishing point for each of the plurality of image stripes by finding, in each image stripe, the vanishing point of the horizontal direction parallel to the text lines of the original document;
a distortion model generation step of establishing, as a distortion model describing the mapping relationship between the distorted document image and a corrected document image, a mesh model formed from the plurality of vertical lines derived from the vertical vanishing point and a plurality of horizontal lines derived from the respective horizontal vanishing points detected for the image stripes; and
a correction step of generating the corrected document image based on the distortion model.

According to another aspect of the present invention, there is provided a system for correcting geometric deformation in a distorted document image of an original document, the system comprising:
vertical vanishing point detection means for detecting a vertical vanishing point of the distorted document image, the vertical vanishing point being the vanishing point of the vertical direction perpendicular to the text lines of the original document;
image dividing means for dividing the entire region of the distorted document image into a plurality of image stripes using a plurality of vertical lines derived from the detected vertical vanishing point;
horizontal vanishing point detection means for detecting a horizontal vanishing point for each of the plurality of image stripes by finding, in each image stripe, the vanishing point of the horizontal direction parallel to the text lines of the original document;
distortion model generation means for establishing, as a distortion model describing the mapping relationship between the distorted document image and a corrected document image, a mesh model formed from the plurality of vertical lines derived from the vertical vanishing point and a plurality of horizontal lines derived from the respective horizontal vanishing points detected for the image stripes; and
correction means for generating the corrected document image based on the distortion model.

Further features and advantages of the present invention will become apparent from the following description with reference to the drawings.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, explain the principles of the invention.

FIG. 1 is a block diagram showing the configuration of a computing device for implementing a system that corrects a distorted document image according to the present invention.
FIG. 2 is a block diagram showing the general configuration of a system, made up of module means, for correcting a distorted document image according to one embodiment of the present invention.
FIG. 3 is a flowchart showing the general process that implements the method for correcting a distorted document image according to the present invention.
FIG. 4 is a flowchart showing an exemplary process for detecting the vertical vanishing point according to one embodiment of the present invention.
FIG. 5 is a diagram showing the intersection angle defined by a line segment and the line connecting the midpoint of that segment to an intersection point, used to illustrate how a vanishing point is calculated.
FIG. 6 is a flowchart showing an exemplary process for locating text line curves according to the present invention.
FIG. 7 is a schematic diagram showing how a distorted document image is mapped to a corrected document image on the grid of the mesh model.
FIG. 8 is a diagram showing an example of a typical document image captured by a camera from one page of a book.
FIG. 9 is a diagram showing an exemplary edge image derived from the document image shown in FIG. 8.
FIG. 10(a) is a diagram showing an exemplary edge image obtained by applying rotation, compression, and a run-length smoothing algorithm to the edge image shown in FIG. 9, and FIG. 10(b) is a diagram showing an exemplary image made up of the mid-height points extracted from the image shown in (a).
FIG. 11 is a diagram showing an exemplary edge image made up of vertical strokes, obtained by removing the edges of horizontal strokes from the edge image shown in FIG. 9.
FIG. 12 is a diagram showing an exemplary image obtained by finding the connected components of the vertical strokes extracted from the edge image shown in FIG. 11.
FIG. 13 is a diagram showing the document image of FIG. 9 together with a mesh constructed by a method according to one embodiment of the present invention.
FIG. 14 is a diagram showing an exemplary corrected document image produced by the distortion correction method according to the present invention.
FIG. 15 is a diagram used to explain how the mesh is constructed.

Embodiments of the present invention will be described in detail below with reference to the drawings.

In the description and claims of the present invention, and in particular when used with respect to a distorted document image, the term “horizontal” or “x direction” means approximately horizontal, and the term “vertical” or “y direction” means approximately vertical. Specifically, “horizontal” in a distorted document image means lying in the direction that is parallel to the text lines of the corresponding corrected document image or original document. For example, a “horizontal line” is a line whose counterpart in the corrected planar image (or original document) is parallel to the text lines of the corrected planar image (or original document). Similarly, “vertical” in a distorted document image means perpendicular to the text lines of the corresponding corrected document image or original document. For example, a “vertical stroke” is a stroke that is perpendicular to the text lines of the corrected planar image (or original document).

In the description of the present invention, the terms “left” and “right” refer to the left and right sides of a page as it is viewed in the ordinary way, as when reading a book or document.

In the description of the present invention, unless otherwise indicated, all sizes (such as lengths or widths) are in pixels. For example, L < 5 means that L is less than 5 pixels.

FIG. 1 is a block diagram showing the configuration of a computing device for implementing a system that corrects a distorted document image according to the present invention. For simplicity, the system is shown as built into a single computing device. However, the system is effective whether it is built into a single computing device or distributed across multiple computing devices as a network system.

As shown in FIG. 1, the computing device 100 is used to implement the processing that corrects a distorted document image. The computing device 100 may include a CPU 101, a chipset 102, a RAM 103, a storage controller 104, a display controller 105, a hard disk drive 106, a CD-ROM drive 107, and a display 108. The computing device 100 may further include a signal line 111 connecting the CPU 101 and the chipset 102, a signal line 112 connecting the chipset 102 and the RAM 103, a peripheral bus 113 connecting the chipset 102 and various peripheral devices, a signal line 114 connecting the storage controller 104 and the hard disk drive 106, a signal line 115 connecting the storage controller 104 and the CD-ROM drive 107, and a signal line 116 connecting the display controller 105 and the display 108.

A client 120 may be connected to the computing device 100 directly or via a network 130. The client 120 may send a correction task to the computing device 100, and the computing device 100 may return the correction result to the client 120.

FIG. 2 is a block diagram showing the general configuration of a system, made up of module means, for correcting a distorted document image.

As shown in FIG. 2, the distortion correction system 200 may include vertical vanishing point detection means 201 for detecting the vertical vanishing point of the distorted document image; image dividing means 203 for dividing the entire region of the distorted document image into a plurality of image stripes using vertical lines starting from the detected vertical vanishing point; horizontal vanishing point detection means 205 for detecting a horizontal vanishing point for each image stripe; distortion model generation means 207 for establishing, using the vertical vanishing point and the horizontal vanishing points of the image stripes, a distortion model that describes the mapping relationship between the distorted document image and the corrected document image; and correction means 209 for generating the corrected document image based on the model.

The vertical vanishing point detection means 201 preferably includes vertical stroke extraction means 2011 for extracting a plurality of vertical character strokes from the distorted document image, vertical straight line segment fitting means 2013 for fitting the vertical strokes with a plurality of vertical straight line segments, and vertical optimal convergence point calculation means 2015 for calculating the vertical vanishing point from the vertical straight line segments by searching for their optimal convergence point. The vertical optimal convergence point calculation means 2015 preferably includes intersection calculation means 2015-1 for calculating the intersection between any two of the vertical straight line segments, and optimal point selection means 2015-2 for selecting, as the optimal convergence point, the one of those intersections that minimizes the sum of the squared sines of the intersection angles.

The horizontal vanishing point detection means 205 preferably includes text line curve locating means 2051 for locating text line curves along the text line direction in the distorted document image, fragment extraction means 2052 for extracting the fragments of the text line curves that lie within an image stripe, horizontal straight line segment fitting means 2053 for fitting those fragments with horizontal straight line segments, and horizontal optimal convergence point calculation means 2054 for calculating the horizontal vanishing point from the horizontal straight line segments by searching for their optimal convergence point. The horizontal optimal convergence point calculation means 2054 preferably includes intersection calculation means 2054-1 for calculating the intersection between any two of the horizontal straight line segments, and optimal point selection means 2054-2 for selecting, as the optimal convergence point, the one of those intersections that minimizes the sum of the squared sines of the intersection angles. The text line curve locating means 2051 preferably includes mid-height point extraction means 2051-1 for extracting mid-height points of the character pixels in the distorted document image, and text line curve locating means 2051-2 for using those mid-height points to locate the text line curves that run through the mid-height of the characters in each text line.
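The optimal convergence point search described for means 2015 and 2054 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the intersection angle is taken, following the description of FIG. 5, to be the angle between a fitted segment and the line joining that segment's midpoint to the candidate point.

```python
from itertools import combinations

def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def intersect(l1, l2, eps=1e-9):
    """Intersection of two homogeneous lines, or None if they are (nearly) parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < eps:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)

def sin2_angle(seg, point):
    """Squared sine of the angle between `seg` and the line from its midpoint to `point`."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    mx = (x1 + x2) / 2 - point[0]
    my = (y1 + y2) / 2 - point[1]
    cross = dx * my - dy * mx
    denom = (dx * dx + dy * dy) * (mx * mx + my * my)
    return 0.0 if denom == 0 else cross * cross / denom

def best_convergence_point(segments):
    """Pick the pairwise intersection that minimizes the summed squared sines.

    Hypothetical helper: candidates are the intersections of every pair of
    fitted segments; returns None when no pair intersects.
    """
    lines = [line_through(p, q) for p, q in segments]
    candidates = []
    for l1, l2 in combinations(lines, 2):
        pt = intersect(l1, l2)
        if pt is not None:
            candidates.append(pt)
    return min(
        candidates,
        key=lambda pt: sum(sin2_angle(s, pt) for s in segments),
        default=None,
    )
```

Restricting the candidates to pairwise intersections keeps the search finite; the cost is quadratic in the number of segments for candidate generation and cubic overall, which may call for subsampling on pages with many strokes.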

The means described above are exemplary, preferred modules for implementing the processing described below. Modules implementing the various steps have not been described exhaustively above. However, wherever there is a step that performs a particular process, there is a corresponding functional module or means that implements the same process.

FIG. 3 is a flowchart showing the process that implements the method for correcting a distorted document image according to the present invention. FIG. 8 shows an example of a typical document image captured by a camera from a page of a book. As FIG. 8 shows, there is obvious distortion caused by warping and perspective.

In step S310, the orientation of vertical character strokes is used to detect the vertical vanishing point. The meaning of a vanishing point is as follows. Straight lines in a plane that are parallel to one another theoretically never intersect, however far they are extended. However, when the plane is placed in three-dimensional space at a non-zero viewing angle, lines that were originally parallel no longer appear parallel, and in theory the extensions of all of those lines meet at a single point. The point at which lines that are parallel in the plane meet when viewed in three-dimensional space at a non-zero viewing angle is called a vanishing point. Accordingly, the point at which the vertical lines of the plane meet is called the vertical vanishing point, and the point at which the horizontal lines of the plane meet is called the horizontal vanishing point. The terms “horizontal” and “vertical” were defined above.

In most cases, the natural warp of a bound book extends in a direction parallel to the binding line or perpendicular to the text lines. Therefore, only one vertical vanishing point exists for a captured image. Many well-known methods exist for detecting vanishing points, such as the Gaussian-sphere method disclosed in S. T. Barnard, "Interpreting Perspective Images," Artificial Intelligence, vol. 21, pp. 435-462, 1983, and the method based on Hough transform accumulation in polar space disclosed in Virginio Cantoni, Luca Lombardi, Marco Porta, and Nicolas Sicard, "Vanishing Point Detection: Representation Analysis and New Approaches," Proceedings of the 11th International Conference on Image Analysis & Processing.
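The convergence of parallel page lines at a vanishing point can be written compactly in homogeneous coordinates. The following is a standard projective-geometry sketch rather than text from the patent; the homography H and the symbols p, d, and v are assumptions introduced only for illustration.

```latex
% Assumed notation: H is the 3x3 homography from the flat page to the image,
% p is a point on a page line and d is the direction of that line.
\[
\ell(t) = p + t\,d, \qquad
p = (p_x,\; p_y,\; 1)^{\top}, \qquad
d = (d_x,\; d_y,\; 0)^{\top}
\]
\[
H\,\ell(t) \;=\; H p + t\,H d \;\sim\; \tfrac{1}{t}\,H p + H d
\;\xrightarrow[\;t \to \infty\;]{}\; H d \;=:\; v
\]
% v does not depend on p, so every page line with direction d passes
% through the same image point v (up to scale).
% Vertical vanishing point:   d = (0, 1, 0)^T.
% Horizontal vanishing point: d = (1, 0, 0)^T.
```

Under this reading, a warped page has one horizontal vanishing point per stripe because each stripe is treated as its own plane with its own H, while the shared vertical direction gives a single vertical vanishing point.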

After the vertical vanishing point has been determined in step S310, in step S320 a group of vertical lines originating from the vertical vanishing point and crossing the image region is derived, and these lines divide the entire image region into a plurality of planar stripes arranged along the horizontal direction.
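One possible way to realize the division in step S320 is sketched below. The patent text here does not say how the vertical lines are spaced, so the even spacing along a reference row `y_ref`, the function names, and the parameters are all assumptions made for illustration.

```python
import math

def boundary_lines(vp, width, y_ref, n_stripes):
    """Boundary lines as (vanishing point, point on the reference row) pairs.

    Assumption: boundaries are evenly spaced along the row y = y_ref.
    """
    return [(vp, (width * k / n_stripes, y_ref)) for k in range(n_stripes + 1)]

def stripe_index(vp, point, width, y_ref, n_stripes):
    """Index of the stripe that contains `point`.

    The point is projected onto the reference row along the line that joins
    it to the vertical vanishing point; the stripe is read off from where
    that line crosses the row. Assumes the vanishing point lies outside the
    image rows, so point[1] != vp[1].
    """
    vx, vy = vp
    px, py = point
    if py == vy:
        raise ValueError("point lies on the row of the vanishing point")
    t = (y_ref - vy) / (py - vy)
    x_ref = vx + t * (px - vx)
    k = math.floor(x_ref * n_stripes / width)
    return min(max(k, 0), n_stripes - 1)  # clamp points outside the fan
```

Because every boundary passes through the vanishing point, a pixel's stripe depends only on the direction from the vanishing point to that pixel, which is why a single projection onto the reference row suffices.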

In step S330, a horizontal vanishing point is detected for each image stripe.

In step S340, a model describing the overall distortion of the distorted document image is constructed from the image stripes and the corresponding horizontal and vertical vanishing points. Once the vertical and horizontal vanishing points have been determined, both the perspective and the warp characteristics are determined. Accordingly, various document features such as text lines, character strokes, and page edges are used to construct such a model.

In the final step S350, a corrected document image is generated using the model constructed above.

The following is an exemplary embodiment of the present invention for correcting the distortion of a distorted document image.

In the first step S310, the orientation of vertical character strokes is used to detect the vertical vanishing point. The detailed steps of step S310 are described below with reference to FIG. 4.

図４は、ステップＳ３１０において垂直消失点を検出する処理を示すフローチャートである。 FIG. 4 is a flowchart showing the process of detecting the vertical vanishing point in step S310.

ステップＳ４１０において、画像の前景物体のエッジが検出される。エッジを検出するために、本明細書においてソーベル演算子、Ｃａｎｎｙ演算子等の種々の一般的な輪郭検出技術が適用される。出力はエッジ画像及びエッジ角度画像である。エッジ角度画像は、検出されたエッジ画素毎に傾斜角に関する情報を有する。画素の傾斜角は、画素の階調値の変更方向を示す角度である。即ち、画素の傾斜角は、階調が隣接する画素からその画素に変更する方向を示す。図９に、図８に示す文書画像の例示的なエッジ画像を示す。各文字のエッジが抽出されることが分かる。 In step S410, the edges of foreground objects in the image are detected. Various common edge detection techniques, such as the Sobel operator or the Canny operator, can be applied here. The output is an edge image and an edge angle image. The edge angle image holds the gradient angle (inclination angle) of each detected edge pixel. The gradient angle of a pixel is the direction in which the gray level changes at that pixel, that is, the direction of the gray-level change from the neighboring pixels to that pixel. FIG. 9 shows an exemplary edge image of the document image shown in FIG. 8. It can be seen that the edges of each character are extracted.

ステップＳ４２０において、エッジ画像は垂直文字ストロークを選ぶように以下の方法により処理される。デジタルカメラにより撮影される文書は、適切に配置されない可能性があるため、特定のスキューが導入される可能性が非常に高い。スキューの角度を検出するために、本明細書において、例えばYue Lu、Chew Lim Tanによる「A Nearest-Neighbour Chain Based Approach to Skew Estimation in Document Images」Pattern Recognition letters 24（2003年）、2315-2323ページにおいて提案される最近傍法又は投影を用いた方法等の２値画像におけるスキューの角度を検出する種々の既存の方法が使用される。必要なことはスキューの角度からテキスト行の大まかな方向を取得することだけであるため、検出したスキューの角度が必ずしも非常に正確である必要はない。 In step S420, the edge image is processed in the following manner to select vertical character strokes. A document photographed by a digital camera may not be properly aligned, so some skew is very likely to be introduced. To detect the skew angle, various existing methods for detecting the skew angle in a binary image can be used here, such as the nearest-neighbor method proposed by Yue Lu and Chew Lim Tan in “A Nearest-Neighbour Chain Based Approach to Skew Estimation in Document Images”, Pattern Recognition Letters 24 (2003), pages 2315-2323, or projection-based methods. Because the skew angle is needed only to obtain the rough direction of the text lines, the detected skew angle does not have to be very accurate.

垂直ストロークに属さないエッジの画素は、自身の傾斜方向とスキューの角度とを比較することにより除去される。θｉはエッジの（ｘｉ，ｙｉ）に位置付けられる画素の傾斜角であり、θは文書のスキューの角度であることを示す。｜θｉ−θ｜が所定の閾値より大きい場合、画素は除去されるべきである。尚、（ｘｉ，ｙｉ）は、歪み文書画像に対して確立されたデカルト座標系における座標である。図１１に、図９に示すエッジ画像から水平ストロークのエッジの画素を除去することによって取得される例示的なエッジ画像を示す。 Edge pixels that do not belong to a vertical stroke are removed by comparing their gradient (inclination) direction with the skew angle. Let θi denote the gradient angle of the edge pixel located at (xi, yi), and let θ denote the skew angle of the document. If |θi − θ| is greater than a predetermined threshold, the pixel is removed. Here, (xi, yi) are coordinates in the Cartesian coordinate system established for the distorted document image. FIG. 11 shows an exemplary edge image obtained by removing the edge pixels of horizontal strokes from the edge image shown in FIG. 9.
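The pixel filter described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names, the 20-degree threshold and the choice to compare directions modulo 180 degrees (so that the two opposite sides of a stroke are treated alike) are all assumptions made for this example.

```python
def angle_diff(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees.

    Assumption: directions are compared modulo 180 degrees, so gradients
    pointing in opposite directions count as the same orientation.
    """
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)


def keep_vertical_stroke_pixels(edge_pixels, skew_deg, threshold_deg=20.0):
    """Keep edge pixels whose gradient angle is close to the skew angle.

    edge_pixels: list of (x, y, gradient_angle_deg) tuples.
    A pixel is removed when |theta_i - theta| exceeds the threshold.
    """
    return [
        (x, y, g)
        for (x, y, g) in edge_pixels
        if angle_diff(g, skew_deg) <= threshold_deg
    ]
```

In this sketch a pixel whose gradient points along the text-line direction is kept, because such a gradient is produced by the left or right side of a vertical stroke.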

ステップＳ４３０において、垂直ストローク候補がステップＳ４２０で取得されたエッジ画像において連結成分を探索することにより見つけられる。連結成分は画素の集合を意味し、各画素はその集合中の少なくとも別の画素と結合する。画素が別の画素の特定の近傍（例えば、前記別の画素から水平方向に３画素及び垂直方向に４画素内の近傍）内にある場合、２つの画素が「連結される」と考えられる。画素から連結成分を探索する従来技術において周知の多くのアルゴリズムが存在する。例えば、探索戦略は、まず画像の下側から開始点を選択し、垂直方向に上側に向かって黒画素を探索する。上述の近傍内で次の前景画素が常に黒画素から探索される。尚、近傍のサイズは、実際の要件に従って当業者により任意に選択可能である。消失点の計算の目的で、長さＬが特定の所望の範囲（例えば、12＜L＜150）にある主要な連結成分のみが考慮されるのが好ましい。即ち、この範囲内にない長さを有する連結成分は除去又は無視されるのが好ましい。尚、上記範囲の数字は単なる例示であり、当業者は設計要件又は実際の原稿の状態に従って、この範囲を任意に選択しても良い。図１２に、抽出された垂直ストロークの連結成分を図１１に示すエッジ画像から見つけることにより取得される例示的な画像を示す。 In step S430, vertical stroke candidates are found by searching for connected components in the edge image obtained in step S420. A connected component is a set of pixels in which each pixel is connected to at least one other pixel of the set. Two pixels are considered “connected” if one lies within a particular neighborhood of the other (e.g., within 3 pixels horizontally and 4 pixels vertically of the other pixel). Many algorithms for finding connected components are well known in the prior art. For example, one search strategy first selects a starting point at the bottom of the image and searches for black pixels upward in the vertical direction; the next foreground pixel is always searched for within the above neighborhood of the current black pixel. The neighborhood size can be chosen freely by those skilled in the art according to actual requirements. For the purpose of calculating the vanishing point, preferably only the main connected components whose length L lies in a certain desired range (e.g., 12 < L < 150) are considered. That is, connected components whose length is outside this range are preferably removed or ignored. The numbers in the above range are merely examples, and those skilled in the art may choose the range freely according to the design requirements or the actual state of the document. FIG. 12 shows an exemplary image obtained by finding the connected components of vertical strokes in the edge image shown in FIG. 11.
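A minimal Python sketch of this grouping step is given below. It swaps the bottom-up search described above for a plain breadth-first grouping, and it assumes that the length L of a component is its vertical extent in pixels; both choices, like the function names, are assumptions made for illustration.

```python
from collections import deque


def connected_components(pixels, dx=3, dy=4):
    """Group (x, y) pixels into connected components.

    Two pixels are linked when they differ by at most dx pixels
    horizontally and dy pixels vertically (example neighborhood).
    """
    remaining = set(pixels)
    components = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        component = [seed]
        while queue:
            cx, cy = queue.popleft()
            near = [p for p in remaining
                    if abs(p[0] - cx) <= dx and abs(p[1] - cy) <= dy]
            for p in near:
                remaining.discard(p)
                component.append(p)
                queue.append(p)
        components.append(sorted(component))
    return components


def filter_by_length(components, lo=12, hi=150):
    """Keep components whose vertical extent L satisfies lo < L < hi."""
    kept = []
    for component in components:
        ys = [y for _, y in component]
        length = max(ys) - min(ys) + 1
        if lo < length < hi:
            kept.append(component)
    return kept
```

The quadratic neighbor scan keeps the sketch short; a real implementation would more likely index pixels on a grid.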

見つけられた連結成分毎に、線は原点からの距離ρ及び角度θによってフィッティングされ、パラメータ化される。 For each connected component found, the line is fitted and parameterized by the distance ρ from the origin and the angle θ.

式中、ｘ及びｙは、デカルト座標系における線上の点のｘ座標及びｙ座標であり、θ及びρはフィッティング中に判定される２つのパラメータである。 Where x and y are the x and y coordinates of a point on a line in a Cartesian coordinate system, and θ and ρ are two parameters determined during fitting.

取得された各連結成分は、同様の傾斜の向きを有するエッジ画素（ｘ_ｉ，ｙ_ｉ）のグループであり、（ｘ_ｉ，ｙ_ｉ）はデカルト座標系における連結成分のｉ番目の画素の座標であり、ｉ＝１，２，３…である。線パラメータは、エッジ画素に関連する行列Ｄの固有値λ_１及びλ_２、並びに固有ベクトルｖ_１及びｖ_２から直接判定される。行列Ｄは以下のように規定される。 Each connected component obtained is a group of edge pixels (x_i, y_i) with similar gradient directions, where (x_i, y_i) are the coordinates of the i-th pixel of the connected component in the Cartesian coordinate system and i = 1, 2, 3, .... The line parameters are determined directly from the eigenvalues λ₁ and λ₂ and the eigenvectors v₁ and v₂ of the matrix D associated with the edge pixels. The matrix D is defined as follows.

行列の固有値及び固有ベクトルを評価する処理は従来技術において周知であるため、本明細書において、行列Ｄの固有値λ_１及びλ_２、並びに固有ベクトルｖ_１及びｖ_２を取得する詳細な処理は省略する。理想的な線の場合、固有値の１つがゼロであるべきである。ラインフィッティングの品質は、行列Ｄの２つの固有値の比、即ち、ｖ＝λ_１／λ_２により特徴付けられる。線パラメータは固有ベクトルｖ_１及びｖ_２から判定され、ｖ_１は最大の固有値と関連付けられる固有ベクトルである。線パラメータは以下のように計算される。 Since the process of evaluating the eigenvalues and eigenvectors of a matrix is well known in the prior art, the detailed process of obtaining the eigenvalues λ₁ and λ₂ and the eigenvectors v₁ and v₂ of the matrix D is omitted in this specification. For an ideal line, one of the eigenvalues should be zero. The quality of the line fit is characterized by the ratio of the two eigenvalues of the matrix D, i.e., v = λ₁/λ₂. The line parameters are determined from the eigenvectors v₁ and v₂, where v₁ is the eigenvector associated with the largest eigenvalue. The line parameters are calculated as follows.

ｖ_１（１）はｖ_１の第１次元であり、ｖ_１（２）はｖ_１の第２次元である。上記式によると、線のパラメータθ及びρが取得され、それにより垂直ストロークに対する連結成分の各々をフィッティングする垂直直線線分は取得される。 Here, v₁(1) is the first component of v₁ and v₁(2) is the second component of v₁. From the above equations, the line parameters θ and ρ are obtained, and with them the vertical straight line segment that fits each connected component of a vertical stroke.
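The equations for the matrix D and for the line parameters are not reproduced in this text, so the following Python sketch shows only one common way to carry out such an eigenvector-based line fit. It assumes that D is the 2×2 scatter matrix of the mean-centered pixel coordinates and that the line is written in the normal form x·cos θ + y·sin θ = ρ; these assumptions and all names are illustrative and may differ from the formulas of the embodiment.

```python
import math


def fit_line(points):
    """Fit a line to (x, y) points via the 2x2 scatter matrix.

    Returns (theta, rho, lam1, lam2), where the fitted line is assumed
    to satisfy x*cos(theta) + y*sin(theta) = rho with rho >= 0, and
    lam1 >= lam2 are the eigenvalues of the scatter matrix.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)

    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    lam1, lam2 = half_trace + root, half_trace - root

    # Direction of the eigenvector of the largest eigenvalue (along the line).
    phi = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # The line normal is perpendicular to that direction.
    theta = phi + math.pi / 2.0
    rho = mx * math.cos(theta) + my * math.sin(theta)
    if rho < 0:
        rho, theta = -rho, theta - math.pi
    return theta, rho, lam1, lam2
```

For a perfectly straight component the smaller eigenvalue is zero, so a small value of lam2 relative to lam1 indicates a good fit.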

ステップＳ４４０において、それらの垂直直線線分の最適収束点を探索することにより、垂直消失点が取得される。複数の線の最適収束点を推定する際に利用可能である種々の既存の技術が存在する。以下は、それらの垂直直線線分の最適収束点を推定する例示的な処理である。最初に、前記線分のうち任意の２つの線分の間の交点が垂直消失点（ｘ_０ ^ｊ，ｙ_０ ^ｊ）、ｊ＝１，２，３…の候補集合として計算される。その後、統計的方法が使用され、結果として得られる垂直消失点として交点のグループから最適な収束点を選択する。垂直消失点は、例えば関数Ｆ（ｊ）を最小にする交点（ｘ_０ ^ｊ，ｙ_０ ^ｊ）のうちの１つの点であっても良い。 In step S440, the vertical vanishing point is obtained by searching for the optimal convergence point of the vertical straight line segments. Various existing techniques are available for estimating the optimal convergence point of a plurality of lines. The following is an exemplary process for estimating the optimal convergence point of the vertical straight line segments. First, the intersections between any two of the line segments are calculated as a candidate set of vertical vanishing points (x₀^j, y₀^j), j = 1, 2, 3, .... A statistical method is then used to select the optimal convergence point from the group of intersections as the resulting vertical vanishing point. The vertical vanishing point may be, for example, the one of the intersections (x₀^j, y₀^j) that minimizes a function F(j).

直感的な表現については図５を参照。尚、ステップＳ４４０に対する上記説明は例示するだけのものであり、本発明の範囲を限定する意図はない。上述したように、複数の線の消失点を取得する周知の技術が多く存在し、消失点を取得する方法は上述の方法に限定されない。 See FIG. 5 for an intuitive illustration. It should be noted that the above description of step S440 is merely exemplary and is not intended to limit the scope of the present invention. As described above, there are many known techniques for obtaining the vanishing point of a plurality of lines, and the method for obtaining the vanishing point is not limited to the method described above.
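The candidate-and-select procedure of step S440 can be sketched in Python as follows. The function F(j) is not reproduced in this text, so the sketch substitutes a hypothetical cost: the sum of the perpendicular distances from a candidate intersection to all fitted lines. The normal-form line representation (θ, ρ) with x·cos θ + y·sin θ = ρ and every function name are likewise assumptions.

```python
import math
from itertools import combinations


def line_through(point, theta):
    """Normal-form line (theta, rho) with normal angle theta through point."""
    return (theta, point[0] * math.cos(theta) + point[1] * math.sin(theta))


def intersection(line_a, line_b):
    """Intersection of two normal-form lines, or None if nearly parallel."""
    (t1, r1), (t2, r2) = line_a, line_b
    det = math.sin(t2 - t1)
    if abs(det) < 1e-9:
        return None
    x = (r1 * math.sin(t2) - r2 * math.sin(t1)) / det
    y = (r2 * math.cos(t1) - r1 * math.cos(t2)) / det
    return (x, y)


def cost(point, lines):
    """Hypothetical stand-in for F(j): total point-to-line distance."""
    x, y = point
    return sum(abs(x * math.cos(t) + y * math.sin(t) - r) for t, r in lines)


def select_vanishing_point(lines):
    """Pick the pairwise intersection with the lowest cost."""
    candidates = []
    for a, b in combinations(lines, 2):
        p = intersection(a, b)
        if p is not None:
            candidates.append(p)
    if not candidates:
        return None
    return min(candidates, key=lambda p: cost(p, lines))
```

With noisy line fits the candidates scatter around the true convergence point, and a different statistic (for example a median of the candidates) could be substituted for the cost used here.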

垂直消失点がステップＳ３１０で判定された（例えば、Ｓ４１０〜Ｓ４４０の上述したサブステップを使用することにより）後、ステップＳ３２０において、垂直消失点から発生し且つ画像領域にわたる垂直線のグループが導出され、画像領域全体を水平方向に沿って配置される複数のプレーナストライプに分割する。例えばこの分割は、基本的に以下の１つ以上の例示的な基準に基づく。
（１）各スプライトにおける単一のテキスト行曲線の長さは特定の範囲［Ｌ１，Ｌ２］内である。Ｌ１及びＬ２は、例えば文書画像における平均文字サイズに従って判定される値である。
（２)処理される画像が２ページにわたる場合、ステイプル又は綴じ線はそれらの垂直線のうちの１つである。
（３)画像が２列以上を含む場合、隣接する列の間に前記垂直線のような分離線が存在する。
（４)ページの中央部分においてはストライプはより狭くなり、ページの左側及び右側においてはストライプはより広くなる。
（５）各ストライプは、ほぼ平坦であると考えられる。即ち、１つのストライプは水平消失点を１つのみ有する。平坦であることの基準は、実際の要件及び予想されるＯＣＲ精度に依存しても良い。 After the vertical vanishing point is determined in step S310 (e.g., by using the above-described substeps S410 to S440), in step S320 a group of vertical lines that originate from the vertical vanishing point and span the image area is derived, dividing the entire image area into a plurality of planar stripes arranged along the horizontal direction. This division is based, for example, on one or more of the following exemplary criteria.
(1) The length of a single text line curve in each stripe is within a specific range [L1, L2]. L1 and L2 are values determined, for example, according to the average character size in the document image.
(2) If the image being processed spans two pages, the staple or binding line is one of the vertical lines.
(3) If the image includes two or more columns, one of the vertical lines serves as a separation line between adjacent columns.
(4) The stripes are narrower in the central part of the page and wider on the left and right sides of the page.
(5) Each stripe is considered to be substantially flat. That is, one stripe has only one horizontal vanishing point. The criterion for flatness may depend on actual requirements and the expected OCR accuracy.

上記基準は、分割されたストライプが正確な水平消失点を計算するのに十分に広いことを保証し、それと同時にストライプが十分に平坦であることを保証する。 The above criteria ensure that the split stripe is wide enough to calculate an accurate horizontal vanishing point, while at the same time ensuring that the stripe is sufficiently flat.

尚、上記基準は単なる例示であり、本発明の保護範囲を限定する意図はない。当業者は、１つ以上の上記基準を採用してもよく、あるいは画像領域を分割するための他の基準を設計できる。画像領域の分割方法に関する基準は、実際の要件及び予想されるＯＣＲ精度に依存する。 The above criteria are merely examples, and are not intended to limit the protection scope of the present invention. One skilled in the art may employ one or more of the above criteria, or can design other criteria for segmenting the image area. The criteria for the image region segmentation method depends on the actual requirements and the expected OCR accuracy.

ステップＳ３３０において、画像ストライプ毎の水平消失点が検出される。一般に書籍の紙が水平方向に沿って変動するため、各画像ストライプの水平消失点は異なる。即ち、各画像ストライプは自身の水平消失点を有する。 In step S330, a horizontal vanishing point for each image stripe is detected. In general, since the paper of a book fluctuates along the horizontal direction, the horizontal vanishing point of each image stripe is different. That is, each image stripe has its own horizontal vanishing point.

以下は、画像ストライプ毎に水平消失点を取得する例示的な処理である。 The following is an exemplary process for obtaining a horizontal vanishing point for each image stripe.

最初に、各テキスト行の中間の高さにわたる曲線は、ステップＳ４１０において抽出されるようなエッジ画像から検出される。詳細には、文字の中間の高さの点が抽出され、テキスト行の曲線の位置は、連結成分解析を使用することにより中間の高さの点から特定される。テキスト行の曲線は、水平方向に沿って歪み情報を指定するのに十分に正確である。 Initially, a curve over the middle height of each text line is detected from the edge image as extracted in step S410. Specifically, the mid-height points of the character are extracted, and the position of the curve in the text line is determined from the mid-height points by using connected component analysis. The text line curve is sufficiently accurate to specify the distortion information along the horizontal direction.

図６に、テキスト行の曲線の位置を特定するための詳細なフローチャートを示す。尚、図６の処理は、単なる例示的で好適な例である。当業者には、エッジ画像から種々のテキスト行の曲線の位置を特定する種々の方法が周知である。 FIG. 6 shows a detailed flowchart for specifying the position of the curve in the text line. Note that the process of FIG. 6 is merely an illustrative and preferable example. Those skilled in the art are familiar with various methods for locating various text line curves from an edge image.

ステップＳ６１０において、ステップＳ４１０で抽出されるようなエッジ画像である２値画像が、例えばアフィン変換演算により変換され、新しい２値画像Ｉ１が生成される。アフィン変換演算は、前記２値画像を上述のスキューの角度だけ回転させてほぼ垂直に見えるようにし、水平圧縮率Ｎが垂直圧縮率Ｍより大きい「ＯＲ」法により回転された画像を圧縮する処理と同等である。「ＯＲ」法は、圧縮画像上の画素に対応する非圧縮画像上のＮ×Ｍ画像ブロックに対して、画像ブロック中に少なくとも１つの黒画素が存在する場合に圧縮画像上の対応する画素が黒色として設定されることを意味する。回転の目的は、テキスト行を十分に水平にすることであり、圧縮の主な目的は、文字の間隔をにじませ、テキスト行毎に「ベタ」テキストブロックを得ることである。「ベタ」テキストブロックにおいて文字の中間の高さの点を探索するのははるかに容易である。 In step S610, the binary image, which is the edge image extracted in step S410, is transformed, for example by an affine transformation, to generate a new binary image I1. The affine transformation is equivalent to rotating the binary image by the above-described skew angle so that it appears approximately upright, and then compressing the rotated image by an “OR” method in which the horizontal compression ratio N is larger than the vertical compression ratio M. In the “OR” method, for each N × M image block of the uncompressed image that corresponds to one pixel of the compressed image, the corresponding pixel of the compressed image is set to black if the block contains at least one black pixel. The purpose of the rotation is to make the text lines sufficiently horizontal, and the main purpose of the compression is to blur the spacing between characters and obtain a “solid” text block for each text line. It is much easier to search for the mid-height points of characters in a “solid” text block.

適切な「ベタ」効果を達成するために、且つそれと同時に隣接するテキスト行を混合しないように、垂直圧縮率Ｍは、Ｍにより分割された原画像の高さが所定の値（例えば、５１２）より大きくないことを満足する最小の正の整数として指定される。更に垂直圧縮率Ｍは、検出される文字の平均高さに従って割り当てることができる。例えば、検出される文字の平均高さがＨである場合、ＭはＨ／８として割り当てられても良い。水平圧縮率Ｎは３＊Ｍとして指定可能である。 To achieve an appropriate “solid” effect without merging adjacent text lines, the vertical compression ratio M is specified as the smallest positive integer such that the height of the original image divided by M is not greater than a predetermined value (e.g., 512). Alternatively, the vertical compression ratio M can be assigned according to the average height of the detected characters. For example, if the average height of the detected characters is H, M may be assigned as H / 8. The horizontal compression ratio N can be specified as 3 * M.
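The choice of M and the “OR” compression can be sketched in Python as below. The rotation by the skew angle is left out, the image is assumed to be a list of rows of 0/1 values with 1 meaning black, and the function names are illustrative only.

```python
def vertical_ratio(height, max_rows=512):
    """Smallest positive integer M such that height / M <= max_rows."""
    return max(1, -(-height // max_rows))  # ceiling division


def or_compress(image, n, m):
    """Compress a binary image with the "OR" rule.

    Each block that is n pixels wide and m pixels tall becomes one
    output pixel, which is black (1) if any pixel in the block is black.
    """
    height, width = len(image), len(image[0])
    out = []
    for by in range(0, height, m):
        row = []
        for bx in range(0, width, n):
            black = any(
                image[y][x]
                for y in range(by, min(by + m, height))
                for x in range(bx, min(bx + n, width))
            )
            row.append(1 if black else 0)
        out.append(row)
    return out
```

For an image 1025 pixels tall, `vertical_ratio` gives M = 3, and the horizontal ratio would then be N = 3 * M = 9 under the example rule above.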

ステップＳ６２０において、テキスト行に対するより適切な「ベタ」効果を得るために、２値画像Ｉ１は、水平方向にランレングス平滑化アルゴリズム（ＲＬＳＡ）を実行し且つその後、垂直方向にＲＬＳＡを実行することにより平滑化される。ランは、互いに間に空間（白画素）を有さない連続するＮ画素の一部を意味する。ランレングス平滑化アルゴリズムは、長さ（画素数）を特徴とするパラメータにより特徴付けられ、２つの画素間の距離がその長さより短い場合に、２つの画素間の画素が全て「黒色」でレンダリングされるか、或いは換言すると、２つの画素はランレングス平滑化アルゴリズムに従って「連続している」と考えられる。このパラメータは、２と４との間の値として選択可能である。ランレングス平滑化アルゴリズムには、「ほぼ」連続した線又は曲線を識別するように、合わせて短い距離を有する画素を通信する効果がある。図１０の（ａ）に、上述したように図９のエッジ画像に対して回転、圧縮及びランレングス平滑化アルゴリズムを行うことにより取得される例示的なエッジ画像を示す。 In step S620, to obtain a better “solid” effect for the text lines, the binary image I1 is smoothed by applying a run length smoothing algorithm (RLSA) in the horizontal direction and then applying the RLSA in the vertical direction. A run is a sequence of N consecutive pixels with no space (white pixel) between them. The run length smoothing algorithm is characterized by a length parameter (a number of pixels): when the distance between two black pixels is shorter than that length, all pixels between them are rendered “black”; in other words, the two pixels are considered “contiguous” under the run length smoothing algorithm. This parameter can be chosen as a value between 2 and 4. The run length smoothing algorithm has the effect of connecting pixels that are a short distance apart, so that “almost” continuous lines or curves can be identified. FIG. 10(a) shows an exemplary edge image obtained by applying the rotation, compression and run length smoothing described above to the edge image of FIG. 9.
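A minimal Python sketch of the two-pass smoothing follows. It assumes the image is a list of rows of 0/1 values with 1 meaning black, and it interprets the length parameter as the maximum number of white pixels in a gap that may still be filled (a gap is filled when it contains fewer than `length` white pixels); this interpretation and the function names are assumptions.

```python
def rlsa_1d(line, length):
    """Fill short white gaps between black pixels in one row or column."""
    out = list(line)
    last_black = None
    for i, value in enumerate(line):
        if value:
            gap = i - last_black - 1 if last_black is not None else 0
            if 0 < gap < length:
                for k in range(last_black + 1, i):
                    out[k] = 1
            last_black = i
    return out


def rlsa(image, length=3):
    """Apply RLSA horizontally, then vertically, to a binary image."""
    horizontal = [rlsa_1d(row, length) for row in image]
    # Transpose, smooth each column, and transpose back.
    columns = [rlsa_1d(list(col), length) for col in zip(*horizontal)]
    return [list(row) for row in zip(*columns)]
```

Gaps at the image border are never filled in this sketch, because they are not enclosed by two black pixels.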

次のステップＳ６３０において、黒ランは、２値画像Ｉ１上で垂直方向に沿って見つけられる。文字に属さない黒ランは、短すぎるラン又は長すぎるランを除去することにより廃棄される。Ｈ１及びＨ２は、例えばアフィン変換後の文書において最小である可能性のあるテキストの高さ及び最大である可能性のあるテキストの高さとしてそれぞれ指定される。黒ランの長さがＨ１より短い場合又はＨ２より長い場合、その黒ランは廃棄される。廃棄ステップの後、保持される黒ランの殆どは文字に属する。 In the next step S630, black runs are found along the vertical direction in the binary image I1. Black runs that do not belong to characters are discarded by removing runs that are too short or too long. H1 and H2 are specified, for example, as the smallest and the largest text height that can be expected in the document after the affine transformation, respectively. If the length of a black run is shorter than H1 or longer than H2, the black run is discarded. After this discarding step, most of the retained black runs belong to characters.

ステップＳ６４０において、保持される黒ランの中間の高さの点は、文字の中間の高さの点として抽出される。その後、２値画像Ｉ２は２値画像Ｉ１と同一のサイズで生成される。２値画像Ｉ２において、画素は中間の高さの点に対応する位置で黒色に設定される。図１０の（ｂ）に、図１０の（ａ）の画像から抽出される中間の高さの点により構成される例示的な画像を示す。 In step S640, the intermediate height point of the black run to be held is extracted as the intermediate height point of the character. Thereafter, the binary image I2 is generated with the same size as the binary image I1. In the binary image I2, the pixel is set to black at a position corresponding to an intermediate height point. FIG. 10B shows an exemplary image composed of intermediate height points extracted from the image of FIG.
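Steps S630 and S640 can be sketched together in Python as follows. The image is assumed to be a list of rows of 0/1 values with 1 meaning black, the mid-height point of a run is taken as the integer midpoint of its first and last rows, and the function name is illustrative.

```python
def mid_height_points(image, h1, h2):
    """Build image I2 holding the mid-height point of each retained run.

    Vertical black runs shorter than h1 or longer than h2 are discarded;
    for every other run, one pixel at its middle row is set to black.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        y = 0
        while y < height:
            if image[y][x]:
                start = y
                while y < height and image[y][x]:
                    y += 1
                run_length = y - start
                if h1 <= run_length <= h2:
                    out[(start + y - 1) // 2][x] = 1
            else:
                y += 1
    return out
```

The output has the same size as the input, matching the description of the binary image I2.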

ステップＳ６５０において、中間の高さの点を含む２値画像Ｉ２が取得された後、連結成分探索方法を使用することにより曲線が見つけられる。ステップＳ４３０において説明された方法と同様に、探索戦略は、例えば最初に画像の左側から開始点を選択し、右側に向かって水平方向に黒画素を探索することである。方法の実現例において、次の前景画素が、例えば水平方向に４画素及び垂直方向に３画素内で常に黒画素から探索される。 In step S650, after a binary image I2 containing a medium height point is acquired, a curve is found by using a connected component search method. Similar to the method described in step S430, the search strategy is to first select a starting point from the left side of the image, for example, and search for black pixels in the horizontal direction toward the right side. In an implementation of the method, the next foreground pixel is always searched from black pixels, for example within 4 pixels in the horizontal direction and 3 pixels in the vertical direction.

多くの場合、このように取得された曲線は、図１０の（ｂ）に示すように文字ストロークの変動のために平滑ではない。従って、ステップＳ６６０において、ランレングス情報は、曲線を平滑化するために使用されるのが好ましい。例えば各曲線の平均ランレングスＨが計算され、対応するランレングスが［ａ＊Ｈ，ｂ＊Ｈ］の範囲を超える点が除去される。ここで、ａ＜１且つｂ＞１である。曲線が平滑化される限り、ステップＳ６６０において他の平滑化方法も使用可能である。計算の複雑さに対して制限があるか又は精度に対する要件が許す場合、更にステップＳ６６０が省略可能である。 In many cases, the curves obtained in this way are not smooth because of variations in the character strokes, as shown in FIG. 10(b). Therefore, in step S660, the run length information is preferably used to smooth the curves. For example, the average run length H of each curve is calculated, and points whose corresponding run length falls outside the range [a * H, b * H] are removed, where a < 1 and b > 1. Other smoothing methods can also be used in step S660 as long as the curves are smoothed. Step S660 can also be omitted if computational complexity is constrained or if the accuracy requirements permit.
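The run-length-based smoothing of step S660 can be sketched in Python as below. Each curve point is assumed to carry the length of the vertical black run from which it was extracted; the values a = 0.6 and b = 1.5 and the function name are hypothetical choices for illustration.

```python
def smooth_curve(points, a=0.6, b=1.5):
    """Drop curve points whose run length is far from the curve average.

    points: list of (x, y, run_length) tuples for one text line curve.
    A point is kept when a*H <= run_length <= b*H, where H is the
    average run length of the curve (a < 1 and b > 1).
    """
    if not points:
        return []
    mean_run = sum(r for _, _, r in points) / len(points)
    return [
        (x, y, r)
        for (x, y, r) in points
        if a * mean_run <= r <= b * mean_run
    ]
```

Points taken from unusually tall runs (for example where two text lines touch) or unusually short runs (for example punctuation) are the ones this filter removes.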

ステップＳ６７０において、テキスト行の曲線の座標は、元のエッジ画像に変換される。この変換演算は、上述のアフィン変換演算の逆演算である。 In step S670, the coordinates of the curve in the text line are converted to the original edge image. This conversion operation is an inverse operation of the above-described affine conversion operation.

その後、画像ストライプ毎に、水平消失点が以下のステップにより計算される。
ａ）画像ストライプに位置するテキスト行の曲線の断片を抽出するステップ；
ｂ）水平直線線分によるテキスト行の曲線の各断片をフィッティングするステップ；
ｃ）水平直線線分の最適な収束点を選択することにより、水平垂直線分から水平消失点を計算するステップ。 Then, for each image stripe, the horizontal vanishing point is calculated by the following steps:
a) extracting the fragments of the text line curves located in the image stripe;
b) fitting each fragment of a text line curve with a horizontal straight line segment;
c) calculating the horizontal vanishing point from the horizontal straight line segments by selecting their optimal convergence point.

水平消失点を計算する際に最適な収束点を選択する処理は、例えばステップＳ４４０で垂直消失点を計算する時と同一の処理により実行されても良い。しかし、水平消失点を計算する際に最適な収束点を選択する処理は、消失点が計算できる限り、ステップＳ４４０で垂直消失点を計算する時とは異なる処理において実行されても良い。 The process of selecting the optimum convergence point when calculating the horizontal vanishing point may be executed by the same process as that for calculating the vertical vanishing point in step S440, for example. However, the process of selecting the optimum convergence point when calculating the horizontal vanishing point may be executed in a process different from that for calculating the vertical vanishing point in step S440 as long as the vanishing point can be calculated.

ここで図３に戻る。 Returning now to FIG.

ステップＳ３４０において、画像ストライプ、並びに対応する水平消失点及び垂直消失点から歪み文書画像を記述するモデルが歪み文書画像と補正文書画像との間のマッピングを記述するために構成される。この例において、モデルはメッシュである。図１５に、メッシュを構成する方法を例示する。図１５に示すように、文書画像Ｐａ−Ｐｂ−Ｐｃ−Ｐｄは実線の曲線で描かれ、左側から右側に順番にＳＴＲＩＰＥ１、ＳＴＲＩＰＥ２及びＳＴＲＩＰＥ３となる３つのストライプに分割される。１つの垂直消失点ＶＶＰ、並びに３つの水平消失点ＨＶＰ１、ＨＶＰ２及びＨＶＰ３は、上述の方法に従って見つけられる。水平消失点ＨＶＰ１、ＨＶＰ２及びＨＶＰ３は、それぞれＳＴＲＩＰＥ１、ＳＴＲＩＰＥ２及びＳＴＲＩＰＥ３に対する水平消失点である。従って、この画像を前記３つのストライプに分割する垂直消失点ＶＶＰから開始する２つの垂直線Ｐｅ−Ｐｆ及びＰｇ−Ｐｈが存在する。ここで、メッシュの水平曲線を考慮する。２つの水平曲線は、この図１５に図示する例において使用される。しかし、水平曲線の数は、予想されたＯＣＲ精度及び要件、並びに処理速度及び計算能力等の条件に依存して決定されても良い。例えば、この画像の左側エッジ上の２つの点は点Ｐ０１１及び点Ｐ０１２として選択される。これらの点は左側エッジを同等に分割するように選択されるのが好ましい。しかし、これは厳密な要件ではない。その後、１本の線は水平消失点ＨＶＰ１から開始して点Ｐ０１１に向かって描かれ、線Ｐｅ−Ｐｆとの交点Ｐ１２１を有するように延長し、１本の線は水平消失点ＨＶＰ１から開始して点Ｐ０１２に向かって描かれ、線Ｐｅ−Ｐｆとの交点Ｐ１２２を有するように延長する。次に、１本の線は水平消失点ＨＶＰ２から開始して点Ｐ１２１に向かって描かれ、線Ｐｇ−Ｐｈとの交点Ｐ２３１を有し、１本の線は水平消失点ＨＶＰ２から開始して点Ｐ１２２に向かって描かれ、線Ｐｇ−Ｐｈとの交点Ｐ２３２を有する。最後に、１本の線は水平消失点ＨＶＰ３から開始して点Ｐ２３１に向かって描かれ、画像の右側エッジとの交点Ｐ３０１を有し、１本の線は水平消失点ＨＶＰ３から開始して点Ｐ２３２に向かって描かれ、画像の右側エッジとの交点Ｐ３０２を有する。結果として、８つの点Ｐ０１１、Ｐ０１２、Ｐ１２１、Ｐ１２２、Ｐ２３１、Ｐ２３２、Ｐ３０１及びＰ３０２が取得される。２つの水平曲線は、点の２つのグループ、即ち、点Ｐ０１１、Ｐ１２１、Ｐ２３１及びＰ３０１のグループと点Ｐ０１２、Ｐ１２２、Ｐ２３２及びＰ３０２のグループとを使用することによりフィッティングされても良い。即ち、一般に水平曲線は、水平消失点の各々と垂直線との交点を計算することにより判定される。尚、上記図示する例において、方法は左側の水平消失点から開始されるが、ある特定のストライプ内の水平曲線の方向がそのストライプに対する水平消失点により判定される限り、任意の１つの水平消失点が開始水平消失点として使用可能である。 In step S340, a model describing the distorted document image is constructed from the image stripes and the corresponding horizontal and vertical vanishing points, in order to describe the mapping between the distorted document image and the corrected document image. In this example, the model is a mesh. FIG. 15 illustrates a method for constructing the mesh. As shown in FIG. 15, the document image Pa-Pb-Pc-Pd is drawn with solid curves and is divided into three stripes, STRIPE1, STRIPE2 and STRIPE3, in order from left to right. One vertical vanishing point VVP and three horizontal vanishing points HVP1, HVP2 and HVP3 are found according to the method described above. The horizontal vanishing points HVP1, HVP2 and HVP3 are the horizontal vanishing points for STRIPE1, STRIPE2 and STRIPE3, respectively. Thus, there are two vertical lines Pe-Pf and Pg-Ph, starting from the vertical vanishing point VVP, that divide the image into the three stripes. Next, the horizontal curves of the mesh are considered. Two horizontal curves are used in the example illustrated in FIG. 15. However, the number of horizontal curves may be determined depending on the expected OCR accuracy and requirements, as well as conditions such as processing speed and computing power. For example, two points on the left edge of the image are selected as point P011 and point P012. These points are preferably selected so as to divide the left edge equally, although this is not a strict requirement. Then one line is drawn from the horizontal vanishing point HVP1 toward the point P011 and extended until it intersects the line Pe-Pf at the point P121, and another line is drawn from HVP1 toward the point P012 and extended until it intersects the line Pe-Pf at the point P122. Next, one line is drawn from the horizontal vanishing point HVP2 toward the point P121 and intersects the line Pg-Ph at the point P231, and another line is drawn from HVP2 toward the point P122 and intersects the line Pg-Ph at the point P232. Finally, one line is drawn from the horizontal vanishing point HVP3 toward the point P231 and intersects the right edge of the image at the point P301, and another line is drawn from HVP3 toward the point P232 and intersects the right edge of the image at the point P302. As a result, eight points P011, P012, P121, P122, P231, P232, P301 and P302 are obtained. The two horizontal curves may be fitted by using two groups of points: the group P011, P121, P231 and P301 and the group P012, P122, P232 and P302. That is, in general, a horizontal curve is determined by calculating the intersections between the vertical lines and the lines drawn from each of the horizontal vanishing points. Note that although the method starts from the leftmost horizontal vanishing point in the illustrated example, any one of the horizontal vanishing points can be used as the starting horizontal vanishing point, as long as the direction of the horizontal curve within a particular stripe is determined by the horizontal vanishing point for that stripe.
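The chain of intersections that yields one horizontal curve (for example P011, P121, P231, P301) can be sketched in Python as follows. Each stripe boundary is assumed to be given by two points on it, the stripes are processed from left to right, and the function names are illustrative.

```python
def intersect(p1, p2, p3, p4):
    """Intersection of the infinite line p1-p2 with the infinite line p3-p4."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-12:
        raise ValueError("lines are parallel")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = (a * (x3 - x4) - (x1 - x2) * b) / d
    y = (a * (y3 - y4) - (y1 - y2) * b) / d
    return (x, y)


def trace_horizontal_curve(start, hvps, boundaries):
    """Trace one horizontal mesh curve across all stripes.

    start:      point on the left image edge (e.g. P011).
    hvps:       horizontal vanishing point of each stripe, left to right.
    boundaries: right boundary of each stripe as a pair of points; the
                last entry is the right edge of the image.
    """
    points = [start]
    for hvp, (b0, b1) in zip(hvps, boundaries):
        # Line from the stripe's vanishing point through the current point,
        # extended to the stripe's right boundary.
        points.append(intersect(hvp, points[-1], b0, b1))
    return points
```

Calling `trace_horizontal_curve` once per starting point on the left edge gives the point groups through which the horizontal curves are then fitted.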

図１３に、上述の方法により構成されたメッシュと共に図９の文書画像を示す。図１３に示すように、画像領域全体は、垂直消失点から導出される７本の垂直線により導出される８つのプレーナストライプに分割される。９本の水平曲線のグループは、上述のように水平消失点と垂直線との間の交点を計算することにより判定される。 FIG. 13 shows the document image of FIG. 9 together with the mesh constructed by the method described above. As shown in FIG. 13, the entire image area is divided into eight planar stripes derived by seven vertical lines derived from the vertical vanishing point. The group of nine horizontal curves is determined by calculating the intersection between the horizontal vanishing point and the vertical line as described above.

メッシュが確立された後、歪み文書画像上の点と補正文書画像上の点との間のマッピングはメッシュを参照することにより生成され、その後補正文書画像は前記マッピングを参照することにより取得される。 After the mesh is established, a mapping between points on the distorted document image and points on the corrected document image is generated by referring to the mesh, and the corrected document image is then obtained by referring to the mapping.

歪み文書画像上の点と補正文書画像上の点との間のマッピングは、境界補間に基づいて決定される。境界補間法の１つは、C. Strouthopoulos、N. Papamarkos及びC. Chamzasによる「Identification of Text-Only Areas in Mixed-type Documents」Engng Applic. Artif. Intell.、Elsevier Science Ltd、Great Britain、Vol. 10、No. 4、387-401ページ、1997年において説明される。 The mapping between points on the distorted document image and points on the corrected document image is determined based on boundary interpolation. One of the boundary interpolation methods is “Identification of Text-Only Areas in Mixed-type Documents” by C. Strouthopoulos, N. Papamarkos and C. Chamzas, Engng Applic. Artif. Intell., Elsevier Science Ltd, Great Britain, Vol. 10, No. 4, pages 387-401, explained in 1997.

一例において、図７に示すように、自然３次スプラインは、それらの交点をつなぎ且つメッシュのグリッドを境界曲線Ｃ_ｉ（ｉ＝１，２，３，４）として囲む曲線をフィッティングするために使用される。図７の左下の部分は、上述のように、４つの境界曲線Ｃ_ｉ（ｉ＝１，２，３，４）により囲まれるメッシュの１つのグリッドを示す。これらの境界曲線は、上述のように、垂直消失点及び水平消失点により取得される水平曲線及び垂直線の一部である。図７の右下の部分は、歪み文書画像のグリッドに対応する補正文書画像のグリッドにおけるパラメータ空間ｕ及びｖにわたり規定される補正文書画像を示す。ここで、ｕ∈［０，１］且つｖ∈［０，１］である。水平境界曲線ｃ_１及びｃ_３は、ｘ座標であるｃ_ｉｘ（ｕ）及びｙ座標であるｃ_ｉｙ（ｕ）（ｉ＝１，３）で表され、垂直境界線ｃ_２及びｃ_４は、ｘ座標であるｃ_ｉｘ（ｖ）及びｙ座標であるｃ_ｉｙ（ｖ）（ｉ＝２，４）で表される。即ち、補正文書画像の各ｕに対して、歪み文書画像中の水平境界曲線ｃ_１及びｃ_３上の各点は、(ｃ_ｉｘ（ｕ），ｃ_ｉｙ（ｕ）)（ｉ＝１，３）で表され、補正文書画像の各ｖに対して、歪み文書画像中の垂直境界線ｃ_２及びｃ_４上の各点は（ｃ_ｉｘ（ｖ），ｃ_ｉｙ（ｖ））（ｉ＝２，４）で表される。 In one example, as shown in FIG. 7, natural cubic splines are used to fit the curves that connect the intersections and enclose a grid cell of the mesh as boundary curves c_i (i = 1, 2, 3, 4). The lower left part of FIG. 7 shows one grid cell of the mesh enclosed by the four boundary curves c_i (i = 1, 2, 3, 4). As described above, these boundary curves are parts of the horizontal curves and vertical lines obtained from the vertical vanishing point and the horizontal vanishing points. The lower right part of FIG. 7 shows the corrected document image defined over the parameter space u and v in the grid cell of the corrected document image that corresponds to the grid cell of the distorted document image, where u ∈ [0, 1] and v ∈ [0, 1]. The horizontal boundary curves c₁ and c₃ are represented by the x-coordinate c_ix(u) and the y-coordinate c_iy(u) (i = 1, 3), and the vertical boundary lines c₂ and c₄ are represented by the x-coordinate c_ix(v) and the y-coordinate c_iy(v) (i = 2, 4). That is, for each u of the corrected document image, the corresponding points on the horizontal boundary curves c₁ and c₃ in the distorted document image are (c_ix(u), c_iy(u)) (i = 1, 3), and for each v of the corrected document image, the corresponding points on the vertical boundary lines c₂ and c₄ in the distorted document image are (c_ix(v), c_iy(v)) (i = 2, 4).

各境界曲線ｃｉ（ｉ＝１，２，３，４）は、ｕ−ｖ空間の直線から成る画像の対応する辺にマップする。例えば、補正文書画像のｕ軸は歪み文書画像中の曲線ｃ_１に対応し、補正文書画像のｖ軸は歪み文書画像中の曲線ｃ_４に対応する。この場合、補正文書画像中の任意の点（ｕ，ｖ）を歪み文書画像中の境界曲線ｃｉ（ｉ＝１，２，３，４）により囲まれる歪み文書画像中の点（ｃ_ｘ（ｕ，ｖ），ｃ_ｙ（ｕ，ｖ））にマップする方法を記述する２Ｄ関数は、例えば以下のように双線形混合Coonsパッチを使用して提供される。 Each boundary curve c_i (i = 1, 2, 3, 4) maps to the corresponding side of the rectilinear image in u-v space. For example, the u axis of the corrected document image corresponds to the curve c₁ in the distorted document image, and the v axis of the corrected document image corresponds to the curve c₄ in the distorted document image. In this case, the 2D function that describes how an arbitrary point (u, v) in the corrected document image maps to a point (c_x(u, v), c_y(u, v)) in the region of the distorted document image enclosed by the boundary curves c_i (i = 1, 2, 3, 4) is given, for example, by a bilinearly blended Coons patch as follows.

これらの式は、２つの対向する境界曲線（式の第１の項及び第２の項）の線形補間により形成され、補正関数は境界の隅の点（式の第３の項）に基づく。そのような式の更なる詳細については、Zheng Zhang、Chew Lim Tanによる「Correcting document image warping based on regression of curved text lines」proceedings of the Seventh International Conference on Document Analysis and Recognition（ICDAR' 03）において見つけられる。 These equations are formed by linear interpolation between the two pairs of opposing boundary curves (the first and second terms of the equations), together with a correction term based on the corner points of the boundary (the third term of the equations). Further details of such equations can be found in Zheng Zhang and Chew Lim Tan, “Correcting document image warping based on regression of curved text lines”, Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR '03).
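The equations themselves are not reproduced in this text. As an illustration, the standard bilinearly blended Coons patch, which matches the three-term description above, can be sketched in Python as follows. The sketch assumes that c1(u) and c3(u) are the horizontal boundaries at v = 0 and v = 1, that c4(v) and c2(v) are the vertical boundaries at u = 0 and u = 1, and that each boundary is a callable returning an (x, y) point; the names are illustrative and the exact form used in the embodiment may differ.

```python
def coons_patch(c1, c2, c3, c4, u, v):
    """Map (u, v) in the unit square to (x, y) inside the curved cell.

    c1(u), c3(u): horizontal boundary curves at v = 0 and v = 1.
    c4(v), c2(v): vertical boundary curves at u = 0 and u = 1.
    The boundaries must agree at the four corners.
    """
    p00, p10 = c1(0.0), c1(1.0)  # corners on the v = 0 side
    p01, p11 = c3(0.0), c3(1.0)  # corners on the v = 1 side
    result = []
    for k in range(2):  # k = 0 for x, k = 1 for y
        # First term: interpolation between the two horizontal curves.
        ruled_u = (1 - v) * c1(u)[k] + v * c3(u)[k]
        # Second term: interpolation between the two vertical curves.
        ruled_v = (1 - u) * c4(v)[k] + u * c2(v)[k]
        # Third term: bilinear correction from the corner points.
        corners = ((1 - u) * (1 - v) * p00[k] + u * (1 - v) * p10[k]
                   + (1 - u) * v * p01[k] + u * v * p11[k])
        result.append(ruled_u + ruled_v - corners)
    return tuple(result)
```

On each side of the unit square the patch reproduces the corresponding boundary curve exactly, which is why neighboring grid cells that share a boundary join without gaps.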

メッシュの任意のグリッドにおけるマッピング関係を取得するために、グリッドを囲む２つの関連する水平曲線がｃ_１及びｃ_３として選択され、グリッドを囲む２つの関連する垂直線はｃ_２及びｃ_４として選択される。 To get the mapping relationship in any grid of the mesh, the two related horizontal curves surrounding the grid are selected as c ₁ and c ₃ and the two related vertical lines surrounding the grid are selected as c ₂ and c ₄ Is done.

上記処理によると、メッシュの各グリッドにおける点毎にマッピングを確立することにより、歪み文書画像と補正文書画像との画素マッピングが確立される。 According to the above processing, pixel mapping between the distorted document image and the corrected document image is established by establishing mapping for each point in each grid of the mesh.

尚、自然３次スプライン法は、それらの交点をつなぐ曲線をフィッティングするために使用されるが、２次曲線等の種々の他の曲線も使用でき、対応する補間方法がマッピングに対して使用されても良い。更に、直線により交点を単純につなぐ方法も使用可能である。この場合、メッシュの各グリッドは、四角形により近似されてもよく、この四角形内の各点は周知の線形技術を使用することにより補間される。 Note that although the natural cubic spline method is used here to fit the curves connecting the intersections, various other curves such as quadratic curves can also be used, with a corresponding interpolation method for the mapping. It is also possible simply to connect the intersections with straight lines. In this case, each grid cell of the mesh may be approximated by a quadrilateral, and each point within the quadrilateral is interpolated using well-known linear techniques.

最後のステップＳ３５０において、補正文書画像はマッピングにより取得される。詳細には、マップされた画像が歪み文書画像の画素に対応して取得される場合、マップされた画素は、歪み文書画像の対応する画素と同一の色でレンダリングされる。図１４は、本発明に係る歪み補正方法により図８に示す歪み画像から補正された例示的な補正文書画像を示す。補正文書画像は、透視及び反りの問題による歪みがなく、非常に平坦に見えることが分かる。補正後、補正文書画像を使用するＯＣＲ認識精度は、歪み文書画像と比較して大きく向上される。 In the final step S350, the corrected document image is obtained through the mapping. Specifically, when a mapped pixel is obtained for a pixel of the distorted document image, the mapped pixel is rendered in the same color as the corresponding pixel of the distorted document image. FIG. 14 shows an exemplary corrected document image obtained from the distorted image shown in FIG. 8 by the distortion correction method according to the present invention. It can be seen that the corrected document image is free of the distortion caused by perspective and warping and appears very flat. After correction, OCR accuracy on the corrected document image is greatly improved compared with that on the distorted document image.
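The rendering in step S350 can be sketched as a per-pixel color copy. The passage above does not fix the direction in which the mapping is evaluated. The sketch below assumes a backward lookup (from each corrected pixel to its position in the distorted image) with nearest-neighbor sampling and a white fill for positions that fall outside the source, all of which are illustrative assumptions. The names `render_corrected`, `mapping`, and `background` are hypothetical.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # indexed as image[y][x]


def render_corrected(distorted: Image, width: int, height: int,
                     mapping: Callable[[int, int], Tuple[float, float]],
                     background: Pixel = (255, 255, 255)) -> Image:
    """Build the corrected image by copying colors through the mapping.

    mapping(x, y) returns the (x, y) position in the distorted image that
    corresponds to pixel (x, y) of the corrected image (assumed direction).
    """
    src_h, src_w = len(distorted), len(distorted[0])
    corrected: Image = []
    for y in range(height):
        row: List[Pixel] = []
        for x in range(width):
            sx, sy = mapping(x, y)
            # Nearest-neighbor lookup of the corresponding source pixel.
            ix, iy = int(round(sx)), int(round(sy))
            inside = 0 <= ix < src_w and 0 <= iy < src_h
            # The corrected pixel takes the same color as its source pixel.
            row.append(distorted[iy][ix] if inside else background)
        corrected.append(row)
    return corrected
```

A backward lookup guarantees that every pixel of the corrected image receives a color; a forward evaluation of the same mapping would be equally consistent with the description but can leave unfilled pixels that need separate handling.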

本発明の方法及びシステムは多くの方法で実行できる。例えば、ソフトウェア、ハードウェア、ファームウェア又はそれらの任意の組合せにより本発明の方法及びシステムを実行できる。方法に対するステップの上述した順序は例示することのみを意図し、本発明の方法のステップは、特に指示のない限り特に上述した順序に限定されない。更にいくつかの実施形態において、本発明は、記録媒体に記録されたプログラムとして実施されてもよく、これは本発明に係る方法を実現するための機械可読命令を含む。従って、本発明は本発明に係る方法を実現するためのプログラムを格納する記録媒体も範囲に含む。 The method and system of the present invention can be implemented in many ways, for example by software, hardware, firmware, or any combination thereof. The order of the method steps described above is illustrative only, and the steps of the method of the present invention are not limited to the order specifically described above unless otherwise indicated. Furthermore, in some embodiments, the present invention may be embodied as a program recorded on a recording medium, the program including machine-readable instructions for implementing the method according to the present invention. Accordingly, the scope of the present invention also covers a recording medium storing a program for implementing the method according to the present invention.

本発明のいくつかの特別な実施形態が例を使用して詳細に実証されたが、上記例は例示することのみを意図し、本発明の範囲を限定することを意図しないことが当業者には理解されるべきである。上記実施形態は、本発明の趣旨の範囲から逸脱せずに変更可能であることが当業者には理解されるべきである。本発明の範囲は、添付の特許請求の範囲により規定される。 Although several specific embodiments of the present invention have been described in detail by way of example, those skilled in the art should understand that the above examples are intended to be illustrative only and are not intended to limit the scope of the present invention. Those skilled in the art should also understand that the above embodiments can be modified without departing from the spirit and scope of the present invention. The scope of the present invention is defined by the appended claims.

Claims

A method for correcting geometric deformation in a distorted document image of a document, the method comprising:
a vertical vanishing point detecting step of detecting a vertical vanishing point of the distorted document image, the vertical vanishing point being a vanishing point in the vertical direction perpendicular to the text lines of the document;
an image dividing step of dividing the entire region of the distorted document image into a plurality of image stripes using a plurality of vertical lines derived from the detected vertical vanishing point;
a horizontal vanishing point detecting step of detecting a horizontal vanishing point for each of the plurality of image stripes by obtaining, in each of the plurality of image stripes, a vanishing point in the horizontal direction parallel to the text lines of the document;
a distortion model generating step of establishing, using a mesh model formed from the plurality of vertical lines derived from the vertical vanishing point and a plurality of horizontal lines derived from the respective horizontal vanishing points detected for the plurality of image stripes, a distortion model describing a mapping relationship between the distorted document image and a corrected document image; and
a correction step of generating the corrected document image based on the distortion model.

The method according to claim 1, wherein the vertical vanishing point detecting step includes:
a sub-step of extracting a plurality of vertical strokes of characters from the distorted document image;
a sub-step of fitting the vertical strokes with a plurality of vertical straight line segments; and
a sub-step of calculating the vertical vanishing point from the vertical straight line segments by searching for an optimal convergence point of the vertical straight line segments.

The method according to claim 1 or 2, wherein the horizontal vanishing point detecting step includes:
a sub-step of locating the curves of the text lines along the direction of the text lines in the distorted document image;
a sub-step of extracting, for each image stripe, the segments of the text line curves located in that image stripe;
a sub-step of fitting, for each image stripe, the segments of the text line curves with horizontal straight line segments; and
a sub-step of calculating, for each image stripe, the horizontal vanishing point of that image stripe from the horizontal straight line segments by searching for an optimal convergence point of the horizontal straight line segments.

The method according to claim 3, wherein the sub-step of locating the curves of the text lines along the direction of the text lines in the distorted document image includes:
an intermediate height point extracting step of extracting intermediate height points from the character pixels of the distorted document image; and
a text line curve locating step of using the intermediate height points to locate the text line curves passing through the intermediate height of the characters of the text lines.

A system for correcting geometric deformation in a distorted document image of a document, the system comprising:
vertical vanishing point detecting means for detecting a vertical vanishing point of the distorted document image, the vertical vanishing point being a vanishing point in the vertical direction perpendicular to the text lines of the document;
image dividing means for dividing the entire region of the distorted document image into a plurality of image stripes using a plurality of vertical lines derived from the detected vertical vanishing point;
horizontal vanishing point detecting means for detecting a horizontal vanishing point for each of the plurality of image stripes by obtaining, in each of the plurality of image stripes, a vanishing point in the horizontal direction parallel to the text lines of the document;
distortion model generating means for establishing, using a mesh model formed from the plurality of vertical lines derived from the vertical vanishing point and a plurality of horizontal lines derived from the respective horizontal vanishing points detected for the plurality of image stripes, a distortion model describing a mapping relationship between the distorted document image and a corrected document image; and
correction means for generating the corrected document image based on the distortion model.

The system according to claim 5, wherein the vertical vanishing point detecting means includes:
means for extracting a plurality of vertical strokes of characters from the distorted document image;
means for fitting the vertical strokes with a plurality of vertical straight line segments; and
means for calculating the vertical vanishing point from the vertical straight line segments by searching for an optimal convergence point of the vertical straight line segments.

The system according to claim 5 or 6, wherein the horizontal vanishing point detecting means includes:
means for locating the curves of the text lines along the direction of the text lines in the distorted document image;
means for extracting, for each image stripe, the segments of the text line curves located in that image stripe;
means for fitting, for each image stripe, the segments of the text line curves with horizontal straight line segments; and
means for calculating, for each image stripe, the horizontal vanishing point of that image stripe from the horizontal straight line segments by searching for an optimal convergence point of the horizontal straight line segments.

The system according to claim 7, wherein the means for locating the curves of the text lines along the direction of the text lines in the distorted document image includes:
intermediate height point extracting means for extracting intermediate height points from the character pixels of the distorted document image; and
text line curve locating means for using the intermediate height points to locate the text line curves passing through the intermediate height of the characters of the text lines.

A program for causing a computer to execute each step of the method according to any one of claims 1 to 4.