JP2008252856A

JP2008252856A - Method of correcting image, correction program, and apparatus of correcting image distortion

Info

Publication number: JP2008252856A
Application number: JP2007169502A
Authority: JP
Inventors: Seiichi Uchida; 誠一内田; Masakazu Iwamura; 雅一岩村; Shinichiro Omachi; 真一郎大町; Koichi Kise; 浩一黄瀬
Original assignee: Osaka University NUC; Osaka Prefecture University
Current assignee: Osaka University NUC; Osaka Prefecture University
Priority date: 2007-03-07
Filing date: 2007-06-27
Publication date: 2008-10-16
Anticipated expiration: 2027-06-27
Also published as: JP4859061B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique capable of skew correction even when the characters in a document image are not arranged in straight lines or when the characters contain no straight portions, and which is also applicable to the correction of images that have undergone deformation such as projective or affine transformation.
SOLUTION: A geometric estimation technique that combines invariants and variants is used to estimate the amount of deformation, and the image is corrected on the basis of that estimate. In one embodiment, the change in a variant (e.g., the area of the circumscribed rectangle of a character) is measured in advance while rotating characters of various fonts, and is stored as a case together with an invariant (e.g., the areas of black pixels and white pixels within the convex hull of the character). A case is then retrieved using the rotation invariant calculated from a connected component (most of which correspond to single characters) of the image to be corrected, and the rotation angle of the character is obtained from the rotation variant calculated in the same way. In a simple skew estimation experiment on 44 sample document images, the error was at most 1 degree for 22 samples and at most 2 degrees for 42 samples.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates generally to an image correction method, a correction program, and an image distortion correction apparatus. More specifically, it relates to a method, a program, and an apparatus for correcting geometric deformation that an image has undergone.

In character recognition for document images, estimating and compensating for geometric deformation is one of the important problems. Here, a “document image” is a kind of image in which a document is recorded as an image. Part of it may contain non-character content such as photographs and illustrations. For example, for characters in a document image acquired by a scanner, rotational deformation caused by the inclination (skew) of the paper has been regarded as an important problem. As described later, a great deal of research has been done on this skew correction.
On the other hand, for a document image acquired by a camera, the geometric deformation of the characters in the image is more diverse and complex (see, for example, Non-Patent Document 1). For example, projective distortion can arise because the camera does not directly face the paper, and non-linear distortion can arise because the paper itself is not flat. Methods for correcting these distortions are called dewarping and are being actively studied with the rise of camera-based character recognition.

In processing with a digital camera, the image is degraded by non-uniform illumination and defocus, which cannot occur with a scanner, and by projective distortion that arises when the subject is photographed obliquely, so it is generally more difficult than processing with a scanner. Nevertheless, digital cameras are used in place of scanners because of their portability and convenience. For example, a scanner takes time and effort to set up, so it is not suited to uses that involve moving it or carrying it around. Nor can it be used for large or immovable objects such as posters and signboards. A digital camera, by contrast, can be used casually whenever the need arises, and has the potential to develop into entirely new forms of use.

There are many research examples of skew correction methods for document images. Many of them use global features of the document image. A typical example is a method that relies on marginal distributions (projection profiles) computed while varying the rotation angle. On the other hand, several methods have been proposed that estimate the skew angle of each local region (for example, each connected component) and integrate the results over the whole image to estimate the rotation angle of the entire document image. These are outlined below.

First, in Non-Patent Document 2, circular local regions of a fixed size are extracted from the image, and the variation of the document's pixel values along a straight line drawn within each circle is measured. Lines are drawn at different angles, and the direction of the line with the largest variation is taken as the estimated rotation angle of the local region. As the integration method, the angle of the line with the largest variation is taken as the rotation angle of the entire document.

Next, in Non-Patent Document 3, feature points are first detected from the connected components in a local region. For each feature point, a straight line representing the approximate rotation direction is obtained by comparing the line segments between it and its three nearest feature points; then, using a fixed number of points lying within a small distance of that line, a line representing the final local rotation angle is obtained by least-squares approximation. This operation is performed for each feature point, the results are voted on, and the overall rotation angle is obtained.

Further, in Non-Patent Document 4, for a given connected component, the connected components that satisfy a certain neighborhood condition are first merged with it to form an enlarged connected component. A straight line is then drawn between the two pixels that are farthest apart within it, and the angle of that line is taken as the local rotation angle. The neighborhood condition takes into account, for example, the straightness of a text line. If no connected component satisfies the condition, no local rotation angle is obtained. This processing is performed for each connected component, and the median of the local rotation angles obtained is taken as the overall rotation angle.

Correction of projective distortion in document images is a basic problem in camera-based document image processing, and various studies have already been carried out. They can be broadly divided into (1) methods that use the frame of the document, (2) methods that use the text lines in the document, and (3) methods that use stereo vision.

Method (1) presupposes that the frame of the document is rectangular and can be clearly found in the photographed document image. When a rectangle is subjected to projective distortion, its opposite sides, which should be parallel, lose their parallelism, so it appears as a general quadrilateral in the acquired image. Using the knowledge that the quadrilateral was originally a rectangle, a transformation that restores it to a rectangle can be found, and the projectively distorted document image can thereby be restored to a fronto-parallel image (see, for example, Non-Patent Document 5). This method is also implemented in commercially available digital cameras (such as the Ricoh Caplio (registered trademark) R6).

Method (2) assumes that text lines are parallel. For example, Non-Patent Document 5 proposes a technique that estimates the inclination from vanishing points. Non-Patent Document 5 describes both technique (1) and technique (2). In this technique, text lines in the document image are first extracted to find the horizontal vanishing point of the document. Next, assuming that both ends of the text lines are roughly aligned, three straight lines through the right ends, centers, and left ends of the text lines are estimated, and the vertical vanishing point of the document is also found. The two vanishing points obtained in this way are used to restore a fronto-parallel document image.
Method (3) restores the three-dimensional shape using multiple cameras (see, for example, Non-Patent Document 6) or video (see, for example, Non-Patent Document 7).

Non-Patent Document 1: Koichi Kise, Shinichiro Omachi, Seiichi Uchida, Masakazu Iwamura, "Recognition and Understanding of Characters and Documents Using Digital Cameras" (in Japanese), Journal of the IEICE, vol.89, no.9, pp.36-841, Sep. 2006.
Non-Patent Document 2: Y. Ishitani, "Document Skew Detection Based on Local Region Complexity," Proc. Int. Conf. Doc. Anal. Recog., pp.49-52, 1993.
Non-Patent Document 3: X. Jiang, H. Bunke, and D. Widmer-Kljajo, "Skew Detection of Document Images by Focused Nearest-Neighbor Clustering," Proc. Int. Conf. Doc. Anal. Recog., pp.629-632, 1999.
Non-Patent Document 4: Y. Lu and C.L. Tan, "Improved Nearest Neighbor Based Approach to Accurate Document Skew Estimation," Proc. Int. Conf. Doc. Anal. Recog., pp.503-507, 2003.
Non-Patent Document 5: P. Clark and M. Mirmehdi, "Recognising text in real scenes," Int'l Journal of Document Analysis and Recognition, vol.4, pp.243-257, 2002.
Non-Patent Document 6: C.H. Lampert, T. Braun, A. Ulges, D. Keysers, and T.M. Breuel, "Oblivious document capture and real-time retrieval," Proc. First Int'l. Workshop on Camera-Based Document Analysis and Recognition, pp.79-86, Aug. 2005.
Non-Patent Document 7: Akihiko Iketani, Tomokazu Sato, Sei Ikeda, Masayuki Kanbara, Noboru Nakajima, Naokazu Yokoya, "Super-resolved video mosaicing for paper documents based on camera parameter estimation" (in Japanese), IEICE Transactions (D), vol.J88-D, no.8, pp.1490-1498, Aug. 2005.

All of the skew correction techniques described above assume local linearity in the document: either that several neighboring connected components among those making up the document image form a straight text line, or that characters contain straight portions. However, such assumptions do not always hold. The characters in a document image may not be arranged in straight lines, or the characters may contain no straight portions. A technique that can correct skew without relying on these assumptions is therefore desired.

Beyond skew correction of document images, a more general technique is also desired that can be applied to the correction of images that have undergone geometric deformation with more degrees of freedom, that is, deformation belonging to projective transformation, affine transformation, and the like. As noted above, document image processing with a digital camera may give rise to excellent applications, but it is not easy to realize. The reason is that most existing document image processing techniques target document images acquired by a scanner. To apply existing techniques to a document image acquired by a digital camera, the image must be converted and corrected so that it looks as if it had been acquired by a scanner. Among the kinds of document image degradation caused by using a digital camera, a technique for correcting projective distortion is desired; that is, a technique that can obtain, from a document image photographed obliquely with a digital camera, a fronto-parallel document image like one acquired by a scanner.

Method (1) above relies on the reasonable assumption that the frame of most documents is rectangular, but it requires the entire document image, including the frame, to be photographed.
Method (2) imposes strong assumptions on the document layout, so its range of application is limited. It is essentially inapplicable to pages with special layouts, and even when the layout is ordinary, the two ends of the text lines are not easy to estimate on pages containing many figures and mathematical expressions.

Methods (1) and (2) above cannot be applied to an image whose layout is complex (the text lines are not parallel) and which does not contain the frame of the document (see, for example, FIG. 16).
Method (3) does not target images photographed with a single still camera, and differs from the present invention in the number and type of devices used.

The present invention has been made in view of the circumstances described above, and provides a distortion correction technique applicable to the correction of images that have undergone geometric deformation belonging to projective transformation, affine transformation, similarity transformation, and the like.

The present invention provides an image correction method that takes as input an image that has undergone geometric deformation and corrects that deformation, the method comprising: a step of dividing the input image into local patterns; a calculation step of calculating, for each local pattern and according to a predetermined procedure, an invariant whose value is substantially constant regardless of the degree of deformation and a variant whose value changes according to the degree of deformation; a step of classifying each local pattern into one of a plurality of categories based on the calculated invariant; an estimation step of estimating, based on the variant calculated for each local pattern in each category, the degree of deformation that the local pattern has undergone; and a step of correcting the image based on the estimation result, wherein each step is executed by a computer.

From a different viewpoint, the present invention provides an image correction program for taking as input an image that has undergone geometric deformation and correcting that deformation, the program causing a computer to execute: a process of dividing the input image into local patterns; a calculation process of calculating, for each local pattern and according to a predetermined procedure, an invariant whose value is substantially constant regardless of the degree of deformation and a variant whose value changes according to the degree of deformation; a process of classifying each local pattern into one of a plurality of categories based on the calculated invariant; an estimation process of estimating, based on the variant calculated for each local pattern in each category, the degree of deformation that the local pattern has undergone; and a process of correcting the image based on the estimation result.

From yet another viewpoint, the present invention provides an image distortion correction apparatus that takes as input an image that has undergone geometric deformation and corrects that deformation, the apparatus comprising: a division unit that divides the input image into local patterns; a calculation unit that calculates, for each local pattern and according to a predetermined procedure, an invariant whose value is substantially constant regardless of the degree of deformation and a variant whose value changes according to the degree of deformation; a classification unit that classifies each local pattern into one of a plurality of categories based on the calculated invariant; an estimation unit that estimates, based on the variant calculated for each local pattern in each category, the degree of deformation that the local pattern has undergone; and a correction unit that corrects the image based on the estimation result.

In other words, a feature of the invention in its first aspect is that the degree of geometric deformation (the deformation amount) that the object has undergone is estimated using only a “variant” and an “invariant” with respect to a specific geometric deformation. Taking skew, that is, rotation, as an example of the specific geometric deformation, the rotation angle is estimated by combining a feature whose value changes under rotation with a feature whose value does not. The invention thus takes an approach entirely different from conventional skew correction methods, which rely on the inclination of text lines or of the vertical and horizontal strokes of each character.

A feature of the invention in its second aspect, distinct from the first, is that the deformation amount is estimated on the basis of cases. This feature is first explained below assuming a simple scheme. If the category of each character pattern in the document were known, the standard pattern of that category could be superimposed on the pattern while being rotated, and the rotation angle giving the greatest overlap would serve as the estimate of the character's rotation angle. This simple scheme can be said to treat all standard patterns rotated at various angles as cases, and to estimate the rotation angle by comparing the character pattern in the document with each case. However, the simple scheme has a first problem: once the range of rotation angles and scaling are taken into account, the number of cases becomes enormous and impractical. Moreover, the category of each character pattern in the document is unknown in the first place, so a second problem remains, namely how to select the cases. For the first problem, the invention uses a scalar quantity, the variant, as the case, which makes the estimation of the deformation amount efficient. For the second problem, it uses the invariant described above so that cases can be looked up even when the category is unknown.
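The two ideas in the preceding paragraph (a scalar variant stored per case, and an invariant used as the lookup key when the character category is unknown) can be sketched as follows. This is a minimal illustration under stated assumptions: the table contents, the tolerances `eps` and `tol`, and the function name `estimate_angles` are all hypothetical and do not come from the source.

```python
# Hypothetical case table: category invariant q_c -> list of (angle_deg, variant).
# All numbers are made up for illustration only.
CASES = {
    0.50: [(0, 1.00), (15, 1.20), (30, 1.35), (45, 1.40)],
    0.80: [(0, 1.00), (15, 1.10), (30, 1.18), (45, 1.22)],
}

def estimate_angles(q_x, p_x, cases, eps=0.05, tol=0.03):
    """Return candidate angles for a pattern with invariant q_x and variant p_x.

    A category is consulted only if its invariant q_c lies within eps of q_x
    (an assumed matching rule); within it, every stored angle whose variant
    lies within tol of p_x is reported as a candidate.
    """
    candidates = []
    for q_c, table in cases.items():
        if abs(q_x - q_c) > eps:
            continue  # invariant does not match this category
        for angle, variant in table:
            if abs(variant - p_x) <= tol:
                candidates.append(angle)
    return candidates

print(estimate_angles(0.52, 1.19, CASES))
```

Because only one scalar per (category, angle) pair is stored, the table stays small compared with storing every rotated standard pattern as an image.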

The document image correction method according to the invention comprises a step of classifying each local pattern into one of a plurality of categories based on the calculated invariant, an estimation step of estimating, based on the variant calculated for each local pattern in each category, the degree of deformation that the local pattern has undergone, and a step of correcting the image based on the estimation result. The category of a local pattern can therefore be determined from its invariant, the degree of geometric deformation can be estimated from that category and the variant of the local pattern, and the image can be corrected on the basis of the estimate. The method is thus applicable not only to skew correction of document images but also to the correction of images that have undergone geometric deformation with more degrees of freedom, that is, deformation belonging to projective transformation, affine transformation, and the like.
Moreover, when skew correction of a document image is performed with the correction method of the invention, the rotation angle can be corrected accurately even when the conventionally common assumption that characters have straight partial shapes and a straight arrangement does not hold. That is, the rotation angle can be estimated accurately even when the characters serving as local patterns in the document image are not arranged in straight lines, and even when characters with many curves, such as hiragana, are dominant.

Likewise, the document image correction program according to the invention causes a computer to execute a process of classifying each local pattern into one of a plurality of categories based on the calculated invariant, an estimation process of estimating, based on the variant calculated for each local pattern in each category, the degree of deformation that the local pattern has undergone, and a process of correcting the image based on the estimation result. The category of a local pattern can therefore be determined from its invariant, the degree of geometric deformation can be estimated from that category and the variant of the local pattern, and the image can be corrected on the basis of the estimate. The program is thus applicable not only to skew correction of document images but also to the correction of images that have undergone geometric deformation with more degrees of freedom, that is, deformation belonging to projective transformation, affine transformation, and the like.

Preferred embodiments of the invention are described below.
In the correction method and correction program according to the invention, the geometric deformation may be a projective transformation, an affine transformation, or a similarity transformation.

The image may be a document image, and at least some of the local patterns may be character patterns.

Further, the geometric deformation may be rotation, the variant may be a value that changes when the local pattern is rotated, and the invariant may be a value that remains substantially constant when the local pattern is rotated.

The geometric deformation may be a projective transformation, the variant may be a value that changes with depth, and the invariant may be a value that is substantially constant with respect to changes in depth.

Further, the variant may be the area of the rectangle circumscribing the local pattern.
Alternatively, the variant may be the area of the black-pixel portion of the local pattern.

Still further, the invariant may be the area within the convex hull of the local pattern.
Here, the convex hull is the convex polygon of smallest area among the convex polygons that contain a given pattern (here, a local pattern). A convex polygon is a polygon whose interior angles at the vertices are all less than 180 degrees.
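As an illustration of the convex hull definition above, the following sketch computes the hull of a set of pixel coordinates and its area. It uses Andrew's monotone chain algorithm and the shoelace formula, which are standard techniques chosen here for illustration; the source does not state how the hull is computed. The function names and the sample points are hypothetical.

```python
def convex_hull(points):
    """Return the convex hull vertices in counter-clockwise order.

    Collinear boundary points are dropped, so every interior angle of the
    returned polygon is strictly less than 180 degrees.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Area of a simple polygon by the shoelace formula."""
    total = 0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

# Hypothetical L-shaped pattern standing in for a character outline.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
hull = convex_hull(l_shape)
hull_area = polygon_area(hull)
```

In the continuous plane the hull area does not change when the pattern is rotated, which is why it is a candidate for a rotation invariant; on a pixel grid it is only approximately constant.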

A local pattern may be a pattern segmented as a connected component in the image, or a set of such patterns.
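A minimal sketch of segmenting a binary image into connected components is given below. It assumes black pixels are marked 1 and uses 8-connectivity with a breadth-first search; both choices, and the function name, are assumptions for illustration rather than details taken from the source.

```python
from collections import deque

def connected_components(grid):
    """Return a list of components, each a list of (row, col) black pixels.

    grid is a list of equal-length rows; 1 marks a black pixel.
    Pixels touching horizontally, vertically, or diagonally are connected.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(component)
    return components

# Hypothetical 3 x 4 binary image containing two separate blobs.
image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
blobs = connected_components(image)
```

The number of black pixels in each component, `len(component)`, gives the black-pixel area mentioned above as one possible variant.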

Each category may consist of the local patterns that satisfy a predetermined relationship between q_c and q_x (the expression itself is not reproduced in this text), where q_c is the invariant for that category, q_x is the invariant of each local pattern, and ε is a predetermined constant.
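The relationship itself is missing from this text. One plausible form, given here only as an assumption consistent with the symbols q_c, q_x, and ε defined above, is that a local pattern belongs to a category when its invariant lies within ε of the category's invariant:

```latex
\[
  \lvert q_x - q_c \rvert \le \varepsilon
\]
```

Whether the original inequality is strict, or is expressed as a ratio rather than a difference, cannot be determined from this text.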

Furthermore, the estimation step may provisionally estimate the degree of deformation for each local pattern by comparing the variant calculated from that local pattern with reference values stored in advance for each category, and may statistically process the provisional results to estimate the degree of deformation.
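The two stages described above (a provisional estimate per local pattern, then statistical processing of those estimates) might look like the following sketch. The source does not say which statistic is used; the median is an assumption here, as are the reference values, the category labels, and the function names.

```python
from statistics import median

def nearest_angle(variant, reference):
    """Provisional estimate: the stored angle whose variant is closest."""
    return min(reference, key=lambda entry: abs(entry[1] - variant))[0]

def estimate_skew(patterns, references):
    """Combine per-pattern provisional angles into one estimate.

    patterns:   list of (category, variant) pairs measured in the image
    references: category -> list of (angle_deg, variant) reference values
    Returns the median provisional angle, or None if nothing matched.
    """
    provisional = [
        nearest_angle(variant, references[category])
        for category, variant in patterns
        if category in references
    ]
    return median(provisional) if provisional else None

# Hypothetical reference values for two categories.
references = {
    "a": [(0, 1.0), (10, 1.2), (20, 1.4)],
    "b": [(0, 2.0), (10, 2.5), (20, 3.0)],
}
patterns = [("a", 1.22), ("b", 2.45), ("a", 1.41), ("b", 2.6), ("a", 1.18)]
skew = estimate_skew(patterns, references)
```

A median tolerates a minority of wrong provisional estimates, such as the single 20-degree vote in this example; a histogram vote would be another reasonable choice.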

Further, the reference values may be obtained by deforming the standard pattern of each category in steps, measuring the variant at each step, and storing the deformation amount of each step in association with the measured variant.
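The stepwise measurement described above can be sketched for the rotation case, using the area of the axis-aligned circumscribed rectangle as the variant. The standard pattern (four corner points of a bar), the step size, and the function names are hypothetical; a practical version would work on pixel sets and would also need to normalize for scale, which this sketch does not do.

```python
import math

def rotate(points, angle_deg):
    """Rotate 2-D points about the origin, counter-clockwise."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def bounding_rect_area(points):
    """Area of the axis-aligned rectangle circumscribing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def build_reference(standard_pattern, step_deg=15, max_deg=90):
    """Rotate the standard pattern in steps and record (angle, variant)."""
    return [
        (angle, bounding_rect_area(rotate(standard_pattern, angle)))
        for angle in range(0, max_deg + 1, step_deg)
    ]

# Hypothetical standard pattern: corners of a 4 x 2 bar.
bar = [(0, 0), (4, 0), (4, 2), (0, 2)]
reference = build_reference(bar, step_deg=45)
```

Note that the bar gives the same variant at 0 and 90 degrees, which shows that a single variant can be ambiguous about the angle; this is one reason for combining estimates from many local patterns.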

The estimation step may also provisionally estimate the degree of deformation for each category based on the relationship between the position of each local pattern and its variant, and may statistically process the provisional results to estimate the degree of deformation.
Specifically, for example, the area of a connected component is used as the “variant” and a ratio of areas as the “invariant”. These are basic quantities that can be computed easily from any document, so the method does not impose strong layout constraints such as requiring the rectangular frame of the document to be fully visible or the characters to be aligned in straight lines, as other techniques do. It can therefore be applied to a wide range of targets, including documents with an unusual layout such as that shown in FIG. 15.

The input image may be a document image, the geometric deformation may be a projective transformation, the variant may be the area of the black-pixel portion of the local pattern, the invariant may be the area within the convex hull of the local pattern, and the estimation step may provisionally estimate the inclination of the paper surface of the document image based on the relationship between the area of the black-pixel portion of each local pattern and its position on the paper.

Any of the preferred embodiments described here may be combined with one another.

The invention is described in further detail below with reference to the drawings. The following description is illustrative in all respects and should not be construed as limiting the invention.
<< Embodiment 1 >>
This embodiment first describes the procedure for the case in which the technique of the invention is applied to skew correction of a document image. Quantitative and qualitative results of the correction are described later in Experimental Example 1.

However, the technical idea of the invention is not limited to this. In principle it can also be applied to correcting images that have undergone geometric deformations with more degrees of freedom, such as projective or affine transformations. Nor is the correction target limited to document images: the technique is applicable in principle to any image that contains predetermined patterns. Extensions to general geometric deformations and correction targets are discussed after the skew correction of document images has been described.

1. Inclination estimation by combining a variant and an invariant
This section describes how geometric deformation is estimated by combining a variant and an invariant. As stated above, rotation is taken as the example of geometric deformation, and the discussion is limited to document images as the correction target. When the correction method of the invention is used for skew correction of a document image, the rotation angle can be corrected accurately even when the commonly used assumption that characters have straight partial shapes and are arranged in straight lines does not hold. That is, the rotation angle can be estimated accurately even when the characters, which serve as the local patterns in the document image, are not arranged in straight lines, and even when characters with many curves, such as hiragana, predominate.

1-1. Principle of inclination estimation
First, a standard pattern is prepared for each character category that may appear in the document image. As stated above, if rotated images of these standard patterns at every angle were registered as instances, each character (connected component) in the input document image could be matched against all of them, and the rotation angle of the document image could be estimated from the angle of the best-matching rotated standard pattern. This brute-force approach is straightforward but clearly inefficient.

The invention therefore estimates the inclination efficiently by using rotation variants as instances. As a preparatory stage for estimation, instances are collected. Once the instances have been collected, they are used to estimate the inclination of the input document image. For ease of understanding, the description here assumes that the category of a given character in the input document image is known to be c. (The method described later removes this assumption.)

1-1-1. Collection of instances (learning step)
The variant p is measured while the standard pattern of each category c is rotated little by little, and the result is stored as the relation p = p_c(θ) between the rotation angle θ and the variant p. This is a kind of learning step and is also the stage in which instances are collected. For English document images, the category set is "A" to "Z" and "a" to "z".
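As an illustration only, the learning step could be sketched in Python as follows. The sketch assumes that the variant is the area of the axis-aligned bounding rectangle of the black-pixel centres divided by the number of black pixels, and it rotates pixel coordinates rather than resampling the image. The names `bbox_variant` and `build_instance_table` and the bar-shaped stand-in pattern are hypothetical and do not appear in the embodiment.

```python
import numpy as np

def bbox_variant(points):
    """Bounding-rectangle area of the point set divided by the number of black pixels."""
    width = points[:, 0].max() - points[:, 0].min()
    height = points[:, 1].max() - points[:, 1].min()
    return width * height / len(points)

def build_instance_table(glyph, angles_deg):
    """Return p_c(theta) for every angle in angles_deg (hypothetical helper)."""
    ys, xs = np.nonzero(glyph)                      # black-pixel coordinates
    pts = np.column_stack([xs, ys]).astype(float)
    pts -= pts.mean(axis=0)                         # rotate about the centroid
    table = []
    for deg in angles_deg:
        t = np.deg2rad(deg)
        rot = np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])
        table.append(bbox_variant(pts @ rot.T))
    return np.array(table)

# Stand-in for a standard pattern: a bar 20 pixels tall and 10 pixels wide.
glyph = np.zeros((30, 30), dtype=bool)
glyph[5:25, 10:20] = True
angles = np.arange(-45, 46)                         # one-degree steps (an assumption)
p_c = build_instance_table(glyph, angles)           # stored instance p_c(theta)
```

One such table would be stored per category; the angular step is a free design choice that trades table size against angular resolution.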

1-1-2. Estimation of inclination (estimation step)
The variant p_x is obtained for a (tilted) character pattern x in the input document image. If its category is c, each θ that satisfies p_x = p_c(θ) is a candidate for the rotation angle of the input document image. Finding θ only requires a reverse lookup table for the one-dimensional function p_c(θ), prepared in advance, so the computational cost is small, O(1).
Using a rotation variant in this way makes inclination estimation very easy. The following sections describe how category c is estimated, the details of the estimation step, and the variant and invariant actually used.

1-2. Estimation of category c by an invariant
The previous section assumed that the category c of each character is known. However, if c could be known before skew correction, skew correction would not be needed in the first place. A category must therefore be determined, albeit with uncertainty, in order to refer to the instances.
The category c of each character (connected component) is therefore estimated robustly (stably and reliably) against rotation by using a rotation invariant q computed from the image. Specifically, the rotation invariant q_c is obtained in advance for the standard pattern of each category c. To estimate the category of an input character x, the rotation invariant q_x is computed from that character, and the c for which q_x = q_c is sought. Because q_x does not change when the document image rotates, the correct c is obtained in principle.

In practice, however, it is difficult to narrow the character category down to a single one using the rotation invariant q alone. The difficulty depends on the invariant used, the number of categories, noise in the document image, and so on; in essence this is character recognition from a single feature, and exact category estimation cannot be expected. In practice, therefore, a small number ε (epsilon) is used, and the multiple categories c that satisfy

$$|q_x - q_c| \le \epsilon$$

are used as category candidates.

1-3. Details of the estimation step
As the use of the word "candidate" in the estimation step above suggests, it is difficult to determine θ uniquely from a single character, for the following reasons.

First, p and θ are not in one-to-one correspondence; several values of θ often correspond to the same variant p, so a single character yields several rotation-angle candidates.
Second, where the variant p changes little as the rotation angle θ changes (where the function p_c(θ) is flat), θ responds sharply to measurement error in p, and the reliability of the estimated θ is low.
Third, as stated above, several categories c are estimated. The estimation step of the previous section has to be run for each of these c, so once again several rotation-angle candidates result.

The angles estimated from many characters in the document image are therefore voted on, and the angle that receives the most votes is taken as the estimate. In other words, voting is applied as one statistical technique to obtain the estimate. If the document image contains characters of several different categories, the candidates are supplied by correspondingly different instances, so even if many wrong candidates are mixed in, they are not expected to concentrate on a single wrong angle.

The estimation step described above is explained concretely with reference to FIG. 1, which is an explanatory diagram showing how the rotation angle of a document image is estimated.
Step 1: The variant p_x and the invariant q_x are computed for a character pattern x in the input document image.
Step 2: Based on the computed q_x, the categories c whose invariant q_c lies within ε of q_x are selected.
Step 3: A vote is cast for each of the angles θ₁, θ₂ and θ₃ at which p_x = p_c(θ). In the case of FIG. 1, θ₂ occurs twice, so it receives two votes.
This processing is performed for every connected component in the document image, and the angle with the most votes is taken as the rotation angle of the document.

When voting on the basis of the instances p_c(θ), the error in p_x is taken into account, as with the invariant, and a vote is cast for every θ in the range where p_c(θ) lies within a small tolerance of p_x. FIG. 2 is a graph showing how votes are cast for the rotation angles θ that satisfy this relation. Note that the size of the voting range depends on the slope of the variant function p_c(θ).
Once the rotation angle (inclination) of the image has been estimated by the steps above, the document image is corrected according to the estimate; that is, the image is rotated so that the inclination becomes zero.
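Steps 1 to 3 and the voting can be sketched as below, for illustration only. The sketch assumes that every category's instance p_c(θ) is sampled on one common angle grid and that the tolerances are applied as absolute differences. The function name `estimate_skew`, the tolerance names `eps_q` and `eps_p`, and the two synthetic instances are hypothetical.

```python
import numpy as np

def estimate_skew(components, instances, angles, eps_q, eps_p):
    """components: iterable of (p_x, q_x), one pair per connected component.
    instances: {category: (q_c, p_c)} with p_c sampled on the common grid `angles`."""
    votes = np.zeros(len(angles), dtype=int)
    for p_x, q_x in components:
        for q_c, p_c in instances.values():
            if abs(q_x - q_c) > eps_q:          # Step 2: keep candidate categories only
                continue
            # Step 3: one vote for every angle whose stored variant is close to p_x
            votes += (np.abs(np.asarray(p_c) - p_x) <= eps_p).astype(int)
    return angles[int(np.argmax(votes))], votes

# Synthetic instances: category "A" is monotone in theta, "B" is symmetric,
# so "B" alone cannot separate +10 degrees from -10 degrees.
angles = np.arange(-45, 46)
instances = {
    "A": (1.5, 2.0 + 0.01 * angles),
    "B": (2.5, 3.0 + 0.0002 * angles ** 2),
}
components = [(2.10, 1.5), (3.02, 2.5)]         # both measured at a true skew of 10 degrees
angle, votes = estimate_skew(components, instances, angles, eps_q=0.1, eps_p=0.005)
```

In this toy case the symmetric category votes for both signs, and the vote from the monotone category breaks the tie, which is the effect the voting relies on.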

1-4. Specific examples of the variant and invariant
Any variant and invariant with respect to rotation may be used in the invention, but the simplest ones are used here. As the variant p, a value that changes when a connected component in the document image is rotated is used, specifically the area of its circumscribed rectangle (FIG. 3(a)). As the invariant q, a value that does not change under rotation is used, specifically the area within the convex hull (FIG. 3(b)). FIG. 3 is an explanatory diagram showing, for the character "A", the area of the circumscribed rectangle and the area of the convex hull, which serve as the variant and invariant in this embodiment. Taking the ratio between each of these areas and the area of the black pixels makes the variant and invariant independent of the size of the connected component in the image; that is, both p and q are made scale-invariant.

In this specification, the terms "black pixel" and "white pixel" assume that the ground, that is, the background, is white and that the local patterns are black. However, as long as the local patterns can be distinguished from the ground, their colors are not limited to black and white. For a red pattern, for example, the red pixels are the "black pixels" of this specification; if the ground is yellow, the yellow pixels are the "white pixels".

Restated, the rotation variant p is the ratio between the area of the circumscribed rectangle and the black-pixel area, and the invariant q is the ratio between the area of the convex hull and the black-pixel area.
The variant p changes periodically, as shown in FIG. 4, which plots the value of p for the category "y" in this embodiment over rotation angles from -180° to 180°. Only one period, in this case -45° to 45°, is therefore usable, and the rotation angle that this embodiment can estimate is limited to the same range.
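The behaviour of the two quantities can be checked with a short Python sketch, given for illustration only. It assumes that both areas are divided by the black-pixel count and that areas are measured on pixel-centre coordinates; the hull is computed with Andrew's monotone chain, a technique chosen here for the sketch and not named in the embodiment. All function names and the L-shaped test pattern are hypothetical.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]
    return half(pts) + half(reversed(pts))

def polygon_area(poly):
    """Shoelace formula."""
    x, y = np.array(poly).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def variant_and_invariant(points):
    """p: bounding-rectangle area / pixel count; q: convex-hull area / pixel count."""
    n = len(points)
    width = points[:, 0].max() - points[:, 0].min()
    height = points[:, 1].max() - points[:, 1].min()
    return width * height / n, polygon_area(convex_hull(points)) / n

def rotate(points, deg):
    t = np.deg2rad(deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return points @ rot.T

# L-shaped stand-in for a character.
glyph = np.zeros((20, 20), dtype=bool)
glyph[2:18, 4:8] = True
glyph[14:18, 4:16] = True
ys, xs = np.nonzero(glyph)
pts = np.column_stack([xs, ys]).astype(float)

p0, q0 = variant_and_invariant(pts)
p30, q30 = variant_and_invariant(rotate(pts, 30))   # p changes, q stays the same
```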

<< Embodiment 2 >>
In this embodiment, the procedure is described for the case where the projective distortion of a document image is corrected down to an affine distortion. Quantitative and qualitative results of the correction are described later in Experimental Example 2.
To explain this embodiment, the relationship between three-dimensional and two-dimensional coordinates in computer vision is described first, followed by the details of the embodiment.

2.1. Camera coordinate system and image coordinate system
Consider how a two-dimensional image is obtained when an object in three-dimensional space is photographed with a camera. FIG. 11 is an explanatory diagram showing a coordinate system modeled on a pinhole camera and the rearranged coordinate system that is customary in the field of computer vision.
A pinhole camera such as that in FIG. 11(a) is normally used as the camera model. Point C is the pinhole; all light from the object 11 passes through the pinhole and forms an image on the image plane I. Point C is called the focal point. The distance f between the image plane I and the focal point C is called the focal length, the straight line through C perpendicular to the image plane is called the optical axis, and the intersection of the optical axis with the image plane is the image center c. In this model, two parallel lines are not necessarily mapped to parallel lines. Such a transformation is called a projective transformation, and the distortion it causes is called projective distortion. A projective transformation that maps parallel lines to parallel lines is called an affine transformation, and the distortion it causes is called affine distortion.

In the field of computer vision, the image plane is generally repositioned as in FIG. 11(b) (see, for example, Gang Xu and Saburo Tsuji, "3D Vision", Kyoritsu Shuppan, 1998, and Koichiro Deguchi, "Image and Space", Shokodo, 1991). The coordinate system on the image plane whose origin is the image center c and which has x and y axes as in FIG. 11(b) is called the image coordinate system. The three-dimensional coordinate system whose origin is the focal point C, whose Z axis is the optical axis, and whose X and Y axes point in the directions corresponding to the x and y axes of the image coordinate system is called the camera coordinate system. When a point (X, Y, Z)^T in the camera coordinate system is projected onto the image plane, the corresponding point (x, y)^T in the image coordinate system is given by

$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{f}{Z} \begin{pmatrix} X \\ Y \end{pmatrix} \tag{1}$$
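The pinhole projection from camera coordinates to image coordinates can be tried out with the following Python sketch, which is illustrative only. It assumes the usual relation x = fX/Z, y = fY/Z for the rearranged model of FIG. 11(b); the function name `project` is hypothetical.

```python
import numpy as np

def project(points_xyz, f):
    """Project camera-coordinate points (N x 3) onto the image plane (N x 2)."""
    pts = np.asarray(points_xyz, dtype=float)
    return f * pts[:, :2] / pts[:, 2:3]

# The same unit-length segment, parallel to the image plane, at two depths.
near = project([[0, 0, 2], [1, 0, 2]], f=1.0)
far = project([[0, 0, 4], [1, 0, 4]], f=1.0)
near_len = np.linalg.norm(near[1] - near[0])    # projected length at depth 2
far_len = np.linalg.norm(far[1] - far[0])       # projected length at depth 4
```

Doubling the depth halves the projected length, which is the effect exploited in the next section.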

2.2. Black-pixel area and depth
Looking at the individual characters in a document image, characters that are originally of the same type vary in size depending on their position. Suppose, for example, that only one particular character, such as the letter "a", is extracted from the image. Because of projective distortion, the black-pixel area of that character is larger where it is near the camera and smaller where it is far from the camera. Depth information for the document can be obtained from this change in area, and this embodiment corrects the document image on that basis.

The relationship between depth and the black-pixel area of a character is derived as follows. Suppose that characters of the same type are selected on the paper surface 13 and that the Z coordinates of their centers are Z₁ and Z₂. To simplify the problem, the relationship between depth and character length is considered first, using the schematic diagram of FIG. 12, which illustrates the projection of characters and its approximation when the tilted paper surface 13 is viewed from the camera. Let L be the original length of the extracted characters in FIG. 12. Since Z₁ ≠ Z₂, the lengths l₁ and l₂ of the projections of the characters obtained from the image differ. To obtain each length simply, the approximation of FIG. 12(b) is used: a plane parallel to the image plane is placed at the center of each character, and the orthogonal projection of the character onto that plane is taken. With α the angle between the paper surface 13 and the image plane and L' the length of the orthogonal projection,

$$L' = L \cos\alpha \tag{2}$$

Therefore, considering only the x coordinate of equation (1),

$$l_j = \frac{f L'}{Z_j} = \frac{f L \cos\alpha}{Z_j} \qquad (j = 1, 2) \tag{3}$$

is obtained.

The areas of these characters are treated in the same way. With S the area of each character and S' the area of its orthogonal projection,

$$S' = S \cos\alpha \tag{4}$$

The area s_j of the projection onto the image plane is then

$$s_j = \frac{f^2 S'}{Z_j^2} = \frac{f^2 S \cos\alpha}{Z_j^2} \tag{5}$$

Equation (5) shows that the area on the image plane, relative to the original area, is inversely proportional to the square of the depth. Therefore, focusing on the black-pixel area of one character type, the depth Z_j of the j-th character expressed in terms of its black-pixel area s_j is

$$Z_j = f \sqrt{\frac{S \cos\alpha}{s_j}} \tag{6}$$

The focal length f and the angle α between the paper surface 13 and the image plane are fixed at the time of photographing, so each is a constant.
Next, the inclination of the paper surface 13 in the camera coordinate system is considered from the depth of each character. From equation (1), the coordinates (X_j, Y_j, Z_j)^T of the j-th character are

$$\begin{pmatrix} X_j \\ Y_j \\ Z_j \end{pmatrix} = \begin{pmatrix} x_j Z_j / f \\ y_j Z_j / f \\ Z_j \end{pmatrix} \tag{7}$$

Expressing each coordinate in a coordinate system in which Z is replaced by Z' = Z/f gives

$$\begin{pmatrix} X_j \\ Y_j \\ Z'_j \end{pmatrix} = \begin{pmatrix} x_j Z'_j \\ y_j Z'_j \\ Z'_j \end{pmatrix} \tag{8}$$

so that the unknown constant f is eliminated, at least formally. Because the characters in the image originally lie on a single plane, the inclination of the paper surface 13 can be estimated by computing the three-dimensional coordinates of each character with equation (7) and fitting them to a plane expressed as Z' = aX + bY + c. Details are given in section 2.4.
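For a single character type, the chain from area to depth to plane can be sketched in Python as follows, for illustration only. The sketch assumes that the depth Z' is proportional to 1/sqrt(s), with an unknown factor shared by all characters of the type, and that X = xZ' and Y = yZ'. Setting the unknown factor to 1 rescales X, Y and Z' together, so the fitted slopes a and b are unaffected and only c absorbs the scale. The function name `fit_plane_from_areas` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_plane_from_areas(x, y, s):
    """Least-squares fit of Z' = aX + bY + c from image positions and black-pixel areas."""
    x, y, s = (np.asarray(v, dtype=float) for v in (x, y, s))
    depth = 1.0 / np.sqrt(s)                  # Z' up to an unknown common factor
    X, Y = x * depth, y * depth               # back-projected character centres
    A = np.column_stack([X, Y, np.ones_like(depth)])
    (a, b, c), *_ = np.linalg.lstsq(A, depth, rcond=None)
    return a, b, c

# Synthetic check: characters on the plane Z' = 0.4 X - 0.3 Y + 1, unknown factor K = 2.
rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, 50)
y = rng.uniform(-0.5, 0.5, 50)
true_depth = 1.0 / (1.0 - 0.4 * x + 0.3 * y)  # from Z' = a x Z' + b y Z' + 1
s = (2.0 / true_depth) ** 2                   # area falls off with depth squared
a, b, c = fit_plane_from_areas(x, y, s)
```

In this noise-free case the slopes are recovered exactly and c comes out as 1/K, which illustrates why the absolute scale cannot be determined from one character type alone.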

2.3. Clustering by area ratio
The method of estimating depth from area described in section 2.2 can be used only when the document contains a single type of character. Because an actual document contains a mixture of character types, the characters must be separated by type beforehand. Character recognition is one conceivable way to identify character types, but it is difficult to apply to characters that have undergone projective distortion. Moreover, all that is needed here is to group the characters by type; there is no need to label them as character recognition does.

This embodiment therefore classifies characters using area ratios, which are quantities that do not change under affine transformation (affine invariants). That is, when two regions are obtained from a character, an invariant is obtained from the ratio of their areas and is used to classify the character type. An affine invariant is not invariant under projective transformation, but because a projective transformation can be approximated by an affine transformation within a local region, the area ratio of small regions such as character regions can be treated as if it were a projective invariant.

For identifying character types, the area ratio must satisfy the following two conditions.
(1) The regions over which the area ratio (area) is computed must be the same even after projective transformation.
(2) The area ratio must discriminate character types sufficiently well.

Regarding condition (1), because the area ratio is an invariant, the same value can be computed (approximately) from the same region. However, the same value cannot be computed if the regions over which the areas are computed differ, so a region extraction method that is invariant under projective transformation is needed. This embodiment uses the fact that the convex hull of each region is invariant under linear transformation.

Regarding condition (2), different character types may happen to have the same area ratio. In that case the character types are confused and the inclination of the paper cannot be estimated correctly. Several area ratios are therefore used to improve discrimination, because the probability that several area ratios computed from different character types are all close by chance is smaller than the probability that a single area ratio is close by chance. Since only a limited variety of areas can be computed from one character, this embodiment combines the two nearest characters and uses areas computed from the pair. FIGS. 13 and 14 are explanatory diagrams showing an example of obtaining the variant and invariants from two selected characters. As shown in FIG. 13, when two characters are selected (here "t" and "h"), the five kinds of regions shown in FIGS. 14(a) to (e) are obtained from the black-pixel regions and convex-hull regions of the characters. Combining these yields multiple area ratios. The m area ratios obtained are used as an m-dimensional invariant vector.

Consider dividing the set of characters extracted from the document into subsets of characters with similar area ratios (hereinafter called clusters or categories). This embodiment uses the k-means method for clustering. Each cluster obtained by clustering is expected to contain pairs of the same character types. The subsequent processing is performed on each two-character pair.
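A plain k-means over the m-dimensional invariant vectors can be sketched as below, for illustration only. The embodiment names only the k-means method; the farthest-point initialisation, the iteration limit, the function name `kmeans` and the toy two-dimensional vectors are assumptions made for this sketch.

```python
import numpy as np

def kmeans(vectors, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    data = np.asarray(vectors, dtype=float)
    centers = [data[0]]
    for _ in range(1, k):
        # Squared distance of every vector to its nearest chosen centre.
        dist = np.min([np.sum((data - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(data[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)              # assign each vector to the nearest centre
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy invariant vectors (m = 2) for two character-pair types.
ratios = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9],
          [5.0, 6.0], [5.1, 6.1], [4.9, 5.9]]
labels, centers = kmeans(ratios, k=2)
```

The number of clusters k has to be chosen by the user; the text above does not state how it is set.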

A constraint of this embodiment on character size, which relates to clustering, is noted here. The relationship between area and depth described in section 2.2 assumes that the characters are originally of the same size. Clustering by area ratio, however, cannot distinguish characters that have the same shape and differ only in size. If characters of different sizes are present in a cluster, it is therefore impossible to tell whether a difference in size results from a change in depth or from a difference in original size, and the depth information is disturbed. Nevertheless, if, as in ordinary documents, most characters are the same size and only a portion such as a heading uses larger characters, those characters can be rejected by the noise removal described later, so this is not considered a problem.

2.4. Fitting to a plane
Once clustering has been performed as above, the plane fitting described in section 2.2 can be considered for each cluster. To estimate the inclination of the paper accurately, characters of the same type should be spread across the document, but this cannot always be expected. The paper-inclination information estimated for each cluster (character type) is therefore integrated. The difficulty is the black-pixel area S. Section 2.2 treated S as known, but it is actually unknown and differs from one character type to another. The ratios between the black-pixel areas of the clusters are therefore estimated at the same time, so that the inclinations of the planes become equal. The details follow. In the description below, the cluster number i is attached to S, (X_j, Y_j, Z_j)^T and Z'_j of section 2.2, giving S_i, (X_ij, Y_ij, Z_ij)^T and Z'_ij; the same applies to equations (6) to (8).
First, using equation (6), equation (8) can be written as

$$\begin{pmatrix} X_{ij} \\ Y_{ij} \\ Z'_{ij} \end{pmatrix} = \sqrt{\frac{S_i \cos\alpha}{s_{ij}}} \begin{pmatrix} x_{ij} \\ y_{ij} \\ 1 \end{pmatrix} \tag{9}$$

This expression means that the coordinates (X_ij, Y_ij, Z'_ij) of each character can be computed from x_ij, y_ij and s_ij, which are obtained from the image. However, S_i, the original area of the characters, and α, the inclination of the paper, in equation (9) are unknown. They are therefore replaced by an unknown K_i for each cluster, the error ε_ij of the depth of each character relative to the plane is defined, and the {K_i} and the plane parameters a, b and c that minimize the sum of the errors over all characters are obtained. Here {K_i} is determined only up to a common constant factor, so the parameters are not uniquely determined; in this embodiment c is therefore fixed at 1 and {K_i}, a and b are obtained.
However, the influence of noise (outliers) must be considered when fitting the plane. Possible causes of noise are failures of the image processing that extracts characters from the image, misclassification in clustering, and the case noted in section 2.3 where characters of the same type but of several sizes are present in the document. To cope with such noise, this embodiment performs two kinds of noise removal.

The two kinds of noise removal are (A) removal of outliers within a cluster and (B) removal of whole clusters. For (A), the distance ε_ij from the plane to each character is computed for each cluster, and characters whose distance is equal to or greater than a threshold t₁ are removed. For (B), clusters with few elements (t₂ or fewer) are removed, because a wrong plane is likely to be estimated from such clusters.
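The joint fit with both kinds of noise removal can be sketched in Python as follows, for illustration only and under stated assumptions. The sketch assumes Z'_ij = K_i / sqrt(s_ij) with c = 1, and it divides each plane equation by K_i so that the problem becomes linear in a, b and 1/K_i. This is a simplification swapped in for the sketch: it weights the errors differently from the sum of errors described above, and its residual is only a stand-in for the distance ε_ij. The function name `fit_shared_plane` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_shared_plane(x, y, s, cluster, t1, t2):
    """Fit one plane Z' = aX + bY + 1 to all clusters; returns a, b and {cluster: K_i}."""
    x, y, s = (np.asarray(v, dtype=float) for v in (x, y, s))
    cluster = np.asarray(cluster)

    def solve(mask):
        labels = np.unique(cluster[mask])
        u = 1.0 / np.sqrt(s[mask])                       # depth divided by K_i
        onehot = (cluster[mask][:, None] == labels[None, :]).astype(float)
        A = np.column_stack([x[mask] * u, y[mask] * u, onehot])
        sol, *_ = np.linalg.lstsq(A, u, rcond=None)      # unknowns: a, b, 1/K_i
        return sol[0], sol[1], labels, 1.0 / sol[2:], np.abs(A @ sol - u)

    # (B) remove clusters with t2 or fewer members.
    labels, counts = np.unique(cluster, return_counts=True)
    mask = np.isin(cluster, labels[counts > t2])
    a, b, labels, K, resid = solve(mask)
    # (A) remove characters whose residual is t1 or more, then refit.
    used = np.flatnonzero(mask)
    mask[used[resid >= t1]] = False
    a, b, labels, K, _ = solve(mask)
    return a, b, dict(zip(labels.tolist(), K.tolist()))

# Synthetic data: plane Z' = 0.4 X - 0.3 Y + 1, two clusters with K = 2 and K = 3.
g = np.linspace(-0.4, 0.4, 5)
gx, gy = (v.ravel() for v in np.meshgrid(g, g))
area = lambda px, py, k: (k * (1.0 - 0.4 * px + 0.3 * py)) ** 2
x = np.concatenate([gx, gx, [0.0], [0.1, -0.2]])
y = np.concatenate([gy, gy, [0.0], [0.3, 0.2]])
s = np.concatenate([area(gx, gy, 2.0), area(gx, gy, 3.0), [16.0], [1.0, 9.0]])
cluster = np.array([0] * 25 + [1] * 25 + [0] + [2, 2])  # one outlier, one tiny cluster
a, b, K = fit_shared_plane(x, y, s, cluster, t1=0.1, t2=2)
```

The thresholds passed here are arbitrary; suitable values of t₁ and t₂ depend on the image scale and are not given in the text above.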

2.5. Rotation of the paper surface
Finally, the paper surface in the image is rotated so that it faces the front. Since this is equivalent to moving the viewpoint to directly in front of the paper, the normal vector of the tilted paper is obtained and the viewpoint is moved onto its extension. The rotation is expressed by a roll-pitch-yaw rotation transformation, which represents an arbitrary rotation as three successive rotations: φ about the Z axis, θ about the Y axis and ψ about the X axis. The rotation matrix R is composed of these roll, pitch and yaw rotations.

When the paper surface is expressed as Z' = aX + bY + 1, Z = afX + bfY + f holds because Z' = Z/f was set. Its normal vector is (af, bf, -f)^T. Therefore, it suffices to apply a rotation with R such that the x and y coordinates of this vector become 0. Considering that no rotation about the Z axis is performed, the rotation angles of R are

Equation (13) shows that the unknown parameter f is required to estimate the angles. However, since f has not been estimated at this stage, f = 1 is set provisionally. In this case, affine distortion, in which a figure that is originally a rectangle becomes a parallelogram, may remain after the projective distortion is corrected.
Next, the two-dimensional image after rotation is obtained. Letting the point obtained by rotating a point (X_ij, Y_ij, Z_ij)^T in the camera coordinate system be

then the following holds:

Furthermore, by projecting this onto the image plane, the coordinates of the two-dimensional image after rotation

can be obtained.
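The rotate-then-project step can be sketched in Python as follows. The sketch assumes an ordinary pinhole projection, x = fX/Z and y = fY/Z, which may differ in scaling from the coordinate system used in this embodiment; the function name `rotate_and_project` and its argument layout are hypothetical.

```python
from typing import List, Tuple


def rotate_and_project(
    point: Tuple[float, float, float],
    R: List[List[float]],
    f: float = 1.0,
) -> Tuple[float, float]:
    """Rotate a camera-coordinate point by the 3x3 matrix R, then project it.

    Assumes a pinhole model: (X, Y, Z) maps to (f*X/Z, f*Y/Z).
    """
    X, Y, Z = point
    Xr = R[0][0] * X + R[0][1] * Y + R[0][2] * Z
    Yr = R[1][0] * X + R[1][1] * Y + R[1][2] * Z
    Zr = R[2][0] * X + R[2][1] * Y + R[2][2] * Z
    if Zr == 0.0:
        raise ValueError("rotated point lies in the plane Z = 0")
    return (f * Xr / Zr, f * Yr / Zr)
```

With f = 1, as provisionally set above, the projected coordinates reduce to (X/Z, Y/Z) of the rotated point.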

Embodiment 1 and Embodiment 2 differ in the target of correction. They also differ greatly in whether prior learning is required. In Embodiment 1, the correspondence between the rotation angle and the variable (the area of the circumscribed rectangle of the character) is registered in advance for each character type, so correction can be performed accurately even when the difference from the learned result is small. However, a learning procedure and a storage unit for storing its result are required. Embodiment 1 also assumes that all characters in the document face the same direction (are not individually rotated). In contrast, Embodiment 2 can perform correction without any prior learning process or any assumption about the layout. It is therefore applicable to various fonts in various layouts (even to marks that are not characters). Furthermore, no storage unit for learning results is needed, and the method can also be applied to documents composed of fonts that were not subject to learning.

≪Estimation of more general geometric deformations≫
The above description took the correction of rotation and the correction from projective distortion to affine distortion as examples, but the same approach can be applied to the estimation of other geometric deformations and to correction based on that estimation. That is, by combining a variable and an invariant of the geometric deformation to be corrected, the degree of various geometric deformations can be estimated. For example, given a quantity that is invariant to affine transformation (and therefore also to translation, scaling, and rotation) and a quantity that is invariant to similarity transformation (translation, scaling, and rotation), the shear component of an affine transformation can presumably be estimated by combining the two.

≪Experimental example 1≫
Experimental Example 1 below describes an experiment corresponding to Embodiment 1.
3-1. Experimental samples
The targets of skew estimation are five document images D1, D2, D3, D4, and D5 created with LaTeX, which is known as a text-based typesetting system. The images of these documents are shown in FIG. 5. Most of the text is set in the same font as the cases, but some documents contain mathematical expressions. There are no corresponding cases for the italic fonts and mathematical symbols in these expressions, which can therefore cause erroneous estimation. Each document image was rotated by ±30°, ±20°, ±10°, ±5°, ±2°, and 0° to generate 44 test images. FIG. 6 shows examples. For each test image, votes were cast one connected component (in most cases, a single character) at a time, in order from the upper left to the lower right.

Cases were created by measuring the variable and the invariant of each of the 52 characters "A" to "Z" and "a" to "z" of a single font (Times-Roman) from -45° to 45° in 0.1° steps. FIG. 7 shows examples of the variables and invariants actually measured. Because the invariant measured for each category contains error, the invariant q_c stored as a case is the average of the measured values.
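The case-building procedure can be sketched in Python as follows. The sketch takes the area of the axis-aligned circumscribed rectangle as the variable and the area of the convex hull as the invariant, and it operates on a set of outline points rather than on a pixel image; the point-set representation and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def rotate(points: List[Point], deg: float) -> List[Point]:
    """Rotate points about the origin by `deg` degrees."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def bbox_area(points: List[Point]) -> float:
    """Variable: area of the axis-aligned circumscribed rectangle."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull by Andrew's monotone chain, counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Invariant: polygon area by the shoelace formula."""
    n = len(poly)
    twice = sum(poly[i][0] * poly[(i + 1) % n][1]
                - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))
    return abs(twice) / 2.0


def build_case(points: List[Point], lo_tenths: int = -450,
               hi_tenths: int = 450) -> Tuple[Dict[float, float], float]:
    """Return (angle -> variable table, averaged invariant q_c)."""
    table: Dict[float, float] = {}
    hull_areas: List[float] = []
    for tenths in range(lo_tenths, hi_tenths + 1):  # 0.1 degree steps
        angle = tenths / 10.0
        rotated = rotate(points, angle)
        table[angle] = bbox_area(rotated)
        hull_areas.append(polygon_area(convex_hull(rotated)))
    return table, sum(hull_areas) / len(hull_areas)
```

For exact point sets the hull area is constant up to floating-point error; in pixel images, discretization makes the measured invariant fluctuate, which is why the stored q_c is an average.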

3-2. Skew correction results of the present invention
Table 1 summarizes the results of estimating the skew for the 44 test images. The values in parentheses in Table 1 are percentages. The skew of 95% of the test images was estimated with an error of 2.0° or less.

As an example of successful skew estimation, the result for test image D1 rotated by -2.0° is shown in D1(-2°) of FIG. 8. The estimated angle in FIG. 8(a) is -1.4°, which can be regarded as sufficiently accurate.

The vertical axis of FIG. 8(b) represents the number of characters used for estimation, and voting proceeds from top to bottom; that is, the number of characters used for estimation increases downward. It can be confirmed that the estimation accuracy improves as the number of characters used increases. However, a slight oscillation occurs near the correct angle. This is because, as shown in FIG. 8(a), nearly equal numbers of votes are cast at angles near the correct angle. As described in item 1.3, to allow for the error in p_x, votes are cast for every θ that satisfies the tolerance condition given there. Consequently, even if the same value of p_x were always obtained, the estimated angle is ambiguous by the width 2ε_p. As a result, multiple peaks appear near the correct angle.
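The voting and the ambiguity of width 2ε_p can be illustrated with a short Python sketch. The voting condition is assumed here to be |p_c(θ) - p_x| ≤ ε_p, which is consistent with the stated width 2ε_p but is not confirmed by this text; the function names and the toy case tables in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict


def vote(cases: Dict[float, float], p_x: float, eps_p: float,
         ballot: Counter) -> None:
    """Cast one vote for every angle whose stored variable p_c(theta)
    lies within eps_p of the measured variable p_x (assumed condition)."""
    for angle, p_c in cases.items():
        if abs(p_c - p_x) <= eps_p:
            ballot[angle] += 1


def estimate_angle(ballot: Counter) -> float:
    """Return the angle that received the most votes."""
    angle, _count = max(ballot.items(), key=lambda kv: kv[1])
    return angle
```

For instance, with a toy case table `{-2.0: 5.0, -1.0: 4.0, 0.0: 3.0, 1.0: 4.1, 2.0: 5.2}`, a measured p_x = 4.05 and ε_p = 0.1, both -1.0° and 1.0° receive a vote, so a single character cannot decide between them; votes from further characters are needed to separate the peaks.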

Table 2 shows the skew estimation error (in degrees) for each test image. The average skew estimation error was 1.4°. Even for images containing characters that have no cases, such as test images D3 and D4, the skew is estimated correctly by voting as long as most connected components have cases. In addition, the rotation angle of a test image such as D5, in which the characters are not arranged on straight lines, is difficult to estimate with conventional methods but was estimated correctly by the present invention.

For test image D3 at a skew of 20° and test image D4 at a skew of -5°, the error was 2.0° or more. These estimation results are shown in D3(20°) and D4(-5°) of FIG. 8. In the former, votes went to a completely different angle; in the latter, the estimate oscillates near the correct angle. Both share a common cause: with the current invariant, multiple cases are referenced. Details are given below.

Item 1-2 explained that, using a small constant ε, the multiple categories c that satisfy the condition given there are used as category candidates. However, this can cause wrong cases to be referenced, and those cases may cast more votes for wrong angles than are cast for the correct angle. Take "e", which appears most frequently in documents, as an example. Table 3 shows the category "e", the categories near it, and their invariants. When x = "e" is given as input, Table 3 indicates that "u" and "n" are referenced as cases in addition to "e". Moreover, because of the measurement error of the invariant, even "s" can fall within the voting range. As a result, many votes were cast for wrong angles.
With this cause in mind, the two samples with large errors are examined. For test image D3 at a skew of 20°, voting for the input x = "e" proceeds as shown in FIG. 9. The variable of the character "e" rotated by 20° is p_x in the figure. For this p_x, one vote is cast per angle in the hatched regions, and two votes per angle in the black regions, where the variables overlap. As a result, nearly twice as many votes are cast for -11° to -12° as for the desired neighborhood of 20°. This is considered to be why more votes were cast for the wrong angle than for the correct one.
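How a tolerance on the invariant pulls in neighboring categories can be sketched in Python. The selection condition is assumed here to be |q_c - q_x| ≤ ε, and the function name is hypothetical; the numeric values in the usage note are invented for illustration and are not the values of Table 3.

```python
from typing import Dict, List


def candidate_categories(invariants: Dict[str, float], q_x: float,
                         eps: float) -> List[str]:
    """Return every category c whose stored invariant q_c lies within
    eps of the measured invariant q_x (assumed condition)."""
    return sorted(c for c, q_c in invariants.items()
                  if abs(q_c - q_x) <= eps)
```

With illustrative invariants `{"e": 0.50, "u": 0.51, "n": 0.49, "s": 0.53, "A": 0.80}` and ε = 0.025, a measured q_x = 0.50 selects "e", "n", and "u", while a slightly perturbed measurement q_x = 0.512 additionally selects "s". Every selected category then votes using its own variable-to-angle relation, which is how wrong angles can accumulate votes.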

For test image D4 at a skew of -5°, "e" was voted correctly, but "t", which accounts for a large proportion of the characters in document D4, voted for wrong angles in the same way as above. FIG. 10 graphs the relationship between the variable and the angle for the categories "b", "f", and "t", whose invariants lie near that of "t". p_x represents the variable of the input x = "t" at -5°. In the figure, one vote is cast per angle in the hatched regions, and two or three votes per angle in the black regions, where two or three variable curves overlap. As a result, two to three times as many votes were cast around the correct angle, and the estimate oscillated near the correct angle.

≪Experimental example 2≫
Next, an experiment was performed to verify the effectiveness of Embodiment 2. Images with a size of 4,368 × 2,912 taken with a Canon EOS 5D were used as the experimental data. Quantitative evaluation of the results is left for future work; here the results were evaluated visually. Although the method of Embodiment 2 does not require a document frame, images in which a frame is visible were chosen for this experiment so that the effect of the method is easy to see.
In considering combinations of invariants, of the two characters, the one whose black-pixel area is larger is called the "large character" and the one whose area is smaller is called the "small character". The five invariant combinations used are:

When the invariant vector has five dimensions, all five are used; when it has three dimensions, only i. to iii. are used.
FIG. 15 is an explanatory diagram showing the experimental results of projective distortion correction according to the present invention. It shows the target images 1 to 3, the parameters used to correct them, and the correction results for images 1 to 3.

The clustering and noise removal parameters were adjusted so that the clustering result was optimal. "Before correction" in FIG. 15 is the original image. FIG. 16 shows the original image of "Image 2" in FIG. 15 in detail. "Image 2" is a sample for demonstrating that the method of the present invention is applicable even to a layout in which the characters in the document are not arranged on straight lines.
As for the correction results, the result obtained with the unknown focal length set to f = 1 is shown first, as "After correction (f = 1)" in FIG. 15. Ideally, parallel lines should be restored, although the right angles at the corners of the document are not. In the experiment, the parallel lines of image 1 were almost restored, but some error remained in images 2 and 3. The main cause is that outliers affected the plane fitting, producing errors in the parameter estimates.

Next, the result obtained when the focal length f was searched manually and set to a nearly optimal value is shown as "After correction (f is optimal)" in FIG. 15. Ideally, the rectangle of the document frame should be restored. However, for the same reason as when f = 1, the rectangle of image 1 was almost restored, while errors remained in images 2 and 3.

As described above, Embodiment 1 and Experimental Example 1 explained a method that estimates the amount of deformation using only a "variable" and an "invariant" with respect to a specific geometric deformation and performs correction based on the estimate. Taking skew, that is, rotation, as the specific geometric deformation, the rotation angle was estimated by combining a feature whose value changes under rotation with a feature whose value does not. As a result, the rotation angle can be estimated even when the characters in the document image are not arranged in straight lines. The amount of deformation was estimated on the basis of cases. Rather than using the standard patterns themselves as cases, the relationship between the variable and the rotation angle was obtained from the standard patterns and registered as cases, which enables efficient estimation of the amount of deformation. Furthermore, using the invariant makes it possible to reference cases even when the category is unknown.

Embodiment 2 and Experimental Example 2 showed a mode in which the projective distortion of a document image is corrected by combining a "variable" and an "invariant" with respect to the inclination of the paper surface. For each connected component in the document image, using the area as the "variable" and the area ratio as the "invariant" makes it possible to estimate the relative depth of each connected component. The inclination angle of the paper surface is then estimated by integrating the information over the entire paper surface. Embodiment 2 uses only the relative information carried by the connected components and imposes no strong constraints on the shooting method or the layout, so it can correct the projective distortion of a wide variety of document images.
Experimental Example 2 confirmed the potential to restore an originally rectangular object to a parallelogram (that is, to reduce projective distortion to about the level of affine distortion), but did not go as far as restoring the original rectangle. This is because an indeterminacy of a constant multiple remains in the estimated depth information and because errors arise in the plane fitting. The former can presumably be resolved by using a "variable" and an "invariant" different from those used in Embodiment 2, and the latter by introducing robust estimation or improving the accuracy of noise removal. Robust estimation here is a term of this technical field: an estimation method in which, when the samples used for parameter estimation include some whose properties differ from the rest (so-called "outliers"), their influence is excluded as far as possible.

≪Block configuration of the apparatus≫
FIG. 17 is a block diagram showing the functional configuration of the image distortion correction apparatus of the present invention. As shown in FIG. 17, the apparatus includes a dividing unit 21, a calculation unit 23, a classification unit 25, an estimation unit 27, and a correction unit 29. In one form of hardware that implements this apparatus, the correction program of the present invention is installed on a personal computer, and the function of each block is realized when the CPU of the personal computer executes the correction program. Specifically, the function of the dividing unit 21 is realized when the CPU executes processing that divides the input image into local patterns. The function of the calculation unit 23 is realized when the CPU executes processing that calculates, for each local pattern and according to a predetermined procedure, an invariant whose value is substantially constant regardless of the degree of deformation and a variable whose value changes with the degree of deformation. The function of the classification unit 25 is realized when the CPU executes processing that classifies each local pattern into one of a plurality of categories based on the calculated invariant. The function of the estimation unit 27 is realized when the CPU executes processing that estimates, from the variable calculated for each local pattern of each category, the degree of deformation that the local pattern has undergone. The function of the correction unit 29 is realized when the CPU executes processing that corrects the image based on the estimation result.
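The flow of data between the five blocks can be sketched in Python as follows. This is only a structural illustration under assumptions: the class name `DistortionCorrector`, the callable signatures, and the grouping of variables by category are hypothetical, and each unit is left as a pluggable function rather than an implementation of the embodiment.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class DistortionCorrector:
    # Each field stands in for one block of FIG. 17 (signatures assumed).
    divide: Callable[[Any], List[Any]]                    # dividing unit 21
    calculate: Callable[[Any], Tuple[float, float]]       # calculation unit 23
    classify: Callable[[float], str]                      # classification unit 25
    estimate: Callable[[Dict[str, List[float]]], float]   # estimation unit 27
    correct: Callable[[Any, float], Any]                  # correction unit 29

    def run(self, image: Any) -> Any:
        """Divide, calculate, classify, estimate, then correct."""
        variables_by_category: Dict[str, List[float]] = {}
        for pattern in self.divide(image):
            # calculate() returns (invariant, variable) for one local pattern.
            invariant, variable = self.calculate(pattern)
            category = self.classify(invariant)
            variables_by_category.setdefault(category, []).append(variable)
        degree = self.estimate(variables_by_category)
        return self.correct(image, degree)
```

Keeping the units as separate callables mirrors the block diagram: the classification depends only on the invariant, and the estimation depends only on the variables grouped by category.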

In addition to the embodiments described above, various modifications of the present invention are possible. Such modifications should not be construed as falling outside the scope of the present invention. The present invention is intended to include meanings equivalent to the claims and all modifications within their scope.

FIG. 1 is an explanatory diagram showing how the rotation angle of a document image is estimated in this embodiment.
FIG. 2 is a graph showing voting for rotation angles θ that satisfy a predetermined relationship in this embodiment.
FIG. 3 is an explanatory diagram showing, for the character "A", the area of the circumscribed rectangle of the character and the area of its convex hull, which are the variable and the invariant in this embodiment.
FIG. 4 is a graph showing the value of the variable p over rotation angles from -180° to 180° for the category "y" in this embodiment.
FIG. 5 is an explanatory diagram showing the document images used in the experiment in this embodiment.
FIG. 6 is an explanatory diagram showing examples of document images rotated for the experimental example in this embodiment.
FIG. 7 is a graph showing the values of the variables and invariants obtained in the experimental example with respect to rotation angle in this embodiment.
FIG. 8 is a set of graphs in this embodiment with the rotation angle on the horizontal axis and, on the vertical axes, the voting result of the experimental example and the transition of the estimate with the number of characters used.
FIG. 9 is a graph showing the relationship between the variable and the rotation angle for the categories "e", "n", and "s" in this embodiment.
FIG. 10 is a graph showing the relationship between the variable and the rotation angle for the categories "b", "f", "p", and "t" in this embodiment.
FIG. 11 is an explanatory diagram showing a coordinate system modeled on a pinhole camera and a rearranged version of that coordinate system.
FIG. 12 is an explanatory diagram for explaining the projection and approximation of a character when an inclined paper surface is viewed from the camera.
FIG. 13 is a first explanatory diagram showing an example of obtaining a variable and an invariant from two selected characters in this embodiment.
FIG. 14 is a second explanatory diagram showing an example of obtaining a variable and an invariant from two selected characters in this embodiment.
FIG. 15 is an explanatory diagram showing the experimental results of projective distortion correction according to the present invention.
FIG. 16 is an explanatory diagram showing details of "Image 2" in FIG. 15.
FIG. 17 is a block diagram showing the functional configuration of the image distortion correction apparatus of the present invention.

Explanation of symbols

11 Object, imaging target
13 Paper surface
21 Dividing unit
23 Calculation unit
25 Classification unit
27 Estimation unit
29 Correction unit
C Focal point
D1, D2, D3, D4, D5 Document images
I Image plane
f Focal length

Claims

A method of correcting an image, in which an image subjected to geometric deformation is input and the deformation of the image is corrected, the method comprising:
a step of dividing the input image into local patterns, each being a localized pattern;
a calculation step of calculating, for each local pattern and according to a predetermined procedure, an invariant whose value is substantially constant regardless of the degree of deformation and a variable whose value changes with the degree of deformation;
a step of classifying each local pattern into one of a plurality of categories based on the calculated invariant;
an estimation step of estimating, based on the variable calculated for each local pattern of each category, the degree of deformation that the local pattern has undergone; and
a step of correcting the image based on the estimation result,
wherein a computer executes each step.

The method according to claim 1, wherein the geometric deformation is a projective transformation, an affine transformation, or a similarity transformation.

The method according to claim 1, wherein the image is a document image, and at least a part of the local pattern is a character pattern.

The method according to claim 1, wherein
the geometric deformation is rotation,
the variable is a value that changes when the local pattern is rotated, and the invariant is a value that remains substantially constant even when the local pattern is rotated.

The method according to claim 1, wherein
the geometric deformation is a projective transformation,
the variable is a value that changes with depth, and the invariant is a value that remains substantially constant with respect to a change in depth.

The method according to claim 1, wherein the variable is a rectangular area circumscribing the local pattern.

The method according to claim 1, wherein the variable is an area of a black pixel portion of a local pattern.

The method according to claim 1, wherein the invariant is an area within a convex hull of a local pattern.

The method according to claim 1, wherein the local pattern is a pattern divided as a connected component in an image or a set of the patterns.

For each category, let q_c be the invariant for that category and q_x be the invariant for each local pattern.
(Where ε is a predetermined constant)
The method according to claim 1, comprising a local pattern satisfying the relationship:

The method according to claim 1, wherein the estimation step compares the variable calculated from each local pattern with reference values stored in advance for each category to provisionally estimate the degree of deformation for each local pattern, and statistically processes the provisional estimates to estimate the degree of deformation.

The method according to claim 11, wherein the reference value is obtained by measuring a variable by stepwise deforming a standard pattern of each category, and storing the amount of deformation of each step and the measured variable in association with each other.

The method according to any one of claims 1 to 10, wherein the estimation step provisionally estimates the degree of deformation for each category based on the relationship between the position of each local pattern and the variable of that local pattern, and statistically processes the provisional estimates to estimate the degree of deformation.

The method according to claim 5, wherein
the input image is a document image,
the geometric deformation is a projective transformation,
the variable is the area of the black pixel portion of the local pattern and the invariant is the area within the convex hull of the local pattern, and
the estimation step provisionally estimates the inclination of the paper surface of the document image based on the relationship between the area of the black pixel portion of each local pattern and its position on the paper surface.

An image correction program for correcting the deformation of an image subjected to geometric deformation, the program causing a computer to execute:
processing of dividing the input image into local patterns, each being a localized pattern;
calculation processing of calculating, for each local pattern and according to a predetermined procedure, an invariant whose value is substantially constant regardless of the degree of deformation and a variable whose value changes with the degree of deformation;
processing of classifying each local pattern into one of a plurality of categories based on the calculated invariant;
estimation processing of estimating, based on the variable calculated for each local pattern of each category, the degree of deformation that the local pattern has undergone; and
processing of correcting the image based on the estimation result.

An image distortion correction apparatus that receives an image subjected to geometric deformation as input and corrects the deformation of the image, the apparatus comprising:
a dividing unit that divides the input image into local patterns, each being a localized pattern;
a calculation unit that calculates, for each local pattern and according to a predetermined procedure, an invariant whose value is substantially constant regardless of the degree of deformation and a variable whose value changes with the degree of deformation;
a classification unit that classifies each local pattern into one of a plurality of categories based on the calculated invariant;
an estimation unit that estimates, based on the variable calculated for each local pattern of each category, the degree of deformation that the local pattern has undergone; and
a correction unit that corrects the image based on the estimation result.