JP6542230B2

JP6542230B2 - Method and system for correcting projected distortion

Info

Publication number: JP6542230B2
Application number: JP2016541592A
Authority: JP
Inventors: マー、ジャングリン; ダウ、ミシェル; ミューレネール、ピエールドゥ; デュボン、オリヴィエ
Original assignee: イ．エル．イ．エス．
Priority date: 2013-12-20
Filing date: 2014-12-19
Publication date: 2019-07-10
Anticipated expiration: 2034-12-19
Also published as: WO2015092059A1; JP2017500662A

Description

The present invention relates to a method, a system, a device and a computer program product for correcting projection distortion.

Digital cameras (hereinafter referred to as cameras) may be used to capture images. With advances in technology, digital cameras are built into almost all types of digital devices. Examples of such digital devices include, but are not limited to, mobile communication devices, tablets, laptops and personal digital assistants (PDAs). In many cases a camera can serve as an alternative to a document scanner, since it can be used to capture an image of a document. The image of the document may need to be processed before text recognition and/or text extraction. Processing the image of the document poses two major challenges: poor image quality of the captured image due to unfavorable imaging conditions, and distortion in the captured image. The distortion may be caused by the camera itself and/or by the angle and position of the camera relative to the plane of the document while the image is captured. Distortion caused by the latter is known as projection distortion. Under projection distortion, text symbols or characters appear larger the closer they are to the camera plane and appear to shrink the farther away they are. Techniques for improving image quality are known. However, improving image quality may not help text recognition and/or extraction when the image of the document suffers, in particular, from projection distortion. Projection distortion not only disturbs the visual interpretation of the text but can also reduce the accuracy of text recognition algorithms.

Techniques for correcting projection distortion exist. One currently known technique uses auxiliary data, which may include a combination of orientation measurement data, accelerometer data and distance measurement data. However, such auxiliary data may not be available on every electronic device, owing to a lack of the various sensors and/or of processing power. Some other techniques rely on manual correction of projection distortion. One such technique requires the user to manually identify and mark the four corners of a quadrilateral that, before distortion, was a rectangle formed by two horizontal line segments and two vertical line segments. Another technique requires the user to identify and mark parallel lines that corresponded to horizontal or vertical lines before distortion. The projection distortion is then corrected based on those corners or parallel lines. However, manual correction of projection distortion is time consuming, inefficient and error prone.

Techniques for automatic correction of projection distortion also exist. These techniques focus on identifying a horizontal vanishing point and a vertical vanishing point. A vanishing point is a point toward which the outlines of the document in the image (for example, its horizontal outlines or its vertical outlines) converge. The techniques use the horizontal and vertical vanishing points to correct the projection distortion. However, most of them require complicated manual parameter settings for the correction. If the content of the image changes, the parameters have to be changed manually, which limits the capability of those techniques. Furthermore, the existing techniques are computationally expensive, which makes them difficult to implement on small devices such as mobile communication devices. In addition, most techniques work on the assumption that the document image contains only text. For document images that combine text and pictures, they may produce no useful result at all. Many of the techniques also assume that the text in the image of the document is formed and/or positioned in a particular way, and therefore fail when the text is not formed and/or positioned in that way.

Martin A. Fischler and Robert C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Comm. of the ACM 24(6): 381-395, June 1981

It is an object of the present invention to provide a method, system, device and/or computer program product for performing projection correction of a distorted image that does not exhibit at least one of the drawbacks mentioned above.

This object is achieved according to the invention as defined in the independent claims.

According to a first aspect of the invention, which may be combined with the other aspects described herein, a method is disclosed for projection correction of an image containing at least one text portion that is distorted by perspective. The method comprises a step of image binarization, in which said image is binarized. The method then comprises a step of connected component analysis, which involves detecting pixel blobs in said at least one text portion of said binarized image. The method then comprises a step of horizontal vanishing point determination, which comprises estimating text baselines by means of eigenpoints of said pixel blobs and determining a horizontal vanishing point of said at least one text portion by means of said text baselines. The method further comprises a step of vertical vanishing point determination for said at least one text portion based on vertical features thereof. The method further comprises a step of projection correction, which involves correcting said perspective in said image based on said horizontal and vertical vanishing points.
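The final projection correction step can be illustrated with a short sketch. Assuming both vanishing points are already known as finite pixel coordinates, one common way (not necessarily the one used in the patent) to remove perspective convergence is a homography whose last row is the line joining the two vanishing points; that line is sent to infinity, so converging baselines become parallel again. The names `rectifying_homography` and `warp` are hypothetical, and the result is only an affine rectification: residual rotation or shear is not removed.

```python
def cross(a, b):
    """Cross product of two 3-vectors (homogeneous points or lines)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rectifying_homography(vh, vv):
    """Homography that maps the line joining vh and vv to the line at infinity."""
    line = cross((vh[0], vh[1], 1.0), (vv[0], vv[1], 1.0))
    if line[2] == 0:
        raise ValueError("vanishing line passes through the image origin")
    return [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [line[0] / line[2], line[1] / line[2], 1.0]]


def warp(H, point):
    """Apply homography H to a 2-D point and return the dehomogenized result."""
    x, y = point
    X = H[0][0] * x + H[0][1] * y + H[0][2]
    Y = H[1][0] * x + H[1][1] * y + H[1][2]
    W = H[2][0] * x + H[2][1] * y + H[2][2]
    return (X / W, Y / W)


# Hypothetical vanishing points, in pixels.
H = rectifying_homography(vh=(1000.0, 50.0), vv=(40.0, -2000.0))
# Two baselines that converge towards vh before correction.
for p, q in [((0.0, 40.0), (100.0, 41.0)), ((0.0, 20.0), (100.0, 23.0))]:
    (x1, y1), (x2, y2) = warp(H, p), warp(H, q)
    print(round((y2 - y1) / (x2 - x1), 6))  # both lines get the same slope
```

A full correction would normally add a rotation (and possibly an aspect-ratio fix) so that the now-parallel baselines end up horizontal.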

In embodiments according to the first aspect, a step of text and picture separation is performed after said image binarization and before said connected component analysis, and only the textual information is kept.

In embodiments according to the first aspect, each eigenpoint may be the center of the bottom of the bounding box of the respective pixel blob. The step of estimating text baselines may comprise a step of confusing eigenpoint elimination. Confusing eigenpoints, that is, eigenpoints that are out of line with respect to the eigenpoints in the vicinity of the eigenpoint under consideration, may be detected. The confusing eigenpoints may be disregarded for said text baseline estimation.
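As a rough illustration of connected component analysis followed by eigenpoint extraction, the sketch below labels 8-connected blobs in a tiny binary image and takes the bottom center of each bounding box as the eigenpoint. The function name `blob_eigenpoints`, the record layout and the choice of 8-connectivity are assumptions made for this example.

```python
from collections import deque


def blob_eigenpoints(binary):
    """Find 8-connected blobs in `binary` (rows of 0/1, 1 = text pixel).

    Returns one record per blob with its bounding box, pixel count and
    the bottom-center point of the bounding box, used here as eigenpoint.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill of one blob.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xmin = xmax = x0
            ymin = ymax = y0
            area = 0
            while queue:
                x, y = queue.popleft()
                area += 1
                xmin, xmax = min(xmin, x), max(xmax, x)
                ymin, ymax = min(ymin, y), max(ymax, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            blobs.append({
                "bbox": (xmin, ymin, xmax, ymax),
                "area": area,
                "eigenpoint": ((xmin + xmax) / 2.0, float(ymax)),
            })
    return blobs
```

Image rows grow downward here, so the largest `y` of a blob is its bottom edge.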

In embodiments according to the first aspect, the confusing eigenpoint elimination step may comprise determining the width and height of the pixel blobs, determining mean values for the width and height of the pixel blobs, and detecting said confusing eigenpoints as eigenpoints belonging to pixel blobs of which at least one of the width and height differs from the calculated mean value by a predetermined range.
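The width and height test can be sketched in a few lines. The example assumes each blob is a dictionary with hypothetical `width` and `height` keys, and it expresses the "predetermined range" as a fraction of the mean (`tolerance`), which is a choice made for illustration only.

```python
def split_confusing(blobs, tolerance=0.5):
    """Split blobs into (kept, confusing) by deviation from mean size.

    A blob is treated as confusing when its width or its height differs
    from the respective mean by more than `tolerance` times that mean.
    """
    mean_w = sum(b["width"] for b in blobs) / len(blobs)
    mean_h = sum(b["height"] for b in blobs) / len(blobs)
    kept, confusing = [], []
    for b in blobs:
        off_w = abs(b["width"] - mean_w) > tolerance * mean_w
        off_h = abs(b["height"] - mean_h) > tolerance * mean_h
        (confusing if (off_w or off_h) else kept).append(b)
    return kept, confusing
```

With four blobs of width 10 and one of width 30 (all of height 12), the mean width is 14, so only the wide blob exceeds the allowed deviation of 7 and is set aside.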

In embodiments according to the first aspect, said step of estimating text baselines may comprise a step of clustering eigenpoints into eigenpoint groups. Said eigenpoint groups may fulfil at least one of the following conditions:
- a point-to-point distance between the eigenpoints of the group is below a first distance threshold;
- a point-to-line distance between each eigenpoint of the group and a line formed by the eigenpoints of the group is below a second distance threshold;
- an off-horizontal angle of the line formed by the eigenpoints of the group is below a maximum angle; and
- the eigenpoint group contains a minimum number of eigenpoints.
Said text baselines may be estimated based on said eigenpoint groups.
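The four conditions can be combined in a simple greedy clustering pass, sketched below. This is only one possible reading: the text requires at least one of the conditions, whereas the sketch enforces all four, approximates "the line formed by the eigenpoints" by the line from the first group member to the candidate point, and uses made-up default thresholds.

```python
import math


def cluster_eigenpoints(points, max_gap=20.0, max_line_dist=3.0,
                        max_angle_deg=30.0, min_points=3):
    """Greedily group (x, y) eigenpoints into candidate text baselines."""
    groups = []
    for p in sorted(points):
        for g in groups:
            # Condition 1: distance to the most recent member of the group.
            if math.dist(g[-1], p) >= max_gap:
                continue
            first = g[0]
            dx, dy = p[0] - first[0], p[1] - first[1]
            length = math.hypot(dx, dy)
            if length == 0:
                continue
            # Condition 3: angle of the line first -> p against the horizontal.
            if math.degrees(math.atan2(abs(dy), abs(dx))) >= max_angle_deg:
                continue
            # Condition 2: every member must stay close to that line.
            if any(abs(dx * (q[1] - first[1]) - dy * (q[0] - first[0])) / length
                   >= max_line_dist for q in g):
                continue
            g.append(p)
            break
        else:
            groups.append([p])
    # Condition 4: drop groups with too few eigenpoints.
    return [g for g in groups if len(g) >= min_points]
```

For two rows of eigenpoints about 50 pixels apart plus one stray point, the function returns two groups and discards the stray point because its group never reaches `min_points`.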

In embodiments according to the first aspect, said first distance threshold, said second distance threshold, said maximum angle and said minimum number of eigenpoints may be set adaptively based on the content of the image. Said step of estimating text baselines may further comprise a step of eigenpoint group merging, in which eigenpoint groups on both sides of a disregarded eigenpoint may be merged into a larger eigenpoint group.

In embodiments according to the first aspect, said step of determining the horizontal vanishing point may comprise defining each of said estimated text baselines as a line in a Cartesian coordinate system, transforming each of said text baselines defined in the Cartesian coordinate system into a data point in a homogeneous coordinate system, and assigning a confidence level to each of the data points. Said confidence level may be based on at least the length of the respective text baseline and the proximity of the group of eigenpoints used for estimating the text baseline to the resulting text baseline.
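A minimal sketch of this transformation, under stated assumptions: the baseline is fitted as y = m*x + b by ordinary least squares, the homogeneous data point is the normalized line vector (a, b, c) with a*x + b*y + c = 0, and the confidence formula (length divided by one plus the mean point-to-line distance) is invented for illustration, since the text only names the two ingredients.

```python
import math


def fit_baseline(points):
    """Ordinary least-squares fit of y = m*x + b through (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("points share one x value; slope is undefined")
    m = (n * sxy - sx * sy) / denom
    return m, (sy - m * sx) / n


def to_data_point(points):
    """Turn one eigenpoint group into a homogeneous line plus a confidence."""
    m, b = fit_baseline(points)
    norm = math.hypot(m, 1.0)
    line = (m / norm, -1.0 / norm, b / norm)  # m*x - y + b = 0, unit normal
    xs = [x for x, _ in points]
    length = (max(xs) - min(xs)) * norm       # extent measured along the line
    mean_dist = sum(abs(m * x - y + b) for x, y in points) / (len(points) * norm)
    # Hypothetical score: long, tightly fitting baselines rank higher.
    return {"line": line, "confidence": length / (1.0 + mean_dist)}
```

A perfectly straight group spanning 20 pixels scores 20, while a group of the same length with scattered points scores lower.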

In embodiments according to the first aspect, said step of determining the horizontal vanishing point may further comprise grouping a number of data points having a confidence level above a predetermined threshold into a priority sample array, clustering the data points in the priority sample array into a number of sample groups, assigning a group confidence value to each sample group based on at least the confidence level assigned to each data point in the sample group, and iteratively selecting sample groups of data points from the priority sample array for line fitting. Each sample group may comprise two or more data points. Said iteration may start with the sample group having the highest group confidence value in the priority sample array.
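One way to picture the priority sample array is sketched below. It assumes each data point is a dictionary with a `confidence` key, forms sample groups as all pairs of retained points, and scores a group by the sum of its members' confidences; the pairing rule, the scoring rule and the cap of 10 points are illustrative assumptions.

```python
from itertools import combinations


def build_sample_groups(data_points, min_confidence, max_points=10):
    """Return (priority array, sample groups), both sorted best first."""
    priority = sorted(
        (d for d in data_points if d["confidence"] > min_confidence),
        key=lambda d: d["confidence"],
        reverse=True,
    )[:max_points]
    groups = [
        {"members": pair, "confidence": sum(d["confidence"] for d in pair)}
        for pair in combinations(priority, 2)
    ]
    groups.sort(key=lambda g: g["confidence"], reverse=True)
    return priority, groups
```

Iterating over `groups` in order then visits the most trustworthy pair first, which is the prioritization the paragraph describes.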

In embodiments according to the first aspect, said step of determining the horizontal vanishing point may comprise performing line fitting for the first sample group, resulting in a first fitted line; subsequently performing line fitting for each further sample group, resulting in further fitted lines; determining, based on the first and further fitted lines, a set of data points positioned below a predetermined distance threshold from the first fitted line; and estimating at least a first and a second horizontal vanishing point candidate from the horizontal text baselines corresponding to the determined set of data points.
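The consensus step can be sketched with a simplified representation. Here each baseline y = m*x + b is stored as the data point (m, b), an assumption made for brevity: baselines that meet in a common vanishing point (x0, y0) satisfy b = y0 - x0*m, so their data points fall on one line. The sketch visits sample groups in the given priority order and keeps the largest inlier set, which is a simplification of the procedure in the text; it also skips pairs with equal slopes, so it does not handle a vanishing point at infinity.

```python
def prioritized_consensus(data_points, sample_groups, max_dist):
    """Return indices of the largest inlier set found.

    data_points   : list of (slope, intercept) pairs, one per baseline
    sample_groups : list of (i, j) index pairs, highest confidence first
    max_dist      : allowed intercept residual, in pixels
    """
    best = []
    for i, j in sample_groups:
        (m1, b1), (m2, b2) = data_points[i], data_points[j]
        if m1 == m2:
            continue  # parallel baselines: no finite intersection
        s = (b2 - b1) / (m2 - m1)  # slope of the fitted line in (m, b) space
        inliers = [k for k, (m, b) in enumerate(data_points)
                   if abs(b - (b1 + s * (m - m1))) < max_dist]
        if len(inliers) > len(best):
            best = inliers
    return best
```

In a quick check with four baselines that meet at (1000, 50) and one unrelated line, the first sample group already recovers the four consistent baselines.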

In embodiments according to the first aspect, said step of determining the horizontal vanishing point may comprise performing projection correction based on each estimated horizontal vanishing point candidate, comparing the proximity of each horizontal vanishing point candidate to the resulting horizontal text direction after projection correction, and selecting the horizontal vanishing point candidate that is closest to the horizontal text direction of the image document after projection correction.
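Candidate selection might look like the following sketch. It assumes a partial correction that uses only the horizontal vanishing point (a homography sending that point to infinity along the x axis) and scores each candidate by the mean tilt of the corrected baselines; both the homography and the score are assumptions, and the candidate's x coordinate must be nonzero.

```python
import math


def horizontal_rectifier(vp):
    """Homography mapping the finite point vp = (vx, vy) to direction (1, 0)."""
    vx, vy = vp
    return [[1.0, 0.0, 0.0],
            [-vy / vx, 1.0, 0.0],
            [-1.0 / vx, 0.0, 1.0]]


def warp(H, point):
    """Apply homography H to a 2-D point."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def residual_tilt(vp, baselines):
    """Mean absolute angle (radians) of the baselines after correction."""
    H = horizontal_rectifier(vp)
    total = 0.0
    for p, q in baselines:
        (x1, y1), (x2, y2) = warp(H, p), warp(H, q)
        total += math.atan2(abs(y2 - y1), abs(x2 - x1))
    return total / len(baselines)


def pick_candidate(candidates, baselines):
    """Choose the candidate whose correction leaves the text most horizontal."""
    return min(candidates, key=lambda vp: residual_tilt(vp, baselines))
```

Each baseline is given as a pair of end points; a candidate that is the true intersection of the baselines yields a residual tilt of essentially zero.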

In embodiments according to the first aspect, said step of determining the vertical vanishing point may comprise estimating a plurality of vertical text lines, each corresponding to the direction of a selected one of said pixel blobs, the pixel blobs being selected by a blob filtering algorithm on the text portion of the image; defining each of said estimated vertical text lines as a line in a Cartesian coordinate system; transforming each of said vertical text lines estimated in the Cartesian coordinate system into a data point in a homogeneous coordinate system; and assigning a confidence level to each of the data points. Said confidence level may be based on at least the eccentricity of the shape of the pixel blob used to estimate the respective vertical text line.

In embodiments according to the first aspect, said step of determining the vertical vanishing point may comprise grouping a number of data points having a confidence level above a predetermined threshold into a priority sample array, and clustering the data points in the priority sample array into a number of sample groups. Each sample group may comprise at least two data points. Said step of determining the vertical vanishing point further comprises assigning a group confidence value to each sample group based on the confidence level assigned to each data point in the sample group, and iteratively selecting sample groups of data points from the priority sample array for line fitting. Said iteration may start with the sample group having the highest group confidence value in the priority sample array.

In embodiments according to the first aspect, said step of determining the vertical vanishing point may comprise performing line fitting for the first sample group, resulting in a first fitted line; subsequently performing line fitting for each further sample group, resulting in further fitted lines; determining, based on the first and further fitted lines, a set of data points positioned below a predetermined distance threshold from the first fitted line; and estimating at least a first and a second vertical vanishing point candidate from the vertical text lines corresponding to the determined set of data points.

In embodiments according to the first aspect, said step of determining the vertical vanishing point may comprise performing projection correction based on each estimated vertical vanishing point candidate, comparing the proximity of each estimated vertical vanishing point candidate to the resulting vertical text direction after projection correction, and selecting the vertical vanishing point candidate that is closest to the vertical text direction of the image document.

In embodiments according to the first aspect, said blob filtering algorithm may select pixel blobs based on one or more of the following conditions: the eccentricity of the shape of the considered pixel blob, which represents how elongated it is, is above a predetermined threshold (the value lies between 0 and 1, which are the extremes: a blob whose eccentricity is 0 is actually a circular object, while a blob whose eccentricity is 1 is a line segment); the distance of each pixel blob from the border of the image is above a predetermined distance threshold; the angle of the resulting vertical line with respect to the vertical direction is below a maximum angle threshold; and the area of each pixel blob, defined by its number of pixels, is below a maximum area threshold but above a minimum area threshold.
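The filter can be sketched with image moments. The eccentricity below is that of the ellipse with the same second-order central moments as the blob, a standard definition that gives 0 for a disc and 1 for a line segment. The text allows any one or more of the conditions; the sketch applies all four, and every threshold value is a placeholder.

```python
import math


def blob_shape(pixels):
    """Return (eccentricity, major-axis angle in radians) of a pixel list."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    myy = sum((y - cy) ** 2 for _, y in pixels) / n
    mxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    spread = math.hypot(mxx - myy, 2.0 * mxy)
    lam_max = (mxx + myy + spread) / 2.0            # variance along major axis
    lam_min = max((mxx + myy - spread) / 2.0, 0.0)  # variance along minor axis
    ecc = math.sqrt(1.0 - lam_min / lam_max) if lam_max > 0 else 0.0
    theta = 0.5 * math.atan2(2.0 * mxy, mxx - myy)  # measured from the x axis
    return ecc, theta


def keep_for_vertical(pixels, image_size, min_ecc=0.9, border=2,
                      max_off_vertical_deg=30.0, min_area=5, max_area=500):
    """Decide whether a blob is a usable vertical stroke."""
    width, height = image_size
    ecc, theta = blob_shape(pixels)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    away_from_border = (min(xs) >= border and min(ys) >= border
                        and max(xs) < width - border
                        and max(ys) < height - border)
    off_vertical = 90.0 - abs(math.degrees(theta))
    return (ecc > min_ecc and away_from_border
            and off_vertical < max_off_vertical_deg
            and min_area < len(pixels) < max_area)
```

A one-pixel-wide vertical stroke passes the filter, whereas a compact square (eccentricity 0) and a horizontal stroke (90 degrees off vertical) are rejected.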

In embodiments according to the first aspect, said first and second vanishing point candidates may be estimated using different approximation methods selected from the group consisting of a least squares method, a weighted least squares method and an adaptive least squares method.
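To make the difference between two of these approximation methods concrete, the sketch below estimates a vanishing point from inlier baselines with plain and with weighted least squares. It assumes each baseline y = m*x + b is given as the pair (m, b); baselines through a common point (x0, y0) satisfy b = y0 - x0*m, so fitting a line to the pairs yields the point. The adaptive variant is not shown, and the weights are hypothetical confidence values.

```python
def vanishing_point(data_points, weights=None):
    """(Weighted) least-squares intersection of baselines given as (m, b)."""
    if weights is None:
        weights = [1.0] * len(data_points)
    sw = sum(weights)
    sm = sum(w * m for w, (m, _) in zip(weights, data_points))
    sb = sum(w * b for w, (_, b) in zip(weights, data_points))
    smm = sum(w * m * m for w, (m, _) in zip(weights, data_points))
    smb = sum(w * m * b for w, (m, b) in zip(weights, data_points))
    denom = sw * smm - sm * sm
    if denom == 0:
        raise ValueError("baselines are parallel; no finite vanishing point")
    slope = (sw * smb - sm * sb) / denom   # equals -x0
    intercept = (sb - slope * sm) / sw     # equals y0
    return (-slope, intercept)
```

Calling the function once without weights and once with per-baseline confidences gives two candidates, which generally differ when some baselines are noisier than others.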

In a first alternative aspect of the invention, which may be combined with the other aspects described herein, a method is disclosed for projection correction of an image containing at least one text portion that is distorted by perspective. The method comprises a step of image binarization, in which said image is binarized, and a step of connected component analysis, which detects pixel blobs in said at least one text portion of said binarized image. For each of said pixel blobs, a position determining pixel may be selected on a pixel blob baseline of the pixel blob. Said position determining pixel may define the position of the pixel blob in the binarized image. The method further comprises a step of horizontal vanishing point determination, which comprises estimating text baselines by means of said position determining pixels and determining a horizontal vanishing point of said at least one text portion by means of said text baselines. The method further comprises a step of vertical vanishing point determination, in which a vertical vanishing point is determined for said at least one text portion based on vertical features thereof. The method further comprises a step of projection correction, in which said perspective distortion in said image is corrected based on said horizontal and vertical vanishing points.

In embodiments according to the first alternative aspect, a step of text and picture separation is performed after said image binarization and before said connected component analysis, and only the textual information is kept.

In embodiments of the first alternative aspect, said position determining pixel may be the center of the bottom of the bounding box of the pixel blob. In alternative embodiments, said position determining pixel may be a bottom corner (that is, the bottom left or bottom right corner) of the bounding box of the pixel blob, or another pixel that determines the position of the pixel blob or of its bounding box.

In embodiments of the first aspect or the first alternative aspect, a system or device may be provided comprising one or more processors and compatible software code portions configured to perform the methods or steps described above.

In embodiments of the first aspect or the first alternative aspect, a non-transitory storage medium may be provided on which is stored a computer program product comprising software code portions in a format executable on a computer device and configured to perform the methods or steps described above when executed on said computer device. Said computer device may be any of the following: a personal computer, a portable computer, a laptop computer, a netbook computer, a tablet computer, a smartphone, a digital still camera, a video camera, a mobile communication device, a personal digital assistant, a scanner, a multifunction device or any other computer-like device.

In a second aspect according to the invention, which may be combined with the other aspects described herein, a method is described for determining vanishing point candidates of a text portion in an image document that is distorted by perspective. The method comprises a step of image binarization, in which said image is binarized. The method then comprises performing connected component analysis, in which pixel blobs are detected in said at least one text portion of said binarized image. A position determining pixel is selected for each of said pixel blobs on a pixel blob baseline of the pixel blob, said position determining pixel defining the position of the pixel blob in the binarized image. The method also comprises estimating, in a Cartesian coordinate system, a number of text lines based on the position determining pixels, each text line representing an approximation of a horizontal or vertical text direction of said text portion. The method also comprises transforming each of said text lines into a data point in a homogeneous coordinate system. The method further comprises assigning a confidence level to each of the data points. The method comprises grouping a number of data points having a confidence level above a predetermined threshold into a priority sample array. The method comprises clustering the data points in the priority sample array into a number of sample groups, each sample group comprising two or more data points. The method further comprises assigning a group confidence value to each sample group based on at least the confidence level assigned to each data point in the sample group. In addition, the method comprises applying a random sample consensus (RANSAC) algorithm to determine a set of inliers among said data points with respect to a first fitted line. The RANSAC algorithm is initiated with the sample group having the highest group confidence value in the priority sample array. The method further comprises estimating at least one vanishing point candidate from the text lines corresponding to said set of inliers.

In embodiments according to the second aspect, a step of text and picture separation is performed after said image binarization and before said connected component analysis, and only the textual information is kept.

In embodiments according to the second aspect, the confidence level assigned to said data points may be based on at least the length of the respective text line and the proximity of the position determining pixels to the respective text line.

In embodiments according to the second aspect, the RANSAC algorithm may comprise the following steps. First, sample groups of data points are iteratively selected from the priority sample array for line fitting, the iteration starting with the sample group having the highest group confidence value in the priority sample array. Next, line fitting is performed for the first sample group, resulting in a first fitted line, and subsequently for each further sample group, resulting in further fitted lines. Then, based on the first and further fitted lines, a set of data points positioned below a predetermined distance threshold from the first fitted line is determined, said set of data points forming said set of inliers.

In embodiments according to the second aspect, the predetermined distance threshold from the first fitted line may be a fixed parameter. Alternatively, the predetermined distance threshold from the first fitted line may be adaptable based on the content of the image document.

In embodiments according to the second aspect, at least a first and a second vanishing point candidate may be estimated from the text lines corresponding to said set of inliers. The first and second vanishing point candidates may be estimated using different approximation methods selected from the group consisting of a least squares method, a weighted least squares method and an adaptive least squares method. The method may then further comprise selecting a vanishing point from the estimated vanishing point candidates. The selection may comprise performing projection correction on the image document based on each estimated vanishing point candidate, comparing the proximity of each vanishing point candidate to the resulting horizontal or vertical text direction after projection correction, and selecting the vanishing point candidate that is closest to the horizontal or vertical text direction of the image document after projection correction.

In an embodiment according to the second aspect, the group confidence value of each sample group may further be based on the distances between the respective estimated text lines corresponding to the data points in the sample group. The confidence level of each data point may be based on the dominant direction of the pixel blob used to estimate the respective text line. The dominant direction may be defined by the eccentricity of the shape of each pixel blob. The maximum number of data points grouped into the priority sample array may be between 2 and 20, more preferably between 5 and 10.

In an embodiment according to the second aspect, the estimated text lines may be vertical text blob lines, each corresponding to the direction of a selected one of said pixel blobs, the pixel blobs being selected by a blob filtering algorithm applied to the text portion of the image.

In an embodiment of the second aspect, a system or device may be provided comprising one or more processors and compatible software code portions configured to perform the method or steps described above.

In an embodiment of the second aspect, a non-transitory storage medium may be provided on which a computer program product is stored, the computer program product comprising software code portions in a format executable on a computer device and configured to perform the method or steps described above when executed on said computer device. Said computer device may be any of the following devices: a personal computer, a portable computer, a laptop computer, a netbook computer, a tablet computer, a smartphone, a digital still camera, a video camera, a mobile communication device, a personal digital assistant, a scanner, a multi-function device, or any other computer-like device.

In a third aspect of the invention, which may be combined with the other aspects described herein, a method is disclosed for projection correction of an image containing at least one text portion that is distorted by perspective. The method comprises a step of image binarization, in which said image is binarized. Subsequently, the method comprises a step of performing connected component analysis. The connected component analysis involves detecting pixel blobs in said at least one text portion of said binarized image. For each of said pixel blobs, a position determining pixel is selected on a pixel blob baseline of the pixel blob. Said position determining pixel defines the position of the pixel blob in the binarized image. The method comprises a step of horizontal vanishing point determination. The horizontal vanishing point determination comprises estimating text baselines by means of the position determining pixels of said pixel blobs, identifying horizontal vanishing point candidates from said estimated text baselines, and determining a horizontal vanishing point of said at least one text portion by means of said horizontal vanishing point candidates. The method also comprises a step of vertical vanishing point determination for said at least one text portion based on vertical features thereof. The method further comprises a step of projection correction, which involves correcting said perspective in said image based on said horizontal and vertical vanishing points. The horizontal vanishing point determination may comprise a first elimination step at the level of eigen points, a second elimination step at the level of text baselines and a third elimination step at the level of horizontal vanishing point candidates.

In an embodiment according to the third aspect, a step of text and picture separation is performed after said image binarization and before said connected component analysis, so that only textual information is retained.

In an embodiment according to the third aspect, the first elimination step comprises a step of detecting confusing eigen points, which are out of line with respect to the eigen points in the vicinity of the eigen point under consideration. Said confusing eigen points may be disregarded for said text baseline estimation.

In an embodiment according to the third aspect, said step of eliminating confusing eigen points may comprise determining the width and height of the pixel blobs, determining mean values for the width and height of the pixel blobs, and detecting said confusing eigen points as eigen points belonging to pixel blobs of which at least one of the width and the height differs from said calculated mean values by a predetermined range.
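The size-based detection above can be sketched as a simple filter. This is an illustrative sketch only: the function name `find_confusing_blobs` and the tuple layout are hypothetical, and the default factors 0.3 and 5 follow the example range [0.3, 5] that the description gives for method 200 rather than any required value.

```python
def find_confusing_blobs(sizes, low=0.3, high=5.0):
    """Return the indices of pixel blobs whose eigen points would be flagged.

    sizes: list of (width, height) tuples, one per pixel blob.
    A blob is flagged when its width or height falls outside
    [low * mean, high * mean] of the respective mean value.
    """
    if not sizes:
        return []
    mean_w = sum(w for w, _ in sizes) / len(sizes)
    mean_h = sum(h for _, h in sizes) / len(sizes)
    flagged = []
    for i, (w, h) in enumerate(sizes):
        width_ok = low * mean_w <= w <= high * mean_w
        height_ok = low * mean_h <= h <= high * mean_h
        if not (width_ok and height_ok):
            flagged.append(i)
    return flagged
```

Small marks such as commas or the dot of an "i" are typically much smaller than the mean blob, which is why a relative range is enough to catch them.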

In an embodiment according to the third aspect, said step of estimating text baselines comprises a step of clustering eigen points into eigen point groups. Said eigen point groups may fulfil at least one of the following conditions:
- the point-to-point distance between the eigen points of the group is below a first distance threshold,
- the point-to-line distance between each eigen point of the group and the line formed by the eigen points of the group is below a second distance threshold,
- the off-horizontal angle of the line formed by the eigen points of the group is below a maximum angle, and
- the eigen point group contains a minimum number of eigen points.
Said text baselines may then be estimated based on said eigen point groups.
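The four group conditions listed above can be checked as in the sketch below. It is a hedged illustration: the function name `check_group_conditions`, the ordinary least squares fit of y on x, and the use of consecutive points (sorted by x) for the point-to-point distance are all assumptions, since the text does not fix these details.

```python
import math

def check_group_conditions(points, max_gap, max_line_dist, max_angle_deg, min_points):
    """Evaluate the four grouping conditions for one candidate eigen point group.

    Returns a dict of booleans so the caller can decide which conditions to enforce.
    """
    if not points:
        raise ValueError("group must contain at least one point")
    pts = sorted(points)  # order along x so that neighbours are consecutive
    n = len(pts)
    result = {
        "min_points": n >= min_points,
        "point_to_point": all(math.dist(p, q) < max_gap for p, q in zip(pts, pts[1:])),
    }
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    if sxx == 0:
        # All points share one x: the fitted line is vertical (or undefined for one point).
        result["point_to_line"] = True
        result["angle"] = n < 2
        return result
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    norm = math.hypot(slope, 1.0)
    result["point_to_line"] = all(
        abs(slope * x - y + intercept) / norm < max_line_dist for x, y in pts
    )
    result["angle"] = math.degrees(math.atan(abs(slope))) < max_angle_deg
    return result
```

Returning the individual flags rather than a single verdict keeps the sketch neutral on whether one, several or all conditions must hold.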

In an embodiment according to the third aspect, said first distance threshold, said second distance threshold, said maximum angle and said minimum number of eigen points may be set adaptively based on the content of the image. Said step of estimating text baselines may further comprise a step of eigen point group merging, in which eigen point groups on both sides of a disregarded eigen point are merged into a larger eigen point group.

In an embodiment according to the third aspect, the second elimination step comprises the steps of assigning confidence levels to said text baselines and eliminating text baselines based on said confidence levels. The confidence levels may be determined based on at least the length of the respective text baseline and the proximity of the group of eigen points used for estimating the text baseline to the resulting text baseline. The elimination of text baselines may be performed by means of a RANSAC algorithm in which said confidence levels are taken into account.

In an embodiment according to the third aspect, the third elimination step comprises performing projection correction based on each identified horizontal vanishing point candidate, comparing the proximity of each horizontal vanishing point candidate to the resulting horizontal text direction after projection correction, and selecting the horizontal vanishing point candidate that is closest to the horizontal text direction of the image document after projection correction.

In an embodiment according to the third aspect, a first and a second horizontal vanishing point candidate may be estimated from said text baselines after said second elimination step. For said estimation of said first and second horizontal vanishing point candidates, different approximation methods may be used, selected from the group consisting of a least squares method, a weighted least squares method and an adaptive least squares method.
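One way to obtain a vanishing point candidate by (weighted) least squares is sketched below, assuming each text baseline is given as a line a*x + b*y + c = 0 and the candidate is the point minimising the weighted sum of squared distances to all lines. The parameterisation and the name `estimate_vanishing_point` are assumptions; the description does not state in which space the fit is performed.

```python
import math

def estimate_vanishing_point(lines, weights=None):
    """Least squares intersection of lines given as (a, b, c) with a*x + b*y + c = 0.

    With weights=None this is the plain least squares variant; passing
    per-line weights (for example confidence levels) gives the weighted variant.
    Returns (x, y), or None when the lines are (nearly) parallel.
    """
    if weights is None:
        weights = [1.0] * len(lines)
    saa = sab = sbb = sac = sbc = 0.0
    for (a, b, c), w in zip(lines, weights):
        norm = math.hypot(a, b)          # normalise so residuals are distances
        a, b, c = a / norm, b / norm, c / norm
        saa += w * a * a
        sab += w * a * b
        sbb += w * b * b
        sac += w * a * c
        sbc += w * b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        return None                      # vanishing point at infinity
    # Solve the 2x2 normal equations [saa sab; sab sbb] [x; y] = [-sac; -sbc].
    x = (-sac * sbb + sbc * sab) / det
    y = (-sbc * saa + sac * sab) / det
    return x, y
```

Running the function once without weights and once with confidence-based weights yields two candidates that can then be compared, in the spirit of the first and second candidates mentioned above.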

In an embodiment of the third aspect, a system or device may be provided comprising one or more processors and compatible software code portions configured to perform the method or steps described above.

In an embodiment of the third aspect, a non-transitory storage medium may be provided on which a computer program product is stored, the computer program product comprising software code portions in a format executable on a computer device and configured to perform the method or steps described above when executed on said computer device. Said computer device may be any of the following devices: a personal computer, a portable computer, a laptop computer, a netbook computer, a tablet computer, a smartphone, a digital still camera, a video camera, a mobile communication device, a personal digital assistant, a scanner, a multi-function device, or any other computer-like device.

The invention will be further elucidated by means of the following description and the appended drawings.

A process flow for projection correction of a distorted image, according to an embodiment of the present disclosure (FIG. 1).
A process flow for identifying a horizontal vanishing point, according to an embodiment of the present disclosure (FIG. 2).
Two drawings showing an eigen point clustering algorithm, according to an embodiment of the present disclosure, which may together be referred to as FIG. 3 in the text.
A process flow for identifying a vertical vanishing point using position determining pixels, according to an embodiment of the present disclosure.
A process flow for identifying a vertical vanishing point using text stroke features, according to an embodiment of the present disclosure.
An example binarized image having a picture together with text, according to an embodiment of the present disclosure (FIG. 6A).
The resulting image after the picture has been filtered out from the text, according to an embodiment of the present disclosure (FIG. 6B).
An example pixel blob, according to an embodiment of the present disclosure (FIG. 7A).
A presentation grid for a user to adjust the corners of an image, according to an embodiment of the present disclosure (FIG. 8).
A captured image, according to an embodiment of the present disclosure (FIG. 9A).
An improved image as a result of projection correction, according to an embodiment of the present disclosure (FIG. 9B).
An example image in which eigen points are identified for text, according to an embodiment of the present disclosure.
An example image having over-classified eigen point groups, according to an embodiment of the present disclosure.
An example image having merged eigen point groups, according to an embodiment of the present disclosure.
An example portion of text for which baselines are estimated, according to an embodiment of the present disclosure.
An example image in which margin feature points are identified at the margin, according to an embodiment of the present disclosure.
An example image having two estimated vertical lines along the same margin, according to an embodiment of the present disclosure.
An example image showing the merging of estimated vertical lines, according to an embodiment of the present disclosure.
An example image showing text stroke features of a character, according to an embodiment of the present disclosure.
An example image showing selectively extracted blobs after text stroke feature identification, according to an embodiment of the present disclosure.
An example image showing estimated vertical text blob lines for selected pixel blobs, according to an embodiment of the present disclosure.
An example image showing the vertical text blob lines selected for the vertical vanishing point, according to an embodiment of the present disclosure.

The present invention will be described with respect to particular embodiments and with reference to certain drawings, but the invention is not limited thereto but only by the claims. The drawings described are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes. The dimensions and the relative dimensions do not necessarily correspond to actual reductions to practice of the invention.

Furthermore, the terms first, second, third and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. The terms are interchangeable under appropriate circumstances, and the embodiments of the invention can operate in sequences other than those described or illustrated herein.

Moreover, the terms top, bottom, over, under and the like in the description and in the claims are used for descriptive purposes and not necessarily for describing relative positions. The terms so used are interchangeable under appropriate circumstances, and the embodiments of the invention described herein can operate in orientations other than those described or illustrated herein.

The term "comprising", used in the claims, should not be interpreted as being restricted to the means listed thereafter; it does not exclude other elements or steps. It is to be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more other features, integers, steps or components, or groups thereof. Thus, the scope of the expression "a device comprising means A and B" should not be limited to devices consisting only of components A and B. It means that, with respect to the present invention, the only relevant components of the device are A and B.

Referring to FIG. 1, a process flow 100 for projection correction of a distorted image is described. An image may be received for projection correction. The image may optionally be examined to determine its quality. Examining the image may include checking for the presence of noise, illumination conditions, clarity of characters, resolution and the like. If the quality of the image is above a predetermined threshold, the image may be processed in step 102. If the quality of the image is below the predetermined threshold, the image may be preprocessed to improve its quality. Preprocessing may involve correcting color hue, correcting brightness imbalance, adjusting sharpness, removing noise, removing or correcting motion blur, compensating for camera misfocus and the like, in order to restore and improve the resolution of the image. In one example implementation, the preprocessing may be performed automatically. In another example implementation, toolbox options may be provided to the user to select the type of preprocessing for the image. In one embodiment, preprocessing may be implemented using known techniques including, but not limited to, various image filtering methods such as Gaussian filtering, median filtering, Wiener filtering, bilateral filtering, Wiener deconvolution, total variation deconvolution and contrast limited adaptive histogram equalization.

In step 102, image binarization is performed. Image binarization may include converting the pixel values of the received image to either logical one (1) or logical zero (0). These values may be represented by a single bit or by multiple bits, for example as 8-bit unsigned integers. The pixels of the received image may be grayscale pixels, color pixels or pixels represented in any other form. The values may be represented by the corresponding black or white color. In one embodiment, binarization may be performed using any of the known techniques, which may be broadly classified into global approaches, region-based approaches, local approaches, hybrid approaches or any variations thereof. In one example implementation, the image binarization is performed using Sauvola binarization. In this technique, binarization is performed on the basis of small image patches. Upon analyzing the statistics of the local image patch, a binarization threshold is determined using the following formula:

$$T = m\left[1 + k\left(\frac{s}{R} - 1\right)\right]$$

where m and s are the local mean and the local standard deviation, respectively, R is the maximum value of the standard deviation, and k is a parameter controlling the threshold value. The parameter k may be chosen depending on the document image. In one embodiment, k may be set manually. In another embodiment, the parameter k may be set automatically depending on the text characteristics of the document image.
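As an illustration of Sauvola binarization, the sketch below computes the threshold for a single local patch using the commonly published form T = m * (1 + k * (s / R - 1)). The function name `sauvola_threshold` and the defaults k = 0.5 and R = 128 (typical for 8-bit grayscale images) are assumptions, not values taken from this description.

```python
import math

def sauvola_threshold(patch, k=0.5, R=128.0):
    """Sauvola threshold for one local patch of grayscale values.

    patch: flat list of pixel intensities in the local window.
    k:     parameter controlling the threshold value (assumed default).
    R:     maximum value of the standard deviation (assumed default for 8-bit data).
    """
    n = len(patch)
    m = sum(patch) / n                                  # local mean
    s = math.sqrt(sum((v - m) ** 2 for v in patch) / n)  # local standard deviation
    return m * (1.0 + k * (s / R - 1.0))

def binarize_pixel(value, patch, k=0.5, R=128.0):
    """Return 1 for a pixel brighter than its local threshold, else 0."""
    return 1 if value > sauvola_threshold(patch, k, R) else 0
```

In flat regions the standard deviation is close to zero, so the threshold drops well below the local mean and background noise is not mistaken for text.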

In step 104, it is determined whether the binarized image (hereinafter referred to as the image) contains any pictures. If the image does not contain any pictures, the process proceeds to step 108. If the image contains one or more pictures, the one or more pictures are separated from the text in step 106. Any of the known techniques, such as page analysis methods, text location methods and/or machine learning methods, may be used to separate the one or more pictures from the text. Techniques based on page analysis methods may be used for images that are generated from scanned documents or that appear substantially similar to scanned document images. Techniques based on text location methods may be used for images with a complex background, such as images having a picture in the background. Techniques based on machine learning methods may be used for any type of image, but may require training samples for learning. In an example implementation for separating the one or more pictures from the text, the background of the document image is extracted. Using the background, the document image is normalized to compensate for the effects of non-uniform illumination. Thereafter, non-text objects are removed from the binary image using heuristic filtering, in which the heuristic rules are based on area, relative size, proximity to the image frame, density, average contrast, edge contrast and the like. FIG. 6A shows an example binarized image containing a picture together with text. FIG. 6B shows the resulting image after the picture has been removed.

In step 108, connected component analysis is performed on the binarized image containing only textual information. Connected component analysis may involve identifying and labeling connected pixel components in the binary image. Pixel blobs may be identified during the connected component analysis. A pixel blob may be a region having a set of connected components in which some property, such as color, is constant or varies within a predetermined range. For example, the word "Hello" has five different sets of connected components, i.e. each character of the word is a connected component or pixel blob. A position determining pixel is identified for each of the pixel blobs. The position determining pixel defines the position of the pixel blob in the binary image. In one embodiment, the position determining pixel may be an eigen point. The eigen point may be the pixel at the center of the pixel blob baseline within the pixel blob. In another embodiment, the position determining pixel may be the pixel at the left or right end of the pixel blob baseline within the pixel blob. Other embodiments having the position determining pixel at different locations in the pixel blob, or in a bounding box drawn around the pixel blob, are contemplated within the scope of the present disclosure. FIG. 7A shows an example pixel blob 702. A bounding box 704 is formed around the connected component or pixel blob 702. In FIG. 7A, the identified connected component is the character "A" 702. The bounding box 704 has an eigen point 706, which may be defined as the center of the bottom of the bounding box 704. The eigen point 706 may be one of the position determining pixels used herein. Other position determining pixels may also be used in the projection correction. For example, position determining pixels 708 and 710 represent the bottom-left and top-left position determining pixels, respectively. The position determining pixels may be used to estimate one or more horizontal and/or vertical text lines in the binarized image. Each text line represents an approximation of the horizontal or vertical text direction of the associated text portion.
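The connected component analysis and eigen point selection described above can be sketched with a breadth-first flood fill. The sketch assumes 8-connectivity, foreground pixels marked as 1, and an eigen point defined as the center of the bottom edge of the bounding box, as in the FIG. 7A example; the name `find_blobs` and the dictionary layout are hypothetical.

```python
from collections import deque

def find_blobs(image):
    """Label 8-connected foreground regions of a binary image (list of rows of 0/1).

    Returns one dict per pixel blob with its bounding box
    (x_min, y_min, x_max, y_max) and its eigen point (x, y).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for r in range(height):
        for c in range(width):
            if not image[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            r_min = r_max = r
            c_min = c_max = c
            while queue:
                y, x = queue.popleft()
                r_min, r_max = min(r_min, y), max(r_max, y)
                c_min, c_max = min(c_min, x), max(c_max, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append({
                "bbox": (c_min, r_min, c_max, r_max),
                # Eigen point: center of the bottom edge of the bounding box.
                "eigen_point": ((c_min + c_max) / 2, r_max),
            })
    return blobs
```

The bottom-left and top-left position determining pixels mentioned above could be read from the same bounding box as `(c_min, r_max)` and `(c_min, r_min)`.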

In step 110, a horizontal vanishing point is determined. In one embodiment, the horizontal vanishing point may be determined using text baselines that are determined using the position determining pixels. Various embodiments for determining the horizontal vanishing point are described in connection with FIG. 2.

In step 112, a vertical vanishing point is determined. In one embodiment, the vertical vanishing point is determined using margin lines identified using the position determining pixels. In another embodiment, the vertical vanishing point may be determined using vertical stroke features of the connected components. In yet another embodiment, the vertical vanishing point is identified using both margin lines and vertical stroke features. Various embodiments for determining the vertical vanishing point are described in connection with FIGS. 3 and 4.

In step 114, projection correction of the image is performed using the horizontal vanishing point and the vertical vanishing point. The projection correction is performed based on the estimation of eight unknown parameters of a projection transformation model. An exemplary projection transformation model is provided below.

In one embodiment, a horizontal projection transformation matrix and a vertical projection transformation matrix are constructed to estimate the parameters of the projection transformation model. The horizontal projection transformation matrix and the vertical projection transformation matrix are constructed using the equations provided below.

where (v_x, v_y) is the vanishing point, (w, h) are the width and height of the document image, t_x = w/2 and t_y = h/2. Projection correction of the image is performed using the projection matrices.

In another embodiment, the vertical vanishing point and the horizontal vanishing point may be used to identify the corners (x_i, y_i) (1 <= i <= 4) of the original distorted image and their corresponding locations (X_i, Y_i) (1 <= i <= 4) in the undistorted or registered document image. Based on the four pairs of corresponding corners, the projection transformation model may be estimated. The projection transformation model may be estimated using the equation:

８つのパラメータは、投影的に補正された画像の中の４つのコーナーを識別することに続いて、（４）を使用することにより、取得される可能性がある。投影変換モデルを構築することに続いて、投影補正の一般的な傾向が、図８に示されるように、ユーザの再検討のために生成され、表示される。ユーザは、一般的な傾向を受け入れるべきオプション、又は４つのコーナーを調整すべきツールを提供されることもある。例えば、図８に示されるように、グラフィカル・ユーザ・インターフェース要素８０４が、ユーザがコーナーを調整するための可能性とともに、提供されることもある。ユーザ入力当たりのコーナーにおける変化に応じて、投影変換モデルが修正されることもあり、対応する投影補正が実行されることもある。変化のない受け入れに応じて、投影補正は、実行されることもある。結果として生ずる画像は、図８の要素８０６に示されるように、提示されることもある。当業者なら、適切な追加のオプションもまたユーザに対して提供される可能性もあることを理解するであろう。投影補正の結果の実例が図９Ａ及び９Ｂに例証される。図９Ａは、取り込まれた画像を示すものである。図９Ｂは、投影補正の後の画像を示すものである。 Eight parameters may be obtained by using (4) following identification of four corners in the projectively corrected image. Following construction of the projection transformation model, general trends of projection correction are generated and displayed for user review, as shown in FIG. The user may be provided with an option to accept the general trend or a tool to adjust the four corners. For example, as shown in FIG. 8, a graphical user interface element 804 may be provided along with the possibility for the user to adjust the corners. Depending on the change in the corners per user input, the projection transformation model may be modified and a corresponding projection correction may be performed. Depending on the unchanged acceptance, projection correction may be performed. The resulting image may be presented as shown in element 806 of FIG. One skilled in the art will appreciate that appropriate additional options may also be provided to the user. An example of the result of projection correction is illustrated in FIGS. 9A and 9B. FIG. 9A shows a captured image. FIG. 9B shows the image after projection correction.
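As an illustration of how eight parameters can follow from four pairs of corresponding corners, the following Python sketch builds and solves the resulting 8x8 linear system. It assumes the common eight-parameter form X = (ax + by + c)/(gx + hy + 1), Y = (dx + ey + f)/(gx + hy + 1), which may differ from the parameterization used in the equations of this disclosure. The function names `solve_linear`, `estimate_projective` and `apply_projective` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_projective(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return [a, b, c, d, e, f, g, h] mapping the four src corners onto dst.

    Assumed model (not taken from the source equations):
        X = (a*x + b*y + c) / (g*x + h*y + 1)
        Y = (d*x + e*y + f) / (g*x + h*y + 1)
    """
    rows: List[List[float]] = []
    rhs: List[float] = []
    for (x, y), (X, Y) in zip(src, dst):
        # Each corner pair contributes two linear equations in the unknowns.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -X * x, -X * y])
        rhs.append(X)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -Y * x, -Y * y])
        rhs.append(Y)
    return solve_linear(rows, rhs)


def apply_projective(p: Sequence[float], pt: Point) -> Point:
    """Map one point with the eight estimated parameters."""
    a, b, c, d, e, f, g, h = p
    x, y = pt
    w = g * x + h * y + 1.0
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)
```

If the user adjusts a corner, the same estimation can simply be repeated with the updated corner coordinates.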

図２は、一実施例による、水平消失ポイントを識別するための実例の方法２００を考察するものである。ステップ２０２において、固有ポイントが識別されることもある。固有ポイントは、画像の連結成分分析を通して、識別されることもある。固有ポイントは、すべてのピクセル・ブロブについて規定される。ステップ２０４において、固有ポイントは、クラスタ化され、グループ分けされる。一実施例においては、固有ポイントは、クラスタ化されることに先立って処理されることもある。固有ポイント処理は、混同させる固有ポイントを除去することを含むことができる。混同させる固有ポイントは、テキスト・ベースラインよりも上にあるか、又は下にある固有ポイントとすることができる。混同させる固有ポイントは、主として、キャラクタの３つの組からなるものとすることができ、すなわち、第１の組は、２つのブロブからなることもあるキャラクタを含んでおり、そこでは、より小さなブロブは、「ｊ」、「ｉ」など、テキスト・ベースラインよりも上にあり、第２の組は、「ｐ」、「ｑ」、「ｇ」など、印刷されるときに、テキスト・ベースラインよりも下に伸びるキャラクタを含んでおり、第３の組は、コンマ（，）、ハイフン（−）などのキャラクタを含んでいる。第１及び第３の組のキャラクタに関連する混同させる固有ポイントは、ピクセル・ブロブのサイズに基づいて、識別されることもある。第１の組及び第３の組のキャラクタに関連するピクセル・ブロブのサイズは、他のキャラクタと比べて、水平方向、又は垂直方向のいずれかにおいて、かなり小さいものとすることができる。したがって、混同させる固有ポイントは、すべてのピクセル・ブロブの平均値と、ピクセル・ブロブのサイズを比較することにより、識別されることもある。実例の一実装形態においては、すべてのピクセル・ブロブの幅と、高さとが計算される。さらに、ピクセル・ブロブの幅（ｍ_ｗ）と、高さ（ｍ_ｈ）とについての平均値が計算される。その幅及び／又は高さが所定の範囲だけ前記算出された平均値から逸脱するピクセル・ブロブに属する固有ポイントが、混同させる固有ポイントとしてマーク付けされる。実例の一例においては、［０．３，５］^＊ｍ_ｗの範囲を超える幅、及び／又は［０．３，５］^＊ｍ_ｈの範囲を超える高さを有する固有ポイントは、混同させる固有ポイントとして識別される。そのような混同させる固有ポイントは、さらなる処理から切り捨てられることもある。 FIG. 2 considers an example method 200 for identifying horizontal vanishing points, according to one embodiment. In step 202, unique points may be identified. Intrinsic points may also be identified through connected component analysis of the image. Unique points are defined for all pixel blobs. In step 204, unique points are clustered and grouped. In one embodiment, unique points may be processed prior to being clustered. Intrinsic point processing can include removing the ambiguity inherent points. The unique points to be confused can be unique points that are above or below the text baseline. 
The confusing unique points may come mainly from three sets of characters: the first set includes characters that may consist of two blobs, in which the smaller blob lies above the text baseline, such as "j" and "i"; the second set includes characters that extend below the text baseline when printed, such as "p", "q" and "g"; and the third set includes characters such as the comma (,) and the hyphen (-). Confusing unique points associated with the first and third sets of characters may be identified based on the size of the pixel blob. The pixel blobs associated with the first and third sets of characters can be considerably smaller than those of other characters, in either the horizontal or the vertical direction. The confusing unique points may therefore be identified by comparing the size of each pixel blob with the average size over all pixel blobs. In one example implementation, the width and height of every pixel blob are calculated, together with the average blob width (m_w) and the average blob height (m_h). Unique points belonging to pixel blobs whose width and/or height deviates from the calculated average by more than a predetermined range are marked as confusing unique points. In one example, unique points of blobs having a width outside the range [0.3, 5]*m_w and/or a height outside the range [0.3, 5]*m_h are identified as confusing unique points. Such confusing unique points may be discarded from further processing.
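The size-based filtering described above can be sketched in Python as follows. This is a minimal illustration, assuming each pixel blob is already available with its unique point coordinates and bounding-box size; the `Blob` type and the function name `split_confusing` are hypothetical, and the bounds 0.3 and 5 are the example values given in the text.

```python
from typing import List, NamedTuple, Tuple


class Blob(NamedTuple):
    x: float       # unique point x coordinate (hypothetical field)
    y: float       # unique point y coordinate (hypothetical field)
    width: float   # bounding-box width of the pixel blob
    height: float  # bounding-box height of the pixel blob


def split_confusing(blobs: List[Blob], low: float = 0.3, high: float = 5.0
                    ) -> Tuple[List[Blob], List[Blob]]:
    """Split blobs into (kept, confusing) by comparing sizes to the averages."""
    if not blobs:
        return [], []
    m_w = sum(b.width for b in blobs) / len(blobs)
    m_h = sum(b.height for b in blobs) / len(blobs)
    kept: List[Blob] = []
    confusing: List[Blob] = []
    for b in blobs:
        width_ok = low * m_w <= b.width <= high * m_w
        height_ok = low * m_h <= b.height <= high * m_h
        # A blob is confusing if its width and/or its height is out of range.
        (kept if width_ok and height_ok else confusing).append(b)
    return kept, confusing
```

The confusing blobs are returned rather than thrown away because the merging step described later searches among the discarded blobs.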

残りの固有ポイントが、各固有ポイント・グループが、同じテキスト・ラインからの固有ポイントを含むように、異なる固有ポイント・グループへと分類され、クラスタ化される。実例の固有ポイント・クラスタ化アルゴリズムが図３に説明される。固有ポイント・クラスタ化アルゴリズムは、同じグループの固有ポイントが、一般的に、以下の複数の条件、すなわち、（１）これらの固有ポイントが互いに近くにある条件と、（２）これらの固有ポイントが、実質的に直線を形成する条件と、（３）構築されたラインの方向が、水平方向に近い条件とのうちの１つ又は複数を満たすという仮定に基づいたものである。一実施例においては、これらの条件は、以下の複数の条件、すなわち、グループのこの固有ポイントと他の固有ポイントとの間のポイント・ツー・ポイント距離が、第１の距離しきい値Ｔ_ｄよりも下にある条件と、グループのこの固有ポイントと、複数の固有ポイントによって形成されるラインとの間のポイント・ツー・ライン距離が、第２の距離しきい値Ｔ_ｉよりも下にある条件と、グループの複数の固有ポイントによって形成されるラインのオフ水平角度が、最大角度Ｔ_ａよりも下にある条件とのうちの少なくとも１つが満たされる場合に、固有ポイントが特定の固有ポイント・グループに割り当てられるように、固有ポイント・クラスタ化アルゴリズムにおけるそれぞれの制約条件に変換される。さらに、固有ポイント・クラスタ化アルゴリズムをより堅牢にするために、追加の制約条件が、固有ポイント・グループが少なくとも最小数の固有ポイントＴ_ｍを含むように、追加されることもある。 The remaining unique points are classified and clustered into different unique point groups such that each unique point group contains unique points from the same text line. An example unique point clustering algorithm is illustrated in FIG. The eigenpoint clustering algorithm is generally based on the following conditions (i) conditions where the eigenpoints are close to each other, and (2) the eigenpoints are: It is based on the assumption that the conditions that form a substantially straight line and (3) the direction of the constructed line satisfy one or more of the conditions close to the horizontal direction. 
In one embodiment, these conditions are translated into corresponding constraints in the unique point clustering algorithm, such that a unique point is assigned to a particular unique point group if at least one of the following conditions is met: the point-to-point distance between this unique point and the other unique points of the group is below a first distance threshold T_d; the point-to-line distance between this unique point and the line formed by the unique points of the group is below a second distance threshold T_i; and the off-horizontal angle of the line formed by the unique points of the group is below a maximum angle T_a. Furthermore, to make the unique point clustering algorithm more robust, an additional constraint may be added that a unique point group contains at least a minimum number T_m of unique points.

一実施例においては、固有ポイント・クラスタ化アルゴリズムの制約条件、すなわち、ポイント・ツー・ポイント距離しきい値Ｔ_ｄと、ポイント・ツー・ライン距離しきい値Ｔ_ｉと、最大角度オフ水平方向しきい値Ｔ_ａと、固有ポイント・グループの中の固有ポイントの最小数Ｔ_ｍとは、画像の分析、例えば、カメラ・ドキュメント画像の分析に基づいて、適応的に設定されることもある。代替的な一実施例においては、パラメータは、手動で設定されることもある。水平方向に関するＴ_ａは、約２０度にオフセットされることもあり、Ｔ_ｍは、テキストの中に少なくとも２つの単語、又は３つの単語を有することを仮定して、約１０とすることができる。他の値がＴ_ａとＴ_ｍとについて選択され得ることを理解すべきである。Ｔ_ｄと、Ｔ_ｉとの値は、ドキュメント画像の中のテキストのコンテンツに依存する可能性がある。例えば、キャラクタ・サイズが大きいＴ_ｄである場合、そのときにはＴ_ｉは、より高く保持されることもあり、逆もまた同様である。一実施例においては、Ｔ_ｄ及びＴ_ｉは、以下のように適応的に算出されることもある。単語の中の隣接するキャラクタの間のすべての最短距離に基づいたメジアン距離Ｄ_ｃが算出される。Ｔ_ｉは、Ｄ_ｃに設定されることもあり、Ｔ_ｄは、３^＊Ｄ_ｃに設定されることもある。これらの値は、水平方向における隣接するパラグラフに属する単語が同じ固有ポイント・グループの中にあるように考えられないようにしながら、Ｔ_ｄが、同じパラグラフの中で隣接する文字と単語とを検索するために十分大きいように選択される。Ｔ_ｄを同じパラグラフの中の隣接する文字と単語とを検索するために十分に大きく設定することは、パラグラフと、水平の隣接するパラグラフとの間のパラグラフ・マージン・ラインの識別を可能にするであろう。いくつかの実例の例においては、単一ラインの中の複数の単語の間のスペースは、複数の固有ポイント・グループへのラインの中の固有ポイントの過剰な分類を引き起こす可能性がある。過剰な分類は、複数の単語の間に大きなギャップ引き起こす固有ポイント除去プロシージャ中に取り除かれていることもある、いくつかの小さな、又は大きな連結成分に起因している可能性がある。 In one embodiment, the constraints of the eigenpoint clustering algorithm, ie, point-to-point distance threshold T _d , point-to-line distance threshold T _i , maximum angle off horizontal The threshold value T _a and the minimum number T _m of unique points in the unique point group may also be set adaptively based on analysis of the image, eg analysis of a camera document image. In an alternative embodiment, the parameters may be set manually. The horizontal direction T _a may be offset by about 20 degrees and T _m may be about 10, assuming that it has at least two words or three words in the text . It should be understood that other values can be selected for the T _a and T _m. The values of T _d and T _i may depend on the content of the text in the document image. For example, if the character size is a large T _d , then T _i may be held higher and vice versa. In one embodiment, T _d and T _i may be adaptively calculated as follows. 
A median distance D_c is calculated from the shortest distances between adjacent characters within words. T_i may be set to D_c, and T_d may be set to 3*D_c. These values are chosen so that T_d is large enough to find adjacent characters and words within the same paragraph, while preventing words that belong to horizontally adjacent paragraphs from being placed in the same unique point group. Setting T_d in this way allows the paragraph margin line between a paragraph and a horizontally adjacent paragraph to be identified. In some examples, the spaces between words in a single line can cause the unique points of that line to be over-classified into multiple unique point groups. The over-classification may be due to some small or large connected components having been removed during the confusing unique point removal procedure, which leaves large gaps between words.
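The adaptive choice of T_d and T_i can be sketched as follows. The sketch assumes that the shortest distance between adjacent characters is approximated by the nearest-neighbour distance between unique points, which is an interpretation rather than something stated in the text; the function name `adaptive_thresholds` is hypothetical.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]


def adaptive_thresholds(points: List[Point]) -> Tuple[float, float]:
    """Return (T_d, T_i) derived from the median nearest-neighbour distance D_c."""
    if len(points) < 2:
        raise ValueError("need at least two unique points")
    nearest = []
    for i, p in enumerate(points):
        # Shortest distance from this unique point to any other unique point.
        nearest.append(min(math.dist(p, q)
                           for j, q in enumerate(points) if j != i))
    d_c = median(nearest)
    t_i = d_c        # point-to-line threshold
    t_d = 3.0 * d_c  # point-to-point threshold
    return t_d, t_i
```

The quadratic search is kept for clarity; a spatial index would be a natural replacement for large documents.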

ステップ２０６において、過剰分類された固有ポイント・グループは、対応するグループへとマージすることにより統合される。例示の固有ポイント・マージング・アルゴリズムが、以下のように説明されることもある。各固有ポイント・グループ｛Ｃ_ｉ｝（ｎ＞＝ｉ＞＝１）では、左端固有ポイントｌ_ｉと右端固有ポイントｒ_ｉと（ｎ＞＝ｉ＞＝１）が、それぞれ、識別されることもある。固有ポイント・グループのうちの最も右の固有ポイントに対応することができるピクセル・ブロブが識別される。最も右の固有ポイントの右の隣接するピクセル・ブロブが、切り捨てられたピクセル・ブロブ（例えば、混同させる固有ポイントに対応するピクセル・ブロブ）のうちから検索される。右の隣接するブロブを識別することに応じて、右の隣接するブロブは、新しい右のエンド・ポイントｒ_ｉとして設定されることもある。以前のステップにおいて説明されるような新しい右のエンド・ポイントのさらなる右の隣接するピクセル・ブロブを検索するステップは、さらなる右の隣接するブロブが見出されなくなるまで、反復されることもある。右の隣接するブロブがないことに応じて、ｒ＿ｎｅｗ_ｉのようなブロブの固有ポイント座標が記録される。右のエンド・ポイントの新しいアレイｒ＿ｎｅｗ_ｉ（ｎ＞＝ｉ＞＝１）を用いて、検索インデックスｋが、ゼロ（０）に初期化される。検索インデックスは、１だけ増加され、すなわち、ｋ＝ｋ＋１であり、ｌ_ｋとｒ＿ｎｅｗ_ｉ（ｎ＞＝ｉ＞＝１）との間の距離が算出されることもある。ポイントｌ_ｋと、ｒ＿ｎｅｗ_ｉ（｛Ｃ_ｋ｝及び｛Ｃ_ｉ｝）との対に対応する固有ポイント・グループは、それらが、以下の条件、すなわち、固有ポイント・グループの間の距離が、所定の距離の内部にある（実例の一実装形態において、距離が０．５^＊（Ｔｄ）未満とすることができる）条件と、固有ポイント・グループに対応するラインが、互いに近くにある（例えば、ライン距離が（Ｔ_ｉ）未満である）条件とのうちの少なくとも一方を満たす場合に、マージされることもある。固有ポイント・グループがマージされる場合には、固有ポイント・グループの数は、１だけ低減されることもあり、すなわち、ｎ＝ｎ−１である。チェックを実行して、検索インデックスがポイント・グループの数に等しい（ｋ＝＝ｎ）かどうかを決定することができる。検索インデックスが等しくない場合、そのときには検索インデックスは、増大され、それらが上述された規定された条件を満たす場合に、以前の、距離を算出するステップ、固有ポイント・グループ・マージングのステップが実行される。図１０Ａは、固有ポイント分類の前の実例の画像を示すものである。図１０Ａは、テキスト・ベースラインにおけるピクセル・ブロブについての固有ポイントを示すものである。図１０Ｂは、固有ポイントのグループへの分類の後の実例の画像を示すものである。図は、テキスト・ラインのそれぞれの中にグループを有する画像を示すものである。例えば、第１のテキスト・ラインは、固有ポイント・グループ１００２を示している。画像の中に示される第２のテキスト・ラインは、過剰分類された固有ポイント・グループ１００４及び１００６を示している。過剰分類されたグループ１００４及び１００６（２つのグループ）は、図１０Ｂのテキストの第２のラインの中に見られることもある（対応する固有ポイント・グループについての正方形シンボルと円形シンボルとによって示される）。図１０Ｃは、統合された固有ポイント・グループを有する実例の画像を示すものである。図１０Ｂの中で示されるような、第２のラインの過剰分類されたグループ１００４及び１００６は、１つの固有ポイント・グループ１００８（プラス・マークによって示される）へと統合される。 At step 206, overclassified unique point groups are merged by merging into corresponding groups. An exemplary intrinsic point merging algorithm may be described as follows. In each eigenpoint group {C _i } (n> = i> = 1), the left end eigenpoint l _i , the right end eigenpoint r _i and (n> = i> = 1) may be respectively identified. is there. 
The pixel blob corresponding to the rightmost unique point of a unique point group is identified. A right-adjacent pixel blob of the rightmost unique point is searched for among the discarded pixel blobs (e.g., the pixel blobs corresponding to confusing unique points). In response to identifying a right-adjacent blob, the right-adjacent blob may be set as the new right end point r_i. The search for a further right-adjacent pixel blob of the new right end point, as described in the previous step, may be repeated until no further right-adjacent blob is found. When there is no right-adjacent blob, the unique point coordinates of the blob are recorded as r_new_i. Using the new array of right end points r_new_i (1 <= i <= n), a search index k is initialized to zero (0). The search index is incremented by one, i.e., k = k + 1, and the distances between l_k and r_new_i (1 <= i <= n) may be calculated. The unique point groups {C_k} and {C_i} corresponding to a pair of points l_k and r_new_i may be merged if they satisfy at least one of the following conditions: the distance between the unique point groups is within a predetermined distance (in one example implementation, less than 0.5*T_d); and the lines corresponding to the unique point groups are close to each other (e.g., the line distance is less than T_i). If unique point groups are merged, the number of unique point groups may be reduced by one, i.e., n = n - 1. A check may be performed to determine whether the search index equals the number of unique point groups (k == n). If it does not, the search index is incremented, and the preceding steps of calculating distances and merging unique point groups that satisfy the above conditions are performed again.
FIG. 10A shows an example image before unique point classification. FIG. 10A shows the unique points of the pixel blobs on the text baselines. FIG. 10B shows an example image after classification of the unique points into groups. The figure shows an image having groups in each of the text lines. For example, the first text line shows unique point group 1002. The second text line shown in the image shows over-classified unique point groups 1004 and 1006. The over-classified groups 1004 and 1006 (two groups) can be seen in the second line of text in FIG. 10B (indicated by square and circular symbols for the corresponding unique point groups). FIG. 10C shows an example image with merged unique point groups. The over-classified groups 1004 and 1006 of the second line, as shown in FIG. 10B, are merged into one unique point group 1008 (indicated by plus marks).

ステップ２０８において、テキスト・ベースラインは、クラスタ化ステップ及びマージング・ステップの後にもたらされるグループ分けされた固有ポイントを使用して推定される。一実施例においては、テキスト・ベースラインは、適応的な重み付けされたライン推定に基づいた方法（以下で、先験的ライン推定と称される）を使用して推定される。先験的ライン推定は、ライン推定において必要とされる各固有ポイントに重み付けファクタを割り当てることができる。ｎ個の固有ポイント、すなわち、ｐ１、ｐ２、．．．ｐｎがライン推定ａｘ＋ｂｙ＋ｃ＝０（又はｙ＝ｋｘ＋ｔ）のために使用される場合のシナリオを考慮する。固有ポイントのそれぞれには、重み付けファクタｗ１、ｗ２、．．．ｗｎが割り当てられることもある。この場合には、ライン推定は、

によって規定される最小化問題の同等形態と考えられることもある。 At step 208, text baselines are estimated using the grouped unique points obtained after the clustering and merging steps. In one embodiment, the text baselines are estimated using a method based on adaptive weighted line estimation (hereinafter referred to as a priori line estimation). A priori line estimation may assign a weighting factor to each unique point used in the line estimation. Consider a scenario in which n unique points p1, p2, ..., pn are used for the line estimation ax + by + c = 0 (or y = kx + t). Each of the unique points may be assigned a weighting factor w1, w2, ..., wn. In this case, the line estimation may be considered equivalent to the minimization problem defined by equation [5].

式［５］の中の二乗和の最小値は、勾配をゼロに設定することにより、見出されることもある。モデルが二（２）つのパラメータを含むので、二（２）つの勾配方程式が存在している。上記の式の最小化は、以下の実例の擬似コード、すなわち、

を使用して実行されることもある。各固有ポイントに対する重み付けファクタは、重み付け関数、すなわち、
ｗ_ｉ＝ｅｘｐ（−ｄｉｓ_ｉ）……［６］
を使用して割り当てられることもあり、式中で、ｄｉｓ_ｉは、固有ポイントと期待されたテキスト・ベースラインとの間の距離として規定される。したがって、固有ポイントが期待されたテキスト・ベースラインにより近い場合に、固有ポイントには、より高い重み付けファクタが割り当てられることもあり、逆もまた同様である。反復的プロシージャを使用して、期待されたテキスト・ベースラインのより近くに近づくことができる。実例の一実装形態においては、反復は、所定の数のラウンド（例えば、約１０〜７０ラウンド）にわたって、又は２つの逐次的ライン角度の間の差が小さなしきい値（例えば、約０．０１度）よりも下になるまで実行されることもある。 The minimum of the sum of squares in equation [5] may be found by setting the gradient to zero. Since the model contains two (2) parameters, there are two (2) gradient equations. The minimization of the above equation may be carried out by an example pseudocode procedure. The weighting factor for each unique point may be assigned using the weighting function
w_i = exp(-dis_i) ...... [6]
where dis_i is defined as the distance between the unique point and the expected text baseline. Thus, a unique point that is closer to the expected text baseline may be assigned a higher weighting factor, and vice versa. An iterative procedure may be used to approach the expected text baseline. In one example implementation, the iteration may be performed for a predetermined number of rounds (e.g., about 10 to 70 rounds), or until the difference between two successive line angles falls below a small threshold (e.g., about 0.01 degrees).
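The iterative weighted estimation can be sketched in Python as follows. This is a minimal illustration and not the pseudocode of the disclosure: it assumes the slope-intercept form y = kx + t, minimizes the weighted sum of w_i*(y_i - k*x_i - t)^2 in closed form (the two gradient equations set to zero), and re-computes the weights with w_i = exp(-dis_i) in each round. The function names and default values are hypothetical, and the sketch assumes near-horizontal baselines so that the points do not all share the same x coordinate.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def weighted_fit(points: List[Point], weights: List[float]) -> Tuple[float, float]:
    """Closed-form minimiser of sum_i w_i * (y_i - k*x_i - t)**2."""
    sw = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, points)) / sw
    my = sum(w * y for w, (_, y) in zip(weights, points)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, points))
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, points))
    k = sxy / sxx
    return k, my - k * mx


def fit_baseline(points: List[Point], max_rounds: int = 50,
                 angle_tol_deg: float = 0.01) -> Tuple[float, float]:
    """Iteratively reweighted estimate of the baseline y = k*x + t."""
    k, t = weighted_fit(points, [1.0] * len(points))  # unweighted start
    for _ in range(max_rounds):
        norm = math.hypot(k, 1.0)
        # Weight each point by exp(-distance to the current line estimate).
        weights = [math.exp(-abs(k * x - y + t) / norm) for x, y in points]
        k_new, t_new = weighted_fit(points, weights)
        delta = abs(math.degrees(math.atan(k_new) - math.atan(k)))
        k, t = k_new, t_new
        if delta < angle_tol_deg:  # successive line angles nearly equal
            break
    return k, t
```

Because the weights decay exponentially with distance, a point far from the current estimate, such as one left over from a descender, contributes almost nothing to the next round.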

推定されたラインは、さらに、固有ポイント・グループにおいてアウトライアを除去することにより、洗練されることもある。アウトライアは、例えば、ガウス・モデルを使用することにより、識別されることもある。ガウス・モデルによれば、ほとんどの固有ポイント（例えば、約９９．７％）は、３つの標準偏差の内部に位置している可能性がある。それゆえに、固有ポイントが３つの標準偏差を超えて位置している場合、固有ポイントは、アウトライアとして考えられることもある。ポイント・グループの中の残りの固有ポイントは、次いで、従来の最小二乗法を用いてライン推定のために使用されることもある。前記先験的ライン推定は、すべての固有ポイント・グループのために実行されることもある。図１１は、ベースラインが推定される対象のテキストの実例の部分を示すものである。固有ポイント・グループは、ラインによって接続されるように示されることが分かる可能性がある。実例のラインは、１１０２の内部で、強調表示される。 The estimated lines may be further refined by removing outliers at unique point groups. Outliers may be identified, for example, by using a Gaussian model. According to the Gaussian model, most of the eigenpoints (e.g., about 99.7%) may be located within the three standard deviations. Therefore, if the eigenpoints are located beyond three standard deviations, the eigenpoints may be considered as outliers. The remaining unique points in the point group may then be used for line estimation using conventional least squares. The a priori line estimation may be performed for all eigenpoint groups. FIG. 11 shows a portion of an example of text for which a baseline is to be estimated. It may be seen that unique point groups are shown as connected by lines. Example lines are highlighted within 1102.
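The refinement step can be sketched as follows, assuming that the three-standard-deviation rule is applied to the vertical residuals of the unique points with respect to the previously estimated line; the text does not specify which quantity the Gaussian model is applied to, so this is an assumption. The function names `least_squares` and `refine_baseline` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Point = Tuple[float, float]


def least_squares(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = k*x + t."""
    mx = mean([x for x, _ in points])
    my = mean([y for _, y in points])
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    k = sxy / sxx
    return k, my - k * mx


def refine_baseline(points: List[Point], k: float, t: float
                    ) -> Tuple[float, float, List[Point]]:
    """Drop points beyond three standard deviations, then refit the line."""
    residuals = [y - (k * x + t) for x, y in points]
    mu, sigma = mean(residuals), pstdev(residuals)
    inliers = [p for p, r in zip(points, residuals)
               if sigma == 0.0 or abs(r - mu) <= 3.0 * sigma]
    k_new, t_new = least_squares(inliers)
    return k_new, t_new, inliers
```

Note that with a population standard deviation, a single point can only exceed three standard deviations when the group holds more than ten points, which sits well with a minimum group size T_m of about ten.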

ステップ２１０において、水平消失ポイントは、推定されたテキスト・ベースラインを使用して識別されることもある。同次座標理論によれば、デカルト座標系における各水平ラインは、一様な空間の中のデータ・ポイントとして見なされることもあり、これらのデータ・ポイントを通過するラインは、消失ポイントに対応している。それゆえに、水平消失ポイント識別は、同次座標系におけるライン・フィッティング問題として見なされる可能性がある。 At step 210, the horizontal vanishing point may be identified using the estimated text baselines. According to homogeneous coordinate theory, each horizontal line in the Cartesian coordinate system may be regarded as a data point in homogeneous space, and the line passing through these data points corresponds to the vanishing point. Horizontal vanishing point identification may therefore be regarded as a line-fitting problem in the homogeneous coordinate system.
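One way to read this line-fitting formulation is sketched below. It assumes each baseline is written as y = kx + t and treated as the data point (k, t); this slope-intercept parameterization is an assumption and is only one possible choice of coordinates. If all baselines pass through a common point (v_x, v_y), then t = v_y - v_x*k, so the data points lie on a line whose slope and intercept give the vanishing point. The function name `vanishing_point` is hypothetical.

```python
from typing import List, Tuple

Line = Tuple[float, float]  # (k, t) for the baseline y = k*x + t


def vanishing_point(lines: List[Line]) -> Tuple[float, float]:
    """Least-squares fit of t = alpha*k + beta; returns (v_x, v_y) = (-alpha, beta)."""
    n = len(lines)
    if n < 2:
        raise ValueError("need at least two baselines")
    mk = sum(k for k, _ in lines) / n
    mt = sum(t for _, t in lines) / n
    skk = sum((k - mk) ** 2 for k, _ in lines)
    skt = sum((k - mk) * (t - mt) for k, t in lines)
    if skk < 1e-18:
        # Parallel baselines: the vanishing point lies at infinity.
        raise ValueError("baselines are parallel")
    alpha = skt / skk
    beta = mt - alpha * mk
    return -alpha, beta
```

The parallel case, in which the baselines share one slope, corresponds to an image with no horizontal projection distortion and has to be handled separately.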

推定されたテキスト・ベースラインは、注意深く推定されるが、いくつかのテキスト・ベースラインは、消失ポイント推定の観点からすれば、アウトライアに寄与することができる。そのようなアウトライア・データ・ポイントは、除去されて、水平消失ポイントの推定を改善することができる。アウトライアは、不正確なライン推定と、非テキスト成分（例えば、テキストとピクチャとの分離が失敗する場合における）と、ひずみなどとに起因して、取得されることもある。この問題を克服するために、一実施例に従って、ＭａｒｔｉｎＡ．Ｆｉｓｃｈｅｒ及びＲｏｂｅｒｔＣ．Ｂｏｌｌｅｓ、「ＲａｎｄｏｍＳａｍｐｌｅＣｏｎｓｅｎｓｕｓ：ＡＰａｒａｄｉｇｍｆｏｒＭｏｄｅｌＦｉｔｔｉｎｇｗｉｔｈＡｐｐｌｉｃａｔｉｏｎｓｔｏＩｍａｇｅＡｎａｌｙｓｉｓａｎｄＡｕｔｏｍａｔｅｄＣａｒｔｏｇｒａｐｈｙ」、Ｃｏｍｍ．ｏｆｔｈｅＡＣＭ２４（６）：３８１〜３９５頁、１９８１年６月において説明されるような従来のランダム・サンプル・コンセンサス（ＲＡＮＳＡＣ：ＲａｎｄｏｍＳａｍｐｌｅＣｏｎｓｅｎｓｕｓ）アルゴリズムに基づいた方法が、水平消失ポイント識別のために使用される。ＲＡＮＳＡＣ−ベースのアルゴリズムは、モデル・パラメータを推定するときに、アウトライアを除去する際に、その堅牢性に起因して選択される。提案されたＲＡＮＳＡＣ−ベースのアルゴリズムは、初期のデータ・ポイントが、モデル・パラメータ推定のために選択され、その信頼度レベルが、一緒に取られ得るやり方で、従来のＲＡＮＳＡＣアルゴリズムとは、異なる。従来のＲＡＮＳＡＣアルゴリズムにおける初期データ・ポイントのランダムな選択とは違って、提案されたＲＡＮＳＡＣ−ベースのアルゴリズムは、最大の信頼度を有する初期サンプルを選択する。 Although the text baselines are estimated carefully, some of them may still be outliers from the point of view of vanishing point estimation. Such outlier data points can be removed to improve the estimate of the horizontal vanishing point. Outliers may arise from inaccurate line estimation, non-text components (e.g., when the separation of text and pictures fails), distortion, and so on. To overcome this problem, according to one embodiment, a method based on the conventional Random Sample Consensus (RANSAC) algorithm, as described in Martin A. Fischer and Robert C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Comm. of the ACM 24(6): 381-395, June 1981, is used for horizontal vanishing point identification. The RANSAC-based algorithm is chosen for its robustness in removing outliers when estimating model parameters. The proposed RANSAC-based algorithm differs from the conventional RANSAC algorithm in the way the initial data points are selected for model parameter estimation, in that their confidence levels are taken into account.
Unlike the random selection of initial data points in the conventional RANSAC algorithm, the proposed RANSAC-based algorithm selects the initial sample with the highest degree of confidence.

提案されたＲＡＮＳＡＣ−ベースのアルゴリズムの実例の一実装形態が、次に、以下で説明される。 An example implementation of the proposed RANSAC-based algorithm is next described below.

一実施例においては、推定されたテキスト・ベースラインのそれぞれが、デカルト座標系において、規定されることもある。デカルト座標系において規定されるテキスト・ベースラインのそれぞれは、同次座標系においてデータ・ポイントに変換されることもある。 In one embodiment, each of the estimated text baselines may be defined in a Cartesian coordinate system. Each of the text baselines defined in the Cartesian coordinate system may be converted to data points in the homogeneous coordinate system.

データ・ポイントのそれぞれについての信頼度レベルが、割り当てられることもある。データ・ポイントについての信頼度レベルは、結果として生ずるテキスト・ベースラインに対するテキスト・ベースラインを推定するために使用される固有ポイントの近接性と、それぞれのテキスト・ベースラインの長さとに基づいて決定されることもある。各水平テキスト・ベースラインについての信頼度レベルは、

として規定されることもあり、式中で、ｓ_ｍａｘと、ｓ_ｍｉｎとは、すべてのｎ個のライン・セグメントの最大標準偏差と、最小標準偏差とを表しており、ｌ_ｍａｘは、すべてのｎ本のラインのうちの最長のライン・セグメントを表している。それゆえに、より長い水平テキスト・ベースラインには、より高い信頼度レベルが割り当てられる。これは、水平テキスト・ベースラインが長くなれば長くなるほど、水平テキスト・ベースラインの推定はよりよくなるという仮定に基づいている。同様に、標準偏差（対応する推定されたテキスト・ベースラインに対する固有ポイントの近接性を示す）が低くなれば低くなるほど、テキスト・ベースライン推定はよりよくなる。その結果として、そのようなテキスト・ベースラインには、より高い信頼度レベルが割り当てられる。所定のしきい値よりも上の信頼度レベルを有する、サンプル・ポイントの中のデータ・ポイントは、優先順位サンプル・アレイへとグループ分けされることもある。優先順位サンプル・アレイの中のデータ・ポイントは、いくつかのサンプル・グループへとクラスタ化されることもある。一実施例においては、各サンプル・グループは、２つ以上のデータ・ポイントを含むことができる。ライン推定では、精度はまた、ラインを推定するために使用されるデータ・ポイントの距離によって決定されることもある。２つのデータ・ポイントが互いに遠く離れている場合、そのときにはライン推定が正確になることになる、より高い信頼度が存在している。それゆえに、第２の信頼度レベル・インジケータが、サンプル・グループの中のポイント対に割り当てられることもあり、すなわち、

であり、式中で、Ｄｉｓ_ｊ，ｋは、垂直方向におけるラインｊとラインｋとの間の距離であり、Ｄｉｓ_ｍａｘは、ラインのｍ^＊（ｍ−１）対のうちの最大の距離である。ｍ（ｍ＜＜ｎ）本のラインの選択が、最良の信頼度レベルを有する第１のｍ本のラインを選択する優先順位サンプル・グループを定式化するために考慮されることもある。各サンプル・グループには、サンプル・グループの中の各データ・ポイントに割り当てられる、少なくとも信頼度レベルに基づいて、グループ信頼度値が割り当てられることもある。 A confidence level may be assigned to each of the data points. The confidence level of a data point may be determined based on the proximity of the unique points used to estimate the text baseline to the resulting text baseline, and on the length of the respective text baseline. The confidence level for each horizontal text baseline may be defined as

where s_max and s_min denote the maximum and minimum standard deviations over all n line segments, and l_max denotes the longest line segment among all n lines. Longer horizontal text baselines are therefore assigned higher confidence levels. This is based on the assumption that the longer a horizontal text baseline is, the better its estimate is. Similarly, the lower the standard deviation (which indicates how close the unique points are to the corresponding estimated text baseline), the better the text baseline estimate. Consequently, such text baselines are assigned higher confidence levels. Data points among the sample points having a confidence level above a predetermined threshold may be grouped into a priority sample array. The data points in the priority sample array may be clustered into several sample groups. In one embodiment, each sample group may include two or more data points. In line estimation, the accuracy may also be determined by the distance between the data points used to estimate the line. If two data points are far apart from each other, there is higher confidence that the line estimate will be accurate. Hence, a second confidence level indicator may be assigned to the point pairs in a sample group, i.e.,

where Dis_{j,k} is the distance between line j and line k in the vertical direction, and Dis_max is the maximum distance among the m*(m-1) pairs of lines. The selection of m (m << n) lines may be considered for formulating the priority sample groups, by selecting the first m lines having the best confidence levels. Each sample group may be assigned a group confidence value based at least on the confidence level assigned to each data point in the sample group.

ステップＡにおいて、データ・ポイントのサンプル・グループは、ライン・フィッティングのために、優先順位サンプル・アレイから反復的に選択されることもある。反復は、優先順位サンプル・アレイの中の最高の信頼度値を有するサンプル・グループから開始されることもある。（反復回数が、ある種のしきい値を超過する場合、そのときにはそれは停止される可能性があり、アルゴリズムは、ステップＦへと移行する）。ステップＢにおいては、ライン・フィッティングは、第１の適合されたラインを結果としてもたらす第１のサンプル・グループのために実行されることもあり、さらなる適合されたラインを結果としてもたらすそれぞれのさらなるサンプル・グループのためにライン・フィッティングをその後に実行している。 At step A, sample groups of data points may be iteratively selected from the prioritized sample array for line fitting. The iteration may be started from the sample group with the highest confidence value in the priority sample array. (If the number of iterations exceeds a certain threshold, then it may be stopped and the algorithm proceeds to step F). In step B, line fitting may be performed for the first sample group resulting in the first fitted line, and each further sample resulting in the further fitted line • Line fitting is then performed for the group.

ステップＣにおいて、第１の適合されたラインからの所定の距離しきい値よりも下に位置づけられるデータ・ポイントの組が、第１の適合されたラインと、さらなる適合されたラインとに基づいて、決定されることもある。これらのデータ・ポイントは、インライアと称される。第１の適合されたラインからの所定の距離しきい値は、固定されたパラメータとすることができ、又はドキュメント画像のコンテンツに基づいて、適応的に設定されることもある。ステップＤにおいて、第１の適合されたラインからの所定の距離しきい値よりも下に位置づけられるデータ・ポイントのカウントが算出される。決定される最大のインライア数が記録される。ステップＥにおいては、チェックが実行されて、最大インライア数がデータ・ポイントの数に等しいかどうかを決定することができる。最大インライア数がデータ・ポイントの数に等しくない場合、反復回数が再計算され、ステップＡが再び開始されることもある。最大インライア数がデータ・ポイントの数に等しい場合、ステップＦが開始されることもある。 In step C, the set of data points located within a predetermined distance threshold of the first fitted line may be determined based on the first fitted line and the further fitted lines. These data points are called inliers. The predetermined distance threshold from the first fitted line may be a fixed parameter, or may be set adaptively based on the content of the document image. In step D, the number of data points located within the predetermined distance threshold of the first fitted line is counted, and the maximum number of inliers found so far is recorded. In step E, a check may be performed to determine whether the maximum number of inliers equals the number of data points. If it does not, the number of iterations is recalculated and step A may be started again. If it does, step F may be started.

ステップＦにおいて、最大インライアを使用して消失ポイントを推定することができる。一実施例においては、第１及び第２の水平消失ポイント候補が、最小二乗法、重み付けされた最小二乗法、及び／又は適応最小二乗法とから成る群から選択される異なる近似方法を使用して推定されることもある。他の近似方法の使用もまた、本明細書において企図される。ステップＧにおいては、投影補正の後の、画像ドキュメントの水平テキスト方向に最も近い水平消失ポイント候補が選択されることもある。水平テキスト方向の近さは、

によって測定されることもあり、式中で、ｎは、ドキュメント画像の中の水平ラインの数であり、α_ｉは、投影補正が実行された後の水平方向に関するｉ番目のライン角度の角度として規定され（１８０°≧α_ｉ≧０°）、ｐは、ｍ個の候補消失ポイントから選択されるｐ番目の候補水平消失ポイントのインデックスである。 In step F, vanishing points can be estimated using the largest inliers. In one embodiment, the first and second candidate horizontal vanishing points use different approximation methods selected from the group consisting of least squares, weighted least squares, and / or adaptive least squares. May be estimated. The use of other approximation methods is also contemplated herein. In step G, a candidate horizontal vanishing point closest to the horizontal text direction of the image document may be selected after projection correction. The proximity of the horizontal text direction is

where n is the number of horizontal lines in the document image, α_i is the angle of the i-th line with respect to the horizontal direction after projection correction has been performed (180° ≥ α_i ≥ 0°), and p is the index of the p-th candidate horizontal vanishing point selected from the m candidate vanishing points.

従来のＲＡＮＳＡＣアルゴリズムは、初期ライン推定のために、ランダムに選択されたポイントを使用する。その結果として、従来のＲＡＮＳＡＣアルゴリズムが実行されるたびごとに、異なる結果が存在している可能性がある。さらに、従来のＲＡＮＳＡＣアルゴリズムの結果を判断することは、難しい可能性がある。提案されたＲＡＮＳＡＣ−ベースのアルゴリズムは、ポイントについての何らかの先験的知識を組み込むことにより、この問題に対処している。提案されたＲＡＮＳＡＣ−ベースのアルゴリズムにおいては、よい信頼度レベルを有するポイントが最初に選択されて、インライアを推定する。その結果として、提案されたＲＡＮＳＡＣ−ベースのアルゴリズムは、より整合した結果を提供する。 The conventional RANSAC algorithm uses randomly selected points for initial line estimation. As a result, different results may exist each time the conventional RANSAC algorithm is performed. Furthermore, determining the result of the conventional RANSAC algorithm can be difficult. The proposed RANSAC-based algorithm addresses this problem by incorporating some a priori knowledge of the points. In the proposed RANSAC-based algorithm, points with good confidence levels are selected first to estimate the inliers. As a result, the proposed RANSAC-based algorithm provides more consistent results.
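The difference from random sampling can be illustrated with the following Python sketch of steps A to E. It is a simplified reading under several assumptions: each data point is a pair (k, t), sample groups are pairs drawn from the m most confident points, the group confidence is the sum of the two point confidences, and the inlier test uses the vertical residual in t. None of these choices is prescribed by the text, and the function name `priority_ransac` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

DataPoint = Tuple[float, float]  # (k, t) of one estimated text baseline


def priority_ransac(points: Sequence[DataPoint], conf: Sequence[float],
                    m: int = 4, resid_thresh: float = 5.0,
                    max_iter: int = 20) -> List[DataPoint]:
    """Return the largest inlier set found, trying confident samples first."""
    # Priority sample array: indices of the m most confident data points.
    top = sorted(range(len(points)), key=lambda i: conf[i], reverse=True)[:m]
    # Sample groups (pairs), ordered by descending group confidence.
    groups = sorted(combinations(top, 2),
                    key=lambda g: conf[g[0]] + conf[g[1]], reverse=True)
    best: List[DataPoint] = []
    for i, j in groups[:max_iter]:                     # step A
        (k1, t1), (k2, t2) = points[i], points[j]
        if k1 == k2:
            continue                                   # no unique line
        alpha = (t2 - t1) / (k2 - k1)                  # step B: fit the line
        beta = t1 - alpha * k1
        inliers = [p for p in points                   # step C
                   if abs(p[1] - (alpha * p[0] + beta)) < resid_thresh]
        if len(inliers) > len(best):                   # step D
            best = inliers
        if len(best) == len(points):                   # step E
            break
    return best
```

Because the sampling order is fixed by the confidence values, repeated runs on the same input return the same inlier set, which is the consistency property described above. The returned inliers would then feed the final vanishing point estimate of step F.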

本開示は、水平消失ポイント決定のために固有ポイントを使用することを説明しているが、ピクセル・ブロブの他の位置決定ピクセルもまた、水平消失ポイント決定のために使用され得ることを理解すべきである。 Although this disclosure describes using unique points for horizontal vanishing point determination, it is understood that other positioning pixels of a pixel blob may also be used for horizontal vanishing point determination. It should.

図３は、一実施例による、実例の固有ポイント・クラスタ化アルゴリズム３００を説明するものである。ステップ３０２において、固有ポイントの組「Ｉ」が識別されることもある。ステップ３０４において、固有ポイントをカウントして、その数が固有ポイント・グループを生成するために十分であるかどうかを決定することができる。その数が十分よりも上（少なくともしきい値数（Ｔ_Ｍ）よりも上）にある場合、固有ポイントの組「Ｉ」が処理されることもある。しきい値数は、固有ポイント・グループの生成のための制約条件として設定されることもある。固有ポイントの数がしきい値よりも少ない場合、そのときにはステップ３２４が実行されることもある。実例の一実装形態においては、固有ポイントのしきい値数は、１０とすることができ、単一ラインの中に、少なくとも２つの、又は３つの単語の存在を示唆している。しきい値は、固有ポイント・グループに対して関連のない固有ポイントを割り当てる可能性を防止するように設定されることもある。 FIG. 3 describes an example unique point clustering algorithm 300 according to one embodiment. At step 302, a set of unique points "I" may be identified. At step 304, unique points can be counted to determine if the number is sufficient to generate unique point groups. If the number is above sufficient (at least above the threshold number ( _TM )), then the unique point set "I" may be processed. The threshold number may be set as a constraint for the generation of unique point groups. If the number of unique points is less than the threshold, then step 324 may be performed. In one example implementation, the threshold number of unique points may be ten, suggesting the presence of at least two or three words in a single line. The threshold may be set to prevent the possibility of assigning unrelated unique points to the unique point group.

ステップ３０６においては、固有ポイント（例えば、ｐ_０）が、固有ポイントの組Ｉからランダムに選択される。固有ポイントｐ_０は、候補ライン・グループ「Ｃ」の中の第１の固有ポイントとして入力されることもある。一実施例においては、候補ライン・グループＣは、双方向待ち行列とすることができる。さらに、固有ポイントｐ_０が、固有ポイントの組Ｉから取り除かれる。ｐ_０の一方の側からの固有ポイントは、候補ライン・グループＣへと入力される。 At step 306, a unique point (e.g., p_0) is randomly selected from the set I of unique points. The unique point p_0 may be entered as the first unique point in a candidate line group "C". In one embodiment, the candidate line group C may be a bi-directional queue. Furthermore, the unique point p_0 is removed from the set I of unique points. Unique points from one side of p_0 are entered into the candidate line group C.

ステップ３０８において、候補固有ポイント・グループＣからの新しく加わった固有ポイントｐ_ｉは、双方向待ち行列（例えば、非負方向ｉ＞＝０の待ち行列）の一方の側から選択される。固有ポイントｐ_ｉに最も近い固有ポイントの組Ｉからの固有ポイントｐ^＊が識別される。 In step 308, the newly added unique point p_i of the candidate unique point group C is selected from one side of the bi-directional queue (e.g., the non-negative direction of the queue, i >= 0). The unique point p* in the set I of unique points that is closest to the unique point p_i is identified.

At step 310, the distance between the unique points p_i and p* is calculated. If the distance is below a threshold distance (T_d), step 312 is performed. If the distance is above the threshold distance (T_d), step 314 is performed. The threshold distance may denote the maximum distance between unique points within a group. In one example implementation, the threshold distance between the unique points of a group is below a first distance threshold, which may be three times the median distance between nearest neighboring unique points.

At step 312, it is determined whether the selected unique point p* satisfies the constraints imposed by a point-to-line distance threshold (T_i) and a proximity-to-horizontal threshold (T_a). The point-to-line distance threshold (T_i) may define the maximum distance of a point from the text baseline for the unique point to be selected for the unique point group; it is used to select unique points that contribute to forming a straight line. The proximity-to-horizontal threshold (T_a) may define the maximum angle, relative to the horizontal direction, of the line through the unique point for the unique point to be selected for the unique point group; it is used to select unique points that contribute to forming a line whose direction is close to horizontal. In one example implementation, T_a may be twenty (20) degrees. In response to determining that the selected unique point p* satisfies the constraints, p* may be selected for the candidate line group C as point p_(i+1) of the bidirectional queue (in the non-negative direction), with i = i + 1 in the meantime. In response to determining that the selected unique point p* does not satisfy the constraints, p* may be placed in a special line group "L".

Process steps 308 to 312 are performed until all unique points on one side (the non-negative direction of the bidirectional queue) have been evaluated. Once the evaluation of one side is complete, the remaining unique points on the other side of p_0 are considered (the non-positive direction of the bidirectional queue). The remaining unique points from the other side of p_0 are entered into the candidate line group C.

At step 314, a unique point p_j of the candidate line group C (non-positive direction of the bidirectional queue, j <= 0) is selected from the other side. The unique point p* in the set I that is closest to p_j is identified. At step 316, the distance between the unique points p_j and p* is calculated. If the distance is below T_d, step 318 is performed. If the distance is above T_d, step 320 is performed.

At step 318, it is determined whether the selected unique point p* satisfies the constraints relating to T_i and T_a. In response to determining that the constraints are satisfied, p* may be selected for the candidate line group C as point p_(j-1) of the bidirectional queue (in the non-positive direction), with j = j - 1 in the meantime. In response to determining that the constraints are not satisfied, the unique point may be placed in the special line group "L".

Process steps 316 to 318 are performed until all unique points on the other side have been evaluated.

At step 320, the unique points in the candidate line group C may be counted to determine whether their number is above the threshold number T_m. If the number is above T_m, step 322 is performed. If the number is below T_m, the process returns to step 304 to determine whether any other unique points remain to be processed. At step 322, the candidate line group C is assigned an index number, so that the candidate line group C becomes the unique point array for the line indexed by that number.

At step 324, for each unique point in the special line group L, it is checked whether the unique point is within the constraints T_m, T_i, and T_a for any of the line groups. In response to determining that the unique point is within the constraints T_m, T_i, and T_a, the unique point is merged into the corresponding line group.

The process is repeated for every text baseline until all lines in the document image have been processed.

One advantage of the unique point clustering algorithm described herein is that it gives consistent clustering results regardless of the initial point chosen for clustering. The use of a bidirectional queue allows both end points of a line to be used, rather than a single end point in one direction, which reduces the algorithm's dependence on the seeding point from which a point group is formed. The unique point clustering algorithm is flexible in the sense that it does not require every unique point to belong to one of the point groups. Unique points that are not included in any of the groups are discarded or ignored. This gives the proposed unique point clustering algorithm simpler and faster convergence than conventional clustering algorithms. Nevertheless, the use of conventional or any other clustering algorithms to cluster unique points into different line groups is also contemplated herein.
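The bidirectional growth of steps 306 to 318 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `grow_line_group` is hypothetical, "one side" is assumed to mean a larger or smaller x-coordinate than the current end of the queue, and only the T_d and T_a checks are applied (the point-to-line threshold T_i and the T_m count check are left out for brevity).

```python
import math
import random
from collections import deque


def grow_line_group(points, t_d, t_a_deg=20.0, seed=None):
    """Grow one candidate line group C from a random seed point p_0.

    points  : list of (x, y) unique points (the set I)
    t_d     : distance threshold T_d
    t_a_deg : proximity-to-horizontal threshold T_a, in degrees
    Returns (group C, special group L, points left in I).
    """
    rng = random.Random(seed)
    remaining = list(points)                      # the set I
    p0 = remaining.pop(rng.randrange(len(remaining)))
    group = deque([p0])                           # bidirectional queue C
    special = []                                  # special line group L

    for side in (+1, -1):                         # non-negative side first
        while True:
            tip = group[-1] if side > 0 else group[0]
            # Assumption: candidates lie strictly on the current side in x.
            candidates = [p for p in remaining if (p[0] - tip[0]) * side > 0]
            if not candidates:
                break
            nearest = min(candidates, key=lambda p: math.dist(p, tip))
            if math.dist(nearest, tip) > t_d:     # steps 310 / 316
                break
            remaining.remove(nearest)
            dx = nearest[0] - tip[0]
            dy = nearest[1] - tip[1]
            angle = math.degrees(math.atan2(abs(dy), abs(dx)))
            if angle <= t_a_deg:                  # steps 312 / 318 (T_a only)
                if side > 0:
                    group.append(nearest)
                else:
                    group.appendleft(nearest)
            else:
                special.append(nearest)
    return list(group), special, remaining
```

Because the group is grown from both ends of the queue, the same ten points are collected for a clean text line whichever of them happens to be drawn as the seed, which is the seed-independence property described above.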

FIG. 4 illustrates an example process flow 400 for identifying a vertical vanishing point using margin feature points, according to one embodiment. At step 402, margin feature points may be identified. A margin feature point may be a position determining pixel according to one embodiment. Margin feature points may be identified as described below. In one embodiment, the margin feature point for the left margin may be the lower-left end pixel of a pixel blob, and the margin feature point for the right margin may be the lower-right end pixel of a pixel blob. The lower-left end point may be identified by finding the blob associated with the leftmost unique point in a unique point group (e.g., a group identified during horizontal line estimation). The unique point groups determined after the unique point merging step, and before the groups are used for horizontal line formation, may be used for margin point determination. The groups from after merging are used because the left or right unique points may correspond to merged blobs. In addition, the unique points may not yet have been removed just before line formation. The leftmost unique point may be found by comparing the x-coordinates of the unique points in the group, and its corresponding blob may then be found. The lower-left end point of that blob may be used as a left margin feature point. Similarly, the lower-right end point may be identified by finding the blob associated with the rightmost unique point in the unique point group. After the blob at the right end of the unique point group has been identified, it may be determined whether an adjacent blob exists near the identified right-end blob. A blob search is then performed using a process similar to the one used in the adjacent blob search algorithm of the unique point merging procedure. The lower-right end point of the blob found is then used as a feature point for right margin line estimation. In alternative embodiments, other variations of margin feature points may be used. FIG. 12 shows an example image in which margin feature points are identified at the margins. The margin feature points are marked by dots at the margins, as shown inside 1202. Paragraph margins are usually vertical and parallel when no projection distortion is present.

At step 404, the margin feature points are clustered into different margin groups. Margin feature points along the margin lines of the document in the image may be used to estimate the margins. In one embodiment, the margin feature points may be clustered based on the proximity of the pixel blobs in the corresponding margin. In one example embodiment, a clustering algorithm similar to the unique point clustering algorithm described in connection with FIG. 3 may be used to cluster the margin feature points. In an alternative embodiment, a different point clustering algorithm may be used, such as the one described below.
Step 1: Set the margin point feature distance threshold TEnd_th, and denote all left margin points identified (at step 402) as {P_i}.
Step 2: Initialize the left margin point group {C_1} with one point selected randomly from {P_i}, remove this point from {P_i}, and set group_index = 1.
Step 3: For each point in {P_i}, calculate the minimum distance between this point and the points in {C_i} (group_index >= i >= 1). If the distance is lower than TEnd_th, the point is assigned to the point group that attains the minimum distance; otherwise, the group index is increased by 1, i.e., group_index = group_index + 1, and the point is assigned to the newest left margin point group {C_group_index}.
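Steps 1 to 3 above can be sketched in a few lines of Python. This is an illustrative reading only: the function name `cluster_margin_points` and the use of the Euclidean distance are assumptions, since the text does not name a distance metric.

```python
import math
import random


def cluster_margin_points(points, t_end_th, seed=None):
    """Cluster margin feature points into margin point groups.

    points   : list of (x, y) margin feature points ({P_i})
    t_end_th : margin point feature distance threshold TEnd_th
    Returns a list of groups ({C_1}, {C_2}, ...), each a list of points.
    """
    rng = random.Random(seed)
    pts = list(points)
    # Step 2: seed the first group with one randomly chosen point.
    first = pts.pop(rng.randrange(len(pts)))
    groups = [[first]]
    # Step 3: assign every other point to its nearest group, or open a new one.
    for p in pts:
        dists = [min(math.dist(p, q) for q in g) for g in groups]
        best = min(range(len(groups)), key=dists.__getitem__)
        if dists[best] < t_end_th:
            groups[best].append(p)
        else:
            groups.append([p])     # group_index = group_index + 1
    return groups
```

Unlike the unique point clustering of FIG. 3, every input point ends up in some group here. The result can depend on the order in which points are visited when clusters are loosely connected.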

TEnd_th may be set equal to 6 * T_d (where T_d is the median distance between unique points, as discussed above in connection with FIG. 2); this value may be chosen so that it suffices for finding neighboring margin feature points that are expected to lie on the same margin line. The left end point clustering method may differ from the unique point clustering method used for horizontal line estimation: the left end point clustering algorithm may use all margin points, whereas in the unique point clustering algorithm some unique points may be discarded during the clustering process.

In alternative embodiments, other clustering algorithms may also be used. The clustered position determining pixels identified at the margins may be processed into different margin point groups. For example, if the document image contains two columns, position determining pixels for the left and right margins of both columns are identified and grouped accordingly. At step 406, over-classified margin lines may be merged with the corresponding margin lines. For example, two or more lines along the same margin may be merged into a single margin.

At step 408, vertical line estimation may be performed using the margin point groups. As with the unique point clustering algorithm, not every margin point group is necessarily used for vertical line estimation. To be suitable for margin line estimation, the margin feature pixels of a group may need to satisfy one or more of the following conditions: a minimum number of points in the margin line, P_th (e.g., the threshold for P_th may be three unique points); a minimum percentage of points on the margin line, P_l (e.g., about 50%); a maximum angle of the line relative to the vertical direction, α_v (e.g., about 20°); and a minimum non-boundary point confidence level, P_b (e.g., about 50%).

A margin feature point (which contributes to P_th) may be regarded as lying on the margin line if the distance between the pixel decision point and the margin line is within a threshold (T_l), which in one example implementation is equal to the median unique point distance (T_d). The percentage of points on the margin line, P_l, may be defined as the ratio of the number of unique points on the margin line in the clustered unique point group to the number of margin feature points. In some embodiments, some pixel decision points may fall out of range. For example, when the document content is only partially captured, the border of the image may contain content that is cut off. Pixel decision points associated with such blobs at the border may be defined as boundary points. Boundary points may not be used in margin line estimation, and the percentage of non-boundary points may be defined as the ratio of the number of non-boundary points in the clustered margin feature point group to the number of margin feature points. The minimum non-boundary point confidence level P_b may be defined as the product of the percentage of points on the margin line and the percentage of non-boundary points.
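The suitability conditions for a margin point group can be written as a small predicate. The sketch below is illustrative: the function name `margin_group_is_usable` and its argument names are hypothetical, and it requires all four conditions at once, whereas the text only requires "one or more" of them.

```python
def margin_group_is_usable(n_points, n_on_line, n_non_boundary,
                           angle_from_vertical_deg,
                           p_th=3, p_l_min=0.5, alpha_v_max=20.0, p_b_min=0.5):
    """Check whether a margin point group is suitable for margin line estimation.

    n_points       : number of margin feature points in the group
    n_on_line      : points within T_l of the estimated margin line
    n_non_boundary : points not associated with blobs cut off at the image border
    """
    p_l = n_on_line / n_points                 # percentage of points on the line
    non_boundary = n_non_boundary / n_points   # percentage of non-boundary points
    p_b = p_l * non_boundary                   # non-boundary point confidence level
    return (n_on_line >= p_th
            and p_l >= p_l_min
            and abs(angle_from_vertical_deg) <= alpha_v_max
            and p_b >= p_b_min)
```

For example, a group of 10 points with 8 on the line and 9 non-boundary points gives P_l = 0.8 and P_b = 0.8 * 0.9 = 0.72, which passes the example thresholds if the line is within 20° of vertical.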

In one embodiment, vertical line estimation may be performed using a vertical offset least squares method, although alternative methods are also contemplated herein. Assume that a candidate nearly vertical line is expressed as y = kx + t. With the vertical offset least squares method, the optimal line coefficients correspond to the minimizer of the following objective function:

[equation missing from this text]
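The objective function itself does not survive in this text. As a hedged reconstruction only, and assuming that "vertical offset" here denotes the perpendicular (orthogonal) distance from each sample point (x_i, y_i), i = 1, ..., N, to the line y = kx + t, the standard form of such an objective is:

```latex
(k^{*}, t^{*}) \;=\; \arg\min_{k,\,t} \; \sum_{i=1}^{N} \frac{\left(y_i - k\,x_i - t\right)^{2}}{1 + k^{2}}
```

The symbols N, x_i, y_i, k^{*}, and t^{*} are introduced for illustration. If ordinary vertical residuals were intended instead, the denominator 1 + k^2 would be dropped.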

Based on the vertical offset least squares method, an iterative robust method for estimating nearly vertical lines, as described below, may be used according to one embodiment.

In step 1, the line is initialized using the vertical offset line estimation method. In step 2, the distances of the sample points from the line are calculated. In step 3, the line function is recalculated using the weighted vertical offset method. In step 4, the angular difference between successive estimated lines may be calculated. If the angular difference is below a predetermined threshold, or the iteration count exceeds the maximum allowable number of iterations, the method proceeds to step 5. If the angular difference is above the predetermined threshold and the iteration count is within the maximum allowable number of iterations, the next iteration is performed by returning to step 2. In step 5, the final line function is calculated. According to one embodiment, the predetermined threshold and the maximum allowable number of iterations have the same values as the respective parameters of the horizontal line estimation method. Alternatively, values different from those used for horizontal line estimation may be used for the predetermined threshold and the maximum allowable number of iterations in vertical line estimation. The weighted vertical offset method may be implemented using the following example pseudocode:

[pseudocode missing from this text]
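The original pseudocode is not available here, so the following Python sketch only illustrates the five-step loop under stated assumptions. The line is fitted by minimizing weighted perpendicular distances (via the principal axis of the weighted scatter), and the weight function 1 / (1 + (d / scale)^2) is a hypothetical choice; the names `fit_line`, `robust_line_fit`, and `scale` are not from the source.

```python
import math


def fit_line(points, weights):
    """Weighted perpendicular-offset fit. Returns (centroid, direction angle)."""
    w_sum = sum(weights)
    cx = sum(w * x for w, (x, _) in zip(weights, points)) / w_sum
    cy = sum(w * y for w, (_, y) in zip(weights, points)) / w_sum
    sxx = sum(w * (x - cx) ** 2 for w, (x, _) in zip(weights, points))
    syy = sum(w * (y - cy) ** 2 for w, (_, y) in zip(weights, points))
    sxy = sum(w * (x - cx) * (y - cy) for w, (x, y) in zip(weights, points))
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)   # principal axis direction
    return (cx, cy), theta


def distance_to_line(p, centroid, theta):
    """Perpendicular distance from p to the line through centroid at angle theta."""
    dx, dy = p[0] - centroid[0], p[1] - centroid[1]
    return abs(-math.sin(theta) * dx + math.cos(theta) * dy)


def robust_line_fit(points, angle_tol_deg=0.1, max_iter=20, scale=1.0):
    """Iteratively reweighted line fit following steps 1 to 5."""
    centroid, theta = fit_line(points, [1.0] * len(points))        # step 1
    for _ in range(max_iter):
        dists = [distance_to_line(p, centroid, theta) for p in points]  # step 2
        weights = [1.0 / (1.0 + (d / scale) ** 2) for d in dists]  # assumed weights
        centroid, new_theta = fit_line(points, weights)            # step 3
        diff = abs(new_theta - theta)                              # step 4
        diff = min(diff, math.pi - diff)     # line directions are modulo pi
        theta = new_theta
        if math.degrees(diff) < angle_tol_deg:
            break
    return centroid, theta                                         # step 5
```

Representing the line by a centroid and a direction angle avoids the infinite slope that y = kx + t would need for an exactly vertical margin; converting back to (k, t) is possible whenever the line is not exactly vertical.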

In another embodiment, vertical line estimation may be performed using an x-y exchangeable weighted least squares method. In this method, the x and y coordinates may be exchanged before the vertical line is estimated, so that the vertical offsets are constrained during vertical line estimation.

Once the vertical lines have been estimated, they may be merged. For example, multiple broken margin line segments lying along the same line may be merged to form a single margin. The vertical lines may be merged using the following steps. In step 1, an x-coordinate may be calculated for each margin line while the vertical coordinate (y-coordinate) is held fixed. In step 2, the distance between the x-coordinates of the margin lines may be calculated. If the x-coordinate distance is below a threshold T_vth, the margin lines may be merged. T_vth may be chosen to be 2 * T_d, where T_d may be the median distance between margin feature points. When there are multiple vertical lines, the closest vertical lines may be merged before they are used for vertical vanishing point identification. FIG. 13 shows an example image with two estimated vertical lines 1302A and 1302B along the same margin. FIG. 14 shows an example image in which the estimated vertical lines of FIG. 13 have been merged into a single margin 1402.
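The two merging steps can be sketched as follows. This is an illustration under assumptions: each nearly vertical line is parameterized as x = m * y + c (a hypothetical choice that stays finite for vertical lines), the function name `merge_margin_lines` is invented, and merged lines are combined by averaging their coefficients, which the text does not specify.

```python
def merge_margin_lines(lines, y_ref, t_vth):
    """Merge nearly vertical margin lines that are close in x at a fixed y.

    lines : list of (m, c) pairs, each describing the line x = m * y + c
    y_ref : the fixed y-coordinate at which x-coordinates are compared (step 1)
    t_vth : merge threshold T_vth, e.g. 2 * T_d
    """
    ordered = sorted(lines, key=lambda mc: mc[0] * y_ref + mc[1])
    groups = []
    for m, c in ordered:
        x = m * y_ref + c                              # step 1: x at fixed y
        if groups and x - groups[-1]["last_x"] < t_vth:   # step 2: x distance
            groups[-1]["members"].append((m, c))
            groups[-1]["last_x"] = x
        else:
            groups.append({"members": [(m, c)], "last_x": x})
    merged = []
    for g in groups:
        n = len(g["members"])
        # Assumption: a merged line is the coefficient-wise mean of its members.
        merged.append((sum(m for m, _ in g["members"]) / n,
                       sum(c for _, c in g["members"]) / n))
    return merged
```

With T_d = 2.5 and therefore T_vth = 5, two lines crossing y_ref at x = 100 and x = 101 would be merged, while a line at x = 300 would stay separate.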

At step 410, a vertical vanishing point may be identified using the estimated vertical lines. The determined vertical lines may be processed using a modified RANSAC algorithm, as described below, which is very similar to the method used for horizontal vanishing point identification. The estimated vertical margin lines resulting from the merging step may be defined in a Cartesian coordinate system. Each of the estimated vertical margin lines is then transformed from the Cartesian coordinate system to a data point in a homogeneous coordinate system. As was done for horizontal vanishing point identification, a confidence level may be assigned to each data point based on the proximity of the margin points used to estimate the resulting margin line and on the length of the respective margin line. The data points whose confidence levels are above a predetermined threshold are grouped into a priority sample array. The data points in the priority sample array are then clustered into a number of sample groups. In one embodiment, each sample group includes two or more data points. A group confidence value may be assigned to each sample group based on the confidence levels assigned to the data points in that sample group. Sample groups of data points may be iteratively selected from the priority sample array for line fitting. In one embodiment, the iteration may start with the sample group that has the highest confidence value in the priority sample array. Line fitting may be performed for the first sample group, resulting in a first fitted line. Line fitting may subsequently be performed for each further sample group, resulting in further fitted lines. Based on the first fitted line and the further fitted lines, a set of data points located below a predetermined distance threshold from the first fitted line may be determined. First and second vertical vanishing point candidates may be estimated from the vertical lines corresponding to the determined set of data points. In one embodiment, the first and second vertical vanishing point candidates may be estimated using different approximation methods, such as a least squares method, a weighted least squares method, and/or an adaptive least squares method. Other approximation methods may also be used. The proximity of each vertical vanishing point candidate may be compared to the resulting vertical text direction after projection correction. The vertical vanishing point candidate that is closest to the vertical text direction of the image document after projection correction may be selected.

If the number of detected margin lines is relatively small (e.g., fewer than five), the vanishing point may also be calculated directly using a weighted vertical vanishing point identification method. With this method, each of the estimated vertical margin lines is transformed from the Cartesian coordinate system to a data point in the homogeneous coordinate system. A confidence level may be assigned to each data point as described above. A weighted least squares method can then be used to fit the line corresponding to the vertical vanishing point.
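One way to picture the weighted least squares step is as finding the point that lies closest, in a confidence-weighted sense, to all estimated margin lines. The sketch below takes that view and should be read as an assumption-laden illustration, not the method of the source: lines are given as x = m * y + c, each is normalized to a homogeneous triple (a, b, d) with a * x + b * y + d = 0, and the function name `weighted_vanishing_point` is hypothetical. It only handles a finite vanishing point.

```python
import math


def weighted_vanishing_point(lines, weights):
    """Weighted least squares intersection of nearly vertical lines.

    lines   : list of (m, c) pairs, each describing the line x = m * y + c
    weights : one confidence level per line
    Returns the point (vx, vy) minimizing sum_i w_i * dist(line_i, point)^2.
    """
    a11 = a12 = a22 = r1 = r2 = 0.0
    for (m, c), w in zip(lines, weights):
        norm = math.hypot(1.0, m)
        # Homogeneous line coordinates, scaled so that a^2 + b^2 = 1.
        a, b, d = 1.0 / norm, -m / norm, -c / norm
        a11 += w * a * a
        a12 += w * a * b
        a22 += w * b * b
        r1 -= w * a * d
        r2 -= w * b * d
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no finite vanishing point")
    # Solve the 2x2 normal equations with Cramer's rule.
    vx = (r1 * a22 - a12 * r2) / det
    vy = (a11 * r2 - a12 * r1) / det
    return vx, vy
```

When the margin lines are almost parallel, as in a weakly distorted image, the vanishing point moves toward infinity and this finite-point form becomes ill-conditioned; a fit carried out fully in homogeneous coordinates avoids that limitation.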

FIG. 5 illustrates an example process 500 for identifying a vertical vanishing point using connected component analysis, according to one embodiment. Process 500 may be employed when vertical margin lines are not available because the document has no margins. The vertical vanishing point may be identified using text stroke features of pixel blobs, which are the constituent units of text characters. At step 502, the text stroke features of the pixel blobs may be identified. FIG. 15 shows an example image illustrating the identification of text stroke features of characters. The portion of the text marked by circle 1502 is shown enlarged on the right of the figure. The vertical text stroke features 1504 of the characters "dans la" are identified and shown.

At step 504, a set of pixel blobs whose text stroke features comply with one or more defined criteria may be identified. In one embodiment, a pixel blob may be selected if it satisfies one or more of the following criteria: an eccentricity of the pixel blob above 0.97; not being close to the margin; a text stroke angle between 70° and 110°; and a pixel blob area within [0.3, 5] * area_m. The eccentricity indicates how close the pixel blob is to a circular shape. Because the eccentricity of a circle is zero, the smaller the eccentricity value, the more circular the pixel blob. If the eccentricity of a pixel blob is greater than 0.97, the pixel blob looks like a line segment and can therefore be a distorted blob that is able to indicate the vertical distortion. In one embodiment, the eccentricity of a pixel blob may be found by identifying the ellipse surrounding the pixel blob and then calculating it according to the following equation:

$$e = \sqrt{1 - \frac{b^{2}}{a^{2}}}$$

where a and b represent the major and minor axes of the ellipse. For languages such as Chinese and Russian, optional pre-processing procedures such as edge detection and mathematical morphology filtering may be used to enhance the eccentricity feature of the pixel blobs. Pixel blobs with an eccentricity below 0.97 may be filtered out using an appropriate filter. Pixel blobs close to the image border may not be used for estimation. In one embodiment, proximity filtering can be used to remove pixel blobs that intersect the image border. Similarly, in one embodiment, angle filtering may be performed to filter out pixel blobs whose text strokes are not between 70 and 110 degrees. Pixel blobs with an area within the range [0.3, 5] * area_m may be selected. To identify the blobs within this range, a robust method can be used to estimate the median area of the pixel blobs that remain after filtering by the criteria described above. Pixel blobs whose area values are within the range [0.3, 5] * area_m are used for vertical vanishing point estimation. FIG. 16 shows an example image of the blobs selectively extracted after identification of the text stroke features.
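The blob selection criteria can be combined into a short filter, sketched here in Python. The sketch assumes the standard ellipse eccentricity e = sqrt(1 - b^2 / a^2), where a and b are the major and minor axis lengths, and assumes each blob is a dictionary with the hypothetical keys `a`, `b`, `angle_deg`, `area`, and `touches_border`; none of these names come from the source.

```python
import math
from statistics import median


def eccentricity(a, b):
    """Eccentricity of an ellipse with major axis a and minor axis b (a >= b > 0)."""
    return math.sqrt(1.0 - (b / a) ** 2)


def select_stroke_blobs(blobs, min_ecc=0.97, angle_range=(70.0, 110.0),
                        area_factor=(0.3, 5.0)):
    """Keep elongated, roughly vertical blobs of typical size."""
    # Eccentricity, border proximity, and stroke angle filters.
    kept = [
        bl for bl in blobs
        if not bl["touches_border"]
        and eccentricity(bl["a"], bl["b"]) > min_ecc
        and angle_range[0] <= bl["angle_deg"] <= angle_range[1]
    ]
    if not kept:
        return []
    # Median area (area_m) of the blobs that survived the filters above.
    area_m = median(bl["area"] for bl in kept)
    lo, hi = area_factor[0] * area_m, area_factor[1] * area_m
    return [bl for bl in kept if lo <= bl["area"] <= hi]
```

An ellipse that is five times longer than it is wide has e = sqrt(1 - 0.04), roughly 0.98, so it just passes the 0.97 threshold, while a near-circular blob such as the bowl of a "d" does not.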

The selected pixel blobs are used to estimate vertical text blob lines. The vertical lines are estimated at step 506, using a line function that corresponds to the direction of each pixel blob. FIG. 17 shows an example image with the estimated vertical text blob lines for the selected pixel blobs.
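The text does not specify how the line function is obtained from a pixel blob. One common choice, shown here purely as an assumption, is the principal axis computed from second-order central moments; the function name `blob_direction_line` is hypothetical.

```python
import math


def blob_direction_line(pixels):
    """Return (a, b, c) with a*x + b*y + c = 0 for the principal axis of a blob.

    `pixels` is a list of (x, y) coordinates belonging to one pixel blob.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments of the blob.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Angle of the principal axis with respect to the x axis.
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    dx, dy = math.cos(theta), math.sin(theta)
    # The line passes through the centroid; (a, b) is normal to the direction.
    a, b = -dy, dx
    c = -(a * cx + b * cy)
    return a, b, c
```

For a perfectly vertical stroke this yields a line of the form x = constant, which a slope-intercept form y = kx + m could not represent.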

At step 508, the vertical vanishing point may be determined using the vertical lines. In one embodiment, the vertical vanishing point may be determined using the modified RANSAC algorithm described previously. FIG. 18 shows an example image with the vertical text blob lines selected as a result of applying the modified RANSAC algorithm. For brevity, only a short summary of the application of the modified RANSAC algorithm to the vertical lines is given below. Each of the estimated vertical text blob lines is defined as a line in a Cartesian coordinate system. Each of the estimated vertical text blob lines is then transformed from the Cartesian coordinate system into a data point in a homogeneous coordinate system. A confidence level may be assigned to each of the data points. The confidence level may be based on at least the eccentricity of the shape of the pixel blob used to estimate the respective vertical text blob line. The modified RANSAC method is then applied, as described above in connection with the preceding figures, to determine the vertical vanishing point.
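As a reminder of why homogeneous coordinates are convenient here, the following sketch represents a line a*x + b*y + c = 0 by the triple (a, b, c) and obtains the intersection of two lines as a cross product. The representation and the function names are assumptions for illustration; the exact transformation used in the embodiment is the one described earlier in the document.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def line_through(p, q):
    """Homogeneous line (a, b, c) through two Cartesian points p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(l1, l2, eps=1e-12):
    """Cartesian intersection of two homogeneous lines, or None if parallel."""
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        return None  # the lines meet at infinity
    return (x / w, y / w)
```

Two nearly vertical strokes that converge upward, for example `line_through((0, 0), (1, 10))` and `line_through((4, 0), (3, 10))`, intersect at a single point that plays the role of a vertical vanishing point candidate.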

The projection correction algorithm may be implemented as a set of computer-related instructions that, when loaded onto a computing device, produce a machine for implementing the functions described herein. These computer program instructions may also be stored in a non-transitory computer-readable memory that can direct a computer or other programmable data processing apparatus to function in the manner described. The projection correction algorithm may also be implemented as hardware, or as a combination of hardware and software, in or in connection with a computer-based system. Those skilled in the art will appreciate that a computer-based system includes an operating system and various supporting software associated with a server/computer. The projection correction algorithm as described herein may be deployed by an organization and/or by a third-party vendor associated with the organization.

The projection correction algorithm may be a stand-alone application residing on a user device, or a modular application (e.g., a plug-in) that can be integrated with other applications such as image processing applications and OCR applications. For example, the stand-alone application may reside on a user device such as a personal computer, a portable computer, a laptop computer, a netbook computer, a tablet computer, a smartphone, a digital still camera, a video camera, a mobile communication device, a personal digital assistant, a scanner, a multifunction device, or any device capable of acquiring a document image and having a processor for performing the operations described herein. In another contemplated implementation, one part of the projection correction algorithm may be executed by the user device (e.g., the user's camera), and another part may be executed by a processing device coupled to the user device (e.g., the user's personal computer). In this case, the processing device can perform the more computationally expensive tasks. The projection correction algorithm may also be implemented as a server-based application residing on a server (e.g., an OCR server) that is accessible from the user device through a network. The projection correction algorithm may also be implemented as a network-based application with modules distributed over multiple networked devices.

In summary, the present disclosure provides various embodiments of methods for projection correction of perspective-distorted images, e.g. camera-based document images, which have at least one of the following technical contributions:
- Use of eigenpoints for estimating the horizontal vanishing point. In general, it is preferable to use one of the pixels on the baseline of the bounding box as the position determining pixel, because these baselines are mostly aligned for multiple consecutive characters in a text portion. Among these pixels, the eigenpoints are preferred because they are a by-product of standard connected component analysis, so no additional processing step is needed to obtain them for each pixel blob.
- An eigenpoint selection procedure is proposed to select the eigenpoints that can be used for text line estimation. Embodiments are disclosed that remove confusing eigenpoints and group the remaining eigenpoints by clustering or merging. Moreover, the result of clustering the eigenpoints is an already estimated baseline.
- The left and right end points of the baselines of the text portion are used as margin feature points for margin line estimation. Left and right end point clustering algorithms are proposed to estimate the margin lines.
- To identify inliers in the vanishing point estimation, an adaptation of the conventional RANSAC algorithm is proposed, which may be called priority-RANSAC, in which the conventional algorithm is improved by taking a priori knowledge into account, e.g. confidence values or confidence levels.
- A vanishing point selection procedure is adopted to select among several vanishing point candidates that may be determined in different ways.
- Weighted line estimation using confidence levels is proposed for horizontal vanishing point estimation, and adaptive weighted line estimation is proposed for vertical vanishing point estimation.
- A vertical offset least squares method and an x-y exchangeable weighted least squares method are proposed for calculating the vertical margin lines.
- Vertical vanishing point estimation based on blob analysis is proposed, in particular by considering the vertical stroke features of the pixel blobs.
- Page analysis is incorporated into the processing chain, so that only textual information is used for projection correction. Embodiments are proposed in which steps are taken to remove or separate pictures before the projection correction is performed.
- A complete processing chain is proposed that solves the projection correction problem and may avoid the need for user intervention.
- A projection correction method is proposed that comprises removal steps at different levels, i.e. for eigenpoints, baselines and vanishing point candidates, which together improve the result of the projection correction.
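The idea behind an x-y exchangeable weighted least squares fit can be sketched as follows. This is an illustrative reading of the term, not the formulation of the disclosure: the fit regresses y on x when the points spread mainly along x, and exchanges the roles of x and y otherwise, so that nearly vertical margin lines do not produce unbounded slopes. The function name and the use of confidence levels as weights are assumptions.

```python
def weighted_line_fit(points, weights):
    """Weighted least squares line fit; returns (a, b, c) with a*x + b*y + c = 0.

    Requires at least two distinct points with positive weights.
    """
    sw = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / sw
    my = sum(w * y for (_, y), w in zip(points, weights)) / sw
    sxx = sum(w * (x - mx) ** 2 for (x, _), w in zip(points, weights))
    syy = sum(w * (y - my) ** 2 for (_, y), w in zip(points, weights))
    sxy = sum(w * (x - mx) * (y - my) for (x, y), w in zip(points, weights))
    if sxx >= syy:
        # Mostly horizontal spread: fit y = k*x + m.
        k = sxy / sxx
        m = my - k * mx
        return (k, -1.0, m)
    # Mostly vertical spread: exchange x and y and fit x = k*y + m.
    k = sxy / syy
    m = mx - k * my
    return (-1.0, k, m)
```

For the end points (3, 0), (3, 5) and (3, 10) with equal weights, the exchanged branch is taken and the result describes the vertical line x = 3.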

Claims

A method for projection correction of an image comprising at least one text portion that is distorted by perspective, the method comprising:
an image binarization step, wherein the image is binarized;
a connected component analysis step, wherein pixel blobs are detected in the at least one text portion of the binarized image;
a horizontal vanishing point determination step, comprising estimating text baselines using eigenpoints of the pixel blobs and determining a horizontal vanishing point of the at least one text portion using the text baselines;
a vertical vanishing point determination step, wherein a vertical vanishing point is determined for the at least one text portion based on vertical features of the at least one text portion; and
a projection correction step, wherein the perspective in the image is corrected based on the horizontal and vertical vanishing points,
wherein the step of estimating the text baselines comprises a step of clustering the eigenpoints into eigenpoint groups, the eigenpoint groups satisfying the following conditions:
- the point-to-point distance between the eigenpoints of the eigenpoint group is below a first distance threshold;
- the point-to-line distance between each eigenpoint of the eigenpoint group and the line formed by the eigenpoints of the eigenpoint group is below a second distance threshold;
- the off-horizontal angle of the line formed by the eigenpoints of the eigenpoint group is below a maximum angle;
- the eigenpoint group contains a minimum number of eigenpoints,
and wherein the text baselines are estimated based on the eigenpoint groups.
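The four grouping conditions can be illustrated with a simple greedy clustering. This is only a sketch under stated assumptions: the greedy left-to-right strategy, the function names and all threshold values are hypothetical, and the line "formed by" a group is approximated by the line through its first and last eigenpoints.

```python
import math


def off_horizontal_angle(p0, p1):
    """Absolute angle, in degrees, between the segment p0 -> p1 and the x axis."""
    return abs(math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0])))


def point_line_distance(p, p0, p1):
    """Distance from point p to the line through p0 and p1 (p0 != p1)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    return abs(dy * (p[0] - p0[0]) - dx * (p[1] - p0[1])) / math.hypot(dx, dy)


def cluster_eigenpoints(points, d_point=30.0, d_line=3.0,
                        max_angle_deg=20.0, min_points=3):
    """Greedily cluster eigenpoints (x, y) into baseline candidate groups."""
    groups = []
    for p in sorted(set(points)):          # left to right, duplicates removed
        for g in groups:
            if math.dist(p, g[-1]) >= d_point:
                continue                   # first distance threshold violated
            if off_horizontal_angle(g[0], p) >= max_angle_deg:
                continue                   # maximum angle violated
            if len(g) >= 2 and point_line_distance(p, g[0], g[-1]) >= d_line:
                continue                   # second distance threshold violated
            g.append(p)
            break
        else:
            groups.append([p])             # start a new group
    # Keep only groups with the minimum number of eigenpoints.
    return [g for g in groups if len(g) >= min_points]
```

Each surviving group can then be passed to a line fit to obtain one estimated text baseline.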

The method of claim 1, wherein each eigenpoint is the center of the bottom of the bounding box of the respective pixel blob.

The method of claim 1, wherein the step of estimating the text baselines comprises a step of confusing eigenpoint removal, wherein confusing eigenpoints, which are out of line with the eigenpoints in the vicinity of the eigenpoint under consideration, are detected, and the confusing eigenpoints are disregarded for the text baseline estimation.

The method of claim 3, wherein the confusing eigenpoint removal step comprises:
determining the width and height of the pixel blobs;
determining average values for the width and height of the pixel blobs; and
detecting, as the confusing eigenpoints, the eigenpoints belonging to pixel blobs for which at least one of the width and the height of the pixel blob under consideration differs to a predetermined extent from the determined average values.
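The detection of confusing eigenpoints from blob dimensions can be sketched as follows. The "predetermined extent" is not quantified in the claim, so the ratio range used here is a hypothetical parameter, as is the function name.

```python
from statistics import mean


def find_confusing_blobs(sizes, ratio_range=(0.5, 2.0)):
    """Return indices of blobs whose width or height deviates from the mean.

    `sizes` is a list of (width, height) pairs, one per pixel blob. A blob is
    flagged when its width or height lies outside ratio_range x mean value.
    """
    mean_w = mean(w for w, _ in sizes)
    mean_h = mean(h for _, h in sizes)
    low, high = ratio_range
    return [
        i for i, (w, h) in enumerate(sizes)
        if not (low * mean_w <= w <= high * mean_w)
        or not (low * mean_h <= h <= high * mean_h)
    ]
```

The eigenpoints of the flagged blobs would then be disregarded when the text baselines are estimated.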

The method of claim 1, wherein the first distance threshold, the second distance threshold, the maximum angle and the minimum number of eigenpoints are set adaptively based on the content of the image.

The method of claim 1, wherein the step of estimating the text baselines further comprises a step of eigenpoint group merging, wherein eigenpoint groups on either side of a disregarded eigenpoint are merged into a larger eigenpoint group.

The step of determining the horizontal vanishing point comprises:
defining each of the estimated text baselines as a line in a Cartesian coordinate system;
transforming each of the text baselines defined in the Cartesian coordinate system into a data point in a homogeneous coordinate system;
assigning a confidence level to each of the data points, the confidence level being based on at least the length of the respective text baseline and the proximity of the eigenpoint group used to estimate the text baseline to the estimated text baseline;
grouping a number of data points having a confidence level above a predetermined threshold into a priority sample array;
clustering the data points in the priority sample array into a number of sample groups, each sample group comprising at least two data points;
assigning a group confidence value to each sample group based on at least the confidence level assigned to each data point in the sample group;
iteratively selecting sample groups of data points from the priority sample array for line fitting, the iteration starting with a first sample group having the highest group confidence value in the priority sample array;
performing line fitting on the first sample group, resulting in a first fitted line, and subsequently performing line fitting on each further sample group, resulting in further fitted lines;
determining, based on the first fitted line and the further fitted lines, a set of data points located below a predetermined distance threshold from the first fitted line;
estimating at least first and second horizontal vanishing point candidates from the text baselines corresponding to the determined set of data points;
performing a projection correction based on each estimated horizontal vanishing point candidate;
comparing the proximity of each horizontal vanishing point candidate to the horizontal text direction of the image after projection correction; and
selecting the horizontal vanishing point candidate that is closest to the horizontal text direction of the image after projection correction.

The method of claim 7, wherein the first and second horizontal vanishing point candidates are estimated using different approximation methods selected from the group consisting of least squares, weighted least squares and adaptive least squares.

The step of determining the vertical vanishing point comprises:
estimating a plurality of vertical text blob lines, each corresponding to one of the pixel blobs selected by a blob filtering algorithm on the text portion of the image;
defining each of the estimated vertical text blob lines as a line in a Cartesian coordinate system;
transforming each of the vertical text blob lines estimated in the Cartesian coordinate system into a data point in a homogeneous coordinate system;
assigning a confidence level to each of the data points, the confidence level being based on at least the eccentricity of the shape of the pixel blob used to estimate the respective vertical text blob line;
grouping a number of data points having a confidence level above a predetermined threshold into a priority sample array;
clustering the data points in the priority sample array into a number of sample groups, each sample group comprising at least two data points;
assigning a group confidence value to each sample group based on the confidence level assigned to each data point in the sample group;
iteratively selecting sample groups of data points from the priority sample array for line fitting, the iteration starting with a first sample group having the highest group confidence value in the priority sample array;
performing line fitting on the first sample group, resulting in a first fitted line, and subsequently performing line fitting on each further sample group, resulting in further fitted lines;
determining, based on the first fitted line and the further fitted lines, a set of data points located below a predetermined distance threshold from the first fitted line;
estimating at least first and second vertical vanishing point candidates from the vertical text blob lines corresponding to the determined set of data points;
performing a projection correction based on each estimated vertical vanishing point candidate;
comparing the proximity of each estimated vertical vanishing point candidate to the vertical text direction of the image after projection correction; and
selecting the vertical vanishing point candidate that is closest to the vertical text direction of the image after projection correction.

The method of claim 9, wherein the first and second vertical vanishing point candidates are estimated using different approximation methods selected from the group consisting of least squares, weighted least squares and adaptive least squares.

The method of claim 9, wherein the blob filtering algorithm selects the pixel blobs based on at least one of the following conditions:
- the eccentricity of the shape of each pixel blob, which represents the principal direction of the pixel blob, is above a predetermined threshold;
- the proximity of each pixel blob to the image border is above a predetermined distance threshold;
- the angle of the estimated vertical text blob line is below a maximum angle threshold;
- the area of each pixel blob, defined by its number of pixels, is below a maximum area threshold.

The step of separating text and picture is performed after the image binarization and before the connected component analysis, wherein only text information is retained in the binarized image. the method of.

A system for projection correction of an image comprising at least one text portion that is distorted by perspective, the system comprising at least one processor and associated storage containing a program executable by the at least one processor, the program comprising:
first software code portions configured for image binarization which, when executed, binarize the image;
second software code portions configured for connected component analysis which, when executed, detect pixel blobs in the at least one text portion of the binarized image;
third software code portions configured for horizontal vanishing point determination which, when executed, estimate text baselines using eigenpoints of the pixel blobs and determine a horizontal vanishing point of the at least one text portion using the text baselines;
fourth software code portions configured for vertical vanishing point determination which, when executed, determine a vertical vanishing point for the at least one text portion based on vertical features of the at least one text portion; and
fifth software code portions configured for projection correction which, when executed, correct the perspective in the image based on the horizontal and vertical vanishing points,
wherein estimating the text baselines comprises clustering the eigenpoints into eigenpoint groups, the eigenpoint groups satisfying the following conditions:
- the point-to-point distance between the eigenpoints of the eigenpoint group is below a first distance threshold;
- the point-to-line distance between each eigenpoint of the eigenpoint group and the line formed by the eigenpoints of the eigenpoint group is below a second distance threshold;
- the off-horizontal angle of the line formed by the eigenpoints of the eigenpoint group is below a maximum angle;
- the eigenpoint group contains a minimum number of eigenpoints,
and wherein the text baselines are estimated based on the eigenpoint groups.

The system of claim 13, comprising one of the following: a personal computer, a portable computer, a laptop computer, a netbook computer, a tablet computer, a smartphone, a digital still camera, a video camera, a mobile communication device, a personal digital assistant, a scanner, a multifunction device.

A non-transitory storage medium on which a computer program product is stored for projection correction of an image comprising at least one text portion that is distorted by perspective, the computer program product comprising software code portions in a format executable on a computing device and configured to perform the following steps when executed on the computing device:
an image binarization step, wherein the image is binarized;
a connected component analysis step, wherein pixel blobs are detected in the at least one text portion of the binarized image;
a horizontal vanishing point determination step, comprising estimating text baselines using eigenpoints of the pixel blobs and determining a horizontal vanishing point of the at least one text portion using the text baselines;
a vertical vanishing point determination step, wherein a vertical vanishing point is determined for the at least one text portion based on vertical features of the at least one text portion; and
a projection correction step, wherein the perspective in the image is corrected based on the horizontal and vertical vanishing points,
wherein the step of estimating the text baselines comprises a step of clustering the eigenpoints into eigenpoint groups, the eigenpoint groups satisfying at least one of the following conditions:
- the point-to-point distance between the eigenpoints of the eigenpoint group is below a first distance threshold;
- the point-to-line distance between each eigenpoint of the eigenpoint group and the line formed by the eigenpoints of the eigenpoint group is below a second distance threshold;
- the off-horizontal angle of the line formed by the eigenpoints of the eigenpoint group is below a maximum angle;
- the eigenpoint group contains a minimum number of eigenpoints,
and wherein the text baselines are estimated based on the eigenpoint groups.

A method for determining vanishing point candidates of at least one text portion in an image that is distorted by perspective, the method comprising:
an image binarization step, wherein the image is binarized;
performing connected component analysis, wherein pixel blobs are detected in the at least one text portion of the binarized image and, for each of the pixel blobs, a position determining pixel is selected on a pixel blob baseline of the pixel blob, the position determining pixel defining the position of the pixel blob in the binarized image;
estimating, in a Cartesian coordinate system, a number of text lines based on the position determining pixels, each text line representing an approximation of a horizontal or vertical text direction of the text portion;
transforming each of the text lines into a data point in a homogeneous coordinate system;
assigning a confidence level to each of the data points;
grouping a number of data points having a confidence level above a predetermined threshold into a priority sample array;
clustering the data points in the priority sample array into a number of sample groups, each sample group comprising at least two data points;
assigning a group confidence value to each sample group based on at least the confidence level assigned to each data point in the sample group;
applying a RANSAC algorithm to determine, among the data points, a set of inliers with respect to a first fitted line, the RANSAC algorithm starting with the sample group having the highest group confidence value in the priority sample array; and
estimating at least one vanishing point candidate from the text lines corresponding to the set of inliers.

The method of claim 16, wherein the confidence level assigned to each data point is based on at least the length of the respective text line and the proximity of the position determining pixels to the respective text line.

The method of claim 16, wherein the RANSAC algorithm comprises the following steps:
iteratively selecting sample groups of data points from the priority sample array for line fitting, the iteration starting with a first sample group having the highest group confidence value in the priority sample array;
performing line fitting on the first sample group, resulting in a first fitted line, and subsequently performing line fitting on each further sample group, resulting in further fitted lines; and
determining, based on the first fitted line and the further fitted lines, a set of data points located below a predetermined distance threshold from the first fitted line, the set of data points forming the set of inliers.
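The priority-driven sampling can be sketched as follows. This is a simplified illustration, not the claimed algorithm: it assumes each text line y = k*x + b is represented by the data point (k, b) instead of the homogeneous representation used in the claims, forms sample groups as all pairs of priority points, uses the sum of confidences as the group confidence value, and keeps the largest consensus set. All function and parameter names are hypothetical. With this representation, text lines passing through a common point (vx, vy) satisfy b = vy - vx*k, so their data points are collinear.

```python
from itertools import combinations


def fit_line(points):
    """Least squares fit q = s*p + t through (p, q) data points."""
    n = len(points)
    mp = sum(p for p, _ in points) / n
    mq = sum(q for _, q in points) / n
    spp = sum((p - mp) ** 2 for p, _ in points)
    spq = sum((p - mp) * (q - mq) for p, q in points)
    s = spq / spp
    return s, mq - s * mp


def priority_ransac(data, conf, conf_min=0.5, max_priority=10, dist_max=0.05):
    """Return indices of inlier data points, trying confident samples first."""
    # Priority sample array: the most confident points above the threshold.
    priority = sorted((i for i in range(len(data)) if conf[i] > conf_min),
                      key=lambda i: -conf[i])[:max_priority]
    # Sample groups (pairs), ordered by decreasing group confidence value.
    groups = sorted(combinations(priority, 2),
                    key=lambda g: -(conf[g[0]] + conf[g[1]]))
    best = []
    for g in groups:
        if data[g[0]][0] == data[g[1]][0]:
            continue  # identical slopes: q = s*p + t cannot be fitted
        s, t = fit_line([data[i] for i in g])
        norm = (s * s + 1.0) ** 0.5
        inliers = [i for i, (p, q) in enumerate(data)
                   if abs(s * p - q + t) / norm < dist_max]
        if len(inliers) > len(best):
            best = inliers
    return best


def vanishing_point(data, inliers):
    """Estimate (vx, vy) from the inlier data points (k, b): b = vy - vx*k."""
    s, t = fit_line([data[i] for i in inliers])
    return -s, t
```

Visiting the sample groups in order of decreasing confidence means that a good consensus set tends to be found early, which is the motivation for using a priori knowledge in the sampling.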

The method of claim 18, wherein the predetermined distance threshold from the first fitted line is a fixed parameter.

The method of claim 18, wherein the predetermined distance threshold from the first fitted line is an adaptive parameter that is adapted based on the content of the image.

The method of claim 16, wherein at least first and second vanishing point candidates are estimated from the text lines corresponding to the set of inliers.

The method of claim 21, wherein the first and second vanishing point candidates are estimated using different approximation methods selected from the group consisting of least squares, weighted least squares and adaptive least squares.

The method of claim 16, further comprising a step of selecting a vanishing point from the estimated vanishing point candidates, the selection comprising:
performing a projection correction on the image based on each estimated vanishing point candidate;
comparing the proximity of each vanishing point candidate to the horizontal or vertical text direction of the image after projection correction; and
selecting the vanishing point candidate that is closest to the horizontal or vertical text direction of the image after projection correction.

The method of claim 16, wherein the group confidence value of each sample group is further based on the distances between the estimated text lines corresponding to the data points in the sample group.

The method of claim 16, wherein the confidence level of each of the data points is further based on the principal direction of the pixel blob used to estimate the respective text line, the principal direction being defined by the eccentricity of the shape of the pixel blob.

The method of claim 16, wherein the maximum number of data points grouped into the priority sample array is between 2 and 20, more preferably between 5 and 10.

The method of claim 16, wherein each of the at least one vanishing point candidate is a horizontal vanishing point candidate and the position determining pixel is an eigenpoint of the pixel blob.

The method of claim 16, wherein each of the at least one vanishing point candidate is a vertical vanishing point candidate, and the estimated text lines are vertical text blob lines, each corresponding to one of the pixel blobs selected by a blob filtering algorithm on the at least one text portion of the image.

The method of claim 16, wherein a step of text and picture separation is performed after the image binarization and before the connected component analysis, such that only textual information is retained in the binarized image.

A method for projection correction of an image comprising at least one text portion that is distorted by perspective, the method comprising:
an image binarization step, wherein the image is binarized;
performing connected component analysis, wherein pixel blobs are detected in the at least one text portion of the binarized image and, for each of the pixel blobs, a position determining pixel is selected on a pixel blob baseline of the pixel blob, the position determining pixel defining the position of the pixel blob in the binarized image;
a horizontal vanishing point determination step, comprising estimating text baselines using the position determining pixels of the pixel blobs and determining at least one horizontal vanishing point candidate of the at least one text portion using the text baselines;
a vertical vanishing point determination step, comprising estimating vertical text blob lines, each corresponding to one of the pixel blobs selected by a blob filtering algorithm on the text portion of the image, and determining at least one vertical vanishing point candidate of the at least one text portion using the vertical text blob lines,
wherein at least one of the horizontal and vertical vanishing point determination steps comprises:
transforming each of the estimated text lines into a data point in a homogeneous coordinate system;
assigning a confidence level to each of the data points;
grouping a number of data points having a confidence level above a predetermined threshold into a priority sample array;
clustering the data points in the priority sample array into a number of sample groups, each sample group comprising at least two data points;
assigning a group confidence value to each sample group based on at least the confidence level assigned to each data point in the sample group;
applying a RANSAC algorithm to determine, among the data points, a set of inliers with respect to a first fitted line, the RANSAC algorithm starting with the sample group having the highest group confidence value in the priority sample array; and
estimating the at least one vanishing point candidate from the text lines corresponding to the set of inliers; and
a projection correction step, wherein the perspective in the image is corrected based on a horizontal vanishing point selected from the at least one horizontal vanishing point candidate and a vertical vanishing point selected from the at least one vertical vanishing point candidate.

A system for projection correction of an image comprising at least one text portion that is subject to distortion by perspective, the system comprising at least one processor and associated storage containing a program executable by the at least one processor, the program comprising:
A first software code portion configured for image binarization that, when executed, binarizes the image;
A second software code portion configured for connected component analysis that, when executed, detects pixel blobs in the at least one text portion of the binarized image and, for each of the pixel blobs, selects on the pixel blob baseline of the pixel blob a position-determining pixel defining the position of the pixel blob in the binarized image;
A third software code portion configured for horizontal vanishing point determination that, when executed, estimates text baselines using the position-determining pixels of the pixel blobs and uses the text baselines to determine at least one horizontal vanishing point candidate of the at least one text portion;
A fourth software code portion configured for vertical vanishing point determination that, when executed, estimates vertical text blob lines, each corresponding to a selected one of the pixel blobs, the pixel blobs being selected by a blob filtering algorithm on the at least one text portion of the image, and uses the vertical text blob lines to determine at least one vertical vanishing point candidate for the at least one text portion,
Wherein at least one of the third and fourth software code portions is configured to perform the steps of:
Transforming each of the estimated text lines into a data point in a homogeneous coordinate system;
Assigning a confidence level to each of the data points;
Grouping a number of data points having a confidence level above a predetermined threshold into a priority sample array;
Clustering the data points in the priority sample array into a number of sample groups, each sample group comprising at least two data points;
Assigning a group confidence value to each sample group based on at least the confidence level assigned to each data point in the sample group;
Applying a RANSAC algorithm to determine, among the data points, a set of inliers with respect to a first fitted line, the RANSAC algorithm being initiated with the sample group having the highest group confidence value in the priority sample array;
Estimating the at least one vanishing point candidate from the text lines corresponding to the set of inliers; and
A fifth software code portion configured for projection correction that, when executed, corrects the perspective in the image based on the horizontal vanishing point selected from the at least one horizontal vanishing point candidate and the vertical vanishing point selected from the at least one vertical vanishing point candidate.
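
The confidence-ordered sampling recited above can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: data points are plain 2D points, every pair of priority points forms a sample group, the group confidence is the sum of the two member confidences, and hypotheses are tried in descending order of group confidence instead of at random. The names `prioritized_ransac`, `conf_min`, `tol` and `max_groups` are hypothetical.

```python
from itertools import combinations
import numpy as np

def prioritized_ransac(points, conf, conf_min=0.5, tol=0.05, max_groups=50):
    """Return a boolean inlier mask for the best line hypothesis."""
    points = np.asarray(points, dtype=float)
    conf = np.asarray(conf, dtype=float)
    # Priority sample array: data points whose confidence exceeds the threshold.
    priority = [i for i in np.argsort(-conf) if conf[i] > conf_min]
    # Sample groups of two points, ordered by group confidence (sum, an assumption).
    groups = sorted(combinations(priority, 2),
                    key=lambda g: -(conf[g[0]] + conf[g[1]]))
    best = np.zeros(len(points), dtype=bool)
    for i, j in groups[:max_groups]:
        p, d = points[i], points[j] - points[i]
        norm = np.hypot(d[0], d[1])
        if norm == 0:
            continue  # coincident points define no line
        # Perpendicular distance of every data point to the line through p and q.
        dist = np.abs(d[0] * (points[:, 1] - p[1]) - d[1] * (points[:, 0] - p[0])) / norm
        inliers = dist < tol
        if inliers.sum() > best.sum():
            best = inliers
    return best

# Five points on y = 2x + 1 and one low-confidence outlier.
pts = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (2, 10)]
cf = [0.9, 0.8, 0.95, 0.7, 0.85, 0.2]
mask = prioritized_ransac(pts, cf)
```

Starting from the most trusted group means a good hypothesis tends to be found in the first few iterations, which is the motivation for the priority sample array.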

The system of claim 31, comprising one of the following: a personal computer, a portable computer, a laptop computer, a netbook computer, a tablet computer, a smartphone, a digital still camera, a video camera, a mobile communication device, a portable personal digital assistant, a scanner, and a multifunction device.

A non-transitory storage medium on which is stored a computer program product for determining vanishing point candidates of at least one text portion in an image that is subject to distortion by perspective, the computer program product comprising software code portions in a format executable on a computing device and configured to perform, when executed on the computing device, the following steps:
An image binarization step, wherein the image is binarized;
A connected component analysis step, wherein pixel blobs are detected in the at least one text portion of the binarized image and, for each of the pixel blobs, a position-determining pixel is selected on a pixel blob baseline of the pixel blob, the position-determining pixel defining the position of the pixel blob in the binarized image;
Estimating, in a Cartesian coordinate system, a number of text lines based on the position-determining pixels, each text line representing an approximation of the horizontal or vertical text direction of the at least one text portion;
Converting each of the text lines into a data point in a homogeneous coordinate system;
Assigning a confidence level to each of the data points;
Grouping a number of data points having a confidence level above a predetermined threshold into a priority sample array;
Clustering the data points in the priority sample array into a number of sample groups, each sample group comprising at least two data points;
Assigning a group confidence value to each sample group based on at least the confidence level assigned to each data point in the sample group;
Applying a RANSAC algorithm to determine, among the data points, a set of inliers with respect to a first fitted line, the RANSAC algorithm being initiated with the sample group having the highest group confidence value in the priority sample array; and
Estimating at least one vanishing point candidate from the text lines corresponding to the set of inliers.
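
To make the conversion of text lines into homogeneous data points concrete, here is a small sketch under stated assumptions. A line through two Cartesian points is represented by the coefficient vector (a, b, c) of a·x + b·y + c = 0, obtained as a cross product, and a common point of several such lines is estimated as the singular vector with the smallest singular value. The helper names `line_to_data_point` and `vanishing_point` are hypothetical, and the estimate assumes a finite vanishing point.

```python
import numpy as np

def line_to_data_point(p, q):
    """Homogeneous data point (a, b, c) of the line a*x + b*y + c = 0 through p and q."""
    l = np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])
    return l / np.linalg.norm(l)  # scale is arbitrary, so normalise it

def vanishing_point(lines):
    """Point v minimising sum((l_i . v)^2) subject to |v| = 1, returned in Cartesian form."""
    _, _, vt = np.linalg.svd(np.asarray(lines))
    v = vt[-1]
    return v[:2] / v[2]  # assumes the vanishing point is not at infinity

# Three text lines that all pass through (100, 10).
lines = [
    line_to_data_point((0, 0), (10, 1)),
    line_to_data_point((0, 5), (10, 5.5)),
    line_to_data_point((0, 10), (10, 10)),
]
vp = vanishing_point(lines)
```

Lines that share a vanishing point produce data points that are all orthogonal to the same vector, which is why a line fit over the data points yields a vanishing point candidate.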

A method for projection correction of an image comprising at least one text portion that is subject to distortion by perspective projection, comprising:
An image binarization step, wherein the image is binarized;
A connected component analysis step, wherein pixel blobs are detected in the at least one text portion of the binarized image and, for each of the pixel blobs, a position-determining pixel is selected on a pixel blob baseline of the pixel blob, the position-determining pixel defining the position of the pixel blob in the binarized image;
A step of determining a horizontal vanishing point, comprising estimating text baselines using the position-determining pixels of the pixel blobs, identifying horizontal vanishing point candidates from the estimated text baselines, and determining a horizontal vanishing point of the at least one text portion using the horizontal vanishing point candidates;
A step of determining a vertical vanishing point, wherein a vertical vanishing point is determined for the at least one text portion based on vertical features of the at least one text portion;
The step of projection correction, wherein the perspective of the image is corrected based on the horizontal and vertical vanishing points;
Wherein the horizontal vanishing point determination comprises a first removal step at the level of the position-determining pixels, a second removal step at the level of the text baselines, and a third removal step at the level of the horizontal vanishing point candidates,
Wherein the step of estimating the text baselines comprises a step of clustering eigenpoints into eigenpoint groups, the eigenpoint groups satisfying the following conditions:
- The condition that the point-to-point distance between the eigenpoints of the eigenpoint group is below a first distance threshold;
- The condition that the point-to-line distance between each eigenpoint of the eigenpoint group and the line formed by the eigenpoints of the eigenpoint group is below a second distance threshold;
- The condition that the off-horizontal angle of the line formed by the eigenpoints of the eigenpoint group is below a maximum angle;
- The condition that the eigenpoint group contains a minimum number of eigenpoints;
The text baselines being estimated based on the eigenpoint groups.
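
The four grouping conditions can be checked with a short routine such as the one below. It is only a sketch under assumptions: the point-to-point distance is measured between neighbouring eigenpoints after sorting by x, the line formed by the group is a least-squares fit, and all four conditions are required together, which is one possible reading of the claim. The function and parameter names are hypothetical.

```python
import numpy as np

def is_valid_group(points, max_gap, max_line_dist, max_angle_deg, min_points):
    """Check the four eigenpoint-group conditions for a list of (x, y) points."""
    pts = np.asarray(sorted(points), dtype=float)
    # Minimum number of eigenpoints (at least two are needed to fit a line).
    if len(pts) < max(min_points, 2):
        return False
    # Point-to-point distance between neighbouring eigenpoints.
    gaps = np.hypot(*np.diff(pts, axis=0).T)
    if gaps.max() >= max_gap:
        return False
    # Line formed by the group: least-squares fit y = m*x + b.
    m, b = np.polyfit(pts[:, 0], pts[:, 1], 1)
    # Point-to-line distance of every eigenpoint.
    dist = np.abs(m * pts[:, 0] - pts[:, 1] + b) / np.hypot(m, 1.0)
    if dist.max() >= max_line_dist:
        return False
    # Off-horizontal angle of the fitted line.
    return bool(abs(np.degrees(np.arctan(m))) < max_angle_deg)

good = is_valid_group([(0, 0), (10, 0.5), (20, 1.0), (30, 1.4)], 15, 1.0, 20, 3)
gappy = is_valid_group([(0, 0), (10, 0.5), (60, 1.0)], 15, 1.0, 20, 3)
```

A group that passes the check can then be used to estimate one text baseline, for example from the same least-squares fit.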

The method of claim 34, wherein the position-determining pixel is an eigenpoint of the pixel blob.

The method of claim 35, wherein the first removal step comprises a step of detecting confusing eigenpoints that are out of line with respect to the eigenpoints in the vicinity of the eigenpoint under consideration, and wherein the confusing eigenpoints are ignored in the text baseline estimation.

The method of claim 36, wherein the confusing eigenpoints are detected using the following steps:
Determining the width and height of the pixel blobs;
Determining mean values for the width and height of the pixel blobs;
Detecting the confusing eigenpoints as eigenpoints belonging to pixel blobs for which at least one of the width and the height of the pixel blob under consideration differs from the determined mean values by a predetermined extent.
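
A minimal sketch of this size-based test follows. The "predetermined extent" is modelled here as a multiplicative factor `ratio`, which is an assumption made for illustration, as are the function name and the sample blob sizes.

```python
import numpy as np

def confusing_mask(widths, heights, ratio=2.0):
    """Flag blobs whose width or height is far from the mean blob size."""
    w = np.asarray(widths, dtype=float)
    h = np.asarray(heights, dtype=float)

    def far_from_mean(values):
        mean = values.mean()
        # "Far" means more than `ratio` times larger or smaller than the mean.
        return (values > ratio * mean) | (values < mean / ratio)

    return far_from_mean(w) | far_from_mean(h)

# Four ordinary characters and one wide, flat blob (for example an underline).
flags = confusing_mask([10, 11, 9, 10, 40], [20, 21, 19, 20, 5])
```

Eigenpoints of the flagged blobs would then be left out of the text baseline estimation.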

The method of claim 34, wherein the first distance threshold, the second distance threshold, the maximum angle, and the minimum number of eigenpoints are adaptively set based on the content of the image.

The method of claim 34, wherein the step of estimating the text baselines further comprises a step of eigenpoint group merging, in which the eigenpoint groups on both sides of an ignored eigenpoint are merged into a larger eigenpoint group.

The method of claim 34, wherein the second removal step comprises the steps of:
Assigning confidence levels to the text baselines;
Removing text baselines on the basis of the confidence levels.

The method of claim 40, wherein the confidence levels are determined based on at least the length of the respective text baseline and the proximity between the eigenpoint group used to estimate the text baseline and the estimated text baseline.

The method according to claim 40, wherein the step of removing text baselines is performed using a RANSAC algorithm in which the confidence levels are taken into account.

The method of claim 34, wherein the third removal step comprises the steps of:
Performing a projection correction based on each of the identified horizontal vanishing point candidates;
Comparing, after projection correction, the proximity of each horizontal vanishing point candidate to the resulting horizontal text direction of the image;
Selecting the horizontal vanishing point candidate that is closest to the horizontal text direction of the image after projection correction.

The method according to claim 34, wherein first and second horizontal vanishing point candidates are estimated from the text baselines after the second removal step, and wherein different approximation methods, selected from the group consisting of least squares, weighted least squares and adaptive least squares, are used for the estimation of the first and second horizontal vanishing point candidates.
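
The difference between the first two approximation methods can be illustrated as follows. This is a sketch under assumptions: each text baseline is given as y = a·x + b, a vanishing point (x0, y0) shared by all baselines satisfies a·x0 − y0 = −b, and the weights stand in for baseline confidence levels. The adaptive variant is not shown. The function name and the sample values are hypothetical.

```python
import numpy as np

def vanishing_point_lsq(slopes, intercepts, weights=None):
    """Least-squares (or weighted least-squares) intersection of lines y = a*x + b."""
    a = np.asarray(slopes, dtype=float)
    b = np.asarray(intercepts, dtype=float)
    # Scaling each equation by sqrt(weight) minimises the weighted squared residuals.
    w = np.ones_like(a) if weights is None else np.sqrt(np.asarray(weights, dtype=float))
    A = np.column_stack([a, -np.ones_like(a)]) * w[:, None]
    rhs = -b * w
    (x0, y0), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return float(x0), float(y0)

# Three baselines meeting at (100, 10); the fourth is a slightly off, low-weight line.
slopes = [0.1, 0.05, 0.0, 0.2]
intercepts = [0.0, 5.0, 10.0, 0.0]
plain = vanishing_point_lsq(slopes[:3], intercepts[:3])
weighted = vanishing_point_lsq(slopes, intercepts, weights=[1.0, 1.0, 1.0, 1e-6])
```

With consistent baselines both methods agree; when an unreliable baseline is present, giving it a small weight keeps the weighted estimate close to the point defined by the reliable ones.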

The method according to claim 34, wherein a step of separating text and pictures is performed after the image binarization and before the connected component analysis, and only text information is retained in the binarized image.

A system for projection correction of an image comprising at least one text portion that is subject to distortion by perspective, the system comprising at least one processor and associated storage containing a program executable by the at least one processor, the program comprising:
A first software code portion configured for image binarization that, when executed, binarizes the image;
A second software code portion configured for connected component analysis that, when executed, detects pixel blobs in the at least one text portion of the binarized image, wherein, for each of the pixel blobs, a position-determining pixel is selected on a pixel blob baseline of the pixel blob, the position-determining pixel defining the position of the pixel blob in the binarized image;
A third software code portion configured for horizontal vanishing point determination that, when executed, performs the steps of estimating text baselines using the position-determining pixels of the pixel blobs, identifying horizontal vanishing point candidates from the estimated text baselines, and determining a horizontal vanishing point of the at least one text portion using the horizontal vanishing point candidates;
A fourth software code portion configured for vertical vanishing point determination that, when executed, determines a vertical vanishing point for the at least one text portion on the basis of vertical features of the at least one text portion;
A fifth software code portion for projection correction to correct said perspective in said image based on said horizontal and vertical vanishing points when executed;
Wherein the third software code portion, when executed, performs a first removal step at the level of the position-determining pixels, a second removal step at the level of the text baselines, and a third removal step at the level of the horizontal vanishing point candidates,
Wherein the step of estimating the text baselines comprises a step of clustering eigenpoints into eigenpoint groups, the eigenpoint groups satisfying the following conditions:
- The condition that the point-to-point distance between the eigenpoints of the eigenpoint group is below a first distance threshold;
- The condition that the point-to-line distance between each eigenpoint of the eigenpoint group and the line formed by the eigenpoints of the eigenpoint group is below a second distance threshold;
- The condition that the off-horizontal angle of the line formed by the eigenpoints of the eigenpoint group is below a maximum angle;
- The condition that the eigenpoint group contains a minimum number of eigenpoints;
The text baselines being estimated based on the eigenpoint groups.

The system of claim 46, comprising one of the following: a personal computer, a portable computer, a laptop computer, a netbook computer, a tablet computer, a smartphone, a digital still camera, a video camera, a mobile communication device, a portable personal digital assistant, a scanner, and a multifunction device.

A non-transitory storage medium on which is stored a computer program product for projection correction of an image including at least one text portion that is subject to distortion by perspective, the computer program product comprising software code portions in a format executable on a computing device and configured to perform, when executed on the computing device, the following steps:
An image binarization step, wherein the image is binarized;
A connected component analysis step, wherein pixel blobs are detected in the at least one text portion of the binarized image and, for each of the pixel blobs, a position-determining pixel is selected on a pixel blob baseline of the pixel blob, the position-determining pixel defining the position of the pixel blob in the binarized image;
A step of determining a horizontal vanishing point, comprising estimating text baselines using the position-determining pixels of the pixel blobs, identifying horizontal vanishing point candidates from the estimated text baselines, and determining a horizontal vanishing point of the at least one text portion using the horizontal vanishing point candidates;
A step of determining a vertical vanishing point, wherein a vertical vanishing point is determined for the at least one text portion based on vertical features of the at least one text portion;
A projection correction step, wherein the perspective in the image is corrected based on the horizontal and vertical vanishing points;
Wherein the horizontal vanishing point determination includes a first removal step at the level of the position-determining pixels, a second removal step at the level of the text baselines, and a third removal step at the level of the horizontal vanishing point candidates,
Wherein the step of estimating the text baselines comprises a step of clustering eigenpoints into eigenpoint groups, the eigenpoint groups satisfying at least one of the following conditions:
- The condition that the point-to-point distance between the eigenpoints of the eigenpoint group is below a first distance threshold;
- The condition that the point-to-line distance between each eigenpoint of the eigenpoint group and the line formed by the eigenpoints of the eigenpoint group is below a second distance threshold;
- The condition that the off-horizontal angle of the line formed by the eigenpoints of the eigenpoint group is below a maximum angle;
- The condition that the eigenpoint group contains a minimum number of eigenpoints;
The text baselines being estimated based on the eigenpoint groups.