JP3649942B2 - Image input device, image input method, and storage medium

Info

Publication number: JP3649942B2
Application number: JP09639099A
Authority: JP
Inventors: 正義岡本
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1999-04-02
Filing date: 1999-04-02
Publication date: 2005-05-18
Anticipated expiration: 2019-04-02
Also published as: JP2000293627A

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an image input apparatus, an image input method, and a storage medium on which a program for executing the processing is recorded, for use in cameras equipped with an image input device such as a CCD and in electronic devices such as personal digital assistants, personal computers, and word processors. From among continuously captured images, the apparatus automatically selects the image data closest to image data captured from directly in front of the subject, converts it into true frontal image data, and performs character recognition on it.
[0002]
[Prior art]
In general, when recognizing characters or figures on printed matter or an object, it is known that a correct result is more likely when recognition is applied to image data captured from as close to the front of the subject as possible. This is because the larger the deviation (position, direction, rotation) from the frontal direction, the larger the error introduced by the conversion back to frontal image data, and the more the image data deteriorates. For example, when an imaging device such as a CCD camera or digital camera is used to input frontal image data of a subject such as printed paper or an object, either the camera or the subject is moved by hand to align position and angle so that image data close to frontal can be input. FIG. 3 shows a scene in which printed matter held in the hand is photographed with a camera fixed to a personal computer, and FIG. 4 shows a scene in which a hand-held digital camera is pointed at printed matter. If the camera and the subject squarely face each other, image data as shown in FIG. 5 is obtained, but if the position and angle of the camera relative to the subject are off, the captured image data is distorted as shown in FIG. 6. Camera calibration, such as camera position and orientation estimation, is described in detail in "Computer Vision, Technical Review and Future Prospects" (New Technology Communications), pp. 37-53 and pp. 97-122 of the same book.
[0003]
In the prior art, it is known that distorted image data such as that shown in FIG. 6 can be restored to frontal image data by applying "inverse perspective transformation processing", that is, image processing that converts the image data into the image data that would be obtained if the camera were returned to its original (directly frontal) position. The "inverse perspective transformation processing" is described in detail in "Television Image Information Engineering Handbook" (Television Society, Ohmsha), pp.462.
[0004]
Further, for recognizing characters in image data, a method is known in which image data tilted in the X and Y directions is restored to an upright image by image transformation before recognition (Japanese Patent Laid-Open Nos. 6-103411 and 110-25258). The method of extracting character strings and characters from image data is detailed in "Document structure analysis by the extended split detection method" (Information Processing Society of Japan (late fiscal 1997) 55th National Conference 2-105,106). The character recognition method is detailed in "Analysis of Handwritten Kanji by the Directional Pattern Matching Method" (The Institute of Electronics, Information and Communication Engineers, Vol. J65-D, No. 5, pp. 550-557, 1982.5).
[0005]
However, for image data with a large deviation from the front, it is difficult to restore the image completely (to the level of image data taken from the front) even if "inverse perspective transformation processing" is applied. In particular, when the deviation from the front is so large that the characters are crushed, it is almost impossible to restore the image data to a degree that allows the characters to be recognized correctly.
[0006]
Therefore, obtaining image data from a position or direction as close to the front as possible at the time of shooting is the most effective solution for these problems.
[0007]
In the scene shown in FIG. 4, the position and angle of the camera that the photographer points at the subject can be assumed to face roughly the front. The problem in such a scene is the change in camera position and angle caused by the photographer's hand shake. Conventional photographing apparatuses capture only one piece of image data, but some recent digital cameras can record a plurality of pieces of image data continuously for a certain period of time. A single image is unlikely to be suitable as image data photographed from the front, but when a plurality of consecutive images is captured, there is a good chance that one of them is suitable.
[0008]
[Problems to be solved by the invention]
The present invention realizes means for selecting, from among a plurality of pieces of image data captured continuously over a certain period of time, the image data most suitable as frontal image data. It further realizes an apparatus and a processing method that apply inverse perspective transformation processing to the image data so obtained and perform pattern recognition processing on the characters and figures in the restored frontal image data to obtain a result.
[0009]
[Means for solving the problems and effects of the invention]
The present invention realizes means for selecting, from among a plurality of pieces of image data captured continuously over a certain period of time, the image data most suitable as image data taken from the front. It further realizes an image input apparatus and a processing method that apply inverse perspective transformation processing to the image data so obtained and perform pattern recognition processing on the characters and figures present in the restored frontal image data to obtain a result.
[0010]
As a result, even if there is hand shake at the time of shooting, the most nearly frontal image data is automatically selected from the plurality of consecutively captured pieces of image data and is then restored to true frontal image data by inverse perspective transformation processing. This yields image data of a quality that permits character and figure recognition, so that the characters and figures in the image data can be recognized accurately.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention will be described with reference to the drawings.
[0012]
FIG. 1 shows the configuration of an apparatus for carrying out the present invention. In the figure, reference numeral (101) denotes a CPU which performs processing according to a program stored in a ROM (105). In addition to the program, the ROM (105) stores reference data necessary for image processing and front image determination. The RAM (106) is used as a program load area, a working area, a temporary storage area for image data being processed, and the like. In addition to the program, the external storage (108) temporarily stores reference data used for image processing and determination, image data being processed, and the like.
[0013]
Reference numeral (102) denotes a keyboard, through which instructions for capture timing and for control of the entire apparatus are input to the CPU.
[0014]
Reference numeral (103) denotes a display unit that displays raw captured image data, converted image data, a pattern recognition result of the image data, and the like according to instructions from the CPU.
[0015]
(104) is an image input unit (CCD unit of a CCD camera or a digital camera), which captures and inputs one piece of image data in accordance with an instruction from the CPU.
[0016]
FIG. 2 is a block diagram showing a configuration of processing of the entire apparatus. First, one piece of image data photographed by the image data input unit (110) is recorded in the memory.
[0017]
The viewpoint position calculation unit (111) is a block that back-calculates, from a plurality of straight-line components present in one piece of input image data, the position (including rotation) of the camera at the time the image data was captured.
[0018]
The appropriate image determination unit (112) is a block that determines whether the plurality of pieces of image data includes any whose amount of deviation from the front is within the allowable range and, if so, selects from among them the image data closest to the front.
[0019]
The reverse perspective conversion unit (113) is a block that converts selected image data into image data when the camera is in front based on the calculated position of the camera.
[0020]
The character recognition unit (114) is a block for recognizing characters and figures existing in the image data.
[0021]
FIG. 7 is a diagram showing a characteristic operation of the present invention. In the figure, G01 to G06 are pieces of image data taken continuously. The four line segments constituting a quadrangle are detected in each of these images, and from the directions of those line segments the image data G04, which was photographed most nearly from the front, is selected. G041 is the result of applying the inverse perspective transformation process to the selected image data G04.
[0022]
FIG. 8 is a flowchart of processing for continuously capturing a plurality of pieces of image data and selecting image data closest to the front from the image data. Details of the processing will be described below.
[0023]
First, in the initialization process of step S101, the CPU (101) clears the photographed image number counter.
[0024]
In the process of step S102, the CPU (101) instructs the image input unit (104) to shoot, and temporarily stores the image data for one shot in the memory RAM (106).
[0025]
In the processing of step S103, the viewpoint position calculation unit (111) first calculates, for the captured image data, the spatial position P (xp, yp, zp, rp) of the camera unit that input the image data. Specifically, a plurality of line segments of at least a predetermined length (for example, the four sides of the quadrangle in image G01 of FIG. 7) are extracted from the image data, and the position P (xp, yp, zp, rp) of the camera unit of the image input unit is calculated from their inclinations. In FIG. 9, this is shown as the position of the camera (P, R). X is the horizontal coordinate of the subject and Y is the vertical coordinate; xp and yp are the horizontal and vertical polar coordinates on a sphere centered on the subject; zp is the distance between the image input unit (viewpoint position) and the subject; and rp is the rotation between the image input unit and the subject.
[0026]
In FIG. 9, the deviation of the viewpoint position (camera (P, R)) from the front of the subject is calculated as an angular deviation after conversion to polar coordinates. Here, the center G of the polar coordinates is the center of the subject (for a quadrilateral, the intersection of its diagonals), the long-side direction of the quadrilateral on the subject plane is the X axis, and the short-side direction on the same plane is the Y axis. The Z axis is taken perpendicular to the subject plane and represents the distance from the center G of the subject. Rotation is expressed as the rotational offset, about the Z axis, between the reference plane of the camera and the subject plane. The amount of deviation from the front is thus expressed as angles in polar coordinates. In FIG. 9, it is obtained as [xp, yp, rp], the angles between the viewpoint axis and the frontal axis on the sphere at the viewpoint distance zp. The sum of xp, yp, and rp is then calculated as the amount of deviation S from the front, and the viewpoint position P and the amount of deviation S are temporarily stored in the memory (RAM 106) in association with the image data.
[0027]
Thereafter, in step S104, it is determined whether or not the number of photographed images has reached a predetermined number. If the predetermined number of images has not been reached, shooting is repeated until the number is reached or until an instruction to stop shooting is issued from the CPU (No in step S104). If the predetermined number has been reached (Yes in step S104), the process proceeds to step S105.
[0028]
In the process of step S105, the appropriate image determination unit (112) determines whether the plurality of pieces of image data includes any whose amount of deviation from the front is within the allowable range. If there is such image data (Yes in step S105), the process proceeds to step S106, and the image data with the smallest amount of deviation is selected as the image data closest to the front. If there is none (No in step S105), the result is reported to the CPU and photographing is restarted from the beginning.
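The overall flow of steps S101 to S106 can be sketched as follows. This is only an illustration under stated assumptions: `capture`, `deviation`, and `max_attempts` are hypothetical names that do not appear in the patent, the deviation is assumed to be a non-negative number, and the retry limit is added so that the sketch always terminates.

```python
from typing import Callable, List, Optional, Tuple

Frame = object  # placeholder type for one captured image (assumption)


def capture_and_select(
    capture: Callable[[], Frame],
    deviation: Callable[[Frame], float],
    num_frames: int,
    tolerance: float,
    max_attempts: int = 3,
) -> Optional[Tuple[int, Frame]]:
    """Shoot bursts until one frame is within tolerance; return (index, frame)."""
    for _ in range(max_attempts):
        # S101-S104: capture a fixed number of frames, scoring each one.
        frames: List[Frame] = []
        scores: List[float] = []
        while len(frames) < num_frames:
            frame = capture()
            frames.append(frame)
            scores.append(deviation(frame))
        # S105: is any frame inside the allowable range?
        candidates = [i for i, s in enumerate(scores) if s <= tolerance]
        if candidates:
            # S106: pick the candidate with the smallest deviation.
            best = min(candidates, key=lambda i: scores[i])
            return best, frames[best]
        # No acceptable frame: start over with a fresh burst.
    return None
```

Passing the capture and scoring steps in as callables keeps the control flow independent of any particular camera interface or deviation measure.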
[0029]
Hereinafter, the process of step S105 will be described in detail with reference to FIG. 10.
[0030]
First, in steps S201 to S203, a counter and the like are initialized. Then, in step S204, it is determined whether the amount of deviation of the current image data is within the allowable range. The process of step S204 will be described with reference to FIG. 11. First, in step S411, the deviation values (xp, yp, rp) of the image data are acquired. Next, in step S412, it is determined whether xp is within the allowed range of deviation in the X direction, yp is within the allowed range in the Y direction, and rp is within the allowed range of rotation. If xp, yp, and rp are all within range, the amount of deviation S is calculated in step S415. It is then determined whether S is within the allowable range. If it is, the determination result is that the deviation of the image data is within the allowable range (step S417); if it is not, the determination result is that the deviation is outside the allowable range (step S418). In either case the process of step S204 then ends.
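The two-stage check of step S204 (each component first, then the total) might look like the sketch below. The class and function names are hypothetical. The text only says that S is the sum of xp, yp, and rp; summing magnitudes, so that offsets of opposite sign cannot cancel, is an assumption made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deviation:
    xp: float  # horizontal angular offset from the frontal axis
    yp: float  # vertical angular offset from the frontal axis
    rp: float  # rotation about the Z axis


@dataclass(frozen=True)
class Limits:
    xp: float
    yp: float
    rp: float
    total: float


def total_deviation(d: Deviation) -> float:
    # Assumption: magnitudes are summed (the source says only "the sum").
    return abs(d.xp) + abs(d.yp) + abs(d.rp)


def within_tolerance(d: Deviation, lim: Limits) -> bool:
    # S412: every component must be inside its own range.
    if abs(d.xp) > lim.xp or abs(d.yp) > lim.yp or abs(d.rp) > lim.rp:
        return False
    # S415 onward: the total S must also be inside its range.
    return total_deviation(d) <= lim.total
```

Checking the components before the total lets a frame be rejected early when a single axis is badly off, even if the other two are nearly perfect.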
[0031]
In the process of step S106, the appropriate image determination unit (112) selects the image data closest to the front (the image data with the smallest amount of deviation) from the plurality of pieces of image data, records the selection result in the memory (RAM 106), reports it to the CPU, and ends the continuous shooting and appropriate image selection processing.
[0032]
The process of step S106 will be described in detail with reference to FIG. 12. First, the counter is initialized in step S601. Next, in step S602, the J-th image data whose amount of deviation is within the allowable range is identified; in step S603, its deviation values are read from the memory; and in step S604, the amount of deviation S is calculated. Steps S605 and S606 form the path that initially sets Smin and Jmin. In steps S607 and S608, the minimum value of S (Smin) and the number of the corresponding image data (Jmin) are updated, so that the smallest S found so far and its image number are retained. In step S610, it is determined whether the comparison has been completed for all image data whose deviation is within the allowable range. When all have been processed, step S611 gives the Jmin-th image data as the selection result, that is, as the most nearly frontal image data, and the process of step S106 ends.
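The running-minimum loop of step S106 can be written out explicitly. The sketch below is illustrative only: the function name and the parallel-list input format are assumptions, and indices are zero-based, unlike the image numbers in the flowchart.

```python
from typing import Optional, Sequence, Tuple


def select_min_deviation(
    deviations: Sequence[float], accepted: Sequence[bool]
) -> Optional[Tuple[int, float]]:
    """Return (Jmin, Smin) over the accepted frames, or None if there are none."""
    s_min: Optional[float] = None
    j_min: Optional[int] = None
    for j, (s, ok) in enumerate(zip(deviations, accepted)):
        if not ok:
            continue  # S602: only frames inside the allowable range are compared
        if s_min is None or s < s_min:
            # S605/S606 on the first accepted frame, S607/S608 afterwards.
            s_min, j_min = s, j
    if j_min is None or s_min is None:
        return None
    return j_min, s_min
```

Because the comparison is strict, the earliest frame wins when two accepted frames have the same deviation.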
[0033]
In the above embodiment, the amount of deviation from the front is obtained from the camera viewpoint position calculated by the viewpoint position calculation unit (111). As an alternative, as shown in FIG. 13, the difference (an approximate measure of distortion) between the quadrilateral formed by the detected line segments and the rectangle enclosing that quadrilateral can be treated as the amount of deviation from the front. The main parts of this alternative, the processes from step S103 to step S106, are described in detail below.
[0034]
In the process of step S103, a plurality of line segments of at least a predetermined length are extracted from the image data to form a quadrilateral. Next, as shown in FIG. 13, the rectangle circumscribing the extracted quadrilateral is obtained, and the angles formed by each pair of corresponding sides are calculated as the deviations {q1, q2, q3, q4} using Equations 1 to 4. In the following equations, a direction angle is expressed as the clockwise angle from the reference direction, where the direction of side B1B4 (horizontal, rightward) in FIG. 13 is the reference direction (0 degrees). The amount of deviation Q from the front is calculated by Equation 5 and is temporarily stored in the memory RAM (106) in association with the image data. In Equation 5, |q| represents the absolute value of q.
[0035]
[Equation 1]

[0036]
[Equation 2]

[0037]
[Equation 3]

[0038]
[Equation 4]

[0039]
[Equation 5]
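Equations 1 to 5 are not reproduced in this text, so the following is only one plausible reading of the surrounding description, not the patent's formulas. It assumes the quadrilateral corners are given in the order top-left, top-right, bottom-right, bottom-left in image coordinates (y increasing downward), takes the axis-aligned bounding box as the circumscribed rectangle, measures each q as the signed angle between corresponding sides, and sums the absolute values to obtain Q. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bounding_rectangle(quad: Sequence[Point]) -> List[Point]:
    """Axis-aligned rectangle enclosing the quadrilateral, same corner order."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    left, right, top, bottom = min(xs), max(xs), min(ys), max(ys)
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


def direction_deg(a: Point, b: Point) -> float:
    """Direction of the side a->b in degrees (0 = horizontal, rightward)."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))


def wrap_deg(angle: float) -> float:
    """Wrap an angle into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def side_deviations(quad: Sequence[Point]) -> List[float]:
    """Angle between each quadrilateral side and the matching rectangle side."""
    rect = bounding_rectangle(quad)
    qs = []
    for i in range(4):
        j = (i + 1) % 4
        diff = direction_deg(quad[i], quad[j]) - direction_deg(rect[i], rect[j])
        qs.append(wrap_deg(diff))
    return qs


def total_q(qs: Sequence[float]) -> float:
    """Sum of absolute side deviations (assumed form of Q)."""
    return sum(abs(q) for q in qs)
```

For a quadrilateral that is already an axis-aligned rectangle every q is zero, so Q grows only as perspective distortion tilts the sides away from the enclosing rectangle.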

[0040]
Thereafter, in step S104, it is determined whether or not the number of photographed images has reached a predetermined number. If the predetermined number of images has not been reached, shooting is repeated until the number is reached or until an instruction to stop shooting is issued from the CPU (No in step S104). If the predetermined number has been reached (Yes in step S104), the process proceeds to step S105.
[0041]
In the process of step S105, the appropriate image determination unit (112) determines whether the plurality of pieces of image data includes any whose amount of deviation from the front is within the allowable range. If there is such image data (Yes in step S105), the process proceeds to step S106, and the image data with the smallest amount of deviation is selected as the image data closest to the front. If there is none (No in step S105), the result is reported to the CPU and photographing is restarted from the beginning.
[0042]
Hereinafter, the process of step S105 will be described in detail with reference to FIG. 10.
[0043]
First, in steps S201 to S203, a counter and the like are initialized. Then, in step S204, it is determined whether the amount of deviation of the current image data is within the allowable range. The process of step S204 in this case will be described with reference to FIG. 14. First, in step S421, the deviation values (q1, q2, q3, q4) of the image data are acquired. Next, in steps S422 to S425, it is determined whether each of the deviation values (q1, q2, q3, q4) is within its own allowable range. If they are, the total amount of deviation is calculated in step S426 and checked against the allowable range in step S427. If the total is also within the allowable range, the determination result is that the amount of deviation of the image data is within the allowable range, and the process of step S204 ends.
[0044]
In the process of step S106, the appropriate image determination unit (112) selects the image data closest to the front (the image data with the smallest amount of deviation) from the plurality of pieces of image data, records the selection result in the memory (RAM 106), reports it to the CPU, and ends the continuous shooting and appropriate image selection processing.
[0045]
The process of step S106 will be described in detail with reference to FIG. 15. First, the counter is initialized in step S621. Next, in step S622, the J-th image data whose amount of deviation is within the allowable range is identified; in step S623, its deviation values are read from the memory; and in step S624, the amount of deviation Q is calculated. Steps S625 and S626 form the path that initially sets Qmin and Jmin. In steps S627 and S628, the minimum value of Q (Qmin) and the number of the corresponding image data (Jmin) are updated, so that the smallest Q found so far and its image number are retained. In step S630, it is determined whether the comparison has been completed for all image data whose deviation is within the allowable range. When all have been processed, step S631 gives the Jmin-th image data as the selection result, that is, as the most nearly frontal image data, and the process of step S106 ends.
[0046]
In addition, in the processing block diagram of FIG. 2, the inverse perspective transformation unit (113) applies inverse perspective transformation processing to the image data closest to the front, as determined by the appropriate image determination unit (112), to convert it into true frontal image data, and the result is stored in the memory.
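The text does not spell out the inverse perspective transformation itself. A common way to realize such a correction, shown here purely as an illustrative stand-in and not as the patent's method, is to estimate the 3x3 projective transform (homography) that maps the four detected corners onto the corners of an upright rectangle. The sketch uses only the standard library, all names are hypothetical, and it transforms points rather than resampling pixels.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Row-major 3x3 transform mapping four src points to four dst points."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]  # last entry fixed to 1


def apply_homography(h: Sequence[float], p: Point) -> Point:
    """Map one point through the transform, dividing by the projective weight."""
    x, y = p
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)
```

To rectify a whole image, one would typically compute the transform in the opposite direction (rectangle to quadrilateral) and sample the source image at the mapped position of each output pixel.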
[0047]
Further, the character recognition unit (114) recognizes the characters in the frontal image data. The character recognition unit (114) first projects the image onto the Y axis, detects the portions of the resulting black-pixel distribution that contain few black pixels as line spacing, and divides the image into line-by-line image data. Each line image is then divided into character-by-character image data: the line image is projected onto the X axis, and the portions of the black-pixel distribution that contain few black pixels are treated as character spacing. The similarity between each resulting character image and every character in the collation dictionary of the character recognition unit is then calculated, and the character with the highest similarity is taken as the recognition result.
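The projection-based segmentation described above can be sketched as follows. The representation of the image as nested lists of 0/1 values, the function names, and the fixed `threshold` for what counts as "few black pixels" are all assumptions made for illustration; the similarity matching against the dictionary is not shown.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # 1 = black pixel, 0 = white pixel (assumption)


def runs_above(profile: Sequence[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index runs where profile > threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(img: Image) -> List[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) half-open boxes, one per character."""
    boxes: List[Tuple[int, int, int, int]] = []
    row_profile = [sum(row) for row in img]  # projection onto the Y axis
    for top, bottom in runs_above(row_profile):
        line = img[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]  # projection onto the X axis
        for left, right in runs_above(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes
```

With the threshold at zero, any completely white row or column acts as a separator; a real document image would likely need a small positive threshold to tolerate noise.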
[0048]
The method of extracting character strings and characters from image data is detailed in "Document structure analysis by the extended split detection method" (Information Processing Society of Japan (late fiscal 1997) 55th National Conference 2-105,106). The character recognition method is detailed in "Analysis of Handwritten Kanji by the Directional Pattern Matching Method" (The Institute of Electronics, Information and Communication Engineers, Vol. J65-D, No. 5, pp. 550-557, 1982.5).
“Reverse perspective transformation” is described in “Television Image Information Engineering Handbook” (Television Society, Ohmsha), pp.462. Camera calibration, such as camera position and orientation estimation, is described in "Computer Vision, Technical Review and Future Prospects" (New Technology Communications) pp. 37-53, and pp. 97-122.
[0049]
In the embodiment of FIG. 1, the case where the above-described image input processing method is executed by a program stored in the ROM 105 has been shown. This program may instead be stored in an external storage medium (108) such as a floppy disk or a hard disk.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of an apparatus for illustrating an embodiment of the present invention.
FIG. 2 is a block configuration diagram of processing according to an embodiment of the present invention.
FIG. 3 is a conceptual illustration of shooting with a camera fixed to a personal computer.
FIG. 4 is a conceptual illustration of shooting with a portable camera such as a digital still camera.
FIG. 5 is a conceptual illustration of an image obtained by photographing a subject from the front.
FIG. 6 is a conceptual illustration of images taken from the front (center, vertical) direction, the left direction, and the downward direction with respect to a subject.
FIG. 7 is a diagram illustrating a concept of automatically selecting a front image from continuously captured images.
FIG. 8 is a flowchart illustrating a process of continuously capturing a plurality of images and selecting an optimal image as an image captured from the front.
FIG. 9 is a diagram showing a viewpoint position in terms of polar coordinates.
FIG. 10 is a flowchart of a determination process (S105) as to whether there is a deviation amount within an allowable range among a plurality of pieces of image data.
FIG. 11 is a flowchart of a process for determining whether the amount of deviation is within an allowable range (the process of S204).
FIG. 12 is a flowchart of image data determination processing (processing in S106) with a minimum amount of deviation.
FIG. 13 is a diagram illustrating a concept of an amount of deviation between a target rectangle and a circumscribed rectangle.
FIG. 14 is a flowchart of a determination process (S204) as to whether or not the deviation from the circumscribed rectangle is within an allowable range.
FIG. 15 is a flowchart of image data determination processing (processing of S106) with the smallest amount of deviation in another embodiment (circumscribed rectangle).
[Explanation of symbols]

Claims

An image input apparatus comprising:
means for continuously photographing a subject and inputting a plurality of image data;
means for extracting, from each of the plurality of image data, a quadrilateral composed of a plurality of line segments of a predetermined length or more, obtaining a quadrilateral circumscribing it, calculating the angle difference formed by each pair of corresponding sides, and determining that, among the plurality of image data, the image data with the smallest amount Q of deviation from the front, which is the sum of the angle differences over the four sides, is the image data photographed from the front; and
means for recognizing characters and figures in the image data photographed from the front.

An image input apparatus comprising:
means for continuously photographing a subject and inputting a plurality of image data;
means for extracting, from each of the plurality of image data, a quadrilateral composed of a plurality of line segments of a predetermined length or more, obtaining a quadrilateral circumscribing it, calculating the angle difference formed by each pair of corresponding sides, and determining that, among the plurality of image data, the image data with the smallest amount Q of deviation from the front, which is the sum of the angle differences over the four sides, is the image data photographed from the front;
means for converting the image data photographed most nearly from the front into image data photographed from directly in front; and
means for recognizing characters and figures in the image data converted into image data photographed from directly in front.

Continuously photographing a subject and inputting a plurality of image data;
Extracting, from each of the plurality of image data, a quadrilateral composed of a plurality of line segments of a predetermined length or more, obtaining a quadrilateral circumscribing it, calculating the angle difference formed by each pair of corresponding sides, and determining that the image data having the smallest amount Q of deviation from the front, which is the sum of the angle differences, is the image data photographed from the front; and
Recognizing characters and figures in the image data photographed from the front.

Continuously photographing a subject and inputting a plurality of image data;
Extracting, from each of the plurality of image data, a quadrilateral composed of a plurality of line segments of a predetermined length or more, obtaining a quadrilateral circumscribing it, calculating the angle difference formed by each pair of corresponding sides, and determining that the image data having the smallest amount Q of deviation from the front, which is the sum of the angle differences, is the image data photographed from the front;
Converting the image data photographed most nearly from the front into image data photographed from directly in front; and
Recognizing characters and figures in the image data converted into image data photographed from directly in front.
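
For reference, the deviation amount Q recited in the claims (the sum, over the four sides, of the angle difference between each side of the extracted quadrilateral and the corresponding side of the circumscribing quadrilateral) can be sketched as follows. The sketch assumes that the circumscribing quadrilateral is an axis-aligned rectangle, so that its top and bottom sides are horizontal and its left and right sides are vertical, and that the corners are supplied in the order top-left, top-right, bottom-right, bottom-left. These assumptions, and the names `side_angle_difference`, `deviation_q`, and `most_frontal`, are illustrative and are not taken from the claims.

```python
import math


def side_angle_difference(p, q, horizontal):
    """Acute angle (radians) between segment p-q and a horizontal or vertical line."""
    dx, dy = abs(q[0] - p[0]), abs(q[1] - p[1])
    return math.atan2(dy, dx) if horizontal else math.atan2(dx, dy)


def deviation_q(quad):
    """Sum of the four side-angle differences for one quadrilateral.

    quad: corners as (x, y) in the order top-left, top-right,
    bottom-right, bottom-left (assumed ordering).
    """
    tl, tr, br, bl = quad
    return (side_angle_difference(tl, tr, True)      # top side vs. horizontal
            + side_angle_difference(tr, br, False)   # right side vs. vertical
            + side_angle_difference(br, bl, True)    # bottom side vs. horizontal
            + side_angle_difference(bl, tl, False))  # left side vs. vertical


def most_frontal(quads):
    """Index of the frame whose quadrilateral has the smallest Q."""
    return min(range(len(quads)), key=lambda i: deviation_q(quads[i]))


# Hypothetical quadrilaterals extracted from three consecutive frames.
frames = [
    [(5, 0), (95, 10), (100, 90), (0, 100)],   # strongly skewed
    [(0, 0), (100, 0), (100, 80), (0, 80)],    # exact rectangle, Q = 0
    [(2, 0), (100, 3), (98, 80), (0, 78)],     # slightly skewed
]
best = most_frontal(frames)
```

With these sample values, `best` is 1, the frame whose quadrilateral coincides with its circumscribing rectangle.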