JP4145014B2

JP4145014B2 - Image processing device

Info

Publication number: JP4145014B2
Application number: JP2001004116A
Authority: JP
Inventors: 青木　　伸; 憲彦村田; 貴史北口
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-01-11
Filing date: 2001-01-11
Publication date: 2008-09-03
Anticipated expiration: 2021-01-11
Also published as: JP2002207963A

Description

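[0001]
[Technical Field of the Invention]
The present invention relates to an image processing apparatus, and in particular to an image processing apparatus that corrects an image captured by a digital camera into image data suitable for image processing, and that performs character recognition on the corrected image data or combines a plurality of captured images, so that a user can easily recognize the content of the image data.
[0002]
More specifically, the present invention relates to an image processing apparatus that corrects a plurality of images obtained by photographing a planar object in several parts and generates a combined image.
[0003]
[Prior Art]
The following methods are known as conventional techniques for acquiring an image from an image input device such as a digital camera and generating easy-to-read image data from it.
[0004]
First, as a method for photographing a planar object (for example, a document) in several parts, there is Japanese Patent Laid-Open No. 11-232378, "Digital camera, document processing system using the digital camera, computer-readable storage medium, and program code sending device".
[0005]
In this method, a single sheet of paper serving as the subject is divided into parts and photographed as a plurality of images with a digital camera. While viewing the screen, the user interactively specifies deformation parameters, and perspective correction is applied to transform each image into one that appears to have been taken squarely facing the subject. Character strings are then read from the perspective-corrected images by OCR processing. The character strings read from the captured images are joined according to the arrangement of the divided shots, and a character string corresponding to the whole original page is output. As a result, a large document can be OCR-processed with a low-resolution camera in the same way as if it had been captured at high resolution.
[0006]
However, in this conventional method of photographing a document in parts with a digital camera, the user must set the perspective-correction parameters manually, and setting them for every divided image is laborious. In addition, no integrated image data is generated from the divided shots; only text data is finally integrated and output. Consequently, integrated image data cannot be generated and output for documents that mix text and photographs, or for documents in which layout and design are important.
[0007]
As a skew correction process for OCR, there is also Japanese Patent Laid-Open No. 6-203202, "Image processing apparatus".
[0008]
In this method, the image data is subjected to reduction and dilation processing to extract connected components corresponding to words, lines, and the like; a histogram of the positions of the connected components projected in each direction is computed; the inclination of the entire image is estimated from the mode; and the image is rotated in the opposite direction so that each line becomes horizontal.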
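The projection-histogram idea of the cited publication can be illustrated with a short Python sketch. It is not the code of Japanese Patent Laid-Open No. 6-203202. The function name `estimate_skew`, the 1-degree grid of candidate angles and the bin width are all assumptions, and the connected-component extraction and final rotation are left out. For each candidate angle, the component centers are projected onto the normal of a line with that inclination. At the true skew, components on the same text line fall into the same bin, so the sketch returns the angle whose histogram has the tallest mode.

```python
import math
from collections import Counter


def estimate_skew(centers, angles_deg=range(-10, 11), bin_width=0.5):
    """Return the candidate angle (degrees) whose projection histogram has the tallest mode.

    centers: (x, y) positions of connected components in image coordinates.
    Each center is projected onto the normal (-sin t, cos t) of a line
    inclined by the candidate angle t; at the true skew, components on the
    same text line land in the same histogram bin.
    """
    if not centers:
        return None
    best_angle, best_height = None, -1
    for a in angles_deg:
        t = math.radians(a)
        hist = Counter(
            math.floor((-x * math.sin(t) + y * math.cos(t)) / bin_width)
            for x, y in centers
        )
        height = max(hist.values())  # number of components in the modal bin
        if height > best_height:  # ties keep the first angle tried
            best_angle, best_height = a, height
    return best_angle
```

Take three synthetic text lines tilted by 5 degrees. At 5 degrees, all centers on a line share one bin, so `estimate_skew` returns 5. Scoring each angle by the height of its mode keeps the sketch close to the text's use of the histogram mode.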
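[0009]
Another skew correction method for OCR is Japanese Patent Laid-Open No. 6-150060, "Image inclination detection method and table processing method".
[0010]
In this method, the direction of each connected component is estimated from its circumscribed rectangle as follows. An example is shown in FIG. 17.
[0011]
- Count the number of black pixels in the circumscribed rectangle a.
[0012]
- Divide the number of black pixels by the length b of the long side of the circumscribed rectangle to estimate the width of the black-pixel region.
[0013]
- Take the ratio of (the short-side length c of the circumscribed rectangle minus the estimated black-pixel width d) to the long-side length b, that is e = (c - d) / b, as the gradient e (estimated gradient) of the connected component.
[0014]
Furthermore, these conventional skew correction processes correct the image by rotating the whole image within the image plane. They therefore cannot fully correct images such as camera photographs, in which perspective projection makes the inclination of the lines vary with position in the image.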
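[0015]
[Problems to Be Solved by the Invention]
The present invention has been made to solve the above problems. Its object is to provide an image processing apparatus that makes captured images of documents easy to read, including documents that mix text and photographs and documents in which layout and design are important.
[0016]
[Means for Solving the Problems]
The above object is achieved by an image processing apparatus that photographs a subject and generates easy-to-read image data from the captured image. The apparatus comprises:
- captured-data generating means for photographing a planar subject and generating image data;
- direction detecting means for detecting one direction of the subject plane from a single piece of generated image data;
- image data correcting means for correcting the image data so that the group of straight lines on the image corresponding to the detected direction on the subject plane points in a fixed direction.

With such means, easy-to-read image data can be generated from a single image captured with a digital camera. The correction aligns the line directions of the document, which differ from place to place in the image.
[0017]
Further, the direction detecting means comprises:
- direction candidate calculating means, which extracts connected components from a single piece of image data captured by the captured-data generating means and calculates a plurality of direction candidates from the positions and directions of the extracted components;
- direction calculating means, which projects the calculated direction candidates onto a plane and obtains the direction of the subject by taking a histogram on that plane.

Noise means the connected components do not always correspond to parallel directions on the subject plane. This arrangement reduces the influence of that noise.
[0018]
Further, the image data correcting means comprises:
- direction determining means, which determines whether the direction on the subject detected by the direction detecting means is closer to the horizontal or the vertical of the image data;
- direction deciding means, which decides, from the determined direction, the direction of the group of straight lines in the corrected image data.

As a result, the line direction can be aligned horizontally or vertically to produce easy-to-read image data. This works whether the subject is written vertically or horizontally, and whether the camera was held in portrait or landscape orientation.
[0019]
The object of the present invention is also achieved by an image processing apparatus that photographs a subject and recognizes characters in the captured image. The apparatus comprises:
- captured-data generating means for photographing a planar subject and generating image data;
- direction detecting means for detecting one direction of the subject plane from a single piece of image data generated by the image data generating means;
- character recognizing means for recognizing the image data based on the direction on the subject plane detected by the direction detecting means.

With such means, the distortion is corrected using only a single captured image before character recognition is performed. Accurate text data with a low misrecognition rate can therefore be obtained easily with a camera.
The object of the present invention is also achieved by an image processing apparatus that photographs a subject and combines image data from a plurality of captured images. The apparatus comprises:
- image data generating means for photographing a planar subject a plurality of times and generating image data;
- direction detecting means for detecting one direction on the subject plane from a single piece of captured image data;
- character recognizing means for performing character recognition on the image data based on the detected direction on the subject plane;
- deformation parameter calculating means for calculating deformation parameters from the types and positions of the recognized characters;
- combining means for combining the plurality of captured images using the deformation parameters.

With such means, the combining process uses the results of character recognition, including character positions. Documents that offer few matching cues other than the character types can therefore still be stitched together accurately.
[0020]
The object of the present invention is also achieved by an image processing apparatus that combines image data from a plurality of captured images. The apparatus comprises:
- image acquiring means for acquiring image data captured by a digital camera;
- direction detecting means for detecting one direction on the subject plane from the acquired image data;
- character recognizing means for performing character recognition on the image data based on the direction on the subject plane detected by the direction detecting means;
- deformation parameter calculating means for calculating deformation parameters from the types and positions of the recognized characters;
- combining means for combining the plurality of captured images using the deformation parameters.

The digital camera itself may lack direction detection, correction and combining functions. In that case, the camera is connected, for example via a communication link, to a computer that has these functions, and the captured image data is passed to that computer. Accurately combined image data can thus be obtained without special hardware in the camera.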
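[0021]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings.
[0022]
In the following description, the subject is assumed to be a planar object with character strings drawn on its surface.
[0023]
[Embodiment 1]
A single close-up image of a document may give no information for estimating the orientation of the subject other than the line direction. The orientation here means the parameters of the tilt distortion. In such a case, complete correction of the tilt distortion is impossible in principle. However, if at least the line direction can be aligned, the document becomes easier to read even if distortion remains in the direction perpendicular to the lines. FIG. 1(a) shows an image of a horizontally written document. Even if the fully corrected image of FIG. 1(b) cannot be obtained from it, aligning only the line direction, as in FIG. 1(c), clearly improves readability. FIG. 1 shows horizontal writing, where the lines are aligned horizontally; for vertical writing, the lines are aligned vertically.
[0024]
Embodiment 1 therefore describes an example inside a digital camera. From a single image of a planar subject, the camera generates corrected image data in which the lines on the image point in a fixed direction, and records it on a recording medium.
[0025]
FIG. 2 shows the configuration of the apparatus according to Embodiment 1 of the present invention.
[0026]
The apparatus shown in the figure comprises an image capturing unit 11, an image memory 12, a line direction estimating unit 13, a deformation parameter setting unit 14, an image correcting unit 15, an image compressing unit 16, a nonvolatile memory 17, a CPU 18, an interface circuit 19, and an LCD panel 20.
[0027]
The image capturing unit 11 works like that of an ordinary digital camera. It uses a lens, a CCD sensor, an A/D conversion circuit, a color conversion/filter processing circuit and so on to photograph the subject. It records the digital image data (hereinafter, image data) in the image memory 12 inside the digital camera.
[0028]
As shown in FIG. 3, the line direction estimating unit 13 acquires the captured image data from the image memory 12. It estimates the line direction on the subject plane a as a three-dimensional direction in the imaging-plane coordinate system. Details are described later.
[0029]
The deformation parameter setting unit 14 sets a virtual projection plane for obtaining the corrected image, based on the line direction estimated by the line direction estimating unit 13. It then calculates the perspective transformation parameters for projecting the captured image onto this plane. Details are described later.
[0030]
The image correcting unit 15 applies a perspective transformation to the captured image data acquired from the image memory 12, using the parameters calculated by the deformation parameter setting unit 14. The result is an image in which the lines of the subject run horizontally.
[0031]
The image compressing unit 16 compresses the perspective-transformed image data and records it in the nonvolatile memory 17, as in an ordinary digital camera.
[0032]
The nonvolatile memory 17 stores the image data compressed by the image compressing unit 16; the stored data is read out on request from the CPU 18.
[0033]
The CPU 18 controls the reading of image data from the nonvolatile memory 17 to the interface circuit 19 and the LCD panel 20, and controls their operation. Under this control, the interface circuit 19 transfers image data acquired from the nonvolatile memory 17 to an external computer or printer. The LCD panel 20 displays information on the state of the apparatus.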
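[0034]
The line direction estimating unit 13 is now described in detail.
[0035]
The line direction estimating unit 13 basically recognizes the regions of the captured image that correspond to lines of text and detects their direction. However, line recognition is affected by noise. The target document may also contain regions other than lines that have a direction of their own, such as figures and headings. Embodiment 1 therefore detects many regions that appear to correspond to lines and takes the mode of their directions, so that the line direction is detected stably.
[0036]
FIG. 4 is a flowchart of the operation of the line direction estimating unit according to Embodiment 1 of the present invention.
[0037]
The operation of the line direction estimating unit 13 is described below following the flowchart of FIG. 4.
[0038]
Step 101) The line direction estimating unit 13 inputs the image data to be corrected, which was captured by the image capturing unit 11 and stored in the image memory 12.
[0039]
Step 102) The line direction estimating unit 13 extracts connected components. It binarizes the image data acquired from the image memory 12, applies reduction and dilation processing, and then finds the connected components. Adjacent characters merge into large regions that correspond to a line or part of a line. As shown in FIG. 5, however, some components may not correspond to the expected "line or part of a line". For example, a component may be too short or may span several lines.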
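Step 102 can be sketched in Python as follows. This is an illustrative stand-in, not the patent's implementation. The function name `extract_components`, the fixed threshold, and the horizontal dilation of `radius` pixels used in place of the reduction/dilation processing are assumptions; the dilation suits only horizontal writing. Pixels darker than the threshold count as ink, and components are labelled with 8-connectivity by a breadth-first flood fill.

```python
from collections import deque


def extract_components(gray, threshold=128, radius=2):
    """Binarize, dilate horizontally, and label 8-connected components.

    gray: list of rows of 0-255 intensities (dark ink on light paper).
    radius: horizontal dilation in pixels (assumed stand-in for the
    reduction/dilation step; suits horizontal writing only).
    Returns a list of components, each a list of (x, y) coordinates.
    """
    h, w = len(gray), len(gray[0])
    ink = [[gray[y][x] < threshold for x in range(w)] for y in range(h)]
    # Horizontal dilation: a pixel is set if any ink lies within `radius` columns.
    grown = [[any(ink[y][max(0, x - radius):x + radius + 1]) for x in range(w)]
             for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    components = []
    for sy in range(h):
        for sx in range(w):
            if not grown[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill over the 8-neighbourhood.
            seen[sy][sx] = True
            queue, pixels = deque([(sx, sy)]), []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and grown[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            components.append(pixels)
    return components
```

Larger values of `radius` join characters across wider gaps; `radius=0` labels the binarized ink as is.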
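[0040]
Step 103) Next, the position and inclination are calculated. For each connected component extracted in step 102, the circumscribed rectangle is obtained from the maximum and minimum X and Y coordinates of the component's pixels. The center of the rectangle is recorded as the position of the component. This embodiment computes the inclination with the method of Japanese Patent Laid-Open No. 6-150060, mentioned above and illustrated in FIG. 17:
- Count the black pixels in the circumscribed rectangle a.
- Divide that count by the long-side length b of the rectangle to estimate the width d of the black-pixel region.
- Subtract d from the short-side length c of the rectangle and divide by b. The result, e = (c - d) / b, is the gradient e of the component.

The inclination of each connected component is calculated in this way and recorded.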
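A minimal Python sketch of the step 103 computation follows. It uses the symbols of the text: b is the long side, c the short side, d = black-pixel count / b, and e = (c - d) / b. The function name is hypothetical. The formula gives only the magnitude of the inclination, so the sketch adds a sign test based on the covariance of the pixel coordinates. That test is an assumption, not part of the cited method. For a component whose long side is vertical, e is measured against the vertical axis.

```python
def position_and_gradient(pixels):
    """Step 103: bounding-box center and signed gradient of one component.

    pixels: (x, y) coordinates of the component's black pixels.
    b, c: long and short side of the circumscribed rectangle;
    d = (black-pixel count) / b estimates the width of the black band;
    e = (c - d) / b is the gradient magnitude from the cited method.
    """
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    center = ((max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2)
    b, c = max(width, height), min(width, height)
    d = len(pixels) / b
    e = (c - d) / b
    # Assumed sign test (not in the cited method): covariance of x and y.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pixels)
    return center, e if cov >= 0 else -e
```

Consider a two-pixel-thick band that climbs 4 rows over 20 columns. Its rectangle is 20 by 5 and it holds 40 pixels, so d = 2 and e = (5 - 2) / 20 = 0.15.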
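[0041]
Step 104) Next, direction candidates are calculated. For two connected components obtained in step 102, the intersection of their lines is found from the positions and inclinations calculated in step 103. The vector from the optical center to this intersection is then computed. FIG. 6 illustrates parallel lines on the subject plane and vanishing-point vectors in the line direction estimating unit according to Embodiment 1 of the present invention. The vectors d and d′ point from the optical center a of the camera toward the vanishing point b on the image plane. They are parallel to the lines e and e′ on the subject plane c that correspond to b. Finding the position of the vanishing point b is therefore equivalent to finding a particular direction contained in the subject plane c. This candidate vector (direction candidate) is computed for every pair of connected components.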
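Step 104 can be sketched as below, under assumptions that are not in the text. Each component is summarised as (cx, cy, slope), with image coordinates measured from the principal point, and f is the focal length in pixels. Instead of intersecting the two image lines explicitly, the sketch takes the normal of the plane through the optical center and each line, then crosses the two normals. The result points from the optical center toward the vanishing point, as in FIG. 6. It stays defined when the two lines are parallel in the image, where the vanishing point lies at infinity. Candidates are scaled so that their x component is 1, which is the normalization step 106 uses.

```python
from itertools import combinations


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def direction_candidate(comp1, comp2, f):
    """Vector toward the vanishing point of two image lines, scaled so x = 1.

    comp1, comp2: (cx, cy, slope), image coordinates relative to the
    principal point; f: focal length in pixels (assumed known).
    Returns None if the lines coincide or the vector has no x component.
    """
    normals = []
    for cx, cy, m in (comp1, comp2):
        p = (cx, cy, f)       # ray from the optical center through the component center
        d = (1.0, m, 0.0)     # direction of the component within the image plane
        normals.append(cross(p, d))  # normal of the plane through the center and the line
    v = cross(normals[0], normals[1])  # direction shared by both planes
    if abs(v[0]) < 1e-12:
        return None
    return tuple(c / v[0] for c in v)


def all_candidates(components, f):
    """Direction candidates for every pair of components (step 104)."""
    result = []
    for a, b in combinations(components, 2):
        v = direction_candidate(a, b, f)
        if v is not None:
            result.append(v)
    return result
```

Two image lines that meet at (100, 0) with f = 50 give the ray (100, 0, 50), which scales to (1, 0, 0.5).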
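[0042]
Step 105) Next, the line direction is determined. Each connected component is classified by whether its inclination is closer to horizontal or to vertical, and the two counts are compared. If components closer to horizontal are in the majority, the document is judged to be written horizontally; otherwise it is judged to be written vertically. For simplicity, the rest of this description covers mainly horizontally written documents. When a document is judged to be written vertically, the same processing is performed with horizontal and vertical interchanged. This determination result is also used in the subsequent deformation parameter calculation.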
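The majority vote of step 105 fits in a few lines of Python. In this sketch, inclinations are angles in radians, for example `math.atan` of a signed gradient. "Closer to horizontal" is taken to mean within 45 degrees of the x axis. Both choices are assumptions. A tie falls to vertical writing, following the text's "otherwise it is judged to be written vertically".

```python
import math


def writing_direction(angles):
    """Step 105: majority vote over component inclinations given in radians.

    An inclination within 45 degrees of the x axis counts as horizontal.
    Ties go to vertical writing, as in the text.
    """
    horizontal = sum(1 for a in angles if abs(math.cos(a)) >= abs(math.sin(a)))
    vertical = len(angles) - horizontal
    return "horizontal" if horizontal > vertical else "vertical"
```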
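[0043]
Step 106) Next, the mode is calculated. Noise means the directions obtained from the connected components in step 104 do not always correspond to parallel directions on the subject plane. The candidate vectors (direction candidates) from the previous step therefore never all agree. A histogram of the many candidate vectors is built and its mode is taken. This reduces the influence of noise and makes the direction estimate reliable. Each direction vector is first normalized so that its x component is 1, and the histogram is taken over its y and z components. For vertical writing, the y component is normalized to 1 and the x and z components are used. For example, take a histogram range of ±0.2 and a step of 0.02. Since arctan(0.2) ≈ 11.3° and arctan(0.02) ≈ 1.1°, the direction can then be found to about 1 degree over a subject tilt range of about 10 degrees.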
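The Python sketch below builds the step 106 histogram with the example settings from the text: a range of ±0.2 and a step of 0.02. The function name, the `lead` parameter for switching to vertical writing, and returning the center of the modal bin are assumptions. The sketch skips candidates whose leading component is zero or whose normalized components fall outside the range.

```python
import math
from collections import Counter


def mode_direction(candidates, half_range=0.2, step=0.02, lead=0):
    """Step 106: 2-D histogram of direction candidates; return the modal bin center.

    Each candidate (x, y, z) is scaled so that component `lead` equals 1
    (lead=0 for horizontal writing, lead=1 for vertical writing); the other
    two components are binned on a grid of `step` within +/-half_range.
    """
    others = [i for i in range(3) if i != lead]
    n_bins = round(2 * half_range / step)
    hist = Counter()
    for v in candidates:
        if abs(v[lead]) < 1e-12:
            continue
        p, q = (v[i] / v[lead] for i in others)
        i = math.floor((p + half_range) / step)
        j = math.floor((q + half_range) / step)
        if 0 <= i < n_bins and 0 <= j < n_bins:
            hist[(i, j)] += 1
    if not hist:
        return None
    (i, j), _ = hist.most_common(1)[0]
    result = [0.0, 0.0, 0.0]
    result[lead] = 1.0
    result[others[0]] = -half_range + (i + 0.5) * step
    result[others[1]] = -half_range + (j + 0.5) * step
    return tuple(result)
```

The binning uses the normalized components rather than the vanishing-point position itself. This follows the stability argument in the next paragraph: the normalized y and z components vary little under noise.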
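[0044]
As described above, the direction vector and the vanishing-point position are equivalent in principle. As FIG. 7 shows, however, the direction vectors point close to the x axis. The vanishing-point position therefore moves a great deal in response to slight noise in the connected components. Its distribution is spread out and its mode is hard to obtain. For the same reason, the y and z components of the direction vectors are only weakly affected by that noise and stay within a limited range. Their mode can therefore be obtained stably.
[0045]
Step 107) The mode obtained in step 106 is output as the estimated line direction (line direction vector).
[0046]
Step 108) The line direction determination result (vertical or horizontal writing) obtained in step 105 is output.
[0047]
Next, the operation of the deformation parameter setting unit 14 is described in detail.
[0048]
The deformation parameter setting unit 14 sets a virtual projection plane for obtaining the corrected image, based on the line direction estimated by the line direction estimating unit 13. It then calculates the perspective transformation parameters for projecting the captured image data onto this plane.
[0049]
FIG. 8 illustrates how the virtual projection plane is set in the deformation parameter setting unit according to Embodiment 1 of the present invention.
[0050]
As shown in the figure, the projection plane is the subject plane a, assumed to have no inclination in the y-axis direction of the imaging plane b. In other words, it is the plane spanned by the estimated line direction vector and the y-axis vector of the imaging plane b (the x axis for vertical writing). The x axis of the virtual projection plane c (the y axis for vertical writing) is made to coincide with the line direction estimated by the line direction estimating unit 13.
[0051]
If the line direction estimating unit 13 in the preceding stage estimated correctly, this plane c is parallel to the line direction on the subject plane a. Projecting the captured image data onto this plane gives the same image as a perspective projection of the subject onto a plane parallel to the line direction. The lines therefore run horizontally in the projected image.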
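[0052]
The perspective transformation parameters for projecting the captured image data onto the virtual projection plane c are obtained as follows. They are derived via the coordinate transformation matrix R, which maps imaging-plane coordinates to virtual-projection-plane coordinates.
[0053]
Let e_x, e_y and e_z be the unit vectors along the x, y and z axes of the imaging-plane coordinate system. Let v_x be the unit vector along the x axis of the virtual projection plane, which is made to coincide with the estimated line direction vector. The unit normal vector n of the virtual plane and the unit y-axis vector v_y on the virtual plane are then
$$\mathbf{n} = \frac{\mathbf{e}_y \times \mathbf{v}_x}{\lVert \mathbf{e}_y \times \mathbf{v}_x \rVert}, \qquad \mathbf{v}_y = \mathbf{n} \times \mathbf{v}_x.$$
The transformation matrix R maps the unit vectors of the imaging-plane coordinate system as
$$\mathbf{v}_x = R\,\mathbf{e}_x, \qquad \mathbf{v}_y = R\,\mathbf{e}_y, \qquad \mathbf{n} = R\,\mathbf{e}_z,$$
so
$$[\mathbf{v}_x, \mathbf{v}_y, \mathbf{n}] = R\,[\mathbf{e}_x, \mathbf{e}_y, \mathbf{e}_z] = R,$$
because [e_x, e_y, e_z] is the 3×3 identity matrix. R is therefore the 3×3 matrix whose columns are v_x, v_y and n.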
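The construction of R can be checked numerically with a small Python sketch. It follows the two formulas above for a horizontally written document. The function name is hypothetical, and R is returned as a list of rows. With the cross-product order as written, a line direction equal to e_x gives n = -e_z and v_y = -e_y, that is R = diag(1, -1, -1). The sign convention of n must therefore match the one assumed by Equation 1.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def projection_frame(line_dir):
    """R = [v_x, v_y, n] for a horizontally written document, as a list of rows.

    v_x: unit line direction; n = (e_y x v_x)/|e_y x v_x|; v_y = n x v_x.
    """
    e_y = (0.0, 1.0, 0.0)
    v_x = normalize(line_dir)
    n = normalize(cross(e_y, v_x))
    v_y = cross(n, v_x)  # unit length because n and v_x are orthonormal
    return [[v_x[r], v_y[r], n[r]] for r in range(3)]
```

The columns of the returned matrix are orthonormal and its determinant is +1, so R is a proper rotation for any line direction that is not parallel to e_y.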
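[0054]
The transformation matrix R obtained above expresses the relationship between the imaging-plane coordinate system and the virtual projection-plane coordinate system. Using R, the perspective transformation parameters in homogeneous coordinates are obtained as
[0055]
[Equation 1]

and the transformation onto the projection plane is carried out with this matrix. Here k is a coefficient that determines the size of the corrected image, and f is the focal length. The perspective transformation parameters are obtained from R in the same way as from the rotation matrix R in Japanese Patent Application No. 2000-243311.
[0056]
Values other than the focal length actually used for shooting may be used for f in the above equation and in the computation of the direction candidate vectors. The lines on the subject plane are still made horizontal. However, the result then contains unnecessary extra correction, so it is desirable to detect, inside the camera, the value actually used for shooting and to use it.
[0057]
The deformation parameters obtained as above are passed to the image correcting unit 15. The unit uses them to correct the image data acquired from the image memory 12, generating image data in which the lines of the subject run horizontally.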
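Equation 1 is not reproduced in this text. The following Python sketch of the resampling done by the image correcting unit 15 therefore takes an arbitrary 3×3 matrix `H` as input. It assumes, although the source does not say so, that `H` maps pixel coordinates of the corrected image back to coordinates in the captured image. Because of this inverse mapping, every output pixel receives a value and no holes appear. The sketch uses nearest-neighbour sampling for brevity; bilinear interpolation would give smoother results. The function name is hypothetical.

```python
def warp_perspective(src, H, out_w, out_h, fill=255):
    """Inverse-map each output pixel through H and sample the nearest source pixel.

    src: list of rows (grayscale). H: 3x3 list of rows mapping output
    (u, v, 1) to homogeneous source coordinates. Output pixels that map
    outside src receive `fill`.
    """
    h, w = len(src), len(src[0])
    out = [[fill] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            X = H[0][0] * u + H[0][1] * v + H[0][2]
            Y = H[1][0] * u + H[1][1] * v + H[1][2]
            Z = H[2][0] * u + H[2][1] * v + H[2][2]
            if abs(Z) < 1e-12:
                continue  # point at infinity: leave the fill value
            x, y = round(X / Z), round(Y / Z)
            if 0 <= x < w and 0 <= y < h:
                out[v][u] = src[y][x]
    return out
```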
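[0058]
The subsequent processing by the image compressing unit 16 and the other units is as described above.
[0059]
[Embodiment 2]
This embodiment describes an example that builds on Embodiment 1. The captured image data is first corrected with the deformation parameters as in Embodiment 1. The corrected data is then binarized and subjected to OCR processing (character recognition).
[0060]
The target of OCR processing is usually a document, so the subject is very likely to have a line structure. This is a favorable condition for the correction processing described in Embodiment 1.
[0061]
Moreover, correction as in Embodiment 1 aligns the lines horizontally in the corrected image data. OCR accuracy therefore improves compared with using the captured image as is. Distortion remains in the vertical direction, so the character size changes from line to line. OCR processing normally copes with changes in character size, however. Such an image can therefore be processed without problems, just like a document whose character size varies from line to line.
[0062]
FIG. 9 shows the configuration of the apparatus according to Embodiment 2 of the present invention. Parts identical to those of Embodiment 1 in FIG. 2 are given the same reference numerals, and their description is omitted.
[0063]
The apparatus shown in FIG. 9 omits the image compressing unit 16 of FIG. 2 and instead includes a binarizing unit 21 and an OCR unit 22.
[0064]
The binarizing unit 21 binarizes the image data corrected by the image correcting unit 15 and passes it to the OCR unit 22.
[0065]
The OCR unit 22 performs character recognition on the corrected, binarized image data and records the result in the nonvolatile memory 17.
[0066]
OCR processing is thus performed after the distortion has been corrected using only a single captured image. Accurate text data with a low misrecognition rate can therefore be obtained easily with a camera.
[0067]
[Embodiment 3]
In this embodiment, a subject is photographed in several parts. Each image is corrected and OCR-processed (character recognition) as in Embodiment 2, and the OCR results are used to stitch the divided images together.
[0068]
As shown in FIG. 10(a), an image that contains the entire page is likely to include two sets of parallel lines, such as the top/bottom and left/right edges of the paper. When a document is photographed in parts and the parts are stitched together, however, there may be no cue for the correspondence search other than the character types, as shown in FIG. 10(b). Even then, using the character types in the OCR results for the correspondence search allows accurate stitching.
[0069]
FIG. 11 shows the configuration of the apparatus according to Embodiment 3 of the present invention. Components identical to those in FIG. 2 (Embodiment 1) and FIG. 9 (Embodiment 2) are given the same reference numerals, and their description is omitted.
[0070]
The apparatus shown in FIG. 11 adds a stitching unit 23 after the OCR unit 22 in the configuration of Embodiment 2.
[0071]
In Embodiment 3, the OCR unit 22 outputs its result for use by the stitching unit 23 that follows it. The result is a two-dimensional array whose elements are the type of each character and its position (x, y coordinates) on the image. However, the OCR unit 22 operates on the image data after tilt correction, so the positions it finds are positions in the tilt-corrected image. The OCR unit 22 therefore applies the inverse of the tilt correction to these positions, using the perspective transformation parameters input from the image correcting unit 15, and outputs the result.
[0072]
FIG. 12 shows the output data of the OCR unit according to Embodiment 3 of the present invention. The output of the OCR unit 22 is a two-dimensional array that reflects the line structure of the image. Each element pairs a character code with its x, y coordinates (position data) on the captured image.
[0073]
The stitching unit 23 calculates deformation parameters for combining, based on the corresponding-position information obtained from the OCR unit 22. It then acquires the captured image data from the image memory 12 and combines the images. To search for corresponding positions, however, it uses the OCR results as described below rather than the correlation of image blocks.
[0074]
The search for corresponding positions between a plurality of images is described here.
[0075]
FIG. 13 is a flowchart of the corresponding-position search between a plurality of images in the stitching unit according to Embodiment 3 of the present invention.
[0076]
Step 201) The stitching unit 23 selects the character code string of one line from the first image acquired from the image memory 12.
[0077]
Step 202) As shown in FIG. 14, the stitching unit 23 compares that line with every line of the second image acquired from the image memory 12. For each line it shifts the starting position step by step and counts the characters at which the two character code strings agree.
[0078]
Step 203) The line and starting position that maximize the number of matching characters found in step 202 are determined.
[0079]
Step 204) At the starting position found in step 203, the matching characters are judged to correspond, and their positions on the images are recorded as corresponding positions. Perspective transformation parameters that approximate these corresponding-position data are then computed, and the images are combined based on these parameters.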
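Steps 202 to 204 can be sketched in Python as follows. The OCR output is modeled on FIG. 12 as one list of (character_code, x, y) tuples per line, with positions already mapped back to the captured image. The function names are hypothetical. The perspective transformation is fitted by linear least squares with the bottom-right element fixed to 1. This is a standard direct linear transform, not necessarily the fitting method the authors used. Characters from a single text line are nearly collinear, which makes the fit ill-posed. In practice, correspondences from several lines would have to be collected before calling `fit_homography`.

```python
def best_alignment(line1, lines2):
    """Steps 202-203: best line of image 2 and shift for one line of image 1.

    Lines are lists of (char_code, x, y). Character i of line1 is compared
    with character i + shift of a line of image 2.
    Returns (line_index, shift, matches); line_index is None if nothing matches.
    """
    best = (None, 0, 0)
    codes1 = [c for c, _, _ in line1]
    for j, line2 in enumerate(lines2):
        codes2 = [c for c, _, _ in line2]
        for s in range(-len(codes1) + 1, len(codes2)):
            m = sum(1 for i, c in enumerate(codes1)
                    if 0 <= i + s < len(codes2) and codes2[i + s] == c)
            if m > best[2]:
                best = (j, s, m)
    return best


def correspondences(line1, lines2):
    """Step 204: ((x1, y1), (x2, y2)) pairs of matching characters at the best alignment."""
    j, s, _ = best_alignment(line1, lines2)
    if j is None:
        return []
    line2 = lines2[j]
    return [((x, y), line2[i + s][1:]) for i, (c, x, y) in enumerate(line1)
            if 0 <= i + s < len(line2) and line2[i + s][0] == c]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_homography(pairs):
    """Least-squares perspective transform (h33 = 1) from image-1 to image-2 points.

    Each pair gives u*(h6 x + h7 y + 1) = h0 x + h1 y + h2 and the same for v;
    the normal equations of these linear constraints are solved for h0..h7.
    Needs at least four pairs, no three of them collinear.
    """
    ata = [[0.0] * 8 for _ in range(8)]
    atb = [0.0] * 8
    for (x, y), (u, v) in pairs:
        for row, rhs in (([x, y, 1, 0, 0, 0, -u * x, -u * y], u),
                         ([0, 0, 0, x, y, 1, -v * x, -v * y], v)):
            for p in range(8):
                atb[p] += row[p] * rhs
                for q in range(8):
                    ata[p][q] += row[p] * row[q]
    h = solve(ata, atb)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]
```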
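[0080]
In Embodiment 3, the combining process is driven by the results of tilt-correcting the image data acquired from the image memory 12. The deformations required for tilt correction and for combining are merged into a single warp. This avoids applying a blur-inducing warp twice, so the result has little blur and high image quality.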
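The single-warp benefit described above rests on one fact: perspective transformations compose by multiplying their 3×3 matrices. The Python sketch below illustrates this with hypothetical names. `H_correct` maps the captured image to the tilt-corrected image, and `H_stitch` maps the tilt-corrected image to the mosaic. Their product maps captured pixels straight into the mosaic, so the pixels are resampled only once.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_h(h, pt):
    """Map a 2-D point through a perspective transform (homogeneous divide)."""
    x, y = pt
    X = h[0][0] * x + h[0][1] * y + h[0][2]
    Y = h[1][0] * x + h[1][1] * y + h[1][2]
    Z = h[2][0] * x + h[2][1] * y + h[2][2]
    return (X / Z, Y / Z)


def single_warp(H_stitch, H_correct):
    """One matrix that performs tilt correction followed by stitching."""
    # The rightmost factor is applied to the point first.
    return matmul3(H_stitch, H_correct)
```

Mapping a point through `single_warp(H_stitch, H_correct)` gives the same result as mapping it through `H_correct` and then `H_stitch`. The difference is that an image warped with the product is interpolated once instead of twice.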
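[0081]
Matching on image data that contains much fine gray-level structure, such as images of text, tends to produce false matches caused by tilt and noise. Performing the correspondence search on OCR results first reduces these false matches.
[0082]
The image processing apparatuses of Embodiments 1 to 3 can be housed in a single enclosure as a digital camera. Each component can be implemented as a program and stored on a small recording medium. When the medium is mounted in the camera and the camera executes the program, the digital camera gains the image processing functions described above.
[0083]
[Embodiment 4]
Embodiment 4 uses an ordinary digital camera without the correction and other functions of Embodiments 1 to 3. The captured image data is transferred to a computer outside the camera, and software running on that computer performs processing equivalent to that of Embodiment 3.
[0084]
FIG. 15 shows the system configuration according to Embodiment 4 of the present invention.
[0085]
The system shown in the figure comprises a CPU 31, a memory 32, a disk device 33, a display 34, a printer 35, a communication device 36, a floppy disk device 37, and a digital camera 40. The communication device 36 and the digital camera 40 are connected by a communication link or a connecting device.
[0086]
Six functions of the Embodiment 3 apparatus are implemented as software: the line direction estimating unit 13, the deformation parameter setting unit 14, the image correcting unit 15, the binarizing unit 21, the OCR unit 22 and the stitching unit 23. The software is stored in a storage medium, which is one of the memory 32, the disk device 33 or the floppy disk device 37. Image data captured by the digital camera 40 is acquired via the communication device 36 and stored in one of the same storage media.
[0087]
FIG. 16 is a flowchart of the operation of the system according to Embodiment 4 of the present invention.
[0088]
Step 401) The CPU 31 reads the image data from the storage medium in which it is stored.
[0089]
Step 402) The CPU 31 starts the software having the above functions from the storage medium in which it is stored. It performs line direction estimation in the same way as the line direction estimating unit 13 of Embodiment 3.
[0090]
Step 403) The CPU 31 sets the deformation parameters in the same way as the deformation parameter setting unit 14 of Embodiment 3.
[0091]
Step 404) The CPU 31 corrects the image data in the same way as the image correcting unit 15 of Embodiment 3.
[0092]
Step 405) The CPU 31 binarizes the corrected image data in the same way as the binarizing unit 21 of Embodiment 3.
[0093]
Step 406) The CPU 31 determines the type of each character and its position in the image data in the same way as the OCR unit 22 of Embodiment 3.
[0094]
Step 407) The CPU 31 calculates deformation parameters for combining from the corresponding-position information and combines the image data, in the same way as the stitching unit 23 of Embodiment 3.
[0095]
Step 408) The CPU 31 outputs the combined image data. For example, it displays the data on the display 34 or records it in a storage medium (the memory 32, the disk device 33 or the floppy disk device 37).
[0096]
Each component of the image processing apparatuses of all the above embodiments can be described as a computer program. The program can be stored on a computer-readable storage medium such as a CD-ROM or a floppy disk. Loading and installing it on a computer then makes it easy to carry out the image processing of the present invention.
[0097]
The present invention is not limited to the above embodiments; various modifications and applications are possible within the scope of the claims.
[0098]
[Effects of the Invention]
As described above, in a single image of a document taken with a digital camera, the line directions of the subject (document) differ from place to place. The image processing apparatus according to the present invention corrects these line directions by perspective transformation. Easy-to-read image data of the document can thus be obtained easily.
[0099]
The apparatus extracts connected components from the image data and obtains a plurality of direction candidates from their positions and directions. It projects the candidates onto a plane and takes a histogram on that plane. The line direction of the document is thereby detected stably, with little sensitivity to noise in the connected components.
[0100]
The apparatus also determines whether the subject document is written horizontally or vertically. It then corrects the line direction to horizontal or vertical, whether the camera was held in portrait or landscape orientation. This produces easy-to-read image data.
[0101]
Because the apparatus detects the direction and then performs character recognition, the user need only photograph the document with the digital camera, without attention to the shooting direction. This reduces the user's burden when shooting.
[0102]
The apparatus stitches a plurality of captured images together using deformation parameters obtained from the types and positions of the recognized characters. Images can therefore be combined accurately without the user specifying the shooting direction or similar information. Furthermore, the digital camera itself may lack line direction detection, image direction correction, character recognition, combining and similar functions. These functions can then be provided on a computer, so the image processing can be carried out with a digital camera of ordinary configuration.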
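[Brief Description of the Drawings]
[FIG. 1] A diagram outlining the correction of image data in Embodiment 1 of the present invention.
[FIG. 2] A configuration diagram of the apparatus according to Embodiment 1 of the present invention.
[FIG. 3] A diagram illustrating the estimation of the line direction on the subject plane in the line direction estimating unit according to Embodiment 1 of the present invention.
[FIG. 4] A flowchart of the operation of the line direction estimating unit according to Embodiment 1 of the present invention.
[FIG. 5] An example of connected-component extraction in the line direction estimating unit according to Embodiment 1 of the present invention.
[FIG. 6] A diagram illustrating parallel lines on the subject plane and vanishing-point vectors in the line direction estimating unit according to Embodiment 1 of the present invention.
[FIG. 7] A diagram illustrating the variation of the vanishing-point position and of the vector in the line direction estimating unit according to Embodiment 1 of the present invention.
[FIG. 8] A diagram illustrating the setting of the virtual projection plane in the deformation parameter setting unit according to Embodiment 1 of the present invention.
[FIG. 9] A configuration diagram of the apparatus according to Embodiment 2 of the present invention.
[FIG. 10] Examples of captured image data according to Embodiment 3 of the present invention.
[FIG. 11] A configuration diagram of the apparatus according to Embodiment 3 of the present invention.
[FIG. 12] Output data of the OCR unit according to Embodiment 3 of the present invention.
[FIG. 13] A flowchart of the corresponding-position search between a plurality of images in the stitching unit of Embodiment 3 of the present invention.
[FIG. 14] A diagram illustrating the correspondence search on OCR results in Embodiment 3 of the present invention.
[FIG. 15] A system configuration diagram according to Embodiment 4 of the present invention.
[FIG. 16] A flowchart of the operation of the system according to Embodiment 4 of the present invention.
[FIG. 17] A block diagram of line direction determination on the subject plane.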
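[Explanation of Reference Numerals]
11 Image capturing unit
12 Image memory
13 Line direction estimating unit
14 Deformation parameter setting unit
15 Image correcting unit
16 Image compressing unit
17 Nonvolatile memory
18 CPU
19 Interface circuit
20 LCD panel
21 Binarizing unit
22 OCR unit
23 Stitching unit
31 CPU
32 Memory
33 Disk device
34 Display
35 Printer
36 Communication device
37 Floppy disk device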
４０デジタルカメラ[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an image processing apparatus, and in particular, corrects an image captured by a digital camera into image data that can be processed, and performs character recognition of the corrected image data and synthesis of a plurality of captured image data. The present invention relates to an image processing apparatus for enabling easy recognition of the contents of image data.
[0002]
More specifically, the present invention relates to an image processing apparatus for correcting a plurality of images obtained by dividing and photographing an object on a plane and generating a synthesized image.
[0003]
[Prior art]
As a conventional technique for acquiring an image from an image input device such as a digital camera and generating image data that is easy to read, there are the following methods.
[0004]
First, as a method for dividing and photographing an object on a plane (for example, a document), Japanese Patent Laid-Open No. 11-232378 “Digital camera, document processing system using the digital camera, computer-readable storage medium, and program code sending device” is disclosed. is there.
[0005]
In this method, a single sheet of the subject is divided and photographed as a plurality of images with a digital camera, and the user specifies the deformation parameters interactively while looking at the screen, and photographs the subject directly. Parse correction that transforms into an image is performed, and a character string is read from the image that has been parse-corrected by OCR processing. The character strings read from the photographed image data in this way are combined according to the divided photographing arrangement, and a character string corresponding to the entire original paper surface is output. As a result, OCR processing can be performed in the same manner as when a large document is captured at a high resolution using a low resolution camera.
[0006]
However, in the above-described conventional method for taking a document with a digital camera, it is necessary for the user to manually set the parameters for the perspective correction, and it is troublesome to set each of the divided images. In addition, integrated image data is not generated from segmented data, and only the text data is finally output after being integrated. Documents with mixed text and photos, layout and design are important. There is a problem that integrated image data cannot be generated and output for a simple document.
[0007]
Japanese Patent Laid-Open No. 6-203202 “Image processing apparatus” is available as a skew correction process for OCR.
[0008]
This method reduces or expands image data, extracts connected components corresponding to words, lines, etc., obtains a histogram of the result of projecting each connected component position in each direction, and calculates the entire image from the mode value. The image is corrected so that each row becomes horizontal by estimating the inclination of the image and rotating in the reverse direction.
[0009]
Another method of skew correction processing for OCR is disclosed in Japanese Patent Application Laid-Open No. 6-150060 “Image inclination detection method and table processing method”.
[0010]
In this method, the direction is estimated from the circumscribed rectangle of each connected component as follows. An example of this is shown in FIG.
[0011]
Count the number of black pixels in the circumscribed rectangle a.
[0012]
Divide the number of black pixels by the length b of the long side of the circumscribed rectangle to estimate the width of the black pixel area.
[0013]
The ratio of the result obtained by subtracting the estimated width d of the black pixel area from the short side length c of the circumscribed rectangle to the long side length b is the gradient e (estimated gradient) of the connected component. .
[0014]
In addition, in the conventional skew correction processing, since the entire image is corrected by rotating in the image plane, an image whose line inclination changes depending on the position in the image by perspective transformation, such as an image taken by a camera. There is a problem that it cannot be corrected.
[0015]
[Problems to be solved by the invention]
The present invention has been made in order to solve the above-described problem, and image processing capable of easily reading an image including a document in which captured characters and photographs are mixed, a document in which layout and design are important, and the like. An object is to provide an apparatus.
[0016]
[Means for Solving the Problems]
An object of the present invention is to provide an image processing apparatus that captures an image of a subject and generates easy-to-read image data from the captured image. The image processing device generates image data by capturing an image of a planar object. Direction detecting means for detecting one direction of the subject surface from the single image data, and correcting the image data so that a group of straight lines on the image corresponding to the one direction on the detected subject surface is directed in a certain direction. This is achieved by providing an image processing apparatus comprising image data correction means. According to such a means, it becomes possible to generate easy-to-read image data by performing correction to align the line direction of a document having a different direction depending on the location from one image data photographed by a digital camera. .
[0017]
Further, the direction detection unit extracts a connected component of the image data from one piece of image data captured by the captured data generation unit, and a plurality of directions based on the position and direction of the extracted connected component. By having direction candidate calculation means for calculating candidates and direction calculation means for projecting the calculated direction candidates onto a plane and obtaining the direction of the subject by taking a histogram on the plane, the connected component is due to noise. Even when the direction does not necessarily correspond to the parallel direction on the subject surface, the influence of noise can be reduced.
[0018]
Further, the image data correcting means includes a direction determining means for determining whether one direction on the subject detected by the direction detecting means is closer to horizontal or vertical on the image data, and from the determined direction, Direction determining means for determining the direction of the straight line group of the image data after correction, the row direction regardless of whether the subject is vertical writing or horizontal writing and the camera arrangement at the time of shooting is vertical or horizontal position. It is possible to generate image data that is easy to read by aligning the images horizontally or vertically.
[0019]
Another object of the present invention is an image processing apparatus for photographing a subject and recognizing characters of the photographed image, photographing data generating means for photographing a planar subject and generating image data, and image data Direction detection means for detecting one direction of the subject surface from one piece of image data generated by the generation means, and character recognition means for recognizing image data based on one direction on the subject surface detected by the direction detection means It is achieved by providing an image processing apparatus characterized by comprising: According to such means, since the character recognition process is performed after correcting the distortion using only one photographed image, accurate text data with a low misrecognition rate can be easily obtained using a camera. It becomes possible.
Another object of the present invention is an image processing apparatus for photographing a subject and combining image data from the plurality of photographed images, the image data generating means for photographing the subject on a plane a plurality of times and generating image data And direction detecting means for detecting one direction on the subject surface from the photographed image data, and character recognition means for performing character recognition on the image data based on the detected direction on the subject surface. An image processing apparatus comprising: a deformation parameter calculating unit that calculates a deformation parameter from the recognized character type and position; and a combining unit that combines a plurality of captured image data using the deformation parameter. Is achieved by providing According to such a means, since the composition process is performed using the result of the character recognition process including the character position, it is possible to perform an accurate combining composition even for a document with few clues of correspondence other than the character type. It becomes possible.
[0020]
Another object of the present invention is an image processing apparatus for synthesizing image data from a plurality of captured images, an image acquisition means for acquiring image data captured from a digital camera, and a subject surface from the acquired image data. Direction detecting means for detecting one upper direction, character recognition means for performing character recognition on image data based on the direction on the subject surface detected by the direction detecting means, and the type and position of the recognized character The present invention is achieved by providing an image processing apparatus comprising: a deformation parameter calculating unit that calculates a deformation parameter from the image data; and a combining unit that combines a plurality of photographed image data using the deformation parameter. Thus, even if the digital camera does not have a direction detection / correction / combination function, the digital camera is connected to a computer having a direction detection / correction / combination function by communication or the like, and the captured image data is passed to the computer. Thus, it is possible to obtain accurately synthesized image data without using special hardware in the digital camera.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, each embodiment of the present invention will be described with reference to the drawings.
[0022]
In the following description, the present invention assumes a planar object with a character string drawn on the surface as a subject.
[0023]
[Embodiment 1]
In a single image data obtained by enlarging a document, it may be difficult to obtain information for estimating the orientation of the subject, that is, the tilt distortion parameter, in addition to the row direction. In such a case, it is impossible in principle to completely correct the distortion, but if it can be aligned only in the line direction, even if distortion remains in the direction perpendicular to the line, the readability of the document is improved. Will improve. For example, even if the complete corrected image shown in FIG. 1B cannot be obtained from the horizontally written document image shown in FIG. 1A, the image data can be read when only the row direction is aligned as shown in FIG. It turns out that ease improves. FIG. 1 shows the case of horizontal writing, and FIG. 1 shows an example of horizontal alignment in horizontal writing, and vertical alignment in vertical writing.
[0024]
Therefore, in the first embodiment of the present invention, image data corrected so that the row direction on the image becomes a fixed direction is generated from one piece of image data obtained by photographing a planar subject in the digital camera. An example of recording on a recording medium will be described.
[0025]
FIG. 2 shows the configuration of the apparatus according to Embodiment 1 of the present invention.
[0026]
The apparatus shown in the figure includes an image photographing unit 11, an image memory 12, a row direction estimating unit 13, a deformation parameter setting unit 14, an image correcting unit 15, an image compressing unit 16, a nonvolatile memory 17, a CPU 18, an interface circuit 19, and It is composed of an LCD panel 20.
[0027]
The image capturing unit 11 uses a lens, a CCD sensor, an A / D conversion circuit, a color conversion / filter processing circuit, and the like, as in a normal digital camera, captures a subject and digital image data (hereinafter referred to as image data). Is recorded in the image memory 12 inside the digital camera.
[0028]
As shown in FIG. 3, the row direction estimation unit 13 acquires image data taken from the image memory 12, and estimates the row direction on the subject plane a as a three-dimensional direction in the imaging plane coordinate system. Details will be described later.
[0029]
The deformation parameter setting unit 14 sets a virtual projection plane for obtaining a corrected image based on the estimated row direction in the row direction estimation unit 13, and perspective transformation for projecting the captured image onto this plane. Calculate the parameters. Details will be described later.
[0030]
The image correction unit 15 uses the perspective transformation parameters calculated by the deformation parameter setting unit 14 to correct the photographed image by perspective transformation of the photographed image data acquired from the image memory 12 so that each row of the subject is horizontal. Generate side-by-side images.
[0031]
The image compression unit 16 compresses the perspective-converted image data and records it in the non-volatile memory 17 in the same manner as a normal digital camera.
[0032]
The nonvolatile memory 17 records the image data compressed by the image compression unit 16, and the image data is read in response to a request from the CPU 18.
[0033]
The CPU 18 controls reading of image data from the non-volatile memory 17 to the interface circuit 19 and the LCD panel 20 and the operation thereof. As a result, the interface circuit 19 transfers the image data acquired from the nonvolatile memory 17 to an external computer or printer, and the LCD panel 20 displays information on the state of the apparatus under the control of the CPU 18.
[0034]
Here, the row direction estimation unit 13 will be described in detail.
[0035]
The row direction estimation unit 13 basically recognizes a region corresponding to a row of a captured image and detects the direction. However, there is a noise in line recognition, and there is a possibility that an area having a direction other than the line, such as a figure or a headline, exists in the target document. Therefore, in the first embodiment, a large number of regions that are considered to correspond to rows are detected, and the mode value is obtained to stably detect the row direction.
[0036]
FIG. 4 is a flowchart of the operation of the row direction estimation unit according to Embodiment 1 of the present invention.
[0037]
Hereinafter, the operation of the row direction estimation unit 13 will be described with reference to the flowchart of FIG.
[0038]
Step 101) The row direction estimation unit 13 inputs image data to be corrected, which is captured by the image capturing unit and stored in the image memory 12.
[0039]
Step 102) The row direction estimation unit 13 extracts a connected component. In the extraction of the connected component, the image data acquired from the image memory 12 is binarized and subjected to reduction and expansion processing to obtain a connected component. Adjacent characters are connected to form a large area corresponding to a line or part thereof. However, as shown in FIG. 5, it may be possible that some of the components are too short or include a component that does not correspond to the expected “row or part thereof”, such as straddling a plurality of rows.
[0040]
Step 103) Next, the position / tilt is calculated. In step 101, a circumscribed rectangle is obtained from the maximum and minimum values of the XY coordinates of the pixels included in the extracted connected component. The center of the circumscribed rectangle is recorded as the position of the connected component. In the present embodiment, as this processing, the method disclosed in the above-mentioned JP-A-6-150060 shown in FIG. 17 is used. The method first counts the number of black pixels in the circumscribed rectangle a, divides the number of black pixels by the length b of the long side of the circumscribed rectangle, and estimates the width of the black pixel region. The slope of each connected component is calculated by using the ratio of the result of subtracting the estimated width d of the black pixel area from the length c of the short side of the circumscribed rectangle and the length b of the long side as the gradient e of the connected component. Record.
[0041]
Step 104) Next, direction candidates are calculated. From the connected component obtained in step 102, the position of the intersection of the two connected components is obtained from the position and inclination calculated in step 103, and a vector from the optical center to this intersection is obtained. FIG. 6 is a diagram for explaining parallel lines and vanishing point vectors on the subject plane in the row direction estimation unit according to Embodiment 1 of the present invention. The vectors d and d ′ from the optical center a of the camera toward the vanishing point b on the image plane are parallel to the parallel lines e and e ′ on the subject surface c corresponding to the vanishing point b, and the position of the vanishing point b is determined. Obtaining is equivalent to obtaining a specific direction included in the subject surface c. This candidate vector (direction candidate) is obtained for every pair of all connected components.
[0042]
Step 105) Next, the writing direction is determined. For each connected component, it is judged whether its inclination is closer to horizontal or to vertical, and the two counts are compared. If nearly horizontal components are in the majority, the target document is judged to be written horizontally; otherwise it is judged to be written vertically. For simplicity, the following description mainly covers horizontally written documents; when a document is estimated to be written vertically, the same processing is performed with the roles of the horizontal and vertical axes swapped. This determination result is also used in the subsequent deformation parameter calculation.
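The majority vote of step 105 might look like the following sketch. It assumes each component's inclination is available as an angle in degrees from the image x-axis and treats angles within 45 degrees of the x-axis as nearly horizontal; the 45-degree split is an assumption, and ties fall to vertical writing, following the "otherwise" in the text.

```python
def writing_direction(angles_deg):
    """Return "horizontal" if nearly horizontal components are in the
    majority, otherwise "vertical" (step 105 sketch)."""
    horizontal = 0
    for a in angles_deg:
        a = a % 180.0                 # angles a and a + 180 describe the same line
        if a < 45.0 or a > 135.0:     # closer to the x-axis than to the y-axis
            horizontal += 1
    vertical = len(angles_deg) - horizontal
    return "horizontal" if horizontal > vertical else "vertical"
```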
[0043]
Step 106) Next, the mode is calculated. Because of noise, the directions obtained in step 104 do not necessarily correspond to a parallel direction on the subject surface, so the candidate vectors (direction candidates) do not all coincide. A histogram is therefore created over the many candidate vectors and its mode is taken, which reduces the influence of noise and makes the direction estimation highly reliable. The histogram is taken over the yz components of the direction vectors normalized so that the x component is 1 (for vertical writing, over the xz components of the vectors normalized so that the y component is 1). For example, with a histogram range of ±0.2 and a step of 0.02, since tan⁻¹(0.2) ≈ 11.3 degrees and tan⁻¹(0.02) ≈ 1.1 degrees, the direction can be obtained with an accuracy of about 1 degree over a subject tilt range of about 10 degrees.
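The mode search of step 106 can be sketched with a two-dimensional histogram. The defaults use the example range of ±0.2 and step of 0.02 from the text; returning the bin center as the result and discarding candidates outside the range are assumptions of this sketch.

```python
import math
from collections import Counter

def mode_direction(candidates, vertical=False, half_range=0.2, step=0.02):
    """Return the histogram-mode direction vector, or None if no candidate
    falls inside the histogram range."""
    axis = 1 if vertical else 0              # component normalized to 1
    others = (0, 2) if vertical else (1, 2)  # components that are histogrammed
    nbins = round(2 * half_range / step)
    hist = Counter()
    for v in candidates:
        if v[axis] == 0:
            continue                         # cannot be normalized
        a, b = (v[k] / v[axis] for k in others)
        i = math.floor((a + half_range) / step)
        j = math.floor((b + half_range) / step)
        if 0 <= i < nbins and 0 <= j < nbins:
            hist[(i, j)] += 1
    if not hist:
        return None
    (i, j), _count = hist.most_common(1)[0]
    result = [0.0, 0.0, 0.0]
    result[axis] = 1.0
    result[others[0]] = -half_range + (i + 0.5) * step   # bin centres
    result[others[1]] = -half_range + (j + 0.5) * step
    return tuple(result)
```

Taking the mode rather than the mean means a few outlier candidates (for example from components that straddle two rows) cannot pull the result away from the dominant direction.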
[0044]
As described above, the direction vector in principle carries the same information as the vanishing point position. However, as shown in FIG. 7, the direction vector points close to the x-axis, that is, the vanishing point lies far from the image center. Slight noise in a connected component therefore moves the vanishing point position greatly, which widens its distribution and makes the mode hard to find. In contrast, because the direction vector is close to the x-axis direction, its normalized yz components are hardly affected by noise in the connected components and their distribution stays narrow, so the mode can be found stably.
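As a rough worked illustration, assume (as in step 104) that the candidate ray through a vanishing point (x_v, y_v) on the image plane is (x_v, y_v, f). Normalizing its x component to 1 gives

```latex
\[
(x_v,\; y_v,\; f) \;\propto\; \Bigl(1,\; \frac{y_v}{x_v},\; \frac{f}{x_v}\Bigr),
\qquad
\frac{\partial}{\partial x_v}\Bigl(\frac{f}{x_v}\Bigr) = -\frac{f}{x_v^{2}} ,
\]
```

so the farther the vanishing point lies along the x-axis, the less its displacement moves the histogrammed components. With an assumed f = 1000 pixels, moving the vanishing point from x_v = 20000 to x_v = 40000 changes f/x_v only from 0.05 to 0.025, slightly more than one 0.02-wide bin, even though the vanishing point itself moved by 20000 pixels.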
[0045]
Step 107) The mode value obtained in Step 106 is output as a row direction estimation result (row direction vector).
[0046]
Step 108) The row direction determination result (vertical / horizontal determination result) obtained in step 105 is output.
[0047]
Next, the operation of the deformation parameter setting unit 14 will be described in detail.
[0048]
The deformation parameter setting unit 14 sets a virtual projection plane for obtaining a corrected image, based on the row direction estimated by the row direction estimation unit 13, and calculates the perspective transformation parameters for projecting the captured image data onto this plane.
[0049]
FIG. 8 is a diagram for explaining setting of a virtual projection plane in the deformation parameter setting unit according to Embodiment 1 of the present invention.
[0050]
As shown in the figure, the virtual projection plane c is the plane that the subject plane a would occupy if it had no inclination in the y-axis direction of the imaging plane b. That is, c is the plane spanned by the estimated row direction vector and the y-axis vector of the imaging plane b (the x-axis in the case of vertical writing). Further, the x-axis of the virtual projection plane c (the y-axis in the case of vertical writing) is matched to the row direction estimated by the row direction estimation unit 13.
[0051]
If the estimation by the row direction estimation unit 13 is correct, the plane c is parallel to the row direction on the subject plane a. Projecting the captured image data onto this plane is therefore equivalent to perspective-projecting the subject onto a plane parallel to its row direction, so the rows run horizontally in the projected image.
[0052]
The perspective transformation parameters for projecting the captured image data onto the virtual projection plane c are obtained as follows via the coordinate transformation matrix R from the imaging plane coordinates to the virtual projection plane coordinates.
[0053]
Let ex, ey, and ez be the unit vectors in the x-axis, y-axis, and z-axis directions of the imaging plane coordinate system, and let vx be the unit vector in the x-axis direction of the virtual projection plane (matched to the estimated row direction vector). The unit normal vector n of the virtual plane and the unit y-axis vector vy on the virtual plane are then
n = (ey × vx) / | ey × vx |
vy = n × vx
The transformation matrix R maps the unit vectors of the imaging plane coordinate system as
vx = R ex
vy = R ey
n = R ez
that is,
[vx, vy, n] = R [ex, ey, ez]
Since [ex, ey, ez] is the identity matrix, the transformation matrix R is the 3 × 3 matrix
R = [vx, vy, n]
whose columns are vx, vy, and n.
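The construction of R in [0053] can be checked numerically with the sketch below. It follows the formulas exactly as written, including the order of the cross products, and assumes the row direction vector is given in imaging-plane coordinates; the function names are illustrative.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def normalize(u):
    length = math.sqrt(sum(c * c for c in u))
    return tuple(c / length for c in u)

def transformation_matrix(row_direction):
    """R with columns vx, vy, n, built from the estimated row direction."""
    ey = (0.0, 1.0, 0.0)
    vx = normalize(row_direction)   # x-axis of the virtual projection plane
    n = normalize(cross(ey, vx))    # n = (ey x vx) / |ey x vx|
    vy = cross(n, vx)               # vy = n x vx; unit length since n is perpendicular to vx
    # Columns vx, vy, n, so that R ex = vx, R ey = vy and R ez = n.
    return [[vx[i], vy[i], n[i]] for i in range(3)]
```

For a row direction equal to the x-axis, the formulas give n = −ez and vy = −ey, so R = diag(1, −1, −1); for any row direction not parallel to ey, the columns of R are orthonormal.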
[0054]
The transformation matrix R obtained above expresses the relationship between the imaging plane coordinate system and the virtual projection plane coordinate system. Using R, the perspective transformation parameters in homogeneous coordinates are therefore given by
[0055]
[Expression 1]

and the conversion to the projection plane is performed using this matrix. Here, k is a coefficient that determines the size of the corrected image, and f is the focal length. The perspective transformation parameters are obtained from the transformation matrix R in the same way as they are obtained from the rotation matrix R in Japanese Patent Application No. 2000-243311.
[0056]
Note that even if the focal length f used here, and the focal length used to obtain the direction candidate vectors, differ from the value actually used for shooting, the rows on the subject surface are still made horizontal. In that case, however, unnecessary extra correction occurs, so it is desirable to detect the focal length used for shooting inside the camera and use that value.
[0057]
By passing the deformation parameters obtained as described above to the image correction unit 15, the image correction unit 15 corrects the image data acquired from the image memory 12 using these parameters and can generate image data in which each row of the subject is aligned horizontally.
[0058]
The processing of the subsequent image compression unit 16 and the like is as described above.
[0059]
[Embodiment 2]
In the present embodiment, an example will be described in which the captured image data is corrected using the deformation parameters in the above-described first embodiment, and then the image data is binarized to perform OCR processing (character recognition processing).
[0060]
Since an object to be subjected to OCR processing is usually a document, there is a very high possibility that a row structure exists in the subject. This is an advantageous condition for the correction processing described in the first embodiment.
[0061]
Further, because the correction process of the first embodiment aligns the rows of the corrected image data horizontally, OCR accuracy is higher than when the captured image itself is used. Distortion does remain in the vertical direction, so the character size varies from line to line. However, OCR processing can normally cope with variations in character size, so such images can be processed without problems, just like a document whose character size differs from line to line.
[0062]
FIG. 9 shows the configuration of an apparatus according to Embodiment 2 of the present invention. In the figure, the same parts as those shown in FIG. 2 of the first embodiment are denoted by the same reference numerals, and the description thereof is omitted.
[0063]
The apparatus shown in FIG. 9 has a configuration in which a binarization unit 21 and an OCR unit 22 are provided in place of the image compression unit 16 of FIG. 2.
[0064]
The binarization unit 21 binarizes the image data corrected by the image correction unit 15 and passes it to the OCR unit 22.
[0065]
The OCR unit 22 performs character recognition processing on the corrected and binarized image data, and records the processing result in the nonvolatile memory 17.
[0066]
Accordingly, since OCR processing is performed after the distortion has been corrected using only a single captured image, accurate text data with a low misrecognition rate can easily be obtained with a camera.
[0067]
[Embodiment 3]
In the present embodiment, a subject is photographed in several parts, image data correction and OCR processing (character recognition processing) are performed on each of the resulting images in the same manner as in the second embodiment, and the OCR results are used to stitch the divided image data together.
[0068]
As shown in FIG. 10A, if the image is taken so as to include the entire sheet of paper, it is highly likely to contain parallel lines in two directions, such as the top/bottom and left/right edges of the paper. When a document is photographed in parts that are then stitched together, however, there may be no clue other than the character types for searching for corresponding points, as shown in FIG. 10B. Even in such a case, accurate stitching can be performed by searching for corresponding points using the character types in the OCR results.
[0069]
FIG. 11 shows the configuration of an apparatus according to Embodiment 3 of the present invention. In the figure, the same components as those in FIG. 2 of the first embodiment and FIG. 9 of the second embodiment are denoted by the same reference numerals, and the description thereof is omitted.
[0070]
The apparatus shown in FIG. 11 has a configuration in which a stitching and combining unit 23 is added after the OCR unit 22 in the configuration of the second embodiment described above.
[0071]
In the third embodiment, for use in the composition processing of the stitching and combining unit 23, the OCR unit 22 outputs as its processing result a two-dimensional array containing the type of each character and its position on the image (xy coordinate values). Because the OCR unit 22 operates on image data that has already undergone tilt correction, the positions it detects are positions on the tilt-corrected image data. The OCR unit 22 therefore applies the inverse of the correction to these positions, using the perspective transformation parameters received from the image correction unit 15, and outputs the result.
[0072]
FIG. 12 shows the output data of the OCR unit according to Embodiment 3 of the present invention. As shown in the figure, the output of the OCR unit 22 is a two-dimensional array that reflects the row structure in the image, and each element is a pair of a character code and its xy coordinate values (position data) on the captured image data.
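A sketch of the output described in [0071] and [0072], assuming the perspective transformation used for correction is available as a 3 × 3 homogeneous matrix that maps captured-image coordinates to corrected-image coordinates (this representation is an assumption). OCR positions found on the corrected image are mapped back through the inverse matrix, and the row structure of the two-dimensional array is preserved.

```python
def invert3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("singular transform")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def apply_h(m, x, y):
    """Map (x, y) through a 3x3 homogeneous transform."""
    u = m[0][0] * x + m[0][1] * y + m[0][2]
    v = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return u / w, v / w

def to_captured_coords(ocr_rows, correction):
    """ocr_rows: list of rows of (char_code, x, y) on the corrected image.
    correction: 3x3 matrix mapping captured -> corrected coordinates.
    Returns the same row structure with positions on the captured image."""
    inverse = invert3(correction)
    return [[(code, *apply_h(inverse, x, y)) for code, x, y in row]
            for row in ocr_rows]
```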
[0073]
The stitching and combining unit 23 calculates deformation parameters for composition based on the corresponding position information obtained from the output of the OCR unit 22, acquires the captured image data from the image memory 12, and combines the image data. To search for corresponding positions, however, it uses the OCR processing results, as described below, rather than the correlation between image blocks.
[0074]
Here, an operation for searching for corresponding positions of a plurality of image data will be described.
[0075]
FIG. 13 is a flowchart of a corresponding position search process for a plurality of image data in the stitching and combining unit according to the third embodiment of the present invention.
[0076]
Step 201) The stitching and combining unit 23 selects the character code string of one row of the first image data acquired from the image memory 12.
[0077]
Step 202) As shown in FIG. 14, for every row of the second image data acquired from the image memory 12, the selected row of the first image data is compared with that row while the start position is shifted, and the number of characters whose codes match is counted.
[0078]
Step 203) The row and start position at which the number of matching characters obtained in step 202 is maximized are determined.
[0079]
Step 204) The characters aligned at the row and start position determined in step 203 are taken as corresponding characters, and the position of each such character on each image is recorded as a corresponding position. A perspective transformation parameter that approximates the corresponding position data obtained by the above processing is then obtained, and the image data are combined based on this parameter.
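Steps 201 to 203 amount to sliding one OCR row of the first image along every row of the second image and counting equal character codes. The sketch below assumes rows are plain strings of recognized characters and uses exact code equality; the function names and the return format are illustrative.

```python
def best_alignment(row1, rows2):
    """Steps 201-203 sketch.

    row1:  character codes of one row of the first image.
    rows2: character-code rows of the second image.
    Returns (row_index, offset, matches), where row1[k] is aligned with
    rows2[row_index][k + offset]; row_index is None if nothing matches.
    """
    best = (None, 0, 0)
    for r, row2 in enumerate(rows2):
        # Shift the start position over every possible overlap of the rows.
        for offset in range(-(len(row1) - 1), len(row2)):
            matches = sum(
                1
                for k, ch in enumerate(row1)
                if 0 <= k + offset < len(row2) and row2[k + offset] == ch
            )
            if matches > best[2]:
                best = (r, offset, matches)
    return best

def corresponding_indices(row1, row2, offset):
    """Index pairs (k, k + offset) of characters taken as corresponding (step 204)."""
    return [(k, k + offset) for k, ch in enumerate(row1)
            if 0 <= k + offset < len(row2) and row2[k + offset] == ch]
```

For example, `best_alignment("ABCDEF", ["XYZ", "CDEFGH"])` returns `(1, -2, 4)`: row 1 of the second image matches when "C" of the first row is aligned with its first character, and the four matched characters supply the corresponding positions for step 204.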
[0080]
As described above, in the third embodiment the combination processing is performed on the image data acquired from the image memory 12 rather than on the results of the correction processing, so the deformation processing, which causes blurring, is not performed twice. Because the deformations needed for tilt correction and for composition are merged into a single deformation process, a high-quality result with less blur can be obtained.
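For step 204 and the single-deformation argument above, one common way to fit a perspective transformation to corresponding points is the direct linear transform (DLT) solved by SVD. The patent does not specify its fitting method, so this is a substituted technique, and a practical implementation would usually also normalize the coordinates first. Because the OCR positions are already expressed in captured-image coordinates, the fitted matrix maps the raw second image onto the raw first image, and it can be multiplied with the first image's correction matrix so that each pixel is resampled only once. The names `h_21` and `correction_1` are hypothetical.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares 3x3 perspective transform H with dst ~ H @ src (DLT).

    src, dst: sequences of corresponding (x, y) points, at least 4 pairs.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y, -u])
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y, -v])
    # The right singular vector for the smallest singular value minimizes
    # |A h| subject to |h| = 1.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def warp_point(h, x, y):
    """Apply a 3x3 homogeneous transform to one point."""
    u, v, w = h @ np.array([x, y, 1.0])
    return u / w, v / w

# Hypothetical use: h_21 maps raw image 2 onto raw image 1 (fitted from the
# OCR correspondences) and correction_1 is image 1's tilt correction.  The
# single matrix correction_1 @ h_21 warps image 2 into the corrected frame
# in one resampling pass.
```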
[0081]
In addition, because the corresponding point search is performed first on the OCR results, erroneous correspondences caused by tilt or noise, which tend to occur when matching image data containing many fine gray-level structures such as character images, can be reduced.
[0082]
The image processing apparatuses according to the first to third embodiments can be housed in a single housing as a digital camera. Alternatively, each component of the image processing apparatus can be implemented as a program stored on a small recording medium; when this medium is mounted in a digital camera and the camera executes the program, the digital camera gains the image processing functions described above.
[0083]
[Embodiment 4]
In the fourth embodiment, an ordinary digital camera without the correction and other functions of the first to third embodiments is used. The captured image data is loaded into a computer outside the camera, and software running on that computer performs the same processing as in the third embodiment.
[0084]
FIG. 15 shows a system configuration according to Embodiment 4 of the present invention.
[0085]
The system shown in FIG. 15 includes a CPU 31, a memory 32, a disk device 33, a display 34, a printer 35, a communication device 36, a floppy disk device 37, and a digital camera 40. The communication device 36 and the digital camera 40 are connected by a connecting device so that they can communicate with each other.
[0086]
Of the apparatus configuration of the third embodiment described above, the functions of the row direction estimation unit 13, the deformation parameter setting unit 14, the image correction unit 15, the binarization unit 21, the OCR unit 22, and the stitching and combining unit 23 are implemented as software and stored in a storage medium (the memory 32, the disk device 33, or the floppy disk device 37). The image data captured by the digital camera 40 is acquired via the communication device 36 and stored in a storage medium (the memory 32, the disk device 33, or the floppy disk device 37).
[0087]
FIG. 16 is a flowchart showing an operation in the system according to Embodiment 4 of the present invention.
[0088]
Step 401) The CPU 31 inputs image data from a storage medium storing the image data.
[0089]
Step 402) The CPU 31 starts the software having the above functions from the storage medium in which it is stored, and performs row direction estimation in the same manner as the row direction estimation unit 13 of the third embodiment.
[0090]
Step 403) The CPU 31 sets the deformation parameter in the same manner as the deformation parameter setting unit 14 of the third embodiment described above.
[0091]
Step 404) The CPU 31 corrects the image data in the same manner as the image correction unit 15 of the third embodiment described above.
[0092]
Step 405) The CPU 31 performs a process of binarizing the corrected image data in the same manner as the binarization unit 21 of the third embodiment.
[0093]
Step 406) The CPU 31 performs a process for obtaining the type of each character and the position on the image data in the same manner as the OCR unit 22 of the third embodiment.
[0094]
Step 407) The CPU 31 calculates deformation parameters for composition based on the corresponding position information and combines the image data, in the same manner as the stitching and combining unit 23 of the third embodiment.
[0095]
Step 408) The CPU 31 performs output processing, such as displaying the combined image data on the display 34 or recording it on a storage medium (the memory 32, the disk device 33, or the floppy disk device 37).
[0096]
In addition, each component of the image processing apparatus according to all the above embodiments can be described as a computer program. The image processing of the present invention can therefore be realized easily by storing the program on a computer-readable storage medium such as a CD-ROM or a floppy disk, loading it into a computer that implements the present invention, and installing it on that computer.
[0097]
The present invention is not limited to the above-described embodiments, and various modifications and applications can be made within the scope of the claims.
[0098]
【Effects of the Invention】
As described above, according to the image processing apparatus of the present invention, the row direction of the subject (document), which differs depending on the location in the image, is corrected by perspective transformation using a single image of the document taken with a digital camera, so easy-to-read image data of the document can be obtained easily.
[0099]
In addition, by extracting the connected components of the image data, obtaining a plurality of direction candidates from the positions and directions of the connected components, projecting the candidate directions onto a plane, and taking a histogram on that plane, the row direction of the document can be detected stably, with little influence from noise in the connected components.
[0100]
In addition, by determining whether the subject document is written horizontally or vertically and correcting the row direction to horizontal or vertical accordingly, easy-to-read image data can be generated regardless of whether the camera is held in portrait or landscape orientation.
[0101]
Further, by detecting the direction and performing character recognition, the user need only shoot a document with a digital camera without being aware of the shooting direction, and the burden on the user when shooting can be reduced.
[0102]
In addition, by combining a plurality of captured images using deformation parameters determined from the character types and positions obtained by character recognition, accurate image data can be synthesized without requiring the user to give instructions such as the shooting direction. Furthermore, even if the digital camera itself does not have functions such as row direction detection, image data direction correction, character recognition, and composition processing, installing these functions on a computer allows these image processes to be executed with such a camera.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining an outline of correction of image data according to Embodiment 1 of the present invention.
FIG. 2 is a configuration diagram of an apparatus according to Embodiment 1 of the present invention.
FIG. 3 is a diagram for explaining an estimation operation of a row direction on a subject surface in a row direction estimation unit according to Embodiment 1 of the present invention.
FIG. 4 is a flowchart of the operation of a row direction estimation unit according to Embodiment 1 of the present invention.
FIG. 5 is an example of connected component extraction in the row direction estimation unit according to Embodiment 1 of the present invention.
FIG. 6 is a diagram for explaining parallel lines and vanishing point vectors on the subject plane in the row direction estimation unit according to Embodiment 1 of the present invention.
FIG. 7 is a diagram for explaining changes in vanishing point positions and vectors in the row direction estimation unit according to Embodiment 1 of the present invention.
FIG. 8 is a diagram for explaining setting of a virtual projection plane in a deformation parameter setting unit according to the first embodiment of the present invention.
FIG. 9 is a block diagram of an apparatus according to Embodiment 2 of the present invention.
FIG. 10 is an example of captured image data according to the third embodiment of the present invention.
FIG. 11 is a block diagram of an apparatus according to Embodiment 3 of the present invention.
FIG. 12 is output data of the OCR unit according to the third embodiment of the present invention.
FIG. 13 is a flowchart of a corresponding position search process for a plurality of image data in the stitching and combining unit according to the third embodiment of the present invention.
FIG. 14 is a diagram for explaining a correspondence search of OCR processing results in Embodiment 3 of the present invention.
FIG. 15 is a system configuration diagram according to Embodiment 4 of the present invention.
FIG. 16 is an operation flowchart in the system according to the fourth embodiment of the present invention.
FIG. 17 is a block diagram of a row direction determination on the subject surface.
[Explanation of symbols]
11 Image shooting unit
12 Image memory
13 Row direction estimation unit
14 Deformation parameter setting unit
15 Image correction unit
16 Image compression unit
17 Nonvolatile memory
18 CPU
19 Interface circuit
20 LCD panel
21 Binarization unit
22 OCR unit
23 Stitching and combining unit
31 CPU
32 Memory
33 Disk device
34 Display
35 Printer
36 Communication device
37 Floppy disk device
40 Digital camera

Claims

An image processing apparatus that photographs a subject and generates easy-to-read image data from the captured image, comprising:
Photographing data generation means for photographing a planar subject and generating image data;
Direction detecting means for detecting one direction of the subject surface from a single piece of image data generated by the photographing data generation means; and
Image data correction means for correcting the image data, using calculated perspective transformation parameters, so that a group of straight lines on the image corresponding to the one direction on the subject surface detected by the direction detecting means is oriented in a fixed direction;
wherein the direction detecting means includes:
Means for obtaining the position of the intersection of two connected components from the positions and inclinations of the connected components, and using the vector from the optical center to the intersection as a direction candidate;
Means for counting whether the inclination of each connected component is closer to horizontal or to vertical, and determining horizontal writing if the horizontal count is larger and vertical writing if the vertical count is larger; and
Means for creating a histogram of the direction candidates and obtaining the direction using, for horizontal writing, the yz components of the direction vectors normalized so that the x component is 1, and, for vertical writing, the xz components of the direction vectors normalized so that the y component is 1.

The image processing apparatus according to claim 1, comprising, instead of the image data correction means, character recognition means for recognizing characters in the image data based on the one direction on the subject surface detected by the direction detecting means.

The image processing apparatus according to claim 2, wherein the photographing data generation means includes means for photographing the planar subject a plurality of times, the apparatus further comprising:
Deformation parameter calculation means for calculating deformation parameters from the types and positions of the characters recognized by the character recognition means; and
Combining means for combining the plurality of captured image data using the deformation parameters obtained by the deformation parameter calculation means.

The image processing apparatus according to claim 2, comprising, in place of the photographing data generation means, image acquisition means for acquiring image data photographed by photographing means, and further comprising:
Deformation parameter calculation means for calculating deformation parameters from the types and positions of the characters recognized by the character recognition means; and
Combining means for combining the plurality of captured image data using the deformation parameters obtained by the deformation parameter calculation means.