JP4070486B2 - Image processing apparatus, image processing method, and program used to execute the method

Info

Publication number: JP4070486B2
Application number: JP2002072872A
Authority: JP
Inventors: 慶久大黒
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2002-03-15
Filing date: 2002-03-15
Publication date: 2008-04-02
Anticipated expiration: 2022-03-15
Also published as: JP2003271897A

Description

【０００１】
【発明の属する技術分野】
本発明は、ＯＣＲ（光学的文字読み取り装置）や文字認識に利用される画像処理に関し、より特定すると、文字行・文字切り出し性能の向上を図るための入力文書画像の解像度の推定を行う手段（或いは処理ステップ）を備えた画像処理装置、画像処理方法及び同方法の実行に用いるプログラムに関する。
【０００２】
【従来の技術】
画像処理では、原稿から読み取った文書画像に記された文字の読み取りや認識処理が従来より行われている。この処理を行う際に、文書画像に存在する文字行（文字領域）の切り出しを正確に行うことは、高い認識精度を得るために不可欠である。
文字行の切り出し性能を向上させるためには、切り出し性能に影響するファクタである、処理対象原稿における文書画像の読み取り解像度や、スキューといった情報を原稿毎に把握する必要がある。文書画像の読み取り解像度は、文字切り出しに設定する各種パラメータを解像度によって調整するために用いられる。
【０００３】
文書画像に記された文字の読み取りや認識処理の適用条件を整えるための前処理（文字行の切り出しも含まれる）として従来から様々な提案がなされている。その一つとして、特開平6-187490（領域分割方法）を挙げることができる。この例では文書中の代表的な文字サイズを求め、処理に適した、予め設定済みの扱いやすい文字サイズに変換する方法である。この方法では、画像を処理に適した文字サイズに機械的に拡大・縮小するだけであり、本来の解像度に保つという発想はなく、文字自体が変形され、文字画像の特徴が損なわれるため、文字認識においては都合の悪い面がある。
また、特開2000-306041(文字サイズ推定方法および記録媒体)に示されている方法は、文字サイズを積極的に推定する方法であるが、文字のサイズを画素数でしか求めないので、紙の上では同じサイズの文字であっても読取解像度が異れば、求められる文字サイズも異ってしまい、文字認識装置側の設定を解像度によって変更する場合、装置の挙動が異なるおそれがあり、装置の使用者が混乱する原因になる。
一般には、文字認識装置の各種処理の内部パラメータの多くは文字サイズに基づいて設定されており、その場合、解像度に依存して読める文字のサイズが変化してしまう。解像度に応じてパラメータを変更する方式においても、解像度の情報が得られない場合には対応できない。例えば、デジタル・スチル・カメラなどを用いて、文書を非接触で読み取った場合には解像度なる情報は存在せず、画素数のみの情報しか得られないから、解像度として、予め設定済みのデフォルト値を利用するしかない。
近年、読み取り機器の精度向上によって、スキャナーなどの接触タイプの読み取りよりも手軽で高速に読み取ることが可能なために、非接触による文書原稿読み取りが増えている事実を鑑みると、解像度情報が取得できることを前提とした方法は、非常に都合が悪い。
【０００５】
【発明が解決しようとする課題】
本発明は、原稿から読み取った文書画像に記された文字の読み取りや認識処理を行う際に、解像度が不明なために文字行の切り出しが正確にできなかった従来技術の問題点に鑑みてなされたものであり、その目的は、読取解像度に依存することなく、行・文字を安定して切り出すことを可能とする手段（或いは処理ステップ）を備えた画像処理装置、画像処理方法及び同方法の実行に用いるプログラムを提供することにある。
【０００６】
【課題を解決するための手段】
請求項１の発明は、処理対象として入力された文書画像の画素ランに基づいて文字画像の外接矩形情報を生成する手段と、前記外接矩形情報に基づいて行毎に画素数で表現した行サイズを求める手段と、行毎に求めた行サイズからその代表値を算出する手段と、算出した行サイズの代表値と長さ単位で表現した標準文字サイズとを比較することにより解像度を推定する手段とを備えたことを特徴とする画像処理装置である。
請求項２の発明は、処理対象として入力された文書画像の画素ランに基づいて文字画像の外接矩形情報を生成する手段と、前記外接矩形情報に基づいて行毎に画素数で表現した行サイズを求める手段と、行毎に求めた行サイズに対して所定割合以上のサイズを有する当該行内文字画像の外接矩形情報から文字サイズの代表値を算出する手段と、算出した文字サイズの代表値と長さ単位で表現した標準文字サイズとを比較することにより解像度を推定する手段とを備えたことを特徴とする画像処理装置である。
請求項３の発明は、請求項２に記載された画像処理装置において、文字サイズの代表値を算出する前記手段は、行サイズに対して所定割合以上のサイズの当該行内文字画像の外接矩形数が、当該行内文字画像の外接矩形総数の所定割合以上の場合のみ、算出に用いるようにしたことを特徴とする画像処理装置である。
請求項４の発明は、処理対象として入力された文書画像の画素ランに基づいて文字画像の外接矩形情報を生成するステップと、前記外接矩形情報に基づいて行毎に画素数で表現した行サイズを求めるステップと、行毎に求めた行サイズからその代表値を算出するステップと、算出した行サイズの代表値と長さ単位で表現した標準文字サイズとを比較することにより解像度を推定するステップと、を有することを特徴とする画像処理方法である。
請求項５の発明は、処理対象として入力された文書画像の画素ランに基づいて文字画像の外接矩形情報を生成するステップと、前記外接矩形情報に基づいて行毎に画素数で表現した行サイズを求めるステップと、行毎に求めた行サイズに対して所定割合以上のサイズを有する当該行内文字画像の外接矩形情報から文字サイズの代表値を算出するステップと、算出した文字サイズの代表値と長さ単位で表現した標準文字サイズとを比較することにより解像度を推定するステップと、を有することを特徴とする画像処理方法である。
請求項６の発明は、請求項５に記載された画像処理方法において、文字サイズの代表値を算出する前記ステップは、行サイズに対して所定割合以上のサイズの当該行内文字画像の外接矩形数が、当該行内文字画像の外接矩形総数の所定割合以上の場合のみ、算出に用いるようにしたことを特徴とする画像処理方法である。
請求項７の発明は、請求項４ないし６のいずれかに記載された画像処理方法の各処理ステップをコンピュータに実行させるためのプログラムである。
【００１７】
【発明の実施の形態】
本発明を添付する図面とともに示す以下の実施形態に基づき説明する。
下記の「実施形態１」〜「実施形態３」は、原稿から読み取った文書画像における解像度を求めるための方法を示す。求めた解像度は、文書画像の文字読み取りや文字認識の処理を行う際に、設定する各種パラメータを調整し、処理対象の文字行を安定して切り出すために用いられる。
図１は、処理対象となる文書画像の一例を示す。なお、下記の各実施形態では、日本語文の横書原稿を例に説明するが、特にことわらない限り、本発明は、例示に限定されるものではなく、文書画像中に頻出する文字サイズを代表文字サイズとみなし、その画素数(単位[dot])と、一般文書で用いられる代表的な文字のサイズ(単位[inch][mm]など)とを用いて当該文書画像の実効解像度(単位[dot/inch][dot/mm]など)を推定すること、そしてそれに基づいて、行切り出し処理および文字切り出し処理にて使用される各種パラメータを変更することによって認識精度が向上することを示すものであり、特定の言語、文字画像種類(手書き／活字文字など)、書式(縦書き／横書き)に限定されない。
【００１８】
「実施形態１」
図１に示す横書きの文書を対象として、縦方向および横方向に射影を求めると、その結果は、それぞれ図２の（ａ），（ｂ）のようになる。なお、図２（ａ），（ｂ）は、それぞれ縦（Ｙ）軸、横（Ｘ）軸と直交する軸に累積黒画素数をとり、射影を求めた結果を表している。
図示の表現で、射影が横縞状に求められた場合には横書であるし、縦縞状に得られたのならば、縦書である。各々の場合に、縞の幅が文字高さあるいは文字幅に相当するから、この縞の幅を集計すれば、文字高さあるいは文字幅を集計したことになる。集計結果において、最も頻度の高い値を、対象画像の代表文字高さ(幅)として用いることができる。
しかしながら、処理対象となる文書画像が図３に示すような複雑なレイアウト（即ち、図や複数サイズのフォントが混在、段組が複雑）の原稿の場合、縦方向及び横方向に射影を求め、図２と同様の形式で表現すると、図４のようになり、縞状に求められないので、このやり方では文字高さ(幅)を求めることができない。
【００１９】
そこで、原稿画像中の黒ランの外接矩形を求める方法を適用する。なお、「ラン」は、連続画素データが同一値をとる場合に、この連続画素のかたまりを指す概念で、符号化の単位として扱われる（ファクシミリなどで扱う2値の文書画像において、一次元方向に連続する白画素、黒画素のかたまりを「白ラン」、「黒ラン」として符号化の単位とする例は周知）。
求めた黒ランの外接矩形の内、文字要素と思われる矩形を、その近隣の矩形と統合していくことによって、文字行を作成し、文字高さを求める。このとき、文字要素と思われる矩形の判定は、ＯＣＲ処理可能な文字のサイズ制限に基づいて、矩形サイズを制限することにより実行可能である。このようにして、対象画像（図１）において、黒ランの外接矩形を求めた結果を図５に示す。
次いで、この外接矩形に対し統合処理を行い文字行を生成する。統合処理は、図６の説明図に示す操作を行う。即ち、統合の対象として選択した２つの外接矩形を統合するか否かを矩形間の水平距離（図６（ａ））及び垂直距離（図６（ｂ））が基準値以内にあるか否かにより判定し、判定結果を受けて統合を実行する。
統合するか否かの判定は、順次選択される２外接矩形を対象にして全ての外接矩形について行うことにより文字行を作成する。
このようにして、対象画像の黒ランの外接矩形（図５）に統合処理を行い、得られる文字行の作成結果を図７に示す。
上記のようにして文字行（図７）を求めた後、原稿中の全ての文字行の幅(高さ)を集計して、その代表値を文字サイズとして得る。
本例では、全ての文字行の幅(高さ)に関して頻度ヒストグラムを作成し、最頻値の文字行のサイズを代表値とする。
図８は、このヒストグラムを例示するものである。ここでは、文字行の幅(高さ)をdot数（画素数）としてその頻度をヒストグラムとして表している。図示のように、最頻値の文字行のサイズを代表文字サイズとして、後述する実効解像度の算出に用いる。
【００２０】
ところで、上記した文字行を作成する方法で求めた文字サイズはあくまで画素数が単位であり、実際の物理的な長さが求められたわけではないことに注目する必要がある。
文字認識装置の内部では、解像度や画素数を用いて、処理対象の文字サイズや、行間距離、文字間距離など、各種パラメータに上限値や下限値を設定し、行切り出し処理や文字切り出し処理を行う。一般には、画素数によって読み取り文字サイズの上限や下限が規定されている文字認識装置が多く、読み取り画像データの解像度が異なると、読み取り可能な文字の実際の大きさも異る。
例えば、解像度400dpiの場合で読み取れる文字サイズの上限が24point(1point=1/72inch)であれば、解像度200dpiのデータに対しては、倍のサイズの48pointまで読み取り可能になる。一方、解像度400dpiの場合で読み取れる文字サイズの下限が6pointであれば、解像度200dpiにおいては、12point以上の文字でないと、読み取られなくなる。
図９は、同一原稿を解像度を変えて読み取った実際の例を示す。図中の（ａ）は解像度200dpi(主走査および副走査同じ)、（ｂ）は解像度400dpiで読み取った結果を例示する。図９に示すように、同一の原稿でも読み取り解像度が変われば、画素単位での文字サイズが異なることがわかる。また、図１０は、図９の１文字を同一サイズに拡大したものである。読み取り解像度が異なるので、文字を構成する画素数も異なっていることが文字における斜線部分のギザギザの程度で明確にわかる。
文字認識装置の使用者にとっては、画素数による文字サイズは文字認識装置の内部データにすぎず、文字サイズとしては実際の物理（長さ）単位の方を意識することになる。従って、解像度に依存して文字の読み取り可能なサイズが変化すること、つまり図１０のように長さが同じであるのに解像度が異なる場合のように、一方の文字の読み取りが不適になるという状況が起きるのは、混乱を生じるので好ましくない。
また、解像度に応じて各種パラメータの値を変更するような文字認識装置でも、「従来の技術」の項でデジタル・スチル・カメラを例に述べたように、解像度情報のない画像データも増えてきており、解像度のないデータに対しては、予め設定済みのデフォルト値を使用するしか方法がない。
【００２１】
そこで、文書中を代表する文字、本例では最も頻出している文字、を本文の文字とみなし、これが一般的な文書の本文に採用される文字の実際の物理的サイズ（長さ）を持つとすれば、画素数と実際のサイズ（標準文字サイズ）から解像度（以下「実効解像度」という）が推定できる。即ち下記式(1)によって、実効解像度（推定値）が算出可能である。
実効解像度[dpi]([dot/inch])=
文字のサイズ(画素数[dot])/標準文字サイズ(長さ[inch]) ………式(1)
一般的な文書の本文に採用される文字に標準サイズが存在するかが問題であるが、以下の歴史的事情を考慮すると、標準的なサイズは存在すると考えても構わない。
『日本における近代印刷である和文活字は、号数活字というシステムに基づいており、五号、つまり10.5ポイントを中心とした活字によって、長く日本の活字文化をささえていた。戦後、細いポイント活字の普及が進んだが、活字は一度大きさを決めて、そろえると簡単に変えるわけにはいかない。特に文字数の非常に多い日本語では、ひとそろいの活字の大きさを変更するにはたいへんな時間と労力と費用とがかかることになる。その結果、日本における活字の大きさは、号数活字・ポイント活字が混在することになった。』（大西哲彦著：「ユーザーのための写植ガイドブック」pp.16, 印刷学会出版部 (1992)より引用）
また、写植が発達しても、過去の活字文化を受け継ぎ、本文の文字サイズは10.5ポイントが多い。事実、広く一般に利用されているマイクロソフト社の日本語ワープロソフト「WORD」においても、デフォルトの文字サイズは10.5ポイントに設定されている。
このように、本文中の文字には標準的なサイズが存在すると仮定することは十分妥当であり、有意な結果をもたらす。
【００２２】
次に、文書画像の文字認識処理における各種処理の内部パラメータを設定する処理に係わるフローについて述べる。
本フローは、対象原稿から得た画像情報に基づいて推定値として算出される上記した実効解像度を内部パラメータの設定に反映させるもので、概略のフローを示す図１１を参照して、処理フローの各ステップを説明する。
先ず、スキャナー、デジタル・スチル・カメラなどの画像入力機器によって、処理対象の文書画像を記した原稿の読み取り、画像処理等の入力処理を行う（step 1）。この入力処理において、原稿の文書画像の黒ランの生成処理を行う。
次いで、生成された文書画像の黒ランに基づいて、黒ランの外接矩形を求める（step 2）。ここで求められる黒ランの外接矩形には、文字以外の図表等によるものも含まれている。
そこで、求めた黒ランの外接矩形から文字と見なせる矩形を抽出する処理を行い、抽出した文字と見なせる矩形同士で近隣の矩形と統合する処理を行い、文字行を作成する（step 3）。
作成した文字行から文字行サイズ（dot数）のヒストグラムを得、最頻値を代表文字サイズとして求め、実効解像度の算出のために設定する（step 4）。
次に、step 4で設定された代表文字サイズ（dot数）を予め設定済みの標準文字サイズ（inch）と比較し（上記式(1)）、即ちdot/inchの演算を行うことにより実効解像度を算出する（step 5）。
その後、step 5 で算出した実効解像度に基づいて、文字認識装置の各種処理の内部パラメータを解像度に適した値に設定しなおす（step 6）。
【００２３】
「実施形態２」
本実施形態は、「実施形態１」と同様に実効解像度を算出するが、文字サイズの求め方を異にし、読み取られた原稿画像に傾き（スキュー）が生じた場合に受ける影響を抑制することを可能にした方法を採用することにより、より正確な文字サイズを得ることを意図したものである。
原稿読み取りの際に、原稿が正しく（傾き無く）配置されてスキャンされた場合には、文字行の幅(高さ)が文字高さにほぼ相当するので、「実施形態１」による方法で文字サイズを求めても、問題は生じない。
しかしながら、「実施形態１」において、原稿が傾いて読み取られた場合、図１２に示すように、文字行も傾いてしまうため、文字行の矩形範囲と、行内の各文字矩形の範囲とに差が生じてしまう。この場合、実際の文字サイズよりも文字行の幅(高さ)の方が大きめになるために、文字行の幅(高さ)に基づいて算出される代表文字サイズも、実際のサイズよりも大きくなってしまい、正確な解像度が算出できなくなり、都合が悪い。
【００２４】
そこで、頻度ヒストグラムを求める対象として、文字行の幅(高さ)ではなく、文字行内に存在する外接矩形の幅(高さ)とすれば、原稿が傾いたことによる影響を極力排除できる。
このとき、図１２に示すように、文字要素の点や句読点など、行幅(高さ)に対して著しく小さい外接矩形の場合には、代表文字サイズの算出には適さないので頻度ヒストグラム集計の対象とはしない。これは、行幅(高さ)に対する行内の外接矩形幅(高さ)の割合に、予め所定のしきい値を設けておき、所定のしきい値以上の矩形のみを頻度ヒストグラム集計の対象に加えることによって、容易に実現できる。
上記の方法を実行するためには、「実施形態１」におけると同様に、文字行を作成するまでの処理を行った後に、以下の処理操作を行う必要がある。
作成された文字行を指定し、当該文字行の行内矩形の中の注目矩形の幅（高さ）：Waと、当該文字行の行幅（高さ）：Wbとを取得して、Wa/Wbを求め、得たWa/Wbがしきい値T1以上であれば、注目矩形は頻度ヒストグラム集計の対象とし、他方、Wa/Wbがしきい値T1未満であれば、注目矩形は集計対象外とする。
こうして対象を絞り、得られた矩形の幅（高さ）を頻度ヒストグラム集計の対象となる文字サイズとして代表文字サイズを求める。これ以降の処理は「実施形態１」と変わりがない。
【００２５】
「実施形態３」
本実施形態は、「実施形態２」において、傾きが大きい場合に、頻度ヒストグラム集計の対象に適さない文字サイズが入ってしまうので、これを排除することにより、より正確な文字サイズを得ることを意図したものである。
原稿読み取りの際に、傾きが大きくなると、図１２の（ｂ）のように、２行が１行にまとめられる場合がある。また、傾いていなくても、行間に図などの矩形が存在した場合にも、複数行が１行にまとめられてしまう。
このような行の場合、行内に行サイズに近いサイズの矩形は、存在しないか(傾きがひどい場合)、或いは文字でない矩形(行間に図やノイズなどの矩形が存在した場合)であり、こうした矩形を代表文字サイズの算出に取り込むとエラーが多くなるので、頻度ヒストグラム集計の対象には適さない。
例えば、図１２の（ａ）は代表文字サイズの算出に適する行であるが、図１２の（ｂ）は適さない行である。
よって、行サイズに対して所定割合以上のサイズの行内矩形の数が、行内矩形の総数に対して、所定割合より低い行は、代表文字サイズの判定処理の対象とはしない。これは、行内の矩形数と、一定割合以上のサイズの矩形数とを計数しておき、その比が予め設定した値以上の場合のみ、当該行の結果を頻度ヒストグラム集計の対象にすることで容易に実現できる。
上記の方法を実行するためには、「実施形態１」におけると同様に、文字行を作成するまでの処理を行った後に、以下の処理操作を行う必要がある。
作成された文字行を指定し、当該文字行の行内矩形の中の注目矩形の幅（高さ）：Waと、当該文字行の行幅（高さ）：Wbとから、Wa/Wbを得、得たWa/Wbがしきい値T1以上の矩形の数：Naを求め、当該文字行の行内矩形の総数：Nbを求め、それらの比：Na/Nbがしきい値T2以上の場合は、当該文字行は頻度ヒストグラム集計の対象とし、他方、Na/Nbがしきい値T2未満であれば、当該文字行は集計対象外とする。
こうして対象を絞り、得られた矩形の幅（高さ）を頻度ヒストグラム集計の対象となる文字サイズとして代表文字サイズを求める。これ以降の処理は「実施形態１」と変わりがない。
【００２６】
以下に示す「実施形態４」〜「実施形態８」は、原稿から読み取った文書画像におけるスキュー（傾き）を検出するための方法を示す。
スキューは、図１２に示すように、原稿を読み取り、得られる文書画像に生じた傾きを意味する。この傾きが大きい場合には、文書画像に対する分割処理、即ち直線によって１行毎に分割する処理、さらに一文字毎に分割する処理も困難にする。このように、スキューは、文書画像の文字読み取りや文字認識の処理等を行う場合に、エラーや処理不能が生じる原因となるので、それを補償するためにスキューの検出が行われる。
以下の実施形態では、黒ランの外接矩形メソッド（上記した実施形態１〜３においても、文字行の切り出しに用いた方法）をベースに、レイアウトの複雑な原稿に対してもスキューの検出を精度良く行うことを意図し、その実現を図るものである。
【００２７】
「実施形態４」
本実施形態におけるスキュー検出方法の原理を説明する。図１３は、この検出原理の説明図である。
黒ランの外接矩形メソッドをベースにした方法であり、これまでと同様に原稿からの読み取り文書画像から文字と見なせる黒ランの外接矩形の生成、矩形統合を行い文字行を生成する。
図１３（ａ）に示すように、文字行を生成した後に、文字行内の矩形を対象に回帰直線（破線）を求める処理を行う。なお、この処理の前提として、文字行内の各矩形は図１３（ｂ）に示すように、読み取り文書画像に設定されたＸＹ座標軸（スキューの基準軸を定めるものでもある）における２点の座標(Xs,Ys),(Xe,Ye)で定義しておく。
回帰直線（破線）を求める処理においては、ＸＹ座標軸で位置を定義された文字行内の矩形の４点の中の１点に注目し、それを座標(Xi,Yi)の形で表現する。本例では、図１３（ａ）に示すように、矩形の始点の座標(Xs,Ys)に注目しているが、実際には矩形の４点の内、どの点でも構わない。
文字行内の各矩形の注目点(Xi,Yi)の座標、即ち(X1,Y1),(X2,Y2),……,(Xn,Yn),…の軌跡を直線で近似すれば、図１３（ａ）中の破線となり、この破線と水平線（Ｘ軸）との角度が原稿の傾きに相当する。
【００２８】
座標(Xi,Yi)の軌跡の線形近似は、回帰分析を行うことにより求めることができる。
X に対するYの回帰直線を求める方法は、「統計」に関する教科書(例えば、．ガットマン．Ｓ．Ｓ．ウィルクス著「工科系のための統計概論」培風社刊)に詳しいが、簡単には以下のようになる。
X に対する Y の回帰直線の式は、
Y = AX + B ………式(2)
の形で表され、A をXに対するYの回帰係数と言う。
A = {NΣXiYi-(ΣXi)(ΣYi)}/{NΣXi²-(ΣXi)²} ………式(3)
によって A を求め、次に、
B =ΣYi/N-AΣXi/N ………式(4)
によって B を求める。
一行に関しては文字行内の各矩形の注目点(Xi,Yi)に上記式(2)〜(4)を適用することにより傾き：Aが算出できるので、原稿中の全文字行に対して傾きを求めた後、その代表値を求めるスキューとする。代表値としては、頻度ヒストグラムとして集計し、最も頻出する傾きを選択する方法、あるいは傾きの平均を算出する方法、などによって原稿の傾きを決定する。なお、傾き(スキュー)角度θは θ = (tan)^-1 A で求められる。
このようにして検出されたスキューは、処理文書画像の文字読み取りや文字認識の処理等を行う場合に、エラーや処理不能が生じる原因となるので、スキューによる影響を除く、或いは回避する等のスキュー補償の処理を行う。スキュー補償自体は、処理文書画像の文字読み取りや文字認識の処理等において、従来から実施されている方法を採用でき、例えば、画像入力処理において処理対象文書画像に補正を掛けるとか、スキューが著しい場合には、処理の対象としないで原稿の再読み取りを指示するといった方法によって対処する。
【００２９】
「実施形態５」
本実施形態は、「実施形態４」と同様にスキューを検出するが、スキューが著しくなると、複数の文字行が一つにまとめられるということが一部に起きる場合があり、スキューの検出結果にエラーを導く。このような影響を抑制することを可能にした方法を採用することにより、より正確なスキューを検出することを意図したものである。
図１４は、原稿画像のスキューが著しくなった結果、行切り出し処理に失敗して、複数行が一つの行にまとめられた場合を示している。複数行に「実施形態４」と同じ回帰直線を求める方法を適用すると、文字行内の矩形の座標の軌跡を直線で精度良く近似することは難しく、回帰分析して求めた回帰直線の傾き（図１４中に破線にて示す）と、実際の原稿の傾きとの不一致が著しくなる場合が生じる。もちろん、図１３のように、一行が正しく切り出されている場合でも、点や句読点など、行サイズと比較して著しく小さい矩形の座標も、回帰分析の対象に加えた場合、実際の行の傾きと回帰直線の傾きとが乖離する原因になる。
そこで、文字行の幅(高さ)と比較して著しく小さな当該文字行の行内矩形を回帰分析の対象から排除する。このために、文字行の幅(高さ)に対する行内矩形サイズの割合に、所定のしきい値を設けておき、一定以上の行内矩形のみ分析の対象とする。
【００３０】
上記の方法を実行するためには、「実施形態１」におけると同様に、文字行を作成するまでの処理を行うが、このときに用いる文字行作成（行切り出し）手段は、原稿のスキュー角度が大きい場合、文字行の切り出しを正常に行うことが困難であり、ほとんどの行切り出し結果が、図１４のように複数行が一つにまとめられてしまうおそれがある。その場合、回帰分析対象となる座標が少ないので、回帰直線を求めても、それが行の傾きを表現していないことも起こる。
こうした不具合を避けるために、回帰分析を行う前に、分析対象の矩形の数と、行内矩形の総数との比を算出し、分析対象の行内矩形数が少なく、分析対象の行内矩形数が行内矩形総数の予め定めたしきい値以下の割合であるならば、回帰分析を行わず、スキュー角度算出不能である、という判断を行うようにする。
具体的には、以下の処理操作を行う必要がある。
作成された文字行を指定し、当該文字行の行内矩形の中の矩形の幅（高さ）：Waと、当該文字行の行幅（高さ）：Wbとを取得して、Wa/Wbを求め、得たWa/Wbがしきい値T1以上である矩形の数：Naを求め、当該文字行の行内矩形の総数：Nbを求め、それらの比：Na/Nbがしきい値T2以上の場合は、当該文字行はスキュー角度算出可能とし、他方、Na/Nbがしきい値T2未満であれば、当該文字行はスキュー角度算出不可能とする。
こうして対象を絞り、得られたスキュー角度を頻度ヒストグラム集計等の対象となるスキュー角度として代表スキュー角度を求める。また、これ以降のスキューを補償する処理は「実施形態１」と変わりがない。
【００３１】
「実施形態６」
本実施形態は、「実施形態４，５」と同様にスキューを検出する場合、原稿の単位で検出を実行する意義があるかを定める基準を設け、検出時にその基準による判定を行うことにより適正かつ無駄のない動作を保証することを意図したものである。
一般的に、スキュー角度を検出可能な範囲には制限を設けておき、すべての角度を検出可能であることを保証しない。なぜなら、天地が逆転した原稿の場合、あるいは、90度回転した原稿の場合など、文字認識しない限りは、原稿方向は不明であり、傾き角度検出するために、負荷の大きな文字認識処理を複数回にわたって実行しなくてはならず、現実的ではないからである。多くの文字認識処理装置が保証しているスキュー検出角度は、-10〜0〜10度、最高±45度である。
したがって、検出可能なスキュー角度を越えて傾いている原稿に対しては、スキュー角度を求めることは無意味であり、大きくスキューしている旨を、オペレータに提示する方が実用上都合がよい。
スキュー角度が大きいために、行切り出し処理が失敗する例として、図１２(ｂ)のように複数行が１行にまとめられてしまう場合がある。このような行を検出するには、「実施形態５」で述べたように、当該文字行において、行サイズに対して所定割合以上のサイズの文字行内矩形数が、文字行内矩形総数より著しく少ない場合には、文字行切り出し失敗行であると判断すればよい。
そして、１枚の原稿中、行切り出しした行の総数に対し、上記の判断によって文字行切り出し失敗と判定された行が所定割合以上であれば、当該原稿は、スキュー角度検出不可能なほど傾いていると判断する。
上記の方法を実行するためには、文字行を作成するまでの処理を行った後に、以下の処理操作を行う必要がある。
対象（注目）原稿について作成された文字行を指定し、当該文字行の行内矩形の中の矩形の幅（高さ）：Waと、当該文字行の行幅（高さ）：Wbとを取得して、Wa/Wbを求め、得たWa/Wbがしきい値T1以上である矩形の数：Naを求め、当該文字行の行内矩形の総数：Nbを求め、それらの比：Na/Nbがしきい値T2以上であるかを全文字行にわたって調べ、しきい値T2以上の文字行の数：Ncを求める。次いで、しきい値T2以上の文字行の数：Ncと当該原稿の文字行の総数：Ndの比：Nc/Ndがしきい値T3以上であるかを調べ、当該原稿がしきい値T3以上ならば、注目原稿のスキュー角度検出を行い、未満ならば、注目原稿のスキュー角度検出を実行しない、という処理を行う。
こうして無意味な検出動作を避けて、意義のある対象に対するスキュー角度のみに検出を行う。
【００３２】
「実施形態７」
本実施形態は、「実施形態４〜６」と同様にスキューを回帰分析法により直線近似により検出する場合、検出した結果を評価し、評価結果に従い検出結果を利用するか否かを決めることにより、検出の高精度化を図ることを意図したものである。
回帰分析において、座標の軌跡の直線近似の合致の程度を表現する値として相関係数がある。相関係数は「実施形態４」で述べた処理手順（式(2)〜(4)、参照）に加え、以下の処理を追加することによって得られる。
即ち、X と Y の立場を逆にすると、もう１つの回帰直線ができる。
Y に対する X の回帰直線の式は、
X = CY + D ………式(5)
であり、この場合に、
C = {NΣXiYi-(ΣXi)(ΣYi)}/{NΣYi²-(ΣYi)²} ………式(6)
によって C を求め、次に、
D = ΣXi/N-CΣYi/N ………式(7)
によって D を求める。
X と Y の相関係数をρとすると、
ρ=±√|AC| ………式(8)
となる。なお、式(8)における複号の選択は、A または C の分子の符号とする。
相関係数ρの絶対値が１に近いほど、座標の直線近似がうまくいっていることになり、回帰直線が文字行の傾きに相当すると考えてよいといえる。逆に、相関係数ρの絶対値が１より小さく、０に近くなるほど、回帰直線と、文字行の傾きとは一致しない度合が強くなるといえる。
よって、当該文字行内の矩形座標に対して算出された相関係数ρの絶対値が小さい場合、その行は文字切り出しに失敗している可能性が高く、原稿の傾きを求めるには利用すべきでない。
本例の方法を実行するためには、回帰分析法を各文字行内の矩形に適用して直線近似によりスキューを検出する処理を行うときに、相関係数ρを求め、得た相関係数ρの絶対値に対して、予めしきい値を設定しておき、しきい値以下である行は、スキュー検出の対象としないようにすることにより容易に実現できる。
こうして精度が保証されない検出結果を用いることを避けて、正しい結果が得られる対象から得られたスキュー角度を頻度ヒストグラム集計等の処理に用いて代表スキュー角度を求めるようにする。また、これ以降のスキューを補償する処理は「実施形態１」と変わりがない。
【００３３】
「実施形態８」
本実施形態は、「実施形態４〜６」と同様にスキューを検出し、「実施形態７」と同様に相関係数によるチェックを掛ける場合、原稿の単位で検出する意義があるかを定める基準を設け、検出時にその基準による判定を行うことにより適正かつ無駄のない動作を保証することを意図したものである。
本例では、１枚の原稿中の各文字行に対して「実施形態７」に述べた相関係数によるチェックを掛け、検出対象から外された文字行数の原稿全体の文字行の総数に対する割合が多くなれば、その原稿はスキューが大きすぎたためであり、スキューを算出する意味がない原稿であると判断する。
本例の方法を実行するためには、文字行切り出し結果の総数を計数し、相関係数の絶対値に対して、予めしきい値を設定しておき、しきい値以下である行の数を計数し、その数が全行数にしめる割合に対してしきい値処理を行い、割合の大小を判定することによって、容易に実現できる。
即ち、対象（注目）原稿について作成された各文字行の矩形に回帰分析法を適用して直線近似によりスキューを検出する処理を行うときに、相関係数ρを求め、得た相関係数ρの絶対値がしきい値T4未満である文字行の数：Ncと当該原稿の文字行の総数：Ndの比：Nc/Ndがしきい値T5以上であるかを調べ、しきい値T5以上ならば、注目原稿のスキュー角度検出を実行せず、未満ならば、注目原稿のスキュー角度検出を行う、という処理を行う。
こうして無意味な検出動作を避けて、意義のある対象に対するスキュー角度のみに検出を行う。
【００３４】
「実施形態９」
本実施形態は、本発明に係わる文字認識装置、或いは画像処理装置の実施形態を示すものである。上記の「実施形態１〜３」に示した文書画像における実効解像度を求めるための方法、或いは「実施形態４〜８」に示した文書画像におけるスキューの検出方法に示した処理ステップを実行する手段として、汎用の処理装置（コンピュータ）を利用して構成される装置を例示するものである。
図１５は、本実施形態の文書画像の処理装置の構成を例示する。図１５に示すように、本例は、汎用の処理装置（コンピュータ）により実施する例を示すものであり、構成要素としてＣＰＵ１、メモリ２、ハードディスクドライブ３、スキャナ、キーボード、マウス等の入力装置４、ＣＤ−ＲＯＭドライブ５、ディスプレイ６、フレキシブルディスクドライブ７、通信装置８などを用意し、これらをバス接続して構成する。
また、記憶手段としてのメモリ２、ハードディスクドライブ３、ＣＤ−ＲＯＭドライブ５、フレキシブルディスクドライブ７が用いる記憶媒体（図示せず）の一部には、本発明に係わる文字認識処理や画像処理の機能を実現し、上記実施形態に示した実効解像度を求めるための方法、或いはスキューの検出方法で述べた処理手順を実現させるためのプログラム（ソフトウェア）が記録されている。
処理対象の原稿文書画像は、スキャナー等の入力装置４により入力され、例えばハードディスク３などに格納されているものである。ＣＰＵ１は、記憶手段が有する記録媒体から上記した処理機能・処理方法を実現するプログラムを読み出し、プログラムに従う処理を対象文書画像に実行し、その処理結果等をディスプレイ６などに出力する。
なお、本発明に係わる文字認識装置、或いは画像処理装置を、図１６に示すように、通信装置８によりインターネットなどの通信回線２０を介して、外部の装置１１〜１３と接続して、機能の一部をネットワーク上に持つような形態で実施してもよい。
【００３５】
【発明の効果】
（１）請求項１，４の発明に対応する効果
処理対象の文書画像を基に生成した黒ランの外接矩形から文字の要素と思われる矩形を抽出し、近隣の矩形同士を連結して行に成長させ、得られる各行のサイズを集計し、代表文字サイズ（画素数）を求め、求めた値を標準文字サイズ（長さ）と比較することにより解像度を推定するようにしたので、読取解像度に依存することなく、行・文字を安定して切り出すことができるようになり、高精度かつ頑強な画像処理装置又は画像処理方法を実現することが可能となる。
（２）請求項２，５の発明に対応する効果
処理対象の文書画像を基に生成した黒ランの外接矩形から文字の要素と思われる矩形を抽出し、近隣の矩形同士を連結して行に成長させ、得られる各行毎の行サイズに対して所定割合以上のサイズを有する当該行内の外接矩形情報を基に代表文字サイズ（画素数）を求め、求めた値を標準文字サイズ（長さ）と比較することにより解像度を推定するようにしたので、読取解像度に依存することなく、スキューの影響を受けにくく、行・文字を安定して切り出すことができるようになり、高精度かつ頑強な画像処理装置又は画像処理方法を実現することが可能となる。
（３）請求項３，６の発明に対応する効果
上記（２）の効果に加え、スキューが大きい場合に、代表文字サイズ算出の対象に適さない文字サイズが入ってしまうものを排除することができるので、より正確な文字サイズを得ることが可能になる。
（４）請求項７の発明に対応する効果
請求項４ないし６のいずれかに記載された画像処理方法の各ステップを実行するためのプログラムを汎用の処理装置（コンピュータ）に搭載することにより、上記（１）ないし（３）いずれかの効果を容易に具現化することが可能になる。
【図面の簡単な説明】
【図１】処理対象となる文書画像の一例を示す。
【図２】横書きの文書を対象として、縦方向および横方向に射影を求めた結果を示す。
【図３】複雑なレイアウトの原稿の例を示す図である。
【図４】図３に示す原稿を対象として、縦方向および横方向に射影を求めた結果を示す。
【図５】文書画像の例（図１）における文字と見なせる黒ランの外接矩形を作成した結果を示す。
【図６】近隣の矩形を統合する処理を説明する図である。
【図７】統合処理の結果得られる文字行の矩形と文字外接矩形を示す図である。
【図８】文字行の幅(高さ)に関してとった頻度ヒストグラムと代表文字サイズを示す図である。
【図９】同一原稿を解像度を変えて読み取ったときの実例を示す。
【図１０】元は同一サイズの文字を解像度を変えて読み取ったときの例を示す。
【図１１】実効解像度を内部パラメータの設定に反映させる処理手順を含む文書画像の文字認識処理に係わるフローチャートを示す。
【図１２】スキュー発生時の行切り出し状態を説明する図である。
【図１３】文字行内の外接矩形への回帰分析の適用（ａ）と座標による矩形の定義（ｂ）を説明する図である。
【図１４】スキュー発生時に行切り出しに異常が生じた場合の回帰直線の状態を説明する図である。
【図１５】本発明の実施形態に係わる文書画像の処理装置の構成を示す。
【図１６】本発明の実施形態に係わる文書画像の処理装置の他の構成を示す。
【符号の説明】
１…ＣＰＵ、２…メモリ、
３…ハードディスクドライブ、４…入力装置、
５…ＣＤ−ＲＯＭドライブ、６…ディスプレイ（表示装置）、
７…ＦＤドライブ、８…通信装置。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to image processing used for OCR (optical character reading) and character recognition. More specifically, it relates to an image processing apparatus, an image processing method, and a program used to execute the method, each provided with means (or a processing step) for estimating the resolution of an input document image in order to improve character line and character segmentation performance.
[0002]
[Prior art]
In image processing, characters written in a document image read from an original have long been read and recognized. In such processing, accurately segmenting the character lines (character regions) in the document image is indispensable for high recognition accuracy.
To improve character line segmentation, factors that affect it, such as the reading resolution of the document image and its skew, must be known for each original. The reading resolution is used to adjust the various parameters set for character segmentation.
[0003]
Various proposals have been made for preprocessing (including character line segmentation) that prepares the conditions for reading and recognizing characters in a document image. One is JP-A-6-187490 (area division method), which obtains a representative character size in the document and converts it to a preset, easy-to-handle character size suited to processing. This method merely enlarges or reduces the image mechanically to the preferred character size; it has no notion of preserving the original resolution. The characters themselves are deformed and the features of the character images are degraded, which is a disadvantage for character recognition.
The method disclosed in JP-A-2000-306041 (character size estimation method and recording medium) estimates the character size actively, but obtains it only as a number of pixels. Characters of the same size on paper therefore yield different character sizes when the reading resolution differs. If the settings of the character recognition apparatus are changed according to resolution, the apparatus may behave differently, which can confuse its user.
In general, many internal parameters of the various processes in a character recognition apparatus are set on the basis of character size, so the size of characters that can be read changes with the resolution. A scheme that changes the parameters according to the resolution cannot cope when no resolution information is available. For example, when a document is captured without contact using a digital still camera or the like, no resolution information exists and only pixel counts are available, so there is no choice but to use a preset default value as the resolution.
In recent years, improvements in the accuracy of reading devices have made non-contact capture easier and faster than contact-type reading with a scanner or the like, and non-contact reading of documents is increasing. In view of this, a method that presupposes the availability of resolution information is very inconvenient.
[0005]
[Problems to be solved by the invention]
The present invention has been made in view of the problem of the prior art that, when characters written in a document image read from an original are read or recognized, character lines cannot be segmented accurately because the resolution is unknown. Its object is to provide an image processing apparatus, an image processing method, and a program used to execute the method, each provided with means (or a processing step) that makes it possible to segment lines and characters stably without depending on the reading resolution.
[0006]
[Means for Solving the Problems]
The invention of claim 1 is an image processing apparatus comprising: means for generating circumscribed rectangle information of character images based on pixel runs of a document image input as a processing target; means for obtaining, for each line, a line size expressed in pixels based on the circumscribed rectangle information; means for calculating a representative value from the line sizes obtained for each line; and means for estimating the resolution by comparing the calculated representative value of the line size with a standard character size expressed in units of length.
The invention of claim 2 is an image processing apparatus comprising: means for generating circumscribed rectangle information of character images based on pixel runs of a document image input as a processing target; means for obtaining, for each line, a line size expressed in pixels based on the circumscribed rectangle information; means for calculating a representative value of the character size from the circumscribed rectangle information of those character images in each line whose size is at least a predetermined ratio of the line size obtained for that line; and means for estimating the resolution by comparing the calculated representative value of the character size with a standard character size expressed in units of length.
The invention of claim 3 is the image processing apparatus according to claim 2, wherein the means for calculating the representative value of the character size uses a line for the calculation only when the number of circumscribed rectangles of character images in that line whose size is at least the predetermined ratio of the line size is at least a predetermined ratio of the total number of circumscribed rectangles of character images in that line.
The invention of claim 4 is an image processing method comprising: a step of generating circumscribed rectangle information of character images based on pixel runs of a document image input as a processing target; a step of obtaining, for each line, a line size expressed in pixels based on the circumscribed rectangle information; a step of calculating a representative value from the line sizes obtained for each line; and a step of estimating the resolution by comparing the calculated representative value of the line size with a standard character size expressed in units of length.
The invention of claim 5 is an image processing method comprising: a step of generating circumscribed rectangle information of character images based on pixel runs of a document image input as a processing target; a step of obtaining, for each line, a line size expressed in pixels based on the circumscribed rectangle information; a step of calculating a representative value of the character size from the circumscribed rectangle information of those character images in each line whose size is at least a predetermined ratio of the line size obtained for that line; and a step of estimating the resolution by comparing the calculated representative value of the character size with a standard character size expressed in units of length.
The invention of claim 6 is the image processing method according to claim 5, wherein the step of calculating the representative value of the character size uses a line for the calculation only when the number of circumscribed rectangles of character images in that line whose size is at least the predetermined ratio of the line size is at least a predetermined ratio of the total number of circumscribed rectangles of character images in that line.
The invention of claim 7 is a program for causing a computer to execute each processing step of the image processing method according to any one of claims 4 to 6.
[0017]
DETAILED DESCRIPTION OF THE INVENTION
The present invention will be described on the basis of the following embodiments, shown together with the accompanying drawings.
“Embodiment 1” to “Embodiment 3” below show methods for obtaining the resolution of a document image read from an original. The obtained resolution is used to adjust the various parameters set for character reading and character recognition of the document image, so that the character lines to be processed are segmented stably.
FIG. 1 shows an example of a document image to be processed. The following embodiments use a horizontally written Japanese document as the example, but unless otherwise noted the present invention is not limited to this example. The invention regards the character size that appears most often in a document image as the representative character size, and estimates the effective resolution of the document image (unit [dot/inch], [dot/mm], etc.) from that size in pixels (unit [dot]) and the typical character size used in general documents (unit [inch], [mm], etc.). It then shows that recognition accuracy improves when the parameters used in line segmentation and character segmentation are changed on the basis of that estimate. It is not limited to a particular language, type of character image (handwritten, printed, etc.), or writing direction (vertical or horizontal).
[0018]
“Embodiment 1”
When projections are taken in the vertical and horizontal directions for the horizontally written document shown in FIG. 1, the results are as shown in FIG. 2(a) and FIG. 2(b), respectively. FIG. 2(a) and FIG. 2(b) plot the cumulative number of black pixels on the axis orthogonal to the vertical (Y) axis and the horizontal (X) axis, respectively.
In this representation, a projection that forms horizontal stripes indicates horizontal writing, and one that forms vertical stripes indicates vertical writing. In either case the stripe width corresponds to the character height or character width, so tallying the stripe widths amounts to tallying character heights or widths. The most frequent value in the tally can be used as the representative character height (width) of the target image.
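The projection approach can be sketched as follows. This is a minimal illustration, assuming the image is already binarized into rows of 0 (white) and 1 (black) and that the text is written horizontally; the function names `stripe_widths` and `representative_height` are hypothetical and do not come from the patent.

```python
from collections import Counter


def stripe_widths(profile):
    """Lengths of consecutive non-zero stretches in a projection profile."""
    widths, run = [], 0
    for count in profile:
        if count > 0:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


def representative_height(image):
    """Most frequent stripe width of the row-wise black-pixel projection.

    image: list of rows, each a list of 0 (white) / 1 (black) pixels.
    Returns None when the image contains no black pixels.
    """
    # Projection onto the Y axis: black-pixel count of every row.
    profile = [sum(row) for row in image]
    widths = stripe_widths(profile)
    if not widths:
        return None
    return Counter(widths).most_common(1)[0][0]
```

For vertical writing the same tally would be taken over column sums instead of row sums.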
However, when the document image to be processed has a complicated layout as shown in FIG. 3 (that is, figures and fonts of several sizes are mixed and the column layout is complex), projections taken in the vertical and horizontal directions and plotted in the same form as FIG. 2 look like FIG. 4. No stripes appear, so the character height (width) cannot be obtained in this way.
[0019]
Therefore, a method that obtains the circumscribed rectangles of black runs in the document image is applied. A “run” is a group of consecutive pixels that share the same value and is treated as a unit of encoding. (It is well known that, in binary document images handled by facsimile and the like, groups of consecutive white pixels and black pixels along one dimension are encoded as “white runs” and “black runs”.)
Among the circumscribed rectangles of the black runs, those that appear to be character elements are merged with neighboring rectangles to build character lines, from which the character height is obtained. Whether a rectangle appears to be a character element can be decided by restricting the rectangle size according to the size limits of characters that OCR can process. FIG. 5 shows the circumscribed rectangles of black runs obtained in this way for the target image (FIG. 1).
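One possible way to go from black runs to size-filtered circumscribed rectangles is sketched below. The text does not specify how runs are grouped, so this sketch assumes that runs on adjacent scanlines belong together when they overlap horizontally, and it assumes a simple pixel range (`min_size`, `max_size`) as the character-size limit. All names and default values are hypothetical.

```python
from itertools import groupby


def row_runs(row):
    """(start, end) pairs, end exclusive, of black runs in one scanline."""
    runs, x = [], 0
    for is_black, group in groupby(row, key=bool):
        length = len(list(group))
        if is_black:
            runs.append((x, x + length))
        x += length
    return runs


def run_rectangles(image, min_size=1, max_size=100):
    """Circumscribed rectangles (xs, ys, xe, ye), inclusive corners.

    Runs on adjacent rows that overlap horizontally are grouped
    (assumption), and rectangles whose width or height falls outside
    [min_size, max_size] are discarded as non-character elements.
    """
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    prev = []
    for y, row in enumerate(image):
        cur = [(y, s, e) for s, e in row_runs(row)]
        for run in cur:
            parent[run] = run
            for other in prev:
                if run[1] < other[2] and other[1] < run[2]:
                    parent[find(other)] = find(run)
        prev = cur

    boxes = {}
    for key in list(parent):
        y, s, e = key
        root = find(key)
        xs, ys, xe, ye = boxes.get(root, (s, y, e - 1, y))
        boxes[root] = (min(xs, s), min(ys, y), max(xe, e - 1), max(ye, y))

    kept = []
    for xs, ys, xe, ye in boxes.values():
        w, h = xe - xs + 1, ye - ys + 1
        if min_size <= w <= max_size and min_size <= h <= max_size:
            kept.append((xs, ys, xe, ye))
    return sorted(kept)
```

The two corner points per rectangle match the (Xs, Ys), (Xe, Ye) convention the description uses later for rectangles inside a character line.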
Next, merging is applied to these circumscribed rectangles to generate character lines, using the operation illustrated in FIG. 6. Two circumscribed rectangles selected as merge candidates are merged or not depending on whether the horizontal distance (FIG. 6(a)) and the vertical distance (FIG. 6(b)) between them are within reference values.
This decision is made for successively selected pairs of circumscribed rectangles until all circumscribed rectangles have been examined, which produces the character lines.
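The pairwise merge decision can be sketched as follows. This is an illustration under assumptions: the distance between two rectangles is taken as the gap between their coordinate intervals (zero when they overlap), the reference values `max_dx` and `max_dy` are free parameters, and every pair is tested exhaustively. The patent does not state the order in which pairs are selected.

```python
def gap(a_lo, a_hi, b_lo, b_hi):
    """Distance between two closed intervals; 0 when they overlap."""
    return max(0, max(a_lo, b_lo) - min(a_hi, b_hi))


def merge_into_lines(rects, max_dx, max_dy):
    """Group rectangles (xs, ys, xe, ye) into character lines.

    Two rectangles are merged when both their horizontal and vertical
    gaps are within the reference values. Returns a sorted list of
    (line_box, member_rectangles) tuples.
    """
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, a in enumerate(rects):
        for j in range(i + 1, len(rects)):
            b = rects[j]
            dx = gap(a[0], a[2], b[0], b[2])
            dy = gap(a[1], a[3], b[1], b[3])
            if dx <= max_dx and dy <= max_dy:
                parent[find(j)] = find(i)

    groups = {}
    for i, r in enumerate(rects):
        groups.setdefault(find(i), []).append(r)

    lines = []
    for members in groups.values():
        box = (min(r[0] for r in members), min(r[1] for r in members),
               max(r[2] for r in members), max(r[3] for r in members))
        lines.append((box, members))
    return sorted(lines)
```

For horizontal writing, `max_dy` would normally be kept small so that rectangles on different lines are not joined, while `max_dx` spans the expected character spacing.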
FIG. 7 shows the character lines obtained by applying this merging to the circumscribed rectangles of black runs in the target image (FIG. 5).
After the character lines (FIG. 7) have been obtained, the widths (heights) of all character lines in the document are tallied and a representative value is taken as the character size.
In this example, a frequency histogram is built from the widths (heights) of all character lines, and the most frequent character line size is used as the representative value.
FIG. 8 illustrates this histogram, which plots frequency against character line width (height) in dots (pixels). As shown, the most frequent character line size is taken as the representative character size and used to calculate the effective resolution described later.
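The histogram step amounts to taking the mode of the line heights. A minimal sketch, assuming character lines are given as rectangles with inclusive corner coordinates and assuming that ties are broken in favor of the smaller size (the text does not say how ties are handled); `representative_size` is a hypothetical name.

```python
from collections import Counter


def representative_size(line_boxes):
    """Most frequent character line height, in pixels.

    line_boxes: iterable of (xs, ys, xe, ye) with inclusive corners.
    Returns None when there are no lines.
    """
    heights = [ye - ys + 1 for _, ys, _, ye in line_boxes]
    if not heights:
        return None
    histogram = Counter(heights)
    best = max(histogram.values())
    # Tie-break toward the smaller size (an assumption for determinism).
    return min(h for h, n in histogram.items() if n == best)
```

In practice neighboring bins could be pooled to absorb one-pixel jitter in line heights, but that refinement is not part of the description.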
[0020]
It should be noted that the character size obtained by the character line creation method described above is expressed in pixels, and the actual physical length is not necessarily known.
Inside a character recognition device, upper and lower limits are set for various parameters, such as the character size to be processed, the line spacing, and the character spacing, using the resolution and the number of pixels, and line segmentation and character segmentation are performed on that basis. In general, many character recognition devices define the upper and lower limits of the readable character size in pixels, so when the resolution of the read image data differs, the physical size of the readable characters differs as well.
For example, if the upper limit of the character size readable at a resolution of 400 dpi is 24 points (1 point = 1/72 inch), characters of up to twice that size, 48 points, can be read from data with a resolution of 200 dpi. Conversely, if the lower limit of the character size readable at 400 dpi is 6 points, characters cannot be read at 200 dpi unless they are 12 points or larger.
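The arithmetic behind this example can be checked with a short sketch. The helper names `points_to_pixels` and `pixels_to_points` are hypothetical; only the relation 1 point = 1/72 inch comes from the text.

```python
POINTS_PER_INCH = 72.0  # 1 point = 1/72 inch


def points_to_pixels(points: float, dpi: float) -> float:
    """Height in pixels of a character of the given point size at the given resolution."""
    return points * dpi / POINTS_PER_INCH


def pixels_to_points(pixels: float, dpi: float) -> float:
    """Point size that a given pixel height corresponds to at the given resolution."""
    return pixels * POINTS_PER_INCH / dpi


# Pixel limits that correspond to 24 pt and 6 pt at 400 dpi.
upper_px = points_to_pixels(24, 400)
lower_px = points_to_pixels(6, 400)

# The same pixel limits, reinterpreted for 200 dpi data.
upper_pt_at_200 = pixels_to_points(upper_px, 200)
lower_pt_at_200 = pixels_to_points(lower_px, 200)
```

Because the limits are fixed in pixels, halving the resolution doubles both physical limits, which is the 48-point and 12-point result stated above.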
FIG. 9 shows an actual example in which the same document is read at different resolutions. In the figure, (a) shows the document read at a resolution of 200 dpi (the same in the main scanning and sub-scanning directions), and (b) shows it read at a resolution of 400 dpi. As FIG. 9 shows, the character size in pixels differs when the reading resolution changes, even for the same document. FIG. 10 is an enlarged view of one character of the same physical size taken from FIG. 9. The jaggedness of the hatched portion of the character makes it clear that the number of pixels constituting the character differs because the reading resolution differs.
For the user of a character recognition device, a character size expressed in pixels is merely internal data of the device; the user thinks of character size in actual physical (length) units. It is therefore undesirable for the readable character size to change with the resolution, that is, for a character of the same physical size, as in FIG. 10, to be readable at one resolution but not at another, because this confuses the user.
Even for character recognition devices that change the values of various parameters according to the resolution, image data without resolution information is becoming more common, as noted for the digital still camera example in the “Prior Art” section. For data without resolution information, the only option is to use a preset default value.
[0021]
Therefore, the representative character in the document, which in this example is the most frequent character size, is regarded as a body-text character and is assumed to have the actual physical size (length) of characters used in the body text of general documents. The resolution (hereinafter referred to as the “effective resolution”) can then be estimated from the number of pixels and this actual size (the standard character size). That is, the effective resolution (an estimated value) can be calculated by the following formula (1).
Effective resolution [dpi] (= [dot / inch]) = character size (number of pixels [dot]) / standard character size (length [inch]) ……… Formula (1)
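Formula (1) can be illustrated with a small sketch. The function name `effective_resolution` and the default of 10.5 points for the standard character size are assumptions for the example; the 10.5-point figure follows the body-text size discussed in this description.

```python
POINTS_PER_INCH = 72.0  # 1 point = 1/72 inch


def effective_resolution(char_size_px: float, standard_size_pt: float = 10.5) -> float:
    """Formula (1): pixels divided by the standard character size in inches."""
    if char_size_px <= 0 or standard_size_pt <= 0:
        raise ValueError("sizes must be positive")
    standard_size_inch = standard_size_pt / POINTS_PER_INCH
    return char_size_px / standard_size_inch
```

For instance, a representative character size of 58 pixels gives roughly 398 dpi, and 29 pixels gives roughly 199 dpi, so the estimate lands near the common scanner settings of 400 dpi and 200 dpi.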
The question is whether a standard size exists for the characters used in the body text of general documents. Considering the following historical circumstances, it is reasonable to assume that one does.
“Japanese type, the basis of modern printing in Japan, follows the number-based type size system and has long supported Japanese printing culture centered on No. 5 type, which is approximately 10.5 points. After the war, point-based type became widespread, but once a size has been established it cannot easily be changed. Especially in Japanese, which has a very large number of characters, changing the size of a full set of type takes a great deal of time, labor, and money. As a result, number-based type and point-based type coexist in Japan.” (Tetsuhiko Onishi: “Photocopying Guidebook for Users” p. 16, cited from the Japan Society for Printing Science (1992))
Even after the development of phototypesetting, the earlier type culture was inherited, and the standard size of body text remained 10.5 points. In fact, the default character size is 10.5 points even in the widely used Japanese version of Microsoft's word processor “WORD”.
Thus, the assumption that a standard size exists for body-text characters is reasonable enough to yield meaningful results.
[0022]
Next, a flow related to processing for setting internal parameters of various processing in character recognition processing of a document image will be described.
In this flow, the effective resolution described above, calculated as an estimated value from image information obtained from the target document, is reflected in the setting of the internal parameters. Each step is described below with reference to the flowchart of FIG. 11.
First, input processing is performed: an image input device such as a scanner or a digital still camera reads the original on which the document image to be processed is written (step 1). In this input processing, black runs are generated for the document image of the original.
Next, the circumscribed rectangles of the black runs are obtained from the generated black runs of the document image (step 2). The circumscribed rectangles obtained here include those of non-character elements such as figures and tables.
Therefore, rectangles that can be regarded as characters are extracted from the obtained circumscribed rectangles, and the extracted rectangles are integrated with neighboring rectangles to create character lines (step 3).
A histogram of the character line sizes (numbers of dots) is obtained from the created character lines, and its mode is taken as the representative character size to be used in calculating the effective resolution (step 4).
Next, the representative character size (number of dots) set in step 4 is compared with the preset standard character size (inches) using formula (1) above; that is, the effective resolution is calculated as dots per inch (step 5).
Thereafter, based on the effective resolution calculated in step 5, the internal parameters of various processes of the character recognition apparatus are reset to values suitable for the resolution (step 6).
[0023]
“Embodiment 2”
In the present embodiment, the effective resolution is calculated in the same manner as in “Embodiment 1”, but the character size is obtained by a different method. The method suppresses the influence of an inclination (skew) in the read document image so that a more accurate character size can be obtained.
When the original is placed correctly (without tilt) and scanned, the width (height) of a character line is almost equal to the character height, so obtaining the character size from the line causes no problem.
However, in “Embodiment 1”, when the original is read at an inclination, the character lines are also inclined as shown in FIG. 12, so a difference arises between the rectangular range of a character line and the range of each character rectangle within the line. In this case, the width (height) of the character line is larger than the actual character size, so the representative character size calculated from it is also larger than the actual size, and an accurate resolution cannot be calculated.
[0024]
Therefore, if the frequency histogram is built not from the widths (heights) of the character lines but from the widths (heights) of the circumscribed rectangles within the character lines, the influence of a tilted document can be largely eliminated.
In doing so, circumscribed rectangles that are markedly small relative to the line width (height), such as the dots of character elements or punctuation marks shown in FIG. 12, are unsuitable for calculating the representative character size and are excluded. This is easily realized by setting a threshold in advance for the ratio of the width (height) of an in-line circumscribed rectangle to the line width (height) and adding to the frequency histogram only those rectangles whose ratio is at or above the threshold.
To execute the above method, the following processing is performed after the processing up to the creation of character lines, which is the same as in “Embodiment 1”.
A created character line is selected; the width (height) Wa of the rectangle of interest among the in-line rectangles and the line width (height) Wb of the character line are obtained; and the ratio Wa / Wb is calculated. If Wa / Wb is equal to or greater than the threshold T1, the rectangle of interest is included in the frequency histogram aggregation; if Wa / Wb is less than T1, it is excluded.
The widths (heights) of the rectangles selected in this way are used as the character sizes for the frequency histogram aggregation, from which the representative character size is obtained. The subsequent processing is the same as in “Embodiment 1”.
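The rectangle-level filtering by Wa / Wb against T1 can be sketched as follows. The data layout (each line given as its height Wb plus a list of in-line rectangle heights Wa), the function names, and the value `t1=0.5` are assumptions for illustration; the text does not specify a value for T1.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Assumed layout: (Wb, [Wa, Wa, ...]) = line height and in-line rectangle heights.
Line = Tuple[float, List[float]]


def char_sizes_for_histogram(lines: List[Line], t1: float = 0.5) -> List[float]:
    """Keep only rectangles whose height ratio Wa / Wb is at least T1."""
    sizes: List[float] = []
    for wb, rect_heights in lines:
        if wb <= 0:
            continue  # degenerate line, nothing usable
        sizes.extend(wa for wa in rect_heights if wa / wb >= t1)
    return sizes


def representative_char_size(lines: List[Line], t1: float = 0.5) -> Optional[float]:
    """Mode of the retained rectangle heights, or None if nothing was retained."""
    hist = Counter(char_sizes_for_histogram(lines, t1))
    if not hist:
        return None
    # Highest count wins; on a tie the smaller size is chosen (assumption).
    return max(hist, key=lambda s: (hist[s], -s))
```

In a tilted line whose bounding height is 70 pixels, rectangles of 57 to 59 pixels pass the ratio test, while a 6-pixel punctuation mark does not, so the representative size reflects the characters rather than the inflated line height.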
[0025]
“Embodiment 3”
In “Embodiment 2”, when the inclination is large, character sizes unsuitable for the frequency histogram aggregation may be included. The present embodiment eliminates them so that a more accurate character size can be obtained.
If the inclination during document reading becomes large, two lines may be combined into one line as shown in FIG. Moreover, even when the document is not inclined, a plurality of lines may be combined into one line if a rectangle such as a figure lies between the lines.
In such a line, either no rectangle with a size close to the line size exists (when the inclination is severe), or the rectangle that is close to the line size is a non-character rectangle (when a figure, noise, or the like lies between the lines). Including the rectangles of such a line in the calculation of the representative character size increases the error, so the line is unsuitable for the frequency histogram aggregation.
For example, (a) in FIG. 12 is a line suitable for calculating the representative character size, but (b) in FIG. 12 is a non-suitable line.
Therefore, a line in which the number of in-line rectangles whose size is at or above a predetermined ratio of the line size is below a predetermined ratio of the total number of in-line rectangles is excluded from the determination of the representative character size. This is easily realized by counting, for each line, the total number of in-line rectangles and the number of rectangles at or above the size ratio, and including the line's results in the frequency histogram aggregation only when the ratio of the two counts is equal to or greater than a preset value.
In order to execute the above-described method, it is necessary to perform the following processing operations after performing the processing up to the creation of the character line, as in the first embodiment.
A created character line is selected, and Wa / Wb is calculated from the width (height) Wa of each in-line rectangle of interest and the line width (height) Wb of the character line. The number Na of rectangles for which Wa / Wb is equal to or greater than the threshold T1 and the total number Nb of in-line rectangles of the character line are obtained. If the ratio Na / Nb is equal to or greater than the threshold T2, the character line is included in the frequency histogram aggregation; if Na / Nb is less than T2, the character line is excluded.
The widths (heights) of the rectangles obtained from the character lines selected in this way are used as the character sizes for the frequency histogram aggregation, from which the representative character size is obtained. The subsequent processing is the same as in “Embodiment 1”.
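The line-level test on Na / Nb can be sketched as below. The function name `line_is_usable` and the values `t1=0.5` and `t2=0.6` are assumptions chosen for the example; the text leaves T1 and T2 unspecified.

```python
from typing import List


def line_is_usable(wb: float, rect_heights: List[float],
                   t1: float = 0.5, t2: float = 0.6) -> bool:
    """Decide whether a character line may contribute to the size histogram.

    wb           -- line height Wb
    rect_heights -- heights Wa of the rectangles inside the line
    """
    nb = len(rect_heights)  # Nb: all in-line rectangles
    if nb == 0 or wb <= 0:
        return False
    na = sum(1 for wa in rect_heights if wa / wb >= t1)  # Na: large enough
    return na / nb >= t2
```

A normal line of height 70 with rectangle heights 58, 57, 58, 6, and 58 gives Na / Nb = 0.8 and is kept. A line of height 150 formed by two merged rows, whose rectangles are all about 58 pixels high, gives Na / Nb = 0 and is dropped.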
[0026]
The following “Embodiment 4” to “Embodiment 8” show methods for detecting a skew in a document image read from a document.
As shown in FIG. 12, skew is the inclination that arises in a document image obtained by reading a document. When this inclination is large, segmentation of the document image, that is, dividing the lines from one another by straight lines and dividing the characters from one another, becomes difficult. Skew thus causes errors in, or prevents, character reading and character recognition processing of the document image, so the skew is detected in order to compensate for it.
The following embodiments are based on the circumscribed rectangle method for black runs (the method used for character line segmentation in Embodiments 1 to 3 above) and are intended to detect skew accurately even in documents with complicated layouts.
[0027]
“Embodiment 4”
The principle of the skew detection method in this embodiment will be described. FIG. 13 is an explanatory diagram of this detection principle.
The method is based on the circumscribed rectangle method for black runs. As before, circumscribed rectangles of black runs that can be regarded as characters are generated from the document image read from the original, and the rectangles are integrated to generate character lines.
As shown in FIG. 13A, after a character line has been generated, a regression line (broken line) is obtained for the rectangles in the character line. As a premise of this processing, each rectangle in the character line is represented by the coordinates of two points, (Xs, Ys) and (Xe, Ye), on XY coordinate axes set in the read document image (see FIG. 13B); these axes also serve as the reference axes for the skew.
To obtain the regression line (broken line), attention is paid to one of the four corner points of each rectangle in the character line, whose positions are defined on the XY coordinate axes, and that point is expressed as coordinates (Xi, Yi). In this example, as shown in FIG. 13A, the start point (Xs, Ys) of the rectangle is used, but any of the four corner points of the rectangle may be used.
When the locus of the attention points (Xi, Yi) of the rectangles in the character line, that is, (X1, Y1), (X2, Y2), ..., (Xn, Yn), is approximated by a straight line, the broken line in FIG. 13A is obtained, and the angle between this broken line and the horizontal line (X axis) corresponds to the inclination of the document.
[0028]
Linear approximation of the locus of coordinates (Xi, Yi) can be obtained by performing regression analysis.
The method of calculating the regression line of Y with respect to X is described in detail in textbooks on statistics (e.g., Gatman SS Wilkes, “Statistics for Engineering”, published by Bakufusha) and is as follows.
The regression line of Y with respect to X is expressed as
Y = AX + B ……… Formula (2)
where A, the regression coefficient of Y with respect to X, is obtained by
A = {NΣXiYi - (ΣXi)(ΣYi)} / {NΣXi² - (ΣXi)²} ……… Formula (3)
and B is obtained by
B = ΣYi / N - AΣXi / N ……… Formula (4)
Here N is the number of attention points in the character line.
For each character line, the slope A is calculated by applying formulas (2) to (4) above to the attention points (Xi, Yi) of the rectangles in the line, and this slope is taken as the value representing the skew of that line. The per-line values are aggregated into a frequency histogram, and the inclination of the document is determined either by selecting the most frequent inclination or by averaging the inclinations. The inclination (skew) angle θ is obtained as θ = tan⁻¹(A).
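Formulas (2) to (4) and the conversion of the slope to an angle can be sketched as follows. The function names `regression_line` and `skew_angle_deg` are hypothetical, and expressing the angle in degrees is a choice made for the example.

```python
import math
from typing import List, Tuple


def regression_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (A, B) of Y = AX + B per formulas (3) and (4)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    denom = n * sxx - sx * sx
    if n < 2 or denom == 0:
        raise ValueError("need at least two points with distinct X values")
    a = (n * sxy - sx * sy) / denom  # Formula (3)
    b = sy / n - a * sx / n          # Formula (4)
    return a, b


def skew_angle_deg(slope: float) -> float:
    """Skew angle as the arctangent of the slope A, in degrees."""
    return math.degrees(math.atan(slope))
```

For the attention points (0, 5), (10, 6), (20, 7), and (30, 8), the slope is 0.1 and the intercept 5, which corresponds to a skew of about 5.7 degrees.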
Skew detected in this way causes errors in, or prevents, character reading and character recognition processing of the document image, so skew compensation processing is performed to eliminate or avoid its influence. The skew compensation itself can use methods conventionally employed in character reading or character recognition processing of document images: for example, correcting the document image in the image input processing or, when the skew is severe, instructing that the original be re-read without processing it.
[0029]
“Embodiment 5”
In the present embodiment, skew is detected in the same manner as in “Embodiment 4”. However, when the skew becomes severe, a plurality of character lines may be combined into one, which leads to errors. The present embodiment adopts a method that suppresses this influence in order to detect the skew more accurately.
FIG. 14 shows a case in which line segmentation has failed and a plurality of lines have been combined into one line because the skew of the document image is severe. If the same regression line calculation as in “Embodiment 4” is applied to such a combined line, the locus of the rectangle coordinates in the character line cannot be approximated accurately by a straight line, and the slope of the regression line obtained by regression analysis (indicated by the broken line in FIG. 14) may differ greatly from the actual inclination of the original. Moreover, even when a line is segmented correctly as in FIG. 13, if the coordinates of rectangles markedly smaller than the line size, such as dots and punctuation marks, are included in the regression analysis, the slope of the regression line deviates from the actual inclination of the line.
Therefore, in-line rectangles that are markedly smaller than the width (height) of the character line are excluded from the regression analysis. For this purpose, a threshold is set in advance for the ratio of the in-line rectangle size to the width (height) of the character line, and only in-line rectangles whose ratio is at or above the threshold are analyzed.
[0030]
To execute the above method, the processing up to the creation of character lines is performed in the same manner as in “Embodiment 1”. With the character line creation (line segmentation) means used here, however, it is difficult to segment character lines correctly when the skew angle of the document is large, and most lines may be cut out with a plurality of lines combined, as shown in FIG. In that case, few coordinates remain as targets of the regression analysis, so the regression line, even if calculated, may not represent the inclination of the line.
To avoid this problem, the ratio of the number of in-line rectangles to be analyzed to the total number of in-line rectangles is calculated before the regression analysis is performed. If the number of rectangles to be analyzed is small, that is, if this ratio is below a predetermined threshold, the regression analysis is not performed and it is determined that the skew angle cannot be calculated.
Specifically, the following processing is performed.
A created character line is selected; the width (height) Wa of each in-line rectangle and the line width (height) Wb of the character line are obtained; and Wa / Wb is calculated. The number Na of rectangles for which Wa / Wb is equal to or greater than the threshold T1 and the total number Nb of in-line rectangles of the character line are obtained. If the ratio Na / Nb is equal to or greater than the threshold T2, the skew angle is calculated for the character line; if Na / Nb is less than T2, the skew angle is regarded as not calculable for the character line.
The skew angles obtained from the character lines selected in this way are subjected to frequency histogram aggregation or the like to obtain the representative skew angle. The subsequent skew compensation processing is the same as in “Embodiment 4”.
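The per-line procedure of this embodiment, filtering small rectangles by T1, checking Na / Nb against T2, and only then running the regression, can be sketched as below. The function name `line_skew_deg`, the rectangle layout, the threshold values, and taking Wb as the vertical extent of all rectangles in the line are assumptions for illustration. The start point (Xs, Ys) is used as the attention point.

```python
import math
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (Xs, Ys, Xe, Ye)


def line_skew_deg(rects: List[Rect], t1: float = 0.5,
                  t2: float = 0.6) -> Optional[float]:
    """Skew angle of one character line in degrees, or None if not calculable."""
    if not rects:
        return None
    # Wb: line height, taken here as the vertical extent of the whole line.
    wb = max(r[3] for r in rects) - min(r[1] for r in rects)
    if wb <= 0:
        return None
    # Keep rectangles with Wa / Wb >= T1 (drops dots, punctuation, and so on).
    kept = [r for r in rects if (r[3] - r[1]) / wb >= t1]
    na, nb = len(kept), len(rects)
    if na / nb < t2 or na < 2:
        return None  # too few usable rectangles: skew not calculable
    # Regression of Y with respect to X over the start points (Xs, Ys).
    sx = sum(r[0] for r in kept)
    sy = sum(r[1] for r in kept)
    sxy = sum(r[0] * r[1] for r in kept)
    sxx = sum(r[0] * r[0] for r in kept)
    denom = na * sxx - sx * sx
    if denom == 0:
        return None
    slope = (na * sxy - sx * sy) / denom
    return math.degrees(math.atan(slope))
```

A line of four 10-pixel-high characters rising by 1 pixel every 10 pixels, plus one 2-pixel punctuation mark, yields about 5.7 degrees because the punctuation mark is filtered out. A "line" made of three rows merged together returns `None`, since no rectangle is large relative to the merged height.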
[0031]
“Embodiment 6”
In the present embodiment, when skew is detected as in “Embodiments 4 and 5”, a criterion is provided for judging, for each original, whether performing the detection is meaningful, so that wasteful operation is avoided.
In general, the range of skew angles that can be detected is limited, and detection is not guaranteed for all angles. This is because the orientation of a document, such as one placed upside down or rotated by 90 degrees, cannot be known without performing character recognition, and detecting every angle would take too long to be practical. Many character recognition apparatuses guarantee skew detection only within a range of -10 to +10 degrees, and within ±45 degrees at most.
It is therefore meaningless to obtain a skew angle for a document tilted beyond the detectable range, and it is more practical to notify the operator that the document is heavily skewed.
One example in which line segmentation fails because of a large skew angle is the case where a plurality of lines are combined into one line as shown in FIG. To detect such a line, as described in “Embodiment 5”, a line may be judged to be a failed line segmentation when the number of in-line rectangles whose size is at or above a predetermined ratio of the line size is markedly small relative to the total number of in-line rectangles.
If the number of lines judged to be failed segmentations is at or above a predetermined ratio of the total number of lines cut out from one original, the original is judged to be tilted so much that the skew angle cannot be detected.
In order to execute the above method, it is necessary to perform the following processing operations after performing the processing up to the creation of the character line.
The character lines created for the original of interest are selected one by one. For each character line, the width (height) Wa of each in-line rectangle and the line width (height) Wb of the character line are obtained, Wa / Wb is calculated, the number Na of rectangles for which Wa / Wb is equal to or greater than the threshold T1 and the total number Nb of in-line rectangles are obtained, and whether the ratio Na / Nb is equal to or greater than the threshold T2 is checked. This check is performed for all character lines, and the number Nc of character lines for which Na / Nb is equal to or greater than T2 is obtained. Next, the ratio Nc / Nd of Nc to the total number Nd of character lines in the original is compared with the threshold T3. If Nc / Nd is equal to or greater than T3, skew angle detection is performed for the original of interest; if it is less than T3, skew angle detection is not executed.
In this way, a meaningless detection operation is avoided, and only the skew angle with respect to a meaningful object is detected.
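The document-level decision using Nc / Nd and T3 can be sketched as follows. The function name `should_detect_skew`, the data layout, and the threshold values are assumptions for the example; T1, T2, and T3 are not given numeric values in the text.

```python
from typing import List, Tuple

# Assumed layout: (Wb, [Wa, Wa, ...]) = line height and in-line rectangle heights.
Line = Tuple[float, List[float]]


def should_detect_skew(lines: List[Line], t1: float = 0.5,
                       t2: float = 0.6, t3: float = 0.5) -> bool:
    """Return True when enough lines look correctly segmented (Nc / Nd >= T3)."""
    nd = len(lines)  # Nd: all character lines in the original
    if nd == 0:
        return False
    nc = 0  # Nc: lines whose ratio Na / Nb is at least T2
    for wb, rect_heights in lines:
        nb = len(rect_heights)
        if nb == 0 or wb <= 0:
            continue
        na = sum(1 for wa in rect_heights if wa / wb >= t1)
        if na / nb >= t2:
            nc += 1
    return nc / nd >= t3
```

With two well-formed lines and one merged line, Nc / Nd is 2/3 and detection goes ahead; with one well-formed line and two merged lines it is 1/3 and detection is skipped, at which point the operator could be told that the original is too heavily skewed.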
[0032]
“Embodiment 7”
In the present embodiment, when skew is detected by linear approximation using regression analysis as in “Embodiments 4 to 6”, the detection result is evaluated, and whether to use it is decided according to the evaluation, in order to increase the detection accuracy.
In regression analysis, there is a correlation coefficient as a value that expresses the degree of coincidence of linear approximation of the locus of coordinates. The correlation coefficient can be obtained by adding the following processing in addition to the processing procedure described in “Embodiment 4” (see equations (2) to (4)).
That is, exchanging the roles of X and Y yields another regression line.
The regression line of X with respect to Y is expressed as
X = CY + D ……… Formula (5)
where C is obtained by
C = {NΣXiYi - (ΣXi)(ΣYi)} / {NΣYi² - (ΣYi)²} ……… Formula (6)
and D is obtained by
D = ΣXi / N - CΣYi / N ……… Formula (7)
The correlation coefficient ρ between X and Y is then given by
ρ = ±√|AC| ……… Formula (8)
The sign in formula (8) is the sign of the numerator of A or C (the two numerators are identical).
The closer the absolute value of the correlation coefficient ρ is to 1, the better the linear approximation of the coordinates, and the regression line can be considered to correspond to the slope of the character line. Conversely, the closer the absolute value of ρ is to 0, the greater the mismatch between the regression line and the inclination of the character line.
Therefore, when the absolute value of the correlation coefficient ρ calculated for the rectangle coordinates in a character line is small, character line segmentation has probably failed for that line, and the line should not be used to determine the inclination of the document.
To execute the method of this example, the correlation coefficient ρ is obtained when the regression analysis is applied to the rectangles in each character line to detect the skew by linear approximation. A threshold is set in advance for the absolute value of ρ, and lines whose value is equal to or smaller than the threshold are excluded from skew detection.
In this way, detection results whose accuracy is not guaranteed are not used, and only the skew angles obtained from lines that gave reliable results are subjected to processing such as frequency histogram aggregation to obtain the representative skew angle. The subsequent skew compensation processing is the same as in “Embodiment 4”.
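The correlation coefficient of formula (8) and the per-line reliability check can be sketched as below. The function names and the threshold value 0.9 are assumptions; the text does not give a value. Returning `None` when all X or all Y values coincide (where C or A is undefined) is also a choice made for this sketch.

```python
import math
from typing import List, Optional, Tuple


def correlation_coefficient(points: List[Tuple[float, float]]) -> Optional[float]:
    """rho = +/- sqrt(|A * C|), signed like the shared numerator of A and C."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    numerator = n * sxy - sx * sy   # numerator of both A and C
    den_a = n * sxx - sx * sx       # denominator of A, formula (3)
    den_c = n * syy - sy * sy       # denominator of C, formula (6)
    if den_a == 0 or den_c == 0:
        return None                 # A or C undefined (assumption: report None)
    a = numerator / den_a
    c = numerator / den_c
    return math.copysign(math.sqrt(abs(a * c)), numerator)


def line_is_reliable(points: List[Tuple[float, float]],
                     threshold: float = 0.9) -> bool:
    """A line is used for skew detection only if |rho| exceeds the threshold."""
    rho = correlation_coefficient(points)
    return rho is not None and abs(rho) > threshold
```

Points lying exactly on a rising line give ρ = 1, points on a falling line give ρ = -1, and the zigzag (0, 0), (10, 5), (20, 0), (30, 5) gives ρ of about 0.447 and is rejected. A real implementation would need a separate rule for perfectly level lines, where all Y values are equal and ρ is undefined.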
[0033]
“Embodiment 8”
In the present embodiment, skew is detected in the same manner as in “Embodiments 4 to 6” and checked using the correlation coefficient in the same manner as in “Embodiment 7”. In addition, a criterion is provided for judging, for each original, whether performing the detection is meaningful, and a judgment based on this criterion is made at detection time so that appropriate operation without waste is ensured.
In this example, each character line in one document is checked using the correlation coefficient described in “Embodiment 7”. If the ratio of the number of character lines excluded from detection to the total number of character lines in the document is large, the document is judged to be too heavily skewed for calculating the skew to be meaningful.
This method is easily realized by counting the total number of segmented character lines, setting a threshold in advance for the absolute value of the correlation coefficient, counting the lines below that threshold, and applying threshold processing to the ratio of that count to the total number of lines.
That is, when regression analysis is applied to the rectangles of each character line created for the original of interest to detect the skew by linear approximation, the correlation coefficient ρ is obtained. The ratio Nc / Nd of the number Nc of character lines for which the absolute value of ρ is less than the threshold T4 to the total number Nd of character lines in the original is compared with the threshold T5. If Nc / Nd is less than T5, skew angle detection is performed for the original of interest; if it is equal to or greater than T5, skew angle detection is not executed.
In this way, a meaningless detection operation is avoided, and only the skew angle with respect to a meaningful object is detected.
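The document-level check based on the correlation coefficient can be sketched as below. It follows the rule stated for this embodiment that a document in which a large share of lines is excluded is treated as too heavily skewed; the function name, the values of `t4` and `t5`, and the handling of the boundary case (a ratio exactly equal to T5 counts as too skewed) are assumptions for illustration.

```python
from typing import Sequence


def skew_detection_is_meaningful(rhos: Sequence[float],
                                 t4: float = 0.9, t5: float = 0.5) -> bool:
    """Return False when too many lines have a weak correlation coefficient.

    rhos -- one correlation coefficient per character line of the original
    t4   -- lines with |rho| < t4 are counted as unreliable (Nc)
    t5   -- upper limit on the unreliable share Nc / Nd
    """
    nd = len(rhos)  # Nd: total number of character lines
    if nd == 0:
        return False
    nc = sum(1 for rho in rhos if abs(rho) < t4)  # Nc: unreliable lines
    return nc / nd < t5
```

With coefficients 0.99, -0.97, 0.95, and 0.3, only one line in four is unreliable and detection proceeds; with 0.2, 0.4, 0.95, and -0.1, three in four are unreliable and detection is skipped.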
[0034]
“Embodiment 9”
The present embodiment shows an embodiment of a character recognition apparatus or an image processing apparatus according to the present invention. As means for executing the processing steps of the method for obtaining the effective resolution of a document image shown in “Embodiments 1 to 3” or of the method for detecting skew in a document image shown in “Embodiments 4 to 8”, an apparatus configured using a general-purpose processing apparatus (computer) is illustrated.
FIG. 15 illustrates the configuration of the document image processing apparatus of the present embodiment. As shown in FIG. 15, this example is implemented by a general-purpose processing apparatus (computer). Its components are a CPU 1, a memory 2, a hard disk drive 3, an input device 4 such as a scanner, a keyboard, and a mouse, a CD-ROM drive 5, a display 6, a flexible disk drive 7, a communication device 8, and so on, which are connected by a bus.
A program (software) for realizing the character recognition and image processing functions according to the present invention, that is, the processing procedures of the method for obtaining the effective resolution or the skew detection method described in the above embodiments, is recorded on one of the storage media (not shown) used by the memory 2, the hard disk drive 3, the CD-ROM drive 5, or the flexible disk drive 7 serving as storage means.
The document image to be processed is input by the input device 4, such as a scanner, and is stored in, for example, the hard disk drive 3. The CPU 1 reads the program that realizes the processing functions and methods described above from the recording medium of the storage means, executes processing according to the program on the target document image, and outputs the processing results to the display 6 or the like.
As shown in FIG. 16, the character recognition apparatus or image processing apparatus according to the present invention may also be implemented in a form in which the communication device 8 connects it to external apparatuses 11 to 13 via a communication line 20 such as the Internet, so that part of its functions resides on the network.
[0035]
【The invention's effect】
  (1) Claim 1, 4Effects corresponding to the invention
  Extract a rectangle that seems to be a character element from the circumscribed rectangle of the black run that was generated based on the document image to be processed, connect neighboring rectangles together to grow them into rows, add up the size of each obtained row, By calculating the character size (number of pixels) and comparing the calculated value with the standard character size (length)ResolvingEstimate imageDoAs a result, it is possible to stably extract lines and characters without depending on the reading resolution, and it is highly accurate and robust.Image processingapparatusOr image processing methodCan be realized.
  (2) Claim 2, 5Effects corresponding to the invention
  Extract the rectangle that seems to be a character element from the circumscribed rectangle of the black run that was generated based on the document image to be processed, connect neighboring rectangles to grow to a line, and obtain the line size for each line obtained By obtaining a representative character size (number of pixels) based on circumscribed rectangle information in the line having a size of a predetermined ratio or more, and comparing the obtained value with a standard character size (length)ResolvingEstimate imageDoAs a result, the line and characters can be cut out stably, without being affected by the skew, without depending on the reading resolution, and it is highly accurate and robust.Image processingapparatusOr image processing methodCan be realized.
  (3) Claim 3, 6Effects corresponding to the invention
  In addition to the effect of (2) above, it is possible to eliminate a case where a character size that is not suitable for the representative character size calculation is included when the skew is large, so that a more accurate character size can be obtained. Become.
  (4) Effects corresponding to the invention of claim 7
  By installing a program for executing the steps of the image processing method according to any one of claims 4 to 6 on a general-purpose processing device (computer), any of the effects (1) to (3) above can be realized easily.
[Brief description of the drawings]
FIG. 1 shows an example of a document image to be processed.
FIG. 2 shows a result of obtaining projections in a vertical direction and a horizontal direction for a horizontally written document.
FIG. 3 is a diagram illustrating an example of a document having a complicated layout.
FIG. 4 shows a result of obtaining projections in the vertical direction and the horizontal direction for the original shown in FIG. 3.
FIG. 5 shows a result of creating a circumscribed rectangle of a black run that can be regarded as a character in the example of the document image (FIG. 1).
FIG. 6 is a diagram illustrating processing for integrating neighboring rectangles.
FIG. 7 is a diagram illustrating a character line rectangle and a character circumscribing rectangle obtained as a result of integration processing.
FIG. 8 is a diagram showing a frequency histogram and a representative character size taken with respect to the width (height) of a character line.
FIG. 9 shows an actual example when the same document is read at different resolutions.
FIG. 10 shows an example when original characters of the same size are read at different resolutions.
FIG. 11 is a flowchart relating to character recognition processing of a document image including a processing procedure for reflecting the effective resolution in the setting of internal parameters.
FIG. 12 is a diagram illustrating the state of line extraction when a skew occurs.
FIG. 13 is a diagram for explaining the application of regression analysis to the circumscribed rectangles in a character line (a) and the definition of a rectangle by coordinates (b).
FIG. 14 is a diagram for explaining the state of the regression line when an abnormality occurs in line extraction when a skew occurs.
FIG. 15 shows a configuration of a document image processing apparatus according to an embodiment of the present invention.
FIG. 16 shows another configuration of the document image processing apparatus according to the embodiment of the present invention.
[Explanation of symbols]
1 ... CPU, 2 ... Memory,
3 ... Hard disk drive, 4 ... Input device,
5 ... CD-ROM drive, 6 ... Display (display device),
7 ... FD drive, 8 ... Communication device

Claims

1. An image processing apparatus comprising: means for generating circumscribed rectangle information of character images based on pixel runs of a document image input as a processing target; means for obtaining, for each line, a line size expressed in number of pixels based on the circumscribed rectangle information; means for calculating a representative value from the line sizes obtained for each line; and means for estimating the resolution by comparing the calculated representative value of the line size with a standard character size expressed in length units.

2. An image processing apparatus comprising: means for generating circumscribed rectangle information of character images based on pixel runs of a document image input as a processing target; means for obtaining, for each line, a line size expressed in number of pixels based on the circumscribed rectangle information; means for calculating a representative value of the character size from the circumscribed rectangle information of in-line character images having a size equal to or larger than a predetermined ratio of the obtained line size; and means for estimating the resolution by comparing the calculated representative value of the character size with a standard character size expressed in length units.

3. The image processing apparatus according to claim 2, wherein the means for calculating the representative value of the character size uses the circumscribed rectangles of in-line character images having a size equal to or larger than the predetermined ratio of the line size in the calculation only when their number is equal to or greater than a predetermined ratio of the total number of circumscribed rectangles of the in-line character images.

4. An image processing method comprising the steps of: generating circumscribed rectangle information of character images based on pixel runs of a document image input as a processing target; obtaining, for each line, a line size expressed in number of pixels based on the circumscribed rectangle information; calculating a representative value from the line sizes obtained for each line; and estimating the resolution by comparing the calculated representative value of the line size with a standard character size expressed in length units.

5. An image processing method comprising the steps of: generating circumscribed rectangle information of character images based on pixel runs of a document image input as a processing target; obtaining, for each line, a line size expressed in number of pixels based on the circumscribed rectangle information; calculating a representative value of the character size from the circumscribed rectangle information of in-line character images having a size equal to or larger than a predetermined ratio of the obtained line size; and estimating the resolution by comparing the calculated representative value of the character size with a standard character size expressed in length units.

6. The image processing method according to claim 5, wherein the step of calculating the representative value of the character size uses the circumscribed rectangles of in-line character images having a size equal to or larger than the predetermined ratio of the line size in the calculation only when their number is equal to or greater than a predetermined ratio of the total number of circumscribed rectangles of the in-line character images.

7. A program for causing a computer to execute each processing step of the image processing method according to claim 4.