JP4194309B2

JP4194309B2 - Document direction estimation method and document direction estimation program

Info

Publication number: JP4194309B2
Application number: JP2002202959A
Authority: JP
Inventors: 裕勝山
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2002-07-11
Filing date: 2002-07-11
Publication date: 2008-12-10
Anticipated expiration: 2022-07-11
Also published as: JP2004046528A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像中の文書方向を推定する文書方向推定方法および文書方向推定プログラムに関するものである。
【０００２】
【従来の技術】
従来、文書をスキャナー装置で読み取った文書画像の文字認識は、文書画像から文字領域を抽出し、抽出した文字領域について文字の配置されている方向に文字認識するようにしていた。
【０００３】
この際、抽出した文字領域について、４方向に順次回転して文字認識をそれぞれ行い、領域内文字の信頼度の平均値を求め、次に、全体の画像での平均値が最も高い方向を文書の方向とし、文字認識していた（特開平８−２１２２９８号公報）。
【０００４】
また、文書画像から文字領域を抽出し、抽出した文字領域に属性（タイトル、本文など）をつける。文字領域を４方向に回転して文字認識を行い、領域内文字の信頼度を予め定めた属性の優先順位を使って文書方向を決定する。このとき、方向決定には、各文字領域から求めた方向をもとに文書全体の方向を多数決で決める（特開平９−６９１３６号公報）。
【０００５】
また、文書画像から文字領域を抽出し、抽出した文字領域を４方向に回転して文字認識を行い、領域内文字の信頼度からそれぞれの領域の方向を決める。文書全体では、部分領域の面積を考慮して方向を決定する（特開平２００１−３１２６９７号公報）。
【０００６】
【発明が解決しようとする課題】
上述した従来の技術では、全て文字領域を求め、当該文字領域について画像を回転し、回転後の画像の文字認識をそれぞれ行い、信頼度の高い方向を文書方向と決定していた。
【０００７】
このため、以下のような課題があった。
（１）文字領域が正しく行われていればよいが、例えば文字領域の中に一部図形がかかる場合は、そのパターンの文字認識結果の信頼度や距離値が、領域全体の信頼度や距離値の平均などに影響を与えてしまい、領域の文書方向を正しく求められないという問題があった。
【０００８】
（２）文書画像から抽出した文字領域の全てについて文字認識してその信頼度から方向を推定することは、当該文字領域に含まれる文字数分の文字認識処理が必要となり、文書方向決定のための処理時間が長くかかってしまうという問題があった。
【０００９】
（３）（２）の問題を避けるため、文書方向を決定する文字領域の数を制限して少なくして高速化を図ることが考えられるが、文字領域を減らしたのでは文書全体としての方向を誤る可能性が大となってしまう問題があった。
【００１０】
本発明は、これらの問題を解決するため、文書画像から文字方向を決定する際に、文書方向の推定精度を保ちながら処理速度を速くし、かつ文書画像全体から文書方向を推定することを目的としている。
【００１１】
【課題を解決するための手段】
図１を参照して課題を解決するための手段を説明する。
【００１２】
図１において、レイアウト解析手段１４は、画像をレイアウト解析してテキスト領域、表領域などを抽出するものである。
【００１３】
探索領域抽出手段１５は、テキスト領域内の矩形について距離が近い矩形をまとめることを、まとめた矩形が指定数あるいはまとめ処理が指定回数となるまで繰り返して矩形数を減らすなどするものである。
【００１４】
文字認識手段１６は、文字を認識するものである。
次に、動作を説明する。
【００１５】
レイアウト解析手段１４が画像をレイアウト解析してテキスト領域を抽出し、探索領域抽出手段１５が抽出したテキスト領域内の矩形について距離が近い矩形をまとめることを、まとめた矩形が指定数あるいはまとめ処理が指定回数となるまで繰り返して矩形数を減らし、まとめた矩形をもとに文書方向を決定するようにしている。
【００１６】
この際、探索領域抽出手段１５が抽出したテキスト領域内の各矩形について、矩形サイズが所定サイズの範囲内、かつ当該矩形を中心にした所定探索領域内で白から黒あるいは黒から白あるいは両者の変化数を当該探索領域で除算した値が所定範囲内、かつ前記探索範囲内に大きな矩形がない当該矩形について前記距離が近いを矩形をまとめることを、まとめた矩形が指定数あるいはまとめ処理が指定回数となるまで繰り返すようにしている。
【００１７】
また、探索領域抽出手段１５がまとめた矩形の重心を算出して当該重心に最も近い矩形を選択して当該選択した矩形を中心に探索領域を決定するようにしている。
【００１８】
また、まとめた矩形内の全矩形あるいは探索領域内の全矩形を４方向に回転させてそれぞれの文字認識を行い、最も確からしい文書方向を決定するようにしている。
【００１９】
また、画像をレイアウト解析して抽出した表領域内の、各セルについてレイアウト解析して抽出したテキスト領域について、レイアウト解析してテキスト領域を抽出するようにしている。
【００２０】
従って、文書画像から文字方向を決定する際に、文書方向の推定精度を保ちながら処理速度を速くし、かつ文書画像全体から文書方向を推定することが可能となる。
【００２１】
【発明の実施の形態】
次に、図１から図８を用いていて本発明の実施の形態および動作を順次詳細に説明する。
【００２２】
図１は、本発明のシステム構成図を示す。
図１において、処理装置１は、プログラムに従い各種処理を実行するものであって、入力手段１１、ラベリング手段１２、文字サイズ推定手段１３、レイアウト解析手段１４、探索領域抽出手段１５、文字認識手段１６、および出力手段１７などから構成されるものである。
【００２３】
入力手段１１は、ＯＣＲ２で読み取った文書の画像を取り込んだりなどするものである。
【００２４】
ラベリング手段１２は、画像上で、ある黒画素に注目して当該黒画素に隣接する他の黒画素を順次連結した当該領域にラベルを付与するものである。
【００２５】
文字サイズ推定手段１３は、ラベリング手段１２によってラベルを付与した、画素の連結した領域中の文字サイズ（文字に相当する矩形のサイズ）の最頻度のサイズ（縦Ｈ０、横Ｗ０のサイズ）を算出するものである（図５の（ｂ）参照）。
【００２６】
レイアウト解析手段１４は、画像中の表領域、図形領域、およびテキスト領域を解析するものである（図４参照）。
【００２７】
探索領域抽出手段１５は、テキスト領域内の矩形について、距離が近い矩形をまとめることを、まとめた矩形が指定数あるいはまとめ処理が指定回数となるまで繰り返して矩形数を減らし、当該まとめた矩形の重心に最も近い矩形を中心とした所定サイズの探索領域を抽出するものである（図７参照）。
【００２８】
文字認識手段１６は、文字を認識するものである。
出力手段１７は、文字認識結果をファイル４に出力するものである。
【００２９】
ＯＣＲ２は、書類をスキャナで読み取って画像を生成するものである。
画像データ３は、書類をスキャナで読み取った画像である。
【００３０】
ファイル４は、画像データ３について文字認識した結果を格納するファイルである。
【００３１】
次に、図２および図３のフローチャートの順番に従い、図１の構成の動作を順次詳細に説明する。
【００３２】
図２および図３は、本発明の動作説明フローチャートを示す。
図２において、Ｓ１は、入力する。これは、文書をスキャナーなどで読み取った文書画像を図１の処理装置１に入力する。
【００３３】
Ｓ２は、黒画素連結領域を抽出する。これは、画像上で例えば左上の黒画素に連結する他の黒画素を全て抽出し当該領域にラベルを付与する（ラベリング）。
【００３４】
Ｓ３は、文字サイズを推定する。これは、Ｓ１でラベリングした黒画素を連結した各領域をもとに，文字サイズを推定する（図５の（ｂ）で後述）。
【００３５】
Ｓ４は、レイアウト解析を行う。これは、Ｓ２で抽出した黒画素の連結領域についてレイアウト解析、即ち、表領域、図領域、およびテキスト領域を解析する（図４を用いて後述する）。そして、テキスト領域については、Ｓ１１以降の処理を実行する。表領域については、Ｓ３１以降を実行する。
【００３６】
Ｓ１１は、Ｓ４のレイアウト解析でテキスト領域と解析された領域（テキスト領域）について、１つのテキスト領域を選択する。
【００３７】
Ｓ１２は、テキスト領域内の１矩形を選択する。
Ｓ１３は、条件３，４，５の判定を行う。これは、テキスト領域内の矩形につて、後述する図７の条件３，４、５の判定を行う。ここで、
・条件３は、矩形の幅ＷとＳ３で推定した文字サイズＷ０との差の絶対値が閾値以下、かつ、矩形の高ＨがＳ３で推定した文字サイズＨ０との差の絶対値が閾値以下かを判別する。
【００３８】
・条件４は、探索領域（図７の（ｂ）参照）内の線密度が閾値以下か判別する（図７を用いて後述する）。
【００３９】
・条件５は、探索矩形内に大きな矩形がない。
以上の３つの条件を全て満たしたときにＹＥＳとなり、Ｓ１４で文字候補として抽出し、Ｓ１５に進む。一方、ＮＯの場合には、Ｓ１５に進む。
【００４０】
Ｓ１５は、テキスト領域内の矩形について全てＳ１２からＳ１５の処理を終了したか判別する。ＹＥＳの場合には、Ｓ１６に進む。ＮＯの場合には、Ｓ１２以下を繰り返す。
【００４１】
Ｓ１６は、全てのテキスト領域が終了か判別する。ＹＥＳの場合には、全てのテキスト領域について、文字候補の抽出を終了したので、Ｓ１７に進む。ＮＯの場合には、Ｓ１１に戻り繰り返す。
【００４２】
Ｓ１７は、文字候補をクラスタリングする。
Ｓ１８は、クラスタが指定の数か判別する。これらＳ１７、Ｓ１８は、Ｓ１４で抽出したテキスト領域内の文字候補の矩形についてクラスタリング、即ち、ある矩形に近い他の矩形をまとめることを順次行い、全体のまとめた矩形の数が所定数（例えば３個）になるまで繰り返す（図８を用いて後述する）。ＹＥＳの場合には、クラスタの数が指定の数になったので、Ｓ１９に進む。ＮＯの場合には、Ｓ１７を繰り返す。
【００４３】
Ｓ１９は、矩形候補を生成する。これは、Ｓ１８のＹＥＳでクラスタリングして指定個数になったので、当該指定個数になった矩形の集合を矩形候補とする。
【００４４】
Ｓ２０は、重心を計算する。これは、Ｓ１９で矩形の集合の矩形候補の全体の重心を求める（図８参照）。
【００４５】
Ｓ２１は、重心に最も近い矩形候補を選択する。これは、Ｓ２０で求めた重心に最も近い矩形候補を選択する（図８の矩形候補□（中が黒）を選択する）。
【００４６】
Ｓ２２は、Ｓ２１で選択した矩形を中心とした探索領域を求める。これは、Ｓ２１で求めた例えば図８の矩形候補□（中が黒）を中心として探索領域を求める。そして、図２のＳ４１に進む。
【００４７】
Ｓ４１は、探索領域の取り込みを行う。
Ｓ４２は、４方向に回転した画像を作成する。これらＳ４１、Ｓ４２は、図２のＳ２２で求めた、例えば図８の探索領域の画像を取り込み、４方向に回転させた画像をそれぞれ作成する。
【００４８】
Ｓ４３は、文字認識する。これは、Ｓ４４で４方向に回転させた探索領域の画像について、それぞれ文字認識を行う。
【００４９】
Ｓ４４は、最も確からしい方向を判定する。これは、Ｓ４３で４方向に回転させた探索領域の画像について、それぞれ文字認識を行い、文字認識度の最も高い方向を文字方向として判定する。
【００５０】
Ｓ４５は、領域終了か判別する。ＹＥＳの場合には、Ｓ４６に進む。ＮＯの場合には、Ｓ４１に戻り、次の探索領域について繰り返す。
【００５１】
Ｓ４６は、各探索領域の方向を決定する。
Ｓ４７は、多数決で方向を決定する。これらＳ４６、Ｓ４７は、各探索領域の文字認識率の高い方向をそれぞれ決定し、テキスト領域内で当該決定した探索領域内の文字方向について多数決で１つを決定する。
【００５２】
Ｓ４８は、全体を多数決した方向で文字認識する。これは、Ｓ４７で決めた方向で、当該テキスト領域内の全ての文字矩形について文字認識を行う。
【００５３】
以上によって、文書画像上でテキスト領域を抽出し、条件３，４，５の判定で文字候補矩形のみを抽出し、抽出した文字候補についてクラスタリングを行ってまとめた矩形を生成して矩形数を減らし（例えば３個、５個などに減らし）、当該減らしたまとめた矩形の重心に最も近い文字候補矩形を中心に探索領域を設定し、当該探索領域内について文字認識して文字方向を判定してテキスト領域の全体の文字方向を多数決で決定し、当該決定した文字方向でテキスト領域内の矩形の文字認識を行うことにより、テキスト領域内の文字方向を少ない探索領域内のみで迅速かつに処理量少なくして決定し、決定した文字方向でテキスト領域内の文字認識を行うことが可能となる。
【００５４】
図２のＳ３１は、Ｓ４のレイアウト解析で表領域と判定されたので、当該表領域内のセルを抽出する。
【００５５】
Ｓ３２は、セル内をレイアウト解析してテキスト領域だけ抽出する。これは、Ｓ４のレイアウト解析と同様に行う。
【００５６】
Ｓ３３は、条件１，２に合致した領域を抽出する。これは、表のセル内のテキスト領域について、後述する図６の（ａ）の条件１，２に合致する領域を抽出する。ここで、
・条件１は、領域サイズが閾値の範囲内
・条件２は、領域内の黒画素密度が閾値の範囲内
とそれぞれ判定し、両者が満たされた領域のみを抽出する。
【００５７】
Ｓ３４は、閾値サイズ以上の領域か判別する。ＹＥＳの場合には、表領域のセル内のテキスト領域と判明したので、既述したＳ１１からＳ２２で探索領域を求め、続いて図３のＳ４１からＳ４８で文字方向を決定して文字認識する。一方、Ｓ３４のＮＯの場合には、Ｓ３５で面積でソートし、Ｓ３６で大きな指定個を探索領域として求め、当該求めた探索領域について既述した図３のＳ４１からＳ４８で文字方向を決定して文字認識する。
【００５８】
以上によって、表領域内のセルがテキスト領域の場合にも、同様に探索領域を求めて文字方向を迅速かつ処理量少なく決定し、当該文字方向でテキスト領域の文字認識を行うことが可能となる。
【００５９】
図４は、本発明の説明図（その１）を示す。
図４の（ａ）は、原画像例を示す。ここでは、画像上に図示のように、図、テキスト、表があるとする。
【００６０】
図４の（ｂ）は、レイアウト解析結果例を示す。これは、図４の（ａ）の原画像について、既述した図２のＳ４のレイアウト解析して得たレイアウト解析結果の例を示す。ここで、
・表領域は、黒画素が所定以上の長さ連結する罫線で構成されている領域として判定する。
【００６１】
・図領域は、サイズが閾値より大きな黒画素連結領域がある領域として判定する。
【００６２】
・テキスト領域は、原画像中で、表領域、図領域でない領域をここでは、テキスト領域と判定する。
【００６３】
以上の処理によって、原画像中からテキスト領域を抽出したり、更に、表領域内のセルについてレイアウト解析してテキスト領域を抽出（既述した図２のＳ３２）したりなどすることが可能となる。
【００６４】
図５は、本発明の説明図（その２）を示す。
図５の（ａ）は、ラベリング結果例を示す。ここで、各矩形は、原画像上で、ある黒画素に連結する他の黒画素をまとめ、当該まとめた黒画素のあつまりについて内接する矩形として生成したものである。
【００６５】
図５の（ｂ）は、文字サイズ推定例を示す。
図５の（ｂ−１）は、文字サイズの幅Ｗ０を推定する説明図を示す。図示の曲線は、図５の（ａ）などのラベリング結果の各文字矩形の幅Ｗを全て求め、横軸を当該求めた幅、縦軸をその頻度で表した曲線である。そして、図示のように求めた最頻度の幅を文字サイズ幅Ｗ０と推定する。本実施例は最頻度の文字サイズ幅を文字サイズ幅Ｗ０として推定したが、平均文字サイズ幅を文字サイズ幅Ｗ０として推定してもよい。
【００６６】
図５の（ｂ−２）は、文字サイズの高Ｈ０を推定する説明図を示す。図示の曲線は、図５の（ａ）などのラベリング結果の各文字矩形の高Ｈを全て求め、横軸を当該求めた高、縦軸をその頻度で表した曲線である。そして、最頻度の高Ｈ０を図示のように求め、文字サイズ高Ｈ０と推定する。本実施例は最頻度の文字サイズ高を文字サイズ高Ｈ０として推定したが、平均文字サイズ高を文字サイズ高Ｈ０として推定してもよい。
【００６７】
図６は、本発明の説明図（その３）を示す。
図６の（ａ）は、条件を示す。これは、既述した図２の表領域内のセルについてレイアウト解析して抽出したテキスト領域について、テキストが含まれている可能性が高い領域を抽出する条件であって、ここでは、
・条件１は、閾値＜領域サイズ＜閾値
・条件２は、閾値＜領域内の黒画素密度＜閾値
である。即ち、条件１で表領域内のセルについてテキスト領域とレイアウト解析された領域について、当該領域のサイズが所定の閾値の範囲内（条件１）、かつ当該領域内の黒画素の密度が所定の閾値の範囲内（条件２）のときに、可能性大のテキスト領域として抽出する。
【００６８】
図７は、本発明の説明図（その４）を示す。
図７の（ａ）は、条件を示す。これら条件３，４，５は、既述した図２のＳ１３の条件３，４，５であって、文字矩形として抽出する条件であり、
・条件３は、ΔＷ＝｜Ｗ−Ｗ０｜＜閾値かつ ΔＨ＝｜Ｈ−Ｈ０｜＜閾値
（Ｗ０，Ｈ０は推定文字サイズの幅、高（図５の（ｂ）））
・条件４は、閾値＜探索領域の線密度＜閾値
（線密度は領域内をラスタスキャンしたときの白から黒の変化点数／領域面積）
・条件５は、探索矩形内に大きな矩形が無い
（探索領域内をラベリングして大きな矩形を探索して無い）
である。ここで、条件３は、テキスト領域内の矩形の幅Ｗと高Ｈが、既述した図５の（ｂ）で推定した文字サイズの幅Ｗ０と高Ｈ０とのそれぞれの差が所定閾値以内であるという条件である。条件４は、文字矩形を中心とした探索領域の線密度（探索領域内を一定方向にスキャンして例えば白から黒に変わる点の数を当該探索領域の面積で除算した値）が所定の閾値の範囲内であるという条件である。条件５は、文字矩形を中心とした探索領域内に、大きな矩形がないという条件である。
【００６９】
以上の条件３，４，５を満たした場合、既述した図２のＳ１３のＹＥＳとなり、当該文字矩形を文字候補として抽出することが可能となる。
【００７０】
図７の（ｂ）は、探索領域の決定例を示す。図示のテキスト領域内の例えばほぼ中央の文字”か”の矩形に注目し、当該矩形の幅Ｗ，高Ｈとし、当該矩形を中心にして探索領域を図示のように設け、図７の（ａ）の条件３，４，５を適用し、ここでは、文字矩形候補として抽出する。
【００７１】
図８は、本発明の説明図（クラスタリングと最終探索領域の抽出例）を示す。
図８において、▲１▼の文字矩形は、既述した図２のＳ１４で文字候補として抽出された矩形を示す。
【００７２】
▲２▼のクラスタリング結果のクラスタは、文字矩形をクラスタリング、即ち、文字矩形に最も距離の近い他の文字矩形同士をまとめることを繰り返し行い、文字矩形の数を減らすことで求められた。文字矩形間の距離は、本実施例の場合、各矩形の左上の座標同士の距離とした。図８では３個に減らした、クラスタとしてまとめた文字矩形を示す。
【００７３】
▲３▼の×は、クラスタ重心である。本実施例では、クラスタ重心は、クラスタを構成する文字矩形の左上の座標値の平均値から求められる。
▲４▼の探索領域は、クラスタ重心に最も近い矩形□（中が黒）を中心に求めた探索領域を示す。
【００７４】
以上のようにして算出した▲４▼の探索領域について、既述した図３のＳ４１からＳ４８の処理により、文字方向を迅速かつ処理量を少なくして決定し、当該決定した文字方向でテキスト領域内の文字認識を行うことが可能となる。
【００７５】
【発明の効果】
以上説明したように、本発明によれば、画像中からテキスト領域を抽出し、当該テキスト領域中の矩形のうちから条件３，４，５により文字矩形候補を抽出し、当該文字矩形候補をまとめて数を減らし、当該減らした後のまとめた矩形の重心に近い文字矩形候補を中心に所定サイズの探索領域を決定し、当該探索領域で文字方向を決め、当該決めた文字方向でテキスト領域の文字認識を行う構成を採用しているため、文書画像から文字方向を決定する際に、文書方向の推定精度を保ちながら処理量を削減して迅速、かつ文書画像全体から文書方向を推定することが可能となる。そして、推定した文字方向でテキスト領域の文字認識を行うことが可能となる。
【図面の簡単な説明】
【図１】本発明のシステム構成図である。
【図２】本発明の動作説明フローチャート（その１）である。
【図３】本発明の動作説明フローチャート（その２）である。
【図４】本発明の説明図（その１）である。
【図５】本発明の説明図（その２）である。
【図６】本発明の説明図（その３）である。
【図７】本発明の説明図（その４）である。
【図８】本発明の説明図（クラスタリングと最終探索領域の抽出例）である。
【符号の説明】
１：処理装置
１１：入力手段
１２：ラベリング手段
１３：文字サイズ推定手段
１４：レイアウト解析手段
１５：探索領域抽出手段
１６：文字認識手段
１７：出力手段
２：ＯＣＲ
３：画像データ
４：ファイル（認識結果）
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a document direction estimation method and a document direction estimation program for estimating a document direction in an image.
[0002]
[Prior art]
Conventionally, character recognition of a document image obtained by reading a document with a scanner has been performed by extracting character areas from the document image and recognizing the characters in each extracted area along the direction in which the characters are arranged.
[0003]
In doing so, each extracted character area is rotated through the four directions in turn and character recognition is performed for each direction, and the average reliability of the characters in the area is obtained. The direction with the highest average over the entire image is then taken as the document direction, and character recognition is performed in that direction (Japanese Patent Laid-Open No. 8-212298).
[0004]
In another technique, character areas are extracted from the document image, and attributes (title, body text, etc.) are assigned to the extracted character areas. Each character area is rotated in four directions and character recognition is performed, and the document direction is determined from the reliability of the characters in the areas using a predetermined priority order of the attributes. Here, the direction of the entire document is decided by majority vote over the directions obtained from the individual character areas (Japanese Patent Laid-Open No. 9-69136).
[0005]
In addition, a character area is extracted from the document image, the extracted character area is rotated in four directions to perform character recognition, and the direction of each area is determined from the reliability of the characters in the area. In the entire document, the direction is determined in consideration of the area of the partial region (Japanese Patent Laid-Open No. 2001-312697).
[0006]
[Problems to be solved by the invention]
In all of the conventional techniques described above, character areas are obtained, the image of each character area is rotated, character recognition is performed on each rotated image, and the direction with high reliability is determined to be the document direction.
[0007]
For this reason, there were the following problems.
(1) This works as long as the character areas are extracted correctly. However, when, for example, part of a figure overlaps a character area, the reliability or distance value of the character recognition result for that pattern affects the average reliability or distance value of the whole area, so the document direction of the area cannot be obtained correctly.
[0008]
(2) Performing character recognition on all the character areas extracted from the document image and estimating the direction from the resulting reliability requires character recognition processing for every character contained in those areas, so determining the document direction takes a long processing time.
[0009]
(3) To avoid problem (2), the number of character areas used to determine the document direction could be limited in order to increase speed. However, reducing the number of character areas increases the possibility of misjudging the direction of the document as a whole.
[0010]
To solve these problems, an object of the present invention is, when determining the character direction from a document image, to increase the processing speed while maintaining the accuracy of the document direction estimate, and to estimate the document direction from the entire document image.
[0011]
[Means for Solving the Problems]
Means for solving the problems will be described with reference to FIG. 1.
[0012]
In FIG. 1, a layout analysis unit 14 performs layout analysis on an image and extracts a text area, a table area, and the like.
[0013]
The search area extraction means 15 reduces the number of rectangles in a text area by, among other things, repeatedly merging rectangles that are close to one another until the number of merged rectangles reaches a specified number or the merging process has been performed a specified number of times.
[0014]
The character recognition means 16 recognizes characters.
Next, the operation will be described.
[0015]
The layout analysis means 14 performs layout analysis on the image to extract text areas. For the rectangles in each extracted text area, the search area extraction means 15 repeatedly merges rectangles that are close to one another, until the number of merged rectangles reaches a specified number or the merging has been performed a specified number of times, thereby reducing the number of rectangles. The document direction is then determined on the basis of the merged rectangles.
[0016]
In doing so, the search area extraction means 15 applies this merging only to rectangles in the extracted text area that satisfy all of the following: the rectangle size is within a predetermined size range; within a predetermined search area centered on the rectangle, the number of white-to-black transitions, black-to-white transitions, or both, divided by the search area, is within a predetermined range; and there is no large rectangle within the search area. The merging of nearby rectangles is repeated until the number of merged rectangles reaches a specified number or the merging has been performed a specified number of times.
[0017]
Further, the centroid of the rectangles collected by the search area extraction means 15 is calculated, the rectangle closest to the centroid is selected, and the search area is determined with the selected rectangle as the center.
[0018]
In addition, all the rectangles in the collected rectangles or all the rectangles in the search area are rotated in four directions to perform character recognition to determine the most probable document direction.
[0019]
In addition, for a table area extracted by layout analysis of the image, layout analysis is performed on each cell in the table area, and text areas are extracted from the cells.
[0020]
Accordingly, when determining the character direction from the document image, it is possible to increase the processing speed while maintaining the estimation accuracy of the document direction and to estimate the document direction from the entire document image.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
Next, embodiments and operations of the present invention will be described in detail in order with reference to FIGS. 1 to 8.
[0022]
FIG. 1 shows a system configuration diagram of the present invention.
In FIG. 1, a processing apparatus 1 executes various processes according to a program, and comprises an input means 11, a labeling means 12, a character size estimation means 13, a layout analysis means 14, a search area extraction means 15, a character recognition means 16, an output means 17, and the like.
[0023]
The input means 11 takes in, among other things, the image of a document read by the OCR 2.
[0024]
The labeling means 12 focuses on a black pixel in the image, successively connects the other black pixels adjacent to it, and assigns a label to the resulting region.
[0025]
The character size estimation means 13 calculates the most frequent size (height H0, width W0) among the character sizes (the sizes of the rectangles corresponding to characters) of the connected-pixel regions labeled by the labeling means 12 (see FIG. 5B).
[0026]
The layout analysis means 14 analyzes a table area, a graphic area, and a text area in the image (see FIG. 4).
[0027]
The search area extraction means 15 repeatedly merges rectangles in the text area that are close to one another, until the number of merged rectangles reaches a specified number or the merging process has been performed a specified number of times, thereby reducing the number of rectangles. It then extracts a search area of a predetermined size centered on the rectangle closest to the center of gravity of the merged rectangles (see FIG. 7).
[0028]
The character recognition means 16 recognizes characters.
The output unit 17 outputs the character recognition result to the file 4.
[0029]
The OCR 2 reads a document with a scanner and generates an image.
Image data 3 is an image obtained by reading a document with a scanner.
[0030]
The file 4 is a file for storing the result of character recognition for the image data 3.
[0031]
Next, the operation of the configuration of FIG. 1 will be described in detail, following the order of the flowcharts of FIGS. 2 and 3.
[0032]
FIGS. 2 and 3 are flowcharts for explaining the operation of the present invention.
In FIG. 2, S1 is the input step. Here, a document image obtained by reading a document with a scanner or the like is input to the processing apparatus 1 of FIG. 1.
[0033]
In S2, connected black-pixel regions are extracted. Starting from, for example, the upper left black pixel in the image, all other black pixels connected to it are extracted, and a label is assigned to the resulting region (labeling).
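The labeling of S2 can be illustrated with a minimal sketch. The following is not the patent's implementation: the function name `label_components`, the 0/1 list-of-lists image format, and the use of 8-connectivity are all assumptions made for illustration (the description only says that adjacent black pixels are connected).

```python
from collections import deque


def label_components(image):
    """Find connected black-pixel regions and return their bounding boxes.

    image: list of rows, each a list of 0 (white) or 1 (black).
    Returns a list of (left, top, width, height) tuples in scan order.
    8-connectivity is an assumption; the description does not specify it.
    """
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over the black pixels connected to (x, y).
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes
```

Each returned box corresponds to one labeled region and can serve as the rectangle used in the later steps.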
[0034]
S3 estimates the character size. Here, the character size is estimated on the basis of the connected black-pixel regions labeled in S2 (described later with reference to FIG. 5B).
[0035]
In S4, layout analysis is performed. Here, layout analysis is performed on the connected black-pixel regions extracted in S2; that is, table areas, figure areas, and text areas are identified (described later with reference to FIG. 4). For text areas, the processing from S11 onward is executed. For table areas, the processing from S31 onward is executed.
[0036]
In S11, one text area is selected from the areas identified as text areas by the layout analysis of S4.
[0037]
In S12, one rectangle in the text area is selected.
In S13, conditions 3, 4, and 5 are evaluated. Here, conditions 3, 4, and 5 of FIG. 7, described later, are evaluated for the rectangle in the text area, where:
- Condition 3 checks whether the absolute value of the difference between the rectangle width W and the character width W0 estimated in S3 is less than or equal to a threshold, and the absolute value of the difference between the rectangle height H and the character height H0 estimated in S3 is less than or equal to a threshold.
[0038]
- Condition 4 determines whether or not the line density in the search area (see FIG. 7B) is equal to or less than a threshold (described later with reference to FIG. 7).
[0039]
- Condition 5 is that there is no large rectangle in the search rectangle.
When all three of the above conditions are satisfied, the result is YES, the rectangle is extracted as a character candidate in S14, and the process proceeds to S15. On the other hand, if NO, the process proceeds directly to S15.
[0040]
In step S15, it is determined whether the processing in steps S12 to S15 has been completed for all rectangles in the text area. If YES, the process proceeds to S16. In the case of NO, S12 and subsequent steps are repeated.
[0041]
In S16, it is determined whether all the text areas are completed. In the case of YES, extraction of character candidates has been completed for all text regions, and the process proceeds to S17. If NO, return to S11 and repeat.
[0042]
In step S17, character candidates are clustered.
In S18, it is determined whether the number of clusters has reached the specified number. In S17 and S18, the character candidate rectangles extracted in S14 are clustered; that is, rectangles close to a given rectangle are merged one after another, and this is repeated until the total number of merged rectangles reaches a predetermined number (for example, three) (described later with reference to FIG. 8). If YES, the number of clusters has reached the specified number, and the process proceeds to S19. If NO, S17 is repeated.
[0043]
S19 generates rectangle candidates. Since the clustering has reached the specified number (YES in S18), the sets of rectangles that make up that specified number are taken as the rectangle candidates.
[0044]
S20 calculates the center of gravity. Here, the overall center of gravity of the rectangle candidates in each set of rectangles from S19 is obtained (see FIG. 8).
[0045]
S21 selects the rectangle candidate closest to the center of gravity. Here, the rectangle candidate closest to the center of gravity obtained in S20 is selected (the rectangle candidate □ (filled in black) in FIG. 8).
[0046]
S22 obtains a search area centered on the rectangle selected in S21. Here, the search area is obtained centered on, for example, the rectangle candidate □ (filled in black) of FIG. 8 selected in S21. Then, the process proceeds to S41 in FIG. 3.
[0047]
In S41, the search area is fetched.
In S42, an image rotated in four directions is created. In S41 and S42, for example, the image of the search area shown in FIG. 8 obtained in S22 of FIG. 2 is taken and images rotated in four directions are respectively created.
[0048]
S43 performs character recognition. Here, character recognition is performed on each of the search area images rotated in the four directions in S42.
[0049]
S44 determines the most probable direction. Here, based on the character recognition performed in S43 on each of the four rotated search area images, the direction with the highest character recognition score is determined to be the character direction.
[0050]
In S45, it is determined whether or not the region is finished. If YES, the process proceeds to S46. If NO, return to S41 and repeat for the next search area.
[0051]
S46 determines the direction of each search area.
In S47, the direction is determined by majority vote. In S46 and S47, the direction with the highest character recognition rate is determined for each search area, and one character direction for the text area is decided by majority vote over the directions determined for its search areas.
[0052]
In S48, character recognition is performed on the whole area in the direction chosen by the majority vote. That is, character recognition is performed on all the character rectangles in the text area in the direction determined in S47.
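Steps S42 to S47 can be sketched as follows. This is an illustrative outline only: `rotate90`, `best_direction`, and `vote_direction` are hypothetical names, and `recognize` stands in for the character recognition means 16 as any caller-supplied function that returns a numeric confidence score for an image (the patent does not define such an interface).

```python
from collections import Counter


def rotate90(img):
    """Rotate a 2-D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def best_direction(region, recognize):
    """Return the rotation angle (0, 90, 180, 270) with the highest score.

    recognize is a hypothetical callable: image -> confidence score.
    """
    scores = {}
    img = region
    for angle in (0, 90, 180, 270):
        scores[angle] = recognize(img)  # S43: recognize the rotated image
        img = rotate90(img)             # S42: next of the four directions
    return max(scores, key=scores.get)  # S44: most probable direction


def vote_direction(regions, recognize):
    """S46/S47: decide one direction by majority vote over search areas."""
    votes = Counter(best_direction(r, recognize) for r in regions)
    return votes.most_common(1)[0][0]
```

In practice `recognize` would wrap a real OCR engine and return, for example, its average character confidence; the direction returned by `vote_direction` is then used for the full recognition pass of S48.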
[0053]
As described above, text areas are extracted from the document image, and only character candidate rectangles are extracted by evaluating conditions 3, 4, and 5. The extracted character candidates are clustered into merged rectangles to reduce the number of rectangles (for example, to three or five), and a search area is set around the character candidate rectangle closest to the center of gravity of each merged rectangle. Character recognition is performed within each search area to determine its character direction, the character direction of the whole text area is decided by majority vote, and the rectangles in the text area are then recognized in the decided direction. In this way, the character direction in the text area can be determined quickly and with little processing, using only a small number of search areas, and character recognition in the text area can be performed in the determined character direction.
[0054]
S31 in FIG. 2 is reached when an area has been determined to be a table area by the layout analysis of S4; here, the cells in the table area are extracted.
[0055]
In S32, only the text area is extracted by analyzing the layout of the cell. This is performed in the same manner as the layout analysis in S4.
[0056]
In S33, areas that meet conditions 1 and 2 are extracted. Here, areas that satisfy conditions 1 and 2 of FIG. 6A, described later, are extracted from the text areas in the table cells, where:
- Condition 1 is that the area size is within the threshold range.
- Condition 2 is that the black pixel density within the area is within the threshold range.
Each condition is evaluated, and only areas that satisfy both are extracted.
[0057]
In S34, it is determined whether the area is equal to or larger than a threshold size. If YES, the area has been found to be a text area within a table cell, so search areas are obtained in S11 to S22 described above, and the character direction is then determined and character recognition performed in S41 to S48 of FIG. 3. On the other hand, if NO in S34, the areas are sorted by area in S35, the specified number of largest areas are taken as search areas in S36, and for these search areas the character direction is determined and character recognition performed in S41 to S48 of FIG. 3 described above.
[0058]
As described above, even when a cell in a table area contains a text area, search areas can be obtained in the same way, the character direction can be determined quickly and with little processing, and character recognition of the text area can be performed in that character direction.
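The table-cell branch (S33 to S36) can be sketched as below. This is a hedged illustration, not the patented procedure itself: the function name `split_cell_regions`, the `(width, height, black_pixel_count)` tuple layout, and the interpretation of "area size" as width times height are assumptions.

```python
def split_cell_regions(regions, size_range, density_range, large_area, top_n):
    """Filter cell text regions (S33), then split them by size (S34-S36).

    regions: list of (width, height, black_pixel_count) tuples (assumed form).
    Returns (large, small_top): regions at least large_area in size, which
    would go through S11-S22, and the top_n largest remaining regions, which
    would be used directly as search areas.
    """
    size_lo, size_hi = size_range
    dens_lo, dens_hi = density_range
    kept = []
    for w, h, black in regions:
        area = w * h
        if area == 0:
            continue
        # Condition 1: region size within range.
        # Condition 2: black pixel density within range.
        if size_lo < area < size_hi and dens_lo < black / area < dens_hi:
            kept.append((w, h, black))
    large = [r for r in kept if r[0] * r[1] >= large_area]
    small = sorted(
        (r for r in kept if r[0] * r[1] < large_area),
        key=lambda r: r[0] * r[1],
        reverse=True,  # S35: sort by area, largest first
    )
    return large, small[:top_n]  # S36: keep the specified number of largest
```

All thresholds are parameters here because the description does not give concrete values.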
[0059]
FIG. 4 shows an explanatory diagram (part 1) of the present invention.
FIG. 4A shows an example of an original image. Here, it is assumed that there are a figure, a text, and a table as shown in the figure.
[0060]
FIG. 4B shows an example of the layout analysis result. This is an example of the result obtained by applying the layout analysis of S4 of FIG. 2, described above, to the original image of FIG. 4A, where:
- A table area is determined as an area composed of ruled lines in which black pixels are connected for a predetermined length or more.
[0061]
- A figure area is determined as an area containing a connected black-pixel region whose size is larger than a threshold.
[0062]
- A text area is determined here as any area in the original image that is neither a table area nor a figure area.
[0063]
Through the above processing, text areas can be extracted from the original image, and text areas can further be extracted by layout analysis of the cells in a table area (S32 in FIG. 2 described above).
[0064]
FIG. 5 shows an explanatory diagram (part 2) of the present invention.
FIG. 5A shows an example of a labeling result. Here, each rectangle is generated by grouping the black pixels connected to a given black pixel in the original image and taking the rectangle that bounds the resulting group of black pixels.
[0065]
FIG. 5B shows an example of character size estimation.
FIG. 5B-1 is an explanatory diagram for estimating the width W0 of the character size. The curve shown in the figure is a curve in which the width W of each character rectangle as a result of labeling such as (a) in FIG. 5 is obtained, the horizontal axis represents the obtained width and the vertical axis represents the frequency. Then, the most frequently obtained width as shown in the figure is estimated as the character size width W0. In the present embodiment, the most frequent character size width is estimated as the character size width W0, but the average character size width may be estimated as the character size width W0.
[0066]
FIG. 5B-2 is an explanatory diagram for estimating the height H0 of the character size. The curve shown in the figure is obtained by finding the height H of every character rectangle in a labeling result such as FIG. 5A, with the horizontal axis representing the height and the vertical axis representing its frequency. The most frequent height is obtained as shown in the figure and is taken as the estimated character height H0. In the present embodiment, the most frequent character height is used as the character height H0, but the average character height may be used instead.
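The most-frequent-size estimate of FIG. 5B amounts to taking the mode of the rectangle widths and heights. A minimal sketch, assuming rectangles are given as `(left, top, width, height)` tuples (a hypothetical layout) and using the illustrative name `estimate_char_size`:

```python
from collections import Counter


def estimate_char_size(boxes):
    """Return (W0, H0): the most frequent width and height among the boxes.

    boxes: list of (left, top, width, height) tuples (assumed layout).
    """
    if not boxes:
        raise ValueError("at least one rectangle is required")
    widths = Counter(w for _, _, w, _ in boxes)
    heights = Counter(h for _, _, _, h in boxes)
    w0 = widths.most_common(1)[0][0]
    h0 = heights.most_common(1)[0][0]
    return w0, h0
```

Width and height are estimated independently, as in the two histograms of the figure. Replacing the mode with the mean would give the averaging variant that the description also permits.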
[0067]
FIG. 6 shows an explanatory diagram (part 3) of the present invention.
FIG. 6A shows the conditions. These are the conditions for extracting, from the text areas obtained by layout analysis of the cells in the table area of FIG. 2 described above, the areas that are likely to contain text. Here:
- Condition 1 is threshold < area size < threshold.
- Condition 2 is threshold < black pixel density in the area < threshold.
That is, an area identified as a text area by layout analysis of the cells in the table area is extracted as a highly probable text area when its size is within a predetermined threshold range (condition 1) and the density of black pixels within it is within a predetermined threshold range (condition 2).
[0068]
FIG. 7 shows an explanatory diagram (part 4) of the present invention.
FIG. 7A shows the conditions. Conditions 3, 4, and 5 are the conditions 3, 4, and 5 of S13 in FIG. 2 described above, and are the conditions for extracting a rectangle as a character rectangle:
- Condition 3 is ΔW = |W − W0| < threshold and ΔH = |H − H0| < threshold (W0 and H0 are the estimated character width and height (FIG. 5B)).
- Condition 4 is threshold < line density of the search area < threshold (the line density is the number of white-to-black change points when the area is raster scanned, divided by the area of the region).
- Condition 5 is that there is no large rectangle in the search rectangle (the search area is labeled and searched for a large rectangle, and none is found).
Here, condition 3 is that the differences between the width W and height H of the rectangle in the text area and the character width W0 and height H0 estimated in FIG. 5B described above are each within a predetermined threshold. Condition 4 is that the line density of the search area centered on the character rectangle (the number of points at which, for example, white changes to black when the search area is scanned in a fixed direction, divided by the area of the search area) is within a predetermined threshold range. Condition 5 is that there is no large rectangle in the search area centered on the character rectangle.
[0069]
When the above conditions 3, 4 and 5 are satisfied, the result of S13 in FIG. 2 described above is YES, and the character rectangle can be extracted as a character candidate.
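Conditions 3, 4, and 5 can be sketched as a small filter. The names `line_density` and `is_char_candidate`, the parameter layout, and the choice to treat the pixel to the left of each row as white are all assumptions for illustration; the thresholds are left as parameters because the description gives no concrete values.

```python
def line_density(region):
    """White-to-black transitions per row (raster scan) divided by area.

    region: list of rows of 0 (white) / 1 (black) pixels.
    The pixel left of each row is treated as white (an assumption).
    """
    rows = len(region)
    cols = len(region[0]) if rows else 0
    if rows == 0 or cols == 0:
        return 0.0
    transitions = 0
    for row in region:
        prev = 0
        for px in row:
            if prev == 0 and px == 1:
                transitions += 1
            prev = px
    return transitions / (rows * cols)


def is_char_candidate(w, h, w0, h0, size_tol, region,
                      dens_lo, dens_hi, neighbor_sizes, big_size):
    """Evaluate conditions 3, 4, and 5 for one rectangle.

    region: pixels of the search area centered on the rectangle.
    neighbor_sizes: (width, height) of rectangles found in the search area.
    big_size: side length from which a rectangle counts as "large"
              (hypothetical parameter).
    """
    cond3 = abs(w - w0) < size_tol and abs(h - h0) < size_tol
    cond4 = dens_lo < line_density(region) < dens_hi
    cond5 = all(nw < big_size and nh < big_size for nw, nh in neighbor_sizes)
    return cond3 and cond4 and cond5
```

A rectangle for which `is_char_candidate` returns true corresponds to the YES branch of S13 and would be kept as a character candidate in S14.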
[0070]
FIG. 7B shows an example of determining a search area. Attention is paid to, for example, the rectangle of the character "か" near the center of the illustrated text area. With the width W and height H of that rectangle, a search area is set centered on the rectangle as shown in the figure, conditions 3, 4, and 5 of FIG. 7A are applied, and in this example the rectangle is extracted as a character rectangle candidate.
[0071]
FIG. 8 is an explanatory diagram of the present invention (clustering and final search area extraction example).
In FIG. 8, the character rectangles (1) are the rectangles extracted as character candidates in S14 of FIG. 2 described above.
[0072]
The clusters of the clustering result (2) were obtained by clustering the character rectangles, that is, by repeatedly merging each character rectangle with the other character rectangle closest to it, thereby reducing the number of character rectangles. In this embodiment, the distance between character rectangles is the distance between the upper left coordinates of the rectangles. FIG. 8 shows the character rectangles grouped into clusters, reduced to three.
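The clustering of S17 and S18 can be sketched as a simple agglomerative procedure. The description states only that nearby rectangles are merged repeatedly and that distance is measured between upper left coordinates; measuring the distance between two clusters as the smallest distance between any pair of their members (single linkage) is an assumption of this sketch, as is the name `cluster_rectangles`.

```python
import math


def cluster_rectangles(boxes, target):
    """Merge the closest clusters until only `target` clusters remain.

    boxes: list of (left, top, width, height) tuples (assumed layout).
    Distance between rectangles is the distance between their upper left
    corners; cluster-to-cluster distance is single linkage (an assumption).
    """
    clusters = [[b] for b in boxes]

    def dist(a, b):
        return min(math.hypot(p[0] - q[0], p[1] - q[1]) for p in a for q in b)

    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair
        del clusters[j]
    return clusters
```

This brute-force version is cubic in the number of rectangles, which is acceptable for a sketch. The description also allows stopping after a specified number of merge passes, which is not shown here.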
[0073]
The x in (3) is the cluster centroid. In the present embodiment, the cluster centroid is obtained from the average value of the upper left coordinate values of the character rectangles constituting the cluster.
The search area of (4) is the search area obtained around the rectangle closest to the cluster centroid (the rectangle drawn filled in black).
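The centroid calculation and the selection of the final search area might look like the sketch below. Rectangles are assumed to be `(x, y, w, h)` tuples with `(x, y)` the upper-left corner. The centroid follows the description (the average of the upper-left coordinates), while measuring "closest" from the upper-left corner and sizing the search area as `scale` times the estimated character size are assumptions of this sketch.

```python
from math import hypot


def cluster_centroid(cluster):
    """Average of the upper-left coordinates of the rectangles in the cluster."""
    n = len(cluster)
    return (sum(r[0] for r in cluster) / n, sum(r[1] for r in cluster) / n)


def nearest_to_centroid(cluster):
    """Rectangle whose upper-left corner is closest to the cluster centroid."""
    cx, cy = cluster_centroid(cluster)
    return min(cluster, key=lambda r: hypot(r[0] - cx, r[1] - cy))


def search_area_around(rect, char_w, char_h, scale=3):
    """Search area (x, y, w, h) centered on rect; not clipped to the image."""
    x, y, w, h = rect
    area_w, area_h = scale * char_w, scale * char_h
    center_x, center_y = x + w // 2, y + h // 2
    return (center_x - area_w // 2, center_y - area_h // 2, area_w, area_h)
```

For a cluster of three 10x10 rectangles at x = 0, 20 and 40 on the same row, the centroid falls on the middle rectangle, and the search area is placed around it.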
[0074]
For the search area (4) obtained as described above, the character direction is determined quickly and with a small amount of processing by steps S41 to S48 of FIG. 3, and character recognition of the text area can then be performed in the determined character direction.
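The individual steps S41 to S48 are not reproduced in this passage, so the following is only a sketch of the general idea stated in the claims: rotate the image of the search area into four directions, run character recognition on each, and keep the direction with the highest recognition score. The `recognize` callable is a hypothetical stand-in for the character recognition means and is assumed to return a confidence score; the image is assumed to be a list of pixel rows.

```python
def rotate90(patch):
    """Rotate a 2-D list of pixels by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def estimate_direction(patch, recognize):
    """Return the clockwise rotation (0, 90, 180 or 270 degrees) of the
    search-area image that gives the highest recognition score."""
    best_angle, best_score = 0, float("-inf")
    rotated = patch
    for angle in (0, 90, 180, 270):
        score = recognize(rotated)  # hypothetical recognizer: higher is better
        if score > best_score:
            best_angle, best_score = angle, score
        rotated = rotate90(rotated)
    return best_angle
```

Because only the small search area is rotated and recognized, the recognizer is called four times per search area rather than once per character in the whole text area.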
[0075]
[Effects of the Invention]
As described above, according to the present invention, a text area is extracted from an image; character rectangle candidates are extracted from the rectangles in the text area according to conditions 3, 4 and 5; the character rectangle candidates are combined to reduce their number; a search area of a predetermined size is determined around the character rectangle candidate closest to the center of gravity of the combined rectangles; the character direction is determined within that search area; and character recognition of the text area is performed in the determined character direction. With this configuration, when the character direction is determined from a document image, the amount of processing is reduced while the accuracy of document direction estimation is maintained, and the document direction can be estimated quickly from the entire document image. Character recognition of the text area can then be performed in the estimated character direction.
[Brief description of the drawings]
FIG. 1 is a system configuration diagram of the present invention.
FIG. 2 is a flowchart (part 1) illustrating the operation of the present invention.
FIG. 3 is a flowchart (part 2) illustrating the operation of the present invention.
FIG. 4 is an explanatory diagram (part 1) of the present invention.
FIG. 5 is an explanatory diagram (part 2) of the present invention.
FIG. 6 is an explanatory diagram (part 3) of the present invention.
FIG. 7 is an explanatory diagram (part 4) of the present invention.
FIG. 8 is an explanatory diagram of the present invention (clustering and final search area extraction example).
[Explanation of symbols]
1: Processing device
11: Input means
12: Labeling means
13: Character size estimation means
14: Layout analysis means
15: Search area extraction means
16: Character recognition means
17: Output means
2: OCR
3: Image data
4: File (recognition result)

Claims

A document direction estimation method for estimating a document direction in an image, comprising:
a first step of analyzing the layout of the image and extracting a text area;
a second step of reducing the number of rectangles in the extracted text area by repeatedly combining rectangles that are close to one another until the number of combined rectangles reaches a specified number or the combining process has been performed a specified number of times;
a search area determining step of calculating the center of gravity of the combined rectangles, selecting the rectangle closest to the center of gravity, and determining a search area around the selected rectangle; and
a third step of performing character recognition on the image of the determined search area in a plurality of designated directions and determining the direction with a high character recognition degree as the document direction.

The document direction estimation method according to claim 1, wherein, among the rectangles in the extracted text area, those rectangles for which the value obtained by dividing the number of white-to-black changes, black-to-white changes, or both, within a predetermined search area centered on the rectangle, by the area of the search area is within a predetermined range, and for which no large rectangle exists within the search area, are repeatedly combined with rectangles close to them until the number of combined rectangles reaches a specified number or the combining process has been performed a specified number of times.

The document direction estimation method according to claim 1 or claim 2, wherein character recognition is performed with the whole of the search area for the combined rectangles, or all of the rectangles in it, rotated into each of four directions, and the most probable document direction is determined.

The document direction estimation method according to any one of claims 1 to 3, wherein a table area is extracted by layout analysis of the image, a text area is extracted by layout analysis for each cell of the table area, and the second and subsequent steps are executed.

A document direction estimation program for estimating a document direction in an image, the program causing a computer to execute:
a first step of analyzing the layout of the image and extracting a text area;
a second step of reducing the number of rectangles in the extracted text area by repeatedly combining rectangles that are close to one another until the number of combined rectangles reaches a specified number or the combining process has been performed a specified number of times;
a search area determining step of calculating the center of gravity of the combined rectangles, selecting the rectangle closest to the center of gravity, and determining a search area around the selected rectangle; and
a third step of performing character recognition on the image of the determined search area in a plurality of designated directions and determining the direction with a high character recognition degree as the document direction.