JP2004038530A

JP2004038530A - Image processing method, program used for executing the method and image processor

Info

Publication number: JP2004038530A
Application number: JP2002194267A
Authority: JP
Inventors: Fumihiro Hasegawa; 長谷川　史裕
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2002-07-03
Filing date: 2002-07-03
Publication date: 2004-02-05

Abstract

PROBLEM TO BE SOLVED: To obtain appropriate character line area information even when a document image in which vertical and horizontal writing are mixed is supplied unconditionally.
SOLUTION: Line candidate areas that can be regarded as character lines are extracted from the target document image as vertical and horizontal line candidates (circumscribed rectangles of connected components of black pixels are obtained and adjoining rectangles are merged). When extracted vertical and horizontal line candidate areas overlap, a line likelihood is calculated for each candidate (step S5) from line features such as line length, line height, aspect ratio of the line size, distance between connected components, connected component size, and the variation of these quantities. Based on the result, the inappropriate line candidate in the overlapping area is deleted and appropriate character line area information is output.
COPYRIGHT: (C)2004,JPO

Description

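[0001]
[Technical Field of the Invention]
The present invention relates to image processing for acquiring information representing character line areas, which is used ahead of recognition processing of character and document images contained in an input image. More specifically, it relates to an image processing method that yields information representing appropriate character line areas even when vertical and horizontal lines both exist in a single target image, a program used to execute the method, and an image processing apparatus.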
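[0002]
[Prior Art]
In image processing of characters and documents, processing such as character recognition has conventionally been applied to scanned character and document images. For this processing, obtaining the correct position of the character areas occupied by character and document images in the target image is indispensable for high recognition accuracy. If character recognition were performed without knowing where the characters are in the document image, it would also be applied to areas such as photographs and figures that need no recognition. This not only wastes time on unnecessary processing; forcing recognition on areas that contain no characters also produces a large number of errors, which makes the recognition results very difficult to use.
A method using connected components of black pixels in a binary image was therefore proposed for obtaining the correct position of character areas. In this method, circumscribed rectangles of the connected components of black pixels in the input image are obtained, the rectangles are classified into basic elements such as characters, tables and figures, and the character elements are extracted and merged to generate lines (see, for example, Japanese Patent Application Laid-Open No. 9-44594).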
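[0003]
Japanese documents can be written either vertically or horizontally, so unless the line direction is known in advance, it is difficult to obtain correct results from character recognition. The above example does not consider line direction, so the direction must be determined by some other means in order to handle it properly.
To address this, the method disclosed in Japanese Patent Application Laid-Open No. 10-63776 is given a processing target area by some means, obtains the circumscribed rectangles of the connected components of black pixels in that area, and estimates the line direction from the following quantities derived from the rectangles.
- The line density features (projections) of the circumscribed rectangles in the vertical and horizontal directions.
- The cumulative distance between adjacent rectangles in the vertical and horizontal directions.
- The cumulative overlap (width of coordinate overlap) between adjacent rectangles in the vertical and horizontal directions.
The procedure first estimates the number of character strings in the vertical and horizontal directions from the projections of the circumscribed rectangles. If the estimated number of character strings is 1, the line direction is determined from the projection information. If it is not 1, the ratio of the cumulative distances between rectangles is computed, and when the ratio exceeds a predetermined value the line direction is determined from it. If the ratio stays within the predetermined value and the direction remains unclear, the line direction is determined from the cumulative overlap between rectangles.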
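[0004]
[Problems to Be Solved by the Invention]
However, this method (Japanese Patent Application Laid-Open No. 10-63776) requires the processing target area to be specified in advance, and because it relies on projections of the circumscribed rectangles, vertical and horizontal lines must not be mixed within that area. A document image in which vertical and horizontal writing are mixed therefore cannot simply be supplied unconditionally for processing, which hinders automation.
The present invention was made in view of these problems of the prior art in processing that takes an image input by image reading means or the like and acquires information representing the character areas occupied by the character and document images it contains. Its object is to provide an image processing method that obtains appropriate character line area information even when a document image mixing vertical and horizontal writing is supplied unconditionally, a program used to execute the method, and an image processing apparatus.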
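[0005]
[Means for Solving the Problems]
The invention of claim 1 is an image processing method comprising: a horizontal line extraction step of extracting, from an image input as the processing target, line candidate areas that can be regarded as horizontal lines; a vertical line extraction step of extracting, from the target image, line candidate areas that can be regarded as vertical lines; an overlapping line detection step of detecting, based on the results of the horizontal and vertical line extraction steps, horizontal and vertical line candidate areas that have a mutually overlapping area; an unsuitable line selection step of judging the suitability as character lines of the horizontal and vertical line candidate areas detected in the overlapping line detection step and selecting the line of the unsuitable direction; and a line deletion step of deleting the unsuitable line candidate in accordance with the result of the unsuitable line selection step.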
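[0006]
The invention of claim 2 is the image processing method of claim 1, further comprising a line data storage step of storing the line candidate areas extracted in the horizontal and vertical line extraction steps, wherein the overlapping line detection step comprises a step of selecting one line from the line candidate areas stored in the line data storage step, and a step of selecting at least one line candidate area that has a direction different from that line and overlaps it.
[0007]
The invention of claim 3 is the image processing method of claim 1, further comprising a line data storage step of storing the line candidate areas extracted in the horizontal and vertical line extraction steps, wherein the overlapping line detection step comprises: a primary selection step of selecting one line from the line candidate areas stored in the line data storage step; a secondary selection step of selecting all line candidate areas that have a direction different from that line and overlap it; a higher-order selection step, applied in turn to each line newly selected in or after the secondary selection step, of selecting all line candidate areas that have a direction different from that line and overlap it; and a line selection stopping step of stopping the higher-order selection step in accordance with a predetermined condition.
[0008]
The invention of claim 4 is the image processing method of claim 3, wherein the line selection stopping step stops the selection when at least one of the following stop conditions holds: a predetermined number of selection steps has been reached; no line candidate areas remain to be selected; the number of selected line candidate areas has reached a predetermined number; or the area of the circumscribed rectangle of the selected line candidate areas has exceeded a predetermined value.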
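[0009]
The invention of claim 5 is the image processing method of claim 1, wherein each of the horizontal and vertical line extraction steps comprises: at least one connected component extraction step among a step of extracting connected components of black or white pixels, a step of extracting adjacent pixels of similar lightness as connected components, and a step of extracting adjacent pixels of similar color as connected components; a step of selecting neighboring connected components from the connected components thus obtained; a step of comparing the sizes of neighboring connected components; a step of grouping the connected components based on the comparison result; and a step of obtaining the circumscribed rectangle of the grouped connected components.
[0010]
The invention of claim 6 is the image processing method of claim 1, wherein the unsuitable line selection step comprises a step of calculating, for each line candidate area, a likelihood representing how line-like it is, in order to judge the suitability as character lines of the horizontal and vertical line candidate areas having an overlapping area.
[0011]
The invention of claim 7 is the image processing method of claim 1, wherein the unsuitable line selection step comprises a step of calculating a likelihood representing line-likeness over the whole of the one or more line candidates in the line candidate areas of each direction, in order to judge the suitability as character lines of the horizontal and vertical line candidate areas having an overlapping area.
[0012]
The invention of claim 8 is the image processing method of claim 6 or 7, wherein the step of calculating the likelihood comprises a step of obtaining one or more feature values and a step of calculating the likelihood from the obtained feature values.
[0013]
The invention of claim 9 is the image processing method of claim 8, wherein the step of obtaining the feature values comprises at least one of the following steps: calculating the line length; calculating the line height; calculating the aspect ratio of the line size; calculating the distance between the connected components of the pixels forming the line; calculating the size of the connected components of the pixels forming the line; calculating the variation in line length; calculating the variation in line height; calculating the variation in the aspect ratio of the line size; calculating the variation in the distance between the connected components; calculating the variation in the size of the connected components; extracting lightness information of the pixels forming the line; extracting color information of the pixels forming the line; calculating the edge strength of the pixels forming the line; calculating the lightness difference between the connected components of the pixels forming the line and the surrounding pixels; and calculating the color difference between the connected components of the pixels forming the line and the surrounding pixels.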
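[0014]
The invention of claim 10 is the image processing method of claim 7, wherein the step of calculating the likelihood comprises a step of calculating the ratio of the numbers of horizontal and vertical lines among the horizontal and vertical line candidate areas having an overlapping area, and a calculation processing step of applying a different likelihood calculation process when the calculated ratio has a predetermined value.
[0015]
The invention of claim 11 is the image processing method of claim 10, wherein the calculation processing step of applying the different likelihood calculation process comprises: a step of extracting line candidate areas adjacent to the line candidate areas having the overlapping area; a step of checking whether the height of each extracted line candidate area satisfies a predetermined condition; a step of counting the line candidate areas whose height satisfies the predetermined condition; a step of checking whether the count satisfies a predetermined condition; and a step of determining the likelihood representing line-likeness based on the result of that check.
[0016]
The invention of claim 12 is the image processing method of claim 11, wherein the step of checking the height of the extracted line candidate areas judges a line candidate area to meet the predetermined condition when its position in the height direction substantially coincides and the difference in height is small.
[0017]
The invention of claim 13 is the image processing method of claim 12, wherein the step of judging a line candidate area to meet the predetermined condition judges that no line meeting the condition exists when, even though an area with a small height difference exists, an area with a large height difference exists at a closer distance.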
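[0018]
The invention of claim 14 is the image processing method of claim 1, further comprising a step of generating a compressed image from the original image input as the processing target, placed before the horizontal and vertical line extraction steps, and a step of restoring the line areas based on the compressed image, obtained after the line deletion step, to values based on the original image.
[0019]
The invention of claim 15 is the image processing method of claim 14, wherein the step of generating the compressed image comprises a step of changing the compression ratio according to the resolution of the original image.
[0020]
The invention of claim 16 is the image processing method of claim 14, wherein, when the original image is a binary image, the step of generating the compressed image comprises a step of setting a pixel of the generated compressed image to black if the pixel range of the original image forming the compression unit contains a black pixel.
[0021]
The invention of claim 17 is the image processing method of claim 14, wherein, when the original image is a binary image, the step of generating the compressed image comprises a step of setting a pixel of the generated compressed image to white if the pixel range of the original image forming the compression unit contains a white pixel.
[0022]
The invention of claim 18 is the image processing method of claim 14, wherein, when the original image is a multi-valued image, the step of generating the compressed image comprises at least one of: a step of using the mean pixel value of the pixel range forming the compression unit as the value of the corresponding pixel of the compressed image; a step of using the value of the lightest pixel in that range; and a step of using the value of the darkest pixel in that range.
[0023]
The invention of claim 19 is a program for causing a computer to execute the steps of the image processing method of any of claims 1 to 18.
[0024]
The invention of claim 20 is an image processing apparatus comprising a computer on which the program of claim 19 is installed, the computer processing the data of the target image.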
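[0025]
[Embodiments of the Invention]
The present invention is described below on the basis of the following embodiments, shown together with the accompanying drawings.
Embodiments 1 to 4 below each show the processing steps for carrying out the image processing method of the invention together with the means (apparatus) needed to execute each step. As shown in Embodiment 5, these means can be implemented by a computer on which a program for executing the steps of the image processing method is installed, but the functions needed for each step can also be implemented as electric circuits.
"Embodiment 1"
This embodiment shows a basic image processing method that obtains correct character line area information even when a document image mixing vertical and horizontal writing is supplied unconditionally. Its core is to extract areas that can be regarded as character lines (hereinafter sometimes called "line candidate areas", "line candidates" or simply "lines") as vertical and horizontal line candidates, and to delete the unnecessary candidates when the extracted vertical and horizontal line candidate areas overlap.
FIG. 1 is a chart showing the processing flow of the image processing method of this embodiment, and FIG. 2 is a block diagram showing the configuration of the image processing apparatus of this embodiment. In the embodiments below, the document image to be processed is basically assumed to be a binary image, but with minor modifications the same processing can be applied to grayscale and color images, so any differences from the binary case are explained as they arise.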
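[0026]
In outline, with reference to FIG. 2, the image processing apparatus of this embodiment operates as follows. The image input means 101 acquires and stores the original image to be processed (it can be implemented by a document reading device such as a scanner or imaging device that outputs color image data) and sends the acquired original image data to the line extraction means 102.
The line extraction means 102 (detailed later with reference to FIG. 4) extracts every line in a document that can be regarded as a character line and sends the extracted character line information to the line storage means 103.
The overlapping line extraction means 104 extracts overlapping lines based on the character line information obtained from the line storage means 103 and outputs the result to the line likelihood calculation means 105.
The line likelihood calculation means 105 (detailed later with reference to FIG. 6) calculates the likelihood of each overlapping line as a character line and sends the result to the unsuitable line deletion means 106.
The unsuitable line deletion means 106 determines the unsuitable lines from the calculated line likelihoods in order to eliminate overlaps, deletes them, and sends the result to the line storage means 103.
The line storage means 103 additionally stores character line information reflecting the result of the deletion of unsuitable lines, so that correct character line information can be output.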
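[0027]
Next, the image processing method of this embodiment is described with reference to the flowchart of FIG. 1. The description also covers the detailed configuration and operation of each component of the image processing apparatus above.
In the flow of FIG. 1, the document to be processed is first acquired as an original image by the image input means 101 (S1). The acquired image takes the form of continuous pixel data in ordinary raster order.
The line extraction means 102 then extracts horizontal line candidates from the target image and stores their data in the line storage means 103 (S2). Various methods can be used to extract line candidates; the one illustrated here uses connected components of black pixels.
FIG. 3 is a chart showing the line extraction processing (S2/S3) of this embodiment in more detail, and FIG. 4 is a block diagram showing the configuration of the line extraction means 102 in more detail.
As shown in FIG. 4, the line extraction means 102 comprises connected component extraction means 102-01, neighboring connected component extraction means 102-02, connected component size comparison means 102-03, connected component grouping means 102-04, and group circumscribed rectangle calculation means 102-05. The functions of these means are shown by the following description of operation along the flow.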
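[0028]
In the flow of FIG. 3, which shows the line extraction procedure of this example, the connected component extraction means 102-01 first extracts the connected components of black pixels from the target image (S2-01). If the document image is a grayscale image rather than a binary image, mutually adjacent pixels with similar lightness values are grouped and used as connected components. For a color image, pixels of similar color are grouped instead of pixels of similar lightness; the closeness of two colors is computed, for example, as the sum of the squared differences of the color components of the pixels. These connected component extraction techniques are described in detail in the applicant's earlier application (Japanese Patent Application No. 2001-86484), to which reference is made.
Next, from the extracted connected components, the neighboring connected component extraction means 102-02 extracts the components that are neighbors in the direction of the lines to be extracted (here, horizontal). "Neighboring" can be defined in various ways; here it means within a distance of about the size of a component's circumscribed rectangle. The connected component size comparison means 102-03 then compares the neighboring components (S2-02) and checks whether their sizes are similar (S2-03). For example, when components are close in height and in position, they are regarded as forming the same line and are grouped by the connected component grouping means 102-04 (S2-04). When the sizes cannot be judged close (S2-03-NO), the grouping is skipped.
After confirming that this grouping has been examined for all rectangles (S2-05), the group circumscribed rectangle calculation means 102-05 calculates the circumscribed rectangle of each group, and its coordinate values are passed to the next step as character line data (S2-06).
Vertical line candidates are extracted in the same way as the horizontal ones, and the resulting character line data is passed to the next step (S3).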
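The grouping in steps S2-02 to S2-06 can be sketched in Python as follows. This is only an illustration under stated assumptions: connected components are already available as circumscribed rectangles `(x0, y0, x1, y1)`, the function name `group_horizontal_lines` and the `size_ratio` threshold are hypothetical, and "neighboring" is taken to mean a horizontal gap of at most one box height. The source does not fix any of these values.

```python
def group_horizontal_lines(boxes, size_ratio=0.7):
    """Group component boxes (x0, y0, x1, y1) into horizontal line candidates."""
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def similar(a, b):
        ha, hb = a[3] - a[1], b[3] - b[1]
        # Heights must be close (S2-03); the ratio threshold is an assumption.
        if min(ha, hb) < size_ratio * max(ha, hb):
            return False
        # The boxes must share some vertical extent (close in position).
        if min(a[3], b[3]) <= max(a[1], b[1]):
            return False
        # Horizontal gap of at most about one box size (assumed neighborhood).
        gap = max(a[0], b[0]) - min(a[2], b[2])
        return gap <= max(ha, hb)

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if similar(boxes[i], boxes[j]):
                parent[find(j)] = find(i)  # S2-04: merge into one group

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # S2-06: circumscribed rectangle of each group.
    return sorted(
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    )
```

Swapping the roles of the x and y coordinates gives the corresponding sketch for vertical line candidates (S3). Isolated components come out as single-box candidates in this sketch.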
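[0029]
After the horizontal and vertical character line data have been extracted in steps S2 and S3 of FIG. 1, the overlapping line extraction means 104 determines whether any of the extracted horizontal and vertical character line candidates overlap in position and, if so, takes out the overlapping candidates one pair at a time (S4).
One of the two overlapping candidates in each pair corresponds to the correct character line. To decide which, the line likelihood calculation means 105 calculates the line-likeness (likelihood) of each of the overlapping candidates (S5).
Various methods of calculating line-likeness are conceivable; the following is used here.
FIG. 5 is a chart showing the line likelihood calculation processing (S5) of this embodiment in more detail, and FIG. 6 is a block diagram showing the configuration of the line likelihood calculation means 105 in more detail.
As shown in FIG. 6, the line likelihood calculation means 105 comprises line length calculation means 105-01, line height calculation means 105-02, aspect ratio calculation means 105-03, inter-component distance calculation means 105-04, component size calculation means 105-05, and feature summation means 105-06. The functions of these means are shown by the following description of operation along the flow.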
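[0030]
In the flow of FIG. 5, which shows the line likelihood calculation procedure of this example, the line length calculation means 105-01 and line height calculation means 105-02 first obtain the length and height of the target character line (S5-01).
The aspect ratio calculation means 105-03 then calculates the aspect ratio (length ratio) of the line from the obtained length and height (S5-02).
The inter-component distance calculation means 105-04 obtains the distances between the connected components forming the line (S5-03). These distances correspond to the character spacing.
The component size calculation means 105-05 then examines the sizes of the connected components forming the line (S5-04), counting the components whose size is character-like.
Next, using the values obtained in steps S5-02 to S5-04 as feature values, the feature summation means 105-06 calculates the likelihood representing character line-likeness (S5-05). Various calculation methods are conceivable; the example here takes as the likelihood the linear sum of the absolute differences between each feature value and its typical value obtained in advance.
That is, if F_i is the value of feature i and E_i is the typical value of feature i when the line candidate really is a line, the likelihood A is given by [Equation 1] below.
[0031]
[Equation 1]
$$A = \sum_i K_i \, |F_i - E_i|$$
[0032]
In [Equation 1], K_i is a weight whose optimal value is determined by experiment.
The typical value E_i is also determined by experiment: prepare many areas known to be lines and take the mean of the feature values F_i obtained from them.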
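As a minimal sketch of the weighted sum of absolute deviations described above, the following Python computes A from a feature vector and derives the typical values as per-feature means over sample lines. The function names and the example numbers are hypothetical; the source leaves the weights K_i and typical values E_i to experiment.

```python
def typical_values(samples):
    """E_i: per-feature mean over feature vectors from areas known to be lines."""
    return [sum(column) / len(samples) for column in zip(*samples)]


def line_likelihood(features, typical, weights):
    """A = sum_i K_i * |F_i - E_i|; a smaller A means a more line-like candidate."""
    return sum(k * abs(f - e) for f, e, k in zip(features, typical, weights))


# Hypothetical feature vector: (aspect ratio, mean character spacing).
a = line_likelihood([10.0, 0.5], [8.0, 1.0], [1.0, 2.0])
```

With these made-up numbers, `a` is 1.0·2.0 + 2.0·0.5 = 3.0.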
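A Mahalanobis distance can be used instead of the linear sum above, as in the following example.
That is, if F_i is the value of feature i, E_i is the typical (mean) value of the feature obtained by experiment, and V_i is the variance of the feature in the experiment, the likelihood A is given by [Equation 2] below.
[0033]
[Equation 2]
$$A = \sum_i \frac{(F_i - E_i)^2}{V_i}$$
[0034]
Strictly, the Mahalanobis distance must take into account not only the variances but also the covariances between features; the covariances are ignored here for simplicity.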
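A covariance-free distance of this kind might be sketched as below. It assumes the squared form, in which each squared deviation is divided by the experimental variance of that feature; the function name is hypothetical, and taking the square root of the sum would rank candidates identically.

```python
def line_likelihood_diag(features, means, variances):
    """Mahalanobis-style distance with covariances ignored (diagonal form)."""
    return sum((f - e) ** 2 / v for f, e, v in zip(features, means, variances))
```

Compared with the weighted linear sum, this form needs no hand-tuned weights, because each feature is scaled by its own measured spread.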
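For grayscale and color images, lightness and color information may also be used as feature values: for example, the edge strength, the difference between the lightness of the pixels forming a connected component and the surrounding lightness, or the difference between the color of the pixels forming a connected component and the surrounding color. An example of how to obtain the edge strength follows.
Spatial filters whose coefficients are those of FIG. 7(A) or FIG. 7(B) are applied to the lightness values (the G values for a color image made of RGB components) in the 3×3 pixel range centered on the pixel of interest, and the sum of the absolute values of the results is taken as the edge strength of that pixel. This is done for all pixels forming the character line, and the mean is used as the edge strength of the line.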
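The edge strength computation can be illustrated as follows. The coefficients of FIG. 7 are not reproduced in the text, so this sketch substitutes the common Sobel kernels as stand-ins; that substitution, the function names, and the requirement that the listed pixels are not on the image border are all assumptions.

```python
# Assumed stand-ins for the filters of FIG. 7(A) and FIG. 7(B).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def edge_strength(img, x, y):
    """Sum of absolute filter responses over the 3x3 window centered on (x, y)."""
    gx = gy = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            value = img[y + dy][x + dx]  # lightness (or G) value
            gx += SOBEL_X[dy + 1][dx + 1] * value
            gy += SOBEL_Y[dy + 1][dx + 1] * value
    return abs(gx) + abs(gy)


def mean_edge_strength(img, pixels):
    """Edge strength of a line: mean over its (x, y) pixels, all off the border."""
    return sum(edge_strength(img, x, y) for x, y in pixels) / len(pixels)
```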
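[0035]
After the line likelihood calculation (S5), the unsuitable line deletion means 106 deletes the less line-like of the pair of overlapping character line candidates, based on their likelihoods (S6). Because a smaller likelihood A means a more line-like candidate, the unsuitable line deletion means 106 regards whichever of the vertical and horizontal lines has the smaller A as the correct character line and deletes the other.
The series of steps for deleting an unsuitable line from an overlapping pair (S4 to S6) is performed one overlapping pair at a time. Overlapping candidates are examined in turn and the processing is repeated until no overlaps remain. Once all target candidates are confirmed to have been processed (S7-YES), the correct, overlap-free character line information is output from the line storage means 103 as the result (S8) and the flow ends.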
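The loop over steps S4 to S7 can be sketched like this. The sketch assumes lines are rectangles `(x0, y0, x1, y1)` and that `score` is a caller-supplied function returning the likelihood A for a line and its direction; both names are hypothetical. Ties are resolved in favor of the horizontal line, which the source does not specify.

```python
def overlaps(a, b):
    """True if rectangles (x0, y0, x1, y1) share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def resolve_overlaps(h_lines, v_lines, score):
    """Repeat S4-S6 until no horizontal/vertical pair overlaps."""
    h_lines, v_lines = list(h_lines), list(v_lines)
    while True:
        # S4: take out one overlapping pair, if any remains.
        pair = next(
            ((h, v) for h in h_lines for v in v_lines if overlaps(h, v)), None
        )
        if pair is None:
            return h_lines, v_lines  # S7-YES: nothing left to examine
        h, v = pair
        # S5/S6: the candidate with the larger A is the less line-like one.
        if score(h, "h") <= score(v, "v"):
            v_lines.remove(v)
        else:
            h_lines.remove(h)
```

Each pass removes one line, so the loop always terminates.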
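[0036]
"Embodiment 2"
Embodiment 1 takes out one-to-one overlaps and performs the whole series of processing, up to deletion of the unsuitable line, on that unit. This requires many operations over all line candidates and, depending on conditions, can produce errors. This embodiment is intended to solve that problem: it detects many-to-many overlaps between horizontal and vertical lines, calculates likelihoods within each group, and thereby obtains appropriate character line area information with fewer operations.
FIG. 8 is a chart showing the processing flow of the image processing method of this embodiment, and FIG. 9 is a block diagram showing the configuration of the image processing apparatus of this embodiment.
In outline, with reference to FIG. 9, the image processing apparatus of this embodiment operates as follows. The image input means 201 acquires and stores the original image to be processed (it can be implemented by a document reading device such as a scanner or imaging device that outputs color image data) and sends the acquired original image data to the line extraction means 202.
The line extraction means 202 extracts every line candidate in a document that can be regarded as a character line and sends the extracted character line information to the line storage means 203. The line extraction means 202 has the same configuration as the line extraction means 102 of Embodiment 1 (detailed above with reference to FIG. 4).
The overlapping line group extraction means 204 (detailed later with reference to FIG. 10) extracts overlapping lines as overlapping line groups, under predetermined conditions, based on the character line information obtained from the line storage means 203, and outputs the result to the line group likelihood calculation means 205.
The line group likelihood calculation means 205 (see FIG. 6) calculates the likelihood as character lines of the vertical and the horizontal lines in each overlapping line group and sends the result to the unsuitable line group deletion means 206.
The unsuitable line group deletion means 206 determines the unsuitable lines from the calculated likelihoods in order to eliminate overlaps, deletes them, and sends the result to the line storage means 203.
The line storage means 203 additionally stores character line information reflecting the result of the deletion of unsuitable lines, so that correct character line information can be output.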
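[0037]
Next, the image processing method of this embodiment is described with reference to the flowchart of FIG. 8. The description also covers the more detailed configuration and operation of each component of the image processing apparatus (FIG. 9).
In the flow of FIG. 8, the document to be processed is first acquired as an original image by the image input means 201 (S21). The acquired image takes the form of continuous pixel data in ordinary raster order.
Next, the line extraction means 202 extracts horizontal and vertical line candidates from the target image and stores their character line data in the line storage means 203 (S22, S23). Line candidates can be extracted in the same way as in Embodiment 1, so the earlier description (of S2 and S3) applies and is not repeated here.
The overlapping line group extraction means 204 then performs the overlap determination on the extracted horizontal and vertical character line data (S24).
FIG. 10 is a chart showing the multi-line overlap determination step (S24) of this embodiment in more detail, and FIG. 11 is a block diagram showing the configuration of the overlapping line group extraction means 204 in more detail.
As shown in FIG. 11, the overlapping line group extraction means 204 comprises line selection means 204-01, overlapping vertical line extraction means 204-02, overlapping horizontal line extraction means 204-03, overlapping line extraction control means 204-04, and overlapping line group storage means 204-05. The functions of these means are shown by the following description of operation along the flow.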
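[0038]
In the flow of FIG. 10, which shows the multi-line overlap determination (S24) of this example, the line selection means 204-01 first selects one horizontal line (S24-01).
Next, the overlapping vertical line extraction means 204-02 extracts all vertical lines that overlap the selected horizontal line in position (S24-02). The result is stored in the overlapping line group storage means 204-05.
The overlapping horizontal line extraction means 204-03 then extracts all horizontal lines that overlap the vertical lines extracted in step S24-02 (S24-03).
The line extraction of steps S24-02 and S24-03 is repeated. FIG. 12 shows an example of a line group extracted in this way. From the original image in (A), the horizontal lines in (C) and the vertical lines in (D) are extracted; repeating steps S24-02 and S24-03 in the order horizontal, vertical, horizontal and so on yields the line group in (B).
The repetition of steps S24-02 and S24-03 is ended by a predetermined termination condition (S24-04), under the control of the overlapping line extraction control means 204-04. Various termination conditions can be set, for example:
- End after a predetermined finite number of repetitions.
- End when no new overlapping line can be detected.
- End when the circumscribed rectangle of the overlapping lines reaches a certain area. See FIG. 12(E) for an example of this circumscribed rectangle.
- End when the number of overlapping lines reaches a certain number.
These conditions are combined as needed. For example, even if the number of lines has not reached the predetermined number, the repetition is ended when no new overlapping line can be detected, so that it never falls into an infinite loop. In this example, the condition is set so that processing ends after a finite number of repetitions.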
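The alternating expansion of steps S24-01 to S24-04 might look like the following sketch. It assumes lines are hashable rectangle tuples `(x0, y0, x1, y1)` and combines two of the termination conditions listed above, a fixed round limit and "no new overlapping line"; the names and the default `max_rounds` are hypothetical.

```python
def overlaps(a, b):
    """True if rectangles (x0, y0, x1, y1) share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def extract_overlap_group(seed, h_lines, v_lines, max_rounds=10):
    """Grow a group from one horizontal line, alternating directions."""
    group_h, group_v = {seed}, set()      # S24-01: start from one horizontal line
    frontier, look_vertical = {seed}, True
    for _ in range(max_rounds):           # stop condition: finite repetitions
        pool, found = (v_lines, group_v) if look_vertical else (h_lines, group_h)
        new = {
            line for line in pool
            if line not in found and any(overlaps(line, f) for f in frontier)
        }
        if not new:                       # stop condition: nothing new overlaps
            break
        found |= new                      # S24-02 / S24-03: extend the group
        frontier, look_vertical = new, not look_vertical
    return group_h, group_v
```

Limits on the number of lines or on the area of the group's circumscribed rectangle could be added as further `break` conditions in the same loop.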
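[0039]
Next, in the flow of FIG. 8, the line group likelihood calculation means 205 obtains a likelihood for each of the vertical line group and the horizontal line group within the overlapping line group data obtained in step S24 (FIG. 10) (S25). The likelihood of each group can be obtained by computing the likelihood of every line with the same technique as in Embodiment 1 and taking the mean. Because there are multiple lines, the variation (variance) of line length and line height may also be added to the feature values used for the likelihood.
One likelihood is thus calculated for each of the vertical and horizontal line groups. Based on the result, the unsuitable line group deletion means 206 identifies the line group with the larger likelihood A (the less line-like one) and deletes the unsuitable lines accordingly (S26). For example, if A obtained from the horizontal line group is larger than A obtained from the vertical line group, all lines belonging to the horizontal line group are deleted.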
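A minimal sketch of the group comparison in steps S25 and S26, assuming `score` is a caller-supplied function returning the per-line likelihood A and that both groups are non-empty. The function name is hypothetical, and deleting the vertical group on a tie is an arbitrary choice the source does not make.

```python
from statistics import mean


def lines_to_delete(group_h, group_v, score):
    """Return the group whose mean likelihood A is larger (less line-like)."""
    a_h = mean(score(line) for line in group_h)
    a_v = mean(score(line) for line in group_v)
    return list(group_h) if a_h > a_v else list(group_v)
```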
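The series of steps for deleting the unsuitable line group from overlapping vertical and horizontal line groups (S24 to S26) is performed per line group, formed by selecting one horizontal line and applying the predetermined conditions as described above. Vertical and horizontal line groups are examined in turn, and the processing is repeated until no horizontal lines remain to be examined. Once all target lines are confirmed to have been processed (S27-YES), the correct, overlap-free character line information is output from the line storage means 203 as the result (S28) and the flow ends.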
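[0040]
"Embodiment 3"
Embodiment 2 extracts overlapping vertical and horizontal line groups under predetermined conditions and deletes whichever group is unsuitable, per line group, which yields more appropriate line data. With that method, however, an error can occur for a table of contents such as the one in FIG. 13, that is, a document listing contents entries with their page numbers ((A) in the figure shows the document image): the column of page numbers is extracted as a vertical line ((B) in the figure shows the line extraction result). This embodiment prevents such errors and makes it possible to obtain appropriate character line area information.
FIG. 14 is a chart showing (part of) the processing flow of the image processing method of this embodiment, and FIG. 15 is a block diagram showing the configuration of the image processing apparatus of this embodiment.
The image processing apparatus of this embodiment is configured as shown in FIG. 15; the part enclosed by the broken line is the same as the apparatus of Embodiment 2 (FIG. 9), so the earlier description applies and is not repeated here.
As components specific to this embodiment, the apparatus comprises line count ratio calculation means 301, adjacent line extraction means 302, line height examination means 303, likelihood setting means 304, counter value examination means 305, and counting means 306. Their functions are made clear by the following description of operation along the flow.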
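[0041]
Next, the image processing method of this embodiment is described with reference to the flowchart of FIG. 14. The flow of this embodiment is characterized by inserting the flow of FIG. 14 between the multi-line overlap determination step S24 (detailed in FIG. 10) and the line group likelihood calculation step S25 of the Embodiment 2 flow (FIG. 8). The description below therefore centers on the procedure of FIG. 14. FIG. 16 shows the example used to explain it: the extraction of lines for the page numbers corresponding to the entries of a table of contents.
As shown in Embodiment 2, after the line groups in which vertical and horizontal lines overlap have been extracted in step S24, the processing of FIG. 14 is performed as processing specific to this embodiment. The line count ratio calculation means 301 examines whether the numbers of horizontal and vertical lines in the extracted group are in a one-to-many (or many-to-one) relationship (S31). As shown in FIG. 16(A), this relationship holds in the illustrated column of page numbers, where there is one vertical line (dotted) and five overlapping horizontal lines.
If the one-to-many (or many-to-one) relationship does not hold, the flow proceeds to the line group likelihood calculation step S25 (see FIG. 8).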
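[0042]
If it does hold, the group may be the column of page numbers in a table of contents, so it is examined from a further viewpoint.
This examination counts the lines that meet a condition, so the count of the counting means 306 prepared for this purpose is reset to 0 (S32).
Next, the adjacent line extraction means 302 searches for a line adjacent, in the length direction, to one of the lines of which several were extracted (hereinafter "reference lines"; in the example of FIG. 16(B) these are the page numbers) (S33). The line selected is the one whose coordinates in the height direction coincide and which is nearest to the reference line (hereinafter "adjacent line"; in FIG. 16(B) this is a contents entry, for example the horizontal line reading "Conventional method").
If step S33 finds an adjacent line (S34-NO), the flow proceeds to steps S35 to S37 below; if not, it skips them and proceeds to S38.
If there is an adjacent line and the line height examination means 303 finds that the heights of the adjacent line and the reference line are close (S35-YES), 1 is added to the count of the counting means 306 (S36), the flow proceeds to step S38 described below, the reference line is changed (S39), and the same processing is applied to the other reference lines. This corresponds to searching for the character string (the contents entry) that pairs with a contents page number.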
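[0043]
If the height of the adjacent line does not match that of the reference line and is moreover far too tall (S37-YES), no further adjacent lines are searched for the same reference line and the flow proceeds to step S38. This corresponds to judging that no pairing character string exists.
FIG. 16(C) shows an example that needs this procedure. Here a figure stands between the adjacent line and the reference line, with a height for which the relationship between a contents page number and its character string cannot hold; this is an unnatural layout for a table of contents.
If the height of the adjacent line does not match that of the reference line but is not far too tall (S37-NO), the flow returns to step S33 and searches, for the same reference line, for the adjacent line next nearest after the current one.
When all adjacent lines have been examined, or an adjacent line meeting the condition has been found, the flow checks whether any reference lines remain unexamined (S38). If so, the reference line is changed (S39) and the flow returns to step S33 to search for adjacent lines again.
If no unexamined reference lines remain at step S38, the counter value examination means 305 examines the counter value (S40). If the counter value is larger than a predetermined value, or its ratio to the total number of reference lines is at or above a certain value (S40-YES), the group is judged to be a column of contents page numbers; the likelihood setting means 304 sets the line group likelihoods to values that cause the single line (the "one" side of the one-to-many relationship) to be deleted (S41), and the flow proceeds to step S26 (FIG. 8). In the example of FIG. 16(B), matching adjacent lines are found for all five reference lines, so the counter value is 5. Its ratio to the total number of reference lines is 1.0, which is extremely high, so judging this to be a column of contents page numbers is reasonable.
If the examination finds that the counter value is not larger than the predetermined value, the flow proceeds to step S25 (FIG. 8) and the likelihood calculation is performed.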
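The counting procedure of steps S32 to S40 can be sketched as follows for horizontal reference lines. The sketch assumes rectangles `(x0, y0, x1, y1)`, treats "coordinates in the height direction coincide" as sharing some vertical extent, and uses hypothetical thresholds `close`, `too_tall` and `min_ratio`; the source leaves all of these conditions unspecified.

```python
def count_paired_lines(ref_lines, other_lines, close=0.2, too_tall=2.0):
    """Count reference lines that have a neighbor of similar height (S32-S39)."""
    count = 0                                            # S32: reset the counter
    for ref in ref_lines:
        ref_h = ref[3] - ref[1]
        # S33: candidates at the same vertical position, nearest first.
        candidates = sorted(
            (r for r in other_lines if r[1] < ref[3] and ref[1] < r[3]),
            key=lambda r: min(abs(ref[0] - r[2]), abs(r[0] - ref[2])),
        )
        for adj in candidates:
            h = adj[3] - adj[1]
            if abs(h - ref_h) <= close * ref_h:          # S35: heights are close
                count += 1                               # S36
                break
            if h > too_tall * ref_h:                     # S37: far too tall, stop
                break
            # Otherwise try the next nearest adjacent line (back to S33).
    return count


def looks_like_page_numbers(ref_lines, other_lines, min_ratio=0.8):
    """S40: decide from the ratio of paired reference lines."""
    paired = count_paired_lines(ref_lines, other_lines)
    return paired >= min_ratio * len(ref_lines)
```

When the function returns True, the likelihoods would be set so that the single line on the "one" side of the one-to-many relationship is deleted, as in step S41.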
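[0044]
"Embodiment 4"
This embodiment is intended to speed up the processing of the image processing methods and apparatuses shown in Embodiments 1 and 2, which obtain appropriate character line candidate area information from an input document image. It does so by compressing the target image data.
FIG. 17 is a chart showing the processing flow of the image processing method of this embodiment, and FIG. 18 is a block diagram showing the configuration of the image processing apparatus of this embodiment.
This embodiment shows an example in which elements for speeding up the processing are applied to Embodiment 1. Components identical to those of the image processing method and apparatus of Embodiment 1 (FIGS. 1 and 2) therefore carry the same reference numbers in the drawings; the earlier description applies to them and is not repeated here.
As shown in FIG. 18, the components specific to this embodiment are compressed image generation means 401 and coordinate conversion means 402. Their functions are made clear by the following description of operation along the flow.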
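[0045]
The image processing method of this embodiment is now described with reference to the flowchart of FIG. 17. The flow is characterized by inserting, into the same flow as Embodiment 1 (see FIG. 1), a compressed image generation step S41 between the image input step S1 and the horizontal line extraction step S2, and a coordinate conversion step S42 between step S7, which checks whether all lines have been examined, and the result output step S8. The description below therefore centers on the steps added in FIG. 17.
As in Embodiment 1, after the target image is input by the image input means 101 in step S1, the compressed image generation means 401 generates a compressed (reduced) image from the original image (S41). The compressed image is generated by merging a square of several pixels of the original image into one pixel. The compression ratio may be fixed or may vary with the resolution of the original image. For example, raising the compression ratio for high-resolution originals keeps the processing time from growing much as resolution increases. Here a fixed compression that merges each 2×2 pixel range into one pixel is assumed. For a binary image, the merged pixel can be set to black if the 2×2 pixels contain a black pixel and to white otherwise (so-called OR compression). For grayscale and color images, a method such as using the mean value within the 2×2 pixel area may be adopted.
The compressed image generated in step S41 is then passed as the target image to the horizontal line data extraction step (S2), and the same steps as in Embodiment 1 follow up to step S7.
When step S7 determines that all lines have been examined, the coordinate conversion means 402 converts the coordinate values of the correct line area information obtained so far into those of the original image (S42). With 2×2-to-1 compression, both the X and Y coordinates are simply doubled. In this example the number of pixels processed falls to one quarter, which speeds up the processing.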
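The OR compression of step S41 and the coordinate restoration of step S42 can be sketched as below. The sketch assumes a binary image stored as a list of rows with 1 for black and 0 for white; the function names and that pixel encoding are hypothetical.

```python
def or_compress(img, n=2):
    """Merge each n x n block into one pixel: black (1) if any pixel is black."""
    height, width = len(img), len(img[0])
    return [
        [
            int(any(
                img[y][x]
                for y in range(by, min(by + n, height))
                for x in range(bx, min(bx + n, width))
            ))
            for bx in range(0, width, n)
        ]
        for by in range(0, height, n)
    ]


def restore_rect(rect, n=2):
    """S42: map a rectangle found on the compressed image back to the original."""
    return tuple(c * n for c in rect)
```

Replacing `any` with `all` would give the complementary rule in which a white pixel in the block makes the compressed pixel white.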
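[0046]
"Embodiment 5"
This embodiment uses a general-purpose computer as the means for executing the image processing steps shown in Embodiments 1 to 4 for obtaining character line area information.
FIG. 19 shows the configuration of the processing apparatus of this embodiment. As shown there, the embodiment is implemented on a general-purpose computer comprising a CPU 801, memory 802, hard disk drive 803, input device 804, CD-ROM drive 805, display 806, mouse and so on. A removable recording medium 807, such as a CD-ROM used by the CD-ROM drive 805, holds the program (software) that realizes the processing functions and procedures described above for obtaining the character line area information of the invention.
The document image to be processed is input through the input device 804, such as a scanner, and stored, for example, on the hard disk 803. The CPU 801 reads the program realizing the above functions and procedures from the recording medium 807, executes the processing for obtaining character line area information on the target image according to the program, and outputs the result to the display 806 or the like.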
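[0047]
[Effects of the Invention]
(1) Effect corresponding to the invention of claim 1
The suitability as character lines of vertical and horizontal line candidate areas that overlap each other is judged, and the line of the unsuitable direction is deleted based on the result. Information representing appropriate character line areas can therefore be obtained even when a document image mixing vertical and horizontal writing is supplied unconditionally, and character lines can be identified properly without specifying a processing target area as the conventional method requires.
(2) Effect corresponding to the invention of claim 2
Provides, for the invention of claim 1, a basic procedure for taking out the vertical and horizontal line candidate areas that overlap each other.
(3) Effect corresponding to the inventions of claims 3 and 4
Many-to-many overlaps between vertical and horizontal line candidate areas are detected and grouped by predetermined rules, and the more appropriate lines are determined within each group (line candidate group), so character line area information can be obtained with fewer operations.
(4) Effect corresponding to the invention of claim 5
Methods are provided for extracting connected components of black or white pixels, of adjacent pixels with similar lightness, and of adjacent pixels with similar color, so the invention of claim 1 can be applied to both binary and multi-valued (color) images.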
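[0048]
(5) Effect corresponding to the inventions of claims 6 to 9
The suitability as character lines of vertical and horizontal line candidate areas that overlap each other is judged by calculating a likelihood representing line-likeness from character line features (line length, line height, aspect ratio of the line size, distance between connected components, connected component size, the variation of each of these, pixel lightness, pixel color information, pixel edge strength, the lightness difference between connected components and surrounding pixels, the color difference between connected components and surrounding pixels, and so on), so more appropriate results can be obtained.
(6) Effect corresponding to the inventions of claims 10 to 13
The likelihood calculation takes into account the conditions that cause the error which can occur when the more appropriate lines are determined within a group (line candidate group) (see the description of FIG. 13), so this error can be avoided.
(7) Effect corresponding to the inventions of claims 14 to 18
The input original image is compressed before the vertical and horizontal line candidate areas are extracted (when the original image is binary, the compressed pixel is set to black if the compression unit contains a black pixel, or conversely to white if it contains a white pixel; when the original image is multi-valued, at least one of the mean value, the highest lightness value and the lowest lightness value of the compression unit is adopted), so the processing can be made faster.
(8) Effect corresponding to the inventions of claims 19 and 20
Installing a program for executing the steps of the image processing method of any of claims 1 to 18 on a general-purpose computer makes it easy to realize effects (1) to (7) above and to provide an image processing apparatus that achieves them.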
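[Brief Description of the Drawings]
FIG. 1 is a chart showing the processing flow (Embodiment 1) of the image processing method of the invention for acquiring information representing character line areas.
FIG. 2 is a block diagram showing the configuration of an image processing apparatus for performing the processing of FIG. 1.
FIG. 3 is a chart showing the detailed flow of the line extraction steps (S2, S3) in FIG. 1.
FIG. 4 is a block diagram showing the configuration of the line extraction means in FIG. 2 in more detail.
FIG. 5 is a chart showing the detailed flow of the line likelihood calculation step (S5) in FIG. 1.
FIG. 6 is a block diagram showing the configuration of the line likelihood calculation means in FIG. 2 in more detail.
FIG. 7 shows the characteristics of the spatial filters used in the processing for obtaining character line feature values in color images.
FIG. 8 is a chart showing the processing flow (Embodiment 2) of the image processing method of the invention for acquiring information representing character line areas.
FIG. 9 is a block diagram showing the configuration of an image processing apparatus for performing the processing of FIG. 8.
FIG. 10 is a chart showing the detailed flow of the multi-line overlap determination step (S24) in FIG. 8.
FIG. 11 is a block diagram showing the configuration of the overlapping line group extraction means in FIG. 9 in more detail.
FIG. 12 is a diagram for explaining the processing that extracts overlapping line groups.
FIG. 13 is a diagram for explaining the vertical line extraction error that can arise from the overlapping line group extraction of Embodiment 2.
FIG. 14 is a chart showing the processing flow (Embodiment 3) that avoids the vertical line extraction error.
FIG. 15 is a block diagram showing the configuration of an image processing apparatus provided with means for performing the processing of FIG. 14.
FIG. 16 is a diagram for explaining the processing (Embodiment 3) that avoids the vertical line extraction error.
FIG. 17 is a chart showing the processing flow (Embodiment 4) of the image processing method of the invention for acquiring information representing character line areas.
FIG. 18 is a block diagram showing the configuration of an image processing apparatus for performing the processing of FIG. 17.
FIG. 19 is a diagram showing a computer that executes the image processing method of the invention.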
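[Description of Reference Numerals]
101, 201: image input means; 102, 202: line extraction means;
103, 203: line storage means; 104: overlapping line extraction means;
105: line likelihood calculation means; 106: unsuitable line deletion means;
204: overlapping line group extraction means; 205: line group likelihood calculation means;
206: unsuitable line group deletion means; 301: line count ratio calculation means;
302: adjacent line extraction means; 303: line height examination means;
304: likelihood setting means; 305: counter value examination means;
306: counting means; 401: compressed image generation means;
402: coordinate conversion means; 801: CPU;
802: memory; 803: hard disk drive;
804: input device; 805: CD-ROM drive;
806: display; 807: CD-ROM.
[0001]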
TECHNICAL FIELD OF THE INVENTION
The present invention relates to image processing for acquiring information representing character line areas, which is used ahead of recognition processing of character and document images contained in an input image. More specifically, it relates to an image processing method that yields information representing appropriate character line areas even when vertical and horizontal lines both exist in a single target image, a program used to execute the method, and an image processing apparatus.
[0002]
[Prior art]
In image processing of characters and documents, processing such as character recognition has conventionally been applied to scanned character and document images. For this processing, obtaining the correct position of the character areas occupied by character and document images in the target image is indispensable for high recognition accuracy. If character recognition were performed without knowing where the characters are in the document image, it would also be applied to areas such as photographs and figures that need no recognition. This not only wastes time on unnecessary processing; forcing recognition on areas that contain no characters also produces a large number of errors, which makes the recognition results very difficult to use.
A method using connected components of black pixels in a binary image was therefore proposed for obtaining the correct position of character areas. In this method, circumscribed rectangles of the connected components of black pixels in the input image are obtained, the rectangles are classified into basic elements such as characters, tables and figures, and the character elements are extracted and merged to generate lines (see, for example, Japanese Patent Application Laid-Open No. 9-44594).
[0003]
Japanese documents can be written either vertically or horizontally, so unless the line direction is known in advance, it is difficult to obtain correct results from character recognition. The above example does not consider line direction, so the direction must be determined by some other means in order to handle it properly.
To address this, the method disclosed in Japanese Patent Application Laid-Open No. 10-63776 is given a processing target area by some means, obtains the circumscribed rectangles of the connected components of black pixels in that area, and estimates the line direction from the following quantities derived from the rectangles.
- The line density features (projections) of the circumscribed rectangles in the vertical and horizontal directions.
- The cumulative distance between adjacent rectangles in the vertical and horizontal directions.
- The cumulative overlap (width of coordinate overlap) between adjacent rectangles in the vertical and horizontal directions.
The procedure first estimates the number of character strings in the vertical and horizontal directions from the projections of the circumscribed rectangles. If the estimated number of character strings is 1, the line direction is determined from the projection information. If it is not 1, the ratio of the cumulative distances between rectangles is computed, and when the ratio exceeds a predetermined value the line direction is determined from it. If the ratio stays within the predetermined value and the direction remains unclear, the line direction is determined from the cumulative overlap between rectangles.
[0004]
[Problems to be solved by the invention]
However, this method (Japanese Patent Application Laid-Open No. 10-63776) requires the processing target area to be specified in advance, and because it relies on projections of the circumscribed rectangles, vertical and horizontal lines must not be mixed within that area. A document image in which vertical and horizontal writing are mixed therefore cannot simply be supplied unconditionally for processing, which hinders automation.
The present invention was made in view of these problems of the prior art in processing that takes an image input by image reading means or the like and acquires information representing the character areas occupied by the character and document images it contains. Its object is to provide an image processing method that obtains appropriate character line area information even when a document image mixing vertical and horizontal writing is supplied unconditionally, a program used to execute the method, and an image processing apparatus.
[0005]
[Means for Solving the Problems]
The invention of claim 1 is an image processing method comprising: a horizontal line extraction step of extracting, from an image input as the processing target, line candidate areas that can be regarded as horizontal lines; a vertical line extraction step of extracting, from the target image, line candidate areas that can be regarded as vertical lines; an overlapping line detection step of detecting, based on the results of the horizontal and vertical line extraction steps, horizontal and vertical line candidate areas that have a mutually overlapping area; an unsuitable line selection step of judging the suitability as character lines of the horizontal and vertical line candidate areas detected in the overlapping line detection step and selecting the line of the unsuitable direction; and a line deletion step of deleting the unsuitable line candidate in accordance with the result of the unsuitable line selection step.
[0006]
The invention of claim 2 is the image processing method of claim 1, further comprising a line data storage step of storing the line candidate areas extracted in the horizontal and vertical line extraction steps, wherein the overlapping line detection step comprises a step of selecting one line from the line candidate areas stored in the line data storage step, and a step of selecting at least one line candidate area that has a direction different from that line and overlaps it.
[0007]
The invention of claim 3 is the image processing method of claim 1, further comprising a line data storage step of storing the line candidate areas extracted in the horizontal and vertical line extraction steps, wherein the overlapping line detection step comprises: a primary selection step of selecting one line from the line candidate areas stored in the line data storage step; a secondary selection step of selecting all line candidate areas that have a direction different from that line and overlap it; a higher-order selection step, applied in turn to each line newly selected in or after the secondary selection step, of selecting all line candidate areas that have a direction different from that line and overlap it; and a line selection stopping step of stopping the higher-order selection step in accordance with a predetermined condition.
[0008]
The invention of claim 4 is the image processing method of claim 3, wherein the line selection stopping step stops the selection when at least one of the following stop conditions holds: a predetermined number of selection steps has been reached; no line candidate areas remain to be selected; the number of selected line candidate areas has reached a predetermined number; or the area of the circumscribed rectangle of the selected line candidate areas has exceeded a predetermined value.
[0009]
The invention of claim 5 is the image processing method of claim 1, wherein each of the horizontal and vertical line extraction steps comprises: at least one connected component extraction step among a step of extracting connected components of black or white pixels, a step of extracting adjacent pixels of similar lightness as connected components, and a step of extracting adjacent pixels of similar color as connected components; a step of selecting neighboring connected components from the connected components thus obtained; a step of comparing the sizes of neighboring connected components; a step of grouping the connected components based on the comparison result; and a step of obtaining the circumscribed rectangle of the grouped connected components.
[0010]
The invention of claim 6 is the image processing method of claim 1, wherein the unsuitable line selection step comprises a step of calculating, for each line candidate area, a likelihood representing how line-like it is, in order to judge the suitability as character lines of the horizontal and vertical line candidate areas having an overlapping area.
[0011]
The invention of claim 7 is the image processing method of claim 1, wherein the unsuitable line selection step comprises a step of calculating a likelihood representing line-likeness over the whole of the one or more line candidates in the line candidate areas of each direction, in order to judge the suitability as character lines of the horizontal and vertical line candidate areas having an overlapping area.
[0012]
The invention of claim 8 is the image processing method of claim 6 or 7, wherein the step of calculating the likelihood comprises a step of obtaining one or more feature values and a step of calculating the likelihood from the obtained feature values.
[0013]
The invention of claim 9 is the image processing method of claim 8, wherein the step of obtaining the feature values comprises at least one of the following steps: calculating the line length; calculating the line height; calculating the aspect ratio of the line size; calculating the distance between the connected components of the pixels forming the line; calculating the size of the connected components of the pixels forming the line; calculating the variation in line length; calculating the variation in line height; calculating the variation in the aspect ratio of the line size; calculating the variation in the distance between the connected components; calculating the variation in the size of the connected components; extracting lightness information of the pixels forming the line; extracting color information of the pixels forming the line; calculating the edge strength of the pixels forming the line; calculating the lightness difference between the connected components of the pixels forming the line and the surrounding pixels; and calculating the color difference between the connected components of the pixels forming the line and the surrounding pixels.
[0014]
According to a tenth aspect of the present invention, in the image processing method according to the seventh aspect, the step of calculating the likelihood includes a step of calculating the ratio between the numbers of horizontal and vertical rows in the horizontal and vertical row candidate areas having an overlapping area, and a calculation processing step of applying another likelihood calculation process when the calculated ratio of the numbers of rows is a predetermined value.
[0015]
According to an eleventh aspect of the present invention, in the image processing method according to the tenth aspect, the calculation processing step of applying the other likelihood calculation process includes: a step of extracting row candidate areas adjacent to the row candidate areas having the overlapping area; a step of examining whether the height of each extracted row candidate area satisfies a predetermined condition; a step of counting the number of row candidate areas whose height satisfies the predetermined condition; a step of examining whether the counted number satisfies a predetermined condition; and a step of determining the likelihood based on the result of the examination.
[0016]
According to a twelfth aspect of the present invention, in the image processing method according to the eleventh aspect, the step of examining the height of the extracted line candidate area is a step of judging, as a line candidate area meeting the predetermined condition, an area whose position in the height direction is substantially the same as that of the line candidate area having the overlapping area and whose height differs little from it.
[0017]
According to a thirteenth aspect of the present invention, in the image processing method according to the twelfth aspect, the step of judging the line candidate area that meets the predetermined condition is a step of determining that no line matches the condition, even if a line candidate area having a small difference in height exists, when an object having a large difference in height is present at a nearby position.
[0018]
According to a fourteenth aspect of the present invention, the image processing method according to the first aspect further includes a step of generating a compressed image from the original image input as the processing target, performed before the horizontal and vertical row extracting step, and a step of restoring the row regions obtained from the compressed image to values based on the original image, performed after the row deleting step.
[0019]
According to a fifteenth aspect of the present invention, in the image processing method according to the fourteenth aspect, the step of generating the compressed image includes a step of changing the compression ratio according to the resolution of the original image.
[0020]
According to a sixteenth aspect of the present invention, in the image processing method according to the fourteenth aspect, the step of generating the compressed image is such that, when the original image is a binary image and a black pixel is included in the pixel range serving as a compression unit of the original image, the corresponding pixel of the compressed image to be generated is set as a black pixel.
[0021]
According to a seventeenth aspect of the present invention, in the image processing method according to the fourteenth aspect, the step of generating the compressed image is such that, when the original image is a binary image and a white pixel is included in the pixel range serving as a compression unit of the original image, the corresponding pixel of the compressed image to be generated is set as a white pixel.
[0022]
According to an eighteenth aspect of the present invention, in the image processing method according to the fourteenth aspect, the step of generating the compressed image includes, when the original image is a multi-valued image, at least one of: a step of setting the average pixel value of the pixel range serving as a compression unit of the original image as the pixel value of the corresponding pixel of the compressed image to be generated; a step of setting the pixel value with the highest lightness in that pixel range as the pixel value of the corresponding pixel of the compressed image; and a step of setting the pixel value with the lowest lightness in that pixel range as the pixel value of the corresponding pixel of the compressed image.
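The compression rules of the sixteenth to eighteenth aspects can be illustrated with a minimal sketch. The function name `compress`, the `mode` strings, and the convention that 1 means black and 0 means white are assumptions made for illustration only; the specification does not prescribe any particular data layout.

```python
def compress(image, block, mode="any_black"):
    """Shrink a 2-D list of pixel values by block x block compression units.

    mode (hypothetical names):
      "any_black" / "any_white" for binary images (1 = black, 0 = white),
      "mean" / "max" / "min" for multi-valued lightness images.
    """
    reducers = {
        "any_black": lambda px: 1 if any(p == 1 for p in px) else 0,
        "any_white": lambda px: 0 if any(p == 0 for p in px) else 1,
        "mean": lambda px: sum(px) / len(px),
        "max": max,   # highest lightness in the unit
        "min": min,   # lowest lightness in the unit
    }
    reduce_unit = reducers[mode]
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Collect the pixels of one compression unit (clipped at the border).
            unit = [image[y][x]
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))]
            row.append(reduce_unit(unit))
        out.append(row)
    return out
```

The "any_black" rule tends to preserve thin strokes at the cost of thickening them, while "any_white" does the opposite; which one suits a given document is left open here.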
[0023]
The invention of claim 19 is a program for causing a computer to execute each step of the image processing method according to any one of claims 1 to 18.
[0024]
According to a twentieth aspect of the present invention, there is provided an image processing apparatus comprising a computer having the program according to the nineteenth aspect mounted thereon, wherein the computer processes data of a target image.
[0025]
BEST MODE FOR CARRYING OUT THE INVENTION
The present invention will be described based on the following embodiments shown in the accompanying drawings.
In the following “Embodiment 1” to “Embodiment 4”, the means (apparatus) necessary for executing each processing step are shown together with the processing steps that implement the “image processing method” of the present invention. These means (apparatuses) can be implemented by a computer equipped with an execution program for each processing step of the “image processing method” of the present invention, as described in “Embodiment 5” below, or the functions required for execution can be implemented in the form of an electric circuit.
"Embodiment 1"
This embodiment shows a basic image processing method that makes it possible to obtain correct character line area information even when a document image in which vertical writing and horizontal writing are mixed in one document is given unconditionally. Its main parts are the process of extracting, in both the vertical and horizontal directions, areas that can be regarded as character lines (in the following description, an area that can be regarded as a character line may be referred to as a “line candidate area”, “line candidate” or simply “line”), and the process of deleting unnecessary line candidates when an overlap occurs between the extracted vertical and horizontal line candidate areas.
FIG. 1 is a chart showing the processing flow of the image processing method according to the present embodiment, and FIG. 2 is a block diagram showing the configuration of the image processing apparatus according to the present embodiment. In the embodiment described below, the document image to be processed is assumed to be a binary image. However, with partial modification, the same processing can be performed on a grayscale image or a color image. Where the processing differs from that for a binary image, the difference is described each time.
[0026]
The image processing apparatus according to the present embodiment will be briefly described with reference to FIG. 2. An image input unit 101 acquires and stores an original image to be processed (for example, outputs color image data). The original image data acquired by the image input unit 101 is sent to the line extracting unit 102.
The line extracting unit 102 (described in detail below with reference to FIG. 4) extracts all lines that can be regarded as character lines from one document, and sends the extracted character line information to the line storage unit 103.
The duplicate line extraction unit 104 extracts overlapping lines based on the character line information obtained from the line storage unit 103 and outputs the result to the line likelihood calculation unit 105.
The line likelihood calculating means 105 (described later in detail with reference to FIG. 6) calculates the likelihood of each duplicate line as a character line, and sends the calculation result to the inappropriate line deleting means 106.
The unsuitable row deletion unit 106 determines an unsuitable row based on the calculation result of the row likelihood in order to eliminate a duplicate row, performs a deletion process thereof, and sends the result to the row storage unit 103.
The line storage unit 103 adds and stores character line information reflecting the result of the unsuitable line deletion process so that correct character line information can be output.
[0027]
Next, an image processing method according to the present embodiment will be described with reference to the flowchart of FIG. In the following description, the detailed configuration of each component of the image processing apparatus and its operation will be described together.
According to the flow of FIG. 1, first, a document to be processed is acquired as an original image by the image input means 101 (S1). The image acquired here takes the form of continuous pixel data in a normal raster system.
Row candidates in the horizontal direction are extracted from the processing target image by using the row extraction unit 102, and the data is stored in the row storage unit 103 (S2). Various methods can be used for extracting row candidates; here, a method using connected components of black pixels is exemplified.
FIG. 3 is a chart showing a more detailed flow of the row extracting process (S2 / S3) according to the present embodiment, and FIG. 4 is a block diagram showing a more detailed configuration of the row extracting unit 102 according to the present embodiment.
As shown in FIG. 4, the row extracting unit 102 includes a connected component extracting unit 102-01, a neighboring connected component extracting unit 102-02, a connected component size comparing unit 102-03, a connected component grouping unit 102-04, and a group component connected component circumscribed rectangle calculation means 102-05. The function of each of these units included in the row extracting unit 102 will be described in the following explanation of the operation according to the flow.
[0028]
According to the flow of FIG. 3 showing the row extraction procedure of this example, first, the connected components of black pixels are extracted from the processing target image using the connected component extracting unit 102-01 (S2-01). In this extraction process, when the document image to be processed is not a binary image but a grayscale image, pixels that are in contact with each other and have similar lightness values may be grouped and used as a connected component. In the case of a color image, pixels having similar colors may be grouped instead of pixels having similar lightness values. The closeness of colors is calculated using, for example, the sum of squares of the differences between the color components of the pixels. The details of the method of extracting these connected components are described in an earlier application filed by the present applicant (Japanese Patent Application No. 2001-86484), to which reference is made.
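As a rough illustration of step S2-01 for the binary case, the following sketch labels connected components of black pixels with a breadth-first search and returns their circumscribed rectangles. The use of 8-connectivity, the (left, top, right, bottom) tuple layout, and the function name are assumptions for illustration; the referenced earlier application may define the extraction differently.

```python
from collections import deque

def connected_component_boxes(image):
    """Return circumscribed rectangles (left, top, right, bottom) of the
    8-connected black-pixel components of a binary image (1 = black)."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Start a new component and flood it breadth-first.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

For grayscale or color input, the test `image[ny][nx] == 1` would be replaced by a similarity test between neighboring pixels, for example a threshold on the sum of squared color-component differences mentioned above.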
Next, based on the extracted connected components, the connected components in the neighborhood along the direction of the row to be extracted (the horizontal direction in this case) are extracted using the nearby connected component extracting means 102-02. Although the neighborhood can be defined in various ways, here it is taken to be within a distance roughly equal to the size of the circumscribed rectangle of the connected component. Thereafter, the connected component size comparing means 102-03 compares the sizes of the extracted neighboring connected components (S2-02) and determines whether or not the sizes are similar (S2-03). For example, connected components whose heights are close and whose positions are close are regarded as forming the same row, and they are grouped using the connected component grouping unit 102-04 (S2-04). If the sizes cannot be judged to be close (S2-03-NO), the grouping process is skipped.
After it is confirmed that the above-described grouping process for neighboring connected components has been carried out for all rectangles (S2-05), the circumscribed rectangle of each resulting group is calculated using the group component connected component circumscribed rectangle calculating means 102-05, and the coordinate values are passed to the next step as character line data (S2-06).
Similarly to the horizontal line candidates described above, the vertical line candidates are also extracted, and the obtained character line data is passed to the next step (S3).
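The grouping of steps S2-02 to S2-06 can be sketched as follows for the horizontal direction. This is only one possible reading: the threshold `size_ratio`, the requirement that the two rectangles share some vertical extent, and the union-find bookkeeping are assumptions introduced for illustration. Vertical line candidates would be obtained by the same logic with the x and y roles exchanged.

```python
def group_into_horizontal_lines(boxes, size_ratio=1.5):
    """Group rectangles (left, top, right, bottom) into horizontal line
    candidates and return the circumscribed rectangle of each group."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def similar_neighbors(a, b):
        ha, hb = a[3] - a[1] + 1, b[3] - b[1] + 1
        # S2-03: heights must be close (size_ratio is a hypothetical threshold).
        if max(ha, hb) > size_ratio * min(ha, hb):
            return False
        # Positions must be close: the boxes share some vertical extent ...
        if min(a[3], b[3]) < max(a[1], b[1]):
            return False
        # ... and the horizontal gap is at most about one rectangle size.
        gap = max(a[0], b[0]) - min(a[2], b[2]) - 1
        return gap <= max(ha, hb)

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if similar_neighbors(boxes[i], boxes[j]):
                parent[find(i)] = find(j)   # S2-04: put both in one group

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # S2-06: circumscribed rectangle of every group.
    return sorted((min(b[0] for b in g), min(b[1] for b in g),
                   max(b[2] for b in g), max(b[3] for b in g))
                  for g in groups.values())
```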
[0029]
After the horizontal and vertical character line data have been extracted in steps S2 and S3 of the flow shown in FIG. 1, the overlapping line extraction unit 104 examines the horizontal and vertical character line candidates obtained as the extraction result to determine whether any of them overlap, and if so, extracts the overlapping character line candidates one pair at a time (S4).
Next, since only one of the extracted character line candidates whose positions overlap corresponds to a correct character line, the line likelihood calculating means 105 calculates, for each of the overlapping candidates, the likelihood of being a character line (line likelihood) in order to make this determination (S5).
There are various methods of calculating the likelihood used at this time, but here, the following method is used.
FIG. 5 is a chart showing a more detailed flow of the row likelihood calculation processing (S5) according to the present embodiment, and FIG. 6 is a block diagram showing a more detailed configuration of the row likelihood calculating means 105 according to the present embodiment.
As shown in FIG. 6, the row likelihood calculation means 105 includes a row length calculation means 105-01, a row height calculation means 105-02, an aspect ratio calculation means 105-03, a connected component distance calculation means 105-04, a connected component size calculating unit 105-05, and a feature amount adding unit 105-06. The function of each of these means included in the row likelihood calculating means 105 will be described in the following explanation of the operation according to the flow.
[0030]
According to the flow of FIG. 5 showing the procedure of the line likelihood calculation of this example, first, the length and the height of the target character line are obtained by using the line length calculation unit 105-01 and the line height calculation unit 105-02 (S5-01).
Thereafter, the aspect ratio calculating means 105-03 calculates the aspect ratio of the row (the ratio between its length and height) based on the determined row length and height (S5-02).
Further, the distance between the connected components constituting the row is obtained by using the connected component distance calculating means 105-04 (S5-03). The distance between the connected components obtained here corresponds to the character interval.
Further, the size of the connected component constituting the row is examined using the connected component size calculation means 105-05 (S5-04). Here, of the connected components, the number of connected components having a character-like size is counted.
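The feature amounts of steps S5-01 to S5-04 can be sketched for a horizontal line as follows. The dictionary keys, the definition of the gap as the horizontal distance between neighboring rectangles, and the bounds `min_ratio` and `max_ratio` used to decide what counts as a "character-like size" are all assumptions for illustration; the specification leaves these details open.

```python
def line_features(line, components, min_ratio=0.3, max_ratio=1.2):
    """Compute feature amounts for a horizontal line candidate.

    line       -- (left, top, right, bottom) of the line candidate
    components -- rectangles of the connected components forming the line
    """
    left, top, right, bottom = line
    length = right - left + 1            # S5-01: line length
    height = bottom - top + 1            # S5-01: line height
    aspect = length / height             # S5-02: aspect ratio
    # S5-03: gaps between neighboring components (character spacing).
    ordered = sorted(components)
    gaps = [max(0, b[0] - a[2] - 1) for a, b in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    # S5-04: count components whose width and height look character-sized
    # relative to the line height (hypothetical criterion).
    lo, hi = min_ratio * height, max_ratio * height
    char_like = sum(1 for c in components
                    if lo <= c[2] - c[0] + 1 <= hi
                    and lo <= c[3] - c[1] + 1 <= hi)
    return {"aspect": aspect, "mean_gap": mean_gap, "char_like": char_like}
```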
Next, using the values obtained in the above steps S5-02 to S5-04 as the feature amount, the feature amount summation unit 105-06 calculates the likelihood representing the character line likelihood (S5-05). There are various methods for calculating the likelihood. Here, an example in which the linear sum of the absolute value of the difference from the typical value of each feature amount obtained in advance is set as the likelihood will be described below.
That is, assuming that the value of a certain feature i is Fi and the typical value of the feature i when the row candidate is actually a row is Ei, the likelihood A can be expressed by the following [Equation 1].
[0031]
[Equation 1]
\[ A = \sum_{i} K_i \, \lvert F_i - E_i \rvert \]
[0032]
Note that, in [Equation 1], Ki is a weight whose optimum value is obtained by experiment.
The typical value Ei is also obtained by experiment: for example, a large number of regions known to be rows are prepared, and the average of the feature amounts Fi obtained from these regions is used.
Further, the Mahalanobis distance can be used instead of the above example of the linear sum, and an example thereof is shown below.
That is, assuming that the value of a certain feature amount i is Fi, the typical value (average value) of the feature amount obtained by the experiment is Ei, and the variance of the feature amount in the experiment is Vi, the likelihood A can be expressed by the following [Equation 2].
[0033]
[Equation 2]
\[ A = \sum_{i} \frac{(F_i - E_i)^2}{V_i} \]
[0034]
Note that the actual Mahalanobis distance needs to take into account not only the variance but also the covariance between feature values, but here, the covariance is ignored for simplicity.
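The two likelihood variants described above can be sketched as follows. The function names and the dictionary-based feature representation are assumptions for illustration, and the Mahalanobis variant is written in its squared form without a square root, which is also an assumption (it does not change which of two candidates scores lower). In both variants a smaller A means a more line-like candidate.

```python
from statistics import mean, pvariance

def fit_typical_values(samples):
    """Estimate E_i (mean) and V_i (variance) of each feature amount from
    regions known to be character lines."""
    names = samples[0].keys()
    typical = {n: mean(s[n] for s in samples) for n in names}
    variance = {n: pvariance([s[n] for s in samples]) for n in names}
    return typical, variance

def linear_likelihood(features, typical, weights):
    """Weighted linear sum of absolute deviations from the typical values."""
    return sum(weights[n] * abs(features[n] - typical[n]) for n in typical)

def diagonal_mahalanobis(features, typical, variance):
    """Mahalanobis-style score with covariances ignored (squared form).
    Assumes every variance is non-zero."""
    return sum((features[n] - typical[n]) ** 2 / variance[n] for n in typical)
```

A usage example with made-up numbers:

```python
known_lines = [{"aspect": 8.0, "mean_gap": 1.5},
               {"aspect": 12.0, "mean_gap": 2.5}]
E, V = fit_typical_values(known_lines)
candidate = {"aspect": 8.0, "mean_gap": 3.0}
a_linear = linear_likelihood(candidate, E, {"aspect": 1.0, "mean_gap": 0.5})
a_mahal = diagonal_mahalanobis(candidate, E, V)
```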
Further, in the case of a grayscale image or a color image, lightness information or color information may be used as the above-mentioned feature amount. It is also possible to use the edge strength, the difference between the lightness of the pixels forming the connected component and the surrounding lightness, the difference between the color of the pixels forming the connected component and the surrounding color, and the like. An example of how to determine the edge strength is shown below.
The spatial filters whose coefficients are shown in (A) and (B) of FIG. 7 are each applied to the lightness values (the G values in the case of a color image composed of RGB components) in a 3 × 3 pixel range around the target pixel, and the sum of the absolute values of the obtained values is defined as the edge strength of the pixel of interest. This is performed for all the pixels constituting the character line, and the average value is used as the edge strength of the character line.
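The edge strength computation can be sketched as below. The actual coefficients of FIG. 7 are not reproduced in this text, so the Sobel kernels used here are only an assumed stand-in for filters (A) and (B); the function names are likewise hypothetical. The sketch handles interior pixels only and leaves border treatment open.

```python
# Assumed stand-ins for the FIG. 7 (A) and (B) coefficients (Sobel kernels).
FILTER_A = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
FILTER_B = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def apply_3x3(lightness, x, y, kernel):
    """Apply a 3 x 3 spatial filter centered on interior pixel (x, y)."""
    return sum(kernel[j][i] * lightness[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def edge_strength(lightness, x, y):
    """Sum of the absolute responses of the two filters at (x, y)."""
    return (abs(apply_3x3(lightness, x, y, FILTER_A))
            + abs(apply_3x3(lightness, x, y, FILTER_B)))

def mean_edge_strength(lightness, pixels):
    """Average edge strength over the (x, y) pixels forming a line."""
    return sum(edge_strength(lightness, x, y) for x, y in pixels) / len(pixels)
```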
[0035]
After the line likelihood calculation (S5) has been performed as described above, the unsuitable line deletion unit 106 deletes the inappropriate one of each pair of overlapping character line candidates based on the likelihood obtained for each of them (S6). The smaller the value of the likelihood A, the more character-line-like the candidate is. Therefore, the unsuitable line deletion unit 106 regards the candidate with the smaller value of the likelihood A as the correct character line and deletes the other.
The series of processing steps (S4 to S6) for deleting the unsuitable line is performed in units of a pair of overlapping character line candidates. This processing is repeated until no overlapping candidates remain; when completion of the processing for all target line candidates is confirmed (S7-YES), correct character line information without overlap is output from the line storage means 103 as the processing result (S8), and the flow ends.
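The pairwise loop of steps S4 to S7 can be sketched as follows. The rectangle layout (left, top, right, bottom), the function names, and the convention that the `likelihood` callable returns the value A (smaller meaning more line-like) are assumptions for illustration. Ties are resolved here in favor of the horizontal candidate, which is an arbitrary choice not taken from the specification.

```python
def rectangles_overlap(a, b):
    """True if two rectangles (left, top, right, bottom) share any pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def remove_overlapping_candidates(horizontal, vertical, likelihood):
    """Repeat S4 to S6 until no horizontal/vertical pair overlaps (S7)."""
    horizontal, vertical = list(horizontal), list(vertical)
    while True:
        # S4: take out one overlapping pair, if any remains.
        pair = next(((h, v) for h in horizontal for v in vertical
                     if rectangles_overlap(h, v)), None)
        if pair is None:
            return horizontal, vertical      # S8: overlap-free result
        h, v = pair
        # S5 / S6: keep the candidate with the smaller A, delete the other.
        if likelihood(h) <= likelihood(v):
            vertical.remove(v)
        else:
            horizontal.remove(h)
```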
[0036]
"Embodiment 2"
In the above-described "Embodiment 1", row overlaps are extracted one-to-one, and the series of processing up to the deletion of the incorrect row is performed on each such pair, which can cause problems. The present embodiment aims to solve this: it detects overlaps between vertical and horizontal rows in many-to-many groups, calculates the likelihood within each group, and makes it possible to obtain appropriate character line area information with a small number of processing operations.
FIG. 8 is a chart illustrating a processing flow of the image processing method according to the present embodiment, and FIG. 9 is a block diagram illustrating a configuration of the image processing apparatus according to the present embodiment.
The image processing apparatus according to the present embodiment will be briefly described with reference to FIG. 9. An image input unit 201 acquires and stores an original image to be processed (for example, outputs color image data). The original image data acquired by the image input unit 201 is sent to the line extracting unit 202.
The line extracting unit 202 extracts all the line candidates that can be regarded as character lines from one document, and sends the extracted character line information to the line storage unit 203. Note that the row extracting unit 202 has a configuration similar to that of the row extracting unit 102 of the “first embodiment” (described in detail with reference to FIG. 4).
Duplicate line group extracting means 204 (described later in detail with reference to FIG. 10) extracts overlapping lines as an overlapping line group according to a predetermined condition, based on the character line information obtained from the line storage means 203, and outputs the result to the row group likelihood calculating means 205.
The row group likelihood calculating means 205 (described later with reference to FIG. 6) calculates the likelihood of each of the vertical and horizontal character line groups in the overlapping row group, and sends the calculation result to the inappropriate row group deleting means 206.
The inappropriate row group deleting means 206 determines an unsuitable row group based on the calculation result of the row group likelihood in order to eliminate duplicate rows, performs the deletion process, and sends the result to the row storage unit 203.
The line storage unit 203 adds and stores character line information reflecting the result of the unsuitable line deletion process so that correct character line information can be output.
[0037]
Next, an image processing method according to the present embodiment will be described with reference to the flowchart in FIG. In the following description, a more detailed configuration of each component of the image processing apparatus (FIG. 9) and its operation will be described together.
According to the flow of FIG. 8, first, a document to be processed is acquired as an original image by the image input unit 201 (S21). The image acquired here takes the form of continuous pixel data in a normal raster system.
Next, horizontal and vertical line candidates are extracted from the processing target image using the line extracting unit 202, and the character line data is stored in the line storing unit 203 (S22, S23). The method used for extracting the row candidates can be implemented in the same manner as in the above-described “first embodiment”. Therefore, for the row extracting process, the above description (the description regarding S2 and S3) is referred to, and the description is omitted here.
Thereafter, based on the extracted horizontal and vertical character line data, the duplicate line group extracting means 204 performs the multiple row overlap determination (S24).
FIG. 10 is a chart showing a more detailed flow of the multiple row overlap determination step (S24) according to the present embodiment, and FIG. 11 is a block diagram showing a more detailed configuration of the duplicate row group extracting means 204 according to the present embodiment.
As shown in FIG. 11, the duplicate row group extraction means 204 includes a row selection means 204-01, a duplicate vertical row extraction means 204-02, a duplicate row extraction means 204-03, a duplicate row extraction control means 204-04, and a duplicate row group storage unit 204-05. The function of each of these units included in the duplicate row group extraction means 204 will be described in the following explanation of the operation according to the flow.
[0038]
According to the flow of FIG. 10 showing the procedure of the multiple row overlap determination (S24) of this example, first, one horizontal row is selected by using the row selecting means 204-01 (S24-01).
Next, the overlapping vertical line extracting means 204-02 extracts all the vertical lines whose positions overlap the selected horizontal line (S24-02). This extraction result is stored in the duplicate row group storage means 204-05.
Thereafter, the overlap row extraction unit 204-03 extracts all the horizontal rows that overlap the vertical rows extracted in step S24-02 (S24-03).
The line extraction processing of steps S24-02 and S24-03 is repeated. FIG. 12 shows an example of the row group extracted as a result. From the original image shown in (A) of FIG. 17, the horizontal rows shown in (C) and the vertical rows shown in (D) are extracted, and when the procedure of steps S24-02 and S24-03 is repeated in the order horizontal row → vertical row → horizontal row ..., a row group shown in FIG. is obtained.
The repetition of steps S24-02 and S24-03 is terminated under a predetermined termination condition (S24-04). The control is executed by the duplicate row extraction control means 204-04. Various conditions can be set as the end condition, and the following conditions can be exemplified.
・ Terminate at a predetermined finite number of times.
・ Terminate when a new overlapping line can no longer be detected.
・ The process ends when the circumscribed rectangle of the overlapping row has a certain area or more. For the circumscribed rectangle of the overlapping row, see the example of FIG.
・ Terminate when the number of overlapping rows exceeds a certain number.
These conditions are used in combination as needed. For example, even if the number of rows does not increase to a predetermined number, if a new overlapping row cannot be detected, the process is terminated so that an infinite loop is prevented. In this example, the condition is set so that the process is terminated after a finite number of repetitions.
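The alternating extraction of steps S24-01 to S24-04 can be sketched as follows. The function names, the (left, top, right, bottom) rectangle layout, and the parameter `max_rounds` are assumptions for illustration. The sketch combines two of the end conditions listed above: a finite number of repetitions, and termination when no new overlapping row is detected.

```python
def rectangles_overlap(a, b):
    """True if two rectangles (left, top, right, bottom) share any pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def extract_overlap_group(seed, horizontal, vertical, max_rounds=10):
    """Grow a group from one horizontal row (S24-01) by alternately adding
    overlapping vertical rows (S24-02) and horizontal rows (S24-03)."""
    group_h, group_v = {seed}, set()
    for _ in range(max_rounds):                      # finite repetition count
        new_v = {v for v in vertical if v not in group_v
                 and any(rectangles_overlap(v, h) for h in group_h)}
        group_v |= new_v
        new_h = {h for h in horizontal if h not in group_h
                 and any(rectangles_overlap(h, v) for v in group_v)}
        group_h |= new_h
        if not new_v and not new_h:                  # nothing new was found
            break
    return group_h, group_v
```

The other end conditions (area of the circumscribed rectangle, number of rows in the group) could be added as further checks inside the loop.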
[0039]
Next, based on the overlapping row group data obtained in step S24 (FIG. 10), in the flow of FIG. 8, the row group likelihood calculating means 205 obtains the likelihood of each of the vertical row group and the horizontal row group in the row group (S25). The likelihood of the vertical row group and of the horizontal row group can be determined by calculating the likelihood for each row using the same method as in the first embodiment and taking the average of the calculated values. Further, since there are a plurality of rows, the variation (variance) of the row length and the row height may be added to the feature amounts used to obtain the likelihood value.
In this way, one likelihood is calculated for each of the vertical and horizontal row groups, and based on the obtained result, the inappropriate row group deletion unit 206 determines the row group whose likelihood A is larger (less character-like) and deletes the unsuitable rows according to the determination result (S26). For example, if A obtained from the horizontal row group is larger than A obtained from the vertical row group, all rows belonging to the horizontal row group are deleted.
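The group-level decision of steps S25 and S26 can be sketched as follows, using the average of per-row likelihoods as the group likelihood. The function names are hypothetical, `likelihood` is assumed to return the value A for a single row, and deleting the vertical group on a tie is an arbitrary choice not taken from the specification.

```python
from statistics import mean

def group_likelihood(rows, likelihood):
    """S25: average of the per-row likelihood values A within one group."""
    return mean(likelihood(r) for r in rows)

def rows_to_delete(group_h, group_v, likelihood):
    """S26: return every row of the group with the larger (worse) A."""
    a_h = group_likelihood(group_h, likelihood)
    a_v = group_likelihood(group_v, likelihood)
    return list(group_h) if a_h > a_v else list(group_v)
```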
The series of processing steps (S24 to S26) for deleting the inappropriate row group from a vertical row group and a horizontal row group in which a plurality of rows overlap is performed in units of row groups, each obtained by selecting one horizontal row as described above and applying the predetermined conditions. Row groups are examined one after another, and this processing is repeated until there are no more rows to be examined. After completion of the processing for all target rows is confirmed (S27-YES), correct character row information without overlap is output from the row storage means 203 as the processing result (S28), and this flow ends.
[0040]
"Embodiment 3"
In the above-described “Embodiment 2”, a vertical row group and a horizontal row group in which a plurality of rows overlap are extracted under predetermined conditions, and the incorrect one of the two groups is deleted as a unit, so that more appropriate line data can be obtained. However, with this method, in the case of a table of contents as shown in FIG. 13, that is, a document in which page numbers corresponding to the items of the table of contents are listed ((A) in FIG. 13 shows an image of the document), an error may occur in which the portion where the page numbers are arranged is extracted as a vertical row ((B) in the figure shows the row extraction result). The present embodiment prevents such an error from occurring and makes it possible to obtain appropriate character line area information.
FIG. 14 is a chart showing a processing flow (part) of the image processing method according to the present embodiment, and FIG. 15 is a block diagram showing the configuration of the image processing apparatus according to the present embodiment.
The image processing apparatus according to the present embodiment has a configuration as shown in FIG. 15, and a portion surrounded by a broken line in FIG. 15 is the same as the apparatus configuration (FIG. 9) of the “second embodiment”. Therefore, the above description is referred to, and the description is omitted here.
Further, as constituent elements unique to the present embodiment, there are provided a row number ratio calculating unit 301, an adjacent row extracting unit 302, a row height examining unit 303, a likelihood setting unit 304, a counter value examining unit 305, and a counting unit 306. The functions of each of these means unique to the present embodiment will be clarified by the following description of the operation according to the flow.
[0041]
Next, an image processing method according to the present embodiment will be described with reference to the flowchart in FIG. It should be noted that the flow of the present embodiment is the same as the flow of FIG. Is inserted. Therefore, the following description focuses on the procedure shown in the flow of FIG. FIG. 16 shows an example used to explain this procedure. FIG. 11 is a diagram for explaining this processing by taking as an example a line extraction of a page number corresponding to the table of contents of the items described in the table of contents.
As described in the above “Embodiment 2”, after a group of overlapping vertical and horizontal rows is extracted in the multiple-row overlap determination step S24, the processing according to the flow of FIG. 14 is performed as processing unique to the present embodiment. Here, based on the extracted row group, the row number ratio calculating means 301 examines whether the ratio between the numbers of vertical and horizontal rows is 1:many (or many:1) (S31). As shown in FIG. 16A, this relationship holds in the illustrated page-number portion, where five rows overlap one vertical row (dotted line).
At this time, if the relationship of 1: many (or many: 1) does not hold, the process proceeds to the likelihood calculation step S25 (see FIG. 8) of the row group.
[0042]
On the other hand, if 1:many (or many:1) holds, the rows may be the page-number rows of a table of contents, so the row group is examined from another viewpoint.
Next, in preparation for counting the number of rows that meet the condition, the count value of the counting means 306 provided for that purpose is reset to "0" (S32).
Next, using the adjacent row extracting unit 302, a search is made for a row adjacent, in the length direction, to one of the plurality of rows (hereinafter referred to as “reference source rows”) (S33). At this time, the row whose coordinate values in the height direction match those of the reference source row and which is closest to it (hereinafter referred to as the “adjacent row”) is searched for.
After step S33, if an adjacent row exists, the process proceeds to steps S35 to S37 below; if not (S34-NO), these steps are skipped and the process proceeds to step S38.
If there is an adjacent row, the row height examination means 303 compares its height with that of the reference source row. If the two heights are close (S35-YES), "1" is added to the count value of the counting means 306 (S36), and the process proceeds to step S38 described below, where the reference source row is changed (S39) and the same processing is performed for the other reference source rows. This processing is equivalent to searching for the character string (the table-of-contents item) that is paired with a table-of-contents page number.
[0043]
If the height of the adjacent row does not match that of the reference source row and is too high (S37-YES), the process proceeds to step S38 described below without searching for a further adjacent row for the same reference source row. This processing is equivalent to determining that there is no character string to be paired.
An example requiring this procedure is shown in the figure. In this example, a graphic lies between the adjacent row and the reference source row, and its height is such that the relationship between a table-of-contents page number and its corresponding character string cannot hold; this is an unnatural layout for a table of contents.
If the height of the adjacent row does not match that of the reference source row but is not too high (S37-NO), the process returns to step S33 and searches, for the same reference source row, for the next adjacent row beyond the current one.
When all adjacent rows have been examined, or when an adjacent row that satisfies the condition has been found, it is confirmed whether an unexamined reference source row remains (S38). If one remains, the reference source row is changed (S39), and the process returns to step S33 to search for an adjacent row.
If no unexamined reference source row remains in step S38, the counter value is examined by the counter value examination means 305 (S40). When the counter value is larger than a predetermined value, or when its ratio to the total number of reference source rows is equal to or more than a certain value (S40-YES), it is determined that the page numbers of a table of contents are arranged there. The likelihood setting means 304 then sets, as the likelihood of the row group, a value such that the single row (the row on the “1” side of the 1:many relationship) is deleted (S41), and the process proceeds to step S26 (FIG. 8). The examination of the counter value by the counter value examination means 305 can be illustrated with the example of FIG. 16. In the example of FIG. 16B, an adjacent row matching the condition is found for each of the five reference source rows, so the counter value becomes 5. Since the ratio of the counter value to the total number of reference source rows is 1.0, which is extremely high, it is appropriate to determine that these are table-of-contents page numbers.
As a result of examining the counter value, if the counter value is not larger than the predetermined value, the process proceeds to step S25 (FIG. 8) to perform likelihood calculation.
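The adjacent-row search and counting of steps S32 to S40 can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the `Row` type, the helper names, the use of overlapping vertical extents as the test for "matching" height-direction coordinates, and the thresholds `close_tol`, `too_high`, and `min_ratio` are all assumptions introduced here, since the text only speaks of "close", "too high", and "a certain value".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    x: int  # left edge
    y: int  # top edge
    w: int  # extent in the length direction
    h: int  # row height


def _gap(a, b):
    """Horizontal gap between two rows (0 if they touch or overlap)."""
    return max(b.x - (a.x + a.w), a.x - (b.x + b.w), 0)


def _same_band(a, b):
    """Assumed test for 'height-direction coordinates match': extents overlap."""
    return a.y < b.y + b.h and b.y < a.y + a.h


def has_paired_row(ref, others, close_tol=0.2, too_high=2.0):
    """S33-S37: look for a nearby row whose height is close to that of ref."""
    neighbours = sorted((o for o in others if _same_band(ref, o)),
                        key=lambda o: _gap(ref, o))  # nearest first
    for adj in neighbours:
        if abs(adj.h - ref.h) <= close_tol * ref.h:  # S35-YES: heights close
            return True
        if adj.h > too_high * ref.h:                 # S37-YES: far too tall
            return False
        # S37-NO: try the next nearest adjacent row
    return False                                     # S34-NO: nothing left


def is_toc_page_numbers(ref_rows, others, min_ratio=0.8):
    """S32, S36, S40: count paired reference rows and test the ratio."""
    if not ref_rows:
        return False
    count = sum(has_paired_row(r, others) for r in ref_rows)
    return count / len(ref_rows) >= min_ratio
```

With five page-number rows that each have an entry of the same height beside them, the ratio is 1.0 and the group is treated as table-of-contents page numbers; if a tall graphic sits between the entries and the page numbers, the search stops at the graphic and no pair is counted.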
[0044]
"Embodiment 4"
The present embodiment is intended to speed up processing in the image processing method and apparatus described in “Embodiment 1” and “Embodiment 2” above, which are capable of obtaining appropriate character line candidate area information from an input document image. Here, this purpose is achieved by compressing the target image data.
FIG. 17 is a chart illustrating a processing flow of the image processing method according to the present embodiment, and FIG. 18 is a block diagram illustrating a configuration of the image processing apparatus according to the present embodiment.
In the present embodiment, an example is described in which the element for speeding up processing is applied to the first embodiment. Components identical to those of the image processing method and image processing apparatus of the first embodiment (FIGS. 1 and 2) are therefore denoted by the same reference numerals in the drawings; for these components the earlier description applies and is not repeated here.
As shown in FIG. 18, a compressed image generating unit 401 and a coordinate value converting unit 402 are provided as constituent elements unique to this embodiment. The functions of each of these means unique to the present embodiment will be clarified by the following description of the operation according to the flow.
[0045]
Here, the image processing method according to the present embodiment will be described with reference to the flowchart. The flow of the present embodiment is the same as that of the first embodiment (see FIG. 1), except that a step S41 for generating a compressed image is inserted after the image input step S1, and a step S42 for converting coordinate values is inserted between step S7, which confirms whether all rows have been examined, and step S8, which outputs the result. The following description therefore focuses on the steps added in this flow.
As described in the first embodiment, after the image to be processed is input by the image input means 101 in the image input step S1, the compressed image generating means 401 generates a compressed image (reduced image) from the original image to be processed (S41). The compressed image is generated by combining a square block of several pixels of the original image into one pixel. The compression ratio may be a fixed value or may be changed according to the resolution of the original image. For example, increasing the compression ratio when the resolution of the original image is high keeps the processing time from growing much even as the resolution increases. Here, a fixed compression that combines each 2 × 2 pixel range into one pixel is assumed. For a binary image, the value of the combined pixel can be determined by making it black if the 2 × 2 pixels include a black pixel and white otherwise (so-called OR compression). For a grayscale image or a color image, the average value over the 2 × 2 pixel area may be used.
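The two reductions described for step S41 can be sketched as below. This is a minimal illustration under stated assumptions: images are plain lists of rows, a binary image uses 1 for black and 0 for white, the multi-valued case is shown for a single channel with an integer average, and the function names `or_compress` and `mean_compress` are hypothetical. Edge blocks smaller than k × k are handled by using whatever pixels exist, which the text does not specify.

```python
def _blocks(img, k):
    """Yield, for each output row, the list of k x k blocks as flat pixel lists."""
    h, w = len(img), len(img[0])
    for by in range(0, h, k):
        yield [
            [img[y][x]
             for y in range(by, min(by + k, h))
             for x in range(bx, min(bx + k, w))]
            for bx in range(0, w, k)
        ]


def or_compress(img, k=2):
    """OR compression: output pixel is black (1) if any pixel in the block is black."""
    return [[int(any(block)) for block in row] for row in _blocks(img, k)]


def mean_compress(img, k=2):
    """Average compression for a single-channel multi-valued image."""
    return [[sum(block) // len(block) for block in row] for row in _blocks(img, k)]
```

For example, `or_compress([[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]])` gives `[[0, 1], [1, 0]]`: a single black pixel anywhere in a 2 × 2 block is enough to keep the reduced pixel black, so thin strokes are not lost by the reduction.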
The compressed image generated in step S41 is then passed to the row data extraction step (S2) as a processing target image, and the process proceeds to step S7 through the same processing steps as in the first embodiment.
If it is determined in step S7 that all rows have been examined, the coordinate value conversion means 402 converts the coordinate values indicating the correct row area information obtained through the procedure up to the previous stage into those of the original image (S42). In the case of adopting compression in which 2 × 2 pixels are combined into one pixel, a coordinate value conversion for doubling the coordinate values of both X and Y coordinates may be performed. In this example, the number of pixels of the image to be used is reduced to one fourth, so that the processing can be speeded up.
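The coordinate conversion of step S42 amounts to scaling each row rectangle by the compression factor. A minimal sketch follows, assuming row areas are held as `(x, y, width, height)` tuples in compressed-image pixels; the tuple layout and the name `to_original_coords` are assumptions made for illustration.

```python
def to_original_coords(rect, k=2):
    """Map a row rectangle found on the k x k compressed image back to the original."""
    x, y, w, h = rect
    return (x * k, y * k, w * k, h * k)
```

With `k=2`, a row found at `(3, 5, 10, 2)` on the compressed image maps to `(6, 10, 20, 4)` on the original, while the row extraction itself only had to visit one fourth as many pixels.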
[0046]
"Embodiment 5"
This embodiment shows an example in which a general-purpose computer is used as the means for executing the image processing steps for obtaining the character line area information shown in “Embodiment 1” to “Embodiment 4” above.
FIG. 19 shows the configuration of the processing apparatus of the present embodiment. As shown in FIG. 19, the apparatus is implemented on a general-purpose computer and includes a CPU 801, a memory 802, a hard disk drive 803, an input device 804, a CD-ROM drive 805, a display 806, a mouse, and the like. A removable recording medium 807 such as a CD-ROM, used by the CD-ROM drive 805, stores a program (software) for realizing the above-described processing functions and procedures for obtaining the character line area information of the present invention.
A document image to be processed is input by an input device 804 such as a scanner, and is stored in, for example, a hard disk 803 or the like. The CPU 801 reads a program for realizing the above-described processing functions and procedures from the recording medium 807, executes a process for obtaining character line area information according to the program on the target image, and outputs the result to the display 806 or the like.
[0047]
[Effects of the Invention]
(1) Effects corresponding to the first aspect of the invention
The suitability as character lines of vertical and horizontal line candidate areas that overlap each other is judged, and lines in the inappropriate direction are deleted based on the result. Therefore, even if a document image in which vertical and horizontal writing are mixed in one document is given unconditionally, information representing appropriate character line areas can be obtained, and appropriate lines can be identified without specifying a processing target area as in the conventional method.
(2) Effects corresponding to the second aspect of the invention
A basic procedure for extracting the vertical and horizontal line candidate areas that overlap each other is provided for the invention of claim 1.
(3) Effects corresponding to the inventions of claims 3 and 4
Because vertical and horizontal line candidate areas that overlap in a many-to-many manner are detected and grouped according to a predetermined rule, and it is determined which direction gives the more appropriate lines within each group (line candidate group), character line area information can be obtained with fewer processing operations.
(4) Effects corresponding to the invention of claim 5
Methods are provided for extracting connected components of black pixels or white pixels, connected components of adjacent pixels of similar brightness, and connected components of adjacent pixels of similar color. The invention of claim 1 can therefore be applied to a (color) image.
[0048]
(5) Effects corresponding to the inventions of claims 6 to 9
The suitability as character lines of the vertical and horizontal line candidate areas having overlapping areas is determined by calculating a likelihood from character line feature amounts (line length, line height, aspect ratio of the line size, distance between connected components, connected component size, the variation of each of these amounts, pixel brightness, pixel color information, pixel edge strength, brightness difference between connected components of pixels and surrounding pixels, color difference between connected components of pixels and surrounding pixels, and the like), so a more appropriate result can be obtained.
(6) Effects corresponding to the inventions of claims 10 to 13
A likelihood calculation process is performed that takes into account a condition likely to cause an error (see the description related to FIG. 13) when determining which direction is more appropriate within a group (row candidate group), so this error can be avoided.
(7) Effects corresponding to the inventions of claims 14 to 18
The input original image is compressed before the process of extracting the row candidate areas in the vertical and horizontal directions (when the original image is binary, the pixel of the compressed image is made black if the compression unit includes a black pixel, or conversely made white if the compression unit includes a white pixel; when the original image is multi-valued, at least one of the average value, the maximum brightness value, and the minimum brightness value of the compression unit is adopted), so the processing can be speeded up.
(8) Effects corresponding to the inventions of claims 19 and 20
By installing, on a general-purpose computer, a program for executing each step of the image processing method according to any one of claims 1 to 18, the effects of (1) to (7) above are easily realized, and an image processing apparatus having those effects can be provided.
[Brief description of the drawings]
FIG. 1 is a chart showing a processing flow (first embodiment) of an image processing method for obtaining information representing a character line area according to the present invention.
FIG. 2 is a block diagram illustrating a configuration of an image processing apparatus for performing the processing in FIG. 1;
FIG. 3 is a chart showing a detailed flow of a row extraction step (S2, S3) in FIG. 1;
FIG. 4 is a block diagram showing a more detailed configuration of a row extracting unit in FIG. 2;
FIG. 5 is a chart showing a detailed flow of a row likelihood calculation step (S5) in FIG. 1;
FIG. 6 is a block diagram showing a more detailed configuration of a row likelihood calculating unit in FIG. 2;
FIG. 7 shows characteristics of a spatial filter used for a process for obtaining a feature amount of a character line in a color image.
FIG. 8 is a chart showing a processing flow (second embodiment) of an image processing method for acquiring information representing a character line area according to the present invention.
FIG. 9 is a block diagram illustrating a configuration of an image processing apparatus for performing the processing of FIG. 8;
FIG. 10 is a chart showing a detailed flow of a multiple line overlap determination step (S24) in FIG. 8;
FIG. 11 is a block diagram showing a more detailed configuration of a duplicate row group extraction unit in FIG. 9;
FIG. 12 is a diagram for explaining a process of extracting a group of duplicate rows.
FIG. 13 is a diagram illustrating a vertical row extraction error that may occur in the duplicate row group extraction processing according to the second embodiment.
FIG. 14 is a chart showing a processing flow (Embodiment 3) for avoiding a vertical extraction error.
FIG. 15 is a block diagram illustrating a configuration of an image processing apparatus provided with a unit for performing the processing in FIG. 14;
FIG. 16 is a diagram for describing processing (Embodiment 3) for avoiding a vertical row extraction error.
FIG. 17 is a chart showing a processing flow (Embodiment 4) of an image processing method for acquiring information representing a character line area according to the present invention.
FIG. 18 is a block diagram illustrating a configuration of an image processing apparatus for performing the processing of Embodiment 4.
FIG. 19 is a diagram illustrating a computer that executes an image processing method according to the present invention.
[Explanation of symbols]
101, 201: image input means, 102, 202: row extraction means,
103, 203: row storage means, 104: duplicate row extraction means,
105: row likelihood calculation means, 106: unsuitable row deletion means,
204: duplicate row group extraction means, 205: row group likelihood calculation means,
206: unsuitable row group deleting means, 301: row number ratio calculating means,
302: adjacent row extraction means, 303: row height examination means,
304: likelihood setting means, 305: counter value examination means,
306: counting means, 401: compressed image generating means,
402: coordinate value conversion means, 801: CPU,
802: memory, 803: hard disk drive,
804: input device, 805: CD-ROM drive,
806: display, 807: CD-ROM.

Claims

1. An image processing method comprising: a horizontal row extracting step of extracting, from an image input as a processing target, row candidate areas that can be regarded as horizontal rows; a vertical row extracting step of extracting, from the target image, row candidate areas that can be regarded as vertical rows; a duplicate row detection step of detecting, based on the results of the horizontal and vertical row extracting steps, horizontal and vertical row candidate areas having areas that overlap each other; an inappropriate row selection step of determining the suitability, as character rows, of the horizontal and vertical row candidate areas having the overlap area detected in the duplicate row detection step, and selecting the row in the inappropriate direction; and a row deletion step of deleting the inappropriate row candidate according to the result of the inappropriate row selection step.

2. The image processing method according to claim 1, further comprising a row data storing step of storing the row candidate areas extracted in the horizontal and vertical row extracting steps, wherein the duplicate row detection step comprises: a step of selecting one row from among the row candidate areas stored in the row data storing step; and a step of selecting at least one row candidate area that differs in direction from the one row and whose area overlaps it.

3. The image processing method according to claim 1, further comprising a row data storing step of storing the row candidate areas extracted in the horizontal and vertical row extracting steps, wherein the duplicate row detection step comprises: a primary selection step of selecting one row from among the row candidate areas stored in the row data storing step; a secondary selection step of selecting all row candidate areas that differ in direction from the one row and whose areas overlap it; a higher-order selection step of applying, in sequence to each row newly selected in or after the secondary selection step, the selection of all row candidate areas that differ in direction from that row and whose areas overlap it; and a row selection stopping step of stopping the higher-order selection step according to predetermined conditions.

4. The image processing method according to claim 3, wherein the row selection stopping step is a step of stopping the selection when at least one of the following stop conditions is satisfied: a predetermined number of selection steps has been reached during the selection; there is no further row candidate area to be selected; the area of the circumscribed rectangle of the selected row candidate areas exceeds a predetermined value; and the number of selected row candidate areas exceeds a predetermined number.

5. The image processing method according to claim 1, wherein each of the horizontal and vertical row extracting steps comprises: a connected component extraction step comprising at least one of a step of extracting connected components of black pixels or white pixels, a step of extracting adjacent pixels of similar brightness as connected components, and a step of extracting adjacent pixels of similar color as connected components; a step of selecting neighboring connected components from the connected components obtained in the connected component extraction step; a step of comparing the sizes of the neighboring connected components; a step of grouping connected components based on the comparison result; and a step of obtaining a circumscribed rectangle of the grouped connected components.

6. The image processing method according to claim 1, wherein the inappropriate row selection step comprises a step of calculating, for each of the horizontal and vertical row candidate areas having an overlap area, a likelihood representing its plausibility as a character row, in order to determine the suitability of those row candidate areas as character rows.

7. The image processing method according to claim 1, wherein the inappropriate row selection step comprises a step of calculating, for each direction, a likelihood representing the plausibility of the whole of the single or plural row candidates in that direction, in order to determine the suitability as character rows of the horizontal and vertical row candidate areas having an overlap area.

8. The image processing method according to claim 6, wherein the step of calculating the likelihood comprises: a step of obtaining one or more feature amounts; and a step of calculating the likelihood based on the obtained feature amounts.

9. The image processing method according to claim 8, wherein the step of obtaining the feature amounts includes the steps of: calculating the row length; calculating the row height; calculating the aspect ratio of the row size; calculating the distance between the connected components of the pixels constituting the row; calculating the size of the connected components of the pixels constituting the row; calculating the variation in the row length; calculating the variation in the row height; calculating the variation in the aspect ratio of the row size; calculating the variation in the distance between the connected components of the pixels constituting the row; calculating the variation in the size of the connected components of the pixels constituting the row; extracting the brightness information of the pixels constituting the row; extracting the color information of the pixels constituting the row; calculating the edge intensity of the pixels constituting the row; calculating the brightness difference between the connected components of the pixels constituting the row and the surrounding pixels; and calculating the color difference between the connected components of the pixels constituting the row and the surrounding pixels.

10. The image processing method according to claim 7, wherein the step of calculating the likelihood comprises: a step of calculating the ratio between the numbers of horizontal and vertical row candidates having an overlapping area; and a calculation processing step of applying another likelihood calculation process when the row number ratio is a predetermined value.

11. The image processing method according to claim 10, wherein the calculation processing step of applying the other likelihood calculation process comprises: a step of extracting a row candidate area adjacent to a row candidate area having the overlapping area; a step of examining whether the height of the extracted row candidate area satisfies a predetermined condition; a step of counting the number of rows whose height satisfies the predetermined condition; a step of examining whether the count value satisfies a predetermined condition; and a step of determining the likelihood based on the result of that examination.

12. The image processing method according to claim 11, wherein the step of examining the height of the extracted row candidate area is a step of determining, as a row candidate area meeting the predetermined condition, one whose position in the height direction substantially matches and whose difference in height is small.

13. The image processing method according to claim 12, wherein the step of determining the row candidate area that meets the predetermined condition is a step of determining that no row satisfies the condition when, even though a row candidate area with a small difference in height exists, a row with a large difference in height exists at a shorter distance.

14. The image processing method according to claim 1, further comprising: a step of generating a compressed image from an original image input as a processing target, before the horizontal and vertical row extracting steps; and a step of restoring the row areas, obtained on the basis of the compressed image after the row deletion step, to values based on the original image.

15. The image processing method according to claim 14, wherein the step of generating the compressed image includes a step of changing a compression ratio according to a resolution of an original image.

16. The image processing method according to claim 14, wherein the step of generating the compressed image comprises, when the original image is a binary image, a step of setting a pixel of the generated compressed image to a black pixel if a black pixel is included in the pixel range serving as a compression unit of the original image.

17. The image processing method according to claim 14, wherein the step of generating the compressed image comprises, when the original image is a binary image, a step of setting a pixel of the generated compressed image to a white pixel if a white pixel is included in the pixel range serving as a compression unit of the original image.

18. The image processing method according to claim 14, wherein, when the original image is a multi-valued image, the step of generating the compressed image comprises at least one of: a step of setting the average pixel value of the pixel range serving as a compression unit of the original image as the pixel value of the pixel of the generated compressed image; a step of setting the pixel value with the highest brightness in that pixel range as the pixel value of the pixel of the generated compressed image; and a step of setting the pixel value with the lowest brightness in that pixel range as the pixel value of the pixel of the generated compressed image.

19. A program for causing a computer to execute each step of the image processing method according to claim 1.

20. An image processing apparatus comprising a computer on which the program according to claim 19 is installed, wherein the computer processes data of a target image.