JP2004334743A

JP2004334743A - Method for extracting ruled line, method for encoding ruled line, and ruled line extracting program

Info

Publication number: JP2004334743A
Application number: JP2003132697A
Authority: JP
Inventors: Toru Tanaka; 通田中
Original assignee: DAEMON KK
Current assignee: DAEMON KK
Priority date: 2003-05-12
Filing date: 2003-05-12
Publication date: 2004-11-25

Abstract

PROBLEM TO BE SOLVED: To extract and encode ruled lines from document image data.

SOLUTION: A binary document image is acquired and connected pixels are recognized in it. The perimeter of each connected pixel is found, connected pixels whose perimeter is larger than a predetermined threshold THC are determined to be ruled lines, and only the ruled lines are extracted. To set the threshold THC, the connected pixels in the document image are stored, the number of stored connected pixels is counted for each perimeter class, the local maximum point (x_m', y_m'), which is the point farthest from the straight line connecting the point (x_1, y_1) in the class of the smallest perimeter and the point (x_N, y_N) in the class of the largest perimeter in the relation between perimeter and count, is found, and the perimeter of the local maximum point is set as the threshold THC.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、文書の画像データから罫線を抽出する方法に関する。
【０００２】
【従来の技術】
高度情報化社会の中、多くの文書が電子化され、各種メディアを通じて利用できるようになっている。一方、依然として、紙による文書も様々な状況において大量に用いられているが、情報の開示や作業の効率化のため、近年これら紙の文書に対する自動電子化の要求が急速に高まってきている。この「紙」から「電子データ」への変換の際に、大きな手助けとなるのがＯＣＲ（ＯｐｔｉｃａｌＣｈａｒａｃｔｅｒＲｅａｄｅｒ）である。文書画像処理技術の向上に伴い、ＯＣＲはより実用的なものとなり、イメージスキャナを含むコンピュータの普及や、高認識率ＯＣＲソフトウェアの低価格化が進んだことにより、社会の中で利用可能な環境が整いつつある。
【０００３】
しかしながら、図表、罫線などの文字以外の成分が含まれた文書画像に関しては、現状のＯＣＲでは対応しきれていないことが多い。各領域の属性を手作業で指定するＯＣＲソフトウェアもあるが、その場合、処理の完全な自動化を行うことはできない。また、処理の対象を表形式文書に限定し、表構造の認識に特化したＯＣＲソフトウェアもあるが、その出力においては、表構造にかなりの冗長性を持たせて表現しており、表構造を簡潔に表していない。すなわち、細かい格子に罫線をマッピングしたに過ぎず、システムが表構造を理解していることにはなっていない。
【０００４】
表は、社会全般において、使われる機会が非常に多い表現方法であり、文章などのような構造的に複雑なものではなく、内容を属性ごとに整列させ、意味をまとめて整理するために用いられるため、文章中では特に重要な要素である。したがって、表の表現には、構造を視覚的に再現するだけではなく、冗長性を無くした簡潔なものを用いなくては、表を理解したことにならない。ここ数年、電子商取引（ＥＣ：ＥｌｅｃｔｒｏｎｉｃＣｏｍｍｅｒｃｅ）や電子データ交換（ＥＤＩ：ＥｌｅｃｔｒｏｎｉｃＤａｔａＩｎｔｅｒｃｈａｎｇｅ）を進めるために、ＸＭＬ（ｅＸｔｅｎｓｉｂｌｅＭａｒｋｕｐＬａｎｇｕａｇｅ）という技術が注目されているが、これは、各領域のデータに属性を与え、テキスト処理を容易にするためのものであり、領域の指定に冗長性が有る場合には、そのデータに属性を与えることが困難になる。
【０００５】
このような表構造の理解により、表構造を符号化する前提として、表構造（罫線）を電子データ化する技術が必要である。表構造を電子データ化する技術としては、入力された文書画像を２値化し、画像の縦方向と横方向について、それぞれ黒画素の画素数を各行、各列ごとに求め、この投影値に基づいて罫線を抽出する方法が知られている（例えば、特許文献１参照）。
【０００６】
【特許文献１】
特開２００１−１０９８８８号公報（段落００６４、図１８等）
【０００７】
【発明が解決しようとする課題】
しかしながら、前記した表が大きく傾いている場合や、同じ文字が連続して配置されている文書画像の場合には、正確に罫線を抽出することは困難であった。
そこで、本発明では、容易かつ正確に罫線を抽出する方法を提供することを課題とする。
【０００８】
【課題を解決するための手段】
前記した課題を解決するため、本発明の請求項１では、文字と罫線が混ざった文書画像から罫線を抽出する方法であって、２値化した文書の画像を取得するステップと、連結画素を認識するステップと、連結画素の周囲長を求めるステップと、周囲長がしきい値より大きい連結画素を罫線と判断するステップとを含むことを特徴とする。
【０００９】
このような方法によれば、２値化した画像における、連結画素、つまりひとかたまりと認識される部分の周囲長の長さがしきい値より大きいかどうかにより、罫線とそれ以外の要素を区別することができる。ここでの周囲長とは、連結画素の輪郭をたどる一周の長さであり、例えば輪郭を構成する画素の数で求められる。
【００１０】
また、請求項２の発明では、前記しきい値の決定に際し、前記文書画像中の連結画素を記憶するステップと、前記記憶した各連結画素について、前記周囲長を階級ごとに累積して度数を得るステップと、前記周囲長と前記度数を座標軸とした直交座標上において、最も小さい周囲長の階級にある点と最も大きい周囲長の階級にある点を結ぶ直線から最も離れた連結画素の点である極大点を求めるステップと、前記極大点の周囲長を前記しきい値として設定するステップとを含んで決定することを特徴とする。
【００１１】
このように、周囲長を階級ごとに累積して度数を数え、周囲長と度数の関係を直交座標上に作ると、文字等は、小さい周囲長で高い度数存在する一方で、罫線は、大きい周囲長で、低い度数しか存在しない。そのため、周囲長が小さい部分では高い度数があるが、ある周囲長から極端に度数が低くなり、前記関係のグラフを作ると、ある周囲長で急激に曲がる曲線となる。そして、この曲線が急激に曲がる点を求めるため、最も小さい周囲長の階級にある点と最も大きい周囲長の階級にある点を結ぶ直線から最も離れた点（本明細書において、これを「極大点」とする。）を求める。この極大点をしきい値として設定すれば、このしきい値より大きい周囲長を有する連結画素のみを選択して、罫線を抽出できることになる。
ここで、階級といっているのは、直交座標上にプロットできる点は、周囲長について連続しているものではなく、とびとびに存在しているためである。そして、前記した極大点の周囲長をしきい値とするということは、実際に罫線として選択される連結画素は、前記しきい値の次の階級の周囲長を有するものとなる。
【００１２】
また、請求項３に記載の発明では、請求項１又は請求項２に記載の罫線の抽出方法により抽出した罫線画像を符号化する方法であって、前記罫線画像から、罫線が交差する節点位置を認識するステップと、前記節点位置において、その節点が、表構造のうちの右上、左上、右下、または左下のそれぞれの角部になりうるかを判定するステップと、前記右上、左上、右下、または左下のそれぞれの角部になりうるかの組合せに対し１対１に節点番号を対応させたテーブルを参照し、前記節点に節点番号を割り当てるステップとを含むことを特徴とする。
このように、罫線を抽出した画像から罫線の節点位置を求めることで、罫線と文字が混ざっている状態から直接節点位置を求めるよりも、正確に節点位置を求めることができる。そして、各節点について、右上、左上、右下、または左下のそれぞれの角部になりうるかの組合せから、１対１に節点番号を対応させたテーブルを用いることで、この節点になりうるかの判断のみで、節点番号を割り振ることができる。なお、これらの発明では、コンピュータにより各ステップを実行する。
【００１３】
また、本発明の請求項４では、文字と罫線が混ざった文書画像から罫線を抽出するための罫線抽出プログラムであって、コンピュータを、２値化した文書の画像を入力する手段と、連結画素を認識する手段と、連結画素の周囲長を求める手段と、周囲長がしきい値より大きい連結画素を罫線と判断する手段として機能させることを特徴とする。
【００１４】
【発明の実施の形態】
次に、適宜図面を参照しながら本発明の実施形態について説明する。
参照する図面において、図１は、実施形態に係る罫線の抽出方法および符号化方法を実行するシステム（以下、「罫線抽出符号化装置」という）の構成図であり、図２は、実施形態に係る罫線の抽出方法および符号化方法の処理の概略を示すフローチャートであり、図３は、実施形態に係る罫線抽出符号化装置の機能ブロック図である。
【００１５】
図１に示すように、罫線抽出符号化装置１は、中央処理装置（ＣＰＵ）１１、入力装置としてのキーボード１３ならびにマウス１４、出力装置としてのディスプレイ１６、および記憶装置１５を備えた一般的なコンピュータに、罫線を含む文書Ｄをデジタル画像（以下、単に「画像」という）として読み込むスキャナ１２が接続されて構成されている。記憶装置１５には、後記する各機能を実現するコンピュータプログラムが記憶され、適宜ＣＰＵ１１にロードされて実行される。
【００１６】
図２を参照して、実施形態に係る罫線の抽出方法および符号化方法の概略を説明する。まず、スキャナ１２を使用して、文書Ｄを２値化した画像として取得する（Ｓ１）。そして、取得した画像中から一つにつながった連結画素を認識する（Ｓ２）。次に、各連結画素について周囲の長さ（周囲長）を計算する（Ｓ３）。次に、周囲長の長さから、その連結画素が罫線かどうかを判定する（Ｓ４）。罫線は、文字に比較して周囲長が長いことから、あるしきい値より大きい場合には、罫線であると判定すればよい。そして、罫線から節点を見つけ出した上で、節点を符号化することで、罫線を符号化する（Ｓ５）。この際、図１３に示したような節点符号化テーブル２９を参照することにより符号化を行う。
【００１７】
次に、図３を参照しながら、罫線抽出符号化装置の詳細について説明する。
図３に示すように、罫線抽出符号化装置１は、文書画像入力部２１と、連結画素認識部２２と、周囲長演算部２３と、しきい値演算部２４と、罫線判定部２５と、節点認識部２６と、節点符号化部２７と、節点符号化テーブル２９とを有して構成されている。
【００１８】
文書画像入力部２１は、前記したスキャナ１２を作動させるソフトウェアとして構成される。文書画像入力部２１は、スキャナ１２から、２値化画像として文書Ｄを取り込み、記憶装置１５に記憶させる。カラー画像、モノクロの多階調の画像として画像を入力する場合には、画像を入力した後、２値化するように構成すればよい。
文書画像入力部２１で入力される画像は、例えば、図４（ａ）の原画像や、図６（ａ）の原画像のようになる。
【００１９】
連結画素認識部２２は、文書画像入力部２１で入力された文書Ｄの２値化画像から、一つながりの黒画素（連結画素）を認識する部分である。なお、黒地に白で文字や罫線を標記した文書を入力する場合には、白画素を認識するように構成すればよい。
連結画素の認識方法は、種々の方法をとることができる。
例えば、各画素にラベルを付けていく方法である。これは、注目画素の周囲（上のラインと左）の画素を参照し、その周囲の画素が黒であれば、注目画素に周囲の画素と同じラベルを付す方法で、この処理を画像の左上の画素から順に画像全体にわたって行えば、各連結画素にユニークなラベルを付すことができる。
【００２０】
周囲長演算部２３は、連結画素認識部２２で認識した各連結画素の周囲長を演算する。周囲長とは、連結画素の輪郭の長さであり、例えば、輪郭を構成する画素（例えば、上下左右の画素のうち少なくとも一つが自己の連結画素でない画素）の総数をカウントすることで求められる。もちろん、この方法に限られず、他の方法により周囲長を求めても良い。また、周囲長は、罫線の内側のように黒画素の内側に囲まれた白画素がある場合、この内縁の黒画素を周囲長としてカウントしても、しなくても良いが、周囲長としてカウントした方が、文字と罫線をより正確に区別できるので望ましい。
【００２１】
しきい値演算部２４は、周囲長演算部２３で求めた各連結画素の周囲長と、その周囲長を有する連結画素の度数との関係を作り、この関係から、罫線と文字を判別するためのしきい値を演算する部分である。周囲長は、取得した文書画像の分解能に応じ、適度な幅で階級にしておくと良い。例えば、周囲長を、連結画素の周囲を形成する画素数でカウントする場合、１０画素ごとや、５０画素ごと、１００画素ごとなどで階級を作り、各連結画素を、それらの階級に割り当てる。そして、各階級ごとに、その階級に相当する連結画素の数を累積して度数を得る。
周囲長と、度数の関係は、例えば、図５や図７のようになる。ここで、図５は、図４（ａ）の文書（原画像）の文書の連結画素について、横軸に階級（周囲長）、縦軸に階級ごとの度数をとってプロットしたものであり、図７は、図６（ａ）の文書（原画像）の連結画素について、横軸に階級（周囲長）、縦軸に階級ごとの度数をとってプロットしたものである。これらを見て分かるように、度数の分布は、低い階級（短い周囲長）で高い度数を示し、階級が高く（周囲長が長く）なるにつれ急激に度数が低くなり、最高階級まで、低い度数で分布する。これは、一般に、文字と罫線が混ざった文書画像では、文字の数は周囲長が短く、度数が非常に多いのに対し、罫線は周囲長が長く、度数が非常に少ないからである。
【００２２】
図５または図７のような周囲長と度数の関係から、罫線か否かを判定する階級のしきい値を求める方法について、図８を参照しながら説明する。図８は、図７に対応する周囲長と度数の関係から、しきい値を求める方法を説明する図である。
図８に示すように、もとの周囲長の階級の軸（横軸）をｘ軸とし、度数の軸（縦軸）をｙ軸とする。そして、最も小さい周囲長の階級の点（ｘ_１，ｙ_１）と、最も大きい周囲長の階級の点（ｘ_Ｎ，ｙ_Ｎ）を結ぶ直線を作成し、これをｘ’軸とする。次に、ｘ’軸に直交する方向（図８では、（ｘ_１，ｙ_１）から左下向きを正に引いている）にｙ’軸をとる。ｘ’ｙ’の座標系をＳ’系と呼ぶことにする。
そして、Ｓ’系において、各点のｙ’軸への投影が最も大きい点を極大点（ｘ_ｍ’，ｙ_ｍ’）とし、この階級の周囲長をしきい値ＴＨＣとする。
極大点（ｘ_ｍ’，ｙ_ｍ’）は、図８の点の分布から明らかなように、ｘ’軸から最も遠い点である。
【００２３】
罫線判定部２５は、しきい値演算部２４で求めたしきい値ＴＨＣに基づき、各連結画素が罫線か否かを判定する部分である。すなわち、連結画素ごとに、その連結画素の周囲長と前記しきい値ＴＨＣの大小を比較し、周囲長がしきい値ＴＨＣよりも大きければ罫線であると判定し、小さければ罫線ではないと判断する。
また、罫線判定部２５は、文書画像から、罫線以外の部分（連結画素）を消去した画像を生成する。このようにして罫線が抽出された画像は、例えば図４（ｂ）、図６（ｂ）のようになる。
【００２４】
節点認識部２６は、文書画像中の罫線の位置を特定して、節点位置を求める部分である。例えば、節点認識部２６では、文書画像の縦（ｙ軸）方向に並んだ画素のラインについて、黒画素の数をカウントし、これをすべてのｘ座標について行う。このカウントした黒画素の数（累積値）を図示すると、図９（ａ）、（ｃ）のようになる。なお、図９は、ｘ座標、ｙ座標ごとの黒画素の累積値を図示したグラフであり、（ａ）が文字消去前のｘ方向、（ｂ）が文字消去前のｙ方向、（ｃ）が文字消去後のｘ方向、（ｄ）が文字消去後のｙ方向である。図９（ａ）、（ｃ）に示すように、累積値は、ｘ座標で間歇的に高い値を示す。その累積値が高いｘ座標が、罫線の縦線が位置する座標になる。
同様に、文書画像の横（ｘ軸）方向に並んだ画素のラインについて、黒画素の数をカウントし、これをすべてのｙ座標について行う。これを図示すると、図９（ｂ）、（ｄ）のようになる。ｙ座標についても、高い累積値を示すｙ座標が、罫線の横線が位置する座標になる。
【００２５】
（罫線位置の特定）
この縦線のｘ座標、及び横線のｙ座標の認識の仕方については、種々考えられる。
例えば、図１０に示すように、ｘ軸上に連続した３つの注目点（ｋ−１，ｋ，ｋ＋１）をとり、それらを移動させながら、累積値の分布（輪郭）を追跡する。図９（ｃ）、（ｄ）から分かるように、罫線抽出処理をして、罫線以外の部分（連結画素）を消去した上で黒画素の累積値の計算をすると、罫線がある箇所以外は平らになっており、平らな部分と罫線との境界が検知しやすくなっている。
ｈ（ｋ）を座標ｋにおける黒画素の累積値の値とし、罫線部分とそれ以外の部分との累積値の差をしきい値ＴＨＲとすると、次式（１）〜（３）の３式を共に満足する位置（ｋ＋１）を罫線が始まる位置とする（図１０（ａ）参照）。
ｈ（ｋ＋１）−ｈ（ｋ）＞ＴＨＲ・・・・・（１）
ｈ（ｋ＋１）−ｈ（ｋ−１）＞ＴＨＲ・・・・・（２）
｜ｈ（ｋ）−ｈ（ｋ−１）｜＜ＴＨＲ・・・・・（３）
罫線が始まる位置を特定した後に、次式（４）〜（６）を共に満たす位置（ｋ−１）を罫線の終わりを示す位置とする（図１０（ｂ）参照）。
ｈ（ｋ−１）−ｈ（ｋ）＞ＴＨＲ・・・・・（４）
ｈ（ｋ−１）−ｈ（ｋ＋１）＞ＴＨＲ・・・・・（５）
｜ｈ（ｋ＋１）−ｈ（ｋ）｜＜ＴＨＲ・・・・・（６）
そして、節点の座標を一つに決める必要があるときには、これらの判定により求められた罫線の始まりの位置と終わりの位置の中央の位置（座標）などを罫線の座標とすればよい。
【００２６】
また、図９（ｃ）、（ｄ）のように、罫線の位置の累積値とそれ以外の累積値に顕著な差がでるので、累積値のしきい値ＴＨＲを設定して、しきい値ＴＨＲより累積値が大きい場合には、罫線の部分であると判定し、小さい場合には、罫線が位置する座標ではないと判定してもよい。この場合、しきい値ＴＨＲの設定方法としては、例えば、累積値の平均値の定数倍をしきい値ＴＨＲとして設定する等があげられる。そして、節点の座標を決める必要があるときは、隣接して罫線と判定された部分を１本の罫線と考え、隣接する座標群のうちから適当な座標、例えば、最も小さい座標や、中央の座標を罫線の座標とすればよい。
【００２７】
（節点位置）
そして、縦線が位置するｘ座標と、横線が位置するｙ座標のすべての組合せを作り、組み合わせたｘ、ｙ座標が節点の座標となる。たとえば、ｘ座標が、１０，２０，５０に縦線があり、ｙ座標が１００，１２０に横線があると判定された場合には、ｘ座標、ｙ座標のすべての組合せ、すなわち（１０，１００）、（１０，１２０）、（２０，１００）、（２０，１２０）、（５０，１００）、（５０，１２０）が節点の座標となる。
このようにして、節点位置を求めて、罫線と重ねて図示した例が図１１である。図１１（ａ）においては、節点の位置を○印で示している。なお、Ｅ．Ｒ．（ＥｌｅｍｅｎｔＲｅｇｉｏｎ）は文字などが配置される要素領域を示す。図１１に示す例では、罫線として、５本の縦線と、５本の横線が認識された結果、各縦線のｘ座標と各横線のｙ座標を組み合わせた２５個の節点が認識されている。なお、図１１（ａ）に示す節点Ｎ４４のように、縦線及び横線のいずれも存在しない点も、線がない節点として認識される。
【００２８】
節点符号化部２７は、各節点が、要素領域の右上、左上、右下、または左下の角部になりうるかを、形状から判定（これを、「角位置判定」とする。）した後、各節点のなりうる角位置の組合せから、各節点を符号化する部分である。
【００２９】
（角位置判定）
角位置の判定は、例えば次のようにして行う。図１２は、角位置の判定方法を説明する図である。図１２（ａ）において、各マスは画素を示しており、斜線で塗りつぶした画素は黒画素を示している。図１２（ａ）では、横の罫線Ｒ１と縦の罫線Ｒ２が交差している。横の罫線Ｒ１は、座標ｙｓが罫線の始まりであり、座標ｙｅが罫線の終わりである。縦の罫線Ｒ２は、座標ｘｓが罫線の始まりであり、座標ｘｅが罫線の終わりである。
前記した節点位置の特定により、座標ｙｓ，ｙｅ，ｘｓ，ｘｅは特定されており、一つの節点の角位置判定については、この座標の周辺のみで行えばよい。
角位置の判定には、角１つにつき、次の第１〜第３の３ステップの処理がある。基本的な処理については、４種類の角（右上、左上、右下、左下）のすべてに共通しているので、ここでは節点が左上の角になりうるかどうかを判定する処理を例にして説明する。
【００３０】
［第１ステップ］
図１２（ａ）に太線で囲んで示した領域Ｄを設定する。領域Ｄは、縦横の罫線の太さ分に限定された領域を、右下に１画素（ｘ方向に１画素、ｙ方向に１画素）移動した領域である。
そして、要素領域の左上になりうる点を、領域Ｄから探す。つまり、次式（７）〜（９）を共に満たす点を探す。なお、０は白画素、１は黒画素であることを示す。
ｂ（ｘ，ｙ）＝０・・・・・（７）
ｂ（ｘ，ｙ−１）＝１・・・・・（８）
ｂ（ｘ−１，ｙ）＝１・・・・・（９）
これらの条件式を満たす点が存在すれば、その点を注目点Ｔ（ｘ_ｔ，ｙ_ｔ）とする。また、存在しなければ、その時点でその節点が左上の角位置を構成しない（なりえない）と判定する。
【００３１】
［第２ステップ］
罫線を構成する黒画素に沿って、注目点Ｔを所定数の画素数、例えば２００画素（ｐｉｘｅｌ）移動させる。注目点Ｔの移動は、進行方向右側に黒画素があるように行う。つまり、図１２（ａ）の下方向の矢印２のように、進行方向右側に黒画素が位置した状態で図中下方向に移動する。すると、図１２（ｂ）に示したような、逆Ｔ字型の節点であって、左上の角を構成しない場合には、注目点Ｔはｘ座標が小さくなる方向に進み、ｘｓに達してしまう。逆に、その節点が図中左上の角位置になりうるなら、注目点Ｔはｘ座標が大きくはなる（ｘｓから離れる）が、小さくはならないはずである。そこで、注目点Ｔのｘ座標がｘｓに達した場合には、その節点は左上の角位置を構成せず、ｘｓに達しない場合には、その節点は左上の角位置を構成すると判定する。
第１ステップのみでは、図１２（ｂ）のように、逆Ｔ字型の節点において縦の罫線が横の罫線を一部はみ出した場合でも、左上の角位置を構成するという条件を満たしてしまうが、本ステップにより、このような節点が左上の角位置を構成しないと正しく判定できる。
【００３２】
［第３ステップ］
注目点Ｔを初期の座標（ｘ_ｔ，ｙ_ｔ）に戻し、罫線を構成する黒画素に沿って、注目点を所定数の画素数、例えば２００画素移動させる。この場合は、進行方向左側に黒画素があるように行う。例えば図１２（ａ）では、図中右向きの矢印３のように、進行方向左側に黒画素が位置した状態で右方向へ進行する。前記した第２ステップと同様に、もし、その節点が左上の角位置になりうるなら、注目点Ｔはｙ座標が大きくはなる（ｙｓから離れる）が、小さくはならないはずである。そこで、この移動の際、注目点のｙ座標がｙｓに達した場合には、その節点が左上の角位置を構成しないと判定される。この移動を最後まで終了した場合には、その節点が左上の構成要素であると判定する。
【００３３】
以上の第１〜第３ステップの処理を、右上、右下、及び左下の角位置についても行い、角位置になりうるか否かの判定結果を記憶装置１５に記憶する。そして、以上の処理をすべての節点について行う。
【００３４】
（符号化）
角位置判定ができたならば、図１３の節点符号化テーブルを参照して、角位置になりうるかどうかの値（○又は×）から、各節点に節点番号を割り当てる。
図１３の節点符号化テーブル２９は、占有する要素領域の角位置（左上、右上、左下、右下）の組合せについて、それぞれ１つの節点番号が関係づけられている。例えば、図１３の２行目のように、左上の角位置のみ構成する節点は、節点番号ｎ_１が関係づけられている。
この節点符号化テーブルにしたがって節点を節点番号に置き換えると、図１１（ｂ）のような行列（これを、「節点行列」という）とすることができる。図１１（ｂ）は、図１１（ａ）の表構造の各節点を図１３の節点符号化テーブルにしたがって符号化したものである。
図１３の節点番号はｎ_０からｎ_９の１０種類であるから、それぞれ０から９の１桁の数字であらわせば、図１１（ｃ）のように符号化することもできる。すなわち、１節点あたり１文字の数字でテキスト表現することができ、コンピュータプログラム言語内で、表を符合として取り扱うときに簡便である。
【００３５】
（符号化の後処理）
節点符号化部２７は、節点を符号化した後、冗長なデータ部分を削除して、表以外の部分を除いた圧縮されたデータとする。例えば、図１４（ａ）に示すような表Ｔ１と、下線Ｕ１が抽出された画像で節点行列を求めると、図１４（ｂ）のようになる。図１４（ｂ）では、下線Ｕ１の部分が、非節点（ｎ_０）と認識されて、１行目がすべてｎ_０（図では、強調するため大文字のＮ_０で表現している）となる。このように、一行または一列の全体にわたって非節点（ｎ_０）が連続している場合には、その行または列を削除することにより、データ量を低減することができる。行を削除した後のデータは、図１４（ｃ）のようになる。
なお、この後処理を行うかどうかは、本発明では任意である。
【００３６】
以上のような罫線抽出符号化装置１の動作について、図１５のフローチャートを参照しながら説明する。
まず、文書画像入力部２１が、文書Ｄをスキャナ１２で画像として取り込み、２値化する（Ｓ１１）。例えば、図４（ａ）、図６（ａ）のような画像を入力する。そして、連結画素認識部２２が、２値化画像中の連結画素を認識する（Ｓ１２）。次に、周囲長演算部２３は、各連結画素について周囲長を計算する（Ｓ１３）。この周囲長に基づいて、しきい値演算部２４は、周囲長と、その周囲長を有する連結画素の度数を、階級に分けて関係づける（Ｓ１４）。これを図示すると、例えば、図５や、図７のようなグラフになる。この関係（グラフ）において、最小階級と最大階級の点を直線で結び（Ｓ１５）、この直線から最も離れた点（階級）を探して、その周囲長をしきい値ＴＨＣとする（Ｓ１６）。
そして、罫線判定部２５は、各連結画素の周囲長と、しきい値ＴＨＣを比較して、それぞれの連結画素が罫線か否かを判定する（Ｓ１７）。さらに、罫線でないと判定された連結画素を、元の文書画像から消去して（Ｓ１８）、新たな画像を生成する（図４（ｂ）、図６（ｂ）参照）。
次に、節点認識部２６は、罫線判定部２５が生成した新たな画像について、ｘ座標、ｙ座標ごとに黒画素の累積値を求める（Ｓ１９）。例えば、図９（ｃ）、（ｄ）のような累積値の分布が得られる。そして、累積値の分布を追跡して節点の位置を求める。つまり、注目座標の累積値と、隣の座標の累積値とを比較して、しきい値ＴＨＲより大きいか否か（前記式（１）〜（６）参照）で、罫線の存在の有無及び位置を判断する（Ｓ２０）。
そして、節点符号化部２７は、各節点位置において、その節点が、要素領域（表構造）の右上、左上、右下、または左下の角位置を構成できるかを判定する（Ｓ２１）。この判定の際には、隣接画素との関係のみで、（［第１ステップ］）その角位置になれるかの判断と、罫線（黒画素）に沿って移動させていった場合にも、罫線があるｘ座標またはｙ座標を跨がないか（［第２ステップ］［第３ステップ］）により、要素座標の角位置を構成できるかどうかを判断する。
最後に、節点符号化部２７は、節点符号化テーブル２９を参照して、角位置の組合せから各節点を符号（ｎ_０〜ｎ_９）へ変換する（Ｓ２２）。
【００３７】
以上のような実施形態によれば、次のような効果がある。
まず、罫線の位置を認識するのに、連結画素と周囲長の関係を作り、この関係において、直線と点の距離を求め、この距離が最も遠い点をしきい値ＴＨＣとし、罫線の判定をするという、簡単な処理により罫線を判定できるので、処理速度を早くすることができる。
そして、罫線を抽出した後の画像から、罫線の位置（節点の位置）を求めるので、罫線の抽出ミスを少なくすることができる。
また、節点が、右上、左上、右下、または左下のいずれの角位置を構成しうるかの組合せに対し、１対１に対応させた節点符号化テーブルを参照して、前記組合せから節点番号を割り当てて符号化するので、極めて簡単に罫線を符号化することができる。さらに、後処理として、非節点の行または列を削除することで、不要なデータを捨てて、データ量を低減することができる。
また、本発明の節点符号化テーブルでは、１０通りの節点で整理してあるので、各節点モデルに０から９の数字を節点番号として割り当てることで、コンピュータプログラム上での表構造の取り扱いが簡便となる。
【００３８】
なお、本発明は、罫線の抽出・符号化方法及び罫線抽出プログラムであるので詳しくは説明しないが、符号化した罫線のデータは、前記した罫線符号化テーブルを利用して、節点番号から節点モデルを求め、節点モデルの直線を延長するように繋いでいくことで罫線（表構造）を再現することができる。
【００３９】
【発明の効果】
以上詳述した通り、本発明では、罫線を少ない処理、データ量で簡単に抽出し、符号化することができる。
【図面の簡単な説明】
【図１】実施形態に係る罫線抽出符号化装置の構成図である。
【図２】実施形態に係る罫線の抽出方法および符号化方法の処理の概略を示すフローチャートである。
【図３】実施形態に係る罫線抽出符号化装置の機能ブロック図である。
【図４】入力される画像の例であり、（ａ）が原画像、（ｂ）が罫線抽出後を示す。
【図５】図４の例における周囲長と度数の関係を示すグラフである。
【図６】入力される画像の例であり、（ａ）が原画像、（ｂ）が罫線抽出後を示す。
【図７】図６の例における周囲長と度数の関係を示すグラフである。
【図８】図７に対応する周囲長と度数の関係から、しきい値を求める方法を説明する図である。
【図９】ｘ座標、ｙ座標ごとの黒画素の累積値を図示したグラフであり、（ａ）が文字消去前のｘ方向、（ｂ）が文字消去前のｙ方向、（ｃ）が文字消去後のｘ方向、（ｄ）が文字消去後のｙ方向である。
【図１０】（ａ）は、罫線が始まる位置の探索、（ｂ）は、罫線が終わる位置の探索を説明する図である。
【図１１】（ａ）は、抽出された罫線と節点を重ねて示した図であり、（ｂ）は、（ａ）の罫線を符号化した行列、（ｃ）は、（ｂ）の添え字を使用して数値化した行列を示す。
【図１２】罫線の角位置の判定方法を説明する図であり、（ａ）が十字の節点の場合、（ｂ）が逆Ｔ字の節点の場合である。
【図１３】節点符号化テーブルである。
【図１４】符号化の後処理を説明する図であり、（ａ）が罫線抽出後の画像、（ｂ）が符号化後の行列、（ｃ）が後処理後の行列を示す。
【図１５】実施形態に係る罫線抽出符号化装置の動作を示すフローチャートである。
【符号の説明】
１１ＣＰＵ
１２スキャナ
１５記憶装置
２１文書画像入力手段
２２連結画素認識部
２３周囲長演算部
２４しきい値演算部
２５罫線判定部
２６節点認識部
２７節点符号化部
２９節点符号化テーブル

[0001]
[Technical field of the invention]
The present invention relates to a method for extracting ruled lines from image data of a document.
[0002]
[Prior art]
In today's advanced information society, many documents have been digitized and can be used through various media. Paper documents are nevertheless still used in large quantities in many situations, and in recent years the demand for automatically digitizing them has grown rapidly, both to disclose information and to improve work efficiency. OCR (Optical Character Reader) is a great help in this conversion from “paper” to “electronic data”. As document image processing technology has improved, OCR has become more practical, and with the spread of computers equipped with image scanners and the falling price of OCR software with high recognition rates, an environment in which OCR can be used throughout society is taking shape.
[0003]
However, current OCR often cannot cope with document images that contain components other than characters, such as figures, tables, and ruled lines. Some OCR software lets the user specify the attribute of each region by hand, but in that case the process cannot be fully automated. There is also OCR software that restricts its input to tabular documents and specializes in recognizing table structure, but its output represents the table structure with considerable redundancy rather than concisely. That is, it merely maps ruled lines onto a fine grid, and the system cannot be said to understand the table structure.
[0004]
Tables are a form of expression used very frequently throughout society. Unlike running text, they are not structurally complex; they are used to arrange content by attribute and to organize its meaning, which makes them a particularly important element within a document. Therefore, a table cannot be said to have been understood unless its representation not only reproduces the structure visually but is also concise and free of redundancy. In recent years, XML (eXtensible Markup Language) has attracted attention as a technology for promoting electronic commerce (EC: Electronic Commerce) and electronic data interchange (EDI: Electronic Data Interchange). XML assigns attributes to the data of each region to make text processing easier, so if the designation of regions is redundant, it becomes difficult to assign attributes to the data.
[0005]
As a premise for encoding a table structure on the basis of such an understanding, a technique for converting the table structure (ruled lines) into electronic data is required. One known technique binarizes the input document image, counts the black pixels in each row and each column in the vertical and horizontal directions of the image, and extracts ruled lines based on these projection values (for example, see Patent Document 1).
[0006]
[Patent Document 1]
JP 2001-109888 A (Paragraph 0064, FIG. 18, etc.)
[0007]
[Problems to be solved by the invention]
However, with this method it has been difficult to extract ruled lines accurately when the table is strongly skewed or when the document image contains runs of the same character.
Therefore, an object of the present invention is to provide a method for extracting ruled lines easily and accurately.
[0008]
[Means for Solving the Problems]
To solve the above problem, claim 1 of the present invention is a method for extracting ruled lines from a document image in which characters and ruled lines are mixed, the method comprising a step of acquiring a binarized image of the document, a step of recognizing connected pixels, a step of obtaining the perimeter of each connected pixel, and a step of determining that a connected pixel whose perimeter is larger than a threshold value is a ruled line.
[0009]
With this method, ruled lines can be distinguished from other elements according to whether the perimeter of a connected pixel, that is, a portion of the binarized image recognized as a single lump, is larger than the threshold value. The perimeter here is the length of one circuit along the contour of the connected pixel and is obtained, for example, as the number of pixels that make up the contour.
[0010]
In the invention of claim 2, the threshold value is determined by a process including a step of storing the connected pixels in the document image, a step of accumulating, for each stored connected pixel, the perimeter by class to obtain a frequency, a step of finding, in orthogonal coordinates whose axes are the perimeter and the frequency, the local maximum point, which is the point farthest from the straight line connecting the point in the class of the smallest perimeter and the point in the class of the largest perimeter, and a step of setting the perimeter of the local maximum point as the threshold value.
[0011]
When the perimeters are accumulated by class, the frequencies are counted, and the relationship between perimeter and frequency is plotted in orthogonal coordinates, characters and the like appear with high frequency at small perimeters, whereas ruled lines appear with only low frequency at large perimeters. The frequency is therefore high where the perimeter is small but drops sharply beyond a certain perimeter, so the graph of this relationship is a curve that bends sharply at that perimeter. To find the point where the curve bends sharply, the point farthest from the straight line connecting the point in the class of the smallest perimeter and the point in the class of the largest perimeter is obtained (in this specification, this point is called the “local maximum point”). If the perimeter of this local maximum point is set as the threshold value, only the connected pixels whose perimeter is larger than the threshold value are selected, and the ruled lines can be extracted.
The term “class” is used here because the points that can be plotted in the orthogonal coordinates are not continuous in perimeter but exist at discrete values. Using the perimeter of the local maximum point as the threshold value therefore means that the connected pixels actually selected as ruled lines start from the class following the threshold value.
[0012]
The invention of claim 3 is a method for encoding a ruled line image extracted by the ruled line extraction method of claim 1 or claim 2, the method comprising a step of recognizing, from the ruled line image, node positions at which ruled lines intersect, a step of determining, at each node position, whether the node can be the upper right, upper left, lower right, or lower left corner of the table structure, and a step of assigning a node number to the node by referring to a table in which node numbers correspond one-to-one to the combinations of whether the node can be each of the upper right, upper left, lower right, and lower left corners.
Because the node positions are obtained from the image in which the ruled lines have been extracted, they can be determined more accurately than if they were obtained directly from an image in which ruled lines and characters are mixed. Furthermore, by using a table in which node numbers correspond one-to-one to the combinations of whether a node can be the upper right, upper left, lower right, or lower left corner, a node number can be assigned solely by judging which corners the node can be. In these inventions, each step is executed by a computer.
[0013]
Claim 4 of the present invention is a ruled line extraction program for extracting ruled lines from a document image in which characters and ruled lines are mixed, the program causing a computer to function as means for inputting a binarized image of the document, means for recognizing connected pixels, means for obtaining the perimeter of each connected pixel, and means for determining that a connected pixel whose perimeter is larger than a threshold value is a ruled line.
[0014]
[Embodiments of the invention]
Next, an embodiment of the present invention will be described with reference to the drawings as appropriate.
In the drawings referred to, FIG. 1 is a configuration diagram of a system that executes the ruled line extraction method and encoding method according to the embodiment (hereinafter referred to as the “ruled line extraction encoding device”), FIG. 2 is a flowchart outlining the processing of the ruled line extraction method and encoding method according to the embodiment, and FIG. 3 is a functional block diagram of the ruled line extraction encoding device according to the embodiment.
[0015]
As shown in FIG. 1, the ruled line extraction encoding apparatus 1 consists of a general-purpose computer, which includes a central processing unit (CPU) 11, a keyboard 13 and a mouse 14 as input devices, a display 16 as an output device, and a storage device 15, connected to a scanner 12 that reads a document D containing ruled lines as a digital image (hereinafter simply referred to as an “image”). The storage device 15 stores computer programs that implement the functions described later, and these are loaded into the CPU 11 and executed as appropriate.
[0016]
The outline of the ruled line extraction method and encoding method according to the embodiment will be described with reference to FIG. 2. First, the document D is acquired as a binarized image using the scanner 12 (S1). Next, connected pixels, each forming a single connected region, are recognized in the acquired image (S2). The length of the outline (perimeter) is then calculated for each connected pixel (S3), and whether the connected pixel is a ruled line is determined from its perimeter (S4). Because a ruled line has a longer perimeter than a character, a connected pixel may be judged to be a ruled line if its perimeter is larger than a certain threshold value. Finally, nodes are found from the ruled lines and encoded, whereby the ruled lines are encoded (S5). This encoding is performed by referring to the node encoding table 29 shown in FIG. 13.
[0017]
Next, details of the ruled line extraction encoding apparatus will be described with reference to FIG.
As shown in FIG. 3, the ruled line extraction encoding apparatus 1 includes a document image input unit 21, a connected pixel recognition unit 22, a perimeter calculation unit 23, a threshold value calculation unit 24, a ruled line determination unit 25, It has a node recognition unit 26, a node encoding unit 27, and a node encoding table 29.
[0018]
The document image input unit 21 is configured as software for operating the scanner 12 described above. The document image input unit 21 takes in the document D as a binarized image from the scanner 12 and stores the document D in the storage device 15. When an image is input as a color image or a monochrome multi-tone image, the image may be input and then binarized.
The image input by the document image input unit 21 is, for example, the original image of FIG. 4(a) or the original image of FIG. 6(a).
[0019]
The connected pixel recognition unit 22 is a unit that recognizes a series of black pixels (connected pixels) from the binarized image of the document D input by the document image input unit 21. When a document in which characters and ruled lines are marked in white on a black background is input, it may be configured to recognize white pixels.
Various methods can be used for recognizing connected pixels.
One example is to label each pixel. This method refers to the already-scanned pixels around the pixel of interest (those on the line above and the one to its left) and, if any of them is black, gives the pixel of interest the same label as that pixel. Performing this processing over the entire image in order from the top-left pixel gives each connected pixel a unique label.
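The raster-scan labeling described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name `label_components`, the use of a union-find table to merge labels that turn out to belong to the same region, the second relabeling pass, and the choice of 8-connectivity are all assumptions. The image is assumed to be a list of rows with 1 for black and 0 for white.

```python
def label_components(img):
    """Label 8-connected black (1) pixels; returns a grid of labels (0 = background)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # union-find table; index 0 is reserved for the background

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            # Already-scanned neighbours: three pixels on the line above, one to the left.
            neigh = []
            for dx, dy in ((-1, -1), (0, -1), (1, -1), (-1, 0)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and ny >= 0 and labels[ny][nx]:
                    neigh.append(find(labels[ny][nx]))
            if not neigh:
                parent.append(len(parent))  # start a new provisional label
                labels[y][x] = len(parent) - 1
            else:
                root = min(neigh)
                labels[y][x] = root
                for n in neigh:
                    parent[n] = root  # record that these labels are equivalent

    # Second pass: replace provisional labels with compact final labels 1, 2, ...
    final = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels
```

The second pass is needed because a U-shaped region receives two provisional labels before the scan reaches the row that joins its arms.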
[0020]
The perimeter calculation unit 23 calculates the perimeter of each connected pixel recognized by the connected pixel recognition unit 22. The perimeter is the length of the outline of the connected pixel and is obtained, for example, by counting the total number of pixels that make up the outline (for example, pixels for which at least one of the four pixels above, below, left, and right does not belong to the same connected pixel). The perimeter is of course not limited to this method and may be obtained in other ways. When white pixels are enclosed by black pixels, as inside a ruled-line frame, the black pixels along this inner edge may or may not be counted toward the perimeter; counting them is preferable because characters and ruled lines can then be distinguished more accurately.
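A minimal sketch of the contour-pixel count described above, assuming the labeled grid comes from a labeling step (0 for background, positive integers for connected pixels). The function name `perimeters` is hypothetical, and treating the image border as "not part of the same connected pixel" is an assumption. Because inner-edge pixels also have a non-member neighbour, this variant counts them too, which is the option the text calls preferable.

```python
from collections import Counter


def perimeters(labels):
    """Return {label: number of contour pixels} for a labeled grid (0 = background)."""
    h, w = len(labels), len(labels[0])
    counts = Counter()
    for y in range(h):
        for x in range(w):
            lab = labels[y][x]
            if lab == 0:
                continue
            # A contour pixel has at least one 4-neighbour outside its own component.
            for nx, ny in ((x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)):
                inside = 0 <= nx < w and 0 <= ny < h
                if not inside or labels[ny][nx] != lab:
                    counts[lab] += 1
                    break
    return dict(counts)
```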
[0021]
The threshold value calculation unit 24 builds the relationship between the perimeter of each connected pixel obtained by the perimeter calculation unit 23 and the frequency of connected pixels having that perimeter, and from this relationship calculates the threshold value for discriminating ruled lines from characters. The perimeters are best grouped into classes of a suitable width according to the resolution of the acquired document image. For example, when the perimeter is counted as the number of pixels forming the outline of a connected pixel, classes are created every 10 pixels, every 50 pixels, every 100 pixels, or the like, and each connected pixel is assigned to one of these classes. Then, for each class, the number of connected pixels falling in that class is accumulated to obtain the frequency.
The relationship between the perimeter and the frequency is, for example, as shown in FIG. 5 and FIG. 7. FIG. 5 plots the connected pixels of the document (original image) of FIG. 4(a) with the class (perimeter) on the horizontal axis and the frequency of each class on the vertical axis, and FIG. 7 plots the connected pixels of the document (original image) of FIG. 6(a) in the same way. As these figures show, the frequency is high in the low classes (short perimeters), falls sharply as the class rises (as the perimeter grows), and remains low up to the highest class. This is because, in a document image in which characters and ruled lines are mixed, characters generally have short perimeters and occur very frequently, whereas ruled lines have long perimeters and occur very rarely.
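The class-wise accumulation can be sketched as below. The function name `perimeter_histogram`, the default class width of 10 pixels, and the decision to return only non-empty classes (each identified by its lower bound) are assumptions made for illustration.

```python
from collections import Counter


def perimeter_histogram(perimeter_values, class_width=10):
    """Group perimeters into classes and return sorted (class lower bound, frequency) pairs."""
    counts = Counter(p // class_width for p in perimeter_values)
    return sorted((k * class_width, n) for k, n in counts.items())
```

For example, `perimeter_histogram([3, 8, 12, 15, 19, 250])` yields three points: many short perimeters in the low classes and a single long one far to the right, which is the shape described for FIG. 5 and FIG. 7.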
[0022]
A method of obtaining, from a perimeter-frequency relationship such as that of FIG. 5 or FIG. 7, the class threshold used to decide whether a connected pixel is a ruled line will be described with reference to FIG. 8. FIG. 8 illustrates how the threshold value is obtained from the perimeter-frequency relationship corresponding to FIG. 7.
As shown in FIG. 8, the original perimeter-class axis (horizontal axis) is taken as the x axis and the frequency axis (vertical axis) as the y axis. A straight line connecting the point (x_1, y_1) of the class with the smallest perimeter and the point (x_N, y_N) of the class with the largest perimeter is drawn and taken as the x' axis. Next, the y' axis is taken in the direction orthogonal to the x' axis (in FIG. 8, it is drawn from (x_1, y_1) with the lower-left direction as positive). The x'y' coordinate system is called the S' system.
Then, in the S' system, the point whose projection onto the y' axis is largest is taken as the local maximum point (x_m', y_m'), and the perimeter of its class is set as the threshold value THC.
As is evident from the distribution of points in FIG. 8, the local maximum point (x_m', y_m') is the point farthest from the x' axis.
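The search for the local maximum point can be sketched as a point-to-line distance computation. This is a hedged sketch: the function name `select_threshold` is hypothetical, the input is assumed to be a list of at least two distinct (perimeter class, frequency) points sorted by class, and the signed distance is taken as positive below the chord, matching the lower-left orientation of the y' axis in FIG. 8. No axis rescaling is applied, which is also an assumption; the result depends on the relative scale of the two axes.

```python
def select_threshold(points):
    """Return the perimeter class of the point farthest below the chord joining
    the first and last points of a (perimeter class, frequency) list."""
    (x1, y1), (xn, yn) = points[0], points[-1]
    dx, dy = xn - x1, yn - y1
    norm = (dx * dx + dy * dy) ** 0.5

    def signed_distance(p):
        # Positive when p lies below the chord (toward the lower left in FIG. 8).
        return (dy * (p[0] - x1) - dx * (p[1] - y1)) / norm

    return max(points, key=signed_distance)[0]
```

With the points `[(0, 100), (10, 40), (20, 5), (30, 2), (100, 1), (200, 1)]`, the knee of the curve is at class 20, so `select_threshold` returns 20 and connected pixels in higher classes would be treated as ruled lines.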
[0023]
The ruled line determination unit 25 determines whether each connected pixel is a ruled line based on the threshold value THC obtained by the threshold value calculation unit 24. That is, for each connected pixel, its perimeter is compared with the threshold value THC; if the perimeter is larger than THC, the connected pixel is determined to be a ruled line, and if it is smaller, it is determined not to be a ruled line.
In addition, the ruled line determination unit 25 generates an image in which portions (connected pixels) other than the ruled lines are deleted from the document image. The images from which the ruled lines have been extracted in this way are, for example, as shown in FIGS. 4B and 6B.
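The determination and erasure step can be sketched as follows. The function name `extract_ruled_lines` and its inputs (a labeled grid and a `{label: perimeter}` mapping, with perimeters expressed in the same units as THC) are assumptions, as is treating a perimeter equal to THC as "not a ruled line", a case the text leaves open.

```python
def extract_ruled_lines(labels, perimeter_by_label, thc):
    """Return a binary image keeping only components whose perimeter exceeds thc."""
    keep = {lab for lab, p in perimeter_by_label.items() if p > thc}
    # Pixels of every other component (characters and the like) are erased to 0.
    return [[1 if lab in keep else 0 for lab in row] for row in labels]
```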
[0024]
The node recognizing unit 26 is a part that specifies a position of a ruled line in a document image to obtain a node position. For example, the node recognition unit 26 counts the number of black pixels for a line of pixels arranged in the vertical (y-axis) direction of the document image, and performs this for all x coordinates. FIGS. 9A and 9C illustrate the counted number of black pixels (accumulated value). FIG. 9 is a graph showing the cumulative value of black pixels for each of the x and y coordinates, where (a) is the x direction before character erasure, (b) is the y direction before character erasure, and (c) Is the x direction after character erasure, and (d) is the y direction after character erasure. As shown in FIGS. 9A and 9C, the cumulative value shows a high value intermittently on the x coordinate. The x-coordinate with the higher accumulated value is the coordinate at which the vertical line of the ruled line is located.
Similarly, the number of black pixels is counted for a line of pixels arranged in the horizontal (x-axis) direction of the document image, and this is performed for all y coordinates. This is illustrated in FIGS. 9B and 9D. As for the y-coordinate, the y-coordinate indicating the high cumulative value is the coordinate at which the horizontal line of the ruled line is located.
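The two projections can be sketched in a few lines. The function name `projections` is hypothetical, and the image is assumed to be a list of rows with 1 for black and 0 for white.

```python
def projections(img):
    """Return (column sums, row sums) of black pixels for a binary image."""
    col_sums = [sum(col) for col in zip(*img)]  # one value per x coordinate
    row_sums = [sum(row) for row in img]        # one value per y coordinate
    return col_sums, row_sums
```

Peaks in the column sums mark the x coordinates of vertical ruled lines, and peaks in the row sums mark the y coordinates of horizontal ruled lines.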
[0025]
(Identification of ruled line position)
There are various ways of recognizing the x-coordinate of the vertical line and the y-coordinate of the horizontal line.
For example, as shown in FIG. 10, three consecutive points of interest (k-1, k, k+1) are taken on the x-axis, and the distribution (contour) of the cumulative value is tracked while moving them. As can be seen from FIGS. 9(c) and 9(d), once the ruled line extraction process has deleted the portions (connected pixels) other than the ruled lines, the cumulative value of black pixels is flat away from the ruled lines, so the boundary between a flat part and a ruled line is easy to detect.
Let h(k) be the cumulative value of black pixels at coordinate k, and let THR be a threshold value for the difference between the cumulative value of a ruled line portion and that of other portions. A position satisfying all of the following expressions (1) to (3) is taken as the position where a ruled line starts (see FIG. 10A).
h(k+1) - h(k) > THR ... (1)
h(k+1) - h(k-1) > THR ... (2)
|h(k) - h(k-1)| < THR ... (3)
After the position where the ruled line starts has been specified, a position (k-1) satisfying all of the following expressions (4) to (6) is taken as the position where the ruled line ends (see FIG. 10B).
h(k-1) - h(k) > THR ... (4)
h(k-1) - h(k+1) > THR ... (5)
|h(k+1) - h(k)| < THR ... (6)
When a single coordinate is needed for a ruled line, for example to fix the coordinates of a node, the center of the start position and the end position determined in this way may be used as the coordinate of the ruled line.
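The scan with expressions (1) to (6) can be sketched as below. This is an illustration under assumptions, not the patented procedure itself: the text gives (k-1) as the end position but does not state which of the three points is the start, so the sketch assumes (k+1) by symmetry. The function names are hypothetical.

```python
def find_ruled_line_spans(h, thr):
    """Scan the cumulative values h and return (start, end) index pairs."""
    spans = []
    start = None
    for k in range(1, len(h) - 1):
        if start is None:
            # Expressions (1) to (3): flat at k-1 and k, then a jump at k+1.
            if (h[k + 1] - h[k] > thr
                    and h[k + 1] - h[k - 1] > thr
                    and abs(h[k] - h[k - 1]) < thr):
                start = k + 1  # assumed start position (not stated in the text)
        else:
            # Expressions (4) to (6): high at k-1, then flat at k and k+1.
            if (h[k - 1] - h[k] > thr
                    and h[k - 1] - h[k + 1] > thr
                    and abs(h[k + 1] - h[k]) < thr):
                spans.append((start, k - 1))
                start = None
    return spans


def span_centers(spans):
    """Use the center of each start/end pair as the ruled line coordinate."""
    return [(start + end) // 2 for start, end in spans]
```

Because three points are examined at once, a ruled line touching the very first or last coordinates of the image is not detected by this sketch.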
[0026]
Further, as shown in FIGS. 9C and 9D, there is a marked difference between the cumulative value at a ruled line position and the cumulative values elsewhere. Therefore, a threshold value THR may instead be set for the cumulative value itself: if the cumulative value at a coordinate is larger than THR, the coordinate is determined to be part of a ruled line, and if it is smaller than THR, the coordinate is determined not to be a position where a ruled line is located. In this case, the threshold value THR can be set, for example, to a constant multiple of the average of the cumulative values. When the coordinates of a node need to be determined, adjacent coordinates determined to be ruled line are regarded as one ruled line, and an appropriate coordinate, for example the smallest coordinate or the central coordinate, may be used as the coordinate of the ruled line.
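This simpler variant might look like the following sketch. The multiplier `factor` is a hypothetical parameter (the text says only "a constant multiple"), and the central coordinate is used as the representative of each run of adjacent coordinates.

```python
def ruled_line_coords(h, factor=2.0):
    """Return one coordinate per ruled line using THR = factor * mean(h)."""
    if not h:
        return []
    thr = factor * sum(h) / len(h)  # constant multiple of the average
    coords = []
    run_start = None
    for k, value in enumerate(h):
        if value > thr:
            if run_start is None:
                run_start = k  # a run of ruled line coordinates begins
        elif run_start is not None:
            # The run ended at k-1; adjacent coordinates count as one line.
            coords.append((run_start + k - 1) // 2)
            run_start = None
    if run_start is not None:
        coords.append((run_start + len(h) - 1) // 2)
    return coords
```

Choosing the smallest coordinate instead would only require appending `run_start` rather than the midpoint.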
[0027]
(Node position)
Then, all combinations of the x coordinates where vertical lines are located and the y coordinates where horizontal lines are located are created, and each combined (x, y) pair becomes the coordinates of a node. For example, when vertical lines are found at x = 10, 20, and 50 and horizontal lines at y = 100 and 120, all combinations of the x and y coordinates, that is, (10, 100), (10, 120), (20, 100), (20, 120), (50, 100), and (50, 120), are the coordinates of the nodes.
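Forming every combination is a Cartesian product. A short sketch using the example coordinates from the text (the function name `node_coordinates` is hypothetical):

```python
from itertools import product


def node_coordinates(vertical_xs, horizontal_ys):
    """Pair every vertical-line x coordinate with every horizontal-line y coordinate."""
    return list(product(vertical_xs, horizontal_ys))


nodes = node_coordinates([10, 20, 50], [100, 120])
```

Here `nodes` holds the six pairs listed above, in the same order.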
FIG. 11 shows an example in which node positions obtained in this way are superimposed on the ruled lines. In FIG. 11A, the positions of the nodes are indicated by circles, and E.R. (Element Region) indicates an element region where characters and the like are arranged. In the example shown in FIG. 11, five vertical lines and five horizontal lines are recognized as ruled lines, and as a result, 25 nodes combining the x coordinate of each vertical line with the y coordinate of each horizontal line are recognized. Note that a point with neither a vertical line nor a horizontal line, such as the node N44 shown in FIG. 11A, is also recognized, as a node having no line.
[0028]
The node encoding unit 27 determines from the shape of each node whether the node can be the upper right, upper left, lower right, or lower left corner of an element region (this is referred to as “corner position determination”), and encodes each node from the combination of corner positions it can form.
[0029]
(Corner position determination)
The corner position determination is performed, for example, as follows. FIG. 12 is a diagram illustrating a method for determining the corner position. In FIG. 12A, each square represents a pixel, and pixels shaded with oblique lines represent black pixels. In FIG. 12A, a horizontal ruled line R1 and a vertical ruled line R2 intersect. For the horizontal ruled line R1, the coordinate ys is the start of the ruled line and the coordinate ye is its end. For the vertical ruled line R2, the coordinate xs is the start of the ruled line and the coordinate xe is its end.
The coordinates ys, ye, xs, and xe are obtained when the node positions are specified as described above, and the corner position of a node need only be determined in the vicinity of these coordinates.
The corner position determination consists of the following first to third steps for each corner. Since the basic processing is common to all four types of corners (upper right, upper left, lower right, lower left), the processing for determining whether a node can be the upper left corner is described here as an example.
[0030]
[First step]
An area D indicated by a thick line in FIG. 12A is set. The area D is an area obtained by moving the area limited by the thickness of the vertical and horizontal ruled lines by one pixel to the lower right (one pixel in the x direction and one pixel in the y direction).
Then, a point that can be the upper left of the element region is searched from the region D. That is, a point that satisfies the following equations (7) to (9) is searched. Note that 0 indicates a white pixel and 1 indicates a black pixel.
b (x, y) = 0 (7)
b (x, y-1) = 1 (8)
b (x-1, y) = 1 (9)
If there is a point that satisfies these conditional expressions, that point is taken as the point of interest T (x_t, y_t). If there is no such point, it is determined at that stage that the node does not (cannot) constitute the upper left corner position.
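The search of the first step can be sketched as follows. It assumes the image is indexed as `b[y][x]` and that the area D is passed in as inclusive bounds `(x0, y0, x1, y1)`; both conventions and the function name are hypothetical.

```python
def find_upper_left_candidate(b, region):
    """Search area D for a point satisfying expressions (7) to (9).

    b:      2D list indexed as b[y][x], 0 = white pixel, 1 = black pixel
    region: (x0, y0, x1, y1), inclusive bounds of the area D (assumed format)
    Returns the first matching (x, y), or None when no point qualifies.
    """
    x0, y0, x1, y1 = region
    height, width = len(b), len(b[0])
    # Start at 1 so that the neighbors at x-1 and y-1 stay inside the image.
    for y in range(max(y0, 1), min(y1, height - 1) + 1):
        for x in range(max(x0, 1), min(x1, width - 1) + 1):
            # (7) white here, (8) black directly above, (9) black directly left.
            if b[y][x] == 0 and b[y - 1][x] == 1 and b[y][x - 1] == 1:
                return (x, y)
    return None
```

For a one-pixel-thick cross centered at (2, 2), the area D is the single pixel (3, 3), which qualifies; for an inverted T-shaped node of the same thickness it does not, because the pixel to the left is white.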
[0031]
[Second step]
The point of interest T is moved by a predetermined number of pixels, for example 200 pixels, along the black pixels forming the ruled line. The point of interest T is moved so that the black pixels are on its right side in the traveling direction. That is, as shown by the downward arrow 2 in FIG. 12A, the point moves downward in the figure with the black pixels on the right side in the traveling direction. If the node is an inverted T-shaped node as shown in FIG. 12B and does not form the upper left corner, the point of interest T advances in the direction in which the x coordinate decreases and eventually reaches xs. Conversely, if the node can be the upper left corner, the x coordinate of the point of interest T may become larger (moving away from xs) but never becomes smaller. Therefore, when the x coordinate of the point of interest T reaches xs, it is determined that the node does not constitute the upper left corner position, and when it does not reach xs, it is determined that the node can constitute the upper left corner position.
With the first step alone, the condition for forming the upper left corner position is satisfied even when, as shown in FIG. 12B, the vertical ruled line partially protrudes beyond the horizontal ruled line at an inverted T-shaped node. This second step makes it possible to determine correctly that such a node does not constitute the upper left corner position.
[0032]
[Third step]
The point of interest T is returned to its initial coordinates (x_t, y_t) and is again moved by a predetermined number of pixels, for example 200 pixels, along the black pixels forming the ruled line. This time, the point is moved so that the black pixels are on its left side in the traveling direction. For example, in FIG. 12A, as shown by the rightward arrow 3, the point moves rightward with the black pixels on the left side in the traveling direction. As in the second step, if the node can be the upper left corner, the y coordinate of the point of interest T may become larger (moving away from ys) but never becomes smaller. Therefore, if the y coordinate of the point of interest reaches ys during this movement, it is determined that the node does not constitute the upper left corner position. If the movement is completed without reaching ys, it is determined that the node constitutes the upper left corner position.
[0033]
The above processing of the first to third steps is also performed for the upper right, lower right, and lower left corner positions, and the result of the determination as to whether or not the corner positions can be obtained is stored in the storage device 15. Then, the above processing is performed for all nodes.
[0034]
(Coding)
Once the corner positions have been determined, a node number is assigned to each node by referring to the node encoding table of FIG. 13, based on the values (○ or ×) indicating whether each corner position can be formed.
In the node encoding table 29 of FIG. 13, one node number is associated with each combination of the corner positions (upper left, upper right, lower left, lower right) of the element regions that a node forms. For example, as shown in the second row of FIG. 13, a node that forms only the upper left corner position is associated with the node number n₁.
By replacing the nodes with node numbers according to the node encoding table, a matrix as shown in FIG. 11B (called a “node matrix”) is obtained. FIG. 11B is obtained by encoding each node of the table structure of FIG. 11A according to the node encoding table of FIG. 13.
The node numbers in FIG. 13 are of ten types, n₀ to n₉, each of which can be represented by a single-digit number from 0 to 9, so the node matrix can be encoded as shown in FIG. 11C. In other words, each node can be expressed in text as a single digit, which is convenient when handling a table as a code in a computer programming language.
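The table lookup and the one-digit text encoding can be sketched as below. The actual contents of FIG. 13 are not reproduced in this text, so the table here is a placeholder: only n₁ (upper left corner only) and the use of n₀ for a non-node come from the description, n₀ is assumed to mean that no corner is formed, and the numbering of n₂ to n₉ is an assumption made purely for illustration.

```python
# Keys are (upper_left, upper_right, lower_left, lower_right) flags.
# Hypothetical numbering: only n0 and n1 follow the description;
# the assignment of 2 to 9 is a placeholder, not the table of FIG. 13.
NODE_TABLE = {
    (False, False, False, False): 0,  # non-node (assumed: forms no corner)
    (True, False, False, False): 1,   # upper left only (from the description)
    (False, True, False, False): 2,
    (False, False, True, False): 3,
    (False, False, False, True): 4,
    (True, True, False, False): 5,
    (True, False, True, False): 6,
    (False, True, False, True): 7,
    (False, False, True, True): 8,
    (True, True, True, True): 9,
}


def encode_node_matrix(corner_flags):
    """Turn rows of corner-flag tuples into rows of single-digit text."""
    return ["".join(str(NODE_TABLE[flags]) for flags in row) for row in corner_flags]
```

Because each node becomes one character, a whole table structure reduces to a few short strings that are easy to store or compare.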
[0035]
(Post-processing of encoding)
After encoding the nodes, the node encoding unit 27 deletes redundant data to obtain compressed data that excludes portions other than tables. For example, when a node matrix is obtained from an image in which a table T1 and an underline U1 have been extracted as shown in FIG. 14A, the result is as shown in FIG. 14B. In FIG. 14B, the underline U1 is treated as non-nodes (n₀), so the first row consists entirely of n₀ (written as capital N₀ in the figure for emphasis). When a row or column consists of consecutive non-nodes (n₀) in this way, the data amount can be reduced by deleting that row or column. The data after deleting the row is as shown in FIG. 14C.
Whether or not to perform this post-processing is optional in the present invention.
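The post-processing can be sketched as follows, assuming the node matrix is a list of rows of integer node numbers with 0 standing for the non-node n₀ (the function name is hypothetical):

```python
def drop_non_node_lines(matrix):
    """Delete every row and column that consists only of non-nodes (0)."""
    # Keep rows that contain at least one real node.
    rows = [row for row in matrix if any(value != 0 for value in row)]
    if not rows:
        return []
    # Keep columns that contain at least one real node in the remaining rows.
    keep = [c for c in range(len(rows[0])) if any(row[c] != 0 for row in rows)]
    return [[row[c] for c in keep] for row in rows]
```

Applied to a matrix whose first row came from an underline, this removes that row and leaves the rows belonging to the table.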
[0036]
The operation of the ruled line extraction encoding apparatus 1 described above will now be described with reference to the flowchart of FIG. 15.
First, the document image input unit 21 takes in the document D as an image by the scanner 12 and binarizes it (S11). For example, an image as shown in FIGS. 4A and 6A is input. Then, the connected pixel recognition unit 22 recognizes the connected pixels in the binarized image (S12). Next, the perimeter calculator 23 calculates the perimeter for each connected pixel (S13). Based on the perimeter, the threshold calculator 24 associates the perimeter with the frequency of the connected pixel having the perimeter by classifying the perimeter (S14). When this is illustrated, for example, a graph as shown in FIG. 5 or FIG. 7 is obtained. In this relationship (graph), points of the minimum class and the maximum class are connected by a straight line (S15), a point (class) farthest from the straight line is searched for, and the perimeter thereof is set as a threshold THC (S16).
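Steps S14 to S16 can be sketched as below. This is a simplified illustration under assumptions: each distinct perimeter value is treated as its own class (the text does not fix the class width here), the two axes are used without any scaling, and the function name `perimeter_threshold` is hypothetical.

```python
import math
from collections import Counter


def perimeter_threshold(perimeters):
    """Return THC: the perimeter of the class farthest from the end-to-end line."""
    if not perimeters:
        raise ValueError("at least one connected pixel is required")
    # S14: frequency of connected pixels per perimeter class, sorted by perimeter.
    hist = sorted(Counter(perimeters).items())
    # S15: straight line through the minimum class and the maximum class.
    (x1, y1), (xn, yn) = hist[0], hist[-1]
    norm = math.hypot(xn - x1, yn - y1)
    if norm == 0:
        return x1  # only one class exists, so there is nothing to compare

    def distance(point):
        x, y = point
        # Perpendicular distance from (x, y) to the line through the two ends.
        return abs((yn - y1) * (x - x1) - (xn - x1) * (y - y1)) / norm

    # S16: the perimeter of the farthest class becomes the threshold THC.
    return max(hist, key=distance)[0]
```

With many short perimeters (characters) and one very long perimeter (a ruled line), the farthest class tends to sit at the knee of the curve, just where the character perimeters die out.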
Then, the ruled line determination unit 25 compares the perimeter of each connected pixel with the threshold value THC to determine whether or not each connected pixel is a ruled line (S17). Further, the connected pixels determined to be not ruled lines are deleted from the original document image (S18), and a new image is generated (see FIGS. 4B and 6B).
Next, the node recognition unit 26 obtains the cumulative value of black pixels for each x coordinate and y coordinate of the new image generated by the ruled line determination unit 25 (S19). For example, distributions of cumulative values as shown in FIGS. 9C and 9D are obtained. Then, the positions of the nodes are obtained by tracking the distribution of the cumulative values. That is, the cumulative value at the coordinate of interest is compared with the cumulative values at the adjacent coordinates, whether a ruled line is present is determined by whether the difference is greater than the threshold value THR (see expressions (1) to (6)), and the node positions are determined (S20).
Then, at each node position, the node encoding unit 27 determines whether the node can form the upper right, upper left, lower right, or lower left corner position of an element region (table structure) (S21). In this determination, whether the node can be a corner position is first determined only from the relationship between adjacent pixels ([first step]), and then, while a point is moved along the ruled line (black pixels), whether the node can form a corner position of the element region is determined by whether the point crosses a certain x coordinate or y coordinate ([second step] and [third step]).
Lastly, the node encoding unit 27 refers to the node encoding table 29 and encodes each node from its combination of corner positions (n₀ to n₉) (S22).
[0037]
According to the above embodiment, the following effects can be obtained.
First, to recognize ruled lines, the relationship between the perimeter of the connected pixels and its frequency is created, and within this relationship the distance between a straight line and each point is evaluated. Since ruled lines can be determined by such simple processing, the processing speed can be increased.
Next, the positions of the ruled lines (the positions of the nodes) are determined from the image after the ruled lines have been extracted, so mistakes in extracting the ruled lines can be reduced.
Further, a node number is assigned to each node by referring to the node encoding table, in which node numbers correspond one-to-one to the combinations of upper right, upper left, lower right, and lower left corner positions that a node can form, so ruled lines can be encoded very easily. Furthermore, by deleting non-node rows or columns as post-processing, unnecessary data can be discarded and the data amount reduced.
Further, since the node encoding table of the present invention contains ten types of node, assigning a number from 0 to 9 to each node model as its node number simplifies the handling of the table structure in a computer program.
[0038]
Since the present invention is a ruled line extraction and encoding method and a ruled line extraction program, the reproduction of ruled lines from the encoded data is not described in detail. However, from the encoded ruled line data, the ruled lines (table structure) can be reproduced by using the node encoding table and connecting the nodes so as to extend the straight lines of each node model.
[0039]
[Effects of the invention]
As described in detail above, according to the present invention, ruled lines can be easily extracted and encoded with a small amount of processing and a small amount of data.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a ruled line extraction encoding apparatus according to an embodiment.
FIG. 2 is a flowchart illustrating an outline of processing of a ruled line extraction method and an encoding method according to the embodiment;
FIG. 3 is a functional block diagram of a ruled line extraction encoding apparatus according to the embodiment.
FIG. 4 is an example of an input image, in which (a) shows an original image and (b) shows a ruled line after extraction.
FIG. 5 is a graph showing a relationship between a perimeter and a frequency in the example of FIG. 4;
FIG. 6 is an example of an input image, in which (a) shows an original image and (b) shows a ruled line after extraction.
FIG. 7 is a graph showing a relationship between a perimeter and a frequency in the example of FIG. 6;
FIG. 8 is a diagram for explaining a method of obtaining a threshold value from the relationship between the perimeter and the frequency corresponding to FIG. 7;
FIG. 9 shows graphs of the cumulative values of black pixels for each of the x coordinate and the y coordinate, where (a) is the x direction before character erasure, (b) is the y direction before character erasure, (c) is the x direction after character erasure, and (d) is the y direction after character erasure.
FIG. 10A is a diagram illustrating a search for a position where a ruled line starts, and FIG. 10B is a diagram illustrating a search for a position where a ruled line ends.
FIG. 11A is a diagram showing extracted ruled lines with nodes superimposed, FIG. 11B is a matrix obtained by encoding the ruled lines of FIG. 11A, and FIG. 11C shows the matrix converted to numeric characters.
FIGS. 12A and 12B are diagrams for explaining a method of determining a corner position of a ruled line, where FIG. 12A illustrates the case of a cross node and FIG. 12B illustrates the case of an inverted T-shaped node.
FIG. 13 is a node encoding table.
FIGS. 14A and 14B are diagrams for explaining post-processing of encoding. FIG. 14A shows an image after ruled line extraction, FIG. 14B shows a matrix after encoding, and FIG. 14C shows a matrix after post-processing.
FIG. 15 is a flowchart illustrating an operation of the ruled line extraction encoding apparatus according to the embodiment.
[Explanation of symbols]
11 CPU
12 Scanner
15 Storage device
21 Document image input unit
22 Connected pixel recognition unit
23 Perimeter calculation unit
24 Threshold value calculation unit
25 Ruled line determination unit
26 Node recognition unit
27 Node encoding unit
29 Node encoding table

Claims

A method of extracting ruled lines from a document image in which characters and ruled lines are mixed,
Obtaining an image of the binarized document;
Recognizing connected pixels;
Determining the perimeter of the connected pixel;
Determining a connected pixel whose perimeter is greater than a threshold value as a ruled line.

In determining the threshold,
Storing connected pixels in the document image;
For each of the stored connected pixels, accumulating the perimeters by class to obtain a frequency;
On orthogonal coordinates using the perimeter and the frequency as coordinate axes, finding a local maximum point, which is the point farthest from a straight line connecting the point of the class with the smallest perimeter and the point of the class with the largest perimeter;
Setting the perimeter of the local maximum point as the threshold value.

A method of encoding a ruled line image extracted by the ruled line extracting method according to claim 1 or 2,
Recognizing, from the ruled line image, node positions where ruled lines intersect;
At each node position, determining whether the node can be the upper right, upper left, lower right, or lower left corner of the table structure;
Referring to a table in which node numbers are associated one-to-one with combinations of the upper right, upper left, lower right, and lower left corners, and assigning a node number to each node.

A ruled line extraction program for extracting ruled lines from a document image in which characters and ruled lines are mixed, the program causing a computer to function as:
Means for inputting an image of a binarized document;
Means for recognizing connected pixels;
Means for determining the perimeter of a connected pixel;
Means for determining a connected pixel having a perimeter longer than a threshold value to be a ruled line.