JP2004094427A

JP2004094427A - Slip image processor and program for realizing the same device

Info

Publication number: JP2004094427A
Application number: JP2002252347A
Authority: JP
Inventors: Minenobu Seki; 関　峰伸; Shoji Ikeda; 池田　尚司; Yutaka Sako; 酒匂　裕
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2002-08-30
Filing date: 2002-08-30
Publication date: 2004-03-25

Abstract

PROBLEM TO BE SOLVED: To provide a pattern recognition method and device that capture the surface of a document as a grayscale image and generate a binary image suited to recognizing characters, symbols, and marks, thereby achieving highly accurate pattern recognition even when the document image is composed of frame lines, preprinted characters, backgrounds, and entered characters, symbols, and marks of various colors.

SOLUTION: A document such as a slip is input as an image; the frame structure drawn on the document is extracted from the input image; the document image is divided into a plurality of areas based on that frame structure; a binarization threshold suited to recognizing the characters, symbols, and marks written in each frame is calculated for each divided area; a binary image is generated for each area using its threshold; and the characters, symbols, and marks are recognized from the generated binary images.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、文書の表面を濃淡画像として取り込み、この画像から文字、記号、マーク等のパターンを認識するパターン認識方法及びパターン認識装置に関するものである。
【０００２】
【従来の技術】
文字、記号、マーク等のパターン認識の中で、文字を認識する場合を例に述べる。記号、マークのパターンを認識する場合も同様である。
一般的に文書の濃淡画像から文字認識を行う場合，文字の部分が黒，背景が白となるように２値化処理を行い，この処理で生成された２値画像から文字を切り出し、その文字の形から文字の識別を行う。この２値化方法には様々な方法が存在し、これらは２つの観点から分類できる。一つは、閾値を決定するための処理単位であり、代表的なものは▲１▼画像中の指定された全領域（全面領域）、▲２▼着目する画素を中心とする近傍の微小領域（局所領域）、▲３▼画像をメッシュ状に分割した領域（メッシュ領域）、を処理単位とする方法である。もう一つは、閾値を決定するための特徴量であり、代表的なものには（ａ）濃淡値の頻度分布、（ｂ）濃淡値の平均値、中央値、最大値、最小値を用いる方法がある。▲１▼全面領域を処理単位とする場合、斎藤泰一，山田博三，“２平均中点の荷重平均を採用するしきい値選定法と２値化評価用データ”電子情報通信学会論文誌，Ｄ−２，Ｖｏｌ．Ｊ８３−Ｄ−２，Ｎｏ．２，ｐｐ．５７５−５８３に記載の方法で、（ａ）の特徴量をもとに判別分析基準やｋ−ｍｅａｎｓ法による２値化方法を用いるのが一般的である。▲２▼局所領域を処理単位とする場合、Ｏｉｖｉｄ　Ｄｅｕ　Ｔｉｅｒ，Ａｎｂｉｌ　Ｋ．Ｊａｉｎ，“Ｇｏａｌ−Ｄｉｒｅｃｔｅｄ　Ｅｖａｌｕａｔｉｏｎ　ｏｆ　Ｂｉｎａｒｉｚａｔｉｏｎ　Ｍｅｔｈｏｄｓ”，ＩＥＥＥ　Ｔｒａｎｓ．Ｐａｔｔｅｒｎ　Ａｎａｌｙｓｉｓ　ａｎｄ　Ｍａｃｈｉｎｅ　Ｉｎｔｅｌｌｉｇｅｎｃｅ，ｖｏｌ．１７，ｎｏ．１２，ｐｐ．１１９１−１２０１，１９９５．に記載のように、（ｂ）の特徴量（平均値）を閾値として２値化するのが一般的である。図１１にその例を示す。▲３▼メッシュ領域を処理単位とする場合、特開平６−４７０６にあるように、一度（ｂ）の特徴量（平均値）を用いて全面を２値化し、その２値画像から文字切りだしを行い、その文字切りだし結果から、一つの文字のサイズを推定し、一つの文字毎の領域（メッシュ状）に分割し、分割した領域毎に改めて（ｂ）の特徴量（平均値）を用いて２値化する方法がある。
帳票等の記入枠を多く含む文書画像において、文書中の枠内の文字をすべて認識する場合、図３に示すように、文書画像を入力し（３０１）、文書全面を上記のいずれかの方法で全面の２値画像を生成し（３０２）、生成した２値画像から枠構造を抽出し（３０３）、２値画像内を枠毎の領域に分割し（３０４）、枠領域毎に文字切りだし行い、文字認識を行う（３０５）。そして、文書画像内にある複数の枠のうち、その一部の枠内の文字だけを読取る場合は、特開２０００−２９３６２９にあるように、例えば図４に示すような文書画像を入力し（４０１）、入力された画像内の帳票位置を検出し（４０２）、スキャナ取り込み時の帳票伸縮を検出し、予め用意しておいた同一フォーマットの帳票画像の枠位置情報をもとに、濃淡画像内の読取りたい枠の領域を推定し切り出す（４０３）。そして、切り出された領域（切り出し領域）で、上記のいずれかの方法を用いて２値化を行い（４０５）、得られた２値画像から枠構造を抽出し（４０６）、２値画像内を枠毎の領域に分割し（４０７）、文字認識を行う（４０８）。ただし、帳票用紙のサイズのばらつき、画像内の帳票位置の検出誤差、スキャナ取り込み時の帳票伸縮検出誤差があり、切り出し領から記入枠や記入枠内の文字がはみ出してしまう場合がある。このため切り出す領域は推定した枠位置よりも少し広い領域を切り出さなければならない。
【０００３】
【発明が解決しようとする課題】
文字、記号、マーク等のパターン認識の中で、文字を認識する場合を例に述べる。記号、マークのパターンを認識する場合も同様である。
帳票等の文書画像には、様々な色の枠線、プレ印刷文字、背景、記入文字で構成されているものがある。このような帳票を多値画像として入力し、その多値画像内にある一部の枠内の文字を認識する場合の読取り領域は、例えば図６のように、（６０１）の記入文字（濃い）、（６０２）の記入文字（薄い）、（６０３）の枠線（濃い）、（６０４）のプレ印刷文字、（６０５）の枠線（薄い）、（６０６）の背景部（濃い）、（６０７）の背景部（薄い）で構成される。この読取り領域内を２値化する場合を考える。従来の技術で述べた▲１▼全面領域を処理単位として２値化を行うと、読取り領域内の濃淡分布は図７のように、図６中の（６０１）の濃淡分布である（７０１）、図６中の（６０２）の濃淡分布である（７０２）、図６中の（６０３）の濃淡分布である（７０３）、図６中の（６０４）の濃淡分布である（７０４）、図６中の（６０５）の濃淡分布である（７０５）、図６中の（６０６）の濃淡分布である（７０６）、図６中の（６０７）の濃淡分布である（７０７）となる。そして２値化閾値は図８に示す閾値Ａ（８０１）、閾値Ｂ（８０２）となる。しかし、閾値Ａによる２値化結果は図９となり、読取り対象の文字“６　３０”を抽出することができず、その他の文字のみ抽出しているため、文字の認識ができない。また閾値Ｂによる２値化結果は図１０であり、目的の文字が濃い背景部分に塗りつぶされてしまい、文字の認識ができない。このように従来の技術▲１▼の方法では、読取り領域内に様々な濃淡値の枠線、プレ印刷文字、背景、記入文字がある場合、濃淡分布が複雑に重なり合うため、読取りたい文字とそのまわりの背景を区別できるような閾値を推定することが難しくなるという問題がある。また従来の技術で述べた▲２▼局所領域を処理単位として２値化を行うと、記入文字の濃淡値（７０２）とプレ印刷文字の濃淡値（７０４）が低く（濃く）、背景色の濃淡値（７０６）が高い（薄い）場合、図１２に示すように、記入文字とプレ印刷文字がともに抽出されるため記入文字とプレ印刷文字が重なってしまい（１２０１、１２０２）、文字を正しく認識できない。また記入文字の背景色の濃淡値（７０２）が低く（濃く）、プレ印刷文字の濃淡値（７０４）と背景色の濃淡値（７０６）が高い（薄い）場合（ただしプレ印刷文字の濃淡値は背景の濃淡値よりも低い）、図１３に示すように、プレ印刷文字の一部が欠けてしまい、文字を正しく認識できない。また図１４に示すように枠線の濃淡値（７０３）が低く、記入文字の濃淡値（７０２）が高い場合の２値化結果は図１５となり、記入文字の一部が欠けてしまい文字を正しく認識できない。そして従来の技術で述べた▲３▼のメッシュ領域を処理単位として２値化を行うと、帳票内の文字のピッチ、大きさは様々であり、記入文字とプレ印刷文字は重なる場合があるため、文字毎（メッシュ状）に区切ることは困難である。ゆえに、それぞれのメッシュ領域内には様々な濃淡値の記入文字、プレ印刷文字、枠線、背景が含まれる場合が生じ、▲１▼▲２▼の２値化方法と同じ問題が発生する。そして、上記の問題は文書中の枠内の文字すべてを認識する場合にも同様に発生する。
また、文字認識結果の確認のために認識領域の２値画像をディスプレイに表示する場合、上記の問題が発生するため、認識対象の文字のかすれ、潰れや、認識対象となる枠のずれが生じるため、認識結果の確認が困難になる場合がある。
【０００４】
本発明はこのような従来技術がもっていた問題を鑑みてなされたものであって、帳票をはじめとする様々な色の枠線、プレ印刷文字、背景、記入文字で構成されている文書画像であっても、それぞれの枠内に記入された文字の認識に適した２値画像を生成し、精度の高い文字認識を実現し、また目視による認識結果の確認を容易にすることができる文字認識方法及び文字認識装置を提供することを目的とする。
【０００５】
【課題を解決するための手段】
前記課題を解決するために、本願の開示する代表的な発明は、様々な色の枠線、プレ印刷文字、背景、記入文字で構成されている文書画像であっても、その濃淡画像から文字認識に適した２値画像を生成する方法であって、文書全体の濃淡画像、あるいはその一部分を入力し（１０１）、その画像から文書に記載された枠の構造を抽出し（１０２）、得られた枠構造をもとに入力画像内を複数の領域に分割し（１０３）、分割された領域毎に文字認識に好適な２値化閾値を算出し（１０４）、得られた閾値を用いて分割された領域毎に２値画像を生成する（１０５）ことを特徴とする。
【０００６】
さらに帳票等の文書画像から文字、記号、マーク等のパターンを認識する方法において、文書中の特定の枠内にある文字を認識する場合に、文書の表面を濃淡画像として入力し、画像中にある文書の位置を検出し、予め同じフォーマットの文書内の枠位置を計測することにより用意しいておいたフォーマット（枠位置）情報を用いることにより、入力画像中から認識対象となる文字が記載される枠が存在する位置を推定し、推定した枠位置を上下左右に拡大し、拡大した領域を入力画像から切り出し、切り出した部分領域画像中の枠構造を抽出し、枠毎に枠内に記入されている文字を認識するのに適した２値化閾値を算出し、得られた閾値を用いて枠毎に２値画像を生成し、枠毎に文字認識を行うことを特徴としている。
【０００７】
【発明の実施の形態】
以下、図を用いて本願発明を説明する。本願の開示する発明は、図１に示すように、文書全体の濃淡画像、あるいはその一部分を入力し（１０１）、その画像から文書に記載された枠の構造を抽出し（１０２）、得られた枠構造をもとに入力画像内を複数の領域に分割し（１０３）、分割された領域毎に文字認識に好適な２値化閾値を算出し（１０４）、得られた閾値を用いて分割された領域毎に２値画像を生成する（１０５）ことを特徴とする。本願の構成により様々な色の枠線、プレ印刷文字、背景、記入文字で構成されている文書画像であっても、それぞれの枠内に記入された文字の認識に適した２値画像を生成し、精度の高い文字認識を実現する。
本発明の一実施例である帳票の枠内の文字を読取る方法の処理フローを示す図２６を用いて本願発明の全体的フローを詳細に説明する。まず、ステップ１０１にて画像を入力する。入力される画像は濃淡画像であり、図６のように様々な濃淡値を持つ記入文字、プレ印刷文字、枠線、背景で構成される。
次にステップ２６０１において、入力画像中の帳票位置を検出する。これは、図２８に示すように入力画像には、帳票部分と黒背景部分があり、入力画像中のどの位置に帳票が存在するのかを求めるものである。具体的には、入力画像の４隅のいずれかを原点とし、帳票の４隅座標を求める。
【０００８】
次にステップ２６０２において、読取り対象の文字が記載されている枠の領域を入力画像から切りだす。予め読取り対象となる帳票と同じフォーマットの帳票を用いて帳票内に記載されている枠の位置座標を計測し、図２７のフォーマット情報記憶部（２０９）にフォーマット情報として保持しておく。そして、このフォーマット情報の枠位置座標をもとに読取り対象の文字が記載されている枠の領域を切り出す。この際、画像取り込み時の帳票伸縮補正、傾き補正を行う。しかし、帳票位置検出、帳票伸縮補正、傾き補正の誤差や、帳票用紙のサイズのばらつきがあるため、読取り対象となる枠の４隅を正確に切り出すことはできず、保持していた枠の位置と読取る画像上の枠の位置がずれる場合がある。保持していた枠の位置情報をそのまま用いて領域を切り出し、その位置がずれた場合、読取り対象となる文字が欠けてしまい、読取ることができない場合がある。そこで枠の領域の切り出し処理では、予め保持しておいた帳票の枠の位置から推定した領域を上下左右に拡大した範囲を切り出す。この領域を切り出し領域と呼ぶ。図６に切り出し領域を例示する。この場合（６０２）を含み、領域の中心にある枠が読取り対象となる枠である。拡大する範囲は予め設定しておく。例えば、予め（Ａ，Ｂ，Ｃ，Ｄ）を定数として設定しておき、予め保持しておいた帳票の枠の位置から推定した領域を上方向にＡｍｍ，下方向にＢｍｍ，左方向にＣｍｍ，右方向にＤｍｍに拡大する。本構成により、切り出し領域内には読取り対象の文字が完全に含まれ、文字認識精度が向上する。また、認識結果の目視確認のために切り出し領域の画像を表示する際、読取り対象の文字が完全に含まれるため、確認が容易になる。
次に、ステップ１０２において、切り出し領域内の存在する記入枠の構造を抽出する。枠の構造を抽出するとは、枠を構成する罫線の位置を検出し（図１６）、検出された罫線によって囲まれる閉領域を検出することによって個々の枠の位置を算出することである（図１７）。図中の１７０１、１７０２、１７０３、１７０４、１７０５、１７０６、１７０７、１７０８は、枠構造抽出を行い、得られた個々の枠領域を示す。罫線の位置の検出は、一度全面を従来の技術で述べた▲２▼局所領域の平均値を閾値として２値化を行い、得られた２値画像から水平、垂直に連なる黒画素を検出することで行う。ただし、他の方法として森　俊二，坂倉　栂子，“画像認識の基礎２”，ｐ３−１１，オーム社，にあるように多値画像からハフ変換法を用いて検出する方法等を用いることもできる。また枠の位置の検出方法には様々な方法があるが、検出された水平方向の罫線と垂直方向の罫線の交点を検出し、検出した交点を辿り、交点を頂点とするような閉領域を検出する方法も用いることができる。
【０００９】
枠構造を抽出した後、ステップ１０３において、図１８に示すように得られた枠構造の情報を用いて切り出された濃淡画像内を枠毎の領域に分割する。この際、一つ一つの枠領域（単一枠領域）は、枠線の領域と枠線の領域に囲まれる領域（文字が記入される領域）に分割される。これにより、以降の処理において枠線の色を考慮しなくてもよい。　次に、ステップ１０４において単一枠領域毎に文字認識に適した２値化閾値を算出する。ステップ１０２にて枠構造を抽出するために生成した２値画像は用いず、改めて文字の認識に適した２値画像を生成する。ここでの２値化閾値の算出方法は、従来の技術で述べたｋ−ｍｅａｎｓ法を用いる。ｋ−ｍｅａｎｓ法は代表的なクラスタリング手法の一つであり、数値あるいは数値ベクトルを持つデータを、予め指定した数のグループ（クラスタ）に分割する方法である。文字認識に適した２値化閾値を算出するために、領域内の各画素を、輝度値を用いてクラスタリングすることにより、文字、プレ印刷文字、背景の画素を区別する。２値化閾値の算出方法には、従来の技術で述べたように他に様々な方法があり、例えば判別分析を用いた方法でも２値化可能であるが、尚、ｋ−ｍｅａｎｓ法は計算量が少なく処理時間が短くて済み、また精度の面でも他に劣らないという利点がある。ただし、帳票などの一般に使用されている文書画像中の単一枠領域内には、図６に記載するように１８０１や１８０７のように記入文字と背景、プレ印刷文字と背景の２色が存在する場合と、１８０４のように記入文字とプレ印刷文字と背景の３色が存在する場合のいずれかの場合であることが多い。そこで、枠領域内の濃淡値のヒストグラムを作成し、得られたヒストグラムに対してｋ−ｍｅａｎｓ法で２つのクラスタへのクラスタリング（以降クラスタリング（２）とする）と３つのクラスタへのクラスタリング（以降クラスタリング（３）とする）を行い、閾値を算出する。図２０に１８０１の濃淡ヒストグラム、図２１に１８０７の濃淡ヒストグラム、図１９に１８０４の濃淡ヒストグラムを示す。前もって単一枠領域内が何色で構成されるかがわかる場合は、予め設定しておいた数のクラスタリングを行えば良い。一方で多種のフォーマットの文書画像を扱う場合、それぞれの枠領域内の色数を事前に調べておくことは困難であるし、同種類の文書の同一位置の枠領域でもプレ印刷文字がある場合とない場合がある等、色数が限定されていない場合があるため、単一枠領域毎に、クラスタリング（２）とクラスタリング（３）を行い、それぞれ閾値を算出する。尚、扱う帳票によりクラスタ数は２若しくは３には限られない。ここでクラスタリング（３）を行うと、図１９に示すように２つの閾値が算出されるが、帳票に記入される文字、すなわち読取り対象の文字は、一般にプレ印刷文字よりも濃淡値が低い（濃い）ことから、得られた２つの閾値のうち濃淡値が低い方である閾値Ｃ（１９０４）を用いる。本実施例で取扱う帳票の単一枠領域内には上記のように２色、あるいは３色のみ存在したが、２値化を行う領域内にＮ色存在する場合には、Ｎ個のクラスタへのクラスタリングを行うことで各画素を分割することが考えられる。
【００１０】
次にステップ２６０７において、単一枠領域毎に算出した２つの閾値を用いて、単一枠領域内を２値化し２枚の２値画像を生成する。１８０４を閾値Ｃ（１９０４）で２値化した結果が図２２であり、１８０４を閾値Ｄ（１９０５）で２値化した結果が図２３である。図２２に示すように、閾値Ｃで２値化した場合には、プレ印刷文字を除き、記入文字のみを黒画素とすることができるが、図２３で示すように閾値Ｄで２値化した場合には記入文字とプレ印刷文字が重なってしまい、文字を正しく認識することができない。
【００１１】
次にステップ２６０９において、単一枠領域毎に文字認識を行う。このこのとき単一枠領域毎に、異なる２値化閾値（クラスタリング（２）による閾値とクラスタリング（３）による閾値）による２枚の２値画像があり、この２枚の２値画像に対して、それぞれ文字認識を行う。
【００１２】
次にステップ２６１０において、単一枠領域毎に得られた２値化閾値の異なる２組の文字認識結果のうち、枠内に記載される文字列の知識（金額、氏名、住所等）と一致する文字認識結果を読取り結果として出力する。文字列の知識とは、各帳票に記載されるべき個々の情報の属性をいい、枠内に記載される文字列の表記パターン、或いは全パターンのデータベースである。例えば金額ならば￥マーク、数字の羅列、カンマで構成される等の情報が格納され、氏名、住所ならば、記入される全氏名、全住所の表記パターンが蓄積される。そして、これらは図２のフォーマット情報記録部２０９に予め保持しておく。
【００１３】
図２７は、上記実施例の構成図である。図中、２７０１は帳票を濃淡画像として読み込むスキャナ等の濃淡画像を入力する手段、２７０２は読み込んだ濃淡画像、処理途中の２値画像を記憶しておく手段、２７０４は読み込んだ濃淡画像中の帳票の位置（４隅座標）を検出する手段、２７０５は読取り対象の文字が記載されている部分を濃淡画像から切り出す手段、２７０６は帳票の枠構造を抽出する手段、２６０７は濃淡画像内を枠毎の領域に分割する手段、２７０８は画像内の部分領域毎に文字認識に適した２値画像を生成する２値化閾値を算出する手段、２７０９は２７０８で得られた閾値により画像内の部分領域の２値画像を生成する手段、２７１０は２７０９で生成した２値画像から文字を切り出し、文字認識を行う手段、２７１１は２７１０で得られた文字認識結果のうち、読取り対象の枠内に記載される文字列の知識（金額、氏名、住所等）と一致するものを出力する手段である。２７０６は２７１２の枠構造を抽出するための２値化手段と２７１３の罫線を抽出する手段と２７１４の枠の位置検出する手段で構成される。そして、２７１５は２７０５にて文字が記載されている部分を切り出す際に用いる帳票のフォーマット（枠位置）情報を保持しておく手段、２７１６は２７１１で用いる文字列の知識を保持しておく手段である。
２７１８は認識結果を目視により確認するための２値画像表示部である。
尚、以上開示した本願発明はプログラムで実現し、コンピュータ等の情報機器で実行することもできる。
【００１４】
【発明の効果】
本発明によれば、帳票等の文書画像中の文字、記号、マーク等のパターン認識方法において、文書画像中の枠構造を抽出した後、この枠構造の情報を用いて文書画像内を一つ一つの枠領域に分割し、枠領域毎に枠内に記載されたパターンを認識するのに適した２値化閾値を算出し、この閾値を用いて２値画像（２値画像Ａ）を生成することによって、様々な色の枠線、プレ印刷文字、背景、記入文字で構成されている文書画像であっても、認識に好適な２値画像を生成することが可能となり認識の精度を上げることができる。また認識結果を確認するためのディスプレイに表示する２値画像を２値画像Ａにすることによって、様々な色の枠線、プレ印刷文字、背景、記入文字で構成されている文書画像であっても、かすれや潰れのない認識対象の文字を表示でき、また認識対象の枠を正確に表示できるため、認識結果を目視にて確認することが容易になる。
【図面の簡単な説明】
【図１】本発明を表す処理フローを示す図。
【図２】実施例の構成図。
【図３】帳票内の文字すべてを読取る場合の従来法による処理フローを示す図。
【図４】帳票内の特定の位置にある文字を読取る場合の従来法による処理フローを示す図。
【図５】枠構造抽出処理の処理フローの例を示す図。
【図６】様々な濃淡値の枠線、記入文字、プレ印刷文字、背景で構成される画像の例を示す図。
【図７】読取り領域内の濃淡ヒストグラムを示す図。
【図８】読取り領域の濃淡ヒストグラムをもとに算出した二値化閾値（Ａ，Ｂ）を示す図。
【図９】閾値Ａ（８０１）による二値画像を示す図。
【図１０】閾値Ｂ（８０２）による二値画像を示す図。
【図１１】局所領域毎の二値化法を示す図。
【図１２】局所領域毎の二値化法による二値画像例（１）を示す図。
【図１３】局所領域毎の二値化法による二値画像例（２）を示す図。
【図１４】濃い枠線に近接して薄い文字が存在する場合の多値画像を示す図。
【図１５】濃い枠線に近接して薄い文字が存在する場合の局所領域毎の二値化法による二値画像を示す図。
【図１６】罫線検出結果を示す図。
【図１７】枠検出結果を示す図。
【図１８】枠情報による領域分割画像を示す図。
【図１９】枠領域（１８０４）の濃淡ヒストグラムを示す図。
【図２０】枠領域（１８０１）の濃淡ヒストグラム（２）を示す図。
【図２１】枠領域（１８０７）の濃淡ヒストグラム（３）を示す図。
【図２２】閾値Ｃ（１９０４）による枠領域（１８０４）の二値画像を示す図。
【図２３】閾値Ｄ（１９０５）による枠領域（１８０４）の二値画像を示す図。
【図２４】閾値Ｅ（２００３）による枠領域（１８０１）の二値画像を示す図。
【図２５】閾値Ｆ（２１０３）による枠領域（１８０７）の二値画像を示す図。
【図２６】実施例の処理フローを示す図。
【図２７】実施例の入力画像の構成を示す図。
【符号の説明】
６０１…記入文字（濃い）、６０２…記入文字（薄い）、６０３…枠線（濃い）、６０４…プレ印刷文字、６０５…枠線（薄い）、６０６…背景部（濃い）、６０７…背景部（薄い）、７０１…図６中の上部に存在する記入文字列“０００２５７００３”と図６中の右部に存在する記入文字“１”の濃淡分布、７０２…図６の中心付近に存在する記入文字“６，３０”の濃淡分布、７０３…図６中の太い枠線の濃淡分布、７０４…図６中のプレ印刷文字“込期限，月，日，営業所”の濃淡分布、７０５…図６中の細い枠線の濃淡分布、７０６…図６中の中心部に存在する濃い背景部分の濃淡分布、７０７…図６中の薄い背景部分の濃淡分布、８０１、８０２…図６の濃淡ヒストグラムからｋ−ｍｅａｎｓ法により得られた閾値、１２０１、１２０２…局所領域毎の二値化法による二値画像例によって、記入文字とプレ印刷文字が重なる部分、１３０１、１３０２…局所領域毎の二値化法による二値画像例によって、プレ印刷文字が欠けた部分、１７０１、１７０２、１７０３、１７０４、１７０５、１７０６、１７０７、１７０８…枠構造抽出を行い、得られた個々の枠領域、１８０１、１８０２、１８０３、１８０４、１８０５、１８０６、１８０７、１８０８…枠構造の情報を用いて画像内を分割し、得られた個々の枠領域、１９０１…枠領域（１８０４）に存在する記入文字の濃淡分布、１９０２…枠領域（１８０４）に存在するプレ印刷文字の濃淡分布、１９０３…枠領域（１８０４）に存在する背景の濃淡分布、１９０４、１９０５…枠領域（１８０４）の濃淡ヒストグラムからｋ−ｍｅａｎｓ法を用いて算出した２値化閾値（Ｃ、Ｄ）、２００１…枠領域（１８０１）に存在する記入文字の濃淡分布、２００２…枠領域（１８０１）に存在する背景の濃淡分布、２００３…枠領域（１８０１）の濃淡ヒストグラムからｋ−ｍｅａｎｓ法を用いて算出した２値化閾値（Ｅ）、２１０１…枠領域（１８０７）に存在するプレ印刷文字の濃淡分布、２１０２…枠領域（１８０７）に存在する背景の濃淡分布、２１０３…枠領域（１８０７）の濃淡ヒストグラムからｋ−ｍｅａｎｓ法を用いて算出した２値化閾値（Ｆ）。
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a pattern recognition method and a pattern recognition device that capture the surface of a document as a grayscale image and recognize patterns such as characters, symbols, and marks in that image.
[0002]
[Prior art]
Of the patterns to be recognized (characters, symbols, marks, and the like), character recognition is described here as an example. The same applies to recognizing symbols and marks.
In general, when characters are recognized from a grayscale image of a document, the image is first binarized so that character portions become black and the background becomes white. Characters are then segmented from the resulting binary image, and each character is identified from its shape. Many binarization methods exist, and they can be classified from two viewpoints. The first is the processing unit over which the threshold is determined; typical units are (1) the entire designated area of the image (whole area), (2) a small neighborhood centered on the pixel of interest (local area), and (3) areas obtained by dividing the image into a mesh (mesh area). The second is the feature used to determine the threshold; typical features are (a) the frequency distribution of the gray values and (b) the mean, median, maximum, or minimum of the gray values.
(1) When the whole area is the processing unit, it is common to apply the discriminant analysis criterion or a k-means-based binarization method to feature (a), as described in Taiichi Saito and Hirozo Yamada, "Threshold Selection Method Using Weighted Average of Two Mean Midpoints and Data for Binarization Evaluation," Transactions of IEICE, D-2, Vol. J83-D-2, No. 2, pp. 575-583.
(2) When a local area is the processing unit, it is common to binarize using feature (b), the mean, as the threshold, as described in Oivind Due Trier and Anil K. Jain, "Goal-Directed Evaluation of Binarization Methods," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 12, pp. 1191-1201, 1995. FIG. 11 shows an example.
(3) When a mesh area is the processing unit, there is a method, described in JP-A-6-4706, that first binarizes the whole image using feature (b) (the mean), segments characters from that binary image, estimates the size of one character from the segmentation result, divides the image into one region per character (a mesh), and then binarizes each region again using feature (b) (the mean).
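The difference between processing units (1) and (2) can be sketched in a few lines. The following is only an illustration in plain Python, assuming an 8-bit grayscale image stored as a list of rows; the function names, the window radius, and the sample image are hypothetical and do not come from the cited references.

```python
def binarize_global(img, threshold):
    """Whole-area unit: one threshold for every pixel (1 = black, 0 = white)."""
    return [[1 if v < threshold else 0 for v in row] for row in img]


def binarize_local_mean(img, radius=1):
    """Local-area unit: compare each pixel with the mean of its neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            mean = sum(window) / len(window)
            # A pixel darker than its local mean is treated as ink.
            out[y][x] = 1 if img[y][x] < mean else 0
    return out


# A single dark pixel on a light background.
sample = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
global_result = binarize_global(sample, 128)
local_result = binarize_local_mean(sample, radius=1)
```

On this clean sample both units give the same result; they diverge once the background brightness varies across the image, which is the situation discussed in the following sections.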
When all characters inside the frames are to be recognized in a document image that contains many entry frames, such as a form, the processing proceeds as shown in FIG. 3: the document image is input (301), a binary image of the whole document is generated by one of the above methods (302), the frame structure is extracted from the binary image (303), the binary image is divided into regions for each frame (304), and characters are segmented and recognized for each frame region (305). When only the characters in some of the frames are to be read, the processing proceeds as described in JP-A-2000-293629 and shown, for example, in FIG. 4: the document image is input (401), the position of the form in the input image is detected (402), the stretch of the form introduced during scanning is detected, and the region of the frame to be read is estimated and cut out of the grayscale image based on frame position information from a form image of the same format prepared in advance (403). The cut-out region (cutout area) is then binarized by one of the above methods (405), the frame structure is extracted from the resulting binary image (406), the binary image is divided into regions for each frame (407), and character recognition is performed (408). However, because of variation in the size of the form paper, errors in detecting the position of the form in the image, and errors in detecting the stretch introduced during scanning, the entry frame or the characters inside it may extend beyond the cutout area. The region that is cut out must therefore be slightly larger than the estimated frame position.
[0003]
[Problems to be solved by the invention]
Of the patterns to be recognized (characters, symbols, marks, and the like), character recognition is described here as an example. The same applies to recognizing symbols and marks.
Some document images, such as forms, are composed of frame lines, preprinted characters, backgrounds, and entered characters of various colors. When such a form is input as a multi-level image and the characters in some of its frames are to be recognized, the reading area consists, for example as shown in FIG. 6, of dark entered characters (601), light entered characters (602), dark frame lines (603), preprinted characters (604), light frame lines (605), a dark background portion (606), and a light background portion (607). Consider binarizing this reading area.
If binarization is performed with (1) the whole area as the processing unit, the gray-level distribution of the reading area is as shown in FIG. 7, where (701) through (707) are the gray-level distributions of (601) through (607) in FIG. 6, respectively. The binarization thresholds are then threshold A (801) and threshold B (802) shown in FIG. 8. However, binarization with threshold A gives the result in FIG. 9: the target characters "6 30" cannot be extracted and only the other characters are extracted, so the target characters cannot be recognized. Binarization with threshold B gives the result in FIG. 10: the target characters are swallowed by the dark background portion and cannot be recognized. Thus, with method (1), when the reading area contains frame lines, preprinted characters, backgrounds, and entered characters with various gray values, the gray-level distributions overlap in a complicated way, and it becomes difficult to estimate a threshold that separates the characters to be read from the background around them.
If binarization is performed with (2) a local area as the processing unit, and the gray value of the entered characters (702) and that of the preprinted characters (704) are low (dark) while the gray value of the background (706) is high (light), both the entered characters and the preprinted characters are extracted, as shown in FIG. 12. They overlap (1201, 1202), and the characters cannot be recognized correctly. When the gray value (702) of the background color of the entered characters is low (dark) while the gray value of the preprinted characters (704) and that of the background (706) are high (light), with the preprinted characters darker than the background, part of the preprinted characters is lost, as shown in FIG. 13, and the characters cannot be recognized correctly. Likewise, when the gray value of the frame lines (703) is low and that of the entered characters (702) is high, as shown in FIG. 14, the binarization result is as shown in FIG. 15: part of the entered characters is lost, and the characters cannot be recognized correctly.
If binarization is performed with (3) a mesh area as the processing unit, dividing the image per character (into a mesh) is difficult, because the pitch and size of characters on a form vary and entered characters may overlap preprinted characters. Each mesh area may therefore contain entered characters, preprinted characters, frame lines, and background with various gray values, and the same problems arise as with methods (1) and (2). These problems also occur when all the characters in the frames of a document are to be recognized.
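The whole-area problem can be made concrete with a small numeric check. The sketch below uses invented gray values (they are not taken from FIG. 6 to FIG. 8) and a hypothetical helper, `separating_threshold`, that looks for a single threshold placing every target-character value below it and every other value at or above it.

```python
def separating_threshold(target, other):
    """Return t with every target value < t <= every other value, or None."""
    if max(target) >= min(other):
        return None  # the two groups overlap, so no single threshold works
    return (max(target) + min(other) + 1) // 2


# Hypothetical gray values (0 = black, 255 = white).
# Frame A: dark entered characters (40) on a light background (220).
# Frame B: light entered characters (120) on a dark background (150).
# Frame C: preprinted characters (100) on a light background (230); not a target.
whole_area = separating_threshold([40, 120], [220, 150, 100, 230])
frame_a = separating_threshold([40], [220])
frame_b = separating_threshold([120], [150])
```

With these values no threshold exists for the whole reading area, because the light entered characters (120) are lighter than the preprinted characters (100), yet each frame taken alone is easy to separate.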
Furthermore, when a binary image of the recognition area is displayed to confirm the character recognition result, the problems above can cause the target characters to appear faint or filled in and the target frame to appear shifted, which can make the recognition result difficult to confirm.
[0004]
The present invention has been made in view of these problems of the prior art. Its object is to provide a character recognition method and a character recognition device that, even for a document image such as a form composed of frame lines, preprinted characters, backgrounds, and entered characters of various colors, generate a binary image suited to recognizing the characters entered in each frame, achieve highly accurate character recognition, and make it easy to confirm the recognition result visually.
[0005]
[Means for Solving the Problems]
To solve the above problems, a representative invention disclosed in the present application is a method that generates a binary image suited to character recognition from the grayscale image of a document, even when the document image is composed of frame lines, preprinted characters, backgrounds, and entered characters of various colors. The method inputs a grayscale image of the whole document or a part of it (101), extracts the structure of the frames drawn on the document from that image (102), divides the input image into a plurality of regions based on the obtained frame structure (103), calculates a binarization threshold suited to character recognition for each divided region (104), and generates a binary image for each divided region using the obtained threshold (105).
[0006]
Further, in a method of recognizing patterns such as characters, symbols, and marks from a document image such as a form, when characters in a specific frame of the document are to be recognized, the method inputs the surface of the document as a grayscale image; detects the position of the document in the image; estimates where in the input image the frame containing the target characters lies, using format (frame position) information prepared in advance by measuring the frame positions in a document of the same format; enlarges the estimated frame position upward, downward, leftward, and rightward; cuts the enlarged region out of the input image; extracts the frame structure in the cut-out partial image; calculates, for each frame, a binarization threshold suited to recognizing the characters entered in that frame; generates a binary image for each frame using the obtained threshold; and performs character recognition for each frame.
[0007]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, the present invention will be described with reference to the drawings. According to the invention disclosed in the present application, as shown in FIG. 1, a gray-scale image of the entire document or a part thereof is input (101), and the frame structure described in the document is extracted from the image (102). The input image is divided into a plurality of regions based on the frame structure (103), a binarization threshold suitable for character recognition is calculated for each of the divided regions (104), and the obtained threshold is used. A binary image is generated for each of the divided areas (105). According to the configuration of the present application, a binary image suitable for recognizing characters entered in each frame is generated even for a document image composed of various color frame lines, preprinted characters, backgrounds, and entered characters. And realizes highly accurate character recognition.
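The flow of steps 101 to 105 can be outlined as a small driver function. This is a minimal sketch, not the patented implementation: the frame extractor and the threshold calculator are passed in as parameters, and the stand-ins used in the example (`fixed_frames`, `midpoint`) are hypothetical placeholders for steps 102 to 104.

```python
def binarize_by_frame(img, extract_frames, compute_threshold):
    """Binarize each frame region with its own threshold (1 = black, 0 = white)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]  # pixels outside every frame stay white
    # Steps 102-103: frames are (top, left, bottom, right), bottom/right exclusive.
    for top, left, bottom, right in extract_frames(img):
        values = [img[y][x] for y in range(top, bottom) for x in range(left, right)]
        t = compute_threshold(values)  # step 104: one threshold per region
        for y in range(top, bottom):   # step 105: binarize that region only
            for x in range(left, right):
                out[y][x] = 1 if img[y][x] < t else 0
    return out


# Placeholder steps for the example only.
def fixed_frames(_img):
    return [(0, 0, 2, 2), (0, 2, 2, 4)]


def midpoint(values):
    return (min(values) + max(values)) / 2


image = [
    [30, 200, 120, 160],
    [200, 200, 160, 160],
]
result = binarize_by_frame(image, fixed_frames, midpoint)
```

In the example the left frame gets a threshold of 115 and the right frame 140, so the light character pixel at 120 is kept even though a single threshold of 115 for the whole image would drop it.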
The overall flow of the present invention is described in detail with reference to FIG. 26, which shows the processing flow of a method of reading characters in the frames of a form according to one embodiment. First, in step 101, an image is input. The input image is a grayscale image composed, as in FIG. 6, of entered characters, preprinted characters, frame lines, and background with various gray values.
Next, in step 2601, the position of the form in the input image is detected. As shown in FIG. 28, the input image contains a form portion and a black background portion, and this step determines where in the input image the form lies. Specifically, one of the four corners of the input image is taken as the origin, and the coordinates of the four corners of the form are obtained.
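The text does not say how the four corners are found. One simple possibility, shown here purely as an assumption, is to treat every pixel brighter than the black scanner background as paper and take the extreme points of x + y and x - y; the function name and the `paper_min` cutoff are hypothetical.

```python
def find_form_corners(img, paper_min=128):
    """Return (top_left, top_right, bottom_right, bottom_left) as (x, y) pairs.

    The origin is the top-left corner of the input image. Pixels with a value
    of at least paper_min are assumed to belong to the form.
    """
    pts = [(x, y) for y, row in enumerate(img)
           for x, v in enumerate(row) if v >= paper_min]
    if not pts:
        return None
    top_left = min(pts, key=lambda p: p[0] + p[1])
    bottom_right = max(pts, key=lambda p: p[0] + p[1])
    top_right = max(pts, key=lambda p: p[0] - p[1])
    bottom_left = min(pts, key=lambda p: p[0] - p[1])
    return top_left, top_right, bottom_right, bottom_left


# A 6 x 5 scan with a black border and a white form in the middle.
scan = [[0] * 6 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 5):
        scan[y][x] = 255
corners = find_form_corners(scan)
```

This heuristic tolerates small skew but not large rotations, and a production system would more likely fit lines to the paper edges.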
[0008]
Next, in step 2602, the region of the frame containing the characters to be read is cut out of the input image. Beforehand, the position coordinates of the frames on the form are measured using a form of the same format as the form to be read, and they are held as format information in the format information storage unit (209) of FIG. 27. The region of the frame containing the characters to be read is cut out based on the frame position coordinates in this format information. At this time, form stretch and skew introduced during image capture are corrected. However, because of errors in form position detection, stretch correction, and skew correction, and because of variation in form paper size, the four corners of the target frame cannot be cut out exactly, and the held frame position may deviate from the frame position in the image being read. If the region is cut out using the held position as is and the position has shifted, the characters to be read may be clipped and become unreadable. The cutout step therefore cuts out the region estimated from the held frame position after enlarging it upward, downward, leftward, and rightward. This region is called the cutout area. FIG. 6 shows an example. In this example, the frame that contains (602) and lies at the center of the area is the frame to be read. The amount of enlargement is set in advance. For example, constants (A, B, C, D) are set beforehand, and the estimated region is enlarged by A mm upward, B mm downward, C mm to the left, and D mm to the right. With this configuration, the characters to be read are fully contained in the cutout area, which improves character recognition accuracy.
In addition, when the image of the cutout area is displayed for visual confirmation of the recognition result, the characters to be read are fully contained, which makes confirmation easier.
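The enlargement by (A, B, C, D) millimeters can be sketched as follows. The scan resolution `dpi` and the clamping to the image bounds are assumptions added for the example; the text specifies only the four margins.

```python
def expand_frame(frame, margins_mm, dpi, image_size):
    """Enlarge an estimated frame by (A, B, C, D) mm: up, down, left, right.

    frame is (top, left, bottom, right) in pixels; image_size is (height, width).
    """
    top, left, bottom, right = frame
    # Convert millimeters to pixels (25.4 mm per inch).
    a, b, c, d = (round(m * dpi / 25.4) for m in margins_mm)
    height, width = image_size
    return (
        max(0, top - a),
        max(0, left - c),
        min(height, bottom + b),
        min(width, right + d),
    )


# A frame near the right edge of a 1000 x 420 pixel image scanned at 200 dpi.
cutout = expand_frame((100, 200, 160, 400), (2.54, 2.54, 5.08, 5.08), 200, (1000, 420))
```

Here 2.54 mm becomes 20 pixels and 5.08 mm becomes 40 pixels, and the right edge is clamped to the image width.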
Next, in step 102, the structure of the entry frames present in the cutout area is extracted. Extracting the frame structure means detecting the positions of the ruled lines that make up the frames (FIG. 16) and then detecting the closed regions enclosed by those ruled lines to obtain the position of each frame (FIG. 17). In the figure, 1701 through 1708 are the individual frame regions obtained by frame structure extraction. Ruled line positions are detected by first binarizing the whole area using the local-area mean as the threshold, method (2) of the prior art, and then finding runs of black pixels that continue horizontally or vertically in the resulting binary image. Alternatively, ruled lines can be detected from the multi-level image with the Hough transform, as described in Shunji Mori and Tsugako Sakakura, "Basics of Image Recognition 2," p. 3-11, Ohmsha. Frame positions can be detected in various ways; one usable method finds the intersections of the detected horizontal and vertical ruled lines, traces those intersections, and detects closed regions whose vertices are intersections.
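Run-based ruled line detection can be sketched as below. The helper names and the `min_len` parameter are hypothetical, and `grid_cells` makes a strong simplifying assumption: every detected line spans the whole cutout area, so frames are just the cells between neighboring lines. It does not implement the intersection tracing mentioned above.

```python
def long_runs(line, min_len):
    """Return (start, end) pairs of runs of black pixels at least min_len long."""
    runs, start = [], None
    for i, v in enumerate(list(line) + [0]):  # the sentinel closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


def detect_ruled_lines(binary, min_len):
    """Return row indices of horizontal lines and column indices of vertical lines."""
    rows = [y for y, row in enumerate(binary) if long_runs(row, min_len)]
    cols = [x for x, col in enumerate(zip(*binary)) if long_runs(col, min_len)]
    return rows, cols


def grid_cells(rows, cols):
    """Frame interiors (top, left, bottom, right) between neighboring lines."""
    return [
        (r0 + 1, c0 + 1, r1, c1)
        for r0, r1 in zip(rows, rows[1:]) if r1 - r0 > 1
        for c0, c1 in zip(cols, cols[1:]) if c1 - c0 > 1
    ]


# Two frames side by side: border lines plus one vertical divider.
binary = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
rows, cols = detect_ruled_lines(binary, min_len=4)
cells = grid_cells(rows, cols)
```

Choosing `min_len` larger than the longest character stroke keeps characters from being mistaken for ruled lines.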
[0009]
After the frame structure has been extracted, in step 103 the cut-out grayscale image is divided into regions for each frame using the obtained frame structure information, as shown in FIG. 18. At this time, each frame region (single frame region) is divided into the frame line region and the region enclosed by the frame lines (the region in which characters are written). As a result, the color of the frame lines need not be considered in subsequent processing.
Next, in step 104, a binarization threshold suited to character recognition is calculated for each single frame region. The binary image generated in step 102 to extract the frame structure is not used; a binary image suited to character recognition is generated anew. The threshold is calculated with the k-means method mentioned in the prior art. K-means is a representative clustering method that divides data consisting of numerical values or numerical vectors into a prespecified number of groups (clusters). To calculate a threshold suited to character recognition, the pixels in the region are clustered by luminance value, which separates the pixels of entered characters, preprinted characters, and background. As noted in the prior art, various other methods can be used to calculate the threshold, for example one based on discriminant analysis, but k-means has the advantage of a small amount of computation and short processing time, with accuracy no worse than the alternatives.
In commonly used document images such as forms, a single frame region usually falls into one of two cases, as shown in FIG. 6: two colors are present, either entered characters and background or preprinted characters and background, as in 1801 and 1807; or three colors are present, namely entered characters, preprinted characters, and background, as in 1804. Therefore a histogram of the gray values in the frame region is created, and k-means is applied to the histogram twice, once with two clusters (hereinafter clustering (2)) and once with three clusters (hereinafter clustering (3)), to calculate thresholds. FIG. 20 shows the gray-level histogram of 1801, FIG. 21 that of 1807, and FIG. 19 that of 1804. If the number of colors in a single frame region is known beforehand, clustering with that preset number of clusters is sufficient. When document images of many formats are handled, however, it is difficult to check the number of colors in every frame region in advance, and the number of colors may not be fixed, because even a frame region at the same position in documents of the same type may or may not contain preprinted characters. For this reason, both clustering (2) and clustering (3) are performed for each single frame region, and a threshold is calculated from each. Depending on the forms handled, the number of clusters is not limited to two or three. Clustering (3) yields two thresholds, as shown in FIG. 19. Because the characters entered on a form, that is, the characters to be read, generally have lower (darker) gray values than preprinted characters, the lower of the two, threshold C (1904), is used. In the forms handled in this embodiment, a single frame region contained only two or three colors, but if N colors are present in the region to be binarized, the pixels can be divided by clustering into N clusters.
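Clustering (2) and clustering (3) on a gray-level histogram can be sketched with a one-dimensional k-means. The initialization (centers spread evenly over the occupied gray range) and the rule that a threshold is the midpoint between neighboring cluster centers are assumptions made for this example; the text does not specify either. The histogram values are invented.

```python
def kmeans_thresholds(hist, k, iters=100):
    """Cluster a gray-level histogram into k groups and return k - 1 thresholds."""
    levels = [g for g, n in enumerate(hist) if n > 0]
    lo, hi = levels[0], levels[-1]
    # Assumed initialization: centers spread evenly over the occupied range.
    centres = [lo + (hi - lo) * (2 * i + 1) / (2 * k) for i in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for g in levels:
            j = min(range(k), key=lambda c: abs(g - centres[c]))  # nearest center
            sums[j] += g * hist[g]
            counts[j] += hist[g]
        new = [sums[j] / counts[j] if counts[j] else centres[j] for j in range(k)]
        if new == centres:
            break
        centres = new
    centres.sort()
    # Assumed rule: a threshold lies midway between neighboring centers.
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]


# Hypothetical frame with entered characters (40), preprinted characters (120),
# and background (220).
hist = [0] * 256
hist[40], hist[120], hist[220] = 100, 50, 500

thresholds_3 = kmeans_thresholds(hist, 3)   # clustering (3): two thresholds
threshold_c = min(thresholds_3)             # the lower one is used for reading
threshold_2 = kmeans_thresholds(hist, 2)[0] # clustering (2): one threshold
```

For this histogram, clustering (3) separates the entered characters from the preprinted characters, while clustering (2) groups the two kinds of characters together against the background.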
[0010]
Next, in step 2607, the inside of each single frame area is binarized using the two threshold values calculated for it, generating two binary images. FIG. 22 shows the result of binarizing 1804 with threshold C (1904), and FIG. 23 shows the result of binarizing 1804 with threshold D (1905). As shown in FIG. 22, binarization with threshold C removes the preprinted characters and leaves only the entered characters as black pixels, whereas binarization with threshold D, as shown in FIG. 23, leaves the entered characters and the preprinted characters overlapping, so the characters cannot be recognized correctly.
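The difference between the two binary images can be seen in a small sketch. The function name binarize, the pixel values, and the two threshold values below are hypothetical; the only behavior taken from the description is that pixels darker than the threshold become black.

```python
def binarize(region, threshold):
    """Return a binary image: 1 (black) where the gray value is below threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in region]


# Invented 2x4 region: 30 = entered character, 120 = preprinted character,
# 220 = background.
region = [
    [220, 30, 120, 220],
    [220, 30, 120, 220],
]

low_image = binarize(region, 75)    # lower threshold (role of threshold C)
high_image = binarize(region, 170)  # higher threshold (role of threshold D)
```

In low_image only the column of entered-character pixels is black, while in high_image the preprinted pixels are black as well, which is the overlap that prevents correct recognition.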
[0011]
Next, in step 2609, character recognition is performed for each single frame area. At this point each single frame area has two binary images generated with different binarization thresholds (the threshold from clustering (2) and the threshold from clustering (3)), and character recognition is performed on each of them.
[0012]
Next, in step 2610, of the two character recognition results obtained for each single frame area with different binarization thresholds, the one that matches the knowledge of the character strings to be written in the frame (e.g., amount, name, address) is output as the reading result. The knowledge of character strings is a database, organized by the attribute of each item of information to be written on the form, of the notation patterns of the character strings written in the frame, or of all such character strings. For example, for an amount, information such as a mark, a series of digits, and commas is stored; for a name or an address, the notation patterns of all names and addresses that may be entered are stored. These are stored in advance in the format information recording unit 209 of FIG. 2.
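The selection in this step might be sketched as follows. The dictionary KNOWLEDGE, the regular expression for amounts, the sample names, and the rule of returning the first matching candidate are all assumptions for illustration; the description does not specify how the knowledge is encoded or how ties are resolved.

```python
import re

# Hypothetical character-string knowledge, keyed by field attribute.
KNOWLEDGE = {
    # Amount: digits, optionally grouped by commas.
    "amount": re.compile(r"\d{1,3}(,\d{3})*|\d+"),
    # Name: an explicit set of all strings that may be entered.
    "name": {"SEKI", "IKEDA", "SAKO"},
}


def matches(knowledge, text):
    """Check a recognized string against a pattern or a set of known strings."""
    if isinstance(knowledge, re.Pattern):
        return knowledge.fullmatch(text) is not None
    return text in knowledge


def select_result(field, candidates):
    """Return the first candidate consistent with the field's knowledge, else None."""
    for text in candidates:
        if matches(KNOWLEDGE[field], text):
            return text
    return None


# Two recognition results for the same frame, one per binarization threshold.
reading = select_result("amount", ["1?,5#0", "12,500"])
```

Here the first candidate, corrupted by overlapping preprinted characters, is rejected and the second is returned as the reading result.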
[0013]
FIG. 27 is a configuration diagram of the above embodiment. In the figure, 2701 denotes means for inputting a gray image, such as a scanner that reads a form as a gray image; 2702 denotes means for storing the read gray image and binary images being processed; 2704 denotes means for detecting the position of the form (the coordinates of its four corners) in the read gray image; 2705 denotes means for cutting out from the gray image the portion in which the characters to be read are written; 2706 denotes means for extracting the frame structure of the form; 2707 denotes means for dividing the inside of the gray image into partial areas, one per frame; 2708 denotes means for calculating, for each partial area in the image, a binarization threshold for generating a binary image suitable for character recognition; 2709 denotes means for generating a binary image of each partial area based on the threshold obtained in 2708; 2710 denotes means for cutting out characters from the binary image generated in 2709 and performing character recognition; and 2711 denotes means for outputting, from among the character recognition results obtained in 2710, the one that matches the knowledge of the character strings to be written in the frame being read (amount, name, address, etc.). The means 2706 consists of binarizing means 2712 for extracting the frame structure, means 2713 for extracting ruled lines, and means 2714 for detecting the positions of frames. Reference numeral 2715 denotes means for holding the format information (frame positions) of the form used in 2705 when cutting out the portion where characters are written, and 2716 denotes means for holding the knowledge of character strings used in 2711.
Reference numeral 2718 denotes a binary image display unit for visually confirming the recognition result.
The present invention disclosed above can be realized by a program and executed by an information device such as a computer.
[0014]
[Effects of the Invention]
According to the present invention, in a method for recognizing patterns such as characters, symbols, and marks in a document image such as a form, the frame structure in the document image is extracted, the document image is divided into individual frame areas using the frame structure information, a binarization threshold suitable for recognizing the pattern written in the frame is calculated for each frame area, and a binary image (binary image A) is generated using this threshold. A binary image suitable for recognition can thus be generated even if the document image is composed of frame lines, preprinted characters, backgrounds, and entered characters of various colors, so recognition accuracy can be improved. Further, by using binary image A as the binary image shown on the display for confirming the recognition result, the characters to be recognized can be displayed without blurring or crushing and the frame to be recognized can be displayed accurately, even for a document image composed of frame lines, preprinted characters, backgrounds, and entered characters of various colors, which makes visual confirmation of the recognition result easy.
[Brief description of the drawings]
FIG. 1 is a diagram showing a processing flow representing the present invention.
FIG. 2 is a configuration diagram of an embodiment.
FIG. 3 is a diagram showing a processing flow according to a conventional method when reading all characters in a form.
FIG. 4 is a diagram showing a processing flow according to a conventional method when reading a character at a specific position in a form.
FIG. 5 is a diagram showing an example of a processing flow of a frame structure extraction process.
FIG. 6 is a view showing an example of an image composed of various grayscale frame lines, written characters, preprinted characters, and a background.
FIG. 7 is a diagram showing a density histogram of a reading area.
FIG. 8 is a diagram showing binarization thresholds (A, B) calculated based on a density histogram of a reading area.
FIG. 9 is a view showing a binary image based on a threshold value A (801).
FIG. 10 is a diagram showing a binary image based on a threshold value B (802).
FIG. 11 is a diagram showing a binarization method for each local region.
FIG. 12 is a diagram showing an example (1) of a binary image by a binarization method for each local region.
FIG. 13 is a diagram showing an example (2) of a binary image by a binarization method for each local region.
FIG. 14 is a diagram showing a multi-value image when a light character exists near a dark frame line.
FIG. 15 is a diagram showing a binary image by a binarization method for each local region when a light character exists near a dark frame line.
FIG. 16 is a diagram showing a ruled line detection result.
FIG. 17 is a view showing a frame detection result.
FIG. 18 is a diagram showing an area division image based on frame information.
FIG. 19 is a view showing a density histogram of a frame area (1804).
FIG. 20 is a view showing a density histogram (2) of a frame area (1801).
FIG. 21 is a view showing a density histogram (3) of a frame area (1807).
FIG. 22 is a view showing a binary image of a frame area (1804) based on a threshold value C (1904).
FIG. 23 is a view showing a binary image of a frame area (1804) based on a threshold value D (1905).
FIG. 24 is a view showing a binary image of a frame area (1801) based on a threshold value E (2003).
FIG. 25 is a view showing a binary image of a frame area (1807) based on a threshold value F (2103).
FIG. 26 is a diagram showing a processing flow of the embodiment.
FIG. 27 is a diagram illustrating a configuration of an input image according to the embodiment.
[Explanation of symbols]
601: entered characters (dark)
602: entered characters (light)
603: frame line (dark)
604: preprinted characters
605: frame line (light)
606: background part (dark)
607: background part (light)
701: density distribution of the entered character string "000257003" at the top of FIG. 6 and the entered character "1" at the right of FIG. 6
702: density distribution of the entered characters "6, 30" near the center of FIG. 6
703: density distribution of the dark frame line in FIG. 6
704: density distribution of the preprinted characters "post-date, month, day, business office" in FIG. 6
705: density distribution of the light frame line in FIG. 6
706: density distribution of the dark background part at the center of FIG. 6
707: density distribution of the light background part in FIG. 6
801, 802: threshold values obtained by the k-means method from …
…02: portions where the entered characters overlap the preprinted characters in the binary image example produced by the local-region binarization method
1301, 1302: portions where the preprinted characters are missing in the binary image example produced by the local-region binarization method
1701, 1702, 1703, 1704, 1705, 1706, 1707, 1708: individual frame areas obtained by frame structure extraction
1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808: individual frame areas obtained by dividing the image using the frame structure information
1901: density distribution of the entered characters in the frame area (1804)
1902: density distribution of the preprinted characters in the frame area (1804)
1903: density distribution of the background in the frame area (1804)
1904, 1905: binarization thresholds (C, D) calculated from the density histogram of the frame area (1804) using the k-means method
2001: density distribution of the entered characters in the frame area (1801)
2002: density distribution of the background in the frame area (1801)
2003: binarization threshold (E) calculated from the density histogram of the frame area (1801) using the k-means method
2101: density distribution of the preprinted characters in the frame area (1807)
2102: density distribution of the background in the frame area (1807)
2103: binarization threshold (F) calculated from the density histogram of the frame area (1807) using the k-means method

Claims

1. A form image processing apparatus having an image input unit and a processing unit,
wherein the processing unit controls the steps of:
detecting an entry frame from the form image input via the image input unit, and dividing the form image using information on the detected entry frame;
calculating a threshold value for binarization for each of the divided areas; and
performing binarization using the threshold value for each of the areas.

2. The form image processing apparatus according to claim 1, wherein the dividing step is performed for each of the detected entry frames.

3. The form image processing apparatus according to claim 1, wherein the dividing step further divides the line area of the entry frame and an area surrounded by the line area of the entry frame.

4. The form image processing apparatus according to claim 1, further comprising a storage unit that stores format information of the form,
wherein the control unit further controls a step of recognizing characters entered in the binarized form image using the format information.

5. The form image processing apparatus according to claim 1, wherein the processing unit further controls a step of binarizing the form image prior to the step of detecting the entry frame.

6. The form image processing apparatus according to claim 4 or 5, wherein:
the step of calculating the threshold value for binarization creates a histogram of gray values in the area, clusters the pixels in the area into a plurality of preset groups using the distribution of the histogram, and calculates thresholds from the results;
the step of performing the binarization is performed for each of the thresholds; and
characters are recognized in each of the binarized images, the character string knowledge stored in the storage unit is compared with the respective recognition results, and one of the recognition results is output based on the comparison results.

7. A pattern recognition apparatus comprising:
input means for a form image;
storage means for storing format information of the form image input via the input means;
means for detecting, from the format information, the coordinates of the reading area in the input form image;
means for cutting out an area obtained by enlarging the area specified by the coordinates by a predetermined value;
means for detecting frames from the cut-out area and dividing the cut-out area for each frame;
means for calculating a binarization threshold for each frame;
means for generating a binary image for each frame using the threshold value; and
means for recognizing characters in the binary image.

8. The pattern recognition apparatus according to claim 7, further comprising a display unit for displaying the cutout area.

9. A program for causing a computer to execute a pattern recognition method comprising the steps of:
obtaining a form image via an image input unit;
detecting an entry frame from the form image;
dividing the form image using information on the detected entry frame;
calculating a threshold value for binarization for each of the divided areas;
performing binarization using the threshold value for each of the areas; and
performing character recognition for each of the binarized areas.