JP4006176B2 - Character string recognition device

Info

Publication number: JP4006176B2
Application number: JP2000309291A
Authority: JP
Inventors: 昌史古賀; 竜治嶺; 浩道藤澤
Original assignee: Hitachi Omron Terminal Solutions Corp
Current assignee: Hitachi Omron Terminal Solutions Corp
Priority date: 2000-10-04
Filing date: 2000-10-04
Publication date: 2007-11-14
Anticipated expiration: 2020-10-04
Also published as: JP2002117374A

Description

【０００１】
【発明の属する技術分野】
本発明は、帳票などの文書上に記載された金額、日付け、電話番号など数字を中心とする文字列を読取る文字列認識装置にかかわる。
【０００２】
【従来の技術】
数字を主とする文字列の代表例として金額、電話番号、年月日などがある。以下では文字列認識に関わる従来技術を、上記金額を表す文字列の認識を例に説明する。
【０００３】
帳票などに記載する金額を表す文字列には「￥１００−」、「￥１，２００−」、「￥１２，０００，０００−」のようなものがある。
ここでは、
・金額の先頭には「￥」をつける、
・３桁ごとに「，」をつける、
・金額の最後には「−」をつける、
などの規則が用いられている。
【０００４】
こうした文字列を読取る方式としては、一般的には以下のような手順が知られている。
（１）文字切出し…文字として尤もらしい部分画像を切り出す。
（２）文字識別…各々の部分画像の文字カテゴリ（文字コード）を識別する。
（３）文字列照合…文字の識別結果を文字列として解釈する。
【０００５】
上記（１）の文字切出しには、位置、大きさなどの一定の条件を満たす連結成分の組合せを文字行中から取出す方式がしばしば用いられる。この際、文字同士の接触などのため、文字間の境界を検出するのが困難となる場合がある。このような場合には、接触した箇所を検出して切断するとともに、文字切出しに多重仮説検定法（文献１：H.Fujisawa, Y.Nakano, and K.Kurino,”Segmentation Methods for Character Recognition : From Segmentation to Document Structure Analysis”Proc. of the IEEE vol.80,No.7,1992）を用いることで精度を高めることができる。
【０００６】
ここで上記多重仮説検定法とは、一旦複数の文字の切出し方を試行したのち、文字識別の結果を利用して文字の切出し方を決定する方式である。
【０００７】
上記（２）の文字識別には、予め記憶している標準パタン中で最も入力パタンに類似したものを選び出すパタン整合法がよく用いられる。文字識別は通常複数の文字候補とそれらの尤もらしさの評価値を出力する。後段の文字列照合にて、これら複数の文字候補から文字列として解釈して妥当なものを選択する。
【０００８】
上記（３）の文字列照合では、認識対象の文字列が上述の規則に従っているかどうかを検査する。また、帳票上の汚れもしくは文字のかすれなどがある場合、あるいは文字の変形が大きかったり文字同士が接触していたりする場合、文字列の認識を正確に行うことが困難になる。こうした場合、文字切出しや文字識別の精度を向上するために、文字列照合処理によって上記のような規則を利用した文字の切出し方あるいは候補文字の選択をおこなうことが解決策として知られている。
【０００９】
そして上記（３）の手法としては、準定型帳票中のＩ．Ｄ．コードなどを対象にしたもの（文献２：丸川他「表記規則を持つ数字文字列の認識における文字列チェック機能の一検討」１９９４年電子情報通信学会秋期大会Ｄ−３１３，ｐ．３２１）、住所の丁目・番地を対象にしたもの（文献３：緒方他「住所表示番号と棟・部屋番号の連続表記に対する照合方式」信学技報ＰＲＭＵ９９−２１９）、（文献４：大井他「住所読み取りにおける丁目・街区認識方式」信学技報ＮＬＣ９２−９６、１９９２）等がある。
【００１０】
しかし、従来の文字列照合の手法は、対象への依存が大きく、対象を変更することは困難であった。例えば上記文献２あるいは文献４では、それぞれの認識対象に特化した方式であって、他の方式への適用は困難である。また、上記文献３の技術では表記形式の辞書を変更することにより認識対象を変更することができるが、表記の組合わせは膨大な数になることが多く、辞書の変更が容易ではなかった。
【００１１】
また、文脈自由文法を文字列照合に用いると、多くの表記形式を簡潔に記述できることが知られている（文献５：古賀他「地名表記方法および地名文字列認識方法」特願平１１−１８７７５３）。しかし、本発明のように数字を中心とする文字列にこうした手法を適用した例はなかった。
【００１２】
さらに、金額などを高い精度で読取るためには「，」は文字行の下のほうにある、数字同士はほぼ大きさが揃っているが「ー」や「，」は小さいことがある、などといった文字の空間的配置に関する情報を利用することが必要である。しかし、従来の方式ではこうした配置の情報を表記の規則と有効に結び付けることができなかった。
【００１３】
【発明が解決しようとする課題】
本発明が解決しようとする第１の課題は、数字を中心とする文字列の表記上の規則を利用し、高精度かつ高速に文字識別結果を文字列として認識する文字列認識装置を実現することである。本発明が解決しようとする第２の課題は、文脈的な表記上の制約を簡潔に記述することを可能とし、これにより文字列認識装置が容易に様々な認識対象に対応することを可能にすることである。本発明が解決しようとする第３の課題は、文字の空間的配置を文字列の表記のし方と結びつけて扱い、文字列を高い精度で認識することを実現することである。
【００１４】
【課題を解決するための手段】
本発明の文字列認識装置では、数字の文字列の知識を文脈自由文法で記憶する手段を設ける。これにより、数字を中心とする文字列の表記上の制約が簡潔に表現できる。さらに上昇型構文解析法を利用した文字列照合手段を設け、高速かつ高精度に文字識別結果を文字列として解釈することを実現する。さらに、文脈自由文法における記号列同士の書換えの規則に併せて文字の空間的配置の制約を記憶する手段を設けるとともに、置換え規則を適用する際に文字の空間的配置が妥当かどうか判定する手段を設ける。
【００１５】
文脈自由文法は、記号列同士の書換えの規則（生成規則）により文字列の表記上の制約を表現するものである。これを利用し、数字と「，」「￥」などの組合せを文法的に定義することにより、数字を中心とした文字列の表記上の制約を簡潔に表現することができる。
【００１６】
また、文字識別結果の解釈の際に用いる上昇型構文解析法は、部分的な記号列を生成規則にしたがって上位の記号に繰返し書換えていくことにより、文字識別結果から最適な文字列の解釈を見い出す方式である。
【００１７】
類似の方式は音声認識の分野ではＣＹＫ法（文献６：中川「確率モデルによる音声認識」電子情報通信学会）として知られている。この方式は、文法的に許容される文字列の組合せの数が多い場合にも、効率よく最適な文字列を選択でき、高精度かつ高速に文字識別結果を解釈できる。もともとこの方式は音声信号のような１次元の入力を対象にしていた。本発明では文字の切出し方を後述する切出し仮説ネットワークと称するネットワークの形式で表現することにより、画像のような２次元の入力を扱えるよう工夫した。
【００１８】
本発明では、さらに生成規則の表記法を工夫し、ある記号列を別の記号に書換える際に、その記号列を構成する文字が満たす空間的配置の制約を記憶できるようにした。これにより、文字列照合処理またはその後段の処理で文字の空間的な配置を検査し、認識精度を高めることを可能とした。
【００１９】
【発明の実施の形態】
本発明の実施形態の一例を図１に示す。本実施例は、文法の人手による作成を支援する文法登録装置１０１、与えられた文法を利用して文書１０４上の文字列を認識する文字列認識装置１０２からなる。
【００２０】
文法登録装置１０１で人手で作成した文法のデータは、通信回線１０３を介して文字列認識装置１０２に送信する。また、ＦＤ（フロッピーディスク）などの記憶媒体を介して複写してもよい。
【００２１】
文法登録装置１０１は通信装置１１０、プロセッサ１１２、ＦＤドライブ１１４、ディスク装置１１５、メモリ１１３、表示装置１１６、キーボード１１７、マウス１１８をバス１１１で接続したものである。
【００２２】
通信装置１１０は通信回線１０３に接続され、文字列認識装置１０２との通信を司る。プロセッサ１１２は文法登録装置１０１全体の制御、文法の作成を支援するプログラム類の実行などを司る。プログラムファイル群１２０と作成した文法情報辞書ファイル１２１はディスク装置１１５に格納し、システム起動時にメモリ１１３に読込む。
【００２３】
人手による文法の編集は、文法辞書編集プログラム１２３により行う。これは、メモリ１１３に格納した文法辞書テーブル１２５内のデータの表示／編集、ディスク装置１１５からの読込み、ディスク装置１１５への書込みなどの機能を有する。編集を行う人は、表示装置１１６に表示された内容を確認しながらキーボード１１７、マウス１１８などの入力装置を用いて入力を行う。
【００２４】
また、編集の過程を支援するため、文法辞書テーブル１２５に格納した文法からどのような文字列が生成されるかを例示する例示プログラム１２４もメモリ１１３に格納しておく。この例示プログラム１２４は文法が妥当かチェックも併せて行う。さらに、作成作業の際の参考にするため、過去の文法の事例を登録してある文法事例ファイル１２２もディスク装置１１５に格納する。ＦＤドライブ１１４は、ＦＤを用いて文法を文字列認識装置１０２に複写する際などに用いる。
【００２５】
文字列認識装置１０２は、通信装置１３０、プロセッサ１３２、スキャナ１３６、ＦＤドライブ１３４、ディスク装置１３５、メモリ１３３をバス１３１で接続したものである。プロセッサ１３２は文字列認識装置１０２全体の制御、文字列認識のプログラム類の実行などを司る。プログラムファイル群１６０と文法登録装置１０１で作成した文法情報辞書ファイル１６１はディスク装置１３５に格納し、システム起動時にメモリ１３３に読込む。
【００２６】
文法登録装置１０１で作成した文法は、通信回線１０３を介して通信装置１３０によりディスク装置１３５に格納するか、あるいはＦＤドライブ１１４を用いてＦＤから複写する。読取り対象の文書１０４はスキャナ１３６により光電変換して入力画像として取込み、メモリ１３３に格納する。入力画像としては通常２値画像を用いる。本明細書でも特に断りがない限り、画像は２値画像であるものとする。
【００２７】
メモリ１３３には、後述するプログラム群、すなわちフィールド切出しプログラム１４１、切出し仮説生成プログラム１４２、構文解析プログラム１４３、標準化プログラム１４４、検定プログラム１４５などを格納する。さらに、メモリ１３３には、文法情報を格納する標準型生成規則テーブル１５０の他に、後述するフィールド画像テーブル１４７、切出し仮説テーブル１４８、構文要素ラティステーブル１４９、文候補テーブル１５１、認識結果テーブル１５２を設ける。
【００２８】
本発明の認識対象である文書１０４の一例を図２に模式的に示す。本実施例での読取り対象の文書は、枠線で領域が区切られた帳票である。区切られた枠の一つ２０１に読取り対象の文字列を記入するよう定められており、枠の座標は予め分かっているものとする。
【００２９】
本発明で数字文字列の表記の規則を表すために用いる文法は、つぎの数１のような形で表される（文献７：Ｊ．Ｅ．ホップクロフト他「言語理論とオートマトン」サイエンス社、ＩＳＢＮ−７８１９−０２５０−２）。
【００３０】
【数１】

【００３１】
ここでは、数字および「￥」「，」「−」など、実際に数字文字列中で用いられている文字が終端記号にあたり、それ以外の構文要素（文献によっては文法的類ともいう）が非終端記号にあたる（上記文献７のｐ．１１参照）。生成規則は記号列の生成の規則を表すものである。すなわち、数１中の「→」の左辺の記号から右辺の記号列が生成できることを表している（上記文献７のｐ．１０参照）。
【００３２】
本発明では文脈自由文法を用いるため、左辺は必ず１つの変数となり、また右辺は空文字列であってはならない。さらに文法的に妥当な文字列（「￥１，２３４−」など）が文開始記号にあたる。以下、終端記号には小文字のアルファベット、非終端記号もしくは終端記号を表す変数には大文字のアルファベット、変数の列にはギリシャ文字を用いる。
【００３３】
本発明で用いる文法情報辞書ファイル１２１には、図３に示すようなテキストデータを用いる。ここでは、「：：＝」は前記数１で置換えを意味する「→」に相当し、各行が１つの生成規則に対応する。また終端記号は常に１文字の単語、非終端記号は２文字以上の単語とし、単語および「：：＝」の間は空白で区切る。例えば、図３の１０行目では、記号Ｂｉｎｚｎ（３０１）と記号列「Ｎｚｄ０」（３０３）が記号「：：＝」（３０２）で結び付けられており、前者から後者が生成できることを示している。また、このテキストデータで最後、すなわち図３の例では最終行に表れた非終端記号を文開始記号とみなす。この場合は「ａｍｔ」が文開始記号となる。
【００３４】
本発明で用いる文法情報辞書ファイル１２１には、上述のような文法情報と併せて、文字の配置に関する情報を３０４のような形式で格納する。これは、各生成規則を適用する際に、元の文字列に含まれる文字が満たすべき空間的配置に関する制約条件を表している。例えば３０４でａｉ（ｉは１，２，３，……）は「：：＝」の右辺の第ｉ項を表し、「ｅｑｐｓ（ａ１，ａ２）」は、ａ１およびａ２に含まれる文字（すなわち、終端記号に対応する部分画像）のサイズとピッチがほぼ等しいことを表している。ここで用いられる記号類の意味は図１８に示してある。
【００３５】
本発明の文字列認識における処理の流れを図１および図４のデータフロー図で説明する。文字列認識処理４０２は文字列認識装置１０２で実現される処理である。文字列認識処理４０２は、文法情報４０１（図１の文法情報辞書ファイル１６１に格納する）とスキャナ１０４で取り込んだ画像を入力とし、認識結果の文字列を出力とする。文法情報４０１は文法情報編集４００にて予め人手で編集する。文法情報編集４００は図１の文法登録装置１０１にて実現する。
【００３６】
まず、フィールド切出し４０３は、入力画像から読取り対象の領域（読取りフィールド）の画像を切出し、フィールド画像テーブル１４７に出力する処理で、フィールド切出しプログラム１４１にて実現する。切出し仮説生成４０４は、フィールド画像を解析し、大きさ、形状などの情報に基づいて文字として尤もらしい部分画像（候補パタン）を検出する処理で、切出し仮説生成プログラム１４２にて実現し、その結果は後述する切出し仮説ネットワークの形式で切出し仮説テーブル１４８に格納する。
【００３７】
標準化処理４０６は入力の文法情報４０１を後の処理で用いやすい標準型に変換する処理で、標準化プログラム１４４で実現し、結果は標準型生成規則テーブル１５０に格納する。
【００３８】
構文解析４０５は標準化した生成規則と、切出し仮説ネットワークを入力とし、文字識別４０７を利用して文字列を認識する処理で、構文解析プログラム１４３で実現する。また、尤もらしい文開始記号が得られたなら、この文開始記号に対応する文字列の情報などを文候補と称するデータの形式で文候補テーブル１５１に出力する。
【００３９】
検定処理４０８は、得られた文候補が妥当かどうかをさらに詳細に調べ、妥当なものの中からさらに最も確からしいものを認識結果として認識結果テーブル１５２に出力する処理で、検定プログラム１４５で実現する。
【００４０】
つぎに、切出し仮説生成４０４、標準化４０６、文字識別４０７、構文解析４０５、検定４０８の処理について、順をおって説明する。
【００４１】
切出し仮説生成４０４は、フィールド画像を解析し、大きさ、形状などの情報に基づいて文字として尤もらしい部分画像（候補パタン）を検出する処理である。この段階では一意に文字の切出し方を特定できない場合が多い。このため、切出し仮説生成４０４では複数の文字の切出し方の仮説に基づいて候補パタンを検出しておき、後段の処理で正しいものを選択するようにする。
【００４２】
例えば、図２の入力画像の枠２０１の部分を切出して得られる図５に示すフィールド画像からは、図６に示すような候補パタンが検出できる。図５に示すように、フィールド画像には読取り対象以外の文字や汚れも混入することが避けられない。これらの候補パタンの集合は図６に示すようなネットワークで表現できる。以下、このようなネットワークを切出し仮説ネットワークと呼ぶ。
【００４３】
切出し仮説ネットワーク上では、検出された候補パタンが辺、候補パタンの境界が節点で表現されている。また、フィールド中の文字の切出し方の仮説は、切出し仮説ネットワーク中の経路で表現される。
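The network just described (candidate patterns as edges, pattern boundaries as nodes, segmentation hypotheses as paths) can be illustrated with a small sketch. The function name `segmentation_hypotheses` and the representation of an edge as an `(lft, rit)` pair of integer boundary identifiers are assumptions made for illustration, and the sketch assumes `lft < rit` for every edge so that the network has no cycles.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (left boundary id, right boundary id) of one candidate pattern


def segmentation_hypotheses(edges: List[Edge], source: int, sink: int) -> List[List[Edge]]:
    """Enumerate every path from source to sink; each path is one segmentation hypothesis."""
    successors: Dict[int, List[int]] = {}
    for lft, rit in edges:
        successors.setdefault(lft, []).append(rit)

    paths: List[List[Edge]] = []

    def walk(node: int, path: List[Edge]) -> None:
        if node == sink:
            paths.append(list(path))
            return
        for nxt in sorted(successors.get(node, [])):
            path.append((node, nxt))
            walk(nxt, path)
            path.pop()

    walk(source, [])
    return paths


# Boundaries 0..3; the pattern (0, 2) is an alternative that merges patterns (0, 1) and (1, 2).
hypotheses = segmentation_hypotheses([(0, 1), (1, 2), (0, 2), (2, 3)], source=0, sink=3)
```

Explicit enumeration is only for illustration: the number of paths can grow quickly, which is why the syntax analysis works on the network directly instead of listing every hypothesis.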
【００４４】
切出し仮説ネットワーク上の候補パタン境界には、整数の識別子を付与する。各候補パタンはつぎの数２のような形式で表現する。また、切出し仮説ネットワーク全体は数３のように表される。
【００４５】
【数２】

【００４６】
【数３】

【００４７】
標準化処理４０６は、入力の文法情報を後の処理で用いやすい標準型に変換する処理である。ここでの標準形とは、生成規則の右辺が１つの終端記号もしくは２つの記号となっているものである。任意の文脈自由文法Ｇはこのような標準形（Chomskyの標準形）に変換できる（上記文献７のＰ．５６参照）。
【００４８】
図７に、図３から標準型に変換された生成規則の例を示す。なおこの際、空間的配置に関する制約の記述も生成規則と併せて変換する。変換は、以下のように行う。
【００４９】
・標準化した生成規則に関する空間的配置の記述があれば、それをそのまま標準化した生成規則に付加する。
【００５０】
・空間的配置が「ｅｑｐｓ（ａ１，ａ２，ａ３）」のように３つ以上の引き数をもつ関数で表されている場合には、これを２項ずつの記述「ｅｑｐｓ（ａ１，ａ２）」「ｅｑｐｓ（ａ２，ａ３）」に分割し、それぞれに対応する生成規則に付加する。
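A minimal sketch of the two conversion steps above, under stated assumptions: `binarize` splits a rule whose right side has three or more symbols into binary rules, and `split_constraint` splits a constraint with three or more arguments into pairwise constraints. The function names, the left-branching order, and the `#` naming of the helper non-terminals are hypothetical; the sketch does not cover the rest of the conversion to Chomsky normal form, nor how each pairwise constraint is attached to a specific binary rule.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[str, List[str]]


def binarize(lhs: str, rhs: Sequence[str]) -> List[Rule]:
    """Split lhs -> X1 X2 ... Xn (n > 2) into left-branching binary rules."""
    if len(rhs) <= 2:
        return [(lhs, list(rhs))]
    rules: List[Rule] = []
    left = rhs[0]
    for i, symbol in enumerate(rhs[1:-1], start=1):
        helper = f"{lhs}#{i}"            # hypothetical name for an intermediate non-terminal
        rules.append((helper, [left, symbol]))
        left = helper
    rules.append((lhs, [left, rhs[-1]]))
    return rules


def split_constraint(name: str, args: Sequence[str]) -> List[Tuple[str, Tuple[str, str]]]:
    """Turn name(a1, a2, ..., an) into name(a1, a2), name(a2, a3), ..."""
    return [(name, (a, b)) for a, b in zip(args, args[1:])]
```

For example, `split_constraint("eqps", ["a1", "a2", "a3"])` yields the two pairwise constraints named in the text.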
【００５１】
文字識別４０７は候補パタンを入力とし、尤もらしい終端記号とその評価値を求める処理で、文字識別プログラム１４６で実現する。森「パターン認識」（文献８：ＩＳＢＮ４−８８５５２−０７５−４Ｃ３０５５、電子情報通信学会）のような方式をここに用いることができる。この段階で尤もらしい終端記号を一意に決定できない場合も多い。このような場合には、終端記号と評価値の対を複数出力する。
【００５２】
つぎに図８から図１４を用いて、構文解析４０５の処理手順を説明する。
【００５３】
図８は構文解析４０５の処理の概略を示すデータフローチャートである。構文解析４０５は、前記従来の技術の項で述べた（３）の文字列照合に相当する処理を司る。
【００５４】
まず、各候補パタンを終端記号として文字識別４０７を利用して評価し、確からしさを求める（８０１）。処理の過程は構文要素ラティスと称する形式で構文要素ラティステーブル１４９に記憶する。
【００５５】
つぎに、得られた構文要素の候補同士の組合せを、標準化した生成規則と照らし合せて評価する（８０２）。この照合処理は、新たな構文要素が見い出されなくなるまで繰り返す。つぎに、得られた構文要素の候補の中に文開始記号が含まれていたなら、これら文開始記号に対応するデータ（文候補）を出力する（８０３）。もし文開始記号が含まれていなければＮＵＬＬポインタを出力する。
【００５６】
候補構文要素はつぎの数４のような形式であらわされる。
【００５７】
【数４】

【００５８】
ｌｆｔとｒｉｔは、候補構文要素同士の境界の識別子である。左側の境界の識別子がｌｆｔ、右側の境界の識別子がｒｉｔとする。αは候補構文要素に対応する記号である。ｆは文字識別結果などに基づくこの候補構文要素の確からしさの値である。
【００５９】
αが終端記号の場合には、ｃｓｅＬとｃｓｅＲにそれぞれＮＵＬＬポインタ（メモリアドレスとしてあり得ない値）を代入し、ｐｔｎには対応する候補パタンへのポインタを代入する。この際、ｌｆｔとｒｉｔの値はそれぞれｐｔｎのｌｆｔとｒｉｔと同じ値とする。ｆには文字識別の結果得られる評価値を代入する。αが非終端記号の場合はｃｓｅＬとｃｓｅＲにそれぞれ生成規則の右辺の記号に対応する候補構文要素へのポインタを代入し、ｐｔｎにはＮＵＬＬポインタを代入し、ｌｆｔの値はｃｓｅＬのｌｆｔの値と等しくする。
【００６０】
この際、ｃｓｅＲがＮＵＬＬの場合は、ｒｉｔの値はｃｓｅＬのｒｉｔの値と等しくする。例として、終端記号から生成された候補構文要素（「ｙｅｎ」「ｚｒ」などの非終端記号）群を後に説明する図１３の破線１３０３から１３０４の間に模式的に示す。またｃｓｅＲがＮＵＬＬでない場合は、ｒｉｔの値はｃｓｅＲのｒｉｔの値と等しくする。
【００６１】
構文要素ラティスは、候補パタンネットワークと同様につぎの数５のような形式で表される。
【００６２】
【数５】

【００６３】
図１３に、構文要素ラティスの例を模式的に示す。候補パタンネットワーク１３０１上の候補パタンに対応する終端記号の候補構文要素群が破線１３０２と１３０３の間に示してある。また、これら終端記号から見い出された非終端記号が１３０３と１３０４の間に示してある。また、非終端記号の対から登録された候補構文要素群を破線１３０４以下に模式的に示してある。
【００６４】
終端記号の評価（８０１）の処理手順を図９に示す。以下、詳細な処理手順の説明にはアクションダイアグラム（文献９：Ｊ．マーチン「ソフトウエア構造化技法」近代科学社、ＩＳＢＮ４−７６４９−０１２４−２Ｃ３０５０Ｐ５５６２Ｅ）を用いる。
【００６５】
ここで、前記数４のｃｓｅのように、複数の要素を持つ変数の特定の要素、例えばｌｆｔを表すには、演算子「．」を用い「ｃｓｅ．ｌｆｔ」のように表すものとする。また、「ｃｓｅ＝（ａ，ｂ，ｃ，ｄ，ｅ，ｆ）」のような表記は、ｃｓｅの各要素にａ，ｂ，ｃ，ｄ，ｅ，ｆを代入することを表すものとする。また「＋＋」は、ある変数の値に１を加算することを示す。
【００６６】
まず、処理の冒頭で登録済み候補構文要素を表す変数ｍの値を０に初期化する。つぎに全ての候補パタンの終端記号としての確からしさを関数ｆで評価する。もしｆの値が基準値ｔｆを超えていればｍの値に１を加算し、この構文要素を候補構文要素の配列ｃｓｅのｍ番目に格納する（９０１）。
【００６７】
この際のｆの評価には文字識別４０７を用いる。つぎに、ｃｓｅに格納したｍ個の候補構文要素各々に関し、記号α（この場合αは全て終端記号）が生成される生成規則が生成規則の集合Ｐにあるかどうかを調べる（９０２）。もしあればｍの値に１を加算し、この生成規則の左辺にあたる候補構文要素に対応する情報をｃｓｅのｍ番目に格納する。
【００６８】
非終端記号の評価（８０２）は、構文要素ラティス中から生成規則で生成される候補構文要素の対を見い出し、この対に対応する候補構文要素を追加していくことを繰返す処理である。この際、処理の手順には以下のような工夫がなされている。
【００６９】
・前回および前々回の繰返し時点での登録済み候補構文要素数ｍ０，ｍ１を参照し、新たな構文要素が追加できたかを判定する。追加がなければ繰返しを終了する。
【００７０】
・構文要素ラティス上の同じ範囲に複数の候補構文要素がある場合には（すなわちｌｆｔとｒｉｔの値が各々同じ候補構文要素が複数ある場合には）、動的計画法の原理を用い、これらのうちで最も評価値が高いもののみを用いる。
【００７１】
非終端記号の評価（８０２）の処理手順を図１０で説明する。まず処理の冒頭でｍ０，ｍ１をそれぞれ０とｍに初期化する。つぎにループ１００１の中で構文要素ラティス中で隣り合っている（すなわち一方のｌｆｔともう一方のｒｉｔが同じ値になっている）対を全て見い出し（１００２）、さらにこの対を生成する生成規則が標準化生成規則中にないかを調べる（１００３）。
【００７２】
もし該当する生成規則があり、さらにこの候補構文要素の対のいずれかがループ１００１の前回の繰返し以降で登録されたものなら（１００４）、この組合せの評価値を計算し、変数ｆに格納する。ここでは組合せ確からしさとして、２つの候補構文要素の評価値ｆの和を用いる。この際、標準化生成規則に併せて記憶してある文字の空間的配置に関する制約条件を参照し、これを満たさない場合には評価値を０とし、後段の処理でこの部分が低く評価されるようにする。
【００７３】
さらにこの候補構文要素の対と範囲が同じ候補構文要素がすでに登録されていないかどうかを調べる（１００５）。登録されていた場合には、新しい対の方が以前からある候補構文要素より評価値が高い場合にのみ、新しい対に対応する候補構文要素の情報をｃｓｅのｍ番目の要素に格納する。この際、ｍ１以降に登録された候補構文要素で、１００５の判定を満たすものがあるかどうかを判断し（１００６）、あればその候補構文要素に新たな候補構文要素を上書きする。なければｍに１を加算し、配列ｃｓｅのｍ番目に新たな候補構文要素の情報を格納する。１００５の判定で該当する候補構文要素が登録されていない場合には、無条件でｍに１を加算し新しい対に対応する候補構文要素の情報をｃｓｅのｍ番目の要素に格納する。
【００７４】
ループ１００１の最後で、新たに候補構文要素が追加されたかどうかを判定し、追加されていなければループ１００１を終了する（１００７）。追加されていれば変数ｍ０，ｍ１を更新し、再度繰返しを実行する。
【００７５】
つぎに、図１１により認識結果出力８０３の処理手順を説明する。まず、処理の冒頭で文候補数を表す変数ｃを０に初期化する。つぎに全ての候補構文要素を調べ、文開始記号に対応するものがないかどうか調べる（１１０１）。もしあれば、ｃに１を加算し、関数ｒｅｃｏｖｅｒＳｔｒを呼び出して文開始記号に対応する文字列を再構成し、候補構文要素や文字列の情報を文候補の配列ｃｎｄのｃ番目の要素に格納する。
【００７６】
ここで、文候補はつぎの数６の形式で表されるデータである。
【００７７】
【数６】

【００７８】
文字列再構成の関数ｒｅｃｏｖｅｒＳｔｒの処理手順を図１２に示す。関数ｒｅｃｏｖｅｒＳｔｒは、候補構文要素ｃｓｅと文字の配列ｓｔｒを引き数とし、ｓｔｒにｃｓｅに対応する終端記号の列を格納する関数である。まず、ｃｓｅに対応する記号を調べ、もし終端記号であればこの終端記号を文字列の末尾に追加して処理を終了する。もし、終端記号でなければ、候補構文要素ｃｓｅ．ｃｓｅＬ、ｃｓｅ．ｃｓｅＲを引き数としてｒｅｃｏｖｅｒＳｔｒを再帰的に呼び出す。この際、常にｃｓｅ．ｃｓｅＬを先にｒｅｃｏｖｅｒＳｔｒに渡してｒｅｃｏｖｅｒＳｔｒを呼び出すことにより、正しい順序で文字列がｓｔｒに格納されることになる。
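The recursion described for recoverStr can be sketched as follows. The field names (`lft`, `rit`, `cseL`, `cseR`, `ptn`, `f`) come from the description of the candidate syntax element; their order, the Python types, and the use of `None` for the NULL pointer are assumptions, since the defining equation is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateElement:
    lft: int                                   # identifier of the left boundary
    rit: int                                   # identifier of the right boundary
    symbol: str                                # the symbol (alpha) of this element
    f: float                                   # plausibility value
    cseL: Optional["CandidateElement"] = None  # left child; None for a terminal
    cseR: Optional["CandidateElement"] = None  # right child; None for a terminal or a one-symbol rule
    ptn: Optional[object] = None               # candidate pattern; set only for a terminal


def recover_str(cse: CandidateElement, out: List[str]) -> None:
    """Append the terminal symbols under cse to out, left to right."""
    if cse.cseL is None and cse.cseR is None:
        out.append(cse.symbol)                 # terminal symbol: emit it and stop
        return
    recover_str(cse.cseL, out)                 # the left child is always visited first
    if cse.cseR is not None:
        recover_str(cse.cseR, out)
```

Visiting `cseL` before `cseR` is what keeps the recovered characters in reading order.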
【００７９】
図１３は、構文解析４０５で候補構文要素が登録される過程を模式的に示したものである。１３０１は入力となる候補パタンネットワークを模式的に示す。これから図１０におけるループ１００１の第１回目の繰り返しで登録された候補構文要素を破線１３０２と１３０３の間に示す。以下同様に２回目、３回目の繰返しで登録した候補構文要素が破線１３０３以下に示してある。最終的に登録された文開始記号Ｓに対応する候補構文要素は破線１３０５の下に示してある。
【００８０】
図１４は、文開始記号Ｓに対応する候補構文要素１４０２、１４０３を登録するのに寄与したもの候補構文要素のみに注目し、それらの関係を模式的に表したものである。図中の点線矢印は、候補構文要素を登録する際にどのように生成規則を適用したかを示す。例えば、矢印１４０４と矢印１４０５は、候補構文要素１４０６を登録する際、候補構文要素１４０７と１４０８を参照してｙｎｒ→ｙｅｎｂｉｎｚｎという生成規則を適用したことを示している。
【００８１】
検定処理４０８は構文解析４０５で検出した文候補が妥当な文字列となっているか、さらに詳細に検定する処理である。検定処理４０８の処理の概略を図１５に示す。入力は単数もしくは複数の文候補である。まず、各文候補の元になる候補パタンの位置、大きさの関係が妥当かどうか判定する（１５０１）。もし妥当でなければ、当該文候補を削除する。
【００８２】
つぎに、得られた文字列の内容が妥当かどうか、検定する。ここでは例えば金額の値が異常に大きかったり小さかったりしないかなどを調べ、もし妥当でなければ、当該文候補を削除する（１５０２）。最後に、残った文候補の中から最も評価値が大きいものを選択し、出力する（１５０３）。もし、一つも文候補が残らなかったなら、空文字列を出力する。
【００８３】
配置検定処理の対象の一例を図１６（Ａ）に示す。ここでは、文候補１６０１が候補構文要素１６０２と１６０３から見い出されたものであるとする。通常は数字や「￥」などの中央附近にあるべき「−」が、この例では上の方に位置している。検定処理はこうした配置情報の異常を検出し、不正があれば当該文候補を削除する。こうした判定の際には、標準化辞書中に格納された空間的配置に関する制約条件を利用する。
【００８４】
また、図１６（Ｂ）の例で、文字識別が候補パタン１６０４を「，」と識別（すなわち「，」とみなすことに対し文字識別が高い評価値を出力）したとする。この場合、本来下にあるべき「，」が上の方にあるため、検定処理はこのように１６０４を「，」とみなす解釈を含む文候補を削除する。
【００８５】
また、同じく図１６（Ｂ）において候補パタン１６０４「１」と文字識別が識別したが他の文字と比べて極端に小さい場合、図１６（Ｃ）において候補パタン１６０５（破線でかこわれている部分）を文字識別が「３」と識別したが他の文字より極端に大きい場合などが、検定で文候補を削除する対象となる。
【００８６】
以上の実施例では、ＣＹＫ法にならい、生成規則を一旦標準化してから構文解析４０５を実行する。一方、標準化せずに構文解析を行うことも可能である。この場合、長さが３以上の記号列を生成する生成規則を、候補構文要素の登録の際に用いることができる。このため、標準化した生成規則を用いる際に比べ、不要な候補構文要素を少なくすることができる。
【００８７】
標準化しない生成規則を構文解析４０５で用いるためには、候補構文要素をつぎの数７のような形式に改める必要がある。
【００８８】
【数７】

【００８９】
ここで｛ｃｓｅ１，ｃｓｅ２，，，ｃｓｅｌ｝は生成規則右辺に対応する候補構文要素の列である。これは、前記数４で生成規則右辺に対応する候補構文要素を表していたｃｓｅＬとｃｓｅＲに代わるものである。
【００９０】
処理手順に関しては、終端記号の評価８０１、認識結果出力８０３はそれぞれ図９、図１１の説明のとおりでよい。非終端記号の評価は図１７のような処理手順となる。ここでは説明を簡単にするために、動的計画法に基づく絞り込み（図１０では処理ブロック１００５）を行わない処理を例としている。
【００９１】
まず、ループ１７０１の冒頭で前回繰り返し時点の候補構文要素数ｍ０をｍに代入する。つぎに、生成規則に対応する連続した候補構文要素の列が存在するかどうかを調べる（１７０２）。存在すれば、ｍを１加算し、この候補構文要素の列に対応する新たな候補構文要素を配列ｃｓｅのｍ番目に登録する。ループ１７０１の最後で、もしｍ０＝ｍであれば、すなわちその繰り返しで新たに登録された候補構文要素がなければ、ループを終了する。
【００９２】
図１８は、文法情報辞書ファイル内の記述で空間的配置の制約条件を表す関数の意味を示す。
【００９３】
「（）」は、当該の生成規則に関し、空間的配置に関する制約がないことを示す。「ｅｑｓ（ａ１，ａ２，，，ａｎ）」は、文字の上端と下端が揃っているかどうかを判定する関数で、ａ１，ａ２，，，ａｎに含まれる文字の上端および下端の座標の最大値と最小値の差が、文字高さの平均の２０％以内であれば真を、そうでない場合には偽となる。
【００９４】
「ｅｑｓｐ（ａ１，ａ２，，，ａｎ）」は、文字のピッチが揃っているかどうか判定する関数で、ｅｑｓ（ａ１，ａ２，，，ａｎ）の条件に加え、ａ１，ａ２，，，ａｎに含まれる文字の重心間の距離の最大値と最小値の差が、平均値の２０％以内であれば真を、そうでない場合には偽となる。
【００９５】
「ｓｍａｌｌ（ａ）」は文字ａが「，」などのように小さい文字かを判定する関数で、ａの（下端から上端まで）の値が、それ以外の文字の平均の４０％以下であれば真を、そうでない場合には偽となる。
【００９６】
「ｌｏｗｅｒ（ａ）」は文字ａが「，」などのように文字行の下の方にある文字かを判定する関数で、ａの上端の値とａ以外の文字の上端の平均の値の差が、ａ以外の文字の高さの平均の４０％以下であれば真を、そうでない場合には偽となる。
【００９７】
「ｍｉｄｄｌｅ（ａ）」は文字ａが「ー」などのように文字行の中の辺りの高さにある文字かを判定する関数で、ａの重心のＹ座標値とａ以外の文字の重心の平均のＹ座標値の差が、ａ以外の文字の高さの平均の４０％以下であれば真を、そうでない場合には偽となる。
【００９８】
「ｎｍｒ（ａ）」は、ａを構成する文字から数字のみを取り出して、他の関数に渡すための関数である。
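Two of the predicates defined above, small(a) and middle(a), can be sketched as follows. The `Box` type, the use of the bounding-box center as the centroid, and passing the other characters as an explicit `others` argument are assumptions made for illustration; the 40% ratios are the ones stated in the text.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Box:
    top: float      # y coordinate of the upper edge (y grows downward)
    bottom: float   # y coordinate of the lower edge

    @property
    def height(self) -> float:
        return self.bottom - self.top

    @property
    def center_y(self) -> float:
        return (self.top + self.bottom) / 2


def small(a: Box, others: Sequence[Box], ratio: float = 0.4) -> bool:
    """True if the height of a is at most 40% of the mean height of the other characters."""
    return a.height <= ratio * mean(b.height for b in others)


def middle(a: Box, others: Sequence[Box], ratio: float = 0.4) -> bool:
    """True if the vertical center of a lies within 40% of the mean height of the others' mean center."""
    avg_center = mean(b.center_y for b in others)
    avg_height = mean(b.height for b in others)
    return abs(a.center_y - avg_center) <= ratio * avg_height
```

With two digits of height 20, a bar such as `Box(9, 11)` satisfies both predicates, while a comma such as `Box(16, 22)` satisfies `small` but not `middle`.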
【００９９】
【発明の効果】
数字の文字列の知識を文脈自由文法で記憶する手段を設けることにより、数字を中心とする文字列の表記上の制約が簡潔に表現できる。また、上昇型構文解析法を利用した文字列照合手段により、高速かつ高精度に文字識別結果を文字列として解釈することを実現できる。さらに、文脈自由文法における記号列同士の書換えの規則に併せて文字の空間的配置の制約を記憶する手段を設けるとともに、置換え規則を適用する際に文字の空間的配置が妥当かどうか判定する手段を設けることにより、より高精度で文字列の認識が可能となる。
【図面の簡単な説明】
【図１】本発明の一実施例におけるハードウエアの構成例を示すブロック図。
【図２】読み取り対象の文書の例を示す図。
【図３】文法情報辞書ファイルの例を示す説明図。
【図４】本発明の文字列認識のデータフローの例を示す流れ図。
【図５】本発明におけるフィールド画像の例を示す図。
【図６】切出し仮説ネットワークの例を示す図。
【図７】数字文字列の標準化された生成規則の例を示す説明図。
【図８】構文解析のデータフローの例を示す流れ図。
【図９】終端記号の評価の処理手順の例を示す図。
【図１０】非終端記号の評価の処理手順の例を示す図。
【図１１】認識結果出力の処理手順の例を示す図。
【図１２】文字列再構成関数の処理手順の例を示す図。
【図１３】候補構文要素の登録の過程の例を示す説明図。
【図１４】候補構文要素の登録の過程の例を示す図。
【図１５】検定処理のデータフローの例を示す流れ図。
【図１６】配置検定の対象の例を示す説明図。
【図１７】標準化しない生成規則を用いた非終端記号評価処理手順の例を示す図。
【図１８】空間的配置の制約を表す関数の例を示す説明図。
【符号の説明】
１０１…文法登録装置、１０２…文字列認識装置、１０３…通信回線、１０４…認識対象の文書、４０５…構文解析、４０８…検定。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a character string recognition apparatus that reads character strings consisting mainly of digits, such as amounts, dates, and telephone numbers, written on documents such as forms.
[0002]
[Prior art]
Representative examples of character strings mainly consisting of numbers include a monetary amount, a telephone number, and a date. In the following, the prior art related to character string recognition will be described by taking the recognition of a character string representing the amount as an example.
[0003]
Character strings representing amounts of money written on forms include, for example, “¥100-”, “¥1,200-”, and “¥12,000,000-”.
These follow rules such as the following:
・ “¥” is placed at the beginning of the amount,
・ “,” is inserted every three digits,
・ “-” is placed at the end of the amount.
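As a quick illustration of the three notation rules above, they can be written as a regular expression. This is only a sketch of the rules themselves, not of the recognition method discussed in this document; the name `is_valid_amount` and the assumption that the leading digit is non-zero are hypothetical.

```python
import re

# "¥", then 1 to 3 digits (leading digit assumed non-zero),
# then zero or more groups of "," plus exactly 3 digits, then "-".
AMOUNT_RE = re.compile(r"¥[1-9][0-9]{0,2}(?:,[0-9]{3})*-")


def is_valid_amount(text: str) -> bool:
    """Return True if text follows the three notation rules."""
    return AMOUNT_RE.fullmatch(text) is not None
```

A check of this kind works only on clean text. The difficulty addressed in the rest of the document is that the characters themselves are uncertain before the rules can be applied.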
[0004]
As a method for reading such a character string, the following procedure is generally known.
(1) Extraction of characters: Extract partial images that are likely to be characters.
(2) Character identification: The character category (character code) of each partial image is identified.
(3) Character string verification: Interpret the character identification result as a character string.
[0005]
For the character cutout in (1) above, a method is often used that extracts, from a character line, combinations of connected components satisfying certain conditions such as position and size. Characters that touch one another can make the boundaries between characters difficult to detect. In such cases, accuracy can be improved by detecting and cutting the touching portions and by applying the multiple hypothesis testing method to character cutout (Reference 1: H. Fujisawa, Y. Nakano, and K. Kurino, “Segmentation Methods for Character Recognition: From Segmentation to Document Structure Analysis”, Proc. of the IEEE, vol. 80, no. 7, 1992).
[0006]
Here, the multiple hypothesis testing method is a method that first tries several ways of cutting out the characters and then uses the character identification results to decide which way to adopt.
[0007]
For the character identification in (2) above, a pattern matching method is often used that selects, from standard patterns stored in advance, the one most similar to the input pattern. Character identification usually outputs several character candidates together with evaluation values of their likelihood. The subsequent character string matching then selects, from these candidates, the ones that are valid when interpreted as a character string.
[0008]
In the character string collation of (3) above, it is checked whether or not the character string to be recognized complies with the above rules. In addition, when there is dirt on the form or blurring of the character, or when the deformation of the character is large or the characters are in contact with each other, it is difficult to accurately recognize the character string. In such a case, in order to improve the accuracy of character extraction and character identification, it is known as a solution to perform character extraction or selection of candidate characters using the above rules by character string matching processing.
[0009]
Known methods for (3) above include one targeting I.D. codes and the like in semi-fixed-format forms (Reference 2: Marukawa et al., “A study of character string check function in recognition of numeric character strings with notation rules”, 1994 IEICE Autumn Conference D-313, p. 321) and ones targeting the chome and lot numbers of addresses (Reference 3: Ogata et al., “Verification method for continuous notation of address display number and building / room number”, IEICE Technical Report PRMU99-219; Reference 4: Ooi et al., “Chome / block recognition method in address reading”, IEICE Technical Report NLC92-96, 1992).
[0010]
However, conventional character string matching methods depend heavily on the recognition target, and changing the target is difficult. For example, the methods of References 2 and 4 are each specialized for their own recognition target and are difficult to apply to other targets. The technique of Reference 3 allows the recognition target to be changed by changing the dictionary of notation formats, but the number of notation combinations is often enormous, so changing the dictionary is not easy.
[0011]
In addition, it is known that many notation formats can be described concisely when a context-free grammar is used for character string matching (Reference 5: Koga et al., “Place name notation method and place name character string recognition method”, Japanese Patent Application No. 11-187753). However, there has been no example in which such a method is applied to character strings consisting mainly of digits, as in the present invention.
[0012]
Furthermore, reading amounts and the like with high accuracy requires information on the spatial arrangement of characters: for example, “,” sits near the bottom of the text line, and digits are nearly uniform in size whereas “-” and “,” may be small. However, conventional methods could not effectively link such arrangement information with the notation rules.
[0013]
[Problems to be solved by the invention]
The first problem to be solved by the present invention is to realize a character string recognition apparatus that uses the notation rules of character strings consisting mainly of digits to recognize character identification results as a character string with high accuracy and at high speed. The second problem is to make it possible to describe contextual notation constraints concisely, so that the character string recognition apparatus can easily be adapted to various recognition targets. The third problem is to handle the spatial arrangement of characters in connection with the way the character string is written, and thereby to recognize character strings with high accuracy.
[0014]
[Means for Solving the Problems]
The character string recognition apparatus of the present invention is provided with means for storing knowledge of numeric character strings as a context-free grammar. This allows the notational constraints on character strings consisting mainly of digits to be expressed concisely. It is further provided with character string matching means based on a bottom-up (ascending) parsing method, so that character identification results are interpreted as a character string at high speed and with high accuracy. In addition, it is provided with means for storing constraints on the spatial arrangement of characters together with the rewriting rules between symbol strings in the context-free grammar, and with means for determining, when a rewriting rule is applied, whether the spatial arrangement of the characters is valid.
[0015]
The context-free grammar expresses restrictions on the notation of a character string by a rewrite rule (generation rule) between symbol strings. By using this to define a combination of numbers and “,”, “¥”, etc. grammatically, it is possible to simply express restrictions on the notation of character strings centered on numbers.
[0016]
The bottom-up (ascending) parsing method used to interpret the character identification results finds the optimal interpretation of the character string by repeatedly rewriting partial symbol strings into higher-level symbols according to the generation rules.
[0017]
A similar method is known in the field of speech recognition as the CYK method (Document 6: Nakagawa “Speech Recognition by Stochastic Model” The Institute of Electronics, Information and Communication Engineers). This method can efficiently select an optimal character string even when there are a large number of grammatically acceptable character string combinations, and can interpret a character identification result with high accuracy and high speed. Originally, this method was intended for a one-dimensional input such as an audio signal. In the present invention, the method of cutting out characters is expressed in the form of a network called a cut-out hypothesis network, which will be described later, so that a two-dimensional input such as an image can be handled.
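A minimal sketch of CYK-style bottom-up parsing over a network of segmentation hypotheses follows. Several points are assumptions made for illustration: the rules are already in a binary standard form, each candidate pattern is an edge `(lft, rit)` carrying classifier outputs as `(character, score)` pairs, a combination is scored by the sum of its parts, only the best-scoring element per `(lft, rit, symbol)` is kept, and the best start symbol over any span is returned. The names `parse_lattice`, `unary`, `binary`, and the sample grammar are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]
Entry = Tuple[float, str]  # (score, recognized string)


def parse_lattice(
    edges: Dict[Span, List[Tuple[str, float]]],
    unary: Dict[str, List[str]],               # terminal a -> non-terminals A with A -> a
    binary: Dict[Tuple[str, str], List[str]],  # (B, C) -> non-terminals A with A -> B C
    start: str,
    threshold: float = 0.5,
) -> Optional[Entry]:
    chart: Dict[Tuple[int, int, str], Entry] = {}

    def offer(key: Tuple[int, int, str], value: Entry) -> bool:
        """Keep value only if it beats the current best entry for the same span and symbol."""
        if key not in chart or value[0] > chart[key][0]:
            chart[key] = value
            return True
        return False

    # Terminal evaluation: accept classifier outputs whose score exceeds the threshold.
    for (lft, rit), candidates in edges.items():
        for ch, score in candidates:
            if score > threshold:
                for nt in unary.get(ch, []):
                    offer((lft, rit, nt), (score, ch))

    # Non-terminal evaluation: combine adjacent elements until nothing new is added.
    changed = True
    while changed:
        changed = False
        items = list(chart.items())
        for (l1, r1, b), (s1, t1) in items:
            for (l2, r2, c), (s2, t2) in items:
                if r1 != l2:
                    continue  # the two elements are not adjacent in the network
                for a in binary.get((b, c), []):
                    if offer((l1, r2, a), (s1 + s2, t1 + t2)):
                        changed = True

    sentences = [v for (_, _, sym), v in chart.items() if sym == start]
    return max(sentences, default=None)


# Hypothetical grammar and network for "¥12-"; edge (1, 3) is a competing single pattern.
UNARY = {"¥": ["yen"], "-": ["bar"], **{d: ["dg", "num"] for d in "0123456789"}}
BINARY = {("num", "dg"): ["num"], ("yen", "num"): ["yn"], ("yn", "bar"): ["amt"]}
EDGES = {
    (0, 1): [("¥", 0.9)],
    (1, 2): [("1", 0.8), ("7", 0.6)],
    (2, 3): [("2", 0.9)],
    (1, 3): [("0", 0.55)],
    (3, 4): [("-", 0.7)],
}
best = parse_lattice(EDGES, UNARY, BINARY, start="amt")
```

Because adjacency is tested on boundary identifiers and not on positions in a linear sequence, the same loop handles competing segmentations without enumerating them one by one.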
[0018]
In the present invention, the notation of the generation rule is further devised so that when a symbol string is rewritten with another symbol, the spatial arrangement constraints satisfied by the characters constituting the symbol string can be stored. As a result, the spatial arrangement of characters can be inspected by the character string matching process or the subsequent process, and the recognition accuracy can be improved.
[0019]
DETAILED DESCRIPTION OF THE INVENTION
An example of an embodiment of the present invention is shown in FIG. This embodiment includes a grammar registration device 101 that supports manual creation of a grammar, and a character string recognition device 102 that recognizes a character string on a document 104 using a given grammar.
[0020]
Grammar data created manually by the grammar registration apparatus 101 is transmitted to the character string recognition apparatus 102 via the communication line 103. Further, copying may be performed via a storage medium such as an FD (floppy disk).
[0021]
The grammar registration device 101 includes a communication device 110, a processor 112, an FD drive 114, a disk device 115, a memory 113, a display device 116, a keyboard 117, and a mouse 118 connected by a bus 111.
[0022]
The communication device 110 is connected to the communication line 103 and manages communication with the character string recognition device 102. The processor 112 is responsible for overall control of the grammar registration apparatus 101, execution of programs that support creation of grammars, and the like. The program file group 120 and the created grammar information dictionary file 121 are stored in the disk device 115 and read into the memory 113 when the system is activated.
[0023]
The manual grammar editing is performed by the grammar dictionary editing program 123. This has functions such as display / editing of data in the grammar dictionary table 125 stored in the memory 113, reading from the disk device 115, and writing to the disk device 115. A person who performs editing performs input using an input device such as the keyboard 117 and the mouse 118 while confirming the contents displayed on the display device 116.
[0024]
An example program 124 exemplifying what character string is generated from the grammar stored in the grammar dictionary table 125 is also stored in the memory 113 in order to support the editing process. The example program 124 also checks whether the grammar is valid. Further, a grammar case file 122 in which past grammar cases are registered is also stored in the disk device 115 for reference during creation. The FD drive 114 is used when copying the grammar to the character string recognition device 102 using the FD.
[0025]
The character string recognition device 102 is obtained by connecting a communication device 130, a processor 132, a scanner 136, an FD drive 134, a disk device 135, and a memory 133 through a bus 131. The processor 132 is responsible for overall control of the character string recognition apparatus 102, execution of character string recognition programs, and the like. The program file group 160 and the grammar information dictionary file 161 created by the grammar registration device 101 are stored in the disk device 135 and read into the memory 133 when the system is activated.
[0026]
The grammar created by the grammar registration device 101 is stored in the disk device 135 by the communication device 130 via the communication line 103, or is copied from an FD using the FD drive 114. The document 104 to be read is photoelectrically converted by the scanner 136, captured as an input image, and stored in the memory 133. A binary image is usually used as the input image. Unless otherwise specified in this specification, images are assumed to be binary images.
[0027]
The memory 133 stores the programs described later, namely a field cutout program 141, a cut-out hypothesis generation program 142, a syntax analysis program 143, a standardization program 144, a verification program 145, and others. In addition to the standard type generation rule table 150 that stores the grammar information, the memory 133 holds a field image table 147, a cut-out hypothesis table 148, a syntax element lattice table 149, a sentence candidate table 151, and a recognition result table 152, all described later.
[0028]
An example of the document 104 that is a recognition target of the present invention is schematically shown in FIG. The document to be read in this embodiment is a form in which areas are divided by frame lines. It is determined that a character string to be read is entered in one of the divided frames 201, and the coordinates of the frame are known in advance.
[0029]
The grammar used in the present invention to express the notation rules of numeric character strings takes the form of Equation 1 below (Reference 7: J. E. Hopcroft et al., “Language Theory and Automata”, Saiensu-sha, ISBN-7819-0250-2).
[0030]
[Expression 1]

[0031]
Here, the characters actually used in numeric character strings, namely digits and symbols such as “¥”, “,”, and “-”, are the terminal symbols, and the other syntax elements (called grammatical classes in some literature) are the non-terminal symbols (see p. 11 of Reference 7). A generation rule expresses how a symbol string is generated: the symbol string on the right side of “→” in Equation 1 can be generated from the symbol on its left side (see p. 10 of Reference 7).
[0032]
Since the context free grammar is used in the present invention, the left side is always one variable, and the right side must not be an empty character string. Furthermore, a grammatically valid character string (such as “¥ 1,234-”) corresponds to the sentence start symbol. In the following, lower case alphabets are used for terminal symbols, upper case alphabets are used for variables representing non-terminal symbols or terminal symbols, and Greek letters are used for variable strings.
[0033]
Text data as shown in FIG. 3 is used for the grammar information dictionary file 121 used in the present invention. Here, “::=” corresponds to “→”, which denotes rewriting in Equation 1, and each line corresponds to one generation rule. A terminal symbol is always a one-character word, a non-terminal symbol is a word of two or more characters, and the words and “::=” are separated by spaces. For example, line 10 of FIG. 3 links the symbol Binzn (301) and the symbol string “Nzd 0” (303) with the symbol “::=” (302), indicating that the latter can be generated from the former. The non-terminal symbol that appears last in this text data, that is, on the last line in the example of FIG. 3, is regarded as the sentence start symbol. In this case, “amt” is the sentence start symbol.
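A minimal sketch of reading rules in the text format just described. FIG. 3 is not reproduced here, so the sample rules other than `Binzn ::= Nzd 0` are hypothetical, as are the names `parse_grammar` and `is_terminal`. The sketch also assumes that the start symbol is the left side of the last rule and that each line holds only the rule, with no spatial arrangement annotation.

```python
from typing import List, Tuple

Rule = Tuple[str, List[str]]  # (left side, right-side symbols)


def is_terminal(symbol: str) -> bool:
    """Terminal symbols are one-character words; non-terminals have two or more characters."""
    return len(symbol) == 1


def parse_grammar(text: str) -> Tuple[List[Rule], str]:
    """Parse lines of the form 'Lhs ::= sym sym ...' and return (rules, start symbol)."""
    rules: List[Rule] = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if len(tokens) < 3 or tokens[1] != "::=":
            raise ValueError(f"malformed rule: {line!r}")
        lhs, rhs = tokens[0], tokens[2:]
        if is_terminal(lhs):
            raise ValueError(f"left side must be a non-terminal: {line!r}")
        rules.append((lhs, rhs))
    if not rules:
        raise ValueError("empty grammar")
    return rules, rules[-1][0]


SAMPLE = """\
Nzd ::= 1
Binzn ::= Nzd 0
amt ::= ¥ Binzn -
"""
rules, start = parse_grammar(SAMPLE)
```

Telling terminals from non-terminals by word length alone keeps the file format free of extra markup, at the cost of reserving every one-character word for terminals.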
[0034]
In the grammar information dictionary file 121 used in the present invention, information on the arrangement of characters is stored in a format such as 304, together with the grammar information described above. This represents the constraints on spatial arrangement that the characters contained in the original character string must satisfy when each generation rule is applied. For example, in 304, ai (i = 1, 2, 3, ...) denotes the i-th term on the right side of “::=”, and “eqps(a1, a2)” indicates that the characters contained in a1 and a2 (that is, the partial images corresponding to terminal symbols) are approximately equal in size and pitch. The meanings of the symbols used here are shown in FIG. 18.
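A constraint of this kind can be sketched as a predicate over character bounding boxes. The 20% tolerances follow the description of the constraint functions given for FIG. 18 elsewhere in this document; the `Box` type, the use of the bounding-box center as the centroid, and the helper name `eqs` for the size-only check are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float   # y grows downward, so bottom > top

    @property
    def height(self) -> float:
        return self.bottom - self.top

    @property
    def center_x(self) -> float:
        return (self.left + self.right) / 2


def eqs(boxes: Sequence[Box], tol: float = 0.2) -> bool:
    """True if the upper edges and the lower edges each vary by at most 20% of the mean height."""
    avg_height = mean(b.height for b in boxes)
    tops = [b.top for b in boxes]
    bottoms = [b.bottom for b in boxes]
    return (max(tops) - min(tops) <= tol * avg_height
            and max(bottoms) - min(bottoms) <= tol * avg_height)


def eqps(boxes: Sequence[Box], tol: float = 0.2) -> bool:
    """True if eqs holds and the distances between neighboring centers vary by at most 20% of their mean."""
    if not eqs(boxes, tol):
        return False
    xs = sorted(b.center_x for b in boxes)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    if len(gaps) < 2:
        return True   # fewer than two gaps: the pitch is trivially uniform
    return max(gaps) - min(gaps) <= tol * mean(gaps)
```

Three evenly spaced digits of similar height pass `eqps`, whereas adding a comma that sits low in the line makes even the size check `eqs` fail.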
[0035]
The flow of processing in character string recognition according to the present invention will be described with reference to FIG. 1 and the data flow diagram of FIG. 4. The character string recognition process 402 is realized by the character string recognition apparatus 102. It takes as input the grammar information 401 (stored in the grammar information dictionary file 161 of FIG. 1) and the image captured by the scanner 136, and outputs the character string of the recognition result. The grammar information 401 is edited manually in advance by grammar information editing 400, which is realized by the grammar registration apparatus 101 of FIG. 1.
[0036]
First, the field cutout 403 is a process of cutting out an image of the reading target area (reading field) from the input image and outputting it to the field image table 147, and is realized by the field cutout program 141. The cut-out hypothesis generation 404 is a process of analyzing the field image and detecting partial images (candidate patterns) that are likely to be characters based on information such as size and shape, and is realized by the cut-out hypothesis generation program 142. The result is stored in the cut-out hypothesis table 148 in the form of a cut-out hypothesis network described later.
[0037]
The standardization process 406 is a process of converting the input grammar information 401 into a standard type that can be easily used in later processing, and is realized by the standardization program 144, and the result is stored in the standard type generation rule table 150.
[0038]
The syntax analysis 405 is realized by the syntax analysis program 143, which is a process of recognizing a character string using the character identification 407 using the standardized generation rule and the extracted hypothesis network as inputs. If a reasonable sentence start symbol is obtained, information on the character string corresponding to the sentence start symbol is output to the sentence candidate table 151 in the form of data called a sentence candidate.
[0039]
The verification process 408 is a process of checking in detail whether the obtained sentence candidates are valid and outputting the most probable of the valid ones to the recognition result table 152 as the recognition result, and is realized by the verification program 145.
[0040]
Next, the processing of the cut-out hypothesis generation 404, standardization 406, character identification 407, syntax analysis 405, and verification 408 will be described in order.
[0041]
The cut-out hypothesis generation 404 is a process of analyzing a field image and detecting a partial image (candidate pattern) that is likely to be a character based on information such as size and shape. At this stage, it is often impossible to uniquely specify how to cut out characters. For this reason, the cut-out hypothesis generation 404 detects candidate patterns based on a hypothesis on how to cut out a plurality of characters, and selects the correct one in the subsequent processing.
[0042]
For example, candidate patterns as shown in FIG. 6 can be detected from the field image shown in FIG. 5, which is obtained by cutting out the frame 201 of the input image shown in FIG. 2. As shown in FIG. 5, it is unavoidable that characters and dirt other than those to be read are mixed into the field image. The set of these candidate patterns can be expressed by a network as shown in FIG. 6. Hereinafter, such a network is referred to as a cut-out hypothesis network.
[0043]
On the cut-out hypothesis network, the detected candidate pattern is expressed as an edge, and the boundary of the candidate pattern is expressed as a node. A hypothesis on how to cut out characters in a field is expressed by a path in the cut-out hypothesis network.
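The relationship between paths and cut-out hypotheses can be illustrated with a minimal sketch. The tuple layout (left boundary, right boundary, pattern name), the function name segmentation_paths and the sample edges are assumptions for this example; the network is assumed to be acyclic, with every edge running from a smaller to a larger boundary identifier.

```python
from collections import defaultdict

def segmentation_paths(edges, start, end):
    """Enumerate every path from start to end; each path is one cut-out hypothesis."""
    outgoing = defaultdict(list)
    for lft, rit, name in edges:          # edge = candidate pattern, nodes = boundaries
        outgoing[lft].append((rit, name))
    paths = []

    def walk(node, acc):
        if node == end:
            paths.append(acc)
            return
        for nxt, name in outgoing[node]:
            walk(nxt, acc + [name])

    walk(start, [])
    return paths

# Hypothetical network: pattern p2 covers the same range as p0 followed by p1.
edges = [(0, 1, "p0"), (1, 2, "p1"), (0, 2, "p2"), (2, 3, "p3")]
paths = segmentation_paths(edges, 0, 3)
```

Here the two resulting paths correspond to reading the first region either as two separate characters or as a single character.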
[0044]
An integer identifier is assigned to the candidate pattern boundary on the extracted hypothesis network. Each candidate pattern is expressed in the following formula 2. Further, the entire cut-out hypothesis network is expressed as in Equation 3.
[0045]
[Expression 2]

[0046]
[Equation 3]

[0047]
The standardization process 406 is a process of converting the input grammar information into a standard form that is easy to use in later processing. Here, the standard form is one in which the right side of every generation rule consists of either one terminal symbol or two symbols. An arbitrary context-free grammar G can be converted into such a standard form (Chomsky normal form) (see p. 56 of Reference 7 mentioned above).
[0048]
FIG. 7 shows an example of the generation rule converted from FIG. 3 to the standard type. At this time, the description of the constraint on the spatial arrangement is also converted together with the generation rule. The conversion is performed as follows.
[0049]
- If there is a description of the spatial arrangement related to the standardized generation rule, it is added to the standardized generation rule as it is.
[0050]
- When the spatial arrangement is represented by a function having three or more arguments, such as “eqps (a1, a2, a3)”, it is divided into two-argument terms, “eqps (a1, a2)” and “eqps (a2, a3)”, which are added to the corresponding generation rules.
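The two conversion steps can be sketched as follows. This is only an illustration under stated assumptions: the helper non-terminal names (lhs plus a running number) are hypothetical, only the binarization part of the conversion to the standard form is shown, and the sketch does not decide which standardized rule each split constraint is attached to, since the text does not spell that out.

```python
def binarize(lhs, rhs):
    """Split a rule with three or more right-side symbols into two-symbol rules."""
    rules = []
    current, rest, count = lhs, tuple(rhs), 0
    while len(rest) > 2:
        count += 1
        helper = f"{lhs}_{count}"                 # hypothetical helper non-terminal
        rules.append((current, (rest[0], helper)))
        current, rest = helper, rest[1:]
    rules.append((current, rest))
    return rules

def split_constraint(name, args):
    """Rewrite an n-argument constraint as constraints on neighbouring pairs."""
    if len(args) < 3:
        return [(name, tuple(args))]
    return [(name, (a, b)) for a, b in zip(args, args[1:])]

std_rules = binarize("amt", ("yen", "nzd", "zr", "bar"))
std_constraints = split_constraint("eqps", ["a1", "a2", "a3"])
```

In this example the four-symbol rule becomes three two-symbol rules, and the three-argument constraint becomes two pairwise constraints.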
[0051]
The character identification 407 is realized by the character identification program 146 through a process for obtaining a probable terminal symbol and its evaluation value using the candidate pattern as input. A system such as Mori “Pattern Recognition” (Reference 8: ISBN4-88552-075-4 C3055, IEICE) can be used here. In many cases, a plausible terminal symbol cannot be uniquely determined at this stage. In such a case, a plurality of terminal symbol / evaluation value pairs are output.
[0052]
Next, the processing procedure of the syntax analysis 405 will be described with reference to FIGS.
[0053]
FIG. 8 is a data flowchart showing an outline of processing of the syntax analysis 405. The syntax analysis 405 governs processing corresponding to the character string collation (3) described in the section of the conventional technique.
[0054]
First, each candidate pattern is evaluated as a terminal symbol using the character identification 407, and its likelihood is obtained (801). The result is stored in the syntax element lattice table 149 in a format called a syntax element lattice.
[0055]
Next, the obtained combination of syntax element candidates is evaluated against a standardized generation rule (802). This matching process is repeated until no new syntax element is found. Next, if sentence start symbols are included in the obtained syntax element candidates, data (sentence candidates) corresponding to these sentence start symbols are output (803). If the sentence start symbol is not included, a NULL pointer is output.
[0056]
Candidate syntax elements are represented in the following formula 4.
[0057]
[Expression 4]

[0058]
lft and rit are identifiers of boundaries between candidate syntax elements. Let the left boundary identifier be lft and the right boundary identifier be rit. α is a symbol corresponding to the candidate syntax element. f is a probability value of the candidate syntax element based on the character identification result or the like.
[0059]
When α is a terminal symbol, a NULL pointer (a value that cannot be a memory address) is assigned to each of cseL and cseR, and a pointer to the corresponding candidate pattern is assigned to ptn. The values of lft and rit are set to the lft and rit of ptn, respectively, and the evaluation value obtained from character identification is assigned to f. When α is a non-terminal symbol, pointers to the candidate syntax elements corresponding to the symbols on the right side of the generation rule are assigned to cseL and cseR, a NULL pointer is assigned to ptn, and the value of lft is set equal to the lft of cseL.
[0060]
At this time, if cseR is NULL, the value of rit is set equal to the rit of cseL. As an example, a group of candidate syntax elements (non-terminal symbols such as “yen” and “zr”) generated from terminal symbols is schematically shown between broken lines 1303 and 1304 in FIG. 13. If cseR is not NULL, the value of rit is set equal to the rit of cseR.
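The assignment rules for a candidate syntax element can be summarized in a minimal sketch. The class names Pattern and Cse, the attribute name alpha (for α), the field order, and the helper functions terminal and nonterminal are assumptions for this example; None plays the role of the NULL pointer.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pattern:
    lft: int    # identifier of the left boundary
    rit: int    # identifier of the right boundary

@dataclass
class Cse:
    lft: int
    rit: int
    alpha: str                        # symbol of this candidate syntax element
    f: float                          # evaluation value
    cseL: Optional["Cse"] = None      # left child (None for a terminal symbol)
    cseR: Optional["Cse"] = None      # right child (None if absent)
    ptn: Optional[Pattern] = None     # candidate pattern (terminal symbols only)

def terminal(ptn: Pattern, alpha: str, f: float) -> Cse:
    """Terminal symbol: no children, boundaries copied from the candidate pattern."""
    return Cse(ptn.lft, ptn.rit, alpha, f, None, None, ptn)

def nonterminal(alpha: str, f: float, cseL: Cse, cseR: Optional[Cse] = None) -> Cse:
    """Non-terminal symbol: lft from cseL, rit from cseR if present, else from cseL."""
    rit = cseL.rit if cseR is None else cseR.rit
    return Cse(cseL.lft, rit, alpha, f, cseL, cseR, None)
```

For instance, a non-terminal built from one child spanning boundaries 0 to 1 also spans 0 to 1, while one built from children spanning 0 to 1 and 1 to 2 spans 0 to 2.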
[0061]
The syntax element lattice is expressed in the following form as in the case of the candidate pattern network.
[0062]
[Equation 5]

[0063]
FIG. 13 schematically shows an example of the syntax element lattice. A candidate syntax element group of terminal symbols corresponding to candidate patterns on the candidate pattern network 1301 is shown between broken lines 1302 and 1303. Also, non-terminal symbols found from these terminal symbols are shown between 1303 and 1304. In addition, a candidate syntax element group registered from a pair of non-terminal symbols is schematically shown below a broken line 1304.
[0064]
FIG. 9 shows a processing procedure of terminal symbol evaluation (801). Hereinafter, an action diagram (Reference 9: J. Martin “Software structuring technique”, Modern Science Co., Ltd., ISBN 4-7649-0124-2 C3050 P5562E) is used to describe the detailed processing procedure.
[0065]
Here, to denote a specific element, for example lft, of a variable having a plurality of elements, such as cse in Equation 4, the operator “.” is used, as in “cse.lft”. A notation such as “cse = (a, b, c, d, e, f)” means that a, b, c, d, e, and f are assigned to the respective elements of cse. “++” means that 1 is added to the value of a variable.
[0066]
First, the variable m representing the number of registered candidate syntax elements is initialized to 0 at the beginning of the process. Next, the likelihood of every candidate pattern as a terminal symbol is evaluated by the function f. If the value of f exceeds the reference value tf, 1 is added to m, and this syntax element is stored in the m-th element of the candidate syntax element array cse (901).
[0067]
In this case, character identification 407 is used for evaluation of f. Next, for each of the m candidate syntax elements stored in cse, it is checked whether or not the generation rule for generating the symbol α (in this case, α is all terminal symbols) is in the generation rule set P (902). If there is, 1 is added to the value of m, and information corresponding to the candidate syntax element corresponding to the left side of this generation rule is stored in the mth of cse.
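Steps 901 and 902 can be sketched as below. The dictionary layout of a candidate syntax element, the identify callback standing in for the character identification 407, and the choice to let the non-terminal of step 902 inherit the evaluation value of its terminal symbol are assumptions made for this example.

```python
def evaluate_terminals(patterns, identify, unit_rules, tf):
    """Register terminal symbols above the threshold, then the rules that generate them."""
    cse = []
    # Step 901: keep every (symbol, evaluation value) pair whose value exceeds tf.
    for ptn in patterns:
        for alpha, f in identify(ptn):
            if f > tf:
                cse.append({"lft": ptn["lft"], "rit": ptn["rit"], "alpha": alpha,
                            "f": f, "cseL": None, "cseR": None, "ptn": ptn})
    # Step 902: for each registered terminal, register the left side of every
    # generation rule whose right side is that single terminal symbol.
    for elem in list(cse):
        for lhs, rhs in unit_rules:
            if rhs == elem["alpha"]:
                cse.append({"lft": elem["lft"], "rit": elem["rit"], "alpha": lhs,
                            "f": elem["f"],          # inherited value (assumption)
                            "cseL": elem, "cseR": None, "ptn": None})
    return cse

# Hypothetical stand-in for character identification 407.
def fake_identify(ptn):
    return [("¥", 0.9), ("4", 0.2)]

lattice = evaluate_terminals([{"lft": 0, "rit": 1}], fake_identify, [("yen", "¥")], tf=0.5)
```

In this example the low-scoring reading “4” is discarded, and the lattice holds the terminal symbol “¥” and the non-terminal “yen” generated from it.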
[0068]
The evaluation of non-terminal symbols (802) is a process of repeatedly finding a candidate syntax element pair generated by the generation rule from the syntax element lattice and adding candidate syntax elements corresponding to the pair. At this time, the following measures are taken in the processing procedure.
[0069]
The numbers m0 and m1 of candidate syntax elements registered as of the iteration before last and the previous iteration, respectively, are referred to in order to determine whether a new syntax element has been added. If none has been added, the iteration ends.
[0070]
If there are multiple candidate syntax elements covering the same range on the syntax element lattice (that is, multiple candidate syntax elements with the same lft and rit values), only the one with the highest evaluation value is used, following the principle of dynamic programming.
[0071]
The processing procedure of non-terminal symbol evaluation (802) will be described with reference to FIG. 10. First, at the beginning of the process, m0 and m1 are initialized to 0 and m, respectively. Next, the loop 1001 finds all pairs of candidate syntax elements that are adjacent in the syntax element lattice (that is, the rit of one equals the lft of the other) (1002), and checks whether a generation rule that generates the pair exists among the standardized generation rules (1003).
[0072]
If there is a corresponding generation rule and at least one of the candidate syntax elements in the pair was registered after the previous iteration of the loop 1001 (1004), the evaluation value of this combination is calculated and stored in the variable f. Here, the sum of the evaluation values f of the two candidate syntax elements is used as the likelihood of the combination. At this time, the constraint on the spatial arrangement of characters stored together with the standardized generation rule is referred to. If this constraint is not satisfied, the evaluation value is set to 0 so that this portion is rated low in subsequent processing.
[0073]
Further, it is checked whether a candidate syntax element having the same range as the pair of candidate syntax elements has already been registered (1005). If one is registered, the candidate syntax element corresponding to the new pair is stored only when the new pair has a higher evaluation value than the existing candidate syntax element. In that case, it is determined whether a candidate syntax element satisfying the determination of 1005 has been registered after m1 (1006). If so, that candidate syntax element is overwritten with the new one. If not, 1 is added to m, and the new candidate syntax element is stored in the m-th element of the array cse. If no corresponding candidate syntax element is found in the determination of 1005, 1 is unconditionally added to m, and the candidate syntax element corresponding to the new pair is stored in the m-th element of cse.
[0074]
At the end of the loop 1001, it is determined whether a new candidate syntax element has been added. If none has been added, the loop 1001 is terminated (1007). Otherwise, the variables m0 and m1 are updated and the iteration is executed again.
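The repeated pairing can be illustrated by the following simplified sketch. It is not the procedure of FIG. 10: instead of the counters m0 and m1 it simply repeats until nothing changes, and it keeps the best element per (lft, rit, symbol), which is the usual pruning in CYK-style parsing, whereas the text states the check in terms of the range only. The dictionary layout, the function name and the constraint_ok callback are assumptions; the lattice is assumed to have lft smaller than rit for every element.

```python
def evaluate_nonterminals(elements, rules, constraint_ok=None):
    """Bottom-up pairing of adjacent elements with best-score pruning.

    rules: list of (lhs, (left_symbol, right_symbol)) in the standard form.
    """
    best = {}
    for e in elements:
        key = (e["lft"], e["rit"], e["alpha"])
        if key not in best or e["f"] > best[key]["f"]:
            best[key] = e
    changed = True
    while changed:                       # repeat until no element is added or improved
        changed = False
        current = list(best.values())
        for left in current:
            for right in current:
                if left["rit"] != right["lft"]:
                    continue             # not adjacent in the lattice
                for lhs, (b, c) in rules:
                    if (b, c) != (left["alpha"], right["alpha"]):
                        continue
                    f = left["f"] + right["f"]       # sum of evaluation values
                    if constraint_ok is not None and not constraint_ok(lhs, left, right):
                        f = 0.0          # spatial constraint violated
                    key = (left["lft"], right["rit"], lhs)
                    if key not in best or f > best[key]["f"]:
                        best[key] = {"lft": left["lft"], "rit": right["rit"],
                                     "alpha": lhs, "f": f,
                                     "cseL": left, "cseR": right, "ptn": None}
                        changed = True
    return list(best.values())

def _t(alpha, lft, rit, f):              # hypothetical helper for terminal elements
    return {"lft": lft, "rit": rit, "alpha": alpha, "f": f,
            "cseL": None, "cseR": None, "ptn": None}

terms = [_t("¥", 0, 1, 0.5), _t("1", 1, 2, 0.25), _t("0", 2, 3, 0.25)]
grammar = [("binzn", ("1", "0")), ("amt", ("¥", "binzn"))]
result = evaluate_nonterminals(terms, grammar)
```

In this example “binzn” is registered in the first pass and “amt”, covering the whole range 0 to 3, in the second.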
[0075]
Next, the processing procedure of the recognition result output 803 will be described with reference to FIG. 11. First, the variable c representing the number of sentence candidates is initialized to 0 at the beginning of the process. Next, all candidate syntax elements are examined to see whether any of them corresponds to the sentence start symbol (1101). If one does, the function recoverStr is called to reconstruct the character string corresponding to the sentence start symbol, 1 is added to the number of sentence candidates c, and the candidate syntax element and the character string information are stored in the c-th element of the sentence candidate array cnd.
[0076]
Here, the sentence candidate is data represented in the following formula 6.
[0077]
[Formula 6]

[0078]
FIG. 12 shows the processing procedure of the character string reconstruction function recoverStr. The function recoverStr takes a candidate syntax element cse and a character array str as arguments, and stores in str the string of terminal symbols corresponding to cse. First, the symbol corresponding to cse is checked. If it is a terminal symbol, this terminal symbol is appended to the end of the character string, and the process ends. If it is not a terminal symbol, recoverStr is called recursively with the candidate syntax elements cse.cseL and cse.cseR as arguments. By calling recoverStr with cse.cseL first, the characters are stored in str in the correct order.
[0079]
FIG. 13 schematically shows a process in which candidate syntax elements are registered in the syntax analysis 405. Reference numeral 1301 schematically denotes a candidate pattern network to be input. The candidate syntax elements registered in the first iteration of the loop 1001 in FIG. 10 are shown between broken lines 1302 and 1303. Similarly, the candidate syntax elements registered in the second and third iterations are shown below the broken line 1303. The candidate syntax element corresponding to the finally registered sentence start symbol S is shown below the broken line 1305.
[0080]
FIG. 14 schematically shows the relationships among only those candidate syntax elements that contribute to registering the candidate syntax elements 1402 and 1403 corresponding to the sentence start symbol S. The dotted arrows in the figure indicate how the generation rules are applied when candidate syntax elements are registered. For example, an arrow 1404 and an arrow 1405 indicate that, when the candidate syntax element 1406 is registered, the generation rule ynr → yen binzn is applied with reference to the candidate syntax elements 1407 and 1408.
[0081]
The verification process 408 is a process of verifying in more detail whether each sentence candidate detected by the syntax analysis 405 is a valid character string. An outline of the verification process 408 is shown in FIG. 15. The input is one or more sentence candidates. First, it is determined whether the relationship between the positions and sizes of the candidate patterns underlying each sentence candidate is appropriate (1501). If it is not, the sentence candidate is deleted.
[0082]
Next, it is tested whether the contents of the obtained character string are valid. Here, for example, whether the value of the amount is abnormally large or small is checked, and if it is not valid, the sentence candidate is deleted (1502). Finally, the sentence having the largest evaluation value is selected from the remaining sentence candidates and output (1503). If no sentence candidates remain, an empty string is output.
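The three steps 1501 to 1503 amount to filtering followed by selecting the maximum, as in the sketch below. The dictionary keys "str" and "f", the two predicate callbacks and the upper limit used in the example are assumptions made for illustration.

```python
def verify(candidates, layout_ok, content_ok):
    """Drop invalid sentence candidates and return the best remaining string."""
    remaining = [c for c in candidates if layout_ok(c)]        # step 1501
    remaining = [c for c in remaining if content_ok(c)]        # step 1502
    if not remaining:
        return ""                                              # no candidate left
    return max(remaining, key=lambda c: c["f"])["str"]         # step 1503

# Hypothetical candidates and checks.
candidates = [
    {"str": "¥1,000-", "f": 3.0},
    {"str": "¥7,000-", "f": 2.5},
    {"str": "¥99,999,999,999-", "f": 4.0},
]

def amount_in_range(c, limit=10_000_000):
    digits = "".join(ch for ch in c["str"] if ch.isdigit())
    return int(digits) <= limit

best = verify(candidates, layout_ok=lambda c: True, content_ok=amount_in_range)
```

In this example the third candidate has the highest evaluation value but is removed because the amount is abnormally large.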
[0083]
An example of the target of the arrangement verification process is shown in FIG. 16A. Here, it is assumed that the sentence candidate 1601 has been found from the candidate syntax elements 1602 and 1603. Normally “−” should be located near the vertical center of characters such as numerals or “¥”, but in this example it is located at the top. The verification process detects such an abnormality in the arrangement and deletes the sentence candidate if there is an irregularity. In making this determination, the constraints on spatial arrangement stored in the standardized dictionary are used.
[0084]
Further, in the example of FIG. 16B, it is assumed that character identification identifies the candidate pattern 1604 as “,” (that is, it outputs a high evaluation value for “,”). In this case, since “,” should be located at the bottom of the character line, the verification process deletes the sentence candidates that interpret 1604 as “,”.
[0085]
Similarly, the sentence candidate is deleted by the verification when, for example, the candidate pattern 1604 in FIG. 16B is identified as “1” but is extremely small compared with the other characters, or the candidate pattern 1605 in FIG. 16 is identified as “3” but is extremely large compared with the other characters.
[0086]
In the above embodiment, the parsing 405 is executed after the generation rule is once standardized in accordance with the CYK method. On the other hand, parsing can also be performed without standardization. In this case, a generation rule that generates a symbol string having a length of 3 or more can be used when registering candidate syntax elements. For this reason, unnecessary candidate syntax elements can be reduced as compared with the case of using a standardized generation rule.
[0087]
In order to use the generation rule that is not standardized in the syntax analysis 405, it is necessary to modify the candidate syntax element into the following formula 7.
[0088]
[Expression 7]

[0089]
Here, {cse1, cse2, ..., csel} is the sequence of candidate syntax elements corresponding to the right side of the generation rule. It replaces cseL and cseR, which represent the candidate syntax elements corresponding to the right side of the generation rule in Equation 4.
[0090]
Regarding the processing procedure, the terminal symbol evaluation 801 and the recognition result output 803 may be as described in FIGS. 9 and 11, respectively. The evaluation of non-terminal symbols follows the processing procedure shown in FIG. 17. Here, to simplify the description, an example is given of processing that omits the narrowing down based on dynamic programming (processing block 1005 in FIG. 10).
[0091]
First, at the beginning of the loop 1701, the current number m of candidate syntax elements is saved in m0. Next, it is examined whether there is a sequence of consecutive candidate syntax elements corresponding to a generation rule (1702). If one exists, m is incremented by 1, and a new candidate syntax element corresponding to this sequence is registered in the m-th element of the array cse. At the end of the loop 1701, if m0 = m, that is, if no candidate syntax element was newly registered in this iteration, the loop is terminated.
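A minimal sketch of this variant follows. The dictionary layout, the "children" key holding the sequence of matched elements, and the set used to avoid registering the same sequence twice are assumptions for this example; the grammar is assumed to contain no cyclic chain of single-symbol rules, and no pruning by dynamic programming is performed.

```python
def match_sequences(elements, rhs):
    """Yield lists of consecutive elements whose symbols spell out rhs."""
    def walk(i, lft, acc):
        if i == len(rhs):
            yield acc
            return
        for e in elements:
            if e["alpha"] == rhs[i] and (lft is None or e["lft"] == lft):
                yield from walk(i + 1, e["rit"], acc + [e])
    yield from walk(0, None, [])

def evaluate_unnormalized(elements, rules):
    """Register elements for right sides of any length until nothing new appears."""
    cse = list(elements)
    seen = set()
    while True:
        m0 = len(cse)                         # count at the start of this iteration
        snapshot = list(cse)
        for lhs, rhs in rules:
            for seq in match_sequences(snapshot, rhs):
                key = (lhs, tuple(id(e) for e in seq))
                if key in seen:
                    continue                  # this sequence is already registered
                seen.add(key)
                cse.append({"lft": seq[0]["lft"], "rit": seq[-1]["rit"],
                            "alpha": lhs, "f": sum(e["f"] for e in seq),
                            "children": seq})
        if len(cse) == m0:                    # nothing was added: stop
            return cse

def _term(alpha, lft, rit):                   # hypothetical helper
    return {"lft": lft, "rit": rit, "alpha": alpha, "f": 0.5, "children": []}

tokens = [_term("¥", 0, 1), _term("1", 1, 2), _term(",", 2, 3), _term("0", 3, 4)]
parsed = evaluate_unnormalized(tokens, [("amt", ("¥", "1", ",", "0"))])
```

Here a single rule with four right-side symbols registers one element covering the range 0 to 4, without the intermediate elements that a standardized grammar would produce.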
[0092]
FIG. 18 shows the meanings of the functions that express the constraints on the spatial arrangement of characters in the grammar information dictionary file.
[0093]
“()” indicates that there is no restriction on the spatial arrangement for the generation rule. “eqs (a1, a2, ..., an)” is a function that determines whether the upper and lower ends of the characters are aligned. It is true if, for the coordinates of the upper and lower ends of the characters contained in a1, a2, ..., an, the difference between the maximum value and the minimum value is within 20% of the average character height, and false otherwise.
[0094]
“eqsp (a1, a2, ..., an)” is a function that determines whether the character pitch is uniform. It is true if, in addition to the conditions of eqs (a1, a2, ..., an), the difference between the maximum value and the minimum value of the distance between the centers of gravity of the contained characters is within 20% of the average value, and false otherwise.
[0095]
“small (a)” is a function that determines whether the character a is a small character such as “,”. It is true if the height of a (from its lower end to its upper end) is 40% or less of the average height of the other characters, and false otherwise.
[0096]
“lower (a)” is a function that determines whether the character a is at the bottom of the character line, such as “,”. It is true if the difference between the value of the upper end of a and the average value of the upper ends of the characters other than a is 40% or less of the average height of the characters other than a, and false otherwise.
[0097]
“middle (a)” is a function that determines whether the character a is at mid-height in the character line, such as “−”. It is true if the difference between the Y coordinate of the center of gravity of a and the average Y coordinate of the centers of gravity of the characters other than a is 40% or less of the average height of the characters other than a, and false otherwise.
[0098]
“nmr (a)” is a function that extracts only the numerals from the characters constituting a and passes them to other functions.
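Three of these functions can be sketched as follows. The Box class, the convention that the Y coordinate grows downward, the use of the box center in place of the center of gravity, and the reading that the 20% condition applies to the upper ends and the lower ends separately are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Box:
    top: float       # Y coordinate of the upper end (Y grows downward: assumption)
    bottom: float    # Y coordinate of the lower end

    @property
    def height(self):
        return self.bottom - self.top

    @property
    def cy(self):
        # Centre of the box, standing in for the centre of gravity (assumption).
        return (self.top + self.bottom) / 2

def eqs(boxes, tol=0.2):
    """True if upper ends and lower ends each vary by at most tol * average height."""
    h = mean(b.height for b in boxes)
    tops = [b.top for b in boxes]
    bottoms = [b.bottom for b in boxes]
    return (max(tops) - min(tops) <= tol * h
            and max(bottoms) - min(bottoms) <= tol * h)

def small(a, others, ratio=0.4):
    """True if a is at most ratio times as tall as the other characters on average."""
    return a.height <= ratio * mean(b.height for b in others)

def middle(a, others, ratio=0.4):
    """True if the centre of a lies close to the average centre of the others."""
    return abs(a.cy - mean(b.cy for b in others)) <= ratio * mean(b.height for b in others)

digits = [Box(10, 30), Box(11, 31)]   # two digits of similar size
comma = Box(26, 30)                   # short mark at the bottom of the line
dash = Box(19, 21)                    # short mark at mid-height
```

With these hypothetical boxes, the two digits satisfy eqs, the comma satisfies small, and the dash satisfies middle, while adding the comma to the digit group makes eqs false.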
[0099]
[Effect of the invention]
By providing means for storing knowledge about numeric character strings as a context-free grammar, the restrictions on the notation of character strings consisting mainly of numerals can be expressed concisely. Moreover, the character string matching means using a bottom-up syntax analysis method makes it possible to perform character identification and to interpret the identification results as a character string at high speed and with high accuracy. Furthermore, by providing means for storing restrictions on the spatial arrangement of characters in association with the rewriting rules between symbol strings in the context-free grammar, and means for determining whether the spatial arrangement of the characters is appropriate when a rewriting rule is applied, the character string can be recognized with even higher accuracy.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a hardware configuration example according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a document to be read.
FIG. 3 is an explanatory diagram showing an example of a grammar information dictionary file.
FIG. 4 is a flowchart showing an example of a data flow for character string recognition according to the present invention.
FIG. 5 is a diagram showing an example of a field image in the present invention.
FIG. 6 is a diagram showing an example of a cut-out hypothesis network.
FIG. 7 is an explanatory diagram illustrating an example of a standardized generation rule for a numeric character string.
FIG. 8 is a flowchart showing an example of a data flow of syntax analysis.
FIG. 9 is a diagram showing an example of a processing procedure for terminal symbol evaluation.
FIG. 10 is a diagram illustrating an example of a processing procedure for evaluating a non-terminal symbol.
FIG. 11 is a diagram illustrating an example of a processing procedure for outputting a recognition result.
FIG. 12 is a diagram illustrating an example of a processing procedure of a character string reconstruction function.
FIG. 13 is an explanatory diagram showing an example of a process of registering candidate syntax elements.
FIG. 14 is a diagram showing an example of a registration process of candidate syntax elements.
FIG. 15 is a flowchart showing an example of the data flow of the verification process.
FIG. 16 is an explanatory diagram showing an example of a placement verification target.
FIG. 17 is a diagram illustrating an example of a non-terminal symbol evaluation processing procedure using generation rules that are not standardized.
FIG. 18 is an explanatory diagram illustrating an example of a function representing a spatial arrangement constraint.
[Explanation of symbols]
101 ... Grammar registration apparatus, 102 ... Communication line, 103 ... Character string recognition apparatus, 104 ... Document to be recognized, 405 ... Syntax analysis, 406 ... Examination.

Claims

1. Image input means for converting the density information of the document surface into an electrical signal and inputting it as an image;
Field extraction means for extracting a reading field, which is an area in which a character string to be read is to be described, from the input image;
Means for cutting out a partial image as a candidate pattern from the extracted reading field, expressing the candidate pattern as an edge, and expressing a boundary between the candidate patterns as a node; and
Character identifying means for identifying the detected candidate pattern as a symbol;
Grammar information storage means for storing a notation of a character string as a set of symbols and a rewrite rule of the symbol string, the rewrite rule having a restriction on a spatial arrangement of candidate patterns as a basis of the symbol string;
Syntactic analysis means for generating a candidate syntax element group from the plurality of identified symbols according to a rewrite rule, and outputting a combination of the candidate syntax elements as sentence candidates;
A character string recognition apparatus comprising: an evaluation unit that evaluates whether the sentence candidate is valid as a character string, and determines validity of the character string based on a spatial arrangement of the combination of candidate patterns.

2. The character string recognition apparatus according to claim 1, wherein the notation stored in the grammar information storage means is a method of arranging numbers and symbols other than numbers.