JP3709305B2 - Place name character string collation method, place name character string collation device, place name character string recognition device, and mail classification system

Info

Publication number: JP3709305B2
Application number: JP18775399A
Authority: JP
Inventors: 昌史古賀; 直広古川; 尚司池田; 日佐男緒方; 裕酒匂; 浩道藤澤
Original assignee: Hitachi Omron Terminal Solutions Corp
Current assignee: Hitachi Omron Terminal Solutions Corp
Priority date: 1999-07-01
Filing date: 1999-07-01
Publication date: 2005-10-26
Anticipated expiration: 2019-07-01
Also published as: CN100424676C; KR20010015113A; KR100692327B1; CN1287317A; JP2001014311A

Description

【０００１】
【発明の属する技術分野】
本発明は、地名文字列照合方法、文字列照合装置、文字列認識装置及び郵便物区分システムに係り、特に、文書上に記載された地名等の文字列を読取る装置に使用される地名文字列記憶手段及び照合手段に適用して好適な地名文字列照合方法、文字列照合装置、文字列認識装置及び郵便物区分システムに関する。
【０００２】
【従来の技術】
一般に、都道府県名、市町村名、字名等の地名単語の並びからなる文字列（以下、地名文字列という）を画像中から読み取る文字認識装置は、
（１）文字パターンを切り出す（文字切出し）、
（２）各々の文字パターンの字種（文字コード）を識別する（文字識別）、
（３）文字の識別結果を予め記憶した地名単語の列と照合する（文字列照合）、
の３つの機能を備えて構成されている。
【０００３】
文字列照合の方法に関する従来技術として、例えば、丸川等による方式（情報処理学会論文誌第３５巻第６号「手書き漢字住所認識のためのエラー修正アルゴリズム」）等が知られている。また、文字切出し、認識、照合を一体化した方式に関する従来技術として、隠れマルコフモデルに基づく方式（O. E. Agazzi、 et al., "Connected And Degraded Text Recognition using Planar Hidden Markov Models、" Proceedings of International Conference on Acoustics、 Speech、 and Signal Processing ）、探索的に文字列を認識する方法（Koga、 et al.,“Lexical Search Approach for Character-String Recognition” Third International Association for Pattern Recognition Workshop on Document Analysis Systems 1998）が知られている。
【０００４】
前述した従来技術は、文字列照合処理のために、予め出現し得る地名文字列を記憶する手段、地名文字列辞書が必要である。そして、地名文字列辞書としては、以下示すような３種類のものがある。
（１）ファイルに格納された「辞書ソースファイル」
これは、後述する「地名表記規則ファイル」等であり、新規作成や修正のために、編集が可能でなくてはならない。
（２）メモリ上に格納された「辞書テーブル」
これは、後述する「地名表記ネットワーク」等であり、辞書ファイルの内容を、照合処理に適した形式でメモリ上に展開したものである。
（３）前述の（１）と（２）との中間段階の「辞書バイナリファイル」
これは、メモリ上への展開を容易にするため、予め展開処理の一部を施した結果をファイルに格納したものである。
【０００５】
従来技術に使用される辞書ソースファイルの形式は明らかにされていない場合が多い。しかし、従来技術は、いずれも、出現し得る地名文字列を予めもれなく辞書テーブルに記憶することを前提にしており、このため、出現し得る地名文字列を、予めもれなく列挙したテキストファイルが、辞書ソースファイルとして用いられていると考えられる。
【０００６】
【発明が解決しようとする課題】
前述した従来技術は、文字列照合処理のために辞書を用意する必要があるが、日本語には、同一の地域を異なる文字列で表現する異表記が多く、出現し得る地名文字列を辞書にもれなく登録することが困難であり、このための完全な辞書を人手で作成することが事実上不可能であるという問題点を有している。
【０００７】
日本語の地名の異表記には、使用する文字の違いによる異表記、単語の省略による異表記、付加的な文字列による異表記、通り名の表記による異表記等がある。以下、これらの異表記の例について説明する。
【０００８】
（１）使用する文字の違いによる異表記
「小沢」と「小澤」、「市ヶ谷」と「市ケ谷」と「市が谷」等がある。
【０００９】
（２）単語の省略による異表記
都道府県名を省略する異表記、「大字」、「字」を省略する異表記がある。
都道府県名を省略する異表記は、郵便物の宛名等の場合に多く見られ、例えば、「埼玉県川越市大字小ヶ谷」と「川越市大字小ヶ谷」等がある。また、「大字」、「字」を省略する例として、例えば、「埼玉県川越市大字小ヶ谷」と「埼玉県川越市小ヶ谷」等がある。
【００１０】
（３）付加的な文字列による異表記
小字名等の本来住所の特定には不要である文字列が付加された異表記であり、例えば、「埼玉県川越市大字小ヶ谷」と「埼玉県川越市大字小ヶ谷字東関」等がある。
【００１１】
（４）通り名の表記による異表記
京都などで多く見られるもので、例えば、「京都市下京区大政所町」と「烏丸通仏光寺下る」等がある。
【００１２】
前述したように、地名の異表記には各種のものがあるが、例えば、「埼玉県川越市小ヶ谷」という地名を例にして、これに対応する異表記を調べて見ると、次に列挙するように、極めて多数の異表記があることが判る。
「埼玉県川越市小ヶ谷」
「埼玉県川越市小ケ谷」
「埼玉県川越市小が谷」
「埼玉県川越市大字小ヶ谷」
「埼玉県川越市大字小ケ谷」
「埼玉県川越市大字小が谷」
「川越市小ヶ谷」
「川越市小ケ谷」
「川越市小が谷」
「川越市大字小ヶ谷」
「川越市大字小ケ谷」
「川越市大字小が谷」
【００１３】
前述の例では、さらに「埼玉県川越市小ヶ谷東田」、「埼玉県川越市小ヶ谷東関」、「埼玉県川越市小ヶ谷西関」等、小字名が併せて用いられる場合があり、前述で列記した１２の異表記との組み合わせを考慮すると、合計８４通りの異表記が存在することになる。
【００１４】
従来技術の場合、前述したような多様な異表記の全ての組み合わせを、網羅的に人手で辞書ファイルに登録する必要があり、このため、辞書ファイル作成には多くの人手がかかるという問題があった。しかも、異表記が特に多い京都市等の場合、市内の町名と通り名との呼び方の合計が数十万通りにものぼり、完全な辞書を人手で作成することは事実上不可能であった。
【００１５】
本発明の第１の目的は、前述した問題点を解決し、文字列照合用辞書に多様な異表記をもれなく登録することを容易にすることのできる地名表記方法を提供することにある。
【００１６】
前述したように、地名の表記に異表記が多い場合、仮に異表記をもれなく辞書に記載することができたとしても、従来技術のものでは、辞書の記憶容量が大きくなり、処理時間も異表記の数に応じて大きくなってしまうという問題点を生じることになる。
【００１７】
前述の問題点を解決することができる技術として、トライ（Trie）と呼ばれるデータ形式により、辞書の記憶容量を小さくし、さらに照合処理を高速にすることができるようにした技術が、（Koga、 et al.,“Lexical Search Approach for Character-String Recognition” Third International Association for Pattern Recognition Workshop on Document Analysis Systems 1998）等に記載されて知られている。この技術は、表記に多様さがある部分のみ分岐するような、木形式のデータとして地名を表記するものであり、文字列の集合からTrieを自動的に生成することを容易にしたものである。
【００１８】
前述の技術は、例えば、「埼玉県川越市小ヶ谷東田」、「埼玉県川越市小ヶ谷東関」、「埼玉県川越市小ヶ谷西関」の３つの表記から、以下のようなTrieを容易に生成することができる。
埼玉県川越市小ヶ谷┬東┬田
　　　　　　　　　│　└関
　　　　　　　　　└西─関

以下、このように、地名文字列における文字の連接関係をネットワークで表したものを地名表記ネットワークと呼ぶこととする。
【００１９】
しかし、このようなTrie型の地名表記ネットワークは、文字列の一部分に相違がある場合、これらを全く別の文字列として扱い、別の枝を生成せざるを得ないことになり、このため、例えば、「埼玉県川越市小ヶ谷」の異表記群に対応するTrieは、次に示すような大きなものになってしまう。
【００２０】

（以下略）
【００２１】
前述したように、Trie型の地名表記ネットワークを用いる手法で異表記を表現しようとしても、辞書容量、処理時間ともに大幅に増大してしまうという問題点を生じてしまう。
【００２２】
従って、本発明の目的は、多様な異表記を認識するための地名辞書に使用するための、記憶容量が小さくかつ高速に照合処理ができるような記憶形式を持つ地名文字列照合方法、文字列照合装置、文字列認識装置及び郵便物区分システムを提供することにある。
【００２３】
【課題を解決するための手段】
本発明によれば前記目的は、文字列の入力を受ける入力インタフェースと、複数の地名文字列を格納するメモリと、演算処理装置と、出力インタフェースとを有する地名認識装置における地名文字列照合方法において、前記メモリには、地名文字列の一部または全部を構成する部分文字列毎に、文字または構文カテゴリの配列を定義し、文字または定義済みの構文カテゴリの配列からなる構文カテゴリにより表された複数の地名文字列をもとに生成され、該複数の地名文字列における文字の連接関係を有向グラフを用いて表す地名表記ネットワークが予め格納されており、前記演算処理装置は、前記入力インタフェースから入力される入力文字列の部分文字列が、前記メモリに地名表記ネットワークとして予め格納された複数の地名文字列のうちの１つと一致するか否かを判断することにより、入力文字列の中から地名を照合し、前記出力インタフェースからその照合結果を出力することにより達成される。
【００２４】
また、前記目的は、地名文字列照合装置において、地名文字列の一部または全部を構成する部分文字列毎に、文字または構文カテゴリの配列を定義し、文字または定義済みの構文カテゴリの配列からなる構文カテゴリにより表された地名文字列をもとに生成され、該複数の地名文字列における文字の連接関係を有向グラフを用いて表す地名表記ネットワークを記憶する記憶手段と、文字列を入力する入力手段と、前記入力された文字列が前記記憶手段に地名表記ネットワークとして記憶した地名文字列であるか否かを照合する手段と、照合の結果を出力する手段と備えたことにより達成される。
【００２５】
また、前記目的は、地名文字列認識装置において、文書の表面の濃淡を電気信号に変換して得られた画像を入力として、文書上に記載されていた文字を読み取る文字読取り手段と、前述の地名文字列照合装置とを備え、前記地名文字列照合手段内の入力手段が前記文字読取り手段からの文字列を入力することにより達成される。
【００２６】
また、前記目的は、郵便物区分システムにおいて、前述の地名文字列認識装置を使用し、郵便物の宛名の中の地名の文字列を認識し、郵便物を区分し、または、認識結果を郵便物に印刷することにより達成される。
【００２７】
また、前記目的は、文書の表面の濃淡を電気信号に変換して得られた画像を入力とし、文書上に記載されていた文字を読み取り、地名文字列を認識する地名文字列認識装置において、地域を表す地名が、異なる文字列であるが同一の地域を意味する単語の配列により表現される複数の異表記を持つ地名文字列の集合を表現し、地名文字列の一部または全部を構成する部分文字列毎に、文字または構文カテゴリの配列を定義し、文字または定義済みの構文カテゴリの配列からなる構文カテゴリにより地名文字列を表す地名表現方法を使用して地名文字列を記憶する手段と、前記入力画像中の部分画像の配列であって、それぞれの部分画像が前記地名文字列表現により表わされた地名文字列の１つに含まれる各文字と類似するものを見い出す照合を行い、該照合結果を前記部分画像の配列の選択にフィードバックすることにより地名文字列を認識する手段とを備えたことにより達成される。
【００２８】
具体的に言えば、本発明は、前記目的を達成するため、地名の異表記を文脈自由文法の生成規則を用いて表現する。文脈自由文法は、ある文の要素（構文カテゴリ）がどのような他の構文カテゴリの列に置換されるかを、生成規則により表わす(「自然言語処理入門」近代科学社、ISBN4-7649-0143-9）。本発明は、生成規則の表現法の１つとして知られるＢＮＦ記法（Backus-Naur-Form）（中田「コンパイラ」ISBN4-7828-5057-3)を、地名の表現に適するよう拡張した拡張ＢＮＦ記法を用いる。
【００２９】
前述の生成規則により、典型的な異表記のパタン、例えば、「ヶ」、「ケ」、「が」を１つの構文カテゴリとして定義することができ、地名の異表記の集合を簡潔に表現できる。さらに、ＢＮＦ記法で採用されている選択記号を用いることにより、地名の異表記をより簡潔に表現することが可能となる。このため、本発明によれば、多様な異表記の集合をもれなく記載した辞書を容易に作成することができる。
【００３０】
ＢＮＦ記法は、文脈自由文法の生成規則を、置換、オプション、選択等の記号を用いて表現する記法であり、以下のような記号を用いる。
::＝置換。左辺の構文カテゴリを右辺の構文カテゴリまたは文字の配列で置換できることを意味する。
［］オプション。［］内の記述があってもなくてもよいことを意味する。
｜選択。右辺、左辺のいづれかを意味する。
【００３１】
一例として、前述の「埼玉県川越市小ヶ谷」の異表記の生成規則をＢＮＦ記法で表現した例を以下に示す。
＜ｗヶ＞ ::＝ヶ｜ケ｜が
＜埼玉県川越市小ヶ谷＞::＝［埼玉県］川越市［大字］小＜ｗヶ＞谷［［字］東田｜東関｜西関］
【００３２】
また、前述のような表記形式を用いることにより、地名表記ネットワークを小形化することが可能となる。前述の表記形式では、部分文字列の相違は記号「［］」や「｜」を用いて陽に表現されている。このため、部分文字列の相違が異表記にありえる場合、その部分をバイパスするような経路をネットワーク上に容易に設定することができる。例えば、前述のＢＮＦ記法の表記は、下に示すようなコンパクトなネットワークに置き換えることができる。従来のような文字列の羅列からこのようなコンパクトなネットワークを生成することは、困難であった。
【００３３】

【００３４】
【発明の実施の形態】
以下、本発明による地名表記方法及び地名文字列認識方法の実施形態を図面により詳細に説明する。
【００３５】
図１は本発明の実施形態による地名文字列認識の処理例を説明するフローチャートであり、まず、このフローについて説明する。なお、以下の説明において使用するフローチャートは、ゲーン・サーソン記法に従って表現した。この記法に関しては、「Ｊ。マーチン「ソフトウエア構造化技法」近代科学社」、「ISBN4 - 7649 - 0124 - 2 C3050 P5562E」に記載されている。
【００３６】
（１）まず、地名の認識に先立ち、地名文字列生成規則編集処理（ステップ１０１）が、地名の異表記の事例に基づき地名文字列の生成規則を作成し、この生成規則を地名文字列生成規則ファイル１０２に格納する。ステップ１０１の地名文字列生成規則編集処理は、計算機を介した人間の編集作業により実現することができる。
【００３７】
（２）次に、地名表記ネットワーク生成処理（ステップ１０３）が、地名文字列生成規則ファイル１０２を読み込み、地名認識１０４のための辞書である地名表記ネットワークを生成する。ステップ１０３の地名表記ネットワーク生成処理は、計算機上のプログラムとして実現することができる。
【００３８】
（３）次に、地名認識処理（ステップ１０４）が、地名表記ネットワークを参照し、入力画像中から地名文字列を読み取る。ステップ１０４の地名認識処理は、計算機上のプログラムとして実現することができる。
【００３９】
地名文字列生成規則ファイル１０２は、本発明による「拡張ＢＮＦ記法」を用い、地名の異表記群を文脈自由文法の生成規則により表現する。拡張ＢＮＦ記法は、ＢＮＦ記法に結合等の記号を拡張したものであり、以下に説明するような記号が用いられる。
::＝置換。左辺の構文カテゴリを右辺の構文カテゴリ、または、文字の配列で置換できることを意味する。
［］オプション。［］内の記述を省略してもよい事を意味する。
｜選択。この記号の右、左のいづれかを意味する。
（）結合。前後の変数より先に括弧内を評価する。
＜Ｗ文字列＞構文カテゴリ。
＜Ｎ数字＞特定の地域を示す地名文字列の異表記群を表す構文カテゴリ。数字は、地名の識別子。０より大きい整数を用いる。
【００４０】
そして、前述した記号は、以下の優先順位により評価される。
（１）＜Ｗ文字列＞及び＜Ｎ数字＞の変数名の定義
（２）［］及び（）のかっこ類。２重以上の入れ子でかっこ類を用いる場合、内側のかっこを優先して評価
（３）｜
（４）::＝
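The notation and precedence rules of paragraphs 0039 and 0040 can be tried out with a small parser. The following Python sketch is illustrative only and is not taken from the patent: the function names `parse` and `expand`, the tuple-based syntax tree, and the sample rule are assumptions. The sample rule is modeled on the example in paragraph 0031, with parentheses added so that the optional 「字」 applies to all three koaza names, which is the reading that yields the 84 variants counted in paragraph 0013.

```python
def parse(rule, categories=None):
    """Parse the right-hand side of a rule into a syntax tree of tuples.

    Tree nodes: ("str", text), ("+", [children]), ("|", [children]), ("[]", child).
    `categories` maps names of already defined syntax categories to their trees.
    """
    categories = categories or {}
    pos = 0

    def choice():
        nonlocal pos
        alts = [sequence()]
        while pos < len(rule) and rule[pos] == "｜":
            pos += 1
            alts.append(sequence())
        return alts[0] if len(alts) == 1 else ("|", alts)

    def sequence():
        nonlocal pos
        items = []
        while pos < len(rule) and rule[pos] not in "｜］）":
            ch = rule[pos]
            if ch == "［":                      # option
                pos += 1
                items.append(("[]", choice()))
                pos += 1                        # skip the closing "］"
            elif ch == "（":                    # grouping leaves no node of its own
                pos += 1
                items.append(choice())
                pos += 1                        # skip the closing "）"
            elif ch == "＜":                    # reference to a defined category
                end = rule.index("＞", pos)
                items.append(categories[rule[pos + 1:end]])
                pos = end + 1
            else:                               # run of literal characters
                start = pos
                while pos < len(rule) and rule[pos] not in "｜［］（）＜":
                    pos += 1
                items.append(("str", rule[start:pos]))
        return items[0] if len(items) == 1 else ("+", items)

    return choice()


def expand(tree):
    """Return the set of all strings that the syntax tree can produce."""
    kind, body = tree
    if kind == "str":
        return {body}
    if kind == "[]":
        return expand(body) | {""}
    if kind == "|":
        return set().union(*(expand(t) for t in body))
    out = {""}                                  # "+": concatenation
    for t in body:
        out = {a + b for a in out for b in expand(t)}
    return out


cats = {"ｗヶ": parse("ヶ｜ケ｜が")}
tree = parse("［埼玉県］川越市［大字］小＜ｗヶ＞谷［［字］（東田｜東関｜西関）］", cats)
print(len(expand(tree)))  # 84
```

Brackets are resolved before 「｜」, as in the precedence list above. Concatenation binding more tightly than 「｜」 is the usual BNF convention and is assumed here.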
【００４１】
図２は前述のステップ１０１による編集処理で編集された地名文字列生成規則により表現された地名の表記例と生成規則を用いずに異表記を羅列した例とを示す図である。
【００４２】
図２（Ａ）に示す地名文字列生成規則により表現された地名の表記例は、その例として、「埼玉県川越市大字小ヶ谷」（「東田」「東関」「西関」が小字）、「埼玉県川越市大字笠幡」（「久保」「河南」が小字）、「埼玉県川越市下広谷」の異表記を本発明による拡張ＢＮＦ記法で表記した例である。このように、多数の異表記を含む地名を、本発明により導入した記号を用いることにより、極めて簡単に表現することができる。これに対して、図２（Ｂ）に示す生成規則を用いずに異表記を羅列した例は、多数の異表記を羅列するだけであるので、図２（Ａ）に示す４行の表記から生成される異表記の数は１０６通りにもなる。図２（Ｂ）に示しているのはその一部である。
【００４３】
地名文字列生成規則ファイル１０２は、通常のテキストファイルであり、地名文字列生成規則編集処理のステップ１０１の実現手段としては一般的なテキストエディタを適用することが可能である。
【００４４】
図３は図２（Ａ）の生成規則の例から作られる地名表記ネットワークを模式的に示す図であり、以下、これについて説明する。
【００４５】
地名表記ネットワークは、各辺が部分文字列に、各頂点が部分文字列の境界に対応する有向グラフである。各辺の方向は、文字列中の文字の順に一致する。図３において、NULLと記された辺は、NULL遷移、すなわち、その箇所に何も文字列がなくてよいことを示している。図３の中の右下に線の入った円３０１は、地名文字列の開始位置を示す。また、中央に斜線が入った円３０２〜３０４は、地名文字列の終わりの位置を示す。
【００４６】
図４は地名表記ネットワークを計算機上に実装する際のデータ形式を説明する図であり、以下、これについて説明する。そして、計算機上に地名表記ネットワークを実装する際、地名表記ネットワークは、図４に示すようなデータ形式（left-child．right-sibling representation、Ｔ．コルメン他「アルゴリズムイントロダクション」近代科学社、pp．201-202)を用いて表現される。このデータ形式は、文字の連接関係を子ポインタで、地名表記ネットワークの分岐を兄弟ポインタで表現するものである。
【００４７】
図４（Ａ）は、各データレコードの構成要素を示しており、各データレコードは、データ項目ｃ４０１、ｂ４０２、ｄ４０３の３つのデータ項目からなる。データ項目ｃは文字コードであり、データ項目ｂは兄弟ポインタである。また、データ項目ｄは子ポインタである。そして、あるデータレコードからの分岐は、兄弟ポインタにより、また、文字列は、子ポインタによって接続されたリスト形式で表現される。例えば、図３に示す地名表記ネットワークを前述したデータレコードによりリスト形式で表現すると、図４（Ｂ）に示すようなものとなる。
【００４８】
図４（Ｂ）に示すリスト形式で表現した地名表記ネットワークにおいて、データレコード４０４’（文字コード「小」に対応）からは、データレコード４０４〜４０６に分岐するが、データレコード４０４’から４０４には子ポインタにより連結され、データレコード４０４、４０５、４０６は兄弟ポインタにより連結されている。また、文字列「埼玉県」は、子ポインタで接続されたデータレコード４０７、４０８、４０９で表されている。また、データレコードがNULL遷移に対応する場合、そのデータレコードの文字コードｃ４０１にはNULL文字が格納され、このNULL文字が格納されたデータコードから分岐するデータレコードは、省略されてもよいことが意味される。さらに、地名文字列の最後の文字に対応するデータレコードの次に、データレコード４１０として示すように、１つ余分のデータレコードが設けられ、このデータレコード４１０の子ポインタｄには、NULLポインタが格納されて、ネットワークの終端であることを表すと共に、兄弟ポインタｂに地名の識別子が格納される。
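The data record of paragraphs 0047 and 0048 (character code c, sibling pointer b, child pointer d) can be modeled as follows. This is a minimal Python sketch; the names `Record` and `chain` are assumptions made for illustration, and `None` stands in for both the NULL character and the NULL pointer.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Record:
    c: Optional[str] = None               # character code; None models the NULL character
    b: Union["Record", int, None] = None  # sibling pointer, or the place name ID in the terminal record
    d: Optional["Record"] = None          # child pointer (the next character)


def chain(text, tail):
    """Link one record per character through child pointers, ending at `tail`."""
    node = tail
    for ch in reversed(text):
        node = Record(c=ch, d=node)
    return node


# Extra record that closes the network: NULL child pointer, identifier stored in b.
terminal = Record(c=None, b=3501104, d=None)
head = chain("埼玉県", terminal)
print(head.c, head.d.c, head.d.d.c)  # 埼 玉 県
```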
【００４９】
前述のような形式表現される図４（Ｂ）に示すリスト形式の地名表記ネットワークは、各データレコードがノードに対応するグラフとみなすことができ、図３に模式的に表した地名表記ネットワーク中の各辺が、ここでは文字数分のノードで表わされたことになる。
【００５０】
図５は図１のステップ１０３における地名文字列生成規則から地名表記ネットワークを生成する処理を説明するフローチャート、図６は生成される構文木の例を説明する図であり、以下、これらについて説明する。
【００５１】
まず、地名表記生成規則ファイル１０２の中の各地名文字列の生成規則、例えば、図２（Ａ）の上から２行目以降の＜Ｎ数字＞で始まる各行について、制御ループ５０１により１つづつ処理する。各行に対し、まず、ステップ５０２で、行内の文字列の構文解析を行い、図６に示すような構文木を作成する。次に、ステップ５０３で、その地名異表記群に対応する地名表記ネットワークの終端ノードｔ_i を生成する。以下、特にことわりがない場合、「地名表記ネットワーク上のノード」とは図４（Ａ）の形式のデータレコードを示すものとする。ｔ_i の中の文字コードｃにはNULLを、子ポインタｄにはNULLを、兄弟ポインタｂにはその地名異表記群の地名識別子を格納する。次に、ステップ５０４で、後述する関数procを使用してその地名異表記群に対応する地名表記ネットワークを生成する。全ての地名文字列生成規則を処理し終わった後、ステップ５０５で、生成された地名表記ネットワークの冗長な部分を統合する。
【００５２】
地名文字列生成規則から構文木を生成する処理は、例えば、「自然言語処理入門」（近代科学社、ISBN4-7649-0143-9、pp．19-30）に記載されているような生成規則に応じた遷移ネットワークを生成する手法等を使用することができる。図６に示すステップ５０２の処理で生成された構文木の例は、図２（Ａ）の２行目から生成される構文木の例である。この図６において、「＋」が記された円は文字列の連接を、「［］」が記された円はオプションを、「｜」が記された円は選択を表わし、四角は文字列を表わしている。拡張ＢＮＦ記法は、括弧「（」、「）」も使用されるが、本発明の実施形態に使用する構文木は、括弧に対応するノードは設けず、括弧により定まる演算の順序を、構文木の構造自身に反映させたものとする。
【００５３】
関数procは、構文木から地名表記ネットワークを生成するために使用する関数であり、ｐとａとの２つの引き数をとる。引き数ｐは、生成する地名表記ネットワークの終端のノードの子ポインタｄがとる値を指定する。また、引き数ａは、処理対象の構文木の最上位ノードを示す。あるノードに引き数ａが指定されると、引き数ａ以下の全てのノードが再帰的に処理される。
【００５４】
図７は関数procによる処理動作を説明するフローチャート、図８、図９は関数procによって地名表記ネットワークが生成される過程を説明する図、図１０は地名表記生成規則から生成された地名表記ネットワーク群を示す図であり、以下、これらについて説明する。なお、図７において、図に示されているｐ、ｑ、ｒは、図４に示す形式のデータレコードのアドレスを表す変数であり、記号「 −＞」は、データレコードの中のデータ項目を表わしている。また、図７に示すフローの処理は、構文木のノードａの種類に応じて４つに場合分けして実行される。
【００５５】
（１）構文木のノードａの種類を判別し、種類が、「＋」、「｜」、「［］」、「文字列」の何れであるかを判定する（ステップ７０１）。
【００５６】
（２）ステップ７０１での判定で、構文木のノードａの種類が「＋」すなわち結合であった場合、まず、変数ｑに引き数ｐをコピーする。すなわち、この処理で生成する部分ネットワークの終端ノードのアドレスをコピーする（ステップ７０２）。
【００５７】
（３）次に、構文木の子ノードｎ_i （１≦ｉ≦子ノードの数）を右から順に関数proc（）により処理して地名表記ネットワークの部分ネットワークを生成する。その際、関数proc（）で生成する部分ネットワークの終点がｑとなるよう引き数を渡す。この結果生成された部分ネットワークの始点のポインタをｑへ代入し直し、次に生成する部分ネットワークの終点とする。このようにして関数proc( )を繰り返し呼び出すことにより、構文木の「＋」の子ノードから生成される地名表記ネットワークの部分ネットワークが次々と連結される（ステップ７０３、７０４）。
【００５８】
（４）全ての子ノードを処理し終わったなら、その時点でのｑすなわち部分ネットワークの先頭を戻り値として返す（ステップ７０５）。
【００５９】
（５）ステップ７０１での判定で、構文木のノードａの種類が「｜」すなわち選択であった場合、まず、子ノードの１つｎ₁ から部分ネットワークを生成し、得られた部分ネットワークの先頭アドレスを変数ｑに代入する（ステップ７０６）。
【００６０】
（６）次に、変数ｒにｑの値を代入し、他の子ノードｎ_i （２≦ｉ≦子ノードの数）から生成する部分ネットワークを順に生成する。生成した部分ネットワークの先頭アドレスは、ｒの兄弟ポインタｂに格納する。さらに、生成した部分ネットワークの先頭アドレスをｒに代入し、以下同様の処理を繰り返す（ステップ７０７〜７１０）。
【００６１】
（７）全ての子ノードを処理し終わったなら、ｑすなわち一番始めに生成した部分ネットワークの先頭のアドレスｑを戻り値として返す（ステップ７１１）。
【００６２】
（８）ステップ７０１での判定で、構文木のノードａの種類が「［］」すなわちオプションであった場合、まず、構文木の子ノードに対応する部分ネットワークを生成し、その先頭アドレスを変数ｑに格納する。その際、生成した部分ネットワークの末端はｐとなるようパラメータを指定する（ステップ７１２）。
【００６３】
（９）次に、NULL遷移に対応するノードを関数newNd（）を用いて生成し、そのアドレスをｑの兄弟ポインタｂに格納する。なお、newNd（）は、図４に示す形式のデータレコードの記憶領域を新たに１つ確保する関数であり、確保されたデータレコードのデータ項目ｂにはNULLポインタがセットされる（ステップ７１３）。
【００６４】
（10）次に、NULL遷移に対応するノードの文字コードｃにNULLを代入し、さらに、NULL遷移に対応するノードの子ノードポインタｄにｐをセットする（ステップ７１４、７１５）。
【００６５】
（11）最後に生成した部分ネットワークの先頭のアドレスｑを戻り値として返す（ステップ７１６）。
【００６６】
（12）ステップ７０１での判定で、構文木のノードａの種類が文字列であった場合、まず、変数ｑにｐの値を代入する（ステップ７１７）。
【００６７】
（13）次に、下記の処理を文字列中の各文字Ｃ_i （1≦i≦文字列長）に対して、文字列の終わりから順に繰り返し、各文字に対応するノードを１つづつ生成する。ここでは、まず、関数newNd（）でノードの記憶領域を１つ分確保する。次に、新たに生成したノードの文字コードｃに、Ｃ_i を代入する。次に、新たに生成したノードの子ノードｄにｑの値を代入する。次に、ｑの値を新たに生成したノードのアドレスで置き換える（ステップ７１８〜７２２）。
【００６８】
（14）前述の処理を各文字Ｃ_i に対して実行した後、新たに生成した部分ネットワークのアドレスｑを戻り値として返す（ステップ７２３）。
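Steps (1) to (14) above can be sketched in Python as follows. This is a hedged illustration, not the patented implementation: the syntax tree is written as tuples, the helpers `last_sibling` and `strings` are assumptions, and strings in the tree are assumed to be non-empty. One deliberate deviation: a new alternative is appended at the end of the existing sibling list rather than written directly into the sibling pointer, so that a sibling already attached by a nested option is not overwritten.

```python
class Node:
    """Data record: c = character (None for NULL), b = sibling or ID, d = child."""
    def __init__(self, c=None, b=None, d=None):
        self.c, self.b, self.d = c, b, d


def last_sibling(n):
    """Follow sibling pointers to the last record of the sibling list."""
    while isinstance(n.b, Node):
        n = n.b
    return n


def proc(p, a):
    """Build the sub-network for syntax-tree node `a` whose tail links to `p`; return its head."""
    kind, body = a
    if kind == "+":                       # concatenation: children processed right to left
        q = p
        for child in reversed(body):
            q = proc(q, child)
        return q
    if kind == "|":                       # choice: heads are chained as siblings
        q = proc(p, body[0])
        for child in body[1:]:
            last_sibling(q).b = proc(p, child)
        return q
    if kind == "[]":                      # option: a NULL-transition sibling skips to p
        q = proc(p, body)
        last_sibling(q).b = Node(c=None, d=p)
        return q
    q = p                                 # string: one node per character, built back to front
    for ch in reversed(body):
        q = Node(c=ch, d=q)
    return q


def strings(p):
    """Enumerate every string accepted from record p (used here only for checking)."""
    out = set()
    while isinstance(p, Node):
        if p.c is None and p.d is None:   # terminal record
            out.add("")
        elif p.c is None:                 # NULL transition
            out |= strings(p.d)
        else:
            out |= {p.c + s for s in strings(p.d)}
        p = p.b
    return out


ga = ("|", [("str", "ヶ"), ("str", "ケ"), ("str", "が")])
tree = ("+", [("[]", ("str", "埼玉県")), ("str", "川越市"), ("[]", ("str", "大字")),
              ("str", "小"), ga, ("str", "谷")])
t = Node(c=None, b=3501104, d=None)       # terminal record holding the identifier
root = proc(t, tree)
print(len(strings(root)))  # 12
```

The twelve accepted strings are the twelve variants listed in paragraph 0012.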
【００６９】
関数procによって地名表記ネットワークが生成される過程を示す図８、図９において、８０１は、図５に示すフローのステップ５０３の処理で終端ノードを生成し、識別子「3501104」を格納したところである。その後、図７に示すフローの各ステップの処理により、図８、図９に示す図の上から下に順に示すように地名表記ネットワークが生成されていく。そして、図６に示す構文木のノード６０３を関数procで処理した場合、まず、ノード６０２に対応する部分ネットワークが関数procで生成され、８０２に示すようにノード６０２に対応する部分ネットワークが生成される。次に、ノード６０４に対応する部分ネットワークを関数procで生成する。この場合、ｐはノード８０４のアドレスを格納しており、生成された部分ネットワークは、８０３に示すようにノード８０４に接続される。
【００７０】
図５に示すフローの制御ループ５０１により、各地名文字列の生成規則毎に別個の地名表記ネットワークが生成される。この結果、図２の地名表記生成規則から生成された地名表記ネットワーク群は、図１０に示すようなものとして生成され、さらに、ステップ５０５の処理により、これらのネットワーク群の冗長な部分、例えば、埼玉県川越市の部分を統合し、図３により説明したような地名表記ネットワークが生成される。
【００７１】
図１１は従来技術を用いて地名表記ネットワークを生成する処理手順を説明するフローチャート、図１２は従来技術により生成される地名表記ネットワークの生成過程の例を説明する図、図１３は従来技術により生成された地名表記ネットワークの例を示す図であり、以下、これらの図を参照して、生成規則を用いない場合の地名表記ネットワーク生成方法について説明する。
【００７２】
ここで従来技術を説明する理由は、従来の地名文字列の表記方法からはTrieと呼ばれる木構造の地名表記ネットワークしか生成することができず、本発明の表記方法から生成される地名表記ネットワークが記憶容量、照合に要する処理時間共に優れていることを示すためである。従来技術による地名の表記を表現する手法は、図２（Ｂ）に示すような地名文字列の羅列であり、図１１により説明するフローは、このような単語の羅列から地名表記ネットワークを生成する手順である。ここで、図２（Ｂ）の中のｋ番目の文字列をＳ_k 、その長さをＬ_k 、各文字列のｉ番目の文字をＣ_i とする。また、各文字列に対応する識別子が別途記憶されているものとする。そして、生成する地名表記ネットワークは、図４に示すデータ形式で実現する。
【００７３】
（１）まず、地名表記ネットワークの仮の根となるノードｒｒを生成する。このノードの子ノードポインタｄにはNULLをセットする（ステップ１１０１、１１０２）。
【００７４】
（２）ループ１１０３により全ての文字列Ｓ_k を１つづつ処理する。
【００７５】
（３）まず、変数ｐに根のアドレスを代入する。次に文字列中の文字の１つ毎に、サブルーチンSrchNxt を呼び出す。サブルーチンSrchNxt は、各文字に対応するノードがすでに生成されているか否かを判断し、生成されていない場合、新たなノードを追加する処理手順であり、この手順については後述する（ステップ１１０４〜１１０６）。
【００７６】
（４）サブルーチンSrchNxtで文字列中の文字を処理し終わったならば、新たな子ノードを関数newNd( )で生成し、その文字列の識別子をポインタｂの領域に格納し、さらにこの新たな子ノードのアドレスをｐの子ノードポインタｄに代入する。ループ１１０３が終了した時点でのｒｒの子ノードが地名表記ネットワークの根となる（ステップ１１０７〜１１１０）。
【００７７】
次に、サブルーチンSrchNxtの処理について説明する。
【００７８】
（１）まず、変数ｑにｐの子ノードｄの値を代入し、次に、ループ処理を行って全てのｐの子ノードを変数ｑにより走査し、対応する文字コードすなわちデータ項目ｃがＣ_i と等しいか否か調べる。もし等しければ、すでにＣ_i に対応するノードが生成されているとみなして、ポインタｐをそのノードｑに進めて終了する（ステップ１１１１、１１１３〜１１１５、ループ１１１２）。
【００７９】
（２）ステップ１１１３でのチェックで、データ項目ｃがＣ_i と等しくなければ、ｑにｑの兄弟ポインタの値を代入し、ｑがNULLとなるまでループ処理を繰り返す。（ステップ１１１６）。
【００８０】
（３）ループ処理が終了してもＣ_i に対応するノードが見いだせない場合、新たなノードを関数newNd( )で生成し、新たなノードの文字コードｃにＣ_i を、子ノードポインタｄにNULLを、兄弟ポインタｂにｐの子ノードポインタｄの値をそれぞれ代入し、この新たなノードのアドレスをｐの子ノードポインタｄへ代入し、ポインタｐに新たな子ノードのアドレスを代入して、このサブルーチンの処理を終了する（ステップ１１１７〜１１２２）。
【００８１】
前述した図１１の処理手順に従って地名表記ネットワークが生成される過程を図１２に示している。ここで挙げた例は、図２（Ｂ）の上から３行を処理する過程である。まず、始めに、「川越市小ヶ谷」に対応する地名表記ネットワークが生成される（１２０１）。次に、「川越市笠幡」を処理するが、「川越市」の部分は、１２０１ですでに生成されているので、新たなノードは生成されない。しかし、ポインタｐが１２０２に示す位置に達し、「笠」の文字を処理するときには、「笠」に該当するノードは「市」の子ノードにはない。そこで、「小」の兄弟ノードとして新たに「笠」に該当するノードを生成する。以下、残りの文字「幡」に対応するノードを、新たに生成したノードの子ノードとして連結する（１２０３）。「川越市下広谷」の場合も同様に処理され、「下」に対するノードを「小」、「笠」の兄弟として新たに生成し（１２０４）、以降の文字に対応するノードが連結される（１２０５）。
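The conventional procedure of paragraphs 0073 to 0081 can be sketched in Python as follows. The names `srch_nxt`, `build_trie`, and `children` and the identifiers 1 to 3 are made up for illustration. The sketch assumes that no entry is a prefix of another entry, as in the three-string example above.

```python
class Node:
    """Data record: c = character, b = sibling pointer or ID, d = child pointer."""
    def __init__(self, c=None, b=None, d=None):
        self.c, self.b, self.d = c, b, d


def srch_nxt(p, ch):
    """Return the child of p that holds ch, adding it at the front of the sibling list if absent."""
    q = p.d
    while isinstance(q, Node):
        if q.c == ch:
            return q
        q = q.b
    new = Node(c=ch, b=p.d, d=None)
    p.d = new
    return new


def build_trie(entries):
    """Build a trie from (identifier, string) pairs; returns the temporary root rr."""
    rr = Node()
    for ident, s in entries:
        p = rr
        for ch in s:
            p = srch_nxt(p, ch)
        assert p.d is None, "entries that are prefixes of others are not handled in this sketch"
        p.d = Node(c=None, b=ident, d=None)   # terminal record carrying the identifier
    return rr


def children(p):
    """List the characters of p's children in sibling order."""
    out, q = [], p.d
    while isinstance(q, Node):
        out.append(q.c)
        q = q.b
    return out


rr = build_trie([(1, "川越市小ヶ谷"), (2, "川越市笠幡"), (3, "川越市下広谷")])
shi = rr.d.d.d                                # 川 -> 越 -> 市
print(children(shi))  # ['下', '笠', '小']
```

The shared prefix 「川越市」 is stored once, and each new branch is inserted at the front of the sibling list, following step (3) of the SrchNxt description.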
【００８２】
図１３は図２（Ｂ）に記した異表記群から生成した地名表記ネットワークの一部を模式的に示すものであるが、この例は、図３に示す場合と異なり、従来の表記方法から生成される地名表記ネットワークは、木の形式、すなわち一度分岐したら再び合流することがない形式になっている。これは、Trieとして知られているデータの表現形式である。図３と比較すると、冗長な部分が多いことが判る。例えば、「東田」、「東関」、「西関」に対応する部分ネットワークが、６回繰り返されている。このことは、必要とする記憶容量の増大につながることを意味し、階層的なメモリ構成をとる計算機の場合、アクセスするメモリ空間が大きくなるとキャッシュのミスヒットなどによりアクセスが遅くなり、ひいては後述する文字列照合処理自身が遅くなることになる。
【００８３】
本発明により、図３に示すような冗長さが少ない地名表記ネットワークを生成することができることは、生成規則による地名単語表記の本質的な利点である。この生成規則を用いると、冗長な箇所を明確に表現することができる。例えば、図２（Ａ）に示す例の場合、「小ヶ谷」の「ヶ」には３通りの異表記があるが、「ヶ」以降の文字列は同じであることが拡張ＢＮＦ記法により示されている。このため、図３に示すように、「小」と「谷」との間のみ、３つの経路があるようなネットワークが生成される。
【００８４】
これに対して、図２（Ｂ）に示すような従来の地名文字列の表記方法は、「ヶ」以降の異表記群が等価かどうかを検知することができず、図１３に示すようなネットワークしか生成することができない。
【００８５】
図１４は図１に示す地名認識処理１０４での処理動作を説明するフローチャートであり、以下、これについて説明する。
【００８６】
（１）まず、入力画像から文字行切出し処理により、文字行の部分の画像を切出す（ステップ１４０１）。
【００８７】
（２）次に、文字切出し処理により、文字行画像中から文字と思われるパタン、すなわち、候補パタンを切出す。この段階で一意に文字の境界を決定できない場合、複数の境界の仮説に基づいて、文字パタンの切出しを試み、それぞれの仮説に対応した候補パタンを出力する（ステップ１４０２）。
【００８８】
（３）次に、文字認識処理により、切出されたそれぞれの候補パタンがどんな文字であるかを認識し、候補文字列として出力する。文字の切出し方が複数の仮説に基づいている場合、また、文字認識の結果、１つのパタンに対し複数の候補文字が出力された場合、文字認識処理は、それぞれの切出し方及び候補文字の組み合わせに対応して複数の候補文字列を出力する（ステップ１４０３）。
【００８９】
（４）最後に、文字列照合処理により、それぞれの候補文字列が正しい地名文字列になっているか否かを、地名表記ネットワークを参照して照合する。照合で受理された候補文字列を地名認識結果とする（ステップ１４０４）。
【００９０】
図１５は前述の文字列照合処理１４０４での処理動作を説明するフローチャートであり、以下、これについて説明する。この処理は、１つの文字列を入力とし、入力文字列の少なくとも一部が地名文字列として受理し得るか否か判定し、受理し得るなら該当するその地名表記の識別子を求める処理である。ここで、入力文字列の長さをＬ、文字列のｉ番目の文字をＣ_i とする。
【００９１】
（１）まず、ループ１５０１により、照合の起点ｓを１からＬまで変えながら、ステップ１５０２、１５０３を繰り返す。
【００９２】
（２）ノードを指し示す変数ｐに、地名表記ネットワークの根のアドレスをセットする。次に、引き数ｐおよびｓを与えて関数srchを呼び出す。関数srchは、地名表記ネットワーク中から入力文字列に一致する経路を見い出し、その終端のノードのアドレスを返す関数である。srchの戻り値が、NULLポインタでなければ照合に成功したものとみなして、関数srchの戻り値が示すノードに格納された識別子を出力する（ステップ１５０２〜１５０４）。
【００９３】
（３）もしｓがＬに達しても照合が成功しなければ、文字列照合処理は失敗したものとして、処理を終了する（ステップ１５０５）。
【００９４】
前述の処理において、関数srchは、再帰的に自分自身からも呼び出され、地名表記ネットワーク中から入力文字列に一致する経路を深さ優先で探索する。関数srchは、引き数ｐ及びｉの２つの引き数をとる。引き数ｐは、探索を開始するノードを指し示す。また、引き数ｉは、整数であり、現在の処理で注目しているのが入力文字列中の何番目の文字かを表す。受理される文字列が見つかった場合、関数srchは、その文字列の終端のノードのアドレスを返し、受理される文字列が見つからなかった場合、NULLポインタを返す。
【００９５】
図１６は前述の処理での関数srchの処理動作を説明するフローチャートであり、以下、これについて説明する。
【００９６】
（１）まず、引き数ｐが文字列終了ノードを指しているか否かを調べる。もし、文字列終了ノードを指している場合には、入力文字列が受理されたとみなし、ｐを戻り値として返して処理を終了する（ステップ１６０１）。
【００９７】
（２）次に、すでに全ての文字を処理し終わったか否か判定する。ｉがＬより大きく、全ての文字を処理し終わっているにもかかわらず、地名表記ネットワークの終端にｐが達していない場合、NULLを返す（ステップ１６０２）。
【００９８】
（３）次に、ｐのデータ項目ｃが文字列のｉ番目の文字Ｃ_i と一致するか否かを調べる。もし一致すれば、ｐの子ノードｐ−＞ｄを探索の起点とし、ｉ＋１番目から文字列を処理するように、関数srchを再帰的に呼び出す。この戻り値ｒがNULLでなければ、文字列が受理されたとみなし、ｒを戻り値として処理を終了する（ステップ１６０３）。
【００９９】
（４）次に、pがNULL遷移に対応するノードか調べる。もしそうであれば、ｐの子ノードｐ−＞ｄを探索の起点とし、ｉ番目から文字列を処理するように、関数srchを再帰的に呼び出す。この戻り値ｒがNULLでなければ、文字列が受理されたとみなし、ｒを戻り値として終了する（ステップ１６０４）。
【０１００】
（５）次に、ｐに兄弟ノードｐ−＞ｂが連結されているかどうか調べる。もし連結されていれば、ｐの兄弟ノードｐ−＞ｂを探索の起点とし、ｉ番目から文字列を処理するように、関数srchを再帰的に呼び出し、この戻り値を上位に返す（ステップ１６０５）。
【０１０１】
（６）もし、前述のいずれの処理でも入力文字列が受理されなければ、これ以上の探索はできないため、NULLを戻り値として処理を終了する（ステップ１６０６）。
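The collation loop of paragraphs 0091 to 0093 and the recursive function srch of paragraphs 0096 to 0101 can be sketched in Python as follows. The helper names `chain` and `match`, the extra `text` argument, and the hand-built network are assumptions made for illustration. One deviation is marked in the code: the end-of-input test guards only the character comparison, so that NULL transitions and siblings are still tried after the last input character.

```python
class Node:
    """Data record: c = character (None for NULL), b = sibling or ID, d = child."""
    def __init__(self, c=None, b=None, d=None):
        self.c, self.b, self.d = c, b, d


def srch(p, i, text):
    """Depth-first search from record p at input position i; return the terminal record or None."""
    if p.c is None and p.d is None:            # terminal record reached: accepted
        return p
    # Deviation from the described order: only the comparison is guarded by i < len(text).
    if p.c is not None and i < len(text) and p.c == text[i]:
        r = srch(p.d, i + 1, text)
        if r is not None:
            return r
    if p.c is None:                            # NULL transition: consume no input
        r = srch(p.d, i, text)
        if r is not None:
            return r
    if isinstance(p.b, Node):                  # try the next alternative
        return srch(p.b, i, text)
    return None


def match(root, text):
    """Try every start position s; return the place name identifier or None."""
    for s in range(len(text)):
        r = srch(root, s, text)
        if r is not None:
            return r.b
    return None


def chain(text, tail):
    """Link one record per character through child pointers, ending at `tail`."""
    node = tail
    for ch in reversed(text):
        node = Node(c=ch, d=node)
    return node


# Hand-built network for ［埼玉県］川越市小（ヶ｜ケ｜が）谷
t = Node(c=None, b=3501104)
tani = chain("谷", t)
ga = Node("ヶ", d=tani)
ga.b = Node("ケ", d=tani)
ga.b.b = Node("が", d=tani)
city = chain("川越市小", ga)
root = chain("埼玉県", city)
root.b = Node(c=None, d=city)                  # NULL transition: prefecture may be omitted

print(match(root, "〒350 川越市小ケ谷 12-3"))  # 3501104
print(match(root, "川越市笠幡"))               # None
```

Because a terminal record accepts as soon as it is reached, a place name embedded in a longer input string is still found, which corresponds to accepting at least a part of the input string.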
【０１０２】
前述までで説明した本発明の実施形態は、文字切出し、文字認識、文字列照合を順次行うとして説明したが、本発明は、古賀他「宛名読取り装置および郵便物等区分機および文字列認識方法」（特願平１０−２８０７７号〔特許第３２４６４３２号公報参照〕）のように文字列照合結果を文字切出しにフィードバックする方式に容易に拡張することもできる。
【０１０３】
図１７は本発明の実施形態による地名文字列認識の処理を応用したシステムの構成例を示すブロック図、図１８は地名文字列生成規則編集装置の構成を示すブロック図である。このシステム例は、郵便区分システムに本発明を適用した例である。図１７、図１８において、１７０１は郵便区分機、１７０２はスキャナ、１７０３はディレイライン、１７０４はソーター、１７０５は地名認識装置、１７０６は入力用インタフェース、１７０７は演算処理装置、１７０８は出力用処理装置、１７１０はメモリ、１７１１はネットワークインタフェース、１７１２はハードディスク、１７１３はメディア着脱可能記憶装置、１７１４は地名文字列生成規則編集装置、１７１８はネットワーク、１８０１はマウス、１８０２はキーボード、１８０３はディスプレイ、１８０４は地名文字列生成規則編集プログラム、１８０５は文字列照合プログラム、１８０６は地名表記ネットワーク表示プログラム、１８０７は地名文字列生成規則ファイル、１８０８は地名表記ネットワーク生成プログラム、１８０９は地名表記ネットワークデータ、１８１０は通信装置、１８１１はメディア着脱可能記憶装置、１８１２はコンピュータである。
【０１０４】
図１７に示すシステムは、１台または複数台の郵便区分機１７０１と、１台または複数台の地名文字列生成規則編集装置１７１４とがネットワーク１７１８で接続されて構成される。郵便区分機１７０１は、スキャナ１７０２、ディレイライン１７０３、ソータ１７０４、地名認識装置１７０５から構成される。また、地名認識装置１７０５は、入力用インタフェース１７０６、演算処理装置１７０７、出力用処理装置１７０８、メモリ１７１０、ネットワークインタフェース１７１１、ハードディスク１７１２、メディア着脱可能記憶装置１７１３から構成される。なお、図における太線は、郵便物の流れを示す。
【０１０５】
図１７に示すシステムにおいて、スキャナ１７０２から入力された郵便物に記載されている地名の画像情報は、地名認識装置１７０５へ転送される。そして、郵便物がディレイライン１７０３を搬送される間に、地名認識装置１７０５は、郵便物に記載されている地名を認識し、認識結果をソータ１７０４へ転送する。ソータ１７０４は、認識結果に応じて郵便物を区分する。
【０１０６】
郵便物の区分の準備段階として、地名認識装置１７０５は、ハードディスク１７１２から地名表記ネットワーク生成プログラムファイルをメモリ１７１０に読み込んで演算装置１７０７で起動する。地名表記ネットワーク生成プログラムの制御の下に、地名認識装置１７０５は、地名文字列生成規則を地名文字列生成規則編集装置１７１４からネットワークインタフェース１７１１を介して入力し、地名表記ネットワークファイルを作成してハードディスク１７１２に格納する。
【０１０７】
なお、地名文字列生成規則は、ネットワークを介して地名文字列生成規則編集装置１７１４から入力する代わりに、フロッピーディスクドライブ等のメディア着脱可能記憶装置１７１３より読み込んでもよい。
【０１０８】
地名認識装置１７０５は、郵便物の区分時に、ハードディスク１７１２から認識プログラムファイル及び地名表記ネットワークファイルをメモリ１７１０に読み込んで演算装置１７０７により実行する。そして、地名認識装置１７０５は、認識プログラムの制御の下に、入力インタフェース１７０６から画像を入力し、郵便物に記載された地名を認識し、認識結果を出力インタフェース１７０８を介して出力する。
【０１０９】
地名文字列生成規則編集装置１７１４は、図１８に示すように、コンピュータ１８１２に、マウス１８０１、キーボード１８０２、ディスプレイ１８０３、地名文字列生成規則ファイル１８０７を格納するディスク装置、通信装置１８１０、メディア着脱可能記憶装置１８１１を接続して構成される。編集作業は、コンピュータ１８１２上で動作する地名文字列生成規則編集プログラム１８０４を介し地名文字列生成規則ファイル１８０７を編集することにより実行される。地名文字列生成規則ファイル１８０７は、テキストファイルであり、編集には通常のテキストエディタを用いることができる。また、コンピュータ１８１２上で地名表記ネットワーク生成プログラム１８０８を実行し、地名文字列生成規則ファイル１８０７から地名表記ネットワーク１８０９を生成することができる。
【０１１０】
地名文字列生成規則編集装置１７１４は、前述の機能により、編集中の地名単語生成規則が文法的に正しいか否かを確認することができ、さらに、認識処理での文字列照合１４０４と等価なプログラム１８０５を実行し、キーボード１８０２から入力された試験用の文字列が受理されるか否かを確認することができる。
【０１１１】
また、コンピュータ１８１２は、地名表記ネットワーク１８０９を、例えば、図３に示すような形式で表示するための地名表記ネットワーク表示プログラム１８０６を実行するので、作業者は、編集結果を視覚的に確認することができる。編集した結果の地名文字列生成規則ファイル１８０７は、通信装置１８１０を介して地名認識装置１７０５へ転送され、あるいは、メディア着脱可能記憶装置１８１１によりフロッピーディスクなどの着脱可能な記憶メディアに複写され、記憶メディアにより郵便区分機１７０１へ輸送されてもよい。
【０１１２】
図１９は本発明の他の実施形態の構成を示すブロック図、図２０はディスプレイに表示される画面例を説明する図である。この例は、本発明による地名文字列の表記方法及び地名照合方式を利用し、地名を表す文字列から地名に関する情報を検索するための地名録装置である。図１９において、１９０１はマウス、１９０２はキーボード、１９０３はディスプレイ、１９０４はプリンタ、１９０５は入力ファイル、１９０６は出力ファイル、１９０７は地名録プログラム、１９０８は地名付加情報ファイル、１９０９は地名文字列生成規則ファイル、１９１０は通信モジュール、１９１１はインタフェースモジュール、１９１２は地名リストデータ、１９１３は地名リストソートモジュール、１９１４は地名情報検索モジュール、１９１５は地名リスト生成モジュール、１９１６は文字列照合モジュール、１９１７は地名表記展開モジュール、１９１８は地名表記ネットワーク生成プログラム、１９１９は地名表記ネットワークデータである。
【０１１３】
図１９に示す装置は、以下のようなサービスを提供するものである。
（１）キーボードから入力された地名文字列の標準形を表示または印刷する。
（２）キーボードから入力された地名文字列の異表記を表示または印刷する。
（３）キーボードから入力された地名文字列に対応する地域の情報（郵便番号など）を表示または印刷する。
（４）ファイルから入力した地名文字列を標準形または郵便番号等該当する地域に固有の情報に変換してファイルへ出力する。
（５）ネットワークから入力した地名文字列を標準形または郵便番号等該当する地域に固有の情報に変換してネットワークへ出力する。
前述において、標準形とは、例えば、行政区分で定められているある地域を表す正式な文字列のことである。
【０１１４】
図１９に示す実施形態は、計算機上で実行される地名録プログラム１９０７に、マウス１９０１、キーボード１９０２、ディスプレイ１９０３、プリンタ１９０４、入力ファイル１９０５、出力ファイル１９０６、地名付加情報ファイル１９０８、地名文字列生成規則ファイル１９０９が接続されて構成される。表示、入出力は、インタフェースモジュール１９１１を介して行われる。検索対象の文字列が入力されると、地名情報検索モジュール１９１４は、文字列照合モジュール１９１６を呼び出す。文字列照合モジュール１９１６は、図１４における文字列照合処理１４０４と等価な処理を司るモジュールであり、地名表記生成規則ファイル１９０９から地名表記ネットワーク生成プログラム１９１８によって生成された地名表記ネットワークデータ１９１９を参照し、入力文字列がいかなる識別子の地名表記に該当するかを調べる。
【０１１５】
地名情報検索モジュール１９１４は、得られた識別子を手がかりに、地名付加情報ファイル１９０８から、標準形と、郵便番号等の付加的な情報とを検索する。また、地名表記展開モジュール１９１７は、地名表記ネットワークデータ１９１９からあり得る異表記を全て列挙する。得られた異表記群は、地名リストデータ１９１２に格納し、必要に応じインタフェースモジュール１９１１を介して出力する。また、地名リストソートモジュール１９１３は、操作者の指示に従い、異表記群の順序を並べ替えて出力する。このような処理のための入力は、キーボード１９０１、入力ファイル１９０５、通信モジュール１９１０のいずれを介して行われてもよい。また、出力は、ディスプレイ１９０４、出力ファイル１９０６、通信モジュール１９１０のいずれを介して行ってもよい。
【０１１６】
図２０に示す図１９の実施形態のディスプレイ１９０３に表示される画面例において、図２０（Ａ）に示す例は、操作者が、「川越市小ヶ谷」という文字列を入力して、検索を実行した際にディスプレイ１９０３に表示される画面例である。入力文字列は、フィールド２００５へ入力され、ボタン２００６をマウスでクリックすることにより、検索が実行される。検索の結果、入力文字列に該当することが判った文字列は、ウインドウ２００７に表示される。各行の「標準」の項目には、その文字列が標準形か否かが表わされる。項目「地名」は、その文字列を表示する。項目「郵便番号」には、その文字列に対応する郵便番号を表示されるが、その他のその地域の付加情報を表示してもよい。
【０１１７】
領域２００４に並べられた「標準」、「地名」、「郵便番号」の枠はボタンとなっており、各ボタンをマウスによりクリックすることにより、それぞれの項目に基づいた行の並べ換えを指示する。ウインドウ２００８は、検索のオプションを指定するためのものである。ここで、標準形のみを表示するか、字、大字等に基づく異表記群を表示するか、通称名（「＊＊団地」等）に基づく異表記群を表示するかを指定する。ボタン２００２は、表示内容の印刷を指示するためのボタンであり、ボタン２００１は、キーボードとディスプレイとに代わり、ファイルを入出力するモードへの切り替えのためのボタンである。また、ボタン２００３は、プログラムの終了を指示するためのボタンである。
【０１１８】
図２０（Ｂ）に開かれたウィンドウ２００９は、照合の結果得られた地名の読み方、小字、郵便番号等の詳細な情報を表示するウインドウである。このウインドウ２００９は、ウインドウ２００７上に表示された検索結果をマウスでクリックすることにより起動される。
【０１１９】
なお、本発明の実施形態による表記方法により表記された地名文字列は、ＦＤ、ＭＯ、ＤＶＤ等の記憶媒体に地名辞書として格納し提供することができる。
【０１２０】
【発明の効果】
以上説明したように本発明によれば、地名の表記に多くの異表記がある場合でも、あり得る全ての地名文字列を網羅する地名辞書を少ない人手で作成することができる。また、高速な照合処理が可能なネットワーク形式の地名辞書を容易に作成することができる。
【図面の簡単な説明】
【図１】本発明の実施形態による地名文字列認識の処理例を説明するフローチャートである。
【図２】編集された地名文字列生成規則により表現された地名の表記例と生成規則を用いずに異表記を羅列した例とを示す図である。
【図３】生成規則の例から作られる地名表記ネットワークを模式的に示す図である。
【図４】地名表記ネットワークを計算機上に実装する際のデータ形式を説明する図である。
【図５】地名文字列生成規則から地名表記ネットワークを生成する処理を説明するフローチャートである。
【図６】生成される構文木の例を説明する図である。
【図７】地名表記生成規則から地名表記ネットワークを生成する関数procによる処理動作を説明するフローチャートである。
【図８】関数procによって地名表記ネットワークが生成される過程を説明する図（その１）である。
【図９】関数procによって地名表記ネットワークが生成される過程を説明する図（その２）である。
【図１０】地名表記生成規則から生成された地名表記ネットワーク群を示す図である。
【図１１】従来技術を用いて地名表記ネットワークを生成する処理手順を説明するフローチャートである。
【図１２】従来技術により生成される地名表記ネットワークの生成過程の例を説明する図である。
【図１３】従来技術により生成された地名表記ネットワークの例を示す図である。
【図１４】図１に示す地名認識処理での処理動作を説明するフローチャートである。
【図１５】図１４に示す文字列照合処理での処理動作を説明するフローチャートである。
【図１６】関数srchの処理動作を説明するフローチャートである。
【図１７】本発明の実施形態による地名文字列認識の処理を応用したシステムの構成例を示すブロック図である。
【図１８】地名文字列生成規則編集装置の構成を示すブロック図である。
【図１９】本発明の他の実施形態の構成を示すブロック図である。
【図２０】ディスプレイに表示される画面例を説明する図である。
【符号の説明】
１０１地名文字列生成規則編集処理
１０２地名文字列生成規則ファイル
１０３地名表記ネットワーク生成処理
１０４地名認識処理
１４０４文字列照合処理
１７０１郵便区分機
１７０２スキャナ
１７０３ディレイライン
１７０４ソーター
１７０５地名認識装置
１７０６入力用インタフェース
１７０７演算処理装置
１７０８出力用処理装置
１７１０メモリ
１７１１ネットワークインタフェース
１７１２ハードディスク
１７１３メディア着脱可能記憶装置
１７１４地名文字列生成規則編集装置
１７１８ネットワーク
１８０１マウス
１８０２キーボード
１８０３ディスプレイ
１８０４地名文字列生成規則編集プログラム
１８０５文字列照合プログラム
１８０６地名表記ネットワーク表示プログラム
１８０７地名文字列生成規則ファイル
１８０８地名表記ネットワーク生成プログラム
１８０９地名表記ネットワークデータ
１８１０通信装置
１８１１メディア着脱可能記憶装置
１８１２コンピュータ
１９０１マウス
１９０２キーボード
１９０３ディスプレイ
１９０４プリンタ
１９０５入力ファイル
１９０６出力ファイル
１９０７地名録プログラム
１９０８地名付加情報ファイル
１９０９地名文字列生成規則ファイル
１９１０通信モジュール
１９１１インタフェースモジュール
１９１２地名リストデータ
１９１３地名リストソートモジュール
１９１４地名情報検索モジュール
１９１５地名リスト生成モジュール
１９１６文字列照合モジュール
１９１７地名表記展開モジュール
１９１８地名表記ネットワーク生成プログラム
１９１９地名表記ネットワークデータ
[0001]
[Technical field of the invention]
The present invention relates to a place name character string collation method, a character string collation device, a character string recognition device, and a mail classification system, and in particular to ones well suited for use as the place name character string storage means and collation means of an apparatus that reads character strings, such as place names, written on a document.
[0002]
[Prior art]
In general, a character recognition device that reads from an image a character string consisting of a sequence of place name words such as prefecture names, municipality names, and aza (district) names (hereinafter, a place name character string) provides the following three functions:
(1) cutting out character patterns (character segmentation),
(2) identifying the character type (character code) of each character pattern (character identification),
(3) collating the character identification results against sequences of place name words stored in advance (character string collation).
[0003]
As prior art on character string collation, the method of Marukawa et al. (Transactions of the Information Processing Society of Japan, Vol. 35, No. 6, "Error Correction Algorithm for Handwritten Kanji Address Recognition") is known, among others. As prior art that integrates character segmentation, recognition, and collation, a method based on hidden Markov models (O. E. Agazzi, et al., "Connected And Degraded Text Recognition using Planar Hidden Markov Models," Proceedings of International Conference on Acoustics, Speech, and Signal Processing) and a method that recognizes character strings by search (Koga, et al., "Lexical Search Approach for Character-String Recognition," Third International Association for Pattern Recognition Workshop on Document Analysis Systems, 1998) are known.
[0004]
For character string collation, the prior art described above requires a place name character string dictionary, that is, a means of storing in advance the place name character strings that can appear. There are the following three kinds of place name character string dictionary.
(1) "Dictionary source file" stored in a file
This is, for example, the "place name notation rule file" described later. It must be editable so that it can be newly created and corrected.
(2) "Dictionary table" stored in memory
This is, for example, the "place name notation network" described later. It is the content of the dictionary file expanded in memory in a form suited to collation.
(3) "Dictionary binary file" at the intermediate stage between (1) and (2) above
This stores in a file the result of performing part of the expansion in advance, so that expansion into memory is easier.
[0005]
The format of the dictionary source files used in the prior art is often not disclosed. However, every prior-art technique presupposes that all place name character strings that can appear are stored in the dictionary table in advance without omission. It is therefore likely that a text file enumerating every such place name character string is used as the dictionary source file.
[0006]
[Problems to be solved by the invention]
The prior art described above needs a dictionary for character string collation. Japanese, however, has many variant notations, in which the same area is written with different character strings, so it is difficult to register every place name character string that can appear, and creating a complete dictionary by hand is practically impossible.
[0007]
Variant notations of Japanese place names include variants due to differences in the characters used, variants due to omitted words, variants due to additional character strings, and variants due to street-name notation. Examples of each follow.
[0008]
(1) Variants due to differences in the characters used
Examples are 「小沢」 and 「小澤」 (both read Ozawa), and 「市ヶ谷」, 「市ケ谷」, and 「市が谷」 (all read Ichigaya).
[0009]
(2) Variants due to omitted words
Some variants omit the prefecture name, and others omit 「大字」 (oaza) or 「字」 (aza).
Variants that omit the prefecture name are common in mail addresses, for example 「埼玉県川越市大字小ヶ谷」 and 「川越市大字小ヶ谷」. An example of omitting 「大字」 or 「字」 is 「埼玉県川越市大字小ヶ谷」 and 「埼玉県川越市小ヶ谷」.
[0010]
(3) Variants due to additional character strings
These add a character string that is not needed to identify the address, such as a koaza (sub-district) name, for example 「埼玉県川越市大字小ヶ谷」 and 「埼玉県川越市大字小ヶ谷字東関」.
[0011]
(4) Variants due to street-name notation
These are common in Kyoto and elsewhere, for example 「京都市下京区大政所町」 and 「烏丸通仏光寺下る」.
[0012]
As described above, place names have various kinds of variant notation. Taking the place name 「埼玉県川越市小ヶ谷」 (Ogaya, Kawagoe City, Saitama Prefecture) as an example, its variants turn out to be very numerous, as listed below.
「埼玉県川越市小ヶ谷」
「埼玉県川越市小ケ谷」
「埼玉県川越市小が谷」
「埼玉県川越市大字小ヶ谷」
「埼玉県川越市大字小ケ谷」
「埼玉県川越市大字小が谷」
「川越市小ヶ谷」
「川越市小ケ谷」
「川越市小が谷」
「川越市大字小ヶ谷」
「川越市大字小ケ谷」
「川越市大字小が谷」
[0013]
In this example, koaza names may also be appended, as in 「埼玉県川越市小ヶ谷東田」, 「埼玉県川越市小ヶ谷東関」, and 「埼玉県川越市小ヶ谷西関」. Combined with the 12 variants listed above, this gives 84 variants in total.
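The count of 84 can be checked by enumeration. The following Python sketch is illustrative only. It assumes that each of the three koaza names may be written with or without a preceding 「字」, which gives seven endings (none, or one of three names in two forms) for each of the 12 base variants.

```python
from itertools import product

prefecture = ["埼玉県", ""]       # the prefecture name may be omitted
oaza = ["大字", ""]               # 「大字」 may be omitted
ga = ["ヶ", "ケ", "が"]           # three spellings of the same syllable
# Assumption: every koaza name may be preceded by an optional 「字」.
koaza = [""] + [aza + name for aza in ("", "字") for name in ("東田", "東関", "西関")]

variants = {
    f"{pref}川越市{o}小{g}谷{k}"
    for pref, o, g, k in product(prefecture, oaza, ga, koaza)
}
base = {v for v in variants if v.endswith("谷")}   # variants without a koaza name

print(len(base), len(variants))  # 12 84
```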
[0014]
In the prior art, every combination of these variants has to be registered in the dictionary file exhaustively by hand, so creating the dictionary file takes a great deal of labor. Moreover, for a city with especially many variants, such as Kyoto, the town names and street-name designations within the city together amount to several hundred thousand, and creating a complete dictionary by hand was practically impossible.
[0015]
A first object of the present invention is to solve the problems described above and to provide a place name notation method that makes it easy to register diverse variant notations in a character string collation dictionary without omission.
[0016]
As described above, when place names have many variants, even if all of them could be entered in the dictionary, the prior art has the problem that the storage capacity of the dictionary becomes large and the processing time also grows with the number of variants.
[0017]
As a technology that can solve the above-mentioned problem, a technique is known that reduces the storage capacity of the dictionary and speeds up the collation process by using a data format called a Trie (Koga, et al., “Lexical Search Approach for Character-String Recognition”, Third International Association for Pattern Recognition Workshop on Document Analysis Systems, 1998). In this technique, place names are written as tree-format data that branches only where the notations differ, and a Trie can easily be generated automatically from a set of character strings.
[0018]
With the above-mentioned technology, for example, the following Trie can easily be generated from the three notations “Ogaya Higashida, Kawagoe City, Saitama Prefecture”, “Ogaya Higashi Seki, Kawagoe City, Saitama Prefecture”, and “Ogaya Nishiseki, Kawagoe City, Saitama Prefecture”.
Ogaya Higashida, Kawagoe City, Saitama Prefecture

Hereinafter, a network representing the connection relation of characters in a place name character string will be referred to as a place name notation network.
[0019]
However, if there is a difference in a part of the character string, such a Trie-type place name notation network must treat these as completely different character strings and generate another branch. For example, the Trie corresponding to the different notation group of “Ogaya, Kawagoe City, Saitama Prefecture” will be large as shown below.
[0020]

(Omitted)
[0021]
As described above, even when trying to express different notations using a method using a Trie-type place name notation network, there is a problem that both the dictionary capacity and the processing time are greatly increased.
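The growth described above can be illustrated with a minimal sketch. It assumes a nested-dictionary Trie with one node per character, and uses the English glosses “Small”, “month”, “ke”, “ga”, and “Valley” as stand-in strings; the function names `build_trie` and `count_nodes` are hypothetical and do not come from the description.

```python
def build_trie(strings):
    """Build a nested-dict Trie; each key is one character."""
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
    return root


def count_nodes(trie):
    """Count the character nodes in the Trie (the root is not counted)."""
    return sum(1 + count_nodes(child) for child in trie.values())


# Stand-in strings: a shared prefix, three different middles, a shared suffix.
prefix, middles, suffix = "Small ", ["month", "ke", "ga"], " Valley"
variants = [prefix + m + suffix for m in middles]

# In a Trie the shared suffix is stored once per branch.
trie_nodes = count_nodes(build_trie(variants))

# A graph whose branches merge again would need the suffix only once.
merged_nodes = len(prefix) + sum(len(m) for m in middles) + len(suffix)

print(trie_nodes, merged_nodes)
```

Under these assumptions the Trie needs 36 character nodes, while a graph that lets the three branches rejoin before the shared suffix would need 22.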
[0022]
Therefore, an object of the present invention is to provide a place name character string matching method, a character string matching device, a character string recognition device, and a mail classification system that have a storage format usable for a place name dictionary for recognizing a wide variety of different notations, require little storage capacity, and can be collated at high speed.
[0023]
[Means for Solving the Problems]
According to the present invention, the above object is achieved by a place name character string matching method in a place name recognition apparatus having an input interface for receiving an input character string, a memory for storing a plurality of place name character strings, an arithmetic processing unit, and an output interface. In this method, an array of characters or syntax categories is defined for each partial character string constituting part or all of a place name character string, and the memory stores in advance a place name notation network that is generated from a plurality of place name character strings, each represented by a syntax category consisting of an array of characters or previously defined syntax categories, and that represents the connection relation of the characters in those place name character strings using a directed graph. The arithmetic processing unit collates a place name from the input character string by determining whether a partial character string of the character string received from the input interface matches one of the place name character strings stored in advance in the memory as the place name notation network, and outputs the collation result from the output interface.
[0024]
Further, the object is achieved by a place name character string collating device comprising: storage means that defines an array of characters or syntax categories for each partial character string constituting part or all of a place name character string, and stores a place name notation network that is generated from place name character strings represented by syntax categories consisting of arrays of characters or defined syntax categories and that represents the connection relation of the characters in the place name character strings using a directed graph; input means for inputting a character string; means for collating whether the input character string is stored in the storage means as a place name character string of the place name notation network; and means for outputting the result of the collation.
[0025]
The object is also achieved by a place name character string recognizing device comprising character reading means for reading the characters written on a document from an image obtained by converting the density of the document surface into an electrical signal, and the place name character string collating device described above, wherein the input means of the place name character string collating device receives the character string from the character reading means.
[0026]
The object is also achieved by a mail classification system that uses the above-mentioned place name character string recognition device to recognize the place name character string in a mail address, and then classifies the mail or prints the recognition result on the mail.
[0027]
Further, the object is achieved by a place name character string recognition apparatus that receives an image obtained by converting the density of the document surface into an electrical signal, reads the characters written on the document, and recognizes a place name character string, the apparatus comprising: means for storing place name character strings using a place name representation method that represents a set of place name character strings with multiple different notations (character strings that differ from one another but are arrays of words denoting the same region), defines an array of characters or syntax categories for each partial character string constituting part or all of a place name character string, and represents a place name character string by a syntax category consisting of an array of characters or previously defined syntax categories; and means for recognizing the place name character string by selecting an array of partial images in the input image such that each partial image resembles a character of one of the place name character strings so represented, collating the array, and feeding the collation result back into the selection of the partial image array.
[0028]
Specifically, in order to achieve the above-mentioned object, the present invention expresses the different notations of a place name using the generation rules of a context-free grammar. A generation rule of a context-free grammar states which sequence of other syntax categories a constituent of a sentence (a syntax category) can be replaced with (“Introduction to Natural Language Processing”, Modern Science, ISBN4-7649-0143-9). The present invention uses an extended BNF notation, suited to expressing place names, obtained by extending the BNF (Backus-Naur Form) notation (Nakada, “Compiler”, ISBN4-7828-5057-3), which is known as one way of writing generation rules.
[0029]
With the above-mentioned generation rules, typical patterns of different notation such as “month”, “ke”, and “ga” can be defined as one syntax category, and a set of different notations of a place name can be expressed concisely. Further, the selection symbol adopted in the BNF notation makes it possible to express the different notations of a place name even more simply. For this reason, according to the present invention, a dictionary describing many sets of different notations can be created easily.
[0030]
The BNF notation is a notation for expressing the generation rules of the context free grammar using symbols such as substitution, option, and selection, and the following symbols are used.
::= Replacement. This means that the syntax category on the left side can be replaced with the syntax category or the array of characters on the right side.
[] Option. The description enclosed in [] may or may not be present.
| Selection. Either the right side or the left side of the symbol is chosen.
[0031]
As an example, the above-described generation rule for “Ogaya, Kawagoe City, Saitama Prefecture” is expressed in BNF notation as follows.
<W month> ::= month | ke | ga
<Ogaya, Kawagoe City, Saitama> ::= [Saitama Prefecture] Kawagoe City [Large] Small <W month> Valley [[Character] Higashida | Higashiseki | Nishiseki]
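To see how compactly such a rule covers the different notations, the following minimal sketch enumerates the strings a rule generates. The tuple encoding of the rule, the function name `expand`, and the use of the English glosses as literals are assumptions made for illustration. In particular, the grouping of the last bracket as `[[Character] (Higashida | Higashiseki | Nishiseki)]` is an assumed reading, chosen because it reproduces the total of 84 notations stated earlier.

```python
from itertools import product


def expand(node):
    """Return every string that a rule fragment can generate.

    A fragment is a plain string (literal) or a tuple:
      ("cat", [parts])  concatenation
      ("opt", part)     optional part, written [ ] in the rule
      ("alt", [parts])  selection, written | in the rule
    """
    if isinstance(node, str):
        return [node]
    kind, body = node
    if kind == "opt":
        return [""] + expand(body)
    if kind == "alt":
        return [s for part in body for s in expand(part)]
    if kind == "cat":
        return ["".join(combo) for combo in product(*(expand(p) for p in body))]
    raise ValueError(f"unknown fragment kind: {kind}")


# <W month> ::= month | ke | ga
W = ("alt", ["month", "ke", "ga"])

# [Saitama Prefecture] Kawagoe City [Large] Small <W month> Valley
base = ("cat", [("opt", "Saitama Prefecture "), "Kawagoe City ",
                ("opt", "Large "), "Small ", W, " Valley"])

# Assumed grouping: [[Character] (Higashida | Higashiseki | Nishiseki)]
suffix = ("opt", ("cat", [("opt", " Character"),
                          ("alt", [" Higashida", " Higashiseki", " Nishiseki"])]))

base_notations = set(expand(base))
all_notations = set(expand(("cat", [base, suffix])))
print(len(base_notations), len(all_notations))
```

Under these assumptions the first part of the rule yields 12 distinct strings and the full rule yields 84, which agrees with the counts given in the text.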
[0032]
Further, the place name notation network can be made compact by using the notation format described above. In this notation format, differences between partial character strings are expressed explicitly using the symbols “[]” and “|”. For this reason, where a partial character string has different notations, a path that bypasses that part can easily be set up on the network. For example, the BNF notation described above can be converted into a compact network as shown below. It has been difficult to generate such a compact network from a conventional list of character strings.
[0033]

[0034]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of a place name notation method and a place name character string recognition method according to the present invention will be described in detail with reference to the drawings.
[0035]
FIG. 1 is a flowchart for explaining a processing example of place name character string recognition according to an embodiment of the present invention. First, this flow will be described. The flowcharts used in the following description are drawn in the Gane-Sarson notation, which is described in J. Martin, “Software Structuring Technique”, Modern Science, ISBN4-7649-0124-2 C3050 P5562E.
[0036]
(1) First, prior to recognition of a place name, the place name character string generation rule editing process (step 101) creates place name character string generation rules based on examples of different place name notations, and stores these generation rules in the place name character string generation rule file 102. The editing process in step 101 can be realized as a human editing operation performed on a computer.
[0037]
(2) Next, the place name notation network generation process (step 103) reads the place name character string generation rule file 102 and generates a place name notation network, which serves as the dictionary for the place name recognition process (step 104). The place name notation network generation process in step 103 can be realized as a program on a computer.
[0038]
(3) Next, the place name recognition process (step 104) reads the place name character string from the input image with reference to the place name notation network. The place name recognition process in step 104 can also be realized as a computer program.
[0039]
The place name character string generation rule file 102 uses the “extended BNF notation” according to the present invention and expresses a group of different place name notations by generation rules of a context-free grammar. The extended BNF notation extends the BNF notation with symbols such as grouping, and uses the symbols described below.
::= Replacement. This means that the syntax category on the left side can be replaced with the syntax category or the array of characters on the right side.
[] Option. The description enclosed in [] may be omitted.
| Selection. Either the right or the left of this symbol is chosen.
() Grouping. The contents of the parentheses are evaluated before what surrounds them.
<W string> Syntax category.
<N number> A syntax category representing a group of different notations of the place name character string for a specific region. The number is the place name identifier and must be an integer greater than zero.
[0040]
The symbols described above are evaluated according to the following priority order.
(1) Definition of <W character string> and <N number> variable names
(2) [] and () parentheses. When parentheses are used with more than one nesting, priority is given to the inner parentheses.
(3) |
(4) ::=
[0041]
FIG. 2 is a diagram showing a notation example of the place name expressed by the place name character string generation rule edited by the editing process in the above-described step 101 and an example of listing the different notation without using the generation rule.
[0042]
The example of place name notation expressed by place name character string generation rules, shown in FIG. 2A, expresses “Ogaya, Kawagoe City, Saitama Prefecture” (with the subdivisions “Higashida”, “Higashiseki”, and “Nishiseki”), “Kasahata, Kawagoe City, Saitama Prefecture” (with the subdivisions “Kubo” and “Kanan”), and “Shimohiroya, Kawagoe City, Saitama Prefecture” in the extended BNF notation according to the present invention. In this way, place names including a large number of different notations can be expressed very simply by using the symbols introduced by the present invention. By contrast, the example shown in FIG. 2B, which does not use the generation rules, simply lists a large number of different notations. The four-line notation shown in FIG. 2A generates 106 different notations, a part of which is shown in FIG. 2B.
[0043]
The place name character string generation rule file 102 is a normal text file, and a general text editor can be applied as means for realizing Step 101 of the place name character string generation rule editing process.
[0044]
FIG. 3 is a diagram schematically showing a place name notation network created from the example of the generation rule of FIG. 2A, and this will be described below.
[0045]
The place name notation network is a directed graph in which each edge corresponds to a partial character string and each vertex corresponds to a boundary between partial character strings. The direction of each edge matches the order of the characters in the character string. In FIG. 3, an edge marked NULL indicates a null transition, that is, the character string at that location may be absent. The circle 301 with a line at its lower right in FIG. 3 indicates the start position of the place name character string, and the circles 302 to 304 with a diagonal line through the center indicate end positions of the place name character string.
[0046]
FIG. 4 is a diagram for explaining the data format used when a place name notation network is implemented on a computer. When implemented on a computer, the place name notation network takes the data format shown in FIG. 4 (the left-child, right-sibling representation; T. Cormen et al., “Introduction to Algorithms”, Modern Science Co., Ltd., pp. 201-202). In this data format, character connection relations are represented by child pointers, and branches of the place name notation network are represented by sibling pointers.
[0047]
FIG. 4A shows the components of each data record. Each data record consists of three data items: c (401), b (402), and d (403). Data item c is a character code, data item b is a sibling pointer, and data item d is a child pointer. A branch from a data record is expressed as a list connected by sibling pointers, and a character string is connected by child pointers. For example, when the place name notation network shown in FIG. 3 is expressed in list format using these data records, it becomes as shown in FIG. 4B.
[0048]
In the place name notation network expressed in the list format shown in FIG. 4B, the data record 404′ (corresponding to the character code “small”) branches to the data records 404 to 406. The data records 404′ and 404 are linked by a child pointer, and the data records 404, 405, and 406 are linked by sibling pointers. The character string “Saitama” is represented by the data records 407, 408, and 409 connected by child pointers. When a data record corresponds to a NULL transition, a NULL character is stored in its character code c (401), which means that the data records on the branches alongside the record holding the NULL character may be omitted from the character string. Further, after the data record corresponding to the last character of a place name character string, one extra data record is provided, shown as the data record 410. A NULL pointer is stored in the child pointer d of the data record 410 to indicate the end of the network, and the place name identifier is stored in its sibling pointer b.
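The record layout described above can be sketched as follows. This is a minimal illustration, assuming Python objects in place of memory addresses and `None` in place of NULL; the names `Record`, `chain`, and `spell`, the romanized string, and the identifier value 1 are hypothetical. Because the string is romanized here, it occupies one record per Latin letter rather than the three records of the figure.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Record:
    c: Optional[str]                      # character code; None plays the role of NULL
    b: Union["Record", int, None] = None  # sibling pointer, or the identifier in a terminal record
    d: Optional["Record"] = None          # child pointer; None marks the end of the network


def chain(text, tail):
    """Link one record per character through child pointers, ending at `tail`."""
    node = tail
    for ch in reversed(text):
        node = Record(c=ch, d=node)
    return node


def spell(head):
    """Follow child pointers from `head` and return (string, identifier)."""
    chars = []
    node = head
    while node.d is not None:
        chars.append(node.c)
        node = node.d
    return "".join(chars), node.b


# Extra terminal record: NULL character, NULL child pointer, identifier in b.
terminal = Record(c=None, b=1, d=None)
head = chain("Saitama", terminal)
print(spell(head))
```

Keeping the identifier in the sibling pointer of the terminal record, as the text describes, means no fourth field is needed; the price is that `b` holds two kinds of value, which the type annotation above makes explicit.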
[0049]
The list-format place name notation network shown in FIG. 4B can be regarded as a graph in which each data record corresponds to a node. Each edge of the place name notation network shown schematically in FIG. 3 is represented here by as many nodes as it has characters.
[0050]
FIG. 5 is a flowchart for explaining the process of generating a place name notation network from the place name character string generation rules in step 103 of FIG. 1, and FIG. 6 is a view for explaining an example of a syntax tree to be generated.
[0051]
First, the control loop 501 processes the place name character string generation rules in the place name character string generation rule file 102 one at a time, that is, each line starting with <N number> (the second and subsequent lines from the top of FIG. 2A). For each line, first, in step 502, the character string in the line is parsed to create a syntax tree as shown in FIG. 6. Next, in step 503, the terminal node t_i of the place name notation network corresponding to the group of different place name notations is generated. Hereinafter, unless otherwise specified, a “node on the place name notation network” denotes a data record in the format of FIG. 4. In t_i, NULL is stored in the character code c, NULL is stored in the child pointer d, and the place name identifier of the group is stored in the sibling pointer b. Next, in step 504, a place name notation network corresponding to the group of different place name notations is generated using the function proc described later. After all the place name character string generation rules have been processed, redundant portions of the generated place name notation networks are integrated in step 505.
[0052]
The process of generating a syntax tree from a place name character string generation rule can use, for example, the technique for generating a transition network according to generation rules described in “Introduction to Natural Language Processing” (Modern Science, ISBN4-7649-0143-9, pp. 19-30). The syntax tree shown in FIG. 6, generated by the process of step 502, is an example generated from the second line of FIG. 2A. In FIG. 6, a circle containing “+” indicates concatenation of character strings, a circle containing “[]” indicates an option, a circle containing “|” indicates a selection, and a square represents a character string. In the extended BNF notation, the parentheses “(” and “)” are also used. However, the syntax tree used in the embodiment of the present invention has no node corresponding to the parentheses; the order of operations determined by the parentheses is reflected in the structure of the tree itself.
[0053]
The function proc is a function used to generate a place name notation network from the syntax tree and takes two arguments, p and a. The argument p specifies a value taken by the child pointer d of the terminal node of the place name notation network to be generated. The argument a indicates the highest node of the syntax tree to be processed. When an argument a is specified for a certain node, all nodes below the argument a are processed recursively.
[0054]
FIG. 7 is a flowchart for explaining the processing operation of the function proc, FIGS. 8 and 9 are diagrams for explaining the process of generating a place name notation network by the function proc, and FIG. 10 is a diagram showing the group of place name notation networks generated from the place name notation generation rules. These will be described below. In FIG. 7, p, q, and r are variables representing the addresses of data records in the format shown in FIG. 4, and the symbol “->” denotes access to a data item in a data record. The processing of the flow shown in FIG. 7 is divided into four cases according to the type of the node a in the syntax tree.
[0055]
(1) The type of the node a in the syntax tree is determined, and it is determined whether the type is “+”, “|”, “[]”, or “character string” (step 701).
[0056]
(2) If it is determined in step 701 that the type of the node a in the syntax tree is “+”, that is, a combination, the argument p is first copied to the variable q. That is, the address of the terminal node of the partial network generated by this process is copied (step 702).
[0057]
(3) Next, the child nodes n_i (1 ≦ i ≦ number of child nodes) of the node a are processed in order from the right by the function proc() to generate partial networks of the place name notation network. At that time, the argument is passed so that the end point of the partial network generated by the function proc() becomes q. The pointer to the start point of the resulting partial network is then substituted into q, which sets the end point of the partial network to be generated next. By repeatedly calling the function proc() in this way, the partial networks generated from the child nodes of the “+” node of the syntax tree are connected one after another (steps 703 and 704).
[0058]
(4) When all child nodes have been processed, q at that time, that is, the top of the partial network is returned as a return value (step 705).
[0059]
(5) If it is determined in step 701 that the type of the node a in the syntax tree is “|”, that is, selection, first, a partial network is generated for one of the child nodes, n_1, and the start address of the resulting partial network is substituted into the variable q (step 706).
[0060]
(6) Next, the value of q is substituted into the variable r, and partial networks are generated in order for the other child nodes n_i (2 ≦ i ≦ number of child nodes). The start address of each generated partial network is stored in the sibling pointer b of r. The start address of the generated partial network is then substituted for r, and the same processing is repeated (steps 707 to 710).
[0061]
(7) When all the child nodes have been processed, q, that is, the first address q of the partial network generated first is returned as a return value (step 711).
[0062]
(8) If the type of the node a in the syntax tree is “[]”, that is, an option in the determination in step 701, first, a partial network corresponding to the child node of the syntax tree is generated, and the start address is set as a variable Store in q. At this time, parameters are designated so that the end of the generated partial network is p (step 712).
[0063]
(9) Next, a node corresponding to the NULL transition is generated using the function newNd(), and its address is stored in the sibling pointer b of q. Note that newNd() is a function that newly secures a storage area for one data record in the format shown in FIG. 4 and sets a NULL pointer in the data item b of the secured data record (step 713).
[0064]
(10) Next, NULL is assigned to the character code c of the node corresponding to the NULL transition, and p is set to the child node pointer d of the node corresponding to the NULL transition (steps 714 and 715).
[0065]
(11) The head address q of the last generated partial network is returned as a return value (step 716).
[0066]
(12) If it is determined in step 701 that the type of the node a in the syntax tree is a character string, first, the value of p is substituted into the variable q (step 717).
[0067]
(13) Next, the following processing is repeated for each character C_i (1 ≦ i ≦ character string length) in the character string, in order from the end of the character string, and a node corresponding to each character is generated one by one. First, a storage area for one node is secured by the function newNd(). Next, C_i is assigned to the character code c of the newly generated node. Next, the value of q is substituted into the child pointer d of the newly generated node. Finally, the value of q is replaced with the address of the newly generated node (steps 718 to 722).
[0068]
(14) After the above processing has been repeated for every character C_i, the address q of the newly generated partial network is returned as the return value (step 723).
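The four cases of the function proc described in steps (1) to (14) can be sketched as below. This is an illustrative reading, not the implementation of FIG. 7. The tuple encoding of the syntax tree, the helper names `append_sibling` and `paths`, and the glossed strings are hypothetical. Two deliberate deviations are assumed: the place name identifier is kept in a separate field `place_id` instead of the sibling pointer b, and a new sibling is attached at the end of an existing sibling list rather than written directly into b, so that nested selections and options are not overwritten.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    c: Optional[str] = None         # character code; None stands for NULL
    b: Optional["Node"] = None      # sibling pointer
    d: Optional["Node"] = None      # child pointer
    place_id: Optional[int] = None  # assumption: identifier kept apart from b


def append_sibling(head, new):
    """Attach `new` at the end of head's sibling list (assumed safeguard)."""
    while head.b is not None:
        head = head.b
    head.b = new


def proc(p, a):
    """Build the partial network for syntax-tree node `a`, ending at `p`; return its head."""
    if isinstance(a, str):            # character string: one node per character, last first
        q = p
        for ch in reversed(a):
            q = Node(c=ch, d=q)
        return q
    kind, children = a
    if kind == "+":                   # concatenation: process child nodes from the right
        q = p
        for child in reversed(children):
            q = proc(q, child)
        return q
    if kind == "|":                   # selection: alternatives are chained as siblings
        q = proc(p, children[0])
        for child in children[1:]:
            append_sibling(q, proc(p, child))
        return q
    if kind == "[]":                  # option: a NULL-transition sibling skips straight to p
        q = proc(p, children[0])
        append_sibling(q, Node(c=None, d=p))
        return q
    raise ValueError(f"unknown node type: {kind}")


def paths(node):
    """Yield every (string, place_id) spelled on the way from `node` to a terminal node."""
    if node.d is None:                # terminal node
        yield "", node.place_id
        return
    sib = node
    while sib is not None:
        for rest, pid in paths(sib.d):
            yield (sib.c or "") + rest, pid
        sib = sib.b


rule = ("+", [("[]", ["Saitama Prefecture "]), "Kawagoe City ", "Small ",
              ("|", ["month", "ke", "ga"]), " Valley"])
terminal = Node(place_id=1)           # terminal node as generated in step 503
head = proc(terminal, rule)
results = sorted(s for s, _ in paths(head))
print(results)
```

Because every alternative of a selection and both branches of an option are built with the same end point p, the branches rejoin at the following node, which is what keeps the network smaller than a Trie.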
[0069]
In FIGS. 8 and 9, which show the process of generating a place name notation network by the function proc, reference numeral 801 denotes the terminal node that is generated by the processing of step 503 of the flow shown in FIG. 5 and stores the identifier “3501104”. Thereafter, the place name notation network is generated in order from the top to the bottom of FIGS. 8 and 9 by the processing of each step of the flow shown in FIG. 7. When the node 603 of the syntax tree shown in FIG. 6 is processed by the function proc, first, a partial network corresponding to the node 602 is generated by the function proc, as indicated by 802. Next, a partial network corresponding to the node 604 is generated by the function proc. In this case, p stores the address of the node 804, and the generated partial network is connected to the node 804 as indicated by 803.
[0070]
A separate place name notation network is generated for each place name character string generation rule by the control loop 501 of the flow shown in FIG. 5. As a result, the group of place name notation networks generated from the place name notation generation rules of FIG. 2 is as shown in FIG. 10. Further, by the processing of step 505, redundant portions of these networks, for example, the portions for “Kawagoe City, Saitama Prefecture”, are integrated, and a place name notation network as described with reference to FIG. 3 is generated.
[0071]
FIG. 11 is a flowchart for explaining a processing procedure for generating a place name notation network using the prior art, FIG. 12 is a view for explaining an example of the process of generating a place name notation network by the prior art, and FIG. 13 is a diagram showing an example of a place name notation network generated by the prior art. A method of generating a place name notation network without using generation rules will be described below with reference to these drawings.
[0072]
The prior art is explained here in order to show that only a tree-structured place name notation network called a Trie can be generated from the conventional place name character string notation method, and that the place name notation network generated from the notation method of the present invention is superior in both storage capacity and the processing time required for collation. The prior-art way of writing place name notations is a list of place name character strings as shown in FIG. 2B, and the flow described with reference to FIG. 11 is a procedure for generating a place name notation network from such a list. Here, let the k-th character string in FIG. 2B be S_k, its length be L_k, and the i-th character of each character string be C_i. Further, it is assumed that an identifier corresponding to each character string is stored separately. The place name notation network to be generated is realized in the data format shown in FIG. 4.
[0073]
(1) First, a node rr serving as a temporary root of the place name notation network is generated. NULL is set to the child node pointer d of this node (steps 1101 and 1102).
[0074]
(2) All the character strings S_k are processed one by one by the loop 1103.
[0075]
(3) First, the root address is substituted into the variable p. Next, the subroutine SrchNxt is called for each character in the character string. The subroutine SrchNxt determines whether a node corresponding to the character has already been generated and adds a new node if it has not. This procedure will be described later (steps 1104 to 1106).
[0076]
(4) When the subroutine SrchNxt has finished processing the characters in the character string, a new child node is generated by the function newNd(), the identifier of the character string is stored in the area of its pointer b, and the address of this new child node is assigned to the child node pointer d of p. The child node of rr at the time the loop 1103 ends becomes the root of the place name notation network (steps 1107 to 1110).
[0077]
Next, the processing of the subroutine SrchNxt will be described.
[0078]
(1) First, the value of the child node pointer d of p is assigned to the variable q. Then, a loop is executed that scans all the child nodes of p with the variable q and checks whether the character code of each, that is, its data item c, is equal to C_i. If they are equal, a node corresponding to C_i is taken to have been generated already, the pointer p is advanced to the node q, and the process ends (steps 1111, 1113 to 1115, loop 1112).
[0079]
(2) If the check at step 1113 finds that the data item c is not equal to C_i, the value of the sibling pointer of q is assigned to q, and the loop processing is repeated until q becomes NULL (step 1116).
[0080]
(3) If no node corresponding to C_i is found even after the loop processing is completed, a new node is generated by the function newNd(). C_i is assigned to the character code c of the new node, NULL is assigned to its child node pointer d, and the value of the child node pointer d of p is assigned to its sibling pointer b. The address of this new node is then assigned to the child node pointer d of p and to the pointer p, and the processing of this subroutine ends (steps 1117 to 1122).
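The prior-art procedure of FIG. 11 and the subroutine SrchNxt can be sketched as follows. This is an illustration under assumptions: Python objects replace addresses, the romanized strings and the names `srch_nxt`, `build_trie`, `lookup`, and `count_nodes` are hypothetical, and the identifier is held in a separate field `place_id`. As a further assumed deviation, the terminal node is added at the front of the child list of p instead of overwriting the child pointer, so that a string that is a prefix of another would not lose its children.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    c: Optional[str] = None         # character code; None stands for NULL
    b: Optional["Node"] = None      # sibling pointer
    d: Optional["Node"] = None      # child pointer
    place_id: Optional[int] = None  # assumption: identifier kept apart from b


def srch_nxt(p, ch):
    """Return the child of p for character ch, creating it if it does not exist."""
    q = p.d
    while q is not None:            # scan the children of p through sibling pointers
        if q.c == ch:
            return q
        q = q.b
    new = Node(c=ch, b=p.d)         # new node becomes the first child of p
    p.d = new
    return new


def build_trie(entries):
    """Build a Trie from (string, identifier) pairs and return the temporary root rr."""
    rr = Node()
    for text, place_id in entries:
        p = rr
        for ch in text:
            p = srch_nxt(p, ch)
        p.d = Node(b=p.d, place_id=place_id)   # terminal node holding the identifier
    return rr


def lookup(rr, text):
    """Return the identifier stored for `text`, or None if it is not registered."""
    p = rr
    for ch in text:
        q = p.d
        while q is not None and q.c != ch:
            q = q.b
        if q is None:
            return None
        p = q
    q = p.d
    while q is not None:
        if q.c is None and q.place_id is not None:
            return q.place_id
        q = q.b
    return None


def count_nodes(node):
    """Count `node` and everything reachable through sibling and child pointers."""
    if node is None:
        return 0
    return 1 + count_nodes(node.b) + count_nodes(node.d)


entries = [("Kawagoe City Ogaya", 1),
           ("Kawagoe City Kasahata", 2),
           ("Kawagoe City Shimohiroya", 3)]
rr = build_trie(entries)
total_nodes = count_nodes(rr.d)
print(total_nodes, lookup(rr, "Kawagoe City Kasahata"))
```

With these three romanized entries the shared prefix is stored once, giving 37 character nodes and 3 terminal nodes; any difference after the prefix, however, starts a branch that never rejoins.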
[0081]
FIG. 12 shows the process of generating a place name notation network according to the processing procedure of FIG. 11 described above. The example processes the top three rows in the figure. First, a place name notation network corresponding to “Ogaya, Kawagoe City” is generated (1201). Next, “Kasahata, Kawagoe City” is processed. Since the “Kawagoe City” portion has already been generated in 1201, no new nodes are generated for it. However, when the pointer p reaches the position indicated by 1202 and the first character of “Kasahata” is processed, no node corresponding to that character exists among the child nodes of “City”. Therefore, a node corresponding to the first character of “Kasahata” is newly generated as a sibling of the “small” node. Thereafter, the node corresponding to the remaining character “幡” is connected as a child node of the newly generated node (1203). “Shimohiroya, Kawagoe City” is processed in the same way: a node for “below” is newly generated as a sibling of the “small” node and of the first-character node of “Kasahata” (1204), and the nodes corresponding to the subsequent characters are connected to it (1205).
[0082]
FIG. 13 schematically shows a part of the place name notation network generated from the group of different notations shown in FIG. 2B. Unlike the case shown in FIG. 3, the place name notation network generated from the conventional notation method has the form of a tree; that is, once it branches, the branches never merge again. This is the data representation format known as a Trie. Compared with FIG. 3, it contains many redundant portions. For example, the partial network corresponding to “Higashida”, “Higashi-Seki”, and “Nishi-Seki” is repeated six times. This increases the required storage capacity. In addition, on a computer with a hierarchical memory configuration, a larger memory space to be accessed slows access because of cache misses and the like, so the character string matching process itself, described later, becomes slow.
[0083]
The ability to generate a place name notation network with little redundancy, as shown in FIG. 3, is an essential advantage of writing place names with generation rules according to the present invention. With generation rules, the shared portions can be expressed explicitly. For example, in the example shown in FIG. 2A, the “month” in “Ogaya” has three different notations, and the extended BNF notation makes explicit that the character strings following it are the same. For this reason, as shown in FIG. 3, a network is generated that has three paths only between “small” and “valley”.
[0084]
On the other hand, the conventional place name character string notation method shown in FIG. 2B cannot detect whether the different notation groups after “ヶ” are equivalent, so only a network as shown in FIG. can be created.
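The storage argument can be made concrete with a small count. The sketch below is illustrative only: the toy strings, the function names, and the assumption that each variant is a single character are hypothetical and are not taken from the figures.

```python
def trie_node_count(strings):
    """A trie needs one node per distinct non-empty prefix of the listed strings."""
    return len({s[:k] for s in strings for k in range(1, len(s) + 1)})


def shared_network_node_count(prefix, variants, suffix):
    """Node count of a network that branches only over the variants and then merges.

    Assumes every variant is a single character, so each variant needs one node.
    """
    return len(prefix) + len(variants) + len(suffix)


prefix, variants, suffix = "AB", ["x", "y", "z"], "CDE"
listed = [prefix + v + suffix for v in variants]  # "list every notation" form

trie_nodes = trie_node_count(listed)
shared_nodes = shared_network_node_count(prefix, variants, suffix)
print(trie_nodes, shared_nodes)  # prints "14 8"
```

In the trie the suffix “CDE” is stored once per variant, whereas the merged network stores it once; the gap widens as the shared suffix or the number of variants grows.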
[0085]
FIG. 14 is a flowchart for explaining the processing operation in the place name recognition process 104 shown in FIG. 1, which will be described below.
[0086]
(1) First, an image of a character line portion is cut out from the input image by a character line cut-out process (step 1401).
[0087]
(2) Next, by the character cut-out process, a pattern considered as a character, that is, a candidate pattern is cut out from the character line image. If a character boundary cannot be determined uniquely at this stage, a character pattern is extracted based on a plurality of boundary hypotheses, and candidate patterns corresponding to the respective hypotheses are output (step 1402).
[0088]
(3) Next, the character recognition process recognizes what character each extracted candidate pattern is and outputs the result as a candidate character string. When the character extraction is based on a plurality of hypotheses, or when character recognition yields a plurality of candidate characters for one pattern, the character recognition process outputs a plurality of candidate character strings, one for each combination of extraction hypothesis and candidate character (step 1403).
[0089]
(4) Finally, the character string matching process refers to the place name notation network and verifies whether each candidate character string is a correct place name character string. A candidate character string accepted by the collation is used as the place name recognition result (step 1404).
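Steps 1403 and 1404 can be illustrated with a short Python sketch. It covers only the case of several candidate characters per pattern (not several segmentation hypotheses), and the function names, the toy recognizer output, and the set used in place of the place name notation network are all assumptions made for illustration.

```python
from itertools import product


def candidate_strings(candidates_per_pattern):
    """Combine per-pattern candidate characters into candidate strings (cf. step 1403)."""
    return ["".join(chars) for chars in product(*candidates_per_pattern)]


def recognize(candidates_per_pattern, accepts):
    """Keep only the candidate strings accepted by the matcher (cf. step 1404)."""
    return [s for s in candidate_strings(candidates_per_pattern) if accepts(s)]


# Toy recognizer output: two candidates for the 1st and 3rd pattern, one for the 2nd.
candidates = [["川", "ハ"], ["越"], ["市", "巾"]]
known_place_names = {"川越市"}  # stand-in for the place name notation network

all_candidates = candidate_strings(candidates)
result = recognize(candidates, lambda s: s in known_place_names)
```

Here four candidate strings are produced and only “川越市” survives collation, which is how the dictionary corrects ambiguous character recognition results.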
[0090]
FIG. 15 is a flowchart for explaining the processing operation in the character string matching process 1404 described above, which will be described below. This process takes one character string as input, determines whether at least a part of the input character string can be accepted as a place name character string, and, if it can be accepted, obtains the identifier of the corresponding place name notation. Here, the length of the input character string is denoted L, and the i-th character of the character string is denoted C_i.
[0091]
(1) First, steps 1502 and 1503 are repeated by changing the starting point s of collation from 1 to L by a loop 1501.
[0092]
(2) The root address of the place name notation network is set in the variable p indicating the node. Next, the function srch is called with arguments p and s. The function srch is a function that finds a path that matches the input character string in the place name notation network and returns the address of the terminal node. If the return value of srch is not a NULL pointer, it is assumed that the collation has succeeded, and the identifier stored in the node indicated by the return value of function srch is output (steps 1502 to 1504).
[0093]
(3) If collation is not successful even if s reaches L, it is determined that the character string collation process has failed, and the process ends (step 1505).
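The loop of FIG. 15 can be sketched as below. This is a hedged illustration: the name `collate`, the explicit `chars` argument (the text passes only p and s to srch), and 0-based indexing are assumptions, and `toy_srch` is a deliberately simplified stand-in that returns an identifier directly rather than the address of an end node.

```python
def collate(chars, root, srch):
    """Try every starting position; return the first accepted result, or None."""
    for s in range(len(chars)):      # loop 1501 (the text counts s from 1 to L)
        end = srch(root, s, chars)   # steps 1502-1503: search from the root at position s
        if end is not None:
            return end               # step 1504: collation succeeded
    return None                      # step 1505: no starting point was accepted


# Stand-in for srch, for demonstration only: the "network" is a plain dict and
# the identifier is returned directly instead of an end node.
def toy_srch(network, s, chars):
    for name, ident in network.items():
        if chars.startswith(name, s):
            return ident
    return None


toy_network = {"kawagoe": "ID-001"}
hit = collate("to kawagoe-shi", toy_network, toy_srch)
miss = collate("tokyo", toy_network, toy_srch)
```

Because every starting point is tried, a place name preceded by unrelated characters is still found, which corresponds to accepting "at least a part" of the input character string.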
[0094]
In the processing described above, the function srch calls itself recursively and searches the place name notation network depth-first for a path matching the input character string. The function srch takes two arguments, p and i. The argument p indicates the node at which the search starts. The argument i is an integer indicating which character of the input character string the current processing is focused on. If an accepted character string is found, srch returns the address of the end node of that string; otherwise, it returns a NULL pointer.
[0095]
FIG. 16 is a flowchart for explaining the processing operation of the function srch in the above-described processing, which will be described below.
[0096]
(1) First, it is checked whether or not the argument p indicates a character string end node. If the character string end node is pointed to, it is considered that the input character string has been accepted, p is returned as a return value, and the process is terminated (step 1601).
[0097]
(2) Next, it is determined whether or not all characters have been processed. If i is greater than L and all characters have been processed, but p has not reached the end of the place name notation network, NULL is returned (step 1602).
[0098]
(3) Next, it is checked whether the data item c of p matches the i-th character C_i of the character string. If they match, the function srch is called recursively so that the child node p->d of p becomes the starting point of the search and the character string is processed from the (i + 1)-th character. If the return value r is not NULL, the character string is considered to have been accepted, and the process ends with r as the return value (step 1603).
[0099]
(4) Next, it is checked whether p is a node corresponding to a NULL transition. If so, the function srch is called recursively so that the child node p->d of p becomes the starting point of the search and the character string is processed from the i-th character. If the return value r is not NULL, the character string is considered to have been accepted, and the process ends with r as the return value (step 1604).
[0100]
(5) Next, it is checked whether a sibling node p->b is connected to p. If so, the function srch is called recursively so that the sibling node p->b of p becomes the starting point of the search and the character string is processed from the i-th character, and its return value is returned to the caller (step 1605).
[0101]
(6) If the input character string is not accepted in any of the processes described above, no further search is possible, and the process ends with NULL as the return value (step 1606).
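Steps 1601 to 1606 translate almost line by line into a recursive function. The sketch below is an illustration under assumptions rather than the patented implementation: the `Node` dataclass, its `is_end`, `is_null`, and `ident` fields, the extra `chars` argument, the guard for a missing node, and the toy network of Latin letters are all hypothetical; only the item names c, d, and b come from the text. Indices are 0-based here, whereas the text counts characters from 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    c: Optional[str] = None      # character of this node (None for end / NULL nodes)
    d: Optional["Node"] = None   # first child (p->d)
    b: Optional["Node"] = None   # next sibling (p->b)
    is_end: bool = False         # assumed flag: character string end node
    is_null: bool = False        # assumed flag: NULL-transition node
    ident: Optional[str] = None  # assumed: identifier stored in an end node


def srch(p, i, chars):
    """Depth-first search for a path matching chars[i:]; return the end node or None."""
    if p is None:                            # guard added for this sketch
        return None
    if p.is_end:                             # step 1601: accepted
        return p
    if i >= len(chars):                      # step 1602: input exhausted
        return None
    if not p.is_null and p.c == chars[i]:    # step 1603: consume one character
        r = srch(p.d, i + 1, chars)
        if r is not None:
            return r
    if p.is_null:                            # step 1604: NULL transition, consume nothing
        r = srch(p.d, i, chars)
        if r is not None:
            return r
    if p.b is not None:                      # step 1605: try the next sibling
        return srch(p.b, i, chars)
    return None                              # step 1606: no path found


# Toy network accepting "ab", then "x", "y", or nothing, then "c".
end = Node(is_end=True, ident="ID1")
c_node = Node(c="c", d=end)
eps = Node(is_null=True, d=c_node)
y = Node(c="y", d=c_node, b=eps)
x = Node(c="x", d=c_node, b=y)
root = Node(c="a", d=Node(c="b", d=x))
```

The three branches merge again at the single “c” node, so the suffix is stored once. Calling `srch(root, 0, "abyc")` or `srch(root, 0, "abc")` returns the end node, while `srch(root, 0, "abzc")` returns `None`.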
[0102]
Although the embodiment of the present invention described above performs character extraction, character recognition, and character string collation sequentially, the present invention can easily be extended to a method that feeds back the result of character string matching to character extraction, such as the method of Koga et al. (Japanese Patent Application No. 10-28077; see Japanese Patent No. 3246432).
[0103]
FIG. 17 is a block diagram showing a configuration example of a system to which the place name character string recognition process according to the embodiment of the present invention is applied, and FIG. 18 is a block diagram showing the structure of the place name character string generation rule editing device. In this example, the present invention is applied to a mail sorting system. In FIGS. 17 and 18, 1701 is a postal sorting machine, 1702 is a scanner, 1703 is a delay line, 1704 is a sorter, 1705 is a place name recognition device, 1706 is an input interface, 1707 is an arithmetic processing device, 1708 is an output processing device, 1710 is a memory, 1711 is a network interface, 1712 is a hard disk, 1713 is a media removable storage device, 1714 is a place name character string generation rule editing device, 1718 is a network, 1801 is a mouse, 1802 is a keyboard, 1803 is a display, 1804 is a place name character string generation rule editing program, 1805 is a character string collation program, 1806 is a place name notation network display program, 1807 is a place name character string generation rule file, 1808 is a place name notation network generation program, 1809 is place name notation network data, 1810 is a communication device, 1811 is a media removable storage device, and 1812 is a computer.
[0104]
The system shown in FIG. 17 is configured by connecting one or more postal sorting machines 1701 and one or more place name character string generation rule editing devices 1714 via a network 1718. The postal sorting machine 1701 includes a scanner 1702, a delay line 1703, a sorter 1704, and a place name recognition device 1705. The place name recognition device 1705 includes an input interface 1706, an arithmetic processing device 1707, an output processing device 1708, a memory 1710, a network interface 1711, a hard disk 1712, and a media removable storage device 1713. The thick lines in the figure show the flow of mail.
[0105]
In the system shown in FIG. 17, the place name image information described in the mail piece input from the scanner 1702 is transferred to the place name recognition apparatus 1705. While the postal matter is conveyed through the delay line 1703, the place name recognition device 1705 recognizes the place name described in the postal matter and transfers the recognition result to the sorter 1704. The sorter 1704 sorts the mail according to the recognition result.
[0106]
As a preparation stage for mail sorting, the place name recognition device 1705 reads the place name notation network generation program file from the hard disk 1712 into the memory 1710 and starts it on the arithmetic unit 1707. Under the control of the place name notation network generation program, the place name recognition device 1705 receives place name character string generation rules from the place name character string generation rule editing device 1714 via the network interface 1711, creates a place name notation network file, and stores it on the hard disk 1712.
[0107]
The place name character string generation rules may instead be read from the media removable storage device 1713, such as a floppy disk drive, rather than being input from the place name character string generation rule editing device 1714 via the network.
[0108]
When sorting mail, the place name recognition device 1705 reads the recognition program file and the place name notation network file from the hard disk 1712 into the memory 1710 and executes them on the arithmetic unit 1707. Under the control of the recognition program, the place name recognition device 1705 receives an image from the input interface 1706, recognizes the place name written on the mail piece, and outputs the recognition result via the output interface 1708.
[0109]
As shown in FIG. 18, the place name character string generation rule editing device 1714 is configured by connecting, to a computer 1812, a mouse 1801, a keyboard 1802, a display 1803, a disk device that stores the place name character string generation rule file 1807, a communication device 1810, and a media removable storage device 1811. Editing is performed on the place name character string generation rule file 1807 through the place name character string generation rule editing program 1804 running on the computer 1812. Since the place name character string generation rule file 1807 is a text file, an ordinary text editor can also be used for editing. Further, the place name notation network generation program 1808 can be executed on the computer 1812 to generate the place name notation network 1809 from the place name character string generation rule file 1807.
[0110]
With the function described above, the place name character string generation rule editing device 1714 can confirm whether the generation rule being edited is grammatically correct. It can also execute the character string collation program 1805, which is equivalent to the character string matching 1404 in the recognition process, to check whether a test character string input from the keyboard 1802 is accepted.
[0111]
Further, the computer 1812 executes the place name notation network display program 1806, which displays the place name notation network 1809 in a format such as that shown in FIG. 3, so that the operator can visually confirm the editing result. The edited place name character string generation rule file 1807 is transferred to the place name recognition device 1705 via the communication device 1810; alternatively, it may be copied to a removable storage medium such as a floppy disk by the media removable storage device 1811 and transported to the postal sorting machine 1701 on that medium.
[0112]
FIG. 19 is a block diagram showing a configuration of another embodiment of the present invention, and FIG. 20 is a diagram for explaining an example of a screen displayed on the display. This example is a place name record device that searches for information on a place name from a character string representing the place name, using the place name character string notation method and place name collation method according to the present invention. In FIG. 19, 1901 is a mouse, 1902 is a keyboard, 1903 is a display, 1904 is a printer, 1905 is an input file, 1906 is an output file, 1907 is a place name record program, 1908 is a place name additional information file, 1909 is a place name character string generation rule file, 1910 is a communication module, 1911 is an interface module, 1912 is place name list data, 1913 is a place name list sort module, 1914 is a place name information search module, 1915 is a place name list generation module, 1916 is a character string matching module, 1917 is a place name notation expansion module, 1918 is a place name notation network generation program, and 1919 is place name notation network data.
[0113]
The apparatus shown in FIG. 19 provides the following service.
(1) Display or print the standard form of the place name character string input from the keyboard.
(2) Display or print a different notation of the place name character string input from the keyboard.
(3) Display or print the local information (such as a zip code) corresponding to the place name character string input from the keyboard.
(4) The place name character string input from the file is converted into information specific to the corresponding area, such as a standard form or a postal code, and output to the file.
(5) The place name character string input from the network is converted into information specific to the corresponding area, such as a standard form or a postal code, and output to the network.
In the above description, the standard form is a formal character string representing a certain area defined by administrative division, for example.
[0114]
In the embodiment shown in FIG. 19, a mouse 1901, a keyboard 1902, a display 1903, a printer 1904, an input file 1905, an output file 1906, a place name additional information file 1908, and a place name character string generation rule file 1909 are connected to a place name record program 1907 executed on a computer. Display and input/output are performed via the interface module 1911. When a character string to be searched is input, the place name information search module 1914 calls the character string matching module 1916. The character string matching module 1916 performs processing equivalent to the character string matching process 1404 in FIG. 14: it refers to the place name notation network data 1919, generated from the place name character string generation rule file 1909 by the place name notation network generation program 1918, and determines the identifier of the place name notation to which the input character string corresponds.
[0115]
The place name information search module 1914 uses the obtained identifier as a key to search the place name additional information file 1908 for the standard form and additional information such as a zip code. The place name notation expansion module 1917 lists all possible different notations from the place name notation network data 1919. The resulting different notation group is stored in the place name list data 1912 and is output via the interface module 1911 as necessary. The place name list sorting module 1913 rearranges the order of the different notation group according to the operator's instructions and outputs it. Input for such processing may be performed via any of the keyboard 1902, the input file 1905, and the communication module 1910. Output may be performed via any of the display 1903, the output file 1906, and the communication module 1910.
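Listing every notation held in the network amounts to enumerating all paths from the root to an end node. The following Python sketch shows one way this could be done; it is an assumption-laden illustration, not the actual expansion module 1917. The `Node` fields `is_end` and `is_null`, the generator `expand`, and the toy network of Latin letters are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    c: Optional[str] = None     # character of this node
    d: Optional["Node"] = None  # first child
    b: Optional["Node"] = None  # next sibling
    is_end: bool = False        # assumed flag: character string end node
    is_null: bool = False       # assumed flag: NULL-transition node


def expand(p, prefix=""):
    """Yield every string spelled by a path from node p (or a sibling) to an end node."""
    while p is not None:                         # walk the sibling chain
        if p.is_end:
            yield prefix
        elif p.is_null:
            yield from expand(p.d, prefix)       # NULL transition adds no character
        else:
            yield from expand(p.d, prefix + p.c)
        p = p.b


# Toy network: "ab", then "x", "y", or nothing, then "c".
end = Node(is_end=True)
c_node = Node(c="c", d=end)
eps = Node(is_null=True, d=c_node)
y = Node(c="y", d=c_node, b=eps)
x = Node(c="x", d=c_node, b=y)
root = Node(c="a", d=Node(c="b", d=x))

notations = list(expand(root))
sorted_notations = sorted(notations)  # a simple stand-in for the sort module
```

The enumeration visits branches in sibling order, so a separate sort step is needed whenever the operator wants a different ordering.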
[0116]
FIG. 20 shows examples of screens displayed on the display 1903 in the embodiment of FIG. 19. FIG. 20A is an example of the screen displayed when the operator inputs the character string “Ogaya, Kawagoe City” and executes a search. The character string is entered in the field 2005, and the search is executed by clicking the button 2006 with the mouse. The character strings found to correspond to the input character string as a result of the search are displayed in the window 2007. The item “standard” in each line indicates whether or not the character string is in the standard form. The item “place name” displays the character string. The item “zip code” displays the zip code corresponding to the character string, although other additional information on the area may be displayed instead.
[0117]
The frames labeled “standard”, “place name”, and “zip code” arranged in the area 2004 are buttons; clicking each button with the mouse instructs that the lines be rearranged based on that item. A window 2008 is used to specify search options. Here, the operator specifies whether to display only the standard form, to display a group of different notations based on aza, oaza, and the like (section names within a municipality), or to display a group of different notations based on a common name (such as “** complex”). The button 2002 instructs printing of the display contents, and the button 2001 switches to a mode in which input and output are performed through files instead of the keyboard and the display. The button 2003 instructs the end of the program.
[0118]
The window 2009 opened in FIG. 20B displays detailed information on a place name obtained as a result of collation, such as its reading, its koaza (small section name), and its zip code. The window 2009 is opened by clicking a search result displayed in the window 2007 with the mouse.
[0119]
The place name character string written by the notation method according to the embodiment of the present invention can be stored and provided as a place name dictionary in a storage medium such as FD, MO, or DVD.
[0120]
[Effect of the invention]
As described above, according to the present invention, a place name dictionary that covers all possible place name character strings can be created with a small number of people even when there are many different forms of place names. In addition, it is possible to easily create a network-type place name dictionary capable of high-speed collation processing.
[Brief description of the drawings]
FIG. 1 is a flowchart for explaining a processing example of place name character string recognition according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a notation example of a place name expressed by an edited place name character string generation rule and an example in which different notations are listed without using a generation rule.
FIG. 3 is a diagram schematically showing a place name notation network created from an example of generation rules.
FIG. 4 is a diagram for explaining a data format when a place name notation network is mounted on a computer.
FIG. 5 is a flowchart for explaining processing for generating a place name notation network from place name character string generation rules.
FIG. 6 is a diagram illustrating an example of a generated syntax tree.
FIG. 7 is a flowchart for explaining a processing operation by a function proc for generating a place name notation network from a place name notation generation rule.
FIG. 8 is a diagram (part 1) illustrating a process in which a place name notation network is generated by a function proc.
FIG. 9 is a diagram (part 2) illustrating a process of generating a place name notation network by a function proc.
FIG. 10 is a diagram illustrating a place name notation network group generated from a place name notation generation rule.
FIG. 11 is a flowchart illustrating a processing procedure for generating a place name notation network using a conventional technique.
FIG. 12 is a diagram illustrating an example of a generation process of a place name notation network generated by a conventional technique.
FIG. 13 is a diagram showing an example of a place name notation network generated by a conventional technique.
FIG. 14 is a flowchart for explaining a processing operation in the place name recognition process shown in FIG. 1.
FIG. 15 is a flowchart for explaining a processing operation in the character string matching process shown in FIG. 14.
FIG. 16 is a flowchart illustrating the processing operation of the function srch.
FIG. 17 is a block diagram illustrating a configuration example of a system to which place name character string recognition processing according to an embodiment of the present invention is applied.
FIG. 18 is a block diagram showing a configuration of a place name character string generation rule editing device.
FIG. 19 is a block diagram showing a configuration of another embodiment of the present invention.
FIG. 20 is a diagram illustrating an example of a screen displayed on the display.
[Explanation of symbols]
101 Place name string generation rule editing process
102 Place name string generation rule file
103 Place name notation network generation processing
104 Place name recognition processing
1404 Character string matching process
1701 Postal sorting machine
1702 Scanner
1703 Delay line
1704 Sorter
1705 Place name recognition device
1706 Input interface
1707 Arithmetic processing unit
1708 Output processing device
1710 Memory
1711 Network interface
1712 Hard disk
1713 Media removable storage device
1714 Place name string generation rule editing device
1718 Network
1801 Mouse
1802 Keyboard
1803 Display
1804 Place name character string generation rule editing program
1805 String matching program
1806 Place name notation network display program
1807 Place name string generation rule file
1808 Place name notation network generation program
1809 Place name notation network data
1810 Communication device
1811 Media removable storage device
1812 Computer
1901 Mouse
1902 Keyboard
1903 Display
1904 Printer
1905 Input file
1906 Output file
1907 Place name record program
1908 Place name additional information file
1909 Place name string generation rule file
1910 Communication module
1911 Interface module
1912 Place name list data
1913 Place name list sorting module
1914 Place name information search module
1915 Place name list generation module
1916 Character string matching module
1917 Place name notation expansion module
1918 Place name notation network generation program
1919 Place name notation network data

Claims

A place name character string collating method in a place name recognition apparatus having an input interface for receiving input of a character string, a memory for storing a plurality of place name character strings, an arithmetic processing unit, and an output interface, wherein
the memory stores in advance a place name notation network that is generated based on a plurality of place name character strings, each expressed by defining an array of characters or syntax categories for each partial character string constituting part or all of the place name character string and representing the place name character string by a syntax category consisting of an array of characters or defined syntax categories, the place name notation network representing the connection relations of characters in the plurality of place name character strings by a directed graph, and
the arithmetic processing unit collates whether or not a partial character string of an input character string input from the input interface matches one of the plurality of place name character strings stored in advance in the memory as the place name notation network, thereby collating a place name from the input character string, and outputs the collation result from the output interface.

A place name character string collating apparatus comprising: storage means for storing a place name notation network that is generated, using a directed graph, based on place name character strings each expressed by defining an array of characters or syntax categories for each partial character string constituting part or all of the place name character string and representing the place name character string by a syntax category consisting of an array of characters or predefined syntax categories; input means for inputting a character string; means for collating whether or not the input character string matches a place name character string stored in the storage means as the place name notation network; and means for outputting a result of the collation.

A place name character string recognition apparatus comprising: character reading means for reading characters written on a document using an image obtained by converting the density of the surface of the document into an electric signal; and the place name character string collating apparatus according to claim 2, wherein the input means of the place name character string collating apparatus inputs a character string from the character reading means.

A mail sorting system that uses the place name character string recognition apparatus according to claim 3 to recognize a place name character string in the address on a mail piece, and that sorts the mail piece or prints the recognition result on the mail piece.

A place name character string recognition apparatus that takes as input an image obtained by converting the shading of the surface of a document into an electric signal, reads characters written on the document, and recognizes a place name character string, the apparatus comprising:
means for expressing, using a place name expression method, a set of place name character strings having multiple different notations, that is, character strings that differ from one another but are each expressed by an array of words denoting the same region, the place name expression method defining an array of characters or syntax categories for each partial character string constituting part or all of a place name character string and representing the place name character string by a syntax category consisting of an array of characters or predefined syntax categories; and
means for recognizing a place name character string by collating partial images in the input image to find an arrangement of partial images each similar to a character included in one of the place name character strings expressed by the place name expression method, and by feeding back the collation result to the selection of the arrangement of the partial images.