JP2001014311A

JP2001014311A - Place name representing method, and method and device for place name character string recognition

Info

Publication number: JP2001014311A
Application number: JP11187753A
Authority: JP
Inventors: Masashi Koga; 昌史古賀; Naohiro Furukawa; 直広古川; Shoji Ikeda; 尚司池田; Hisao Ogata; 日佐男緒方; Yutaka Sako; 裕酒匂; Hiromichi Fujisawa; 浩道藤澤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1999-07-01
Filing date: 1999-07-01
Publication date: 2001-01-19
Anticipated expiration: 2019-07-01
Also published as: JP3709305B2; CN100424676C; KR20010015113A; KR100692327B1; CN1287317A

Abstract

PROBLEM TO BE SOLVED: To enable high-speed collation and to create, with little manual effort, a place name dictionary that covers all place name character strings even when place names have many variant notations. SOLUTION: Variant notations of a place name are represented by extending BNF notation, known as one way of writing the generation rules of a context-free grammar, to suit place name notation. With such generation rules, a typical variant pattern, e.g. small 「ヶ」 and large 「ケ」 in katakana and 「が」 in hiragana, is defined as one syntax category, so that the set of variant notations of a place name can be represented concisely. The selection symbol adopted in BNF notation makes the representation simpler still. That is, the variant notations of a place name are represented by generation rules (102), and recognition (104) is performed using a network obtained from the generation rules. Consequently, a dictionary containing a diverse set of variant notations can easily be created without omission.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method for representing a group of place names and to a method and apparatus for recognizing place name character strings, and in particular to such methods and apparatus suitable for application to the place name character string storage means and collation means used in an apparatus that reads place names written on documents.

[0002]

[Prior Art] In general, a character recognition device that reads from an image a character string composed of a sequence of place name words such as a prefecture name, a municipality name, and an aza (district) name (hereinafter referred to as a place name character string) has three functions: (1) extracting character patterns (character segmentation), (2) identifying the character type (character code) of each character pattern (character classification), and (3) collating the classification results against prestored sequences of place name words (character string collation).

[0003] As prior art on character string collation, the method of Marukawa et al. (Transactions of the Information Processing Society of Japan, Vol. 35, No. 6, "Error Correction Algorithm for Handwritten Kanji Address Recognition") is known, among others. As prior art on methods that integrate character segmentation, recognition, and collation, a method based on hidden Markov models (O. E. Agazzi et al., "Connected and Degraded Text Recognition Using Planar Hidden Markov Models," Proceedings of International Conference on Acoustics, Speech, and Signal Processing) and a method that recognizes character strings by search (Koga et al., "Lexical Search Approach for Character-String Recognition," Third International Association for Pattern Recognition Workshop on Document Analysis Systems, 1998) are known.

[0004] For character string collation, the prior art described above requires a means for storing in advance the place name character strings that may appear, that is, a place name character string dictionary. There are three kinds of place name character string dictionary:
(1) A "dictionary source file" stored in a file. This is, for example, the "place name notation rule file" described later, and it must be editable so that it can be newly created and revised.
(2) A "dictionary table" stored in memory. This is, for example, the "place name notation network" described later: the contents of the dictionary file expanded in memory in a form suited to collation.
(3) A "dictionary binary file" at an intermediate stage between (1) and (2). To make expansion into memory easier, the result of performing part of the expansion in advance is stored in a file.

[0005] The format of the dictionary source file used in the prior art is in many cases not disclosed. However, every one of these techniques presupposes that all place name character strings that may appear are stored in the dictionary table in advance, so the dictionary source file is presumably a text file that exhaustively lists every such string.

[0006]

[Problems to Be Solved by the Invention] The prior art described above requires a dictionary for character string collation. In Japanese, however, there are many variant notations in which the same area is written with different character strings, so it is difficult to register every place name character string that may appear, and creating a complete dictionary by hand is practically impossible.

[0007] Variant notations of Japanese place names arise from differences in the characters used, from omitted words, from additional character strings, and from street-name notation. Examples of each follow.

[0008] (1) Variants due to differences in the characters used. Examples are 「小沢」 and 「小澤」 (both read Ozawa), and 「市ヶ谷」, 「市ケ谷」, and 「市が谷」 (all read Ichigaya).

[0009] (2) Variants due to omitted words. These include notations that omit the prefecture name and notations that omit 「大字」 (oaza) or 「字」 (aza). Omission of the prefecture name is common in mail addresses, for example 「埼玉県川越市大字小ヶ谷」 versus 「川越市大字小ヶ谷」. An example of omitting 「大字」 or 「字」 is 「埼玉県川越市大字小ヶ谷」 versus 「埼玉県川越市小ヶ谷」.

[0010] (3) Variants due to additional character strings. Here a string that is not needed to identify the address, such as a koaza (sub-district) name, is appended, for example 「埼玉県川越市大字小ヶ谷」 versus 「埼玉県川越市大字小ヶ谷字東関」.

[0011] (4) Variants due to street-name notation. These are common in Kyoto and similar cities, for example the town-name form 「京都市下京区大政所町」 versus the street-based form 「烏丸通仏光寺下る」 (Karasuma-dori, south of Bukkoji).

[0012] As described above, place names have various kinds of variant notation. Taking the place name 「埼玉県川越市小ヶ谷」 (Ogaya, Kawagoe-shi, Saitama) as an example and examining its variants shows that there are a great many, as listed below:
「埼玉県川越市小ヶ谷」「埼玉県川越市小ケ谷」「埼玉県川越市小が谷」
「埼玉県川越市大字小ヶ谷」「埼玉県川越市大字小ケ谷」「埼玉県川越市大字小が谷」
「川越市小ヶ谷」「川越市小ケ谷」「川越市小が谷」
「川越市大字小ヶ谷」「川越市大字小ケ谷」「川越市大字小が谷」

[0013] In this example, koaza names may additionally be used, as in 「埼玉県川越市小ヶ谷東田」, 「埼玉県川越市小ヶ谷東関」, and 「埼玉県川越市小ヶ谷西関」. Taking their combinations with the 12 variant notations listed above into account, there are 84 variant notations in total.
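The figure of 84 can be checked mechanically. The Python sketch below rests on an assumption that the paragraph does not spell out: a koaza name is optional, and when present it may be written with or without a preceding 「字」, which gives 1 + 3 × 2 = 7 possible endings for each of the 12 base notations. The variable names are illustrative only.

```python
from itertools import product

# Components of the 12 base notations listed above.
prefectures = ["埼玉県", ""]        # the prefecture name may be omitted
oaza_marks = ["", "大字"]           # "大字" may be omitted
ke_variants = ["ヶ", "ケ", "が"]     # three ways of writing the same syllable

base = [f"{pref}川越市{oaza}小{ke}谷"
        for pref, oaza, ke in product(prefectures, oaza_marks, ke_variants)]

# Assumption: the koaza name is optional, and "字" before it is optional too.
koaza = ["東田", "東関", "西関"]
endings = [""] + [mark + name for mark, name in product(["", "字"], koaza)]

variants = {b + e for b, e in product(base, endings)}
print(len(base), len(endings), len(variants))
```

Under this assumption the enumeration yields 12 base notations, 7 endings, and 84 distinct strings for a single locality.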

[0014] In the prior art, every combination of such variants has to be registered in the dictionary file exhaustively and by hand, so creating the dictionary file takes a great deal of labor. Moreover, for a city such as Kyoto, where variants are especially numerous, the ways of writing the town names and street names in the city add up to several hundred thousand, and creating a complete dictionary by hand was practically impossible.

[0015] A first object of the present invention is to solve the above problem and to provide a place name notation method that makes it easy to register the full range of variant notations in a character string collation dictionary without omission.

[0016] As noted above, when place names have many variant notations, even if all of them could be entered in the dictionary, the prior art would need a large dictionary storage capacity, and the processing time would grow with the number of variants.

[0017] One known technique that addresses this problem uses a data format called a trie to reduce the dictionary's storage capacity and speed up collation (Koga et al., "Lexical Search Approach for Character-String Recognition," Third International Association for Pattern Recognition Workshop on Document Analysis Systems, 1998). This technique represents place names as tree-structured data that branches only where the notations differ, and it makes it easy to generate a trie automatically from a set of character strings.

[0018] For example, from the three notations 「埼玉県川越市小ヶ谷東田」, 「埼玉県川越市小ヶ谷東関」, and 「埼玉県川越市小ヶ谷西関」, this technique can easily generate a trie such as the following. Hereinafter, a network that represents the connection relationships between the characters of place name character strings in this way is called a place name notation network.

[0019] However, in a trie-type place name notation network, strings that differ in only one part must still be treated as entirely separate strings and given separate branches. As a result, the trie for the variant group of 「埼玉県川越市小ヶ谷」, for example, becomes as large as the one shown below.

[0020] (Remainder omitted.)
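The contrast between the two cases can be made concrete with a small Python sketch. It is only an illustration of a plain character trie, not the data structure of the cited paper; `build_trie` and `count_nodes` are hypothetical helper names.

```python
def build_trie(strings):
    """Build a character trie as nested dicts; "$" marks the end of a string."""
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def count_nodes(node):
    """Count character nodes (the root and the end markers are not counted)."""
    return sum(1 + count_nodes(child)
               for key, child in node.items() if key != "$")

# Three strings that differ only at the end: the common prefix is stored once.
koaza_forms = ["埼玉県川越市小ヶ谷東田", "埼玉県川越市小ヶ谷東関",
               "埼玉県川越市小ヶ谷西関"]

# Twelve variants that differ at the start and in the middle.
twelve_forms = [pref + "川越市" + oaza + "小" + ke + "谷"
                for pref in ("埼玉県", "")
                for oaza in ("", "大字")
                for ke in ("ヶ", "ケ", "が")]

print(count_nodes(build_trie(koaza_forms)))
print(count_nodes(build_trie(twelve_forms)))
```

The first trie needs 14 nodes for 33 input characters, because only the endings branch. The second needs 41 nodes, although the twelve variants are assembled from building blocks totalling just 13 characters: an optional prefix or a difference in mid-string forces the whole remainder to be duplicated on every branch.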

[0021] As described above, attempting to represent variant notations with a trie-type place name notation network greatly increases both the dictionary size and the processing time.

[0022] Accordingly, an object of the present invention is to provide a place name representation method, and a place name character string recognition method and apparatus, having a storage format that requires little storage and allows fast collation, for use in a place name dictionary for recognizing diverse variant notations.

[0023]

[Means for Solving the Problems] According to the present invention, the above object is achieved by a place name representation method for describing a set of place name character strings in which a place name denoting an area has a plurality of variant notations, each a sequence of words that differs as a character string but denotes the same area, wherein a sequence of characters or syntax categories is defined for each partial character string forming part or all of a place name character string, and the place name character string is represented by a syntax category consisting of a sequence of characters or already-defined syntax categories.

[0024] The object is also achieved by representing the place name character string using a substitution symbol, which indicates which sequence of characters or syntax categories a syntax category is replaced by, and a place name symbol, which indicates that a syntax category denotes a specific area.

[0025] The object is also achieved by collating a place name from within an input character string by determining whether a partial character string of the input matches one of the place name character strings given in advance, those strings being represented by syntax categories each consisting of a sequence of characters or already-defined syntax categories, with a sequence of characters or syntax categories defined for each partial character string forming part or all of a place name character string.

[0026] The object is also achieved by providing: storage means for storing place name character strings represented by syntax categories each consisting of a sequence of characters or already-defined syntax categories, with a sequence of characters or syntax categories defined for each partial character string forming part or all of a place name character string; input means for inputting a character string; means for collating whether the input character string is one of the place name character strings stored in the storage means; and means for outputting the collation result.

[0027] The object is also achieved by further providing character reading means that takes as input an image obtained by converting the light and dark of a document surface into an electrical signal and reads the characters written on the document, the input means receiving the character string from the character reading means.

[0028] Specifically, to achieve the above object, the present invention represents the variant notations of a place name using the generation rules of a context-free grammar. In a context-free grammar, generation rules state which sequence of other syntax categories an element of a sentence (a syntax category) is replaced by ("Introduction to Natural Language Processing," Kindai Kagaku-sha, ISBN4-7649-0143-9). The present invention uses an extended BNF notation, obtained by extending BNF (Backus-Naur Form) notation, which is known as one way of writing generation rules (Nakata, "Compiler," ISBN4-7828-5057-3), so that it is suited to representing place names.

[0029] With such generation rules, a typical variant pattern, for example 「ヶ」, 「ケ」, and 「が」, can be defined as a single syntax category, so the set of variant notations of a place name can be expressed concisely. Using the selection symbol adopted in BNF notation makes the expression more concise still. The present invention therefore makes it easy to create a dictionary that lists a diverse set of variant notations without omission.

[0030] BNF notation writes the generation rules of a context-free grammar using symbols for substitution, option, selection, and so on. It uses the following symbols:
::＝  Substitution. The syntax category on the left side can be replaced by the sequence of syntax categories or characters on the right side.
［］  Option. The content inside ［］ may be present or absent.
｜  Selection. Either the item on the right or the item on the left.

[0031] As an example, the generation rules for the variant notations of 「埼玉県川越市小ヶ谷」 discussed above are written in BNF notation as follows:
＜ｗヶ＞ ::＝ ヶ｜ケ｜が
＜埼玉県川越市小ヶ谷＞ ::＝ ［埼玉県］川越市［大字］小＜ｗヶ＞谷［［字］東田｜東関｜西関］

[0032] Using this notation also makes it possible to reduce the size of the place name notation network. In the notation above, differences between partial character strings are expressed explicitly with the symbols ［］ and ｜. Wherever variants can differ in a partial character string, a path that bypasses that part can therefore easily be set up in the network. For example, the BNF rules above can be converted into a compact network such as the one shown below. Generating such a compact network from a plain enumeration of character strings, as in the prior art, was difficult.
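A minimal Python sketch of the bypass idea follows. It assumes a simplified rule made of consecutive slots, each with a list of alternatives and an "optional" flag; this slot model and the names `build_network` and `count_paths` are illustrative and are not taken from the patent.

```python
def build_network(slots):
    """Turn a list of (alternatives, optional) slots into labelled edges.

    Each edge is (from_node, to_node, text); text None is a NULL bypass.
    Returns the edge list and the number of the final node.
    """
    edges = []
    node = 0
    for alternatives, optional in slots:
        for text in alternatives:          # "|" becomes parallel edges
            edges.append((node, node + 1, text))
        if optional:                       # "[]" becomes a bypass edge
            edges.append((node, node + 1, None))
        node += 1
    return edges, node

def count_paths(edges, final):
    """Count start-to-end paths; edges must be ordered by source node."""
    ways = [1] + [0] * final
    for src, dst, _ in edges:
        ways[dst] += ways[src]
    return ways[final]

slots = [(["埼玉県"], True), (["川越市"], False), (["大字"], True),
         (["小"], False), (["ヶ", "ケ", "が"], False), (["谷"], False)]
edges, final = build_network(slots)
print(len(edges), count_paths(edges, final))
```

Ten edges are enough to describe all twelve base notations, because each optional or alternative part is stored once and shared by every path through the network.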

[0033]

[0034]

[Embodiments of the Invention] Embodiments of the place name notation method and the place name character string recognition method according to the present invention are described in detail below with reference to the drawings.

[0035] FIG. 1 is a flowchart illustrating an example of place name character string recognition processing according to an embodiment of the present invention; this flow is described first. The flowcharts used in the following description are drawn in Gane-Sarson notation, which is described in J. Martin, "Software Structuring Techniques," Kindai Kagaku-sha, ISBN4-7649-0124-2 C3050 P5562E.

[0036] (1) First, before place names are recognized, the place name character string generation rule editing process (step 101) creates generation rules for place name character strings from examples of variant notations and stores them in the place name character string generation rule file 102. The editing process of step 101 can be carried out as human editing work on a computer.

[0037] (2) Next, the place name notation network generation process (step 103) reads the place name character string generation rule file 102 and generates the place name notation network, which is the dictionary used for place name recognition 104. The network generation process of step 103 can be implemented as a program on a computer.

[0038] (3) Next, the place name recognition process (step 104) refers to the place name notation network and reads place name character strings from the input image. The place name recognition process of step 104 can be implemented as a program on a computer.

[0039] The place name character string generation rule file 102 uses the "extended BNF notation" of the present invention to represent groups of variant notations of place names as generation rules of a context-free grammar. Extended BNF notation adds grouping and other symbols to BNF notation and uses the symbols described below:
::＝  Substitution. The syntax category on the left side can be replaced by the sequence of syntax categories or characters on the right side.
［］  Option. The content inside ［］ may be omitted.
｜  Selection. Either the item to the right or the item to the left of this symbol.
（）  Grouping. The content inside the parentheses is evaluated before the variables on either side.
＜Ｗ string＞  Syntax category.
＜Ｎ number＞  Syntax category representing the group of variant notations of a place name character string that denotes a specific area. The number is the identifier of the place name; an integer greater than 0 is used.

[0040] These symbols are evaluated in the following order of precedence:
(1) definitions of the variable names ＜Ｗ string＞ and ＜Ｎ number＞;
(2) the brackets ［］ and （）; when brackets are nested two or more deep, the inner brackets are evaluated first;
(3) ｜;
(4) ::＝.
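The symbols and their precedence can be illustrated with a small recursive-descent expander in Python. This is a sketch under stated assumptions, not the patent's implementation: it uses ASCII symbols (`::=`, `[]`, `()`, `|`, `<>`) in place of the full-width ones, it expands each rule directly into its set of strings rather than into a network, and the rule names `Wke` and `N1` and the function names are made up for the example. The parentheses in the second rule make 「字」 optional before any of the three koaza names.

```python
def expand(body, categories):
    """Return the set of strings generated by one rule body."""
    pos = 0

    def alternation():                 # lowest precedence inside a body: "|"
        nonlocal pos
        result = sequence()
        while pos < len(body) and body[pos] == "|":
            pos += 1
            result |= sequence()
        return result

    def sequence():                    # concatenation of items
        nonlocal pos
        result = {""}
        while pos < len(body) and body[pos] not in "|])":
            ch = body[pos]
            if ch in "[(":             # brackets bind tighter than "|"
                pos += 1
                inner = alternation()
                pos += 1               # skip the closing bracket
                if ch == "[":
                    inner = inner | {""}   # option: may be absent
            elif ch == "<":            # reference to a defined category
                end = body.index(">", pos)
                inner = categories[body[pos + 1:end]]
                pos = end + 1
            else:                      # ordinary character
                inner = {ch}
                pos += 1
            result = {a + b for a in result for b in inner}
        return result

    return alternation()

def load_rules(lines):
    """Process rules in order; a category must be defined before it is used."""
    categories = {}
    for line in lines:
        head, body = line.split("::=")
        name = head.strip()[1:-1]
        categories[name] = expand(body.replace(" ", ""), categories)
    return categories

rules = load_rules([
    "<Wke> ::= ヶ|ケ|が",
    "<N1> ::= [埼玉県]川越市[大字]小<Wke>谷[[字](東田|東関|西関)]",
])
print(len(rules["Wke"]), len(rules["N1"]))
```

With these two rules the expander produces 3 strings for `Wke` and 84 for `N1`. A production system would compile the rules into a network instead of enumerating strings, since enumeration is exactly the blow-up the notation is meant to avoid.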

[0041] FIG. 2 shows an example of place names written with the place name character string generation rules edited in step 101, together with an example that lists the variant notations without using generation rules.

[0042] The example in FIG. 2(A) uses the extended BNF notation of the present invention to write the variant notations of 「埼玉県川越市大字小ヶ谷」 (with koaza 「東田」, 「東関」, and 「西関」), 「埼玉県川越市大字笠幡」 (with koaza 「久保」 and 「河南」), and 「埼玉県川越市下広谷」. With the symbols introduced by the present invention, place names that have many variant notations can thus be expressed very simply. By contrast, the example in FIG. 2(B) simply enumerates the variants without generation rules; the four lines of FIG. 2(A) generate as many as 106 variant notations, of which FIG. 2(B) shows only a part.

[0043] The place name character string generation rule file 102 is an ordinary text file, so a general-purpose text editor can serve as the means of implementing the editing process of step 101.

[0044] FIG. 3 schematically shows the place name notation network built from the example generation rules of FIG. 2(A); it is described below.

[0045] The place name notation network is a directed graph in which each edge corresponds to a partial character string and each vertex corresponds to a boundary between partial character strings. The direction of each edge follows the order of the characters in the string. In FIG. 3, an edge labeled NULL is a NULL transition, meaning that no character string need be present at that point. The circle 301 with a line at its lower right marks the start position of a place name character string, and the circles 302 to 304 with a diagonal line through the center mark the end positions of place name character strings.
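How such a directed graph is used for collation can be sketched in Python as follows. The edge list is a hypothetical fragment in the spirit of FIG. 3, not a transcription of the figure; the node numbers, the identifier value 1, and the function name `match` are assumptions made for the example.

```python
# Hypothetical edge list: (from_node, to_node, label).
# A label of None stands for a NULL transition.
EDGES = [
    (0, 1, "埼玉県"), (0, 1, None),
    (1, 2, "川越市"),
    (2, 3, "大字"), (2, 3, None),
    (3, 4, "小"),
    (4, 5, "ヶ"), (4, 5, "ケ"), (4, 5, "が"),
    (5, 6, "谷"),
]
START = 0
FINAL = {6: 1}   # end node -> place name identifier (illustrative value)

def match(text, node=START):
    """Return the place identifier if text spells a path to an end node."""
    if text == "" and node in FINAL:
        return FINAL[node]
    for src, dst, label in EDGES:
        if src != node:
            continue
        if label is None:                       # NULL transition: consume nothing
            found = match(text, dst)
        elif text.startswith(label):            # consume the partial string
            found = match(text[len(label):], dst)
        else:
            found = None
        if found is not None:
            return found
    return None

print(match("川越市小ケ谷"), match("川越市小谷"))
```

The first call follows the NULL edges past the prefecture name and 「大字」 and returns the identifier; the second finds no edge for the missing character and returns `None`. The recursion terminates because the graph has no cycles.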

[0046] FIG. 4 illustrates the data format used when the place name notation network is implemented on a computer. In such an implementation, the network is represented in the data format shown in FIG. 4 (the left-child, right-sibling representation; T. Cormen et al., "Introduction to Algorithms," Kindai Kagaku-sha, pp. 201-202). In this format, the connection between successive characters is represented by a child pointer and a branch of the place name notation network by a sibling pointer.

[0047] FIG. 4(A) shows the components of a data record. Each data record consists of three data items: c 401, b 402, and d 403. Data item c is a character code, data item b is a sibling pointer, and data item d is a child pointer. A branch from a data record is represented by sibling pointers, and a character string is represented as a list of records linked by child pointers. For example, expressing the place name notation network of FIG. 3 as a list of such data records gives the structure shown in FIG. 4(B).

【００４８】In the place name notation network expressed in list form in FIG. 4(B), data record 404' (corresponding to the character code "小") branches to data records 404 to 406. Data record 404' is linked to data record 404 by a child pointer, and data records 404, 405, and 406 are linked to one another by sibling pointers. The character string "埼玉県" is represented by data records 407, 408, and 409, which are connected by child pointers. When a data record corresponds to a NULL transition, the NULL character is stored in its character code c 401, which means that the data records branching from the record holding the NULL character may be omitted. In addition, one extra data record, shown as data record 410, is provided after the data record for the last character of a place name character string. A NULL pointer is stored in the child pointer d of data record 410 to mark the end of the network, and the identifier of the place name is stored in its sibling pointer b.
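The record layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `Record`, the helpers `chain`, `is_terminal`, and `is_null_transition`, and the identifier value `"ID-0001"` are hypothetical, and `None` stands in for both the NULL character and the NULL pointer.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(eq=False)
class Record:
    """One data record of FIG. 4(A): character code c, sibling pointer b, child pointer d."""
    c: Optional[str] = None               # character code (None stands for the NULL character)
    b: Union["Record", str, None] = None  # sibling pointer, or the identifier in the final record
    d: Optional["Record"] = None          # child pointer (None stands for the NULL pointer)


def chain(text: str, tail: Record) -> Record:
    """Link one record per character by child pointers, ending at `tail`; return the head."""
    head = tail
    for ch in reversed(text):
        head = Record(c=ch, d=head)
    return head


def is_terminal(r: Record) -> bool:
    """The extra record after the last character: NULL child pointer, identifier in b."""
    return r.c is None and r.d is None


def is_null_transition(r: Record) -> bool:
    """A record holding the NULL character that still leads to another record."""
    return r.c is None and r.d is not None


end = Record(b="ID-0001")      # hypothetical place name identifier
head = chain("埼玉県", end)     # three records connected by child pointers
```

Telling the terminal record apart from a NULL-transition record by its NULL child pointer is one reading of the description; the text does not state how an implementation distinguishes them.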

【００４９】The list-form place name notation network of FIG. 4(B), expressed in the format described above, can be regarded as a graph in which each data record corresponds to a node. Each edge of the place name notation network drawn schematically in FIG. 3 is represented here by as many nodes as the edge has characters.

【００５０】FIG. 5 is a flowchart of the processing in step 103 of FIG. 1, which generates a place name notation network from the place name character string generation rules, and FIG. 6 shows an example of a generated syntax tree. Both are described below.

【００５１】First, the generation rules for the individual place name character strings in the place name notation generation rule file 102 (for example, each line from the second line of FIG. 2(A) onward that begins with ＜Ｎ followed by digits＞) are processed one at a time by control loop 501. For each line, step 502 parses the character string in the line and builds a syntax tree such as the one shown in FIG. 6. Step 503 then generates the terminal node t_i of the place name notation network for that group of variant notations. In what follows, unless otherwise noted, a "node of the place name notation network" means a data record in the format of FIG. 4(A). In t_i, NULL is stored in the character code c, NULL in the child pointer d, and the place name identifier of the group of variant notations in the sibling pointer b. Next, step 504 uses the function proc, described later, to generate the place name notation network for that group of variant notations. After all place name character string generation rules have been processed, step 505 merges the redundant parts of the generated place name notation networks.

【００５２】The syntax tree can be generated from a place name character string generation rule by, for example, the technique for generating a transition network from generation rules described in "Introduction to Natural Language Processing" (Kindai Kagakusha, ISBN4-7649-0143-9, pp. 19-30). The syntax tree in FIG. 6, an example of the output of step 502, is generated from the second line of FIG. 2(A). In FIG. 6, a circle marked "＋" represents concatenation of character strings, a circle marked "［］" represents an option, a circle marked "｜" represents selection, and a square represents a character string. The extended BNF notation also uses the parentheses "（" and "）", but the syntax tree used in this embodiment has no nodes for parentheses; the order of operations that the parentheses determine is reflected in the structure of the tree itself.
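As an illustration of step 502, the following Python sketch parses a rule body into a syntax tree with the four node types named above. It is a plain recursive-descent parser, not the transition-network technique of the cited book, and it rests on assumptions: the rule syntax (ASCII `|`, `[...]`, `(...)`, with every other character taken literally), the tuple encoding of tree nodes, and the sample rules are all hypothetical, and syntax categories such as the one that starts each line of FIG. 2(A) are not handled.

```python
def parse(rule: str):
    """Parse a rule body into ("+", [...]), ("|", [...]), ("[]", child) or ("str", text)."""
    pos = 0

    def peek() -> str:
        return rule[pos] if pos < len(rule) else ""

    def choice():
        nonlocal pos
        alternatives = [sequence()]
        while peek() == "|":
            pos += 1
            alternatives.append(sequence())
        return alternatives[0] if len(alternatives) == 1 else ("|", alternatives)

    def sequence():
        nonlocal pos
        items, text = [], ""
        while peek() not in ("", "|", ")", "]"):
            ch = peek()
            pos += 1
            if ch in "([":
                if text:                         # flush the literal collected so far
                    items.append(("str", text))
                    text = ""
                inner = choice()
                closing = ")" if ch == "(" else "]"
                if peek() != closing:
                    raise ValueError(f"expected {closing!r} at position {pos}")
                pos += 1
                # Parentheses only group, so they leave no node of their own.
                items.append(inner if ch == "(" else ("[]", inner))
            else:
                text += ch
        if text:
            items.append(("str", text))
        if not items:
            raise ValueError(f"empty alternative at position {pos}")
        return items[0] if len(items) == 1 else ("+", items)

    tree = choice()
    if pos != len(rule):
        raise ValueError(f"unexpected {rule[pos]!r} at position {pos}")
    return tree


tree = parse("小(ヶ|ケ|が)谷")
```

For the sample rule, the result is a "+" node whose children are the string "小", a "|" node with three string children, and the string "谷"; the parentheses themselves do not appear in the tree.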

【００５３】The function proc generates a place name notation network from a syntax tree and takes two arguments, p and a. Argument p specifies the value to be stored in the child pointer d of the last node of the network being generated. Argument a is the topmost syntax tree node to be processed. When a node is given as argument a, all nodes below it are processed recursively.

【００５４】FIG. 7 is a flowchart of the processing performed by the function proc, FIGS. 8 and 9 show how the function proc generates a place name notation network, and FIG. 10 shows the group of place name notation networks generated from the place name notation generation rules. These are described below. In FIG. 7, p, q, and r are variables that hold addresses of data records in the format of FIG. 4, and the symbol "->" denotes a data item within a data record. The processing of FIG. 7 is divided into four cases according to the type of syntax tree node a.

【００５５】(1) The type of syntax tree node a is determined to be one of "+", "|", "[]", or "character string" (step 701).

【００５６】(2) If step 701 finds that node a is of type "+", that is, concatenation, the argument p is first copied to the variable q. In other words, the address of the terminal node of the partial network to be generated by this processing is copied (step 702).

【００５７】(3) Next, the child nodes n_i (1 ≤ i ≤ number of child nodes) of the syntax tree node are processed by the function proc() in order from the right, each producing a partial network of the place name notation network. The arguments are passed so that the partial network generated by proc() ends at q. The pointer to the start of the resulting partial network is then assigned to q again, so that it becomes the end point of the next partial network to be generated. By calling proc() repeatedly in this way, the partial networks generated from the child nodes of the "+" node are linked one after another (steps 703 and 704).

【００５８】(4) When all child nodes have been processed, the value of q at that point, that is, the head of the partial network, is returned (step 705).

【００５９】(5) If step 701 finds that node a is of type "|", that is, selection, a partial network is first generated from one of the child nodes, n_1, and the head address of the resulting partial network is assigned to the variable q (step 706).

【００６０】(6) Next, the value of q is assigned to the variable r, and partial networks are generated in turn from the remaining child nodes n_i (2 ≤ i ≤ number of child nodes). The head address of each generated partial network is stored in the sibling pointer b of r. That head address is then assigned to r, and the same processing is repeated (steps 707 to 710).

【００６１】(7) When all child nodes have been processed, q, the head address of the first partial network generated, is returned (step 711).

【００６２】(8) If step 701 finds that node a is of type "[]", that is, an option, the partial network corresponding to the child node of the syntax tree node is first generated, and its head address is stored in the variable q. The parameter is specified so that the generated partial network ends at p (step 712).

【００６３】(9) Next, a node corresponding to the NULL transition is generated with the function newNd(), and its address is stored in the sibling pointer b of q. newNd() allocates storage for one new data record in the format of FIG. 4; the data item b of the allocated record is set to the NULL pointer (step 713).

【００６４】(10) Next, NULL is assigned to the character code c of the node corresponding to the NULL transition, and p is set in its child pointer d (steps 714 and 715).

【００６５】(11) Finally, the head address q of the generated partial network is returned (step 716).

【００６６】(12) If step 701 finds that node a is a character string, the value of p is first assigned to the variable q (step 717).

【００６７】(13) Next, the following processing is repeated for each character C_i (1 ≤ i ≤ length of the character string), working from the end of the string, so that one node is generated per character. First, storage for one node is allocated with the function newNd(). Next, C_i is assigned to the character code c of the new node. Next, the value of q is assigned to the child pointer d of the new node. Finally, the value of q is replaced with the address of the new node (steps 718 to 722).

【００６８】(14) After this processing has been performed for every character C_i, the address q of the newly generated partial network is returned (step 723).
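Steps (1) to (14) can be sketched in Python as below. This is a minimal reading of the description, not the patented implementation. The tuple encoding of syntax tree nodes, the names `Node`, `last_sibling`, and `spellings`, and the sample tree are assumptions. One deliberate addition is `last_sibling`: the text stores the new head directly in the sibling pointer b, and the sketch first walks to the end of any existing sibling chain so that nested selections or options do not overwrite a link.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(eq=False)
class Node:
    c: Optional[str] = None             # character code; None stands for NULL
    b: Union["Node", str, None] = None  # sibling pointer (identifier at the terminal node)
    d: Optional["Node"] = None          # child pointer; None stands for NULL


def last_sibling(n: Node) -> Node:
    """Guard added for this sketch: follow b to the end of the sibling chain."""
    while isinstance(n.b, Node):
        n = n.b
    return n


def proc(p: Node, a: tuple) -> Node:
    """Build the partial network for syntax tree node `a` so that it ends at `p`; return its head."""
    kind, body = a
    if kind == "+":                      # concatenation, children from the right (steps 702-705)
        q = p
        for child in reversed(body):
            q = proc(q, child)
        return q
    if kind == "|":                      # selection, heads chained by sibling pointers (706-711)
        q = r = proc(p, body[0])
        for child in body[1:]:
            head = proc(p, child)
            last_sibling(r).b = head
            r = head
        return q
    if kind == "[]":                     # option, plus a NULL-transition sibling (712-716)
        q = proc(p, body)
        last_sibling(q).b = Node(c=None, d=p)
        return q
    if kind == "str":                    # character string, one node per character (717-723)
        q = p
        for ch in reversed(body):
            q = Node(c=ch, d=q)
        return q
    raise ValueError(f"unknown syntax tree node type: {kind!r}")


def spellings(p: Node, prefix: str = "") -> set:
    """Checking aid only: every string spelled by a path from p to the terminal node."""
    if p.c is None and p.d is None:      # terminal node
        return {prefix}
    found = spellings(p.d, prefix + (p.c or ""))   # a NULL transition adds no character
    if isinstance(p.b, Node):
        found |= spellings(p.b, prefix)
    return found


tree = ("+", [("str", "小"), ("|", [("str", "ヶ"), ("str", "ケ"), ("str", "が")]), ("str", "谷")])
terminal = Node(b="3501104")             # as in step 503: c and d stay NULL, b holds the identifier
root = proc(terminal, tree)
```

Because every alternative of the "|" node is generated with the same end point p, the three variant characters all point to a single node for "谷"; this sharing is what keeps the network from growing into a tree. The sketch assumes non-empty character strings.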

【００６９】In FIGS. 8 and 9, which show how the function proc generates a place name notation network, 801 is the state after step 503 of FIG. 5 has generated the terminal node and stored the identifier "3501104" in it. The processing of the steps in FIG. 7 then builds up the place name notation network as shown from top to bottom in FIGS. 8 and 9. When node 603 of the syntax tree in FIG. 6 is processed by the function proc, the function proc first generates the partial network corresponding to node 602, as shown at 802. Next, the function proc generates the partial network corresponding to node 604. In this case p holds the address of node 804, and the generated partial network is connected to node 804, as shown at 803.

【００７０】Control loop 501 of FIG. 5 generates a separate place name notation network for each place name character string generation rule. As a result, the place name notation generation rules of FIG. 2 yield the group of place name notation networks shown in FIG. 10. Step 505 then merges the redundant parts of these networks, for example the part for "埼玉県川越市", producing the place name notation network described with reference to FIG. 3.

【００７１】FIG. 11 is a flowchart of a procedure for generating a place name notation network with the conventional technique, FIG. 12 shows an example of the generation process of such a network, and FIG. 13 shows an example of a place name notation network generated with the conventional technique. With reference to these figures, a method of generating a place name notation network without generation rules is described below.

【００７２】The conventional technique is described here to show that the conventional notation for place name character strings can yield only a tree-structured place name notation network, known as a trie, and that the place name notation network generated from the notation of the present invention is superior in both storage capacity and the processing time required for collation. The conventional way of expressing place name notations is a plain list of place name character strings, as shown in FIG. 2(B), and the flow of FIG. 11 is a procedure for generating a place name notation network from such a list of words. Here, the k-th character string in FIG. 2(B) is denoted S_k, its length L_k, and the i-th character of each character string C_i. An identifier corresponding to each character string is assumed to be stored separately. The generated place name notation network is realized in the data format shown in FIG. 4.

【００７３】(1) First, a node rr that serves as a temporary root of the place name notation network is generated. NULL is set in the child pointer d of this node (steps 1101 and 1102).

【００７４】(2) Loop 1103 processes all character strings S_k one at a time.

【００７５】(3) First, the address of the root is assigned to the variable p. Next, the subroutine SrchNxt is called for each character in the character string. SrchNxt determines whether a node corresponding to the character has already been generated and, if not, adds a new node; the procedure is described later (steps 1104 to 1106).

【００７６】(4) When SrchNxt has processed all the characters in the character string, a new child node is generated with the function newNd(), the identifier of the character string is stored in its pointer b field, and the address of this new child node is assigned to the child pointer d of p. The child node of rr at the time loop 1103 ends becomes the root of the place name notation network (steps 1107 to 1110).

【００７７】Next, the processing of the subroutine SrchNxt is described.

【００７８】(1) First, the value of the child pointer d of p is assigned to the variable q. A loop then scans all child nodes of p with q and checks whether the character code, data item c, of each is equal to C_i. If it is, a node corresponding to C_i is taken to exist already, the pointer p is advanced to that node q, and the subroutine ends (steps 1111 and 1113 to 1115, loop 1112).

【００７９】(2) If the check in step 1113 finds that data item c is not equal to C_i, the value of the sibling pointer of q is assigned to q, and the loop repeats until q becomes NULL (step 1116).

【００８０】(3) If no node corresponding to C_i has been found when the loop ends, a new node is generated with the function newNd(). C_i is assigned to its character code c, NULL to its child pointer d, and the value of the child pointer d of p to its sibling pointer b. The address of the new node is then assigned to the child pointer d of p, the pointer p is set to the address of the new child node, and the subroutine ends (steps 1117 to 1122).
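The conventional procedure of FIG. 11, together with the subroutine SrchNxt, can be sketched as follows. The names `Node`, `srch_nxt`, `build_trie`, and `count_nodes` and the identifiers `"id-1"` to `"id-3"` are hypothetical, and the sketch assumes that no input string is a prefix of another, since the description does not say how that case is handled.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(eq=False)
class Node:
    c: Optional[str] = None             # character code; None stands for NULL
    b: Union["Node", str, None] = None  # sibling pointer (identifier at the end of a string)
    d: Optional["Node"] = None          # child pointer; None stands for NULL


def srch_nxt(p: Node, ch: str) -> Node:
    """Return the child of p that holds `ch`, creating it if it does not exist yet."""
    q = p.d
    while isinstance(q, Node):          # scan the children of p along sibling pointers
        if q.c == ch:
            return q                    # a node for this character already exists
        q = q.b
    new = Node(c=ch, d=None, b=p.d)     # the old first child becomes the sibling of the new node
    p.d = new
    return new


def build_trie(entries) -> Optional[Node]:
    """entries: iterable of (character string, identifier) pairs."""
    rr = Node()                         # temporary root
    for text, ident in entries:
        p = rr
        for ch in text:
            p = srch_nxt(p, ch)
        p.d = Node(b=ident)             # extra node that carries the identifier
    return rr.d                         # the child of rr is the root of the network


def count_nodes(n) -> int:
    """Count the nodes reachable from n (valid because the result is a tree)."""
    if not isinstance(n, Node):
        return 0
    return 1 + count_nodes(n.d) + count_nodes(n.b)


root = build_trie([("川越市小ヶ谷", "id-1"), ("川越市笠幡", "id-2"), ("川越市下広谷", "id-3")])
```

With these three strings the shared prefix "川越市" is stored once, and "小", "笠", and "下" end up as siblings under "市", with the most recently added character first in the sibling chain.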

【００８１】FIG. 12 shows how a place name notation network is generated by the procedure of FIG. 11. The example is the processing of the first three lines of FIG. 2(B). First, the place name notation network for "川越市小ヶ谷" is generated (1201). Next, "川越市笠幡" is processed; the "川越市" part was already generated in 1201, so no new nodes are generated for it. However, when the pointer p reaches the position shown at 1202 and the character "笠" is processed, no child node of "市" corresponds to "笠". A new node for "笠" is therefore generated as a sibling of "小". The node for the remaining character "幡" is then linked as a child of the newly generated node (1203). "川越市下広谷" is processed in the same way: a node for "下" is generated as a sibling of "小" and "笠" (1204), and the nodes for the following characters are linked to it (1205).

【００８２】FIG. 13 schematically shows part of the place name notation network generated from the group of variant notations listed in FIG. 2(B). Unlike the network of FIG. 3, a place name notation network generated from the conventional notation has the form of a tree, that is, a form in which paths never rejoin once they have branched. This data representation is known as a trie. A comparison with FIG. 3 shows that it contains many redundant parts. For example, the partial network corresponding to "東田", "東関", and "西関" is repeated six times. This increases the required storage capacity. On a computer with a hierarchical memory architecture, a larger memory space to be accessed slows access through cache misses and the like, which in turn slows the character string collation processing described later.
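The growth in node count can be illustrated with a small calculation. The sketch below compares the number of character nodes in a trie built from fully expanded variant strings with the number needed when variant segments rejoin. The segment encoding, the function names, and the second example (with the hypothetical suffix "一丁目") are assumptions made for illustration; terminal nodes are left out of both counts.

```python
from itertools import product


def expand(segments):
    """All spellings obtained by picking one alternative from every segment."""
    return ["".join(parts) for parts in product(*segments)]


def trie_nodes(strings) -> int:
    """A trie has one character node per distinct non-empty prefix."""
    return len({s[:i] for s in strings for i in range(1, len(s) + 1)})


def shared_network_nodes(segments) -> int:
    """When paths rejoin after each segment, every alternative is stored only once."""
    return sum(len(alternative) for segment in segments for alternative in segment)


one_variant = [["小"], ["ヶ", "ケ", "が"], ["谷"]]
two_variants = one_variant + [["一", "1"], ["丁目"]]   # hypothetical second variant position

print(trie_nodes(expand(one_variant)), shared_network_nodes(one_variant))     # 7 5
print(trie_nodes(expand(two_variants)), shared_network_nodes(two_variants))   # 25 9
```

Each additional variant position multiplies the number of trie branches that follow it, while the shared form only adds the characters of the new alternatives.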

【００８３】The ability to generate a place name notation network with little redundancy, such as the one shown in FIG. 3, is an essential advantage of expressing place name words by generation rules. With generation rules, the redundant portions can be expressed explicitly. In the example of FIG. 2(A), the "ヶ" in "小ヶ谷" has three variant notations, and the extended BNF notation shows that the character string after "ヶ" is the same in all of them. As a result, as shown in FIG. 3, a network is generated that has three paths only between "小" and "谷".

【００８４】In contrast, with the conventional notation for place name character strings shown in FIG. 2(B), it is not possible to detect whether the groups of variant notations after "ヶ" are equivalent, so only a network such as the one shown in FIG. 13 can be generated.

【００８５】FIG. 14 is a flowchart of the processing performed in the place name recognition processing 104 shown in FIG. 1, which is described below.

【００８６】(1) First, character line extraction cuts out the image of the character line portion from the input image (step 1401).

【００８７】(2) Next, character segmentation cuts out from the character line image the patterns that appear to be characters, that is, candidate patterns. If the character boundaries cannot be determined uniquely at this stage, segmentation is attempted under several boundary hypotheses, and the candidate patterns for each hypothesis are output (step 1402).

【００８８】(3) Next, character recognition identifies which character each extracted candidate pattern represents and outputs the result as a candidate character string. When segmentation is based on several hypotheses, or when character recognition outputs several candidate characters for one pattern, character recognition outputs several candidate character strings, one for each combination of segmentation and candidate characters (step 1403).

【００８９】(4) Finally, character string collation checks each candidate character string against the place name notation network to determine whether it is a correct place name character string. A candidate character string accepted by collation becomes the place name recognition result (step 1404).
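How step 1403 arrives at several candidate character strings can be sketched as a Cartesian product over the candidate characters of each pattern, repeated for every segmentation hypothesis. The function name, the input layout (a list of hypotheses, each a list of per-pattern candidate lists), and the sample candidates are assumptions for illustration; real recognizers would normally also carry scores, which are omitted here.

```python
from itertools import product


def candidate_strings(hypotheses):
    """Enumerate one string per combination of segmentation hypothesis and candidate characters.

    hypotheses: list of segmentations; each segmentation is a list with one entry per
    candidate pattern, and each entry is the list of candidate characters for that pattern.
    """
    strings = []
    for patterns in hypotheses:
        strings.extend("".join(chars) for chars in product(*patterns))
    return strings


# One segmentation hypothesis with three patterns; two of them are ambiguous.
hypotheses = [[["川", "州"], ["越"], ["市", "布"]]]
candidates = candidate_strings(hypotheses)
# -> ["川越市", "川越布", "州越市", "州越布"]
```

Each string in `candidates` would then be passed to character string collation in step 1404, and only those accepted by the place name notation network would be kept.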

【００９０】FIG. 15 is a flowchart of the character string collation processing 1404, which is described below. This processing takes one character string as input, determines whether at least part of the input character string can be accepted as a place name character string, and, if so, obtains the identifier of the corresponding place name notation. Here, the length of the input character string is denoted L and the i-th character of the string C_i.

【００９１】(1) First, loop 1501 repeats steps 1502 and 1503 while varying the starting point s of collation from 1 to L.

【００９２】(2) The address of the root of the place name notation network is set in the variable p, which points to a node. The function srch is then called with the arguments p and s. The function srch finds a path in the place name notation network that matches the input character string and returns the address of the node at the end of that path. If the return value of srch is not the NULL pointer, collation is regarded as successful, and the identifier stored in the node indicated by the return value is output (steps 1502 to 1504).

【００９３】(3) If collation has not succeeded by the time s reaches L, the character string collation processing is regarded as having failed and ends (step 1505).

【００９４】In this processing, the function srch also calls itself recursively and searches the place name notation network depth-first for a path that matches the input character string. The function srch takes two arguments, p and i. Argument p points to the node at which the search starts. Argument i is an integer that indicates which character of the input character string is currently being examined. If an acceptable character string is found, srch returns the address of the terminal node of that string; otherwise it returns the NULL pointer.

【００９５】FIG. 16 is a flowchart of the processing performed by the function srch, which is described below.

【００９６】(1) First, it is checked whether the argument p points to a character string end node. If it does, the input character string is regarded as accepted, p is returned, and the processing ends (step 1601).

【００９７】(2) Next, it is determined whether all characters have already been processed. If i is greater than L, so that all characters have been processed, but p has not reached the end of the place name notation network, NULL is returned (step 1602).

【００９８】(3) Next, it is checked whether the data item c of p matches the i-th character C_i of the character string. If it does, the function srch is called recursively with the child node p->d of p as the starting point of the search and processing resuming from the (i+1)-th character. If the return value r is not NULL, the character string is regarded as accepted, and the processing ends with r as the return value (step 1603).

【００９９】(4) Next, it is checked whether p is a node corresponding to a NULL transition. If it is, the function srch is called recursively with the child node p->d of p as the starting point of the search and processing resuming from the i-th character. If the return value r is not NULL, the character string is regarded as accepted, and the processing ends with r as the return value (step 1604).

【０１００】(5) Next, it is checked whether a sibling node p->b is linked to p. If it is, the function srch is called recursively with the sibling node p->b as the starting point of the search and processing resuming from the i-th character, and its return value is passed up to the caller (step 1605).

【０１０１】(6) If the input character string has not been accepted by any of the preceding steps, no further search is possible, so the processing ends with NULL as the return value (step 1606).
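The collation loop of FIG. 15 and the recursive function srch of FIG. 16 can be sketched together as below. The sketch follows steps 1601 to 1606 in the order given, but it is an illustration under assumptions: indices are 0-based rather than 1-based, the input string is passed as an extra argument, the names `Node`, `is_terminal`, and `match` are hypothetical, and the small network is built by hand for a rule of the form 小(ヶ|ケ|が)谷 with an illustrative identifier.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(eq=False)
class Node:
    c: Optional[str] = None             # character code; None stands for NULL
    b: Union["Node", str, None] = None  # sibling pointer (identifier at the terminal node)
    d: Optional["Node"] = None          # child pointer; None stands for NULL


def is_terminal(p: Node) -> bool:
    return p.c is None and p.d is None


def srch(p: Node, i: int, text: str) -> Optional[Node]:
    """Depth-first search for a path matching text[i:]; return the terminal node or None."""
    if is_terminal(p):                            # step 1601: end of a place name reached
        return p
    if i >= len(text):                            # step 1602: input exhausted too early
        return None
    if p.c is not None and p.c == text[i]:        # step 1603: character matches, descend
        r = srch(p.d, i + 1, text)
        if r is not None:
            return r
    if p.c is None:                               # step 1604: NULL transition consumes nothing
        r = srch(p.d, i, text)
        if r is not None:
            return r
    if isinstance(p.b, Node):                     # step 1605: try the next alternative
        return srch(p.b, i, text)
    return None                                   # step 1606: nothing left to try


def match(root: Node, text: str) -> Optional[str]:
    """Try every starting position s; return the identifier of the first accepted name."""
    for s in range(len(text)):
        r = srch(root, s, text)
        if r is not None:
            return r.b
    return None


# Hand-built network: "小" -> ("ヶ" | "ケ" | "が") -> "谷" -> terminal
terminal = Node(b="3501104")
tani = Node(c="谷", d=terminal)
ga = Node(c="が", d=tani)
ke_large = Node(c="ケ", d=tani, b=ga)
ke_small = Node(c="ヶ", d=tani, b=ke_large)
root = Node(c="小", d=ke_small)

result = match(root, "川越市小ケ谷")
```

In this run the search fails at the first three starting positions, succeeds from the fourth character, and returns the identifier stored in the terminal node. A string such as "川越市小谷", which lacks the middle character, is rejected.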

【０１０２】The embodiment described so far performs character segmentation, character recognition, and character string collation in sequence, but the present invention can readily be extended to a scheme in which the character string collation result is fed back to character segmentation, as in Koga et al., "Address reading device, mail sorting machine, and character string recognition method" (Japanese Patent Application No. Hei 10-28077).

【０１０３】FIG. 17 is a block diagram showing an example configuration of a system that applies the place name character string recognition processing according to the embodiment of the present invention, and FIG. 18 is a block diagram showing the configuration of the place name character string generation rule editing device. In this example the present invention is applied to a mail sorting system. In FIGS. 17 and 18, 1701 is a mail sorting machine, 1702 a scanner, 1703 a delay line, 1704 a sorter, 1705 a place name recognition device, 1706 an input interface, 1707 an arithmetic processing unit, 1708 an output processing unit, 1710 a memory, 1711 a network interface, 1712 a hard disk, 1713 a removable-media storage device, 1714 a place name character string generation rule editing device, 1718 a network, 1801 a mouse, 1802 a keyboard, 1803 a display, 1804 a place name character string generation rule editing program, 1805 a character string collation program, 1806 a place name notation network display program, 1807 a place name character string generation rule file, 1808 a place name notation network generation program, 1809 place name notation network data, 1810 a communication device, 1811 a removable-media storage device, and 1812 a computer.

【０１０４】The system shown in FIG. 17 consists of one or more mail sorting machines 1701 and one or more place name character string generation rule editing devices 1714, connected by a network 1718. The mail sorting machine 1701 comprises a scanner 1702, a delay line 1703, a sorter 1704, and a place name recognition device 1705. The place name recognition device 1705 in turn comprises an input interface 1706, an arithmetic processing unit 1707, an output processing unit 1708, a memory 1710, a network interface 1711, a hard disk 1712, and a removable-media storage device 1713. The bold lines in the figure indicate the flow of mail.

【０１０５】In the system shown in FIG. 17, the image of the place name written on a mail piece is captured by the scanner 1702 and transferred to the place name recognition device 1705. While the mail piece travels along the delay line 1703, the place name recognition device 1705 recognizes the place name written on it and transfers the recognition result to the sorter 1704. The sorter 1704 sorts the mail piece according to the recognition result.

【０１０６】In preparation for sorting mail, the place name recognition device 1705 loads the place name notation network generation program file from the hard disk 1712 into the memory 1710 and starts it on the arithmetic unit 1707. Under the control of the place name notation network generation program, the place name recognition device 1705 receives the place name character string generation rules from the place name character string generation rule editing device 1714 via the network interface 1711, creates a place name notation network file, and stores it on the hard disk 1712.

【０１０７】Instead of being received from the place name character string generation rule editing device 1714 via the network, the place name character string generation rules may be read from the removable-media storage device 1713, such as a floppy disk drive.

【０１０８】When sorting mail, the place name recognition device 1705 loads the recognition program file and the place name notation network file from the hard disk 1712 into the memory 1710 and executes the program on the arithmetic unit 1707. Under the control of the recognition program, the place name recognition device 1705 then receives an image from the input interface 1706, recognizes the place name written on the mail piece, and outputs the recognition result via the output interface 1708.

【０１０９】As shown in FIG. 18, the place name character string generation rule editing device 1714 consists of a computer 1812 to which are connected a mouse 1801, a keyboard 1802, a display 1803, a disk device storing the place name character string generation rule file 1807, a communication device 1810, and a removable-media storage device 1811. Editing is carried out by modifying the place name character string generation rule file 1807 through the place name character string generation rule editing program 1804 running on the computer 1812. The place name character string generation rule file 1807 is a text file, so an ordinary text editor can be used to edit it. In addition, the place name notation network generation program 1808 can be run on the computer 1812 to generate the place name notation network 1809 from the place name character string generation rule file 1807.

【０１１０】With the functions described above, the place name character string generation rule editing device 1714 can confirm whether the place name word generation rules being edited are grammatically correct. It can also run a program 1805 equivalent to the character string collation 1404 of the recognition processing and confirm whether a test character string entered from the keyboard 1802 is accepted.
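
The two checks described here, whether the rules are well formed and whether a test string is accepted, might look like the following sketch. The rule syntax is not reproduced in this passage, so the dictionary layout, the angle-bracket category names, the sample rules, and the functions `undefined_categories` and `accepts` are all hypothetical. The sketch also assumes that the rules are not left-recursive.

```python
# Hypothetical in-memory form of generation rules: each syntax category maps to
# a list of alternatives, and each alternative is a sequence of symbols.
# A symbol is either a category name ("<...>") or literal characters.
RULES = {
    "<ke>": [["ヶ"], ["ケ"], ["が"]],
    "<place>": [["川越市", "小", "<ke>", "谷"]],
}


def undefined_categories(rules):
    """Return category names that are referenced but never defined."""
    used = {
        sym
        for alternatives in rules.values()
        for alt in alternatives
        for sym in alt
        if sym.startswith("<") and sym.endswith(">")
    }
    return used - set(rules)


def ends(rules, symbol, text, pos):
    """Yield every position at which `symbol` can end when it starts at `pos`."""
    if symbol in rules:                      # syntax category: try each alternative
        for alt in rules[symbol]:
            positions = {pos}
            for sym in alt:
                positions = {e for p in positions for e in ends(rules, sym, text, p)}
            yield from positions
    elif text.startswith(symbol, pos):       # literal characters
        yield pos + len(symbol)


def accepts(rules, start, text):
    """True if the whole of `text` can be derived from the category `start`."""
    return len(text) in set(ends(rules, start, text, 0))
```

Under these assumptions, `accepts(RULES, "<place>", "川越市小ケ谷")` is true and `accepts(RULES, "<place>", "川越市小谷")` is false, and a rule set that mentions a category without defining it is reported by `undefined_categories`.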

【０１１１】The computer 1812 also runs the place name notation network display program 1806, which displays the place name notation network 1809 in a form such as that shown in FIG. 3, so the operator can check the editing result visually. The edited place name character string generation rule file 1807 is transferred to the place name recognition device 1705 via the communication device 1810. Alternatively, it may be copied by the removable-media storage device 1811 onto a removable storage medium such as a floppy disk and carried on that medium to the mail sorting machine 1701.

【０１１２】FIG. 19 is a block diagram showing the configuration of another embodiment of the present invention, and FIG. 20 is a diagram explaining examples of screens shown on the display. This example is a gazetteer device that uses the place name character string notation method and the place name collation method according to the present invention to retrieve information about a place from a character string representing its name. In FIG. 19, 1901 is a mouse, 1902 is a keyboard, 1903 is a display, 1904 is a printer, 1905 is an input file, 1906 is an output file, 1907 is a gazetteer program, 1908 is a place name additional information file, 1909 is a place name character string generation rule file, 1910 is a communication module, 1911 is an interface module, 1912 is place name list data, 1913 is a place name list sort module, 1914 is a place name information search module, 1915 is a place name list generation module, 1916 is a character string collation module, 1917 is a place name notation expansion module, 1918 is a place name notation network generation program, and 1919 is place name notation network data.

【０１１３】The device shown in FIG. 19 provides the following services.
(1) Display or print the standard form of a place name character string entered from the keyboard.
(2) Display or print the variant notations of a place name character string entered from the keyboard.
(3) Display or print information about the region (postal code and the like) corresponding to a place name character string entered from the keyboard.
(4) Convert a place name character string read from a file into its standard form or into information specific to the corresponding region, such as a postal code, and write the result to a file.
(5) Convert a place name character string received from the network into its standard form or into information specific to the corresponding region, such as a postal code, and send the result to the network.
Here, the standard form means, for example, the official character string that denotes a region as defined by the administrative divisions.

【０１１４】In the embodiment shown in FIG. 19, a mouse 1901, a keyboard 1902, a display 1903, a printer 1904, an input file 1905, an output file 1906, a place name additional information file 1908, and a place name character string generation rule file 1909 are connected to the gazetteer program 1907 running on a computer. Display and input/output are performed via the interface module 1911. When a character string to be searched for is entered, the place name information search module 1914 calls the character string collation module 1916. The character string collation module 1916 performs processing equivalent to the character string collation processing 1404 in FIG. 14. It refers to the place name notation network data 1919, which the place name notation network generation program 1918 generates from the place name character string generation rule file 1909, and determines which identifier's place name notation the input character string corresponds to.

【０１１５】Using the identifier obtained as a key, the place name information search module 1914 retrieves the standard form and additional information such as the postal code from the place name additional information file 1908. The place name notation expansion module 1917 enumerates every possible variant notation from the place name notation network data 1919. The resulting set of variant notations is stored in the place name list data 1912 and output via the interface module 1911 as needed. The place name list sort module 1913 reorders the variant notations according to the operator's instructions before they are output. Input for this processing may come from any of the keyboard 1902, the input file 1905, or the communication module 1910, and output may go to any of the display 1903, the output file 1906, or the communication module 1910.
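
The lookup flow of the gazetteer device (collate the input, fetch the additional information by identifier, enumerate the variant notations, and sort them) can be sketched as below. Everything concrete here is an assumption made for illustration: the identifier `P001`, the slot-based representation of the notation network, the choice of which spelling is the standard form, the placeholder postal code `000-0000`, and the function names.

```python
from itertools import product

# Hypothetical stand-in for the place name notation network data: for each
# identifier, a sequence of slots, each listing the alternatives at that position.
NOTATIONS = {
    "P001": [["川越市"], ["小"], ["ヶ", "ケ", "が"], ["谷"]],
}

# Hypothetical stand-in for the place name additional information file.
# The postal code is a placeholder, not real data.
INFO = {
    "P001": {"standard": "川越市小ヶ谷", "postal_code": "000-0000"},
}


def variants(place_id):
    """Enumerate every notation of one place (role of the expansion module)."""
    return ["".join(parts) for parts in product(*NOTATIONS[place_id])]


def find_identifier(text):
    """Return the identifier whose notations include `text`, or None."""
    for place_id in NOTATIONS:
        if text in variants(place_id):
            return place_id
    return None


def search(text):
    """Build result rows (standard flag, place name, postal code) for `text`."""
    place_id = find_identifier(text)
    if place_id is None:
        return []
    info = INFO[place_id]
    rows = [
        {
            "standard": name == info["standard"],
            "name": name,
            "postal_code": info["postal_code"],
        }
        for name in variants(place_id)
    ]
    # Default ordering chosen for the sketch: standard form first, then by name.
    return sorted(rows, key=lambda row: (not row["standard"], row["name"]))
```

A real implementation would walk the notation network rather than expand every variant for each query; the exhaustive expansion is used here only to keep the sketch short.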

【０１１６】FIG. 20 shows examples of screens displayed on the display 1903 in the embodiment of FIG. 19. FIG. 20(A) is an example of the screen displayed on the display 1903 when the operator enters the character string 「川越市小ヶ谷」 (Ogaya, Kawagoe-shi) and runs a search. The input character string is entered in field 2005, and clicking button 2006 with the mouse runs the search. The character strings found to correspond to the input character string are displayed in window 2007. In each row, the "standard" item indicates whether that character string is the standard form, and the "place name" item shows the character string itself. The "postal code" item shows the postal code corresponding to the character string, but other additional information about the region may be shown instead.

【０１１７】The "standard", "place name", and "postal code" boxes arranged in area 2004 are buttons; clicking one with the mouse sorts the rows by that item. Window 2008 is used to specify search options. Here the operator specifies whether to display only the standard form, to display variant notations based on aza (字), oaza (大字), and similar subdivision terms, or to display variant notations based on common names (such as "** housing complex"). Button 2002 instructs the program to print the displayed contents, and button 2001 switches to a mode that reads from and writes to files instead of using the keyboard and display. Button 2003 instructs the program to exit.

【０１１８】Window 2009, shown open in FIG. 20(B), displays detailed information obtained as a result of the collation, such as the reading of the place name, its koaza (小字, minor subdivision), and its postal code. Window 2009 is opened by clicking a search result displayed in window 2007 with the mouse.

【０１１９】Place name character strings expressed by the notation method according to the embodiment of the present invention can be stored on a storage medium such as an FD, MO, or DVD and provided as a place name dictionary.

【０１２０】[0120]

[Effects of the Invention] As described above, according to the present invention, a place name dictionary covering every possible place name character string can be created with little manual effort, even when place names have many variant notations. In addition, a network-form place name dictionary that permits high-speed collation can be created easily.

[Brief description of the drawings]

FIG. 1 is a flowchart illustrating an example of place name character string recognition processing according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of place names expressed by edited place name character string generation rules, together with an example in which the variant notations are listed without using generation rules.

FIG. 3 is a diagram schematically showing the place name notation network built from the example generation rules.

FIG. 4 is a diagram illustrating the data format used when the place name notation network is implemented on a computer.

FIG. 5 is a flowchart illustrating the process of generating a place name notation network from place name character string generation rules.

FIG. 6 is a diagram illustrating an example of a generated syntax tree.

FIG. 7 is a flowchart illustrating the operation of the function proc, which generates a place name notation network from place name notation generation rules.

FIG. 8 is a diagram (part 1) illustrating how a place name notation network is generated by the function proc.

FIG. 9 is a diagram (part 2) illustrating how a place name notation network is generated by the function proc.

FIG. 10 is a diagram showing a group of place name notation networks generated from place name notation generation rules.

FIG. 11 is a flowchart illustrating a procedure for generating a place name notation network using the conventional technique.

FIG. 12 is a diagram illustrating an example of how a place name notation network is generated by the conventional technique.

FIG. 13 is a diagram showing an example of a place name notation network generated by the conventional technique.

FIG. 14 is a flowchart illustrating the operation of the place name recognition processing shown in FIG. 1.

FIG. 15 is a flowchart illustrating the operation of the character string collation processing shown in FIG. 14.

FIG. 16 is a flowchart illustrating the operation of the function srch.

FIG. 17 is a block diagram showing a configuration example of a system that applies the place name character string recognition processing according to the embodiment of the present invention.

FIG. 18 is a block diagram showing the configuration of the place name character string generation rule editing device.

FIG. 19 is a block diagram showing the configuration of another embodiment of the present invention.

FIG. 20 is a diagram illustrating examples of screens displayed on the display.

[Explanation of symbols]

101 Place name character string generation rule editing processing
102 Place name character string generation rule file
103 Place name notation network generation processing
104 Place name recognition processing
1404 Character string collation processing
1701 Mail sorting machine
1702 Scanner
1703 Delay line
1704 Sorter
1705 Place name recognition device
1706 Input interface
1707 Arithmetic processing unit
1708 Output processing unit
1710 Memory
1711 Network interface
1712 Hard disk
1713 Removable-media storage device
1714 Place name character string generation rule editing device
1718 Network
1801 Mouse
1802 Keyboard
1803 Display
1804 Place name character string generation rule editing program
1805 Character string collation program
1806 Place name notation network display program
1807 Place name character string generation rule file
1808 Place name notation network generation program
1809 Place name notation network data
1810 Communication device
1811 Removable-media storage device
1812 Computer
1901 Mouse
1902 Keyboard
1903 Display
1904 Printer
1905 Input file
1906 Output file
1907 Gazetteer program
1908 Place name additional information file
1909 Place name character string generation rule file
1910 Communication module
1911 Interface module
1912 Place name list data
1913 Place name list sort module
1914 Place name information search module
1915 Place name list generation module
1916 Character string collation module
1917 Place name notation expansion module
1918 Place name notation network generation program
1919 Place name notation network data

Continuation of front page
(72) Inventor: Shoji Ikeda, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Hisao Ogata, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Yutaka Sako, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Hiromichi Fujisawa, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
F-term (reference): 5B009 LA03 ME02 ME17 ME23 QA16 VA02

Claims

[Claims]

1. A place name expression method for representing a set of place name character strings having a plurality of variant notations, in which a place name denoting a region is written as different character strings that are nevertheless sequences of words meaning the same region, wherein a sequence of characters or syntax categories is defined for each partial character string constituting part or all of a place name character string, and the place name character string is represented by a syntax category consisting of a sequence of characters or defined syntax categories.

2. The place name expression method according to claim 1, wherein the place name character string is represented using a replacement symbol, which indicates the sequence of characters or syntax categories by which a syntax category is replaced, and a place name symbol, which indicates that a given syntax category represents a specific region.

3. A place name character string collating method, wherein a sequence of characters or syntax categories is defined for each partial character string constituting part or all of a place name character string, and a place name is collated from an input character string by determining whether a partial character string of the input character string matches one of the place name character strings represented by a syntax category consisting of a sequence of characters or defined syntax categories.

4. A place name character string collating device comprising: storage means for storing place name character strings, wherein a sequence of characters or syntax categories is defined for each partial character string constituting part or all of a place name character string and each place name character string is represented by a syntax category consisting of a sequence of characters or defined syntax categories; input means for inputting a character string; and means for checking whether the input character string is a place name character string stored in the storage means and outputting the result of the check.

5. A place name character string recognizing device comprising: character reading means for receiving an image obtained by converting the density of a document surface into an electric signal and reading the characters written on the document; and the place name character string collating means according to claim 4, wherein the input means of the place name character string collating means receives a character string from the character reading means.

6. A mail sorting system, wherein the place name character string recognizing device according to claim 5 is used to recognize a place name character string in the address on a mail piece, and the mail piece is sorted or the recognition result is printed on the mail piece.

7. A place name character string recording medium storing place name character strings having a plurality of variant notations, in which a place name denoting a region is written as different character strings that are nevertheless sequences of words meaning the same region, wherein a sequence of characters or syntax categories is defined for each partial character string constituting part or all of a place name character string, and each place name character string is stored in a form represented by a syntax category consisting of a sequence of characters or defined syntax categories.

8. A place name character string recognition device, being a character reading device that receives an image obtained by converting the density of a document surface into an electric signal and reads the characters written on the document, comprising: means for storing place name character strings using the place name expression method according to claim 1; and means for recognizing a place name by finding, in the input image, a sequence of partial images in which each partial image resembles the corresponding character of one of the place name character strings represented by the place name expression method.

9. A place name expression method for representing a set of place name character strings having a plurality of variant notations, in which a place name denoting a region is written as different character strings that are nevertheless sequences of words meaning the same region, wherein the set is expressed according to generation rules for partial character strings composed of characters or word parts.