JPH0793467A - Address reading system

Info

Publication number: JPH0793467A
Application number: JP5236154A
Authority: JP
Inventors: Masaaki Shizuno; 正明静野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-09-22
Filing date: 1993-09-22
Publication date: 1995-04-07

Abstract

PURPOSE: To achieve high accuracy and high speed in the automatic reading of address notation. CONSTITUTION: All characters of the address notation are first character-recognized. For the first combination of several characters among the recognized address characters, the search is narrowed to prefecture-level names (first hierarchy) and the pertinent prefecture name is retrieved. When an actual prefecture name, e.g. Tokyo Metropolis, is found, the search is narrowed to within Tokyo Metropolis (second hierarchy), and for the next combination of several characters the pertinent municipality name (ward, city, etc.) is retrieved. When an actual municipality name, e.g. Adachi Ward, is found, the search is narrowed to within Adachi Ward (third hierarchy), and for the following combination of several characters a place name (town name, etc.) is retrieved. The address is thus retrieved in order from the highest hierarchy, and the search scope is narrowed each time a result is obtained, which improves the correct-reading rate and reduces processing time.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an address reading system that electronically reads and recognizes address notation printed or handwritten on mail pieces.

[0002]

[Prior Art] In address reading, characters are usually recognized first, the characters are then combined to recognize words, and simultaneously (or subsequently) the words are combined to recognize the address. In other words, recognition in an address reading system can be divided into two processes: "character recognition" and "address recognition".

[0003] The character recognition unit collates each read character image (character pattern) against a dictionary of all possible character types. For word recognition (recognition of address notation words), however, only a few character types with high similarity (for example, 10) are selected and sent to the address recognition unit in order to reduce the processing load. In this case, if the correct character is not among the selected 10, the address cannot be read.

[0004] Without the present invention, an address reading system could be built by either of the following methods.

(1) In character recognition, each character is collated and evaluated against all character types used in address recognition, and a fixed number of character-type candidates are selected in descending order of evaluation value and passed to address recognition.

(2) The address notation is processed in order from the upper hierarchy (for example, prefecture → city → ward → town), and character recognition and address recognition are repeated for each hierarchy.

[0005]

[Problems to Be Solved by the Invention] In method (1), when address recognition is recognizing the prefecture, the correct character can drop out of the 10 candidates if the top candidates of the character recognition result for the prefecture name include many characters that are used only in city, ward, or town names. In that case the correct word and address naturally cannot be recognized, and the address becomes unreadable.

[0006] In the example of FIG. 6(a), when character recognition is performed on the three character images "東", "京", "都" (Tokyo Metropolis), the correct characters (underlined) are among the 10 candidates for "東" and "都", but not for "京" ("京" is the 11th candidate). Likewise, when character recognition is performed on the three character images "足", "立", "区" (Adachi Ward), the correct characters (underlined) are among the 10 candidates for "足" and "区", but not for "立" ("立" is the 11th candidate). In this example, "東京都足立区" (Adachi Ward, Tokyo Metropolis) cannot be read.

[0007] In method (2), on the other hand, character recognition can be restricted to the address hierarchy being read, so the drawback of method (1) (unreadable addresses) is eliminated. For example, once "東京都" is found, the next few characters can only be characters that make up the names of wards, cities, and the like. Selecting the top 10 characters from the limited set of character types that can follow "東京都", rather than from all character types covering place names throughout Japan, raises the probability that the correct characters of the ward or city name are among the 10 candidate character types, and ultimately improves the address reading rate.

[0008] In method (2), however, the time-consuming character recognition process is repeated for each address hierarchy, so processing efficiency is poor and processing time is long. For example, when the address hierarchy is deep, as in "神奈川県" (Kanagawa Prefecture), "川崎市" (Kawasaki City), "幸区" (Saiwai Ward), "柳町" (Yanagicho), the same characters are recognized many times (four times in this case) before the address determination result is obtained, which is inefficient. Moreover, character recognition for the next hierarchy (for example, Saiwai Ward) cannot start until address recognition of the upper hierarchy (for example, Kawasaki City) has finished, so the hardware used for character recognition sits idle, which makes the process even less efficient. An object of the present invention is to provide an address reading system that achieves both higher accuracy and higher speed in the automatic reading of address notation.

[0009]

[Means for Solving the Problems] The address reading system of the present invention comprises: an address dictionary (50) storing correct address notation data (prefecture names, city/ward/town/village names, and other actual place names); character recognition means (10, 30; ST10, ST12) for recognizing each of the characters constituting one or more address notation words ("東京都", "足立区", "西新井栄町", ...) used in an address notation made up of one or more address hierarchies (first hierarchy, prefecture name "東京都" (Tokyo Metropolis); second hierarchy "足立区" (Adachi Ward); third hierarchy "西新井栄町" (Nishiarai-Sakaecho); fourth hierarchy, ...), and for outputting character recognition results ("東", "京", "都", "足", "立", "区", "西", "新", "井", "栄", "町", ...); one or more character type tables (40; ST16, ST20, ST24), each containing the minimum set of character types needed to recognize the respective address notation word ("東京都", "足立区", "西新井栄町"); and address recognition means (20) for extracting from the character type table (40; ST16), among the character recognition results ("東", "京", "都", ...) produced by the character recognition means (10, 30), a predetermined number of character types (10 per character for each of "東", "京", "都") needed to determine the address hierarchy about to be recognized ("東京都"), looking up the address dictionary (50) with each combination of the extracted characters, and detecting the corresponding address notation word ("東京都").

[0010]

[Operation] For each character constituting the address, the similarity (or evaluation value) with respect to all character types used in the address notation of each address hierarchy is obtained in advance and organized into a character table for each address hierarchy. When the address characters are read hierarchy by hierarchy starting from the top (the prefecture name), the character table consisting only of the similar character types needed to read the next hierarchy (city or ward name, etc.) is selected and referenced on the basis of the reading result for the upper hierarchy. This improves the correct-reading rate for address characters and shortens processing time.

[0011]

[Embodiment] An address reading system according to an embodiment of the present invention is described below with reference to the drawings. FIG. 5 outlines the hardware configuration applied to this address reading system. The system of this embodiment comprises: a character recognition section 10 made up of a photoelectric conversion unit 102, an area detection unit 104, a line detection unit 106, a character detection unit 108, and a character recognition unit 110; a character dictionary 30 containing all the basic characters referenced by the character recognition unit 110 during character recognition; and an address recognition unit 20 that recognizes the address from combinations of the characters recognized by the character recognition section 10, with reference to a character type table 40 and an address dictionary 50. The character recognition unit 110 and the address recognition unit 20 each include a microcomputer (CPU), and the functions of these units are implemented by software running on that CPU. The character recognition section 10 itself can be built with the character recognition technology used in conventional optical character readers (OCR) and the like.

[0012] First, the system of this embodiment is outlined briefly. The photoelectric conversion unit 102 captures an image of the surface of a mail piece containing the characters to be read, binarizes it, and outputs original image data containing the character images.

[0013] The area detection unit 104 extracts the area in which the address is written from the window signal and the fine scanning signal obtained by scanning the surface of the mail piece. Specifically, the image information (original image data) of the whole mail piece is first compressed two-dimensionally, and the compressed image is divided into coarse blocks so that areas are detected globally without being distracted by fine detail. Next, projection data is generated for each block, the complexity and directionality of line segments are determined, and the results are edited using address area editing knowledge given in advance to determine the address area.

[0014] For the address area, the unit outputs binarization threshold candidates obtained from the image density histogram within the area, writing direction candidates obtained from the position of the detected address area, and typeface candidates such as handwriting or printed type. The address area editing knowledge incorporates, as address area image knowledge, data from a statistical survey of address positions and areas extracted from a large volume of mail.

[0015] Besides the address, the surface of a mail piece carries the sender's name, the sender's address, message text, postage indicia such as stamps, and a wide variety of advertisements. Their positions, areas, complexity, print directions, and so on are used to build the address area image knowledge.

[0016] The line detection unit 106 receives the output of the area detection unit 104, removes noise components such as frame lines enclosing the address lines or ruled lines in the address area, and separates and extracts individual character lines. It also obtains a print density histogram for each detected character line and determines the optimum binarization threshold for each line.

[0017] The algorithm for detecting character lines is basically the same as the algorithm for detecting the address area. Character line detection, however, does not take a global view. It uses more detailed image analysis than address area detection to exclude underline areas, blank areas, and the like, and extracts only the image to be read (the line image).

[0018] The character detection unit 108 separates the image into individual characters using the line image and binarization threshold extracted and determined by the line detection unit 106. Specifically, based on the character line width data, a line at or below a predetermined threshold is assumed to be a printed address, and any other line is assumed to be a handwritten address. Character separation along the line direction is then detected from projection information. The detection logic is based on finding the spaces formed by the gaps between characters. Touching characters and characters that are split internally are resolved by evaluating how far the detected character bounding box deviates from a square. When the evaluation is ambiguous, multiple detection candidates are allowed.
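The space-detection step described above can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions, not the patent's implementation: it assumes the line image is a list of rows of 0/1 pixels, the function name `split_by_projection` is hypothetical, and the handling of touching or internally split characters is left out.

```python
def split_by_projection(line_image):
    """Return (start, end) column spans of characters in a binarized line image.

    line_image: list of rows, each a list of 0 (background) / 1 (ink).
    A span ends where the column projection drops to zero (an inter-character gap).
    """
    if not line_image:
        return []
    width = len(line_image[0])
    # Column projection: number of ink pixels in each column.
    projection = [sum(row[x] for row in line_image) for x in range(width)]

    spans = []
    start = None
    for x, ink in enumerate(projection):
        if ink > 0 and start is None:
            start = x                      # a character begins
        elif ink == 0 and start is not None:
            spans.append((start, x))       # a gap closes the character
            start = None
    if start is not None:
        spans.append((start, width))       # character touching the right edge
    return spans


# Tiny made-up line image with three ink runs separated by empty columns.
line = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
]
spans = split_by_projection(line)
```

Each span is a half-open column range, so a span's width can be compared with the line height to judge how close the bounding box is to a square.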

[0019] The character recognition unit 110 performs character recognition in the same way as a known optical character reader (OCR). The character types prepared in advance for this recognition are all Arabic numerals, kanji numerals, katakana, and hiragana, plus about 200 kanji used in address notation. These characters (400 or fewer in total) are prepared separately for printed type and for handwriting, so up to about 800 characters are prepared.

[0020] Since each of the units 102 to 110 described above can be built with conventional technology, they are not described in further detail. It is unrealistic to aim for 100% correct character identification on input character images that contain ambiguity. In this embodiment, therefore, on the premise that post-processing using address knowledge (the address dictionary 50) follows, 10 candidates are output for each input image as identification candidates in descending order of similarity. This achieves a higher identification rate than can be obtained with a single identification candidate.

[0021] To handle the remaining ambiguity in writing direction (vertical or horizontal), the input character image is also rotated by 90° and 180° and identified, and a series of character candidate matrices is output for each rotation.
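As a rough illustration of producing the rotated inputs, the sketch below rotates a small pixel grid. The helper names are hypothetical, and the clockwise direction is an assumption, since the text gives only the angles.

```python
def rotate90(image):
    """Rotate a 2D pixel grid 90 degrees clockwise (direction is an assumption)."""
    return [list(row) for row in zip(*image[::-1])]


def orientations(image):
    """Return the image at 0, 90 and 180 degrees, keyed by angle."""
    r90 = rotate90(image)
    r180 = rotate90(r90)
    return {0: image, 90: r90, 180: r180}


glyph = [[1, 0],
         [1, 1]]
views = orientations(glyph)
```

Each entry of `views` would then be passed to recognition separately, yielding one candidate matrix per rotation.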

[0022] The address recognition unit 20 determines the address from the character identification candidate matrix using a knowledge database (part of the dictionary 50) of the ward names, town names, large-volume recipient names, and so on within the delivery office's area. Taking the Adachi Post Office in Adachi Ward, Tokyo as an example, 31 standard town names are prepared as addresses, and there are 10 large-volume recipient names such as the Adachi Ward Office. Because these names are written in many variant forms, the system learns variants according to how often they occur and adds them to the address knowledge data, as long as they do not conflict with other town names or large-volume recipient names.

[0023] For example, the town name "小台" in Adachi Ward is "オダイ" (Odai) in the standard knowledge, but variants such as "コダイ" (Kodai), "オタイ" (Otai), "コタイマチ" (Kotaimachi), and "オダイチョウ" (Odaicho) also occur. All of these variant names are stored in the address knowledge database (address dictionary 50) as referring to "小台" in Adachi Ward.
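One simple way to picture the variant knowledge is a mapping from each variant to its canonical town name. The sketch below is an assumption about layout only: the dictionary structure and the function names `canonical_town` and `learn_variant` are hypothetical, while the variants themselves are the ones listed in the text.

```python
# Variant names for one town, as listed in the text; all map to the same
# canonical entry. The dict layout itself is an assumption.
TOWN_VARIANTS = {
    "オダイ": "小台",
    "コダイ": "小台",
    "オタイ": "小台",
    "コタイマチ": "小台",
    "オダイチョウ": "小台",
}


def canonical_town(name):
    """Return the canonical town name for a variant, or None if unknown."""
    return TOWN_VARIANTS.get(name)


def learn_variant(name, town):
    """Add a variant unless it already points at a different town (a conflict)."""
    existing = TOWN_VARIANTS.get(name)
    if existing is not None and existing != town:
        return False
    TOWN_VARIANTS[name] = town
    return True
```

The conflict check in `learn_variant` mirrors the condition that a variant is only learned when it does not compete with another town or recipient name.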

[0024] From the knowledge of addresses and large-volume recipient names, including the variants described above, words consisting of two or more consecutive characters are created, and the character identification candidate matrix is evaluated against these words. For example, if the word is "東京都" and the corresponding character identification candidate matrix is as shown in FIG. 6(b), then from the word group "東東群", "東東都", "東京群", "東京都", "京東群", "京東都", "京京都", and so on, the place name "東京都", which is registered in the address dictionary 50 (as address knowledge), is finally selected.
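The evaluation of the candidate matrix against dictionary words can be sketched as a Cartesian product followed by a lookup. This is a minimal illustration, not the patent's implementation: `match_words` is a hypothetical name, the matrix holds only two candidates per position instead of ten, and the dictionary is a tiny illustrative subset.

```python
from itertools import product


def match_words(candidate_matrix, dictionary):
    """Return dictionary words that can be spelled from the candidate matrix.

    candidate_matrix: one list of candidate characters per character position.
    dictionary: set of registered address words.
    """
    found = []
    for combo in product(*candidate_matrix):
        word = "".join(combo)
        if word in dictionary and word not in found:
            found.append(word)
    return found


# Two candidates per position, in the spirit of the word group quoted above.
matrix = [["東", "京"], ["東", "京"], ["群", "都"]]
address_words = {"東京都", "京都府", "北海道"}   # illustrative subset
matches = match_words(matrix, address_words)
```

With ten candidates per position the number of combinations grows as 10 to the power of the word length, which is one reason to keep the candidate lists short and the dictionary narrow.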

[0025] After all word matching has finished, knowledge processing is performed again on the word candidates. As with word matching, this process preserves the adjacency between words and evaluates whether the sequence of words forms an address or a large-volume recipient name. For example, it checks whether the word "東京都" is followed by the word "足立区" and then by a town name, or whether "足立区" is followed by a large-volume recipient name such as the ward office. In this evaluation, a predetermined score can be given to each word (the closer to actual address notation, the higher the score), and the address with the highest accumulated score can be output.
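The scoring idea can be sketched as follows. The score values, the `FOLLOWS` adjacency table, and the function names are all made up for illustration; the text specifies only that each word receives a score and that the highest accumulated score wins.

```python
# Hypothetical per-word scores and allowed successors; the values are made up.
WORD_SCORE = {"東京都": 3, "足立区": 3, "小台": 2, "足立区役所": 2, "京都府": 3}
FOLLOWS = {
    "東京都": {"足立区"},
    "足立区": {"小台", "足立区役所"},
}


def score_sequence(words):
    """Sum word scores if every adjacent pair is a valid succession, else 0."""
    for prev, nxt in zip(words, words[1:]):
        if nxt not in FOLLOWS.get(prev, set()):
            return 0
    return sum(WORD_SCORE.get(w, 0) for w in words)


def best_sequence(candidates):
    """Pick the candidate word sequence with the highest accumulated score."""
    return max(candidates, key=score_sequence)


candidates = [
    ["京都府", "足立区", "小台"],   # 足立区 cannot follow 京都府 in this table
    ["東京都", "足立区", "小台"],
    ["東京都", "足立区"],
]
best = best_sequence(candidates)
```

Rejecting a sequence outright when an adjacency fails is one possible design; a softer penalty would also fit the description.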

[0026] Except for mail addressed to large-volume recipients, the position of the detected town name on the character image and the address format code (information such as vertical writing, horizontal writing, and rotation) are passed to the next process as dependent information, so that the chome and block (ban, go, etc.) following the town name can be identified.

[0027] Once the town name has been recognized, it is clear that the character images following the town name represent the chome and block. The address recognition unit 20 identifies this chome/block part, determines the final address, converts it into an address classification code, and outputs it.

[0028] In summary, the embodiment of FIG. 5 operates as follows. The character recognition section 10 refers to the character dictionary 30 and reads and recognizes handwritten or printed characters. The address recognition unit 20 refers to the character table 40 and creates one or more combinations of the one or more read characters recognized by the character recognition section 10. The address dictionary 50 is then looked up with the created combinations, and the correct address notation words matching the combinations are read from the dictionary 50. The character codes of the one or more address notation words read from the dictionary 50 and the codes of the address numbers that follow them (chome, ban, go, etc.) are combined and output as the address reading result (address classification code).

[0029] Next, with reference to FIG. 1, the process from the character-recognized address characters to the reading of the address is described in several stages.

(1) First, the address characters written on the mail piece pass through the photoelectric conversion unit 102, area detection unit 104, line detection unit 106, and character detection unit 108 of FIG. 5 and are separated into individual characters, as in the character images of FIG. 6(a).

[0030] Next, each character image is collated and evaluated against all character types registered in the character dictionary 30 by the character recognition unit 110 of FIG. 5 (steps ST10 and ST12 in FIG. 1), and character recognition results are obtained as shown in the "character image" row at the top of FIG. 6(a). (The character images are illustrated here with the correct characters "東京都足立区…", but they may contain wrong characters, for example "東東群足土区…".) To greatly reduce the load of subsequent address recognition, the characters handled in address recognition are limited to a fixed number of candidates, for example 10, in descending order of evaluation value, as in FIG. 6(a). In this case, if the correct character ranks 11th or lower, the address cannot be read and the correct-reading rate falls.

[0031] That is, if this candidate selection is performed before the character recognition unit 110 outputs its recognition results, then even for a character used in a prefecture name, such as "京" in FIG. 6(a), characters from city, ward, and town names that never appear in prefecture names often rank higher, and the correct character "京" frequently fails to remain among the 10 candidates.

[0032] (2) In this embodiment, therefore, to ensure that the correct character is never dropped from the candidates, all recognition results from the character recognition unit 110 (for example, for "東京都足立区西新井栄町1丁目2番3号", i.e. 1-2-3 Nishiarai-Sakaecho, Adachi Ward, Tokyo) are sent to the address recognition unit 20. The address recognition unit 20 then reads the address step by step, starting from the top hierarchy of the address notation (the prefecture name).

[0033] First, the prefecture level is designated as the reading target (step ST14). Next, according to the character type table 40, the list of character types needed to read prefecture names only (1 metropolis (都), 2 urban prefectures (府), 1 circuit (道), and 43 prefectures (県)), i.e. the prefecture-specific character table, is read from the character dictionary / basic character candidates 30 (step ST16).

[0034] Then, for each of the address characters in the character recognition result ("東", "京", "都"), the top 10 candidates among the characters contained in this character type list are selected. As a result, the character candidates under the prefecture designation are obtained, as shown in FIG. 6(b).

[0035] That is, the correct character "京", which was not among the 10 candidates when the top 10 of the raw character recognition result were simply selected as in FIG. 6(a), is now among the 10 candidates selected under the prefecture designation. Words formed by combining these character candidates (東, 京, 郡, 都, etc.) are collated with the entries in the address dictionary 50, and the correct address notation (東京都) is read.
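The effect of restricting candidates to a hierarchy-specific character table can be sketched in a few lines. Everything concrete here is an assumption for illustration: the ranking for the image of "京" is made up (with the correct character placed 11th, as in FIG. 6(a)), `PREFECTURE_CHARS` is only a small subset of the characters that occur in prefecture names, and `top_candidates` is a hypothetical name.

```python
def top_candidates(ranked, allowed=None, n=10):
    """Return the first n characters of `ranked`, optionally restricted to `allowed`.

    ranked: characters ordered by descending similarity for one character image.
    allowed: set of characters in the character type table for the current
             hierarchy, or None for no restriction.
    """
    if allowed is not None:
        ranked = [c for c in ranked if c in allowed]
    return ranked[:n]


# Made-up ranking for the image of 京: the correct character is 11th.
ranked_for_kyo = ["東", "宗", "景", "亭", "享", "哀", "高", "豪", "亮", "衰", "京", "栄"]

# Illustrative subset of the characters that appear in prefecture names.
PREFECTURE_CHARS = set("東京都道府県北海大阪神奈川千葉高知")

unrestricted = top_candidates(ranked_for_kyo)
restricted = top_candidates(ranked_for_kyo, PREFECTURE_CHARS)
```

Because the full ranking is kept and only filtered per hierarchy, recognition runs once per character image, and the same ranking can be filtered again with a different table at the next hierarchy.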

[0036] (3) In the example of FIG. 6(b), "東京都" has been read, so the next reading target is the name of a city, ward, county, or island within Tokyo. Tokyo Metropolis is therefore designated as the reading target (step ST18). The procedure is the same for any other prefecture, for example Kanagawa Prefecture.

[0037] Next, according to the character type table 40, the character type list needed to read the names of areas within Tokyo (23 wards, 27 cities, and others), i.e. the ward/city/county/island-specific character table (area name character table), is read from the basic character candidates 30 (step ST20).

[0038] Then, for each of the address characters in the character recognition result ("足", "立", "区"), the top 10 candidates among the characters contained in this character type list are selected. As a result, the character candidates under the Tokyo designation are obtained, as shown in FIG. 6(c).

[0039] That is, the correct character "立", which was not among the 10 candidates when the top 10 of the raw character recognition result were simply selected (the example of FIG. 6(a)), is now among the 10 candidates selected under the Tokyo designation. Words formed by combining these character candidates (足, 北, 区, 荒, 立, 国, etc.) are collated with the entries in the address dictionary 50, and the correct address notation (足立区) is read.

[0040] (4) Next, Adachi Ward is designated as the reading target (step ST22). The procedure is the same for any other area within Tokyo, for example Fuchu City. Then, according to the character type table 40, the character type list needed to read area names (town names, etc.) within Adachi Ward, i.e. the town/place-name-specific character table, is read from the basic character candidates 30 (step ST24).

[0041] Then, for each of the address characters in the character recognition result ("西", "新", "井", "栄", "町"), the top 10 candidates among the characters contained in this character type list are selected. As a result, the character candidates under the Adachi Ward designation are obtained.

【００４２】That is, the correct character now falls within the top 10 candidates for each of the five address notation characters ("西", "新", "井", "栄", "町"). Words formed by combining these candidates, 10 per character (50 characters in total), are collated with the registered contents of the address dictionary 50, and the correct address notation (西新井栄町, Nishiarai-Sakaecho) is read.
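The collation of combination words against the address dictionary might be sketched as follows. The candidate lists (shortened from 10 to 2 per position), the dictionary excerpt, and the name `match_words` are hypothetical; the sketch only shows the idea of taking one candidate per position and keeping the combinations that are registered words.

```python
from itertools import product


def match_words(candidates_per_position, registered):
    """Return every registered word spelled by one candidate per position."""
    words = ("".join(combo) for combo in product(*candidates_per_position))
    return [word for word in words if word in registered]


# Hypothetical candidate lists, one list per character position.
candidates = [["酉", "西"], ["新", "親"], ["丼", "井"], ["栄", "宋"], ["田", "町"]]
# Hypothetical excerpt of the Adachi-ku entries in the address dictionary.
adachi_names = {"西新井栄町", "西新井本町", "青井"}

print(match_words(candidates, adachi_names))  # ['西新井栄町']
```

Even though the first-ranked candidate at several positions is wrong, the only combination that is a registered place name is the correct one.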

【００４３】If the address read so far is at the final level, processing ends. If a still lower level exists, the reading target is updated, another character type list (dedicated character table) is read out from the basic character candidates 30, and the corresponding address notation is read.

【００４４】(5) Thereafter, the character codes of the prefecture name and ward/city/town/village name that have been read (東京都足立区西新井栄町) are output (step ST26), and a block identification code (a numeric code for the chome/banchi/go) is appended to them (step ST28). The final address classification code is then derived from these character and numeric codes, and the address recognition unit 20 outputs the address classification code (step ST30).

【００４５】FIG. 2 shows the procedure by which the address reading system according to an embodiment of the present invention reads the first-level address indication (the prefecture) from the address notation characters after character recognition. First, a character string beginning with a prefecture name is input to the character recognition section 10 of FIG. 5 (step ST100). The CPU of the character recognition unit 110 then refers to the character dictionary 30 and performs recognition processing on the input character string (step ST102). A storage capacity of about 800 characters is normally sufficient for the character dictionary 30, but a larger capacity (for example, 3000 characters) may be used depending on circumstances.

【００４６】When character recognition is complete, the unit 110 outputs a character string beginning with the prefecture name (step ST104). Assuming the characters were recognized correctly, the unit 110 outputs a set of character codes representing, for example, "東", "京", "都", "足", "立", "区", "西", "新", "井", "栄", "町", and so on. If the character recognition contains errors, a set of character codes such as "東", "東", "群", "足", "北", "区", and so on is output.

【００４７】The set of character codes output from the character recognition unit 110 is arranged in order from the upper level of the address notation (the prefecture name) toward the lower levels (city/ward/town names). This set of character codes (character string) is passed to the address recognition unit 20.

【００４８】In the address recognition unit 20, the reading target is first set to "prefecture" so that reading starts from the prefecture name (step ST106). The CPU of the address recognition unit 20 then extracts, from the basic character candidates 30, a prefecture character table consisting only of the characters used in prefecture names (step ST108). A storage capacity of 100 characters is sufficient for the prefecture character table.

【００４９】The CPU of the unit 20 takes the first three characters of the character string after character recognition (for example, "東", "東", "群") and, for each of them, extracts 10 characters from the prefecture character table in descending order of similarity (step ST110). That is, a total of 30 characters are put forward as candidates for reading the prefecture name.

【００５０】Note that 神奈川県 (Kanagawa-ken), 和歌山県 (Wakayama-ken), and 鹿児島県 (Kagoshima-ken) have four characters, but these prefecture names can be identified by their first three characters (神奈川, 和歌山, 鹿児島). On the other hand, if only the first two characters of the character string after character recognition were read, prefecture names written in hiragana (for example, "やまがた県", "やまなし県", "やまぐち県") could not be told apart. Reading three characters makes it possible to identify the prefecture name even when hiragana notation is included.
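The choice of three characters can be checked with a short sketch. The helper name `prefixes_are_unique` is hypothetical, and only the names quoted in the paragraph above are tested, not the full list of prefectures.

```python
def prefixes_are_unique(names, length):
    """Return True if the first `length` characters identify every name."""
    prefixes = [name[:length] for name in names]
    return len(set(prefixes)) == len(prefixes)


hiragana_names = ["やまがた県", "やまなし県", "やまぐち県"]
four_char_names = ["神奈川県", "和歌山県", "鹿児島県"]

print(prefixes_are_unique(hiragana_names, 2))   # False: all three start with やま
print(prefixes_are_unique(hiragana_names, 3))   # True: やまが, やまな, やまぐ
print(prefixes_are_unique(four_char_names, 3))  # True: 神奈川, 和歌山, 鹿児島
```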

【００５１】Next, the CPU of the unit 20 takes one character from each of the three extracted groups of 10 characters, builds three-character combination words one after another, and compares each of these three-character words in turn with the address notation names registered in the address dictionary 50 (including the knowledge database) (step ST112).

【００５２】If this comparison finds a matching word (the registered word "東京都" for the character string "東京都"), or if a corresponding word is found with the help of the knowledge database (the registered word "東京都" for the character string "東東群"), processing moves to the next step (step ST114, yes). If neither a matching word nor a corresponding word is found, the mail piece is treated as unreadable, processing for it ends (step ST114, no), and address reading moves on to another mail piece.
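The three outcomes of step ST114 (matching word, corresponding word, unreadable) can be sketched as below. How the knowledge database is organized is not described here, so the table of known misreadings, its contents, and the name `find_registered_word` are purely assumptions for illustration.

```python
def find_registered_word(word, registered, known_misreadings):
    """Return the registered name for `word`, or None if it is unreadable."""
    if word in registered:
        return word  # matching word
    # Corresponding word via a hypothetical knowledge table, else None.
    return known_misreadings.get(word)


# Hypothetical excerpts of the address dictionary and the knowledge database.
prefectures = {"東京都", "京都府", "群馬県"}
misreadings = {"東東群": "東京都"}

print(find_registered_word("東京都", prefectures, misreadings))  # 東京都
print(find_registered_word("東東群", prefectures, misreadings))  # 東京都
print(find_registered_word("西西西", prefectures, misreadings))  # None
```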

【００５３】When the comparison finds a matching word (the registered word "東京都" for the character string "東京都"), a check is made as to whether the current address level is the final level (step ST116). If it is the final level (step ST116, yes), the CPU of the unit 20 appends the numeric code of the block indication (chome/ban/go and so on) to the character codes of the address notation that has been read, derives the corresponding address classification code from them, and outputs the address reading result (step ST118).

【００５４】At this point (steps ST106 to ST114), however, only the prefecture has been read, so the current level is not the final level (step ST116, no).

【００５５】FIG. 3 continues the procedure of FIG. 2 and shows the procedure for reading the second-level address indication (ward/city and so on) from the address notation characters after character recognition. Based on the comparison result of step ST112, the address recognition unit 20 sets the reading target to "東京都" (Tokyo) so that reading continues with the wards, cities, counties, and islands within Tokyo (step ST120). The CPU of the address recognition unit 20 then extracts, from the basic character candidates 30, a Tokyo character table consisting only of the characters used in place names within Tokyo (step ST122). A storage capacity of 100 characters is sufficient for the Tokyo character table.

【００５６】The CPU of the unit 20 takes five characters of the character string after character recognition, starting from the fourth character (for example, "足", "北", "区", "西", "新"), and, for each of them, extracts 10 characters from the Tokyo character table in descending order of similarity (step ST124). That is, a total of 50 characters are put forward as candidates for reading the names of wards, cities, counties, and islands within Tokyo.

【００５７】Note that the longest place names within Tokyo, 東久留米市 (Higashikurume-shi) and 武蔵村山市 (Musashimurayama-shi), have five characters, and the shortest, such as 北区 (Kita-ku) and 港区 (Minato-ku), have two, while most of these place names consist of three characters (足立区, 府中市, and so on). With five characters, long place names within Tokyo can be told apart even when hiragana notation is included ("ひがしくるめ市", "ひがしむらやま市", "ひがしやまと市", and so on).

【００５８】Next, the CPU of the unit 20 takes one character from each of the five extracted groups of 10 characters, builds combination words of two to five characters one after another, and compares each of these two- to five-character words in turn with the address notation names registered in the address dictionary 50 (including the knowledge database) (step ST126).
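For a sense of scale, the sketch below counts the two- to five-character combinations and shows a dictionary-driven check that gives the same matches without enumerating them. This is a swapped-in alternative, not the procedure of step ST126 itself, and it assumes that an n-character word is built from the first n groups. The candidate groups, the dictionary excerpt, and the function name are hypothetical.

```python
def words_within_candidates(candidate_groups, registered, min_len=2):
    """Return registered words whose i-th character is in the i-th group."""
    hits = []
    for word in registered:
        if min_len <= len(word) <= len(candidate_groups) and all(
            ch in group for ch, group in zip(word, candidate_groups)
        ):
            hits.append(word)
    return sorted(hits)


# Brute-force count for five groups of ten candidates, word lengths 2 to 5.
combinations = sum(10 ** length for length in range(2, 6))

# Hypothetical candidate groups (two per position instead of ten).
groups = [{"足", "是"}, {"北", "立"}, {"区", "凶"}, {"西", "酉"}, {"新", "親"}]
# Hypothetical excerpt of the Tokyo entries in the address dictionary.
tokyo_names = {"足立区", "北区", "府中市", "東久留米市"}

print(combinations)                                  # 111100
print(words_within_candidates(groups, tokyo_names))  # ['足立区']
```

Checking each registered name against the candidate groups costs time proportional to the number of names at the current level, which stays small because the level has already been narrowed.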

【００５９】If this comparison finds a matching word (the registered word "足立区" for the character string "足立区"), or if a corresponding word is found with the help of the knowledge database (the registered word "足立区" for the character string "足北区"), processing moves to the next step (step ST128, yes). If neither a matching word nor a corresponding word is found, the mail piece is treated as unreadable, processing for it ends (step ST128, no), and address reading moves on to another mail piece.

【００６０】When the comparison finds a matching word (the registered word "足立区" for the character string "足立区"), a check is made as to whether the current address level is the final level (step ST130). If it is the final level (step ST130, yes), the CPU of the unit 20 appends the numeric code of the block indication (chome/ban/go and so on) to the character codes of the address notation that has been read, derives the corresponding address classification code from them, and outputs the address reading result (step ST132).

【００６１】At this point (steps ST120 to ST128), however, the place name of a ward, city, or the like within Tokyo (an upper to middle level of the address notation) has been read, so the current level is not the final level (step ST130, no).

【００６２】FIG. 4 continues the procedure of FIG. 3 and shows the procedure for reading the third-level address indication (town/village and other place names) from the address notation characters after character recognition. Based on the comparison result of step ST126, the address recognition unit 20 sets the reading target to "足立区" (Adachi-ku) so that reading continues with the place names within Adachi-ku (step ST134). The CPU of the address recognition unit 20 then extracts, from the basic character candidates 30, an Adachi-ku character table consisting only of the characters used in place names within Adachi-ku (step ST136). A storage capacity of 100 characters is likewise sufficient for the Adachi-ku character table.

【００６３】The CPU of the unit 20 takes seven characters of the character string after character recognition, starting from the seventh character (for example, "西", "新", "井", "栄", "町", "１", "丁"), and, for each of them, extracts 10 characters from the Adachi-ku character table in descending order of similarity (step ST138). That is, a total of 70 characters are put forward as candidates for reading place names within Adachi-ku.

【００６４】Note that the longest place names at this level, such as 西新井栄町 (Nishiarai-Sakaecho), have five characters, and the shortest, such as 青井 (Aoi), have two. With seven characters, long place names within Adachi-ku can be told apart even when hiragana notation is included ("にしあらいさかえ町", "にしあらいほん町", and so on).

【００６５】Next, the CPU of the unit 20 takes one character from each of the seven extracted groups of 10 characters, builds combination words of two to seven characters one after another, and compares each of these two- to seven-character words in turn with the address notation names registered in the address dictionary 50 (including the knowledge database) (step ST140).

【００６６】If this comparison finds a matching word (the registered word "西新井栄町" for the character string "西新井栄町"), or if a corresponding word is found with the help of the knowledge database (for example, the registered word "西新井栄町" for the character string "酉新丼栄田"), processing moves to the next step (step ST142, yes). If neither a matching word nor a corresponding word is found, the mail piece is treated as unreadable, processing for it ends (step ST142, no), and address reading moves on to another mail piece.

【００６７】When the comparison finds a matching word (the registered word "西新井栄町" for the character string "西新井栄町"), a check is made as to whether any part of the input character string has not yet been compared (that is, has not yet been read and judged) (step ST144). If an uncompared character string remains (step ST144, yes), the CPU of the unit 20 refers to the Adachi-ku character table for each character other than the block indication (the parts accompanied by numerals, such as chome/ban/go) and selects, for each character of the uncompared character string, 10 candidate characters used in place names within Adachi-ku (step ST146). All combination words of these character candidates are then compared with the Adachi-ku place names in the address dictionary 50 (step ST148).
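The check in step ST144 needs the remaining characters to be separated from the block indication. The text only says that the block indication is the part accompanied by numerals, so the sketch below assumes, as a simplification, that the block indication begins at the first digit; the function name is hypothetical.

```python
import re


def uncompared_place_chars(rest):
    """Return the characters of `rest` that precede the first digit."""
    return re.split(r"[0-9０-９]", rest, maxsplit=1)[0]


print(repr(uncompared_place_chars("１丁目２番３号")))  # '': nothing left to compare
print(repr(uncompared_place_chars("本町１丁目")))      # '本町': still to be read
```

Place names that themselves contain kanji numerals would need a more careful rule than this digit-based split.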

【００６８】If this comparison finds neither a matching word nor a corresponding word, the mail piece is treated as unreadable, processing for it ends (step ST142, no), and address reading moves on to another mail piece. If a matching or corresponding word is found (step ST142, yes) and no uncompared character string remains afterwards (step ST144, no), the CPU of the unit 20 judges that the final level of the address notation has been reached, appends the numeric code of the block indication (chome/ban/go and so on) to the character codes of the address notation that has been read (東京都足立区西新井栄町), derives the corresponding address classification code from them, and outputs the address reading result (step ST150).
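The level-by-level narrowing of FIGS. 2 to 4 can be summarized in a minimal sketch. To keep the control flow visible it assumes error-free character recognition, so each level is matched exactly instead of through candidate combinations; the nested dictionary `ADDRESS_TREE` and the function `read_address` are hypothetical stand-ins for the address dictionary 50 and the address recognition unit 20.

```python
# Hypothetical excerpt of the address dictionary, nested by hierarchy.
ADDRESS_TREE = {
    "東京都": {
        "足立区": {"西新井栄町": {}, "青井": {}},
        "府中市": {},
    },
    "京都府": {},
}


def read_address(text, tree):
    """Descend level by level; return (names, unread remainder) or None."""
    names, pos = [], 0
    while tree:
        # Only names registered under the current level are considered;
        # longer names are tried first in case one is a prefix of another.
        ordered = sorted(tree, key=len, reverse=True)
        name = next((n for n in ordered if text.startswith(n, pos)), None)
        if name is None:
            return None  # no matching word: treated as unreadable
        names.append(name)
        pos += len(name)
        tree = tree[name]  # narrow the search to the matched area
    return names, text[pos:]


print(read_address("東京都足立区西新井栄町1丁目", ADDRESS_TREE))
# (['東京都', '足立区', '西新井栄町'], '1丁目')
```

The unread remainder corresponds to the block indication, which would then be converted to its numeric code.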

【００６９】[0069]

[Effect of the Invention] According to the present invention, when an address is recognized in order from the upper level, only the character types needed at each level are referred to. Unnecessary character candidates therefore do not appear, and the best character candidates, including the correct character, can be selected, so the rate of correct address reading is improved.

【００７０】In addition, because address recognition begins only after all character recognition has finished, when many addresses are read in succession the character recognition hardware (10) can start character recognition of the next address notation (steps ST10 to ST12 in FIG. 1) while address recognition (steps ST14 to ST30 in FIG. 1) is in progress. Character recognition and address recognition can thus be processed in parallel, which improves processing efficiency.
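The overlap of the two stages can be pictured as a two-stage pipeline. The sketch below uses a Python thread and a one-slot queue only to show the structure; in the system described here the stages run on separate hardware units, and the function names and the stand-in stage functions in the usage line are hypothetical.

```python
import queue
import threading


def run_pipeline(images, recognize_chars, recognize_address):
    """Run character recognition of the next item during address recognition."""
    handoff = queue.Queue(maxsize=1)
    results = []

    def character_stage():
        try:
            for image in images:
                handoff.put(recognize_chars(image))
        finally:
            handoff.put(None)  # sentinel: no more mail pieces

    worker = threading.Thread(target=character_stage)
    worker.start()
    # iter(handoff.get, None) keeps fetching until the sentinel arrives.
    for char_string in iter(handoff.get, None):
        results.append(recognize_address(char_string))
    worker.join()
    return results


# Usage with trivial stand-ins for the two recognition stages.
print(run_pipeline(["a", "b", "c"], str.upper, lambda s: s + "!"))
# ['A!', 'B!', 'C!']
```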

[Brief description of drawings]

FIG. 1 is a diagram outlining how the correct address is read from the address notation characters after character recognition in the address reading system according to an embodiment of the present invention.

FIG. 2 is a flowchart explaining the procedure for reading the first-level address indication (the prefecture) from the address notation characters after character recognition in the address reading system according to an embodiment of the present invention.

FIG. 3 is a flowchart that continues the procedure of FIG. 2 and explains the procedure for reading the second-level address indication (ward/city and so on) from the address notation characters after character recognition.

FIG. 4 is a flowchart that continues the procedure of FIG. 3 and explains the procedure for reading the third-level address indication (town/village and other place names) from the address notation characters after character recognition.

FIG. 5 is a diagram outlining the hardware configuration applied to the address reading system according to an embodiment of the present invention.

FIG. 6 is a diagram explaining a specific example of how address notation characters are recognized.

[Explanation of symbols]

10 ... character recognition section, 102 ... photoelectric conversion unit, 104 ... area detection unit, 106 ... line detection unit, 108 ... character detection unit, 110 ... character recognition unit (CPU), 20 ... address recognition unit (CPU), 30 ... character dictionary (basic character candidates), 40 ... character type table (prefecture name character table / area name character table), 50 ... address dictionary.

Claims

[Claims]

1. An address reading system comprising: an address dictionary storing correct address notation data; character recognition means for recognizing each of the characters constituting one or more address notation words used in an address indication composed of one or more address notation hierarchies, and for outputting a character recognition result; one or more character type tables configured to include the minimum character types necessary for recognizing each of the one or more address notation words; and address recognition means for extracting from the character type table, for the character recognition result recognized by the character recognition means, a predetermined number of character types necessary for determining the address hierarchy to be recognized, and for looking up the address dictionary with each combination of the extracted characters to detect the corresponding address notation word.

2. The address reading system according to claim 1, wherein the character type table includes a second character type table consisting only of the character types necessary for recognizing the area specified by the address notation detected by the address recognition means, and the address recognition means is configured to look up the address dictionary with each combination of characters extracted from the second character type table.

3. An address reading system comprising: an address dictionary storing correct address notation data; character recognition means for recognizing each of the characters constituting a plurality of address notation words used in an address indication that includes at least first and second address notation hierarchies, and for outputting a character recognition result; a first character type table configured to include the minimum character types necessary for recognizing the address notation word of the first address notation hierarchy; first address recognition means for extracting from the first character type table, for the character recognition result recognized by the character recognition means, a predetermined number of character types necessary for determining the first address notation hierarchy, and for looking up the address dictionary with each combination of the extracted characters to detect the corresponding first address notation word; a second character type table configured to include the minimum character types necessary for recognizing the area specified by the first address notation word detected by the first address recognition means; second address recognition means for extracting from the second character type table, for the character recognition result recognized by the character recognition means, a predetermined number of character types necessary for determining the second address notation hierarchy, and for looking up the address dictionary with each combination of the extracted characters to detect the corresponding second address notation word; and means for outputting a corresponding address code from the first and second address notation words.

4. An address reading device comprising: character recognition means for recognizing the characters corresponding to a plurality of character patterns and outputting an address notation character string; an address dictionary storing correct address notation data; a prefecture character table composed of the characters used in prefecture names; prefecture name detecting means for referring to the prefecture character table and the address dictionary to detect the prefecture name corresponding to a combination of the first several characters of the address notation character string recognized by the character recognition means; an area name character table composed of the characters used in area names representing places within the area of the prefecture name detected by the prefecture name detecting means; area name detecting means for referring to the area name character table and the address dictionary to detect the area name corresponding to a combination of several characters that appear after the prefecture name in the address notation character string recognized by the character recognition means; and means for outputting a corresponding address code from the prefecture name detected by the prefecture name detecting means and the area name detected by the area name detecting means.

5. An address reading method comprising: recognizing the characters corresponding to a plurality of character patterns and outputting an address notation character string; referring to a prefecture character table composed of the characters used in prefecture names and to an address dictionary storing correct address notation data, to detect the prefecture name corresponding to a combination of the first several characters of the output address notation character string; referring to an area name character table composed of the characters used in area names representing places within the area of the detected prefecture name and to the address dictionary, to detect the area name corresponding to a combination of several characters that appear after the prefecture name in the output address notation character string; and outputting a corresponding address code from the detected prefecture name and the detected area name.