JP2671311B2

JP2671311B2 - Address reader

Info

Publication number: JP2671311B2
Application number: JP62213368A
Authority: JP
Inventors: 一成江上; 徹夫梅田; 重信粕谷
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-08-26
Filing date: 1987-08-26
Publication date: 1997-10-29
Anticipated expiration: 2012-10-29
Also published as: JPS63153689A

Description

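[Detailed Description of the Invention]

[Industrial Field of Application]
The present invention relates to an address reading device that reads an address written as a set of consecutive words on a paper sheet such as a mail piece.

[Prior Art]
In a conventional address reading device, as disclosed in JP-A-57-146380, the set of consecutive words making up an address is collated sequentially against an address dictionary and the best-matching address is extracted. An address notation, however, generally contains many items of destination information: country name, prefecture or state name, city name, town or village name, street name, lot number or street number, company name, building name, and so on. Reading the address therefore proceeds by separating and recognizing these items one by one until the final address is recognized.

[Problems to Be Solved by the Invention]
To detect destination components such as the country name, prefecture or state name, and city name among these many items, the conventional device takes every combination of one or more consecutive words from the word sequence making up the address and collates each combination against the address dictionary. Reading therefore takes a great deal of time, which lowers the throughput of the device.

FIGS. 2(a) to 2(c) are address layout diagrams showing an example of an address notation. FIG. 2(a) shows an address written on three rows: the first row from the bottom carries the city name and postal code, the second row from the bottom carries the street name and area number (lot number), and the third row from the bottom carries the building name and destination floor. FIGS. 2(b) and 2(c) show the address notation of FIG. 2(a) decomposed into individual words. In address reading, address components such as the city name, street name, and building name are detected from the word sequence of FIG. 2(c).

As also described in JP-A-57-146380, detection proceeds upward from the first row from the bottom of the address notation in FIG. 2(a), separating out the address components row by row, and from the broadest division to the narrowest, that is, in the order city name, street name, lot number, building name.

As a result, as shown in FIG. 5, the number of word combinations to be collated against the address dictionary is 3 for the first row, 28 for the second row, and 15 for the third row. The number of collations performed by the conventional device can be estimated as follows for an address dictionary holding 500 country and city names, 2,000 street names, and 500 building names.

The address dictionary of the conventional device is classified by the number of characters (word length) of each registered name:
Country and city names: 18 classes, word lengths 3 to 20
Street names: 25 classes, word lengths 6 to 30
Building names: 25 classes, word lengths 6 to 30

Collation uses dynamic programming (DP) matching, and an input word of length n is collated against the dictionaries of three classes: n−1, n, and n+1. On average the number of collations is therefore:
(1) First row, country and city name detection: 3 combinations × 500 entries ÷ 18 × 3 = 250
(2) Second row, street name and building name detection: 28 combinations × (2,000 + 500 entries) ÷ 25 × 3 = 8,400
(3) Third row, building name detection: 15 combinations × 500 entries ÷ 25 × 3 = 900

Detecting the address of FIG. 2(a) thus requires 9,550 collations in total. If one collation takes 20 μs in the conventional device, executing all of them takes about 190 ms, which corresponds to a processing rate of about 19,000 pieces per hour. Meeting the device specification of 30,000 pieces per hour therefore required two collation processing units in the conventional device.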
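The prior-art estimate above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the 3, 28, and 15 combinations are the runs of consecutive words in rows of 2, 7, and 5 words (the row sizes implied by the 14 numbered words), and that dictionary entries are spread evenly over the length classes, which is what the division by 18 or 25 in the description suggests. The function and variable names are not from the patent.

```python
from fractions import Fraction


def contiguous_combinations(word_count: int) -> int:
    """Number of runs of one or more consecutive words in a row of word_count words."""
    return word_count * (word_count + 1) // 2


def average_collations(combinations: int, entries: int, length_classes: int,
                       classes_searched: int = 3) -> Fraction:
    """Average collations when each combination is compared with the entries of
    classes_searched length classes out of length_classes evenly filled classes."""
    return Fraction(combinations * entries * classes_searched, length_classes)


# (row label, words in the row, dictionary entries searched, number of length classes)
ROWS = [
    ("row 1: country/city name", 2, 500, 18),
    ("row 2: street + building name", 7, 2000 + 500, 25),
    ("row 3: building name", 5, 500, 25),
]

per_row = [average_collations(contiguous_combinations(words), entries, classes)
           for _, words, entries, classes in ROWS]
total_collations = sum(per_row)

MICROSECONDS_PER_COLLATION = 20  # figure quoted in the description
total_ms = total_collations * MICROSECONDS_PER_COLLATION / 1000
pieces_per_hour = 3_600_000 / total_ms  # milliseconds per hour / milliseconds per piece

for (label, *_), count in zip(ROWS, per_row):
    print(f"{label}: {int(count)} collations")
print(f"total: {int(total_collations)} collations, {float(total_ms):.0f} ms per piece, "
      f"about {float(pieces_per_hour):.0f} pieces per hour")
```

The computed rate of roughly 18,800 pieces per hour is what the description rounds to 19,000, well short of the 30,000 pieces per hour specification.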
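An object of the present invention is to solve this problem by providing an address reading device that detects a notational feature of addresses written on paper sheets such as mail pieces, namely the keywords of the address notation, and uses the positions of those keywords within the consecutive words making up the address to limit the words subjected to address discrimination, thereby reducing the number of collations against the address dictionary and greatly shortening the reading time.

[Means for Solving the Problems]
The address reading device of the present invention comprises character reading means for reading a character string of an address notation written on a paper sheet, word extraction means for editing the character string output of the character reading means into words, address dictionary storage means for storing address character strings corresponding to the address notation, and address discrimination means for discriminating an address by collating the words output from the word extraction means against the character strings stored in the address dictionary storage means. The device further comprises keyword dictionary storage means for storing predetermined keywords corresponding to the address notation, keyword detection means for detecting keywords contained in the address notation, and means for limiting, on the basis of the keyword detection result, the combinations of words that the address discrimination means collates against the character strings stored in the address dictionary storage means.

[Operation]
With this configuration, keywords representing features of the words making up an address notation are registered in the keyword dictionary storage means. The character string output by the character reading means is divided into words by the word extraction means, and the keyword detection means first collates these words against the registered keywords to detect keywords. From the detected keywords, the word limiting means determines which word combinations should be collated for the address component to be detected next, and only the limited words are sent to the address discrimination means, which discriminates the address corresponding to the read character string. Because the keywords narrow the collation range, the collation time is shortened.

[Embodiment]
The present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of one embodiment of the present invention. This embodiment is likewise configured to read the city name, street name, lot number, and building name from an address notation such as that shown in FIGS. 2(a) to 2(c). As in the conventional device, detection proceeds upward from the first row from the bottom of the address notation in FIG. 2(a), separating out the address components row by row, and from the broadest division to the narrowest, that is, in the order city name, street name, lot number, building name.

This embodiment consists of a character reading unit 1, a word extraction unit 2, keyword detection means 3, keyword dictionary storage means 4, an address discrimination unit 5, and an address dictionary storage unit 6.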
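The keyword detection means 3 consists of an input word selection unit 31 that receives the words output from the word extraction unit 2, a maximum match detection unit 32 that collates an input word against the keyword dictionary, a keyword determination unit 33 that determines keywords, an input word limiting unit 34 that uses the determined keywords to select the words the address discrimination unit 5 uses for address detection, and a word buffer 35 that stores the output word information. The keyword dictionary storage means 4 consists of a keyword dictionary storage unit 41, a keyword dictionary index unit 42, and a collation keyword buffer 43.

The output character string 100 of the character reading unit 1 is fed to the input of the word extraction unit 2, and the output word string 101 of the word extraction unit 2 is input to the input word selection unit 31. The first output of the input word selection unit 31, a word 102, is connected to the first input of the maximum match detection unit 32. Its second output, the word length data 103 of that word, is input to the dictionary index unit 42. On the basis of the word length data 103, the dictionary index unit 42 receives the registered data 104 of the corresponding word lengths from the keyword dictionary storage unit 41. The dictionary registration data 105 output by the dictionary index unit 42 is input to the collation keyword buffer 43, and the dictionary registration data 106 output by the collation keyword buffer 43 is connected to the second input of the maximum match detection unit 32. The collation result 107 of the maximum match detection unit 32 is input to the keyword determination unit 33. The first output of the keyword determination unit 33, a keyword determination end signal 108, is connected to the input word selection unit 31, and its second output, the keyword determination result 109, is input to the input word limiting unit 34. The output 110 of the input word limiting unit 34 is input to the word buffer 35, the output 111 of the word buffer 35 is connected to the first input of the address discrimination unit 5, and the dictionary registration data 112 output by the address dictionary storage unit 6 is connected to the second input of the address discrimination unit 5.

Next, the operation of this embodiment is described with reference to FIGS. 1, 2, and 6 to 8.

The character reading unit 1 outputs the reading result 100 of the characters shown in FIG. 2(a) row by row, as a continuous sequence from the left end or the right end of each row. The word extraction unit 2 detects the breaks between words (spaces and the like) and separates the text into words as in FIG. 2(b), assigning word numbers (1), (2), (3), ... (14) in order from the rightmost word of the bottom row leftward. This word string 101 is sent to the input word selection unit 31.

The data structure of the output word string 101 of the word extraction unit 2 is shown in FIG. 7. It consists of the total number of detected words, the number of words in each row, the number of characters in each word, and pointers to the memory addresses holding the reading results for the characters of each word.

Next, the operation of the keyword detection means is described using the flowchart of FIG. 6.

The input word selection unit 31 takes the total word count from the input word string information 101 shown in FIG. 7 and sets it in a counter WC that controls the keyword detection loop 108. It then initializes to 1 the register WN, which holds the number of the word undergoing keyword detection, and starts keyword detection.

First, the word length data of the word whose number is set in register WN, that is, the number of characters in the word, is taken from the word string information 101 of FIG. 7 and transferred to the keyword dictionary index unit 42, which reads out the dictionary.

Next, the character reading output for the same word is transferred to the maximum match detection unit 32, and the unit waits for the readout from the keyword dictionary to finish. Completion of the keyword readout is detected when the collation keyword buffer 43 becomes "Ready". Meanwhile, the keyword dictionary index unit 42 operates as shown in FIG. 6(b). On receiving the word length data n of a word, it examines the word length and decides the range of keyword dictionary groups to consult. In this embodiment, when n is 5 characters or fewer, the keyword dictionary group of the same word length as the input word is read out and set in the collation keyword buffer 43. When n is from 6 to 16 characters, the three keyword dictionary groups of word length (n−1), n, and (n+1) are consulted. When n is 17 characters or more, the five keyword dictionary groups (n−2), (n−1), n, (n+1), and (n+2) are consulted. Once all the dictionary groups have been read out and set in the collation keyword buffer 43, the buffer "Ready" flag is set and the maximum match detection unit 32 is started. The maximum match detection unit 32 collates the word 102 supplied by the input word selection unit 31 against the keywords 106 supplied by the collation keyword buffer 43.

The input word selection unit 31 inputs the words of FIG. 2(b) to the maximum match detection unit 32 in order, starting with word (1) of the first row. The maximum match detection unit 32 performs the collation by the DP matching method and outputs the result to the keyword determination unit 33.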
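The keyword lookup for a single word, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of FIG. 1: the description only says "DP matching", so plain Levenshtein edit distance is used here as a stand-in; the acceptance threshold `max_distance` is hypothetical; and the small dictionary holds only keywords named in the description, since Table 1 is not reproduced in this text.

```python
def dictionary_group_lengths(n: int) -> list[int]:
    """Word-length groups consulted for an input word of n characters
    (the rule given for the keyword dictionary index unit 42)."""
    if n <= 5:
        return [n]
    if n <= 16:
        return [n - 1, n, n + 1]
    return [n - 2, n - 1, n, n + 1, n + 2]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming (a stand-in for DP matching)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                        # deletion
                               current[j - 1] + 1,                     # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


# Hypothetical keyword dictionary grouped by word length.
KEYWORDS_BY_LENGTH = {
    3: ["BLK"],
    4: ["BLDG"],
    5: ["BLOCK", "FLOOR"],
    6: ["AVENUE"],
    9: ["SINGAPORE"],
}


def best_keyword(word: str, keywords_by_length: dict[int, list[str]],
                 max_distance: int = 1):
    """Return (keyword, distance) for the closest keyword within the threshold,
    or None when no consulted keyword is close enough."""
    best = None
    for length in dictionary_group_lengths(len(word)):
        for keyword in keywords_by_length.get(length, []):
            distance = edit_distance(word, keyword)
            if distance <= max_distance and (best is None or distance < best[1]):
                best = (keyword, distance)
    return best


print(best_keyword("SINGAP0RE", KEYWORDS_BY_LENGTH))  # one misread character
print(best_keyword("AVENU", KEYWORDS_BY_LENGTH))      # 5 letters: only the length-5 group is consulted
```

Restricting short words to their own length group keeps the number of comparisons small, at the cost of missing a short keyword that lost or gained a character during reading.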
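Keywords are registered in the keyword dictionary storage unit 41 together with their attribute codes, for example as shown in Table 1.

From word (1) of the first row to word (14) of the third row of FIG. 2(b), the keyword determination unit 33 outputs a request 108 for collation with the next word to the input word selection unit 31, and the collation is repeated.

The keyword dictionary of Table 1 consists of word length data used for indexing, the keywords, and their attribute codes. An attribute code represents the nature of a keyword. For example, as shown in the layout diagram of FIG. 4, it consists of 16 bits: bit 0 (the MSB) is a flag for keywords used in street names, bit 1 is a flag for keywords used in building names, and bit 2 is a flag for keywords accompanied by a number. In addition, bit 14 flags a keyword placed at the beginning (prefix), bit 15 (the LSB) flags a keyword placed at the end (suffix), and bit 13 flags a keyword whose position is not fixed. For example, the attribute code of the keyword "AVENUE" is (8001)₁₆, indicating that it is used in street names and is placed at the end of the name. The attribute code of the keyword "BLK" or "BLOCK" is (2002)₁₆, indicating that it is a keyword accompanied by a number and that the number follows the keyword.

For speeding up address detection, it is also effective to register keywords that appear very frequently in address notations, such as "SINGAPORE", which is written on almost every mail piece in Singapore.

As shown in FIG. 6(a), the keyword determination unit 33 stores in temporary memory the best-matching keyword candidate that satisfies a given threshold, and repeats the same keyword detection for every word of the input word string 101. When keyword detection has finished for the whole input word string 101, the presence or absence of keywords is examined for each row of the address. The results are set in row keyword flag registers as shown in FIG. 8.

In the collation of word (2) of the first row of FIG. 2(b), the word matches "SINGAPORE" in the keyword dictionary of Table 1, and the keyword determination result 109 is sent to the input word limiting unit 34. The code (3000)₁₆ is set in the row-1 keyword flag register. In the second row, word (3) matches the keyword "AVENUE" and word (9) matches the keyword "BLOCK", and the determination results 109 are sent to the input word limiting unit 34. The code (A000)₁₆ is set in the row-2 keyword flag register. In the third row, word (10) matches the keyword "FLOOR" and word (12) matches the keyword "BLDG", and the determination results 109 are sent to the input word limiting unit 34. The code (6000)₁₆ is set in the row-3 keyword flag register. The input word limiting unit 34 refers to the keyword flag register of each row and to the keyword attribute codes to decide the word combinations to be used for address discrimination, and sets them in the word buffer 35.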
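The attribute codes quoted above can be decoded with simple bit masks. The sketch below follows the bit numbering of the description, where bit 0 is the MSB and bit 15 the LSB of a 16-bit code. The way a row keyword flag register is built here (OR-ing the upper four bits of each keyword's code) is an assumption: it reproduces the quoted values (A000)₁₆ and (6000)₁₆ for rows 2 and 3, but the description does not state the rule, and the row-1 value (3000)₁₆ is not derived here. All names are illustrative.

```python
# Flags named in the description, keyed by bit number (bit 0 = MSB, bit 15 = LSB).
FLAG_BITS = {
    0: "street_name",
    1: "building_name",
    2: "with_number",
    13: "position_free",
    14: "prefix",
    15: "suffix",
}


def mask(bit: int) -> int:
    """Mask for a bit numbered from the MSB of a 16-bit word."""
    return 1 << (15 - bit)


def decode(code: int) -> set[str]:
    """Names of the flags set in an attribute code."""
    return {name for bit, name in FLAG_BITS.items() if code & mask(bit)}


# Attribute codes quoted in the description.
KEYWORD_ATTRIBUTES = {
    "AVENUE": 0x8001,
    "BLK": 0x2002,
    "BLOCK": 0x2002,
    "FLOOR": 0x2004,
    "BLDG": 0x4001,
}

CATEGORY_MASK = 0xF000  # assumption: the row register keeps only the upper four bits


def row_flag_register(keywords_in_row: list[str]) -> int:
    """OR together the category bits of every keyword found in one row (assumed rule)."""
    register = 0
    for keyword in keywords_in_row:
        register |= KEYWORD_ATTRIBUTES[keyword] & CATEGORY_MASK
    return register


for keyword, code in KEYWORD_ATTRIBUTES.items():
    print(f"{keyword}: {code:04X} -> {sorted(decode(code))}")
print(f"row 2 register: {row_flag_register(['AVENUE', 'BLOCK']):04X}")
print(f"row 3 register: {row_flag_register(['FLOOR', 'BLDG']):04X}")
```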
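FIGS. 3(a) to 3(c) are explanatory diagrams showing how the keywords of Table 1 are detected. FIG. 3(a) shows the keyword detection result for the first row. From the attribute code (2002)₁₆ of the keyword "SINGAPORE", the input word limiting unit 34 learns that word (2) of the first row is a keyword followed by a number. The city name has therefore already been detected in the first row, and the address discrimination unit 5 no longer needs to process that row.

In the second row, words (3) and (9) have been detected as keywords, as shown in FIG. 3(b). Word (3) is the keyword "AVENUE", and its attribute code (8001)₁₆ shows that it is a keyword placed at the end of a street name. Word (9) is the keyword "BLOCK", and its attribute code (2002)₁₆ shows that it is a keyword followed by a number. It follows that in the second row the street name can be discriminated from the combinations of consecutive words that end with word (3).

Accordingly, only the four word combinations of the steps shown in FIG. 3(b) need to be collated against the street name dictionary in the address dictionary storage unit 6, which is 1/7 of the 28 combinations required when no keywords are detected (FIG. 5(b)).

In the third row, words (10) and (12) have been detected as keywords, as shown in FIG. 3(c). Word (10) is the keyword "FLOOR", and its attribute code (2004)₁₆ shows that it is accompanied by a number either before or after the keyword. Word (12) is the keyword "BLDG", and its attribute code (4001)₁₆ shows that it is used in building names and is placed at the end of the name. It follows that in the third row the building name can be discriminated from the combinations of consecutive words that end with word (12).

Accordingly, only the two word combinations of the steps shown in FIG. 3(c) need to be collated against the building name dictionary in the address dictionary storage unit 6, which is 1/7.5 of the 15 combinations required when no keywords are detected (FIG. 5(c)).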
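The narrowing described above can be illustrated by enumerating word combinations directly. The sketch below is one possible reading that reproduces the counts of four and two; the exact steps of FIG. 3 are not recoverable from this text. It assumes that a candidate name is a run of consecutive words ending at a suffix keyword, that the keyword on its own is not a candidate, and that a number keyword and the number after a prefix-type keyword cannot be part of a name. Rows are given as word numbers in reading order, left to right, which is descending because words are numbered from the right.

```python
STREET, BUILDING, WITH_NUMBER = 0x8000, 0x4000, 0x2000
PREFIX, SUFFIX = 0x0002, 0x0001


def restricted_combinations(row: list[int], keyword_codes: dict[int, int],
                            name_flag: int) -> list[list[int]]:
    """Runs of consecutive words that end at a suffix keyword carrying name_flag.

    row           word numbers in reading order (left to right)
    keyword_codes attribute code of each word detected as a keyword
    """
    # Positions that cannot belong to a name: number keywords and, for
    # prefix-type number keywords, the number that follows them (assumption).
    blocked = set()
    for position, word in enumerate(row):
        code = keyword_codes.get(word, 0)
        if code & WITH_NUMBER:
            blocked.add(position)
            if code & PREFIX:
                blocked.add(position + 1)

    combinations = []
    for end, word in enumerate(row):
        code = keyword_codes.get(word, 0)
        if not (code & name_flag and code & SUFFIX):
            continue
        # Grow the run leftward from the suffix keyword until a blocked word is met.
        for start in range(end - 1, -1, -1):
            if start in blocked:
                break
            combinations.append(row[start:end + 1])
    return combinations


row2 = [9, 8, 7, 6, 5, 4, 3]   # (9) = "BLOCK", (3) = "AVENUE"
row3 = [14, 13, 12, 11, 10]    # (12) = "BLDG", (10) = "FLOOR"

street_candidates = restricted_combinations(row2, {9: 0x2002, 3: 0x8001}, STREET)
building_candidates = restricted_combinations(row3, {12: 0x4001, 10: 0x2004}, BUILDING)

print(len(street_candidates), street_candidates)
print(len(building_candidates), building_candidates)
```

Without keywords, the same rows yield 7 × 8 ÷ 2 = 28 and 5 × 6 ÷ 2 = 15 runs of consecutive words.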
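The input word limiting unit 34 places the limited word combinations in the word buffer 35. The address discrimination unit 5 then, as in the conventional device, collates the input words 111 against the address dictionary stored in the address dictionary storage unit 6 and outputs the address discrimination result 113.

The number of dictionary collations in this embodiment can be estimated as follows:
(1) Keyword detection: 14 words × 15 keywords = 210
(2) First row, country and city name detection: 0 (the city name has already been detected by keyword detection)
(3) Second row, street name detection: 4 combinations × 2,000 entries ÷ 25 × 3 = 960
(4) Third row, building name detection: 2 combinations × 500 entries ÷ 25 × 3 = 120

The total number of dictionary collations is therefore 1,290, a reduction to roughly 1/7.5 of the 9,550 collations of the conventional device, which performs no keyword detection.

[Effects of the Invention]
As described above, by detecting keywords in the course of discriminating an address from a series of words written on a paper sheet, the present invention can limit the words subjected to address discrimination. This greatly reduces the number of collations against the address dictionary and greatly shortens the address reading time.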
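The collation-count estimate for the embodiment can be checked the same way as the prior-art figure. The sketch below only repeats the arithmetic of the description, using integer division because every term divides evenly. The variable names are illustrative.

```python
# Keyword detection: each of the 14 words is collated against 15 keywords.
keyword_detection = 14 * 15

# Row 1: the city name is already found by keyword detection.
row1_city = 0

# Rows 2 and 3: combinations x entries x 3 length classes searched / 25 length classes.
row2_street = 4 * 2000 * 3 // 25
row3_building = 2 * 500 * 3 // 25

embodiment_total = keyword_detection + row1_city + row2_street + row3_building

CONVENTIONAL_TOTAL = 9550  # prior-art total quoted in the description
reduction_factor = CONVENTIONAL_TOTAL / embodiment_total

print(f"keyword detection: {keyword_detection}")
print(f"row 2 street name: {row2_street}")
print(f"row 3 building name: {row3_building}")
print(f"total: {embodiment_total}, reduction factor: {reduction_factor:.2f}")
```

The computed factor is about 7.4, which the description rounds to 1/7.5.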

[Brief Description of the Drawings]
FIG. 1 is a block diagram showing the configuration of one embodiment of the present invention. FIGS. 2(a) to 2(c) are address layout diagrams showing an example of an address notation and its word structure. FIGS. 3(a) to 3(c) are address layout diagrams explaining the operation of this embodiment. FIG. 4 is a flag layout diagram showing the structure of the keyword dictionary of this embodiment. FIGS. 5(a) to 5(c) are address layout diagrams explaining the word collation of the conventional device. FIG. 6 is a flowchart explaining the operation of the keyword detection means. FIG. 7 is a diagram of the word string information data structure. FIG. 8 is a diagram of the row keyword flag register structure.

1 ... character reading unit, 2 ... word extraction unit, 3 ... keyword detection means, 4 ... keyword dictionary storage means, 5 ... address discrimination unit, 6 ... address dictionary storage unit, 31 ... input word selection unit, 32 ... maximum match detection unit, 33 ... keyword determination unit, 34 ... input word limiting unit, 35 ... word buffer, 41 ... keyword dictionary storage unit, 42 ... keyword dictionary index unit, 43 ... collation keyword buffer, 100 ... output character string, 101 ... output word string, 102 ... word, 103 ... word length data, 104 ... word length registration data, 105, 112 ... dictionary registration data, 106 ... output data, 107 ... collation output, 108 ... determination end signal, 109, 113 ... determination output.

Continuation of front page
(56) References: JP-A-57-137976 (JP, A); JP-A-60-233782 (JP, A); JP-A-57-146380 (JP, A)

Claims

(57) [Claims]
An address reading device comprising: character reading means for reading a character string of an address notation written on a paper sheet; word extraction means for editing the character string output of the character reading means into words; address dictionary storage means for storing address character strings corresponding to the address notation; and address discrimination means for discriminating an address by collating the words output from the word extraction means against the character strings stored in the address dictionary storage means, the address reading device being characterized by further comprising: keyword dictionary storage means for storing predetermined keywords corresponding to the address notation; keyword detection means for detecting keywords contained in the address notation; and means for limiting, on the basis of the keyword detection result, the combinations of the words that are collated in the address discrimination means against the character strings stored in the address dictionary storage means.