
JPH02181269A - Address recognizing system

Info

Publication number: JPH02181269A
Application number: JP64000114A
Authority: JP
Inventors: Kaoru Katagiri; 片桐　薫; Hiroyuki Sugai; 菅井　弘幸
Original assignee: Toshiba Corp; Toshiba System Development Co Ltd
Current assignee: Toshiba Corp; Toshiba System Development Co Ltd
Priority date: 1989-01-05
Filing date: 1989-01-05
Publication date: 1990-07-16

Abstract

PURPOSE: To retrieve addresses with high accuracy by preparing all possible meanings for an abbreviation and using both the standard form of a word and the word as input as recognition candidates. CONSTITUTION: An input unit 11 consists of a photoelectric transducer such as a CCD and converts the image on a mail piece into binary image information of '0' and '1'. This image information is supplied to a character detection and recognition unit 13, which extracts the information forming characters from the image information and recognizes each character. Based on this result, a word recognition unit 15 recognizes words by means of a first database 17. Word recognition does not yield a single result; instead, multiple recognition candidates are obtained in descending order of likelihood, so multiple words are output as recognition candidates. After the words are recognized, an address recognition unit 21 recognizes the address by means of a second database 19, and the recognized address is output via an output unit 23.

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to an address recognition method.

(Prior Art) Recognition methods come in many varieties; for address recognition, the following approach has been used.

First, to perform address recognition, characters are recognized, and words are recognized next. Based on this recognition result, the words are combined to search for an address. In this word recognition, when synonyms existed, the abbreviation was given as the candidate. For example, when "HIGHWAY", "HIWAY", or "HWY" was read and recognized, the abbreviation "HWY" was obtained from the dictionary as the recognition candidate. This was very effective when "HIGHWAY", "HIWAY", and "HWY" all had one and the same meaning, namely a highway.

However, repeated tests and studies by the present inventors showed that the meanings of abbreviations can overlap, in which case a correct address search could not be performed.

(Problems to be Solved by the Invention) As described above, the conventional technique represented a word candidate by a single abbreviation, so a correct address search could not be performed when the meanings of abbreviations overlapped.

Accordingly, an object of the present invention is to provide an address retrieval method that achieves more accurate recognition even when the meanings of abbreviations overlap.

[Constitution of the Invention] (Means for Solving the Problems) To solve the above problems, the present invention provides a recognition method comprising input means for inputting a search target and means for performing recognition by comparing a word input from the input means with the contents of a dictionary, characterized in that, for a word for which an abbreviation is used, the standard form of the word is registered in the dictionary, and, when the word is searched for, the standard form is output and the word input from the input means is also output as a recognition candidate.

(Operation) According to the present invention, an abbreviation is given all of its possible meanings and multiple candidates are used as recognition candidates, so address recognition is performed more accurately.

(Embodiment) An embodiment of the present invention will now be described in detail with reference to the drawing.

This embodiment relates to an address recognition device for mail in the United States.

As shown in FIG. 1, this address recognition device includes an input unit 11. The input unit 11 consists of a photoelectric conversion device such as a CCD and converts the image on a mail piece into binary image information of "0" and "1". The image information obtained by the input unit 11 is supplied to a character detection and recognition unit 13, which extracts the information constituting characters from the image information and performs recognition processing on each character.

On this result, a word recognition unit 15 performs word recognition using a first database 17.

After word recognition, an address recognition unit 21 performs address recognition using a second database 19. The recognition result is supplied to an output device 23, which displays it or otherwise outputs it.

Next, the operation of the above configuration is described in detail.

Character detection in the character detection and recognition unit 13 is achieved with well-known techniques. First, distribution statistics of the image information are taken along two axes (horizontal and vertical). That is, a projection is taken along each of the two axes and the number of "1" pixels is counted. Because locations containing characters have more "1"s, that is, more pixels, than other regions, characters are easily detected. At this stage, however, the distribution is computed from pixels at a coarser density than that used in normal recognition processing. For example, if the CCD reads at 8 lines/mm and normal recognition processing also runs at 8 lines/mm, the detection processing above is performed at a resolution of 1 line/mm. Processing with coarse pixels is preferable because it avoids errors in character detection.
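The projection step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the source does not say how the coarse image is derived from the 8 lines/mm scan, so the block-wise OR reduction in `downsample` is an assumed choice, and both function names are hypothetical.

```python
def downsample(image, factor):
    """Shrink a binary image by `factor` along both axes.

    A coarse pixel is 1 if any fine pixel in its block is 1
    (OR reduction; an assumption, not taken from the source).
    """
    height, width = len(image), len(image[0])
    coarse = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = (image[y][x]
                     for y in range(top, min(top + factor, height))
                     for x in range(left, min(left + factor, width)))
            row.append(1 if any(block) else 0)
        coarse.append(row)
    return coarse


def projections(image):
    """Return (count of 1s per image row, count of 1s per image column)."""
    row_counts = [sum(row) for row in image]
    col_counts = [sum(col) for col in zip(*image)]
    return row_counts, col_counts


# A tiny 4x8 scan with ink only in the top-left and bottom-right corners.
fine = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
coarse = downsample(fine, 2)
row_counts, col_counts = projections(coarse)
```

With a real scan, `factor` would be 8 to go from 8 lines/mm to 1 line/mm as in the example in the text.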

In this character detection, projections are taken along two axes. The projection in one direction has breaks (regions with no pixels), and the interval of those breaks is at or above a certain value. This direction is called the first direction. The projection in the other direction has no breaks, or, if it has any, they are narrowly spaced.

Here, the "lines" of text are considered to be laid out along the direction that has the breaks. The lines are therefore cut out along these breaks.
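One possible reading of the line-cutting step is sketched below in Python. The function names and the `min_gap` threshold are hypothetical, and treating "interval at or above a certain value" as a minimum gap width is an interpretation rather than something the source states. The sketch merges ink spans separated by narrow gaps, so only the direction with sufficiently wide breaks yields more than one band.

```python
def ink_bands(profile, min_gap):
    """Return [start, end) spans of non-zero counts in a projection,
    merging spans separated by fewer than `min_gap` empty positions."""
    bands, start = [], None
    for i, count in enumerate(list(profile) + [0]):  # sentinel closes the last span
        if count and start is None:
            start = i
        elif not count and start is not None:
            if bands and start - bands[-1][1] < min_gap:
                bands[-1] = (bands[-1][0], i)   # gap too narrow: extend previous band
            else:
                bands.append((start, i))
            start = None
    return bands


def cut_lines(image, min_gap=2):
    """Cut a binary page image into text-line images.

    The projection that shows wide breaks decides the cutting direction;
    if only the column projection has them, the page is transposed first.
    """
    row_bands = ink_bands([sum(row) for row in image], min_gap)
    col_bands = ink_bands([sum(col) for col in zip(*image)], min_gap)
    if len(col_bands) > 1 and len(row_bands) <= 1:
        transposed = [list(col) for col in zip(*image)]
        return [transposed[a:b] for a, b in col_bands]
    return [image[a:b] for a, b in row_bands]


page = [
    [1, 1, 0, 1, 1, 0],  # first text line
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],  # two empty rows: a break of width 2
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],  # second text line
]
lines = cut_lines(page, min_gap=2)
```

Here the row projection is `[4, 2, 0, 0, 4]`, which has a break of width 2, while every column contains ink, so the page is cut into two horizontal bands.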

Once a line has been cut out, a projection of the image region making up that line is taken along a second direction perpendicular to the first direction. Pixels are then counted in regions that contain characters and not in regions without characters. In this processing, the pixel density may be finer than that used when cutting out the lines. As before, the presence or absence of characters shows up in the projection along the second direction. Characters can therefore be cut out of the line, and character detection is achieved.
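Character cutting within one line can be sketched the same way. In this minimal Python example, `cut_characters` is a hypothetical name, and splitting at every empty column is the simplest possible rule; the source does not say how touching or broken characters are handled.

```python
def cut_characters(line_image):
    """Split one text-line image into per-character images.

    Columns whose projection is zero (no '1' pixels) separate characters.
    """
    col_profile = [sum(col) for col in zip(*line_image)]
    spans, start = [], None
    for x, count in enumerate(col_profile + [0]):  # sentinel closes the last span
        if count and start is None:
            start = x
        elif not count and start is not None:
            spans.append((start, x))
            start = None
    return [[row[a:b] for row in line_image] for a, b in spans]


line = [
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
]
chars = cut_characters(line)  # three character images
```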

Character detection is followed by character recognition. In this embodiment, character recognition is performed using, for example, the composite (multiple) similarity method.

This completes the processing in the character detection and recognition unit 13, which outputs information on characters such as "H", "I", "W", and "Y".

After this character recognition, the word recognition unit 15 recognizes words, which are combinations of characters. Recognition does not yield a single result; instead, multiple recognition candidates are obtained in descending order of likelihood, and reading a mail piece normally produces multiple words as recognition candidates.
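The source does not say how the likelihoods are computed or combined. As a rough illustration only, the Python sketch below assumes each character position comes with scored alternatives and ranks words by the product of the per-character scores; the function name, the scores, and the `top_k` cutoff are all hypothetical.

```python
from itertools import product
from math import prod


def rank_word_candidates(char_candidates, top_k=3):
    """Combine per-character (char, score) alternatives into word candidates,
    ordered from most to least likely (score = product of character scores)."""
    scored = []
    for combo in product(*char_candidates):
        word = "".join(ch for ch, _ in combo)
        score = prod(p for _, p in combo)
        scored.append((word, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical recognizer output for a three-character word.
char_candidates = [
    [("H", 0.9), ("M", 0.1)],
    [("W", 0.8), ("V", 0.2)],
    [("Y", 0.7), ("V", 0.3)],
]
ranked = rank_word_candidates(char_candidates)
words = [word for word, _ in ranked]  # "HWY" ranks first
```

Enumerating every combination grows exponentially with word length, so a practical system would prune; the sketch only shows the idea of an ordered candidate list.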

For the sake of explanation, suppose that the address on the target mail piece contains "xxxx HIWAY" and that, as a result of character recognition, "HIWAY" is extracted as a word. This word "HIWAY" is compared and collated with the contents of the first database 17.

The relevant portion of the first database 17 is organized as shown in Table 1 below.

Table 1

Here, the "letter type" entries in the left column of Table 1 include abbreviations.

Accordingly, since the word read from the mail piece is "HIWAY", it is compared with the "letter type" entries of the first database 17 shown in Table 1. In this example the entry in the second row of Table 1 matches, so its standard form is first extracted as a recognition candidate. In addition, "HIWAY" as written on the mail piece is itself kept as one of the recognition candidates, as a synonym. This is because "HIWAY" may be a place name as well as meaning "highway (HIGHWAY)".
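The lookup described above can be sketched in Python as follows. The dictionary is a hypothetical stand-in for the first database 17: the contents of Table 1 are not available here, so its entries are inferred from the HIGHWAY / HIWAY / HWY example in the text, and `word_candidates` is an illustrative name.

```python
# Hypothetical stand-in for the first database (17): written form -> standard form.
# Entries are inferred from the example in the text, not copied from Table 1.
FIRST_DATABASE = {
    "HIGHWAY": "HIGHWAY",
    "HIWAY": "HIGHWAY",
    "HWY": "HIGHWAY",
}


def word_candidates(word, database=FIRST_DATABASE):
    """Return the recognition candidates for one word read from the mail piece:
    the registered standard form (if any) followed by the word as written."""
    written = word.upper()
    candidates = []
    standard = database.get(written)
    if standard is not None:
        candidates.append(standard)
    if written not in candidates:       # avoid listing the same word twice
        candidates.append(written)
    return candidates


hiway = word_candidates("HIWAY")   # ['HIGHWAY', 'HIWAY']
hwy = word_candidates("HWY")       # ['HIGHWAY', 'HWY']
```

Keeping the written form alongside the standard form is what lets a later stage still match "HIWAY" when it is a place name rather than an abbreviation.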

Thus the two recognition candidates "HIWAY" and "HIGHWAY" are output and sent to the address recognition unit 21. The address recognition unit 21 uses the second database 19 to narrow the recognition candidates down to one. Specifically, it takes into account the words before and after the word that was read and settles on a single recognition candidate. Here, treated as a place name, "xxxx HIGHWAY" is selected.
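How the second database 19 is organized is not described, so the following Python sketch simply assumes it can be queried as a set of known address phrases. The phrases, the set itself, and the `resolve` function are hypothetical; the point is only that the neighbouring words decide which candidate survives.

```python
from itertools import product

# Hypothetical stand-in for the second database (19): known address phrases.
SECOND_DATABASE = {"OLD HIGHWAY", "HIWAY PARK"}


def resolve(candidates_per_word, database=SECOND_DATABASE):
    """Try every combination of per-word candidates, in order, and return
    the first phrase found in the database (None if nothing matches)."""
    for combo in product(*candidates_per_word):
        phrase = " ".join(combo)
        if phrase in database:
            return phrase
    return None


# "OLD HIWAY" on the mail piece: the standard form wins.
as_road = resolve([["OLD"], ["HIGHWAY", "HIWAY"]])
# "HIWAY PARK" on the mail piece: the word as written wins (a place name).
as_place = resolve([["HIGHWAY", "HIWAY"], ["PARK"]])
```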

This result is displayed or otherwise output by the output unit 23.

As another case, suppose the mail piece bears the text "□□□□ HWY XXXX" and it is read exactly as written.

In this case, in recognizing the word "HWY", the words "HWY" and its standard form "HIGHWAY" are extracted as recognition candidates.

The address recognition unit 21 then narrows these two candidates down to one.

[Effects of the Invention] As described above, according to the present invention, both the standard form of a word and the word as input are used as recognition candidates, so address searches are performed more accurately.

[Brief Description of the Drawing]

FIG. 1 is a schematic diagram showing the configuration of an address recognition device according to an embodiment of the present invention. 11: input unit; 13: character detection and recognition unit; 15: word recognition unit; 17: first database; 19: second database; 21: address recognition unit; 23: output unit.

Claims

[Claims]

(1) An address recognition method comprising input means for inputting a search target and means for performing recognition by comparing a word input from the input means with the contents of a dictionary, characterized in that, for a word for which an abbreviation is used, the standard form of the word is registered in the dictionary, and, when the word is searched for, the standard form is output and the word input from the input means is also output as a recognition candidate.