JPH02181269A - Address recognizing system - Google Patents

Address recognizing system

Info

Publication number
JPH02181269A
Authority
JP
Japan
Prior art keywords
recognizing
recognition
words
word
candidates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP64000114A
Other languages
Japanese (ja)
Inventor
Kaoru Katagiri
片桐 薫
Hiroyuki Sugai
菅井 弘幸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Toshiba System Development Co Ltd
Original Assignee
Toshiba Corp
Toshiba System Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp, Toshiba System Development Co Ltd
Priority to JP64000114A
Publication of JPH02181269A

Landscapes

  • Character Discrimination (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

PURPOSE: To retrieve addresses with high accuracy by allowing for every possible meaning of an abbreviation and using both the standard form of a word and the word as input as recognition candidates. CONSTITUTION: An input unit 11 consists of a photoelectric converter such as a CCD and converts the image on a mail item into binary image information of '0's and '1's. The image information obtained by the input unit 11 is supplied to a character detection/recognition unit 13, which extracts the information forming characters from the image information and recognizes each character. Based on this result, a word recognition unit 15 recognizes words using a first database 17. Word recognition does not yield a single result; instead, several recognition candidates are obtained in descending order of likelihood, so that several words are output as recognition candidates. After the words are recognized, an address recognition unit 21 recognizes the address using a second database 19. The recognized address is then output via an output unit 23.

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to an address recognition system.

(Prior Art) There are many kinds of recognition methods; for address recognition, the following approach has been used.

First, to perform address recognition, characters are recognized, and then words are recognized. Based on this result, the words are combined to search for an address. During word recognition, when synonyms existed, the abbreviation was given as the candidate. For example, when "HIGHWAY", "HIWAY", or "HWY" was read and recognized, the abbreviation "HWY" was obtained from the dictionary as the recognition candidate. This was very effective as long as "HIGHWAY", "HIWAY", and "HWY" all had one and the same meaning, namely a highway.

However, after repeated testing and study, the present inventors found that an abbreviation can have more than one meaning, in which case the correct address could not be retrieved.

(Problem to Be Solved by the Invention) As described above, the conventional technique represented a word candidate by a single abbreviation, so the correct address could not be retrieved when the abbreviation had more than one meaning.

An object of the present invention is therefore to provide an address retrieval system that recognizes addresses more accurately even when an abbreviation has more than one meaning.

[Constitution of the Invention] (Means for Solving the Problem) To solve the above problem, the present invention is a recognition system comprising input means for inputting a search target and means for performing recognition by comparing a word input from the input means with the contents of a dictionary, characterized in that, for a word for which an abbreviation is used, the standard form of the word is registered in the dictionary in advance, and, when the word is looked up, the standard form is output and the word input from the input means is also output, both as recognition candidates.

(Operation) According to the present invention, an abbreviation is allowed to carry all of its possible meanings and several candidates are used as recognition candidates, so address recognition is performed more accurately.

(Embodiment) An embodiment of the present invention will now be described in detail with reference to the drawing.

This embodiment relates to an address recognition device for mail in the United States.

As shown in FIG. 1, the address recognition device includes an input unit 11. The input unit 11 consists of a photoelectric converter such as a CCD and converts the image on a mail item into image information of "0"s and "1"s. The image information obtained by the input unit 11 is supplied to a character detection/recognition unit 13. The character detection/recognition unit 13 extracts the information forming characters from the image information and performs recognition processing on each character.

A word recognition unit 15 then performs word recognition on this result using a first database 17.

After word recognition, an address recognition unit 21 recognizes the address using a second database 19. The recognition result is supplied to an output unit 23, which displays it or otherwise outputs it.

The processing performed by the above configuration is described in detail below.

Character detection in the character detection/recognition unit 13 is achieved with well-known techniques. First, distribution statistics are taken over the image information along two axes (horizontal and vertical). That is, a projection is taken along each of the two axes and the number of "1"s is counted. Because the number of "1"s, i.e. the pixel count, is higher where characters are present than elsewhere, characters are easy to detect. At this stage, however, the distribution is taken at a coarser pixel density than is used for ordinary recognition processing. For example, if the CCD reads at 8 lines/mm and ordinary recognition processing also runs at 8 lines/mm, this detection step runs at a precision of 1 line/mm. Processing with coarse pixels is preferable because it avoids errors in character detection.
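The projection step can be illustrated with a minimal Python sketch. The function names and the way the image is coarsened (a block becomes 1 if any pixel in it is 1) are assumptions made for illustration; the source only states that detection uses a coarser pixel density than recognition, for example 1 line/mm instead of 8 lines/mm.

```python
# Illustrative sketch only: names and the coarsening rule are assumptions,
# not taken from the patent text.

def downsample(image, factor):
    """Coarsen a binary image: each factor x factor block becomes 1 if any
    pixel inside it is 1 (factor=8 would turn 8 lines/mm into 1 line/mm)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            int(any(image[r][c]
                    for r in range(r0, min(r0 + factor, rows))
                    for c in range(c0, min(c0 + factor, cols))))
            for c0 in range(0, cols, factor)
        ]
        for r0 in range(0, rows, factor)
    ]

def projections(image):
    """Count the "1" pixels in every row and in every column."""
    row_counts = [sum(row) for row in image]
    col_counts = [sum(col) for col in zip(*image)]
    return row_counts, col_counts
```

Regions where both counts are high are the places where characters are likely to be present.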

In this character detection, projections are taken along both axes. In one of the directions, the projection has gaps (regions containing no pixels), and the interval of those gaps is at least a certain value. This direction is called the first direction. The projection in the other direction either has no gaps or, if it has any, their interval is narrow.

Here, the "lines" of text are considered to be laid out along the direction that has gaps. The lines are therefore cut out at these gaps.
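A sketch of how the first direction might be chosen and the lines cut out, given the two projection profiles. The helper names and the `min_gap` threshold are hypothetical, and the rule used here (a profile qualifies if at least one of its gaps is `min_gap` wide or wider) is one possible reading of "at least a certain value".

```python
# Illustrative sketch only: helper names and the min_gap rule are assumptions.

def runs(counts):
    """Return [start, end) index pairs of maximal stretches with count > 0."""
    out, start = [], None
    for i, n in enumerate(counts):
        if n > 0 and start is None:
            start = i
        elif n == 0 and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(counts)))
    return out

def interior_gaps(counts):
    """Widths of the empty stretches between consecutive runs."""
    r = runs(counts)
    return [b[0] - a[1] for a, b in zip(r, r[1:])]

def first_direction(row_counts, col_counts, min_gap):
    """Return 'rows' or 'cols' for the profile with a wide enough gap, else None."""
    for name, counts in (("rows", row_counts), ("cols", col_counts)):
        if any(g >= min_gap for g in interior_gaps(counts)):
            return name
    return None
```

Once the first direction is known, `runs` applied to that profile gives the extent of each text line.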

Once a line has been cut out, a projection of the image region forming that line is taken along a second direction perpendicular to the first direction. Pixels are counted in regions that contain characters and are not counted in regions that do not. In this step, the pixel density may be finer than the one used when cutting out lines. As before, the presence or absence of characters shows up in the projection along the second direction. Characters can thus be cut out of the line, which completes character detection.
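Cutting characters out of a single line can be sketched in the same way. This is a simplified illustration that assumes the line runs horizontally and that characters are separated by at least one empty pixel column; `cut_characters` is a hypothetical name.

```python
from itertools import groupby

# Illustrative sketch only: assumes a horizontal line image given as a list
# of pixel rows, with empty columns between characters.

def cut_characters(line_image):
    """Split a line image into per-character images at empty columns."""
    col_counts = [sum(col) for col in zip(*line_image)]
    chars, pos = [], 0
    for has_ink, group in groupby(col_counts, key=lambda n: n > 0):
        width = len(list(group))
        if has_ink:
            # Keep the columns pos .. pos+width-1 of every pixel row.
            chars.append([row[pos:pos + width] for row in line_image])
        pos += width
    return chars
```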

Character detection is followed by character recognition. In this embodiment, character recognition is performed using, for example, the composite similarity method.

This completes the processing in the character detection/recognition unit 13, which outputs information about characters such as "H", "I", "W", and "Y".

After character recognition, the word recognition unit 15 recognizes words, which are combinations of characters. Recognition does not simply yield a single result; several recognition candidates are obtained in descending order of likelihood, and in the usual case reading a mail item produces several words as recognition candidates.
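The source does not say how the likelihood of a word candidate is computed. As a purely illustrative sketch, the following assumes that each character position comes with scored alternatives and that a word's likelihood is the product of its character scores; the function name and the scores are hypothetical.

```python
from itertools import product
from math import prod

# Illustrative sketch only: the scoring rule (product of per-character
# scores) is an assumption, not the method described in the patent.

def ranked_words(char_candidates, top_n):
    """Combine per-position (character, score) alternatives into word
    candidates and return the top_n words, most likely first."""
    scored = [
        ("".join(ch for ch, _ in combo), prod(score for _, score in combo))
        for combo in product(*char_candidates)
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [word for word, _ in scored[:top_n]]
```

For example, `ranked_words([[("H", 0.9), ("N", 0.1)], [("W", 0.8), ("V", 0.2)], [("Y", 0.7), ("V", 0.3)]], 2)` ranks "HWY" ahead of "HWV".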

For the sake of explanation, suppose that the address on the mail item contains "XXXX HIWAY" and that character recognition has extracted "HIWAY" as a word. This word "HIWAY" is compared with the contents of the first database 17.

The relevant part of the first database 17 is organized as shown in Table 1 below.

Table 1

Here, the "letter type" entries in the left column of Table 1 include abbreviations.

Since the word read from the mail item is "HIWAY", it is compared with the "letter type" entries of the first database 17 shown in Table 1. In this example, the entry in the second row of Table 1 matches, so its standard form is first extracted as a recognition candidate. In addition, "HIWAY" as written on the mail item is treated as a synonym and also made one of the recognition candidates. This is because "HIWAY" may be a place name rather than a highway (HIGHWAY).
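The lookup described above can be sketched as follows. Table 1 is not reproduced in the source, so the dictionary below is a hypothetical stand-in for the first database 17, built only from the written forms mentioned in the text; `word_candidates` is likewise an illustrative name.

```python
# Illustrative sketch only. FIRST_DATABASE is a hypothetical stand-in for
# the first database 17: written form ("letter type") -> standard form.
FIRST_DATABASE = {
    "HIGHWAY": "HIGHWAY",
    "HIWAY": "HIGHWAY",
    "HWY": "HIGHWAY",
}

def word_candidates(word, dictionary=FIRST_DATABASE):
    """Return the standard form (if registered) followed by the word
    exactly as it was read, without duplicates."""
    candidates = []
    standard = dictionary.get(word)
    if standard is not None:
        candidates.append(standard)
    if word not in candidates:
        candidates.append(word)
    return candidates
```

With this stand-in, `word_candidates("HIWAY")` yields both "HIGHWAY" and "HIWAY", so a later stage can still treat "HIWAY" as a place name.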

In this way, two recognition candidates, "HIWAY" and "HIGHWAY", are output and sent to the address recognition unit 21. The address recognition unit 21 uses the second database 19 to narrow the recognition candidates down to one. Specifically, it takes the words before and after the read word into account and settles on a single recognition candidate. Here, treating it as a place name, "XXXX HIGHWAY" is selected.
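The narrowing step can be sketched as a lookup that tries each candidate together with its neighbouring words. The contents of `SECOND_DATABASE`, the "HIWAY CITY" entry, and the matching rule (exact match of the joined words) are assumptions for illustration; the source only states that the second database 19 and the surrounding words are used.

```python
# Illustrative sketch only. SECOND_DATABASE is a hypothetical stand-in for
# the second database 19; "HIWAY CITY" is an invented place-name entry.
SECOND_DATABASE = {"XXXX HIGHWAY", "HIWAY CITY"}

def resolve(words, index, candidates, address_db=SECOND_DATABASE):
    """Try each candidate in place of words[index] and return the first
    phrase found in the address database, or None if none matches."""
    for cand in candidates:
        phrase = " ".join(words[:index] + [cand] + words[index + 1:])
        if phrase in address_db:
            return phrase
    return None
```

Under these assumptions, `resolve(["XXXX", "HIWAY"], 1, ["HIGHWAY", "HIWAY"])` selects "XXXX HIGHWAY", while `resolve(["HIWAY", "CITY"], 0, ["HIGHWAY", "HIWAY"])` keeps the word as written and selects "HIWAY CITY".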

This result is displayed or otherwise output by the output unit 23.

Suppose also that a mail item bears the text "□□□□ HWY XXXX" and that it is read exactly as written.

In this case, when the word "HWY" is recognized, the words "HWY" and its standard form "HIGHWAY" are extracted as recognition candidates.

The address recognition unit 21 then narrows these two candidates down to one.

[Effect of the Invention] As described above, according to the present invention, both the standard form of a word and the word as input are used as recognition candidates, so address retrieval is performed more accurately.

[Brief Description of the Drawing]

FIG. 1 is a schematic diagram showing the configuration of an address recognition device according to an embodiment of the present invention.

11: input unit
13: character detection/recognition unit
15: word recognition unit
17: first database
19: second database
21: address recognition unit
23: output unit

Agent: Patent Attorney 則…

Claims (1)

[Claims] (1) An address recognition system, in a recognition system comprising input means for inputting a search target and means for performing recognition by comparing a word input from the input means with the contents of a dictionary, characterized in that, for a word for which an abbreviation is used, the standard form of the word is registered in the dictionary in advance, and, when the word is looked up, the standard form is output and the word input from the input means is also output, both as recognition candidates.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP64000114A JPH02181269A (en) 1989-01-05 1989-01-05 Address recognizing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP64000114A JPH02181269A (en) 1989-01-05 1989-01-05 Address recognizing system

Publications (1)

Publication Number Publication Date
JPH02181269A true JPH02181269A (en) 1990-07-16

Family

ID=11465031

Family Applications (1)

Application Number Title Priority Date Filing Date
JP64000114A Pending JPH02181269A (en) 1989-01-05 1989-01-05 Address recognizing system

Country Status (1)

Country Link
JP (1) JPH02181269A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03150668A (en) * 1989-11-08 1991-06-27 Fujitsu Ltd Input character string normalization system for retrieval system
JPH06231142A (en) * 1993-01-29 1994-08-19 Toraberu Data:Kk Computer system for traveling agent

Similar Documents

Publication Publication Date Title
KR890002580B1 (en) Method for distinguishing between complex character sets
JPH11232291A (en) Method for retrieving protein three-dimensional structure data base
JPH02181269A (en) Address recognizing system
JPH08221510A (en) Device and method for processing form document
JP3375819B2 (en) Recognition method combining method and apparatus for performing the method
JPH08272811A (en) Document management method and device therefor
JPH02173886A (en) Word recognizing system
Lee et al. Table structure recognition based on grid shape graph
JP2685257B2 (en) Recognition method
JPH0247788B2 (en)
JPH02166586A (en) Address retrieving system
JP3419425B2 (en) Recognition character correction device
JPH064600A (en) Method and device for image retrieval
Sethi et al. Local structural association for retrieval and recognition of signature images
JPH053631B2 (en)
JPH02173883A (en) Address retrieval system
JPH02173885A (en) Address retrieval system
JPS58125183A (en) Method for displaying unrecognizable character in optical character reader
CN110674367A (en) Single Chinese character retrieval method and device based on travel industry products
JP2865443B2 (en) Kanji conversion device for Kana name or Kana corporation name
JPH04191992A (en) Information processor
JPS63136286A (en) Online character recognition system
JPS63138479A (en) Character recognizing device
JP2977244B2 (en) Character recognition method and character recognition device
JP2839515B2 (en) Character reading system