JPH08171615A

JPH08171615A - Address reader

Info

Publication number: JPH08171615A
Application number: JP6317164A
Authority: JP
Inventors: Shunichi Fukushima; 俊一福島; Eiki Ishidera; 永記石寺; Hideki Shimomura; 秀樹下村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1994-12-20
Filing date: 1994-12-20
Publication date: 1996-07-02
Anticipated expiration: 2013-07-30
Also published as: JP2780654B2

Abstract

PURPOSE: To judge accurately whether an address string is printed and, when it is printed, to read it accurately and efficiently. CONSTITUTION: When a print-like place name part candidate exists, a print-priority address part search means 11 extracts from a recognition result buffer 4 the combinations of address part character candidates that match both the attributes in a print attribute buffer 9 and the address part conditions in a place name table memory 5. When no print-like place name part candidate exists, a general address part search means 10 extracts from the recognition result buffer 4 the combinations of address part character candidates that match the address part conditions in the place name table memory 5. An address part candidate buffer 12 stores the result of the print-priority address part search means 11 or of the general address part search means 10. A final determination means 13 decides the character string of the final address reading result based on a place name part candidate buffer 7 and the address part candidate buffer 12.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an address reading device that reads an address character string, consisting of a place name part and an address part, from input image data. Taking "神奈川県川崎市宮前区宮崎４−１−１" (Kanagawa-ken Kawasaki-shi Miyamae-ku Miyazaki 4-1-1) and "港区芝五丁目７の１" (Minato-ku Shiba 5-chome 7-no-1) as examples of address character strings, the place name part in this specification refers to "神奈川県川崎市宮前区宮崎" or "港区芝", and the address part refers to "４−１−１" or "五丁目７の１".

[0002]

[Prior Art] FIG. 3 is a block diagram showing the configuration of a conventional address reading device.

[0003] The image input means 1 inputs image data on which an address character string consisting of a place name part and an address part is written. The image buffer 2 stores the input image data. The individual recognition means 3 cuts out from the image data segments that each correspond to one character and performs individual character recognition on each segment. The recognition result buffer 4 stores the result of the individual recognition means 3. The place name table memory 5 stores a list of the place names to be read and the address part conditions corresponding to each place name. The place name part search means 6 searches the place name table memory 5 and extracts place name part candidates corresponding to combinations of character candidates in the recognition result buffer 4. The place name part candidate buffer 7 stores the extracted place name part candidates. The general address part search means 10 extracts from the recognition result buffer 4 the combinations of address part character candidates that match the address part conditions in the place name table memory 5. The address part candidate buffer 12 stores the result of the general address part search means 10. The final determination means 13 decides the character string of the final address reading result based on the place name part candidate buffer 7 and the address part candidate buffer 12.

[0004] FIG. 4 is a block diagram showing another configuration of a conventional address reading device.

[0005] The image input means 1 inputs image data on which an address character string consisting of a place name part and an address part is written. The image buffer 2 stores the input image data. The individual recognition means 3 cuts out from the image data segments that each correspond to one character and performs individual character recognition on each segment. The recognition result buffer 4 stores the result of the individual recognition means 3. The place name table memory 5 stores a list of the place names to be read and the address part conditions corresponding to each place name. The place name part search means 6 searches the place name table memory 5 and extracts place name part candidates corresponding to combinations of character candidates in the recognition result buffer 4. The place name part candidate buffer 7 stores the extracted place name part candidates. The general address part individual recognition means 14 cuts out from the image data segments that each correspond to one character of the address part and performs individual character recognition on each segment. The address part recognition result buffer 16 stores the result of the general address part individual recognition means 14. The general address part search means 10 extracts from the address part recognition result buffer 16 the combinations of address part character candidates that match the address part conditions in the place name table memory 5. The address part candidate buffer 12 stores the result of the general address part search means 10. The final determination means 13 decides the character string of the final address reading result based on the place name part candidate buffer 7 and the address part candidate buffer 12.

[0006] The conventional configurations of FIG. 3 and FIG. 4 differ in how the characters of the address part are segmented and recognized. In the configuration of FIG. 3, the address part is processed using the same segmentation and recognition results as the place name part (recognition result buffer 4). In the configuration of FIG. 4, segmentation and recognition are run again for the address part, separately from the place name part, and those results (address part recognition result buffer 16) are processed.

[0007] FIG. 5 shows an example of the contents of the place name table memory 5, based on the address structure of Shinagawa-ku, Tokyo. The place name table has a place name list 56 and address part conditions 57.

[0008] The place name list 56 registers place names such as prefecture names 50, city and ward names 51, and town names 52, together with their hierarchical relationships. Depending on the region targeted for address reading, it may also include district (gun) names, oaza names, aza names, and the like.

[0009] The address part conditions 57 describe, for each town name 52, the range of values allowed for each of the chome, ban, and go numbers. For example, for "荏原" (Ebara) in FIG. 5, the chome runs from 1-chome to 7-chome, the ban may take values from 1 to 50, and the go may take values from 1 to 9.
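As an illustration of such a table, the following minimal sketch holds the Ebara ranges quoted above and tests whether a chome/ban/go triple is allowed. The class name `AddressCondition`, the dictionary `PLACE_NAME_TABLE`, and the key layout are assumptions made for this example; the specification does not prescribe a data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressCondition:
    """Allowed value ranges for one town name (hypothetical structure)."""
    chome: range
    ban: range
    go: range

    def allows(self, chome: int, ban: int, go: int) -> bool:
        # Every item must fall inside its registered range.
        return chome in self.chome and ban in self.ban and go in self.go


# Keyed by the hierarchy (prefecture, ward, town); values follow the Ebara example.
PLACE_NAME_TABLE = {
    ("東京都", "品川区", "荏原"): AddressCondition(
        chome=range(1, 8), ban=range(1, 51), go=range(1, 10)
    ),
}

ebara = PLACE_NAME_TABLE[("東京都", "品川区", "荏原")]
print(ebara.allows(2, 7, 1))  # all three values are inside their ranges
print(ebara.allows(8, 7, 1))  # 8-chome is outside 1..7
```

Keying the table by the full hierarchy keeps the hierarchical relationship of the place names and the address part conditions in one lookup.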

[0010] In the example of FIG. 5 the address part is assumed to consist of chome, ban, and go, but some town names 52 use a banchi and go format instead. In such cases, either a type flag is attached (chome/ban/go type or banchi/go type), or one of the three items chome, ban, and go is left unused (for example, by setting its value to 0).

[0011] The place name part search means 6 collates combinations of character candidates in the recognition result buffer 4 (combinations of the character candidates for adjacent segments) against the place name strings in the place name list 56 of the place name table memory 5 described above, and outputs to the place name part candidate buffer 7, as place name part candidates, those sequences of place names that do not contradict the hierarchical relationships of the place names.
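The collation of character candidates against hierarchically registered place names can be sketched as follows. This is only an illustration under simplifying assumptions: `HIERARCHY` is a hypothetical two-level table (ward to towns), each segment carries a plain list of candidate characters, and the string is assumed to start with the ward name.

```python
# Hypothetical two-level hierarchy: ward name -> town names registered under it.
HIERARCHY = {
    "品川区": ["荏原", "大崎"],
    "港区": ["芝"],
}


def match_at(candidates, start, name):
    """Return the end index if each character of `name` appears among the
    candidates of consecutive segments beginning at `start`, else None."""
    end = start + len(name)
    if end > len(candidates):
        return None
    if all(ch in candidates[start + i] for i, ch in enumerate(name)):
        return end
    return None


def place_name_candidates(candidates):
    """Collect (place name string, end index) pairs that respect the hierarchy."""
    results = []
    for ward, towns in HIERARCHY.items():
        ward_end = match_at(candidates, 0, ward)
        if ward_end is None:
            continue
        for town in towns:
            # A town is only tried directly after a ward that contains it.
            town_end = match_at(candidates, ward_end, town)
            if town_end is not None:
                results.append((ward + town, town_end))
    return results


# Candidate characters per segment, as individual recognition might return them.
segments = [["品", "晶"], ["川", "州"], ["区", "凶"], ["荏", "花"], ["原"], ["二"]]
print(place_name_candidates(segments))
```

The returned end index marks where the place name part stops, which is the position from which an address part would then be formed.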

[0012] The general address part search means 10 combines the character candidates in the recognition result buffer 4 (for the configuration of FIG. 3) or in the address part recognition result buffer 16 (for the configuration of FIG. 4) to form address part strings. In doing so, it refers to the place name part candidates in the place name part candidate buffer 7 and thereby applies several constraints to the formation of the address part string.

[0013] The first constraint is to find the end position of the place name part candidate and to form the address part starting from the position immediately after it.

[0014] The second constraint is to obtain from the place name table memory 5 the address part conditions corresponding to the place name part candidate and to combine the digit strings for chome, ban, and go so that those conditions are satisfied.

[0015] In addition, independently of the place name part candidate, whether the sequence and combination of the separators that delimit the digit strings, such as "丁目" (chome), "番" (ban), "号" (go), "番地" (banchi), "の", "ノ", and "−", is natural is also used as a constraint in forming the address part string. Strings that satisfy these address part constraints are output to the address part candidate buffer 12 as address part candidates.
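One way to picture the separator constraint is as a small set of accepted separator sequences. The sketch below is an assumption-laden illustration: the three patterns in `PATTERNS` are examples chosen for this sketch rather than an exhaustive list from the specification, and only Arabic digits are handled (kanji numerals such as "五" would need a separate conversion step).

```python
import re

# Hypothetical accepted layouts: chome-ban-go with explicit separators.
PATTERNS = [
    re.compile(r"(\d+)丁目(\d+)番(\d+)号"),
    re.compile(r"(\d+)丁目(\d+)[のノ](\d+)"),
    re.compile(r"(\d+)[-\u2212](\d+)[-\u2212](\d+)"),
]


def parse_address_part(text):
    """Return (chome, ban, go) if the separators form a natural sequence,
    otherwise None."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return tuple(int(group) for group in match.groups())
    return None


print(parse_address_part("2丁目7の1"))
print(parse_address_part("2番丁目7"))  # unnatural separator order
```

A string that parses can then be checked against the value ranges registered for the town name, which corresponds to the second constraint described earlier.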

[0016] The conventional address reading device described above and methods of implementing each of its components are described in, for example, "Automatic Mail Address Reading and Sorting Machine for the Japanese Ministry of Posts and Telecommunications" (Ishikawa et al., NEC Giho, Vol. 44, No. 3, 1991), "Automatic Mail Address Reading and Sorting Machine TR-17" (Torimoto et al., Toshiba Review, Vol. 45, No. 2, 1990), JP-A-5-324899 "Device for Recognizing Addresses Written on Mail", JP-A-3-189780 "Address Recognition Device", JP-A-6-124366 "Address Reading Device", and JP-A-5-169033 "Addressee Reading Device".

[0017]

[Problems to Be Solved by the Invention] For address strings written inside character boxes preprinted on forms and the like, character segmentation is relatively easy. However, many items, such as mail, have no character boxes, and from the writer's point of view it is more convenient to write without being constrained by boxes. It has therefore become necessary to segment characters accurately in address strings written freely without character boxes.

[0018] On the other hand, with the spread of word processors and the progress of computerization in many kinds of business, the proportion of printed addresses is also increasing. For printed strings, the uniformity of character size and pitch makes segmentation easier than for handwritten strings. Conventional character segmentation methods are described in JP-A-5-166099 "Character Segmentation and Recognition Method and Apparatus", JP-A-5-128307 "Character Recognition Device", and elsewhere.

[0019] However, handwritten and printed items are generally processed mixed together, and there are few cases in which it suffices to specify in advance whether the input is handwritten or printed and to switch the processing accordingly.

[0020] The conventional address reading devices described above, however, process handwriting and print without distinguishing between them. Consequently, printed strings are also segmented under the assumption of widely varying character sizes and pitches, just as handwritten strings are.

[0021] In that case, the following problems arise for the character string of the address part.

[0022] The first problem is that, particularly for vertically written "二" (two) and "三" (three), many possible interpretations arise, and the competition among them makes it impossible to determine the reading of the address part uniquely. For example, when "三" is written vertically, the possibilities "一一一", "一二", "二一", and "三" all arise. If the string were known to be printed, the character size and pitch conditions could be used to reject candidates such as "一一一", "一二", and "二一" and to decide uniquely on "三".
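The ambiguity of three stacked horizontal strokes, and its resolution once a character pitch is known, can be illustrated with a short sketch. The functions `segmentations` and `filter_by_pitch` and the tolerance value are assumptions made for this example, not a procedure taken from the specification.

```python
KANJI = {1: "一", 2: "二", 3: "三"}


def segmentations(strokes):
    """Enumerate every way of grouping `strokes` stacked horizontal bars
    into the numerals 一, 二, and 三, reading top to bottom."""
    if strokes == 0:
        return [""]
    readings = []
    for size in (1, 2, 3):
        if size <= strokes:
            readings += [KANJI[size] + rest for rest in segmentations(strokes - size)]
    return readings


def filter_by_pitch(readings, extent, pitch, tolerance=0.25):
    """Keep readings whose character count matches the number of character
    cells implied by a known pitch (the printed-string assumption)."""
    expected = extent / pitch
    return [r for r in readings if abs(len(r) - expected) <= tolerance]


candidates = segmentations(3)
print(candidates)
# With a pitch of 40 units, a 40-unit extent can hold only one character.
print(filter_by_pitch(candidates, extent=40, pitch=40))
```

Without the pitch, all four readings compete; with it, only the single-character reading survives.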

[0023] The second problem is that in printed characters (especially word-processor characters), many typefaces are designed so that the top and bottom horizontal bars of "二" and "三" lie right at the upper and lower limits of the character size. Segmentation based simply on the gaps between strokes therefore tends to merge these bars with the preceding or following character. FIG. 6 shows examples of mis-segmentation caused by this problem. For the printed string "品川区荏原二丁目" (Shinagawa-ku Ebara 2-chome), FIG. 6(a) is the correct segmentation. In FIG. 6(b), by contrast, the upper bar of "二" has been included in the segment of the preceding "原", and in FIG. 6(c) the lower bar of "二" has been included in the segment of the following "丁". As a result, in cases (b) and (c) the address is misread as "品川区荏原一丁目" (Shinagawa-ku Ebara 1-chome). Depending on the character recognition method, it is quite possible for a vertically written "原一" to be recognized as the single character "原", or for a vertically written "一丁" to be recognized as the single character "丁".

[0024] Within the configurations of conventional address reading devices such as those of FIG. 3 and FIG. 4, the following two countermeasures against these problems of mixed handwriting and print are conceivable.

[0025] The first countermeasure is to have the individual recognition means 3 judge whether the input is handwritten or printed and to reflect that judgment in the character segmentation.

[0026] The second countermeasure is to run the individual recognition means 3, the place name part search means 6, the general address part individual recognition means 14, the general address part search means 10, and so on without regard to handwriting or print, and to refine the way place name part candidates and address part candidates are selected at the stage of the final determination means 13.

[0027] However, these two countermeasures have the following drawbacks.

[0028] The drawback of the first countermeasure is that there is a limit to how accurately handwriting and print can be distinguished at the stage of the individual recognition means 3. If the judgment is made at that stage, it amounts to the following: if, among the possible ways of segmenting the characters, one with equal size and equal pitch can be found, the string is judged to be printed, and otherwise it is judged to be handwritten. However, when the judgment rests only on physical conditions such as equal size and equal pitch, a handwritten string may be forcibly chopped into segments of equal size and pitch, and even for a printed string several possible segmentations may arise, so that it is cut at the wrong points. FIG. 7 shows an example in which (a) is the correct segmentation, whereas (b) and (c) are wrong segmentations even though they have equal size and equal pitch.
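The weakness of a purely physical test can be seen in a small sketch. The function `looks_equal_size_and_pitch`, the (start, end) span representation, and the tolerance are assumptions made for this illustration only.

```python
from statistics import mean, pstdev


def looks_equal_size_and_pitch(segments, tol=0.1):
    """Judge a segmentation only by its geometry: segment sizes and the
    distances between segment centers must both be nearly constant.
    `segments` is a list of (start, end) spans along the string direction."""
    sizes = [end - start for start, end in segments]
    centers = [(start + end) / 2 for start, end in segments]
    pitches = [b - a for a, b in zip(centers, centers[1:])]

    def steady(values):
        return len(values) < 2 or pstdev(values) <= tol * mean(values)

    return steady(sizes) and steady(pitches)


correct = [(0, 40), (44, 84), (88, 128)]
shifted = [(22, 62), (66, 106)]          # cut half a character off, still regular
irregular = [(0, 40), (44, 60), (64, 128)]

print(looks_equal_size_and_pitch(correct))
print(looks_equal_size_and_pitch(shifted))
print(looks_equal_size_and_pitch(irregular))
```

The shifted cut passes the geometric test even though it splits characters in the wrong places, which is the limitation described above: geometry alone cannot tell a correct regular segmentation from a wrong one.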

[0029] The drawback of the second countermeasure is that, even for a printed string, many place name part candidates and address part candidates are generated, and processing takes time. When character segmentation produces many candidates, the amount of subsequent processing on those results generally grows combinatorially. In applications such as reading addresses on mail, a time limit is imposed on the reading of each address, and when the number of combinations becomes large, processing can no longer be completed within the limit.
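The combinatorial growth can be made concrete with a minimal calculation. The function name `combination_count` and the sample candidate lists are hypothetical; the point is only that the number of strings to examine is the product of the candidate counts per segment.

```python
import math


def combination_count(candidates_per_segment):
    """Number of distinct strings that can be formed by choosing one
    candidate character for each segment."""
    return math.prod(len(candidates) for candidates in candidates_per_segment)


few = [["二"], ["丁"], ["目"]]
many = [["二", "一", "三"], ["丁", "了", "一"], ["目", "日", "月"]]
print(combination_count(few), combination_count(many))
```

Each additional segmentation hypothesis multiplies this count again, so pruning candidates early matters when a per-address time limit applies.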

[0030] The object of the present invention is therefore to eliminate the drawbacks of the conventional address reading devices described above: to judge accurately whether a string is printed even when no external indication of handwriting or print is given, and, when the string is printed, to exploit its properties (equal size, equal pitch, and so on) to achieve accurate and efficient reading.

[0031]

[Means for Solving the Problems] The first invention is an address reading device that reads an address character string consisting of a place name part and an address part from input image data, characterized by comprising: individual recognition means that cuts out from the image data segments each corresponding to one character and performs individual character recognition on each segment; a recognition result buffer that stores the result of the individual recognition means; a place name table memory in which a list of the place names to be read and the address part conditions corresponding to each place name are registered; place name part search means that searches the place name table memory and extracts place name part candidates corresponding to combinations of character candidates in the recognition result buffer; a place name part candidate buffer that stores the place name part candidates; print-likeness determination means that judges how print-like the character string constituting each place name part candidate is and, when a print-like place name part candidate exists, gives that candidate priority among the place name part candidates; a print attribute buffer that stores the attributes of the print-like place name part candidate; print-priority address part search means that, when a print-like place name part candidate exists, extracts from the recognition result buffer the combinations of address part character candidates that match the attributes in the print attribute buffer and the address part conditions in the place name table memory; general address part search means that, when no print-like place name part candidate exists, extracts from the recognition result buffer the combinations of address part character candidates that match the address part conditions in the place name table memory; an address part candidate buffer that stores the result of the print-priority address part search means or of the general address part search means; and final determination means that decides the character string of the final address reading result based on the place name part candidate buffer and the address part candidate buffer.

[0032] The second invention is an address reading device that reads an address character string consisting of a place name part and an address part from input image data, characterized by comprising: individual recognition means that cuts out from the image data segments each corresponding to one character and performs individual character recognition on each segment; a recognition result buffer that stores the result of the individual recognition means; a place name table memory in which a list of the place names to be read and the address part conditions corresponding to each place name are registered; place name part search means that searches the place name table memory and extracts place name part candidates corresponding to combinations of character candidates in the recognition result buffer; a place name part candidate buffer that stores the place name part candidates; print-likeness determination means that judges how print-like the character string constituting each place name part candidate is and, when a print-like place name part candidate exists, gives that candidate priority among the place name part candidates; a print attribute buffer that stores the attributes of the print-like place name part candidate; print-priority address part individual recognition means that, when a print-like place name part candidate exists, cuts out from the image data segments each corresponding to one character of the address part so as to match the attributes in the print attribute buffer and performs individual character recognition on each segment; general address part individual recognition means that, when no print-like place name part candidate exists, cuts out from the image data segments each corresponding to one character of the address part and performs individual character recognition on each segment; an address part recognition result buffer that stores the result of the print-priority address part individual recognition means or of the general address part individual recognition means; general address part search means that extracts from the address part recognition result buffer the combinations of address part character candidates that match the address part conditions in the place name table memory; an address part candidate buffer that stores the result of the general address part search means; and final determination means that decides the character string of the final address reading result based on the place name part candidate buffer and the address part candidate buffer.

[0033]

[Operation] In the present invention, place name part candidates are first extracted, and the print-likeness of each candidate is then judged. If a print-like place name part candidate exists, it is given priority. Furthermore, when a print-like place name part candidate is obtained, attributes of that candidate such as character size and pitch are used in processing the address part.

[0034] Because information on the print-likeness of the place name part candidate is used to narrow down the segmentation candidates for the address part, the two problems noted above as drawbacks of the conventional approach, which does not distinguish handwriting from print, are eliminated. That is, a printed "三" is no longer split into "一一一", "一二", or "二一", and an address part segmentation that is inconsistent with the character size and pitch of the place name part, as in FIG. 6(c), can be rejected.
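A rough sketch of how attributes taken from a print-like place name part might prune address part segmentations follows. The names `PrintedAttributes` and `consistent_with_print`, the span representation, and the tolerance are assumptions for illustration; the specification does not define a concrete test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrintedAttributes:
    """Attributes carried over from a print-like place name part (hypothetical)."""
    char_size: float  # typical segment extent along the string direction
    pitch: float      # typical distance between neighboring segment centers


def consistent_with_print(segments, last_place_center, attrs, tol=0.2):
    """Accept a proposed address part cut only if each segment fits within the
    character size and continues the pitch of the place name part.
    `segments` is a list of (start, end) spans along the string direction."""
    previous_center = last_place_center
    for start, end in segments:
        size = end - start
        center = (start + end) / 2
        if size > attrs.char_size * (1 + tol):
            return False  # larger than one printed character cell
        if abs((center - previous_center) - attrs.pitch) > attrs.pitch * tol:
            return False  # center does not fall on the expected pitch
        previous_center = center
    return True


attrs = PrintedAttributes(char_size=40, pitch=44)
good_cut = [(224, 264), (268, 308)]  # "二" then "丁", both on pitch
bad_cut = [(224, 230), (234, 308)]   # one bar alone, the rest merged into "丁"
print(consistent_with_print(good_cut, 200, attrs))
print(consistent_with_print(bad_cut, 200, attrs))
```

Anchoring the check to the center of the last place name segment ties the address part to the same grid as the place name part, which is what lets a cut like the second one be rejected.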

[0035] Also, because print-like candidates are given priority only after the place name part candidates have been obtained, the problem of the first conventional countermeasure is eliminated. That is, even if the segments have equal size and equal pitch, a wrong segmentation such as FIG. 7(b) or (c) cannot yield a place name part candidate from any combination of character candidates, and so it is excluded. Even if a place name part candidate is read from a segmentation of the place name part such as FIG. 6(b), the character size and pitch are disturbed at the "原一" portion, so as long as an equal-size, equal-pitch place name part candidate such as FIG. 6(a) exists, (b) is excluded and (a) is given priority.

[0036] Furthermore, because the formation of superfluous address part candidates is suppressed on the basis of print-likeness once the place name part candidates have been obtained, the poor processing efficiency that was the drawback of the second conventional countermeasure is also eliminated.

[0037]

[Embodiment] FIG. 1 is a block diagram showing the configuration of an embodiment of the first invention.

[0038] The image input means 1 inputs image data on which an address character string consisting of a place name part and an address part is written. The image buffer 2 stores the input image data. The individual recognition means 3 cuts out from the image data segments that each correspond to one character and performs individual character recognition on each segment. The recognition result buffer 4 stores the result of the individual recognition means 3. The place name table memory 5 stores a list of the place names to be read and the address part conditions corresponding to each place name. The place name part search means 6 searches the place name table memory 5 and extracts place name part candidates corresponding to combinations of character candidates in the recognition result buffer 4. The place name part candidate buffer 7 stores the extracted place name part candidates. The print-likeness determination means 8 judges, for each place name part candidate in the place name part candidate buffer 7, how print-like its constituent character string is and, when a print-like place name part candidate exists, rewrites the contents of the place name part candidate buffer 7 so that the print-like candidate takes priority among the place name part candidates. The print attribute buffer 9 stores the attributes of the print-like place name part candidate detected by the print-likeness determination means 8. When a print-like place name part candidate exists, the print-priority address part search means 11 extracts from the recognition result buffer 4 the combinations of address part character candidates that match the attributes in the print attribute buffer 9 and also match the address part conditions in the place name table memory 5. When no print-like place name part candidate exists, the general address part search means 10 extracts from the recognition result buffer 4 the combinations of address part character candidates that match the address part conditions in the place name table memory 5. The address part candidate buffer 12 stores the result of the print-priority address part search means 11 or of the general address part search means 10. The final determination means 13 decides the character string of the final address reading result based on the place name part candidate buffer 7 and the address part candidate buffer 12.

【００３９】 Of these components, the image input means 1, image buffer 2, individual recognition means 3, recognition result buffer 4, place name table memory 5, place name part search means 6, place name part candidate buffer 7, general address part search means 10, address part candidate buffer 12, and final determination means 13 are the same as the corresponding components of the conventional address reading device of FIG. 3. The three new components are the printed-type determination means 8, the printed-type attribute buffer 9, and the printed-type priority address part search means 11.

【００４０】 These three components are described below.

【００４１】 The printed-type determination means 8 first judges, for each place name part candidate in the place name part candidate buffer 7, how likely its constituent character string is to be printed type. This judgment can focus on, for example, the following properties of the characters (segments) that make up each candidate. For printed type, the answer to each of these questions is YES.
(A) Are the center points of the segments aligned in a straight line along the character string direction?
(B) Is the spacing (pitch) between the segment center points uniform?
(C) Is the segment size stable?
(D) Is the segment width stable?

【００４２】 FIG. 8(a) shows an example that is judged to be printed type, and also illustrates what is meant by the center point 41, pitch 42, size 43, and width 44 of a segment 40. Because the coordinates of each segment are obtained at the character segmentation stage, the center point, pitch, size, and width are easy to compute.
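
By way of illustration only, the derivation of these quantities from segment coordinates might be sketched as follows in Python. The sketch assumes horizontal writing and axis-aligned bounding boxes; the `Segment` class, its field names, and the reading of "size" as the box height are assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical bounding box of one cut-out character (image coordinates).
    left: float
    top: float
    right: float
    bottom: float

    @property
    def center(self):
        # Center point 41 of the segment.
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

    @property
    def width(self):
        # Width 44: extent along the (horizontal) string direction.
        return self.right - self.left

    @property
    def size(self):
        # Size 43: assumed here to be the box height.
        return self.bottom - self.top


def pitches(segments):
    # Pitch 42: successive differences of center x-coordinates.
    xs = [s.center[0] for s in segments]
    return [b - a for a, b in zip(xs, xs[1:])]


segs = [Segment(0, 0, 10, 12), Segment(14, 0, 24, 12), Segment(28, 1, 38, 13)]
```

For a vertically written string, the roles of the two axes would simply be swapped.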

【００４３】 For (A), it suffices to compute, over the center points of the characters, whether the difference between the maximum and minimum coordinate values in the direction perpendicular to the character string direction falls within a fixed value. FIG. 8(b) shows an example in which this difference is large, so the string cannot be judged to be printed type.
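
Check (A) might be sketched as follows. The function name `centers_aligned`, the `tolerance` parameter (the "fixed value"), and the sample coordinates are hypothetical and chosen only for this example.

```python
def centers_aligned(centers, tolerance, horizontal=True):
    """Check (A): the spread of center coordinates perpendicular to the
    string direction must not exceed `tolerance`."""
    # For horizontal writing the perpendicular coordinate is y (index 1).
    axis = 1 if horizontal else 0
    values = [c[axis] for c in centers]
    return max(values) - min(values) <= tolerance


printed = [(5, 6.0), (19, 6.2), (33, 5.9)]      # nearly collinear centers
handwritten = [(5, 6.0), (19, 9.5), (33, 3.0)]  # centers wander vertically
```

With a tolerance of 1.0, the first list passes and the second fails, which corresponds to the situations of FIG. 8(a) and FIG. 8(b) respectively.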

【００４４】 For (B), the pitch is computed by taking successive differences of the center-point coordinates along the character string direction, and it suffices to judge whether the difference between the maximum and minimum pitch falls within a fixed value. FIG. 8(c) shows an example in which the pitch varies widely, so the string cannot be judged to be printed type.
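
Check (B) might be sketched in the same way. The name `pitch_uniform` and the `tolerance` parameter are hypothetical, and treating a string with fewer than two characters as "not confirmable" is an assumption of this sketch, since the specification does not address that case.

```python
def pitch_uniform(centers, tolerance, horizontal=True):
    """Check (B): the difference between the largest and smallest pitch
    must not exceed `tolerance`."""
    # For horizontal writing the string direction is x (index 0).
    axis = 0 if horizontal else 1
    coords = [c[axis] for c in centers]
    pitches = [b - a for a, b in zip(coords, coords[1:])]
    if not pitches:
        # Assumption: with fewer than two characters no pitch exists,
        # so uniformity cannot be confirmed.
        return False
    return max(pitches) - min(pitches) <= tolerance
```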

【００４５】 For (C) and (D), the size and width are likewise computed from differences between the coordinates of each segment's vertices, and their variation is examined. However, some characters, such as the kanji 「一」 and the digit 「１」, yield small size or width values, so they need to be treated as exceptions.
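
One possible way to handle such exceptions is sketched below. The specification does not say how the exceptional characters are to be treated; excluding values far below the median, and the `min_ratio` parameter that controls this, are assumptions made purely for this example.

```python
from statistics import median


def dimension_stable(values, tolerance, min_ratio=0.5):
    """Checks (C)/(D): spread of sizes or widths within `tolerance`,
    ignoring values much smaller than the typical one."""
    typical = median(values)
    # Assumption: values below min_ratio * median belong to inherently
    # small characters and are left out of the comparison.
    kept = [v for v in values if v >= min_ratio * typical]
    return max(kept) - min(kept) <= tolerance


widths = [10, 10.5, 2, 9.8, 10.2]  # the third value stands for a narrow digit
```

With `min_ratio=0.0` the exception handling is disabled, and the narrow character alone makes the check fail.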

【００４６】 The criterion for printed type need not require that all of (A), (B), (C), and (D) be satisfied. The set of checks may be narrowed in advance, or it may be enough to satisfy only some of them. Conditions other than (A) to (D) may also be introduced.
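
The relaxed criterion might be expressed as a simple "at least k of n" rule, as in the following sketch. The function name `looks_printed`, the dictionary of check results, and the `required` parameter are hypothetical.

```python
def looks_printed(check_results, required=None):
    """Return True when at least `required` checks answered YES.
    By default every check must pass."""
    needed = len(check_results) if required is None else required
    return sum(check_results.values()) >= needed


results = {"A": True, "B": True, "C": False, "D": True}
```

Narrowing the checks in advance corresponds to passing a smaller dictionary, and additional conditions can be added as further entries.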

【００４７】 Furthermore, when a printed-type place name part candidate exists, the printed-type determination means 8 rewrites the contents of the place name part candidate buffer 7 so that printed-type candidates take priority among the place name part candidates. Possible ways of rewriting include the following.
(1) Keep only the printed-type place name part candidates and delete the others.
(2) Reserve an area in the place name part candidate buffer 7 for storing a priority for each candidate and, when a printed-type candidate exists, set the values so that its priority is higher than that of the other candidates.
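
The two rewriting methods might be sketched as follows. The representation of the buffer as a list of (candidate, priority) pairs, the function name `prioritize_printed`, the `mode` argument, and the priority values 0 and 1 are all assumptions of this example.

```python
def prioritize_printed(candidates, is_printed, mode="prune"):
    """Rewrite a candidate list so that printed-type candidates take priority.
    Returns (candidate, priority) pairs; a higher priority is preferred."""
    flags = [is_printed(c) for c in candidates]
    if not any(flags):
        # No printed-type candidate exists: leave the buffer as it is.
        return [(c, 0) for c in candidates]
    if mode == "prune":
        # Method (1): keep only the printed-type candidates.
        return [(c, 1) for c, f in zip(candidates, flags) if f]
    # Method (2): keep everything, but raise the printed-type priority.
    return [(c, 1 if f else 0) for c, f in zip(candidates, flags)]
```

Method (1) is the more aggressive choice, since a wrong judgment removes a candidate for good, whereas method (2) leaves the final decision to later stages.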

【００４８】 For example, suppose that before the printed-type determination means 8 runs, the place name part candidate buffer 7 holds two candidates, such as the place name part of FIG. 6(a) and the place name part of FIG. 6(b). If the printed-type determination means 8 judges that (a) appears to be printed type but (b) does not, and rewriting method (1) above is used, candidate (b) is deleted from the place name part candidate buffer 7.

【００４９】 The printed-type attribute buffer 9 stores those attributes obtained by the printed-type determination means 8 that are used by the printed-type priority address part search means 11 (or by the printed-type priority address part individual recognition means 15 of the second invention). In this embodiment, for example, the coordinates of the center point of the last character of the place name part candidate judged to be printed type, the center-point spacing (pitch), and the segment size are stored as attributes.

【００５０】 Like the general address part search means 10, the printed-type priority address part search means 11 extracts from the recognition result buffer 4 combinations of address part character candidates that satisfy the address part conditions. The difference is that the general address part search means 10 considers only conformity with the address part conditions in the place name table memory 5, whereas the printed-type priority address part search means 11 also considers conformity with the attributes in the printed-type attribute buffer 9. The printed-type priority address part search means 11 can therefore be realized by modifying the general address part search means 10, for example as follows.

【００５１】 First, as preprocessing, each segment in the recognition result buffer 4 is checked for conformity with the attributes in the printed-type attribute buffer 9. One way to do this is to compute the positions obtained by shifting the center-point coordinates of the last character of the printed-type place name part candidate according to the pitch (the estimated center-point coordinates), compute the center-point coordinates of each segment, and compare the two. As one example, the distance from a segment's center point to the nearest estimated center point can be regarded as its degree of fit with the printed-type attributes. The preprocessing computes this degree of fit and attaches it to each segment. Alternatively, segments whose degree of fit does not reach a threshold may simply be deleted from the recognition result buffer 4.
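
This preprocessing might be sketched as follows, assuming horizontal writing by default. The function names, the `count` parameter (how many positions to extrapolate), and `max_distance` are hypothetical. Because the distance shrinks as the fit improves, the sketch interprets "not reaching the threshold" as the distance exceeding a cutoff; that interpretation is an assumption.

```python
import math


def estimated_centers(last_center, pitch, count, horizontal=True):
    """Shift the last place-name character's center by 1..count pitches."""
    x, y = last_center
    if horizontal:
        return [(x + pitch * k, y) for k in range(1, count + 1)]
    return [(x, y + pitch * k) for k in range(1, count + 1)]


def misfit(center, estimates):
    """Distance to the nearest estimated center (smaller means better fit)."""
    return min(math.dist(center, e) for e in estimates)


def filter_segments(centers, estimates, max_distance):
    """Drop segment centers that lie too far from every estimated center."""
    return [c for c in centers if misfit(c, estimates) <= max_distance]


est = estimated_centers((33, 6), 14, 3)
kept = filter_segments([(47, 6.5), (54, 6), (61, 5.8)], est, 2.0)
```

Here the segment centered at (54, 6) falls midway between two estimated positions and is discarded, as would happen to a segment produced by an incorrect split or merge.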

【００５２】 After this preprocessing, the same processing as that of the general address part search means 10 is carried out. If the general address part search means 10 defines the appropriateness of a combination of address part character candidates by something like a cost value, the degree of fit with the printed-type attributes can be reflected in that cost value. Otherwise, the processing of the general address part search means 10 can be run while ignoring (deleting in advance) the character candidates of segments whose degree of fit does not reach the threshold.
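
One way of reflecting the fit in a cost value is sketched below. The specification does not define the cost function; the additive form, the `weight` parameter, the convention that a lower cost is better, and the sample numbers are all assumptions made for this example.

```python
def combination_cost(base_cost, distances, weight=1.0):
    """Add a penalty proportional to each segment's distance from its
    nearest estimated center to the search's own cost (lower is better)."""
    return base_cost + weight * sum(distances)


def best_combination(combinations, weight=1.0):
    """combinations: list of (label, base_cost, per-segment distances)."""
    best = min(combinations, key=lambda c: combination_cost(c[1], c[2], weight))
    return best[0]


# Illustrative values only: "c" looks cheaper on its own but fits poorly.
combos = [("a", 3.0, [0.2, 0.1, 0.3]), ("c", 2.5, [6.0, 5.5, 0.1, 0.3])]
```

With `weight=0` the penalty vanishes and the selection reduces to that of the unmodified general search.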

【００５３】 As a result, for place name part candidates such as those of FIGS. 6(a) and 6(c), for example, the segments corresponding to 「二」, 「丁」, and 「目」 in FIG. 6(a) match the printed-type attributes, whereas the segments corresponding to 「一」 and 「一丁」 in FIG. 6(c) do not, so extraction of an erroneous address part candidate like that of FIG. 6(c) can be avoided.

【００５４】 The printed-type priority address part search means 11 and the general address part search means 10 are executed selectively, switched by the printed-type determination means 8. The address part candidate buffer 12 therefore stores the result of whichever of the two was selected by the printed-type determination means 8.

【００５５】 FIG. 2 is a block diagram showing the configuration of an embodiment of the second invention.

【００５６】 The image input means 1 inputs image data containing an address character string consisting of a place name part and an address part. The image buffer 2 stores the input image data. The individual recognition means 3 cuts out segments each corresponding to one character from the image data and performs individual character recognition on each segment. The recognition result buffer 4 stores the results of the individual recognition means 3. The place name table memory 5 stores a list of the place names to be read and the address part conditions corresponding to each place name. The place name part search means 6 searches the place name table memory 5 and extracts place name part candidates corresponding to combinations of character candidates in the recognition result buffer 4. The place name part candidate buffer 7 stores the extracted place name part candidates. The printed-type determination means 8 judges, for each place name part candidate in the place name part candidate buffer 7, how likely its constituent character string is to be printed type; if a candidate that appears to be printed type exists, it rewrites the contents of the place name part candidate buffer 7 so that such candidates take priority among the place name part candidates. The printed-type attribute buffer 9 stores the attributes of the printed-type place name part candidates detected by the printed-type determination means 8. When a printed-type place name part candidate exists, the printed-type priority address part individual recognition means 15 cuts out, from the image data in the image buffer 2, segments each corresponding to one character of the address part so that they match the attributes in the printed-type attribute buffer 9, and performs individual character recognition on each segment. When no printed-type place name part candidate exists, the general address part individual recognition means 14 cuts out, from the image data in the image buffer 2, segments each corresponding to one character of the address part and performs individual character recognition on each segment. The address part recognition result buffer 16 stores the result of the printed-type priority address part individual recognition means 15 or the result of the general address part individual recognition means 14. The general address part search means 10 extracts from the recognition result buffer 4 combinations of address part character candidates that satisfy the address part conditions in the place name table memory 5. The address part candidate buffer 12 stores the result of the general address part search means 10. The final determination means 13 decides the character string of the final address reading result based on the place name part candidate buffer 7 and the address part candidate buffer 12.

【００５７】 Of these components, the image input means 1, image buffer 2, individual recognition means 3, recognition result buffer 4, place name table memory 5, place name part search means 6, place name part candidate buffer 7, general address part individual recognition means 14, address part recognition result buffer 16, general address part search means 10, address part candidate buffer 12, and final determination means 13 are the same as the corresponding components of the conventional address reading device of FIG. 4. The three new components are the printed-type determination means 8, the printed-type attribute buffer 9, and the printed-type priority address part individual recognition means 15. Of these, the printed-type determination means 8 and the printed-type attribute buffer 9 are shared with the configuration of the embodiment of the first invention (FIG. 1) and have already been described.

【００５８】 The printed-type priority address part individual recognition means 15 is described below.

【００５９】 Like the general address part individual recognition means 14, the printed-type priority address part individual recognition means 15 cuts out segments each corresponding to one character of the address part from the image buffer 2 and recognizes each character individually. The difference is that, during segmentation, the printed-type priority address part individual recognition means 15 also considers conformity with the attributes in the printed-type attribute buffer 9. The printed-type priority address part individual recognition means 15 can therefore be realized by modifying the character segmentation processing of the general address part individual recognition means 14, for example as follows.

【００６０】 From the attributes in the printed-type attribute buffer 9, the coordinates of the center point of the last character of the printed-type place name part candidate and the pitch are known, for example. The positions obtained by shifting the center-point coordinates of that last character according to the pitch (the estimated center-point coordinates) are therefore computed. If the character (segment) size is also known from the attributes in the printed-type attribute buffer 9, segments of that fixed size can be cut out centered on the estimated center-point coordinates.
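
This fixed-size segmentation might be sketched as follows. The sketch assumes horizontal writing and square cut-out boxes whose side equals the stored size; the function name `cut_boxes` and the `count` parameter (how many address part characters to cut) are hypothetical, and the specification does not state how that number is determined.

```python
def cut_boxes(last_center, pitch, size, count):
    """Return (left, top, right, bottom) boxes of side `size`, centered on
    the positions extrapolated from the last place-name character."""
    x, y = last_center
    half = size / 2
    boxes = []
    for k in range(1, count + 1):
        cx = x + pitch * k  # estimated center of the k-th address character
        boxes.append((cx - half, y - half, cx + half, y + half))
    return boxes


boxes = cut_boxes((33, 6), 14, 12, 2)
```

Each returned box would then be passed to the same character recognizer that the general address part individual recognition means 14 uses.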

【００６１】 The printed-type priority address part individual recognition means 15 and the general address part individual recognition means 14 are executed selectively, switched by the printed-type determination means 8. The address part recognition result buffer 16 therefore stores the result of whichever of the two was selected by the printed-type determination means 8.

【００６２】[0062]

[Effects of the Invention] As described above, the present invention judges whether a string is printed type after the place name part candidates have been obtained, so the judgment can be made accurately. When the string is judged to be printed type, that result is used in processing the address part, which appropriately narrows the possible character segmentations of the address part and makes reading the address part of a printed-type character string accurate and efficient. In other words, the present invention provides an accurate and efficient address reading device that, even when no external indication is given as to whether a character string is handwritten or printed type, judges accurately whether it is printed type and, if so, reads it by exploiting the characteristics of printed type (uniform size, uniform pitch, and so on).

[Brief description of drawings]

【図１】 FIG. 1 is a block diagram showing the configuration of an embodiment of the first invention.

【図２】 FIG. 2 is a block diagram showing the configuration of an embodiment of the second invention.

【図３】 FIG. 3 is a block diagram showing the configuration of a conventional address reading device.

【図４】 FIG. 4 is a block diagram showing the configuration of a conventional address reading device.

【図５】 FIG. 5 is an example of the contents of the place name table memory 5.

【図６】 FIG. 6 is an example of character segmentation results for a printed-type character string.

【図７】 FIG. 7 is an example of character segmentation results for a printed-type character string.

【図８】 FIG. 8 is a diagram for explaining examples of judgments made by the printed-type determination means 8.

[Explanation of symbols]

1 image input means
2 image buffer
3 individual recognition means
4 recognition result buffer
5 place name table memory
6 place name part search means
7 place name part candidate buffer
8 printed-type determination means
9 printed-type attribute buffer
10 general address part search means
11 printed-type priority address part search means
12 address part candidate buffer
13 final determination means
14 general address part individual recognition means
15 printed-type priority address part individual recognition means
16 address part recognition result buffer
40 segment
41 center point of segment
42 pitch of segments
43 size of segment
44 width of segment
50 prefecture name
51 city or ward name
52 town name
53 chome (丁目)
54 ban (番)
55 go (号)
56 list of place names
57 address part conditions

Claims

[Claims]

1. An address reading device for reading an address character string consisting of a place name part and an address part from input image data, comprising: individual recognition means for cutting out segments each corresponding to one character from the image data and performing individual character recognition on each segment; a recognition result buffer for storing the result of the individual recognition means; a place name table memory in which a list of place names to be read and address part conditions corresponding to each place name are registered; place name part search means for searching the place name table memory and extracting place name part candidates corresponding to combinations of character candidates in the recognition result buffer; a place name part candidate buffer for storing the place name part candidates; printed-type determination means for judging how likely the character string constituting each place name part candidate is to be printed type and, when a place name part candidate that appears to be printed type exists, giving priority to that candidate among the place name part candidates; a printed-type attribute buffer for storing the attributes of the place name part candidate that appears to be printed type; printed-type priority address part search means for extracting from the recognition result buffer, when a place name part candidate that appears to be printed type exists, combinations of address part character candidates that match the attributes in the printed-type attribute buffer and the address part conditions in the place name table memory; general address part search means for extracting from the recognition result buffer, when no place name part candidate that appears to be printed type exists, combinations of address part character candidates that match the address part conditions in the place name table memory; an address part candidate buffer for storing the result of the printed-type priority address part search means or the result of the general address part search means; and final determination means for deciding the character string of the final address reading result based on the place name part candidate buffer and the address part candidate buffer.

2. An address reading device for reading an address character string consisting of a place name part and an address part from input image data, comprising: individual recognition means for cutting out segments each corresponding to one character from the image data and performing individual character recognition on each segment; a recognition result buffer for storing the result of the individual recognition means; a place name table memory in which a list of place names to be read and address part conditions corresponding to each place name are registered; place name part search means for searching the place name table memory and extracting place name part candidates corresponding to combinations of character candidates in the recognition result buffer; a place name part candidate buffer for storing the place name part candidates; printed-type determination means for judging how likely the character string constituting each place name part candidate is to be printed type and, when a place name part candidate that appears to be printed type exists, giving priority to that candidate among the place name part candidates; a printed-type attribute buffer for storing the attributes of the place name part candidate that appears to be printed type; printed-type priority address part individual recognition means for cutting out from the image data, when a place name part candidate that appears to be printed type exists, segments each corresponding to one character of the address part so as to match the attributes in the printed-type attribute buffer, and performing individual character recognition on each segment; general address part individual recognition means for cutting out from the image data, when no place name part candidate that appears to be printed type exists, segments each corresponding to one character of the address part, and performing individual character recognition on each segment; an address part recognition result buffer for storing the result of the printed-type priority address part individual recognition means or the result of the general address part individual recognition means; general address part search means for extracting from the address part recognition result buffer combinations of address part character candidates that match the address part conditions in the place name table memory; an address part candidate buffer for storing the result of the general address part search means; and final determination means for deciding the character string of the final address reading result based on the place name part candidate buffer and the address part candidate buffer.