JPH01296393A

JPH01296393A - Category deciding device

Info

Publication number: JPH01296393A
Application number: JP63127413A
Authority: JP
Inventors: Kaoru Katagiri; 片桐　薫
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1988-05-25
Filing date: 1988-05-25
Publication date: 1989-11-29

Abstract

PURPOSE: To determine the category of variable-length words by calculating an evaluation value for each character type at each digit, calculating a word evaluation value as the digit-wise sum of those values, and determining the category, which consists of the character types and the number of digits of each type making up the word. CONSTITUTION: The address information written on a postal item P is read by a scanner 11, and its characters are segmented by a character detecting and segmenting section 12 and sent to a recognizing section 13. Based on character information stored in a first memory 14, the recognizing section 13 recognizes each input character as an uppercase alphabetic character, a lowercase alphabetic character, a numeral, or a special character, and sends the result, together with the character patterns making up each word, to a category deciding section 15. The deciding section 15 determines whether each word consists of numerals only, of alphabetic characters only, or of a combination of the two, and how many digits of each it has. A word recognizing section 17 separates the input address information into alphabetic and numeric words and, based on the words stored in a third memory 18, recognizes whether each word represents a state, a city, and so on.

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) This invention relates to a category determination device that is applied, for example, to a so-called address reader that reads the address information on foreign mail, and that can determine the category even of variable-length words.

(Prior Art) The postal code on domestic mail, for example, is made up of a combination of character types: handwritten numerals, printed numerals, the 〒 mark, and - (hyphen). The combination category of such a postal code is determined by checking whether the combination of character types that has the maximum composite similarity value, computed for each character type making up the postal code, matches one of three formats: the 〒 mark followed by three numerals; three numerals, a hyphen, and two numerals; or three numerals only.
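The fixed-format check described above can be sketched as follows. This is a minimal illustration rather than the prior-art implementation: the type codes (`M` for the 〒 mark, `D` for a numeral, `H` for a hyphen) and the name `match_postal_format` are assumptions introduced here, and the similarity computation that selects the best type for each character is left out.

```python
# Hypothetical type codes: "M" = 〒 mark, "D" = numeral, "H" = hyphen.
# The three formats are the ones named in the text.
KNOWN_FORMATS = {
    "MDDD": "〒 mark + 3 digits",
    "DDDHDD": "3 digits + hyphen + 2 digits",
    "DDD": "3 digits only",
}


def match_postal_format(best_types):
    """Return the format name if the best-scoring type sequence matches exactly, else None."""
    return KNOWN_FORMATS.get("".join(best_types))
```

Because the match must be exact, a type sequence of any other length or order yields no category at all, which is the limitation discussed next.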

In other words, conventional category determination could not be performed unless the combination of character types having the maximum similarity within a single word of the address information matched the category being tested. Consequently, it was impossible to determine the category of variable-length words made up of multiple character types, such as the address information on foreign mail.

Furthermore, in the conventional device, category determination was impossible if a word contained even one blurred character.

(Problems to be Solved by the Invention) This invention solves the problem that the category of variable-length words made up of multiple character types could not be determined. Its object is to provide a category determination device that can reliably determine the category of such words.

[Constitution of the Invention] (Means for Solving the Problems) This invention comprises: reading means for reading variable-length words of one or more digits made up of at least two character types; recognition means for recognizing the characters making up a word from the read output of the reading means; first calculation means for calculating, from the recognition output of the recognition means, an evaluation value for each digit of each character type making up the word; second calculation means for calculating word evaluation values for these character types by adding, digit by digit, the evaluation values calculated for each character type; and determination means for determining, from the calculated word evaluation values, the category consisting of the character types and numbers of digits making up the word.

(Operation) In this invention, from the characters read by the reading means and recognized by the recognition means, the first calculation means calculates an evaluation value for each character type making up the word. The second calculation means calculates word evaluation values by taking the digit-by-digit sum of the evaluation values calculated for each character type. From the calculated word evaluation values, the determination means determines the category consisting of the character types and numbers of digits making up the word. This makes it possible to determine the category of variable-length words made up of multiple character types.

(Embodiment) An embodiment of this invention is described below with reference to the drawings.

In FIG. 1, the address information written on a postal item P is read by a scanner device 11 and supplied to a character detection and segmentation unit 12. The character detection and segmentation unit 12 segments the characters making up the address information and supplies the segmented character information to a recognition unit 13. Based on the character information stored in a first memory 14, the recognition unit 13 recognizes whether each input character is an alphabetic character, a lowercase alphabetic character, a numeral, or a special symbol.

As shown in FIG. 2, the address information on foreign mail is made up of alphabetic characters, numerals, and special symbols such as periods and hyphens.

Furthermore, in words made up of these alphabetic characters, numerals, and special symbols, the character types are not intermixed arbitrarily within a single word. A word consists of alphabetic characters only, such as "NEW"; of numerals only, such as "275"; or of numerals from the head up to a certain digit (one digit in this case) followed by alphabetic characters, such as "7TH". A sequence such as numeral, alphabetic character, numeral does not occur within a single word.
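The layout constraint stated above can be expressed as a simple pattern check. This is only a sketch for illustration: the function name `has_expected_layout` is hypothetical, the check works on already-recognized text rather than on character candidates, and it reads the passage strictly, so a word with alphabetic characters before numerals is also rejected.

```python
import re

# Zero or more numerals followed by zero or more alphabetic characters.
LAYOUT = re.compile(r"[0-9]*[A-Za-z]*")


def has_expected_layout(word):
    """True for alphabetic-only, numeral-only, or numerals-then-alphabetic words."""
    return bool(word) and LAYOUT.fullmatch(word) is not None


for w in ("NEW", "275", "7TH", "7T7"):
    print(w, has_expected_layout(w))
```

This assumption is what lets the category be described by a single boundary position between the numeral part and the alphabetic part.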

The recognition unit 13 outputs, for each word, the recognition results (candidates) for alphabetic characters, lowercase alphabetic characters, numerals, and special symbols. These recognition results are supplied to a category determination unit 15 together with the character patterns making up the word.

The category determination unit 15 is implemented by, for example, a microcomputer, and is connected to a second memory 16 that stores the operating program and other data.

The category determination unit 15 determines whether the read word consists of alphabetic characters only or numerals only, or is a combination of numerals and alphabetic characters and, if so, how many digits of each character type it contains.

FIG. 3 shows the operation of the category determination unit 15, which is described below with reference to that figure.

Suppose, for example, that the recognition unit 13 supplies the recognition results for a six-digit word. Lists such as those shown in FIG. 4 are then created from the recognition results for the characters making up the word (step ST1).

FIG. 4(a) shows the result of recognizing the six-digit word as numerals. For each digit, "O" is set where the character was recognized as a numeral and "×" where it was not.

Similarly, in FIG. 4(b), for the same six-digit word, "O" is set for each digit recognized as an alphabetic character and "×" for each digit that was not.
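The two lists of FIG. 4 can be sketched as follows. The input representation is an assumption made for illustration: each digit of the word is taken to carry a set of candidate character types from the recognition unit, with an empty set for a character that could not be recognized. The names `mark_list` and `render`, the type labels, and the sample word are all hypothetical and are not taken from the figure.

```python
def mark_list(candidates, char_type):
    """One flag per digit: True ("O") if char_type is among the candidates, else False ("×")."""
    return [char_type in c for c in candidates]


def render(marks):
    """Show a list of flags in the O/× notation used in the text."""
    return "".join("O" if m else "×" for m in marks)


# Hypothetical six-digit word: four numerals, one unreadable character, one alphabetic character.
candidates = [{"digit"}, {"digit"}, {"digit"}, {"digit"}, set(), {"alpha"}]

print(render(mark_list(candidates, "digit")))  # list corresponding to FIG. 4(a)
print(render(mark_list(candidates, "alpha")))  # list corresponding to FIG. 4(b)
```

An unreadable character simply produces "×" in both lists, so it lowers the scores without blocking the decision.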

Next, based on the lists created in this way, it is determined which category applies: alphabetic characters only, numerals only, or a combination of alphabetic characters and numerals.

First, for the numeral list shown in FIG. 4(a), a list is generated as shown in FIG. 5(a) by counting "O" as one point and "×" as zero points and accumulating, for each digit, the number of "O" marks from the head of the word (step ST2).

For the alphabetic characters, a list is generated as shown in FIG. 5(b) by counting "O" as one point and "×" as zero points and accumulating, for each digit, the number of "O" marks from the tail of the list shown in FIG. 4(b) (step ST3).
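Steps ST2 and ST3 amount to a running count from the head for numerals and a running count from the tail for alphabetic characters. A minimal sketch, assuming the O/× marks are held as booleans; the function names `head_counts` and `tail_counts` are introduced here and the sample marks are invented, not read from FIG. 4 or FIG. 5.

```python
from itertools import accumulate


def head_counts(marks):
    """Running count of "O" marks from the head: entry i covers digits 0..i."""
    return list(accumulate(int(m) for m in marks))


def tail_counts(marks):
    """Running count of "O" marks from the tail: entry i covers digits i..end."""
    return list(accumulate(int(m) for m in reversed(marks)))[::-1]


digit_marks = [True, True, True, True, False, False]     # "OOOO××" (made-up data)
alpha_marks = [False, False, False, False, False, True]  # "×××××O" (made-up data)

print(head_counts(digit_marks))
print(tail_counts(alpha_marks))
```

Counting numerals from the head and alphabetic characters from the tail matches the word layout described earlier, in which any numerals precede any alphabetic characters.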

From the evaluation values obtained for each digit in this way, the numeral and alphabetic evaluation values are added for each of the combinations shown in FIG. 5: "6 alphabetic digits and 0 numeric digits", "5 alphabetic digits and 1 numeric digit", and so on through "1 alphabetic digit and 5 numeric digits" and "0 alphabetic digits and 6 numeric digits". This yields the word evaluation values shown in FIG. 6 (step ST4). It is then determined whether the maximum of these word evaluation values is larger than the slice value (step ST5). If it is, the combination of numerals and alphabetic characters having the maximum value is output to a word recognition unit 17 as the category sought (step ST6).

In the case shown in FIG. 6, the maximum word evaluation value is 5, which is larger than the slice value of 4. The combinations corresponding to this value, "4 numeric digits + 2 alphabetic digits" and "5 numeric digits + 1 alphabetic digit", can therefore be determined to be the category of the input word. The category determined in this way is supplied to the word recognition unit 17.
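Steps ST4 to ST6 can be sketched as scoring every possible boundary between the numeral part and the alphabetic part and keeping the best one. This is an illustration under stated assumptions, not the patented implementation: the names `word_scores` and `decide_categories` are hypothetical, and the mark lists are invented so that they reproduce the result quoted above (maximum 5, slice value 4, two tied categories); they are not the contents of FIG. 4 or FIG. 6.

```python
def word_scores(digit_marks, alpha_marks):
    """Score each split k: the first k digits as numerals, the remaining digits as alphabetic."""
    n = len(digit_marks)
    return [sum(digit_marks[:k]) + sum(alpha_marks[k:]) for k in range(n + 1)]


def decide_categories(digit_marks, alpha_marks, slice_value):
    """Return (best score, list of (numeric digits, alphabetic digits)).

    The list is empty when the best score does not exceed the slice value.
    """
    scores = word_scores(digit_marks, alpha_marks)
    best = max(scores)
    if best <= slice_value:
        return best, []
    n = len(digit_marks)
    return best, [(k, n - k) for k, s in enumerate(scores) if s == best]


digit_marks = [1, 1, 1, 1, 0, 0]  # "OOOO××" (made-up data)
alpha_marks = [0, 0, 0, 0, 0, 1]  # "×××××O" (made-up data)

print(word_scores(digit_marks, alpha_marks))           # [1, 2, 3, 4, 5, 5, 4]
print(decide_categories(digit_marks, alpha_marks, 4))  # (5, [(4, 2), (5, 1)])
```

With these marks the fifth character is recognized as neither type, so the splits "4 numeric + 2 alphabetic" and "5 numeric + 1 alphabetic" tie at 5, the same pair of categories named in the text.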

In the word recognition unit 17, the input character information is divided, according to the determined category, into words consisting of alphabetic characters and words consisting of numerals. Each divided word is compared with the words stored in a third memory 18 to recognize whether it is a state name, a city name, and so on. The recognition result is supplied to a postal code determination unit (not shown), which determines the corresponding postal code from the recognized state name, city name, and so on.
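The division and lookup performed by the word recognition unit can be sketched as follows. Everything concrete here is an assumption for illustration: the dictionary `WORD_KINDS` stands in for the word list in the third memory 18 and its two entries are invented, and the functions `split_by_category` and `classify_word` are hypothetical names. The text does not describe how the stored words are matched, so an exact, case-insensitive lookup is used.

```python
# Hypothetical stand-in for the word list held in the third memory 18.
WORD_KINDS = {"TEXAS": "state name", "DALLAS": "city name"}


def split_by_category(chars, numeric_digits):
    """Split a word into its numeral part and alphabetic part at the decided boundary."""
    return "".join(chars[:numeric_digits]), "".join(chars[numeric_digits:])


def classify_word(alpha_part):
    """Return the kind of word (for example "state name"), or None if it is not listed."""
    return WORD_KINDS.get(alpha_part.upper())


print(split_by_category("7TH", 1))
print(classify_word("Dallas"))
```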

According to the embodiment above, evaluation values are obtained for the alphabetic characters and the numerals contained in each word of the address information read from the mail. The word evaluation values are obtained by adding, digit by digit, the numeral evaluation value counted from the head and the alphabetic evaluation value counted from the tail, as in steps ST2 and ST3. When a word evaluation value is equal to or greater than a predetermined value, the numeral and alphabetic category corresponding to that value is output as the category of the word. Therefore, even a variable-length word made up of multiple character types can have its category determined reliably.

Furthermore, because character evaluation values and word evaluation values are used, the accuracy of category determination can be improved even when the read word contains blurred characters.

In addition, because word evaluation values are used, when several of them are larger than the slice value, the maximum is taken as corresponding to the category of the word. The category can therefore be determined more reliably even when several word evaluation values exceed the slice value at the same time.

This invention is not limited to the embodiment above and can of course be modified in various ways without departing from the gist of the invention.

[Effects of the Invention] As described in detail above, according to this invention, from the characters read by the reading means and recognized by the recognition means, the first calculation means calculates an evaluation value for each character type making up the word, the second calculation means calculates word evaluation values by taking the digit-by-digit sum of the evaluation values calculated for each character type, and the determination means determines, from the calculated word evaluation values, the category consisting of the character types and numbers of digits making up the word. A category determination device can thus be provided that is able to determine the category of variable-length words made up of multiple character types.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an embodiment of this invention, FIG. 2 shows an example of the characters to be read, and FIGS. 3 to 6 are diagrams for explaining the invention in detail. 11: scanner device; 12: character detection and segmentation unit; 13: recognition unit; 15: category determination unit. Applicant's agent: Patent attorney Takehiko Suzue

Claims

[Scope of Claims] A category determination device comprising: reading means for reading variable-length words of one or more digits made up of at least two character types; recognition means for recognizing the characters making up a word from the read output of the reading means; first calculation means for calculating, from the recognition output of the recognition means, an evaluation value for each digit of each character type making up the word; second calculation means for calculating word evaluation values for these character types by adding, digit by digit, the evaluation values calculated for each character type; and determination means for determining, from the calculated word evaluation values, a category consisting of the character types and numbers of digits making up the word.