JPH0646423B2

JPH0646423B2 - Word dictionary matching device

Info

Publication number: JPH0646423B2
Application number: JP61235610A
Authority: JP
Inventors: 修池田; 一成江上
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-10-02
Filing date: 1986-10-02
Publication date: 1994-06-15
Anticipated expiration: 2009-06-15
Also published as: JPS6389991A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a word dictionary matching device that recognizes an input read character string by matching it against the entries registered in a word dictionary.

[Conventional technology]

FIG. 3 is a block diagram showing a conventional example of this type of word matching device.

The input register 31 receives and stores a character string read by a character reader (not shown) and transfers it to the word length table pointer 32. The word length table pointer 32 classifies the read character string by the word length (number of characters) of the word strings it contains, and converts it into a code suited to the word dictionary. The word dictionary reading unit 33 receives this code, uses it to issue a read command to the word dictionary 34, reads out the character string corresponding to the code, and outputs it to the word collating unit 35 aligned with the read character string stored in the input register 31. The word collating unit 35 checks whether the two character strings received from the word dictionary reading unit 33 match.

In this way, when the two character strings match, the word collating unit 35 outputs that code as the code of the read character string.
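The conventional flow can be sketched as a single dictionary whose entries are grouped by length, so that only same-length entries are read out and compared. This is a minimal illustration, not the patented circuit: the grouping-by-length layout, the integer codes, and the names `build_length_index` and `conventional_match` are all assumptions made for the sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def build_length_index(entries: Iterable[Tuple[str, int]]) -> Dict[int, List[Tuple[str, int]]]:
    """Group (string, code) dictionary entries by length in characters.

    Hypothetical stand-in for the length classification done via the
    word length table pointer 32.
    """
    index: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for text, code in entries:
        index[len(text)].append((text, code))
    return dict(index)


def conventional_match(read_string: str, index: Dict[int, List[Tuple[str, int]]]) -> Optional[int]:
    """Read out only same-length entries (cf. reading unit 33) and compare
    each with the read string (cf. collating unit 35); return the code on a match."""
    for text, code in index.get(len(read_string), []):
        if text == read_string:
            return code
    return None


# Hypothetical dictionary entries and codes.
index = build_length_index([("Ang Mo Kio Industrial Park 5", 205), ("Ang Mo Kio", 101)])
print(conventional_match("Ang Mo Kio", index))       # 101
print(conventional_match("Ang Mo Kio Park", index))  # None
```

Every entry, whatever characters it contains, lives in this one dictionary, which is the property the invention sets out to change.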

[Problems to be solved by the invention]

In the conventional word collating device described above, the more characters are used to form words, the more bits are needed per character. This has the drawback of increasing both the capacity of the word dictionary and the collation time.
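The capacity side of this drawback can be illustrated with a little arithmetic. The sketch reads "the more characters are used to form words" as the size of the character set (an interpretation; the source does not state its encoding) and assumes fixed-width binary character codes. The function names and the example alphabets are hypothetical.

```python
def bits_per_char(alphabet_size: int) -> int:
    """Width in bits of a fixed-length binary code for alphabet_size symbols."""
    return max(1, (alphabet_size - 1).bit_length())


def dictionary_bits(entries, alphabet_size: int) -> int:
    """Storage for the entries if every character uses the same code width."""
    return sum(len(text) for text in entries) * bits_per_char(alphabet_size)


# 26 letters + space = 27 symbols -> 5 bits; adding the digits 0-9 gives 37 -> 6 bits.
print(bits_per_char(27), bits_per_char(37))  # 5 6
```

Under this assumption, admitting a handful of rare characters (here, digits) into the dictionary widens every stored character, not just the rare ones.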

[Means for solving problems]

The word dictionary matching device of the present invention has a first collating means and a second collating means. When the read characters contain none of the predetermined low-appearance-frequency characters, the first collating means outputs the read characters, or a code representing them, once it confirms that the read character string matches. When the read characters do contain a predetermined low-appearance-frequency character, the first collating means searches the dictionary in the same way using a character string in which that character has been converted into a predetermined single code and, once it confirms that this character string matches, outputs the pointer information stored with that character string in the dictionary. The second collating means uses the pointer information received from the first collating means, together with the low-appearance-frequency character indicated by that pointer information in the read character string received from the character reader, to search and collate a second dictionary that stores in advance character strings containing low-appearance-frequency characters; once it confirms that the read character string matches, it outputs the read characters or a code representing them.

In this way, a read character string composed only of frequently appearing characters is searched and collated against a small dictionary containing only frequently appearing characters. This takes less processing time than searching a large dictionary as in the conventional example, so the overall processing time is reduced even though a read character string containing a low-frequency character is collated twice.
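The processing-time argument amounts to an expected-cost comparison. The model below is an assumption introduced for illustration (the source gives no timings): every string is collated once against the small first dictionary, and only the fraction `p_rare` of strings containing a low-frequency character is also collated against the second dictionary. All parameter names are hypothetical.

```python
def expected_two_stage_time(p_rare: float, t_first: float, t_second: float) -> float:
    """Average collation time per read string for the two-stage scheme."""
    if not 0.0 <= p_rare <= 1.0:
        raise ValueError("p_rare must be a probability")
    return t_first + p_rare * t_second


def saves_time(p_rare: float, t_first: float, t_second: float, t_single: float) -> bool:
    """True if the two-stage scheme beats one search of a single large dictionary."""
    return expected_two_stage_time(p_rare, t_first, t_second) < t_single


# Hypothetical figures: rare characters in 10% of addresses.
print(saves_time(0.1, 1.0, 2.0, 1.5))  # True (1.2 < 1.5)
```

The saving holds whenever `t_first + p_rare * t_second` stays below the single-dictionary time, which is the situation the text describes when most strings contain only frequent characters.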

[Embodiment]

Embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of the word dictionary matching device of the present invention, and FIG. 2 is a diagram showing an example of an address collated in this embodiment.

In this embodiment, the destination address on a piece of mail is verified by collating it against an address dictionary prepared in advance. The character reader 3 reads the target address portion of the input character string and outputs it to the register 4 as the first read character string. The register 4 outputs this first read character string to both the first collating means 1 and the second collating means 2.

The first collating means 1 consists of an index table pointer 11, a dictionary reading unit 12, a dictionary unit 13 and a collating unit 14. It collates and outputs character strings (addresses) consisting only of frequently appearing characters; when a character string contains a low-frequency character, it has the second collating means carry out the collation. If the first read character string received from the register 4 contains any predetermined low-frequency character, the index table pointer 11 replaces that character with the predetermined single code; otherwise it leaves the string unchanged. In either case, the result is output as the second read character string.
The dictionary reading unit 12 receives the second read character string, chains the words it contains into several word strings, and classifies them by word length (total number of characters). The dictionary unit 13 stores a large number of address-name character strings in advance and returns the character string corresponding to each word string received from the dictionary reading unit 12 as the third read character string; the dictionary reading unit 12 then outputs this third read character string to the collating unit 14 together with the second read character string. The collating unit 14 collates the second and third read character strings. If they match and the read character string contains no single code, the collating unit 14 outputs the string as it is, or a code representing it. If the matched read character string does contain a single code, the collating unit 14 outputs the pointer information stored in the dictionary unit 13 for that character string to the second collating means 2.
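The decision made by the collating unit 14 can be sketched as below. Assumptions: the dictionary unit 13 is modelled as `(string, value)` pairs grouped by length, where the value is pointer information for strings containing the single code and an output code otherwise; for brevity the whole second read string is looked up rather than each chained word string. The names and the values `7` and `101` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

SINGLE_CODE = "*"  # assumed single code, as in the worked example


def build_dictionary13(entries: Iterable[Tuple[str, int]]) -> Dict[int, List[Tuple[str, int]]]:
    """Hypothetical layout of dictionary unit 13: (string, value) pairs by length."""
    by_length: Dict[int, List[Tuple[str, int]]] = {}
    for text, value in entries:
        by_length.setdefault(len(text), []).append((text, value))
    return by_length


def first_stage(second_read: str, dictionary13: Dict[int, List[Tuple[str, int]]]) -> Optional[Tuple[str, int]]:
    """Collating unit 14 (sketch): compare with same-length entries and return
    ("code", v) for a match without the single code, ("pointer", v) for a
    match containing it, or None when nothing matches."""
    for text, value in dictionary13.get(len(second_read), []):
        if text == second_read:
            kind = "pointer" if SINGLE_CODE in text else "code"
            return (kind, value)
    return None


dictionary13 = build_dictionary13([("Ang Mo Kio Industrial Park *", 7), ("Ang Mo Kio", 101)])
print(first_stage("Ang Mo Kio Industrial Park *", dictionary13))  # ('pointer', 7)
print(first_stage("Ang Mo Kio", dictionary13))                    # ('code', 101)
```

Deciding "code or pointer" from the presence of the single code in the matched entry mirrors the text: only strings that originally held a low-frequency character are forwarded to the second collating means.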
The second collating means 2 consists of an index table pointer 21, a dictionary reading unit 22, a dictionary unit 23 and a collating unit 24, and handles character strings (addresses) that contain low-frequency characters. The index table pointer 21 receives the first read character string from the register 4 and the pointer information from the first collating means 1. It converts the character at the position corresponding to the single code in the word designated by the pointer value (that is, the low-frequency character) into a predetermined code to form the fourth read character string, and outputs the first and fourth read character strings to the dictionary reading unit 22. The dictionary unit 23 stores in advance the address-name character strings that correspond to low-frequency characters; when it receives the fourth read character string from the dictionary reading unit 22, it returns the corresponding character string to the dictionary reading unit 22 as the fifth read character string. The dictionary reading unit 22 outputs the fifth and first read character strings to the collating unit 24, which collates them and, if they match, outputs the collated read character string or a code representing it.

The sorter 5 receives the output of the first collating means 1 or the second collating means 2 and sorts the mail by destination address.
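The second collating means can be sketched as follows. The source does not define the format of the pointer information or of the "predetermined code", so both are assumptions here: the pointer is modelled as a hypothetical `(word_index, region)` pair naming the word that held the single code and a region of the dictionary unit 23, and the low-frequency characters themselves serve as the lookup key. The names and the values `7` and `205` are hypothetical.

```python
from typing import Dict, Optional, Tuple

LOW_FREQUENCY = set("0123456789")  # assumption: digits are the low-frequency characters

# Hypothetical pointer information: (index of the word that held the single
# code, region of dictionary unit 23 to search).
Pointer = Tuple[int, int]


def fourth_read_string(first_read: str, pointer: Pointer) -> str:
    """Index table pointer 21 (sketch): take the word named by the pointer and
    keep only its low-frequency characters as the lookup key."""
    word_index, _region = pointer
    words = first_read.split()
    if not 0 <= word_index < len(words):
        return ""
    return "".join(ch for ch in words[word_index] if ch in LOW_FREQUENCY)


def second_stage(first_read: str, pointer: Pointer,
                 dictionary23: Dict[int, Dict[str, Tuple[str, int]]]) -> Optional[int]:
    """Dictionary unit 23 yields the fifth read string for the key; collating
    unit 24 compares it with the first read string and returns its code on a match."""
    key = fourth_read_string(first_read, pointer)
    _word_index, region = pointer
    entry = dictionary23.get(region, {}).get(key)
    if entry is None:
        return None
    fifth_read, code = entry
    return code if fifth_read == first_read else None


# Hypothetical contents: region 7 holds the numbered variants of one address.
dictionary23 = {7: {"5": ("Ang Mo Kio Industrial Park 5", 205)}}
print(second_stage("Ang Mo Kio Industrial Park 5", (5, 7), dictionary23))  # 205
print(second_stage("Ang Mo Kio Industrial Park 6", (5, 7), dictionary23))  # None
```

Because the second dictionary is consulted only with the key extracted at the pointed-to position, it needs to hold just the strings that contain low-frequency characters.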

Next, the operation of this embodiment will be described for the destination address example shown in FIG. 2.

Assume that digits are designated as the low-frequency characters and that * is the single code. The address "Ang Mo Kio Industrial Park 5" is input from the character reader 3 through the register 4 to the index table pointer 11, converted into "Ang Mo Kio Industrial Park *", and output to the dictionary reading unit 12 as the second read character string. The dictionary reading unit 12 chains the words of this second read character string to generate word strings such as the following:

Park *
Mo Kio Industrial Park *
Kio Industrial Park
and so on.
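The chaining and length classification done by the dictionary reading unit 12 can be sketched as below. The source does not give the exact chaining rule, so this sketch assumes every contiguous run of words is a candidate and measures word length as the number of characters with single spaces between words; the name `word_chains` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def word_chains(second_read: str) -> Dict[int, List[str]]:
    """Enumerate every contiguous run of words (an assumed chaining rule)
    and classify the runs by total length in characters."""
    words = second_read.split()
    by_length: Dict[int, List[str]] = defaultdict(list)
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            chain = " ".join(words[start:end])
            by_length[len(chain)].append(chain)
    return dict(by_length)


chains = word_chains("Ang Mo Kio Industrial Park *")
print(chains[24])  # ['Mo Kio Industrial Park *']
print(chains[19])  # ['Kio Industrial Park']
```

The word strings listed above appear among these candidates ("Park *" has 6 characters, "Kio Industrial Park" 19, "Mo Kio Industrial Park *" 24), and each length bucket would be looked up only among dictionary entries of that length.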

These word strings are input to the dictionary unit 13 one after another according to their word length (number of characters). Because the dictionary unit 13 stores address-name character strings in advance, organized by word length, the retrieved third read character string is passed through the dictionary reading unit 12 to the collating unit 14, which checks whether it matches the second read character string. A search with one to several word strings is usually sufficient; in this case "Ang Mo Kio Industrial Park *" is retrieved and the two character strings are found to match. If this character string did not contain the single code *, it would be output immediately as the collation result; here it does contain *, so the pointer information stored for this entry in the dictionary unit 13 is output to the second collating means 2.

In the second collating means 2, the index table pointer 21 receives the pointer information and the first read character string, converts the part of the first read character string indicated by the pointer information (here, "5") into a predetermined code, and outputs the result as the fourth read character string, together with the first read character string, to the dictionary reading unit 22. Because the dictionary unit 23 of the second collating means 2 stores in advance the address names corresponding to this low-frequency character, the fifth read character string is retrieved using the fourth read character string ("5") by the same procedure as in the first collating means 1, and the collating unit 24 checks whether it matches the first read character string. In this case, "Ang Mo Kio Industrial Park 5" is output as the collation result.

In this way, an address name composed only of frequently appearing characters can be looked up in the small dictionary unit 13, which contains only frequently appearing characters. Processing therefore takes less time than with the large dictionary unit used conventionally, and the overall processing time is reduced even though an address containing a low-frequency character is collated twice.

[Effect of the invention]

As described above, the present invention classifies the characters of input read character strings in advance into frequently and infrequently appearing characters, and provides a first collating means that collates and outputs character strings containing no low-frequency character and a second collating means that collates character strings containing such a character. Because character strings are collated separately according to whether they contain low-frequency characters, collation of a read character string is normally completed using only the dictionary of frequently appearing characters, which reduces both dictionary capacity and collation time.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the word dictionary matching device of the present invention, FIG. 2 is a diagram showing an example of an address input to this embodiment, and FIG. 3 is a block diagram showing the configuration of a conventional word dictionary matching device.

1: first collating means; 2: second collating means; 3: character reader; 4: register; 5: sorter; 11, 21: index table pointers; 12, 22: dictionary reading units; 13, 23: dictionary units; 14, 24: collating units.

Claims

[Claims]

1. A word string contained in a read character string read by a character reader is used to search and collate a word dictionary in which a character string corresponding to the character string is stored in advance.
In a word dictionary matching device that confirms whether the read character string matches a character string stored in a dictionary, when the read character does not include a character with a predetermined low appearance frequency, the read character When it is confirmed that the columns match, the read character or a code representative thereof is output, and when the read character includes a character with a predetermined low appearance frequency,
Similarly, the dictionary is searched as a character string obtained by converting the character into a predetermined single code, and when the matching of the character strings is confirmed, pointer information stored in the character string in the dictionary is output. The matching frequency, the pointer information input from the first matching means, and the character having a low frequency of appearance indicated by the pointer information in the read character string input from the character reader, A second dictionary storing a character string including a low character, and when the matching of the read character string is confirmed, the read character or a code representative of the read character is output. A word dictionary matching device characterized by the above.