JPH0474756B2

Info

Publication number: JPH0474756B2
Application number: JP57163401A
Authority: JP
Priority date: 1982-09-20
Filing date: 1982-09-20
Publication date: 1992-11-27
Also published as: JPS5953985A

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a character recognition device that can easily and effectively recognize a word consisting of n characters, entered for example by handwriting, and can obtain attribute data, such as that held in a database, indicated by the recognition result.

[Technical background of the invention and its problems]

Character recognition is basically performed by detecting the features of each character. When recognizing a word consisting of n characters, however, recognizing the individual characters separately and simply combining the results is often insufficient. Moreover, if one character is recognized poorly, it can become difficult to recognize the word at all. Conventionally, therefore, a recognition device has been configured as shown in FIG. 1 so that, by exploiting the fact that the number of meaningful words is limited, a word can be recognized even when individual characters are recognized poorly. That is, a word consisting of a string of n characters supplied through the character input unit 1 is passed to the recognition unit 2 for character recognition. When this recognition is difficult, a plurality of candidate categories for the word are obtained and supplied to the word matching unit 3. The word matching unit 3 searches the words registered in the word dictionary 4 for a combination of the candidate categories that is meaningful as a word, and thereby selects and recognizes the correct word. With this kind of recognition processing, even if the recognition results for individual characters are incomplete, a meaningful word can be picked out from the combinations of candidate categories, so that effective word recognition becomes possible. However, when character recognition yields many candidate categories, the number of combinations becomes enormous, and matching them against the words takes an extremely long time.
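The scale of the problem described above can be made concrete with a small sketch. It takes a naive reading of the conventional approach, in which every combination of per-position candidates is formed and looked up in a word list. The function name, the sample candidates and the sample dictionary are illustrative assumptions, not taken from the patent.

```python
from itertools import product


def match_by_enumeration(candidates, dictionary):
    """Try every combination of per-position candidates (assumed naive scheme).

    candidates: list of n lists, each holding the r candidate characters
                for one character position.
    dictionary: set of valid words.
    Returns (matching words, number of combinations tried).
    """
    tried = 0
    found = []
    for combo in product(*candidates):
        tried += 1
        word = "".join(combo)
        if word in dictionary:
            found.append(word)
    return found, tried


# Hypothetical example: n = 2 character positions, r = 3 candidates each.
candidates = [["黒", "里", "墨"], ["沢", "択", "訳"]]
dictionary = {"黒沢", "黒田", "山田"}
found, tried = match_by_enumeration(candidates, dictionary)
print(found, tried)  # 3 ** 2 = 9 combinations are tried
```

With r candidates at each of n positions the loop runs r ** n times, which is why the work grows so quickly as the recognizer returns more candidates.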

Furthermore, when such a character recognition device is also required to provide attribute information attached to the recognized word, the approach considered so far has been to prepare a dictionary in which the recognition target words are registered, for use in comparison and matching against the candidate categories, and to prepare separately a table storing the attribute data corresponding to each word.

That is, taking personal names as the recognition target words and addresses as the attribute data, a database storing names and addresses in correspondence already exists. The conventional idea was therefore to build the character recognition device separately from this database, perform character recognition using a word dictionary storing only the personal names, and then search the existing database with the word (personal name) thus obtained.

This approach, however, has the drawback that two databases must be managed, namely the word dictionary used for character recognition and the table used for retrieving attribute data, which makes management cumbersome. In addition, it always requires a complicated procedure: a search to identify the recognition target word in the word dictionary, then designation of the attribute table to be read, then a search of that table. Because the search is performed twice, it also takes time.

[Purpose of the invention]

The present invention has been made in view of these circumstances. Its object is to provide a character recognition device that compares dictionary words with candidate categories simply and at high speed so as to recognize words effectively, and at the same time obtains the attribute data of the recognized word effectively.

[Summary of the invention]

The present invention comprises: means for recognizing each character of an input word given as a character string and obtaining candidate categories for each character; a word dictionary in which a plurality of sets are registered in advance, each set consisting of a recognition target word and the attribute data corresponding to that word placed adjacent to it; means for searching only the recognition target words in the word dictionary, excluding the attribute data, and comparing and matching each retrieved recognition target word against the candidate categories; means for calculating the degree of matching between each compared recognition target word and the input word from the candidate ranking of the candidate categories or from their similarity to the characters of the input word, and selecting recognition target words with a high degree of matching as candidate words; and means for reading the selected recognition target word and the attribute data corresponding to it from the word dictionary.

[Effect of the invention]

Therefore, according to the present invention, each recognition target word and its corresponding attribute data are registered as a set, adjacent to each other, in the word dictionary used for character recognition, and when each recognition target word is compared and matched against the candidate categories, the attribute data is skipped and only the recognition target words are searched. With this new configuration, only one database needs to be managed, which simplifies management and control. Moreover, once a recognition target word in the word dictionary has been identified, it suffices to read the attribute data registered adjacent to it, so only one search is needed and the processing time is greatly reduced.
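One way to picture the single-dictionary arrangement is a flat store in which each word is followed immediately by its attribute data. The sketch below assumes such a layout, with words at even addresses and attributes at the next address. The layout, the function names and the sample entries are hypothetical, since the patent does not fix a storage format.

```python
# Hypothetical flat layout: even addresses hold words, and the attribute
# data of each word sits at the very next address.
DICTIONARY = [
    "黒沢", {"address": "Tokyo"},
    "黒田", {"address": "Osaka"},
    "山田", {"address": "Nagoya"},
]


def iter_words(dictionary):
    """Yield (address, word) pairs, stepping over the attribute data."""
    for address in range(0, len(dictionary), 2):
        yield address, dictionary[address]


def read_attributes(dictionary, address):
    """Read the attribute data stored adjacent to the word at `address`."""
    return dictionary[address + 1]


def lookup(dictionary, recognized_word):
    """Single pass: find the word, then read its neighbour for the attributes."""
    for address, word in iter_words(dictionary):
        if word == recognized_word:
            return word, read_attributes(dictionary, address)
    return None


result = lookup(DICTIONARY, "黒田")
print(result)
```

Because the attribute data sits next to the word, no second table and no second search are needed once the word's address is known.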

[Embodiments of the invention]

An embodiment of the present invention will now be described with reference to the drawings.

FIG. 2 is a schematic configuration diagram of the device of the embodiment. An input word given as a string of n characters is recognized character by character in the recognition unit 11, and r candidate categories (character codes) are obtained for each character. These candidate categories are arranged for each character position of the word, from first to r-th, according to their candidate ranking. At this time, the similarity of each candidate category to the character, that is, the degree to which the standard character pattern of the candidate category resembles the input character pattern, may also be obtained, and the similarity and the candidate category may be handled together. Thus, combining the r candidate categories obtained by the recognition unit 11 for each of the n characters yields rⁿ possible recognition target words. These candidate categories are supplied to the word matching unit 13 via the editing unit 12.
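The recognizer's output can be modelled as a table with one row per character position and r ranked entries per row. The sketch below is one possible representation. The `Candidate` type, the helper names and the similarity scores are assumptions made for illustration, and the recognizer itself is replaced by a ready-made score table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    category: str      # character code of the candidate
    rank: int          # 1 = best candidate for this character position
    similarity: float  # similarity to the input pattern (optional in the text)


def top_candidates(scores, r):
    """Keep the r best-scoring characters for one input character."""
    best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:r]
    return [Candidate(ch, i + 1, s) for i, (ch, s) in enumerate(best)]


def candidate_table(per_position_scores, r):
    """Build the n x r table of candidates, one row per character position."""
    return [top_candidates(scores, r) for scores in per_position_scores]


# Hypothetical similarity scores for a two-character input word.
scores = [
    {"黒": 0.9, "里": 0.7, "墨": 0.6, "田": 0.1},
    {"沢": 0.8, "択": 0.75, "訳": 0.5},
]
table = candidate_table(scores, r=3)

# Number of strings obtainable by picking one candidate per position.
combinations = 1
for row in table:
    combinations *= len(row)
print(combinations)  # r ** n = 3 ** 2
```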

Meanwhile, Ln words, each consisting of n characters, are registered in advance in the word dictionary 14, each together with its corresponding attribute data. These words are read out sequentially and supplied to the word matching unit 13 for recognition processing.

The word matching unit 13 is basically configured as follows. The r candidate categories for each of the n characters recognized by the recognition unit 11, that is, n × r candidate categories in total, are stored in the candidate character register 21. A word read from the word dictionary 14 under the control of the address counter 22 is stored in the word dictionary register 23. Each character of the word stored in the word dictionary register 23 is supplied to the comparator 24, where it is compared, character position by character position, with the candidate categories read out sequentially from the candidate character register 21 under the control of the register counter 25. The comparator 24 thus detects, for each character position of the word, whether the word character matches a candidate category. The match detection information is passed to the matching degree calculation unit 26, which calculates the degree of matching between the word held in the register 23 and the input word. This calculation is performed, for example, by obtaining the candidate rank of the matched candidate category at each character position and combining the rank information into an overall value. Alternatively, when similarities have been obtained for the candidate categories as described above, the degree of matching may be obtained as the sum of the similarities of the candidate categories matched at each character position. If no matching candidate category is found at some character position, the word may be treated as not matching. This calculation is repeated each time a word is read from the word dictionary 14, so that the degree of matching is obtained for all Ln words.

The degree of matching of each word with the input word obtained in this way is passed to the sort processing unit 27, which sorts the words in descending order of the degree of matching. The x words with the highest degree of matching are then supplied to the editing unit 12 as candidate words. The editing unit 12 comprehensively edits these candidate words together with the recognition information of the input word, makes a determination to obtain the recognition result, and obtains the attribute data attached to the word together with the word itself.
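The rank-based matching described above can be sketched as follows. The text says only that the rank information is combined into an overall value, so the scoring rule used here (each matched position contributes r - rank + 1, making higher totals better) is an assumption, as are the function names and the sample data.

```python
def matching_degree(word, candidates):
    """Score one dictionary word against the ranked candidates.

    candidates[i] is the list of candidate characters for position i,
    ordered from rank 1 to rank r. Returns None when some position has
    no matching candidate, i.e. the word is treated as not matching.
    """
    if len(word) != len(candidates):
        return None
    total = 0
    for ch, ranked in zip(word, candidates):
        if ch not in ranked:
            return None
        rank = ranked.index(ch) + 1
        total += len(ranked) - rank + 1  # assumed scoring: rank 1 scores r
    return total


def select_candidates(words, candidates, x):
    """Score every dictionary word and keep the x best-matching ones."""
    scored = []
    for word in words:
        degree = matching_degree(word, candidates)
        if degree is not None:
            scored.append((degree, word))
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored[:x]


# Hypothetical data: the correct character is only ranked second at position 0.
candidates = [["里", "黒", "墨"], ["沢", "択", "訳"]]
words = ["黒沢", "里沢", "山田", "墨訳"]
best = select_candidates(words, candidates, x=2)
print(best)
```

Each dictionary word costs at most n × r character comparisons, so the total work grows with the size of the dictionary rather than with r ** n.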

Specifically, this matching process is performed as follows. FIG. 3 shows one example, in which the candidate character register 21 is a shift register with a storage area of n rows and r columns. The candidate categories recognized for each of the n characters are stored with the n rows corresponding to the character positions of the word and, within each row, across the r columns in order of candidate rank. The stored candidate categories are read out in parallel, one column at a time, under the control of the counter 25 and supplied to the comparator 24. The character data of the word are also supplied to the comparator 24 from the register 23 in parallel, each at its own character position. As a result, the candidate category and the word character are compared simultaneously at every character position. Each character position at which a match is found reports it to the matching degree calculation unit 26, and the count value of the counter 25 at that moment is taken in as candidate rank information. When one comparison is finished, the counter 25 is incremented, the next column of candidate categories is supplied to the comparator 24, and match detection is performed in the same way. Match detection is repeated r times, which completes the matching process for the word stored in the register 23.
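The column-by-column procedure of FIG. 3 can be simulated in software. In the sketch below the inner loop over character positions stands in for comparisons that the hardware performs simultaneously. The function name and the sample register contents are assumptions, and the word is assumed to have exactly n characters.

```python
def match_by_columns(word, register):
    """Simulate the parallel comparison, one column of candidates per step.

    register[i][j] holds the candidate of rank j + 1 for character
    position i (n rows, r columns). Returns, for each position, the
    counter value (rank) at which a match was found, or None.
    """
    n = len(register)
    r = len(register[0])
    ranks = [None] * n
    for counter in range(1, r + 1):          # the counter steps r times
        column = [row[counter - 1] for row in register]
        for pos in range(n):                 # simultaneous in hardware
            if ranks[pos] is None and column[pos] == word[pos]:
                ranks[pos] = counter         # latch the counter value
    return ranks


# Hypothetical register contents for a two-character input word.
register = [["里", "黒", "墨"], ["沢", "択", "訳"]]
print(match_by_columns("黒沢", register))
print(match_by_columns("黒田", register))
```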

At this point, the matching degree calculation unit 26 holds, for each character position of the word, the count value indicating the candidate category for which a match was detected, that is, the candidate rank information, and from this information the degree of matching between the word stored in the register 23 and the input word is obtained. The sort processing unit 27 identifies the word for which this degree of matching was obtained from the count value of the address counter 22 at that time, and stores words with a high degree of matching as pairs of degree of matching and count value. It then compares the degree of matching of the next word supplied with that of the words supplied earlier, discards the information for words with a low degree of matching, and reorders the remaining entries, thereby performing the sort.
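The sort processing described above amounts to keeping a short, ordered list of the best entries seen so far. The following sketch assumes a fixed capacity x and stores each entry as a (degree of matching, dictionary address) pair. The class name and the re-sort on every insertion are illustrative simplifications rather than the circuit's actual method.

```python
class TopX:
    """Keep the x best (degree, address) pairs seen so far, best first."""

    def __init__(self, x):
        self.x = x
        self.entries = []

    def offer(self, degree, address):
        """Insert a new result, reorder, and discard anything beyond x."""
        self.entries.append((degree, address))
        # The sort is stable, so earlier words win ties.
        self.entries.sort(key=lambda entry: entry[0], reverse=True)
        del self.entries[self.x:]


# Hypothetical stream of results, tagged with address counter values.
top = TopX(2)
top.offer(3, 0)
top.offer(5, 2)
top.offer(4, 4)
print(top.entries)
```

Storing the address rather than the word itself means the selected word and its attribute data can be read back from the dictionary afterwards.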

In this case, similarity information corresponding to the candidate categories stored in the register 21 may be stored in a separate storage area, and the degree of matching may be calculated from the similarity information instead of the candidate ranks described above.
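A similarity-based variant can be sketched by keeping a second table, parallel to the candidate table, and summing the similarity of the matched candidate at each position. The function name and the sample values are assumptions chosen for illustration.

```python
def matching_degree_by_similarity(word, candidates, similarities):
    """Sum the similarities of the candidates that match each character.

    candidates[i][j] is the j-th candidate for position i, and
    similarities[i][j] is its similarity, held in a separate table.
    Returns None if some position has no matching candidate.
    """
    total = 0.0
    for pos, ch in enumerate(word):
        if ch not in candidates[pos]:
            return None
        total += similarities[pos][candidates[pos].index(ch)]
    return total


# Hypothetical tables for a two-character input word.
candidates = [["里", "黒"], ["沢", "択"]]
similarities = [[0.75, 0.5], [0.25, 0.125]]
print(matching_degree_by_similarity("黒沢", candidates, similarities))
print(matching_degree_by_similarity("里択", candidates, similarities))
```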

In the configuration shown in FIG. 3, the matching process is executed simultaneously, in parallel, for every character position of the word, but it may instead be executed serially as shown in FIG. 4. That is, the candidate character register 21 is given n × r storage areas, and the candidate categories are stored in these areas in order. The example shown here stores similarity information together with each candidate category. The candidate categories are read out sequentially under the address control of the counter 25 and supplied to the comparator 24, while the characters of the word stored in the register 23 are read out one at a time via the selector 28. In this case, the address control of the counter 25 and the select control of the selector 28 are synchronized so that the first character is selected and the candidate characters for that character position are read out in turn for match detection, after which the second character is selected and match detection is performed against the candidate categories for that character position, and so on. The matching process can then be performed in the same way as in the previous example.
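The serial arrangement of FIG. 4 can be simulated with a single address that sweeps a flat register. In the sketch below the quotient of the address by r plays the role of the selector (which word character is compared) and the remainder gives the candidate rank. The function name and the sample register contents are assumptions.

```python
def match_serially(word, flat_register, r):
    """Simulate the serial comparison over a flat register of n * r entries.

    flat_register holds the r candidates for position 0, then the r
    candidates for position 1, and so on. Returns the rank of the matching
    candidate for each character position, or None where nothing matched.
    """
    n = len(word)
    ranks = [None] * n
    for address in range(n * r):
        pos, k = divmod(address, r)   # pos drives the selector, k the rank
        if ranks[pos] is None and flat_register[address] == word[pos]:
            ranks[pos] = k + 1
    return ranks


# Hypothetical flat register for n = 2 positions and r = 3 candidates each.
flat_register = ["里", "黒", "墨", "沢", "択", "訳"]
print(match_serially("黒沢", flat_register, r=3))
```

This trades the n parallel comparators of the FIG. 3 arrangement for n × r sequential steps with a single comparator.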

FIG. 5 illustrates the concept of the recognition processing described above, using an example in which the personal name "黒沢" (Kurosawa) is given as the input word. In the word dictionary 14, the attribute data 14b attached to each word 14a to be recognized is registered in advance together with that word. Of these, the words 14a are used in the recognition processing of the input word. When a word 14a is obtained as the recognition result as described above, its attribute data 14b is read out together with the word 14a. This readout is performed by the editing unit 12.

When the word 14a is a personal name, for example, the attribute data 14b is given as data such as the person's address, telephone number and age. This attribute data 14b can then be used to search and manage databases such as membership lists and customer ledgers. Also, when the word 14a is given in kana characters, kana-to-kanji conversion can be performed by giving the corresponding kanji as the attribute data 14b.

As described above, the present invention is characterized in that each recognition target word and its corresponding attribute data are registered as a set, adjacent to each other, in the word dictionary used for character recognition, and in that the attribute data is skipped and only the recognition target words are searched when each recognition target word is compared and matched against the candidate categories, so that only one database needs to be managed. This simplifies management and control. In addition, once a recognition target word in the word dictionary has been identified, it suffices to read the attribute data registered adjacent to it, so only one search is needed and the processing time is greatly reduced.

The present invention is not limited to the embodiment described above. For example, classification information for the word may be given as the attribute data. Information on the frequency of occurrence of the word may also be given as attribute data. In short, the present invention can be implemented with various modifications without departing from its gist.

[Brief explanation of drawings]

FIG. 1 is a schematic configuration diagram showing an example of a conventional device, FIG. 2 is a schematic configuration diagram of a device according to an embodiment of the present invention, FIGS. 3 and 4 are diagrams each showing a basic configuration example of the word matching unit, and FIG. 5 is a diagram showing the concept of the recognition processing.

11... recognition unit, 12... editing unit, 13... word matching unit, 14... word dictionary, 21... candidate character register, 22... address counter, 23... word dictionary register, 24... comparator, 25... register counter, 26... matching degree calculation unit, 27... sort processing unit, 28... selector, 14a... word, 14b... attribute data.

Claims

[Scope of Claims]

1. A character recognition device comprising: means for recognizing each character of an input word given as a character string and obtaining candidate categories for each character; a word dictionary in which a plurality of sets are registered in advance, each set consisting of a recognition target word and the attribute data corresponding to that word placed adjacent to it; means for searching only the recognition target words in the word dictionary, excluding the attribute data, and comparing and matching each retrieved recognition target word against the candidate categories; means for calculating the degree of matching between each compared recognition target word and the input word from the candidate ranking of the candidate categories or from their similarity to the characters of the input word, and selecting recognition target words with a high degree of matching as candidate words; and means for reading the selected recognition target word and the attribute data corresponding to it from the word dictionary.