JP2906758B2 - Character reader

Info

Publication number: JP2906758B2
Application number: JP3225753A
Authority: JP
Inventors: 俊史山内
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1991-09-05
Filing date: 1991-09-05
Publication date: 1999-06-21
Anticipated expiration: 2014-06-21
Also published as: JPH0567238A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character reading apparatus that automatically reads handwritten and printed characters, and more particularly to a character reading apparatus that reads handwritten character shapes with deformations, mutually similar character shapes, multi-font printed characters, and omni-font printed characters.

[0002]

[Prior Art] Conventionally, in a character reading apparatus that reads character patterns such as handwritten or printed characters, character string data handwritten or printed on a form or document is first segmented into individual characters by a character segmentation unit, and recognition is then performed on each individual character. In typical individual character recognition, feature extraction is performed on each unknown input character pattern produced by the character segmentation unit. The distance or similarity between the resulting feature vector and a character recognition dictionary, designed in advance from a set of training character patterns, is then calculated, and the determination unit takes as the reading result the character category with the minimum distance or the maximum similarity.

[0003] One means of improving reading performance in character reading is to provide a character recognition dictionary in which each category consists of a plurality of subcategories corresponding to various deformations. (Reference: Miyamoto, N., Nakajima, N. and Kawatani, T.: "High performance optical character reader for handprinted numerals, alphabets, and katakana", NTT Review, Vol. 1, No. 2, pp. 73-81 (July 1989).)

[Problems to Be Solved by the Invention] However, because the conventional character reading apparatus performs recognition independently for each character, it has no information about the other characters written on the form or document. When a character pattern that resembles several character categories is input, for example because of the writer's habits, discrimination is difficult, and the result is either an undeterminable character or a misrecognition.

[0004] FIG. 2 shows an example of characters written on a form. Five characters are written in the character boxes, but the second character is hard to identify as either the numeral 1 or the numeral 7, and a conventional character reading apparatus rejects it as undeterminable. A human reader, however, does not necessarily attend to a single character and recognize it in isolation; a human also uses information from the characters written before and after it.

[0005] In the example of FIG. 2, when only one character is considered at a time, the second character is hard to identify as 1 or 7, but the fourth character can be reliably determined to be 7. Assuming that the data was written by a single writer, a human uses the determination of the fourth character: because the fourth and second characters differ in shape, and the fourth character can be determined to be 7, the human determines the second character to be 1. A conventional character reading apparatus that determines each character independently therefore has difficulty achieving reading performance close to that of a human.

[0006] An object of the present invention is to provide a character reading apparatus that exploits the property that, in a form or document filled in by a single writer, the shapes of characters of the same category written by that writer vary little. By reading from the shape information of all the characters written on the form or document, the apparatus makes readable those character shapes that were difficult to read with the conventional individual character recognition method, which processes one character at a time.

[0007]

[Means for Solving the Problems] The character reading apparatus of the first invention reads characters on a form or document using individual character recognition means that has a recognition dictionary storing the feature vectors of a plurality of subcategories for each character category, and that determines the category based on the distances between the feature vector of an input character pattern and the feature vectors of the subcategories in the recognition dictionary. The apparatus is characterized by comprising: a proximity flag memory that sets and stores a proximity flag for each pair of subcategories whose feature vectors are separated by a small distance; a determination result memory that stores the category names and subcategory feature vector numbers of the first and second candidates by distance, together with a determination flag indicating whether the character is read or rejected; and forced character determination means. After determination processing for one form has finished, when the i-th character indicates rejection, the forced character determination means checks whether the same category as the second candidate category of the i-th character exists as the first candidate category of another character. When the same category exists at the j-th character on the form or document, the proximity flag is set for the pair consisting of the first and second candidate subcategories of the i-th character, and the proximity flag is not set for the pair consisting of the first candidate subcategory of the i-th character and the first candidate subcategory of the j-th character, the means takes the first candidate category of the i-th character as the determination result, thereby forcibly re-determining the rejected character.

[0008] The character reading apparatus of the second invention reads characters on a form or document using individual character recognition means that has a recognition dictionary storing the feature vectors of a plurality of subcategories for each character category, and that determines the category based on the distances between the feature vector of an input character pattern and the feature vectors of the subcategories in the recognition dictionary. The apparatus is characterized by comprising: a proximity flag memory that sets and stores a proximity flag for each pair of subcategories whose feature vectors are separated by a small distance; a determination result memory that stores the category names and subcategory feature vector numbers of the first and second candidates by distance, together with a determination flag indicating whether the character is read or rejected; a determination frequency distribution memory that stores in advance the determination frequency of each subcategory feature vector of the recognition dictionary, obtained from characters written by the same writer; and forced character determination means. After determination processing for one form has finished, when the i-th character indicates rejection, the forced character determination means checks whether the same category as the second candidate category of the i-th character exists as the first candidate category of another, already determined character. When the same category exists at the j-th character on the form or document, the means refers to the determination frequency distribution memory based on the subcategory feature vector number of the second candidate of the i-th character and the subcategory feature vector number of the first candidate of the j-th character, and, when the determination frequency is at or below a threshold, takes the first candidate category of the i-th character as the determination result, thereby forcibly re-determining the rejected character.

The character reading apparatus of the third invention is characterized in that, in the determination frequency distribution memory of the character reading apparatus of the second invention, the contents of the determination frequency distribution memory are updated during the operation of reading a form or document.

[0009]

[Operation] After determination processing of the characters on a form or document has finished, the data in the determination result memory is consulted. Consider a character that indicates rejection because the distances of its first and second candidate categories are close. Using the contents of the proximity flag memory or of the determination frequency distribution memory, if the same category as the second candidate category of the rejected character exists as the first candidate category of another character on the form or document, and that other character has been determined with high confidence without lying close to several different categories, the rejected character is forcibly determined to be its first candidate category.

[0010]

[Embodiment] The configurations of the first, second, and third inventions are described below with reference to the drawings. FIG. 1 is a block diagram showing one embodiment of the first, second, and third inventions. The scanner unit 1 binarizes the optically scanned form or document image data and generates a black-and-white binary character string pattern. The character segmentation unit 2 extracts the character string pattern and segments the individual characters based on the size of the character string pattern, pitch information, and the like. The feature extraction unit 3 extracts character features such as density features and contour features and generates an N-dimensional feature vector $f = (f_1, \dots, f_N)$. For the training patterns of the M character categories $C_I$ $(I = 1, \dots, M)$ to be recognized, each category is divided into L subcategories $C_{IJ}$ $(I = 1, \dots, M;\ J = 1, \dots, L)$, and the recognition dictionary unit 4 stores the subcategory feature vectors $g_{IJ} = (g_{IJ1}, \dots, g_{IJN})$ obtained by an operation on the set of feature vectors of each subcategory.

[0011] Next, the distance calculation unit 5 calculates the distance between the feature vector of the input character pattern and each subcategory feature vector according to equation (1).

[0012]

$$D^2(f, g_{IJ}) = (f - g_{IJ})^t (f - g_{IJ}) \tag{1}$$

The determination unit 6 sorts the distance values obtained by the distance calculation unit 5 in ascending order and writes the resulting determination result to the determination result memory 8. As shown in FIG. 3, the determination result data stored in the determination result memory 8 holds, for each of the n characters written on the form or document, the result obtained by the determination unit 6, written one character at a time in order. The result for the k-th character consists of: a determination flag $h^{(k)}$ indicating determination or rejection; the first candidate category name $C_{I_1}^{(k)}$, whose distance is the smallest, with its subcategory feature vector number $S_{I_1 J_1}^{(k)}$; and the second candidate category name $C_{I_2}^{(k)}$, whose distance is the second smallest (and whose category differs from the first-ranked category), with its subcategory feature vector number $S_{I_2 J_2}^{(k)}$.
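The ranking performed by the determination unit can be sketched as follows. This is a minimal illustration rather than the patented implementation: the dictionary layout (a mapping from a hypothetical `(category, subcategory index)` key to a feature vector), the names `squared_distance` and `rank_candidates`, and the toy values are all assumptions introduced for the example.

```python
from typing import Dict, List, Tuple

Vector = List[float]
SubId = Tuple[str, int]  # hypothetical key: (category name, subcategory index J)


def squared_distance(f: Vector, g: Vector) -> float:
    """Equation (1): D^2(f, g) = (f - g)^t (f - g)."""
    return sum((a - b) ** 2 for a, b in zip(f, g))


def rank_candidates(f: Vector, dictionary: Dict[SubId, Vector]):
    """Return ((d1, S1), (d2, S2)).

    S1 is the subcategory with the smallest distance. S2 is the closest
    subcategory whose category differs from that of S1, so the dictionary
    must contain at least two categories.
    """
    ranked = sorted((squared_distance(f, g), sub) for sub, g in dictionary.items())
    first = ranked[0]
    second = next(item for item in ranked[1:] if item[1][0] != first[1][0])
    return first, second


# Toy two-dimensional dictionary (values invented for illustration).
toy_dictionary = {
    ("1", 1): [0.0, 0.0],
    ("1", 2): [1.0, 0.0],
    ("7", 1): [2.0, 0.0],
    ("7", 2): [5.0, 5.0],
}
first, second = rank_candidates([0.9, 0.0], toy_dictionary)
print(first[1], second[1])  # ('1', 2) ('7', 1)
```

Note that the second-closest entry overall, `("1", 1)`, is skipped because it belongs to the same category as the first candidate.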

[0013] As shown in equations (2) and (3), when the difference between the distance of the first candidate and the distance of the second candidate for the k-th character is no greater than the threshold ε, the character lies close in distance to a different category, so the determination flag is set ($h^{(k)} = 1$); when the difference is greater than ε, the determination flag is not set ($h^{(k)} = 0$). A conventional character reading apparatus rejects (treats as unreadable) every character whose determination flag is set.
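The flag rule can be written as a one-line comparison. The sketch below is illustrative only: the function name `determination_flag` and the sample distances are assumptions, and the comparison is taken as inclusive, following equation (2).

```python
def determination_flag(d_first: float, d_second: float, epsilon: float) -> int:
    """Return 1 (reject) when the second candidate is within epsilon of the first."""
    return 1 if (d_second - d_first) <= epsilon else 0


# Invented distances: a clear winner, then a near tie.
print(determination_flag(0.25, 2.0, epsilon=0.5))  # 0
print(determination_flag(1.0, 1.25, epsilon=0.5))  # 1
```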

[0014]

$$h^{(k)} = 1 \quad \text{when} \quad D^2(f, g_{I_2 J_2}^{(k)}) - D^2(f, g_{I_1 J_1}^{(k)}) \le \varepsilon \tag{2}$$

$$h^{(k)} = 0 \quad \text{when} \quad D^2(f, g_{I_2 J_2}^{(k)}) - D^2(f, g_{I_1 J_1}^{(k)}) > \varepsilon \tag{3}$$

The distance between subcategory feature vectors is also calculated, according to equation (4). As shown in equation (5), when the distance between dictionary entries is smaller than a preset threshold δ, the proximity flag is set ($R_{IJI'J'} = 1$); as shown in equation (7), when the distance is δ or greater, 0 is written as the proximity flag ($R_{IJI'J'} = 0$) in the proximity flag memory 7.

[0015]

$$D^2(g_{IJ}, g_{I'J'}) = (g_{IJ} - g_{I'J'})^t (g_{IJ} - g_{I'J'}) \tag{4}$$

$$R_{IJI'J'} = 1 \quad \text{when} \quad D^2(g_{IJ}, g_{I'J'}) < \delta \ \text{and} \ I \ne I' \tag{5}$$

$$R_{IJI'J'} = 0 \quad \text{when} \quad I = I' \tag{6}$$

$$R_{IJI'J'} = 0 \quad \text{when} \quad D^2(g_{IJ}, g_{I'J'}) \ge \delta \tag{7}$$

FIG. 4 shows the contents of the proximity flag memory 7. In FIG. 4, the memory uses the subcategory feature vector numbers 21 and 22 as addresses and holds the proximity flag value $R_{IJI'J'}$, shown at 23, as data. For a pair within the same category, such as $S_{IJ}$ and $S_{IJ'}$, the proximity flag value is 0. For a pair from different categories, such as $S_{IJ}$ and $S_{I'J'}$ $(I \ne I')$, the content of the proximity flag $R_{IJI'J'}$ shown at 23 is determined by the conditions of equations (5), (6), and (7) and stored in the proximity flag memory 7.
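Filling the proximity flag memory amounts to a pairwise pass over the dictionary. The following sketch assumes a hypothetical dictionary keyed by `(category, subcategory index)` and represents the memory as a Python `dict` addressed by a pair of such keys; the names and toy values are illustrative, not taken from the patent.

```python
from typing import Dict, List, Tuple

Vector = List[float]
SubId = Tuple[str, int]  # hypothetical key: (category name, subcategory index J)


def squared_distance(f: Vector, g: Vector) -> float:
    """Equation (4) has the same form as equation (1)."""
    return sum((a - b) ** 2 for a, b in zip(f, g))


def build_proximity_flags(
    dictionary: Dict[SubId, Vector], delta: float
) -> Dict[Tuple[SubId, SubId], int]:
    """Return R[(S, S')] for every ordered pair of subcategories."""
    flags = {}
    for a, g_a in dictionary.items():
        for b, g_b in dictionary.items():
            close = squared_distance(g_a, g_b) < delta  # equations (5) and (7)
            different_category = a[0] != b[0]           # equation (6)
            flags[(a, b)] = 1 if (close and different_category) else 0
    return flags


# Toy two-dimensional dictionary (values invented for illustration).
toy_dictionary = {
    ("1", 1): [0.0, 0.0],
    ("1", 2): [1.0, 0.0],
    ("7", 1): [2.0, 0.0],
    ("7", 2): [5.0, 5.0],
}
R = build_proximity_flags(toy_dictionary, delta=1.5)
print(R[("1", 2), ("7", 1)])  # 1: different categories, distance 1.0
print(R[("1", 1), ("1", 2)])  # 0: same category
print(R[("1", 1), ("7", 1)])  # 0: distance 4.0
```

Because the table depends only on the dictionary and δ, it can be computed once, before any form is read.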

[0016] Next, FIG. 5 shows the contents of the determination frequency distribution memory 10. It is a matrix that uses the subcategory feature vector numbers 24 and 25 as addresses and holds the determination frequency count $P_{IJI'J'}$, shown at 26, as data. The determination frequency distribution memory 10 and the threshold register 11 are controlled by the determination frequency distribution control unit 9, which controls the contents of the memory based on the subcategory feature vector numbers obtained by the determination unit 6.

[0017] In the character reading apparatus of the second invention, the algorithm for obtaining the determination frequency distribution in advance from training patterns written by the same writer is as follows. Let $S_{IJ}$ be the subcategory feature vector number to which a training character pattern written by the same writer is assigned, and let $q_{IJ}$ be a counter of the number of characters assigned to $S_{IJ}$.

Step 1: Initialize each component of the determination frequency count $P_{IJI'J'}$.

[0018]

$$P_{IJI'J'} = 0 \quad (I = 1, \dots, M;\ J = 1, \dots, L;\ I' = 1, \dots, M;\ J' = 1, \dots, L) \tag{8}$$

Step 2: Initialize each component of the counters.

[0019]

$$q_{IJ} = 0, \quad r_{IJ} = 0 \quad (I = 1, \dots, M;\ J = 1, \dots, L) \tag{9}$$

Step 3: For every character in the set of training character patterns written by the same writer, calculate the distance between its feature vector and each subcategory feature vector according to equation (1), and, taking $S_{IJ}$ as the subcategory feature vector number with the minimum distance, increment the corresponding counter.

[0020]

$$q_{IJ} = q_{IJ} + 1 \tag{10}$$

Step 4: After step 3 has been executed for all training character patterns, update the value of $r_{IJ}$ by thresholding the ratio between the occurrence frequency of the category and the occurrence frequency of the subcategory.

[0021]

[Equation 1]

[0022] Steps 2 through 4 are executed for the training characters written by one writer.

Step 5: When $r_{IJ} = 1$ and $r_{I'J'} = 1$ $(I = 1, \dots, M;\ J = 1, \dots, L;\ I' = 1, \dots, M;\ J' = 1, \dots, L)$:

$$P_{IJI'J'} = 1 \tag{12}$$

Step 6: Steps 2 through 5 are then executed on the training character database with the writer changed. With α writers, the threshold θ set in the threshold register 11 is obtained by equation (13).

[0023]

$$\theta = f(\alpha), \quad f: \text{monotonically increasing function} \tag{13}$$

In the character reading apparatus of the third invention, the algorithm for obtaining the determination frequency distribution from unknown input character patterns during the operation of reading a form or document is as follows. Let $S_{IJ}$ be the subcategory feature vector number to which an unknown input character pattern is assigned, and let $q_{IJ}$ be a counter of the number of characters assigned to $S_{IJ}$.

Step 1: Initialize each component of the determination frequency count $P_{IJI'J'}$.

[0024]

$$P_{IJI'J'} = 0 \quad (I = 1, \dots, M;\ J = 1, \dots, L;\ I' = 1, \dots, M;\ J' = 1, \dots, L) \tag{14}$$

Step 2: Initialize each component of the counters.

[0025]

$$q_{IJ} = 0, \quad r_{IJ} = 0 \quad (I = 1, \dots, M;\ J = 1, \dots, L) \tag{15}$$

Step 3: For the characters written on the input form or document, calculate the distance between each feature vector and each subcategory feature vector according to equation (1), and, taking $S_{IJ}$ as the subcategory feature vector number with the minimum distance, increment the corresponding counter.

[0026]

$$q_{IJ} = q_{IJ} + 1 \tag{16}$$

Step 4: After step 3 has been executed for all training character patterns, update the value of $r_{IJ}$ by thresholding the ratio between the occurrence frequency of the category and the occurrence frequency of the subcategory.

[0027]

[Equation 2]

[0028] Steps 2 through 4 are executed for one input form or document.

Step 5: When $r_{IJ} = 1$ and $r_{I'J'} = 1$ $(I = 1, \dots, M;\ J = 1, \dots, L;\ I' = 1, \dots, M;\ J' = 1, \dots, L)$:

$$P_{IJI'J'} = 1 \tag{18}$$

Step 6: The threshold θ set in the threshold register 11 is obtained by equation (19).

[0029]

$$\theta = \theta_1, \quad \theta_1: \text{constant} \tag{19}$$

The character reading apparatus of the present invention has an overall determination unit 12. Even for characters that were conventionally rejected, it performs rescue processing using the determination result information for the entire form or document stored in the determination result memory 8, which makes those characters readable.
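The six-step procedure for building the determination frequency distribution can be sketched as follows. Several points are assumptions and should be read as such. The equation for step 4 is not reproduced in this text, so the sketch takes $r_{IJ} = 1$ when the share of a category's characters that fall in subcategory J is at least a hypothetical parameter `tau`. Equations (12) and (18) as printed set the entry to 1; the sketch instead counts the number of writers for whom both subcategories are in use, so that the result can be compared with a threshold that grows with the number of writers. The dictionary layout and all names are likewise illustrative.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

Vector = List[float]
SubId = Tuple[str, int]  # hypothetical key: (category name, subcategory index J)


def squared_distance(f: Vector, g: Vector) -> float:
    """Equation (1)."""
    return sum((a - b) ** 2 for a, b in zip(f, g))


def nearest_subcategory(f: Vector, dictionary: Dict[SubId, Vector]) -> SubId:
    return min(dictionary, key=lambda sub: squared_distance(f, dictionary[sub]))


def active_subcategories(
    samples: Iterable[Vector], dictionary: Dict[SubId, Vector], tau: float
) -> Set[SubId]:
    """Steps 2 to 4 for one writer: the subcategories with r = 1."""
    q = Counter(nearest_subcategory(f, dictionary) for f in samples)  # step 3
    per_category = Counter()
    for (category, _), count in q.items():
        per_category[category] += count
    # Step 4, assumed form: r = 1 when q_IJ / sum_J q_IJ >= tau.
    return {sub for sub, count in q.items() if count / per_category[sub[0]] >= tau}


def frequency_distribution(
    writers: Iterable[List[Vector]], dictionary: Dict[SubId, Vector], tau: float = 0.5
) -> Counter:
    P = Counter()  # step 1: every entry starts at 0
    for samples in writers:  # step 6: repeat for each writer
        active = active_subcategories(samples, dictionary, tau)
        for a, b in product(active, repeat=2):  # step 5
            P[(a, b)] += 1  # assumption: accumulate a count across writers
    return P


# Toy data (values invented for illustration): two writers.
toy_dictionary = {
    ("1", 1): [0.0, 0.0],
    ("1", 2): [1.0, 0.0],
    ("7", 1): [2.0, 0.0],
    ("7", 2): [5.0, 5.0],
}
writer_a = [[0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
writer_b = [[0.9, 0.0], [2.1, 0.0]]
P = frequency_distribution([writer_a, writer_b], toy_dictionary)
print(P[("1", 1), ("7", 2)], P[("1", 1), ("7", 1)])  # 1 0
```

A `Counter` returns 0 for pairs that were never observed together, which mirrors the zero initialization of step 1 without allocating the full matrix.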

[0030] The processing of the overall determination unit in the character reading apparatus of the first invention is described using the flowcharts of FIGS. 6, 7, and 8. At the stage where recognition has been performed on all characters on the form or document and the determination results are stored in the determination result memory 8, process 28 checks the determination flag $h^{(k)}$ of the k-th character. When the determination flag is set ($h^{(k)} = 1$), the character indicates rejection and is therefore subject to this processing. Process 32 checks the proximity flag $R_{I_1 J_1 I_2 J_2}$ for the first candidate category $C_{I_1}^{(k)}$ and the second candidate category $C_{I_2}^{(k)}$. When the proximity flag is not set ($R_{I_1 J_1 I_2 J_2} = 0$), processing moves on to the next character. When the proximity flag is set ($R_{I_1 J_1 I_2 J_2} = 1$), the determination data of the other characters is searched, and process 36 checks whether a category equal to the second candidate category $C_{I_2}^{(k)}$ exists as the first candidate category of another character. In process 38, when an equal category exists and the proximity flag of the corresponding character is not set ($R_{I_1' J_1' I_2' J_2'} = 0$), process 41 determines the category to be $C_{I_1}^{(k)}$ and process 40 clears the determination flag $h^{(k)}$. In all other cases, when the determination flag is not set, process 41 determines the category to be $C_{I_1}^{(k)}$, and when it is set, process 46 performs rejection.
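The flow of processes 28 through 46 can be sketched as a loop over the stored results. This is a minimal sketch under stated assumptions: the `Result` record, the function name `rescue_by_proximity`, and the sample form are hypothetical, and the check on the matching character follows process 38 as described above (that character's own first and second candidate pair is not flagged). A rejected character is reported as `None`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SubId = Tuple[str, int]  # hypothetical key: (category name, subcategory index J)


@dataclass
class Result:
    flag: int      # determination flag h: 1 = rejected, 0 = determined
    first: SubId   # first candidate subcategory
    second: SubId  # second candidate subcategory (different category)


def rescue_by_proximity(
    results: List[Result], R: Dict[Tuple[SubId, SubId], int]
) -> List[Optional[str]]:
    readings: List[Optional[str]] = []
    for k, res in enumerate(results):
        # Processes 28 and 32: rejected, and the two candidates are flagged as close.
        if res.flag == 1 and R.get((res.first, res.second), 0) == 1:
            for j, other in enumerate(results):
                same_category = other.first[0] == res.second[0]               # process 36
                other_unflagged = R.get((other.first, other.second), 0) == 0  # process 38
                if j != k and same_category and other_unflagged:
                    res.flag = 0  # process 40: clear the determination flag
                    break
        # Process 41 (accept the first candidate) or process 46 (reject).
        readings.append(res.first[0] if res.flag == 0 else None)
    return readings


# Hypothetical five-character form: the second character is ambiguous
# between 1 and 7, and the fourth is a confident 7.
R = {(("1", 2), ("7", 1)): 1, (("7", 1), ("1", 2)): 1}
form = [
    Result(0, ("3", 1), ("8", 1)),
    Result(1, ("1", 2), ("7", 1)),
    Result(0, ("5", 1), ("6", 1)),
    Result(0, ("7", 2), ("1", 1)),
    Result(0, ("2", 1), ("7", 2)),
]
print(rescue_by_proximity(form, R))  # ['3', '1', '5', '7', '2']
```

If no other character on the form has 7 as its first candidate, the ambiguous character stays rejected.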

[0031] Next, the processing of the overall determination unit in the character reading apparatuses of the second and third inventions is described using the flowcharts of FIGS. 9, 10, and 11. At the stage where recognition has been performed on all characters on the form or document and the determination results are stored in the determination result memory 8, process 49 checks the determination flag $h^{(k)}$ of the k-th character. When the determination flag is set ($h^{(k)} = 1$), the character indicates rejection and is therefore subject to this processing. The determination data of the other characters in the determination result memory 8 is searched, and process 55 checks whether a category equal to the second candidate category $C_{I_2}^{(k)}$ exists as the first candidate category of another character. When an equal category exists, the determination frequency distribution memory 10 is referenced based on the subcategory feature vector number of the first candidate of that determined character and the subcategory feature vector number of the first candidate of the character indicating rejection. Process 59 compares the determination frequency count $P_{I_2 J_2 I_1' J_1'}$ with the threshold θ held in the threshold register 11, and when

$$P_{I_2 J_2 I_1' J_1'} \le \theta \tag{20}$$

process 61 determines the category to be $C_{I_1}^{(k)}$ and process 60 clears the determination flag $h^{(k)}$. In all other cases, when the determination flag is not set, process 61 determines the category to be $C_{I_1}^{(k)}$, and when the determination flag is set, process 66 performs rejection.
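The frequency-based variant can be sketched in the same style. The sketch is illustrative only: the `Result` record, the function name `rescue_by_frequency`, and the sample data are hypothetical, the frequency table is a plain `dict` in which missing pairs count as 0, and the table is addressed with the index pair that appears in equation (20), namely the rejected character's second candidate and the other character's first candidate. Only characters that were already determined are used as evidence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SubId = Tuple[str, int]  # hypothetical key: (category name, subcategory index J)


@dataclass
class Result:
    flag: int      # determination flag h: 1 = rejected, 0 = determined
    first: SubId   # first candidate subcategory
    second: SubId  # second candidate subcategory (different category)


def rescue_by_frequency(
    results: List[Result], P: Dict[Tuple[SubId, SubId], int], theta: int
) -> List[Optional[str]]:
    determined = [res.flag == 0 for res in results]  # snapshot before any rescue
    readings: List[Optional[str]] = []
    for k, res in enumerate(results):
        accepted = determined[k]
        if not accepted:  # process 49: the character indicates rejection
            for j, other in enumerate(results):
                if j == k or not determined[j]:
                    continue
                if other.first[0] != res.second[0]:  # process 55
                    continue
                # Process 59, equation (20): the same writer rarely uses both shapes.
                if P.get((res.second, other.first), 0) <= theta:
                    accepted = True
                    break
        # Process 61 (accept the first candidate) or process 66 (reject).
        readings.append(res.first[0] if accepted else None)
    return readings


# Hypothetical form: the second character is ambiguous between 1 and 7.
form = [
    Result(0, ("3", 1), ("8", 1)),
    Result(1, ("1", 2), ("7", 1)),
    Result(0, ("5", 1), ("6", 1)),
    Result(0, ("7", 2), ("1", 1)),
    Result(0, ("2", 1), ("7", 2)),
]
rare = {(("7", 1), ("7", 2)): 0}    # shapes 7/1 and 7/2 never share a writer
common = {(("7", 1), ("7", 2)): 4}  # shapes 7/1 and 7/2 often share a writer
print(rescue_by_frequency(form, rare, theta=1))    # ['3', '1', '5', '7', '2']
print(rescue_by_frequency(form, common, theta=1))  # ['3', None, '5', '7', '2']
```

When the two shapes of 7 frequently come from the same writer, the ambiguous character could still be a 7, so it remains rejected.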

[0032] With reference to FIG. 12, the reading result of the character reading apparatus of the first invention is explained in comparison with the reading result of the prior art. The shape of the second character 68 in FIG. 12 is ambiguous between category 1 and category 7; a conventional character reading apparatus rejects it, and the prior-art reading result is rejection 72. The character reading apparatus of the present invention, however, can read it based on the character information of the entire form or document. Because a shape that can be determined to be category 7 exists at the fourth character 70, the second character 68 is determined again using the character information of the fourth character 70. The determination proceeds as follows. At the stage where the determination results of all characters are stored in the determination result memory 8, the determination result of the second character is rejection 72, and in process 29 the first candidate category is 1 and the second candidate category is 7. As for the proximity flag in process 31, the subcategory feature vectors of categories 1 and 7 are close to each other for the shape 82 of category 1 and the shape 85 of category 7, so the first candidate proximity flag and the second candidate proximity flag are 1. Process 36 checks whether the same category as 7, the second candidate category name shown at 73, exists as a first candidate category among the determination results of the other characters on the form or document; the same category exists as the first candidate category at the fourth character 70, shown at 74. In process 39, no other category lies particularly close to the subcategory feature vector of the shape 84, so the first candidate proximity flag and the second candidate proximity flag are 0. Process 41 can therefore forcibly determine the category to be 1 as the reading result.

【００３３】[0033] Referring to FIG. 14, the reading results of the character reading devices of the second and third inventions are described in comparison with the reading result of the prior art. The shape of the second character 76 in FIG. 14 is ambiguous between category 1 and category 7; a conventional character reading device performs rejection processing, and the reading result is rejection 80. The character reading device of the second invention, however, can read the character based on the character information of the entire form or document. Because the fourth character 78 has a shape that can clearly be determined to be category 7, the second character 76 is determined again using the character information of the fourth character 78. The determination proceeds as follows. At the stage when the determination results of all characters have been stored in the determination result memory 8, the determination result of the second character 76 is rejection 80, but in process 50 its first candidate category is 1 and its second candidate category is 7. Process 55 checks whether the same category as the second candidate category indicated by 81 exists as the first-ranked category in the determination results of the other characters on the form or document; in the determination result of the fourth character, indicated by 82, the same category 7 exists as the first candidate. At this point, the first candidate subcategory feature vector number S_{1 2} of the second character 76 and the subcategory feature vector number S_{7 2} of the fourth character 78 are used as the address, the determination frequency count P_{1 2 7 1} shown in process 57 is loaded from the determination frequency distribution memory, and it is compared with the set threshold θ.

【００３４】[0034] FIG. 15 shows the contents P_{I J 7 1} (I = 1; J = 1, ..., L; L = 4) of the determination frequency distribution memory, taking as the reference the subcategory feature vector number S_{7 1}, whose character shape is like 84. A writer who writes category 7 in the form of character shape 84 in FIG. 13 very rarely writes the same category 7 elsewhere in a different form such as character shape 85 or character shape 86. This reflects the property that, when a writer writes characters, the variation in shape within one category is small and similar shapes are written each time; the present invention exploits this property. In process 59, because P_{7 2 7 1} ≤ θ (21) holds for the determination frequency count P_{7 2 7 1}, the determination flag is cleared in process 60, and in process 61 the determination result is forcibly determined to be C_1^{(2)}, that is, category 1.
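The threshold test of processes 57 to 61 can be sketched as a lookup in a frequency table. This is a minimal sketch under stated assumptions: the dictionary `freq`, the `(category, subcategory)` key layout, and the function name `force_by_frequency` are hypothetical; which candidate of the rejected character supplies the first key is assumed here to be the second candidate, and a missing table entry is assumed to count as frequency 0.

```python
from typing import Dict, Optional, Tuple

# A subcategory feature vector number, written as (category, subcategory index).
SubcatId = Tuple[int, int]


def force_by_frequency(
    first_category: int,
    second_subcat: SubcatId,
    reference_subcat: SubcatId,
    freq: Dict[Tuple[SubcatId, SubcatId], int],
    theta: int,
) -> Optional[int]:
    """Return the forced category, or None if the character stays rejected.

    freq[(a, b)] is how often the same writer produced shapes a and b.
    """
    count = freq.get((second_subcat, reference_subcat), 0)
    # A rare co-occurrence means the writer's real "7" looks different,
    # so the ambiguous character is forced to its first candidate.
    if count <= theta:
        return first_category
    return None


# Writers who use shape (7, 1) almost never also use shape (7, 2).
freq = {
    ((7, 1), (7, 1)): 40,
    ((7, 2), (7, 1)): 1,
}
print(force_by_frequency(1, (7, 2), (7, 1), freq, theta=3))
```

With the sample table, the pair `((7, 2), (7, 1))` has a count at or below the threshold, so the call returns the first candidate category.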

【００３５】[0035]

[Effects of the Invention] As described above, according to the present invention, characters that resemble more than one category can be read by absorbing distortion caused by, for example, a writer's habits, based on the overall information of the character data written before and after them on the form. In addition, the character reading device of the present invention can read characters even when the determination must be made from the character shapes as a whole, as with multi-font printed characters, where a single character may share its shape with a different category in another font and cannot be read in isolation. Although the present invention has been described for characters, it can readily be realized for images, speech, and figures as well. Furthermore, although the description uses the Euclidean distance as the measure of closeness between a feature vector and the recognition dictionary, the invention is also applicable to other distances (Mahalanobis distance, city block distance, etc.) and to similarities (simple similarity, multiple similarity, etc.).
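As an illustration of swapping the closeness measure, the following Python sketch compares an input feature vector against a small dictionary using either the Euclidean or the city block distance. The dictionary layout, the function names, and the sample vectors are assumptions made for this example; the cosine similarity is included only as a stand-in for a similarity measure, and the Mahalanobis distance is omitted because it needs a covariance estimate that the text does not give.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x: Vector, y: Vector) -> float:
    """City block (L1) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x: Vector, y: Vector) -> float:
    """Normalized inner product; larger means closer."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def nearest(
    x: Vector,
    dictionary: Dict[Tuple[str, int], Vector],
    distance: Callable[[Vector, Vector], float],
) -> Tuple[str, int]:
    """Return the (category, subcategory) key with the smallest distance to x."""
    return min(dictionary, key=lambda key: distance(x, dictionary[key]))


# Hypothetical two-dimensional dictionary with one subcategory per category.
dictionary = {("1", 1): [0.0, 1.0], ("7", 1): [1.0, 1.0]}
unknown = [0.1, 0.9]
print(nearest(unknown, dictionary, euclidean))
print(nearest(unknown, dictionary, city_block))
```

For a similarity measure the selection would use `max` rather than `min`, since a larger value means a closer match.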

[Brief description of the drawings]

FIG. 1 is a block diagram for explaining an embodiment of the character reading device of the first, second, and third inventions.

FIG. 2 is a diagram for explaining the reading result of a conventional character reading device.

FIG. 3 is a diagram for explaining the contents of the determination result memory of the character reading device of the first, second, and third inventions.

FIG. 4 is a diagram for explaining the contents of the proximity flag memory of the character reading device of the first, second, and third inventions.

FIG. 5 is a diagram for explaining the contents of the determination frequency distribution memory of the character reading device of the second and third inventions.

FIG. 6 is a part of a flowchart for explaining the processing of the comprehensive determination unit of the character reading device of the first invention.

FIG. 7 is a part of a flowchart for explaining the processing of the comprehensive determination unit of the character reading device of the first invention.

FIG. 8 is a part of a flowchart for explaining the processing of the comprehensive determination unit of the character reading device of the first invention.

FIG. 9 is a part of a flowchart for explaining the processing of the comprehensive determination unit of the character reading device of the second invention.

FIG. 10 is a part of a flowchart for explaining the processing of the comprehensive determination unit of the character reading device of the second invention.

FIG. 11 is a part of a flowchart for explaining the processing of the comprehensive determination unit of the character reading device of the second invention.

FIG. 12 is a diagram for comparing the character reading result of the character reading device of the first invention with the character reading result of the prior art.

FIG. 13 is a diagram for explaining examples of category 1 and category 7 character shapes read by the character reading device of the first, second, and third inventions.

FIG. 14 is a diagram for comparing the character reading result of the character reading device of the second and third inventions with the character reading result of the prior art.

FIG. 15 is a diagram for explaining the state of the distribution stored in the determination frequency distribution memory of the second and third inventions.

[Explanation of symbols]

1 Scanner unit
2 Character cutout unit
3 Feature extraction unit
4 Recognition dictionary unit
5 Distance calculation unit
6 Determination unit
7 Proximity flag memory
8 Determination result memory
9 Determination frequency distribution control unit
10 Determination frequency distribution memory
11 Threshold register
12 Comprehensive determination unit
13 Determination result

Continuation of front page: (58) Fields searched (Int. Cl.⁶, DB name): G06K 9/03, G06K 9/62, G06K 9/68, G06K 9/72

Claims

(57) [Claims]

1. A character reading device that reads characters on a form or a document by using an individual character recognition unit which has a recognition dictionary storing feature vectors of a plurality of subcategories for each character category and which performs category determination processing based on distance values between a feature vector of an input character pattern and the feature vector of each subcategory of the recognition dictionary, the character reading device comprising:
a determination result memory that stores, for the first candidate and the second candidate having the smallest and second smallest distance values between the feature vector of the input character pattern and the subcategory feature vectors of the recognition dictionary, the category names of the first and second candidates, the distance values of the first and second candidates, their subcategory feature vector numbers, and a determination flag indicating whether the character is rejected or read;
a proximity flag memory that stores proximity flags set for subcategory pairs whose subcategory feature vectors in the recognition dictionary are separated by a small distance value;
a determination frequency distribution memory that stores in advance the frequency with which each subcategory feature vector is determined for the same writer; and
forced character determination means which, after the determination processing of one sheet has been completed and when the determination flag of the i-th character indicates rejection, determines the i-th character to be unreadable and proceeds to the next character if no proximity flag is set for its first and second candidate categories; which, if the proximity flags are set, compares whether the same category as the second candidate category of the i-th character exists as the first candidate category of another, already determined character; and which, when the same category exists at a j-th character on the form or document, refers to the determination frequency distribution memory based on the subcategory feature vector number of the second candidate of the i-th character and the subcategory feature vector number of the first candidate of the j-th character, and forcibly determines the rejected character again with the first candidate category of the i-th character as the determination result.

2. The character reading device according to claim 1, wherein the contents of the determination frequency distribution memory are updated during a reading operation on the form or characters.