JPH06318270A

JPH06318270A - Method and device for reading out characters

Info

Publication number: JPH06318270A
Application number: JP3000530A
Authority: JP
Inventors: Toshifumi Yamauchi; 俊史山内
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1991-01-08
Filing date: 1991-01-08
Publication date: 1994-11-15

Abstract

PURPOSE: To perform character recognition with high discrimination capability for similar characters by using shape information from the preceding and succeeding characters. CONSTITUTION: When a distance calculating part 5 calculates the distance between a feature vector obtained by a feature extracting part 3 and a recognition dictionary vector stored in a recognition dictionary memory 4, the projection vector obtained in that calculation is stored in a projection vector register 7. An evaluation function selecting table memory 8 looks up the evaluation function corresponding to the region of the projection vector, and when a judging part 11 judges that the judgement conditions are satisfied, the evaluation function number 14 is stored in an evaluation function selecting register 9. An evaluation function table memory 10 outputs an evaluation value 16 based on the specified evaluation function, and the judging part 11 determines a category based on the value 16 and the distance value found by the calculating part 5. The evaluation function can thus be changed according to the regions of the projection vectors of the preceding and succeeding character shapes, and a character that could not be judged can be judged by recognizing it again with the changed evaluation function.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character reading method and device for automatically reading handwritten and printed characters, and more particularly to a character reading method and device for reading characters of similar shapes that exhibit handwriting deformation.

[0002]

[Prior Art] In a conventional character reading method and device that reads character patterns such as handwritten or printed characters, character string data written on a form is cut into individual characters by a character cutout unit, and a feature extraction unit performs feature extraction on each character pattern. Feature extraction absorbs deformation of the character pattern and converts it into a feature vector of reduced dimensionality. In ordinary individual character recognition, feature extraction is applied to each unknown input character pattern produced by the character cutout unit, the distance value between the resulting feature vector and each character recognition dictionary vector designed in advance from a set of learning character patterns is calculated, and a determination unit takes the character category with the minimum distance value as the reading result.

[0003]

It is desirable for a character recognition dictionary to represent the set of feature vectors of the learning character patterns with representative vectors of as low a dimension as possible, and one design technique uses principal component analysis as a statistical data compression method. Principal component analysis is effective for compressing a correlated data set: the eigenvalue problem of the covariance matrix of the data set is solved, and the eigenvectors are selected in descending order of their eigenvalues. When applied to character recognition dictionary design, it exploits the fact that the deformations of character patterns are correlated within each category, and compresses the feature vectors of the learning character pattern set using the correlation among them. A recognition dictionary vector consists of the average vector, made up of the mean of each component of the feature vectors of the learning character pattern set, and the principal component axis vectors, which are the eigenvectors obtained by principal component analysis of the feature vectors. (Reference: "Projection distance method in handwritten character recognition", Ikeda, Tanaka, Motooka, Transactions of the Information Processing Society of Japan, Vol. 24, No. 1, Jan. 1983)
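As a rough illustration of the dictionary design described above, the following Python sketch computes the average vector of a set of feature vectors and approximates the first principal component axis by power iteration on the covariance matrix. The function names, the choice of power iteration in place of a full eigenvalue solver, and the toy data are assumptions made for this example; the source does not say how the eigenvalue problem is solved.

```python
import math


def mean_vector(samples):
    """Component-wise mean of the feature vectors (the average vector g)."""
    n = len(samples)
    return [sum(s[d] for s in samples) / n for d in range(len(samples[0]))]


def covariance(samples, mean):
    """Covariance matrix of the feature vectors around the mean."""
    n, dim = len(samples), len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        diff = [s[d] - mean[d] for d in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += diff[a] * diff[b] / n
    return cov


def leading_axis(cov, iterations=200):
    """Power iteration: unit eigenvector for the largest eigenvalue.

    Assumption: the uniform start vector is not orthogonal to that
    eigenvector; a production system would use a library eigen solver.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def build_dictionary_entry(samples):
    """Return (average vector, [first principal component axis])."""
    g = mean_vector(samples)
    return g, [leading_axis(covariance(samples, g))]
```

Only the first axis is computed here; further axes would require deflation or a full eigen decomposition, which is omitted to keep the sketch short.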

[Problems to Be Solved by the Invention] However, because this conventional character reading method cuts the input into individual characters and then recognizes each character independently, it has no information about the characters written before and after on the form or document. When a character pattern that resembles several character categories is input, for example because of the writer's habits, discrimination is difficult, and the drawback is that the character is either rejected as undeterminable or misrecognized.

[0004]

FIG. 2 compares the determination results of the prior art and the present invention: (a) shows an example of characters written on a form, and (b) and (c) show the determination results of the prior art and of the present invention, respectively. In FIG. 2(a), five characters are written in the character frames, but it is difficult to tell whether the second character 12 is the numeral '1' or '7', and the conventional character reading method treats it as undeterminable, as indicated by 51 in FIG. 2(b). When a human reads characters, however, he or she does not necessarily recognize each character in isolation but also uses information from the characters written before and after it. In the example of FIG. 2(a), looking at the second character 12 alone makes it difficult to decide between '1' and '7', but because the fourth character 13 can be reliably determined to be '7', a human judges that the second character is probably '1'. A conventional character reading method that judges each character independently can hardly achieve reading performance close to that of a human.

[0005]

[Means for Solving the Problems] The character reading method of the present invention reads characters on a form or document using an individual character recognition method that determines the category to which a character pattern belongs, taking as its measure the distance value between the feature vector extracted from an unknown input character pattern and a recognition dictionary vector consisting of the average vector of the feature vectors extracted from a set of learning character patterns and the principal component axis vectors obtained by principal component analysis of those feature vectors. The method is characterized by having an evaluation function, defined for each category, whose domain is the region of the projection vector obtained by the inner product operation between the principal component axis vectors and the difference vector between the feature vector of the unknown input pattern and the average vector, and means for selecting and updating the evaluation function adaptively to the regions of the projection vectors of the preceding and succeeding characters on the form or document; and by performing the determination process again, based on the evaluation value obtained from the evaluation function, for any character pattern that was undeterminable in the determination process that uses the distance value and the evaluation value obtained from the evaluation function as measures.

[0006]

The character reading device of the present invention has a feature extraction unit that extracts character features and a recognition dictionary memory that stores recognition dictionary vectors, each consisting of the average vector of the feature vectors extracted by the feature extraction unit from a set of learning character patterns and the principal component axis vectors obtained by principal component analysis of those feature vectors, and reads characters on a form or document using an individual character recognition method that performs determination using as its measure the distance value between the recognition dictionary vector and the feature vector extracted by the feature extraction unit from an unknown input character pattern. The device is characterized by comprising: a projection vector register that stores the projection vector value obtained by the inner product operation between the principal component axis vectors and the difference vector between the feature vector of the unknown input character pattern and the average vector; an evaluation function selection table memory addressed by the value of the projection vector register; an evaluation function selection register that stores data for selecting an evaluation function adapted to the regions of the projection vectors of the preceding and succeeding characters on the form or document; an evaluation function table memory that outputs an evaluation value according to the selected evaluation function, using the contents of the projection vector register as an address; and a determination unit that performs determination based on the distance value and the evaluation value.

[0007]

[Embodiment] Next, the present invention will be described with reference to the drawings.

[0008]

FIG. 1 is a block diagram showing an embodiment of the character reading device of the present invention. The scanner unit 1 binarizes the optically scanned form image data and generates a black-and-white binary character string pattern. The character cutout unit 2 cuts individual characters out of the character string pattern based on, for example, its pitch information. The feature extraction unit 3 extracts character features such as density features and contour features and generates a feature vector $f = (f_1, \ldots, f_N)$. The recognition dictionary memory 4 stores recognition dictionary vectors, each consisting of the average vector $g_i = (g_{i1}, \ldots, g_{iN})$ and the principal component axis vectors $\varphi_{ik} = (\varphi_{ik1}, \ldots, \varphi_{ikN})$ $(k = 1, \ldots, L)$ obtained from the set of feature vectors of the learning patterns of each character category $C_i$ $(i = 1, \ldots, M)$ to be recognized. The distance calculation unit 5 obtains the distance value $D_i$ between the feature vector $f$ and the recognition dictionary vector of category $C_i$ by equation (1).

[0009]
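Equation (1) is not reproduced in this text. Going by the geometric description in paragraph [0023] (the distance is the length of the perpendicular from $f$ to the plane through the average vector spanned by the principal component axis vectors), and assuming the axes are orthonormal, one plausible form is $D_i = \sqrt{\lVert f - g_i \rVert^2 - \sum_{k=1}^{L} (\varphi_{ik} \cdot (f - g_i))^2}$. The Python sketch below computes that quantity. Whether the original equation uses this form or its square cannot be confirmed from the text, and the function names are chosen for this example only.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def projection_distance(f, g, axes):
    """Length of the perpendicular from f to the subspace through g.

    Assumes `axes` are orthonormal principal component axis vectors.
    """
    diff = [fi - gi for fi, gi in zip(f, g)]
    components = [dot(phi, diff) for phi in axes]
    residual = dot(diff, diff) - sum(c * c for c in components)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(residual, 0.0))
```

For example, with `g` at the origin and the first two coordinate axes as principal axes, the distance of `[3, 4, 12]` is simply its third coordinate.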

[0010]

The category register 6 stores the category $C_{j1}$ that attains the minimum distance value $D_{j1}$.

[0011]

[0012]

The projection vector register 7 stores the projection vector $R_i = (r_{i1}, \ldots, r_{ik}, \ldots, r_{iL})$, whose components are obtained from the inner products of the feature vector $f$ with the principal component axis vectors $\varphi_{ik}$ as shown in equation (3).

[0013]

$$r_{ik} = \varphi_{ik} \cdot (f - g_i) \qquad (3)$$

The evaluation function selection table memory 8 supplies, for each category, the data used to select an evaluation function adapted to the region of the projection vector. It is addressed by the value of the category register 6 and the value of the projection vector register 7, and the data it outputs is the number of the function to be selected for each projection vector region of each category.
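The projection of equation (3) and the table lookup described above can be sketched in Python as follows. The projection function follows equation (3) directly. The selection table is a stand-in: the source states only that the table is addressed by category and projection vector value and returns a function number, so the region boundary on the first component and the numbers stored here are hypothetical placeholders.

```python
def projection_vector(f, g, axes):
    """Components r_k = phi_k . (f - g), following equation (3)."""
    diff = [fi - gi for fi, gi in zip(f, g)]
    return [sum(p * d for p, d in zip(phi, diff)) for phi in axes]


# Hypothetical table contents: per category, a boundary on |r_1| and the
# function numbers returned inside and outside that boundary.
SELECTION_TABLE = {
    "1": {"boundary": 1.0, "inside": 2, "outside": 1},
    "7": {"boundary": 1.0, "inside": 2, "outside": 1},
}


def select_function_number(category, projection):
    """Look up the function number for a category and projection vector."""
    entry = SELECTION_TABLE[category]
    if abs(projection[0]) <= entry["boundary"]:
        return entry["inside"]
    return entry["outside"]
```

A hardware table memory would be addressed by a quantized projection value; the single threshold used here is the simplest possible partition into two regions.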

[0014]

The evaluation function selection register 9 consists of as many registers as there are categories, and holds, for each category, the most recent number of the function to be selected.

[0015]

The evaluation function table memory 10 stores, for each category, the data that gives an evaluation value according to the region of the projection vector. It is addressed by the category name held in the category register 6, the number of the function to be selected held in the evaluation function selection register 9, and the value of the projection vector register 7, and it outputs the evaluation value 16 as data.

[0016]

As the first determination process, the determination unit 11 compares the distance value $D_i$ obtained for each category by the distance calculation unit 5 with preset thresholds, compares the evaluation values 16 for the categories, and decides whether to determine a category or to treat the character as undeterminable. Then, only for characters that were undeterminable in the first process, it performs the determination process again and makes a second decision, based on the evaluation value 16 output by the evaluation function table memory 10, on whether to determine a category or to treat the character as undeterminable.

[0017]

Next, the operation of this embodiment will be described. The character data written on the form shown in FIG. 2(a) is optically scanned and binarized by the scanner unit 1, and cut into individual characters by the character cutout unit 2. The feature extraction unit 3 converts each character into a feature vector $f$, and the distance calculation unit 5 calculates the distance between the feature vector and each recognition dictionary vector according to equation (1). The determination unit 11 checks the distance values of the first candidate category $C_{j1}$ obtained by equation (2) and the second candidate category $C_{j2}$ obtained by equation (4).

[0018]
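Equations (2) and (4) are not reproduced in this text. From the surrounding description, the first candidate is the category with the minimum distance value and the second candidate is presumably the category with the next smallest one. Under that assumption, candidate selection might be sketched in Python as follows; the function name and the dictionary representation of the distance values are illustrative only.

```python
def first_and_second_candidates(distances):
    """Return (C_j1, C_j2): the categories with the two smallest distances.

    `distances` maps a category name to its distance value D_i and must
    contain at least two categories.
    """
    ranked = sorted(distances, key=distances.get)
    return ranked[0], ranked[1]
```

With distance values such as `{"1": 0.9, "7": 1.0, "4": 3.2}`, the first candidate is '1' and the second is '7'.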

[0019]

Thresholds $\varepsilon$ ($> 0$) and $\delta$ ($> 0$) for the distance values are defined in advance, and the following conditions are set:

$$D_{j1} < \delta \qquad (5)$$

$$D_{j2} - D_{j1} > \varepsilon \qquad (6)$$

and, for the output values $P_i(R_i)$ of the evaluation function table memory,

$$P_{j1}(R_{j1}) > 0 \qquad (7)$$

$$P_i(R_i) = 0 \quad (\text{for all } i \ne j1) \qquad (8)$$

[0020]

In the first determination, the character is determined to be category $C_{j1}$ when equations (5), (6), (7), and (8) are all satisfied; otherwise it is treated as undeterminable. In the second determination, only the characters that were undeterminable in the first determination are processed, and the determined category is $C_{j1}$ when equations (7) and (8) are both satisfied.
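The two decision rules can be written down compactly. The Python sketch below encodes conditions (5) to (8) as stated in the text; the function names and the dictionary of evaluation values keyed by category are assumptions made for this example.

```python
def evaluation_conditions(evaluations, j1):
    """Conditions (7) and (8): only the first candidate scores above zero."""
    cond7 = evaluations[j1] > 0
    cond8 = all(v == 0 for c, v in evaluations.items() if c != j1)
    return cond7 and cond8


def first_determination(d_j1, d_j2, evaluations, j1, delta, epsilon):
    """Accept C_j1 only if (5), (6), (7), and (8) all hold."""
    cond5 = d_j1 < delta
    cond6 = d_j2 - d_j1 > epsilon
    return cond5 and cond6 and evaluation_conditions(evaluations, j1)


def second_determination(evaluations, j1):
    """Re-judge a rejected character using (7) and (8) only."""
    return evaluation_conditions(evaluations, j1)
```

The second rule drops the distance checks, so a character whose two best distances are too close together can still be accepted once the evaluation functions separate the candidates.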

[0021]

The first candidate category $C_{j1}$ obtained by the distance calculation unit 5 is stored in the category register 6, and the projection vector obtained by equation (3) is stored in the projection vector register 7. The evaluation function selection table memory 8 outputs, for each category, the evaluation function number 14 to be used according to the region of the projection vector. When the determination unit 11 has determined the character to be category $C_{j1}$, the evaluation function number 14 is stored, in response to the determination signal 15, in the per-category evaluation function selection register 9. When equations (5) and (6) are not satisfied in the determination unit 11, the determination signal 15 is not output and the evaluation function number 14 is not stored in the evaluation function selection register 9.
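The gating of the register update by the determination signal can be modelled in a few lines of Python. The class below is a software stand-in for the evaluation function selection register 9; its name, its methods, and the default initial function number are assumptions made for this example.

```python
class SelectionRegister:
    """One stored function number per category, updated only on a
    successful determination (determination signal 15 asserted)."""

    def __init__(self, categories, initial_number=1):
        self.numbers = {c: initial_number for c in categories}

    def update(self, category, function_number, determination_signal):
        # Without the determination signal the table output is ignored.
        if determination_signal:
            self.numbers[category] = function_number

    def selected(self, category):
        return self.numbers[category]
```

Because rejected characters never write to the register, only confidently recognized characters influence which evaluation function is applied to their neighbours.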

[0022]

In the example character patterns of FIG. 2(a), every character except the second character 12 has a standard shape, so the determination unit 11 finds only the second character 12 undeterminable and determines a category for each of the other characters. The details of this embodiment are therefore described for the second character 12 and the fourth character 13.

[0023]

FIG. 3 illustrates the recognition of an unknown input pattern in the individual character recognition method using principal component analysis, and shows the distribution 18 of feature vectors obtained from the set of learning character patterns of the numeral '1' and the distribution 19 of the learning character patterns of the numeral '7'. Principal component analysis, obtained as the eigenvalue expansion of the covariance matrix, yields for '1' a first principal component axis vector $\varphi_{11}$ passing through the average vector $g_1$ of the feature vectors and a second principal component axis vector $\varphi_{12}$ orthogonal to it. Likewise, for '7' it yields the average vector $g_7$, the first principal component axis vector $\varphi_{71}$, and the second principal component axis vector $\varphi_{72}$. The distance between the feature vector $f$ of the unknown input pattern and a recognition dictionary vector is the length of the perpendicular from $f$ to the plane spanned by the average vector and the principal component axis vectors, giving the distance $D_1$ to category '1' and the distance $D_7$ to category '7'. The projection vectors $R_1$ and $R_7$ obtained by equation (3) are the position vectors, with the respective average vectors as origin, of the points where the perpendiculars of lengths $D_1$ and $D_7$ meet the planes. In the distribution of FIG. 3 there is a region 29 where the feature vectors of the two categories '1' and '7' lie close together, and for shapes that resemble both '1' and '7', such as the second character 12 in FIG. 2(a), the feature vector $f$ generally lies in this proximity region 29.

[0024]

FIG. 4 illustrates the evaluation function of the present invention. With reference to it, the determination method used when the character data of the second character 12 and the fourth character 13 of FIG. 2(a) are input is described. In FIG. 4, $f^{(2)}$ is the feature vector of the second character 12 and $f^{(4)}$ is the feature vector of the fourth character 13. Based on the average vector and principal component axis vectors of the recognition dictionary for '7', the projection of the feature vector $f^{(2)}$ is the projection vector $R_7^{(2)}$, and the projection of the feature vector 32 is the projection vector $R_7^{(4)}$. Let $r_7^{(2)}$ be the component of $R_7^{(2)}$ along the first principal component axis and $r_7^{(4)}$ the component of $R_7^{(4)}$ along the first principal component axis. The lower half of FIG. 4 shows an example of an evaluation function expressing the relationship between the evaluation value and the first-principal-axis component of the projection vector in the recognition dictionary for '7'.

[0025]

In this example the evaluation function represents reliability: the evaluation value is larger the closer the projection is to the average vector of the set of learning character patterns, and becomes smaller as the distance from the average vector increases. A further feature of the present invention is that multiple evaluation functions can be stored and the evaluation function can be changed based on the projection vector information of the preceding and succeeding characters. In the example of FIG. 4, category '7' has two evaluation functions of different slope, $P_{71}(r)$ and $P_{72}(r)$, and category '1' likewise has the evaluation functions $P_{11}(r)$ and $P_{12}(r)$.
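To make the idea of evaluation functions with different slopes concrete, the Python sketch below models them as triangular functions of the first-axis projection component. The triangular shape, the half-widths, and the choice of making the second function the narrower one are all assumptions made for this example; the source states only that the value is largest near the average vector, decreases with distance from it, and that the two functions differ in slope.

```python
def make_evaluation_function(half_width, peak=1.0):
    """Return P(r): `peak` at r = 0, falling linearly to 0 at |r| = half_width."""
    def evaluate(r):
        return max(0.0, peak * (1.0 - abs(r) / half_width))
    return evaluate


# Hypothetical pair for category '7': a gentle slope and a steep slope.
P71 = make_evaluation_function(half_width=4.0)
P72 = make_evaluation_function(half_width=2.0)
```

With these illustrative shapes, a projection component of 3.0 still receives a positive value from the wider function but exactly zero from the narrower one, which is the kind of difference that conditions (7) and (8) rely on.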

[0026]

The evaluation functions are stored in the evaluation function table memory 10, which holds multiple evaluation functions for each category; which evaluation function is used is specified by the evaluation function selection register 9.

[0027]

FIG. 5 illustrates the method of selecting an evaluation function in the present invention. The contents of the evaluation function selection table memory 8 are described with reference to FIG. 5. When the projection vector of the feature vector lies in the region 40 near the average vector, the evaluation function $P_{71}(r)$ is selected, and when it lies in the region 41 away from the average vector, the evaluation function $P_{11}(r)$ is selected. The initial value of the evaluation function selection register 9 is set so that the evaluation function $P_{11}(r)$ is selected. That is, the evaluation functions $P_{11}(r)$ and $P_{71}(r)$ shown in FIG. 4 are used.

[0028]

In the first determination, when the second character 12 of FIG. 2 is input, equations (5), (6), (7), and (8) are not satisfied, so the result is a reject; the determination signal 15 is not generated and the output of the evaluation function selection table memory 8 is not stored in the evaluation function selection register 9. The contents of the evaluation function selection register 9 therefore remain at their initial values and the evaluation function $P_{71}(r)$ stays selected. When the fourth character 13 is input, however, the projection vector $R_7^{(4)}$ falls in the region 40, so the number of the evaluation function $P_{72}(r)$ is output from the evaluation function selection table memory 8 as the evaluation function number 14; because equations (5), (6), (7), and (8) are also satisfied in the determination unit 11, the determination signal 15 is output and the number of $P_{72}(r)$ is stored in the evaluation function selection register 9 for category '7'. In other words, from the point at which the fourth character 13 is recognized, $P_{72}(r)$ is used as the evaluation function.

[0029]

In the second determination, the information of the second character 12 is set again as an undeterminable character. Its projection vector lies in the region 40, so the evaluation function selection table memory 8 selects the evaluation function $P_{71}(r)$ as its output; but because equations (5) and (6) are not satisfied, the determination signal 15 is not output, the contents of the evaluation function selection register 9 remain the number of $P_{72}(r)$, and $P_{72}(r)$ is used as the evaluation function. The outputs of the evaluation function table memory 10 are $P_{11}(R_7^{(2)}) > 0$ and $P_{72}(R_7^{(2)}) = 0$, so conditions (7) and (8) are satisfied and category '1' is obtained as the determination result 56, as shown in FIG. 2(c).
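The two-pass flow of this walk-through can be simulated end to end with made-up numbers. In the Python sketch below every numeric value (distances, projection components, thresholds, function widths) is hypothetical, only the first-axis component of each projection is used, and only two of the five characters are modelled. The selection rule follows paragraph [0028] in letting a projection near the average vector select function number 2; this is an interpretive choice, since the figures are not available here.

```python
def triangular(half_width):
    """Evaluation value: 1 at the mean, falling linearly to 0 at half_width."""
    return lambda r: max(0.0, 1.0 - abs(r) / half_width)


# Hypothetical evaluation functions, keyed by (category, function number).
P = {("1", 1): triangular(4.0), ("1", 2): triangular(2.0),
     ("7", 1): triangular(4.0), ("7", 2): triangular(2.0)}
DELTA, EPSILON = 2.0, 0.5  # hypothetical thresholds for (5) and (6)


def select_number(r):
    """Hypothetical selection table: near the mean -> 2, otherwise -> 1."""
    return 2 if abs(r) <= 1.0 else 1


def evaluation_ok(char, register, j1):
    """Conditions (7) and (8) with the currently selected functions."""
    values = {c: P[(c, register[c])](char["r"][c]) for c in register}
    return values[j1] > 0 and all(v == 0 for c, v in values.items() if c != j1)


def read_line(chars):
    register = {"1": 1, "7": 1}  # initial function numbers
    results = []
    for char in chars:  # first determination
        ranked = sorted(char["D"], key=char["D"].get)
        j1, j2 = ranked[0], ranked[1]
        distance_ok = (char["D"][j1] < DELTA
                       and char["D"][j2] - char["D"][j1] > EPSILON)
        if distance_ok and evaluation_ok(char, register, j1):
            register[j1] = select_number(char["r"][j1])  # signal 15 asserted
            results.append(j1)
        else:
            results.append(None)  # undeterminable for now
    for idx, char in enumerate(chars):  # second determination
        if results[idx] is None:
            j1 = min(char["D"], key=char["D"].get)
            if evaluation_ok(char, register, j1):
                results[idx] = j1
    return results


ambiguous = {"D": {"1": 0.9, "7": 1.0}, "r": {"1": 1.5, "7": 3.0}}
clear_seven = {"D": {"1": 5.0, "7": 0.3}, "r": {"1": 6.0, "7": 0.5}}
```

Calling `read_line([ambiguous, clear_seven])` rejects the ambiguous character in the first pass because its two distances are too close, switches category '7' to the narrower function after the clear '7' is recognized, and then resolves the ambiguous character as '1' in the second pass. The second pass reuses only the stored distances and projections, in line with the description.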

[0030]

Moreover, the only character information needed for the second determination is the category name and the projection vector. If the values of the category register 6 and the projection vector register 7 are saved when the determination unit 11 finds a character undeterminable, the processing in the scanner unit 1, the character cutout unit 2, the feature extraction unit 3, and the distance calculation unit 5 need not be repeated.

[0031]

Although this embodiment deals with characters, the same operation applies to images, speech, and figures.

[0032]

[Effects of the Invention] As described above, the present invention makes it possible to read characters that resemble several categories, for example because of the writer's habits, based on the information of the characters written before and after on the form. Because the evaluation functions are defined in the space compressed by principal component analysis, the feature information of the preceding and succeeding characters can also be stored efficiently. Furthermore, the evaluation functions can be set freely to suit the system, and the evaluation function to be used can be freely selected, so flexible reading processing is possible. The invention can also be readily applied to images, speech, and figures in addition to characters.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the character reading device of the present invention.

FIG. 2 is a diagram comparing the determination results of the prior art and of the present invention.

FIG. 3 is a diagram for explaining the recognition of an unknown input pattern in the individual character recognition method using principal component analysis.

FIG. 4 is a diagram for explaining the evaluation function in the present invention.

FIG. 5 is a diagram for explaining the method of selecting an evaluation function in the present invention.

[Explanation of symbols]

1 Scanner unit
2 Character cutout unit
3 Feature extraction unit
4 Recognition dictionary memory
5 Distance calculation unit
6 Category register
7 Projection vector register
8 Evaluation function selection table memory
9 Evaluation function selection register
10 Evaluation function table memory
11 Determination unit
14 Evaluation function number
15 Determination signal
16 Evaluation value

Claims

[Claims]

1. A character reading method for reading characters on a form or document using an individual character recognition method that determines the category to which a character pattern belongs, taking as a measure the distance value between a feature vector extracted from an unknown input character pattern and a recognition dictionary vector consisting of an average vector of feature vectors extracted from a set of learning character patterns and principal component axis vectors obtained by principal component analysis of the feature vectors, the method being characterized by having an evaluation function defined for each category, whose domain is the region of a projection vector obtained by an inner product operation between the principal component axis vectors and the difference vector between the feature vector of the unknown input pattern and the average vector, and means for selecting and updating the evaluation function adaptively to the regions of the projection vectors of the preceding and succeeding characters on the form or document, wherein, for a character pattern found undeterminable in a determination process that uses the distance value and the evaluation value obtained from the evaluation function as measures, the determination process is performed again based on the evaluation value obtained from the evaluation function.

2. A character reading device having a feature extraction unit that extracts character features and a recognition dictionary memory that stores a recognition dictionary vector consisting of an average vector of feature vectors extracted by the feature extraction unit from a set of learning character patterns and principal component axis vectors obtained by principal component analysis of the feature vectors, the device reading characters on a form or document using an individual character recognition method that performs determination taking as a measure the distance value between the recognition dictionary vector and a feature vector extracted by the feature extraction unit from an unknown input character pattern, the device being characterized by comprising: a projection vector register that stores a projection vector value obtained by an inner product operation between the principal component axis vectors and the difference vector between the feature vector of the unknown input character pattern and the average vector; an evaluation function selection table memory addressed by the value of the projection vector register; an evaluation function selection register that stores data for selecting an evaluation function adapted to the regions of the projection vectors of the preceding and succeeding characters on the form or document; an evaluation function table memory that outputs an evaluation value according to the selected evaluation function, using the contents of the projection vector register as an address; and a determination unit that performs determination based on the distance value and the evaluation value.