JPS6355105B2

Info

Publication number: JPS6355105B2
Application number: JP55120189A
Authority: JP
Inventors: Eiichiro Yamamoto; Kyonori Myata; Hideaki Sugawara; Yoshihisa Fujii; Koya Fujita
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1980-08-30
Filing date: 1980-08-30
Publication date: 1988-11-01
Also published as: JPS5745680A

Description

[Detailed description of the invention]

The present invention relates to a character recognition device with a standard pattern correction function that gradually corrects the standard patterns in its dictionary using sample characters.

Handwritten characters are deformed in many different ways, so, unlike printed characters, their deformations are difficult to estimate from a small number of samples. To improve the reading performance of a handwritten character recognition device, it is therefore necessary to use a large number of sample characters and to create from them the dictionary standard patterns used for identification. Collecting a large number of sample characters in advance, however, is difficult. A reasonable approach is to create the dictionary patterns from a small number of sample characters at first, then gradually increase the number of sample characters and correct the dictionary patterns with each new sample. Increasing the number of samples does not necessarily yield a better standard pattern, though: a standard pattern computed, for example, as the average over every kind of deformed character ends up far removed from the characters to be recognized.

The present invention addresses this problem. It is characterized by comprising a dictionary that stores the mean $\vec{\mu}^C_N$, which is derived from sample patterns $\vec{X}_1$ to $\vec{X}_N$ belonging to a category C and serves as the standard pattern for pattern recognition, the standard deviation $\vec{\sigma}^C_N$, and the number of samples N; and a dictionary correction circuit which, when the standard deviation $\vec{\sigma}^C_{N+1}$ including a newly input sample pattern $\vec{X}_{N+1}$ satisfies $\|\vec{\sigma}^C_{N+1}\| \le d$ with respect to a threshold d, includes the sample pattern $\vec{X}_{N+1}$ in category C and corrects its mean to $\vec{\mu}^C_{N+1}$, and which, when $\|\vec{\sigma}^C_{N+1}\| > d$, keeps the mean of category C as $\vec{\mu}^C_{N+1}$ and creates a subcategory C′ with the sample pattern $\vec{X}_{N+1}$. This is explained in detail below with reference to the drawings.

Let $x_j$ be a series of sample values, and let $\mu_N$ and $\sigma_N$ be the mean and standard deviation of the N samples $j = 1, \ldots, N$. Then

$$\mu_N = \frac{1}{N}\sum_{j=1}^{N} x_j \quad\text{hence}\quad \sum_{j=1}^{N} x_j = N\mu_N \tag{1}$$

$$\sigma_N^2 = \frac{1}{N}\sum_{j=1}^{N} (x_j - \mu_N)^2 \quad\text{hence}\quad \sum_{j=1}^{N} x_j^2 = N(\sigma_N^2 + \mu_N^2) \tag{2}$$

Now suppose that $\mu_N$, $\sigma_N$, and N are known and that a new (N+1)th sample value $x_{N+1}$ is obtained. The mean $\mu_{N+1}$ and variance $\sigma_{N+1}^2$ are then

$$\mu_{N+1} = \frac{1}{N+1}\sum_{j=1}^{N+1} x_j = \frac{1}{N+1}\left(\sum_{j=1}^{N} x_j + x_{N+1}\right) = \frac{1}{N+1}\left(N\mu_N + x_{N+1}\right) \tag{3}$$

$$\sigma_{N+1}^2 = \frac{1}{N+1}\sum_{j=1}^{N+1} (x_j - \mu_{N+1})^2 = \frac{1}{N+1}\sum_{j=1}^{N+1} x_j^2 - \mu_{N+1}^2 = \frac{1}{N+1}\left(\sum_{j=1}^{N} x_j^2 + x_{N+1}^2\right) - \mu_{N+1}^2 = \frac{1}{N+1}\left\{N(\sigma_N^2 + \mu_N^2) + x_{N+1}^2\right\} - \mu_{N+1}^2 \tag{4}$$

so $\mu_{N+1}$ and $\sigma_{N+1}$ can be obtained from $\mu_N$, $\sigma_N$, N, and $x_{N+1}$. In other words, when $\mu_N$ and $\sigma_N$ for N samples are known, $\mu_{N+1}$ and $\sigma_{N+1}$ for the (N+1) samples with one sample added can be obtained from equations (3) and (4). The present invention uses this relationship: recognition starts with a standard pattern built from N samples, and the standard pattern is corrected with the sample provided in each recognition process, so that it is gradually changed into a more appropriate standard pattern.
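The recursion in equations (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration for a single scalar feature, not the circuit described in the patent; the function name `update_mean_std` and the sample values are assumptions made for the example.

```python
import math


def update_mean_std(mean_n, std_n, n, x_new):
    """Return (mean, std, count) after adding one sample, per equations (3) and (4)."""
    # Equation (3): new mean from the old mean, the count, and the new sample.
    mean_next = (n * mean_n + x_new) / (n + 1)
    # Equation (2): recover the sum of squares of the first n samples.
    sum_sq = n * (std_n ** 2 + mean_n ** 2)
    # Equation (4): new variance from the recovered sum of squares.
    var_next = (sum_sq + x_new ** 2) / (n + 1) - mean_next ** 2
    # Rounding can push a tiny variance slightly below zero, so clamp it.
    std_next = math.sqrt(max(var_next, 0.0))
    return mean_next, std_next, n + 1


# Illustrative data only: start from one sample and feed in the rest one by one.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std, n = samples[0], 0.0, 1
for x in samples[1:]:
    mean, std, n = update_mean_std(mean, std, n, x)
```

Only the three stored quantities (mean, standard deviation, count) are needed at each step, which is why the dictionary does not have to keep the earlier samples themselves.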

Let $\vec{X}^C_i$ be the feature extracted from a character sample i belonging to a category C. The mean $\vec{\mu}^C_N$ of the $\vec{X}^C_i$ is taken as the standard pattern of category C. That is, when N character samples belonging to category C are given, $\vec{\mu}^C_N$ is

$$\vec{\mu}^C_N = \frac{1}{N}\sum_{i=1}^{N} \vec{X}^C_i \tag{5}$$

Here a category C denotes a character type such as 「漢」 or 「字」, and sample characters are instances of that character handwritten in various styles. The feature $\vec{X}^C_i$ is the line density obtained by projecting the handwritten character pattern in the X or Y direction (for example, the number of black dots on each horizontal scanning line of the video signal of the character 「漢」, arranged in vertical scanning order; this group of values is an n-dimensional vector). If $\vec{\mu}^C_N$ were created from the feature patterns (hereinafter, sample patterns) of all sample characters belonging to the same category C, the sample patterns would include every deformation of the handwritten character, and an accurate standard pattern $\vec{\mu}^C_N$ could not be formed. The present invention therefore uses the standard deviation $\vec{\sigma}^C_N$ of the features $\vec{X}^C_i$ to restrict category C so that

$$\|\vec{\sigma}^C_N\| \le d \tag{6}$$

Samples that do not fall within this range are separated into a subcategory (that is, category C is divided into multiple subcategories), so that accurate standard patterns are formed. In the above expression, ‖·‖ denotes a conversion from a vector to a scalar, and d is a threshold.

Suppose that a standard pattern $\vec{\mu}^C_N$ of category C, created from N sample patterns $\vec{X}_1, \vec{X}_2, \ldots, \vec{X}_N$, already exists, and that its standard deviation is $\vec{\sigma}^C_N$. When a new sample pattern $\vec{X}_{N+1}$ is given, $\vec{\mu}^C_{N+1}$ and $\vec{\sigma}^C_{N+1}$ are first obtained from equations (3) and (4), and it is determined whether

$$\|\vec{\sigma}^C_{N+1}\| \le d \tag{7}$$

If this condition is satisfied, $\vec{X}_{N+1}$ is included in category C and the standard pattern (mean) is corrected from $\vec{\mu}^C_N$ to $\vec{\mu}^C_{N+1}$. If instead

$$\|\vec{\sigma}^C_{N+1}\| > d \tag{8}$$

then $\vec{X}_{N+1}$ is not included in category C and is newly registered as the standard pattern of a subcategory C′.
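The decision between equations (7) and (8) can be sketched as follows. This is a hedged illustration rather than the patented circuit: the text only says that ‖·‖ converts a vector to a scalar, so the Euclidean norm used here is an assumption, as are the function names, the dict layout, and the choice to give a freshly created subcategory a zero standard deviation.

```python
import math


def update_stats(mean, std, n, x):
    """Apply equations (3) and (4) element by element to feature vectors."""
    new_mean = [(n * m + xi) / (n + 1) for m, xi in zip(mean, x)]
    new_std = []
    for m, s, xi, nm in zip(mean, std, x, new_mean):
        var = (n * (s * s + m * m) + xi * xi) / (n + 1) - nm * nm
        new_std.append(math.sqrt(max(var, 0.0)))  # clamp rounding noise
    return new_mean, new_std, n + 1


def norm(v):
    # Assumption: Euclidean norm as the vector-to-scalar conversion.
    return math.sqrt(sum(c * c for c in v))


def correct_dictionary(entry, x, d):
    """Return (entry for C, new subcategory or None) after seeing sample x."""
    mean, std, n = update_stats(entry["mean"], entry["std"], entry["n"], x)
    if norm(std) <= d:
        # Equation (7): accept x into C and replace the stored statistics.
        return {"mean": mean, "std": std, "n": n}, None
    # Equation (8): leave C untouched and start subcategory C' from x alone.
    sub = {"mean": list(x), "std": [0.0] * len(x), "n": 1}
    return entry, sub


# Illustrative values only.
entry = {"mean": [1.0, 1.0], "std": [0.0, 0.0], "n": 1}
entry, sub_a = correct_dictionary(entry, [1.0, 3.0], d=1.5)   # close sample
entry, sub_b = correct_dictionary(entry, [9.0, 9.0], d=1.5)   # distant sample
```

In this toy run the first sample is absorbed into category C, while the second would push the standard deviation past the threshold and is split off into its own subcategory.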

FIG. 1 illustrates this conceptually. When category C is represented in a multidimensional feature space A, a sample pattern $\vec{X}_{N+1}$ that satisfies equation (7) is included in category C, and the standard pattern is corrected from $\vec{\mu}^C_N$ to $\vec{\mu}^C_{N+1}$. Other sample patterns such as $\vec{X}_{N+2}$ and $\vec{X}_{N+3}$, which fall under equation (8), would harm the standard pattern if they were included in category C and used to correct it, so $\vec{X}_{N+3}$, for example, is placed in subcategory C′. A sample pattern such as $\vec{X}_{N+3}$ that belongs to the same category but has shifted features corresponds, for example, to an abbreviated form of a regular character. Because this standard pattern correction is performed while the device is in operation, each time a new sample pattern appears, the standard patterns start from a few samples and come to reflect the features of progressively more samples, which makes accurate character recognition possible. Moreover, because a single standard pattern is not created from all of the collected features, recognition accuracy is not degraded by a simple aggregate (average) of miscellaneous features.

FIG. 2 is a block diagram showing one embodiment of the present invention. In the figure, 1 is a control unit, which controls the entire pattern recognition device. The observation unit 2 converts a character M on an input form into a video signal V and sends it to the feature extraction unit 3. The feature extraction unit 3 extracts a feature pattern $\vec{X}$ from the video signal V obtained from the observation unit 2. The identification unit 4 calculates the distance between the feature pattern $\vec{X}$ and each standard pattern $\vec{\mu}^C$ (C being a category) from the dictionary 5, identifies the input as the category C at the shortest distance, and outputs the result R. The dictionary creation unit 6 and the dictionary registration unit 7 operate during dictionary registration. The dictionary creation unit 6 refers to the input feature pattern $\vec{X}$ and the dictionary 5 and calculates the dictionary (standard) pattern using equations (3) and (4). The dictionary registration unit 7 compares the standard deviation $\vec{\sigma}^C_{N+1}$ of the calculated dictionary pattern $\vec{\mu}_{N+1}$ with the threshold d given in advance, as in equations (7) and (8), and accordingly either corrects the existing dictionary pattern or additionally registers a new dictionary pattern.
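The recognition path through units 3 and 4 can be sketched as below. This is an illustrative approximation only: the bitmap stands in for the video signal V, the distance measure is assumed to be Euclidean because the text does not name one, and the category labels `"plus"` and `"bar"` and all numeric values are hypothetical.

```python
import math


def line_density(bitmap):
    """Count the black dots (1s) on each horizontal scan line, top to bottom."""
    return [sum(row) for row in bitmap]


def identify(feature, dictionary):
    """Return the category name of the standard pattern nearest to the feature."""
    def dist(a, b):
        # Assumption: Euclidean distance between feature and standard pattern.
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    best = min(dictionary, key=lambda e: dist(feature, e["mean"]))
    return best["category"]


# Toy 3x3 binary image standing in for a scanned character.
bitmap = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

# Hypothetical dictionary with one standard pattern per entry.
dictionary = [
    {"category": "plus", "mean": [1.0, 3.0, 1.0]},
    {"category": "bar", "mean": [0.0, 3.0, 0.0]},
]

feature = line_density(bitmap)
result = identify(feature, dictionary)
```

Because several subcategory entries may carry the same category name, a nearest-pattern search of this kind returns the shared category regardless of which subcategory matched.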

FIG. 3 shows an example of the structure of the dictionary 5. The dictionary entry for one subcategory consists of four items: mean, standard deviation, number of samples, and category name. The sample counts $N_1$, $N_2$, ... are the numbers of samples used to create the dictionary entries for the subcategories $C_1$, $C_2$, ..., and the mean and standard deviation are those of the feature patterns of those N samples. In terms of this description of subcategories, C and C′ in FIG. 2 correspond, for example, to $C_1$ and $C_2$, and together they belong to the same category C.
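The four-item dictionary entry described for FIG. 3 maps naturally onto a small record type. The sketch below is one possible in-memory layout, not the storage format of the device; the class name, field names, helper function, and all numeric values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubcategoryEntry:
    """One dictionary entry: the four items listed for FIG. 3."""
    category: str        # category name shared by all subcategories of C
    mean: List[float]    # standard pattern (mean feature vector)
    std: List[float]     # standard deviation of each feature component
    n_samples: int       # number of samples used to build this entry


def entries_for(dictionary, category):
    """Return every subcategory entry that carries the given category name."""
    return [e for e in dictionary if e.category == category]


# Illustrative values only: two subcategories of one category, one of another.
dictionary = [
    SubcategoryEntry("漢", [1.0, 3.0, 1.0], [0.2, 0.1, 0.2], 12),
    SubcategoryEntry("漢", [2.0, 2.0, 2.0], [0.0, 0.0, 0.0], 1),
    SubcategoryEntry("字", [0.0, 3.0, 0.0], [0.1, 0.3, 0.1], 7),
]

kan_entries = entries_for(dictionary, "漢")
```

Keeping the sample count in each entry is what allows equations (3) and (4) to be applied later without access to the original samples.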

As described above, the present invention has the advantage that handwritten characters, which are deformed in various ways, can be recognized with progressively higher accuracy as the device learns from various sample patterns.

[Brief explanation of the drawing]

FIG. 1 is an explanatory diagram of a dictionary pattern according to the present invention, FIG. 2 is a block diagram showing one embodiment of the present invention, and FIG. 3 is an explanatory diagram of a dictionary classified into subcategories. In the figures, 5 is a dictionary, 6 is a dictionary creation unit, and 7 is a dictionary registration unit.

Claims

[Claims]

1. A pattern recognition device characterized by comprising: a dictionary that stores a mean $\vec{\mu}^C_N$, which is derived from sample patterns $\vec{X}_1$ to $\vec{X}_N$ belonging to a category C and serves as the standard pattern for pattern recognition, a standard deviation $\vec{\sigma}^C_N$, and the number of samples N; and a dictionary correction circuit which, when the standard deviation $\vec{\sigma}^C_{N+1}$ including a newly input sample pattern $\vec{X}_{N+1}$ satisfies $\|\vec{\sigma}^C_{N+1}\| \le d$ with respect to a threshold d, includes the sample pattern $\vec{X}_{N+1}$ in category C and corrects its mean to $\vec{\mu}^C_{N+1}$, and which, when $\|\vec{\sigma}^C_{N+1}\| > d$, keeps the mean of category C as $\vec{\mu}^C_{N+1}$ and creates a subcategory C′ with the sample pattern $\vec{X}_{N+1}$.