JPS63150785A

JPS63150785A - Character recognition device

Info

Publication number: JPS63150785A
Application number: JP61297777A
Authority: JP
Inventors: Toshiaki Yagasaki; 矢ケ崎　敏明
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1986-12-16
Filing date: 1986-12-16
Publication date: 1988-06-23

Abstract

PURPOSE: To identify and recognize similar characters that differ in size or position by providing recognizing means that recognizes a normalized pattern using the scaling ratio decided by scaling means as part of the recognition parameters. CONSTITUTION: On receiving document image data from an input section 11, a CPU 12 runs a processing program stored in a ROM 13, using a RAM 14 for image data storage and as auxiliary storage, to segment the characters. Each segmented pattern is then scaled by either the average scaling ratio of plural patterns or the scaling ratio of that individual pattern, the choice being based on both ratios. The normalized pattern is recognized with the dictionary that corresponds to the scaling ratio used, either a recognition dictionary 15 for special characters or a recognition dictionary 16 for other characters, and the result is output from an output section 17 such as a display device or a storage device.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a character recognition device, and in particular to a character recognition device that, as preprocessing, normalizes the size of characters before passing them to the recognition process.

[Prior Art] Conventionally, devices of this type have required characters to be written at prescribed positions and, as to size, to be written at a certain size within a predetermined entry frame. When characters written at arbitrary positions in a free format are to be recognized, on the other hand, the characters must be cut out one by one, and because the size of each cut-out character is unknown, its size must be normalized. In general, size is normalized by computing, for each cut-out character, the scaling factor from the circumscribed rectangle of the character pattern to the recognition pattern and fitting the pattern accordingly. Many devices use independent scaling factors for the X and Y directions of the character pattern. In that case a character pattern such as "1" is normalized into something like "口", and patterns such as "O" and "0", whose X and Y proportions differ only subtly, are normalized to exactly the same recognition pattern. It is also known that this problem is easily solved by adjusting the X-direction and Y-direction scaling factors, that is, by normalizing on the basis of the smaller of the two.

However, in recognizing special characters, that is, marks such as "、" and "。", the normalization described above may leave such characters indistinguishable, or may turn a character into a different one.

[Problems to Be Solved by the Invention] An object of the present invention is to eliminate the drawbacks of the conventional example described above and to provide a character recognition device that distinguishes and recognizes similar characters that differ in size and similar characters that differ in pattern position.

[Means for Solving the Problems] To achieve this object, the character recognition device of the present invention is configured as follows.

That is, a character recognition device that recognizes arranged characters comprises: separating means for separating patterns arranged in a predetermined direction; scaling means for normalizing each pattern separated by the separating means to a predetermined size by scaling it at a scaling ratio determined from the scaling ratio of the individual pattern and the average scaling ratio of a plurality of patterns; and recognition means for recognizing the pattern normalized by the scaling means on the basis of a dictionary corresponding to the scaling ratio used by the scaling means.

[Operation] In the above configuration, the separating means separates the patterns arranged in a predetermined direction; the scaling means normalizes each separated pattern to a predetermined size by scaling it at a scaling ratio determined from the scaling ratio of the individual pattern and the average scaling ratio of a plurality of patterns; and the recognition means recognizes the normalized pattern on the basis of a dictionary corresponding to the scaling ratio used by the scaling means.

[Embodiments] FIG. 2(a) is a block diagram showing the configuration of a conventional character recognition device, and FIG. 2(b) is a processing flowchart of the conventional character recognition device.

The conventional character recognition device comprises an input section 21, a CPU 22, a ROM 23, a RAM 24, a recognition dictionary 25, and an output section 26. When a document is input as image data from the input section 21, the CPU 22, following a processing program stored in the ROM 23 and using the RAM 24 for image data storage and as auxiliary storage, performs character segmentation, normalization, and recognition, and outputs the result from the output section 26, such as a display device or a storage device.

The operation is described with reference to the flowchart of FIG. 2(b). In step S200, document data is input from the input section 21, here by a scanner. The scanner converts the input document into image data; in general, when a sensor such as a CCD is used, the image data consists of analog values. Step S201 binarizes these analog values into a pattern of 0s and 1s. The binarized image data is stored in the RAM 24, which serves as image memory, and characters are cut out in step S202.

Character segmentation is described using the example input document of FIG. 3. The image data is scanned in the X direction to obtain the Y-direction density histogram HY, and character lines are identified from the pulse-like peaks that the histogram produces for each line. In FIG. 3, each of the labeled rows is regarded as a character line. The lines are then separated, and an X-direction density histogram Hx is taken for each.

FIG. 3 shows the density histogram Hx for one of the lines; from it the characters are cut out one at a time, as labeled in the figure.
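The two-pass histogram segmentation described above can be sketched as follows. This is a minimal illustration, assuming the binarized page is held as a list of rows of 0/1 values; the names `runs` and `segment` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed layout)


def runs(hist: List[int]) -> List[Tuple[int, int]]:
    """Return the [start, end) index ranges where the histogram is non-zero."""
    spans, start = [], None
    for i, value in enumerate(hist):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(hist)))
    return spans


def segment(img: Image) -> List[List[Tuple[int, int, int, int]]]:
    """Split a page into lines using HY, then each line into characters using Hx.

    Returns, for each line, a list of (top, bottom, left, right) cells,
    with bottom and right exclusive.
    """
    hy = [sum(row) for row in img]  # ink count per row (histogram HY)
    lines = []
    for top, bottom in runs(hy):
        band = img[top:bottom]
        hx = [sum(col) for col in zip(*band)]  # ink count per column (histogram Hx)
        lines.append([(top, bottom, left, right) for left, right in runs(hx)])
    return lines
```

Every cell in a line shares the same top and bottom, which is why the cut-out characters all have the same Y-direction width while their X-direction width follows the character.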

FIG. 4 shows examples of characters cut out one character at a time. As FIG. 4 shows, the width in the Y direction is the same for all of them, whereas the width in the X direction varies with the horizontal size of the character. A character such as "キ" is cut out almost exactly at its outline; "－" is cut out with nearly the same frame, but its pattern is confined to the middle of the Y-direction width.

For characters such as "1" and "。", a tall, narrow region is cut out.

In step S203, the characters cut out in step S202 are smoothed; the main purpose here is to remove isolated pixels (noise removal). In step S204, the circumscribed rectangle of the character outline is extracted, as shown in FIG. 5. In step S205, the scaling factor g for size normalization is calculated from the ratios of the height and width of the image pattern size F used for feature extraction to those of the rectangle f(i) extracted in step S204. For example, if the rectangle f(i) of FIG. 5 obtained in step S204 consists of 32×32 pixels and the image pattern F used for feature extraction is 64×64, the scaling factor is g = 2.
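Steps S204 and S205 can be illustrated with a short sketch. It assumes the cut-out cell is a list of rows of 0/1 values and returns separate Y and X factors, as the conventional method does; `bounding_box` and `scaling_factor` are hypothetical names chosen for this example.

```python
def bounding_box(cell):
    """Circumscribed rectangle (top, bottom, left, right), end-exclusive, of all 1-pixels.

    Returns None for an empty cell.
    """
    rows = [y for y, row in enumerate(cell) if any(row)]
    cols = [x for x in range(len(cell[0])) if any(row[x] for row in cell)]
    if not rows:
        return None
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def scaling_factor(box, target=64):
    """Per-axis factors (gy, gx) that map the rectangle onto a target x target pattern."""
    top, bottom, left, right = box
    return target / (bottom - top), target / (right - left)
```

With a 32×32 rectangle and a 64×64 target, both factors come out as 2, matching the example in the text.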

Scaling f(i) by this factor g produces, in step S206, the normalized pattern F(i), the final normalized image pattern. Step S207 extracts features from the normalized pattern F(i), step S208 performs identification by comparing them with the parameters stored in advance in the recognition dictionary 25, and step S209 outputs the recognition result to the output section 26. In many cases several results are output as candidate characters.

FIG. 1(a) is a block diagram of the character recognition device of the first embodiment, and FIGS. 1(b) and 1(c) are processing flowcharts of the first embodiment.

The character recognition device of the first embodiment comprises an input section 11, a CPU 12, a ROM 13, a RAM 14, a recognition dictionary 15, a recognition dictionary 16, and an output section 17. When a document is input as image data from the input section 11, the CPU 12, following a processing program stored in the ROM 13 and using the RAM 14 for image data storage and as auxiliary storage, cuts out and normalizes the characters. Based on the normalized pattern, either the recognition dictionary 15 or the recognition dictionary 16 is selected and character recognition is performed; the result is output from the output section 17, such as a display device or a storage device.

The operation is described with reference to the processing flowcharts of FIGS. 1(b) and 1(c). Steps S100 to S103 perform the same processing as steps S200 to S203 described above. In step S104, rectangle data f(1), f(2), ..., f(n), as shown in FIG. 5, is obtained from the cut-out pattern data of n consecutive characters.

In step S105, the average value f'x of the X components and the average value f'y of the Y components of the rectangle data are calculated as the means over the n rectangles, where fx(k) and fy(k) are the X and Y components of f(k). Because the X and Y components are processed identically, FIG. 1(b) uses a common notation without distinguishing X and Y. In step S106, the scaling factor g = F/f' is calculated from the average value f' and the recognition character size F.

In step S107, the scaling factor of each character, g'(k) = F/f(k), is calculated, where k is the position of the character in the sequence.

In step S108, g is compared with g'(k). If the scaling factor g'(k) of a character is larger than the average scaling factor g plus a predetermined value A, step S109 adopts the average scaling factor, so that an abnormally large enlargement is not applied. This happens, for example, when the cut-out character is "。" or "、" as in FIG. 5. As in this example, the processing may also be done by calculating a single overall average and scaling factor without distinguishing X and Y.

If, on the other hand, step S108 finds that the scaling factor g'(k) of a character is no larger than the average scaling factor g plus the predetermined value A, the character's own scaling factor g'(k) is used, as in the conventional method.
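The selection between the average and the individual scaling factor in steps S105 to S108 can be sketched for a single axis as below. The function name `choose_scaling` and the parameter name `margin` (standing for the predetermined value A) are assumptions made for this illustration; the text does not give a concrete value for A.

```python
def choose_scaling(sizes, target, margin):
    """Pick a scaling factor for each of n consecutive characters along one axis.

    sizes  : rectangle sizes f(1)..f(n) in pixels
    target : recognition pattern size F
    margin : the predetermined value A (its value is not given in the text)
    """
    mean_size = sum(sizes) / len(sizes)   # f'  (step S105)
    g_avg = target / mean_size            # g = F / f'  (step S106)
    chosen = []
    for f_k in sizes:
        g_k = target / f_k                # g'(k) = F / f(k)  (step S107)
        # Step S108: an individual factor above g + A is replaced by the average,
        # so a tiny mark is not blown up to fill the whole pattern.
        chosen.append(g_avg if g_k > g_avg + margin else g_k)
    return chosen
```

For example, with sizes `[32, 32, 4]`, a target of 64, and a margin of 1.0 (an arbitrary choice), the two full-size characters keep their own factor of 2, while the 4-pixel mark would need a factor of 16 and falls back to the average instead.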

The character scaled in step S109 or S110 becomes the normalized pattern F(k) in step S111; in steps S112 to S115, features are then extracted, the character is recognized, and the result is output to the output section 17.

In the block diagram of FIG. 1(a), a recognition dictionary 15 for special characters is provided separately, and step S113 selects between recognition dictionaries 15 and 16. If the recognition dictionary 15 is accessed after feature extraction whenever the flow has passed through step S109, recognition time can also be reduced substantially. In other words, in the first embodiment the recognition dictionary 15 is a special-character dictionary and the recognition dictionary 16 covers all other characters. Because a dedicated dictionary 15 is provided for special characters, misrecognition in the recognition of special characters can be avoided.

Furthermore, since lowercase alphabetic letters and small kana (ゃ, ゅ, っ, ...) can also be distinguished in this way, storing these characters in the recognition dictionary 15 makes it possible to tell uppercase letters from similar lowercase letters.

FIG. 6(a) is a block diagram of the character recognition device of the second embodiment, and FIGS. 6(b) and 6(c) are processing flowcharts of the second embodiment.

The character recognition device of the second embodiment comprises an input section 61, a CPU 62, a ROM 63, a RAM 64, recognition dictionaries 65a, 65b, 65c, ..., and an output section 66. When a document is input as image data from the input section 61, the CPU 62, following a processing program stored in the ROM 63 and using the RAM 64 for image data storage and as auxiliary storage, cuts out and normalizes the characters. Based on the result of normalization, that is, on the normalization procedure that was applied, one of the recognition dictionaries 65a, 65b, 65c, ... is selected and character recognition is performed; the result is output from the output section 66, such as a display device or a storage device.

The operation is described with reference to the processing flowcharts of FIGS. 6(b) and 6(c). Steps S600 to S603 perform the same processing as steps S200 to S203 of the conventional example. In step S604, the center of gravity of the cut-out character pattern is calculated. When the cut-out character is, for example, a figure like that shown in FIG. 7, coordinates A (0, 0), B (w1, 0), C (0, h1), and D (w1, h1) are set with point A as the origin. If f(x, y) is, for example, the density value of the pixel at coordinates (x, y), the center of gravity (x1, y1) is obtained as the density-weighted mean of the pixel coordinates. In step S605, a center-of-gravity position g(i) is assigned according to the values of x1 and y1, as shown in FIG. 8. Here i is an integer from 1 to 9; the pattern of FIG. 7 is assigned g(5), and a pattern such as "、" is assigned g(3).
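Steps S604 and S605 can be sketched as follows. The centroid is the usual density-weighted mean of the pixel coordinates. The layout of the nine regions is shown only in FIG. 8, which is not reproduced here, so the numbering used below (counting down each column, left column first) is an assumption inferred from the examples in the text; the function names are likewise hypothetical.

```python
def center_of_gravity(cell):
    """Density-weighted mean position (x1, y1) of a cut-out cell.

    cell[y][x] is the density f(x, y); point A = (0, 0) is taken as the top-left pixel.
    """
    total = sum_x = sum_y = 0
    for y, row in enumerate(cell):
        for x, value in enumerate(row):
            total += value
            sum_x += x * value
            sum_y += y * value
    if total == 0:
        raise ValueError("empty pattern has no center of gravity")
    return sum_x / total, sum_y / total


def zone_index(x1, y1, width, height):
    """Map a centroid to a region number i in 1..9 on a 3x3 grid.

    ASSUMPTION: regions are numbered down each column, so 1-3 is the left
    column from top to bottom and 5 is the middle of the cell.
    """
    col = min(int(3 * x1 / width), 2)
    row = min(int(3 * y1 / height), 2)
    return 3 * col + row + 1
```

Under this assumed numbering, a pattern centered in its cell maps to region 5, consistent with the FIG. 7 example, and a small mark in the lower left maps to region 3.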

In step S606, the outline figure of the pattern is obtained. The outline figure is obtained by closing in on the start and end positions of the pattern in the X and Y directions. Let its width in the X direction be f(x) and its width in the Y direction be f(y). In step S607, f(x) and f(y) are each compared with a predetermined value A; if both f(x) and f(y) are smaller than A, that is, if the character is a small kana or a special character, the flow proceeds to step S608. For all other characters it proceeds to step S609.

In step S609, g' = Min(Fx/f(x), Fy/f(y)) selects the smaller of the X-direction and Y-direction scaling factors. Thus for a character such as "－" the X-direction scaling factor is taken, and for a character such as "1" the Y-direction scaling factor is chosen. If the recognition pattern is a (64, 64) pattern, then Fx = 64 and Fy = 64.
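The rule g' = Min(Fx/f(x), Fy/f(y)) from step S609 keeps the aspect ratio of the character, which is what stops a thin stroke from being stretched to fill the pattern. A minimal sketch, with hypothetical function names and the 64×64 pattern size from the text as the default:

```python
def uniform_scale(fx, fy, Fx=64, Fy=64):
    """g' = Min(Fx / f(x), Fy / f(y)): one factor applied to both axes."""
    return min(Fx / fx, Fy / fy)


def scaled_size(fx, fy, Fx=64, Fy=64):
    """Width and height of the outline figure after scaling by g'."""
    g = uniform_scale(fx, fy, Fx, Fy)
    return round(fx * g), round(fy * g)
```

A horizontal bar 32 pixels wide and 4 high gets g' = 2 from the X direction and becomes 64×8, while a vertical stroke 4 wide and 32 high gets g' = 2 from the Y direction and becomes 8×64.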

In step S608, on the other hand, the flow branches according to the g(i) calculated above. For g(1), that is, for a character such as "'", the flow proceeds to step S610a; for g(2), that is, for a character such as "*", it proceeds to step S610b; and for g(3), that is, for a character such as "、", the processing of step S610c is performed.

FIG. 6(b) illustrates only the cases g(1) to g(3), but corresponding processing is performed for g(4) to g(9) as well. It is also conceivable to let g(1) stand for g(1), g(4), and g(7). In steps S610a, S610b, S610c, ..., the scaling factor F/f, more precisely Min(Fx/f(x), Fy/f(y)), is obtained and then multiplied by a coefficient such as 1/4, 1/3, 1/2, ..., so that the normalization scaling factor differs from case to case. This makes the normalized patterns of characters such as "'" and "、" differ in feature vector space, which improves the recognition rate.
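The position-dependent reduction in steps S610a, S610b, S610c, ... might look like the sketch below. The text lists the coefficients 1/4, 1/3, 1/2, ... but does not say which region receives which coefficient, so the `REDUCTION` table here is purely hypothetical, as is the function name; sizes are assumed to be integer pixel counts.

```python
from fractions import Fraction

# HYPOTHETICAL mapping from region number i to reduction coefficient.
# The source gives only the list 1/4, 1/3, 1/2, ... without the assignment.
REDUCTION = {1: Fraction(1, 4), 2: Fraction(1, 3), 3: Fraction(1, 2)}


def special_scale(fx, fy, zone, Fx=64, Fy=64):
    """Scaling factor for a small pattern: Min(Fx/f(x), Fy/f(y)) times a per-region coefficient.

    Regions without an entry in REDUCTION are left unreduced in this sketch.
    """
    base = min(Fraction(Fx, fx), Fraction(Fy, fy))
    return base * REDUCTION.get(zone, Fraction(1))
```

Two marks with the same 8×8 outline then end up at different normalized sizes depending on where they sat in the cell, which is what separates them in feature space.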

In step S611, the input data is normalized using the scaling factor g' obtained in steps S610a, S610b, S610c, ..., yielding the normalized pattern F(x), F(y).

In step S612, features are extracted from the normalized pattern. In step S613, the g' and g(i) obtained in the preceding steps are appended to the extracted feature vector f(a), where a = 1 to 64, to give the final vector space in which identification is performed.
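Step S613 amounts to extending the 64-element feature vector with two extra components. A minimal sketch, assuming the features are plain floats and that the region number i is appended as a number; the function name and the encoding of g(i) are assumptions, since the text does not say how the two values are represented in the vector.

```python
from typing import List, Sequence


def augment_features(features: Sequence[float], g_prime: float, zone: int) -> List[float]:
    """Append the scaling factor g' and the region number i of g(i) to f(1)..f(64)."""
    if len(features) != 64:
        raise ValueError("expected a 64-element feature vector")
    if not 1 <= zone <= 9:
        raise ValueError("region number must be in 1..9")
    return [*features, float(g_prime), float(zone)]
```

The same pair (g', i) can also serve as the key for choosing among the recognition dictionaries, so the two uses share one computation.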

In the second embodiment both values g' and g(i) are added to the vector space, but either one alone has a substantial effect. In step S614, one of the recognition dictionaries 65a, 65b, 65c, ... is selected according to the values of g' and g(i). If a recognition dictionary is provided for each combination of g' and g(i), the flow shown in FIG. 6(b) alone allows 4×9 = 36 recognition dictionaries. The number of recognition dictionaries can be chosen according to cost and system requirements, and the division used to determine the center-of-gravity position need not be limited to nine regions.

As in the first embodiment, the processing may also be done without separating the X and Y components. In step S615, the character is then identified using the selected recognition dictionary, and in step S616 the candidate characters are output.

As described above, the embodiments provide an effective means of distinguishing similar characters that differ in size, such as "ヤ" and "ャ". To this end, the X-direction and Y-direction lengths of the circumscribed rectangle of each cut-out character are calculated, the average X-direction and Y-direction values over a plurality of patterns are obtained, means is provided for comparing the individual lengths of the pattern data against these averages, and the scaling factor of the pattern data is determined according to the comparison result. The scaling factor is set independently for each item of pattern data, and, depending on each pattern's scaling factor for normalization, either the scaling factor of the average or the scaling factor of the individual pattern is selected.

Furthermore, according to the embodiments, for text written in free format, punctuation marks, vertically long characters such as "目", "日", and "1", and horizontally long characters can all be normalized while the read pattern data is preserved faithfully, which improves the character recognition rate itself.

In addition, in the embodiments, the position of the center of gravity of the image pattern of each cut-out character is calculated, and the scaling procedure used for normalization is changed according to the center-of-gravity position and the size of the image pattern. This provides an effective means of distinguishing similar characters that differ in the position of the pattern, such as "、" and "'".

That is, the embodiments exploit the fact that such similar characters differ in center-of-gravity position: by varying the scaling factor from the character outline pattern to the recognition pattern according to size, the characters are made to differ in feature vector space.

The embodiments are further characterized in that features are extracted from the recognition pattern normalized by the above means, and the scaling factor obtained by the above means is appended as a feature parameter to the feature vector space obtained by the extraction. They are also characterized in that a parameter giving the center-of-gravity address position obtained after character segmentation is appended to the normalized feature vector space. The scaling parameter and the center-of-gravity address parameter may of course be attached as two independent vector spaces.

[Effects of the Invention] The present invention provides a character recognition device that distinguishes and recognizes similar characters that differ in size and similar characters that differ in pattern position.

[Brief Description of the Drawings]

FIG. 1(a) is a block diagram of the character recognition device of the first embodiment; FIGS. 1(b) and 1(c) are processing flowcharts of the first embodiment; FIG. 2(a) is a block diagram of a conventional character recognition device; FIG. 2(b) is a conventional processing flowchart; FIG. 3 shows an example of input characters; FIG. 4 shows examples of cut-out characters; FIG. 5 shows the rectangles corresponding to the characters of FIG. 4; FIG. 6(a) is a block diagram of the character recognition device of the second embodiment; FIGS. 6(b) and 6(c) are processing flowcharts of the second embodiment; FIG. 7 illustrates the calculation of the center of gravity in the second embodiment; and FIG. 8 illustrates the assignment of center-of-gravity positions in the second embodiment.

In the figures: 11, input section; 12, CPU; 13, ROM; 14, RAM; 15 and 16, recognition dictionaries; 17, output section; 61, input section; 62, CPU; 63, ROM; 64, RAM; 65a, 65b, 65c, recognition dictionaries; 66, output section.

Patent applicant: Canon Inc.

Claims

[Claims]

(1) A character recognition device that recognizes arranged characters, comprising: separating means for separating patterns arranged in a predetermined direction; scaling means for normalizing each pattern separated by the separating means to a predetermined size by scaling it at a scaling ratio determined on the basis of the scaling ratio of the individual pattern and the average scaling ratio of a plurality of patterns; and recognition means for recognizing the pattern normalized by the scaling means on the basis of a dictionary corresponding to the scaling ratio used by the scaling means.

(2) The character recognition device according to claim 1, wherein the scaling means scales separately in the vertical direction and the horizontal direction.

(3) The character recognition device according to claim 1, wherein the scaling means includes: comparison means for comparing the scaling ratio of each individual pattern with the average scaling ratio of a plurality of patterns; and scaling ratio determining means for determining, from the comparison result of the comparison means, the average scaling ratio of the plurality of patterns as the scaling ratio for normalization when the scaling ratio of the individual pattern is larger than the average scaling ratio, and the scaling ratio of the individual pattern as the scaling ratio for normalization when it is smaller.

(4) The character recognition device according to claim 1, wherein the recognition means uses the scaling ratio determined by the scaling means as part of the recognition parameter.