JPS62192887A

JPS62192887A - Feature quantity generating method in character recognizing device

Info

Publication number: JPS62192887A
Application number: JP61036056A
Authority: JP
Inventors: Masahiro Nakamura; 昌弘中村
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-02-20
Filing date: 1986-02-20
Publication date: 1987-08-24

Abstract

PURPOSE: To reduce the number of feature quantities, and thereby improve the recognition rate and recognition speed, by integrating some of the blocks into which an input pattern is divided when generating histograms, and by allowing the blocks to be integrated to be designated. CONSTITUTION: An operator inputs integration instruction data to a host computer 51 using a keyboard 54. The host computer 51 writes the integration instruction data into a memory 53. An OCR processor 52 reads an original 55, performs direction coding of the outline and area division on the input pattern, and then prepares the histograms. When preparing the histograms, the OCR processor 52 reads the integration instruction data from the memory 53, determines which blocks are to be integrated, and integrates the histograms of the designated blocks. The OCR processor 52 then compares the resulting histograms with a dictionary and determines candidate characters.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field]

The present invention relates to a feature quantity generation method in a character recognition device such as an OCR.

[Prior art]

One character recognition method used in OCR and similar devices assigns direction codes to the outline of an input pattern, divides the pattern into a plurality of blocks, takes a histogram of the direction codes for each block, and performs character recognition using these histograms as feature quantities. In this case, for example, if the input pattern is divided into 4×4 blocks and eight direction codes are used, a 4×4×8 = 128-dimensional feature quantity is extracted.
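As a rough illustration of the 4×4×8 = 128 arithmetic, the sketch below counts direction codes per block and concatenates the histograms. The grid representation (`None` for non-outline pixels, codes 0 to 7 on the outline), the equal-width cut positions, and the names `block_histograms` and `flatten` are assumptions made for this example and do not come from the source.

```python
from typing import List, Optional

# Hypothetical representation: grid[y][x] is None off the outline,
# or a direction code 0..7 on the outline.
Grid = List[List[Optional[int]]]


def block_histograms(grid: Grid, x_cuts: List[int], y_cuts: List[int],
                     n_dirs: int = 8) -> List[List[int]]:
    """Return one n_dirs-bin histogram per block, in row-major block order.

    x_cuts and y_cuts list the block boundaries, including 0 and the
    pattern width or height, e.g. [0, 4, 8, 12, 16] for four columns.
    """
    n_cols, n_rows = len(x_cuts) - 1, len(y_cuts) - 1
    hists = [[0] * n_dirs for _ in range(n_rows * n_cols)]
    for by in range(n_rows):
        for bx in range(n_cols):
            hist = hists[by * n_cols + bx]
            for y in range(y_cuts[by], y_cuts[by + 1]):
                for x in range(x_cuts[bx], x_cuts[bx + 1]):
                    code = grid[y][x]
                    if code is not None:
                        hist[code] += 1
    return hists


def flatten(hists: List[List[int]]) -> List[int]:
    """Concatenate the per-block histograms into one feature vector."""
    return [count for hist in hists for count in hist]


# Toy 16x16 pattern with two outline pixels, split into 4x4 blocks.
grid: Grid = [[None] * 16 for _ in range(16)]
grid[0][0] = 3      # top-left block, direction code 3
grid[15][15] = 7    # bottom-right block, direction code 7
cuts = [0, 4, 8, 12, 16]
features = flatten(block_histograms(grid, cuts, cuts))
print(len(features))  # 128
```

With 16 blocks and 8 direction codes, the vector always has 128 entries regardless of the pattern content.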

Some of these feature quantities discriminate well between characters, while others discriminate poorly. Conventionally, however, the distance calculation has treated all of them in the same way, which has been one cause of reduced recognition rate and recognition speed.

[Purpose]

An object of the present invention is to improve the recognition rate and recognition speed of a character recognition device that divides an input pattern into a plurality of blocks, takes a histogram of the direction codes for each block, and performs character recognition using these histograms as feature quantities.

[Constitution]

The present invention generates histograms by integrating some of the divided blocks, and allows the operator to specify which blocks are integrated. This reduces the number of feature quantities and thereby improves the recognition rate and recognition speed. An embodiment of the present invention is described below with reference to the drawings.

First, the region division of an input pattern is described with reference to FIG. 2. Direction codes are first assigned to the outline of the input pattern (step 21).

Next, the direction codes assigned to the outline of the input pattern are counted to obtain their total number (step 22). The division coordinates in the X and Y directions are then determined from this total. For example, suppose the region is to be divided into n×m and the total number of direction codes is [...]. The input pattern is scanned in the X direction, and the X coordinates at which the direction code count reaches each division point are found (step 25). Similarly, the division points in the Y direction are [...] (step 27). The input pattern is then scanned in the Y direction, and the Y coordinates at which the direction code count reaches each division point are found (step 28).
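The expressions for the division points did not survive in the text, so the following is only a sketch under a stated assumption: that the division points are the k/n (or k/m) fractions of the total direction code count, so that each strip holds roughly the same number of codes. The function name `division_coords` and the per-column count list are hypothetical.

```python
from typing import List


def division_coords(counts: List[int], parts: int) -> List[int]:
    """Find parts-1 cut coordinates along one axis.

    counts[i] is the number of direction codes in column i (for X)
    or row i (for Y). For k = 1..parts-1, the cut is the first index
    at which the running count reaches k * total / parts (assumed rule).
    """
    total = sum(counts)
    if total == 0:
        # No outline at all: fall back to equal-width cuts.
        return [len(counts) * k // parts for k in range(1, parts)]
    coords: List[int] = []
    running = 0
    k = 1
    for i, c in enumerate(counts):
        running += c
        # Integer comparison avoids floating-point division.
        while k < parts and running * parts >= k * total:
            coords.append(i)
            k += 1
    return coords


# 16 direction codes spread unevenly over 8 columns, split into 4 strips.
x_counts = [0, 2, 2, 0, 4, 0, 4, 4]
x_cuts = division_coords(x_counts, 4)
print(x_cuts)  # [2, 4, 6]
```

The same function would be applied to per-row counts with `parts = m` to obtain the Y coordinates.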

FIG. 1 is a flowchart for explaining feature quantity generation according to the present invention.

First, a histogram of the direction codes is created for each divided block (step 11). Next, the integration instruction data from the host processor is read (step 12), and the histograms of the corresponding blocks are integrated with reference to this data (step 13). FIG. 3 shows an example in which the region is divided into 4×4, and FIG. 4 shows an example of the blocks to be integrated and the integration instruction data in this case.
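Steps 11 to 13 might be sketched as follows. The format of the integration instruction data in FIG. 4 is not reproduced in the text, so the encoding used here (one group id per block, with blocks sharing an id being merged) is an assumption, as is the choice to integrate histograms by summing them bin by bin. The name `merge_histograms` is hypothetical.

```python
from typing import Dict, List


def merge_histograms(hists: List[List[int]],
                     group_of: List[int]) -> List[List[int]]:
    """Sum the histograms of all blocks that share a group id.

    group_of[b] is the (assumed) group id of block b. Merged
    histograms are returned in order of each group's first appearance.
    """
    merged: Dict[int, List[int]] = {}
    for hist, group in zip(hists, group_of):
        acc = merged.setdefault(group, [0] * len(hist))
        for d, count in enumerate(hist):
            acc[d] += count
    return list(merged.values())


# 16 blocks x 8 direction codes; block b holds the value b in every bin.
hists = [[b] * 8 for b in range(16)]

# Hypothetical instruction data: merge the four corner blocks
# (0, 3, 12, 15) into one group and leave the others alone.
group_of = list(range(16))
group_of[3] = group_of[12] = group_of[15] = 0

merged = merge_histograms(hists, group_of)
print(len(merged), len(merged) * 8)  # 13 104
```

In this toy case the feature vector shrinks from 128 to 104 dimensions, while the total count over all bins is unchanged.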

FIG. 5 is a schematic block diagram of a hardware configuration that implements the method of the present invention. In FIG. 5, a host computer 51 and an OCR processor 52 share a memory 53. The operator uses a keyboard 54 to input integration instruction data, in a format such as that shown in FIG. 4, into the host computer 51, and the host computer 51 writes this data into the memory 53. Meanwhile, the OCR processor 52 reads an original 55, assigns direction codes to the outline of the input pattern and divides it into regions according to the flow of FIG. 2, and then creates the histograms according to the flow of FIG. 1. When creating the histograms, the OCR processor 52 reads the integration instruction data from the memory 53, determines which blocks are to be integrated, and integrates the histograms of the designated blocks. The OCR processor 52 then takes the resulting histograms as the feature quantities of the input pattern, compares them with a dictionary prepared in advance, and determines candidate characters from the resulting distances.
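The final comparison against the dictionary could look roughly like this. The text says only that candidates are determined by distance; the city-block distance, the dictionary layout (one reference vector per character), and the names `city_block` and `candidates` are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def candidates(features: Sequence[float],
               dictionary: Dict[str, List[float]],
               top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k dictionary entries closest to the input features."""
    scored = [(label, city_block(features, ref))
              for label, ref in dictionary.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:top_k]


# Tiny hypothetical dictionary with 4-dimensional feature vectors.
dictionary = {
    "A": [4, 0, 1, 3],
    "B": [0, 4, 3, 1],
    "C": [2, 2, 2, 2],
}
result = candidates([3, 1, 1, 3], dictionary, top_k=2)
print(result)  # [('A', 2), ('C', 4)]
```

Because the cost of each distance calculation grows with the vector length, merging blocks before this step reduces the work per dictionary entry.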

The blocks to be integrated may be determined by evaluating discrimination ability with a technique such as multivariate analysis when the dictionary is created in advance. The operator can also vary the blocks to be integrated for each character type or font.
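The source names only "multivariate analysis or the like" and gives no specific procedure. As a stand-in, the sketch below scores each block by how much its class-mean histograms vary across character classes and picks the lowest-scoring blocks as merge candidates. This variance score, and the names `block_scores` and `blocks_to_merge`, are assumptions and not the patent's method.

```python
from typing import Dict, List


def block_scores(class_means: Dict[str, List[List[float]]]) -> List[float]:
    """Score each block by the spread of its class means.

    class_means[label][b][d] is the mean count of direction code d in
    block b for one character class. A block whose means barely differ
    between classes gets a low score (weak discrimination, assumed proxy).
    """
    labels = list(class_means)
    n_blocks = len(class_means[labels[0]])
    n_dirs = len(class_means[labels[0]][0])
    scores: List[float] = []
    for b in range(n_blocks):
        score = 0.0
        for d in range(n_dirs):
            vals = [class_means[label][b][d] for label in labels]
            mean = sum(vals) / len(vals)
            score += sum((v - mean) ** 2 for v in vals) / len(vals)
        scores.append(score)
    return scores


def blocks_to_merge(scores: List[float], how_many: int) -> List[int]:
    """Indices of the how_many lowest-scoring blocks, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda b: scores[b])
    return sorted(ranked[:how_many])


# Two classes, three blocks, two direction codes (toy numbers).
class_means = {
    "A": [[4, 0], [1, 1], [2, 2]],
    "B": [[0, 4], [1, 1], [2, 3]],
}
scores = block_scores(class_means)
print(scores)                      # [8.0, 0.0, 0.25]
print(blocks_to_merge(scores, 2))  # [1, 2]
```

A separate score table per character type or font would let the selected blocks differ between them, in line with the operator-configurable merging described above.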

[Effect]

According to the present invention, when direction codes are assigned to the outline of an input pattern, the pattern is divided into a plurality of blocks, a histogram of the direction codes is taken for each block, and character recognition is performed using these histograms as feature quantities, some of the blocks are integrated when the feature quantities are created. This reduces the number of feature quantities, so an improvement in recognition speed can be expected. In addition, because the feature quantities that are integrated are those with low discrimination ability, an improvement in recognition rate can be expected and the dictionary capacity can be reduced.

[Brief explanation of drawings]

FIG. 1 is a flowchart for explaining the method of the present invention, FIG. 2 is a flowchart for explaining region division of an input pattern, FIG. 3 is a diagram showing an example of region division, FIG. 4 is a diagram showing an example of integration instruction data and integrated blocks, and FIG. 5 is a block diagram of a hardware configuration for realizing the present invention.

51: host computer, 52: OCR processor, 53: memory, 54: keyboard, 55: original.

Claims

(1) A method for generating feature quantities in a character recognition device that divides an input pattern, whose outline has been assigned direction codes, into a plurality of blocks, takes a histogram of the direction codes for each block, and performs character recognition using each histogram as a feature quantity, the method comprising generating a histogram by integrating some of the divided blocks.

(2) The method for generating feature quantities in a character recognition device according to claim 1, wherein an operator can arbitrarily specify the blocks to be integrated.