JPH0731711B2

JPH0731711B2 - Optical character reader

Info

Publication number: JPH0731711B2
Application number: JP58013985A
Authority: JP
Inventors: 吉久田辺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1983-01-31
Filing date: 1983-01-31
Publication date: 1995-04-10
Anticipated expiration: 2010-04-10
Also published as: JPS59149569A

Description

[Technical Field of the Invention]

The present invention relates to an optical character reader that reads free handwritten characters.

[Technical background of the invention and its problems]

In recent years, some optical character readers (hereinafter, OCRs) have been able to read handwritten characters. To cope with the character deformations caused by individual writing habits, an OCR of this kind prepares a large number of standard patterns in advance in a dictionary memory.

However, when such an OCR reads free handwritten characters written by an unspecified number of people, the number of standard patterns that must be prepared becomes enormous, and a dictionary memory of enormous capacity is required. Reducing the number of stored standard patterns lowers the reading accuracy of the OCR. Some conventional devices compensate for this loss of accuracy by post-processing such as text processing. That approach, however, cannot be applied when the handwritten characters consist solely of digits.

[Object of the Invention]

The present invention has been made in view of the above circumstances. Its object is to provide an optical character reader that can read free handwritten characters with high accuracy without using a dictionary memory of especially large capacity.

[Outline of Invention]

The present invention is an optical character reader comprising: a feature extraction unit that extracts the features of a character pattern for each character obtained by scanning a sheet and photoelectrically converting it; a feature memory that stores at least one sheet's worth of the character pattern features output from the feature extraction unit; a classification unit that computes the similarity between character pattern features read from the feature memory and classifies features whose similarity is at or above a certain value as a group of character pattern features corresponding to a specific character; an average pattern extraction unit that extracts an average pattern feature for each classified group; and an identification unit that identifies the specific character by matching the extracted average pattern feature against standard patterns stored in advance in a dictionary memory. This allows free handwritten characters to be read with high accuracy without an especially large dictionary memory.

[Embodiment of the Invention]

An embodiment of the present invention is described below with reference to the drawing. In the figure, reference numeral 1 denotes a photoelectric conversion unit, which converts the scanning pattern A of free handwritten characters, obtained by scanning a sheet, into a binarized signal and applies appropriate preprocessing. The feature extraction unit 2 extracts the features of each character's pattern sent from the photoelectric conversion unit 1. The feature memory 3 stores the features extracted by the feature extraction unit 2 character by character and has a storage capacity of, for example, one sheet. The classification unit 4 classifies the character pattern features read from the feature memory 3 into a plurality of groups, one per character.

Reference numeral 5 denotes a determination processing unit, which determines whether the classification process in the classification unit 4 has finished. When it determines that classification has finished, it causes the classification unit 4 to send the character pattern feature group of each group to the average pattern extraction unit 6. When it determines that classification is impossible, that is, that the similarity calculation does not yield a similarity at or above a certain value and the character pattern features cannot be classified into a specific group, it outputs a control signal B to the determination unit 7. Based on control signal B, the determination unit 7 outputs either a control signal C1, which restarts processing from the photoelectric conversion unit 1, or a control signal C2, which restarts processing from the feature extraction unit 2.

The average pattern extraction unit 6 extracts the average pattern feature of each per-character group. Reference numeral 8 denotes an identification unit, which identifies and outputs a character from the pattern feature supplied by the average pattern extraction unit 6, based on the standard patterns stored in advance in the dictionary memory 9.
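The role of the determination unit 7 can be illustrated with a minimal Python sketch. The description only states that control signal B carries the reason classification failed and that unit 7 answers with C1 or C2; the enum names and the two failure reasons below are assumptions made for illustration, not part of the patent.

```python
from enum import Enum


class Failure(Enum):
    """Hypothetical contents of control signal B (names are assumptions)."""
    POOR_IMAGE = "poor_image"        # binarized character image is inadequate
    POOR_FEATURES = "poor_features"  # extracted features are inadequate


class Restart(Enum):
    """Control signals issued by determination unit 7."""
    C1 = "redo from photoelectric conversion unit 1"
    C2 = "redo from feature extraction unit 2"


def determination_unit(signal_b: Failure) -> Restart:
    """Map the content of control signal B to a restart signal."""
    if signal_b is Failure.POOR_IMAGE:
        return Restart.C1
    return Restart.C2
```

In this sketch an inadequate image restarts the whole chain (C1), while any other failure only repeats feature extraction (C2).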

The operation of this configuration is as follows. Suppose that one sheet of free handwritten characters is converted line by line into a binarized signal by the photoelectric conversion unit 1 and sent to the feature extraction unit 2. The feature extraction unit 2 extracts the features of each character pattern segmented character by character. The feature memory 3 stores, character by character, the character pattern features for one sheet or for several sheets (limited to sheets known to have been written by the same individual). The classification unit 4 then reads the character pattern features from the feature memory 3 and classifies them into groups of similar characters, judging similarity by, for example, matching between character pattern features or the degree of difference between feature codes.

The determination processing unit 5 determines that classification has finished when the similarity calculation in the classification unit 4 yields a similarity at or above a certain value. When no such similarity is obtained and it determines that classification is impossible, it outputs a control signal B whose content reflects the cause. For example, if the photoelectric conversion unit 1 has not reproduced the character image adequately as a signal, the determination processing unit 5 outputs to the determination unit 7 a control signal B instructing that processing be repeated from the photoelectric conversion unit 1. Based on control signal B, the determination unit 7 outputs control signal C1 or C2 so that processing is repeated from the photoelectric conversion unit 1 or from the feature extraction unit 2, respectively.
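The grouping performed by the classification unit 4 might be sketched in Python as follows. The patent does not specify the similarity measure or the grouping procedure, so the cosine similarity, the greedy assignment to the best-matching group, the threshold of 0.9, and the two-dimensional feature vectors are all assumptions chosen to keep the example small.

```python
import math


def cosine_similarity(a, b):
    """Similarity in [-1, 1] between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(features, threshold=0.9):
    """Greedily group feature vectors whose similarity to a group's first
    member is at least `threshold`; unmatched vectors start a new group."""
    groups = []
    for f in features:
        best, best_sim = None, threshold
        for g in groups:
            sim = cosine_similarity(f, g[0])
            if sim >= best_sim:
                best, best_sim = g, sim
        if best is None:
            groups.append([f])
        else:
            best.append(f)
    return groups


# Hypothetical features for four characters written on one sheet.
sheet = [(1.0, 0.1), (0.1, 1.0), (0.9, 0.2), (0.2, 0.9)]
groups = classify(sheet)
```

Here the four vectors fall into two groups of two. Because all features come from one writer, members of a group are expected to be variations of the same character, which is what makes a single threshold workable in this sketch.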

When the determination processing unit 5 determines, as described above, that classification in the classification unit 4 has finished, the classification unit 4 passes the character pattern feature group of each group to the average pattern extraction unit 6, which computes the average pattern feature of each group. The average pattern feature can be obtained, for example, by representing the character pattern feature group in a metric space and computing the center coordinates of its distribution. The identification unit 8 then identifies which character the average pattern feature corresponds to, based on the standard patterns stored in advance in the dictionary memory 9.
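A minimal Python sketch of the averaging and identification steps is given below. It takes the "center coordinates of the distribution" to be the component-wise mean and assumes that identification picks the standard pattern at the smallest Euclidean distance; the patent names neither choice, and the two-entry dictionary and feature values are hypothetical.

```python
import math


def centroid(group):
    """Component-wise mean of a group of equal-length feature vectors."""
    n = len(group)
    return tuple(sum(col) / n for col in zip(*group))


def identify(avg, dictionary):
    """Return the label of the standard pattern closest to `avg`."""
    return min(dictionary, key=lambda label: math.dist(avg, dictionary[label]))


# Hypothetical dictionary memory: one standard pattern per character.
dictionary = {"1": (1.0, 0.0), "7": (0.0, 1.0)}
group = [(1.0, 0.1), (0.9, 0.2), (0.8, 0.0)]
avg = centroid(group)
label = identify(avg, dictionary)
```

Only one comparison per group is made against the dictionary, rather than one per character, which is why a single standard pattern per character can suffice in this arrangement.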

In this way, free handwritten characters written on a sheet can be read. In the present invention, the classification unit 4 classifies at least one sheet's worth of character pattern features into per-character groups by similarity calculation. The classified groups therefore belong to separate categories, and when the identification unit 8 identifies characters from the average pattern feature of each group, no two groups yield the same answer. Free handwritten characters can therefore be read reliably.

[Effect of the Invention]

As described in detail above, according to the present invention, at least one sheet's worth of character pattern features is classified into per-character groups when reading free handwritten characters, and an average pattern feature is obtained for each group, so that characters can be identified reliably. As a result, free handwritten characters can be read with high accuracy without relying on a large number of standard patterns stored in a dictionary memory of especially large capacity.

[Brief Description of the Drawing]

The figure is a block diagram showing the configuration of an optical character reader according to an embodiment of the present invention.

2 ... feature extraction unit, 3 ... feature memory, 4 ... classification unit, 6 ... average pattern extraction unit, 8 ... identification unit, 9 ... dictionary memory.

Claims

1. An optical character reader comprising: a feature extraction unit that extracts the features of a character pattern for each character obtained by scanning a sheet and photoelectrically converting it; a feature memory that stores at least one sheet's worth of the character pattern features output from the feature extraction unit; a classification unit that computes the similarity between the character pattern features read from the feature memory and classifies character pattern features having a similarity at or above a certain value as a group of character pattern features corresponding to a specific character; an average pattern extraction unit that extracts an average pattern feature for each of the groups classified by the classification unit; and an identification unit that matches the average pattern feature extracted by the average pattern extraction unit against standard patterns stored in advance in a dictionary memory and identifies the specific character.