JPH06231307A - Optical character recognition system - Google Patents

Optical character recognition system

Info

Publication number
JPH06231307A
JPH06231307A
Authority
JP
Japan
Prior art keywords
character
recognition
image
matching
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5015247A
Other languages
Japanese (ja)
Inventor
Michiyo Ito
美智代 伊藤
Masato Teramoto
正人 寺本
Hiroaki Haneda
浩章 羽田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Hitachi Asahi Electronics Co Ltd
Original Assignee
Hitachi Ltd
Hitachi Asahi Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd, Hitachi Asahi Electronics Co Ltd filed Critical Hitachi Ltd
Priority to JP5015247A priority Critical patent/JPH06231307A/en
Publication of JPH06231307A publication Critical patent/JPH06231307A/en
Pending legal-status Critical Current


Abstract

PURPOSE: To maintain recognition accuracy and shorten recognition time by sharing feature values between the two processes of character segmentation and matching.

CONSTITUTION: A recognition processor creates a projection table 31, in which the black pixels of the input image 30 stored in the image memory are projected in the direction perpendicular to the character string (the vertical direction), and stores it in a work memory. Next, for the extracted band-shaped image 36 containing only one character, a projection table 51 of the black pixels projected in the horizontal direction is created and stored in the work memory. The upper and lower positions 54 and 55 of the character are then determined, and a rectangular image (input pattern) 56 is cut out and stored in the work memory. After character segmentation is complete, the input pattern 56 is matched against a previously registered recognition dictionary. At that time, matching that uses the vertical and horizontal projection tables 31 and 51 created during character segmentation is performed and applied to coarse classification and to the extraction of candidate characters.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a method of optical character recognition that performs character recognition at high speed while maintaining accuracy, by sharing feature data, once computed, among a plurality of processes.

[0002]

[Prior Art] In general, a character recognition device computes the feature data needed at character segmentation time and then computes new feature data again at matching time; in other words, a separate feature computation suited to each process is performed for every process.

[0003] Furthermore, after character segmentation, a dedicated step for removing noise and the like had to be added so that matching could be performed on the true character rectangle, free of noise and edge errors.

[0004] For example, Japanese Patent Laid-Open No. 4-205690 proposes using at least two recognition methods in order to improve the recognition rate.

[0005]

[Problems to be Solved by the Invention] In a character recognition device that cuts specified characters out of image data and matches them against a recognition dictionary, a large number of processing steps, or a large number of feature values computed in each step, improves the recognition rate but has the drawback of requiring a long processing time.

[0006] An object of the present invention is to provide a processing method for performing character recognition at higher speed while maintaining recognition accuracy.

[0007]

[Means for Solving the Problems] To achieve the above object, the approach taken is to reduce the number of feature values computed in each process.

[0008] According to the present invention, there is provided a character recognition processing method in which the feature table created for character segmentation is reused as one of the means of the matching process.

[0009] Further, in order to reduce the number of kinds of processing, a processing method is provided in which noise removal is performed at the same time as character segmentation by using those same feature values: portions whose projection value amounts to only a few dots and which appear to lie at the edge of a character are excluded in advance as noise or the like when the character is extracted.

[0010]

[Operation] According to the present invention, because the feature values are shared by the two processes of character segmentation and matching, the effort of creating feature data dedicated to matching is eliminated, and the processing time can be shortened through effective reuse of the data.

[0011] Furthermore, according to the present invention, by excluding portions with small projection values when cutting out a character, the true character rectangle is obtained without setting up a separate noise removal step, so the processing time is reduced while the recognition accuracy is maintained.

[0012]

[Embodiment] An embodiment of the present invention will now be described in detail.

[0013] FIG. 1 is a flowchart of character recognition processing according to an embodiment of the present invention, FIG. 2 is a functional block diagram of an optical character recognition device to which the present invention is applied, FIG. 3 shows an example of character segmentation, and FIG. 4 shows an example of matching.

[0014] First, an outline of the character recognition processing will be described with reference to FIG. 2. Before recognition, the optical character recognition device 10 receives from a host device an input table 11 in which the image width, image height, slice level, and so on are set, and stores it in a work memory 17. Next, the control processor 16 activates the scanner 12 on the basis of the received input table 11, reads an image by photoelectric conversion, and stores it in an image memory 13. When the control processor 16 then starts recognition, the recognition processor 14 cuts the circumscribed rectangular image of each specified character out of the image data in the image memory 13 and writes the cut-out image data to the work memory 17. Character recognition is then performed by matching it against a recognition dictionary 15; the recognition result is stored in the work memory 17, and the processing ends.

[0015] Next, the procedure for performing character segmentation and matching with common feature values will be described in detail with reference to FIGS. 1, 2, 3 and 4.

[0016] First, for the input image 30 (see FIG. 3) stored in the image memory 13, the recognition processor 14 creates a projection table 31 in which the black pixels are projected in the direction perpendicular to the character string, that is, the vertical direction, and stores it in the work memory 17 (step 22 in FIG. 1). Because the projection values show how many black pixels exist on each line, a threshold 32 can be set, and any line 33 whose projection value is at or above it can be regarded as part of a character. Accordingly, the point 34 where the projection value changes from below the threshold 32 to at or above it is taken as the left position of the character, and the point 35 where it changes from at or above the threshold 32 to below it is taken as the right position, so that a band-shaped image 36 circumscribing the character on its left and right can be extracted (step 23 in FIG. 1). Next, for the extracted band-shaped image 36, which contains only one character, a projection table 51 projected in the horizontal direction is created and stored in the work memory 17. Based on this, the upper and lower positions 54 and 55 of the character are determined in the same way as the left and right positions (step 24 in FIG. 1), and a rectangular image (input pattern) 56 circumscribing the character at its top and bottom is cut out and stored in the work memory 17 (step 25 in FIG. 1). With the above processing the circumscribed rectangle of the character has been extracted, and character segmentation (20) is complete.
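As a concrete illustration of steps 22 to 25, the Python sketch below computes the two projection tables and locates the character boundaries with a threshold. It is a minimal reconstruction under the assumptions that the input is a binary (0/1) NumPy array containing one horizontal line of text and that a single threshold serves both directions; the function names and the default threshold are illustrative, not taken from the patent.

```python
import numpy as np

def find_runs(projection, threshold):
    """Return (start, end) pairs of stretches where the projection stays at or above the threshold."""
    above = projection >= threshold
    runs, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                    # change point 34: below threshold -> at/above threshold
        elif not flag and start is not None:
            runs.append((start, i))      # change point 35: at/above threshold -> below threshold
            start = None
    if start is not None:
        runs.append((start, len(above)))
    return runs

def segment_characters(image, threshold=2):
    """Cut one circumscribed rectangle (input pattern 56) per character out of a line image."""
    vertical_proj = image.sum(axis=0)        # projection table 31: black pixels per column
    characters = []
    for left, right in find_runs(vertical_proj, threshold):
        band = image[:, left:right]          # band-shaped image 36 containing one character
        horizontal_proj = band.sum(axis=1)   # projection table 51: black pixels per row
        rows = find_runs(horizontal_proj, threshold)
        if rows:
            top, bottom = rows[0][0], rows[-1][1]       # upper and lower positions 54 and 55
            # Keep the projections together with the pattern: they are reused at matching time.
            characters.append((band[top:bottom, :],
                               vertical_proj[left:right],
                               horizontal_proj[top:bottom]))
    return characters
```

Returning the projection slices alongside each cut-out pattern reflects the point of the scheme: the matching stage can consume them directly instead of recomputing its own features.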

[0017] The input pattern 56 is then matched against the recognition dictionary 15 registered in advance. In doing so, matching that uses the vertical and horizontal projection tables 31 and 51 created during character segmentation is performed, and is applied to coarse classification (step 26 in FIG. 1), to the extraction of candidate characters (step 27 in FIG. 1), and so on. This eliminates the effort of preparing new feature values at matching time.

[0018] Next, the matching method that uses the vertical and horizontal projection tables 31 and 51 will be described with reference to FIG. 4. As feature values, the portions judged to be character during segmentation in the vertical projection table 31 and the horizontal projection table 51, that is, the line portions 33 and 53 whose projection values are at or above the threshold, are used. To obtain the similarity between the character rectangle image (input pattern) 56 and the dictionary character image (standard pattern) 46, the distance calculation shown in Expression 41 is applied to each of the projection tables 31 and 51, and a smaller distance is regarded as a higher similarity.

[0019] Distance = Σ | input pattern(i) − standard pattern(i) |   ... (Expression 41)

Then, for example, coarse classification is performed according to the similarity of the vertical projection table 31 (step 26 in FIG. 1); for the dictionary characters selected by the coarse classification, the similarity of the horizontal projection table 51 is computed next to narrow the candidate characters further (step 27 in FIG. 1). After that, one character is decided by, for example, a pattern matching method in which the right-edge center position of the input pattern and the right-edge center position of the standard pattern are aligned, the two are superimposed, and the similarity is obtained by counting the number of coinciding black pixel dots (step 28 in FIG. 1). This completes the recognition of one character (21).
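Steps 26 to 28 can then be sketched as follows, consuming the projections produced during segmentation. This is an illustrative reconstruction rather than the patented implementation: it assumes the input and dictionary projections have been scaled to a common length, that the dictionary is a list of dicts with keys "char", "pattern", "v_proj" and "h_proj", and that the candidate counts keep_coarse and keep_fine are invented values.

```python
import numpy as np

def projection_distance(input_proj, standard_proj):
    """Distance = sum over i of |input pattern(i) - standard pattern(i)|  (Expression 41)."""
    return int(np.abs(np.asarray(input_proj) - np.asarray(standard_proj)).sum())

def overlap_score(input_pattern, standard_pattern):
    """Align the right-edge centers, superimpose, and count coinciding black pixels (step 28)."""
    ih, iw = input_pattern.shape
    sh, sw = standard_pattern.shape
    h, w = min(ih, sh), min(iw, sw)
    a = input_pattern[(ih - h) // 2:(ih - h) // 2 + h, iw - w:]
    b = standard_pattern[(sh - h) // 2:(sh - h) // 2 + h, sw - w:]
    return int(np.logical_and(a == 1, b == 1).sum())

def recognize(input_pattern, v_proj, h_proj, dictionary, keep_coarse=50, keep_fine=10):
    """Three-stage matching: coarse classification, candidate narrowing, final template match."""
    # Step 26: coarse classification by similarity of the vertical projection table 31
    candidates = sorted(dictionary,
                        key=lambda e: projection_distance(v_proj, e["v_proj"]))[:keep_coarse]
    # Step 27: narrow the candidates with the horizontal projection table 51
    candidates = sorted(candidates,
                        key=lambda e: projection_distance(h_proj, e["h_proj"]))[:keep_fine]
    # Step 28: decide one character by black-pixel overlap with each standard pattern 46
    best = max(candidates, key=lambda e: overlap_score(input_pattern, e["pattern"]))
    return best["char"]
```

Because the two projection distances reuse data that already exists in the work memory, the only feature computation newly introduced at matching time is the final overlap count.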

[0020] With the method described above, the feature values used for character segmentation can also be used effectively for matching.

[0021] Meanwhile, if at segmentation time the thresholds are set to the smallest projection values 32 and 52 that can still be regarded as non-noise, the portions 37 and 57 whose projection value amounts to only a few dots are excluded before the character is taken out, so noise 38a touching the character and the error portion 38b caused by binarization of the character edge are removed at the same time. This makes a separate noise removal step at recognition time unnecessary, which has the further effect of shortening the processing time while maintaining accuracy.
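A small numerical example, with invented projection values, shows how the threshold doubles as noise removal:

```python
import numpy as np

projection = np.array([0, 1, 0, 7, 9, 8, 9, 6, 0, 0])   # hypothetical column sums for one band
threshold = 2                       # smallest value regarded as non-noise (threshold 32)
above = np.flatnonzero(projection >= threshold)
print(above[0], above[-1] + 1)      # -> 3 8: the one-dot column at index 1 (noise 38a or an
                                    #    edge fringe 38b) never enters the run, so the cut-out
                                    #    rectangle excludes it without a separate clean-up pass
```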

[0022]

[Effects of the Invention] According to the present invention, in an optical character recognition device that recognizes characters in a form image, the recognition time can be shortened while the recognition accuracy is maintained.

[0023]

[Brief Description of the Drawings]

[FIG. 1] A diagram showing the flow of character recognition processing according to an embodiment of the present invention.

[FIG. 2] A functional block diagram of an optical character recognition device to which the present invention is applied.

[FIG. 3] A diagram showing the process of cutting characters out of a form image one character at a time.

[FIG. 4] A diagram showing the process of performing matching using the vertical and horizontal projection tables.

[Explanation of Symbols]

10 ………… Optical character recognition device
12 ………… Scanner
13 ………… Image memory
14 ………… Recognition processor
15 ………… Recognition dictionary
16 ………… Control processor
17 ………… Work memory
31 ………… Vertical projection table
51 ………… Horizontal projection table
38a ………… Noise touching a character
38b ………… Error caused by binarization of the character edge


Claims (2)

[Claims]

[Claim 1] An optical character recognition method in which image data of a form is stored in an image memory and specified characters of the image data stored in the image memory are cut out and recognized, characterized in that the feature data created during the character segmentation process is reused during the character recognition process to reduce the number of feature computations, thereby shortening the processing time up to recognition.
[Claim 2] The optical character recognition method according to claim 1, characterized in that, simultaneously with character segmentation, the feature data is also used to remove errors caused by binarization of character edges and noise touching the characters.
JP5015247A 1993-02-02 1993-02-02 Optical character recognition system Pending JPH06231307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5015247A JPH06231307A (en) 1993-02-02 1993-02-02 Optical character recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5015247A JPH06231307A (en) 1993-02-02 1993-02-02 Optical character recognition system

Publications (1)

Publication Number Publication Date
JPH06231307A true JPH06231307A (en) 1994-08-19

Family

ID=11883532

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5015247A Pending JPH06231307A (en) 1993-02-02 1993-02-02 Optical character recognition system

Country Status (1)

Country Link
JP (1) JPH06231307A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021177333A1 (en) 2020-03-03 2021-09-10 国立大学法人信州大学 Adiponectin quantification method and analysis reagent for use in said method


Similar Documents

Publication Publication Date Title
JP2926066B2 (en) Table recognition device
JPH06231307A (en) Optical character recognition system
JPH1125222A (en) Method and device for segmenting character
JPH0410087A (en) Base line extracting method
JPH09274645A (en) Method and device for recognizing character
JP3411795B2 (en) Character recognition device
JP3140079B2 (en) Ruled line recognition method and table processing method
JP2909132B2 (en) Optical character reader
JP3848792B2 (en) Character string recognition method and recording medium
JPH05210761A (en) Character recognizing device
JPH04335487A (en) Character segmenting method for character recognizing device
JP2001266070A (en) Device and method for recognizing character and storage medium
JP3566738B2 (en) Shaded area processing method and shaded area processing apparatus
JPH05174178A (en) Character recognizing method
JPH10171924A (en) Character recognizing device
JP2000339408A (en) Character segment device
JP3345469B2 (en) Word spacing calculation method, word spacing calculation device, character reading method, character reading device
JP2888885B2 (en) Character extraction device
JPH05128305A (en) Area dividing method
JPH05108880A (en) English character recognition device
JPH05114047A (en) Device for segmenting character
JPH03160582A (en) Method for separating ruled line and character in document picture data
JPH03225576A (en) Device for segmenting word
JPH05233877A (en) Word reading method
JP2003030586A (en) Method and device for processing business form and program therefor