JPH1166237A

JPH1166237A - Optical character reader

Info

Publication number: JPH1166237A
Application number: JP9226171A
Authority: JP
Inventors: Yoshimi Yamada; 義美山田; Yuji Hamazaki; 祐児浜崎; Katsumi Fukuchi; 克己福地
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1997-08-22
Filing date: 1997-08-22
Publication date: 1999-03-09

Abstract

PROBLEM TO BE SOLVED: To provide an optical character reader with a shortened recognition processing time. SOLUTION: A document T is read by an image read part 30, which generates a multi-valued image signal S34. The image signal S34 is stored in an image memory 41, and the image signal S41 output from the image memory 41 is converted by a binarizing process part 42 into a binary image signal S42. The binary image signal S42 is filtered by a filter process part 43 to obtain an image signal S43. A character segmenting processing part 44 cuts character pattern areas S44 out of the image signal S43, character by character. For each character pattern area S44, an X-feature quantity extraction part 45a calculates an X-directional feature quantity S45a and a Y-feature quantity extraction part 45c calculates a Y-directional feature quantity S45c. A character discrimination part 46 collates the feature quantities S45a and S45c against the feature quantities of standard patterns to discriminate the character, and outputs its code as a recognition result S46.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an optical character reader (hereinafter referred to as "OCR") with a shortened recognition processing time.

[0002]

2. Description of the Related Art

FIG. 2 is a block diagram showing an example of a conventional OCR. In this OCR, a form T is conveyed to a predetermined position by a mechanism (not shown) in an image reading unit 10 and is illuminated by a light source 11. The light reflected from the surface of the form T is focused by a lens 12 onto a charge-coupled device sensor (hereinafter referred to as a CCD sensor) 13, which converts it into an analog electric signal S13. The electric signal S13 is converted by an analog/digital converter (hereinafter referred to as an ADC) 14 into a digital image signal S14 of gradation data. The image signal S14 is stored in an image memory 21 in a recognition unit 20 and is then binarized into white and black by a binarization processing unit 22, which outputs an image signal S22. A filter processing unit 23 filters the image signal S22 to remove black points that are not part of a character pattern (for example, dust) and white dropouts inside a character pattern. The output signal S23 of the filter processing unit 23 is cut out by a character cutout processing unit 24 into a character pattern area S24 for each character and is sent to a character feature extraction unit 25.

[0003] The character feature extraction unit 25 extracts, from the character pattern area S24 cut out for each character, the character features suited to the character recognition method in use, and generates an output signal S25. A character identification unit 26 performs character recognition by collating the output signal S25 against a dictionary in which standard patterns are stored in advance, and outputs the corresponding character code as a recognition result S26. The image memory 21, the binarization processing unit 22, the filter processing unit 23, the character cutout processing unit 24, the character feature extraction unit 25, and the character identification unit 26 are controlled by a recognition control unit 27. FIG. 3 shows an example of a multi-valued (for example, 8-gradation) character pattern stored in the image memory 21 in FIG. 2. In this figure, "・" represents white and "0" represents black. "1" to "6" are intermediate levels between white and black; the larger the value, the closer the level is to white.

[0004] FIG. 4 shows an example of the character pattern obtained when the multi-valued character pattern of FIG. 3 is binarized, filtered, and cut out character by character by the binarization processing unit 22, the filter processing unit 23, and the character cutout processing unit 24 in FIG. 2, respectively. The figure shows the case where the binarization threshold is set to, for example, "3". Accordingly, "・" to "4" in FIG. 3 become white and "0" to "3" become black. The character feature extraction unit 25 and the character identification unit 26 then perform character recognition on the character pattern area S24 cut out in this way for each character. Various character recognition methods have been proposed for numerals, alphabetic characters, and katakana: the stroke analysis method, which analyzes the strokes of a character; the pattern matching method, which checks the match between a character pattern and a reference pattern; and, as methods that better absorb character deformation, methods that analyze the line structure of the character pattern or its background structure.
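The thresholding step described in [0003] and [0004] can be sketched as follows. This is only an illustration under stated assumptions: the white mark "・" is treated as gray level 7 (the brightest of 8 gradations), the sample rows are hypothetical and not taken from FIG. 3, and the function names are invented for this sketch.

```python
WHITE_MARK = "・"  # FIG. 3 prints pure white as a dot; assumed here to be level 7


def parse_level(ch: str) -> int:
    """Map one cell of a FIG. 3-style printout to a gray level (0 = black)."""
    return 7 if ch == WHITE_MARK else int(ch)


def binarize(rows, threshold=3):
    """Return 1 (black) where the level is <= threshold, else 0 (white)."""
    return [[1 if parse_level(ch) <= threshold else 0 for ch in row]
            for row in rows]


# Hypothetical 3-row excerpt of a multi-valued pattern (not from the patent figures).
sample = ["・6410", "・5300", "・・42・"]
binary = binarize(sample)
# binary == [[0, 0, 0, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 1, 0]]
```

With the threshold at 3, levels "0" to "3" map to black and "4" to "・" map to white, which matches the split stated in the text.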

[0005]

Problems to be Solved by the Invention

However, the OCR of FIG. 2 has the following problems. When, for example, the stroke analysis method is used as the character recognition method, the features of each character, including its deformations, are described in the dictionary, so the dictionary becomes enormous. As a result, the processing time required for recognition becomes long, and shortening it requires larger-scale hardware. In the method that analyzes the background structure, attention is paid to the background of a character to judge the black and white points that form it, and the loops of the lines forming the character, its concave and convex states, and the like are extracted as features on which recognition is based; the processing needed to extract these features, however, is complicated.

[0006]

Means for Solving the Problems

To solve the above problems, a first aspect of the present invention is an OCR comprising: an image reading unit that reads an image of the characters on a form using an optical system and generates a multi-valued image signal; a binarization processing unit that binarizes the multi-valued image signal to generate a binary image signal; a character cutout processing unit that cuts out a character pattern area for each character from the binary image signal; an X feature amount calculation unit that scans the character pattern area in the X direction of XY coordinates and calculates, as an X-direction feature amount, the number of consecutive black points in the X direction of the character pattern in the character pattern area; a Y feature amount calculation unit that scans the character pattern area in the Y direction of the XY coordinates and calculates, as a Y-direction feature amount, the number of consecutive black points in the Y direction of the character pattern in the character pattern area; and a character identification unit that has a dictionary in which standard patterns of the X-direction and Y-direction feature amounts are stored in advance and that identifies the character on the form by comparing the X-direction and Y-direction feature amounts with the standard patterns. With this configuration, the image of the characters on the form is read by the image reading unit and a multi-valued image signal is generated. The multi-valued image signal is binarized by the binarization processing unit and output as a binary image signal. The binary image signal is cut out into a character pattern area for each character by the character cutout processing unit. The character pattern area is scanned in the X direction and the Y direction by the X feature amount calculation unit and the Y feature amount calculation unit, respectively, and the X-direction and Y-direction feature amounts are calculated. The X-direction and Y-direction feature amounts are compared with the standard patterns stored in the dictionary of the character identification unit, and the character is identified.

[0007] A second aspect of the present invention is an OCR comprising: the image reading unit, the binarization processing unit, and the character cutout processing unit of the first aspect; a normalization processing unit that normalizes the character pattern area to a predetermined size to generate a normalized character pattern area; an X feature amount calculation unit that scans the normalized character pattern area in the X direction of XY coordinates and calculates, as an X-direction feature amount, the number of consecutive black points in the X direction of the character pattern in the normalized character pattern area; a Y feature amount calculation unit that scans the normalized character pattern area in the Y direction of the XY coordinates and calculates, as a Y-direction feature amount, the number of consecutive black points in the Y direction of the character pattern in the normalized character pattern area; and a character identification unit that has a dictionary in which standard patterns of the X-direction and Y-direction feature amounts are stored in advance and that identifies the character on the form by comparing the X-direction and Y-direction feature amounts with the standard patterns. With this configuration, the character pattern area cut out by the character cutout processing unit of the first aspect is normalized to a predetermined size by the normalization processing unit and output as a normalized character pattern area. The normalized character pattern area is scanned in the X direction and the Y direction by the X feature amount calculation unit and the Y feature amount calculation unit, respectively, and the X-direction and Y-direction feature amounts are calculated. The X-direction and Y-direction feature amounts are compared with the standard patterns stored in the dictionary of the character identification unit, and the character is identified. The above problems can therefore be solved.

[0008]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

First Embodiment

FIG. 1 is a block diagram of an OCR showing a first embodiment of the present invention. This OCR has an image reading unit 30. The image reading unit 30 reads an image of the characters on a form T illuminated by a light source 31 using an optical system (for example, a lens 32 and a CCD sensor 33), and generates a digital multi-valued image signal S34 from the analog output signal S33 of the CCD sensor 33 by means of an ADC 34. A recognition unit 40 is connected to the output side of the ADC 34. The recognition unit 40 has an image memory 41 that stores the multi-valued image signal S34. Connected to the output side of the image memory 41 is a binarization processing unit 42 that binarizes the multi-valued image signal S41 stored in the image memory 41 to generate a binary image signal S42. Connected to the output side of the binarization processing unit 42 is a filter processing unit 43 that performs dust removal and similar processing on the binary image signal S42 to generate a binary image signal S43. Connected to the output side of the filter processing unit 43 is a character cutout processing unit 44 that cuts out a character pattern area S44 for each character from the binary image signal S43.

Connected to the output side of the character cutout processing unit 44 is an X feature amount calculation unit 45a that scans the character pattern area S44 in the X direction of XY coordinates, counts the number of consecutive black points in the X direction of the character pattern in the character pattern area S44, and calculates it as an X-direction feature amount S45a. The X-direction feature amount S45a is stored in an X memory 45b. Also connected to the output side of the character cutout processing unit 44 is a Y feature amount calculation unit 45c that scans the character pattern area S44 in the Y direction of the XY coordinates, counts the number of consecutive black points in the Y direction of the character pattern in the character pattern area S44, and calculates it as a Y-direction feature amount S45c. The Y-direction feature amount S45c is stored in a Y memory 45d. A character identification unit 46 is connected to the output sides of the X feature amount calculation unit 45a and the Y feature amount calculation unit 45c. The character identification unit 46 has a dictionary in which standard patterns of the X-direction feature amount S45a and the Y-direction feature amount S45c are stored in advance, and identifies the character on the form T by comparing the X-direction feature amount S45a and the Y-direction feature amount S45c with the standard patterns. Each block in the recognition unit 40 is controlled by a recognition control unit 47.

[0009] Next, operations (1) to (3) of the OCR of FIG. 1 are described.

(1) Overall operation of the OCR of FIG. 1

In this OCR, the form T is conveyed by a mechanism (not shown) in the image reading unit 30 and is illuminated by the light source 31 at a predetermined position. The light reflected from the surface of the form T is focused by the lens 32 onto the CCD sensor 33, which converts it into an analog electric signal S33. The electric signal S33 is converted by the ADC 34 into a multi-valued image signal S34. The image signal S34 is stored in the image memory 41 in the recognition unit 40, and the image signal S41 output from the image memory 41 is converted by the binarization processing unit 42 into a binary image signal S42 on the basis of a preset threshold. The filter processing unit 43 filters the binary image signal S42 to remove noise such as dust and outputs an image signal S43. The character cutout processing unit 44 cuts out a character pattern area S44 for each character from the image signal S43. For the character pattern area S44, the X feature amount calculation unit 45a calculates the X-direction feature amount S45a, and then the Y feature amount calculation unit 45c calculates the Y-direction feature amount S45c. The character identification unit 46 identifies the character by collating the feature amounts S45a and S45c with the feature amounts in the dictionary in which the standard patterns are stored in advance, and outputs the code of that character as a recognition result S46.
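The text does not say how the filter processing unit 43 removes noise. Purely as an illustrative assumption, one simple possibility is a 3x3 neighbourhood rule that deletes a black point with no black neighbours (dust) and fills a white point whose eight neighbours are all black (a dropout). The function name `despeckle` and the sample grid are hypothetical.

```python
def despeckle(img):
    """Remove isolated black pixels and fill fully enclosed white pixels in a 0/1 grid."""
    h, w = len(img), len(img[0])

    def at(y, x):
        # Pixels outside the grid are counted as white (0).
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            neighbours = sum(at(y + dy, x + dx)
                             for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                             if (dy, dx) != (0, 0))
            if img[y][x] == 1 and neighbours == 0:
                out[y][x] = 0   # lone black dot: treated as dust
            elif img[y][x] == 0 and neighbours == 8:
                out[y][x] = 1   # white pixel surrounded by black: filled in
    return out


# Hypothetical input: one dust pixel at the top left, one hole inside a black square.
noisy = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 1],
]
clean = despeckle(noisy)
```

A real filter stage would probably use larger neighbourhoods or size thresholds; this rule only handles single-pixel defects.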

[0010] (2) Calculation of the X and Y feature amounts

FIG. 5 illustrates the concept of feature amount calculation in the OCR of FIG. 1, and FIG. 6 illustrates an example of calculated feature amounts. As shown in FIG. 5, the X feature amount calculation unit 45a scans the first row (counted along the Y direction) in the X direction, counts the number of consecutive black points in the X direction, and stores the counts in the X memory 45b. In FIG. 5, the first row consists of two blocks of consecutive black points with counts (1, 1), so (1, 1) is stored in the X memory 45b. Next, the X feature amount calculation unit 45a scans the second row in the X direction and counts the consecutive black points, and (3) is stored in the X memory 45b. In the same way, the X feature amount calculation unit 45a counts the consecutive black points in the X direction for the rows up to the fifth, and (1), (3, 1), and (2, 2) are stored in sequence in the X memory 45b as the X-direction feature amount S45a. Meanwhile, the Y feature amount calculation unit 45c scans the first column (counted along the X direction) in the Y direction, counts the number of consecutive black points in the Y direction, and stores the counts in the Y memory 45d. The first column consists of two blocks of consecutive black points with counts (1, 2), so (1, 2) is stored in the Y memory 45d. In the same way, the Y feature amount calculation unit 45c counts the consecutive black points in the Y direction for the columns up to the fifth, and (2), (3), (2, 1), and (1, 2) are stored in the Y memory 45d as the Y-direction feature amount S45c.
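The run-length counting in [0010] can be sketched as below. The 5x5 grid is a reconstruction chosen so that its row and column run lengths equal the values quoted in the text; it is not copied from FIG. 5, and the function names are hypothetical.

```python
from itertools import groupby


def runs(line):
    """Lengths of the consecutive black (1) runs in one row or column."""
    return tuple(len(list(group)) for value, group in groupby(line) if value == 1)


def x_features(img):
    """Scan each row in the X direction (contents of the X memory)."""
    return [runs(row) for row in img]


def y_features(img):
    """Scan each column in the Y direction (contents of the Y memory)."""
    return [runs(col) for col in zip(*img)]


# Reconstructed pattern ("X" = black); an assumption consistent with the quoted counts.
PATTERN = [
    "X..X.",
    "..XXX",
    "..X..",
    "XXX.X",
    "XX.XX",
]
img = [[1 if c == "X" else 0 for c in row] for row in PATTERN]

x_memory = x_features(img)  # [(1, 1), (3,), (1,), (3, 1), (2, 2)]
y_memory = y_features(img)  # [(1, 2), (2,), (3,), (2, 1), (1, 2)]
```

Each row or column is reduced to a short tuple of run lengths, so the feature extraction needs only one pass over the pixels in each direction.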

[0011] FIG. 6 shows the numbers of consecutive black points in the X and Y directions for the pattern of FIG. 4. The character is determined by collating the X-direction feature amount S45a and the Y-direction feature amount S45c, calculated from these black-point counts by the method shown in FIG. 5, with the feature amounts in the dictionary in which the standard patterns are stored. The standard pattern of each category is likewise held separately for the X direction and the Y direction and, as in the X memory 45b and the Y memory 45d, the X-direction feature amounts are described in order from the first row and the Y-direction feature amounts in order from the first column. Collation proceeds sequentially from the head of the X memory 45b and the Y memory 45d and from the head of the X-direction and Y-direction standard patterns of each category. The mismatches in black-point counts are counted separately for the X direction and the Y direction, and the character with the smallest total of X-direction and Y-direction mismatches is taken as the recognition result S46. The number of blocks in the X memory 45b or the Y memory 45d may differ from that in the standard pattern being collated. For this reason, the positions of the blocks with the largest number of consecutive black points are aligned before the mismatches in black-point counts are counted. Even when the numbers of blocks are equal, the positions of the blocks with the largest number of consecutive black points are aligned before the mismatches are counted.

[0012] (3) Character identification operation

FIG. 7 shows an example of how the character identification unit 46 in FIG. 1 collates a memory with a standard pattern. As shown in FIG. 7, a memory M, which is either the X memory 45b or the Y memory 45d, and a standard pattern SP are collated sequentially from their heads. First, the three blocks (1, 5, 3) of the memory M are collated with the two blocks (6, 3) of the standard pattern SP. The blocks with the largest number of consecutive black points, "5" in the memory M and "6" in the standard pattern, are aligned, and the mismatch of each block is counted; the mismatches are "1", "1", and "0", for a total of "2". Next, when the memory M and the standard pattern SP are (1, 3) and (3, 1), both with two blocks, the "3" of the memory M and the "3" of the standard pattern SP are aligned and the mismatch of each block is counted; the mismatches are "1", "0", and "1", for a total of "2". Similarly, when the memory M and the standard pattern SP are (1, 3, 1) and (3, 1, 2), both with three blocks, the "3" of the memory M and the "3" of the standard pattern SP are aligned and the mismatch of each block is counted; the mismatches are "1", "0", "0", and "2", for a total of "3".

[0013] Further, when the memory M and the standard pattern SP are (3, 2) and (4, 2), both with two blocks, the "3" of the memory and the "4" of the standard pattern are aligned and the mismatch of each block is counted; the mismatches are "1" and "0", for a total of "1". When the memory M and the standard pattern SP are (3) and (3, 3), the mismatch count is "3". If collation of the memory M with the standard pattern SP ends here, the mismatch counts for the character are "2", "2", "3", "1", and "3", for a total of "11". In this way, the memory M is collated with the standard patterns SP of all categories for the X direction and the Y direction, and the character with the smallest total of X-direction and Y-direction mismatches is taken as the identification result. Character size matters when the memory M is collated with the standard pattern SP. If the height or width of the characters differs, the collation is made against shifted rows or columns instead of the rows or columns that should be collated. This can be handled by providing a dictionary for each character size and collating against the standard pattern of the matching size.
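The collation rule of [0011] to [0013] can be sketched as follows. The worked examples imply that the per-block mismatch is the absolute difference of black-point counts, with a missing block counted as 0. Two points are assumptions of this sketch, not statements from the text: when several blocks tie for the largest count, the first one is used for alignment, and a line with no black points is compared as all-mismatch. The function names and the dictionary labels "A" and "B" are hypothetical.

```python
def block_mismatch(m, sp):
    """Mismatch between two run-length tuples, aligned on their largest runs."""
    if not m or not sp:
        return sum(m) + sum(sp)  # nothing to align: every black point differs
    shift = m.index(max(m)) - sp.index(max(sp))
    # Left-pad whichever tuple has its largest run earlier, so both peaks share an index.
    a = (0,) * max(-shift, 0) + tuple(m)
    b = (0,) * max(shift, 0) + tuple(sp)
    length = max(len(a), len(b))
    a += (0,) * (length - len(a))
    b += (0,) * (length - len(b))
    return sum(abs(p - q) for p, q in zip(a, b))


def char_mismatch(memory, standard):
    """Total mismatch over all lines, collated in order from the head.

    Assumes both lists have the same number of lines (same character size).
    """
    return sum(block_mismatch(m, sp) for m, sp in zip(memory, standard))


def identify(x_mem, y_mem, dictionary):
    """Return the category whose X + Y mismatch total is smallest.

    dictionary maps a character to (x_standard, y_standard).
    """
    return min(dictionary,
               key=lambda ch: char_mismatch(x_mem, dictionary[ch][0])
               + char_mismatch(y_mem, dictionary[ch][1]))


# The five memory / standard-pattern pairs worked through in the text.
memory = [(1, 5, 3), (1, 3), (1, 3, 1), (3, 2), (3,)]
standard = [(6, 3), (3, 1), (3, 1, 2), (4, 2), (3, 3)]
per_line = [block_mismatch(m, sp) for m, sp in zip(memory, standard)]
# per_line == [2, 2, 3, 1, 3]; char_mismatch(memory, standard) == 11
```

Because the comparison is a handful of integer subtractions per line, the cost per category grows with the number of rows and columns rather than with the number of pixels.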

[0014] As described above, the first embodiment has the following advantages.

(a) The X feature amount calculation unit 45a and the Y feature amount calculation unit 45c calculate the X-direction feature amount S45a and the Y-direction feature amount S45c of a character and store them in the X memory 45b and the Y memory 45d, respectively, and the character identification unit 46 collates them with the standard patterns SP. The complicated feature extraction of the conventional methods is therefore unnecessary, and the feature extraction processing becomes very simple.

(b) When the characters to be identified are printed numerals, alphabetic characters, or katakana, characters of the same kind vary little in size, so about two or three sizes each in the X direction and the Y direction are enough for the characters stored in the dictionary as standard patterns SP. The dictionary therefore does not become enormous, and the recognition processing time can be shortened compared with the conventional case.

[0015]

Second Embodiment

FIG. 8 is a block diagram of an OCR showing a second embodiment of the present invention. Elements common to those in FIG. 1, which shows the first embodiment, are given the same reference numerals. In this OCR, a recognition unit 40A with a different configuration is provided in place of the recognition unit 40 in FIG. 1. In the recognition unit 40A, a normalization processing unit 48 is newly provided between the output side of the character cutout processing unit 44 in FIG. 1 and the input sides of the X feature amount extraction unit 45a and the Y feature amount extraction unit 45c, and a recognition control unit 47A with a configuration different from that of the recognition control unit 47 in FIG. 1 is provided. The normalization processing unit 48 normalizes the character pattern area S44 cut out by the character cutout processing unit 44 to a predetermined size and generates a normalized character pattern area S48. Each block in the recognition unit 40A is controlled by the recognition control unit 47A. The rest of the configuration is the same as in FIG. 1.

[0016] Next, operations (1) and (2) of the OCR of FIG. 8 are described.

(1) Overall operation of the OCR of FIG. 8

The operation of this OCR differs from that of FIG. 1 in the following respects. The character pattern area S44 cut out for each character by the character cutout processing unit 44 is normalized to a predetermined height and width by the normalization processing unit 48, which outputs a normalized character pattern area S48. For the normalized character pattern area S48, the X feature amount extraction unit 45a calculates the X-direction feature amount S45a, and then the Y feature amount extraction unit 45c calculates the Y-direction feature amount S45c. The character identification unit 46 identifies the character by collating the feature amounts S45a and S45c with the feature amounts in the dictionary in which the standard patterns are stored in advance, and outputs the character code as the recognition result S46.

[0017] (2) Normalization processing operation. FIG. 9 is a diagram showing an example of a pattern obtained by normalizing FIG. 4. In FIG. 9, the pattern of FIG. 4, which is 22 pixels in the vertical direction and 13 pixels in the horizontal direction, is normalized to 8 pixels in the vertical direction and 8 pixels in the horizontal direction. The X feature amount extraction unit 45a and the Y feature amount extraction unit 45c operate in the same manner as in the first embodiment. That is, the X feature amount extraction unit 45a first counts the number of consecutive black points in the X direction for each of the first through eighth rows in the Y direction of the pattern shown in FIG. 9, and stores the counts in the X memory 45b. Next, the Y feature amount extraction unit 45c counts the number of consecutive black points in the Y direction for each of the first through eighth columns in the X direction, and stores the counts in the Y memory 45d. For the character "2" in FIG. 9, (7), (2), ..., (1, 3), (5) are sequentially stored in the X memory 45b, and (3, 1), (3, 1), (1, 2, 1), ..., (1, 3), (2) are sequentially stored in the Y memory 45d. Similarly, for the character "3", (7), (2), ..., (3), (7) are sequentially stored in the X memory 45b, and (1, 1), (1, 1), ..., (4, 2), (3) are sequentially stored in the Y memory 45d.
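The counting of consecutive black points per row and per column can be illustrated with a minimal sketch. The function names and the 4 by 4 pattern below are hypothetical and chosen only for illustration; they do not reproduce FIG. 9. Each row or column yields a tuple of run lengths, in the same form as the entries such as (1, 3) stored in the X memory 45b and Y memory 45d.

```python
def run_lengths(line):
    """Return the lengths of consecutive runs of black (1) pixels in a line."""
    runs = []
    count = 0
    for pixel in line:
        if pixel:
            count += 1
        elif count:
            # A white pixel ends the current run of black pixels.
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return tuple(runs)


def x_features(pattern):
    """Run lengths along the X direction, one tuple per row."""
    return [run_lengths(row) for row in pattern]


def y_features(pattern):
    """Run lengths along the Y direction, one tuple per column."""
    return [run_lengths(col) for col in zip(*pattern)]


# Hypothetical 4 x 4 binary pattern (1 = black point).
pattern = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
]
x_mem = x_features(pattern)  # [(3,), (1,), (1, 2), (4,)]
y_mem = y_features(pattern)  # [(1, 2), (1, 1), (4,), (2,)]
```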

[0018] Collation between the memories and the standard patterns is performed in the same manner as in the first embodiment. That is, the X memory 45b and the Y memory 45d are collated sequentially from the beginning with the X-direction and Y-direction standard patterns of each category, the number of mismatches in the black point counts is counted separately for the X direction and the Y direction, and the character for which the total of the X-direction and Y-direction mismatches is smallest is taken as the identification result S46. In this embodiment, because the character size is normalized before the memories are collated with the standard patterns, the character identification unit 46 needs to collate against only one standard pattern for each of the X direction and the Y direction. As described above, in the second embodiment, the normalization processing unit 48 normalizes the character pattern area S44 for each character cut out by the character extraction processing unit 44, so only one size of each character needs to be stored in the dictionary as a standard pattern for each of the X direction and the Y direction. The dictionary can therefore be made very small, and, in addition to the advantages of the first embodiment, the recognition processing time can be shortened further.
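The collation described above can be sketched as follows. How a single mismatch is counted is not spelled out in this passage, so the sketch assumes that each row (or column) whose run-length tuple differs from the standard pattern counts as one mismatch. The function names and the two-entry dictionary are hypothetical and exist only to make the example runnable.

```python
def count_mismatches(features, standard):
    """Count positions where the extracted run lengths differ from the standard.

    Assumption: one mismatch per row or column whose tuple differs.
    """
    return sum(1 for f, s in zip(features, standard) if f != s)


def identify(x_feat, y_feat, dictionary):
    """Return (character, total mismatches) for the best-matching category."""
    best_char = None
    best_total = None
    for char, (std_x, std_y) in dictionary.items():
        total = count_mismatches(x_feat, std_x) + count_mismatches(y_feat, std_y)
        # Keep the category with the smallest X + Y mismatch total.
        if best_total is None or total < best_total:
            best_char, best_total = char, total
    return best_char, best_total


# Hypothetical dictionary: one X and one Y standard pattern per category,
# as is sufficient once the character size has been normalized.
dictionary = {
    "A": ([(1,), (2,)], [(2,), (1,)]),
    "B": ([(1,), (1,)], [(1,), (1,)]),
}
result = identify([(1,), (2,)], [(2,), (2,)], dictionary)  # ("A", 1)
```

Because normalization leaves a single standard pattern per direction for each category, the loop visits each category once, which is where the reduction in recognition time comes from.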

[0019] The present invention is not limited to the above embodiments, and various modifications are possible. Examples of such modifications include the following. (a) In the second embodiment, the character pattern area S44 was described as being normalized to 8 pixels in the vertical direction and 8 pixels in the horizontal direction, but the vertical and horizontal pixel counts need not be equal; the area may be normalized to any size as long as the feature amounts necessary for character recognition can be obtained and each category can be identified. (b) The values of the memory M and the standard pattern SP shown in FIG. 7 may be other values.

[0020]

[Effects of the Invention] As described in detail above, according to the first invention, the features of a character are extracted simply by counting the number of consecutive black points in the X direction and the Y direction of the character pattern cut out by the character extraction processing unit, so the complicated feature extraction of the prior art is unnecessary and the feature extraction processing becomes very simple. Furthermore, when the characters are printed numerals, alphabetic characters, or katakana, characters of the same kind vary little in size, so only about two or three sizes of each character need to be stored in the dictionary as standard patterns for each of the X direction and the Y direction. Because the dictionary therefore does not become enormous, the recognition processing time can be shortened compared with the prior art. According to the second invention, the normalization processing unit normalizes the character pattern area for each character cut out by the character extraction processing unit, so only one size of each character needs to be stored in the dictionary as a standard pattern for each of the X direction and the Y direction. The dictionary can therefore be made very small, and, in addition to the effects of the first invention, the recognition processing time can be shortened further.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of an OCR according to a first embodiment of the present invention.

FIG. 2 is a configuration diagram of a conventional OCR.

FIG. 3 is a diagram showing an example of a multi-valued character pattern in the image memory 21 in FIG. 2.

FIG. 4 is a diagram showing an example of a character pattern cut out after binarizing the multi-valued character pattern of FIG. 3.

FIG. 5 is a diagram showing the concept of feature amount calculation in the OCR of FIG. 1.

FIG. 6 is a diagram illustrating the calculation of a feature amount.

FIG. 7 is a diagram showing collation between a memory and a standard pattern in the OCR of FIG. 1.

FIG. 8 is a configuration diagram of an OCR according to a second embodiment of the present invention.

FIG. 9 is a diagram showing a pattern obtained by normalizing FIG. 4.

[Explanation of symbols]

T form
30 image reading unit
31 light source
32 lens
33 CCD sensor
34 A/D converter
40, 40A recognition unit
41 memory
42 binarization processing unit
43 filter processing unit
44 character extraction processing unit
45a X feature amount extraction unit
45c Y feature amount extraction unit
46 character identification unit
47 recognition control unit
48 normalization processing unit

Claims

[Claims]

1. An optical character reader comprising: an image reading unit that reads a character image on a form using an optical system and generates a multi-valued image signal; a binarization processing unit that binarizes the multi-valued image signal to generate a binary image signal; a character extraction processing unit that cuts out a character pattern area for each character from the binary image signal; an X feature amount calculation unit that scans the character pattern area in the X direction of XY coordinates and calculates the number of consecutive black points in the X direction of the character pattern in the character pattern area as an X-direction feature amount; a Y feature amount calculation unit that scans the character pattern area in the Y direction of the XY coordinates and calculates the number of consecutive black points in the Y direction of the character pattern in the character pattern area as a Y-direction feature amount; and a character identification unit that has a dictionary in which standard patterns of the X-direction feature amount and the Y-direction feature amount are stored in advance, and that identifies the character on the form by collating the X-direction feature amount and the Y-direction feature amount with the standard patterns.

2. An optical character reader comprising: the image reading unit, the binarization processing unit, and the character extraction processing unit according to claim 1; a normalization processing unit that normalizes the character pattern area to a predetermined size to generate a normalized character pattern area; an X feature amount calculation unit that scans the normalized character pattern area in the X direction of XY coordinates and calculates the number of consecutive black points in the X direction of the character pattern in the normalized character pattern area as an X-direction feature amount; a Y feature amount calculation unit that scans the normalized character pattern area in the Y direction of the XY coordinates and calculates the number of consecutive black points in the Y direction of the character pattern in the normalized character pattern area as a Y-direction feature amount; and a character identification unit that has a dictionary in which standard patterns of the X-direction feature amount and the Y-direction feature amount are stored in advance, and that identifies the character on the form by collating the X-direction feature amount and the Y-direction feature amount with the standard patterns.