JPH0664629B2

JPH0664629B2 - Character recognition method

Info

Publication number: JPH0664629B2
Application number: JP60263672A
Authority: JP
Inventors: 晃治伊東; 義征山下; 寿之有賀
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1985-11-26
Filing date: 1985-11-26
Publication date: 1994-08-22
Anticipated expiration: 2009-08-22
Also published as: JPS62125485A

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a character recognition method for recognizing character figures. More specifically, it relates to a method in which a character figure pattern is divided into a plurality of sub-patterns, a feature matrix is obtained for each sub-pattern, and the feature matrices are matched against standard character masks in a dictionary for identification.

(Prior Art) Conventional character figure recognition devices have commonly extracted strokes from the character figure pattern and performed recognition using the positions and lengths of the extracted strokes, the relationships between strokes, and the like. Strokes are extracted in one of two ways: (1) the contour of the character figure is traced, the curvature is computed along the resulting sequence of contour points, the sequence is split at points of large curvature, and the split segments are combined into strokes; or (2) the character figure pattern is thinned into a skeleton, and the connectivity of the skeleton pattern and the skeleton itself are traced to detect points of abrupt angle change and the like. Geometric features are then extracted from the strokes obtained by (1) or (2) and used for identification. However, with method (1), the amount of processing grows and the processing speed falls as the character figure pattern becomes larger or more complex. Method (2) requires thinning of the character figure pattern, and thinning causes problems such as pattern distortion and spurious whiskers, which complicate the subsequent processing.

To solve these problems, the present applicant has proposed the character recognition method disclosed in Japanese Unexamined Patent Publication No. S57-23185. In that method, for each sub-pattern extracted from the character pattern, the area within the character frame is divided into (N × M) regions, and the feature matrix of each stroke extracted in this way is matched against a dictionary. This enables fast and stable character recognition.

(Problems to Be Solved by the Invention) However, the conventional character recognition method described above has the following problems.

For example, consider recognizing printed characters in Gothic and Mincho typefaces. In Gothic, the vertical and horizontal stroke widths are nearly constant, so the line width of the character as a whole equals the vertical or horizontal stroke width, and the features remain stable when normalized, in units of cells, by the line width of the whole character. In Mincho, by contrast, the vertical and horizontal stroke widths differ greatly. When normalized, in units of cells, by the line width of the whole character, a stroke of the same length as its Gothic counterpart yields a smaller value in the horizontal feature matrix and a larger value in the vertical feature matrix than in Gothic, so the features are unstable.
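As a purely illustrative calculation (all numbers are assumed and do not come from the patent), take one horizontal and one vertical stroke, each 20 cells long. In the Gothic case both strokes are 4 cells wide, so the overall line width is 4. In the Mincho case assume the horizontal stroke is 2 cells wide, the vertical stroke is 6 cells wide, and the overall line width is again 4. Dividing each black-cell count by the overall line width gives:

```latex
\begin{aligned}
\text{Gothic:}\quad & \frac{20 \times 4}{4} = 20 \ \text{(horizontal)}, &\quad \frac{20 \times 4}{4} &= 20 \ \text{(vertical)} \\
\text{Mincho:}\quad & \frac{20 \times 2}{4} = 10 \ \text{(horizontal)}, &\quad \frac{20 \times 6}{4} &= 30 \ \text{(vertical)}
\end{aligned}
```

Under these assumptions the same 20-cell strokes produce feature values of 20 in Gothic but 10 and 30 in Mincho, which is the instability described above.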

FIG. 2 shows examples of Gothic and Mincho sub-patterns. The same phenomenon occurs with handwritten characters: depending on the condition of the writing instrument and the pressure the writer applies to it, strokes of the same length yield different feature values, so the features are unstable. Addressing this problem requires enlarging the dictionary used for identification, which in turn slows processing.

Accordingly, an object of the present invention is to solve these problems and to provide a fast and stable character recognition method.

(Means for Solving the Problems) The present invention relates to a character recognition method in which: a character figure is photoelectrically converted and quantized to create an original pattern, which is a digital signal represented by black bits and white bits; the line width of the original pattern is calculated; the original pattern is scanned in a plurality of directions to detect the number of consecutive black bits in each scan line; a plurality of sub-patterns, one for each scanning direction, are extracted on the basis of the detected numbers of consecutive black bits and the calculated line width of the original pattern; the area within the character frame of the original pattern is divided, for each extracted sub-pattern, into (N × M) regions (N and M being constants); black dots are counted within each region in units of cells; feature quantities are calculated from the counted numbers of black dots; the calculated feature quantities are normalized by the character size to create a feature matrix; and the feature matrix is matched against standard character masks of character figure patterns prepared in advance to recognize the character figure.

In particular, according to the present invention, a line width is calculated for each extracted sub-pattern, and the feature quantities are calculated by dividing the counted number of black dots by the line width of the corresponding sub-pattern.

(Operation) The character figure to be identified is binarized to create the original pattern as a digital signal. The line width (average value) of the original pattern is then calculated. From the original pattern, a plurality of sub-patterns, one for each scanning direction, are extracted, and the line width of each extracted sub-pattern is calculated. The area within the character frame of the original pattern is divided, for each extracted sub-pattern, into (N × M) regions (N and M being constants). A feature quantity is then calculated for each extracted sub-pattern by dividing the number of black dots, counted in units of cells within each region, by the line width calculated for that sub-pattern. The feature quantities are normalized to form a feature matrix, and the character figure is recognized by matching this feature matrix against standard character masks of character figure patterns prepared in advance.

(Embodiment) The present invention is described in detail below on the basis of an embodiment, with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention. In the figure, 1 is an optical signal input, 2 is a photoelectric conversion unit, 3 is a pattern register, 4 is a line width calculation unit, 5 is a character frame detection unit, 6 is a vertical sub-pattern extraction unit, 7 is a horizontal sub-pattern extraction unit, 8 is a right diagonal sub-pattern extraction unit, 9 is a left diagonal sub-pattern extraction unit, 10 is a vertical sub-pattern line width calculation unit, 11 is a horizontal sub-pattern line width calculation unit, 12 is a right diagonal sub-pattern line width calculation unit, 13 is a left diagonal sub-pattern line width calculation unit, 14 is a feature matrix extraction unit, 15 is an identification unit, and 16 is a character name output.

Next, the configuration of each unit is described. The photoelectric conversion unit 2 converts the optical signal input of the original pattern into a binary quantized electrical signal, which the pattern register 3 stores. For storage, the character is divided into, for example, 100 × 100 cells, and the binary code of each cell is stored in the pattern register 3. The line width calculation unit 4 calculates the line width of the input pattern (original pattern). The vertical sub-pattern extraction unit 6 scans the entire pattern register 3 vertically and extracts the vertical sub-pattern (VSP) from the relationship between the lengths of the runs of consecutive black bits and the line width calculated by the line width calculation unit 4. Likewise, the horizontal sub-pattern extraction unit 7 extracts the horizontal sub-pattern (HSP) by horizontal scanning, the right diagonal sub-pattern extraction unit 8 extracts the right diagonal sub-pattern (RSP) by right diagonal (45°) scanning, and the left diagonal sub-pattern extraction unit 9 extracts the left diagonal sub-pattern (LSP) by left diagonal (45°) scanning. FIG. 3 shows an example of an original pattern and its sub-patterns: (a) is the original pattern, (b) the VSP, (c) the HSP, (d) the RSP, and (e) the LSP. The character frame detection unit 5 detects the character frame circumscribing the character pattern in the pattern register and sends the result to the feature matrix extraction unit 14. For each sub-pattern register, the feature matrix extraction unit 14 divides the area corresponding to the character frame of the original pattern into (N × M) regions (N = M = 5 in this embodiment). For example, when the character is divided into 100 × 100 cells and N = M = 5, each region contains 20 × 20 cells.
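The run-length-based extraction described above can be sketched as follows. This is a minimal illustration, not the patented procedure: the text only states that extraction depends on the relationship between the black run length and the line width, so the threshold rule `run >= factor * line_width`, the parameter `factor`, and all function names are assumptions. Diagonal scans are omitted for brevity.

```python
from typing import List

Pattern = List[List[int]]  # 1 = black cell, 0 = white cell


def extract_vertical_subpattern(pattern: Pattern, line_width: float,
                                factor: float = 2.0) -> Pattern:
    """Keep black cells that lie in vertical runs of length >= factor * line_width.

    Both the threshold rule and `factor` are assumed for illustration.
    """
    rows, cols = len(pattern), len(pattern[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(cols):
        y = 0
        while y < rows:
            if pattern[y][x] == 0:
                y += 1
                continue
            start = y
            # Advance to the end of the current black run in this column.
            while y < rows and pattern[y][x] == 1:
                y += 1
            if y - start >= factor * line_width:
                for k in range(start, y):
                    out[k][x] = 1
    return out


def transpose(pattern: Pattern) -> Pattern:
    """Swap rows and columns."""
    return [list(col) for col in zip(*pattern)]


def extract_horizontal_subpattern(pattern: Pattern, line_width: float,
                                  factor: float = 2.0) -> Pattern:
    """A horizontal scan is a vertical scan of the transposed pattern."""
    flipped = extract_vertical_subpattern(transpose(pattern), line_width, factor)
    return transpose(flipped)
```

Applied to a one-cell-wide plus sign, the vertical extraction keeps only the vertical bar and the horizontal extraction keeps only the horizontal bar, because the crossing cell belongs to a long run in both directions while every other cell belongs to a long run in only one.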

Taking the VSP as an example, the method of extracting the feature matrix is as follows. The vertical sub-pattern line width calculation unit 10 calculates the line width (w_v) of the vertical sub-pattern extracted by the vertical sub-pattern extraction unit 6, using the same processing as the line width calculation unit 4. The feature matrix extraction unit 14 counts the number of black dots (B_ij) in each of the divided regions described above, in units of cells. Based on equation (1) below, it then uses the line width (w_v) of the vertical sub-pattern in equation (2) below to calculate, in units of cells, a feature quantity representing the stroke length of the sub-pattern within each region, and creates an (N × M)-dimensional feature matrix.

$$\text{number of black dots} = \text{line length} \times \text{line width} \quad \cdots (1)$$

$$l_{ij} = B_{ij} / w_v \quad \cdots (2)$$

The same processing is performed for the HSP using the line width (w_H) of the horizontal sub-pattern calculated by the horizontal sub-pattern line width calculation unit 11, for the RSP using the line width (w_R) of the right diagonal sub-pattern calculated by the right diagonal sub-pattern line width calculation unit 12, and for the LSP using the line width (w_L) of the left diagonal sub-pattern calculated by the left diagonal sub-pattern line width calculation unit 13, to create the respective feature matrices.
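A minimal sketch of the region counting and of equation (2), under stated assumptions: the sub-pattern is a list of rows of 0/1 cells, the character frame is passed as a hypothetical `(top, left, height, width)` tuple, the first index `i` runs over rows, and the function names are illustrative only. The line width is taken as a given input, since the text does not spell out how it is calculated.

```python
from typing import List, Tuple

Pattern = List[List[int]]          # 1 = black cell, 0 = white cell
Frame = Tuple[int, int, int, int]  # (top, left, height, width), in cells


def region_black_counts(sub: Pattern, frame: Frame,
                        n: int = 5, m: int = 5) -> List[List[int]]:
    """Count the black cells B_ij in each of the n x m regions of the frame."""
    top, left, height, width = frame
    counts = [[0] * m for _ in range(n)]
    for y in range(top, top + height):
        i = (y - top) * n // height      # region row index, 0 .. n-1
        for x in range(left, left + width):
            j = (x - left) * m // width  # region column index, 0 .. m-1
            counts[i][j] += sub[y][x]
    return counts


def stroke_length_features(counts: List[List[int]],
                           line_width: float) -> List[List[float]]:
    """Equation (2): l_ij = B_ij / w, where w is this sub-pattern's line width."""
    return [[b / line_width for b in row] for row in counts]
```

For a 10 × 10 frame containing a vertical bar 2 cells wide that spans the full height, each region in the middle column holds 4 black cells, and dividing by the line width 2 gives 2.0, the length of the bar inside a 2-cell-high region.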

Next, the feature matrix extraction unit 14 normalizes the extracted feature matrix by the character size to create a normalized feature matrix. Let l_ij be an element of the feature matrix before normalization, L_ij the corresponding element after normalization, ΔX the horizontal length of the character frame, and ΔY its vertical length. The normalization is then performed as follows.

(1) For the vertical sub-pattern (VSP) matrix:

$$L_{ij} = l_{ij} / \Delta Y \quad \cdots (3)$$

(2) For the horizontal sub-pattern (HSP) matrix:

$$L_{ij} = l_{ij} / \Delta X \quad \cdots (4)$$

(3) For the diagonal sub-pattern (RSP, LSP) matrices: [equation (5) is not reproduced in this text]

Through the above processing, the feature matrix extraction unit 14 finally creates a normalized feature matrix of {(N × M) × 4} dimensions that represents the original pattern.
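The character frame detection and the normalization of equations (3) and (4) might be sketched as below. The function names and the `(top, left, delta_y, delta_x)` return layout are assumptions. The diagonal case is deliberately left out, because its formula is not available in this text.

```python
from typing import List, Tuple

Pattern = List[List[int]]  # 1 = black cell, 0 = white cell


def character_frame(pattern: Pattern) -> Tuple[int, int, int, int]:
    """Return (top, left, delta_y, delta_x) of the box circumscribing the black cells."""
    ys = [y for y, row in enumerate(pattern) if any(row)]
    xs = [x for x in range(len(pattern[0])) if any(row[x] for row in pattern)]
    if not ys:
        raise ValueError("pattern contains no black cells")
    return ys[0], xs[0], ys[-1] - ys[0] + 1, xs[-1] - xs[0] + 1


def normalize(features: List[List[float]], divisor: float) -> List[List[float]]:
    """Divide every element l_ij by a character-size term to obtain L_ij."""
    return [[value / divisor for value in row] for row in features]
```

With this sketch, a VSP matrix would be normalized as `normalize(l_v, delta_y)` (equation (3)) and an HSP matrix as `normalize(l_h, delta_x)` (equation (4)).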

The identification unit 15 computes the conventionally used distance (D) of equation (6) (not reproduced in this text) between each standard character mask (f_m) and the feature matrix (f_i) extracted by the feature matrix extraction unit 14, that is, the length of the difference vector between the two vectors in the (N × M) × 4-dimensional feature space. It outputs, as the character name output 16, the category name of the standard character mask that gives the smallest distance.
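A minimal sketch of this matching step, assuming that the "length of the difference vector" is the ordinary Euclidean norm. The distance formula itself is not reproduced in this text, so that choice, the flattened-vector representation, and the names `distance` and `classify` are all assumptions.

```python
import math
from typing import Dict, Sequence


def distance(f_i: Sequence[float], f_m: Sequence[float]) -> float:
    """Euclidean length of the difference between two feature vectors (assumed form of D)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_i, f_m)))


def classify(f_i: Sequence[float], masks: Dict[str, Sequence[float]]) -> str:
    """Return the category name of the standard mask closest to f_i."""
    return min(masks, key=lambda name: distance(f_i, masks[name]))
```

Here each feature matrix is flattened into a single vector of (N × M) × 4 values before comparison, and `masks` maps each category name to its standard character mask in the same layout.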

Next, the operation is described. The optical signal input 1 is photoelectrically converted by the photoelectric conversion unit 2 and supplied to the pattern register 3 and the line width calculation unit 4. The line width calculation unit 4 calculates the line width of the original pattern and outputs it to the sub-pattern extraction units 6 to 9. The vertical sub-pattern extraction unit 6 extracts the VSP by the method described above from the outputs of the pattern register 3 and the line width calculation unit 4. Likewise, the horizontal sub-pattern extraction unit 7, the right diagonal sub-pattern extraction unit 8, and the left diagonal sub-pattern extraction unit 9 extract the HSP, RSP, and LSP, respectively. The vertical sub-pattern line width calculation unit 10 calculates the line width (w_v) of the VSP. Likewise, the horizontal sub-pattern line width calculation unit 11, the right diagonal sub-pattern line width calculation unit 12, and the left diagonal sub-pattern line width calculation unit 13 calculate the line width (w_H) of the HSP, the line width (w_R) of the RSP, and the line width (w_L) of the LSP, respectively. The feature matrix extraction unit 14 then creates the feature matrices according to equations (1) and (2) and normalizes them by the character size output from the character frame detection unit 5 to create the normalized feature matrix. The identification unit 15 compares this feature matrix with the standard character masks, and the category name of the standard character mask giving the smallest distance is output as the character name output 16.

(Effects of the Invention) As described in detail above, according to the present invention, the original pattern is scanned in a plurality of directions to extract the respective sub-patterns, a line width is calculated for each sub-pattern obtained, and the feature matrix is calculated by dividing the counted number of black dots by the line width of the corresponding sub-pattern. Therefore, even when the horizontal and vertical line widths differ greatly, as in Mincho (thin horizontal lines and thick vertical lines), the horizontal feature matrix is calculated using the line width of the horizontal sub-pattern and the vertical feature matrix using the line width of the vertical sub-pattern, so the resulting feature matrices are similar to those obtained when the horizontal and vertical line widths are equal, as in Gothic. That is, regardless of the typeface, stable feature extraction, and hence stable character recognition, is possible even when the character is deformed or its line width varies.
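The typeface independence claimed above can be checked on synthetic bitmaps. In the sketch below, `vertical_bar` draws a single vertical stroke, and `mean_horizontal_run` is a hypothetical stand-in for the sub-pattern line width calculation, which this text does not describe in detail. The stroke dimensions are assumed values chosen only for illustration.

```python
from typing import List

Pattern = List[List[int]]  # 1 = black cell, 0 = white cell


def vertical_bar(length: int, width: int, size: int = 30) -> Pattern:
    """Synthetic vertical sub-pattern: one bar of the given length and width."""
    grid = [[0] * size for _ in range(size)]
    for y in range(length):
        for x in range(width):
            grid[y][x] = 1
    return grid


def mean_horizontal_run(sub: Pattern) -> float:
    """Assumed line-width estimate: mean length of the black runs along each row."""
    runs = []
    for row in sub:
        run = 0
        for cell in row:
            if cell:
                run += 1
            elif run:
                runs.append(run)
                run = 0
        if run:
            runs.append(run)
    return sum(runs) / len(runs)


def stroke_length(sub: Pattern) -> float:
    """Black-dot count divided by the sub-pattern's own line width."""
    black = sum(map(sum, sub))
    return black / mean_horizontal_run(sub)


gothic = vertical_bar(length=20, width=4)  # thinner vertical stroke
mincho = vertical_bar(length=20, width=6)  # thicker vertical stroke
print(stroke_length(gothic), stroke_length(mincho))  # prints "20.0 20.0"
```

Both strokes yield the same feature value because each black-dot count is divided by the width of its own sub-pattern rather than by a single width for the whole character.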

Moreover, since the feature matrices are extracted by simple scanning of the sub-patterns extracted from the original pattern, high-speed character recognition is naturally possible as well.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 is a diagram showing examples of Gothic and Mincho sub-patterns, and FIG. 3 is a diagram showing an example of an original pattern and the sub-patterns extracted from it.

1: optical signal input, 2: photoelectric conversion unit, 3: pattern register, 4: line width calculation unit, 5: character frame detection unit, 6: vertical sub-pattern extraction unit, 7: horizontal sub-pattern extraction unit, 8: right diagonal sub-pattern extraction unit, 9: left diagonal sub-pattern extraction unit, 10: vertical sub-pattern line width calculation unit, 11: horizontal sub-pattern line width calculation unit, 12: right diagonal sub-pattern line width calculation unit, 13: left diagonal sub-pattern line width calculation unit, 14: feature matrix extraction unit, 15: identification unit, 16: character name output.

Claims

[Claims]

1. A character recognition method in which a character figure is photoelectrically converted and quantized to create an original pattern, which is a digital signal represented by black bits and white bits; the line width of the original pattern is calculated; the original pattern is scanned in a plurality of directions to detect the number of consecutive black bits in each scan line; a plurality of sub-patterns, one for each of the scanning directions, are extracted on the basis of the detected numbers of consecutive black bits and the calculated line width of the original pattern; the area within the character frame of the original pattern is divided, for each of the extracted sub-patterns, into (N × M) regions (N and M being constants); black dots are counted within each of the divided regions in units of cells; feature quantities are calculated on the basis of the counted numbers of black dots; the calculated feature quantities are normalized by the character size to create a feature matrix; and the created feature matrix is matched against standard character masks of character figure patterns prepared in advance to recognize the character figure, the method being characterized in that a line width is calculated for each of the extracted sub-patterns, and the feature quantities are calculated by dividing the counted number of black dots by the line width of the corresponding extracted sub-pattern.