JPS603071A

JPS603071A - Character recognition system

Info

Publication number: JPS603071A
Application number: JP58109188A
Authority: JP
Inventors: Yoshiyuki Yamashita; 山下　義征
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1983-06-20
Filing date: 1983-06-20
Publication date: 1985-01-09
Also published as: JPH0545990B2

Abstract

PURPOSE: To improve the precision of character recognition by scanning each subpattern vertically or horizontally and using as a feature the sum of the Nth powers of the normalized distances between a character frame side and the character lines. CONSTITUTION: A subpattern extraction part 4 scans the entire pattern register vertically and extracts a vertical subpattern (VSP) on the basis of the relation between the run length of consecutive black dots and the line width calculated by a line width calculation part 3. Similarly, a horizontal subpattern (HSP) is extracted by a horizontal scan, a right-slanting subpattern (RSP) by a right-slanting 45° scan, and a left-slanting subpattern (LSP) by a left-slanting 45° scan. Then, for the vertical subpattern, the feature extraction part 6 starts a horizontal scan from a point on the left vertical side, one of the four sides constituting the character frame, and detects all change points from white to black.

Description

DETAILED DESCRIPTION OF THE INVENTION

(Technical Field) The present invention relates to a fast and accurate feature extraction method.

(Background Art) Conventional character and figure recognition devices commonly extract strokes from the character/figure pattern and perform recognition using the positions and lengths of the extracted strokes and the relationships among them.

In this kind of device, strokes are extracted in one of two ways. In method (1), the curvature is calculated along the sequence of contour points detected by tracing the contour of the character figure, the contour sequence is divided at points of large curvature, and strokes are extracted by combining the divided segments. In method (2), the character/figure pattern is thinned into a skeleton, the connectivity of the skeleton pattern is examined, the skeleton is traced to detect points of abrupt angular change, and strokes are extracted. Geometric features are then extracted from the extracted strokes, and the character figure is recognized. However, with method (1), as the character/figure pattern becomes larger or more complex, the amount of processing grows and the processing speed falls. Method (2) requires thinning the character/figure pattern, and thinning causes pattern distortion, whiskers (spurious branches), and similar problems, which makes the subsequent processing complicated.

(Object of the Invention) The object of the present invention is to remedy these drawbacks. Its features are as follows: subpatterns representing the stroke components of a character/figure pattern in desired directions are extracted; each subpattern is scanned from points on the sides of the character circumscribing frame, and the positions of all character lines on each scan line are detected; the distance between the scan start point and each character line is normalized by the size of the character circumscribing frame in the scanning direction, and the sum of the Nth powers of the normalized values is taken; and the arrays of these Nth-power sums, one array per side, are divided to extract a group of feature vectors. The aim is to provide a fast and stable character recognition device.

(Structure and Operation of the Invention) Fig. 1 shows an embodiment of the character recognition device of the present invention, Fig. 2 shows examples of subpatterns, and Fig. 3 shows an example of feature extraction.

In Fig. 1, 1 is a photoelectric conversion unit, 2 is a pattern register, 3 is a line width calculation unit, 4 is a subpattern extraction unit, 5 is a character frame detection unit, 6 is a side distance calculation unit (referred to below as feature extraction unit 6), 7 is a feature matrix extraction unit, 8 is a recognition unit, and 9 is the character name output.

In operation, the characters on a form set in the reading mechanism are converted by photoelectric conversion unit 1 into binary-quantized digital electrical signals and stored in pattern register 2. At the same time, line width calculation unit 3 calculates the line width (W) of the input pattern. Subpattern extraction unit 4 scans the entire pattern register vertically and extracts the vertical subpattern (VSP) from the relation between the run length of consecutive black bits and the line width calculated by line width calculation unit 3. Similarly, the horizontal subpattern (HSP) is extracted by a horizontal scan, the right-slanting subpattern (RSP) by a right-slanting 45° scan, and the left-slanting subpattern (LSP) by a left-slanting 45° scan.
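The run-length test used for subpattern extraction can be sketched as follows. This is a minimal illustration, not the patented circuit: the claim only says a run must be "sufficiently longer" than the line width, so the threshold `factor * line_width` and its default of 2.0 are assumptions, as are the function names and the list-of-lists bitmap representation. The 45° scans are omitted for brevity.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0 (white) / 1 (black)


def vertical_subpattern(pattern: Bitmap, line_width: int, factor: float = 2.0) -> Bitmap:
    """Keep only vertical black runs longer than factor * line_width.

    `factor` is a hypothetical threshold; the source gives no numeric value.
    """
    rows = len(pattern)
    cols = len(pattern[0]) if rows else 0
    out = [[0] * cols for _ in range(rows)]
    for x in range(cols):
        y = 0
        while y < rows:
            if pattern[y][x] == 0:
                y += 1
                continue
            start = y
            # Advance to the end of this run of consecutive black bits.
            while y < rows and pattern[y][x] == 1:
                y += 1
            if (y - start) > factor * line_width:
                for yy in range(start, y):
                    out[yy][x] = 1
    return out


def horizontal_subpattern(pattern: Bitmap, line_width: int, factor: float = 2.0) -> Bitmap:
    """Horizontal runs are vertical runs of the transposed bitmap."""
    transposed = [list(col) for col in zip(*pattern)]
    result = vertical_subpattern(transposed, line_width, factor)
    return [list(col) for col in zip(*result)]
```

For a plus-shaped pattern with line width 1, the vertical subpattern keeps only the vertical bar and the horizontal subpattern keeps only the horizontal bar, because the crossing stroke contributes runs of length 1 in the other direction.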

Fig. 2 shows an example of an original pattern and its subpatterns: (a) is the original pattern, (b) the vertical subpattern (VSP), (c) the horizontal subpattern (HSP), (d) the right-slanting subpattern (RSP), and (e) the left-slanting subpattern (LSP). Character frame detection unit 5 detects the rectangular frame circumscribing the character/figure pattern in the pattern register (hereinafter called the character frame) and sends the position coordinates that define the character frame on the two-dimensional plane defined by the pattern register to feature extraction unit 6.
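As a rough illustration of character frame detection, the sketch below finds the smallest axis-aligned rectangle containing every black bit. The function name, the tuple layout of the result, and the use of array indices (row 0 at the top) are assumptions made for the example; the source does not say how unit 5 is implemented.

```python
from typing import List, Optional, Tuple


def character_frame(pattern: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (col_min, row_min, col_max, row_max) of the black bits, or None if blank."""
    rows_with_ink = [r for r, row in enumerate(pattern) if any(row)]
    if not rows_with_ink:
        return None
    width = len(pattern[0])
    cols_with_ink = [c for c in range(width) if any(row[c] for row in pattern)]
    return (cols_with_ink[0], rows_with_ink[0], cols_with_ink[-1], rows_with_ink[-1])
```

The returned indices are inclusive, so the frame width in pixels is `col_max - col_min + 1`.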

In the following description, a coordinate system is used in which the lower-left corner of the character frame is the origin, the horizontal direction is the X axis, and the vertical direction is the Y axis.

Feature extraction unit 6 first processes the vertical subpattern. Starting from a point P(0, y) on the left side, which is one of the vertical sides among the four sides constituting the character frame, it scans horizontally and detects every change point from a white point to a black point. For each detected change point, the distance between the change point and the point P on the vertical side where the scan started, that is, the difference in X coordinates, is normalized using the X-direction length of the character frame as the normalization constant, and the normalized value is raised to the Nth power (N is a constant; N = 2 in this embodiment). This calculation is performed for all detected change points, and the sum is stored in array V_l(y). Here a white point represents the character background and a black point represents the character line. Equation (1) expresses V_l(y): Δx_k is the distance between each change point and the scan start point on the character frame side, k = 1, ..., K, and K is the number of detected change points. ΔX in Equation (1) is the horizontal length of the character frame, and C is an integerizing constant (its value in this embodiment is illegible in the source).
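Equation (1) itself did not survive in this text. The sketch below assumes the form V_l(y) = C · Σ_k (Δx_k / ΔX)^N, which is consistent with the definitions above but is a reconstruction, not a quotation. It further assumes that the subpattern has already been cropped to the character frame, that ΔX is the frame width in pixels, and that a black bit in the first column counts as a change point at distance 0. The function names are hypothetical, and rows are indexed top-down rather than from the lower-left origin used in the description.

```python
from typing import List


def left_profile(sub: List[List[int]], n: int = 2, c: float = 1.0) -> List[float]:
    """For each row, sum (distance / width) ** n over white-to-black change points.

    `sub` must be non-empty and cropped to the character frame.
    `c` stands in for the integerizing constant C, whose value is not legible.
    """
    width = len(sub[0])
    profile = []
    for row in sub:
        total = 0.0
        prev = 0  # treat the area outside the frame as white
        for x, bit in enumerate(row):
            if bit == 1 and prev == 0:
                total += (x / width) ** n
            prev = bit
        profile.append(c * total)
    return profile


def right_profile(sub: List[List[int]], n: int = 2, c: float = 1.0) -> List[float]:
    """Same scan started from the right side: mirror each row, then reuse left_profile."""
    return left_profile([row[::-1] for row in sub], n, c)
```

Because each distance is divided by the frame width before being raised to the Nth power, two characters of the same shape but different size give similar profiles, which is the size normalization the description relies on.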

The above processing is performed with every point on the two vertical sides of the character frame as a start point. For the vertical subpattern, this yields array V_l(i), created by horizontal scans started from points on the left side of the character frame, and array V_r(i), created by horizontal scans started from points on the right side. Here i = 0, ..., YT, where YT is the Y coordinate of the top side of the character frame. By similar processing, the horizontal, right-slanting, and left-slanting subpatterns are scanned vertically from every point on the two horizontal sides of the character frame, yielding arrays H_b(j) and H_t(j) for the horizontal subpattern, R_b(j) and R_t(j) for the right-slanting subpattern, and L_b(j) and L_t(j) for the left-slanting subpattern.

Here j = 0, ..., XR, where XR is the X coordinate of the right side of the character frame. For the arrays extracted from the horizontal, right-slanting, and left-slanting subpatterns, one subscript denotes scans started from points on the horizontal bottom side of the character frame and the other denotes scans started from points on the horizontal top side (the subscripts are partly illegible in the source and are written here as b and t). When extracting H_b(j), H_t(j), R_b(j), R_t(j), L_b(j), and L_t(j), the vertical length ΔY of the character frame is used as the normalization constant for the distance between the scan start point and each change point.

Feature matrix extraction unit 7 takes the eight arrays extracted by feature extraction unit 6, divides each array into M parts (M is a constant; M = 7 in this embodiment), and calculates the mean of the array values within each part, thereby extracting an M × 8 dimensional feature matrix P(m, n), where m = 1, ..., M and n = 1, ..., 8.

Recognition unit 8 calculates the distance D given by Equation (2) between the feature matrix extracted by feature matrix extraction unit 7 and each standard character mask f(m, n) written in the same format, and outputs the category name of the standard character mask giving the smallest distance to character name output 9.
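The averaging and matching steps can be sketched as below. Several details are assumptions: the segment boundaries `i * len // M` (the source does not say how an array whose length is not a multiple of M is split), the Euclidean form of the distance (Equation (2) is damaged in this text), and all function names. The matrix is stored here as one row per array, i.e. 8 rows of M values, which is the transpose of the P(m, n) indexing in the description.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def segment_means(values: List[float], m: int = 7) -> List[float]:
    """Split `values` into m consecutive parts and return the mean of each part."""
    n = len(values)
    means = []
    for i in range(m):
        seg = values[i * n // m:(i + 1) * n // m]
        means.append(sum(seg) / len(seg) if seg else 0.0)  # empty part -> 0.0
    return means


def feature_matrix(arrays: List[List[float]], m: int = 7) -> Matrix:
    """One row of m segment means per input array (eight arrays in the embodiment)."""
    return [segment_means(a, m) for a in arrays]


def distance(p: Matrix, f: Matrix) -> float:
    """Euclidean distance between two matrices of the same shape (assumed form)."""
    return math.sqrt(sum((pv - fv) ** 2
                         for prow, frow in zip(p, f)
                         for pv, fv in zip(prow, frow)))


def classify(p: Matrix, masks: Dict[str, Matrix]) -> str:
    """Return the category name of the standard mask closest to p."""
    return min(masks, key=lambda name: distance(p, masks[name]))
```

In use, `feature_matrix` would be called on the eight profile arrays (V_l, V_r, and the bottom-side and top-side arrays of the other three subpatterns), and `classify` on the result with a dictionary of stored masks keyed by character name.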

D = \sqrt{\sum_{m=1}^{M} \sum_{n=1}^{8} \bigl(P(m, n) - f(m, n)\bigr)^{2}}   (2)

(Equation (2) is damaged in the source; only the root sign and the difference P(m, n) − f(m, n) survive, and the Euclidean form above is an assumption.)

(Effects of the Invention) As described above, the feature matrix extracted by the feature matrix extraction unit of this embodiment represents the positions, lengths, directions, and so on of the strokes of the character/figure pattern, and thus expresses properties inherent to the character. As can be observed in Fig. 3, which shows two character patterns of similar shape together with a graphical representation of the arrays extracted by the feature extraction unit, local differences between characters are sufficiently reflected in the arrays, so recognition accuracy can be improved. In addition, when the feature extraction unit creates each array, the distance between the point on the character frame side where the scan started and the character line is normalized by the size of the character frame in the scanning direction. This absorbs the variation in character size between writers that is characteristic of handwritten characters, so accurate recognition is possible. Furthermore, because feature extraction from the character/figure pattern is achieved by simple scanning, high-speed recognition is possible and the device can be made smaller.

The present invention scans, vertically or horizontally, subpatterns obtained by extracting the stroke components in each direction from a character/figure pattern, and uses as features the sum of the Nth powers of the values obtained by normalizing the distance between the character frame side and each character line by the size of the character frame in the scanning direction. It therefore requires no complex processing and extracts features stably while following the deformation of handwritten characters, so it can be used in a fast and accurate character recognition device.

[Brief Description of the Drawings]

Fig. 1 shows an embodiment of the character recognition device of the present invention, Figs. 2(a) to 2(e) show examples of subpatterns, and Fig. 3 shows an example of features.

1: photoelectric conversion unit; 2: pattern register; 3: line width calculation unit; 4: subpattern extraction unit; 5: character frame detection unit; 6: feature extraction unit; 7: feature matrix extraction unit; 8: recognition unit; 9: character name output.

Patent applicant: Oki Electric Industry Co., Ltd. Agent: patent attorney (name partly illegible in the source: 山　木　恵　−).

[Fig. 1] [Fig. 2, panels (a) to (e)]

Claims

[Claims]

A character recognition method characterized in that: subpatterns are created by scanning a character/figure pattern in a desired direction, detecting the cross sections of character lines in the scanning direction, and extracting those cross sections whose length is sufficiently greater than the character line width of the character/figure pattern, this extraction being performed for a plurality of directions; for each extracted subpattern, a process of scanning in a predetermined direction from a start point on a side of the character circumscribing frame of the character/figure pattern, detecting the positions of all character lines on the scan line, and extracting the sum of the Nth powers (N is a constant) of the values obtained by normalizing the distances between the point on the side where the scan started and the detected character lines by the size of the character circumscribing frame in the predetermined direction, is carried out with all points on at least two of the four sides of the character circumscribing frame as start points, whereby an array of Nth-power sums, one per scan line, is extracted for each subpattern and for each side of the character circumscribing frame; each extracted array of Nth-power sums is divided into M parts (M is a constant) and the mean of the Nth-power sums within each part is calculated, whereby an M-dimensional feature vector is extracted from each array and a group of feature vectors is extracted from the character/figure pattern; and the character is recognized by comparing the extracted group of feature vectors with a dictionary expressed in the same format.