JP2658153B2

JP2658153B2 - Character identification method

Info

Publication number: JP2658153B2
Application number: JP63084132A
Authority: JP
Inventors: 義美山田; 直人信太
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1988-04-07
Filing date: 1988-04-07
Publication date: 1997-09-30
Anticipated expiration: 2012-09-30
Also published as: JPH01258086A

Description

DETAILED DESCRIPTION OF THE INVENTION

(Field of Industrial Application) The present invention relates to a character identification method that identifies a read character from the features of the contour line segments forming the character's pattern.

(Prior Art) Various methods have been proposed for identifying characters, including alphanumeric characters and katakana: the stroke analysis method, which analyzes the strokes of a character; the pattern matching method, which checks the match between a character pattern and a reference pattern; and, as methods for handling deformed characters, a method that analyzes the line structure of the character pattern and a method that analyzes its background structure.

In addition, Japanese Patent Application No. 58-34482, filed by the present applicant, discloses a technique that stores the start point and end point of each line segment forming a stroke of the character to be identified and identifies the character based on the distribution positions of those line segments.

(Problems to be Solved by the Invention) However, the stroke analysis method requires a dictionary for identifying characters that describes the features of every character, including deformed characters. The amount of information therefore becomes enormous and the processing time long. Shortening the processing time requires larger-scale hardware, which is a further drawback.

The method of analyzing the background structure focuses on the background portion of the written character, extracts features such as loops, concavities, and convexities of the lines forming the character, and recognizes the character based on those features. To identify the lines of the character, this processing must determine the black and white points of the image forming the character, and such processing is generally complicated.

The character identification method of Japanese Patent Application No. 58-34482, filed by the present applicant, must store the start point and end point of each line segment of the strokes forming the character to be identified, and this constraint makes the identification processing complicated.

An object of the present invention is to eliminate these drawbacks of the prior art and to provide a character identification method whose identification processing is simple, and therefore fast, and also accurate.

(Means for Solving the Problems) The present invention relates to a character identification method that extracts the contour of a character from the binarized character pattern of the character to be identified and identifies the character from a plurality of feature points related to the contour. In particular, the invention comprises: feature amount calculating means that, from contour information representing the xy coordinate positions of the contour, calculates, as feature amounts indicating the distribution positions of the contour line segments forming the contour, feature amounts relating to the position on the x axis of the x component of each contour line segment, the position on the y axis of the x component of each contour line segment, the position on the y axis of the y component of each contour line segment, and the position on the x axis of the y component of each contour line segment, all in a coordinate system relative to a predetermined reference position; and identification means that compares the feature amounts calculated by the feature amount calculating means with reference feature amounts predetermined for each of a plurality of characters and, when the difference between the two takes its minimum value, identifies the character associated with that reference feature amount as the character of the character pattern.

(Operation) The feature amounts indicating the distribution positions of the contour line segments are obtained from contour information representing the xy coordinate positions of the contour. In a coordinate system relative to a predetermined reference position, they relate to the position on the x axis of the x component of each contour line segment, the position on the y axis of the x component, the position on the y axis of the y component, and the position on the x axis of the y component. The difference between these feature amounts and a predetermined reference feature amount is smallest when the two are identical. The identification means therefore extracts the reference feature amount showing the smallest difference and determines that the character associated with it is the character to be identified, which makes the identification processing simple and accurate.

(Embodiment) An embodiment is described with reference to the drawings. FIG. 1 is a block diagram of a character identification method according to one embodiment of the present invention. In FIG. 1, 1 is a pattern register that stores a binarized character pattern 1a; 2 is a contour extraction unit that extracts, by a known method, the contour 2a of character pattern 1b from the character pattern 1b output by pattern register 1; 3 is a feature amount calculation unit that calculates, from the contour 2a extracted by contour extraction unit 2, the feature amounts 3a indicating the distribution positions of the contour line segments of character pattern 1b, as described below; and 4 is an identification unit that compares the feature amounts 3a calculated by feature amount calculation unit 3 with predetermined reference feature amounts and, when it detects the reference feature amount for which the difference is smallest, determines that the character associated with that reference feature amount is the character of character pattern 1a and outputs the result 4a of this determination.

Next, the operation of this configuration is described. A known character pattern input means (not shown) reads a character to be identified (not shown), for example the character "A" on a ledger. As shown in FIG. 2, it then outputs a character pattern 1a consisting of binarized data associated with positions on xy coordinates. Pattern register 1 receives and stores this character pattern 1a and outputs it as character pattern 1b.
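As a minimal sketch of the data described above, the binarized character pattern can be held as a set of black-pixel coordinates. The function name `parse_pattern`, the `#` marker, the tiny sample glyph, and the 1-based, y-upward coordinate convention are all assumptions made for illustration; the text only states that the binarized data is associated with xy positions.

```python
def parse_pattern(rows, black="#"):
    """Return the set of (x, y) black pixels of a binarized pattern.

    rows[0] is the top row. Coordinates are 1-based with y increasing
    upward (an assumed convention, not taken from the patent).
    """
    height = len(rows)
    pixels = set()
    for r, row in enumerate(rows):
        y = height - r  # the top row gets the largest y
        for c, ch in enumerate(row):
            if ch == black:
                pixels.add((c + 1, y))
    return pixels


# Hypothetical 5x4 glyph, far smaller than the pattern of FIG. 2.
rows = [
    "..#..",
    ".#.#.",
    "#####",
    "#...#",
]
pattern = parse_pattern(rows)
```

A set of coordinates keeps membership tests cheap, which is convenient for any later contour extraction step.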

To trace the contour of the character pattern 1b output from pattern register 1, contour extraction unit 2 takes as starting points, for example, the xy coordinates (9,32) and (11,24) located at the upper end of the character "A", as shown in FIG. 3, and starts contour tracing from them. As shown in FIG. 3, this yields a series of contour information representing the xy coordinate positions of the contour of character pattern 1b, which is extracted as contour 2a. The starting points can be chosen arbitrarily. Any known method may be used to extract such contour information, and since the extraction is not the object of this invention, a detailed description is omitted here.
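Since the extraction method is left open, the following is only one simple possibility, not the method used in the patent: it marks every black pixel that has a white 4-neighbor. The name `boundary_pixels` and the set-of-coordinates input are assumptions. Note that it returns an unordered set, whereas contour tracing from a starting point produces an ordered series of coordinates.

```python
def boundary_pixels(pixels):
    """Return the black pixels that touch at least one white 4-neighbor."""
    neighbors = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (x, y)
        for (x, y) in pixels
        if any((x + dx, y + dy) not in pixels for dx, dy in neighbors)
    }


# A solid 3x3 block: only the center pixel is interior.
block = {(x, y) for x in range(1, 4) for y in range(1, 4)}
edge = boundary_pixels(block)
```

An ordered trace would be needed before the contour can be split into consecutive line segments.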

Feature amount calculation unit 3 first receives contour 2a and quantifies where each line segment on contour 2a lies on the xy coordinates, separately for the x component and the y component of the segment. It then calculates, each as an average of those values, the feature amount Q₁ relating to the position of the x components on the x axis, the feature amount Q₂ relating to the position of the y components on the y axis, the feature amount Q₃ relating to the position of the x components on the y axis, and the feature amount Q₄ relating to the position of the y components on the x axis, as follows. Here, H_x and H_y denote the character widths of character pattern 1a, and x_i and y_i are positions on the xy coordinates.
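The exact expressions for Q₁ to Q₄ are not given in the text here, so the sketch below is only a hypothetical form that is consistent with the description: each segment's x and y extents are weighted by the segment's midpoint position, averaged, and normalized by the character widths. The function name, the midpoint weighting, and the normalization are all assumptions.

```python
def segment_features(contour, h_x, h_y):
    """Hypothetical Q1..Q4 for an ordered, closed contour.

    contour: list of (x, y) points relative to the chosen origin.
    h_x, h_y: character widths along x and y.
    NOTE: an assumed form, not the patent's formulas.
    """
    sum_dx = sum_dy = 0.0
    xx = yy = xy = yx = 0.0
    n = len(contour)
    for i in range(n):
        (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
        dx, dy = abs(x1 - x0), abs(y1 - y0)    # x and y components
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2  # segment position
        sum_dx += dx
        sum_dy += dy
        xx += dx * mx  # x component, position on the x axis
        yy += dy * my  # y component, position on the y axis
        xy += dx * my  # x component, position on the y axis
        yx += dy * mx  # y component, position on the x axis
    q1 = xx / (h_x * sum_dx) if sum_dx else 0.0
    q2 = yy / (h_y * sum_dy) if sum_dy else 0.0
    q3 = xy / (h_y * sum_dx) if sum_dx else 0.0
    q4 = yx / (h_x * sum_dy) if sum_dy else 0.0
    return [q1, q2, q3, q4]


# A 4-by-2 rectangle with its lower-left corner at the origin.
rect = [(0, 0), (4, 0), (4, 2), (0, 2)]
q = segment_features(rect, h_x=4, h_y=2)
```

For this symmetric rectangle every feature comes out at 0.5, the middle of the normalized range, which matches the intuition of an average position.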

Note that, in terms of the xy coordinates of FIG. 3, the feature amounts Q₁ to Q₄ are computed from positions relative to a coordinate system whose origin is (1,1), the lower-left corner of the rectangle circumscribing character pattern 1a. The values of Q₁ to Q₄ are therefore smaller the closer a line segment is to the origin. Moreover, because Q₁ to Q₄ contain terms that sum over all line segments in relation to their positions on the xy coordinates, a change in a line segment near the origin has little effect on Q₁ to Q₄, whereas a change in a line segment near the upper-right corner of the rectangle circumscribing character pattern 1a has a large effect.

Because Q₁ to Q₄ have this property, the invention, to make characters easier to identify, additionally obtains contour 2a with the upper-right corner (22,32) of the rectangle circumscribing character pattern 1a as the origin and applies it to the same formulas used to derive Q₁ to Q₄, thereby deriving feature amounts Q₅ to Q₈.
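The two reference positions can be illustrated with a small coordinate conversion. This is a sketch under assumptions: the helper name `to_relative` is hypothetical, and measuring distances from the upper-right corner as non-negative values (by flipping the axis directions) is an assumed convention that the text does not spell out. The corner coordinates and sample points are the ones quoted from FIG. 3.

```python
def to_relative(points, origin, flip=False):
    """Express points relative to origin.

    With flip=True the axes point away from an upper-right origin, so
    coordinates inside the rectangle stay non-negative (assumed convention).
    """
    ox, oy = origin
    if flip:
        return [(ox - x, oy - y) for x, y in points]
    return [(x - ox, y - oy) for x, y in points]


lower_left = (1, 1)     # origin used for Q1..Q4
upper_right = (22, 32)  # origin used for Q5..Q8
start_points = [(9, 32), (11, 24)]

from_lower_left = to_relative(start_points, lower_left)
from_upper_right = to_relative(start_points, upper_right, flip=True)
```

Points near the top of the character are far from the first origin but close to the second, so the two feature sets emphasize opposite corners of the pattern.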

Furthermore, the feature amounts Q₁ to Q₈ of character pattern 1a derived from contour 2a are obtained by summing the feature amounts Q₁ to Q₈ of the outer contour shown in FIG. 4(a) and the feature amounts Q₁ to Q₈ of the inner contour shown in FIG. 4(b). When the character to be identified has a plurality of inner contours, this computation is performed over all of their line segments.
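The summation over outer and inner contours can be sketched as an element-wise sum of per-contour feature vectors. The function name `sum_features` and the numeric values are illustrative assumptions, not data from the patent.

```python
def sum_features(contour_features):
    """Element-wise sum of equal-length feature vectors, one per contour."""
    return [sum(values) for values in zip(*contour_features)]


# Illustrative Q1..Q8 values for one outer and one inner contour.
outer = [0.5, 0.25, 0.5, 0.75, 0.5, 0.75, 0.5, 0.25]
inner = [0.25, 0.5, 0.25, 0.25, 0.75, 0.5, 0.75, 0.75]
total = sum_features([outer, inner])
```

Because the function accepts any number of vectors, a character with several inner contours is handled by passing all of them in the same list.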

Identification unit 4 stores a reference feature amount for each character in the identification target range. By successively comparing these reference feature amounts with the feature amounts Q₁ to Q₈ of character pattern 1a derived from contour 2a, it derives the difference between them, that is, the matching distance. Among the derived matching distances, the reference feature amount giving the minimum is selected by sorting. Identification unit 4 takes the character associated with the selected reference feature amount to be the character to be identified and outputs it as result 4a.
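The matching step amounts to a nearest-reference search. In the sketch below, the sum of absolute differences used as the matching distance is an assumption (the text does not define the metric), and the function names and reference values are hypothetical.

```python
def matching_distance(features, reference):
    """Assumed metric: sum of absolute differences over Q1..Q8."""
    return sum(abs(f - r) for f, r in zip(features, reference))


def identify(features, references):
    """Return the character whose reference features are closest."""
    return min(
        references,
        key=lambda ch: matching_distance(features, references[ch]),
    )


# Illustrative reference feature amounts for two characters.
references = {
    "A": [0.75, 0.75, 0.75, 1.0, 1.25, 1.25, 1.0, 1.0],
    "B": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}
features = [0.75, 0.75, 0.75, 1.0, 1.25, 1.25, 1.25, 1.0]
result = identify(features, references)
```

The text selects the minimum by sorting; taking `min` in a single pass picks the same reference as sorting the distances and reading the first entry.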

(Effects of the Invention) As described in detail above, the present invention calculates, from contour information representing the xy coordinate positions of the contour, feature amounts indicating the distribution positions of the contour line segments forming the contour. In a coordinate system relative to a predetermined reference position, these relate to the position on the x axis of the x component of each contour line segment, the position on the y axis of the x component, the position on the y axis of the y component, and the position on the x axis of the y component. The invention then compares these feature amounts with reference feature amounts. Character identification processing therefore becomes very simple and fast, and the identification result is highly reliable.

[Brief description of the drawings]

FIG. 1 is a block diagram of a character identification method according to an embodiment of the present invention, FIG. 2 is a pattern diagram of the character to be identified, FIG. 3 shows the contour of the character of FIG. 2, and FIG. 4 shows the coordinate series of the contour of the character of FIG. 2. 3: feature point extraction unit; 4: identification unit.

Continuation of the front page (56) References: JP-A-62-290985 (JP, A); JP-A-62-269286 (JP, A); IEICE Technical Report PRL82-29; IEICE Technical Report PRL82-79

Claims

(57) [Claims]

1. A character identification method for extracting the contour of a character from a binarized character pattern of a character to be identified and identifying the character from a plurality of feature points related to the contour, comprising: feature amount calculating means for calculating, from contour information representing the xy coordinate positions of the contour, as feature amounts indicating the distribution positions of the contour line segments forming the contour, feature amounts relating to the position on the x axis of the x component of each contour line segment, the position on the y axis of the x component of each contour line segment, the position on the y axis of the y component of each contour line segment, and the position on the x axis of the y component of each contour line segment, in a coordinate system relative to a predetermined reference position; and identification means for comparing the feature amounts calculated by the feature amount calculating means with reference feature amounts predetermined for each of a plurality of characters and, when the difference between the two shows a minimum value, identifying the character associated with that reference feature amount as the character of the character pattern.