JP2002099876A - Character recognition device, character recognition method, and storage medium with character recognition program stored therein

Info

Publication number: JP2002099876A
Application number: JP2000290577A
Authority: JP
Inventors: Naoya Tanaka; 直哉田中
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2000-09-25
Filing date: 2000-09-25
Publication date: 2002-04-05
Anticipated expiration: 2020-09-25
Also published as: JP3546827B2

Abstract

PROBLEM TO BE SOLVED: To provide a character recognition device that is robust against breaks in character strokes. SOLUTION: A contour information extraction unit 2 approximates the contour of an input character pattern stored in a character pattern input unit 1 with a polygonal line, and extracts contour information including the directed line segments that make up this polygonal line. A structure matching unit 4 collates, for each character category and in units of directed line segments, the extracted contour information of the input character pattern with the contour information of a representative character pattern stored in advance in a dictionary 3, determines the distance between the input character pattern and the representative character pattern, and outputs it as a distance value. A character recognition result output unit 5 determines the most probable character category of the input character pattern from the distance values output by the structure matching unit 4 for each character category, and outputs a character code.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD: The present invention relates to a character recognition device and a character recognition method, and more particularly to a character recognition device and a character recognition method that recognize binary character patterns using a structural analysis technique.

[0002]

PRIOR ART: In optical character readers that read characters automatically, character recognition techniques have been developed for recognizing various character types, such as alphanumerics and kanji, in both printed type and handwriting. These techniques fall broadly into two approaches: pattern matching and structural analysis. Pattern matching is very effective for recognizing characters with little deformation, such as printed type. It has also been widely reported to be effective for heavily deformed handwritten characters when preprocessing that brings characters to a uniform size, so-called normalization, is applied beforehand. However, pattern matching can misread handwritten characters that have similar shapes. For example, characters whose overall shapes are alike, such as the katakana "シ" (shi) and "ン" (n), may be mistaken for each other. Structural analysis is effective for recognizing handwritten characters with such similar shapes. For the katakana "シ" and "ン" as well, a structural analysis method registers the information on every character stroke and contour individually in a dictionary and matches the corresponding strokes and contours between the dictionary and the input character pattern, which makes correct discrimination possible.

[0003] An example of a conventional character recognition device using a structural analysis technique is described in Japanese Patent Laid-Open Publication No. Hei 7-175895. This conventional device performs character recognition by matching the following features between the input character pattern and the representative characters of the recognition targets stored in a structure identification dictionary: the number of contours making up a character, the number of line segments obtained by dividing each contour into the concave and convex parts of the figure, line segment shape features, inter-contour features, and inter-segment features. Another example is described in Japanese Patent Laid-Open Publication No. Hei 8-77293. That device matches against a dictionary using as features the positions of holes, extracted from the containment relations of the bounding rectangles of the contours making up a character, together with the positions of the end points of the character pattern, the curvature, and the stroke direction. However, these prior-art techniques are prone to rejection and misreading when the input character pattern contains breaks. In Publication No. Hei 8-77293, when part of a hole in a character pattern is judged to be cut, based on the distance between end points and the like, two sets of features are used for matching against the dictionary patterns: the features obtained by connecting the cut and the features before connection. This does not, however, solve all of the break problems that can arise in places other than holes.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION: The conventional character recognition devices and methods described above have the problem that rejection and misreading occur easily when the input character pattern contains breaks. One publication does use two sets of features for matching against dictionary patterns when part of a hole in a character pattern is judged to be cut, based on the distance between end points and the like: the features obtained by connecting the cut and the features before connection. However, the various breaks that can occur in places other than holes remain unresolved.

[0005] An object of the present invention is to eliminate these conventional drawbacks by providing a character recognition device and a character recognition method that are robust against breaks in character strokes.

[0006]

MEANS FOR SOLVING THE PROBLEMS: The character recognition device of the present invention comprises: a character pattern input unit that receives a character pattern sampled on a grid and quantized to binary values and stores it as an input character pattern; a contour information extraction unit that approximates the contour of the input character pattern stored in the character pattern input unit with a polygonal line and extracts contour information including the directed line segments that are the individual segments of that polygonal line; a dictionary that stores in advance, for each character category, the contour information including the directed line segments of a representative character pattern, together with the character code corresponding to the character category; a structure matching unit that collates, for each character category and in units of the directed line segments included in the contour information, the contour information of the input character pattern extracted by the contour information extraction unit with the contour information of the representative character pattern stored in the dictionary, determines the distance between the input character pattern and the representative character pattern, and outputs it as a distance value; and a character recognition result output unit that determines the most probable character category of the input character pattern from the distance values output by the structure matching unit for each character category.
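The final decision made by the character recognition result output unit can be illustrated with a minimal sketch. It assumes that a smaller distance value means a more probable category and that the per-category distance values arrive as a mapping from character code to distance; the function name and the sample values are hypothetical and do not come from the specification.

```python
from typing import Dict


def most_probable_category(distances: Dict[str, float]) -> str:
    """Return the character code with the smallest distance value.

    Assumption: the most probable category is the one whose representative
    pattern is closest to the input character pattern.
    """
    if not distances:
        raise ValueError("no category distances supplied")
    return min(distances, key=distances.get)


if __name__ == "__main__":
    # Hypothetical distance values output by the structure matching unit.
    sample = {"シ": 12.5, "ン": 9.0, "ツ": 20.1}
    print(most_probable_category(sample))
```

A real device would probably also reject the input when even the best distance is too large; that policy is not described in this passage and is left out of the sketch.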

[0007] The contour information extraction unit of the character recognition device of the present invention comprises: a contour tracing unit that traces the contour of the input character pattern stored in the character pattern input unit counterclockwise or clockwise and extracts a coordinate value sequence of the contour; a primitive segment generation unit that approximates the contour of the input character pattern with a polygonal line based on the coordinate value sequence extracted by the contour tracing unit and extracts each directed line segment of that polygonal line as a primitive segment; and a normalization processing unit that converts each primitive segment extracted by the primitive segment generation unit so that the width and height of the input character pattern each become a predetermined fixed size, and outputs contour information including the converted primitive segments to the structure matching unit. The contour information extraction unit further comprises: a concave/convex segment generation unit that generates, from the primitive segments of the input character pattern normalized by the normalization processing unit, convex segments, each indicating a set of primitive segments that forms a convex part of the input character pattern, and concave segments, each indicating a set of primitive segments that forms a concave part, that is, a part other than the convex segments; a corner division processing unit that checks whether the concave and convex segments generated by the concave/convex segment generation unit contain a corner within a predetermined angle, divides any segment having such a corner at the corner position, and generates the divided segments together with the concave and convex segments that have no corner as contour segments; and a standard contour segment generation unit that, using the contour segments generated by the corner division processing unit and the coordinate value sequence of the contour extracted by the contour tracing unit, samples the coordinate value sequence corresponding to each contour segment at a predetermined interval to generate a new coordinate value sequence as a standard contour segment, and outputs the standard contour segment as part of the contour information, together with the character code corresponding to the input character pattern, to be stored in the dictionary in advance.
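The sampling performed by the standard contour segment generation unit can be sketched as follows. This is only an illustration under stated assumptions: "at a predetermined interval" is read as skipping `skip` points between kept points, and the last point of the contour segment is always kept so the sampled sequence still spans the whole segment. The function name and both of these choices are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def sample_contour(points: List[Point], skip: int) -> List[Point]:
    """Keep every (skip + 1)-th point of a contour coordinate sequence.

    Assumption: the final point is appended when the stride does not land
    on it, so the sampled segment ends where the original one ends.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    if not points:
        return []
    step = skip + 1
    sampled = points[::step]
    if (len(points) - 1) % step != 0:
        sampled.append(points[-1])
    return sampled


if __name__ == "__main__":
    # Hypothetical coordinate sequence of one contour segment.
    contour = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3)]
    print(sample_contour(contour, skip=1))
```

Consecutive points of the sampled sequence can then be treated as the directed line segments that make up the standard contour segment.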

[0008] The structure matching unit of the character recognition device of the present invention comprises: a candidate segment detection unit that performs a first distance calculation between the i-th primitive segment Vi of the input character pattern obtained from the contour feature extraction unit (where i runs from 1 to the number of primitive segments in the input character pattern) and all standard contour segments included in an arbitrary character pattern corresponding to an arbitrary character code in the dictionary, and that, when the distance value is within a predetermined threshold, outputs the character code of that character pattern together with the standard contour segments of that pattern that fell within the threshold, and, when the calculated value is not within the threshold, gives notice that the character pattern contains no standard contour segment corresponding to the primitive segment Vi; an inter-segment distance calculation unit that performs a second distance calculation between the primitive segment Vi and each standard contour segment of the one or more character codes output by the candidate segment detection unit, and outputs the respective distance values; a best segment detection unit that detects, among the distance values output by the inter-segment distance calculation unit, the standard contour segment with the smallest distance value to the primitive segment Vi and outputs that distance value; and a character pattern distance calculation unit that aggregates the distance values output by the best segment detection unit to obtain the distance value between the input character pattern and the character pattern in the dictionary.
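The best-segment selection and the aggregation into a character-level distance might look like the sketch below. The passage does not define the second distance calculation or the aggregation rule, so both are assumptions here: the second distance is passed in as a caller-supplied function, the aggregation is a plain sum, and a primitive segment with no candidate contributes a fixed penalty. All names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

P = TypeVar("P")  # type of a primitive segment
R = TypeVar("R")  # type of a standard contour segment


def best_segment_distance(
    primitive: P,
    candidates: Sequence[R],
    second_distance: Callable[[P, R], float],
) -> Optional[float]:
    """Smallest second-distance value over the candidates, or None if none exist."""
    if not candidates:
        return None
    return min(second_distance(primitive, rs) for rs in candidates)


def pattern_distance(
    primitives: Sequence[P],
    candidates_per_primitive: Sequence[Sequence[R]],
    second_distance: Callable[[P, R], float],
    unmatched_penalty: float,
) -> float:
    """Aggregate per-primitive best distances into one character-level distance.

    Assumptions: aggregation is a sum; an unmatched primitive segment adds
    `unmatched_penalty` (the specification only says a notice is given).
    """
    total = 0.0
    for primitive, candidates in zip(primitives, candidates_per_primitive):
        best = best_segment_distance(primitive, candidates, second_distance)
        total += unmatched_penalty if best is None else best
    return total


if __name__ == "__main__":
    # Numbers stand in for segments; absolute difference stands in for the
    # second distance, purely for illustration.
    prims: List[float] = [1.0, 5.0, 9.0]
    cands: List[List[float]] = [[0.0, 2.0], [5.5], []]
    print(pattern_distance(prims, cands, lambda v, rs: abs(v - rs), 10.0))
```

Because each primitive segment is matched on its own, a stroke break that removes or splits a few segments changes only a few terms of the total, which is one plausible reading of why segment-level matching tolerates breaks.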

[0009] The character recognition method of the present invention includes: a first step of receiving a character pattern sampled on a grid and quantized to binary values and storing it as an input character pattern; a second step of approximating the contour of the stored input character pattern with a polygonal line and extracting contour information including the directed line segments that are the individual segments of that polygonal line; a third step of collating, for each character category and in units of the directed line segments included in the contour information, the contour information including the directed line segments of the representative character pattern of each character category, stored in advance together with the character code corresponding to the character category, with the contour information of the input character pattern extracted in the second step, and determining the distance between the input character pattern and the representative character pattern; and a fourth step of determining the most probable character category of the input character pattern from the distances thus obtained.

[0010] The second step of the character recognition method of the present invention includes: a step of tracing the contour of the input character pattern stored in the first step counterclockwise or clockwise and extracting a coordinate value sequence of the contour; a step of approximating the contour of the input character pattern with a polygonal line based on the extracted coordinate value sequence and extracting each directed line segment of that polygonal line as a primitive segment; and a step of converting each extracted primitive segment so that the width and height of the input character pattern each become a predetermined fixed size, and outputting contour information including the converted primitive segments.

[0011] The recording medium of the present invention records a character recognition program for causing a computer to execute: a first process of receiving a character pattern sampled on a grid and quantized to binary values and storing it as an input character pattern; a second process of approximating the contour of the stored input character pattern with a polygonal line and extracting contour information including the directed line segments that are the individual segments of that polygonal line; a third process of collating, for each character category and in units of the directed line segments included in the contour information, the contour information including the directed line segments of the representative character pattern of each character category, stored in advance together with the character code corresponding to the character category, with the contour information of the input character pattern extracted in the second process, and determining the distance between the input character pattern and the representative character pattern; and a fourth process of determining the most probable character category of the input character pattern from the distances thus obtained.

[0012]

EMBODIMENTS OF THE INVENTION: Next, embodiments of the present invention will be described with reference to the drawings.

[0013] FIG. 1 is a block diagram showing one embodiment of the character recognition device of the present invention.

[0014] The embodiment shown in FIG. 1 comprises: a character pattern input unit 1 that receives a character pattern sampled on a grid and quantized to binary values and stores it as an input character pattern; a contour information extraction unit 2 that approximates the contour of the input character pattern stored in the character pattern input unit 1 with a polygonal line and extracts contour information including the directed line segments that are the individual segments of that polygonal line; a dictionary 3 that stores in advance, for each character category, the contour information including the directed line segments of a representative character pattern, together with the character code corresponding to the character category; a structure matching unit 4 that collates, for each character category and in units of the directed line segments included in the contour information, the contour information of the input character pattern extracted by the contour information extraction unit 2 with the contour information of the representative character pattern stored in the dictionary 3, determines the distance between the input character pattern and the representative character pattern, and outputs it as a distance value; and a character recognition result output unit 5 that determines the most probable character category of the input character pattern from the distance values output by the structure matching unit 4 for each character category.

[0015] FIG. 1 also shows a recording medium 6 that stores the character recognition program used by the character recognition device. The character recognition device reads the character recognition program from the recording medium 6 and performs character recognition by executing it.

[0016] Next, the operation of the character recognition device of this embodiment will be described in detail with reference to FIGS. 2 to 12.

[0017] FIG. 2 is a diagram showing an example of the operation of the embodiment of the present invention, and thus an example of the character recognition method of the present invention.

[0018] FIG. 3 is a block diagram showing one embodiment of the contour information extraction unit. The contour information extraction unit 2 comprises: a contour tracing unit 7 that traces the contour of the input character pattern stored in the character pattern input unit 1 counterclockwise or clockwise and extracts a coordinate value sequence of the contour; a primitive segment generation unit 8 that approximates the contour of the input character pattern with a polygonal line based on the coordinate value sequence extracted by the contour tracing unit 7 and extracts each directed line segment of that polygonal line as a primitive segment; a normalization processing unit 9 that converts each primitive segment extracted by the primitive segment generation unit 8 so that the width and height of the input character pattern each become a predetermined fixed size (for example, 40 x 40 pixels), and outputs contour information including the converted primitive segments to the structure matching unit 4; and a center position calculation unit 10 that obtains the midpoint coordinates of each primitive segment of the input character pattern converted by the normalization processing unit 9 and outputs them to the structure matching unit 4 as part of the contour information. The contour information extraction unit 2 further comprises: a concave/convex segment generation unit 11 that generates, from the primitive segments of the input character pattern normalized by the normalization processing unit 9, convex segments, each indicating a set of primitive segments that forms a convex part of the input character pattern, and concave segments, each indicating a set of primitive segments that forms a concave part, that is, a part other than the convex segments; a corner division processing unit 12 that checks whether the concave and convex segments generated by the concave/convex segment generation unit 11 contain a corner within a predetermined angle (for example, 90 degrees), divides any segment having such a corner at the corner position, and generates the divided segments together with the concave and convex segments that have no corner as contour segments; and a standard contour segment generation unit 13 that, using the contour segments generated by the corner division processing unit 12 and the coordinate value sequence of the contour extracted by the contour tracing unit 7, samples the coordinate value sequence corresponding to each contour segment at a predetermined interval (for example, every 1 to 3 points) to generate a new coordinate value sequence as a standard contour segment, and outputs the standard contour segment as part of the contour information, together with the character code corresponding to the input character pattern, to be stored in the dictionary 3 in advance.
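The work of the normalization processing unit 9 and the center position calculation unit 10 can be sketched as below. The sketch assumes that the width and height of the input character pattern are taken from the bounding box of the segment end points, that the two axes are scaled independently onto the range 0 to `size`, and that a degenerate (zero) width or height is replaced by 1 to avoid division by zero. None of these details are stated in the specification, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # (start, end) of a directed line segment


def normalize_segments(segments: List[Segment], size: float = 40.0) -> List[Segment]:
    """Scale primitive segments so the pattern's bounding box becomes size x size."""
    if not segments:
        return []
    xs = [p[0] for seg in segments for p in seg]
    ys = [p[1] for seg in segments for p in seg]
    min_x, min_y = min(xs), min(ys)
    width = (max(xs) - min_x) or 1.0   # guard against a zero-width pattern
    height = (max(ys) - min_y) or 1.0  # guard against a zero-height pattern

    def convert(p: Point) -> Point:
        return ((p[0] - min_x) * size / width, (p[1] - min_y) * size / height)

    return [(convert(start), convert(end)) for start, end in segments]


def midpoint(segment: Segment) -> Point:
    """Midpoint coordinates of one directed line segment."""
    (x0, y0), (x1, y1) = segment
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


if __name__ == "__main__":
    # Two hypothetical primitive segments of a 20 x 40 pixel pattern.
    prims = [((10.0, 20.0), (30.0, 20.0)), ((30.0, 20.0), (30.0, 60.0))]
    for seg in normalize_segments(prims):
        print(seg, midpoint(seg))
```

Normalizing the segments rather than the bitmap keeps the direction of each directed line segment intact while removing size variation between writers.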

[0019] FIG. 4 is a block diagram showing one embodiment of the structure matching unit. The structure matching unit 4 comprises: a candidate segment detection unit 14 that performs a first distance calculation (the distance between the midpoint of the primitive segment and the end points of all directed line segments making up a standard contour segment) between the i-th primitive segment Vi of the input character pattern obtained from the contour feature extraction unit (where i runs from 1 to the number of primitive segments in the input character pattern) and all standard contour segments {RSk} included in an arbitrary character pattern corresponding to an arbitrary character code in the dictionary 3, and that, when the distance value is within a predetermined threshold, outputs to the inter-segment distance calculation unit 15 the character code of that character pattern together with the standard contour segments RSk of that pattern that fell within the threshold, and, when the calculated value is not within the threshold, notifies the character pattern distance calculation unit 17 that the character pattern contains no standard contour segment RSk corresponding to the primitive segment Vi; an inter-segment distance calculation unit 15 that performs a second distance calculation between the primitive segment Vi and each standard contour segment RSk of the one or more character codes output by the candidate segment detection unit 14, and outputs the respective distance values; a best segment detection unit 16 that detects, among the distance values output by the inter-segment distance calculation unit 15, the standard contour segment RSk with the smallest distance value to the primitive segment Vi and outputs that distance value; and a character pattern distance calculation unit 17 that aggregates the distance values output by the best segment detection unit 16 to obtain the distance value between the input character pattern and the character pattern in the dictionary 3.
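The screening done by the candidate segment detection unit 14 can be sketched from the definition of the first distance given above. Several points are assumptions: a standard contour segment is represented as its sampled coordinate sequence, the end points of its directed line segments are taken to be every point after the first, and the first distance is taken as the minimum over those end points. The dictionary layout and all names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
StandardSegment = List[Point]  # consecutive points form directed line segments


def first_distance(mid: Point, rs: StandardSegment) -> float:
    """Smallest distance from a primitive segment midpoint to the end points of rs."""
    end_points = rs[1:]  # end point of each directed line segment
    return min(math.dist(mid, p) for p in end_points)


def detect_candidates(
    mid: Point,
    dictionary: Dict[str, List[StandardSegment]],
    threshold: float,
) -> Dict[str, List[int]]:
    """Map each character code to the indices k of segments RSk within the threshold.

    A character code missing from the result corresponds to the notice that
    the pattern has no standard contour segment matching this primitive segment.
    """
    candidates: Dict[str, List[int]] = {}
    for code, segments in dictionary.items():
        hits = [
            k for k, rs in enumerate(segments)
            if len(rs) >= 2 and first_distance(mid, rs) <= threshold
        ]
        if hits:
            candidates[code] = hits
    return candidates


if __name__ == "__main__":
    # Hypothetical dictionary with two character codes.
    dictionary = {
        "A": [[(0.0, 0.0), (10.0, 0.0)], [(30.0, 30.0), (40.0, 40.0)]],
        "B": [[(100.0, 100.0), (110.0, 100.0)]],
    }
    print(detect_candidates((9.0, 1.0), dictionary, threshold=5.0))
```

This cheap positional test limits the more detailed second distance calculation to standard contour segments that lie near the primitive segment.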

[0020] FIG. 5 is a diagram showing an example of an input character pattern.

[0021] FIG. 6 is a diagram showing an example of primitive segment extraction.

[0022] FIG. 7 is a diagram showing an example of concave/convex segment generation.

[0023] FIG. 8 is a diagram showing an example of standard contour segment generation.

[0024] FIG. 9 is a diagram showing an example of contour tracing.

[0025] FIG. 10 is a diagram showing an example of primitive segment generation.

[0026] FIG. 11 is a diagram showing an example of the distance calculation between a primitive segment and a standard contour segment.

[0027] FIG. 12 is a flowchart showing an example of the operation for creating the dictionary.

[0028] The operation of FIG. 1 is described with reference to FIG. 2. The character pattern input unit 1 receives a character pattern sampled on a grid and quantized to binary values and stores it as an input character pattern (S21). The contour information extraction unit 2 approximates the contour of the stored input character pattern with a polygonal line and extracts contour information including the directed line segments that are the individual segments of that polygonal line (S22). The structure matching unit 4 collates, for each character category and in units of the directed line segments included in the contour information, the contour information including the directed line segments of the representative character pattern of each character category, stored in advance together with the character code corresponding to the character category, with the contour information of the input character pattern extracted in the second step (S22), and determines the distance between the input character pattern and the representative character pattern (S23). The character recognition result output unit 5 determines and outputs the most probable character category of the input character pattern from the distances obtained in the third step (S24). Here, the second step (S22) traces the contour of the input character pattern stored in the first step (S21) counterclockwise or clockwise to extract a coordinate value sequence of the contour, approximates the contour of the input character pattern with a polygonal line based on the extracted coordinate value sequence, extracts each directed line segment of that polygonal line as a primitive segment, converts each extracted primitive segment so that the width and height of the input character pattern each become a predetermined fixed size, and outputs contour information including the converted primitive segments.

[0029] In more detail, the character pattern input unit 1 of FIG. 1 includes an input interface that receives a character pattern sampled on a grid and quantized to binary values (the input character pattern) and a memory that stores the input character pattern. Here, the input character pattern is a partial image cut out character by character from a digital image and quantized to binary values, the digital image being obtained by photoelectrically converting a document with an image scanner or the like, sampled on a grid and quantized to two or more levels. In the input character pattern, background pixels have the value 0 and character pixels have the value 1; a pixel of value 0 is called a white pixel and a pixel of value 1 a black pixel. This embodiment is described for digitized character patterns with background pixel value 0 and character pixel value 1, as stated above. Of course, even for an inverted image in which background pixels are 1 and character pixels are 0, a device equivalent to this embodiment can be realized by swapping the decisions for 1 and 0 in the pixel-value decision processing.

[0030] The contour information extraction unit 2 has the configuration described with reference to FIG. 3. The contour tracking unit 7 detects, by contour tracing, the contour pixels, that is, those pixels of the figure that touch the background. Among the 8-neighbor pixels of the pixel of interest on the figure's contour, it searches counterclockwise (or clockwise) for a pixel on the contour adjacent to the pixel of interest (a contour pixel), moves the pixel of interest to the detected pixel, and repeats this process until all contour pixels have been detected. FIG. 9 shows this process. In FIG. 9, hatched pixels are already-traced pixels, the pixel marked with × is the pixel of interest, and the arc arrow shows the order in which the neighboring pixels are searched for the next contour pixel. The coordinate values of the detected contour pixels are stored in order of detection; this is the coordinate value sequence of the contour pixels. Details of contour tracing are also given, for example, on pages 353 to 360 of "Digital Image Processing" (Kindai Kagaku-sha).
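The neighbor search described above can be sketched roughly as follows. This is a minimal Moore-neighbor trace written for illustration, not the routine of the cited textbook: the names `NEIGHBOURS` and `trace_contour` are hypothetical, the image is assumed to be a list of rows of 0/1 values, and the trace simply stops the first time it returns to the start pixel, which is adequate for simple blobs but can end early on figures that pass through the start pixel more than once.

```python
# 8-neighbour offsets (dx, dy), listed clockwise on screen (y grows downward),
# starting from the west neighbour.
NEIGHBOURS = [(-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1)]


def trace_contour(image):
    """Return the contour pixels (x, y) of the first figure met in raster order."""
    h, w = len(image), len(image[0])

    def black(x, y):
        return 0 <= x < w and 0 <= y < h and image[y][x] == 1

    start = next(((x, y) for y in range(h) for x in range(w) if image[y][x] == 1), None)
    if start is None:
        return []
    contour = [start]
    # `back` indexes a neighbour of `cur` known to be white; for the raster-scan
    # start pixel that is always the west neighbour (index 0).
    cur, back = start, 0
    while True:
        found = None
        for step in range(1, 9):
            k = (back + step) % 8
            cand = (cur[0] + NEIGHBOURS[k][0], cur[1] + NEIGHBOURS[k][1])
            if black(*cand):
                found = (k, cand)
                break
        if found is None:          # isolated single pixel
            return contour
        k, nxt = found
        if nxt == start:           # simple stop rule (assumption, see lead-in)
            return contour
        # The neighbour examined just before `nxt` is white; express it
        # relative to `nxt` so the next search starts from there.
        pk = (k - 1) % 8
        white = (cur[0] + NEIGHBOURS[pk][0], cur[1] + NEIGHBOURS[pk][1])
        back = NEIGHBOURS.index((white[0] - nxt[0], white[1] - nxt[1]))
        contour.append(nxt)
        cur = nxt
```

For a filled 3 × 3 square the sketch visits the eight border pixels once each and never the interior pixel, in the order in which they are met.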

[0031] The primitive segment generation unit 8 approximates the coordinate value sequence of contour pixels output from the contour tracking unit 7 with directed line segments and outputs them. The direction of each directed line segment follows the order of the coordinate value sequence. Various approximation algorithms can be used; this embodiment uses the method shown in FIG. 10, which splits the contour repeatedly until the distance dn between the contour and the line segment is at or below a predetermined threshold. This method is introduced, for example, on pages 209 to 210 of "Image Processing, Analysis and Machine Vision" by Milan Sonka, Vaclav Hlavac and Roger Boyle (Chapman & Hall). As an example, suppose the coordinate value sequence Pi(xi, yi) of contour pixels obtained by contour tracing is P1(1, 1), P2(2, 1), P3(3, 1), P4(4, 2), P5(5, 3), P6(6, 4), P7(5, 4), P8(4, 4), P9(3, 4), P10(2, 4). Line approximation then proceeds by the following steps to generate the primitive segments, with the threshold Th set to 1.
Step 1: Create the directed line segment L(1, 10) joining the start point and the end point, that is, P1 and P10. The direction from P1 to P10 is taken as positive.
Step 2: Measure the distance between the directed line segment L(1, 10) and every contour pixel from P1 to P10 in the coordinate value sequence, and find the pixel (point) at maximum distance and that distance. In this case the split point is P6, and the distance d1 between the line segment L(1, 10) and P6 is d1 = SQRT{L(1, 6)·L(1, 6) − L(1, 6)·L(1, 10) / (L(1, 10)·L(1, 10))} ≈ 4.123. Here '·' denotes the inner product, L(1, 6) is the directed line segment from pixel P1 to pixel P6, and SQRT{} denotes the square root.
Step 3: Since d1 > Th, split at pixel P6 to form the coordinate value sequences P1 to P6 and P6 to P10.
Step 4: For the two split coordinate value sequences P1 to P6 and P6 to P10, find the distances to the directed line segments L(1, 6) and L(6, 10), respectively, in the same way as in Step 2. The result is a distance d2 ≈ 1.925 between P1 to P6 and L(1, 6) (the pixel at maximum distance being P3), and a distance d3 = 0 between P6 to P10 and L(6, 10).
Step 5: Since d2 > Th, split the coordinate value sequence P1 to P6 at P3.
Step 6: Since the maximum distance between each split coordinate value sequence (segment) and its approximating line segment is now 1 or less, splitting ends. The generated primitive segments are L(1, 3), L(3, 6) and L(6, 10).
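The split procedure of Steps 1 to 6 can be sketched as a short recursive routine. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function names are hypothetical, and the distance used is the ordinary perpendicular point-to-line distance. With that distance the intermediate values come out near 3.79 and 1.03 rather than the figures quoted in the text, but the split points P6 and P3 and the three resulting segments are the same.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def split_indices(points, first, last, threshold):
    """Indices strictly between first and last at which the contour is split."""
    best_d, best_i = 0.0, None
    for i in range(first + 1, last):
        d = point_line_distance(points[i], points[first], points[last])
        if d > best_d:
            best_d, best_i = d, i
    if best_i is None or best_d <= threshold:
        return []
    return (split_indices(points, first, best_i, threshold)
            + [best_i]
            + split_indices(points, best_i, last, threshold))


def primitive_segments(points, threshold=1.0):
    """Approximate an ordered contour by directed segments (start, end)."""
    idx = [0] + split_indices(points, 0, len(points) - 1, threshold) + [len(points) - 1]
    return [(points[i], points[j]) for i, j in zip(idx, idx[1:])]


# Contour P1..P10 from the worked example in the text.
CONTOUR = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 3),
           (6, 4), (5, 4), (4, 4), (3, 4), (2, 4)]
SEGMENTS = primitive_segments(CONTOUR, threshold=1.0)
```

Each returned pair runs in contour order, so the direction of every directed line segment follows the coordinate value sequence as the text requires.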

[0032] The normalization processing unit 9 takes the primitive segments output from the primitive segment generation unit 8 as input and scales the contour coordinate value sequence and the primitive segments so that the width and height of the character pattern each become a predetermined fixed value. For example, if the character pattern has width W and height H and the predetermined fixed value for both width and height is F, all x-coordinates of the contour coordinate value sequence and of the start and end points of the primitive segments are multiplied by W/F, and all y-coordinates by H/F. This operation is called normalization.
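As a rough illustration of the scaling step, the sketch below rescales directed segments to a fixed size. Two points are assumptions rather than statements from the source: coordinates are taken to be measured from the corner of the pattern's bounding box, and the factors used are F/W and F/H, since those are the factors that actually map a W × H pattern onto F × F (the text itself states the ratios as W/F and H/F). The function name is hypothetical.

```python
def normalize_segments(segments, width, height, size):
    """Scale directed segments so a width x height pattern becomes size x size.

    Assumption: the scale factors are size/width and size/height.
    """
    sx = size / width
    sy = size / height
    return [((x0 * sx, y0 * sy), (x1 * sx, y1 * sy))
            for (x0, y0), (x1, y1) in segments]
```

Applied to a 20 × 10 pattern with F = 40, a segment from (0, 0) to (20, 10) becomes one from (0, 0) to (40, 40).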

[0033] The concave/convex segment generation unit 11 is as described with reference to FIG. 3. For example, from the character pattern shown in FIGS. 5 and 6, the four contour segments S1, S2, S3 and S4 shown in FIG. 7 are generated.

[0034] When a concave segment or convex segment output from the concave/convex segment generation unit 11 contains a corner, the corner division processing unit 12 splits that segment at the corner position. The resulting segments are the contour segments. For example, in FIG. 7 the concave segment S3 is split at point P; that is, the contour segments are the concave/convex segments of FIG. 7 after splitting at point P.

[0035] The standard contour segment generation unit 13 takes as input the contour segments output from the corner division processing unit 12 and the contour coordinate value sequence output from the contour tracking unit 7. After finding the contour coordinate value sequence corresponding to each contour segment, it normalizes that sequence and then thins the coordinate values at intervals of, for example, about 1 to 3 pixels to generate a new coordinate value sequence. This sequence, given the direction of its ordering and thereby turned into a sequence of directed line segments, is output as a standard contour segment. For example, from the contour segments of FIG. 7, the five standard contour segments RS1, RS2, RS3, RS4 and RS5 shown in FIG. 8 are output.
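The thinning step can be sketched as below. The function name and the exact thinning rule are assumptions made for illustration: a point is kept once the path length travelled since the last kept point reaches `spacing`, and the final point is always kept so the segment ends where the contour run ends.

```python
import math


def standard_contour_segment(coords, spacing=2.0):
    """Thin an ordered contour run and return it as directed segments."""
    kept = [coords[0]]
    travelled = 0.0
    for prev, cur in zip(coords, coords[1:]):
        travelled += math.dist(prev, cur)
        if travelled >= spacing:
            kept.append(cur)
            travelled = 0.0
    if kept[-1] != coords[-1]:
        kept.append(coords[-1])   # keep the end of the run
    # Consecutive kept points, in contour order, form the directed segments.
    return list(zip(kept, kept[1:]))
```

A spacing of 2 sits inside the range of about 1 to 3 pixels mentioned in the text; a smaller spacing yields more, shorter directed line segments per standard contour segment.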

[0036] The center position calculation unit 10 calculates and outputs the center coordinates of each normalized primitive segment (directed line segment) output from the normalization processing unit 9.

[0037] The dictionary 3 stores, for each character category, the standard contour segments output from the contour information extraction unit 2 together with the character code.

[0038] The structure matching unit 4 has the configuration described with reference to FIG. 4. In collating the input character pattern with a character pattern corresponding to a character code stored in the dictionary 3, the candidate segment detection unit 14 calculates distances between the i-th primitive segment Vi of the input character pattern and every standard contour segment {RSk} of that character pattern. The distance is calculated one-to-one between the midpoint of the primitive segment Vi and the end point of every directed line segment constituting the standard contour segment RSk. When the distance is within a threshold, the unit notifies the inter-segment distance calculation unit 15 of the character code of that character pattern and of the standard contour segment RSk that fell within the threshold. The threshold depends on the normalized size of the character pattern (the value F in the normalization processing unit 9); for a size of about 40 × 40 pixels, for example, it is about 10 pixels. When the distance is not within the threshold, the unit notifies the inter-character-pattern distance calculation unit 17 that the character pattern contains no standard contour segment RSk corresponding to the primitive segment Vi. This operation is performed for all character codes in the dictionary 3.
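A minimal sketch of this first, coarse distance test is given below. The names are hypothetical, and one detail is an assumption: a standard contour segment is treated as a candidate as soon as any one of its directed-segment end points lies within the threshold of the midpoint of Vi, since the text does not say whether one or all end points must qualify.

```python
import math


def midpoint(segment):
    """Center of a directed segment ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = segment
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def candidate_segments(primitive, standard_segments, threshold=10.0):
    """Indices k of the standard contour segments RSk that are candidates for Vi.

    Each RSk is a list of directed segments ((xs, ys), (xe, ye)).
    """
    m = midpoint(primitive)
    hits = []
    for k, rs in enumerate(standard_segments):
        if any(math.dist(m, end) <= threshold for (_start, end) in rs):
            hits.append(k)
    return hits
```

An empty result corresponds to the case in which the candidate segment detection unit reports that no standard contour segment corresponds to Vi.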

[0039] The inter-segment distance calculation unit 15 performs a second distance calculation between the primitive segment Vi and each standard contour segment RSk of the one or more character codes notified by the candidate segment detection unit 14. This calculation is explained with reference to FIG. 11. In FIG. 11, V9 is a primitive segment and RS2 is a standard contour segment consisting of the directed line segments u21, u22, u23, u24, u25, u26, u27, u28 and u29. Let d21, d22, d23, d24, d25, d26 and d27 be the distances from the end point (or start point) of each directed line segment constituting RS2 to the point where the perpendicular dropped from it meets the primitive segment V9. Where the perpendicular meets the line outside the extent of the primitive segment V9, no distance is calculated. The distance D(9, 2) between the primitive segment V9 and the standard contour segment RS2 is then
D(9, 2) = (1/N){d21 + d22 + d23 + d24 + d25 + d26 + d27 + α(|Δθ21| + |Δθ22| + |Δθ23| + |Δθ24| + |Δθ25| + |Δθ26| + |Δθ27|)} (Equation 1)
where N = 7. That is, N is the number of directed line segments of RS2 whose perpendicular meets V9 within its extent. Δθ2j is the angle between the directed line segment u2j and the primitive vector V9, and α is an experimentally determined coupling coefficient. After the distance calculation, the directed line segments of the standard contour segment RSk for which a distance to the primitive segment could be obtained are given a matched mark and written back to the dictionary 3. The distance value D(i, k) between the primitive segment Vi of the input character pattern and the standard contour segment RSk of a given character code is calculated and output. This operation is performed for all character codes in the dictionary 3.
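Equation 1 can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: the function name is hypothetical, the end point of each directed line segment is used as the reference point, angles are measured in radians and wrapped to the range 0 to π, the primitive segment is assumed to have non-zero length, and the default α = 1.0 is a placeholder because the text only says that α is determined experimentally.

```python
import math


def segment_distance(primitive, standard_segment, alpha=1.0):
    """Return (D, matched), where matched lists the directed segments used.

    D is the mean over usable directed segments of
    (perpendicular distance to the primitive + alpha * |angle difference|).
    Returns (None, []) when no perpendicular foot falls inside the primitive.
    """
    (ax, ay), (bx, by) = primitive
    vx, vy = bx - ax, by - ay
    vlen2 = vx * vx + vy * vy
    v_angle = math.atan2(vy, vx)
    total, matched = 0.0, []
    for j, ((sx, sy), (ex, ey)) in enumerate(standard_segment):
        # Parameter of the perpendicular foot along the primitive (0..1 = inside).
        t = ((ex - ax) * vx + (ey - ay) * vy) / vlen2
        if not 0.0 <= t <= 1.0:
            continue   # foot lies outside the primitive: no distance is taken
        d = abs(vx * (ey - ay) - vy * (ex - ax)) / math.sqrt(vlen2)
        diff = math.atan2(ey - sy, ex - sx) - v_angle
        dtheta = abs(math.atan2(math.sin(diff), math.cos(diff)))
        total += d + alpha * dtheta
        matched.append(j)
    if not matched:
        return None, []
    return total / len(matched), matched
```

The returned index list plays the role of the matched marks: it records which directed line segments of RSk contributed a distance, and its length is the N of Equation 1.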

[0040] Once the inter-segment distance calculation unit 15 has finished calculating distance values for all the standard contour segments RSk notified by the candidate segment detection unit 14, and the best segment detection unit 16 has received the distance values D(i, k) from it, the best segment detection unit 16 detects the standard contour segment RSkmin with the minimum distance among the received values D(i, k) and outputs that distance value. After detection, all matched marks that were set (by the inter-segment distance calculation unit 15) on the directed line segments of the standard contour segments RSk other than RSkmin (k ≠ kmin) are cleared.

[0041] The inter-character-pattern distance calculation unit 17 accumulates the distance values output from the best segment detection unit 16 to obtain the inter-character-pattern distance. However, when the candidate segment detection unit 14 finds no candidate at all, a value proportional to the length of the primitive segment Vi is added to the penalty evaluation value B. The penalty evaluation value B evaluates the length of the segments that failed to match in the matching between the input character pattern and the standard contour segments of a character code in the dictionary 3. If the total length of the mismatched segments is at or above a threshold, the character recognition result output unit 5 rejects the recognition result. Mismatched segments are of two kinds, those in the input character pattern and those among the standard contour segments of the character code in the dictionary 3, and the sum of both is the penalty evaluation value. A primitive segment of the input character pattern for which the candidate segment detection unit 14 found no corresponding standard contour segment of the character code (because the distance was too large) is a mismatched primitive segment. A standard contour segment of the character code that corresponded to none of the primitive segments Vi of the input character pattern, or that corresponded over only a very small part, is a mismatched standard contour segment. The value proportional to the length of a primitive segment is, for example, its relative length when the total length of the primitive segments of the input character pattern is normalized to 1. Likewise, when a standard contour segment of the character pattern in the dictionary 3 has no matched marks, or has matched marks on fewer than G% of its directed line segments (G is a predetermined constant, for example 30%), a value proportional to the length of that standard contour segment is added to the penalty evaluation value B. This value is, for example, the relative length when the total length of the directed line segments constituting all the contour segments of the character pattern is normalized to 1.
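The penalty bookkeeping described above can be sketched as follows. The function names and the data layout are assumptions chosen for illustration: unmatched primitive segments are given by index, and the matched marks of each standard contour segment are given as a set of directed-segment indices.

```python
import math


def seg_length(segment):
    """Euclidean length of a directed segment ((x0, y0), (x1, y1))."""
    return math.dist(segment[0], segment[1])


def penalty_value(primitives, unmatched, standard_segments, marks, g_percent=30.0):
    """Penalty evaluation value B for one dictionary character pattern.

    primitives        : directed segments of the input character pattern
    unmatched         : indices of primitives for which no candidate was found
    standard_segments : list of RSk, each a list of directed segments
    marks             : marks[k] = set of matched directed-segment indices in RSk
    """
    # Input side: relative length of the unmatched primitive segments.
    total_input = sum(seg_length(s) for s in primitives)
    b = sum(seg_length(primitives[i]) for i in unmatched) / total_input

    # Dictionary side: segments with no marks, or with fewer than G% marked.
    lengths = [sum(seg_length(u) for u in rs) for rs in standard_segments]
    total_dict = sum(lengths)
    for rs, marked, length in zip(standard_segments, marks, lengths):
        if len(marked) < len(rs) * g_percent / 100.0:
            b += length / total_dict
    return b
```

Because both contributions are relative lengths normalized to 1, B can be compared directly against a fixed threshold regardless of the size of the character pattern.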

[0042] The character recognition result output unit 5 outputs the character code for which the inter-character-pattern distance output from the inter-character-pattern distance calculation unit 17 of the structure matching unit 4 is smallest. However, when the penalty evaluation value, also output from the inter-character-pattern distance calculation unit 17, is at or above a threshold, it judges the result to be rejected and outputs a code indicating rejection. The threshold for the penalty evaluation value is set to, for example, about 0.3.
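The final decision can be sketched in a few lines. The function name, the dictionary layout and the use of `None` as the rejection code are assumptions; the sketch also assumes that the penalty checked is the one belonging to the minimum-distance candidate, which the text does not state explicitly.

```python
def recognize(results, penalty_threshold=0.3, reject_code=None):
    """Pick the character code with the smallest inter-pattern distance.

    results maps character code -> (distance, penalty).
    Returns reject_code when the best candidate's penalty is too high.
    """
    code, (_distance, penalty) = min(results.items(), key=lambda kv: kv[1][0])
    if penalty >= penalty_threshold:
        return reject_code
    return code
```

With the example threshold of 0.3, a candidate whose mismatched segments make up 30% or more of the normalized length is rejected even if its distance is the smallest.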

[0043] The dictionary 3 is populated in advance as follows. Referring to FIG. 12, a representative character pattern is first input to the character pattern input unit 1. Next, as shown in FIG. 3, the contour tracking unit 7 traces the contour of this character pattern and generates the coordinate value sequence of the contour pixels (S121); the primitive segment generation unit 8 approximates the coordinate value sequence obtained in step S121 with directed line segments and takes these as the primitive segments (S122); and the normalization processing unit 9 normalizes the coordinates of the primitive segments (S123). Then the concave/convex segment generation unit 11 generates concave/convex segments from the primitive segments normalized in step S123 (S124), and the corner division processing unit 12 splits each concave/convex segment at the corners within it and takes the resulting segments as contour segments (S125). Finally, the standard contour segment generation unit 13 generates standard contour segments from the contour segments and stores them in the dictionary 3 together with the character code representing the character category (S126).

[0044] The operations described above are carried out by a computer (not shown) in the character recognition device, which reads the character recognition program recorded in advance on the recording medium 6 and executes it.

[0045]

[Effects of the Invention] As described above, according to the character recognition device and character recognition method of the present invention, the contour information extraction unit approximates the contour of the input character pattern stored in the character pattern input unit with a polyline and extracts contour information including the directed line segments that make up that polyline. The structure matching unit collates the contour information of the input character pattern extracted by the contour information extraction unit against the contour information of the representative character patterns stored in advance in the dictionary, for each character category and in units of the directed line segments included in the contour information, obtains the distance between the input character pattern and the representative character pattern, and outputs it as a distance value. The character recognition result output unit determines the most probable character category of the input character pattern from the distance values output by the structure matching unit for each character category. Because the unit of collation is the directed line segment generated by polyline approximation of the character pattern's contour, and this unit is smaller than in conventional methods, higher character recognition performance than before is obtained even for characters with breaks in their strokes and for characters that are partly crushed.

[Brief description of the drawings]

FIG. 1 is a block diagram showing one embodiment of the character recognition device of the present invention.

FIG. 2 is a diagram showing an example of the operation of the embodiment of the present invention and an example of the character recognition method of the present invention.

FIG. 3 is a block diagram showing one embodiment of the contour information extraction unit.

FIG. 4 is a block diagram showing one embodiment of the structure matching unit.

FIG. 5 is a diagram showing an example of an input character pattern.

FIG. 6 is a diagram showing an example of primitive segment extraction.

FIG. 7 is a diagram showing an example of concave/convex segment generation.

FIG. 8 is a diagram showing an example of standard contour segment generation.

FIG. 9 is a diagram showing an example of contour tracing.

FIG. 10 is a diagram showing an example of primitive segment generation.

FIG. 11 is a diagram showing an example of the distance calculation between a primitive segment and a standard contour segment.

FIG. 12 is a flowchart showing an example of the operation for creating the dictionary.

[Explanation of symbols]

1 character pattern input unit
2 contour information extraction unit
3 dictionary
4 structure matching unit
5 character recognition result output unit
6 recording medium
7 contour tracking unit
8 primitive segment generation unit
9 normalization processing unit
10 center position calculation unit
11 concave/convex segment generation unit
12 corner division processing unit
13 standard contour segment generation unit
14 candidate segment detection unit
15 inter-segment distance calculation unit
16 best segment detection unit
17 inter-character-pattern distance calculation unit

Claims

[Claims]

1. A character recognition device comprising: a character pattern input unit that receives a character pattern sampled on a grid and quantized to binary values and stores it as an input character pattern; a contour information extraction unit that approximates the contour of the input character pattern stored in the character pattern input unit with a polyline and extracts contour information including directed line segments, each being a segment of the polyline; a dictionary that stores in advance, for each character category, the contour information including the directed line segments of a representative character pattern together with a character code corresponding to the character category; a structure matching unit that collates the contour information of the input character pattern extracted by the contour information extraction unit with the contour information of the representative character patterns stored in the dictionary, for each of the character categories and in units of the directed line segments included in the contour information, obtains the distance between the input character pattern and the representative character pattern, and outputs it as a distance value; and a character recognition result output unit that determines, from the distance values output by the structure matching unit for each of the character categories, the most probable character category of the input character pattern.

2. The character recognition device according to claim 1, wherein the contour information extraction unit comprises: a contour tracking unit that traces the contour of the input character pattern stored in the character pattern input unit clockwise or counterclockwise and extracts a coordinate value sequence of the contour of the input character pattern; a primitive segment generation unit that approximates the contour of the input character pattern with a polyline from the coordinate value sequence extracted by the contour tracking unit and extracts each directed line segment of the polyline as a primitive segment; and a normalization processing unit that converts each primitive segment of the input character pattern extracted by the primitive segment generation unit so that the width and height of the input character pattern each become a predetermined fixed size, and outputs the contour information including the converted primitive segments to the structure matching unit.

3. The character recognition device according to claim 2, further comprising: a concave/convex segment generation unit that generates, from the primitive segments of the input character pattern converted and normalized by the normalization processing unit, convex segments, each being a set of the primitive segments forming a convex part of the input character pattern, and concave segments, each being a part of the input character pattern other than the convex segments; a corner division processing unit that checks whether each concave segment and convex segment generated by the concave/convex segment generation unit has a corner within a predetermined angle, splits any segment having such a corner at the corner position, and generates the split segments and the concave and convex segments without corners as contour segments; and a standard contour segment generation unit that, from the contour segments generated by the corner division processing unit and the coordinate value sequence of the contour of the input character pattern extracted by the contour tracking unit, samples the coordinate value sequence at predetermined intervals to generate a new coordinate value sequence as a standard contour segment, and stores the standard contour segment in the dictionary in advance, as part of the contour information, together with the character code corresponding to the input character pattern.

4. The character recognition device according to claim 3, wherein the structure matching unit comprises: a candidate segment detecting unit that performs a first distance calculation between the i-th primitive segment Vi included in the input character pattern obtained from the contour information extracting unit (where i = 1 to the number of primitive segments included in the input character pattern) and all the standard contour segments included in an arbitrary character pattern corresponding to an arbitrary character code in the dictionary, that outputs, when the distance value falls within a predetermined threshold, the character code corresponding to the character pattern that falls within the threshold and the standard contour segments within the threshold included in the character pattern corresponding to that character code, and that notifies, when the distance value is not within the predetermined threshold, that there is no standard contour segment corresponding to the primitive segment Vi in any of the character patterns; an inter-segment distance calculation unit that performs a second distance calculation between each of the standard contour segments of each of the one or more output character codes and the primitive segment Vi, and outputs a distance value; a best segment detecting unit that detects, among the distance values output by the inter-segment distance calculation unit, the standard contour segment corresponding to the smallest distance value from the primitive segment Vi, and outputs that distance value; and an inter-character pattern distance calculation unit that calculates a distance value between the input character pattern and the character pattern in the dictionary.
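
For illustration only, the two-stage matching of claim 4 for a single primitive segment Vi can be sketched as below. The claim does not define either distance, so the midpoint distance used for the first (candidate) calculation and the endpoint distance sum used for the second calculation are assumptions, as are the function names and the dictionary layout mapping a character code to its standard contour segments.

```python
from math import hypot

def midpoint(seg):
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def coarse_distance(a, b):
    """First distance calculation (assumed): distance between midpoints."""
    (ax, ay), (bx, by) = midpoint(a), midpoint(b)
    return hypot(ax - bx, ay - by)

def fine_distance(a, b):
    """Second distance calculation (assumed): sum of endpoint distances."""
    return sum(hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a, b))

def best_segments(primitive, dictionary, threshold):
    """Return {character code: smallest fine distance} for one segment Vi.

    Only standard segments that pass the coarse threshold test are scored.
    An empty result means no standard contour segment corresponds to Vi.
    """
    best = {}
    for code, standard_segments in dictionary.items():
        candidates = [s for s in standard_segments
                      if coarse_distance(primitive, s) <= threshold]
        if candidates:
            best[code] = min(fine_distance(primitive, s) for s in candidates)
    return best

# Usage with a hypothetical two-entry dictionary.
dictionary = {
    "A": [((0, 0), (10, 0)), ((50, 50), (60, 60))],
    "B": [((0, 1), (10, 1))],
}
result = best_segments(((0, 0), (10, 0)), dictionary, threshold=5)
```

Filtering with a cheap first distance before computing the second one limits the detailed comparison to standard contour segments that are plausible matches.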

5. A character recognition method comprising: a first step of receiving a character pattern sampled in a lattice form and quantized into binary values, and storing the character pattern as an input character pattern; a second step of approximating a contour of the stored input character pattern by a polygonal line, and extracting contour information including a directed line segment that is a polygonal line of the approximation; a third step of collating, for each character category and in units of the directed line segments included in the contour information, the contour information including directed line segments of a representative character pattern of each character category, stored in advance together with a character code corresponding to the character category, with the contour information of the input character pattern extracted in the second step, and determining a distance between the input character pattern and the representative character pattern; and a fourth step of obtaining the character category of the input character pattern.

6. The character recognition method according to claim 5, wherein the second step comprises: extracting a coordinate value sequence of the contour of the input character pattern by performing contour tracing in a clockwise or counterclockwise direction on the input character pattern stored in the first step; performing a polygonal line approximation of the contour of the input character pattern from the extracted coordinate value sequence, and extracting, as primitive segments, the directed line segments that are polygonal lines of the approximation; converting each of the extracted primitive segments of the input character pattern so that the width and height of the input character pattern each have a predetermined constant size; and outputting the contour information including the converted primitive segments.

7. A recording medium storing a character recognition program comprising: a first process of receiving a character pattern sampled in a lattice form and quantized into binary values, and storing the character pattern as an input character pattern; a second process of approximating a contour of the stored input character pattern by a polygonal line, and extracting contour information including a directed line segment that is a polygonal line of the approximation; a third process of collating, for each character category and in units of the directed line segments included in the contour information, the contour information including directed line segments of a representative character pattern of each character category, stored in advance together with a character code corresponding to the character category, with the contour information of the input character pattern extracted in the second process, and determining a distance between the input character pattern and the representative character pattern; and a fourth process of obtaining the character category of the input character pattern.

8. The recording medium storing a character recognition program according to claim 7, wherein the second process comprises: extracting a coordinate value sequence of the contour of the input character pattern by performing contour tracing in a clockwise or counterclockwise direction on the input character pattern stored in the first process; performing a polygonal line approximation of the contour of the input character pattern from the extracted coordinate value sequence, and extracting, as primitive segments, the directed line segments that are polygonal lines of the approximation; converting each of the extracted primitive segments of the input character pattern so that the width and height of the input character pattern each have a predetermined constant size; and outputting the contour information including the converted primitive segments.