JPH08180138A - Character recognizing device

Info

Publication number: JPH08180138A
Application number: JP6324582A
Authority: JP
Inventors: Mamoru Yamada; 守山田; Hidetoshi Fukushima; 秀利福島
Original assignee: NAGANO NIPPON DENKI SOFTWARE KK; NEC Corp; NEC Software Nagano Ltd
Current assignee: NAGANO NIPPON DENKI SOFTWARE KK; NEC Corp; NEC Software Nagano Ltd
Priority date: 1994-12-27
Filing date: 1994-12-27
Publication date: 1996-07-12
Anticipated expiration: 2013-06-18
Also published as: JP2766205B2

Abstract

PURPOSE: To speed up character recognition of Japanese documents by reducing the number of dissimilarity computations without changing the order of the components of the multidimensional feature vectors in the character recognition dictionary. CONSTITUTION: The characters to be recognized are divided into a plurality of character groups, each consisting of characters with small mutual dissimilarity. For every character, the character recognition dictionary 1 stores, together with its feature vector, the character group to which it belongs and information indicating whether it is the representative character of that group. The character pattern of a character to be recognized, input by a character pattern input unit 2, is normalized by a character pattern normalization unit 3 and passed to a feature vector extraction unit 4, which extracts a multidimensional feature vector consisting of a plurality of feature elements. An initial candidate selection unit 5 compares this vector with the representative characters of the character groups and selects the character groups whose dissimilarity is below a selection criterion. A final candidate selection unit 6 then processes all character types of the selected character groups, accumulating the dissimilarity one feature element at a time, making a judgment after each partial sum, and replacing candidate characters as needed. Finally, a recognition result output unit 7 outputs the top-ranked candidate character as the recognition result.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device, and more particularly to a character recognition device that recognizes Japanese characters using a feature vector having a plurality of feature element components for each divided small area.

[0002]

[Prior Art] Character recognition for Japanese text must handle many character types, including kanji, hiragana, katakana, alphanumeric characters, and symbols, and many of the kanji are complex. For this reason, a method is used in which the character area is divided into a large number of partially overlapping small areas, a vector whose components are a plurality of feature elements is set for each small area, and the resulting multidimensional feature vectors, which have "number of feature elements × number of small areas" components, are compared.

[0003] Typical feature vectors having a plurality of feature elements as components include those defined from a line-drawing pattern obtained by thinning the character image and those defined from the contour pattern of the character image. To improve the recognition rate without increasing the total number of dimensions of the feature vector, various variants have been proposed that differ in how the small areas are divided and how the feature elements are chosen. A representative example of each is described below.

[0004] An example of the former is the so-called directional line element feature. A character pattern displayed in a 64 × 64-dot character area is thinned to obtain a line-drawing pattern representing the skeleton lines of the character. The orientation of each line element (one pixel) of the skeleton lines is quantized into four directions, vertical (|), horizontal (-), and ±45 degrees (/, \), and these are used as the feature element components of a 196-dimensional feature vector. First, the 64 × 64-dot character area is divided into blocks of 8 × 8 dots. Four adjacent blocks are combined into a 16 × 16-dot small area, and the small areas are overlapped by half in both the vertical and horizontal directions, giving 49 small areas. For each small area, a histogram is computed for each direction through a corridor-shaped filter with Gaussian-like weights, heavy at the center and light at the periphery, which yields a feature vector of 49 small areas × 4 directions = 196 dimensions. For details, see "High-precision character recognition using directional line element features" (IEICE Transactions D-II, vol. J74-D-II, No. 3, pp. 330-339, March 1991).
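The count of 49 small areas follows from sliding a 16 × 16 window over the 64 × 64 area in steps of 8 dots. The following is a minimal sketch of that arithmetic only; the function name `small_areas` and its parameters are illustrative and do not come from the patent or the cited paper.

```python
def small_areas(size=64, window=16, stride=8):
    """Return the top-left corners of the half-overlapping small areas."""
    starts = range(0, size - window + 1, stride)  # 0, 8, ..., 48
    return [(top, left) for top in starts for left in starts]

areas = small_areas()
n_directions = 4  # vertical, horizontal, +45 degrees, -45 degrees
feature_dims = len(areas) * n_directions
print(len(areas), feature_dims)
```

With the default values the window fits at 7 positions per axis, so the sketch reproduces the 49 small areas and 196 dimensions stated above.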

[0005] In the weighted direction index histogram, an example of the latter, the contour of the normalized binary character pattern is traced with 8-connectivity, and the direction of the contour at each contour point is quantized into four directions at 45-degree intervals. First, this direction-quantized pattern is divided into 11 vertical × 11 horizontal = 121 small blocks, and a histogram is computed for each block, giving a direction index histogram of 11 × 11 × 4 directions = 484 dimensions. Applying overlapping two-dimensional Gaussian filters (5 × 5) to this direction index histogram compresses its dimensionality and yields a weighted direction index histogram of 6 × 6 × 4 directions = 144 dimensions. For details, see "Handling of collapsed characters by the weighted direction index histogram method" (IEICE Technical Report, PRU90-128, pp. 21-26).
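One way to arrive at a 6 × 6 grid from an 11 × 11 grid with a 5 × 5 filter is to center the filter on every second block and to ignore filter taps that fall outside the grid. The sketch below does exactly that; the stride of 2, the border handling, and the value of `sigma` are assumptions made for illustration, since the text above gives only the grid sizes and the filter size.

```python
import math

def gaussian_kernel(size=5, sigma=1.0):
    """Unnormalized 2-D Gaussian weights (sigma is an assumed value)."""
    half = size // 2
    return [[math.exp(-(dy * dy + dx * dx) / (2 * sigma * sigma))
             for dx in range(-half, half + 1)]
            for dy in range(-half, half + 1)]

def compress(hist, kernel, stride=2):
    """hist[y][x][d] -> out[y'][x'][d]: kernel-weighted sum sampled every `stride` blocks."""
    n = len(hist)
    n_dirs = len(hist[0][0])
    half = len(kernel) // 2
    out = []
    for cy in range(0, n, stride):
        row = []
        for cx in range(0, n, stride):
            cell = [0.0] * n_dirs
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    y, x = cy + dy, cx + dx
                    if 0 <= y < n and 0 <= x < n:  # taps outside the grid contribute nothing
                        w = kernel[dy + half][dx + half]
                        for d in range(n_dirs):
                            cell[d] += w * hist[y][x][d]
            row.append(cell)
        out.append(row)
    return out

# Dummy 11 x 11 x 4 direction index histogram, used only to check the shapes.
hist = [[[1.0] * 4 for _ in range(11)] for _ in range(11)]
out = compress(hist, gaussian_kernel())
dims = len(out) * len(out[0]) * len(out[0][0])
```

Under these assumptions the filter centers fall on blocks 0, 2, 4, 6, 8, and 10 along each axis, which gives the 6 × 6 × 4 = 144 components.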

[0006] A character recognition device that uses feature vectors extracts the feature vector of the character image to be recognized (the input pattern), computes the dissimilarity or similarity between corresponding dimensions of this vector and the feature vectors of the standard patterns stored in the character recognition dictionary, and selects as the recognition result the character type of the standard pattern for which the sum of the per-dimension dissimilarities is smallest or the sum of the similarities is largest. To recognize Japanese text, the dissimilarity or similarity must be computed against nearly three thousand character types even if only the JIS Level 1 kanji are considered, and obtaining a highly accurate result inevitably increases the number of dimensions of the feature vector. The processing time therefore grows enormously as the number of character types and the number of dimensions increase.

[0007] As a countermeasure, a two-stage recognition method is conceivable in which a preliminary coarse selection is performed with a low-dimensional feature vector (for example, 4 × 4 × 4 = 64 dimensions) and the final selection is performed with a high-dimensional feature vector only for the limited number of character types that were selected. This method, however, has the drawback of requiring two kinds of character recognition dictionary and the extraction of two kinds of feature vector.

[0008] In contrast, Japanese Unexamined Patent Publication No. 63-780 proposes a method that uses only one kind of character recognition dictionary and suppresses the increase in processing time caused by an increase in the number of dimensions. In this method, the components of the feature vectors stored in the character recognition dictionary are registered after being reordered by dimension in descending order of standard deviation or variance over all character types registered in the dictionary. When an unknown pattern is recognized, its feature vector is extracted and reordered in the same way. A coarse selection is made by comparing a fixed number of the leading dimensions against all character types, and the remaining lower dimensions are compared only for the selected character types, which limits both the character types and the number of dimensions involved in the computation. That is, the coarse selection compares only a limited, fixed number of leading feature vector components against all character types and drops from the candidates those whose dissimilarity or similarity falls outside a given limit. The final candidate is then selected using the full-dimensional feature vector only for the candidates that survive the coarse selection. The coarse selection may be repeated in several stages before the final selection.
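The prior-art scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited publication: squared difference is assumed as the dissimilarity, and the names `variance_order`, `two_pass_match`, `n_coarse`, and `limit` are hypothetical.

```python
from statistics import pvariance

def variance_order(templates):
    """Dimension indices sorted by descending variance over all templates."""
    n_dims = len(next(iter(templates.values())))
    var = [pvariance([vec[d] for vec in templates.values()]) for d in range(n_dims)]
    return sorted(range(n_dims), key=lambda d: var[d], reverse=True)

def sq_dist(a, b, dims):
    """Sum of squared differences over the given dimensions only."""
    return sum((a[d] - b[d]) ** 2 for d in dims)

def two_pass_match(x, templates, order, n_coarse, limit):
    """Coarse selection on the leading dimensions, then a full comparison of the survivors."""
    coarse_dims = order[:n_coarse]
    survivors = [c for c, vec in templates.items()
                 if sq_dist(x, vec, coarse_dims) <= limit]
    if not survivors:
        return None
    return min(survivors, key=lambda c: sq_dist(x, templates[c], order))

# Toy dictionary with made-up 4-dimensional vectors.
templates = {"A": [0, 0, 0, 1], "B": [10, 0, 0, 1], "C": [10, 5, 0, 1]}
order = variance_order(templates)
result = two_pass_match([9, 4, 0, 1], templates, order, n_coarse=1, limit=4)
```

Note that `variance_order` has to be recomputed whenever the set of templates changes, which is the maintenance cost this approach carries.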

[0009]

[Problems to be Solved by the Invention] However, because the method described in Japanese Unexamined Patent Publication No. 63-780 performs the coarse selection using only a limited, fixed number of feature vector components, the feature vectors must be registered in the character recognition dictionary with their components reordered by dimension in descending order of standard deviation or variance, and the feature vector of the unknown pattern must be reordered in the same way. That is, the standard deviation or variance must be computed for every dimension of the multidimensional feature vectors to determine the order. This requires a large amount of processing when the number of target character types is large, and whenever the set of target character types is changed, the entire character recognition dictionary must be updated.

[0010] An object of the present invention is to provide a character recognition device that, without changing the arrangement of the feature vectors in the character recognition dictionary and with only one kind of character recognition dictionary, reduces the number of dissimilarity computations and speeds up the recognition of Japanese text.

[0011]

[Means for Solving the Problems] The character recognition device of claim 1 recognizes Japanese characters using a multidimensional feature vector having a plurality of feature element components for each divided small area, and comprises: a character recognition dictionary that stores, together with the feature vector of the standard pattern of each character type, character group identification information indicating which of a plurality of character groups (each composed of a plurality of character types whose feature vectors have small mutual dissimilarity) the character type belongs to, and representative character identification information indicating whether the character type is the representative character at the center of its group; a character pattern input unit that captures the character pattern of an unknown input character to be recognized; a character pattern normalization unit that normalizes the size and line width of the captured character pattern so that it can be handled uniformly; a feature vector extraction unit that extracts a feature vector from the normalized character pattern; an initial candidate selection unit that compares the extracted feature vector with the feature vectors of the representative characters in the character recognition dictionary and selects, in ascending order of dissimilarity and under a predetermined condition, the representative characters to be used as initial candidates; a final candidate selection unit that initializes the candidate characters from the selected representative characters in ascending order of dissimilarity and then, for all character types of the character groups to which the selected representative characters belong, accumulates the dissimilarity one feature element component at a time in a predetermined order, makes a judgment after each partial sum, and replaces candidate characters as needed; and a recognition result output unit that outputs at least the character finally positioned at the top rank of the candidate characters as the recognition result.

[0012] The character recognition device of claim 2 is the character recognition device of claim 1, characterized in that the initial candidate selection unit selects as initial candidates all representative characters whose dissimilarity is smaller than a predetermined limit value.

[0013] The character recognition device of claim 3 is the character recognition device of claim 1 or claim 2, characterized in that the initial candidate selection unit compares the extracted feature vector with the feature vectors of the representative characters in the character recognition dictionary in a predetermined order of feature element components, accumulating the dissimilarity for each representative character one feature element component at a time and making a judgment after each partial sum, to select the representative characters to be used as initial candidates.

[0014] The character recognition device of claim 4 is the character recognition device of claim 1, 2, or 3, characterized in that the final candidate selection unit initializes as candidate characters only a fixed number of the representative characters selected by the initial candidate selection unit, taken in ascending order of dissimilarity.

[0015] The character recognition device of claim 5 is the character recognition device of any of claims 1 to 4, characterized in that the final candidate selection unit compares the partial sum of the dissimilarity after each feature element component with the dissimilarity of the lowest-ranked candidate character and thereby decides, step by step, whether to continue computing the dissimilarity for the remaining feature element components.

[0016] The character recognition device of claim 6 is the character recognition device of any of claims 1 to 5, characterized in that the plurality of feature element components constituting the feature vector are four direction elements indicating the direction of the line elements of the skeleton pattern obtained by thinning the character image (or of the contour pattern of the character image), and the order in which the partial sums of dissimilarity are computed for the feature element components is defined so that the vertical and horizontal directions are processed before the diagonal directions.

[0017]

[Embodiments] Next, an embodiment of the present invention is described with reference to the drawings.

[0018] FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

[0019] As shown in FIG. 1, the character recognition device of this embodiment comprises: a character recognition dictionary 1 that stores, together with the feature vector of the standard pattern of each character type, identification information indicating which of a plurality of character groups composed of similar characters with small dissimilarity the character type belongs to and identification information indicating the representative character at the center of each character group; a character pattern input unit 2 for capturing a character pattern; a character pattern normalization unit 3 for normalizing the size and line width of the captured character pattern so that it can be handled uniformly; a feature vector extraction unit 4 that extracts a multidimensional feature vector from the normalized character pattern; an initial candidate selection unit 5 that compares all components of the extracted feature vector with the representative characters of the character recognition dictionary 1 and selects as initial candidates the representative characters whose dissimilarity is smaller than a predetermined limit value; a final candidate selection unit 6 that initializes the candidate characters with all the selected representative characters and then, for all character types of all the character groups to which those representative characters belong, successively accumulates the dissimilarity one feature element component at a time in a predetermined order, deciding after each partial sum whether to continue, and replaces candidate characters as needed; and a recognition result output unit 7 that outputs the character finally positioned at the top rank as the recognition result.

[0020] The following description assumes Japanese as the target language: feature elements for four directions are assigned to the line elements of the skeleton or contour of a character, a feature vector of X vertical × Y horizontal × 4 directions = 4XY dimensions (four elements for each of the XY small areas) is extracted, and character recognition is performed on the basis of dissimilarity.

[0021] In the character recognition dictionary 1, the feature vector components of the standard patterns of all character types to be recognized are registered for each character code in a predetermined order of small areas (for example, from upper left to lower right) and, within each small area, in a predetermined order of elements (for example, vertical "|", horizontal "-", +45 degrees "/", -45 degrees "\"). Here a component of a feature vector registered in the character recognition dictionary 1 is written G_{mik}, where the subscript m is the ordinal of the character type registered in the dictionary, i is the ordinal of the small area, and k is the ordinal of the feature element direction. Numbering the small areas and feature elements in the orders exemplified above gives i = 1 to M (M = XY) and k = 1 to 4, and for each character type the components of the feature vector are arranged as G_{m11}, G_{m12}, G_{m13}, G_{m14}, G_{m21}, G_{m22}, G_{m23}, G_{m24}, ..., G_{mM1}, G_{mM2}, G_{mM3}, G_{mM4}.
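Because the components are stored area by area with the four directions adjacent, the position of a component in a flat array and the set of all components for one direction follow from simple index arithmetic. The helper names below are hypothetical and serve only to illustrate the layout.

```python
DIRECTIONS = ("|", "-", "/", "\\")  # k = 1..4 in the order exemplified in the text

def flat_index(i, k, n_directions=4):
    """0-based offset of component (i, k), with 1-based small area i and direction k."""
    return (i - 1) * n_directions + (k - 1)

def direction_slice(vector, k, n_directions=4):
    """All M components of direction k, one per small area, without reordering the dictionary."""
    return vector[k - 1::n_directions]
```

For a 196-dimensional vector (M = 49), `flat_index(49, 4)` is the last offset, 195, and `direction_slice(vector, 1)` returns the 49 vertical components.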

[0022] In addition, for each character code the character recognition dictionary 1 stores character group identification information, indicating which of a plurality of character groups composed of similar characters with small dissimilarity the character belongs to, and representative character identification information, indicating whether the character is the representative character at the center of its group. That is, the character types registered in the character recognition dictionary 1 are classified into a number of character groups, each character group consists of a plurality of similar characters with small dissimilarity, and one representative character is designated for each group. For Japanese text that includes up to the JIS Level 1 kanji, the number of character types to be recognized is about 3300; with 200 character groups, the average number of character types per group is 16.5. Various methods of forming the character groups are conceivable and the method is not particularly limited, but the dissimilarity between a representative character and the character types in its group must not exceed a fixed value. Some overlap between the character groups is acceptable.

[0023] The character pattern input unit 2 extracts, as a character pattern, the rectangular area of one character from a document image read by an image scanner. The captured character pattern is passed to the character pattern normalization unit 3 for normalization of its size and line width.

[0024] The normalized character pattern is decomposed into a plurality of feature elements in the feature vector extraction unit 4, and a feature vector is extracted. Specifically, a skeleton line-drawing pattern or a contour pattern is extracted depending on the kind of feature vector, the line elements are counted by direction (feature element) in each division unit, and weighting and merging are applied to compute the feature vector components of each small area.

[0025] The extracted feature vector is passed to the initial candidate selection unit 5, which computes its dissimilarity from the standard pattern of the representative character of each character group registered in the character recognition dictionary 1; this coarse selection limits the number of characters subjected to detailed comparison. Specifically, the unit refers to the representative character identification information in the character recognition dictionary 1 to select the representative character of each character group in turn, compares corresponding dimensions of the feature vectors to compute the total dissimilarity, excludes the character groups whose total dissimilarity exceeds a predetermined selection criterion, and passes only the representative characters whose total dissimilarity is within the selection criterion to the final candidate selection unit 6 as initial candidates. Because only representative characters are involved, the dissimilarity in the initial candidate selection unit 5 may be computed in the ordinary way over all dimensions, but applying the method of the final candidate selection unit 6 described below further reduces the number of computations and speeds up processing. The selection criterion is set in consideration of the number and size of the character groups. A smaller value simplifies subsequent processing, but the criterion must be larger than the maximum dissimilarity between a representative character and the characters in its group.

[0026] The final candidate selection unit 6 first sorts all the representative characters passed from the initial candidate selection unit 5 in ascending order of dissimilarity and sets them as the initial candidate characters. It then processes all character types of the character groups to which those representative characters belong: in a predetermined order of feature elements, it accumulates the dissimilarity of the feature vectors one feature element at a time and, after each partial sum, compares it with the dissimilarity of the lowest-ranked candidate character to decide whether to continue. If the partial sum exceeds the dissimilarity of the lowest-ranked character, the computation for that character type is abandoned and processing moves on to the next character. If the total dissimilarity over all feature elements is smaller than the dissimilarity of the lowest-ranked candidate character, that character type replaces a candidate character. By repeating this process, the initially set candidate characters are successively replaced by characters in the target character groups with smaller total dissimilarity, and the candidates end up arranged in ascending order of total dissimilarity.
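The early-termination loop described above might look like the following sketch. It assumes the area-major layout with four direction components per small area, squared difference as the dissimilarity, and a candidate list of fixed length; the function name, argument layout, and the use of `bisect` to keep the list ordered are illustrative choices, not details from the patent.

```python
import bisect

def final_candidates(x, initial, others, n_directions=4, direction_order=(0, 1, 2, 3)):
    """
    x: input feature vector, area-major with n_directions components per small area.
    initial: list of (dissimilarity, code) for the selected representative characters.
    others: list of (code, features) for the remaining members of the selected groups.
    Returns the candidate list sorted by ascending dissimilarity.
    """
    candidates = sorted(initial)
    for code, g in others:
        worst = candidates[-1][0]  # dissimilarity of the lowest-ranked candidate
        total = 0.0
        for k in direction_order:
            # Partial sum over all small areas for one direction component.
            total += sum((a - b) ** 2
                         for a, b in zip(x[k::n_directions], g[k::n_directions]))
            if total > worst:
                break  # cannot enter the candidate list; skip the remaining directions
        else:
            if total < worst:
                candidates.pop()  # drop the lowest-ranked candidate
                bisect.insort(candidates, (total, code))
    return candidates
```

Passing `direction_order=(0, 1, 2, 3)` with the vertical and horizontal components first corresponds to the ordering of claim 6. The dictionary vectors are read with a stride and are never reordered.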

[0027] The recognition result output unit 7 outputs and displays the top-ranked final candidate character as the recognition result. When the difference in dissimilarity between the top rank and the next ranks is zero or very small, the device may also be configured to store those characters, mark the displayed recognition result, and present the second- and third-ranked candidate characters as alternatives on request.

[0028] FIG. 2 is a flowchart showing the processing of the final candidate selection unit 6 in detail. The operation of the final candidate selection unit 6 is described again below with reference to FIG. 2.

[0029] Processing starts after the representative characters passed from the initial candidate selection unit 5 have been sorted in ascending order of dissimilarity and set as the initial candidate characters. Let J be the number of candidate characters (the number of character groups) and N the total number of characters contained in the corresponding character groups. For the J representative characters among the N characters, the dissimilarity has already been computed by the initial candidate selection unit 5, so the dissimilarity is computed in turn for the remaining N - J characters. Whenever a character type with a smaller dissimilarity is found, it replaces one of the initially set candidate characters, so that the J character types with the smallest dissimilarity remain as the final candidate characters. This is done by the procedure shown in FIG. 2. First, the N character types subject to dissimilarity computation are numbered, with the representative characters as #1 to #J, and in step S1 the counter n indicating the position of the current character type is set to n = J + 1.

[0030] Next, in step S2, the counter k indicating the feature element (direction) and the registers Δ_(k-1) and Δ_k that hold the running total of the dissimilarity are initialized. Then, in step S3, the square of the difference between the feature vector component F_ik of the input pattern and the feature vector component G_nik of the standard pattern is taken as the dissimilarity and totaled over all small regions for each feature element. M (= XY) is the number of dimensions per feature element (direction component) of the feature vector. The dissimilarity is totaled one feature element at a time, and the result is accumulated in register Δ_k.
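As an illustration only, the per-feature-element totaling of steps S2 and S3 might be sketched in Python as follows. The function names, the nested-list layout (one row per small region, one value per direction), and the zero-based indices are assumptions made for this sketch and do not appear in the specification.

```python
def direction_dissimilarity(f, g, k):
    """Sum of squared differences over all M small regions for direction k.

    f: input-pattern features, a list of M rows with one value per direction.
    g: standard-pattern features in the same (assumed) layout.
    """
    return sum((fi[k] - gi[k]) ** 2 for fi, gi in zip(f, g))


def running_totals(f, g, num_directions=4):
    """Return the accumulated dissimilarity after each direction (register Δ_k)."""
    totals = []
    running = 0  # step S2: the register starts at zero
    for k in range(num_directions):
        running += direction_dissimilarity(f, g, k)  # step S3
        totals.append(running)
    return totals
```

The last entry of the returned list corresponds to the total dissimilarity over all four directions.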

[0031] In step S4, each time the dissimilarity for one direction component has been calculated, the current value of register Δ_k is compared with the dissimilarity Δ_cJ of the lowest-ranked candidate character (the subscript c denotes a candidate character and J its rank). If the value of register Δ_k already exceeds Δ_cJ, processing proceeds to step S8, where the character type position is advanced by one. If the check in step S9 finds that the counter n has not exceeded the total number of characters N, processing returns to step S2 and moves on to the comparison with the next character type. If the counter n has exceeded N, processing ends.

[0032] If the value of register Δ_k is smaller than Δ_cJ, processing proceeds to step S5, where the counter k is advanced by one to switch to the next direction. If the comparison in step S6 then shows that the dissimilarity has been calculated for all four directions, the candidate characters are replaced in step S7, and processing moves on to the dissimilarity calculation for the next character type. If step S6 finds that a direction component remains unprocessed, processing returns to step S3, the dissimilarity of that direction component is calculated and added, and the processing from step S4 onward is repeated.
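The early termination of steps S3 to S6 can be pictured with the following Python sketch. It is not taken from the specification: the function name, the nested-list feature layout, and the use of None to signal an aborted calculation are assumptions. The text only covers the "larger" and "smaller" cases, so treating an equal running total as "continue" is also an assumption.

```python
def dissimilarity_with_cutoff(f, g, cutoff, num_directions=4):
    """Accumulate the dissimilarity direction by direction.

    Returns the full total, or None as soon as the running total exceeds
    `cutoff` (the dissimilarity of the lowest-ranked candidate, Δ_cJ).
    f and g are lists of M rows, one value per direction (assumed layout).
    """
    running = 0
    for k in range(num_directions):
        # Step S3: add the squared differences of direction k over all regions.
        running += sum((fi[k] - gi[k]) ** 2 for fi, gi in zip(f, g))
        # Step S4: stop early when this character can no longer be a candidate.
        if running > cutoff:
            return None
    return running
```

When the function returns None, the remaining direction components of that character type are never evaluated, which is where the saving in arithmetic comes from.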

[0033] FIG. 3 is a flowchart of the candidate character replacement process of step S7 in FIG. 2. The replacement process is described in detail below with reference to FIG. 3.

[0034] First, in step S71, the counter j is set to the number of candidate characters J, and replacement proceeds from the lowest rank toward the higher ranks. The lowest-ranked candidate character is first moved out of the candidate list, and the new candidate character is inserted at the lowest rank. That is, in step S72, the dissimilarity Δ_cJ is exchanged with the dissimilarity Δ_4 of the new candidate character (Δ_k with k = 4). Here, Δ_c(j+1) denotes the dissimilarity of the character moved out of the candidate list.

[0035] Next, in step S73, the dissimilarity of the new candidate character is compared with that of the candidate character one rank higher. If it is larger, the replacement process ends. If it is smaller, processing moves to step S74, where the dissimilarity of the candidate character one rank higher is saved in the temporary variable Δ_o and the two candidate ranks are swapped.

[0036] When the candidate ranks have been swapped, the counter j is decremented by one in step S75. If step S76 then finds that the counter j points to the highest rank (j = 1), processing ends. Otherwise, the processing from step S73 onward is repeated.
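The replacement procedure of FIG. 3 amounts to dropping the lowest-ranked candidate and moving the new character upward until the list is ordered again. A minimal Python sketch under assumed names follows; the list of (dissimilarity, character) tuples and the zero-based index are illustrative choices, not the data structure of the specification.

```python
def replace_candidate(candidates, new_char, new_score):
    """Insert a new candidate into a list sorted by ascending dissimilarity.

    candidates: list of (dissimilarity, character) tuples, modified in place.
    The list length (the number of candidates J) stays unchanged.
    """
    # Steps S71/S72: the lowest-ranked candidate is dropped and the new
    # character takes its place at the bottom of the list.
    candidates[-1] = (new_score, new_char)
    j = len(candidates) - 1
    # Step S73: compare with the candidate one rank higher.
    while j > 0 and candidates[j][0] < candidates[j - 1][0]:
        # Step S74: swap the two ranks.
        candidates[j], candidates[j - 1] = candidates[j - 1], candidates[j]
        # Steps S75/S76: move up one rank; stop at the top.
        j -= 1
    return candidates
```

For example, inserting a character with dissimilarity 2 into candidates ranked 1, 3, 5 removes the entry with 5 and yields the ranking 1, 2, 3.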

[0037] In this way, the processing of the final candidate selection unit 6 ends once the comparison with the standard patterns of all character types in the character groups corresponding to the representative characters selected by the initial candidate selection unit 5, and the accompanying replacement of candidate characters, has been completed. The recognition result output unit 7 then outputs the character that finally occupies the highest rank among the candidate characters, and character recognition ends.
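Putting the steps of FIG. 2 together, the overall loop of the final candidate selection might look like the following Python sketch. All names and data structures are assumptions made for illustration, and a call to list.sort() stands in for the rank-by-rank replacement of FIG. 3.

```python
def final_candidates(f, initial, others, num_directions=4):
    """Sketch of the FIG. 2 loop.

    f:       input-pattern features, a list of M rows, one value per direction.
    initial: list of (dissimilarity, character) for the J representative
             characters, already computed by the initial selection.
    others:  dict mapping each of the remaining N - J characters to its
             standard-pattern features (same layout as f).
    """
    candidates = sorted(initial)  # initialization: ascending dissimilarity
    for char, g in others.items():  # steps S1, S8, S9: visit each character
        running = 0  # step S2
        for k in range(num_directions):
            running += sum((fi[k] - gi[k]) ** 2 for fi, gi in zip(f, g))  # S3
            if running > candidates[-1][0]:  # S4: worse than lowest candidate
                break
        else:
            # All directions processed without exceeding the lowest candidate.
            candidates[-1] = (running, char)  # S7: replace lowest candidate
            candidates.sort()  # stand-in for the bubble-up of FIG. 3
    return candidates
```

The first element of the returned list corresponds to the character that the recognition result output unit 7 would output.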

[0038] As described above, in this embodiment the dissimilarity of the feature vectors is not calculated by running through the components from the first one and obtaining the total in a single pass. Instead, the calculation proceeds in a predetermined order of feature elements, with an intermediate total taken for each feature element, and after each intermediate total it is decided whether to abort or to continue. As stated earlier, the feature vector components of each character type are registered in the character recognition dictionary 1 in a predetermined order of small regions and, within each small region, in a predetermined order of feature elements. The intermediate total of the dissimilarity for each feature element can therefore be obtained easily, without rearranging the feature vector components. Specifically, if the feature elements are the four directions of the line elements, it suffices to take every fourth component, calculate the difference between each pair of corresponding components, and total the results. If the feature vector components of each character type are instead registered in the character recognition dictionary 1 in feature element order and then small region order, the totals are taken over contiguous runs of components, each run as long as the number of small regions.
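The two dictionary layouts mentioned above differ only in how the components of one feature element are addressed. The following Python sketch shows both, assuming a flat list of components and zero-based direction indices; the function names are hypothetical.

```python
def components_region_major(v, k, num_directions=4):
    """Layout: small region order, then feature element order.

    The components of direction k are every num_directions-th entry,
    starting at offset k.
    """
    return v[k::num_directions]


def components_direction_major(v, k, num_regions):
    """Layout: feature element order, then small region order.

    The components of direction k form one contiguous run whose length
    equals the number of small regions.
    """
    return v[k * num_regions:(k + 1) * num_regions]
```

In both cases the stored vector is read in place; no copy of the dictionary in a different component order is needed.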

[0039] Japanese text mixes kanji, hiragana, katakana, alphanumeric characters, and symbols, and kanji are thought to account for 30 to 40% of the characters used. With regard to the feature element in question, namely the direction of the line elements that make up a character, hiragana and similar characters contain many diagonal or curved components, whereas kanji consist almost entirely of vertical or horizontal straight components. Given this property of Japanese text, and given that nearly three thousand of the first-level character types are kanji, it is effective to total the vertical and horizontal components first and the diagonal components (±45 degrees in the four-direction case) afterwards.
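The effect of the processing order can be illustrated with a small Python sketch. The per-direction dissimilarity values and the cutoff below are invented purely for illustration and are not measurements from the specification; the function and key names are likewise hypothetical.

```python
def directions_evaluated(per_direction, order, cutoff):
    """Count how many directions are processed before the running total
    exceeds `cutoff`; returns len(order) if it never does."""
    running = 0
    for count, name in enumerate(order, start=1):
        running += per_direction[name]
        if running > cutoff:
            return count
    return len(order)


# Hypothetical kanji-like mismatch: most of the difference lies in the
# vertical and horizontal strokes.
per_direction = {"vertical": 9, "horizontal": 7,
                 "diag_plus_45": 1, "diag_minus_45": 1}
straight_first = ["vertical", "horizontal", "diag_plus_45", "diag_minus_45"]
diagonal_first = ["diag_plus_45", "diag_minus_45", "vertical", "horizontal"]
```

With these made-up numbers and a cutoff of 10, processing the vertical and horizontal components first lets the calculation stop after two directions, whereas processing the diagonals first needs three.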

[0040] In the embodiment described above, the initial candidate selection unit 5 passes every representative character whose dissimilarity is at or below the selection criterion to the final candidate selection unit 6 as an initial candidate. To limit the number of characters processed by the final candidate selection unit 6, however, it may instead pass only a fixed number of representative characters, taken in ascending order of dissimilarity. The approach of passing all representative characters does have an advantage when additional recognition target characters, such as JIS second-level kanji, are registered in the character recognition dictionary 1: the character groups for the added characters can be set up independently, without changing the organization of the character groups already registered and without having to consider overlapping ranges. With the fixed-number approach, by contrast, some adjustment is needed, such as changing the organization of the character groups or changing the number of representative characters passed.
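The two ways of handing representative characters to the final candidate selection unit can be contrasted in a short Python sketch. The function name, the dictionary of precomputed dissimilarities, and the optional `limit` parameter are assumptions for illustration.

```python
def select_initial_candidates(rep_scores, threshold, limit=None):
    """Select representative characters as initial candidates.

    rep_scores: dict mapping each representative character to its
                dissimilarity against the input pattern.
    threshold:  selection criterion; characters at or below it qualify.
    limit:      if given, pass on only this many characters, taken in
                ascending order of dissimilarity (the fixed-number variant).
    """
    selected = sorted(
        (score, char) for char, score in rep_scores.items()
        if score <= threshold
    )
    return selected if limit is None else selected[:limit]
```

Calling the function without `limit` corresponds to the embodiment, and passing a `limit` corresponds to the fixed-number variant.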

[0041] In the embodiment, the final candidate selection unit 6 also sorts all the representative characters passed from the initial candidate selection unit 5 in ascending order of dissimilarity and sets them as the initial candidate characters. However, the number of candidate characters subject to successive replacement in the final candidate selection unit 6 need not be the full number of representative characters; it may be a limited fixed number (one is acceptable, but several are preferable). In that case the number of candidate characters no longer matches the number of target character groups, so the "J" in step S1 of FIG. 2 and the "J" of Δ_cJ in step S4 are no longer the same. The latter, together with the "J" in FIG. 3, is replaced by "J_o" (J > J_o).

[0042] Furthermore, in the embodiment described above, the final candidate selection unit 6 uses the total dissimilarity of the lowest-ranked candidate character as the criterion for deciding, after each intermediate total, whether to continue processing. The criterion value therefore changes as processing proceeds, but a fixed value may be used instead. In that case a smaller fixed value is better, and a value at least smaller than the selection criterion of the initial candidate selection unit 5 is appropriate.

[0043]

[Effect of the invention] As described above, in the character recognition device of the present invention, the recognition target characters are divided into a plurality of character groups, each consisting of characters with a small mutual dissimilarity, and identification information for the character group and for the representative character is registered in the character recognition dictionary together with the feature vectors. A coarse selection is made using the representative characters, and the dissimilarity calculation in the final selection that follows is divided by feature element component: a decision is made each time the dissimilarity for one feature element component has been calculated, and the calculation of feature element components that are no longer needed is aborted. Unnecessary calculations are thus avoided and the number of candidate character replacements is reduced. As a result, recognition speed can be improved substantially while using only one type of feature vector and without changing the order in which the feature vector components are arranged in the character recognition dictionary.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a flowchart showing the dissimilarity calculation process in the final candidate selection unit of FIG. 1.

FIG. 3 is a flowchart showing details of the candidate character replacement process of FIG. 2.

[Explanation of symbols]

1: character recognition dictionary
2: character pattern input unit
3: character pattern normalization unit
4: feature vector extraction unit
5: initial candidate selection unit
6: final candidate selection unit
7: recognition result output unit

Claims

1. A character recognition device for recognizing Japanese characters using a multidimensional feature vector having a plurality of feature element components for each divided small region, the device comprising: a character recognition dictionary that stores, together with the feature vector of the standard pattern of each character type, character group identification information indicating to which of a plurality of character groups, each composed of a plurality of character types with a small mutual dissimilarity, the character type belongs, and representative character identification information indicating whether or not the character type is the representative character at the center of its character group; a character pattern input unit that captures the character pattern of an unknown input character to be recognized; a character pattern normalization unit that normalizes the size and line width of the captured character pattern so that it can be handled uniformly; a feature vector extraction unit that extracts a feature vector from the normalized character pattern; an initial candidate selection unit that compares the extracted feature vector with the feature vectors of the representative characters in the character recognition dictionary and selects representative characters as initial candidates under a predetermined condition; a final candidate selection unit that initializes the candidate characters from the selected representative characters in ascending order of dissimilarity and then, taking as targets all character types of the character groups to which the selected representative characters belong and proceeding in a predetermined order of feature element components for each character type, replaces the candidate characters; and a recognition result output unit that outputs, as the recognition result, at least the character finally positioned at the highest rank among the candidate characters.

2. The character recognition device according to claim 1, wherein the initial candidate selection unit selects, as initial candidates, all the representative characters whose dissimilarity is smaller than a predetermined limit value.

3. The character recognition device according to claim 1 or 2, wherein the initial candidate selection unit compares the extracted feature vector with the feature vectors of the representative characters in the character recognition dictionary in a predetermined order of feature element components and, for each representative character, selects the representative characters that become initial candidates by making a determination each time while performing intermediate totaling of the dissimilarity for each feature element component.

4. The character recognition device according to claim 2 or claim 3, wherein the final candidate selection unit initializes as candidate characters only a fixed number of the representative characters selected by the initial candidate selection unit, taken in order starting from the smallest dissimilarity.

5. The character recognition device according to any one of claims 1 to 4, wherein the final candidate selection unit compares the intermediate total of the dissimilarity for each feature element component with the dissimilarity of the lowest-ranked candidate character and thereby determines, each time, whether or not to continue calculating the dissimilarity of the remaining feature element components.

6. The character recognition device according to any one of claims 1 to 5, wherein the plurality of feature element components constituting the feature vector are four-direction elements indicating the directions of the line elements of a skeleton pattern obtained by thinning a character image (or of a contour line pattern of the character image), and the order in which the intermediate totaling of the dissimilarity is performed for the feature element components is determined such that the vertical and horizontal directions are processed before the diagonal directions.