JP2866920B2

JP2866920B2 - Standard pattern creation method and apparatus, and character recognition apparatus and method

Info

Publication number: JP2866920B2
Application number: JP8047036A
Authority: JP
Inventors: 正行木村
Original assignee: HOKURIKU SENTAN KAGAKU GIJUTSU DAIGAKUIN DAIGAKUCHO
Current assignee: HOKURIKU SENTAN KAGAKU GIJUTSU DAIGAKUIN DAIGAKUCHO
Priority date: 1996-03-05
Filing date: 1996-03-05
Publication date: 1999-03-08
Anticipated expiration: 2016-03-05
Also published as: JPH09245126A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a standard pattern creation method and apparatus, and to a character recognition apparatus and method.

[0002]

2. Description of the Related Art

Character recognition apparatuses are well suited to information processing tasks such as digitizing large volumes of printed documents, building databases from them, document processing, and automatic translation, and further research and development is under way.

[0003] Conventional character recognition systems that recognize character images printed or handwritten on paper generally consist of (1) document image input, (2) character segmentation, (3) preprocessing (smoothing, normalization, thinning, etc.), (4) feature extraction, (5) coarse classification, (6) fine classification, and (7) post-processing.

[0004] In such a character recognition system, the character image on the paper is first read as an optical image and converted into an electrical signal. The character image read into the system is segmented into predetermined recognition units, for example individual characters, based on a histogram of its marginal distribution or the like, and is then preprocessed so that recognition can be performed efficiently. In feature extraction, a structural analysis method, a pattern matching method, or the like is used to obtain the features of the input character needed for recognition, for example topological features or features of each pixel unit of the image divided into a mesh. Coarse classification is used particularly when the target has many character categories, such as kanji, and narrows down the candidate categories by a simple method. Fine classification performs detailed recognition processing on the narrowed-down candidates. Finally, in post-processing, when the candidates cannot be settled by recognizing individual input characters, adjacent input characters are concatenated and judged as a character string by referring to grammar and the like.
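As a rough illustration of segmentation based on a marginal-distribution histogram, the sketch below splits a binarized text line wherever the column profile drops to zero. The function name `split_columns`, the list-of-rows image format, and the "empty column means gap" rule are assumptions made for illustration; the source does not specify how the conventional systems segment characters.

```python
def split_columns(image):
    """Return (start, end) column spans of connected non-empty columns.

    image: list of rows, each a list of 0/1 values (1 = black pixel).
    The end index is exclusive. Assumption: characters are separated
    by at least one column containing no black pixels.
    """
    width = len(image[0])
    # Marginal distribution: number of black pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    spans = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                    # a character begins here
        elif count == 0 and start is not None:
            spans.append((start, x))     # a character ended before column x
            start = None
    if start is not None:
        spans.append((start, width))     # last character touches the right edge
    return spans
```

Each returned span can then be cut out of the line image and passed on to preprocessing as one recognition unit.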

[0005] Conventional character recognition systems of this kind have the following problems.

[0006] In conventional feature extraction, a character image normalized to a fixed size is scanned pixel by pixel (bit by bit), and character feature values (quantified features) are extracted by focusing on the relationship between the pixel being scanned and its neighboring pixels. Such feature values represent the overall or aggregate features of each character and do not necessarily bring out its salient features. As a result, these feature values are easily affected by noise.

[0007] In the character recognition conventionally used for coarse and fine classification, pattern matching (distance calculation) is performed between the unknown input character, represented by the feature values described above, and the standard patterns in the dictionary to measure their overall degree of similarity, and a suitable number of candidate character types are selected in ascending order of distance. Because the distance measure is the sole criterion for classification in such a conventional method, the salient structural features of each character cannot be used flexibly during classification. The conventional method therefore has to compare the unknown input pattern with the standard patterns of all character types. For example, with 5000 character types, essentially 5000 distance calculations are required. This is the greatest obstacle to speeding up character recognition. Furthermore, it is difficult to check the validity of classification and recognition results during the recognition process, so checking and correcting misrecognitions is left entirely to post-processing, namely matching against an enormous word dictionary. This too is a major obstacle to higher speed.
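The cost described here can be made concrete with a minimal sketch of exhaustive distance-based matching. The squared Euclidean distance, the function name `nearest_candidates`, and the dictionary layout are assumptions chosen for illustration; the source only says that a distance is computed against every standard pattern.

```python
def nearest_candidates(features, dictionary, k=3):
    """Return the k character types whose standard patterns are closest.

    features:   feature vector of the unknown input character.
    dictionary: dict mapping character type -> standard feature vector.
    One distance is computed per dictionary entry, so a dictionary with
    5000 character types costs 5000 distance calculations per input.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(
        dictionary.items(),
        key=lambda item: squared_distance(features, item[1]),
    )
    return [char for char, _ in ranked[:k]]
```

The loop over every entry is the point: nothing in this scheme lets the matcher skip a standard pattern without first computing its distance.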

[0008] To solve the above problems, the present inventor disclosed a character recognition method based on a new principle in Japanese Patent Application No. Hei 6-312049. In outline, this method focuses on the fact that information about the structure of a character is directly reflected in the marginal distribution (histogram) of the character image in each direction. It partially matches the 0-1 pattern representation obtained from the histogram of the segmented input character image against the 0-1 pattern representations obtained from the histograms of the standard characters in the dictionary, and thereby narrows down the candidate character types.

[0009] In this character recognition method, one set of samples of the character set to be recognized is used as the standard patterns, and variations in typeface caused by differences in font and character size are absorbed by a proprietary coarse classification method. When the variation could not be absorbed in this way, the 0-1 pattern representation of the misrecognized character was added to the standard patterns. Consequently, increasing the number of fonts to be recognized increases the variation in typeface, and if that larger variation is to be absorbed by the coarse classification method, the number of candidate character types to which coarse classification narrows an unknown input character inevitably grows. The number of candidates grows in much the same way when the range of input character sizes is widened substantially. A larger number of candidates burdens fine classification and lowers recognition speed. If, to avoid this, the 0-1 pattern representations of misrecognized characters are added to the standard patterns, the resulting increase in the number of standard patterns lengthens the time needed for coarse classification, which again lowers recognition speed. No method has been provided, however, for speeding up recognition as the number of standard patterns increases (as the dictionary grows). On the other hand, if statistical standard patterns that take typeface variation into account are used, distances to all standard patterns must be calculated, just as in conventional pattern matching, and speeding up becomes impossible.

[0010]

Problems to Be Solved by the Invention

Therefore, for such a character recognition method to absorb diverse variations in typeface and further improve recognition speed and recognition accuracy, a new type of standard pattern, entirely different from conventional ones, is required. It is also required that recognition speed be improved while recognition accuracy is maintained even when the number of standard patterns increases (the dictionary grows).

[0011] The present invention has been made in view of the above circumstances. Its object is to provide a character recognition apparatus and method that improve noise resistance, recognition speed, and recognition accuracy simultaneously compared with the prior art, and to provide a standard pattern creation method and apparatus for creating, as the standard patterns registered in the recognition dictionary, standard patterns that effectively absorb character variations due to differences in typeface. A further object is to provide a character recognition apparatus and method that can improve recognition speed while maintaining recognition accuracy even when the number of standard patterns increases.

[0012]

Means for Solving the Problems

The present invention (claim 1) is a standard pattern creation method for creating the 0-1 pattern representations that serve as standard patterns for a character recognition apparatus. That apparatus creates a histogram from the dot matrix of an input character image and partially matches the 0-1 pattern representation extracted from that histogram against the 0-1 pattern representations serving as the standard patterns, of which one or more are prepared for each character type, to search for candidate character types corresponding to the input character image. The method is characterized by creating a histogram from the dot matrix of each of two or more character images that are of the same character type but differ in typeface, creating a 0-1 pattern representation from each created histogram, performing maximization processing and minimization processing based on the created 0-1 pattern representations, and using the result as the standard pattern.

[0013] The present invention (claim 8) is a standard pattern creation apparatus for creating the 0-1 pattern representations that serve as standard patterns for a character recognition apparatus. That apparatus creates a histogram from the dot matrix of an input character image and partially matches the 0-1 pattern representation extracted from that histogram against the 0-1 pattern representations serving as the standard patterns, of which one or more are prepared for each character type, to search for candidate character types corresponding to the input character image. The standard pattern creation apparatus is characterized by comprising means for creating a histogram from the dot matrix of each of two or more character images that are of the same character type but differ in typeface, means for creating a 0-1 pattern representation from each created histogram, and means for performing maximization processing and minimization processing based on the created 0-1 pattern representations.

[0014] For characters that separate vertically or horizontally (for example, "門" and "乱"), it is also possible to prepare each separated part as a character type and to recognize each separated part individually.

[0015] The character recognition apparatus according to the present invention (claim 9) comprises segmentation means for segmenting an input document image into predetermined recognition units; histogram creation means for creating a histogram from each segmented recognition unit; and coarse classification means for partially matching the 0-1 pattern representation extracted from the histogram of the recognition unit against the 0-1 pattern representations prepared in advance for each recognition target category, to search for candidate recognition target categories corresponding to the recognition unit. The apparatus is characterized in that the 0-1 pattern representations prepared in advance for each recognition target category are obtained by performing predetermined maximization processing and minimization processing based on the 0-1 pattern representations created from the histograms obtained from the dot matrices of two or more character images of the same character type but different typefaces. Preferably, the recognition unit is a single character and the recognition target category is a character type.

[0016] The character recognition method according to the present invention (claim 10) is characterized as follows. For each of the plurality of character types to be recognized, a histogram is created from the dot matrix of each of two or more character images of the same character type but different typefaces, a 0-1 pattern representation is created from each created histogram, maximization processing and minimization processing are performed based on the created 0-1 pattern representations, and the result is registered in a recognition dictionary as the 0-1 pattern representation serving as the standard pattern for that character type. An input document image is then segmented into predetermined recognition units, a histogram is created from each segmented recognition unit, and the 0-1 pattern representation extracted from that histogram is partially matched against the 0-1 pattern representations serving as the standard patterns registered in the recognition dictionary for each character type, to search for candidate character types corresponding to the recognition unit. Preferably, the recognition unit is a single character.

[0017] The character recognition apparatus according to the present invention (claim 11) is characterized by comprising the following. First, a recognition dictionary in which at least one standard pattern is registered for each recognition target category. Each standard pattern is a 0-1 pattern representation obtained by performing predetermined maximization processing and minimization processing based on the 0-1 pattern representations created from the histograms obtained from the dot matrices of two or more character images of the same character type but different typefaces, and it is registered in association with information indicating the class to which the standard pattern belongs, that class being obtained by classifying the 0-1 pattern representation according to a predetermined classification condition. Second, segmentation means for segmenting an input document image into predetermined recognition units. Third, histogram creation means for creating a histogram from each segmented recognition unit. Fourth, preprocessing means for classifying the 0-1 pattern representation extracted from the histogram of the recognition unit according to the predetermined classification condition to determine its class. Fifth, coarse classification means for partially matching the 0-1 pattern representation extracted from the histogram of the recognition unit against those standard patterns registered in the recognition dictionary that are associated with information indicating the same class as the class determined by the preprocessing means, to search for candidate recognition target categories corresponding to the recognition unit. Preferably, the recognition unit is a single character and the recognition target category is a character type.
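The class-restricted lookup described in this paragraph can be sketched as follows. This is an illustrative sketch only: the classifier `top_level_of_first_section` is a hypothetical classification condition invented for the example, and the pattern layout (a list of m rows, each with L entries, row 0 being the lowest level) is likewise an assumption. The source states only that the same classification condition is applied to the standard patterns and to the input.

```python
from collections import defaultdict


def top_level_of_first_section(pattern):
    """Hypothetical classification condition: the highest vertical level
    that is 1 in horizontal section 0, or -1 if that section is all 0."""
    levels = [k for k, row in enumerate(pattern) if row[0] == 1]
    return max(levels, default=-1)


def build_class_index(entries, classify):
    """Group (label, pattern) dictionary entries by their class."""
    index = defaultdict(list)
    for label, pattern in entries:
        index[classify(pattern)].append((label, pattern))
    return index


def same_class_entries(index, input_pattern, classify):
    """Return only the entries whose class equals the input's class.

    Partial matching then needs to visit just these entries instead of
    the whole dictionary.
    """
    return index.get(classify(input_pattern), [])
```

Because the class is computed once per input, the number of standard patterns that must be matched shrinks to the size of one class, which is how a growing dictionary can be kept from slowing down coarse classification.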

[0018] The character recognition method according to the present invention (claim 16) is characterized as follows. For each recognition target category, at least one standard pattern is registered in a recognition dictionary in advance. Each standard pattern is a 0-1 pattern representation obtained by performing predetermined maximization processing and minimization processing based on the 0-1 pattern representations created from the histograms obtained from the dot matrices of two or more character images of the same character type but different typefaces, and it is registered in association with information indicating the class to which the standard pattern belongs, that class being obtained by classifying the 0-1 pattern representation according to a predetermined classification condition. An input document image is segmented into predetermined recognition units, and a histogram is created from each segmented recognition unit. The 0-1 pattern representation extracted from that histogram is classified according to the predetermined classification condition to determine its class. The 0-1 pattern representation extracted from the histogram is then partially matched against those standard patterns registered in the recognition dictionary that are associated with information indicating the same class as the determined class, to search for candidate recognition target categories corresponding to the recognition unit. Preferably, the recognition unit is a single character and the recognition target category is a character type.

[0019] Preferably, the region of the character image's dot matrix from which the histogram is created is the inscribed rectangular region defined by the height and width of the character image.
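One way to obtain such a rectangular region is sketched below, assuming that it means the tightest axis-aligned rectangle enclosing all black pixels of the character. That reading, the function name `crop_to_bounding_box`, and the list-of-rows image format are assumptions for illustration.

```python
def crop_to_bounding_box(image):
    """Crop a binary image to the tightest rectangle containing black pixels.

    image: list of rows, each a list of 0/1 values (1 = black pixel).
    Returns an empty list if the image contains no black pixel.
    """
    rows = [y for y, row in enumerate(image) if any(row)]
    if not rows:
        return []
    width = len(image[0])
    cols = [x for x in range(width) if any(row[x] for row in image)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Cropping first means the histograms depend on the character's own height and width rather than on the margins left by segmentation.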

[0020] Preferably, the histogram used to create the 0-1 pattern representation is obtained as follows. For each of two or more predetermined directions, a histogram is created by dividing the count of black pixels of the character image's dot matrix projected from that direction by the number of pixels aligned in that direction. These histograms are then concatenated along the horizontal axis of the histogram.
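A minimal sketch of this normalized, concatenated histogram for the vertical and horizontal directions follows. The function name `projection_histogram`, the list-of-rows image format, and the order of concatenation (column profile first, then row profile) are assumptions; the source fixes only the normalization and the fact that the per-direction histograms are joined along the horizontal axis.

```python
def projection_histogram(image):
    """Concatenate normalized column and row projections of a binary image.

    image: list of rows, each a list of 0/1 values (1 = black pixel).
    Every value lies in [0, 1]: the black-pixel count along a line of
    projection divided by the number of pixels on that line.
    """
    height = len(image)
    width = len(image[0])
    # Projection along the vertical direction: one value per column.
    column_profile = [
        sum(image[y][x] for y in range(height)) / height
        for x in range(width)
    ]
    # Projection along the horizontal direction: one value per row.
    row_profile = [sum(row) / width for row in image]
    return column_profile + row_profile
```

For a 2 × 2 image `[[1, 1], [0, 1]]` the column profile is `[0.5, 1.0]` and the row profile is `[1.0, 0.5]`. Dividing by the line length keeps the values comparable across character sizes.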

[0021] Preferably, the predetermined directions are the two directions vertical and horizontal, the two diagonal directions of ±45 degrees, or the four directions vertical, horizontal, and diagonal ±45 degrees.

[0022] Directions other than vertical, horizontal, and diagonal ±45 degrees may also be used as the predetermined directions.

[0023] Preferably, in creating the 0-1 pattern representation, the horizontal axis of the concatenated histogram is divided into L sections and the vertical axis into m sections to form m × L regions, and each of the m × L regions is assigned 1 if the histogram passes through it and 0 otherwise.
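The construction of the m × L grid can be sketched as below. The sketch assumes histogram values normalized to [0, 1], equal-width sections, and the simplest reading of "passes through": a region is set to 1 when at least one histogram value in that horizontal section falls into that vertical level. The function name `zero_one_pattern` and the row-0-is-lowest layout are illustrative assumptions.

```python
def zero_one_pattern(hist, L, m):
    """Quantize a normalized histogram into an m x L grid of 0/1 values.

    hist: list of values in [0, 1] (the concatenated histogram).
    Returns a list of m rows with L entries each; row 0 is the lowest
    vertical level and row m - 1 the highest.
    """
    n = len(hist)
    pattern = [[0] * L for _ in range(m)]
    for i, value in enumerate(hist):
        section = min(i * L // n, L - 1)       # horizontal section of bin i
        level = min(int(value * m), m - 1)     # vertical level; 1.0 maps to the top
        pattern[level][section] = 1
    return pattern
```

With `hist = [0.5, 1.0, 1.0, 0.5]`, `L = 2`, and `m = 4`, both sections contain the values 0.5 and 1.0, so levels 2 and 3 are set in each section and levels 0 and 1 stay empty.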

[0024] Also preferably, the L sections into which the horizontal axis i of the concatenated histogram is divided overlap one another by a predetermined width within each of the component histograms from which the concatenated histogram was formed.
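Overlapping sections within one component histogram could be laid out as in the following sketch. Extending every section by the same number of bins on both sides, and the names `overlapping_sections`, `n_bins`, `n_sections`, and `overlap`, are assumptions for illustration; the source says only that the sections overlap by a predetermined width.

```python
def overlapping_sections(n_bins, n_sections, overlap):
    """Return (start, end) bin ranges, end exclusive, for one histogram.

    Each section covers its own equal share of the bins plus `overlap`
    extra bins on each side, clipped to the histogram boundaries so that
    sections never reach into a neighboring component histogram.
    """
    base = n_bins // n_sections
    ranges = []
    for j in range(n_sections):
        start = max(0, j * base - overlap)
        if j == n_sections - 1:
            end = n_bins                      # last section absorbs any remainder
        else:
            end = min(n_bins, (j + 1) * base + overlap)
        ranges.append((start, end))
    return ranges
```

With 12 bins, 3 sections, and an overlap of 1, the ranges are `(0, 5)`, `(3, 9)`, and `(7, 12)`. A stroke whose peak shifts by a bin between typefaces then still falls inside the same section.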

[0025] Preferably, the maximization processing selects the m × L regions of the standard pattern's 0-1 pattern representation, one horizontal-axis section at a time for each of the L sections, from among the 0-1 pattern representations corresponding to the two or more character images of the same character type but different typefaces. For each section of the horizontal axis, it selects the m regions of the 0-1 pattern representation in which the number of regions equal to 1, among the m regions aligned along the vertical axis, is largest.

[0026] Preferably, the minimization processing selects the m × L regions of the standard pattern's 0-1 pattern representation, one horizontal-axis section at a time for each of the L sections, from among the 0-1 pattern representations corresponding to the two or more character images of the same character type but different typefaces. For each section of the horizontal axis, it selects the m regions of the 0-1 pattern representation in which the number of regions equal to 1, among the m regions aligned along the vertical axis, is smallest.
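The maximization and minimization processing can be sketched as a per-section column selection. The sketch assumes each 0-1 pattern is a list of m rows with L entries, returns the two results as a pair, and breaks ties by taking the first typeface in input order; the function names and the tie-breaking rule are illustrative assumptions not stated in the source.

```python
def select_columns(patterns, pick):
    """Build an m x L pattern by choosing, for every horizontal section,
    the column of one input pattern according to `pick` (max or min)
    applied to the number of 1s in that column."""
    m = len(patterns[0])
    L = len(patterns[0][0])
    result = [[0] * L for _ in range(m)]
    for j in range(L):
        # Column j of every typeface's pattern, as a list of m values.
        columns = [[p[k][j] for k in range(m)] for p in patterns]
        chosen = pick(columns, key=sum)   # first one wins on ties
        for k in range(m):
            result[k][j] = chosen[k]
    return result


def make_standard_pattern(patterns):
    """Return (maximized, minimized) patterns from two or more typefaces."""
    return select_columns(patterns, max), select_columns(patterns, min)
```

For two typefaces with patterns `[[1, 0], [1, 0], [0, 1]]` and `[[0, 1], [1, 1], [0, 1]]`, section 0 has two 1s in the first pattern and one in the second, while section 1 has one and three. The maximized result is therefore `[[1, 1], [1, 1], [0, 1]]` and the minimized result `[[0, 0], [1, 0], [0, 1]]`.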

[0027] The present invention (claim 12) is characterized in that, in claim 11, the histogram used to create the 0-1 pattern representation is obtained by creating, for each of two or more predetermined directions, a histogram in which the count of black pixels of the character image's dot matrix projected from that direction is divided by the number of pixels aligned in that direction, and concatenating these histograms along the horizontal axis of the histogram; and the 0-1 pattern representation is created by dividing the horizontal axis of the concatenated histogram into L sections and the vertical axis into m sections to form m × L regions, and assigning to each of the m × L regions 1 if the histogram passes through it and 0 otherwise.

[0028] The present invention (claim 13) is characterized in that, in claim 11 or 12, the maximization processing and the minimization processing each select the m × L regions of the standard pattern's 0-1 pattern representation, one horizontal-axis section at a time for each of the L sections, from among the 0-1 pattern representations corresponding to the two or more character images of the same character type but different typefaces. For each section of the horizontal axis, the maximization processing selects the m regions of the 0-1 pattern representation in which the number of regions equal to 1, among the m regions aligned along the vertical axis, is largest, and the minimization processing selects the m regions of the 0-1 pattern representation in which that number is smallest.

[0029] The present invention (claim 14) is characterized in that, in claim 11, the classification condition specifies the shape of the pattern of 0s and 1s in the 0-1 pattern representation.

[0030] The present invention (claim 15) is characterized in that, in claim 12, the classification condition consists of one or more sets of different criteria, and each criterion consists of a rule concerning the vertical-axis values held by a section of the horizontal axis in the 0-1 pattern representation and a rule concerning the horizontal-axis sections that satisfy that rule.

[0031] The character recognition apparatus and method according to the present invention focus on the fact that the sizes and mutual positional relationships of vertical and horizontal line segments are directly reflected in the structure of the vertical and horizontal histograms, and that diagonal line segments are reflected in the histograms of the two ±45-degree directions. By making full use of the 0-1 pattern representation of the histogram, the influence of noise is absorbed efficiently and the structural information of the character is used effectively.

[0032] That is, in the present invention, the input document image is first segmented into predetermined recognition units, for example individual characters, and histograms, for example in the vertical and horizontal directions, are created from each segmented recognition unit.

[0033] The 0-1 pattern representation extracted from the created histogram is then partially matched against the 0-1 pattern representations prepared in advance for each recognition target category, to search for candidate recognition target categories corresponding to the recognition unit and narrow down the category candidates (for example, character candidates). The resulting one or more candidate categories are output as the recognition result (or coarse classification result).

[0034] The present invention needs only simple operations such as comparison and counting. It requires none of the conventional distance calculation by pattern matching, nor complex image processing such as character image normalization or thinning. Histogram generation, a simple process, is the only image processing, which is a major advantage. By matching the unknown input character against the standard patterns with so little computation, a dramatic speedup of coarse classification is achieved.

[0035] Moreover, with the partial matching based on 0-1 pattern representations of character image histograms, which the present invention provides for the first time, the structural information of the character is directly reflected in the histogram as distinctive feature information, so characters can be recognized with high accuracy by the simple matching process described above.

【００３６】さて、発明に係る標準パターン作成方法及
び装置では、上記のような文字認識装置及び方法におい
て次のことが可能となる。Now, with the method and apparatus for creating a standard pattern according to the present invention, the following can be achieved in the above-described character recognition apparatus and method.

【００３７】まず、標準パターン作成方法では、同一字
種であるが互いに字体の異なる２以上の文字画像のドッ
トマトリクスから夫々ヒストグラムを作成し、作成され
た各ヒストグラムから夫々０−１パターン表現を作成
し、作成された各０−１パターン表現をもとに最大化処
理および最小化処理を行い、その結果を前記標準パター
ンとしている。First, in the standard pattern creating method, a histogram is created from a dot matrix of two or more character images having the same character type but different fonts, and a 0-1 pattern expression is created from each created histogram. Then, maximization processing and minimization processing are performed based on each of the created 0-1 pattern expressions, and the result is used as the standard pattern.
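
The combination step can be sketched as follows. This is a minimal illustration, assuming that "maximization" and "minimization" mean the element-wise maximum and minimum over the 0-1 pattern expressions of the sample fonts; the function names and the compatibility test are hypothetical and are not taken from the patent text.

```python
import numpy as np

def make_standard_pattern(patterns):
    """Combine 0-1 patterns of one character type in several fonts.

    Assumption: maximization = element-wise max (logical OR),
    minimization = element-wise min (logical AND).
    Returns the pair (upper, lower).
    """
    stack = np.stack([np.asarray(p, dtype=np.uint8) for p in patterns])
    return stack.max(axis=0), stack.min(axis=0)

def is_compatible(pattern, upper, lower):
    """Hypothetical check: lower <= pattern <= upper in every region."""
    p = np.asarray(pattern, dtype=np.uint8)
    return bool(np.all(lower <= p) and np.all(p <= upper))
```

Under this reading, every font used to build the pair lies between `lower` and `upper` by construction, which is consistent with the statement that such fonts always satisfy the matching condition.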

【００３８】この結果、上記のようにして作成した標準
パターンを用いると、入力文字画像と標準パターンを０
−１パターン表現にて部分照合した場合、入力文字画像
が前記標準パターンを作成する際に用いた字体であれ
ば、必ず部分照合の条件に適合するので、認識候補とし
て選ばれることになる。As a result, when an input character image is partially matched in the 0-1 pattern expression against a standard pattern created as described above, the input character image always satisfies the partial matching condition if it is in one of the fonts used to create that standard pattern, and the character is therefore selected as a recognition candidate.

【００３９】したがって、認識の対象となるフォントの
種類や文字サイズの範囲が与えられると、そのサンプル
セットを用いて字体の変動を吸収しかつ大分類で得られ
る平均候補字種数を所望の値以下とするような標準パタ
ーンセットの系統的作成法を与えることができる。Therefore, given the range of font types and character sizes to be recognized, the sample set for that range can be used to give a systematic method of creating a standard pattern set that absorbs font variation and keeps the average number of candidate character types obtained in the large classification at or below a desired value.

【００４０】次に、本発明による辞書大規模化に係る認
識の高速化について述べる。Next, a description will be given of the speeding up of the recognition according to the enlargement of the dictionary according to the present invention.

【００４１】日本語や中国語では、認識対象となる字種
数の増大（通常５千字種以上）と数多くの多様な字体が
存在することから１字種当り標準パターン数を平均５個
としても辞書の規模はたちまち数万個になる。本発明で
は、辞書大規模化に係る認識の高速化のために、０−１
パターン表現に現れる顕著な特徴の組合せによる分類条
件（分類基準）に基づいて標準パターンを予め適当数の
クラスに分類し、入力文字画像に対してはこの分類条件
に基づいて該当するクラスを決定しそのクラスの標準パ
ターンを用いて前述の大分類が行われる。この場合、実
際には各クラスの規模を例えば数千個程度としてよいの
で、文字画像のヒストグラムの０−１パターン表現に基
づく本手法においては、入力文字画像が該当するクラス
の決定は極めて正確かつ簡単にできる。In Japanese and Chinese, the number of character types to be recognized is large (usually 5,000 or more) and many different fonts exist, so even with an average of five standard patterns per character type the dictionary quickly grows to tens of thousands of patterns. To speed up recognition with such a large dictionary, the present invention classifies the standard patterns in advance into a suitable number of classes according to classification conditions (classification criteria) based on combinations of salient features appearing in the 0-1 pattern expression. For an input character image, the applicable class is determined from the same classification conditions, and the large classification described above is performed using the standard patterns of that class. In practice each class may contain, for example, several thousand patterns, so in this method, which is based on the 0-1 pattern expression of the character image histogram, the class of an input character image can be determined very accurately and simply.

【００４２】従って、本発明によれば、辞書の規模が大
規模、例えば５万個程度、の標準パターンの場合でも大
分類のための所要時間は未知入力が該当する標準パター
ンのクラスの大きさ（数千個程度）で定まる所要時間に
ほぼ等しいものとなる。Therefore, according to the present invention, even when the dictionary is large, for example about 50,000 standard patterns, the time required for the large classification is approximately equal to the time determined by the size (several thousand patterns) of the class of standard patterns to which the unknown input belongs.
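
The effect of dividing the dictionary into classes can be sketched as a simple lookup. The text does not specify the classification conditions at this point, so the key used here (the top row of the 0-1 pattern) is purely a hypothetical stand-in, as are all function names.

```python
from collections import defaultdict

def class_key(pattern):
    # Hypothetical classification condition: the top row of the pattern.
    # The actual conditions are combinations of salient features.
    return tuple(pattern[0])

def build_classes(dictionary):
    """Group (label, pattern) pairs of the dictionary by class key."""
    classes = defaultdict(list)
    for label, pattern in dictionary:
        classes[class_key(pattern)].append((label, pattern))
    return classes

def patterns_for(classes, input_pattern):
    """Return only the standard patterns in the class of the input."""
    return classes.get(class_key(input_pattern), [])
```

With such a partition, the large classification only has to examine the patterns returned by `patterns_for`, so its cost depends on the class size rather than on the size of the whole dictionary.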

【００４３】なお、従来の文字認識方法において標準パ
ターンのクラス分けを導入して認識の高速化を図ったも
のはなく、本発明は文字画像のヒストグラムを介して文
字の構造情報を活用する文字認識に独特の高速化法であ
るということができる。No conventional character recognition method speeds up recognition by dividing the standard patterns into classes, and the present invention can be regarded as a speed-up method specific to character recognition that uses the structural information of characters through the histogram of the character image.

【００４４】以上のように本発明によれば、印刷文字認
識の高精度化と高速化を効率良く実現することが可能と
なる。As described above, the present invention makes it possible to efficiently achieve both higher accuracy and higher speed in printed character recognition.

【００４５】[0045]

【発明の実施の形態】以下、図面を参照しながら発明の
実施の形態を説明する。Embodiments of the present invention will be described below with reference to the drawings.

【００４６】まず、本実施形態の文字認識で用いるヒス
トグラムとその０−１パターン表現を説明する。なお、
本発明に係る標準パターン作成方法は、この０−１パタ
ーン表現の標準パターン（つまり辞書登録する辞書パタ
ーン）の作成方法を工夫し、１つあるいはごく少数の標
準パターンで字体の相違する同一字種を高速かつ正確に
認識できるようにしたものである。また、本発明に係る
認識の高速化法は、標準パターンの数が増大しても（例
えば数万程度に増大したとしても）高速認識を可能とす
るものである。First, the histogram used in the character recognition of the present embodiment and its 0-1 pattern expression will be described. The standard pattern creation method according to the present invention refines the way in which standard patterns in this 0-1 pattern expression (that is, the dictionary patterns registered in the dictionary) are created, so that the same character type in different fonts can be recognized quickly and accurately with one or a very small number of standard patterns. The recognition speed-up method according to the present invention enables high-speed recognition even when the number of standard patterns increases (for example, to the order of tens of thousands).

【００４７】本実施形態で用いるヒストグラムとは、い
わゆる周辺分布であり、文字画像のドットマトリクス
を、縦、横あるいは±４５度などの方向に射影した黒画
素の計数値の分布である。ただし、本実施形態では、文
字画像のヒストグラムの対象範囲として、文字の高さと
幅で定められる内接矩形領域を定義している。The histogram used in the present embodiment is a so-called marginal distribution, which is a distribution of black pixel count values obtained by projecting a dot matrix of a character image in a vertical, horizontal or ± 45 degrees direction. However, in the present embodiment, an inscribed rectangular area defined by the height and width of the character is defined as the target range of the histogram of the character image.
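
The vertical and horizontal histograms described above can be sketched with NumPy. This is an illustrative sketch only: it assumes the character image is a 2-D array with 1 for black pixels and at least one black pixel, and the function names are hypothetical.

```python
import numpy as np

def bounding_box(img):
    """Crop a binary image to the rectangle spanned by its black pixels."""
    rows = np.flatnonzero(img.any(axis=1))
    cols = np.flatnonzero(img.any(axis=0))
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def histograms(img):
    """Return (vertical, horizontal) black-pixel counts inside the box.

    vertical[j]   = black pixels in column j (at most the box height)
    horizontal[k] = black pixels in row k    (at most the box width)
    """
    box = bounding_box(np.asarray(img, dtype=np.uint8))
    return box.sum(axis=0), box.sum(axis=1)
```

The ±45 degree histograms could be obtained in the same spirit by summing along diagonals, but that is not shown here.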

【００４８】図１（ａ）は、文字画像「本」を示すドッ
トマトリクスの一例を示している。内接矩形領域は、横
７ドット・縦８ドットを包含する領域である。（ｂ）
に、縦方向のヒストグラムの一例を、（ｃ）に横方向の
ヒストグラムの一例を示す。図２には、±４５度方向の
ヒストグラムの一例を夫々示す。また、様々な文字画像
の縦および横の２方向のヒストグラムの例を図３および
図４に示す。FIG. 1(a) shows an example of a dot matrix representing the character image 「本」. The inscribed rectangular area is the area containing 7 dots horizontally and 8 dots vertically. FIG. 1(b) shows an example of a vertical histogram and FIG. 1(c) an example of a horizontal histogram. FIG. 2 shows examples of the histograms in the ±45 degree directions. FIGS. 3 and 4 show examples of vertical and horizontal histograms of various character images.

【００４９】次に、０−１パターン表現について説明す
る。後述するように、この０−１パターン表現は、入力
文字のドットマトリクスから得たものと辞書登録された
ものとを後述する分類手順により照合するために用いら
れる。０−１パターン表現は予め定めたいくつかの方向
のヒストグラムを基にして作成するものであるが、ここ
では縦および横方向のヒストグラムを用いる場合につい
て説明する。Next, the 0-1 pattern expression will be described. As will be described later, the 0-1 pattern expression is used for collating a character obtained from a dot matrix of input characters with a character registered in a dictionary by a classification procedure described later. The 0-1 pattern expression is created based on histograms in some predetermined directions. Here, a case where vertical and horizontal histograms are used will be described.

【００５０】図５と図１４の（ａ），（ｂ）は、縦およ
び横方向のヒストグラムの例を、（ｃ）は０−１パター
ン表現（またはその一部分）の一例をそれぞれ示してい
る。以下、図５や図１４の（ａ），（ｂ）のヒストグラ
ムから、（ｃ）の０−１パターン表現を得るために行う
操作について、図５を参照しながら説明する。In FIGS. 5 and 14, (a) and (b) show examples of vertical and horizontal histograms, and (c) shows an example of the 0-1 pattern expression (or a part of it). The operation for obtaining the 0-1 pattern expression of (c) from the histograms of (a) and (b) in FIGS. 5 and 14 is described below with reference to FIG. 5.

【００５１】まず、図５のように、（ａ）のような縦方
向のヒストグラムと（ｂ）のような横方向のヒストグラ
ムを２つ並べる。（ａ），（ｂ）の各ヒストグラムの横
軸を、それぞれ、Ｌ１個およびＬ２個の区間に分割す
る。字体の変動を考慮して隣接する区間を適当な幅δだ
け重複させる。First, as shown in FIG. 5, a vertical histogram as in (a) and a horizontal histogram as in (b) are placed side by side. The horizontal axes of the histograms (a) and (b) are divided into L1 and L2 sections, respectively. To allow for font variation, adjacent sections are made to overlap by a suitable width δ.
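
One way to realize the division into overlapping sections is sketched below. The text does not say how the overlap δ is distributed, so this sketch assumes equal-width sections whose right edge is extended by `delta` bins; the function name is hypothetical.

```python
def split_sections(n, num_sections, delta=0):
    """Split n histogram bins into num_sections index ranges.

    Returns a list of (start, end) pairs with end exclusive.
    Assumption: each section is extended to the right by delta bins,
    so neighbouring sections share delta bins.
    """
    bounds = [k * n // num_sections for k in range(num_sections + 1)]
    return [(bounds[k], min(n, bounds[k + 1] + delta))
            for k in range(num_sections)]
```

For example, `split_sections(12, 3, 1)` gives the ranges `(0, 5)`, `(4, 9)` and `(8, 12)`.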

【００５２】図５において、縦および横方向のヒストグ
ラムの高さを、それぞれ、文字画像の高さおよび幅に対
する百分率（％）で表す。縦軸の区間［０，１００］を
ｍ個の区間に分割し、ｍ×（Ｌ１＋Ｌ２）個の領域にお
いて、その領域を通過するヒストグラムが存在するとき
その領域に１を、存在しないとき０を割り当てる。この
ようにして、（ｃ）のようなヒストグラムの０−１パタ
ーン表現を得る。ただし、ヒストグラムの高さが１００
％に等しいときは、縦軸１００％の境界を通過したと見
なす。In FIG. 5, the heights of the vertical and horizontal histograms are expressed as percentages (%) of the height and width of the character image, respectively. The interval [0, 100] on the vertical axis is divided into m sections, and each of the resulting m × (L1 + L2) regions is assigned 1 when the histogram passes through that region and 0 when it does not. This yields the 0-1 pattern expression of the histogram shown in (c). When the height of the histogram is equal to 100%, the histogram is regarded as having passed through the 100% boundary of the vertical axis.

【００５３】横軸の各区間は、座標ｉ（ｉ＝１，２，
…，Ｌ１＋Ｌ２）と呼ぶ。縦軸をしきい値θで表し、両
端と分割点をθｔ（ｔ＝１，２，…，ｍ＋１）で表す。
ただし、θ１＝１００％、θm+1 ＝０％とおく。する
と、ヒストグラムの０−１パターン表現の各領域は、
（ｉ，θｔ）、ｉ＝１，…，Ｌ１＋Ｌ２、ｔ＝１，２，
…，ｍで表される。Each section on the horizontal axis is called coordinate i (i = 1, 2, ..., L1 + L2). The vertical axis is represented by the threshold θ, and its two ends and division points are denoted θt (t = 1, 2, ..., m + 1), where θ1 = 100% and θm+1 = 0%. Each region of the 0-1 pattern expression of the histogram is then denoted (i, θt), with i = 1, ..., L1 + L2 and t = 1, 2, ..., m.
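
The construction of the 0-1 pattern expression from one histogram can be sketched as follows. This is a sketch under stated assumptions: the bands on the vertical axis are taken to be of equal width, the sections do not overlap (δ = 0), region (i, θt) is taken to be the band between θt+1 and θt, and a region counts as "passed through" when the tallest bar in the section rises above the lower edge of the band. The function name and parameters are hypothetical.

```python
import numpy as np

def zero_one_pattern(hist, full_scale, num_sections, m):
    """Quantize one histogram into an m x num_sections 0-1 array.

    hist        : black-pixel counts along one direction
    full_scale  : character height (vertical histogram) or width
                  (horizontal histogram), i.e. the 100% level
    Row 0 corresponds to t = 1 (the band just below 100%).
    Assumes len(hist) >= num_sections.
    """
    pct = 100.0 * np.asarray(hist, dtype=float) / full_scale
    n = len(pct)
    bounds = [k * n // num_sections for k in range(num_sections + 1)]
    thetas = np.linspace(100.0, 0.0, m + 1)  # theta_1 .. theta_{m+1}
    pattern = np.zeros((m, num_sections), dtype=np.uint8)
    for i in range(num_sections):
        peak = pct[bounds[i]:bounds[i + 1]].max()
        # Band t is reached when the peak exceeds its lower edge theta_{t+1}.
        pattern[:, i] = peak > thetas[1:]
    return pattern
```

The full expression for the vertical and horizontal directions would be the two resulting arrays placed side by side, for instance with `np.hstack`.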

【００５４】ここで、図５はＬ１＝３、Ｌ２＝４にした
例であり、また図１４はＬ１＝５、Ｌ２＝５、δ＝０、
ｍ＝１３とした例となっている。FIG. 5 is an example with L1 = 3 and L2 = 4, and FIG. 14 is an example with L1 = 5, L2 = 5, δ = 0 and m = 13.

【００５５】なお、±４５度の２方向のヒストグラムに
対する０−１パターン表現あるいは縦、横方向および±
４５度の４方向のヒストグラムに対する０−１パターン
表現等も全く同様に構成される。The 0-1 pattern expression for the two histograms in the ±45 degree directions, or for the four histograms in the vertical, horizontal and ±45 degree directions, is constructed in exactly the same way.

【００５６】ここで、０−１パターン表現の双対性につ
いて説明する。Here, the duality of the 0-1 pattern expression will be described.

【００５７】０−１パタ−ン表現はヒストグラムを表す
ことから、（Ａ）しきい値θｔを、θ１＝１００％から順次下に
（θm+1 ＝０％に）向けて、ｔ＝１，２，３，…，とス
キャンして行く過程では、領域（ｉ，θｔ）が１なら
ば、ｔ´≧ｔとなるすべての領域（ｉ，θｔ´）で１で
あり、領域（ｉ，θｔ）が０ならば、ｔ´≦ｔとなるす
べての領域（ｉ，θｔ´）で０である。Since the 0-1 pattern expression represents a histogram, the following holds. (A) When the threshold θt is scanned downward from θ1 = 100% (toward θm+1 = 0%) as t = 1, 2, 3, ..., if region (i, θt) is 1 then every region (i, θt') with t' ≥ t is 1, and if region (i, θt) is 0 then every region (i, θt') with t' ≤ t is 0.

【００５８】これとは逆に、次のことが言える。On the contrary, the following can be said.

【００５９】（Ｂ）しきい値θｔの目盛りは、上記
（Ａ）で１００％のところを０％に、０％のところを１
００％とし、θ１＝１００％から順次上に（θm+1 ＝０
％）に向けて、ｔ＝１，２，３，…，とスキャンして行
く過程では、領域（ｉ，θｔ）が０ならば、ｔ´≧ｔと
なるすべての領域（ｉ，θｔ´）で０であり、領域
（ｉ，θｔ）が１ならば、ｔ´≦ｔとなるすべての領域
（ｉ，θｔ´）で１である。(B) Let the scale of the threshold θt be inverted relative to (A), so that 100% in (A) becomes 0% and 0% becomes 100%. When θt is then scanned upward from θ1 = 100% (toward θm+1 = 0%) as t = 1, 2, 3, ..., if region (i, θt) is 0 then every region (i, θt') with t' ≥ t is 0, and if region (i, θt) is 1 then every region (i, θt') with t' ≤ t is 1.

【００６０】このような（Ａ）と（Ｂ）の関係を、ここ
では、０−１パターンの双対性と呼ぶ。Such a relationship between (A) and (B) is referred to herein as a 0-1 pattern duality.
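
Properties (A) and (B) can be checked mechanically. The sketch below reflects one reading of the text: (A) says each column, read from t = 1 downward, is a run of 0s followed by a run of 1s, and (B) describes the same pattern read on the inverted scale, that is, with the row order reversed. The function names are hypothetical.

```python
import numpy as np

def satisfies_a(pattern):
    """Columns never fall from 1 back to 0 as t increases."""
    p = np.asarray(pattern, dtype=np.int8)
    return bool(np.all(np.diff(p, axis=0) >= 0))

def satisfies_b(pattern):
    """Columns never rise from 0 back to 1 as t increases."""
    p = np.asarray(pattern, dtype=np.int8)
    return bool(np.all(np.diff(p, axis=0) <= 0))

def inverted_scale(pattern):
    """The same pattern indexed on the inverted threshold scale."""
    return np.asarray(pattern, dtype=np.int8)[::-1]

def dual(pattern):
    """Inverted scale with 0 and 1 exchanged; satisfies (A) again."""
    return 1 - inverted_scale(pattern)
```

Under this reading, exchanging 0 and 1 on the inverted scale yields a pattern that again has property (A), which is why the same algorithm can serve both the black-pixel and the white-pixel view.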

【００６１】大分類手順（後述）において上記（Ａ）に
基づいて行う分類手順を分類手順１Ａ、上記（Ｂ）に基
づく分類手順を分類手順１Ｂと呼び、ここでは縦方向お
よび横方向の０−１パターンを対象にしている。また、
細分類（後述）では、分類手順１Ａ、分類手順１Ｂと同
じ分類手順を、それぞれ分類手順２Ａ、分類手順２Ｂと
呼び、ここでは±４５度方向のヒストグラムに対する０
−１パターンを対象にしている。つまり、基本的な処理
手順はいずれも同じである。In the large classification procedure (described later), the classification procedure based on (A) above is called classification procedure 1A and the one based on (B) above is called classification procedure 1B; here they operate on the 0-1 patterns of the vertical and horizontal directions. In the fine classification (described later), the same procedures as classification procedures 1A and 1B are called classification procedures 2A and 2B, respectively; here they operate on the 0-1 patterns of the histograms in the ±45 degree directions. In other words, the basic processing is the same in every case.

【００６２】次に、本実施形態の文字認識装置について
説明する。本文字認識装置の構成は、基本的には、本発
明者が特願平６−３１２０４９号にて開示した新しい原
理に基づく文字認識装置と同様であるが、本実施形態の
文字認識装置では０−１パターン表現として持つ標準パ
ターンに特徴がある。Next, the character recognition device of this embodiment will be described. Its configuration is basically the same as that of the character recognition device based on a new principle that the present inventor disclosed in Japanese Patent Application No. 6-312049, but the character recognition device of this embodiment is characterized by the standard patterns it holds in the 0-1 pattern expression.

【００６３】この文字認識手法は、概略的には、文字画
像の各方向の周辺分布（ヒストグラム）に文字の構造に
関する情報が直接的に反映されていることに着目し、切
り出した入力文字画像のヒストグラムから得た０−１パ
ターン表現と用意してある各標準文字のヒストグラムか
ら得た０−１パターン表現とを部分照合するものであ
る。This character recognition method generally focuses on the fact that information on the structure of a character is directly reflected in a peripheral distribution (histogram) in each direction of the character image, and focuses on the extracted input character image. The partial matching is performed between the 0-1 pattern expression obtained from the histogram and the 0-1 pattern expression obtained from the histogram of each prepared standard character.

【００６４】図６に示すように本文字認識装置は、文書
画像入力部２、文字切り出し処理部４、大分類処理部
６、細分類処理部８、文字のドットマトリクスからヒス
トグラムを作成するヒストグラム作成部１０、大分類お
よび必要に応じて細分類に用いる０−１パターン表現の
標準辞書を持つ０−１パターン表現辞書１２を備えてい
る。As shown in FIG. 6, the character recognition apparatus comprises a document image input unit 2, a character cutout processing unit 4, a large classification processing unit 6, a fine classification processing unit 8, a histogram creation unit 10 that creates histograms from the dot matrix of a character, and a 0-1 pattern expression dictionary 12 that holds the standard dictionary of 0-1 pattern expressions used for the large classification and, where necessary, the fine classification.

【００６５】０−１パターン表現辞書１２には、認識対
象文字の（ｍ×（Ｌ１＋Ｌ２）個の領域すべてに対す
る）０−１パターン表現を標準パターンとして持ってい
る。認識対象文字ごとに、複数の作成条件（上記のｍ、
Ｌ１、Ｌ２、δ、あるいは後述する１つの標準パターン
にて代表するフォントの種類あるいはサイズの範囲等）
に夫々対応する０−１パターン表現を用意しておいても
良い。なお、本実施形態に係る標準パターン、その作成
方法、およびこれを用いての分類手順についての詳細は
後述する。The 0-1 pattern expression dictionary 12 holds, as standard patterns, the 0-1 pattern expressions of the characters to be recognized (covering all m × (L1 + L2) regions). For each character to be recognized, 0-1 pattern expressions corresponding to several creation conditions (the values of m, L1, L2 and δ above, or the range of font types or sizes represented by one standard pattern, described later) may be prepared. The standard patterns of this embodiment, the method of creating them, and the classification procedure that uses them are described in detail later.

【００６６】文書画像入力部２、文字切り出し処理部
４、ヒストグラム作成部１０、大分類処理部６および細
分類処理部８は、特願平６−３１２０４９号にて開示さ
れているものと同様である。以下、その要点を説明す
る。The document image input section 2, character cutout processing section 4, histogram creation section 10, large classification processing section 6, and fine classification processing section 8 are the same as those disclosed in Japanese Patent Application No. 6-312049. is there. Hereinafter, the main points will be described.

【００６７】文書画像入力部２は、例えばイメージスキ
ャナを用いて、書面に印刷された文字イメージを光学像
として取り込み電気信号に変換し、その後、予め定めら
れたしきい値に従い、その濃度値を白黒の２値に量子化
して出力する。The document image input unit 2 takes in a character image printed on a document as an optical image and converts it into an electric signal by using, for example, an image scanner, and then converts the density value according to a predetermined threshold value. Quantizes to black and white binary and outputs.

【００６８】文字切り出し処理部４は、用紙上の文字の
並びの規則性などに基づいた公知の手法により入力イメ
ージから文字を切り出す検切処理を行う。The character cutout processing section 4 performs segmentation processing that cuts characters out of the input image by a known method based on, for example, the regularity of the arrangement of characters on the sheet.

【００６９】ヒストグラム作成部１０は、前述のように
して、文字のドットマトリクスからヒストグラムを作成
する。The histogram creating section 10 creates a histogram from the character dot matrix as described above.

【００７０】大分類処理部６は、ヒストグラム作成部１
０を用い、切り出し文字のドットマトリクスから大分類
用のヒストグラム（ここでは縦方向および横方向のヒス
トグラム）を得る。そして、０−１パターン表現辞書１
２を用いて大分類処理（ここでは後述する分類手順１Ａ
／１Ｂ）を行い、未知入力文字に対する候補字種を絞
る。The large classification processing section 6 uses the histogram creation section 10 to obtain the histograms for the large classification (here, the vertical and horizontal histograms) from the dot matrix of the cut-out character. It then performs the large classification processing (here, classification procedures 1A and 1B described later) using the 0-1 pattern expression dictionary 12 and narrows down the candidate character types for the unknown input character.

【００７１】細分類処理部８は、大分類処理部６にて得
られた候補字種の集合をもとに出力すべき認識結果の選
定と、文字候補が複数ある場合にその優先順位の決定を
行う。その際、必要に応じて、切り出し文字のドットマ
トリクスから得た細分類用のヒストグラム（ここでは±
４５度方向の２つのヒストグラム）および０−１パター
ン表現辞書１２を用いた分類手順２Ａ／２Ｂ等を行い、
その結果をもとに認識結果を決定する。The fine classification processing unit 8 selects the recognition result to be output from the set of candidate character types obtained by the large classification processing unit 6 and, when there are several character candidates, determines their priority order. Where necessary, it performs classification procedures 2A and 2B and the like using the fine classification histograms obtained from the dot matrix of the cut-out character (here, the two histograms in the ±45 degree directions) and the 0-1 pattern expression dictionary 12, and determines the recognition result from the outcome.

【００７２】最終的な処理結果は、例えば文字コードと
して出力する。文字候補が複数ある場合は、優先順位第
１位のもののみあるいは優先順位情報とともにいくつか
を出力する。出力された文字コード等は、ＲＡＭもしく
は磁気ディスクなどの記憶装置に格納され、および／ま
たはＣＲＴやプリンタなどの表示装置に表示される。い
くつかの候補が出力された場合は、必要に応じてユーザ
に選択させる。The final processing result is output, for example, as a character code. If there are a plurality of character candidates, only the first one or several with the priority information are output. The output character code or the like is stored in a storage device such as a RAM or a magnetic disk, and / or displayed on a display device such as a CRT or a printer. When some candidates are output, the user is made to select as necessary.

【００７３】次に、大分類に用いる分類手順１Ａを説明
する。分類手順１Ａの分類アルゴリズムを、図７〜図９
に示す。この処理は、大きく４つの処理（ａ），
（ｂ），（ｃ），（ｄ）に分けられる。Next, classification procedure 1A used for the large classification will be described. The classification algorithm of classification procedure 1A is shown in FIGS. 7 to 9. The processing is broadly divided into four processes (a), (b), (c) and (d).

【００７４】（処理ａ）まず、初期値の設定を行う（ス
テップＳ１１）。区間の重複幅δおよび候補字種数を絞
り込む目安となる定数κを決める。また、初期値とし
て、分類の対象となる字種の集合Ωを、 Ω＝｛対象とする全字種（０−１パターン表現）の集
合｝とおき、縦軸をｍ個の区間に分割しｔの初期値を、ｔ＝１，θ１＝１００％，（例：θt+1 ＝θｔ−５％，
ｔ＝１，２，…，ｍ＝２０）とおく。座標ｉは最初はいずれも固定されていないもの
とする。(Process a) First, the initial values are set (step S11). The overlap width δ of the sections and a constant κ that serves as a guide for narrowing down the number of candidate character types are chosen. As initial values, the set Ω of character types to be classified is set to Ω = {the set of all target character types (in the 0-1 pattern expression)}, the vertical axis is divided into m sections, and t is initialized as t = 1, θ1 = 100% (for example, θt+1 = θt − 5%, t = 1, 2, ..., m = 20). Initially none of the coordinates i is fixed.

【００７５】また、（ｉ，θｔ），ｉ＝１，２，…，Ｌ
１＋Ｌ２の各領域がとる０，１の配列（横の配列）をｕ
で、同様に（ｉ，θt+1 ），（ｉ，θt+2 ）および
（ｉ，θt+3 ）に対するそれを、それぞれ、ｖ，ｗおよ
びｘで表す。以下では、ｕ，ｖ，ｗおよびｘを、それぞ
れ、単に配列と呼ぶ。The arrangement of 0s and 1s (a horizontal row) taken by the regions (i, θt), i = 1, 2, ..., L1 + L2, is denoted u, and the corresponding rows for (i, θt+1), (i, θt+2) and (i, θt+3) are denoted v, w and x, respectively. Below, u, v, w and x are each simply called an array.
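
Extracting the four arrays amounts to reading four consecutive rows of the 0-1 pattern. A minimal sketch, assuming the pattern is stored as a list of m rows with row index 0 corresponding to t = 1; the function name is hypothetical.

```python
def rows_uvwx(pattern, t):
    """Return the rows u, v, w, x for thresholds t, t+1, t+2, t+3.

    pattern : list of m rows, each of length L1 + L2
    t       : 1-based threshold index as used in the text
    """
    if t < 1 or t + 3 > len(pattern):
        raise ValueError("t, t+1, t+2, t+3 must all lie within 1..m")
    return tuple(pattern[t - 1 + k] for k in range(4))
```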

【００７６】ｕ，ｖ，ｗ，ｘは、その定め方から、０−
１パターン表現の部分パターンを表す（図５（ｃ）、図
１５（ｃ））。By construction, u, v, w and x represent partial patterns of the 0-1 pattern expression (FIG. 5(c), FIG. 15(c)).

【００７７】（処理ｂ）Ωに属する未知入力文字Ｘのヒ
ストグラムについて、しきい値θｔ，θt+1 ，θt+2 お
よびθt+3 について、それぞれ、配列ｕ，ｖ，ｗおよび
ｘを求める（図５（ｃ）、図１５（ｃ））、（ステップ
Ｓ１２）。(Process b) With respect to the histogram of the unknown input character X belonging to Ω, the arrays u, v, w and x are obtained for the threshold values θt, θt + 1, θt + 2 and θt + 3, respectively (FIG. 5 (c), FIG. 15 (c)), (Step S12).

【００７８】（処理ｃ）固定座標以外の座標で、ｕの値
が１となる座標が存在せず、かつｘの値が０となる座標
も存在しない場合（ステップＳ１３でＹｅｓの場合）、
θｔ＝θt+1 ，θt+1 ＝θt+2 ，θt+2 ＝θt+3 ，θt+
3 ＝θt+4 とおいて、処理ｂ（ステップＳ１２）へ移る
（ステップＳ１４）。(Process c) If, among the coordinates other than the fixed coordinates, there is no coordinate at which u is 1 and no coordinate at which x is 0 (Yes in step S13), the thresholds are advanced by one step, θt = θt+1, θt+1 = θt+2, θt+2 = θt+3, θt+3 = θt+4, and the process returns to process b (step S12) (step S14).

【００７９】その他の場合（ステップＳ１３でＮｏの場
合）、ｕの値が１となる座標をｉ、ｘの値が０となる座
標をｉ´とする（ステップＳ１５）。ヒストグラムの性
質から、ｉとｉ´は異なる。In other cases (No in step S13), coordinates where the value of u is 1 are i, and coordinates where the value of x is 0 are i '(step S15). From the nature of the histogram, i and i 'are different.

【００８０】さらに、ｕの値が１となる座標ｉではｗの
値は１、ｘの値が０となる座標ｉ´ではｗの値は０とな
る。そこでΩの中で、このようなすべての座標ｉ（１≦
ｉ≦Ｌ１＋Ｌ２）およびｉ´（１≦ｉ´≦Ｌ１＋Ｌ２）
について、ｗの値が、それぞれ、１および０となるすべ
ての字種からなる集合をＳｗ（ｉ＝１，ｉ´＝０）で表
す。ただし、このようなｉおよびｉ´の一方が存在しな
い場合には、括弧内に存在するもののみを記す。すなわ
ち、Ｓｗ（ｉ＝１，ｉ´＝０）の代わりにＳｗ（ｉ＝
１）またはＳｗ（ｉ´＝０）で表す（ステップＳ１
６）。Further, w is 1 at every coordinate i where u is 1, and w is 0 at every coordinate i' where x is 0. The set of all character types in Ω for which w is 1 at all such coordinates i (1 ≤ i ≤ L1 + L2) and 0 at all such coordinates i' (1 ≤ i' ≤ L1 + L2) is denoted Sw(i = 1, i' = 0). When only one of i and i' exists, only the one that exists is written in the parentheses; that is, Sw(i = 1) or Sw(i' = 0) is written instead of Sw(i = 1, i' = 0) (step S16).
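
The selection of the set Sw can be sketched as a filter over the current candidate set Ω. This is an illustrative sketch: Ω is assumed to be a dictionary from character label to 0-1 pattern (a list of rows), coordinates are 0-based, and excluding fixed coordinates from both lists is an assumption. All names are hypothetical.

```python
def coords_from_input(u, x, fixed=frozenset()):
    """Coordinates i with u[i] == 1 and i' with x[i'] == 0 for the input."""
    ones = [i for i, bit in enumerate(u) if bit == 1 and i not in fixed]
    zeros = [i for i, bit in enumerate(x) if bit == 0 and i not in fixed]
    return ones, zeros

def select_sw(omega, t, ones, zeros):
    """Keep the entries whose row w is 1 at `ones` and 0 at `zeros`.

    t is the 1-based threshold index, so w (threshold t+2) is row t+1.
    """
    selected = {}
    for label, pattern in omega.items():
        w = pattern[t + 1]
        if all(w[i] == 1 for i in ones) and all(w[i] == 0 for i in zeros):
            selected[label] = pattern
    return selected
```

Passing an empty list for `ones` or `zeros` corresponds to the cases written Sw(i' = 0) and Sw(i = 1) in the text.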

【００８１】そして、１≦｜Ｓｗ（ｉ＝１，ｉ´＝０）
｜≦κならば（ステップＳ１７でＹｅｓの場合）、候補
字種の集合としてＳｗ（ｉ＝１，ｉ´＝０）を出力して
（ステップＳ２３）、終了する。If 1 ≤ |Sw(i = 1, i' = 0)| ≤ κ (Yes in step S17), Sw(i = 1, i' = 0) is output as the set of candidate character types (step S23), and the process ends.

【００８２】｜Ｓｗ（ｉ＝１，ｉ´＝０）｜＞κならば
（ステップＳ１８でＹｅｓの場合）、ここで用いたｉの
みをすべて固定座標として、後述する処理ｄ（ステップ
Ｓ３０）へ移る（ステップＳ２８）。If |Sw(i = 1, i' = 0)| > κ (Yes in step S18), only the coordinates i used here are all set as fixed coordinates (step S28), and the process moves to process d (step S30), described later.

【００８３】１≦｜Ｓｗ（ｉ＝１）｜≦κならば（ステ
ップＳ１９でＹｅｓの場合）、候補字種の集合としてＳ
ｗ（ｉ＝１）を出力して（ステップＳ２４）、終了す
る。If 1 ≤ |Sw(i = 1)| ≤ κ (Yes in step S19), Sw(i = 1) is output as the set of candidate character types (step S24), and the process ends.

【００８４】｜Ｓｗ（ｉ＝１）｜＞κならば（ステップ
Ｓ２０でＹｅｓの場合）、ここで用いたｉをすべて固定
座標として、後述する処理ｄ（ステップＳ３０）へ移る
（ステップＳ２９）。If |Sw(i = 1)| > κ (Yes in step S20), all the coordinates i used here are set as fixed coordinates (step S29), and the process moves to process d (step S30), described later.

【００８５】１≦｜Ｓｗ（ｉ´＝０）｜≦κならば（ス
テップＳ２１でＹｅｓの場合）、候補字種の集合として
Ｓｗ（ｉ´＝０）を出力して（ステップＳ２５）、終了
する。If 1 ≤ |Sw(i' = 0)| ≤ κ (Yes in step S21), Sw(i' = 0) is output as the set of candidate character types (step S25), and the process ends.

【００８６】｜Ｓｗ（ｉ´＝０）｜＞κならば（ステッ
プＳ２２でＹｅｓの場合）、Ω＝Ｓｗ（ｉ´＝０）とお
き、θｔ＝θt+1 ，θt+1 ＝θt+2 ，θt+2 ＝θt+3 ，
θt+3 ＝θt+4 とおいて、処理ｂ（ステップＳ１２）へ
移る（ステップＳ２６）。If |Sw(i' = 0)| > κ (Yes in step S22), Ω is set to Sw(i' = 0), the thresholds are advanced by one step, θt = θt+1, θt+1 = θt+2, θt+2 = θt+3, θt+3 = θt+4, and the process returns to process b (step S12) (step S26).

【００８７】｜Ｓｗ（ｉ＝１，ｉ´＝０）｜＝０または
｜Ｓｗ（ｉ＝１）｜＝０または｜Ｓｗ（ｉ´＝０）｜＝
０ならば（ステップＳ２２でＮｏの場合）、候補字種の
集合として現段階で対象にしている字種の集合Ωを出力
して（ステップＳ２７）、終了する。If |Sw(i = 1, i' = 0)| = 0, |Sw(i = 1)| = 0 or |Sw(i' = 0)| = 0 (No in step S22), the set Ω of character types under consideration at this stage is output as the set of candidate character types (step S27), and the process ends.
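
The branching of process c on the size of Sw can be summarized in a small decision function. This is only a sketch of the branch structure described in steps S17 to S29; the function name and the returned action strings are hypothetical.

```python
def decide(sw_size, has_i, has_i_prime, kappa):
    """Choose the next action from |Sw| and which coordinates exist.

    has_i       : some non-fixed coordinate has u == 1
    has_i_prime : some non-fixed coordinate has x == 0
    """
    if not (has_i or has_i_prime):
        raise ValueError("step S13 applies: advance t instead")
    if sw_size == 0:
        return "output_omega"               # step S27
    if sw_size <= kappa:
        return "output_sw"                  # steps S23, S24, S25
    if has_i:
        return "fix_i_and_go_to_process_d"  # steps S28, S29
    return "set_omega_to_sw_and_advance_t"  # step S26
```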

【００８８】（処理ｄ）座標ｉ（１≦ｉ≦Ｌ１＋Ｌ２）
がすべて固定されているならば（ステップＳ３０でＹｅ
ｓの場合）、候補字種の集合として、ｉとｉ´が存在す
る集合はＳｗ（ｉ＝１，ｉ´＝０）を、ｉのみ存在する
場合はＳｗ（ｉ＝１）を出力して（ステップＳ３１）、
終了する。(Process d) If all the coordinates i (1 ≤ i ≤ L1 + L2) are fixed (Yes in step S30), the set of candidate character types is output (step S31) and the process ends: Sw(i = 1, i' = 0) is output when both i and i' exist, and Sw(i = 1) when only i exists.

【００８９】固定座標でない座標ｉが存在するとき（ス
テップＳ３０でＮｏの場合）、上述の２つの場合に応じ
て、Ω＝Ｓｗ（ｉ＝１，ｉ´＝０）、またはΩ＝Ｓｗ
（ｉ＝１）とおき、θｔ＝θt+1 ，θt+1 ＝θt+2 ，θ
t+2 ＝θt+3 ，θt+3 ＝θt+4とおいて、処理ｂ（ステ
ップＳ１２）へ移る（ステップＳ３２）。If there is a coordinate i that is not fixed (No in step S30), Ω is set to Sw(i = 1, i' = 0) or Sw(i = 1) according to which of the two cases above applies, the thresholds are advanced by one step, θt = θt+1, θt+1 = θt+2, θt+2 = θt+3, θt+3 = θt+4, and the process returns to process b (step S12) (step S32).

【００９０】以上が、分類手順１Ａの分類アルゴリズム
である。The above is the classification algorithm of the classification procedure 1A.

【００９１】大分類で用いられるもう１つの分類手順１
Ｂの分類アルゴリズムは、分類手順１Ａの分類アルゴリ
ズムにおいて、処理ｂ（ステップＳ１２）以降における
０を１に、１を０に置き換えることによって得られる。
このことは、分類手順１Ｂにおいて、０−１パターン表
現の０を１に、１を０に置き換えて、分類手順１Ａの分
類アルゴリズムを適用することと等価であることを意味
する。前述したように、この関係を分類手順１Ａと分類
手順１Ｂの双対性と呼ぶ。The classification algorithm of classification procedure 1B, the other procedure used in the large classification, is obtained from the algorithm of classification procedure 1A by replacing 0 with 1 and 1 with 0 in process b (step S12) and all subsequent steps. This means that classification procedure 1B is equivalent to replacing 0 with 1 and 1 with 0 in the 0-1 pattern expression and then applying the algorithm of classification procedure 1A. As noted above, this relationship is called the duality of classification procedures 1A and 1B.

【００９２】次に、細分類処理について簡単に説明す
る。Next, the sub-classification processing will be briefly described.

【００９３】大分類における分類手順１Ａは文字画像の
黒画素を、分類手順１Ｂは白画素を、それぞれベースと
する分類手順と見ることができ、ノイズの与える影響も
異なる。それゆえ、細分類では、両者の分類結果に共通
する候補字種が唯一である場合には、それは十分な信頼
性を持って真の候補と見なし、これを認識結果として出
力する。The classification procedure 1A in the large classification can be regarded as a classification procedure based on black pixels of a character image, and the classification procedure 1B can be regarded as a classification procedure based on white pixels, and the influence of noise differs. Therefore, in the sub-classification, when the candidate character type common to both classification results is only one, it is regarded as a true candidate with sufficient reliability and is output as a recognition result.
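
The rule for combining the two large classification results can be sketched as a set intersection. The function name and return convention are hypothetical.

```python
def combine_results(candidates_1a, candidates_1b):
    """Intersect the candidate sets of procedures 1A and 1B.

    Returns (result, common): result is the single common candidate
    when exactly one exists, otherwise None, in which case the
    candidates still have to be ranked.
    """
    common = set(candidates_1a) & set(candidates_1b)
    if len(common) == 1:
        return next(iter(common)), common
    return None, common
```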

【００９４】そうでない場合には、信頼性についての順
位を定める処理を行って複数候補を出力する。この場
合、必要に応じて、分類手順１Ａと分類手順１Ｂと同様
の分類アルゴリズムである分類手順２Ａと分類手順２Ｂ
を用いる。ここでの分類手順２Ａと分類手順２Ｂは、文
字画像のヒストグラムとして±４５度方向のヒストグラ
ムを使用するものであり、傾斜が±４５度に近い線分を
持つ文字の分類に特に有用である。Otherwise, processing that ranks the candidates by reliability is performed and several candidates are output. In this case, classification procedures 2A and 2B, which use the same classification algorithms as classification procedures 1A and 1B, are applied where necessary. Classification procedures 2A and 2B use the histograms in the ±45 degree directions as the histograms of the character image, and are particularly useful for classifying characters that contain line segments inclined at close to ±45 degrees.

【００９５】また、必要に応じて優先順位付けのために
ベクトルパターン間の距離計算を行う。未知入力文字Ｘ
に対する縦、横および斜め（±４５度）の４方向のヒス
トグラムの０−１パターンを一定の順序、例えば、縦、
横、−４５度、＋４５度の順に配列し、その横軸の各座
標、すなわち、各区間に通し番号を付し、各区間ｉにお
ける１の数をその値とする変数をｙｉで表す。ただし、
１≦ｉ≦Ｌ１＋Ｌ２＋Ｌ３＋Ｌ４＝ｎ。すると、ベクト
ルＹは、次のように定義される。Ｙ＝ｙ１＋ｙ２＋…＋ｙｎＹの定義から、各要素ｙｉは非負の正数値をとる。Where necessary, distances between vector patterns are also calculated for prioritization. The 0-1 patterns of the histograms of the unknown input character X in the four directions, vertical, horizontal and diagonal (±45 degrees), are arranged in a fixed order, for example vertical, horizontal, −45 degrees, +45 degrees. The coordinates on the horizontal axis, that is, the sections, are numbered consecutively, and the variable whose value is the number of 1s in section i is denoted yi, where 1 ≤ i ≤ L1 + L2 + L3 + L4 = n. The vector Y is then defined as Y = y1 + y2 + ... + yn. From this definition, each element yi takes a non-negative integer value.
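
The ranking by vector distance can be sketched as follows. This sketch treats Y as the vector of per-section counts of 1s and assumes the Euclidean distance, since the text does not name a particular metric; the function names are hypothetical.

```python
import numpy as np

def count_vector(pattern):
    """Number of 1s in each column (section) of a 0-1 pattern."""
    return np.asarray(pattern, dtype=np.int64).sum(axis=0)

def rank_candidates(input_pattern, candidates):
    """Order candidate labels by distance between count vectors.

    candidates : dict mapping label -> 0-1 pattern of the same shape
    Assumption: Euclidean distance; ties are broken by label.
    """
    y = count_vector(input_pattern)
    scored = [(float(np.linalg.norm(y - count_vector(p))), label)
              for label, p in candidates.items()]
    return [label for _, label in sorted(scored)]
```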

【００９６】なお、ここで用いるベクトルパターンとし
ては、標準パターン全体を用いても良いが、必要に応じ
て部分パターンあるいはヒストグラムの縦および横軸を
部分的により細かく分割するなどして柔軟に利用すると
効果的である。The vector pattern used here may be the entire standard pattern, but it is effective to use it flexibly as required, for example by using a partial pattern or by dividing the vertical and horizontal axes of the histogram more finely in places.

【００９７】図１０〜図１３に、大分類処理で分類手順
１Ａと分類手順１Ｂを行った後の小分類処理の手順の一
例を示す。ここでは、未知入力文字Ｘに対して、大分類
の結果得られた候補字種をもとに１位の候補字種の選択
や誤り検査などを行っている。この処理手順は一例であ
り、他の様々なバリエーションが考えられる。FIGS. 10 to 13 show an example of the fine classification procedure carried out after classification procedures 1A and 1B have been performed in the large classification. Here, for the unknown input character X, the first-ranked candidate character type is selected and error checks are performed on the basis of the candidate character types obtained from the large classification. This procedure is only an example, and many other variations are possible.

【００９８】なお、上述した実施形態では、大分類にお
ける分類手順１Ａ，１Ｂでは縦および横方向のヒストグ
ラムを、細分類における分類手順２Ａ，２Ｂでは±４５
度方向のヒストグラムを用いたが、各分類手順で縦、
横、±４５度方向の４つのヒストグラムを用いること
も、また±３０度や±６０度などの他の方向のヒストグ
ラムを用いることも可能であり、ようするに適宜設定す
ることが可能である。In the embodiment described above, classification procedures 1A and 1B of the large classification use the vertical and horizontal histograms, and classification procedures 2A and 2B of the fine classification use the histograms in the ±45 degree directions. However, each classification procedure may instead use all four histograms (vertical, horizontal and ±45 degrees), or histograms in other directions such as ±30 degrees or ±60 degrees; in short, the directions can be chosen as appropriate.

【００９９】また、大分類において分類手順１Ａまたは
分類手順１Ｂのいずれかのみを行い、細分類は行わない
構成にしても従来に比較して十分効果が得られる。すな
わち、細分類は本発明に必須ではなく、後処理の一手法
を示したものである。Further, even in a configuration in which only one of the classification procedure 1A and the classification procedure 1B is performed in the large classification and the sub-classification is not performed, a sufficient effect can be obtained as compared with the related art. That is, the sub-classification is not essential to the present invention, but shows one method of post-processing.

【０１００】以下では、まず、本実施形態に係る０−１
パターン表現の標準パターン（辞書パターン）、その作
成方法、および該標準パターンを用いての分類手順につ
いて詳細に説明する。In the following, the standard pattern (dictionary pattern) in the 0-1 pattern expression according to this embodiment, the method of creating it, and the classification procedure using the standard pattern are first described in detail.

【０１０１】特願平６−３１２０４９号にて開示された
文字認識の手順は、文字切出→ヒストグラムの生成→０
−１パターン表現→大分類（構造情報の抽出・標準パタ
ーンとの照合・候補字種の選択）→細分類→出力（１
位、２位候補等）である。そして、本発明は、前述の大
分類の際に使用される標準パターンとして、従来とは全
く異なり、与えられたサンプルセットのヒストグラムに
現れるフォントや文字のサイズ等の違いによる字体の変
動をすべて吸収することを可能にする標準パターンの作
成法を与えるものである。The character recognition procedure disclosed in Japanese Patent Application No. 6-312049 is: character cutout → histogram generation → 0-1 pattern expression → large classification (extraction of structural information, matching against standard patterns, selection of candidate character types) → fine classification → output (first candidate, second candidate, and so on). For the standard patterns used in this large classification, the present invention provides a creation method that is entirely different from conventional ones and that makes it possible to absorb all the font variation appearing in the histograms of a given sample set due to differences in font, character size and the like.

【０１０２】First, the conventional standard patterns used in classification algorithms are described. Conventionally, given a character recognition algorithm, achieving high-accuracy recognition with it requires creating, for each character type, one or more standard patterns that represent a fixed finite set (denoted S) of characters of the same character type with different character shapes. In recognition methods based on pattern matching, statistical processing, clustering, and the like are applied to the set S in the feature space that expresses character features, and a small number of statistical standard patterns representing the set are created for each character type.

【０１０３】In histogram-based character recognition, the 0-1 pattern expression obtained by quantizing the histograms of a character in each direction (vertical and horizontal, or vertical, horizontal, and diagonal ±45 degrees, etc.), as shown in FIG. 5 and FIG. 14, corresponds to the feature space. In the 0-1 pattern expression, at any coordinate i on the horizontal axis (that is, small section i), the vertical axis consists of a column of 1s extending upward from the bottom (corresponding to the black-pixel histogram) and a column of 0s extending downward from the top (corresponding to the white-pixel histogram). The vertical axis is divided into a suitable number (denoted m) of small sections.

【０１０４】As shown in FIG. 5 and FIG. 14, the vertical axis of the 0-1 pattern expression is measured in units of these small sections, so the vertical-axis value at the tip of the column of 1s (its boundary with the column of 0s) is given by the number of 1s in the column. Therefore, if the number of 1s at an arbitrary coordinate $i$ on the horizontal axis is denoted $y_i$, the 0-1 pattern expression is represented by the n-dimensional vector $Y$:
$$Y = (y_1, y_2, \ldots, y_n), \quad n = l_1 + l_2 + l_3 + l_4 \tag{1}$$
Here $l_1$, $l_2$, $l_3$, and $l_4$ are the numbers of small sections on the horizontal axes of the four histograms in the vertical, horizontal, and diagonal ±45-degree directions, respectively. The value of $n$ depends on which histograms are used; for example, when only the vertical and horizontal histograms are used, $n = l_1 + l_2$.
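As a minimal sketch of equation (1), the per-direction column heights can be concatenated into one vector. The function name `concat_pattern` and the concatenation order (vertical, horizontal, then the diagonals) are assumptions made for illustration; the text only fixes the total dimension n.

```python
from typing import List, Sequence


def concat_pattern(*histograms: Sequence[int]) -> List[int]:
    """Concatenate per-direction column heights into one vector Y (assumed order)."""
    y: List[int] = []
    for h in histograms:
        y.extend(h)  # each direction contributes l_k entries
    return y


vertical = [3, 0, 5]    # l1 = 3 small sections
horizontal = [2, 2]     # l2 = 2 small sections
y = concat_pattern(vertical, horizontal)  # n = l1 + l2 = 5
```

With all four directions supplied, the same call yields a vector of length l1 + l2 + l3 + l4.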

【０１０５】In Japanese Patent Application No. 6-312049, an arbitrary one of the 0-1 pattern expressions in the set S is taken as the standard pattern, variations in character shape are absorbed by the large classification method described above (classification procedure 1A, etc.), and, when they cannot be absorbed in this way, the basic remedy is to increase the number of standard patterns. In contrast, the present invention exploits the properties of histograms so that a standard pattern can be created for which, in that large classification method, whichever character of the set S of the same character type is presented as the unknown input, that character type is always included among the candidate character types. The method of creating standard patterns according to the present invention is described in detail below.

【０１０６】First, the basic concept is described.

【０１０７】(a) Focusing on the columns of 1s in the 0-1 pattern expression of a character image
Let $x_i$ denote the number of 1s at abscissa $i$ of the 0-1 pattern expression of an unknown input character X (the height of the column of 1s, which corresponds to the height of the black-pixel histogram), and let Y(1) be a 0-1 pattern expression whose numbers of 1s $y_i$ satisfy, for all abscissas,
$$y_i \ge x_i \quad (i = 1, 2, \ldots, n) \tag{2}$$
Then, by the nature of the histogram, at any abscissa $i$, Y(1) is also 1 wherever X is 1, so in the large classification method Y(1) is always selected as a candidate character type for X.

【０１０８】(b) Focusing on the columns of 0s in the 0-1 pattern expression of a character image
Let $x_i'$ denote the number of 0s at abscissa $i$ of the 0-1 pattern expression of the unknown input character X (the height of the column of 0s extending downward from the top of the 0-1 pattern expression, which corresponds to the height of the white-pixel histogram), and let Y(0) be a 0-1 pattern expression whose numbers of 0s $y_i'$ satisfy, for all abscissas,
$$y_i' \ge x_i' \quad (i = 1, 2, \ldots, n) \tag{3}$$
Then, by the nature of the histogram, at any abscissa $i$, Y(0) is also 0 wherever X is 0, so in the large classification method Y(0) is always selected as a candidate character type for X.

【０１０９】Since the number of sections on the vertical axis is $m$, the numbers of 1s at abscissa $i$, $x_i$ for X and $y_i$ for Y(0), are
$$x_i = m - x_i', \quad y_i = m - y_i'$$
Substituting these into equation (3) shows that Y(0) is a 0-1 pattern expression satisfying
$$y_i \le x_i \quad (i = 1, 2, \ldots, n) \tag{4}$$

【０１１０】Therefore, in the large classification method, if Y = (Y(1), Y(0)), expressed as the pair of Y(1) and Y(0), is associated with one character type, then equation (2) holds when attention is paid to the columns of 1s in the 0-1 pattern expression of the unknown input character X, and equation (4) holds when attention is paid to the columns of 0s. Character type Y is therefore always selected as one of the candidate character types for X.
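The two conditions can be sketched as a simple dominance test. This is an illustration only: the function name `is_candidate` and the arguments `ones_coords` and `zeros_coords` are hypothetical, and the rule by which classification procedures 1A and 1B choose which abscissas to examine is not part of this sketch.

```python
from typing import Sequence, Tuple

Pattern = Sequence[int]  # number of 1s at each abscissa, each value in 0..m


def is_candidate(x: Pattern,
                 standard: Tuple[Pattern, Pattern],
                 ones_coords: Sequence[int],
                 zeros_coords: Sequence[int]) -> bool:
    """Return True if the pair (Y(1), Y(0)) covers the unknown input x.

    ones_coords / zeros_coords are the abscissas examined when focusing on
    columns of 1s and columns of 0s; they are inputs here (an assumption).
    """
    y1, y0 = standard
    covers_ones = all(y1[i] >= x[i] for i in ones_coords)    # equation (2)
    covers_zeros = all(y0[i] <= x[i] for i in zeros_coords)  # equation (4)
    return covers_ones and covers_zeros


x = [3, 0, 5, 2]
standard = ([4, 1, 5, 2], [2, 0, 4, 1])
accepted = is_candidate(x, standard, range(4), range(4))
```

An input whose column at some examined abscissa is taller than Y(1), or shorter than Y(0), is rejected.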

【０１１１】Next, the method of creating a standard pattern that absorbs variations in character shape is described.

【０１１２】Even within the same character type, a variety of character shapes exists depending on the font and the character size. A standard pattern that absorbs such variations in character shape is created as follows.

【０１１３】Let S be the finite set of $r$ characters $Y_1, Y_2, \ldots, Y_r$ of the same character type but with different character shapes:
$$S = \{Y_1, Y_2, \ldots, Y_r\} \tag{5}$$
In what follows, to simplify the description, $Y_1, Y_2, \ldots, Y_r$ denote both the characters and their 0-1 pattern expressions.

【０１１４】Let $y_{1i}(1), y_{2i}(1), \ldots, y_{ri}(1)$ be the numbers of 1s at abscissa $i$ of the 0-1 pattern expressions $Y_1, Y_2, \ldots, Y_r$, respectively, and let $y_i(1)$ denote their maximum:
$$y_i(1) = \max\{y_{1i}(1), y_{2i}(1), \ldots, y_{ri}(1)\} \quad (i = 1, 2, \ldots, n) \tag{6}$$

【０１１５】It follows that
$$y_i(1) \ge y_{1i}(1),\ y_i(1) \ge y_{2i}(1),\ \ldots,\ y_i(1) \ge y_{ri}(1) \quad (i = 1, 2, \ldots, n) \tag{7}$$
holds. The 0-1 pattern expression whose number of 1s at abscissa $i$ is given by $y_i(1)$ ($i = 1, 2, \ldots, n$) is denoted Y(1).

【０１１６】For example, from three 0-1 pattern expressions of the same character type such as those in FIG. 15(a), (b), and (c), the 0-1 pattern expression Y(1) shown in FIG. 16(a) is obtained. As a comparison of the figures readily shows, the column of 1s at coordinate i of Y(1) in FIG. 16(a) is equal to the tallest column of 1s at coordinate i among FIG. 15(a), (b), and (c).

【０１１７】This processing is called maximization of the 0-1 pattern expression.

【０１１８】On the other hand, let $y_i(0)$ denote the minimum of $y_{1i}(1), y_{2i}(1), \ldots, y_{ri}(1)$:
$$y_i(0) = \min\{y_{1i}(1), y_{2i}(1), \ldots, y_{ri}(1)\} \quad (i = 1, 2, \ldots, n) \tag{8}$$

【０１１９】It follows that
$$y_i(0) \le y_{1i}(1),\ y_i(0) \le y_{2i}(1),\ \ldots,\ y_i(0) \le y_{ri}(1) \quad (i = 1, 2, \ldots, n) \tag{9}$$
holds. The 0-1 pattern expression whose number of 1s at abscissa $i$ is given by $y_i(0)$ ($i = 1, 2, \ldots, n$), and whose number of 0s is therefore $m - y_i(0)$, is denoted Y(0).

【０１２０】For example, from the 0-1 pattern expressions in FIG. 15(a), (b), and (c), the 0-1 pattern expression Y(0) shown in FIG. 16(b) is obtained. The column of 1s at coordinate i of Y(0) in FIG. 16(b) is equal to the shortest column of 1s at coordinate i among FIG. 15(a), (b), and (c).

【０１２１】This processing is called minimization of the 0-1 pattern expression.
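The maximization and minimization steps amount to an element-wise maximum and minimum over the set S. The following is a minimal sketch; the name `make_standard_pattern` is hypothetical, and each pattern is assumed to be a list holding the number of 1s at each abscissa.

```python
from typing import List, Sequence, Tuple


def make_standard_pattern(samples: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Return (Y(1), Y(0)) as per-abscissa counts of 1s for the set S."""
    if not samples:
        raise ValueError("S must contain at least one 0-1 pattern expression")
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all patterns must have the same dimension n")
    y_max = [max(col) for col in zip(*samples)]  # maximization, equation (6)
    y_min = [min(col) for col in zip(*samples)]  # minimization, equation (8)
    return y_max, y_min


S = [[3, 0, 5, 2], [4, 1, 5, 1], [2, 0, 4, 2]]
y1, y0 = make_standard_pattern(S)
```

By construction every member of S lies between the two envelopes at every abscissa, which is the property the text relies on when a member of S is presented as the unknown input.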

【０１２２】Next, Y(S) is defined as the pair of the 0-1 pattern expressions Y(1) and Y(0),
$$Y(S) = (Y(1), Y(0)) \tag{10}$$
and Y(S) is taken as the standard pattern that has the same character type as the set S and represents S. When candidate character types are selected by focusing on the columns of 1s in the 0-1 pattern expression of the unknown input character X, Y(1) is used; when they are selected by focusing on the columns of 0s, Y(0) is used. Writing the abscissas in the former case as $i$ and those in the latter case as $i'$, if $y_i(1) \ge x_i$ and $y_{i'}(0) \le x_{i'}$ hold for all $i$ and $i'$ selected for X, then Y(S) is always included among the narrowed-down candidate character types for X in the large classification.

【０１２３】For $Y_1, Y_2, \ldots, Y_r$ belonging to the set S, equations (7) and (9) hold, so whichever of them is presented as the unknown input, Y(S) is naturally always included among the narrowed-down candidate character types. A standard pattern Y(S) that absorbs all of the character-shape variation in the set S has thus been created.

【０１２４】Thus, according to the present embodiment, once the range of font types and character sizes to be recognized is given, a systematic method can be provided for creating, from that sample set, a set of standard patterns that absorbs variations in character shape and keeps the average number of candidate character types obtained by the large classification at or below a desired value.

【０１２５】Furthermore, in the large classification process, classification procedure 1A, which is based mainly on the columns of 1s in the 0-1 pattern expression, is executed using Y(1), and classification procedure 1B, which is based mainly on the columns of 0s, is executed using Y(0). Candidate character types can thereby be narrowed down effectively, taking advantage of both procedures.

【０１２６】Using such standard patterns therefore improves both the recognition accuracy and the recognition speed for printed characters more effectively.

【０１２７】The case in which character shape varies widely within the same character type is now described.

【０１２８】If the variation in character shape within the finite set $S = \{Y_1, Y_2, \ldots, Y_r\}$ of equation (5) is judged to be large, the number of candidate character types obtained by the large classification increases and the burden on the fine classification grows. In such a case, the set S may be divided into subsets (classes) of characters whose shapes are closer to one another, and a standard pattern representing each class may be created by the method described above. How many such classes to use depends on the value to which the average number of candidate character types obtained by the large classification is to be held. The number of classes is therefore expected to differ greatly from one character type to another. For example, within the range of characters such as JIS level 1 and level 2, the shapes of character types such as 「Ｔ」 and 「川」 depend little on the font, the printer type, or the character size, so the number of classes is likely to be one, or two at most.

【０１２９】The following procedure is conceivable as a method of dividing the set S of the same character type into a suitable number of classes of characters with closer shapes.

【０１３０】(i) Classification based on a clustering method
As the most general approach, a clustering method is applied to divide the set S of the same character type into a suitable number of classes, and a standard pattern representing each class is created by the method described above.

【０１３１】(ii) Verification of the classification by recognition experiments
Whether the division of the set S into classes by the clustering method is appropriate is verified by recognition experiments. That is, using the standard pattern representing each class, recognition experiments are performed on data of character shapes included in S and on data of character shapes not included in S. The validity of the classification is verified on the basis of the average number of narrowed-down candidates for the former, and, for the latter, the average number of candidates and the proportion of cases in which the true character type is among them (cumulative recognition rate). If the verification reveals a problem, the procedure returns to (i) above, and this is repeated until an appropriate result is obtained.
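Steps (i) and (ii) can be pictured with a small sketch. The text does not name a clustering algorithm, so the greedy grouping rule below, the threshold `max_spread`, and the function names are all assumptions chosen only for illustration; the recognition experiments of step (ii) are not modelled.

```python
from typing import List, Sequence, Tuple

Pattern = Sequence[int]


def split_into_classes(samples: Sequence[Pattern], max_spread: int) -> List[List[Pattern]]:
    """Greedy stand-in for clustering: a sample joins the first class whose
    per-abscissa spread (max minus min) would stay within max_spread."""
    classes: List[List[Pattern]] = []
    for s in samples:
        for cls in classes:
            if max(max(col) - min(col) for col in zip(*cls, s)) <= max_spread:
                cls.append(s)
                break
        else:
            classes.append([s])  # no existing class is close enough
    return classes


def standard_patterns(classes: Sequence[Sequence[Pattern]]) -> List[Tuple[List[int], List[int]]]:
    """One (Y(1), Y(0)) pair per class, by element-wise max and min."""
    return [([max(c) for c in zip(*cls)], [min(c) for c in zip(*cls)])
            for cls in classes]


samples = [[3, 0, 5], [4, 1, 5], [9, 8, 1], [8, 8, 2]]
classes = split_into_classes(samples, max_spread=2)
patterns = standard_patterns(classes)
```

A smaller `max_spread` yields more classes with tighter envelopes, which mirrors the trade-off the text describes between the number of standard patterns and the average number of candidates.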

【０１３２】In the following, a standard pattern creation apparatus according to one embodiment of the present invention, and a character recognition apparatus that includes the standard pattern creation function, are described.

【０１３３】FIG. 17 shows an example configuration of the standard pattern creation apparatus, and FIG. 18 is a flowchart outlining its procedure.

【０１３４】As shown in FIG. 17, the standard pattern creation apparatus comprises a character image data input unit 22, a histogram creation unit 24, a 0-1 pattern expression creation unit 26, and a standard pattern creation unit 28.

【０１３５】First, the standard characters used to create a standard pattern, which are of the same character type but differ from one another in character shape, are determined (step S201). In determining the standard characters, (i) the classification based on a clustering method and (ii) the verification of the classification by recognition experiments, both described above, are performed as necessary.

【０１３６】The character image data input unit 22 uses, for example, an image scanner to capture each standard character printed on a document as an optical image, converts it into an electric signal, and then quantizes the density values into two levels, black and white, according to a predetermined threshold. Alternatively, it reads the dot matrix corresponding to each standard character already stored on an information storage medium such as a magnetic disk (step S202).

【０１３７】If a histogram or a 0-1 pattern expression corresponding to a standard character already exists on a magnetic disk or the like, it may be read instead. Needless to say, when a histogram is read, the subsequent step S203 is unnecessary for that standard character, and when a 0-1 pattern expression is read, the subsequent steps S203 and S204 are unnecessary for that standard character.

【０１３８】Next, the histogram creation unit 24, which has basically the same function as the histogram creation unit 10 of FIG. 6, creates a histogram from the dot matrix of each standard character as described above (step S203).

【０１３９】Next, the 0-1 pattern expression creation unit 26 creates a 0-1 pattern expression from the dot matrix of each standard character as described above (step S204).

【０１４０】Finally, the standard pattern creation unit 28 applies the maximization and minimization described above, at each coordinate i, to the 0-1 pattern expressions obtained for the standard characters, and thereby obtains the standard pattern (step S205).
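Steps S202 to S205 can be strung together as in the following sketch for a single histogram direction. The quantization rule used here (tallest bar per section, scaled to m levels and rounded up) is an assumption made for illustration and is not taken from the description; the function names and the parameter `full_scale` are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def column_histogram(dots: Sequence[Sequence[int]]) -> List[int]:
    """S203: count black pixels (1s) in each pixel column of a binary dot matrix."""
    return [sum(col) for col in zip(*dots)]


def to_01_pattern(hist: Sequence[int], n_sections: int, m: int, full_scale: int) -> List[int]:
    """S204: quantize a histogram to n_sections column heights in 0..m (assumed rule)."""
    if not 1 <= n_sections <= len(hist):
        raise ValueError("n_sections must be between 1 and len(hist)")
    heights = []
    for k in range(n_sections):
        lo = k * len(hist) // n_sections
        hi = (k + 1) * len(hist) // n_sections
        peak = max(hist[lo:hi])  # tallest bar inside this small section
        heights.append(min(m, math.ceil(peak * m / full_scale)))
    return heights


def standard_pattern(patterns: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """S205: element-wise maximization and minimization over the standard characters."""
    return ([max(c) for c in zip(*patterns)], [min(c) for c in zip(*patterns)])


dots_a = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0], [1, 1, 1, 0]]
dots_b = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 1]]
patterns = [to_01_pattern(column_histogram(d), n_sections=2, m=4, full_scale=4)
            for d in (dots_a, dots_b)]
y_max, y_min = standard_pattern(patterns)
```

Inputs that already exist as histograms or 0-1 pattern expressions would simply enter this chain at the corresponding later function.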

【０１４１】The standard pattern created in this way is registered, for example, in the 0-1 pattern expression dictionary 12 of FIG. 6. In addition, (i) the classification based on a clustering method and (ii) the verification of the classification by recognition experiments are performed as necessary, and, if the results so require, the standard patterns already registered in the 0-1 pattern expression dictionary 12 are updated to more appropriate ones.

【０１４２】FIG. 19 shows an example of a character recognition apparatus that includes this standard pattern creation function. This apparatus combines FIG. 6 and FIG. 17. Except that the histogram creation unit 10 of FIG. 6 also serves as the histogram creation unit 24 of FIG. 17, parts bearing the same reference numerals in FIG. 19 and in FIG. 6 and FIG. 17 have the same configuration.

【０１４３】Although the recognition unit has been described above as a single character, for characters that separate vertically or horizontally (for example, 「門」 and 「乱」), each separated part may be prepared as a character type in the above sense, and the input image and the standard patterns may be compared and recognized part by part.

【０１４４】The number of character types subject to character recognition usually exceeds several thousand in Japanese and Chinese. Furthermore, considering that character shapes differ with the wide variety of fonts, printers, character sizes, and so on, the total number of standard patterns required, that is, the size of the character recognition dictionary, is expected to exceed several tens of thousands even when the standard pattern creation method described above is used. With conventional methods, the time required for character recognition increases roughly in proportion to the size of the dictionary.

【０１４５】In such a case, the standard patterns (0-1 pattern expressions) created as described above are classified in advance into several classes by a method based on salient features appearing in the 0-1 pattern expressions. Prior to the large classification process, the applicable class is determined from the 0-1 pattern expression of the unknown input character image, and the large classification process is performed only on the standard patterns belonging to that class. In this way, even when the character recognition dictionary becomes large, recognition can be sped up while high recognition accuracy is maintained.

【０１４６】In the following, a method of speeding up recognition for a large-scale character recognition dictionary is described.

【０１４７】This method applies the basic idea of the large classification for character recognition disclosed in the aforementioned Japanese Patent Application No. 6-312049: the standard patterns are classified in advance into a suitable number of classes, for example a dozen or so, and the class of standard patterns to be collated is determined from the 0-1 pattern expression of the unknown input character image by accurate and simple comparison and judgment. In outline, the method proceeds as follows: (1) setting the classification conditions; (2) classifying the standard patterns according to the classification conditions that have been set; (3) adding standard patterns to other classes, as necessary, to avoid misclassification in the large classification process; and (4) performing step 2, or steps 2 and 3, for all standard patterns.

【０１４８】The specific method is described below.

【０１４９】First, the setting of the classification conditions for the standard patterns is described.

【０１５０】When classifying the 0-1 pattern expressions of the vertical, horizontal, and diagonal (±45 degrees) histograms of character images, many settings of the classification conditions are possible; one example is given below to show the basic idea.

【０１５１】Here, the classification conditions are expressed using the following four symbols.

【０１５２】$A_i(u)$: In the 0-1 pattern expression, the region where the vertical-axis scale exceeds u% consists entirely of columns of 0s.

【０１５３】$B_i(v)$: In the 0-1 pattern expression, the region at heights of v% or less consists entirely of columns of 1s.

【０１５４】$C_i(w, p)$: In the 0-1 pattern expression, there exist at most p columns of 1s with a height of w% or more, where $p \ge 1$.

【０１５５】$D(y)$: In the 0-1 pattern expression, a column of 1s with a height of y% or more exists in the section at one or both ends of the horizontal axis.

【０１５６】Here each symbol is a propositional variable that, according to the subscript i = 1, 2, 3, 4, refers to the 0-1 pattern expression of the vertical, horizontal, or diagonal (±45 degrees) histogram of the character image in question, and that takes the value 1 (True) when the proposition stated in the sentence to its right holds and 0 (False) when it does not.

【０１５７】For example, $A_1(u) = 1$ indicates that, in the 0-1 pattern expression of the vertical histogram of the character image in question, the region where the vertical-axis scale exceeds u% consists entirely of columns of 0s, and $A_1(u) = 0$ indicates that this does not hold.

【０１５８】Similarly, $C_2(w, p) = 1$ indicates that, in the 0-1 pattern expression of the horizontal histogram of the character image in question, there exist at most p ($p \ge 1$) columns of 1s with a height of w% or more.

【０１５９】Furthermore, by combining the four conditions in various ways, such as $A_1(u) = 1$ and $C_2(w, p) = 1$, or $A_1(u) = 1$ and $C_2(w, p) = 0$, and by the choice of parameters such as u, v, w, and p, the various classification conditions needed to classify the standard patterns can be expressed.
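The four conditions can be written as simple predicates over the column heights of one histogram direction. This is a sketch under stated assumptions: heights are counts of 1s out of m vertical sections, the function names are hypothetical, and treating a pattern with no column of height w% or more as C = 0 is an interpretation of "there exist", which the text does not spell out.

```python
from typing import Sequence


def cond_A(heights: Sequence[int], m: int, u: float) -> bool:
    """A(u): nothing but 0s above u%, i.e. no column of 1s is taller than u%."""
    return all(100 * h <= u * m for h in heights)


def cond_B(heights: Sequence[int], m: int, v: float) -> bool:
    """B(v): nothing but 1s at or below v%, i.e. every column reaches v%."""
    return all(100 * h >= v * m for h in heights)


def cond_C(heights: Sequence[int], m: int, w: float, p: int) -> bool:
    """C(w, p): between 1 and p columns of 1s are at least w% high (assumed reading)."""
    tall = sum(1 for h in heights if 100 * h >= w * m)
    return 1 <= tall <= p


def cond_D(heights: Sequence[int], m: int, y: float) -> bool:
    """D(y): an end section of the horizontal axis has a column at least y% high."""
    return 100 * heights[0] >= y * m or 100 * heights[-1] >= y * m


heights = [2, 8, 3, 1]   # one direction, m = 10 vertical sections
class_key = (cond_A(heights, 10, 80), cond_C(heights, 10, 50, 2))
```

A tuple of such truth values, one per chosen condition and direction, can serve as the class label of a standard pattern or of an unknown input.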

【０１６０】Next, a specific classification method based on such classification conditions is described.

【０１６１】By selecting and using classification conditions suited to the characteristics of the character set to be recognized, the tens of thousands of standard patterns in the dictionary can be classified into a suitable number of classes, each consisting of several thousand standard patterns or fewer. In doing so, each class is constructed so that a standard pattern which, given variations in character shape, noise, and the like, might fall into more than one class belongs to all of the classes concerned.

【０１６２】For example, when $A_1(u)$ is the classification condition, the standard patterns are classified into the class for which $A_1(u)$ is 1, that is, standard patterns consisting entirely of columns of 0s in the region where the vertical-axis scale exceeds u%, and the class for which $A_1(u)$ is 0, that is, standard patterns that have columns of 1s in the region exceeding u% as well. However, the former class may contain standard patterns with a column of 1s whose height does not exceed u% but equals or is close to u%. Likewise, the latter class may contain standard patterns with a column of 1s whose height exceeds u% but is close to it. Consequently, when a character image corresponding to such a standard pattern is input, variations in character shape or noise may cause a misclassification in which an input belonging to the former class is assigned to the latter, or one belonging to the latter is assigned to the former.

【０１６３】To avoid such misclassification, a region u ± δ, where δ > 0, is set around u% on the vertical-axis scale of the 0-1 pattern expression. All standard patterns for which $A_1(u)$ is 1 but $A_1(u - \delta)$ is 0, that is, those whose column-of-1s height is u% or less but exceeds (u − δ)%, are also added to the class for which $A_1(u)$ is 0. Conversely, all standard patterns for which $A_1(u)$ is 0 but $A_1(u + \delta)$ is 1, that is, those whose column-of-1s height exceeds u% but is (u + δ)% or less, are also added to the class for which $A_1(u)$ is 1. The magnitude of δ must be set to an appropriate value in view of the magnitude of the character-shape variation and noise.
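The margin rule for condition A can be sketched as a function that returns every class under which a standard pattern is registered. The name `classes_for_A` is hypothetical, and `max_height_pct` is assumed to be the height of the tallest column of 1s expressed as a percentage of the vertical axis.

```python
from typing import Set


def classes_for_A(max_height_pct: float, u: float, delta: float) -> Set[int]:
    """Values of A(u) under which a standard pattern is registered."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    classes = {1 if max_height_pct <= u else 0}   # the pattern's own class
    if u - delta < max_height_pct <= u:
        classes.add(0)   # A(u) = 1 but A(u - delta) = 0: also register under A(u) = 0
    if u < max_height_pct <= u + delta:
        classes.add(1)   # A(u) = 0 but A(u + delta) = 1: also register under A(u) = 1
    return classes


registered = classes_for_A(max_height_pct=58, u=60, delta=5)
```

Patterns well away from the threshold stay in a single class, so the duplication is confined to the band of width 2δ around u%.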

【０１６４】In the case of $C_i(w, p)$, w requires the same consideration as u, with a region w ± α, where α > 0. The parameter p is the number of columns of 1s with a height of w% or more. The number of columns of 1s varies mainly because, when a pillar-shaped part of the black-pixel histogram falls on the boundary between sections of the horizontal axis during conversion to the 0-1 pattern expression, a column of 1s is produced in each of the sections on either side of the boundary, giving two columns of 1s in total. In character images the width of such a pillar-shaped part is usually much smaller than the section width of the horizontal axis, so unless it lies on a boundary it is converted into a single column of 1s.

[0165] For example, let p = 2. The class for which C_i(w, p) is 1 consists of standard patterns having one or two columns of 1s of height w% or more, and the class for which C_i(w, p) is 0 consists of standard patterns having three or more such columns. Considering font variation and noise, the standard patterns that may move from the latter class to the former must therefore also be added to the former (this does not mean removing them from the latter). In this example it suffices to add the following standard patterns: those having three columns of 1s of height w% or more, two of which are adjacent (these two may become one), and those having four such columns that form two adjacent pairs (each adjacent pair may become one). Conversely, the only standard patterns that may move from the former class to the latter are those having two columns of 1s of height w% or more (if at least one of the two becomes two columns of 1s, the pattern moves to the latter class); these should also be added to the latter.
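The p = 2 example above can be written out as a small sketch. The representation is an assumption made only for illustration: `heights[j]` is the height, in percent, of the column of 1s in horizontal section j, and the function name is hypothetical. The source does not say how a pattern with no column of height w% or more is classified; the sketch treats it like the one-or-two case, which is likewise an assumption.

```python
def classes_for_p2(heights: list[float], w: float) -> set[int]:
    """Return the C_i(w, 2) classes (0 and/or 1) in which a standard pattern is stored."""
    tall = [j for j, h in enumerate(heights) if h >= w]  # sections with height >= w%
    n = len(tall)
    # adjacent[k] is True when the k-th and (k+1)-th tall columns are neighbours.
    adjacent = [tall[k + 1] - tall[k] == 1 for k in range(n - 1)]
    if n <= 2:  # n == 0 is handled here by assumption (not covered by the source)
        classes = {1}
        if n == 2:
            classes.add(0)  # either column may split in two, giving three or more
    else:
        classes = {0}
        if n == 3 and any(adjacent):
            classes.add(1)  # the adjacent pair may merge into one column
        if n == 4 and adjacent[0] and adjacent[2]:
            classes.add(1)  # both adjacent pairs may merge, leaving two columns
    return classes
```

With six sections and w = 50, for instance, a pattern with tall columns in sections 0, 1 and 3 is stored in both classes, while one with tall columns in sections 0, 2 and 4 is stored only in the class where C_i(w, 2) is 0.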

[0166] By classifying the standard patterns in advance using various classification conditions in this way, the class to which the 0-1 pattern expression of an unknown input character image belongs can be found with very high accuracy on the basis of the classification conditions, and candidate character types can then be selected from within that class quickly and accurately by a large classification method such as that of Japanese Patent Application No. 6-312049.

[0167] A character recognition system is now described that applies the above-described speed-up method for a large-scale character recognition dictionary, that is, the method using class-classified standard patterns. This system can be realized by suitably modifying the character recognition systems of FIGS. 6 and 19, the standard pattern creation apparatus of FIG. 17, and so on. Everything other than the modified portions is basically the same as described so far, so only the modified portions are described.

[0168] First, the class classification of the standard patterns described above is performed by a class classification unit provided downstream of the standard pattern creation unit 28 in the standard pattern creation apparatus of FIG. 17 or in the character recognition system of FIG. 19.

[0169] The class-classified standard patterns are registered in the 0-1 pattern expression dictionary 12 of the character recognition system of FIG. 6 or FIG. 19. That is, for each standard pattern, the 0-1 pattern expression data and information indicating the class to which it belongs are stored in association with each other.

[0170] When an unknown input character is input to the character recognition system, as preprocessing for the large classification performed by the large classification processing unit 6, the 0-1 pattern expression obtained from the unknown input character image is subjected to class classification under the same conditions as those used to classify the standard patterns, and the class to which the unknown input character belongs is obtained. Needless to say, this class can be determined very accurately and easily.

[0171] In the large classification processing, the search is then restricted to the standard patterns belonging to the same class as the class obtained for the unknown input character.
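The registration and class-restricted search described in the preceding paragraphs can be sketched as follows. This is only an illustration under stated assumptions: the class name `ClassIndexedDictionary`, the pattern layout, and the two callables `classify` (the classification conditions) and `matches` (the partial collation of the large classification method) are all hypothetical placeholders, since this section does not define them.

```python
from typing import Callable, Hashable

Pattern = tuple[tuple[int, ...], ...]  # L columns of m bits each (hypothetical layout)


class ClassIndexedDictionary:
    """Standard patterns grouped by class, so matching only visits one class."""

    def __init__(self, classify: Callable[[Pattern], Hashable]) -> None:
        self._classify = classify
        self._by_class: dict[Hashable, list[tuple[str, Pattern]]] = {}

    def register(self, char_type: str, pattern: Pattern, class_keys: set) -> None:
        # A standard pattern may be stored under several classes (margin rule).
        for key in class_keys:
            self._by_class.setdefault(key, []).append((char_type, pattern))

    def candidates(self, unknown: Pattern,
                   matches: Callable[[Pattern, Pattern], bool]) -> list[str]:
        # Preprocessing: classify the input under the same conditions.
        key = self._classify(unknown)
        # Large classification restricted to the standard patterns of that class.
        return [c for c, p in self._by_class.get(key, []) if matches(unknown, p)]
```

Because the lookup touches only one class, the number of standard patterns compared per input depends on the class size rather than on the size of the whole dictionary.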

[0172] In this way, high-speed, high-accuracy character recognition can easily be achieved even when the dictionary becomes large.

[0173] The effects obtained by this embodiment are described below.

[0174] In Japanese Patent Application No. 6-312049, the standard patterns prepared in advance as the dictionary are the character data (samples) for one set of the character set to be recognized. Variation in the font of each character type is handled by the large classification method, and characters that it cannot handle and that are therefore misrecognized are added to the dictionary as standard patterns. Consequently, for a recognition target consisting of characters in multiple fonts and sizes, a large-classification learning experiment is run on a learning set, and the misrecognized characters are added as standard patterns to the dictionary prepared in advance. Actual recognition is then performed using the dictionary reinforced in this way.

[0175] In contrast, the method of creating standard patterns (a dictionary) according to the present invention makes the following possible.

[0176] (1) Creation of standard patterns (a dictionary) that absorb diverse font variations

As described above, once the fonts and sizes of the characters to be recognized are given, a near-minimal number of standard patterns (a dictionary) that completely absorb the font variation of each character type can be created immediately on the basis of that one set.

[0177] A concrete example follows.

[0178] Standard patterns were created by the method of the present invention for the 2,965 JIS level-1 kanji character types, using phototypeset Ishii Mincho at 6, 7, 8, 10.5, 11, 12, 14, and 20 points and Ishii bold Mincho (石井太明朝体), L Mincho, and M Mincho at 9 points, and a large classification experiment was performed using the large classification method mentioned above. The horizontal axes of the vertical and horizontal histograms of the character image were each divided into 6 sections, and the horizontal axes of the ±45-degree histograms were each divided into 3 sections, that is, 6, 6, 3, 3. (Needless to say, increasing the number of horizontal-axis divisions narrows the candidate character types in large classification to a smaller number.)

[0179] In this case there are nine character sizes and four fonts, and there are 11 samples differing in size or font for each character type (corresponding to the set S described above). Each such set of 11 characters of the same character type but different fonts was represented by a single standard pattern per character type (Y(S) described above), and a large classification experiment was performed on the 2,965 kanji character types using every character that went into creating the standard patterns as an unknown input.

[0180] As a result, for Ishii Mincho the average number of candidates was about 4 at every size, and the true candidate was included 100% of the time. When the horizontal-axis divisions were changed to 7, 7, 3, 3, the average number of candidates for Ishii Mincho fell to about 1.5.

[0181] Furthermore, a large classification experiment on 22-point Ishii Mincho, which was not used in creating the standard patterns, for the 2,965 kanji character types gave an average of 3.8 candidates, with the true candidate included 98.2% of the time.

[0182] If one tried to absorb the font variation of the above recognition target without applying the standard patterns of the present invention, not only would the learning experiments demand considerable time and effort, but the average number of candidate character types obtained by large classification would also be expected to approach two digits. Moreover, whereas the present invention makes it relatively easy to determine an appropriate number of standard patterns for each character type in large classification, making that determination with a technique that relies only on the large classification method and learning experiments is expected to be considerably difficult.

[0183] (2) Correction and addition of standard patterns based on misrecognition

An input character X that caused an error is compared with the standard patterns (generally several) of the same character type as X. For example, the conditions of equations (2) and (4) are applied at each coordinate on the horizontal axis of the 0-1 pattern expression. If there is a standard pattern for which the number of coordinates that fail the conditions is at or below a suitably chosen number (threshold), the standard pattern is corrected by creating a new standard pattern that represents both X and that standard pattern; if no such standard pattern exists, X is added as a standard pattern. Alternatively, X may be included among the existing standard patterns of that character type and clustering performed again to create new standard patterns. Providing such methods allows standard patterns to be corrected and added efficiently.
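The first correction method described above (merge with a sufficiently close standard pattern, otherwise add X) can be sketched as follows. Equations (2) and (4) are defined elsewhere in the description, so the sketch takes them as a caller-supplied predicate `section_ok(x, std, j)`; `merge`, which builds a pattern representing both inputs, is likewise supplied by the caller. Both callables, the function name, and the choice of the standard pattern with the fewest failing coordinates when several qualify are assumptions made for illustration.

```python
from typing import Callable, Sequence

Pattern = Sequence[Sequence[int]]  # L sections, each a column of m bits (hypothetical)


def update_standard_patterns(
    x: Pattern,
    standards: Sequence[Pattern],
    section_ok: Callable[[Pattern, Pattern, int], bool],
    merge: Callable[[Pattern, Pattern], Pattern],
    threshold: int,
) -> list[Pattern]:
    """Return the updated standard patterns for the character type of X."""
    best_index, best_failures = None, None
    for i, std in enumerate(standards):
        # Count horizontal-axis coordinates at which the conditions fail.
        failures = sum(1 for j in range(len(x)) if not section_ok(x, std, j))
        if failures <= threshold and (best_failures is None or failures < best_failures):
            best_index, best_failures = i, failures
    updated = list(standards)
    if best_index is None:
        updated.append(x)  # no close standard pattern exists: add X itself
    else:
        # Replace the close standard pattern by one representing both it and X.
        updated[best_index] = merge(x, standards[best_index])
    return updated
```

The re-clustering alternative mentioned in the paragraph is not shown here.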

[0184] (3) Application to the recognition of crushed characters

Very small characters, such as those of 6 points or less, and intricate kanji are often crushed by the image scanner of the input device or during printing. Countermeasures against crushed characters are therefore one of the most important practical issues in achieving high-accuracy character recognition. The present invention is expected to serve effectively as such a countermeasure.

[0185] Specifically, crushing is regarded as a special case of large font variation: for characters prone to crushing, the 0-1 pattern expressions of the various crushed forms are clustered, and a standard pattern representing each class is created. Then, when information obtained at character cut-out or elsewhere indicates that crushing is likely (for example, when small characters of 6 points or less are used in footnotes, or when the print quality is poor and crushing is present), the crushed-character standard patterns are used together with the regular ones as the situation requires. This approach is expected to contribute to higher recognition accuracy.

[0186] (4) Considering font variation over a wide range

When font variation is to be considered over a wide range, such as differences in font and size and various printers, a simplified method is conceivable: instead of working per character type, clustering is performed at the level of font, size, and printer. In other words, for each font, each size, and each printer, the vectors of all target character types are concatenated in the same order, and fonts, sizes, and printers are clustered together in the vector space of the resulting vectors; a standard pattern is then created for each class obtained. This method also yields highly practical standard patterns.
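The concatenation step of this simplified method can be sketched as below. The data layout is an assumption: `samples` maps a hypothetical condition key such as (font, size, printer) to a mapping from character type to feature vector. The clustering algorithm itself is not specified in this paragraph and is not shown.

```python
def concatenated_vectors(
    samples: dict[tuple, dict[str, list[int]]],
    char_types: list[str],
) -> dict[tuple, list[int]]:
    """Build one long vector per (font, size, printer) condition.

    Every condition must supply every character type, so that all long
    vectors share the same layout and can be clustered in one space.
    """
    long_vectors = {}
    for condition, by_char in samples.items():
        vec: list[int] = []
        for c in char_types:  # same character order for every condition
            vec.extend(by_char[c])
        long_vectors[condition] = vec
    return long_vectors
```

Each resulting vector describes a whole font, size, and printer combination, so clustering these vectors groups conditions rather than individual characters.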

[0187] The present invention is not limited to the embodiments described above and can be implemented with various modifications within its technical scope.

【０１８８】[0188]

[Effects of the Invention] According to the present invention, in a character recognition apparatus that searches for character type candidates for an input character by partially collating the input character with the standard patterns of the character types using 0-1 pattern expressions based on, for example, the vertical and horizontal histograms of their dot matrices, a standard pattern is created, on the basis of the 0-1 pattern expressions created respectively from the histograms obtained from the dot matrices of two or more character images of the same character type but different fonts, so that every one of those fonts satisfies the partial collation. A standard pattern that effectively absorbs character variation due to font differences can therefore be created, and if such standard patterns are created, registered in the dictionary, and used for character recognition, character type candidates can be obtained quickly and accurately even when characters of the same character type but different fonts are input.

[0189] Also according to the present invention, the standard patterns (0-1 pattern expressions) are classified in advance into several classes according to classification conditions based on salient features appearing in the 0-1 pattern expressions; before large classification, the class of the input character is obtained from its 0-1 pattern expression, and large classification is restricted to the standard patterns belonging to that class. Recognition can therefore be made faster while high recognition accuracy is maintained, even when the character recognition dictionary becomes large.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example of a dot matrix of the character image "本" and an example of its vertical and horizontal histograms.

FIG. 2 is a diagram showing an example of a dot matrix of the character image "本" and an example of its histograms in the ±45-degree directions.

FIG. 3 is a diagram showing examples of vertical and horizontal histograms of various character images.

FIG. 4 is a diagram showing examples of vertical and horizontal histograms of various character images.

FIG. 5 is a diagram for explaining vertical and horizontal histograms and the 0-1 pattern expression.

FIG. 6 is a block diagram showing the configuration of a character recognition system according to an embodiment of the present invention.

FIG. 7 is a flowchart showing the classification algorithm of classification procedure 1A.

FIG. 8 is a flowchart showing the classification algorithm of classification procedure 1A.

FIG. 9 is a flowchart showing the classification algorithm of classification procedure 1A.

FIG. 10 is a flowchart showing an example of a fine classification procedure.

FIG. 11 is a flowchart showing an example of a fine classification procedure.

FIG. 12 is a flowchart showing an example of a fine classification procedure.

FIG. 13 is a flowchart showing an example of a fine classification procedure.

FIG. 14 is a diagram for explaining vertical and horizontal histograms and the 0-1 pattern expression.

FIG. 15 is a diagram showing three 0-1 pattern expressions representing the same character type.

FIG. 16 is a diagram for explaining a standard pattern according to the present invention.

FIG. 17 is a block diagram showing the configuration of a standard pattern creation apparatus according to an embodiment of the present invention.

FIG. 18 is a flowchart showing an example of a standard pattern creation procedure.

FIG. 19 is a block diagram showing the configuration of a character recognition system according to an embodiment of the present invention.

[Explanation of symbols]

2 ... Document image input unit
4 ... Character extraction processing unit
6 ... Large classification processing unit
8 ... Fine classification processing unit
10 ... Histogram creation unit
12 ... 0-1 pattern expression dictionary
14 ... Standard character histogram dictionary
22 ... Character image data input unit
24 ... Histogram creation unit
26 ... 0-1 pattern expression creation unit
28 ... Standard pattern creation unit

Claims

(57) [Claims]

1. A standard pattern creation method for creating a 0-1 pattern expression serving as a standard pattern for a character recognition apparatus that creates a histogram from the dot matrix of an input character image, partially collates the 0-1 pattern expression extracted from the created histogram of the character image with 0-1 pattern expressions prepared as one or more standard patterns for each character type, and searches for character type candidates corresponding to the input character image, the method comprising: creating a histogram from the dot matrix of each of two or more character images of the same character type but different fonts; creating a 0-1 pattern expression from each created histogram; and performing maximization processing and minimization processing based on the created 0-1 pattern expressions and using the result as the standard pattern.

2. The standard pattern creation method according to claim 1, wherein the range of the dot matrix of the character image from which the histogram is created is an inscribed rectangular area defined by the height and width of the character image.

3. The standard pattern creation method according to claim 1, wherein the histogram used to create the 0-1 pattern expression is formed by obtaining, for each of a plurality of predetermined directions, a histogram in which the count of black pixels projected along the predetermined direction onto the dot matrix of the character image is divided by the number of pixels arranged in that direction, and connecting these histograms along the horizontal axis of the histogram.

4. The standard pattern creation method according to claim 3, wherein the predetermined directions are the two directions of vertical and horizontal, the two diagonal directions of ±45 degrees, or the four directions of vertical, horizontal, and diagonal ±45 degrees.

5. The standard pattern creation method according to claim 3, wherein, in creating the 0-1 pattern expression, the horizontal axis of the connected histogram is divided into L sections and the vertical axis into m sections to form m × L areas, and each of the m × L areas is assigned 1 if the histogram passes through the area and 0 if it does not.

6. The standard pattern creation method according to claim 5, wherein the maximization processing selects the m × L areas of the 0-1 pattern expression of the standard pattern, for each of the L sections of the horizontal axis, from one of the 0-1 pattern expressions corresponding to the two or more character images of the same character type but different fonts, and, in the selection, selects for each section of the horizontal axis the m areas of the 0-1 pattern expression in which the number of areas set to 1 among the m areas arranged along the vertical axis is largest.

7. The standard pattern creation method according to claim 5, wherein the minimization processing selects the m × L areas of the 0-1 pattern expression of the standard pattern, for each of the L sections of the horizontal axis, from one of the 0-1 pattern expressions corresponding to the two or more character images of the same character type but different fonts, and, in the selection, selects for each section of the horizontal axis the m areas of the 0-1 pattern expression in which the number of areas set to 1 among the m areas arranged along the vertical axis is smallest.

8. A standard pattern creation apparatus for creating a 0-1 pattern expression serving as a standard pattern for a character recognition apparatus that creates a histogram from the dot matrix of an input character image, partially collates the 0-1 pattern expression extracted from the created histogram of the character image with 0-1 pattern expressions prepared as one or more standard patterns for each character type, and searches for character type candidates corresponding to the input character image, the apparatus comprising: means for creating a histogram from the dot matrix of each of two or more character images of the same character type but different fonts; means for creating a 0-1 pattern expression from each created histogram; and means for performing maximization processing and minimization processing based on the created 0-1 pattern expressions.

9. A character recognition apparatus comprising: cut-out means for cutting out an input document image into predetermined recognition units; histogram creation means for creating a histogram from each cut-out recognition unit; and large classification means for searching for candidates for the recognition target category corresponding to the recognition unit by partially collating the 0-1 pattern expression extracted from the created histogram of the recognition unit with the 0-1 pattern expression of each recognition target category prepared in advance, wherein the 0-1 pattern expression of each recognition target category prepared in advance is obtained by performing predetermined maximization processing and predetermined minimization processing based on 0-1 pattern expressions created respectively from histograms obtained from the dot matrices of character images of the same character type but different fonts.

10. A character recognition method comprising: for each of a plurality of character types to be recognized, creating a histogram from the dot matrix of each of two or more character images of the same character type but different fonts, creating a 0-1 pattern expression from each created histogram, performing maximization processing and minimization processing based on the created 0-1 pattern expressions, and registering the result in a recognition dictionary as the 0-1 pattern expression serving as the standard pattern for that character type; cutting out an input document image into predetermined recognition units; creating a histogram from each cut-out recognition unit; and partially collating the 0-1 pattern expression extracted from the created histogram of the recognition unit with the 0-1 pattern expressions registered in the recognition dictionary as the standard patterns for the character types to be recognized, thereby searching for character type candidates corresponding to the recognition unit.

11. A character recognition apparatus comprising: a recognition dictionary in which, for each of at least one standard pattern prepared for each recognition target category, a 0-1 pattern expression obtained by performing predetermined maximization and minimization processing based on 0-1 pattern expressions created respectively from histograms obtained from the dot matrices of two or more character images of the same character type but different fonts is registered in association with information indicating the class to which the standard pattern belongs, among the classes obtained by classifying the 0-1 pattern expressions based on predetermined classification conditions; cut-out means for cutting out an input document image into predetermined recognition units; histogram creation means for creating a histogram from each cut-out recognition unit; preprocessing means for classifying the 0-1 pattern expression extracted from the created histogram of the recognition unit according to the predetermined classification conditions to obtain the corresponding class; and large classification means for partially collating the 0-1 pattern expression extracted from the created histogram of the recognition unit with those standard patterns registered in the recognition dictionary that are associated with information indicating the same class as the class obtained by the preprocessing means, thereby searching for candidates for the recognition target category corresponding to the recognition unit.

12. The character recognition apparatus according to claim 11, wherein the histogram used to create the 0-1 pattern expression is formed by obtaining, for two or more predetermined directions, a histogram in which the count of black pixels projected along the predetermined direction onto the dot matrix of the character image is divided by the number of pixels arranged in that direction, and connecting these histograms along the horizontal axis of the histogram; and the 0-1 pattern expression is created by dividing the horizontal axis of the connected histogram into L sections and the vertical axis into m sections to form m × L areas, and assigning to each of the m × L areas 1 if the histogram passes through the area and 0 if it does not.

13. The character recognition apparatus according to claim 11, wherein the maximization processing selects the m × L areas of the 0-1 pattern expression of the standard pattern, for each of the L sections of the horizontal axis, from one of the 0-1 pattern expressions corresponding to the two or more character images of the same character type but different fonts, selecting for each section of the horizontal axis the m areas of the 0-1 pattern expression in which the number of areas set to 1 among the m areas arranged along the vertical axis is largest; and the minimization processing selects the m × L areas of the 0-1 pattern expression of the standard pattern, for each of the L sections of the horizontal axis, from one of the 0-1 pattern expressions corresponding to the two or more character images of the same character type but different fonts, selecting for each section of the horizontal axis the m areas of the 0-1 pattern expression in which the number of areas set to 1 among the m areas arranged along the vertical axis is smallest.

14. The character recognition apparatus according to claim 11, wherein said classification condition specifies a shape of a pattern consisting of 0 and 1 of said 0-1 pattern expression.

15. The character recognition apparatus according to claim 12, wherein the classification condition comprises one or more different criteria, each criterion including a rule on the vertical-axis value of a section of the horizontal axis in the 0-1 pattern expression and a rule on the sections of the horizontal axis that satisfy that rule.

16. A character recognition method comprising: registering in a recognition dictionary, for each of at least one standard pattern prepared for each recognition target category, a 0-1 pattern expression obtained by performing predetermined maximization and minimization processing based on 0-1 pattern expressions created respectively from histograms obtained from the dot matrices of two or more character images of the same character type but different fonts, in association with information indicating the class to which the standard pattern belongs, among the classes obtained by classifying the 0-1 pattern expressions based on predetermined classification conditions; cutting out an input document image into predetermined recognition units; creating a histogram from each cut-out recognition unit; classifying the 0-1 pattern expression extracted from the created histogram of the recognition unit according to the predetermined classification conditions to obtain the corresponding class; and partially collating the 0-1 pattern expression extracted from the created histogram of the recognition unit with those standard patterns registered in the recognition dictionary that are associated with information indicating the same class as the obtained class, thereby searching for candidates for the recognition target category corresponding to the recognition unit.
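The per-section selection recited in claims 6, 7, and 13 can be illustrated with a minimal sketch. It assumes a hypothetical representation in which a 0-1 pattern expression is a list of L columns, each a tuple of m bits; the function names are invented, and taking the first pattern when several tie is an assumption the claims do not address.

```python
Column = tuple[int, ...]


def maximize(patterns: list[list[Column]]) -> list[Column]:
    """For each horizontal section, take the column containing the most 1s."""
    return [max(columns, key=sum) for columns in zip(*patterns)]


def minimize(patterns: list[list[Column]]) -> list[Column]:
    """For each horizontal section, take the column containing the fewest 1s."""
    return [min(columns, key=sum) for columns in zip(*patterns)]
```

Each output column is copied whole from one of the input patterns, so the results are assembled section by section from the font variants rather than computed bit by bit.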