JP2918363B2

JP2918363B2 - Character classification method and character recognition device

Info

Publication number: JP2918363B2
Application number: JP3236305A
Authority: JP
Inventors: 浩史吉田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1991-09-17
Filing date: 1991-09-17
Publication date: 1999-07-12
Anticipated expiration: 2014-07-12
Also published as: JPH0573723A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character classification method and a character recognition device.

[0002]

[Prior Art] If a machine could automatically identify characters and figures, various advantages would follow; for example, data could be entered into a computer more efficiently and more accurately than by a human operator. For this reason, character recognition devices have long been an active research topic.

[0003] Conventional character recognition devices generally operate by the following procedure.

[0004] First, an optical signal obtained by scanning a medium bearing characters, figures, and the like (for example, a form) is photoelectrically converted, yielding binary input character line data in which character strokes are represented by, for example, black bits and the background by white bits. Next, character patterns are cut out of this input character line data. Features are then extracted from each character pattern and compared with the features of standard characters prepared in advance, and the character name of the standard character pattern with the highest similarity is output as the recognition result for the character being recognized.

[0005] However, when such a device recognizes an English document, or a character line such as a name or address written in Latin letters, the line contains a mixture of characters whose shapes are identical, such as the comma "," and the apostrophe "'", or uppercase "P" and lowercase "p". Character shape alone is therefore not enough for accurate recognition.

[0006] To solve this problem, methods have been used that perform recognition using, in addition to the shape of the character pattern, its size and its relative position within the character line. One such method is disclosed in the proceedings of the 1988 (Showa 63) IEICE Spring National Convention, D448.

[0007] In this method, a rectangular frame circumscribing each character is first extracted from the character line. The circumscribed rectangles of the characters in the line are compared, and characters that are extremely small relative to the largest character are removed. A histogram of the heights of the upper and lower ends of the remaining rectangles is then created. From this histogram, the lowest peak among the upper ends and the highest peak among the lower ends are detected, and a straight line giving the inclination of the character line is obtained by the least squares method from the upper-end and lower-end coordinates of characters whose size is approximately equal to the distance between these peaks. After the deviation in character height due to skew is corrected using the slope of this line, a histogram is created again in the same way, and two peaks are detected as before; these peaks are taken as the upper and lower reference lines. The distance between the upper and lower reference lines is taken as the reference character size. The size of each character pattern in the line is compared with this reference size, and the position of each character pattern is compared with the upper and lower reference lines. Based on these comparisons, each character in the line is classified into one of several categories, which improved recognition accuracy.

[0008]

[Problems to Be Solved by the Invention] However, the above method requires a series of processing steps (input of rectangle information, removal of very small characters, correction of line inclination, and calculation of reference lines), so recognition speed dropped markedly. Moreover, because inclination correction and the like rely on least squares calculation, classifying the characters of even a slightly inclined line takes considerable time. In addition, when the reference line is not simply inclined but uneven, as frequently occurs in FAX images, the reference lines cannot be calculated correctly and the classification method does not work effectively. It was therefore difficult to realize a character recognition device that is both highly accurate and fast.

[0009] An object of the present invention is to eliminate the problems described above, namely that classification is slow, and that reference lines cannot be detected correctly from document images in which the characters are unevenly aligned, so that characters cannot be classified correctly and high-accuracy recognition cannot be achieved. The invention aims to provide a character classification method that classifies characters accurately and quickly with simple processing, even for character lines whose alignment is uneven or inclined, and to provide a character recognition device that uses this method to achieve high recognition accuracy and high-speed processing.

[0010]

[Means for Solving the Problems] To solve the above problems, the present invention provides a character classification method for classifying characters cut out from a character line, consisting of the following steps.

Provisional classification step: a predetermined provisional classification value is assigned to the first character of the character line. For each of the second and subsequent characters, the difference in upper side position and the difference in lower side position between the circumscribed frame of that character and that of the immediately preceding character are calculated. The amount and direction of displacement are obtained from the absolute value and sign of each difference. Based on the displacement amount, the displacement direction, and the provisional classification value assigned to the preceding character, two provisional classification values are assigned to the character: one based on the upper side position of the circumscribed frame and one based on the lower side position.

Tallying step: for each of the two kinds of provisional classification value assigned to the characters in the line, the values are tallied to obtain the frequency of each value.

Classification step: in order to classify each character in the line by a plurality of reference lines set in advance in the character height direction, each reference line is identified by a number. Character group A, whose upper side provisional classification value has the highest frequency, is assigned as its upper side classification value the identification number of the reference line corresponding to the mean line, where the upper sides of the circumscribed frames of characters that appear frequently in documents are most often found. The upper side classification values of the other characters are assigned according to whether their upper side provisional classification value is larger or smaller than that of character group A. Character group B, whose lower side provisional classification value has the highest frequency in the line, is assigned as its lower side classification value the identification number of the reference line corresponding to the baseline, where the lower sides of the circumscribed frames of frequently appearing characters are most often found. The lower side classification values of the other characters are assigned according to whether their lower side provisional classification value is larger or smaller than that of character group B. Finally, a new classification result value is assigned to each character according to the combination of its upper side and lower side classification values.

[0011]

[0012]

[0013] The character recognition device of the present invention for solving the above problems comprises an image input unit that inputs image data, a character line cutout unit that cuts out character lines from the input image data, a character cutout unit that cuts out character patterns from within a character line, a character recognition unit that recognizes the cut-out character patterns, and a dictionary unit that stores dictionary masks for character recognition. The device is characterized in that it further comprises a character classification unit that classifies characters by the above character classification method, and in that character recognition is performed using a dictionary selected on the basis of the character classification result.

[0014]

[Operation] According to the character classification method of the present invention, the provisional classification step assigns the same provisional classification value to characters whose positional displacement falls within a predetermined range, based for example on the difference in circumscribed frame position between the character and the immediately preceding character and on the provisional classification value of the preceding character. This process is performed separately for the upper side position and the lower side position of the circumscribed frame, so that two provisional classification values are assigned to each character. In the tallying step, the frequency of each provisional classification value is obtained for each of the two kinds. In the classification step, for example, the upper side classification value of character group A, whose upper side provisional classification value has the highest frequency, is set to the number of the reference line preset for the most frequently appearing characters (for example, the reference line corresponding to the mean line). This places character group A in one group, and the remaining characters are given upper side classification values by comparing their provisional classification values with that of character group A. The lower side position of the circumscribed frame is classified in the same way to obtain lower side classification values. Each character is then reclassified by assigning a new classification value to the combination of its upper side and lower side classification values. Characters can therefore be classified accurately with simple processing, even in character lines whose alignment is uneven or inclined.

[0015] The character recognition device of the present invention includes a character classification unit that uses this character classification method and selects a dictionary based on the classification result. A character recognition device with high recognition accuracy and high processing speed is thus provided.

[0016]

[Embodiment] The character classification method and character recognition device of the present invention are described below with reference to FIGS. 1 to 6.

[0017] FIG. 1 is a block diagram showing one embodiment of a character recognition device using the character classification method of the present invention. The character recognition device 100 comprises an image input unit 110, a character line cutout unit 120, a character cutout unit 130, a pattern memory 140, a character classification unit 150 using the character classification method of the present invention, a dictionary unit 160, a character recognition unit 170, an output terminal 180, and a control unit 190. The character classification unit 150 consists of a storage unit 151, a provisional classification unit 152, a tallying unit 153, and a classification unit 154.

[0018] The image input unit 110 photoelectrically converts an optical signal S from a form, such as the one shown in FIG. 2, on which characters, figures, symbols, and the like (hereinafter "characters") are recorded. It generates an electrical signal quantized to black-and-white binary values (hereinafter "form image data"), in which, for example, character strokes are represented by black pixels and the background by white pixels, and outputs it to the character line cutout unit 120.

[0019] The character line cutout unit 120 scans the form image data input from the image input unit 110 sequentially, with the character line direction as the main scanning direction and the column direction (perpendicular to the line) as the sub-scanning direction, and builds the distribution of black pixels. It cuts out as one character line the span from the position where the distribution changes from 0 to 1 or more up to the position immediately before it changes from 1 or more back to 0, and outputs the character line data sequentially to the character cutout unit 130.
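The line cutout described in paragraph [0019] amounts to a projection profile followed by run detection. The following is a minimal Python sketch under stated assumptions: the image is a list of rows of 0 (white) and 1 (black), and the names `find_runs` and `cut_text_lines` are hypothetical, not taken from the patent.

```python
def find_runs(profile):
    """Return inclusive (start, end) index pairs for each maximal run of non-zero counts."""
    runs = []
    start = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # distribution changes from 0 to 1 or more
        elif count == 0 and start is not None:
            runs.append((start, i - 1))    # position just before it returns to 0
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile) - 1))
    return runs


def cut_text_lines(image):
    """Count black pixels per row and return the row ranges that form character lines."""
    row_profile = [sum(row) for row in image]
    return find_runs(row_profile)
```

Each returned pair gives the first and last row of one character line; rows with no black pixels separate the lines.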

[0020] The character cutout unit 130 cuts out character patterns one character at a time from the character line data input from the character line cutout unit 120, stores the character pattern data sequentially in the pattern memory 140, and outputs the position information of each character pattern to the storage unit 151 and the provisional classification unit 152 in the character classification unit 150. Each character pattern is cut out by scanning the character line data with the column direction as the main scanning direction and the character line direction as the sub-scanning direction, building the distribution of black pixels, and cutting out the span from the position where the distribution changes from 0 to 1 or more up to the position immediately before it changes from 1 or more back to 0. In this embodiment, the position information of a cut-out character pattern consists of the upper side and lower side coordinates of its circumscribed frame, expressed as absolute coordinates on the form in the xy coordinate system shown in FIG. 2, in which the character line direction is the x-axis and the column direction is the y-axis.
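As a rough illustration of paragraph [0020], the sketch below cuts characters out of one character line by a column-wise black pixel count and records the upper and lower side y-coordinates of each circumscribed frame. It assumes that y increases downward (image row index), which the text does not state, and that `y_offset` is the row of the form at which the line starts; the function name and the dictionary keys are hypothetical.

```python
def cut_characters(line_rows, y_offset=0):
    """Split one character line (rows of 0/1 pixels) into characters.

    Returns one dict per character with its x range and the absolute
    y-coordinates of the top and bottom of its circumscribed frame.
    """
    width = len(line_rows[0]) if line_rows else 0
    # Black pixel count for every column of the line.
    col_profile = [sum(row[x] for row in line_rows) for x in range(width)]
    chars = []
    start = None
    for x in range(width + 1):             # one extra step closes a trailing run
        black = x < width and col_profile[x] > 0
        if black and start is None:
            start = x
        elif not black and start is not None:
            # Rows that contain at least one black pixel inside this x range.
            ys = [y for y, row in enumerate(line_rows) if any(row[start:x])]
            chars.append({
                "x0": start,
                "x1": x - 1,
                "top": y_offset + ys[0],
                "bottom": y_offset + ys[-1],
            })
            start = None
    return chars
```

Only `top` and `bottom` are needed by the classification that follows; the x range is kept so the pattern itself can be copied out.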

[0021] The pattern memory 140 is a storage unit that sequentially stores the character pattern data cut out by the character cutout unit 130, and can easily be built from IC memory or the like. In this embodiment, the pattern memory has a capacity of 128 × 128 pixels per character and stores the character pattern data so that it can be reproduced in two-dimensional form.

[0022] The storage unit 151 in the character classification unit 150 stores, per character line and for each character pattern making up the line, the position information input from the character cutout unit 130 (the y-coordinates of the upper and lower sides of the circumscribed frame) and the provisional classification values assigned to each character pattern by the provisional classification unit 152 described later. For example, for the first line of the form shown in FIG. 2, the character cutout unit 130 cuts out character patterns one at a time. As shown in FIG. 3, the upper side position and its provisional classification value, and the lower side position and its provisional classification value, are then stored in the storage unit 151 in table form for each character in the line.

[0023] The provisional classification unit 152 classifies each character pattern and assigns it provisional classification values based on the per-line character pattern position information input from the character cutout unit 130. It stores these values in the storage unit 151 in association with each character pattern and also outputs them to the tallying unit 153.

[0024] The provisional classification values are assigned as follows.

[0025] First, the provisional classification value 5 is assigned to the first character pattern of the character line.

[0026] Next, for the second and subsequent characters of the character line, the magnitude and direction of the displacement in the y-axis direction are obtained from the difference in upper side position, or the difference in lower side position, between the circumscribed frame of the immediately preceding character pattern and that of the current character pattern, and the provisional classification value is determined by equation (1).

[0027]

[Equation 1]

[0028] Here $y_s = y_n - y_{n-1}$, where $y_n$ is the y-coordinate of the upper or lower side of the circumscribed frame of the current character pattern and $y_{n-1}$ is the y-coordinate of the upper or lower side of the circumscribed frame of the immediately preceding character pattern. $C_n$ is the provisional classification value of the current character pattern, and $C_{n-1}$ is the provisional classification value of the immediately preceding character pattern.

[0029] T is a predetermined threshold set according to the size of the characters to be recognized in image memory and the design of their font. Any value larger than 1/2 of the body height described later and smaller than the descender height may be used. In this embodiment, T = 15.
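Equation (1) itself is not reproduced in this text, so the sketch below is only one plausible reading of paragraphs [0025] to [0029], not the patent's formula. It assumes that the provisional value is carried over unchanged when the displacement magnitude is at most T, that it moves by exactly one step in the direction of the displacement otherwise, and that y increases downward. The function name `provisional_values` and its parameters are hypothetical.

```python
def provisional_values(positions, threshold=15, first_value=5):
    """Assign a provisional classification value to each character in a line.

    positions: y-coordinates of one side (upper or lower) of each
    character's circumscribed frame, in reading order.
    Assumed rule (not confirmed by the source): a jump larger than the
    threshold changes the value by one step; smaller jumps keep it.
    """
    values = []
    for i, y in enumerate(positions):
        if i == 0:
            values.append(first_value)     # the first character always gets 5
            continue
        y_s = y - positions[i - 1]         # signed displacement from the previous character
        prev = values[-1]
        if y_s > threshold:
            values.append(prev + 1)        # moved down by more than T
        elif y_s < -threshold:
            values.append(prev - 1)        # moved up by more than T
        else:
            values.append(prev)            # within T: same group as the previous character
    return values
```

Because each character is compared only with its predecessor, a gradual drift of a few pixels per character never exceeds T, which is why a rule of this kind tolerates inclined or wavy lines. The function would be called once with the upper side coordinates and once with the lower side coordinates.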

[0030] From the first character line of a form such as the one shown in FIG. 2, the provisional classification values shown in FIG. 3 are obtained for the upper side position and the lower side position of the circumscribed frame.

[0031] The tallying unit 153 tallies the provisional classification values of the character patterns of the character line input from the provisional classification unit 152, counts how many character patterns in the line have each provisional classification value, and outputs the counts to the classification unit 154. From provisional classification values such as those in FIG. 3, the counts shown in FIG. 4 are obtained for the upper side and the lower side of the circumscribed frame. The upper side frequency in FIG. 4 is the tally of the provisional classification values based on the upper side position in FIG. 3, and the lower side frequency is the tally of the provisional classification values based on the lower side position in FIG. 3.
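The tallying in paragraph [0031] is a plain frequency count per side. A minimal sketch using Python's standard `collections.Counter` follows; the function name `tally` is hypothetical, and the values used to exercise it would be illustrative inputs, not the data of FIG. 3 or FIG. 4.

```python
from collections import Counter


def tally(upper_values, lower_values):
    """Count how many characters carry each provisional classification value.

    Returns two Counters: one for the upper side, one for the lower side.
    """
    return Counter(upper_values), Counter(lower_values)
```

The most frequent value on each side can then be read with `most_common(1)`.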

[0032] The classification unit 154 reclassifies each character based on the provisional classification values of the character patterns of the line stored in the storage unit 151 and on the tally input from the tallying unit 153, and outputs the classification result to the dictionary unit 160.

[0033] Reclassification is performed by first obtaining, for the upper side position and the lower side position of the circumscribed frame, a classification value representing each position, and then classifying the character pattern based on the upper side and lower side classification values.

[0034] The method of obtaining the classification values for the upper side position and the lower side position of the circumscribed frame is described below.

[0035] The lines numbered 1 to 4 in FIG. 5(b) are placed at positions corresponding to the ascender line, mean line, baseline, and descender line used in English text and the like, and serve as reference lines for classifying the upper end and lower end positions of a character pattern's circumscribed frame. The body height mentioned above is the distance from line 2 to line 3 in FIG. 5(b), that is, the standard height of lowercase letters, and the descender height is the height of the descender portion between lines 3 and 4.

[0036] For the upper side position of the circumscribed frame, character patterns whose provisional classification value is the most frequent one in the tally input from the tallying unit are given classification value 2. Character patterns whose provisional classification value is smaller than that value are given classification value 1, and character patterns whose provisional classification value is larger are given classification value 3.

[0037] The reason for giving classification value 2 to the character patterns with the most frequent provisional classification value is that, in ordinary English documents, the frequently appearing characters are e, s, r, n, a, o, i, u, p, y, and so on, and the upper side of the circumscribed frame of these characters very often lies near line 2 (the number of the reference line corresponding to the mean line).

[0038] In the case of FIG. 4, the provisional classification value with the highest upper side frequency is 6. In FIG. 3, therefore, character patterns whose upper side provisional classification value is 6 are given classification value 2, those whose upper side provisional classification value is 5 or less are given classification value 1, and those whose upper side provisional classification value is 7 or more are given classification value 3.

[0039] For the lower side position of the circumscribed frame, character patterns whose provisional classification value has the highest lower side frequency in the tally input from the tallying unit are given classification value 3. As in the case above, this is because the lower side of the circumscribed frame of frequently appearing characters most often lies near the reference line corresponding to the baseline, that is, line 3. Character patterns whose provisional classification value is smaller than that value are given classification value 2, and character patterns whose provisional classification value is larger are given classification value 4.

[0040] In the case of FIG. 4, the provisional classification value with the highest lower side frequency is 5. In FIG. 3, therefore, character patterns whose lower side provisional classification value is 5 are given classification value 3, those whose lower side provisional classification value is 4 or less are given classification value 2, and those whose lower side provisional classification value is 6 or more are given classification value 4.
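The upper side and lower side rules of paragraphs [0036] to [0040] have the same shape: the most frequent provisional value maps to a fixed reference line number (2 for the upper side, 3 for the lower side), smaller values map to the line above, and larger values map to the line below. The sketch below expresses that as one function. The name `classify_side` is hypothetical, and tie-breaking between equally frequent values is not specified in the text; this sketch simply takes whichever value `Counter.most_common` reports first.

```python
from collections import Counter


def classify_side(values, mode_line):
    """Map provisional classification values to reference line numbers.

    values: provisional classification values of one side for one line.
    mode_line: line number given to the most frequent value
               (2 for the upper side, 3 for the lower side).
    """
    if not values:
        return []
    mode_value = Counter(values).most_common(1)[0][0]
    result = []
    for v in values:
        if v < mode_value:
            result.append(mode_line - 1)   # higher on the page than the dominant group
        elif v > mode_value:
            result.append(mode_line + 1)   # lower on the page than the dominant group
        else:
            result.append(mode_line)
    return result
```

For a line, `classify_side(upper_values, 2)` would give the upper side classification values (1, 2, or 3) and `classify_side(lower_values, 3)` the lower side classification values (2, 3, or 4).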

[0041] Once the classification values for the upper side and lower side of the circumscribed frame have been obtained, this embodiment further obtains the classification result value of the character pattern from the table shown in FIG. 5(a). For example, the character [M], whose upper side classification value is 1 and whose lower side classification value is 3, is given classification result value 2, and the character [y], whose upper side classification value is 2 and whose lower side classification value is 4, is given classification result value 5.

【００４２】このような処理により、分類部１５４にお
いては、図３に示したような各々の文字パタンについて
の分類結果が得られる。Through this processing, the classification unit 154 obtains a classification result for each character pattern, as shown in FIG. 3.

【００４３】辞書部１６０は、後述する文字認識部１７
０において用いる標準文字パタンの特徴マトリクスが格
納されているものであり、図６に示すような（Ａ）〜
（Ｆ）なる６つの各カテゴリに分類された６つの辞書よ
り構成されており、文字分類部１５０内の分類部１５４
から入力される各文字パタンについての分類結果値（１
〜７）に基づいて照合を行う辞書を選択し、当該選択さ
れた辞書の基準文字マスクを文字認識部１７０に出力す
るものである。The dictionary unit 160 stores the feature matrices of the standard character patterns used by the character recognition unit 170 described later. It consists of six dictionaries, one for each of the six categories (A) to (F) shown in FIG. 6. Based on the classification result value (1 to 7) input for each character pattern from the classification unit 154 in the character classification unit 150, it selects the dictionary to be used for matching and outputs the reference character masks of the selected dictionary to the character recognition unit 170.

【００４４】前記辞書の選択は、図５（ａ）に示すよう
なテーブルに基づいて決定される。例えば図３の第１文
字目の文字パタン［Ｍ］については、分類値２が入力さ
れるので、辞書（Ａ）及び（Ｆ）の辞書の辞書マトリク
スを文字認識部１７０に出力する。The dictionary is selected based on the table shown in FIG. 5(a). For example, for the character pattern [M], the first character in FIG. 3, classification value 2 is input, so the dictionary matrices of dictionaries (A) and (F) are output to the character recognition unit 170.
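
The two table lookups (side classification values to classification result value, then result value to dictionaries) might be sketched as below. Only the entries stated in the text are filled in; the remaining rows of FIG. 5(a) are not reproduced here, and the names `RESULT_TABLE`, `DICTIONARY_TABLE`, and `select_dictionaries` are hypothetical.

```python
# (upper-side value, lower-side value) -> classification result value.
# Only the two combinations mentioned in the text are listed.
RESULT_TABLE = {
    (1, 3): 2,  # e.g. the character [M]
    (2, 4): 5,  # e.g. the character [y]
}

# Classification result value -> dictionaries used for matching.
# Only the entry mentioned in the text is listed.
DICTIONARY_TABLE = {
    2: ("A", "F"),
}

def select_dictionaries(upper_value, lower_value):
    """Return (result value, dictionary names); (None, ()) if not in the tables."""
    result_value = RESULT_TABLE.get((upper_value, lower_value))
    if result_value is None:
        return None, ()
    return result_value, DICTIONARY_TABLE.get(result_value, ())

print(select_dictionaries(1, 3))  # (2, ('A', 'F'))
```

Restricting matching to the selected dictionaries is what reduces the number of comparisons the recognition unit has to perform.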

【００４５】文字認識部１７０においては、パタンメモ
リ１４０より文字パタンデータを順次読み込み、当該文
字パタンデータより特徴マトリクスを抽出し、辞書部１
６０より入力された当該文字パタンデータに対応した照
合対象の辞書マトリクスデータと前記特徴マトリクスの
照合を行い、１以上候補文字を得、該候補文字を認識結
果として出力端子１８０に出力するものである。The character recognition unit 170 sequentially reads character pattern data from the pattern memory 140, extracts a feature matrix from the character pattern data, matches that feature matrix against the dictionary matrix data input from the dictionary unit 160 as the matching target for that character pattern data, obtains one or more candidate characters, and outputs the candidate characters to the output terminal 180 as the recognition result.

【００４６】文字パタンデータからの特徴マトリクスの
抽出は以下のように行う。The extraction of the feature matrix from the character pattern data is performed as follows.

【００４７】まず、対象文字パタンデータよりサブパタ
ンを抽出する。文字パタンデータを複数の方向に走査
し、各走査線上で予め定めた特定の値ｈ（本実施例では
ｈ＝５）以上連続している黒画素列を検出し、該連続し
た黒画素列をサブパタンの黒画素成分として抽出するこ
とにより、文字パタンより各走査方向別のサブパタンを
抽出する。First, sub-patterns are extracted from the target character pattern data. The character pattern data is scanned in a plurality of directions; on each scanning line, runs of black pixels that continue for at least a predetermined value h (h = 5 in this embodiment) are detected, and each such run is extracted as black pixel components of the sub-pattern. This yields one sub-pattern of the character pattern for each scanning direction.

【００４８】前記走査方向は、本実施例では、文字行方
向（以下、Ｘ軸方向）に垂直な方向（垂直方向）、及び
平行な方向（水平方向）、Ｘ軸から反時計方向４５°の
方向（左斜め方向）及び時計方向４５°の方向（左斜め
方向）とし、これら各方向毎に文字パタンを走査して各
方向別に４個のサブパタンを抽出する。In this embodiment, the scanning directions are the direction perpendicular to the character line direction (hereinafter, the X-axis direction), called the vertical direction; the direction parallel to it, called the horizontal direction; the direction 45° counterclockwise from the X axis, called the left diagonal direction; and the direction 45° clockwise from the X axis, called the right diagonal direction. The character pattern is scanned in each of these directions to extract four sub-patterns in total, one for each direction.

【００４９】例えば垂直方向のサブパタンの抽出では垂
直方向を主走査方向とし、パタンレジスタの垂直方向の
走査線上で連続する黒画素（黒ラン）を検出し、１≧ｈ
となる長さ１の黒ランを垂直方向のサブパタンの黒画素
部分として抽出することにより垂直方向サブパタンを抽
出する。For example, to extract the vertical sub-pattern, the vertical direction is taken as the main scanning direction, runs of consecutive black pixels (black runs) are detected on the vertical scanning lines of the pattern register, and black runs of length l satisfying l ≥ h are extracted as the black pixel portion of the vertical sub-pattern.
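
As a rough illustration of the vertical case, the following Python sketch keeps only the black runs of at least h pixels along each column. It assumes the character pattern is a list of rows of 0 (white) and 1 (black); the function names are hypothetical, and the diagonal directions are not covered.

```python
def keep_long_runs(line, h):
    """Return a copy of a 0/1 sequence that keeps only black runs of length >= h."""
    out = [0] * len(line)
    start = None
    # A trailing 0 is appended as a sentinel so a run ending at the edge is closed.
    for i, px in enumerate(list(line) + [0]):
        if px and start is None:
            start = i                      # a black run begins
        elif not px and start is not None:
            if i - start >= h:             # run length l satisfies l >= h
                out[start:i] = [1] * (i - start)
            start = None
    return out

def vertical_subpattern(pattern, h=5):
    """Scan each column top to bottom and keep black runs of length >= h."""
    rows, cols = len(pattern), len(pattern[0])
    columns = [
        keep_long_runs([pattern[r][c] for r in range(rows)], h)
        for c in range(cols)
    ]
    # Transpose the filtered columns back into row-major order.
    return [[columns[c][r] for c in range(cols)] for r in range(rows)]
```

For a 6-row pattern whose first column is entirely black and whose second column is black only in the top three rows, `vertical_subpattern(pattern, h=5)` keeps the first column and clears the second. The horizontal sub-pattern can be obtained by applying `keep_long_runs` to each row instead.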

【００５０】垂直方向のサブパタン抽出と同様にして、
残りの他の方向を主走査方向としたときのサブパタンの
抽出も行う。Sub-patterns are extracted in the same way as the vertical sub-pattern, with each of the remaining directions taken in turn as the main scanning direction.

【００５１】次に、前記抽出された各方向のサブパタン
上に、文字パタンの文字外接枠に対応する方形領域を設
定し、該方形領域をＮ×Ｍ個（Ｎ、Ｍは任意好適な自然
数）の小領域に分割し、各小領域に含まれる各サブパタ
ンの文字線の長さを表す特徴量を抽出し、該特徴量を文
字外接枠の大きさで正規化し、特徴量ｆ_iからなる特徴
マトリクスＦを作成する。Next, on each extracted directional sub-pattern, a rectangular region corresponding to the circumscribed frame of the character pattern is set, and the rectangular region is divided into N × M small regions (N and M being any suitable natural numbers). For each small region, a feature value representing the length of the character strokes of each sub-pattern contained in that region is extracted and normalized by the size of the character's circumscribed frame, and a feature matrix F consisting of the feature values f_i is created.

【００５２】尚、本実施例では、前記分割数はＮ、Ｍ＝
８とし、また前記特徴量は（ｄＸ＋ｄＹ）／２なる値で
正規化するものとする。但し、ｄＸは文字外接枠の水平
方向の長さ及びｄＹは文字外接枠の垂直方向の長さであ
る。また特徴量ｆ_iは、各小領域に１〜Ｎ×Ｍまでの番
号ｉ（ｉ＝１、２、…、Ｎ×Ｍ）を順次に付して小領域
を表したときに、番号ｉの小領域の特徴量を表し、特徴
マトリクスＦの要素値である。In this embodiment, the numbers of divisions are N = M = 8, and the feature values are normalized by the value (dX + dY)/2, where dX is the horizontal length and dY is the vertical length of the character's circumscribed frame. When the small regions are numbered sequentially with i (i = 1, 2, ..., N × M), the feature value f_i denotes the feature value of small region i and is an element of the feature matrix F.
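
The grid division and normalization can be sketched in Python as follows. Several points are assumptions made only for illustration: the sub-pattern is taken to be already cropped to the circumscribed frame, the black pixel count of a small region stands in for the "length of the character strokes", N is taken as the number of columns and M as the number of rows, and the function name is hypothetical.

```python
def feature_matrix(subpattern, n=8, m=8):
    """Return the N*M feature values f_1..f_{N*M} of one directional sub-pattern.

    subpattern: list of rows of 0/1, cropped to the circumscribed frame.
    """
    d_y = len(subpattern)        # vertical length dY of the circumscribed frame
    d_x = len(subpattern[0])     # horizontal length dX of the circumscribed frame
    norm = (d_x + d_y) / 2       # normalization value (dX + dY) / 2
    features = []
    for j in range(m):           # rows of small regions, top to bottom
        y0, y1 = j * d_y // m, (j + 1) * d_y // m
        for i in range(n):       # columns of small regions, left to right
            x0, x1 = i * d_x // n, (i + 1) * d_x // n
            # Assumption: black pixel count approximates stroke length.
            black = sum(
                subpattern[y][x] for y in range(y0, y1) for x in range(x0, x1)
            )
            features.append(black / norm)
    return features
```

For a 16 × 16 sub-pattern that is entirely black, each of the 64 small regions holds 4 black pixels and the normalization value is 16, so every feature value is 0.25. Dividing by (dX + dY)/2 makes the features comparable between characters of different sizes.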

【００５３】前記特徴マトリクスと辞書マトリクスの照
合は、特徴マトリクスＦと、辞書マトリクスＧとの距離
Ｒを、次式（２）により求め、距離Ｒが予め定めた値Ｐ
以下の辞書マトリクスの文字名を候補文字名とし、さら
に距離の低い順に第１位候補文字、第２位候補文字と順
位付けを行い、候補文字列を得ることにより行う。The feature matrix is matched against the dictionary matrices as follows. The distance R between the feature matrix F and a dictionary matrix G is obtained by equation (2) below. The character names of the dictionary matrices whose distance R is no greater than a predetermined value P are taken as candidate character names, and the candidates are ranked in ascending order of distance as the first candidate character, the second candidate character, and so on, to obtain a candidate character string.

【００５４】[0054]

【数２】 (Equation 2)

【００５５】但し、ｇｉは辞書マトリクスの要素を表
す。Here, g_i denotes an element of the dictionary matrix.
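
The thresholding and ranking of candidates can be sketched in Python as below. Equation (2) is not reproduced in this text, so the distance used here is a city-block distance chosen purely as a stand-in; it should not be read as the patent's formula. The names `city_block` and `rank_candidates` are hypothetical.

```python
def city_block(f, g):
    """Stand-in distance between feature values f_i and dictionary values g_i.

    This is NOT equation (2); it is only a placeholder for illustration.
    """
    return sum(abs(fi - gi) for fi, gi in zip(f, g))

def rank_candidates(feature, dictionary, p, distance=city_block):
    """Return character names with distance R <= p, nearest first.

    dictionary: mapping of character name -> dictionary matrix (flat list).
    """
    scored = [(distance(feature, g), name) for name, g in dictionary.items()]
    kept = sorted((r, name) for r, name in scored if r <= p)
    return [name for _, name in kept]

dictionary = {"M": [1.0, 2.5], "N": [3.0, 2.0], "W": [1.0, 2.0]}
print(rank_candidates([1.0, 2.0], dictionary, p=1.0))  # ['W', 'M']
```

Passing the distance as a parameter keeps the ranking logic independent of whichever distance definition equation (2) actually specifies.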

【００５６】制御部１９０は、図示せぬ制御信号線を通
して、文字認識装置１００を構成する各部の制御、各部
の動作やデータの同期の制御、外部とのインターフェイ
ス等のコントロールを行うものである。Through control signal lines (not shown), the control unit 190 controls each unit constituting the character recognition device 100, controls the operation of each unit and the synchronization of data, and controls the interface with external devices, among other things.

【００５７】出力端子１８０は、認識結果を外部に出力
するためのデータ出力端子であり、その他のシステム
や、認識結果を記録する媒体、通信網、その他の情報処
理システム等を接続するものである。The output terminal 180 is a data output terminal for outputting the recognition result to the outside. Other systems, media for recording the recognition result, communication networks, other information processing systems, and the like are connected to it.

【００５８】尚、本発明は上述した実施例にのみ限定さ
れるものではなく、各構成成分の動作、処理の仕方、入
出力信号の流れ、配設個数、位置、形状及び個数その他
の条件を任意好適に変更できる。The present invention is not limited to the embodiment described above. The operation of each component, the manner of processing, the flow of input and output signals, the number, position, and shape of the components, and other conditions can be changed as appropriate.

【００５９】例えば、本実施例の文字分類部において
は、文字パタンの外接枠上辺位置、及び下辺位置各々よ
り先ず分類を行い、さらに両者を併せて当該文字パタン
の分類結果値を得る構成としたが、これに限られるもの
ではなく、外接枠上辺位置のみによる分類、或いは下辺
位置のみによる分類をそのまま当該文字パタンの分類結
果としても何等差し支えない。For example, the character classification unit of this embodiment first classifies each character pattern separately by the upper-side position and by the lower-side position of its circumscribed frame, and then combines the two to obtain the classification result value of the character pattern. However, the invention is not limited to this; the classification by the upper-side position alone, or by the lower-side position alone, may be used directly as the classification result of the character pattern.

【００６０】また、本実施例の文字認識装置において
は、本発明の文字分類方法による分類結果に基づいて辞
書を切り替える構成としたが、これに限られるものでは
なく、文字認識部の認識方法の切り替え、また文字パタ
ンの切り出し方法の切り替え等に前記分類結果を用いる
構成としても何等差し支えない。The character recognition device of this embodiment switches dictionaries based on the classification result of the character classification method of the present invention. However, the invention is not limited to this; the classification result may equally be used to switch the recognition method of the character recognition unit, to switch the method of cutting out character patterns, and so on.

【００６１】また前記画像入力部は帳票からの光信号を
光電変換して、帳票画像データを得る構成としたが、こ
れに限られるものではなく、例えばＦＡＸ等で送信され
てきた圧縮画像データを展開して該帳票画像データを得
るようなデコード部としての機能を持った画像入力部で
も良い。The image input unit is configured to photoelectrically convert the optical signal from a form to obtain form image data, but it is not limited to this. For example, it may be an image input unit that functions as a decoder, expanding compressed image data transmitted by facsimile or the like to obtain the form image data.

【００６２】そのほか、文字行の切り出し方法、文字の
切り出し方法、パタンメモリの構成方法、文字認識方
法、辞書部の構成等も、本発明の範囲内で適宜自由な構
成として良いことは明かである。In addition, it is clear that the method of cutting out character lines, the method of cutting out characters, the configuration of the pattern memory, the character recognition method, the configuration of the dictionary unit, and so on may be freely configured as appropriate within the scope of the present invention.

【００６３】[0063]

【発明の効果】以上詳細に説明したように、本発明によ
れば、文字行より切り出された文字を分類する文字分類
方法において、文字行の先頭の文字に対して所定の仮の
分類値を付与し、２番目以降の文字については当該文字
と直前の文字の外接枠の上辺位置の差及び下辺位置の差
をそれぞれ算出し、該位置の差の絶対値と符号により変
位量及び変位方向を求め、該変位量と変位方向及び直前
の文字に付与された仮の分類値とに基づいて、当該文字
に対して外接枠の上辺位置に基づく仮の分類値と外接枠
の下辺位置に基づく仮の分類値の２種類の仮の分類値を
付与する仮分類ステップと、当該文字行中の各文字に付
与された前記２種類の仮の分類値の各々について分類値
を集計して分類値毎の度数を求める集計ステップと、当
該文字行中の各文字を文字高さ方向に予め設定された複
数の基準線によって分類する為に、各基準線を番号によ
って識別し、外接枠の上辺位置に基づく仮の分類値の度
数が最大となる文字群Ａに対しては、文書において出現
頻度の高い文字群の外接枠上辺位置の存在頻度が高くな
るミーンラインに準ずる基準線の前記識別番号を上辺分
類値として付与し、その他の文字群の上辺分類値につい
ては当該文字群の外接枠上辺位置に基づく仮の分類値と
前記文字群Ａの仮の分類値との大小関係により付与し、
当該文字行において外接枠の下辺位置に基づく仮の分類
値の度数が最大となる文字群Ｂに対しては、文書におい
て出現頻度の高い文字群の外接枠下辺位置の存在頻度が
高くなるベースラインに準じる基準線の前記識別番号を
下辺分類値として付与し、その他の文字群の下辺分類値
については当該文字群の外接枠下辺位置に基づく仮の分
類値と前記文字群Ｂの仮の分類値との大小関係により付
与し、各文字毎に付与された前記上辺分類値と下辺分類
値の組み合わせに対して新たな分類結果値を付与する分
類ステップを設けたので、簡単な処理で文字を分類する
ことが可能となり、また傾斜している文字行や、基準線
が凹凸しているような文字行から切り出した文字パタン
についても正確に分類できる。As described in detail above, the present invention provides a character classification method for classifying characters cut out from a character line, comprising a provisional classification step, a tallying step, and a classification step.
In the provisional classification step, a predetermined provisional classification value is assigned to the first character of the character line. For each of the second and subsequent characters, the difference in upper-side position and the difference in lower-side position between the circumscribed frames of that character and the immediately preceding character are calculated, and the displacement amount and displacement direction are obtained from the absolute value and sign of each difference. Based on the displacement amount, the displacement direction, and the provisional classification value assigned to the immediately preceding character, the character is assigned two provisional classification values: one based on the upper-side position of the circumscribed frame and one based on the lower-side position of the circumscribed frame.
In the tallying step, for each of the two kinds of provisional classification values assigned to the characters in the character line, the values are tallied to obtain the frequency of each classification value.
In the classification step, each reference line is identified by a number so that the characters in the character line can be classified by a plurality of reference lines set in advance in the character height direction. The character group A, for which the frequency of the provisional classification value based on the upper-side position of the circumscribed frame is highest, is assigned as its upper-side classification value the identification number of the reference line corresponding to the mean line, where the upper sides of the circumscribed frames of characters that appear frequently in documents most often lie. The upper-side classification values of the other character groups are assigned according to whether their upper-side provisional classification value is larger or smaller than that of character group A. Likewise, the character group B, for which the frequency of the provisional classification value based on the lower-side position of the circumscribed frame is highest in the character line, is assigned as its lower-side classification value the identification number of the reference line corresponding to the baseline, where the lower sides of the circumscribed frames of characters that appear frequently in documents most often lie. The lower-side classification values of the other character groups are assigned according to whether their lower-side provisional classification value is larger or smaller than that of character group B. A new classification result value is then assigned to each combination of the upper-side classification value and lower-side classification value given to each character.
As a result, characters can be classified by simple processing, and character patterns cut out from slanted character lines, or from character lines whose reference lines are uneven, can also be classified accurately.

【００６４】従ってこの文字分類方法を文字認識装置に
適用した場合には、文字の分類が正確に行われるので、
認識精度が高く、処理速度が速く従って高性能な文字認
識装置を実現できる。Therefore, when this character classification method is applied to a character recognition device, characters are classified accurately, so a high-performance character recognition device with high recognition accuracy and high processing speed can be realized.

[Brief description of the drawings]

【図１】本発明の文字分類装置を用いた文字認識装置の
一実施例を示す構成図である。FIG. 1 is a configuration diagram showing an embodiment of a character recognition device using the character classification device of the present invention.

【図２】帳票の一例を示す図である。FIG. 2 is a diagram illustrating an example of a form.

【図３】文字分類部の説明に供する図である。FIG. 3 is a diagram for explaining the character classification unit.

【図４】集計部の説明に供する図である。FIG. 4 is a diagram for explaining the tallying unit.

【図５】分類部の説明に供する図である。FIG. 5 is a diagram for explaining the classification unit.

【図６】辞書部の説明に供する図である。FIG. 6 is a diagram for explaining the dictionary unit.

[Explanation of symbols]

１００文字認識装置１１０画像入力部１２０文字行切り出し部１３０文字切り出し部１４０パタンメモリ１５０文字分類部１５１記憶部１５２仮分類部１５３集計部１５４分類部１６０辞書部１７０文字認識部１８０出力端子１９０制御部
REFERENCE SIGNS LIST
100 character recognition device
110 image input unit
120 character line cutout unit
130 character cutout unit
140 pattern memory
150 character classification unit
151 storage unit
152 provisional classification unit
153 tallying unit
154 classification unit
160 dictionary unit
170 character recognition unit
180 output terminal
190 control unit

Claims

(57) [Claims]

1. A character classification method for classifying characters cut out from a character line, comprising:
a provisional classification step of assigning a predetermined provisional classification value to the first character of the character line and, for each of the second and subsequent characters, calculating the difference in upper-side position and the difference in lower-side position between the circumscribed frames of that character and the immediately preceding character, obtaining a displacement amount and a displacement direction from the absolute value and sign of each difference, and assigning to the character, based on the displacement amount, the displacement direction, and the provisional classification value assigned to the immediately preceding character, two kinds of provisional classification values, namely a provisional classification value based on the upper-side position of the circumscribed frame and a provisional classification value based on the lower-side position of the circumscribed frame;
a tallying step of tallying, for each of the two kinds of provisional classification values assigned to the characters in the character line, the classification values to obtain a frequency for each classification value; and
a classification step in which, in order to classify each character in the character line by a plurality of reference lines set in advance in the character height direction, each reference line is identified by a number; a character group A, for which the frequency of the provisional classification value based on the upper-side position of the circumscribed frame is highest, is assigned as its upper-side classification value the identification number of the reference line corresponding to the mean line, where the upper sides of the circumscribed frames of characters appearing frequently in documents most often lie; the upper-side classification values of the other character groups are assigned according to the magnitude relationship between the provisional classification value based on the upper-side position of the circumscribed frame of each such group and the provisional classification value of character group A; a character group B, for which the frequency of the provisional classification value based on the lower-side position of the circumscribed frame is highest in the character line, is assigned as its lower-side classification value the identification number of the reference line corresponding to the baseline, where the lower sides of the circumscribed frames of characters appearing frequently in documents most often lie; the lower-side classification values of the other character groups are assigned according to the magnitude relationship between the provisional classification value based on the lower-side position of the circumscribed frame of each such group and the provisional classification value of character group B; and a new classification result value is assigned to each combination of the upper-side classification value and the lower-side classification value given to each character.

2. A character recognition device comprising an image input unit for inputting image data, a character line cutout unit for cutting out a character line from the input image data, a character cutout unit for cutting out character patterns from the character line, a character recognition unit for recognizing the cut-out character patterns, and a dictionary unit storing dictionary masks for character recognition, the device further comprising a character classification unit that classifies characters by the character classification method according to claim 1, wherein character recognition is performed using a dictionary selected based on the character classification result.