JPH05290211A

JPH05290211A - Discrimination method of character kind and the like

Info

Publication number: JPH05290211A
Application number: JP4088551A
Authority: JP
Inventors: Tamotsu Maeda; 保前田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-04-09
Filing date: 1992-04-09
Publication date: 1993-11-05

Abstract

PURPOSE: To reduce erroneous discrimination of character types and the like, even for characters of the same type or the same character in different fonts, by calculating the frequency centroid value and the average value from the actual sizes for each character type and obtaining their difference. When the difference is smaller than a threshold, a value obtained from the frequency distribution is used as the standard size; when it is larger, an intermediate value between the centroid and the average is used. CONSTITUTION: A character recognition circuit 21 recognizes N characters. The height and width frequencies obtained for the N characters are read from a size frequency storage unit 11. The total height and width values and the character count are read from a character type size storage unit 9, and the average value is calculated for each character type. Whether the difference between the centroid and the average is within the threshold is then judged for each character type. When it is within the threshold, the intermediate value between the centroid and the average, for example, is calculated for each character type and set as the standard size. When the difference is outside the threshold, the centroid value is adopted as the standard size.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for discriminating character types and the like, such as uppercase versus lowercase letters and kanji/kana characters of similar shape, for character codes output by a character recognition device.

[0002]

[Prior Art] In recent years, information equipment incorporating character recognition devices has been developed. Japanese Patent Laid-Open No. 2-224084 discloses a method for discriminating between uppercase and lowercase letters in a character recognition device.

[0003] The above publication discloses the following method for discriminating character types. For each character in the recognition result, the character code and size are stored in sequence and the character type is determined from the character code. A table preset for each character code is consulted to judge whether the character is one whose pattern width or height equals the standard size, one that has a similarly shaped small form, such as 「ヤ」 and 「ャ」, or one whose shape is similar between kanji and kana, such as the 「工」 in 「工業」 and the 「エ」 in 「アイウエ」 (hereinafter, a kanji/kana similar-shape character). If the character has a small form or is a kanji/kana similar-shape character, the stored character is marked. Meanwhile, the actual sizes of the characters that have the standard size are tallied for each character type over the recognition result of one document, and a frequency distribution or an average value is obtained from the tallied sizes to fix the standard size for each character type. For each previously marked character, a predetermined threshold is set relative to the fixed standard size of its character type, and the character is discriminated as uppercase or lowercase, or, for a kanji/kana similar-shape character, as kanji or kana.

[0004]

[Problems to Be Solved by the Invention] In the conventional configuration, however, the characters that have the standard size (the typical size for each character type) must be decided in advance. Even the same character differs in size relative to the actual standard size when the font differs. As a result, tallying the sizes of these characters for each character type and computing a frequency distribution or an average value yields a standard size that deviates from the actual one, which makes discrimination difficult.

[0005] The present invention solves the above conventional problem. Its object is to provide a method for discriminating character types and the like that reduces erroneous discrimination, including for characters of the same type or the same character in different fonts, and significantly improves discrimination accuracy.

[0006]

[Means for Solving the Problems] To achieve this object, the method for discriminating character types and the like according to the present invention calculates, for each character type, the frequency centroid value (centroid value) and the average value from the actual character sizes and obtains their difference. If the difference is smaller than a threshold, the value obtained from the frequency distribution is used as the standard size; if it is larger, an intermediate value between the centroid value and the average value is used as the standard size.
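The rule stated in this paragraph can be written compactly. The notation is an assumption made for illustration and does not appear in the text: f_c(s) is the number of characters of type c observed with size s (height or width), s_i and n_c are the sizes and the character count accumulated for the average, T is the threshold, the "value obtained from the frequency distribution" is taken to be the centroid g_c, and the intermediate value is taken to be the midpoint.

```latex
g_c = \frac{\sum_s s\, f_c(s)}{\sum_s f_c(s)}, \qquad
m_c = \frac{1}{n_c} \sum_{i=1}^{n_c} s_i, \qquad
S_c =
\begin{cases}
g_c, & |g_c - m_c| < T \\[4pt]
\tfrac{1}{2}\,(g_c + m_c), & |g_c - m_c| > T
\end{cases}
```

The case of exact equality with T is left open here because the paragraph does not specify it.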

[0007] Specifically, for a predetermined number of characters, the character code, the center line of the character string, and the height, width, and center point of the character pattern output from the character recognition unit are read. The character type, such as uppercase/lowercase or kanji/kana similar-shape character, is judged from the character code. For characters whose uppercase and lowercase forms have the same shape, the difference between the center line of the character string and the center point of the character pattern is calculated. The frequency centroid and the average value of the character pattern height and width are also calculated, the centroid value and the average value are compared, and the standard size for each character type is estimated according to the result. The character type and the like is then discriminated by comparing this standard size with the height or width of the character pattern.

[0008]

[Operation] With this configuration, the frequency centroid value (centroid value) and the average value are calculated from the actual sizes for each character type and their difference is obtained. If the difference is smaller than the threshold, the value obtained from the frequency distribution is used as the standard size; if it is larger, an intermediate value between the centroid value and the average value is used. The error between the calculated standard size and the actual standard size can therefore be reduced, and discrimination accuracy can be significantly improved.

[0009]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0010] FIG. 1 is a block diagram showing the functional configuration of a discrimination device that executes the method for discriminating character types and the like in an embodiment of the present invention. Reference numeral 1 denotes an image input unit that inputs the document to be read as a binary image by means of an image scanner or the like. 2 denotes a character string cutout unit that extracts character string regions from the binary image supplied by the image input unit 1. 3 denotes a character string center line calculation unit that obtains the center line of each character string cut out by the character string cutout unit 2. 4 denotes a character cutout unit that extracts the characters contained in a character string region, passes each character pattern to a character recognition unit 5, and stores the height and width of the character pattern in the height area and width area of a character information storage unit 6. 5 denotes the character recognition unit, which outputs the character code corresponding to the character pattern input from the character cutout unit 4. 6 denotes the character information storage unit, which stores the recognition result of the character recognition unit 5 in its character code area. 7 denotes a deviation calculation unit that calculates, from the outputs of the character string center line calculation unit 3 and the character cutout unit 4, the deviation between the center of the character string and the center of the character pattern and stores the result in the deviation value area of the character information storage unit 6. 8 denotes a character code classification unit that judges whether a character code stored in the character information storage unit 6 belongs to a character that has both uppercase and lowercase forms or to some other character. 9 denotes a character type size storage unit that stores the size results for each character type from the character code classification unit 8. 10 denotes an average value calculation unit that calculates the average size for each character type from the data in the character type size storage unit 9. 11 denotes a size frequency storage unit that stores the size frequency results from the character code classification unit 8. 12 denotes a frequency centroid value calculation unit that calculates the frequency centroid value for each character type from the data in the size frequency storage unit 11. 13 denotes a standard size determination unit that determines the standard size from the results of the average value calculation unit 10 and the frequency centroid value calculation unit 12. 14 denotes an uppercase/lowercase discrimination unit that discriminates whether a recognized character is uppercase or lowercase, and the like, using the standard size from the standard size determination unit 13 as the reference. 15 denotes a control unit that controls the above operations.
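As an aid to reading the block diagram, the three storage units (6, 9, and 11) can be pictured as simple records. This is a minimal sketch only: the class names, field names, and attribute labels such as "upper" are hypothetical and chosen for illustration, and the text does not specify any data layout beyond the named areas and columns.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class CharInfo:
    """One entry of the character information storage unit 6 (field names are hypothetical)."""
    code: str                # character code area
    line_center: float       # character string center line area
    char_center: float       # character center point area
    height: int              # height area
    width: int               # width area
    attribute: str = ""      # attribute area, e.g. "upper", "lower", "same_shape"
    deviation: float = 0.0   # deviation value area


@dataclass
class TypeSizeTotals:
    """One row of the character type size storage unit 9; every column starts at 0."""
    height_total: float = 0.0
    width_total: float = 0.0
    count: int = 0


@dataclass
class Storage:
    """Sketch of the stored data: units 6, 9 and 11."""
    # Unit 6: one CharInfo per recognized character.
    char_info: list = field(default_factory=list)
    # Unit 9: running totals keyed by character type.
    type_totals: dict = field(default_factory=lambda: defaultdict(TypeSizeTotals))
    # Unit 11: size histograms keyed by character type (size -> frequency).
    height_freq: dict = field(default_factory=lambda: defaultdict(Counter))
    width_freq: dict = field(default_factory=lambda: defaultdict(Counter))
```

Using `defaultdict` means a character type seen for the first time gets zeroed columns automatically, which mirrors the statement that the initial value of every column is 0.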

[0011] FIG. 2 is a block diagram showing the hardware configuration of the discrimination device that executes the method for discriminating character types and the like.

[0012] Reference numeral 20 denotes an image scanner that inputs a document image as binary data. 21 denotes a character recognition circuit that cuts out character strings and characters from the image data binarized by the image scanner 20 and then outputs character codes. 22 denotes a RAM comprising the character information storage unit 6, the character type size storage unit 9, and the size frequency storage unit 11. 23 denotes a CPU (central processing unit). 24 denotes a ROM comprising a character category size information table 25 and a program storage unit 26.

[0013] The operation of the discrimination device configured as described above is explained with reference to the flowchart of FIG. 3, which shows the flow when the method for discriminating character types and the like of this embodiment is executed.

[0014] First, the character recognition circuit 21 recognizes N characters (S1). The character recognition circuit 21 outputs the character code, the center line of the character string, and the center point, height, and width of the character pattern. For all N characters, these outputs are stored in the character code area, character string center line area, character center point area, height area, and width area of the character information storage unit 6, respectively (S2). Next, the kind of each of the N characters (alphabetic, hiragana, katakana, and so on) is examined. For hiragana and katakana, it is checked whether the character is one whose large and small forms have the same shape (for example, “あ” and “ぁ”, or “ヤ” and “ャ”). For alphabetic characters, it is checked whether the character is one whose uppercase and lowercase forms have the same shape (“Ｃ” and “ｃ”, etc.), an uppercase letter whose uppercase and lowercase forms differ in shape (“Ａ”, etc.), or a lowercase letter whose uppercase and lowercase forms differ in shape (“ａ”, etc.). A mark is entered in the corresponding column of the attribute area of the character information storage unit 6 (S3). Next, i is set to 1 (S4). The attribute, height, and width of the i-th character are read from the attribute area, height area, and width area of the character information storage unit 6. The values in the height and width columns of the corresponding character type in the size frequency storage unit 11 are each incremented by 1, the height and width of the i-th character are added to the height and width total columns of the corresponding character type in the character type size storage unit 9, and 1 is added to the character count column (S5). The initial value of every column in the size frequency storage unit 11 and the character type size storage unit 9 is 0. The values of the character string center line area and the character center point area for the i-th character are read from the character information storage unit 6, the degree to which the center of the character pattern lies below the center line of the character string is calculated, and the result is stored in the deviation value area of the character information storage unit 6 (S6). It is then determined whether i is smaller than N (S7). If it is smaller, i is incremented by one (S8) and the process returns to S5. If it is equal or larger, the process proceeds to S9. The height and width frequencies obtained for the N characters are read from the size frequency storage unit 11, and the centroid value is calculated for each character type (S9). The height and width totals and the character count are read from the character type size storage unit 9, and the average value is calculated for each character type (S10). For each character type, it is determined whether the difference between the centroid value and the average value is within a threshold (S11). If it is within the threshold, the process proceeds to S12; otherwise it proceeds to S13. In S12, for example, the intermediate value between the centroid value and the average value is calculated for each character type and taken as the standard size. In S13, the centroid value is taken as the standard size. Next, i is set to 1 (S14). The attribute area of the i-th character in the character information storage unit 6 is read. If the character is one whose uppercase and lowercase forms have the same shape, uppercase or lowercase is discriminated (S15) from two quantities: the degree to which the value in the height area or width area of the character information storage unit 6 is larger or smaller than the standard size obtained in S12 or S13 multiplied by the value in the character category size information table 25, and the magnitude of the value in the deviation value area of the character information storage unit 6. The rules used are, for example: if the former is considerably large, the character is judged uppercase regardless of the latter; if the former is moderately large and the latter is large, lowercase; if the former is moderately large and the latter is small, uppercase. It is then checked whether i is smaller than N (S16). If it is smaller, i is incremented by one in S17 and the process returns to S15; if it is equal or larger, the process ends.
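Steps S5 and S9 to S13 can be sketched as follows for one character type and one dimension (height or width). This is a minimal sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the frequency centroid is assumed to be the frequency-weighted mean of the histogram (which agrees with the (50*1 + 51*1)/2 figure in the worked example), the intermediate value is assumed to be the midpoint, and the threshold value is left to the caller because the text gives none. The total and count are passed separately from the histogram because the text keeps them in a separate storage unit.

```python
from collections import Counter


def accumulate(sizes, freq: Counter) -> tuple:
    """Step S5 for one character type: update the histogram, the running total and the count."""
    total, count = 0.0, 0
    for s in sizes:
        freq[s] += 1   # size frequency storage unit 11
        total += s     # total column of character type size storage unit 9
        count += 1     # character count column of unit 9
    return total, count


def centroid(freq: Counter) -> float:
    """Step S9 (assumed definition): sum(size * frequency) / sum(frequency)."""
    n = sum(freq.values())
    return sum(size * f for size, f in freq.items()) / n


def standard_size(freq: Counter, size_total: float, count: int, threshold: float) -> float:
    """Steps S9 to S13 for one character type and one dimension."""
    g = centroid(freq)             # S9
    m = size_total / count         # S10
    if abs(g - m) <= threshold:    # S11: difference within the threshold
        return (g + m) / 2         # S12: midpoint assumed as the intermediate value
    return g                       # S13: centroid alone
```

For example, `standard_size(Counter({50: 1, 51: 1}), 103.6, 2, threshold=2.0)` takes the S12 branch, while a threshold of 1.0 would take the S13 branch and return the centroid.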

[0015] Next, the method for discriminating character types and the like in the discrimination device of this embodiment is explained using a concrete example.

[0016] FIG. 4 is a storage content layout diagram showing the characters of the concrete example as stored in the character information storage unit. FIG. 5 shows them as stored in the size frequency storage unit, FIG. 6 as stored in the character category size information table, and FIG. 7 as stored in the character type size storage unit.

[0017] First, as shown in FIG. 4, the character type to which each character code belongs is looked up in the character information storage unit 6, which holds the output of the character recognition circuit 21, and the attribute area of the character information storage unit 6 is marked accordingly. Next, the value in the character center point area is subtracted from the value in the character string center line area of the character information storage unit 6, and the difference is written to the deviation value area.
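The deviation value described here is a single subtraction. The sketch below assumes both quantities are vertical coordinates given as plain numbers; the function and parameter names are hypothetical. Whether a positive result means the character sits below the center line depends on the direction of the coordinate axis, which the text does not state.

```python
def deviation(line_center: float, char_center: float) -> float:
    """Deviation value: character string center line minus character center point."""
    return line_center - char_center


# Hypothetical coordinates for three characters on a line whose center line is at 100.0.
char_centers = [100.0, 96.0, 103.5]
deviations = [deviation(100.0, c) for c in char_centers]
```

Keeping the sign, rather than taking an absolute value, preserves the information about which side of the center line the character lies on, which the later uppercase/lowercase rule relies on.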

[0018] For ease of understanding, the explanation below deals with alphabetic characters. As shown in FIGS. 5 and 6, for each character whose attribute area in the character information storage unit 6 is marked in the uppercase alphabetic column, the values in the uppercase alphabetic height and width columns of the size frequency storage unit 11 that correspond to the character's height area and width area values are each incremented by one (FIG. 5). The same processing is performed for lowercase letters. In addition, as shown in FIG. 7, for each character whose attribute area in the character information storage unit 6 is marked in the uppercase or lowercase alphabetic column, the height area and width area values are each divided by the value for the corresponding character category in the character category size information table 25, and the totals of the resulting values, together with the character count, are stored in the character type size storage unit 9. From FIG. 5, the centroid value of the uppercase height is 50.5, calculated as (50*1 + 51*1)/2. The uppercase width is likewise 44.0, and for lowercase the height is 43.5 and the width 42.0. The average values are obtained by dividing the totals in FIG. 7 by the character counts, giving a height of 51.8 and a width of 42.0 for uppercase, and a height of 51.25 and a width of 52.5 for lowercase. Suppose that for uppercase letters the difference between the centroid value and the average value is within the threshold for both height and width, and that for lowercase letters it is at or above the threshold for both. The standard size for uppercase letters is then a height of (50.5 + 51.8)/2 = 51.15 and a width of (44.0 + 42.0)/2 = 43.0. For lowercase letters only the centroid value is adopted, giving a height of 43.5 and a width of 42.0. Finally, for each character marked in both the uppercase and lowercase alphabetic columns in the character information storage unit 6, the height area and width area values are compared with, for example, the average of the uppercase and lowercase standard sizes. If the value is considerably larger, the character is judged uppercase; if considerably smaller, lowercase. If it is in between, the value in the deviation value area for that character in the character information storage unit 6 is examined, and the character is judged uppercase if it is larger than a threshold and lowercase if smaller.
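The arithmetic of this example, and the final comparison rule, can be reproduced with a short sketch. The centroid and average figures are those quoted in the paragraph. Everything else is an assumption made for illustration: the function names, the `margin` that stands in for "considerably larger or smaller", and the `deviation_threshold`, none of which are given numerically in the text.

```python
def pick_standard(centroid: float, mean: float, within_threshold: bool) -> float:
    """Midpoint when centroid and mean agree within the threshold, otherwise the centroid."""
    return (centroid + mean) / 2 if within_threshold else centroid


# (centroid, mean) pairs from the worked example; uppercase is supposed to be
# within the threshold and lowercase at or above it, as stated in the text.
upper_h = pick_standard(50.5, 51.8, within_threshold=True)
upper_w = pick_standard(44.0, 42.0, within_threshold=True)
lower_h = pick_standard(43.5, 51.25, within_threshold=False)
lower_w = pick_standard(42.0, 52.5, within_threshold=False)


def classify(size: float, deviation: float, upper_std: float, lower_std: float,
             margin: float, deviation_threshold: float) -> str:
    """Final rule for same-shape letters; margin and deviation_threshold are hypothetical."""
    reference = (upper_std + lower_std) / 2   # average of the two standard sizes
    if size > reference + margin:             # considerably larger
        return "upper"
    if size < reference - margin:             # considerably smaller
        return "lower"
    # In between: decide by the deviation value. The text does not say what
    # happens at exact equality; this sketch treats it as lowercase.
    return "upper" if deviation > deviation_threshold else "lower"
```

With the height figures above, the reference is the mean of `upper_h` and `lower_h`, so a call such as `classify(47, 5.0, upper_h, lower_h, margin=3, deviation_threshold=2)` falls in the middle band and is decided by the deviation value.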

[0019]

[Effects of the Invention] As described above, the present invention calculates, for each character type, the frequency centroid value (centroid value) and the average value from the actual sizes and obtains their difference. If the difference is smaller than the threshold, the value obtained from the frequency distribution is used as the standard size; if it is larger, an intermediate value between the centroid value and the average value is used. The error between the calculated standard size and the actual standard size can therefore be made small. This realizes a method for discriminating character types and the like that produces very few erroneous discriminations and accurately discriminates character types such as uppercase and lowercase, including character types in different fonts.

[Brief description of drawings]

FIG. 1 is a block diagram showing the functional configuration of a discrimination device that implements the uppercase/lowercase discrimination method in an embodiment of the present invention.

FIG. 2 is a block diagram showing the hardware configuration of a discrimination device that implements the method for discriminating character types and the like in this embodiment.

FIG. 3 is a flowchart for executing the method for discriminating character types and the like of this embodiment.

FIG. 4 is a storage content layout diagram showing the characters of the concrete example as stored in the character information storage unit of the discrimination device of this embodiment.

FIG. 5 is a storage content layout diagram showing the characters of the concrete example as stored in the size frequency storage unit of the discrimination device of this embodiment.

FIG. 6 is a storage content layout diagram showing the characters of the concrete example as stored in the character category size information table of the discrimination device of this embodiment.

FIG. 7 is a storage content layout diagram showing the characters of the concrete example as stored in the character type size storage unit of the discrimination device of this embodiment.

[Explanation of symbols]

1 Image input unit
2 Character string cutout unit
3 Character string center line calculation unit
4 Character cutout unit
5 Character recognition unit
6 Character information storage unit
7 Deviation calculation unit
8 Character code classification unit
9 Character type size storage unit
10 Average value calculation unit
11 Size frequency storage unit
12 Frequency centroid value calculation unit
13 Standard size determination unit
14 Uppercase/lowercase discrimination unit
20 Image scanner
21 Character recognition circuit
22 RAM
23 CPU
24 ROM
25 Character category size information table
26 Program storage unit

Claims

[Claims]

1. A method for discriminating character types and the like, characterized by: reading, for a predetermined number of characters, the character code, the center line of the character string, and the height, width, and center point of the character pattern output from a character recognition unit; judging from the character code the character type, such as uppercase/lowercase or kanji/kana similar-shape character; for characters whose uppercase and lowercase forms have the same shape, calculating the difference between the center line of the character string and the center point of the character pattern; calculating the frequency centroid and the average value of the character pattern height and width; comparing the centroid value with the average value; estimating the standard size for each character type according to the result; and discriminating the character type and the like by comparing the standard size with the height or width of the character pattern.