JPH05290211A - Discrimination method of character kind and the like - Google Patents

Discrimination method of character kind and the like

Info

Publication number
JPH05290211A
JPH05290211A JP4088551A JP8855192A JPH05290211A JP H05290211 A JPH05290211 A JP H05290211A JP 4088551 A JP4088551 A JP 4088551A JP 8855192 A JP8855192 A JP 8855192A JP H05290211 A JPH05290211 A JP H05290211A
Authority
JP
Japan
Prior art keywords
character
value
characters
size
height
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP4088551A
Other languages
Japanese (ja)
Inventor
Tamotsu Maeda
保 前田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP4088551A
Publication of JPH05290211A

Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

PURPOSE: To reduce erroneous discrimination of character types and the like, even across different character types or across different fonts of the same character, by calculating for each character type the centroid value of the size frequencies and the average value from the actual character sizes, obtaining the difference between them, and using as the standard size a value obtained from the frequency distribution when the difference is smaller than a threshold value, or an intermediate value between the centroid value and the average value when the difference is larger.

CONSTITUTION: A character recognition circuit 21 recognizes N characters. The height and width frequencies obtained for the N characters are read from a size frequency storage unit 11. In addition, the height and width totals and the character count are read from a character type size storage unit 9, and the average value is calculated for each character type. It is then judged, for each character type, whether the difference between the centroid value and the average value is within the threshold value. When it is within the threshold value, the intermediate value between the centroid value and the average value, for example, is calculated for each character type and set as the standard size. When the difference is outside the threshold value, the centroid value is taken as the standard size.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for discriminating character types and the like, such as uppercase versus lowercase letters and kanji versus kana characters of similar shape, for character codes output by a character recognition device.

[0002]

[Prior Art] In recent years, information equipment incorporating character recognition devices has been developed. Japanese Patent Laid-Open No. 2-224084 discloses a method for discriminating between uppercase and lowercase letters in a character recognition device.

[0003] The above publication discloses a character type discrimination method that works as follows. For each character in the recognition result, the character code and size are stored in sequence and the character type is determined from the character code. At the same time, a table preset for each character code is consulted to judge whether the character is one whose pattern width or height equals the standard size, one that has a similarly shaped small form (such as "ヤ" and "ャ"), or one whose kanji and kana forms have similar shapes, such as the "工" in "工業" and the "エ" in "アイウエ" (hereinafter, a "kanji/kana similar-shape character"). Characters that have a small form and kanji/kana similar-shape characters are marked in storage, while the actual sizes of characters that have the standard size are tallied for each character type until the recognition result for one document is obtained. A frequency distribution or average value is then computed from the tallied sizes to fix the standard size for each character type. Finally, for each marked character, a predetermined threshold is set relative to the fixed standard size of the corresponding character type, and the character is discriminated as uppercase or lowercase or, for a kanji/kana similar-shape character, as kanji or kana.

[0004]

[Problems to Be Solved by the Invention] However, in the above conventional configuration, the characters regarded as having the standard size (the typical size for each character type) must be decided in advance. Even for the same character, the size relative to the actual standard size differs from font to font. Consequently, even if the sizes of these characters are tallied for each character type and a frequency distribution or average value is calculated to obtain the standard size, the result deviates from the actual standard size, which makes discrimination difficult.

[0005] The present invention solves the above conventional problem. Its object is to provide a method for discriminating character types and the like that reduces erroneous discrimination across different character types and across different fonts of the same character, and thereby markedly improves discrimination accuracy.

[0006]

[Means for Solving the Problems] To achieve this object, the method of the present invention for discriminating character types and the like calculates, for each character type, the centroid of the size frequencies (the centroid value) and the average value from the actual character sizes, and obtains the difference between them. If the difference is smaller than a threshold value, a value obtained from the frequency distribution is used as the standard size; if it is larger, an intermediate value between the centroid value and the average value is used as the standard size.

[0007] Specifically, for a predetermined number of characters, the character code, the center line of the character string, and the height, width, and center point of the character pattern output from a character recognition unit are read. The character type, such as uppercase/lowercase or kanji/kana similar-shape character, is judged from the character code. For characters whose uppercase and lowercase forms have the same shape, the difference between the center line of the character string and the center point of the character pattern is calculated. In addition, the centroid and the average value of the frequencies of character pattern height and width are calculated, the centroid value and the average value are compared, and the standard size for each character type is estimated according to the result. The character type is then discriminated by comparing this standard size with the height or width of the character pattern.

[0008]

[Operation] With this configuration, the centroid of the size frequencies (the centroid value) and the average value are calculated for each character type from the actual character sizes, and the difference between them is obtained. If the difference is smaller than a threshold value, a value obtained from the frequency distribution is used as the standard size; if it is larger, an intermediate value between the centroid value and the average value is used. The error between the calculated standard size and the actual standard size can therefore be reduced, and discrimination accuracy can be markedly improved.

[0009]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0010] FIG. 1 is a block diagram showing the functional configuration of a discrimination device that executes the method for discriminating character types and the like in an embodiment of the present invention. Reference numeral 1 denotes an image input unit that inputs the document to be read as a binary image using an image scanner or the like. 2 denotes a character string cutout unit that extracts character string regions from the binary image supplied by the image input unit 1. 3 denotes a character string center line calculation unit that finds the center line of each character string cut out by the character string cutout unit 2. 4 denotes a character cutout unit that extracts the characters contained in a character string region, passes each character pattern to the character recognition unit 5, and stores the height and width of the character pattern in the height area and width area of the character information storage unit 6. 5 denotes a character recognition unit that outputs the character code corresponding to the character pattern input from the character cutout unit 4. 6 denotes a character information storage unit that stores the recognition result of the character recognition unit 5 in its character code area. 7 denotes a deviation calculation unit that calculates, from the outputs of the character string center line calculation unit 3 and the character cutout unit 4, the deviation between the center of the character string and the center of the character pattern, and stores the result in the deviation value area of the character information storage unit 6. 8 denotes a character code classification unit that judges whether a character code stored in the character information storage unit 6 belongs to a character that has both uppercase and lowercase forms or to some other character. 9 denotes a character type size storage unit that stores the per-character-type size results produced by the character code classification unit 8. 10 denotes an average value calculation unit that calculates the average size for each character type from the data in the character type size storage unit 9. 11 denotes a size frequency storage unit that stores the size frequency results produced by the character code classification unit 8. 12 denotes a frequency centroid value calculation unit that calculates the centroid value of the frequencies for each character type from the data in the size frequency storage unit 11. 13 denotes a standard size determination unit that determines the standard size from the results of the average value calculation unit 10 and the frequency centroid value calculation unit 12. 14 denotes an uppercase/lowercase discrimination unit that judges, with the standard size from the standard size determination unit 13 as the reference, whether a recognized character is uppercase, lowercase, and so on. 15 denotes a control unit that controls the above operations.
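The three storage units in FIG. 1 can be pictured as simple records. The following is a minimal Python sketch under that reading, not the patent's implementation: the class names, field names, and field types are all hypothetical, chosen only to mirror the areas named for units 6, 9, and 11.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CharInfo:
    """One row of the character information storage unit 6 (names are hypothetical)."""
    code: str             # character code area
    line_center: float    # character string center line area
    char_center: float    # character center point area
    height: int           # height area
    width: int            # width area
    attribute: str = ""   # attribute area, filled in by the classification step
    deviation: float = 0.0  # deviation value area


@dataclass
class TypeSizeTotals:
    """One row of the character type size storage unit 9: running totals per type."""
    height_sum: float = 0.0
    width_sum: float = 0.0
    count: int = 0


@dataclass
class SizeFrequencies:
    """One row of the size frequency storage unit 11: histograms per type."""
    height: Counter = field(default_factory=Counter)
    width: Counter = field(default_factory=Counter)
```

Keeping totals (unit 9) and histograms (unit 11) separate mirrors the text, where the average is derived from the former and the centroid from the latter.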

[0011] FIG. 2 is a block diagram showing the hardware configuration of the discrimination device that executes the method for discriminating character types and the like.

[0012] Reference numeral 20 denotes an image scanner that inputs a document image as binary data. 21 denotes a character recognition circuit that cuts out character strings and characters from the image data binarized by the image scanner 20 and then outputs character codes. 22 denotes a RAM comprising the character information storage unit 6, the character type size storage unit 9, and the size frequency storage unit 11. 23 denotes a CPU (central processing unit). 24 denotes a ROM comprising the character category size information table 25 and the program storage unit 26.

[0013] The operation of the discrimination device configured as above is described using the flowchart of FIG. 3, which shows the flow when the method for discriminating character types and the like of this embodiment is executed.

[0014] First, the character recognition circuit 21 recognizes N characters (S1). The character recognition circuit 21 outputs the character code, the center line of the character string, and the center point, height, and width of the character pattern. For all N characters, these outputs are stored in the character code area, character string center line area, character center point area, height area, and width area of the character information storage unit 6, respectively (S2). Next, the type of each of the N characters (alphabetic, hiragana, katakana, and so on) is examined. For hiragana and katakana, it is checked whether the character is one whose full-size and small forms have the same shape (for example, "あ" and "ぁ", or "ヤ" and "ャ"). For alphabetic characters, it is checked whether the character is one whose uppercase and lowercase forms have the same shape ("C" and "c", and so on), an uppercase letter whose uppercase and lowercase forms differ in shape ("A" and so on), or a lowercase letter whose uppercase and lowercase forms differ in shape ("a" and so on). A mark is entered in the corresponding column of the attribute area of the character information storage unit 6 (S3).

Next, i is set to 1 (S4). The attribute, height, and width of the i-th character are read from the attribute area, height area, and width area of the character information storage unit 6. In the size frequency storage unit 11, the values in the height and width columns of the corresponding character type are each incremented by 1. In the character type size storage unit 9, the height and width of the i-th character are added to the height and width total columns of the corresponding character type, and 1 is added to the character count column (S5). The initial value of every column in the size frequency storage unit 11 and the character type size storage unit 9 is 0. The values of the character string center line area and the character center point area for the i-th character are then read from the character information storage unit 6, the degree to which the center of the character pattern lies below the center line of the character string is calculated, and the result is stored in the deviation value area of the character information storage unit 6 (S6). It is then judged whether i is smaller than N (S7). If it is smaller, i is incremented by one (S8) and processing returns to S5. If it is equal or larger, processing proceeds to S9.

The height and width frequencies obtained for the N characters are read from the size frequency storage unit 11, and the centroid value is calculated for each character type (S9). The height and width totals and the character count are read from the character type size storage unit 9, and the average value is calculated for each character type (S10). For each character type, it is judged whether the difference between the centroid value and the average value is within the threshold value (S11). If it is within the threshold value, processing proceeds to S12; otherwise it proceeds to S13. In S12, for example, the intermediate value between the centroid value and the average value is calculated for each character type and used as the standard size. In S13, the centroid value is taken as the standard size.

Next, i is set to 1 (S14). The attribute area of the i-th character is read from the character information storage unit 6. If the character is one whose uppercase and lowercase forms have the same shape, uppercase or lowercase is discriminated (S15) from the relationship between two quantities: first, the degree to which the value in the height area or width area of the character information storage unit 6 is larger or smaller than the standard size obtained in S12 or S13 multiplied by the value in the character category size information table 25; and second, the magnitude of the value in the deviation value area of the character information storage unit 6. Rules such as the following are used: if the former is considerably large, the character is judged uppercase regardless of the latter; if the former is moderately large and the latter is large, lowercase; if the former is moderately large and the latter is small, uppercase. It is then checked whether i is smaller than N (S16). If it is smaller, i is incremented by one in S17 and processing returns to S15; if it is equal or larger, processing ends.
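Steps S4 to S13 of the flowchart can be sketched in a few lines of Python. This is an illustrative reading of the embodiment, not the patented implementation. The function names are hypothetical, only heights are handled (widths would be processed the same way), and the centroid is assumed to be the frequency-weighted mean of the histogram. The per-character `ratio` argument is also an assumption: it stands in for the value from the character category size information table 25, by which the worked example in paragraph [0018] divides each size before totalling.

```python
from collections import Counter, defaultdict


def centroid(hist):
    """Frequency-weighted centroid of a {size: count} histogram (S9)."""
    total = sum(hist.values())
    return sum(size * count for size, count in hist.items()) / total


def standard_size(centroid_value, mean_value, threshold):
    """S11-S13: midpoint when the two estimates agree, centroid otherwise."""
    if abs(centroid_value - mean_value) <= threshold:
        return (centroid_value + mean_value) / 2
    return centroid_value


def estimate_standard_heights(chars, threshold):
    """chars: iterable of (char_type, height, ratio) for the N characters.

    `ratio` is a hypothetical stand-in for the table 25 value of the
    character's category; use 1.0 when no normalization is wanted.
    """
    hist = defaultdict(Counter)  # plays the role of size frequency storage unit 11
    total = defaultdict(float)   # character type size storage unit 9: height total
    count = defaultdict(int)     # character type size storage unit 9: character count
    for char_type, height, ratio in chars:  # loop S4-S8
        hist[char_type][height] += 1
        total[char_type] += height / ratio
        count[char_type] += 1
    return {
        t: standard_size(centroid(hist[t]), total[t] / count[t], threshold)
        for t in hist
    }
```

With every `ratio` equal to 1.0 the centroid and the average are computed from the same values and coincide, so the threshold test only becomes meaningful once the two estimates are derived differently, as in the worked example.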

[0015] Next, the method for discriminating character types and the like, as carried out by the discrimination device of this embodiment, is explained using a concrete example.

[0016] FIG. 4 is a storage content layout diagram showing the characters of the concrete example as stored in the character information storage unit. FIG. 5 is a storage content layout diagram showing them as stored in the size frequency storage unit. FIG. 6 is a storage content layout diagram showing them as stored in the character category size information table. FIG. 7 is a storage content layout diagram showing them as stored in the character type size storage unit.

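[0017] First, as shown in FIG. 4, the character type to which each character code in the character information storage unit 6 belongs is examined (the unit holds the output of the character recognition circuit 21), and the attribute area of the character information storage unit 6 is marked accordingly. Next, the value of the character center point area is subtracted from the value of the character string center line area in the character information storage unit 6, and the difference is written to the deviation value area.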
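The two operations in this step, marking the attribute and computing the deviation value, might look like the following Python sketch. The character sets are illustrative assumptions only; the text gives "C"/"c", "あ"/"ぁ", and "ヤ"/"ャ" as examples but does not list the full table. The attribute labels and function names are likewise hypothetical, and the coordinate convention for the deviation (which direction is "below") is not stated in the text.

```python
# Illustrative sets only: the patent does not enumerate these characters.
SAME_SHAPE_LETTERS = set("CcOoSsVvWwXxZz")
SMALL_KANA = set("ぁぃぅぇぉっゃゅょァィゥェォッャュョ")
FULL_KANA_WITH_SMALL_FORM = set("あいうえおつやゆよアイウエオツヤユヨ")


def attribute_of(ch):
    """Return a hypothetical attribute mark for one recognized character."""
    if ch in SMALL_KANA or ch in FULL_KANA_WITH_SMALL_FORM:
        return "same_shape"  # kana whose full-size and small forms look alike
    if ch in SAME_SHAPE_LETTERS:
        return "same_shape"  # letters such as "C"/"c"
    if ch.isascii() and ch.isalpha():
        return "upper" if ch.isupper() else "lower"  # shape alone decides the case
    return "other"


def deviation(line_center, char_center):
    """Deviation value: string center line minus character center point."""
    return line_center - char_center
```

Only characters marked `same_shape` need the size-based discrimination later; for the others the character code already fixes the case.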
The character type to which the character code of the character information storage unit 6 storing the output from is checked and the attribute area of the character information storage unit 6 is marked. Next, the difference of the character center point area from the value of the character string center line area of the character information storage unit 6 is written in the error value area.

[0018] To keep the explanation simple, alphabetic characters are described below. As shown in FIGS. 5 and 6, for each character with a mark in the uppercase column of the attribute area of the character information storage unit 6, the entries in the uppercase height and width columns of the size frequency storage unit 11 that correspond to the values in its height area and width area are incremented by one (FIG. 5). The same processing is performed for lowercase letters. Also, as shown in FIG. 7, for each character with a mark in the uppercase or lowercase column of the attribute area, the values in its height area and width area are divided by the value for the corresponding character category in the character category size information table 25, and the totals of the resulting values, together with the character count, are stored in the character type size storage unit 9. From FIG. 5, the centroid value of the uppercase height is 50.5, calculated as (50*1+51*1)/2; in the same way the uppercase width is 44.0, and for lowercase the height is 43.5 and the width is 42.0. The average values, on the other hand, are obtained by dividing the totals in FIG. 7 by the character count: height 51.8 and width 42.0 for uppercase, and height 51.25 and width 52.5 for lowercase. Suppose that for uppercase the difference between the centroid value and the average value is within the threshold for both height and width, while for lowercase it is at or above the threshold for both. The uppercase standard size is then height (50.5+51.8)/2 = 51.15 and width (44.0+42.0)/2 = 43.0. The lowercase standard size uses the centroid value alone: height 43.5 and width 42.0. Finally, for characters marked in the character information storage unit 6 as belonging to both uppercase and lowercase, the values in the height area and width area are compared with, for example, the average of the uppercase and lowercase standard sizes. If the value is considerably larger, the character is judged uppercase; if considerably smaller, lowercase. If it lies in between, the value in the deviation value area of that character is examined: if it is larger than a threshold, the character is judged uppercase, and if smaller, lowercase.
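The final discrimination rule of this worked example can be expressed as a short Python sketch. It uses the standard heights derived in the text (51.15 for uppercase, 43.5 for lowercase). The function name and the `margin` and `dev_threshold` parameters are hypothetical: the text says only "considerably larger" and "a threshold" without giving values.

```python
def discriminate(height, deviation, std_upper, std_lower, margin, dev_threshold):
    """Judge a same-shape letter as 'upper' or 'lower'.

    margin and dev_threshold are illustrative parameters; the text does not
    quantify "considerably larger" or the deviation threshold.
    """
    reference = (std_upper + std_lower) / 2  # average of the two standard sizes
    if height > reference + margin:
        return "upper"  # considerably larger than the reference
    if height < reference - margin:
        return "lower"  # considerably smaller than the reference
    # In between: fall back on the deviation value, as the example describes.
    return "upper" if deviation > dev_threshold else "lower"


# Standard heights taken from the worked example in the text.
STD_UPPER_HEIGHT = (50.5 + 51.8) / 2  # midpoint of centroid and average
STD_LOWER_HEIGHT = 43.5               # centroid only
```

For instance, with `margin=2.0` the reference height is 47.325, so a height of 52 is judged uppercase outright, while a height of 47 falls in the middle band and is decided by the deviation value.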

[0019]

[Effects of the Invention] As described above, the present invention calculates, for each character type, the centroid of the size frequencies (the centroid value) and the average value from the actual character sizes and obtains the difference between them. If the difference is smaller than a threshold value, a value obtained from the frequency distribution is used as the standard size; if it is larger, an intermediate value between the centroid value and the average value is used. The error between the calculated standard size and the actual standard size can therefore be reduced. This realizes a method for discriminating character types and the like that makes very few misjudgments and that accurately discriminates character types such as uppercase and lowercase, including characters in different fonts.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the functional configuration of a discrimination device that implements the method for discriminating uppercase and lowercase letters in an embodiment of the present invention.

FIG. 2 is a block diagram showing the hardware configuration of the discrimination device that implements the method for discriminating character types and the like in this embodiment.

FIG. 3 is a flowchart of the execution of the method for discriminating character types and the like in this embodiment.

FIG. 4 is a storage content layout diagram showing the characters of the concrete example as stored in the character information storage unit of the discrimination device of this embodiment.

FIG. 5 is a storage content layout diagram showing the characters of the concrete example as stored in the size frequency storage unit of the discrimination device of this embodiment.

FIG. 6 is a storage content layout diagram showing the characters of the concrete example as stored in the character category size information table of the discrimination device of this embodiment.

FIG. 7 is a storage content layout diagram showing the characters of the concrete example as stored in the character type size storage unit of the discrimination device of this embodiment.

[Explanation of Symbols]

1 image input unit
2 character string cutout unit
3 character string center line calculation unit
4 character cutout unit
5 character recognition unit
6 character information storage unit
7 deviation calculation unit
8 character code classification unit
9 character type size storage unit
10 average value calculation unit
11 size frequency storage unit
12 frequency centroid value calculation unit
13 standard size determination unit
14 uppercase/lowercase discrimination unit
20 image scanner
21 character recognition circuit
22 RAM
23 CPU
24 ROM
25 character category size information table
26 program storage unit

Claims (1)

[Claims]

[Claim 1] A method for discriminating character types and the like, characterized by: reading, for a predetermined number of characters, the character code, the center line of the character string, and the height, width, and center point of the character pattern output from a character recognition unit; judging from the character code the character type, such as uppercase/lowercase or kanji/kana similar-shape character; for characters whose uppercase and lowercase forms have the same shape, calculating the difference between the center line of the character string and the center point of the character pattern, and calculating the centroid and the average value of the frequencies of character pattern height and width; comparing the centroid value with the average value; estimating the standard size for each character type according to the result; and discriminating the character type by comparing this standard size with the height or width of the character pattern.
JP4088551A 1992-04-09 1992-04-09 Discrimination method of character kind and the like Pending JPH05290211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4088551A JPH05290211A (en) 1992-04-09 1992-04-09 Discrimination method of character kind and the like

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4088551A JPH05290211A (en) 1992-04-09 1992-04-09 Discrimination method of character kind and the like

Publications (1)

Publication Number Publication Date
JPH05290211A true JPH05290211A (en) 1993-11-05

Family

ID=13946000

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4088551A Pending JPH05290211A (en) 1992-04-09 1992-04-09 Discrimination method of character kind and the like

Country Status (1)

Country Link
JP (1) JPH05290211A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010044485A (en) * 2008-08-11 2010-02-25 Omron Corp Character recognition device, character recognition program and character recognition method

Similar Documents

Publication Publication Date Title
US5803629A (en) Method and apparatus for automatic, shape-based character spacing
US6643401B1 (en) Apparatus and method for recognizing character
US7580571B2 (en) Method and apparatus for detecting an orientation of characters in a document image
JPH09179937A (en) Method for automatically discriminating boundary of sentence in document picture
JPH09179942A (en) Method for automatically recognizing drop word in document picture using no ocr
EP0810542A2 (en) Bitmap comparison apparatus and method
KR100582039B1 (en) Character recognizing apparatus
JPH05290211A (en) Discrimination method of character kind and the like
US5119441A (en) Optical character recognition apparatus and method using masks operation
JP2005063419A (en) Language identification apparatus, program and recording medium
JP3911942B2 (en) Character recognition device
JPH11338977A (en) Method and device for character processing and storage medium
JP3111521B2 (en) Recognition character correction method
JP2510722B2 (en) How to distinguish uppercase and lowercase letters in English
JP3457094B2 (en) Character recognition device and character recognition method
JP2930605B2 (en) How to distinguish between uppercase, lowercase and Kanji Kana-like characters
JP2697790B2 (en) Character type determination method
JPH0950488A (en) Method for reading different size characters coexisting character string
JPH11126235A (en) Handwritten character recognition device and medium where handwritten character recognition device control program is stored
JP3111522B2 (en) Recognition character correction method
JPH01114991A (en) Method for discriminating capital letter/small letter
JP2001266070A (en) Device and method for recognizing character and storage medium
JPH03150690A (en) Character recognizing device
JP2851102B2 (en) Character extraction method
JP3320083B2 (en) Character recognition apparatus and method