JPH03126188A - Character recognizing device - Google Patents

Character recognizing device

Info

Publication number
JPH03126188A
Authority
JP
Japan
Prior art keywords
character
name
line
recognized
pattern
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1264734A
Other languages
Japanese (ja)
Other versions
JP2788506B2 (en
Inventor
Hiroshi Yoshida
浩史 吉田
Toru Ishikawa
石川 融
Koichi Higuchi
浩一 樋口
Yoshiyuki Yamashita
山下 義征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oki Electric Industry Co Ltd
Original Assignee
Oki Electric Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co Ltd
Priority to JP1264734A
Publication of JPH03126188A
Application granted
Publication of JP2788506B2
Anticipated expiration
Expired - Lifetime


Abstract

PURPOSE: To accurately distinguish characters that have the same shape but different sizes by calculating a reference line for the character line data from the recognition results of the first through n-th characters of the line, and thereafter recognizing each character based on its shape and on the position of its character pattern relative to the reference line. CONSTITUTION: The device has a photoelectric conversion part 112 that photoelectrically converts and quantizes an optical signal 111 from a medium to obtain input character line data for a character line on the medium, a line buffer 113, a character segmenting part 114, and a pattern register 115 that stores each segmented character pattern. When the characters of a line are recognized, the reference line of the line is determined from the recognition results of the first through n-th characters; the (n+1)-th and subsequent characters are then recognized based on the shape of each character and the position of its quantized pattern relative to the reference line. Characters can thus be distinguished accurately even when the line contains characters with the same shape but different sizes.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a character recognition device capable of high recognition accuracy.

(Prior Art) If a machine could automatically identify characters and figures, various advantages would follow; for example, data could be entered into a computer more efficiently and more accurately than by a human.

For this reason, character recognition devices have long been an active research topic.

A conventional character recognition device generally comprised the components [1] to [3] below.

[1]… A photoelectric conversion unit that photoelectrically converts the optical signal obtained by scanning a medium, such as a form, on which characters, figures, and the like are written, and produces binary input character line data in which character strokes are represented by, for example, black bits and the background by white bits.

[2]… A character segmentation unit that cuts character patterns out of this input character line data.

[3]… A recognition unit that extracts features from the character pattern, compares them with the features of standard characters prepared in advance, and outputs the character name of the most similar standard character pattern as the recognition result character name of the character to be recognized.

However, when such a device recognizes foreign-language text such as English, or a line of characters such as a name or address written in Latin letters, the line contains characters with exactly the same shape, such as the comma "," and the apostrophe "'", or the capital "P" and the lowercase "p". Character recognition therefore could not be performed accurately from the shape of the character pattern alone.

To solve this problem, methods have been used that recognize characters using the size of the character pattern and its relative position within the character line in addition to its shape.

One such method is disclosed in the literature (1988 Spring National Convention of the Institute of Electronics, Information and Communication Engineers (March 15, 1988), D-448).

According to the method disclosed in this document, rectangular frames circumscribing the characters are first extracted from the character line. The circumscribing rectangles of the characters in the line are then compared, and characters that are extremely small compared with the largest character are removed. Next, histograms are built of the height positions of the upper edges and of the lower edges of the remaining circumscribing rectangles. From these histograms, the lowest peak among the upper edges and the highest peak among the lower edges are detected, and a straight line giving the slope of the character line is obtained by the least squares method from the upper-end and lower-end coordinates of the characters whose size is approximately equal to the distance between these peaks. After the character height offsets caused by skew are corrected using the slope of this line, the histograms are built again in the same manner. Two peaks are again detected from these histograms as described above and are taken as the upper reference line and the lower reference line. The distance between the upper and lower reference lines is then taken as the size of a reference-size character; the size of each character pattern in the line is compared with this reference size, and the position of each character pattern is compared with the upper and lower reference lines. Based on these comparisons, each character in the line is classified into one of several categories, which improves recognition accuracy.

(Problem to be Solved by the Invention) However, the conventional character recognition method disclosed in the above document must perform a series of processes on all characters in the character line: input of rectangle information, removal of extremely small characters, correction of line skew, and calculation of the reference lines. Recognition speed is therefore significantly reduced.

Also, for a short character line consisting of only one or two characters, very little data is available for building the histogram, so the peaks that become the reference lines cannot be detected accurately. The conventional method therefore cannot be applied to short character lines, and if it is applied anyway, recognition accuracy actually decreases.

Furthermore, for a character line consisting mostly of characters of the same size, the histogram has little relief, so the reference lines and the reference size cannot be detected and accurate character recognition is not possible.

The present invention has been made in view of these points. Its object is therefore to solve the above problems and to provide a character recognition device that can recognize even characters of identical shape accurately and at high speed.

(Means for Solving the Problem) To achieve this object, the inventors of this application conducted various studies. They concluded that the object of the invention can be achieved if, when recognizing the characters of a character line, the reference line of the line is determined from the recognition results of the first through n-th characters, and the (n+1)-th and subsequent characters are recognized based on the shape of each character and the position of its quantized pattern (character pattern) relative to the reference line.

Accordingly, the invention is a character recognition device comprising a photoelectric conversion unit that photoelectrically converts and quantizes light from a medium to obtain input character line data for a character line on the medium, a character segmentation unit that cuts character patterns out of the input character line data, and a recognition unit that extracts features of each character pattern and outputs the recognition result character name of the character to be recognized, characterized in that the recognition unit is configured to determine the recognition result character name according to (A) and (B) below.

(A) For the first through n-th characters to be recognized in the character line, the recognition result character name is determined one character at a time based on features that include features arising from the shape of the character pattern of the character to be recognized (where n is any positive integer).

(B) For the (n+1)-th and subsequent characters to be recognized in the character line, the recognition result character name is determined one character at a time based on: [1]… features that include features arising from the shape of the character pattern of the character to be recognized, and [2]… the position of the coordinates, in the input character line data, of the character pattern of the character to be recognized relative to the reference line coordinates of the input character line data, the reference line coordinates being calculated from some or all of the predetermined coefficients corresponding to the recognition result character names of the first through n-th characters and from the coordinates, in the input character line data, of the character patterns corresponding to the coefficients used.

Here, "features that include features arising from the shape of the character pattern of the character to be recognized" means, for example, features of the shape of the character alone, features of the size of the character, or features of both the shape and the size of the character.

In carrying out the invention, the recognition unit is preferably composed of a candidate character name extraction unit, which extracts candidate character names based on features arising from the shape of the character pattern of the character to be recognized, and a character name determination unit, which determines the recognition result character name from the extracted candidate character names according to (a) and (b) below.

(a) For the first through n-th characters, the candidate character name with the highest similarity to the character to be recognized is taken as the recognition result character name.

(b) For the (n+1)-th and subsequent characters to be recognized, a positional feature of the character pattern in the input character line data is calculated from the reference line coordinates and the coordinates of the character pattern of the character to be recognized. This positional feature is compared with a predetermined value associated with the candidate character name of the character to be recognized; if a predetermined condition is satisfied, that candidate character name is taken as the recognition result character name. If it is not satisfied, the same comparison between the associated predetermined value and the positional feature is made for the candidate character names ranked third or lower, and the one among them that satisfies the predetermined condition is taken as the recognition result character name.

(Operation) With the character recognition device of the invention, the recognition results of the first through n-th characters of a character line are used to calculate a reference line for character position, which is then used to judge whether each candidate character name of the (n+1)-th and subsequent characters is appropriate as the recognition result character name. The recognition result character name can be determined from this reference line and the data obtained from the character pattern of the character to be recognized. Consequently, even if the (n+1)-th and subsequent characters include characters with the same shape but different sizes, such as the capital "P" and the lowercase "p", the two can be distinguished accurately.

Recognition accuracy over the entire character line is therefore improved.

Moreover, the reference line is calculated from only one or a few characters at the head of the character line, so it can be calculated in far less time than with the conventional method. Recognition time is therefore shortened.

(Embodiments) An embodiment of the character recognition device of the invention is described below with reference to the drawings.

FIG. 1 is a block diagram schematically showing the configuration of the character recognition device of the embodiment.

In FIG. 1, 100 denotes the character recognition device; 111, an optical signal from a medium (for example, a form); 112, a photoelectric conversion unit that photoelectrically converts and quantizes the optical signal 111 to obtain input character line data for a character line on the medium; 113, a line buffer that stores this input character line data; 114, a character segmentation unit that cuts character patterns out of the input character line data in the line buffer; and 115, a pattern register that stores each cut-out character pattern. The photoelectric conversion unit 112, line buffer 113, character segmentation unit 114, and pattern register 115 are each built from conventionally known circuits. In this embodiment, the line buffer 113 is a memory with a capacity of 128 × 4096 pixels, and the pattern register 115 is a memory with a capacity of 128 × 512 pixels.

Also in FIG. 1, 116 denotes the recognition unit according to the invention. The recognition unit 116 is configured to determine the recognition result character name according to (A) and (B) below.

(A) For the first through n-th characters to be recognized in the character line, the recognition result character name is determined one character at a time based on features that include features arising from the shape of the character pattern of the character to be recognized.

(B) For the (n+1)-th and subsequent characters to be recognized in the character line, the recognition result character name is determined one character at a time based on: [1]… features arising from the shape of the character pattern of the character to be recognized, and [2]… the position of the coordinates, in the input character line data, of the character pattern of the character to be recognized relative to the reference line coordinates of the input character line data, the reference line coordinates being calculated from some or all of the predetermined coefficients corresponding to the recognition result character names of the first through n-th characters and from the coordinates, in the input character line data, of the character patterns corresponding to the coefficients used.

To facilitate the processing of (A) and (B) above, the recognition unit 116 of this embodiment consists of a candidate character name extraction unit 117, which extracts candidate character names based on features arising from the shape of the character pattern of the character to be recognized, and a character name determination unit 118, which comprises a character position determination unit 118a, a character position feature table 118b, a reference line calculation unit 118c, and a reference line storage unit 118d in order to determine the recognition result character name from the extracted candidate character names according to (a) and (b) below.

(a) For the first through n-th characters to be recognized in the character line, the candidate character name with the highest similarity to the character to be recognized is taken as the recognition result character name.

(b) For the (n+1)-th and subsequent characters to be recognized in the character line, a positional feature of the character pattern in the input character line data is calculated from the reference line coordinates and the coordinates of the character pattern of the character to be recognized. This positional feature is compared with a predetermined value associated with the candidate character name of the character to be recognized; if a predetermined condition is satisfied, that candidate character name is taken as the recognition result character name. If it is not satisfied, the same comparison between the associated predetermined value and the positional feature is made for the candidate character names ranked third or lower, and the one among them that satisfies the predetermined condition is taken as the recognition result character name.
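The decision rules (a) and (b) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the `expected_position` mapping (standing in for the values held in the character position feature table), the tolerance-based condition, the fallback when no candidate matches, and the choice to try candidates simply in similarity order are all hypothetical, since the text does not give the concrete condition at this point.

```python
from typing import Mapping, Sequence


def decide_character_name(
    m: int,
    n: int,
    candidates: Sequence[str],
    position_feature: float,
    expected_position: Mapping[str, float],
    tolerance: float = 0.15,
) -> str:
    """Choose a result from candidates sorted by descending similarity.

    m is the character pattern number within the line; n is the number of
    leading characters that are decided by similarity alone.
    """
    if m <= n:
        # Rule (a): the first n characters take the most similar candidate.
        return candidates[0]
    # Rule (b): accept the first candidate whose stored value agrees with
    # the measured positional feature. The absolute-difference test is an
    # assumed stand-in for the "predetermined condition".
    for name in candidates:
        expected = expected_position.get(name)
        if expected is not None and abs(position_feature - expected) <= tolerance:
            return name
    # Assumed fallback (not specified in the text): keep the top candidate.
    return candidates[0]


# Hypothetical table values: fraction of the character box above the baseline.
table = {"P": 1.0, "p": 0.65}
print(decide_character_name(1, 1, ["P", "p"], 0.68, table))  # first character
print(decide_character_name(4, 1, ["P", "p"], 0.68, table))  # later character
```

With these made-up values, the first character is decided by similarity alone, while a later character whose box extends below the reference line is resolved to the lowercase candidate.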

Here, the reference line calculation table 118c stores the above-mentioned predetermined coefficients corresponding to each recognition result character name. The character position feature table 118b stores the above-mentioned predetermined values associated with the candidate character names of the character to be recognized.

Also in FIG. 1, 119 denotes a character name output terminal for outputting the character name determined by the character name determination unit 118 to, for example, an external computer or a display device.

Next, to aid understanding of the character recognition device of the embodiment, its operation is described with reference to FIG. 1, Attached Table 1, Attached Table 2, FIGS. 2(A) and 2(B), and Attached Table 3. Attached Table 1 illustrates the reference line calculation table 118c; Attached Table 2 illustrates the character position feature table 118b; FIG. 2(A) illustrates the input character line data 21 stored in the line buffer; FIG. 2(B) illustrates the reference line 23 in the input character line data 21; and Attached Table 3 illustrates the candidate character names and the recognition result character name when the character to be recognized is the lowercase "p".
Attached Table 2, which provides an explanation of the character position feature table +
18b is a table for explaining the input character line data 21 stored in the line buffer. FIG. 2(B) is a table for explaining the input character line data 21 stored in the line buffer. Table 3 is a table for explaining candidate character names and recognition result character names when the character to be recognized is the lowercase letter "P".

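First, the optical signal 111 from a form on which characters, figures, and the like (hereinafter simply "characters") are written is input to the photoelectric conversion unit 112. The photoelectric conversion unit 112 photoelectrically converts this optical signal 111 into a binary digital signal in which character strokes are represented by, for example, black bits and the background by white bits (this signal corresponds to the input character line data), and stores the input character line data in the line buffer 113.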
An optical signal 111 from the generated form is input to a photoelectric conversion unit 112. The photoelectric conversion unit 112 photoelectrically converts this optical signal 111 into a 247 m digital signal in which the character line portion is expressed by black bits and the background portion is expressed by white bits (this signal corresponds to input character line data). This input character line data is stored in the line buffer 113.

The line buffer 113 stores the input character line data received from the photoelectric conversion unit 112 in a form from which the two-dimensional coordinates can be reproduced. FIG. 2(A) shows visually the input character line data 21 stored in the line buffer 113.

Next, the character segmentation unit 114 reads the input character line data from the line buffer 113 and scans it from the left end to the right end, taking as the main scanning direction the direction perpendicular to the character line (the direction indicated by Y in FIG. 2(A), or its opposite; hereinafter the column direction), and counts the black bits in each column to build a histogram of black bits. It then examines the histogram, extracts as character pattern data each region in which columns whose black-bit count is at least a predetermined first threshold B continue for at least a predetermined second threshold D, and stores it in the pattern register 115. When storing character pattern data in the pattern register 115, the character segmentation unit 114 also outputs to the character position determination unit 118a the character pattern number m, which indicates the position of the character pattern counted from the head of the character line (see FIG. 2(A)), together with the coordinate Yt of the highest point and the coordinate Yb of the lowest point of the character pattern in the line buffer 113. In this embodiment, character pattern data was extracted with the first threshold B set to 1 and the second threshold D set to 5. The coordinates Yt and Yb are expressed in the absolute coordinates assigned within the line buffer 113 (the Y coordinate of FIG. 2(A)).
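The column-histogram segmentation described above can be sketched as follows. This is a minimal illustration, not the circuit of the embodiment: the function name, the list-of-rows image representation (1 for a black bit), the returned dictionary keys, and the assumption that row index 0 is the top of the buffer are all choices made for the example.

```python
def segment_characters(image, b=1, d=5):
    """Cut character patterns out of a binary line image.

    image: list of rows, each a list of 0/1 values (1 = black bit).
    b: minimum black-bit count for a column to belong to a character.
    d: minimum number of consecutive such columns for a valid pattern.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Black-bit histogram: one count per column, scanned left to right.
    histogram = [sum(image[y][x] for y in range(height)) for x in range(width)]

    patterns = []
    start = None
    for x in range(width + 1):  # one extra step closes a run at the right edge
        active = x < width and histogram[x] >= b
        if active and start is None:
            start = x
        elif not active and start is not None:
            if x - start >= d:
                rows = [y for y in range(height) if any(image[y][start:x])]
                if rows:
                    patterns.append({
                        "m": len(patterns) + 1,  # pattern number from line head
                        "x_start": start,
                        "x_end": x - 1,
                        "yt": rows[0],   # topmost black row (assumes row 0 = top)
                        "yb": rows[-1],  # bottommost black row
                    })
            start = None
    return patterns
```

With B = 1 and D = 5 as in the embodiment, a two-column speck of noise is discarded because its run of non-empty columns is shorter than D, while each wider run becomes one numbered pattern with its Yt and Yb.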

The pattern register 115, on receiving the character pattern data output from the character segmentation unit 114, stores it in a form from which its two-dimensional coordinates can be reproduced.

Next, the candidate character name extraction unit 117 of the recognition unit 116 reads the character pattern data stored in the pattern register 115 and extracts its features by a predetermined method to build a feature matrix. It then calculates the similarity between this feature matrix and the dictionary matrices of standard character patterns prepared in advance, and outputs the character names of up to K dictionary matrices, in descending order of similarity, as candidate character names to the character position determination unit 118a of the character name determination unit 118. In this embodiment, K = 5. Features can be extracted from the character pattern data by various conventionally known methods; in this embodiment the following method was used.

First, a frame, for example a rectangle, circumscribing the character strokes is detected for the character pattern data.

Next, the line width W of this character pattern is calculated using the well-known approximation given by equation (1) below.

W = 1 / (1 − Q/A) … (1)
In equation (1), Q is the number of 2 × 2-bit windows placed over the character pattern in which all four pixels are black bits, and A is the total number of black bits in the character pattern.
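As a worked illustration of equation (1), the sketch below counts Q and A on a small binary image and returns W. The function name and the list-of-rows representation (1 for a black bit) are assumptions made for the example.

```python
def estimate_line_width(image):
    """Approximate stroke width W = 1 / (1 - Q/A) for a binary image.

    image: list of rows, each a list of 0/1 values (1 = black bit).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # A: total number of black bits in the pattern.
    a = sum(sum(row) for row in image)
    if a == 0:
        raise ValueError("pattern contains no black bits")
    # Q: number of 2 x 2 windows whose four pixels are all black.
    q = sum(
        1
        for y in range(height - 1)
        for x in range(width - 1)
        if image[y][x] and image[y][x + 1] and image[y + 1][x] and image[y + 1][x + 1]
    )
    return 1.0 / (1.0 - q / a)


# A horizontal bar 3 pixels thick and 100 pixels long: A = 300, Q = 198.
bar = [[1] * 100 for _ in range(3)]
print(round(estimate_line_width(bar), 2))
```

For the bar above the estimate comes out slightly under 3, and it approaches the true thickness as the stroke gets longer; a one-pixel-wide stroke has Q = 0 and therefore W = 1.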

Next, the character pattern is scanned in a plurality of directions, the run length of consecutive black bits in each scan line is detected, and a subpattern for each of the directions is extracted based on these run lengths and the line width W. The interior of the circumscribing frame is then divided into (N × M) regions (N and M are constants) for each subpattern, a feature quantity representing the character strokes within each region is calculated, and the feature quantities are normalized by the size of the character frame to obtain the feature matrix.

In this embodiment, the feature quantity is normalized by dividing it by (ΔX + ΔY)/2, where ΔX is the horizontal length of the circumscribing frame and ΔY is its vertical length.
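The region division and normalization can be sketched as follows. The text does not say which feature quantity is computed per region, so this example assumes a simple black-bit count per region, and it works on a single binary pattern rather than on the directional sub-patterns. The function name and data layout are likewise hypothetical.

```python
def zone_features(pattern, n, m):
    """Return an n x m feature matrix normalized by (dX + dY) / 2.

    pattern: list of rows, 1 = black bit, 0 = white bit.
    The per-region feature (black-bit count) is an ASSUMPTION; the
    source only says it "represents the character strokes".
    """
    black = [(y, x) for y, row in enumerate(pattern)
             for x, bit in enumerate(row) if bit]
    if not black:
        raise ValueError("pattern contains no black bits")
    # Circumscribing frame of the character strokes.
    top = min(y for y, _ in black)
    bottom = max(y for y, _ in black)
    left = min(x for _, x in black)
    right = max(x for _, x in black)
    dy = bottom - top + 1   # vertical length of the frame
    dx = right - left + 1   # horizontal length of the frame
    scale = (dx + dy) / 2.0
    features = [[0.0] * m for _ in range(n)]
    for y, x in black:
        r = (y - top) * n // dy    # region row index, 0 .. n-1
        c = (x - left) * m // dx   # region column index, 0 .. m-1
        features[r][c] += 1.0
    return [[value / scale for value in row] for row in features]
```

Dividing by the mean side length of the frame makes the matrix less sensitive to character size, which is why a later position test is needed to separate same-shaped characters of different sizes.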

In this embodiment, the degree of similarity between the feature matrix obtained in this way and the dictionary matrix of each standard character pattern prepared in advance is calculated according to equation (2).

Here, in equation (2), the value obtained is the degree of similarity, f_i is an element value of the feature matrix of the character pattern data of the character to be recognized, g_i is an element value of the dictionary matrix, and N × M is the number of dimensions of both the feature matrix of the character to be recognized and the dictionary matrix.
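Equation (2) itself is not reproduced in this text, so the sketch below substitutes a normalized correlation (cosine similarity) between the flattened matrices as a stand-in. That choice, the function names, and the dictionary layout are all assumptions for illustration. What the sketch does take from the text is the ranking step: the K most similar character names are kept as candidates, with K = 5 in the embodiment.

```python
import math


def similarity(f, g):
    """Stand-in similarity (ASSUMED, not the patent's equation (2)).

    f, g: flattened feature and dictionary matrices of length N * M.
    """
    dot = sum(a * b for a, b in zip(f, g))
    norm = math.sqrt(sum(a * a for a in f)) * math.sqrt(sum(b * b for b in g))
    return dot / norm if norm else 0.0


def top_candidates(feature, dictionary, k=5):
    """Return up to k (name, similarity) pairs in descending similarity.

    dictionary: mapping from character name to its flattened
    dictionary matrix (hypothetical layout).
    """
    scored = [(similarity(feature, g), name) for name, g in dictionary.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]
```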

Next, the operation of the character name determination unit 118 is described. To make the explanation easier to follow, the operation is described using the example of processing the input character line data 21 shown in FIG. 2(A).

The character position determination unit 118a of the character name determination unit 118 compares the character pattern number m input from the character cutting unit 114 with a predetermined value n and operates as described below according to the result of the comparison. Here, n is a predetermined positive integer; in this embodiment, n = 1.

(a) If n ≥ m, that is, if the character to be recognized is one of the first n characters of the character line (the first character "I" in the example of FIG. 2(A)), the character position determination unit 118a outputs to the character name output terminal 119, as the recognition result character name, the candidate with the highest similarity to the character to be recognized among the K candidate character names output from the candidate character name extraction unit 117. The character position determination unit 118a also reads from the reference line calculation table 118c (Attached Table 1) the predetermined coefficient α (hereinafter also called the reference line calculation coefficient) corresponding to this recognition result character name (the character name "I"). Using this coefficient α and the coordinates of the corresponding character pattern "I" in the input character line data, in this example the highest point coordinate Yt and the lowest point coordinate Yb input from the character cutting unit 114, it calculates the reference line coordinate Ys of the input character line data according to equation (3) below.

$$Y_s = Y_b + \alpha (Y_t - Y_b) \tag{3}$$

The reference line calculation table 118c, taking the case of uppercase and lowercase English letters, is configured as shown for example in Attached Table 1, by registering in advance the character name of each letter in association with its reference line calculation coefficient α.

For the reference line coordinate Ys of the input character line data 21 in FIG. 2(A), the reference line calculation coefficient α, the highest point coordinate Yt, and the lowest point coordinate Yb of the first character pattern "I" are

α = 0.0
Yt = 98
Yb = 30

so the reference line coordinate is Ys = 30 + 0 × (98 − 30) = 30.

Next, the character position determination unit 118a stores the calculated reference line coordinate Ys in the reference line storage unit 118d.

(b) On the other hand, if m > n, that is, if the character to be recognized is the (n+1)th or a later character from the beginning of the character line, the character position determination unit 118a determines the recognition result character name as described below. This operation is explained using the lowercase letter "p", the third character in FIG. 2(A), as an example.

The character position determination unit 118a uses the highest point coordinate Yt and the lowest point coordinate Yb of the character pattern "p" input from the character cutting unit 114, together with the reference line coordinate Ys stored in the reference line storage unit 118d, to calculate the feature e representing the position of the character pattern "p" according to equation (4) below.

In equation (4), Z is a constant; in this embodiment, Z = 10.
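The two coordinate calculations can be sketched as follows. The first function is equation (3) as given in the text, Ys = Yb + α(Yt − Yb). Equation (4) is not reproduced in this text, so the second function uses an assumed form, e = Z(Ys − Yb)/(Yt − Yb), which expresses how far the reference line lies above the bottom of the character as a fraction of the character height. The function names are hypothetical.

```python
def reference_line(y_top, y_bottom, alpha):
    """Equation (3): Ys = Yb + alpha * (Yt - Yb)."""
    return y_bottom + alpha * (y_top - y_bottom)


def position_feature(y_top, y_bottom, y_ref, z=10):
    """ASSUMED form of equation (4): e = Z * (Ys - Yb) / (Yt - Yb).

    The source does not reproduce equation (4); this form is a guess
    that is only roughly consistent with the worked example.
    """
    return z * (y_ref - y_bottom) / (y_top - y_bottom)


# First character "I" of the example: alpha = 0.0, Yt = 98, Yb = 30.
ys = reference_line(98, 30, 0.0)      # reference line coordinate, 30
# Third character, lowercase "p": Yt = 50, Yb = 15.
e = position_feature(50, 15, ys)      # 150 / 35, roughly 4.29
```

Under this assumed form the lowercase "p" gives about 4.29, close to but not exactly the value e = 4.2 quoted in the embodiment, so the actual equation (4) may differ in form or in rounding.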

Next, for the K candidate character names input from the candidate character name extraction unit 117, the character position determination unit 118a reads from the character position feature table 118b, in descending order of similarity to the character to be recognized, the predetermined values corresponding to each character name (called the character position features gL and gH).

The character position feature table 118b, taking the case of uppercase and lowercase English letters, is configured as shown for example in Attached Table 2, by registering in advance the character name of each letter in association with the position of the reference line for that character (the lower limit coordinate gL and the upper limit coordinate gH).

Next, the character position determination unit 118a compares the character position features gL and gH read from the character position feature table 118b with the position feature e of the character pattern calculated according to equation (4). If the comparison satisfies gL ≤ e ≤ gH, the candidate character name is output to the character name output terminal 119 as the recognition result character name.

If instead the comparison gives gL > e or e > gH, the candidate character name is judged not to be the recognition result character name, and the same processing is performed on the candidate character name with the next highest similarity.

Consider the example of determining the recognition result character name for the lowercase letter "p". The highest point coordinate Yt and the lowest point coordinate Yb of this character's pattern are

Yt = 50
Yb = 15

and the reference line coordinate of the input character line data is Ys = 30, as obtained earlier. From equation (4), the position feature of the character pattern of the lowercase letter "p" is therefore e = 4.2.

The candidate character names for the lowercase letter "p" are, in order of similarity rank, those shown in Attached Table 3: the uppercase letter "P" in first place, the lowercase letter "p" in second place, and three further candidates.

The character position determination unit 118a therefore first reads the character position features gL and gH of the first-ranked candidate character name, the uppercase letter "P", from the character position feature table 118b and compares them with the calculated position feature e. As is clear from Attached Table 2, however, the first-ranked candidate "P" has gL = 0 and gH = 1, so that gH < e for the position feature e = 4.2 of the character pattern "p". The character position determination unit 118a therefore judges that the first-ranked candidate character name "P" is not the recognition result character name.

Next, the character position determination unit 118a performs the same processing on the second-ranked candidate character name, the lowercase letter "p", as it did on the first-ranked candidate. As shown in FIG. 3, the second-ranked candidate "p" has gL = 4 and gH = 6, so gL ≤ e ≤ gH is satisfied for the position feature e = 4.2 of the character pattern "p". The character position determination unit 118a therefore determines the second-ranked candidate character name "p" as the recognition result character name and outputs it to the character name output terminal 119.
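The candidate screening walked through in this example can be sketched as a short loop. The function name, the table layout, and the choice to return None when no candidate passes are assumptions; the text does not say what happens in that case. The table entries for "P" and "p" are the values quoted in the embodiment.

```python
def decide_character_name(candidates, e, position_table):
    """Return the first candidate whose range [gL, gH] contains e.

    candidates: character names in descending order of similarity.
    position_table: name -> (gL, gH), standing in for the character
    position feature table 118b (hypothetical layout).
    Returns None if no candidate passes (ASSUMED fallback).
    """
    for name in candidates:
        g_low, g_high = position_table[name]
        if g_low <= e <= g_high:
            return name
    return None


# Values quoted in the embodiment for uppercase "P" and lowercase "p".
table = {"P": (0, 1), "p": (4, 6)}
result = decide_character_name(["P", "p"], 4.2, table)
```

Here the top-ranked "P" is rejected because its upper limit 1 is below e = 4.2, and the second-ranked "p" is accepted, matching the outcome described in the text.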

This concludes the description of the embodiment of the character recognition device of the present invention.

However, the present invention is not limited to the embodiment described above, and various modifications such as those described below can be made.

In the embodiment described above, different recognition processing is applied to the character to be recognized at the beginning of the character line and to the characters from the second character onward; that is, the number of characters n counted from the beginning of the character line was set to n = 1. Clearly, however, n can be changed to suit the design. When n is set to 2 or more, the reference line coordinate Ys of the input character line data is preferably calculated, for example, as follows.

<First calculation method>
In the first method, the average of the reference line coordinates calculated individually for each of the first n characters is taken as the reference line coordinate Ys.

For example, with n = 2 for the input character line data 21 in FIG. 2(A), Ys1 is first calculated for the first character "I" according to equation (3), Ys2 is then calculated for the second character "n" according to equation (3), and their average (Ys1 + Ys2)/2 is taken as the reference line coordinate Ys for n = 2.
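A minimal sketch of the first calculation method follows: equation (3) is applied to each of the first n characters and the results are averaged. The function name and the tuple layout are assumptions, and the coordinates used for the second character "n" are made-up illustration values, since the text gives none.

```python
def reference_line_average(characters):
    """First method: average the per-character reference lines.

    characters: list of (y_top, y_bottom, alpha) tuples for the first
    n recognized characters (hypothetical layout).
    """
    if not characters:
        raise ValueError("at least one character is required")
    lines = [yb + alpha * (yt - yb) for yt, yb, alpha in characters]
    return sum(lines) / len(lines)


# "I" uses the values from the embodiment; the values for "n" are
# HYPOTHETICAL and chosen only to make the averaging visible.
ys = reference_line_average([(98, 30, 0.0), (70, 32, 0.0)])  # (30 + 32) / 2
```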

<Second calculation method>
In the second method, the reference line coordinate Ys is calculated using the predetermined coefficient corresponding to the recognition result character name with the highest similarity among the recognition result character names of the first n characters, together with the coordinates of the character pattern from which that recognition result character name was obtained.

To illustrate the second calculation method with the same example as the first, the similarity of the first-ranked candidate character name obtained when recognizing the first character "I" is compared with the similarity of the first-ranked candidate character name obtained when recognizing the second character "n". The reference line coordinate is then calculated according to equation (3) for the first-ranked candidate character name of whichever character has the higher similarity, and this is taken as the reference line coordinate Ys for n = 2.
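The second calculation method can be sketched in the same style: among the first n characters, the one whose first-ranked candidate has the highest similarity supplies the coefficient and coordinates for equation (3). The function name, the tuple layout, and all numeric values for the character "n" and for the similarities are hypothetical.

```python
def reference_line_most_similar(characters):
    """Second method: use only the most confidently recognized character.

    characters: list of (similarity, y_top, y_bottom, alpha) tuples for
    the first n recognized characters (hypothetical layout).
    """
    if not characters:
        raise ValueError("at least one character is required")
    _, y_top, y_bottom, alpha = max(characters, key=lambda c: c[0])
    return y_bottom + alpha * (y_top - y_bottom)


# Similarities and the coordinates for "n" are HYPOTHETICAL values.
ys = reference_line_most_similar([
    (0.92, 98, 30, 0.0),   # first character "I"
    (0.97, 70, 32, 0.0),   # second character "n", higher similarity
])
```

Compared with averaging, this variant trusts a single character, so one well-recognized character decides the reference line and a poorly recognized neighbor cannot pull it off position.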

When n is 3 or more, the first and second calculation methods can be carried out by the same procedure as for n = 2.

In the embodiment described above, there is a single reference line, and that reference line is the baseline. In practicing the present invention, however, the reference line is not limited to the baseline and may be another line such as the descender line, mean line, cap line, or ascender line. Two or more reference lines may also be set.

(Effects of the Invention)
As is clear from the above description, the character recognition device of the present invention calculates the reference line of the character line data using the recognition results of the first through n-th characters to be recognized from the beginning of the character line, and recognizes the (n+1)th and subsequent characters based on the shape of each character and on the relative positional relationship between the coordinates of that character's pattern in the character line data and the reference line coordinate of the character line data. Therefore, even if the (n+1)th and subsequent characters include characters that have the same shape but different sizes, such as uppercase "P" and lowercase "p", the two can be distinguished accurately.

Moreover, since the reference line is calculated using only one or more characters at the beginning of the character line, the calculation takes far less time than in conventional devices. In addition, because the reference line is calculated from one or more characters, it can also be obtained from character lines made up of only a few characters or of characters of similar size, so these character lines can also be recognized accurately.

It is therefore possible to provide a device that recognizes even characters of identical shape accurately and at high speed.

Attached Table 3

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of the character recognition device of the embodiment, and FIGS. 2(A) and 2(B) are diagrams used to explain the embodiment.
00 ... character recognition device
111 ... optical signal
112 ... photoelectric conversion unit
113 ... line buffer
114 ... character cutting unit
115 ... pattern register
116 ... recognition unit
117 ... candidate character name extraction unit
118 ... character name determination unit
118a ... character position determination unit
118b ... character position feature table
118c ... reference line calculation table
118d ... reference line storage unit
119 ... character name output terminal
21 ... input character line data
23 ... reference line

Claims (5)

[Claims]
(1) A character recognition device comprising a photoelectric conversion unit that photoelectrically converts and quantizes light from a medium to obtain input character line data of a character line on the medium, a character cutting unit that cuts out character patterns from the input character line data, and a recognition unit that extracts features of each character pattern and outputs a recognition result character name for the character to be recognized, characterized in that the recognition unit is configured to determine the recognition result character name according to (A) and (B) below.
(A) For the first through n-th characters to be recognized from the beginning of the character line, the recognition result character name is determined one character at a time based on features that include features resulting from the shape of the character pattern of the character to be recognized (where n is any positive integer).
(B) For the (n+1)th and subsequent characters to be recognized from the beginning of the character line, the recognition result character name is determined one character at a time based on:
[1] features that include features resulting from the shape of the character pattern of the character to be recognized; and
[2] the relative position between the reference line coordinate of the input character line data, calculated using some or all of the predetermined coefficients corresponding to the respective recognition result character names of the first through n-th characters together with the coordinates in the input character line data of the character patterns corresponding to the coefficients used, and the coordinates in the input character line data of the character pattern of the character to be recognized.
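(2) The character recognition device according to claim 1, characterized in that the recognition unit comprises a candidate character name extraction unit that extracts candidate character names based on features resulting from the shape of the character pattern of the character to be recognized, and a character name determination unit that determines the recognition result character name from the extracted candidate character names according to (a) and (b) below.
(a) For the first through n-th characters, the candidate character name with the highest similarity to the character to be recognized is taken as the recognition result character name.
(b) For the (n+1)th and subsequent characters to be recognized, a feature of the position of the character pattern in the input character line data is calculated based on the reference line coordinate and the coordinates of the character pattern of the character to be recognized. The position feature is compared with predetermined values corresponding to a candidate character name of the character to be recognized, and if a predetermined condition is satisfied, that candidate character name is taken as the recognition result character name. If the condition is not satisfied, the same comparison between the predetermined values associated with the candidate character name and the position feature is made for the second-ranked and lower candidate character names, and the candidate among them that satisfies the predetermined condition is taken as the recognition result character name.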
A character recognition device comprising: a character name determining unit that determines a recognition result character name from the extracted candidate character names according to (a) and (b) below. (a) For the characters up to the n-th character, among the candidate character names, the candidate character name with the greatest degree of similarity to the recognized character is set as the recognition result character name. (b) For the n+1st and subsequent characters to be recognized, calculate the characteristics of the position of the character pattern in the input character line data based on the reference line coordinates and the coordinates of the character pattern of the character to be recognized, and calculate the position of the character pattern in the input character line data; Compare the characteristics of the character with a predetermined value corresponding to the candidate character name of the character to be recognized, and if the predetermined conditions are satisfied, the candidate character name is set as the recognition result character name, and on the other hand, if the characteristics are not satisfied, the second place For the following candidate character names, a comparison is made between a predetermined value related to the candidate character name and the characteristics of the position, and a candidate character name that satisfies the predetermined condition among the second or lower candidate character names is selected. Use the recognition result character name.
(3) The character recognition device according to claim 1 or 2, characterized in that the reference line coordinate is the average of the reference line coordinates calculated individually for each of the first through n-th characters.
(4) The character recognition device according to claim 1 or 2, characterized in that the reference line coordinate is calculated using the predetermined coefficient corresponding to the recognition result character name with the highest similarity among the recognition result character names of the first through n-th characters, together with the coordinates of the character pattern from which that recognition result character name was obtained.
(5) The character recognition device according to any one of claims 1 to 4, characterized in that the reference line is one or more lines selected from the descender line, baseline, mean line, cap line, and ascender line.
JP1264734A 1989-10-11 1989-10-11 Character recognition device Expired - Lifetime JP2788506B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1264734A JP2788506B2 (en) 1989-10-11 1989-10-11 Character recognition device

Publications (2)

Publication Number Publication Date
JPH03126188A true JPH03126188A (en) 1991-05-29
JP2788506B2 JP2788506B2 (en) 1998-08-20

Family

ID=17407431

Country Status (1)

Country Link
JP (1) JP2788506B2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS55112687A (en) * 1979-02-22 1980-08-30 Nec Corp Character recognition system
JPS62187988A (en) * 1985-10-01 1987-08-17 ザ パランチ−ル コ−ポレ−シヨン Processing means used in optical character recognition system
JPH01108691A (en) * 1987-10-21 1989-04-25 Sharp Corp Character image processing system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0573723A (en) * 1991-09-17 1993-03-26 Oki Electric Ind Co Ltd Character classifying method and character recognition device
JP5913763B1 (en) * 2015-07-17 2016-04-27 楽天株式会社 Reference line setting device, reference line setting method, and reference line setting program
WO2017013720A1 (en) * 2015-07-17 2017-01-26 楽天株式会社 Reference line setting device, reference line setting method, and reference line setting program
WO2017013719A1 (en) * 2015-07-17 2017-01-26 楽天株式会社 Character recognition device, character recognition method, and character recognition program
JPWO2017013719A1 (en) * 2015-07-17 2018-03-08 楽天株式会社 Character recognition device, character recognition method, and character recognition program
US10657404B2 (en) 2015-07-17 2020-05-19 Rakuten, Inc. Character recognition device, character recognition method, and character recognition program
