JPH03126188A

JPH03126188A - Character recognizing device

Info

Publication number: JPH03126188A
Application number: JP1264734A
Authority: JP
Inventors: Hiroshi Yoshida; 浩史吉田; Toru Ishikawa; 石川　融; Koichi Higuchi; 浩一樋口; Yoshiyuki Yamashita; 山下　義征
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1989-10-11
Filing date: 1989-10-11
Publication date: 1991-05-29
Anticipated expiration: 2013-08-20
Also published as: JP2788506B2

Abstract

PURPOSE: To accurately distinguish characters that have the same shape but different sizes by calculating a reference line for the character line data from the recognition results of the first through n-th characters of the line, and thereafter performing recognition based on the shape of each character and the position of its character pattern coordinates relative to the reference line. CONSTITUTION: The device comprises a photoelectric conversion unit 112 that photoelectrically converts and quantizes an optical signal 111 from a medium to obtain input character line data for a character line on the medium, a line buffer 113, a character segmentation unit 114, and a pattern register 115 that stores each segmented character pattern. When the characters of a line are recognized, the reference line of the line is determined from the recognition results of the first through n-th characters; the (n+1)-th and subsequent characters are then recognized based on the shape of each character and the position of its quantized pattern relative to the reference line. Characters can thus be distinguished accurately even when the line contains characters of the same shape and different sizes.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a character recognition device capable of achieving high recognition accuracy.

(Prior Art) If a machine can automatically identify characters and figures, various advantages follow; for example, data can be entered into a computer more efficiently and more accurately than by a human operator.

For this reason, character recognition devices have long been the subject of active research.

Conventional character recognition devices have generally comprised the components [1] to [3] below.

[1]...A photoelectric conversion unit that photoelectrically converts the optical signal obtained by scanning a medium, such as a form, on which characters, figures, and the like are written, and produces binary input character line data in which character strokes are represented by, for example, black bits and the background by white bits.

[2]...A character segmentation unit that cuts character patterns out of this input character line data.

[3]...A recognition unit that extracts features from the character pattern, compares them with the features of standard characters prepared in advance, and outputs the character name of the most similar standard character pattern as the recognition result character name of the character to be recognized.

However, when such a device recognizes text in a foreign language such as English, or a character line such as a name or address written in alphabetic characters, the line contains characters whose shapes are identical, such as the comma "," and the apostrophe "'", or uppercase "P" and lowercase "p". Character recognition therefore cannot be performed accurately from the shape of the character pattern alone.

To solve this problem, methods have been used that perform character recognition using, in addition to the shape of the character pattern, its size and its relative position within the character line.

One such method is disclosed, for example, in the literature (Institute of Electronics, Information and Communication Engineers, 1988 Spring National Convention (March 15, 1988), D-448).

According to the method disclosed in this document, a rectangular frame circumscribing each character is first extracted from the character line. Next, the circumscribed rectangles of the characters in the line are compared, and characters that are extremely small relative to the largest character are removed. Next, histograms are created of the heights of the upper and lower edges of the circumscribed rectangles of the remaining characters. From these histograms, the lowest peak among the upper edges and the highest peak among the lower edges are detected, and a straight line giving the slope of the character line is obtained by the least squares method from the upper and lower edge coordinates of the characters whose size is approximately equal to the distance between these peaks. Next, the character height offsets caused by skew are corrected using the slope of this line, and the histograms are created again in the same manner. Two peaks are then detected from these histograms as before and are taken as the upper reference line and the lower reference line. Next, the distance between the upper and lower reference lines is taken as the reference character size; the size of each character pattern in the line is compared with this reference size, and the position of each character pattern is compared with the upper and lower reference lines. Based on these comparisons, each character in the line is classified into one of several categories, which improves recognition accuracy.

(Problem to be Solved by the Invention) However, the conventional character recognition method disclosed in the above document must perform a series of processes on every character in the line, namely input of the rectangle information, removal of extremely small characters, correction of line skew, and calculation of the reference lines, so recognition speed drops significantly.

Furthermore, for a short character line consisting of only one or two characters, there is very little data from which to build the histogram, so the peaks that serve as reference lines cannot be detected accurately. The conventional method therefore cannot be applied to short character lines, and if it is applied, recognition accuracy actually deteriorates.

In addition, for a character line consisting almost entirely of characters of the same size, the histogram shows little variation, so the reference lines and reference size cannot be detected and accurate character recognition is not possible.

The present invention has been made in view of these points. Its object is to solve the above problems and to provide a character recognition device that can recognize characters of identical shape accurately and at high speed.

(Means for Solving the Problem) To achieve this object, the inventors of this application conducted various studies. They concluded that the object of the invention can be achieved if, when recognizing the characters of a character line, the reference line of the line is determined from the recognition results of the first through n-th characters of the line, and the (n+1)-th and subsequent characters are recognized based on the shape of each character and the position of its quantized pattern (character pattern) relative to the reference line.

Accordingly, the present invention provides a character recognition device comprising a photoelectric conversion unit that photoelectrically converts and quantizes light from a medium to obtain input character line data for a character line on the medium, a character segmentation unit that cuts character patterns out of the input character line data, and a recognition unit that extracts features of each character pattern and outputs the recognition result character name of the character to be recognized, characterized in that the recognition unit is configured to determine the recognition result character name according to (A) and (B) below.

(A) For the first through n-th characters to be recognized in a character line, the recognition result character name is determined one character at a time, based on features that include features derived from the shape of the character pattern of that character (where n is any positive integer).

(B) For the (n+1)-th and subsequent characters to be recognized in the character line, the recognition result character name is determined one character at a time, based on [1]...features that include features derived from the shape of the character pattern of that character, and [2]...the position of the coordinates of that character pattern in the input character line data relative to a reference line coordinate of the input character line data, the reference line coordinate being calculated from some or all of the predetermined coefficients corresponding to the respective recognition result character names of the first through n-th characters and from the coordinates, in the input character line data, of the character patterns corresponding to the coefficients used.

Here, "features that include features derived from the shape of the character pattern of the character to be recognized" means, for example, features of the shape alone, features based on the size of the character, or features of both its shape and its size.

In practicing the invention, the recognition unit is preferably composed of a candidate character name extraction unit that extracts candidate character names based on features derived from the shape of the character pattern of the character to be recognized, and a character name determination unit that determines the recognition result character name from the extracted candidate character names according to (a) and (b) below.

(a) For the first through n-th characters, the candidate character name with the highest similarity to the character to be recognized is taken as the recognition result character name.

(b) For the (n+1)-th and subsequent characters to be recognized, a positional feature of the character pattern in the input character line data is calculated from the reference line coordinate and the coordinates of the character pattern, and this positional feature is compared with a predetermined value associated with a candidate character name of the character to be recognized. If a predetermined condition is satisfied, that candidate character name is taken as the recognition result character name. If it is not satisfied, the same comparison between the positional feature and the associated predetermined value is made for the candidate character names ranked third and lower, and the candidate among them that satisfies the predetermined condition is taken as the recognition result character name.

(Operation) With the character recognition device of this invention, the recognition results of the first through n-th characters of a character line are used to calculate a character position reference line, which serves to judge whether a candidate character name for the (n+1)-th or a later character is appropriate as the recognition result character name; the recognition result character name can then be determined from this reference line and data obtained from the character pattern of the character to be recognized. Consequently, even if the (n+1)-th and subsequent characters include characters of the same shape but different size, such as uppercase "P" and lowercase "p", the two can be distinguished accurately.

Recognition accuracy over the whole character line is thereby improved.

Moreover, the reference line is calculated using only one or a few characters at the head of the character line, so it is calculated far more quickly than with the conventional method, and recognition time is shortened accordingly.

(Embodiments) Embodiments of the character recognition device of this invention are described below with reference to the drawings.

FIG. 1 is a block diagram schematically showing the configuration of the character recognition device of the embodiment.

In FIG. 1, 100 denotes the character recognition device; 111, an optical signal from a medium (for example, a form); 112, a photoelectric conversion unit that photoelectrically converts and quantizes the optical signal 111 to obtain input character line data for a character line on the medium; 113, a line buffer for storing this input character line data; 114, a character segmentation unit that cuts character patterns out of the input character line data in the line buffer; and 115, a pattern register that stores each segmented character pattern. The photoelectric conversion unit 112, line buffer 113, character segmentation unit 114, and pattern register 115 are each built from conventionally known circuits. In this embodiment, the line buffer 113 is a memory with a capacity of 128 x 4096 pixels, and the pattern register 115 is a memory with a capacity of 128 x 512 pixels.

Further, in FIG. 1, 116 denotes the recognition unit according to this invention. The recognition unit 116 is configured to determine the recognition result character name according to (A) and (B) below.

(A) For the first through n-th characters to be recognized in a character line, the recognition result character name is determined one character at a time, based on features that include features derived from the shape of the character pattern of that character.

(B) For the (n+1)-th and subsequent characters to be recognized in the character line, the recognition result character name is determined one character at a time, based on [1]...features derived from the shape of the character pattern of that character, and [2]...the position of the coordinates of that character pattern in the input character line data relative to a reference line coordinate of the input character line data, the reference line coordinate being calculated from some or all of the predetermined coefficients corresponding to the respective recognition result character names of the first through n-th characters and from the coordinates, in the input character line data, of the character patterns corresponding to the coefficients used.

To facilitate processes (A) and (B), the recognition unit 116 of this embodiment consists of a candidate character name extraction unit 117, which extracts candidate character names based on features derived from the shape of the character pattern of the character to be recognized, and a character name determination unit 118, which comprises a character position determination unit 118a, a character position feature table 118b, a reference line calculation unit 118c, and a reference line storage unit 118d in order to determine the recognition result character name from the extracted candidate character names according to (a) and (b) below.

(a) For the first through n-th characters to be recognized in a character line, the candidate character name with the highest similarity to the character to be recognized is taken as the recognition result character name.

(b) For the (n+1)-th and subsequent characters to be recognized in the character line, a positional feature of the character pattern in the input character line data is calculated from the reference line coordinate and the coordinates of the character pattern, and this positional feature is compared with a predetermined value associated with a candidate character name of the character to be recognized. If a predetermined condition is satisfied, that candidate character name is taken as the recognition result character name. If it is not satisfied, the same comparison between the positional feature and the associated predetermined value is made for the candidate character names ranked third and lower, and the candidate among them that satisfies the predetermined condition is taken as the recognition result character name.

Here, the reference line calculation table 118c stores the predetermined coefficients mentioned above, one for each recognition result character name. The character position feature table 118b stores the predetermined values mentioned above that are associated with the candidate character names of the character to be recognized.
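The decision rules (a) and (b) can be illustrated with a minimal sketch. This section does not define the positional feature or the predetermined condition, so the sketch assumes both: the feature is taken to be the offset of the pattern's bottom edge from the reference line (Y growing downward), and the condition is that this offset lies within a tolerance of a per-character table value. The function name, the `position_table` contents, and the `tolerance` parameter are hypothetical, and the sketch walks the candidates in plain descending order of similarity, which simplifies the ranking rule stated in (b).

```python
from typing import Dict, List, Optional


def decide_character_name(
    m: int,
    n: int,
    candidates: List[str],
    y_bottom: int,
    y_ref: int,
    position_table: Dict[str, int],
    tolerance: int = 2,
) -> Optional[str]:
    """Pick a recognition result from candidates sorted by descending similarity."""
    if not candidates:
        return None
    # Rule (a): the first n characters simply take the most similar candidate.
    if m <= n:
        return candidates[0]
    # Rule (b): hypothetical positional feature, the offset of the pattern's
    # bottom edge from the reference line (Y assumed to grow downward).
    offset = y_bottom - y_ref
    for name in candidates:
        expected = position_table.get(name)
        # Hypothetical condition: offset within `tolerance` of the table value.
        if expected is not None and abs(offset - expected) <= tolerance:
            return name
    return None  # no candidate satisfied the condition (assumed behaviour)
```

With a table such as `{"P": 0, "p": 6}`, a pattern whose bottom edge lies 6 rows below the reference line is resolved to "p" even when "P" is the more similar candidate by shape, which is the kind of disambiguation the text describes.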

Further, in FIG. 1, 119 denotes a character name output terminal for outputting the character name determined by the character name determination unit 118 to, for example, an external computer or a display device.

Next, to aid understanding of the character recognition device of the embodiment, its operation is described with reference to FIG. 1, Appended Table 1, Appended Table 2, FIGS. 2(A) and 2(B), and Appended Table 3. Appended Table 1 illustrates the reference line calculation table 118c; Appended Table 2 illustrates the character position feature table 118b; FIG. 2(A) illustrates the input character line data 21 stored in the line buffer; FIG. 2(B) illustrates the reference line 23 in the input character line data 21; and Appended Table 3 illustrates the candidate character names and the recognition result character name when the character to be recognized is lowercase "p".

First, an optical signal 111 from a form on which characters, figures, and the like (hereinafter simply called characters) are written is input to the photoelectric conversion unit 112. The photoelectric conversion unit 112 photoelectrically converts the optical signal 111 into a binary digital signal in which character strokes are represented by, for example, black bits and the background by white bits (this signal corresponds to the input character line data), and stores the input character line data in the line buffer 113.

The line buffer 113 stores the input character line data received from the photoelectric conversion unit 112 in a form from which its two-dimensional coordinates can be reproduced. FIG. 2(A) shows visually the input character line data 21 stored in the line buffer 113.

Next, the character segmentation unit 114 reads the input character line data from the line buffer 113 and scans it from the left end to the right end, taking as the main scanning direction the direction perpendicular to the character line (the direction indicated by Y in FIG. 2(A), or its opposite; hereinafter called the column direction). It counts the black bits in each column and creates a histogram of black bits. The character segmentation unit 114 then examines this histogram, extracts as character pattern data each region in which columns whose black bit count is at least a predetermined first threshold B continue for at least a predetermined second threshold, and stores it in the pattern register 115. When storing character pattern data in the pattern register 115, the character segmentation unit 114 also outputs to the character position determination unit 118a the character pattern number m, which indicates the position of the character pattern counted from the head of the character line (see FIG. 2(A)), together with the coordinate Yt of the topmost point and the coordinate Yb of the bottommost point of the character pattern in the line buffer 113. In this embodiment, character pattern data was extracted with the first threshold B set to 1 and the second threshold set to 5. The coordinates Yt and Yb are expressed in the absolute coordinates assigned within the line buffer 113 (the Y coordinate in FIG. 2(A)).
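The column-histogram segmentation described above can be sketched as follows. This is only an illustration under stated assumptions: the image is a list of rows of 0/1 values (1 for a black bit), row 0 is assumed to be the top of the line buffer, and the function name, the returned tuple layout, and the parameter names `b` and `min_run` (standing for the first threshold of 1 and the second threshold of 5) are hypothetical.

```python
from typing import List, Tuple


def segment_characters(
    image: List[List[int]], b: int = 1, min_run: int = 5
) -> List[Tuple[int, int, int, int, int]]:
    """Return (m, x_start, x_end, y_top, y_bottom) for each character found.

    image[y][x] is 1 for a black bit and 0 for a white bit.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Black-bit count of every column (the histogram described in the text).
    hist = [sum(image[y][x] for y in range(height)) for x in range(width)]

    results = []
    m = 0  # character pattern number, counted from the head of the line
    x = 0
    while x < width:
        if hist[x] < b:
            x += 1
            continue
        start = x
        while x < width and hist[x] >= b:
            x += 1
        end = x - 1  # inclusive last column of the run
        if end - start + 1 < min_run:
            continue  # run too short to count as a character
        # Rows of the run that contain at least one black bit.
        rows = [
            y for y in range(height)
            if any(image[y][cx] for cx in range(start, end + 1))
        ]
        if rows:
            m += 1
            results.append((m, start, end, rows[0], rows[-1]))
    return results
```

Runs narrower than `min_run` columns are dropped, so isolated specks do not consume a pattern number; the last two tuple fields play the role of Yt and Yb.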

The pattern register 115, on receiving the character pattern data output by the character segmentation unit 114, stores it in a form from which its two-dimensional coordinates can be reproduced.

Next, the candidate character name extraction unit 117 of the recognition unit 116 reads the character pattern data stored in the pattern register 115, extracts its features by a predetermined method, and creates a feature matrix. It then calculates the similarity between this feature matrix and the dictionary matrices of standard character patterns prepared in advance, and outputs the character names of up to K dictionary matrices, in descending order of similarity, as candidate character names to the character position determination unit 118a of the character name determination unit 118. In this embodiment, K = 5. Features can be extracted from the character pattern data by various conventionally known methods; in this embodiment, the method described below was used.

First, a frame, for example a rectangle, circumscribing the character strokes is detected for the character pattern data.

Next, the line width W of this character pattern is calculated using the well-known approximation given by equation (1) below.

$$W = \frac{1}{1 - Q/A} \tag{1}$$

In equation (1), Q is the number of positions at which, when the character pattern is viewed through a 2 x 2 bit window, all four pixels in the window are black bits, and A is the total number of black bits in the character pattern.

Next, the character pattern is scanned in several directions, the number of consecutive black bits in each scan line is detected, and from these run lengths and the line width W a subpattern is extracted for each of the scanning directions. The circumscribed frame of the character pattern is then divided, for each subpattern, into N x M regions (N and M are constants), a feature quantity representing the character strokes within each region is calculated, and these feature quantities are normalized by the size of the character frame to obtain the feature matrix.

In this embodiment, the feature quantities are normalized by dividing them by (ΔX + ΔY)/2, where ΔX is the horizontal length of the circumscribed frame and ΔY is its vertical length.
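The region division and the (ΔX + ΔY)/2 normalization can be sketched as below. The text does not say which feature quantity is computed per region, and the directional subpattern extraction is omitted here, so this sketch assumes the simplest stand-in: the black-bit count of each region of the circumscribed frame. The function name and the `n_rows` and `n_cols` parameters (playing the role of N and M) are hypothetical.

```python
from typing import List


def normalized_feature_matrix(
    pattern: List[List[int]], n_rows: int = 2, n_cols: int = 2
) -> List[List[float]]:
    """Grid of per-region black-bit counts divided by (dx + dy) / 2."""
    ys = [y for y, row in enumerate(pattern) if any(row)]
    if not ys:
        raise ValueError("pattern contains no black bits")
    xs = [x for x in range(len(pattern[0])) if any(row[x] for row in pattern)]
    y0, y1, x0, x1 = ys[0], ys[-1], xs[0], xs[-1]
    dx = x1 - x0 + 1  # horizontal length of the circumscribed frame
    dy = y1 - y0 + 1  # vertical length of the circumscribed frame
    scale = (dx + dy) / 2

    feats = [[0.0] * n_cols for _ in range(n_rows)]
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if pattern[y][x]:
                # Map the pixel to its region inside the frame.
                i = (y - y0) * n_rows // dy
                j = (x - x0) * n_cols // dx
                feats[i][j] += 1
    return [[v / scale for v in row] for row in feats]
```

Dividing by the mean side length of the frame makes the features less sensitive to character size than raw counts would be; whatever feature quantity is used per region, the normalization step itself has this form.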

In this embodiment, the similarity between the feature matrix obtained in this way and the dictionary matrix of a standard character pattern prepared in advance is calculated according to equation (2) below.

In equation (2), the computed value is the similarity, f_i is an element value of the feature matrix of the character pattern data of the character to be recognized, g_i is an element value of the dictionary matrix, and N x M is the number of dimensions of the feature matrix and of the dictionary matrix.
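The matching step, scoring the feature matrix against every dictionary matrix and keeping the K best character names, can be sketched as follows. Equation (2) itself is not legible in this text, so the sketch substitutes a normalized correlation (cosine similarity) over the flattened N x M elements f_i and g_i; this is an assumption, not the patent's stated formula. The function names and the dictionary layout are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def similarity(f: Sequence[float], g: Sequence[float]) -> float:
    """Assumed stand-in for equation (2): cosine similarity of f and g."""
    dot = sum(a * b for a, b in zip(f, g))
    norm_f = math.sqrt(sum(a * a for a in f))
    norm_g = math.sqrt(sum(b * b for b in g))
    if norm_f == 0 or norm_g == 0:
        return 0.0
    return dot / (norm_f * norm_g)


def top_candidates(
    features: Sequence[float],
    dictionary: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return up to k (character name, similarity) pairs, best first."""
    scored = [(name, similarity(features, vec)) for name, vec in dictionary.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

The default `k=5` mirrors the value K = 5 used in the embodiment. Note that shape-only matching of this kind scores uppercase "P" and lowercase "p" almost identically once the features are size-normalized, which is why the positional check against the reference line is needed afterwards.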

Next, the operation of the character name determination unit 118 is described. To make the explanation easier to follow, it uses the example of processing the input character line data 21 shown in FIG. 2(A).

The character position determination unit 118a of the character name determination unit 118 compares the character pattern number m input from the character cutting unit 114 with a predetermined specific value n and operates as described below according to the result of the comparison. Here, n is a predetermined positive integer, and n = 1 in this embodiment.

(a) If n ≥ m, that is, if the character to be recognized is one of the first n characters from the beginning of the character line (the first character "I" in the example of FIG. 2(A)), the character position determination unit 118a outputs, to the character name output terminal 119 as the recognition result character name, the candidate with the greatest degree of similarity to the character to be recognized among the K candidate character names output from the candidate character name extraction unit 117. The character position determination unit 118a further reads, from the reference line calculation table 118c (Attached Table 1), the predetermined coefficient α (hereinafter sometimes referred to as the reference line calculation coefficient) corresponding to this recognition result character name (the character name "I"). Using this coefficient α and the coordinates in the input character line data of the character pattern "I" corresponding to the recognition result character name (in this example, the topmost coordinate Yt and the bottommost coordinate Yb input from the character cutting unit 114), it calculates the reference line coordinate Ys of the input character line data according to the following equation (3).

Ys = Yb + α (Yt - Yb)   ... (3)

Next, the character position determination unit 118a stores the calculated reference line coordinate Ys in the reference line storage unit 118d.

Note that the reference line calculation table 118c, taking the case of uppercase and lowercase English letters, is configured for example as shown in Attached Table 1: the character name of each letter and its reference line calculation coefficient α are registered in advance in association with each other.

Consider the reference line coordinate Ys of the input character line data 21 in FIG. 2(A). For the first character pattern "I", the reference line calculation coefficient α, the topmost coordinate Yt, and the bottommost coordinate Yb are α = 0.0, Yt = 98, and Yb = 30, so the reference line coordinate is Ys = 30 + 0 × (98 - 30) = 30.

(b) On the other hand, if m > n, that is, if the character to be recognized is the (n+1)-th or a later character from the beginning of the character line, the character position determination unit 118a determines the recognition result character name as described below. This operation is explained using the example of the lowercase letter "p", the third character in FIG. 2(A).
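As a brief aside on case (a), equation (3) can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `reference_line` and the dictionary standing in for the reference line calculation table 118c are hypothetical, and only the entry for "I" (α = 0.0) is taken from the worked example in this description.

```python
# Hypothetical stand-in for the reference line calculation table 118c.
# Only the coefficient for "I" (alpha = 0.0) comes from the worked example.
REFERENCE_LINE_COEFFICIENTS = {"I": 0.0}


def reference_line(y_top, y_bottom, alpha):
    """Equation (3): Ys = Yb + alpha * (Yt - Yb)."""
    return y_bottom + alpha * (y_top - y_bottom)


# First character "I" of the input character line data: Yt = 98, Yb = 30.
alpha = REFERENCE_LINE_COEFFICIENTS["I"]
y_s = reference_line(y_top=98, y_bottom=30, alpha=alpha)
print(y_s)  # 30.0
```

A coefficient of 0 places the reference line at the bottom of the character, which fits a letter that rests on the baseline; a letter with a descender would presumably be registered with a larger coefficient.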

The character position determination unit 118a uses the topmost coordinate Yt and the bottommost coordinate Yb of the character pattern "p" input from the character cutting unit 114, together with the reference line coordinate Ys stored in the reference line storage unit 118d, to calculate a feature e representing the position of the character pattern "p" according to the following equation (4).

In equation (4), Z is a constant; in this embodiment, Z = 10.

Next, for the K candidate character names input from the candidate character name extraction unit 117, the character position determination unit 118a reads the predetermined values corresponding to each character name (referred to as the character position features gL and gH) from the character position feature table 118b, proceeding in descending order of similarity to the character to be recognized.

Note that the character position feature table 118b, taking the case of uppercase and lowercase English letters, is configured for example as shown in Attached Table 2: the character name of each letter and the position of the reference line for that character (lower limit gL and upper limit gH) are registered in advance in association with each other.

Next, the character position determination unit 118a compares the character position features gL and gH read from the character position feature table 118b with the position feature e of the character pattern calculated according to equation (4). If the comparison satisfies gL ≤ e ≤ gH, the candidate character name is output to the character name output terminal 119 as the recognition result character name.

If, on the other hand, the comparison gives gL > e or e > gH, the unit determines that the candidate character name is not the recognition result character name and performs the same processing on the candidate character name with the next highest degree of similarity.
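The candidate check described above can be sketched as a simple loop. The function name `select_character_name` and the dictionary standing in for the character position feature table 118b are hypothetical. The description does not say what happens when no candidate satisfies the condition, so returning `None` in that case is an assumption. The ranges for "P" and "p" are the ones quoted in this description's worked example.

```python
def select_character_name(candidates, e, position_table):
    """Return the first candidate (in similarity order) with gL <= e <= gH.

    candidates: character names sorted by descending similarity.
    position_table: name -> (g_low, g_high), a stand-in for table 118b.
    Returns None when no candidate fits (assumed behaviour, not in the source).
    """
    for name in candidates:
        g_low, g_high = position_table[name]
        if g_low <= e <= g_high:
            return name
    return None


# Ranges for "P" and "p" as quoted in this description's example.
position_table = {"P": (0, 1), "p": (4, 6)}
result = select_character_name(["P", "p"], e=4.2, position_table=position_table)
print(result)  # p
```

Because the candidates are visited in similarity order, shape similarity still decides between candidates whose position ranges both contain e; the position feature only vetoes candidates that sit in the wrong place relative to the reference line.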

Consider the example of determining the recognition result character name for the lowercase letter "p". The topmost coordinate Yt and the bottommost coordinate Yb of the character pattern of this character are Yt = 50 and Yb = 15, and the reference line coordinate Ys of the input character line data is Ys = 30, as obtained above, so equation (4) gives the position feature of the character pattern of the lowercase "p" as e = 4.2.
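Equation (4) itself is not reproduced in this text. Purely as an assumption, one form consistent with the surrounding values is e = Z (Ys - Yb) / (Yt - Yb), that is, the fraction of the character's height lying below the reference line, scaled by Z. The sketch below uses that assumed form; the function name `position_feature` is hypothetical. With Z = 10, Ys = 30, Yt = 50, and Yb = 15 it gives roughly 4.29, close to the e = 4.2 used in this description (the difference may come from truncation), and it gives 0 for a character resting on the reference line.

```python
def position_feature(y_s, y_top, y_bottom, z=10):
    """ASSUMED form of equation (4): e = Z * (Ys - Yb) / (Yt - Yb)."""
    height = y_top - y_bottom
    if height <= 0:
        raise ValueError("topmost coordinate must exceed bottommost coordinate")
    return z * (y_s - y_bottom) / height


# Lowercase "p" from the example: Ys = 30, Yt = 50, Yb = 15.
e = position_feature(y_s=30, y_top=50, y_bottom=15)
print(round(e, 2))  # 4.29
```

Under this assumed form, a character that sits on the reference line yields e near 0 and a character with a descender yields a mid-range value, which matches the two value ranges the description quotes for uppercase "P" and lowercase "p".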

The candidate character names for the lowercase "p", in order of similarity rank, are "P", "p", "O", "D", and "c", as shown in Attached Table 3.

The character position determination unit 118a therefore first reads the character position features gL and gH of the first-ranked candidate character name "P" from the character position feature table 118b and compares them with the calculated position feature e. However, as is clear from Attached Table 2, gL = 0 and gH = 1 for the first-ranked candidate character name "P", so gH < e with respect to the position feature e = 4.2 of the character pattern "p". The character position determination unit 118a therefore determines that the first-ranked candidate character name "P" is not the recognition result character name.

Next, the character position determination unit 118a performs the same processing on the second-ranked candidate character name "p" as for the first-ranked one. For the second-ranked candidate character name "p", gL = 4 and gH = 6, as shown in FIG. 3, so gL ≤ e ≤ gH is satisfied with respect to the position feature e = 4.2 of the character pattern "p". The character position determination unit 118a therefore determines the second-ranked candidate character name "p" as the recognition result character name and outputs it to the character name output terminal 119.

This concludes the description of the embodiment of the character recognition device of the present invention.

However, the present invention is not limited to the embodiment described above, and various modifications such as those described below can be made.

The embodiment described above is an example in which the recognition processing differs between the character to be recognized at the beginning of the character line and the characters to be recognized from the second character onward; that is, the number of characters n counted from the beginning of the character line is set to n = 1. Clearly, however, n can be changed according to the design. When n is set to 2 or more, the reference line coordinate Ys of the input character line data is preferably calculated, for example, as follows.

<First calculation method>
In the first method, the average of the reference line coordinates calculated individually for each of the first n characters is used as the reference line coordinate Ys.

For example, with n = 2 set for the input character line data 21 in FIG. 2(A), Ys1 is first calculated for the first character "I" according to equation (3), Ys2 is then calculated for the second character "n" according to equation (3), and their average (Ys1 + Ys2)/2 is used as the reference line coordinate Ys for the case n = 2.

<Second calculation method>
In the second method, the reference line coordinate Ys is calculated using the predetermined coefficient corresponding to the recognition result character name with the greatest degree of similarity among the recognition result character names of the first n characters, together with the coordinates of the character pattern from which that recognition result character name was obtained.

To illustrate the second calculation method with the same example as for the first, the similarity of the first-ranked candidate character name obtained when recognizing the first character "I" is compared with the similarity of the first-ranked candidate character name obtained when recognizing the second character "n". The reference line coordinate is then calculated according to equation (3) for the first-ranked candidate character name of whichever character has the greater similarity, and this is used as the reference line coordinate Ys for the case n = 2.

When n is 3 or more, the first and second calculation methods can be carried out by the same procedure as for n = 2.
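Both calculation methods can be sketched for an arbitrary n as follows. This is an illustration under stated assumptions, not the patent's implementation: the function names and the dictionary layout are hypothetical, the values for "I" (α = 0.0, Yt = 98, Yb = 30) come from this description, and the similarity values and every value for "n" are made up for the example.

```python
def reference_line(y_top, y_bottom, alpha):
    """Equation (3): Ys = Yb + alpha * (Yt - Yb)."""
    return y_bottom + alpha * (y_top - y_bottom)


def reference_line_by_average(chars):
    """First method: mean of the per-character reference line coordinates."""
    lines = [reference_line(c["y_top"], c["y_bottom"], c["alpha"]) for c in chars]
    return sum(lines) / len(lines)


def reference_line_by_best_match(chars):
    """Second method: use only the character recognized with highest similarity."""
    best = max(chars, key=lambda c: c["similarity"])
    return reference_line(best["y_top"], best["y_bottom"], best["alpha"])


# "I" uses the values from this description; the similarity values and all
# values for "n" are invented for illustration.
first_chars = [
    {"name": "I", "alpha": 0.0, "y_top": 98, "y_bottom": 30, "similarity": 0.90},
    {"name": "n", "alpha": 0.0, "y_top": 70, "y_bottom": 32, "similarity": 0.95},
]
print(reference_line_by_average(first_chars))     # 31.0
print(reference_line_by_best_match(first_chars))  # 32.0
```

The averaging method smooths out segmentation noise across several characters, while the best-match method avoids letting a doubtful recognition result influence the reference line; which is preferable would depend on the input quality.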

In the embodiment described above, there is a single reference line, and the baseline is used as the reference line. In carrying out the present invention, however, the reference line is not limited to the baseline; another line such as the descender line, mean line, cap line, or ascender line may be used instead. Two or more reference lines may also be set.

(Effects of the Invention)
As is clear from the above description, according to the character recognition device of the present invention, the reference line of the character line data is calculated using the recognition results of the first n characters to be recognized from the beginning of the character line, and each character to be recognized from the (n+1)-th onward is recognized based on its character shape and on the relative positional relationship between the coordinates of its character pattern in the character line data and the reference line coordinate of the character line data. Therefore, even if the (n+1)-th and subsequent characters to be recognized include characters that have the same shape but different sizes, such as uppercase "P" and lowercase "p", the two can be accurately distinguished.

Moreover, since the reference line is calculated using only one or more characters at the beginning of the character line, the time required to calculate the reference line is far shorter than in the conventional art. In addition, since the reference line is calculated from one or more characters, it can be calculated even from a character line consisting of only a few characters or from a character line consisting of characters of similar size, so such character lines can also be recognized accurately.

It is therefore possible to provide a device that can recognize characters of identical shape accurately and at high speed.

Attached Table 3

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the configuration of the character recognition device of the embodiment, and FIGS. 2(A) and 2(B) are diagrams used to explain the embodiment.

100: character recognition device
111: optical signal
112: photoelectric conversion unit
113: line buffer
114: character cutting unit
115: pattern register
116: recognition unit
117: candidate character name extraction unit
118: character name determination unit
118a: character position determination unit
118b: character position feature table
118c: reference line calculation table
118d: reference line storage unit
119: character name output terminal
21: input character line data
23: reference line

Claims

[Claims]

(1) A character recognition device comprising: a photoelectric conversion unit that photoelectrically converts and quantizes light from a medium to obtain input character line data of a character line on the medium; a character cutting unit that cuts out a character pattern from the input character line data; and a recognition unit that extracts features of the character pattern and outputs a recognition result character name of a character to be recognized, characterized in that the recognition unit determines the recognition result character name according to (A) and (B) below.
(A) For the characters to be recognized from the beginning of the character line to the n-th character (n is any positive integer).
(B) For each character to be recognized from the (n+1)-th character onward from the beginning of the character line, the recognition result character name is determined based on [1] features including features resulting from the shape of the character pattern of the character to be recognized, and [2] the relative position between the reference line coordinate of the input character line data, calculated using some or all of the predetermined coefficients corresponding to the respective recognition result character names of the first n characters and the coordinates in the input character line data of the character patterns corresponding to the coefficients used, and the coordinates in the input character line data of the character pattern of the character to be recognized.

(2) The character recognition device according to claim 1, wherein the recognition unit comprises: a candidate character name extraction unit that extracts candidate character names based on features resulting from the character shape of the character pattern of a character to be recognized; and a character name determination unit that determines a recognition result character name from the extracted candidate character names according to (a) and (b) below.
(a) For the first n characters to be recognized, the candidate character name with the greatest degree of similarity to the character to be recognized is taken as the recognition result character name.
(b) For the (n+1)-th and subsequent characters to be recognized, a feature of the position of the character pattern in the input character line data is calculated based on the reference line coordinate and the coordinates of the character pattern of the character to be recognized, and this position feature is compared with a predetermined value corresponding to the first-ranked candidate character name of the character to be recognized. If a predetermined condition is satisfied, that candidate character name is taken as the recognition result character name. If it is not satisfied, the predetermined values for the second-ranked and lower candidate character names are compared with the position feature, and the candidate character name among them that satisfies the predetermined condition is taken as the recognition result character name.

(3) The character recognition device according to claim 1 or 2, wherein the reference line coordinate is the average of the reference line coordinates calculated for each of the first n characters.

(4) The character recognition device according to claim 1 or 2, wherein the reference line coordinate is calculated using the predetermined coefficient corresponding to the recognition result character name with the greatest degree of similarity among the recognition result character names of the first n characters, and the coordinates of the character pattern from which that recognition result character name was obtained.

(5) The character recognition device according to any one of claims 1 to 4, wherein the reference line is one or more lines selected from the descender line, baseline, mean line, cap line, and ascender line.