JPH01171080A

JPH01171080A - Recognizing device for error automatically correcting character

Info

Publication number: JPH01171080A
Application number: JP62330806A
Authority: JP
Inventors: Zuiseki Ro; 呂　瑞鉐; Biei Chin; 陳　美瑛; Inchiyuu En; 袁　允中; Toshihiro Hayashi; 俊宏林
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1987-12-25
Filing date: 1987-12-25
Publication date: 1989-07-06

Abstract

PURPOSE: To eliminate the time spent collating against word spellings by automatically checking and correcting misrecognition made on the specified feature quantities, using information on the position, size and external form of each character and the construction rules of the characters themselves, with English capital and small letters, numerals and special symbols as the characters to be recognized. CONSTITUTION: The device comprises a character size deciding means 22, which computes the height of a character from the three-tier (upper, middle, lower) structure of characters and decides whether the character is large or small; a character external form deciding means 23, which decides whether the character is wide or tall; a character collating means 40, which successively compares the prescribed feature quantities extracted from the input character with the feature quantities of each standard character pattern and selects the character at the smallest distance; a character construction rule correcting means 50; and so on. The auxiliary information of the input character (its position in the character string, its size and its external form) is compared with the auxiliary information of a standard template 45 to detect misrecognition made on the main feature quantities of the character, such as stroke extraction or peripheral features. Similar characters are then corrected, and spelling collation is eliminated.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a character recognition device that automatically detects and corrects misrecognized characters on the basis of character construction rules.

Prior Art

FIGS. 2 and 3 are block diagrams of conventional character recognition devices. In FIGS. 2 and 3, 10 is an image scanning means: an optical scanning device such as an image scanner scans the document, converts it into binary data in which white pixels are "0" and black pixels are "1", and stores the data in a buffer. 20 is a character string or character segmentation means, which examines the binary data stored in the buffer, separates character strings using the horizontal characteristic distribution of the image (image histogram) and a threshold value, and then separates the characters within each separated string using the vertical characteristic distribution of the image. 30 is a character feature extraction means, which analyzes each character cut out by the segmentation means 20 with a predetermined feature extraction method, for example stroke extraction or peripheral features, calculates the feature quantities of the input character, and extracts the features used for identification. 45 is a set of standard character templates for matching, stored in memory in advance, one for each character to be identified, holding the features extracted by the character feature extraction means 30. 40 is a character matching means, which compares the feature quantities of the input character extracted by the character feature extraction means 30 with each of the standard character templates 45, calculates the differences, and outputs identification candidates in descending order of likelihood according to the difference values. 51 is a means for manual inspection and correction of the results identified by the character matching means 40 (FIG. 2), and 52 is a means for checking and correcting the spelling of words in the text of the document (FIG. 3). 60 is a recognition result display means, which presents the identification results on a terminal or printer.
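The separation performed by means 20 can be illustrated with a short sketch. It is not taken from the patent: it assumes the binary image is a list of rows of 0/1 pixels, that "horizontal characteristic distribution" means the count of black pixels per row and "vertical characteristic distribution" the count per column, and the names `runs_above` and `segment` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of 0 (white) / 1 (black) pixels


def runs_above(profile: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where profile > threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ends at a gap
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment(image: Image, threshold: int = 0) -> List[Tuple[int, int, int, int]]:
    """Split the image into text lines, then each line into character boxes.

    Each box is (top, bottom, left, right) with half-open ranges.
    """
    row_profile = [sum(row) for row in image]            # black pixels per row
    boxes = []
    for top, bottom in runs_above(row_profile, threshold):
        band = image[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]   # black pixels per column
        for left, right in runs_above(col_profile, threshold):
            boxes.append((top, bottom, left, right))
    return boxes
```

A threshold of 0 treats any row or column containing ink as part of a character; a real scanner image would likely need a higher value to tolerate noise.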

Misrecognition in a recognition system that takes the character as its unit of processing, as above, falls into the following three types.

(1) Confusion of uppercase and lowercase letters: as shown in FIGS. 4 and 5, C/c, O/o, S/s, U/u, V/v, W/w, X/x, Z/z, P/p, etc.

(2) Confusion of similar characters: 0/O/o, 1/l/I, 5/S, 9/g, etc.

(3) Others: l → ], ' (apostrophe) → , (comma), etc.

Conventional character recognition systems handle misrecognition in only two ways. One, as with the manual inspection and correction means 51 in FIG. 2, is to detect errors by hand and correct them by keyboard input. The other, as with the spelling check and correction means 52 in FIG. 3, is to correct them by a spelling check.

In the method of FIG. 2, the character recognition results are compared with the document by hand, and any misrecognition is corrected by keyboard input. Correction by inspection and keyboard input takes a great deal of time and labor and, moreover, cannot be automated.

The spelling check and correction means 52 of FIG. 3 detects and corrects errors by a spelling check. For example, the system in the paper "On the Recognition of Printed Characters of Any Font and Size", published in IEEE Transactions on Pattern Analysis and Machine Intelligence in March 1987, adopts this method. Because correction by spelling matching performs the check word by word, the words used for spelling collation must be stored in advance, and the word templates that hold them require a large amount of memory. As the number of word templates for matching grows, the processing time for checking and correction grows as well. Furthermore, derivative words (for example, "OCR'S") cannot be checked or corrected, because checking is limited to the stored reference word templates.
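The limitation of word-template checking can be seen in a small sketch. It is only an illustration of the general approach, not the method of the cited paper: the word list `WORD_LIST` and the function `spell_correct` are hypothetical, and the closest-match step uses `difflib.get_close_matches` from the Python standard library as a stand-in for whatever matching a real system used.

```python
import difflib
from typing import Optional

# Hypothetical stored word templates; a real system would need a very large list.
WORD_LIST = ["font", "size", "printed"]


def spell_correct(word: str) -> Optional[str]:
    """Return the word if it is stored, else the closest stored word, else None."""
    lowered = word.lower()
    if lowered in WORD_LIST:
        return word
    matches = difflib.get_close_matches(lowered, WORD_LIST, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

With this list, `spell_correct("f0nt")` recovers "font", while a derivative form such as "OCR'S" finds no stored template and cannot be checked at all. Both the memory cost and the lookup time grow with the size of the list.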

Problems to Be Solved by the Invention

Thus, the conventional examples had the following problems:

(1) The character recognition system cannot automatically check and correct misrecognition, so a great deal of manual checking for misrecognition is required.

(2) If spelling collation is adopted, the memory space and checking time required are large.

(3) There is no function for checking and correcting derivative words.

Means for Solving the Problems

To solve the above problems, the present invention is an automatic error-correcting character recognition device comprising: a character position determining means, which, after character strings and characters have been cut out of the input, determines whether the input character lies in the upper half or the lower half of the character string, for example from the three-tier (upper, middle, lower) structure of characters; a character size determining means, which calculates the height of the input character, for example from its position coordinates, and determines whether the character is large or small; a character shape determining means, which calculates the width of the input character, for example from its position coordinates, and determines from the character's aspect ratio whether it is a wide character or a tall character; a character composition flag dictionary, which stores, for each character to be recognized, flags composed of its position in the character string, its size and its shape; a character matching means, which compares the predetermined feature quantities extracted from the input character with the feature quantities of each standard character pattern one by one and selects the character pattern at the smallest distance as the initially recognized character; and a character composition rule correction means, which compares the information obtained for the input character by the character position, size and shape determining means with the composition flags of the initially recognized character and, where necessary, also applies rules on the context of neighboring characters, to decide whether a similar character needs to be corrected.

Operation

With the above configuration, the present invention compares the auxiliary information of the input character, such as its position in the character string, its size and its shape, with the same auxiliary information (position, size, shape) held for the standard template, detects misrecognition made on the main feature quantities of the character, such as stroke extraction and peripheral features, and corrects similar characters.

Embodiment

FIG. 1 is a block diagram showing one embodiment of the present invention.

In FIG. 1, 10 is the image scanning means and 20 is the character string or character segmentation means; these are the same as in FIG. 2. 21 is a character position determining means, which determines the position of each character separated by the segmentation means 20 from its position coordinates within the character string. The character position determining means 21 operates as follows. First, the position reference lines of the character string are determined from the upper and lower limits of the separated characters.

Next, as shown in FIG. 4, the height ratio between the uppercase letters of ordinary printed typefaces and lowercase letters such as "o" and "a" is 3:2, so the reference line coordinate y_thr = 2/3 (character string upper limit, character string lower limit) is determined from the upper and lower limits of the character string. Here, let y_top be the upper-limit coordinate of the character; if y_top ≥ y_thr, the character lies in the upper half of the character string.

Otherwise, the character lies in the lower half of the character string. The character position flag (CPF) is set to 1 or 0 accordingly. This character position feature helps the character composition rule error correction means 50 resolve confusion between similar characters.

For example, as shown in FIG. 4, characters that are similar in shape but differ in position, such as "'" and ",", can be checked automatically.
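The position test can be sketched as below. The source formula for y_thr is not fully legible, so this sketch makes its own assumptions, which may differ from the patent: the y axis grows upward, and the reference line lies two thirds of the way from the lower limit to the upper limit of the character string. The function name `position_flag` and the choice of 1 for "upper half" are also assumptions.

```python
def position_flag(y_top: float, line_upper: float, line_lower: float) -> int:
    """Return 1 if the character's top reaches the upper part of the line, else 0.

    Assumptions (not from the source): y grows upward, and the reference line
    y_thr sits 2/3 of the way from the lower limit to the upper limit.
    """
    y_thr = line_lower + (line_upper - line_lower) * 2.0 / 3.0
    return 1 if y_top >= y_thr else 0
```

Under these assumptions, an apostrophe whose top touches the upper limit of the line gets flag 1, while a comma whose top stays near the baseline gets flag 0, which is enough to tell the two apart even though their shapes are alike.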

22 is a character size determining means, which determines from the coordinates of each character separated by the segmentation means 20 whether it is a large character or a small character. The determination is made as follows:

H(i) = y_top - y_bottom

where y_top and y_bottom are the coordinates of the upper and lower limits of the character, H(i) is the height of the character, and H_AVE is the average height of all the characters. If H(i) ≤ H_AVE, the character is judged to be a small character, like the characters distributed in the middle tier of ordinary text as shown in FIG. 5.

Conversely, if H(i) > H_AVE, as with the characters in FIGS. 7 and 8, the character is judged to be a large character. The character size flag (CSF) is set to 1 or 0 according to whether the character is large or small.
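As a minimal sketch of the size test, the function below computes H(i) for every character and compares it with the average height H_AVE. The name `size_flags`, the (y_top, y_bottom) input layout, the upward-growing y axis, and the choice of 1 for "large" are assumptions; the source only says the flag is set to 1 or 0.

```python
from typing import List, Tuple


def size_flags(boxes: List[Tuple[float, float]]) -> List[int]:
    """Return a size flag per character: 1 if H(i) > H_AVE, else 0.

    boxes holds (y_top, y_bottom) for each character, with y growing upward.
    The list is assumed to be non-empty.
    """
    heights = [y_top - y_bottom for y_top, y_bottom in boxes]   # H(i)
    h_ave = sum(heights) / len(heights)                         # H_AVE
    return [1 if h > h_ave else 0 for h in heights]
```

For a word such as "Word" with heights 30, 20, 20 and 30, the average is 25, so the first and last characters are flagged large and the middle two small.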
Set the character case flag (C3F) to 1 or 0 depending on the lowercase letter.

FIG. 6 is a diagram showing the three-tier distribution of uppercase and lowercase letters.

23 is a character shape determining means, which calculates the width of each character separated by the segmentation means 20 from its coordinates and, together with the information obtained by the character size determining means 22, determines the shape of the character according to the following equations.

W(i) = X_right - X_left

F(i) = W(i) / H(i)

where X_right and X_left are the coordinates of the right and left limits of the character, W(i) is its width, H(i) is its height, and F(i) is the ratio of width to height, that is, the aspect ratio of the character. If F(i) ≥ a threshold value, the character is a wide character; "M" and "W", for example, fall into this class. Otherwise, the character is treated as a tall character.

Characters such as "b", "f" and "1", for example, are tall characters. The character shape flag (CCF) is likewise set to 1 or 0 according to whether the character is wide or tall.
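The shape test is a single ratio and comparison, sketched here. The source does not give the threshold value, so `WIDE_THRESHOLD = 0.9` is purely a placeholder; the function name `shape_flag` and the choice of 1 for "wide" are likewise assumptions.

```python
WIDE_THRESHOLD = 0.9  # hypothetical value; the source does not state the threshold


def shape_flag(x_left: float, x_right: float, y_top: float, y_bottom: float,
               threshold: float = WIDE_THRESHOLD) -> int:
    """Return 1 for a wide character (F(i) >= threshold), 0 for a tall one."""
    width = x_right - x_left      # W(i)
    height = y_top - y_bottom     # H(i), with y growing upward (assumed)
    aspect = width / height       # F(i)
    return 1 if aspect >= threshold else 0
```

A 30 by 30 "M" gives F(i) = 1.0 and is flagged wide, while a 12 by 30 "b" gives F(i) = 0.4 and is flagged tall.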

Table 1 shows the character position flag (CPF), character size flag (CSF) and character shape flag (CCF) of the three types of easily misrecognized characters described above.

Table 1

55 is a composition flag dictionary, which stores for each character to be identified a character composition flag made up of three flags: the character position flag, the character size flag and the character shape flag.

In FIG. 1, which shows one embodiment of the present invention, 40 is a character matching means, which compares the composite feature quantities of the input character obtained by the character feature extraction means 30 (stroke extraction and peripheral features) with the feature quantities of the characters to be identified, calculates the distance to each, and takes the character patterns as recognition candidates in order of increasing distance. The character composition rule error correction means 50 compares the flags obtained by the character position determining means 21, the character size determining means 22 and the character shape determining means 23 with the character composition flags of the initially recognized character stored in the character composition flag dictionary 55. If they do not match, a similar character with different flags, as shown in Table 1, has been misrecognized. In that case, processing such as correcting uppercase and lowercase letters, correcting between numerals and letters, or correcting symbols is carried out. If they match, either the recognition result is correct or a similar character with identical flags, as in Table 1, has been recognized. For similar characters with identical flags, the result is checked against the rules on the context of the character composition. If it conforms to the rules, the recognition result is accepted as correct; if not, the identified character must be corrected once more.
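The decision flow of means 50 can be sketched as follows. The flag values in `EXAMPLE_FLAGS` are illustrative placeholders and are not the contents of Table 1; the look-alike table, the `context_ok` callback and the policy of picking the first look-alike that fits are all assumptions, since the source does not spell out how the replacement character is chosen.

```python
from typing import Callable, Dict, List, Tuple

Flags = Tuple[int, int, int]  # (CPF, CSF, CCF)

# Illustrative placeholder values only; NOT the contents of Table 1.
EXAMPLE_FLAGS: Dict[str, Flags] = {
    "C": (1, 1, 0), "c": (1, 0, 0),
    "O": (1, 1, 0), "0": (1, 1, 0),
}
LOOK_ALIKES: Dict[str, List[str]] = {
    "C": ["c"], "c": ["C"], "O": ["0"], "0": ["O"],
}


def correct_by_flags(candidate: str, measured: Flags,
                     flag_dict: Dict[str, Flags],
                     look_alikes: Dict[str, List[str]],
                     context_ok: Callable[[str], bool]) -> str:
    """Check the initially recognized character against its composition flags."""
    if measured != flag_dict[candidate]:
        # Flags disagree: a look-alike with different flags was probably misread.
        for other in look_alikes.get(candidate, []):
            if flag_dict[other] == measured:
                return other
        return candidate  # no better candidate is known
    # Flags agree: only look-alikes with identical flags can still be wrong,
    # so fall back on the context rules.
    if context_ok(candidate):
        return candidate
    for other in look_alikes.get(candidate, []):
        if flag_dict[other] == measured and context_ok(other):
            return other
    return candidate
```

For instance, a candidate "C" whose measured size flag says "small" is replaced by "c" without consulting any word list, while "O" and "0", which share all three flags in this illustration, are separated only by the context callback.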

The character composition rule error correction means 50 makes corrections according to the following rules on the context of the character composition.

(1) The first letter after the punctuation mark "." is always an uppercase letter.

(2) If the character immediately before an apostrophe "'" (hereinafter, the preceding character) is a lowercase letter, the character immediately after the apostrophe (hereinafter, the following character) is always a lowercase letter. The letter "S" shown in FIG. 9 is an example.

(3) If the first character and the last character are both lowercase letters, the characters between them are always lowercase letters.

(4) If the preceding character is a space, then:

(A) If the following character is an uppercase letter, the character before it is always an uppercase letter. FIG. 10 is an example.

(B) If the following character is a lowercase letter:

a. If the following character is a large character and is positioned higher than the character in question, the character in question is always a lowercase letter. The letter "C" shown in FIG. 11 is an example.

b. If the following character is a small character and is positioned lower than the character in question, the character in question is always an uppercase letter. The letter "S" shown in FIG. 12 is an example.

(5) If the preceding character is an uppercase letter, then:

(A) If the character in question is positioned lower than the preceding character, it is always a lowercase letter. The letter "C" shown in FIG. 12 is an example.

(B) If the character in question is at the same position as the preceding character, it is always an uppercase letter or a large character, as shown in FIG. 10.

(6) Within a word, letters must not be intermixed with numerals, so letters appear only before or after a string of numerals.
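Rules (1), (3) and (6) operate on the recognized text alone and can be sketched as simple string checks. The function names are hypothetical, and the reading of rule (6) as "the numerals must form a prefix or a suffix of the word" is an interpretation rather than something the source states in those terms.

```python
def apply_rule1(text: str) -> str:
    """Rule (1): capitalize the first letter that follows a period."""
    out = list(text)
    capitalize_next = False
    for i, ch in enumerate(out):
        if ch == ".":
            capitalize_next = True
        elif ch.isalpha() and capitalize_next:
            out[i] = ch.upper()
            capitalize_next = False
        elif not ch.isspace():
            capitalize_next = False   # e.g. a digit right after the period
    return "".join(out)


def rule3_holds(word: str) -> bool:
    """Rule (3): lowercase first and last letters imply a lowercase interior."""
    if len(word) >= 3 and word[0].islower() and word[-1].islower():
        return all(not ch.isupper() for ch in word[1:-1])
    return True


def rule6_holds(word: str) -> bool:
    """Rule (6), as interpreted here: numerals form a prefix or suffix of the word."""
    is_digit = [ch.isdigit() for ch in word if ch.isalnum()]
    if not any(is_digit) or all(is_digit):
        return True
    n = sum(is_digit)
    return all(is_digit[:n]) or all(is_digit[-n:])
```

A word that fails `rule3_holds` or `rule6_holds`, such as "wOrd" or "c00l", signals that one of its characters is probably a misread look-alike (O/o, 0/o) and should be corrected.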

The checking method based on the above context rules can easily be implemented today in a programming language such as C. 60 is a recognition result display means, which displays the recognition results after automatic correction.
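The rules that compare vertical positions can be implemented just as briefly. The sketch below covers rule (5) only and is written in Python rather than C. It assumes that "position" means the top coordinate of a character, that y grows upward, and that a small pixel tolerance is needed to call two tops "equal"; the name `case_after_capital` and the tolerance value are hypothetical.

```python
def case_after_capital(prev_top: float, cur_top: float,
                       tolerance: float = 1.0) -> str:
    """Rule (5): decide the case of a character that follows an uppercase letter.

    Returns "lower" if the character sits lower than the preceding capital,
    "upper" if the two tops are level, and "unknown" otherwise.
    """
    if cur_top < prev_top - tolerance:
        return "lower"      # rule (5)(A)
    if abs(cur_top - prev_top) <= tolerance:
        return "upper"      # rule (5)(B): uppercase or large character
    return "unknown"        # the rule does not cover this case
```

With a capital whose top is at 30, a following "C"-shaped glyph with top at 20 is taken as lowercase "c", and one with top at 30 as uppercase "C".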

FIG. 13 shows the hardware arrangement of the character recognition device of this embodiment. The core of the device is a recognition unit that includes a CPU and a digital signal processor. The document to be recognized is converted into binary data by the image scanning device; the data are then examined and the individual characters are separated by the character string and character segmentation process. Feature quantities of each character are obtained from the boundary direction and background density of the character and compared with the standard templates for matching stored in advance; the difference values of the feature quantities are accumulated, the order of the identification candidates is decided from the accumulated values, and the candidate with the smallest difference is taken as the initial recognition result. Checking and correction are then carried out automatically according to the character composition rules of the present invention, and the result obtained is the final recognition result.
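The matching step described above, accumulating feature differences and ordering the candidates, can be sketched as follows. The source does not say how the difference values are computed, so the sum of absolute differences used here is an assumption, as are the function name `rank_candidates` and the toy feature vectors in the usage note.

```python
from typing import Dict, List, Sequence


def rank_candidates(features: Sequence[float],
                    templates: Dict[str, Sequence[float]]) -> List[str]:
    """Return template labels ordered from smallest to largest accumulated difference."""
    scored = []
    for label, template in templates.items():
        # Accumulate the per-feature differences (absolute difference assumed).
        distance = sum(abs(a - b) for a, b in zip(features, template))
        scored.append((distance, label))
    scored.sort()
    return [label for _, label in scored]
```

With hypothetical templates `{"C": [4, 1, 0], "O": [4, 4, 0], "c": [2, 1, 0]}` and an input vector `[4, 2, 0]`, the accumulated differences are 1, 2 and 3, so the order is "C", "O", "c" and "C" becomes the initial recognition result passed on to the composition rule check.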

The present invention is not limited to the above embodiment and can be practiced with modifications as long as its gist is not changed. For example, the invention can also be used for automatic checking and correction in systems with different feature extraction means or matching means. Moreover, the character composition rules of the present invention are not limited to English and can also be applied to other alphabetically spelled languages, such as German and French.

Effects of the Invention

According to the present invention, with English uppercase and lowercase letters, numerals and special symbols as the characters to be recognized, misrecognition made on the predetermined feature quantities can be checked and corrected automatically using information on the position, size and shape of the characters and on the composition rules of the characters themselves. Memory for storing a large number of standard word spellings for collation therefore becomes unnecessary, and the time spent collating against word spellings is saved, so the practical effect is considerable.

Brief Description of the Drawings

FIG. 1 is a block diagram showing the configuration of a character recognition device according to one embodiment of the present invention; FIGS. 2 and 3 are block diagrams showing the configuration of conventional character recognition devices; FIG. 4 is an explanatory diagram showing the position of characters in a character string; FIGS. 5 and 7 are explanatory diagrams showing examples of uppercase and lowercase letters that are easily confused; FIG. 6 is an explanatory diagram showing the three-tier distribution of uppercase and lowercase letters; FIG. 8 is an explanatory diagram showing the average size of characters; FIGS. 9 to 12 are explanatory diagrams showing examples of the character composition rules; and FIG. 13 is a hardware arrangement diagram for the embodiment of the present invention.

10: image scanning means; 20: character string or character segmentation means; 21: character position determining means; 22: character size determining means; 23: character shape determining means; 30: character feature extraction means; 40: character matching means; 45: standard character templates; 50: character composition rule error correction means; 55: character composition flag dictionary; 60: recognition result display means.

Agent: Patent Attorney Toshio Nakao and one other

Claims

An automatic error-correcting character recognition device comprising: a character position determining means, which cuts out characters from each character string of the input image data, determines whether the input character lies in the upper half or the lower half of the character string, and stores 1 or 0 in a position flag; a character size determining means, which determines from the height of the input character whether it is a large character or a small character and stores 1 or 0 in a size flag; a character shape determining means, which determines from the aspect ratio of the input character whether it is a wide character or a tall character and stores 1 or 0 in a shape flag; a character composition flag dictionary, which stores flags including the position flag, the character size flag and the character shape flag for each standard character within the recognition range; a character matching means, which compares the predetermined feature quantities extracted from the input character with the feature quantities of each standard character pattern one by one and determines the character pattern at the smallest distance to be the initially recognized character; and a character composition rule error correction means, which compares the set of position, size and shape flags of the input character with the composition flags of the initially recognized character in the character composition flag dictionary, immediately corrects the similar character if the two sets of flags do not match, and, if they match, determines according to the rules on the context of the character composition whether the similar character needs further correction.