JP2953162B2

JP2953162B2 - Character recognition device

Info

Publication number: JP2953162B2
Application number: JP3347467A
Authority: JP
Inventors: 靖彦村山
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1991-12-27
Filing date: 1991-12-27
Publication date: 1999-09-27
Anticipated expiration: 2014-09-27
Also published as: JPH05182013A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device that recognizes the characters represented by handwritten or printed character image data.

[0002]

[Prior Art] Character recognition devices have been developed that read handwritten character image data, or character image data printed in magazines and the like, as binary image data with an image scanner and recognize it as characters by comparing it with character patterns registered in a dictionary. Because such a device makes keyboard data entry unnecessary, it is expected to find wide application as a highly efficient man-machine interface.

[0003] However, such a character recognition device may misrecognize characters because of idiosyncrasies in handwritten character image data or because of blurring and smudging in printed character image data. When the operator finds a misrecognized character in the recognition result string displayed on the screen, the following correction methods are available. In the first, the operator points at the misrecognized character with a light pen, and alternative characters are displayed in descending order of their likelihood of being correct. In the second, keystrokes on the keyboard display several candidate characters in descending order of likelihood. In the third, the target character to be recognized and the recognized result character are each displayed at the same location on the screen for correction. This last method makes visual comparison easy and allows accurate, quick correction.

[0004]

[Problem to Be Solved by the Invention] In a conventional character recognition device, when character image data is misrecognized at one location, it is highly likely that the same character image data has been misrecognized at several other locations as well. Even in such cases, however, the operator had to correct the characters one at a time, so the correction work after character recognition was time-consuming. An object of the present invention is to solve this problem.

[0005]

[Means for Solving the Problem] To solve the above problem, the character recognition device of the present invention comprises: input means for inputting a plurality of handwritten or printed character image data; identification means for recognizing the character image data input by the input means as a specific character by comparing it with a plurality of character patterns registered in a dictionary; manual correction means for manually correcting a misrecognized character among the group of recognized characters recognized by the identification means; and reconfirmation means for detecting, within the group of recognized characters, character image data that was recognized as the same character as the misrecognized character replaced with a corrected character by the manual correction means, and for determining that misrecognition is likely when the pattern of that character image data is closer to the pattern of the image data of the misrecognized character than to the dictionary-registered character pattern of the misrecognized character.

[0006] The device further comprises automatic correction means for automatically correcting the character image data that the reconfirmation means has determined is likely to have been misrecognized.

[0007]

[Operation] According to the character recognition device of the present invention, a plurality of handwritten or printed character image data are input by the input means, and the identification means analyzes this character image data to recognize each item as a specific character. Characters misrecognized by the identification means are manually corrected by the manual correction means. The reconfirmation means detects characters that were recognized as the same character as the misrecognized character replaced with a corrected character by the manual correction means, and determines that misrecognition is likely when the pattern of the image data of such a character is closer to the pattern of the image data of the misrecognized character than to the dictionary-registered character pattern of the misrecognized character. Characters so determined can also be corrected automatically using the automatic correction means.

[0008]

[Embodiment] An embodiment of the present invention is described below with reference to the accompanying drawings. FIG. 1 is a block diagram showing the configuration of this embodiment. As shown in the figure, the character recognition device of this embodiment comprises a CPU 10 that controls each process, a RAM device 20 that stores the recognition result tables, a ROM device 30 that stores the processing modules and the like, and an input/output device 40 that handles character input, display, and so on. The ROM device 30 holds an input module 31, which is the input means for inputting a plurality of handwritten or printed character image data; an identification module 32, which is the identification means for recognizing a specific character; a manual correction module 33, which is the manual correction means for manually correcting misrecognized characters; a reconfirmation module 34, which is the reconfirmation means for rechecking other characters for misrecognition; an automatic correction module 35, which is the automatic correction means for automatically correcting misrecognized characters; a correction display module 36, which is the correction display means for highlighting characters determined likely to have been misrecognized; and a confirmation input module 37, which is the confirmation input means for having the operator enter whether a character determined likely to have been misrecognized is in fact a misrecognition. The input/output device 40 includes an image scanner 41 for inputting handwritten or printed character image data, a display device 42 for displaying the character data after recognition, a keyboard device 43 for correcting misrecognized characters, a hard disk device 44 storing the discrimination dictionary in which the patterns of a plurality of characters are registered as feature quantities, and a switch 45 used in the correction process.

[0009] Next, the processing performed in this embodiment is described. The processing of the character recognition device of this embodiment consists of a data reading process performed by the input module 31, a data recognition process performed by the identification module 32, and a misrecognition correction process performed by the manual correction module 33, the reconfirmation module 34, and other modules.

[0010] First, the data input process is described. Data input is performed by the operator reading with the image scanner 41. Specifically, the operator starts the input module 31 under the control of the CPU 10 and presses the image scanner 41 against characters printed in a magazine or the like to read the desired characters. This reading inputs the character image data.

[0011] Next, the data recognition process is described. In the data recognition process, the operator starts the identification module 32 under the control of the CPU 10. The module extracts the peripheral features of the input character image data and compares these feature quantities with those of the registered character patterns in the discrimination dictionary stored in the hard disk device 44 to identify the character. The peripheral features are extracted as follows. First, the circumscribing frame of the character pattern is obtained, and each of its four sides is divided into four. Then the area of the white region enclosed between each divided segment of the frame and the first character portion encountered when looking inward from the frame is counted and normalized by the total area, which yields the feature quantities (a 16-dimensional feature vector). For example, as shown in FIG. 2, when the character image region of the character "水", consisting of 64 × 64 bits, is divided into units of 16 × 16 dots, the feature quantities (α₁, α₂, ..., α₁₆) become (α₁ = 29, α₂ = 17, ..., α₁₆ = 15).
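The extraction steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the embodiment: the function name `peripheral_features`, the use of NumPy, and the choice to return unscaled fractions of the frame area are all assumptions (the specification does not state the scale that produces values such as 29 or 17).

```python
import numpy as np

def peripheral_features(img, parts=4):
    """Return 4 * parts peripheral features of a binary character image.

    img: 2-D array, nonzero = character (black) pixel. Hypothetical helper.
    """
    ink = np.asarray(img) != 0
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:
        raise ValueError("image contains no character pixels")
    # Circumscribing frame (bounding box) of the character pattern.
    box = ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    total_area = box.size

    features = []
    # Look inward from the left, right, top and bottom sides in turn.
    for view in (box, box[:, ::-1], box.T, box.T[:, ::-1]):
        has_ink = view.any(axis=1)
        # White pixels on each scan line before the first character pixel;
        # a scan line with no character pixel is white all the way across.
        depth = np.where(has_ink, view.argmax(axis=1), view.shape[1])
        # Divide the side into `parts` segments and sum the white area of each.
        for segment in np.array_split(depth, parts):
            features.append(segment.sum() / total_area)  # assumed normalization
    return np.array(features)
```

With `parts=4` the result has 16 components, matching the 16-dimensional feature vector in the text; the ordering of the sides is arbitrary here.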

[0012] The feature quantities α_j of the character image data extracted in this way are compared with the feature quantities β_ij (i = 1 to n, j = 1 to 16, where n is the number of registered characters) of the registered character patterns in the discrimination dictionary stored in the hard disk device 44 as follows.

[0013] Here, i denotes the i-th character registered in the dictionary, and D(i) denotes the distance to the i-th registered character (the degree to which the two character patterns resemble each other). The distances D(i) (i = 1 to n, where n is the number of registered characters) obtained in this way are sorted in ascending order and stored in a recognition result table in the RAM device 20. A recognition result table is a table used in the misrecognition correction process, and there are four types, shown in FIG. 3, that differ in the kind of data stored. These recognition result tables are described with reference to FIG. 3. FIG. 3(a) is a conceptual diagram showing the structure of recognition result table 100. Recognition result table 100 stores the character codes of the dictionary-registered characters, the distance between the feature quantities of the input character data and those of each registered character, and the feature quantities of the input character data, in ascending order of distance. FIG. 3(b) is a conceptual diagram showing the structure of recognition result table 110. Recognition result table 110 stores the character codes, the distances between feature quantities, the feature quantities of the input character data, and the character image of the input character data, in ascending order of distance. This table is suited to displaying the input image of the character data on the screen of the display device 42. FIG. 3(c) is a conceptual diagram showing the structure of recognition result table 120. Recognition result table 120 stores the character codes, the distances between feature quantities, and the character image of the input character data, in ascending order of distance. Because this table holds no feature-quantity data, its data volume is smaller, but it has the drawback that the feature quantities must be extracted from the character image of the input character data at matching time. FIG. 3(d) is a conceptual diagram showing recognition result table 130. Recognition result table 130 stores the character codes and the distances between feature quantities in ascending order of distance. In addition, only when the ratio of the first-ranked and second-ranked distances is smaller than a fixed value γ are the feature quantities of the input character data stored together with these data. This variable table structure reduces the amount of data.
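As a rough sketch of how a variable-size table like table 130 might be built, the Python below ranks dictionary entries by distance and keeps the input features only for ambiguous results. Several points are assumptions rather than statements of the specification: the distance is taken to be a city-block (sum of absolute differences) distance because the actual formula is not reproduced in this text, the ratio is read as second-ranked distance divided by first-ranked distance, and the names `Candidate`, `ResultTable`, `city_block`, and `build_table` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Candidate:
    char: str        # character code of a dictionary entry
    distance: float  # distance between input and dictionary features

@dataclass
class ResultTable:
    candidates: list[Candidate]            # ascending order of distance
    features: Optional[tuple[float, ...]]  # kept only for ambiguous results

def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed distance measure: sum of absolute differences."""
    return float(sum(abs(x - y) for x, y in zip(a, b)))

def build_table(features, dictionary, gamma) -> ResultTable:
    """Rank every dictionary entry by its distance to the input features."""
    candidates = sorted(
        (Candidate(ch, city_block(features, ref)) for ch, ref in dictionary.items()),
        key=lambda c: c.distance,
    )
    # Assumed reading of the gamma rule: the result is ambiguous when the
    # second-ranked distance is less than gamma times the first-ranked one.
    ambiguous = (
        len(candidates) >= 2
        and candidates[1].distance < gamma * candidates[0].distance
    )
    return ResultTable(candidates, tuple(features) if ambiguous else None)
```

The first entry of `candidates` is the recognized character. Storing features only for ambiguous results trades memory for the need to re-extract features later, which is the same trade-off the text describes for table 120.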

[0014] In the data recognition process, the first-ranked character in this recognition result table is judged to be the character of the input image data. This process is performed on all input image data, and the recognized character string is displayed on the display device 42.

[0015] Next, the misrecognition correction process is described with reference to FIGS. 4 to 14. FIG. 4 is a flowchart outlining the misrecognition correction process. As shown in the figure, the operator starts the manual correction module 33, the reconfirmation module 34, and so on under the control of the CPU 10, and checks whether every character in the string displayed on the display device 42 has been recognized correctly (step 200). On finding a misrecognized character (step 220), the operator moves the cursor to that character on the display device 42 and enters the correction from the keyboard 43 (step 230). This correction input triggers a recheck of the other characters already recognized, and all characters likely to have been misrecognized are extracted (step 240). All extracted characters are then corrected (step 250), after which processing returns to step 200. These steps are repeated, and processing ends once all characters have been checked (step 210).
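The control flow of FIG. 4 can be summarized in a short Python sketch. The function `correction_loop` and its three callback parameters are hypothetical names introduced only for illustration; the operator's visual check is modeled as a callback that returns either a position and replacement character or `None`.

```python
def correction_loop(text, find_error, extract_suspects, correct):
    """Run the check/correct cycle until the operator finds no more errors.

    text:             mutable list of recognized characters
    find_error:       operator check; returns (position, new_char) or None
    extract_suspects: returns positions likely misrecognized in the same way
    correct:          applies the correction to the extracted positions
    Returns the number of manual corrections the operator had to enter.
    """
    manual_corrections = 0
    while True:
        found = find_error(text)                 # steps 200 and 220
        if found is None:
            return manual_corrections            # step 210: all characters checked
        pos, new_char = found
        old_char = text[pos]
        text[pos] = new_char                     # step 230: manual correction
        suspects = extract_suspects(text, pos, old_char)  # step 240
        correct(text, suspects, new_char)        # step 250
        manual_corrections += 1
```

The point of the loop is that one manual correction can resolve several occurrences, so the returned count can be much smaller than the number of misrecognized characters.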

[0016] Next, the process of extracting characters likely to have been misrecognized (step 240 in FIG. 4) is described with reference to FIGS. 5 and 6. FIG. 5 is a flowchart showing the extraction process performed by the reconfirmation module 34. This extraction process uses recognition result tables 100, 110, and 130. As shown in the figure, the correction input in step 230 causes all other already-recognized characters to be rechecked, and a search is made for characters that were recognized as the same character as the one that was corrected (step 242). When such a character is detected, the distance D_ist between the feature quantities of the detected character and those of the corrected character is calculated from the data stored in recognition result tables 100 and 110 (step 243). The value obtained by adding a statistically determined correction value δ to the distance D_ist is compared with the first-ranked distance d₁ of the detected character (step 244). If d₁ is larger, the detected character is judged likely to have been misrecognized (step 245). If d₁ is smaller, the detected character is judged to have been recognized correctly. After all characters have been compared and judged, the extraction process ends (step 241).
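The extraction rule of FIG. 5 can be sketched as follows. This is an illustrative sketch only: the record type `Recognized`, the function names, and the city-block distance are assumptions (the distance formula is not reproduced in this text), and positions are 0-based here whereas the description counts from 1.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Recognized:
    char: str                  # first-ranked character from recognition
    d1: float                  # first-ranked distance to the dictionary pattern
    features: Sequence[float]  # feature quantities of the input image

def city_block(a, b) -> float:
    """Assumed distance measure: sum of absolute differences."""
    return float(sum(abs(x - y) for x, y in zip(a, b)))

def extract_suspects(results, corrected_index, delta):
    """Return positions likely misrecognized in the same way as the corrected one.

    Call this with the original recognition results, that is, with
    results[corrected_index].char still holding the misrecognized character.
    """
    corrected = results[corrected_index]
    suspects = []
    for i, r in enumerate(results):
        if i == corrected_index or r.char != corrected.char:
            continue                                        # step 242: same result only
        d_ist = city_block(r.features, corrected.features)  # step 243
        if r.d1 > d_ist + delta:                            # step 244
            suspects.append(i)                              # step 245
    return suspects
```

The rule flags a character when its image is closer to the image the operator just corrected than to the dictionary pattern it was matched against, with δ acting as a safety margin against over-eager flagging.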

[0017] FIG. 6 is also a flowchart showing an extraction process performed by the reconfirmation module 34. This extraction process uses recognition result table 120. As noted above, recognition result table 120 differs from the other tables in that it holds no feature-quantity data. In the flowchart of FIG. 6, therefore, step 246 computes the feature quantities of the corrected character and of the detected character. The remaining processing is the same as in the flowchart of FIG. 5.

[0018] Next, a specific example of the extraction of characters likely to have been misrecognized (step 240) is described with reference to FIG. 7. In the figure, image data is first input from a magazine or the like using the image scanner 41, and the feature quantities of this image data are compared with those of the characters registered in the discrimination dictionary stored in the hard disk device 44. The recognition result is then displayed on the display device 42. The operator checks this display and corrects the character "ナ" at the second position to "す". This correction triggers a search for other characters that were recognized as "ナ", and the "ナ" at the fourth position is detected. The distance between the feature quantities {28, 15, 25, ..., 21, 15} of the "ナ" at the second position and the feature quantities {29, 16, 26, ..., 20, 16} of the "ナ" at the fourth position is D_ist = 29. The distance between the feature quantities of the "ナ" at the fourth position and the feature quantities {30, 16, 31, ..., 21, 21} of the character "ナ" registered in the discrimination dictionary is d₁ = 53. With a correction value of δ = 10, the condition (d₁ = 53) > (D_ist = 29) + (δ = 10) is satisfied, so the "ナ" at the fourth position is likely to have been misrecognized and should be corrected to "す".
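The arithmetic of this example can be checked with a few lines of Python. The values 53, 29, and 10 come from the text; the variable and function names are illustrative, and the additional δ values are hypothetical inputs chosen only to show where the decision would flip.

```python
d1 = 53     # distance: 4th-position image vs. dictionary pattern of "ナ"
d_ist = 29  # distance: 4th-position image vs. 2nd-position (corrected) image

def flagged(delta):
    """Decision rule from step 244: flag when d1 exceeds D_ist + delta."""
    return d1 > d_ist + delta

margin = d1 - d_ist  # the largest delta that still flags is just below this

print(flagged(10))   # delta from the text: 53 > 39
print(margin)
```

With these numbers the character is flagged for any δ below 24, so the example sits comfortably inside the decision boundary for δ = 10.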

[0019] Next, the correction of the characters extracted in this way (step 250) is described with reference to FIGS. 8 to 14. The correction can be performed automatically using the automatic correction module 35; one character at a time with operator confirmation, using the correction display module 36 and the confirmation input module 37; or after selecting between automatic correction and correction by the operator.

[0020] FIG. 8 is a flowchart showing correction by automatic correction, and FIG. 9 shows a display example of this correction. As shown in FIGS. 8 and 9, when the operator corrects the "ナ" at the second position to "す" in step 230 (screens 301 and 302), the "ナ" characters at the fourth and 19th positions are extracted in step 240. The two extracted "ナ" characters are then corrected to "す" simultaneously (step 252). So that the corrected characters can be identified easily, they are displayed with emphasis, for example in reverse video, in a different color, or blinking (step 253, screen 303). After all extracted characters have been corrected, processing ends (step 251).
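A minimal Python sketch of this automatic correction is shown below. The function names `auto_correct` and `render` are hypothetical, positions are 0-based, and square brackets stand in for the reverse-video, colored, or blinking emphasis described in the text.

```python
def auto_correct(text, suspects, new_char):
    """Replace every extracted character at once and report what to highlight."""
    highlighted = set()
    for pos in suspects:
        text[pos] = new_char   # step 252: correct all extracted characters
        highlighted.add(pos)   # step 253: remember them for emphasized display
    return highlighted

def render(text, highlighted):
    """Plain-text stand-in for the emphasized display on the screen."""
    return "".join(
        f"[{ch}]" if i in highlighted else ch for i, ch in enumerate(text)
    )
```

Returning the highlighted positions separately keeps the display decision out of the text itself, so the operator can still review what was changed automatically.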

[0021] FIG. 10 is a flowchart showing correction one character at a time with operator confirmation, and FIG. 11 shows a display example of this correction. As shown in FIGS. 10 and 11, when the operator corrects the "ナ" at the second position to "す" in step 230 (screens 311 and 312), the "ナ" characters at the fourth and 19th positions are extracted in step 240. The cursor is moved onto the extracted "ナ" at the fourth position (step 262), and the character at that position is highlighted (step 263). A window for confirming the correction is opened (step 264, screen 313), and when the operator presses the return key (step 265), the "ナ" at the fourth position is corrected to "す" (step 266). After this correction, processing returns to step 262, and the confirmation window is opened at the position of the "ナ" at the 19th position (screen 314). When the operator presses the return key, the "ナ" at the 19th position is also corrected to "す". Since no other characters that may need correction remain, the cursor is returned to its original position and processing ends (step 261).
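The per-character confirmation flow might look like the following Python sketch. The names are hypothetical, and the operator's keypress is modeled as a `confirm` callback. The text describes only the accepting keypress, so leaving the character unchanged when the callback returns `False` is an assumption.

```python
def confirm_and_correct(text, suspects, new_char, confirm):
    """Visit each extracted character and correct it only if the operator agrees.

    confirm(pos, old_char, new_char) -> bool stands in for the confirmation
    window and keypress (steps 263 to 265).
    """
    corrected = []
    for pos in suspects:                       # step 262: move cursor to next suspect
        if confirm(pos, text[pos], new_char):  # steps 263 to 265
            text[pos] = new_char               # step 266
            corrected.append(pos)
        # Assumption: a rejected suspect is left as recognized.
    return corrected
```

Compared with fully automatic correction, this keeps the operator in control of every change while still sparing them the search for the affected positions.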

[0022] FIG. 12 is a flowchart showing a correction process in which the user interface provides a switch 45 for selecting between automatic correction and correction by the operator. As shown in the figure, characters likely to have been misrecognized are extracted in step 240, and the state of the switch 45 is checked (step 272). If the switch 45 is on, automatic correction is performed (step 273). If the switch 45 is off, correction by the operator is performed (step 274).

[0023] FIG. 13 is a flowchart showing automatic correction with a flag that prevents erroneous operation, and FIG. 14 shows a display example of this correction. As shown in FIGS. 13 and 14, when the operator corrects the "ナ" at the second position to "す" in step 230 (screens 321 and 322), the "ナ" characters at the fourth and 19th positions are extracted in step 240. It is then determined whether the flag of each of the two extracted "ナ" characters is ON (step 282). If the flag is OFF, the "ナ" is corrected to "す" (step 283). So that the corrected characters can be identified easily, they are displayed with emphasis, for example in reverse video, in a different color, or blinking (step 284, screen 323). The flag of each corrected character is then set to ON (step 285). As a result of this processing, the flags of the characters "な" at the second, fourth, and 19th positions are ON, so even if the operator next corrects the "す" at the third position to "ま", the characters "な" at the second, fourth, and 19th positions are not corrected to "ま" in step 253 (screen 304).
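The flag mechanism can be illustrated with the Python sketch below. The function name and the per-position `flags` list are hypothetical. Following the text, a position that has already been corrected, whether manually or automatically, is assumed to carry an ON flag and is skipped by later automatic corrections.

```python
def auto_correct_with_flags(text, flags, suspects, new_char):
    """Automatically correct suspects, skipping positions already corrected once."""
    changed = []
    for pos in suspects:
        if flags[pos]:        # step 282: flag ON, leave the character alone
            continue
        text[pos] = new_char  # step 283
        flags[pos] = True     # step 285: protect it from later auto-corrections
        changed.append(pos)
    return changed
```

Without the flag, a later correction of an unrelated "す" could cascade onto characters that were only just changed to "す", undoing the earlier fix.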

[0024] In this embodiment, character patterns are recognized by extracting peripheral features using a superposition method, but the present invention is not limited to this extraction method; a mesh method, a structural analysis method, or the like may also be used.

[0025]

[Effect of the Invention] With the character recognition device of the present invention, character image data recognized as the same character as a misrecognized character that was replaced with a corrected character by the manual correction means is detected. When the pattern of the detected character image data is closer to the pattern of the image data of the misrecognized character than to the dictionary-registered character pattern of the misrecognized character, misrecognition is judged to be likely. Character image data so judged can also be corrected automatically using the automatic correction means. As a result, manual correction after character recognition takes less time, and the burden on the operator is reduced.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of this embodiment.

FIG. 2 is a conceptual diagram showing an example of peripheral feature extraction.

FIG. 3 is a conceptual diagram showing the data structure of the recognition result tables.

FIG. 4 is a flowchart outlining the misrecognition correction process.

FIG. 5 is a flowchart showing the extraction process performed by the reconfirmation module.

FIG. 6 is a flowchart showing the extraction process performed by the reconfirmation module.

FIG. 7 is a conceptual diagram showing a specific example of the extraction process.

FIG. 8 is a flowchart showing correction by automatic correction.

FIG. 9 is a diagram showing a display example of correction by automatic correction.

FIG. 10 is a flowchart showing correction one character at a time.

FIG. 11 is a diagram showing a display example of correction one character at a time.

FIG. 12 is a flowchart showing the correction process.

FIG. 13 is a flowchart showing automatic correction with a flag.

FIG. 14 is a diagram showing a display example of automatic correction with a flag.

[Explanation of symbols]

10: CPU, 20: RAM device, 30: ROM device, 31: input module, 32: identification module, 33: manual correction module, 34: reconfirmation module, 35: automatic correction module, 36: correction display module, 37: confirmation input module, 40: input/output device.

Continued from the front page: (58) Fields searched (Int. Cl.⁶, DB name): G06K 9/00 - 9/82

Claims

(57) [Claims]

1. A character recognition device comprising: input means for inputting a plurality of handwritten or printed character image data; identification means for recognizing the character image data input by the input means as a specific character by comparing it with a plurality of character patterns registered in a dictionary; manual correction means for manually correcting a misrecognized character among the group of recognized characters recognized by the identification means; and reconfirmation means for detecting, within the group of recognized characters, character image data recognized as the same character as the misrecognized character replaced with a corrected character by the manual correction means, and for determining that misrecognition is likely when the pattern of that character image data is closer to the pattern of the image data of the misrecognized character than to the dictionary-registered character pattern of the misrecognized character.

2. The character recognition device according to claim 1, wherein the reconfirmation means judges the similarity of data patterns by extracting the peripheral features of each data pattern and comparing their feature quantities.

3. The character recognition device according to claim 1 or 2, further comprising automatic correction means for automatically correcting character image data determined by the reconfirmation means to be likely misrecognized.

4. The character recognition device according to any one of claims 1 to 3, further comprising correction display means for highlighting character image data determined by the reconfirmation means to be likely misrecognized.

5. The character recognition device according to any one of claims 1 to 4, further comprising confirmation input means for entering whether character image data determined by the reconfirmation means to be likely misrecognized is in fact a misrecognition.