JPH03291777A

JPH03291777A - Recognition candidate output control method for character recognizing device

Info

Publication number: JPH03291777A
Application number: JP2093724A
Authority: JP
Inventors: Atsuko Kurihara; 栗原　敦子; Sueshige Harada; 季栄原田; Toshiyuki Yoshida; 敏之吉田; Mamoru Okada; 守岡田
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Group Corp
Priority date: 1990-04-09
Filing date: 1990-04-09
Publication date: 1991-12-20
Anticipated expiration: 2014-12-13
Also published as: JP2990734B2

Abstract

PURPOSE: To reduce the user's workload in confirming and correcting candidate characters by learning the rank at which the correct character appears among the recognition candidate characters and how often it appears there, selecting the recognition candidate characters that are more likely to be correct, and reducing the number of recognition candidate characters displayed. CONSTITUTION: Referring to a recognition dictionary 14, a character recognition part 1 takes out the recognition candidate characters for each read character and stores them in a recognition candidate character storage part 2. For these stored recognition candidate characters, a candidate character selection part 3 refers to a second candidate character storage part 6 and selects the characters that form a second candidate character group, which are sent to a confirmation/correction part 4 and displayed on a display part 7. Through a keyboard 8, the user designates the correct character from the displayed second candidate character group, and the confirmation/correction part 4 sends the correct character to an order learning part 5. On receiving the confirmed correct character, the learning part 5 determines the actual rank of the correct character by referring to the storage part 2. The learning part 5 then updates the contents of the storage part 6 by changing, in the area of the storage part 6 corresponding to the correct character, the total number of appearances and the number of appearances at the relevant rank.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a method for controlling the output of recognition candidate characters in a character recognition device, that is, an optical character reader (hereinafter, OCR: Optical Character Reader), and in particular to an OCR recognition candidate character output control method suitable for efficiently selecting the correct character from among the recognition candidate characters.

[Conventional technology]

An OCR is a device that reads characters by optical means. It automatically recognizes characters written or printed on a paper medium and serves as an input function for a computer.

In practice, OCR is used to read postal codes and the contents of tax payment slips, electricity and gas bill payment slips, and the like.

Among OCR devices, those that target alphanumeric and kana characters achieve a recognition rate of almost 100%.

OCR for kanji, however, must handle a large number of character types and a wide variety of typefaces and character sizes. Moreover, because the patterns are complex, a more complex character recognition method is required.

Such OCR is described in "Electronic Information and Communication Handbook / 1988", edited by the Institute of Electronics, Information and Communication Engineers (published by Ohmsha), pp. 1677-1678 and pp. 2687-2688.

All OCR discussed below is OCR that targets kanji.

In conventional OCR, when one character is read and recognized, a plurality of recognition candidate characters is first output.

The recognition candidate characters either contain exactly one correct character or contain no correct character at all. In other words, most of the recognition candidate characters are not the correct character.

For example, in an OCR that outputs 16 recognition candidate characters for one character, at least 15 of them are not the correct character.

Furthermore, because an OCR outputs candidate characters that are similar in shape, the recognition candidate characters naturally include many characters that do not form a word.

In such OCR, the conventional technique displayed all of the recognition candidate characters for one character on a display device or the like, and the user selected the correct character from among them.

[Problem to be solved by the invention]

In conventional OCR, all of the recognition candidate characters obtained by recognition were displayed on a display device or the like, and the user selected the correct character from among them.

As a result, the user had to do a great deal of work to confirm and correct the recognition results, which was a major cause of reduced throughput in data entry operations that use character recognition.

When displaying recognition candidate characters in this way, conventional OCR output the recognition candidate characters for each read character in descending order of the probability of being designated as the correct character, that is, in order of confidence (1st, 2nd, ..., m-th).

However, that confidence is determined in advance, and the way it is determined differs from one OCR to another. In addition, depending on factors such as the characteristics of the OCR and the way the characters are written, the correct character does not necessarily appear at a high-confidence rank.

An object of the present invention is to solve these problems of the prior art and to provide an OCR recognition candidate character output control method that learns the rank at which the correct character appears among the recognition candidate characters and the frequency with which it appears there, selects recognition candidate characters that are more likely to be correct, reduces the number of recognition candidate characters, and thereby lightens the load of confirming and correcting candidate characters.

[Means to solve the problem]

To achieve the above object, the OCR recognition candidate character output control method of the present invention provides a recognition candidate character storage unit that stores the character code of each character of the output first candidate character group in a predetermined, arbitrarily set order of decreasing probability of being extracted as the correct character, and a second candidate character storage unit that stores, for each character extracted as the correct character from the first candidate character group stored in the recognition candidate character storage unit, the total number of times it has been extracted as the correct character and the number of appearances at each probability rank. The method is characterized by including: a step of selecting, from the candidate characters stored in the recognition candidate character storage unit, a second candidate character group with a still higher probability of being extracted as the correct character, based on the total number of extractions and the per-rank appearance counts stored in the second candidate character storage unit; a step of extracting the manually designated correct character from the second candidate character group thus selected; and a step of identifying the rank at which the extracted correct character appears in the recognition candidate character storage unit and correcting, in the second candidate character storage unit, the appearance count for that rank and the total number of times the character has been extracted as the correct character.

[Effect]

In the present invention, the OCR learns the rank of the confirmed correct character within the first candidate character group, that is, its rank in the recognition candidate character storage unit, and stores the learned result, character by character, in the second candidate character storage unit.

Then, when a similar first candidate character group is output to the recognition candidate character storage unit, candidate characters with a still higher probability of being extracted as the correct character are selected based on the per-character learning results stored in the second candidate character storage unit.

Thus, when recognition candidate characters are to be displayed on the display device, the past appearance-rank data stored in the second candidate character storage unit is consulted for each recognition candidate character to judge whether the rank at which it now appears is plausible, and the recognition candidate characters to be displayed are selected accordingly. This limits the displayed recognition candidate characters to those with a higher probability of being extracted as the correct character.

Because the candidate characters displayed on the display device have been narrowed down, the user can quickly confirm the correct character.

Further, after the user has confirmed the correct character, the rank at which that character appeared is learned and the learning result is reflected in the second candidate character storage unit. This limits the recognition candidate characters displayed on the display device still further, to high-probability characters only.

[Example]

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the configuration of an OCR to which the present invention is applied.

The OCR consists of: an optical scanning/photoelectric conversion unit 11 that scans a form 10 and converts the light and dark of the characters on the form 10 into electrical signals; a preprocessing unit 12 that removes noise from the character patterns obtained by the optical scanning/photoelectric conversion unit 11 and cuts out the pattern of one character at a time; a feature extraction unit 13 that executes the recognition algorithm; a recognition processing unit 9 that compares the features obtained by the feature extraction unit 13 with a recognition dictionary 14, performs the recognition candidate character output control of the present invention, and determines the recognition candidate characters; and a display unit 7 and a keyboard 8 that the user uses to determine the correct character.

The recognition processing unit 9, which performs the processing of the present invention, consists of: a character recognition unit 1 that compares the features obtained by the feature extraction unit 13 with the recognition dictionary 14, recognizes the read character, and outputs the corresponding recognition candidate character group; a recognition candidate character storage unit 2 that stores the recognition candidate character group output by the character recognition unit 1; a candidate character selection unit 3 that selects, from the recognition candidate character group stored in the recognition candidate character storage unit 2, a second candidate character group that is more likely to be correct; a confirmation/correction unit 4 that extracts the correct character from the second candidate character group selected by the candidate character selection unit 3 according to the user's designation; a rank learning unit 5 that learns from the processing results of the confirmation/correction unit 4; and a second candidate character storage unit 6 that stores the results of the rank learning unit 5 for each character.

The second candidate character group selected by the candidate character selection unit 3 is displayed on the display unit 7 via the confirmation/correction unit 4. The user designates the correct character from this second candidate character group and notifies the confirmation/correction unit 4 via the keyboard 8.

The second candidate character storage unit 6 has, as described later with reference to FIG. 4, a storage area for each character, and each storage area records the total number of appearances of that character, its number of appearances at the first rank, its number of appearances at the second rank, and so on up to its number of appearances at the m-th rank.

The processing operation according to the present invention of the OCR with this configuration, and of the recognition processing unit 9 in particular, is described below.

FIG. 2 is a flowchart showing an embodiment of the processing operation, according to the present invention, of the recognition processing unit in FIG. 1.

The description follows the operation of each processing unit in the configuration of FIG. 1.

First, the character recognition unit 1 refers to the recognition dictionary 14, takes out m recognition candidate characters for each character obtained by the feature extraction unit 13, and outputs them to the recognition candidate character storage unit 2 (step 201). The m recognition candidate characters are output in a preset order.

Up to this point, the operation is the same as the recognition processing of conventional OCR.

Once all m recognition candidate characters have been stored in the recognition candidate character storage unit 2 (step 202), the candidate character selection unit 3 refers to the contents of the second candidate character storage unit 6 for each of the m recognition candidate characters stored in the recognition candidate character storage unit 2 (step 203) and selects the characters that can belong to the second candidate character group, that is, the characters with a high probability of being extracted as the correct character (step 204). It then sends the selected second candidate character group to the confirmation/correction unit 4 (step 205).

The confirmation/correction unit 4 displays the second candidate character group sent from the candidate character selection unit 3 on the display unit 7 (step 206). The user designates the correct character from the second candidate character group displayed on the display unit 7 and inputs it via the keyboard 8 (steps 207 and 208). The confirmation/correction unit 4 then sends the correct character input by the user to the rank learning unit 5 (step 209).

On receiving the confirmed correct character from the confirmation/correction unit 4, the rank learning unit 5 refers to the recognition candidate character storage unit 2 to determine the actual rank of the correct character (step 210).

The rank learning unit 5 then changes the total number of appearances and the number of appearances at that rank in the area of the second candidate character storage unit 6 corresponding to the correct character, thereby updating the contents stored in the second candidate character storage unit 6 (step 211).

Thus, according to this embodiment, each time a character is read and its correct character is confirmed, the stored contents that associate the correct character's rank of appearance within the recognition candidate character group with its frequency of appearance are updated.

As a result, the candidate character selection unit 3 can select and display, from the first candidate character group, candidate characters that are more likely to be correct.

FIG. 3 is an explanatory diagram showing an embodiment of how the confirmation/correction unit in FIG. 1 displays the second candidate character group on the display unit.

The screen of the display unit 7 in FIG. 1 shows a second candidate character group 30 consisting of a first-rank correct character candidate field 31, a second-rank correct character candidate field 32, and so on up to an m-th-rank correct character candidate field 33 as the last candidate.

The user identifies the correct character in the second candidate character group 30 displayed on the screen and inputs it via the keyboard 8 of FIG. 1.

In this embodiment, the candidate characters have been narrowed down by the candidate character selection unit 3, so few candidate characters are displayed and the user can easily identify and designate the correct character.

FIG. 4 is an explanatory diagram showing the storage configuration of the second candidate character storage unit in FIG. 1.

The second candidate character storage unit 6 is composed of an IC memory, a magnetic disk device, or the like. The storage area 40 corresponding to a given character, for example the candidate character "間" 41, records the total number of appearances of this character (N0) 401, its number of appearances at the first rank (N1) 402, its number of appearances at the second rank (N2) 403, and so on up to its number of appearances at the m-th rank (Nm) 404.
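
As a rough illustration of the storage configuration of FIG. 4, the per-character record might be modeled as below. This is only a sketch under stated assumptions: the names `RankStats` and `new_store`, the value m = 16, and the use of an in-memory dictionary keyed by character are illustrative and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List

M = 16  # assumed number of ranks m; the specification leaves m open


@dataclass
class RankStats:
    """Counts kept for one character (storage area 40 in FIG. 4)."""
    total: int = 0  # N0: total number of appearances as the correct character
    by_rank: List[int] = field(default_factory=lambda: [0] * M)  # N1..Nm


def new_store() -> Dict[str, RankStats]:
    """Hypothetical stand-in for the second candidate character storage unit 6."""
    return {}


store = new_store()
store["間"] = RankStats()     # storage area for the candidate character "間"
store["間"].total += 1        # N0 <- N0 + 1
store["間"].by_rank[0] += 1   # N1 <- N1 + 1 (index 0 holds the first rank)
```

Keeping the total alongside the per-rank counts mirrors the figure, even though in this sketch the total always equals the sum of the per-rank counts.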

The candidate character selection unit 3 in FIG. 1 refers to these stored contents and selects, from the first candidate character group in the recognition candidate character storage unit 2, the second candidate character group shown in FIG. 2.

The operations of the candidate character selection unit 3 and the rank learning unit 5 in FIG. 1 are described more concretely below, using the storage configurations of recognition candidate characters shown in FIG. 3 and FIG. 4.

FIG. 5 is a flowchart showing an embodiment of the processing operation, according to the present invention, of the candidate character selection unit in FIG. 1.

First, a threshold "T" is set in advance: the proportion of appearances that a recognition candidate character must reach to qualify as a candidate for the correct character (step 501).

Then, for each of the m recognition candidate characters stored in the recognition candidate character storage unit 2 of FIG. 1, the contents of the second candidate character storage unit 6 are consulted and the following operations are performed.

Take any one of the recognition candidate characters (step 502) and call it, for example, "X". Then, based on the contents of the second candidate character storage unit 6 for this character "X", find the minimum rank at which the proportion of appearances of "X" is at least "T" (step 503).

That is, let the contents of the second candidate character storage unit 6 for the character "X" be N0(X) for the total number of appearances and Ni(X) for the number of appearances at the i-th rank (i = 1, 2, ..., m).

The minimum k satisfying the following condition is then found:

$$\frac{\sum_{i=1}^{k} N_i(X)}{N_0(X)} \geq T$$

In other words, the minimum rank k at which the proportion of appearances of the character "X" reaches at least T is found. T should normally be set to about 0.8 to 0.9.
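
The search for the minimum rank k in step 503 can be sketched as follows. This is an illustrative reading, not the patent's implementation: it treats the "proportion of appearances" as the cumulative share of per-rank counts up to rank k, and the function name `min_rank_reaching` and the handling of an empty history (returning `None`) are assumptions.

```python
from typing import List, Optional


def min_rank_reaching(by_rank: List[int], total: int, t: float) -> Optional[int]:
    """Return the smallest 1-based rank k with (N1 + ... + Nk) / N0 >= T.

    by_rank holds N1..Nm, total is N0, and t is the threshold T.
    Returns None when there is no history or the threshold is never reached
    (an assumption; the specification does not cover these cases).
    """
    if total <= 0:
        return None
    cumulative = 0
    for k, count in enumerate(by_rank, start=1):
        cumulative += count
        # Compare counts instead of dividing, which avoids a zero division.
        if cumulative >= t * total:
            return k
    return None


# Example: a character that was correct 10 times, 6 times at rank 1,
# 3 times at rank 2 and once at rank 3.
k = min_rank_reaching([6, 3, 1, 0], total=10, t=0.85)
```

With these counts the cumulative shares are 0.6, 0.9 and 1.0, so a threshold of 0.85 is first reached at rank 2.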

Next, this minimum rank k is compared with the rank of the character "X" in the recognition candidate character storage unit 2, denoted k0(X) (step 504).

If k0(X) ≤ k (step 505), the character "X" is judged to be adopted as a candidate character (step 506).

If, in step 505, k0(X) > k, the character "X" is judged not to be adopted as a candidate character (step 508).

When it has been determined in step 505 that k0(X) ≤ k and the character "X" is to be adopted as a candidate character, the character "X" is sent to the confirmation/correction unit 4 (step 507).
).

The candidate character selection unit 3 of FIG. 1 performs the above operations for all of the recognition candidate characters and sends the selection result to the confirmation/correction unit 4.
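
Putting steps 502 to 508 together, the selection loop might look like the sketch below. It is a minimal illustration under assumptions: `select_candidates` is a hypothetical name, the history is passed as a dictionary mapping each character to `(N0, [N1, ..., Nm])`, and a character with no recorded history is kept, which the specification does not address.

```python
from typing import Dict, List, Tuple

History = Dict[str, Tuple[int, List[int]]]  # character -> (N0, [N1..Nm])


def select_candidates(first_group: List[str], history: History,
                      t: float = 0.85) -> List[str]:
    """Keep each character X whose current rank k0(X) satisfies k0(X) <= k."""
    selected = []
    for k0, ch in enumerate(first_group, start=1):  # k0 is the 1-based rank
        total, by_rank = history.get(ch, (0, []))
        if total == 0:
            selected.append(ch)  # assumption: no history, so do not filter
            continue
        # Step 503: smallest k whose cumulative share of appearances >= T.
        k = len(by_rank)  # fallback if the threshold is never reached
        cumulative = 0
        for rank, count in enumerate(by_rank, start=1):
            cumulative += count
            if cumulative >= t * total:
                k = rank
                break
        # Steps 504 to 508: adopt X only if it appears at rank k or better.
        if k0 <= k:
            selected.append(ch)
    return selected


history = {
    "間": (10, [9, 1, 0]),   # almost always correct at rank 1
    "問": (10, [8, 2, 0]),   # correct at rank 1 or 2
    "闇": (10, [10, 0, 0]),  # only ever correct at rank 1
}
second_group = select_candidates(["間", "問", "闇"], history)  # ['間', '問']
```

Here "闇" is dropped because it now appears at rank 3, whereas in the recorded history it was the correct character only when it appeared at rank 1.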

FIG. 6 is a flowchart showing an embodiment of the processing operation, according to the present invention, of the confirmation/correction unit in FIG. 1.

The confirmation/correction unit displays the candidate character group sent from the candidate character selection unit 3 of FIG. 1 as the second candidate character group 30 of FIG. 3 (step 601) and has the user select the correct character (steps 602 and 603). That is, in the second candidate character group 30 of FIG. 3, the first-rank candidate character 31 is displayed so that it is distinguished from the candidate characters 32 to 33 of the second and lower ranks; the user selects the correct character with a position indicator such as cursor keys or a mouse (step 603); and the correct character, for example "Y", is sent to the rank learning unit 5 (step 604).

FIG. 7 is a flowchart showing an embodiment of the processing operation, according to the present invention, of the rank learning unit in FIG. 1.

On receiving the confirmed correct character "Y" from the confirmation/correction unit 4 of FIG. 1, the rank learning unit refers to the recognition candidate character storage unit 2 of FIG. 1 and determines the actual rank of the correct character "Y", for example "j" (step 701).

Next, it refers to the area of the second candidate character storage unit 6 of FIG. 1 corresponding to the correct character "Y" and updates the total number of appearances N0(Y) and the number of appearances at the j-th rank Nj(Y) (step 702).

That is, the contents are updated as

$$N_0(Y) \leftarrow N_0(Y) + 1, \qquad N_j(Y) \leftarrow N_j(Y) + 1$$

where "←" denotes assignment.
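
Steps 701 and 702 amount to a rank lookup followed by two increments, which can be sketched as below. The function name `learn_rank`, the list layout `[N0, N1, ..., Nm]`, and the choice to skip the update when the confirmed character is not among the candidates are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def learn_rank(first_group: List[str], correct: str,
               counts: Dict[str, List[int]]) -> Optional[int]:
    """Update the counts for the confirmed character and return its rank j.

    counts maps a character to [N0, N1, ..., Nm], where m is taken to be the
    size of the first candidate group. Returns None, without updating, if the
    confirmed character is not in the first candidate group (an assumption).
    """
    if correct not in first_group:
        return None
    j = first_group.index(correct) + 1  # step 701: actual 1-based rank
    record = counts.setdefault(correct, [0] * (len(first_group) + 1))
    record[0] += 1  # step 702: N0(Y) <- N0(Y) + 1
    record[j] += 1  # step 702: Nj(Y) <- Nj(Y) + 1
    return j


counts: Dict[str, List[int]] = {}
rank = learn_rank(["問", "間", "闇"], "間", counts)  # "間" was at rank 2
```

After this call the record for "間" holds one appearance in total and one appearance at rank 2.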

In this way, for each correct character, its rank of appearance among the recognition candidate characters and its frequency of appearance are stored in association with each other in the second candidate character storage unit 6, and the stored contents are updated each time a correct character is confirmed. This enables the candidate character selection unit 3 of FIG. 1 to select and display, from among the recognition candidate characters, candidate characters that are more likely to be correct.

Although the second candidate character storage unit 6 records numbers of appearances here, it is evident that the same effect is obtained if the proportions of appearances are calculated and recorded instead.
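
The remark about recording proportions instead of counts can be checked with a small sketch. The helper names `to_ratios` and `min_rank_from_ratios` are hypothetical, and the threshold test assumes the same cumulative reading of the "proportion of appearances" as the count-based selection.

```python
from typing import List, Optional


def to_ratios(total: int, by_rank: List[int]) -> List[float]:
    """Convert per-rank counts N1..Nm into proportions Ni / N0."""
    if total <= 0:
        return [0.0] * len(by_rank)  # assumption: no history yields zeros
    return [count / total for count in by_rank]


def min_rank_from_ratios(ratios: List[float], t: float) -> Optional[int]:
    """Smallest 1-based rank whose cumulative proportion reaches t."""
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative >= t:
            return k
    return None


ratios = to_ratios(10, [6, 3, 1])
k_from_ratios = min_rank_from_ratios(ratios, 0.85)
```

One practical difference is that a store holding only proportions would still need the total count, or an equivalent weight, in order to fold each newly confirmed character into the stored values.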

As described above with reference to FIGS. 1 to 7, in this embodiment the candidate character selection unit 3 refers to the contents of the second candidate character storage unit 6 and selects, from the first candidate character group in the recognition candidate character storage unit 2, only the characters with a high probability of being the correct character as the second candidate character group. Fewer candidate characters are therefore displayed on the display unit 7, the user can select the correct character more easily, and the time required to confirm and correct character recognition results is shortened.

Furthermore, the contents of the second candidate character storage unit 6 are updated, based on the processing results of the rank learning unit 5, each time a correct character is confirmed. The candidate character selection unit 3 thus obtains more accurate confidence information and can select the second candidate character group more accurately.

The load on the user when designating the correct character can thereby be reduced.

Furthermore, if this embodiment is applied as preprocessing for a matching process that checks the recognition candidate characters against a word dictionary to estimate the correct characters, the matching time can be shortened.

[Effects of the Invention]

According to the present invention, the rank at which the correct character appears among the recognition candidate characters and the frequency with which it appears there are learned, recognition candidate characters that are more likely to be correct are selected, and the number of recognition candidate characters displayed is reduced, which lightens the user's load in confirming and correcting candidate characters.

[Brief explanation of the drawing]

The drawings show an embodiment of the present invention. FIG. 1 is a block diagram showing an embodiment of the configuration of an OCR to which the present invention is applied; FIG. 2 is a flowchart showing the processing operation, according to the present invention, of the recognition processing unit in FIG. 1; FIG. 3 is an explanatory diagram showing the display of the second candidate character group on the display unit by the confirmation/correction unit in FIG. 1; FIG. 4 is an explanatory diagram showing the storage configuration of the second candidate character storage unit in FIG. 1; FIG. 5 is a flowchart showing an embodiment of the processing operation, according to the present invention, of the candidate character selection unit in FIG. 1; FIG. 6 is a flowchart showing an embodiment of the processing operation, according to the present invention, of the confirmation/correction unit in FIG. 1; and FIG. 7 is a flowchart showing an embodiment of the processing operation, according to the present invention, of the rank learning unit in FIG. 1.

1: character recognition unit; 2: recognition candidate character storage unit; 3: candidate character selection unit; 4: confirmation/correction unit; 5: rank learning unit; 6: second candidate character storage unit; 7: display unit; 8: keyboard; 9: recognition processing unit; 10: form; 11: optical scanning/photoelectric conversion unit; 12: preprocessing unit; 13: feature extraction unit; 14: recognition dictionary; 30: second candidate character group; 31: first-rank correct character candidate field; 32: second-rank correct character candidate field; 33: m-th-rank correct character candidate field; 40: storage area; 41: candidate character "間"; 401: total number of appearances (N0); 402: number of appearances at the first rank (N1); 403: number of appearances at the second rank (N2); 404: number of appearances at the m-th rank (Nm).

Claims

[Claims]

(1) A recognition candidate character output control method for a character recognition device that recognizes each character written on a paper medium and read by optical means as a set of pixels for that character, and that outputs a first candidate character group, consisting of a plurality of characters that may match the correct character represented by the set of pixels, in a predetermined, arbitrarily set order of probability of being extracted as the correct character, wherein there are provided recognition candidate character storage means for storing the character code of each character of the first candidate character group in said order, and second candidate character storage means for storing, for each character extracted as the correct character from the first candidate character group stored in the recognition candidate character storage means, the total number of times it has been extracted as the correct character and the number of appearances corresponding to said order, the method comprising: a step of further selecting, from the candidate characters stored in the recognition candidate character storage means, a second candidate character group with a high probability of being extracted as the correct character, based on the total number of times and the number of appearances stored in the second candidate character storage means; a step of extracting the manually designated correct character from the selected second candidate character group; and a step of identifying the rank at which the correct character appears in the recognition candidate character storage means and correcting, in the second candidate character storage means, the number of appearances corresponding to that rank and the total number of times the character has been extracted as the correct character.