JPH05182014A - Character recognition method

Info

Publication number: JPH05182014A
Application number: JP3358446A
Authority: JP
Inventors: Keiji Kojima (小島啓嗣)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1991-12-28
Filing date: 1991-12-28
Publication date: 1993-07-23

Abstract

PURPOSE: To let the user judge accurately how certain a recognition result is, to make correction work efficient, and to make the cause of a misrecognition easy to identify. CONSTITUTION: A certainty factor decision part 9 determines the certainty of the recognition result comprehensively, using as evidence the information obtained in the processing stages 4-7 that lead up to the final recognition result. A display 11 and a printing device 12 attach the certainty information to the recognition result when displaying or printing it. Where necessary, candidates with low certainty are removed at intermediate processing stages, or processing is interrupted. In addition, the certainty obtained up to a given processing stage is reflected in the preceding or succeeding processing.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a character recognition method that uses a plurality of processes, and more particularly to a character recognition method suited to making it easy for the user to evaluate and correct recognition results.

[0002]

2. Description of the Related Art
A conventional character recognition method basically compares an input pattern with a pattern dictionary and outputs, as the recognition result, the entry with the smallest dissimilarity or the largest similarity. However, a recognition rate of 100% is virtually impossible to achieve. The recognition result therefore inevitably contains wrong characters, which the user must correct.

[0003] To make this correction work efficient, the user of the apparatus should be able to evaluate easily how reliable the recognition result is.

[0004] To this end, an optical character recognition device has been proposed that changes the display color of a recognition result according to its degree of similarity to the pattern dictionary (for example, Japanese Examined Patent Publication No. 61-6430).

[0005] A method has also been devised in which character patterns or feature values are stored during recognition and, when a recognition result is corrected, the stored pattern or feature values are used to train the recognition dictionary (updating it or registering additional entries).

[0006]

Problems to Be Solved by the Invention
In recent years the typefaces and character types used in documents to be read have increased remarkably, and low-quality photocopies are increasingly being read as well. It has therefore become clear that pattern matching alone can raise the recognition rate only so far, and segmentation techniques and post-processing techniques such as language processing are being actively researched. Under these circumstances, judging the certainty of a recognition result solely from pattern-matching similarity, as in the prior art, carries a high risk of misjudgment.

[0007] Recognition results are corrected either by interrupting the recognition processing after each unit, or after the entire document has been recognized. In either case, when dictionary learning is performed while recognition results are corrected, the conventional approach stores the patterns or feature values of every character processed from the start of recognition until the interruption or end. Consequently, when the whole document is recognized at once, an extremely large number of patterns or feature values must be stored, which requires a large amount of storage space in RAM or on magnetic disk; moreover, most of the stored information is never used for dictionary learning, which is extremely uneconomical. On the other hand, when recognition is interrupted for correction after each unit, making the unit smaller reduces the storage space needed for patterns or feature values, but interrupting processing at every small unit makes it very inefficient.

[0008] An object of the present invention is to provide a character recognition method that determines the certainty factor of a recognition result more accurately and presents it to the user.

[0009] Another object of the present invention is to provide a character recognition method that, when it is clear that the desired recognition result cannot be obtained, reduces the time wasted by continuing processing and lets the user take the necessary measures quickly.

[0010] Another object of the present invention is to provide a character recognition method that improves processing efficiency and the recognition rate by eliminating inappropriate candidates as early as possible in the processing, or by changing processing methods or parameters.

[0011] Another object of the present invention is to provide a character recognition method that lets the user easily determine the cause of a misrecognition and the necessary remedy.

[0012] Still another object of the present invention is to provide a character recognition method that does not need a large storage space for saving character patterns or feature values for dictionary learning, even when the unit of recognition processing is as large as an entire document, and that performs dictionary learning efficiently.

[0013]

Means for Solving the Problems
The invention of claim 1 is characterized in that, to recognize characters, a plurality of processes such as pattern matching, character combination selection, rule processing and language processing are performed, and the certainty factor of the recognition result is determined comprehensively from the information obtained by each process.

[0014] The invention of claim 2 is characterized in that, to recognize characters, a plurality of processes such as pattern matching, character combination selection, rule processing and language processing are performed in sequence; at each stage, the certainty factor of each candidate is obtained from the information produced by that stage and used to update the certainty factor obtained up to the preceding stage, and this operation is repeated through the final process to determine the final certainty factor of the final recognition result.

[0015] The invention of claim 3 is characterized in that information on the certainty factor determined as above is attached to the recognition result when it is displayed or printed.

[0016] The inventions of claims 4 and 5 are characterized in that a visual attribute of the displayed or printed recognition result, such as its color, brightness, decoration or typeface, is varied according to the certainty factor, or in that a character or symbol representing the certainty factor is displayed or printed in association with the recognition result.

[0017] The invention of claim 6 is characterized in that, in the invention of claim 1 or 2, the number of recognized characters whose certainty factor is at or below a certain level is counted, and when the count exceeds a predetermined threshold, a warning is issued or processing is interrupted.

[0018] The invention of claim 7 is characterized in that, in the invention of claim 2, a candidate whose certainty factor falls to or below a certain level in some process is excluded from the candidates of subsequent processes.

[0019] The invention of claim 8 is characterized in that, in the invention of claim 1 or 2, information on the certainty factor of each individual process is stored separately from the final certainty factor.

[0020] The invention of claim 9 is characterized in that a message is output on the basis of the per-process certainty information stored for a recognized character.

[0021] The invention of claim 10 is characterized in that, in the invention of claim 1 or 2, the certainty factor obtained up to a certain process is reflected in the preceding process, the succeeding process, or both.

[0022] The invention of claim 11 is characterized in that, in the invention of claim 1 or 2, the pattern or feature values of a target character are stored only when the certainty factor determined for that character is at a predetermined level.

[0023] The invention of claim 12 is characterized in that, when a recognition result is corrected, the dictionary used for pattern matching is trained with the corresponding stored pattern or feature values only when the certainty factor of the corrected recognition result is at a predetermined level.

[0024]

Operation
The certainty factor expresses how likely it is that a recognized character is correct. It is represented as a value from 0% to 100%, or as that value quantized into several levels, for example the following three ranks A, B and C:
Rank A: the recognition result is correct.
Rank B: the recognition result is doubtful (it is unknown whether it is correct or incorrect).
Rank C: the recognition result is incorrect.

[0025] According to the present invention, the certainty factor of the final recognition result is determined comprehensively from information obtained through several stages of processing, such as pattern matching, character combination selection, rule processing and language processing. For example, the following information is collected: from pattern matching, the evaluation value of the first candidate or the difference between the evaluation values of the first and second candidates; from character combination selection, the evaluation value used when deciding between multiple characters and a single character; from rule processing, information indicating which rules were applied and what was corrected; and from language processing, information indicating the result of language correction. Using this information as evidence, the certainty factor is judged comprehensively with, for example, the Dempster-Shafer theory of probability.

[0026] This certainty factor is determined in one of two ways: either at the final processing stage, by collecting all the information obtained in the preceding stages and processing it at once; or by obtaining the certainty factor of each candidate from the information produced at each processing stage, using it to update the certainty factor obtained up to the previous stage, and repeating this operation until the final processing stage.
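Paragraph [0026] allows the certainty factor to be computed either in one batch at the final stage or by updating it stage by stage. The sketch below assumes that stage evidence is combined with the standard normalized form of Dempster's rule on the frame {H1, H2} (the patent's own Equation 1 is not reproduced in this text) and shows why the two routes agree: the rule is commutative and associative, so folding the stages in one at a time yields the same masses as combining them in any other order. The per-stage mass values are invented for illustration.

```python
from functools import reduce

# Mass assignment over the frame {H1, H2}:
#   "H1"   = the character is correct
#   "H2"   = the character is incorrect
#   "H1H2" = "H1 or H2", i.e. mass left as ignorance
Mass = dict[str, float]

# Before any stage has run, everything is ignorance (the vacuous assignment).
VACUOUS: Mass = {"H1": 0.0, "H2": 0.0, "H1H2": 1.0}


def combine(m1: Mass, m2: Mass) -> Mass:
    """Normalized Dempster combination of two assignments on {H1, H2}."""
    conflict = m1["H1"] * m2["H2"] + m1["H2"] * m2["H1"]  # mass on the empty set
    norm = 1.0 - conflict
    return {
        "H1": (m1["H1"] * m2["H1"] + m1["H1"] * m2["H1H2"] + m1["H1H2"] * m2["H1"]) / norm,
        "H2": (m1["H2"] * m2["H2"] + m1["H2"] * m2["H1H2"] + m1["H1H2"] * m2["H2"]) / norm,
        "H1H2": m1["H1H2"] * m2["H1H2"] / norm,
    }


# Hypothetical evidence from each stage, in processing order.
STAGES: list[Mass] = [
    {"H1": 0.6, "H2": 0.1, "H1H2": 0.3},  # pattern matching
    {"H1": 0.5, "H2": 0.0, "H1H2": 0.5},  # character combination selection
    {"H1": 0.0, "H2": 0.3, "H1H2": 0.7},  # rule processing
    {"H1": 0.4, "H2": 0.1, "H1H2": 0.5},  # language processing
]

# Stage-by-stage update: each stage refines the certainty carried over so far.
current = VACUOUS
for stage in STAGES:
    current = combine(current, stage)

# Combination of all evidence at the end, here in reverse order to show order independence.
batch = reduce(combine, reversed(STAGES), VACUOUS)
print(current)
print(batch)
```

Because the two results match, the stage-by-stage variant costs nothing in accuracy while making an intermediate certainty available at every stage, which is what the pruning and interruption features build on.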

[0027] A certainty factor determined in this way can be more accurate than one based only on similarity to the pattern dictionary, and the risk of an entirely inappropriate certainty factor is greatly reduced.

[0028] By varying visual attributes such as color or brightness according to this certainty factor when the recognition result is displayed or printed, or by displaying or printing a character or symbol alongside the recognition result, the user can easily grasp the certainty of each result, find the characters that need correction quickly and accurately, and correct them efficiently.

[0029] Furthermore, when the number of characters whose certainty factor is at or below a certain level (for example, the number of rank B and rank C characters) exceeds a predetermined threshold, a warning (for example, a warning sound) is generated. The user can then conclude that the document currently being read probably contains many errors and quickly take the necessary action, such as stopping processing or restarting it with different recognition conditions. In such a case, continuing the processing is often wasted effort; interrupting it automatically reduces that wasted time.

[0030] In addition, at an intermediate processing stage before the final one, candidates with a low probability of being correct can be removed on the basis of the certainty factor at that stage, which reduces the workload of the subsequent stages accordingly and improves processing efficiency.
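The candidate pruning of [0030] (claim 7) amounts to a filter applied between stages. A minimal sketch, assuming each candidate carries a certainty score on a 0-100 scale after the current stage; the `Candidate` type, the scores and the threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    char: str
    certainty: float  # certainty (0-100) of this candidate after the current stage


def prune(candidates: list[Candidate], level: float) -> list[Candidate]:
    """Drop candidates whose certainty is at or below `level` so later stages skip them."""
    return [c for c in candidates if c.certainty > level]


after_matching = [Candidate("ル", 72.0), Candidate("レ", 40.0), Candidate("儿", 12.0)]
survivors = prune(after_matching, level=20.0)
print([c.char for c in survivors])  # ['ル', 'レ']
```

The sketch does not decide what happens when every candidate falls below the level; the text leaves this open, and one option would be to treat it as a trigger for the warning or interruption of [0029].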

[0031] In addition, by storing the certainty information of each processing stage separately from the final certainty factor, that information can be used for maintenance of the character recognition apparatus. Moreover, when a misrecognized character is corrected, a message advising the user of the cause of the misrecognition and the necessary remedy can be output on the basis of the related per-stage certainty information, which makes correction by the user and maintenance of the apparatus easier.
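Keeping per-stage certainty next to the final value ([0031], claims 8 and 9) makes it possible to point at the stage that contributed least. The sketch below picks the stage with the lowest stored certainty and looks up an advisory message. The stage names, the message texts and the "weakest stage" heuristic are all hypothetical, because the patent does not specify how the message is chosen.

```python
# Hypothetical advice text per processing stage.
ADVICE = {
    "pattern_matching": "The glyph matched the dictionary poorly; dictionary learning may help.",
    "combination": "Full-width/half-width splitting was uncertain; check character spacing.",
    "rule": "A position, size or character-type rule changed this character.",
    "language": "The word dictionary changed this character; it may be an unknown word.",
}


def advise(stage_certainty: dict[str, float]) -> str:
    """Return a message for the stage whose stored certainty (0-100) is lowest."""
    weakest = min(stage_certainty, key=stage_certainty.__getitem__)
    return f"Lowest certainty at stage '{weakest}' ({stage_certainty[weakest]:.0f}%): {ADVICE[weakest]}"


# Per-stage certainty stored for one corrected character (illustrative values).
record = {"pattern_matching": 35.0, "combination": 90.0, "rule": 80.0, "language": 70.0}
print(advise(record))
```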

[0032] In addition, the certainty factor obtained up to a certain process can be reflected in the preceding process, the succeeding process, or both, and the methods and parameters of those processes can be optimized accordingly. This improves the recognition rate and the efficiency of correction work.

[0033] Furthermore, correction of recognition results concentrates on characters with a low certainty factor, and those are also the characters that most need dictionary learning. According to the present invention, patterns or feature values are stored only for characters whose certainty factor is below a predetermined level, that is, the characters that most need dictionary learning, so the necessary pattern or feature information fits in a limited storage space. At correction time, dictionary learning can then be performed efficiently, restricted to these low-certainty characters, using the stored patterns or feature values.
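Paragraph [0033] (claims 11 and 12) keeps features only for low-certainty characters and uses them for dictionary learning when such a character is corrected. A minimal sketch under stated assumptions: features are plain vectors, "learning" is modeled as registering the saved vector as an extra template for the corrected character, and a single threshold decides what is saved. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LearningBuffer:
    save_below: float  # save features only when certainty (0-100) is below this level
    saved: dict[int, list[float]] = field(default_factory=dict)  # char position -> features

    def observe(self, pos: int, features: list[float], certainty: float) -> None:
        """Called once per recognized character; confident characters are not stored."""
        if certainty < self.save_below:
            self.saved[pos] = features

    def on_correction(self, pos: int, corrected: str,
                      dictionary: dict[str, list[list[float]]]) -> bool:
        """Register the saved features under the corrected character, if any were kept."""
        features = self.saved.pop(pos, None)
        if features is None:
            return False  # nothing saved: the character was confident, so no learning
        dictionary.setdefault(corrected, []).append(features)
        return True


buf = LearningBuffer(save_below=50.0)
buf.observe(0, [0.1, 0.9], certainty=95.0)  # confident: not stored
buf.observe(1, [0.7, 0.2], certainty=30.0)  # doubtful: stored
templates: dict[str, list[list[float]]] = {}
buf.on_correction(1, "ル", templates)
print(templates)  # {'ル': [[0.7, 0.2]]}
```

Storage grows with the number of doubtful characters rather than with document length, which is the saving [0007] and [0012] are after.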

[0034]

Embodiments
Embodiments of the present invention are described in detail below with reference to the drawings.

[0035] FIG. 1 is a schematic block diagram of a character recognition apparatus according to the first embodiment of the present invention. In FIG. 1, a scanner 1 reads the image of a document and inputs it as binary data, and a segmentation processing unit 2 cuts out character images from the input image. A normalization processing unit 3 normalizes each cut-out character image to correct its distortion. A pattern matching processing unit 4 extracts feature values from the normalized character image, compares them with a pattern dictionary to compute a similarity or dissimilarity, and determines one or more recognition candidates with high similarity or low dissimilarity. A character combination selection processing unit 5 selects the optimum combination of characters; in Japanese, for example, it decides whether a region is one full-width character or two half-width characters. A rule processing unit 6 corrects the recognition candidates produced by pattern matching using rules based on, for example, the relative position of a character within the line (relevant to distinguishing "・" from "．"), the character size (relevant to distinguishing upper and lower case such as "C" and "c", since normalization removes size differences), and the character type (for example, a kanji numeral "一" appearing inside a katakana string is corrected to the katakana prolonged sound mark "ー"). A language processing unit 7 performs morphological analysis on the recognized character string after rule processing and corrects errors by matching it against a word dictionary. The result of this language processing is the final recognition result, and it is what the user corrects.
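The character-type rule of [0035] can be sketched as follows. The patent does not spell out what "inside a katakana string" means, so this sketch assumes the "一" must be preceded by a katakana character and followed by one (or by the end of the string); katakana is taken to be the Unicode Katakana block U+30A0..U+30FF.

```python
def is_katakana(ch: str) -> bool:
    """True for characters in the Unicode Katakana block (U+30A0..U+30FF)."""
    return "\u30a0" <= ch <= "\u30ff"


def fix_long_vowel(text: str) -> str:
    """Replace kanji numeral '一' with katakana 'ー' when it sits inside a katakana run."""
    out = list(text)
    for i, ch in enumerate(text):
        if ch != "一" or i == 0:
            continue
        prev_ok = is_katakana(text[i - 1])
        next_ok = i + 1 == len(text) or is_katakana(text[i + 1])
        if prev_ok and next_ok:
            out[i] = "ー"
    return "".join(out)


print(fix_long_vowel("コンピュ一タ"))  # コンピュータ
print(fix_long_vowel("第一章"))        # 第一章 (unchanged: not inside katakana)
```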

[0036] A certainty factor determination unit 9 collects the following: from the pattern matching processing unit 4, the evaluation value of the first candidate (for example, the distance from the dictionary divided by the total number of contours) or the difference between the evaluation values of the first and second candidates; from the character combination selection processing unit 5, the evaluation value used to decide the character combination; from the rule processing unit 6, information on the rules applied for correction; and from the language processing unit 7, information representing the result of language correction. Using this information as evidence, it determines the certainty factor of the final recognition result comprehensively with the Dempster-Shafer theory of probability. Specifically, the certainty factor is quantized into ranks A, B and C, as described later.

[0037] An output control unit 8 controls the display of the final recognition result on a display 11 and its printing by a printing device 12. To attach the certainty information supplied by the certainty factor determination unit 9 to the recognition result output from the display 11 or the printing device 12, it varies the visual attributes of the displayed or printed recognition result according to the certainty factor.

[0038] Specifically, the color of a character or of its background is changed according to the certainty factor. For example, rank A characters are shown in black while rank B and C characters are shown reversed (white on a black background); or the brightness of each character is varied with its certainty factor; or, on a black background, rank A characters are white, rank B characters yellow (a color calling for attention) and rank C characters red (a color indicating a warning).
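The last color scheme in [0038] (white, yellow and red on black) can be tried out on a terminal. In this sketch ANSI SGR escape codes stand in for the display hardware, which is an illustration choice and not something the patent describes: 37, 33 and 31 select white, yellow and red foregrounds, 40 a black background, and 0 resets the attributes.

```python
# Foreground colors on a black background, per rank (ANSI SGR codes).
RANK_COLOR = {"A": 37, "B": 33, "C": 31}  # white, yellow, red
BLACK_BG = 40
RESET = "\x1b[0m"


def colorize(text: str, ranks: str) -> str:
    """Wrap each recognized character in an escape sequence chosen by its rank."""
    if len(text) != len(ranks):
        raise ValueError("need exactly one rank per character")
    return "".join(
        f"\x1b[{RANK_COLOR[r]};{BLACK_BG}m{ch}{RESET}" for ch, r in zip(text, ranks)
    )


print(colorize("ボール", "ABC"))
```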

[0039] Alternatively, the decoration of each character is varied according to the certainty factor: on a black background, rank A characters are white, rank B characters have a single underline or light shading, and rank C characters have a double underline or dark shading. Or the typeface is varied according to the certainty factor: on a black background, rank A characters are set in Mincho, rank B characters in Gothic, and rank C characters in italics.

[0040] Alternatively, the color of the image region corresponding to each recognized character is varied according to the certainty factor: on a white background, the image corresponding to a rank A result is black, that of a rank B result yellow, and that of a rank C result red.

[0041] Alternatively, a character or symbol representing the certainty factor is output in association with the recognition result; for example, "B" is displayed or printed below each rank B character and "C" below each rank C character.
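The marker line of [0041] can be sketched for a monospaced terminal. Full-width Japanese characters occupy two columns there, so this sketch pads each marker to the character's width, as reported by Python's `unicodedata.east_asian_width` (values "W" and "F" are treated as two columns). The function name and the width handling are assumptions, not part of the patent.

```python
import unicodedata


def with_rank_markers(text: str, ranks: str) -> str:
    """Return the text plus a second line with 'B'/'C' under doubtful characters."""
    if len(text) != len(ranks):
        raise ValueError("need exactly one rank per character")
    marks = []
    for ch, rank in zip(text, ranks):
        width = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        mark = rank if rank in ("B", "C") else " "  # rank A gets no marker
        marks.append(mark.ljust(width))  # keep the marker under its character
    return text + "\n" + "".join(marks).rstrip()


print(with_rank_markers("ボール", "ABA"))
```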

[0042] A device control unit 10 controls the entire apparatus. It also counts the recognized characters whose certainty factor is at or below a certain level, for example rank B and rank C characters, and when the count exceeds a predetermined threshold, it interrupts processing and warns the user, for example by sounding a buzzer attached to the display 11.
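The counting in [0042] (claim 6) can be sketched as a loop that stops as soon as the count of rank B and C characters exceeds the threshold. Here the ranks arrive as a string, `warn` stands in for the buzzer, and the function name and message are hypothetical.

```python
from typing import Callable, Iterable


def run_with_monitor(ranks: Iterable[str], threshold: int,
                     warn: Callable[[str], None]) -> int:
    """Process ranks in order; warn and stop once B/C characters exceed `threshold`.

    Returns how many characters were processed before stopping (or in total).
    """
    low = 0
    processed = 0
    for rank in ranks:
        processed += 1
        if rank in ("B", "C"):
            low += 1
            if low > threshold:
                warn(f"{low} low-certainty characters after {processed}; interrupting")
                return processed
    return processed


messages: list[str] = []
stopped_at = run_with_monitor("AABCBA", threshold=2, warn=messages.append)
print(stopped_at, messages)
```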

[0043] Next, the processing of the certainty factor determination unit 9 is described in detail. FIG. 2 shows the flow of processing in the certainty factor determination unit 9.

[0044] As described above, the certainty factor determination unit 9 uses the information collected from the pattern matching, character combination selection, rule processing and language processing stages as evidence and determines the certainty factor comprehensively with the Dempster-Shafer theory of probability. FIG. 3 outlines this processing.

[0045] The Dempster-Shafer theory of probability can represent ignorance, and it can collect and combine fragmentary information (independent pieces of evidence).

[0046]

[Table 1]

As shown in Table 1, probability is divided into the regions "belief", "ignorance" and "disbelief"; "belief" is called the lower bound probability, and "belief + ignorance" the upper bound probability.

[0047] Specifically, two hypotheses are first set up: hypothesis H1, that the character is correct, and hypothesis H2, that the character is incorrect. For each piece of evidence collected, a pseudo-possibility distribution is obtained (the distribution over the hypotheses of the possibility that each is true). In this embodiment, as shown in FIG. 2, the following pseudo-possibility distributions are generated: one for the evaluation value of the first candidate and one for the difference between the evaluation values of the first and second candidates, both collected as evidence from pattern matching; one for the difference in evaluation values at path selection, collected as evidence from path selection processing; one for the content of the evidence collected from rule processing; and one for the corrections collected from language processing.

[0048] When the evidence is numerical, the pseudo-possibility distribution is obtained from a linear equation such as y = -x/3 + 100, where y is the pseudo-possibility and x is the numerical evidence. With this equation, an evaluation value of 100 gives a possibility of 67% for hypothesis H1 and 33% for hypothesis H2.

[0049] When the evidence is not numerical, the distribution is assigned heuristically. For example, for evidence that a certain rule was not satisfied, the possibility of hypothesis H1 is fixed in advance at 20% and that of hypothesis H2 at 80%.
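The two ways of producing a pseudo-possibility distribution in [0048] and [0049] fit in a few lines. The linear formula and the 20/80 rule example come from the text. Clamping y to 0..100 for evidence values outside 0..300 is an added assumption, and the dictionary key is a hypothetical name.

```python
def pseudo_possibility(x: float) -> tuple[float, float]:
    """Map numeric evidence x to (possibility of H1, possibility of H2) in percent.

    Uses the example line y = -x/3 + 100; clamping to [0, 100] is an assumption
    for x outside 0..300.
    """
    y = max(0.0, min(100.0, -x / 3 + 100))
    return y, 100.0 - y


# Non-numeric evidence gets fixed, hand-chosen possibilities (H1 %, H2 %).
HEURISTIC = {
    "rule_not_satisfied": (20.0, 80.0),  # the example given in the text
}

h1, h2 = pseudo_possibility(100)
print(round(h1), round(h2))  # 67 33
```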

[0050] Next, a basic probability assignment is obtained from the pseudo-possibility distribution of each piece of evidence collected in this way; that is, the pseudo-possibility distribution is divided among the probabilities of "correct", "ignorance" and "incorrect".

[0051] For example, when the possibility of hypothesis H1 is 67% and that of hypothesis H2 is 33%, the assignment is as shown in Table 2.

[0052]

[Table 2]

[0053] As another example, when the possibility of hypothesis H1 is 20% and that of hypothesis H2 is 80%, the assignment is as shown in Table 3.

[0054]

[Table 3]

[0055] Next, the basic probabilities are combined by Dempster's rule of combination, a method of integrating basic probabilities inferred from independent pieces of evidence:

[Equation 1]

where m1 and m2 are basic probabilities, and H is H1, H2 or H1H2; H1H2 denotes the always-true hypothesis "H1 or H2".
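Equation 1 is not reproduced in this text. Assuming it is the usual normalized form of Dempster's rule, m(H) = Σ_{A∩B=H} m1(A)·m2(B) / (1 − Σ_{A∩B=∅} m1(A)·m2(B)), it can be sketched over arbitrary subsets of the frame. Here H1H2 is represented as the set {H1, H2}. The example masses are invented.

```python
from itertools import product

H1 = frozenset({"H1"})   # the character is correct
H2 = frozenset({"H2"})   # the character is incorrect
H1H2 = H1 | H2           # always-true hypothesis "H1 or H2" (ignorance)


def dempster(m1: dict[frozenset, float], m2: dict[frozenset, float]) -> dict[frozenset, float]:
    """Combine two basic probability assignments with normalized Dempster's rule."""
    combined: dict[frozenset, float] = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:  # the two pieces of evidence agree on `inter`
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:      # they contradict each other: mass on the empty set
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence; combination undefined")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


m_pattern = {H1: 0.6, H2: 0.1, H1H2: 0.3}
m_rule = {H1: 0.0, H2: 0.3, H1H2: 0.7}
result = dempster(m_pattern, m_rule)
print({"".join(sorted(h)): round(w, 4) for h, w in result.items()})
```

With these inputs the conflict is 0.6 × 0.3 = 0.18, so the surviving products (0.42 for H1, 0.19 for H2, 0.21 for H1H2) are divided by 0.82.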

[0056] Next, the lower and upper bound probabilities are obtained. This is explained using the following jury model as an example.

【００５７】[0057]

[Table 4]

In this example, the lower bound probability of "guilty" is 15% and its upper bound probability is 74% (= 15 + 59); the lower bound probability of "innocent" is 26% and its upper bound probability is 85% (= 26 + 59).
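The jury numbers in [0057] correspond to masses of 15% on "guilty", 26% on "innocent" and 59% left as ignorance (15 + 59 + 26 = 100). Table 4 itself is not reproduced here, so these masses are inferred from the worked sums. In Dempster-Shafer terms the lower bound is the belief (mass on subsets of the hypothesis) and the upper bound is the plausibility (mass on sets compatible with it). A short sketch:

```python
GUILTY = frozenset({"guilty"})
INNOCENT = frozenset({"innocent"})
EITHER = GUILTY | INNOCENT  # ignorance: "guilty or innocent"

# Masses implied by the worked numbers in the text.
jury = {GUILTY: 0.15, INNOCENT: 0.26, EITHER: 0.59}


def belief(m: dict[frozenset, float], hyp: frozenset) -> float:
    """Lower bound: total mass committed to subsets of hyp."""
    return sum(w for s, w in m.items() if s <= hyp)


def plausibility(m: dict[frozenset, float], hyp: frozenset) -> float:
    """Upper bound: total mass of sets that intersect hyp."""
    return sum(w for s, w in m.items() if s & hyp)


for name, hyp in (("guilty", GUILTY), ("innocent", INNOCENT)):
    print(name, round(belief(jury, hyp), 2), round(plausibility(jury, hyp), 2))
```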

[0058] Finally, the rank of the certainty factor is determined from the lower and upper bound probabilities. Various methods are conceivable; the simplest is, for example, to determine rank A, B or C from the upper bound probability of hypothesis H2 using the following thresholds:
Upper bound probability of H2 from 0 to 19%: correct (rank A)
Upper bound probability of H2 from 20 to 79%: doubtful (rank B)
Upper bound probability of H2 from 80 to 100%: incorrect (rank C)
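The threshold table in [0058] maps directly to a function. The text gives integer ranges, so this sketch assumes the boundaries fall at "below 20" and "below 80" for non-integer values. It also assumes the combined masses use the keys "H1", "H2" and "H1H2", so the upper bound of H2 is m(H2) + m(H1H2).

```python
def rank_from_upper_h2(pl_h2: float) -> str:
    """Rank from the upper bound probability (percent) of H2, 'the character is wrong'."""
    if pl_h2 < 20:
        return "A"  # 0-19: correct
    if pl_h2 < 80:
        return "B"  # 20-79: doubtful
    return "C"      # 80-100: incorrect


def rank_for(m: dict[str, float]) -> str:
    """Upper bound of H2 = mass on H2 plus mass left as ignorance (H1H2)."""
    return rank_from_upper_h2(100 * (m["H2"] + m["H1H2"]))


print(rank_for({"H1": 0.85, "H2": 0.05, "H1H2": 0.10}))  # A
```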

[0059] The rank of the certainty factor may also be determined in two steps, using both the lower bound probability and the upper bound probability.

[0060] FIG. 4 shows a specific example of a recognition result, and FIG. 5 shows the specific process of determining its certainty factor. In this example, the certainty factor of the final recognition result "ル" (ru) is ranked as correct.

[0061] FIG. 6 shows an example in which the recognition result with the certainty factor attached is displayed on the display 11, and FIG. 7 shows an example in which it is printed on paper by the printing device 12. FIG. 8 shows an example in which the certainty factor is displayed in association with the recognition result, and FIG. 9 shows the same printed on paper.

[0062] FIG. 10 is a schematic block diagram of a character recognition apparatus according to a second embodiment of the present invention.

[0063] In FIG. 10, the scanner 1, cutout processing unit 2, normalization processing unit 3, pattern matching processing unit 4, character combination selection processing unit 5, rule processing unit 6, language processing unit 7, output control unit 8, display 11, and printing device 12 are the same as the corresponding parts shown in FIG. 1.

[0064] The certainty factor determination unit 13 obtains certainty factors using the Dempster-Shafer theory of probability, but it differs slightly from the certainty factor determination unit 9 shown in FIG. 1. The unit 13 collects the following information: from the pattern matching processing unit 4, the evaluation value of the first candidate (for example, the distance to the dictionary divided by the total number of contours) or the difference between the evaluation values of the first and second candidates; from the character combination selection processing unit 5, the evaluation value used when selecting the character combination; from the rule processing unit 6, information on the rules applied for correction; from the language processing unit 7, information representing the result of language correction; and from the cutout processing unit 2, information such as the inter-character pitch and the character rectangle width. Using the information collected at each processing stage as evidence, it determines the certainty factor of the candidates at that stage and uses it to update the certainty factor accumulated up to the preceding stage. By repeating this operation up to the final processing stage, that is, the language processing stage, it determines the final certainty factor of the final recognition result and quantizes it into the three ranks A, B, and C described above.

[0065] The character count unit 14 counts the recognized characters whose final certainty factor, as determined by the certainty factor determination unit 13, is at or below a certain level (for example, characters of rank B and rank C), and outputs a character-count-exceeded signal when the count exceeds a predetermined threshold. FIG. 11 shows the processing flow.
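The counting step can be sketched as follows, assuming ranks are available as the letters "A", "B", "C"; the function name, the default set of low ranks, and the threshold value in the example are hypothetical.

```python
def character_count_exceeded(ranks, threshold, low_ranks=("B", "C")):
    """Return True (the character-count-exceeded signal) when the number of
    recognized characters whose rank is in `low_ranks` exceeds `threshold`."""
    low_count = sum(1 for rank in ranks if rank in low_ranks)
    return low_count > threshold


page_ranks = ["A", "A", "B", "C", "A", "B"]  # hypothetical ranks for one page
if character_count_exceeded(page_ranks, threshold=2):
    # In the apparatus, this is where processing would stop and a buzzer sound
    print("warning: too many uncertain characters, interrupting")
```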

[0066] The candidate deletion determination unit 15 compares the certainty factor of each candidate, obtained by the certainty factor determination unit 13 from the information of the cutout processing stage, with a fixed threshold; it judges candidates whose certainty factor falls below the threshold to be candidates for deletion and outputs a deletion signal. Here the cutout processing is chosen as the target stage, but another processing stage, or two or more processing stages, may be chosen instead.
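A minimal sketch of the deletion judgment, assuming each candidate's certainty factor is a single number; the function name, the rectangle labels used as keys, and the threshold of 30 are illustrative assumptions.

```python
def prune_candidates(certainties, threshold):
    """Split candidates into those kept for the next stage and those deleted.

    `certainties` maps each candidate to its certainty factor; candidates whose
    certainty is below `threshold` receive the deletion signal.
    """
    kept = {c: v for c, v in certainties.items() if v >= threshold}
    deleted = [c for c, v in certainties.items() if v < threshold]
    return kept, deleted


# Hypothetical cutout-stage certainties (%) for rectangles A, B and C
kept, deleted = prune_candidates({"A": 70, "B": 20, "C": 45}, threshold=30)
print(kept, deleted)  # {'A': 70, 'C': 45} ['B']
```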

[0067] In addition to controlling the entire apparatus, the device control unit 10 interrupts processing when the character count unit 14 outputs a character-count-exceeded signal and warns the user by sounding a buzzer or the like attached to the display 11. When the candidate deletion determination unit 15 outputs a deletion signal, the device control unit 10 removes the corresponding candidate from the candidates for the next processing stage.

[0068] Next, the processing of the certainty factor determination unit 13 is described. FIG. 12 shows the relationship between the processing of each processing unit and the processing of the certainty factor determination unit 13, and FIG. 13 shows the flow of the certainty factor determination at each processing stage in the certainty factor determination unit 13.

[0069] As described above, the certainty factor is determined with the Dempster-Shafer theory of probability, using the information obtained from each processing stage as evidence. More specifically, hypotheses Hi are set up for the predetermined maximum number of candidates, as in H1: first candidate, H2: second candidate, ..., Hn: n-th candidate. That is, the probability of each hypothesis is obtained by asking whether the final recognized character is the first candidate, the second candidate, and so on.

[0070] First, at each processing stage, a pseudo-possibility distribution (the distribution, over the hypotheses, of the possibility that each hypothesis is true) is obtained for each piece of information gathered there, that is, for each piece of evidence. As described above, if the evidence is numerical, the pseudo-possibility y is obtained from a linear equation such as y = -x/3 + 100; if it is not numerical, the pseudo-possibility distribution is assigned heuristically.
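The example linear equation maps a numeric piece of evidence x (for instance an evaluation value such as a distance, which is an assumption here) to a pseudo-possibility in percent. The sketch below also clips the result to the range 0 to 100, which the text does not state and is an added assumption.

```python
def pseudo_possibility(x):
    """Pseudo-possibility (%) for numeric evidence x, from the example line
    y = -x/3 + 100. Clipping to [0, 100] is an added assumption."""
    y = -x / 3 + 100
    return max(0.0, min(100.0, y))


print(pseudo_possibility(30))   # 90.0
print(pseudo_possibility(600))  # 0.0 (clipped)
```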

[0071] Next, the pseudo-possibility distribution over hypotheses H1 to Hn is used to form a sequence of hypotheses H1, H2, ..., Hn, called a consonant sequence, which is converted into a basic probability assignment. That is, the pseudo-possibility distribution is distributed among the basic probabilities of "correct", "ignorance", and "wrong". The basic probability serves as the certainty factor, but the certainty factor may instead be determined from the lower bound and upper bound probabilities derived from it.
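The exact allocation rule is not spelled out in this section. One standard way to turn a possibility distribution into a consonant (nested) basic probability assignment is to sort the hypotheses by possibility, normalize so the largest value is 1, and give each nested set the drop in possibility to the next hypothesis. The sketch below implements that textbook conversion as an assumption; it is not necessarily the rule used in the patent, and the function name is hypothetical.

```python
from fractions import Fraction


def consonant_bpa(possibility):
    """Convert a pseudo-possibility distribution {hypothesis: value} into a
    consonant basic probability assignment over nested focal elements."""
    ranked = sorted(possibility.items(), key=lambda kv: kv[1], reverse=True)
    top = Fraction(ranked[0][1])
    if top <= 0:
        raise ValueError("at least one hypothesis must have positive possibility")
    masses = {}
    focal = frozenset()
    for i, (hyp, value) in enumerate(ranked):
        focal = focal | {hyp}  # nested sets: {best}, {best, second}, ...
        nxt = Fraction(ranked[i + 1][1]) if i + 1 < len(ranked) else Fraction(0)
        mass = (Fraction(value) - nxt) / top  # normalized drop in possibility
        if mass > 0:
            masses[focal] = mass
    return masses


# The 20% / 80% example from [0053]
bpa = consonant_bpa({"H1": 20, "H2": 80})
print(bpa[frozenset({"H2"})], bpa[frozenset({"H1", "H2"})])  # 3/4 1/4
```

With two hypotheses, the mass on the singleton acts as definite support and the mass on "H1 or H2" plays the role of ignorance.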

[0072] If there are several pieces of evidence, the basic probabilities of each hypothesis obtained for each piece are combined with the combination formula (Equation 1), using Dempster's rule of combination as described above.

[0073] Once the basic probability (certainty factor) of each hypothesis based on the evidence obtained at a processing stage has been found in this way, it is combined (updated) with the basic probability of each hypothesis obtained up to the immediately preceding processing stage, again using the combination formula (Equation 1). The lower bound and upper bound probabilities of each hypothesis then follow as well.
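The stage-by-stage update amounts to folding each stage's assignment into the running result with Dempster's rule. The sketch below keeps the running result after every stage; the standard normalized rule is assumed for Equation 1, and the stage labels and masses are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate, product


def dempster_combine(m1, m2):
    """Dempster's rule for assignments mapping frozensets to masses."""
    combined, conflict = {}, 0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        if a & b:
            combined[a & b] = combined.get(a & b, 0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {h: w / (1 - conflict) for h, w in combined.items()}


def certainty_history(stage_assignments):
    """Running certainty after each stage: each stage's assignment is combined
    with the result accumulated up to the immediately preceding stage."""
    return list(accumulate(stage_assignments, dempster_combine))


H1, H2 = frozenset({"H1"}), frozenset({"H2"})
BOTH = H1 | H2
stages = [
    {H1: Fraction(3, 5), BOTH: Fraction(2, 5)},  # e.g. cutout evidence
    {H2: Fraction(1, 2), BOTH: Fraction(1, 2)},  # e.g. pattern matching evidence
    {BOTH: Fraction(1)},                         # a stage that adds no information
]
history = certainty_history(stages)
final = history[-1]
print(final[H1], final[H2], final[BOTH])  # 3/7 2/7 2/7
```

A stage whose whole mass sits on "H1 or H2" leaves the running certainty unchanged, which is a quick sanity check on the update.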

[0074] At the final stage, that is, language processing, the determined certainty factors are ranked. Various ranking methods are conceivable; the simplest compares the upper bound probability with thresholds, as described above.

[0075] The certainty factor determination described above is now explained more concretely, following the flow of the character recognition processing.

[0076] First, assume that the cutout processing produces the result shown in FIG. 14. The rectangles in the figure are character rectangles that are candidates for characters. Looking at rectangles A, B, and C, rectangle A (a full-width character) and rectangles B and C (half-width characters) are the character candidates, and for each of them the probability (certainty factor) described above is obtained from information (evidence) such as the inter-character pitch and the rectangle width, as follows:

Rectangle A: lower bound probability ?%, upper bound probability ?%
Rectangle B: lower bound probability ?%, upper bound probability ?%
Rectangle C: lower bound probability ?%, upper bound probability ?%

[0077] In this embodiment, a rectangle whose lower bound or upper bound probability falls below a predetermined threshold, for example, is excluded from the character candidates for the next processing stage.

[0078] Alternatively, only the single most probable rectangle may be selected as the candidate, based on the certainty factor at this stage.

[0079] For each of these rectangles A, B, and C (excluding, of course, any that were removed from the character candidates), the next pattern matching processing (recognition processing) yields a group of recognition result candidates. Based on information such as the evaluation value of the first candidate for each rectangle and the difference between the evaluation values of the first and second candidates, the following are obtained:

Rectangle A, first candidate: lower bound probability ?%, upper bound probability ?%
Rectangle A, second candidate: lower bound probability ?%, upper bound probability ?%
Rectangle A, third candidate: lower bound probability ?%, upper bound probability ?%
Rectangle B, first candidate: lower bound probability ?%, upper bound probability ?%
Rectangle B, second candidate: lower bound probability ?%, upper bound probability ?%
Rectangle B, third candidate: lower bound probability ?%, upper bound probability ?%
Rectangle C, first candidate: lower bound probability ?%, upper bound probability ?%
Rectangle C, second candidate: lower bound probability ?%, upper bound probability ?%
Rectangle C, third candidate: lower bound probability ?%, upper bound probability ?%

As described above, the certainty factor is then updated by combining it with the certainty factor obtained in the preceding cutout processing.

[0080] At this stage, as in the cutout processing, candidates of each rectangle whose certainty factor is at or below a predetermined threshold may be excluded to save processing time. Furthermore, whether the character corresponds to rectangle A or to rectangles B and C can also be decided at this stage from the combined certainty factors. The description below assumes that rectangle A has been selected as the character candidate.

[0081] In the next rule processing, the certainty factor of each recognition result candidate is obtained from information such as the character position, character size, and character type, and combined with the certainty factor from the pattern matching stage to give the updated certainty factors:

First candidate: lower bound probability ?%, upper bound probability ?%
Second candidate: lower bound probability ?%, upper bound probability ?%
Third candidate: lower bound probability ?%, upper bound probability ?%

For example, regarding character position, if the target image lies near the center of the line, the certainty factor of "・" (middle dot) rises, while the certainty factors of "." and "'" fall.

[0082] In the next language processing, a word dictionary or the like is used to decide which candidate is most appropriate; a certainty factor is obtained for that result and combined with the certainty factor from the preceding stage. For example, in matching against a word dictionary, if the first candidate of the character of interest combined with the second candidate of the next character forms a meaningful word, the certainty factors of both of those candidates increase. Language processing therefore requires a character string of some length.

[0083] In this way, final certainty factors such as the following are obtained:

First candidate: lower bound probability 80%, upper bound probability 15%
Second candidate: lower bound probability 33%, upper bound probability 10%
Third candidate: lower bound probability 12%, upper bound probability 33%

Then, for example, the first candidate, which has the largest lower bound probability, is output as the final recognition result, and its certainty factor is ranked by the threshold processing described above. The candidates may be sorted in descending order of certainty factor at each processing stage, or in descending order of the final certainty factor.
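Selecting the output character and ordering the candidates by lower bound probability can be sketched as below. The function name and the bounds in the example are hypothetical values chosen for illustration, not figures from the patent.

```python
def select_final(bounds):
    """Pick the output character and order the candidates.

    `bounds` maps each candidate to (lower bound, upper bound) probability.
    Candidates are sorted by lower bound, highest first; the top one is output.
    """
    ranking = sorted(bounds, key=lambda c: bounds[c][0], reverse=True)
    return ranking[0], ranking


# Hypothetical final bounds (%) for three candidates
best, ranking = select_final({"cand2": (33, 60), "cand1": (55, 80), "cand3": (5, 20)})
print(best, ranking)  # cand1 ['cand1', 'cand2', 'cand3']
```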

[0084] When there are several pieces of evidence, the lower bound and upper bound probabilities are obtained simply by adding the corresponding basic probabilities, as explained with the jury model of Table 4 above. Since this is character recognition, "guilty", "don't know", and "not guilty" in the jury model correspond to "correct", "don't know whether correct or wrong", and "wrong".

[0085] FIGS. 15 to 17 show a specific processing example of the second embodiment. In this example there are first to third candidate characters, and the third candidate is cut off at the character combination selection stage. As shown in FIG. 16, combining the basic probabilities up through character combination selection gives m(H2) = m(H3) = 0. However, the basic probability involving H2, m(H1H2) + m(H1H2H3), is 57%, whereas for H3 only m(H1H2H3) is nonzero, at 5% (all others are 0%), so the third candidate is cut off. From then on, only the first and second candidates are processed, which saves time in certainty factor processing: since only two hypotheses remain from the rule processing onward, m(H3), m(H2H3), m(H1H3), and m(H1H2H3) are no longer computed.
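The cutoff can be read as comparing each hypothesis's total supporting mass (its plausibility, i.e. the sum of the basic probabilities of focal elements containing it) with a threshold. In the sketch below, m(H1H2) = 52% and m(H1H2H3) = 5% follow from the figures in the text, and m(H1) = 43% follows if all masses sum to 100%; the function names and the threshold of 10% are hypothetical.

```python
def plausibility(masses, hypothesis):
    """Total mass of the focal elements that contain `hypothesis`."""
    return sum(w for focal, w in masses.items() if hypothesis in focal)


def cut_off(masses, hypotheses, threshold):
    """Keep only the hypotheses whose plausibility reaches `threshold`."""
    return [h for h in hypotheses if plausibility(masses, h) >= threshold]


# Masses (%) after character combination selection
m = {
    frozenset({"H1"}): 43,
    frozenset({"H1", "H2"}): 52,
    frozenset({"H1", "H2", "H3"}): 5,
}
print(plausibility(m, "H2"), plausibility(m, "H3"))  # 57 5
print(cut_off(m, ["H1", "H2", "H3"], threshold=10))  # ['H1', 'H2']
```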

[0086] Next, a modification of the first and second embodiments is described. In this modification, as in the embodiments above, the certainty factor determination unit 9 or 13 obtains a certainty factor at each processing stage (cutout processing, pattern matching processing, character combination selection processing, rule processing, and language processing) and combines them to determine the final certainty factor. In addition to the final certainty factor, however, the certainty factor of each processing stage, that is, the certainty factor (numerical value or rank) determined from the information obtained at that stage, is stored in memory. For a recognized character on the screen of the display 11, the device control unit 10 reads the per-stage certainty information for that character from the memory, creates a message (advice) for the user based on it, and passes the message to the output control unit 8 to be displayed on the screen of the display 11. FIG. 18 shows the processing flow.

[0087] For example, if the certainty information indicates that the cutout processing is more doubtful than the other processing stages, a message such as "The characters may not have been cut out correctly." is displayed; if the pattern matching processing (recognition processing) is more doubtful than the other stages, a message such as "Please check the recognized character image." is displayed; and if post-processing (rule processing or language processing) is more doubtful than the other stages, a message such as "Please check by character string." or "Please check the knowledge (rules) and word dictionary." is displayed.
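A minimal sketch of the message selection, assuming "more doubtful than the other stages" is read as "lowest certainty factor". The stage keys, the per-stage percentages, and the choice of one post-processing message are illustrative assumptions.

```python
CUTOUT_MSG = "The characters may not have been cut out correctly."
MATCHING_MSG = "Please check the recognized character image."
POST_MSG = "Please check by character string."

MESSAGES = {
    "cutout": CUTOUT_MSG,
    "pattern_matching": MATCHING_MSG,
    "rule": POST_MSG,      # rule and language processing are both post-processing
    "language": POST_MSG,
}


def advice(stage_certainty):
    """Return the message for the stage whose certainty factor is lowest."""
    weakest = min(stage_certainty, key=stage_certainty.get)
    return MESSAGES[weakest]


# Hypothetical per-stage certainties (%) stored for one character
print(advice({"cutout": 40, "pattern_matching": 85, "rule": 90, "language": 88}))
```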

[0088] Displaying such messages makes it easier for the user not only to correct misrecognized characters but also to determine the cause and take the necessary action.

[0089] In the first and second embodiments, it is also effective, for improving the recognition rate and the efficiency of the final correction work, to select the optimum one of several processing methods or parameters for the stage before, after, or both before and after a given processing stage, according to the certainty factor up to that stage. FIG. 19 shows the processing flow.

[0090] For example, in the second embodiment, when the certainty factor up to the pattern matching processing is low, the device control unit 10 controls the units so that recognition is repeated after changing the cutout position in the character cutout processing, or after changing the feature extraction method used as preprocessing within the pattern matching processing. The former is done because the low certainty of the recognition result may be due to an inappropriate cutout position; the latter because the recognized image may not have suited the previous feature extraction method. When certainty information is fed back to an earlier process in this way, the device control unit 10 selects, as the valid result, the one with the highest certainty factor among the recognition results obtained with the different processing methods or parameters.
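The feedback loop can be sketched with a recognizer passed in as a function that returns a (result, certainty) pair for a given setting. The function name, the setting names, the threshold, and the stub results are all hypothetical.

```python
def recognize_with_feedback(recognize, settings, threshold):
    """Retry recognition with alternative settings when certainty is low.

    `recognize(setting)` returns (result, certainty). The first setting is the
    default; alternatives are tried only if its certainty is below `threshold`,
    and the most certain of all trials is kept.
    """
    first = recognize(settings[0])
    if first[1] >= threshold:
        return first  # certain enough: no feedback needed
    trials = [first] + [recognize(s) for s in settings[1:]]
    return max(trials, key=lambda trial: trial[1])


# Hypothetical recognizer results for three settings
table = {
    "default": ("X", 30),
    "shifted_cutout": ("Y", 55),
    "other_features": ("Z", 80),
}
settings = ["default", "shifted_cutout", "other_features"]
print(recognize_with_feedback(table.__getitem__, settings, threshold=60))  # ('Z', 80)
```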

[0091] Conversely, when the certainty factor up to the pattern matching processing is high, the device control unit 10 ensures that the result is not changed during the subsequent word matching in the language processing. Word matching generally checks combinations of recognition result candidates against a prepared word dictionary; here, characters with high certainty are fixed, and only the candidates for characters with low certainty are varied and checked against the dictionary. This reduces the number of word combinations, which shortens processing time and improves the recognition rate. Thus, when certainty information is used in later processing, the parameters, conditions, processing methods, and so on of that processing can be selected accordingly.
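The following sketch shows how fixing high-certainty characters shrinks the set of combinations checked against the dictionary. The candidate lists, the set of fixed positions, the toy dictionary, and the function name are hypothetical.

```python
from itertools import product


def match_words(candidates, fixed, dictionary):
    """Dictionary words formable from the per-position candidate lists.

    Positions in `fixed` (high certainty) keep only their first candidate;
    other positions try every candidate.
    """
    choices = [cands[:1] if i in fixed else cands for i, cands in enumerate(candidates)]
    words = ("".join(combo) for combo in product(*choices))
    return [w for w in words if w in dictionary]


candidates = [["c", "e"], ["a", "o"], ["t", "l"]]  # hypothetical candidates per character
dictionary = {"cat", "cot", "eat", "col"}
print(match_words(candidates, fixed={0, 2}, dictionary=dictionary))  # ['cat', 'cot']
```

With positions 0 and 2 fixed, only 2 of the 8 possible combinations are looked up.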

[0092] FIG. 20 is a schematic block diagram of a third embodiment of the present invention. Reference numeral 200 denotes a unit consisting of the same configuration as the character recognition apparatus main body shown in FIG. 1 or FIG. 2 plus a scanner; for convenience it is called the apparatus main body here.

[0093] Reference numeral 202 denotes a storage memory for storing the feature values of character patterns extracted by the pattern matching processing unit in the apparatus main body 200, or the character patterns after normalization by the normalization processing unit in the apparatus main body 200. Here, it is assumed that feature values of character patterns are stored in the storage memory 202. Reference numeral 204 denotes a storage control unit that receives rank information from the certainty factor determination unit in the apparatus main body 200 and, for characters of rank B or rank C, has the feature values of the corresponding character patterns stored in the storage memory 202. FIG. 21 shows the flow of the feature value (or pattern) storage processing performed by the storage control unit 204.

[0094] Reference numeral 206 denotes a recognition result memory, which was not shown in FIG. 1 or FIG. 10 but is naturally also present in the first and second embodiments; it stores recognition result data and certainty information. Reference numeral 11 denotes a keyboard with which the user enters information; when recognition results are corrected, the correct characters are entered from the keyboard. Reference numeral 208 denotes a correction processing unit that corrects the recognition result data in the recognition result memory 206 according to input from the keyboard 11.

[0095] Reference numeral 212 denotes a dictionary memory storing the dictionary used for recognition by the pattern matching processing unit in the apparatus main body 200; it, too, was naturally present in the first and second embodiments. Reference numeral 214 denotes a learning processing unit for training this dictionary. When a recognition result is corrected, the learning processing unit 214 receives the rank information of the corrected character and the correct character code from the correction processing unit 208; if the rank is B or C, it reads the feature values of the corrected character from the storage memory 202 and uses them to perform dictionary learning. FIG. 22 shows the processing flow, and FIG. 23 shows a specific example. FIG. 23 shows a case in which "朋" is corrected to "明": the character pattern of "明" is displayed on the screen and the dictionary is trained using its feature values.
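The storage control of [0093] and the learning step above can be sketched together. Appending the saved feature vector to the correct character's reference list is an assumed, simplified form of "dictionary learning"; the class, method names, dictionary layout, and feature values are hypothetical.

```python
class CorrectionLearner:
    """Save features only for uncertain (rank B/C) characters and use them to
    train the dictionary when the user corrects one of those characters."""

    def __init__(self, dictionary):
        # Assumed layout: character code -> list of reference feature vectors
        self.dictionary = dictionary
        self.saved = {}  # character position -> feature vector

    def save_if_uncertain(self, position, rank, features):
        """Storage control: keep features for rank B and C characters only."""
        if rank in ("B", "C"):
            self.saved[position] = features

    def on_correction(self, position, rank, correct_char):
        """Learning: add the saved features under the correct character code.
        Returns True if the dictionary was updated."""
        if rank not in ("B", "C") or position not in self.saved:
            return False
        self.dictionary.setdefault(correct_char, []).append(self.saved.pop(position))
        return True


learner = CorrectionLearner(dictionary={})
learner.save_if_uncertain(0, "A", [0.9, 0.1])  # rank A: not stored
learner.save_if_uncertain(1, "C", [0.2, 0.8])  # rank C: stored
print(learner.on_correction(1, "C", "明"))      # True
print(learner.dictionary)                       # {'明': [[0.2, 0.8]]}
```

Storing features only for rank B and C characters is what keeps the storage memory small, since rank A characters are rarely corrected.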

[0096] When character patterns are stored in the storage memory 202, the learning processing unit 214 reads the pattern of the corrected character from the storage memory 202 during dictionary learning, extracts its feature values either internally or by having the pattern matching processing unit in the apparatus main body 200 do so, and uses the resulting feature values for dictionary learning.

[0094]

[Effects of the Invention] As described in detail above, according to the present invention, the certainty factor of a recognition result can be determined more accurately, and that information can be attached to the recognition result and reported to the user of the apparatus. The user can therefore grasp the certainty of the recognition result accurately and easily and carry out the necessary corrections efficiently. When the certainty of the recognition result is low and continuing the processing would be wasteful, the user can quickly take the necessary action, such as stopping the processing or restarting it with changed recognition conditions, and the time wasted on continuing useless processing is reduced. Furthermore, messages about the cause of misrecognition and the necessary action are output based on the certainty information for each individual process, making correction work and apparatus maintenance easier for the user. In addition, automatically switching the method or parameters of the preceding processing, the succeeding processing, or both, according to the certainty factor up to a given process, improves the recognition rate or makes correction work more efficient. Finally, dictionary learning can be performed efficiently when recognition results are corrected, and the storage space needed to keep the character patterns or feature values for dictionary learning can be reduced.

[Brief description of drawings]

【図１】本発明の第１の実施例に係る概略ブロック図で
ある。FIG. 1 is a schematic block diagram according to a first embodiment of the present invention.

【図２】第１の実施例における確信度決定処理のフロー
図である。FIG. 2 is a flowchart of a certainty factor determination process in the first embodiment.

【図３】第１の実施例における確信度決定処理の概念図
である。FIG. 3 is a conceptual diagram of confidence factor determination processing in the first embodiment.

【図４】認識結果文字の具体例である。FIG. 4 is a specific example of recognition result characters.

【図５】第１の実施例における確信度決定処理の具体例
を図４の例で示した図である。FIG. 5 is a diagram showing a specific example of the certainty factor determination processing in the first embodiment in the example of FIG. 4;

【図６】確信度を付与した認識結果を表示した例であ
る。FIG. 6 is an example in which a recognition result with a certainty factor is displayed.

【図７】確信度を付与した認識結果を印字した例であ
る。FIG. 7 is an example in which a recognition result with a certainty factor is printed.

【図８】確信度を認識結果と対応させて表示した例であ
る。FIG. 8 is an example in which a certainty factor is displayed in association with a recognition result.

【図９】確信度を認識結果と対応させて印字した例であ
る。FIG. 9 is an example in which the certainty factor is printed in association with the recognition result.

【図１０】本発明の第２の実施例に係る概略ブロック図
である。FIG. 10 is a schematic block diagram according to a second embodiment of the present invention.

【図１１】認識結果の確信度が低い場合に警告を発して
処理を中断する場合のフロー図である。FIG. 11 is a flow chart when a warning is issued and the process is interrupted when the certainty factor of the recognition result is low.

【図１２】第２の実施例における全体のフロー図であ
る。FIG. 12 is an overall flow chart of the second embodiment.

【図１３】第２の実施例における各処理段階での確信度
決定処理のフロー図である。FIG. 13 is a flowchart of a certainty factor determination process at each processing stage in the second embodiment.

【図１４】文字切り出し処理に関する説明図である。FIG. 14 is an explanatory diagram related to character cutting processing.

【図１５】第２の実施例における確信度決定処理の具体
例である。FIG. 15 is a specific example of a certainty factor determination process in the second embodiment.

【図１６】図１５に続く図である。FIG. 16 is a diagram following FIG. 15;

【図１７】図１６に続く図である。FIG. 17 is a diagram subsequent to FIG. 16;

【図１８】各処理段階について確信度に応じてメッセー
ジを出力する場合のフロー図である。FIG. 18 is a flow chart when a message is output according to the certainty factor for each processing stage.

【図１９】ある処理までの確信度を前あるいは後の処理
に反映させる場合のフロー図である。FIG. 19 is a flowchart when the certainty factor up to a certain process is reflected in the previous or subsequent process.

【図２０】本発明の第３の実施例に係る概略ブロック図
である。FIG. 20 is a schematic block diagram according to a third embodiment of the present invention.

FIG. 21 is a flowchart of the process of storing a feature amount or pattern.

FIG. 22 is a flowchart of the dictionary learning process.

FIG. 23 shows a specific example of the dictionary learning process.

[Explanation of symbols]

1 Scanner
2 Cutout (character segmentation) processing unit
3 Normalization processing unit
4 Pattern matching processing unit
5 Character combination selection processing unit
6 Rule processing unit
7 Language processing unit
8 Output control unit
9 Certainty factor determination unit
10 Device control unit
11 Display
12 Printing device
13 Certainty factor determination unit
14 Character count unit
15 Candidate deletion determination unit
200 Device body
202 Storage memory
204 Storage control unit
206 Recognition result memory
208 Correction processing unit
212 Dictionary memory
214 Learning processing unit

Claims

[Claims]

1. A character recognition method in which a plurality of processes, such as a pattern matching process, a character combination selection process, a rule process, and a language process, are performed to recognize a character, characterized in that a certainty factor for the final recognition result is determined comprehensively on the basis of the information obtained by each process.

2. A character recognition method in which a plurality of processes, such as a pattern matching process, a character combination selection process, a rule process, and a language process, are performed to identify a character, characterized in that, for each process, the certainty factor of each candidate is updated on the basis of the information obtained by that process, and the final certainty factor for the final recognition result is determined by repeating this update.

3. The character recognition method according to claim 1, wherein information on the determined certainty factor is added to the recognition result and displayed or printed.

4. The character recognition method according to claim 3, wherein visual conditions such as color, brightness, decoration, and typeface of the recognition result displayed or printed are changed according to the certainty factor.

5. The character recognition method according to claim 3, wherein a character or symbol indicating the certainty factor is displayed or printed in association with the recognition result.

6. The character recognition method according to claim 1 or 2, wherein the number of recognition result characters having a certainty factor equal to or lower than a certain level is counted, and when the count exceeds a predetermined threshold value, a warning is issued or the process is interrupted.

7. The character recognition method according to claim 2, wherein a candidate whose certainty factor is lower than a certain level in a certain process is excluded from candidates for a subsequent process.

8. The character recognition method according to claim 1 or 2, wherein information on the certainty factor for each process is stored separately from the final certainty factor.

9. The character recognition method according to claim 8, wherein a message is output based on the per-process certainty factor information stored for a recognition result character.

10. The character recognition method according to claim 1, wherein the certainty factor obtained up to a certain process is reflected in the process preceding it, the process following it, or both.

11. The character recognition method according to claim 1, wherein the pattern or feature amount of a recognition target character is stored only when the certainty factor determined for that character is at a predetermined level.

12. The character recognition method according to claim 11, wherein, when a recognition result is corrected, the dictionary for the pattern matching process is trained using the stored pattern or feature amount corresponding to the corrected result, only when the certainty factor for the corrected recognition result is at a predetermined level.
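The claims leave open how the information from each process is turned into a certainty factor. As a minimal sketch only, the Python below assumes that each stage (for example, pattern matching or language processing) returns a score in [0, 1] for each candidate character, that scores are combined by multiplication and then renormalized, and that the thresholds `prune_below`, `low`, and `max_low` are arbitrary illustrative values. It illustrates the stage-by-stage update of claim 2, the candidate exclusion of claim 7, the separately stored per-stage certainty of claim 8, and the low-certainty count and warning of claim 6. All names (`Candidate`, `run_stages`, `should_warn`) and the sample scores are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage interface: maps a candidate character to a score in [0, 1].
Stage = Callable[[str], float]


@dataclass
class Candidate:
    char: str
    certainty: float = 1.0
    # Certainty after each stage, kept apart from the final value (cf. claim 8).
    history: Dict[str, float] = field(default_factory=dict)


def run_stages(chars: List[str], stages: Dict[str, Stage],
               prune_below: float = 0.05) -> List[Candidate]:
    """Update candidate certainty stage by stage (cf. claims 2 and 7).

    Assumption: scores are combined by multiplication and renormalized so
    the candidates entering pruning sum to 1; the patent fixes no formula.
    """
    pool = [Candidate(c) for c in chars]
    for name, score in stages.items():  # dicts keep insertion order
        for cand in pool:
            cand.certainty *= score(cand.char)
        total = sum(cand.certainty for cand in pool)
        if total > 0:
            for cand in pool:
                cand.certainty /= total
        for cand in pool:
            cand.history[name] = cand.certainty
        # Exclude weak candidates from later stages, but always keep the best one.
        pool.sort(key=lambda cand: cand.certainty, reverse=True)
        pool = pool[:1] + [c for c in pool[1:] if c.certainty >= prune_below]
    return pool


def should_warn(results: List[List[Candidate]], low: float = 0.5,
                max_low: int = 3) -> bool:
    """True when more than `max_low` characters have a best certainty <= `low` (cf. claim 6)."""
    low_count = sum(1 for cands in results if cands and cands[0].certainty <= low)
    return low_count > max_low


if __name__ == "__main__":
    # Made-up per-stage scores for one ambiguous character image.
    pattern = {"O": 0.6, "0": 0.3, "Q": 0.1}
    language = {"O": 0.9, "0": 0.1, "Q": 0.5}
    result = run_stages(
        ["O", "0", "Q"],
        {"pattern_matching": lambda ch: pattern.get(ch, 0.0),
         "language": lambda ch: language.get(ch, 0.0)},
    )
    print([c.char for c in result])          # ['O', 'Q']
    print(round(result[0].certainty, 3))     # 0.871
    print(should_warn([result], max_low=0))  # False
```

Multiplying and renormalizing is only one plausible choice for this sketch; a weighted sum or a rule table would serve the same illustrative purpose, since the claims as worded require only that each process's information update the candidate's certainty.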