JP3226355B2 - Recognition result evaluation method

Info

Publication number: JP3226355B2
Application number: JP33937492A
Authority: JP
Inventors: 隆邦嶺脇
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1992-12-21
Filing date: 1992-12-21
Publication date: 2001-11-05
Anticipated expiration: 2016-11-05
Also published as: JPH06187487A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to character recognition devices, and in particular to a method for evaluating the certainty of the recognition results of a character recognition device that recognizes characters in Japanese documents that also contain alphabetic characters, symbols, and the like.

[0002]

[Prior Art] In a character recognition device, achieving a 100 percent recognition rate on Japanese documents is practically impossible, so a person must check the recognition results for errors and correct them. To make this correction work more efficient, a known approach is to evaluate the certainty of each recognition result and show that information on the display together with the recognized characters (for example, by switching the display color of a recognized character according to its certainty).

[0003] Such output control presupposes that the certainty of each recognition result can be evaluated. Known methods include evaluating the certainty from the similarity or dissimilarity (distance) of the first candidate character obtained by dictionary matching during character recognition, and evaluating the certainty using the dissimilarities of the first and second candidate characters (Japanese Patent Laid-Open No. 62-280983).

[0004]

[Problem to Be Solved by the Invention] However, the similarity or dissimilarity obtained by dictionary matching varies widely even for the same character, depending on the character recognition method (algorithm), the quality of the document image, the font, and so on. The conventional certainty evaluation methods therefore lack generality and stability. As a result, the thresholds used to evaluate the similarity or dissimilarity have had to be adjusted to suit the recognition method of the character recognition device, the target documents, and so on, which is burdensome.

[0005] An object of the present invention is to provide a new recognition result evaluation method that eliminates these problems.

[0006]

[Means for Solving the Problem] The gist of the recognition result evaluation method according to the present invention is that, in a character recognition device, when a plurality of candidate characters are obtained as a recognition result, the certainty of the recognition result is evaluated using the stroke counts or the constituent line element counts of those candidate characters. Specifically, the difference between the maximum and minimum stroke counts of the candidate characters is obtained and used to determine the certainty of the recognition result. Alternatively, the difference between the maximum and minimum constituent line element counts of the candidate characters is obtained and used to determine the certainty of the recognition result.

[0007]

[Operation] Here, the constituent line element count is the total number of pseudo-straight lines obtained when a character is represented as a set of pseudo-straight lines. It is generally larger than the stroke count (the kanji "口" has a stroke count of 3 but a constituent line element count of 4) and better represents the structural features of a character.
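The relationship between the two counts can be illustrated with a minimal Python sketch. It assumes each stroke is available as a polyline of vertices and treats every straight segment of a polyline as one pseudo-straight line; the coordinates and the names `KUCHI_STROKES`, `stroke_count`, and `line_element_count` are hypothetical and do not come from the patent.

```python
# Hypothetical stroke data for the kanji 口. Each stroke is a polyline given
# as a list of (x, y) vertices; the coordinates are illustrative only.
KUCHI_STROKES = [
    [(0, 0), (0, 10)],            # stroke 1: left vertical
    [(0, 0), (10, 0), (10, 10)],  # stroke 2: top horizontal, then right vertical
    [(0, 10), (10, 10)],          # stroke 3: bottom horizontal
]


def stroke_count(strokes):
    """Number of pen strokes in the character."""
    return len(strokes)


def line_element_count(strokes):
    """Number of straight segments when each stroke is treated as a polyline."""
    return sum(len(points) - 1 for points in strokes)


print(stroke_count(KUCHI_STROKES), line_element_count(KUCHI_STROKES))
```

Under these assumptions the bent second stroke contributes two segments, which is why the line element count (4) exceeds the stroke count (3).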

[0008] The stroke count and the constituent line element count are values uniquely determined for each character and are unaffected by differences in character recognition method, document image quality, font, and so on. The method of the present invention therefore enables stable, highly accurate evaluation of the certainty of recognition results without any special adjustment, even when the character recognition method or the like differs.

[0009]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0010] Embodiment 1

FIG. 1 is a block diagram of the character recognition device according to this embodiment. In FIG. 1, an image input unit 100 reads an image of a document with an optical scanner or the like and stores the image data in an image memory 102. A line/character segmentation unit 104 extracts the text lines and character images to be recognized from the image memory 102 and stores the extracted character image data in a character image memory 106. A character recognition unit 108 reads character image data from the character image memory 106, performs preprocessing such as noise removal and normalization, and then extracts features. By comparing these features with the features stored in a character dictionary memory 110, it determines a plurality of candidate characters with high similarity (low dissimilarity), truncates the candidate list if necessary, and stores the candidate characters in a recognition result memory 112. The truncation may use any method suited to the recognition algorithm, such as cutting at a threshold on the absolute value of the similarity, cutting at a threshold on the distance difference between candidates of adjacent rank, or outputting all possible candidates.
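The first two truncation strategies mentioned above can be sketched as follows. This is only an illustration: the function names, the `(character, score)` tuple layout, and the threshold values are assumptions, and the candidate lists are assumed to be sorted best-first.

```python
def truncate_by_similarity(candidates, min_similarity):
    """Keep candidates whose similarity is at least min_similarity.

    candidates: list of (character, similarity), sorted by descending similarity.
    """
    return [(ch, s) for ch, s in candidates if s >= min_similarity]


def truncate_by_gap(candidates, max_gap):
    """Keep candidates until two adjacent ranks differ in distance by more than max_gap.

    candidates: list of (character, distance), sorted by ascending distance.
    """
    kept = candidates[:1]
    for prev, cur in zip(candidates, candidates[1:]):
        if cur[1] - prev[1] > max_gap:
            break  # a large jump in distance ends the candidate list
        kept.append(cur)
    return kept


# Hypothetical scores, for illustration only.
print(truncate_by_similarity([("A", 0.9), ("B", 0.6), ("C", 0.3)], 0.5))
print(truncate_by_gap([("A", 10), ("B", 12), ("C", 30), ("D", 31)], 5))
```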

[0011] In this embodiment, because the stroke counts of the candidate characters are used to evaluate the certainty of a recognition result, stroke count information for the characters within the recognition target range is stored in a stroke count information memory 114. The stroke count information may instead be stored in the character dictionary memory 110. A certainty determination unit 116 looks up the stroke count information memory 114 using the character codes of the candidates stored in the recognition result memory 112, obtains their stroke counts, determines the certainty of each recognition result from them, and stores the result in the recognition result memory 112. The certainty determination processing is described later.

[0012] A result output unit 118 outputs the contents of the recognition result memory 112 (candidate character codes, certainty, and so on) to an output device such as a display or printer. For example, the first candidate character of a recognition result is output with an added symbol, with decoration such as shading, or with a changed display color or brightness, according to its certainty. This output control lets the user easily find the characters that need correction.
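One of the output styles described above, adding a symbol to low-certainty characters, might look like the sketch below. The threshold of 60, the `?` marker, and the function name `mark_uncertain` are assumptions chosen for illustration; the text does not specify them.

```python
def mark_uncertain(results, threshold=60, marker="?"):
    """Append a marker to each first-candidate character whose certainty is below threshold.

    results: list of (first_candidate_character, certainty) in reading order.
    """
    return "".join(ch + marker if certainty < threshold else ch
                   for ch, certainty in results)


# Hypothetical recognition results, for illustration only.
print(mark_uncertain([("数", 95), ("量", 50), ("の", 100)]))
```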

[0013] Next, the processing performed by the certainty determination unit 116 is described. FIG. 2 is a flowchart of that processing.

[0014] First, for one recognition result of interest in the recognition result memory 112, it is checked whether the number of candidates is two or more (step 200). If there is only one candidate, a predetermined fixed value is output as the certainty of the recognition result (step 215).

[0015] If there are two or more candidates, the first candidate character is read (step 210), and its stroke count is obtained by looking up the stroke count information memory 114 with its character code (step 220). If candidates remain (step 225), the next-ranked candidate character is read (step 230) and its stroke count is obtained from the stroke count information memory 114 (step 220). Once stroke counts have been obtained through the last candidate, the maximum and minimum of the obtained stroke counts are found and their difference is computed (step 235). Although all candidates are considered here, the number of candidates considered may be limited, for example to the top three.

[0016] Next, the certainty is calculated from the stroke count difference by the following equation (step 240):

certainty = 100 - (stroke count difference × fixed value) .... (1)
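Steps 200 to 240 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `STROKES` dictionary stands in for the stroke count information memory 114, the candidate list is hypothetical (it is not the content of Table 1), and the value returned for a single candidate is an assumption, since the text only calls it a predetermined fixed value.

```python
# Hypothetical stand-in for the stroke count information memory 114.
STROKES = {"量": 12, "重": 9, "二": 2}

# Assumed value; the text only says "a predetermined fixed value" (step 215).
SINGLE_CANDIDATE_CERTAINTY = 100


def certainty_from_strokes(candidates, fixed_value=5, top_n=None):
    """Evaluate equation (1) for a ranked list of candidate characters."""
    if len(candidates) < 2:                      # step 200
        return SINGLE_CANDIDATE_CERTAINTY        # step 215
    if top_n is not None:                        # optional limit, e.g. top three
        candidates = candidates[:top_n]
    counts = [STROKES[ch] for ch in candidates]  # steps 210 to 230
    diff = max(counts) - min(counts)             # step 235
    return 100 - diff * fixed_value              # step 240, equation (1)


print(certainty_from_strokes(["量", "重", "二"]))  # maximum 12, minimum 2
```

With a maximum of 12 strokes, a minimum of 2, and a fixed value of 5, the sketch evaluates 100 - (10 × 5) = 50.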

[0017] That is, when the candidates of a recognition result include characters whose stroke counts differ greatly, the certainty of the recognition result is rated "low"; conversely, when the stroke counts of the candidate characters differ little, the certainty is rated "high". Suppose that the character "量" is recognized using features such as the contour direction of the character image and that a recognition result such as that in Table 1 is obtained. Recognition results of this kind tend to occur when the print quality of "量" is poor and the character tends to be blotted. The character recognition method may be any method that can yield multiple candidates for an input image, such as a structural analysis method or a neural network.

[0018]

[Table 1]

[0019] For each candidate character of this recognition result, the stroke counts shown in Table 2 are obtained.

[0020]

[Table 2]

[0021] Considering candidates up to the third rank, the maximum stroke count is 12 and the minimum stroke count is 2, so their difference is 10. With the "fixed value" in equation (1) set to 5, the certainty is 50 (a larger value means higher certainty).

[0022] The equation for calculating the certainty is not limited to equation (1). Also, although the certainty is calculated here from the difference between the maximum and minimum stroke counts, the method is not limited to this. For example, the certainty may be calculated from the difference between the maximum and minimum stroke counts together with the average value, from that difference together with the maximum stroke count, or from that difference together with the minimum stroke count.

[0023] Embodiment 2

FIG. 3 is a block diagram of the character recognition device according to this embodiment; reference numerals identical to those in FIG. 1 denote identical parts.

[0024] This embodiment replaces the certainty calculation of Embodiment 1 with a table lookup. For that purpose, a certainty table memory 120 is added, which stores the certainty values (values calculated by equation (1)) corresponding to each difference between the maximum and minimum stroke counts. Table 3 shows an example of the contents of the certainty table memory 120.

[0025]

[Table 3]

[0026] FIG. 4 is a flowchart of the processing of the certainty determination unit 116A. It differs from the certainty determination processing of Embodiment 1 (FIG. 2) in that, instead of calculating the certainty from the difference between the maximum and minimum stroke counts obtained in step 235, step 240A obtains the certainty value by looking up the certainty table memory 120 with that difference. Otherwise the processing is the same as in Embodiment 1.

[0027] For example, for a recognition result such as that in Table 1, the difference between the maximum and minimum stroke counts is 10 according to Table 2, so the certainty value is 50 according to Table 3.
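The table lookup of step 240A can be sketched as below. Because the contents of Table 3 are not reproduced here, the sketch fills the table from equation (1) with a fixed value of 5, as the description of the certainty table memory 120 suggests. The table size `MAX_DIFF` and the clamping of larger differences to the last entry are assumptions.

```python
FIXED_VALUE = 5
MAX_DIFF = 20  # assumed table size; the text does not state one

# Stand-in for the certainty table memory 120, precomputed from equation (1).
CERTAINTY_TABLE = [100 - d * FIXED_VALUE for d in range(MAX_DIFF + 1)]


def lookup_certainty(diff):
    """Step 240A: read the certainty for a stroke count difference from the table."""
    return CERTAINTY_TABLE[min(diff, MAX_DIFF)]  # clamp out-of-range differences


print(lookup_certainty(10))
```

A lookup avoids the multiplication at run time and lets the mapping from difference to certainty be tuned entry by entry without changing the code.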

[0028] As noted in Embodiment 1, the certainty determination unit 116A may instead obtain the difference between the maximum and minimum stroke counts together with the average value, that difference together with the maximum stroke count, or that difference together with the minimum stroke count, and retrieve the corresponding certainty value from the certainty table memory 120.

[0029] Embodiment 3

FIG. 5 is a block diagram of the character recognition device according to this embodiment; reference numerals identical to those in FIG. 1 denote identical parts. In this embodiment, the certainty determination unit 116B uses the "constituent line element count" instead of the "stroke count" to calculate the certainty, so constituent line element count information for the characters within the recognition target range is stored in a constituent line element count information memory 122. The constituent line element count information may instead be stored in the character dictionary memory 110.

[0030] Character recognition and recognition result output are the same as in the preceding embodiments, so their description is omitted, and only the certainty determination processing is described. FIG. 6 is a flowchart of that processing.

[0031] First, for one recognition result of interest in the recognition result memory 112, it is checked whether the number of candidates is two or more (step 300). If there is only one candidate, a predetermined fixed value is output as the certainty of the recognition result (step 315).

[0032] If there are two or more candidates, the first candidate character is read (step 310), and its constituent line element count is obtained by looking up the constituent line element count information memory 122 with its character code (step 320). If candidates remain (step 325), the next-ranked candidate character is read (step 330) and its constituent line element count is obtained from the constituent line element count information memory 122 (step 320). Once constituent line element counts have been obtained through the last candidate, the maximum and minimum of the obtained counts are found and their difference is computed (step 335). Although all candidates are considered here, the number of candidates considered may be limited, for example to the top three.

[0033] Next, the certainty is calculated from the constituent line element count difference by the following equation (step 340):

certainty = 100 - (constituent line element count difference × fixed value) .... (2)

[0034] That is, when the candidates of a recognition result include characters whose constituent line element counts differ greatly, the certainty of the recognition result is rated "low"; conversely, when the constituent line element counts of the candidate characters differ little, the certainty is rated "high".

[0035] Suppose that the constituent line element counts shown in Table 4 are obtained for the candidates of the recognition result shown in Table 1.

[0036]

[Table 4]

[0037] Considering candidates up to the third rank, the maximum constituent line element count is 14 and the minimum constituent line element count is 2, so with the "fixed value" in equation (2) set to 5, the certainty is 60 (a larger value means higher certainty).
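As a check on the arithmetic, substituting the stated counts (maximum 14, minimum 2) and a fixed value of 5 into equation (2) exactly as written gives:

```latex
\[
\text{certainty} = 100 - (14 - 2) \times 5 = 100 - 60 = 40
\]
```

Evaluated literally, the formula therefore yields 40. The figure of 60 quoted in the text equals the subtracted term 12 × 5, and the source does not explain the difference, so this worked line should be read only as a direct evaluation of equation (2) and not as a correction of the patent.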

[0038] The equation for calculating the certainty is not limited to equation (2). Also, although the certainty is calculated here from the difference between the maximum and minimum constituent line element counts, the method is not limited to this. For example, the certainty may be calculated from the difference between the maximum and minimum constituent line element counts together with the average value, from that difference together with the maximum constituent line element count, or from that difference together with the minimum constituent line element count.

[0039] Embodiment 4

FIG. 7 is a block diagram of the character recognition device according to this embodiment; reference numerals identical to those in FIG. 5 denote identical parts.

[0040] This embodiment replaces the certainty calculation of Embodiment 3 with a table lookup. For that purpose, a certainty table memory 124 is added, which stores the certainty values (values calculated by equation (2)) corresponding to each difference between the maximum and minimum constituent line element counts. Table 5 shows an example of the contents of the certainty table memory 124.

[0041]

[Table 5]

[0042] FIG. 8 is a flowchart of the processing of the certainty determination unit 116C. It differs from the certainty determination processing of Embodiment 3 (FIG. 6) in that, instead of calculating the certainty from the difference between the maximum and minimum constituent line element counts obtained in step 335, step 340A obtains the certainty value by looking up the certainty table memory 124 with that difference. Otherwise the processing is the same as in Embodiment 3.

[0043] For example, for a recognition result such as that in Table 1, the difference between the maximum and minimum constituent line element counts is 12 according to Table 4, so the certainty value is 54 according to Table 5.

[0044] As noted in Embodiment 3, the certainty determination unit 116C may instead obtain the difference between the maximum and minimum constituent line element counts together with the average value, that difference together with the maximum constituent line element count, or that difference together with the minimum constituent line element count, and retrieve the corresponding certainty value from the certainty table memory 124.

[0045] As described above, the main feature of the present invention is the use of the stroke count or the constituent line element count to evaluate the certainty of recognition results. This evaluation value may also be combined with evaluation values obtained by other methods to determine the final certainty.
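The text does not say how the evaluation values would be combined. One simple possibility, shown purely as an assumption, is a weighted average of the stroke-based certainty and a second score (for example one derived from dictionary-matching similarity) that has been scaled to the same 0 to 100 range. The function name `combined_certainty` and the default weight are hypothetical.

```python
def combined_certainty(stroke_based, other, weight=0.5):
    """Weighted average of two certainty scores on the same 0-100 scale.

    weight is the share given to the stroke-based score (assumed, not from the text).
    """
    return weight * stroke_based + (1 - weight) * other


print(combined_certainty(50, 90))
```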

[0046]

[Effects of the Invention] As described in detail above, the method of the present invention determines the certainty of a recognition result using the stroke count or the constituent line element count, which are values uniquely determined for each character. Therefore, even when the character recognition method, document image quality, font, and so on differ, the certainty of recognition results can be evaluated stably and with high accuracy without any special adjustment.

[Brief description of the drawings]

[FIG. 1] Block diagram of the character recognition device according to Embodiment 1.

[FIG. 2] Flowchart of the recognition result certainty determination processing of Embodiment 1.

[FIG. 3] Block diagram of the character recognition device according to Embodiment 2.

[FIG. 4] Flowchart of the recognition result certainty determination processing of Embodiment 2.

[FIG. 5] Block diagram of the character recognition device according to Embodiment 3.

[FIG. 6] Flowchart of the recognition result certainty determination processing of Embodiment 3.

[FIG. 7] Block diagram of the character recognition device according to Embodiment 4.

[FIG. 8] Flowchart of the recognition result certainty determination processing of Embodiment 4.

[Explanation of symbols]

100 Image input unit
102 Image memory
104 Line/character segmentation unit
106 Character image memory
108 Character recognition unit
110 Character dictionary
112 Recognition result memory
114 Stroke count information memory
116, 116A, 116B, 116C Certainty determination unit
118 Result output unit
120, 124 Certainty table memory
122 Constituent line element count information memory

Claims

(57) [Claims]

1. A recognition result evaluation method characterized in that, in a character recognition device, when a plurality of candidate characters are obtained as a recognition result, the certainty of the recognition result is determined based on the maximum stroke count and the minimum stroke count of the plurality of candidate characters.

2. The recognition result evaluation method according to claim 1, wherein the difference between the maximum stroke count and the minimum stroke count of the plurality of candidate characters of the recognition result is obtained, and the difference is used to determine the certainty of the recognition result.

3. A recognition result evaluation method characterized in that, in a character recognition device, when a plurality of candidate characters are obtained as a recognition result, the certainty of the recognition result is determined based on the maximum constituent line element count and the minimum constituent line element count of the plurality of candidate characters.

4. The recognition result evaluation method according to claim 3, wherein the difference between the maximum constituent line element count and the minimum constituent line element count of the plurality of candidate characters of the recognition result is obtained, and the difference is used to determine the certainty of the recognition result.