JPH0540853A

JPH0540853A - Post-processing system for character recognizing result

Info

Publication number: JPH0540853A
Application number: JP3196508A
Authority: JP
Inventors: Akitoshi Tsukamoto; 明利塚本; Sadamasa Hirogaki; 節正広垣; Naohiro Amamoto; 直弘天本
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1991-08-06
Filing date: 1991-08-06
Publication date: 1993-02-19

Abstract

PURPOSE: To correct an erroneous character recognition result into grammatically correct words. CONSTITUTION: In step 1, characters are recognized and the distance (degree of dissimilarity) between each candidate character and the original character pattern is calculated. In step 2, the candidate characters are combined into candidate words, and the certainty factor of each candidate word is calculated from the distances. In step 3, the grammar dictionary 4 and the part-of-speech dictionary 5 are consulted for words whose maximum candidate certainty factor is below a prescribed threshold, and the grammatical relations between those words and the words whose maximum candidate certainty factor is above the threshold are used. Erroneous words are thereby corrected into grammatically correct words.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a device that recognizes and outputs optically read characters, and more particularly to a post-processing method for character recognition results that automatically corrects errors in the recognition result before output.

[0002]

[Prior Art] A conventional technique in this field is disclosed, for example, in Japanese Patent Laid-Open No. 2-267670. When a word contains an unrecognizable character (reject character), the technique disclosed in that document automatically retrieves candidate characters from a character table based on the characters before and after the reject character, the surrounding character sequence, and the position. It then corrects the recognition result by searching for the words obtained when the reject character is replaced with each candidate.

[0003]

[Problem to Be Solved by the Invention] However, the conventional technique above makes corrections character by character and does not use a grammar that defines the relationships between words. It may therefore correct a word into one that makes no sense in context and is grammatically wrong. An object of the present invention is to solve this problem and provide a post-processing method for character recognition results that can correct a word to a grammatically correct one even when the recognition result is wrong.

[0004]

[Means for Solving the Problem] To solve the above problem, the present invention creates candidate words and calculates a certainty factor for each word of a character recognition result. For a word whose maximum certainty factor is smaller than a predetermined threshold, it uses the grammatical relationship with a nearby word whose maximum candidate certainty factor is larger than the threshold to correct the word to a grammatically correct one. In the present invention, the certainty factor is the degree to which a candidate word is believed to be correct.

[0005]

[Operation] With the post-processing method configured as described above, candidate words are created and a certainty factor is calculated for each word of the character recognition result. A word whose maximum certainty factor is smaller than the predetermined threshold is corrected to a grammatically correct word by using its grammatical relationship with a nearby word whose maximum candidate certainty factor is larger than the threshold. Therefore, even when the result of character recognition is wrong, it can be corrected to a grammatically correct word.

[0006]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a flow chart showing a post-processing method for character recognition results according to the embodiment. FIG. 2 is a block diagram showing a character recognition device that implements the post-processing method. FIG. 3 is a diagram showing an example of candidate words and certainty factors in the embodiment. FIG. 4 is an explanatory diagram of the candidate word selection process in the embodiment.

[0007] In FIG. 2, 11 is a CPU that controls the entire device; 12 is a part-of-speech dictionary holding the information that assigns a part of speech to each word; 13 is a grammar dictionary holding the information that defines the relationships between parts of speech; 14 is document reading means that reads the words on a document; 15 is character recognition means that recognizes the read characters and outputs candidate characters and their distances; 16 is character recognition result storage means that stores the candidate characters and their distances; 17 is candidate word creation means that combines candidate characters into candidate words; 18 is certainty factor calculation means that calculates the certainty factor from the distances of the candidate characters; 19 is dictionary search means that searches the part-of-speech dictionary 12 and the grammar dictionary 13 when candidate words are selected; 20 is output word determining means that selects candidate words based on the dictionary contents; and 21 is result display/output means that displays and outputs the output words.

[0008] The processing operation of the post-processing method according to the embodiment is described below with reference to FIGS. 1 to 4.

(1) Character recognition (step 1): The document reading means 14 reads the words on a document. The character recognition means 15 recognizes each character of the words read and calculates the distance of each candidate character, and the results are stored in the recognition result storage means 16. Here, the distance expresses how similar a candidate character is to the original character pattern: the smaller the value, the more closely the candidate character resembles the original pattern.

[0009] (2) Candidate word creation / certainty factor calculation (step 2): The candidate word creation means 17 combines the candidate characters stored in the recognition result storage means 16 to create candidate words. The certainty factor calculation means 18 then calculates the certainty factor from the distances of the candidate characters. Here, the certainty factor is the degree to which a candidate word is believed to be correct. In this embodiment it is calculated as "certainty factor of a candidate word" = "reciprocal of the distance of the candidate word" / "sum of the reciprocals of the distances of all candidate words for one input word".
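The calculation in step 2 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the word distance is assumed to be the sum of the character distances (the text does not say how character distances are combined), and the distance values are invented so that the result is a 60% / 40% split.

```python
from itertools import product


def candidate_words(char_candidates):
    """Combine per-character candidates into candidate words.

    char_candidates holds one list per character position, each made of
    (candidate_char, distance) pairs. The word distance is ASSUMED to be
    the sum of the character distances; the source does not specify this.
    """
    words = {}
    for combo in product(*char_candidates):
        word = "".join(ch for ch, _ in combo)
        words[word] = sum(d for _, d in combo)
    return words


def certainty(word_distances):
    """certainty(w) = (1 / d_w) / sum of (1 / d_v) over all candidates v.

    Distances must be strictly positive, otherwise the reciprocal is undefined.
    """
    inverse = {w: 1.0 / d for w, d in word_distances.items()}
    total = sum(inverse.values())
    return {w: inv / total for w, inv in inverse.items()}


# Hypothetical character distances for a two-character input word.
chars = [[("a", 1.0)], [("n", 1.0), ("m", 2.0)]]
words = candidate_words(chars)   # word distances: "an" -> 2.0, "am" -> 3.0
scores = certainty(words)        # "an" gets the larger share
```

Because the certainty factors of one input word are normalized by the sum of reciprocals, they always add up to 1, so a single unambiguous candidate receives 100%.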

[0010] FIG. 3 shows an example of the calculation result for the English sentence "I am a boy.", giving the candidate words and certainty factors for each character pattern. In the figure, the certainty factor of the candidate word "I" for the word "I" is 100%. For the word "am", the certainty factor of the candidate word "an" is 60% and that of the candidate word "am" is 40%. Candidate words and certainty factors are given in the same way for the words "a", "boy", and ".".

[0011] (3) Post-processing correction (step 3): When the certainty factor of a candidate word is 100%, the output word determining means 20 selects that candidate word as is. In FIG. 3, "I" and "boy" fall into this category. Next, for each word whose maximum certainty factor is lower than the threshold, a candidate word is selected using the grammatical features of nearby words whose maximum certainty factor is higher than the threshold. At this point, the dictionary search means 19 searches the part-of-speech dictionary 12 and the grammar dictionary 13, and their contents are consulted.
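The first part of step 3, separating the words that can be accepted directly from those that need grammatical correction, might look like the sketch below. The function name and data layout are hypothetical. Treating every word whose best candidate reaches the threshold as accepted is an assumption, since the text only spells out the 100% case and the below-threshold case. The values for "I", "am", and "boy" follow the description of FIG. 3; the candidates and values for "a" and "." are invented for illustration.

```python
def split_by_certainty(sentence, threshold):
    """Partition word positions by their best certainty factor.

    sentence is a list with one dict per word position, mapping each
    candidate word to its certainty factor (0.0 to 1.0).
    Returns (fixed, pending): fixed maps a position to the accepted word,
    pending lists the positions left for grammatical correction.
    """
    fixed, pending = {}, []
    for pos, candidates in enumerate(sentence):
        best = max(candidates, key=candidates.get)
        if candidates[best] >= threshold:   # assumption: ">=" rather than ">"
            fixed[pos] = best
        else:
            pending.append(pos)
    return fixed, pending


sentence = [
    {"I": 1.0},
    {"an": 0.6, "am": 0.4},
    {"a": 0.5, "o": 0.5},   # hypothetical candidates and values
    {"boy": 1.0},
    {".": 0.5, ",": 0.5},   # equal certainty; 0.5 each is an assumption
]
fixed, pending = split_by_certainty(sentence, threshold=0.9)
```

The accepted words then serve as grammatical anchors for the pending positions next to them.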

[0012] FIG. 4 shows the process applied to the calculation result of FIG. 3. With the certainty factor threshold set to 90%, the maximum certainty factors of the candidate words for the words "am", "a", and "." are below the threshold. For the "." at the end of the sentence, however, a period is grammatically appropriate, so of the candidate words "." and ",", which have equal certainty factors, "." is selected and output as the correction result. For the word "a", the next word is "boy" with a certainty factor of 100%, and its part of speech is noun. The candidate word "a" for the word "a" is an article, and no part of speech is assigned to the other candidate words. Since an article appropriately precedes a noun, the candidate word "a" is selected. For the preceding word "am", two consecutive articles are not grammatically allowed, so the verb "am" is selected. In this way, by using the grammatical relationships with the preceding and following words, a word can be corrected to one that is not grammatically wrong even when its certainty factor is low.
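The selection described for FIG. 4 can be approximated with a small part-of-speech table and a set of permitted part-of-speech pairs. Everything in this sketch is an assumption made for illustration: the `POS` and `ALLOWED` tables stand in for the part-of-speech dictionary 12 and the grammar dictionary 13, whose actual contents and format are not given; the right-to-left processing order merely mirrors the order of the explanation; and the candidate "o" and the 0.5 values are invented.

```python
# Stand-in for the part-of-speech dictionary (contents are hypothetical).
POS = {"I": "pronoun", "am": "verb", "an": "article", "a": "article",
       "boy": "noun", ".": "period", ",": "comma"}

# Stand-in for the grammar dictionary: permitted (left, right) pairs.
ALLOWED = {("START", "pronoun"), ("pronoun", "verb"), ("verb", "article"),
           ("article", "noun"), ("noun", "period"), ("noun", "comma"),
           ("period", "END")}


def fits(left, pos, right):
    """True if pos agrees with the decided neighbours (None = undecided)."""
    return ((left is None or (left, pos) in ALLOWED) and
            (right is None or (pos, right) in ALLOWED))


def correct(sentence, threshold=0.9):
    """Accept confident words, then resolve the rest using their neighbours."""
    chosen = [max(c, key=c.get) if max(c.values()) >= threshold else None
              for c in sentence]
    for i in reversed(range(len(sentence))):      # right to left (assumption)
        if chosen[i] is not None:
            continue
        left = "START" if i == 0 else POS.get(chosen[i - 1])
        right = "END" if i == len(sentence) - 1 else POS.get(chosen[i + 1])
        viable = [w for w in sentence[i]
                  if w in POS and fits(left, POS[w], right)]
        pool = viable or list(sentence[i])        # fall back to raw certainty
        chosen[i] = max(pool, key=sentence[i].get)
    return chosen


sentence = [
    {"I": 1.0},
    {"an": 0.6, "am": 0.4},
    {"a": 0.5, "o": 0.5},   # "o" is a hypothetical candidate with no POS
    {"boy": 1.0},
    {".": 0.5, ",": 0.5},
]
result = correct(sentence)
```

With these tables, "an" is rejected even though its certainty factor is higher than that of "am", because an article followed by the article "a" is not a permitted pair.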

[0013] The embodiment has been described above for English words, but the method can also be applied to other languages by replacing the grammatical knowledge with the grammar of another natural language or of a programming language. The present invention is not limited to the above embodiment; various modifications are possible based on the gist of the invention, and they are not excluded from its scope.

[0014]

[Effect of the Invention] As described in detail above, the present invention uses the grammatical features of words, so a word can be corrected to a grammatically correct one even when the result of character recognition is wrong.

[Brief description of drawings]

[FIG. 1] A flow chart showing a post-processing method for character recognition results according to an embodiment of the present invention.

[FIG. 2] A block diagram showing a character recognition device that implements the post-processing method for character recognition results according to the embodiment of the present invention.

[FIG. 3] A diagram showing an example of candidate words and certainty factors in the embodiment of the present invention.

[FIG. 4] An explanatory diagram of the candidate word selection process in the embodiment of the present invention.

[Explanation of symbols]

1: Character recognition; 2: Candidate word creation / certainty factor calculation; 3: Post-processing correction; 4: Grammar dictionary; 5: Part-of-speech dictionary

Claims

[Claims]

1. A post-processing method for character recognition results, characterized by: (a) creating candidate words and calculating a certainty factor for each word of a character recognition result; and (b) for a word whose maximum certainty factor is smaller than a predetermined threshold, correcting the word to a grammatically correct word by using the grammatical relationship with a nearby word whose maximum candidate certainty factor is larger than the predetermined threshold.