JPH0540853A - Post-processing system for character recognizing result - Google Patents

Post-processing system for character recognizing result

Info

Publication number
JPH0540853A
Authority
JP
Japan
Prior art keywords
word
candidate
words
character
certainty factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
JP3196508A
Other languages
Japanese (ja)
Inventor
Akitoshi Tsukamoto
明利 塚本
Sadamasa Hirogaki
節正 広垣
Naohiro Amamoto
直弘 天本
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oki Electric Industry Co Ltd
Original Assignee
Oki Electric Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co Ltd
Priority to JP3196508A
Publication of JPH0540853A
Legal status: Withdrawn

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To correct an erroneous character recognition result into grammatically correct words. CONSTITUTION: In step 1, characters are recognized and the dissimilarity (distance) between each candidate character and the original character pattern is calculated. In step 2, the candidate characters are combined into candidate words, and the certainty factor of each candidate word is calculated from those distances. In step 3, a grammar dictionary 4 and a part-of-speech dictionary 5 are consulted for each word whose maximum candidate certainty factor is below a prescribed threshold, and the grammatical relationship between that word and the words whose maximum candidate certainty factor is above the threshold is used. Erroneous words can thus be corrected into grammatically correct words.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an apparatus that recognizes and outputs optically read characters, and more particularly to a post-processing method for character recognition results that automatically corrects errors in the recognition result before output.

[0002]

[Prior Art] A conventional technique in this field is disclosed, for example, in Japanese Patent Laid-Open No. 2-267670. In that technique, when a word contains an unrecognizable character (a reject character), candidate characters are automatically retrieved from a character table based on the characters before and after the reject character, the surrounding character sequence, and the position. The character recognition result is then corrected by searching for the word obtained when a candidate character is substituted for the reject character.

[0003]

[Problems to Be Solved by the Invention] However, the conventional technique makes corrections character by character and does not use a grammar that defines the relationships between words. It may therefore correct a word into one that makes no sense semantically or is grammatically incorrect. An object of the present invention is to solve this problem and to provide a post-processing method for character recognition results that can correct a word into a grammatically correct one even when the character recognition result is wrong.

[0004]

[Means for Solving the Problems] To solve the above problem, the present invention creates candidate words for each word of the character recognition result and calculates their certainty factors. For a word whose maximum certainty factor is below a predetermined threshold, it uses the grammatical relationship with a nearby word whose candidate words have a maximum certainty factor above the threshold to correct the word into a grammatically correct one. In the present invention, the certainty factor is the degree to which a candidate word is believed to be correct.

[0005]

[Operation] With the post-processing method configured as described above, candidate words are created and certainty factors are calculated for each word of the character recognition result. A word whose maximum certainty factor is below a predetermined threshold is corrected into a grammatically correct word by using its grammatical relationship with a nearby word whose candidate words have a maximum certainty factor above the threshold. Therefore, even if the character recognition result is wrong, it can be corrected into a grammatically correct word.

[0006]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a flowchart showing the post-processing method for character recognition results according to the embodiment. FIG. 2 is a block diagram showing a character recognition apparatus that implements the method. FIG. 3 shows an example of candidate words and certainty factors in the embodiment. FIG. 4 illustrates the candidate word selection process in the embodiment.

[0007] In FIG. 2, 11 is a CPU that controls the entire apparatus; 12 is a part-of-speech dictionary containing part-of-speech information for words; 13 is a grammar dictionary containing information on the relationships between parts of speech; 14 is a document reading means that reads words on a document; 15 is a character recognition means that recognizes the read characters and outputs candidate characters with their distances; 16 is a character recognition result storage means that stores the candidate characters and their distances; 17 is a candidate word creation means that combines candidate characters into candidate words; 18 is a certainty factor calculation means that calculates certainty factors from the distances of the candidate characters; 19 is a dictionary search means that searches the part-of-speech dictionary 12 and the grammar dictionary 13 when candidate words are selected; 20 is an output word determination means that selects among the candidate words based on the dictionary contents; and 21 is a result display/output means that displays and outputs the output words.

[0008] The processing operation of the post-processing method according to the embodiment is described below with reference to FIGS. 1 to 4.

(1) Character recognition (step 1)

The document reading means 14 reads a word on the document. The character recognition means 15 recognizes each character of the read word and calculates its distance, and the results are stored in the recognition result storage means 16. Here, the distance expresses how closely a candidate character resembles the original character pattern: the smaller the value, the more similar the two are.

[0009] (2) Candidate word creation / certainty factor calculation (step 2)

The candidate word creation means 17 combines the candidate characters stored in the recognition result storage means 16 to create candidate words. The certainty factor calculation means 18 then calculates the certainty factor of each candidate word from the distances of the candidate characters. The certainty factor is the degree to which a candidate word is believed to be correct. In this embodiment it is calculated as: certainty factor of a candidate word = (reciprocal of the candidate word's distance) / (sum of the reciprocals of the distances of all candidate words for the same input word).
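The calculation in step 2 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the embodiment. The function names and the sample distances are hypothetical, and the rule that a candidate word's distance is the sum of its character distances is an assumption, since the text does not state how the word distance is derived.

```python
from itertools import product


def candidate_words(char_candidates):
    """Combine per-character candidates into candidate words.

    char_candidates holds one list per character position, each list
    containing (candidate_character, distance) pairs.
    ASSUMPTION: the word distance is the sum of its character distances.
    """
    words = {}
    for combo in product(*char_candidates):
        word = "".join(ch for ch, _ in combo)
        words[word] = sum(dist for _, dist in combo)
    return words


def certainty_factors(word_distances):
    """Normalize reciprocal distances so they sum to 1 per input word.

    Distances must be strictly positive; a zero distance would make the
    reciprocal undefined.
    """
    inverse = {w: 1.0 / d for w, d in word_distances.items()}
    total = sum(inverse.values())
    return {w: inv / total for w, inv in inverse.items()}


# Hypothetical character distances for the second word of "I am a boy."
chars = [[("a", 1.0)], [("n", 1.0), ("m", 2.0)]]
words = candidate_words(chars)      # word distances: an = 2.0, am = 3.0
scores = certainty_factors(words)   # an = 0.6, am = 0.4
```

With these made-up distances, the reciprocals are 1/2 and 1/3, so the certainty factors come out as 60% for "an" and 40% for "am".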

[0010] FIG. 3 shows an example of the calculation result for the English sentence "I am a boy.", giving the candidate words and certainty factors for each character pattern. In the figure, for the word "I" the candidate word "I" has a certainty factor of 100%. For the word "am", the candidate word "an" has a certainty factor of 60% and the candidate word "am" has a certainty factor of 40%. Candidate words and certainty factors are given in the same way for the words "a", "boy", and ".".

[0011] (3) Post-processing correction (step 3)

When a candidate word has a certainty factor of 100%, the output word determination means 20 selects it as is. In FIG. 3 this applies to "I" and "boy". Next, for each word whose maximum certainty factor is below the threshold, a candidate word is selected by using the grammatical features of nearby words whose maximum certainty factor is above the threshold. For this, the dictionary search means 19 searches the part-of-speech dictionary 12 and the grammar dictionary 13, and their contents are consulted.
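The first part of step 3, separating the words that can be output as is from the words that need grammatical correction, might look like the Python sketch below. The function name and the data layout are hypothetical. The certainty values for "a" and the extra candidate "o" are invented for illustration, and the equal split between "." and "," assumes they are the only two candidates. Treating a value equal to the threshold as confident is also an assumption, because the text only speaks of values above and below it.

```python
def split_by_threshold(sentence, threshold):
    """Split word positions by the best certainty factor at each position.

    sentence is a list with one dict per input word, mapping each
    candidate word to its certainty factor (0.0 to 1.0).
    Returns (confident, uncertain) lists of word indices.
    """
    confident, uncertain = [], []
    for index, candidates in enumerate(sentence):
        best = max(candidates.values())
        # ASSUMPTION: a best value equal to the threshold counts as confident.
        if best >= threshold:
            confident.append(index)
        else:
            uncertain.append(index)
    return confident, uncertain


# "I am a boy." with partly hypothetical certainty factors.
sentence = [
    {"I": 1.0},
    {"an": 0.6, "am": 0.4},
    {"a": 0.7, "o": 0.3},     # hypothetical values
    {"boy": 1.0},
    {".": 0.5, ",": 0.5},
]
confident, uncertain = split_by_threshold(sentence, threshold=0.9)
```

Here "I" and "boy" end up in the confident group and serve as grammatical anchors, while "am", "a", and "." are left for the dictionary-based selection.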

[0012] FIG. 4 shows this process applied to the calculation result of FIG. 3. With a certainty factor threshold of 90%, the maximum certainty factors of the candidate words for "am", "a", and "." are below the threshold. For the "." at the end of the sentence, a period is grammatically appropriate, so of the candidate words "." and ",", which have equal certainty factors, "." is selected and output as the correction result. For the word "a", the next word is "boy" with a certainty factor of 100%, and its part of speech is noun. The candidate word "a" is an article, and the other candidate words have no part of speech. Since an article appropriately precedes a noun, the candidate word "a" is selected. For the preceding word "am", two consecutive articles are not grammatically permitted, so the verb "am" is selected over the article "an". In this way, by using the grammatical relationships with the preceding and following words, a word can be corrected into one that is not grammatically wrong, even if its certainty factor is low.
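The selection walked through above can be reproduced with a small Python sketch. It is one possible reading of the example, not the procedure of the embodiment: the dictionary contents, the right-to-left processing order, the fallback when no candidate fits, and the certainty values for "a" and "o" are all assumptions made for illustration. `POS` stands in for the part-of-speech dictionary 12 and `ALLOWED` for the grammar dictionary 13.

```python
# Hypothetical part-of-speech dictionary: word -> part of speech.
POS = {"I": "pronoun", "am": "verb", "an": "article", "a": "article",
       "boy": "noun", ".": "period", ",": "comma"}

# Hypothetical grammar dictionary: permitted pairs of adjacent parts of speech.
ALLOWED = {("pronoun", "verb"), ("verb", "article"),
           ("article", "noun"), ("noun", "period")}


def fits(prev_word, word):
    """True if word may directly follow prev_word under ALLOWED."""
    return (POS.get(prev_word), POS.get(word)) in ALLOWED


def post_process(sentence, threshold=0.9):
    """Pick one output word per position, using grammar for uncertain words."""
    chosen = [max(c, key=c.get) if max(c.values()) >= threshold else None
              for c in sentence]
    last = len(sentence) - 1
    # ASSUMPTION: uncertain words are resolved from the end of the sentence.
    for i in range(last, -1, -1):
        if chosen[i] is not None:
            continue
        valid = []
        for word in sentence[i]:
            if word not in POS:
                continue  # no part of speech available
            if i > 0 and chosen[i - 1] and not fits(chosen[i - 1], word):
                continue
            if i < last and chosen[i + 1] and not fits(word, chosen[i + 1]):
                continue
            if i == last and POS[word] != "period":
                continue  # a sentence is expected to end with a period
            valid.append(word)
        # ASSUMPTION: fall back to all candidates if none is grammatical.
        pool = valid or list(sentence[i])
        chosen[i] = max(pool, key=sentence[i].get)
    return chosen


sentence = [{"I": 1.0}, {"an": 0.6, "am": 0.4}, {"a": 0.7, "o": 0.3},
            {"boy": 1.0}, {".": 0.5, ",": 0.5}]
result = post_process(sentence)
```

With this data the sketch selects "." first, then "a" because an article may precede the noun "boy", and finally "am" even though "an" has the higher certainty factor, matching the outcome described for FIG. 4.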

[0013] The embodiment has been described for English words, but the method can also be applied to other languages by replacing the grammatical knowledge with the grammar of another natural language or of a programming language. The present invention is not limited to the above embodiment; various modifications are possible based on the gist of the invention, and they are not excluded from its scope.

[0014]

[Effects of the Invention] As described in detail above, the present invention uses the grammatical features of words, so a word can be corrected into a grammatically correct one even when the character recognition result is wrong.

Brief Description of the Drawings

FIG. 1 is a flowchart showing a post-processing method for character recognition results according to an embodiment of the present invention.

FIG. 2 is a block diagram showing a character recognition apparatus that implements the post-processing method for character recognition results according to the embodiment of the present invention.

FIG. 3 is a diagram showing an example of candidate words and certainty factors in the embodiment of the present invention.

FIG. 4 is an explanatory diagram of the candidate word selection process in the embodiment of the present invention.

Explanation of Symbols

1 Character recognition
2 Candidate word creation / certainty factor calculation
3 Post-processing correction
4 Grammar dictionary
5 Part-of-speech dictionary

Claims (1)

[Claims]

1. A post-processing method for character recognition results, characterized by: (a) creating candidate words for each word of a character recognition result and calculating their certainty factors; and (b) for a word whose maximum certainty factor is smaller than a predetermined threshold, correcting that word into a grammatically correct word by using its grammatical relationship with a nearby word whose candidate words have a maximum certainty factor larger than the predetermined threshold.
JP3196508A 1991-08-06 1991-08-06 Post-processing system for character recognizing result Withdrawn JPH0540853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3196508A JPH0540853A (en) 1991-08-06 1991-08-06 Post-processing system for character recognizing result

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP3196508A JPH0540853A (en) 1991-08-06 1991-08-06 Post-processing system for character recognizing result

Publications (1)

Publication Number Publication Date
JPH0540853A true JPH0540853A (en) 1993-02-19

Family

ID=16358923

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3196508A Withdrawn JPH0540853A (en) 1991-08-06 1991-08-06 Post-processing system for character recognizing result

Country Status (1)

Country Link
JP (1) JPH0540853A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870800B (en) * 2012-12-18 2018-12-25 富士施乐株式会社 Information processing equipment and information processing method
US10817756B2 (en) 2018-06-13 2020-10-27 Fuji Xerox Co., Ltd. Information processing apparatus and non-transitory computer readable medium
US11258925B2 (en) 2020-03-24 2022-02-22 Fujifilm Business Innovation Corp. Information processing apparatus for displaying the correction of an image and non-transitory computer readable medium

Similar Documents

Publication Publication Date Title
US5610812A (en) Contextual tagger utilizing deterministic finite state transducer
US5784489A (en) Apparatus and method for syntactic signal analysis
JPH07325828A (en) Grammar checking system
WO2007097176A1 (en) Speech recognition dictionary making supporting system, speech recognition dictionary making supporting method, and speech recognition dictionary making supporting program
JPH07325824A (en) Grammar checking system
JPH0540853A (en) Post-processing system for character recognizing result
JP2595934B2 (en) Kana-Kanji conversion processor
JPH06215184A (en) Labeling device for extracted area
JP3071745B2 (en) Post-processing method of character recognition result
JP2870375B2 (en) Sentence correction device
JP2908460B2 (en) Error recognition correction method and apparatus
JP2918380B2 (en) Post-processing method of character recognition result
JPH0540854A (en) Post-processing method for character recognizing result
JP2838850B2 (en) Kana-Kanji conversion device
JPH09120296A (en) Device and method for speech recognition, device and method for dictionary generation, and information storage medium
JP3344793B2 (en) Kana-Kanji conversion device
JPH0415960B2 (en)
JPS5899829A (en) Erroneous character detection and correction backing device
JPS62212871A (en) Sentence reading correcting device
JPH04252390A (en) Post processing method for character recognition result
JPH0769710B2 (en) Natural language analysis method
JPH0458381A (en) Optical character reader
KR20000032270A (en) Voice recognition method of voice typing system
JPH01281561A (en) Method for extracting japanese sentence correcting candidate character
JPS5961899A (en) Japanese language voice input unit

Legal Events

Date Code Title Description
A300 Application deemed to be withdrawn because no request for examination was validly filed

Free format text: JAPANESE INTERMEDIATE CODE: A300

Effective date: 19981112