JP2974145B2

JP2974145B2 - Correcting character recognition results

Info

Publication number: JP2974145B2
Application number: JP63286578A
Authority: JP
Inventors: 啓嗣小島
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1988-11-12
Filing date: 1988-11-12
Publication date: 1999-11-08
Anticipated expiration: 2014-11-08
Also published as: JPH02132577A

Description

[Detailed description of the invention]

[Field of industrial application]

The present invention relates to a method for correcting character recognition results. It is particularly suited to character recognition of documents in languages with a limited character set, such as English.

[Prior art]

Multi-font, multi-size English text is often recognized by matching character-pattern features. With this approach it is very difficult to tell uppercase from lowercase letters. Some symbols are equally hard to separate, such as comma vs. apostrophe and minus vs. underscore. Mistakes in these distinctions readily cause misrecognition.

One technique for reducing such misrecognition is described in the paper "Character Classification Method Focusing on Character Size and Position in English Document Recognition Processing". It was published in the Proceedings of the 1988 IEICE Spring National Convention, p. 1-191.

This method works at the preprocessing stage of character recognition and assumes that character heights are uniform within each line. It proceeds as follows:

1. For each line, it builds histograms of the top and bottom positions of the characters.
2. From these histograms it extracts upper and lower reference lines.
3. It classifies the characters on the line into several categories. The classification uses the position of each character image's circumscribed rectangle (its top and bottom edges) relative to the reference lines, together with the height of each character.

[Problems to be solved by the invention]

This method classifies characters during preprocessing and then applies character-pattern recognition only within the category so determined. If the classification is wrong, the correct character is excluded from the candidates altogether. The misrecognized character then becomes extremely difficult to correct in post-processing, for example in word-level processing.

Moreover, the processing is performed line by line. It therefore cannot handle English documents and the like in which character size changes from word to word, a word being a character string delimited by spaces.

Furthermore, the reference lines are extracted from per-line histograms. The skew of the scanned document must therefore be corrected before the reference lines can be extracted, which complicates preprocessing.

A different approach works in post-processing (for example, Japanese Patent Application Laid-Open No. 59-167783). It targets candidate characters with similar patterns that feature parameters alone cannot distinguish. Such a character is identified by comparing it with shape parameters prepared in advance from past statistics or the like. The accuracy of this method, however, depends on the quality of those prepared shape parameters. Depending on the documents to be recognized, the shape parameters may also have to be rebuilt.

An object of the present invention is to provide a method for correcting character recognition results that solves the problems of the methods described above.

[Means and actions for solving the problem]

The present invention works in the post-processing stage of character recognition.

First, it examines the first candidates in the recognition result of a target character string. From these it extracts a key character belonging to a specific category: the character category of absolute uppercase height or absolute lowercase height.

Then, using the key character as the reference, it corrects the recognition results of the other characters in the string. Each correction depends on two things:

- the relationship between the vertical positions or heights of the key character's image and the other character's image; and
- the category to which the other character's first candidate belongs.

[Embodiments]

Embodiments of the present invention are described below with reference to the drawings.

Fig. 1 is a hardware configuration diagram of an example English character recognition device according to the present invention. Its components are as follows:

1. A scanner that reads the English document to be processed as a binary image.
2. An image memory that stores the binary image of the English document.
3. A central processing unit (CPU) that executes character segmentation, feature extraction, dictionary matching, post-processing, and similar processes.
4. A ROM storing the programs and parameters for this processing.
5. A RAM for storing the data obtained by the processing.
6. A dictionary (ROM or RAM) storing the templates used for character recognition.

The overall flow of processing is as follows.

1. Line segmentation and character segmentation are performed on the binary image of the English document in the image memory 2.
2. Features are extracted from each segmented character image.
3. The features are matched against the template of each character registered in the dictionary 6.
4. The candidate characters with the smallest distances are selected, for example up to the n-th candidate.
5. The selected candidates' character codes and distances are stored in the RAM 5.

Data used in post-processing is also extracted and stored in the RAM 5. This data is one of the following:

- the heights of the top and bottom edges of the circumscribed rectangle of each character image; or
- the heights of the upper and lower reference lines of the character's region-dividing lines, obtained by the method disclosed in the specification and drawings of Japanese Patent Application No. 62-111766.

This extraction is performed during character segmentation or feature extraction. Where necessary, a flag indicating the likelihood of each character's first candidate is also set, based on an evaluation function.

Fig. 4 shows an example of the data obtained for the correct word "Propose". Here, n = 3.

The likelihood evaluation function is given, for example, by equation (1).

$$E = \frac{(D_2 - D_1) \times 100}{D_1} \qquad (1)$$

where $D_1$ is the distance of the first candidate and $D_2$ is the distance of the second candidate.

A larger value of E means that the first candidate is more likely to be correct. E grows as the gap between the first and second candidate distances widens, and as the first candidate's distance shrinks.

The flag is then set as follows:

- if E > threshold, flag = 0;
- if E ≤ threshold, flag = 1.
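Equation (1) and the flag rule translate directly into code. The sketch below is illustrative. The threshold value 20 in the usage lines is hypothetical. The handling of $D_1 = 0$, where equation (1) is undefined, is an assumption added here.

```python
import math


def likelihood(d1, d2):
    """Evaluation function (1): E = (D2 - D1) * 100 / D1."""
    if d1 == 0:
        # An exact template match; treating it as infinitely reliable is an
        # assumption, since equation (1) is undefined for D1 = 0.
        return math.inf
    return (d2 - d1) * 100 / d1


def likelihood_flag(d1, d2, threshold):
    """Flag 0 = first candidate reliable (E > threshold), 1 = unreliable."""
    return 0 if likelihood(d1, d2) > threshold else 1


print(likelihood(10, 15), likelihood_flag(10, 15, threshold=20))  # 50.0 0
print(likelihood(10, 11), likelihood_flag(10, 11, threshold=20))  # 10.0 1
```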

Post-processing to correct misrecognitions is then executed. It is described separately for each embodiment below.

Here the unit of post-processing is a word, that is, a character string bounded by spaces. Other processing units are also possible, such as:

- the key-character search range; or
- a fixed number of characters before the key character, after it, or on both sides of it.

The character categories are as follows.

(a) Absolute uppercase-height characters (characters with the height of uppercase letters): ABDEFGHIJKLMNQRT bdfhiklt 123456789
(b) Absolute lowercase-height characters (characters with the height of lowercase letters): aemnr
(c) Similar-shape characters (letters whose uppercase and lowercase forms look alike): COSUVWXZ cosuvwxz
(d) Descenders (characters that extend below the baseline; see Fig. 5): PY gjpqy
(e) Symbols: comma, apostrophe, minus, underscore, etc.

Absolute uppercase-height and absolute lowercase-height characters are the ones selected as key characters.

Embodiment 1

The processing flow is shown in Fig. 2.

Starting from the first character of the word, the first candidates are searched for an absolute uppercase-height or absolute lowercase-height character (the key-character search step of Fig. 2). In the example of Fig. 4, the first candidate of the second character is r, an absolute lowercase-height character. The second character is therefore used as the key character.

Next, proceeding from the first character of the word, each character's information is compared with that of the key character. The recognition result is corrected by a method that depends on the category of the character's first candidate (the comparison and correction steps of Fig. 2). For the word of Fig. 4, the processing is as follows.

(1) First character (correct answer "P"): The first candidate is the descender p. The bottom-edge position of the key character (the second character) is compared with that of the first character, and the difference is compared with a threshold.

If difference ≤ threshold, the character very likely does not extend below the baseline. The second candidate P is output as the result (the recognition result is corrected).

If difference > threshold, the character very likely extends below the baseline, that is, it is a descender. The first candidate is output unchanged as the recognition result (no correction).

In this example, the former case applies.
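The descender rule can be sketched as follows, under several assumptions:

- y increases downward;
- "does not extend below the baseline" is generalized to choosing the highest-ranked candidate that is not a descending lowercase letter;
- only g, j, p, q, y are treated as actually descending, since P and Y appear in the descender category but do not themselves drop below the baseline.

The pixel values in the usage lines are hypothetical.

```python
# Lowercase letters that actually drop below the baseline (an assumption:
# the descender category also lists P and Y, whose lowercase forms descend).
DESCENDING_LOWER = set("gjpqy")


def correct_descender(candidates, char_bottom, key_bottom, threshold):
    """Rule for a character whose first candidate is a descender.

    candidates: the character's candidates in rank order.
    char_bottom, key_bottom: bottom-edge rows, y increasing downward.
    """
    if char_bottom - key_bottom > threshold:
        return candidates[0]  # it descends: keep the first candidate
    if candidates[0] not in DESCENDING_LOWER:
        return candidates[0]  # first candidate does not descend anyway
    # No descent: take the highest-ranked candidate that does not descend.
    for ch in candidates[1:]:
        if ch not in DESCENDING_LOWER:
            return ch
    return candidates[0]


# Hypothetical rows; the key 'r' sits on a baseline at row 40.
print(correct_descender(["p", "P", "b"], 40, 40, threshold=3))  # P
print(correct_descender(["p", "P", "o"], 48, 40, threshold=3))  # p
```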

(2) Second character (correct answer "r"): This is the key character, so its first candidate r is output unchanged (no correction).

(3) Third character (correct answer "o"): The first candidate is the similar-shape character O (uppercase). The character height of the key character (the difference between its top and bottom positions) is compared with that of the third character, and the difference is compared with a threshold.

If difference ≤ threshold, the lowercase o is forcibly output (corrected).

If difference > threshold, the uppercase O is output (no correction).

In this example, the former case applies.
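The similar-shape rule can be sketched as below. The text gives only the case of a lowercase-height key. The sketch generalizes it: a character of about the same height as the key takes the key's case, and otherwise the opposite case. It also assumes the height difference is taken as an absolute value. Both are assumptions, as are the pixel heights in the usage lines.

```python
def correct_similar(first, char_height, key_height, key_is_lower, threshold):
    """Rule for a similar-shape first candidate (C, O, S, U, V, W, X, Z).

    If the character is about as tall as the key character, it takes the
    key's case; otherwise it takes the other case.
    """
    same_height = abs(key_height - char_height) <= threshold
    make_lower = key_is_lower if same_height else not key_is_lower
    return first.lower() if make_lower else first.upper()


# Hypothetical heights in pixels; the key 'r' has lowercase height 20.
print(correct_similar("O", 21, 20, key_is_lower=True, threshold=3))  # o
print(correct_similar("s", 30, 20, key_is_lower=True, threshold=3))  # S
```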

(4) Fourth character (correct answer "p"): The first candidate is the descender p, so the processing is the same as for the first character. In this example, the first candidate (lowercase p) is output.

(5) Fifth character (correct answer "o"): The first candidate is the similar-shape character o, so the processing is the same as for the third character. In this example, lowercase o is output.

(6) Sixth character (correct answer "s"): The first candidate is the similar-shape character S, so the processing is the same as for the third character. In this example, lowercase s is output.

(7) Seventh character (correct answer "e"): The first candidate is e, an absolute lowercase-height character. The character height of the key character is compared with that of the seventh character, and the difference is compared with a threshold.

If difference ≤ threshold, the absolute lowercase-height character among the candidates is output. This example is such a case, so e is output. If two or more candidates are absolute lowercase-height characters, the highest-ranked one is selected.

If difference > threshold, the absolute uppercase-height character among the candidates is output. If two or more candidates are absolute uppercase-height characters, the highest-ranked one is selected.
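The absolute-height rule can be sketched as below. As in the similar-shape sketch, the text covers only a lowercase-height key, and the sketch generalizes to either kind of key. That generalization is an assumption. So is the fallback of keeping the first candidate when no candidate of the required category exists. The pixel values are hypothetical.

```python
ABS_UPPER = set("ABDEFGHIJKLMNQRT" "bdfhiklt" "123456789")
ABS_LOWER = set("aemnr")


def correct_absolute(candidates, char_height, key_height, key_is_lower, threshold):
    """Rule for a first candidate of absolute uppercase or lowercase height."""
    same_height = abs(key_height - char_height) <= threshold
    want_lower = key_is_lower if same_height else not key_is_lower
    wanted = ABS_LOWER if want_lower else ABS_UPPER
    for ch in candidates:  # rank order, so the first match is the highest-ranked
        if ch in wanted:
            return ch
    return candidates[0]  # no candidate of the required category (assumption)


print(correct_absolute(["e", "c", "B"], 20, 20, key_is_lower=True, threshold=3))  # e
print(correct_absolute(["n", "h", "a"], 31, 20, key_is_lower=True, threshold=3))  # h
```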

In summary, the information held by the key character is propagated to the other characters in the word. This information is whether the key has absolute uppercase or absolute lowercase height, and the height positions of the top and bottom edges of its circumscribed rectangle. Each other character is then handled according to the category of its first candidate:

- Absolute uppercase-height, absolute lowercase-height, or descender: the correct character is chosen from among the candidates according to the judgment for that category.
- Similar-shape: the first candidate is forcibly converted to uppercase or lowercase according to a specific judgment, and the result is taken as the correct character.

Symbols are handled as follows. The height position of the character's top or bottom edge is compared with that of the key character's top or bottom edge. This distinguishes comma from apostrophe and minus from underscore, and the more likely symbol is output.

The comparison and correction steps above are repeated until a space is detected. When a space is detected, processing of the word ends.

Although omitted from Fig. 2, if no key character is found in a word, a flag or similar marker is attached to the word. Such words are corrected in the subsequent word-level processing, that is, correction by matching against a word dictionary.

Embodiment 2

The processing flow is shown in Fig. 2.

Embodiment 1 uses the top and bottom edges of each character's circumscribed rectangle as the character's height positions in the comparison step. Embodiment 2 uses a different source for these positions: the upper and lower reference positions of the region-dividing lines, obtained by the reference-position determination method of Japanese Patent Application No. 62-111766. Otherwise, Embodiment 2 is the same as Embodiment 1.

Embodiment 3

The processing flow is shown in Fig. 2.

In Embodiments 1 and 2, a wrong key character may be extracted from a poor-quality document. For example, if B and 8 are obtained as recognition candidates for a blotted "a", B or 8 is extracted as the key character. Two errors follow:

- The wrong first candidate of this character is output unchanged.
- An absolute uppercase-height character has been chosen as the key. However, the actual height information of that character (the blotted "a") corresponds to an absolute lowercase-height character. As a result, the other characters cannot be corrected properly.

In Embodiment 3, the key-character search step excludes characters whose likelihood flag from the evaluation function is 1. This prevents wrong key characters from being extracted from poor-quality documents, and so prevents the miscorrections they would cause.
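Embodiment 3's change can be sketched as the same left-to-right scan as before, with unreliable characters skipped. The flags are the 0/1 values produced by the evaluation function. In the usage lines, the word "cat" with a blotted "a" read as "B" is a hypothetical example modeled on the case described above.

```python
KEY_CHARS = set("ABDEFGHIJKLMNQRT" "bdfhiklt" "123456789" "aemnr")


def find_reliable_key(first_candidates, flags):
    """Like the key search of Embodiment 1, but skip characters whose
    likelihood flag is 1 (unreliable first candidate)."""
    for index, (ch, flag) in enumerate(zip(first_candidates, flags)):
        if flag == 0 and ch in KEY_CHARS:
            return index
    return None


# Hypothetical word "cat" whose blotted 'a' was read as 'B' with flag 1.
print(find_reliable_key("cBt", [0, 1, 0]))  # 2  (the 't' becomes the key)
print(find_reliable_key("cBt", [0, 0, 0]))  # 1  (without flags, 'B' would be used)
```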

Otherwise, Embodiment 3 is the same as Embodiments 1 and 2.

Embodiment 4

The processing flow is shown in Fig. 3.

The first step extracts the key character in the same way as the corresponding step of Fig. 2. Suppose a key character is extracted whose first candidate is an absolute uppercase-height or absolute lowercase-height character. The same corrections as in Embodiment 1, 2, or 3 are then made, by steps equivalent to the comparison and correction steps of Fig. 2.

If no key character is extracted, the word is corrected in the remaining steps of Fig. 3. These steps use either of two sources of information:

- the descender letters in the word, uppercase (P, Y) or lowercase (g, j, p, q, y); or
- general rules of English sentences.

Examples of cases in which no key character can be extracted are as follows.

(1) A word such as "scoop" that contains neither absolute uppercase-height nor absolute lowercase-height characters.

(2) A word that contains absolute uppercase-height or absolute lowercase-height characters, but whose flags for those characters are 1.

The correction is performed, for example, as follows.

- Case (1), "scoop": if the flag of p is 0, the word is assumed to contain lowercase letters, and the similar-shape characters in the word are corrected to lowercase.
- Case (2): the correction uses the rule that an English sentence generally begins with an uppercase letter, continues in lowercase, and ends with a period. The similar-shape characters within the sentence are corrected to lowercase.
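The two fallback rules can be sketched together as below. Four points are assumptions:

- Rule (1) fires when any lowercase descender in the word has flag 0; otherwise the sentence rule (2) is applied.
- Under rule (2), the first letter of a sentence-initial word is made uppercase, following the "begins with an uppercase letter" convention.
- The `sentence_initial` parameter is hypothetical.
- Detecting sentence boundaries is left out entirely.

```python
SIMILAR = set("COSUVWXZ" "cosuvwxz")
DESCENDING_LOWER = set("gjpqy")


def correct_without_key(first_candidates, flags, sentence_initial=False):
    """Fallback when a word has no usable key character.

    Returns (corrected word, rule applied).
    """
    # Rule (1): a reliable lowercase descender implies a lowercase word.
    lower_evidence = any(
        ch in DESCENDING_LOWER and flag == 0
        for ch, flag in zip(first_candidates, flags)
    )
    rule = "descender" if lower_evidence else "sentence"
    out = []
    for index, ch in enumerate(first_candidates):
        if ch in SIMILAR:
            # Rule (2) keeps a sentence-initial letter uppercase (assumption).
            if rule == "sentence" and sentence_initial and index == 0:
                ch = ch.upper()
            else:
                ch = ch.lower()
        out.append(ch)
    return "".join(out), rule


print(correct_without_key("SCOOp", [0, 0, 0, 0, 0]))  # ('scoop', 'descender')
print(correct_without_key("SOW", [1, 1, 1], sentence_initial=True))  # ('Sow', 'sentence')
```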

[Effects of the invention]

As described in detail above, the present invention works in the post-processing stage of character recognition. It examines the first candidates of the recognition result of a target character string. From these it extracts a key character belonging to a specific category (the character category of absolute uppercase height or absolute lowercase height). It then corrects the recognition results of the other characters in the string according to:

- the relationship between the vertical positions or heights of the images of the key character and each other character; and
- the category to which that other character's first candidate belongs.

This provides the following effects.

(1) Processing is not limited to documents with a constant character height within each line. Recognition results can also be corrected for English documents and the like in which the character size (height) changes from word to word.

(2) Methods that classify characters during preprocessing can exclude the correct character outright, making correction impossible or extremely difficult. The present method never does this. Moreover, the image-related information is held per character, so troublesome skew correction is unnecessary.

(3) The other characters in the target string are compared against a character that appears in that same string, namely the key character. Consequently, accuracy does not depend on the quality of shape parameters prepared in advance from past statistics or the like, as it does in Japanese Patent Application Laid-Open No. 59-167783. No maintenance such as preparing, updating, or correcting parameters is needed either.

[Brief description of the drawings]

Fig. 1 is a hardware configuration diagram of an example English character recognition device according to the present invention.
Figs. 2 and 3 are schematic flowcharts of the correction processing in embodiments of the present invention.
Fig. 4 shows an example of character recognition results and of the data used in post-processing.
Fig. 5 shows absolute uppercase-height characters, absolute lowercase-height characters, and descender characters.

1 ... scanner, 2 ... image memory, 3 ... central processing unit, 4 ... ROM, 5 ... RAM, 6 ... dictionary.

Claims

(57) [Claims]

1. A method for correcting character recognition results, comprising:
(a) in post-processing of character recognition, examining the first candidates of the recognition result of a target character string to extract a key character belonging to the character category of absolute uppercase height or absolute lowercase height; and
(b) correcting the recognition results of other characters in the character string according to the relationship between the vertical positions or heights of the images of the key character and of the other characters in the string, and according to the category to which the first candidate of each other character belongs.

2. The method for correcting character recognition results according to claim 1, wherein the key character is extracted, and the correction is performed, in units of words bounded by spaces.

3. The method for correcting character recognition results according to claim 1, wherein the likelihood of the character recognition result is evaluated and characters with low likelihood are excluded from selection as the key character.