JPH06259595A

JPH06259595A - Device and method for processing character recognition

Info

Publication number: JPH06259595A
Application number: JP5076228A
Authority: JP
Inventors: Yoshitaka Hamaguchi; 佳孝濱口; Sadamasa Hirogaki; 節正広垣
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1993-03-10
Filing date: 1993-03-10
Publication date: 1994-09-16

Abstract

PURPOSE: To improve recognition performance by ensuring that characters of the same shape in the same document receive the same recognition result, even when collation with a word dictionary or the like is performed. CONSTITUTION: Characters recognized by a character reading part 3 are corrected word by word through collation with a word dictionary 7, while the characters are also classified into groups of identically shaped characters. This classification result is reflected in the dictionary-based correction, so that identically shaped characters obtain the same recognition result. As a result, a misrecognized character in an unknown word can be corrected by using the results of collation with other words.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition processing device and a recognition processing method. They optically read a document image, recognize the characters in it, and automatically correct the character recognition result using linguistic knowledge or the like.

[0002]

[Prior Art] Character recognition processing devices are used to automatically enter information written in printed matter, books, and the like into information processing devices. Such a device optically reads a document image, converts it into an electrical signal, and recognizes the characters contained in the document image.

In character recognition, the feature amount of each read character is usually extracted and compared with the feature amounts of standard characters. The character whose feature amount is most similar is taken as the recognition result. However, even when recognition is performed character by character, recognition errors cannot be eliminated entirely. In practice, therefore, an operator has had to evaluate and correct the recognition result while reading the original text.

To reduce this burden on the operator, a post-processing technique has been proposed in which character recognition results are collated word by word with a word dictionary to improve recognition performance (Japanese Patent Laid-Open No. 58-48181).

[0003] FIG. 2 is a flowchart of such a conventional character recognition operation.

First, in step S1, the feature amount of each character is extracted as described above. Next, in step S2, the feature amounts are collated against a standard character feature dictionary, which yields a recognition result for each character.

In step S3, the character string is segmented into words. In English and similar languages, words are separated by spaces, so segmentation is easy. In Japanese, segmentation is performed by grammatical analysis or the like.

The words are then collated with a word dictionary prepared in advance (step S4), and the word with the closest spelling is found. If the spelling matches exactly, the word is left as it is. If the spelling does not match, for example because some characters were misrecognized, but the word as a whole is judged to be that dictionary word, it is replaced with the dictionary word in step S5.

After misrecognized characters have been corrected in this way, the process proceeds to step S6, and the character recognition result is output. Conventionally, character recognition processing was performed by this method.
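The conventional dictionary post-processing of steps S4 and S5 can be sketched in Python. The patent does not say how "closest spelling" is measured. This sketch assumes Levenshtein edit distance and a hypothetical cutoff `max_dist`; all function names are illustrative.

Note that the misread "syncptic" is equally close to two dictionary words. Here the tie is broken alphabetically, which is just as arbitrary as the feature-sum comparison the conventional method relies on.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def candidate_words(word: str, dictionary: set, max_dist: int = 2) -> list:
    """Dictionary words within max_dist edits, nearest first (ties alphabetical)."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_dist]


def correct_word(word: str, dictionary: set, max_dist: int = 2) -> str:
    """Steps S4/S5: keep exact matches, else take the nearest candidate, else leave as is."""
    if word in dictionary:
        return word
    cands = candidate_words(word, dictionary, max_dist)
    return cands[0] if cands else word


DICTIONARY = {"synaptic", "synoptic", "associative", "learning"}
print(candidate_words("syncptic", DICTIONARY))  # ['synaptic', 'synoptic']
print(correct_word("Aply5:a", DICTIONARY))      # Aply5:a (unknown word, unchanged)
```

An unknown word such as "Aplysia" has no dictionary entry, so its misreading simply passes through unchanged.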

[0004]

[Problems to be Solved by the Invention] The conventional method improves recognition performance by collation with a word dictionary. An unknown word that is not in the word dictionary therefore cannot be corrected.

For example, suppose the word "Aplysia" is misrecognized as "Aply5:a", that is, "s" is misrecognized as "5" and "i" as ":". If no word with a similar spelling can be found, this error cannot be corrected at all.

Furthermore, suppose "synaptic" is misrecognized as "syncptic". The above processing then yields two candidate words from the word dictionary, "synoptic" and "synaptic". Which one is selected depends on a comparison of the summed feature amounts, and this outcome is generally determined by the typeface and by quirks of the recognition unit. Consequently, even though exactly the same word is used throughout a document, it may be recognized as "synoptic" in some places and as "synaptic" in others. In other words, erroneous corrections can occur even when an appropriate candidate word exists in the word dictionary.

[0005] In particular, characters of exactly the same shape yield identical character-by-character recognition results. Yet once they are combined with the recognition results of other characters through collation with the word dictionary, they may be corrected into different characters, which is an unreasonable outcome.

The present invention was made in view of the above points. Its object is to provide a character recognition processing device and a recognition processing method that improve recognition performance by ensuring that homomorphic characters (characters of the same shape) obtain the same recognition result, even when collation with a word dictionary or the like is performed.

[0006]

[Means for Solving the Problems] A first invention relates to a character recognition processing device comprising:
a character reading unit that optically reads a document image, converts it into an electrical signal, and recognizes the characters contained in the document image;
a homomorphic character classification unit that classifies the recognized characters into groups of homomorphic characters having similar features;
a recognition result correction unit that collates the recognized characters with a word dictionary and corrects the character recognition result word by word; and
a correction control unit that, based on the classification result of the homomorphic character classification unit, controls the correction of each character by the recognition result correction unit so that homomorphic characters obtain the same recognition result.

[0007] A second invention relates to a character recognition processing method. The characters contained in a document image are recognized and also classified into groups of homomorphic characters having similar feature amounts. When different recognition results are obtained for characters in the same class, a single recognition result is determined by majority vote.

[0008] A third invention relates to a character recognition processing method. The characters contained in a document image are recognized and classified into groups of homomorphic characters having similar feature amounts, and the recognition result is collated word by word with a word dictionary. When replacing a recognition result with a candidate word shows that a particular character was misrecognized, the recognition results of that character and of all characters homomorphic to it are corrected together.

[0009] A fourth invention relates to a character recognition processing method. The characters contained in a document image are recognized and classified into groups of homomorphic characters having similar feature amounts, and the recognition result is collated word by word with a word dictionary. When a recognition result is replaced with a candidate word, the method selects the candidate word judged most valid when homomorphic characters are given the same recognition result.

[0010]

[Operation] In this device, the characters recognized by the character reading unit are collated word by word with the word dictionary and corrected, as in the conventional case. The characters are also classified into homomorphic groups. The classification result is then reflected in the dictionary-based recognition result, so that homomorphic characters obtain the same recognition result.

For homomorphic characters, the recognition results within the same document can be tallied and unified by taking a single result by majority vote. When word dictionary collation leads to correcting the recognition result of a particular character, the results can be unified by correcting all characters of the same shape at once. Further, if candidate words are selected during word matching so that homomorphic characters receive the same recognition result, more accurate character recognition becomes possible.

As a result, a misrecognized character in an unknown word can be correctly fixed by using the results of matching other words and the homomorphic classification.

[0011]

[Embodiments] The present invention will now be described in detail with reference to the embodiments shown in the drawings.

[First and Second Inventions] FIG. 1 is a block diagram showing an embodiment of the character recognition processing device of the present invention. The device includes a character reading unit 3 that reads a document image 2 containing many characters 1 and recognizes each character.

In addition, the present invention provides a homomorphic character classification unit 4 and a homomorphic character classification result storage unit 5. These classify the characters recognized by the character reading unit 3 into groups of homomorphic characters with similar features. A recognition result correction unit 6, a word dictionary 7, and a candidate word storage unit 8 are also provided to correct the character recognition result word by word through collation with a word dictionary, as in the conventional device.

[0012] The present invention further provides a correction control unit 9 that adjusts the correction of the recognition result according to the homomorphic classification result. The correction control unit 9 receives the results stored in the homomorphic character classification result storage unit 5. It then either controls the correction performed by the recognition result correction unit 6 or, where necessary, re-corrects words. The correction result obtained in this way is stored in a recognition result storage unit 10.

[0013] In general, the same character within one document is printed in the same typeface and at the same size, so the patterns read by the character reading unit 3 also have the same shape. Collating such homomorphic characters with one another and classifying them can therefore be done relatively accurately. The device of the present invention exploits this information so that homomorphic characters receive the same recognition result throughout a document.

[0014] The specific operation of the device will now be described step by step. First, the character reading unit 3, like that of a conventional device, compares each character pattern segmented from the document image 2 with a dictionary by pattern matching and recognizes each character.

FIG. 3 illustrates this character recognition operation. Suppose the character patterns reading "Aplysia" are input, as shown in the figure. As already described with reference to FIG. 2, their feature amounts are extracted and collated against the standard character feature dictionary. Specifically, using the feature distance as the parameter, the most plausible candidate characters are listed.

In the example of FIG. 3, one candidate is listed for the character A, two for the character y, and three for each of the other characters. The candidates are ordered by distance, that is, by the difference in feature amounts, and the one with the shortest distance becomes the first candidate. The character reading unit 3 outputs the first candidate of each list as the recognition result to the homomorphic character classification unit 4 and the recognition result correction unit 6 shown in FIG. 1.
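The candidate listing of FIG. 3 amounts to a nearest-neighbour ranking, which can be sketched in Python. This sketch assumes Euclidean distance over small made-up 2-D feature vectors. `REFERENCE` is a hypothetical stand-in for the standard character feature dictionary. The optional `max_dist` cutoff is also an assumption, since the patent only says that the most plausible candidates are listed.

```python
import math

# Hypothetical 2-D feature vectors standing in for the standard character dictionary.
REFERENCE = {
    "s": (1.0, 0.0),
    "5": (1.1, 0.1),
    "a": (0.0, 1.0),
    "o": (0.1, 1.2),
}


def candidate_chars(feature, reference, k=3, max_dist=None):
    """List up to k reference characters, nearest feature distance first."""
    scored = sorted((math.dist(feature, ref), ch) for ch, ref in reference.items())
    if max_dist is not None:
        scored = [(d, ch) for d, ch in scored if d <= max_dist]
    return [ch for _, ch in scored[:k]]


# A printed "s" whose features happen to lie closer to the stored "5".
cands = candidate_chars((1.08, 0.08), REFERENCE)
print(cands)     # ['5', 's', 'a']
print(cands[0])  # 5 (the first candidate becomes the recognition result)
```

The example reproduces the kind of error discussed above: the correct "s" is only the second candidate, so taking the first candidate alone yields "5".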

[0015] FIG. 4 is an operation flowchart of the homomorphic character classification unit 4, which operates according to the procedure shown there. First, in step S1, the feature amount of each character is extracted. The homomorphic character classification unit 4 may perform this extraction independently. Alternatively, it may reuse the extraction results already obtained by the character reading unit 3, which improves processing speed and efficiency.

[0016] Next, in step S2, the feature amount of each character is compared with those of the other characters in the same document. The character reading unit 3 recognizes each character by comparing its feature amount with the dictionary. The homomorphic character classification unit 4, by contrast, compares the feature amounts of the characters in the same document with one another. It assigns a common homomorphic class to characters having the same feature amount (step S3).

As explained above, this comparison yields a far closer match than comparison with the dictionary, so classification can be performed very accurately. In step S4, it is determined whether all characters have been classified, and steps S1 to S3 are repeated until every character has been classified.
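Steps S2 to S4 of FIG. 4 compare characters within the document rather than against a dictionary. The patent does not specify the grouping procedure, so the Python sketch below assumes a simple greedy scheme:

- feature vectors are compared by Euclidean distance;
- "the same feature amount" is approximated by a hypothetical tolerance `tol`;
- each class is represented by its first member.

Treat this as one possible realization, not the patented procedure.

```python
import math


def classify_homomorphs(features, tol=0.05):
    """Give every character in one document a class number (1, 2, ...).

    Each character is compared with the representative (first member) of
    every class found so far. It joins the first class within `tol`;
    otherwise it opens a new class.
    """
    representatives = []
    labels = []
    for f in features:  # loop until every character is classified (step S4)
        for idx, rep in enumerate(representatives):
            if math.dist(f, rep) <= tol:
                labels.append(idx + 1)
                break
        else:  # no existing class matched: start a new one
            representatives.append(f)
            labels.append(len(representatives))
    return labels


doc = [(1.0, 0.0), (0.0, 1.0), (1.01, 0.0), (0.0, 1.02), (0.5, 0.5)]
print(classify_homomorphs(doc))  # [1, 2, 1, 2, 3]
```

The `features` argument could just as well be the vectors already computed by the character reading unit, as the text allows.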

[0017] FIG. 5 illustrates the recognition operation based on homomorphic classification. As shown in FIG. 5, each homomorphic class is first given a classification number, for example. In this example, the character "s" is given class number 1, the character "y" class number 2, the character "n" class number 3, and so on.

In this example, the document contains four characters with class number 1, three of which were recognized as "s" and assigned that character code. The same applies to the remaining classes. Within every homomorphic class, some characters are recognized as the same character, while others, as with the character "a", are recognized as different characters.

[0018] The present invention respects this homomorphic classification. For all characters belonging to the same class, the character code that appears most frequently is adopted as the character code of that class, so that a single recognition result is obtained.

For example, the character "a", with class number 4, appears five times in the document. Suppose three of these are recognized as "a" and assigned that code, and one is recognized as "o". The character code of class 4 is then set to "a". Accordingly, the remaining character, which could not be recognized, is also recognized as "a". This approach yields a more appropriate recognition result for each character.
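The majority vote described above (the second invention) can be sketched in Python. The sketch assumes each character already carries a class label and a recognized code, with `None` standing in for a character that could not be recognized. The patent does not specify tie-breaking. Here `Counter.most_common` settles a tie in favour of the code encountered first.

```python
from collections import Counter


def majority_codes(labels, codes):
    """Most frequent recognized code per class; None marks an unrecognized character."""
    votes = {}
    for label, code in zip(labels, codes):
        if code is not None:  # unrecognized characters do not vote
            votes.setdefault(label, Counter())[code] += 1
    # most_common(1) breaks ties by first occurrence
    return {label: c.most_common(1)[0][0] for label, c in votes.items()}


def unify(labels, codes):
    """Give every member of a class the class's majority code."""
    winners = majority_codes(labels, codes)
    return [winners.get(label, code) for label, code in zip(labels, codes)]


# Class 4 from FIG. 5: three "a", one "o", one unrecognized character.
labels = [4, 4, 4, 4, 4]
codes = ["a", "o", "a", None, "a"]
print(unify(labels, codes))  # ['a', 'a', 'a', 'a', 'a']
```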

[0019] [Third Invention] Next, the operation of the recognition result correction unit 6 shown in FIG. 1 will be described. As in the conventional case, the recognition result correction unit 6 receives the recognition result output by the character reading unit 3. It collates the recognized characters word by word with the word dictionary and corrects them. The specific operation is as already described with reference to FIG. 2.

Because homomorphic characters share the same feature amount, a problem arises if the character reading unit 3 finds a character whose feature amount is closer than that of the correct character. In that case, every recognition result for those homomorphic characters may be wrong.

[0020] Therefore, in the present invention, the recognition result correction unit 6 may collate word by word and correct a particular character by replacing the recognition result with a candidate word. When it does, the correction is reflected in the homomorphic classification shown in FIG. 5.

For example, suppose the character "s" with class number 1 is recognized as the digit "5" throughout the document, so that the majority vote of the second invention also settles on "5". If word matching nevertheless concludes that "s" is the appropriate reading, the recognition results for the whole class are corrected to "s" at once. This in turn makes the results of word matching elsewhere in the document more reliable.
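The batch correction described above (the third invention) can be sketched in Python. The sketch assumes that word matching reports the start position of the word it replaced (`start`) together with the corrected word. The function name is hypothetical. For readability, the example uses each character's true letter as its class label; class labels may be any hashable value.

```python
def apply_word_correction(labels, codes, start, corrected_word):
    """Push a dictionary correction to every homomorph of the changed characters.

    Word matching replaced codes[start:start + len(corrected_word)] with
    corrected_word; each character it changed settles the code of its class.
    """
    settled = {}
    for offset, new_code in enumerate(corrected_word):
        pos = start + offset
        if codes[pos] != new_code:
            settled[labels[pos]] = new_code
    return [settled.get(label, code) for label, code in zip(labels, codes)]


# Illustration only: the true glyph serves as its own class label.
true_text = "associative" + "Aplysia"
labels = list(true_text)
codes = list("a55ociative" + "Aply5:a")
fixed = apply_word_correction(labels, codes, 0, "associative")
print("".join(fixed))  # associativeAplys:a
```

Correcting "a55ociative" also fixes the "5" inside the unknown word. The ":" is left alone because this correction did not touch the class of "i".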

[0021] The correction control unit 9 reflects the homomorphic classification result in word matching as described above. It also performs processing with high recognition performance through the following control. FIG. 6 is an operation flowchart of the correction control unit, and FIG. 7 illustrates the operation of the fourth invention.

First, in FIG. 6, the character reading unit 3 reads the characters (step S1). The recognition result correction unit 6 then performs word matching in step S2. In parallel, the homomorphic character classification unit 4 performs homomorphic classification in step S3, and character codes are associated with the classes based on the result (step S4). This association is the processing described with reference to FIG. 5 and elsewhere.

[0022] Using these results, the word matching operation of the recognition result correction unit 6 is controlled in step S5. That is, the recognition result correction unit 6 determines, for each word, whether candidate words exist. If candidate words exist, a candidate word is selected using the homomorphic classification in step S6. If no candidate word exists, candidate characters are selected using the homomorphic classification in step S7.

[0023] The processing in step S6 is specifically the operation shown in FIG. 7. Suppose, for example, that for the character pattern "synaptic", the eight first-candidate characters produced by character recognition read "syncptic". The candidate words are then "synoptic" and "synaptic", as shown in the figure. With the conventional word-level comparison of feature amounts, the former, "synoptic", becomes the first candidate word and the latter the second candidate word.

[0024] In the present invention, the character codes of each candidate word are compared against the homomorphic classification to determine which candidate agrees at more positions. The homomorphic classes already determined in FIG. 5 are applied here. The "a" at this position has "c" as its first candidate character. However, it belongs to class number 4 in FIG. 5, whose character code is "a".

Compared in terms of the homomorphic classification, the first candidate word "synoptic" agrees in seven characters, whereas the second candidate word "synaptic" agrees in all eight. The present invention therefore adopts "synaptic" as the final recognition result. Selecting the candidate word that is most valid when homomorphic characters are given the same recognition result yields a more reliable recognition result.
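The candidate selection of step S6 can be sketched in Python as an agreement count. The sketch assumes:

- the candidates arrive in the conventional feature-distance order;
- `class_code` maps each class to its settled code.

Class numbers 1 to 4 follow FIG. 5; numbers 5 to 8 are made up for the example. On a tie, the earlier candidate wins, so without class information the conventional choice stands.

```python
def select_candidate(recognized, labels, class_code, candidates):
    """Step S6: choose the candidate agreeing with the most class-level codes.

    `candidates` are in conventional feature-distance order; on a tie the
    earlier one wins, so with no class information the conventional choice stands.
    """
    # Use a class's settled code where one exists, else the character's own result.
    effective = [class_code.get(lab, ch) for ch, lab in zip(recognized, labels)]

    def agreement(word):
        return sum(a == b for a, b in zip(word, effective))

    return max(candidates, key=agreement)


recognized = "syncptic"
labels = [1, 2, 3, 4, 5, 6, 7, 8]  # the misread "c" belongs to class 4 ("a")
class_code = {1: "s", 2: "y", 3: "n", 4: "a", 5: "p", 6: "t", 7: "i", 8: "c"}
print(select_candidate(recognized, labels, class_code, ["synoptic", "synaptic"]))  # synaptic
print(select_candidate(recognized, labels, {}, ["synoptic", "synaptic"]))          # synoptic
```

With class codes applied, "synaptic" agrees at eight positions and "synoptic" at seven. Without them, the two tie at seven and the conventional first candidate is kept.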

[0025] In FIG. 7, for the character pattern "associative", the agreement between the actual character pattern and the first candidate characters is very low. However, because the conventional method yields only one candidate word, the recognition result of each character can still be determined. The seventh character, "a", is therefore recognized as corresponding to the character code "a".

This "a" belongs to homomorphic class 4. Its result can be reflected in the matching of another word, such as the matching of "synaptic" described above, and the same candidate word as in that conclusion is then selected. In the character pattern "learning" in FIG. 7 as well, the character "a" is recognized as corresponding to the character code "a". This further supports recognizing characters of homomorphic class 4 as "a".

[0026] On the other hand, for the character pattern "Aplysia" shown in FIG. 7, no matching candidate word is found, so word matching fails. Conventionally, the recognition result was output unchanged in such a case.

According to the present invention, however, the processing results for the second and third characters of the pattern "associative" establish that class 1 characters are "s". From this, it can be concluded that the fifth character of "Aplysia" is "s". Likewise, the sixth character can be concluded to be "i" from the processing results of other character patterns. Thus, even when word matching is impossible, the homomorphic classification result can be used to obtain a more appropriate character recognition.
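Step S7 for an unknown word can be sketched in Python as a per-character lookup. The sketch assumes `class_code` already holds codes settled from other words, such as "associative" and "learning". Class numbers other than 1, 2, and 4 are made up for the example. A character whose class has no settled code keeps its own recognition result.

```python
def fill_from_classes(recognized, labels, class_code):
    """Step S7: no candidate word, so substitute each character's settled class code."""
    return "".join(class_code.get(lab, ch) for ch, lab in zip(recognized, labels))


recognized = "Aply5:a"
labels = [11, 12, 13, 2, 1, 7, 4]              # hypothetical class numbers
class_code = {1: "s", 2: "y", 4: "a", 7: "i"}  # settled via other words
print(fill_from_classes(recognized, labels, class_code))  # Aplysia
```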

[0027] The present invention is not limited to the above embodiments. The recognition processing device may, for example, be implemented with hardware providing the above function for each block. Alternatively, it may be implemented as a processing program or the like that performs the operations described above according to conditions. Furthermore, similar processing can be applied not only to English word matching but also to Japanese.

[0028]

[Effects of the Invention] The character recognition processing device and recognition processing method of the present invention recognize the characters in a document image and classify them into groups of homomorphic characters with similar feature amounts. When the recognized characters are collated with a word dictionary and corrected word by word, the classification result is used so that homomorphic characters obtain the same recognition result. This makes character recognition processing more reliable than before. Moreover, even for words for which word matching yields no candidate word, the most appropriate recognition result can be obtained by using the results of matching other words and the like.

【００２９】[0029]

Also, even when different recognition results are obtained for several characters of the same class, recognition errors can be reduced by deciding a single recognition result by majority vote. Furthermore, if dictionary word matching determines that a particular character has been misrecognized, correcting the recognition results of that character and all of its same-form characters at once not only improves the results for words that cannot be matched against the dictionary, but also increases the reliability of other word matches. In addition, if the most appropriate candidate words are selected under the constraint that same-form characters have identical recognition results, the reliability of candidate word selection as a whole increases, improving the performance of character recognition post-processing.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the character recognition processing device of the present invention.

FIG. 2 is a flowchart of a conventional character recognition operation.

FIG. 3 is a diagram explaining the character recognition operation.

FIG. 4 is a flowchart of the operation of the same-form character classification unit.

FIG. 5 is a diagram explaining the recognition operation based on same-form character classification.

FIG. 6 is a flowchart of the operation of the correction control unit.

FIG. 7 is a diagram explaining the operation of the fourth invention.

[Explanation of Reference Numerals]

3 Character reading unit
4 Same-form character classification unit
5 Same-form character classification result storage unit
6 Recognition result correction unit
7 Word dictionary
8 Candidate word storage unit
9 Correction control unit
10 Recognition result storage unit

Claims

1. A character recognition processing device comprising: a character reading unit that optically reads a document image, converts it into an electric signal, and performs character recognition of the characters contained in the document image; a same-form character classification unit that classifies the recognized characters into groups of same-form characters having similar feature quantities; a recognition result correction unit that collates the recognized characters with a word dictionary and corrects the character recognition result word by word; and a correction control unit that controls the correction of each character by the recognition result correction unit, according to the classification result of the same-form character classification unit, so that same-form characters obtain the same recognition result.

2. A character recognition processing method characterized in that the characters contained in a document image are recognized, the recognized characters are classified into groups of same-form characters having similar feature quantities, and, when different recognition results are obtained for a plurality of characters of the same class, a single recognition result is obtained by majority vote.

3. A character recognition processing method in which the characters contained in a document image are recognized, the recognized characters are classified into groups of same-form characters having similar feature quantities, and the recognition result is collated word by word with a word dictionary and replaced with candidate words, characterized in that, when a specific character is determined to be misrecognized, the recognition results of that character and of its same-form characters are corrected collectively.

4. A character recognition processing method in which the characters contained in a document image are recognized, the recognized characters are classified into groups of same-form characters having similar feature quantities, and the recognition result is collated word by word with a word dictionary and replaced with candidate words, characterized in that the candidate words considered most appropriate are selected under the condition that same-form characters have identical recognition results.
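Claim 4 can be read as a constrained search: choose one candidate per word so that every same-form class maps to a single character, and keep the best-scoring combination. The sketch below enumerates combinations exhaustively. The per-candidate scores, the additive scoring rule, and the function name `select_consistent` are assumptions for illustration; the claim itself only says the "most appropriate" candidates are chosen.

```python
from itertools import product


def select_consistent(words, candidates):
    """Pick one candidate per word so that each same-form class gets one character.

    words:      list of words, each a list of same-form class IDs (hypothetical format)
    candidates: list parallel to `words`; each entry is a list of (candidate_word, score)
    Returns the highest-scoring consistent combination, or None if none exists.
    """
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        mapping = {}  # class ID -> character implied by this combination
        consistent = True
        for class_ids, (cand, _score) in zip(words, combo):
            if len(cand) != len(class_ids):
                consistent = False
            else:
                for cid, ch in zip(class_ids, cand):
                    if mapping.setdefault(cid, ch) != ch:
                        consistent = False
                        break
            if not consistent:
                break
        total = sum(score for _cand, score in combo)  # assumed additive score
        if consistent and total > best_score:
            best, best_score = [cand for cand, _score in combo], total
    return best


# Class 2 is shared by the middle letter of both words.
words = [[1, 2, 3], [4, 2, 5]]
candidates = [[("cat", 0.9), ("cot", 0.8)], [("hot", 0.9), ("hat", 0.5)]]
print(select_consistent(words, candidates))  # ['cot', 'hot']
```

Taken independently, "cat" would win for the first word, but the shared class forces the consistent pair. Exhaustive enumeration grows exponentially with the number of words, so a practical system would restrict it to words that actually share same-form classes.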