JPH0652366A

JPH0652366A - Post-processing method for character recognition result

Info

Publication number: JPH0652366A
Application number: JP4203687A
Authority: JP
Inventors: Yoshitaka Hamaguchi; 佳孝濱口; Akitoshi Tsukamoto; 明利塚本; Sadamasa Hirogaki; 節正広垣
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1992-07-30
Filing date: 1992-07-30
Publication date: 1994-02-25

Abstract

PURPOSE: To retrieve candidate words efficiently and without consuming much time by making the dictionary-side character classification and the recognition-side character classification separate, based on the asymmetry of the error tendency. CONSTITUTION: Classes that correspond to each other as an error tendency in the dictionary-side and recognition-side character classifications are given the same class name. The difference between a dictionary-side class and the recognition-side class of the same name reflects the asymmetry of the error tendency. For example, in class B, because the character k may be erroneously recognized as the character t, the character t is placed in the recognition-side class that has the same class name as the dictionary-side class containing the character k; however, because the character t is not erroneously recognized as the character k, the character t is removed from the dictionary-side class. Likewise, in class A the recognition-side class contains the characters a, o, and c while the dictionary-side class contains the characters a and o, and in class C the recognition-side class contains the character c while the dictionary-side class contains the characters c and o, in accordance with the asymmetry.

Description

Detailed Description of the Invention

[0001] [Field of Industrial Application] The present invention relates to a post-processing method for character recognition results and, in particular, aims to improve recognition performance by searching for words in consideration of the error tendency of the recognition results.

[0002] [Prior Art] Some machine translation systems, for example, use a character recognition device as the input means in order to simplify input operations by the user. In a character recognition device applied in this way, recognition in units of words is important in addition to recognition in units of characters. Methods have already been proposed that yield a correct recognition result for each word even when the character-level recognition results contain errors.

[0003] One example is the following reference: Numakura et al., "Information Retrieval System Retrievable with Incorrect Key", Journal of Information Processing Society of Japan, Vol. 30, No. 11, pp. 1468-1478, November 1989.

[0004] The following describes the word search method according to the method disclosed in the above reference (processing that forms part of a post-processing method for character recognition results), which obtains a more accurate recognition result for the recognition target word from the recognition results of the individual characters that make up that word.

[0005] To apply this method, a character classification and a hierarchical word dictionary, organized by the class name notation of each word, must be prepared in advance. Here, the character classification is a classification of all characters into several classes based on the error tendency of the characters, and each class is given a class name. The class name notation of a word is the notation formed by arranging the class names of the classes to which the characters of the word belong.

[0006] When a word is recognized, a class name notation X is first created by replacing the recognition result of each character of the recognition target word with the corresponding class name from the character classification. Next, the class name notation Y that best matches the class name notation X is retrieved from the class name notations of the word dictionary. Finally, the words in the word dictionary that have the class name notation Y are taken as the search targets, the word that best matches the recognition result is retrieved, and that word is taken as the search result.
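The conventional search described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from the cited reference: the class table, the word list, and the use of a position-by-position match count as the "degree of match" are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical symmetric classes: every character belongs to exactly one class.
SYMMETRIC_CLASSES = {"A": "aoc", "B": "kt", "C": "e", "D": "b", "E": "l"}
CHAR_TO_CLASS = {ch: name for name, chars in SYMMETRIC_CLASSES.items() for ch in chars}


def notation(word):
    """Replace each character with the name of its class."""
    return "".join(CHAR_TO_CLASS[ch] for ch in word)


def matches(a, b):
    """Number of positions at which two strings agree (assumed match measure)."""
    return sum(x == y for x, y in zip(a, b))


def build_dictionary(words):
    """Group dictionary words by their class name notation."""
    grouped = defaultdict(list)
    for word in words:
        grouped[notation(word)].append(word)
    return grouped


def conventional_search(recognized, dictionary):
    """Find notation Y closest to X, then the best-matching words under Y."""
    x = notation(recognized)
    same_length = [y for y in dictionary if len(y) == len(x)]
    if not same_length:
        return []
    y = max(same_length, key=lambda n: matches(x, n))
    best = max(matches(recognized, w) for w in dictionary[y])
    return [w for w in dictionary[y] if matches(recognized, w) == best]


DICTIONARY = build_dictionary(["take", "tale", "bake", "cake"])
print(conventional_search("tate", DICTIONARY))  # ['take']
print(conventional_search("kake", DICTIONARY))  # ['take']
```

Because k and t share one class in this sketch, the recognition result "kake" also retrieves "take", which is the behavior the invention sets out to avoid.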

[0007] [Problems to Be Solved by the Invention] The conventional method described above works well for a character recognition device that exhibits a symmetric error tendency, in which a character A may be misrecognized as a character B and the character B may conversely be misrecognized as the character A. However, it is not necessarily satisfactory for a character recognition device that exhibits an asymmetric error tendency, in which a character A may be misrecognized as a character B but the character B is never misrecognized as the character A.

[0008] A character recognition device holds feature values of reference patterns and obtains a recognition result from the distance between those feature values and the feature values of the input pattern. When the patterns used to compute the reference feature values differ from the character patterns of the document to be recognized, an asymmetric error tendency often appears.

[0009] The character B mentioned above, although never mistaken for the character A, may be mistaken for a character C. Even if no errors occur between the characters A and C, the conventional method places the characters A, B, and C in the same class. As a result, more words in the word dictionary share the same class name notation. The number of candidate words obtained by searching the word dictionary therefore increases, and many unnecessary words that need not actually be considered are included among them. Consequently, finding the candidate words and determining the output word from them take a long time, and the output word that is determined is more often not the appropriate word, so the user has to make corrections more often.

[0010] For example, suppose that recognition is performed in which k may be misrecognized as t but t is never misrecognized as k. To express the error tendency of k being misrecognized as t, the conventional method must place k and t in the same class. Then, when the word "take" is misrecognized as "tate", the candidate words retrieved from the reference word "tate" include "take". With such a character classification, however, when the word "bake" is misrecognized as "kake", the candidate words retrieved from the reference word "kake" also include "take", because t and k are in the same class, and it becomes necessary to judge whether this candidate word "take" should be the output word.

[0011] The present invention has been made in view of the above points. It aims to provide a post-processing method for character recognition results that can search for candidate words efficiently and quickly even when character recognition has an asymmetric error tendency, and that can improve both the processing time and the accuracy of post-processing of recognition results.

[0012] [Means for Solving the Problems] To solve this problem, the present invention provides a post-processing method for character recognition results that uses a character classification, in which characters are classified based on their error tendency at recognition time and each class is given a class name, and a word dictionary that is hierarchized by class name notation, that is, by the sequence of the class names of the classes to which the characters of a word belong. The method includes a process of creating, from the character recognition results, a reference word to be referred to when searching the word dictionary, a process of obtaining the class name notation of the reference word, and a process of searching the word dictionary using the class name notation of the reference word as a key. The method is characterized in that the dictionary-side character classification, used to create the class name notations in the word dictionary, and the recognition-side character classification, used to obtain the class name notation from the reference word, are made separate on the basis of the asymmetry of the error tendency.

[0013] [Operation] The present invention is based on the observation that, if the word dictionary can be searched on the basis of an asymmetric error tendency, candidate words can be retrieved efficiently and quickly, and the processing time and accuracy of post-processing of recognition results can be improved. To enable such a search, the dictionary-side character classification, used to create the class name notations in the word dictionary, and the recognition-side character classification, used to obtain the class name notation from the reference word, are made separate on the basis of the asymmetry of the error tendency.

[0014] [Embodiment] An embodiment in which the post-processing method for character recognition results according to the present invention is applied to the recognition of English words is described in detail below with reference to the drawings.

[0015] Although not illustrated, in hardware terms this embodiment is realized in practice by, for example, a workstation equipped with an optical character reader (OCR). Its functional blocks are as shown in FIG. 2.

[0016] In FIG. 2, the document reading means 11 reads a document on paper as binary data. The character recognition means 12 cuts out each character region, each word region, and so on from the binary data, obtains the feature values of the binary pattern of each character region, obtains a recognition result for each character from the distance to the reference feature values prepared in advance for each character, and stores the results in the recognition result storage means 13 for each word (region). The reference word creating means 14 creates, from the per-character recognition results, a reference word to be referred to during the search. The character classification storage means 15 stores, together with class names, a character classification in which all characters are classified, with overlap allowed, based on the error tendency of each character. The word dictionary 16 stores at least the English words expected to appear in general text and is hierarchized by class name notation. The candidate word search means 17 obtains the class name notations made up of the class names of the characters of the reference word, retrieves from the word dictionary 16 the words whose class name notation matches one of them and that share the largest number of matching characters with the reference word, and temporarily stores the retrieved candidate words in the recognition result storage means 13. The output word determining means 18 determines the output word from among the candidate words using the original recognition results and has the result output means 21 print or display it.

[0017] FIG. 3 shows an example of the structure of the word dictionary. In the word dictionary, words of the same length are grouped together, and within each group the words with the same class name notation are further grouped together. A hash key is assigned to each class name notation to speed up the search. As the hash key, for example, the remainder obtained by dividing the sum of the ASCII codes of the class names (characters) in the class name notation by 32 is used.
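The example hash key given above (sum of the ASCII codes of the class names, modulo 32) can be written as a short function. The function name is an assumption made for this sketch; the class names are taken to be single ASCII letters, as in the embodiment.

```python
def hash_key(class_name_notation: str) -> int:
    """Sum of the ASCII codes of the class names, modulo 32."""
    return sum(ord(name) for name in class_name_notation) % 32


print(hash_key("DABC"))  # 68 + 65 + 66 + 67 = 266, and 266 % 32 = 10
print(hash_key("BABC"))  # 66 + 65 + 66 + 67 = 264, and 264 % 32 = 8
```

With only 32 possible values, distinct class name notations can share a hash key, so the notation itself presumably still has to be compared within a bucket.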

[0018] One feature of this embodiment is that the character classification used to create the class name notations in such a word dictionary (hereinafter, the dictionary-side character classification) differs from the character classification used to obtain the class name notation of the reference word derived from the recognition results (hereinafter, the recognition-side character classification).

[0019] FIG. 1 shows the dictionary-side character classification and the recognition-side character classification. Classes that correspond to each other as an error tendency in the two classifications are given the same class name. The difference between a dictionary-side class and the recognition-side class of the same name reflects the asymmetry of the error tendency. The recognition-side character classification focuses on grouping characters that have the same error tendency at recognition time. The dictionary-side character classification, on the other hand, excludes characters that, even though they have the same error tendency at recognition time, are better not treated as belonging to the same class when candidate characters are searched.

[0020] For example, in class B, the character k may be misrecognized as the character t, so the character t is placed in the recognition-side class that has the same class name as the dictionary-side class containing the character k; but the character t is never misrecognized as the character k, so the character t is left out of the dictionary-side class. Similarly, the characters a and o may be misrecognized as the character c, whereas the character c may be misrecognized as the character o but not as the character a. Therefore, in class A the recognition-side class contains the characters a, o, and c and the dictionary-side class contains the characters a and o, while in class C the recognition-side class contains the character c and the dictionary-side class contains the characters c and o, so that the asymmetry described above is accommodated.
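As a rough illustration of the two classifications, the fragment below encodes only the characters mentioned in the text. FIG. 1 is not reproduced here, so the tables are a partial, assumed reconstruction: the membership of k in recognition-side class B, of e in dictionary-side class C, and of t in dictionary-side class D are assumptions, and the helper name is hypothetical.

```python
# Partial, assumed reconstruction of the two classifications (not the full FIG. 1).
RECOGNITION_SIDE = {"A": "aoc", "B": "kt", "C": "ce", "D": "t"}
DICTIONARY_SIDE = {"A": "ao", "B": "k", "C": "coe", "D": "t"}


def class_names(character, classification):
    """Return every class name whose class contains the character."""
    return [name for name, chars in classification.items() if character in chars]


# A recognized t may stand for a true k (class B) or a true t (class D).
print(class_names("t", RECOGNITION_SIDE))  # ['B', 'D']
# In the dictionary, t is kept out of class B because t is never misread as k.
print(class_names("t", DICTIONARY_SIDE))   # ['D']
print(class_names("k", DICTIONARY_SIDE))   # ['B']
```

The asymmetry shows up as a difference between the two tables: t appears in recognition-side class B but not in dictionary-side class B.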

[0021] To search for and obtain candidate words, it suffices that at least the class names and the recognition-side character classification are stored. The dictionary-side character classification only needs to be available for forming the class name notations in the word dictionary; in this embodiment it is not used directly when searching for candidate words.

[0022] The processing flow of the method of this embodiment, which is realized by the functional blocks described above and uses the word dictionary and character classifications configured as above, is shown in FIGS. 4, 5, and 6.

[0023] First, the overall processing flow of the method of this embodiment is described with reference to FIG. 4.

[0024] Character regions, word regions, and so on are cut out from the document data converted into binary data, the feature values of the binary pattern (character pattern) of each character region are obtained, a recognition result for each character is obtained from the distance to the reference feature values prepared in advance for each character, and the recognition results of the characters are organized for each word region (step 100). FIG. 7 shows an example of character recognition results for the case where the input recognition target words (more precisely, their patterns) are "take" and "bake". Even when the character being read is the same character as the one used to create the reference feature values, the feature values differ because of differences in character pattern such as typeface, so the distance to the correct character is not necessarily the smallest, and an incorrect character recognition result may be obtained. Here, all characters whose distance is at or below a certain threshold are extracted as recognition results.

[0025] Once recognition results have been obtained in this way for each character of the word, a reference word is formed by concatenating the first-candidate character for each character position (step 101).

[0026] FIG. 8 is a table showing the information (reference word, class name notations, candidate words) obtained as the processing proceeds. As FIG. 8 shows, when the recognition results shown in FIG. 7 are obtained, the reference words are "tate" and "kake".

[0027] Once such a reference word has been obtained, its class name notations are obtained using the stored recognition-side character classification, and the word dictionary is searched using these class name notations as keys to obtain candidate words (step 102). The processing of this step is also a feature of this embodiment and is described in detail later.

[0028] Once one or more candidate words have been obtained in this way, the word to be output is determined, also using the recognition results obtained in step 100 (step 103). For this output word determination, the method described in the specification and drawings of Japanese Patent Application No. Hei 3-196509, for example, can be used. That is, the distance between the reference feature values of each character of a candidate word and the corresponding character of the recognition target word is computed, and the sum of the distances obtained for the individual characters is taken as the evaluation value of that candidate word. The candidate word with the smallest evaluation value is taken as the word to be output.
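The evaluation value described above, a sum of per-character distances, might be computed along the following lines. This is only a sketch: the feature vectors are invented two-dimensional values, and the Euclidean distance is an assumption, since the text does not specify the features or the distance measure.

```python
import math


def evaluation_value(candidate, input_features, reference_features):
    """Sum of distances between each input character pattern and the
    reference features of the corresponding candidate character."""
    return sum(
        math.dist(reference_features[ch], features)
        for ch, features in zip(candidate, input_features)
    )


def choose_output_word(candidates, input_features, reference_features):
    """Pick the candidate word with the smallest evaluation value."""
    return min(
        candidates,
        key=lambda word: evaluation_value(word, input_features, reference_features),
    )


# Hypothetical reference feature values, one vector per character.
REFERENCE = {
    "t": (0.0, 0.0), "k": (1.0, 0.0), "l": (3.0, 0.0),
    "a": (0.0, 1.0), "e": (1.0, 1.0),
}
# Hypothetical features of the four input patterns; the third lies between t and k.
INPUT = [(0.0, 0.0), (0.0, 1.0), (0.4, 0.0), (1.0, 1.0)]

print(choose_output_word(["take", "tale"], INPUT, REFERENCE))  # take
```

In this made-up data the third pattern is closest to t, so the reference word would be "tate", yet "take" still scores better than "tale" because k lies much nearer to the input pattern than l does.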

[0029] The determined output word is then output by printing or display, and the series of processing ends (step 104).

[0030] Next, the word dictionary search processing of step 102 described above is explained in detail with reference to FIGS. 5 and 6.

[0031] When the word dictionary search processing starts, all class name notations are first created by replacing each character of the reference word with the class name of a recognition-side class to which it belongs (step 200). This creation processing is described later.

[0032] Once one or more class name notations have been obtained in this way, the following loop is repeated for each class name notation. First, one class name notation is taken out (step 201), its hash value is computed, and, using this hash value as a key, the group of words in the word dictionary under that class name notation is examined and all words in it that have the largest number of characters matching the reference word are extracted (step 202). The number of matching characters found this time is then compared with the number of matching characters of the words registered as candidate words so far (step 203). If the two are equal, the words extracted for the current class name notation are added to the existing candidate words (step 204); if the current number of matching characters is larger, the previous candidate words are discarded and the words extracted this time are registered as the candidate words (step 205). When the current number of matching characters is smaller, or when the addition or replacement of candidate words has finished, it is checked whether any class name notations remain that have not yet been searched (step 206). If any remain, the processing returns to step 201 described above; when the search has finished for all class name notations, the processing proceeds to the output word determination described above (step 103).
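The loop of steps 201 to 206 can be sketched as below. The dictionary contents and their grouping by class name notation are hypothetical, and an ordinary Python dict keyed by the notation stands in for the hash-keyed word dictionary of FIG. 3.

```python
def match_count(a, b):
    """Number of positions at which two words have the same character."""
    return sum(x == y for x, y in zip(a, b))


def search_candidates(reference_word, notations, dictionary):
    """Collect the dictionary words with the most characters matching the
    reference word, over all class name notations of the reference word."""
    best_count = -1
    candidates = []
    for notation in notations:                      # steps 201 and 206
        words = dictionary.get(notation, [])
        if not words:
            continue
        count = max(match_count(reference_word, w) for w in words)  # step 202
        top = [w for w in words if match_count(reference_word, w) == count]
        if count == best_count:                     # steps 203 and 204
            candidates.extend(top)
        elif count > best_count:                    # steps 203 and 205
            candidates = top
            best_count = count
    return candidates


# Hypothetical dictionary, grouped by dictionary-side class name notation.
DICTIONARY = {"DABC": ["take"], "DADC": ["tote"], "EABC": ["bake"]}

print(search_candidates("tate", ["BABC", "DABC", "BADC", "DADC"], DICTIONARY))
# ['take', 'tote']
print(search_candidates("kake", ["BABC"], DICTIONARY))
# []
```

In this sketch "take" is reached from the reference word "tate" through the notation DABC, while the reference word "kake" never produces that notation, so "take" is not retrieved for it.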

[0033] In this embodiment a character may belong to more than one recognition-side class, so the creation of class name notations in step 200 is not as simple as in the conventional method and proceeds as shown in FIG. 6. The class name notation creation processing shown in FIG. 6 is recursive: at step 302, partway through, it calls the same class name notation creation processing of FIG. 6.

[0034] First, it is determined whether the number of characters in the reference word is 0 (step 300). If it is not 0, one character is removed from the end of the reference word and the result is taken as reference word x (step 302). The processing then proceeds, for reference word x, to the creation of class name notation x by the routine of FIG. 6 itself (step 302). If, on the other hand, the number of characters in the reference word is 0, the empty string is set as the class name notation and the processing ends (step 307).

[0035] When step 302 described above finishes, the following loop is repeated for each class of the recognition-side character classification. One recognition-side class is taken out, and it is determined whether the last character of the reference word belongs to that class (steps 303, 304). If it does, the class name is appended to the end of class name notation x and the result is added to the class name notations (step 305). When this addition has finished, or when the last character of the reference word does not belong to the class, it is checked whether all recognition-side classes have been processed (step 306). If not, the processing returns to step 303 described above; if so, the series of processing ends.
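The recursive creation of class name notations can be sketched as follows. The recognition-side table is a partial, assumed reconstruction limited to the characters discussed in the text (FIG. 1 is not reproduced), and the function name is hypothetical.

```python
# Partial, assumed recognition-side classification (not the full FIG. 1).
RECOGNITION_SIDE = {"A": "aoc", "B": "kt", "C": "ce", "D": "t"}


def class_name_notations(reference_word):
    """Return every class name notation of the reference word."""
    if len(reference_word) == 0:                 # steps 300 and 307
        return [""]
    prefix, last = reference_word[:-1], reference_word[-1]
    prefix_notations = class_name_notations(prefix)   # recursive call, step 302
    notations = []
    for name, chars in RECOGNITION_SIDE.items():       # steps 303 and 306
        if last in chars:                              # step 304
            # Append this class name to every notation of the prefix (step 305).
            notations.extend(p + name for p in prefix_notations)
    return notations


print(class_name_notations("tate"))  # ['BABC', 'DABC', 'BADC', 'DADC']
print(class_name_notations("kake"))  # ['BABC']
```

A character that belongs to two recognition-side classes doubles the number of notations, which is why "tate", with two occurrences of t, yields four of them.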

[0036] FIG. 9 shows an example of how the processing steps change until the processing of FIG. 6 completes, for the case where the initial reference word is "tate".

[0037] When the processing of FIG. 6 is first entered, the reference word has four characters, so "tat" is created as reference word x and the processing of FIG. 6 is entered again at step 302. This time the reference word "tat" has three characters, so "ta" is created as reference word x and the processing of FIG. 6 is entered again at step 302. This time the reference word "ta" has two characters, so "t" is created as reference word x and the processing of FIG. 6 is entered again at step 302. This time the reference word "t" has one character, so the empty string is created as reference word x and the processing of FIG. 6 is entered again at step 302. This time the number of characters in the reference word is 0, so the class name notation is set to the empty string at step 307, and control returns to step 303 of the FIG. 6 processing one recursion level up.

[0038] At this point the reference word is "t" and class name notation x is the empty string, so class name notations are created by appending the class names B and D of the classes that contain the character t to class name notation x, and control returns to step 303 of the FIG. 6 processing one recursion level up.

[0039] At this point the reference word is "ta" and class name notation x is B, D, so the class name notations BA and DA are created by appending the class name A of the class that contains the last character a of the reference word to each of B and D, and control returns to step 303 of the FIG. 6 processing one recursion level up.

[0040] At this point the reference word is "tat" and class name notation x is BA, DA, so the class name notations BAB, DAB, BAD, and DAD are created by appending the class names B and D of the classes that contain the last character t of the reference word to each of BA and DA, and control returns to step 303 of the FIG. 6 processing one recursion level up.

[0041] At this point the reference word is "tate" and class name notation x is BAB, DAB, BAD, DAD, so the class name notations BABC, DABC, BADC, and DADC are created by appending the class name C of the class that contains the last character e of the reference word to each of BAB, DAB, BAD, and DAD, and the creation of the series of class name notations is complete.

[0042] Based on the class name notations created in this way, the word dictionary is searched as described above to obtain candidate words.

[0043] FIG. 8, mentioned above, also shows the class name notations and candidate words obtained when the reference word is "tate" and when the reference word is "kake".

[0044] According to the above embodiment, the asymmetry of the error tendency is taken into account by making the dictionary-side character classification, used to create the class name notations of the word dictionary, separate from the recognition-side character classification, used to create class name notations from the reference word. The word dictionary can therefore be searched on the basis of an asymmetric error tendency.

[0045] For the reference word "tate" described above, the word "take", in which the second t is replaced with k, is also included in the candidate words. For the reference word "kake", on the other hand, "take", in which the first k is replaced with t, is not included in the candidate words. In other words, the search reflects the error tendency that k can be misrecognized as t, whereas t is not misrecognized as k.
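The asymmetric behavior described above can be reproduced with a small Python sketch. Everything in it is an assumption made for illustration: the two tables contain only a handful of characters, the dictionary-side placement of t in class D is inferred rather than stated, the word list is invented, and the helper names (`notations`, `build_index`, `candidates`) are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical classifications (assumptions, not the patent's full tables).
# Recognition side: a recognized "t" may really be a k, so t is in B and D.
RECOGNITION_SIDE = {"t": ["B", "D"], "k": ["B"], "a": ["A"], "e": ["C"]}
# Dictionary side: t is removed from class B, which still contains k.
DICTIONARY_SIDE = {"t": ["D"], "k": ["B"], "a": ["A"], "e": ["C"]}


def notations(word, table):
    """All class name notations of `word` under the given classification."""
    return {"".join(names) for names in product(*(table[ch] for ch in word))}


def build_index(words):
    """Index dictionary words by their dictionary-side notations."""
    index = defaultdict(set)
    for word in words:
        for key in notations(word, DICTIONARY_SIDE):
            index[key].add(word)
    return index


def candidates(reference_word, index):
    """Look up every recognition-side notation of the reference word."""
    found = set()
    for key in notations(reference_word, RECOGNITION_SIDE):
        found |= index.get(key, set())
    return sorted(found)


index = build_index(["take", "kate", "teak"])  # invented word list
print(candidates("tate", index))  # "take" is among the candidates
print(candidates("kake", index))  # "take" is not among the candidates
```

Because the two tables differ only in where t is placed, a recognized t can match a dictionary k, but a recognized k never matches a dictionary t.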

[0046] Searching on the basis of such an asymmetric error tendency prevents unnecessary words from becoming candidate words. This raises the accuracy of the determined output word, reduces the correction operations needed on the output word, and shortens the overall recognition processing time on average.

[0047] Although the above embodiment describes the case where English words are the recognition target, the present invention is also applicable when words of other languages are the recognition target. In Japanese, for example, word segmentation and word recognition are performed in parallel, but the present invention can still be applied.

[0048] In addition, although the above embodiment sets a single reference word to obtain candidate words, two or more reference words may be set from the character recognition result to obtain candidate words.

[0049]

[Effects of the Invention] As described above, the dictionary-side character classification used to create the class name notations in the word dictionary and the recognition-side character classification used to obtain the class name notation from a reference word are made separate, based on the asymmetry of the error tendency. As a result, even when character recognition has an asymmetric error tendency, candidate words can be retrieved efficiently without taking much time. This realizes a post-processing method for character recognition results that improves both the processing time and the accuracy of the post-processing applied to the recognition result.

[Brief description of drawings]

FIG. 1 is an explanatory diagram showing the dictionary-side character classification and the recognition-side character classification of the embodiment.

FIG. 2 is a functional block diagram showing a configuration to which the embodiment is applied.

FIG. 3 is an explanatory diagram showing the word dictionary structure of the embodiment.

FIG. 4 is a flowchart showing the overall processing of the embodiment.

FIG. 5 is a flowchart showing the dictionary search processing of the embodiment.

FIG. 6 is a flowchart showing the class name notation creation processing of the embodiment.

FIG. 7 is an explanatory diagram showing an example recognition result of the embodiment.

FIG. 8 is an explanatory diagram showing the information obtained at each processing stage of the embodiment.

FIG. 9 is an explanatory diagram of the class name notation creation processing of the embodiment.

[Explanation of symbols]

15: character classification storage means; 16: word dictionary.

Claims

[Claims]

1. A post-processing method for character recognition results, comprising: a process of creating, from a character recognition result, a reference word to be referred to when searching a word dictionary, the word dictionary being hierarchized by class name notations, where a class name is assigned to each class into which characters are classified based on error tendencies at the time of recognition and a class name notation is the sequence of the class names of the classes to which the characters forming a word belong; a process of obtaining the class name notation of the reference word; and a process of searching the word dictionary using the class name notation of the reference word as a key, characterized in that the dictionary-side character classification used to create the class name notations in the word dictionary and the recognition-side character classification used to obtain the class name notation from the reference word are separate classifications based on the asymmetry of the error tendency.