JPH0944604A

JPH0944604A - Character recognizing processing method

Info

Publication number: JPH0944604A
Application number: JP7216632A
Authority: JP
Inventors: Yoshitaka Hamaguchi; 佳孝濱口; Masashi Ito; 昌史伊藤; Yoshiji Maeno; 芳史前野; Makoto Torigoe; 真鳥越
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1995-08-02
Filing date: 1995-08-02
Publication date: 1997-02-14
Anticipated expiration: 2015-08-02
Also published as: JP3188154B2

Abstract

PROBLEM TO BE SOLVED: To narrow down the words subject to comparison and collation, and thereby shorten collation time, by selecting from among many words of the same length only those containing the same number of key characters. SOLUTION: Key characters, which have a low recognition error rate, are detected among the candidate characters. When such key characters are detected, a dictionary filter part 4 is generated to narrow down the words to be collated when they are read out of a word dictionary 3. The number of characters in the candidate character string is counted, and words having the same number of characters are extracted from the word dictionary 3. In this example, key character detection finds the key characters (e) and (s), with two occurrences of (e) and one of (s). Words having five characters, two occurrences of the key character (e) and one of the key character (s) are therefore selected from among the many words stored in the word dictionary 3. The number of words to be collated is thus reduced.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention]

The present invention relates to a character recognition processing method in which, when the image of a document is read and characters are extracted and recognized, the recognition result is automatically corrected using a word dictionary.

[0002]

[Description of the Related Art]

Devices that read handwritten or printed characters as image data, segment the image into individual characters, and recognize them are widely used to enter documents and other data into information processing equipment automatically and to carry out various kinds of arithmetic processing. Such a character recognition device recognizes the pattern of each character one at a time and compares it against a dictionary prepared in advance. Handwritten characters in particular do not always match the dictionary patterns exactly, so a certain number of recognition errors occur. For example, recognition of the character t may fail to determine whether it is t or l. In such cases, one or more candidate characters are first listed for each character. The character string forming one word is then collated with a word dictionary: words with the same number of characters as the string are taken from the word dictionary and compared one by one. The word with the highest matching rate is taken as the recognition result, which improves the correct reading rate. Such a post-processing technique is described in, for example, Japanese Examined Patent Publication No. Sho 61-20038.
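The conventional post-processing described above can be sketched roughly as follows. This is a minimal illustration, not the method of the cited publication: the function name `conventional_match`, the mini-dictionary, and the scoring by number of matching positions are all assumptions made for the example.

```python
def conventional_match(candidates, dictionary):
    """Compare a recognized word against every dictionary word of equal length.

    candidates: one list of candidate characters per character position.
    Returns the top-scoring words and the number of words compared.
    """
    same_length = [w for w in dictionary if len(w) == len(candidates)]

    def score(word):
        # Number of positions where the word agrees with some candidate.
        return sum(ch in cands for ch, cands in zip(word, candidates))

    if not same_length:
        return [], 0
    best = max(score(w) for w in same_length)
    return [w for w in same_length if score(w) == best], len(same_length)


# Hypothetical mini-dictionary; the candidates follow a five-letter example
# in which two positions are ambiguous.
dictionary = ["reset", "revel", "resin", "first", "burst",
              "seize", "laser", "table", "cat"]
candidates = [["r"], ["e"], ["s", "v"], ["e"], ["l", "t"]]
top, compared = conventional_match(candidates, dictionary)
print(top, compared)
```

Every word of the same length is scored, so the work grows with the number of such words in the dictionary, and more than one word can tie for the best score.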

[0003]

[Problems to Be Solved by the Invention]

The character recognition processing method described above has the following problem. The word dictionary stores in advance a large number of words that may be output as a result of character recognition. When many words are registered, the dictionary therefore contains a very large number of words with the same number of characters, and word collation takes a long time. This slows down character recognition processing. The present invention was made in view of this point. Its object is to set characters with a low recognition error rate as key characters, to narrow down the words to be collated by focusing on the number of key characters, their positions, and the like, and thereby to shorten the time required for comparison and collation.

[0004]

[Means for Solving the Problems]

<Structure> In the character recognition processing method of the present invention, an image of a document is read, the image is segmented into individual characters and subjected to recognition processing, and a word composed of the candidate character string obtained from the recognition processing is compared and collated with a group of words of the same number of characters extracted from a word dictionary, so as to post-process the recognition result. Key characters with a low recognition error rate are set in advance. For a word composed of a candidate character string containing such key characters, the number of key characters is counted, and the word is compared and collated with a group of words that are extracted from the word dictionary, have the same number of characters, and contain the same number of key characters.

<Operation> Even if many words have the same number of characters, the collation targets are narrowed down by selecting only those words that contain the same number of key characters.

[0005] <Structure> In another character recognition processing method of the present invention, an image of a document is read, the image is segmented into individual characters and subjected to recognition processing, and a word composed of the candidate character string obtained from the recognition processing is compared and collated with a group of words of the same number of characters extracted from a word dictionary, so as to post-process the recognition result. Key characters with a low recognition error rate are set in advance. For a word composed of a candidate character string containing such key characters, the position of each key character is detected, and the word is compared and collated with a group of words that are extracted from the word dictionary, have the same number of characters, and contain the key character at the same position.

<Operation> Selecting, from the group of words with the same number of characters, those with key characters at the same positions narrows down the words to be collated still further. The above means can also be applied to character strings obtained by speech recognition. In either case, it is preferable to choose as key characters those that appear in as many words as possible.

[0006]

[Embodiments of the Invention]

The present invention is described in detail below with reference to the embodiments shown in the drawings.

<Narrowing Down by Number of Key Characters> FIG. 1 is an explanatory diagram outlining the character recognition processing method of the present invention, and the outline is described first with reference to it. Input document 1 contains the text to be recognized; here a single word, "reset", is shown as an example. To recognize the characters that make up the word, input document 1 is first read as image data, and the characters are segmented from the image data one at a time. As a result, the five characters labeled 2-1 to 2-5 in the figure are segmented as the characters making up this word, and each is subjected to recognition processing such as pattern recognition. This recognition processing yields one or several candidate characters to be output as the recognition result for each character. Here, the candidates s and v are obtained for the third character 2-3, and the candidates l and t for the fifth character 2-5.

[0007] In the present invention, key characters, that is, characters with a low recognition error rate, are detected among these candidate characters. Which characters qualify as key characters depends on the recognition method and on the characteristics of the recognition device. On some devices, for example, the character e shown in the figure has a very low recognition error rate, while on others the character a may have a low recognition rate. The key characters are therefore set individually for each device. Because key characters are set in order to raise the character recognition rate, it is preferable to choose characters that appear relatively frequently across words. Accordingly, a character that appears in only a few words need not be set as a key character even if its recognition error rate is low.
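One way to picture the two selection criteria (low error rate, frequent appearance across words) is the sketch below. The patent does not give a selection procedure or any thresholds; the function `choose_key_characters`, the thresholds, and the error-rate figures are placeholder assumptions chosen only to make the example run.

```python
def choose_key_characters(error_rate, words, max_error, min_word_share):
    """Pick characters that are both reliably recognized and common.

    error_rate: per-character recognition error rate for one device.
    min_word_share: minimum fraction of dictionary words containing the character.
    """
    keys = []
    for ch, err in sorted(error_rate.items()):
        share = sum(ch in w for w in words) / len(words)
        if err <= max_error and share >= min_word_share:
            keys.append(ch)
    return keys


words = ["reset", "revel", "resin", "first", "burst", "seize", "laser", "ousel"]
# Placeholder error rates, NOT measured values from any real device.
error_rate = {"e": 0.01, "s": 0.02, "z": 0.01, "l": 0.20}
keys = choose_key_characters(error_rate, words, max_error=0.05, min_word_share=0.5)
print(keys)
```

With these placeholder figures, z is rejected despite its low error rate because it appears in only one word, and l is rejected because its error rate is too high.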

[0008] When such key characters are detected, a dictionary filter unit 4 is generated to narrow down the words to be collated as they are read out of the word dictionary 3 shown in the figure. The number of characters in the candidate character string is counted, and words with the same number of characters are taken from the word dictionary 3, because errors in the number of characters are considered extremely rare when a single word is recognized. Since the word being processed here is "reset", the conventional method would take every five-character word from the word dictionary 3. Key character detection here finds the key characters e and s, with two occurrences of e and one of s. Words that have five characters, two occurrences of the key character e, and one occurrence of the key character s are therefore selected from the many words stored in the word dictionary 3. This reduces the number of words to be collated, and "reset" is output as the post-processing result. In another embodiment described later, the number of words is narrowed down further by also taking the positions of the key characters into account.
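The narrowing step in this paragraph can be sketched as below. The text does not say which candidate is used when a position has several, so the sketch assumes key characters are counted on the first-ranked candidate at each position; that assumption, the names `key_counts` and `filter_by_key_counts`, and the mini-dictionary are all illustrative.

```python
from collections import Counter

KEY_CHARS = ("e", "s")  # key characters used in the example


def key_counts(chars):
    """Return the number of occurrences of each key character, in KEY_CHARS order."""
    counts = Counter(ch for ch in chars if ch in KEY_CHARS)
    return tuple(counts[k] for k in KEY_CHARS)


def filter_by_key_counts(candidates, dictionary):
    # Assumption: key characters are detected on the first-ranked candidate.
    first_choice = [cands[0] for cands in candidates]
    target = key_counts(first_choice)
    return [w for w in dictionary
            if len(w) == len(candidates) and key_counts(w) == target]


dictionary = ["reset", "revel", "resin", "first", "burst",
              "seize", "laser", "ousel"]
candidates = [["r"], ["e"], ["s", "v"], ["e"], ["l", "t"]]
selected = filter_by_key_counts(candidates, dictionary)
print(selected)
```

Only the words with five characters, two occurrences of e, and one of s remain, so the word matching step has far fewer words to compare.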

[0009] FIG. 2 shows a hardware block diagram for implementing the present invention. As shown in the figure, the invention is implemented with a device comprising a character recognition unit 6, a key character table 7, a key character detection unit 8, the word dictionary 3, the dictionary filter unit 4, a word matching unit 9, an output unit 10, and so on. The character recognition unit 6 reads the image of the input document, performs the character segmentation and character recognition described with reference to FIG. 1, and outputs the character codes of one or more candidate characters for each character. Its configuration is exactly the same as in a conventional device. The word matching unit 9 receives the recognition result from the character recognition unit 6 word by word, compares it one at a time with the words registered in the word dictionary 3 that are passed through the dictionary filter unit 4, and selects the word with the highest degree of matching as the candidate word. The degree of matching can be calculated by any of various well-known methods, such as the number of matching characters, the degree of matching in pattern comparison, or the rank of the candidate characters.
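As one illustration of a degree of matching based on candidate rank, the sketch below gives each position a weight of 1/rank. The patent only names candidate rank as one possible basis; the weighting, the function name `rank_score`, and the candidate ordering are assumptions.

```python
def rank_score(word, candidates):
    """Score a dictionary word against ranked candidates (best candidate first).

    A match with the first-ranked candidate adds 1, with the second 1/2, and so on.
    """
    score = 0.0
    for ch, cands in zip(word, candidates):
        if ch in cands:
            score += 1.0 / (cands.index(ch) + 1)
    return score


# Candidate lists for a five-character word; the order within each list is assumed.
candidates = [["r"], ["e"], ["s", "v"], ["e"], ["l", "t"]]
for word in ("reset", "revel", "resin"):
    print(word, rank_score(word, candidates))
```

With this particular ordering, "reset" and "revel" receive the same score, which shows that candidate rank alone does not always separate two plausible words.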

[0010] The key character detection unit 8 receives the recognition result from the character recognition unit 6 word by word and counts how many of the key characters registered in the key character table 7 occur in the word output as the recognition result. As described above, the key character table 7 stores and lists the preset key characters, here e and s. The table can be rewritten, for example, whenever the recognition method of the character recognition unit 6 is changed. The dictionary filter unit 4 uses the key characters and their counts output by the key character detection unit 8 to select, from the words registered in the word dictionary 3, those to be collated, and outputs them to the word matching unit 9. The word dictionary 3 is organized so that the dictionary filter unit 4 can select words quickly on the basis of the output of the key character detection unit 8, as described later with reference to FIG. 3.

[0011] The output unit 10 determines the character string to be output as the post-processing result on the basis of the candidate words output by the word matching unit 9. Its configuration is the same as that of a conventional post-processing device for character recognition. FIG. 3 illustrates an example of the word dictionary. When the key characters are set to, for example, e and s as described above, the word dictionary 3 is organized so that the relevant words can be retrieved efficiently from the number of key characters and the number of characters in the word. The example in FIG. 3 is a table excerpting only the portion that collects five-character words, classified by the key characters e and s. If the word dictionary is classified in advance by the number of key characters each word contains, such a table can be retrieved easily. In this figure, for example, the words with zero occurrences of the key character e and one occurrence of the key character s are "first", "burst", and so on. In the example of FIG. 1, the number of key characters e is 2 and the number of key characters s is 1. Two words, "seize" and "reset", are therefore extracted from the word dictionary shown in FIG. 3.
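A dictionary classified in advance by key character counts might be held as a lookup table keyed by word length and the count of each key character. The sketch below is one possible layout, not the structure of FIG. 3 itself; `build_count_index` and the word list are assumptions.

```python
from collections import defaultdict

KEY_CHARS = ("e", "s")


def build_count_index(words):
    """Group words under the key (length, count of e, count of s)."""
    index = defaultdict(list)
    for w in words:
        key = (len(w),) + tuple(w.count(k) for k in KEY_CHARS)
        index[key].append(w)
    return index


index = build_count_index(
    ["first", "burst", "seize", "reset", "revel", "resin", "table"])

# Five-character words with no e and one s.
print(index.get((5, 0, 1), []))
# Five-character words with two e and one s.
print(index.get((5, 2, 1), []))
```

Because the classification is done once when the dictionary is built, each lookup is a single table access rather than a scan over every word of the same length.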

[0012] Although key characters are characters with a high recognition rate, they may still be misrecognized. Candidate words may therefore be retrieved over a certain range, for example words with two occurrences of the key character e and zero to two occurrences of the key character s, together with words with one to three occurrences of the key character e and one occurrence of the key character s. Even so, the words to be collated are narrowed down sufficiently compared with simply retrieving every word with the same number of characters. In the example of FIG. 1, two candidate characters exist for each of the third character 2-3 and the fifth character 2-5. The character strings that can be formed by combining these candidates are therefore the four strings "resel", "revel", "reset", and "revet". When these are compared and collated with the words retrieved through the dictionary filter unit 4, "reset" is output as the candidate word.
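The enumeration of candidate strings and the tolerant count range can be sketched together as follows. Only one of the two example ranges (e between one and three, s exactly one) is shown, and the helper names `candidate_strings` and `within_ranges` and the mini-dictionary are assumptions.

```python
from itertools import product


def candidate_strings(candidates):
    """All strings formed by taking one candidate character per position."""
    return ["".join(chars) for chars in product(*candidates)]


def within_ranges(word, ranges):
    """True if each key character count lies in its inclusive (low, high) range."""
    return all(lo <= word.count(ch) <= hi for ch, (lo, hi) in ranges.items())


candidates = [["r"], ["e"], ["s", "v"], ["e"], ["l", "t"]]
strings = candidate_strings(candidates)

dictionary = ["reset", "revel", "resin", "first", "burst",
              "seize", "laser", "ousel"]
ranges = {"e": (1, 3), "s": (1, 1)}  # e: one to three, s: exactly one
filtered = [w for w in dictionary if len(w) == 5 and within_ranges(w, ranges)]
matches = [w for w in filtered if w in strings]
print(sorted(strings), filtered, matches)
```

The range lets through more words than an exact count would, yet "revel" is still excluded because it contains no s.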

[0013] As a result, the words taken from the word dictionary that actually undergo comparison are narrowed down sufficiently, and collation becomes faster. In addition, a candidate word in which a key character has been replaced by another character is less likely to be selected, so the correct reading rate improves. For example, if collation is performed against all five-character words, the two words "revel" and "reset" are output as candidate words from the dictionary shown in FIG. 3. A matching rate is computed for each candidate character during pattern recognition, so one character is output as the first candidate because of its higher matching rate and the others as second candidates. Even so, such recognition processing not infrequently ends with "revel" wrongly chosen as the final candidate word. When the key characters e and s are used to narrow down the collation targets according to the present invention, however, "revel" does not become a candidate word, and this misrecognition does not occur.

[0014] <Narrowing Down by Key Character Position> The embodiment above narrows down the words to be collated by the number of key characters. However, for key characters with a low recognition error rate, not only their number but also their positions within the word are highly reliable. The next embodiment therefore narrows down the words by the positions of the key characters. For this purpose, the key character detection unit 8 shown in FIG. 2 is configured to accept the recognition result output by the character recognition unit 6 word by word, detect the positions of the key characters, and output them to the dictionary filter unit 4. The dictionary filter unit 4 is configured to use the key character positions to retrieve from the word dictionary 3 the words that have the same number of characters and a key character at the corresponding position, and to send them to the word matching unit 9.

[0015] FIG. 4 illustrates an example of the word dictionary used in this embodiment. In this figure the key characters are e and s, and words are arranged according to the positions of those characters. That is, for five-character words, the figure lists the words in which the key character e occupies the first through fifth position, respectively. The entry "0" holds the words that do not contain the key character e. The same applies to the key character s. The operation is described concretely below, again using the example of FIG. 1.

[0016] Among the candidate characters obtained by the recognition processing in the example of FIG. 1, the key character e appears in the second character 2-2 and the fourth character 2-4, as described above, and the key character s appears in the third character 2-3. This detection result is sent to the dictionary filter unit 4. The dictionary filter unit 4 then retrieves from the word dictionary shown in FIG. 4 the words in which the key character e is at the second and fourth positions and the key character s is at the third position. The result is the four words shown in the figure: "resin", "reset", "ousel", and "laser". In other words, although very many words have the same number of characters, as the figure shows, focusing on the positions of the key characters narrows down the collation targets in this way. Note that, in the word dictionary shown in FIG. 4, a word with the same key character at more than one position is registered in several places. For example, the word "reset" has the key character e at both the second and the fourth position, so it may be registered in both places and may therefore be retrieved twice. In that case, only one of the entries is extracted.

[0017] In this way, the words to be compared and collated are narrowed down effectively, just as when the number of key characters is used. The words may also be narrowed down by focusing on both the number of key characters and their positions.
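One simple way to use both the number and the positions of key characters at once is to compare a mask in which every non-key character is blanked out. This is only an illustrative reading of the combined filter; the mask representation, the names `key_mask` and `filter_by_mask`, and the use of the first-ranked candidate string are assumptions.

```python
KEY_CHARS = ("e", "s")


def key_mask(chars):
    """Keep key characters in place and replace every other character with '.'."""
    return "".join(ch if ch in KEY_CHARS else "." for ch in chars)


def filter_by_mask(first_choice, dictionary):
    # Two words share a mask only if their key characters agree in count and position.
    target = key_mask(first_choice)
    return [w for w in dictionary
            if len(w) == len(first_choice) and key_mask(w) == target]


dictionary = ["reset", "resin", "ousel", "laser", "seize", "revel"]
# "resel" stands for the string of first-ranked candidates (assumed ordering).
selected = filter_by_mask("resel", dictionary)
print(key_mask("resel"), selected)
```

Requiring an identical mask is the strictest combination; a tolerant variant could relax either the counts or the positions.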

[0018] The present invention is not limited to the embodiments above. Although the embodiments describe post-processing that uses the result of character recognition, the present invention can be applied in the same way to any device, such as a speech recognition device, that converts arbitrary data into a character code string or the like and compares the output code string with a dictionary.

[Brief description of drawings]

FIG. 1 is an explanatory diagram of the character recognition processing method of the present invention.

FIG. 2 is a hardware block diagram for implementing the present invention.

FIG. 3 is an explanatory diagram of a word dictionary example (No. 1).

FIG. 4 is an explanatory diagram of a word dictionary example (No. 2).

[Explanation of symbols]

1 Input document
2-1 to 2-5 Characters to be recognized
3 Word dictionary
4 Dictionary filter unit

Continuation of front page: (72) Inventor Makoto Torigoe, c/o Oki Electric Industry Co., Ltd., 1-7-12 Toranomon, Minato-ku, Tokyo

Claims

[Claims]

1. A character recognition processing method in which an image of a document is read, the image is segmented into individual characters and subjected to recognition processing, and a word composed of the candidate character string obtained as a result of the recognition processing is compared and collated with a group of words of the same number of characters extracted from a word dictionary, so as to post-process the recognition result, the method being characterized in that key characters having a low recognition error rate are set in advance, the number of key characters is counted for a word composed of a candidate character string containing the key characters, and the word is compared and collated with a group of words that are extracted from the word dictionary, are composed of the same number of characters, and contain the same number of key characters.

2. A character recognition processing method in which an image of a document is read, the image is segmented into individual characters and subjected to recognition processing, and a word composed of the candidate character string obtained as a result of the recognition processing is compared and collated with a group of words of the same number of characters extracted from a word dictionary, so as to post-process the recognition result, the method being characterized in that key characters having a low recognition error rate are set in advance, the position of each key character is detected for a word composed of a candidate character string containing the key characters, and the word is compared and collated with a group of words that are extracted from the word dictionary, are composed of the same number of characters, and contain the key character at the same position.

3. The character recognition processing method according to claim 1, wherein the result of recognition processing in character units instead of the image of the document is compared and collated with a word group.

4. The character recognition processing method according to claim 1, wherein only characters having a high appearance rate are set as key characters.