JPS59128682A

JPS59128682A - Character reader

Info

Publication number: JPS59128682A
Application number: JP58003002A
Authority: JP
Inventors: Haruo Mizukami; 水上　治雄; Masataka Yamamoto; 山本　勝敬
Original assignee: Computer Basic Technology Research Association Corp
Current assignee: Computer Basic Technology Research Association Corp
Priority date: 1983-01-12
Filing date: 1983-01-12
Publication date: 1984-07-24

Abstract

PURPOSE: To recognize characters easily and to improve recognition accuracy by using both the similarity of an input character pattern to the reference character patterns in a recognition dictionary and the appearance frequency of each character or character subset at each character position on a business form. CONSTITUTION: A character written on the business form 1 is scanned by a scanning means 2 and converted into an electrical signal, which a preprocessing means 3 processes by binarization or the like. A feature extracting means 4 extracts the features of the character pattern, and a similarity calculating means 5 matches these features against the reference character patterns previously stored in the recognition dictionary 6. The resulting similarities are passed to a discriminating means 8, which decides the character corresponding to the pattern using the appearance frequencies of characters, or of subsets, at the position on the business form 1 currently being read; these frequencies are stored in an appearance frequency storing means 7. The read-out result 9 is output and also sent to an appearance frequency updating means 10, which updates the appearance frequencies.

Description

[Technical Field of the Invention]

The present invention relates to a character reading device that, when recognizing characters recorded on a business form or similar document, recognizes and reads each character using both the similarity between the input character pattern and the reference character patterns in a recognition dictionary and the appearance frequency of characters or character subsets at each character position on the form.

[Prior art]

Conventionally, this type of character reading device recognizes and reads a character using only the similarity between the unknown character pattern and the reference character patterns in the recognition dictionary. Consequently, when the recognition targets include many character types, such as kanji, the recognition rate drops because many of the characters resemble one another. A method that specifies the character type or subset to be read for each reading field of a form has also been proposed, but it has the drawback that creating the form format is extremely troublesome.

[Summary of the invention]

The present invention was made to eliminate these drawbacks of the conventional devices. It rests on the observation that, on forms other than those used for free-text input, each character position has characters or subsets that are likely to appear there and others that are unlikely to appear. Characters are therefore recognized using both the similarity between the input character pattern and the reference character patterns in the recognition dictionary and the appearance frequency of characters or subsets at each character position on the form. The object of the invention is to provide a character reading device that improves character recognition accuracy in this way.

[Embodiments of the invention]

An embodiment of the present invention is described below. The drawing is a block diagram of a character reading device according to one embodiment of the invention. In the figure, 1 is a form; 2 is a scanning means that scans the characters on form 1 and converts them into electrical signals; 3 is a preprocessing means that obtains binary character patterns and the like from the electrical signals of the characters obtained by scanning means 2; 4 is a feature extraction means that extracts features from the character patterns preprocessed by preprocessing means 3; 6 is a recognition dictionary storing the features of the reference patterns of the characters to be recognized; and 5 is a similarity calculation means that calculates similarities from the features extracted by feature extraction means 4 and the features of the reference character patterns in recognition dictionary 6. Further, 7 is an appearance frequency storage means that stores, for each character position on form 1, the appearance frequency of each character to be recognized, or the appearance frequency of subsets such as kanji, alphabetic characters, numerals, katakana, hiragana, and symbols; 8 is a determination means; 9 is a reading result; and 10 is an appearance frequency updating means that updates the appearance frequency of characters or subsets at each character position of form 1 based on the character recognition results.
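The patent does not say how preprocessing means 3 binarizes the signal or which features feature extraction means 4 computes. As a minimal sketch, assuming a fixed grey-level threshold for binarization and zone (mesh) ink densities as the feature vector, these two stages might look like this; `binarize`, `zone_density_features`, `threshold`, and `grid` are hypothetical names chosen for illustration.

```python
from typing import List

Pattern = List[List[int]]  # row-major 2-D grid of pixel values

def binarize(gray: Pattern, threshold: int = 128) -> Pattern:
    """Assumed preprocessing: grey levels (0 = black) become 1 for ink, 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def zone_density_features(pattern: Pattern, grid: int = 2) -> List[float]:
    """Assumed features: split the pattern into grid x grid zones and
    return the fraction of ink pixels in each zone, read row by row."""
    h, w = len(pattern), len(pattern[0])
    if h < grid or w < grid:
        raise ValueError("pattern is smaller than the zone grid")
    features = []
    for zr in range(grid):
        r0, r1 = zr * h // grid, (zr + 1) * h // grid
        for zc in range(grid):
            c0, c1 = zc * w // grid, (zc + 1) * w // grid
            cells = [pattern[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cells) / len(cells))
    return features

gray = [[0, 0, 255, 255],
        [0, 0, 255, 255],
        [255, 255, 255, 255],
        [255, 255, 0, 255]]
print(zone_density_features(binarize(gray)))  # [1.0, 0.0, 0.0, 0.25]
```

Zone densities are only one common choice; any fixed-length feature vector would fit the same slot in the pipeline.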

Next, the operation of the character reading device configured as shown in the drawing is described. First, scanning means 2 scans the characters written on form 1 according to a predetermined form format and converts them into electrical signals. Preprocessing means 3 applies preprocessing such as binarization to the electrical signals obtained by scanning means 2 to obtain character patterns. Feature extraction means 4 extracts the features of each character pattern preprocessed by preprocessing means 3. Similarity calculation means 5 matches the features of the input character pattern $P_i$ obtained by feature extraction means 4 against the features of every reference character pattern $P_j$ stored in recognition dictionary 6 and obtains the similarity $S_{ij}$. Appearance frequency storage means 7 stores, for each character to be recognized, its appearance frequency $F_{ij}$ at the position on form 1 where the character currently being read is written. Determination means 8 uses a function $f(S_{ij}, F_{ij})$, computed from the similarity $S_{ij}$ and the appearance frequency $F_{ij}$, to determine which character the input character pattern $P_i$ is, or whether to reject the read. The reading result 9 obtained by determination means 8 is then sent to appearance frequency updating means 10, which updates the character appearance frequencies. However, if reading result 9 is a rejection, it is not sent to appearance frequency updating means 10.
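The patent leaves both the similarity measure and the decision function $f(S_{ij}, F_{ij})$ unspecified. The sketch below assumes cosine similarity between feature vectors, a linear score $S_{ij} + \alpha F_{ij}$ with $F_{ij}$ normalized to a relative frequency, and a fixed rejection threshold; `decide`, `read_and_update`, `alpha`, and `reject_below` are hypothetical. It also reproduces the stated rule that a rejected read is not fed back to the frequency counts.

```python
import math
from typing import Dict, List, Optional

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Assumed similarity measure S_ij between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def decide(features: List[float],
           dictionary: Dict[str, List[float]],
           counts: Dict[str, int],
           alpha: float = 0.2,
           reject_below: float = 0.5) -> Optional[str]:
    """Return the best candidate character, or None to reject the read.

    Each reference pattern j is scored with the assumed
    f(S_ij, F_ij) = S_ij + alpha * F_ij, where F_ij is the relative
    frequency of character j at the current position.
    """
    total = sum(counts.values())
    best_char: Optional[str] = None
    best_score = -math.inf
    for char, reference in dictionary.items():
        s = cosine_similarity(features, reference)
        f = counts.get(char, 0) / total if total else 0.0
        score = s + alpha * f
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= reject_below else None

def read_and_update(features: List[float],
                    dictionary: Dict[str, List[float]],
                    counts: Dict[str, int]) -> Optional[str]:
    """Recognize one character; only accepted results update the counts."""
    result = decide(features, dictionary, counts)
    if result is not None:
        counts[result] = counts.get(result, 0) + 1
    return result

dictionary = {"0": [1.0, 0.0], "1": [0.0, 1.0]}
counts: Dict[str, int] = {}
print(read_and_update([0.9, 0.1], dictionary, counts))  # 0
print(read_and_update([0.0, 0.0], dictionary, counts))  # None (rejected)
print(counts)                                           # {'0': 1}
```

A linear score is only the simplest way to combine the two quantities; a product of similarity and a smoothed frequency would slot into `decide` just as easily.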

As a concrete example, forms 1 with the same format are usually read in succession.

In this case, the appearance frequencies of all characters are initially 0, so the device reads the ID (identification label) of form 1 and recognizes characters according to the corresponding form format using similarity alone. It then updates the character appearance frequencies with reading result 9. Even if reading result 9 contains misreads, practical character recognition accuracy is already in the 90-percent range or higher, so misreads have little adverse effect on the appearance frequencies. As the appearance frequencies accumulate, characters are recognized using both the similarity and the appearance frequency. For example, suppose that for the character pattern recorded at some character position on form 1, the katakana "ri" and the numeral "7" become candidate characters with about the same similarity. If, at that character position, the numeral "7" has an overwhelmingly higher appearance frequency than the katakana "ri", reading result 9 is set to the numeral "7". Higher recognition accuracy is thus obtained than when characters are recognized using similarity alone. When a form 1 with a different ID arrives, characters are recognized using the appearance frequencies for the form format corresponding to that ID.
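The cold start, the separate tables per form ID, and the "7" versus "ri" tie-break described above can be illustrated with a minimal sketch that keeps one counter per (form ID, character position). It again assumes the score is similarity plus `alpha` times relative frequency; `FormFrequencyTable`, `choose`, the form IDs, and the similarity values are all hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

class FormFrequencyTable:
    """Hypothetical store: one Counter of accepted results per (form ID, position)."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, int], Counter] = {}

    def relative(self, form_id: str, position: int, char: str) -> float:
        counts = self._counts.get((form_id, position))
        if not counts:
            return 0.0  # cold start: no history for this form and position
        return counts[char] / sum(counts.values())

    def update(self, form_id: str, position: int, char: str) -> None:
        self._counts.setdefault((form_id, position), Counter())[char] += 1

def choose(similarities: Dict[str, float], table: FormFrequencyTable,
           form_id: str, position: int, alpha: float = 0.5) -> str:
    """Pick the candidate maximizing the assumed score S + alpha * F."""
    return max(similarities,
               key=lambda c: similarities[c] + alpha * table.relative(form_id, position, c))

table = FormFrequencyTable()
sims = {"7": 0.80, "リ": 0.81}  # numeral 7 vs katakana "ri": nearly tied
print(choose(sims, table, "FORM-A", 3))  # cold start, similarity only: リ
for _ in range(20):
    table.update("FORM-A", 3, "7")      # accepted reads at position 3 of FORM-A
print(choose(sims, table, "FORM-A", 3))  # frequency breaks the tie: 7
print(choose(sims, table, "FORM-B", 3))  # different form ID, no history: リ
```

Keying the table on the form ID is what lets a different form format start from its own (initially empty) history instead of inheriting another form's statistics.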

Although the above embodiment makes the determination using the similarity and the character appearance frequency, characters may instead be recognized using subset appearance frequencies, which indicate, for each character position on form 1, which of the subsets (kanji, alphabetic characters, numerals, katakana, hiragana, symbols, and so on) is likely to appear there.
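In the subset variant, each candidate's frequency term comes from the subset it belongs to rather than from the character itself. This sketch assumes subsets are assigned by Unicode code-point range (ASCII and full-width digits as numerals, ASCII letters as alphabetic, U+3040 to U+309F as hiragana, U+30A0 to U+30FF as katakana, CJK Unified Ideographs U+4E00 to U+9FFF as kanji, everything else as symbols) and reuses the assumed score similarity plus `alpha` times relative frequency; `subset_of` and `choose_by_subset` are hypothetical names.

```python
from collections import Counter
from typing import Dict

def subset_of(ch: str) -> str:
    """Assign one character to an assumed subset by Unicode code-point range."""
    cp = ord(ch)
    if "0" <= ch <= "9" or 0xFF10 <= cp <= 0xFF19:  # ASCII or full-width digits
        return "numeral"
    if ch.isascii() and ch.isalpha():
        return "alphabetic"
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:                     # CJK Unified Ideographs
        return "kanji"
    return "symbol"

def choose_by_subset(similarities: Dict[str, float], subset_counts: Counter,
                     alpha: float = 0.5) -> str:
    """Score each candidate with S + alpha * (relative frequency of its subset)."""
    total = sum(subset_counts.values())

    def score(ch: str) -> float:
        freq = subset_counts[subset_of(ch)] / total if total else 0.0
        return similarities[ch] + alpha * freq

    return max(similarities, key=score)

history = Counter({"numeral": 30, "katakana": 2})  # subsets read at this position so far
print(choose_by_subset({"7": 0.80, "リ": 0.81}, history))  # 7
```

Counting by subset needs far fewer observations per position than counting individual characters, which matters when the character set includes thousands of kanji.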

[Effect of the invention]

As explained above, to recognize characters recorded on a form or the like, this invention uses the similarity between the features extracted from the input character pattern and the features of the reference character patterns in the recognition dictionary, together with the appearance frequency of characters or subsets at each character position on the form. It therefore achieves the notable effect of further improving character recognition accuracy compared with conventional character reading devices of this type, which recognize characters using similarity alone.

[Brief explanation of drawings]

The drawing is a block diagram showing a character reading device according to an embodiment of the present invention. In the figure: 1, form; 2, scanning means; 3, preprocessing means; 4, feature extraction means; 5, similarity calculation means; 6, recognition dictionary; 7, appearance frequency storage means; 8, determination means; 9, reading result; 10, appearance frequency updating means.

Agent: Shinichi Kuzuno

Claims

A character reading device that recognizes and reads characters recorded on a form or the like, comprising: scanning means that scans the characters according to a predetermined form format and converts them into electrical signals; preprocessing means that obtains binary character patterns from the electrical signals of the characters obtained by the scanning means; feature extraction means that extracts features from the character patterns preprocessed by the preprocessing means; a recognition dictionary storing the features of the reference patterns of the characters to be recognized; appearance frequency storage means that stores the appearance frequency of each character to be recognized at each character position on the form, or the appearance frequency of subsets such as kanji, alphabetic characters, numerals, katakana, hiragana, and symbols; similarity calculation means that calculates a similarity from the features extracted by the feature extraction means and the features of the reference character patterns in the recognition dictionary; and appearance frequency updating means that updates the appearance frequency of characters or subsets at each character position of the form based on the character recognition result; the character reading device being characterized in that it recognizes characters using the similarity and the appearance frequency.