JPS59128682A - Character reader - Google Patents

Character reader


Publication number
JPS59128682A
JPS59128682A JP58003002A JP300283A JPS59128682A JP S59128682 A JPS59128682 A JP S59128682A JP 58003002 A JP58003002 A JP 58003002A JP 300283 A JP300283 A JP 300283A JP S59128682 A JPS59128682 A JP S59128682A
Authority
JP
Japan
Prior art keywords
character
appearance frequency
characters
similarity
subset
Legal status
Pending
Application number
JP58003002A
Other languages
Japanese (ja)
Inventor
Haruo Mizukami
水上 治雄
Masataka Yamamoto
山本 勝敬
Current Assignee
Computer Basic Technology Research Association Corp
Original Assignee
Computer Basic Technology Research Association Corp
Priority date
1983-01-12
Filing date
1983-01-12
Publication date
1984-07-24
Application filed by Computer Basic Technology Research Association Corp
Priority to JP58003002A
Publication of JPS59128682A


Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To make character recognition easier and more accurate by recognizing each character from both the similarity of the input character pattern to the reference character patterns in a recognition dictionary and the appearance frequency of each character, or character subset, at each character position on a business form.
CONSTITUTION: A character written on the business form 1 is scanned by a scanning means 2 and converted into an electrical signal, which a preprocessing means 3 processes by binarization or the like. A feature extracting means 4 extracts the features of the character pattern, and a similarity calculating means 5 matches these features against the reference character patterns stored in advance in the recognition dictionary 6. The similarities are passed to a discriminating means 8, which decides the character corresponding to the pattern using the appearance frequencies, stored in an appearance frequency storing means 7, of characters or subsets at the form position currently being read. The read-out result 9 is output and also sent to an appearance frequency updating means 10, which updates the appearance frequencies.

Description

[Detailed Description of the Invention]
[Technical Field of the Invention]
The present invention relates to a character reading device that, when recognizing characters recorded on a business form or the like, recognizes and reads each character using both the similarity between the input character pattern and the reference character patterns in a recognition dictionary and the appearance frequency of characters or character subsets at each character position on the form.

[Prior Art]

Conventional character reading devices of this type recognize and read a character using only the similarity between the unknown character pattern and the reference character patterns in a recognition dictionary. When the recognition targets span many character types, including kanji, many characters resemble one another, so the recognition rate drops. A method has also been proposed in which the character type or subset to be read is specified for each reading field of the form, but it has the drawback that creating the form format becomes extremely tedious.

[Summary of the Invention]

The present invention was made to eliminate these drawbacks of the conventional devices. It rests on the observation that, on forms other than those used for free-text input, each character position has characters or subsets that are likely to appear there and others that are unlikely to appear. Accordingly, the invention provides a character reading device that recognizes characters using both the similarity between the input character pattern and the reference character patterns in the recognition dictionary and the appearance frequency of characters or subsets at each character position on the form, thereby improving character recognition accuracy.

[Embodiments of the Invention]

An embodiment of the invention is described below. The drawing is a block diagram of a character reading device according to one embodiment of the invention. In the figure, 1 is a form; 2 is a scanning means that scans the characters on form 1 and converts them into electrical signals; 3 is a preprocessing means that derives a binary pattern or the like of each character from the electrical signal obtained by scanning means 2; 4 is a feature extraction means that extracts features from the character pattern preprocessed by preprocessing means 3; 6 is a recognition dictionary storing the features of the reference pattern of each character to be recognized; 5 is a similarity calculation means that calculates the similarity between the features extracted by feature extraction means 4 and the features of the reference character patterns in recognition dictionary 6; 7 is an appearance frequency storage means that stores, for each character position on form 1, the appearance frequency of each character to be recognized, or the appearance frequency of subsets such as kanji, alphabetic characters, digits, katakana, hiragana, and symbols; 8 is a determination means; 9 is the reading result; and 10 is an appearance frequency updating means that updates the appearance frequency of characters or subsets at each character position of form 1 based on the character recognition results.
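The patent does not say how preprocessing means 3 binarizes the pattern or which features feature extraction means 4 computes. As a hedged illustration only, the sketch below assumes a fixed global threshold for binarization and simple zoning features (the ink density of each cell in a grid laid over the pattern); the function names and parameter values are hypothetical.

```python
def binarize(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    # Assumed stand-in for preprocessing means 3:
    # dark pixels (below threshold) become 1 (ink), light pixels become 0.
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def zoning_features(binary: list[list[int]], zones: int = 2) -> list[float]:
    # Assumed stand-in for feature extraction means 4: split the pattern into
    # a zones x zones grid and use the fraction of ink pixels in each cell.
    h, w = len(binary), len(binary[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * h // zones, (zr + 1) * h // zones
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cells = [binary[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats
```

Zoning is only one of many possible feature sets; whatever features are chosen, the reference patterns in recognition dictionary 6 must be described by the same features so that the similarity calculation can compare them.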

Next, the operation of the character reading device configured as shown in the drawing is described. First, scanning means 2 scans the characters written on form 1 according to a predetermined form format and converts them into electrical signals. Preprocessing means 3 applies preprocessing such as binarization to these signals to obtain character patterns. Feature extraction means 4 extracts the features of each preprocessed character pattern. Similarity calculation means 5 matches the features of the input character pattern $P_i$ obtained by feature extraction means 4 against the features of every reference character pattern $P_j$ stored in recognition dictionary 6 to obtain the similarity $S_{ij}$. Appearance frequency storage means 7 stores, for each character to be recognized, its appearance frequency $F_{ij}$ at the position on form 1 of the character currently being read. Determination means 8 uses a function $f(S_{ij}, F_{ij})$ of the similarity $S_{ij}$ and the appearance frequency $F_{ij}$ to decide which character the input pattern $P_i$ is, or whether to reject it. The reading result 9 obtained by determination means 8 is then sent to appearance frequency updating means 10, which updates the character appearance frequencies. However, if reading result 9 is a rejection, it is not sent to appearance frequency updating means 10.
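The patent leaves both the similarity measure and the combining function f unspecified. A minimal sketch of the determination and update steps follows, assuming cosine similarity for S_ij, a linear combination f(S, F) = S + weight * F with F taken as a relative frequency, and a reject rule based on an absolute score threshold plus a margin between the top two candidates. All names and parameter values here are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Assumed similarity measure S_ij between feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def decide(features, dictionary, freqs, weight=0.5, threshold=0.6, margin=0.05):
    """Return the recognized character, or None to reject the reading.

    features:   feature vector of the input pattern P_i
    dictionary: {char: reference feature vector of P_j}
    freqs:      {char: relative appearance frequency F_ij at this position}
    """
    # Assumed f(S, F) = S + weight * F.
    scores = {
        char: cosine_similarity(features, ref) + weight * freqs.get(char, 0.0)
        for char, ref in dictionary.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_char, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else -math.inf
    # Assumed reject rule: score too low, or top two too close to call.
    if best < threshold or best - runner_up < margin:
        return None
    return best_char


def read_and_update(features, dictionary, freq_counts, weight=0.5):
    """One pass of the determination step plus the frequency update.

    freq_counts holds raw counts {char: n} for the current form position.
    """
    total = sum(freq_counts.values())
    freqs = {c: n / total for c, n in freq_counts.items()} if total else {}
    result = decide(features, dictionary, freqs, weight)
    if result is not None:  # rejected readings do not update the counts
        freq_counts[result] = freq_counts.get(result, 0) + 1
    return result
```

With an input that is equally similar to two reference patterns, an empty history leads to a rejection, while a history dominated by one of the two candidates resolves the tie in its favor.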

As a concrete example, in practice forms 1 that share the same format are often read one after another.

In this case, the appearance frequency is initially 0 for every character, so the device reads the ID (identification label) of form 1 and, following the form format associated with that ID, recognizes characters using similarity alone. The appearance frequencies are then updated from reading result 9. Even if reading result 9 is a misread, practical character recognition accuracy is above 90 percent, so misreads have little adverse effect on the appearance frequencies. Once the appearance frequencies have accumulated, characters are recognized using both the similarity and the appearance frequency. For example, suppose that for the character pattern recorded at some character position on form 1, the katakana 「り」 and the digit 「7」 become candidates with about the same similarity. If, among the appearance frequencies at that position, the frequency of 「7」 is overwhelmingly larger than that of 「り」, reading result 9 is set to 「7」. In other words, higher recognition accuracy is obtained than when characters are recognized using similarity alone. When a form 1 with a different ID arrives, characters are recognized using the appearance frequencies of the form format corresponding to that ID.
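The cold-start behavior and the per-form-ID tables described above can be sketched as follows. This assumes the similarities have already been computed, keeps one table of counts per (form ID, character position), uses an assumed weight for the frequency term, and omits the reject rule for brevity; `tables` and `recognize` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical frequency tables: tables[form_id][position] -> Counter of chars.
tables = defaultdict(lambda: defaultdict(Counter))


def recognize(form_id, position, similarities, weight=0.5):
    """Choose a character from precomputed similarities {char: S_ij} plus the
    relative appearance frequencies for this form ID and position.

    With no history (all counts 0) this reduces to similarity alone,
    matching the cold-start phase. The weight value is an assumption.
    """
    counts = tables[form_id][position]
    total = sum(counts.values())

    def score(char):
        freq = counts[char] / total if total else 0.0
        return similarities[char] + weight * freq

    best = max(similarities, key=score)
    counts[best] += 1  # feed the accepted result back into the table
    return best


# Four forms with ID "A" where position 3 clearly reads as 「7」.
for _ in range(4):
    recognize("A", 3, {"7": 0.95, "り": 0.40})
```

After that warm-up, an ambiguous pattern at position 3 of form "A" (similarity 0.80 for 「7」, 0.82 for 「り」) is read as 「7」, while the same pattern on a form with a different ID, whose table is still empty, falls back to similarity alone and is read as 「り」.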

Although the above embodiment makes the decision using the similarity and the character appearance frequency, characters may instead be recognized using subset appearance frequencies, that is, for each character position on form 1, how likely each subset (kanji, alphabetic characters, digits, katakana, hiragana, symbols, and so on) is to appear there.
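A minimal sketch of the subset variant, assuming each candidate character is mapped to a subset by Unicode block (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, CJK Unified Ideographs U+4E00 to U+9FFF for kanji) and that ASCII letters and digits cover the alphabetic and digit subsets. The subset names, `subset_of`, `pick_with_subsets`, and the weight are hypothetical.

```python
def subset_of(ch: str) -> str:
    # Assumed mapping from a character to its subset.
    code = ord(ch)
    if "0" <= ch <= "9":
        return "digit"
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return "alpha"
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "symbol"


def pick_with_subsets(similarities, subset_counts, weight=0.5):
    """Choose from {char: similarity} using subset counts for this position."""
    total = sum(subset_counts.values())

    def score(char):
        share = subset_counts.get(subset_of(char), 0) / total if total else 0.0
        return similarities[char] + weight * share

    return max(similarities, key=score)
```

One plausible benefit of this variant is that each position needs only a handful of counters instead of one per recognizable character, so the table fills up after fewer forms. For example, with subset counts {"digit": 19, "alpha": 1} at a position, an ambiguous pattern scoring 0.80 for the digit "0" and 0.82 for the letter "O" is read as "0".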

[Effect of the Invention]

As described above, to recognize characters recorded on a form or the like, this invention uses both the similarity between the features extracted from the input character pattern and the features of the reference character patterns in the recognition dictionary, and the appearance frequency of characters or subsets at each character position on the form. It therefore achieves a marked improvement in character recognition accuracy over conventional character reading devices of this type, which recognize characters using similarity alone.

[Brief Description of the Drawings]

The drawing is a block diagram of a character reading device according to one embodiment of the invention. In the figure: 1, form; 2, scanning means; 3, preprocessing means; 4, feature extraction means; 5, similarity calculation means; 6, recognition dictionary; 7, appearance frequency storage means; 8, determination means; 9, reading result; 10, appearance frequency updating means.
Agent: Shinichi Kuzuno (葛野信一)

Claims (1)

[Claims]
A character reading device that recognizes and reads characters recorded on a form or the like, comprising: scanning means for scanning the characters according to a predetermined form format and converting them into electrical signals; preprocessing means for obtaining a binary pattern or the like of each character from the electrical signals obtained by the scanning means; feature extraction means for extracting features from the character pattern preprocessed by the preprocessing means; a recognition dictionary storing the features of the reference pattern of each character to be recognized; appearance frequency storage means for storing the appearance frequency of each character to be recognized at each character position on the form, or the appearance frequency of subsets such as kanji, alphabetic characters, digits, katakana, hiragana, and symbols; similarity calculation means for calculating a similarity from the features extracted by the feature extraction means and the features of the reference character patterns in the recognition dictionary; and appearance frequency updating means for updating the appearance frequency of characters or subsets at each character position on the form based on the character recognition results; wherein characters are recognized using the similarity and the appearance frequency.
JP58003002A 1983-01-12 1983-01-12 Character reader Pending JPS59128682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58003002A JPS59128682A (en) 1983-01-12 1983-01-12 Character reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58003002A JPS59128682A (en) 1983-01-12 1983-01-12 Character reader

Publications (1)

Publication Number Publication Date
JPS59128682A true JPS59128682A (en) 1984-07-24

Family

ID=11545151

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58003002A Pending JPS59128682A (en) 1983-01-12 1983-01-12 Character reader

Country Status (1)

Country Link
JP (1) JPS59128682A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01219971A (en) * 1988-02-29 1989-09-01 Fujitsu Ltd Character recognizing system
JPH03223986A (en) * 1989-12-26 1991-10-02 Fuji Facom Corp Method for rejecting recognized result


Similar Documents

Publication Publication Date Title
US5077805A (en) Hybrid feature-based and template matching optical character recognition system
JPS6140684A (en) Contour tracking device
JPH0564834B2 (en)
JP2004280334A (en) Image reading device
JPS59128682A (en) Character reader
JPH11328315A (en) Character recognizing device
JPS6316795B2 (en)
JP2985813B2 (en) Character string recognition device and knowledge database learning method
JPH0475557B2 (en)
JP2894111B2 (en) Comprehensive judgment method of recognition result in optical type character recognition device
JP2630261B2 (en) Character recognition device
JPH09114926A (en) Method and device for rough classifying input characters for on-line character recognition
JP2959054B2 (en) Line type discrimination method in pattern recognition device
JPS60238986A (en) Pattern matching system of character recognition device
JPS6111886A (en) Character recognition system
JP2746345B2 (en) Post-processing method for character recognition
JP2930996B2 (en) Image recognition method and image recognition device
JP2973898B2 (en) Character recognition method and device
JPS6160184A (en) Optical character reader
JP2828820B2 (en) Fingerprint collation device
JPH05274464A (en) Electronic filing device
JPH11288461A (en) Method for recognizing image and recording medium
JPH0576674B2 (en)
JP2000288478A (en) Address specifying device
JPH0578068B2 (en)