JPH0193876A - Character reader - Google Patents

Character reader

Info

Publication number
JPH0193876A
Authority
JP
Japan
Prior art keywords
character
characters
word
candidate
similar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP62249782A
Other languages
Japanese (ja)
Inventor
Akizo Kadota
門田 彰三
Masao Yamamoto
雅夫 山本
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to JP62249782A
Publication of JPH0193876A
Legal status: Pending (current)

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To improve the accuracy of word matching results by adding, to the recognition candidates obtained by a character reading means, a plurality of characters similar to those candidates. CONSTITUTION: When a character recognition means 2 outputs a plurality of recognition candidate characters, a candidate selection means 4 looks up a similar-character table 5 using each recognition candidate character as a key. The characters registered in the similar-character table 5 are also treated as candidate characters. A word matching means 6 then collates the candidate characters against a word dictionary 7 and outputs the best-matching word as a matching result word 8.

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to a character reading device that recognizes and reads characters written on a form. In particular, it relates to a character reading device equipped with character-recognition post-processing that can accurately recognize input characters by using a word dictionary.

[Conventional Technology]

Conventional character recognition methods improve the accuracy of the results read by a character reading device as described in Japanese Examined Patent Publication No. Sho 61-20058. Multiple candidates are output and collated with a word dictionary, and the best-matching word is output.

[Problems to Be Solved by the Invention]

However, the above method requires that the recognition candidates include a character of the correct category. If they do not, the word is quite likely to be misread.
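The failure mode just described can be illustrated with a minimal Python sketch of lattice-style word matching. It rests on a hypothetical simplification that is not taken from JP 61-20058: a dictionary word matches only if every one of its characters appears among the candidates at its position, and matching words are ranked by summed similarity. The function name `lattice_match`, the candidate values, and the two-word dictionary are invented for illustration.

```python
def lattice_match(lattice, dictionary):
    """Return the dictionary word whose every character appears among the
    candidates at its position, maximising summed similarity; None if none fits.

    lattice: one list of (character, similarity) pairs per character position.
    """
    best_word, best_score = None, float("-inf")
    for word in dictionary:
        if len(word) != len(lattice):
            continue
        score = 0.0
        for ch, candidates in zip(word, lattice):
            sims = [sim for cand, sim in candidates if cand == ch]
            if not sims:
                break  # the word's character is not a candidate at this position
            score += max(sims)
        else:  # no break: every position matched
            if score > best_score:
                best_word, best_score = word, score
    return best_word


# Hypothetical reading of the word 市立: the correct 市 is missing at position 0.
lattice = [[("布", 0.90), ("帝", 0.85)], [("立", 0.95), ("位", 0.70)]]
print(lattice_match(lattice, ["市立", "帝位"]))  # 帝位 (a wrong word)
```

Because 市 never appears at position 0, the correct word 市立 cannot be matched at all. The only word that fits the candidates, 帝位, is returned instead.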

On the other hand, the number of character types a character reading device can read is limited: only the Jōyō (commonly used) kanji or the JIS Level 1 kanji can be read. Characters written on forms, however, may include other characters, for example JIS Level 2 kanji.

Forbidding writers to use these characters would place an unnecessary burden on them. Moreover, because the character set spans several thousand characters, it is practically impossible for a writer to remember which characters the character reading device can read.

An object of the present invention is to provide means for performing word matching without misrecognition. This must hold even when the correct character is not among the recognition candidate characters output by the character reading device.

[Means for Solving the Problems]

The above object is achieved by providing the character reading device with candidate selection means. This means adds, to the recognition candidates obtained by the character reading means, a plurality of similar characters that resemble them. The device then selects, from among the words held in the word dictionary, the word with the greatest similarity on the basis of the candidate group thus created.
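A minimal Python sketch of the claimed arrangement follows, under stated assumptions. The similar-character table is a plain dict from a candidate character to its similar characters. Each added character receives the similarity of the candidate that led to it, multiplied by a discount factor `PENALTY`. The patent does not specify how added characters are scored, so the discount, the table contents, and the names `expand` and `select_word` are all hypothetical.

```python
# Hypothetical similar-character table: candidate character -> similar characters.
SIMILAR = {"布": ["市", "帝", "希", "巾"]}
PENALTY = 0.9  # assumed discount applied to characters added from the table


def expand(lattice, similar, penalty=PENALTY):
    """Candidate selection: add similar characters to each position's candidates."""
    expanded = []
    for candidates in lattice:
        group = dict(candidates)  # character -> similarity
        for ch, sim in candidates:
            for alt in similar.get(ch, []):
                # If alt is already a candidate, keep the higher of the two scores.
                group[alt] = max(group.get(alt, 0.0), sim * penalty)
        expanded.append(list(group.items()))
    return expanded


def select_word(lattice, dictionary):
    """Word matching: pick the word with the greatest summed similarity."""
    best_word, best_score = None, float("-inf")
    for word in dictionary:
        if len(word) != len(lattice):
            continue
        scores = [dict(c).get(ch) for ch, c in zip(word, lattice)]
        if None in scores:
            continue  # some character of the word is not a candidate
        if sum(scores) > best_score:
            best_word, best_score = word, sum(scores)
    return best_word


lattice = [[("布", 0.90), ("帝", 0.85)], [("立", 0.95), ("位", 0.70)]]
print(select_word(expand(lattice, SIMILAR), ["市立", "帝位"]))  # 市立
```

With the table applied, 市 enters the first position's candidates. The correct word 市立 (score about 1.76) now outscores 帝位 (1.55).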

[Operation]

In general, when characters of one category are misread as other categories, those other categories are not very numerous and are often limited to specific ones. Therefore, if these misreadings are measured in advance and a similar-character table is built from them, the correct character is very likely to appear in the similar-character table.
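The measurement step could be sketched as follows. The sketch assumes the measurement is a labelled test run that yields (true character, character read) pairs. The table is keyed by the character the device outputs, because at run time the lookup key is a recognition candidate. Each entry lists the true characters most often misread as that key. The function name, `top_k`, `min_count`, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def build_similar_table(observations, top_k=4, min_count=1):
    """observations: (true_char, read_char) pairs from a labelled test run.

    Returns read_char -> true characters misread as read_char, most frequent first.
    """
    confusions = defaultdict(Counter)
    for true_ch, read_ch in observations:
        if true_ch != read_ch:  # only misreadings contribute
            confusions[read_ch][true_ch] += 1
    table = {}
    for read_ch, counts in confusions.items():
        table[read_ch] = [ch for ch, n in counts.most_common(top_k) if n >= min_count]
    return table


observations = [("市", "布"), ("市", "布"), ("希", "布"), ("帝", "布"),
                ("立", "立"), ("位", "立")]
print(build_similar_table(observations))
# {'布': ['市', '希', '帝'], '立': ['位']}
```

The text argues that confusions concentrate on a few categories. If that holds, a small `top_k` per key should keep the table compact while still covering most observed misreadings.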

Non-standard characters (gaiji), meaning characters not registered in the standard patterns, can be handled the same way. The device is made to read them, and the characters they tend to be misread as are recorded. The non-standard characters can then also be entered in the similar-character table.
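Registering a non-standard character can reuse the same kind of table. This is a minimal sketch that assumes the measurement produced a list of the standard categories the non-standard character was misread as. The function name `add_gaiji` is hypothetical. The example character 髙, a variant of 高 commonly handled as gaiji, and its misreading are also invented for illustration.

```python
def add_gaiji(table, gaiji, misread_as):
    """Register a non-standard character (gaiji) under each standard category
    it was observed to be misread as, so that a lookup keyed by that category
    also proposes the gaiji."""
    for read_ch in misread_as:
        entries = table.setdefault(read_ch, [])
        if gaiji not in entries:  # avoid duplicate registrations
            entries.append(gaiji)
    return table


table = {"布": ["市", "希", "帝"]}
add_gaiji(table, "髙", ["高"])  # hypothetical measurement: 髙 was read as 高
print(table)  # {'布': ['市', '希', '帝'], '高': ['髙']}
```

Suppose the word dictionary contains a word spelled with the gaiji, for example the surname 髙橋. That word could then be matched even though the recognizer itself never outputs 髙.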

Accordingly, when the character reading device outputs a plurality of recognition candidate characters, the similar-character table is looked up with each recognition candidate character as a key. The characters registered in the table are also used as candidate characters. The candidate characters then contain the correct character with high probability, and word matching no longer goes wrong.
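The claim that the expanded candidate set contains the correct character with high probability can be checked empirically. This is a minimal sketch that assumes a set of labelled test cases of the form (true character, recognition candidates). The function name `coverage`, the cases, and the table are invented, so the printed fractions only illustrate the computation and are not measured results.

```python
def coverage(cases, similar=None):
    """Fraction of cases whose true character is among the candidates,
    optionally after adding similar characters looked up by each candidate."""
    if not cases:
        return 0.0
    hits = 0
    for true_ch, candidates in cases:
        pool = set(candidates)
        if similar is not None:
            for ch in candidates:
                pool.update(similar.get(ch, []))
        hits += true_ch in pool  # True counts as 1
    return hits / len(cases)


similar = {"布": ["市", "帝", "希", "巾"]}
cases = [("市", ["布", "帝"]), ("立", ["立", "位"]), ("希", ["布"]), ("場", ["湯"])]
print(coverage(cases), coverage(cases, similar))  # 0.25 0.75
```

The remaining miss (場 read as 湯) shows that expansion helps only for confusions that were observed when the table was built.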

[Embodiment]

An embodiment of the present invention is described below with reference to FIG. 1.

In FIG. 1, reference numeral 1 denotes a form on which characters are written. The form is captured and read by the character recognition means 2. The character recognition means 2 performs recognition using a recognition dictionary 3 and outputs recognition candidates. Each recognition candidate is assumed to consist of a category name, a mask number, a degree of similarity, and the like. When the standard pattern of a category is composed of multiple masks, the mask number indicates which mask produced the match. It is unnecessary when only one mask is used per category. Reference numeral 6 denotes word matching means. It collates the candidate characters against a word dictionary 7 and outputs the best-matching word as a matching result word 8.

The candidate selection means 4 uses the recognition candidates from the character recognition means 2 to look up the similar-character table 5. It then creates the group of candidate characters to be input to the word matching means 6.

FIG. 2 shows an example of the recognition candidates output by the character recognition means 2. The recognition candidates consist of the number of candidates NC followed by NC entries. Each entry is made up of a category name, a mask number, and a degree of similarity.
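One possible in-memory form of the FIG. 2 record is sketched below. The flat layout `[NC, name, mask number, similarity, ...]` and the names `Candidate` and `parse_candidates` are assumptions, because the patent only lists the fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    category: str      # category name of the recognised character
    mask_no: int       # which mask of the category matched (unused with one mask per category)
    similarity: float  # degree of similarity reported by the recogniser


def parse_candidates(record):
    """Decode a hypothetical flat record [NC, cat1, mask1, sim1, ..., catNC, maskNC, simNC]."""
    nc, fields = record[0], record[1:]
    if len(fields) != 3 * nc:
        raise ValueError(f"NC={nc} requires {3 * nc} fields, got {len(fields)}")
    return [Candidate(fields[i], fields[i + 1], fields[i + 2])
            for i in range(0, len(fields), 3)]


print(parse_candidates([2, "布", 1, 0.90, "帝", 1, 0.85]))
```

Checking NC against the field count catches truncated records before they reach candidate selection.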

FIG. 3 shows an example of the similar-character table 5.

The similar-character table is composed of a category name, a mask number, the number of similar characters NS, and NS similar-character category names.
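The FIG. 3 layout maps naturally onto a dictionary keyed by (category name, mask number). This is a minimal sketch that assumes the rows arrive as (category, mask number, NS, list of similar categories). The function names `build_table` and `lookup` and the row contents are hypothetical.

```python
def build_table(rows):
    """rows: (category, mask_no, ns, similars) -> {(category, mask_no): similars}."""
    table = {}
    for category, mask_no, ns, similars in rows:
        if len(similars) != ns:
            raise ValueError(f"{category}/{mask_no}: NS={ns} but {len(similars)} names given")
        table[(category, mask_no)] = list(similars)
    return table


def lookup(table, category, mask_no):
    """Similar categories for one recognition candidate; empty if none registered."""
    return table.get((category, mask_no), [])


table = build_table([("布", 1, 4, ["市", "帝", "希", "巾"]), ("立", 1, 1, ["位"])])
print(lookup(table, "布", 1))  # ['市', '帝', '希', '巾']
print(lookup(table, "布", 2))  # []
```

Including the mask number in the key lets two masks of the same category point to different lists of similar characters.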

The operation is explained below with a concrete example. Suppose that reading the character '市' written on a form yields the candidates shown in FIG. 2. Because the correct answer '市' is not among these candidates, word matching may erroneously match another word.

Next, the first-ranked recognition candidate, i.e., category name '布' with mask number '1', is searched for in the similar-character table 5. The similar-character table shown in FIG. 3 contains this entry, and four similar characters are found. Of these, '帝' is already among the recognition candidates, so the remaining similar characters are added to the candidate table.
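The walk-through can be reproduced with a short sketch. The text names only '布' (mask number 1) as the top candidate and '帝' as a candidate already present. The full contents of FIGS. 2 and 3 are not reproduced here, so the entries 希 and 巾 are hypothetical stand-ins, as is the function name `add_similar`. The text looks up only the first-ranked candidate. The sketch applies the same lookup to every candidate in rank order, which gives the same result when only the top candidate has a table entry.

```python
def add_similar(candidates, table):
    """candidates: ranked (category, mask_no) pairs from the recogniser.

    Returns the candidate categories followed by similar categories from the
    table, skipping any that are already in the candidate group.
    """
    group = [category for category, _ in candidates]
    for category, mask_no in candidates:
        for alt in table.get((category, mask_no), []):
            if alt not in group:
                group.append(alt)
    return group


table = {("布", 1): ["市", "帝", "希", "巾"]}  # hypothetical FIG. 3 entry
candidates = [("布", 1), ("帝", 1)]           # hypothetical FIG. 2 candidates
print(add_similar(candidates, table))  # ['布', '帝', '市', '希', '巾']
```

As in the text, '帝' is skipped because it is already a candidate, and the correct '市' joins the final candidate group.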

The final candidate table is shown in FIG. 4. It contains the correct category '市', so word matching should no longer go wrong.

The above example covered the case in which the correct character is missing from the candidates. The same approach clearly works when the written character is a non-standard character, provided that non-standard characters are entered in the similar-character table.

[Effects of the Invention]

According to the present invention, similar characters can be inferred from the recognition candidates and added as candidates. Even when the correct character is not among the recognition candidates, the candidate characters therefore contain it with high probability, which improves the accuracy of the word matching result. In addition, including non-standard characters in the similar-character table lets word matching succeed even when non-standard characters are written.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention. FIG. 2 is an explanatory diagram showing an example of recognition candidates. FIG. 3 is an explanatory diagram showing an example of the similar-character table. FIG. 4 is an explanatory diagram showing an example of a candidate group created from the recognition candidates and the similar-character table.
1: form; 2: character recognition means; 4: candidate selection means; 6: word matching means.
Agent: Katsuo Ogawa, Patent Attorney

Claims (1)

[Claims]

1. A character reading device comprising:
character reading means for recognizing characters written on a form; and
matching means for detecting that characters read by the reading means match a word registered in a word dictionary;
the character reading device being characterized by further comprising candidate selection means for adding, to the recognition candidates obtained by the character reading means, a plurality of similar characters that resemble those candidates, and by selecting, from among the words held in the word dictionary, the word with the greatest similarity on the basis of the candidate group thus created.
JP62249782A 1987-10-05 1987-10-05 Character reader Pending JPH0193876A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP62249782A JPH0193876A (en) 1987-10-05 1987-10-05 Character reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP62249782A JPH0193876A (en) 1987-10-05 1987-10-05 Character reader

Publications (1)

Publication Number Publication Date
JPH0193876A true JPH0193876A (en) 1989-04-12

Family

ID=17198147

Family Applications (1)

Application Number Title Priority Date Filing Date
JP62249782A Pending JPH0193876A (en) 1987-10-05 1987-10-05 Character reader

Country Status (1)

Country Link
JP (1) JPH0193876A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6477653B2 (en) 1998-01-08 2002-11-05 Fujitsu Limited Information storage system

Similar Documents

Publication Publication Date Title
Grefenstette et al. What is a word, what is a sentence?: problems of Tokenisation
JPH0193876A (en) Character reader
JPS61114388A (en) Character input device
Foley Fuzzy merges: examples and techniques
JP2560959B2 (en) Post-processing method for character recognition
JPS63268083A (en) Word recognizing device
JP2996823B2 (en) Character recognition device
JPS63138479A (en) Character recognizing device
JP2746899B2 (en) Character recognition device
JPS63782A (en) Pattern recognizing device
JP2712260B2 (en) Character recognition device
JP3151866B2 (en) English character recognition method
JPS61107486A (en) Character recognition post-processing system
JPS6095689A (en) Optical character reader
JP2570784B2 (en) Document reader post-processing device
JP2839515B2 (en) Character reading system
JPS6160189A (en) Optical character reader
JP3245415B2 (en) Character recognition method
JPS5953986A (en) Character recognizing device
JPS6118080A (en) Character recognizer
JPS6336487A (en) Character reading system
JPH0442382A (en) Word reader
JPS6121581A (en) Character recognizer
JPH0546806A (en) Character recognition method
JPS5847066B2 (en) character recognition device