JPS5847066B2

JPS5847066B2 - character recognition device

Info

Publication number: JPS5847066B2
Application number: JP54054907A
Authority: JP
Inventors: 正広大川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1979-05-04
Filing date: 1979-05-04
Publication date: 1983-10-20
Also published as: JPS55157077A

Description

[Detailed Description of the Invention] The present invention relates to a character recognition device that compares the feature bit pattern of a handwritten character with the contents of a dictionary and outputs a plurality of answer candidates, and that separately creates a reading target character table and checks the candidates against it to obtain the final answer.

Conventionally, in this type of character recognition device, as shown for example in FIG. 1, handwritten characters written on a form 1 are read by the scanner of the reading mechanism 2 of an optical character reading (OCR) device 10 and stored in the memory 3 as video data.

The stored video data is sent to the feature extraction unit 4 and analyzed immediately; the features of the character are extracted, and a bit pattern of the read character is created and stored in the register 5.

Meanwhile, a memory in the OCR device 10 holds the features of a large number of handwritten numerals, kana, letters, and various symbols in all their written forms, extracted in advance and stored as bit patterns.

This stored group of bit patterns is called the dictionary 6.

As soon as the bit pattern of a handwritten character read from the form 1 has been created, it is compared one by one with the bit patterns of the characters in the dictionary 6.

If the bit pattern of the read character matches the bit pattern of a character in the dictionary 6, that dictionary character is output as the reading result.

If it matches none of the bit patterns in the dictionary 6, the character is output as a reject character.
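
The conventional lookup described above can be sketched as follows. This is only an illustration under stated assumptions, not the circuit of FIG. 1: the 8-bit pattern values, the dictionary contents, and the names DICTIONARY, REJECT, and recognize_conventional are all hypothetical.

```python
from typing import Optional

# Hypothetical dictionary 6: character -> feature bit pattern.
# The bit values are made up for illustration only.
DICTIONARY = {
    "7": 0b1110_0100,
    "8": 0b1111_1111,
    "B": 0b1111_1011,
}

REJECT = None  # stands in for the "reject character"


def recognize_conventional(pattern: int, dictionary: dict[str, int]) -> Optional[str]:
    """Compare the input pattern with each dictionary entry in turn.

    Returns the first character whose stored pattern is identical,
    or REJECT when nothing in the dictionary matches.
    """
    for char, stored in dictionary.items():
        if stored == pattern:
            return char
    return REJECT
```

Because the result depends only on the shape-derived pattern, this scheme has no way to use prior knowledge about which character types are expected on the form.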

Many current character recognition devices handle numerals, kana, letters, special symbols, and so on, either as a single character type or with several types mixed; in the latter case the number of kinds and combinations of similar-looking characters increases.

For example, the numeral "7" and the kana "su" and "wa", or the numeral "8" and the letter "B", often cannot be told apart when written by hand.

However, the conventional feature extraction unit decides solely by comparison with the features stored in the dictionary and outputs a single answer.

In this case, even when the character to be read is known to be a numeral, a single dictionary means the decision is made from the character shape alone, so a similar-looking kana may be output instead.

Such errors can be prevented, and the correct character recognized, by preparing several kinds of dictionaries for different reading targets and selecting the dictionary for the current reading target in advance.

However, since a dictionary is a memory of enormous capacity made up of character bit patterns, holding and controlling several of them clearly complicates the configuration and raises the cost.

An object of the present invention is to provide a character recognition device that can recognize handwritten characters in which numerals, kana, letters, and so on are mixed, without error, using a single dictionary.

To achieve this object, the character recognition device of the present invention is characterized in that: a dictionary storing character bit patterns extracted from the features of handwritten characters is provided; the video data of a read handwritten character is fed to a feature extraction and judgment calculation unit, which extracts the features, compares them with the dictionary, assigns a character-type priority order to the characters obtained by the judgment calculation, and outputs a plurality of answer candidates; a reading target character table storing the character types of the handwritten characters to be input is created; and the plurality of answer candidates are fed to a matching circuit, compared with the reading target character table, and the answer candidate that matches a character type stored in the table is output as the final answer.

The present invention is described in detail below with reference to an embodiment.

FIG. 2 is an explanatory diagram showing the configuration of an embodiment of the present invention.

In the figure, for example, a form 1 is inserted into the reading mechanism 2 of the OCR device 10 of FIG. 1, the handwritten characters are read and stored in the memory 3, and the stored video data is fed to the feature extraction and judgment calculation unit 11 shown in FIG. 2. As in FIG. 1, the features are extracted and compared with the contents of the dictionary (memory) 6, and as a result of the judgment calculation the first through n-th candidates are output as answer candidates.

That is, the dictionary 6 stores character bit patterns of the features of numerals, kana, letters, special symbols, and so on, and these are compared with the features of the video data of the input character.

In this case, a priority is assigned to each feature item, and a plurality of answers A1 to An are output.

For example, suppose that among the similar characters read, priority is assigned in the order numerals, kana, letters according to the feature content. Then the candidate closest to the features of the numeral "7" mentioned above is given first priority, and the subsequent priorities are given to the kana "su" and the kana "wa" in order of how closely each matches.
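
One way to picture this prioritization is the sketch below. It rests on assumptions the patent does not state: closeness is measured here by Hamming distance between bit patterns, candidates farther than a made-up threshold are dropped, and character type is the primary sort key with closeness as the tie-breaker. The names Entry, TYPE_PRIORITY, SAMPLE_DICTIONARY, and rank_candidates, and all bit values, are hypothetical.

```python
from dataclasses import dataclass

# Assumed character-type priority: numerals, then kana, then letters.
TYPE_PRIORITY = {"numeral": 0, "kana": 1, "letter": 2}


@dataclass(frozen=True)
class Entry:
    char: str       # the character (kana are romanized here)
    char_type: str  # "numeral", "kana", or "letter"
    pattern: int    # feature bit pattern (illustrative values)


SAMPLE_DICTIONARY = [
    Entry("7", "numeral", 0b11100100),
    Entry("su", "kana", 0b11100110),
    Entry("wa", "kana", 0b11110110),
    Entry("B", "letter", 0b11111011),
]


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two patterns differ."""
    return bin(a ^ b).count("1")


def rank_candidates(pattern: int, dictionary: list[Entry], max_distance: int = 2) -> list[Entry]:
    """Return dictionary entries close to the input, in priority order.

    Entries are ordered first by character-type priority and then by
    closeness of match (smaller Hamming distance first).
    """
    near = [(e, hamming(pattern, e.pattern)) for e in dictionary]
    near = [(e, d) for e, d in near if d <= max_distance]
    near.sort(key=lambda ed: (TYPE_PRIORITY[ed[0].char_type], ed[1]))
    return [e for e, _ in near]
```

With these sample values, an input pattern equal to the stored pattern for "7" yields the candidates "7", "su", "wa" in that order, while "B" is too far away to be a candidate.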

The prioritized answer candidates A1 to An are each given a unique address and sent to the matching circuit 13.

Meanwhile, each character to be read (numerals, kana, letters, and so on) is given a unique address in advance, the reading target character table 12 is created according to these addresses, and its contents are sent to the matching circuit 13, where they are compared with the answer candidates described above.
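
A reading target character table of this kind can be pictured as a small flag memory indexed by character address. The sketch below is an assumption about one possible layout, not the patent's design: the character set, the address assignment in ADDRESS, and the function names are hypothetical.

```python
# Hypothetical character set; kana are romanized for readability.
CHARSET = list("0123456789") + ["a", "i", "u", "e", "o", "su", "wa"] + list("AB")

# Each character gets a unique address (here simply its index).
ADDRESS = {ch: addr for addr, ch in enumerate(CHARSET)}


def build_target_table(chars_to_read):
    """Create a table with one flag per address; 1 means 'to be read'."""
    table = bytearray(len(ADDRESS))
    for ch in chars_to_read:
        table[ADDRESS[ch]] = 1
    return table


def is_target(table, ch):
    """Look up a character's flag by its unique address."""
    return table[ADDRESS[ch]] == 1
```

For a numerals-only field, build_target_table(list("0123456789")) marks "7" as a target and leaves "su" unmarked, so a lookup is a single indexed read.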

That is, the reading target character table 12 is searched at the unique address of the first-priority candidate character to detect a match; if there is no match, the search proceeds to the second-priority candidate.

In the example above, if the characters to be read are, say, the numerals "0 to 9" and the kana "a, i, u, e, o", the numeral "7" is decided unambiguously, and there is no possibility of the kana "su" being output.

If, on the other hand, the characters to be read are kana only, the first-priority numeral "7" is discarded, and, assuming that the second-priority kana "su" matches the dictionary more closely than the third-priority "wa", the kana "su" is output.
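
The matching step and the two cases just described can be sketched as follows. This is a simplified illustration: the reading target character table is modeled as a Python set rather than an addressed memory, kana are romanized, and the name select_answer is hypothetical.

```python
from typing import Optional


def select_answer(candidates: list[str], target_table: set[str]) -> Optional[str]:
    """Walk the candidates in priority order and return the first one
    that appears in the reading target character table.

    Returns None (reject) when no candidate is a character to be read.
    """
    for char in candidates:
        if char in target_table:
            return char
    return None


# Candidates in priority order, as in the "7" / "su" / "wa" example.
candidates = ["7", "su", "wa"]

# Case 1: numerals 0 to 9 plus the kana a, i, u, e, o are to be read.
numerals_and_vowels = set("0123456789") | {"a", "i", "u", "e", "o"}

# Case 2: only kana are to be read (a small sample of them).
kana_only = {"a", "i", "u", "e", "o", "su", "wa"}

print(select_answer(candidates, numerals_and_vowels))
print(select_answer(candidates, kana_only))
```

In the first case the first-priority "7" is found in the table immediately; in the second it is skipped and the second-priority "su" is returned.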

In the case of the numeral "8" and the letter "B" mentioned earlier, the result is unambiguous when either one is absent from the characters to be read; when both are present, either the one that matches the dictionary more closely is given priority, as in the preceding example, or both are treated as mismatches and the character is rejected.
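
The "8" versus "B" handling can be sketched as below. The patent does not say when the closer match wins and when both are rejected; this sketch assumes, purely for illustration, that the closer match wins and that an exact tie in closeness leads to rejection. The distance values and the name resolve are hypothetical.

```python
from typing import Optional


def resolve(candidates: list[tuple[str, int]], target_table: set[str]) -> Optional[str]:
    """Pick the final answer from (character, distance) candidates.

    Candidates not in the reading target character table are dropped.
    Among the rest, the smallest distance wins; if two or more share
    the smallest distance, the character is rejected (None).
    """
    allowed = [(c, d) for c, d in candidates if c in target_table]
    if not allowed:
        return None
    best = min(d for _, d in allowed)
    winners = [c for c, d in allowed if d == best]
    return winners[0] if len(winners) == 1 else None
```

When only one of the two characters is in the table, the table alone settles the answer and the distances play no role.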

As described above, according to the present invention, the feature bit pattern of a handwritten character is compared with the contents of a dictionary to output a plurality of answer candidates, and a reading target character table is created and checked against them to obtain the final answer.

As a result, there are fewer errors than with the conventional approach, which judges only the features of the handwritten character shape and outputs a single answer, and the same effect as providing the multiple dictionaries described above is obtained with a single dictionary.

A reading target character table is required in this case, but since it is merely a simple memory in which numerals, kana, letters, and so on are given unique addresses, it poses little problem for the configuration.

[Brief explanation of the drawing]

FIG. 1 is an explanatory diagram of a conventional example, and FIG. 2 is an explanatory diagram showing the configuration of an embodiment of the present invention. In the figures, 1 is a form, 2 is a reading mechanism, 3 is a memory, 6 is a dictionary (memory), 11 is a feature extraction and judgment calculation unit, 12 is a reading target character table, and 13 is a matching circuit.

Claims

[Claims]

1. A character recognition device characterized in that: a dictionary storing character bit patterns extracted from the features of handwritten characters is provided; video data of a read handwritten character is fed to a feature extraction and judgment calculation unit, which extracts the features, compares them with the dictionary, assigns a character-type priority order to the characters obtained by the judgment calculation, and outputs a plurality of answer candidates; a reading target character table storing the character types of the handwritten characters to be input is created; and the plurality of answer candidates are fed to a matching circuit, compared with the reading target character table, and the answer candidate that matches a character type stored in the reading target character table is output, whereby a final answer is obtained.