JPS60142791A

JPS60142791A - Dictionary forming method for character recognition

Info

Publication number: JPS60142791A
Application number: JP58248320A
Authority: JP
Inventors: Minoru Nagao; 永尾　実; Nobukazu Nasuhara; 茄子原　伸和
Original assignee: Tateisi Electronics Co; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 1983-12-29
Filing date: 1983-12-29
Publication date: 1985-07-27

Abstract

PURPOSE: To reduce the time and man-hours required to create a dictionary by creating a base dictionary in advance and collecting additional learning data while screening it. CONSTITUTION: The feature codes of the base learning data are edited to create a dictionary, additional learning data is collected using a character reader, and the data is re-edited to form the dictionary 8. When the additional learning data is collected, a character in the variant form to be added, written on a form, is passed in turn through a pattern reader 3, an A/D converter 4, a preprocessing circuit 5, and a feature extraction circuit 6 to extract the feature code of the character. A collation circuit 7 compares this feature code with the feature codes in the dictionary 8, and only feature codes that fail to match are selected and collected in a RAM 9 as additional learning data.

Description

[Detailed Description of the Invention] <Technical Field of the Invention> The present invention relates to a character recognition device that identifies an unknown character by comparing the feature code of the unknown character pattern with the feature codes of standard patterns stored in advance in a dictionary. In particular, the present invention relates to a method for creating a dictionary for character recognition, in which the dictionary is created by editing the feature codes of the standard patterns.

<Background of the Invention> Conventionally, a dictionary of this kind has been created by having a large number of unspecified people write each character to be recognized, reading these handwritten characters one after another with a character reader to encode the features of each character, collecting the resulting feature codes in memory as learning data, and then editing the learning data to create the dictionary.

With this approach, however, as many variant forms of handwritten characters as possible must be collected at one time in order to raise the accuracy of the dictionary. Moreover, because the collected characters include ones that share the same feature code, the learning data must be screened when the dictionary is compiled. Consequently, a great deal of labor and time is spent on collecting and screening data, which markedly lowers the efficiency of dictionary creation. Furthermore, learning data obtained from handwritten characters tends to be biased toward particular feature codes depending on the number of writers, the content written, and so on, so handwritten characters of other forms must be prepared and additional learning data collected. Because this additional learning data is collected in memory by the same method, it overlaps with learning data already stored in the dictionary, and screening it takes still more labor and time. These were among the many problems of the conventional method.

<Objective of the Invention> The object of the present invention is to provide a novel method for creating a dictionary for character recognition that shortens the time and reduces the labor required for dictionary creation by creating a base dictionary in advance and using that dictionary to screen additional learning data as it is collected.

<Structure and Effects of the Invention> To achieve the above object, the present invention first creates a dictionary by editing the feature codes of the base learning data. It then extracts the feature codes of characters that are candidates for additional data, compares those codes with the base learning data already stored in the dictionary, and collects only the feature codes that fail to match as additional learning data for the dictionary.

Rather than collecting a large amount of learning data for dictionary creation at one time, the present invention first creates a base dictionary and then adds learning data incrementally. Moreover, because only the necessary additional learning data, which does not duplicate what is already stored in the dictionary, is collected, the series of processes for creating or reorganizing the dictionary can be carried out efficiently. The invention thus achieves its object of shortening the time and reducing the labor required.

<Description of Embodiments> FIG. 1 shows the schematic configuration of a character recognition device. In the illustrated example, an unknown character 2 written on a form 1 is optically read by a pattern reader 3 comprising a CCD (Charge-Coupled Device) or the like, and the optical pattern is converted into a time-series digital signal by an A/D converter 4. A preprocessing circuit 5 performs noise removal, binarization, and similar processing on the digital signal and stores the character pattern in an image memory (not shown). A feature extraction circuit 6 encodes and extracts, from the character pattern stored in the image memory, the character features needed for recognition, such as the number of intersections, the number of end points, the number of branch points, and the number of loops. A dictionary collation circuit 7 compares the feature code output by the feature extraction circuit 6 with the feature codes of the standard patterns stored in a dictionary 8 comprising a ROM (Read Only Memory). When the two match, the ID code of the standard pattern having that feature code is output; when they do not match, the character is rejected.
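The collation step described above can be sketched as an exact-match lookup. This is only an illustration, not the circuit itself: the table layout (feature code mapped to ID code), the function name `collate`, and the sample entries are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical dictionary contents: feature code -> ID code of a standard pattern.
# The real dictionary 8 is held in ROM; these entries are invented for illustration.
DICTIONARY_8 = {
    "00100": "ID_A",
    "10100": "ID_A",
    "01001": "ID_I",
}


def collate(feature_code: str, dictionary: dict[str, str]) -> Optional[str]:
    """Return the ID code on an exact match, or None to signal a reject."""
    return dictionary.get(feature_code)


recognized = collate("00100", DICTIONARY_8)  # code present in the dictionary
rejected = collate("11111", DICTIONARY_8)    # code absent, so the character is rejected
print(recognized, rejected)
```

Returning `None` for a reject keeps the two outcomes of the collation circuit (ID code output or reject) distinguishable by the caller.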

For each character, the dictionary 8 stores a plurality of feature codes, each encoding the features of one variant form of that character. FIG. 2 shows, for the katakana character "ア", the features of each variant form (in the illustrated example: presence or absence of a connection, of a loop, of a dent on the left side, of three end points, and of a branch point) as a 5-bit code. For example, the variant of "ア" shown in FIG. 2 (2) has only the "dent on the left side" feature and is therefore given the feature code "00100".
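As a rough sketch of this 5-bit encoding, the snippet below builds a code string from the set of features present in a variant. The feature names and the left-to-right bit order (first listed feature as the leftmost bit) are assumptions; the source example "00100" is consistent with this order but does not fix it uniquely.

```python
# Feature order follows the list in the text; mapping the first feature to the
# leftmost bit is an assumption made for this sketch.
FEATURES = ("connection", "loop", "left_dent", "three_end_points", "branch_point")


def encode(present: set[str]) -> str:
    """Encode the set of features present in a variant as a 5-bit string."""
    unknown = present - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return "".join("1" if name in present else "0" for name in FEATURES)


def decode(code: str) -> set[str]:
    """Recover the set of features from a 5-bit code string."""
    if len(code) != len(FEATURES) or set(code) - {"0", "1"}:
        raise ValueError(f"not a {len(FEATURES)}-bit code: {code!r}")
    return {name for name, bit in zip(FEATURES, code) if bit == "1"}


# The variant in FIG. 2 (2) has only the left-side dent.
print(encode({"left_dent"}))
```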

The dictionary 8 is completed as follows. First, a specific person collects the base learning data, from which the dictionary is created. A character reader is then used to collect additional learning data, after which the data is re-edited. To collect the additional learning data, characters in the variant forms that need to be added are written on a form, each character is optically read by the pattern reader 3, the A/D converter 4 and the preprocessing circuit 5 apply the prescribed processing, and the feature extraction circuit 6 extracts the feature code of the character. As shown in FIG. 3, the dictionary collation circuit 7 then compares this feature code with the feature codes stored in the dictionary 8, and only the feature codes that fail to match are selected and collected in a RAM (Random Access Memory) 9 as additional learning data.

FIG. 4 shows this additional learning data collection process as a flowchart. First, in step 11, a character in a variant form that needs to be added is read, and in the following steps 12 and 13 preprocessing and feature extraction are executed. Next, in step 14, the feature code of the character is compared with the feature codes of the base learning data stored in the dictionary 8. If a matching code is found, the result of step 14 is "YES" and the process proceeds to step 16. If no matching code is found, the result of step 14 is "NO", and in the next step 15 the feature code of the read character is stored in the RAM 9 as additional learning data. If there is another character to be read, the determination in step 16 is "YES" and the process returns to step 11, where the same processing is repeated. In this way, only feature codes not stored in the dictionary 8 are selected and collected one after another in the RAM 9. After collection, the additional learning data is re-edited together with the earlier learning data and stored in the dictionary 8.
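The flow of FIG. 4 and the subsequent re-editing can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the patented circuit: the dictionary is modeled as a mapping from character to a set of feature codes, the reading, preprocessing, and feature extraction of steps 11 to 13 are replaced by a caller-supplied `extract` function, and all names and sample codes are hypothetical.

```python
from typing import Callable, Iterable

# Assumed layout: character -> set of feature codes stored for it.
Dictionary = dict[str, set[str]]


def collect_additional(
    readings: Iterable[tuple[str, str]],
    extract: Callable[[str], str],
    dictionary: Dictionary,
) -> list[tuple[str, str]]:
    """Collect only the feature codes that the dictionary does not yet hold."""
    ram_9: list[tuple[str, str]] = []
    for character, pattern in readings:      # step 11, repeated via step 16
        code = extract(pattern)              # steps 12-13 (stubbed by the caller)
        if code in dictionary.get(character, set()):
            continue                         # step 14 "YES": already stored, skip
        ram_9.append((character, code))      # step 15: store as additional data
    return ram_9


def re_edit(dictionary: Dictionary, additional: Iterable[tuple[str, str]]) -> Dictionary:
    """Merge the additional learning data into a new copy of the dictionary."""
    merged = {ch: set(codes) for ch, codes in dictionary.items()}
    for character, code in additional:
        merged.setdefault(character, set()).add(code)
    return merged


# Invented sample data: a base dictionary and four newly read variants of "ア".
base = {"ア": {"00100", "10100"}}
readings = [("ア", "00100"), ("ア", "00101"), ("ア", "10100"), ("ア", "01100")]
additional = collect_additional(readings, lambda pattern: pattern, base)
updated = re_edit(base, additional)
print(additional)
```

Using a set per character means that any duplicates among the collected codes collapse during re-editing, while the screening in step 14 keeps already-stored codes out of the collection buffer in the first place.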

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the schematic configuration of a character recognition device, FIG. 2 is an explanatory diagram showing the features of each variant form of the character "ア" as data, FIG. 3 is a block diagram of the circuit used to collect additional learning data, and FIG. 4 is a flowchart showing the additional learning data collection process.

Claims

[Claims]

A method for creating a dictionary for character recognition, characterized in that, after a dictionary is created by editing the feature codes of base learning data, the feature codes of characters that are candidates for additional data are extracted and compared with the base learning data already stored in the dictionary, and only the feature codes that fail to match are collected and used as additional learning data for the dictionary.