JPS60142791A - Dictionary forming method for character recognition - Google Patents

Dictionary forming method for character recognition

Info

Publication number
JPS60142791A
Authority
JP
Japan
Prior art keywords
dictionary
learning data
character
feature code
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP58248320A
Other languages
Japanese (ja)
Inventor
Minoru Nagao
永尾 実
Nobukazu Nasuhara
茄子原 伸和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Omron Corp
Original Assignee
Tateisi Electronics Co
Omron Tateisi Electronics Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tateisi Electronics Co, Omron Tateisi Electronics Co
Priority to JP58248320A
Publication of JPS60142791A

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To shorten the time and reduce the man-hours required to form a dictionary by forming a base dictionary in advance and collecting additional learning data while screening it. CONSTITUTION: The feature codes of the base learning data are edited to generate a dictionary, additional learning data is collected using a character reader, and the data is re-edited to form the dictionary 8. When additional learning data is collected, characters written on a form in the handwriting variants to be added are applied sequentially to a pattern reader 3, an A/D converter 4, a pre-processing circuit 5 and a feature extraction circuit 6 to extract the feature code of each character. A collation circuit 7 collates this feature code with the feature codes in the dictionary 8, and only the feature codes that fail to match are selected and collected in a RAM 9 as the additional learning data.

Description

[Detailed Description of the Invention] <Technical Field of the Invention> The present invention relates to a character recognition device that identifies an unknown character by comparing the feature code of the unknown character pattern with the feature codes of standard patterns stored in advance in a dictionary. In particular, it relates to a method for creating a dictionary for character recognition, in which the dictionary is created by editing the feature codes of the standard patterns.

<Background of the Invention> Conventionally, a dictionary of this kind has been created by having an unspecified large number of people write each character to be recognized, reading these handwritten characters one after another with a character reading device, encoding the features of each character, collecting the resulting feature codes in memory as learning data, and then editing this learning data into a dictionary.

With this approach, however, as many handwriting variants as possible must be collected at once in order to raise the accuracy of the dictionary. Moreover, because the collected handwritten characters include ones that share the same feature code, the learning data has to be screened when the dictionary is compiled. Data collection and screening therefore consumed a great deal of labor and time and markedly lowered the efficiency of dictionary creation. Furthermore, learning data obtained from handwritten characters tends to be biased toward particular feature codes depending on the number of writers, what they wrote, and so on, so handwritten characters in other variants have to be prepared and additional learning data collected. Because this additional learning data is collected in memory by the same method, it overlaps with the learning data already stored in the dictionary, and screening it takes still more labor and time. These were among the many problems of the conventional method.

<Object of the Invention> The object of the present invention is to provide a novel method for creating a dictionary for character recognition that shortens the time and reduces the labor required for dictionary creation by preparing a base dictionary in advance and using it to screen additional learning data as the data is collected.

<Structure and Effects of the Invention> To achieve the above object, the present invention first creates a dictionary by editing the feature codes of the basic learning data, then extracts the feature code of each character that is a candidate for additional data, compares it with the codes of the basic learning data already stored in the dictionary, and collects only the feature codes that fail to match as additional learning data for the dictionary.

Rather than collecting a large amount of learning data for dictionary creation at once, the present invention first creates a base dictionary and then adds learning data incrementally, collecting only the necessary additional learning data that does not duplicate what is already stored in the dictionary. The series of steps for creating or reorganizing a dictionary can therefore be carried out efficiently, saving time and labor and thereby achieving the object of the invention.

<Description of Embodiments> FIG. 1 shows the schematic configuration of a character recognition device. In the illustrated example, an unknown character 2 written on a form 1 is optically read by a pattern reading device 3 consisting of a CCD (Charge-Coupled Device) or the like, and the optical pattern is converted into a time-series digital signal by an A/D converter 4. A preprocessing circuit 5 performs noise removal, binarization and similar processing on the digital signal and stores the character pattern in an image memory (not shown). A feature extraction circuit 6 encodes and extracts, from the character pattern stored in the image memory, the character features needed for recognition, such as the number of intersections, end points, branch points and loops. A dictionary matching circuit 7 compares the feature code output by the feature extraction circuit 6 with the feature codes of the standard patterns stored in a dictionary 8 consisting of a ROM (Read Only Memory). When the two match, the ID code of the standard pattern having that feature code is output; when they do not match, the character is rejected.
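The matching step performed by circuit 7 can be illustrated in software. The following is only a minimal sketch, not the patented circuit: the function name `match`, the table `DICTIONARY`, the character IDs and all feature code values are hypothetical and chosen purely for illustration.

```python
from typing import Optional

# Hypothetical dictionary contents: each character ID maps to the feature
# codes registered for its handwriting variants (all values are made up).
DICTIONARY = {
    "A": {"00100", "10100"},
    "I": {"00011"},
}


def match(feature_code: str, dictionary: dict) -> Optional[str]:
    """Return the ID of the standard pattern that has this feature code,
    or None when no code matches (the character is rejected)."""
    for char_id, codes in dictionary.items():
        if feature_code in codes:
            return char_id
    return None


print(match("00100", DICTIONARY))  # A
print(match("11111", DICTIONARY))  # None
```

Returning `None` stands in for the reject processing; a real device would signal the rejection in whatever way its downstream logic requires.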

The dictionary 8 stores, for each character, a plurality of feature codes, each encoding the features of one handwriting variant of that character. FIG. 2 shows, for the katakana character "ア", the features of each variant (in the illustrated example: presence or absence of a connection, a loop, a dent on the left side, three end points, and a branch point) as a 5-bit code. For example, the variant of "ア" shown in FIG. 2(2) has only the "dent on the left side" feature and is therefore given the feature code "00100".
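The 5-bit code can be sketched as follows. This assumes the bits follow the order in which the features are listed in the text, with the leftmost bit first, which is consistent with the "dent on the left side only" example; the names `FEATURES` and `encode` are hypothetical.

```python
# Feature order assumed from the text: connection, loop, left-side dent,
# three end points, branch point (leftmost bit first).
FEATURES = ("connection", "loop", "left_dent", "three_end_points", "branch_point")


def encode(present: set) -> str:
    """Build the 5-bit feature code: '1' where a feature is present, else '0'."""
    unknown = present - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return "".join("1" if name in present else "0" for name in FEATURES)


print(encode({"left_dent"}))  # 00100
```

A fixed-width bit string keeps the comparison in the matching step a simple equality test.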

The dictionary 8 is completed as follows: a specific person first collects the basic learning data, the feature codes of this learning data are edited into a dictionary, additional learning data is then collected using the character reading device, and finally the data is re-edited. To collect the additional learning data, characters in the variants that need to be added are written on a form, read optically by the pattern reading device 3, and given the prescribed processing by the A/D converter 4 and the preprocessing circuit 5, after which the feature extraction circuit 6 extracts the feature code of each character. Then, as shown in FIG. 3, this feature code is compared by the dictionary matching circuit 7 with the feature codes stored in the dictionary 8, and only the feature codes that fail to match are selected and collected in a RAM (Random Access Memory) 9 as additional learning data.

FIG. 4 shows this additional learning data collection process as a flowchart. First, in step 11, a character in a variant to be added is read, and in the following steps 12 and 13 preprocessing and feature extraction are carried out. Next, in step 14, the feature code of the character is compared with each feature code of the basic learning data stored in the dictionary 8. If a code matches, step 14 is "YES" and the process proceeds to step 16. If no code matches, step 14 is "NO", and in the next step 15 the feature code of the read character is stored in the RAM 9 as additional learning data. If there is a further character to be read, the determination in step 16 is "YES" and the process returns to step 11, where the same processing is repeated. In this way only the feature codes not stored in the dictionary 8 are selected and collected one after another in the RAM 9, and after collection the additional learning data is re-edited together with the earlier learning data and stored in the dictionary 8.
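The collection loop of FIG. 4 can be sketched in a few lines. This is an illustrative simplification under stated assumptions: the dictionary is reduced to a flat set of feature codes for one character, reading and feature extraction (steps 11 to 13) are assumed to have already produced the list `read`, and the names `collect_additional` and `re_edit` are hypothetical. As in the description, each code is compared only against the dictionary, so repeated new codes are merged later during re-editing.

```python
def collect_additional(read_codes, dictionary_codes):
    """Steps 14 to 16 of the flowchart, applied to already extracted codes."""
    ram = []  # stands in for RAM 9
    for code in read_codes:           # steps 11-13 yield one code per character
        if code in dictionary_codes:  # step 14 is YES: already in the dictionary
            continue                  # skip step 15 and go on to step 16
        ram.append(code)              # step 15: keep as additional learning data
    return ram                        # step 16 is NO: no character left to read


def re_edit(dictionary_codes, ram):
    """Merge the base codes and the collected codes into the new dictionary."""
    return set(dictionary_codes) | set(ram)


base = {"00100", "10100"}  # made-up base dictionary codes
read = ["00100", "01100", "10100", "01100", "00110"]  # made-up extracted codes
extra = collect_additional(read, base)
print(extra)                         # ['01100', '01100', '00110']
print(sorted(re_edit(base, extra)))  # ['00100', '00110', '01100', '10100']
```

Codes already present in the base set never reach the collected list, which is the screening effect the description attributes to the method.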

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the schematic configuration of a character recognition device, FIG. 2 is an explanatory diagram showing the features of each variant of the character "ア" in data form, FIG. 3 is a block diagram of the circuit used to collect additional learning data, and FIG. 4 is a flowchart showing the additional learning data collection process.

Claims (1)

[Claims] A method for creating a dictionary for character recognition, characterized in that, after a dictionary is created by editing the feature codes of basic learning data, the feature code of each character that is a candidate for additional data is extracted and compared with the codes of the basic learning data already stored in the dictionary, and only the feature codes that fail to match are collected as additional learning data for the dictionary.
JP58248320A 1983-12-29 1983-12-29 Dictionary forming method for character recognition Pending JPS60142791A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58248320A JPS60142791A (en) 1983-12-29 1983-12-29 Dictionary forming method for character recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58248320A JPS60142791A (en) 1983-12-29 1983-12-29 Dictionary forming method for character recognition

Publications (1)

Publication Number Publication Date
JPS60142791A true JPS60142791A (en) 1985-07-27

Family

ID=17176323

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58248320A Pending JPS60142791A (en) 1983-12-29 1983-12-29 Dictionary forming method for character recognition

Country Status (1)

Country Link
JP (1) JPS60142791A (en)
