JPH0765128A

JPH0765128A - Method for generating dictionary for type character recognition

Info

Publication number: JPH0765128A
Application number: JP5216346A
Authority: JP
Inventors: Koji Hashimoto; 幸治橋本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1993-08-31
Filing date: 1993-08-31
Publication date: 1995-03-10
Anticipated expiration: 2017-01-21
Also published as: JP3249654B2

Abstract

PURPOSE: To efficiently generate a dictionary for type character recognition that achieves the best possible character recognition within a limited amount of data. CONSTITUTION: The method generates a type character recognition dictionary of a predetermined data amount for a character recognition device that recognizes type characters in a plurality of fonts. It consists of two steps. An initial dictionary generating step (S1) generates a character recognition dictionary that holds templates for every category of every font to be recognized and is capable of nearly complete character recognition. An unnecessary template deleting step (S2) has the character recognition device recognize evaluation data using the generated dictionary and, on the basis of the recognition result, selects and deletes unnecessary templates until the predetermined data amount is reached (S3).

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for creating a dictionary for type character recognition. More particularly, it relates to a method for creating such a dictionary for a character recognition device that performs machine recognition of type characters in a plurality of fonts, that is, a device that extracts features from an input character image and identifies the character by comparing those features with a dictionary prepared in advance.

[0002]

[Prior Art] Generally, when type characters in a plurality of fonts are to be recognized, the aim is to recognize the characters of all those fonts with a dictionary containing a small number of templates. Because the data capacity available for a dictionary in any given device is limited, a dictionary must be selected that achieves the highest possible character recognition rate within that capacity.

[0003] That is, a dictionary for a pattern recognition device should be as small as possible in view of the memory capacity and processing speed of the device. However, the recognition ability of a type character recognition dictionary generally improves as the number of templates increases. Conventionally, therefore, the dictionary contents have been created by training on a few suitable basic fonts, having the device recognize evaluation data, and then creating and adding new templates for the categories of the fonts that gave poor results.

[0004]

[Problems to Be Solved by the Invention] However, with this conventional method, the final dictionary depends heavily on the fonts used to build the first dictionary. Moreover, the dictionary has to be created by trial and error, which is very laborious. The resulting dictionary therefore achieves a certain recognition rate within the predetermined capacity, but its contents are not necessarily the best possible: because templates are selected by trial and error, templates that would give a better recognition rate may be missing, and unnecessary templates may be included.

[0005] Accordingly, an object of the present invention is to provide a method for efficiently creating a type character recognition dictionary capable of the best character recognition within a limited amount of data.

[0006]

[Means for Solving the Problems] In the present invention, a first means for solving the above problems is, as shown in FIG. 1, a method for creating a type character recognition dictionary of a predetermined data amount for a character recognition device that recognizes type characters in a plurality of fonts. The method comprises an initial dictionary creating step (S1) of creating a character recognition dictionary that holds a template for each category of each font to be recognized and can perform substantially complete character recognition, and an unnecessary template deleting step (S2) of having the character recognition device recognize evaluation data using the created dictionary and, based on the result, selecting and deleting unnecessary templates until the predetermined data amount is reached (S3).

[0007] In a second means of the present invention, the initial dictionary creating step (S1) of the first means creates a template for each category of each font to be recognized, has the resulting dictionary recognize the evaluation data, creates templates for the characters that could not be recognized and adds them to the dictionary, and thereby creates a character recognition dictionary that can recognize the evaluation data substantially completely.

[0008] In a third means of the present invention, the unnecessary template deleting step (S2) of the first or second means includes a first step (S2-1). This first step determines deletion candidates based on the number of times each template was selected as the first recognition candidate when the evaluation data was recognized; performs a determination process in which the evaluation data is recognized both with a dictionary containing the candidate template and with a dictionary not containing it, in order to judge the effect of deleting the template; and deletes templates starting with those whose deletion has the smallest effect.

[0009] In a fourth means of the present invention, the unnecessary template deleting step (S2) of any of the first to third means includes a first step (S2-1) of deleting the templates that never became the first recognition candidate when the evaluation data was recognized. In a fifth means of the present invention, the unnecessary template deleting step (S2) of any of the first to fourth means includes a second step (S2-2) performed after the first step (S2-1). For every template, the second step performs a determination process in which the evaluation data is recognized with a dictionary containing the template of interest and with a dictionary not containing it, to judge the effect of deleting that template; it then deletes the template with the smallest effect and repeats this process until the predetermined data amount is reached.

[0010] In a sixth means of the present invention, the unnecessary template deleting step (S2) of any of the first to fourth means includes a second step (S2-2) performed after the first step (S2-1). For every template, the second step performs the same determination process (recognizing the evaluation data with a dictionary containing the template of interest and with one not containing it) and then, taking templates in ascending order of the effect of their deletion, deletes at once all templates in excess of the predetermined data amount.

[0011] In a seventh means of the present invention, the determination process of the third, fifth, and sixth means works as follows. When the evaluation data is recognized with a dictionary that contains the template under judgment and with one that does not, let C1, R1, and E1 be the numbers of correct answers, rejects, and misreads for the former, and C2, R2, and E2 the corresponding numbers for the latter. With constants α, β, and γ, the effect of deletion AF is

$$AF = \alpha (C_1 - C_2) + \beta (R_2 - R_1) + \gamma (E_2 - E_1)$$

and a larger AF is taken to mean a larger effect of deletion.

[0012]

[Operation] According to the first means of the present invention, the initial dictionary creating step first creates a character recognition dictionary that holds a template for each category of each font to be recognized and can perform substantially complete character recognition. Then, in the unnecessary template deleting step, the character recognition device recognizes the evaluation data using this dictionary, and, based on the result, unnecessary templates are selected and deleted until the predetermined data amount is reached.

[0013] Starting from a dictionary capable of substantially complete recognition, a dictionary containing the best templates, from which the unnecessary ones have been deleted based on the recognition results for the evaluation data, can thus be created quickly within the predetermined data amount without trial and error. According to the second means, the initial dictionary creating step creates a template for each category of each font to be recognized, has that dictionary recognize the evaluation data, and creates and adds templates for the characters it could not recognize, producing a dictionary that recognizes the evaluation data substantially completely. The initial dictionary can therefore be created reliably, and the result is a character recognition dictionary with a high recognition rate.

[0014] According to the third means, the first step of the unnecessary template deleting step determines deletion candidates from the number of times each template was chosen as the first recognition candidate for the evaluation data, judges the effect of deleting each candidate by recognizing the evaluation data with and without it, and deletes templates starting with those whose deletion has the smallest effect. Templates that are unnecessary and whose deletion does not affect recognition are thus reliably removed, and a character recognition dictionary with a high recognition rate can be created reliably.

[0015] According to the fourth means, the first step of the unnecessary template deleting step deletes the templates that never became the first recognition candidate for the evaluation data. This shortens the processing time, so the character recognition dictionary can be created quickly. According to the fifth means, in the second step that follows the first step, the evaluation data is recognized, for every template, with a dictionary containing that template and with one not containing it to judge the effect of its deletion; the template with the smallest effect is deleted, and this process is repeated until the predetermined data amount is reached. Consequently, when the permissible dictionary capacity is small and the dictionary is still too large after the deletions of the first step, templates are deleted one at a time, each time choosing the one whose deletion has the smallest effect, so that the dictionary is reduced in an appropriate order.

[0016] According to the sixth means, in the second step after the first step, the effect of deleting each template is judged once for all templates by recognizing the evaluation data with a dictionary containing the template of interest and with one not containing it. Then, in ascending order of that effect, all templates in excess of the predetermined data amount are deleted at once. Even when the dictionary is still too large after the first step, it can thus be reduced quickly, removing templates in ascending order of the effect of their deletion.

[0017] According to the seventh means, the determination process expresses the effect of deletion numerically and deletes templates based on that value, so the judgment can be made appropriately.

[0018]

[Embodiments] Embodiments of the method for creating a type character recognition dictionary according to the present invention are described below with reference to the drawings. FIGS. 2 and 3 show a first embodiment of the method. This embodiment encompasses the first to third and the fourth to sixth means described above.

[0019] In this embodiment, the dictionary creation method consists of the three steps enclosed by dotted lines in FIG. 1: the initial dictionary creating step (corresponding to S1) and the unnecessary template deleting step, which comprises a first step and a second step (corresponding to S2-1 and S2-2). The initial dictionary creating step produces a dictionary that achieves a recognition rate of 100%, or close to it, on the evaluation data. First, in step (hereinafter ST) 01, one template is created for each category of every font contained in the evaluation data. The evaluation data is recognized with this dictionary (ST02), and it is determined whether the result is 100% or sufficiently close to it (ST03). If YES, the initial dictionary creating step ends; if NO, new templates are created for the categories of the fonts that could not be read and are added to the dictionary (ST04). The process then returns to ST02, and recognition is repeated until a dictionary achieving 100% or close to it is obtained.
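The loop of ST01 to ST04 can be sketched in Python as follows. This is only an illustration under assumptions the patent does not make: each character image is reduced to a feature vector, the recognizer is a hypothetical nearest-template classifier with a fixed reject distance (`REJECT_DISTANCE`), and a "new template" is simply the feature vector of one misread sample. The names `recognize`, `build_initial_dictionary`, `target_rate`, and `max_rounds` are illustrative.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model: a sample is (font, category, feature_vector) and a
# template is (category, feature_vector). Not the patent's actual recognizer.
Sample = Tuple[str, str, Tuple[float, ...]]
Template = Tuple[str, Tuple[float, ...]]

REJECT_DISTANCE = 1.0  # assumed reject threshold of the toy recognizer


def recognize(dictionary: List[Template], features) -> Optional[str]:
    """Return the category of the nearest template, or None (reject) if it is too far."""
    if not dictionary:
        return None
    category, vector = min(dictionary, key=lambda t: math.dist(t[1], features))
    return category if math.dist(vector, features) <= REJECT_DISTANCE else None


def build_initial_dictionary(samples: List[Sample], target_rate: float = 1.0,
                             max_rounds: int = 100) -> List[Template]:
    # ST01: one template per (font, category) pair, taken from its first sample.
    seeds: Dict[Tuple[str, str], Template] = {}
    for font, category, features in samples:
        seeds.setdefault((font, category), (category, features))
    dictionary = list(seeds.values())
    for _ in range(max_rounds):  # guard against data that can never reach the target
        # ST02: recognize the evaluation data with the current dictionary.
        failures = [s for s in samples if recognize(dictionary, s[2]) != s[1]]
        # ST03: stop once the recognition rate reaches the target (100% or close to it).
        if 1 - len(failures) / len(samples) >= target_rate:
            break
        # ST04: add a new template for each font/category pair that was not read.
        additions: Dict[Tuple[str, str], Template] = {}
        for font, category, features in failures:
            additions.setdefault((font, category), (category, features))
        dictionary.extend(additions.values())
    return dictionary
```

With `target_rate=1.0` the loop stops only when every sample is read correctly; a value such as 0.99 would model the "close to 100%" case of ST03.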

[0020] Provided that the dictionary creating system has sufficient memory capacity, increasing the number of templates in this step will eventually bring the recognition rate to 100%. Processing then moves to the first step of the unnecessary template deleting step, which finds and deletes unnecessary templates whose deletion has no effect.

[0021] First, the evaluation data is recognized using the dictionary created in the initial dictionary creating step, and the results are saved (ST05). From these results, the number of times each template was selected as the first, second, third, ... candidate is totaled (ST06), and the templates are ranked in order of their apparent necessity (ST07). Specifically, using the totals from ST06, the templates are ranked first by the number of times they were the first candidate.
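A minimal sketch of the tally and ranking in ST05 to ST07, assuming the same kind of hypothetical toy recognizer in which candidates are ordered by distance between feature vectors. The patent specifies only that the first-candidate count is the primary key; breaking ties with the second- and third-candidate counts is an assumption here, as are the function names.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical toy model: a template is (category, feature_vector) and an
# evaluation sample is (category, feature_vector); candidates are ranked by distance.
Template = Tuple[str, Tuple[float, ...]]
Sample = Tuple[str, Tuple[float, ...]]


def candidate_counts(dictionary: List[Template], samples: List[Sample],
                     top_k: int = 3) -> List[Counter]:
    """ST05-ST06: for each template, count how often it was the 1st, 2nd, ... candidate."""
    counts = [Counter() for _ in dictionary]
    for _, features in samples:
        order = sorted(range(len(dictionary)),
                       key=lambda i: math.dist(dictionary[i][1], features))
        for rank, index in enumerate(order[:top_k], start=1):
            counts[index][rank] += 1
    return counts


def necessity_order(counts: List[Counter], top_k: int = 3) -> List[int]:
    """ST07: template indices, most necessary first (1st-candidate count, then 2nd, 3rd)."""
    return sorted(range(len(counts)),
                  key=lambda i: tuple(counts[i][r] for r in range(1, top_k + 1)),
                  reverse=True)
```

ST08 then takes templates from the end of this list, i.e. from `reversed(necessity_order(...))`, so the least necessary template is judged first.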

[0022] Next, based on the ranking determined in ST07, the lowest-ranked template among those not yet judged is selected as the template under judgment (ST08). The evaluation data is recognized with a dictionary that includes this template, and the result is saved (ST09). The evaluation data is then recognized with a dictionary that does not include the template, and the result is saved (ST10). The results of ST09 and ST10 are totaled, and the difference in recognition ability between the two dictionaries, that is, the effect of deleting the template, is calculated (ST11).
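The paired recognition runs of ST09 and ST10 can be sketched as follows. The recognizer is again a hypothetical nearest-template classifier with an assumed reject distance; `Result`, `evaluate`, and `with_and_without` are illustrative names. Each run yields the three totals (correct answers, rejects, misreads) that the effect calculation needs.

```python
import math
from typing import List, NamedTuple, Tuple

# Hypothetical toy model (an assumption): templates and samples are
# (category, feature_vector) pairs; recognition is nearest-template with a reject distance.
Template = Tuple[str, Tuple[float, ...]]
Sample = Tuple[str, Tuple[float, ...]]
REJECT_DISTANCE = 1.0


class Result(NamedTuple):
    correct: int   # C: recognized as the true category
    rejected: int  # R: no template within REJECT_DISTANCE
    misread: int   # E: recognized as a wrong category


def evaluate(dictionary: List[Template], samples: List[Sample]) -> Result:
    c = r = e = 0
    for truth, features in samples:
        best = min(dictionary, key=lambda t: math.dist(t[1], features), default=None)
        if best is None or math.dist(best[1], features) > REJECT_DISTANCE:
            r += 1
        elif best[0] == truth:
            c += 1
        else:
            e += 1
    return Result(c, r, e)


def with_and_without(dictionary: List[Template], samples: List[Sample], index: int):
    """ST09 and ST10: score the dictionary with and without the template at `index`."""
    reduced = dictionary[:index] + dictionary[index + 1:]
    return evaluate(dictionary, samples), evaluate(reduced, samples)
```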

[0023] The effect is calculated as follows. When the evaluation data is recognized with a dictionary that includes the template under judgment and with one that does not, let C1, R1, and E1 be the numbers of correct answers, rejects, and misreads for the former, and C2, R2, and E2 those for the latter. The magnitude of the effect of deletion is then defined as

$$AF = \alpha (C_1 - C_2) + \beta (R_2 - R_1) + \gamma (E_2 - E_1)$$

where α, β, and γ are coefficients: weights expressing how important a change in correct answers, rejects, and misreads, respectively, is relative to the others. In this embodiment the weights are not differentiated, and α = β = γ = 1.
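Expressed as code, the formula above is a single weighted sum over the two sets of totals. In the sketch below, the function name `deletion_impact` and the example numbers are hypothetical; the default weights follow this embodiment (α = β = γ = 1).

```python
from typing import Tuple

Counts = Tuple[int, int, int]  # (C, R, E): correct answers, rejects, misreads


def deletion_impact(with_template: Counts, without_template: Counts,
                    alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """AF = alpha*(C1 - C2) + beta*(R2 - R1) + gamma*(E2 - E1).

    A larger AF means that deleting the template hurts recognition more.
    """
    c1, r1, e1 = with_template
    c2, r2, e2 = without_template
    return alpha * (c1 - c2) + beta * (r2 - r1) + gamma * (e2 - e1)


# Illustrative numbers only: deleting the template turns two correct answers
# into one reject and one misread.
print(deletion_impact((98, 1, 1), (96, 2, 2)))  # prints 4.0
```

Note that with equal weights a sample that moves from correct to reject counts twice, once as a lost correct answer and once as a new reject; unequal weights would change that balance.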

[0024] Next, it is determined whether the effect of deletion calculated in ST11 satisfies a predetermined criterion (ST12). If it does, the template is judged deletable and is deleted (ST13); if not, it is judged not deletable. It is then checked whether any templates remain unjudged (ST14); if so, the process returns to ST08, and if not, it proceeds to the next step. This completes the first step. At this point the dictionary should contain no unnecessary templates; that is, removing any of the remaining templates would significantly affect recognition ability. If the size of the dictionary at this point does not exceed the limit imposed, for example, by installation in a product, these contents are adopted as the dictionary; if it exceeds the limit, templates are deleted from among those remaining.
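A sketch of the first step (ST08 to ST14), using a hypothetical nearest-template recognizer. The patent does not state the "predetermined criterion" of ST12; here it is assumed to be AF ≤ `max_impact` (default 0, meaning no loss at all). The argument `order` is the ST07 ranking given as template indices, most necessary first; `score`, `impact`, and `first_step` are illustrative names.

```python
import math
from typing import List, Sequence, Set, Tuple

Template = Tuple[str, Tuple[float, ...]]
Sample = Tuple[str, Tuple[float, ...]]
REJECT_DISTANCE = 1.0  # assumed reject threshold of the toy recognizer


def score(templates: List[Template], samples: List[Sample]) -> Tuple[int, int, int]:
    """Return (correct, rejected, misread) counts for a nearest-template recognizer."""
    c = r = e = 0
    for truth, features in samples:
        best = min(templates, key=lambda t: math.dist(t[1], features), default=None)
        if best is None or math.dist(best[1], features) > REJECT_DISTANCE:
            r += 1
        elif best[0] == truth:
            c += 1
        else:
            e += 1
    return c, r, e


def impact(with_t, without_t) -> int:
    # AF with alpha = beta = gamma = 1, as in the first embodiment.
    return (with_t[0] - without_t[0]) + (without_t[1] - with_t[1]) + (without_t[2] - with_t[2])


def first_step(dictionary: List[Template], samples: List[Sample],
               order: Sequence[int], max_impact: int = 0) -> List[Template]:
    """ST08-ST14: judge templates from least to most necessary, deleting harmless ones."""
    alive: Set[int] = set(range(len(dictionary)))
    for index in reversed(order):  # ST08: lowest-ranked unjudged template first
        current = [dictionary[i] for i in sorted(alive)]
        reduced = [dictionary[i] for i in sorted(alive) if i != index]
        # ST09-ST12: compare with/without and test the (assumed) criterion.
        if impact(score(current, samples), score(reduced, samples)) <= max_impact:
            alive.discard(index)  # ST13: delete the template
    return [dictionary[i] for i in sorted(alive)]
```

Each judgment is made against the dictionary as it stands after earlier deletions, so two redundant templates cannot both be removed when only one of them is actually superfluous.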

[0025] Accordingly, in the second step, as shown in FIG. 4, templates with a relatively low degree of necessity are deleted from those remaining so that the dictionary satisfies the size limit. First, it is determined whether the current dictionary satisfies the size limit; if it does, processing ends, and if not, processing proceeds to the next step (ST15).

[0026] Next, the template under judgment is selected as in ST08 (ST16), although here the ranking from ST07 need not be followed. Then, as in ST09 and ST10, the evaluation data is recognized with a dictionary that includes the template under judgment and with one that does not, and the results are saved (ST17, ST18). Next, as in ST11, the results of ST17 and ST18 are totaled, and the effect of deletion is calculated and saved (ST19).

[0027] It is then checked whether any templates remain unjudged (ST20); if so, the process returns to ST16, and if not, it proceeds. Next, referring to the values calculated in ST19 for each template, the template with the smallest effect is selected (ST21) and deleted (ST22), and the process returns to ST15. These processes (ST16 to ST22) are repeated until the dictionary size satisfies the condition.
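The one-at-a-time loop of ST15 to ST22 might look like this, again with a hypothetical nearest-template recognizer. As an assumption, the dictionary "size" is measured as the number of templates (`max_templates`); a real system would presumably measure the data amount in bytes. Every pass re-scores all remaining templates, which makes this variant thorough but slow.

```python
import math
from typing import List, Tuple

Template = Tuple[str, Tuple[float, ...]]
Sample = Tuple[str, Tuple[float, ...]]
REJECT_DISTANCE = 1.0  # assumed reject threshold of the toy recognizer


def score(templates: List[Template], samples: List[Sample]) -> Tuple[int, int, int]:
    """(correct, rejected, misread) counts for a nearest-template recognizer."""
    c = r = e = 0
    for truth, features in samples:
        best = min(templates, key=lambda t: math.dist(t[1], features), default=None)
        if best is None or math.dist(best[1], features) > REJECT_DISTANCE:
            r += 1
        elif best[0] == truth:
            c += 1
        else:
            e += 1
    return c, r, e


def impact(with_t, without_t) -> int:
    # AF with alpha = beta = gamma = 1.
    return (with_t[0] - without_t[0]) + (without_t[1] - with_t[1]) + (without_t[2] - with_t[2])


def second_step_greedy(dictionary: List[Template], samples: List[Sample],
                       max_templates: int) -> List[Template]:
    current = list(dictionary)
    while len(current) > max(max_templates, 0):          # ST15: size limit not yet met
        base = score(current, samples)
        # ST16-ST20: effect of deleting each remaining template, one at a time.
        impacts = [impact(base, score(current[:i] + current[i + 1:], samples))
                   for i in range(len(current))]
        weakest = min(range(len(current)), key=impacts.__getitem__)  # ST21
        del current[weakest]                             # ST22, then back to ST15
    return current
```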

[0028] According to this embodiment, therefore, unnecessary templates can be reliably deleted, starting with those that are rarely used in character recognition and whose deletion has little effect, and a character recognition dictionary can be created reliably. Moreover, because the effect of deletion is expressed numerically for the judgment, the judgment can be made appropriately. Next, a second embodiment of the method for creating a type character recognition dictionary according to the present invention is described.

[0029] This embodiment corresponds to the fourth means described above. As shown in FIG. 5, it inserts ST23 between ST07 and ST08 of the first embodiment. In ST23, all templates whose first-candidate count is zero are deleted without performing the determination process. Since deleting a template that was never selected as the first candidate will have almost no effect, this is useful when the character recognition dictionary must be created in a short processing time.
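ST23 needs only the first-candidate counts, which is why it is cheap: one recognition pass and no with/without comparisons. A sketch, assuming a hypothetical nearest-template recognizer in which the first candidate is simply the closest template; `drop_never_first` is an illustrative name.

```python
import math
from typing import List, Tuple

Template = Tuple[str, Tuple[float, ...]]
Sample = Tuple[str, Tuple[float, ...]]


def drop_never_first(dictionary: List[Template], samples: List[Sample]) -> List[Template]:
    """ST23: remove, without any with/without evaluation, every template that was
    never the first (nearest) candidate for any evaluation sample."""
    if not dictionary:
        return []
    first_counts = [0] * len(dictionary)
    for _, features in samples:
        nearest = min(range(len(dictionary)),
                      key=lambda i: math.dist(dictionary[i][1], features))
        first_counts[nearest] += 1
    return [t for t, n in zip(dictionary, first_counts) if n > 0]
```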

[0030] Next, a third embodiment of the method for creating a type character recognition dictionary according to the present invention is described. This embodiment corresponds to the sixth means described above. As shown in FIG. 6, it replaces ST21 and ST22 of the first embodiment with ST24 and ST25, respectively. In ST24, the number of templates that must be deleted is calculated from the dictionary size limit and the current dictionary size. In ST25, that number of templates is deleted, starting with the template whose deletion has the relatively smallest effect.
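A sketch of the batch variant (ST24 and ST25) under the same hypothetical toy recognizer, again assuming that dictionary size is counted in templates. The effects are computed once against the full dictionary and the required number of templates is removed in one pass.

```python
import math
from typing import List, Tuple

Template = Tuple[str, Tuple[float, ...]]
Sample = Tuple[str, Tuple[float, ...]]
REJECT_DISTANCE = 1.0  # assumed reject threshold of the toy recognizer


def score(templates: List[Template], samples: List[Sample]) -> Tuple[int, int, int]:
    """(correct, rejected, misread) counts for a nearest-template recognizer."""
    c = r = e = 0
    for truth, features in samples:
        best = min(templates, key=lambda t: math.dist(t[1], features), default=None)
        if best is None or math.dist(best[1], features) > REJECT_DISTANCE:
            r += 1
        elif best[0] == truth:
            c += 1
        else:
            e += 1
    return c, r, e


def impact(with_t, without_t) -> int:
    # AF with alpha = beta = gamma = 1.
    return (with_t[0] - without_t[0]) + (without_t[1] - with_t[1]) + (without_t[2] - with_t[2])


def second_step_batch(dictionary: List[Template], samples: List[Sample],
                      max_templates: int) -> List[Template]:
    excess = len(dictionary) - max_templates               # ST24: templates to remove
    if excess <= 0:
        return list(dictionary)
    base = score(dictionary, samples)
    # Effect of deleting each template, computed once against the full dictionary.
    impacts = [impact(base, score(dictionary[:i] + dictionary[i + 1:], samples))
               for i in range(len(dictionary))]
    # ST25: delete the `excess` templates with the smallest effect in one batch.
    doomed = set(sorted(range(len(dictionary)), key=impacts.__getitem__)[:excess])
    return [t for i, t in enumerate(dictionary) if i not in doomed]
```

Because the effects are not recomputed after each deletion, two templates that back each other up can both look harmless and both be removed; this is the price paid for the shorter processing time compared with the loop of the first embodiment.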

[0031] In the first embodiment, the effect of deletion is examined for all templates, only one template is deleted, and, if further deletion is still needed, the effect is re-examined for all templates. In this embodiment, by contrast, the effect of deleting each template is examined only once, and the required number of templates is deleted in one batch, starting with those whose deletion has the smallest effect. Like the second embodiment, this embodiment is therefore useful when the character recognition dictionary must be created in a short processing time.

[0032] Although the first to third embodiments have been described separately above, they may be combined as necessary to create a character recognition dictionary according to the present invention.

[0033]

[Effects of the Invention] As described above, according to the present invention, the method for creating a type character recognition dictionary consists of an initial dictionary creating step, which creates a character recognition dictionary that holds a template for each category of each font to be recognized and is capable of substantially complete character recognition, and an unnecessary template deleting step, which has the character recognition device recognize evaluation data with that dictionary and, based on the result, selects and deletes unnecessary templates until the predetermined data amount is reached. Starting from an initial dictionary that recognizes type characters substantially completely, templates are deleted in order of increasing contribution to recognition until the data capacity is met, and the dictionary is then fixed. A type character recognition dictionary capable of the best character recognition within a limited amount of data can therefore be created efficiently rather than by trial and error.

[Brief description of drawings]

FIG. 1 is a diagram showing the principle of the present invention.

FIG. 2 is a flowchart showing a first embodiment of the method for creating a type character recognition dictionary according to the present invention.

FIG. 3 is a flowchart showing the first embodiment of the method for creating a type character recognition dictionary according to the present invention.

FIG. 4 is a flowchart showing the first embodiment of the method for creating a type character recognition dictionary according to the present invention.

FIG. 5 is a flowchart showing a second embodiment of the method for creating a type character recognition dictionary according to the present invention.

FIG. 6 is a flowchart showing a third embodiment of the method for creating a type character recognition dictionary according to the present invention.

[Explanation of symbols]

S1 Initial dictionary creating step
S2 Unnecessary template deleting step
S2-1 First step
S2-2 Second step

Claims

1. A method for creating a type character recognition dictionary of a predetermined data amount for a character recognition device that recognizes type characters in a plurality of fonts, the method comprising: an initial dictionary creating step (S1) of creating a character recognition dictionary that holds a template for each category of each font to be recognized and is capable of substantially complete character recognition; and an unnecessary template deleting step (S2) of having the character recognition device recognize evaluation data using the created dictionary and, based on the result, selecting and deleting unnecessary templates until the predetermined data amount is reached (S3).

2. The method for creating a type character recognition dictionary according to claim 1, wherein the initial dictionary creating step (S1) creates a template for each category of each font to be recognized, has the dictionary recognize the evaluation data, creates templates for characters that cannot be recognized and adds them to the dictionary, and thereby creates a character recognition dictionary capable of substantially completely recognizing the evaluation data.

3. The method for creating a type character recognition dictionary according to claim 1 or 2, wherein the unnecessary template deleting step (S2) includes a first step (S2-1) of determining deletion candidates based on the number of times each template was selected as the first recognition candidate when the evaluation data was recognized, performing a determination process that judges the effect of deleting a candidate template by recognizing the evaluation data with a dictionary including the candidate template and with a dictionary not including it, and deleting templates starting with those whose deletion has the smallest effect.

4. The method for creating a type character recognition dictionary according to claim 1 or 2, wherein the unnecessary template deleting step (S2) includes a first step (S2-1) of deleting templates that did not become the first recognition candidate when the evaluation data was recognized.

5. The method for creating a type character recognition dictionary according to claim 1, 2, 3, or 4, wherein the unnecessary template deleting step (S2) includes a second step (S2-2) of, after the first step (S2-1), performing for all templates a determination process that judges the effect of deletion by recognizing the evaluation data with a dictionary including the template of interest and with a dictionary not including it, deleting the template with the smallest effect according to the determination result, and repeating this process until the predetermined data amount is reached.

6. The method for creating a type character recognition dictionary according to claim 1, 2, 3, or 4, wherein the unnecessary template deleting step (S2) includes a second step (S2-2) of, after the first step (S2-1), performing for all templates a determination process that judges the effect of deletion by recognizing the evaluation data with a dictionary including the template of interest and with a dictionary not including it, and, taking templates in ascending order of effect according to the determination result, deleting at once all templates in excess of the predetermined data amount.

7. The method for creating a type character recognition dictionary according to claim 3, 5, or 6, wherein, in the determination process, when the evaluation data is recognized with a dictionary including the template under judgment and with a dictionary not including it, the numbers of correct answers, rejects, and misreads are C1, R1, and E1 for the former and C2, R2, and E2 for the latter, α, β, and γ are constants, and the effect of deletion AF is given by $AF = \alpha (C_1 - C_2) + \beta (R_2 - R_1) + \gamma (E_2 - E_1)$, the effect being judged larger as AF is larger.