JPH11184976A

JPH11184976A - Dictionary learning system and character recognition device

Info

Publication number: JPH11184976A
Application number: JP9365834A
Authority: JP
Inventors: Keiji Yoshizaki; 慶司吉崎
Original assignee: Japan Digital Laboratory Co Ltd
Current assignee: Japan Digital Laboratory Co Ltd
Priority date: 1997-12-22
Filing date: 1997-12-22
Publication date: 1999-07-09
Anticipated expiration: 2017-12-22
Also published as: JP4116688B2

Abstract

PROBLEM TO BE SOLVED: To provide a dictionary learning system and a character recognition device that learn a recognition dictionary efficiently and give the recognition dictionary high validity and stability after learning. SOLUTION: A learning character extraction part 7 automatically deletes, from the characters to be corrected, any character that meets a deletion condition. A learning character filter part 13 of a learning dictionary part 8 automatically deletes learning characters that are inappropriate for learning, using recognition distance as the criterion. A learning character selection part 14 classifies the learning characters by the template of the recognition dictionary to be learned (the learning object template) and, using recognition distance as the criterion, selects learning characters that have not yet been learned. A learning character synthesis part 15 gradually changes the synthesis ratio between the feature amount of the learning character and the feature amount of the learning object template, starting from an initial value, synthesizes the two as vectors, and thereby brings the feature amount of the learning object template closer to the learning character. When the synthesis processing is completed, a template addition part 16 decides whether a template needs to be added, using the minimum recognition distance of the learning character as the criterion, and a filter data change part 17 updates or adds recognition dictionary data. A recognition dictionary 9 retains learned dictionaries in addition to a standard dictionary, according to the kind of input character source.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a character recognition device. More particularly, it relates to a dictionary learning method and a character recognition device that generate a feature amount from the image data of an input character and obtain a recognized character code by comparing that feature amount with the feature amounts of a recognition dictionary.

[0002]

2. Description of the Related Art

A character recognition device reads a character image, converts it into an electric signal to obtain image data, and segments the image one character at a time. It then performs character recognition using a recognition dictionary and obtains a character code. In this recognition process, the feature amount of each segmented character is first extracted. That feature amount is then compared with the feature amount of each template in the recognition dictionary. The candidate character whose feature amount has the highest similarity is taken as the correct character, and its character code is obtained.

However, characters cannot always be recognized accurately. Even the same character can differ in shape, size, and stroke thickness, and its strokes may be faint. When handwritten characters are recognized, individual writing habits add to these factors and lower the recognition rate further.
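The matching step described in [0002] can be sketched as a nearest-template search. This is a minimal illustration, not the device's actual implementation. The (code, template number, vector) layout, the function name `recognize`, and the use of Euclidean distance as the similarity measure are all assumptions.

```python
import math

def recognize(feature, templates):
    """Return (character_code, distance) of the template nearest to `feature`.

    templates: iterable of (character_code, template_number, feature_vector).
    Euclidean distance stands in for the unspecified similarity measure:
    a smaller distance means a higher similarity.
    """
    best_code, best_dist = None, math.inf
    for code, _number, vector in templates:
        d = math.dist(feature, vector)
        if d < best_dist:
            best_code, best_dist = code, d
    return best_code, best_dist

# Usage with two toy templates (hypothetical feature vectors).
templates = [("A", 0, (0.0, 0.0)), ("B", 1, (1.0, 1.0))]
code, dist = recognize((0.9, 0.8), templates)
```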

[0003] One solution to the above problem is to improve the recognition rate by updating (correcting or adding to) the standard patterns registered in the recognition dictionary whenever an unread or misrecognized character is corrected. This is so-called 'learning of the recognition dictionary'. Prior art on recognition dictionary learning methods falls into three groups:

(1) Methods in which the operator decides, character by character at correction time, whether each character should be learned, and gives an instruction accordingly. Examples are the techniques disclosed in Japanese Patent Application Laid-Open No. 2-186484 ("Character Recognition Apparatus") and Japanese Patent Application Laid-Open No. 9-185682 ("Learning Method of Recognition Dictionary").

(2) Methods that learn at the same time as correction. An example is the technique disclosed in Japanese Patent Application Laid-Open No. 7-129724 ("Character Recognition Apparatus, and Dictionary Learning Method and Dictionary Creation Method Therefor").

(3) Methods that check the validity of the learned recognition dictionary by recognition simulation. Examples are the techniques disclosed in Japanese Patent Application Laid-Open No. 7-271917 ("Handwritten Character Recognition Dictionary Creation Method and Apparatus") and Japanese Patent Application Laid-Open No. 8-287191 ("Optical Character Reading Device").

[0004]

However, each of these techniques has drawbacks.

Technique (1) increases the operator's workload. It also cannot achieve highly accurate 'learning of the recognition dictionary' unless the operator has the specialized knowledge needed to judge whether each character should be learned.

Technique (2) does not consider whether the learning characters are valid. As a result, learning is inefficient and the recognition dictionary is less stable after learning.

Technique (3) takes time to simulate the validity of the recognition dictionary after learning. It also requires additional storage means for the data needed by the recognition simulation.

SUMMARY OF THE INVENTION

[0005] The present invention has been made in view of the above problems and drawbacks. Its object is to provide a dictionary learning method and a character recognition device that learn a recognition dictionary efficiently and give the recognition dictionary high validity and stability after learning.

[0006]

To solve the above problems, the dictionary learning method of the present invention is used in a character recognition device that performs learning processing of a recognition dictionary. It comprises a step of automatically excluding, on the basis of recognition distance, learning characters that are unsuitable for learning of the recognition dictionary.

[0007] The above dictionary learning method further comprises the following steps:

- classifying the learning characters by the template of the recognition dictionary that is to be learned;
- selecting from them, on the basis of recognition distance, a learning character that has not yet undergone learning processing;
- combining the feature amount of the selected learning character with the feature amount of the classified template while gradually changing their combination ratio, thereby bringing the feature amount of the classified template closer to the learning character.

[0008] The above dictionary learning method further comprises a step performed at the end of the synthesis process. In this step, it is determined whether a template needs to be added, on the basis of the minimum recognition distance of the learning character, and the recognition dictionary data are updated or added to accordingly.

[0009] Each of the above dictionary learning methods has a step of obtaining the learning characters by automatically excluding, from the characters to be corrected, any character that meets a predetermined exclusion condition.

[0010] The character recognition device of the present invention performs learning processing of a recognition dictionary and comprises two filter means:

- first filter means that excludes a learning character as unsuitable for learning of the recognition dictionary when the normalized distance between the learning character and any character code other than its own is smaller than a first threshold;
- second filter means that excludes a learning character as unsuitable for learning of the dictionary when the normalized distance between the learning character and its own character code is larger than a second threshold.

[0011] The above character recognition device further comprises:

- learning character selection means that classifies the learning characters by the template of the recognition dictionary to be learned, and selects from them, on the basis of recognition distance, a learning character that has not yet undergone learning processing;
- feature amount synthesis means that combines the feature amount of the selected learning character with the feature amount of the classified template while changing their combination ratio, thereby bringing the feature amount of the classified template closer to the learning character.

[0012] The above character recognition device further comprises dictionary learning method determination means. At the end of the synthesis process, this means determines whether a template needs to be added, on the basis of the minimum recognition distance of the learning character. The recognition dictionary data are then updated or added to according to the result of the determination.

[0013] Each of the above character recognition devices has learning character extracting means. This means obtains the learning characters by automatically excluding, from the characters to be corrected, any character that meets a predetermined exclusion condition.

[0014] Further, in the above character recognition device, the learning character extracting means automatically excludes a character to be corrected when it meets any one of the following exclusion conditions:
(1) a character inserted by correction input;
(2) a correction target character deleted by correction input;
(3) a correction target character that touches the character before or after it;
(4) a correction target character whose size is smaller than a threshold;
(5) a correction target character that is blurred (faint).

[0015]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

<Configuration Example of the Character Recognition Device> FIG. 1 is a block diagram showing a configuration example of the character recognition device of the present invention. FIG. 1(a) shows the character recognition device as a whole. FIG. 1(b) shows the dictionary learning unit, which is the main part of the present invention.

In FIG. 1(a), the character recognition device 100 has the following units:

- a character input unit 1, consisting of an optical reader, a handwritten character reader, or the like, which reads characters (or lines and dots), converts them into electric signals, and digitizes them to obtain image data;
- a character extracting unit 2, which divides the image data from the character input unit 1 into character images of one character each;
- a feature extracting unit 3, which extracts the features of each extracted character image to obtain a feature amount;
- a character recognition unit 4, which compares the feature amount of the input character with the feature amount of each template in the recognition dictionary 9 and outputs a recognition result (for example, a character code).

[0016] The character recognition device 100 further has the following units:

- a recognition result storage unit 5, which temporarily stores the recognition results output from the character recognition unit 4 when the recognition dictionary 9 is to be learned;
- a recognition result correcting unit 6, which displays the recognition results as characters on a monitor screen so that the operator can correct unread or misrecognized characters by keyboard input;
- a learning character extracting unit 7;
- a learning dictionary unit 8;
- the recognition dictionary 9.

The learning character extracting unit 7, the learning dictionary unit 8, and the recognition dictionary 9 are described in detail below.

[0017] The character recognition device 100 also has a control unit (not shown). The control unit has a microprocessor configuration that includes a CPU, RAM, ROM, and the like, and it controls the operation of the entire character recognition device. The character input unit 1 may also have its own dedicated control unit as a reading device.

The character extracting unit 2 through the dictionary learning unit 8 can be built entirely from hardware circuits. Alternatively, some of these modules may be hardware circuits and the others programs. When part of the character extracting unit 2 through the dictionary learning unit 8 is implemented as programs, each such module is recorded on a recording medium such as a ROM. It is then executed by the CPU of the control unit, under the control of a control program, to realize its processing.

[0018] In this embodiment, the following are implemented as programs: the character extracting unit 2 through the character recognition unit 4, part of the recognition result correcting unit 6, the learning character extracting unit 7, and the dictionary learning unit 8. The recognition result storage unit 5 is an internal memory such as RAM. The hardware part of the recognition result correcting unit 6 consists of a display device with a monitor screen, a keyboard, and the like.

[0019] <Learning Character Extracting Unit> The learning character extracting unit 7 works on the characters to be corrected in the recognition result correcting unit 6 (hereinafter, correction target characters). It automatically excludes the correction target characters that satisfy certain exclusion conditions. It then outputs the stored data of the remaining characters as learning characters that require learning. In this embodiment, the five exclusion conditions below are used, and every character that meets none of them is treated as a learning character.

[0020] Exclusion conditions:
(1) a character inserted by correction input;
(2) a correction target character deleted by correction input;
(3) a correction target character that touches the character before or after it;
(4) a correction target character whose size is smaller than a threshold;
(5) a correction target character that is blurred (faint).

Conditions (1), (2), and (3) exist because segmentation may have failed for such characters, which makes them unsuitable as learning characters. Conditions (1) and (2) can be judged from information obtained from the recognition result correcting unit 6. Condition (3) can be judged from information obtained from the character extracting unit 2.

Conditions (4) and (5) exist because such characters can be expected to have poor shapes, which also makes them unsuitable as learning characters. Condition (4) can be judged from information obtained from the character extracting unit 2. Condition (5) can be judged from the number of black dots (pixels) in the correction target character.
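As a rough sketch under stated assumptions, the five exclusion conditions reduce to a single predicate over each correction target character. The `CorrectedChar` record, its field names, and the threshold values are hypothetical. The text only says that condition (5) is judged from the black-dot count, so treating a low count as "blurred" is an assumption.

```python
from dataclasses import dataclass

@dataclass
class CorrectedChar:
    """Hypothetical record for one correction target character."""
    inserted: bool          # (1) inserted by correction input (info from unit 6)
    deleted: bool           # (2) deleted by correction input (info from unit 6)
    touches_neighbor: bool  # (3) touches the preceding/following character (unit 2)
    size: int               # (4) character size in pixels (info from unit 2)
    black_dots: int         # (5) number of black dots (pixels) in the image

def is_excluded(c: CorrectedChar, min_size: int, min_dots: int) -> bool:
    # (1)-(3): segmentation may have failed for this character.
    if c.inserted or c.deleted or c.touches_neighbor:
        return True
    # (4)-(5): the character shape is likely to be poor.
    # ASSUMPTION: "blurred" is approximated as too few black dots.
    return c.size < min_size or c.black_dots < min_dots

def extract_learning_chars(chars, min_size=16, min_dots=20):
    """Keep only the characters that meet none of the exclusion conditions."""
    return [c for c in chars if not is_excluded(c, min_size, min_dots)]
```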

[0021] <Recognition Dictionary> The recognition dictionary is recorded on a removable recording medium such as a hard disk, a floppy disk, or an optical disk. A read/write device for that medium (a magnetic disk device, floppy disk device, optical disk device, or the like) forms part of the character recognition device.

[0022] FIG. 2 is an explanatory diagram showing the structure of the recognition dictionary 9, which has a template group 91 and a filter data group 92.

Each template has a character code, the feature amount of the character corresponding to that code, and a template number.

For each character code, the filter data consist of:
- the mean of the maximum recognition distance and its mean square (variance);
- the mean of the minimum recognition distance and its mean square (variance);
- the number of characters used to compute these values.

The filter data are obtained by taking the character data that were used to create the recognition dictionary 9 and running on them, with that dictionary, the same recognition processing as the character recognition unit 4 performs. Here, the recognition distance is the distance, computed between vectors, from the feature amount of the character being recognized to a feature amount in the recognition dictionary that has the same character code.
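The filter data can be illustrated with the sketch below, which rests on several assumptions. Each training character is compared against every template of its own character code, and the per-character maximum and minimum distances are collected. "Mean square (variance)" is read as the population variance. The data layout, the Euclidean distance, and the name `build_filter_data` are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance

def build_filter_data(templates, samples):
    """templates: [(code, template_number, vector)]; samples: [(code, vector)].

    Returns {code: {"max_mean", "max_var", "min_mean", "min_var", "count"}}.
    """
    per_code_max = defaultdict(list)
    per_code_min = defaultdict(list)
    for code, vec in samples:
        # Recognition distances to every template with the same character code.
        dists = [math.dist(vec, t) for c, _n, t in templates if c == code]
        if not dists:
            continue
        per_code_max[code].append(max(dists))
        per_code_min[code].append(min(dists))
    filter_data = {}
    for code, maxima in per_code_max.items():
        minima = per_code_min[code]
        filter_data[code] = {
            "max_mean": fmean(maxima), "max_var": pvariance(maxima),
            "min_mean": fmean(minima), "min_var": pvariance(minima),
            "count": len(maxima),
        }
    return filter_data
```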

[0023] In the dictionary learning method of the present invention, each learned recognition dictionary is saved separately from the initial recognition dictionary (the standard dictionary), according to the kind of source of the input characters. If the input characters come from a source that has been learned before, the recognition dictionary learned at that time can be used for recognition. If they do not, the standard dictionary is used. In either case stable recognition conditions are always obtained.

The kind of source depends on the input. For characters obtained by reading handwritten forms or from a handwriting input device, the writer can serve as the source, because such characters carry individual traits such as writing habits. For printed characters, the book title or font can serve as the source.
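Below is a minimal sketch of keeping learned dictionaries per source while leaving the standard dictionary untouched. The class name, the string source keys (for example a writer ID, a book title, or a font name), and the copy-on-learn behaviour are illustrative assumptions.

```python
import copy

class DictionaryStore:
    """Standard dictionary plus one learned dictionary per input source."""

    def __init__(self, standard):
        self.standard = standard  # initial (standard) dictionary, never modified
        self.learned = {}         # source key -> learned dictionary

    def for_recognition(self, source):
        # A source learned before uses its own dictionary;
        # an unlearned source falls back to the standard dictionary.
        return self.learned.get(source, self.standard)

    def start_learning(self, source):
        # Learn on a private copy so the standard dictionary stays intact.
        return copy.deepcopy(self.for_recognition(source))

    def save_learned(self, source, dictionary):
        self.learned[source] = dictionary
```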

[0024] <Learning Dictionary Unit> [Configuration] In FIG. 1(b), the learning dictionary unit 8 has the following modules: a feature extracting unit 11, a recognition distance calculation unit 12, a learning character filter unit 13, a learning character selection unit 14, a learning character synthesis unit 15, a template addition unit 16, and a filter data changing unit 17.

These modules, from the feature extracting unit 11 through the filter data changing unit 17, can be built from hardware circuits, but in this embodiment they are implemented as programs. Some of them may also be hardware circuits and the others programs. When they are implemented as programs, each module is recorded on a recording medium such as a ROM. It is then executed by the CPU of the control unit, under the control of a control program, to realize the learning processing of the recognition dictionary.

[0025] [Dictionary Learning Operation] FIG. 3 is a flowchart showing the dictionary learning operation of the learning dictionary unit 8. The feature extracting unit 11 examines each of the learning characters output from the learning character extracting unit 7 in FIG. 1(a) and checks whether its data include the character's feature amount (S1). If they do, processing proceeds to S3. For learning data that do not include a feature amount, the character's feature amount is first extracted (created) (S2).

[0026] Next, the recognition distance calculation unit 12 creates recognition distance data (S3) from the features of the learning character and the feature amounts of the templates in the recognition dictionary 9. For each character code, the data hold the minimum recognition distance with the corresponding template number, and the maximum recognition distance with the corresponding template number. FIG. 4 shows the structure of the recognition distance data.
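Step S3 can be sketched as a single pass over the templates. For each character code, the pass keeps the smallest and the largest distance together with the template number that produced each. The field names and the Euclidean distance are assumptions, and FIG. 4 itself is not reproduced here.

```python
import math

def recognition_distance_data(feature, templates):
    """templates: [(code, template_number, vector)].

    Returns {code: {"min": (distance, template_no), "max": (distance, template_no)}}.
    """
    data = {}
    for code, number, vec in templates:
        d = math.dist(feature, vec)
        entry = data.get(code)
        if entry is None:
            data[code] = {"min": (d, number), "max": (d, number)}
            continue
        if d < entry["min"][0]:
            entry["min"] = (d, number)  # nearest template of this code so far
        if d > entry["max"][0]:
            entry["max"] = (d, number)  # farthest template of this code so far
    return data
```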

[0027] The learning character filter unit 13 automatically excludes learning characters that are unsuitable for learning, without requiring any judgment by the operator. Two checks are applied (S4).

First, the minimum recognition distance of each character code is normalized by Expression 2 (the recognition distance normalization formula) described later. For each character code other than that of the learning character, the normalized distance is compared with a threshold. If any of them is smaller than the threshold, the learning character is excluded. Such a learning character lies within the normal distribution of the minimum recognition distance of a different character code, so adding it to the recognition dictionary could affect the recognition of that other character code.

Second, the normalized distance of the maximum recognition distance for the learning character's own character code is obtained from Expression 2 and compared with a threshold. If it is larger than the threshold, the learning character is excluded. Such a learning character does not lie within the normal distribution of the maximum recognition distance of its own character code, so it is excluded as unreliable.

If no learning characters remain after this filtering, processing ends (S5).
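Expression 2 is not reproduced in this section, so the sketch below assumes that the normalized distance is a z-score against the filter data, (d - mean) / sqrt(variance). Under that assumption, the two checks work as follows. The first filter compares the minimum distance to every other character code against that code's minimum-distance statistics. The second compares the maximum distance to the learning character's own code against its maximum-distance statistics. The threshold values th1 and th2 and all names are hypothetical.

```python
import math

def normalized(d, mean, var):
    # ASSUMPTION: z-score used in place of Expression 2 (not shown here).
    return (d - mean) / max(math.sqrt(var), 1e-9)  # guard against zero variance

def is_unsuitable(code, dist_data, filter_data, th1, th2):
    """dist_data: {code: {"min": (dist, tpl), "max": (dist, tpl)}} for one character.
    filter_data: {code: {"min_mean", "min_var", "max_mean", "max_var"}}."""
    # First filter: inside another code's minimum-distance distribution.
    for other, entry in dist_data.items():
        if other != code and other in filter_data:
            f = filter_data[other]
            if normalized(entry["min"][0], f["min_mean"], f["min_var"]) < th1:
                return True
    # Second filter: outside the own code's maximum-distance distribution.
    f = filter_data.get(code)
    if f is not None and code in dist_data:
        if normalized(dist_data[code]["max"][0], f["max_mean"], f["max_var"]) > th2:
            return True
    return False

def filter_learning_chars(chars, filter_data, th1=2.0, th2=2.0):
    """chars: [(char_id, code, dist_data)]. An empty result ends learning (S5)."""
    return [c for c in chars
            if not is_unsuitable(c[1], c[2], filter_data, th1, th2)]
```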

[0028] The first check in the learning character filter unit 13 compares the learning character against character codes other than its own. Its purpose is to confirm that the learning character is statistically far enough from the dictionary templates of other character codes.

For a learning character, the minimum recognition distance of a given character code is the distance between two feature vectors: that of the learning character, and that of the template of that code, in the recognition dictionary, that lies nearest to the learning character. A short distance means the learning character is close to that character code (see FIG. 6).

If a learning character that is too close to another character code were learned, recognition of that other code would suffer: characters of the other code would then be recognized as the learning character's code. The learning character is therefore excluded.

[0029] The second check compares the learning character against its own character code. Its purpose is to confirm that the learning character falls within the statistical distribution of that code.

The maximum recognition distance within a character code approximates the radius of that code's feature space. If this value for a particular character is far from the mean, the character lies far from the feature space of the target code (see FIG. 7). A character far from the features of the other characters with the same code therefore has unreliable features and is unsuitable for learning. It is excluded from learning.

[0030] With these two exclusion conditions, characters that are unsuitable for learning can be excluded automatically, without operator judgment. These are characters that would affect the recognition of other character codes, and characters far from the average features of their own character code.

Such characters typically arise in two ways. One is a poor character image caused by blurring, filled-in (crushed) strokes, slant, or deformation. The other is the operator entering a wrong character code while correcting a recognition result.

[0031] The learning character selection unit 14 automatically selects effective learning characters without requiring operator judgment. Specifically, the learning characters that passed the filtering of the learning character filter unit 13 are classified by the recognition dictionary template that is to be learned (the learning target template). From each class, the unit selects in turn the learning character that has not yet been learned and has the smallest correct recognition distance (S6).

[0032] Here, the learning target template is the template, in the learning character's recognition distance data, that has the same character code as the learning character and gives the minimum recognition distance. In other words, it is the closest template with the same character code as the learning character. That minimum recognition distance is called the correct recognition distance.

More generally, define two distances for a character being recognized. The correct recognition distance is the minimum recognition distance for the same character code as that character. The misrecognition distance is the smallest of the minimum recognition distances for all other character codes. Let α be the threshold for reject determination. Then:

If correct recognition distance > misrecognition distance, the result is a misrecognition.

If misrecognition distance ≥ correct recognition distance ≥ misrecognition distance − α, the result is a reject.

If misrecognition distance − α > correct recognition distance, the result is a correct recognition.
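The three-way decision rule above can be written as a small function. This is a minimal sketch: the name `classify_result` and its argument names are hypothetical, and both distances are assumed to have been computed already.

```python
def classify_result(correct_dist: float, err_dist: float, alpha: float) -> str:
    """Classify a recognition result from its two distances (hypothetical helper).

    correct_dist: minimum distance to templates of the character's own code
    err_dist:     smallest minimum distance to templates of any other code
    alpha:        reject threshold (alpha >= 0)
    """
    if correct_dist > err_dist:
        return "misrecognition"
    if correct_dist >= err_dist - alpha:
        # err_dist >= correct_dist >= err_dist - alpha
        return "reject"
    # err_dist - alpha > correct_dist
    return "correct"
```

Example: with a misrecognition distance of 5.0 and α = 1.0, a correct recognition distance of 2.0 is a correct recognition. A distance of 4.5 falls in the reject band.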

[0033] The learning characters are divided by learning target template for a geometric reason, shown in FIG. 8. Consider learning characters that have the same character code, were input from the same source, and share the same dictionary template at the correct recognition distance (the learning target template). When their feature vectors are viewed from that template's feature vector, they all point in the same direction.

[0034] Learning therefore starts from the learning characters closest to the learning target template. The template's feature amount is moved gradually toward theirs until they are recognized, which keeps the change to the learning target template to a minimum. A recognition dictionary is normally well balanced among its templates in its initial (unlearned) state, so this method allows learning without seriously upsetting that balance. Because no operator judgment is needed, the operator also does not need specialized knowledge of character features or learning.
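The grouping and nearest-first ordering of S6 can be sketched as follows. The patent does not specify data structures, so the following are all assumptions made for illustration:

- the input is a list of `(char_id, template_id, correct_distance)` triples;
- the names `learning_order` and `learned` are hypothetical.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Hashable, Iterable, List, Tuple


def learning_order(
    chars: Iterable[Tuple[str, Hashable, float]],
    learned: AbstractSet[str] = frozenset(),
) -> Dict[Hashable, List[str]]:
    """Group learning characters by learning target template, nearest first.

    chars:   (char_id, template_id, correct_distance) triples, where template_id
             is the same-code template at the correct recognition distance
    learned: ids of characters already learned; these are skipped
    """
    groups: Dict[Hashable, List[Tuple[float, str]]] = defaultdict(list)
    for char_id, template_id, correct_dist in chars:
        if char_id not in learned:
            groups[template_id].append((correct_dist, char_id))
    # Within each template group, the character with the smallest
    # correct recognition distance comes first
    return {t: [cid for _, cid in sorted(items)] for t, items in groups.items()}
```

Each template's list is then processed front to back. As a result, the template is pulled first toward the characters that need the smallest move.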

[0035] The learning character synthesis unit 15 combines two feature amounts as vector quantities: that of the learning character selected by the learning character selection unit 14, and that of the learning target template. It uses the feature amount synthesis method described later (Equation 3; see FIG. 5), gradually changing the learning character's synthesis ratio from its initial value. This brings the feature amount of the learning target template closer to the learning character. The feature amount synthesis process ends (S7) when any of the three synthesis termination conditions described later is satisfied during the repeated synthesis.

[0036] When the repeated feature amount synthesis ends, the template addition unit 16 checks whether the learning character is now correctly recognized. If it is, the process proceeds to S9. If it is not, the feature amount of the learning character is added to the recognition dictionary as a new template (S8).

The two learning methods trade off differently:

Feature amount synthesis alone does not increase the number of templates, so it does not affect recognition speed after learning. However, the recognition rate may fall if the dictionary drifts far from the original, or if a template moves too close to another character code.

Template addition alone does not lower the recognition rate as long as the new template is not too close to other character codes. However, every added template slows recognition.

The present invention therefore gives priority to learning by feature amount synthesis (S7). When synthesis termination condition (Equation 5) or (Equation 6), described later, is satisfied, it judges that method unsuitable and switches to learning by template addition. The optimal learning method for each learning character is thus selected without requiring the operator's judgment.

[0037] The filter data changing unit 17 corrects the filter data of the recognition dictionary 9 once the learning character has come to be correctly recognized through S7 and S8 (learning character synthesis unit 15) or S9 (template addition unit 16). Specifically, it recognizes the learning character again with the learned dictionary. The resulting recognition distance data are then added into three stored values used by the learning character filter unit 13:

the mean of the minimum/maximum recognition distances;

the mean square of those distances;

the character count.

This updates the three values to suit the characters being corrected (S10).
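Because the filter data hold only a mean, a mean square, and a count, the S10 update can be done incrementally. The sketch below makes the following assumptions:

- one such record is kept per character code and per distance type (minimum or maximum);
- the class name `DistanceStats` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DistanceStats:
    """Hypothetical filter data for one character code and one distance type
    (e.g. minimum recognition distance)."""
    mean: float = 0.0     # ave(d)
    mean_sq: float = 0.0  # ave(d^2)
    count: int = 0        # number of characters folded in so far

    def add(self, d: float) -> None:
        """Fold one new recognition distance into the stored averages."""
        n = self.count
        self.mean = (self.mean * n + d) / (n + 1)
        self.mean_sq = (self.mean_sq * n + d * d) / (n + 1)
        self.count = n + 1
```

Storing the mean square rather than the variance keeps the update a simple running average. The variance that Equation 1 needs can still be recovered at any time.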

[0038] First, the learning process (S7 to S9) is completed for all learning characters selected by the learning character selection unit 14. The recognition distance calculation unit 12 then recreates the recognition distance data using the learned recognition dictionary. After that, the learning process of S3 to S10 (recognition distance calculation unit 12 through filter data changing unit 17) is repeated only for learning characters that are still misrecognized (S11). Repetition stops when all stored learning characters are correctly recognized, or when no character passes the learning character filter unit 13.
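The outer repetition of S11 can be sketched as a driver loop over caller-supplied callbacks. The following are all hypothetical placeholders, not interfaces from the patent:

- `is_correct` stands in for recognition with the current dictionary;
- `passes_filter` stands in for the learning character filter unit 13;
- `learn_one` stands in for the per-character learning steps;
- the `max_rounds` cap is an added safeguard.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def repeat_learning(
    stored: List[T],
    is_correct: Callable[[T], bool],
    passes_filter: Callable[[T], bool],
    learn_one: Callable[[T], None],
    max_rounds: int = 50,
) -> int:
    """Repeat learning rounds and return how many rounds ran.

    Stops when every stored learning character is correctly recognized,
    or when none of the still-misrecognized characters passes the filter.
    """
    for rounds in range(max_rounds):
        # Re-recognize every stored character with the current dictionary
        pending = [c for c in stored if not is_correct(c)]
        if not pending:
            return rounds
        candidates = [c for c in pending if passes_filter(c)]
        if not candidates:
            return rounds
        for c in candidates:
            learn_one(c)
    return max_rounds
```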

[0039] [Normalization of recognition distance, etc.] The normalized minimum/maximum recognition distances, which the learning character filter unit 13 compares against thresholds, are obtained as follows. The recognition dictionary 9 stores, for each character code co, the mean $\mathrm{ave}(d_{co})$ and the mean square $\mathrm{ave}(d_{co}^{2})$ of the recognition distance $d_{co}$. From these, Equation 1 gives the variance $\mathrm{var}(d_{co})$:

$$\mathrm{var}(d_{co}) = \mathrm{ave}(d_{co}^{2}) - \left(\mathrm{ave}(d_{co})\right)^{2} \qquad \text{(Equation 1)}$$

The recognition distance calculation unit 12 produces the learning character's recognition distance data. The minimum recognition distance for each character code in those data is normalized by Equation 2, using the variance of that code's minimum recognition distance from Equation 1:

$$d'_{co} = \frac{d_{co} - \mathrm{ave}(d_{co})}{\sqrt{\mathrm{var}(d_{co})}} \qquad \text{(Equation 2)}$$

The symbols are defined as follows:

$d_{co}$ is the learning character's minimum recognition distance for character code co.

$d'_{co}$ is the normalized value of that distance.

$\mathrm{ave}(d_{co})$ is the mean minimum recognition distance for character code co stored in the recognition dictionary 9.

$\mathrm{var}(d_{co})$ is the variance of the minimum recognition distance for co, calculated by Equation 1.
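Equations 1 and 2 combine into a single normalization step computed from the stored mean and mean square. This is a minimal sketch: the function name is hypothetical, and rejecting a non-positive variance (for example, when too few samples are stored) is an added assumption.

```python
import math


def normalized_distance(d: float, mean: float, mean_sq: float) -> float:
    """Normalize one recognition distance for one character code.

    d:       the learning character's minimum (or maximum) recognition
             distance for the character code
    mean:    stored ave(d) for that character code
    mean_sq: stored ave(d^2) for that character code
    """
    var = mean_sq - mean ** 2  # Equation 1
    if var <= 0.0:
        raise ValueError("variance must be positive to normalize")
    return (d - mean) / math.sqrt(var)  # Equation 2
```

Example: with a stored mean of 3.0 and mean square of 10.0, the variance is 1.0. A distance of 5.0 therefore normalizes to 2.0.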

[0040] The maximum recognition distance for each character code in the learning character's recognition distance data (obtained by the recognition distance calculation unit 12) can be normalized in the same way. It uses the variance of each code's maximum recognition distance from Equation 1, together with Equation 2. In this case, Equation 2 is applied with the following substitutions:

$d_{co}$ is the learning character's maximum recognition distance for character code co.

$d'_{co}$ is the normalized value of that distance.

$\mathrm{ave}(d_{co})$ is the mean maximum recognition distance for co stored in the recognition dictionary 9.

$\mathrm{var}(d_{co})$ is the variance of the maximum recognition distance for co, calculated by Equation 1.

[0041] In this way, the range of observed values in a normal distribution can be expressed as a range of values normalized by Equation 2, using the sample mean and variance. For example, at a 95% confidence level the standard normal table gives −1.96 to 1.96 as the range (confidence interval) of the normalized observations. This makes it possible to determine whether a recognition distance lies within the normal distribution of recognition distances for a given character code.
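The confidence-interval test and the two exclusion filters of claim 5 can be sketched on already-normalized distances. The following are assumptions made for illustration:

- the names `within_distribution` and `passes_filters` are hypothetical;
- this section does not say whether each filter uses the minimum or the maximum distance, so the sketch takes a generic per-code normalized distance;
- the threshold values `t1` and `t2` are not given here; the 1.96 used in the example is illustrative only.

```python
from typing import Mapping

Z_95 = 1.96  # two-sided 95% point of the standard normal distribution


def within_distribution(z: float, z_crit: float = Z_95) -> bool:
    """True if a normalized distance lies inside the confidence interval."""
    return -z_crit <= z <= z_crit


def passes_filters(own_code: str, z_by_code: Mapping[str, float],
                   t1: float, t2: float) -> bool:
    """Apply the two exclusion filters to one learning character.

    z_by_code: normalized distance from the learning character to each
               character code (Equation 2), including its own code
    t1, t2:    first and second thresholds
    """
    # First filter: too close to some other character code
    if any(z < t1 for code, z in z_by_code.items() if code != own_code):
        return False
    # Second filter: too far from its own character code
    return z_by_code[own_code] <= t2
```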

[0042] [Feature amount synthesis method and synthesis termination conditions]
(1) Feature amount synthesis method
The feature amount of the learning character and the feature amount of the learning target template are combined as vector quantities by Equation 3, while the synthesis ratio of the learning character is gradually changed from its initial value. FIG. 5 shows how the feature amount transitions during learning by synthesis.

$$F_{dic}' = (1 - R)\,F_{dic} + R\,F_{unk} \qquad \text{(Equation 3)}$$

The symbols are defined as follows:

$F_{unk}$ is the feature amount of the learning character.

$F_{dic}$ is the feature amount of the learning target template in the recognition dictionary.

$F_{dic}'$ is the feature amount of that template after synthesis.

$R$ is the synthesis ratio ($0 < R < 1$).
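Equation 3 is a linear interpolation between the template and the learning character. A direct transcription follows; the function name `synthesize` is hypothetical, and feature amounts are assumed to be plain numeric vectors.

```python
from typing import List, Sequence


def synthesize(f_dic: Sequence[float], f_unk: Sequence[float], r: float) -> List[float]:
    """Equation 3: F_dic' = (1 - R) * F_dic + R * F_unk, with 0 < R < 1."""
    if not 0.0 < r < 1.0:
        raise ValueError("synthesis ratio R must satisfy 0 < R < 1")
    if len(f_dic) != len(f_unk):
        raise ValueError("feature vectors must have the same dimension")
    return [(1.0 - r) * a + r * b for a, b in zip(f_dic, f_unk)]
```

Because 0 < R < 1, the synthesized template always lies on the segment between $F_{dic}$ and $F_{unk}$. It moves a fraction R of the way toward the learning character.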

[0043] (2) Synthesis termination conditions
As described above, the processing in the learning character synthesis unit 15 ends when any of the following three conditions is satisfied during repeated synthesis by Equation 3.

① The learning character is correctly recognized, and the difference between the misrecognition distance and the correct recognition distance is larger than a threshold, that is, Equation 4 holds:

$$\lvert F_{dic}^{err} - F_{unk} \rvert - \lvert F_{dic}' - F_{unk} \rvert > \alpha_1 \qquad \text{(Equation 4)}$$

Here $F_{dic}^{err}$ is the feature vector of the template at the misrecognition distance, and $\alpha_1$ (> 0) is a threshold.

② The learning target template moves by more than a threshold between before and after synthesis, that is, Equation 5 holds:

$$\lvert F_{dic}' - F_{dic} \rvert > \alpha_2 \qquad \text{(Equation 5)}$$

Here $\alpha_2$ (> 0) is a threshold.

③ The learning target template comes too close to a template of another character code, that is, Equation 6 holds. The distance is measured between the template's feature vector after learning and the feature vector of a dictionary template whose character code differs from the learning character's:

$$\lvert F_{dic}^{err} - F_{dic}' \rvert < \alpha_3 \qquad \text{(Equation 6)}$$

Here $\alpha_3$ (> 0) is a threshold.
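The repeated synthesis, its three termination conditions, and the switch to template addition described in [0036] can be sketched as a single loop. Every choice below is a stated assumption, not the patent's specification:

- |·| is taken as Euclidean distance (`math.dist`);
- R starts at `r_init` and grows by `r_step`, and each candidate is computed from the original template;
- when condition ② or ③ fires, the last accepted template is kept and template addition is signaled;
- if R reaches 1 without meeting condition ①, template addition is also signaled;
- the name `learn_by_synthesis` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def learn_by_synthesis(
    f_dic: Sequence[float],  # learning target template before learning
    f_unk: Sequence[float],  # learning character
    f_err: Sequence[float],  # template at the misrecognition distance
    alpha1: float, alpha2: float, alpha3: float,
    r_init: float = 0.05, r_step: float = 0.05,
) -> Tuple[List[float], str]:
    """Repeat Equation 3 with a growing ratio R until a termination condition holds.

    Returns the template to keep and either "synthesized" (condition 1 met)
    or "add_template" (condition 2 or 3 met, or R reached 1).
    """
    accepted = list(f_dic)
    k = 0
    r = r_init
    while r < 1.0:
        cand = [(1.0 - r) * a + r * b for a, b in zip(f_dic, f_unk)]
        # Condition 2 (Equation 5): the template would move too far
        if math.dist(cand, f_dic) > alpha2:
            return accepted, "add_template"
        # Condition 3 (Equation 6): too close to another code's template
        if math.dist(f_err, cand) < alpha3:
            return accepted, "add_template"
        accepted = cand
        # Condition 1 (Equation 4): correct recognition with a margin
        if math.dist(f_err, f_unk) - math.dist(cand, f_unk) > alpha1:
            return accepted, "synthesized"
        k += 1
        r = r_init + k * r_step  # recompute R to avoid float drift
    return accepted, "add_template"
```

Computing R as `r_init + k * r_step` rather than accumulating it keeps the ratio schedule free of rounding drift. Checking conditions ② and ③ before accepting a candidate means a rejected step never changes the dictionary.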

[0044] Termination condition ① requires more than correct recognition: the misrecognition distance must exceed the correct recognition distance by more than the threshold α1. The reason is that even characters of the same character code input from the same source vary somewhat in their feature amounts. Learning each learning character so that its correct recognition distance keeps a margin lets the dictionary absorb that variation.

[0045] Termination condition ② guards the dictionary's balance. As noted above, the templates of a recognition dictionary are normally well balanced in the initial state. If an existing template moves a long way, that balance is badly disturbed and the recognition rate for that character code may fall. Condition ② prevents this.

[0046] Termination condition ③ prevents the learning target template from coming so close to a template of another character code that it degrades recognition of that character code.

[0047] [Minimum recognition distance] FIG. 6 is an explanatory diagram of the minimum recognition distance. The elements in FIG. 6 are as follows:

A (triangle) is the feature vector of the learning character X.

The right circle a (broken line) is the distribution range of the feature vectors of the first character code of FIG. 5, and the large left circle d (broken line) is that of the second character code.

For the first character code, b (crosses) are its character feature vectors, c (circles) are its dictionary template feature vectors, and B (double circle) is its dictionary template closest to the learning character.

For the second character code, e (crosses) are its character feature vectors, f (circles) are its dictionary template feature vectors, and D (double circle) is its dictionary template closest to the learning character.

The minimum recognition distance between the learning character and the first character code is then the length of line segment C. The minimum recognition distance between the learning character X and the second character code is the length of line segment E. Since C < E, the minimum recognition distance of the learning character X is the length of line segment C.
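The per-code and overall minimum recognition distances of FIG. 6 can be computed as follows. This sketch assumes Euclidean distance as the recognition distance (the measure is not defined in this section). It also assumes a hypothetical dictionary layout mapping each character code to its list of template vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def min_distance_per_code(x: Vector, templates: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Distance from feature vector x to the nearest template of each character code."""
    return {code: min(math.dist(x, t) for t in vecs) for code, vecs in templates.items()}


def minimum_recognition_distance(x: Vector, templates: Dict[str, List[Vector]]) -> Tuple[str, float]:
    """The code whose nearest template is closest to x, and that distance."""
    per_code = min_distance_per_code(x, templates)
    best = min(per_code, key=per_code.__getitem__)
    return best, per_code[best]
```

In FIG. 6 terms, the per-code values for the first and second character codes correspond to segments C and E. The overall result is C, because C < E.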

[0048] [Maximum recognition distance] FIG. 7 is an explanatory diagram of the maximum recognition distance. The elements in FIG. 7 are as follows:

A (triangle) is the learning character.

The circle a (broken line) is the distribution range of the feature vectors of the character code of FIG. 5.

b (crosses) are the character feature vectors of that code, and c (circles) are its dictionary template feature vectors.

d is the maximum recognition space (≈ radius of the feature space).

B (double circle) is the dictionary template farthest from the learning character A.

The maximum recognition distance of that character code for the learning character A is then the length of line segment BA.
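The maximum recognition distance is the counterpart of the minimum: the distance to the farthest template of each code. As with the minimum, this sketch assumes Euclidean distance and a hypothetical code-to-templates dictionary. These per-code maxima are the values that [0040] normalizes with Equation 2.

```python
import math
from typing import Dict, List, Sequence


def max_distance_per_code(x: Sequence[float],
                          templates: Dict[str, List[Sequence[float]]]) -> Dict[str, float]:
    """Distance from x to the farthest template of each character code
    (segment BA in FIG. 7 for the code shown there)."""
    return {code: max(math.dist(x, t) for t in vecs) for code, vecs in templates.items()}
```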

[0049] [Directionality of learning characters] FIG. 8 is an explanatory diagram of the directionality of learning characters. In FIG. 8, the right circle a (broken line) is the distribution range of the feature vectors of the character code of FIG. 5. The crosses b are the character feature vectors, and the circles c are the dictionary template feature vectors.

Learning character group A has a particular template (black circle) as its minimum-recognition-distance template, and it lies within a certain angular range centered on that template. Learning character group B has another template (double circle) as its minimum-recognition-distance template, and it likewise lies within a certain angular range centered on that template. In other words, each group has a directionality.

[0050] [Effects of the Invention] As described above, the dictionary learning method and character recognition device of the present invention can automatically exclude learning characters unsuitable for learning, based on the recognition distance. In addition, learning characters can be classified by the template of the recognition dictionary to be learned and selected based on the recognition distance. No operator judgment is therefore required, and the operator does not need specialized knowledge of character features or learning.

[0051] In addition, the need to add a template is judged from the minimum recognition distance of the learning character, and the recognition dictionary data are updated or added accordingly. The optimal learning method for each learning character can therefore be selected without operator judgment.

[0052] Furthermore, learning characters are obtained by automatically excluding, from the characters to be corrected, those that meet predetermined exclusion conditions. This prevents unsuitable correction target characters from becoming learning characters. An example is a character whose segmentation failed because it touches an adjacent character. No learning effort is wasted on such characters.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a configuration example of the character recognition device of the present invention.

FIG. 2 is a structural diagram showing the structure of the recognition dictionary.

FIG. 3 is a flowchart showing the dictionary learning operation of the learning dictionary unit.

FIG. 4 is an explanatory diagram showing the structure of the recognition distance data.

FIG. 5 is an explanatory diagram showing how the feature amount transitions during learning by synthesis.

FIG. 6 is an explanatory diagram of the minimum recognition distance.

FIG. 7 is an explanatory diagram of the maximum recognition distance.

FIG. 8 is an explanatory diagram of the directionality of learning characters.

[Explanation of symbols]

7 Learning character extraction unit (learning character extraction means)
9 Recognition dictionary
13 Learning character filter unit (first filter means, second filter means)
14 Learning character selection unit (learning character selection means)
15 Learning character synthesis unit (feature amount synthesis means)
16 Template addition unit (dictionary learning method determination means)
91 Template
92 Filter data (recognition dictionary data)
100 Character recognition device

[Claims]

1. A dictionary learning method for a character recognition device for performing a learning process of a recognition dictionary, comprising a step of automatically excluding a learning character unsuitable for learning of the recognition dictionary based on a recognition distance.

2. The dictionary learning method according to claim 1, further comprising:
a step of classifying learning characters by template of the recognition dictionary to be learned;
a step of selecting, based on the recognition distance, learning characters that have not yet undergone learning processing; and
a step of synthesizing the feature amount of the selected learning character with the feature amount of the classified template while gradually changing the synthesis ratio, thereby bringing the feature amount of the classified template closer to the learning character.

3. The dictionary learning method according to claim 2, further comprising a step of, at the end of the synthesis process, determining whether a template needs to be added based on the minimum recognition distance of the learning character, and updating or adding recognition dictionary data accordingly.

4. The dictionary learning method according to claim 1, further comprising a step of obtaining learning characters by automatically excluding, from the characters to be corrected, characters that meet a predetermined exclusion condition.

5. A character recognition device that performs learning processing of a recognition dictionary, comprising:
first filter means for excluding a learning character as unsuitable for learning of the recognition dictionary when the normalized distance between the learning character and any character code different from that of the learning character is smaller than a first threshold; and
second filter means for excluding a learning character as unsuitable for learning of the recognition dictionary when the normalized distance between the learning character and its own character code is larger than a second threshold.

6. The character recognition device according to claim 5, further comprising:
learning character selection means for classifying learning characters by template of the recognition dictionary to be learned, and selecting, based on the recognition distance, learning characters that have not yet undergone learning processing; and
feature amount synthesis means for synthesizing the feature amount of the selected learning character with the feature amount of the classified template while changing the synthesis ratio, thereby bringing the feature amount of the classified template closer to the learning character.

7. The character recognition device according to claim 6, further comprising dictionary learning method determination means for, at the end of the synthesis process, determining whether a template needs to be added based on the minimum recognition distance of the learning character, and updating or adding recognition dictionary data according to the result.

8. The character recognition device according to claim 5, further comprising learning character extraction means for obtaining learning characters by automatically excluding, from the correction target characters, characters that meet a predetermined exclusion condition.

9. The character recognition device according to claim 8, wherein the learning character extraction means automatically excludes a correction target character when it meets any of the following exclusion conditions:
the character was inserted by correction input;
the character was deleted by correction input;
the character touches the character before or after it;
the size of the character is smaller than a threshold; or
the character is blurred.