JPH0935006A

JPH0935006A - Character recognition device

Info

Publication number: JPH0935006A
Application number: JP7181275A
Authority: JP
Inventors: Yasunao Isaki; 保直伊▲崎▼
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1995-07-18
Filing date: 1995-07-18
Publication date: 1997-02-07

Abstract

PROBLEM TO BE SOLVED: To create a standard character pattern dictionary optimized for a given operation by automatically changing the contents of the standard pattern dictionary according to character statistical information obtained by performing character recognition on the document to be recognized, or on a test document that reflects the bias of the characters used in that operation. SOLUTION: Character recognition is performed on a test document or the like, and a character statistical information creating unit 2 uses the recognition result 6" to create, character by character, statistical information such as the number of appearances or the characters whose misreading was corrected. A standard pattern dictionary changing unit 3 changes the contents of the standard pattern dictionary 4 on the basis of this character statistical information. For recognition, the character pattern of a document is first input from a character pattern input unit 5. A character recognition unit 6 cuts out one character from the input character pattern, extracts its features, compares them with the features (templates) in the standard pattern dictionary 4 created in this way, and computes a similarity, a distance, or the like to recognize the character. A recognition result output unit 7 outputs the character recognition result.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device. A character recognition device recognizes characters using a standard pattern dictionary built in advance from a large number of learning samples, and as the range of OCR applications widens, such devices are used in many kinds of business operations. As a result, the forms subjected to character recognition come in various formats, the character types entered on them differ, and the frequency with which each character type is used is biased from one operation to another. For example, on a form where only addresses and names are entered, usage is biased toward the characters found in city names, prefecture names, surnames, given names, and the like. Even on a form where only the ten digits are entered, 0 and 1 usually appear far more often than the other digits.

[0002] When a general-purpose standard pattern dictionary developed to be applicable to all of these operations is used, high recognition accuracy is not necessarily obtained in every operation. There is therefore a demand for a character recognition device that recognizes characters with high accuracy even when appearance frequencies are biased.

[0003]

[Prior Art] FIG. 15 shows a conventional character recognition device. In FIG. 15, reference numeral 310 denotes the character recognition device.

[0004] Reference numeral 311 denotes a character pattern input unit, which inputs the patterns of handwritten characters written on a form, as read by an optical character reading device or the like. Reference numeral 312 denotes a standard pattern dictionary, which collects the features of the standard patterns for each character category. Each template represents the features of a standard pattern of its character category.

[0005] In the standard pattern dictionary 312, reference numeral 321 denotes a character category, which represents the meaning of a character pattern such as "ア", "イ", or "ウ".

[0006] Reference numeral 322 denotes a standard pattern, here a standard pattern of the handwritten character "ア". Reference numeral 323 denotes a template, which represents the features of a standard pattern of a handwritten character (for example, the number of horizontal lines, the number of vertical lines, the shape of the contour, and so on).

[0007] The operation of the conventional character recognition device of FIG. 15 is as follows. The character pattern input unit 311 inputs the handwritten characters of a form read by an optical character reading device or the like. The character recognition unit 313 cuts out the pattern of a single character from the input handwritten-character pattern and extracts its features. It then compares the extracted features with each template in the standard pattern dictionary 312 and computes a similarity or a distance. The character category of the template giving the maximum similarity or the minimum distance is output as the recognition result.
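The matching step of the conventional device can be illustrated with a minimal sketch. The source does not say which similarity measure is used, so cosine similarity is an assumption here, and the dictionary contents, the three-element feature vectors, and the function names are hypothetical.

```python
import math

# Hypothetical standard pattern dictionary: category -> list of templates.
# Each template is a made-up feature vector, e.g. (horizontal lines,
# vertical lines, a contour measure); the values are illustrative only.
STANDARD_PATTERN_DICTIONARY = {
    "ア": [[2, 1, 1], [3, 1, 0]],
    "イ": [[0, 2, 1]],
    "ウ": [[1, 3, 2]],
}

def cosine_similarity(a, b):
    """Similarity of two feature vectors (assumed measure, not from the source)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def recognize_by_similarity(features, dictionary):
    """Return the category of the template with the maximum similarity."""
    best_category, best_score = None, -math.inf
    for category, templates in dictionary.items():
        for template in templates:
            score = cosine_similarity(features, template)
            if score > best_score:
                best_category, best_score = category, score
    return best_category

result = recognize_by_similarity([2, 1, 0], STANDARD_PATTERN_DICTIONARY)
```

A distance-based variant would keep the same loop and select the minimum instead of the maximum.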

[0008]

[Problems to Be Solved by the Invention] A conventional character recognition device built its standard pattern dictionary by anticipating all operations in advance and selecting multiple standard patterns from a large number of learning samples on the assumption that all characters appear with equal frequency. Consequently, in actual use, maximum recognition accuracy is obtained when the device is applied to an operation in which every character appears with equal frequency. In many cases, however, the appearance frequencies of characters are biased and differ greatly from the conditions under which the standard pattern dictionary was created, so recognition accuracy falls.

[0009] Moreover, the learning character samples do not include every handwritten-character pattern. For character types that appear frequently in actual operation, recognition accuracy therefore falls on character patterns that are more deformed than the learning samples, and the conventional character recognition device may achieve lower recognition accuracy than was expected.

[0010] An object of the present invention is to provide a character recognition device that achieves high recognition accuracy even when the characters used on a form are biased.

[0011]

[Means for Solving the Problems] Basic configuration (1) of the present invention comprises: a character statistical information creating unit that creates character statistical information for a form; a standard pattern dictionary holding standard patterns that represent the features of characters; a standard pattern dictionary changing unit that changes the contents of the standard pattern dictionary on the basis of the character statistical information; a character recognition unit that compares a character pattern to be recognized with the standard patterns in the standard pattern dictionary and recognizes the character; and a recognition result output unit that outputs the result of character recognition.

[0012] Basic configuration (2) of the present invention comprises: a character statistical information creating unit that creates character statistical information for a form; a standard pattern dictionary holding standard patterns that represent the features of characters; a character recognition unit that compares a character pattern to be recognized with the standard patterns in the standard pattern dictionary and recognizes the character; a weight creating unit that weights characters on the basis of the character statistical information; and a recognition result output unit that outputs the result of character recognition. The recognition result is obtained and output on the basis of both the result of comparing the feature points of the character pattern to be recognized with those of the standard patterns and the weights created by the weight creating unit.

[0013] FIG. 1 shows the basic configuration of the present invention. In FIG. 1, reference numeral 1 denotes a character recognition device.

[0014] Reference numeral 2 denotes a character statistical information creating unit, which creates, for example, the number of times each character appears or the number of times each character is corrected. Reference numeral 3 denotes a standard pattern dictionary changing unit, which changes the contents of the standard pattern dictionary on the basis of the statistical information of the input characters. For example, it changes the number of templates used for recognition in each character category according to the number of times the character appears.

[0015] Reference numeral 4 denotes a standard pattern dictionary. Reference numeral 5 denotes a character pattern input unit, which inputs the character pattern of a form read by an optical character reading device or the like.

[0016] Reference numeral 6 denotes a character recognition unit, which compares the input character pattern, one character at a time, with the character patterns of the standard pattern dictionary 4 and recognizes the character. Reference numeral 6" denotes the recognition result.

[0017] Reference numeral 7 denotes a recognition result output unit, which outputs the recognition result to a display or the like. The operation of basic configuration (1) of the present invention shown in FIG. 1 is described next.

[0018] First, character recognition is performed on forms used in the operation, test forms for learning, or the like. From the recognition result 6", the character statistical information creating unit 2 creates character statistical information for each character, such as the number of appearances or the characters whose misreading was corrected. The standard pattern dictionary changing unit 3 then changes the contents of the standard pattern dictionary 4 on the basis of the character statistical information. For example, the number of templates is increased for characters that appear often and decreased for characters that appear rarely. Alternatively, if the character statistical information is the number of corrections for each character that was corrected after a misreading (corrected character), the number of templates is increased for corrected characters with many corrections and decreased for corrected characters with few corrections.

[0019] The contents of the standard pattern dictionary 4 are changed in this way. For character recognition, the character pattern of a form is first input from the character pattern input unit 5. The character recognition unit 6 cuts out one character from the input character pattern, extracts its features, compares them with the features (templates) of the standard pattern dictionary 4 created as described above, computes a similarity, a distance, or the like, and recognizes the character. The recognition result output unit 7 outputs the character recognition result.

[0020] FIG. 2 shows basic configuration (2) of the present invention. In FIG. 2, reference numeral 1 denotes a character recognition device.

[0021] Reference numeral 4 denotes a standard pattern dictionary. Reference numeral 5 denotes a character pattern input unit. Reference numeral 6 denotes a character recognition unit.

[0022] Reference numeral 6' denotes a comparison unit, which compares the features of the input one-character pattern with the features of the standard dictionary and computes a similarity, a distance, or the like. It then applies the weight created by the weight creating unit 13 to the similarity or distance.

[0023] Reference numeral 6" denotes the recognition result. Reference numeral 12 denotes a statistical information creating unit. Reference numeral 13 denotes a weight creating unit, which creates a weight for each character on the basis of character statistical information such as the number of appearances. For example, it assigns a low weight to characters that appear often and a high weight to characters that appear rarely, and the character recognition unit 6 multiplies the computed distance by the weight and recognizes the character on the basis of the product. When recognition is based on similarity, the weight of characters that appear often is instead made large and the weight of characters that appear rarely is made small; the character recognition unit 6 multiplies the similarity by the weight and recognizes the character on the basis of the product.

[0024] The operation of the configuration of FIG. 2 is as follows. First, character recognition is performed on forms used in the operation, test forms for learning, or the like, and from the recognition result 6" the character statistical information creating unit 2 creates character statistical information for each character, such as the number of appearances or the characters whose misreading was corrected. The weight creating unit 13 then assigns a weight to each character on the basis of the character statistical information. For example, when recognition is based on the distance between a one-character pattern and the templates of the standard pattern dictionary, the weight of characters that appear often is made small and the weight of characters that appear rarely is made large. Conversely, when recognition is based on similarity, the weight of characters that appear often is made large and the weight of characters that appear rarely is made small.

[0025] The character pattern input unit 5 inputs the character pattern of a form. The character recognition unit 6 cuts out a one-character pattern from the input character pattern and compares its features with the features of the standard pattern dictionary 4. It then applies the weight created by the weight creating unit 13 to the distance or similarity obtained from the comparison and recognizes the character. For example, the distance or similarity obtained by comparing the features is multiplied by the weight to give a converted distance or converted similarity. One or more candidates are output as the recognition result in ascending order of converted distance. In the case of converted similarity, one or more candidates are output as the recognition result in descending order of converted similarity.
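The weighting scheme of basic configuration (2) can be sketched for the distance case as follows. The source only states that frequent characters receive smaller weights; the specific form 1 / (1 + relative frequency), the sample counts, and the function names are assumptions made for illustration.

```python
def distance_weights(appearance_counts):
    """Smaller weight for more frequent characters (assumed weight formula)."""
    total = sum(appearance_counts.values())
    return {ch: 1.0 / (1.0 + n / total) for ch, n in appearance_counts.items()}

def rank_by_converted_distance(distances, weights, top_k=3):
    """Multiply each distance by its weight and rank in ascending order."""
    converted = {ch: d * weights.get(ch, 1.0) for ch, d in distances.items()}
    return sorted(converted.items(), key=lambda item: item[1])[:top_k]

# Hypothetical statistics and raw distances for one input character.
weights = distance_weights({"0": 60, "1": 30, "7": 10})
candidates = rank_by_converted_distance({"1": 1.0, "7": 0.9, "0": 3.0}, weights)
```

With these made-up numbers the raw distances favor "7", while the converted distances rank the more frequent "1" first. For similarity the weights would grow with frequency and the ranking would be in descending order.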

[0026]

[Operation] According to basic configuration (1) of the present invention, the contents of the standard pattern dictionary are changed automatically on the basis of character statistical information obtained by performing character recognition on the forms to be recognized, or on test forms that reflect the bias of the characters used in the operation. A standard character pattern dictionary suited to the operation can therefore always be created, and character recognition can be performed for each operation with an accuracy close to the maximum recognition accuracy.

[0027] According to basic configuration (2) of the present invention, characters are weighted on the basis of character statistical information obtained by performing character recognition on the forms to be recognized, or on test forms that reflect the bias of the characters used in the operation, and the weights are reflected in the recognition result of the character recognition unit. The value used as the basis for recognition is therefore no longer ambiguous, and recognition accuracy can be improved.

[0028]

[Embodiments] FIG. 3 shows Embodiment 1 of the present invention, in which the contents of the standard pattern dictionary are changed according to appearance frequency.

[0029] In FIG. 3, reference numeral 31 denotes a character recognition device. Reference numeral 32 denotes a control unit, which controls the operation of the standard pattern dictionary, the character recognition unit, and the other units.

[0030] Reference numeral 33 denotes a standard pattern dictionary changing unit, which changes the contents of the standard pattern dictionary in the dictionary change mode. Reference numeral 34 denotes the standard pattern dictionary.

[0031] Reference numeral 35 denotes a character pattern input unit, which inputs the character pattern read by the character pattern reading device 211. Reference numeral 36 denotes a character recognition unit, which recognizes the characters of the input character pattern when the character recognition mode is designated.

[0032] Reference numeral 36" denotes the recognition result. Reference numeral 37 denotes a recognition result output unit, which outputs the recognition result to the display 212, another data processing device 213, or the like.

[0033] Reference numeral 38 denotes a character statistical information creating unit. Reference numeral 41 denotes a character cutout unit, which cuts out a one-character pattern from the input character pattern.

[0034] Reference numeral 42 denotes a feature extraction unit, which extracts features from a one-character pattern. Reference numeral 43 denotes a feature comparison unit, which calculates the distance between the features of a one-character pattern and the features (templates) of the standard pattern dictionary 34.

[0035] Reference numeral 44 denotes an appearance count measuring unit, which counts the number of appearances of each character on the basis of the recognition result output by the recognition result output unit 37. Reference numeral 45 denotes an appearance count holding unit, which holds the number of appearances of each character.

[0036] Reference numeral 210 denotes an input device, such as a keyboard, used to designate modes such as the dictionary change mode and the character recognition mode and to correct misread characters. Reference numeral 211 denotes a character pattern reading device, which reads the patterns of the characters written on a form.

[0037] Reference numeral 212 denotes a display, which displays the recognition result of a form. Reference numeral 213 denotes another data processing device, such as another computer, that processes the recognized character data.

[0038] Before the operation of the configuration of FIG. 3 is described, the method of changing the standard pattern dictionary according to the second embodiment of the present invention is described with reference to FIG. 4. In FIG. 4, reference numeral 31 denotes the character recognition device.

[0039] Reference numeral 33 denotes the standard pattern dictionary changing unit. Reference numeral 34 denotes the standard pattern dictionary. Reference numeral 37 denotes the recognition result output unit.

[0040] Reference numeral 44 denotes the appearance count measuring unit. Reference numeral 45 denotes the appearance count holding unit. In the standard pattern dictionary 34, reference numeral 51 denotes a pointer table, which holds, for each character category, the start address of the templates to be used among the templates of the standard pattern dictionary 34.

[0041] Reference numeral 52 denotes a template holding unit, which holds 20 templates for each character category. In the final template of each character category, a flag "1" is set to indicate the final position. The flags of the other templates are "0".

[0042] In FIG. 4, the appearance count measuring unit 44 measures the number of appearances of each character among the recognized characters output by the recognition result output unit 37, and records the count for each character in the appearance count holding unit 45.

[0043] When the dictionary change mode is designated, the standard pattern dictionary changing unit 33 reads the contents recorded in the appearance count holding unit 45. The standard pattern dictionary changing unit 33 then changes the pointer value in the pointer table 51 for each character (character category) according to the number of appearances of the character.

[0044] The pointer table 51 holds, for each character, the start address of the templates to be used in the template holding unit 52; the address designated by the pointer is the address from which the templates of that character category start to be used. In the case of FIG. 4, for example, the start address designated by the pointer for category ア is ア₅, so the templates from ア₅ through ア₁ are used.

[0045] The initial value of each pointer held in the pointer table 51 is the central address of its character category, namely the addresses of ア₁₀, イ₁₀, ウ₁₀, and エ₁₀. The standard pattern dictionary changing unit 33 determines the number of templates to use in each character category according to the ratio of the character's appearance count, and changes the template start address of each character category so that this number of templates is used (the templates from the one at the designated address through the one whose flag is 1 are used). In the example of FIG. 4, the character ア appears 5 times, イ appears 5 times, and エ appears 5 times. Since these counts are low, the start addresses are set to the positions of templates ア₅, イ₅, and エ₅ so that few of their templates are used. Since ウ appears many times, the start address of the templates used for ウ is set to the position of ウ₂₀.
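The pointer table and the end-of-category flag can be modeled with a short sketch. The flat layout, the storage order from template 20 down to template 1 within each category (chosen so that a walk from ア₅ ends at ア₁), and all names are assumptions of this illustration rather than details given in the source.

```python
TEMPLATES_PER_CATEGORY = 20  # as stated for the template holding unit 52

def build_template_store(categories):
    """Flat template area; the final template of each category has flag 1.

    Assumed layout: within a category, templates run from index 20 down to 1.
    """
    store = []
    for category in categories:
        for index in range(TEMPLATES_PER_CATEGORY, 0, -1):
            flag = 1 if index == 1 else 0
            store.append((category, index, flag))
    return store

def address_of(store, category, index):
    """Address (list position) of one template in the flat store."""
    return store.index((category, index, 1 if index == 1 else 0))

def templates_in_use(store, start_address):
    """Collect templates from the start address through the flagged one."""
    used = []
    for category, index, flag in store[start_address:]:
        used.append(f"{category}{index}")
        if flag == 1:
            break
    return used

store = build_template_store(["ア", "イ", "ウ", "エ"])
pointer_table = {
    "ア": address_of(store, "ア", 5),   # rarely seen: 5 templates in use
    "ウ": address_of(store, "ウ", 20),  # frequently seen: all 20 in use
}
```

Moving a pointer is the only change needed to enable or disable templates; the template data itself stays in place.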

[0046] The operation of the configuration of Embodiment 1 in FIG. 3 is now described. The contents of the standard pattern dictionary are changed using the character information obtained from the recognition results of the first several sheets after character recognition of the forms has started. For example, when forms are processed in units of 100 sheets, the dictionary change mode is designated once the recognition results for roughly the first 10 sheets are available, and the contents of the standard pattern dictionary are changed.

[0047] The standard pattern dictionary 34 can be changed at any time during character recognition processing. Appearance counts are measured continuously during character recognition processing and are cleared once they have been reflected in a change to the standard pattern dictionary 34.

[0048] To recognize characters, the character recognition mode is designated. The mode selection unit 46 notifies the control unit 32 that the character recognition mode has been designated, and the control unit 32 controls the character recognition processing of each unit.

[0049] The character pattern of the form read by the character pattern reading device 211 is input to the character recognition unit 36 by the character pattern input unit 35. The character cutout unit 41 cuts out a one-character pattern from the character pattern that was read. The feature extraction unit 42 extracts the features of that character pattern. The feature comparison unit 43 compares the extracted features with the features (templates) of the standard pattern dictionary 34. In this comparison, the distance between the features of the standard pattern dictionary 34 and the features of the one-character pattern is calculated (the distance calculation method is described later). The character recognition unit 36 takes the character category of the template whose features give the minimum distance as the recognition result 36', and the recognition result output unit 37 outputs it to the display 212, another data processing device 213, or the like. The recognition results are output with candidate ranks assigned in ascending order of distance.

[0050] The appearance count measuring unit 44 counts the characters output as the recognition result (first rank) and holds the counts in the appearance count holding unit 45. To change the standard pattern dictionary 34, the dictionary change mode is designated. The mode selection unit (not shown) notifies the control unit 32 that the dictionary change mode has been designated. The control unit 32 controls the operation of the standard pattern dictionary changing unit 33 and the other units.

[0051] When the dictionary change mode is designated, the standard pattern dictionary changing unit 33 reads the per-character appearance counts held in the appearance count holding unit 45. Taking the appearance counts into account, the standard pattern dictionary changing unit 33 changes the pointer values in the pointer table of the standard pattern dictionary 34 for each character by the method described above, thereby changing the number of templates used for each character (character category). Once the contents of the standard pattern dictionary 34 have been changed, the appearance count holding unit 45 is cleared.

[0052] Thereafter, when the character recognition mode is designated again, character recognition of the forms continues using the changed standard pattern dictionary 34, and counting of recognized characters starts afresh. The feature comparison unit 43 calculates the distance as follows.

[0053] [Equation not reproduced in the source text.] Here, x(i) is the feature vector of the input character, s(i) is a feature vector of category A in the standard dictionary, n is the number of dimensions of the feature vector, and m is the number of standard patterns of category A, that is, the number of templates.

[0054] The features of the input character are compared with the templates in the standard dictionary, and the smallest of the distances to the m templates of a character category is taken as the distance between the input character and that category. The distance to each character category is obtained in this way, and the categories are output as recognition candidate characters in ascending order of distance.
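The candidate ranking described in [0054] can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the Euclidean metric, the two-dimensional feature values, and the function names `category_distance` and `rank_candidates` are assumptions made for the example.

```python
import math

def category_distance(x, templates):
    """Smallest distance between input features x and any template of one category."""
    # Euclidean distance is an assumption; the source does not name the metric.
    return min(math.dist(x, s) for s in templates)

def rank_candidates(x, dictionary):
    """Return category labels ordered from smallest to largest category distance."""
    distances = {label: category_distance(x, templates)
                 for label, templates in dictionary.items()}
    return sorted(distances, key=distances.get)

# Hypothetical 2-dimensional features, for illustration only.
dictionary = {
    "ア": [(0.0, 0.0), (1.0, 1.0)],
    "イ": [(5.0, 5.0)],
    "ウ": [(2.0, 2.0), (9.0, 9.0)],
}
candidates = rank_candidates((1.2, 0.9), dictionary)
```

A category with more templates has more chances to lie close to the input, which is why changing the template count per category changes recognition behavior.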

[0055] FIG. 5 is a flowchart of the standard pattern dictionary changing unit of Embodiment 1 of the present invention.
S1: The appearance counts are read from the appearance count holding unit 45.
S2: The ratio of each character's appearance count to the total appearance count is calculated.

[0056] S3: For each character, the number of templates to use in its character category is determined according to that ratio.
S4: According to the number of templates to use, the address at which template use starts is determined for each character and set in the pointer table as the pointer for that character category.
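Steps S1 to S4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the proportional scaling rule, the floor of one template per category, and the layout in which each category owns a fixed block of `max_templates` slots that are used from the pointer to the end of the block are all assumptions; the function names and the counts are hypothetical.

```python
def template_counts(appearances, max_templates):
    """S2-S3: turn per-character appearance counts into template counts."""
    total = sum(appearances.values())          # assumes at least one appearance
    top_ratio = max(appearances.values()) / total
    counts = {}
    for ch, n in appearances.items():
        ratio = n / total                      # S2: share of all appearances
        scaled = round(max_templates * ratio / top_ratio)
        counts[ch] = max(1, min(max_templates, scaled))  # S3: keep at least one
    return counts

def pointer_table(counts, block_starts, max_templates):
    """S4: start address per category (assumed: templates are used up to the block end)."""
    return {ch: block_starts[ch] + (max_templates - k) for ch, k in counts.items()}

appearances = {"ア": 60, "イ": 30, "ウ": 10}    # S1: hypothetical counts
counts = template_counts(appearances, max_templates=6)
pointers = pointer_table(counts, {"ア": 0, "イ": 6, "ウ": 12}, max_templates=6)
```

With these numbers the most frequent character keeps all six templates, while the rarest is reduced to one; only the pointer values change, not the stored templates.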

[0057] FIG. 6 shows Embodiment 2 of the present invention, in which, when a recognition result is wrong and is corrected, the number of corrections is counted for each corrected (now correct) character and used as the information for changing the standard pattern dictionary.

[0058] In FIG. 6, reference numeral 31 denotes a character recognition device and 32 a control unit.

[0059] Reference numeral 33 denotes a standard pattern dictionary changing unit, 34 a standard pattern dictionary, and 35 a character pattern input unit.

[0060] Reference numeral 36 denotes a character recognition unit, 36" a recognition result, and 37 a recognition result output unit.

[0061] Reference numeral 38 denotes a character statistical information creation unit, 51 a pointer table, and 52 a template holding unit.

[0062] Reference numeral 62 denotes a recognition result correction unit, which replaces a misrecognized character in the recognition result with the correct character entered from the input device 210. Reference numeral 63 denotes a correction count measuring unit, which counts the number of corrections for each character that has been corrected to the right character (hereinafter, a character obtained by correction is called a corrected character).

[0063] Reference numeral 64 denotes a correction count holding unit, which holds the number of times each corrected character has been obtained (hereinafter called the correction count). Reference numeral 210 denotes an input device such as a keyboard.

[0064] Reference numeral 211 denotes a character pattern reading device, 212 a display, and 213 another data processing device.

[0065] The operation of the configuration of FIG. 6 will now be described. The recognition result output by the recognition result output unit 37 is shown on the display 212, so when the result contains an error, the correct character for the misrecognized character is entered from the input device 210. The recognition result correction unit 62 replaces the misrecognized character with the correct character, and the recognition result output unit outputs the corrected character.

[0066] The correction count measuring unit 63 counts the corrections for each corrected character, and the correction count holding unit 64 holds the correction count for each character. In actual form processing, for example when about 100 forms are to be processed, the standard pattern dictionary 34 is changed based on the correction counts from the recognition results of roughly the first 10 forms.

[0067] In the configuration of FIG. 6, the character recognition operation is the same as in Embodiment 1 of FIG. 3, so its description is omitted. The standard pattern dictionary 34 is changed in the same way as in the dictionary changing method of FIG. 4. That is, when the dictionary change mode is designated, the standard pattern dictionary changing unit 33 reads the correction count for each corrected character held in the correction count holding unit 64, and then changes the pointers in the pointer table for each character according to its correction count. The pointers are set so that characters obtained by correction many times are given more templates and characters obtained by correction only a few times are given fewer.

[0068] Once the standard pattern dictionary 34 has been changed in this way, the correction count holding unit 64 is cleared. When character recognition is then started by designating the character recognition mode, recognition is performed with the changed standard pattern dictionary 34, and the correction count measuring unit 63 increments the correction count of the corrected character by 1 each time a misrecognized character is corrected.

[0069] FIG. 7 shows the correction count holding unit and a flowchart of the standard pattern dictionary changing unit of Embodiment 2 of the present invention. FIG. 7(a) shows the correction count holding unit 64, which holds the correction count for each corrected character. For example, it shows that character patterns belonging to the character category 「ア」 were misrecognized and corrected to 「ア」 five times during character recognition. Likewise, the count is 10 for 「イ」 and 5 for 「ウ」.

[0070] FIG. 7(b) is a flowchart of the standard pattern dictionary changing unit.
S1: When the dictionary change mode is designated, the standard pattern dictionary changing unit reads the correction count for each corrected character from the correction count holding unit.

[0071] S2: The ratio of each character's correction count to the total correction count is calculated.
S3: The number of templates to use is determined for each character (character category) according to that ratio.

[0072] S4: The start address of the templates to use is determined for each character so as to give that number of templates, and is set in the pointer table as the pointer for each character category.
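The correction counting of Embodiment 2 and the ratio step S2 can be sketched as follows, using the counts shown in FIG. 7(a). The class `CorrectionCounter` is a hypothetical stand-in for the correction count measuring unit 63 and holding unit 64, and the "?" placeholder for the misread character is an assumption made only so that the example runs.

```python
from collections import Counter

class CorrectionCounter:
    """Hypothetical stand-in for measuring unit 63 and holding unit 64."""

    def __init__(self):
        self.counts = Counter()

    def record(self, recognized, corrected):
        # Count only genuine corrections, keyed by the corrected (right) character.
        if recognized != corrected:
            self.counts[corrected] += 1

    def ratios(self):
        # S2: share of each corrected character in the total number of corrections.
        total = sum(self.counts.values())
        return {ch: n / total for ch, n in self.counts.items()} if total else {}

    def clear(self):
        # The holding unit is cleared once the dictionary has been changed.
        self.counts.clear()

counter = CorrectionCounter()
for ch, n in {"ア": 5, "イ": 10, "ウ": 5}.items():   # counts from FIG. 7(a)
    for _ in range(n):
        counter.record("?", ch)                      # "?" is a placeholder misreading
ratios = counter.ratios()
```

Here 「イ」 accounts for half of all corrections, so under the scheme described in the text it would be the category given the most templates.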

[0073] FIG. 8 is an explanatory diagram of Embodiment 3 of the present invention. In FIG. 8(a), 101 is input character string A, which has been correctly recognized as 「トウキョウ」.

[0074] In FIG. 8(b), 104 is input character string B, the character pattern to be recognized. In FIG. 8(c), 110 is the adjacent character appearance count table holding unit; the figure illustrates the counting method.

[0075] In FIG. 8(d), 111 is the adjacent character appearance count table holding unit, which holds a table of adjacent character counts for each character. An adjacent character is the character recognized immediately after the target character. FIG. 8(d) shows that, for the target character 「ト」, the adjacent character 「ウ」 has appeared 100 times and 「ク」 30 times.

[0076] In character recognition of an input character string such as that of FIG. 8(a), when 「ウ」 103 is recognized following the target character 「ト」 102, the count for adjacent character 「ウ」 of target character 「ト」 in the adjacent character appearance count table holding unit 110 is incremented by 1 (the "+1" in FIG. 8(c) indicates this increment).
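The counting of adjacent characters described in [0076] amounts to counting consecutive character pairs. A minimal sketch follows; the function name `count_adjacent` and the nested-dictionary table layout are assumptions for illustration, not the structure of holding unit 110.

```python
from collections import defaultdict

def count_adjacent(recognized, table=None):
    """Add 1 to table[target][adjacent] for each consecutive pair of recognized characters."""
    if table is None:
        table = defaultdict(lambda: defaultdict(int))
    for target, adjacent in zip(recognized, recognized[1:]):
        table[target][adjacent] += 1
    return table

# Recognized string from FIG. 8(a): 「ト」 followed by 「ウ」 adds 1 to table["ト"]["ウ"].
table = count_adjacent("トウキョウ")
```

Passing the same `table` back in on later calls accumulates counts across forms, which matches the way the table is built up during recognition.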

[0077] In this way, the adjacent character appearance table is built up during character recognition processing. In actual use, for example once about 10 forms have been recognized after processing starts, the standard pattern dictionary changing unit reads in and holds the contents of the adjacent character appearance count table. Then, each time a character recognition result is obtained, the standard pattern dictionary changing unit takes the recognized character as the target character and changes the standard pattern dictionary according to the appearance counts of that target character's adjacent characters: the number of templates is increased for character categories that appear often as the adjacent character and decreased for those that appear rarely. The character pattern following the target character is then recognized with the standard pattern dictionary changed in this way.

[0078] For example, when recognizing input character string B (104) of FIG. 8(b), once 「ト」 105 has been recognized, the adjacent character appearance count table is consulted for the appearance counts of the characters adjacent to 「ト」, such as 「ウ」 and 「ク」. In the case of FIG. 8(b), the count for adjacent character 「ウ」 of target character 「ト」 is 100 and the count for 「ク」 is 30, so the number of templates for character category 「ウ」 is increased and the number of templates for 「ク」 is set according to its count. Template counts for other characters are likewise set according to their appearance counts (not shown). The character of character pattern 106 is then recognized using the standard pattern dictionary changed in this way.
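The example in [0078] can be sketched as follows, using the counts 100 for 「ウ」 and 30 for 「ク」 after the target character 「ト」. The linear scaling against the largest count, the maximum of 10 templates, and the function name `templates_for_next` are assumptions; the source only states that frequent adjacent characters get more templates and rare ones fewer.

```python
def templates_for_next(adjacent_counts, max_templates, default=1):
    """Template count per category for the character following a recognized target."""
    if not adjacent_counts:
        return {}
    top = max(adjacent_counts.values())
    # Scale linearly against the most frequent adjacent character (an assumed rule).
    return {ch: max(default, round(max_templates * n / top))
            for ch, n in adjacent_counts.items()}

after_to = {"ウ": 100, "ク": 30}          # counts for target 「ト」 from FIG. 8(d)
plan = templates_for_next(after_to, max_templates=10)
```

The resulting plan would apply only to the character pattern that follows 「ト」; after the next recognition result, the plan is recomputed for the new target character.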

[0079] FIG. 9 shows Embodiment 3 of the present invention, a device configuration for carrying out the character recognition method of FIG. 8. In FIG. 9, reference numeral 31 denotes a character recognition device.

[0080] Reference numeral 33 denotes a standard pattern dictionary changing unit, 34 a standard pattern dictionary, and 35 a character pattern input unit.

[0081] Reference numeral 36 denotes a character recognition unit, 36" a recognition result, and 37 a recognition result output unit.

[0082] Reference numeral 109 denotes an adjacent character appearance count measuring unit, which counts, for each target character, the appearances of adjacent characters based on the recognition results. Reference numeral 110 denotes an adjacent character appearance count table holding unit, which holds the appearance counts of adjacent characters for each target character.

[0083] Reference numeral 115 denotes the adjacent character appearance count table held in the standard pattern dictionary changing unit 33. The input device, character pattern reading device, display, and other data processing device are omitted from FIG. 9.

[0084] The operation of the configuration of FIG. 9 will now be described. The adjacent character appearance count measuring unit 109 counts adjacent characters for each target character based on the recognition results and stores the counts in the adjacent character appearance count table holding unit 110.

[0085] For example, when about 100 forms are to be processed, the standard pattern dictionary changing unit 33 reads from the adjacent character appearance count table holding unit 110, and holds, the adjacent character appearance count table 115 obtained from the recognition results of roughly the first 10 forms.

[0086] When a character recognition result is obtained, the standard pattern dictionary changing unit 33 takes that result as the target character, looks up the appearance counts of its adjacent characters in the adjacent character appearance count table 115, and changes the number of templates according to those counts. As described above, character categories with a high adjacent character appearance count are given more templates, and those with a low count are given fewer. The character following the target character is then recognized using the standard pattern dictionary 34 changed in this way.

[0087] FIG. 10 is a flowchart of the adjacent character appearance count measuring unit of Embodiment 3 of the present invention.
S1: The adjacent character appearance count measuring unit determines the recognized character A.

[0088] S2: It determines character B, the recognition result of the character following character A.
S3: With character A as the target character and character B as the adjacent character, it updates the value in the adjacent character appearance count table holding unit.

[0089] S4: It checks whether a clear instruction has been given. If not, the processing from S1 is repeated; if so, processing proceeds to S5.
S5: The adjacent character appearance count table holding unit 110 is cleared.

[0090] S6: It determines whether character recognition processing is to end. If not, the processing from S1 is repeated; if so, processing ends.
FIG. 11 shows a flowchart of the standard pattern dictionary changing unit of Embodiment 3 of the present invention.

[0091] S1: The standard pattern dictionary changing unit reads the adjacent character appearance count table from the adjacent character appearance count table holding unit.
S2: The control unit clears the adjacent character appearance count table holding unit.

[0092] S3: The standard pattern dictionary changing unit determines the recognition result (A).
S4: Taking recognition result (A) as the target character, the standard pattern dictionary changing unit changes the number of templates of the adjacent characters in the standard pattern dictionary according to their appearance counts.

[0093] S5: The character recognition unit recognizes the character pattern following character A using the standard pattern dictionary produced in S4.
S6: If character recognition is to end, processing ends; otherwise processing proceeds to S7.

[0094] S7: If the adjacent character appearance count table in the standard pattern dictionary changing unit is to be read in again, the processing from S1 is repeated; otherwise the processing from S3 is repeated.

[0095] FIG. 12 shows Embodiment 4 of the present invention, an embodiment of basic configuration (2) of the present invention. In FIG. 12, reference numeral 31 denotes a character recognition device.

[0096] Reference numeral 34 denotes a standard pattern dictionary, 35 a character pattern input unit, and 36 a character recognition unit.

[0097] Reference numeral 36" denotes a recognition result and 37 a recognition result output unit. Reference numeral 121 denotes an appearance count creation unit, which counts the appearances of each character.

[0098] Reference numeral 122 denotes a weighting unit, which weights each character according to its appearance count. For example, when characters are recognized by distance, a character that appears often is given a small weight and a character that appears rarely is given a large weight.
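The weighting rule in [0098] can be sketched as follows. The source states only the direction of the rule (frequent characters get small weights, rare ones large weights); the linear formula, the `strength` parameter, the function name, and the counts below are assumptions made for illustration.

```python
def frequency_weights(appearances, strength=0.5):
    """Smaller weight for frequent characters, larger weight for rare ones.

    Assumed formula: weight = 1 - strength * (count / total), with 0 <= strength < 1.
    """
    total = sum(appearances.values())          # assumes at least one appearance
    return {ch: 1.0 - strength * n / total for ch, n in appearances.items()}

appearances = {"ア": 60, "イ": 30, "ウ": 10}    # hypothetical appearance counts
weights = frequency_weights(appearances)
```

Because the weight later multiplies a distance, a smaller weight makes a frequent character look closer to the input and therefore more likely to rank first.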

[0099] Reference numeral 123 denotes a weight table holding unit, which holds the weights calculated by the weighting unit 122. The weight of character 「ア」 is Wア, that of 「イ」 is Wイ, that of 「ウ」 is Wウ, and so on.

[0100] In the character recognition unit 36, reference numeral 125 denotes a distance calculation unit, which obtains the distance between the features of a one-character pattern cut out of the input character pattern and a template, and then obtains the converted distance D by multiplying that distance by the weight.

[0101] Reference numeral 128 denotes a minimum distance determination unit, which ranks recognition candidates from highest to lowest in ascending order of converted distance D. Reference numeral 129 denotes the weight table read in from the weight table holding unit 123.

[0102] The operation of Embodiment 4 of FIG. 12 will now be described. Based on the appearance counts measured by the appearance count creation unit 121 from the first several forms processed after character recognition starts, the weighting unit 122 calculates a weight for each character according to its appearance count and stores it in the weight table holding unit 123. For example, when about 100 forms are to be processed, the weights are calculated from the per-character appearance counts derived from the recognition results of roughly the first 10 forms.

[0103] In the character recognition unit 36, as in Embodiment 1 of FIG. 3, the features of a one-character pattern cut out of the input pattern are compared with the templates of the standard dictionary to obtain a distance. That distance is then multiplied by the weight of the character in the weight table 129 to obtain the converted distance D.

[0104] The categories with the smallest distances, down to rank M, are taken as candidates and stored in a candidate memory (not shown). The converted distance D is obtained as follows.
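The weighted ranking of [0103] and [0104] can be sketched as follows. The Euclidean metric, the feature values, the weights, and the function name `top_candidates` are assumptions for illustration; only the structure (minimum template distance multiplied by the character's weight, then the M smallest kept) comes from the text.

```python
import math

def top_candidates(x, dictionary, weights, m):
    """Rank categories by converted distance D = weight * min template distance; keep top m."""
    converted = {}
    for label, templates in dictionary.items():
        d = min(math.dist(x, s) for s in templates)   # assumed Euclidean metric
        converted[label] = weights.get(label, 1.0) * d
    return sorted(converted.items(), key=lambda kv: kv[1])[:m]

dictionary = {"ア": [(0.0, 0.0)], "イ": [(0.0, 2.0)], "ウ": [(4.0, 0.0)]}
weights = {"ア": 1.0, "イ": 0.5, "ウ": 1.0}   # hypothetical: 「イ」 is the frequent character
best = top_candidates((0.0, 0.9), dictionary, weights, m=2)
```

In this toy case the input is slightly closer to the 「ア」 template in raw distance, but the smaller weight of the frequent character 「イ」 moves it to first place.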

[0105] (Equation not reproduced in this text.) Here, x(i) is the feature vector of the input character, s(i) is the feature vector of category A in the standard dictionary, n is the number of dimensions of the feature vector, and m is the number of standard patterns of category A, that is, the number of templates.
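One form of the converted distance that is consistent with the symbol definitions in [0105] and with the multiplication by a weight described in [0100] and [0103] is sketched below. The Euclidean metric, the template index j, the notation s_j(i), and the symbol W_A for the weight of category A are assumptions introduced for illustration.

```latex
D \;=\; W_A \cdot \min_{1 \le j \le m} \sqrt{\sum_{i=1}^{n} \bigl( x(i) - s_j(i) \bigr)^{2}}
```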

[0106] The character recognition method of the present invention takes two forms: a method in which the standard pattern dictionary is changed using character statistical information created from the recognition results of several forms during character recognition (character recognition method (1)), and a method in which the standard pattern dictionary is set using learning test forms prepared to match the business task (character recognition method (2)).

[0107] FIG. 13 shows character recognition method (1) of the present invention.
S1: The character statistical information creation unit is cleared.
S2: Character recognition is performed.

[0108] S3: The character statistical information is updated.
S4: Whether character recognition processing is to end is determined. If it is not to end, processing proceeds to S5.

[0109] S5: Whether to change the standard pattern dictionary is determined. If not, the processing from S2 is repeated; if so, processing proceeds to S6.
S6: The character statistical information is read in.

[0110] S7: The standard pattern dictionary is changed, and the processing from S1 is repeated.
FIG. 14 shows character recognition method (2) of the present invention.
S1: A test form for changing the standard pattern dictionary is prepared.

[0111] S2: The test form is recognized and character statistical information is created.
S3: The standard pattern dictionary changing unit reads in the character statistical information and changes the standard pattern dictionary.

[0112] S4: The test form is recognized.
S5: The character recognition rate is obtained. If it is at or above a set value, ordinary forms are recognized with that standard pattern dictionary in S7. If it is below the set value, processing proceeds to S6.

[0113] S6: The number of templates in the standard pattern dictionary is changed, the test form is recognized again in S4, and S5 determines whether the character recognition rate has reached the set value.
When the distance is multiplied by an appearance-count-based weight as in Embodiment 4 above, the same two approaches apply: the weights can be obtained from the character statistical information of the forms being processed, as in character recognition method (1), or from character statistical information created from learning test forms prepared to match the business task, as in character recognition method (2). In either case, recognition takes the weights into account.
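The tuning loop of character recognition method (2), steps S4 to S7 of FIG. 14, can be sketched as follows. The function name `tune_dictionary`, the `max_rounds` safety limit, and the toy stand-ins for recognition and adjustment are assumptions made so that the example runs; the source does not say how the template count is changed in S6 or whether the loop is bounded.

```python
def tune_dictionary(recognize_rate, adjust, dictionary, threshold, max_rounds=10):
    """Repeat S4-S6: measure the rate on the test forms, adjust until it reaches threshold."""
    for _ in range(max_rounds):
        rate = recognize_rate(dictionary)        # S4-S5: recognize test forms, get the rate
        if rate >= threshold:
            return dictionary, rate              # S7: use this dictionary for ordinary forms
        dictionary = adjust(dictionary)          # S6: change the number of templates
    return dictionary, recognize_rate(dictionary)

# Toy stand-ins: the "dictionary" is just a template count and the rate (in percent)
# grows with it. Real recognition of test forms would replace toy_rate.
def toy_rate(templates):
    return min(100, 80 + 5 * templates)

def toy_adjust(templates):
    return templates + 1

final, rate = tune_dictionary(toy_rate, toy_adjust, dictionary=1, threshold=95)
```

Passing the recognition and adjustment steps in as functions keeps the loop itself independent of how the dictionary is represented.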

[0114] Although the above embodiments have been described for katakana characters, the present invention is not limited to katakana and can be applied to character recognition of forms containing numerals, alphabetic characters, kanji, or a mixture of these. The more character types there are, the greater the effect of the present invention.

[0115]

[Effects of the Invention] According to basic configuration (1) of the present invention, a standard pattern dictionary optimized for the business task can be created automatically, and because characters are recognized with that dictionary, characters can always be recognized, for each business task, with accuracy close to the maximum recognition accuracy.

[0116] Further, according to basic configuration (2) of the present invention, the characters that appear on actual forms can be weighted appropriately and the weighting is reflected in the recognition results, so accurate, unambiguous character recognition becomes possible.

[Brief description of drawings]

FIG. 1 is a diagram showing basic configuration (1) of the present invention.

FIG. 2 is a diagram showing basic configuration (2) of the present invention.

FIG. 3 is a diagram showing Embodiment 1 of the present invention.

FIG. 4 is a diagram showing the method of changing the standard pattern dictionary in Embodiment 1 of the present invention.

FIG. 5 is a diagram showing a flowchart of the standard pattern dictionary changing unit of Embodiment 1 of the present invention.

FIG. 6 is a diagram showing Embodiment 2 of the present invention.

FIG. 7 is a diagram showing the correction count holding unit and a flowchart of the standard pattern dictionary changing unit of Embodiment 2 of the present invention.

FIG. 8 is an explanatory diagram of Embodiment 3.

FIG. 9 is a diagram showing Embodiment 3 of the present invention.

FIG. 10 is a diagram showing a flowchart of the adjacent character appearance count measuring unit of Embodiment 3 of the present invention.

FIG. 11 is a diagram showing a flowchart of the standard pattern dictionary changing unit and the character recognition unit of Embodiment 3 of the present invention.

FIG. 12 is a diagram showing Embodiment 4 of the present invention.

FIG. 13 is a diagram showing character recognition method (1) of the present invention.

FIG. 14 is a diagram showing character recognition method (2) of the present invention.

FIG. 15 is a diagram showing a conventional character recognition device.

[Explanation of symbols]

1: Character recognition device
2: Character statistical information creation unit
3: Standard pattern dictionary changing unit
4: Standard pattern dictionary
5: Character pattern input unit
6: Character recognition unit
6": Recognition result
7: Recognition result output unit

Claims

[Claims]

1. A character recognition device comprising: a character statistical information creation unit that creates character statistical information for forms; a standard pattern dictionary having standard patterns representing the features of characters; a standard pattern dictionary changing unit that changes the contents of the standard pattern dictionary based on the character statistical information; a character recognition unit that compares a character pattern to be recognized with the standard patterns of the standard pattern dictionary to recognize the character of the character pattern; and a recognition result output unit that outputs the result of the character recognition.

2. The character recognition device according to claim 1, wherein the character statistical information is the number of appearances of characters on the form.

3. The character recognition device according to claim 1, wherein the character statistical information is the number of corrections for each character made correct by correcting a recognition result.

4. The character recognition device according to claim 1, wherein, when the recognition result of an arbitrary character A is taken as a target character and the recognition result of a character B following the target character is taken as an adjacent character, the character statistical information is the number of appearances of adjacent character B for target character A, and, when the recognition result of target character A is obtained, the character recognition unit performs character recognition with the contents of the standard pattern dictionary changed based on the number of appearances of the adjacent characters of target character A in the character statistical information.

5. The character recognition device according to 3 or 4, wherein the standard pattern dictionary has a plurality of templates representing character features for each character category, and the standard pattern dictionary changing unit increases or decreases the number of templates according to the character statistical information.

6. A character recognition device comprising: a character statistical information creation unit that creates character statistical information for forms; a standard pattern dictionary having standard patterns representing the features of characters; a character recognition unit that compares a character pattern to be recognized with the standard patterns to recognize the character of the character pattern; a weight creation unit that weights characters based on the character statistical information; and a recognition result output unit that outputs the result of the character recognition, wherein the recognition result is obtained and output based on the result of comparing features with the standard patterns and on the weights created by the weight creation unit.

7. The character recognition device according to claim 6, wherein the character statistical information is the number of appearances, a character with a high appearance frequency is given a small weight and a character with a low appearance frequency is given a large weight, and the character recognition unit obtains the distance between the features of the character pattern and those of a template of the standard dictionary and outputs a recognition result based on the product of the distance and the weight.

8. The character recognition device according to claim 1, 2, 3, 4, 5 or 6, wherein the character statistical information creation unit creates the character statistical information based on the character recognition results of the forms being processed for character recognition.

9. The character recognition device according to claim 1, 2, 3, 4, 5 or 6, wherein the character statistical information creation unit creates the character statistical information based on the character recognition results of a test form.