JP2004310639A - Feature value dimension compression equipment, matching device, program and storage medium

Info

Publication number: JP2004310639A
Application number: JP2003106174A
Authority: JP
Inventors: Hideaki Yamagata (山形 秀明); Shigemasa Oba (大羽 成征); Makoto Ishii (石井 信)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2003-04-10
Filing date: 2003-04-10
Publication date: 2004-11-04
Anticipated expiration: 2023-04-10
Also published as: JP4336516B2

Abstract

PROBLEM TO BE SOLVED: To improve recognition of low-quality data while maintaining recognition performance for high-quality data and without adversely affecting the high-quality data.

SOLUTION: A character recognition device performs pattern matching after dimensionally compressing feature values by canonical discriminant analysis, which uses an intra-class variance and an inter-class variance. A data classification part 21, a high-quality storage part 22 and a low-quality storage part 23 classify learning data into a high-quality set, for which errors are not allowed, and a low-quality set, for which some errors are allowed because of its low quality, and store the two sets. An inter-class variance calculation part 26 calculates an inter-class variance matrix using the high-quality set and not the low-quality set. An intra-class variance calculation part 24 calculates an intra-class variance matrix using the low-quality set and not the high-quality set.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
この発明は、特徴量次元圧縮装置、マッチング装置、プログラム及び記憶媒体に関する。
【０００２】
【従来の技術】
文字認識技術については、特許文献１，２に開示されている。また、正準判別分析の技術については、特許文献３に開示されている。
【０００３】
【特許文献１】
特開２００１−５２１１３公報
【特許文献２】
特開平６−２３１３０９号公報
【特許文献３】
特開２００１−５２１１５公報
【０００４】
【発明が解決しようとする課題】
しかしながら、従来の文字認識技術においては、低品質文字に対する認識性能が向上させようとすると、高品質文字に対する認識性能が低下し、あるいは、高品質文字に対する認識性能に悪影響を与えるという不具合があった。
【０００５】
また、文字認識技術などにおいては、画像を読み込むスキャナの特性などを考慮して、低品質な文字画像を認識するためには多くの学習データを収集する必要があった。フォントデータを展開して作成された高品質な文字画像では、データの収集は容易であるが、実際に原稿をスキャンして得られた文字画像にはノイズなどが乗っており、そのような低品質なデータの認識に不具合を生じることがあった。このように、一般的に高品質な（典型的な）データセットは収集することが容易であるが、低品質なデータセットは種類も多く、データを収集することが困難、あるいは作業量が多い。そこで、高品質なデータセットのみを用いて、低品質なデータにも絶えうる文字認識技術が望まれている。
【０００６】
さらに、統計的手法においてはしばしば学習時間が問題となり、正準判別分析も例外ではなく、少しでも学習時間を減らすことが望ましい。
【０００７】
本発明は、高品質データに対する認識性能を保持しつつ、また、高品質データに対する悪影響なしに、低品質データに対する認識性能を向上させることができるようにすることである。
【０００８】
【課題を解決するための手段】
請求項１に記載の発明は、クラス内分散、クラス間分散を利用して正準判別分析を用いた特徴量の次元圧縮を行う特徴量次元圧縮装置において、学習データについて誤りを許容しない高品質セットと品質が低いためある程度の誤りを許容する低品質セットとに分類して記憶する記憶手段と、前記高品質セットを用いて前記低品質セットは用いることなくクラス間分散行列を算出するクラス間分散行列算出手段と、前記低品質セットを用いて前記高品質セットは用いることなくクラス内分散行列を算出するクラス内分散行列算出手段と、を備えていることを特徴とする特徴量次元圧縮装置である。
【０００９】
したがって、高品質データに対する認識性能を保持しつつ、また、高品質データに対する悪影響なしに、低品質データに対する認識性能を向上させることができる。
【００１０】
請求項２に記載の発明は、クラス内分散、クラス間分散を利用して正準判別分析を用いた特徴量の次元圧縮を行なってパターンマッチングを実行するマッチング装置において、学習データについて誤りを許容しない高品質セットと品質が低いためある程度の誤りを許容する低品質セットとに分類して記憶する記憶手段と、前記高品質セットを用いて前記低品質セットは用いることなくクラス間分散行列を算出するクラス間分散行列算出手段と、前記低品質セットを用いて前記高品質セットは用いることなくクラス内分散行列を算出するクラス内分散行列算出手段と、を備えていることを特徴とするマッチング装置である。
【００１１】
したがって、高品質データに対する認識性能を保持しつつ、また、高品質データに対する悪影響なしに、低品質データに対する認識性能を向上させることができる。
【００１２】
請求項３に記載の発明は、請求項２に記載のマッチング装置において、クラス内分散行列算出手段は、前記クラス内分散行列の算出の際に、前記低品質セットの中から一部のカテゴリのデータを選択し、選択されたデータについてクラス内分散を算出すること、を特徴とする。
【００１３】
したがって、低品質セットの中から一部のカテゴリのデータを選択し、選択されたデータについてクラス内分散を算出することで、品質データに対する認識性能を保持しつつ、また、高品質データに対する悪影響なしに、低品質データに対する認識性能を向上させることができる。
【００１４】
請求項４に記載の発明は、請求項２又は３に記載のマッチング装置において、前記特徴量の次元圧縮を多段階に行い、この多段階の各段で分割するクラス数が圧縮次元数よりも多い場合には前記正準判別分析による次元圧縮を用い、それ以外の場合には他の種類の次元圧縮を用いること、を特徴とする。
【００１５】
したがって、正準判別分析による次元圧縮を含めて次元圧縮を多段階に行なうことができる。
【００１６】
請求項５に記載の発明は、前請求項４に記載のマッチング装置において、記他の種類の次元圧縮として主成分分析を用いる次元圧縮を行うこと、を特徴とする。
【００１７】
したがって、正準判別分析による次元圧縮、主成分分析を用いる次元圧縮により、次元圧縮を多段階に行なうことができる。
【００１８】
請求項６に記載の発明は、正準判別分析を用いた特徴量の次元圧縮をコンピュータに実行させるコンピュータに読み取り可能なプログラムにおいて、学習データについて誤りを許容しない高品質セットと品質が低いためある程度の誤りを許容する低品質セットとに分類して記憶する記憶処理と、前記高品質セットを用いて前記低品質セットは用いることなくクラス間分散行列を算出するクラス間分散行列算出処理と、前記低品質セットを用いて前記高品質セットは用いることなくクラス内分散行列を算出するクラス内分散行列算出処理と、をコンピュータに実行させることを特徴とするプログラムである。
【００１９】
したがって、高品質データに対する認識性能を保持しつつ、また、高品質データに対する悪影響なしに、低品質データに対する認識性能を向上させることができる。
【００２０】
請求項７に記載の発明は、クラス内分散、クラス間分散を利用して正準判別分析を用いた特徴量の次元圧縮を行なってパターンマッチングをコンピュータに実行させるコンピュータに読み取り可能なプログラムにおいて、学習データについて誤りを許容しない高品質セットと品質が低いためある程度の誤りを許容する低品質セットとに分類して記憶する記憶手段と、前記高品質セットを用いて前記低品質セットは用いることなくクラス間分散行列を算出するクラス間分散行列算出手段と、前記低品質セットを用いて前記高品質セットは用いることなくクラス内分散行列を算出するクラス内分散行列算出手段と、をコンピュータに実行させるプログラムである。
【００２１】
したがって、高品質データに対する認識性能を保持しつつ、また、高品質データに対する悪影響なしに、低品質データに対する認識性能を向上させることができる。
【００２２】
請求項８に記載の発明は、プログラムを記憶している記憶媒体において、請求項６又は７に記載のプログラムを記憶していること、を特徴とする記憶媒体である。
【００２３】
したがって、記憶しているプログラムにより請求項６又は７に記載の発明と同様の作用、効果を奏することができる。
【００２４】
【発明の実施の形態】
本発明の一実施の形態について説明する。
【００２５】
図１は、本実施の形態の文字認識装置１のハードウエア構成を示す電気的な接続のブロック図である。図１に示すように、文字認識装置１は、本発明の特徴量次元圧縮装置、マッチング装置を実施するものであり、各種演算を行ない、文字認識装置１の各部を集中的に制御するＣＰＵ１１と、各種のＲＯＭ、ＲＡＭからなるメモリ１２とが、バス１３で接続されている。
【００２６】
バス１３には、所定のインターフェイスを介して、ハードディスクなどの磁気記憶装置１４と、マウス、キーボード等により構成される入力装置１５と、表示装置１６と、光ディスクなどの記憶媒体１７を読み取る記憶媒体読取装置１８とが接続され、また、ネットワーク２と通信を行なう所定の通信インターフェイス１９が接続されている。なお、記憶媒体１７としては、ＣＤ，ＤＶＤなどの光ディスク、光磁気ディスク、フレキシブルディスクなどの各種メディアを用いることができる。また、記憶媒体読取装置１８は、具体的には記憶媒体１７の種類に応じて光ディスク装置、光磁気ディスク装置、フレキシブルディスク装置などが用いられる。
【００２７】
文字認識装置１では、この発明の記憶媒体を実施する記憶媒体１７から、この発明のプログラムを実施するプログラム２０を読み取って、磁気記憶装置１４にインストールする。これらのプログラム２０はネットワーク２や、インターネットを介してダウンロードしてインストールするようにしてもよい。このインストールにより、文字認識装置１は、後述の所定の処理の実行が可能な状態となる。なお、プログラム２０は、所定のＯＳ上で動作するものであってもよい。
【００２８】
［射影ベクトルの算出について］
図２は、プログラム２０に基づいて動作する文字認識装置１の射影ベクトルの算出までの処理の機能ブロック図である。また、図３は、文字認識装置１が実行する射影ベクトルの算出までの処理のフローチャートである。図２に示す各部は、図１に例示するハードウエア上でプログラム２０が動作することにより実現する。
【００２９】
ここでは、文字認識装置１により、文字画像から得られた特徴ベクトルの次元圧縮を行う場合について説明する。ユーザはフォント展開文字画像及び図示しないスキャナから読み込んだ文字画像を用意し、適当な手段で特徴ベクトルに変換する。文字画像からの特徴抽出方法については数多くの方法が提案されており、いかなる手法を用いた場合でも本実施の形態は成立するので、ここでの説明は省略する。
【００３０】
ユーザは収集した学習データを、データ分類部２１に入力する。データ分類部２１においては、フォント展開文字画像から得られた特徴ベクトルは、誤りを許容しない高品質セットに、他の特徴ベクトルは、品質が低いためある程度の誤りを許容する低品質セットに分類し（ステップＳ１）、それぞれ高品質保存部２２、低品質保存部２３に保存する（ステップＳ２，Ｓ３）。あるいは、ユーザが他の手法で適宜学習セットを分類して、文字認識装置１に対して高品質セットと低品質セットを別々に入力するように構成してもよい。データ分類部２１、高品質保存部２２、低品質保存部２３により記憶手段を、ステップＳ１〜Ｓ３の処理により記憶処理を、それぞれ実現している。
【００３１】
クラス内分散行列算出手段であるクラス内分散算出部２４においては、低品質保存部２３内の低品質特徴ベクトル群を用いて、高品質保存部２２内の高品質特徴ベクトル群を用いずに、クラス内分散行列を算出する（クラス内分散行列算出処理）（ステップＳ４）。クラス内分散の算出方法については詳細を後述する。クラス内分散行列が算出されたならば、それを固有値／固有ベクトル算出部２５へ送付する。
【００３２】
一方、クラス間分散行列算出手段であるクラス間分散算出部２６においては、高品質保存部２２内の高品質特徴ベクトル群を用いて、低品質保存部２３内の低品質特徴ベクトル群は用いずに、クラス間分散行列を算出する（クラス間分散行列算出処理）（ステップＳ５）。クラス間分散の算出方法については詳細を後述する。クラス間分散行列が算出されたならば、それを固有値／固有ベクトル算出部２５へ送付する。
【００３３】
固有値／固有ベクトル算出部２５では、送られてきたクラス内分散行列と、クラス間分散行列を用いて、後述の（１）式を用いて固有値／固有ベクトルを算出する（ステップＳ６）。
【００３４】
算出された固有値／固有ベクトルは、射影ベクトル算出部へ送付され、射影行列算出部２７では、送られてきた固有値／固有ベクトルを再構成して射影行列を得る（ステップＳ７）。その詳細については後述する。算出された射影行列は、射影行列保存部２８に保存する。
【００３５】
このようにして射影行列を求めることで、高品質文字に対する認識性能を低下させずに、低品質文字に対する認識性能が向上することが実験的に確認できた。
【００３６】
次に、射影ベクトルの算出までの処理の他の例について説明する。この例においても、そのハードウエア構成は図１のものと同様である。図４は、本例における機能ブロック図であり、図５はフローチャートである。図４に示す各部は、図１に例示するハードウエア上でプログラム２０が動作することにより実現する。なお、図２、図３と同一符号の装置、処理は、図２、図３を参照して説明した前述の例と同様であるため、詳細な説明は省略する。
【００３７】
本例でも、文字認識の場合を例にとって説明する。文字認識の場合、クラス内分散は主に原稿をスキャンする際に画像にのるノイズなどに起因して値が変動する場合が多い。したがって、文字が同じであれば、フォントなど字形が少々変化しても大きな変動は生じない場合が多い。その一方で、特定の原稿セットなどを認識する場合には、特定のフォントセットに認識対象を限定して、認識系を再設計（パターン辞書を再作成）する場合も多い。このような場合、毎回低品質データセットを収集するのには非常に労力がかかるため、特注対応による高コスト化が問題となる。
【００３８】
そこで、スキャナ変化などのクラス内分散の要因となる低品質データを収集し、それに対するクラス内分散行列を算出したならば、認識対象フォントなどが代わった場合でも、そのクラス内分散行列を流用する手法が考えられる。
【００３９】
まず、クラス内分散行列については、すでに実施の形態１に示す方法で過去に算出されたものが、クラス内分散保存部３１に保存されているものとする。ユーザは新しい認識対象について、実施の形態１と同様に学習データを収集する。その際、クラス内分散は過去のものを流用するので、高品質なデータセットのみを収集すればよい。
【００４０】
収集したデータは高品質保存部２２に入力しておく。入力されたならば、クラス間分散算出部２６においては、実施の形態１と同様の処理を行う（ステップＳ１１）。
【００４１】
固有値／固有ベクトル算出部２５では、クラス間分散行列が送られてきたならば、クラス内分散保存部３１内のクラス内分散行列を参照して、実施の形態１と同様に固有値／固有ベクトルを算出する（ステップＳ１２）。そして、射影行列算出部２７における処理は実施例１と同様である（ステップＳ１３）。
【００４２】
以上のような処理を実行することにより、低品質データの収集作業から開放されるとともに、高品質画像に対する性能の低下なしに低品質画像に対する性能を向上させることが実験により確認できた。
【００４３】
以上、射影ベクトルの算出までについて説明した。このようにして射影行列を算出することで、特徴量の次元圧縮が可能となり、後段の処理であるマッチング処理の時間を低減することができる。
【００４４】
［多段階認識を用いたマッチングについて］
ここでは、前述のように算出した射影行列を用いて行う、多段階認識を用いたマッチング処理について説明する。
【００４５】
この場合のハードウエア構成は図１のものと同様である。図６は、この場合の処理を実行する際の機能ブロック図であり、図７はフローチャートである。図４に示す各部は、図１に例示するハードウエア上でプログラム２０が動作することにより実現する。なお、図２，図４と同一符号の装置は、図２，図４を参照して説明した前述の説明と同様であるため、詳細な説明は省略する。
【００４６】
射影行列保存部２８には、前述のようにして求められた射影行列が保存されている。また、通常辞書４１には、学習データに対して射影行列を施して得られる次元圧縮された特徴量について、各カテゴリの平均を算出した平均特徴量を保存する。一例として、次元圧縮する前の特徴量ｎは、“ｎ＝２５６次元”、次元圧縮後の特徴量ｎ’は、“ｎ’＝６４次元”であるとする。また、日本語文書を対象とした場合、クラス数は約４０００程度である。
【００４７】
類似文字辞書４２には、特定の２文字の組み合わせについての射影行列と、それぞれの射影行列を施して得られた次元圧縮後の特徴量について、各カテゴリの平均を算出した平均特徴量を保存する。特定の２文字の組み合わせについての射影行列は、前述のような正準判別は用いずに、ここでは主成分分析を用いて算出する。主成分分析については詳細を後述する。ここでも、次元圧縮する前の特徴量ｎは、“ｎ＝２５６次元”、次元圧縮後の特徴量ｎ’は、“ｎ’＝６４次元”であるとする。特定の２文字の組み合わせを対象とするので、クラス数は２となる。
【００４８】
その場合、正準判別分析では、次元圧縮後の次元数は最大でもｎ’＝１となり、十分な識別性能が得られない場合が多い。このような場合には正準判別分析による分類は効果がないばかりか、認識性能が低下するので、他の手法によりマッチング処理を行う。ここでは、一例として主成分分析をあげたが、特徴量の次元圧縮を行わずに、この例の場合には２５６次元の特徴量をそのまま類似文字辞書４２に保存してマッチングする方法などでも構わない。特定の２文字の組み合わせは、正準判別による特徴量次元圧縮を用いた場合に識別困難な組み合わせを予め求めて保存しておく。ここでは、文字「ぱ」と「ば」、「ぴ」と「び」、「ぶ」と「ぷ」、「ぺ」と「べ」、「ぼ」と「ぽ」、の組み合わせについてそれぞれの学習データを用いて主成分分析によって求められた射影行列と、平均特徴量が保存されているものとする。この場合の類似文字辞書４２の登録例を図８に示す。
【００４９】
文字認識装置１の前段には図示しない特徴抽出装置が用意され、前述のように、文字画像からの特徴抽出方法については数多くの方法が提案されており、いかなる手法を用いた場合でも、本実施の形態は成立するので、ここでの説明は省略する。
【００５０】
文字画像から抽出された特徴量が、図６に示す文字認識装置１に入力されると、その特徴量は正準特徴圧縮装置４３と主成分分析特徴圧縮装置４４の両方へ送られる。そしてまず、正準特徴圧縮装置４３において、２５６次元から６４次元に次元圧縮処理を行う（ステップＳ２１）。ここでは学習時に作成されている前述の射影行列を参照して、後述の（２）式を用いて次元圧縮特徴量を算出する。
【００５１】
算出された６４次元の特徴量はマッチング装置４５に送られる。マッチング装置４５では、通常辞書４１に保存されている各クラスの平均特徴量と、送られてきた次元圧縮特徴量のユークリッド距離を算出し、距離の小さい順に候補として類似文字識別装置４６へ送って、マッチングを行う（ステップＳ２２）。ここでは一例として単純なユークリッド距離を用いたマッチング方法を示したが、数多く提案されている他のマッチング方法を用いても構わない。
【００５２】
類似文字識別装置４６では、送られてきた第１候補と第２候補の文字コードを、類似文字辞書４２の文字組レコードと比較し、文字組レコードの中に第１候補と第２候補の組と同じものがあった場合には（ステップＳ２３のＹ）、主成分分析特徴圧縮装置４４を起動（辞書レコード番号を送付）して、文字組に対応した次元圧縮特徴量を得る（ステップＳ２４）。それ以外の場合には（ステップＳ２３のＮ）、送られてきた認識結果をそのまま認識結果として出力する（ステップＳ２６）。例えば、第１候補が「ぴ」、第２候補が「び」の場合には、図８の辞書レコードの番号２が見つかるので、主成分分析特徴圧縮装置４４に辞書レコードの番号２を送付して、主成分分析特徴圧縮装置４４を起動する。また、第１候補が「ぴ」、第２候補が「Ｕ」の場合には、主成分分析特徴圧縮装置４４を起動せずに、得られた認識結果を出力する。
【００５３】
主成分分析特徴圧縮装置４４では、類似文字識別装置４６から辞書レコード番号が送付されてきたならば、対応する辞書レコードの射影行列を用いて２５６次元の特徴量を６４次元に次元圧縮する（ステップＳ２４）。そして、得られた次元圧縮特徴量を類似文字識別装置４６に送付する。
【００５４】
類似文字識別装置４６では、次元圧縮特徴量が送られてくると、辞書レコードとして類似文字辞書４２に保存されている２つの平均圧縮特徴量との間でユークリッド距離を算出し、距離の小さい方を第１候補、大きいほうを第２候補として、マッチング装置４５から送られてきた認識結果を修正して（ステップＳ２５）、その認識結果を出力する（ステップＳ２６）。
【００５５】
［正準判別分析について］
正準判別分析について具体的に説明する。
【００５６】
（１）正準判別分析
ある文字種ｃのｎ次元の特徴ベクトルＸ_ｃから正準判別分析によって、ｎ´次元の特徴量ベクトルＹ_ｃを選択する方法を以下に説明する。
【００５７】
まず、次式を満たす固有ベクトル行列Φと固有値行列Λを求める。
【００５８】
Ｓ_ｂΦ＝Ｓ_ｗΦΛ …… （１）
ここで、Ｓ_ｂはクラス間分散行列、Ｓ_ｗはクラス内分散行列であり、Λは固有値λ_ｉ（λ_１≦λ_２≦…≦λ_ｎ）を対角要素とする固有値行列、Φは対応する固有ベクトルΦ_ｉを列ベクトルとする固有ベクトル行列である。
【００５９】
この固有ベクトルを固有値の大きいほうからだけ取った射影行列Ｗ´＝｛Φ_１，…，Φ_ｎ´｝により、ｎ次元特徴量ベクトルＸ_ｃを射影し、新たなｎ´次元の特徴量ベクトルＹ_ｃを、
Ｙ_ｃ＝Ｗ´Ｘ_ｃ …… （２）
として求める。但し、分類するクラス数がｋの場合、ｎ´≦（ｋ−１）である。
【００６０】
（２）主成分分析
ある文字種ｃのｎ次元の特徴ベクトルＸ_ｃから主成分分析によって、ｎ´次元の特徴量ベクトルＹ_ｃを選択する方法を以下に説明する。
【００６１】
まず、次式を満たす固有ベクトル行列Φと固有値行列Λを求める。
【００６２】
Ｓ_ｔΦ＝ΦΛ …… （３）
ここで、Ｓ_ｔは共分散行列、Λは固有値λ_ｉ（λ_１≦λ_２≦…≦λ_ｎ）を対角要素とする固有値行列、Φは対応する固有ベクトルΦ_ｉを列ベクトルとする固有ベクトル行列である。
【００６３】
この固有ベクトルを固有値の大きいほうからだけ取った射影行列Ｗ´＝｛Φ_１，…，Φ_ｎ´｝により、ｎ次元特徴量ベクトルＸ_ｃを射影し、新たなｎ´次元の特徴量ベクトルＹ_ｃを、
Ｙ_ｃ＝Ｗ´Ｘ_ｃ …… （２）
として求める。但し、ｎ´＜ｎである。
【００６４】
［クラス内分散、クラス間分散について］
クラス内分散、クラス間分散について具体的に説明する。
【００６５】
ｎ次元の各データベクトルをｘ、クラスをＭとした場合、各クラス平均と全体平均との差ｖ^ｃ及び各例題とクラス平均との差ｗ（ｔ）を、
【００６６】
【数１】

【００６７】
と定義し、クラス間分散行列Ｂとクラス内分散行列Ｗを、
【００６８】
【数２】

【００６９】
と定義する。
【００７０】
【発明の効果】
請求項１，２，６〜８に記載の発明は、高品質データに対する認識性能を保持しつつ、また、高品質データに対する悪影響なしに、低品質データに対する認識性能を向上させることができる。
【００７１】
請求項３に記載の発明は、請求項２に記載の発明において、低品質セットの中から一部のカテゴリのデータを選択し、選択されたデータについてクラス内分散を算出することで、品質データに対する認識性能を保持しつつ、また、高品質データに対する悪影響なしに、低品質データに対する認識性能を向上させることができる。
【００７２】
請求項４に記載の発明は、請求項２又は３に記載の発明において、正準判別分析による次元圧縮を含めて次元圧縮を多段階に行なうことができる。
【００７３】
請求項５に記載の発明は、前請求項４に記載の発明において、正準判別分析による次元圧縮、主成分分析を用いる次元圧縮により、次元圧縮を多段階に行なうことができる。
【図面の簡単な説明】
【図１】本発明の一実施の形態である文字認識装置のハードウエア構成を示す電気的な接続のブロック図である。
【図２】文字認識装置の射影ベクトルの算出までの処理の機能ブロック図である。
【図３】文字認識装置の射影ベクトルの算出までの処理のフローチャートである。
【図４】文字認識装置の射影ベクトルの算出までの処理に関する他の例の機能ブロック図である。
【図５】文字認識装置の射影ベクトルの算出までの処理に関する他の例のフローチャートである。
【図６】文字認識装置による多段階認識を用いたマッチング処理の機能ブロック図である。
【図７】文字認識装置による多段階認識を用いたマッチング処理のフローチャートである。
【図８】類似文字辞書の登録例を説明する説明図である。
【符号の説明】
１特徴量次元圧縮装置、マッチング装置
２２，２３記憶手段
２４クラス内分散行列算出手段
２６クラス間分散行列算出手段
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a feature dimension compression device, a matching device, a program, and a storage medium.
[0002]
[Prior art]
Character recognition technology is disclosed in Patent Documents 1 and 2. The technique of canonical discriminant analysis is disclosed in Patent Document 3.
[0003]
[Patent Document 1]
JP 2001-52113 A
[Patent Document 2]
JP-A-6-231309
[Patent Document 3]
JP 2001-52115 A
[0004]
[Problems to be solved by the invention]
However, conventional character recognition technology has the problem that attempts to improve recognition performance for low-quality characters reduce, or otherwise adversely affect, recognition performance for high-quality characters.
[0005]
Further, in character recognition and similar technologies, a large amount of learning data has had to be collected in order to recognize low-quality character images, taking into account factors such as the characteristics of the scanner that reads the image. High-quality character images created by rendering font data are easy to collect. Character images obtained by actually scanning a document, however, contain noise and the like, and recognition of such low-quality data has sometimes failed. In general, then, a high-quality (typical) data set is easy to collect, whereas low-quality data sets come in many varieties and are difficult or laborious to collect. There is therefore a demand for a character recognition technique that can cope with low-quality data while using only a high-quality data set.
[0006]
Furthermore, learning time often becomes a problem in statistical methods, and canonical discriminant analysis is no exception, and it is desirable to reduce learning time as much as possible.
[0007]
An object of the present invention is to improve the recognition performance for low-quality data while maintaining the recognition performance for high-quality data and without adverse effects on high-quality data.
[0008]
[Means for Solving the Problems]
According to a first aspect of the present invention, there is provided a feature dimension compression device that performs dimensional compression of feature amounts by canonical discriminant analysis using intra-class variance and inter-class variance, the device comprising: storage means for classifying learning data into a high-quality set, for which errors are not allowed, and a low-quality set, for which a certain degree of error is allowed because of its low quality, and storing the sets; inter-class variance matrix calculating means for calculating an inter-class variance matrix using the high-quality set and without using the low-quality set; and intra-class variance matrix calculating means for calculating an intra-class variance matrix using the low-quality set and without using the high-quality set.
[0009]
Therefore, the recognition performance for low-quality data can be improved while maintaining the recognition performance for high-quality data, and without adverse effects on high-quality data.
[0010]
According to a second aspect of the present invention, there is provided a matching device that performs pattern matching after dimensional compression of feature amounts by canonical discriminant analysis using intra-class variance and inter-class variance, the device comprising: storage means for classifying learning data into a high-quality set, for which errors are not allowed, and a low-quality set, for which a certain degree of error is allowed because of its low quality, and storing the sets; inter-class variance matrix calculating means for calculating an inter-class variance matrix using the high-quality set and without using the low-quality set; and intra-class variance matrix calculating means for calculating an intra-class variance matrix using the low-quality set and without using the high-quality set.
[0011]
Therefore, the recognition performance for low-quality data can be improved while maintaining the recognition performance for high-quality data, and without adverse effects on high-quality data.
[0012]
According to a third aspect of the present invention, in the matching device according to the second aspect, the intra-class variance matrix calculating means, when calculating the intra-class variance matrix, selects data of some categories from the low-quality set and calculates the intra-class variance for the selected data.
[0013]
Therefore, by selecting data of some categories from the low-quality set and calculating the intra-class variance for the selected data, the recognition performance for low-quality data can be improved while maintaining the recognition performance for high-quality data, and without adverse effects on high-quality data.
[0014]
According to a fourth aspect of the present invention, in the matching device according to the second or third aspect, the dimensional compression of the feature amount is performed in multiple stages. At each stage, dimensional compression by the canonical discriminant analysis is used when the number of classes to be separated at that stage is larger than the number of compressed dimensions, and another type of dimensional compression is used otherwise.
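The selection rule of this aspect can be sketched as a small function. This is only an illustration under stated assumptions: the function name `choose_compression` and the returned labels are hypothetical, and the example figures (about 4000 classes, 64 compressed dimensions, 2 classes for a character pair) are the ones given later in the description.

```python
def choose_compression(num_classes: int, compressed_dims: int) -> str:
    """Pick the compression type for one stage of multi-stage recognition."""
    # Canonical discriminant analysis is used only when the stage separates
    # more classes than the number of compressed dimensions.
    if num_classes > compressed_dims:
        return "canonical_discriminant_analysis"
    return "other"  # for example, principal component analysis


first_stage = choose_compression(num_classes=4000, compressed_dims=64)
pair_stage = choose_compression(num_classes=2, compressed_dims=64)
```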
[0015]
Therefore, dimensional compression can be performed in multiple stages including dimensional compression by canonical discriminant analysis.
[0016]
According to a fifth aspect of the present invention, in the matching device according to the fourth aspect, dimensional compression using principal component analysis is performed as another type of dimensional compression.
[0017]
Therefore, dimensional compression can be performed in multiple stages by dimensional compression using canonical discriminant analysis and dimensional compression using principal component analysis.
[0018]
According to a sixth aspect of the present invention, there is provided a computer-readable program that causes a computer to execute dimensional compression of feature amounts using canonical discriminant analysis, the program causing the computer to execute: a storage process of classifying learning data into a high-quality set, for which errors are not allowed, and a low-quality set, for which a certain degree of error is allowed because of its low quality, and storing the sets; an inter-class variance matrix calculation process of calculating an inter-class variance matrix using the high-quality set and without using the low-quality set; and an intra-class variance matrix calculation process of calculating an intra-class variance matrix using the low-quality set and without using the high-quality set.
[0019]
Therefore, the recognition performance for low-quality data can be improved while maintaining the recognition performance for high-quality data, and without adverse effects on high-quality data.
[0020]
According to a seventh aspect of the present invention, there is provided a computer-readable program that causes a computer to execute pattern matching after dimensional compression of feature amounts by canonical discriminant analysis using intra-class variance and inter-class variance, the program causing the computer to function as: storage means for classifying learning data into a high-quality set, for which errors are not allowed, and a low-quality set, for which a certain degree of error is allowed because of its low quality, and storing the sets; inter-class variance matrix calculating means for calculating an inter-class variance matrix using the high-quality set and without using the low-quality set; and intra-class variance matrix calculating means for calculating an intra-class variance matrix using the low-quality set and without using the high-quality set.
[0021]
Therefore, the recognition performance for low-quality data can be improved while maintaining the recognition performance for high-quality data, and without adverse effects on high-quality data.
[0022]
The invention according to claim 8 is a storage medium that stores the program according to claim 6 or 7.
[0023]
Therefore, the same operation and effect as the invention described in claim 6 or 7 can be obtained by the stored program.
[0024]
BEST MODE FOR CARRYING OUT THE INVENTION
An embodiment of the present invention will be described.
[0025]
FIG. 1 is a block diagram of electrical connections showing the hardware configuration of a character recognition device 1 of the present embodiment. As shown in FIG. 1, the character recognition device 1 implements the feature dimension compression device and the matching device of the present invention. A CPU 11, which performs various calculations and centrally controls each unit of the character recognition device 1, and a memory 12, which consists of various ROMs and RAMs, are connected by a bus 13.
[0026]
A magnetic storage device 14 such as a hard disk, an input device 15 including a mouse, a keyboard and the like, a display device 16, and a storage medium reading device 18 that reads a storage medium 17 such as an optical disk are connected to the bus 13 via predetermined interfaces, and a predetermined communication interface 19 for communicating with a network 2 is also connected. Various media such as optical disks (CD, DVD and the like), magneto-optical disks, and flexible disks can be used as the storage medium 17. Depending on the type of the storage medium 17, an optical disk device, a magneto-optical disk device, a flexible disk device, or the like is used as the storage medium reading device 18.
[0027]
The character recognition device 1 reads the program 20 that implements the program of the present invention from the storage medium 17 that implements the storage medium of the present invention, and installs the program 20 in the magnetic storage device 14. These programs 20 may be downloaded and installed via the network 2 or the Internet. By this installation, the character recognition device 1 is in a state in which a predetermined process described later can be executed. Note that the program 20 may operate on a predetermined OS.
[0028]
[About calculation of projection vector]
FIG. 2 is a functional block diagram of the processing, up to the calculation of the projection vectors, of the character recognition device 1 operating based on the program 20. FIG. 3 is a flowchart of the processing performed by the character recognition device 1 up to the calculation of the projection vectors. Each unit illustrated in FIG. 2 is realized by the program 20 operating on the hardware illustrated in FIG. 1.
[0029]
Here, a case will be described in which the character recognition device 1 performs dimensional compression of feature vectors obtained from character images. The user prepares character images rendered from fonts and character images read by a scanner (not shown), and converts them into feature vectors by appropriate means. Many methods have been proposed for extracting features from a character image, and the present embodiment holds whichever method is used, so a description is omitted here.
[0030]
The user inputs the collected learning data to the data classification unit 21. The data classification unit 21 classifies feature vectors obtained from font-rendered character images into a high-quality set, for which errors are not allowed, and classifies the other feature vectors into a low-quality set, for which a certain degree of error is allowed because of their low quality (step S1), and stores the sets in the high-quality storage unit 22 and the low-quality storage unit 23, respectively (steps S2 and S3). Alternatively, the user may classify the learning set appropriately by some other method and input the high-quality set and the low-quality set to the character recognition device 1 separately. The data classification unit 21, the high-quality storage unit 22, and the low-quality storage unit 23 implement the storage means, and the processing of steps S1 to S3 implements the storage process.
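Steps S1 to S3 can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the `Sample` type, its `source` tag ("font" or "scan"), and the function name `classify` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    label: str       # character category
    features: list   # feature vector produced by an upstream extractor
    source: str      # hypothetical origin tag: "font" or "scan"


def classify(samples):
    """Step S1: font-rendered samples go to the high-quality set, the rest to the low-quality set."""
    high_quality, low_quality = [], []
    for sample in samples:
        if sample.source == "font":
            high_quality.append(sample)   # step S2: high-quality storage
        else:
            low_quality.append(sample)    # step S3: low-quality storage
    return high_quality, low_quality


samples = [
    Sample("A", [1.0, 0.0], "font"),
    Sample("A", [0.9, 0.2], "scan"),
    Sample("B", [0.0, 1.0], "font"),
]
high_quality, low_quality = classify(samples)
```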
[0031]
The intra-class variance calculation unit 24, which is the intra-class variance matrix calculating means, calculates the intra-class variance matrix using the low-quality feature vectors in the low-quality storage unit 23 and without using the high-quality feature vectors in the high-quality storage unit 22 (intra-class variance matrix calculation process) (step S4). The method of calculating the intra-class variance will be described later in detail. Once the intra-class variance matrix has been calculated, it is sent to the eigenvalue / eigenvector calculation unit 25.
[0032]
Meanwhile, the inter-class variance calculation unit 26, which is the inter-class variance matrix calculating means, calculates the inter-class variance matrix using the high-quality feature vectors in the high-quality storage unit 22 and without using the low-quality feature vectors in the low-quality storage unit 23 (inter-class variance matrix calculation process) (step S5). The method of calculating the inter-class variance will be described later in detail. Once the inter-class variance matrix has been calculated, it is sent to the eigenvalue / eigenvector calculation unit 25.
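The split between steps S4 and S5 can be sketched in plain Python as below. The sketch assumes unnormalized sums of outer products and weights each class by its sample count in the inter-class matrix; the description defines the matrices through formulas that are not reproduced in the text, so this normalization and all function names are assumptions.

```python
def mean(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def group_by_label(pairs):
    groups = {}
    for label, vector in pairs:
        groups.setdefault(label, []).append(vector)
    return groups


def add_outer(matrix, diff, weight=1.0):
    for i, di in enumerate(diff):
        for j, dj in enumerate(diff):
            matrix[i][j] += weight * di * dj


def intra_class_variance(low_quality):
    """Step S4: deviations of low-quality samples from their own class mean."""
    dim = len(low_quality[0][1])
    result = [[0.0] * dim for _ in range(dim)]
    for vectors in group_by_label(low_quality).values():
        centre = mean(vectors)
        for vector in vectors:
            add_outer(result, [a - b for a, b in zip(vector, centre)])
    return result


def inter_class_variance(high_quality):
    """Step S5: deviations of high-quality class means from the overall mean."""
    dim = len(high_quality[0][1])
    overall = mean([vector for _, vector in high_quality])
    result = [[0.0] * dim for _ in range(dim)]
    for vectors in group_by_label(high_quality).values():
        centre = mean(vectors)
        add_outer(result, [a - b for a, b in zip(centre, overall)], weight=len(vectors))
    return result


low = [("A", [0.0, 0.0]), ("A", [2.0, 0.0]), ("B", [5.0, 1.0]), ("B", [5.0, 3.0])]
high = [("A", [0.0, 0.0]), ("B", [4.0, 0.0])]
s_w = intra_class_variance(low)    # uses only the low-quality set
s_b = inter_class_variance(high)   # uses only the high-quality set
```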
[0033]
The eigenvalue / eigenvector calculation unit 25 calculates eigenvalues and eigenvectors from the received intra-class variance matrix and inter-class variance matrix using equation (1), which is described later (step S6).
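For reference, equation (1) of the description is the generalized eigenvalue problem for the inter-class variance matrix S_b and the intra-class variance matrix S_w. It is written out below in LaTeX. The column-wise form and the equivalent ordinary eigenvalue problem, which holds only when S_w is invertible, are added here as standard restatements and are not part of the original text.

```latex
S_b \Phi = S_w \Phi \Lambda, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \qquad
\Phi = [\,\Phi_1 \ \cdots \ \Phi_n\,]

% column by column:
S_b \Phi_i = \lambda_i \, S_w \Phi_i

% if S_w is invertible:
S_w^{-1} S_b \, \Phi_i = \lambda_i \, \Phi_i
```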
[0034]
The calculated eigenvalues and eigenvectors are sent to the projection matrix calculation unit 27, which reassembles them to obtain a projection matrix (step S7). The details will be described later. The calculated projection matrix is stored in the projection matrix storage unit 28.
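As a rough illustration of steps S6 and S7, the sketch below finds only the leading projection vector of the generalized eigenvalue problem by power iteration on S_w^{-1} S_b. Power iteration is a swapped-in simplification chosen to keep the example dependency-free; the description does not say which eigen-solver is used, and a full projection matrix would need the top n' eigenvectors from a complete generalized eigen-solver. All function names are hypothetical.

```python
def matvec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def leading_projection_vector(s_b, s_w, iterations=50):
    """Approximate the eigenvector of S_b·phi = lambda·S_w·phi with the largest lambda."""
    phi = [1.0] * len(s_b)
    for _ in range(iterations):
        y = solve(s_w, matvec(s_b, phi))   # y = S_w^{-1} S_b phi
        norm = dot(y, y) ** 0.5
        if norm == 0.0:
            break
        phi = [v / norm for v in y]
    # Generalized Rayleigh quotient gives the matching eigenvalue.
    eigenvalue = dot(phi, matvec(s_b, phi)) / dot(phi, matvec(s_w, phi))
    return eigenvalue, phi


s_b = [[8.0, 0.0], [0.0, 0.0]]   # inter-class variance (high-quality set)
s_w = [[2.0, 0.0], [0.0, 2.0]]   # intra-class variance (low-quality set)
eigenvalue, phi = leading_projection_vector(s_b, s_w)
```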
[0035]
By determining the projection matrix in this way, it has been experimentally confirmed that the recognition performance for low-quality characters is improved without lowering the recognition performance for high-quality characters.
[0036]
Next, another example of the processing up to the calculation of the projection vectors will be described. Also in this example, the hardware configuration is the same as that of FIG. 1. FIG. 4 is a functional block diagram for this example, and FIG. 5 is a flowchart. Each unit illustrated in FIG. 4 is realized by the program 20 operating on the hardware illustrated in FIG. 1. Devices and processes denoted by the same reference numerals as in FIGS. 2 and 3 are the same as in the example described above with reference to FIGS. 2 and 3, so a detailed description is omitted.
[0037]
This example is also described for the case of character recognition. In character recognition, the intra-class variance often varies mainly because of noise and the like introduced into the image when a document is scanned. Consequently, for the same character, a slight change in character shape, such as a change of font, often causes no large variation. On the other hand, when a specific set of documents is to be recognized, the recognition target is often limited to a specific font set and the recognition system is redesigned (the pattern dictionary is recreated). Collecting a low-quality data set each time takes a great deal of labor, so the high cost of such custom work becomes a problem.
[0038]
One conceivable approach is therefore as follows: once low-quality data reflecting the causes of intra-class variance, such as differences between scanners, has been collected and an intra-class variance matrix has been calculated from it, that intra-class variance matrix is reused even when the fonts to be recognized change.
[0039]
First, it is assumed that an intra-class variance matrix calculated in the past by the method of the first embodiment is already stored in the intra-class variance storage unit 31. The user collects learning data for the new recognition target in the same manner as in the first embodiment. Because the past intra-class variance is reused, only a high-quality data set needs to be collected.
[0040]
The collected data is input to the high-quality storage unit 22. Once it has been input, the inter-class variance calculation unit 26 performs the same processing as in the first embodiment (step S11).
[0041]
When the inter-class variance matrix is sent, the eigenvalue / eigenvector calculation unit 25 refers to the intra-class variance matrix in the intra-class variance storage unit 31 and calculates the eigenvalue / eigenvector as in the first embodiment. (Step S12). The processing in the projection matrix calculation unit 27 is the same as in the first embodiment (step S13).
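The reuse of a previously computed intra-class variance matrix can be sketched as simple persistence. Storing the matrix as JSON, the file name, and the function names `save_matrix` and `load_matrix` are assumptions made only for this illustration; the description says only that the matrix is kept in the intra-class variance storage unit 31.

```python
import json
import os
import tempfile


def save_matrix(matrix, path):
    """Store an intra-class variance matrix computed from low-quality data."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(matrix, handle)


def load_matrix(path):
    """Load the stored matrix for a new recognition target (used in step S12)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


path = os.path.join(tempfile.mkdtemp(), "intra_class_variance.json")
save_matrix([[2.0, 0.0], [0.0, 2.0]], path)   # computed once, in the past
reused_s_w = load_matrix(path)                # reused with new high-quality data only
```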
[0042]
It has been confirmed by experiment that this processing removes the need to collect low-quality data and improves performance for low-quality images without lowering performance for high-quality images.
[0043]
The above has been described up to the calculation of the projection vector. By calculating the projection matrix in this manner, the dimension of the feature amount can be reduced, and the time for the matching processing, which is the subsequent processing, can be reduced.
[0044]
[About matching using multi-stage recognition]
Here, a description will be given of a matching process using multi-stage recognition performed using the projection matrix calculated as described above.
[0045]
The hardware configuration in this case is the same as that of FIG. 1. FIG. 6 is a functional block diagram for this processing, and FIG. 7 is a flowchart. Each unit illustrated in FIG. 6 is realized by the program 20 operating on the hardware illustrated in FIG. 1. Devices denoted by the same reference numerals as in FIGS. 2 and 4 are the same as described above with reference to FIGS. 2 and 4, so a detailed description is omitted.
[0046]
The projection matrix storage unit 28 stores the projection matrix obtained as described above. In addition, the normal dictionary 41 stores an average feature amount obtained by calculating an average of each category with respect to a dimension-compressed feature amount obtained by applying a projection matrix to the learning data. As an example, it is assumed that the feature amount n before the dimensional compression is “n = 256 dimensions”, and the feature amount n ′ after the dimensional compression is “n ′ = 64 dimensions”. When Japanese documents are targeted, the number of classes is about 4000.
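Building the normal dictionary amounts to averaging the compressed feature vectors per category. The sketch below assumes the learning data has already been compressed and is given as (label, vector) pairs; the function name `build_normal_dictionary` and the toy two-dimensional vectors are hypothetical (the description uses 64 dimensions and about 4000 classes).

```python
def build_normal_dictionary(compressed_samples):
    """Map each category to the mean of its dimension-compressed feature vectors."""
    grouped = {}
    for label, vector in compressed_samples:
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


normal_dictionary = build_normal_dictionary([
    ("A", [1.0, 2.0]),
    ("A", [3.0, 4.0]),
    ("B", [0.0, 0.0]),
])
```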
[0047]
The similar character dictionary 42 stores, for each specific combination of two characters, a projection matrix and the average feature amounts obtained by averaging, per category, the dimension-compressed feature amounts produced by applying that projection matrix. The projection matrix for a specific two-character combination is calculated here by principal component analysis rather than by the canonical discriminant analysis described above. Details of principal component analysis are given later. Here too, the feature amount before compression has n = 256 dimensions and the feature amount after compression has n' = 64 dimensions. Because a specific combination of two characters is targeted, the number of classes is two.
[0048]
In that case, canonical discriminant analysis yields at most n' = 1 dimension after compression, which often does not give sufficient discrimination performance. Classification by canonical discriminant analysis is then not merely ineffective but lowers recognition performance, so matching is performed by another method. Principal component analysis is given here as an example, but other approaches are also possible, such as skipping dimensional compression and storing the 256-dimensional feature amounts of this example in the similar character dictionary 42 as they are for matching. The specific two-character combinations are those found in advance to be difficult to distinguish when feature dimension compression by canonical discriminant analysis is used, and they are stored beforehand. Here, it is assumed that, for each of the combinations “ぱ” and “ば”, “ぴ” and “び”, “ぶ” and “ぷ”, “ぺ” and “べ”, and “ぼ” and “ぽ”, a projection matrix obtained by principal component analysis from the respective learning data and the average feature amounts are stored. FIG. 8 shows an example of the entries of the similar character dictionary 42 in this case.
[0049]
A feature extraction device (not shown) is provided at the stage preceding the character recognition device 1. As described above, many methods have been proposed for extracting features from a character image, and this embodiment holds whichever of them is used, so their description is omitted here.
[0050]
When the feature value extracted from the character image is input to the character recognition device 1 shown in FIG. 6, the feature value is sent to both the canonical feature compression device 43 and the principal component analysis feature compression device 44. Then, first, the canonical feature compression device 43 performs a dimensional compression process from 256 dimensions to 64 dimensions (step S21). Here, with reference to the above-described projection matrix created at the time of learning, the dimension compression feature amount is calculated using Expression (2) described later.
[0051]
The calculated 64-dimensional feature amount is sent to the matching device 45. The matching device 45 performs matching by calculating the Euclidean distance between the average feature amount of each class stored in the normal dictionary 41 and the dimension-compressed feature amount it has received, and sends the classes to the similar character identification device 46 as candidates in ascending order of distance (step S22). A matching method using a simple Euclidean distance is described here as an example, but any of the many other matching methods that have been proposed may be used.
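The matching of step S22 can be sketched as follows. This is only an illustrative sketch, not the implementation of the matching device 45: the function name `rank_candidates`, the dictionary layout, and the 3-dimensional toy vectors (standing in for the 64-dimensional compressed features) are all assumptions made for the example.

```python
import math

def rank_candidates(feature, class_means):
    """Return (label, distance) pairs sorted by ascending Euclidean distance."""
    scored = [(label, math.dist(feature, mean)) for label, mean in class_means.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical per-class average feature amounts (toy 3-dimensional values).
class_means = {
    "A": [0.0, 0.0, 0.0],
    "B": [1.0, 1.0, 1.0],
    "C": [5.0, 5.0, 5.0],
}

# The compressed feature of the input character (also a toy value).
candidates = rank_candidates([0.9, 1.2, 1.0], class_means)
first_candidate, second_candidate = candidates[0][0], candidates[1][0]
print(first_candidate, second_candidate)
```

The first two entries of the sorted list correspond to the first and second candidates that are passed on to the similar character identification device 46.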
[0052]
The similar character identification device 46 compares the character codes of the first and second candidates it has received with the character-set records of the similar character dictionary 42. If a record contains the same pair as the first and second candidates (Y in step S23), it activates the principal component analysis feature compression device 44 by sending it the dictionary record number, and obtains the dimension-compressed feature amount corresponding to that character set (step S24). Otherwise (N in step S23), the recognition result it received is output unchanged as the recognition result (step S26). For example, when the first candidate is “ぴ” and the second candidate is “び”, dictionary record number 2 in FIG. 8 is found, so dictionary record number 2 is sent to the principal component analysis feature compression device 44 and that device is activated. When the first candidate is “Δ” and the second candidate is “U”, the obtained recognition result is output without activating the principal component analysis feature compression device 44.
[0053]
When the dictionary record number is sent from the similar character identification device 46, the principal component analysis feature compression device 44 uses the projection matrix of the corresponding dictionary record to compress the 256-dimensional feature amount into 64 dimensions (step S24). The obtained dimension-compressed feature amount is then sent to the similar character identification device 46.
[0054]
Upon receiving the dimension-compressed feature amount, the similar character identification device 46 calculates its Euclidean distance to each of the two average compressed feature amounts stored in the dictionary record of the similar character dictionary 42. It takes the character with the smaller Euclidean distance as the first candidate and the character with the larger distance as the second candidate, corrects the recognition result sent from the matching device 45 accordingly (step S25), and outputs the recognition result (step S26).
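The second-stage flow of steps S23 to S26 can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's actual implementation: the names `project` and `refine`, the record layout keyed by an unordered character pair, the extra candidate “ひ”, and the 3-to-2-dimensional toy projection (standing in for 256 to 64 dimensions) are all hypothetical.

```python
import math

def project(matrix, x):
    """Apply a projection matrix given as a list of row vectors (y = W'x)."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]

def refine(candidates, raw_feature, similar_dict):
    """Re-rank the top two candidates if they form a registered similar pair."""
    first, second = candidates[0], candidates[1]
    record = similar_dict.get(frozenset((first, second)))
    if record is None:
        # N in step S23: output the first-stage result unchanged.
        return list(candidates)
    # Step S24: compress the uncompressed feature with the pair's own projection.
    y = project(record["projection"], raw_feature)
    # Step S25: the nearer of the two stored means becomes the first candidate.
    reordered = sorted(record["means"], key=lambda c: math.dist(y, record["means"][c]))
    return reordered + list(candidates[2:])

# Hypothetical similar character dictionary with a single record.
similar_dict = {
    frozenset(("ぴ", "び")): {
        "projection": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        "means": {"ぴ": [0.0, 0.0], "び": [1.0, 1.0]},
    }
}

corrected = refine(["ぴ", "び", "ひ"], [0.9, 0.8, 0.3], similar_dict)
unchanged = refine(["あ", "い", "う"], [0.9, 0.8, 0.3], similar_dict)
print(corrected, unchanged)
```

Keying the records by an unordered pair means the lookup succeeds regardless of which of the two characters the first stage ranked first.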
[0055]
[About canonical discriminant analysis]
The canonical discriminant analysis will be specifically described.
[0056]
(1) Canonical discriminant analysis. A method of obtaining an n′-dimensional feature vector Y_c by canonical discriminant analysis from the n-dimensional feature vector X_c of character type c is described below.
[0057]
First, an eigenvector matrix Φ and an eigenvalue matrix Λ satisfying the following equation are obtained.
[0058]
$S_b \Phi = S_w \Phi \Lambda$ (1)
Here, $S_b$ is the inter-class variance matrix, $S_w$ is the intra-class variance matrix, $\Lambda$ is the eigenvalue matrix having the eigenvalues $\lambda_i$ ($\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$) as diagonal elements, and $\Phi$ is the eigenvector matrix having the corresponding eigenvectors $\Phi_i$ as column vectors.
[0059]
Using the projection matrix $W' = \{\Phi_1, \dots, \Phi_{n'}\}$, formed by taking these eigenvectors in order starting from the largest eigenvalue, the n-dimensional feature vector $X_c$ is projected to obtain a new n′-dimensional feature vector $Y_c$:
$Y_c = W' X_c$ (2)
However, when the number of classes to be classified is k, $n' \le k - 1$.
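For the two-class case (k = 2, so n′ is at most 1), equation (1) has a single useful eigenvector, and a well-known result is that it is proportional to S_w⁻¹(m_a − m_b), where m_a and m_b are the two class means. The sketch below checks this numerically on toy 2-dimensional data. The data, the helper names, and the use of unnormalized scatter matrices are assumptions made for illustration; a general implementation would use a generalized eigenvalue solver instead.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def outer_sum(diffs):
    """Sum of outer products d d^T over a list of 2-dimensional vectors."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for d in diffs:
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

# Toy two-class data (hypothetical values).
class_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]]
class_b = [[6.0, 5.0], [7.0, 8.0], [8.0, 7.0]]
m_a, m_b, m_all = mean(class_a), mean(class_b), mean(class_a + class_b)

# Intra-class scatter S_w: deviations of examples from their own class mean.
s_w = outer_sum([[x[0] - m_a[0], x[1] - m_a[1]] for x in class_a]
                + [[x[0] - m_b[0], x[1] - m_b[1]] for x in class_b])
# Inter-class scatter S_b: deviations of class means from the overall mean,
# counted once per example in the class.
s_b = outer_sum([[m_a[0] - m_all[0], m_a[1] - m_all[1]]] * len(class_a)
                + [[m_b[0] - m_all[0], m_b[1] - m_all[1]]] * len(class_b))

# phi = S_w^{-1} (m_a - m_b), using the closed-form 2x2 inverse.
delta = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
det = s_w[0][0] * s_w[1][1] - s_w[0][1] * s_w[1][0]
phi = [(s_w[1][1] * delta[0] - s_w[0][1] * delta[1]) / det,
       (-s_w[1][0] * delta[0] + s_w[0][0] * delta[1]) / det]

# Check S_b phi = lambda * S_w phi for a single scalar lambda.
lhs, rhs = matvec(s_b, phi), matvec(s_w, phi)
lam = lhs[0] / rhs[0]

# One-dimensional compressed features, as in equation (2) with n' = 1.
y_a = [phi[0] * x[0] + phi[1] * x[1] for x in class_a]
y_b = [phi[0] * x[0] + phi[1] * x[1] for x in class_b]
print(lam, y_a, y_b)
```

With only one discriminant dimension available, every pairwise decision rests on a single scalar, which is the limitation that motivates the second stage based on principal component analysis.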
[0060]
(2) Principal component analysis. A method of obtaining an n′-dimensional feature vector Y_c by principal component analysis from the n-dimensional feature vector X_c of character type c is described below.
[0061]
First, an eigenvector matrix Φ and an eigenvalue matrix Λ satisfying the following equation are obtained.
[0062]
$S_t \Phi = \Phi \Lambda$ (3)
Here, $S_t$ is the covariance matrix, $\Lambda$ is the eigenvalue matrix having the eigenvalues $\lambda_i$ ($\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$) as diagonal elements, and $\Phi$ is the eigenvector matrix having the corresponding eigenvectors $\Phi_i$ as column vectors.
[0063]
Using the projection matrix $W' = \{\Phi_1, \dots, \Phi_{n'}\}$, formed by taking these eigenvectors in order starting from the largest eigenvalue, the n-dimensional feature vector $X_c$ is projected to obtain a new n′-dimensional feature vector $Y_c$:
$Y_c = W' X_c$ (2)
Here, $n' < n$.
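As a rough illustration of equation (3), the sketch below builds a covariance matrix from toy data and approximates the eigenvector with the largest eigenvalue by power iteration, then projects the data onto it (n′ = 1). The function names, the toy data, and the 1/N normalization of the covariance are assumptions for the example. Obtaining further eigenvectors would require deflation or a full eigenvalue solver, which is not shown.

```python
import math

def covariance(data):
    """Covariance matrix S_t of a list of equal-length vectors (1/N normalization)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[i] for row in data) / n for i in range(d)]
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in data) / n
             for j in range(d)] for i in range(d)]

def top_eigenvector(matrix, iterations=200):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    d = len(matrix)
    v = [1.0] * d  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data that varies most along the first axis (hypothetical values).
data = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
s_t = covariance(data)
phi_1 = top_eigenvector(s_t)

# One-dimensional compressed feature for each data vector.
projected = [sum(p * x for p, x in zip(phi_1, row)) for row in data]
print(s_t, phi_1, projected)
```

Unlike canonical discriminant analysis, the number of usable dimensions here is limited only by n, not by the number of classes, which is why it remains applicable when only two classes are being separated.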
[0064]
[About intra-class variance and inter-class variance]
The intra-class variance and the inter-class variance will be specifically described.
[0065]
Let each data vector x be n-dimensional and let the number of classes be M. The difference w(t) between each example and its class mean, and the difference v^c between each class mean and the overall mean, are given by
[0066]
(Equation 1)
[0067]
The inter-class variance matrix B and the intra-class variance matrix W are then defined by
[0068]
(Equation 2)
[0069]
Here, B and W correspond to S_b and S_w in equation (1), respectively.
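The following sketch shows one way the two matrices could be computed from separate data sets, with the inter-class matrix taken from the high-quality set only and the intra-class matrix from the low-quality set only, as the abstract and claims describe. It assumes the common unnormalized forms B = Σ_c n_c v^c (v^c)ᵀ and W = Σ_t w(t) w(t)ᵀ; the equation images are not reproduced in this text, so the exact weighting and normalization used in the patent may differ. All names and data values are hypothetical.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def add_outer(matrix, diff, weight=1.0):
    """Accumulate weight * diff diff^T into matrix in place."""
    for i, di in enumerate(diff):
        for j, dj in enumerate(diff):
            matrix[i][j] += weight * di * dj

def inter_class_scatter(high_quality):
    """B from the high-quality set only: sum over classes of n_c * v^c (v^c)^T."""
    overall = mean([x for xs in high_quality.values() for x in xs])
    d = len(overall)
    b = [[0.0] * d for _ in range(d)]
    for xs in high_quality.values():
        m_c = mean(xs)
        add_outer(b, [m_c[i] - overall[i] for i in range(d)], weight=len(xs))
    return b

def intra_class_scatter(low_quality):
    """W from the low-quality set only: sum over examples of w(t) w(t)^T."""
    d = len(next(iter(low_quality.values()))[0])
    w = [[0.0] * d for _ in range(d)]
    for xs in low_quality.values():
        m_c = mean(xs)  # class mean taken from the low-quality data itself
        for x in xs:
            add_outer(w, [x[i] - m_c[i] for i in range(d)])
    return w

# Toy learning data keyed by class label (hypothetical values).
high_quality = {"a": [[0.0, 0.0], [2.0, 0.0]], "b": [[4.0, 4.0], [6.0, 4.0]]}
low_quality = {"a": [[0.0, 1.0], [2.0, -1.0]], "b": [[5.0, 5.0], [5.0, 3.0]]}

B = inter_class_scatter(high_quality)
W = intra_class_scatter(low_quality)
print(B, W)
```

Because each function takes only one of the two sets, the separation between the data used for B and the data used for W is enforced by the function signatures rather than by convention.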
[0070]
[Effects of the invention]
The inventions described in claims 1, 2, and 6 to 8 can improve the recognition performance for low-quality data while maintaining the recognition performance for high-quality data and without adversely affecting the high-quality data.
[0071]
According to a third aspect of the present invention, in the second aspect, data of some categories is selected from the low-quality set and the intra-class variance is calculated for the selected data, so that the recognition performance for low-quality data can be improved while maintaining the recognition performance for high-quality data and without adversely affecting the high-quality data.
[0072]
According to a fourth aspect of the present invention, in the second or third aspect, dimensional compression can be performed in multiple stages including dimensional compression by canonical discriminant analysis.
[0073]
According to a fifth aspect of the present invention, in the fourth aspect of the invention, dimensional compression can be performed in multiple stages by dimensional compression using canonical discriminant analysis and dimensional compression using principal component analysis.
[Brief description of the drawings]
FIG. 1 is an electrical connection block diagram showing a hardware configuration of a character recognition device according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of a process up to calculation of a projection vector of the character recognition device.
FIG. 3 is a flowchart of a process up to calculation of a projection vector of the character recognition device.
FIG. 4 is a functional block diagram of another example relating to processing up to calculation of a projection vector of the character recognition device.
FIG. 5 is a flowchart of another example of processing up to calculation of a projection vector of the character recognition device.
FIG. 6 is a functional block diagram of a matching process using multi-stage recognition by the character recognition device.
FIG. 7 is a flowchart of a matching process using multi-stage recognition by the character recognition device.
FIG. 8 is an explanatory diagram illustrating a registration example of a similar character dictionary.
[Explanation of symbols]
1 Feature dimension compression device, matching device
22, 23 Storage means
24 Intra-class variance matrix calculation means
26 Inter-class variance matrix calculation means

Claims

1. A feature dimension compression device that performs dimension compression of feature quantities using canonical discriminant analysis based on intra-class variance and inter-class variance, comprising:
storage means for classifying learning data into a high-quality set that does not allow errors and a low-quality set that allows some errors due to low quality, and storing the classified data;
inter-class variance matrix calculation means for calculating the inter-class variance matrix using the high-quality set and without using the low-quality set; and
intra-class variance matrix calculation means for calculating the intra-class variance matrix using the low-quality set and without using the high-quality set.

2. A matching device that performs pattern matching by performing dimension compression of feature quantities using canonical discriminant analysis based on intra-class variance and inter-class variance, comprising:
storage means for classifying learning data into a high-quality set that does not allow errors and a low-quality set that allows some errors due to low quality, and storing the classified data;
inter-class variance matrix calculation means for calculating the inter-class variance matrix using the high-quality set and without using the low-quality set; and
intra-class variance matrix calculation means for calculating the intra-class variance matrix using the low-quality set and without using the high-quality set.

3. The matching device according to claim 2, wherein the intra-class variance matrix calculation means, when calculating the intra-class variance matrix, selects data of some categories from the low-quality set and calculates the intra-class variance for the selected data.

4. The matching device according to claim 2 or 3, wherein the dimension compression of the feature quantity is performed in multiple stages, and, at each stage, dimension compression by the canonical discriminant analysis is used if the number of classes to be distinguished at that stage is larger than the number of compression dimensions, and another type of dimension compression is used otherwise.

5. The matching apparatus according to claim 4, wherein a dimensional compression using principal component analysis is performed as the other type of dimensional compression.

6. A computer-readable program for causing a computer to execute dimension compression of feature quantities using canonical discriminant analysis, the program causing the computer to execute:
a storage process of classifying learning data into a high-quality set that does not allow errors and a low-quality set that allows some errors due to low quality, and storing the classified data;
an inter-class variance matrix calculation process of calculating an inter-class variance matrix using the high-quality set and without using the low-quality set; and
an intra-class variance matrix calculation process of calculating an intra-class variance matrix using the low-quality set and without using the high-quality set.

7. A computer-readable program for causing a computer to execute pattern matching by performing dimension compression of feature quantities using canonical discriminant analysis based on intra-class variance and inter-class variance, the program causing the computer to execute:
storage means for classifying learning data into a high-quality set that does not allow errors and a low-quality set that allows some errors due to low quality, and storing the classified data;
inter-class variance matrix calculation means for calculating the inter-class variance matrix using the high-quality set and without using the low-quality set; and
intra-class variance matrix calculation means for calculating the intra-class variance matrix using the low-quality set and without using the high-quality set.

8. A storage medium storing the program according to claim 6.