JP2016162437A

JP2016162437A - Pattern classification device, pattern classification method and pattern classification program

Info

Publication number: JP2016162437A
Application number: JP2015044117A
Authority: JP
Inventors: 恒雄新田; Tsuneo Nitta
Original assignee: Waseda University
Current assignee: Waseda University
Priority date: 2015-03-05
Filing date: 2015-03-05
Publication date: 2016-09-05

Abstract

PROBLEM TO BE SOLVED: To provide a pattern classification device, a pattern classification method, and a pattern classification program that raise vector data to a higher dimension by multilinear mapping and then extract and classify the pattern features of the vector data.

SOLUTION: The pattern classification device comprises: a preprocessing unit that generates, in advance, an eigenvector set common to the feature classes by principal component analysis of learning input patterns, and reconstructs an orthogonalized vector from the input pattern and that eigenvector set; a high-dimensional vector conversion unit that converts the orthogonalized vector into a high-dimensional vector by multilinear mapping (tensor product); a higher-order subspace similarity calculation unit that generates a high-dimensional eigenvector set Ψ(k, m) for each class k from the high-dimensional vectors and calculates, for each class, the similarity between the high-dimensional vector and Ψ(k, m); and a feature classification unit that classifies individual features from the per-class similarities.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a pattern classification device, a pattern classification method, and a pattern classification program. More specifically, it relates to a pattern classification device, method, and program whose pattern recognition performance is markedly improved by extracting tensor features based on the individual elements of the feature vectors that represent input patterns.

With faster networks and the spread of high-performance cloud computing environments, the processing of various media (text, speech, images) has begun to shift from an emphasis on encoding to an emphasis on retrieval that accesses content directly, that is, toward pattern recognition and understanding for each medium. In daily life, humans process patterns made up of enormous amounts of multimodal information received as sound and speech, text, images, and video. This processing includes recognizing, classifying, and understanding patterns. For example, when reading a book or newspaper, a person matches the character patterns received through vision against learned patterns to understand the meaning of characters and words. When listening to speech, a person matches the speech patterns received through hearing against learned patterns to understand the meaning of an utterance. Likewise, when looking at a face, a person matches the image patterns received through vision against learned patterns to identify other people and to read their emotions.

Taking speech as an example, many systems that use speech, the most natural means of human communication, have appeared in recent years, and speech recognition technology has been built into smartphones, web browsers, and similar products. As speech recognition systems spread, further improvement in the performance of the speech recognition engine is required as the underlying technology.

The mainstream approach in current speech recognition systems is to use mel-frequency cepstrum coefficients (hereinafter abbreviated as MFCC), obtained by acoustic analysis, as feature parameters and to treat the MFCC time series as the stochastic process of a hidden Markov model (hereinafter abbreviated as HMM).

Meanwhile, there has been active research on speech recognition systems that use a deep neural network (hereinafter abbreviated as DNN), in which five or more stages of multi-layer perceptrons (hereinafter abbreviated as MLP) are stacked, to extract triphones that include the preceding and following context for phoneme and articulatory features, and that treat these sequences with an HMM probabilistic model (for example, Non-Patent Document 1 and FIG. 12).

FIG. 12 shows an example of such a conventional speech recognition system. As shown in FIG. 12, this system includes a multi-stage MLP, that is, a DNN (a chain of 5 to 7 stages is used), and a set of weighting coefficients must be set for each MLP in order to extract phoneme features and articulatory features.

However, because current DNN-based phoneme feature extraction stacks several MLP stages, both improving the extraction accuracy of the MLPs and their high computational cost are open issues (for example, Non-Patent Document 2). MLPs (including DNNs) also have the fundamental problem that, although convergence of the algorithm is guaranteed, optimality is not. Furthermore, because the internal structure of an MLP and its meaning are unclear, technical progress is stagnating, and a shift to mathematical models with a clear internal structure is desired.

Next, systems for recognizing patterns made up of data such as speech, characters, images, and video signals include those based on the subspace method (hereinafter abbreviated as SM). During learning, such a system extracts, from the vectors that make up the patterns, eigenvectors specific to each discrete class (phonemes, articulatory features, strokes, facial parts, and so on) as a subspace. These class-specific eigenvectors can be extracted from the learning data by principal component analysis (hereinafter abbreviated as PCA). In SM, the inner products between an input pattern of unknown class (vector x) and the M eigenvectors φ(k, m), m = 1, 2, …, M, of each class k = 1, 2, …, K are computed, and the similarity s(k) of class k is calculated from the following equation, where ‖ ‖ denotes the norm (the square root of the sum of squares).
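The equation referred to here is not reproduced in this text. A form consistent with the surrounding description (squared inner products with the M class eigenvectors, normalized by the norm) is the usual subspace-method similarity. It is given here as an assumption, not as the patent's exact expression:

```latex
s(k) = \sum_{m=1}^{M} \frac{\bigl( x \cdot \varphi(k,m) \bigr)^{2}}{\lVert x \rVert^{2} \, \lVert \varphi(k,m) \rVert^{2}}, \qquad k = 1, 2, \ldots, K
```

When the eigenvectors are normalized to unit length, the factor ‖φ(k, m)‖² equals 1 and drops out.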

In other words, for an input of unknown class, the squared inner products with the M eigenvectors obtained from the learning data are accumulated to give the similarity to class k. The inner product is squared so that the similarity does not depend on the sign of the eigenvectors. To decide class membership, the k that gives the maximum s(k) is taken as the recognition result. Unlike MLPs and DNNs, SM has the advantage that convex optimization is guaranteed. However, the eigenvectors φ(k, m) capture only within-class information: they take into account neither the relationships between pattern feature classes nor the relationships between the elements of the vector, and this limits recognition performance.
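The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes unit-norm eigenvectors, and the names `inner`, `subspace_similarity`, `classify`, and the toy data are hypothetical.

```python
def inner(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def subspace_similarity(x, class_bases):
    """Return s(k) for every class k.

    class_bases[k] is a list of M eigenvectors for class k
    (assumed here to be unit-norm, so only ||x||^2 is divided out).
    """
    norm_sq = inner(x, x)
    return [sum(inner(x, phi) ** 2 for phi in basis) / norm_sq
            for basis in class_bases]

def classify(x, class_bases):
    """Index k of the class with the largest similarity s(k)."""
    scores = subspace_similarity(x, class_bases)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: two classes in 3-D, one eigenvector each (illustrative data).
class_bases = [
    [[1.0, 0.0, 0.0]],   # class 0 subspace
    [[0.0, 1.0, 0.0]],   # class 1 subspace
]
x = [0.2, -3.0, 0.1]
scores = subspace_similarity(x, class_bases)
label = classify(x, class_bases)
```

Because the inner products are squared, flipping the sign of `x` (or of any eigenvector) leaves the scores unchanged, which is the sign independence the text mentions.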

Linear discriminant analysis (hereinafter abbreviated as LDA), on the other hand, is a feature extraction method that does use information between pattern feature classes. LDA discriminates classes by taking the differences between them into account, evaluating both the within-class and the between-class covariance. However, LDA computes its eigenvectors from the within-class and between-class covariances around the mean vector of each class. Although this increases its ability to separate classes, it depends too heavily on the learning data, so performance degrades considerably when the environment at recognition time differs from the environment at data collection (learning) time.

For recognizing speech data, a method has also been proposed that extracts distinctive phonetic feature vectors using MLPs and PCA and then performs phoneme recognition (for example, Non-Patent Document 3). Another proposed method extracts phoneme vectors by PCA to improve recognition performance (for example, Non-Patent Document 4).

However, a feature extractor based on any of the above methods cannot accurately extract the features of a pattern that carries an enormous amount of information (data such as speech, characters, images, and video signals) and recognize the pattern with good performance.

In recent years in particular, the number of users of blogs, video sites, and social networking services (SNS) such as Facebook (registered trademark) and Twitter (registered trademark) has grown, and vast amounts of digital data, not only text but also audio, photographs, and video, are being accumulated from personal computers, smartphones, and other terminals on server computers across the Internet. This so-called big data is said to amount to several hundred trillion bytes or more. Under these circumstances, a pattern recognition device that can identify individual patterns with high accuracy in the speech, character, and image data contained in big data has become indispensable.

However, because conventional text, speech, and image processing techniques have operated on vector data, it has been difficult to improve pattern recognition accuracy dramatically. In addition, with faster networks and the spread of high-performance cloud computing environments, the processing of various media (text, speech, images) has begun to shift from an emphasis on encoding to an emphasis on retrieval that accesses content directly.

As the focus of processing changes in this way, attention is moving from pattern recognition to pattern classification. One example of pattern classification is a system that recognizes an enormous number of language patterns: a large-scale language recognition device has been proposed whose recognition performance is improved by incorporating tensor analysis techniques into a neural network (for example, Non-Patent Document 5).

Meeting these demands requires moving from conventional processing of vectors to processing that includes the relationships between vector elements, that is, to feature extraction and class classification based on tensor analysis. The applicant presents the following publications as the publications that describe the publicly known inventions referred to above.

Non-Patent Document 1: Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, and Brian Kingsbury, "Deep Neural Networks for Acoustic Modeling in Speech Recognition", IEEE Signal Processing Magazine, pp. 82-97, November 2012.
Non-Patent Document 2: Mohammad Nurul Huda, Hiroaki Kawashima, and Tsuneo Nitta, "Distinctive Phonetic Feature (DPF) extraction based on MLPs and Inhibition/Enhancement Network", IEICE Trans. Inf. & Syst., Vol. E92-D, No. 4, pp. 671-680 (2009).
Non-Patent Document 3: Fukuda and Nitta, "Examination of Phoneme Feature Vector Orthogonalization Method for Robust Speech Recognition", Information Processing Society of Japan research report, 2003-SLP-49, 2003.
Non-Patent Document 4: Park, Mizoguchi, and Ariki, "Examination of speech feature extraction by phoneme vector using PCA", Autumn Meeting of the Acoustical Society of Japan, 1-p-26, 2007.
Non-Patent Document 5: B. Hutchinson, L. Deng, and D. Yu, "A deep architecture with bilinear modeling of hidden representations: applications to phonetic recognition", Proc. ICASSP 2012, pp. 4805-4808, March 2012.

To meet these demands, the present invention moves from processing that treats media as vectors to processing that includes the relationships between vector elements, that is, to feature extraction and class classification based on tensor analysis, and on that basis provides a classification device, a classification method, and a classification program that can identify patterns with high accuracy. More specifically, it provides a classification device that can recognize patterns with high accuracy by using a technique for generating orthogonal vector sequences that combines a dual space with a tensor space. More specifically still, it relates to a pattern classification device, a pattern classification method, and a pattern classification program that can classify patterns quickly and accurately by generating high-dimensional vector data through multilinear mapping of vector data in a tensor space.

The present invention comprises the following technical matters.
(1) A pattern classification device capable of recognizing an input pattern by extracting a pattern feature similarity vector, comprising:
a preprocessing unit that preprocesses the input pattern;
a high-dimensional vector conversion unit that converts the orthogonalized vector produced by the preprocessing into a high-dimensional vector by multilinear mapping (tensor product);
a higher-order subspace similarity calculation unit that generates and outputs a high-dimensional eigenvector set for each class from the high-dimensional vectors, and calculates, for each class, the similarity between the high-dimensional vector and the high-dimensional eigenvector set Ψ(k, m); and
a feature classification unit that classifies individual features from the per-class similarities.
(2) The pattern classification device according to (1), wherein the preprocessing unit maps the input pattern into a class-common dual space by taking the inner product of the input pattern with a feature-class-common eigenvector set generated in advance by principal component analysis of learning patterns, and reconstructs a class-common orthogonalized vector by arranging the scalar values obtained from the inner products, one per eigenvector axis.
(3) The pattern classification device according to (1), wherein the higher-order subspace similarity calculation unit has a competitive learning unit that corrects classification errors by applying competitive learning using mirror images to the higher-order subspace method, and
the class-specific eigenvectors Ψ(k, m), obtained after the competitive learning unit performs competitive learning including mirror image learning between the (auto)correlation matrices of the classes using high-dimensional feature learning data, are used as the high-dimensional eigenvector set in the higher-order subspace similarity calculation.
(4) The pattern classification device according to (1), further comprising a similarity posterior probability processing unit that takes the similarity as input and calculates the posterior probability of the class to which the input vector data belongs.

The present invention also applies to a pattern classification method and a pattern classification program characterized by the technical matters described above.

By analyzing the relationships between the elements of an input vector (its tensor structure) on the basis of multilinear mapping, the present invention can extract and classify the pattern features of input data accurately.

In addition, applying competitive learning using mirror images to the subspace method in the higher-order subspace similarity calculation unit correctly adjusts the hyperplane boundaries of the subspace method, so that input patterns are classified more accurately.

In addition, reducing the dimensionality of the vector representation through principal component analysis of the input vector data improves both the computational efficiency and the recognition performance of the classification device.

In addition, extracting the principal components of the higher-order subspace generated from the tensor product makes it possible to extract features that reflect the correlations between the vector elements effectively.

In addition, because the higher-order subspace extraction unit applies competitive learning using mirror images to the subspace method, the hyperplane boundaries of the subspace method are correctly adjusted and input patterns are classified more accurately.

In addition, when an input is given in the higher-order subspace, the class to which it belongs can be computed probabilistically. That is, the similarity posterior probability processing unit makes it possible to convert similarities into posterior probabilities.

A model diagram showing the configuration of the classification device of the present invention.
A hardware configuration diagram of the present invention.
An explanatory diagram of the processing of the multilinear mapping unit 5.
The average pattern of features obtained by concatenating three frames of the 24-dimensional filter bank coefficients of a phoneme and their log power.
An explanatory diagram of the average pattern of features obtained by the tensor product of the average pattern shown in FIG. 4.
An explanatory diagram of the average pattern of features obtained by concatenating three frames of the 24-dimensional filter bank coefficients of a phoneme and their log power.
An explanatory diagram of competitive learning using mirror images.
A diagram showing the similarity distribution P(s|p(k)) of each phoneme feature observed from the learning data, together with the distribution of the similarities of all phoneme features.
A model diagram of the prior distribution approximated from distributions that can be observed in advance.
A block diagram of feature extraction based on multilinear mapping (v_n in the figure) and the higher-order subspace method (likelihood s_k) as processed by the classification device of the present invention.
A flowchart showing the processing procedure of the present invention.
A graph showing the results of phoneme recognition experiments according to the present invention.
A conventional pattern recognition device based on DNN conversion of phoneme features.

Embodiments of the pattern classification device, pattern classification method, and pattern classification program of the present invention are described below with reference to the accompanying drawings. The drawings are used to explain the technical features of the invention; unless specifically stated otherwise, the device configurations, processing procedures, and so on shown in them are not intended to limit the invention to those alone.

FIG. 1 is a model diagram showing the configuration of the pattern (phoneme) classification device 1. The electrical configuration of the pattern classification device 1 is described with reference to FIG. 1. In this embodiment, speech is used as the example of pattern recognition, and the pattern classification device 1 is described below as a speech recognition device 1.

The speech recognition device 1 consists of: a preprocessing unit 2 that applies band-pass filters (BPF) as preprocessing to a pattern (speech) input from a speech input device such as a microphone (not shown) and generates an acoustic feature vector sequence made up of a spectrum sequence; an orthogonalized vector reconstruction unit 4 that orthogonalizes the acoustic feature vector sequence using a previously learned class-common eigenvector set 3 that is common to the classes (for example, phoneme feature classes); a multilinear mapping unit 5 that generates a high-dimensional vector sequence from the orthogonalized vector sequence by multilinear mapping based on the basis of the tensor product; a higher-order subspace similarity calculation unit 7 that calculates the similarity of the high-dimensional vector sequence in the higher-order subspace using class-specific eigenvector sets 6 obtained in advance from learning data for each phoneme feature class; a posterior probability processing unit 8 that converts the similarity sequence of each phoneme feature class into a sequence of similarities expressed as posterior probabilities (likelihoods); and an HMM phoneme classification unit 10 that recognizes phonemes using an HMM acoustic model learned from the posterior-probability similarity sequences of the phoneme features and an HMM language model (N-gram) learned in advance from language resources.

The preprocessing unit 2 extracts pattern features from the vector data input as patterns through an externally installed input unit 14, such as a corpus, and generates a pattern feature vector sequence. Pattern input here means vector data derived from various media (text, speech, images). For speech, a typical example is a time series of vectors obtained by computing a spectrum with a Fourier transform and passing it through band-pass filters (BPF). For text, a typical example is a vector of the occurrence probabilities of each word, obtained after morphological analysis produces a word sequence. For images, examples include a vector in which the data obtained by applying image feature extraction to the pixels of the various objects in the image are arranged in space and time, and a vector of the occurrence probabilities of each extracted object.

The preprocessing unit 2 has, for example, a bank of about 24 band-pass filter (BPF) channels whose center frequencies are spaced at equal intervals on the mel scale. The speech signal is input to this filter bank, and a spectrum pattern is output as the acoustic feature vector sequence.
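As a rough illustration of mel-spaced center frequencies, the sketch below places 24 centers at equal mel intervals. The text gives neither the mel formula nor the band edges, so the conversion `2595 * log10(1 + f / 700)` (one common convention) and the 0 to 8000 Hz range are assumptions, as are the function names.

```python
import math

def hz_to_mel(f_hz):
    """Hz to mel, using one common convention (assumed, not from the patent)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_channels=24, f_low=0.0, f_high=8000.0):
    """Center frequencies (Hz) spaced at equal intervals on the mel scale.

    f_low and f_high are hypothetical band edges chosen for illustration.
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_channels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_channels)]

centers = mel_center_frequencies()
```

The centers are evenly spaced in mel, so in Hz they are packed closely at low frequencies and spread out at high frequencies.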

For speech, this acoustic feature vector sequence is mainly a 25-channel log-spectrum output, consisting of the log outputs of the 24 BPF channels, whose center frequencies are set on the mel scale of auditory perception, plus the log power. Alternatively, mel-frequency cepstrum coefficients (MFCC), obtained by applying a discrete cosine transform (DCT) to the log spectrum, are used. Besides extracting MFCCs from the spectrum sequence by DCT, a convolution operation or local feature extraction may be applied to the spectrum sequence, that is, to the two-dimensional pattern of frequency and time.

For these feature vectors, the preprocessing unit uses principal component analysis (PCA). It first obtains the autocorrelation matrices R(j), j = 1, 2, …, J (= I × I), from the acoustic feature vectors of all phoneme feature classes and solves the eigenvalue problem of each correlation matrix to obtain the class-specific eigenvector sets 6, which are the principal components. It then creates a class-common correlation matrix from the class-specific eigenvector sets 6 using the following formula, and generates the class-common eigenvector set 3 by solving the eigenvalue problem of that class-common correlation matrix. Vector sequence data such as these acoustic feature vector sequences are generally not mutually orthogonal. The orthogonalized vector reconstruction unit 4 described next therefore has to orthogonalize the vector sequence data and reconstruct the vectors.

The orthogonalized vector reconstruction unit 4 reconstructs the orthogonalized vector y(m) by arranging M scalar values c(m), for an arbitrary M, each obtained as the inner product of the feature vector x(i) with an eigenvector of the class-common eigenvector set 3 generated in the preprocessing.

This process is known as the KL transform; in linear algebra it is called a dual transformation, in which the function f(φ(m)) converts the feature vector x(i) into an equivalent vector made up of the scalar values c(m). The function f(φ(m)) can be regarded as a projection function defined by eigenfunctions and, being a function of a function, is called a functional.

Through the above processing, the feature vector sequence is converted into a vector sequence whose components are orthogonal. The original feature vector sequence has about 25 dimensions per frame for a pattern such as one speech sample. For orthogonalization, about 11 frames are grouped into a feature vector x(i), i = 1, …, I (= 25 × 11); c(m) is obtained by the dual transformation from the eigenvector set φ(m, i) that results from principal component analysis of these vectors, and the scalar values are arranged and used as the orthogonalized vector y(m).
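The grouping of 11 frames of 25 values into one 275-dimensional vector can be sketched as follows. The text does not say how the window advances or how the edges of an utterance are handled, so the one-frame step and the absence of padding are assumptions; `stack_frames` is a hypothetical name.

```python
def stack_frames(frames, context=11):
    """Concatenate each run of `context` consecutive frames into one vector.

    Assumes a sliding window that advances one frame at a time and
    produces no output where a full window is unavailable (no padding).
    """
    stacked = []
    for start in range(len(frames) - context + 1):
        window = frames[start:start + context]
        stacked.append([value for frame in window for value in frame])
    return stacked

# Illustrative input: 15 frames of 25 values each (frame t filled with t).
frames = [[float(t)] * 25 for t in range(15)]
vectors = stack_frames(frames)
```

With 15 input frames and a context of 11, the sketch yields 5 vectors of I = 25 × 11 = 275 elements each.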

At this stage, inner products are taken with the eigenvectors that correspond to large eigenvalues, while those with eigenvectors that correspond to small eigenvalues are ignored, which removes redundant, weakly correlated components in advance. This reduces the amount of subsequent computation while retaining the discriminative information.
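The projection described above can be sketched as follows, assuming the eigenvectors are already orthonormal and are supplied together with their eigenvalues. The function name `kl_transform` and the toy eigenpairs are illustrative assumptions, not part of the specification.

```python
def kl_transform(x, eigenpairs, m):
    """Project x onto the m eigenvectors with the largest eigenvalues.

    eigenpairs: list of (eigenvalue, eigenvector) with orthonormal eigenvectors.
    Returns y = [c(1), ..., c(m)], where each c is the inner product <phi, x>.
    """
    ranked = sorted(eigenpairs, key=lambda pair: pair[0], reverse=True)
    return [sum(p * xi for p, xi in zip(phi, x)) for _, phi in ranked[:m]]

# Toy eigenpairs (assumed): the axis with eigenvalue 0.1 is dropped when m = 2.
eigenpairs = [
    (0.1, [0.0, 0.0, 1.0]),
    (5.0, [1.0, 0.0, 0.0]),
    (2.0, [0.0, 1.0, 0.0]),
]
y = kl_transform([3.0, -1.0, 0.5], eigenpairs, m=2)
```

Choosing `m` smaller than the input dimension is what discards the components tied to small eigenvalues.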

The multiple linear mapping (Multiple-Linear Map) unit 5 obtains a high-dimensional vector by expressing the internal structure of the orthogonalized vector y(m) as a tensor. For simplicity, the bilinear map (Bi-linear Map) case is described below. A multiple linear map is defined by the multilinear function given by the following equation on the dual space V^* of the vector space V, together with the n-th order tensor product.

For a bilinear map, it is expressed by the following equation and by the vector formed by the second-order tensor product.

As vector components, with φ(i)φ(j) as a new basis, the relationships between the elements output by the dual space are expressed as z_n = {y_1y_1, y_1y_2, …, y_1y_M, y_2y_1, y_2y_2, …, y_2y_M, …, y_My_1, y_My_2, …, y_My_M}, that is, as M × (M + 1) / 2 second-order terms.

FIG. 3 illustrates the processing of the multiple linear mapping unit 5. From the four-dimensional orthonormal vector sequence data output by the orthogonalized vector reconstruction unit 4, that is, an orthonormalized vector x with four elements (x_1, x_2, x_3, x_4), the unit forms the second-order tensor product of x with itself (a 4 × 4 table). The tensor product is obtained as the matrix product of the column vector x and its transpose, the row vector x^t. Looking at the components, pairs such as x_2x_1 and x_1x_2, or x_3x_2 and x_2x_3, take the same value on either side of the diagonal of the table (the table is symmetric); these duplicates are shown in white in the figure. The number of independent components (the dark cells) that remain after omitting the duplicates is M(M + 1)/2 = 10 for M = 4. Arranging these 10 independent components in a single column, starting from the left column of the tensor product, produces the high-dimensional vector g(x) derived from x.
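A minimal sketch of this step for a single vector is given below. The function name `vec_tri` follows the notation used later in the description; the column-by-column ordering of the lower triangle is an assumption about how the components are arranged.

```python
def vec_tri(x):
    """Stack the lower-triangular part of the outer product x x^T, column by column.

    Column j contributes the entries x[i] * x[j] for rows i >= j, so a vector
    of length M yields M * (M + 1) / 2 independent components.
    """
    m = len(x)
    return [x[i] * x[j] for j in range(m) for i in range(j, m)]

# Four-element example: 4 * 5 / 2 = 10 independent second-order components.
g = vec_tri([1.0, 2.0, 3.0, 4.0])
```

For the input above the first column contributes x_1x_1 through x_4x_1, the second x_2x_2 through x_4x_2, and so on.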

FIG. 4 shows the average pattern of a feature formed by concatenating three frames of the 24-dimensional filter bank coefficients of a phoneme and their log power, and FIG. 5 shows the average pattern of the feature obtained from their tensor product. In FIG. 5, the 3 × 3 grid of blocks corresponds to pairings of the three consecutive time frames, and the content of each block expresses the correlation between spectra across time and frequency. The grid is symmetric about its diagonal at the block level, but the off-diagonal blocks are not symmetric internally, because they express correlations between different frames. The diagonal blocks are symmetric internally, because they express correlations within the same frame. A tensor feature computed from a vector that concatenates frequency-spectrum features over several frames thus expresses the correlation between the components at each time and frequency explicitly, so a more precise spectral structure in time and frequency can be extracted.
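The block symmetry described for FIG. 5 can be checked on a toy example. The two short "frames" below are made-up values, and the helper names are hypothetical; the point is only that a same-frame block is symmetric while a cross-frame block generally is not.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]

def is_symmetric(m):
    """True when a square matrix equals its transpose."""
    size = len(m)
    return all(m[i][j] == m[j][i] for i in range(size) for j in range(size))

def transpose(m):
    return [list(col) for col in zip(*m)]

frame1, frame2 = [1.0, 2.0], [3.0, 5.0]   # two toy frames with two channels each

same = outer(frame1, frame1)      # diagonal block: correlation within one frame
cross12 = outer(frame1, frame2)   # off-diagonal block: frame 1 against frame 2
cross21 = outer(frame2, frame1)   # mirrored block: frame 2 against frame 1
```

The cross-frame blocks are transposes of each other, which is why the grid as a whole is symmetric at the block level even though each off-diagonal block is not.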

FIG. 6 shows the phonemes /b/, /d/, and /g/ as spectrum sequences of three frames (left side of the figure). Each frame is displayed with 24 BPF channels. The right side of the figure shows the tensor sequences for /b/, /d/, and /g/ obtained by expressing the spectrum sequences as tensors. The triangular regions in a tensor sequence express the relationships between channels within a frame (the tensor representation of a single frame is symmetric, so only a triangular region is shown). The rectangular regions express the relationships between different frames (1-2, 1-3, 2-3); these data are not symmetric, so full rectangles are shown. For ease of understanding, this example displays the second-order relationships computed directly from the spectrum. In the present embodiment, the acoustic feature vector sequence, which is the spectrum data shown here, is orthogonalized by the orthogonalized vector reconstruction unit 4 using the class-common eigenvector set 3, and the bilinear map is applied to this orthogonalized vector sequence. Because the multiple linear map expresses the internal structure of the acoustic feature vector explicitly, the dimensionality is greatly expanded and the differences between phoneme feature classes are enlarged as well; the figure shows the orthogonalized vector sequence being converted into a high-dimensional vector sequence.

Returning to FIG. 1, the higher-order subspace similarity calculation unit 7 computes, for the high-dimensional vector sequence, a similarity for each phoneme feature by the (higher-order) subspace method. Specifically, an eigenvector set for each class is extracted in advance by principal component analysis (PCA) from the high-dimensional vector training data of each phoneme feature class. The inner products between an input pattern of unknown class (a high-dimensional vector x) and the M eigenvectors Ψ(k, m), m = 1, 2, …, M, of class k, k = 1, 2, …, K, are then computed, and the similarity s(k) of class k is calculated from the following equation.

That is, for an input of unknown class, the inner products with the M eigenvectors are accumulated to obtain a time series of K similarities (likelihoods). Unlike the ordinary subspace method, the higher-order subspace method used in the present invention is characterized by its use of the relationships between the elements that make up the original vector sequence.
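One common form of subspace similarity is sketched below: the squared inner products with an orthonormal basis are summed and normalized by the squared length of the input. This form is an assumption chosen for illustration; the exact equation of the invention, including any weighting, may differ. The class labels and basis vectors are made-up toy values.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def subspace_similarity(x, basis):
    """Normalized squared length of the projection of x onto span(basis).

    basis: orthonormal eigenvectors of one class (assumed). The result lies
    in [0, 1] and equals 1 when x lies entirely inside the class subspace.
    """
    return sum(dot(x, psi) ** 2 for psi in basis) / dot(x, x)

def classify(x, class_bases):
    """Return the best-scoring class label and all per-class similarities."""
    scores = {k: subspace_similarity(x, basis) for k, basis in class_bases.items()}
    return max(scores, key=scores.get), scores

# Toy subspaces (assumed): class "b" spans one axis, class "d" the other two.
class_bases = {
    "b": [[1.0, 0.0, 0.0]],
    "d": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
}
label, scores = classify([0.1, 0.9, 0.4], class_bases)
```

In the device described here the input to such a function would be the high-dimensional vector produced by the multiple linear map, not the raw feature vector.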

The higher-order subspace method (HSM) used in the present invention retains the advantage of the subspace method that convex optimization is guaranteed, and at the same time removes the drawback of the conventional subspace method, namely that it does not use the relationships between the elements that make up the vector sequence.

The higher-order subspace similarity calculation unit 7 includes a competitive learning unit (not shown). Mirror image learning by the competitive learning unit (see, for example, Meng Shi, Tetsushi Wakabayashi, Wataru Ohyama, and Fumitaka Kimura, Faculty of Engineering, Mie University, 1515 Kamihama, Tsu, 514-8507, Japan; in P. Perner (Ed.): MLDM 2001, LNAI 2123, pp. 239-248, 2001, Springer-Verlag) corrects the hyperplane boundaries of the subspace method, so that input patterns can be classified more accurately.

FIG. 7 shows the operation of mirror image learning. When a pattern x belonging to class k' is misrecognized as class k, the mirror image of x with respect to the representative pattern y of class k is y − (x − y) = 2y − x. Using this y, the autocorrelation matrix is corrected as R(k) + β y y^T → R(k), which moves the subspace of class k away from class k' (β is a small constant). At the same time, the autocorrelation matrix of class k' is corrected as R(k') + α x x^T → R(k'), in the direction that takes in the training pattern x. After mirror image learning, the class-specific eigenvector sets are recomputed from the competitively trained R(k) and used in the higher-order subspace method. The representative pattern y of class k, which serves as the pivot when the mirror image is formed, is obtained either by projecting x with the autocorrelation matrix, R(k) x → y (this corresponds to association), or by synthesis from the eigenvectors Ψ(k, m) of class k using the following equation.
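The update can be sketched as follows. One point is an assumption: the text writes the correction for class k with y, and this sketch takes the vector that enters that correction to be the mirror image 2y − x, which is how mirror image learning is usually described. The function names, the step sizes, and the toy matrices are all hypothetical.

```python
def outer(u, v):
    return [[a * b for b in v] for a in u]

def add_scaled(r, m, scale):
    """Return r + scale * m for two matrices of equal shape."""
    return [[r_ij + scale * m_ij for r_ij, m_ij in zip(r_row, m_row)]
            for r_row, m_row in zip(r, m)]

def mirror_image(x, y):
    """Reflect x through the pivot y: y - (x - y) = 2y - x."""
    return [2.0 * yi - xi for xi, yi in zip(x, y)]

def mirror_update(r_k, r_true, x, y, alpha=0.1, beta=0.1):
    """One correction after x (true class k') was misrecognized as class k.

    r_k:    autocorrelation matrix of the wrongly chosen class k
    r_true: autocorrelation matrix of the true class k'
    Assumption: class k is updated with the mirror image of x about y.
    """
    x_mirror = mirror_image(x, y)
    r_k_new = add_scaled(r_k, outer(x_mirror, x_mirror), beta)
    r_true_new = add_scaled(r_true, outer(x, x), alpha)
    return r_k_new, r_true_new

identity = [[1.0, 0.0], [0.0, 1.0]]
x, y = [1.0, 0.0], [2.0, 1.0]          # toy pattern and representative pattern
r_k_new, r_true_new = mirror_update(identity, identity, x, y)
```

After such updates the eigenvectors of each corrected matrix would be recomputed, as the text describes.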

The higher-order subspace similarity calculation unit 7 extracts, by principal component analysis, the features contained in the high-dimensional vector data output from the multiple linear mapping unit 5. That is, it performs principal component analysis on the high-dimensional vector data and generates the principal components in the form of eigenvectors. It also generates the likelihood s_k from these eigenvectors using the subspace method. The unit has a competitive learning unit that corrects misclassifications by competitive learning with mirror images, which improves the performance of the classification device of the present invention. In other words, applying mirror-image competitive learning to the subspace method allows the unit to further improve the recognition rate for speech, characters, images, and the like.

The classification device of the present invention can further include a similarity posterior probability processing unit 8. This unit converts the likelihoods s_k generated by the higher-order subspace similarity calculation unit 7, that is, the similarity sequence for each phoneme feature output by that unit, into posterior probabilities. Conversion to posterior probabilities is usually performed by softmax processing, which applies the normalized exponential function

exp(s_k) / Σ_j exp(s_j).

Here, a more practical expression is derived from the similarity prior distributions shown in FIG. 8A according to Bayes' formula. Bayes' formula,

P(p(k)|s) = P(s|p(k)) P(p(k)) / P(s),

shows that, given a similarity value s, the probability P(p(k)|s) that the phoneme feature is p(k) can be calculated from the likelihood P(s|p(k)) of the phoneme feature, which is the prior distribution, and the occurrence probability P(p(k)) of the phoneme feature. Because P(p(k)) is a language model probability, it is evaluated by the HMM phoneme classification unit in the subsequent stage. FIG. 8A shows the similarity distribution P(s|p(k)) of each phoneme feature observed from the training data, together with the distribution of the similarities of all phoneme features. FIG. 8B shows a model of the prior distribution approximated from these distributions, which can be observed in advance. From this model, the prior distribution can be approximated and calculated as follows.

The state posterior probability P(k|x) is approximated by the function y(s_k) = exp(a s_k + b) − 1, where exp denotes the exponential function with base e, s_k is the likelihood, and a and b are coefficients determined from the conditions given below. Writing the maximum similarity as s_max and the minimum as s_min, the requirements 0 = y(s_k = s_min) and 1 = y(s_k = s_max) give the following conditions for determining a and b.

Substituting the a and b obtained from these conditions back into y(s_k) = exp(a s_k + b) − 1 yields the state posterior probability P(k|x) as follows, where s_max = 1 is assumed in the final step of the derivation.
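The two boundary conditions stated in the text are enough to work the coefficients out by hand. The derivation below is a reconstruction from those conditions, so its final form may be written differently from the patent's own expression.

```latex
\begin{aligned}
y(s_{\min}) = 0 &\;\Rightarrow\; a\,s_{\min} + b = 0,\\
y(s_{\max}) = 1 &\;\Rightarrow\; a\,s_{\max} + b = \ln 2,\\
a &= \frac{\ln 2}{s_{\max} - s_{\min}}, \qquad
b = -\frac{s_{\min}\,\ln 2}{s_{\max} - s_{\min}},\\
y(s_k) &= \exp\!\left(\frac{(s_k - s_{\min})\,\ln 2}{s_{\max} - s_{\min}}\right) - 1
        = 2^{(s_k - s_{\min})/(s_{\max} - s_{\min})} - 1,\\
P(k \mid x) &\approx 2^{(s_k - s_{\min})/(1 - s_{\min})} - 1
        \qquad (s_{\max} = 1).
\end{aligned}
```

Subtracting the first condition from the second gives a directly, and the first condition then fixes b.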

In this way, given an input x, the class k to which x belongs can be calculated probabilistically.
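The approximation y(s_k) = exp(a s_k + b) − 1 with the conditions y(s_min) = 0 and y(s_max) = 1 can be sketched as a small function. The function name and the example value of s_min are assumptions; the coefficients follow from solving the two conditions for a and b.

```python
import math

def similarity_to_posterior(s_k, s_min, s_max=1.0):
    """Map a similarity in [s_min, s_max] to a value in [0, 1].

    Solving exp(a*s_min + b) - 1 = 0 and exp(a*s_max + b) - 1 = 1 gives
    a = ln 2 / (s_max - s_min) and b = -a * s_min.
    """
    a = math.log(2.0) / (s_max - s_min)
    b = -a * s_min
    return math.exp(a * s_k + b) - 1.0

s_min = 0.3                       # assumed minimum similarity over all classes
low = similarity_to_posterior(0.3, s_min)
mid = similarity_to_posterior(0.65, s_min)
high = similarity_to_posterior(1.0, s_min)
```

The mapping is monotonic in s_k, so it preserves the ranking of the classes; the values for different classes are not rescaled to sum to one.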

A preliminary comparison of the softmax described above, the approximation using s_max, and the approximation with s_max = 1.0 showed slightly better results for s_max = 1.0, so the phoneme recognition experiments described later use the posterior probability processing with s_max = 1.0.
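For reference, the softmax baseline in this comparison is the standard normalized exponential function, sketched below. The function name and the sample similarities are illustrative; subtracting the maximum before exponentiating is a common numerical safeguard and does not change the result.

```python
import math

def softmax(similarities):
    """Normalized exponential: exp(s_k) / sum_j exp(s_j)."""
    peak = max(similarities)                      # shift for numerical stability
    exps = [math.exp(s - peak) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.2, 0.5, 0.9])                  # made-up per-class similarities
```

Unlike the approximation based on s_min and s_max, softmax always produces values that sum to one across the classes.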

The feature classification unit 10 compares the likelihood vector S(k) generated by the higher-order subspace similarity calculation unit 7 with pattern models, such as the acoustic model and the language model, stored in the feature classification unit 10. It sends the comparison result for the likelihood vector sequence to the output device as the recognition result. The output is shown on a display or the like (not shown) so that the user of the classification device can see the classification result. Various speech recognition software available over a communication network may be used as the feature classification unit 10.

The HMM phoneme classification unit 10, like conventional HMM phoneme classification units, uses a probabilistic transition network of 3 to 5 states prepared for each phoneme from the acoustic model and the language model. Of the output probabilities and transition probabilities held by the HMM acoustic model, however, the output probabilities are replaced by the phoneme likelihood vectors after posterior probability processing. The HMM classification unit applies phoneme models to the phoneme likelihood vectors of the unknown input (either a monophone model per phoneme, or a triphone model that takes differences in phoneme context into account). Within each HMM phoneme model, probabilities are computed along the state transitions using the output model and the transition model, and paths are selected according to the value accumulated from the start of the utterance. At the end of a phoneme model, the phoneme model to be connected next is selected from the probabilities of the N-gram language model (2-gram, 3-gram, …) and the search continues; the phoneme sequence with the largest likelihood (log likelihood) at the end of the utterance is taken as the result.
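The path selection by accumulated score can be illustrated with a small Viterbi search in the log domain, where the per-frame scores play the role of the output probabilities. This is a toy sketch under stated assumptions: a single two-state left-to-right model, made-up scores and transition probabilities, and no language model.

```python
import math

def safe_log(p):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(frame_scores, transitions, initial):
    """Best state path given per-frame state scores.

    frame_scores[t][s]: score of state s at frame t (used as output probability)
    transitions[p][s]:  probability of moving from state p to state s
    initial[s]:         probability of starting in state s
    """
    n_states = len(initial)
    score = [safe_log(initial[s]) + safe_log(frame_scores[0][s])
             for s in range(n_states)]
    back = []
    for frame in frame_scores[1:]:
        step, new_score = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + safe_log(transitions[p][s]))
            step.append(best_prev)
            new_score.append(score[best_prev] + safe_log(transitions[best_prev][s])
                             + safe_log(frame[s]))
        back.append(step)
        score = new_score
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for step in reversed(back):       # follow the back-pointers to the start
        state = step[state]
        path.append(state)
    return path[::-1]

frame_scores = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
transitions = [[0.7, 0.3], [0.0, 1.0]]            # left-to-right: no return to state 0
path = viterbi(frame_scores, transitions, initial=[1.0, 0.0])
```

A full recognizer would chain many such phoneme models and weight the transitions between them with N-gram probabilities.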

FIG. 2 shows the hardware configuration of the device. The central processing unit 11 is provided with an input unit 14 and an input unit 15. In the pattern classification device 1, the feature extraction unit 2, the orthogonalized vector reconstruction unit 4, the multiple linear mapping unit 5, the higher-order subspace similarity calculation unit 7, the similarity posterior probability processing unit 8, and the feature classification unit 10 are implemented by the central processing unit 11 of a computer executing numerical operations, control, and other processing according to the procedure described below.

In the pattern classification device, the preprocessing unit 2, which includes BPF (band-pass filter) analysis, the orthogonalized vector reconstruction unit 4, the multiple linear mapping unit 5, the higher-order subspace similarity calculation unit 7, the similarity posterior probability processing unit 8, and the HMM phoneme classification unit 10 are implemented by the central processing unit 11 of a computer executing numerical operations, control, and other processing according to the procedure described below. The (class-common) eigenvector set 3, the class-specific eigenvector sets 6 (for the high-dimensional vector sequence), and the HMM acoustic model and language model 9 are all held in a storage device 12 of the computer that is electrically connected to the central processing unit 11. The storage device 12 also stores a phoneme classification program corresponding to the procedure executed by the central processing unit 11.

The central processing unit 11 may be, for example, a CPU with an input/output interface. The storage device 12 may be, for example, a ROM (read-only memory), a RAM (random access memory 16), or an HDD (hard disk drive). Although not shown, an input device such as a microphone for capturing a speaker's voice, and an output device such as a display or a speaker for presenting, for example, the recognition result obtained by the HMM phoneme classification unit 10, may be electrically connected to the input/output interface of the central processing unit 11.

The hardware configuration of the classification device 1 of the present invention is not limited to that shown in FIG. 1. Part of the classification device 1 may therefore be connected electrically through a communication network such as the Internet.

Although the classification device 1 of the present embodiment is provided independently of other systems, the present invention is not limited to this configuration. The device may instead be incorporated as part of another device, in which case input and output take place indirectly through that device.

When an input device is used, each time one of many patterns (speech, characters, images, video signals, faces, etc.) is input, the input device converts the pattern into an analog electrical signal and outputs it to an A/D converter (not shown) provided in the computer. The digital electrical signal produced by the A/D converter is then taken into the central processing unit 11, which can process the captured pattern.

FIG. 9 is a block diagram of feature extraction based on the multiple linear map (v_n in the figure) and the higher-order subspace method (likelihood s_k) as processed by the classification device of the present invention. FIG. 10 is a flowchart of the processing procedure using the classification device. The algorithm is described below with reference to FIGS. 9 and 10. It broadly follows this flow:

Steps S1 to S3: Principal component analysis is performed on the input data (vectors) to obtain eigenvectors.
Step S4: The inner products of the eigenvectors and the input vector are taken to obtain a set of scalar values (the reconstructed vector).
Step S5: The resulting vector is mapped into tensor space and raised to higher order there, giving a high-dimensional vector that expresses the relationships between the original vector components explicitly.
Steps S6 to S7: Principal component analysis is performed on the high-dimensional vectors to obtain eigenvectors. The differences between the subspaces (spanned by the eigenvectors) in which the data of each class (for example, each consonant) are distributed are learned and used for classification.
Step S8: Post-processing by conversion to posterior probabilities.
Step S9: Classification of the pattern data by the feature classification unit.

The algorithm is described step by step below.
Step S1 (generation of the pattern feature vector sequence)
Pattern features are extracted from vector data input through the input unit 14 from an external source such as a corpus, and a pattern feature vector sequence is generated. When the present invention is used for speech recognition, for example, the feature extraction unit 2 extracts the filter bank outputs x_1, x_2, …, x_T (x_t for frame t), and the filter bank outputs of, for example, 11 consecutive frames are then concatenated to obtain the following acoustic feature vector of the phoneme feature class.

Step S2 (principal component analysis of the feature vectors)
The preprocessing unit 2 applies principal component analysis (PCA) to the feature vectors and calculates the eigenvectors Φ(k, m) for each pattern feature class.
Step S3 (generation of the pattern-common eigenvectors Φ(m))
An autocorrelation matrix R(j), j = 1, 2, …, J (= I × I), common to the pattern feature classes is obtained from the class-specific eigenvectors Φ(k, m), and the pattern-common eigenvectors Φ(m) are calculated by solving the eigenvalue problem of this matrix.
Step S4 (reconstruction of the orthogonalized vector y(m))
The scalar values c(m) obtained as inner products of the pattern-common eigenvectors Φ(m) generated in step S3 and the input feature vector x(i) are arranged to reconstruct the orthogonalized vector y(m). This places all classes in a common (ortho)normal space without redundancy.
Step S5 (raising the dimension of the feature vector by the multiple linear map)
The reconstructed orthogonalized vector is mapped by the multiple linear map according to the following formula.

g(x) = vec_tri(x ⊗ x)

Here x is an arbitrary d-dimensional, real-valued input vector. The symbol between x and x, a cross inside a circle (⊗), denotes the direct (tensor) product. The function defined by the formula above therefore takes as its argument the tensor product of the input vector x with itself, which in this case is a symmetric matrix. Finally, vec_tri(·) is a function that stacks the columns of the lower triangular part of a symmetric matrix into a vector. When the matrix is not symmetric, for example when the tensor product of two different input vectors is used to raise the dimension, vec_tri(·) arranges all elements of the tensor product. In other words, vec_tri(·) arranges the independent elements of the given tensor product.
Step S6 (extraction of the high-dimensional eigenvector sets)
The eigenvector set Ψ(k, m) of each class is extracted by principal component analysis (PCA) from the high-dimensional vector training data of each phoneme feature class.
Step S7 (similarity calculation)
The similarity s(k) is computed from the inner products of the high-dimensional vector obtained in step S5 and the class-specific eigenvector sets Ψ(k, m) obtained in step S6. The competitive learning unit also recomputes the eigenvector set of each class at this point.
Step S8 (posterior probability processing)
The similarity s(k) obtained in step S7 is applied to the following formula to calculate the posterior probability, where s_min is the minimum similarity over all classes obtained by the subspace method.

That is, the similarity posterior probability processing unit calculates the probability that the input x belongs to class k.
Step S9 (classification of the pattern data by the feature classification unit)
Finally, the feature classification unit 10 classifies the pattern data that has undergone posterior probability processing. The feature classification unit 10 has, in a storage device (not shown), programs for running an acoustic model based on hidden Markov models, a language model, and the like.

Through steps S1 to S9 above, the present invention recognizes and classifies input patterns. When the pattern (vector data such as speech, characters, or faces) is speech, for example, the classification device of the present invention raises the accuracy of extracting features such as phonemes and articulatory features while reducing the amount of computation, and achieves a high speech recognition rate. The device is not limited to speech; it is also effective for recognition in which vector data such as characters, images, and faces form the patterns. The classification device of the present invention can therefore perform recognition, classification, and discrimination with high accuracy on the features of patterns contained in big data made up of large volumes of speech data, character data, image data, and the like.

FIG. 11 shows the results of phoneme recognition experiments with the present invention. DNN/HMM in FIG. 11 identifies phonemes (triphones) with a DNN and inputs the resulting phoneme vector sequence (phoneme likelihoods) to an HMM; results are shown for a bi-gram and for a tri-gram language model. SM/HMM combines the conventional subspace method (SM) with an HMM, and HSM/HMM combines the higher-order subspace method (HSM) of the present invention with an HMM. The experiments used TIMIT, an English speech corpus widely used in speech recognition, and the figure reports results on the TIMIT core test, which many research institutions use for evaluation. FIG. 11 shows that replacing SM with HSM improves performance substantially. With competitive learning between classes by mirror image learning, performance exceeds that of the conventional DNN/HMM with a bi-gram even without a language model. Adding a language model improves performance further, first with a bi-gram and then with a tri-gram.

Embodiments of the present invention have been described above, but they are presented only as examples and are not intended to limit the scope of the invention. The embodiments presented here can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention.

The pattern classification device, pattern classification method, and pattern classification program of the present invention can accurately extract, for example, phonemes and articulatory features from a large number of patterns (vector data such as speech, characters, and faces), and can also markedly reduce the amount of computation needed to process the data constituting the patterns. They can therefore contribute to the development of the information processing industry, in which processing large numbers of such patterns is essential. Furthermore, because they can recognize, classify, and discriminate many patterns with high accuracy, their application is expected to contribute greatly to the development of industries related to robotics.

DESCRIPTION OF SYMBOLS
1 Classification device
2 Preprocessing unit
3 Eigenvector set
4 Orthogonalized vector reconstruction unit
5 Multiple linear mapping unit
6 Class-specific eigenvector set
7 Higher-order subspace similarity calculation unit
8 Similarity posterior probability processing unit
10 Feature classification unit (HMM phoneme classification unit)

Claims

A pattern classification device capable of recognizing an input pattern by extracting a similarity vector of pattern features, the device comprising:
a preprocessing unit that performs preprocessing on the input pattern;
a high-dimensional vector conversion unit that converts the preprocessed vector into a high-dimensional vector by multiple linear mapping (tensor product);
a higher-order subspace similarity calculation unit that generates and outputs a high-dimensional eigenvector set from the high-dimensional vector, and calculates a class-wise similarity between the high-dimensional vector and the high-dimensional eigenvector set Ψ; and
a feature classification unit that classifies individual features based on the class-wise similarity.
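
As an illustration only, the chain of units recited in the claim above can be sketched in Python with NumPy. The sketch rests on several assumptions that are not stated in the claim: a second-order tensor product (the outer product of the preprocessed vector with itself), the standard subspace-method similarity (the sum of squared projections of the normalized vector onto the class eigenvectors), and a plain argmax decision standing in for the feature classification unit. The function names `tensor_lift`, `class_eigenvectors`, `subspace_similarity`, and `classify` are hypothetical.

```python
import numpy as np

def tensor_lift(y):
    """Lift y to a higher dimension via the outer product y (x) y, flattened.
    Assumption: second order only; the specification may use other orders."""
    return np.outer(y, y).ravel()

def class_eigenvectors(Z, m):
    """Top-m eigenvectors of the autocorrelation matrix of the rows of Z."""
    R = Z.T @ Z / len(Z)
    _, V = np.linalg.eigh(R)          # eigenvalues come back in ascending order
    return V[:, ::-1][:, :m]          # columns are the m leading eigenvectors

def subspace_similarity(z, Psi):
    """Squared length of the projection of the unit-normalized z onto span(Psi)."""
    z = z / np.linalg.norm(z)
    return float(np.sum((Psi.T @ z) ** 2))

def classify(y, Psi_by_class):
    """Return the class with the largest similarity, plus all similarities."""
    z = tensor_lift(y)
    sims = {k: subspace_similarity(z, Psi) for k, Psi in Psi_by_class.items()}
    return max(sims, key=sims.get), sims

# Toy learning data: class 0 clusters near the first axis, class 1 near the second.
train = {
    0: np.array([[1.0, 0.1, 0.0], [1.0, -0.1, 0.0], [0.9, 0.0, 0.1]]),
    1: np.array([[0.1, 1.0, 0.0], [-0.1, 1.0, 0.0], [0.0, 0.9, 0.1]]),
}
Psi_by_class = {
    k: class_eigenvectors(np.array([tensor_lift(y) for y in Y]), m=1)
    for k, Y in train.items()
}
label, sims = classify(np.array([1.0, 0.0, 0.0]), Psi_by_class)
print(label, sims)
```

Because the columns of each `Psi` are orthonormal and the lifted vector is normalized, every similarity falls between 0 and 1, which makes the class-wise values directly comparable.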

The pattern classification device according to claim 1, wherein the preprocessing unit maps the input pattern onto a class-common dual space by taking the inner product of the input pattern with an eigenvector set common to the feature classes, generated in advance by principal component analysis of learning patterns, and reconstructs an orthogonalized vector common to the classes by arranging the scalar values obtained from the inner product operations, one for each eigenvector axis.
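
The preprocessing recited in the claim above might look roughly like the following NumPy sketch. It assumes that the principal component analysis is run on the mean-centered covariance of all learning patterns pooled across classes, and that the same mean is subtracted from the input before the inner products are taken; neither detail is stated in the claim. The function names `fit_common_eigenvectors` and `reconstruct_orthogonal_vector` are hypothetical.

```python
import numpy as np

def fit_common_eigenvectors(X, n_axes):
    """PCA over all learning patterns (rows of X), ignoring class labels.
    Returns the pattern mean and the n_axes leading eigenvectors as columns."""
    mean = X.mean(axis=0)
    Xc = X - mean
    C = Xc.T @ Xc / len(X)            # covariance matrix (assumed; see lead-in)
    _, V = np.linalg.eigh(C)          # eigenvalues come back in ascending order
    return mean, V[:, ::-1][:, :n_axes]

def reconstruct_orthogonal_vector(x, mean, Phi):
    """Inner product of the input pattern with each common eigenvector,
    with the resulting scalars arranged in eigenvector-axis order."""
    return Phi.T @ (x - mean)

# Toy learning patterns with most variance on the first axis, least on the third.
X = np.array([[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0], [0.0, -1.0, 0.0],
              [0.0, 0.0, 0.1], [0.0, 0.0, -0.1]])
mean, Phi = fit_common_eigenvectors(X, n_axes=2)
y = reconstruct_orthogonal_vector(np.array([3.0, 4.0, 5.0]), mean, Phi)
print(y.shape)
```

The components of the resulting vector lie along mutually orthogonal axes shared by every class, so the later tensor-product step operates on decorrelated coordinates.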

The pattern classification device according to claim 1, wherein the higher-order subspace similarity calculation unit has a competitive learning unit that corrects classification errors by applying competitive learning using mirror images to the subspace method, and
wherein class-specific eigenvectors Ψ(k, m), obtained in the competitive learning unit after competitive learning including mirror image learning between the class-wise (auto)correlation matrices using high-dimensional feature learning data, are used as the eigenvector set for the higher-order subspace similarity calculation.

The pattern classification device according to claim 1, further comprising a similarity posterior probability processing unit, wherein the similarity posterior probability processing unit receives the similarity as input and calculates a prior probability of the class to which the input vector data belongs.

A pattern classification method capable of recognizing an input pattern by extracting a similarity vector of pattern features, the method comprising:
a preprocessing step of performing preprocessing on the input pattern;
a high-dimensional vector conversion step of converting the preprocessed vector into a high-dimensional vector by multiple linear mapping (tensor product);
a higher-order subspace similarity calculation step of generating and outputting a high-dimensional eigenvector set from the high-dimensional vector, and calculating a class-wise similarity between the high-dimensional vector and the high-dimensional eigenvector set Ψ; and
a feature classification step of classifying individual features based on the class-wise similarity.

The pattern classification method according to claim 5, wherein the preprocessing step maps the input pattern onto a class-common dual space by taking the inner product of the input pattern with an eigenvector set common to the feature classes, generated in advance by principal component analysis of learning patterns, and reconstructs an orthogonalized vector common to the classes by arranging the scalar values obtained from the inner product operations, one for each eigenvector axis.

The pattern classification method according to claim 5, wherein the higher-order subspace similarity calculation step includes a competitive learning step of correcting classification errors by applying competitive learning using mirror images to the subspace method, and
wherein class-specific eigenvectors Ψ(k, m), obtained in the competitive learning step after competitive learning including mirror image learning between the class-wise (auto)correlation matrices using high-dimensional feature learning data, are used as the eigenvector set for the higher-order subspace similarity calculation.

The pattern classification method according to claim 5, further comprising a similarity posterior probability processing step, wherein the similarity posterior probability processing step receives the similarity as input and calculates a prior probability of the class to which the input vector data belongs.

A pattern classification program for causing a computer to execute the pattern classification method according to any one of claims 5 to 8.