JP6313062B2

JP6313062B2 - Pattern recognition device, pattern recognition method and program

Info

Publication number: JP6313062B2
Application number: JP2014027691A
Authority: JP
Inventors: Soichiro Ono
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2014-02-17
Filing date: 2014-02-17
Publication date: 2018-04-18
Anticipated expiration: 2034-02-17
Also published as: JP2015153241A

Description

Embodiments described herein relate generally to a pattern recognition apparatus, a pattern recognition method, and a program.

In the field of pattern recognition, hidden Markov models (HMMs), conditional random fields, and their derivatives are widely used to recognize input signals in which the boundaries between recognition units are not clear, such as speech signals and character-string images. These methods can determine the segmentation of the recognition target and recognize it at the same time. Their drawback is that matching the internal state models against the feature vectors takes a large amount of computation time. There is therefore a need for a new technique that can recognize such input signals with high accuracy in a short time.

Non-Patent Document 1: C. M. Bishop, "Pattern Recognition and Machine Learning" (Japanese translation in two volumes, supervised by Noboru Murata), Springer Japan, 2007
Non-Patent Document 2: F. Camastra et al., "Machine Learning for Audio, Image and Video Analysis: Theory and Applications", Springer-Verlag, 2007
Non-Patent Document 3: Daiya Takamura et al., "Introduction to Machine Learning for Language Processing" (Natural Language Processing Series 1), Corona Publishing, 2010

The problem to be solved by the present invention is to provide a pattern recognition apparatus, a pattern recognition method, and a program that can recognize, with high accuracy and in a short time, input signals in which the boundaries between recognition units are not clear.

The pattern recognition apparatus according to an embodiment recognizes the pattern of an input signal by converting the signal into feature vectors and matching the feature vectors against a recognition dictionary. The recognition dictionary includes two kinds of data:

- dictionary subspace basis vectors that represent a dictionary subspace, which is a subspace of the feature vector space; and
- a plurality of probabilization parameters for converting a similarity, calculated from a feature vector and the dictionary subspace, into a likelihood.

The pattern recognition apparatus includes a recognition unit. The recognition unit calculates the similarity as a quadratic polynomial of the inner products of the feature vector with the dictionary subspace basis vectors. It then calculates the likelihood as an exponential function of a linear combination of the similarity and the probabilization parameters. The recognition dictionary is learned by an expectation-maximization (EM) method that uses a constraint among the plurality of probabilization parameters.

FIG. 1 is a conceptual diagram illustrating the replacement of an HMM state model with a probabilistic subspace model.
FIG. 2 is a block diagram illustrating the functional configuration of the pattern recognition apparatus according to the embodiment.
FIG. 3 is a flowchart illustrating an example of the processing procedure performed by the pattern recognition apparatus according to the embodiment.
FIG. 4 is a block diagram illustrating an example hardware configuration of the pattern recognition apparatus according to the embodiment.

Hereinafter, a pattern recognition apparatus, a pattern recognition method, and a program according to embodiments will be described with reference to the drawings.

First, the basic concept of this embodiment will be described. In conventional methods that determine the segmentation of the recognition target and recognize it at the same time, most of the computation time goes into matching the internal state models against the feature vectors. This embodiment aims to recognize input signals whose recognition-unit boundaries are not clear, quickly and with high accuracy. To that end, it replaces the matching computation with a similarity computation based on the subspace method or one of its derivatives. The subspace method and its derivatives are known methods for recognizing a single feature vector (see Reference 1 below), and they offer high recognition accuracy relative to their processing time.
<Reference 1> E. Oja, "Subspace Methods of Pattern Recognition" (Japanese translation by Hidemitsu Ogawa and Makoto Sato), Sangyo Tosho, 1986
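The subspace-method similarity referred to above can be illustrated with a minimal sketch. It assumes the classical form s = Σ_i (x·u_i)^2 / ||x||^2 for orthonormal basis vectors u_1, ..., u_k. The function name `subspace_similarity` is hypothetical. The normalization by ||x||^2 is an assumption and is not taken from the patent's own Expression (3).

```python
import numpy as np

def subspace_similarity(x: np.ndarray, basis: np.ndarray) -> float:
    """Similarity of x to the subspace spanned by the rows of `basis`.

    Assumes the rows u_1, ..., u_k of `basis` are orthonormal, so the value
    sum_i (x . u_i)^2 / ||x||^2 lies in [0, 1].
    """
    norm_sq = float(np.dot(x, x))
    if norm_sq == 0.0:
        return 0.0  # hypothetical convention for the zero vector
    projections = basis @ x  # inner products (x . u_i), shape (k,)
    return float(np.dot(projections, projections) / norm_sq)

# x = (3, 0, 4) projected onto the plane spanned by e1 and e2: 9 / 25
print(subspace_similarity(np.array([3.0, 0.0, 4.0]), np.eye(3)[:2]))  # 0.36
```

The similarity needs only k inner products of length D. This is why it is cheap compared with evaluating a full density model.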

The similarity computation of the subspace method or its derivatives serves the same purpose as matching state models against feature vectors in the conventional methods, which determine segmentation and recognize at the same time. In fact, the former can be regarded as an approximation of the latter (see Reference 2 below).
<Reference 2> Yoshiaki Kurosawa, "Subspace Method Derived from the Spherical Gaussian Distribution", IEICE Transactions (D-II), vol. J81-D-II, no. 6, pp. 1205-1212, 1998

We therefore introduce probabilization parameters that convert the similarity of the former into a probabilistic measure, such as the likelihood used in the latter. We then consider a model that computes the likelihood L(x) of a feature vector x as in Expression (1) below.

Here, P' in Expression (1) is a matrix computed by Expression (2) below from the orthonormal dictionary subspace basis vectors u_1, ..., u_k, and q and w are the probabilization parameters.

In this case, s given by Expression (3) below is the similarity of the feature vector x.

Hereinafter, the model that performs the computation of Expression (1) is referred to as a "probabilistic subspace model".
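Expressions (1) to (3) are not reproduced here, so the following is only a sketch under stated assumptions:

- P' = Σ_i u_i u_i^T;
- s = x^T P' x / ||x||^2;
- L(x) = exp(q - w(1 - s)).

This form matches the description of the likelihood as an exponential of a linear combination of the similarity with q and w. It also grows without bound as w decreases or q increases, which the text notes later. It remains an assumption, and the function names are hypothetical.

```python
import numpy as np

def projection_matrix(basis: np.ndarray) -> np.ndarray:
    """Assumed form of Expression (2): P' = sum_i u_i u_i^T for orthonormal rows u_i."""
    return basis.T @ basis  # (D, k) @ (k, D) -> (D, D)

def log_likelihood(x: np.ndarray, basis: np.ndarray, q: float, w: float) -> float:
    """log L(x) under the assumed form L(x) = exp(q - w * (1 - s))."""
    p = projection_matrix(basis)
    s = float(x @ p @ x) / float(x @ x)  # assumed form of Expression (3)
    return q - w * (1.0 - s)

x = np.array([3.0, 0.0, 4.0])
print(log_likelihood(x, np.eye(3)[:2], q=0.5, w=2.0))  # 0.5 - 2 * 0.64
```

Working in the log domain avoids underflow when many state likelihoods are multiplied along a sequence.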

Here, an example of replacing the state models of a hidden Markov model (HMM) with probabilistic subspace models will be described with reference to FIG. 1. As shown schematically in FIG. 1(a), an HMM consists of a plurality of states, and its input is a sequence of feature vectors. Each state is a statistical model of a single feature vector. A Gaussian mixture model (GMM), shown schematically in FIG. 1(b), is normally used for this (see Non-Patent Document 1). When an HMM is trained, the parameters of the states and the parameters between the states are learned independently.

As a model of a single feature vector, a GMM used as an HMM state model requires a large amount of computation relative to the recognition accuracy it provides. The HMM state model is therefore changed from the usual GMM to a probabilistic subspace model, shown schematically in FIG. 1(c). The probabilistic subspace model computes in a subspace of lower dimensionality than the feature vectors, so it needs less computation than a GMM. As shown in FIG. 1(d), the GMM state models of the HMM are replaced with probabilistic subspace models for matching feature vectors. This makes high-accuracy recognition possible in a short time.
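The computational claim can be made concrete with a rough multiplication count. This is a sketch with hypothetical sizes: a diagonal-covariance GMM with M components is compared with a k-dimensional subspace similarity on D-dimensional features. The count ignores the exponentials and log-sum-exp that a GMM additionally needs, and it also ignores the ||x||^2 normalization of the similarity.

```python
def gmm_multiplications(dim: int, components: int) -> int:
    """Rough multiply count for a diagonal-covariance GMM log-density:
    each component and dimension needs (x_d - m_d)^2 * precision_d, i.e. 2 multiplies."""
    return 2 * components * dim

def subspace_multiplications(dim: int, k: int) -> int:
    """Rough multiply count for sum_i (x . u_i)^2:
    k inner products of length dim, plus k squarings."""
    return k * dim + k

# Hypothetical sizes: D = 64, M = 16 components, k = 8 basis vectors
print(gmm_multiplications(64, 16))      # 2048
print(subspace_multiplications(64, 8))  # 520
```

The ratio depends entirely on the chosen M and k. With a single-component GMM the comparison would go the other way, so these numbers only illustrate the shape of the trade-off.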

During recognition, the Viterbi algorithm that computes the HMM likelihood does not depend on how the likelihood is computed within each state model (see Non-Patent Document 1). The Viterbi algorithm can therefore be used as-is, even after the HMM state models are replaced with probabilistic subspace models.
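The independence of the Viterbi algorithm from the state model can be seen in a minimal log-domain sketch. Here the per-state scoring is passed in as a callable `state_loglik(j, x)`, which could be a GMM or a probabilistic subspace model. The function and parameter names are hypothetical.

```python
from typing import Callable, List, Sequence
import numpy as np

def viterbi(features: Sequence[np.ndarray],
            log_init: np.ndarray,
            log_trans: np.ndarray,
            state_loglik: Callable[[int, np.ndarray], float]) -> List[int]:
    """Most likely state sequence. The algorithm only consumes the scores
    returned by `state_loglik`, so any state model can be plugged in."""
    n_states = len(log_init)
    T = len(features)
    delta = np.full((T, n_states), -np.inf)  # best log score ending in state j at time t
    back = np.zeros((T, n_states), dtype=int)  # argmax predecessor for backtracking
    for j in range(n_states):
        delta[0, j] = log_init[j] + state_loglik(j, features[0])
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] + log_trans[:, j]
            back[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[back[t, j]] + state_loglik(j, features[t])
    path = [int(np.argmax(delta[T - 1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Two-state left-to-right model; state j "prefers" features near j (toy scoring).
feats = [np.array([v]) for v in (0.0, 0.1, 0.9, 1.0)]
log_init = np.array([0.0, -np.inf])
log_trans = np.array([[np.log(0.5), np.log(0.5)], [-np.inf, 0.0]])
print(viterbi(feats, log_init, log_trans, lambda j, x: -float((x[0] - j) ** 2)))  # [0, 0, 1, 1]
```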

During training, the probabilistic subspace models can be learned by the expectation-maximization (EM) method, in the same way as the Baum-Welch algorithm used to train HMMs (see Non-Patent Document 1). In that case, the responsibility of each state (see Non-Patent Document 1) does not depend on the form of the state model, so it can be computed exactly as in the Baum-Welch algorithm.
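The responsibilities mentioned above come from the forward-backward pass of Baum-Welch. The sketch below shows how they depend only on per-state log-likelihood scores `log_emit[t, j]`, whatever model produced them. The function names are hypothetical, and this is the textbook log-domain recursion rather than code from the patent.

```python
import numpy as np

def _logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(a))) along `axis`, tolerating all -inf slices."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)
    with np.errstate(divide="ignore"):
        return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def state_responsibilities(log_emit: np.ndarray, log_init: np.ndarray,
                           log_trans: np.ndarray) -> np.ndarray:
    """gamma[t, j] = P(state j at time t | all features), via forward-backward."""
    T, S = log_emit.shape
    log_alpha = np.empty((T, S))
    log_beta = np.zeros((T, S))
    log_alpha[0] = log_init + log_emit[0]
    for t in range(1, T):  # forward pass: sum over previous state i
        log_alpha[t] = log_emit[t] + _logsumexp(log_alpha[t - 1][:, None] + log_trans, axis=0)
    for t in range(T - 2, -1, -1):  # backward pass: sum over next state j
        log_beta[t] = _logsumexp(log_trans + (log_emit[t + 1] + log_beta[t + 1])[None, :], axis=1)
    log_gamma = log_alpha + log_beta
    log_gamma -= _logsumexp(log_gamma, axis=1)[:, None]  # normalize each time step
    return np.exp(log_gamma)
```

In an EM iteration, these γ values weight each feature vector when the state's parameters are re-estimated.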

Consider, then, updating the parameters of a probabilistic subspace model using training data x_1, ..., x_N. These parameters are the probabilization parameters q and w and the dictionary subspace basis vectors u_1, ..., u_k. Let the responsibilities of x_1, ..., x_N be γ_1, ..., γ_N. Assuming the training data are independent, the log-likelihood of all the training data can be expressed as Expression (4) below.

In the EM method, the parameters are chosen to maximize the log-likelihood of Expression (4). For P', the procedure is the same as principal component analysis (see Non-Patent Document 1). Expression (5) below is diagonalized, and the eigenvectors corresponding to the k largest eigenvalues, taken in order, become the dictionary subspace basis vectors u_1, ..., u_k.
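The basis update described above can be sketched as a responsibility-weighted principal component analysis. The code assumes that the matrix to be diagonalized is the weighted autocorrelation matrix K = Σ_n γ_n x_n x_n^T (not mean-centered). Expressions (5) and (7) themselves are not reproduced, and the patent may also normalize x_n first. The function name `update_basis` is hypothetical.

```python
import numpy as np

def update_basis(X: np.ndarray, gamma: np.ndarray, k: int) -> np.ndarray:
    """Return k orthonormal basis vectors (as rows) spanning the dominant
    subspace of K = sum_n gamma_n x_n x_n^T.

    X has shape (N, D); gamma has shape (N,)."""
    K = (X * gamma[:, None]).T @ X           # weighted autocorrelation, shape (D, D)
    eigvals, eigvecs = np.linalg.eigh(K)     # symmetric matrix: eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    return eigvecs[:, order].T               # rows are u_1, ..., u_k
```

`np.linalg.eigh` is used because K is symmetric. It returns real eigenvalues and orthonormal eigenvectors, so the returned rows satisfy U U^T = I as the model requires.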

However, a problem arises with the probabilization parameters q and w. As is clear from the form of Expression (6) described later, the data likelihood L is monotonic in q and w, so L cannot be maximized with respect to them. In fact, L can be made arbitrarily large by making w small and q large, which is inappropriate for a recognition model.
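The monotonicity can be checked directly under an assumed form of Expression (1), namely L(x) = exp(q - w(1 - s(x))) with 0 ≤ s(x) ≤ 1. This form is an assumption, since the original expression is not reproduced here. The responsibility-weighted log-likelihood is then linear in q and w, with gradients of fixed sign, so it has no stationary point:

```latex
% Assumed form: L(x) = exp(q - w (1 - s(x))), with 0 <= s(x) <= 1
\[
\mathcal{L}(q, w) = \sum_{n=1}^{N} \gamma_n \log L(x_n)
                  = q \sum_{n=1}^{N} \gamma_n \;-\; w \sum_{n=1}^{N} \gamma_n (1 - s_n)
\]
\[
\frac{\partial \mathcal{L}}{\partial q} = \sum_{n=1}^{N} \gamma_n > 0,
\qquad
\frac{\partial \mathcal{L}}{\partial w} = -\sum_{n=1}^{N} \gamma_n (1 - s_n) \le 0
\]
```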

Therefore, in this embodiment, an appropriate constraint f(q, w) = 0 is introduced between the probabilization parameters q and w, so that the probabilistic subspace model can be trained properly. The approach replaces HMM state models and similar components with probabilistic subspace models. This makes it possible to realize a pattern recognition apparatus that recognizes input signals whose recognition-unit boundaries are not clear, with high accuracy and in a short time.

FIG. 2 is a block diagram showing the functional configuration of the pattern recognition apparatus of this embodiment. As shown in FIG. 2, the pattern recognition apparatus of this embodiment includes a signal input unit 1, a feature extraction unit 2, a recognition unit 3, and a dictionary update unit 4.

The signal input unit 1 receives the signal to be recognized. The signal may be, for example, a character or character string represented as an image, another kind of image, a speech signal represented as a waveform, or a sensor signal represented as a waveform. The signal is input to the signal input unit 1 as digital information, optionally after preprocessing such as binarization.

The feature extraction unit 2 converts the signal input to the signal input unit 1 into a set of feature vectors of fixed dimensionality, in three steps:

1. It applies a window to the input signal and extracts the partial signal within each window range.
2. It preprocesses each extracted partial signal, for example by normalizing its length and quantization level.
3. It outputs feature vectors whose components are either the preprocessed values, or values obtained by further filtering the preprocessed signal (such as with a Gaussian filter) or transforming it (such as with the Fourier transform).

The result is the set of feature vectors corresponding to the input signal. As a specific example, the technique described in Reference 3 below can be used.
<Reference 3> J. A. Rodriguez and F. Perronnin, "Local Gradient Histogram Features for Word Spotting in Unconstrained Handwritten Documents", Proc. ICFHR 2008, 2008
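The windowing procedure above can be sketched for a 1-D signal as follows. The concrete choices are all hypothetical illustrations of the generic steps (window, normalize, transform) and are not the features of Reference 3:

- a window length of 16 and a hop of 8;
- mean removal plus L2 normalization as the "normalization" step;
- the FFT magnitude as the transform.

```python
import numpy as np

def extract_features(signal: np.ndarray, win: int = 16, hop: int = 8) -> np.ndarray:
    """Slide a window over a 1-D signal and return one feature vector per window.

    Each frame is mean-removed, L2-normalized, and mapped to its FFT magnitude
    spectrum, giving vectors of fixed dimension win // 2 + 1."""
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win].astype(float)  # copy, safe to modify
        frame -= frame.mean()
        norm = np.linalg.norm(frame)
        if norm > 0:
            frame /= norm
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)

feats = extract_features(np.arange(40))
print(feats.shape)  # (4, 9): windows start at 0, 8, 16, 24
```

For character-string images, the same pattern applies with a sliding window over image columns in place of samples.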

The recognition unit 3 uses the recognition dictionary 10 to evaluate the set of feature vectors generated by the feature extraction unit 2. It then outputs a recognition result representing the class, or set of classes, to which the signal input to the signal input unit 1 belongs.

The recognition dictionary 10 is a database containing one model for each class into which the pattern recognition apparatus of this embodiment classifies signals. It is held either inside or outside the apparatus. Each class model in the recognition dictionary 10 consists of a plurality of states, like an HMM, and each state is the probabilistic subspace model described above. That is, for each state of each class model, the recognition dictionary 10 holds the dictionary subspace basis vectors u_1, ..., u_k and the probabilization parameters q and w. The basis vectors u_1, ..., u_k represent a dictionary subspace whose dimensionality is smaller than that of the feature vectors. The probabilization parameters q and w convert the similarity, calculated from a feature vector and the dictionary subspace, into a likelihood.
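The organization of the recognition dictionary 10 can be pictured with a few data types. This is a sketch in which every name (`StateModel`, `ClassModel`, `RecognitionDictionary`) is hypothetical. It stores, per state, the basis vectors as rows together with the parameters q and w.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import numpy as np

@dataclass
class StateModel:
    """One probabilistic subspace model (one HMM-like state)."""
    basis: np.ndarray  # shape (k, D); rows u_1, ..., u_k, assumed orthonormal
    q: float           # probabilization parameter
    w: float           # probabilization parameter

@dataclass
class ClassModel:
    """A class model made of several states, like an HMM."""
    label: str
    states: List[StateModel] = field(default_factory=list)

# Hypothetical recognition dictionary: class label -> class model
RecognitionDictionary = Dict[str, ClassModel]

dictionary: RecognitionDictionary = {
    "a": ClassModel("a", [StateModel(np.eye(3)[:2], q=0.0, w=1.0)]),
}
```

Keeping k much smaller than D is what makes each state cheap to evaluate.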

The recognition unit 3 combines the models in the recognition dictionary 10, searches for the best correspondence with the set of feature vectors generated by the feature extraction unit 2, and outputs a set of model labels. The search works as follows, for each state of each model in the recognition dictionary 10 and for one or more feature vectors in the set:

1. The recognition unit 3 calculates the similarity as a quadratic polynomial of the inner products of the feature vector with the dictionary subspace basis vectors u_1, ..., u_k.
2. It calculates the likelihood as an exponential function of a linear combination of that similarity and the probabilization parameters q and w.

It then selects the combination of models that maximizes the overall data likelihood L and outputs its label set.

Suppose the elements x_1, ..., x_T of the set of feature vectors correspond, respectively, to models M_1, ..., M_T having dictionary subspaces U_1, ..., U_T. The data likelihood L is then given by Expression (6) below.

Here, P(M_1, ..., M_T), the probability that the model sequence is M_1, ..., M_T, can be determined by a probabilistic language model such as an N-gram (see Reference 4 below). It is usually a bigram trained with the Baum-Welch algorithm (see Non-Patent Document 1).
<Reference 4> Kenji Kita, "Probabilistic Language Models" (Language and Computation 5), University of Tokyo Press, 1999
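A log-domain sketch of the score that Expression (6) describes is given below. It assumes that the data likelihood factors as P(M_1, ..., M_T) times the product of the per-vector likelihoods L(x_t), with P(M_1, ..., M_T) given by a bigram model. The function name, the start symbol `"<s>"`, and the dictionary-based bigram table are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

def sequence_log_score(models: Sequence[str],
                       state_logliks: Sequence[float],
                       bigram: Dict[Tuple[str, str], float],
                       start: str = "<s>") -> float:
    """log P(M_1..M_T) + sum_t log L(x_t | M_t), with a bigram language model.

    bigram[(prev, cur)] is a conditional probability P(cur | prev);
    state_logliks[t] is log L(x_t) under model M_t."""
    if len(models) != len(state_logliks):
        raise ValueError("models and state_logliks must have the same length")
    total = 0.0
    prev = start
    for label, ll in zip(models, state_logliks):
        total += math.log(bigram[(prev, label)]) + ll
        prev = label
    return total

bigram = {("<s>", "a"): 0.5, ("a", "b"): 0.25}
print(sequence_log_score(["a", "b"], [-1.0, -2.0], bigram))  # log 0.5 + log 0.25 - 3
```

The recognition unit would compare such scores across candidate model sequences and keep the best one.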

After the recognition unit 3 finishes processing, the dictionary update unit 4 updates the recognition dictionary 10 using the set of feature vectors generated from the input signal. The dictionary update unit 4 trains the recognition dictionary 10 by the EM method, using the constraint f(q, w) = 0 between the probabilization parameters q and w. A specific example of how the recognition dictionary 10 is updated is described below.

Let x_1, ..., x_N be the feature vectors input to a given state model, and let γ_1, ..., γ_N be their responsibilities. First, the matrix K of Expression (7) below is computed. The eigenvectors corresponding to the k largest eigenvalues of K are then taken as u_1, ..., u_k, and these become the updated dictionary subspace basis vectors of that state model.

Next, consider the constraint f(q, w) = 0 between the probabilization parameters q and w. The equation in Expression (8) below is solved for one parameter, and that solution is substituted into f(q, w) = 0 to obtain the other.

Here, μ is the weighted average of one minus the similarity obtained at recognition time, as given by Expression (9) below.
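One way to see where an equation of this kind comes from is sketched below. It relies on two assumptions: the likelihood form L(x) = exp(q - w(1 - s)), and a constraint f(q, w) = 0 that can be solved as q = g(w). Expression (8) itself is not reproduced, so this is only a plausible reading. Setting the derivative of the weighted log-likelihood to zero gives a stationary condition in terms of the same μ; it is a maximum when g is concave, for example a logarithm.

```latex
% Assumptions: L(x) = exp(q - w (1 - s)); the constraint f(q, w) = 0 is solvable as q = g(w)
\[
\mathcal{L}(w) = \sum_{n=1}^{N} \gamma_n \bigl( g(w) - w (1 - s_n) \bigr)
\]
\[
\frac{d\mathcal{L}}{dw} = \Bigl( \sum_{n=1}^{N} \gamma_n \Bigr) \bigl( g'(w) - \mu \bigr) = 0
\quad\Longrightarrow\quad g'(w) = \mu,
\qquad
\mu = \frac{\sum_{n=1}^{N} \gamma_n (1 - s_n)}{\sum_{n=1}^{N} \gamma_n}
\]
% Example: g(w) = (E/2) log w + c gives E / (2w) = mu, i.e. w = E / (2 mu)
```

The example in the last comment is consistent with the relation E/2w = μ that the text gives for the constant-effective-dimension constraint.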

One example of a constraint between the probabilization parameters q and w is that the effective dimension of the state model, given by Expression (10) below, is kept constant. This is the condition given by Expression (11) below.

In this case, Expression (8) becomes Expression (12) below, namely E/(2w) = μ, and solving it gives w = E/(2μ).

Substituting w = E/(2μ) into Expression (10) then gives Expression (13) below. The probabilization parameters q and w are updated with these values.
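Combining Expressions (9), (12) and (13), the (q, w) update can be sketched as follows. Expression (13) is not reproduced here, so the code assumes a hypothetical constraint q = (E/2) log(w/π). This is the log-normalizer of an isotropic Gaussian in E dimensions with precision 2w. Any constraint with dq/dw = E/(2w) yields the same w = E/(2μ), but the resulting q depends on that assumed choice. The function name is hypothetical.

```python
import numpy as np

def update_q_w(similarities, gamma, effective_dim: float):
    """M-step for (q, w) under two assumptions:
    L(x) = exp(q - w * (1 - s)), and the hypothetical constraint
    q = (E / 2) * log(w / pi) with fixed effective dimension E."""
    s = np.asarray(similarities, dtype=float)
    g = np.asarray(gamma, dtype=float)
    mu = float(np.sum(g * (1.0 - s)) / np.sum(g))  # weighted mean of 1 - s (Expression (9))
    w = effective_dim / (2.0 * mu)                 # from E / (2w) = mu (Expression (12))
    q = 0.5 * effective_dim * np.log(w / np.pi)    # assumed constraint solved for q
    return float(q), float(w)

q, w = update_q_w([0.9, 0.8], [1.0, 1.0], effective_dim=4)
print(w)  # about 13.33, since mu = 0.15
```

If every similarity equals 1, μ is zero and w diverges. A practical implementation would need a floor on μ.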

For each state model used in recognition by the recognition unit 3, the dictionary update unit 4 updates the dictionary subspace basis vectors u_1, ..., u_k and the probabilization parameters q and w as described above. As a result, the recognition dictionary 10 is trained automatically each time recognition is performed, and recognition accuracy improves.

Next, an outline of the processing performed by the pattern recognition apparatus of this embodiment will be described with reference to FIG. 3. FIG. 3 is a flowchart showing an example of the processing procedure performed by the pattern recognition apparatus of this embodiment.

First, the signal input unit 1 receives a signal to be recognized (step S101). The signal is passed to the feature extraction unit 2.

Next, the feature extraction unit 2 receives the signal input in step S101 from the signal input unit 1. It generates a set of feature vectors from this signal by the method described above (step S102) and passes the set to the recognition unit 3.

Next, the recognition unit 3 receives from the feature extraction unit 2 the set of feature vectors generated in step S102 and evaluates it using the recognition dictionary 10. It outputs a recognition result representing the class, or set of classes, to which the signal input in step S101 belongs (step S103). In doing so, the recognition unit 3 performs the similarity and likelihood calculations described above for each state of each model in the recognition dictionary 10. It selects the combination of models that maximizes the overall data likelihood L and outputs its label set. After recognition, the set of feature vectors input to the recognition unit 3 and the label set it output are passed to the dictionary update unit 4.

Next, the dictionary update unit 4 receives the set of feature vectors input to the recognition unit 3 and the label set the recognition unit 3 output. For each state model used in recognition, it updates the dictionary subspace basis vectors and the probabilization parameters by the method described above (step S104). The update of the probabilization parameters in particular uses the constraint between them described above.

As described above with specific examples, the pattern recognition apparatus of this embodiment uses the probabilistic subspace model as the state model of each class model. The recognition dictionary 10 holds the dictionary subspace basis vectors and probabilization parameters for each state model. For the set of feature vectors generated from the input signal, the recognition unit 3 works in three steps:

1. It calculates a similarity as a quadratic polynomial of the inner products of each feature vector with the dictionary subspace basis vectors.
2. It calculates a likelihood as an exponential function of a linear combination of that similarity and the probabilization parameters.
3. It selects the combination of models that maximizes the overall data likelihood and outputs its label set as the recognition result.

When the recognition unit 3 has finished, the dictionary update unit 4 updates the recognition dictionary 10 using the set of feature vectors generated from the input signal. It trains the dictionary by the EM method, using the constraint between the probabilization parameters. The pattern recognition apparatus of this embodiment can therefore recognize input signals whose recognition-unit boundaries are not clear, with high accuracy and in a short time.

The above description used, as the constraint between the probabilization parameters, the condition that the effective dimension of the state model is kept constant. Other constraints can also be used. For example, Expression (14) below may be kept constant in each state of the model, which is the condition given by Expression (15) below.

In this case, Expression (8) becomes Expression (16) below. Substituting this into Expression (14) gives Expression (17) below. The probabilization parameters q and w are updated with these values.

The above description also assumed that the HMM state models are replaced with the probabilistic subspace model of Expression (1), but the invention is not limited to this case:

- In other methods that determine the segmentation of the recognition target and recognize it at the same time, any model whose likelihood computation is time-consuming may be replaced with the probabilistic subspace model.
- In place of the probabilistic subspace model of Expression (1), another model with the same function may be used, that is, one that calculates a similarity by the subspace method and then a likelihood from that similarity.

The above description also assumed that the dictionary update unit 4 is inside the pattern recognition apparatus, but it may instead be provided outside the apparatus. In that case, the external dictionary update unit 4 updates the recognition dictionary 10 as described above, for example while communicating with the pattern recognition apparatus.

The pattern recognition apparatus of this embodiment can use a hardware configuration based on an ordinary computer, as shown in FIG. 4, for example. Such a configuration includes:

- a processor such as a CPU (Central Processing Unit) 101;
- storage devices such as a ROM (Read Only Memory) 102 and a RAM (Random Access Memory) 103;
- an auxiliary storage device such as an HDD (Hard Disk Drive) 104;
- a communication I/F 105 that connects to a network for communication; and
- a bus 106 that connects these components.

In this case, each of the functional components described above can be realized by executing a predetermined pattern recognition program on the computer.

The pattern recognition program is provided as a computer program product. It is recorded as a file in an installable or executable format on a computer-readable recording medium, such as a CD-ROM (Compact Disk Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk Recordable), or a DVD (Digital Versatile Disc).

The pattern recognition program may also be stored on another computer connected to a network such as the Internet and provided by download via the network. Alternatively, it may be provided or distributed via a network such as the Internet.

The pattern recognition program may also be provided pre-installed in the ROM 102 or the like.

The pattern recognition program has a module configuration that includes each processing unit of the pattern recognition apparatus of this embodiment: the signal input unit 1, the feature extraction unit 2, the recognition unit 3, and the dictionary update unit 4. On actual hardware, for example, the CPU 101 (processor) reads the program from the recording medium and executes it, so that the processing units are loaded into and generated on the RAM 103 (main memory). Some or all of these processing units may also be realized with dedicated hardware, such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).

Although an embodiment of the present invention has been described above, the embodiment is presented as an example and is not intended to limit the scope of the invention. The novel embodiment described herein can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The embodiment and its modifications are included in the scope and gist of the invention, and in the inventions recited in the claims and their equivalents.

1 Signal input unit
2 Feature extraction unit
3 Recognition unit
4 Dictionary update unit
10 Recognition dictionary
u_1, ..., u_k Dictionary subspace basis vectors
q, w Probability parameters

Claims

A pattern recognition apparatus that performs pattern recognition of an input signal by converting the input signal into a feature vector and collating the feature vector with a recognition dictionary, wherein
the recognition dictionary includes dictionary subspace basis vectors representing a dictionary subspace, which is a subspace of the feature vector space, and a plurality of probability parameters for converting a similarity calculated from the feature vector and the dictionary subspace into a likelihood,
the apparatus comprises a recognition unit that calculates the similarity by a quadratic polynomial of the inner product values of the feature vector and the dictionary subspace basis vectors, and calculates the likelihood by an exponential function of a linear sum of the similarity and the probability parameters, and
the recognition dictionary is learned by an expectation-maximization method using a constraint condition among the plurality of probability parameters.
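The similarity and likelihood computations recited in claim 1 can be illustrated with a short sketch. The claim fixes only their general form (a quadratic polynomial of inner products, and an exponential of a linear sum with the probability parameters), so the concrete forms below are assumptions: the similarity is taken to be the classic subspace-method value sum_i (x . u_i)^2 / ||x||^2 with orthonormal basis vectors u_1, ..., u_k, and the likelihood is taken to be exp(q * s + w) with scalar parameters q and w. The function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vector lengths differ")
    return sum(x * y for x, y in zip(a, b))


def subspace_similarity(x: Vector, basis: Sequence[Vector]) -> float:
    """Assumed similarity: squared projection length of x onto the subspace
    spanned by the (assumed orthonormal) basis vectors, divided by ||x||^2.
    This is a quadratic polynomial of the inner products x . u_i."""
    norm_sq = dot(x, x)
    if norm_sq == 0.0:
        return 0.0
    return sum(dot(x, u) ** 2 for u in basis) / norm_sq


def log_likelihood(x: Vector, basis: Sequence[Vector], q: float, w: float) -> float:
    """Assumed form: a linear sum of the similarity and the parameters q, w."""
    return q * subspace_similarity(x, basis) + w


def likelihood(x: Vector, basis: Sequence[Vector], q: float, w: float) -> float:
    """Likelihood as the exponential of the linear sum above."""
    return math.exp(log_likelihood(x, basis, q, w))
```

For example, with the basis [[1, 0, 0], [0, 1, 0]], the vector [1, 0, 1] has similarity 0.5, and with q = 2.0 and w = -1.0 its likelihood is exp(0) = 1.0. Keeping a separate `log_likelihood` is a design choice for numerical stability when many likelihoods are combined.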

The pattern recognition apparatus according to claim 1, further comprising a dictionary update unit that learns the recognition dictionary by an expectation-maximization method using a constraint condition among the probability parameters.

The pattern recognition apparatus according to claim 1 or 2, wherein
the recognition dictionary has, for each class, a model composed of a plurality of states, together with the dictionary subspace basis vectors and the probability parameters corresponding to each of the states of the model, and
the recognition unit calculates the similarity and the likelihood for one or more of the feature vectors in each state of the model.
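Claim 3 applies the same computation in each state of a per-class model. The sketch below assumes, hypothetically, a left-to-right model in which each state covers one or more consecutive feature vectors, and finds the best segmentation by dynamic programming over per-vector log-likelihoods q * s + w (working in log space avoids underflow from multiplying many exponentials). Neither the model topology nor this alignment search is stated in the claims; both are illustrative choices, and names such as `align_score` are hypothetical. The similarity uses the same assumed subspace-method form as above.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
# Hypothetical per-state parameters: (basis vectors u_1..u_k, q, w).
State = Tuple[Sequence[Vector], float, float]


def _dot(a: Vector, b: Vector) -> float:
    if len(a) != len(b):
        raise ValueError("vector lengths differ")
    return sum(x * y for x, y in zip(a, b))


def state_log_likelihood(x: Vector, basis: Sequence[Vector], q: float, w: float) -> float:
    """log of exp(q * s + w), with s the assumed subspace similarity."""
    norm_sq = _dot(x, x)
    s = 0.0 if norm_sq == 0.0 else sum(_dot(x, u) ** 2 for u in basis) / norm_sq
    return q * s + w


def align_score(frames: Sequence[Vector], states: Sequence[State]) -> float:
    """Best total log-likelihood of the feature-vector sequence when the states
    are visited in order and each state covers at least one consecutive vector.
    Returns -inf if there are fewer vectors than states."""
    num_frames, num_states = len(frames), len(states)
    neg_inf = -math.inf
    if num_states == 0 or num_frames < num_states:
        return neg_inf
    # ll[t][s]: log-likelihood of vector t in state s.
    ll = [[state_log_likelihood(x, *st) for st in states] for x in frames]
    # dp[s]: best score of a path ending in state s at the current vector.
    dp = [neg_inf] * num_states
    dp[0] = ll[0][0]
    for t in range(1, num_frames):
        new_dp = [neg_inf] * num_states
        for s in range(num_states):
            stay = dp[s]                               # remain in state s
            move = dp[s - 1] if s > 0 else neg_inf     # advance from state s-1
            best = max(stay, move)
            if best > neg_inf:
                new_dp[s] = best + ll[t][s]
        dp = new_dp
    return dp[num_states - 1]  # the path must end in the last state
```

A classifier built on this sketch would evaluate `align_score` with each class's list of states and select the class with the highest score.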

The pattern recognition apparatus according to claim 3, wherein the constraint condition is that the effective dimension of the state corresponding to the plurality of probability parameters is kept constant.

A pattern recognition method executed in a pattern recognition apparatus that performs pattern recognition of an input signal by converting the input signal into a feature vector and collating the feature vector with a recognition dictionary, wherein
the recognition dictionary includes dictionary subspace basis vectors representing a dictionary subspace, which is a subspace of the feature vector space, and a plurality of probability parameters for converting a similarity calculated from the feature vector and the dictionary subspace into a likelihood, the method comprising:
calculating, by the pattern recognition apparatus, the similarity by a quadratic polynomial of the inner product values of the feature vector and the dictionary subspace basis vectors; and
calculating, by the pattern recognition apparatus, the likelihood by an exponential function of a linear sum of the similarity and the probability parameters,
wherein the recognition dictionary is learned by an expectation-maximization method using a constraint condition among the plurality of probability parameters.

A program for causing a computer to function as a pattern recognition apparatus that performs pattern recognition of an input signal by converting the input signal into a feature vector and collating the feature vector with a recognition dictionary, wherein
the recognition dictionary includes dictionary subspace basis vectors representing a dictionary subspace, which is a subspace of the feature vector space, and a plurality of probability parameters for converting a similarity calculated from the feature vector and the dictionary subspace into a likelihood,
the program causing the computer to implement:
a function of calculating the similarity by a quadratic polynomial of the inner product values of the feature vector and the dictionary subspace basis vectors; and
a function of calculating the likelihood by an exponential function of a linear sum of the similarity and the probability parameters,
wherein the recognition dictionary is learned by an expectation-maximization method using a constraint condition among the plurality of probability parameters.