JP5017941B2

JP5017941B2 - Model creation device and identification device

Info

Publication number: JP5017941B2
Application number: JP2006177102A
Authority: JP
Inventors: 勇馬
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2006-06-27
Filing date: 2006-06-27
Publication date: 2012-09-05
Anticipated expiration: 2026-06-27
Also published as: JP2008009548A

Description

The present invention relates to a technique for creating a data model and a technique for identifying data using the created model.

The aim of data modeling and analysis is to learn a decision function and an unknown distribution from given learning data. The obtained decision function or distribution is then used to identify (classify) unknown samples.

In general, a modeling problem provides a learning data set, expressed by Equation 1, that contains input vectors x_i (i = 1 to N) and the corresponding classes (also called targets or labels) t_i (i = 1 to N).
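Equation 1 is not reproduced in this text. A form consistent with the definitions above, where the set name D is an assumed symbol, would be:

```latex
D = \{(x_i, t_i)\}_{i=1}^{N}
```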

A model p(t|x) is inferred from this learning data. Given an unknown sample x*, the class of that sample can then be estimated using the obtained model.

The performance of a data model obtained in this way is determined by several factors, such as the complexity of the problem itself, the number of samples in the learning data, the number of errors in the learning data, and the complexity of the classifier. In general, many data analysis problems, such as information retrieval and human-computer interaction, are complex, so they require complex methods such as the kernel trick. At the same time, these applications require a runtime library with a small memory footprint and fast evaluation. With conventional modeling methods, it has been difficult to evaluate such complex problems quickly with a small runtime library.

To obtain good performance, the support vector machine (hereinafter SVM; Patent Document 1) and the relevance vector machine (hereinafter RVM; Patent Document 2) perform identification based on the discriminant function expressed by Equation 2.

Here, w_i (i = 0 to N) are weighting coefficients, and K(x, x_i) is a kernel function.
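Equation 2 is not reproduced in this text. The sketch below assumes the usual kernel-expansion form y(x) = w_0 + Σ w_i K(x, x_i), with w_0 acting as a bias. The Gaussian kernel, the `gamma` parameter, and all function names are illustrative assumptions, not taken from the patent.

```python
import math

def rbf_kernel(x, xi, gamma=1.0):
    # Gaussian (RBF) kernel; the kernel choice and gamma are assumptions.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xi)))

def discriminant(x, train_x, w, kernel=rbf_kernel):
    # Assumed form: y(x) = w_0 + sum_i w_i * K(x, x_i); w[0] is the bias w_0.
    y = w[0]
    for w_i, x_i in zip(w[1:], train_x):
        if w_i != 0.0:  # samples with zero weight contribute nothing
            y += w_i * kernel(x, x_i)
    return y

train_x = [[0.0], [1.0], [2.0]]
w = [0.5, 1.0, 0.0, -1.0]  # w_0, w_1, w_2, w_3
y = discriminant([0.0], train_x, w)
```

Only the samples with non-zero weight (here the first and third) are ever evaluated, which is why sparsity in w matters for speed and storage.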

In SVM, the target function is optimized by minimizing the classification error on the learning set while at the same time maximizing, under a trade-off parameter, the margin between the two classes in the feature space implicitly defined by the kernel. This optimization yields a sparse kernel classifier in which most of the weights are zero and non-zero weights are attached only to those x_i that lie on the margin, inside the margin, or on the wrong side of the margin. An x_i with a non-zero weight is called a support vector. Because the decision function is determined by the support vectors alone, the model is simplified. In SVM, typically 1/10 to 1/5 of the learning samples become support vectors.

RVM is a learning method that is even sparser than SVM. RVM places on the weighting coefficients w_i of Equation 2 a zero-mean Gaussian prior distribution controlled by independent hyperparameters α, as expressed by Equation 3.
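Equation 3 is not reproduced in this text. In the published RVM formulation the prior takes the following form, with one hyperparameter α_i per weight acting as a precision (inverse variance). It is given here as a likely reading, not as the patent's own equation:

```latex
p(w \mid \alpha) = \prod_{i=0}^{N} \mathcal{N}\!\left(w_i \mid 0,\ \alpha_i^{-1}\right)
```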

The weighting coefficients and hyperparameters are then determined from the learning samples so that the marginal likelihood is maximized.

RVM is sparser than SVM, but it is still not sparse enough for some applications such as feature selection. In addition, the weighting coefficients enter Equation 2 only in linear form, which limits flexibility in practical applications.

Patent Document 1: US Pat. No. 5,649,068
Patent Document 2: US Pat. No. 6,633,857

The present invention has been made in view of the above circumstances, and its object is to provide a technique for creating a sparse model.

To achieve this object, the present invention creates a model from learning data by the following means or processing. That is, the present invention is a model creation device that creates a model from learning data. It assumes a Laplace distribution as the prior distribution of the parameters representing the model, and creates the model by determining the parameters so that the likelihood of the model (the posterior probability) calculated from this prior distribution and the learning data is maximized.

More specifically, the present invention is a model creation device that creates a model from learning data, comprising input means for inputting the learning data and model creation means for creating a model expressed as a weighted sum of a plurality of basis functions. The model creation means has prior distribution determining means for determining the prior distribution of the weighting parameters, which control the weighting of the basis functions, as a Laplace distribution controlled by a hyperparameter, and weighting parameter determining means for determining the weighting parameters. The weighting parameters and the hyperparameter are determined so as to maximize the posterior probability of the model, which is calculated based on the prior distribution and represents the likelihood of the model given the learning data.

Setting a Laplace distribution as the prior distribution of the weighting parameters in this way raises the probability that a weighting parameter takes the value 0. That is, many of the weighting parameters obtained by maximizing the posterior probability are 0. It is therefore possible to obtain a model that needs only a small amount of stored learning data to be determined, in other words a sparse model.
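The effect can be seen in a one-dimensional toy problem, sketched below under assumptions that are not from the patent: a single weight w is fitted to one noisy observation z with unit noise variance, and `lam` is a hypothetical prior strength. With a Laplace prior the posterior maximum is the soft-thresholding rule, which returns exactly 0 for small |z|. With a Gaussian prior the weight is only shrunk and never becomes exactly 0.

```python
def map_laplace(z, lam):
    # argmin_w 0.5*(w - z)**2 + lam*abs(w): soft thresholding.
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def map_gaussian(z, lam):
    # argmin_w 0.5*(w - z)**2 + 0.5*lam*w**2: uniform shrinkage.
    return z / (1.0 + lam)

observations = [2.0, 0.3, -0.1, -1.5]
laplace_weights = [map_laplace(z, 0.5) for z in observations]
gaussian_weights = [map_gaussian(z, 0.5) for z in observations]
```

Here the two small observations are mapped to exactly 0 under the Laplace prior, while every Gaussian-prior weight stays non-zero.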

The weighting parameters and the hyperparameter that maximize the posterior probability are preferably determined by the expectation-maximization (EM) method. That is, estimates of the weighting parameters and the hyperparameter are set, a conditional expectation is calculated from these estimates, and the weighting parameters and the hyperparameter are corrected so that the conditional expectation is optimized. The same processing is then applied again to the corrected weighting parameters and hyperparameter. This iteration is preferably repeated until a predetermined convergence condition is satisfied.

The weighting coefficients of the basis functions are preferably determined by a weighting mapping function whose variable is the weighting parameter. The weighting mapping function may be linear or nonlinear in the weighting parameter. In particular, the model creation device according to the present invention can use a nonlinear weighting mapping function and can therefore create a flexible model. A step function is also preferable as the nonlinear weighting mapping function.

Another aspect of the present invention is an identification device that uses a model created by the model creation device described above. That is, it is an identification device for identifying input data, having data input means for inputting data, storage means for storing a model created by the model creation device, and identification means for identifying the input data with this model.

Because the model created by the model creation device is sparse, the identification device needs little memory and can perform identification at high speed.

The present invention can also be regarded as a model creation method or an identification method that includes at least part of the above processing, or as a program for realizing such a method. The above means and processes can be combined with one another wherever possible to constitute the present invention.

According to the present invention, a sparse model can be created.

Preferred embodiments of the present invention are described in detail below, by way of example, with reference to the drawings.

This embodiment is an identification device that identifies (classifies) input data. The identification device identifies data by inputting it to a classifier obtained in advance by learning.

The model described below is effective for any identification process. Examples include the two-class classification problem of determining from a face image whether the subject is male or female; multi-class classification problems such as determining an age group from a face image or determining which of several registered persons a face matches (face authentication); and regression analysis that estimates the orientation of a face from a face image as an angle (a continuous value). The model is applicable to all of these. Although face images are used here as an example of input data, the model may identify other image data, audio data, or any other kind of data.

The following first describes how the classifier is learned, and then how identification is performed with the learned classifier.

The learning process and identification process described below are performed by a general-purpose computer (information processing device) executing a program. A typical hardware configuration of the computer is shown in FIG. 1. The computer includes an input device 101, a memory 102, a CPU (central processing unit) 103, and an output device 104. The learning and identification processes may also be executed by a distributed computer or any other type of computer.

<Learning process>
[Overview]
First, an outline of the generalized prediction model is given. FIG. 2 is a diagram showing an outline of the prediction model. The model 202 may be a linear prediction model that receives input data 201 and outputs an identification result 203, a discrete or continuous value indicating the class to which the input data belongs. The model 202 therefore needs to be created by generating a prior distribution for the model based on the learning data set input to the model creation device. The model creation device (learning device) performs the following processing by having the CPU of a general-purpose computer execute a program.

FIG. 3 is a diagram showing an outline of the model creation device. As described above, the model creation device 302 receives the learning data set 301 expressed by Equation 4.

Here, x_i is a pattern vector and t_i is the corresponding scalar target.

The model creation device 302 finally outputs the model 303 expressed by Equation 5. A more detailed functional block diagram of the model creation device 302 is shown in FIG. 4. The model creation device 302 consists of a learning data input unit 34 that accepts learning data and a model creation unit 30 that creates a model from the learning data. The model creation unit 30 includes a prior distribution determination unit 32 that determines the prior distribution of the weighting parameters, a weighting parameter determination unit 33 that determines the weighting parameters, and a posterior probability calculation unit 31 that calculates the posterior probability of the model from the determined prior distribution, the weighting parameters, and the input learning data. The prior distribution determination unit 32 and the weighting parameter determination unit 33 determine the prior distribution and the weighting parameters so that the posterior probability of the model is maximized. Each process is described in detail below.

[Generalized prediction model]
In the generalized linear prediction model, the output of the learning device can be expressed as a linear weighted sum of several basis functions K(x_i, x), as shown in Equation 6.

s(w_i) is a weighting mapping function that is associated with the basis function K(x_i, x) and controlled by the weighting parameter w_i (i = 0 to N). It can take various forms, of which the following two are the most common.

Further, ε is noise with mean 0 and deviation σ.
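Equation 6 is not reproduced in this text. From the surrounding definitions it presumably has the following shape, where taking K(x_0, x) = 1 so that s(w_0) acts as a bias term is an assumption:

```latex
t = \sum_{i=0}^{N} s(w_i)\, K(x_i, x) + \varepsilon
```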

In the model learning process, the model parameters are determined so as to maximize the likelihood of the model given the learning data set. Suppose that the conditional distribution

is of Gaussian (normal) form

In this case, the likelihood of the learning data set is expressed as follows.
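The equations of this passage are not reproduced in this text. Under the stated Gaussian assumption, and writing s(w) for the vector of mapped weights (an assumed notation), the conditional distribution of one target and the likelihood of the N learning samples would take the standard form:

```latex
p(t_n \mid x_n, w, \sigma^2) = \mathcal{N}\!\left(t_n \,\middle|\, \sum_{i=0}^{N} s(w_i)\, K(x_i, x_n),\ \sigma^2\right)

p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\!\left(-\frac{\lVert t - \Phi\, s(w) \rVert^2}{2\sigma^2}\right)
```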

Note that Φ is the N × (N + 1) design matrix, composed of the kernel values within the learning set, and is expressed by the following Equation 12.
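Equation 12 is not reproduced in this text. A minimal sketch of such a design matrix follows, assuming that the extra column is a leading column of ones paired with the bias weight w_0 and that the kernel is Gaussian. Both choices, and the function names, are illustrative assumptions.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian (RBF) kernel; the kernel choice and gamma are assumptions.
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def design_matrix(xs, kernel=rbf_kernel):
    # Row n is [1, K(x_n, x_1), ..., K(x_n, x_N)], giving N rows and N + 1 columns.
    return [[1.0] + [kernel(xn, xi) for xi in xs] for xn in xs]

xs = [[0.0], [1.0], [3.0]]
phi = design_matrix(xs)
```

With three learning samples, `phi` has 3 rows and 4 columns, matching the N × (N + 1) shape stated above.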

Maximum likelihood estimation of w and σ generally causes overfitting, that is, excessive dependence on the learning data. Therefore, a zero-mean Laplace distribution, expressed by the following Equation 13, is defined as the prior distribution of the weighting parameters.

Here, ||w||₁ is

and α is a scalar hyperparameter that follows a uniform distribution. FIG. 5 shows an example of the Laplace distribution. The solid curve in FIG. 5 is the Laplace distribution, and the dotted curve is the normal (Gaussian) distribution used as the prior distribution in RVM.
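Equation 13 and the definition of ||w||₁ are not reproduced in this text. A zero-mean Laplace prior with a scalar hyperparameter α is commonly written as below. The exact parameterization (for example, α versus √α in the exponent) is an assumption:

```latex
p(w \mid \alpha) = \left(\frac{\alpha}{2}\right)^{N+1} \exp\!\left(-\alpha \lVert w \rVert_1\right),
\qquad
\lVert w \rVert_1 = \sum_{i=0}^{N} \lvert w_i \rvert
```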

According to the maximum a posteriori (MAP) method, the optimal parameters are obtained by the following equation.

However, these parameters cannot be obtained directly in closed form, so p(w, σ²|t) is maximized with the EM algorithm by treating α as a hidden variable. Taking α into account, the posterior probability to be maximized is

By Bayes' rule, this can be rewritten as follows.

The EM algorithm that maximizes this consists of the following two steps, the E step and the M step, and maximizes the posterior probability by applying them repeatedly.

E step:

M step:
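The E-step and M-step equations are not reproduced in this text. In generic form, with θ = (w, σ²) as assumed shorthand and α as the hidden variable, one EM iteration for maximizing a posterior reads:

```latex
\text{E step:}\quad Q\!\left(\theta \mid \theta^{(k)}\right) = \mathbb{E}_{\alpha \mid t,\, \theta^{(k)}}\!\left[\log p(\theta, \alpha \mid t)\right]

\text{M step:}\quad \theta^{(k+1)} = \arg\max_{\theta}\ Q\!\left(\theta \mid \theta^{(k)}\right)
```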

The E step can be calculated as follows.

In the M step, σ²(k+1) and w(k+1) are obtained by the following procedure so as to maximize Q. σ²(k+1) is given by the following equation.

w(k+1) is estimated by finding the w that satisfies the following condition. A technique such as the conjugate gradient method can be used for this.

The EM algorithm takes a different concrete form for each weighting mapping function S(w). A specific method for optimizing the weighting parameters w and the noise deviation σ in the case S(w) = w is described below with reference to the flowchart of FIG. 6. For cases other than S(w) = w, w and σ can be optimized in the same way.

[S0. Initialization]
First, the initial parameter values w(0) and σ(0) at k = 0 are set. The initial values may be arbitrary, but if reasonable values are known empirically, it is preferable to use them.

[S1. Omit small weighting parameters]
Next, if any weighting parameters are sufficiently small, the corresponding samples can be deleted from the learning set. That is, when a weighting parameter is smaller than a predetermined threshold, the corresponding sample is removed from the learning set, and w is reconstructed from only the weighting parameters that correspond to the remaining learning set. For example, if the m-th weight w_m is sufficiently small, the reconstructed w is as follows.
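The pruning in S1 can be sketched as follows. The threshold value, the use of the absolute value |w_i| for the comparison, and the function name `prune` are assumptions. Whether the bias weight w_0 is exempt from pruning is not stated in the text, so this sketch treats all weights alike.

```python
def prune(weights, phi, threshold=1e-3):
    # Indices whose weight magnitude is not below the threshold survive.
    keep = [i for i, w in enumerate(weights) if abs(w) >= threshold]
    pruned_w = [weights[i] for i in keep]
    # Drop the matching columns of the design matrix.
    pruned_phi = [[row[i] for i in keep] for row in phi]
    return pruned_w, pruned_phi, keep

w = [0.5, 1e-5, -2.0]
phi = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]
w_new, phi_new, kept = prune(w, phi)
```

The returned `kept` list records which original indices remain, so the surviving weights can still be matched to their learning samples.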

[S2. Update model parameters]
When S(w) = w, the noise deviation σ can be updated by the following equation. Here, Φ is the design matrix reconstructed by removing the columns corresponding to the w omitted in S1.

The weighting parameters w can be updated according to the following equation.

[S3. Calculation of log posterior probability]
Next, Q(k+1) can be calculated from the updated w(k+1) and σ(k+1) according to the following equation.

[S4. Convergence judgment]
It is determined whether the posterior probability has converged, that is, whether |Q(k+1) − Q(k)| is less than or equal to a predetermined threshold. If the posterior probability has converged, the optimization process ends; otherwise, the process returns to S1.
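The control flow of S0 to S4 can be sketched as the loop below. Because the update and Q equations are not reproduced in this text, they are passed in as the hypothetical callables `update` and `objective`. The toy versions used at the end only exercise the loop and are not the patent's formulas.

```python
def fit(w0, sigma0, update, objective, threshold=1e-3, tol=1e-6, max_iter=100):
    w, sigma = list(w0), sigma0                 # S0: initialization
    active = list(range(len(w)))                # original indices of surviving weights
    q_prev = objective(w, sigma, active)
    for _ in range(max_iter):
        # S1: omit weights whose magnitude is below the threshold.
        kept = [j for j, wj in enumerate(w) if abs(wj) >= threshold]
        w = [w[j] for j in kept]
        active = [active[j] for j in kept]
        w, sigma = update(w, sigma, active)     # S2: update model parameters
        q = objective(w, sigma, active)         # S3: compute Q(k+1)
        if abs(q - q_prev) <= tol:              # S4: convergence judgment
            break
        q_prev = q
    return w, sigma, active

# Toy stand-ins: weight 0 is left alone, every other weight is halved per pass.
def toy_update(w, sigma, active):
    return [wj if a == 0 else 0.5 * wj for wj, a in zip(w, active)], sigma

def toy_objective(w, sigma, active):
    return -sum(abs(wj) for wj in w)

w_fit, sigma_fit, active_fit = fit([1.0, 0.4, 0.2], 1.0, toy_update,
                                   toy_objective, threshold=0.06)
```

In this toy run the two shrinking weights fall below the threshold and are pruned in turn, after which Q stops changing and the loop exits with only weight 0 left.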

In this way, the weighting parameters w and the noise deviation σ are determined, and hence the model p(t|x) is determined. Because the weighting parameters w_i follow a Laplace distribution, many of the w_i are zero. That is, few basis functions need to be stored to determine the model p(t|x), so the learning model according to this embodiment is sparse.

FIG. 7 illustrates the sparseness of the model created by the model creation device according to this embodiment. In the figure, each point represents a learning sample, and circled points are the learning samples whose corresponding weighting parameter is non-zero. FIG. 7(a) shows a model created by SVM, with 88 learning samples having non-zero weighting parameters (called support vectors in SVM). FIG. 7(b) shows a model created by RVM, with 4 such samples (called relevance vectors in RVM), which is sparser than SVM. FIG. 7(c) shows a model created by the model creation device according to this embodiment, with 3 such samples, so a model even sparser than that of RVM can be created. The same 300 learning samples were used for SVM, RVM, and this embodiment.

The advantages of creating a sparse model are as follows. First, learning data with a weight of 0 need not be stored, so the model can be packaged as a small runtime library and installed even on small devices with tight memory limits (such as mobile phones and PDAs). Second, identification with the model requires little computation, so identification is fast. Furthermore, it is known that the sparser a model is, the less it depends on the learning model, which gives higher generalization performance on unknown samples.

The description above assumed the linear weighting mapping function S(w) = w, but this embodiment can also use a nonlinear weighting mapping function. In that case as well, the model parameters can be determined by the EM algorithm. Using a nonlinear weighting mapping function makes it possible to handle models with various characteristics. It is also preferable to use the step function expressed by the following equation as the weighting mapping function.

With such a step function, the weighting coefficients take discrete values (for example, −1, 0, +1). Restricting the weighting coefficients to integers allows identification to run faster on a computer than when the coefficients can take real values.
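The step function itself is not reproduced in this text. The sketch below assumes a symmetric three-level step with a hypothetical threshold `theta`, which yields the example values −1, 0, +1 mentioned above. It also shows why integer coefficients are cheap: the weighted sum reduces to adding and subtracting kernel values.

```python
def step_map(w, theta=0.5):
    # Assumed three-level step: maps a real weighting parameter to -1, 0, or +1.
    if w > theta:
        return 1
    if w < -theta:
        return -1
    return 0

def weighted_sum(kernel_values, params, theta=0.5):
    # With coefficients in {-1, 0, +1} no multiplications are needed.
    total = 0.0
    for k, w in zip(kernel_values, params):
        s = step_map(w, theta)
        if s == 1:
            total += k
        elif s == -1:
            total -= k
    return total

coeffs = [step_map(w) for w in [0.9, 0.1, -0.7]]
out = weighted_sum([2.0, 5.0, 0.5], [0.9, 0.1, -0.7])
```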

<Identification process>
Like the model creation device, the identification device is implemented on a general-purpose computer such as that shown in FIG. 1. The identification device accepts data from the input device 101, performs identification based on the model stored in the memory 102, and outputs the identification result to the output device 104. The model stored in the memory 102 is the one created by the learning process described above. The identification process is realized by the CPU 103 executing a program.

The input device 101 is, for example, a device for reading data from a recording medium such as a hard disk, or a network interface for acquiring data over a network.

In this embodiment, the identification result of the model is output as a continuous value, so regression analysis is possible, and setting appropriate thresholds also enables two-class and multi-class classification.
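Thresholding the continuous output can be sketched as follows. The threshold values, the class encoding, and the function names are illustrative assumptions. The multi-class variant simply assigns the output to one of several intervals, which suits ordered classes such as age groups.

```python
import bisect

def two_class(y, threshold=0.0):
    # Returns class 1 when the model output reaches the threshold, else class 0.
    return 1 if y >= threshold else 0

def multi_class(y, boundaries):
    # boundaries must be sorted ascending; returns the index of the interval
    # that contains y, from 0 to len(boundaries).
    return bisect.bisect_right(boundaries, y)

age_boundaries = [20.0, 40.0, 60.0]  # hypothetical age-group limits
results = (two_class(0.7), two_class(-0.2),
           multi_class(5.0, age_boundaries),
           multi_class(25.0, age_boundaries),
           multi_class(60.0, age_boundaries))
```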

FIG. 1 is a diagram showing the hardware configuration of the model creation device and the identification device according to this embodiment.
FIG. 2 is a diagram showing an outline of the model created in this embodiment.
FIG. 3 is a diagram showing an outline of the model creation device according to this embodiment.
FIG. 4 is a diagram showing the functional blocks of the model creation device according to this embodiment.
FIG. 5 is a diagram showing the prior distributions used in this embodiment and in the conventional example.
FIG. 6 is a flowchart showing the flow of the process that determines the model parameters in this embodiment.
FIG. 7 is a diagram illustrating the sparseness of the models created in this embodiment and in the conventional examples.

Explanation of symbols

302 Model creation device
30 Model creation unit
31 Posterior probability calculation unit
32 Prior distribution determination unit
33 Weighting parameter determination unit
34 Learning data input unit

Claims

A model creation device for creating a model from learning data, comprising:
input means for inputting learning data; and
model creation means for creating a model expressed as a weighted sum of a plurality of basis functions,
wherein the model creation means includes:
prior distribution determining means for determining a prior distribution of weighting parameters, which control the weighting of the plurality of basis functions, as a Laplace distribution controlled by a hyperparameter; and
weighting parameter determining means for determining the weighting parameters,
wherein the weighting parameters and the hyperparameter are determined so as to maximize the posterior probability of the model, which represents the likelihood of the model calculated from the prior distribution and the learning data, and
wherein the weighting coefficients of the basis functions are determined by a step function having the weighting parameters as variables.

The model creation device according to claim 1, wherein the weighting parameters and the hyperparameter are determined by an expectation-maximization method.

An identification device for identifying input data, comprising:
data input means for inputting data;
storage means for storing a model created by the model creation device according to claim 1 or 2; and
identification means for identifying the input data with the model.

A model creation method for creating a model from learning data, in which an information processing device performs:
a step of acquiring learning data; and
a step of creating a model expressed as a weighted sum of a plurality of basis functions,
wherein the step of creating the model includes:
a step of determining a prior distribution of weighting parameters, which control the weighting of the plurality of basis functions, as a Laplace distribution controlled by a hyperparameter; and
a step of determining the weighting parameters,
wherein the weighting parameters and the hyperparameter are determined so as to maximize the posterior probability of the model, which represents the likelihood of the model calculated from the prior distribution and the learning data, and
wherein the weighting coefficients of the basis functions are determined by a step function having the weighting parameters as variables.

An identification method for identifying data, in which an information processing device holding a model created by the model creation method according to claim 4 performs:
a step of acquiring data;
a step of identifying the data based on the model; and
a step of outputting the identification result of the data.

A program for creating a model from learning data, the program causing an information processing device to execute:
a step of acquiring learning data; and
a step of creating a model expressed as a weighted sum of a plurality of basis functions,
wherein the step of creating the model includes:
a step of determining a prior distribution of weighting parameters, which control the weighting of the plurality of basis functions, as a Laplace distribution controlled by a hyperparameter; and
a step of determining the weighting parameters,
wherein the weighting parameters and the hyperparameter are determined so as to maximize the posterior probability of the model, which represents the likelihood of the model calculated from the prior distribution and the learning data, and
wherein the weighting coefficients of the basis functions are determined by a step function having the weighting parameters as variables.

A program for identifying data, the program causing an information processing device holding a model created by the program according to claim 6 to execute:
a step of acquiring data;
a step of identifying the data based on the model; and
a step of outputting the identification result of the data.