JP6660248B2 - Objective variable prediction device, method and program

Info

Publication number: JP6660248B2
Application number: JP2016100885A
Authority: JP
Inventors: Mathieu Blondel; Masakazu Ishihata; Akinori Fujino; Naonori Ueda
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-05-19
Filing date: 2016-05-19
Publication date: 2020-03-11
Anticipated expiration: 2036-05-19
Also published as: JP2017207987A

Description

The present invention relates to the field of machine learning and, more particularly, to a technique for predicting the objective variable corresponding to a feature vector by using a model estimated from training data.

In prediction techniques based on conventional statistical methods, a model representing the dependency between data and an objective variable is given as a function of a feature vector and model parameters, and the value of the objective variable for a feature vector is estimated using that model. In general, the values of the model parameters are obtained from training data for which the objective variable is known.

Factorization machines (FM) are an example of a model that can handle combinations of second-order features (Non-Patent Documents 1 and 2). Leaving aside the bias term and the first-order features, FM uses the following model function.

$$\hat{y}_{\mathrm{FM}}(x) = \sum_{j' > j} \left(P P^{\top}\right)_{jj'} x_j x_{j'} \tag{1}$$

Here, $x \in \mathbb{R}^{d}$ is the input feature vector, $P P^{\top} \in \mathbb{R}^{d \times d}$ is a low-rank second-order weight matrix (where $P \in \mathbb{R}^{d \times k}$), and $k$ is a hyperparameter. For parameter estimation, SGD, coordinate descent, MCMC, or the like can be used (Non-Patent Documents 1 and 2).

Polynomial networks (PN) are an example of a model that can handle combinations of second-order features and combinations of third-order features (Non-Patent Document 3). PN uses the following model function.

[Equation (2)]

The model function is defined in terms of a vector, a matrix, and an activation function, and is written using an inner product. For a particular form of the activation function, a sequential algorithm can be used for parameter estimation (Non-Patent Document 3).

Non-Patent Document 1: S. Rendle (2012), Factorization machines with libfm, ACM Transactions on Intelligent Systems and Technology, vol. 3, pp. 57-78
Non-Patent Document 2: S. Rendle (2010), Factorization machines, Proceedings of International Conference on Data Mining, pp. 995-1000
Non-Patent Document 3: R. Livni, S. Shalev-Shwartz, O. Shamir (2014), On the computational efficiency of training neural networks, Advances in Neural Information Processing Systems, pp. 855-863

The conventional FM and PN described above can handle only combinations of features up to the second order and the third order, respectively. An algorithm is therefore needed that can handle higher-order combinations of features and can also obtain the parameter values with a small amount of computation.

In view of the above problem, an object of the present invention is to provide a technique for predicting the objective variable corresponding to data by using a model for high-order combinations of features.

To solve the above problem, one aspect of the present invention relates to an objective variable prediction device that predicts the objective variable corresponding to data represented by a feature vector, the device comprising: an estimation unit that uses training data represented by feature vectors whose objective variables are known to estimate a first parameter represented by a vector and a second parameter represented by a matrix in a model function for combinations of m-th order features; and a prediction unit that predicts the objective variable corresponding to unknown data by using the model function with the estimated first parameter and second parameter.

Another aspect of the present invention relates to an objective variable prediction device that predicts the objective variable corresponding to data represented by a feature vector, the device comprising: an estimation unit that, for training data that is represented by feature vectors whose objective variables are known and that covers combinations of features of order (m-1) or lower, uses expanded training data obtained by expanding the order of the training data to estimate a first parameter represented by a vector and a second parameter represented by a matrix in a model function for combinations of m-th order features; and a prediction unit that predicts the objective variable corresponding to unknown data by using the model function with the estimated first parameter and second parameter.

Another aspect of the present invention relates to a method executed by an objective variable prediction device that predicts the objective variable corresponding to data represented by a feature vector, the method comprising the steps of: estimating, by using training data represented by feature vectors whose objective variables are known, a first parameter represented by a vector and a second parameter represented by a matrix in a model function for combinations of m-th order features; and predicting the objective variable corresponding to unknown data by using the model function with the estimated first parameter and second parameter.

According to the present invention, the objective variable corresponding to data can be predicted by using a model for high-order combinations of features.

FIG. 1 is a schematic diagram showing parameter estimation and objective variable prediction processing according to one embodiment of the present invention.
FIG. 2 is a block diagram showing the functional configuration of the objective variable prediction device according to one embodiment of the present invention.
FIG. 3 is a schematic diagram showing parameter estimation and objective variable prediction processing according to one embodiment of the present invention.
FIG. 4 is a schematic diagram showing parameter estimation and objective variable prediction processing according to one embodiment of the present invention.
FIG. 5 is a schematic diagram showing parameter estimation and objective variable prediction processing according to one embodiment of the present invention.
FIG. 6 is a flowchart showing parameter estimation and objective variable prediction processing according to one embodiment of the present invention.
FIG. 7 is a block diagram showing the hardware configuration of the objective variable prediction device according to one embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

The following embodiments disclose an objective variable prediction device that can process combinations of features of higher order than FM and PN. That is, the device introduces a framework that generalizes FM and PN. In outline, as shown in FIG. 1, training data represented by feature vectors x whose objective variables y are known is first used to estimate (learn) the parameters λ and P of a model function for combinations of m-th order features, following an algorithm that exploits the convexity of the objective function. The parameters λ and P can be estimated with a linear-time computational cost. After learning from the training data, the device predicts the objective variable of unknown data x by using the model function with the estimated parameters λ and P. In this way, the parameters of a model function that can handle combinations of m-th order features can be estimated with a small amount of computation, so prediction performance can be improved efficiently.

First, an objective variable prediction device according to one embodiment of the present invention is described with reference to FIGS. 2 to 5. The device predicts the objective variable corresponding to data represented by a feature vector. Specifically, given a feature vector $x$, the device predicts the objective variable by using the following model function for combinations of m-th order features with the estimated parameters.

$$\hat{y}(x) = \sum_{s=1}^{k} \lambda_s \, \mathcal{K}(p_s, x) \tag{3}$$

Here, $\lambda_s$ is the s-th element of λ, $p_s$ is the s-th column of P, $\mathcal{K}$ is a kernel, and k is a hyperparameter.
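The kernel-expansion form of the model function described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `predict` and `homogeneous_kernel` are hypothetical, the kernel used is only an example, and the layout of `P` (one column per `p_s`) follows the description of p_s as the s-th column of P.

```python
import numpy as np

def homogeneous_kernel(p, x, m=2):
    """Example kernel <p, x>^m (chosen only for illustration)."""
    return float(np.dot(p, x)) ** m

def predict(x, lam, P, kernel):
    """Kernel expansion: sum over columns p_s of P of lam[s] * kernel(p_s, x)."""
    return sum(lam[s] * kernel(P[:, s], x) for s in range(P.shape[1]))

x = np.array([1.0, 2.0])
P = np.array([[1.0, 0.0],
              [0.0, 1.0]])      # columns are p_1 and p_2
lam = np.array([0.5, -1.0])
y_hat = predict(x, lam, P, homogeneous_kernel)
print(y_hat)
```

Passing the kernel as an argument keeps the prediction code identical for any kernel that takes a column of P and a feature vector.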

FIG. 2 is a block diagram showing the functional configuration of the objective variable prediction device according to one embodiment of the present invention. As shown in FIG. 2, the objective variable prediction device 100 includes an estimation unit 110 and a prediction unit 120.

The estimation unit 110 uses training data represented by feature vectors whose objective variables are known to estimate the parameters of the model function for combinations of m-th order features shown in Equation (3). Specifically, the estimation unit 110 estimates (learns) the vector λ and the matrix P from the training data by using the two kernels described below.

The first kernel is the homogeneous polynomial kernel

$$\mathcal{H}^{m}(p, x) = \langle p, x \rangle^{m},$$

where m is the degree of the polynomial.

The second kernel is the ANOVA kernel

$$\mathcal{A}^{m}(p, x) = \sum_{j_m > \cdots > j_1} p_{j_1} x_{j_1} \cdots p_{j_m} x_{j_m}, \tag{4}$$

where $p_1, \ldots, p_d$ are the elements of the vector p and $x_1, \ldots, x_d$ are the elements of the vector x.
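For concreteness, the two kernels can be sketched directly from their textbook definitions. The function names are hypothetical, and the ANOVA kernel is written here in its standard form (a sum of products over all index subsets of size m), which is assumed to match the definition in the source.

```python
from itertools import combinations
import numpy as np

def homogeneous_kernel(p, x, m):
    """Homogeneous polynomial kernel: <p, x>^m."""
    return float(np.dot(p, x)) ** m

def anova_kernel_naive(p, x, m):
    """ANOVA kernel by direct enumeration of all index subsets of size m.

    The number of subsets grows polynomially in d, so this is only
    practical for small d and m.
    """
    z = np.asarray(p, dtype=float) * np.asarray(x, dtype=float)
    return float(sum(np.prod(z[list(idx)])
                     for idx in combinations(range(z.size), m)))

p = np.array([1.0, 2.0, 3.0])
x = np.ones(3)
h2 = homogeneous_kernel(p, x, 2)   # (1 + 2 + 3)^2
a2 = anova_kernel_naive(p, x, 2)   # 1*2 + 1*3 + 2*3
a3 = anova_kernel_naive(p, x, 3)   # 1*2*3
```

Unlike the homogeneous polynomial kernel, the ANOVA kernel never multiplies a feature by itself, so it contains only combinations of distinct features.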

When the ANOVA kernel is used, as can be seen from Equation (4), the cost of computing it directly is $O(d^{m})$, which is polynomial time. To reduce this cost, when $m \in \{2, 3\}$, the estimation unit 110 uses the following equivalent expressions.

$$\mathcal{A}^{2}(p, x) = \tfrac{1}{2}\left[\mathcal{H}^{2}(p, x) - \mathcal{D}^{2}(p, x)\right]$$

$$\mathcal{A}^{3}(p, x) = \tfrac{1}{6}\left[\mathcal{H}^{3}(p, x) - 3\,\mathcal{D}^{2,1}(p, x) + 2\,\mathcal{D}^{3}(p, x)\right]$$

Here, $\mathcal{D}^{m}(p, x) = \sum_{j=1}^{d} (p_j x_j)^{m}$ and $\mathcal{D}^{m,n}(p, x) = \mathcal{D}^{m}(p, x)\,\mathcal{D}^{n}(p, x)$. With these expressions, the cost of computing the ANOVA kernel becomes $O(d)$, that is, it is reduced to linear time. As a result, the estimation unit 110 can compute Equation (3) at a cost of $O(kd)$.
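The reduction to linear time can be checked numerically. The sketch below assumes that the equivalent expressions are the power-sum identities for elementary symmetric polynomials of degree 2 and 3 (Newton's identities), which hold for any vectors; the function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def anova_naive(p, x, m):
    """Reference implementation: enumerate every index subset of size m."""
    z = p * x
    return float(sum(np.prod(z[list(idx)])
                     for idx in combinations(range(z.size), m)))

def anova2_linear(p, x):
    """Degree 2 in O(d): ((sum z)^2 - sum z^2) / 2 with z = p * x."""
    z = p * x
    return 0.5 * (z.sum() ** 2 - (z ** 2).sum())

def anova3_linear(p, x):
    """Degree 3 in O(d): (s1^3 - 3*s1*s2 + 2*s3) / 6 with power sums s1, s2, s3."""
    z = p * x
    s1, s2, s3 = z.sum(), (z ** 2).sum(), (z ** 3).sum()
    return (s1 ** 3 - 3.0 * s1 * s2 + 2.0 * s3) / 6.0

rng = np.random.default_rng(0)
p, x = rng.normal(size=6), rng.normal(size=6)
print(anova2_linear(p, x), anova_naive(p, x, 2))
print(anova3_linear(p, x), anova_naive(p, x, 3))
```

Each linear-time version needs only a few sums over the d elements, so evaluating the model function over k columns of P costs on the order of k times d operations.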

By introducing the homogeneous polynomial kernel and the ANOVA kernel described above into Equation (3), the model functions of FM and PN given by Equations (1) and (2) can be expressed as follows.

[Equations: FM and PN written in the form of Equation (3)]

As these expressions show, FM and PN can be regarded as special cases of Equation (3).
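The special-case relationship can be illustrated numerically. The sketch below makes two assumptions that are not confirmed by the surviving text: that the FM pairwise term is the sum over j < j' of (P Pᵀ)_{jj'} x_j x_j', and that PN uses a quadratic activation applied to Pᵀx. Under those assumptions, FM equals the kernel expansion with the degree-2 ANOVA kernel and all weights set to 1, and PN equals the kernel expansion with the degree-2 homogeneous polynomial kernel. All function names are hypothetical.

```python
import numpy as np

def anova2(p, x):
    z = p * x
    return 0.5 * (z.sum() ** 2 - (z ** 2).sum())

def kernel_model(x, lam, P, kernel):
    """Kernel expansion: sum_s lam[s] * kernel(p_s, x)."""
    return sum(lam[s] * kernel(P[:, s], x) for s in range(P.shape[1]))

def fm_pairwise(x, P):
    """Assumed FM pairwise term: sum_{j < j'} (P P^T)[j, j'] * x[j] * x[j']."""
    W = P @ P.T
    d = x.size
    return sum(W[j, jp] * x[j] * x[jp]
               for j in range(d) for jp in range(j + 1, d))

def pn_quadratic(x, lam, P):
    """Assumed PN form with quadratic activation: <lam, (P^T x)^2>."""
    return float(lam @ (P.T @ x) ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=5)
P = rng.normal(size=(5, 3))
lam = rng.normal(size=3)

fm_direct = fm_pairwise(x, P)
fm_kernel = kernel_model(x, np.ones(3), P, anova2)
pn_direct = pn_quadratic(x, lam, P)
pn_kernel = kernel_model(x, lam, P, lambda p, v: np.dot(p, v) ** 2)
```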

To estimate the parameters λ and P, the estimation unit 110 minimizes the objective function formulated as follows.

[Equation (5)]

Here, the pairs $(x_i, y_i)$ are the training data for which the objective variable is known, $\ell$ denotes a twice-differentiable convex loss function, and β (> 0) is a hyperparameter. Hereafter, the matrix that collects the feature vectors of the training data is called X.

In one embodiment, when the model function uses the ANOVA kernel, the objective function of Equation (5) is multi-convex; more precisely, it is convex with respect to λ and with respect to each row of P. The estimation unit 110 can therefore obtain a local minimum by repeating the learning of λ and the learning of P until the objective function converges. For example, the estimation unit 110 can estimate the optimal parameters λ and P by computing according to Algorithm 1.

[Algorithm 1]

Here, the estimation unit 110 may learn the parameter λ by a Lasso algorithm and the parameter P by coordinate descent. Specifically, the estimation unit 110 estimates the parameter λ according to Algorithm 2.

[Algorithm 2]

Equation (13) in Algorithm 2 is a standard Lasso problem, so it can be solved by any Lasso algorithm.
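Because any Lasso solver may be used for the λ step, one possible choice is cyclic coordinate descent with soft-thresholding. The sketch below is an assumption-laden illustration, not Algorithm 2 itself: it assumes a squared loss, an L1 penalty weighted by `beta`, and a precomputed matrix `K` whose entry `K[i, s]` holds the kernel value between column p_s of P and training example x_i. The names `lasso_cd` and `soft_threshold` are hypothetical.

```python
import numpy as np

def soft_threshold(a, t):
    """Proximal operator of t * |.|."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_cd(K, y, beta, n_sweeps=100):
    """Minimize 0.5 * ||y - K @ lam||^2 + beta * ||lam||_1 by coordinate descent."""
    n, k = K.shape
    lam = np.zeros(k)
    residual = y.astype(float).copy()          # always equals y - K @ lam
    col_sq = (K ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for s in range(k):
            if col_sq[s] == 0.0:
                continue                       # an all-zero column cannot help
            # Correlation of column s with the residual that excludes lam[s].
            rho = K[:, s] @ residual + col_sq[s] * lam[s]
            new = soft_threshold(rho, beta) / col_sq[s]
            residual += K[:, s] * (lam[s] - new)
            lam[s] = new
    return lam

K = np.eye(3)                                  # orthonormal toy design
y = np.array([3.0, -0.5, 1.5])
lam = lasso_cd(K, y, beta=1.0)
print(lam)
```

With an orthonormal `K`, the solution is simply the soft-thresholded target, which makes the sparsifying effect of the L1 penalty easy to see: the middle weight is driven exactly to zero.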

The estimation unit 110 estimates the parameter P by coordinate descent; that is, the elements of P are updated cyclically. More specifically, for every column index s and every row index j, the element of P is updated as follows.

[Equation (6)]

Here, $p_{js}$ is the element in the j-th row and s-th column of P, and μ is a smoothness constant that depends on the convex loss function $\ell$. For example, the following correspondence between convex loss functions and values of μ may be used.

[Table: convex loss functions and the corresponding values of μ]

To compute Equation (6) efficiently, the quantity on which the update depends must itself be computed efficiently. With s fixed, two auxiliary quantities are maintained for every training example and are synchronized each time an element of P is updated, which reduces the cost of computing that quantity. As a result, the cost of updating all elements of P once is expressed in terms of the number of non-zero elements of the matrix X. Specifically, the estimation unit 110 can estimate P by the coordinate descent method described above according to Algorithm 3.

[Algorithm 3]
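A coordinate descent pass over P with cached per-example quantities can be sketched as follows. This is not Algorithm 3: it assumes the degree-2 ANOVA kernel, a squared loss (so the smoothness constant is taken as 1), and a penalty of (beta / 2) times the squared Frobenius norm of P, none of which is confirmed by the surviving text. Here `X` stores one training example per row, which may differ from the source's convention, and all function names are hypothetical. The sketch relies on the fact that the ANOVA kernel is affine in each single element of p.

```python
import numpy as np

def anova2_model(X, lam, P):
    """Model outputs for all rows of X with the degree-2 ANOVA kernel."""
    Z = X @ P                         # (n, k): <p_s, x_i>
    D = (X ** 2) @ (P ** 2)           # (n, k): sum_j (p_js * x_ij)^2
    return 0.5 * (Z ** 2 - D) @ lam

def objective(X, y, lam, P, beta):
    r = anova2_model(X, lam, P) - y
    return 0.5 * r @ r + 0.5 * beta * (P ** 2).sum()

def cd_sweep(X, y, lam, P, beta):
    """Update every element of P once, in place, keeping caches in sync."""
    n, d = X.shape
    y_hat = anova2_model(X, lam, P)   # cached predictions
    for s in range(P.shape[1]):
        a = X @ P[:, s]               # cached <p_s, x_i> for the current column
        for j in range(d):
            xj = X[:, j]
            # Derivative of each prediction with respect to p_js; it does not
            # depend on p_js itself because the kernel is affine in p_js.
            g = lam[s] * xj * (a - P[j, s] * xj)
            eta = g @ g + beta        # curvature of the one-dimensional problem
            grad = (y_hat - y) @ g + beta * P[j, s]
            delta = -grad / eta
            P[j, s] += delta
            a += delta * xj           # synchronize caches after the update
            y_hat += delta * g
    return P

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
lam = np.ones(3)
P = 0.1 * rng.normal(size=(4, 3))
history = [objective(X, y, lam, P, beta=0.1)]
for _ in range(5):
    cd_sweep(X, y, lam, P, beta=0.1)
    history.append(objective(X, y, lam, P, beta=0.1))
print(history)
```

Under these assumptions each single-element update is the exact minimizer of a one-dimensional quadratic, so the objective cannot increase from one pass to the next.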

In another embodiment, when the model function uses the homogeneous polynomial kernel, the objective function of Equation (5) is no longer multi-convex, and the optimization problem becomes harder. To address this, in this embodiment the estimation of the parameters λ and P is converted into a low-rank symmetric tensor estimation problem, which is formulated as the minimization of a multi-convex objective function.

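First, the homogeneous polynomial kernel is rewritten as follows.

$$\mathcal{H}^{m}(p, x) = \langle p, x \rangle^{m} = \langle p^{\otimes m}, x^{\otimes m} \rangle \tag{7}$$

Here, $p^{\otimes m} = p \otimes \cdots \otimes p$ (m factors) is a rank-1 tensor that belongs to the set of symmetric tensors of order m with d entries along each mode, $x^{\otimes m}$ is defined in the same way, and $\langle \cdot, \cdot \rangle$ applied to tensors is the tensor inner product (the sum of the element-wise products).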
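The rewriting of the homogeneous polynomial kernel as a tensor inner product can be verified on a small example. The sketch below builds the m-fold outer product explicitly, which is only feasible for small d and m; the helper name `rank_one_tensor` is hypothetical.

```python
import numpy as np

def rank_one_tensor(v, m):
    """m-fold outer product v (x) v (x) ... (x) v, a symmetric rank-1 tensor."""
    T = v
    for _ in range(m - 1):
        T = np.multiply.outer(T, v)
    return T

p = np.array([1.0, 2.0])
x = np.array([3.0, -1.0])
m = 3

kernel_value = float(np.dot(p, x)) ** m                       # <p, x>^m
tensor_value = float(np.sum(rank_one_tensor(p, m) * rank_one_tensor(x, m)))
print(kernel_value, tensor_value)
```

The explicit tensor has d to the power m entries, so this form is useful for analysis rather than for computation.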

In general, any symmetric tensor $\mathcal{W}$ of order m admits the following decomposition.

$$\mathcal{W} = \sum_{s=1}^{k} \lambda_s \, p_s^{\otimes m} \tag{8}$$

Here, k is the symmetric rank of $\mathcal{W}$.

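Using Equations (7) and (8), the model function with the homogeneous polynomial kernel can be written as the inner product of a symmetric tensor $\mathcal{W}$ with $x^{\otimes m}$.

$$\hat{y}(x) = \sum_{s=1}^{k} \lambda_s \langle p_s, x \rangle^{m} = \langle \mathcal{W}, x^{\otimes m} \rangle$$

In this way, the estimation of the parameters λ and P can be converted into the estimation of a rank-k tensor $\mathcal{W}$.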

To formulate a multi-convex objective function, the symmetric tensor to be estimated is decomposed using m matrices $U^{1}, \ldots, U^{m}$. Specifically, the tensor can be decomposed as follows.

[Equation (9)]

Here, $u_s^{t}$ denotes the s-th column of $U^{t}$. From a practical standpoint, r is set to k/m. Using Equation (9), the following is obtained.

[Equation (10)]

The cost of computing Equation (10) is [complexity expression].
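One reading consistent with the description above is that the prediction becomes a sum, over the r columns, of the product of the inner products of x with the s-th column of each of the m matrices. The sketch below assumes that form, which the surviving text does not confirm, and checks it for m = 2 against an explicitly symmetrized matrix. The function name `predict_from_factors` is hypothetical.

```python
import numpy as np

def predict_from_factors(x, Us):
    """Assumed form: sum_s prod_t <u_s^t, x>, with Us a list of m (d, r) arrays."""
    proj = np.stack([U.T @ x for U in Us])    # (m, r): <u_s^t, x>
    return float(np.prod(proj, axis=0).sum())

rng = np.random.default_rng(0)
d, r = 4, 2
U1, U2 = rng.normal(size=(d, r)), rng.normal(size=(d, r))
x = rng.normal(size=d)

y_factors = predict_from_factors(x, [U1, U2])

# For m = 2 the symmetric tensor is a symmetric matrix and the tensor
# inner product with x (x) x is the quadratic form x^T W x.
W = 0.5 * (U1 @ U2.T + U2 @ U1.T)
y_tensor = float(x @ W @ x)
print(y_factors, y_tensor)
```

In this form the prediction is linear in each matrix when the others are held fixed, which is what makes an objective built from a convex loss convex in each matrix separately.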

To estimate the m matrices $U^{1}, \ldots, U^{m}$, the estimation unit 110 minimizes the following objective function.

[Equation (11)]

Here, Equation (11) is convex with respect to each of the m matrices $U^{1}, \ldots, U^{m}$.

The estimation unit 110 applies coordinate descent to the objective function of Equation (11); that is, the elements of the m matrices $U^{1}, \ldots, U^{m}$ are updated cyclically. More specifically, for every combination of matrix index t, column index s, and row index j, the element of U is updated as follows.

[Equation (12)]

To compute Equation (12) efficiently, the quantity on which the update depends must itself be computed efficiently. With t and s fixed, if $\xi_i$ is maintained for every training example i and is synchronized each time an element of U is updated, that quantity can be computed at a reduced cost. Accordingly, the cost of updating all elements of the m matrices $U^{1}, \ldots, U^{m}$ once is [complexity expression]. Specifically, as shown in FIG. 3, the estimation unit 110 can estimate U by the coordinate descent method described above according to Algorithm 4.

[Algorithm 4]

The two kernels described above are homogeneous polynomials, so they use only combinations of m-th order features. From a practical standpoint, however, it is preferable that everything from first-order features up to combinations of m-th order features be usable. For this reason, in one embodiment, the estimation unit 110 may take training data that is represented by feature vectors whose objective variables are known and that covers combinations of features of order (m-1) or lower, expand the feature combinations of the training data, and use the expanded training data to estimate the parameters λ and P, or the parameter U, of the model function for combinations of m-th order features.

Specifically, from a practical standpoint, the order is expanded by adding a feature that is always 1 to the data, both when the parameters are estimated and when predictions are made with the estimated parameters. This enables objective variable prediction that covers everything from first-order features up to combinations of m-th order features.
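The effect of the constant feature can be seen on a small example. The sketch below appends a single feature equal to 1 and checks two facts for degree 2: the homogeneous polynomial kernel on the augmented vectors equals an inhomogeneous polynomial of the original inner product, and the ANOVA kernel on the augmented vectors equals the original degree-2 term plus a first-order term. The helper names and the weight `p0` attached to the constant feature are hypothetical.

```python
import numpy as np

def augment(x):
    """Prepend a feature that is always 1."""
    return np.concatenate(([1.0], x))

def anova2(p, x):
    z = p * x
    return 0.5 * (z.sum() ** 2 - (z ** 2).sum())

p0, p = 2.0, np.array([1.0, -1.0])    # p0 is the weight of the constant feature
x = np.array([3.0, 1.0])
p_aug = np.concatenate(([p0], p))
x_aug = augment(x)

# Homogeneous kernel on augmented data: (p0 + <p, x>)^2, which expands into
# terms of degree 0, 1 and 2 in the original features.
h2_aug = float(np.dot(p_aug, x_aug)) ** 2

# ANOVA kernel on augmented data: degree-2 term plus p0 times the degree-1 term.
a2_aug = anova2(p_aug, x_aug)
print(h2_aug, a2_aug)
```

The same augmentation must be applied at prediction time, because the learned weights include the entry that multiplies the constant feature.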

More specifically, when the ANOVA kernel is used, expanded training data, obtained by appending constant features to the feature vectors of the training data, is used in place of the original training data. At prediction time, a correspondingly expanded vector is used in place of the unknown data x. In this case, as shown in FIG. 4, the estimation unit 110 estimates the parameters λ and P according to Algorithm 1 described above, and the objective variable prediction process is performed on the expanded unknown data by using the model function with the estimated parameters λ and P.

Alternatively, when the homogeneous polynomial kernel is used, expanded training data, obtained by appending a constant feature to the feature vectors of the training data, is used in place of the original training data. At prediction time, a correspondingly expanded vector is used in place of the unknown data x. In this case, as shown in FIG. 5, the estimation unit 110 estimates the parameter U according to Algorithm 4 described above, and the objective variable prediction process is performed on the expanded unknown data by using the model function with the estimated parameter U.

Next, parameter estimation and objective variable prediction processing according to one embodiment of the present invention is described with reference to FIG. 6. The processing is executed by the objective variable prediction device 100. FIG. 6 is a flowchart showing the parameter estimation and objective variable prediction processing according to one embodiment of the present invention.

As shown in FIG. 6, in step S101, the estimation unit 110 uses training data represented by feature vectors whose objective variables are known to estimate the parameters of the model function for combinations of m-th order features.

Specifically, the estimation unit 110 uses the training data to estimate the parameters λ and P of the model function shown in Equation (3). For example, when the kernel of the model function is the ANOVA kernel, the estimation unit 110 follows Algorithm 1 and uses the training data to estimate the parameters λ and P that minimize the objective function shown in Equation (5). Alternatively, when the kernel of the model function is the homogeneous polynomial kernel, the estimation unit 110 follows Algorithm 4 and uses the training data to estimate the parameter U that minimizes the objective function shown in Equation (11).

For data covering combinations of features of order (m-1) or lower, the estimation unit 110 may expand the order of the data and use the expanded data to estimate the parameters λ and P of the model function. For example, when the kernel of the model function is the ANOVA kernel, the estimation unit 110 may expand the dimension by adding m dimensions of features that are always 1 to the training data. Alternatively, when the kernel of the model function is the homogeneous polynomial kernel, the estimation unit 110 may expand the dimension by adding one dimension of a feature that is always 1 to the training data.

In step S102, the prediction unit 120 predicts the objective variable corresponding to unknown data by using the model function with the estimated parameters. Specifically, the objective variable may be predicted by inputting the unknown data x into the model function of Equation (3) with the parameters λ and P estimated in step S101. Alternatively, the objective variable may be predicted by inputting the unknown data x into the model function of Equation (3) with the homogeneous polynomial kernel and the parameter U estimated in step S101.

For data covering combinations of features of order (m-1) or lower, the estimation unit 110 may expand the dimension of the data and predict the objective variable by using the expanded data. For example, whether the kernel of the model function is the ANOVA kernel or the homogeneous polynomial kernel, the estimation unit 110 may expand the order by adding one dimension of a feature that is always 1 to the data.

Next, the hardware configuration of the objective variable prediction device according to one embodiment of the present invention is described with reference to FIG. 7. FIG. 7 is a block diagram showing the hardware configuration of the objective variable prediction device according to one embodiment of the present invention.

As shown in FIG. 7, the objective variable prediction device 100 may typically be implemented by a server and includes, for example, a drive device 101, an auxiliary storage device 102, a memory device 103, a processor 104, an interface device 105, and a communication device 106 that are interconnected via a bus. Various computer programs, including the programs that implement the functions and processes of the objective variable prediction device 100 described above, may be provided on a recording medium 107 such as a CD-ROM (Compact Disk-Read Only Memory), a DVD (Digital Versatile Disk), or a flash memory. When the recording medium 107 storing a program is set in the drive device 101, the program is installed from the recording medium 107 into the auxiliary storage device 102 via the drive device 101. The program does not necessarily have to be installed from the recording medium 107, however, and may instead be downloaded from an external device via a network or the like. The auxiliary storage device 102 stores the installed program together with necessary files and data. When an instruction to start the program is given, the memory device 103 reads the program and data from the auxiliary storage device 102 and stores them. The processor 104 executes the functions and processes of the objective variable prediction device 100 according to the program stored in the memory device 103 and to various data such as the parameters needed to execute the program. The interface device 105 is used as a communication interface for connecting to a network or an external device. The communication device 106 executes various communication processes for communicating with a network such as the Internet. The hardware configuration described above is merely an example; the objective variable prediction device 100 is not limited to it and may be implemented by any other suitable hardware configuration.

Each unit of the objective variable prediction device 100 and steps S101 to S102 described above may be implemented by the processor 104 executing a program stored in the memory device 103 of a computer.

Although embodiments of the present invention have been described in detail above, the present invention is not limited to the specific embodiments described, and various modifications and changes are possible within the scope of the gist of the present invention set forth in the claims.

Reference Signs List
100 objective variable prediction device
110 estimation unit
120 prediction unit

Claims

1. An objective variable prediction device that predicts an objective variable corresponding to data represented by a feature vector, the device comprising:
an estimation unit that uses training data represented by feature vectors whose objective variables are known to estimate a first parameter represented by a vector and a second parameter represented by a matrix in a model function for combinations of m-th order features; and
a prediction unit that predicts the objective variable corresponding to unknown data by using the model function with the estimated first parameter and second parameter,
wherein the model function includes an ANOVA kernel,
the first parameter is learned according to a Lasso algorithm and the second parameter is learned according to a coordinate descent method,
an objective function for optimizing the model function is convex with respect to the first parameter and the second parameter, and
the estimation unit repeatedly learns the first parameter and the second parameter using the training data until the objective function converges.

An objective variable prediction device for predicting an objective variable corresponding to data represented by a feature vector, the device comprising:
an estimation unit that uses training data represented by feature vectors whose objective variables are known to estimate a first parameter represented by a vector and a second parameter represented by a matrix in a model function for combinations of m-th order features; and
a prediction unit that predicts an objective variable corresponding to unknown data using the model function based on the estimated first parameter and second parameter,
wherein:
the model function includes a homogeneous polynomial kernel;
the estimation of the first parameter and the second parameter is converted into the estimation of a symmetric tensor that can be decomposed into a third parameter represented by m matrices;
an objective function for optimizing the model function is convex with respect to the third parameter;
the estimation unit learns the third parameter using the training data by applying a coordinate descent method to the objective function; and
the prediction unit predicts the objective variable corresponding to unknown data using the model function based on the third parameter.
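
As an illustrative aside, the homogeneous polynomial kernel of degree m is (p . x) ** m. The sketch below makes an assumption that the claim does not state: each of the m matrices has k rows, and the model output is the sum over rows s of the product over t of the inner products of row s of matrix t with x. The names `homogeneous_kernel`, `predict`, and `U` are hypothetical. Under that assumption, the output is linear in any one matrix while the others are held fixed, which is the kind of structure a coordinate descent update can exploit.

```python
import numpy as np


def homogeneous_kernel(p, x, m):
    """Homogeneous polynomial kernel of degree m: (p . x) ** m."""
    return float(np.dot(p, x)) ** m


def predict(U, x):
    """Assumed model: sum_s prod_t <U[t][s], x> for a list U of m (k x d) matrices."""
    prod = np.ones(U[0].shape[0])
    for Ut in U:
        prod *= Ut @ x
    return float(prod.sum())


rng = np.random.default_rng(0)
d, k, m = 5, 3, 3
x = rng.normal(size=d)
U = [rng.normal(size=(k, d)) for _ in range(m)]

# Sharing one matrix across all m factors recovers a sum of homogeneous kernels.
shared = [U[0]] * m
same = sum(homogeneous_kernel(U[0][s], x, m) for s in range(k))
print(np.isclose(predict(shared, x), same))

# Linearity in a single matrix while the other matrices stay fixed.
V = rng.normal(size=(k, d))
lhs = predict([U[0] + V, U[1], U[2]], x)
rhs = predict(U, x) + predict([V, U[1], U[2]], x)
print(np.isclose(lhs, rhs))
```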

An objective variable prediction device for predicting an objective variable corresponding to data represented by a feature vector, the device comprising:
an estimation unit that, for training data for combinations of features of order (m-1) or lower represented by feature vectors whose objective variables are known, uses expanded training data obtained by expanding the order of the training data to estimate a first parameter represented by a vector and a second parameter represented by a matrix in a model function for combinations of m-th order features; and
a prediction unit that predicts an objective variable corresponding to unknown data using the model function based on the estimated first parameter and second parameter.

The objective variable prediction device according to claim 3, wherein:
the model function includes an ANOVA kernel;
the expanded training data is obtained by expanding the dimension of the training data by one feature;
the first parameter is learned according to a Lasso algorithm, and the second parameter is learned according to a coordinate descent method;
an objective function for optimizing the model function is convex with respect to the first parameter and the second parameter; and
the estimation unit repeatedly learns the first parameter and the second parameter using the training data until the objective function converges.
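
As an illustrative aside on expanding the training data by one feature, the sketch below assumes that the expansion appends a constant 1 to each feature vector. This is an assumption; the claim only states that the dimension is expanded by one feature. Under it, a degree-m ANOVA kernel on the expanded vectors also covers combinations of order m-1, because every index set either omits the appended feature or includes it exactly once. The names `anova_brute`, `augment`, and `q` are hypothetical.

```python
from itertools import combinations

import numpy as np


def anova_brute(p, x, m):
    """Degree-m ANOVA kernel by direct enumeration of index sets."""
    z = p * x
    return sum(np.prod(z[list(idx)]) for idx in combinations(range(len(z)), m))


def augment(v, value):
    """Append one extra entry to a vector."""
    return np.append(v, value)


p = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])
q = 0.7  # hypothetical weight paired with the appended feature
m = 2

# Kernel on the expanded vectors versus the sum of the order-m and order-(m-1) kernels.
lhs = anova_brute(augment(p, q), augment(x, 1.0), m)
rhs = anova_brute(p, x, m) + q * anova_brute(p, x, m - 1)
print(np.isclose(lhs, rhs))
```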

The objective variable prediction device according to claim 3, wherein:
the model function includes a homogeneous polynomial kernel;
the expanded training data is obtained by expanding the dimension of the training data by one feature;
the estimation of the first parameter and the second parameter is converted into the estimation of a symmetric tensor that can be decomposed into a third parameter represented by m matrices;
an objective function for optimizing the model function is convex with respect to the third parameter;
the estimation unit learns the third parameter using the training data by applying a coordinate descent method to the objective function; and
the prediction unit predicts the objective variable corresponding to unknown data using the model function based on the third parameter.

A method performed by an objective variable prediction device that predicts an objective variable corresponding to data represented by a feature vector, the method comprising:
an estimating step of using training data represented by feature vectors whose objective variables are known to estimate a first parameter represented by a vector and a second parameter represented by a matrix in a model function for combinations of m-th order features; and
a predicting step of predicting an objective variable corresponding to unknown data using the model function based on the estimated first parameter and second parameter,
wherein:
the model function includes an ANOVA kernel;
the first parameter is learned according to a Lasso algorithm, and the second parameter is learned according to a coordinate descent method;
an objective function for optimizing the model function is convex with respect to the first parameter and the second parameter; and
the estimating step repeatedly learns the first parameter and the second parameter using the training data until the objective function converges.

A method performed by an objective variable prediction device that predicts an objective variable corresponding to data represented by a feature vector, the method comprising:
an estimating step of using training data represented by feature vectors whose objective variables are known to estimate a first parameter represented by a vector and a second parameter represented by a matrix in a model function for combinations of m-th order features; and
a predicting step of predicting an objective variable corresponding to unknown data using the model function based on the estimated first parameter and second parameter,
wherein:
the model function includes a homogeneous polynomial kernel;
the estimation of the first parameter and the second parameter is converted into the estimation of a symmetric tensor that can be decomposed into a third parameter represented by m matrices;
an objective function for optimizing the model function is convex with respect to the third parameter;
the estimating step learns the third parameter using the training data by applying a coordinate descent method to the objective function; and
the predicting step predicts the objective variable corresponding to unknown data using the model function based on the third parameter.

A program for causing a computer to function as each unit of the objective variable prediction device according to any one of claims 1 to 5.