JP5071851B2

JP5071851B2 - Prediction device using time information, prediction method, prediction program, and recording medium recording the program

Info

Publication number: JP5071851B2
Application number: JP2007282843A
Authority: JP
Inventors: 具治岩田; 利幸田中
Original assignee: Kyoto University; Nippon Telegraph and Telephone Corp
Current assignee: Kyoto University; Nippon Telegraph and Telephone Corp
Priority date: 2007-10-31
Filing date: 2007-10-31
Publication date: 2012-11-14
Anticipated expiration: 2027-10-31
Also published as: JP2009110341A

Description

The present invention relates to a prediction device, a prediction method, a prediction program, and a recording medium storing the program, which predict events that will occur in the future by learning from training data representing events whose order of occurrence is given in advance.

Techniques are known for the prediction problem of predicting future events by learning from training data representing events whose order of occurrence is given in advance. This prediction problem assumes, for example, an input sample (input data) x given by Expression (101) and an output sample (output data) y given by Expression (102), where X denotes the set of input samples and Y denotes the set of output samples. The input sample x and the output sample y are not necessarily causally related. The time t at which the input/output sample (x, y) given by Expression (103) was generated is defined by Expression (104), where T denotes, for example, the latest time. In the learning stage, the prediction device learns training data given, for example, by Expression (105), where n is an index and N is the number of training data items. In the prediction stage, the prediction device predicts the output y of a sample (x, t) whose output y is unknown, as shown in Expression (106).

When this prediction problem is applied to, for example, purchase prediction on the store side, the output sample y represents the purchased product, the input sample x represents the purchase history at that point, and the time t at which the input/output sample (x, y) was generated represents the purchase time. The index n represents, for example, a purchase number or a purchaser (user), and the number of training data items N represents, for example, the number of purchases or the number of purchasers (users). Table 1 shows a specific example of data (purchase data) for purchase prediction. Here, the purchase history is taken to be the product purchased immediately before. The purchase history and the purchased product (the input/output samples) are time-series data. Note that x, y, and t are all discrete.
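As a minimal sketch of how purchase data of this kind could be arranged into (x, y, t) samples, the Python below takes the purchase history x to be the item bought immediately before, as in the text. The `Sample` type, the `build_samples` helper, the user names, and the item names are all hypothetical and are not taken from Table 1.

```python
from typing import NamedTuple

class Sample(NamedTuple):
    x: str  # purchase history: here, the item bought immediately before
    y: str  # purchased item
    t: int  # discrete purchase time

def build_samples(purchases_by_user):
    """Turn each user's (time, item) list into (x, y, t) samples."""
    samples = []
    for purchases in purchases_by_user.values():
        ordered = sorted(purchases)  # order each user's purchases by time
        # Pair every purchase with the one that preceded it.
        for (_, prev_item), (t, item) in zip(ordered, ordered[1:]):
            samples.append(Sample(x=prev_item, y=item, t=t))
    return samples

# Hypothetical purchase logs: user -> list of (time, item).
purchases_by_user = {
    "user_a": [(1, "tea"), (2, "cup"), (3, "kettle")],
    "user_b": [(2, "coffee"), (3, "cup")],
}
samples = build_samples(purchases_by_user)
```

A user's first purchase has no predecessor, so in this sketch it appears only as the history x of the following sample.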

Many such time-series data have properties that change over time. Time-series data whose properties change over time are data for which, as shown in Expression (107), the input/output (probability) distribution P at time t differs from the input/output (probability) distribution P at time t′. Purchase data has been given as one example of time-series data, but time-series data also include, for example, news article data, academic paper data, and Web surfing data.
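Expression (107) itself is not reproduced in this text. As an illustrative sketch only, the Python below estimates the empirical input/output distribution at each time from discrete (x, y, t) triples and measures how far two of them are apart. The use of total variation distance and the sample values are assumptions made for illustration; the source does not specify a distance measure.

```python
from collections import Counter, defaultdict

def empirical_by_time(samples):
    """Return {t: {(x, y): relative frequency}} for (x, y, t) triples."""
    counts = defaultdict(Counter)
    for x, y, t in samples:
        counts[t][(x, y)] += 1
    return {
        t: {pair: c / sum(counter.values()) for pair, c in counter.items()}
        for t, counter in counts.items()
    }

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)

# Hypothetical data: what follows "tea" shifts between time 1 and time 2.
samples = (
    [("tea", "cup", 1)] * 3 + [("tea", "kettle", 1)]
    + [("tea", "cup", 2)] + [("tea", "kettle", 2)] * 3
)
dist = empirical_by_time(samples)
shift = total_variation(dist[1], dist[2])
```

A distance of zero would mean the two empirical distributions coincide; any positive value indicates data of the kind the text describes as changing with time.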

To illustrate time-series data whose properties change over time, consider purchase data. In product sales forecasting, for example, sudden events such as the announcement of a new product or the discontinuation of a product occur in practice because of the business strategies and circumstances of manufacturers and stores. When such up-to-date information is reflected in the prediction, the predicted product candidates (the product candidates that are output) change from day to day. The output distribution (predicted probability distribution) of each product also changes with broad shifts in season, fashion, and the social and economic environment.

If the product that a user is likely to purchase next can be predicted with high accuracy from such purchase data, that product can be recommended to the user. Highly accurate recommendation techniques that take the purchase order into account are known, for example, for products sold through online shopping (see, for example, Non-Patent Document 1).
Non-Patent Document 1: Tomoharu Iwata, Takeshi Yamada, Naonori Ueda, "Collaborative Filtering Considering Purchase Order" (購買順序を考慮した協調フィルタリング), Artificial Intelligence and Knowledge Processing Study Group, AI2007-3, 13-18, 2007

Although the technique disclosed in Non-Patent Document 1 achieves high prediction accuracy, it does not take time information (that is, the purchase time) into account; its aim is to keep the computational cost low, not to improve prediction accuracy on the latest data. However, in prediction problems on time-series data whose properties change over time, what is often desired is high prediction accuracy on data at the latest time rather than on the training data as a whole. This is because, for data whose properties change rapidly, for example, even a model that predicts data from past times accurately may be useless for predicting the future.

The present invention has been made in view of the above problems, and its object is to provide a prediction device, a prediction method, a prediction program, and a recording medium on which the program is recorded, each capable of predicting data at the latest time with high accuracy in time-series data whose properties change over time.

The present invention was devised to solve the above problems. A prediction device according to the present invention comprises: input means for inputting series data, which is a series of learning sample data, and prediction sample data; calculation means; storage means for storing the input series data and the prediction sample data and for storing the results of processing by the calculation means; and output means for outputting the prediction result produced by the calculation means. The learning sample data is data having three elements: first observation data whose properties change with time or region; second observation data whose properties change with time or region in relation to the first observation data; and index information indicating a discrete value of the time or region information corresponding to the first and second observation data. The prediction sample data is data in which, of the three elements constituting the learning sample data, the second observation data is unknown and the first observation data and the index information are known. Under these definitions, the calculation means comprises: posterior probability estimation means for estimating the respective posterior probabilities of the first and second observation data with respect to the index information, conditioned on predetermined mixing ratios, with which the empirical distributions at the respective discrete values of the index information are mixed in order to approximate the true distribution of the series data at a predetermined discrete value of the index information, and on the first and second observation data of given learning sample data in the series data; mixing ratio estimation means for estimating the mixing ratios, using the estimated posterior probabilities, so as to maximize the likelihood of the series data at the predetermined discrete value of the index information, and for determining the weights of the index information from the mixing ratios at which the likelihood is maximized; weight writing means for writing the determined weights of the index information to the storage means; model construction means for reading the determined weights of the index information from the storage means, constructing, using the series data and an objective function based on an expected error that uses the weights, a model that predicts from the prediction sample data the unknown second observation data at the predetermined discrete value of the index information, and writing the model to the storage means; and prediction processing means for reading the constructed model and the prediction sample data from the storage means and predicting, on the basis of the constructed model and the prediction sample data, the unknown second observation data of the prediction sample data at the predetermined discrete value of the index information.

With this configuration, the prediction device constructs a model at the predetermined discrete value of the index information so as to minimize an approximation of the expected error obtained from the series data on the prediction target, and predicts the unknown second observation data at that discrete value on the basis of the constructed model and the prediction sample data. The predetermined criterion here is, for example, time or region. A discrete index based on the criterion of time indicates a specific point in time, time being characterized by its one-way flow from the past to the future, and the series data is then time-series data. The discrete values in this case can use a predetermined unit such as year, month, and day or hour, minute, and second, and the predetermined discrete value can be set to, for example, the latest time. When the predetermined criterion is region, the kinds of criterion include, for example, the local environment, the people who live there, and their way of life, and the index is then, for example, a position, a distance, or various statistics. The discrete values in this case can use position coordinates, various units of distance, predetermined units used in various statistics and trend surveys, digital notation such as "0, 1", and so on.

For example, when time is the criterion, time-series data is used, and the latest time is set as the predetermined discrete value, the prediction device estimates weights of the time information as the weights of the index information. These time weights allow information useful for learning a model that fits the data at the latest time to be drawn from past data as well as from the data at the latest time. They also allow the empirical error at the latest time to serve as an accurate approximation of the expected error at the latest time. The prediction device can therefore predict, with high accuracy, the unknown data at the latest time for a sample whose output (second observation data) is unknown, using the model constructed with the weights estimated in this way.

Furthermore, because the prediction device includes the posterior probability estimation means and the mixing ratio estimation means, the posterior probability estimation means performs the E step (Expectation step) of the EM algorithm (Expectation-Maximization algorithm) and the mixing ratio estimation means performs the M step (Maximization step), so that a globally optimal solution for the mixing ratios can be obtained and the weights can be determined from the mixing ratios thus obtained.
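The E step and M step described above can be sketched as follows for discrete (x, y) pairs. This is an illustration under stated assumptions, not the patented procedure: the function name `estimate_mixing_ratios`, the additive smoothing `alpha`, and the leave-one-out treatment of the latest-time component (used here so that the latest time does not trivially receive all the weight) are introduced for this example and are not taken from the source.

```python
import math
from collections import Counter

def estimate_mixing_ratios(pairs, times, latest, alpha=1.0, n_iter=500, tol=1e-10):
    """EM estimate of the ratios that mix per-time empirical distributions."""
    time_ids = sorted(set(times))
    n_values = len(set(pairs))
    counts = {t: Counter() for t in time_ids}
    for p, t in zip(pairs, times):
        counts[t][p] += 1
    totals = {t: sum(counts[t].values()) for t in time_ids}

    def prob(p, t, leave_out):
        # Smoothed empirical probability of pair p at index value t (assumption).
        return (counts[t][p] - leave_out + alpha) / (totals[t] - leave_out + alpha * n_values)

    # Likelihood is evaluated on the samples observed at the latest time.
    target = [p for p, t in zip(pairs, times) if t == latest]
    # comp[n][k]: probability of the n-th target sample under component k,
    # leaving that sample out of the latest-time component (assumption).
    comp = [[prob(p, t, 1 if t == latest else 0) for t in time_ids] for p in target]

    w = [1.0 / len(time_ids)] * len(time_ids)
    prev_ll = -math.inf
    for _ in range(n_iter):
        new_w = [0.0] * len(w)
        ll = 0.0
        for row in comp:
            joint = [wk * c for wk, c in zip(w, row)]
            z = sum(joint)
            ll += math.log(z)  # log-likelihood under the current ratios
            for k, j in enumerate(joint):
                # E step: posterior j / z of component k for this sample;
                # M step: the new ratio is the average posterior.
                new_w[k] += j / z / len(comp)
        w = new_w
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return dict(zip(time_ids, w))

# Hypothetical data: time 2 is the latest and resembles time 1 more than time 0.
A, B = ("tea", "cup"), ("tea", "kettle")
pairs = [A] * 10 + [B] * 9 + [A] + [B] * 2 + [A]
times = [0] * 10 + [1] * 10 + [2] * 3
weights = estimate_mixing_ratios(pairs, times, latest=2)
```

Because the component distributions are held fixed, the log-likelihood is concave in the mixing ratios, which is consistent with the text's statement that a globally optimal solution can be obtained.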

To solve the above problems, a prediction device according to the present invention may also comprise: input means for inputting series data, which is a series of learning sample data, and prediction sample data; calculation means; storage means for storing the input series data and the prediction sample data and for storing the results of processing by the calculation means; and output means for outputting the prediction result produced by the calculation means. The learning sample data is data having three elements: first observation data whose properties change with time or region; second observation data whose properties change with time or region in relation to the first observation data; and index information indicating a discrete value of the time or region information corresponding to the first and second observation data. The series data is a series of learning sample data in which the distribution of the first observation data x given the second observation data y is similar across different discrete values of the index information. The prediction sample data is data in which, of the three elements constituting the learning sample data, the second observation data is unknown and the first observation data and the index information are known. Under these definitions, the calculation means comprises: posterior probability estimation means for estimating the respective posterior probabilities of the second observation data with respect to the index information, conditioned on predetermined mixing ratios, with which the empirical distributions at the respective discrete values of the index information are mixed in order to approximate the true distribution of the series data at a predetermined discrete value of the index information, and on the second observation data of given learning sample data in the series data; mixing ratio estimation means for estimating the mixing ratios, using the estimated posterior probabilities, so as to maximize the likelihood of the series data at the predetermined discrete value of the index information, and for determining the weights of the index information from the mixing ratios at which the likelihood is maximized; weight writing means for writing the determined weights of the index information to the storage means; model construction means for reading the determined weights of the index information from the storage means, constructing, using the series data and an objective function based on an expected error that uses the weights, a model that predicts from the prediction sample data the unknown second observation data at the predetermined discrete value of the index information, and writing the model to the storage means; and prediction processing means for reading the constructed model and the prediction sample data from the storage means and predicting, on the basis of the constructed model and the prediction sample data, the unknown second observation data of the prediction sample data at the predetermined discrete value of the index information.

With a prediction device of this configuration, when the mixing ratios are estimated by the EM algorithm in order to determine the weights, the conditional distribution of the second observation data alone can be used in place of the conditional distribution of the first and second observation data, so the computation becomes comparatively simple and the processing load can be reduced.
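One way to see why the second observation data alone suffices is the short derivation below. It is a sketch that relies only on the stated condition that the distribution of x given y is similar across index values; the symbol \pi_t for the mixing ratio is notation introduced here, not taken from the source.

```latex
\begin{align}
P(x, y \mid t) &= P(x \mid y, t)\, P(y \mid t) \approx P(x \mid y)\, P(y \mid t), \\
P(t \mid x, y) &= \frac{\pi_t\, P(x, y \mid t)}{\sum_{t'} \pi_{t'}\, P(x, y \mid t')}
 \approx \frac{\pi_t\, P(x \mid y)\, P(y \mid t)}{\sum_{t'} \pi_{t'}\, P(x \mid y)\, P(y \mid t')}
 = \frac{\pi_t\, P(y \mid t)}{\sum_{t'} \pi_{t'}\, P(y \mid t')}.
\end{align}
```

The factor P(x | y) cancels between numerator and denominator, so under this assumption the posterior over index values depends on y only, and the E step needs only the distribution of y at each index value.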

In the prediction device according to the present invention, it is preferable that the model construction means comprises: model parameter estimation means for estimating a model at the predetermined discrete value by learning the model from the series data and the estimated weights so as to minimize an objective function based on the estimated weights of the index information, a hyperparameter of the weights, and the error function in the approximated expected error; and hyperparameter estimation means for estimating the hyperparameter so as to minimize the error function applied to the estimated model, and that the model parameter estimation means determines, as the model to be constructed, the model estimated using the hyperparameter at which the error function is minimized.

With this configuration, because the model construction means includes the model parameter estimation means and the hyperparameter estimation means, the hyperparameter that minimizes the error function stabilizes the weights estimated by the weight estimation means, and an accurate model can be constructed.
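The interplay of model parameter estimation and hyperparameter estimation can be sketched as below. Everything specific in this example is an assumption made for illustration: the source does not give the form of the hyperparameter in this section, so the sketch treats it as an exponent `lam` that tempers each time weight, uses a smoothed weighted-count model of P(y | x), takes average negative log-likelihood as the error function, and evaluates it on a small hypothetical held-out set from the latest time.

```python
import math
from collections import defaultdict

def fit(samples, time_weight, lam, alpha=0.1):
    """Weighted-count model of P(y | x); each sample counts time_weight[t] ** lam."""
    counts = defaultdict(lambda: defaultdict(float))
    labels = set()
    for x, y, t in samples:
        counts[x][y] += time_weight[t] ** lam
        labels.add(y)
    return {"counts": counts, "labels": labels, "alpha": alpha}

def predict_proba(model, x, y):
    """Smoothed probability of output y given input x."""
    row = model["counts"].get(x, {})
    total = sum(row.values()) + model["alpha"] * len(model["labels"])
    return (row.get(y, 0.0) + model["alpha"]) / total

def error(model, samples):
    """Average negative log-likelihood (the error function of this sketch)."""
    return -sum(math.log(predict_proba(model, x, y)) for x, y, _ in samples) / len(samples)

def select_model(train, holdout, time_weight, grid=(0.0, 0.5, 1.0, 2.0)):
    """Fit one model per candidate lam and keep the one with the lowest error."""
    fitted = {lam: fit(train, time_weight, lam) for lam in grid}
    best = min(grid, key=lambda lam: error(fitted[lam], holdout))
    return best, fitted[best]

# Hypothetical data: "cup" followed "tea" at time 0, "kettle" at the latest time 1.
train = [("tea", "cup", 0)] * 5 + [("tea", "kettle", 1)] * 3
holdout = [("tea", "kettle", 1)] * 2
time_weight = {0: 0.1, 1: 0.9}  # e.g. weights obtained from mixing ratios
best_lam, model = select_model(train, holdout, time_weight)
```

With `lam = 0` every sample counts equally and the older, more numerous data dominates; larger values lean harder on the time weights. The grid search stands in for whatever hyperparameter estimation the embodiment actually uses.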

To solve the above problems, a prediction method according to the present invention is a prediction method for a prediction device comprising: input means for inputting series data, which is a series of learning sample data, and prediction sample data; calculation means; storage means for storing the input series data and the prediction sample data and for storing the results of processing by the calculation means; and output means for outputting the prediction result produced by the calculation means. The learning sample data is data having three elements: first observation data whose properties change with time or region; second observation data whose properties change with time or region in relation to the first observation data; and index information indicating a discrete value of the time or region information corresponding to the first and second observation data. The prediction sample data is data in which, of the three elements constituting the learning sample data, the second observation data is unknown and the first observation data and the index information are known. Under these definitions, the calculation means of the prediction device executes: a step of estimating the respective posterior probabilities of the first and second observation data with respect to the index information, conditioned on predetermined mixing ratios, with which the empirical distributions at the respective discrete values of the index information are mixed in order to approximate the true distribution of the series data at a predetermined discrete value of the index information, and on the first and second observation data of given learning sample data in the series data; a step of estimating the mixing ratios, using the estimated posterior probabilities, so as to maximize the likelihood of the series data at the predetermined discrete value of the index information, and determining the weights of the index information from the mixing ratios at which the likelihood is maximized; a step of writing the determined weights of the index information to the storage means; a model construction step of reading the determined weights of the index information from the storage means, constructing, using the series data and an objective function based on an expected error that uses the weights, a model that predicts from the prediction sample data the unknown second observation data at the predetermined discrete value of the index information, and writing the model to the storage means; and a prediction processing step of reading the constructed model and the prediction sample data from the storage means and predicting, on the basis of the constructed model and the prediction sample data, the unknown second observation data of the prediction sample data at the predetermined discrete value of the index information.
To solve the above problems, a prediction method according to the present invention may also be a prediction method for a prediction device comprising: input means for inputting series data, which is a series of learning sample data, and prediction sample data; calculation means; storage means for storing the input series data and the prediction sample data and for storing the results of processing by the calculation means; and output means for outputting the prediction result produced by the calculation means. The learning sample data is data having three elements: first observation data whose properties change with time or region; second observation data whose properties change with time or region in relation to the first observation data; and index information indicating a discrete value of the time or region information corresponding to the first and second observation data. The series data is a series of learning sample data in which the distribution of the first observation data x given the second observation data y is similar across different discrete values of the index information. The prediction sample data is data in which, of the three elements constituting the learning sample data, the second observation data is unknown and the first observation data and the index information are known. Under these definitions, the calculation means of the prediction device executes: a step of estimating the respective posterior probabilities of the second observation data with respect to the index information, conditioned on predetermined mixing ratios, with which the empirical distributions at the respective discrete values of the index information are mixed in order to approximate the true distribution of the series data at a predetermined discrete value of the index information, and on the second observation data of given learning sample data in the series data; a step of estimating the mixing ratios, using the estimated posterior probabilities, so as to maximize the likelihood of the series data at the predetermined discrete value of the index information, and determining the weights of the index information from the mixing ratios at which the likelihood is maximized; a step of writing the determined weights of the index information to the storage means; a model construction step of reading the determined weights of the index information from the storage means, constructing, using the series data and an objective function based on an expected error that uses the weights, a model that predicts from the prediction sample data the unknown second observation data at the predetermined discrete value of the index information, and writing the model to the storage means; and a prediction processing step of reading the constructed model and the prediction sample data from the storage means and predicting, on the basis of the constructed model and the prediction sample data, the unknown second observation data of the prediction sample data at the predetermined discrete value of the index information.

With a prediction method following this procedure, the prediction device constructs a model at the predetermined discrete value of the index information so as to minimize an approximation of the expected error obtained from the series data on the prediction target, and predicts the unknown second observation data at that discrete value on the basis of the constructed model and the prediction sample data. For example, when time is the criterion, time-series data is used, and the latest time is set as the predetermined discrete value, the prediction device estimates weights of the time information as the weights of the index information. These time weights allow information useful for learning a model that fits the data at the latest time to be drawn from past data as well as from the data at the latest time. They also allow the empirical error at the latest time to serve as an accurate approximation of the expected error at the latest time. The prediction device can therefore predict, with high accuracy, the unknown data at the latest time for a sample whose output (second observation data) is unknown, using the model constructed with the weights estimated in this way.

A prediction program according to the present invention causes a computer to execute the prediction method described above. With this configuration, a computer on which the program is installed can realize each function based on the program.

A computer-readable recording medium according to the present invention has the above prediction program recorded on it. With this configuration, a computer in which the recording medium is loaded can realize each function based on the program recorded on the medium.

According to the present invention, the weights of the index information, which indicates time information or the like, allow information useful for learning a model that fits the data at a predetermined discrete value of the index information to be drawn from data other than the data at that discrete value, so that the empirical error at the predetermined discrete value can be used as an accurate approximation of the expected error. As a result, data at the latest time can be predicted with high accuracy in time-series data whose properties change over time.

The best mode for carrying out the prediction device and prediction method of the present invention (hereinafter, the "embodiment") is described in detail below. For convenience, an outline of the present invention is given first, and the embodiment is then described with reference to the drawings.

[Outline of the present invention]
The present invention concerns a prediction problem. The material to be learned is series data: a sequence, ordered by index information, of input/output samples (learning sample data) on a prediction target. Each input/output sample comprises an input sample (first observation data, on the reference side), an output sample (second observation data, on the predicted side), and index information, that is, a discrete index defined on a predetermined criterion so that changes in the properties of the output sample can be perceived discretely. By learning this series data, an unknown output sample is predicted from a sample (prediction sample data) whose input sample and index information are known but whose output sample is unknown. The present invention is particularly suited to predicting data that change with time information (time-series data). Examples of data whose properties change over time include purchase data, news article data, academic paper data, and Web-surfing data. When the data to be predicted are purchase data, the purchased item (product) need not be a physical object; it may be a service such as data. Such service data are, for example, image data, video data, or music data.

To simplify the description, this embodiment assumes that the prediction target is purchase data that change with time information. In this setting, the present invention can be described as a prediction technique that learns a model with high prediction accuracy for the latest data, rather than a model with high prediction accuracy over all the learning data. Such learning requires an accurate estimate of the expected error to be minimized at the latest time. In outline, the present invention therefore performs two processes: a process that estimates weights for the times (time information) at which the samples were generated (weight estimation process), and a process that constructs a prediction model incorporating the estimated weights in a stable manner (model construction process). The principles of the weight estimation process and the model construction process are described in turn below.

<Principle of the weight estimation process>
The expected error E_T for the data (samples) at the latest time is given by Equation (1). The unit of time is arbitrary: the latest time may mean the latest hour, minute, and second, or the latest day, month, year, and so on. When learning a model with high prediction accuracy for the latest data, a model that minimizes the expected error E_T of Equation (1) is desirable. In Equation (1), T denotes the latest time, and J(x, y; M) denotes the error function of the prediction model M (hereinafter simply "model") given a sample (x, y). P(x, y; T) denotes the probability distribution (hereinafter simply "distribution") of the sample (x, y) at the latest time. Any model can be used as the model M, for example a first-order or second-order Markov model. The input sample x is the first observation data and is also called simply the input; the output sample y is the second observation data and is also called simply the output.

Candidates for the error function J(x, y; M) include the negative log-likelihood of Equation (2) and the 0-1 loss function of Equation (3). In what follows, logarithms are natural logarithms, that is, the base of log is e.
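The two error functions named above can be sketched as follows. Equations (2) and (3) are not reproduced in this text, so the forms below are the standard definitions and are an assumption, as is the representation of the model as a nested dictionary `model[x][y]` holding the conditional probability of output y given input x.

```python
import math

# Hypothetical model representation: model[x][y] = P(y | x; M).

def negative_log_likelihood(model, x, y):
    """J(x, y; M) = -log P(y | x; M), using the natural logarithm."""
    return -math.log(model[x][y])

def zero_one_loss(model, x, y):
    """J(x, y; M) = 0 if the most probable output equals y, otherwise 1."""
    predicted = max(model[x], key=model[x].get)
    return 0 if predicted == y else 1

example_model = {"a": {"b": 0.8, "c": 0.2}}
nll = negative_log_likelihood(example_model, "a", "b")
miss = zero_one_loss(example_model, "a", "c")
```

The negative log-likelihood penalizes low predicted probability smoothly, while the 0-1 loss only records whether the top prediction was right.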

In the limit of infinitely many samples at the latest time, the empirical error at the latest time coincides with the expected error E_T at the latest time. However, learning from the latest-time data alone uses fewer samples than learning from all the data, so an appropriate model may not be obtained. Learning from the latest-time data alone is therefore undesirable. Moreover, past data from before the latest time can be expected to contain some information that is useful for learning a model fitted to the latest-time data. In the weight estimation process, the model M is therefore learned so as to minimize the error E(M), which is obtained by weighting the error function J(x, y; M) of the model M for each sample (x, y) by the weight of the time at which that sample was generated. This error E(M) is given by Equation (4). In Equation (4), w(t_n) is the weight at time t_n; it expresses how informative the data at time t_n are for learning the model at the latest time T (the model that is later constructed from the weights).
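As a minimal sketch of the weighted error described above, the function below sums the per-sample error multiplied by the weight of the sample's time. It assumes that Equation (4) is the plain weighted sum over all N learning samples; the names `samples`, `weights`, and `loss` are illustrative and do not come from the patent.

```python
def weighted_error(samples, weights, loss):
    """Sum of w(t_n) * J(x_n, y_n; M) over all learning samples.

    samples: iterable of (x, y, t) tuples
    weights: dict mapping a time t to its weight w(t)
    loss:    callable (x, y) -> error of the current model on that sample
    """
    return sum(weights[t] * loss(x, y) for x, y, t in samples)

# Toy data: the model is wrong only on the sample from time 2.
toy_samples = [("a", "b", 1), ("a", "c", 2)]
toy_weights = {1: 0.25, 2: 0.75}
toy_loss = lambda x, y: 1.0 if y == "c" else 0.0
toy_error = weighted_error(toy_samples, toy_weights, toy_loss)
```

Because the error at time 2 carries the larger weight, a mistake there costs more than the same mistake at time 1.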

The justification for using Equation (4) in place of Equation (1) is as follows.
First, let N(t) be the number of samples with time t, and let N(t, x, y) be the number of samples with time t, input x, and output y. The distribution (true distribution) P(x, y | T) of the sample (x, y) at the latest time T is approximated by a mixture of the empirical distributions at each time, as in Equation (5). The mixture ratios satisfying Equation (5) are given by Equation (6). Below, each individual mixture ratio is called the mixture distribution P(t). Equation (5) thus states that the true distribution P(x, y | T) at the latest time T is approximated by the sum over t of the product of the empirical distribution at time t and the mixture distribution P(t). The empirical distribution at each time is related to the sample counts N(t) and N(t, x, y) by Equation (7). The hat symbol (^) on P in Equation (7) indicates that P is an empirical distribution.

As shown in Equation (8), the weight is defined as the mixture distribution P(t) divided by the number N(t) of samples with time t. Applying the relation of Equation (7) to Equation (8) gives the weight in the form of Equation (9).
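The weight definition stated above, w(t) = P(t) / N(t), can be sketched directly. The function and variable names are illustrative, and the mixture ratios are assumed to be given (their estimation is a separate step).

```python
from collections import Counter

def weights_from_mixture(mixture, sample_times):
    """Return w(t) = P(t) / N(t) for every time t that has samples.

    mixture:      dict mapping time t to its mixture ratio P(t)
    sample_times: list with the time t_n of every learning sample
    """
    counts = Counter(sample_times)  # N(t)
    return {t: p / counts[t] for t, p in mixture.items() if counts[t] > 0}

demo_times = [1, 1, 2]                    # two samples at t=1, one at t=2
demo_mixture = {1: 0.25, 2: 0.75}
demo_weights = weights_from_mixture(demo_mixture, demo_times)
# Summed over samples, the weights recover the total mixture mass.
demo_total = sum(demo_weights[t] for t in demo_times)
```

Dividing by N(t) means that a time with many samples does not dominate merely because it is well populated; each time contributes P(t) in total.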

The error E(M) of Equation (4) can then be rewritten as in Equation (10). In Equation (10), the second line of the right-hand side follows from the definition of the sample count N(t, x, y), the third line from Equation (9), the fourth line from the approximation of Equation (5), and the fifth line from Equation (1). It follows that the error E(M) of Equation (4) approximates the expected error E_T at the latest time T.
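The equation images are not reproduced in this text, so the chain below is a reconstruction from the five steps just described, not a copy of Equation (10). It assumes discrete inputs and outputs (sums rather than integrals), the usual empirical distribution \hat{P}(x, y | t) = N(t, x, y) / N(t) for Equation (7), and w(t) = P(t) / N(t) for Equation (8).

```latex
\begin{align*}
E(M) &= \sum_{n=1}^{N} w(t_n)\, J(x_n, y_n; M) \\
     &= \sum_{t} \sum_{x, y} w(t)\, N(t, x, y)\, J(x, y; M) \\
     &= \sum_{t} \sum_{x, y} P(t)\, \hat{P}(x, y \mid t)\, J(x, y; M) \\
     &\approx \sum_{x, y} P(x, y \mid T)\, J(x, y; M) \\
     &= E_T
\end{align*}
```

The only inexact step is the fourth line, so the quality of the approximation rests entirely on how well the mixture of empirical distributions matches the true latest-time distribution.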

Next, the weight w(t) at time t is estimated. By Equation (8), estimating the weight w(t) is equivalent to estimating the mixture distribution P(t); that is, to estimate w(t) it suffices to determine the mixture ratios. The mixture ratios of Equation (6) can be estimated by maximizing the likelihood L(P) of the sample set at the latest time T, given by Equation (11). The leave-one-out method is used to avoid overfitting. The second line of the right-hand side of Equation (11) is rewritten using the relation of Equation (12). Equation (12) defines the estimate of the distribution at the latest time T obtained from Equation (5) when the n-th sample is left out.

The empirical distribution at time t with the n-th sample left out, which appears on the right-hand side of Equation (12), is given by Equation (13). In Equation (13), α is a parameter of the Dirichlet prior, introduced to avoid the zero-probability problem, and δ_{t,T} is the Kronecker delta.

Because the logarithm is a concave (upward convex) function, L(P) of Equation (11) is also concave in P, so maximizing L(P) yields the global optimum for P (written P^*). Any optimization technique may be used to maximize L(P); the EM algorithm is one example, and it is used in the description of this embodiment. The log-likelihood of each complete datum is given by Equation (14). Equation (14) is the expression that appears in the second line of the right-hand side of L(P) in Equation (11). The log-likelihood to be maximized is the complete-data log-likelihood taken over the samples n for which t_n = T.

The conditional expectation of the complete-data log-likelihood to be maximized is therefore given by Equation (15). Here, τ is the number of times (τ = 0, 1, 2, ...) the two steps, the E step (expectation step) and the M step (maximization step), have been repeated. For τ = 0, the estimate takes a predetermined initial value.

In the EM algorithm, the E step evaluates the conditional expectation Q(P | P^(τ)) of Equation (15) by computing Equation (16) with the estimate P^(τ)(t) obtained from Equation (17). Equation (16) estimates, for every sample (x, y), the posterior probability of every time t. The M step then maximizes the conditional expectation Q(P | P^(τ)) of Equation (15) by computing Equation (17) with the result of Equation (16), which gives a new estimate P^(τ+1)(t) of the mixture distribution. In other words, the M step estimates the mixture ratios.

Repeating the E step and the M step until the convergence condition is satisfied yields each mixture distribution P(t), that is, the mixture ratios of Equation (6). When the conditional expectation Q(P | P^(τ)) of Equation (15) is maximized, the likelihood L(P) of Equation (11) is also maximized; that is, L(P) converges, and the convergence condition is then satisfied. In this embodiment, therefore, the E step computes Equation (16) using P^(τ)(t) estimated with Equation (17), the M step computes Equation (17) using the result of Equation (16), and each time a pair of E step and M step finishes, it is determined whether the likelihood L(P) of Equation (11) has converged. If it has not, the E step and the M step are repeated alternately until it does. Substituting the mixture distribution P(t) obtained at convergence into the weight definition of Equation (8) gives the estimated weight w(t) for the time t at which a sample (x, y) was generated.
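The EM procedure above can be sketched as follows. Equations (13), (16), and (17) are not reproduced in this text, so several details are assumptions rather than the patent's exact formulas: the leave-one-out smoothed empirical distribution is taken as (N(t, x, y) − δ_{t,T} + α) / (N(t) − δ_{t,T} + αV), with V the number of distinct (x, y) pairs in the data; the E step uses the standard mixture posterior; and the M step averages the posteriors over the latest-time samples. All function and parameter names are illustrative.

```python
import math
from collections import Counter

def estimate_mixture(samples, latest, alpha=0.1, tol=1e-8, max_iter=1000):
    """Estimate mixture ratios P(t) by EM on the latest-time samples.

    samples: list of (x, y, t); at least one sample must have t == latest.
    Returns a dict mapping each observed time t to P(t).
    """
    times = sorted({t for _, _, t in samples})
    n_t = Counter(t for _, _, t in samples)             # N(t)
    n_txy = Counter((t, x, y) for x, y, t in samples)   # N(t, x, y)
    n_pairs = len({(x, y) for x, y, _ in samples})      # assumed support size V

    def loo_prob(x, y, t):
        # Leave-one-out: the held-out sample is removed only from time T.
        d = 1 if t == latest else 0
        return (n_txy[(t, x, y)] - d + alpha) / (n_t[t] - d + alpha * n_pairs)

    rows = [[loo_prob(x, y, t) for t in times]
            for x, y, ts in samples if ts == latest]

    mix = [1.0 / len(times)] * len(times)   # uniform initial value (tau = 0)
    prev_loglik = -math.inf
    for _ in range(max_iter):
        # E step: posterior probability of each time for each latest sample.
        posteriors, loglik = [], 0.0
        for row in rows:
            joint = [p * c for p, c in zip(mix, row)]
            total = sum(joint)
            loglik += math.log(total)
            posteriors.append([j / total for j in joint])
        # M step: new mixture ratio = average posterior.
        mix = [sum(r[k] for r in posteriors) / len(posteriors)
               for k in range(len(times))]
        if loglik - prev_loglik < tol:       # likelihood has converged
            break
        prev_loglik = loglik
    return dict(zip(times, mix))

# Time 2 resembles the latest time 3; time 1 does not.
em_samples = [("a", "b", 1)] * 5 + [("c", "d", 2)] * 5 + [("c", "d", 3)] * 3
em_mixture = estimate_mixture(em_samples, latest=3)
```

In this toy run the ratio for time 2 ends up larger than the ratio for time 1, because the samples at time 2 look like the latest-time samples. The resulting ratios would then be divided by N(t) to obtain the weights w(t).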

<Principle of the model construction process>
In the weight estimation process, estimating the weight w(t) for the time t at which a sample (x, y) was generated allowed the expected error E_T for the data (samples) at the latest time T, given by Equation (1), to be approximated by the error E(M) of Equation (4). To stabilize the estimated weights w(t), the model construction process introduces a hyperparameter λ and takes the error E(M, λ) of Equation (18) as the objective function to be minimized. That is, the model M is learned so as to minimize the error E(M, λ).

Learning the hyperparameter λ on the same data used to learn the model M can cause overfitting, the phenomenon in which the generalization error on data not used for learning becomes large. The hyperparameter λ is therefore learned by K-fold cross-validation. Specifically, the full learning data D of Equation (19) are first divided into K subsets D_k as in Equation (20), where k is the index of a subset of the learning data D. Of these subsets, the subsets D_j are used for learning. As shown in Equation (21), D_j denotes the (K−1) subsets that remain when one given subset is excluded; in Equation (21), the index of the excluded subset is k and the indices of the other subsets are j. Using the subsets D_j for learning, the model that minimizes the objective function (error E(M, λ)) of Equation (18) is constructed as the model at the latest time T. The model to be constructed, which depends on the excluded subset k, is given by Equation (22).
Because K-fold cross-validation considers every one of the K subsets D_k as the excluded subset, the model of Equation (22) is learned for each index k of the excluded subset, so that K models are constructed. The hat symbol (^) on M in Equation (22) indicates that M minimizes the argument of the argmin function.

The hyperparameter λ is learned so that, for each subset D_k not used for learning (that is, each subset other than the D_j), the error function J on the latest-time (T) data in D_k is minimized, with every one of the K subsets considered in turn as the excluded subset D_k. Specifically, the hyperparameter λ that minimizes the error function J is obtained by the calculation of Equation (23). The hat symbol (^) on λ in Equation (23) indicates that λ minimizes the argument of the argmin function.
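The cross-validated choice of λ can be sketched as below. Equations (18), (22), and (23) are not reproduced in this text, so the sketch makes three assumptions that should be checked against the original: λ enters as an exponent that tempers each weight, w(t) ** lam, so that lam = 0 ignores the weights and lam = 1 uses them fully; the model is a smoothed, weighted first-order Markov (conditional frequency) model; and the argmin over λ is done by a simple search over a candidate list. All names are illustrative.

```python
import math
from collections import defaultdict

def fit_weighted_markov(samples, weights, lam):
    """Weighted conditional counts: counts[x][y] = sum of w(t) ** lam."""
    counts = defaultdict(lambda: defaultdict(float))
    outputs = set()
    for x, y, t in samples:
        counts[x][y] += weights[t] ** lam
        outputs.add(y)
    return counts, outputs

def neg_log_lik(model, x, y, beta=0.01):
    """Negative log-likelihood of y given x, with additive smoothing beta."""
    counts, outputs = model
    row = counts.get(x, {})
    vocab = len(outputs | {y})
    total = sum(row.values()) + beta * vocab
    return -math.log((row.get(y, 0.0) + beta) / total)

def select_lambda(samples, weights, latest, candidates, k_folds=3):
    """Pick the candidate whose K held-out latest-time errors sum lowest."""
    folds = [samples[i::k_folds] for i in range(k_folds)]
    best_lam, best_err = None, math.inf
    for lam in candidates:
        err = 0.0
        for k in range(k_folds):
            # Learn on the K-1 subsets other than the excluded subset k.
            train = [s for j, f in enumerate(folds) if j != k for s in f]
            model = fit_weighted_markov(train, weights, lam)
            # Evaluate only on the latest-time samples of the excluded subset.
            err += sum(neg_log_lik(model, x, y)
                       for x, y, t in folds[k] if t == latest)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Behaviour changes between time 1 and the latest time 2.
cv_samples = [("a", "x", 1)] * 6 + [("a", "y", 2)] * 6
cv_weights = {1: 0.01, 2: 0.155}
chosen = select_lambda(cv_samples, cv_weights, latest=2, candidates=[0.0, 1.0])
```

In this toy data the old and new behaviours conflict, so using the weights in full (lam = 1.0) gives the lower held-out error and is selected. Evaluating only on held-out latest-time samples mirrors the stated goal of accuracy at the latest time rather than over all the data.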

When the objective function (error E(M, λ)) of Equation (18) is minimized, it converges, and the convergence condition is then satisfied. In this embodiment, therefore, Equation (22) is computed using the hyperparameter estimated with Equation (23), Equation (23) is computed using the model obtained with Equation (22), and each time a pair of model estimation and hyperparameter estimation finishes, it is determined whether the objective function (error E(M, λ)) of Equation (18) has converged. If it has not, model estimation and hyperparameter estimation are repeated alternately until it does. The model held when the convergence condition is satisfied is taken as the estimated model at the latest time T.

[Overall configuration of the prediction device]
An embodiment of the present invention is now described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a prediction device according to the embodiment. The prediction device 1 learns input data (series data), which are a series of input/output samples (learning sample data) on a prediction target, and thereby predicts the unknown output sample of a sample whose output is unknown. As shown in FIG. 1, the prediction device 1 includes a computing means 2, an input means 3, a storage means 4, and an output means 5. The means 2 to 5 are connected to a bus line 11.

The computing means 2 is a main control device composed of, for example, a CPU (Central Processing Unit) and RAM (Random Access Memory). As shown in FIG. 1, the computing means 2 includes a weight estimation unit (weight estimation means) 21, a model construction unit (model construction means) 22, a prediction processing unit (prediction processing means) 23, and a memory 24. The units 21 to 23 are described later.

The input means 3 is composed of, for example, a keyboard, a mouse, and a disk drive device. The input means 3 receives, for example, the input data (series data) and the samples as data and stores them in the storage means 4. In this embodiment, the input data (series data) are the purchase-prediction data (purchase data) shown in Table 1 above.

The storage means 4 is composed of, for example, an ordinary hard disk device, and stores the programs and data used by the computing means 2. As programs, the storage means 4 stores a weight estimation program 41, a model construction program 42, and a prediction processing program 43 in a program storage unit 40a. The computing means 2 reads these programs 41 to 43 from the storage means 4, loads them into the memory 24, and executes them, thereby realizing the functions of the weight estimation unit 21, the model construction unit 22, and the prediction processing unit 23.

The storage means 4 also stores input data (series data) 44, weights 45, model parameters 46, and samples 47 in a data storage unit 40b. The input data (series data) 44 are data entered through the input means 3, for example the purchase data shown in Table 1 above. The weights 45 are data representing the result computed by the weight estimation unit 21 of the computing means 2. The model parameters 46 are data representing the result computed by the model construction unit 22 of the computing means 2. The samples 47 are, for example, data corresponding to the purchase data shown in Table 1 above, for which the input x and the time t are known and the output y is unknown.

The output means 5 is, for example, a graphics board (output interface) and a monitor connected to it. The monitor is composed of, for example, a liquid crystal display and shows the computation results (for example, the prediction results, such as information on the products predicted to be purchased).

The configuration of each unit of the computing means 2 is now described in detail.
<Weight estimation unit>
The configuration of the weight estimation unit 21 is described with reference to FIG. 2, a functional block diagram of the weight estimation unit. The weight estimation unit (weight estimation means) 21 estimates the weights of the time information. The quantity concerned is the expected error, obtained from the input data (series data) 44, for the input samples (first observation data) and output samples (second observation data) of the input/output samples (learning sample data) at the latest time (the predetermined discrete value of the index information). This expected error is approximated using the error function of the model given an input/output sample and the time-information weight of each input/output sample, and the weights are estimated so as to minimize the approximated expected error. As shown in FIG. 2, the weight estimation unit 21 includes an input data reading unit 211, a posterior probability estimation unit 212, a mixture ratio estimation unit 213, and a weight writing unit 214.

The input data reading unit 211 reads the input data 44 and outputs them to the posterior probability estimation unit 212.
The posterior probability estimation unit (posterior probability estimation means) 212 estimates, for each input/output sample (x, y), the posterior probability of its time information t. The estimate is conditioned on the input/output samples (x, y) in the input data 44 and on predetermined mixture ratios, by which the empirical distributions at each time are mixed to approximate the true distribution of the input data 44 at the latest time. In this embodiment, the posterior probability estimation unit 212 computes the empirical distributions of Equation (13) in advance, and estimates the posterior probabilities of all times for all samples by Equation (16).

The mixture ratio estimation unit (mixture ratio estimation means) 213 uses the posterior probabilities estimated by the posterior probability estimation unit 212 to estimate the mixture ratios so as to maximize the likelihood of the input data 44 at the latest time, and determines the weights from the mixture ratios at which the likelihood is maximized. In this embodiment, the mixture ratio estimation unit 213 estimates the mixture ratios by Equation (17). It also determines whether the likelihood of Equation (11) has converged. If it has not, the unit outputs the current mixture ratios to the posterior probability estimation unit 212; if it has, the unit calculates the weights w(t) of Equation (8) from the current mixture ratios and outputs them to the weight writing unit 214.
The weight writing unit 214 writes the weights w(t) estimated by the mixture ratio estimation unit 213 to the storage means 4 (see FIG. 1) as the weights 45. The written weights 45 are used by the model construction unit 22.

<Model construction unit>
The configuration of the model construction unit 22 is described with reference to FIG. 3, a functional block diagram of the model construction unit. The model construction unit (model construction means) 22 uses the input data (series data) 44 and an objective function based on the expected error with the time-information (index-information) weights estimated by the weight estimation unit 21. With these, it constructs a model that predicts, from a sample (prediction sample data) 47, the unknown output sample (second observation data) at the latest time (the predetermined discrete value of the index information). As shown in FIG. 3, the model construction unit 22 includes an input data reading unit 221, a weight reading unit 222, a model parameter estimation unit 223, a hyperparameter estimation unit 224, and a model parameter writing unit 225.

The input data reading unit 221 reads the input data 44 and outputs them to the model parameter estimation unit 223.
The weight reading unit 222 reads the weights 45 and outputs them to the model parameter estimation unit 223.
The model parameter estimation unit (model parameter estimation means) 223 estimates the model at the latest time by learning a model from the input data (series data) 44 and the weights 45 so as to minimize an objective function. The objective function is based on the weights 45 read by the weight reading unit 222, the hyperparameter of the weights 45, and the error function in the approximated expected error. The model parameter estimation unit 223 uses the hyperparameter estimated by the hyperparameter estimation unit 224 for model estimation. In this embodiment, it estimates the model parameters by Equation (22).

The hyperparameter estimation unit (hyperparameter estimation means) 224 estimates the hyperparameter so as to minimize the error function applied to the model estimated by the model parameter estimation unit 223. In this embodiment, the hyperparameter estimation unit 224 estimates the hyperparameter by Equation (23). It also determines whether the error of Equation (18) has converged and, until it does, keeps outputting the current hyperparameter to the model parameter estimation unit 223. The model parameter estimation unit 223 thereby determines, as the model to be constructed, the model estimated with the hyperparameter at which the error function is minimized. When the hyperparameter estimation unit 224 determines that the error of Equation (18) has converged, it outputs the model estimated at that point to the model parameter writing unit 225.
The model parameter writing unit 225 writes the estimated model parameters to the storage means 4 (see FIG. 1) as the model parameters 46. The written model parameters 46 are used by the prediction processing unit 23.

<Prediction processing unit>
The configuration of the prediction processing unit 23 will be described with reference to FIG. 4. FIG. 4 is a functional block diagram showing the configuration of the prediction processing unit. The prediction processing unit (prediction processing means) 23 predicts the unknown output sample (second observation data) of the sample 47 at the latest time (a predetermined discrete value of the index information), based on the model (model parameter 46) constructed by the model construction unit 22 and the sample (prediction sample data) 47. As shown in FIG. 4, the prediction processing unit 23 includes a sample reading unit 231, a model parameter reading unit 232, and a prediction result output unit 233.

The sample reading unit 231 reads the sample 47, whose output is unknown, and outputs it to the prediction result output unit 233.
The model parameter reading unit 232 reads the model parameter 46 and outputs it to the prediction result output unit 233.
The prediction result output unit 233 calculates the prediction of the output sample (y) using the sample 47 and the model parameter 46, and outputs the calculation result (prediction result) to the output unit 5 (see FIG. 1).

[Operation of the prediction device]
<Process flow>
The operation of the prediction device 1 shown in FIG. 1 will be described with reference to FIG. 5 (and FIG. 1 as appropriate). FIG. 5 is an explanatory diagram showing the processing flow of the prediction device. First, the prediction device 1 uses the weight estimation unit 21 to estimate the weights based on the input data 44 stored in advance in the storage unit 4 (see FIG. 1) (step S1: weight estimation step). The estimated weights are stored in the storage unit 4 as the weight 45. Next, the prediction device 1 uses the model construction unit 22 to construct a model based on the input data 44 and the weight 45 stored in the storage unit 4 (step S2: model construction step). The constructed model is stored in the storage unit 4 as the model parameter 46. Finally, the prediction device 1 uses the prediction processing unit 23 to predict, based on the model parameter 46, the output corresponding to the sample 47, which is stored in advance in the storage unit 4 and whose output is unknown (step S3: prediction processing step).

Next, the weight estimation step of step S1 and the model construction step of step S2 will be described with reference to FIG. 6 and FIG. 7, respectively (and FIGS. 1 to 5 as appropriate). FIG. 6 is a flowchart showing the weight estimation process, and FIG. 7 is a flowchart showing the model construction process.

<Weight estimation step>
In the weight estimation step of step S1, as shown in FIG. 6, the weight estimation unit 21 first reads the input data 44 from the storage unit 4 (see FIG. 1) using the input data reading unit 211 (step S11). Specifically, the input data reading unit 211 reads input data (series data) having the structure shown in Table 1 above. The weight estimation unit 21 then calculates the empirical distribution using the posterior probability estimation unit 212 (step S12); specifically, it calculates the empirical distribution given by equation (13) above. Next, the weight estimation unit 21 performs initialization using the posterior probability estimation unit 212 (step S13). Specifically, the posterior probability estimation unit 212 sets the iteration count τ of the E step and M step of the EM algorithm to 0 (τ = 0) and randomly sets the mixture distribution P(t), that is, the mixture ratio shown in equation (6) above, for example by generating random numbers.

After initialization, the weight estimation unit 21 executes the E step of the EM algorithm using the posterior probability estimation unit 212 (step S14). Specifically, the posterior probability estimation unit 212 estimates, using equation (16) above, the posterior probability of every sample (x, y) with respect to every time t. The weight estimation unit 21 then executes the M step of the EM algorithm using the mixture ratio estimation unit 213 (step S15). Specifically, the mixture ratio estimation unit 213 estimates, using equation (17) above, the mixture distribution P^(τ+1)(t), that is, the mixture ratio, at the (τ+1)-th step. Next, the weight estimation unit 21 uses the mixture ratio estimation unit 213 to determine whether the convergence condition is satisfied (step S16). Specifically, the mixture ratio estimation unit 213 determines whether the likelihood L(P) shown in equation (11) above has converged.

When the convergence condition is satisfied, that is, when the likelihood L(P) shown in equation (11) has converged (step S16: Yes), the mixture ratio estimation unit 213 calculates the weight w(t) (step S17). Specifically, it calculates the weight w(t) shown in equation (8) for every time t from the mixture distribution P(t) at convergence and the number of samples N(t). The weight estimation unit 21 then uses the weight writing unit 214 to write the weight w(t) to the storage unit 4 (see FIG. 1) as the weight 45 (step S18) and ends the process. The weight 45 represents the weight estimated for each time t in order to construct the model at the latest time T.

On the other hand, when the convergence condition is not satisfied in step S16, that is, when the likelihood L(P) shown in equation (11) has not converged (step S16: No), the weight estimation unit 21 increments the iteration count τ of the E step and M step by 1 (τ = τ + 1) (step S19) and returns to step S14.
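Steps S13 to S19 amount to an EM iteration over the mixture ratio while the per-time empirical distributions stay fixed. The following is a minimal sketch under stated assumptions, not the patented implementation: equations (8), (11), (16), and (17) are not reproduced in this section, so the updates used here are the textbook EM updates for a mixture with fixed components, and the conversion w(t) = P(t) / N(t) is an assumption. The function names and the layout of `emp` are hypothetical.

```python
import math
import random


def estimate_mixture_ratio(emp, tol=1e-10, max_iter=1000, seed=0):
    """Estimate the mixture ratio P(t) by EM.

    emp[n][t] is the (smoothed, strictly positive) empirical probability of
    the n-th sample observed at the latest time T under the empirical
    distribution of time t.
    """
    rng = random.Random(seed)
    n_times = len(emp[0])
    # Step S13: random initialisation of the mixture ratio.
    p = [rng.random() + 1e-3 for _ in range(n_times)]
    total = sum(p)
    p = [v / total for v in p]
    prev_ll = -math.inf
    for _ in range(max_iter):
        post_sum = [0.0] * n_times
        ll = 0.0
        for row in emp:
            # Step S14 (E step): posterior of each time t for this sample.
            joint = [p[t] * row[t] for t in range(n_times)]
            mix = sum(joint)
            ll += math.log(mix)
            for t in range(n_times):
                post_sum[t] += joint[t] / mix
        # Step S15 (M step): new mixture ratio is the mean posterior.
        p = [s / len(emp) for s in post_sum]
        # Step S16: stop once the log likelihood no longer improves.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return p


def weights_from_mixture(p, counts):
    """Step S17 under the ASSUMED form w(t) = P(t) / N(t)."""
    return [p_t / n_t if n_t > 0 else 0.0 for p_t, n_t in zip(p, counts)]
```

Because the components are fixed, each iteration only reweights them, so the sketch needs nothing beyond the matrix of empirical probabilities and the per-time sample counts.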

<Model construction step>
In the model construction step of step S2, as shown in FIG. 7, the model construction unit 22 first reads the input data (series data) and the weights (step S21). Specifically, the model construction unit 22 reads the input data 44 from the storage unit 4 (see FIG. 1) using the input data reading unit 221, and reads the weight 45 estimated by the weight estimation process from the storage unit 4 (see FIG. 1) using the weight reading unit 222. The input data 44 and the weights may be read in either order, or in parallel.

The model construction unit 22 then performs initialization using the model parameter estimation unit 223 (step S22). Specifically, the model parameter estimation unit 223 divides the learning data D into K subsets and sets the hyperparameter λ to 0 (λ = 0). Next, the model construction unit 22 learns the model using the model parameter estimation unit 223 (step S23). Specifically, the model parameter estimation unit 223 learns the model at the latest time T with equation (22) above, following K-fold cross-validation. The model construction unit 22 then learns the hyperparameter λ using the hyperparameter estimation unit 224 (step S24). Specifically, the hyperparameter estimation unit 224 learns the hyperparameter λ with equation (23) above. The model construction unit 22 then uses the hyperparameter estimation unit 224 to determine whether the convergence condition is satisfied (step S25). Specifically, the hyperparameter estimation unit 224 determines whether the error E(M, λ) shown in equation (18) above has converged.

When the convergence condition is satisfied, that is, when the error E(M, λ) shown in equation (18) has converged (step S25: Yes), the model construction unit 22 uses the model parameter writing unit 225 to write the model at that point to the storage unit 4 (see FIG. 1) as the model parameter 46 (step S26) and ends the process. The model parameter 46 represents the model estimated as the model at the latest time T. On the other hand, when the convergence condition is not satisfied in step S25, that is, when the error E(M, λ) shown in equation (18) has not converged (step S25: No), the model construction unit 22 returns to step S23.
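The loop of steps S22 to S25 evaluates candidate models by K-fold cross-validation. The sketch below shows only that evaluation step and is illustrative: equations (18), (22), and (23) are not reproduced in this section, so `fit` and `loss` are hypothetical callbacks standing in for the model learner and the per-sample error function, and the contiguous fold layout is an assumption.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of nearly equal size."""
    return [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]


def cross_validated_error(samples, k, fit, loss):
    """Mean held-out loss over k folds.

    fit(train_samples) returns a model; loss(model, sample) returns the
    error of that model on one held-out sample.
    """
    total = 0.0
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        model = fit(train)
        total += sum(loss(model, samples[i]) for i in held_out)
    return total / len(samples)
```

For example, with `fit` returning the mean of the training values and `loss` the squared error, `cross_validated_error([1.0, 2.0, 3.0, 4.0], 2, fit, loss)` evaluates each half against the mean of the other half.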

The prediction device 1 can also be realized by having a general-purpose computer run a prediction program that executes each of the steps described above. This program can be distributed over a communication line, or written to a recording medium such as a CD-ROM and distributed.

According to the present embodiment, by estimating the weights of the time information, the prediction device 1 draws information that is useful for learning a model fitted to the latest-time data from past data other than the latest-time data, so that the empirical error at the latest time can be used as an accurate approximation of the expected error at the latest time. The prediction device 1 can therefore predict the data at the latest time with high accuracy for time-series data whose properties change over time. As a result, for example, a user's purchasing tendency at the present time (latest time) can be predicted from the user's purchase history, and the predicted products can be recommended to the user. This improves convenience by letting the user quickly reach information on the products they want, and can increase the revenue of the product provider.

The embodiment of the present invention has been described above, but the present invention is not limited to it and can be practiced with modifications that do not depart from its gist. For example, in the weight estimation process, the approximation by equation (5) is only one example and can be changed according to the conditions; the approximation by equation (24) can be used instead. That is, when the conditional distribution P(x|y, t) of the input/output samples at time t equals the conditional distribution P(x|y, t′) at a different time t′ and only the output distribution differs over time, the mixture ratio can be determined so that the distribution P(y|T) of the output sample y at the latest time T is approximated by a mixture of the empirical distributions of the output sample y at each time. In this case, the mixture ratio satisfying equation (24) approximates the mixture ratio satisfying equation (5).

When equation (24) is used, the posterior probability estimation unit 212 of the weight estimation unit 21 of the prediction device 1 estimates the posterior probability of each output sample (y) with respect to the time information t, conditioned on the mixture ratio and on the output samples (y) in the input data 44. Specifically, the posterior probability estimation unit 212 calculates in advance the empirical distribution given by equation (13A) instead of equation (13), where N(t, y) in equation (13A) denotes the number of samples with time t and output y. In this case, the posterior probability estimation unit 212 also estimates the posterior probability of every sample with respect to every time using equation (16A) instead of equation (16), and the mixture ratio estimation unit 213 estimates the mixture ratio using equation (17A) instead of equation (17).
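As an illustration of the output-only variant, the following sketch builds a per-time empirical distribution over outputs from the counts N(t, y). It is a sketch under assumptions: equation (13A) is not reproduced in this section, so the additive smoothing form (N(t, y) + α) / (N(t) + αV) is assumed by analogy with the smoothing parameter α mentioned for equation (13), and the function name and input layout are hypothetical.

```python
from collections import Counter


def empirical_output_distribution(samples, n_outputs, alpha=1e-8):
    """Return {t: [P(y | t) for y in range(n_outputs)]}.

    samples is a list of (y, t) pairs with y in range(n_outputs).
    ASSUMED form: (N(t, y) + alpha) / (N(t) + alpha * n_outputs).
    """
    pair_counts = Counter(samples)              # N(t, y), keyed by (y, t)
    time_counts = Counter(t for _, t in samples)  # N(t)
    return {
        t: [
            (pair_counts[(y, t)] + alpha) / (n_t + alpha * n_outputs)
            for y in range(n_outputs)
        ]
        for t, n_t in time_counts.items()
    }
```

With a positive α every output keeps a nonzero probability at every time, which keeps the logarithms in the likelihood finite.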

In the present embodiment, the error function J of the model for a given sample was illustrated by equations (2) and (3) above, but any error function can be used. Likewise, although T in, for example, equation (1) above was described as the latest time, T is not limited to the latest time and can be any time.

Furthermore, the present embodiment handles time-series data whose properties change over time and treats t in, for example, equation (4) above as time information, but t may be information other than time as long as it is a discrete variable. For example, t can be regional information (such as spatial coordinates) indicating a specific place or region. Table 2 shows an example of purchase data associated with regional information t. In this way, unknown data for a target region can be predicted with high accuracy by taking data from other regions into account with appropriate weights.

The prediction device 1 is not limited to a single apparatus; its functions may be distributed across multiple apparatuses. For example, the weight estimation unit 21, the model construction unit 22, and the prediction processing unit 23 of the computing means 2 and the data storage unit 40b of the storage unit 4 may be configured as separate apparatuses. This distributes the load across the apparatuses and enables high-speed processing.

To confirm the effect of the present invention, the prediction accuracy for products was measured experimentally when the purchase history of a video distribution service and the purchase history of a comic distribution service for mobile phones were each input to the prediction device 1 according to the present embodiment.

<Settings>
The purchase history of the video distribution service (hereinafter referred to as moving image data) covers January 1, 2007 to January 31, 2007. It contained 40,282 users, 5,914 products, and 1,667,414 purchases. The purchase history of the comic distribution service for mobile phones (hereinafter referred to as comic data) covers April 1, 2005 to March 31, 2006. It contained 164,538 users, 175 products, and 1,018,741 purchases. A comic published in multiple volumes was treated as a single product.

From the moving image data and the comic data, products with one or fewer sales and users with one or fewer purchases were removed. In addition, when the same user purchased a product two or more times, the second and subsequent purchases of that product were removed from the purchase history.
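The filtering just described can be sketched as follows. This is an illustration only: the text does not state the order in which the three rules were applied or whether they were repeated until stable, so the sketch assumes a single pass that removes repeat purchases first and then applies the two thresholds. The function name and the `(user, item, time)` tuple layout are hypothetical.

```python
from collections import Counter


def clean_history(purchases):
    """Filter a list of (user, item, time) purchase records.

    ASSUMED order: (1) keep only each user's first purchase of an item,
    (2) drop items sold once or less and users with one purchase or less.
    """
    seen = set()
    deduped = []
    for user, item, t in sorted(purchases, key=lambda rec: rec[2]):
        if (user, item) not in seen:
            seen.add((user, item))
            deduped.append((user, item, t))
    item_counts = Counter(item for _, item, _ in deduped)
    user_counts = Counter(user for user, _, _ in deduped)
    return [
        (user, item, t)
        for user, item, t in deduped
        if item_counts[item] > 1 and user_counts[user] > 1
    ]
```

A single pass can leave a user or product that falls to one record only after the other filter is applied; whether the experiment iterated the filters is not stated.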

<Formulation>
A first-order Markov model was used as the model M. As the error function of the first-order Markov model, the negative log likelihood shown in equation (25) was used. In equation (25), θ_ij is a model parameter denoting the probability of purchasing product j immediately after product i (0 ≤ θ_ij ≤ 1, Σ_j θ_ij = 1). The product x_n denotes the product that a user U_n purchased immediately before purchasing the product y_n.

In this case, the error E shown in equation (18) above is rewritten as shown in equation (26). That is, E in equation (26) is the objective function to be minimized.

Equation (27) gives the model parameter θ_ij that minimizes the error function J shown in equation (25). In equation (27), β is a smoothing parameter; β = 10^-2 was used in the calculations below. V denotes the number of products. The hat symbol (^) placed over θ in equation (27) indicates that θ minimizes the error function.
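A weighted first-order Markov estimate of this kind can be sketched as follows. Equation (27) is not reproduced in this section, so the form used here, weighted transition counts with additive smoothing β over V products, is an assumption that is merely consistent with the description of β and V; the function name and the `(x, y, t)` sample layout are hypothetical.

```python
from collections import defaultdict


def fit_weighted_markov(samples, weight, n_items, beta=1e-2):
    """Return theta(i, j), the estimated probability of buying j after i.

    samples is a list of (x, y, t): previous item x, next item y, time t.
    weight[t] is the weight w(t) of time t.
    ASSUMED form: (sum of w over (i, j) pairs + beta)
                  / (sum of w over pairs starting at i + beta * n_items).
    """
    pair = defaultdict(float)
    row = defaultdict(float)
    for x, y, t in samples:
        pair[(x, y)] += weight[t]
        row[x] += weight[t]

    def theta(i, j):
        return (pair.get((i, j), 0.0) + beta) / (row.get(i, 0.0) + beta * n_items)

    return theta
```

Under this form each row of θ sums to 1 over the V products, and a previous item that never appears in the data falls back to the uniform distribution 1/V.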

(Example 1, Example 2)
In Example 1 (OurXY), the weight w(t) was set based on equation (5) above so as to approximate the joint distribution (the input/output distribution of x and y) at the latest time T. In Example 2 (OurY), the weight w(t) was set based on equation (24) above so as to approximate the output distribution of y at the latest time T. In both Example 1 (OurXY) and Example 2 (OurY), the smoothing parameter α in equation (13) above was set to α = 10^-8, and the hyperparameter λ was estimated after dividing the weight at each time t by the maximum weight, that is, with the weights scaled so that their maximum value is 1. λ was estimated by the golden section method so as to minimize the generalization error under 10-fold cross-validation, with the search interval for λ set to [0, 10].
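The golden section method mentioned here is a standard one-dimensional search. The sketch below is a generic implementation, assuming the objective is unimodal on the search interval; in the setting described, the objective would be the cross-validated error as a function of λ on [0, 10], which is represented only by a toy quadratic in the usage note. The function name and tolerance are hypothetical.

```python
import math


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Locate a minimiser of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

For instance, `golden_section_minimize(lambda lam: (lam - 3.0) ** 2, 0.0, 10.0)` narrows the interval around 3. Each iteration reuses one of the two previous evaluations, which matters when every evaluation is a full cross-validation run.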

(Comparative Example 1, Comparative Example 2)
Example 1 (OurXY) and Example 2 (OurY) were compared with the following two models (Comparative Example 1 and Comparative Example 2).
Comparative Example 1 (NoWeight): the weight does not vary by day (w(t) = 1)
Comparative Example 2 (Present): only the latest day is weighted (w(T) = 1, and w(t) = 0 for t ≠ T)
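The two baseline weightings can be written down directly. In this small sketch the function names are hypothetical, and times are assumed to be indexed 0 to n_times - 1 with the last index standing for the latest day T.

```python
def no_weight(n_times):
    """NoWeight baseline: w(t) = 1 for every day."""
    return [1.0] * n_times


def present_only(n_times):
    """Present baseline: w(T) = 1 for the last day, 0 for all earlier days."""
    return [0.0] * (n_times - 1) + [1.0]
```

Any learner that accepts a per-time weight list can be run with these two lists to reproduce the baselines alongside the estimated weights.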

<Experiment method>
In the experiment, the unit of time information was one day, so the latest time means the latest day, that is, the final day. At the first stage (day 1), the final day T was set to the distribution start date. At the next stage (day 2), T was shifted by one day to the day after the distribution start date, and T was shifted one day at a time in the same way thereafter. At each stage, that is, for each final day T, the data for that final day T were prepared using only the purchase history up to and including that day. For each day T, performance was evaluated by the perplexity L(T) shown in equation (28).

In equation (28), N_T is the number of samples on day T (the latest day T). A lower perplexity indicates higher prediction performance. For each day T, ten sets of training and test data were first created from the moving image data by 10-fold cross-validation, and the perplexity for that day T was obtained for each set. Then, as shown in equation (29), the perplexity averaged over all D days (the average perplexity) L was obtained. D in equation (29) is 31 for the moving image data. The average perplexity L was obtained for the comic data in the same way, with D = 365.
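The evaluation measure can be sketched as follows. Equations (28) and (29) are not reproduced in this section, so the sketch assumes the usual definition of perplexity, the exponential of the mean negative log probability that the model assigns to the N_T test samples of day T, and a plain arithmetic mean over the D days. The function names are hypothetical.

```python
import math


def perplexity(probs):
    """Perplexity of one day, given the model probability of each test sample.

    ASSUMED definition: exp(-(1 / N_T) * sum of log probabilities).
    """
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def average_perplexity(daily):
    """Mean of the per-day perplexities over all D days."""
    return sum(daily) / len(daily)
```

Under this definition a model that spreads probability uniformly over four products has perplexity 4, so lower values mean the model concentrates probability on what was actually purchased.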

<Experimental results>
Table 3 shows the experimental results for the examples and the comparative examples.

As shown in Table 3, Example 1 (OurXY) and Example 2 (OurY) have a lower average perplexity L than Comparative Example 1 (NoWeight) and Comparative Example 2 (Present); in other words, they have higher prediction performance. This indicates that the weighting of time information that characterizes the present invention makes it possible to learn a model that predicts the latest day's data more accurately. Comparative Example 1 (NoWeight) has low prediction accuracy because it also uses for learning data from past times whose properties differ from those of the latest day. Comparative Example 2 (Present) has low prediction accuracy because little data is available for learning.

FIG. 1 is a block diagram showing the configuration of the prediction device according to an embodiment of the present invention.
FIG. 2 is a functional block diagram showing the configuration of the weight estimation unit.
FIG. 3 is a functional block diagram showing the configuration of the model construction unit.
FIG. 4 is a functional block diagram showing the configuration of the prediction processing unit.
FIG. 5 is an explanatory diagram showing the processing flow of the prediction device.
FIG. 6 is a flowchart showing the weight estimation process.
FIG. 7 is a flowchart showing the model construction process.

Explanation of symbols

1 Prediction device
2 Computing means
3 Input means
4 Storage means
5 Output means
11 Bus line
21 Weight estimation unit (weight estimation means)
22 Model construction unit (model construction means)
23 Prediction processing unit (prediction processing means)
24 Memory
40a Program storage unit
41 Weight estimation program
42 Model construction program
43 Prediction processing program
40b Data storage unit
44 Input data (series data)
45 Weight
46 Model parameter
47 Sample (prediction sample data)
211 Input data reading unit
212 Posterior probability estimation unit (posterior probability estimation means)
213 Mixture ratio estimation unit (mixture ratio estimation means)
214 Weight writing unit
221 Input data reading unit
222 Weight reading unit
223 Model parameter estimation unit (model parameter estimation means)
224 Hyperparameter estimation unit (hyperparameter estimation means)
225 Model parameter writing unit
231 Sample reading unit
232 Model parameter reading unit
233 Prediction result output unit

Claims

Input means for inputting series data, which is a series of learning sample data, and prediction sample data;
Computing means;
Storage means for storing the input series data and the prediction sample data and storing a calculation processing result by the calculation means;
Output means for outputting a prediction result by the calculation means,
The learning sample data is data having three elements: first observation data whose properties change according to time or region; second observation data whose properties change according to time or region in relation to the first observation data; and index information indicating discrete values of the time or region information corresponding to the first and second observation data,
When the prediction sample data is data in which the second observation data is unknown and the first observation data and the index information are known among the three elements constituting the learning sample data,
The computing means is
In order to approximate the true distribution of the series data at a predetermined discrete value of the index information, a predetermined mixing ratio mixed with an empirical distribution at each discrete value of the index information, Posterior probability estimating means for estimating posterior probabilities for the index information of the first and second observation data, respectively, on the condition of the first and second observation data of predetermined learning sample data;
Using the estimated posterior probabilities, the mixture ratio is estimated so as to maximize the likelihood for the sequence data at the predetermined discrete value of the index information, and the likelihood is maximized. Mixing ratio estimating means for determining the weight of the index information from the mixing ratio when
Weight writing means for writing the weight of the determined index information into the storage means;
The weight of the determined index information is read from the storage means, the objective function based on the expected error using the weight and the series data are used, and the predetermined index information is determined from the prediction sample data. Model construction means for constructing a model for predicting the unknown second observation data at the discrete values obtained and writing the model to the storage means ;
Reading the constructed model and the prediction sample data from the storage means , and based on the constructed model and the prediction sample data, the index information in the predetermined discrete value A prediction apparatus comprising: a prediction processing unit that performs a process of predicting the unknown second observation data of the prediction sample data.
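
As an illustration of the posterior probability estimation and mixing ratio estimation recited above, the following Python sketch runs an EM-style iteration over mixture components that are held fixed. It is a minimal sketch under stated assumptions, not the claimed implementation. The function names, the precomputed matrix `dens` (the probability of each sample observed at the predetermined index value under the empirical distribution of each index value), and the per-sample weight formula pi_t / N_t are all assumptions introduced here. The weight formula follows if the true distribution is approximated by the mixture of empirical distributions and each empirical distribution places mass 1/N_t on each of its N_t samples.

```python
import numpy as np

def estimate_mixing_ratio(dens, n_iter=200, tol=1e-10):
    """Estimate mixing ratios by EM with fixed components.

    dens[n, t] is assumed to hold the probability of target sample n under
    the empirical distribution at index value t (hypothetical input; every
    row is assumed to contain at least one positive entry).
    """
    dens = np.asarray(dens, dtype=float)
    n_times = dens.shape[1]
    pi = np.full(n_times, 1.0 / n_times)      # start from a uniform mixing ratio
    for _ in range(n_iter):
        joint = dens * pi                     # pi_t * p_t(sample n)
        post = joint / joint.sum(axis=1, keepdims=True)  # posterior of t per sample
        new_pi = post.mean(axis=0)            # update that increases the likelihood
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi

def weights_from_mixing_ratio(pi, counts):
    """Per-sample weight for each index value, assumed here to be pi_t / N_t."""
    return np.asarray(pi, dtype=float) / np.asarray(counts, dtype=float)
```

If the samples at the predetermined index value are also used to build their own empirical distribution, the mixture tends to collapse onto that single index value, so a leave-one-out or held-out construction of `dens` would likely be needed in practice.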

2. A prediction device comprising:
input means for inputting series data, which is a series of learning sample data, and prediction sample data;
computing means;
storage means for storing the input series data and the prediction sample data and for storing results of computation by the computing means; and
output means for outputting a prediction result obtained by the computing means,
wherein the learning sample data is data having three elements: first observation data whose properties change with time or region; second observation data, related to the first observation data, whose properties change with time or region; and index information indicating a discrete value of the time or region information corresponding to the first and second observation data,
wherein the series data is a series of learning sample data for which the distribution of the first observation data x given the second observation data y is similar across different discrete values of the index information,
and wherein, when the prediction sample data is data in which, of the three elements constituting the learning sample data, the second observation data is unknown and the first observation data and the index information are known,
the computing means comprises:
posterior probability estimating means for estimating, given a predetermined mixing ratio with which the empirical distributions at the respective discrete values of the index information are mixed in order to approximate the true distribution of the series data at a predetermined discrete value of the index information, the respective posterior probabilities of the index information for the second observation data, conditioned on the second observation data of predetermined learning sample data in the series data;
mixing ratio estimating means for estimating, using the estimated posterior probabilities, the mixing ratio so as to maximize the likelihood of the series data at the predetermined discrete value of the index information, and for determining the weight of the index information from the mixing ratio at which the likelihood is maximized;
weight writing means for writing the determined weight of the index information into the storage means;
model construction means for reading the determined weight of the index information from the storage means, constructing, using the series data and an objective function based on the expected error computed with the weight, a model for predicting the unknown second observation data from the prediction sample data at the predetermined discrete value of the index information, and writing the model to the storage means; and
prediction processing means for reading the constructed model and the prediction sample data from the storage means and, based on the constructed model and the prediction sample data, predicting the unknown second observation data of the prediction sample data at the predetermined discrete value of the index information.
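
Where the posterior probability is conditioned on the second observation data alone, the empirical distributions only need to be taken over y. The sketch below builds, for each sample at the predetermined index value, the empirical probability of its y under every index value. It is a rough illustration that assumes y takes integer class labels in 0..n_classes-1 and that index values are integers in 0..n_times-1; the function name and the `smoothing` constant are hypothetical and do not come from the claims.

```python
import numpy as np

def label_density_matrix(y, t, target_time, n_classes, n_times, smoothing=1e-3):
    """Return dens[n, t] = empirical P(y_n | index value t) for target samples.

    y: integer labels (second observation data), t: integer index values.
    smoothing is a hypothetical additive constant that avoids zero counts.
    """
    y = np.asarray(y)
    t = np.asarray(t)
    counts = np.full((n_times, n_classes), float(smoothing))
    np.add.at(counts, (t, y), 1.0)            # count labels per index value
    p_y_given_t = counts / counts.sum(axis=1, keepdims=True)
    target_labels = y[t == target_time]       # samples at the predetermined value
    return p_y_given_t[:, target_labels].T    # shape: (n_target, n_times)
```

A matrix of this shape could then be passed to any EM routine that estimates mixing ratios over fixed components.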

3. The prediction device according to claim 1 or claim 2, wherein
the model construction means comprises:
model parameter estimation means for estimating a model at the predetermined discrete value by learning the model from the series data and the estimated weight so as to minimize an objective function based on the estimated weight of the index information, a hyperparameter of the weight, and the error function in the approximated expected error; and
hyperparameter estimation means for estimating the hyperparameter so as to minimize the error function applied to the estimated model,
and wherein the model parameter estimation means determines, as the model to be constructed, the model estimated using the hyperparameter at which the error function is minimized.
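
The alternation between model parameter estimation and hyperparameter estimation can be pictured with the following sketch. The claim does not state how the hyperparameter enters the objective, so this example assumes an exponent `lam` that flattens the per-sample weights (`w ** lam`), a linear model fitted by weighted least squares, and a squared-error function evaluated on held-out samples. All of these choices, and every function name, are assumptions made for illustration only.

```python
import numpy as np

def fit_weighted_least_squares(X, y, sample_weight, ridge=1e-8):
    """Minimize sum_n w_n * (x_n . theta - y_n)^2 (tiny ridge for stability)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(sample_weight, dtype=float)
    A = X.T @ (w[:, None] * X) + ridge * np.eye(X.shape[1])
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)

def squared_error(theta, X, y):
    """Mean squared error of the linear model on the given samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((X @ theta - y) ** 2))

def select_weight_exponent(X, y, w, X_val, y_val, candidates=(0.0, 0.5, 1.0)):
    """Pick the exponent whose fitted model has the smallest held-out error."""
    w = np.asarray(w, dtype=float)
    best = None
    for lam in candidates:
        theta = fit_weighted_least_squares(X, y, w ** lam)  # lam = 0: uniform
        err = squared_error(theta, X_val, y_val)
        if best is None or err < best[0]:
            best = (err, lam, theta)
    return best[1], best[2]
```

The returned model is the one fitted with the selected exponent, mirroring the final clause of the claim in which the model estimated at the minimizing hyperparameter is taken as the constructed model.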

4. A prediction method for a prediction device comprising:
input means for inputting series data, which is a series of learning sample data, and prediction sample data;
computing means;
storage means for storing the input series data and the prediction sample data and for storing results of computation by the computing means; and
output means for outputting a prediction result obtained by the computing means,
wherein the learning sample data is data having three elements: first observation data whose properties change with time or region; second observation data, related to the first observation data, whose properties change with time or region; and index information indicating a discrete value of the time or region information corresponding to the first and second observation data,
and wherein, when the prediction sample data is data in which, of the three elements constituting the learning sample data, the second observation data is unknown and the first observation data and the index information are known,
the computing means of the prediction device executes:
a posterior probability estimation step of estimating, given a predetermined mixing ratio with which the empirical distributions at the respective discrete values of the index information are mixed in order to approximate the true distribution of the series data at a predetermined discrete value of the index information, the respective posterior probabilities of the index information for the first and second observation data, conditioned on the first and second observation data of predetermined learning sample data;
a mixing ratio estimation step of estimating, using the estimated posterior probabilities, the mixing ratio so as to maximize the likelihood of the series data at the predetermined discrete value of the index information, and determining the weight of the index information from the mixing ratio at which the likelihood is maximized;
a weight writing step of writing the determined weight of the index information into the storage means;
a model construction step of reading the determined weight of the index information from the storage means, constructing, using the series data and an objective function based on the expected error computed with the weight, a model for predicting the unknown second observation data from the prediction sample data at the predetermined discrete value of the index information, and writing the model to the storage means; and
a prediction processing step of reading the constructed model and the prediction sample data from the storage means and, based on the constructed model and the prediction sample data, predicting the unknown second observation data of the prediction sample data at the predetermined discrete value of the index information.

5. A prediction method for a prediction device comprising:
input means for inputting series data, which is a series of learning sample data, and prediction sample data;
computing means;
storage means for storing the input series data and the prediction sample data and for storing results of computation by the computing means; and
output means for outputting a prediction result obtained by the computing means,
wherein the learning sample data is data having three elements: first observation data whose properties change with time or region; second observation data, related to the first observation data, whose properties change with time or region; and index information indicating a discrete value of the time or region information corresponding to the first and second observation data,
wherein the series data is a series of learning sample data for which the distribution of the first observation data x given the second observation data y is similar across different discrete values of the index information,
and wherein, when the prediction sample data is data in which, of the three elements constituting the learning sample data, the second observation data is unknown and the first observation data and the index information are known,
the computing means of the prediction device executes:
a posterior probability estimation step of estimating, given a predetermined mixing ratio with which the empirical distributions at the respective discrete values of the index information are mixed in order to approximate the true distribution of the series data at a predetermined discrete value of the index information, the respective posterior probabilities of the index information for the second observation data, conditioned on the second observation data of predetermined learning sample data;
a mixing ratio estimation step of estimating, using the estimated posterior probabilities, the mixing ratio so as to maximize the likelihood of the series data at the predetermined discrete value of the index information, and determining the weight of the index information from the mixing ratio at which the likelihood is maximized;
a weight writing step of writing the determined weight of the index information into the storage means;
a model construction step of reading the determined weight of the index information from the storage means, constructing, using the series data and an objective function based on the expected error computed with the weight, a model for predicting the unknown second observation data from the prediction sample data at the predetermined discrete value of the index information, and writing the model to the storage means; and
a prediction processing step of reading the constructed model and the prediction sample data from the storage means and, based on the constructed model and the prediction sample data, predicting the unknown second observation data of the prediction sample data at the predetermined discrete value of the index information.

6. A prediction program for causing a computer to execute the prediction method according to claim 4 or claim 5.

7. A computer-readable recording medium on which the prediction program according to claim 6 is recorded.