JP2007323315A

JP2007323315A - Cooperative filtering method, cooperative filtering device, cooperative filtering program and recording medium with the same program recorded thereon

Info

Publication number: JP2007323315A
Application number: JP2006152138A
Authority: JP
Inventors: Shuhei Kuwata; 修平桑田; Shuko Ueda; 修功上田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-05-31
Filing date: 2006-05-31
Publication date: 2007-12-13

Abstract

PROBLEM TO BE SOLVED: To provide a cooperative filtering method, a cooperative filtering device, a cooperative filtering program, and a recording medium with the program recorded thereon, for predicting a user's unevaluated scores with higher precision than conventional methods.

SOLUTION: In a cooperative filtering method that predicts the scores of a certain user's unevaluated items using an evaluation history containing scores that a plurality of users gave to a plurality of items as discrete values, an input data reading part 221 reads a matrix R as input data, a prediction score initial value calculation part 222 substitutes initial values for the unevaluated scores of the matrix R, a prediction score updating part 223 predicts the unevaluated scores by decreasing the KL divergence between the evaluated score group and the unevaluated score group, and a prediction score writing part 224 writes the acquired prediction scores to a storage means 4.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a collaborative filtering method, a collaborative filtering device, a collaborative filtering program, and a recording medium storing the program, which predict the scores a user has not yet given with higher accuracy than conventional techniques.

Recommendation services that select, from a huge set of items, the items a user is predicted to rate highly and recommend them are used mainly in e-commerce (for example, recommendation systems for products, rental videos, TV programs, or restaurants). Collaborative filtering is the technique of predicting the unrated scores of the user receiving recommendations (hereinafter, the active user) using not only that user's score history but also the score histories of other users.

By preferentially recommending items with high predicted scores through collaborative filtering, the burden of choosing items is reduced for the active user, and items the active user did not know about can be presented at the same time. On the item provider side, recommending appropriate items to users can stimulate sales activity and make it more efficient.

Collaborative filtering predicts the active user's scores for unrated items from the score histories of many users. For example, in history data of the scores users gave to items, such as Table 1, a record may show that the user with user ID 196 gave the item with item ID 242 a score of 3. Items with high predicted scores are then recommended preferentially. The key problem for a recommendation system is therefore how accurately it can predict unrated scores from the score history, that is, how to realize a highly accurate score prediction method.

Conventional score prediction methods fall broadly into two approaches: (i) methods based on the nearest-neighbor method (see, for example, Non-Patent Document 1) and (ii) methods based on probabilistic models (see, for example, Non-Patent Document 2). Approach (i) searches for several users whose score histories are similar to the active user's and predicts the active user's scores for unrated items from the score histories of those users. Approach (ii) first assumes a probabilistic model of users' score histories and trains it on the actual score histories; it then uses the trained model to predict the active user's scores for unrated items.

Both approaches can be said to predict using some assumption of similarity between users. Approach (i) assumes that if the active user and a searched user gave similar scores to the items they have both rated, their scores on the unrated items are similar as well.

Approach (ii) assumes that users belong to several classes with similar preferences (such classes are assumed to exist latently; such a group is hereinafter called a latent class), and that the item scores of users in the same latent class follow the same probability distribution within that class.

The similarity used by either approach can be called "local similarity", in the sense that only part of the information about users or items is used when predicting an unrated score. Because unrated scores are predicted without capturing the global characteristics of the scores as a whole, there is a significant risk that high prediction accuracy cannot be achieved.

Non-Patent Document 1: P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," in Proceedings of ACM CSCW 1994, pp. 175-186, October 1994.
Non-Patent Document 2: T. Hofmann, "Latent Semantic Models for Collaborative Filtering," ACM Transactions on Information Systems, 22(1):89-115, January 2004.

The present invention has been made in view of the above problem. Its object is to provide a collaborative filtering method, a collaborative filtering device, a collaborative filtering program, and a recording medium storing the program that predict scores with high accuracy through a score prediction method that predicts unrated scores based on "global similarity", which uses all information about users and items.

To solve the above problem, the collaborative filtering method according to claim 1 predicts the scores of a certain user's unrated items using a score history containing scores that many users gave to many items as discrete values, and comprises: an empirical score distribution calculation step that receives the score history and outputs an empirical score distribution describing the distribution of the rated scores; a predicted score distribution calculation step that substitutes predetermined initial predicted values for the unrated scores and outputs a predicted score distribution; a dissimilarity calculation step that outputs the dissimilarity between the empirical score distribution and the predicted score distribution; and a score estimation step that performs a nonlinear optimization so as to reduce the dissimilarity, successively estimating and outputting the unrated scores.

In this procedure, the empirical score distribution calculation step outputs an empirical score distribution describing the distribution of the rated scores; the predicted score distribution calculation step substitutes predetermined initial predicted values for the unrated scores and outputs a predicted score distribution; the dissimilarity calculation step outputs the dissimilarity between the two distributions; and the score estimation step estimates the unrated scores by a nonlinear optimization that reduces this dissimilarity, that is, that brings the predicted score distribution closer to the empirical score distribution.

The collaborative filtering method according to claim 2 is the method according to claim 1, wherein the empirical score distribution calculation step comprises: a step of calculating, for each user, the frequency of each discrete score value over the items that user has rated and outputting a per-user empirical score distribution; a step of calculating, for each item, the frequency of each discrete score value over the users who have rated it and outputting a per-item empirical score distribution; and a step of calculating the frequency of each discrete score value over all score data and outputting the empirical score distribution over all users and items.

In this procedure, the empirical score distribution calculation step (1) calculates, for each user, the frequency of each discrete score value over the items that user has rated and outputs a per-user empirical score distribution; (2) calculates, for each item, the frequency of each discrete score value over the users who have rated it and outputs a per-item empirical score distribution; and (3) calculates the frequency of each discrete score value over all score data and outputs the empirical score distribution over all users and items.

The collaborative filtering method according to claim 3 is the method according to claim 1, wherein the predicted score distribution calculation step comprises: a step of calculating, for each user, the frequency of each discrete score value over the scored items of that user and outputting a per-user predicted score distribution; a step of calculating, for each item, the frequency of each discrete score value over the users with scores for that item and outputting a per-item predicted score distribution; and a step of calculating the frequency of each discrete score value over all score data and outputting the predicted score distribution over all users and items.

In this procedure, the predicted score distribution calculation step (1) calculates, for each user, the frequency of each discrete score value over the scored items of that user and outputs a per-user predicted score distribution; (2) calculates, for each item, the frequency of each discrete score value over the users with scores for that item and outputs a per-item predicted score distribution; and (3) calculates the frequency of each discrete score value over all score data and outputs the predicted score distribution over all users and items.

The collaborative filtering method according to claim 4 is the method according to claim 1, wherein the dissimilarity calculation step calculates the dissimilarity as the KL divergence between the empirical score distribution and the predicted score distribution.

In this procedure, the dissimilarity calculation step obtains the dissimilarity by calculating the KL divergence between the empirical score distribution and the predicted score distribution.

The collaborative filtering method according to claim 5 is the method according to claim 1, wherein the score estimation step performs the optimization so as to minimize the sum of the Kullback-Leibler information between the empirical score distributions and the predicted score distributions.

In this procedure, the score estimation step performs the optimization so that the sum of the Kullback-Leibler information between the empirical and predicted score distributions is minimized.

The collaborative filtering device according to claim 6 predicts the scores of a certain user's unrated items using a score history containing scores that many users gave to many items as discrete values, and comprises: an empirical score distribution calculation unit that outputs, based on the score history data, an empirical score distribution describing the distribution of the rated scores; a predicted score distribution calculation unit that substitutes predetermined initial predicted values for the unrated scores and outputs a predicted score distribution; a dissimilarity calculation unit that calculates the dissimilarity between the empirical score distribution and the predicted score distribution; and a score estimation unit that performs a nonlinear optimization so as to reduce the dissimilarity, successively estimating the unrated scores.

In this configuration, the empirical score distribution calculation unit outputs an empirical score distribution describing the distribution of the rated scores; the predicted score distribution calculation unit substitutes predetermined initial predicted values for the unrated scores and outputs a predicted score distribution; the dissimilarity calculation unit outputs the dissimilarity between the two distributions; and the score estimation unit estimates the unrated scores by a nonlinear optimization that reduces this dissimilarity, that is, that brings the predicted score distribution closer to the empirical score distribution.

The collaborative filtering program according to claim 7 causes a computer to execute the collaborative filtering method according to claim 1. With this configuration, the computer can carry out the method of claim 1. That is, a collaborative filtering program that executes a collaborative filtering method based on this score prediction scheme, which predicts all target scores simultaneously based on global similarity, lets a computer perform the same functions as the collaborative filtering device based on that scheme.

The recording medium according to claim 8 records the collaborative filtering program according to claim 7. With this configuration, any desired computer can be made to execute the method of claim 1. That is, a recording medium storing a collaborative filtering program that executes a collaborative filtering method based on this score prediction scheme makes it possible to record on the medium a program that causes a computer to perform the same functions as the collaborative filtering device, which predicts all target scores simultaneously based on global similarity.

According to the present invention, scores can be predicted with high accuracy based on global similarity, which carries more information than conventional approaches: namely, the similarity between the empirical marginal score distributions of the rated data and the predicted score distributions of the unrated data to be predicted.

The present invention was devised to solve the above problem and provides a score prediction method that predicts the target scores based on global similarity.
In the score prediction method of the present invention, the per-user marginal score distributions, the per-item marginal score distributions, and the marginal score distribution of all scores are first computed from the rated data. They are obtained by counting frequencies in the actual observed values. These score distributions are called empirical marginal score distributions.

In this description, a score distribution obtained by marginalizing under a given condition is called a marginal score distribution, or simply a score distribution. The term "score" refers both to the value given and to the act of giving it (rating).

Here, a marginal score distribution is the score distribution viewed from a particular standpoint. Consider, for example, a score history represented as the matrix (table) in FIG. 1, where positive values are scores and 0 marks an unrated entry. User 1, for instance, gave items 1, 3, and 4 the scores 2, 3, and 1, respectively, and did not rate items 2 and 5. For these data, the per-user empirical marginal score distributions are the row-wise score distributions of the matrix in FIG. 2(a) ($\tilde{P}_i$, $i = 1, \dots, 6$); the per-item empirical marginal score distributions are the column-wise score distributions of that matrix ($\tilde{Q}_j$, $j = 1, \dots, 5$); and the score distribution of all scores is the score distribution of the entire matrix in FIG. 2(a) ($\tilde{S}$). For example, $\tilde{P}_1$ assigns probability 1/3 to each of the scores 1, 2, and 3.
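To make the three kinds of empirical marginal distributions concrete, the sketch below counts relative frequencies row-wise, column-wise, and over the whole matrix, treating 0 as unrated. It is a minimal illustration in Python, not the patent's implementation: the function name `empirical_marginals` is hypothetical, and only the first row of the example matrix comes from the text (FIG. 1 itself is not reproduced here); the other rows are made up.

```python
from fractions import Fraction

def empirical_marginals(R, V):
    """Return (per-row, per-column, whole-matrix) empirical score distributions.

    R is a list of rows of integer scores in {0, 1, ..., V}; 0 marks an unrated entry.
    Each distribution is a list of V exact fractions for the scores 1..V, or None
    when the row or column has no rated entries.
    """
    def hist(values):
        if not values:
            return None
        n = len(values)
        return [Fraction(values.count(k), n) for k in range(1, V + 1)]

    rows = [hist([r for r in row if r > 0]) for row in R]
    cols = [hist([row[j] for row in R if row[j] > 0]) for j in range(len(R[0]))]
    whole = hist([r for row in R for r in row if r > 0])
    return rows, cols, whole

# Row 1 follows the text (user 1 gives items 1, 3, 4 the scores 2, 3, 1); rows 2-3 are hypothetical.
R = [
    [2, 0, 3, 1, 0],
    [0, 4, 0, 5, 4],
    [3, 3, 0, 0, 2],
]
P_tilde, Q_tilde, S_tilde = empirical_marginals(R, V=5)
print(P_tilde[0])  # probability 1/3 for each of the scores 1, 2, 3, and 0 for 4 and 5
```

Exact fractions are used only so that the 1/3 values of the text are reproduced without rounding; floats work equally well.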

Next, for the unrated data to be predicted, the marginal score distributions are expressed as functions of the predicted values. For example, $P_i$, $Q_j$, and $S$ in FIG. 2(b) are examples of predicted marginal score distributions computed when predicted values are substituted for the unrated scores. The predicted values of the unrated scores are then obtained by minimizing a dissimilarity (for example, the KL divergence) between each pair of corresponding marginal score distributions ($\tilde{P}_i$ and $P_i$, $\tilde{Q}_j$ and $Q_j$, $\tilde{S}$ and $S$), so that each pair is as similar as possible.

Because the group of target scores contributes to the per-user marginal score distributions, the per-item marginal score distributions, and the score distribution of all scores, the target scores are related to one another directly or indirectly; in this sense the prediction is based on "global similarity". Higher prediction accuracy than conventional methods can therefore be expected. The same functions can also be executed by a computer. Furthermore, a recording medium storing a score prediction program that executes this scheme makes it possible to record on the medium a program that causes a computer to perform the same functions as a device that predicts all target scores simultaneously based on global similarity.

Next, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 3 is a block diagram of the collaborative filtering device 1 according to an embodiment of the present invention.
As shown in FIG. 3, the collaborative filtering device 1 comprises a computing means 2, an input means 3, a storage means 4, an output means 5, and a bus line 11 connecting these means.

The computing means 2 is the core of a computer and consists of a CPU, RAM, and the like. It includes a preprocessing unit 21, a score prediction unit 22, and a memory 23. The computing means 2 reads a preprocessing program 41 and a score prediction program 42 from the storage means 4, stores them in the memory 23, and executes them, thereby realizing the preprocessing unit 21 and the score prediction unit 22.

The input means 3 consists of a keyboard, a disk drive, and the like (not shown). The score history is entered through the input means 3 and stored in the storage means 4.

The storage means 4 consists of a hard disk drive or the like. It can store the preprocessing program 41 and the score prediction program 42, and it also holds a score history 43, input data 44, and predicted scores 45.

As shown in Table 1, the score history consists of records of user ID, item ID, score, timestamp, and so on. The score $r_{i,j}$ given by the user with user ID $i$ ($i = 1, 2, \dots, N$) to the item with item ID $j$ ($j = 1, 2, \dots, M$) takes a discrete value in $\{1, 2, \dots, V\}$; the larger the value, the better the rating.

The output means 5 is, for example, a graphics board and a monitor connected to it; it displays items recommended to the active user, among other things.

FIG. 4 is a block diagram of the preprocessing unit 21 of this embodiment.
The preprocessing unit 21 comprises a score history reading unit 211 and an input data writing unit 212.

The score history reading unit 211 reads the user IDs, item IDs, and scores from the score history 43.

The input data writing unit 212 then stores the user IDs, item IDs, and scores in the input data 44. As shown in FIG. 1, the input data 44 is a matrix whose rows correspond to user IDs, whose columns correspond to item IDs, and whose elements are scores. Entries the user has not rated contain 0. Hereinafter, the input-data matrix is denoted $R$; its $(i, j)$ element is the score $r_{i,j}$.
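The conversion performed by the input data writing unit 212 can be sketched as follows, assuming the score history is available as tuples whose first three fields are user ID, item ID, and score (any further fields, such as a timestamp, are ignored). The function name `build_rating_matrix` is hypothetical, and apart from the (196, 242, 3) record taken from the text, the records are made up.

```python
def build_rating_matrix(records):
    """Build the matrix R (rows = user IDs, columns = item IDs, 0 = unrated).

    Returns R together with the sorted user IDs and item IDs that index its rows and columns.
    If a user rated the same item more than once, the last record wins.
    """
    user_ids = sorted({rec[0] for rec in records})
    item_ids = sorted({rec[1] for rec in records})
    row_of = {u: i for i, u in enumerate(user_ids)}
    col_of = {t: j for j, t in enumerate(item_ids)}
    R = [[0] * len(item_ids) for _ in user_ids]
    for rec in records:
        user, item, score = rec[0], rec[1], rec[2]
        R[row_of[user]][col_of[item]] = score
    return R, user_ids, item_ids

# (196, 242, 3) is the example record from the text; the others are hypothetical.
records = [(196, 242, 3), (196, 302, 4), (12, 302, 5), (12, 377, 1)]
R, users, items = build_rating_matrix(records)
# users == [12, 196], items == [242, 302, 377]
# R == [[0, 5, 1], [3, 4, 0]]
```

Sorting the IDs is just one way to fix the row and column order; any stable mapping from IDs to indices would do.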

To simplify the description, the following variables are defined for the matrix $R$.
The set of rated scores $r_{i,j}$, $j \in J_i^1$, of the user with user ID $i$ is denoted $B_i^1$, and the set of that user's unrated scores $r_{i,j}$, $j \in J_i^0$, is denoted $B_i^0$. Here $J_i^1$ ($J_i^0$) is the set of indices of the items that user $i$ has rated (not rated). Likewise, for convenience, the set of rated scores for item ID $j$ in $R$ is denoted $B_j^1$, and the set of its unrated scores $B_j^0$. In other words, in $R$ the set of (un)rated scores in row $i$ is $B_i^1$ ($B_i^0$), and the set of (un)rated scores in column $j$ is $B_j^1$ ($B_j^0$). Finally, when the indices $i$ and $j$ are not relevant, the set of all rated scores in $R$ is denoted $B_*^1$, and the set of all unrated scores $B_*^0$.
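These index sets can be read straight off $R$. The sketch below is a minimal illustration using 0-based indices instead of the 1-based indices of the text; the names `J1` and `J0` (per row) and `I1` and `I0` (per column, the user indices underlying $B_j^1$ and $B_j^0$) are hypothetical labels, since the text names only the item-index sets.

```python
def index_sets(R):
    """Return rated/unrated index sets per row (J1, J0) and per column (I1, I0)."""
    N, M = len(R), len(R[0])
    J1 = [{j for j in range(M) if R[i][j] > 0} for i in range(N)]   # items user i rated
    J0 = [{j for j in range(M) if R[i][j] == 0} for i in range(N)]  # items user i did not rate
    I1 = [{i for i in range(N) if R[i][j] > 0} for j in range(M)]   # users who rated item j
    I0 = [{i for i in range(N) if R[i][j] == 0} for j in range(M)]  # users who did not rate item j
    return J1, J0, I1, I0

R = [[2, 0, 3, 1, 0],
     [0, 4, 0, 5, 4]]
J1, J0, I1, I0 = index_sets(R)
B1_row0 = [R[0][j] for j in sorted(J1[0])]  # rated scores of the first user: [2, 3, 1]
```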

FIG. 5 is a block diagram of the score prediction unit 22 according to this embodiment.
The score prediction unit 22 comprises an input data reading unit 221, a predicted score initial value calculation unit 222, a predicted score updating unit 223, and a predicted score writing unit 224.

The input data reading unit 221 reads the matrix $R$ from the input data 44.

For every unrated score to be predicted, $\{r_{i,j} \in B_*^0\}$, among the elements of the matrix $R$ read by the input data reading unit 221, the predicted score initial value calculation unit 222 substitutes some numerical value as an initial value in place of the 0 that marks an unrated entry. For example, 3 may be substituted for every $\{r_{i,j} \in B_*^0\}$ instead of 0. A value computed by some other means may also be substituted. The scores with initial values substituted are written $\{r^{(0)}_{i,j} \in B_*^0\}$.
Real values may also be substituted as initial values. In the following description, the scores $r_{i,j}$ are treated as real-valued.
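A sketch of the initialization step, assuming $R$ is a list of rows with 0 for unrated entries. `initialize_constant` uses the constant 3 mentioned in the text; `initialize_user_mean` is one hypothetical example of "a value computed by some other means" (the user's mean rated score, with a default for users who rated nothing), not something the text specifies. Both function names are illustrative.

```python
def initialize_constant(R, value=3.0):
    """Replace every unrated 0 with the same real-valued initial prediction."""
    return [[float(r) if r > 0 else float(value) for r in row] for row in R]

def initialize_user_mean(R, default=3.0):
    """Hypothetical alternative: fill a user's unrated entries with that user's mean score."""
    filled = []
    for row in R:
        rated = [r for r in row if r > 0]
        mean = sum(rated) / len(rated) if rated else float(default)
        filled.append([float(r) if r > 0 else mean for r in row])
    return filled

R = [[2, 0, 3, 1, 0],
     [0, 0, 0, 0, 0]]
print(initialize_constant(R)[0])   # [2.0, 3.0, 3.0, 1.0, 3.0]
print(initialize_user_mean(R)[0])  # [2.0, 2.0, 3.0, 1.0, 2.0]
```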

Starting from the initial values for the set of unrated scores created by the predicted score initial value calculation unit 222, the predicted score updating unit 223 calculates the set of unrated scores that minimizes an objective function.

More specifically, the following three kinds of marginal distributions are first computed for score history data of $N$ users and $M$ items, given as an $N \times M$ matrix $R$ whose $(i, j)$ element $r_{i,j} \in \{0, 1, \dots, V\}$ is user $i$'s score for item $j$ ($V$ is the highest score and 0 denotes an unrated entry):

· Row-wise (per-user) empirical marginal score distributions $\tilde{P}_i$, $i = 1, 2, \dots, N$, and predicted marginal score distributions $P_i$, $i = 1, 2, \dots, N$.
· Column-wise (per-item) empirical marginal score distributions $\tilde{Q}_j$, $j = 1, 2, \dots, M$, and predicted marginal score distributions $Q_j$, $j = 1, 2, \dots, M$.
· The empirical score distribution $\tilde{S}$ of the entire matrix $R$, and the predicted marginal score distribution $S$.

FIG. 2 shows an example with $N = 6$ and $M = 5$.
For example, user $i$'s empirical marginal score distribution $\tilde{P}_i$ is calculated from user $i$'s set of rated scores $B_i^1 = \{r_{i,j} \mid r_{i,j} > 0,\ j = 1, \dots, M\}$ by the following equation.
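The image of equation (1) is not reproduced in this text. From the definitions that follow (δ as an indicator, normalization to 1, and $\tilde{P}_i(k)$ as a frequency ratio), it presumably takes the standard relative-frequency form:

$$\tilde{P}_i(k) = \frac{\sum_{j \in J_i^1} \delta(r_{i,j} = k)}{\sum_{l=1}^{V} \sum_{j \in J_i^1} \delta(r_{i,j} = l)} = \frac{1}{|B_i^1|} \sum_{j \in J_i^1} \delta(r_{i,j} = k), \qquad k = 1, \dots, V \tag{1}$$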

Here, $\delta(x)$ denotes the delta (indicator) function, which equals 1 when $x$ is true and 0 otherwise. $\tilde{P}_i(k)$ is thus the frequency of score $k$ relative to all of user $i$'s rated scores, and clearly $\sum_k \tilde{P}_i(k) = 1$.

The predicted marginal score distribution $P_i$ over the unrated scores, on the other hand, is computed by the following equation using the initial values $\{r^{(0)}_{i,j} \in B_i^0\}$ created by the predicted score initial value calculation unit 222.

$B_i^0 = \{r_{i,j} \mid r_{i,j} = 0,\ j = 1, \dots, M\}$ denotes user $i$'s set of unrated scores. $\sigma\ (> 0)$ is a smoothing parameter: the larger $\sigma$, the more it softens the effect of deviations between the predicted values and the true values. In the limit $\sigma \to 0$, equation (2) reduces to the form of equation (1).
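Equation (2) itself is not reproduced here either, and the text states only its properties: it is computed from the real-valued predictions in $B_i^0$, it has a smoothing parameter $\sigma > 0$, and it reduces to the form of equation (1) as $\sigma \to 0$. One form with those properties, offered purely as an assumption, spreads each prediction $r$ over the scores $k$ with Gaussian weights $\exp(-(r - k)^2/\sigma)$, normalizes them per prediction, and averages. The function name is hypothetical.

```python
import math

def smoothed_distribution(predictions, V, sigma):
    """ASSUMED smoothed histogram of real-valued predictions over the scores 1..V.

    Each prediction r contributes weights exp(-(r - k)**2 / sigma), normalized to sum to 1,
    and the contributions are averaged. As sigma -> 0, an integer-valued prediction puts all
    its mass on its own score, recovering the relative-frequency form of equation (1).
    Predictions are assumed to lie in [1, V] so that the weights do not all underflow.
    """
    dist = [0.0] * V
    for r in predictions:
        weights = [math.exp(-((r - k) ** 2) / sigma) for k in range(1, V + 1)]
        total = sum(weights)
        for idx, w in enumerate(weights):
            dist[idx] += w / total
    return [d / len(predictions) for d in dist]

soft = smoothed_distribution([2.6, 3.0], V=5, sigma=1.0)   # every score gets some mass
hard = smoothed_distribution([2.0, 3.0], V=5, sigma=1e-3)  # essentially [0, 0.5, 0.5, 0, 0]
```

The smoothing keeps every $P_i(k)$ strictly positive for moderate $\sigma$, which keeps the KL divergence finite and differentiable in the predicted values.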

Next, using the marginal score distributions computed above, the dissimilarity between the marginal distribution of the initial values $\{\hat{r}^{(0)}_{i,j} \in B^0_*\}$ produced by the predicted score initial value calculation unit 222 and that of the rated scores $\{r_{i,j} \in B^1_*\}$ is computed. The well-known KL divergence can be used as this dissimilarity measure. For example, the KL divergence between the marginal distributions for user $i$ is computed by Equation (3).
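Here is a minimal KL divergence helper. It assumes Equation (3) is the standard definition $\mathrm{KL}(\tilde{P}_i \| P_i) = \sum_k \tilde{P}_i(k) \log(\tilde{P}_i(k) / P_i(k))$. The `eps` floor is a hypothetical safeguard that is not mentioned in the source. It keeps the value finite when the predicted distribution gives zero mass to a level that occurs in the rated scores.

```python
import math
from typing import Sequence


def kl_divergence(p_emp: Sequence[float], p_pred: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P~ || P) = sum_k P~(k) * log(P~(k) / P(k)).

    Levels with P~(k) = 0 contribute nothing (0 log 0 = 0). The `eps` floor on P(k) is a
    hypothetical safeguard that keeps the value finite when the prediction gives level k no mass.
    """
    total = 0.0
    for pe, pp in zip(p_emp, p_pred):
        if pe > 0.0:
            total += pe * math.log(pe / max(pp, eps))
    return total


print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # log 2, about 0.693
```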

Similarly, for $Q_j$ and $S$, the dissimilarities $\mathrm{KL}(\tilde{Q}_j \| Q_j)$ and $\mathrm{KL}(\tilde{S} \| S)$ are computed by Equations (4) and (5), respectively.

From these quantities, the global similarity between the initial values $\{\hat{r}^{(0)}_{i,j} \in B^0_*\}$ produced by the predicted score initial value calculation unit 222 and the rated scores $\{r_{i,j} \in B^1_*\}$ is evaluated by the objective function $J(\{\hat{r}^{(0)}_{i,j} \in B^0_*\})$ of Equation (6).
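Claim 5 refers to minimizing "the sum" of the Kullback-Leibler information. One plausible reading of Equation (6) is therefore the unweighted sum $J = \sum_i \mathrm{KL}(\tilde{P}_i \| P_i) + \sum_j \mathrm{KL}(\tilde{Q}_j \| Q_j) + \mathrm{KL}(\tilde{S} \| S)$. The sketch below assumes exactly this form. Any weighting of the three terms in the actual Equation (6) cannot be determined from this text. So that the block runs on its own, it bundles compact copies of an empirical-distribution helper, a Gaussian-kernel smoothing for Equation (2) and a KL helper. All of these are assumptions. Skipping groups that have no rated or no unrated cells is also an assumption.

```python
import math
from typing import Dict, List, Tuple

Pred = Dict[Tuple[int, int], float]  # (user i, item j) -> real-valued prediction for an unrated cell


def _empirical(scores: List[int], V: int) -> List[float]:
    # Relative frequency of levels 1..V among rated scores (Eq. (1) as described in the text).
    return [scores.count(k) / len(scores) for k in range(1, V + 1)]


def _predicted(values: List[float], V: int, sigma: float) -> List[float]:
    # Hypothetical Gaussian-kernel smoothing standing in for Eq. (2).
    dist = [0.0] * V
    for r in values:
        d2 = [(k - r) ** 2 for k in range(1, V + 1)]
        m = min(d2)
        w = [math.exp(-(d - m) / (2.0 * sigma ** 2)) for d in d2]
        z = sum(w)
        for k in range(V):
            dist[k] += w[k] / z / len(values)
    return dist


def _kl(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    # KL(p || q), skipping levels where p is zero; eps floors q (an assumption).
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)


def objective(R: List[List[int]], pred: Pred, V: int, sigma: float) -> float:
    """J = sum_i KL(P~_i || P_i) + sum_j KL(Q~_j || Q_j) + KL(S~ || S), unweighted (assumed)."""
    N, M = len(R), len(R[0])
    groups = [[(i, j) for j in range(M)] for i in range(N)]      # rows: users, P_i
    groups += [[(i, j) for i in range(N)] for j in range(M)]     # columns: items, Q_j
    groups.append([(i, j) for i in range(N) for j in range(M)])  # whole matrix: S
    J = 0.0
    for cells in groups:
        rated = [R[i][j] for i, j in cells if R[i][j] > 0]
        unrated = [pred[(i, j)] for i, j in cells if R[i][j] == 0]
        if rated and unrated:  # skip groups with nothing to compare (assumption)
            J += _kl(_empirical(rated, V), _predicted(unrated, V, sigma))
    return J


R = [[5, 0], [0, 5]]
print(objective(R, {(0, 1): 5.0, (1, 0): 5.0}, V=5, sigma=0.01))  # 0.0: predictions match
print(objective(R, {(0, 1): 1.0, (1, 0): 1.0}, V=5, sigma=0.01))  # large: predictions disagree
```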

After the objective function based on global similarity has been computed, the set of unrated scores $\{r_{i,j} \mid r_{i,j} \in B^0_*\}$ that minimizes the objective function $J$ is found. The set of unrated scores obtained by this minimization is denoted $\{\hat{r}^{(f)}_{i,j} \mid \hat{r}^{(f)}_{i,j} \in B^0_*\}$. A nonlinear optimization method such as the quasi-Newton method can be used to minimize the objective function.
Finally, the set of predicted unrated scores $\{\hat{r}^{(f)}_{i,j} \mid \hat{r}^{(f)}_{i,j} \in B^0_*\}$ is stored as the predicted scores 45.
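The minimization step can be sketched with any optimizer that accepts a black-box objective. To keep the example dependency-free, the sketch below uses plain projected gradient descent with central finite-difference gradients instead of the quasi-Newton method named in the text. This substitution is deliberate. If SciPy is available, `scipy.optimize.minimize` with `method="L-BFGS-B"` and `bounds` would be the closer match. The function name, step size, iteration count and the clipping of every score to $[1, V]$ are all assumptions. The toy quadratic objective only exercises the optimizer.

```python
from typing import Callable, Dict, Hashable


def minimize_scores(
    J: Callable[[Dict[Hashable, float]], float],
    init: Dict[Hashable, float],
    V: int,
    step: float = 0.1,
    iters: int = 500,
    h: float = 1e-5,
) -> Dict[Hashable, float]:
    """Projected gradient descent on the unrated scores, keeping each one inside [1, V].

    Gradients are central finite differences, so J can be any black-box objective.
    """
    x = dict(init)  # start from the initial values r^(0)
    for _ in range(iters):
        grad = {}
        for key in x:
            orig = x[key]
            x[key] = orig + h
            up = J(x)
            x[key] = orig - h
            down = J(x)
            x[key] = orig  # restore before moving to the next coordinate
            grad[key] = (up - down) / (2.0 * h)
        for key in x:
            # Gradient step, then projection back onto the valid score range.
            x[key] = min(max(x[key] - step * grad[key], 1.0), float(V))
    return x


def toy(x: Dict[Hashable, float]) -> float:
    # Stand-in objective whose minimum is at 3.7 for every unrated cell.
    return sum((v - 3.7) ** 2 for v in x.values())


print(minimize_scores(toy, {(0, 1): 1.0, (2, 3): 5.0}, V=5))
```

For the method described here, `J` would be the objective of Equation (6) evaluated on the unrated cells, and `init` would be the output of the predicted score initial value calculation unit 222. The returned dictionary then plays the role of $\{\hat{r}^{(f)}_{i,j}\}$.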

FIG. 6 is a data flow diagram of the collaborative filtering device 1.
When the score history 43 is input, the preprocessing unit 21 performs the preprocessing described above to generate input data, which it temporarily stores in the storage means 4 (see FIG. 3). The preprocessing unit 21 passes this input data to the score prediction unit 22. The score prediction unit 22 predicts the scores and outputs the predicted scores 45, which are stored in the storage means 4.

The present invention is not limited to KL divergence (Equations (3), (4), and (5)) as the measure of similarity between the marginal score distributions that underlie global similarity. If another measure is used, unrated scores can still be predicted by minimizing an objective function derived in the same way. Likewise, various optimization methods can be applied to minimize the objective function (Equation (6)).

The following three methods were applied to real data:
· The score prediction method of the present invention (hereinafter "the invention method")
· A method based on the nearest-neighbor approach (hereinafter "conventional method A")
· A method based on a probabilistic model (hereinafter "conventional method B")
For conventional method A, see Non-Patent Document 1; for conventional method B, see Non-Patent Document 2.

Conventional method A first searches for "similar users" whose score histories resemble that of the active user $a$. It then makes a prediction from the rated scores $r_{u,j}$ of the similar users $u$ that were found. The prediction $\hat{r}_{a,j}$ for the score $r_{a,j}$ is given by the following equation.

Here, $S_a$ is the set of users similar to the active user $a$, $w_{a,u}$ is the similarity between the active user $a$ and user $u$, and $\bar{r}_a$ is the mean of the active user $a$'s rated scores $r_{a,j}$. The similarity $w_{a,u}$ is computed from the scores rated by both the active user $a$ and user $u$, using the Pearson correlation coefficient, cosine similarity, or a similar measure.
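The prediction equation for conventional method A is not reproduced in this text. The sketch below assumes the widely used mean-centered weighted form $\hat{r}_{a,j} = \bar{r}_a + \sum_{u \in S_a} w_{a,u}(r_{u,j} - \bar{r}_u) / \sum_{u \in S_a} |w_{a,u}|$, with the Pearson correlation computed over co-rated items. Non-Patent Document 1 may differ in its details. Building $S_a$ from the `k` most similar users who rated item $j$ is a hypothetical choice, and so is the fallback to $\bar{r}_a$.

```python
import math
from typing import List

Matrix = List[List[int]]  # rows = users, columns = items; 0 marks an unrated cell


def mean_rated(row: List[int]) -> float:
    rated = [r for r in row if r > 0]
    return sum(rated) / len(rated) if rated else 0.0


def pearson(a: List[int], u: List[int]) -> float:
    """Pearson correlation over the items both users have rated (0 if undefined)."""
    common = [j for j in range(len(a)) if a[j] > 0 and u[j] > 0]
    if len(common) < 2:
        return 0.0
    ma = sum(a[j] for j in common) / len(common)
    mu = sum(u[j] for j in common) / len(common)
    num = sum((a[j] - ma) * (u[j] - mu) for j in common)
    den = math.sqrt(sum((a[j] - ma) ** 2 for j in common) * sum((u[j] - mu) ** 2 for j in common))
    return num / den if den > 0 else 0.0


def predict_knn(R: Matrix, a: int, j: int, k: int = 2) -> float:
    """Mean-centered weighted prediction r^_{a,j} from the k most similar users who rated j."""
    candidates = [(pearson(R[a], R[u]), u) for u in range(len(R)) if u != a and R[u][j] > 0]
    S_a = sorted(candidates, reverse=True)[:k]  # hypothetical neighbour selection: top-k by w_{a,u}
    r_bar_a = mean_rated(R[a])
    norm = sum(abs(w) for w, _ in S_a)
    if norm == 0.0:
        return r_bar_a  # no informative neighbours: fall back to the user's mean
    return r_bar_a + sum(w * (R[u][j] - mean_rated(R[u])) for w, u in S_a) / norm


R = [[5, 4, 0], [5, 4, 5], [1, 2, 1]]
print(predict_knn(R, a=0, j=2))  # about 4.833
```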

Conventional method B assumes a Gaussian mixture model in which "users fall into $C$ latent classes $z_c$ ($c = 1, 2, \ldots, C$) of similar preferences, and each score is sampled from a normal distribution defined for each latent class." The scores are first standardized per user (the user's mean is subtracted from each raw score and the result is divided by the standard deviation of that user's scores), which makes them real-valued. Let $p(r \mid a, j)$ be the distribution followed by the standardized score $r_{a,j}$ of the active user $a$, let $P(z_c \mid a)$ be the probability that the active user $a$ belongs to latent class $z_c$, and let $p(r \mid z_c, j)$ be the normal distribution followed by the scores in latent class $z_c$. The model can then be written as the following equation. Probability functions are written in upper case and probability density functions in lower case.

The prediction $\hat{r}_{a,j}$ for the score $r_{a,j}$ is given by the following equation.

Here, $\hat{P}(z_c \mid a)$ is the estimated membership probability obtained by learning, and $\hat{\mu}^j_c$ is the estimated mean of the normal distribution followed by the scores of item $j$ in latent class $z_c$.
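The model and prediction equations for conventional method B are also missing from this text. From the definitions above, a natural reading is the mixture $p(r \mid a, j) = \sum_{c=1}^{C} P(z_c \mid a)\, p(r \mid z_c, j)$ together with the prediction $\hat{r}_{a,j} = \sum_c \hat{P}(z_c \mid a)\, \hat{\mu}^j_c$ in standardized units. The sketch assumes both forms. It also assumes the estimates have already been learned, so the training procedure is not shown. Mapping the standardized prediction back to the raw scale with the user's mean and standard deviation is a further assumption, and the numeric inputs below are hypothetical.

```python
import math
from typing import Sequence


def normal_pdf(r: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((r - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(r: float, membership: Sequence[float], mu_j: Sequence[float], sd_j: Sequence[float]) -> float:
    """p(r | a, j) = sum_c P(z_c | a) * N(r; mu_c^j, sd_c^j) for a standardized score r."""
    return sum(P * normal_pdf(r, m, s) for P, m, s in zip(membership, mu_j, sd_j))


def predict_mixture(membership: Sequence[float], mu_j: Sequence[float], user_mean: float, user_std: float) -> float:
    """r^_{a,j}: expected standardized score sum_c P^(z_c|a) mu^_c^j, mapped back to the raw scale."""
    z = sum(P * m for P, m in zip(membership, mu_j))
    return user_mean + user_std * z  # undo the per-user standardization (an assumption)


# Hypothetical learned estimates for one user a and one item j with C = 2 latent classes.
print(predict_mixture([0.25, 0.75], [1.0, -1.0], user_mean=3.0, user_std=2.0))  # 2.0
```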

(Data used)
In this example, "MovieLens" (ML data), a dataset of movie ratings, was used. Scores in this data take the values 1, ..., 5. Each method was applied to a slightly processed version of the ML data.
Users and items were extracted from the original data so that every element of the matrix $R$ is rated, which yields data consisting of 41 users and 50 movies.

(Evaluation measure)
The normalized mean absolute error (NMAE) was used as the measure of prediction accuracy. NMAE is given by the following equation.

Here, the MAE (mean absolute error) is given by the following equation.

$\hat{r}_{i,j}$ denotes the predicted score for an unrated score $r_{i,j}$, and $\#\{\cdot\}$ denotes the number of elements in a set. The best possible NMAE is 0. A value greater than 1 means the predictions are worse than predicting scores at random. $E\{\mathrm{MAE}\}$ can easily be computed if $N$, $M$, and $V$ are known.
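The description implies that NMAE is $\mathrm{MAE} / E\{\mathrm{MAE}\}$, although the equations themselves are not reproduced here. The sketch below assumes that true and predicted scores are independent and uniform on $\{1, \ldots, V\}$. Under that assumption $E\{\mathrm{MAE}\} = (V^2 - 1)/(3V)$, which depends only on $V$. The source's exact definition of $E\{\mathrm{MAE}\}$ may differ, since it also mentions $N$ and $M$. The function names are illustrative.

```python
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error over the held-out (unrated) cells."""
    return sum(abs(p - t) for p, t in zip(predicted, actual)) / len(actual)


def expected_mae_uniform(V: int) -> float:
    """E{MAE} if true and predicted scores are independent and uniform on 1..V (assumption)."""
    levels = range(1, V + 1)
    return sum(abs(x - y) for x in levels for y in levels) / V ** 2


def nmae(predicted: Sequence[float], actual: Sequence[float], V: int) -> float:
    return mae(predicted, actual) / expected_mae_uniform(V)


print(expected_mae_uniform(5))          # 1.6, i.e. (V**2 - 1) / (3 * V)
print(nmae([3, 4, 2], [3, 2, 2], V=5))  # MAE = 2/3, NMAE about 0.417
```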

(Implementation method)
From the rated scores in the matrix $R$, 20% were randomly selected for each user as unrated data (test data), and the remaining 80% were used as rated data (training data). Five such sets were created, and each method was evaluated by its mean NMAE over the five sets.
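The per-user 20%/80% split can be sketched as follows. Selected cells are hidden from the training matrix by setting them to 0 (unrated), and their true scores are kept for computing NMAE. Using one `random.Random(seed)` per split is an assumption, as is rounding $0.2 \times$ (number of rated items) to an integer. The 41 × 50 matrix below is a synthetic placeholder with the same shape as the data described above, not the MovieLens subset itself.

```python
import random
from typing import List, Tuple


def split_per_user(R: List[List[int]], test_frac: float = 0.2, seed: int = 0) -> Tuple[List[List[int]], List[Tuple[int, int, int]]]:
    """Hide a random test_frac of each user's rated scores; return (training matrix, test cells)."""
    rng = random.Random(seed)
    train = [row[:] for row in R]
    test = []
    for i, row in enumerate(R):
        rated = [j for j, r in enumerate(row) if r > 0]
        for j in rng.sample(rated, round(test_frac * len(rated))):
            test.append((i, j, row[j]))  # keep the true score for scoring with NMAE
            train[i][j] = 0              # the cell now counts as unrated
    return train, test


# Five random splits, as in the evaluation; a synthetic fully rated 41 x 50 matrix stands in for the data.
R = [[(i + j) % 5 + 1 for j in range(50)] for i in range(41)]
splits = [split_per_user(R, seed=s) for s in range(5)]
print(len(splits[0][1]))  # 410 test cells: 10 per user
```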

(Results)
Table 2 shows the NMAE of each prediction method.

FIG. 7 shows some of the values predicted by the score prediction method of the present invention. The computation times in Table 2 are the processing times of a C-language implementation on a computer with a 3.60 GHz CPU clock frequency and 2 GB of main memory.

The present invention is particularly useful in electronic commerce, specifically in recommendation systems for merchandise sales, rental videos, TV programs, restaurants, and the like. In such systems it can select items that a user is predicted to rate highly from a huge group of items (including services), and recommend them more appropriately than before.

Consequently, collaborative filtering can preferentially recommend items with high predicted scores. This reduces the active user's burden of choosing items and, at the same time, presents items the active user did not know about. On the item provider's side, recommending appropriate items to users can stimulate item sales activities and make them more efficient.

FIG. 1 is a diagram showing an example of a score distribution expressed as a matrix (table) of the score history.
FIG. 2 is a schematic diagram contrasting the marginal distribution of evaluated scores with that of unevaluated scores.
FIG. 3 is a block diagram showing a collaborative filtering device according to an embodiment of the present invention.
FIG. 4 is a block diagram showing the preprocessing unit of this embodiment.
FIG. 5 is a block diagram of the score prediction unit according to this embodiment.
FIG. 6 is a data flow diagram of the collaborative filtering device.
FIG. 7 is a diagram showing some of the values predicted by the score prediction method of the present invention.

Explanation of symbols

1 Collaborative filtering device
2 Computation means
3 Input means
4 Storage means
5 Output means
11 Bus line
21 Preprocessing unit
22 Score prediction unit
23 Memory
41 Preprocessing program
42 Score prediction program
43 Score history
44 Input data
45 Predicted scores
211 Score history reading unit
212 Input data writing unit
221 Input data reading unit
222 Predicted score initial value calculation unit
223 Predicted score updating unit
224 Predicted score writing unit

Claims

A collaborative filtering method for predicting the scores of a user's unevaluated items using a score history including scores that a large number of users have given, as discrete values, to a large number of items, the method comprising:
an empirical score distribution calculating step of outputting an empirical score distribution indicating the distribution of the input evaluated scores;
a predicted score distribution calculating step of outputting a predicted score distribution by substituting predetermined predicted initial values for the unrated scores;
a dissimilarity calculating step of outputting a dissimilarity between the empirical score distribution and the predicted score distribution; and
a score estimation step of performing a nonlinear optimization calculation so as to reduce the dissimilarity, and sequentially estimating and outputting the unevaluated scores.

The collaborative filtering method according to claim 1, wherein the empirical score distribution calculating step comprises:
a step of counting the frequency of each discrete score across the items evaluated by each user, and outputting an empirical score distribution for each user;
a step of counting the frequency of each discrete score across the users who evaluated each item, and outputting an empirical score distribution for each item; and
a step of counting the discrete scores over all score data, and outputting an empirical score distribution over all users and items.

The collaborative filtering method according to claim 1, wherein the predicted score distribution calculating step comprises:
a step of counting the frequency of each discrete score across the items evaluated for each user, and outputting a predicted score distribution for each user;
a step of counting the frequency of each discrete score across the users evaluated for each item, and outputting a predicted score distribution for each item; and
a step of counting the discrete scores over all score data, and outputting a predicted score distribution over all users and items.

The collaborative filtering method according to claim 1, wherein the dissimilarity calculating step calculates the dissimilarity as the KL divergence between the empirical score distribution and the predicted score distribution.

The collaborative filtering method according to claim 1, wherein, in the score estimation step, the optimization calculation is performed so that the sum of the Kullback-Leibler information between the empirical score distributions and the predicted score distributions is minimized.

A collaborative filtering device that predicts the scores of a user's unevaluated items using a score history including scores that a large number of users have given, as discrete values, to a large number of items, the device comprising:
an empirical score distribution calculating unit that outputs an empirical score distribution indicating the distribution of evaluated scores based on the score history;
a predicted score distribution calculating unit that outputs a predicted score distribution by substituting predetermined predicted initial values for the unrated scores;
a dissimilarity calculating unit that outputs a dissimilarity between the empirical score distribution and the predicted score distribution; and
a score estimating unit that performs a nonlinear optimization calculation so as to reduce the dissimilarity, and sequentially estimates and outputs the unevaluated scores.

A collaborative filtering program that causes a computer to execute the collaborative filtering method according to claim 1.

A recording medium in which the collaborative filtering program according to claim 7 is recorded.