JP2006302097A

JP2006302097A - Cooperative filter device

Info

Publication number: JP2006302097A
Application number: JP2005125021A
Authority: JP
Inventors: Mie Nakai (中井 美絵); Hideyuki Yoshida (吉田 秀行)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-04-22
Filing date: 2005-04-22
Publication date: 2006-11-02

Abstract

PROBLEM TO BE SOLVED: To provide a cooperative filter device capable of reducing the calculation volume to shorten the processing time.

SOLUTION: Logs including user identification data and item identification data are accumulated as history data in a history data accumulation part 21, and the history data is processed by a cooperative filter part 23. A processing object data extraction part 29 extracts the logs to be processed in the cooperative filter part from the many logs accumulated in the history data accumulation part. In the processing object data extraction part 29, fundamental data on the relation between the number of logs and the processing time of cooperative filtering is stored in a fundamental data storage part, and a processing time calculation part calculates, from the fundamental data, a predicted processing time corresponding to the inputted target number of logs. A log extraction part of the processing object data extraction part 29 extracts the inputted target number of logs from the logs accumulated in the history data accumulation part 21.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a collaborative filter device that determines, by computation, items such as products to be recommended to a user, and more particularly to a collaborative filter device that can reduce the amount of computation and shorten the processing time.

Collaborative filter devices are known as a representative technique for providing recommendation information on products such as content. Collaborative filtering is a system that uses data recording users' preferences in the form of past behavior, and estimates the preferences of a recommendation target user from the preference information of users who have behaved similarly to that user.

A collaborative filter device includes a database of the product purchase histories of a plurality of users. In collaborative filtering, the similarity between users is defined from the product purchase histories. Known similarity measures include, for example, the Jaccard coefficient and the cosine. From the purchase history data, the similarity between the recommendation target user and each comparison target user is calculated. From these similarities, a recommendation value is calculated for each product that the recommendation target user has not yet purchased. Information on products with high recommendation values is then provided to the recommendation target user.
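The flow described above (similarity between users, then a recommendation value for each unpurchased product) can be illustrated with a minimal sketch. The function names, the sample purchase data, and in particular the choice to compute the recommendation value as the sum of the similarities of the users who bought the product are assumptions made for illustration; the text only states that a recommendation value is calculated from the similarity.

```python
from math import sqrt


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a: set, b: set) -> float:
    """Cosine similarity of binary purchase vectors: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(target: str, history: dict, similarity=jaccard, top_n: int = 3) -> list:
    """Rank products the target has not purchased.

    Assumption: the recommendation value of a product is the sum of the
    similarities of the comparison users who purchased it.
    """
    owned = history[target]
    scores = {}
    for user, items in history.items():
        if user == target:
            continue
        sim = similarity(owned, items)
        if sim == 0.0:
            continue  # dissimilar users contribute nothing
        for item in items - owned:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest recommendation value first; ties broken by item name.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories: user ID -> set of purchased item IDs.
history = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B", "D"},
    "u3": {"C", "E"},
    "u4": {"F"},
}

if __name__ == "__main__":
    print(recommend("u1", history))
```

Because every comparison user is scanned for every target user, the work grows quickly with the number of logs, which is the cost the rest of the description sets out to control.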

In a collaborative filter device, the filtering accuracy improves as the number of users and the amount of purchase history data increase. It is therefore desirable that the collaborative filter device include a database holding a large amount of purchase history data from a large number of users, and that it be configured to perform the collaborative filtering process described above on the history data in this database.
Collaborative filtering technology is disclosed, for example, in Patent Document 1.
JP 2004-326227 A (pages 5-8, FIG. 1)

However, conventional collaborative filter devices have the problem that, when a large amount of purchase history data is processed, the amount of computation becomes enormous and the processing time becomes too long. That is, there has been no way to know in advance how long the processing will take, so it has often happened that, once collaborative filtering is started, the processing never seems to finish and has to be aborted partway through. In particular, because a collaborative filter device operates within finite hardware resources such as memory and an HDD, the processing time becomes much longer when, for example, the computer's memory fills up and swapping to the HDD occurs, and processing becomes impossible once the HDD is full.

The present invention has been made to solve these conventional problems, and an object of the present invention is to provide a collaborative filter device that can reduce the amount of computation and shorten the processing time.

The collaborative filter device of the present invention includes: a history data storage unit that stores, as history data, logs that are generated each time an item is acquired by any of a large number of users and that include user identification data and item identification data; a collaborative filter unit that determines items to recommend to a recommendation target user by collaborative filtering using the history data stored in the history data storage unit; an output unit that outputs the processing result of the collaborative filter unit; and a processing target data extraction unit that extracts, from the many logs stored in the history data storage unit, the logs to be processed by the collaborative filter unit. The processing target data extraction unit includes: a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering; a target log number input unit through which the target number of logs to be processed by collaborative filtering is input; a processing time calculation unit that reads the basic data from the basic data storage unit and determines, from the basic data, a predicted processing time corresponding to the target number of logs input through the target log number input unit; a processing time output unit that outputs the predicted processing time calculated by the processing time calculation unit; and a log extraction unit that, based on the target number of logs input through the target log number input unit, extracts the target number of logs from the logs stored in the history data storage unit. Here, a "log" is history data of an item acquisition and includes user identification data and item identification data. An item is, for example, a product such as content.

With this configuration, a basic data storage unit storing basic data on the relationship between the number of logs and the collaborative filtering processing time is provided, a predicted processing time corresponding to the input target number of logs is calculated, and the input target number of logs is extracted from the history data storage unit. Because the predicted processing time is known in advance, the number of logs can be set so that the predicted processing time is acceptable. In addition, by extracting the set number of logs from the history data, the amount of computation can be reduced so that the calculated predicted processing time is achieved. In this way, the amount of computation can be reduced and the processing time shortened.
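As a rough illustration of this configuration, the sketch below predicts a processing time from stored basic data and then extracts the target number of logs. The basic data values, the use of piecewise-linear interpolation between measured points, and the rule of taking the first N logs are all assumptions for illustration; the text only states that the predicted time is obtained from the basic data and that the target number of logs is extracted.

```python
from bisect import bisect_left

# Hypothetical basic data: (number of logs, measured processing time in seconds),
# sorted by number of logs.
BASIC_DATA = [(10_000, 12.0), (50_000, 90.0), (100_000, 300.0), (200_000, 1100.0)]


def predict_processing_time(target_logs: int, basic_data=BASIC_DATA) -> float:
    """Predict the processing time for target_logs by linear interpolation.

    Outside the measured range, the nearest segment is extended linearly
    and the result is clamped at zero.
    """
    counts = [n for n, _ in basic_data]
    if target_logs <= counts[0]:
        i = 1
    elif target_logs >= counts[-1]:
        i = len(counts) - 1
    else:
        i = bisect_left(counts, target_logs)
    (n0, t0), (n1, t1) = basic_data[i - 1], basic_data[i]
    return max(0.0, t0 + (t1 - t0) * (target_logs - n0) / (n1 - n0))


def extract_logs(logs: list, target_logs: int) -> list:
    """Extract the target number of logs (assumption: simply the first N)."""
    return logs[:target_logs]


if __name__ == "__main__":
    # Each log is a (user ID, item ID) pair, as in the description.
    logs = [(f"user{i % 100}", f"item{i}") for i in range(120_000)]
    target = 75_000
    print(f"predicted time for {target} logs: {predict_processing_time(target):.1f} s")
    print(f"extracted {len(extract_logs(logs, target))} logs")
```

An operator could try several target values, look at the predicted times, and only then start the filtering run with a log count whose predicted time is acceptable.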

In the collaborative filter device of the present invention, the log extraction unit is configured to limit the personal log count of comparison target users according to the personal log count of the recommendation target user, based on a theoretical similarity value determined from the personal log count of the recommendation target user, the personal log count of the comparison target user, and the number of logs of items acquired in common by both users, and to extract from the history database the logs of the comparison target users whose personal log counts fall within that limit.

With this configuration, by extracting logs from the history data based on the theoretical similarity value, the logs of users whose number of acquired items is close to that of the recommendation target user can be retained, so the number of logs can be reduced while keeping the filtering accuracy higher.

In the collaborative filter device of the present invention, the log extraction unit includes: means for setting a similarity threshold for the theoretical similarity value; means for setting a threshold reference personal log count, which is the personal log count of comparison target users corresponding to theoretical similarity values equal to or greater than the similarity threshold; and count acquisition means for obtaining, from the logs in the history data storage unit, a count of the logs of the comparison target users corresponding to the threshold reference personal log count. The log extraction unit is configured to limit the personal log count of the comparison target users from whom logs are extracted to the threshold reference personal log count obtained when the similarity threshold is set to its maximum within the range in which the count obtained by the count acquisition means still achieves the target number of logs.

With this configuration, a similarity threshold is set for the theoretical similarity value and logs are extracted from the history data using this threshold, so that the logs of users whose number of acquired items is close to that of the recommendation target user can be retained, and the number of logs can be reduced while keeping the filtering accuracy higher.
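The threshold-based limiting can be sketched as follows. This passage does not give the formula for the theoretical similarity value, so the sketch assumes the Jaccard coefficient and uses its upper bound for two personal log counts a and b, min(a, b) / max(a, b), which is reached when the common items equal the smaller set. It also assumes that "achieving the target number of logs" means the count of retained logs is at least the target. Both are illustrative assumptions, as are all function names and the sample data.

```python
from collections import Counter


def theoretical_similarity(a: int, b: int) -> float:
    """Assumed theoretical value: upper bound of Jaccard for log counts a and b."""
    return min(a, b) / max(a, b) if a and b else 0.0


def count_for_threshold(per_user: Counter, target_user: str, threshold: float) -> int:
    """Count the logs of comparison users whose theoretical similarity
    to the target user is at least the threshold."""
    a = per_user[target_user]
    return sum(
        n for user, n in per_user.items()
        if user != target_user and theoretical_similarity(a, n) >= threshold
    )


def choose_threshold(logs, target_user, target_logs, candidates):
    """Return (threshold, count) for the largest candidate threshold whose
    retained log count is still >= target_logs, or None if none qualifies."""
    per_user = Counter(user for user, _ in logs)
    best = None
    for s in sorted(candidates):  # count is non-increasing as s grows
        count = count_for_threshold(per_user, target_user, s)
        if count >= target_logs:
            best = (s, count)
    return best


def extract_by_threshold(logs, target_user, threshold):
    """Keep only logs of comparison users that pass the threshold."""
    per_user = Counter(user for user, _ in logs)
    a = per_user[target_user]
    return [
        (user, item) for user, item in logs
        if user != target_user
        and theoretical_similarity(a, per_user[user]) >= threshold
    ]


def make_logs(user: str, n: int) -> list:
    """Hypothetical helper: n logs (user ID, item ID) for one user."""
    return [(user, f"{user}-item{i}") for i in range(n)]


# Target user "t" has 4 logs; comparison users have 4, 2, 8, 1 and 16 logs.
sample_logs = (
    make_logs("t", 4) + make_logs("u1", 4) + make_logs("u2", 2)
    + make_logs("u3", 8) + make_logs("u4", 1) + make_logs("u5", 16)
)

if __name__ == "__main__":
    print(choose_threshold(sample_logs, "t", 10, [0.25, 0.5, 1.0]))
```

In this sample, raising the threshold progressively drops users whose log counts are far from the target user's 4 logs, which mirrors the stated effect of retaining users with a similar number of acquired items.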

In the collaborative filter device of the present invention, the basic data storage unit stores basic data on the relationship between the number of logs and the collaborative filtering processing time when the recommendation target users are divided into a plurality of groups, and the processing time calculation unit calculates the predicted processing time using the basic data for the case in which the recommendation target users are divided. With this configuration, the amount of computation can be reduced and the processing time shortened.

In the collaborative filter device of the present invention, the basic data storage unit stores basic data on the relationship between the number of logs and the collaborative filtering processing time when the logs are divided into a plurality of parts, and the processing time calculation unit calculates the predicted processing time using the basic data for the case in which the logs are divided. With this configuration, the amount of computation can be reduced and the processing time shortened.

In the collaborative filter device of the present invention, the basic data storage unit stores basic data on the relationship between the number of logs and the collaborative filtering processing time when the recommendation target users and the logs are each divided into a plurality of parts, and the processing time calculation unit calculates the predicted processing time using the basic data for the case in which the recommendation target users and the logs are divided. With this configuration, the amount of computation can be reduced and the processing time shortened.

In the collaborative filter device of the present invention, the processing target data extraction unit further includes: a set processing time input unit through which a set processing time for collaborative filtering is input; and a target log number calculation unit that reads the basic data from the basic data storage unit and determines, from the basic data, the target number of logs corresponding to the set processing time input through the set processing time input unit. When the set processing time is input, the log extraction unit extracts, from the logs stored in the history data storage unit, the target number of logs determined by the target log number calculation unit.

With this configuration, in response to the input of a set processing time, the target number of logs corresponding to that set processing time is calculated, and the calculated target number of logs is extracted from the history data storage unit. This configuration therefore also makes it possible to reduce the amount of computation and shorten the processing time.
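The reverse lookup, from a set processing time to a target number of logs, might look like the sketch below. The basic data values are hypothetical, and the choices to interpolate linearly between measured points, to return 0 when the set time is shorter than the smallest measured time, and to cap the result at the largest measured log count are assumptions rather than details given in the text.

```python
from bisect import bisect_right

# Hypothetical basic data: (number of logs, measured processing time in seconds),
# sorted by number of logs, with processing time assumed to increase monotonically.
BASIC_DATA = [(10_000, 12.0), (50_000, 90.0), (100_000, 300.0), (200_000, 1100.0)]


def target_logs_for_time(set_time: float, basic_data=BASIC_DATA) -> int:
    """Return the number of logs whose interpolated processing time equals
    set_time, rounded down.

    Assumptions: returns 0 below the measured range and does not extrapolate
    beyond the largest measured log count.
    """
    times = [t for _, t in basic_data]
    if set_time < times[0]:
        return 0
    if set_time >= times[-1]:
        return basic_data[-1][0]
    i = bisect_right(times, set_time)  # first measured time greater than set_time
    (n0, t0), (n1, t1) = basic_data[i - 1], basic_data[i]
    return int(n0 + (n1 - n0) * (set_time - t0) / (t1 - t0))


if __name__ == "__main__":
    for seconds in (60.0, 195.0, 600.0):
        print(f"{seconds:6.1f} s -> {target_logs_for_time(seconds)} logs")
```

The returned value would then be passed to the log extraction step in place of a manually entered target number of logs.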

In the collaborative filter device of the present invention, the processing target data extraction unit includes: a similarity threshold input unit through which a set similarity threshold is input for the theoretical similarity value determined from the personal log count of the recommendation target user, the personal log count of the comparison target user, and the number of logs of items acquired in common by both users; a target log number acquisition unit that determines the target number of logs corresponding to the set similarity threshold; and a processing time calculation unit that calculates the predicted processing time corresponding to the target number of logs. When the set similarity threshold is input, the target log number acquisition unit obtains, as the target number of logs, a count of the logs in the history data storage unit that belong to comparison target users whose personal log counts correspond to theoretical similarity values equal to or greater than the similarity threshold, and the processing time calculation unit reads the basic data from the basic data storage unit and determines, from the basic data, the predicted processing time corresponding to the target number of logs obtained by the target log number acquisition unit.

With this configuration, the target number of logs is calculated from the input set similarity threshold, and the predicted processing time is calculated from the target number of logs. The set similarity threshold can therefore be adjusted to reduce the logs so that the processing time becomes appropriate. Using a similarity threshold on the theoretical similarity value reduces the amount of computation and shortens the processing time while maintaining high filtering accuracy.

Another aspect of the present invention is a collaborative filter support device for a collaborative filter device that includes a history data storage unit, which stores as history data logs that are generated each time an item is acquired by any of a large number of users and that include user identification data and item identification data, and a collaborative filter unit, which determines items to recommend to a recommendation target user by collaborative filtering using the history data stored in the history data storage unit. The collaborative filter support device extracts, from the many logs stored in the history data storage unit, the logs to be processed by the collaborative filter unit. It includes: a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering; a target log number input unit through which the target number of logs to be processed by collaborative filtering is input; a processing time calculation unit that reads the basic data from the basic data storage unit and determines, from the basic data, the predicted processing time corresponding to the target number of logs input through the target log number input unit; a processing time output unit that outputs the predicted processing time calculated by the processing time calculation unit; and a log extraction unit that, based on the target number of logs input through the target log number input unit, extracts the target number of logs from the logs stored in the history data storage unit. This configuration also provides the advantages of the present invention described above.

Another aspect of the present invention is a collaborative filter support processing method for a collaborative filter device that includes a history data storage unit, which stores as history data logs that are generated each time an item is acquired by any of a large number of users and that include user identification data and item identification data, and a collaborative filter unit, which determines items to recommend to a recommendation target user by collaborative filtering using the history data stored in the history data storage unit. The method extracts, by computer processing, the logs to be processed by the collaborative filter unit from the many logs stored in the history data storage unit. In this method, the target number of logs to be processed by collaborative filtering is input through a target log number input unit; basic data is read from a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering; the predicted processing time corresponding to the input target number of logs is determined from the basic data that was read; the calculated predicted processing time is output from a processing time output unit; and, based on the input target number of logs, the target number of logs is extracted from the logs stored in the history data storage unit. This method also provides the advantages of the present invention described above.

Another aspect of the present invention is a collaborative filter support processing program for a collaborative filter device that includes a history data storage unit, which stores as history data logs that are generated each time an item is acquired by any of a large number of users and that include user identification data and item identification data, and a collaborative filter unit, which determines items to recommend to a recommendation target user by collaborative filtering using the history data stored in the history data storage unit. The program causes a computer to execute processing that extracts the logs to be processed by the collaborative filter unit from the many logs stored in the history data storage unit. Specifically, the program causes the computer to: accept, through a target log number input unit, the target number of logs to be processed by collaborative filtering; read basic data from a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering; determine, from the basic data that was read, the predicted processing time corresponding to the input target number of logs; output the calculated predicted processing time from a processing time output unit; and, based on the input target number of logs, extract the target number of logs from the logs stored in the history data storage unit. This configuration also provides the advantages of the present invention described above.

Another aspect of the present invention is a collaborative filter support device for a collaborative filter device that includes a history data storage unit, which stores as history data logs that are generated each time an item is acquired by any of a large number of users and that include user identification data and item identification data, and a collaborative filter unit, which determines items to recommend to a recommendation target user by collaborative filtering using the history data stored in the history data storage unit. The collaborative filter support device extracts, from the many logs stored in the history data storage unit, the logs to be processed by the collaborative filter unit. It includes: a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering; a set processing time input unit through which a set processing time for collaborative filtering is input; a target log number calculation unit that reads the basic data from the basic data storage unit and determines, from the basic data, the target number of logs corresponding to the set processing time input through the set processing time input unit; a target log number output unit that outputs the target number of logs calculated by the target log number calculation unit; and a log extraction unit that, based on the target number of logs calculated by the target log number calculation unit, extracts the target number of logs from the logs stored in the history data storage unit. With this configuration, in response to the input of a set processing time, the target number of logs corresponding to that set processing time is calculated, and the calculated target number of logs is extracted from the history data storage unit. This configuration therefore also makes it possible to reduce the amount of computation and shorten the processing time.

Another aspect of the present invention is a collaborative filter support processing program for a collaborative filter device that includes a history data storage unit, which stores as history data logs that are generated each time an item is acquired by any of a large number of users and that include user identification data and item identification data, and a collaborative filter unit, which determines items to recommend to a recommendation target user by collaborative filtering using the history data stored in the history data storage unit. The program causes a computer to execute processing that extracts the logs to be processed by the collaborative filter unit from the many logs stored in the history data storage unit. Specifically, the program causes the computer to: accept, through a set processing time input unit, a set processing time for collaborative filtering; read basic data from a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering; determine, from the basic data that was read, the target number of logs corresponding to the input set processing time; output the calculated target number of logs from a target log number output unit; and, based on the calculated target number of logs, extract the target number of logs from the logs stored in the history data storage unit. This configuration also provides the advantages of the present invention described above.

Another aspect of the present invention is a collaborative filter support device for a collaborative filter device that includes a history data storage unit, which stores as history data logs that are generated each time an item is acquired by any of a large number of users and that include user identification data and item identification data, and a collaborative filter unit, which determines items to recommend to a recommendation target user by collaborative filtering using the history data stored in the history data storage unit. The collaborative filter support device extracts, from the many logs stored in the history data storage unit, the logs to be processed by the collaborative filter unit. It includes: means for inputting a similarity threshold for the theoretical similarity value determined from the personal log count of the recommendation target user, the personal log count of the comparison target user, and the number of logs of items acquired in common by both users; means for setting a threshold reference personal log count, which is the personal log count of comparison target users corresponding to theoretical similarity values equal to or greater than the similarity threshold; and means for extracting, from the history data storage unit, the logs of the comparison target users having the threshold reference personal log count. With this configuration, a similarity threshold is set for the theoretical similarity value and logs are extracted from the history data using this threshold, so that the logs of users whose number of acquired items is close to that of the recommendation target user can be retained. The number of logs can thus be reduced while keeping the filtering accuracy higher, and the amount of computation and the processing time can be reduced.

The present invention provides a basic data storage unit that stores basic data on the relationship between the number of logs and the collaborative filtering processing time, calculates the predicted processing time corresponding to a target number of logs, and extracts the target number of logs from the history data storage unit. It can thereby provide a collaborative filter device that has the effect of reducing the amount of computation and shortening the processing time.

以下、本発明の実施の形態の協調フィルタ装置について、図面を用いて説明する。 Hereinafter, a collaborative filter device according to an embodiment of the present invention will be described with reference to the drawings.

本発明の実施の形態に係る協調フィルタ装置を図１に示す。図１において、協調フィルタ装置１は、商品購入システム３と共に示されている。本実施の形態では、商品がコンテンツである。商品購入システム３は、購入処理部５とコンテンツデータベース７を備えている。購入処理部５は、ＷＷＷサーバを含む情報配信装置で構成されており、インターネット９を介してユーザ端末装置１１と通信する機能を備えている。コンテンツデータベース７は、ユーザに販売されるべき多数のコンテンツデータを記憶している。ユーザ端末装置１１はパーソナルコンピュータおよび携帯端末等である。購入処理部５は、コンテンツデータベース７からコンテンツデータを読み出し、インターネット９を介してユーザ端末装置１１へと提供する。購入処理部５は、コンテンツデータをユーザ端末装置１１へ提供するたびに、履歴データを生成し、協調フィルタ装置１に提供し、協調フィルタ装置１で履歴データが蓄積される。 A collaborative filter device according to an embodiment of the present invention is shown in FIG. 1. In FIG. 1, the collaborative filter device 1 is shown together with a product purchase system 3. In the present embodiment, the product is content. The product purchase system 3 includes a purchase processing unit 5 and a content database 7. The purchase processing unit 5 is an information distribution device including a WWW server and has a function of communicating with the user terminal device 11 via the Internet 9. The content database 7 stores a large amount of content data to be sold to users. The user terminal device 11 is a personal computer, a portable terminal, or the like. The purchase processing unit 5 reads content data from the content database 7 and provides it to the user terminal device 11 via the Internet 9. Each time the purchase processing unit 5 provides content data to the user terminal device 11, it generates history data and provides it to the collaborative filter device 1, where the history data is accumulated.

図２は、図１に示される全体構成の動作のフローチャートであり、コンテンツ購入と履歴データ蓄積の流れを示している。ユーザ端末装置１１により購入処理部５がアクセスされて、コンテンツが検索され（Ｓ１）、検索されたコンテンツのダウンロードが要求されると（Ｓ３、Ｙｅｓ）、コンテンツがダウンロードされ（Ｓ５）、そして、ダウンロードの履歴データが生成されて、蓄積される（Ｓ７）。購入動作が終了していなければ（ステップＳ９、Ｎｏ）、ステップＳ１に戻る。 FIG. 2 is a flowchart of the operation of the overall configuration shown in FIG. 1, and shows the flow of content purchase and history data accumulation. When the purchase processing unit 5 is accessed by the user terminal device 11 to search for the content (S1), and the download of the searched content is requested (S3, Yes), the content is downloaded (S5), and then downloaded. History data is generated and stored (S7). If the purchase operation has not ended (No at Step S9), the process returns to Step S1.

協調フィルタ装置１は、履歴データ蓄積部２１と、協調フィルタ部２３と、推薦情報配信部２５と、商品情報記憶部２７と、処理対象データ抽出部２９とで構成される。 The collaborative filter device 1 includes a history data storage unit 21, a collaborative filter unit 23, a recommendation information distribution unit 25, a product information storage unit 27, and a processing target data extraction unit 29.

協調フィルタ装置１は購入処理部５との通信機能を備え、この通信機能が履歴データ入力部として機能し、購入処理部５から提供される履歴データを入力し、そして、履歴データが履歴データ蓄積部２１に蓄積される。本実施の形態では、一回の商品購入の履歴データをログという。 The collaborative filter device 1 has a function for communicating with the purchase processing unit 5. This communication function serves as a history data input unit: it receives the history data provided by the purchase processing unit 5, and the history data is then accumulated in the history data storage unit 21. In the present embodiment, the history data of a single product purchase is referred to as a log.

ログは、ユーザ識別データと商品識別データを含む。ユーザ識別データは、商品を購入したユーザを識別するデータであり、商品識別データは、購入された商品を特定するデータである。図３はデータの例を示しており、図３ではユーザ識別データがユーザＩＤであり、商品識別データがコンテンツＩＤである。なお、商品であるコンテンツは、本発明でユーザに推薦されるべきアイテムの一例である。また、履歴データすなわちログは、上記のユーザ識別データおよび商品識別データの他に、購入日時等のデータを含んでよい。 The log includes user identification data and product identification data. The user identification data is data for identifying the user who has purchased the product, and the product identification data is data for specifying the purchased product. FIG. 3 shows an example of data. In FIG. 3, the user identification data is a user ID, and the product identification data is a content ID. Note that the content that is a product is an example of an item that should be recommended to the user in the present invention. Further, the history data, that is, the log may include data such as purchase date and time in addition to the above-described user identification data and product identification data.
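The log structure described above can be sketched as a small record type. This is only an illustrative sketch: the class name `Log`, its field names, and the helper `items_by_user` are assumptions introduced here and do not appear in the source.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Log:
    """One purchase event (hypothetical field names)."""
    user_id: str                              # user identification data
    content_id: str                           # product (item) identification data
    purchased_at: Optional[datetime] = None   # optional extra data such as purchase date


def items_by_user(logs):
    """Group logs into {user_id: set of content_ids} for later similarity calculation."""
    grouped = {}
    for log in logs:
        grouped.setdefault(log.user_id, set()).add(log.content_id)
    return grouped


logs = [Log("A", "U1"), Log("A", "U2"), Log("B", "U2")]
history = items_by_user(logs)
```

Grouping by user up front is a design convenience for the set-based similarity calculations discussed later; the source does not prescribe any particular in-memory layout.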

協調フィルタ部２３は、履歴データ蓄積部２１に蓄積された履歴データを用いた協調フィルタリング処理によって推薦対象ユーザへの推薦アイテムを求める構成である。協調フィルタ部２３は、協調フィルタリング処理のプログラムをコンピュータが実行することによって実現される。協調フィルタ部２３は、類似度算出部３１と推薦部３３とで構成されている。 The collaborative filter unit 23 is configured to obtain a recommended item for the recommendation target user by collaborative filtering processing using the history data accumulated in the history data accumulation unit 21. The collaborative filter unit 23 is realized by a computer executing a collaborative filtering process program. The collaborative filter unit 23 includes a similarity calculation unit 31 and a recommendation unit 33.

推薦情報配信部２５は、ＷＷＷサーバを含む情報配信装置で構成されており、協調フィルタ部２３による処理結果を出力する出力部に相当する。商品情報記憶部２７は、多数の商品についての商品情報を記憶している。推薦情報配信部２５は、協調フィルタ部２３から、推薦対象ユーザに推薦されるべき推薦商品を特定するデータの供給を受ける。そして、推薦情報配信部２５は、推薦商品の商品情報を商品情報記憶部２７から読み出して、推薦対象ユーザのユーザ端末装置１１へとインターネット９を介して提供する。商品情報（推薦情報）は電子メールに組み込まれてもよいし、別の例では、紙に印刷して郵便にて送付してもよい。 The recommended information distribution unit 25 includes an information distribution device including a WWW server, and corresponds to an output unit that outputs a processing result from the collaborative filter unit 23. The merchandise information storage unit 27 stores merchandise information about a large number of merchandise. The recommendation information distribution unit 25 receives supply of data specifying a recommended product to be recommended to the recommendation target user from the collaborative filter unit 23. Then, the recommended information distribution unit 25 reads the product information of the recommended product from the product information storage unit 27 and provides it to the user terminal device 11 of the recommendation target user via the Internet 9. The merchandise information (recommended information) may be incorporated into an e-mail, or in another example, printed on paper and sent by mail.

処理対象データ抽出部２９は、履歴データ蓄積部２１に蓄積された多数のログから、協調フィルタ部２３で処理されるべきログを抽出する構成である。協調フィルタ部２３の内部構成自体は従来技術と同様でよいが、協調フィルタ部２３は、処理対象データ抽出部２９で抽出されたログを処理対象（入力データ）としている。すなわち、本実施の形態は、通常の協調フィルタ部２３に加えて処理対象データ抽出部２９が備えられている点で従来技術と異なる。 The processing target data extraction unit 29 is configured to extract a log to be processed by the collaborative filter unit 23 from a large number of logs stored in the history data storage unit 21. The internal configuration itself of the collaborative filter unit 23 may be the same as that of the related art, but the collaborative filter unit 23 uses the log extracted by the processing target data extracting unit 29 as a processing target (input data). That is, this embodiment differs from the prior art in that a processing target data extraction unit 29 is provided in addition to the normal collaborative filter unit 23.

処理対象データ抽出部２９は、協調フィルタ部２３と一体化されており、協調フィルタ装置２３と同じコンピュータによって実現される。すなわち、処理対象データ抽出部２９を実現するプログラムが、協調フィルタ部２３のコンピュータにより実行される。処理対象データ抽出部２９は、協調フィルタ部での処理の前に処理対象データを調整する機能を有しており、この点で、協調フィルタ支援装置と呼ぶこともできる。この観点では、協調フィルタ支援装置が協調フィルタ装置に一体化され、組み込まれている。なお、処理対象データ抽出部２９は協調フィルタ装置２３と同じコンピュータによって実現すると説明したが、別のコンピュータで実現してもよい。 The processing target data extraction unit 29 is integrated with the collaborative filter unit 23 and is realized by the same computer as the collaborative filter unit 23. That is, a program that implements the processing target data extraction unit 29 is executed by the computer of the collaborative filter unit 23. The processing target data extraction unit 29 adjusts the data to be processed before the collaborative filter unit processes it, and in this respect it can also be called a collaborative filter support device. From this viewpoint, the collaborative filter support device is integrated into and incorporated in the collaborative filter device. Although the processing target data extraction unit 29 has been described as being realized by the same computer as the collaborative filter unit 23, it may instead be realized by a separate computer.

以上に、本実施の形態に係る協調フィルタ装置１の全体概要を説明した。次に、協調フィルタ部２３による協調フィルタリング処理について説明し、その後で、本実施の形態に特徴的な処理対象データ抽出部２９について説明する。 The overall outline of the collaborative filter device 1 according to the present embodiment has been described above. Next, the collaborative filtering process by the collaborative filter unit 23 will be described, and then the processing target data extraction unit 29 characteristic of the present embodiment will be described.

「協調フィルタリング処理」
図４は、購入履歴の具体例を、ジャッカード係数の類似度と共に示している。図４は、ユーザＡ、Ｂ、Ｃの購入履歴データと類似度である。ユーザＡは、コンテンツＵ１、Ｕ２、Ｕ３を購入し、ユーザＢはコンテンツＵ２、Ｕ３、Ｕ４、Ｕ５を購入し、ユーザＣは、コンテンツＵ２、Ｕ３、Ｕ５、Ｕ６、Ｕ７を購入している。 "Collaborative filtering process"
FIG. 4 shows a specific example of purchase histories together with similarities based on the Jaccard coefficient. It lists the purchase history data of users A, B, and C and their similarities. User A has purchased content U1, U2, and U3; user B has purchased content U2, U3, U4, and U5; and user C has purchased content U2, U3, U5, U6, and U7.

ジャッカード（Ｊａｃｃａｒｄ）係数を用いる場合、ユーザ間の類似度は下記のように定義される。
ユーザＡ、Ｂの類似度：Ｓｉｍ（Ａ，Ｂ）＝（Ａ∩Ｂ）／（Ａ∪Ｂ）
（Ａ∩Ｂ）は、共通購入商品の数である。また、（Ａ∪Ｂ）は、ユーザＡ、Ｂの購入商品数の合計から共通購入商品の数を引いた値である（ユーザＡ、Ｂにより購入された商品の種類の数）。したがって、図４の場合、類似度の計算値は下記の通りである。
ユーザＡ，Ｂの類似度：Ｓｉｍ（Ａ，Ｂ）＝２／５＝０．４
ユーザＡ，Ｃの類似度：Ｓｉｍ（Ａ，Ｃ）＝２／６＝０．３３
ユーザＢ，Ｃの類似度：Ｓｉｍ（Ｂ，Ｃ）＝３／６＝０．５ When using the Jaccard coefficient, the similarity between users is defined as follows.
Similarity between users A and B: Sim (A, B) = (A∩B) / (A∪B)
Here (A∩B) denotes the number of products purchased by both users, and (A∪B) denotes the number of distinct products purchased by users A and B, that is, the sum of their purchase counts minus the number of commonly purchased products. For FIG. 4, the calculated similarities are as follows.
Similarity between users A and B: Sim (A, B) = 2/5 = 0.4
Similarity between users A and C: Sim (A, C) = 2/6 = 0.33
Similarity between users B and C: Sim (B, C) = 3/6 = 0.5
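The three values above can be reproduced with a few lines of set arithmetic. The following is a minimal sketch using the purchase sets stated for FIG. 4; the function name `jaccard` and the dictionary layout are assumptions made for illustration.

```python
def jaccard(items_a, items_b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = items_a | items_b
    if not union:
        return 0.0
    return len(items_a & items_b) / len(union)


# Purchase histories of users A, B, and C as described for FIG. 4.
purchases = {
    "A": {"U1", "U2", "U3"},
    "B": {"U2", "U3", "U4", "U5"},
    "C": {"U2", "U3", "U5", "U6", "U7"},
}

sim_ab = jaccard(purchases["A"], purchases["B"])  # 2 common items / 5 distinct items
sim_ac = jaccard(purchases["A"], purchases["C"])  # 2 common items / 6 distinct items
sim_bc = jaccard(purchases["B"], purchases["C"])  # 3 common items / 6 distinct items
```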

図５（ａ）、図５（ｂ）は、類似度から求められる推薦値のリストを示している。例えば、ユーザＡは、コンテンツＵ４をまだ購入していない。そして、コンテンツＵ４はユーザＢによって購入されている。したがって、ユーザＡに対するコンテンツＵ４の推薦値は、Ｓｉｍ（Ａ，Ｂ）＝０．４である。 FIG. 5A and FIG. 5B show a list of recommended values obtained from the similarity. For example, the user A has not purchased the content U4 yet. The content U4 is purchased by the user B. Therefore, the recommended value of the content U4 for the user A is Sim (A, B) = 0.4.

また、ユーザＡは、コンテンツＵ５をまだ購入していない。そして、コンテンツＵ５はユーザＢ、Ｃによって購入されている。したがって、ユーザＡに対するコンテンツＵ５の推薦値は、Ｓｉｍ（Ａ，Ｂ）とＳｉｍ（Ａ，Ｃ）の合計で求められる。 Further, the user A has not purchased the content U5 yet. The content U5 is purchased by the users B and C. Therefore, the recommended value of the content U5 for the user A is obtained by the sum of Sim (A, B) and Sim (A, C).

こうして、図５（ａ）に示されるように、各ユーザの未購入商品の推薦値が、同商品を購入したユーザの類似度の加算によって得られる。図５（ｂ）では、推薦値が高い順にコンテンツが並べられている。図５の推薦値から、推薦コンテンツを決定することができる。すなわち、推薦値が大きいコンテンツが、推薦コンテンツとして求められる。このとき、推薦値が最大のコンテンツが選択されてもよい。また、推薦値が大きい所定個数のコンテンツが選択されてもよい。また、推薦値が所定値以上のコンテンツが選択されてもよい。 In this way, as shown in FIG. 5A, the recommended value of the unpurchased product for each user is obtained by adding the similarity of the user who purchased the product. In FIG. 5B, the contents are arranged in descending order of recommended values. The recommended content can be determined from the recommended values in FIG. That is, content with a large recommended value is obtained as recommended content. At this time, the content having the maximum recommended value may be selected. A predetermined number of contents having a large recommended value may be selected. In addition, content whose recommended value is a predetermined value or more may be selected.
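The summation of similarities described above can be sketched as follows, again using the FIG. 4 purchase sets. The function names and the tie-breaking rule (alphabetical by content ID) are assumptions for illustration; the source only states that contents are ordered by recommendation value.

```python
from collections import defaultdict


def jaccard(items_a, items_b):
    """Jaccard coefficient between two item sets."""
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


def recommendation_values(target, purchases):
    """For each item the target has not bought, sum the similarities of its buyers."""
    owned = purchases[target]
    scores = defaultdict(float)
    for other, items in purchases.items():
        if other == target:
            continue
        sim = jaccard(owned, items)
        for item in items - owned:
            scores[item] += sim
    # Highest recommendation value first; ties broken by content ID (an assumption).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


purchases = {
    "A": {"U1", "U2", "U3"},
    "B": {"U2", "U3", "U4", "U5"},
    "C": {"U2", "U3", "U5", "U6", "U7"},
}
ranked = recommendation_values("A", purchases)
```

For user A, U5 ranks first because both B and C purchased it, so its value is Sim(A, B) + Sim(A, C). Selecting the top item, the top N items, or all items above a cutoff then corresponds to the three selection options mentioned in the text.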

図１の協調フィルタ部２３では、類似度算出部３１が、図４に示される処理を行い、ユーザ間の類似度を算出する。そして、推薦部３３が、図５に示される処理を行い、推薦値を計算し、推薦値から推薦コンテンツを決定する。推薦コンテンツのコンテンツＩＤが、推薦対象ユーザのユーザ識別データと共に推薦情報配信部２５に提供される。推薦情報配信部２５が、コンテンツＩＤに関連づけられた商品情報を商品情報記憶部２７から読み出し、推薦対象のユーザに送信する。 In the collaborative filter unit 23 of FIG. 1, the similarity calculation unit 31 performs the processing shown in FIG. 4 to calculate the similarity between users. And the recommendation part 33 performs the process shown by FIG. 5, calculates a recommendation value, and determines a recommendation content from a recommendation value. The content ID of the recommended content is provided to the recommended information distribution unit 25 together with the user identification data of the recommendation target user. The recommendation information distribution unit 25 reads the product information associated with the content ID from the product information storage unit 27 and transmits it to the recommended user.

図６は、協調フィルタ部２３の動作を示すフローチャートである。ユーザ間の類似度が、類似度算出部３１により算出され（Ｓ１１）、推薦コンテンツの推薦度（推薦値）が推薦部３３により算出され（Ｓ１３）、さらに推薦リストが推薦部３３により生成される（Ｓ１５）。 FIG. 6 is a flowchart showing the operation of the collaborative filter unit 23. The similarity between users is calculated by the similarity calculation unit 31 (S11), the recommendation level (recommended value) of the recommended content is calculated by the recommendation unit 33 (S13), and a recommendation list is generated by the recommendation unit 33. (S15).

以上は、ジャッカード係数を利用する場合の協調フィルタリングである。次に、余弦を利用する場合の協調フィルタリングについて説明する。 The above describes collaborative filtering using the Jaccard coefficient. Next, collaborative filtering using the cosine will be described.

図７は、評価値を含んだ購入履歴データの具体例例を、余弦の類似度と共に示している。購入履歴データは、各ユーザの各コンテンツに対する評価（１〜５）を含んでいる。例えば、ユーザＡがコンテンツＵ１を購入しており、ユーザＡによるコンテンツＵ１の評価値は、３である。評価値は、図１のユーザ端末装置１１から購入処理部５に送られ、購入処理部５が履歴データに評価値を組み込む。 FIG. 7 shows a specific example of purchase history data including evaluation values together with cosine similarity. The purchase history data includes evaluations (1 to 5) for each content of each user. For example, the user A has purchased the content U1, and the evaluation value of the content U1 by the user A is 3. The evaluation value is sent from the user terminal device 11 of FIG. 1 to the purchase processing unit 5, and the purchase processing unit 5 incorporates the evaluation value into the history data.

余弦の類似度は、図示のように、各ユーザのベクトルの余弦で表される。図の例では、ユーザ間の類似度が下記の通りになる。
ユーザＡ，Ｂの類似度：Ｓｉｍ（Ａ，Ｂ）＝０．６４８
ユーザＡ，Ｃの類似度：Ｓｉｍ（Ａ，Ｃ）＝０．５３１
ユーザＢ，Ｃの類似度：Ｓｉｍ（Ｂ，Ｃ）＝０．５８９ The cosine similarity is represented by the cosine of each user's vector, as shown. In the example of the figure, the similarity between users is as follows.
Similarity between users A and B: Sim (A, B) = 0.648
Similarity between users A and C: Sim (A, C) = 0.531
Similarity between users B and C: Sim (B, C) = 0.589
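Because FIG. 7 is not reproduced here, the values above cannot be recomputed from the text. The following sketch only illustrates how a cosine similarity between two users' rating vectors might be computed. It assumes that items a user has not rated contribute zero to the vector, which the source does not state, and the ratings used are hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between two rating vectors.

    Each argument maps content ID -> evaluation value. Items missing from a
    user's dictionary are treated as 0 (an assumption, not from the source).
    """
    common = ratings_a.keys() & ratings_b.keys()
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical ratings, chosen only to make the arithmetic easy to follow.
ratings_x = {"U1": 3, "U2": 4}
ratings_y = {"U2": 4, "U3": 3}
sim_xy = cosine_similarity(ratings_x, ratings_y)  # 16 / (5 * 5)
```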

図８（ａ）、図８（ｂ）は、図７の購入履歴データと類似度から求められる推薦値のリストを示している。この場合、類似度に評価値がかけ算されて、推薦値が計算される。例えば、ユーザＡは、コンテンツＵ４をまだ購入していない。そして、コンテンツＵ４はユーザＢによって購入されている。コンテンツＵ４に対するユーザＢの評価値は、３である。したがって、ユーザＡに対するコンテンツＵ４の推薦値は、Ｓｉｍ（Ａ，Ｂ）×（Ｂの評価値）＝０．６４８×３＝２．５９である。 FIGS. 8A and 8B show a list of recommended values obtained from the purchase history data of FIG. 7 and the similarity. In this case, the recommendation value is calculated by multiplying the evaluation value by the similarity. For example, the user A has not purchased the content U4 yet. The content U4 is purchased by the user B. The evaluation value of the user B for the content U4 is 3. Therefore, the recommended value of the content U4 for the user A is Sim (A, B) × (B evaluation value) = 0.648 × 3 = 2.59.

また、ユーザＡは、コンテンツＵ５をまだ購入していない。そして、コンテンツＵ５はユーザＢ、Ｃによって購入されている。コンテンツＵ５に対するユーザＢ、Ｃの評価は、それぞれ、４、３である。これらを使って、ユーザＡに対するコンテンツＵ５の推薦値は、「Ｓｉｍ（Ａ，Ｂ）×（Ｂの評価値）」と「Ｓｉｍ（Ａ，Ｃ）×（Ｃの評価値）」の合計で求められる。図８（ａ）では、Ｂ、Ｃの評価値が、単に、Ｂ、Ｃと表記されている。 Likewise, user A has not yet purchased content U5, which has been purchased by users B and C. The evaluation values given to content U5 by users B and C are 4 and 3, respectively. Using these, the recommendation value of content U5 for user A is obtained as the sum of "Sim (A, B) × (B's evaluation value)" and "Sim (A, C) × (C's evaluation value)". In FIG. 8A, the evaluation values of B and C are written simply as B and C.

こうして、図８（ａ）に示されるように、各ユーザの未購入商品の推薦値が、同商品を購入したユーザの「類似度と評価値の積」の加算によって得られる。図８（ｂ）では、推薦値が高い順にコンテンツが並べられている。図８の推薦値から、図５で説明したようにして、推薦コンテンツを決定することができる。 In this way, as shown in FIG. 8A, the recommended value of each user's unpurchased product is obtained by adding the “product of similarity and evaluation value” of the user who purchased the product. In FIG. 8B, the contents are arranged in descending order of recommended values. From the recommended values in FIG. 8, the recommended content can be determined as described in FIG.
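The weighted sum described above can be sketched as follows. The similarity values and the U5 ratings are the ones quoted in the text; the function name and dictionary layout are assumptions for illustration.

```python
def weighted_recommendation(similarities, ratings):
    """Sum of (similarity to the target user) x (rating) over the item's buyers.

    similarities: {user: similarity to the recommendation target user}
    ratings:      {user: that user's evaluation value for the item}
    """
    return sum(similarities[user] * rating for user, rating in ratings.items())


sims_to_a = {"B": 0.648, "C": 0.531}   # cosine similarities quoted in the text
u5_ratings = {"B": 4, "C": 3}          # evaluation values of U5 quoted in the text
u5_value = weighted_recommendation(sims_to_a, u5_ratings)  # 0.648*4 + 0.531*3
```

For U4, the same function gives 0.648 × 3 = 1.944 when B's rating is 3, whereas the value 2.59 quoted in the text corresponds to a rating of 4. Since FIG. 7 is not reproduced here, the sketch leaves the rating as an input rather than assuming either figure.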

図９は、図７、図８の処理の変形例である。図７の購入履歴データでは、評価値が１〜５であった。図９の例では、評価値が−２〜２である。この場合、図９(ａ）に示すように、類似度は下記の通りになる。
ユーザＡ，Ｂの類似度：Ｓｉｍ（Ａ，Ｂ）＝０．８９
ユーザＡ，Ｃの類似度：Ｓｉｍ（Ａ，Ｃ）＝０．７１
ユーザＢ，Ｃの類似度：Ｓｉｍ（Ｂ，Ｃ）＝０．６３ FIG. 9 shows a modification of the processing of FIGS. 7 and 8. In the purchase history data of FIG. 7, the evaluation values ranged from 1 to 5. In the example of FIG. 9, the evaluation values range from −2 to 2. In this case, as shown in FIG. 9A, the similarities are as follows.
Similarity between users A and B: Sim (A, B) = 0.89
Similarity between users A and C: Sim (A, C) = 0.71
Similarity between users B and C: Sim (B, C) = 0.63

そして、推薦リストは、図９（ｂ）、図９（ｃ）に示す通りになる。図９（ｃ）では、推薦値が正（プラス）のコンテンツのみでリストが作られている。 The recommendation list is as shown in FIGS. 9B and 9C. In FIG. 9C, a list is created only with content whose recommended value is positive (plus).

ちなみに、類似度の算出に関して、ジャッカード係数、余弦を利用した協調フィルタリング処理を説明したが、上記以外の処理を利用した協調フィルタリング処理を用いてもよい。 Although collaborative filtering using the Jaccard coefficient and the cosine has been described for calculating the similarity, collaborative filtering based on other calculations may also be used.

「処理対象データ抽出部」
次に、本実施の形態に特徴的な処理対象データ抽出部２９について詳細に説明する。処理対象データ抽出部２９は、以下のように、協調フィルタリングの処理時間等を予測するシミュレーション機能と、シミュレーション結果に基づいて、協調フィルタリングで処理されるべきデータ量を削減する機能を実現する。 "Processing data extractor"
Next, the processing target data extraction unit 29 characteristic of this embodiment will be described in detail. The processing target data extracting unit 29 realizes a simulation function for predicting the processing time of collaborative filtering and a function for reducing the amount of data to be processed by collaborative filtering based on the simulation result as follows.

図１０は、処理対象データ抽出部２９の構成を示している。処理対象データ抽出部２９は、ログ数と協調フィルタリング処理時間との関係の基礎データを記憶した基礎データ記憶部４１と、協調フィルタリングの処理対象の目標ログ数を入力する目標ログ数入力部４３と、基礎データ記憶部４１から基礎データを読み出して、ログ数入力部により入力された目標ログ数に対応する予測処理時間を基礎データから求める処理時間算出部４５と、処理時間算出部４５により算出された予測処理時間を出力する処理時間出力部４７と、処理時間出力部４７により出力された予測処理時間に対する許可判定を受け付ける許可判定受付部４９と、目標ログ数入力部４３により入力された目標ログ数に基づき、履歴データ蓄積部２１に蓄積されたログから目標ログ数のログを抽出するログ抽出部５１とを備えている。 FIG. 10 shows the configuration of the processing target data extraction unit 29. The processing target data extraction unit 29 includes a basic data storage unit 41 that stores basic data on the relationship between the number of logs and the collaborative filtering processing time, a target log number input unit 43 that inputs the target log number of processing targets for collaborative filtering, The basic data is read from the basic data storage unit 41 and calculated by the processing time calculation unit 45 and the processing time calculation unit 45 which obtains the predicted processing time corresponding to the target log number input by the log number input unit from the basic data. The processing time output unit 47 that outputs the predicted processing time, the permission determination receiving unit 49 that receives permission determination for the predicted processing time output by the processing time output unit 47, and the target log input by the target log number input unit 43 And a log extraction unit 51 that extracts a target number of logs from the logs accumulated in the history data accumulation unit 21 based on the number.

基礎データ記憶部４１の基礎データは、基礎データ取得実験処理部５３によって生成される。基礎データは、下記のようなデータである。 The basic data in the basic data storage unit 41 is generated by the basic data acquisition experiment processing unit 53. The basic data is the following data.

図１１（ａ）および図１１（ｂ）は、推薦対象ユーザ数を固定したときのログ数と処理時間の関係である。推薦対象ユーザの数は、例えば、１，０００，０００人である。理想的には、図１１（ａ）に示されるように、ログ数と処理時間が比例し、両者の関係は一本の直線で表される。 FIG. 11A and FIG. 11B show the relationship between the number of logs and the processing time when the number of recommendation target users is fixed. The number of recommended users is, for example, 1,000,000. Ideally, as shown in FIG. 11A, the number of logs is proportional to the processing time, and the relationship between the two is represented by a single straight line.

しかし、現実のグラフは、図１１（ｂ）に示されるように、一本の直線にはならない。協調フィルタリングを実行するコンピュータでは、メモリおよびＨＤＤの容量が有限である。ログ数が増えると、メモリが足りなくなり、図１１（ｂ）のＡ点でＳＷＡＰが生じ、処理時間のラインの傾きが急に大きくなる。さらに、図１１（ｂ）のＢ点では、ＨＤＤの容量が限界点に達し、処理が不可能となる。 However, the actual graph does not become a single straight line as shown in FIG. In a computer that performs collaborative filtering, the capacity of the memory and HDD is finite. As the number of logs increases, the memory becomes insufficient, SWAP occurs at point A in FIG. 11B, and the slope of the processing time line suddenly increases. Further, at point B in FIG. 11B, the capacity of the HDD reaches a limit point, and processing becomes impossible.

基礎データ取得実験処理部５３は、実際にコンピュータに協調フィルタ処理を実行させて処理時間を計測する基礎実験処理によって、図１１（ｂ）のデータを計測する処理を行う。 The basic data acquisition experiment processing unit 53 performs the process of measuring the data in FIG. 11B by the basic experiment process of actually causing the computer to execute the collaborative filter process and measuring the processing time.

図１２は、基礎データ取得実験処理部５３の処理を示している。基礎実験では、推薦対象ユーザ数ｘが固定され、ログ数ｙが一定間隔で増加され、そして、各ｙの値での処理時間ｔが計測される。ここでは、協調フィルタ部２３によるサンプルログの処理に実際に要する時間が計測される。サンプルログは、履歴データ蓄積部２１に記憶された実際のログでよい。すなわち、基礎データ取得実験処理部５３は、履歴データ蓄積部２１の履歴データを読み出して基礎実験を行う。 FIG. 12 shows the processing of the basic data acquisition experiment processing unit 53. In the basic experiment, the number x of recommended users is fixed, the number y of logs is increased at regular intervals, and the processing time t at each y value is measured. Here, the time actually required for the processing of the sample log by the collaborative filter unit 23 is measured. The sample log may be an actual log stored in the history data storage unit 21. That is, the basic data acquisition experiment processing unit 53 reads the history data of the history data storage unit 21 and performs a basic experiment.

図１２に示すように、各ログ数ｙ１、ｙ２、、に対応する処理時間ｔ１、ｔ２が求められ、点ｐ１、ｐ２、、、がプロットされる。点ｐ２が求められると、線ｐ１−ｐ２の傾きが求められる。次に、点ｐ３が求められると、線ｐ２−ｐ３の傾きが求められる。この処理が繰り返される。 As shown in FIG. 12, processing times t1 and t2 corresponding to the respective log numbers y1 and y2 are obtained, and points p1, p2, and so on are plotted. When the point p2 is obtained, the slope of the line p1-p2 is obtained. Next, when the point p3 is obtained, the slope of the line p2-p3 is obtained. This process is repeated.

図１２の例では、線ｐ２−ｐ３の傾きが急に大きくなる。傾きが大きくなると、線の中間の点が取られる。図の例では、線ｐ２−ｐ３の傾きが大きくなったので、ｙ２とｙ３の間にｙ４が設定され、ｙ４に対する処理時間ｔ４が計測される。こうして、変曲点ｐ４が得られ、図１２のグラフが得られる。 In the example of FIG. 12, the slope of the line p2-p3 suddenly increases. As the slope increases, the middle point of the line is taken. In the example in the figure, since the slope of the line p2-p3 has increased, y4 is set between y2 and y3, and the processing time t4 for y4 is measured. Thus, the inflection point p4 is obtained, and the graph of FIG. 12 is obtained.

図１２は、説明を分かりやすくするために、模式化され、簡略化されている。実際には、より多くのログ数ｙに対する処理時間ｔの計測が繰り返されてよい。そして、上述したような２分法による計算処理が行われて、傾きの変曲点が求められてよい。また、２分法の代わりにＮｅｗｔｏｎ法が適用されてもよい。また、最急降下法によって図１２のグラフが求められてもよい。 FIG. 12 is schematic and simplified for ease of explanation. In practice, the processing time t may be measured for many more values of the log number y. The point at which the slope changes may then be located by the bisection procedure described above. Newton's method may be applied instead of bisection, and the graph of FIG. 12 may also be obtained by the steepest descent method.
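The bisection step can be sketched as follows. This is a simplified illustration under stated assumptions: the timing curve has a single knee between the two bracketing log counts, both segments are roughly linear, and measurements are noise-free. The function `synthetic_time` is a hypothetical stand-in for actually running the collaborative filter and timing it, and the tolerance parameters are illustrative.

```python
def find_knee(measure, lo, hi, base_slope, tol=1.0, rel=0.05):
    """Bisect [lo, hi] to locate where the slope departs from base_slope.

    measure(y) returns the processing time for y logs. base_slope is the slope
    observed before the knee (for example, the slope of the line p1-p2).
    """
    t_lo = measure(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        t_mid = measure(mid)
        slope = (t_mid - t_lo) / (mid - lo)
        if abs(slope - base_slope) <= rel * abs(base_slope):
            lo, t_lo = mid, t_mid   # still on the first segment: knee is in [mid, hi]
        else:
            hi = mid                # slope already increased: knee is in [lo, mid]
    return (lo + hi) / 2


def synthetic_time(y, knee=650.0):
    """Hypothetical timing curve: slope 2 up to the knee, slope 10 beyond it."""
    return 2.0 * y if y <= knee else 2.0 * knee + 10.0 * (y - knee)


# y2 = 600 and y3 = 700 bracket the knee, as in the p2-p3 interval of the text.
knee_estimate = find_knee(synthetic_time, 600.0, 700.0, base_slope=2.0)
```

With real measurements, timing noise would call for a looser `rel` tolerance or repeated runs per point; those choices are outside what the source specifies.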

図１２のグラフが得られると、変曲点ログ数Ｙが求められる。変曲点ログ数Ｙは、グラフの変曲点（＝図１１（ｂ）のＡ点）に対応するログ数である。変曲点ログ数Ｙは、コンピュータ資源を最大限に使用する最適なログ数ということもできる。変曲点ログ数Ｙを適用すると、メモリ容量を限界以下で使用し、処理時間を極端に増大させることなくログ数を極力増やして、より高いフィルタリング精度が得られる。 When the graph of FIG. 12 is obtained, the inflection point log number Y is obtained. The inflection point log number Y is the number of logs corresponding to the inflection point of the graph (= point A in FIG. 11B). The inflection point log number Y can also be said to be the optimal number of logs that use computer resources to the maximum. When the inflection point log number Y is applied, the memory capacity is used below the limit, the log number is increased as much as possible without extremely increasing the processing time, and higher filtering accuracy can be obtained.

また、図１２のグラフから、推薦対象ユーザ数を固定したときのログ数と処理時間の関係が得られる。ｔ＝ａｙ＋ｂ（０＜ｙ＜Ｙ）、ｔ＝ｃｙ＋ｄ（Ｙ＜ｙ）（ａ、ｂ、ｃ、ｄは係数）。この式に対応するデータが、基礎データとして基礎データ記憶部４１に格納される。 Further, the relationship between the number of logs and the processing time when the number of recommended target users is fixed is obtained from the graph of FIG. t = ay + b (0 <y <Y), t = cy + d (Y <y) (a, b, c, d are coefficients). Data corresponding to this expression is stored in the basic data storage unit 41 as basic data.
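The two-segment relation above lends itself to a direct lookup function. The sketch below is illustrative only: the coefficient values are hypothetical, and treating y = Y as belonging to the first segment is an assumption, since the text defines the two ranges with strict inequalities.

```python
def predict_time(y, knee, a, b, c, d):
    """Predicted processing time for y logs from the piecewise-linear basic data.

    t = a*y + b for y up to the inflection-point log count `knee` (Y in the text),
    t = c*y + d beyond it.
    """
    if y < 0:
        raise ValueError("log count must be non-negative")
    return a * y + b if y <= knee else c * y + d


# Hypothetical coefficients, continuous at the knee: 2*650 == 10*650 - 5200.
basic_data = {"knee": 650, "a": 2.0, "b": 0.0, "c": 10.0, "d": -5200.0}

t_before = predict_time(500, **basic_data)   # on the first segment
t_after = predict_time(700, **basic_data)    # on the steeper second segment
```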

以上に基礎データ取得実験処理部５３の機能を説明した。なお、ここでは説明を分かりやすくするためにグラフを用いた。実際の計算処理は、グラフへのプロットと同等の処理を行えばよく、グラフが実際に使われなくてよいことはもちろんである。 The function of the basic data acquisition experiment processing unit 53 has been described above. Here, a graph is used for easy understanding of the explanation. Of course, the actual calculation process is equivalent to the process of plotting on the graph, and the graph need not be actually used.

図１０に戻ると、目標ログ数入力部４３、処理時間出力部４７および許可判定受付部４９は、キーボード、ディスプレイ等の入出力装置で構成されている。オペレータにより入力装置が操作されて、所望の最小のログ数が入力される。このログ最小値は、必要な精度を確保するために必要と考えられる最低限のログ数である。このログ最小値が、目標ログ数として目標ログ数入力部４３に入力されて、処理時間算出部４５に取得される。 Returning to FIG. 10, the target log number input unit 43, the processing time output unit 47, and the permission determination reception unit 49 are configured by input / output devices such as a keyboard and a display. The input device is operated by the operator to input a desired minimum number of logs. This log minimum value is the minimum number of logs considered necessary for ensuring the necessary accuracy. This log minimum value is input as the target log number to the target log number input unit 43 and acquired by the processing time calculation unit 45.

処理時間算出部４５は、目標ログ数から予測処理時間を算出する。基礎データ記憶部４１から基礎データが読み出され、基礎データに目標ログ数が当てはめられて、予測処理時間が算出される。基礎データは、上述のようにログ数ｙと処理時間ｔの関係の式の形をとっている。この式に目標ログ数が代入されればよい。処理時間出力部４７は、算出された予測処理時間を出力する。予測処理時間はディスプレイに表示される。処理時間が適当であれば、オペレータは許可操作を入力装置に対して行い、許可判定受付部４９が許可判定（納得判定）を受け付ける。そして、許可判定は、許可判定受付部４９からログ抽出部５１に伝えられる。ログ抽出部５１は、目標ログ数入力部４３に入力された目標ログ数も取得している。ログ抽出部５１は、許可判定を取得すると、履歴データ蓄積部２１に蓄積されたログから目標ログ数のログを抽出する。ログ抽出処理は任意であり、例えば、ログ抽出処理では、ログがランダムに抽出されて、ログの数が目標ログ数まで減らされてもよい。しかし、本実施の形態では、後述する類似度理論値と類似度しきい値を用いることが好適であり、これによりフィルタリングの精度を高くできる。この特徴的な抽出処理の詳細は、後述に項を改めて述べる。 The processing time calculation unit 45 calculates a predicted processing time from the target number of logs. The basic data is read from the basic data storage unit 41, the target log number is applied to the basic data, and the prediction processing time is calculated. The basic data is in the form of an expression of the relationship between the log number y and the processing time t as described above. What is necessary is just to substitute the target log number into this formula. The processing time output unit 47 outputs the calculated predicted processing time. The predicted processing time is displayed on the display. If the processing time is appropriate, the operator performs a permission operation on the input device, and the permission determination receiving unit 49 receives a permission determination (satisfaction determination). The permission determination is transmitted from the permission determination reception unit 49 to the log extraction unit 51. The log extraction unit 51 also acquires the target log number input to the target log number input unit 43. When acquiring the permission determination, the log extraction unit 51 extracts a target number of logs from the logs accumulated in the history data accumulation unit 21. The log extraction process is arbitrary. For example, in the log extraction process, logs may be extracted at random, and the number of logs may be reduced to the target number of logs. 
In the present embodiment, however, it is preferable to extract logs using the theoretical similarity value and the similarity threshold described later, which allows filtering accuracy to be kept high. This characteristic extraction process is described in detail in a later section.

図１３は、処理対象データ抽出部２９の動作を示すフローチャートである。全推薦対象ユーザ数ｘが入力され（Ｓ２１）、基礎データ取得実験処理部５３により基礎データ取得実験が行われ（Ｓ２３）、基礎データである処理終了予測グラフデータが生成される（Ｓ２５）。そして、ログ最小値が入力されて（Ｓ２７）、予測処理時間が基礎データから求められ、表示される（Ｓ２９）。許可判定の受付の有無が判定され（Ｓ３１）、許可判定が受け付けられなければ（Ｓ３１、Ｎｏ）、ステップＳ２７に戻って別のログ最小値が入力される。許可判定が受け付けられると（Ｓ３１、Ｙｅｓ）、履歴データ蓄積部２１の全ログ数が目標ログ数と比較され、全ログ数が目標ログ数より大きいか否かが判定される（Ｓ３３）。全ログ数が目標ログ数以下であれば（Ｓ３３、Ｎｏ）、処理が終了する。処理対象データ抽出部２９は、履歴データ蓄積部２１のすべてのログを協調フィルタ部２３に処理させる。ステップＳ３３で全ログ数が目標ログ数より多ければ（Ｓ３３、Ｙｅｓ）、全ログから処理対象のログが抽出され、ログ数が目標ログ数まで削減される（Ｓ３５）。抽出されたログが、処理対象データ抽出部２９から協調フィルタ部２３に伝えられ、協調フィルタ部２３でフィルタ処理が行われる。 FIG. 13 is a flowchart showing the operation of the processing target data extraction unit 29. The total number x of recommended users is input (S21), and a basic data acquisition experiment is performed by the basic data acquisition experiment processing unit 53 (S23), and processing end prediction graph data that is basic data is generated (S25). Then, the minimum log value is input (S27), and the predicted processing time is obtained from the basic data and displayed (S29). Whether or not the permission determination is accepted is determined (S31), and if the permission determination is not accepted (S31, No), the process returns to step S27 and another log minimum value is input. When the permission determination is accepted (S31, Yes), the total number of logs in the history data storage unit 21 is compared with the target log number, and it is determined whether or not the total log number is larger than the target log number (S33). If the total number of logs is equal to or less than the target number of logs (S33, No), the process ends. The processing target data extraction unit 29 causes the collaborative filter unit 23 to process all the logs in the history data storage unit 21. If the total number of logs is larger than the target number of logs in step S33 (S33, Yes), the logs to be processed are extracted from all the logs, and the number of logs is reduced to the target number of logs (S35). 
The extracted log is transmitted from the processing target data extraction unit 29 to the collaborative filter unit 23, and the collaborative filter unit 23 performs filter processing.
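Steps S33 and S35 of FIG. 13 can be sketched as below, using the random extraction that the text mentions as one permissible option. The function name, the `seed` parameter, and the use of Python's `random` module are assumptions for illustration; the preferred threshold-based extraction is not shown here.

```python
import random


def extract_logs(all_logs, target_count, seed=None):
    """Return the logs to hand to the collaborative filter unit.

    S33: if the total number of logs does not exceed the target, use them all.
    S35: otherwise reduce the logs to the target count (here, by random sampling).
    """
    all_logs = list(all_logs)
    if len(all_logs) <= target_count:
        return all_logs
    rng = random.Random(seed)          # seeded for repeatable extraction
    return rng.sample(all_logs, target_count)


reduced = extract_logs(range(10), 4, seed=0)   # more logs than the target: sample 4
unchanged = extract_logs([1, 2, 3], 5)         # fewer logs than the target: keep all
```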

図１４は、処理対象データ抽出部２９の動作に対応する画面表示を示している。図１４の画面は、コンピュータのディスプレイに表示される。図１４に示すように、まず、基礎データ実験のためのｘ（推薦対象ユーザ数）の入力要求画面が表示される。推薦対象ユーザ数が入力されると、基礎実験中であることを示す画面が表示される。基礎実験が終わると、ログ最小値（目標ログ数）の入力要求画面が表示される。ログ最小値が入力されると、予測処理時間が表示されて、「処理実行」ボタンと「キャンセル」ボタンが表示される。「キャンセル」ボタンが表示されると、図１３のステップＳ３１がＮｏになり、再度、ログ最小値の入力要求画面が表示される。「処理実行」ボタンが押されると、ステップＳ３１がＹｅｓになり、ログ抽出が行われる。なお、基礎実験で得られるログ数と処理時間の関係を保管しておくことにより、基礎実験を繰り返さなくてもよくなり、この場合には図１３および図１４で基礎実験の処理が省略されてよい。 FIG. 14 shows the screen display corresponding to the operation of the processing target data extraction unit 29. The screens of FIG. 14 are shown on a computer display. As shown in FIG. 14, an input request screen for x (the number of recommendation target users) for the basic data experiment is displayed first. When the number of recommendation target users is input, a screen indicating that the basic experiment is in progress is displayed. When the basic experiment is finished, an input request screen for the log minimum value (target log number) is displayed. When the log minimum value is input, the predicted processing time is displayed together with a "process execution" button and a "cancel" button. When the "cancel" button is pressed, step S31 in FIG. 13 results in No, and the input request screen for the log minimum value is displayed again. When the "process execution" button is pressed, step S31 results in Yes and log extraction is performed. If the relationship between the number of logs and the processing time obtained in the basic experiment is saved, the basic experiment need not be repeated, in which case the basic experiment steps in FIGS. 13 and 14 may be omitted.

次に、図１０のログ抽出部５１について詳細に説明する。本実施の形態では、ログ抽出部５１は、下記のように、類似度理論値を用いて、フィルタリング処理の精度を高く保ちながら、全ログの一部を抽出することができる。 Next, the log extracting unit 51 in FIG. 10 will be described in detail. In the present embodiment, the log extraction unit 51 can extract a part of all logs using the theoretical similarity value as described below while keeping the accuracy of the filtering process high.

ここでは、まず、図１５の２つの具体例におけるジャッカード係数による類似度を考える。図１５（ａ）では、推薦対象ユーザＡの購入個数が１０個であり、比較対象ユーザ（類似ユーザ）Ｂの購入個数が１０個であり、共通購入商品の数が５である。この場合、類似度は０．３３である。一方、図１５（ｂ）では、推薦対象ユーザＡの購入個数が１０個であり、比較対象ユーザ（類似ユーザ）Ｃの購入個数が１００個であり、共通商品の数が５である。この場合、類似度は０．０４である。したがって、購入個数が遠いユーザＣの購入商品ｃよりも、購入個数が近いユーザＢの購入商品ｂの方が、信頼性の高いお薦め商品である。 Here, first, consider the similarity given by the Jaccard coefficient in the two specific examples of FIG. 15. In FIG. 15(a), the recommendation target user A has purchased 10 items, the comparison target user (similar user) B has purchased 10 items, and the number of commonly purchased products is 5. In this case, the similarity is 0.33. In FIG. 15(b), the recommendation target user A has purchased 10 items, the comparison target user (similar user) C has purchased 100 items, and the number of common products is 5. In this case, the similarity is 0.04. Therefore, product b purchased by user B, whose purchase count is close to that of user A, is a more reliable recommendation than product c purchased by user C, whose purchase count is far from it.
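The two similarity values of FIG. 15 can be checked with a short calculation. The following is a minimal sketch, assuming the Jaccard coefficient is computed from set sizes as common / (A + B - common); the function name `jaccard` is illustrative and does not appear in the specification.

```python
def jaccard(target_count: int, other_count: int, common_count: int) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| computed from the set sizes."""
    union = target_count + other_count - common_count
    return common_count / union if union else 0.0


# FIG. 15(a): users A and B bought 10 items each, 5 in common -> 5 / 15
sim_b = jaccard(10, 10, 5)
# FIG. 15(b): user A bought 10 items, user C bought 100, 5 in common -> 5 / 105
sim_c = jaccard(10, 100, 5)

print(f"{sim_b:.3f} {sim_c:.3f}")
```

The first value is 1/3, given as 0.33 in the text. The second is 5/105, roughly 0.048, which matches the 0.04 in the text if the value is truncated to two decimal places.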

このように、共通購入商品が同じであっても、ベースになる購入個数といった嗜好情報が近いユーザの類似度が大きくなる。このことは、類似度を考慮して購入個数が近いユーザのログを抽出することによって信頼性つまり精度を極力落とさずにログ数を削減できることを意味している。 In this way, even if the common purchase product is the same, the similarity of users having similar preference information such as the number of purchases as a base increases. This means that the number of logs can be reduced without reducing reliability, that is, accuracy as much as possible, by extracting logs of users whose purchases are close in consideration of similarity.

本実施の形態は、このような見地に基づいてログ抽出処理を行う。特に、本実施の形態は、下記のように、類似度理論値に基づいた処理の工夫により、極力類似度が大きくなるような購入数を客観的なデータ処理によって特定し、この特定された購入数の比較対象ユーザのログを残す抽出処理を実現する。 The present embodiment performs log extraction from this viewpoint. In particular, as described below, it uses processing based on theoretical similarity values to identify, through objective data processing, the purchase counts for which the similarity becomes as large as possible, and realizes an extraction process that keeps the logs of the comparison target users having those purchase counts.

図１６は、本実施の形態で使用する類似度理論値のテーブルの例である。この例では、推薦対象ユーザの購入個数は１０に固定されている。そして、各比較対象ユーザの購入個数と、各共通購入個数とから、類似度理論値としてジャッカード係数が計算される。図１６は、比較対象ユーザ購入個数と共通購入個数の各種の組合せにおける類似度理論値のテーブルである。 FIG. 16 is an example of the table of theoretical similarity values used in the present embodiment. In this example, the purchase count of the recommendation target user is fixed at 10. The Jaccard coefficient is then calculated as the theoretical similarity value from each comparison target user's purchase count and each common purchase count. FIG. 16 thus tabulates the theoretical similarity values for the various combinations of comparison target user purchase count and common purchase count.

図１６において、三角形のゾーンＡは、類似度理論値が０．３３以上のゾーンである。一方、ゾーンＢは、類似度理論値が０．３３未満のゾーンである。なお、ゾーンＣは不要ゾーンである（購入個数≦共通個数のゾーン、購入個数＝共通個数の場合、推薦商品が存在しない）。 In FIG. 16, the triangular zone A is the zone in which the theoretical similarity value is 0.33 or more, and zone B is the zone in which it is less than 0.33. Zone C is an unnecessary zone: it is the zone where the purchase count is equal to or less than the common count, and when the purchase count equals the common count there is no product left to recommend.

このように、類似度理論値の最低値を設定すると、図示のゾーンＡができる。この値を、本発明では類似度しきい値と呼ぶ。すなわち、図の例では、類似度しきい値が０．３３である。 Thus, when the minimum value of the theoretical similarity value is set, the illustrated zone A is formed. This value is called a similarity threshold in the present invention. That is, in the example of the figure, the similarity threshold is 0.33.

そして、類似度しきい値が決まると、ゾーンＡが決まり、ゾーンＡでの比較対象ユーザの購入個数の最小値（５）と最大値（３０）が決まる。そして、この最小値から最大値までの購入個数に該当する比較対象ユーザの購入履歴データを抽出することができる。 Once the similarity threshold is determined, zone A is determined, and so are the minimum value (5) and maximum value (30) of the comparison target users' purchase counts within zone A. The purchase history data of the comparison target users whose purchase counts fall between this minimum and maximum can then be extracted.
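The minimum (5) and maximum (30) quoted for zone A can be reproduced from the Jaccard coefficient alone. The sketch below is illustrative only: it assumes that a purchase count belongs to zone A when its best-case similarity (the largest common count outside zone C) reaches the threshold. The function names and the `max_other` search bound are hypothetical.

```python
def jaccard(n_target, n_other, n_common):
    """Theoretical similarity value for one cell of the table."""
    return n_common / (n_target + n_other - n_common)


def zone_a_range(n_target, threshold, max_other=200):
    """Return (min, max) comparison-user purchase counts that can reach the threshold.

    max_other is an assumed upper bound on the purchase counts to scan.
    """
    qualifying = []
    for n_other in range(2, max_other + 1):
        # Zone C (purchase count <= common count) is excluded, so common < n_other.
        max_common = min(n_target, n_other - 1)
        # The coefficient grows with the common count, so this is the best case.
        if jaccard(n_target, n_other, max_common) >= threshold:
            qualifying.append(n_other)
    return (min(qualifying), max(qualifying)) if qualifying else None


print(zone_a_range(10, 0.33))  # (5, 30)
```

With the target user's purchase count fixed at 10 and a threshold of 0.33, the lower edge comes from 4/11 (purchase count 5, common count 4) and the upper edge from 10/30.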

ここで、図１５の具体例で説明した考え方に基づいて考えると、類似度しきい値を大きくすると、推薦対象ユーザとの購入個数が近い比較対象ユーザが絞り込まれることになる。しかも、極力、類似度が高くなるように、すなわち、推薦精度が高くなるように、比較対象ユーザが絞り込まれる。ただし、類似度しきい値を大きくすると、比較対象ユーザが絞り込まれるので（ゾーンＡが狭まる）、履歴データの数が少なくなる。 Here, based on the concept described in the specific example of FIG. 15, if the similarity threshold value is increased, comparison target users whose purchase numbers with the recommendation target user are close are narrowed down. In addition, the comparison target users are narrowed down so that the degree of similarity is as high as possible, that is, the recommendation accuracy is increased. However, if the similarity threshold value is increased, the comparison target users are narrowed down (zone A is narrowed), and the number of history data is reduced.

そこで、本実施の形態は、抽出目標の履歴データの量を達成できる範囲で、類似度しきい値を最大にする。これにより、精度を極力高く保ちながら必要な量の履歴データを抽出できる。 Therefore, in the present embodiment, the similarity threshold is maximized within a range in which the amount of history data to be extracted can be achieved. As a result, a necessary amount of history data can be extracted while keeping the accuracy as high as possible.

なお、図１６の例にて、類似度しきい値が０．３３のときのゾーンＡでは、購入個数の最大値が３０、最小値が５である。本実施の形態は、この購入個数範囲の全比較対象ユーザの履歴データを抽出する。したがって、ゾーンＡ以外の履歴データ（ゾーンＡの左側にあるゾーンＢの履歴データ）も混じることになる。厳密にはゾーンＡのみから履歴データを抽出した方が精度が向上する（ただし、処理時間が長くなる）。 In the example of FIG. 16, in the zone A when the similarity threshold is 0.33, the maximum value of the purchased number is 30 and the minimum value is 5. In the present embodiment, history data of all comparison target users in the purchased number range is extracted. Accordingly, history data other than zone A (history data of zone B on the left side of zone A) is also mixed. Strictly speaking, the accuracy is improved when history data is extracted from only zone A (however, the processing time becomes longer).

次に、図１７を参照し、本実施の形態のログ抽出処理をより具体的に説明する。図１７も図１６と同様に類似度理論値のテーブルである。ただし、図１６は、購入個数を使った一般的な説明の図であった。一方、図１７は、本実施の形態で実際に処理される履歴データであるログを用いて表現されたテーブルである。 Next, with reference to FIG. 17, the log extraction processing of the present embodiment will be described more specifically. FIG. 17 is also a table of theoretical similarity values similar to FIG. However, FIG. 16 is a diagram of general explanation using the purchased number. On the other hand, FIG. 17 is a table expressed using a log that is history data actually processed in the present embodiment.

すなわち、図１７では、購入個数がログに置き換えられる。一人のユーザのログ数は、全体のログ数との区別を明確にするために、個人ログ数ということにする。したがって、図１７では、推薦対象ユーザの個人ログ数が１０に固定されている。そして、図１７は、比較対象ユーザの個人ログ数と共通取得アイテムのログ数との組合せに対応する類似度理論値のテーブルである。 That is, in FIG. 17, the number of purchases is replaced with a log. The number of logs for a single user is referred to as the number of individual logs in order to clarify the distinction from the total number of logs. Therefore, in FIG. 17, the number of personal logs of the recommendation target user is fixed to 10. FIG. 17 is a table of similarity theoretical values corresponding to combinations of the number of personal logs of the comparison target user and the number of logs of the common acquired item.

図１７の例でも、類似度しきい値が０．３３である。類似度しきい値が設定されると、「しきい値基準個人ログ数」が決まる。「しきい値基準個人ログ数」は、類似度しきい値以上の類似度理論値に対応する比較対象ユーザの個人ログ数である。そして、「しきい値基準個人ログ数」は、類似度しきい値以上の類似度理論値に対応するゾーンＡの最大個人ログ数（３０）から最小個人ログ数（５）までの個人ログ数である。 In the example of FIG. 17 as well, the similarity threshold is 0.33. When the similarity threshold is set, the "threshold reference personal log number" is determined. The "threshold reference personal log number" is the personal log count of comparison target users that corresponds to theoretical similarity values equal to or higher than the similarity threshold. Specifically, it is the range of personal log counts from the minimum (5) to the maximum (30) in zone A, the zone corresponding to theoretical similarity values equal to or higher than the similarity threshold.

「しきい値基準個人ログ数」が決まると、「しきい値基準個人ログ数」に対応する比較対象ユーザのログの数が、履歴データの全ログからカウントされて、カウント値が求められる。図１７の例では、個人ログ数が５〜３０の全部の比較対象ユーザの全ログの数が求められる。本明細書で、このカウント値は、履歴データのログのうちで、与えられた条件に該当するログの実際の数の求められた値を意味する。カウントの処理は、このような与えられた条件に該当するログの実際の数を得られる処理であり、任意の処理でよく、検索等が含まれてよい。与えられる条件は、ここでは、類似度しきい値またはそこから定まるしきい値基準個人ログ数である。カウントのとき、各ログ数の該当人数が求められ、各ログ数と該当人数の積が求められ、積の総和が計算されてもよい。そして、本実施の形態では、カウント値に基づき、目標ログ数以上のログが残る範囲で最大の類似度しきい値が求められる。この最大の類似度しきい値を設定して、ログが履歴データから抽出される。 When the “threshold reference personal log number” is determined, the number of logs of the comparison target user corresponding to the “threshold reference personal log number” is counted from all the logs of the history data, and the count value is obtained. In the example of FIG. 17, the number of all logs of all the comparison target users whose personal logs are 5 to 30 is obtained. In this specification, this count value means a value obtained from the actual number of logs corresponding to a given condition among logs of history data. The counting process is a process for obtaining the actual number of logs corresponding to such a given condition, and may be an arbitrary process and may include a search or the like. Here, the given condition is the similarity threshold value or the threshold reference personal log number determined therefrom. At the time of counting, the number of people corresponding to the number of logs can be obtained, the product of the number of logs and the number of people can be obtained, and the sum of the products can be calculated. In the present embodiment, based on the count value, the maximum similarity threshold is obtained in a range where logs equal to or more than the target number of logs remain. A log is extracted from the history data by setting the maximum similarity threshold.
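The counting step described above (for each personal log count, multiply the count by the number of matching users, then sum the products) might look like the following sketch. It assumes each log is a `(user_id, item_id)` pair; the function name and the sample data are hypothetical.

```python
from collections import Counter


def count_logs_in_range(logs, lo, hi):
    """Count the logs of users whose personal log count lies in [lo, hi].

    logs: iterable of (user_id, item_id) pairs (assumed log format).
    """
    per_user = Counter(user for user, _ in logs)      # user -> personal log count
    users_per_size = Counter(per_user.values())       # personal log count -> number of users
    # Sum of (personal log count x number of users) over the allowed range.
    return sum(size * users for size, users in users_per_size.items() if lo <= size <= hi)


# Hypothetical history data: u1 has 2 logs, u2 and u3 have 5 each, u4 has 40.
logs = [("u1", "i0"), ("u1", "i1")]
logs += [("u2", f"i{k}") for k in range(5)]
logs += [("u3", f"i{k}") for k in range(5)]
logs += [("u4", f"i{k}") for k in range(40)]

count_value = count_logs_in_range(logs, 5, 30)
print(count_value)  # 10
```

Only u2 and u3 fall inside the range 5 to 30, so the count value is 5 x 2 = 10.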

図１８は、上記の処理を実現するログ抽出部５１の構成を示すブロック図である。ログ抽出部５１は、類似度理論値テーブル記憶部６１、類似度しきい値設定部６３、しきい値基準個人ログ数設定部６５、カウント値取得部６７、カウント値判定部６９、適用しきい値決定部７１および抽出実行部７３を有している。 FIG. 18 is a block diagram showing the configuration of the log extraction unit 51 that implements the above processing. The log extraction unit 51 includes a similarity degree theoretical value table storage unit 61, a similarity threshold setting unit 63, a threshold reference personal log number setting unit 65, a count value acquisition unit 67, a count value determination unit 69, and an application threshold. A value determination unit 71 and an extraction execution unit 73 are included.

類似度理論値テーブル記憶部６１は、図１７の類似度理論値のテーブルを記憶している。類似度しきい値設定部６３は、類似度しきい値を設定する。しきい値基準個人ログ数設定部６５は、類似度理論値テーブルを参照し、類似度しきい値に対応する「しきい値基準個人ログ数」を設定する。「しきい値基準個人ログ数」は、類似度しきい値以上の類似度理論値に対応するゾーンＡの最大個人ログ数から最小個人ログ数までの個人ログ数である。 The similarity degree theoretical value table storage unit 61 stores a table of similarity degree theoretical values shown in FIG. The similarity threshold setting unit 63 sets a similarity threshold. The threshold standard personal log number setting unit 65 refers to the similarity theoretical value table and sets the “threshold standard personal log number” corresponding to the similarity threshold. The “threshold reference personal log number” is the number of personal logs from the maximum personal log number to the minimum personal log number in the zone A corresponding to the similarity degree theoretical value equal to or higher than the similarity threshold value.

カウント値取得部６７は、履歴データ蓄積部２１のログを検索する。そして、カウント値取得部６７は、しきい値基準個人ログ数に対応する比較対象ユーザのログの数をカウントしてカウント値を取得する。カウント値判定部６９は、カウント値取得部６７により得られたログ数のカウント値を判定する。カウント値判定部６９には目標ログ数が入力される。カウント値判定部６９は、目標ログ数とカウント値とを比較し、目標ログ数よりカウント値が多いか否かを判定する。 The count value acquisition unit 67 searches the logs in the history data storage unit 21. It counts the logs of the comparison target users corresponding to the threshold reference personal log number and thereby acquires a count value. The count value determination unit 69 evaluates the count value obtained by the count value acquisition unit 67. The target log number is input to the count value determination unit 69, which compares it with the count value and determines whether the count value is larger than the target log number.

適用しきい値決定部７１は、カウント値判定部６９の判定結果に基づいて、適用しきい値を決定する。適用しきい値は、カウント値取得部６７により得られるログの数が目標ログ数以上になる範囲で最大の類似度しきい値である。適用しきい値決定部７１は、後述する処理により、類似度しきい値設定部６３が類似度しきい値を変化させていったときのカウント値の判定結果から、適用しきい値を決定する。抽出実行部７３は、適用しきい値決定部７１で決定された適用しきい値を用いてログを履歴データ蓄積部２１から抽出する。抽出実行部７３は、適用しきい値に対応する個人ログ数（しきい値基準個人ログ数）に対応する比較対象ユーザのログを抽出する。抽出実行部７３は、抽出したログを協調フィルタ部２３に渡す。 The application threshold value determination unit 71 determines an application threshold value based on the determination result of the count value determination unit 69. The application threshold is a maximum similarity threshold within a range in which the number of logs obtained by the count value acquisition unit 67 is equal to or greater than the target number of logs. The application threshold value determination unit 71 determines the application threshold value from the determination result of the count value when the similarity threshold value setting unit 63 changes the similarity threshold value by a process described later. . The extraction execution unit 73 extracts a log from the history data storage unit 21 using the application threshold value determined by the application threshold value determination unit 71. The extraction execution unit 73 extracts the log of the comparison target user corresponding to the number of personal logs corresponding to the application threshold value (threshold reference personal log number). The extraction execution unit 73 passes the extracted log to the collaborative filter unit 23.

図１９は、上記のログ抽出部５１の動作例を示すフローチャートである。この例では、目標ログ数がログ抽出部５１に入力され（Ｓ４１）、類似度しきい値の初期値が設定される（Ｓ４３）。類似度しきい値の初期値は、所定の最大値（例えば１）に設定される。そして、類似度しきい値に対応するしきい値基準個人ログ数が設定され（Ｓ４５）、このしきい値基準個人ログ数に対応する比較対象ユーザのログ数がカウントされる（Ｓ４７）。この処理では、各ログ数の該当人数が検索等で求められ、各ログ数と該当人数の積の総和が計算され、これによりカウント値が取得されてもよい（以下同様）。そして、ログ数のカウント値が目標ログ数と比較されて、カウント値が目標ログ数以上か否かが判定される（Ｓ４９）。カウント値が目標ログ数より少なければ、（Ｓ４９、Ｎｏ）、類似度しきい値設定部６３が類似度しきい値を再設定する（Ｓ５１）。このとき、類似度しきい値が所定の幅だけ減らされる。しきい値変更の後、ステップＳ４５に戻る。 FIG. 19 is a flowchart showing an operation example of the log extraction unit 51 described above. In this example, the target log number is input to the log extraction unit 51 (S41), and the initial value of the similarity threshold is set (S43). The initial value is set to a predetermined maximum value (for example, 1). The threshold reference personal log number corresponding to the similarity threshold is then set (S45), and the logs of the comparison target users corresponding to this threshold reference personal log number are counted (S47). In this step, the number of users having each personal log count may be obtained by a search or the like, and the count value may be obtained as the sum of the products of each log count and the corresponding number of users (the same applies hereinafter). The count value is then compared with the target log number to determine whether the count value is equal to or greater than the target log number (S49). If the count value is less than the target log number (S49, No), the similarity threshold setting unit 63 resets the similarity threshold (S51), reducing it by a predetermined width. After the threshold is changed, the process returns to step S45.

ステップＳ４９の判定がＹｅｓであれば（カウント値≧目標ログ数）、適用しきい値決定部７１が、ステップＳ５１またはＳ４３で設定された類似度しきい値を適用しきい値に決定し、適用しきい値に対応するしきい値基準個人ログ数を求める（Ｓ５３）。そして、抽出実行部７３が、しきい値基準個人ログ数に従って履歴データ蓄積部２１からログを抽出する（Ｓ５５）。 If the determination in step S49 is Yes (count value ≧ target log number), the application threshold value determination unit 71 determines the similarity threshold value set in step S51 or S43 as the application threshold value, and applies it. The number of threshold reference personal logs corresponding to the threshold is obtained (S53). Then, the extraction execution unit 73 extracts logs from the history data storage unit 21 according to the threshold reference personal log number (S55).

このようにして、図１９の処理では、類似度しきい値を小さくして（減らして）ログ数を判定する動作が繰り返され、ログ数のカウント値が目標ログ数以上になると類似度しきい値が確定し、ログ抽出が実行される。 In this way, in the processing of FIG. 19, the operation of determining the number of logs by reducing (decreasing) the similarity threshold is repeated, and when the count value of the number of logs exceeds the target number of logs, the similarity threshold is reached. The value is confirmed and log extraction is performed.
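The descending loop of FIG. 19 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the threshold is lowered in fixed steps of 0.01, the threshold reference personal log number is derived from the best-case Jaccard value for each personal log count, and all function and parameter names are hypothetical.

```python
from collections import Counter


def personal_log_range(n_target, threshold, max_size):
    """S45: personal log counts whose best-case Jaccard value reaches the threshold."""
    sizes = []
    for m in range(2, max_size + 1):
        common = min(n_target, m - 1)              # zone C (m <= common) is excluded
        if common / (n_target + m - common) >= threshold:
            sizes.append(m)
    return (sizes[0], sizes[-1]) if sizes else None


def choose_threshold_descending(per_user_logs, n_target, target_logs, step_pct=1):
    """Lower the threshold from 1.0 until the count value reaches the target."""
    size_hist = Counter(per_user_logs.values())    # personal log count -> number of users
    max_size = max(size_hist)
    for pct in range(100, 0, -step_pct):           # S43 starts at 1.0; S51 lowers it
        threshold = pct / 100
        rng = personal_log_range(n_target, threshold, max_size)
        count = 0
        if rng is not None:                        # S47: sum of (log count x users)
            count = sum(s * u for s, u in size_hist.items() if rng[0] <= s <= rng[1])
        if count >= target_logs:                   # S49
            return threshold, rng, count           # S53: applied threshold is fixed
    return None                                    # target cannot be reached


# Hypothetical personal log counts for four comparison target users.
result = choose_threshold_descending({"a": 11, "b": 18, "c": 30, "d": 60},
                                     n_target=10, target_logs=25)
print(result)
```

In this example the loop stops once user b (18 logs, best-case similarity 10/18) enters the range, giving 11 + 18 = 29 logs for a target of 25.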

図２０は、ログ抽出処理の他の例を示すフローチャートである。図１９では、類似度しきい値が最大値から減らされたが、図２０では、類似度しきい値が逆に増大される。すなわち、図２０では、ステップＳ４３ａにて、類似度しきい値の初期値が、所定の最小値に設定される。そして、ステップ４９ａでは、カウント値判定部６９が、ログ数のカウント値が目標ログ数より小さいか否かを判定する。ステップＳ４９ａの判定がＮｏであれば、ステップＳ５１ａで類似度しきい値が増やされる。ステップＳ４９ａの判定がＹｅｓであれば、適用しきい値決定部７１が、ステップＳ５１ａまたはＳ４３ａで前回のループで設定された類似度しきい値を適用しきい値に決定する（Ｓ５３ａ）。その他の処理は、図１９と同様である。 FIG. 20 is a flowchart illustrating another example of the log extraction process. In FIG. 19, the similarity threshold is reduced from the maximum value, but in FIG. 20, the similarity threshold is increased conversely. That is, in FIG. 20, in step S43a, the initial value of the similarity threshold is set to a predetermined minimum value. In step 49a, the count value determination unit 69 determines whether or not the log count value is smaller than the target log count. If the determination in step S49a is No, the similarity threshold is increased in step S51a. If the determination in step S49a is Yes, the application threshold value determination unit 71 determines the similarity threshold value set in the previous loop in step S51a or S43a as the application threshold value (S53a). Other processes are the same as those in FIG.

このようにして、図２０の処理では、類似度しきい値を大きくして（増やして）ログ数を判定する動作が繰り返される。ログ数のカウント値が目標ログ数を下回ると、類似度しきい値が確定し（類似度しきい値は、１つ前のループの値になる）、ログ抽出が実行される。 In this way, in the process of FIG. 20, the operation of determining the number of logs by increasing (increasing) the similarity threshold is repeated. When the count value of the number of logs falls below the target number of logs, the similarity threshold is determined (the similarity threshold becomes the value of the previous loop), and log extraction is executed.
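The ascending variant of FIG. 20 differs mainly in that the applied threshold is the value from the previous loop. The sketch below abstracts the counting step behind a callable; `count_at`, the candidate threshold list, and the sample counts are all assumptions made for illustration.

```python
def choose_threshold_ascending(count_at, target_logs, thresholds):
    """Raise the threshold until the count value drops below the target.

    count_at: callable mapping a threshold to the resulting count value (assumed).
    thresholds: candidate thresholds in increasing order (assumed).
    """
    previous = None
    for threshold in thresholds:                 # S43a, then S51a on each pass
        if count_at(threshold) < target_logs:    # S49a
            return previous                      # S53a: value set in the previous loop
        previous = threshold
    return previous                              # the target was never undershot


# Hypothetical count values: a higher threshold leaves fewer logs.
counts = {0.1: 100, 0.2: 80, 0.3: 45, 0.4: 20}
applied = choose_threshold_ascending(counts.__getitem__, 40, [0.1, 0.2, 0.3, 0.4])
print(applied)  # 0.3
```

If even the smallest threshold leaves fewer logs than the target, the function returns `None`, a case the flowchart description does not address.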

類似度しきい値に関するループ処理は、図１９、図２０に限定されない。しきい値変更の幅が一定でなくてもよい。そして、収束計算が行われてもよい。例えば、最初は、しきい値が増大され、そして、ログ数が減りすぎると今度はしきい値が減らされる。このような動作が繰り返され、しきい値の変化幅が徐々に小さく変更される。 The loop processing for the similarity threshold is not limited to FIGS. 19 and 20. The width of the threshold change need not be constant, and a convergence calculation may be performed. For example, the threshold is first increased; when the number of logs decreases too far, the threshold is then decreased. This operation is repeated while the width of the threshold change is gradually reduced.

また、本実施の形態では、類似度しきい値を基準にしているので、抽出されるログ数は、目標ログ数より少し大きくなる。そこで、ログ数をさらに調整して、最終的なログ数を目標ログ数に揃えてもよい。この場合、ログがランダムに削減されてよい。 In this embodiment, since the similarity threshold is used as a reference, the number of logs to be extracted is slightly larger than the target number of logs. Therefore, the number of logs may be further adjusted, and the final number of logs may be aligned with the target number of logs. In this case, the log may be reduced at random.
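The final adjustment mentioned above, randomly removing logs until the target log number is met, could be done as in this sketch. The function name and the optional `seed` parameter are assumptions added for reproducibility.

```python
import random


def trim_to_target(logs, target_logs, seed=None):
    """Randomly drop logs so that at most target_logs remain."""
    logs = list(logs)
    if len(logs) <= target_logs:
        return logs                       # nothing to remove
    rng = random.Random(seed)             # seed is an assumed convenience parameter
    return rng.sample(logs, target_logs)  # uniform sample without replacement


extracted = [("u2", f"i{k}") for k in range(29)]   # hypothetical extracted logs
final_logs = trim_to_target(extracted, 25, seed=0)
print(len(final_logs))  # 25
```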

以上に、類似度理論値と類似度しきい値を使った本実施の形態の好適なログ抽出処理を説明した。本実施の形態によれば、類似度理論値に基づいて履歴データからログを抽出することで、推薦対象ユーザと取得アイテム数が近いユーザのログを残すことができ、フィルタリング精度をより高く保ったままログ数を削減できる。 The preferred log extraction processing of the present embodiment, which uses theoretical similarity values and a similarity threshold, has been described above. According to the present embodiment, extracting logs from the history data on the basis of theoretical similarity values keeps the logs of users whose number of acquired items is close to that of the recommendation target user, so the number of logs can be reduced while the filtering accuracy is kept high.

より詳細には、類似度理論値の類似度しきい値を設定し、この類似度しきい値を使って履歴データからログを抽出することで、推薦対象ユーザと取得アイテム数が近いユーザのログを残すことができ、フィルタリング精度をより高く保ったままログ数を削減できる。 More specifically, a similarity threshold is set on the theoretical similarity values, and logs are extracted from the history data using this threshold. This keeps the logs of users whose number of acquired items is close to that of the recommendation target user, and reduces the number of logs while keeping the filtering accuracy high.

「処理対象データ抽出部」 "Processing target data extraction unit"
（設定処理時間を入力するパターン） (Pattern in which the setting processing time is input)
図２１は、処理対象データ抽出部２９の別の構成例を示している。前述の図１０の構成では、目標ログ数が入力され、予測処理時間が目標ログ数から計算された。一方、図２１の構成では、設定処理時間が入力され、目標ログ数が設定処理時間から算出される。 FIG. 21 shows another configuration example of the processing target data extraction unit 29. In the configuration of FIG. 10 described above, the target log number is input, and the predicted processing time is calculated from the target log number. In the configuration of FIG. 21, by contrast, the setting processing time is input, and the target log number is calculated from the setting processing time.

図２１の処理対象データ抽出部２９は、ログ数と協調フィルタリング処理時間との関係の基礎データを記憶した基礎データ記憶部８１と、協調フィルタリングの設定処理時間を入力する設定処理時間入力部８３と、基礎データ記憶部８１から基礎データを読み出して、処理時間入力部８３により入力された設定処理時間に対応する目標ログ数を基礎データから求める目標ログ数算出部８５と、目標ログ数算出部８５により算出された目標ログ数を出力する目標ログ数出力部８７と、目標ログ数出力部８７により出力された目標ログ数に対する許可判定を受け付ける許可判定受付部８９と、目標ログ数算出部８５により算出された目標ログ数に基づき、履歴データ蓄積部２１に蓄積されたログから目標ログ数のログを抽出するログ抽出部９１と、基礎データ取得実験処理部９３とを備えている。基礎データ記憶部８１および基礎データ取得実験処理部９３は、図１０の対応する構成と同様でよい。 The processing target data extraction unit 29 in FIG. 21 includes a basic data storage unit 81 that stores basic data on the relationship between the number of logs and the collaborative filtering processing time, a setting processing time input unit 83 that inputs collaborative filtering setting processing time, and the like. Then, the basic data is read from the basic data storage unit 81, and the target log number calculating unit 85 for obtaining the target log number corresponding to the set processing time input by the processing time input unit 83 from the basic data, and the target log number calculating unit 85 A target log number output unit 87 that outputs the target log number calculated by the above, a permission determination reception unit 89 that receives a permission determination for the target log number output by the target log number output unit 87, and a target log number calculation unit 85. Based on the calculated target log number, a log extracting unit 91 that extracts a log of the target log number from the logs accumulated in the history data accumulating unit 21; And a basic data acquisition experiment processor 93. The basic data storage unit 81 and the basic data acquisition experiment processing unit 93 may have the same configurations as those in FIG.

設定処理時間入力部８３、目標ログ数出力部８７および許可判定受付部８９は、キーボード、ディスプレイ等の入出力装置で構成されている。オペレータにより入力装置が操作されて、所望の処理時間の最大値が入力される。この最大処理時間が、設定処理時間として設定処理時間入力部８３に入力されて、目標ログ数算出部８５に取得される。 The setting processing time input unit 83, the target log number output unit 87, and the permission determination reception unit 89 are configured by input / output devices such as a keyboard and a display. The input device is operated by the operator, and the maximum value of the desired processing time is input. The maximum processing time is input to the setting processing time input unit 83 as the setting processing time, and is acquired by the target log number calculation unit 85.

目標ログ数算出部８５は、設定処理時間から目標ログ数を算出する。基礎データ記憶部８１から基礎データが読み出され、基礎データに設定処理時間が当てはめられて、目標ログ数が算出される。基礎データは、ログ数ｙと処理時間ｔの関係の式の形をとっている。この式に設定処理時間が代入されればよい。目標ログ数出力部８７は、算出された目標ログ数を出力する。目標ログ数はディスプレイに表示される。目標ログ数が適当であれば、オペレータは許可操作を入力装置に対して行い、許可判定受付部８９が許可判定（納得判定）を受け付ける。そして、許可判定は、許可判定受付部８９からログ抽出部９１に伝えられる。ログ抽出部９１は、目標ログ数算出部８５により算出された目標ログ数も取得している。ログ抽出部９１は、許可判定を取得すると、履歴データ蓄積部２１に蓄積されたログから目標ログ数のログを抽出する。 The target log number calculation unit 85 calculates the target log number from the setting processing time. The basic data is read from the basic data storage unit 81, and the setting processing time is applied to the basic data to calculate the target log number. The basic data takes the form of an equation relating the number of logs y to the processing time t, so the setting processing time only needs to be substituted into this equation. The target log number output unit 87 outputs the calculated target log number, which is shown on the display. If the target log number is appropriate, the operator performs a permission operation on the input device, and the permission determination reception unit 89 receives the permission determination (acceptance determination). The permission determination is passed from the permission determination reception unit 89 to the log extraction unit 91, which has also acquired the target log number calculated by the target log number calculation unit 85. On receiving the permission determination, the log extraction unit 91 extracts the target number of logs from the logs accumulated in the history data accumulation unit 21.
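As an illustration of substituting the setting processing time into the basic data equation, the sketch below assumes a linear relation t = slope x y + intercept. This form is an assumption made only for the example; this passage does not state the actual shape of the relation between y and t, and the names `slope` and `intercept` are hypothetical.

```python
import math


def target_log_count(set_time, slope, intercept):
    """Largest log count y whose predicted time slope * y + intercept fits in set_time.

    The linear form is an assumption; the fitted basic data may have another shape.
    """
    if slope <= 0:
        raise ValueError("slope must be positive")
    return max(0, math.floor((set_time - intercept) / slope))


# Hypothetical fit: 0.25 s per log plus 10 s of fixed overhead; limit of 3600 s.
y_target = target_log_count(3600, slope=0.25, intercept=10)
print(y_target)  # 14360
```

Rounding down keeps the predicted time within the limit entered by the operator, which is consistent with the setting processing time being described as a maximum.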

図２２は、処理対象データ抽出部２９の動作を示すフローチャートである。全推薦対象ユーザ数ｘが入力され（Ｓ６１）、基礎データ取得実験処理部により基礎データ取得実験が行われ（Ｓ６３）、基礎データである処理終了予測グラフデータが生成される（Ｓ６５）。そして、処理時間最大値が入力されて（Ｓ６７）、目標ログ数が基礎データから求められ、表示される（Ｓ６９）。許可判定の受付の有無が判定され（Ｓ７１）、許可判定が受け付けられなければ（Ｓ７１、Ｎｏ）、ステップＳ６７に戻って別の設定時間最大値が入力される。許可判定が受け付けられると（Ｓ７１、Ｙｅｓ）、履歴データ蓄積部２１の全ログ数が目標ログ数と比較され、全ログ数が目標ログ数より大きいか否かが判定される（Ｓ７３）。全ログ数が目標ログ数以下であれば（Ｓ７３、Ｎｏ）、処理が終了する。処理対象データ抽出部２９は、履歴データ蓄積部２１のすべてのログを協調フィルタ部２３に処理させる。ステップＳ７３で全ログ数が目標ログ数より多ければ（Ｓ７３、Ｙｅｓ）、全ログから処理対象のログが抽出され、ログ数が目標ログ数まで削減される（Ｓ７５）。抽出されたログが、処理対象データ抽出部２９から協調フィルタ部２３に伝えられ、協調フィルタ部２３でフィルタ処理が行われる。 FIG. 22 is a flowchart showing the operation of the processing target data extraction unit 29. The total number x of recommendation target users is input (S61), the basic data acquisition experiment is performed by the basic data acquisition experiment processing unit (S63), and processing end prediction graph data, which is the basic data, is generated (S65). Then, the maximum processing time is input (S67), and the target log number is obtained from the basic data and displayed (S69). Whether a permission determination has been accepted is determined (S71); if it has not (S71, No), the process returns to step S67 and another maximum processing time is input. When the permission determination is accepted (S71, Yes), the total number of logs in the history data storage unit 21 is compared with the target log number to determine whether the total is larger than the target (S73). If the total number of logs is equal to or less than the target log number (S73, No), the process ends, and the processing target data extraction unit 29 causes the collaborative filter unit 23 to process all the logs in the history data storage unit 21. If the total number of logs is larger than the target log number (S73, Yes), the logs to be processed are extracted from all the logs, and the number of logs is reduced to the target log number (S75). The extracted logs are passed from the processing target data extraction unit 29 to the collaborative filter unit 23, which performs the filter processing.

ログ抽出部９１の構成および動作は、概ね、図１０のログ抽出部５１と同様である（図１８、図１９、図２０参照）。ただし、ログ抽出部５１では、抽出ログ数が、目標ログ数以上であった。これに対して、本実施の形態のログ抽出部９１では、抽出ログ数が、目標ログ数以下である。このような相違に関連して、ログ抽出部９１の構成および動作も若干異なっており、具体的には、適用しきい値の決定処理が異なり、ログ数のカウント値が目標ログ数以下になる範囲で最大の類似度しきい値になるように適用しきい値が決定される。 The configuration and operation of the log extraction unit 91 are generally the same as those of the log extraction unit 51 of FIG. 10 (see FIGS. 18, 19, and 20). However, in the log extraction unit 51, the number of extracted logs is greater than or equal to the target number of logs. On the other hand, in the log extraction unit 91 of the present embodiment, the number of extracted logs is equal to or less than the target number of logs. In relation to such a difference, the configuration and operation of the log extraction unit 91 are also slightly different. Specifically, the application threshold value determination process is different, and the count value of the number of logs is equal to or less than the target number of logs. The application threshold is determined to be the maximum similarity threshold in the range.

すなわち、図１９の処理が変形されて、ステップＳ４９では、カウント値が目標ログ数を越えると、判定がＹｅｓになる。ステップＳ５３では、適用しきい値決定部７１が、「前回のループ」のステップＳ５１またはＳ４３で設定された類似度しきい値を適用しきい値に決定する。つまり、類似度しきい値を小さくして（減らして）ログ数を判定する動作が繰り返され、ログ数のカウント値が目標ログ数を越えると、類似度しきい値が確定し（類似度しきい値は、１つ前のループの値になる）、ログ抽出が実行される。 That is, when the process of FIG. 19 is modified and the count value exceeds the target log number in step S49, the determination is Yes. In step S53, the application threshold value determination unit 71 determines the similarity threshold value set in step S51 or S43 of the “previous loop” as the application threshold value. In other words, the operation for determining the number of logs is repeated by decreasing (decreasing) the similarity threshold, and when the count value of the number of logs exceeds the target number of logs, the similarity threshold is determined (similarity level). The threshold value is the value of the previous loop), and log extraction is performed.

また、図２０の処理も変形される。ステップＳ４９では、ログ数のカウント値が目標ログ数以下になると、判定がＹｅｓになる。そして、ステップＳ５３ａでは、適用しきい値決定部７１が、ステップＳ５１ａまたはＳ４３ａで設定された類似度しきい値を適用しきい値に決定する。つまり、類似度しきい値を大きくして（増やして）ログ数を判定する動作が繰り返され、ログ数のカウント値が目標ログ数以下になると類似度しきい値が確定し、ログ抽出が実行される。 Also, the process of FIG. 20 is modified. In step S49, if the count value of the number of logs is equal to or less than the target number of logs, the determination is yes. In step S53a, the application threshold value determination unit 71 determines the similarity threshold value set in step S51a or S43a as the application threshold value. In other words, the operation of judging the number of logs by increasing (increasing) the similarity threshold is repeated, and when the count value of the number of logs falls below the target number of logs, the similarity threshold is fixed and log extraction is executed. Is done.

図２３は、処理対象データ抽出部２９の動作に対応する画面表示を示している。図２３の画面は、処理対象データ抽出部２９のコンピュータのディスプレイに表示される。図２３に示すように、まず、基礎データ実験のためのｘ（推薦対象ユーザ数）の入力要求画面が表示される。推薦対象ユーザ数が入力されると、基礎実験中であることを示す画面が表示される。基礎実験が終わると、処理時間の最大値（設定処理時間）の入力要求画面が表示される。処理時間の最大値が入力されると、目標ログ数が表示されて、「処理実行」ボタンと「キャンセル」ボタンが表示される。「キャンセル」ボタンが押されると、図２２のステップＳ７１がＮｏになり、再度、設定時間最大値の入力要求画面が表示される。「処理実行」ボタンが押されると、ステップＳ７１がＹｅｓになり、ログ抽出が行われる。 FIG. 23 shows the screen display corresponding to the operation of the processing target data extraction unit 29. The screen of FIG. 23 is displayed on the computer display of the processing target data extraction unit 29. As shown in FIG. 23, an input request screen for x (the number of recommendation target users) for the basic data experiment is displayed first. When the number of recommendation target users is input, a screen indicating that the basic experiment is in progress is displayed. When the basic experiment is completed, an input request screen for the maximum processing time (setting processing time) is displayed. When the maximum processing time is input, the target log number is displayed together with a “process execution” button and a “cancel” button. When the “cancel” button is pressed, step S71 in FIG. 22 becomes No, and the input request screen for the maximum processing time is displayed again. When the “process execution” button is pressed, step S71 becomes Yes and log extraction is performed.

以上に、図２１〜図２３を参照し、設定処理時間を入力するパターンの構成を説明した。なお、図１０の構成と図２１の構成（目標ログ数を入力するパターンの構成と設定処理時間を入力するパターンの構成）が、両方とも備えられてもよいことはもちろんである。 The configuration of the pattern for inputting the setting processing time has been described above with reference to FIGS. Of course, both the configuration of FIG. 10 and the configuration of FIG. 21 (the configuration of the pattern for inputting the target number of logs and the configuration of the pattern for inputting the set processing time) may be provided.

FIG. 24 shows another configuration example of the processing target data extraction unit 29. In the configuration of FIG. 10 described above, the target number of logs is input and the predicted processing time is calculated from it. In the configuration of FIG. 24, by contrast, a similarity threshold is input, and the target number of logs and the predicted processing time are calculated from the similarity threshold.

In FIG. 24, the basic data storage unit 101 and the basic data acquisition experiment processing unit 115 are the same as the corresponding components in FIG. 10. The similarity threshold input unit 103 inputs the similarity threshold described with reference to FIG. 16. Here, the input similarity threshold is called the set similarity threshold. The target log number acquisition unit 105 obtains the target number of logs corresponding to the set similarity threshold. The processing time calculation unit 107 calculates the predicted processing time corresponding to the target number of logs. The calculation result output unit 109 outputs the target number of logs obtained by the target log number acquisition unit 105 and the predicted processing time calculated by the processing time calculation unit 107. The permission determination reception unit 111 receives a permission determination for the output target number of logs and predicted processing time. The log extraction unit 113 uses the set similarity threshold to extract the target number of logs from the logs accumulated in the history data accumulation unit 21.

The similarity threshold input unit 103, the calculation result output unit 109, and the permission determination reception unit 111 are implemented with input/output devices such as a keyboard and a display. The operator uses the input device to enter a desired similarity threshold. This value is input to the similarity threshold input unit 103 as the set similarity threshold and is passed to the target log number acquisition unit 105.

The target log number acquisition unit 105 obtains the target number of logs from the similarity threshold, following the principle of the similarity-threshold processing described with reference to FIG. 16. Specifically, the target log number acquisition unit 105 reads the theoretical similarity value table from the theoretical similarity value table storage unit 117 and refers to it to obtain the "threshold reference personal log count" corresponding to the similarity threshold. The target log number acquisition unit 105 also has a function equivalent to the count value acquisition unit described earlier: from the log data in the history data accumulation unit 21, it counts the logs of the comparison target users that correspond to the "threshold reference personal log count" and obtains the count value. The number of logs counted becomes the target number of logs.
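The counting step above can be sketched as follows. This is only an illustration under stated assumptions: the theoretical similarity value table of FIG. 16 is not reproduced in this section, so the sketch assumes the theoretical value is the upper bound of the Jaccard coefficient for two users holding n and m items, min(n, m) / max(n, m). The names `max_jaccard` and `target_log_count` are hypothetical, each log is assumed to be a distinct (user, item) pair, and the recommendation target user's own logs are assumed not to be counted.

```python
from collections import Counter


def max_jaccard(n: int, m: int) -> float:
    """Assumed theoretical similarity: the largest Jaccard coefficient
    possible between item sets of sizes n and m."""
    if n == 0 or m == 0:
        return 0.0
    return min(n, m) / max(n, m)


def target_log_count(logs, target_user, threshold):
    """Count the logs of comparison users whose personal log count
    could still reach the similarity threshold.

    logs: iterable of (user_id, item_id) pairs.
    """
    per_user = Counter(user for user, _item in logs)
    n = per_user[target_user]  # personal log count of the target user
    return sum(
        m
        for user, m in per_user.items()
        if user != target_user and max_jaccard(n, m) >= threshold
    )
```

With a target user holding 2 items, a comparison user holding 3 items passes a threshold of 0.5 (2/3), while one holding 6 items does not (2/6).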

The processing of the processing time calculation unit 107 is the same as that of the corresponding component in FIG. 10. The basic data is read from the basic data storage unit 101, the target number of logs is applied to the basic data, and the predicted processing time is calculated.
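One way to apply a target number of logs to the basic data is sketched below. The text does not specify how the basic data is evaluated, so this assumes the basic data is a list of measured (number of logs, processing time) points and uses piecewise-linear interpolation, extrapolating from the nearest segment outside the measured range. The function name and the sample values are hypothetical.

```python
from bisect import bisect_left


def predict_processing_time(basic_data, target_logs):
    """Estimate the processing time for target_logs.

    basic_data: list of (log_count, seconds) points sorted by log_count,
    with strictly increasing log_count (assumed shape of the basic data).
    """
    if len(basic_data) < 2:
        raise ValueError("need at least two measured points")
    counts = [count for count, _seconds in basic_data]
    # Pick the segment containing target_logs; clamp to the first or
    # last segment so values outside the range are extrapolated.
    i = min(max(bisect_left(counts, target_logs), 1), len(basic_data) - 1)
    (x0, t0), (x1, t1) = basic_data[i - 1], basic_data[i]
    return t0 + (t1 - t0) * (target_logs - x0) / (x1 - x0)
```

For example, with measured points (1000, 2.0), (2000, 5.0) and (4000, 15.0), a target of 3000 logs falls halfway along the last segment.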

The calculation result output unit 109 displays the target number of logs and the predicted processing time on a display or the like. If the target number of logs and the predicted processing time are acceptable, the operator performs a permission operation on the input device, and the permission determination reception unit 111 receives the permission determination (acceptance determination).

The permission determination is passed from the permission determination reception unit 111 to the log extraction unit 113. The log extraction unit 113 also acquires the set similarity threshold that was input through the similarity threshold input unit 103, and extracts logs using it. That is, the log extraction unit 113 extracts, from the history data accumulation unit 21, the logs of the comparison target users that correspond to the "threshold reference personal log count".

The internal configuration of the log extraction unit 113 may differ from that of FIG. 18. In the configuration of FIG. 18, the similarity threshold is changed step by step to determine the application threshold. Here, however, the threshold to be applied has already been decided, so the log extraction unit 113 extracts logs using the acquired set similarity threshold as it is.

FIG. 25 is a flowchart showing the operation of the processing target data extraction unit 29. The total number of recommendation target users x is input (S81), the basic data acquisition experiment processing unit performs the basic data acquisition experiment (S83), and the processing end prediction graph data, which is the basic data, is generated (S85). Then the set similarity threshold is input (S87), the target number of logs is obtained (S89), the predicted processing time is calculated (S91), and both are displayed (S93). Whether a permission determination has been received is checked (S95). If it has not (S95, No), the process returns to step S87 and another set similarity threshold is input. If it has (S95, Yes), logs are extracted using the set similarity threshold (S97).

FIG. 26 shows the screen display corresponding to the operation of the processing target data extraction unit 29. The screens of FIG. 26 are displayed on the display of the computer of the processing target data extraction unit 29. As shown in FIG. 26, an input request screen for x (the number of recommendation target users) for the basic data experiment is displayed first. When the number of recommendation target users is input, a screen indicating that the basic experiment is in progress is displayed. When the basic experiment ends, an input request screen for the similarity threshold is displayed. When the similarity threshold is input, the target number of logs and the predicted processing time are displayed together with an "Execute" button and a "Cancel" button. When the "Cancel" button is pressed, step S95 in FIG. 25 becomes No, and the input request screen for the similarity threshold is displayed again. When the "Execute" button is pressed, step S95 becomes Yes and log extraction is performed.

The configuration of the pattern in which the similarity threshold is input has been described above with reference to FIGS. 24 to 26. Of course, the configuration of FIG. 24 may be provided together with at least one of the configurations of FIG. 10 and FIG. 21.

"Division"
(Division of recommendation target users / division of logs)
Next, division processing, another feature of the present embodiment, will be described. Because resources of the filter device such as the HDD and memory are finite, processing takes a long time if the number of recommendation target users is enormous. Processing also takes a long time if the logs are enormous. This is as described with reference to FIG. 11.

In the present embodiment, therefore, the processing time is further shortened by dividing the recommendation target users. The processing time is also shortened by dividing the logs. The present embodiment makes it possible to carry out such division, and the resulting reduction in processing time, using the simulation of the target number of logs and the processing time described above.

For the recommendation target users, division creates a plurality of sets (groups), and filtering is performed for each set. Division is effective if the total processing time of the sets is shorter than the processing time without division.

For the logs, the comparison target users (similar users) are divided into a plurality of sets, and the logs are divided accordingly. That is, the logs of the history data are divided so that they are separated in units of similar users.

FIG. 27 shows how the processing time changes when the recommendation target users are divided into a plurality of sets (groups). Here, the recommendation target users are assumed to be divided evenly, so that every set has the same number of users. In FIG. 27, the horizontal axis is the recommendation division size r (the number of users in one set, r = total number of users / number of divisions). The vertical axis is the processing time t. The number of logs y is fixed. As shown, the processing time t becomes shorter as the recommendation division size r becomes smaller. If r becomes too small, however, the processing time t grows again. At the optimum recommendation division size R, the processing time t is at its minimum.
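The choice of R from such measurements, and the even division it implies, can be sketched as follows. This assumes the basic experiment yields a mapping from each tried division size r to a measured processing time t; the function names and the sample figures used with them are hypothetical.

```python
def best_division_size(measurements):
    """Return the division size r with the smallest measured time t.

    measurements: dict mapping division size r to measured seconds t
    (assumed output of the basic experiment with the log count fixed).
    """
    return min(measurements, key=measurements.get)


def split_users(users, division_size):
    """Split the user list into consecutive sets of division_size users.

    The last set is smaller when the total is not a multiple of
    division_size.
    """
    return [
        users[i:i + division_size]
        for i in range(0, len(users), division_size)
    ]
```

For measurements such as {100: 40.0, 50: 28.0, 25: 22.0, 10: 31.0}, the time falls as r shrinks from 100 to 25 and rises again at 10, so 25 would be selected as R.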

The data of FIG. 27 is obtained by running the basic experiment already described. In the basic experiment, as noted above, the filter device is actually operated using sample logs. The number of logs is fixed, as shown in FIG. 27, and the processing time is measured using the divided sets of recommendation target users.

FIG. 28 shows several examples of the basic experiment processing when division is applied. FIG. 28(a) is the pattern in which no division is performed.

In FIG. 28(b), the change in the processing time t with the recommendation division size r is first obtained by the basic experiment, as described with reference to FIG. 27, and the recommendation division size a corresponding to the minimum of the processing time t is determined. Then the relationship between the number of logs y and the processing time t is measured with the recommendation target users divided into a plurality of sets of size a. This measurement result is acquired as the basic data.

In FIG. 28(c), the basic experiment is first performed without division. The number of recommendation target users x is fixed, the processing time t is measured while the number of logs y is varied, and the same result as in FIG. 28(a) is obtained. From the measurement result, the number of logs b corresponding to the inflection point is determined, and the logs are divided so that one set contains b logs. With the number of logs fixed at b, the change in the processing time t with the recommendation division size r is measured, and the recommendation division size c corresponding to the minimum of the processing time t is determined. Then the relationship between the number of logs y and the processing time t is measured with the recommendation target users divided into sets of size c. This measurement result is acquired as the basic data. FIG. 28(c) thus yields the basic data for the case where both the logs and the recommendation target users are divided.

The division processing described above is applied to the collaborative filter device as follows. In the processing target data extraction unit 29 shown in FIG. 10, the basic data acquisition experiment processing unit 53 acquires multiple types of basic data with different division patterns (division forms), as listed below, and these are stored in the basic data storage unit 41.
(1) Neither the recommendation target users nor the logs are divided
(2) The recommendation target users are divided
(3) The logs are divided
(4) Both the recommendation target users and the logs are divided
For each of (2) to (4), basic data may be acquired from each of several division patterns and stored. That is, basic data for different numbers of user divisions or different numbers of log divisions may be stored.

In FIG. 10, the processing time calculation unit 45 calculates a predicted processing time from each of the multiple types of basic data in the basic data storage unit 41. That is, the target number of logs is applied to each set of basic data. The processing time calculation unit 45 then finds the shortest predicted processing time. The processing time output unit 47 displays the shortest predicted processing time together with the division pattern of the basic data from which it was obtained. For example, the message "The recommendation target users x will be divided into sets of R users." is shown on the display.
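The selection of the shortest predicted processing time across division patterns can be sketched as below. To stay self-contained, the sketch assumes each set of basic data is already wrapped as a function from a number of logs to predicted seconds; the pattern labels and the function `choose_fastest_pattern` are hypothetical, not names from the specification.

```python
def choose_fastest_pattern(predictors, target_logs):
    """Return (pattern_name, predicted_seconds) for the division pattern
    whose basic data predicts the shortest processing time.

    predictors: dict mapping a pattern name to a callable that takes a
    log count and returns predicted seconds (assumed wrapper around
    each stored set of basic data).
    """
    predicted = {
        name: predict(target_logs) for name, predict in predictors.items()
    }
    best = min(predicted, key=predicted.get)
    return best, predicted[best]
```

A caller would pass one predictor per stored pattern, for example one each for "no division", "users divided", "logs divided" and "both divided", and show the returned pattern name alongside the predicted time.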

The log extraction unit 51 acquires from the processing time calculation unit 45 the information on the division pattern corresponding to the predicted processing time (more precisely, the division pattern that gives the shortest predicted processing time). The log extraction unit 51 then performs log extraction based on the acquired division pattern. That is, when the logs of the history data accumulation unit 21 are divided, the target number of logs is extracted from the divided logs. The processing using the similarity threshold described above is performed here in the same way.

The division pattern data is also passed from the processing target data extraction unit 29 to the collaborative filter unit 23, which performs collaborative filtering in accordance with it. If the division pattern divides the recommendation target users, the recommendation targets are divided according to the pattern and filtering is performed on each set. If the division pattern divides the logs, filtering is performed using the log data extracted from the divided logs and provided by the log extraction unit 51.

The division pattern is data describing how the recommendation target users are divided and how the logs are divided. Specifically, it is, for example, the number of users per set and the number of sets for the division of recommendation target users, and the number of logs per set for the division of logs.

The division processing may likewise be applied to the processing target data extraction unit 29 shown in FIG. 21. The configuration of FIG. 21 takes the set processing time as input and calculates the target number of logs. In this case as well, multiple types of basic data corresponding to multiple division patterns are acquired by the basic data acquisition experiment processing unit 93 and stored in the basic data storage unit 81.

The target log number calculation unit 85 calculates a target number of logs from each of the multiple types of basic data in the basic data storage unit 81, and then finds the largest target number of logs. The target log number output unit 87 displays the target number of logs together with the division pattern of the basic data from which it was obtained. The log extraction unit 81 acquires from the target log number calculation unit 85 the information on the division pattern of the basic data that maximizes the target number of logs, and performs log extraction based on the acquired division pattern. The collaborative filter unit 23 also acquires the division pattern and performs collaborative filtering with that pattern.
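The steps above can be sketched as follows, again as an assumption-laden illustration rather than the method of the specification. Each set of basic data is assumed to be a list of measured (number of logs, seconds) points with non-decreasing times, the target number of logs is found by inverse linear interpolation, and a time limit beyond the measured range is capped at the largest measured log count. All names are hypothetical.

```python
def max_logs_within(basic_data, time_limit):
    """Largest log count whose predicted time does not exceed time_limit.

    basic_data: list of (log_count, seconds) sorted by log_count, with
    non-decreasing seconds (assumed shape of the basic data).
    """
    if time_limit < basic_data[0][1]:
        return 0  # even the smallest measured run is too slow
    for (x0, t0), (x1, t1) in zip(basic_data, basic_data[1:]):
        if t0 <= time_limit < t1:
            # Invert the line through the two points (t1 > t0 here).
            return int(x0 + (x1 - x0) * (time_limit - t0) / (t1 - t0))
    return basic_data[-1][0]  # cap at the largest measured log count


def choose_pattern_for_time_limit(basic_by_pattern, time_limit):
    """Return (pattern_name, target_logs) maximizing the target log count."""
    targets = {
        name: max_logs_within(data, time_limit)
        for name, data in basic_by_pattern.items()
    }
    best = max(targets, key=targets.get)
    return best, targets[best]
```

Choosing the pattern with the largest target number of logs means that, for the same set processing time, more history data is kept for filtering.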

The division processing may also be applied to the processing target data extraction unit 29 shown in FIG. 24. The configuration of FIG. 24 takes the set similarity threshold as input and calculates the target number of logs and the predicted processing time. In this case as well, the multiple types of basic data described above are acquired by the basic data acquisition experiment processing unit 115 and stored in the basic data storage unit 101.

The target log number acquisition unit 105 obtains the number of logs corresponding to the set similarity threshold, as described for FIG. 24. The processing time calculation unit 107 then calculates a predicted processing time from each of the multiple types of basic data in the basic data storage unit 101 and finds the shortest one. The calculation result output unit 109 displays the predicted processing time together with the division pattern of the basic data from which it was obtained. The log extraction unit 113 acquires from the processing time calculation unit 107 the information on the division pattern that gives the shortest predicted processing time, and performs log extraction based on the acquired division pattern. The collaborative filter unit 23 also acquires the division pattern and performs collaborative filtering with that pattern.

The division processing has been described above. In that description, the basic data storage unit stores (1) the basic data without division, (2) the basic data with the recommendation target users divided, (3) the basic data with the logs divided, and (4) the basic data with both divided. Not all of these need be stored, however; only one may be stored. For example, only the basic data with the recommendation target users divided may be stored (although multiple sets of basic data with different division sizes may be stored).

In the example above, the recommendation target users are divided at random and every set has the same number of users. The pattern for dividing the recommendation target users is not limited to this. For example, the number of logs of each recommendation target user (the personal log count) may be examined, and the users divided so that the theoretical value of the Jaccard coefficient described above is held fixed. Specifically, the recommendation target users are divided so that users with similar personal log counts belong to the same set. The users may also be divided by personal log count.
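A division that keeps users with similar personal log counts together can be sketched as below. The specification does not fix how the sets are formed, so this assumes the simplest approach: sort the recommendation target users by personal log count and cut the sorted list into sets of a given size. The function name and parameters are hypothetical.

```python
from collections import Counter


def split_by_personal_log_count(logs, target_users, set_size):
    """Group recommendation target users so that users with similar
    personal log counts land in the same set.

    logs: iterable of (user_id, item_id) pairs.
    target_users: iterable of user ids to be divided.
    """
    per_user = Counter(user for user, _item in logs)
    # Sort by log count; the user id breaks ties so the result is stable.
    ordered = sorted(target_users, key=lambda user: (per_user[user], user))
    return [
        ordered[i:i + set_size] for i in range(0, len(ordered), set_size)
    ]
```

Dividing by exact personal log count, as the last sentence above allows, would instead bucket users by the value of `per_user[user]`.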

The collaborative filter device 1 according to the embodiment of the present invention has been described above. According to the present embodiment, the basic data storage unit 41 storing the basic data on the relationship between the number of logs and the collaborative filtering processing time is provided, the predicted processing time corresponding to the input target number of logs is calculated, and the input target number of logs is extracted from the history data accumulation unit 21. Because the predicted processing time is known in advance, the number of logs can be set so that the predicted processing time is reasonable. Extracting the set number of logs from the history data then reduces the amount of calculation so that the calculated predicted processing time is achieved. In this way, the amount of calculation is reduced and the processing time is shortened.

Further, according to the present embodiment, logs are extracted from the history data by limiting the number of logs of the comparison target users according to the number of logs of the recommendation target user, based on the theoretical similarity value. This keeps the logs of users whose number of acquired items is close to that of the recommendation target user, so the number of logs can be reduced while filtering accuracy is kept higher.

Further, according to the present embodiment, a similarity threshold is set on the theoretical similarity value and logs are extracted from the history data using that threshold. This keeps the logs of users whose number of acquired items is close to that of the recommendation target user, so the number of logs can be reduced while filtering accuracy is kept higher.

Further, according to the present embodiment, dividing the recommendation target users into a plurality of sets reduces the amount of calculation and shortens the processing time. In this case too, processing can be executed after the processing time and the number of logs have been set appropriately.

Further, according to the present embodiment, dividing the logs into a plurality of sets reduces the amount of calculation and shortens the processing time. In this case too, processing can be executed after the processing time and the number of logs have been set appropriately.

Further, according to the present embodiment, dividing both the recommendation target users and the logs into a plurality of sets reduces the amount of calculation and shortens the processing time. In this case too, processing can be executed after the processing time and the number of logs have been set appropriately.

Further, according to the present embodiment, in response to the input of a set processing time, the target number of logs corresponding to that set processing time is calculated, and the calculated target number of logs is extracted from the history data accumulation unit. In this respect too, the amount of calculation can be suitably reduced and the processing time shortened.

Further, according to the present embodiment, the target number of logs is calculated from the input set similarity threshold, and the predicted processing time is calculated from the target number of logs. The set similarity threshold can therefore be adjusted to reduce the logs so that the processing time is appropriate. Using a similarity threshold on the theoretical similarity value reduces the amount of calculation and shortens the processing time while keeping filtering accuracy higher.

In the present embodiment, the product has been described as being content; however, the content database may instead be built with information describing the product as the content. The content also need not be electronic data.
Further, although content purchase has been described as downloading, the history data may be obtained by any means as long as it associates a user with the purchase of a product. For example, orders may be received by facsimile or telephone without using the Internet.

The preferred embodiment of the present invention has been described above. The present invention is not limited to this embodiment, however, and those skilled in the art can of course modify the embodiment within the scope of the present invention.

As described above, the collaborative filter device according to the present invention has the effect of reducing the amount of calculation and shortening the processing time, and is useful as a recommendation system for content distribution and the like.

FIG. 1: Block diagram showing the configuration of the collaborative filter device according to the embodiment of the present invention
FIG. 2: Flowchart of the overall operation of content purchase and history data accumulation
FIG. 3: Diagram showing an example of a log, which is history data
FIG. 4: Diagram showing collaborative filtering using the Jaccard coefficient
FIG. 5: Diagram showing collaborative filtering using the Jaccard coefficient
FIG. 6: Flowchart showing collaborative filtering
FIG. 7: Diagram showing collaborative filtering using cosine similarity
FIG. 8: Diagram showing collaborative filtering using cosine similarity
FIG. 9: Diagram showing collaborative filtering using cosine similarity
FIG. 10: Diagram showing the configuration of the processing target data extraction unit
FIG. 11: (a) Diagram showing the ideal relationship between the number of logs and the processing time; (b) diagram showing the actual relationship between the number of logs and the processing time
FIG. 12: Diagram showing the processing of the basic data acquisition experiment processing unit
FIG. 13: Flowchart showing the operation of the processing target data extraction unit
FIG. 14: Diagram showing the screen display corresponding to the operation of the processing target data extraction unit
FIG. 15: (a) Diagram showing an example of a purchase history; (b) diagram showing another example of a purchase history
FIG. 16: Diagram showing a table of theoretical similarity values
FIG. 17: Diagram showing a table of theoretical similarity values
FIG. 18: Block diagram showing the configuration of the log extraction unit
FIG. 19: Flowchart showing the operation of the log extraction unit
FIG. 20: Flowchart showing the operation of the log extraction unit
FIG. 21: Diagram showing another configuration example of the processing target data extraction unit
FIG. 22: Flowchart showing the operation of the processing target data extraction unit
FIG. 23: Diagram showing the screen display corresponding to the operation of the processing target data extraction unit
FIG. 24: Diagram showing another configuration example of the processing target data extraction unit
FIG. 25: Flowchart showing the operation of the processing target data extraction unit
FIG. 26: Diagram showing the screen display corresponding to the operation of the processing target data extraction unit
FIG. 27: Diagram showing the change in processing time when the recommendation target users are divided into a plurality of sets
FIG. 28: (a) Diagram showing the basic data of the pattern without division; (b) diagram showing the basic data of a pattern with division; (c) diagram showing the basic data of a pattern with division

Explanation of symbols

1 Collaborative filter device
21 History data accumulation unit
23 Collaborative filter unit
25 Recommendation information distribution unit
27 Product information storage unit
29 Processing target data extraction unit
41 Basic data storage unit
43 Target log number input unit
45 Processing time calculation unit
47 Processing time output unit
49 Permission determination reception unit
51 Log extraction unit
53 Basic data acquisition experiment processing unit

Claims

A collaborative filter device comprising:
a history data accumulation unit that accumulates, as history data, logs each including user identification data and item identification data, a log being generated each time an item is acquired by one of a large number of users;
a collaborative filter unit that obtains a recommended item for a recommendation target user by collaborative filtering processing using the history data accumulated in the history data accumulation unit;
an output unit that outputs a processing result of the collaborative filter unit; and
a processing target data extraction unit that extracts logs to be processed by the collaborative filter unit from the large number of logs accumulated in the history data accumulation unit,
wherein the processing target data extraction unit comprises:
a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering;
a target log number input unit for inputting a target log number to be processed by collaborative filtering;
a processing time calculation unit that reads the basic data from the basic data storage unit and obtains from it a predicted processing time corresponding to the target log number input through the target log number input unit;
a processing time output unit that outputs the predicted processing time calculated by the processing time calculation unit; and
a log extraction unit that extracts, based on the target log number input through the target log number input unit, that number of logs from the logs accumulated in the history data accumulation unit.
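The processing time calculation and log extraction recited above can be illustrated with a minimal sketch. The claim does not state how the predicted time is derived from the basic data, so the piecewise-linear interpolation, the clamping outside the measured range, the sample values in `BASIC_DATA`, and the rule of taking the first logs in stored order are all assumptions made for illustration; the function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical basic data: (number of logs, measured processing time in seconds).
# The values are illustrative only and are not taken from the patent.
BASIC_DATA = [(1_000, 2.0), (10_000, 30.0), (100_000, 500.0)]


def predict_processing_time(target_logs, basic_data=BASIC_DATA):
    """Estimate the processing time for target_logs by linear interpolation.

    Assumption: basic_data is sorted by log count. Outside the measured
    range the nearest measured time is returned (no extrapolation).
    """
    counts = [n for n, _ in basic_data]
    if target_logs <= counts[0]:
        return basic_data[0][1]
    if target_logs >= counts[-1]:
        return basic_data[-1][1]
    i = bisect_left(counts, target_logs)
    (n0, t0), (n1, t1) = basic_data[i - 1], basic_data[i]
    return t0 + (t1 - t0) * (target_logs - n0) / (n1 - n0)


def extract_logs(history, target_logs):
    """Return target_logs entries from the accumulated history.

    Assumption: the first entries in stored order are taken; the claim
    only requires that the target number of logs be extracted.
    """
    return history[:target_logs]


if __name__ == "__main__":
    history = [(f"user{i % 7}", f"item{i % 11}") for i in range(20_000)]
    print(predict_processing_time(5_500))
    print(len(extract_logs(history, 5_500)))
```

In this reading, an operator would enter a target log number, inspect the predicted time, and only then allow extraction to proceed.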

The collaborative filter device according to claim 1, wherein the log extraction unit, based on a theoretical similarity value determined from the number of personal logs of the recommendation target user, the number of personal logs of a comparison target user, and the number of logs of items acquired in common by both users, limits the number of personal logs of comparison target users according to the number of personal logs of the recommendation target user, and extracts from the history database the logs of the comparison target users corresponding to the limited number of personal logs.

The collaborative filter device according to claim 2, wherein the log extraction unit comprises:
means for setting a similarity threshold for the theoretical similarity value;
means for setting a threshold-reference personal log number, which is the number of personal logs of a comparison target user corresponding to a theoretical similarity value equal to or greater than the similarity threshold; and
count acquisition means for acquiring, from the logs of the history data accumulation unit, a count value of the number of logs of comparison target users corresponding to the threshold-reference personal log number,
and wherein the number of personal logs of the comparison target users from which logs are extracted is limited to the threshold-reference personal log number obtained when the similarity threshold is set to its maximum within the range in which the count value obtained by the count acquisition means reaches the target log number.
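One possible reading of the threshold selection recited above is sketched below. It assumes the Jaccard coefficient as the similarity measure, takes as the theoretical value the upper bound on that coefficient given only the two personal log counts (reached when every item of the smaller history is also in the larger one), and assumes each log is a unique (user, item) pair. The function names and the list of candidate thresholds are hypothetical.

```python
from collections import Counter


def jaccard_from_counts(a, b, common):
    """Jaccard coefficient |A & B| / |A | B| computed from set sizes."""
    return common / (a + b - common)


def max_jaccard(a, b):
    """Upper bound on the Jaccard coefficient for personal log counts a and b.

    The bound is reached when common == min(a, b).
    """
    return min(a, b) / max(a, b)


def pick_threshold(logs, target_user, target_logs, candidates):
    """Pick the highest candidate threshold that still yields enough logs.

    logs: iterable of (user, item) pairs, assumed unique.
    Returns (threshold, set of comparison users), or None if no candidate
    threshold reaches target_logs.
    """
    per_user = Counter(user for user, _item in logs)
    a = per_user[target_user]
    for s in sorted(candidates, reverse=True):
        users = {
            u for u, b in per_user.items()
            if u != target_user and max_jaccard(a, b) >= s
        }
        if sum(per_user[u] for u in users) >= target_logs:
            return s, users
    return None


if __name__ == "__main__":
    sizes = {"u0": 4, "u1": 4, "u2": 2, "u3": 1, "u4": 8}
    logs = [(u, f"item{i}") for u, n in sizes.items() for i in range(n)]
    print(pick_threshold(logs, "u0", 10, [0.25, 0.5, 1.0]))
```

Comparison users whose personal log count differs too much from that of the recommendation target user cannot reach the threshold however many items they share, so their logs can be excluded before any pairwise similarity is computed.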

The collaborative filter device according to claim 1, wherein the basic data storage unit stores basic data on the relationship between the number of logs and the processing time of collaborative filtering when the recommendation target users are divided into a plurality of sets,
and the processing time calculation unit calculates the predicted processing time using the basic data for the case where the recommendation target users are divided.

The collaborative filter device according to claim 1, wherein the basic data storage unit stores basic data on the relationship between the number of logs and the processing time of collaborative filtering when the logs are divided into a plurality of parts,
and the processing time calculation unit calculates the predicted processing time using the basic data for the case where the logs are divided.

The collaborative filter device according to claim 1, wherein the basic data storage unit stores basic data on the relationship between the number of logs and the processing time of collaborative filtering when the recommendation target users and the logs are each divided into a plurality of parts,
and the processing time calculation unit calculates the predicted processing time using the basic data for the case where the recommendation target users and the logs are divided.

The collaborative filter device according to claim 1, wherein the processing target data extraction unit further comprises a set processing time input unit for inputting a set processing time for collaborative filtering, and a target log number calculation unit that reads the basic data from the basic data storage unit and obtains from it a target log number corresponding to the set processing time input through the set processing time input unit,
and wherein, when the set processing time is input, the log extraction unit extracts the target log number of logs obtained by the target log number calculation unit from the logs accumulated in the history data accumulation unit.
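The inverse lookup recited above, from a set processing time to a target log number, can be sketched as follows. The sketch assumes that the basic data are sorted and that processing time increases strictly with the number of logs, and it inverts a piecewise-linear interpolation; the sample values, the function name, and the behaviour outside the measured range are assumptions, not details from the patent.

```python
# Hypothetical basic data: (number of logs, measured processing time in seconds).
# The values are illustrative only and are not taken from the patent.
BASIC_DATA = [(1_000, 2.0), (10_000, 30.0), (100_000, 500.0)]


def target_logs_for_time(set_time, basic_data=BASIC_DATA):
    """Largest log count whose interpolated processing time fits in set_time.

    Assumptions: basic_data is sorted and times are strictly increasing.
    Below the smallest measured time 0 is returned; above the largest,
    the largest measured log count is returned (no extrapolation).
    """
    if set_time < basic_data[0][1]:
        return 0
    for (n0, t0), (n1, t1) in zip(basic_data, basic_data[1:]):
        if set_time < t1:
            # Invert the linear segment between the two measurements.
            return int(n0 + (n1 - n0) * (set_time - t0) / (t1 - t0))
    return basic_data[-1][0]


if __name__ == "__main__":
    for seconds in (1.0, 16.0, 30.0, 1_000.0):
        print(seconds, target_logs_for_time(seconds))
```

The returned value would then be handed to the log extraction unit in place of a manually entered target log number.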

The collaborative filter device according to claim 1, wherein the processing target data extraction unit comprises a similarity threshold input unit for inputting a set similarity threshold for a theoretical similarity value determined from the number of personal logs of the recommendation target user, the number of personal logs of a comparison target user, and the number of logs of items acquired in common by both users, a target log number acquisition unit that obtains a target log number corresponding to the set similarity threshold, and a processing time calculation unit that calculates a predicted processing time corresponding to that target log number,
and wherein, when the set similarity threshold is input, the target log number acquisition unit acquires as the target log number a count value, obtained from the logs of the history data accumulation unit, of the number of logs of comparison target users whose personal log numbers correspond to theoretical similarity values equal to or greater than the similarity threshold, and the processing time calculation unit reads the basic data from the basic data storage unit and obtains from it the predicted processing time corresponding to the target log number acquired by the target log number acquisition unit.

A collaborative filter support device that, for a collaborative filter device comprising a history data accumulation unit that accumulates, as history data, logs each including user identification data and item identification data, a log being generated each time an item is acquired by one of a large number of users, and a collaborative filter unit that obtains a recommended item for a recommendation target user by collaborative filtering processing using the history data accumulated in the history data accumulation unit, extracts logs to be processed by the collaborative filter unit from the large number of logs accumulated in the history data accumulation unit, the collaborative filter support device comprising:
a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering;
a target log number input unit for inputting a target log number to be processed by collaborative filtering;
a processing time calculation unit that reads the basic data from the basic data storage unit and obtains from it a predicted processing time corresponding to the target log number input through the target log number input unit;
a processing time output unit that outputs the predicted processing time calculated by the processing time calculation unit; and
a log extraction unit that extracts, based on the target log number input through the target log number input unit, that number of logs from the logs accumulated in the history data accumulation unit.

A collaborative filter support processing method that, for a collaborative filter device comprising a history data accumulation unit that accumulates, as history data, logs each including user identification data and item identification data, a log being generated each time an item is acquired by one of a large number of users, and a collaborative filter unit that obtains a recommended item for a recommendation target user by collaborative filtering processing using the history data accumulated in the history data accumulation unit, extracts by computer processing logs to be processed by the collaborative filter unit from the large number of logs accumulated in the history data accumulation unit, the method comprising:
inputting, through a target log number input unit, a target log number to be processed by collaborative filtering; reading basic data from a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering; obtaining, from the basic data read from the basic data storage unit, a predicted processing time corresponding to the target log number input through the target log number input unit; outputting the calculated predicted processing time from a processing time output unit; and extracting, based on the target log number input through the target log number input unit, that number of logs from the logs accumulated in the history data accumulation unit.

A collaborative filter support processing program that, for a collaborative filter device comprising a history data accumulation unit that accumulates, as history data, logs each including user identification data and item identification data, a log being generated each time an item is acquired by one of a large number of users, and a collaborative filter unit that obtains a recommended item for a recommendation target user by collaborative filtering processing using the history data accumulated in the history data accumulation unit, causes a computer to execute processing for extracting logs to be processed by the collaborative filter unit from the large number of logs accumulated in the history data accumulation unit, the program causing the computer to execute:
inputting, through a target log number input unit, a target log number to be processed by collaborative filtering; reading basic data from a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering; obtaining, from the basic data read from the basic data storage unit, a predicted processing time corresponding to the target log number input through the target log number input unit; outputting the calculated predicted processing time from a processing time output unit; and extracting, based on the target log number input through the target log number input unit, that number of logs from the logs accumulated in the history data accumulation unit.

A collaborative filter support device that, for a collaborative filter device comprising a history data accumulation unit that accumulates, as history data, logs each including user identification data and item identification data, a log being generated each time an item is acquired by one of a large number of users, and a collaborative filter unit that obtains a recommended item for a recommendation target user by collaborative filtering processing using the history data accumulated in the history data accumulation unit, extracts logs to be processed by the collaborative filter unit from the large number of logs accumulated in the history data accumulation unit, the collaborative filter support device comprising:
a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering;
a set processing time input unit for inputting a set processing time for collaborative filtering;
a target log number calculation unit that reads the basic data from the basic data storage unit and obtains from it a target log number corresponding to the set processing time input through the set processing time input unit;
a target log number output unit that outputs the target log number calculated by the target log number calculation unit; and
a log extraction unit that extracts, based on the target log number calculated by the target log number calculation unit, that number of logs from the logs accumulated in the history data accumulation unit.

A collaborative filter support processing program that, for a collaborative filter device comprising a history data accumulation unit that accumulates, as history data, logs each including user identification data and item identification data, a log being generated each time an item is acquired by one of a large number of users, and a collaborative filter unit that obtains a recommended item for a recommendation target user by collaborative filtering processing using the history data accumulated in the history data accumulation unit, causes a computer to execute processing for extracting logs to be processed by the collaborative filter unit from the large number of logs accumulated in the history data accumulation unit, the program causing the computer to execute:
inputting, through a set processing time input unit, a set processing time for collaborative filtering; reading basic data from a basic data storage unit that stores basic data on the relationship between the number of logs and the processing time of collaborative filtering; obtaining, from the basic data read from the basic data storage unit, a target log number corresponding to the set processing time input through the set processing time input unit; outputting the calculated target log number from a target log number output unit; and extracting, based on the calculated target log number, that number of logs from the logs accumulated in the history data accumulation unit.

A collaborative filter support device that, for a collaborative filter device comprising a history data accumulation unit that accumulates, as history data, logs each including user identification data and item identification data, a log being generated each time an item is acquired by one of a large number of users, and a collaborative filter unit that obtains a recommended item for a recommendation target user by collaborative filtering processing using the history data accumulated in the history data accumulation unit, extracts logs to be processed by the collaborative filter unit from the large number of logs accumulated in the history data accumulation unit, the collaborative filter support device comprising:
means for inputting a similarity threshold for a theoretical similarity value determined from the number of personal logs of the recommendation target user, the number of personal logs of a comparison target user, and the number of logs of items acquired in common by both users;
means for setting a threshold-reference personal log number, which is the number of personal logs of a comparison target user corresponding to a theoretical similarity value equal to or greater than the similarity threshold; and
means for extracting, from the history data accumulation unit, the logs of comparison target users having the threshold-reference personal log number.