JP5216895B2

JP5216895B2 - Log processing apparatus and operation method thereof

Info

Publication number: JP5216895B2
Application number: JP2011124039A
Authority: JP
Inventors: 彰中山; 仁志瀬下; 優甲谷; 大我吉田; 篤信木村; 真中辻; 達郎石田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-06-02
Filing date: 2011-06-02
Publication date: 2013-06-19
Anticipated expiration: 2031-06-02
Also published as: JP2012252480A

Description

The present invention relates to a log processing apparatus and an operation method thereof.

In a broad sense, collaborative filtering is a technique that accumulates preference information from many users and recommends content to a given user on the basis of information about other users whose preferences resemble that user's.

Collaborative filtering can be classified into a user-to-user type and an item-to-item type (items are also referred to as content). The user-to-user type first finds users whose preference patterns resemble those of the user receiving the recommendation (referred to as similar users), and takes the items those similar users prefer as recommendation candidates. In an implementation, the similarity between two users is expressed by, for example, the correlation coefficient (Pearson correlation, rank correlation, or the like) between the ratings they gave to the same items. To predict preferences, users with high similarity are extracted, their ratings of an item are weighted by their similarity to the target user, and the weighted average of those ratings is used as the prediction. Items are then recommended in descending order of predicted value. Because the display screen is limited in size, the system is configured to drop items with small predicted values, or to display only the top 3 to 10 or so items.
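The user-to-user procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of Non-Patent Document 1: the function names, the use of Pearson correlation over co-rated items, the restriction to positively correlated neighbours, and the parameters `k` and `top` are all choices made for the example.

```python
from math import sqrt

Ratings = dict[str, dict[str, float]]  # user -> {item: rating}

def pearson(ra: dict[str, float], rb: dict[str, float]) -> float:
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = sqrt(sum((ra[i] - ma) ** 2 for i in common)) * sqrt(
        sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def recommend_user_based(ratings: Ratings, target: str, k: int = 3, top: int = 3) -> list[str]:
    """Rank unseen items by the similarity-weighted average rating of the k most similar users."""
    sims = {u: pearson(ratings[target], r) for u, r in ratings.items() if u != target}
    # Only positively correlated users are treated as "similar users" here.
    neighbours = sorted((u for u in sims if sims[u] > 0), key=sims.get, reverse=True)[:k]
    candidates = {i for u in neighbours for i in ratings[u]} - set(ratings[target])
    predicted = {}
    for item in candidates:
        raters = [u for u in neighbours if item in ratings[u]]
        total_weight = sum(sims[u] for u in raters)
        predicted[item] = sum(sims[u] * ratings[u][item] for u in raters) / total_weight
    # Largest predicted values first, truncated to the display limit.
    return sorted(predicted, key=predicted.get, reverse=True)[:top]
```

If user "bob" rated items much like "alice" and "carol" rated them in the opposite way, only bob becomes a neighbour, so alice is recommended the items bob rated that she has not.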

The item-to-item type rests on two assumptions: items that receive similar ratings from many different users are similar, and users are interested in items similar to the items they are already interested in. On that basis it recommends items similar to those in the user's most recent usage history. In an implementation, the similarity between items is measured by, for example, the co-occurrence of the users who used them, and items similar to those in the user's most recent usage history are recommended.
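A minimal item-to-item sketch follows. It assumes cosine similarity between the sets of users who used each item as the co-occurrence measure; this is one common choice, and the source does not fix the formula. The function names and the summed-similarity scoring are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def item_similarity(usage: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Cosine similarity between items, computed from the sets of users who used them."""
    users_of: dict[str, set[str]] = defaultdict(set)
    for user, item in usage:
        users_of[item].add(user)
    sim = {}
    for a in users_of:
        for b in users_of:
            shared = len(users_of[a] & users_of[b])
            if a != b and shared:
                sim[a, b] = shared / sqrt(len(users_of[a]) * len(users_of[b]))
    return sim

def recommend_item_based(usage: list[tuple[str, str]], recent_items: set[str], top: int = 3) -> list[str]:
    """Score items by their summed similarity to the user's most recent items."""
    scores: dict[str, float] = defaultdict(float)
    for (a, b), s in item_similarity(usage).items():
        if a in recent_items and b not in recent_items:
            scores[b] += s
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

Items that never share a user with the recent items (such as an item used only by an unrelated user) receive no score and therefore cannot be recommended, which is the mechanism behind the cross-service problem described below.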

These techniques are described in detail in Non-Patent Document 1.

Non-Patent Document 1: Toshihiro Kamishima, "Algorithms for Recommender Systems (2)," Journal of the Japanese Society for Artificial Intelligence, Vol. 23, No. 1, pp. 89-103, January 2008.

These recommendation methods can be used for recommendations confined to a single service, and also for recommendations made across services (hereinafter, cross-service recommendation) as long as the services have the same numbers of users and contents and the same usage frequency. Real services, however, present the following problems.

The user-to-user method relies on the weighted average of item ratings. When usage frequency differs greatly between services, the rating values of items belonging to a rarely used service tend to be lower than those of items belonging to a heavily used service, so such items are not displayed as recommendations.

The same applies to the item-to-item method. Because it measures the similarity between items by, for example, the co-occurrence of the users who used them, items belonging to a rarely used service are unlikely to appear, just as with the user-to-user method. In addition, recommendations across different services, such as recommending content β of service B on the basis of content α of service A, are unlikely to occur.

These problems are known to arise from an imbalance in the number of logs per service.

The present invention has been made in view of the above problems, and its object is to provide a log processing apparatus, and an operation method thereof, that eliminate the imbalance in the number of logs per service used for content recommendation.

To solve the above problems, a first aspect of the present invention is a log processing apparatus that processes logs accumulated in order to recommend content to destination users, in a setting where each of a plurality of services can transmit one or more series, each series consisting of one or more contents. The apparatus comprises: a log storage unit that stores logs, each log indicating that some content of some series of some service was transmitted for some user and containing the date and time of that transmission, the content ID of the content, the service ID of the service, the series ID of the series and the user ID of the user; and a log deletion unit that searches the log storage unit for sets of logs containing more than a predetermined number of logs; for each such set and for each combination of user ID and series ID, searches the log storage unit for the log set consisting of the logs that contain that user ID and series ID; and, for each log set, selects a predetermined number of logs in order from the most recent transmission date and time and deletes the logs that were not selected.

For example, the log processing apparatus may comprise a coefficient calculation unit that obtains a coefficient indicating the imbalance in the number of logs among the sets of logs, each set consisting of the logs that contain a given service ID. In that case, if the coefficient is greater than a predetermined threshold, the log deletion unit searches the log storage unit for sets of logs containing more than a predetermined number of logs; for each such set and for each combination of user ID and series ID, searches the log storage unit for the log set consisting of the logs that contain that user ID and series ID; and, for each log set, selects a predetermined number of logs in order from the most recent transmission date and time and deletes the logs that were not selected.

For example, the log processing apparatus may comprise a coefficient calculation unit that obtains a coefficient indicating the imbalance in the number of logs among the sets of logs, each set consisting of the logs that contain a given service ID. In that case, if the coefficient is greater than a predetermined threshold, the log deletion unit searches the log storage unit for two sets of logs with different numbers of logs; obtains the length of the period from the most recent to the oldest transmission date and time in the set with fewer logs; multiplies that length by a predetermined coefficient to obtain a period length; detects the most recent transmission date and time in the set with more logs; determines, for each transmission date and time in that set, whether it falls within the period extending back from the detected date and time by the obtained period length; and selects the logs whose transmission dates and times fall within that period while deleting the unselected logs from the set.

A second aspect of the present invention is an operation method of a log processing apparatus that processes logs accumulated in order to recommend content to destination users, in a setting where each of a plurality of services can transmit one or more series, each series consisting of one or more contents. The log processing apparatus comprises a log storage unit that stores logs, each log indicating that some content of some series of some service was transmitted for some user and containing the date and time of that transmission, the content ID of the content, the service ID of the service, the series ID of the series and the user ID of the user. In the operation method, the log processing apparatus searches the log storage unit for sets of logs containing more than a predetermined number of logs; for each such set and for each combination of user ID and series ID, searches the log storage unit for the log set consisting of the logs that contain that user ID and series ID; and, for each log set, selects a predetermined number of logs in order from the most recent transmission date and time and deletes the logs that were not selected.

For example, the log processing apparatus may first obtain a coefficient indicating the imbalance in the number of logs among the sets of logs, each set consisting of the logs that contain a given service ID, and, when the coefficient is greater than a predetermined threshold, search the log storage unit for sets of logs containing more than a predetermined number of logs; for each such set and for each combination of user ID and series ID, search the log storage unit for the log set consisting of the logs that contain that user ID and series ID; and, for each log set, select a predetermined number of logs in order from the most recent transmission date and time and delete the logs that were not selected.

For example, the log processing apparatus may first obtain a coefficient indicating the imbalance in the number of logs among the sets of logs, each set consisting of the logs that contain a given service ID, and, when the coefficient is greater than a predetermined threshold, search the log storage unit for two sets of logs with different numbers of logs; obtain the length of the period from the most recent to the oldest transmission date and time in the set with fewer logs; multiply that length by a predetermined coefficient to obtain a period length; detect the most recent transmission date and time in the set with more logs; determine, for each transmission date and time in that set, whether it falls within the period extending back from the detected date and time by the obtained period length; and select the logs whose transmission dates and times fall within that period while deleting the unselected logs from the set.

According to the present invention, the imbalance in the number of logs per service used for content recommendation can be eliminated.

FIG. 1 is a diagram showing the configuration of a communication system that uses the log processing apparatus according to the present embodiment.
FIG. 2 is a diagram showing an example of the contents of a history information database.
FIG. 3 is a diagram showing an example of the contents of a meta information database.
FIG. 4 is a block diagram showing the schematic configuration of the recommendation server 1.
FIG. 5 is a flowchart showing the flow of processing in the recommendation server 1.
FIG. 6 is a diagram showing an example of the contents of the log storage unit 11.
FIG. 7 is a diagram showing an example of log deletion by the log deletion unit 13.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a diagram showing the configuration of a communication system that uses the log processing apparatus according to the present embodiment.

In FIG. 1, the recommendation server 1 is communicably connected to content servers 21, 22, ... and to user terminals 31, 32, .... Each user terminal is used by an individual user. The log processing apparatus according to the present embodiment is implemented in the recommendation server 1.

The content server 21 provides, for example, a service that transmits book data to a user terminal in response to a request from that terminal.

The content server 22 provides, for example, a service that transmits video data to a user terminal in response to a request from that terminal.

Here, the data of a single book and the data of a single video are each referred to as a content.

In the present embodiment, a work made up of two or more contents, such as volume 1, volume 2, ... of a long novel, is called a series. Likewise, the video data of a movie together with the data of its sequel is also called a series. A single standalone content is also treated as one series here.

From the data acquired from the content servers, the recommendation server 1 determines content that each user is likely to prefer and notifies the corresponding user terminal of it; that is, it recommends content.

Each content server comprises a content database 101 that stores transmittable content, a history information database 102 that stores history information recording the transmission of content, and a meta information database 103 that stores meta information indicating the relationship between contents and series.

FIG. 2 is a diagram showing an example of the contents of the history information database 102.

The history information database 102 stores history information. Each history information record contains the date and time at which the corresponding content was transmitted, identification information of that content (hereinafter, content ID), and identification information of the user of the destination user terminal (hereinafter, user ID).

FIG. 3 is a diagram showing an example of the contents of the meta information database 103.

The meta information database 103 stores meta information for each content. Each meta information record contains the content ID of the content and identification information of the series that includes the content (hereinafter, series ID).

FIG. 4 is a block diagram showing the schematic configuration of the recommendation server 1.

The recommendation server 1 comprises a log storage unit 11 built from the history information and meta information acquired from each content server; a Gini coefficient calculation unit 12 that obtains a Gini coefficient GC indicating the imbalance in the number of logs in the log storage unit 11; a log deletion unit 13 that deletes logs from the log storage unit 11; and a content recommendation unit 14 that notifies user terminals of content that their users are likely to prefer.

FIG. 5 is a flowchart showing the flow of processing in the recommendation server 1.

The recommendation server 1 acquires history information and meta information from each content server and builds the log storage unit 11 (S1).

FIG. 6 is a diagram showing an example of the contents of the log storage unit 11.

The log storage unit 11 stores one log for each acquired history information record. Each log contains the transmission date and time from that record, the content ID from that record, identification information of the service corresponding to the content server from which the record was acquired (hereinafter, service ID), the series ID found in the meta information containing that content ID, and the user ID from that record.
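Step S1 amounts to joining each content server's history information with its meta information on the content ID and tagging the result with the server's service ID. A minimal sketch, assuming the history and meta information arrive as in-memory Python structures keyed by a service ID; the input layout and the dict keys of each log are hypothetical, since the source does not specify a data format.

```python
from datetime import datetime

History = list[tuple[datetime, str, str]]  # (sent_at, content_id, user_id)
Meta = dict[str, str]                      # content_id -> series_id

def build_log_store(servers: dict[str, tuple[History, Meta]]) -> list[dict]:
    """S1 sketch: one log per history record, enriched with service ID and series ID."""
    logs = []
    for service_id, (history, meta) in servers.items():
        for sent_at, content_id, user_id in history:
            logs.append({
                "sent_at": sent_at,
                "content_id": content_id,
                "service_id": service_id,       # identifies the source content server
                "series_id": meta[content_id],  # looked up in the meta information
                "user_id": user_id,
            })
    return logs
```

A content ID missing from the meta information raises `KeyError` here; a real system would need a policy for such records, which the source does not describe.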

Returning to FIG. 5, the Gini coefficient calculation unit 12 calculates the Gini coefficient GC from the contents of the log storage unit 11 (S3).

Here, the Gini coefficient calculation unit 12 first calculates the mean difference MDF according to Equation (1).

Here, n is the number of services (equal to the number of content servers), xi is the number of logs containing the service ID of service i, and xj is the number of logs containing the service ID of service j.

Next, the Gini coefficient calculation unit 12 calculates the mean μ according to Equation (2).

μ = (x1 + x2 + ... + xn) / n (2)

Here, n is the number of services (the number of content servers), and x1, x2, ..., xn are the numbers of logs containing the service ID of the first service (for example, the service corresponding to content server 21), of the second service (for example, the service corresponding to content server 22), ..., and of the n-th service, respectively.

Next, the Gini coefficient calculation unit 12 calculates the Gini coefficient GC according to Equation (3).

GC = MDF / (2 × μ) (3)

The Gini coefficient GC indicates the imbalance in the number of logs across the per-service sets of logs (for each service ID, the set of logs that contain that service ID; hereinafter, a set of logs). GC lies in the range 0 to 1: the closer it is to 1, the greater the imbalance, and the closer it is to 0, the smaller the imbalance. When there is no imbalance, that is, when every set of logs contains the same number of logs, GC is 0.

Next, the log deletion unit 13 determines whether the Gini coefficient GC is greater than a predetermined value (hereinafter, threshold GCT) (S5).

If GC is greater than GCT (S5: YES), the log deletion unit 13 searches the log storage unit 11 for sets of logs that contain more than a predetermined number of logs (hereinafter, threshold LT) (S7) and, for each such set, performs steps S9 and S11 for each combination of user ID and series ID.

In step S9, the log deletion unit 13 searches the log storage unit 11 for the set of logs that contain the user ID and series ID of that combination (hereinafter, a log set).

In the following step S11, the log deletion unit 13 selects a predetermined number of logs from the log set in order from the most recent transmission date and time, and deletes the logs that were not selected.

As shown in FIG. 7, in step S11 the log deletion unit 13 selects, from a log set of five logs, the three most recent logs (the predetermined number being 3) and deletes the two logs that were not selected.
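Steps S7 to S11 can be sketched as follows, using the numbers from FIG. 7 in the test. Assumptions: logs are dicts with the hypothetical keys `sent_at`, `service_id`, `user_id` and `series_id`; `sent_at` may be any comparable timestamp; and log sets are keyed by (service ID, user ID, series ID) so that grouping stays inside a single set of logs.

```python
from collections import Counter, defaultdict

def keep_newest_per_user_series(logs: list[dict], lt: int, keep: int) -> list[dict]:
    """S7-S11 sketch: in every service whose set of logs has more than `lt` logs,
    keep only the `keep` most recent logs of each (user ID, series ID) log set."""
    per_service = Counter(log["service_id"] for log in logs)
    heavy = {s for s, c in per_service.items() if c > lt}           # S7
    kept = [log for log in logs if log["service_id"] not in heavy]   # untouched sets
    log_sets: dict[tuple, list[dict]] = defaultdict(list)
    for log in logs:                                                 # S9
        if log["service_id"] in heavy:
            log_sets[log["service_id"], log["user_id"], log["series_id"]].append(log)
    for log_set in log_sets.values():                                # S11
        log_set.sort(key=lambda log: log["sent_at"], reverse=True)
        kept.extend(log_set[:keep])
    return kept
```

With five logs of one user and one series in a heavily used service, `lt=4` and `keep=3`, the three newest logs survive and the two oldest are dropped, matching FIG. 7. Repeated viewing of one series by one user is what gets thinned out, while the number of distinct (user, series) pairs is preserved.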

Returning to FIG. 5, the Gini coefficient calculation unit 12 next calculates GC as in step S3 (S13), and the log deletion unit 13 determines, as in step S5, whether GC is greater than GCT (S15).

If GC is greater than GCT (S15: YES), the log deletion unit 13 searches the log storage unit 11 for one set of logs that has the fewest logs among the sets with at most LT logs (hereinafter, set of logs LL) (S17). It then obtains the length of the period from the most recent to the oldest transmission date and time in LL and multiplies that length by a predetermined coefficient in the range 0 to 1 (for example, 0.5) to obtain a period length (hereinafter, period length T) (S19). For example, if the period from the most recent to the oldest transmission date and time in LL is 30 days long and the coefficient is 0.5, T is 15 days.

Next, the log deletion unit 13 searches the log storage unit 11 for the sets of logs that contain more than LT logs (hereinafter, each is called a set of logs LM) (S21) and performs step S23 for each set LM.

In step S23, the most recent transmission date and time in the set LM (hereinafter, transmission date and time P) is detected. For each transmission date and time in LM, it is determined whether it falls within the period from P back to the point T earlier; the logs whose transmission dates and times fall within that period are selected, and the logs that were not selected are deleted from LM (S23).
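Steps S17 to S23 can be sketched as below, reproducing the 30-day, coefficient 0.5 example. Assumptions not fixed by the source: logs are dicts with hypothetical `service_id` and `sent_at` keys, ties for the smallest set LL are broken arbitrarily, and a log exactly T before P counts as inside the period.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def trim_to_recent_window(logs: list[dict], lt: int, ratio: float = 0.5) -> list[dict]:
    """S17-S23 sketch: cut every set of logs LM (more than `lt` logs) back to the
    last T of its history, where T = ratio x (newest - oldest) in set LL."""
    by_service: dict[str, list[dict]] = defaultdict(list)
    for log in logs:
        by_service[log["service_id"]].append(log)
    small = [ls for ls in by_service.values() if len(ls) <= lt]
    if not small:
        return list(logs)  # no set qualifies as LL, so there is no reference span
    ll = min(small, key=len)                                        # S17
    times = [log["sent_at"] for log in ll]
    period_t = (max(times) - min(times)) * ratio                    # S19
    kept = []
    for ls in by_service.values():
        if len(ls) <= lt:
            kept.extend(ls)  # sets at or below LT are not LM sets and stay untouched
            continue
        newest_p = max(log["sent_at"] for log in ls)                # S23: time P
        kept.extend(log for log in ls if log["sent_at"] >= newest_p - period_t)
    return kept

# Example matching the text: set LL spans 30 days, so T = 15 days.
base = datetime(2011, 5, 1)
books = [{"service_id": "books", "sent_at": base + timedelta(days=d)} for d in (0, 15, 30)]
video = [{"service_id": "video", "sent_at": base + timedelta(days=3 * k)} for k in range(11)]
kept = trim_to_recent_window(books + video, lt=5)
```

Here the video set LM holds logs every 3 days over 30 days; only the six logs from day 15 onward survive, so both services now cover a comparable stretch of recent history.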

Next, the Gini coefficient calculation unit 12 calculates GC as in steps S3 and S13 (S25), and the log deletion unit 13 determines, as in steps S5 and S15, whether GC is greater than GCT (S27).

If GC is greater than GCT (S27: YES), the log deletion unit 13 multiplies the mean μ calculated in the course of step S25 by a predetermined multiple to obtain a value (hereinafter, threshold LT2) (S29), and searches the log storage unit 11 for sets of logs containing more than LT2 logs (S31). If any such set exists (S32: YES), it performs steps S9 and S11 for each of those sets and for each combination of user ID and series ID. After step S11, processing proceeds to step S13.

If GC is determined to be at most GCT in step S5, S15 or S27 (NO), or if no set of logs contains more than LT2 logs (S32: NO), the content recommendation unit 14 uses the log storage unit 11 to obtain, for each user, the content IDs of content the user is likely to prefer, up to a predetermined maximum number, and transmits information containing those content IDs to that user's terminal (S33). The content is thereby recommended to the user, and the series of processes ends.
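Putting the steps of FIG. 5 together, the control flow from S3 up to the hand-off to the content recommendation unit 14 (S33) might look like the following sketch. To stay self-contained it restates steps S7 to S23 compactly. The parameter names, the defaults (coefficient 0.5, multiple 2.0), the log dict keys and the assumed form of Equation (1) are illustrative choices. The `max_rounds` guard is not in the source: the loop from S13 through S32 has no stated exit if deletions stop lowering GC.

```python
from collections import Counter, defaultdict

def gini(logs):
    """GC over per-service log counts (Equation (1) assumed to average over n*n pairs)."""
    counts = list(Counter(log["service_id"] for log in logs).values())
    n, mu = len(counts), sum(counts) / len(counts)
    return sum(abs(a - b) for a in counts for b in counts) / n ** 2 / (2 * mu)

def trim_log_sets(logs, limit, keep):
    """S7/S31, S9, S11: keep the `keep` newest logs per (user, series) in services above `limit`."""
    per_service = Counter(log["service_id"] for log in logs)
    kept, log_sets = [], defaultdict(list)
    for log in logs:
        if per_service[log["service_id"]] > limit:
            log_sets[log["service_id"], log["user_id"], log["series_id"]].append(log)
        else:
            kept.append(log)
    for log_set in log_sets.values():
        kept += sorted(log_set, key=lambda log: log["sent_at"], reverse=True)[:keep]
    return kept

def trim_window(logs, lt, ratio):
    """S17-S23: cut each set with more than `lt` logs to its last T = ratio x span(LL)."""
    by_service = defaultdict(list)
    for log in logs:
        by_service[log["service_id"]].append(log)
    small = [ls for ls in by_service.values() if len(ls) <= lt]
    if not small:
        return logs
    times = [log["sent_at"] for log in min(small, key=len)]
    period_t = (max(times) - min(times)) * ratio
    kept = []
    for ls in by_service.values():
        if len(ls) > lt:
            p = max(log["sent_at"] for log in ls)
            ls = [log for log in ls if log["sent_at"] >= p - period_t]
        kept += ls
    return kept

def balance_logs(logs, gct, lt, keep, ratio=0.5, multiple=2.0, max_rounds=10):
    """FIG. 5 from S3 to the hand-off to recommendation (S33)."""
    if gini(logs) <= gct:                                    # S3, S5
        return logs
    logs = trim_log_sets(logs, lt, keep)                     # S7-S11
    for _ in range(max_rounds):                              # safeguard, not in the source
        if gini(logs) <= gct:                                # S13, S15
            return logs
        logs = trim_window(logs, lt, ratio)                  # S17-S23
        if gini(logs) <= gct:                                # S25, S27
            return logs
        counts = Counter(log["service_id"] for log in logs)
        lt2 = sum(counts.values()) / len(counts) * multiple  # S29: LT2 = mu x multiple
        if all(c <= lt2 for c in counts.values()):           # S31, S32: NO
            return logs
        logs = trim_log_sets(logs, lt2, keep)                # S9, S11 on sets above LT2
    return logs
```

The returned list is what the content recommendation unit 14 would then feed to a collaborative-filtering method in step S33.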

Alternatively, the content recommendation unit 14 may obtain the series ID of a series made up of content the user is likely to prefer and transmit information containing that series ID to the user terminal (S33), thereby recommending the contents of that series to the user.

To determine content that a user is likely to prefer, the techniques of Non-Patent Document 1 can be used. These techniques derive relationships between contents, between series and between services, and the content is determined on the basis of those relationships.

Suppose that technique were applied to the log storage unit 11 before deletion by the log deletion unit 13 (the original log storage unit 11). If, in the original log storage unit 11, the number of logs containing the service ID of some service were extremely small compared with the numbers of logs containing other service IDs, the content of that service would not be recognized as content users are likely to prefer, and so would not be recommended.

In the log storage unit 11 after deletion by the log deletion unit 13, however, the number of logs containing the service ID of that service is relatively larger, so the content of that service can be recommended as content users are likely to prefer.

Thus, according to the present embodiment, a log processing apparatus is implemented in the recommendation server 1 comprising: the log storage unit 11, which stores logs containing the transmission date and time, content ID, service ID, series ID and user ID; the Gini coefficient calculation unit 12, which obtains the Gini coefficient GC indicating the imbalance in the number of logs across the per-service-ID sets of logs; and the log deletion unit 13, which, if GC is greater than the predetermined threshold GCT (S5: YES), searches the log storage unit 11 for sets of logs containing more than a predetermined number of logs (threshold LT) (S7), searches the log storage unit 11, for each such set and each combination of user ID and series ID, for the log set consisting of the logs that contain that user ID and series ID (S9), and, for each log set, selects a predetermined number of logs ("3" in the example of FIG. 7) in order from the most recent transmission date and time and deletes the logs that were not selected (S11). The imbalance in the number of logs per service can therefore be eliminated.

In addition, if the Gini coefficient GC is greater than the threshold GCT (S15: YES), the log deletion unit 13 searches the log storage unit 11 for two sets of logs with different numbers of logs, LL and LM (S17, S21). It obtains the length of the period from the most recent to the oldest transmission date and time in LL, the set with fewer logs, and multiplies that length by a predetermined coefficient to obtain a period length T (S19). It then detects the most recent transmission date and time P in LM, the set with more logs. For each transmission date and time in LM, it determines whether that date and time falls within the period extending back from P by T. It keeps the logs whose transmission dates and times fall within that period and deletes the others from LM (S23). This, too, corrects the imbalance in the number of logs between services.
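A comparable sketch of S19 to S23 follows. It assumes the two sets LL and LM have already been found (S17, S21). `Log`, `trim_by_period` and `factor` (the predetermined coefficient) are hypothetical names, and treating both ends of the period as inclusive is an assumption:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Log:  # hypothetical, reduced to the fields this step needs
    sent_at: datetime
    service_id: str


def trim_by_period(ll, lm, factor):
    """Sketch of S19-S23: keep only LM logs sent within T before LM's newest log."""
    ll_times = [log.sent_at for log in ll]
    # S19: span of the smaller set LL, newest minus oldest, scaled to get T.
    period_t = (max(ll_times) - min(ll_times)) * factor
    # S23: P is LM's newest transmission; keep logs in [P - T, P].
    p = max(log.sent_at for log in lm)
    start = p - period_t
    return [log for log in lm if start <= log.sent_at <= p]


# Hypothetical data: LL spans 3 days, LM has one log per day for 20 days.
ll = [Log(datetime(2011, 6, 1), "B"), Log(datetime(2011, 6, 4), "B")]
lm = [Log(datetime(2011, 6, d), "A") for d in range(1, 21)]
kept = trim_by_period(ll, lm, factor=2)
print(len(kept))  # 7: June 14 through June 20
```

Because T is derived from LL's own time span, the trimmed LM covers a window proportional to the smaller service's history instead of a fixed count.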

Although this embodiment uses the Gini coefficient GC, any other coefficient may be used as long as it indicates the imbalance in the number of logs between services. Alternatively, the log deletion processing may be applied to sets of logs with many logs without first checking a condition based on an imbalance coefficient.
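As one example of an alternative imbalance measure, the coefficient of variation of the per-service log counts could stand in for the Gini coefficient. This is a swapped-in illustration, not a measure named in the patent. `coefficient_of_variation` is a hypothetical helper, and the threshold comparison would then be applied to its value in place of GC:

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Population standard deviation of per-service log counts divided by
    their mean; 0 when every service has the same number of logs."""
    avg = mean(counts)
    return pstdev(counts) / avg if avg else 0.0


print(coefficient_of_variation([5, 5]))   # 0.0
print(coefficient_of_variation([10, 2]))  # about 0.667
```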

S9 and S11 may also be performed only on the log sets that correspond to predetermined service types.

A computer program that causes a computer to function as the log processing apparatus according to this embodiment can be recorded on a computer-readable recording medium such as semiconductor memory, a magnetic disk, an optical disk, a magneto-optical disk or magnetic tape, and can also be distributed widely by transmission over a communication network such as the Internet.

DESCRIPTION OF SYMBOLS
1 ... Recommendation server
11 ... Log storage unit
12 ... Gini coefficient calculation unit
13 ... Log deletion unit
14 ... Content recommendation unit
21, 22 ... Content servers
31, 32 ... User terminals
101 ... Content database
102 ... History information database
103 ... Meta information database

Claims

1. A log processing apparatus that processes accumulated logs in order to recommend content to a destination user, where each of a plurality of services can transmit one or more series each composed of one or more items of content, the log processing apparatus comprising:
a log storage unit that stores logs, each log indicating that some content of some series of some service has been transmitted to some user and including the transmission date and time, the content ID of the content, the service ID of the service, the series ID of the series and the user ID of the user; and
a log deletion unit that searches the log storage unit for sets of logs having more logs than a predetermined number; for each such set and for each combination of user ID and series ID, searches the log storage unit for a log set consisting of the logs that include that user ID and series ID; and, for each log set, selects a predetermined number of logs from the log set in order from the most recent transmission date and time and deletes the logs that were not selected.

2. The log processing apparatus according to claim 1, further comprising a coefficient calculation unit that obtains a coefficient indicating the imbalance in the number of logs across the sets of logs, one set per service ID, each containing the logs that include that service ID,
wherein, if the coefficient is greater than a predetermined threshold, the log deletion unit searches the log storage unit for sets of logs having more logs than a predetermined number; for each such set and for each combination of user ID and series ID, searches the log storage unit for a log set consisting of the logs that include that user ID and series ID; and, for each log set, selects a predetermined number of logs from the log set in order from the most recent transmission date and time and deletes the logs that were not selected.

3. The log processing apparatus according to claim 1, further comprising a coefficient calculation unit that obtains a coefficient indicating the imbalance in the number of logs across the sets of logs, one set per service ID, each containing the logs that include that service ID,
wherein, if the coefficient is greater than a predetermined threshold, the log deletion unit searches the log storage unit for two sets of logs having different numbers of logs; obtains the length of the period from the most recent to the oldest transmission date and time in the set with fewer logs and multiplies that length by a predetermined coefficient to obtain a period length; detects the most recent transmission date and time in the set with more logs; determines, for each transmission date and time in that set, whether it falls within the period extending back from the detected transmission date and time by the obtained period length; and selects the logs whose transmission dates and times fall within that period and deletes the logs that were not selected from that set.

4. An operation method of a log processing apparatus that processes accumulated logs in order to recommend content to a destination user, where each of a plurality of services can transmit one or more series each composed of one or more items of content, wherein
the log processing apparatus comprises
a log storage unit that stores logs, each log indicating that some content of some series of some service has been transmitted to some user and including the transmission date and time, the content ID of the content, the service ID of the service, the series ID of the series and the user ID of the user, and
the operation method comprises
a step in which the log processing apparatus searches the log storage unit for sets of logs having more logs than a predetermined number; for each such set and for each combination of user ID and series ID, searches the log storage unit for a log set consisting of the logs that include that user ID and series ID; and, for each log set, selects a predetermined number of logs from the log set in order from the most recent transmission date and time and deletes the logs that were not selected.

5. The operation method of the log processing apparatus according to claim 4, further comprising a step in which the log processing apparatus obtains in advance a coefficient indicating the imbalance in the number of logs across the sets of logs, one set per service ID, each containing the logs that include that service ID,
wherein, if the coefficient is greater than a predetermined threshold, the log processing apparatus searches the log storage unit for sets of logs having more logs than a predetermined number; for each such set and for each combination of user ID and series ID, searches the log storage unit for a log set consisting of the logs that include that user ID and series ID; and, for each log set, selects a predetermined number of logs from the log set in order from the most recent transmission date and time and deletes the logs that were not selected.

6. The operation method of the log processing apparatus according to claim 4, further comprising a step in which the log processing apparatus obtains in advance a coefficient indicating the imbalance in the number of logs across the sets of logs, one set per service ID, each containing the logs that include that service ID,
wherein, if the coefficient is greater than a predetermined threshold, the log processing apparatus searches the log storage unit for two sets of logs having different numbers of logs; obtains the length of the period from the most recent to the oldest transmission date and time in the set with fewer logs and multiplies that length by a predetermined coefficient to obtain a period length; detects the most recent transmission date and time in the set with more logs; determines, for each transmission date and time in that set, whether it falls within the period extending back from the detected transmission date and time by the obtained period length; and selects the logs whose transmission dates and times fall within that period and deletes the logs that were not selected from that set.

7. A computer program for causing a computer to function as the log processing apparatus according to claim 1.