JP5519824B1

JP5519824B1 - Interest analysis method, interest analysis apparatus, and interest analysis program

Info

Publication number: JP5519824B1
Application number: JP2013091047A
Authority: JP
Inventors: 将成藤田; 明片岡; 丈二中山; 知之兼清
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-04-24
Filing date: 2013-04-24
Publication date: 2014-06-11
Anticipated expiration: 2033-04-24
Also published as: JP2014215734A

Abstract

[Problem] To reduce the amount of data required when analyzing user interests, and to allow that amount to be estimated in advance.
[Solution] Intermediate results obtained while calculating a user interest score are aggregated for each predetermined aggregation period and stored in boxes. The user interest score is then calculated from the intermediate results stored in the boxes that fall within the history window size. As a result, when only the history window is analyzed, or when weighting is applied according to time, the amount of intermediate-result data that must be saved is reduced and can be estimated in advance.
[Selected figure] Figure 1

Description

The present invention relates to a technique for analyzing a user's interests from a content browsing history.

There is demand for techniques that recommend appropriate services and content according to a user's behavior and situation. The inventors have therefore proposed an interest analysis method and an interest analysis apparatus that can estimate a user's interests with high accuracy by exploiting the rarity of concept occurrences (Patent Document 1).

Patent Document 1: JP 2013-003797 A

In Patent Document 1, a user's interests are analyzed over the user's entire past history, and the user's interest levels are updated with each new input of history information. To analyze only the history from a fixed past period, or to weight the accumulated history according to its time difference (elapsed time) from the present, the intermediate results for that past period must be saved. However, the amount of data that has to be stored as intermediate results is large. In addition, the amount of intermediate-result data generated over the fixed past period cannot be estimated in advance.

The present invention has been made in view of the above. Its object is to reduce the amount of data required, and to make that amount estimable in advance, when a user's interests are analyzed using only the history from a fixed past period or by weighting the accumulated history according to its time difference (elapsed time) from the present.

An interest analysis method according to a first aspect of the present invention obtains a user interest score indicating a user's interest in a concept assigned to content, and comprises: a step of receiving a first content list, listing a plurality of contents browsed as a list, and a second content list, listing contents selected from the first content list and browsed; a step of, where the total number of contents in the first content list is a first total, the number of contents in the first content list in which the concept being processed appears is a first appearance count, the total number of contents in the second content list is a second total, and the number of contents in the second content list in which the concept appears is a second appearance count, calculating, given the first total, the first appearance count, and the second total, a first probability that the number of contents in the second content list in which the concept appears is equal to or greater than the second appearance count and a second probability that it is equal to or less than the second appearance count, and calculating a feature score from the first and second probabilities using the inverse of the cumulative distribution function of the standard normal distribution; a step of calculating an intermediate result for a predetermined aggregation period using the feature score and a weight for the feature score; a step of storing the intermediate result for each aggregation period; and a step of weighting each aggregation period according to its temporal distance from a predetermined time point and calculating the user interest score using the intermediate results weighted per aggregation period.

In the above interest analysis method, the intermediate results from before a predetermined fixed period are aggregated and saved as a long-term interest intermediate result, and the user interest score is calculated by weighting and combining it with the per-aggregation-period intermediate results aggregated over the aggregation periods that fall within the predetermined fixed period.

In the above interest analysis method, a long-term interest storage area that stores the long-term interest intermediate result, and per-aggregation-period storage areas, as many as the number of configured aggregation periods, each storing the intermediate results aggregated over its aggregation period, are prepared in advance. The storing step stores the aggregated intermediate results in these storage areas. When, at a predetermined time point, the aggregation period corresponding to a storage area no longer belongs to the predetermined fixed period, the intermediate result stored in that storage area is reflected into the long-term interest storage area, and the storage area is initialized and reused as the storage area for the aggregation period that has newly come to belong to the predetermined fixed period.

In the above interest analysis method, the long-term interest storage area and the storage areas are managed in a database as a single record.

An interest analysis apparatus according to a second aspect of the present invention obtains a user interest score indicating a user's interest in a concept assigned to content, and comprises: receiving means for receiving a first content list, listing a plurality of contents browsed as a list, and a second content list, listing contents selected from the first content list and browsed; calculation means for, where the total number of contents in the first content list is a first total, the number of contents in the first content list in which the concept being processed appears is a first appearance count, the total number of contents in the second content list is a second total, and the number of contents in the second content list in which the concept appears is a second appearance count, calculating, given the first total, the first appearance count, and the second total, a first probability that the number of contents in the second content list in which the concept appears is equal to or greater than the second appearance count and a second probability that it is equal to or less than the second appearance count, and calculating a feature score from the first and second probabilities using the inverse of the cumulative distribution function of the standard normal distribution; intermediate result updating means for calculating an intermediate result for a predetermined aggregation period using the feature score and a weight for the feature score, and storing the intermediate result for each aggregation period; and interest score calculation means for weighting each aggregation period according to its temporal distance from a predetermined time point and calculating the user interest score using the intermediate results weighted per aggregation period.

An interest analysis program according to a third aspect of the present invention causes a computer to function as each unit of the above interest analysis apparatus.

According to the present invention, when a user's interests are analyzed using only the history from a fixed past period, or with weighting according to time, the amount of data required can be reduced and estimated in advance.

FIG. 1 is a functional block diagram showing the configuration of the interest analysis apparatus in the present embodiment.
FIG. 2 is a diagram showing an example of a list browsing content list.
FIG. 3 is a diagram showing an example of a detailed browsing content list.
FIG. 4 is a diagram showing an example of the concept system table stored in the concept system / user interest score database.
FIG. 5 is a diagram showing a graph of the concept system.
FIG. 6 is a diagram showing an example of the user interest level intermediate result table.
FIG. 7 is a diagram showing an example of the user interest level table.
FIG. 8 is a diagram showing an example of a presentation content list.
FIG. 9 is a flowchart showing the flow of processing by which the interest analysis apparatus in the present embodiment calculates user interest scores.
FIG. 10 is a diagram showing an example of analysis parameters.
FIG. 11 is a diagram for explaining the history window size and the boxes.
FIG. 12 is a flowchart showing the flow of processing that updates the intermediate results stored in the boxes.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a functional block diagram showing the configuration of the interest analysis apparatus in the present embodiment. The interest analysis apparatus 1 takes as input a list browsing content list, in which the user browsed contents as a list, and a detailed browsing content list, in which the user selected contents from that list and browsed their bodies. It analyzes the user's interests by calculating a user interest score indicating the user's degree of interest in each concept assigned to the contents, and then outputs a sorted content score list in which the contents are sorted in descending order of an evaluation score calculated from the user interest scores.

The interest analysis apparatus 1 shown in FIG. 1 includes a history information receiving unit 11, a feature score calculation unit 12, a concept system update processing unit 13, a concept system / user interest score database 14, a presentation content list receiving unit 15, a content database 16, a content evaluation processing unit 17, and a sorted content score list transmission unit 18. Each unit of the interest analysis apparatus 1 may be implemented by a computer that includes an arithmetic processing unit, a storage device, and the like, with the processing of each unit executed by a program. The program is stored in a storage device of the interest analysis apparatus 1 and can be recorded on a recording medium such as a magnetic disk, an optical disc, or a semiconductor memory, or provided over a network.

The history information receiving unit 11 receives the list browsing content list and the detailed browsing content list for contents the user browsed on a client terminal. Both lists are generated on the client terminal, sent to a content server, and forwarded from the content server to the interest analysis apparatus 1.

The list browsing content list is a list of the contents whose titles the user viewed in a list. FIG. 2 shows an example. The list browsing content list in the figure records a cluster ID, a content ID identifying each content, and a browsing time. The browsing time is the time at which the content's title was displayed on the client terminal. When not all content titles can be displayed on the client terminal at once, the titles the user brought into view by scrolling are recorded in the list browsing content list. A cluster ID is an identifier uniquely assigned to each list browsing content list and its corresponding detailed browsing content list; when the user views a list of titles and displays details at a different time (time period), a different cluster ID is assigned. Besides a change of time, a new cluster ID is also assigned when there has been no operation for a certain time while the list browsing content list is displayed, when the browsing user (user ID) is switched, when a narrowing search (for example, by content genre) is applied to the list browsing content list, or when the browsing mode is otherwise switched in the browsing application.
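The cluster-boundary rules above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes browse events arrive as time-ordered `BrowseEvent` records (a hypothetical type) and uses a hypothetical 30-minute inactivity threshold, since the source gives no threshold value. A narrowing search is modeled as one more change of `mode`.

```python
from dataclasses import dataclass

IDLE_THRESHOLD_S = 30 * 60  # hypothetical inactivity threshold; not specified in the source


@dataclass(frozen=True)
class BrowseEvent:
    user_id: str
    timestamp: float  # seconds since an arbitrary epoch
    mode: str         # browsing mode or active filter (e.g. a genre narrowing search)


def assign_cluster_ids(events: list[BrowseEvent]) -> list[int]:
    """Return one cluster ID per event; a new ID starts whenever a boundary rule fires."""
    cluster_ids: list[int] = []
    current = 0
    prev = None
    for ev in events:
        if prev is not None and (
            ev.timestamp - prev.timestamp > IDLE_THRESHOLD_S  # no operation for a while
            or ev.user_id != prev.user_id                     # browsing user switched
            or ev.mode != prev.mode                           # mode switched / filter applied
        ):
            current += 1
        cluster_ids.append(current)
        prev = ev
    return cluster_ids
```

For example, two quick events by one user share a cluster, while a one-hour gap, a user switch, or a genre filter each start a new one.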

The detailed browsing content list is a list of the contents that the user selected from the browsed list and whose body (details) the user viewed. FIG. 3 shows an example. The detailed browsing content list in the figure records a cluster ID, a content ID, and a browsing time. The browsing time is the time at which the user selected the content from the list of titles and viewed its body.

The feature score calculation unit 12 uses the list browsing content list and the detailed browsing content list to calculate, with a statistical model of concept selection, a feature score for each concept assigned to the contents. The feature score calculation is described in detail later.

The concept system update processing unit 13 calculates and updates the user interest score of each concept using the feature scores calculated by the feature score calculation unit 12. It then updates the user interest scores of subordinate concepts using the relationships between concepts (superordinate and subordinate) in the concept system stored in the concept system / user interest score database 14.

FIG. 4 shows an example of the concept system table stored in the concept system / user interest score database 14. The table stores a self concept ID, a parent concept ID list, and a child concept ID list. Every concept ID in the concept system is stored in association with its parent and child concept IDs, which defines a concept structure like the concept system graph shown in FIG. 5. In the graph of FIG. 5, nodes represent concepts and links represent relationships between concepts; the value of each node is the user interest score. Nodes higher in the concept system represent more abstract concepts, and nodes lower in it represent more specific concepts. The concept system is designed and defined in advance by a service operator or the like.

When updating user interest scores in the present embodiment, the intermediate results that the concept system update processing unit 13 obtains while updating a user interest score are aggregated per predetermined aggregation period and saved for a fixed past period. This makes it possible to analyze only the history from that fixed past period, and to apply weighting that follows changes in the user's interests. Specifically, the fixed past period is taken as the history window size, and the history window is divided into B boxes whose aggregation period suits the forgetting granularity required by the service; a value aggregating the intermediate results is saved for each box. The history window size is therefore the sum of the B aggregation periods. FIG. 6 shows an example of the user interest level intermediate result table that stores the intermediate results. The table holds a user ID and a concept ID, and stores intermediate results for each concept ID of each user. It contains a long-term interest box, which stores the aggregate of the intermediate results from the start of aggregation up to the beginning of the history window, and BOX1 to BOX3, which store the intermediate results aggregated per aggregation period after the history window has been divided into B boxes (three in FIG. 6). The long-term interest box and BOX1 to BOX3 are managed as a single record in the concept system / user interest score database 14. An aggregation period is expressed either relative to a given time point (for example, "from N days to N + M days ago") or in absolute terms (for example, "from month X, day Y to month Z, day W").
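A minimal sketch of the FIG. 6 record layout and the advance size estimate it enables. It assumes each box holds a fixed-size intermediate result, here a hypothetical pair of 8-byte floats (a weighted sum `*_wz` and a sum of weights `*_w`); the source does not specify the box contents, so the table and column names are illustrative only. Because every (user ID, concept ID) record has exactly B + 1 boxes no matter how much history arrives, the payload is bounded by users × concepts × (B + 1) × box size.

```python
import sqlite3


def create_intermediate_table(conn: sqlite3.Connection, num_boxes: int) -> None:
    """One row per (user_id, concept_id) holding the long-term box and BOX1..BOXB."""
    box_cols = ", ".join(
        f"box{b}_wz REAL DEFAULT 0, box{b}_w REAL DEFAULT 0"
        for b in range(1, num_boxes + 1)
    )
    conn.execute(
        "CREATE TABLE user_interest_intermediate ("
        "user_id TEXT, concept_id INTEGER, "
        "lt_wz REAL DEFAULT 0, lt_w REAL DEFAULT 0, "  # long-term interest box
        f"{box_cols}, PRIMARY KEY (user_id, concept_id))"
    )


def estimate_bytes(num_users: int, num_concepts: int, num_boxes: int,
                   bytes_per_box: int = 16) -> int:
    """Upper bound on box payload (excluding keys and DB overhead): B + 1 boxes per pair."""
    return num_users * num_concepts * (num_boxes + 1) * bytes_per_box
```

With three boxes as in FIG. 6, one million users and 1,000 concepts give `estimate_bytes(1_000_000, 1_000, 3) == 64_000_000_000` bytes of payload, a figure known before any history is collected.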

After obtaining intermediate results from the feature scores and reflecting them in the user interest level intermediate result table, the concept system update processing unit 13 calculates user interest scores from the intermediate results stored in the table's boxes and stores them in the user interest level table. When calculating a user interest score, each box is weighted according to its temporal distance from a predetermined time (for example, the current time). The intermediate result stored in the long-term interest box may also be combined into the calculation. FIG. 7 shows an example of the user interest level table, which stores a user interest score for each concept ID of each user.
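The time-distance weighting can be sketched as below, under two assumptions this passage does not state: each box stores a (Σw·Z, Σw) pair, and the score is the weighted mean of feature scores with a hypothetical exponential decay per box of age. Age 0 is the box containing the reference time; the long-term interest box is optionally folded in with its own, equally hypothetical, weight.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    sum_wz: float = 0.0  # assumed intermediate result: sum of weight * feature score
    sum_w: float = 0.0   # assumed intermediate result: sum of weights


def interest_score(boxes_by_age: list[Box], long_term: Optional[Box] = None,
                   decay: float = 0.5, long_term_weight: float = 0.1) -> float:
    """Weighted mean of feature scores; boxes further from the reference time count less."""
    num = den = 0.0
    for age, box in enumerate(boxes_by_age):  # age 0 = box containing the reference time
        d = decay ** age                      # hypothetical exponential time weight
        num += d * box.sum_wz
        den += d * box.sum_w
    if long_term is not None:
        num += long_term_weight * long_term.sum_wz
        den += long_term_weight * long_term.sum_w
    return num / den if den > 0 else 0.0
```

Dividing by the decayed weight sum keeps the score on the same scale as the feature scores, so choosing a different decay changes emphasis rather than magnitude.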

The concept system / user interest score database 14 stores the concept system table, the user interest level intermediate result table, and the user interest level table described above.

The presentation content list receiving unit 15 receives a presentation content list enumerating the contents to be presented to the user. FIG. 8 shows an example. The presentation content list contains a content ID, a concept ID / relevance list, the content body, and a content registration time. The concept ID / relevance list is set in advance for each content and is a list of pairs, each consisting of the concept ID of a concept appearing in the content and a value (relevance) indicating how strongly that concept relates to the content. The relevance takes, for example, a value from 0 to 1, with larger values indicating stronger relevance. For instance, content related to sports might be given the pairs {baseball concept ID = 1, relevance = 0.5}, {soccer concept ID = 2, relevance = 0.8}, and so on.

The content database 16 stores the presentation content list received by the presentation content list receiving unit 15 in a content table.

The content evaluation processing unit 17 calculates the user's evaluation score for each content by probabilistically combining the user interest scores of the concepts appearing in that content.

The sorted content score list transmission unit 18 transmits the content score list sorted in order of the user's evaluation scores.
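The passage does not define "probabilistic combination"; one common reading is a noisy-OR, 1 − Π(1 − p_i). The sketch below assumes that reading, with p_i = relevance × user interest score, and further assumes the interest scores have already been mapped into [0, 1] (they are clamped here). It then produces the descending-order list that unit 18 transmits. Function names are hypothetical.

```python
def evaluation_score(concepts: list[tuple[int, float]], interest: dict[int, float]) -> float:
    """Noisy-OR of relevance-weighted interest (an assumed reading of 'probabilistic combination')."""
    miss = 1.0
    for concept_id, relevance in concepts:
        p_interest = min(max(interest.get(concept_id, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        miss *= 1.0 - relevance * p_interest  # probability this concept does not "fire"
    return 1.0 - miss


def sorted_content_scores(contents: dict[str, list[tuple[int, float]]],
                          interest: dict[int, float]) -> list[tuple[str, float]]:
    """(content_id, score) pairs in descending order of evaluation score."""
    scored = [(cid, evaluation_score(pairs, interest)) for cid, pairs in contents.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Using the sports example (baseball ID 1 at relevance 0.5, soccer ID 2 at relevance 0.8) with hypothetical interest scores 0.4 and 0.9 gives 1 − 0.8 × 0.28 = 0.776.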

Next, the operation of the interest analysis apparatus in the present embodiment is described.

FIG. 9 is a flowchart showing the flow of processing by which the interest analysis apparatus in the present embodiment calculates user interest scores.

The history information receiving unit 11 receives the user ID (or client terminal ID), the list browsing content list, and the detailed browsing content list, and outputs them to the feature score calculation unit 12 (step S101).

The feature score calculation unit 12 extracts from the content database 16 the concept IDs appearing in each content of the detailed browsing content list and creates an appearance concept list (step S102). Specifically, it looks up the concept IDs linked to each content ID of the detailed browsing content list in the content table of the content database 16 and enumerates them in the appearance concept list. The appearance concept list thus enumerates every concept ID assigned to the contents in the detailed browsing content list.

Next, the feature score calculation unit 12 extracts the superordinate concepts of each concept ID in the appearance concept list and adds them to the list (step S103). Specifically, for each concept in the appearance concept list, it refers to the concept system table stored in the concept system / user interest score database 14 to extract the superordinate concepts, and adds their concept IDs to the appearance concept list.
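Steps S102 and S103 can be sketched as below, assuming the content table is available as a mapping from content ID to concept IDs and the concept system table (FIG. 4) as a mapping from concept ID to its parent concept ID list. The function name is hypothetical. This sketch collects ancestors transitively; the passage does not say whether only direct parents or all ancestors are added.

```python
def appearance_concepts(detail_content_ids: list[str],
                        content_concepts: dict[str, list[int]],
                        parents: dict[int, list[int]]) -> set[int]:
    """S102: concepts of the detail-browsed contents; S103: add their superordinate concepts."""
    found: set[int] = set()
    for cid in detail_content_ids:
        found.update(content_concepts.get(cid, []))
    stack = list(found)
    while stack:  # depth-first walk up the parent links
        concept = stack.pop()
        for parent in parents.get(concept, []):
            if parent not in found:  # each concept is visited once, even in a DAG
                found.add(parent)
                stack.append(parent)
    return found
```

Because the concept system may give a concept several parents, the `found` check keeps shared ancestors from being listed or walked twice.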

Next, the feature score calculation unit 12 counts, for each cluster ID, the occurrences of each concept in the appearance concept list and extracts the analysis parameters needed to calculate the feature scores (step S104). FIG. 10 shows an example of the analysis parameters. They are: the total number S of contents in the list browsing content list and the total number a of contents in the detailed browsing content list, both counted per cluster ID; and, counted per concept ID in the appearance concept list, the number N of contents in the list browsing content list to which the concept ID is assigned and the number n of contents in the detailed browsing content list to which the concept ID is assigned. The content counts N and n are calculated for every concept ID in the appearance concept list, including the superordinate concepts added in step S103.
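Step S104 for a single cluster can be sketched as follows. It assumes the cluster's list-browsed and detail-browsed content IDs are given as lists, and that `concepts_of` (a hypothetical helper) returns each content's concept IDs already expanded with their superordinate concepts, so that N and n are also counted for the concepts added in S103.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass
class AnalysisParams:
    S: int             # total contents in the list browsing content list
    a: int             # total contents in the detailed browsing content list
    N: dict[int, int]  # per concept ID: list-browsed contents carrying that concept
    n: dict[int, int]  # per concept ID: detail-browsed contents carrying that concept


def analysis_params(list_ids: list[str], detail_ids: list[str],
                    concepts_of: Callable[[str], set[int]],
                    target_concepts: Iterable[int]) -> AnalysisParams:
    """Count S, a, N and n for one cluster (step S104)."""
    targets = list(target_concepts)

    def count(ids: list[str]) -> dict[int, int]:
        return {c: sum(1 for cid in ids if c in concepts_of(cid)) for c in targets}

    return AnalysisParams(S=len(list_ids), a=len(detail_ids),
                          N=count(list_ids), n=count(detail_ids))
```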

The feature score calculation unit 12 then calculates a feature score Z for each concept ID using the analysis parameters S, a, N, and n (step S105). Specifically, given the total number S_j of contents in the list browsing content list and the number N_j of contents in that list to which the concept i being processed is assigned, it calculates, for the case where the a_j contents of the detailed browsing content list are selected at random, the cumulative probability H1 that the number of contents in which concept i appears is n or more and the cumulative probability H2 that it is n or less. It then obtains the feature score Z_ij from H1 and H2 using the inverse of the cumulative distribution function of the standard normal distribution.

Here, i denotes the concept ID and j the cluster ID. In the present embodiment, the cumulative probabilities H1 and H2 are obtained from the hypergeometric distribution, but other distributions, such as the binomial or normal distribution, may be used instead.
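A sketch of step S105 using only the standard library. H1 and H2 follow directly from the hypergeometric model stated above, X ~ Hypergeometric(S_j, N_j, a_j) with P(X = k) = C(N_j, k)·C(S_j − N_j, a_j − k) / C(S_j, a_j). The passage does not give the formula that turns H1 and H2 into Z_ij, so the combination here is an assumption: the mid-p value P(X < n) + P(X = n)/2, which equals (1 + H2 − H1)/2, is passed to the inverse standard normal CDF, so unusually frequent selection of a concept yields Z > 0 and unusually rare selection yields Z < 0.

```python
from math import comb
from statistics import NormalDist


def hypergeom_tails(S: int, N: int, a: int, n: int) -> tuple[float, float]:
    """Return (H1, H2) = (P(X >= n), P(X <= n)) for X ~ Hypergeometric(S, N, a)."""
    total = comb(S, a)
    # comb() returns 0 when a - k > S - N, so impossible outcomes get probability 0
    pmf = [comb(N, k) * comb(S - N, a - k) / total for k in range(min(a, N) + 1)]
    h1 = sum(pmf[n:])
    h2 = sum(pmf[: n + 1])
    return h1, h2


def feature_score(S: int, N: int, a: int, n: int) -> float:
    """Feature score via the inverse standard normal CDF (combination of H1, H2 assumed)."""
    h1, h2 = hypergeom_tails(S, N, a, n)
    p = (1.0 + h2 - h1) / 2.0             # assumed mid-p: P(X < n) + P(X = n) / 2
    p = min(max(p, 1e-12), 1.0 - 1e-12)   # keep inv_cdf finite under rounding
    return NormalDist().inv_cdf(p)
```

For a list of S = 10 titles where N = 2 carry the concept and the user opened a = 2, opening both tagged items (n = 2) gives H1 = 1/45 and a clearly positive score, while opening neither (n = 0) gives a negative one.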

The feature score calculation unit 12 then generates, for each cluster, an update target concept list consisting of sets of a concept ID, a feature score, and a weight w. The weight w is a value that depends on characteristic user operations, content browsing time, browsing conditions, and the like.

Next, the concept system update processing unit 13 updates the node value of each concept ID listed in the update target concept list (step S106).

The concept system update processing unit 13 further extracts the subordinate concepts of each concept ID listed in the update target concept list and updates their node values (step S107). The feature score used to update a subordinate concept's user interest score can be obtained, for example, by taking the feature score with the largest absolute value among the adjacent parent concepts, by using the value of the nearest parent concept, by averaging the parent concepts' values, or by combining them probabilistically. User interest scores already updated in step S106 are not updated again here.
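Step S107 can be sketched as below, assuming the update target concept list is given as a mapping from concept ID to feature score and the child concept ID lists of FIG. 4 as a mapping. The sketch goes one level down, considers only parents that are in the update target list, and implements two of the four listed options (largest absolute value and average); the nearest-parent and probabilistic-combination options are omitted. Names are hypothetical.

```python
from statistics import mean


def child_feature_score(parent_scores: list[float], method: str = "max_abs") -> float:
    """Derive a subordinate concept's feature score from its adjacent parents' scores."""
    if method == "max_abs":
        return max(parent_scores, key=abs)  # parent score with the largest magnitude
    if method == "mean":
        return mean(parent_scores)          # average of the parent scores
    raise ValueError(f"unsupported method: {method}")


def propagate_to_children(updated: dict[int, float], children: dict[int, list[int]],
                          method: str = "max_abs") -> dict[int, float]:
    """One level of S107: scores for children of updated concepts, skipping S106 concepts."""
    parent_scores: dict[int, list[float]] = {}
    for parent, z in updated.items():
        for child in children.get(parent, []):
            if child not in updated:  # scores updated in S106 are not overwritten
                parent_scores.setdefault(child, []).append(z)
    return {child: child_feature_score(zs, method) for child, zs in parent_scores.items()}
```

Keeping the sign in the `max_abs` option matters: a strongly negative parent score (a concept the user skips) should propagate as disinterest rather than be flattened to its magnitude.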

Next, the processing by which the concept system update processing unit 13 updates the user interest score of each concept is described.

In this embodiment, the concept system update processing unit 13 proceeds in three stages:

1. It calculates an intermediate result for each concept in the update target concept list.
2. It uses that result to update the intermediate result held by the update target box.
3. It updates the user interest score from the intermediate results held by the boxes.

First, the boxes that store intermediate results are described. In this embodiment, time is divided into fixed aggregation periods measured from a reference point (for example, 00:00:00 on January 1, 1970). Intermediate results are aggregated per aggregation period, and each aggregate is stored in the box prepared for that period. The number of boxes is fixed in advance by the history window size and the length of the aggregation period.

At a given time (for example, the processing time), a box's aggregation period may no longer fall within the history window. When that happens, the intermediate result stored in the box is reflected in the long-term interest box. The box is then initialized and reused for the aggregation period that has newly entered the history window.
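
Because the number of boxes depends only on the history window and the aggregation period, the box for a given time can be found by simple modular arithmetic. The per-concept storage is also known before any data arrives. The sketch below assumes the following, none of which is stated in this section:

- times are integer seconds since the Unix epoch;
- the window is covered by whole aggregation periods;
- boxes are assigned round-robin;
- each intermediate result is a pair of 8-byte floats.

`BoxLayout` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class BoxLayout:
    """Box layout derived from the history window and aggregation period (sketch).

    Times are integer seconds since the reference point (here the Unix epoch,
    1970-01-01 00:00:00). All names are illustrative assumptions.
    """
    window: int  # history window size, seconds
    period: int  # aggregation period length, seconds

    @property
    def num_boxes(self) -> int:
        # Assumption: the window is covered by whole aggregation periods.
        return ceil(self.window / self.period)

    def box_number(self, t: int) -> int:
        """1-based box number (BOX1..BOXn) that aggregates time t (round-robin)."""
        return (t // self.period) % self.num_boxes + 1

    def bytes_per_concept(self, bytes_per_value: int = 8) -> int:
        """Advance storage estimate: X and Y for every box plus the long-term box."""
        return (self.num_boxes + 1) * 2 * bytes_per_value
```

With a three-day window and one-day periods there are three boxes. Day 3 wraps back to BOX1, which mirrors the reuse of BOX1 in FIG. 11(c).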

FIG. 11 shows the history window and the boxes, with time on the horizontal axis. The boxes in the figure correspond to the long-term interest box and to BOX1 through BOX3 in the user interest intermediate result table shown in FIG. 6.

At the time shown in FIG. 11(a), the processing time at which the user interest score is updated is assumed to fall within the aggregation period of BOX3. If data is input and the user interest score is updated at this time, the intermediate result of BOX3 is updated with the newly calculated intermediate result.

Later, as shown in FIG. 11(b), the aggregation period of BOX1 no longer falls within the history window at the processing time. The intermediate result stored in BOX1 is then reflected in the long-term interest box, and, as shown in FIG. 11(c), BOX1 is reused as the new box following BOX3.

Next, the process of updating the intermediate results stored in the boxes is described. FIG. 12 is a flowchart of this process.

First, for each concept ID in the update target concept list, the concept system update processing unit 13 calculates the intermediate results tmpX and tmpY using the following equation (step S200).

Here, i denotes the concept ID, n the update count, Z the feature score, and w the weight. The update count n indicates how many times the concept system update process has been performed. The series of processes that yields the user interest score is carried out per cluster ID, and each per-cluster-ID series counts as one update.

In Patent Document 1, the user interest score was calculated directly from the intermediate results tmpX and tmpY. In this embodiment, tmpX and tmpY are first aggregated per box by the process below, and only then is the user interest score calculated.
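
The equation for tmpX and tmpY is not reproduced in this text. One reading fits both the variables listed (feature score Z, weight w) and the later per-box summation: a weighted Stouffer combination. Under it, tmpX = w·Z and tmpY = w², and the score is the sum of tmpX divided by the square root of the sum of tmpY. The sketch below assumes exactly that, which this section does not confirm. The function names are hypothetical.

```python
from math import sqrt


def intermediate(z: float, w: float) -> tuple:
    """Per-update intermediate results (tmpX, tmpY) for one concept.

    Assumption: weighted Stouffer combination, score = sum(w*Z) / sqrt(sum(w**2)).
    Both terms are additive, which is what allows them to be summed per box.
    """
    return w * z, w * w


def combined_score(updates: list) -> float:
    """Combine a sequence of (Z, w) updates into one score (same assumption)."""
    x = y = 0.0
    for z, w in updates:
        tmp_x, tmp_y = intermediate(z, w)
        x += tmp_x
        y += tmp_y
    return x / sqrt(y) if y > 0 else 0.0
```

Under this reading, the per-box totals X_BOXi and Y_BOXi are simply running sums of tmpX and tmpY.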

Next, the number of the box to be updated is determined from the current time (step S201). For example, the box number is 3 in FIG. 11(a) and 1 in FIGS. 11(b) and 11(c).

The variable i, which indicates the box number, is then initialized (step S202). In the example of FIG. 11 the boxes are numbered 1 to 3, so i is initialized to 1.

Next, it is determined whether BOXi, the box indicated by the variable i, lies within the history window (step S203). For example, with i = 1, BOX1 lies within the history window in FIGS. 11(a) and 11(c) but outside it in FIG. 11(b). A box falls outside the history window when time passes and the current time moves into the aggregation period of a new box. To check whether BOXi is outside the history window, determine whether its last update date lies outside the window, that is, before the start of the window.

If BOXi lies within the history window (Yes in step S203), it is determined whether BOXi is the update target (step S204). For example, with i = 1, BOX1 is the update target in FIG. 11(c).

If BOXi is not the update target (No in step S204), i is incremented and processing moves to the next box (steps S212 and S213). In other words, a box that lies within the history window but is not the update target is left unchanged.

If BOXi is the update target (Yes in step S204), its intermediate results are updated with tmpX and tmpY (step S205). Specifically, the intermediate results X_BOXi and Y_BOXi held by BOXi are updated by the following equation.

The last update date of BOXi is then set to the current time (step S211), i is incremented, and processing moves to the next box (steps S212 and S213).

If BOXi lies outside the history window (No in step S203), it is likewise determined whether BOXi is the update target (step S206).

If BOXi is not the update target (No in step S206), its intermediate results are reflected in the long-term interest box (step S207) and then reset to 0 (step S208).

If BOXi is the update target (Yes in step S206), its intermediate results are reflected in the long-term interest box (step S209) and then replaced with tmpX and tmpY (step S210).

When BOXi lies outside the history window, that is, when it no longer belongs to the history window, its intermediate results are reflected in the long-term interest box and the box is reused as a new box, as shown in FIG. 11(b). Let LongX and LongY denote the intermediate results held by the long-term interest box. The long-term interest box is then updated by the following equation.

The reset to 0 in step S208 occurs when at least one aggregation period has elapsed since the previous processing time, so that several boxes leave the history window at once. The reset means that the box holds no intermediate result, because no processing took place during its aggregation period.

Replacing the intermediate results of BOXi with tmpX and tmpY in step S210 means that BOXi is reused to store the intermediate results of the aggregation period that has newly entered the history window.

After the intermediate results of BOXi are set in step S208 or S210, the last update date of BOXi is set to the current time (step S211) and i is incremented (step S212). If not all boxes have been processed (No in step S213), processing returns to step S203 for the next box.

Once all boxes have been processed (Yes in step S213), the intermediate-result update ends and processing moves on to updating the user interest score.
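
The flowchart of FIG. 12 (steps S201 to S213) can be sketched as a single loop over the boxes of one concept's record. The record groups the long-term box with BOX1 through BOXn, in the spirit of the single-record layout described later. This section does not confirm the following assumptions:

- the history window equals len(boxes) aggregation periods;
- the "last update date" test is done on period indices;
- the missing update equations are plain additions (X_BOXi += tmpX, LongX += X_BOXi, and likewise for Y).

`Box`, `InterestRecord`, and `update_boxes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """One aggregation box: summed intermediate results X, Y for a period."""
    x: float = 0.0
    y: float = 0.0
    last_period: Optional[int] = None  # period index of the last update; None = never


@dataclass
class InterestRecord:
    """Long-term box plus BOX1..BOXn for one concept, kept as one record."""
    boxes: List[Box]
    long_x: float = 0.0
    long_y: float = 0.0


def update_boxes(rec: InterestRecord, now: int, period: int,
                 tmp_x: float, tmp_y: float) -> None:
    """Apply steps S201-S213 of FIG. 12 to one concept's record (sketch)."""
    n = len(rec.boxes)      # history window = n aggregation periods (assumed)
    cur = now // period     # index of the current aggregation period
    target = cur % n        # S201: 0-based index of the update target box
    for i, box in enumerate(rec.boxes):               # S202, S212, S213
        in_window = (box.last_period is not None
                     and box.last_period > cur - n)   # S203
        if in_window:
            if i != target:
                continue                              # S204 No: leave unchanged
            box.x += tmp_x                            # S205 (assumed additive)
            box.y += tmp_y
        else:
            rec.long_x += box.x                       # S207 / S209 (assumed additive)
            rec.long_y += box.y
            if i == target:
                box.x, box.y = tmp_x, tmp_y           # S210: reuse for the new period
            else:
                box.x = box.y = 0.0                   # S208: nothing aggregated yet
        box.last_period = cur                         # S211
```

Call it with one-day periods on day 2, day 3, and then day 5. By day 5, the box last written on day 2 has left the three-day window. Its totals move to the long-term box and it is reused for day 5, which is the rotation shown in FIGS. 11(b) and 11(c).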

Next, the process of updating the user interest score is described.

After updating the intermediate results held by the boxes, the concept system update processing unit 13 updates the user interest score Z by the following equation. The equation applies two kinds of weight to the boxes' intermediate results: w_t1, w_t2, ..., w_tn, which are set according to each box's temporal distance from a predetermined time point, and the long-term interest weight w_l.

Here, the intermediate results Xa, Xb, ..., Xn and Ya, Yb, ..., Yn are the boxes' intermediate results, assigned in order from the most recent box. For example, in FIG. 11(a), Xa = X_BOX3, Ya = Y_BOX3, Xb = X_BOX2, Yb = Y_BOX2, Xc = X_BOX1, and Yc = Y_BOX1. If the long-term interest box is not used, shortZ is taken as the user interest score Z.
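
The score equation itself is not reproduced in this text. Suppose each box holds sums of w·Z and w², as in a weighted Stouffer combination. Then a natural way to apply an extra time weight a_k to box k is to multiply every w in that box by a_k. That gives Z = Σ a_k·X_k / sqrt(Σ a_k²·Y_k), with the long-term box added as one more term weighted by w_l. The sketch below assumes this form; it is not confirmed by the source. The boxes are passed most recent first, for example [BOX3, BOX2, BOX1] in FIG. 11(a). `interest_score` is a hypothetical name.

```python
from math import sqrt
from typing import Optional


def interest_score(boxes_recent_first: list, time_weights: list,
                   long_box: Optional[tuple] = None,
                   long_weight: float = 0.0) -> float:
    """Time-weighted user interest score (assumed weighted-Stouffer form).

    boxes_recent_first: [(Xa, Ya), (Xb, Yb), ...], most recent box first
    time_weights:       [w_t1, w_t2, ...], one per box
    long_box:           (LongX, LongY), or None to return shortZ
    """
    if len(boxes_recent_first) != len(time_weights):
        raise ValueError("need one time weight per box")
    num = sum(a * x for a, (x, _) in zip(time_weights, boxes_recent_first))
    den = sum(a * a * y for a, (_, y) in zip(time_weights, boxes_recent_first))
    if long_box is not None:
        long_x, long_y = long_box
        num += long_weight * long_x
        den += long_weight ** 2 * long_y
    return num / sqrt(den) if den > 0 else 0.0
```

Setting a weight to 0 drops that box entirely. Decreasing weights for older boxes emphasize recent interest.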

Next, the content evaluation process is described.

In response to a request, the content evaluation processing unit 17 calculates an evaluation score for each content and generates a sorted content score list, in which the contents are arranged in descending order of evaluation score.

When a concept ID to be analyzed is specified, only the subordinate concepts of that concept ID in the concept system are evaluated. A filtered content list is then generated, containing the contents linked to those concept IDs.
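
The filtering step can be sketched as a breadth-first walk down the concept system followed by a filter on each content's concept links. The sketch makes three assumptions:

- the concept system is a parent-to-children mapping, possibly with shared children;
- the specified concept itself is excluded, following the text literally;
- a content's links to non-subordinate concepts are dropped, so that only subordinate concepts are evaluated.

All names are hypothetical.

```python
from collections import deque


def subordinate_concepts(children: dict, root: str) -> set:
    """All concepts below `root` in the concept system (handles shared children)."""
    seen = set()
    queue = deque(children.get(root, []))
    while queue:
        concept = queue.popleft()
        if concept not in seen:
            seen.add(concept)
            queue.extend(children.get(concept, []))
    return seen


def filter_contents(contents: dict, children: dict, root: str) -> dict:
    """Keep contents linked to a subordinate concept, with only those links.

    contents maps content ID -> {concept ID: degree of association}.
    """
    targets = subordinate_concepts(children, root)
    filtered = {}
    for content_id, links in contents.items():
        kept = {c: w for c, w in links.items() if c in targets}
        if kept:
            filtered[content_id] = kept
    return filtered
```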

The evaluation score EntityZ_x of each content x in the filtered content list is then calculated using the following equation.

Here, Z_i denotes the user interest score of concept i, w_i the degree of association between content x and concept i, and p the set of concept IDs assigned to content x.
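
The equation for EntityZ_x is not reproduced here. The sketch below assumes the simplest form consistent with the listed variables, a weighted sum Σ_{i∈p} w_i·Z_i. It then builds the sorted content score list in descending order. Concepts without a user interest score are assumed to contribute 0. Both function names are hypothetical.

```python
def evaluation_score(links: dict, interest: dict) -> float:
    """EntityZ_x for one content x (assumed form: sum over i in p of w_i * Z_i).

    links maps concept ID i -> w_i; interest maps concept ID i -> Z_i.
    """
    return sum(w * interest.get(i, 0.0) for i, w in links.items())


def sorted_content_scores(contents: dict, interest: dict) -> list:
    """Sorted content score list: (content ID, score) in descending score order."""
    scores = [(cid, evaluation_score(links, interest)) for cid, links in contents.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```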

The sorted content score list transmission unit 18 then transmits the sorted content score list.

As described above, this embodiment aggregates the intermediate results produced while calculating the user interest score per predetermined aggregation period and stores them in boxes. The user interest score is then calculated from the intermediate results stored in the boxes that fall within the history window. When only the history window is analyzed, or when results are weighted by time, this reduces the amount of intermediate-result data that must be stored and makes it possible to estimate that amount in advance.

In addition, this embodiment manages the long-term interest box and BOX1 through BOX3 as a single record in the concept system / user interest score database 14, which reduces the number of database reads and updates.

Description of Symbols
1 ... Interest analysis apparatus
11 ... History information receiving unit
12 ... Feature score calculation unit
13 ... Concept system update processing unit
14 ... Concept system / user interest score database
15 ... Presented content list receiving unit
16 ... Content database
17 ... Content evaluation processing unit
18 ... Sorted content score list transmission unit

Claims

An interest analysis method, executed by a computer, for obtaining a user interest score indicating a user's interest in a concept assigned to content, the method comprising:
receiving a first content list of a plurality of contents browsed as a list, and a second content list of contents selected from the first content list and browsed;
calculating a first probability and a second probability, and calculating a feature score from the first probability and the second probability using the inverse of the cumulative distribution function of the standard normal distribution, wherein: the total number of contents in the first content list is a first total number; the number of contents in the first content list in which the concept to be processed appears is a first appearance count; the total number of contents in the second content list is a second total number; the number of contents in the second content list in which the concept appears is a second appearance count; and, given the first total number, the first appearance count, and the second total number, the first probability is the probability that the number of contents in the second content list in which the concept appears is greater than or equal to the second appearance count, and the second probability is the probability that it is less than or equal to the second appearance count;
calculating an intermediate result for a predetermined aggregation period using the feature score and a weight for the feature score;
storing the intermediate result for each aggregation period; and
weighting each aggregation period according to its temporal distance from a predetermined time point, and calculating the user interest score using the intermediate results weighted for each aggregation period.

The interest analysis method according to claim 1, wherein intermediate results older than a predetermined fixed period are aggregated and stored as a long-term interest intermediate result, and the user interest score is calculated by weighting and combining the long-term interest intermediate result with the intermediate results aggregated for each aggregation period within the predetermined fixed period.

The interest analysis method according to claim 2, wherein the following are prepared in advance: a long-term interest storage area that stores the long-term interest intermediate result, and, equal in number to the set number of aggregation periods, storage areas that each store the result of aggregating the intermediate results within their aggregation period; and
wherein the step of storing the intermediate result stores the aggregated intermediate result in the corresponding storage area, and, when the aggregation period corresponding to a storage area no longer belongs to the predetermined fixed period at a predetermined time, reflects the intermediate result stored in that storage area in the long-term interest storage area, initializes the storage area, and uses it as the storage area for the aggregation period that newly belongs to the predetermined fixed period.

The interest analysis method according to claim 3, wherein the long-term interest storage area and the storage area are managed as one record in a database.

An interest analysis apparatus for obtaining a user interest score indicating a user's interest in a concept assigned to content, the apparatus comprising:
receiving means for receiving a first content list of a plurality of contents browsed as a list, and a second content list of contents selected from the first content list and browsed;
calculating means for calculating a first probability and a second probability, and for calculating a feature score from the first probability and the second probability using the inverse of the cumulative distribution function of the standard normal distribution, wherein: the total number of contents in the first content list is a first total number; the number of contents in the first content list in which the concept to be processed appears is a first appearance count; the total number of contents in the second content list is a second total number; the number of contents in the second content list in which the concept appears is a second appearance count; and, given the first total number, the first appearance count, and the second total number, the first probability is the probability that the number of contents in the second content list in which the concept appears is greater than or equal to the second appearance count, and the second probability is the probability that it is less than or equal to the second appearance count;
intermediate result update means for calculating an intermediate result for a predetermined aggregation period using the feature score and a weight for the feature score, and storing the intermediate result for each aggregation period; and
interest score calculation means for weighting each aggregation period according to its temporal distance from a predetermined time point, and calculating the user interest score using the intermediate results weighted for each aggregation period.

An interest analysis program for causing a computer to function as each means of the interest analysis apparatus according to claim 5.