JP2009223844A

JP2009223844A - Related information search apparatus, related information search method, related information search program, and recording medium with this program recorded

Info

Publication number: JP2009223844A
Application number: JP2008070568A
Authority: JP
Inventors: Noriaki Kawamae; 徳章川前; Takeshi Yamada; 武士山田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-03-19
Filing date: 2008-03-19
Publication date: 2009-10-01

Abstract

PROBLEM TO BE SOLVED: To provide a search technique for implementing search reflecting users' preferences. SOLUTION: A related information search apparatus 100 comprises a history data processing part 130 for generating, from target users' history data, user-specific session data represented by user ID, item ID and access time and computing, from the session data, the number of sessions including a given item, the number of sessions including an ordered item pair and the number of sessions including two given items as a pair, a computational processing part 140 for calculating parameters using the session numbers as indices and computing a degree of association between items using the indices, an input processing part 120 for receiving a search condition, a scoring part 150 for generating a ranking list including related items related to an item indicated by the search condition according to the degree of association computed about the item indicated by the search condition, and an output processing part 160 for outputting the ranking list. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a search technique, and in particular to a search technique that uses the results of analyzing history data, which records the history of users' actions of accessing objects such as items, to search for related objects associated with the object given as a search condition.

Conventionally, online stores on the Internet and similar services have used so-called personalization, a mechanism that identifies a user (a viewer or purchaser) and changes the services and content provided to that user according to the user's behavior in accessing objects such as items. For example, a method has been proposed that personalizes the ranking of search results based on the relationships between one user and other users (inter-user relationships) and on the history of users' actions of accessing items (see Non-Patent Document 1).

The method described in Non-Patent Document 1 analyzes, in chronological order, each event in which a user accesses an item. For example, if user A accesses item 1 and then item 2, and at a later time user B also accesses item 1 and then item 2, user B's behavior can be regarded as following user A's behavior. Alternatively, user A can be regarded as influencing user B's behavior. In this case, a user's behavior can be predicted using information about users who have similar action histories. According to the technique described in Non-Patent Document 1, the music or video that each user will purchase can be predicted from users' purchase histories in online music download and video viewing services.

Conventional techniques also use the text data contained in an item (hereinafter simply "text"), its metadata, and the like to define the relevance between items. As a result, by using a criterion such as which category or genre (for example, "economy" or "science") an item belongs to, conventional techniques can define the relationship between two items as an explicit relationship.

In addition, when analyzing user histories, conventional techniques often extract information indicating the relationship between items based on the "number of co-occurring users". For example, if the users who accessed item 1 are "user A" and "user B", the "number of co-occurring users" for item 1 is 2. If the "number of co-occurring users" is obtained in the same way for each item, items whose "numbers of co-occurring users" are similar are considered to be closely related.

Non-Patent Document 1: Noriaki Kawamae, Takeshi Yamada, Naonori Ueda, "Proposal of Personalization Method by Discovery of Relative Innovator", Information Science and Technology Letters, 2007, vol. 6, pp. 99-102

When a search using users' action histories is performed to find items (objects) to recommend to a user, and related items are output as search results for an item input as a search condition, there is a demand for diverse searches that flexibly accommodate the purpose of the person performing the search. For example, there is a need for searches that reflect subjective criteria (user preferences) such as "relaxing" or "interesting". Whether a given movie is felt to be "interesting" differs from user to user, but among users who form a group with similar tastes and preferences, the movies they find "interesting" are likely to be similar. In this case, if "movie A", purchased by one user in such a group, is recommended as a movie related to "movie B" to another user who has purchased "movie B" but has not yet seen "movie A", the likelihood of a purchase is expected to increase.

However, conventional techniques define the relevance of items by the text contained in the items, their metadata, and the like. Because this definition of relevance is different in kind, conventional techniques cannot be applied to searches that reflect user preferences. That is, conventional techniques can define the relevance of items as explicit relationships such as "economy" or "science", but cannot define implicit relationships such as "relaxing" or "interesting" that reflect user preferences. Consequently, even if a user tries to search for an item to recommend, for example a "relaxing" movie, the intended search results cannot be obtained. Furthermore, with conventional techniques, if an item has no accompanying text or metadata, calculating relevance becomes difficult.

Moreover, when analyzing user histories, conventional techniques often extract information on the relationships between items based on the "number of co-occurring users", and have not given sufficient consideration to analyzing the relationships between items in chronological order. As a result, the related items returned as search results do not reflect how the "number of users accessing an item" changes over time. There is therefore a demand for a search that reflects user preferences while taking time-series changes into account when extracting information on the relationships between items.

The present invention has been made in view of the above problems, and its object is to provide a search technique capable of performing a search that reflects user preferences while taking into account time-series changes in the relationships between objects.

To solve the above problem, a related information search apparatus according to the present invention searches for related information as follows: based on indices that indicate objects and relationships between objects, obtained by analyzing in advance history data indicating the history of users' actions of accessing objects, and using at least one of the analyzed objects as a search condition, it searches the other analyzed objects for related objects associated with the search condition. The apparatus comprises: history data processing means for generating, from the target history data, individual session data for each user, represented by a set of user identification information, object identification information, and access time information indicating the time at which the user performed the action of accessing the object, and for obtaining from the generated session data, as statistics of the history data, the number of sessions in which a given single object appears, the number of sessions in which two given objects appear in a specified access order as an object pair, and the number of sessions that include a given object as one member of an object pair; calculation processing means for calculating, as the indices, parameters that use the session counts obtained as statistics of the history data, and for calculating the degree of association between any two objects based on the calculated indices; input processing means for receiving the search condition; scoring means for generating, based on the degree of association calculated in advance for the object indicated by the received search condition, a ranking list that includes at least the related object most closely associated with that object among the other analyzed objects; and output processing means for outputting the ranking list.

Also to solve the above problem, a related information search method according to the present invention is a method performed by a related information search apparatus that, based on indices indicating objects and relationships between objects obtained by analyzing in advance history data indicating the history of users' actions of accessing objects, and using at least one of the analyzed objects as a search condition, searches the other analyzed objects for related objects associated with the search condition as related information. The related information search apparatus comprises an input processing unit, a history data processing unit, a calculation processing unit, a scoring unit, and an output processing unit. The history data processing means executes the steps of: receiving the target history data; generating, from the received history data, individual session data for each user, represented by a set of user identification information, object identification information, and access time information indicating the time at which the user performed the action of accessing the object; and obtaining from the generated session data, as statistics of the history data, the number of sessions in which a given single object appears, the number of sessions in which two given objects appear in a specified access order as an object pair, and the number of sessions that include a given object as one member of an object pair. The calculation processing means executes the steps of: calculating, as the indices, parameters that use the session counts obtained as statistics of the history data; and calculating the degree of association between any two objects based on the calculated indices. The input processing means executes the step of receiving the search condition. The scoring means executes the step of generating, based on the degree of association calculated in advance for the object indicated by the received search condition, a ranking list that includes at least the related object most closely associated with that object among the other analyzed objects. The output processing means executes the step of outputting the ranking list.

With the related information search apparatus configured in this way, or the related information search method following this procedure, the apparatus generates, before executing a search, session data for each user from the target history data, the session data including user identification information, object identification information, and access time information. From the session data it obtains statistics of the history data, for example the number of sessions in which two given objects appear in a specified access order as an object pair. The apparatus then calculates, as indices, parameters that use the session counts obtained as statistics of the history data, and calculates in advance the degree of association between any two objects based on the calculated indices. Here, an object includes, for example, digitized information, real-world information, content, and items, and an access includes, for example, operation information such as login, browsing, download, and purchase. The access condition indicates, for example, the reference range and reference direction of the target object pair, a threshold on the interval between access times, and a threshold on the interval in access order. Accordingly, unlike conventional techniques, the related information search apparatus obtains the number of sessions containing objects and object pairs, and the degree of association between objects based on those counts, and can thereby analyze the user identification information, object identification information, and access time information contained in the session data in chronological order.
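The session statistics described above can be sketched as follows. This is an illustrative sketch only, not the implementation of the invention. The sample records, the function names `build_sessions` and `session_counts`, the use of one session per user, and the reading of "sessions that include a given object as one member of an object pair" as "sessions in which the object appears together with at least one other object" are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical history records: (user_id, item_id, access_time).
history = [
    ("u1", "A", 1), ("u1", "B", 5), ("u1", "C", 9),
    ("u2", "A", 2), ("u2", "C", 3),
    ("u3", "B", 4), ("u3", "A", 8),
]

def build_sessions(records):
    """Group records by user and order each user's items by access time.

    Assumption: each user has exactly one session.
    """
    events = defaultdict(list)
    for user, item, t in records:
        events[user].append((t, item))
    return {u: [item for _, item in sorted(ev)] for u, ev in events.items()}

def session_counts(sessions):
    """Count sessions per item, per ordered item pair, and per pair member."""
    single = defaultdict(int)   # sessions in which item i appears
    ordered = defaultdict(int)  # sessions in which i is first accessed before j
    paired = defaultdict(int)   # sessions in which i appears with another item
    for items in sessions.values():
        # Unique items in order of first access.
        first_seen = list(dict.fromkeys(items))
        for i in first_seen:
            single[i] += 1
            if len(first_seen) > 1:
                paired[i] += 1
        # combinations() preserves input order, so i always precedes j.
        for i, j in combinations(first_seen, 2):
            ordered[(i, j)] += 1
    return single, ordered, paired

sessions = build_sessions(history)
single, ordered, paired = session_counts(sessions)
print(single["A"], ordered[("A", "C")], ordered[("B", "A")])  # 3 2 1
```

Keeping the ordered pair counts separate for (A, B) and (B, A) is what lets later steps distinguish the direction of a transition, which a plain co-occurrence count cannot do.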

When executing a search, the related information search apparatus generates and outputs a ranking list that includes at least the related object most closely associated with the object indicated by the received search condition, chosen from among the other objects based on the degree of association calculated in advance for that object. This ranking list is generated from degrees of association between objects that are defined in terms of session counts. Because a degree of association defined in this way does not rely on text or its metadata, the relationship between two items can be defined as an implicit relationship. The related information search apparatus therefore enables a search that reflects user preferences while taking into account time-series changes in the relationships between objects. As a result, related objects that reflect user preferences can be recommended to the user.
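The ranking step can be sketched as follows, assuming the degrees of association have already been calculated and stored per ordered pair. The function name `rank_related`, the `top_k` parameter, the tie-breaking rule, and the numeric values are hypothetical and chosen only for illustration; this passage does not give the formula for the degree of association, so none is assumed here.

```python
def rank_related(query_item, association, top_k=3):
    """Return up to top_k (item, score) pairs for query_item, best first.

    association maps (source_item, other_item) to a precomputed score.
    """
    candidates = [
        (other, score)
        for (source, other), score in association.items()
        if source == query_item and other != query_item
    ]
    # Highest score first; ties broken by item ID so output is deterministic.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:top_k]

# Hypothetical precomputed degrees of association.
association = {
    ("A", "B"): 0.2,
    ("A", "C"): 0.7,
    ("A", "D"): 0.7,
    ("B", "A"): 0.9,
}

ranking = rank_related("A", association)
print(ranking)  # [('C', 0.7), ('D', 0.7), ('B', 0.2)]
```

Because all scores are computed before the search, the search itself reduces to a lookup and a sort over the candidates for the queried item.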

In the related information search apparatus according to the present invention, it is preferable that the history data processing means comprises time statistic calculation means for obtaining, as a time statistic, the mean of the time representing the difference between the access times of any two objects, from the session counts obtained as statistics of the history data and from the individual session data; and that the calculation processing means comprises time average calculation means for calculating, as the index, a parameter indicating the time average based on the mean time obtained as the time statistic, and second degree-of-association calculation means for calculating the degree of association between any two objects based on the calculated parameter indicating the time average.

In the related information search method according to the present invention, it is preferable that the history data processing means comprises time statistic calculation means, and that the time statistic calculation means executes the step of obtaining, as a time statistic, the mean of the time representing the difference between the access times of any two objects, from the session counts obtained as statistics of the history data and from the individual session data; and that the calculation processing means comprises time average calculation means and second degree-of-association calculation means, the time average calculation means executing the step of calculating, as the index used at search time, a parameter indicating the time average based on the mean time obtained as the time statistic, and the second degree-of-association calculation means executing the step of calculating the degree of association between any two objects based on the calculated parameter indicating the time average.

With the related information search apparatus configured in this way, or the related information search method following this procedure, the apparatus obtains, before executing a search, the mean of the time representing the difference between the access times of any two objects as a time statistic, using the session data generated in advance and the session counts obtained in advance. It calculates a parameter indicating the time average as an index, and then calculates each degree of association based on this parameter. Therefore, when executing a search, the related information search apparatus can generate a ranking list based on the degree of association calculated in advance from the parameter indicating the time average for the object indicated by the received search condition. In this way, by obtaining the mean difference between the access times of any two objects in addition to the session counts, the related information search apparatus can take into account time-series changes in the relationships between objects. In addition, from the parameter indicating the time average and the session counts, the number of users accessing an item can be predicted, for example the cumulative number of users who will have accessed it by an arbitrary point in time, or the cumulative number of accessing users at each time.
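A minimal sketch of the time statistic is given below: for each ordered pair (i, j), it averages the difference between the first access time of j and the first access time of i over the sessions in which i is accessed before j. The function name `mean_time_gaps`, the sample data, the use of first accesses only, and the time unit are assumptions for this example; how the resulting mean is turned into the parameter and the degree of association is not shown here.

```python
from collections import defaultdict

def mean_time_gaps(sessions):
    """sessions: {user: [(item, access_time), ...]}.

    Returns {(i, j): mean of time(j) - time(i)} over sessions in which
    item i is first accessed before item j.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for events in sessions.values():
        first_time = {}
        # Record each item's first access time, scanning in time order.
        for item, t in sorted(events, key=lambda e: e[1]):
            first_time.setdefault(item, t)
        items = list(first_time)  # insertion order equals first-access order
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                i, j = items[a], items[b]
                total[(i, j)] += first_time[j] - first_time[i]
                count[(i, j)] += 1
    return {pair: total[pair] / count[pair] for pair in total}

# Hypothetical sessions with access times in arbitrary units.
sessions = {
    "u1": [("A", 1), ("B", 5), ("C", 9)],
    "u2": [("A", 2), ("C", 3)],
    "u3": [("B", 4), ("A", 8)],
}

gaps = mean_time_gaps(sessions)
print(gaps[("A", "C")])  # 4.5
```

The denominator for each pair is exactly the ordered-pair session count, which is why the text derives this statistic from both the session counts and the individual session data.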

In the related information search apparatus according to the present invention, it is also preferable that the history data processing means comprises object count statistic calculation means for obtaining, as an object count statistic, the mean number of mutually distinct objects that a user accessed between accessing one member of the object pair and accessing the other, from the session counts obtained as statistics of the history data and from the individual session data; and that the calculation processing means comprises distinct object count calculation means for calculating, as the index, a parameter indicating the mean number of distinct objects based on the mean object count obtained as the object count statistic, and third degree-of-association calculation means for calculating the degree of association between any two objects based on the calculated parameter indicating the mean number of distinct objects.

In the related information search method according to the present invention, it is also preferable that the history data processing means comprises object count statistic calculation means, and that the object count statistic calculation means executes the step of obtaining, as an object count statistic, the mean number of mutually distinct objects that a user accessed between accessing one member of the object pair and accessing the other, from the session counts obtained as statistics of the history data and from the individual session data; and that the calculation processing means comprises distinct object count calculation means and third degree-of-association calculation means, the distinct object count calculation means executing the step of calculating, as the index used at search time, a parameter indicating the mean number of distinct objects based on the mean object count obtained as the object count statistic, and the third degree-of-association calculation means executing the step of calculating the degree of association between any two objects based on the calculated parameter indicating the mean number of distinct objects.

With the related information search apparatus configured in this way, or the related information search method following this procedure, the apparatus obtains, before executing a search, the mean number of mutually distinct objects accessed before the transition between the members of an object pair occurs, as an object count statistic, using the session data generated in advance and the session counts obtained in advance. It calculates a parameter indicating the mean number of distinct objects as an index, and then calculates each degree of association based on this parameter. Therefore, when executing a search, the related information search apparatus can generate a ranking list based on the degree of association calculated in advance from the parameter indicating the mean number of distinct objects for the object indicated by the received search condition. In this way, by obtaining the mean number of mutually distinct objects in addition to the session counts, the related information search apparatus can take into account time-series changes in the relationships between objects. In addition, from the parameter indicating the mean number of distinct objects and the session counts, the number of users accessing an item can be predicted, for example the cumulative number of users who will have accessed it by an arbitrary point in time, or the cumulative number of accessing users at each time.
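The object count statistic can be sketched in the same way. For each ordered pair (i, j), the sketch below counts the distinct items accessed strictly between the first access to i and the first access to j, then averages over the sessions in which i precedes j. The function name `mean_distinct_between`, the sample sessions, and the choice to exclude i and j themselves from the count are assumptions for this example.

```python
from collections import defaultdict

def mean_distinct_between(sessions):
    """sessions: {user: [item, ...]} with items listed in access order.

    Returns {(i, j): mean number of distinct other items accessed between
    the first access to i and the first access to j}.
    """
    total = defaultdict(int)
    count = defaultdict(int)
    for items in sessions.values():
        first_pos = {}
        for pos, item in enumerate(items):
            first_pos.setdefault(item, pos)
        ordered = list(first_pos)  # items in order of first access
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                i, j = ordered[a], ordered[b]
                # Items strictly between the two first accesses, minus i and j.
                between = set(items[first_pos[i] + 1:first_pos[j]]) - {i, j}
                total[(i, j)] += len(between)
                count[(i, j)] += 1
    return {pair: total[pair] / count[pair] for pair in total}

# Hypothetical sessions; repeated accesses to "B" count as one distinct item.
sessions = {
    "u1": ["A", "B", "B", "C", "D"],
    "u2": ["A", "D"],
    "u3": ["A", "C", "D"],
}

means = mean_distinct_between(sessions)
print(means[("A", "D")])  # 1.0
```

Unlike the time statistic, this measure depends only on access order, so it can be computed even when timestamps are coarse or unreliable.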

A related information search program according to the present invention causes a computer to realize the functions of the related information search apparatus described above. With this configuration, a computer on which the program is installed can realize each function based on the program.

A computer-readable recording medium according to the present invention has the related information search program described above recorded on it. With this configuration, a computer in which the recording medium is loaded can realize each function based on the program recorded on the medium.

According to the present invention, the related information search apparatus can perform a search that reflects user preferences, and can therefore recommend to the user related objects that reflect those preferences. In addition, because the related information search apparatus enables a search that reflects user preferences while taking into account time-series changes in the relationships between objects, it enables personalization that accounts for time-series changes in the degree of association between objects.

The best mode for carrying out the related information search apparatus and search processing method of the present invention (hereinafter, the "embodiment") is described in detail below with reference to the drawings.

[Overview of the configuration of the related information search apparatus]
FIG. 1 is a configuration diagram schematically showing a related information search apparatus according to an embodiment of the present invention.
The related information search apparatus 100 analyzes in advance, for each user, history data indicating the history of the user's actions of accessing objects, and obtains search-time indices that indicate objects and the relationships between objects. Based on the obtained indices, and using at least one of the analyzed objects as a search condition, the related information search apparatus 100 searches the other analyzed objects for related objects associated with the search condition, as related information. Here, an object includes, for example, digitized information, real-world information, content, and items, and an access includes, for example, operation information such as login, browsing, download, and purchase. In the following, objects are described as items. In the present embodiment, the related information search apparatus 100 is described as being incorporated in the system of a search service such as a search engine or an EC site (a site that conducts electronic commerce).

The related information search apparatus 100 comprises, for example, a computer having a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), an input/output interface, and the like, together with a program that is stored on an HDD (Hard Disk Drive) and loaded into the RAM. That is, through the cooperation of hardware and software, these hardware resources are controlled by the program, and the related information search apparatus 100 is realized, as shown in FIG. 1, with a data storage unit 110, an input processing unit 120, a history data processing unit 130, a calculation processing unit 140, a scoring unit 150, and an output processing unit 160.

The data storage unit 110 stores the user history data input from outside, as well as the parameters and degrees of association (relevance) calculated as processing results. It consists of, for example, a general hard disk device, and also stores the various programs and data used by the CPU and the like.

The input processing unit 120 consists of a predetermined input interface. It accepts various information (data and commands) entered from input devices (not shown) such as a keyboard, a mouse, or a disk drive, and passes it to the history data processing unit 130 or the scoring unit 150. Specifically, the input processing unit 120 accepts the history data of each user and passes it to the history data processing unit 130. The user history data records the user's actions, such as individual operations (for example, search queries), accesses, and purchases. The input processing unit 120 also accepts a search condition such as an item ID and passes it to the scoring unit 150. The input processing unit 120 may instead consist of a communication interface that receives the various information from a communication network (not shown).

The history data processing unit (history data processing means) 130 analyzes the user history data and obtains statistics of the history data, so that the downstream calculation processing unit 140 can calculate the parameters that serve as indices from that history data. The statistics of the history data are described in detail later.

The calculation processing unit (calculation processing means) 140 calculates, as indices, parameters that use the statistics of the history data, and calculates the relevance between any two objects based on those indices. By calculating this relevance, the calculation processing unit 140 computes the relationships between items in advance. The parameters representing the indices and the relevance are described in detail later.

The scoring unit (scoring means) 150 generates a ranking list based on the relevance calculated in advance for the item indicated by the search condition received by the input processing unit 120. The ranking list contains at least the related item that, among the other items, is most closely related to the item indicated by the search condition. Here, the ranking list is a list of, for example, the IDs of the related items associated with the item indicated by the input item ID. Both the item indicated by the search condition and the other items have been analyzed beforehand by the history data processing unit 130. In this embodiment, the scoring unit 150 predicts the number of users who will access an item, as described later.

The output processing unit 160 consists of an output interface such as a graphics board, and outputs the processing results of the scoring unit 150 to an output device (not shown) such as a liquid crystal display. Here, the processing results are, for example, the ranking list created as the search result.

[Overview of processing by the related information search device]
Next, an overview of the processing performed by the related information search device is given with reference to FIGS. 2 to 5. FIG. 2 is an explanatory diagram outlining history data processing in the related information search device shown in FIG. 1, and FIG. 3 is an explanatory diagram outlining search processing in the same device. FIG. 4 is an explanatory diagram outlining the history data of a plurality of users, and FIG. 5 is an explanatory diagram outlining the operation of the related information search device shown in FIG. 1.

In this embodiment, the related information search device 100 executes two kinds of processing. The first, shown by the broken lines in FIG. 5, calculates in advance, from the user history data, the relationships between the items appearing in the history. The second, shown by the solid lines in FIG. 5, is scoring processing that, when a user issues a search request, calculates the items related to that request and its search results.

<Overview of processing performed in advance of a search>
As shown in FIG. 2, the history data processing unit 130 receives the user history data 201 via the input processing unit 120. Based on the user history data 201, it generates the history data statistics 202, outputs them to the calculation processing unit 140, and also records them in the data storage unit 110. Based on the history data statistics 202 obtained from the history data processing unit 130, the calculation processing unit 140 generates per-item and per-item-pair parameters 203 and records them in the data storage unit 110. The parameters 203 include the indices and the relevance between items described above. The calculation processing unit 140 can also output the generated parameters directly to the scoring unit 150.

<Overview of processing at the time of a search request>
As shown in FIG. 3, the input processing unit 120 accepts an item ID 301 from a search service or the like and passes it to the scoring unit 150. Using the item ID 301 as a key, the scoring unit 150 retrieves the relevance between items stored in advance in the data storage unit 110 as part of the per-item and per-item-pair parameters 203. The scoring unit 150 then treats the retrieved data (the relevance between items) as the scoring result 302 for the input item ID, generates a ranking result 303 in which the counterpart items are sorted in descending order of relevance to the item ID 301, and outputs it to the output processing unit 160. An item name may be accepted instead of the item ID 301, and a user ID may be input together with the item ID.
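The search-time flow described here amounts to a keyed lookup followed by a sort. The following is a minimal sketch of that idea, not the implementation of the scoring unit 150. The dictionary layout, the function name `rank_related`, the `top_k` parameter, and the relevance values are all assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical precomputed store: relevance[(a, j)] is the relevance from
# item a to item j, as produced ahead of time by the calculation step.
Relevance = Dict[Tuple[str, str], float]


def rank_related(item_id: str, relevance: Relevance, top_k: int = 10) -> List[Tuple[str, float]]:
    """Return up to top_k (item, score) pairs most related to item_id."""
    candidates = [(j, score) for (a, j), score in relevance.items() if a == item_id]
    # Highest relevance first; ties are broken by item ID for a stable order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:top_k]


# Illustrative values only; they are not taken from the source.
store: Relevance = {
    ("leon", "fifth_element"): 0.8,
    ("leon", "jackal"): 0.9,
    ("jackal", "leon"): 0.4,
}
ranking = rank_related("leon", store)
```

Because the relevance is computed before any search request arrives, the work done per request is limited to this lookup and sort.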

<Search example>
Suppose the content is "movies" and the items are "movie titles", and the related information search device 100 has analyzed the user history data in advance. When the "item ID" of an analyzed item, for example the movie title "Leon", is input, a ranking list of "item IDs" indicating movie titles such as "Jackal" and "Fifth Element" is output. With the categories and metadata used by a conventional search device, these movies show no relationship to one another. They are nevertheless popular among users who share the same tastes. For example, "Jackal" and "Fifth Element" were released after "Leon". "Leon" and "Fifth Element" share producers and cast members, "Jackal" and "Fifth Element" share cast members, and "Leon" and "Jackal" both feature a highly skilled assassin in a major role. It is therefore possible to perform a search that reflects user preferences that cannot be captured, as in conventional approaches, by the text contained in an item or by its metadata alone.

<Target history data>
Here, an overview of the history data handled by the history data processing unit 130 is given with reference to FIG. 4. This example assumes four users (user A, user B, user C, and user D). The details of the four users' history data 201a, 201b, 201c, and 201d are contained in the history data 201 of all users, which is the target history data. The history data processing unit 130 collects the history data 201 of all users. Each user's history data contains at least a user ID, an item ID, and an access time. For example, the data shows that user A accessed "item i₁" on January 1 (Jan. 1) and "item i₂" on January 31 (Jan. 31). The history data may additionally include the type of user action (login, purchase, browsing, download) involved in accessing the item.

From the history data 201 of all users, the history data processing unit 130 can calculate, as the history data statistics 202, statistics between items as mediated by users. In FIG. 4, the time each user took to move from "item i₁" to "item i₂" is drawn as an arrow starting from item 401. For example, user A took 30 days to move from the day of accessing "item i₁" to the day of accessing "item i₂". The time each user took to move from "item i₁" to "item i₃" and from "item i₁" to "item i₄" is drawn in the same way. However, because the reference direction condition (access condition) is set to "after" here, user D, who accessed "item i₂" before "item i₁", is not shown.

[Detailed configuration of the related information search device]
FIG. 6 is a block diagram showing an example of the detailed configuration of the related information search device shown in FIG. 1.
(History data processing unit)
Here, in order to extract the history data statistics 202 (see FIG. 2), the history data processing unit 130 includes, as shown in FIG. 6, a session data generation means 131, a session statistic calculation means 132, a time statistic calculation means 133, and an item count statistic calculation means 134.

<Session data generation means>
The session data generation means 131 generates, from the target history data, individual session data for each user (hereinafter simply called sessions). Each session is represented by a combination of a user ID (user identification information), an item ID (item identification information), and an access time (access time information) indicating when the user performed the action of accessing the item.

The session data generation means 131 first obtains data in which the target history data is divided by user (hereinafter called sequences). The target history data is data recorded by a server or the like. Each sequence has a unique ID and holds, internally, both item IDs and access times. The history data processing unit 130 can divide sequences based on conditions set by the operator of the device. In this embodiment, when the same item appears more than once during sequence generation, the session data generation means 131 uses only the first occurrence of the item and its access time. The reason is that when a user accesses the same item again, the experience gained from the first access influences the later one, so the repeat access is difficult to treat as equivalent to the first, and handling it would complicate the processing.

The session data generation means 131 prepares for calculating the statistics by dividing each sequence into per-item sessions. In this embodiment, a session is a list generated per user from that user's history data in the form <item: time at which the item was accessed>. For example, let i_j be an item accessed by the user whose user ID is "i" (item ID "j"), let t_j be the time at which that user accessed "item i_j", and suppose there are h sessions in total. The session set u_i of that user can then be written as equation (1).

u_i = {<i₁: t₁>, <i₂: t₂>, …, <i_h: t_h>} … Equation (1)

In the processing that follows, when the same item appears more than once in the sessions of one user, only the first occurrence and its time are kept. That is, when the user whose user ID is "i" accesses the same item i_j repeatedly, only the information for the first access, at access time t_j, is kept. If necessary, a threshold (time threshold) can also be set on the interval between access times, and a session can be split at any point where that threshold is exceeded.
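The session generation rules above (grouping by user, keeping only the first access to each item, and optionally splitting where the access interval exceeds a time threshold) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the session data generation means 131: the record layout, the function name `build_sessions`, the `gap` parameter, and the year in the sample dates are hypothetical. Only the January 1 and January 31 accesses of user A come from the description.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str, datetime]      # (user ID, item ID, access time)
Session = List[Tuple[str, datetime]]    # [<item: access time>, ...]


def build_sessions(records: Iterable[Record],
                   gap: Optional[timedelta] = None) -> Dict[str, List[Session]]:
    """Group records per user, keep the first access per item, split on long gaps."""
    per_user: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for user, item, when in records:
        per_user[user].append((when, item))

    result: Dict[str, List[Session]] = {}
    for user, events in per_user.items():
        events.sort()                   # chronological order
        seen = set()
        sessions: List[Session] = [[]]
        last_time = None
        for when, item in events:
            if item in seen:            # repeat access: only the first is kept
                continue
            seen.add(item)
            if gap is not None and last_time is not None and when - last_time > gap:
                sessions.append([])     # interval exceeded the time threshold
            sessions[-1].append((item, when))
            last_time = when
        result[user] = sessions
    return result


# The year 2008 and the third record are hypothetical.
log = [
    ("A", "i1", datetime(2008, 1, 1)),
    ("A", "i2", datetime(2008, 1, 31)),
    ("A", "i1", datetime(2008, 2, 5)),  # repeat access, dropped
]
sessions = build_sessions(log)
```

Calling `build_sessions(log, gap=timedelta(days=7))` on the same data would place the two kept accesses in separate sessions, because they are 30 days apart.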

<Session statistic calculation means>
From the sessions generated by the session data generation means 131, the session statistic calculation means 132 obtains, as statistics of the history data, the following numbers of sessions in which items occur and co-occur: the number of sessions in which a given single item appears, the number of sessions in which two given items of an item pair appear in a specified access order, and the number of sessions that contain a given object as one member of an object pair. Here, an item pair is any combination of items. An access condition (hereinafter called a reference condition) is a condition set when counting sessions, for example a reference range, a reference direction, a time threshold, or an item count threshold. When a reference condition is set, only the sessions in which the item pair satisfies that condition are counted. Three session counts, "n_a", "n_aj", and "n_·j", which differ in how they are counted, are described below as examples.

(1) Number of sessions "n_a" in which a given single item appears
The number of sessions in which a given single item appears is the number of sessions that contain that item. The number of sessions containing item i_a (item ID = a) is denoted "n_a". Even if the session set of one user being counted contains several sessions that include item i_a, that user is counted only once.

(2) Number of sessions "n_aj" in which two given items of an item pair appear in a specified access order
The number of sessions in which two given items of an item pair appear in a specified access order is the number of sessions that contain an item pair satisfying a given access condition. With a reference condition that designates item ID = a as the starting point, the number of sessions containing the pair (item pair) consisting of item i_a (item ID = a) and item i_j (any item with item ID ≠ a) is denoted "n_aj". When the session set of one user being counted contains the item pair of item i_a and item i_j more than once, it is counted only once, in that user's most recent session.

(3) Number of sessions "n_·j" that contain a given object as one member of an object pair
The number of sessions that contain a given object as one member of an object pair is the number of sessions that contain the given item when every item is allowed as the starting point or end point. In this case there is no reference condition designating a specific item ID = a as the starting point or end point. The number of sessions in which item i_j (any item with item identifier j ≠ ·) appears in correspondence with each item i_· (item ID = ·) is denoted "n_·j".
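The three counts can be sketched as follows. This is one possible reading rather than the method of the session statistic calculation means 132. The data layout, the function names, and the choice of "after" as the reference direction for n_aj are assumptions. The once-per-user rule is stated in the description for n_a and n_aj and is merely assumed here for n_·j, and user C's session is hypothetical.

```python
from typing import Dict, List, Sequence

# user ID -> that user's sessions, each an ordered list of item IDs
# (hypothetical layout chosen for this sketch)
UserSessions = Dict[str, List[Sequence[str]]]


def count_item(sessions: UserSessions, a: str) -> int:
    """n_a: users with at least one session containing item a (one count per user)."""
    return sum(any(a in s for s in user_sessions) for user_sessions in sessions.values())


def count_ordered_pair(sessions: UserSessions, a: str, j: str) -> int:
    """n_aj: users with a session in which a is accessed before j (one count per user)."""
    def a_before_j(s: Sequence[str]) -> bool:
        return a in s and j in s and s.index(a) < s.index(j)
    return sum(any(a_before_j(s) for s in user_sessions) for user_sessions in sessions.values())


def count_as_pair_member(sessions: UserSessions, j: str) -> int:
    """n_.j: users with a session where j appears together with some other item."""
    def paired(s: Sequence[str]) -> bool:
        return j in s and any(x != j for x in s)
    return sum(any(paired(s) for s in user_sessions) for user_sessions in sessions.values())


# Loosely modelled on the four users of FIG. 4; user C's session is made up.
sessions: UserSessions = {
    "A": [["i1", "i2"]],
    "B": [["i1", "i4", "i3", "i2"]],
    "C": [["i1", "i3"]],
    "D": [["i2", "i1"]],
}
n_a = count_item(sessions, "i1")
n_aj = count_ordered_pair(sessions, "i1", "i2")
n_dot_j = count_as_pair_member(sessions, "i2")
```

In this sample, user D contributes to n_a and to n_·j but not to n_aj, because D accessed "i2" before "i1".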

<Time statistic calculation means>
From the session counts obtained as history data statistics by the session statistic calculation means 132 and the individual sessions generated by the session data generation means 131, the time statistic calculation means 133 obtains, as a time statistic, the average (time average) of the difference between the access times of two items (an item pair). In this embodiment there are two kinds of session count for item pairs (n_aj and n_·j), so the time statistic calculation means 133 obtains two corresponding kinds of "time average".

≪Time average, part 1≫
The time statistic calculation means 133 obtains the average time taken to move from "item i_a" to "item i_j" as the inter-item time average, using equation (2).

Here, n_aj is the session count calculated by the session statistic calculation means 132: the number of sessions in which "item i_j" appears while satisfying the reference condition from "item i_a". S_aj is the set of sessions in which "item i_j" appears while satisfying the reference condition from "item i_a". t_{aj,k} is the statistic obtained for each session, and k is the session ID. Note that n_aj must equal the number of elements of S_aj (the number of sessions identified by k). Strictly speaking, therefore, even if one user actually has several sessions that access the same combination of "item i_a" and "item i_j", that combination is counted only once for that user.

The time statistic calculation means 133 obtains, for each session, the time interval t_{aj,k} that satisfies the reference conditions specified in advance (reference range, reference direction, time threshold). Here, the reference range is the range from the "starting item" to the "referenced item". For example, if a distinct item count of "2" is specified as the reference range, the time interval t_{aj,k} is extracted only for items within a distinct item count of "2" from the "starting item". Specifically, in this embodiment, user B in FIG. 4 moves from "item i₁" to "item i₂" by way of "item i₄" and "item i₃", so the distinct item count is defined as "3". It is also possible to define the distinct item count as "2", the number of items passed through in between. Furthermore, if the user passes through "item i₁", the same item as the starting point, along the way, that occurrence can either be kept in the count or removed. When it is removed, the "distinct item count" becomes a measure that differs from physical time information.

Among the reference conditions, the reference direction is the direction in time as seen from the "starting item". For example, if "both" is specified as the reference direction, the temporal order relative to the "starting item" does not matter; if "before" (or "after") is specified, the time interval t_{aj,k} is extracted only for items accessed earlier (or later) than the "starting item". If a time threshold is set as a reference condition, time intervals t_{aj,k} exceeding it are not extracted.

For example, suppose the user history data is as shown in FIG. 7(a) and, as an example of reference conditions, the "starting item" is i₁ and the reference direction is "after". Calculating the inter-item time average from "item i₁" to "item i₂" with equation (2) gives the result shown as equation (3), where the value is rounded to one decimal place. User D, who referred to "item i₂" before "item i₁", does not satisfy the reference direction condition "after" and is therefore not used in this calculation (NG).
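The effect of the reference direction and the time threshold on the inter-item time average can be illustrated with the following sketch. It is not the implementation of the time statistic calculation means 133. The data layout (one session per user, first access times only), the function name, and the parameter names are assumptions, and apart from user A's 30-day interval the sample dates are hypothetical rather than the values of FIG. 7(a).

```python
from datetime import datetime
from typing import Dict, List, Optional

# user ID -> {item ID: first access time}; one session per user for brevity (assumption)
History = Dict[str, Dict[str, datetime]]


def mean_interval_days(history: History, a: str, j: str,
                       direction: str = "after",
                       max_days: Optional[float] = None) -> Optional[float]:
    """Average time in days between item a and item j over qualifying sessions."""
    intervals: List[float] = []
    for accesses in history.values():
        if a not in accesses or j not in accesses:
            continue
        delta = (accesses[j] - accesses[a]).total_seconds() / 86400.0
        if direction == "after" and delta <= 0:
            continue                    # j must come later than a
        if direction == "before" and delta >= 0:
            continue                    # j must come earlier than a
        if max_days is not None and abs(delta) > max_days:
            continue                    # time threshold exceeded
        intervals.append(abs(delta))
    return sum(intervals) / len(intervals) if intervals else None


# User A's dates follow the description; the year and users B and D are made up.
history: History = {
    "A": {"i1": datetime(2008, 1, 1), "i2": datetime(2008, 1, 31)},
    "B": {"i1": datetime(2008, 1, 1), "i2": datetime(2008, 1, 11)},
    "D": {"i1": datetime(2008, 1, 20), "i2": datetime(2008, 1, 5)},
}
average_after = mean_interval_days(history, "i1", "i2", direction="after")
```

With the direction set to "after", user D is skipped, as in the description; with "both", D's 15-day interval would enter the average as well.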

≪Time average, part 2≫
Whereas part 1 calculates the time average between items, here it is calculated for each item. The time statistic calculation means 133 obtains the average time until "item i_j" appears, taken over all starting items, as the per-item time average, using equation (4).

Here, n_·j is the session count calculated by the session statistic calculation means 132: the number of sessions in which "item i_j" appears, viewed from all items. S_·j is the set of sessions that contain an item after which "item i_j" appears. t_{aj,k} is the statistic obtained for each session, and k is the session ID.
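From the symbol definitions given for equations (2) and (4), both time averages appear to be arithmetic means of the per-session intervals. The following is a reconstruction under that assumption, not the equations as published. The left-hand symbols \bar{t}_{aj} and \bar{t}_{\cdot j} in particular are placeholders chosen here.

```latex
% Assumed form of equation (2): inter-item time average
\bar{t}_{aj} = \frac{1}{n_{aj}} \sum_{k \in S_{aj}} t_{aj,k}

% Assumed form of equation (4): per-item time average over all starting items
\bar{t}_{\cdot j} = \frac{1}{n_{\cdot j}} \sum_{k \in S_{\cdot j}} t_{aj,k}
```

Under this reading, the requirement that n_aj equal the number of elements of S_aj simply ensures that the divisor matches the number of summed terms.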

For example, suppose the user history data is as shown in FIG. 7(b). Calculating the "time average" until "item i₂" appears, as the per-item time average over all "starting items", with equation (4) gives the result shown as equation (5). Here, user D, who referred to "item i₂" before "item i₁", is also used in the calculation (OK).

<Item count statistic calculation means>
From the session counts obtained as history data statistics by the session statistic calculation means 132 and the individual sessions generated by the session data generation means 131, the item count statistic calculation means (object count statistic calculation means) 134 obtains, as an item count statistic, the average number of mutually different items that a user accessed between accessing one item of a pair and accessing the other (the average distinct item count between items). In this embodiment there are two kinds of session count for item pairs (n_aj and n_·j), so the item count statistic calculation means 134 obtains two corresponding kinds of "average distinct item count".

≪Average distinct item count, part 1≫
The item count statistic calculation means 134 obtains the average distinct item count from "item i_a" until the transition to "item i_j" as the inter-item average distinct item count, using equation (6).

Here, n_aj is the session count calculated by the session statistic calculation means 132: the number of sessions in which "item i_j" appears after "item i_a". S_aj is the set of sessions in which "item i_j" appears after "item i_a". d_{aj,k} is the statistic obtained for each session, and k is the session ID.

The item count statistic calculation means 134 obtains, for each session, the item count interval d_{aj,k} that satisfies the reference conditions specified in advance (reference range, reference direction, item count threshold). For example, if an "item count threshold" is set, "item counts" exceeding it are not extracted.

For example, suppose the user history data is as shown in FIG. 8(a), the "starting item" is i₁, and the reference direction is "after". Calculating the inter-item average distinct item count from "item i₁" to "item i₂" with equation (6) gives the result shown as equation (7). User D, who referred to "item i₂" before "item i₁", does not satisfy the reference direction condition "after" and is therefore not used in this calculation (NG).
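The per-session distinct item count and its average can be sketched as follows, assuming the reference direction "after". This is an illustration, not the implementation of the item count statistic calculation means 134: the function names and the `include_destination` and `limit` parameters are hypothetical. The two counting conventions correspond to the "3" and "2" given for user B in the description, and the sessions for users A and D are assumed for the example.

```python
from typing import List, Optional, Sequence


def distinct_items_between(session: Sequence[str], a: str, j: str,
                           include_destination: bool = True) -> Optional[int]:
    """Distinct items accessed after a up to j; None if j does not follow a."""
    if a not in session or j not in session:
        return None
    start, end = session.index(a), session.index(j)
    if end <= start:
        return None                         # reference direction "after" not satisfied
    passed = set(session[start + 1:end])    # items visited strictly in between
    passed.discard(a)                       # ignore revisits of the starting item
    return len(passed) + (1 if include_destination else 0)


def mean_distinct_items(sessions: List[Sequence[str]], a: str, j: str,
                        limit: Optional[int] = None) -> Optional[float]:
    """Average of the per-session counts, skipping counts above the item count threshold."""
    counts = [d for d in (distinct_items_between(s, a, j) for s in sessions)
              if d is not None and (limit is None or d <= limit)]
    return sum(counts) / len(counts) if counts else None


user_a = ["i1", "i2"]                # assumed: A moves directly from i1 to i2
user_b = ["i1", "i4", "i3", "i2"]    # as described for user B
user_d = ["i2", "i1"]                # i2 accessed before i1, so excluded
average = mean_distinct_items([user_a, user_b, user_d], "i1", "i2")
```

Passing `include_destination=False` to `distinct_items_between` yields the alternative convention in which only the items passed through in between are counted.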

≪Average distinct item count, part 2≫
Whereas part 1 calculates the average distinct item count between items, here it is calculated for each item. The item count statistic calculation means 134 obtains the average distinct item count until "item i_j" appears, taken over all starting items, as the per-item average distinct item count, using equation (8).

Here, n_·j is the session count calculated by the session statistic calculation means 132: the number of sessions in which "item i_j" appears, viewed from all items. S_·j is the set of sessions that contain an item after which "item i_j" appears. d_{aj,k} is the statistic obtained for each session, and k is the session ID.

For example, given user history data as shown in FIG. 8(b), calculating the average number of different items until "item i₂" appears, over all starting items, as the per-item average by equation (8) above gives the result shown in equation (9). Here, user D, who refers to "item i₂" before "item i₁", is also used in the calculation (OK).

(Calculation processing unit)
As shown in FIG. 6, the calculation processing unit (calculation processing means) 140 includes a session number calculation means 141, a time average calculation means 142, a different item number calculation means 143, a first relevance calculation means 144, a second relevance calculation means 145, and a third relevance calculation means 146.

<Session number calculation means>
The session number calculation means 141 calculates, as indices, the parameter λ_{n,·j} shown in equation (10) and the parameter λ_{n,aj} shown in equation (11) from the session counts (n_a, n_{aj}, n_{·j}) counted by the session statistic calculation means 132. The value μ in equation (11) can be set arbitrarily, but it can also be determined by a method such as Gibbs sampling. Setting μ appropriately prevents the model from depending excessively on the target history data; that is, it avoids overfitting to the users' action histories.

<Time average calculation means>
The time average calculation means 142 calculates, as an index, a parameter indicating the time average, based on the average time (the time average between items) obtained as the time statistic by the time statistic calculation means 133. In this embodiment, the time average calculation means 142 calculates the parameter λ_{t,·j} shown in equation (12) and the parameter λ_{t,aj} shown in equation (13) from the session count (n_{aj}) counted by the session statistic calculation means 132 and the "time average" shown in equations (2) and (4) above. The value μ in equation (13) can be set arbitrarily, but it can also be determined by a method such as Gibbs sampling.

<Different item number calculation means>
The different item number calculation means (different object number calculation means) 143 calculates, as an index, a parameter indicating the average number of different items (hereinafter simply the "different item number parameter"), based on the average number of items (the average number of different items between items) obtained as the item count statistic by the item count statistic calculation means 134. In this embodiment, the different item number calculation means 143 calculates the parameter λ_{d,·j} shown in equation (14) and the parameter λ_{d,aj} shown in equation (15) from the session count (n_{aj}) counted by the session statistic calculation means 132 and the "average number of different items" shown in equations (6) and (8) above. The value μ in equation (15) can be set arbitrarily, but it can also be determined by a method such as Gibbs sampling.

<First relevance calculation means>
The first relevance calculation means 144 reads the session count parameters from the data storage unit 110 and calculates the relevance between each pair of items among the items to be searched. This embodiment assumes that the number of sessions in which items co-occur follows a Poisson distribution. Using the session count parameters shown in equations (10) and (11) above, the first relevance calculation means 144 calculates, by equation (16), the relevance f_n(j|a; θ) between a given "item i_a" and another "item i_j".

Here, θ indicates that the parameters are used, and L_n indicates that a function measuring the distance between parameters is used. Equation (16) is the Kullback-Leibler distance for measuring the "distance between Poisson distributions". Items can be selected as more closely related to the given "item i_a" (related items) in ascending order of the value calculated by equation (16).
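For reference, the Kullback-Leibler divergence between two Poisson distributions has a standard closed form, KL(Poisson(λ_p) || Poisson(λ_q)) = λ_p log(λ_p / λ_q) + λ_q - λ_p. The Python sketch below evaluates it. Which two parameters equation (16) compares, and in which order, is not reproduced in this passage, so `lam_p` and `lam_q` are placeholder names rather than the patent's symbols, and the function name is hypothetical.

```python
import math


def kl_poisson(lam_p, lam_q):
    """Closed-form KL divergence KL(Poisson(lam_p) || Poisson(lam_q)).

    Both arguments are Poisson means and must be positive.
    """
    if lam_p <= 0 or lam_q <= 0:
        raise ValueError("Poisson means must be positive")
    return lam_p * math.log(lam_p / lam_q) + lam_q - lam_p


# Identical distributions are at distance zero; the value grows as they differ.
print(kl_poisson(3.0, 3.0))
print(kl_poisson(2.0, 4.0))
print(kl_poisson(4.0, 2.0))  # the divergence is not symmetric
```

Because the divergence is zero only when the two means coincide, ranking candidates in ascending order of this value places the closest distributions first, which matches the selection rule stated above.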

<Second relevance calculation means>
The second relevance calculation means 145 reads the time average parameters from the data storage unit 110 and calculates the relevance between each pair of items among the items to be searched. This embodiment assumes that the "time average" indicating the relationship between items follows an exponential distribution. Using the time average parameters shown in equations (12) and (13) above, the second relevance calculation means 145 calculates, by equation (17), the relevance f_t(j|a; θ) between a given "item i_a" and another "item i_j".

<Third relevance calculation means>
The third relevance calculation means 146 reads the different item number parameters from the data storage unit 110 and calculates the relevance between each pair of items among the items to be searched. This embodiment assumes that the "average number of different items" indicating the relationship between items follows an exponential distribution. Using the different item number parameters shown in equations (14) and (15) above, the third relevance calculation means 146 calculates, by equation (18), the relevance f_d(j|a; θ) between a given "item i_a" and another "item i_j".

In equations (17) and (18) above, θ indicates that the parameters are used, and L_t (or L_d) indicates that a function measuring the distance between parameters is used. Both equations are Kullback-Leibler distances for measuring the "distance between exponential distributions". As with equation (16), items can be selected as more closely related to the given "item i_a" (related items) in ascending order of the value calculated by either equation.
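The exponential case also has a standard closed form. For rate parameters, KL(Exp(r_p) || Exp(r_q)) = log(r_p / r_q) + r_q / r_p - 1. The sketch below assumes the parameters are rates (the reciprocal of the mean interval); whether the patent's λ_t and λ_d are rates or means is not stated in this passage, so that choice, the argument names, and the function name are assumptions.

```python
import math


def kl_exponential(rate_p, rate_q):
    """Closed-form KL divergence KL(Exp(rate_p) || Exp(rate_q)).

    Assumption: both arguments are rate parameters (1 / mean), and positive.
    """
    if rate_p <= 0 or rate_q <= 0:
        raise ValueError("rates must be positive")
    return math.log(rate_p / rate_q) + rate_q / rate_p - 1.0


# Mean intervals of 10 and 20 time units correspond to rates 0.1 and 0.05.
print(kl_exponential(0.1, 0.1))
print(kl_exponential(0.1, 0.05))
```

If the stored parameters were means instead, the same function could be called with their reciprocals.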

(Scoring unit)
As shown in FIG. 6, the scoring unit (scoring means) 150 includes a user number prediction means 151 and a ranking list creation means 152.

<User number prediction means>
The user number prediction means 151 predicts the number of users who will access the item corresponding to the input. In this embodiment, the user number prediction means 151 uses the session count (n_{aj}) counted by the session statistic calculation means 132 and the time average parameter shown in equation (13) above to predict, by equation (19), the number of users who access "item i_j" after a time interval t has elapsed since accessing the "item i_a" indicated by the search condition. Likewise, it uses the session count (n_{aj}) and the different item number parameter shown in equation (15) above to predict, by equation (20), the number of users who access "item i_j" after an interval of d different items since accessing "item i_a". These predicted user numbers make it possible to calculate transition probabilities from each item that take the time series into account.

<Ranking list creation means>
The ranking list creation means 152 extracts from the data storage unit 110, as scoring results, the relevance of each item with respect to the item given as the search condition, and creates a ranking list by sorting them in ascending order of the distance between Poisson distributions (or exponential distributions). The relevance values are calculated in advance by the first relevance calculation means 144, the second relevance calculation means 145, and the third relevance calculation means 146 and stored in the data storage unit 110.
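The sorting step can be sketched as follows. This is an illustrative sketch only: the dictionary keyed by (query item, candidate item) stands in for whatever layout the data storage unit 110 actually uses, and the function name and the `top_k` argument are hypothetical.

```python
def build_ranking_list(distances, query_item, top_k=None):
    """Return (candidate, distance) pairs for `query_item`, smallest distance first.

    `distances` maps (query_item, candidate_item) to a precomputed distance.
    `top_k=None` returns every candidate.
    """
    candidates = [
        (candidate, dist)
        for (query, candidate), dist in distances.items()
        if query == query_item and candidate != query_item
    ]
    # A smaller distance means a closer relationship, so sort ascending.
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:top_k]


# Hypothetical precomputed distances.
distances = {("a", "b"): 0.7, ("a", "c"): 0.1, ("a", "d"): 0.4, ("b", "a"): 0.2}
print(build_ranking_list(distances, "a"))
print(build_ranking_list(distances, "a", top_k=1))
```

Keeping the distances precomputed means a search request only needs a lookup and a sort, which is consistent with the relevance values being calculated before the search.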

[Operation of related information search apparatus]
The operation of the related information search apparatus 100 shown in FIG. 1 is described with reference to FIGS. 9 to 11 (and FIG. 6 as appropriate). FIGS. 9 to 11 are flowcharts showing the operation of the history data processing unit, the calculation processing unit, and the scoring unit shown in FIG. 6, respectively.

<Processing performed in advance before a search, part 1>
As shown in FIG. 9, based on an operator's operation, the related information search apparatus 100 accepts the target users' history data through the input processing unit 120 and inputs it to the history data processing unit 130 (step S1). Also based on the operator's operation, the apparatus accepts threshold conditions such as a time threshold or an item count threshold through the input processing unit 120 and inputs them to the history data processing unit 130 (step S2). Steps S1 and S2 may be performed in either order.

Next, the history data processing unit 130 generates sessions for each user from all of the accepted history data by the session data generation means 131 (step S3), and counts, by the session statistic calculation means 132, items and item pairs and their frequencies for each session as session statistics (step S4). The history data processing unit 130 then calculates, by the time statistic calculation means 133, the time average for item pairs as a time statistic between items (step S5), and calculates, by the item count statistic calculation means 134, the average number of different items for item pairs as an item count statistic between items (step S6). Finally, the history data processing unit 130 outputs the history data statistics to the data storage unit 110 (step S7). Steps S5 and S6 may be performed in either order.
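Steps S3 and S4 can be sketched in Python as below. This is a minimal illustration, not the patent's implementation: the record layout (user ID, item ID, access time) follows the session data described in the abstract, but the rule that a new session starts when the gap between consecutive accesses exceeds the time threshold is an assumption, and all function and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def build_sessions(records, time_threshold):
    """Split (user_id, item_id, access_time) records into per-user sessions.

    Assumption: a new session starts when the gap between two consecutive
    accesses by the same user exceeds `time_threshold`.
    """
    by_user = defaultdict(list)
    for user_id, item_id, access_time in records:
        by_user[user_id].append((access_time, item_id))

    sessions = []
    for user_id, events in by_user.items():
        events.sort()  # chronological order
        current, last_time = [], None
        for access_time, item_id in events:
            if last_time is not None and access_time - last_time > time_threshold:
                sessions.append((user_id, current))
                current = []
            current.append(item_id)
            last_time = access_time
        if current:
            sessions.append((user_id, current))
    return sessions


def count_sessions(sessions):
    """Count sessions per item (n_a) and per ordered item pair (n_aj)."""
    item_sessions = Counter()
    pair_sessions = Counter()
    for _user_id, items in sessions:
        item_sessions.update(set(items))
        seen, ordered_pairs = set(), set()
        for item in items:
            # Every item seen earlier in the session precedes this one.
            ordered_pairs.update((earlier, item) for earlier in seen if earlier != item)
            seen.add(item)
        pair_sessions.update(ordered_pairs)
    return item_sessions, pair_sessions


# Hypothetical history: user u1 returns to item "a" after a long gap.
records = [("u1", "a", 0), ("u1", "b", 5), ("u1", "a", 100), ("u2", "b", 0), ("u2", "a", 3)]
sessions = build_sessions(records, time_threshold=30)
n_item, n_pair = count_sessions(sessions)
print(sessions)
print(n_item, n_pair)
```

Each session contributes at most once to each count, so the counters hold numbers of sessions rather than numbers of accesses.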

<Processing performed in advance before a search, part 2>
As shown in FIG. 10, based on an operator's operation (or at a predetermined timing), the related information search apparatus 100 reads, by the calculation processing unit 140, the session counts from the data storage unit 110 as history data statistics and calculates the session count parameters (step S11). The calculation processing unit 140 also reads the time average, as a statistic between items, from the data storage unit 110 and calculates the time average parameters (step S12), and reads the average number of different items, as a statistic between items, and calculates the different item number parameters (step S13). Steps S11 to S13 may be performed in any order.

The calculation processing unit 140 then reads the session count parameters from the data storage unit 110 by the first relevance calculation means 144 and calculates the relevance between each pair of items among the items to be searched by equation (16) above (step S14). Likewise, it reads the time average parameters by the second relevance calculation means 145 and calculates the relevance between each pair of items by equation (17) above (step S15), and reads the different item number parameters by the third relevance calculation means 146 and calculates the relevance between each pair of items by equation (18) above (step S16). Steps S14 to S16 may be performed in any order.

<Processing at the time of a search request>
As shown in FIG. 11, based on an operator's operation, the related information search apparatus 100 accepts an item ID as a search condition through the input processing unit 120 and inputs it to the scoring unit 150 (step S21). If the item ID serving as the search condition is stored in the data storage unit 110, the scoring unit 150 reads the stored item ID. The scoring unit 150 then predicts, by the user number prediction means 151, the number of users who will access the item corresponding to the input, using equations (19) and (20) above with the statistics and parameters read from the data storage unit 110 (step S22). Next, by the ranking list creation means 152, the scoring unit 150 extracts from the data storage unit 110, as scoring results, the relevance values that match the item corresponding to the input (the search condition), sorts the scoring results, and creates the respective ranking lists (step S23). Finally, the scoring unit 150 outputs the ranking lists to an output device (not shown) via the output processing unit 160 (step S24).

The related information search apparatus 100 can also be realized by having a general-purpose computer execute a related information search program that causes it to perform the steps described above. This program can be distributed over a communication line, or written to and distributed on a computer-readable recording medium such as a CD-ROM or flash memory.

According to this embodiment, before executing a search, the related information search apparatus 100 can calculate in advance a relevance between items that is defined to take into account time-series changes in the relationships between items, by using quantities such as the number of sessions containing an item pair to capture the temporal ordering of each user's action history. This makes it possible to search for items by their relevance to the same preference and to predict the number of users accessing each item along the time series. In addition, because the apparatus uses the relationships between items, unlike conventional approaches, it can achieve personalization by recommending items that match user preferences more effectively than other methods. Furthermore, the apparatus can perform related information search in the same way regardless of the format of the user history data or the type of action history.

An embodiment of the present invention has been described above, but the present invention is not limited to it and can be practiced within a scope that does not change its gist. For example, this embodiment was described as a best mode in which the history data processing unit 130 uses all of the "session count", "time average", and "average number of different items" as statistics based on user history data, but this is not a limitation; the effect of the invention is obtained as long as at least the "session count" is used.

The same applies to the processing for obtaining the index parameters in the calculation processing unit 140. Furthermore, the calculation processing unit 140 was described as a best mode that uses all of the "session count", "time average", and "average number of different items" parameters when obtaining the relevance, but this is not a limitation; the effect of the invention is obtained as long as at least the "session count" parameter is used. Also, in the scoring unit 150 the user number prediction means 151 was described as predicting with both equation (19) and equation (20), but either one alone may be used, and the user number prediction means 151 is not itself an essential component.

Alternatively, all of the "session count", "time average", and "average number of different items" parameters and relevance values may be obtained and stored in the data storage unit 110, and related items may then be searched using at least one of these three different viewpoints according to the user's request or purpose. When the parameters, the statistics such as session counts, and the relevance values are stored in the data storage unit 110 in this way, searches from multiple different viewpoints can be performed quickly for a single search input condition without repeating the history processing, such as dividing sequences into sessions, from the beginning.

In this embodiment, the calculation processing unit 140 calculates the relevance between items in advance, before search processing. In this case, for example, a default value can be defined for μ in equation (11), (13), or (15) above, and the relevance between items for the target history data can be obtained with that default value. Afterwards, when a user searching with the related information search apparatus 100 sets the parameter μ to an arbitrary value, the parameters λ_{n,aj}, λ_{t,aj}, and λ_{d,aj} can be recalculated, and the relevance shown in equation (16), (17), or (18) above can be calculated anew. This makes it possible to search for related items suited to the user's purpose.

The related information search apparatus 100 is not limited to outputting multiple related items for one input item ID; it may output only the single most closely related item. It can also output one or more related items for an input of multiple item IDs combined by conditions such as "AND" or "OR". In that case, for example, the product of the individual relevance values can be used for an "AND" condition and their sum for an "OR" condition.
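The combination rule for multiple item IDs can be sketched as follows. The product for "AND" and the sum for "OR" come from the text; everything else is an assumption made for illustration, in particular the function name, the per-query dictionaries of relevance values, and the choice to combine only candidates that have a value for every query item.

```python
from math import prod


def combine_relevance(scores_per_query, mode):
    """Combine per-query relevance values into one value per candidate.

    `scores_per_query` is a list of dicts, one per query item, each mapping a
    candidate item to its relevance value for that query item.
    Assumption: only candidates scored for every query item are combined.
    """
    if not scores_per_query:
        return {}
    common = set(scores_per_query[0])
    for scores in scores_per_query[1:]:
        common &= set(scores)
    if mode == "AND":
        return {c: prod(s[c] for s in scores_per_query) for c in common}
    if mode == "OR":
        return {c: sum(s[c] for s in scores_per_query) for c in common}
    raise ValueError("mode must be 'AND' or 'OR'")


# Hypothetical relevance values for two query items.
for_a = {"x": 0.5, "y": 2.0}
for_b = {"x": 4.0, "y": 0.25, "z": 1.0}
print(combine_relevance([for_a, for_b], "AND"))
print(combine_relevance([for_a, for_b], "OR"))
```

The combined values can then be sorted in the same way as the single-item relevance values to produce a ranking list.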

In this embodiment, the related information search apparatus 100 is included in the system of a search service (a search engine or an e-commerce (EC) site), but it can also be implemented as a system independent of the search service. In that case, it can send ranking lists of related items to multiple search services simultaneously. Also in that case, the input processing unit 120 can accept the output results of other search engines or EC sites together with item IDs.

Although the ranking list is described as being output to an output device (not shown) via the output processing unit 160, it can also be returned to an independent search service. The present invention is applicable not only to the search service described in this embodiment but also to product search on EC sites and to information filtering.

The related information search apparatus 100 is not limited to a single device; its functions may be distributed across multiple devices. For example, the history data processing unit 130 and the calculation processing unit 140 may be configured as a device separate from the scoring unit 150. This distributes the load across the devices and enables high-speed search processing.

FIG. 1 is a configuration diagram schematically showing a related information search apparatus according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an overview of history data processing in the related information search apparatus shown in FIG. 1.
FIG. 3 is an explanatory diagram showing an overview of search processing in the related information search apparatus shown in FIG. 1.
FIG. 4 is an explanatory diagram showing an overview of the history data of multiple users.
FIG. 5 is an explanatory diagram showing an overview of the operation of the related information search apparatus shown in FIG. 1.
FIG. 6 is a block diagram showing an example of the detailed configuration of the related information search apparatus shown in FIG. 1.
FIG. 7 is an explanatory diagram of the processing of the time statistic calculation means shown in FIG. 6, where (a) shows the case where the reference direction is "after" and (b) shows the case where the reference direction is "before and after".
FIG. 8 is an explanatory diagram of the processing of the item count statistic calculation means shown in FIG. 6, where (a) shows the case where the reference direction is "after" and (b) shows the case where the reference direction is "before and after".
FIG. 9 is a flowchart showing the operation of the history data processing unit shown in FIG. 6.
FIG. 10 is a flowchart showing the operation of the calculation processing unit shown in FIG. 6.
FIG. 11 is a flowchart showing the operation of the scoring unit shown in FIG. 6.

Explanation of symbols

100 Related information search apparatus
110 Data storage unit
120 Input processing unit
130 History data processing unit (history data processing means)
131 Session data generation means
132 Session statistic calculation means
133 Time statistic calculation means
134 Item count statistic calculation means (object count statistic calculation means)
140 Calculation processing unit (calculation processing means)
141 Session number calculation means
142 Time average calculation means
143 Different item number calculation means (different object number calculation means)
144 First relevance calculation means
145 Second relevance calculation means
146 Third relevance calculation means
150 Scoring unit (scoring means)
151 User number prediction means
152 Ranking list creation means
160 Output processing unit

Claims

A related information search apparatus that, based on an index indicating objects and the relationships between objects obtained by analyzing in advance history data indicating the action history of users who accessed the objects, takes at least one of the analyzed objects as a search condition and searches, from among the other analyzed objects, for related objects related to the search condition as related information, the apparatus comprising:
history data processing means for generating, from the history data to be processed, individual session data for each user, represented by a set of user identification information, object identification information, and access time information indicating the time at which the user accessed the object, and for obtaining from the generated session data, as statistics of the history data, the number of sessions in which a predetermined object appears, the number of sessions in which two predetermined objects appear as an object pair with a specified access order, and the number of sessions in which the predetermined object appears as one member of an object pair;
calculation processing means for calculating, as the index, a parameter that uses the session counts obtained as statistics of the history data, and for calculating the relevance between any two objects based on the calculated index;
input processing means for accepting the search condition;
scoring means for generating, based on the relevance calculated in advance for the object indicated by the accepted search condition, a ranking list that includes at least the related object most closely related to that object among the other analyzed objects; and
output processing means for outputting the ranking list.

The related information search device according to claim 1, wherein:
the history data processing means includes time statistic calculation means for obtaining, as a time statistic, an average value of the time difference between accesses to any two objects, from the numbers of sessions obtained as statistics of the history data and the individual session data; and
the calculation processing means includes:
time average calculation means for calculating, as the index, a parameter indicating a time average based on the average value of time obtained as the time statistic; and
second relevance calculation means for calculating a degree of relevance between any two objects based on the calculated parameter indicating the time average.
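As a rough illustration of the time statistic, the following Python sketch averages the access-time difference between two objects over the sessions in which the first is followed by the second. The claim does not say which accesses are compared when an object appears more than once; the sketch assumes the first access of the earlier object and the next later access of the other. The function name `average_time_gap` is hypothetical.

```python
def average_time_gap(sessions, first, second):
    """sessions: dict mapping user id -> list of (access_time, object_id),
    sorted by access_time. Returns the mean time from `first` to the next
    access of `second`, or None if no session contains that ordered pair."""
    total = 0.0
    count = 0
    for events in sessions.values():
        # Assumption: use the earliest access of `first` in the session.
        t_first = next((t for t, obj in events if obj == first), None)
        if t_first is None:
            continue
        # Assumption: use the next access of `second` after that time.
        t_second = next((t for t, obj in events if obj == second and t > t_first), None)
        if t_second is None:
            continue
        total += t_second - t_first
        count += 1
    return total / count if count else None
```

The divisor is the number of sessions containing the ordered pair, which is one of the session counts already obtained as statistics of the history data.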

The related information search device according to claim 1, wherein:
the history data processing means includes object number statistic calculation means for obtaining, as an object number statistic, an average value of the number of different objects accessed from when the user accesses one object of an object pair until the user accesses the other, from the numbers of sessions obtained as statistics of the history data and the individual session data; and
the calculation processing means includes:
different object number calculation means for calculating, as the index, a parameter indicating the average number of different objects based on the average value of the number of objects obtained as the object number statistic; and
third relevance calculation means for calculating a degree of relevance between any two objects based on the calculated parameter indicating the average number of different objects.
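The object number statistic can be sketched in the same way. The Python example below counts, per session, the distinct other objects accessed between the first access of one object and the next access of the other, then averages over the sessions where that ordered pair occurs. The choice of which accesses delimit the interval is an assumption, and the function name `average_objects_between` is hypothetical.

```python
def average_objects_between(sessions, first, second):
    """sessions: dict mapping user id -> list of object ids in access order.
    Returns the mean number of distinct other objects accessed between
    `first` and the next `second`, or None if the ordered pair never occurs."""
    total = 0
    count = 0
    for objects in sessions.values():
        if first not in objects:
            continue
        start = objects.index(first)  # assumed: earliest access of `first`
        try:
            end = objects.index(second, start + 1)  # next access of `second`
        except ValueError:
            continue  # `second` is never accessed after `first` in this session
        between = set(objects[start + 1:end]) - {first, second}
        total += len(between)
        count += 1
    return total / count if count else None
```

Using a set means an object revisited several times inside the interval is counted once, in line with "the number of different objects" in the claim.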

A related information search method for a related information search device that, using an index indicating the relationship between objects obtained in advance by analyzing history data indicating an action history of users accessing objects, takes at least one of the analyzed objects as a search condition and searches, from among the other analyzed objects, for related objects related to the search condition as related information, wherein:
the related information search device comprises input processing means, history data processing means, calculation processing means, scoring means, and output processing means;
the history data processing means executes:
a step of receiving the history data to be processed;
a step of generating, from the received history data, individual session data for each user, represented by a set of user identification information, object identification information, and access time information indicating the time at which the user accessed the object; and
a step of obtaining from the generated session data, as statistics of the history data, the number of sessions in which a predetermined single object appears, the number of sessions in which two predetermined objects forming an object pair with a specified access order appear, and the number of sessions in which the predetermined object is included as one member of an object pair;
the calculation processing means executes a step of calculating, as the index, a parameter using the numbers of sessions obtained as statistics of the history data, and
a step of calculating a degree of relevance between any two objects based on the calculated index;
the input processing means executes a step of receiving the search condition;
the scoring means executes a step of generating, based on the degree of relevance calculated in advance for the object indicated by the received search condition, a ranking list that includes at least the related object most relevant to that object from among the other analyzed objects; and
the output processing means executes a step of outputting the ranking list.
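The calculation and scoring steps of the method can be pictured with the short Python sketch below. The claim does not give the formula for the degree of relevance, so the ratio used here (sessions in which a precedes b, divided by sessions containing a) is only a stand-in assumption, and `relevance_table`, `ranking_list`, and `top_k` are hypothetical names.

```python
def relevance_table(single, ordered):
    """Precompute a relevance score for each ordered pair (a, b).
    Assumed stand-in formula: ordered[(a, b)] / single[a]."""
    return {
        (a, b): n / single[a]
        for (a, b), n in ordered.items()
        if single.get(a)  # skip pairs whose first object has no session count
    }


def ranking_list(query, relevance, top_k=3):
    """Return up to top_k (object, score) pairs related to `query`,
    most relevant first; ties are broken by object id for stable output."""
    scored = [(b, score) for (a, b), score in relevance.items() if a == query]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

Because the table is built before any search condition is received, answering a query reduces to a filter and a sort, which mirrors the claim's requirement that the degree of relevance be calculated in advance.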

The related information search method, wherein:
the history data processing means includes time statistic calculation means;
the time statistic calculation means executes a step of obtaining, as a time statistic, an average value of the time difference between accesses to any two objects, from the numbers of sessions obtained as statistics of the history data and the individual session data;
the calculation processing means includes time average calculation means and second relevance calculation means;
the time average calculation means executes a step of calculating, as an index used at the time of the search, a parameter indicating a time average based on the average value of time obtained as the time statistic; and
the second relevance calculation means executes a step of calculating a degree of relevance between any two objects based on the calculated parameter indicating the time average.

The related information search method, wherein:
the history data processing means includes object number statistic calculation means;
the object number statistic calculation means executes a step of obtaining, as an object number statistic, an average value of the number of different objects accessed from when the user accesses one object of an object pair until the user accesses the other, from the numbers of sessions obtained as statistics of the history data and the individual session data;
the calculation processing means includes different object number calculation means and third relevance calculation means;
the different object number calculation means executes a step of calculating, as an index used at the time of the search, a parameter indicating the average number of different objects based on the average value of the number of objects obtained as the object number statistic; and
the third relevance calculation means executes a step of calculating a degree of relevance between any two objects based on the calculated parameter indicating the average number of different objects.

A related information search program for causing a computer to implement the functions of the related information search device according to any one of claims 1 to 3.

A computer-readable recording medium on which the related information search program according to claim 7 is recorded.