JP2021012524A - Information processing device, information processing method, and program

Info

Publication number: JP2021012524A
Application number: JP2019126120A
Authority: JP
Inventors: コウ牛; Niu Hao; 慧米川; Kei Yonekawa; 茂莉黒川; Mori Kurokawa; 亜令小林; Arei Kobayashi
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2019-07-05
Filing date: 2019-07-05
Publication date: 2021-02-04
Anticipated expiration: 2039-07-05
Also published as: JP7039525B2

Abstract

To generate feature vectors in consideration of the behavioral tendencies of users in a plurality of domains. SOLUTION: An information processing device 1 includes: a time series data acquisition unit 121 for acquiring first time series data of a first domain and second time series data of a second domain, each containing item information indicating items corresponding to events, event occurrence times, and user identification information; an identification unit 122 for identifying a combination of first time series data and second time series data corresponding to a user common to the first domain and the second domain; a learning unit 123 for generating a model by simultaneously learning, based on the time series data corresponding to the identified combination, the relationship of appearance between an item and other items and the relationship between the common users corresponding to each time series data; and a feature vector generation unit 124 for generating, based on the generated model, a feature vector of each item and a feature vector of each user in which the relationship of appearance of the items in each domain is reflected. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing device, an information processing method, and a program for generating feature vectors.

Time series data analysis is used in a wide range of industries. It often involves estimating future data from past data. Linear time series analysis methods (for example, the autoregressive integrated moving average model) have long been common, but machine learning methods have also been proposed. Machine learning methods can effectively handle nonlinear time series data and time series data with complex periodicity.

In recent years, machine learning methods have been proposed that assign a feature vector to each item of time series data in order to cluster items, classify time series patterns, and so on. For example, Non-Patent Document 1 discloses using Word2Vec (Doc2Vec) to assign a feature vector to each item in the time series data sets of all users and recommending items to users based on those feature vectors. Non-Patent Document 2 discloses integrating a plurality of time series data sets of different domains based on the commonality of users and, by focusing on the relationships among the items included in the integrated time series data, assigning feature vectors to those items on the same basis. Using feature vectors generated with the technique described in Non-Patent Document 2 makes it possible to perform transfer learning among a plurality of time series data.

Non-Patent Document 1: Ozsoy, Makbule Gulcin. "From word embeddings to item recommendation." arXiv preprint arXiv:1601.01356, 2016.
Non-Patent Document 2: Hao Niu, et al. "Transfer Learning Among Time Series Data." The 21st Information-Based Induction Sciences Workshop. Sapporo, 2018.

Among the users common to a plurality of domains, some behave differently in each domain. The technique disclosed in Non-Patent Document 2 does not take such users into account, so feature vectors end up being generated from the time series data of users whose behavioral tendencies differ between the domains. It is therefore desirable to be able to generate feature vectors based on the time series data sets of users whose behavioral tendencies are similar across the domains.

The present invention has been made in view of these points, and its object is to provide an information processing device, an information processing method, and a program capable of generating feature vectors in consideration of users' behavioral tendencies in a plurality of domains.

An information processing device according to a first aspect of the present invention includes: a time series data acquisition unit that acquires first time series data, which is time series data of a first domain, and second time series data, which is time series data of a second domain, each time series data including item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event; an identification unit that identifies a combination of the first time series data and the second time series data corresponding to a common user, that is, a user common to the first domain and the second domain; a learning unit that, based on the first time series data and the second time series data corresponding to the combination identified by the identification unit, simultaneously learns the relationship of appearance between the item and other items and the relationship between the common user corresponding to the first time series data and the common user corresponding to the second time series data, thereby generating a model that predicts other items or other users related to the item or the user; and a feature vector generation unit that, based on the model generated by the learning unit, generates a feature vector of the item and a feature vector of the user in which the relationship of appearance of the items in the first domain and the second domain is reflected.

The information processing device may further include an extraction unit that, based on a predetermined condition, extracts from the first time series data a plurality of first partial time series data, each corresponding to a partial period within the period covered by the first time series data, and extracts from the second time series data a plurality of second partial time series data, each corresponding to a partial period within the period covered by the second time series data. In that case, the identification unit may identify, for each of a plurality of common users, a plurality of combinations of first partial time series data and second partial time series data, and the learning unit may learn the appearance relationship of the items and the relationship of the users based on the first partial time series data and the second partial time series data corresponding to the plurality of combinations.

The extraction unit may extract the plurality of first partial time series data and second partial time series data based on the appearance time of an item corresponding to a predetermined event.
The extraction unit may extract a plurality of first partial time series data and second partial time series data that have the same time interval.
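
The two extraction conditions above can be sketched as follows. This is a minimal illustration and not the implementation of the extraction unit: the `Event` record, the `"view"`/`"purchase"` labels, and both function names are hypothetical, and the purchase event stands in for the "predetermined event".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    user_id: str
    item_id: str
    kind: str    # hypothetical labels: "view" or "purchase"
    time: float  # occurrence time in arbitrary units


def split_by_event(events: List[Event], kind: str = "purchase") -> List[List[Event]]:
    """Close a partial series each time an event of the given kind appears."""
    parts: List[List[Event]] = []
    current: List[Event] = []
    for ev in sorted(events, key=lambda e: e.time):
        current.append(ev)
        if ev.kind == kind:
            parts.append(current)
            current = []
    if current:  # trailing events with no closing event
        parts.append(current)
    return parts


def split_by_interval(events: List[Event], width: float) -> List[List[Event]]:
    """Group events into consecutive windows of equal length `width`."""
    if not events:
        return []
    ordered = sorted(events, key=lambda e: e.time)
    start = ordered[0].time
    buckets: Dict[int, List[Event]] = {}
    for ev in ordered:
        buckets.setdefault(int((ev.time - start) // width), []).append(ev)
    return [buckets[k] for k in sorted(buckets)]
```

Applying the same function to a user's first and second time series data yields the partial series from which combinations can then be formed; empty windows are simply skipped in this sketch.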

The learning unit may generate, for each of the first time series data and the second time series data identified by the identification unit, sequence data containing item identification information that identifies the items in their order of appearance; assign to the first time series data and the second time series data learning user identification information for distinguishing the user of the first time series data from the user of the second time series data; and learn the appearance relationship of the items and the relationship of the users based on the generated sequence data and the learning user identification information.

The learning unit may construct a first composite variable by linearly combining the user identification information in the first domain and a second composite variable by linearly combining the user identification information in the second domain. By performing canonical correlation analysis between the user identification information in the first domain and that in the second domain, the learning unit may learn the coefficients corresponding to each piece of user identification information included in the first and second composite variables at which the correlation coefficient between the two composite variables becomes relatively large, and may learn the relationship of the users based on those coefficients.
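
In symbols, the canonical correlation analysis described above can be summarized roughly as follows. The notation is an assumption introduced here for illustration and does not appear in the source: x_i and y_j stand for the variables representing user identification information in the first and second domains, α_i and β_j for their coefficients, and a and b for the first and second composite variables.

```latex
\begin{aligned}
a &= \sum_{i} \alpha_i x_i, \qquad b = \sum_{j} \beta_j y_j,\\
(\alpha^{*}, \beta^{*}) &= \arg\max_{\alpha,\,\beta}\ \rho(a, b),
\qquad
\rho(a, b) = \frac{\operatorname{Cov}(a, b)}{\sqrt{\operatorname{Var}(a)\,\operatorname{Var}(b)}}.
\end{aligned}
```

The coefficients at which ρ(a, b) becomes large are the ones on which the learning of the user relationship is based.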

The first time series data includes at least one of a predetermined event and an event different from the predetermined event. The feature vector generation unit may generate, based on the feature vectors of the items included in at least one of the first time series data and the second time series data, the feature vectors of the users corresponding to the first time series data and to the second time series data as a first feature vector and a second feature vector, respectively. The information processing device may further include a prediction unit that generates, based on the first feature vector (the feature vector of a user corresponding to the first domain) and the result of whether that user generated the predetermined event, a classifier that, given a first feature vector as input, classifies users into those who generated the predetermined event and those who did not. By inputting a second feature vector into the generated classifier, the prediction unit classifies the user corresponding to that second feature vector as either a user predicted to generate the predetermined event or a user predicted not to generate it.

The prediction unit may select the common users to be used for generating the classifier based on the distance between the first feature vector and the second feature vector corresponding to each common user, and may generate the classifier based on the first feature vectors of the selected common users and the results of whether those common users generated the predetermined event.
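
The distance-based selection of common users can be sketched as follows. The source only says that the selection is based on the distance between the two feature vectors. The Euclidean metric, the rule of keeping users whose distance is at or below a threshold, and the name `select_common_users` are assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence


def select_common_users(
    first_vecs: Dict[str, Sequence[float]],
    second_vecs: Dict[str, Sequence[float]],
    threshold: float,
) -> List[str]:
    """Keep common users whose first and second feature vectors lie close together.

    Assumption: "close" means Euclidean distance <= threshold.
    """
    selected = []
    # Only users present in both domains are candidates.
    for user_id in sorted(first_vecs.keys() & second_vecs.keys()):
        if math.dist(first_vecs[user_id], second_vecs[user_id]) <= threshold:
            selected.append(user_id)
    return selected
```

The first feature vectors of the returned users, together with whether each user generated the predetermined event, would then serve as the training set for the classifier.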

The information processing device may further include an advertisement distribution unit that distributes, to users predicted by the classifier to generate the predetermined event, advertisements directed at users who generated the predetermined event.

An information processing method according to a second aspect of the present invention is executed by a computer and includes the steps of: acquiring first time series data, which is time series data of a first domain, and second time series data, which is time series data of a second domain, each time series data including item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event; identifying a combination of the first time series data and the second time series data corresponding to a common user, that is, a user common to the first domain and the second domain; generating a model that predicts other items or other users related to the item or the user by simultaneously learning, based on the first time series data and the second time series data corresponding to the identified combination, the relationship of appearance between the item and other items and the relationship between the common user corresponding to the first time series data and the common user corresponding to the second time series data; and generating, based on the generated model, a feature vector of the item and a feature vector of the user in which the relationship of appearance of the items in the first domain and the second domain is reflected.

A program according to a third aspect of the present invention causes a computer to function as: a time series data acquisition unit that acquires first time series data, which is time series data of a first domain, and second time series data, which is time series data of a second domain, each time series data including item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event; an identification unit that identifies a combination of the first time series data and the second time series data corresponding to a common user, that is, a user common to the first domain and the second domain; a learning unit that, based on the first time series data and the second time series data corresponding to the combination identified by the identification unit, simultaneously learns the relationship of appearance between the item and other items and the relationship between the common user corresponding to the first time series data and the common user corresponding to the second time series data, thereby generating a model that predicts other items or other users related to the item or the user; and a feature vector generation unit that, based on the model generated by the learning unit, generates a feature vector of the item and a feature vector of the user in which the relationship of appearance of the items in the first domain and the second domain is reflected.

According to the present invention, feature vectors can be generated in consideration of users' behavioral tendencies in a plurality of domains.

A diagram illustrating an overview of the information processing device according to the first embodiment.
A diagram showing the configuration of the information processing device according to the first embodiment.
A diagram showing examples of the first time series data and the second time series data according to the first embodiment.
A diagram illustrating an example of learning by the learning unit according to the first embodiment.
A diagram showing an example in which the user feature vectors generated by the feature vector generation unit according to the first embodiment are arranged in a feature space.
A diagram showing an example in which the classifier according to the first embodiment has been trained.
A diagram showing an example in which second feature vectors are classified by the classifier according to the first embodiment.
A flowchart showing the flow of processing when the information processing device according to the first embodiment generates user feature vectors and item feature vectors.
A flowchart showing the flow of processing when the information processing device according to the first embodiment predicts users who will generate a predetermined event.
A diagram showing the configuration of the information processing device according to the second embodiment.

<First Embodiment>
[Overview of the information processing device]
FIG. 1 is a diagram illustrating an overview of the information processing device according to the first embodiment. The information processing device is a computer that generates feature vectors indicating the characteristics of users corresponding to time series data of different domains and feature vectors indicating the characteristics of the items included in that time series data.

The information processing device acquires first time series data, which is time series data of a first domain, and second time series data, which is time series data of a second domain. Each time series data includes item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event.

The information processing device identifies a combination of the first time series data and the second time series data corresponding to a common user, that is, a user common to the first domain and the second domain. Based on the first time series data and the second time series data included in the identified combination, the information processing device simultaneously learns the relationship of appearance between an item and other items and the relationship between the common user corresponding to the first time series data and the common user corresponding to the second time series data, and generates a model that predicts other items or other users related to an item or a user.

In the example shown in FIG. 1, the information processing device learns the relationship of appearance of items based on an item I_t and the other items I_{t-2}, I_{t-1}, I_{t+1}, and I_{t+2} that appear before and after the item I_t. The information processing device also learns the relationship between the common user u corresponding to the first time series data and the common user u' corresponding to the second time series data. As a result, the information processing device generates a model including the weight matrices W and D, as shown in FIG. 1.
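
The pairing of an item I_t with its neighbours I_{t-2} to I_{t+2} can be illustrated with a short sketch. It covers only the construction of context windows from an item sequence. How these pairs feed into the model with the weight matrices W and D is not shown, and the function name `context_pairs` and the default window of two are assumptions chosen to match the example in FIG. 1.

```python
from typing import List, Tuple


def context_pairs(sequence: List[str], window: int = 2) -> List[Tuple[str, List[str]]]:
    """Pair each item I_t with the items within `window` positions before and after it."""
    pairs = []
    for t, target in enumerate(sequence):
        before = sequence[max(0, t - window):t]   # I_{t-window} .. I_{t-1}
        after = sequence[t + 1:t + 1 + window]    # I_{t+1} .. I_{t+window}
        pairs.append((target, before + after))
    return pairs


# Usage: the third item of this sequence is paired with its four neighbours.
pairs = context_pairs(["v1", "v2", "v1", "v3", "v2"])
```

Items near the start or end of a sequence simply receive a shorter context in this sketch.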

Based on the weight matrices W and D included in the generated model, the information processing device generates a feature vector of each item and a feature vector of each user in which the relationship of appearance of the items in the first domain and the second domain is reflected. In this way, the information processing device can generate feature vectors in consideration of users' behavioral tendencies in a plurality of domains.
The configuration of the information processing device is described below.

[Configuration example of information processing device 1]
FIG. 2 is a diagram showing the configuration of the information processing device 1 according to the first embodiment. The information processing device 1 includes a storage unit 11 and a control unit 12.

The storage unit 11 is, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), or the like. The storage unit 11 stores various programs for operating the information processing device 1. For example, the storage unit 11 stores a program that causes the control unit 12 of the information processing device 1 to function as a time series data acquisition unit 121, an identification unit 122, a learning unit 123, a feature vector generation unit 124, a prediction unit 125, and an advertisement distribution unit 126. This program may consist of a plurality of programs.

The control unit 12 is, for example, a CPU (Central Processing Unit). The control unit 12 controls the functions of the information processing device 1 by executing the various programs stored in the storage unit 11. By executing the program stored in the storage unit 11, the control unit 12 functions as the time series data acquisition unit 121, the identification unit 122, the learning unit 123, the feature vector generation unit 124, the prediction unit 125, and the advertisement distribution unit 126.

[Generation of item feature vectors]
In the present embodiment, the time series data acquisition unit 121, the identification unit 122, the learning unit 123, and the feature vector generation unit 124 cooperate to generate the feature vectors of the items included in the time series data and the feature vectors of the users. The functions of these four units as they relate to generating item feature vectors are described below.

The time series data acquisition unit 121 acquires first time series data, which is time series data of the first domain, and second time series data, which is time series data of the second domain. For example, the time series data acquisition unit 121 acquires information indicating the item browsing and purchase histories of a plurality of users on an EC site as the first time series data, and acquires information indicating the website browsing history of each of the plurality of users as the second time series data. The time series data acquisition unit 121 acquires a plurality of first time series data and a plurality of second time series data from, for example, an information collection server (not shown) that collects the first and second time series data, at intervals of, for example, one week.

FIG. 3 is a diagram showing examples of the first time series data and the second time series data according to the first embodiment. FIG. 3(a) shows an example of the first time series data D1, and FIG. 3(b) shows an example of the second time series data D2.

The first time series data includes a plurality of event data, each of which associates item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event.

In the example shown in FIG. 3, the elliptical marks M1 and M2 and the diamond-shaped mark M3 represent event data, and the display form of a mark indicates the type of event. Event data corresponding to the mark M1 indicates that the user browsed an item on the EC site. Event data corresponding to the mark M2 indicates that the user purchased an item on the EC site. Event data corresponding to the mark M3 indicates that the user browsed an item on a website. Here, an item is, for example, a product or a service. In the present embodiment, the symbols v1 to v6 and w1 to w8 shown above the marks M1 to M3 serve as identification information identifying the items.

The first time series data includes at least one of item information corresponding to a predetermined event and item information corresponding to an event different from the predetermined event. The second time series data includes item information corresponding to an event different from the predetermined event. For example, in the first time series data, the predetermined event is an event in which the user purchases an item on the EC site, and an event different from the predetermined event is an event in which the user browses the EC site. An event in the second domain is an event different from the predetermined event, namely an event in which the user browses a website.

The arrows shown in FIG. 3 correspond to the times at which events occurred. For example, the first time series data D1 shows that, on the EC site, items v1, v2, v1, and v3 were browsed in that order and then item v2 was purchased, after which items v4, v5, v6, v6, and v5 were browsed in that order and then item v5 was purchased.
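
As an illustration of how event data maps to an ordered item sequence, the following sketch rebuilds the D1 example. The tuple layout, the `"view"`/`"purchase"` labels, the integer times, and the function name `item_sequences` are assumptions for this example. Only the order of items and the two purchases come from the description of FIG. 3.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (user_id, item_id, event_type, time); the labels and integer times are hypothetical.
EventRow = Tuple[str, str, str, int]


def item_sequences(rows: Iterable[EventRow]) -> Dict[str, List[Tuple[str, str]]]:
    """Order each user's events by occurrence time and keep (item, event type)."""
    per_user = defaultdict(list)
    for user_id, item_id, event_type, _time in sorted(rows, key=lambda r: r[3]):
        per_user[user_id].append((item_id, event_type))
    return dict(per_user)


# First time series data D1 for user u, in the order described for FIG. 3.
D1_ORDER = [
    ("v1", "view"), ("v2", "view"), ("v1", "view"), ("v3", "view"), ("v2", "purchase"),
    ("v4", "view"), ("v5", "view"), ("v6", "view"), ("v6", "view"), ("v5", "view"),
    ("v5", "purchase"),
]
rows = [("u", item, kind, t) for t, (item, kind) in enumerate(D1_ORDER)]

# The input order does not matter; events are sorted by time.
sequences = item_sequences(reversed(rows))
purchases = [item for item, kind in sequences["u"] if kind == "purchase"]
```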

The user identification information is information that can uniquely identify a user, for example the IP address assigned to the terminal the user uses. The user ID used to identify the user on the EC site, or the user ID used to identify the user on each website, may be used as the user identification information instead. In the example shown in FIG. 3, for convenience of explanation, user u is the user identification information corresponding to each time series data.

The first time series data is, for example, an access history of the EC site and includes a plurality of event data, each of which associates the IP address of the terminal that accessed the EC site, a URL on the EC site, and the time at which that URL was accessed. The URLs on the EC site include the URL of the purchase completion page displayed on the user's terminal when an item is purchased and the URLs of pages describing items. The URL of the purchase completion page corresponds to the item information of a purchase event, and the URL of a page describing an item corresponds to the item information of a browsing event. The access time of a URL corresponds to the time at which the event occurred.

The second time-series data is, for example, an access history of a website. It contains a plurality of event data, each associating the IP address of a terminal that accessed the website, a URL on the website, and the time at which that URL was accessed. The URLs on the website include the URLs of pages describing items. The URL of a page describing an item corresponds to the item information of a browsing event on the website. The access time of a URL corresponds to the time at which the event occurred.

The identification unit 122 identifies, among the users corresponding to the time-series data, combinations of first time-series data and second time-series data that correspond to a common user, that is, a user common to the first domain and the second domain. For example, when the user identification information included in a piece of first time-series data matches the user identification information included in a piece of second time-series data, the identification unit 122 determines that the user corresponding to these time-series data is a common user. The identification unit 122 then identifies these time-series data as a combination of first time-series data and second time-series data corresponding to the common user.

When the user identification information consists of a user ID used to identify the user on the EC site and a user ID used to identify the user on each website, the storage unit 11 stores correspondence information indicating the correspondence between these user IDs. The identification unit 122 identifies common users by referring to the correspondence information stored in the storage unit 11.
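The matching performed by the identification unit 122 can be sketched as follows. This is a minimal illustration, not the patented implementation: the event tuple layout `(user_id, item, timestamp)`, the function names, and the `id_map` argument (standing in for the correspondence information in the storage unit 11) are all assumptions made for the example.

```python
from collections import defaultdict


def group_by_user(events):
    """Group (user_id, item, timestamp) events into per-user lists sorted by time."""
    grouped = defaultdict(list)
    for user_id, item, timestamp in events:
        grouped[user_id].append((timestamp, item))
    return {user: sorted(rows) for user, rows in grouped.items()}


def pair_common_users(first_events, second_events, id_map=None):
    """Return {first_domain_user: (first_series, second_series)} for common users.

    id_map (hypothetical) maps a first-domain user ID to the matching
    second-domain user ID; when it is None, the IDs must match exactly.
    """
    first = group_by_user(first_events)
    second = group_by_user(second_events)
    pairs = {}
    for user, series in first.items():
        other = id_map.get(user) if id_map is not None else user
        if other in second:
            pairs[user] = (series, second[other])
    return pairs


# Hypothetical toy data: (user_id, item, timestamp)
ec_events = [("u1", "v1", 1), ("u1", "v2", 3), ("u2", "v3", 2)]
web_events = [("u1", "w1", 2), ("u3", "w2", 4)]

common = pair_common_users(ec_events, web_events)
mapped = pair_common_users(ec_events, web_events, id_map={"u2": "u3"})
```

With exact matching only `u1` is a common user; with the hypothetical mapping `u2 -> u3`, only that mapped pair is returned.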

Based on the first time-series data and the second time-series data corresponding to a combination identified by the identification unit 122, the learning unit 123 simultaneously learns the relationship between the appearance of an item and the appearance of other items, and the relationship between the common user corresponding to the first time-series data and the common user corresponding to the second time-series data. The learning unit 123 thereby generates a model that predicts other items or other users related to a given item or user.

Specifically, the learning unit 123 first generates, for each of the first time-series data and the second time-series data, sequence data containing item identification information in the order in which the items appear. For example, the learning unit 123 generates the first sequence data [v1, v2, v1, v3, v2, v4, v5, v6, v6, v5, v5] as the sequence data corresponding to the first time-series data D1 shown in FIG. 3(a). Likewise, the learning unit 123 generates the second sequence data [w1, w2, w2, w3, w3, w4, w5, w6, w7, w7, w8] as the sequence data corresponding to the second time-series data D2 shown in FIG. 3(b).
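As a rough illustration of this step, the sketch below turns a list of events into sequence data by sorting on the event time and keeping only the item identifiers. The `(timestamp, event_type, item)` layout and the integer timestamps are assumptions; the item order reproduces the D1 example above, with browsing and purchase events both contributing to the sequence.

```python
def to_sequence(events):
    """Sort (timestamp, event_type, item) events by time and keep the item IDs."""
    return [item for _, _, item in sorted(events, key=lambda e: e[0])]


# Items of D1 in order of occurrence; the two purchases are v2 and the final v5.
d1_items = ["v1", "v2", "v1", "v3", "v2", "v4", "v5", "v6", "v6", "v5", "v5"]
d1_types = ["view"] * 4 + ["purchase"] + ["view"] * 5 + ["purchase"]

# Hypothetical timestamps 0, 1, 2, ... attached to each event.
d1_events = [(t, kind, item)
             for t, (kind, item) in enumerate(zip(d1_types, d1_items))]

# Input order does not matter: the events are re-sorted by timestamp.
first_sequence = to_sequence(list(reversed(d1_events)))
```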

Next, the learning unit 123 assigns to the first time-series data D1 and the second time-series data D2 learning identification information, which is user identification information for learning that distinguishes the user of the first time-series data D1 from the user of the second time-series data D2. For example, the learning unit 123 assigns learning identification information u to the first time-series data D1 and learning identification information u' to the second time-series data D2.

Using the generated sequence data and the learning identification information assigned to the time-series data, the learning unit 123 simultaneously learns the relationship between the appearance of an item and the appearance of other items, and the relationship between the common user corresponding to the first time-series data and the common user corresponding to the second time-series data.

For example, the learning unit 123 treats the first sequence data and the second sequence data each as a single sentence, and learns these sentences, together with the learning identification information assigned to the first time-series data and the learning identification information assigned to the second time-series data, using Doc2Vec. It thereby generates a feature vector for each of the plurality of items and feature vectors corresponding to the learning identification information u and u'. Although u and u' denote the same common user, the learning takes into account the behavioral tendencies (browsing behavior and purchasing behavior) of that common user in the first domain and the second domain, so a distinct feature vector is generated for each of u and u'.

The learning unit 123 learns so as to maximize the objective function T shown in equation (1) below.

Here, U in the expression for the objective function T is the set of common users, and P(x|y) is a conditional probability (the probability that x appears given that y has appeared). I_u is the sequence data of user u, and K is the window-size parameter. α_u is a per-user weighting coefficient, for example the reciprocal of the number of items in the sequence data corresponding to the user.
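The body of equation (1) is not available in this text. Purely as an illustration of how the symbols defined above could fit together, one plausible form, assuming a Doc2Vec-style objective in which each item is predicted from its neighbours and its user, is sketched below. The index notation v_i, w_j, the summation ranges, and the reading of K as the number of neighbours on each side are assumptions, not the published equation; only the user term log P(u|u') + log P(u'|u) is stated in the description.

```latex
T \;=\; \sum_{(u,\,u') \in U} \Bigl[
      \alpha_{u}  \sum_{v_i \in I_{u}}  \log P\bigl(v_i \mid v_{i-K}, \dots, v_{i+K},\, u\bigr)
    + \alpha_{u'} \sum_{w_j \in I_{u'}} \log P\bigl(w_j \mid w_{j-K}, \dots, w_{j+K},\, u'\bigr)
    + \log P(u \mid u') + \log P(u' \mid u)
\Bigr]
```

Under this reading, the first two sums capture the appearance relationship between items within each domain, and the last two terms tie together the two learning identifiers of the same common user.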

The learning unit 123 may instead learn so as to maximize the objective function T shown in equation (2) below.

In equation (2), the term for the relationship between common users is factored out of the item-relationship terms as M(u, u'). Setting M(u, u') = log P(u|u') + log P(u'|u) makes equation (2) identical to equation (1).

Here, the learning unit 123 may derive M(u, u') by canonical correlation analysis. For example, the learning unit 123 constructs a first composite variable as a linear combination of the user identification information of the first domain, and a second composite variable as a linear combination of the user identification information of the second domain. By performing canonical correlation analysis between the user identification information of the first domain and that of the second domain, the learning unit 123 learns the coefficients of the user identification information in the first and second composite variables for which the correlation coefficient between the two composite variables becomes relatively large, and learns the relationship between users based on those coefficients.

Canonical correlation analysis is a statistical method for relating the user identification information of two domains. It constructs a composite variable (canonical variable) as a linear combination of the user identification information of each domain, and learns the coefficients of the linear combinations so that the correlation coefficient between the canonical variables (the canonical correlation coefficient) becomes large. Here, the user identification information of each domain is a one-hot vector identifying a user of that domain, and the coefficients of the linear combination are regarded as the feature vector of each user. Applying canonical correlation analysis under this setting yields each user's feature vector as the linear-combination coefficients that better relate the common users. To evaluate all canonical variables together, M(u, u') is set to the trace norm of the correlation matrix between the canonical variables. This corresponds to the case where the number of canonical variables equals the number of common users. To evaluate only a small number k of important canonical variables, M(u, u') may instead be set to the sum of the first through k-th eigenvalues of the correlation matrix between the canonical variables.

The learning unit 123 defines input vectors and output vectors whose number of elements equals the total number of pieces of item identification information. Based on the window size, it generates combinations of an input vector and an output vector that represent the appearance relationship between items. Similarly, the learning unit 123 defines input vectors and output vectors whose number of elements equals the number of pieces of learning identification information, and generates combinations of an input vector and an output vector that represent the relationship between pieces of learning identification information. These input and output vectors are one-hot vectors.

FIG. 4 is a diagram illustrating an example of learning by the learning unit 123 according to the first embodiment. To learn how other items appear relative to the item "v1" that appears third in the first sequence data [v1, v2, v1, v3, v2, v4, v5, v6, v6, v5, v5], the learning unit 123 extracts the items adjacent to that item based on the window size. When the window size is "5", the learning unit 123 extracts the two items before and the two items after the third item "v1", namely "v1", "v2", "v3", and "v2".

Next, the learning unit 123 generates, as input vectors, one-hot vectors each indicating that one of these two preceding and two following items has appeared, and generates, as the output vector, a one-hot vector indicating that the third item "v1" has appeared. This produces combinations of input and output vectors that represent the appearance relationship between items.

The learning unit 123 also generates, as an input vector, a one-hot vector indicating that the learning identification information "u" corresponding to the first time-series data has appeared, and, as an output vector, a one-hot vector indicating that the learning identification information "u'" of the second time-series data corresponding to "u" has appeared. This produces a combination of input and output vectors that represents the relationship between users. In addition, for the output vector indicating the appearing item "v1", the learning unit 123 also uses as an input vector the one-hot vector indicating the learning identification information "u" corresponding to the first sequence data, so that the relationship between item appearance and the user is reflected.
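The construction of training pairs described for FIG. 4 can be sketched as follows. This is an illustrative reading, not the patented implementation: the function names are hypothetical, the window size is taken to include the target item (so a window of 5 gives two neighbours per side, as in the example), and positions near the ends of the sequence simply receive fewer neighbours, which the source does not specify.

```python
def context_pairs(sequence, user, window_size=5):
    """Yield (context_items, user, target_item) for every position in the sequence.

    window_size counts the target itself, so window_size=5 means up to two
    neighbours on each side (assumption based on the FIG. 4 example).
    """
    half = (window_size - 1) // 2
    for i, target in enumerate(sequence):
        before = sequence[max(0, i - half):i]
        after = sequence[i + 1:i + 1 + half]
        yield before + after, user, target


def one_hot(token, vocabulary):
    """Return a one-hot list over the vocabulary with a 1 at the token's position."""
    return [1 if token == entry else 0 for entry in vocabulary]


first_sequence = ["v1", "v2", "v1", "v3", "v2", "v4", "v5", "v6", "v6", "v5", "v5"]
item_vocab = sorted(set(first_sequence))   # ['v1', ..., 'v6']
user_vocab = ["u", "u'"]                   # learning identification information

item_pairs = list(context_pairs(first_sequence, "u"))
context, user, target = item_pairs[2]      # third item in the sequence

# Input vectors: one-hot context items plus the one-hot user; output: the target item.
inputs = [one_hot(c, item_vocab) for c in context] + [one_hot(user, user_vocab)]
output = one_hot(target, item_vocab)

# User-relationship pairs: u predicts u' and u' predicts u.
user_pairs = [(one_hot("u", user_vocab), one_hot("u'", user_vocab)),
              (one_hot("u'", user_vocab), one_hot("u", user_vocab))]
```

For the third item the context is `["v1", "v2", "v3", "v2"]` and the target is `"v1"`, matching the example above.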

The learning unit 123 generates combinations of input and output vectors for the second sequence data [w1, w2, w2, w3, w3, w4, w5, w6, w7, w7, w8] by the same procedure. The learning unit 123 also generates, as an input vector, a one-hot vector indicating that the learning identification information "u'" corresponding to the second time-series data has appeared, and, as an output vector, a one-hot vector indicating that the learning identification information "u" of the first time-series data corresponding to "u'" has appeared. Further, for each output vector indicating an appearing item, the learning unit 123 generates, as an input vector, the one-hot vector indicating the learning identification information "u'" corresponding to the second sequence data.

The learning unit 123 may create either the first sequence data or the second sequence data from mixed time-series data that combines the first time-series data and the second time-series data. For example, the output vector corresponding to the learning identification information "u" shown in the left-hand example of FIG. 4 may be replaced by the output vector corresponding to the learning identification information "u'", which denotes the same user as "u". For example, the learning unit 123 merges the items of the first time-series data D1 and the second time-series data D2 shown in FIG. 3 in chronological order to generate a single piece of time-series data. The learning unit 123 may then use the sequence data [v1, w1, w2, v2, w2, v1, w3, v3, w3 …] corresponding to that time-series data as the first sequence data used to learn the learning identification information "u" shown on the left side of FIG. 4, and generate combinations of input and output vectors that represent the appearance relationship between items. In this way, the mixed time-series data captures the relationship between item appearances across the two time series.
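A chronological merge of this kind can be sketched with the standard-library `heapq.merge`. The `(timestamp, item)` layout and the timestamps themselves are hypothetical; they were chosen only so that the merged prefix matches the sequence quoted above.

```python
import heapq


def merge_series(first_series, second_series):
    """Merge two time-sorted lists of (timestamp, item) into one item sequence."""
    merged = heapq.merge(first_series, second_series, key=lambda e: e[0])
    return [item for _, item in merged]


# Hypothetical timestamps for the first few events of D1 and D2.
d1 = [(1, "v1"), (4, "v2"), (6, "v1"), (8, "v3")]
d2 = [(2, "w1"), (3, "w2"), (5, "w2"), (7, "w3"), (9, "w3")]

mixed_sequence = merge_series(d1, d2)
```

Both inputs must already be sorted by time, which is what `heapq.merge` requires; the mixed sequence can then be windowed exactly like a single-domain sequence.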

The learning unit 123 may also take, as the output vector, a one-hot vector indicating that an item appearing in the second (first) sequence data of a common user has appeared, extract the items of the first (second) sequence data that are temporally adjacent to that item, take, as input vectors, one-hot vectors indicating that one of those items has appeared, and generate combinations of these input and output vectors.

For example, for the left-hand example of FIG. 4, the learning unit 123 extracts the items "va", "vb", "vc", and "vd" as the items of the first sequence data that are temporally adjacent (for example, the two items before and the two items after) to the time at which the item "wi" appears in the second sequence data. The learning unit 123 then generates combinations in which the one-hot vectors corresponding to "va", "vb", "vc", and "vd" are the input vectors and the one-hot vector corresponding to "wi" is the output vector. This also allows the input and output vectors to capture the relationship between item appearances across the two time series.

In parallel with the learning described above, the learning unit 123 may also generate sequence data from each of the first time-series data and the second time-series data of users other than the common users, and learn them using Doc2Vec. This allows the information processing device 1 to generate feature vectors for items that did not appear in the first and second time-series data of the common users, and for users other than the common users.

The learning unit 123 then performs learning based on the generated combinations of input and output vectors so that the objective function T is maximized, thereby generating a model that predicts other items or other users related to a given item or user. The generated model includes weight matrices W and D, which map an input vector supplied to the input layer to the hidden layer, and weight matrices W' and D' (not shown), which map the vector output from the hidden layer to the output layer.

Based on the model generated by the learning unit 123, the feature vector generation unit 124 generates item feature vectors and user feature vectors that reflect the appearance relationships of items in the first domain and the second domain. Specifically, the feature vector generation unit 124 generates a feature vector for each of the plurality of items, and the user feature vectors, based on the weight matrices W and D included in the model generated by the learning unit 123.

FIG. 5 is a diagram showing an example in which the user feature vectors generated by the feature vector generation unit 124 according to the first embodiment are placed in a feature space. For convenience of explanation, FIG. 5 shows the feature space reduced to two dimensions, with marks indicating the user feature vectors placed in it. Mark M4 in FIG. 5 indicates the feature vector of a user corresponding to the first time-series data, and mark M5 indicates the feature vector of a user corresponding to the second time-series data. Marks M4 and M5 are labeled with the learning identification information u1 to u3 and u1' to u3'.

For example, FIG. 5 shows that the distance between the feature vectors of the learning identification information u1 and u1', and the distance between those of u3 and u3', are shorter than the distance between the feature vectors of u2 and u2'. This indicates that the common user corresponding to u1 and u1', and the common user corresponding to u3 and u3', change their behavioral tendencies less between the first domain and the second domain than the common user corresponding to u2 and u2'.

[Transfer learning and prediction of occurrence of predetermined events]
In the present embodiment, the feature vector generation unit 124 and the prediction unit 125 cooperate to perform transfer learning between time-series data corresponding to different domains, and to predict whether a user corresponding to the second time-series data will generate a predetermined event. The functions of the feature vector generation unit 124 and the prediction unit 125 related to transfer learning and to predicting the occurrence of a predetermined event are described below.

The feature vector generation unit 124 generates the user's feature vectors in the first domain and in the second domain, as a first feature vector and a second feature vector respectively, based on the feature vectors of the plurality of items indicated by the plurality of item information included in at least one of the first time-series data and the second time-series data.

For example, the feature vector generation unit 124 generates the first feature vector and the second feature vector of a user by computing the mean (for example, the arithmetic mean or a weighted mean) of the feature vectors generated for the plurality of item information included in at least one of that user's first time-series data and second time-series data.
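The averaging step can be sketched as follows. The two-dimensional item vectors and the weights are made-up values for illustration; in the described device the item vectors would come from the learned model, and the source does not say how weights for the weighted mean are chosen.

```python
def mean_vector(vectors, weights=None):
    """Return the (optionally weighted) mean of equal-length vectors."""
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total
            for d in range(dim)]


# Hypothetical learned item feature vectors.
item_vectors = {"v1": [1.0, 0.0], "v2": [0.0, 1.0], "v3": [1.0, 1.0]}
user_sequence = ["v1", "v2", "v1", "v3"]
vectors = [item_vectors[item] for item in user_sequence]

# Arithmetic mean over all item occurrences in the user's sequence.
first_feature = mean_vector(vectors)

# Weighted mean, here giving the last item twice the weight (arbitrary choice).
weighted_feature = mean_vector(vectors, weights=[1.0, 1.0, 1.0, 2.0])
```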

The feature vector generation unit 124 may also use, as the first feature vector and the second feature vector of a user, the user feature vectors that the learning unit 123 generates by creating sequence data for each of the first time-series data and the second time-series data of users other than the common users and learning them simultaneously using Doc2Vec.

Based on first feature vectors, which are the feature vectors of users in the first domain, and on whether each of those users generated a predetermined event, the prediction unit 125 generates a classifier that, given a first feature vector as input, classifies the user as either a user who generated the predetermined event or a user who did not.

Specifically, the prediction unit 125 refers to the first time-series data of a user in the first domain to determine whether that user generated the predetermined event. The prediction unit 125 then generates the classifier based on the user's first feature vector and the result of whether the user generated the predetermined event. For example, the prediction unit 125 performs learning with the user's first feature vector as the input vector and the result of whether the user generated the predetermined event as the output vector, thereby generating a classifier that, given a feature vector as input, classifies the user as either a user who generated the predetermined event or a user who did not.

The prediction unit 125 may also select the users used to generate the classifier based on the distance between the first feature vector and the second feature vector of each common user. For example, it calculates the distance between the first feature vector and the second feature vector of each common user, and selects, as users for generating the classifier, the common users whose distance is smaller than a predetermined threshold. Generating the classifier using only users whose behavioral tendencies change little between the first domain and the second domain (that is, whose tendencies are similar) allows the generated classifier to be transferred accurately.

The threshold may be a statistic of the distances between the first feature vectors and the second feature vectors of the common users (for example, the mean, or the value at the X-th percentile counting from the smallest). When generating the classifier, the prediction unit 125 may also include the second feature vectors, which are feature vectors of the common users.
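The selection of users by distance can be sketched as follows. The source does not name a distance measure, so Euclidean distance is an assumption, as are the function name and the toy vectors; the mean distance is used as the default threshold, which is one of the statistics mentioned above.

```python
import math
import statistics


def select_stable_users(first_vecs, second_vecs, threshold=None):
    """Return (selected_users, distances) for common users below the threshold.

    Distance is Euclidean (assumption). If no threshold is given, the mean
    distance over all common users is used.
    """
    distances = {user: math.dist(first_vecs[user], second_vecs[user])
                 for user in first_vecs if user in second_vecs}
    if threshold is None:
        threshold = statistics.mean(distances.values())
    selected = [user for user, d in distances.items() if d < threshold]
    return selected, distances


# Hypothetical first/second feature vectors for three common users.
first_vecs = {"u1": [0.0, 0.0], "u2": [0.0, 0.0], "u3": [1.0, 1.0]}
second_vecs = {"u1": [0.1, 0.0], "u2": [3.0, 4.0], "u3": [1.0, 1.2]}

selected, distances = select_stable_users(first_vecs, second_vecs)
```

In this toy data u2 moves far between domains, so only u1 and u3 are kept, mirroring the situation described for FIG. 5.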

FIG. 6 is a diagram showing an example of training the classifier according to the first embodiment. For convenience of explanation, FIG. 6 shows the first feature vectors reduced to two dimensions and placed in the feature space. Mark M6 in FIG. 6 indicates positive example data, that is, a first feature vector corresponding to a user who generated the event, and mark M7 indicates negative example data, that is, a first feature vector corresponding to a user who did not generate the event. The boundary line L indicates the boundary along which the classifier separates the first feature vectors into positive example data and negative example data. The boundary line is shown for convenience of explanation; no boundary line is actually generated.

By inputting a user's second feature vector generated by the feature vector generation unit 124 into the generated classifier, the prediction unit 125 classifies the user as either a user predicted to generate the predetermined event or a user predicted not to generate it. The prediction unit 125 outputs information indicating the classification result.
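The transfer step (train on first feature vectors, predict on second feature vectors) can be sketched as follows. The source does not specify the type of classifier, so the small logistic regression below is an assumption chosen only to keep the example self-contained; the feature vectors, labels, learning rate, and epoch count are likewise made up for illustration.

```python
import math


def train_logistic(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression classifier with plain stochastic gradient ascent."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = y - p                       # gradient of the log-likelihood
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias


def predict(model, x):
    """Return True if the user is predicted to generate the predetermined event."""
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, x)) > 0.0


# First-domain feature vectors with known outcomes (1 = event occurred).
first_features = [[1.0, 0.8], [0.9, 1.1], [-1.0, -0.9], [-0.8, -1.2]]
event_labels = [1, 1, 0, 0]
model = train_logistic(first_features, event_labels)

# Second-domain feature vectors of users whose outcome is to be predicted.
likely = predict(model, [0.7, 0.9])
unlikely = predict(model, [-0.6, -0.7])
```

Because both domains share one feature space, the classifier trained on first feature vectors can be applied to second feature vectors without retraining, which is the point of the transfer described here.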

FIG. 7 is a diagram showing an example in which second feature vectors are classified by the classifier according to the first embodiment. The example in FIG. 7 shows second feature vectors classified by the classifier corresponding to FIG. 6, and displays the same boundary line L as FIG. 6. Mark M8 in FIG. 7 indicates a second feature vector corresponding to a user predicted to generate the predetermined event, and mark M9 indicates a second feature vector corresponding to a user predicted not to generate it. In this way, the information processing device 1 can accurately perform transfer learning between time-series data corresponding to different domains.

[Example of using classification results]
Next, an example of using the classification result is described. The advertisement distribution unit 126 delivers, to users whom the classifier predicts will generate the predetermined event, the same advertisement as, or an advertisement similar to, the one delivered to users who generated the predetermined event. For example, the advertisement distribution unit 126 delivers an advertisement for a predetermined item corresponding to the predetermined event to users predicted to generate that event. In this way, the information processing device 1 can also deliver, to users predicted to generate the predetermined event, advertisements that are effective in prompting that event.

[Process flow in information processing device 1]
Next, an example of the processing flow in the information processing device 1 will be described. First, the processing flow when the information processing device 1 generates user feature vectors and item feature vectors will be described. FIG. 8 is a flowchart showing the processing flow when the information processing device 1 according to the first embodiment generates user feature vectors and item feature vectors.

First, the time series data acquisition unit 121 acquires a plurality of first time series data and a plurality of second time series data (S1).
Subsequently, the identification unit 122 identifies common users who are common to the first domain and the second domain (S2), and identifies the combination of first time series data and second time series data corresponding to each common user (S3).
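
Steps S1 to S3 can be illustrated with a minimal Python sketch. The event layout `(user_id, timestamp, item)` and the function names `group_by_user` and `pair_common_users` are assumptions made for illustration only; the specification does not prescribe a data format or an implementation.

```python
from collections import defaultdict

def group_by_user(events):
    """Group (user_id, timestamp, item) events into per-user series sorted by time."""
    series = defaultdict(list)
    for user_id, timestamp, item in events:
        series[user_id].append((timestamp, item))
    return {user: sorted(rows) for user, rows in series.items()}

def pair_common_users(first_events, second_events):
    """Return {user_id: (first_series, second_series)} for users found in both domains."""
    first = group_by_user(first_events)     # S1: first time series data
    second = group_by_user(second_events)   # S1: second time series data
    common = first.keys() & second.keys()   # S2: users common to both domains
    # S3: one combination of first and second time series data per common user
    return {user: (first[user], second[user]) for user in sorted(common)}

# Hypothetical sample data for two domains.
first_domain = [("u1", 2, "shoes"), ("u1", 1, "socks"), ("u2", 5, "hat")]
second_domain = [("u1", 3, "news"), ("u3", 4, "video")]
pairs = pair_common_users(first_domain, second_domain)
print(pairs)
```

Only `u1` appears in both sample domains, so it is the only user for which a combination is produced.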

Subsequently, the learning unit 123 simultaneously learns the relationship of item appearance and the relationship between users (S4).
Subsequently, the feature vector generation unit 124 generates the feature vectors of the items and the users based on the model generated through learning by the learning unit 123 (S5).

Next, the processing flow when the information processing device 1 predicts users who will generate a predetermined event will be described. FIG. 9 is a flowchart showing the processing flow when the information processing device 1 according to the first embodiment predicts users who will generate a predetermined event. It is assumed that, at the start of this flowchart, the time series data acquisition unit 121 has acquired a plurality of first time series data and a plurality of second time series data, and the feature vector generation unit 124 has generated the user feature vectors and the feature vectors of a plurality of items. This flowchart shows the processing flow when the classifier is generated using only users whose behavioral tendencies are similar.

First, the feature vector generation unit 124 generates the user feature vectors of the first domain and the second domain as first feature vectors and second feature vectors, respectively (S11).
Subsequently, the prediction unit 125 selects, as common users whose behavioral tendencies are similar in the first domain and the second domain, those common users for whom the distance between the first feature vector and the second feature vector is smaller than a predetermined threshold (S12).
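
The selection in S12 can be sketched as follows. This is an illustrative example only: the text here does not name the distance measure, so Euclidean distance is an assumption, as are the function name `select_similar_common_users` and the sample vectors and threshold.

```python
import math

def select_similar_common_users(first_vecs, second_vecs, threshold):
    """Return common users whose first and second feature vectors are closer than threshold."""
    selected = []
    for user in sorted(first_vecs.keys() & second_vecs.keys()):
        # Euclidean distance is an assumed choice of metric.
        distance = math.dist(first_vecs[user], second_vecs[user])
        if distance < threshold:
            selected.append(user)
    return selected

# Hypothetical feature vectors keyed by user identifier.
first_vecs = {"u1": [0.0, 0.0], "u2": [1.0, 1.0], "u3": [2.0, 0.0]}
second_vecs = {"u1": [0.1, 0.0], "u2": [4.0, 5.0]}
similar = select_similar_common_users(first_vecs, second_vecs, threshold=0.5)
print(similar)
```

In this sample, `u3` is not a common user and `u2` behaves differently in the two domains, so only `u1` is kept for classifier generation.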

Subsequently, the prediction unit 125 generates a classifier based on the first feature vectors of the selected common users and the result of whether or not each of those common users generated the predetermined event (S13).

Subsequently, the prediction unit 125 inputs the second feature vectors generated in S11 into the classifier generated in S13, thereby classifying the users corresponding to the second feature vectors into users predicted to generate the predetermined event and users predicted not to generate it (S14).
Subsequently, the prediction unit 125 outputs the classification result of S14 (S15). For example, the prediction unit 125 generates a file containing information indicating the classification result and stores the file in the storage unit 11.
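
Steps S13 to S15 can be illustrated with the hedged sketch below. The classifier type is not fixed in this passage, so a nearest-centroid classifier is used purely as a stand-in; the function names, the sample vectors, and the JSON output format are likewise assumptions. Writing the result to the storage unit 11 is omitted, and the result is only serialized to a string.

```python
import json
import math

def train_centroid_classifier(first_vecs, labels):
    """Fit one centroid per class (event generated or not) from first feature vectors.

    Assumes each class has at least one training user.
    """
    centroids = {}
    for label in (True, False):
        rows = [vec for user, vec in first_vecs.items() if labels[user] == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def classify(centroids, vec):
    """Predict True when vec is nearer to the 'event generated' centroid."""
    return math.dist(vec, centroids[True]) < math.dist(vec, centroids[False])

# S13: train on the first feature vectors of the selected common users.
first_vecs = {"u1": [1.0, 1.0], "u2": [0.9, 1.1], "u3": [-1.0, -1.0]}
labels = {"u1": True, "u2": True, "u3": False}
centroids = train_centroid_classifier(first_vecs, labels)

# S14: classify users of the second domain by their second feature vectors.
second_vecs = {"u1": [0.8, 0.7], "u4": [-0.5, -1.2]}
result = {user: classify(centroids, vec) for user, vec in second_vecs.items()}

# S15: output the classification result (serialized here instead of stored).
report = json.dumps(result, sort_keys=True)
print(report)
```

Because the classifier is trained in the first domain and applied to vectors from the second domain, the sketch only makes sense when both kinds of vectors share one feature space, which is what the joint learning in S4 is intended to provide.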

[Effect in the first embodiment]
As described above, the information processing device 1 according to the first embodiment identifies a combination of the first time series data and the second time series data corresponding to a common user who is common to the first domain and the second domain. Based on the first time series data and the second time series data corresponding to that combination, it simultaneously learns the relationship of appearance between an item and other items and the relationship between the common user corresponding to the first time series data and the common user corresponding to the second time series data, thereby generating a model that predicts other items or other users related to an item or a user. Based on the model generated by the learning unit 123, the information processing device 1 generates item feature vectors and user feature vectors that reflect the relationship of item appearance in the first domain and the second domain. In this way, the information processing device 1 can generate feature vectors that relate the items of time series data from different domains to one another, and can therefore generate feature vectors that take into account users' behavioral tendencies in a plurality of domains.

<Second Embodiment>
Next, the second embodiment will be described. The information processing device 1 according to the second embodiment differs from the first embodiment in that it extracts a plurality of partial time series data from the first time series data and the second time series data of a single common user and learns based on the partial time series data. The information processing device 1 according to the second embodiment is described below. Description of the parts that are the same as in the first embodiment is omitted as appropriate.

FIG. 10 is a diagram showing the configuration of the information processing device 1 according to the second embodiment. The control unit 12 of the information processing device 1 according to the second embodiment further includes an extraction unit 127.
Based on a predetermined condition, the extraction unit 127 extracts from the first time series data a plurality of first partial time series data, each corresponding to a partial period within the period covered by the first time series data. Likewise, the extraction unit 127 extracts from the second time series data a plurality of second partial time series data, each corresponding to a partial period within the period covered by the second time series data.

Specifically, the extraction unit 127 extracts the plurality of first partial time series data and second partial time series data based on the appearance time of the item corresponding to the predetermined event. More specifically, for each of the plurality of common users, the extraction unit 127 first refers to the first time series data and identifies the period extending from the appearance time of the item corresponding to the predetermined event back to a predetermined time (for example, one hour) earlier. The extraction unit 127 then extracts, from the first time series data and the second time series data of each common user, the time series data corresponding to the identified period as the first partial time series data and the second partial time series data. In this way, the information processing device 1 can generate user feature vectors that correspond to the user's behavioral tendency around the time the predetermined event occurs.
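
The event-anchored extraction can be sketched as below. This is a minimal illustration under stated assumptions: timestamps are in seconds, the period is closed at both ends (so the event itself is included), and the function names and sample events are hypothetical.

```python
def windows_before_event(series, target_item, window):
    """Return (start, end) periods covering `window` seconds before each target item."""
    return [(t - window, t) for t, item in series if item == target_item]

def slice_period(series, start, end):
    """Keep the events whose timestamp lies in the closed period [start, end]."""
    return [(t, item) for t, item in series if start <= t <= end]

def extract_partial_series(first_series, second_series, target_item, window=3600):
    """Pair first and second partial series for every period found in the first series."""
    periods = windows_before_event(first_series, target_item, window)
    return [
        (slice_period(first_series, start, end), slice_period(second_series, start, end))
        for start, end in periods
    ]

# Hypothetical series of one common user, as (timestamp_in_seconds, item).
first = [(100, "view"), (3000, "cart"), (4000, "purchase"), (9000, "view")]
second = [(500, "news"), (3500, "review"), (8000, "video")]
partials = extract_partial_series(first, second, "purchase", window=3600)
print(partials)
```

The period is determined from the first domain only and then applied to both domains, so each first partial series is paired with the second-domain activity of the same user in the same hour.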

The extraction unit 127 may instead extract, from the first time series data and the second time series data, a plurality of first partial time series data and second partial time series data that have the same time interval. The time interval is, for example, one day, in which case the extraction unit 127 extracts the time series data of each day contained in the first time series data and the second time series data as the first partial time series data and the second partial time series data. In this way, the information processing device 1 can generate user feature vectors that correspond to the user's daily behavioral tendency.
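
A fixed-interval split with a one-day interval might look like the following sketch. It assumes Unix timestamps in seconds and UTC calendar days; the function name `split_by_day` and the sample events are hypothetical, and the same function would be applied to the first and second time series data separately.

```python
from collections import defaultdict
from datetime import datetime, timezone

def split_by_day(series):
    """Group (unix_timestamp, item) events into partial series keyed by UTC date."""
    days = defaultdict(list)
    for timestamp, item in sorted(series):
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
        days[day].append((timestamp, item))
    return dict(days)

# Hypothetical events: two on the first day, one just after midnight UTC.
series = [(86400, "c"), (0, "a"), (86399, "b")]
daily = split_by_day(series)
print(daily)
```

A deployment would need to decide which time zone defines a "day" for its users; UTC is used here only to keep the example deterministic.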

The identification unit 122 identifies, for each of the plurality of common users, a plurality of combinations of first partial time series data and second partial time series data. The learning unit 123 then simultaneously learns the appearance of the items and the relationship between the users based on the first partial time series data and the second partial time series data corresponding to the plurality of combinations identified by the identification unit 122.

[Effect in the second embodiment]
As described above, the information processing device 1 according to the second embodiment extracts, based on a predetermined condition, a plurality of first partial time series data and second partial time series data from the first time series data and the second time series data, and identifies, for each of the plurality of common users, a plurality of combinations of first partial time series data and second partial time series data. The information processing device 1 then learns the appearance of the items and the relationship between the users based on the first partial time series data and the second partial time series data corresponding to the identified combinations. In this way, the information processing device 1 can generate feature vectors that reflect the user's behavioral tendency during the period corresponding to the predetermined condition.

Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in those embodiments, and various modifications and changes are possible within the scope of its gist. For example, all or part of the device may be functionally or physically distributed or integrated in arbitrary units. New embodiments arising from any combination of the embodiments are also included in the embodiments of the present invention. The effects of such a new embodiment include the effects of the original embodiments.

1 ... information processing device, 11 ... storage unit, 12 ... control unit, 121 ... time series data acquisition unit, 122 ... identification unit, 123 ... learning unit, 124 ... feature vector generation unit, 125 ... prediction unit, 126 ... advertisement distribution unit, 127 ... extraction unit

Claims

An information processing device comprising:
a time series data acquisition unit that acquires first time series data, which is time series data of a first domain, and second time series data, which is time series data of a second domain, each time series data including item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event;
an identification unit that identifies a combination of the first time series data and the second time series data corresponding to a common user who, among the users, is common to the first domain and the second domain;
a learning unit that generates a model for predicting other items or other users related to an item or a user by simultaneously learning, based on the first time series data and the second time series data corresponding to the combination identified by the identification unit, the relationship of appearance between an item and other items and the relationship between the common user corresponding to the first time series data and the common user corresponding to the second time series data; and
a feature vector generation unit that generates, based on the model generated by the learning unit, a feature vector of the item and a feature vector of the user that reflects the relationship of appearance of the items in the first domain and the second domain.

The information processing device according to claim 1, further comprising an extraction unit that, based on a predetermined condition, extracts from the first time series data a plurality of first partial time series data each corresponding to a partial period within the period covered by the first time series data, and extracts from the second time series data a plurality of second partial time series data each corresponding to a partial period within the period covered by the second time series data, wherein
the identification unit identifies, for each of a plurality of the common users, a plurality of combinations of the first partial time series data and the second partial time series data, and
the learning unit learns the appearance of the items and the relationship between the users based on the first partial time series data and the second partial time series data corresponding to the plurality of combinations.

The information processing device according to claim 2, wherein the extraction unit extracts the plurality of first partial time series data and second partial time series data based on the appearance time of the item corresponding to a predetermined event.

The information processing device according to claim 2, wherein the extraction unit extracts a plurality of first partial time series data and second partial time series data that have the same time interval.

The information processing device according to any one of claims 2 to 4, wherein the learning unit generates, for each of the first time series data and the second time series data identified by the identification unit, sequence data containing item identification information that identifies the items in their order of appearance, assigns to the first time series data and the second time series data learning user identification information that distinguishes the user in the first time series data from the user in the second time series data, and learns the appearance of the items and the relationship between the users based on the generated sequence data and the learning user identification information.

The information processing device according to claim 5, wherein the learning unit forms a first composite variable as a linear combination of the user identification information in the first domain and a second composite variable as a linear combination of the user identification information in the second domain, performs canonical correlation analysis between the user identification information in the first domain and the user identification information in the second domain to learn the coefficients, corresponding to each item of user identification information included in the first composite variable and the second composite variable, that make the correlation coefficient between the first composite variable and the second composite variable relatively large, and learns the relationship between the users based on those coefficients.

The information processing device according to any one of claims 1 to 6, wherein
the first time series data includes at least one of a predetermined event and an event different from the predetermined event,
the feature vector generation unit generates, based on the feature vectors of each of a plurality of items included in at least one of the first time series data and the second time series data, the user feature vectors corresponding to the first time series data and the second time series data as a first feature vector and a second feature vector, respectively, and
the information processing device further comprises a prediction unit that generates, based on the first feature vector, which is the feature vector of a user corresponding to the first domain, and the result of whether or not that user generated the predetermined event, a classifier that, given a first feature vector as input, classifies the user as a user who generated the predetermined event or a user who did not, and that inputs the second feature vector into the generated classifier to classify the user corresponding to the second feature vector as a user predicted to generate the predetermined event or a user predicted not to generate it.

The information processing device according to claim 7, wherein the prediction unit selects the common users to be used for generating the classifier based on the distance between the first feature vector and the second feature vector corresponding to each common user, and generates the classifier based on the first feature vectors of the selected common users and the result of whether or not each of those common users generated the predetermined event.

The information processing device according to claim 7 or 8, further comprising an advertisement distribution unit that delivers, to a user predicted by the classifier to generate the predetermined event, an advertisement that is delivered to users who have generated the predetermined event.

An information processing method executed by a computer, comprising:
a step of acquiring first time series data, which is time series data of a first domain, and second time series data, which is time series data of a second domain, each time series data including item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event;
a step of identifying a combination of the first time series data and the second time series data corresponding to a common user who, among the users, is common to the first domain and the second domain;
a step of generating a model for predicting other items or other users related to an item or a user by simultaneously learning, based on the first time series data and the second time series data corresponding to the identified combination, the relationship of appearance between an item and other items and the relationship between the common user corresponding to the first time series data and the common user corresponding to the second time series data; and
a step of generating, based on the generated model, a feature vector of the item and a feature vector of the user that reflects the relationship of appearance of the items in the first domain and the second domain.

A program that causes a computer to function as:
a time series data acquisition unit that acquires first time series data, which is time series data of a first domain, and second time series data, which is time series data of a second domain, each time series data including item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event;
an identification unit that identifies a combination of the first time series data and the second time series data corresponding to a common user who, among the users, is common to the first domain and the second domain;
a learning unit that generates a model for predicting other items or other users related to an item or a user by simultaneously learning, based on the first time series data and the second time series data corresponding to the combination identified by the identification unit, the relationship of appearance between an item and other items and the relationship between the common user corresponding to the first time series data and the common user corresponding to the second time series data; and
a feature vector generation unit that generates, based on the model generated by the learning unit, a feature vector of the item and a feature vector of the user that reflects the relationship of appearance of the items in the first domain and the second domain.