JP6933623B2 - Feature vector generator, feature vector generation method and feature vector generation program

Info

Publication number: JP6933623B2
Application number: JP2018178806A
Authority: JP
Inventors: コウ牛; 慧米川; 茂莉黒川; 亜令小林
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-09-25
Filing date: 2018-09-25
Publication date: 2021-09-08
Anticipated expiration: 2038-09-25
Also published as: JP2020052518A

Description

The present invention relates to a feature vector generator, a feature vector generation method, and a feature vector generation program.

Time-series data analysis is used across a wide range of industries, often to estimate future data from past data. Linear time-series analysis methods (for example, the autoregressive integrated moving average model) have long been common, but machine learning methods have also been proposed. Machine learning techniques can effectively handle non-linear time-series data and time-series data with complex periodicity.

In recent years, machine learning techniques have been proposed that assign a feature vector to each item of time-series data in order to cluster items, classify time-series patterns, and so on. For example, Non-Patent Document 1 discloses using Word2Vec (Doc2Vec) to assign a feature vector to each item in the time-series data sets of all users and recommending items to users based on those feature vectors. Non-Patent Document 2 discloses assigning a feature vector to each item while taking into account the intervals between items in the time-series data.

Non-Patent Document 1: Ozsoy, Makbule Gulcin. "From word embeddings to item recommendation." arXiv preprint arXiv:1601.01356, 2016.
Non-Patent Document 2: Hong, Shenda, et al. "Event2vec: Learning Representations of Events on Temporal Sequences." Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data. Springer, Cham, 2017.

Conventional techniques assign a feature vector to each item of a time-series data set corresponding to a single domain, and do not consider assigning feature vectors to items of different domains. Consequently, when transfer learning is performed between time-series data sets corresponding to different domains, the relationship between those data sets cannot be taken into account, and the transfer learning cannot be performed accurately.

The present invention has been made in view of these points, and its object is to provide a feature vector generator, a feature vector generation method, and a feature vector generation program capable of generating feature vectors in which items of time-series data corresponding to different domains are related to one another.

A feature vector generator according to a first aspect of the present invention includes: a time-series data acquisition unit that acquires first time-series data, which is time-series data of a first domain, and second time-series data, which is time-series data of a second domain, each being time-series data that includes item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who caused the event; a correspondence identification unit that identifies the correspondence among a plurality of pieces of the user identification information; an integration unit that generates a plurality of pieces of partial time-series data by integrating a part of the first time-series data with a part of the second time-series data, based on the times included in the first time-series data and the second time-series data and on the correspondence of the user identification information identified by the correspondence identification unit; and a feature vector generation unit that, based on the plurality of pieces of partial time-series data integrated by the integration unit, generates feature vectors indicating the features of the items indicated by the respective pieces of item information included in each of the plurality of pieces of partial time-series data.

The first time-series data includes at least one of item information corresponding to a predetermined event and item information corresponding to an event different from the predetermined event. The integration unit may extract, from the first time-series data, first partial time-series data, which is time-series data that includes the occurrence time of the predetermined event included in the first time-series data and corresponds to a period up to that occurrence time; extract, from the second time-series data, second partial time-series data, which is time-series data corresponding to a period up to that occurrence time; and generate the partial time-series data by integrating the first partial time-series data with the second partial time-series data.
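The extraction and integration described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Event` record, the `"view"`/`"purchase"` labels, and the fixed `lookback` length used to bound "the period up to the occurrence time" are all assumptions made for the example, and both inputs are assumed to belong to the same user.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    user: str   # user identification information
    time: int   # occurrence time (arbitrary integer units in this sketch)
    item: str   # item information
    kind: str   # "view" or "purchase" (hypothetical labels)

def integrate_before_purchase(first, second, lookback):
    """For each purchase in `first`, merge the events of both domains that fall in
    [purchase_time - lookback, purchase_time] into one time-ordered sequence."""
    partials = []
    for purchase in (e for e in first if e.kind == "purchase"):
        start = purchase.time - lookback
        part1 = [e for e in first if start <= e.time <= purchase.time]
        part2 = [e for e in second if start <= e.time <= purchase.time]
        partials.append(sorted(part1 + part2, key=lambda e: e.time))
    return partials

first = [Event("A", 10, "v1", "view"), Event("A", 20, "v2", "view"),
         Event("A", 30, "p1", "purchase"), Event("A", 40, "v3", "view")]
second = [Event("A", 15, "w1", "view"), Event("A", 25, "w2", "view"),
          Event("A", 35, "w3", "view")]

partials = integrate_before_purchase(first, second, lookback=20)
items = [e.item for e in partials[0]]
print(items)
```

Events after the purchase (`v3`, `w3`) are excluded, so each merged sequence ends with the purchase event it was built around.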

When the first time-series data does not include item information corresponding to the predetermined event, the integration unit may extract time-series data corresponding to an arbitrary period from the first time-series data as the first partial time-series data, extract time-series data corresponding to the same arbitrary period from the second time-series data as the second partial time-series data, and generate the partial time-series data by integrating the first partial time-series data with the second partial time-series data.

The integration unit may generate the partial time-series data so that the number of pieces of item information included in the partial time-series data equals a predetermined number.
The integration unit may extract the first partial time-series data so that the number of pieces of item information it includes equals a first number, and extract the second partial time-series data so that the number of pieces of item information it includes equals a second number.

The integration unit may identify a period in the first time-series data for which the number of pieces of item information included in the first partial time-series data equals the first number, and extract the second time-series data corresponding to that period as the second partial time-series data. When the number of pieces of item information included in the second partial time-series data exceeds the second number, the integration unit may reduce the item information so that the number of pieces included in the second partial time-series data equals the second number.

The integration unit may identify a period in the first time-series data for which the number of pieces of item information included in the first partial time-series data equals the first number. When the number of pieces of item information included in the second time-series data corresponding to that period is smaller than the second number, the integration unit may lengthen the period and extract the second partial time-series data so that the number of pieces of item information it includes equals the second number.
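A minimal sketch of this count-based extraction is shown below. The function name, the `(time, item)` tuple layout, and two policies are assumptions not stated in the text: when there are too many second-domain events the most recent ones are kept, and the period is lengthened by reaching further back in time.

```python
def extract_by_counts(first, second, end_time, n1, n2):
    """first/second: lists of (time, item) for one user, sorted by time.
    Returns (part1, part2) holding up to n1 and n2 events ending at end_time."""
    if n1 < 1 or n2 < 1:
        raise ValueError("n1 and n2 must be positive")
    # First domain: the last n1 events up to end_time define the period.
    part1 = [e for e in first if e[0] <= end_time][-n1:]
    start = part1[0][0] if part1 else end_time
    # Second domain: events inside the period defined by the first domain.
    eligible = [e for e in second if e[0] <= end_time]
    part2 = [e for e in eligible if e[0] >= start]
    if len(part2) > n2:
        part2 = part2[-n2:]      # too many: keep the n2 most recent (assumed policy)
    elif len(part2) < n2:
        part2 = eligible[-n2:]   # too few: lengthen the period backwards
    return part1, part2

first = [(1, "v1"), (5, "v2"), (8, "v3"), (10, "p1")]
second = [(2, "w1"), (4, "w2"), (6, "w3"), (7, "w4"), (9, "w5")]

_, trimmed = extract_by_counts(first, second, end_time=10, n1=3, n2=2)
_, extended = extract_by_counts(first, second, end_time=10, n1=3, n2=4)
print([item for _, item in trimmed], [item for _, item in extended])
```

If the second-domain history holds fewer than `n2` events in total, this sketch simply returns all of them; how the device handles that case is not specified in the text.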

The integration unit may generate the partial time-series data so that the period in which the events corresponding to the item information included in the partial time-series data occurred equals a predetermined period.
The integration unit may generate the partial time-series data so that the number of pieces of item information corresponding to the predetermined event included in the partial time-series data equals a predetermined number.

The feature vector generation unit may generate a feature vector for each of a plurality of items by analyzing the relationships among the items indicated by the pieces of item information included in the plurality of pieces of partial time-series data.
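One simple way to turn item relationships into vectors is sketched below. It is a stand-in chosen for brevity, not the method of the patent: it counts how often items co-occur within a small window inside each partial time series, whereas the background section refers to Word2Vec-style embeddings. The function name and the `window` parameter are assumptions.

```python
from collections import Counter

def cooccurrence_vectors(sequences, window=2):
    """sequences: lists of item identifiers, one list per partial time series.
    Returns (vocab, vectors), where vectors[item][k] counts how often vocab[k]
    appears within `window` positions of `item`."""
    vocab = sorted({item for seq in sequences for item in seq})
    counts = {item: Counter() for item in vocab}
    for seq in sequences:
        for i, item in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[item][seq[j]] += 1
    vectors = {item: [counts[item][other] for other in vocab] for item in vocab}
    return vocab, vectors

# Two integrated sequences mixing first-domain (v*, p*) and second-domain (w*) items.
sequences = [["v1", "w1", "p1"], ["v2", "w1", "p1"]]
vocab, vectors = cooccurrence_vectors(sequences, window=1)
print(vocab, vectors["w1"])
```

Because both domains appear in the same sequences, a second-domain item such as `w1` receives a vector expressed in terms of first-domain items, which is the cross-domain relationship the integration step is meant to expose.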

Based on the correspondence of the user identification information identified by the correspondence identification unit, the integration unit may identify the second time-series data corresponding to the same user as the user corresponding to the first time-series data, and generate the partial time-series data by integrating a part of that first time-series data with a part of that second time-series data.

The first time-series data includes at least one of a predetermined event and an event different from the predetermined event. The feature vector generation unit may generate, as a first feature vector, a feature vector of the user corresponding to the first time-series data, based on the feature vectors of the items indicated by the pieces of item information included in the first time-series data, and may generate, as a second feature vector, a feature vector of the user corresponding to the second time-series data, based on the feature vectors of the items indicated by the pieces of item information included in the second time-series data. The feature vector generator may further include a prediction unit. Based on a plurality of the first feature vectors and on whether the user corresponding to each first feature vector caused the predetermined event, the prediction unit generates a classifier that, given a user's feature vector as input, classifies the user as either a user who caused the predetermined event or a user who did not. By inputting the second feature vectors into the generated classifier, the prediction unit classifies the users corresponding to the second feature vectors into users predicted to cause the predetermined event and users predicted not to cause it.
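The flow described above (item vectors to user vectors, then a classifier trained on the first domain and applied to the second) can be illustrated with the following sketch. Two choices are assumptions made only for the example: a user vector is taken as the mean of the user's item vectors, and the classifier is a nearest-centroid rule. The text does not specify either, and the item vectors below are made-up values.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train_nearest_centroid(user_vectors, labels):
    """Return one centroid per label (True: caused the event, False: did not)."""
    centroids = {}
    for label in set(labels):
        members = [v for v, l in zip(user_vectors, labels) if l == label]
        centroids[label] = mean_vector(members)
    return centroids

def predict(centroids, vector):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vector))

# Hypothetical item feature vectors shared by both domains.
item_vec = {"v1": [1.0, 0.0], "p1": [0.9, 0.1], "v2": [0.0, 1.0],
            "w1": [0.8, 0.2], "w2": [0.1, 0.9]}

# First feature vectors (first domain) with known outcomes.
first_users = {"A": ["v1", "p1"], "B": ["v2"]}
purchased = {"A": True, "B": False}
train_x = [mean_vector([item_vec[i] for i in items]) for items in first_users.values()]
train_y = [purchased[u] for u in first_users]
centroids = train_nearest_centroid(train_x, train_y)

# Second feature vectors (second domain) to be classified.
second_users = {"a": ["w1"], "b": ["w2"]}
predictions = {u: predict(centroids, mean_vector([item_vec[i] for i in items]))
               for u, items in second_users.items()}
print(predictions)
```

The classifier can be applied across domains only because the item vectors of both domains live in the same feature space, which is what generating them from integrated partial time series provides.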

A feature vector generation method according to a second aspect of the present invention is executed by a computer and includes the steps of: acquiring first time-series data, which is time-series data of a first domain, and second time-series data, which is time-series data of a second domain, each being time-series data that includes item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who caused the event; identifying the correspondence among a plurality of pieces of the user identification information; generating a plurality of pieces of partial time-series data by integrating a part of the first time-series data with a part of the second time-series data, based on the times included in the first time-series data and the second time-series data and on the identified correspondence of the user identification information; and generating, based on the integrated plurality of pieces of partial time-series data, feature vectors indicating the features of the items indicated by the respective pieces of item information included in each of the plurality of pieces of partial time-series data.

A feature vector generation program according to a third aspect of the present invention causes a computer to function as: a time-series data acquisition unit that acquires first time-series data, which is time-series data of a first domain, and second time-series data, which is time-series data of a second domain, each being time-series data that includes item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who caused the event; a correspondence identification unit that identifies the correspondence among a plurality of pieces of the user identification information; an integration unit that generates a plurality of pieces of partial time-series data by integrating a part of the first time-series data with a part of the second time-series data, based on the times included in the first time-series data and the second time-series data and on the correspondence of the user identification information identified by the correspondence identification unit; and a feature vector generation unit that, based on the plurality of pieces of partial time-series data integrated by the integration unit, generates feature vectors indicating the features of the items indicated by the respective pieces of item information included in each of the plurality of pieces of partial time-series data.

According to the present invention, feature vectors can be generated in which items of time-series data corresponding to different domains are related to one another.

A diagram outlining the feature vector generator according to the present embodiment.
A diagram showing the configuration of the feature vector generator according to the present embodiment.
A diagram showing examples of the first time-series data and the second time-series data according to the present embodiment.
A diagram showing an example of generating partial time-series data according to the present embodiment.
A diagram showing an example in which feature vectors generated from the time-series data shown in FIG. 3 are placed in a feature space.
A diagram showing an example in which a classifier is trained on the first feature vectors according to the present embodiment.
A diagram showing an example in which the classifier is trained with first feature vectors that were generated from partial time-series data including the second partial time-series data according to the present embodiment.
A diagram showing an example in which the second feature vectors are classified by the classifier according to the present embodiment.
A flowchart showing the flow of processing when the feature vector generator according to the present embodiment generates the feature vectors of items.
A flowchart showing the flow of processing when the feature vector generator according to the present embodiment predicts users who will cause the predetermined event.

[Overview of the feature vector generator]
FIG. 1 is a diagram outlining the feature vector generator according to the present embodiment. The feature vector generator is a computer that integrates time-series data of different domains and generates feature vectors indicating the features of the items included in that time-series data.

The feature vector generator acquires time-series data of the first domain and time-series data of the second domain ((1) in FIG. 1). In the present embodiment, a domain is a category used to classify time-series data according to its type. For example, the first domain covers time-series data indicating user behavior related to purchasing items on an EC (Electronic Commerce) site, and the second domain covers time-series data indicating users' browsing behavior on arbitrary websites.

The time-series data includes item information indicating the item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who caused the event.

From the time-series data of the first domain and the time-series data of the second domain, the feature vector generator identifies the time-series data belonging to a common user. Then, based on the time information included in the common user's time-series data, it generates a plurality of pieces of partial time-series data by integrating a part of that user's first-domain time-series data with a part of that user's second-domain time-series data ((2) in FIG. 1).

The feature vector generator generates feature vectors indicating the features of the items indicated by the respective pieces of item information included in each of the generated pieces of partial time-series data ((3) in FIG. 1). In this way, the feature vector generator can generate feature vectors in which items of time-series data corresponding to different domains are related to one another. As a result, transfer learning between time-series data corresponding to different domains can be performed accurately.
The configuration of the feature vector generator is described below.

[Configuration example of the feature vector generator 1]
FIG. 2 is a diagram showing the configuration of the feature vector generator 1 according to the present embodiment. The feature vector generator 1 includes a storage unit 11 and a control unit 12.

The storage unit 11 is, for example, a ROM (Read Only Memory) and a RAM (Random Access Memory). The storage unit 11 stores various programs for operating the feature vector generator 1. For example, the storage unit 11 stores a feature vector generation program that causes the control unit 12 of the feature vector generator 1 to function as a time-series data acquisition unit 121, a correspondence identification unit 122, an integration unit 123, a feature vector generation unit 124, and a prediction unit 125.

The control unit 12 is, for example, a CPU (Central Processing Unit). The control unit 12 controls the functions of the feature vector generator 1 by executing the various programs stored in the storage unit 11. By executing the program stored in the storage unit 11, the control unit 12 functions as the time-series data acquisition unit 121, the correspondence identification unit 122, the integration unit 123, the feature vector generation unit 124, and the prediction unit 125.

[Generation of item feature vectors]
In the present embodiment, the time-series data acquisition unit 121, the correspondence identification unit 122, the integration unit 123, and the feature vector generation unit 124 cooperate to generate the feature vectors of the items included in the time-series data. The functions of these units as they relate to generating item feature vectors are described below.

The time-series data acquisition unit 121 acquires the first time-series data, which is time-series data of the first domain, and the second time-series data, which is time-series data of the second domain. For example, the time-series data acquisition unit 121 acquires information indicating users' item browsing and purchase histories on the EC site as the first time-series data, and acquires information indicating the website browsing history of each of a plurality of users as the second time-series data. The time-series data acquisition unit 121 acquires a plurality of pieces of first time-series data and a plurality of pieces of second time-series data at predetermined intervals, for example from an information collection server (not shown) that collects the first and second time-series data.

FIG. 3 is a diagram showing examples of the first time-series data and the second time-series data according to the present embodiment. FIG. 3(a) shows three pieces of first time-series data, D1A to D1C. FIG. 3(b) shows three pieces of second time-series data, D2A to D2C.

The first time-series data includes a plurality of pieces of event data, each associating item information indicating the item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who caused the event. In the example shown in FIG. 3, v1 to v7, p1, p2, and w1 to w6 denote event data, and the way each piece of event data is drawn indicates the type of event.

Here, an item is, for example, a product or a service. In the present embodiment, the reference sign attached to each piece of event data serves as identification information identifying the item. Items with different reference signs may be the same item or different items. For example, the item corresponding to event data p1 and the item corresponding to event data v1 may be the same or different.

The first time-series data includes at least one of item information corresponding to a predetermined event and item information corresponding to an event different from the predetermined event. The second time-series data also includes item information corresponding to an event different from the predetermined event.

For example, in the first time-series data, the predetermined event is an event in which a user purchases an item on the EC site, and an event different from the predetermined event is an event in which the user browses the EC site. In the example shown in FIG. 3(a), the events corresponding to event data v1 to v7 are events of browsing the EC site, and the events corresponding to event data p1 and p2 are events of purchasing an item on the EC site. In the second domain, an event different from the predetermined event is an event of browsing a website. In the example shown in FIG. 3(b), the events corresponding to event data w1 to w6 are events of browsing a website.

The arrows shown in FIG. 3 correspond to the times at which the events occurred. For example, the first time-series data D1A indicates that the items corresponding to event data v1, v2, and v3 were browsed in that order on the EC site, after which the item corresponding to event data p1 was purchased.

The user identification information is information that uniquely identifies a user, for example the IP address assigned to the terminal the user uses. A user ID used to identify the user on the EC site, or a user ID used to identify the user on each website, may also be used as the user identification information.

The pieces of event data included in one piece of time-series data share common user identification information. For convenience of explanation, the example in FIG. 3 therefore shows the user corresponding to the user identification information separately. In the example shown in FIG. 3, user A, user B, user C, user a, user b, and user c are the user identification information corresponding to the respective pieces of time-series data.

The first time-series data is, for example, the access history of the EC site, and includes a plurality of pieces of event data, each associating the IP address of a terminal that accessed the EC site, a URL on the EC site, and the time of access to that URL. The URLs on the EC site include the URL of the purchase completion page displayed on the terminal when an item is purchased and the URLs of pages describing items. The URL of the purchase completion page corresponds to item information for an item purchase event, and the URL of a page describing an item corresponds to item information for an item browsing event. The time of access to a URL corresponds to the time at which the event occurred.
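As an illustration of how such an access history could be turned into event data, the sketch below maps each log entry to an event record and groups the records by IP address. The URL layouts (`/item/<id>` for item pages and `/purchase/complete/<id>` for purchase completion pages), the host name, and the `EventData` record are hypothetical; the text does not specify any URL scheme.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EventData:
    user_id: str  # here, the IP address of the accessing terminal
    time: str     # access time as an ISO 8601 string in this sketch
    item: str     # item information derived from the URL
    kind: str     # "view" or "purchase"

def to_event(ip: str, url: str, access_time: str) -> Optional[EventData]:
    """Map one access-log entry to event data; None if the URL is not item-related."""
    parts = url.split("?", 1)[0].rstrip("/").split("/")
    if len(parts) >= 2 and parts[-2] == "item":
        return EventData(ip, access_time, parts[-1], "view")
    if len(parts) >= 3 and parts[-3:-1] == ["purchase", "complete"]:
        return EventData(ip, access_time, parts[-1], "purchase")
    return None

def build_time_series(log):
    """Group events by user and order each user's events by access time."""
    series = {}
    for ip, url, access_time in log:
        event = to_event(ip, url, access_time)
        if event is not None:
            series.setdefault(event.user_id, []).append(event)
    for events in series.values():
        events.sort(key=lambda e: e.time)
    return series

log = [
    ("10.0.0.1", "https://shop.example/purchase/complete/42", "2018-09-25T10:05:00"),
    ("10.0.0.1", "https://shop.example/item/42", "2018-09-25T10:00:00"),
    ("10.0.0.1", "https://shop.example/", "2018-09-25T09:59:00"),
]
series = build_time_series(log)
print([(e.item, e.kind) for e in series["10.0.0.1"]])
```

Sorting by the timestamp string works here only because every timestamp uses the same fixed-width ISO 8601 layout.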

The second time-series data is, for example, the access history of websites, and includes a plurality of pieces of event data, each associating the IP address of a terminal that accessed a website, a URL on the website, and the time of access to that URL. The URLs on the websites include the URLs of pages describing items. The URL of a page describing an item corresponds to item information for an item browsing event on the website, and the time of access to the URL corresponds to the time at which the event occurred.

The correspondence relationship specifying unit 122 specifies the correspondence among a plurality of pieces of user identification information. For example, the correspondence relationship specifying unit 122 specifies this correspondence by determining whether the user identification information included in the first time-series data matches the user identification information included in the second time-series data. In the present embodiment, when the user identification information of the first time-series data matches that of the second time-series data, the correspondence relationship specifying unit 122 determines that the users corresponding to these time-series data are the same user.

When the user identification information consists of a user ID used to identify a user on the EC site and user IDs used to identify a user on each website, the storage unit 11 stores correspondence information indicating the correspondence between these user IDs. The correspondence relationship specifying unit 122 then specifies the correspondence among the plurality of pieces of user identification information by referring to the correspondence information stored in the storage unit 11.
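As a non-limiting illustration, the two ways of specifying the correspondence described above (direct matching of a shared identifier, and lookup in stored correspondence information) can be sketched in Python as follows. The function name `resolve_second_id` and the table `ID_MAP` are assumptions introduced for this sketch and do not appear in the embodiment.

```python
from typing import Optional

# Hypothetical correspondence information: EC-site user ID -> website user ID.
ID_MAP = {"A": "a", "B": "b", "C": "c"}


def resolve_second_id(
    first_id: str,
    second_ids: set,
    id_map: Optional[dict] = None,
) -> Optional[str]:
    """Return the second-domain ID of the same user, or None if no match is known."""
    if id_map is None:
        # Shared identifier (e.g. an IP address): same user only if the IDs are equal.
        return first_id if first_id in second_ids else None
    # Separate ID schemes: consult the stored correspondence information.
    mapped = id_map.get(first_id)
    return mapped if mapped in second_ids else None
```

A user for whom `None` is returned corresponds to the case, discussed later in the description, in which no correspondence has been specified.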

The integration unit 123 generates a plurality of pieces of partial time-series data by integrating a part of the first time-series data with a part of the second time-series data, based on the times included in the first and second time-series data and on the correspondence of the user identification information specified by the correspondence relationship specifying unit 122.

Specifically, the integration unit 123 extracts, from the first time-series data, first partial time-series data, which is the time-series data for a period that includes the occurrence time of a predetermined event contained in the first time-series data and precedes that occurrence time. For example, the integration unit 123 specifies the period from 30 minutes before the occurrence time of the predetermined event up to that occurrence time as the data extraction period, and extracts the time-series data in the data extraction period as the first partial time-series data.
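The extraction of a partial series over a data extraction period can be sketched as follows. This is a minimal example under stated assumptions: the `Event` record, its field names, and the `"view"` / `"purchase"` labels are hypothetical, and the 30-minute length is the example value given above.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    user: str
    kind: str        # "view" or "purchase" (labels assumed for this sketch)
    item: str
    time: datetime


def extract_window(events, end_time, length=timedelta(minutes=30)):
    """Return the events with end_time - length <= time <= end_time, in time order."""
    start = end_time - length
    selected = [e for e in events if start <= e.time <= end_time]
    return sorted(selected, key=lambda e: e.time)
```

Calling `extract_window` with the occurrence time of a purchase event as `end_time` yields a series that ends with that purchase and contains the browsing events that preceded it within the period.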

In the example shown in FIG. 3, the integration unit 123 extracts, from the first time-series data D1A, the event data p1 corresponding to an item purchase event serving as the predetermined event, and the event data v1 to v3 corresponding to item browsing events that occurred before that purchase event, as the first partial time-series data.

Subsequently, based on the correspondence of the user identification information specified by the correspondence relationship specifying unit 122, the integration unit 123 identifies the second time-series data corresponding to the same user as the user of the first time-series data. The integration unit 123 then extracts, from that second time-series data, second partial time-series data, which is the time-series data for a period preceding the occurrence time of the predetermined event. For example, the integration unit 123 extracts, from the second time-series data, the time-series data for the same period as the data extraction period specified for the first partial time-series data as the second partial time-series data.

In the example shown in FIG. 3, it is assumed that user A, which is the user identification information corresponding to the first time-series data D1A, corresponds to user a, which is the user identification information corresponding to the second time-series data D2A, and that these time-series data belong to the same user. In this case, the integration unit 123 extracts, from the second time-series data D2A, the event data w1 and the event data w2 corresponding to item browsing events as the second partial time-series data. However, because the occurrence time of the event corresponding to the event data w2 is later than that of the event corresponding to the event data p1, the integration unit 123 does not include the event data w2 in the second partial time-series data.

The integration unit 123 then generates partial time-series data by integrating the first partial time-series data with the second partial time-series data. FIG. 4 is a diagram showing an example of generating partial time-series data according to the present embodiment. FIG. 4 shows that the first partial time-series data D1A-1 extracted from the first time-series data D1A shown in FIG. 3 and the second partial time-series data D2A-1 extracted from the second time-series data D2A are integrated to generate the partial time-series data d1.
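One possible way to integrate the two partial series is to merge them in order of occurrence time. The description does not state the ordering rule explicitly, so the time-ordered merge below is an assumption made for this sketch, as are the function name `integrate` and the `(time, item)` tuple layout.

```python
from operator import itemgetter


def integrate(first_part, second_part):
    """Merge two lists of (time, item) tuples into one time-ordered partial series."""
    # sorted() is stable, so events with equal times keep first-domain-first order.
    return sorted(list(first_part) + list(second_part), key=itemgetter(0))
```

For example, merging the first-domain events `v1`, `v2`, `p1` with a second-domain event `w1` that occurred between `v1` and `v2` places `w1` between them in the resulting series.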

Here, the integration unit 123 may generate the partial time-series data so that the number of pieces of item information it contains equals a predetermined number. For example, the integration unit 123 may extract the first partial time-series data so that it contains a first number of pieces of item information, and extract the second partial time-series data so that it contains a second number of pieces of item information.

For example, the integration unit 123 specifies, in the first time-series data, a data extraction period for which the number of pieces of item information included in the first partial time-series data equals the first number. The integration unit 123 then extracts the second time-series data in the specified data extraction period as the second partial time-series data. When the extracted second partial time-series data contains more pieces of item information than the second number, the integration unit 123 removes event data so that the number of pieces of item information in the second partial time-series data equals the second number.

For example, when a plurality of event data items indicating the same event and item are included, the integration unit 123 reduces the number of pieces of item information by deleting some of those event data items. In this way, the feature vector generator 1 can adjust the number of pieces of item information to the second number even when the second time-series data contains many pieces of item information in the data extraction period. Alternatively, when a plurality of event data items indicating the same event and item are included, the integration unit 123 may keep those event data items and instead reduce the number of pieces of item information by deleting event data whose event and item do not overlap with those of other event data.

When the second time-series data in the specified data extraction period contains fewer pieces of item information than the second number, the integration unit 123 may lengthen the data extraction period. For example, the integration unit 123 keeps the end time of the data extraction period unchanged and moves the start time to an earlier time, lengthening the data extraction period so that the second time-series data contains the second number of pieces of item information within it. The integration unit 123 then extracts the second partial time-series data so that it contains the second number of pieces of item information.

In this way, the feature vector generator 1 can adjust the number of pieces of item information to the second number even when the second time-series data contains few pieces of item information in the data extraction period.

Alternatively, when the second time-series data in the specified data extraction period contains fewer pieces of item information than the second number, the integration unit 123 may include the item information from that period in the second partial time-series data and additionally include duplicates of that item information, so that the second partial time-series data contains the second number of pieces of item information.
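The count adjustments described above (deleting repeated entries when there are too many, and duplicating entries when there are too few) can be sketched as follows. The function name `fit_to_count`, the choice to delete the later of two repeated entries, the fallback of keeping the most recent entries, and the cyclic duplication order are all assumptions of this sketch; the description leaves these details open.

```python
def fit_to_count(items, target):
    """Trim or pad a time-ordered list of item IDs to exactly `target` entries."""
    if target <= 0:
        return []
    result = list(items)
    # Too many entries: delete repeated occurrences, latest first, until the target is met.
    i = len(result) - 1
    while len(result) > target and i >= 0:
        if result[i] in result[:i]:
            del result[i]
        i -= 1
    # Still too many (no repeats left): keep only the most recent entries.
    if len(result) > target:
        result = result[-target:]
    # Too few entries: append copies of existing entries, cycling from the oldest.
    k = 0
    while result and len(result) < target:
        result.append(result[k])
        k += 1
    return result
```

Lengthening the data extraction period, the other option described above, would instead be handled before this step by moving the start time earlier until enough entries fall inside the period.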

In the above, the integration unit 123 specifies the data extraction period so that the number of pieces of item information included in the first partial time-series data equals the first number, and generates the partial time-series data based on that period, but the configuration is not limited to this. The integration unit 123 may instead generate the partial time-series data so that the period in which the events corresponding to its event data occurred equals a predetermined period. This prevents the data extraction period from becoming much longer, as can happen when it is set based on the first number and the second number, and thereby prevents item information for events unrelated to the predetermined event from being included in the partial time-series data.

The integration unit 123 may also generate the partial time-series data so that it contains a predetermined number of event data items corresponding to item purchase events serving as the predetermined event. For example, the integration unit 123 may generate the partial time-series data so that it contains only one event data item corresponding to a purchase event. When items are purchased in quick succession, the browsing events preceding the purchase of the first item and the browsing events preceding the purchase of the second item are likely to be unrelated. With this configuration, even when purchase events occur in quick succession, the feature vector generator 1 can include in the partial time-series data only the event data for browsing events that are likely to be related to the purchase event.
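Limiting each partial series to a single purchase event can be illustrated by cutting a user's time-ordered events at every purchase. This is only a sketch: the function name `split_by_purchase`, the `(kind, item)` tuple layout, and the decision to discard trailing browsing events that are not followed by a purchase are assumptions made here.

```python
def split_by_purchase(events):
    """Split a time-ordered list of (kind, item) tuples into chunks ending at a purchase."""
    chunks, current = [], []
    for kind, item in events:
        current.append((kind, item))
        if kind == "purchase":
            # Close the chunk so the next purchase starts with a fresh browsing history.
            chunks.append(current)
            current = []
    # Browsing events after the last purchase are ignored in this sketch.
    return chunks
```

Each returned chunk contains exactly one purchase event, preceded only by the browsing events that occurred after the previous purchase.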

In the above, the integration unit 123 generates the partial time-series data so that it includes event data corresponding to the predetermined event, but the configuration is not limited to this. When the first time-series data contains no event data corresponding to the predetermined event, the integration unit 123 may extract the time-series data for an arbitrary period from the first time-series data as the first partial time-series data. In this case, the integration unit 123 may extract the time-series data for that arbitrary period from the second time-series data of the same user as the second partial time-series data, and generate the partial time-series data by integrating the first and second partial time-series data. In this way, the feature vector generator 1 can generate feature vectors that reflect item browsing behavior in cases where no item was purchased.

The integration unit 123 may also generate partial time-series data from each of the first time-series data and the second time-series data of users for whom no correspondence has been specified. For example, the integration unit 123 may use, as partial time-series data as it is, the first partial time-series data generated from the first time-series data of a user whose correspondence with a user of the second time-series data has not been specified. Likewise, the integration unit 123 may extract second partial time-series data for an arbitrary period from the second time-series data of a user whose correspondence with a user of the first time-series data has not been specified, and use the extracted second partial time-series data as partial time-series data.

Based on the plurality of pieces of partial time-series data generated by the integration unit 123, the feature vector generation unit 124 generates a feature vector representing the features of the item indicated by each of the pieces of item information included in each piece of partial time-series data.

Specifically, the feature vector generation unit 124 extracts the item information contained in the event data included in the plurality of pieces of partial time-series data. The feature vector generation unit 124 then generates a feature vector representing the features of each of the items by analyzing the relationships among the items indicated by the extracted item information.

For example, the feature vector generation unit 124 regards each item as one word and, for each piece of partial time-series data, generates a sentence by concatenating those words. The feature vector generation unit 124 then generates the feature vector of each item by applying, for example, Word2Vec to the generated sentences. The number of elements of the feature vector corresponds, for example, to the number of items.
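The step of treating items as words can be sketched as follows. The function `to_sentences` builds the token lists that would be handed to a Word2Vec implementation; training itself is not shown. The function `cooccurrence` is not Word2Vec: it is a simple stand-in, assumed here for illustration only, that counts how often two items share a sentence, which is the kind of signal Word2Vec learns from. Both function names and the `(time, item)` layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def to_sentences(partial_series):
    """Turn each partial series of (time, item) tuples into a list of item 'words'."""
    return [[item for _, item in series] for series in partial_series]


def cooccurrence(sentences):
    """Count, for each unordered item pair, the number of sentences containing both."""
    counts = Counter()
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            counts[(a, b)] += 1
    return counts
```

Items from the two domains that repeatedly appear in the same integrated partial series receive high pair counts here, and would analogously tend to receive nearby vectors under Word2Vec.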

FIG. 5 is a diagram showing an example in which the feature vectors generated from the time-series data shown in FIG. 3 are placed in a feature space. For convenience of explanation, FIG. 5 shows the feature space compressed to two dimensions. A plurality of marks M1 and a plurality of marks M2 are placed in FIG. 5. These marks indicate the positions of items in the feature space. The mark M1 indicates an item corresponding to a browsing event in the first time-series data, and the mark M2 indicates an item corresponding to a browsing event in the second time-series data.

The marks M1 and M2 are labeled with the reference signs v1 to v7 and w1 to w6. These reference signs indicate items and match the reference signs of the event data shown in FIG. 3. For example, the mark M1 labeled v1 in FIG. 5 corresponds to the item indicated by the item information included in the event data v1 of the first time-series data D1A shown in FIG. 3, and the mark M2 labeled w1 corresponds to the item indicated by the item information included in the event data w1 of the second time-series data D2A shown in FIG. 3.

In the example shown in FIG. 3, the event data v1 and v3, which indicate events of browsing items on the EC site, and the event data w1 and w2, which indicate events of browsing items on the website, frequently appear before the event p1 of purchasing an item on the EC site. Correspondingly, in FIG. 5, v1, v2, w1, and w2 are placed closer to one another in the feature space than to other items, which confirms that they co-occur.

[Transfer learning and prediction of occurrence of a predetermined event]
In the present embodiment, the feature vector generation unit 124 and the prediction unit 125 cooperate to perform transfer learning between time-series data of different domains and to predict whether a user corresponding to the second time-series data will cause the predetermined event. The feature vector generator 1 thereby functions both as a learning device that performs transfer learning between time-series data of different domains and as a prediction device that predicts whether a user corresponding to the second time-series data will cause the predetermined event. The functions of the feature vector generation unit 124 and the prediction unit 125 relating to transfer learning and prediction of the predetermined event are described below.

Based on the feature vectors of the items indicated by the pieces of item information included in the first time-series data, the feature vector generation unit 124 generates, as a first feature vector, the feature vector of the user corresponding to the first time-series data.

For example, like the integration unit 123, the feature vector generation unit 124 extracts, from the first time-series data, first partial time-series data for a period that includes the occurrence time of an item purchase event (the predetermined event) contained in the first time-series data and precedes that occurrence time. Here, the feature vector generation unit 124 extracts the first partial time-series data so that the number of event data items it contains equals the first number.

The feature vector generation unit 124 also extracts, from the first time-series data, first partial time-series data that corresponds to a period containing no event data for an item purchase event (the predetermined event) and that contains the first number of event data items.

The feature vector generation unit 124 then generates the first feature vector by calculating the mean (for example, the arithmetic mean or a weighted mean) of the feature vectors generated for the pieces of item information included in the first partial time-series data.
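The averaging of item feature vectors into a user feature vector can be sketched as follows. The function name `mean_vector` and the way weights are supplied are assumptions of this sketch; the description does not specify how the weights of a weighted mean would be chosen.

```python
def mean_vector(vectors, weights=None):
    """Return the arithmetic mean of the vectors, or their weighted mean if weights are given."""
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]
```

With equal weights this reduces to the ordinary element-wise mean. Unequal weights could, for instance, emphasize items browsed closer to the end of the partial series, although that particular choice is not taken from the description.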

Based on a plurality of first feature vectors and on whether the user corresponding to each first feature vector caused the predetermined event, the prediction unit 125 generates a classifier that, given a user's feature vector as input, classifies that user as either a user who caused the predetermined event or a user who did not.

For example, among the first partial time-series data from which the first feature vectors were generated, the prediction unit 125 treats partial time-series data that includes an item purchase event as positive example data and partial time-series data that does not include an item purchase event as negative example data. The prediction unit 125 then performs machine learning on these positive and negative example data to generate a classifier that, given a user's feature vector as input, classifies that user as either a user who caused an item purchase event or a user who did not.
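The description does not name a particular learning algorithm for the classifier. Purely as an illustration, the sketch below uses a nearest-centroid rule, chosen here only because it fits in a few lines; any binary classifier trained on the same positive and negative examples could take its place. The function names `train` and `predict` are hypothetical.

```python
def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def train(positives, negatives):
    """Fit the stand-in classifier: one centroid per class of user feature vectors."""
    return centroid(positives), centroid(negatives)


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def predict(model, x):
    """Return True if x is closer to the positive (purchase) centroid than to the negative one."""
    pos, neg = model
    return squared_distance(x, pos) < squared_distance(x, neg)
```

Under this stand-in, the set of points equidistant from the two centroids plays the role of the boundary line drawn in the figures.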

FIG. 6 is a diagram showing an example in which the classifier is trained on the first feature vectors according to the present embodiment. For convenience of explanation, FIG. 6 shows the first feature vectors compressed to two dimensions and placed in the feature space. The mark M3 shown in FIG. 6 indicates a first feature vector corresponding to positive example data, and the mark M4 indicates a first feature vector corresponding to negative example data. The boundary line L indicates the boundary obtained when the classifier separates the first feature vectors into positive example data and negative example data. The boundary line is drawn for convenience of explanation; no boundary line is actually generated.

In generating the classifier, the prediction unit 125 may also generate first feature vectors from the partial time-series data generated by the integration unit 123, which includes second partial time-series data, and include these first feature vectors when generating the classifier. FIG. 7 is a diagram showing an example in which the classifier is trained with first feature vectors generated from partial time-series data that includes second partial time-series data according to the present embodiment. As in FIG. 6, FIG. 7 shows the mark M3 indicating a first feature vector corresponding to positive example data and the mark M4 indicating a first feature vector corresponding to negative example data, together with two further marks, M5 and M6.

The mark M5 shown in FIG. 7 indicates a first feature vector that corresponds to positive example data and was generated from partial time-series data including second partial time-series data. The mark M6 indicates a first feature vector that corresponds to negative example data and was generated from partial time-series data including second partial time-series data. The boundary line L2 indicates the boundary obtained when the classifier separates the first feature vectors into positive example data and negative example data. Because the example in FIG. 7 contains more positive and negative example data than the example in FIG. 6, the position of the boundary line L2 differs slightly from that of the boundary line L.

Based on the feature vectors of the items indicated by the pieces of item information included in the second time-series data acquired by the time-series data acquisition unit 121, the feature vector generation unit 124 generates, as a second feature vector, the feature vector of the user corresponding to the second time-series data. For example, the feature vector generation unit 124 generates second partial time-series data from the event data in the second time-series data that falls within a predetermined period counted from the latest time. The feature vector generation unit 124 then generates the second feature vector by calculating the mean of the feature vectors generated for the pieces of item information included in the second partial time-series data. The time-series data acquisition unit 121 may acquire the second time-series data at the time the feature vector generation unit 124 generates the second feature vector.

By inputting the second feature vector into the generated classifier, the prediction unit 125 classifies the user corresponding to the second feature vector as either a user predicted to cause the predetermined event or a user predicted not to cause it, and outputs information indicating the classification result.

FIG. 8 is a diagram showing an example in which second feature vectors are classified by the classifier according to the present embodiment. The example in FIG. 8 shows second feature vectors classified by the classifier corresponding to FIG. 6, and the same boundary line L as in FIG. 6 is displayed. The mark M7 shown in FIG. 8 indicates a second feature vector corresponding to a user predicted to cause the predetermined event. The mark M8 indicates a second feature vector corresponding to a user predicted not to cause the predetermined event. In this way, the feature vector generator 1 can perform transfer learning between time-series data of different domains with high accuracy.

[Processing flow in the feature vector generator 1]
Next, an example of the processing flow in the feature vector generator 1 is described. First, the processing flow when the feature vector generator 1 generates the feature vectors of items is described. FIG. 9 is a flowchart showing the processing flow when the feature vector generator 1 according to the present embodiment generates the feature vectors of items.

First, the time-series data acquisition unit 121 acquires a plurality of first time-series data and a plurality of second time-series data (S1).
Next, the correspondence relationship specifying unit 122 specifies the correspondence relationship between the user identification information included in the plurality of first time-series data and the user identification information included in the plurality of second time-series data (S2).

Next, the integration unit 123 generates partial time-series data by integrating a part of the first time-series data and a part of the second time-series data, based on the occurrence times of the event data included in the first time-series data and the second time-series data and on the correspondence relationship specified in S2 (S3). The integration unit 123 generates a plurality of partial time-series data based on the plurality of first time-series data and the plurality of second time-series data.
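
The integration in S3 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the `Event` record, the `integrate` function, the `id_map` dictionary standing in for the correspondence relationship of S2, and the fixed-length `window` are all hypothetical names and assumptions introduced here. The sketch assumes that, for each occurrence of a predetermined event in the first-domain data, the events of the corresponding user in both domains that fall in the period before that occurrence are merged in order of occurrence time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One event record (hypothetical layout): user, occurrence time, item."""
    user_id: str
    time: int
    item: str


def integrate(first, second, id_map, target_item, window):
    """Build one partial sequence per occurrence of target_item in `first`.

    first, second: lists of Event for the first and second domain.
    id_map: first-domain user id -> second-domain user id (result of S2).
    window: assumed fixed length of the period before the occurrence time.
    """
    partials = []
    for ev in first:
        if ev.item != target_item:
            continue
        uid2 = id_map.get(ev.user_id)
        start = ev.time - window
        # First partial data: same user's first-domain events in the period.
        part1 = [e for e in first
                 if e.user_id == ev.user_id and start <= e.time <= ev.time]
        # Second partial data: corresponding user's second-domain events.
        part2 = [e for e in second
                 if e.user_id == uid2 and start <= e.time <= ev.time]
        # Integrate the two parts in order of occurrence time.
        merged = sorted(part1 + part2, key=lambda e: e.time)
        partials.append([e.item for e in merged])
    return partials
```

Each returned list is one integrated sequence of item names; cutting the period by a number of items instead of a fixed length would be an equally valid variation of this sketch.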

Next, the feature vector generation unit 124 generates, based on the plurality of partial time-series data, a feature vector indicating the feature of the item indicated by each of the plurality of pieces of item information included in each of the plurality of partial time-series data (S4).
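
As a rough illustration of S4, the following Python sketch derives one vector per item from the integrated partial sequences. This passage does not state which embedding algorithm is used, so plain co-occurrence counting is substituted here purely as a stand-in; the function name `cooccurrence_vectors` and the input layout (a list of item-name lists) are assumptions. The point it shows is that items from different domains obtain related vectors once they appear in the same integrated sequence.

```python
from itertools import combinations


def cooccurrence_vectors(partials):
    """Return (vocab, vectors) for the given partial sequences.

    vectors[item][j] counts the partial sequences in which `item`
    appears together with vocab[j]. Stand-in technique, not the
    embedding method of the embodiment.
    """
    vocab = sorted({item for seq in partials for item in seq})
    index = {item: j for j, item in enumerate(vocab)}
    vectors = {item: [0] * len(vocab) for item in vocab}
    for seq in partials:
        # Count each unordered pair of distinct items once per sequence.
        for a, b in combinations(sorted(set(seq)), 2):
            vectors[a][index[b]] += 1
            vectors[b][index[a]] += 1
    return vocab, vectors
```

A learned embedding trained on the same sequences could replace the counting step without changing the input or output shape of this sketch.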

Next, the flow of processing when the feature vector generator 1 predicts users who will generate a predetermined event is described. FIG. 10 is a flowchart showing the processing flow when the feature vector generator 1 according to the present embodiment predicts users who will generate the predetermined event. It is assumed that, at the start of this flowchart, the time-series data acquisition unit 121 has already acquired the plurality of first time-series data and the plurality of second time-series data, and the feature vector generation unit 124 has already generated the feature vectors of the plurality of items.

First, the feature vector generation unit 124 generates a plurality of first partial time-series data based on the first time-series data acquired by the time-series data acquisition unit 121 (S11).
Next, for each of the plurality of first partial time-series data generated in S11, the feature vector generation unit 124 generates a first feature vector based on the feature vectors of the items indicated by the item information included in that first partial time-series data, thereby generating a plurality of first feature vectors (S12).

Next, based on the plurality of first feature vectors generated in S12 and on the results indicating whether the users corresponding to those first feature vectors generated the predetermined event (an item purchase event), the prediction unit 125 generates a classifier that, in response to input of a user's feature vector, classifies the user as either a user who generated the predetermined event or a user who did not generate the predetermined event (S13).

Next, the feature vector generation unit 124 generates second partial time-series data based on the second time-series data acquired by the time-series data acquisition unit 121 (S14).
Next, for each of the plurality of second partial time-series data generated in S14, the feature vector generation unit 124 generates a second feature vector based on the feature vectors of the items indicated by the item information included in that second partial time-series data, thereby generating a plurality of second feature vectors (S15).

Next, by inputting the second feature vectors generated in S15 into the classifier generated in S13, the prediction unit 125 classifies the user corresponding to each second feature vector as either a user predicted to generate the predetermined event or a user predicted not to generate the predetermined event (S16).
Next, the prediction unit 125 outputs the classification result of S16 (S17). For example, the prediction unit 125 generates a file containing information indicating the classification result and stores the file in the storage unit 11.
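
The flow of S11 to S17 can be sketched in Python as follows. Two points are assumptions made only for illustration, because this passage does not fix them: a user's feature vector is taken to be the average of the item feature vectors in that user's partial sequence, and the classifier is a nearest-centroid classifier. The function names `user_vector`, `fit_centroids`, and `predict` are hypothetical.

```python
def user_vector(items, item_vectors):
    """S12 / S15: average the feature vectors of the items in one sequence."""
    vecs = [item_vectors[i] for i in items if i in item_vectors]
    dim = len(next(iter(item_vectors.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fit_centroids(vectors, labels):
    """S13: compute one centroid per class from the first feature vectors.

    labels[i] is True if the user of vectors[i] generated the event.
    """
    centroids = {}
    for label in set(labels):
        members = [v for v, l in zip(vectors, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, vector):
    """S16: return the class whose centroid is nearest (squared Euclidean)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))
```

The classifier is fitted on first-domain users, whose outcomes are known, and then applied to second-domain users. This only works because both kinds of user vector are built from item vectors that share one vector space.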

[Effects of the present embodiment]
As described above, the feature vector generator 1 according to the present embodiment generates a plurality of partial time-series data by integrating a part of the first time-series data and a part of the second time-series data, based on the times included in the plurality of first time-series data and second time-series data and on the correspondence relationship of the user identification information included in the first time-series data and the second time-series data. The feature vector generator 1 then generates, based on the integrated plurality of partial time-series data, a feature vector indicating the feature of the item indicated by each of the plurality of pieces of item information included in each of the plurality of partial time-series data. In this way, the feature vector generator 1 can generate feature vectors in which the items of time-series data corresponding to different domains are related to one another. As a result, the feature vector generator 1 can accurately perform transfer learning between time-series data corresponding to different domains.

Although the present invention has been described above using the embodiment, the technical scope of the present invention is not limited to the scope described in the embodiment. It will be apparent to those skilled in the art that various changes or improvements can be made to the embodiment.

For example, in the embodiment described above, the event data is generated by a user, but the event data is not limited to this and may be generated by a device. In this case, the user identification information included in the event data may be device identification information that identifies the device.

In the embodiment described above, the feature vector generator 1 generates partial time-series data by integrating a part of the first time-series data and a part of the second time-series data of the same user, but the integration is not limited to this. For example, the storage unit 11 may store attribute information indicating the attributes of each user together with the user correspondence information. The feature vector generator 1 may then generate partial time-series data by integrating a part of the first time-series data with a part of the second time-series data corresponding to a user who is different from the user of that first time-series data but has similar attributes.

In the embodiment described above, the feature vector generator 1 generates partial time-series data by integrating a part of the first time-series data of the first domain and a part of the second time-series data of the second domain, and generates feature vectors indicating the features of items based on the partial time-series data, but the configuration is not limited to this. The feature vector generator 1 may generate partial time-series data by integrating parts of the time-series data corresponding to each of three or more domains, and generate feature vectors indicating the features of items based on that partial time-series data.

In particular, the specific form of distribution and integration of the device is not limited to that illustrated above. All or part of the device may be functionally or physically distributed or integrated in arbitrary units according to various additions and the like, or according to the functional load.

1 ... feature vector generator, 11 ... storage unit, 12 ... control unit, 121 ... time-series data acquisition unit, 122 ... correspondence relationship specifying unit, 123 ... integration unit, 124 ... feature vector generation unit, 125 ... prediction unit

Claims

A feature vector generator comprising:
a time-series data acquisition unit that acquires first time-series data, which is time-series data of a first domain, and second time-series data, which is time-series data of a second domain, each being time-series data including item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event;
a correspondence relationship specifying unit that specifies a correspondence relationship between a plurality of pieces of the user identification information;
an integration unit that generates a plurality of pieces of partial time-series data by integrating a part of the first time-series data and a part of the second time-series data, based on the times included in the first time-series data and the second time-series data and on the correspondence relationship of the user identification information specified by the correspondence relationship specifying unit; and
a feature vector generation unit that generates, based on the plurality of pieces of partial time-series data integrated by the integration unit, a feature vector indicating the feature of the item indicated by each of a plurality of pieces of item information included in each of the plurality of pieces of partial time-series data.

The feature vector generator according to claim 1, wherein
the first time-series data includes at least one of item information corresponding to a predetermined event and item information corresponding to an event different from the predetermined event, and
the integration unit extracts, from the first time-series data, first partial time-series data that includes the occurrence time of the predetermined event included in the first time-series data and that corresponds to a period before the occurrence time, extracts, from the second time-series data, second partial time-series data that corresponds to the period before the occurrence time, and generates the partial time-series data by integrating the first partial time-series data and the second partial time-series data.

The feature vector generator according to claim 2, wherein, when the first time-series data does not include item information corresponding to the predetermined event, the integration unit extracts time-series data corresponding to an arbitrary period from the first time-series data as the first partial time-series data, extracts time-series data corresponding to the arbitrary period from the second time-series data as the second partial time-series data, and generates the partial time-series data by integrating the first partial time-series data and the second partial time-series data.

The feature vector generator according to claim 2 or 3, wherein the integration unit generates the partial time-series data so that the number of pieces of item information included in the partial time-series data is a predetermined number.

The feature vector generator according to claim 4, wherein the integration unit extracts the first partial time-series data so that the number of pieces of item information included in the first partial time-series data is a first number, and extracts the second partial time-series data so that the number of pieces of item information included in the second partial time-series data is a second number.

The feature vector generator according to claim 5, wherein the integration unit specifies, in the first time-series data, a period in which the number of pieces of item information included in the first partial time-series data is the first number, extracts the second time-series data corresponding to the period as the second partial time-series data, and, when the number of pieces of item information included in the second partial time-series data is larger than the second number, reduces the item information so that the number of pieces of item information included in the second partial time-series data is the second number.

The feature vector generator according to claim 5, wherein the integration unit specifies, in the first time-series data, a period in which the number of pieces of item information included in the first partial time-series data is the first number, and, when the number of pieces of item information included in the second time-series data corresponding to the period is smaller than the second number, extends the period so that the number of pieces of item information included in the second partial time-series data becomes the second number and extracts the second partial time-series data.

The feature vector generator according to claim 2 or 3, wherein the integration unit generates the partial time-series data so that the period in which the events corresponding to the item information included in the partial time-series data occurred is a predetermined period.

The feature vector generator according to any one of claims 2 to 8, wherein the integration unit generates the partial time-series data so that the number of pieces of item information corresponding to the predetermined event included in the partial time-series data is a predetermined number.

The feature vector generator according to any one of claims 2 to 9, wherein the feature vector generation unit generates the feature vector of each of the plurality of items by analyzing the relationships among the items indicated by the plurality of pieces of item information included in the plurality of pieces of partial time-series data.

The feature vector generator according to any one of claims 1 to 10, wherein the integration unit identifies, based on the correspondence relationship of the user identification information specified by the correspondence relationship specifying unit, the second time-series data corresponding to the same user as the user corresponding to the first time-series data, and generates the partial time-series data by integrating a part of the first time-series data and a part of the identified second time-series data.

The feature vector generator according to any one of claims 1 to 11, wherein
the first time-series data includes at least one of a predetermined event and an event different from the predetermined event,
the feature vector generation unit generates, as a first feature vector, a feature vector of the user corresponding to the first time-series data based on the feature vectors of the plurality of items indicated by the plurality of pieces of item information included in the first time-series data, and generates, as a second feature vector, a feature vector of the user corresponding to the second time-series data based on the feature vectors of the plurality of items indicated by the plurality of pieces of item information included in the second time-series data, and
the feature vector generator further comprises a prediction unit that generates, based on a plurality of the first feature vectors and on results indicating whether the users corresponding to the first feature vectors generated the predetermined event, a classifier that, in response to input of a user's feature vector, classifies the user as a user who generated the predetermined event or a user who did not generate the predetermined event, and that, by inputting the second feature vector into the generated classifier, classifies the user corresponding to the second feature vector as a user predicted to generate the predetermined event or a user predicted not to generate the predetermined event.

A feature vector generation method executed by a computer, comprising:
a step of acquiring first time-series data, which is time-series data of a first domain, and second time-series data, which is time-series data of a second domain, each being time-series data including item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event;
a step of specifying a correspondence relationship between a plurality of pieces of the user identification information;
a step of generating a plurality of pieces of partial time-series data by integrating a part of the first time-series data and a part of the second time-series data, based on the times included in the first time-series data and the second time-series data and on the specified correspondence relationship of the user identification information; and
a step of generating, based on the integrated plurality of pieces of partial time-series data, a feature vector indicating the feature of the item indicated by each of a plurality of pieces of item information included in each of the plurality of pieces of partial time-series data.

A feature vector generation program causing a computer to function as:
a time-series data acquisition unit that acquires first time-series data, which is time-series data of a first domain, and second time-series data, which is time-series data of a second domain, each being time-series data including item information indicating an item corresponding to an event, the time at which the event occurred, and user identification information identifying the user who generated the event;
a correspondence relationship specifying unit that specifies a correspondence relationship between a plurality of pieces of the user identification information;
an integration unit that generates a plurality of pieces of partial time-series data by integrating a part of the first time-series data and a part of the second time-series data, based on the times included in the first time-series data and the second time-series data and on the correspondence relationship of the user identification information specified by the correspondence relationship specifying unit; and
a feature vector generation unit that generates, based on the plurality of pieces of partial time-series data integrated by the integration unit, a feature vector indicating the feature of the item indicated by each of a plurality of pieces of item information included in each of the plurality of pieces of partial time-series data.