JP2023070618A

JP2023070618A - Information processing system, computer program, and information processing method

Info

Publication number: JP2023070618A
Application number: JP2022048893A
Authority: JP
Inventors: 雄介熊谷; Yusuke Kumagai; 龍道本; Ryu Domoto; 悠哉野沢; Yuya Nozawa
Original assignee: Hakuhodo DY Holdings Inc
Current assignee: Hakuhodo DY Holdings Inc
Priority date: 2021-11-09
Filing date: 2022-03-24
Publication date: 2023-05-19
Anticipated expiration: 2042-03-24
Also published as: WO2023085279A1; JP7227412B1; TW202336607A

Abstract

To provide a technology capable of associating entities without depending on common variables. SOLUTION: In an information processing system 1, a first data set regarding a plurality of first entities is acquired (S110), and a second data set regarding a plurality of second entities is acquired (S120). Dimension reduction processing is executed on a group of first feature vectors identified from the first data set and on a group of second feature vectors identified from the second data set (S130 and S140). This generates a group of first low-dimensional feature vectors corresponding to the group of first feature vectors, and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors. Each of the first entities is associated with at least one of the second entities on the basis of the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors (S150 to S180). SELECTED DRAWING: Figure 2

Description

The present disclosure relates to an information processing system and an information processing method.

Conventionally, customer purchasing behavior has been analyzed based on product sales data. Customers' contact with mass media and network content has also been analyzed. In addition, a wide variety of information about customers is collected through questionnaires and face-to-face interviews.

A data fusion technique that combines multiple sets of data collected by different means on the basis of common variables is also known. For example, the applicant has already disclosed a technique that combines a first data set representing first features of a first customer group with a second data set representing second features of a second customer group, based on variables common to the first data set and the second data set, such as customers' demographic attributes, so that the first data and the second data of similar customers are joined (see, for example, Patent Document 1).

Patent Document 1: JP 2016-126609 A

However, because conventional data fusion techniques use common variables to identify similar customers, they require variables about the customers that are common to the first data set and the second data set to be combined. The prior art therefore cannot combine data sets that share no common variables.

Therefore, according to one aspect of the present disclosure, it is desirable to provide a technique that can associate first entities with second entities, without relying on common variables, based on a first data set regarding a plurality of first entities and a second data set regarding a plurality of second entities.

According to one aspect of the present disclosure, an information processing system is provided. The information processing system includes a first acquisition unit, a second acquisition unit, a dimension reduction unit, and an associating unit. The first acquisition unit is configured to acquire a first data set regarding a plurality of first entities. The first data set may describe the features of each of the plurality of first entities.

The second acquisition unit is configured to acquire a second data set regarding a plurality of second entities. The second data set may describe the features of each of the plurality of second entities.

The dimension reduction unit is configured to execute dimension reduction processing on a group of first feature vectors identified from the first data set and a group of second feature vectors identified from the second data set, thereby generating a group of first low-dimensional feature vectors corresponding to the group of first feature vectors and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors. The group of second low-dimensional feature vectors may have the same number of dimensions as the group of first low-dimensional feature vectors.
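The dimension reduction step can be sketched as follows. This is only an illustration under stated assumptions: the passage does not name a reduction method, so principal component analysis (via SVD) is swapped in here, and the function name `reduce_dimensions`, the array sizes, and the random data are all hypothetical. The point it shows is that two data sets with different variables (10 and 7 columns) can both be reduced to vectors with the same number of dimensions.

```python
import numpy as np

def reduce_dimensions(features, n_components):
    """Project row-wise feature vectors onto their top principal components.

    PCA is an assumption for illustration; the text does not fix the method.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
first_features = rng.normal(size=(6, 10))   # 6 first entities, 10 variables
second_features = rng.normal(size=(6, 7))   # 6 second entities, 7 different variables

# Both groups end up with the same number of dimensions (3 here).
first_low = reduce_dimensions(first_features, n_components=3)
second_low = reduce_dimensions(second_features, n_components=3)
```

Note that the two reductions are computed independently, so the resulting axes are not aligned with each other; the later association step works on similarities within each group rather than on raw coordinates.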

Each of the first feature vectors may represent the features of a corresponding one of the plurality of first entities. Each of the second feature vectors may represent the features of a corresponding one of the plurality of second entities.

The associating unit is configured to associate each of the plurality of first entities with at least one of the plurality of second entities based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

When the set of first entities and the set of second entities are subsets of a common population or of mutually related populations, dimension reduction makes it possible to express the features of the first entities and the features of the second entities as combinations of components that are common or related to each other, even if no common variable exists between the first feature vectors and the second feature vectors.

In other words, dimension reduction can extract, from the first feature vectors and the second feature vectors, principal feature components that are common or related to each other. Comparing the low-dimensional feature vectors therefore makes it possible to appropriately determine the degree of matching between a first entity and a second entity.

Therefore, according to one aspect of the present disclosure, first entities and second entities can be appropriately associated, without relying on common variables, based on a first data set regarding a plurality of first entities and a second data set regarding a plurality of second entities.

According to one aspect of the present disclosure, the associating unit may associate each of the plurality of first entities with at least one of the plurality of second entities, based on the similarities between first entities identified from the group of first low-dimensional feature vectors and the similarities between second entities identified from the group of second low-dimensional feature vectors, such that the similarity-based interrelationships among the first entities match the interrelationships among the second entities.

When the set of first entities and the set of second entities are subsets of a common population or of mutually related populations, the similarity-based interrelationships among entities are, as in the population itself, roughly common or related between the set of first entities and the set of second entities.

Therefore, by associating each of the plurality of first entities with at least one of the plurality of second entities such that the similarity-based interrelationships among the first entities match those among the second entities, each first entity can be associated with an appropriate second entity that is highly identical or strongly related to it.

According to one aspect of the present disclosure, the first low-dimensional feature vectors may be defined by a first feature space. The second low-dimensional feature vectors may be defined by a second feature space. The associating unit may search for a mapping that maps the plurality of first entities in the first feature space to the second feature space such that the distribution of the plurality of first entities in the first feature space, identified from the group of first low-dimensional feature vectors, matches the distribution of the plurality of second entities in the second feature space, identified from the group of second low-dimensional feature vectors.
The associating unit may be configured to associate each of the plurality of first entities with at least one of the plurality of second entities based on the mapping.

According to one aspect of the present disclosure, the associating unit may be configured to search, as a matrix Ω*, for the matrix Ω that maximizes a value Z(Ω) given by an expression containing matrix K, matrix L, and matrix H (the expression itself is not reproduced in this text), and to associate each of the plurality of first entities with at least one of the plurality of second entities based on the matrix Ω*. In the expression, T is the transpose symbol, and trace(X) is the trace (sum of the diagonal elements) of matrix X.
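The expression for Z(Ω) did not survive in this text. One form that is consistent with the symbols named here (K, L, H, the transpose T, and the trace), and that matches the objective used in the kernelized sorting literature, would be the following. It is offered as an assumption, not as a reproduction of the original expression:

```latex
Z(\Omega) = \operatorname{trace}\!\left( H K H \, \Omega^{T} \, H L H \, \Omega \right),
\qquad
\Omega^{*} = \operatorname*{arg\,max}_{\Omega} \; Z(\Omega)
```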

Matrix K may be a matrix with N rows and N columns. The number of first entities may be N. The number of second entities may be the same as the number of first entities. Matrix K may be a first similarity matrix in which the value of the element in the i-th row and j-th column represents the similarity between the i-th entity and the j-th entity among the plurality of first entities.

The value of the element in the i-th row and j-th column of matrix K may be calculated based on the first low-dimensional feature vector of the i-th entity and the first low-dimensional feature vector of the j-th entity among the plurality of first entities.

Matrix L may be a matrix with N rows and N columns. Matrix L is a second similarity matrix in which the value of the element in the i-th row and j-th column represents the similarity between the i-th entity and the j-th entity among the plurality of second entities.

The value of the element in the i-th row and j-th column of matrix L may be calculated based on the second low-dimensional feature vector of the i-th entity and the second low-dimensional feature vector of the j-th entity among the plurality of second entities.

Matrix H may be a matrix with N rows and N columns. Matrix H may be a matrix in which the element in the i-th row and j-th column takes the value 1−1/N when i=j and the value 0 when i≠j.
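The search over Ω can be illustrated with a small sketch. Several things here are assumptions rather than statements from the text: the expression for Z(Ω) is not reproduced in this passage, so the code uses the kernelized-sorting form trace(H K H Ωᵀ H L H Ω); the similarity is computed with an RBF kernel; Ω is restricted to permutation matrices; and the search is exhaustive, which is workable only for very small N. All function names are hypothetical. Matrix H is built exactly as described above.

```python
import itertools
import numpy as np

def similarity_matrix(low_dim_vectors, gamma=1.0):
    """N x N similarity matrix from row-wise low-dimensional vectors (RBF kernel, assumed)."""
    diffs = low_dim_vectors[:, None, :] - low_dim_vectors[None, :, :]
    return np.exp(-gamma * np.sum(diffs ** 2, axis=-1))

def make_h(n):
    """Matrix H as described in the text: 1 - 1/N on the diagonal, 0 elsewhere."""
    return (1.0 - 1.0 / n) * np.eye(n)

def objective(K, L, H, omega):
    """Assumed form of Z(Omega): trace(H K H Omega^T H L H Omega)."""
    return np.trace(H @ K @ H @ omega.T @ H @ L @ H @ omega)

def search_omega(K, L, H):
    """Exhaustively search permutation matrices; feasible only for very small N."""
    n = K.shape[0]
    best_value, best_omega = -np.inf, None
    for perm in itertools.permutations(range(n)):
        omega = np.zeros((n, n))
        # Row a (a second entity) is tied to column perm[a] (a first entity).
        omega[np.arange(n), list(perm)] = 1.0
        value = objective(K, L, H, omega)
        if value > best_value:
            best_value, best_omega = value, omega
    return best_omega, best_value

rng = np.random.default_rng(0)
first_low = rng.normal(size=(5, 2))
true_perm = np.array([2, 0, 4, 1, 3])
second_low = first_low[true_perm]   # toy case: second entities are a shuffled copy

K = similarity_matrix(first_low)
L = similarity_matrix(second_low)
H = make_h(5)
omega_star, z_star = search_omega(K, L, H)
```

In this toy case the shuffled copy makes the pairwise similarity structure of the two groups identical, so the permutation that undoes the shuffle attains the maximum of the assumed objective. For realistic N an exhaustive search is impractical, and an iterative or approximate solver would be needed.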

According to one aspect of the present disclosure, the associating unit may change the dimension reduction method used in the dimension reduction processing based on the matrix Ω*. For example, the associating unit may change the dimension reduction method so that, among the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors, the distance in the feature space between a first low-dimensional feature vector and the second low-dimensional feature vector corresponding to it becomes shorter.

According to one aspect of the present disclosure, the associating unit may be configured to improve the matrix Ω* by repeatedly executing a re-search process until a predetermined condition is satisfied, and to associate each of the plurality of first entities with at least one of the plurality of second entities based on the improved matrix Ω*. In the re-search process, the associating unit changes the dimension reduction method used in the dimension reduction processing based on the matrix Ω*, causes the dimension reduction unit to execute the dimension reduction processing with the changed method, and searches, as the matrix Ω*, for the matrix Ω that maximizes the value Z(Ω) based on the newly obtained group of first low-dimensional feature vectors and group of second low-dimensional feature vectors.

An information processing system whose associating unit is configured in this manner can associate first entities with second entities with high accuracy.

According to one aspect of the present disclosure, the first data set may include a plurality of first feature data. Each of the plurality of first feature data may represent the features of a corresponding one of the plurality of first entities. The second data set may include a plurality of second feature data. Each of the plurality of second feature data may represent the features of a corresponding one of the plurality of second entities.

According to one aspect of the present disclosure, the information processing system may further include a data fusion unit. The data fusion unit may be configured to generate an extended data set by combining, based on the association made by the associating unit, each of the plurality of first feature data included in the first data set with one of the plurality of second feature data included in the second data set. The extended data set may comprise a plurality of extended data. Each of the plurality of extended data may be combined data of one first feature data and the corresponding second feature data.
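The data fusion step can be sketched as a simple join driven by the association result. The record layout, the entity IDs, the field names, and the function name `fuse` are all hypothetical; the sketch assumes each first entity has been associated with exactly one second entity.

```python
def fuse(first_data, second_data, correspondence):
    """Join each first record with the second record it is associated with.

    correspondence maps a first-entity ID to a second-entity ID (assumed one-to-one here).
    """
    extended = []
    for first_id, first_record in first_data.items():
        second_id = correspondence[first_id]
        combined = {"first_id": first_id, "second_id": second_id}
        combined.update(first_record)             # first feature data
        combined.update(second_data[second_id])   # associated second feature data
        extended.append(combined)
    return extended

first_data = {"a1": {"purchases": 3}, "a2": {"purchases": 7}}
second_data = {"b1": {"site_visits": 12}, "b2": {"site_visits": 4}}
correspondence = {"a1": "b2", "a2": "b1"}

extended_data_set = fuse(first_data, second_data, correspondence)
```

Each element of `extended_data_set` then carries both the first and the second features for one first entity, which is the sense in which the extended data set holds more information per entity than the first data set alone.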

Such an information processing system can generate an information-rich data set by combining a plurality of data sets.

According to one aspect of the present disclosure, the first entities may be people. The second entities may be people. The first data set may be a data set describing a first feature of each of a plurality of people belonging to a first group. The second data set may be a data set describing a second feature of each of a plurality of people belonging to a second group.

Features relating to people's behavior, interests, and the like are strongly related to demographic attributes, and the distribution of features according to demographic attributes is not expected to differ greatly even between different groups of people. Therefore, the information processing system according to one aspect of the present disclosure can appropriately associate people in different groups without common variables.

According to one aspect of the present disclosure, the combination of the first feature and the second feature may be a combination of a feature relating to purchasing behavior and a feature relating to movement in at least one of an online space and an offline space and/or a feature relating to visits to a plurality of points in that space. Associating entities based on data sets relating to such features, and the data fusion that follows, is useful for analyzing human behavior.

According to one aspect of the present disclosure, the second data set may be associated with identification information of the information terminal corresponding to each of the plurality of second entities.

According to one aspect of the present disclosure, the information processing system may include a distribution unit configured to distribute information content, based on the identification information, to the set of information terminals corresponding to the set of second entities selected, from among the plurality of second entities, as distribution destinations of the information content.

According to one aspect of the present disclosure, the information processing system may include a selection unit that selects, as distribution destinations of the information content, at least part of the set of second entities associated by the associating unit with any of the plurality of first entities.

This information processing system is particularly useful when the first entities and the second entities are people. With the distribution scheme described above, even when the correspondence between the first entities and information terminals is unknown, the identification information of the information terminals associated with the second entities can be used to distribute information content appropriately to the information terminals of the second entities corresponding to the first entities.

According to one aspect of the present disclosure, the selection unit may be configured to select, as distribution destinations of the information content, a first set, which is the set of second entities associated by the associating unit with any of the plurality of first entities, and a second set, which consists of those second entities whose features are similar to those of the first set. Selecting distribution destinations in this way makes it possible to expand the destinations to an appropriate extent based on the second data set when distributing the information content.
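The expansion of distribution destinations can be sketched as follows. The passage does not define what "similar" means, so this illustration assumes Euclidean distance in the low-dimensional feature space with a hypothetical threshold `radius`; the function name `select_targets` and the sample coordinates are also hypothetical.

```python
import numpy as np

def select_targets(second_low, matched_ids, radius):
    """Return (first_set, second_set) of second-entity indices.

    first_set: second entities already associated with some first entity.
    second_set: unmatched second entities within `radius` of any member of first_set
                (Euclidean distance is an assumed similarity criterion).
    """
    matched = sorted(set(matched_ids))
    first_set = set(matched)
    second_set = set()
    for idx in range(len(second_low)):
        if idx in first_set:
            continue
        dists = np.linalg.norm(second_low[matched] - second_low[idx], axis=1)
        if dists.min() <= radius:
            second_set.add(idx)
    return first_set, second_set

second_low = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2]])
first_set, second_set = select_targets(second_low, matched_ids=[0], radius=0.5)
```

Here entities 1 and 3 lie close to the matched entity 0 and are added as the second set, while entity 2 is too far away and is left out. The union of the two sets would be the distribution destinations.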

According to one aspect of the present disclosure, the second data set may be a data set describing behavioral features of each of the plurality of second entities. In this case, the information processing system may include an estimation unit that, for at least some of the plurality of first entities, calculates for each entity an estimated value regarding the behavior of that entity, based on the behavioral features, identified from the second data set, of at least one of the second entities associated with that entity by the associating unit. In this case, the first entities and the second entities may be people.

An information processing system including the estimation unit described above can estimate, through the second data set, behavior of the first entities that cannot be determined from the first data set alone. The estimation may be a prediction.
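A minimal sketch of such an estimation is given below. The passage does not specify how the estimated value is calculated, so the sketch assumes a simple average over the behavioral values of the associated second entities; the IDs, values, and the function name `estimate_behavior` are hypothetical.

```python
def estimate_behavior(correspondence, second_behavior):
    """Average the behavioral values of the second entities tied to each first entity.

    Averaging is an assumed estimator; the text only requires that the estimate be
    based on the behavior of at least one associated second entity.
    """
    estimates = {}
    for first_id, second_ids in correspondence.items():
        values = [second_behavior[s] for s in second_ids]
        estimates[first_id] = sum(values) / len(values)
    return estimates

second_behavior = {"b1": 2.0, "b2": 4.0, "b3": 9.0}   # e.g. visits per week
correspondence = {"a1": ["b1", "b2"], "a2": ["b3"]}   # first entity -> associated second entities

estimates = estimate_behavior(correspondence, second_behavior)
```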

According to one aspect of the present disclosure, a computer program may be provided for causing a computer to function as at least part of the first acquisition unit, the second acquisition unit, the dimension reduction unit, and the associating unit in the information processing system described above.

According to one aspect of the present disclosure, a computer program may be provided for causing a computer to function as at least part of the first acquisition unit, the second acquisition unit, the dimension reduction unit, the associating unit, the distribution unit, and the selection unit in the information processing system described above.

According to one aspect of the present disclosure, a computer program may be provided for causing a computer to function as at least part of the first acquisition unit, the second acquisition unit, the dimension reduction unit, the associating unit, and the estimation unit in the information processing system described above. These computer programs may be provided recorded on a recording medium.

According to one aspect of the present disclosure, an information processing method corresponding to the method executed by the information processing system described above may be provided. According to one aspect of the present disclosure, a computer-implemented information processing method may be provided. The information processing method may include acquiring a first data set regarding a plurality of first entities, the first data set describing the features of each of the plurality of first entities.

The information processing method may include acquiring a second data set regarding a plurality of second entities, the second data set describing the features of each of the plurality of second entities.

The information processing method may include executing dimension reduction processing on a group of first feature vectors identified from the first data set and a group of second feature vectors identified from the second data set, thereby generating a group of first low-dimensional feature vectors corresponding to the group of first feature vectors and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors. The group of second low-dimensional feature vectors may have the same number of dimensions as the group of first low-dimensional feature vectors.

The information processing method may include associating each of the plurality of first entities with at least one of the plurality of second entities based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

According to one aspect of the present disclosure, the associating may include associating each of the plurality of first entities with one of the plurality of second entities, based on the similarities between first entities identified from the group of first low-dimensional feature vectors and the similarities between second entities identified from the group of second low-dimensional feature vectors, such that the similarity-based interrelationships among the first entities match the interrelationships among the second entities.

Like the information processing system described above, such an information processing method can associate first entities with second entities, without relying on common variables, based on a first data set regarding a plurality of first entities and a second data set regarding a plurality of second entities.

FIG. 1 is a block diagram showing the configuration of the information processing system.
FIG. 2 is a flowchart showing the analysis processing executed by the processor.
FIG. 3A is a diagram illustrating the configuration of the first data set, and FIG. 3B is a diagram illustrating the configuration of the second data set.
FIGS. 4A and 4B are diagrams illustrating the method of searching for the matrix Ω.
FIG. 5 is a diagram illustrating the configuration of the correspondence table generated by the processor.
FIG. 6 is a diagram illustrating the configuration of the extended data set generated by the processor.
FIG. 7 is a flowchart showing the analysis processing executed by the processor in the second embodiment.
FIG. 8 is a flowchart showing the evaluation processing executed by the processor in the third embodiment.
FIG. 9 is a flowchart showing the selection processing executed by the processor in the third embodiment.
FIG. 10 is a block diagram showing the configuration of the distribution system of the fourth embodiment.
FIG. 11 is a diagram illustrating the configuration of the internal data set in the fourth embodiment.
FIG. 12 is a flowchart showing the distribution control processing executed by the processor in the fourth embodiment.
FIG. 13 is a flowchart showing the distribution control processing executed by the processor in the fifth embodiment.
FIG. 14 is a flowchart showing the prediction processing executed by the processor in the sixth embodiment.

Exemplary embodiments of the present disclosure are described below with reference to the drawings.
<First embodiment>
The information processing system 1 of this embodiment is configured by installing a dedicated computer program Pr on a general-purpose computer. As shown in FIG. 1, the information processing system 1 includes a processor 11, a memory 13, a storage 15, a user interface 17, and a communication interface 19.

The processor 11 executes processing according to the computer program Pr stored in the storage 15. The memory 13 is a primary storage device including RAM, and is used as a work area when the processor 11 executes processing.

The storage 15 is a secondary storage device including, for example, a hard disk drive or a solid-state drive. In addition to the computer program Pr, it stores various data used when processing according to the computer program Pr is executed.

The user interface 17 includes an input device for inputting, to the processor 11, operation signals from a user who operates the information processing system 1, and a display for presenting various information to the user. Examples of the input device include a keyboard and a pointing device.

The communication interface 19 includes a LAN (local area network) interface and a USB (Universal Serial Bus) interface, and is used for communication with external devices. The information processing system 1 sends and receives data to and from external devices through the communication interface 19.

By executing processing according to the computer program Pr, the processor 11 of the information processing system 1 generates an extended data set 15C by extending a first data set 15A, acquired from an external device through the communication interface 19, using a second data set 15B.

The extended data set 15C is a data set obtained by adding information contained in the second data set 15B to the first data set 15A. The extension increases the amount of information available for each entity described by the first data set 15A. An entity is, for example, a person. The amount of information is increased to support analysis of human behavior and advertisement distribution based on the extended data set 15C.

具体的に、情報処理システム１のプロセッサ１１は、ユーザインタフェース１７を通じてユーザからの実行指令が入力されると、図２に示す分析処理を実行する。図２に示す分析処理を開始すると、プロセッサ１１は、データフュージョン対象の第一のデータセット１５Ａと第二のデータセット１５Ｂとを取得する（Ｓ１１０，Ｓ１２０）。 Specifically, the processor 11 of the information processing system 1 executes the analysis process shown in FIG. 2 when an execution command is input from the user through the user interface 17 . When the analysis process shown in FIG. 2 is started, the processor 11 acquires the first data set 15A and the second data set 15B for data fusion (S110, S120).

Ｓ１１０，Ｓ１２０において、プロセッサ１１は、ストレージ１５に予め格納された第一のデータセット１５Ａ及び第二のデータセット１５Ｂを、ストレージ１５から読み出す。これにより、プロセッサ１１は、第一のデータセット１５Ａ及び第二のデータセット１５Ｂを取得する。 In S110 and S120, the processor 11 reads the first data set 15A and the second data set 15B, which are stored in advance in the storage 15, from the storage 15. The processor 11 thereby acquires the first data set 15A and the second data set 15B.

取得すべき第一のデータセット１５Ａ及び第二のデータセット１５Ｂは、ユーザから指定され得る。ユーザは、データフュージョン対象の第一のデータセット１５Ａ及び第二のデータセット１５Ｂを予め収集してストレージ１５に格納することができる。 The first data set 15A and the second data set 15B to be acquired can be specified by the user. The user can collect the first data set 15A and the second data set 15B for data fusion in advance and store them in the storage 15 .

あるいは、プロセッサ１１は、通信インタフェース１９を用いた通信により、第一の外部装置から第一のデータセット１５Ａを取得し、第二の外部装置から第二のデータセット１５Ｂを取得することができる。 Alternatively, the processor 11 can acquire the first data set 15A from a first external device and the second data set 15B from a second external device by communication using the communication interface 19.

具体的に、第一のデータセット１５Ａは、複数の第一のエンティティに関するデータセットであって、第一のエンティティのそれぞれの第一の特徴を記述するデータセットである。第一のデータセット１５Ａは、第一の特徴データの集合であり、第一の特徴データのそれぞれは、複数の第一のエンティティのうちの対応する一つのエンティティの第一の特徴を表す。 Specifically, the first dataset 15A is a dataset relating to a plurality of first entities and describing a first characteristic of each of the first entities. The first data set 15A is a set of first feature data, each piece of first feature data representing a first feature of a corresponding one of the plurality of first entities.

第二のデータセット１５Ｂは、複数の第二のエンティティに関するデータセットであって、第二のエンティティのそれぞれの第二の特徴を記述するデータセットである。第二の特徴は、第一の特徴とは異なる特徴であり得る。具体的に、第二のデータセット１５Ｂは、第二の特徴データの集合であり、第二の特徴データのそれぞれは、複数の第二のエンティティのうちの対応する一つのエンティティの第二の特徴を表す。 The second data set 15B is a data set relating to a plurality of second entities and describing a second feature of each of the second entities. The second feature can be a feature different from the first feature. Specifically, the second data set 15B is a set of second feature data, each piece of second feature data representing the second feature of a corresponding one of the plurality of second entities.

第一のエンティティの集合、及び、第二のエンティティの集合は、例えば、互いに共通する母集団における異なる部分集合である。母集団は、人の集合、又は、消費者の集合であり得る。例えば第一のエンティティの集合は、第一の企業の顧客に対応する人の集合であり得て、第二のエンティティの集合は、第一の企業とは異なる第二の企業の顧客に対応する人の集合であり得る。 The first set of entities and the second set of entities are, for example, different subsets of a common population. A population can be a collection of people or a collection of consumers. For example, a first set of entities may be a set of people corresponding to customers of the first company, and a second set of entities correspond to customers of a second company that is different from the first company. It can be a group of people.

あるいは、第一のエンティティの集合は、第一の行動の収集対象とされる人の集合であり得て、第二のエンティティの集合は、第二の行動の収集対象とされる人の集合であり得る。 Alternatively, the first set of entities may be the set of people from whom the first behavior is collected, and the second set of entities is the set of people from whom the second behavior is collected. could be.

図３Ａに示される第一のデータセット１５Ａは、第一の人の集合に関するデータであり、人毎の購買行動に関する特徴データを備える。各特徴データは、対応する人のＩＤに関連付けて、複数の商品Ｐ１，Ｐ２，Ｐ３，…のそれぞれを、対応する人が購入したかを１又は０の２値で表す。 The first data set 15A shown in FIG. 3A is data about a first set of people and comprises feature data about the purchasing behavior of each person. Each piece of feature data is associated with the ID of the corresponding person and indicates, as a binary value of 1 or 0, whether that person has purchased each of a plurality of products P1, P2, P3, ....

図３Ｂに示される第二のデータセット１５Ｂは、第二の人の集合に関するデータであり、人毎のウェブコンテンツの閲覧行動に関する特徴データを備える。各特徴データは、対応する人のＩＤに関連付けて、複数のウェブサイトＳ１，Ｓ２，Ｓ３，…のそれぞれについて、対応するウェブサイトを、対応する人が訪問してウェブコンテンツを閲覧したか否かを１又は０の２値で表す。 The second data set 15B shown in FIG. 3B is data about a second set of people and comprises feature data about the web content viewing behavior of each person. Each piece of feature data is associated with the ID of the corresponding person and indicates, as a binary value of 1 or 0, for each of a plurality of websites S1, S2, S3, ..., whether that person has visited the website and viewed its web content.

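プロセッサ１１は、Ｓ１１０において、取得した第一のデータセット１５Ａに含まれる第一のエンティティ毎の第一の特徴データに基づいて、第一のエンティティ毎のＭ１次元特徴ベクトルｘ＝（ｘ１，ｘ２，ｘ３，…）を生成する。一例によれば、特徴ベクトルｘの要素ｘ１，ｘ２，ｘ３，…は、それぞれ、対応する人の商品Ｐ１，Ｐ２，Ｐ３，…の購買の有無を表し得る。 In S110, the processor 11 generates an M1-dimensional feature vector x = (x1, x2, x3, ...) for each first entity, based on the first feature data of each first entity included in the acquired first data set 15A. According to one example, the elements x1, x2, x3, ... of the feature vector x can respectively represent whether the corresponding person has purchased the products P1, P2, P3, ....

同様に、プロセッサ１１は、Ｓ１２０において、取得した第二のデータセット１５Ｂに含まれる第二のエンティティ毎の第二の特徴データに基づいて、第二のエンティティ毎のＭ２次元特徴ベクトルｙ＝（ｙ１，ｙ２，ｙ３，…）を生成する。一例によれば、特徴ベクトルｙの要素ｙ１，ｙ２，ｙ３，…は、それぞれ、対応する人のウェブサイトＳ１，Ｓ２，Ｓ３，…でのウェブコンテンツの閲覧有無を表し得る。 Similarly, in S120, the processor 11 generates an M2-dimensional feature vector y = (y1, y2, y3, ...) for each second entity, based on the second feature data of each second entity included in the acquired second data set 15B. According to one example, the elements y1, y2, y3, ... of the feature vector y can respectively represent whether the corresponding person has viewed web content on the websites S1, S2, S3, ....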
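
The generation of the feature vectors x and y in S110 and S120 can be illustrated with a minimal Python sketch. The person IDs (`a01`, `b01`, ...), the product and website lists, and the helper `to_feature_vector` are hypothetical names chosen for illustration; the description does not prescribe a data layout.

```python
def to_feature_vector(observed, vocabulary):
    """Return a binary vector: 1 if the item was observed for the entity, else 0."""
    return [1 if item in observed else 0 for item in vocabulary]

# Hypothetical item lists; their lengths play the roles of M1 and M2.
products = ["P1", "P2", "P3", "P4"]   # M1 = 4
sites = ["S1", "S2", "S3"]            # M2 = 3

# Hypothetical raw records: person ID -> set of purchased products / visited sites.
purchases = {"a01": {"P1", "P3"}, "a02": {"P2"}}
visits = {"b01": {"S2", "S3"}, "b02": set()}

# Feature vector x for each first entity and y for each second entity.
x = {pid: to_feature_vector(items, products) for pid, items in purchases.items()}
y = {pid: to_feature_vector(items, sites) for pid, items in visits.items()}
```

Note that the two data sets share neither person IDs nor columns, which is the situation the later alignment processing is meant to handle.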

その後、プロセッサ１１は、特徴ベクトルｘの一群に対する次元削減処理（Ｓ１３０）により、各特徴ベクトルｘを、Ｍ１次元特徴ベクトルから、それより小さいＭ次元の特徴ベクトルである低次元特徴ベクトルＤｘ＝（Ｄｘ１，Ｄｘ２，…）に変換する。これにより、プロセッサ１１は、特徴ベクトルｘの一群に対応する低次元特徴ベクトルＤｘの一群を生成する。図３Ａの右下領域は、低次元特徴ベクトルＤｘの例を、テーブルにより示す。 After that, by dimension reduction processing on the group of feature vectors x (S130), the processor 11 converts each feature vector x from an M1-dimensional feature vector into a low-dimensional feature vector Dx = (Dx1, Dx2, ...), which is a feature vector of M dimensions, where M is smaller than M1. The processor 11 thereby generates a group of low-dimensional feature vectors Dx corresponding to the group of feature vectors x. The lower right area of FIG. 3A shows an example of the low-dimensional feature vectors Dx in the form of a table.

プロセッサ１１は更に、特徴ベクトルｙの一群に対する次元削減処理（Ｓ１４０）により、各特徴ベクトルｙを、Ｍ２次元特徴ベクトルから、それより小さいＭ次元の特徴ベクトルである低次元特徴ベクトルＤｙ＝（Ｄｙ１，Ｄｙ２，…）に変換する。これにより、プロセッサ１１は、特徴ベクトルｙの一群に対応する低次元特徴ベクトルＤｙの一群を生成する。低次元特徴ベクトルＤｙは、低次元特徴ベクトルＤｘと同一次元数Ｍの特徴ベクトルである。図３Ｂの右下領域は、低次元特徴ベクトルＤｙの例を、テーブルにより示す。 Further, by dimension reduction processing on the group of feature vectors y (S140), the processor 11 converts each feature vector y from an M2-dimensional feature vector into a low-dimensional feature vector Dy = (Dy1, Dy2, ...), which is a feature vector of M dimensions, where M is smaller than M2. The processor 11 thereby generates a group of low-dimensional feature vectors Dy corresponding to the group of feature vectors y. The low-dimensional feature vector Dy has the same number of dimensions M as the low-dimensional feature vector Dx. The lower right area of FIG. 3B shows an example of the low-dimensional feature vectors Dy in the form of a table.

低次元空間への写像を実現するためのアルゴリズムの例としては、非負値行列分解（ＮｏｎｎｅｇａｔｉｖｅＭａｔｒｉｘＦａｃｔｏｒｉｚａｔｉｏｎ）、潜在的ディリクレ分配（ｌａｔｅｎｔｄｉｒｉｃｈｌｅｔａｌｌｏｃａｔｉｏｎ）、特異値分解（ｓｉｎｇｕｌａｒｖａｌｕｅｄｅｃｏｍｐｏｓｉｔｉｏｎ）、及び、確率的潜在意味解析（ＰｒｏｂａｂｉｌｉｓｔｉｃＬａｔｅｎｔＳｅｍａｎｔｉｃＡｎａｌｙｓｉｓ）が知られている。Ｓ１３０，Ｓ１４０における次元削減処理は、これらのアルゴリズムの一つを用いて実行され得る。 Known examples of algorithms for realizing the mapping to a low-dimensional space include non-negative matrix factorization, latent Dirichlet allocation, singular value decomposition, and probabilistic latent semantic analysis. The dimension reduction processing in S130 and S140 can be performed using one of these algorithms.

上述したアルゴリズムによれば、特徴ベクトルは、エンティティの個々を強く特徴付ける主要な特徴成分が抽出されるように、あるいは、エンティティの個々を区別するための情報の損失が少ない形式で、低次元化される。 With the algorithms described above, the feature vectors are reduced in dimension so that the main feature components that strongly characterize the individual entities are extracted, or in a form that loses little of the information needed to distinguish the individual entities.
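
As an illustration of S130 and S140, the following sketch reduces two feature matrices with different column counts to the same number of dimensions M using a truncated singular value decomposition, one of the algorithms listed above. The function name `reduce_dimension`, the random binary data, and the choice M = 2 are assumptions made for the example, not values from the description.

```python
import numpy as np

def reduce_dimension(features, m):
    """Project row vectors onto their top-m singular directions (truncated SVD)."""
    u, s, _vt = np.linalg.svd(features, full_matrices=False)
    return u[:, :m] * s[:m]   # shape: (number of entities, m)

rng = np.random.default_rng(0)
X = (rng.random((6, 8)) < 0.4).astype(float)   # 6 first entities, M1 = 8 (synthetic)
Y = (rng.random((6, 5)) < 0.4).astype(float)   # 6 second entities, M2 = 5 (synthetic)

M = 2                          # common target dimension, chosen arbitrarily here
Dx = reduce_dimension(X, M)    # low-dimensional feature vectors Dx
Dy = reduce_dimension(Y, M)    # low-dimensional feature vectors Dy
```

Each data set is reduced independently, so the order and sign of the components in `Dx` and `Dy` are not guaranteed to agree; only the number of dimensions is the same.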

その後、プロセッサ１１は、低次元特徴ベクトルＤｘの一群と、低次元特徴ベクトルＤｙの一群と、に基づいて、第一のエンティティのそれぞれと、第二のエンティティのそれぞれとの対応関係を計算するアライメント処理を行う（Ｓ１５０－Ｓ１８０）。 After that, based on the group of low-dimensional feature vectors Dx and the group of low-dimensional feature vectors Dy, the processor 11 performs alignment processing that calculates the correspondence between each of the first entities and each of the second entities (S150-S180).

アライメント処理は、カーネライズドソーティング（ＫｅｒｎｅｌｉｚｅｄＳｏｒｔｉｎｇ）の技術を用いて行われる。以下には、カーネライズドソーティングを用いたアライメント処理の詳細を説明するが、アライメント処理は、敵対的学習、Ｇｒｏｍｏｖ－ＷａｓｓｅｒｓｔｅｉｎＡｌｉｇｎｍｅｎｔ技術、又は、不均衡最適輸送（ＵｎｂａｌａｎｃｅｄＯｐｔｉｍａｌＴｒａｎｓｐｏｒｔ）技術を用いて実現されてもよい。 The alignment processing is performed using the kernelized sorting technique. The details of alignment processing using kernelized sorting are described below, but the alignment processing may instead be realized using adversarial learning, the Gromov-Wasserstein alignment technique, or the unbalanced optimal transport technique.

Ｓ１５０において、プロセッサ１１は、低次元特徴ベクトルＤｘの一群を用いて、第一のエンティティの集合に関する類似度行列Ｋを生成する。類似度行列Ｋは、Ｎ行Ｎ列の正方行列である。ここで、Ｎは、低次元特徴ベクトルＤｘの個数、換言すれば、第一のエンティティの数である。 At S150, the processor 11 uses the collection of low-dimensional feature vectors Dx to generate a similarity matrix K for the first set of entities. The similarity matrix K is a square matrix with N rows and N columns. Here, N is the number of low-dimensional feature vectors Dx, in other words, the number of first entities.

類似度行列Ｋは、第ｉ行第ｊ列の要素の値Ｋｉｊが、第一のエンティティの集合におけるｉ番目のエンティティとｊ番目のエンティティとの間の類似度を表す行列として定義される。 The similarity matrix K is defined as a matrix in which the value Kij of the i-th row and j-th column element represents the similarity between the i-th entity and the j-th entity in the first set of entities.

すなわち、類似度行列Ｋは、第一のエンティティの集合に関して、エンティティ間の類似度の分布を説明する行列として定義される。換言すれば、類似度行列Ｋは、第一のエンティティの集合に関して、特徴空間上のエンティティの分布を、エンティティ間の近しさの尺度を用いて説明する行列として定義される。 That is, the similarity matrix K is defined as a matrix that describes the distribution of similarities between entities with respect to the first set of entities. In other words, the similarity matrix K is defined as a matrix that describes the distribution of entities on the feature space with respect to the first set of entities using a measure of closeness between entities.

具体的に、類似度は、ｉ番目のエンティティの低次元特徴ベクトルＤｘである低次元特徴ベクトルＤｘ［ｉ］と、ｊ番目のエンティティの低次元特徴ベクトルＤｘである低次元特徴ベクトルＤｘ［ｊ］と、をカーネル関数ｋ（ａ，ｂ）に代入した値ｋ（Ｄｘ［ｉ］，Ｄｘ［ｊ］）として算出される。すなわち、Ｋｉｊ＝ｋ（Ｄｘ［ｉ］，Ｄｘ［ｊ］）である。 Specifically, the similarity is calculated as the value k(Dx[i], Dx[j]) obtained by substituting the low-dimensional feature vector Dx[i] of the i-th entity and the low-dimensional feature vector Dx[j] of the j-th entity into a kernel function k(a, b). That is, Kij = k(Dx[i], Dx[j]).

カーネル関数ｋ（ａ，ｂ）の例には、次式で表されるガウシアンＲＢＦ（動径基底関数）カーネルが含まれる。このカーネル関数ｋ（ａ，ｂ）を用いて算出される類似度は、値０から値１までの範囲の値を採る。 Examples of the kernel function k(a, b) include the Gaussian RBF (radial basis function) kernel expressed by the following equation. The similarity calculated using this kernel function k(a, b) takes a value in the range from 0 to 1.

上記カーネル関数ｋ（ａ，ｂ）によれば、類似度行列Ｋの要素の値Ｋｉｊは、０＜Ｋｉｊ≦１である。 According to the above kernel function k(a, b), each element value Kij of the similarity matrix K satisfies 0 < Kij ≤ 1.
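
The equation for the Gaussian RBF kernel is not reproduced in this text. The sketch below assumes the common form k(a, b) = exp(-||a - b||^2 / (2σ^2)), where the bandwidth σ and its default value are assumptions, and builds the similarity matrix K from a few hypothetical low-dimensional feature vectors.

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian RBF kernel in an assumed parametrization with bandwidth sigma."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def similarity_matrix(vectors, sigma=1.0):
    """N x N matrix whose (i, j) element is k(vectors[i], vectors[j])."""
    return [[rbf_kernel(a, b, sigma) for b in vectors] for a in vectors]

# Hypothetical low-dimensional feature vectors Dx for three first entities.
Dx = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = similarity_matrix(Dx)
```

With this form, the diagonal elements equal 1, the matrix is symmetric, and every element lies in the interval 0 < Kij ≤ 1, in line with the range stated above. The same function applied to the vectors Dy would give the similarity matrix L.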

Ｓ１６０において、プロセッサ１１は、低次元特徴ベクトルＤｙの一群を用いて、第二のエンティティの集合に関する類似度行列Ｌを生成する。類似度行列Ｌは、Ｎ行Ｎ列の正方行列である。ここで、Ｎは、低次元特徴ベクトルＤｙの個数、換言すれば、第二のエンティティの数である。すなわち、第一のエンティティの数と、第二のエンティティの数は、同一である。 At S160, processor 11 generates a similarity matrix L for the second set of entities using the set of low-dimensional feature vectors Dy. The similarity matrix L is a square matrix with N rows and N columns. Here, N is the number of low-dimensional feature vectors Dy, in other words, the number of second entities. That is, the number of first entities and the number of second entities are the same.

類似度行列Ｌは、類似度行列Ｋと同様に、第ｉ行第ｊ列の要素の値Ｌｉｊが第二のエンティティの集合のうち、ｉ番目のエンティティとｊ番目のエンティティとの間の類似度を表す行列として定義される。すなわち、第ｉ行第ｊ列の要素の値Ｌｉｊ＝ｋ（Ｄｙ［ｉ］，Ｄｙ［ｊ］）である。 In the similarity matrix L, similar to the similarity matrix K, the value Lij of the element in the i-th row and j-th column indicates the similarity between the i-th entity and the j-th entity in the set of second entities. is defined as a matrix representing That is, the value Lij of the element in the i-th row and j-th column is Lij=k(Dy[i], Dy[j]).

続くＳ１７０において、プロセッサ１１は、類似度行列Ｋ及び類似度行列Ｌを用いて、次式に従う値Ｚ（Ω）を最大化する行列Ωを行列Ω^＊として探索する。 In subsequent S170, the processor 11 uses the similarity matrix K and the similarity matrix L to search for the matrix Ω that maximizes the value Z(Ω) given by the following equation, and takes it as the matrix Ω^*.

ここで、行列Ｈは、Ｎ行Ｎ列の行列であって、第ｉ行第ｊ列の要素の値が、ｉ＝ｊであるとき値１－１／Ｎを示し、ｉ≠ｊであるとき値０を示す対角行列である。Ｔは、転置記号である。ｔｒａｃｅ（Ｘ）は、行列Ｘの対角和である。類似度行列Ｋ，Ｌは、対称行列である。行列Ω^ＴＬ’Ωが行列Ｋ’の転置行列となるような理想的なΩが見つかるとき、値Ｚ（Ω）は、最大化する。 Here, the matrix H is a diagonal matrix of N rows and N columns in which the element in the i-th row and j-th column has the value 1-1/N when i = j and the value 0 when i ≠ j. T is the transpose symbol. trace(X) is the trace of the matrix X, that is, the sum of its diagonal elements. The similarity matrices K and L are symmetric matrices. The value Z(Ω) is maximized when an ideal Ω is found such that the matrix Ω^T L'Ω is the transpose of the matrix K'.
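
The equation for Z(Ω) is not reproduced in this text. A reconstruction that is consistent with the symbols H, K', L', Ω^T and trace used in the surrounding description, and with the usual kernelized sorting objective, is given below as an assumption rather than as the original formula:

```latex
Z(\Omega) = \operatorname{trace}\left( K' \, \Omega^{T} L' \, \Omega \right),
\qquad K' = H K H, \qquad L' = H L H
```

Under this reading, and taking Ω to be a permutation matrix, the Frobenius norm of Ω^T L' Ω equals that of L', so the Cauchy-Schwarz inequality gives Z(Ω) ≤ ||K'|| ||L'||. Equality is reached when Ω^T L' Ω coincides with K', which is the condition stated in the text because K' is symmetric and therefore equal to its own transpose.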

行列Ω^＊を探索することは、低次元特徴ベクトルＤｘの一群から特定される第一のエンティティ間の類似度、及び、低次元特徴ベクトルＤｙの一群から特定される第二のエンティティ間の類似度に基づき、類似度に関する第一のエンティティ間の相互関係が第二のエンティティ間の相互関係に適合するように、複数の第一のエンティティのそれぞれを、複数の第二のエンティティの少なくとも一つに対応付けることに対応する。 Searching for the matrix Ω^* corresponds to associating each of the plurality of first entities with at least one of the plurality of second entities, based on the similarity between first entities identified from the group of low-dimensional feature vectors Dx and the similarity between second entities identified from the group of low-dimensional feature vectors Dy, such that the interrelationship among the first entities in terms of similarity matches the interrelationship among the second entities.

換言すれば、行列Ω^＊を探索することは、低次元特徴ベクトルＤｘの一群から特定される第一のＭ次元特徴空間における第一のエンティティの分布であって、エンティティ間の類似度で定義される第一のエンティティの分布が、低次元特徴ベクトルＤｙの一群から特定される第二のＭ次元特徴空間における第二のエンティティの分布に適合するように、第一のＭ次元特徴空間上の複数の第一のエンティティを第二のＭ次元特徴空間にマッピングするための写像を探索することに対応する。 In other words, searching for the matrix Ω^* corresponds to searching for a mapping that maps the plurality of first entities on the first M-dimensional feature space to the second M-dimensional feature space, such that the distribution of the first entities in the first M-dimensional feature space identified from the group of low-dimensional feature vectors Dx, the distribution being defined by the similarity between entities, matches the distribution of the second entities in the second M-dimensional feature space identified from the group of low-dimensional feature vectors Dy.

図４Ａの左グラフは、第一のエンティティの分布を概念的に表し、図４Ｂの左グラフは、第二のエンティティの分布を概念的に表す。図４Ａ及び図４Ｂに示す例は、技術説明のためだけに、２次元の低次元特徴ベクトルＤｘ，Ｄｙを定義している。符号Ｅ１１，Ｅ１２，Ｅ１３，Ｅ１４，Ｅ１５，Ｅ１６，Ｅ１７が付された各点は、第一のエンティティのそれぞれの特徴空間上の位置を示す。符号Ｅ２１，Ｅ２２，Ｅ２３，Ｅ２４，Ｅ２５，Ｅ２６，Ｅ２７が付された各点は、第二のエンティティのそれぞれの特徴空間上の位置を示す。 The left graph in FIG. 4A conceptually represents the distribution of the first entity, and the left graph in FIG. 4B conceptually represents the distribution of the second entity. The examples shown in FIGS. 4A and 4B define two-dimensional low-dimensional feature vectors Dx and Dy for technical explanation only. Each point labeled E11, E12, E13, E14, E15, E16, E17 indicates the position of the first entity on the feature space. Each point labeled E21, E22, E23, E24, E25, E26, E27 indicates the position of the second entity on the feature space.

図４Ｂから理解できるように、この例によれば、低次元特徴ベクトルＤｙの成分Ｄｙ１は、低次元特徴ベクトルＤｘの成分Ｄｘ２に対応し、低次元特徴ベクトルＤｙの成分Ｄｙ２は、低次元特徴ベクトルＤｘの成分Ｄｘ１に対応する。 As can be understood from FIG. 4B, according to this example, the component Dy1 of the low-dimensional feature vector Dy corresponds to the component Dx2 of the low-dimensional feature vector Dx, and the component Dy2 of the low-dimensional feature vector Dy corresponds to the low-dimensional feature vector It corresponds to the component Dx1 of Dx.

すなわち、図４Ａに示す例によれば、第一のエンティティの一群と、第二のエンティティの一群とは、エンティティの配列及び次元の順序が、類似度行列Ｋと類似度行列Ｌとの間で異なる形で定義されているだけであり、実質、同じエンティティの集合の類似度分布を示す。 That is, according to the example shown in FIG. 4A , the first group of entities and the second group of entities are such that the entity arrangement and dimension order are between the similarity matrix K and the similarity matrix L. They represent similarity distributions for sets of entities that are essentially the same, only defined differently.

第一のエンティティの一群と、第二のエンティティの一群とが、母集団が同じであるなどの理由により、相互に共通する又は関係する集団的性質を有するときには、特徴ベクトルｘ，ｙの低次元化により、情報源の第一のデータセット１５Ａと第二のデータセット１５Ｂとの間に共通変数がなくとも、各エンティティに本質的な共通する特徴成分を抽出することができる。 When the group of first entities and the group of second entities have collective properties that are common or related to each other, for example because they come from the same population, reducing the dimensionality of the feature vectors x and y makes it possible to extract feature components that are essentially common to the entities, even if there is no common variable between the first data set 15A and the second data set 15B serving as information sources.

但し、このような低次元化によっても、低次元特徴ベクトルＤｘ，Ｄｙが、同じ特徴成分を有するだけで、特徴成分の配列を揃えることはできない。また、第一のデータセット１５Ａと第二のデータセット１５Ｂとの間でエンティティの配列は揃っていない。 However, even with such a reduction in dimension, the low-dimensional feature vectors Dx and Dy only have the same feature component, and the arrangement of the feature components cannot be aligned. Also, the entities are not aligned between the first data set 15A and the second data set 15B.

行列Ω^＊の探索は、このようにエンティティの配列及び次元の配列の点で、不ぞろいな特徴ベクトルＤｘ，Ｄｙの対応関係を、類似度分布の同一性を手掛かりに、探索する作業に対応する。 The search for the matrix Ω ^* thus corresponds to the work of searching for the correspondence between irregular feature vectors Dx and Dy in terms of the array of entities and the array of dimensions, using the identity of the similarity distribution as a clue.
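
The idea that the correspondence can be recovered from the similarity distribution alone can be checked on a toy case. The sketch below assumes the objective Z(Ω) = trace(K' Ω^T L' Ω) with K' = HKH and L' = HLH, restricts Ω to permutation matrices, and searches all permutations by brute force, which is feasible only for very small N. The matrix H is taken here as the usual centering matrix I - (1/N)11^T, which is an assumption and differs from the diagonal description in the text. The Gaussian kernel bandwidth and all function names are likewise illustrative.

```python
import itertools
import numpy as np

def rbf_matrix(D, sigma=1.0):
    """Similarity matrix from row vectors D, using an assumed Gaussian RBF form."""
    sq = ((D[:, None, :] - D[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def objective(K, L, omega):
    """Z(omega) = trace(K' omega^T L' omega), with K' = HKH and L' = HLH."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # assumed centering matrix
    return np.trace(H @ K @ H @ omega.T @ H @ L @ H @ omega)

def brute_force_alignment(K, L):
    """Try every permutation matrix and keep the one with the largest Z."""
    n = K.shape[0]
    best, best_z = None, -np.inf
    for perm in itertools.permutations(range(n)):
        omega = np.eye(n)[list(perm)]
        z = objective(K, L, omega)
        if z > best_z:
            best, best_z = omega, z
    return best, best_z

rng = np.random.default_rng(1)
Dx = rng.normal(size=(5, 2))      # five first entities in a 2-D feature space
shuffle = rng.permutation(5)
Dy = Dx[shuffle][:, ::-1]         # same points: entities reordered, dimensions swapped

K, L = rbf_matrix(Dx), rbf_matrix(Dy)
omega_star, z_star = brute_force_alignment(K, L)
```

Here `Dy` mimics the situation of FIGS. 4A and 4B: the entity order and the component order both differ from `Dx`, yet the recovered `omega_star` satisfies `omega_star.T @ L @ omega_star == K` up to rounding. Practical implementations would replace the exhaustive search with an iterative solver.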

続くＳ１８０において、プロセッサ１１は、行列Ω^＊に基づいて、第一のエンティティのそれぞれを、第二のエンティティの少なくとも一つに対応付ける。行列Ω^＊における第ｉ行第ｊ列の要素値は、類似度の分布によれば、第一のエンティティの集合のうちｉ番目のエンティティと、第二のエンティティの集合のうちのｊ番目のエンティティと、が対応する程度又は可能性の大きさを表す。 At subsequent S180, the processor 11 associates each of the first entities with at least one of the second entities based on the matrix Ω ^* . According to the similarity distribution, the element value of the i-th row and j-th column of the matrix Ω ^* is the i-th entity in the first entity set and the j-th entity in the second entity set. and represent the degree or possibility of correspondence.

行列Ω^＊の各要素は、理想的には０又は１を採り、各行について、一行の要素値の合計が１になり、各列について、一列の要素値の合計が１になる。行列Ω^＊が、こうした理想的な行列であるときには、値が１である要素の行番号の第一のエンティティと、列番号の第二のエンティティとが、互いに対応する。 Each element of the matrix Ω ^* ideally takes 0 or 1, the sum of the element values in one row is 1 for each row, and the sum of the element values in one column is 1 for each column. When the matrix Ω ^* is such an ideal matrix, the first entity of the row number and the second entity of the column number of the 1-valued elements correspond to each other.

すなわち、行列Ω^＊における第ｉ行第ｊ列の要素が、値１であるとき、第一のエンティティの集合のうちｉ番目のエンティティと、第二のエンティティの集合のうちのｊ番目のエンティティと、が互いに対応することを示す。 That is, when the i-th row and j-th column element in the matrix Ω ^* has the value 1, the i-th entity in the first set of entities and the j-th entity in the second set of entities , correspond to each other.

但し、数値計算上において、行列Ω^＊が、こうした理想的な行列になることはまれである。従って、Ｓ１８０では、次のいずれかの手法で、複数の第一のエンティティのそれぞれを、第二のエンティティの少なくとも一つに対応付ける。 However, in numerical calculations, the matrix Ω ^* is rarely such an ideal matrix. Therefore, in S180, each of the plurality of first entities is associated with at least one of the second entities using one of the following techniques.

（手法１）行列Ω^＊の第ｉ行において、値が最大の要素を探索する。値が最大の要素が第ｃ列である場合には、第一のエンティティの集合のうちｉ番目のエンティティを、第二のエンティティの集合のうちｃ番目のエンティティに対応付ける。これを全ての行について行う。 (Method 1) The i-th row of the matrix Ω ^* is searched for the element with the maximum value. If the element with the largest value is in the c-th column, the i-th entity in the first set of entities is associated with the c-th entity in the second set of entities. Do this for all rows.

この手法では、第二のエンティティの一つに、複数の第一のエンティティが対応付けられる可能性がある。これを緩和するために、近傍検索が行われてもよい。近傍検索の例としては、ＣｏｎｔｅｘｔｕａｌＤｉｓｓｉｍｉｌａｒｉｔｙｍｅａｓｕｒｅが知られている。 In this approach, one of the second entities may be associated with multiple first entities. To mitigate this, a neighborhood search may be performed. Contextual dissimilarity measure is known as an example of neighborhood search.

（手法２）厳密な一対一の対応付けを行うために、行列Ω^＊を入力とした最適割当問題を解くことにより、複数の第一のエンティティのそれぞれを、重複しない第二のエンティティの一つに対応付ける。 (Method 2) To obtain a strict one-to-one correspondence, an optimal assignment problem with the matrix Ω^* as input is solved, so that each of the plurality of first entities is associated with one second entity without duplication.
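
The two methods of S180 can be contrasted with a short sketch. The matrix values are invented, and the assignment objective (maximizing the sum of the selected elements) is an assumption, since the description only states that an optimal assignment problem is solved. The brute-force search over permutations is for illustration only; a Hungarian-type solver would normally be used for realistic sizes.

```python
from itertools import permutations

def match_by_row_max(omega):
    """Method 1: for each row i, pick the column c holding the largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in omega]

def match_one_to_one(omega):
    """Method 2 (brute force): the permutation maximizing the summed element values."""
    n = len(omega)
    return list(max(permutations(range(n)),
                    key=lambda cols: sum(omega[i][cols[i]] for i in range(n))))

# Hypothetical non-ideal matrix: rows are first entities, columns second entities.
omega_star = [
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]
rows_to_cols = match_by_row_max(omega_star)   # may reuse a second entity
unique_cols = match_one_to_one(omega_star)    # each second entity used once
```

In this example Method 1 assigns both the first and the second row to column 0, while Method 2 resolves the conflict by pairing row 0 with column 1 and row 1 with column 0.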

Ｓ１８０において、プロセッサ１１は更に、第一のエンティティと第二のエンティティの対応関係を説明するテーブルとして、図５に示す対応表を出力することができる。すなわち、第一のエンティティのそれぞれのＩＤに関連付けて、第二のエンティティのＩＤを記述する対応表を出力して、ストレージ１５に記憶することができる。 At S180, the processor 11 can further output the correspondence table shown in FIG. 5 as a table describing the correspondence between the first entity and the second entity. That is, a correspondence table describing the ID of the second entity can be output and stored in the storage 15 in association with each ID of the first entity.

更にプロセッサ１１は、上記対応付けの結果、又は上記対応表に基づいて、第一のデータセット１５Ａと、第二のデータセット１５Ｂとを結合して、拡張データセット１５Ｃを生成するデータフュージョン処理を実行する（Ｓ１９０）。 Furthermore, the processor 11 combines the first data set 15A and the second data set 15B based on the result of the correspondence or the correspondence table to perform data fusion processing to generate the extended data set 15C. Execute (S190).

拡張データセット１５Ｃは、複数の拡張データを備える。図６に示すように、複数の拡張データのそれぞれは、対応する一つの第一の特徴データと第二の特徴データとの結合データである。 The extended data set 15C comprises multiple extended data. As shown in FIG. 6, each of the plurality of extension data is combined data of corresponding one first feature data and second feature data.

すなわち、プロセッサ１１は、対応表が示す対応関係に基づき、第一のデータセット１５Ａに含まれる複数の第一の特徴データのそれぞれに、第二のデータセット１５Ｂに含まれる複数の第二の特徴データのうちの一つを結合することによって、拡張データセット１５Ｃを生成する。 That is, the processor 11 assigns a plurality of second features contained in the second data set 15B to each of the plurality of first feature data contained in the first data set 15A based on the correspondence shown by the correspondence table. An extended data set 15C is generated by combining one of the data.

プロセッサ１１は、対応表によれば、第一のエンティティの集合のうちのｉ番目のエンティティと、第二のエンティティの集合のうちｊ番目のエンティティとが対応付けられているとき、第一のエンティティの集合のうちのｉ番目のエンティティの特徴を説明する第一の特徴データと、第二のエンティティの集合のうちのｊ番目のエンティティの特徴を説明する第二の特徴データとを結合して、上記ｉ番目のエンティティの拡張データを生成する。 When the i-th entity in the first entity set and the j-th entity in the second entity set are associated according to the correspondence table, the processor 11 combining the first feature data describing the feature of the i-th entity in the set of and the second feature data describing the feature of the j-th entity in the second set of entities, Generate extension data for the i-th entity.
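
The data fusion of S190 amounts to a join driven by the correspondence table. A minimal sketch, with hypothetical IDs, feature names, and table contents:

```python
# Hypothetical feature data keyed by entity ID.
first_dataset = {"a01": {"P1": 1, "P2": 0}, "a02": {"P1": 0, "P2": 1}}
second_dataset = {"b01": {"S1": 0, "S2": 1}, "b02": {"S1": 1, "S2": 1}}

# Hypothetical correspondence table: first entity ID -> second entity ID.
correspondence = {"a01": "b02", "a02": "b01"}

def fuse(first, second, table):
    """Attach to each first feature record the second record the table points to."""
    return {fid: {**feats, **second[table[fid]]} for fid, feats in first.items()}

extended = fuse(first_dataset, second_dataset, correspondence)
```

Each record in `extended` keeps the ID of the first entity and carries both the purchase features and the browsing features of the matched second entity.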

このようにして生成された拡張データセット１５Ｃは、ストレージ１５に格納される。ストレージ１５に格納された拡張データセット１５Ｃは、例えばユーザインタフェース１７を通じて入力されるユーザからの指令に基づき、通信インタフェース１９を通じて別のシステムに転送される。 The extended data set 15C generated in this manner is stored in the storage 15. The extended data set 15C stored in the storage 15 is transferred to another system through the communication interface 19 based on, for example, a command from the user input through the user interface 17.

別のシステムは、例えば広告配信システムであり得る。広告配信システムは、拡張データセット１５Ｃに基づき、広告配信先のエンティティを判別し、対応するエンティティに広告配信することができる。 Another system may be, for example, an advertisement distribution system. Based on the extended data set 15C, the advertisement distribution system can determine the entity to which the advertisement is to be distributed, and distribute the advertisement to the corresponding entity.

Ｓ１９０において、データフュージョン処理を終了すると、プロセッサ１１は、図２に示す分析処理を終了する。 In S190, after ending the data fusion process, the processor 11 ends the analysis process shown in FIG.

以上に説明したように、本実施形態の情報処理システム１によれば、第一のデータセット１５Ａと、第二のデータセット１５Ｂとの間に共通変数が存在しなくとも、第一のエンティティのそれぞれと、第二のエンティティのそれぞれと、を類似度の分布から適切に対応付けることができる。 As described above, according to the information processing system 1 of the present embodiment, even if there is no common variable between the first data set 15A and the second data set 15B, the first entity Each can be appropriately associated with each of the second entities from the similarity distribution.

本実施形態の技術に基づいて、類似度の分布に基づいて適切に対応付けるためには、第一のエンティティの集合と、第二のエンティティの集合との間において、その類似度分布が、相互に一致する、類似する、又は、関係するのが好ましい条件である。 For the technology of the present embodiment to associate entities appropriately based on the similarity distribution, it is a preferable condition that the similarity distributions of the first set of entities and the second set of entities match, resemble, or relate to each other.

第一のエンティティの集合と、第二のエンティティの集合とが、同じ母集団からの部分集合であるとき、このような好ましい条件はおよそ満足される。従って、例えば、第一のエンティティ及び第二のエンティティが人であるとき、すなわち、第一のデータセット１５Ａ及び第二のデータセット１５Ｂとして、人に関する特徴を表すデータセットが取り扱われるとき、本実施形態の技術は、有意義に機能する。 This preferable condition is approximately satisfied when the first set of entities and the second set of entities are subsets of the same population. Therefore, for example, when the first entities and the second entities are people, that is, when data sets representing features of people are handled as the first data set 15A and the second data set 15B, the technology of the present embodiment functions meaningfully.

人の行動は、特にデモグラフィック属性に応じた傾向を示すことが多い。従って、第一のデータセット１５Ａ及び第二のデータセット１５Ｂが、デモグラフィック属性の分布が互いに類似すると推定される集団からの収集データに基づいたデータセットであるときには、第一のデータセット１５Ａ及び第二のデータセット１５Ｂが、共通変数の存在しない、互いに異なる集団に属する人の特徴を説明するデータセットであったり、異なる行動の特徴を説明するデータセットであったりしても、適切に、エンティティ間の対応付けを行い、拡張データセット１５Ｃとして、人の心理・行動分析に役立つデータセットを生成することができる。 Human behavior often shows trends that are particularly dependent on demographic attributes. Therefore, when the first data set 15A and the second data set 15B are data sets based on collected data from groups whose demographic attribute distributions are estimated to be similar to each other, the first data set 15A and the second data set 15B Even if the second data set 15B is a data set that describes the characteristics of people belonging to different groups with no common variables, or a data set that describes the characteristics of different behaviors, A data set useful for human psychology/behavior analysis can be generated as an extended data set 15C by matching between entities.

上述した例によれば、第一のデータセット１５Ａは、第一の集団に属する複数の人のそれぞれの購買行動に関する特徴を記述するデータセットであり、第二のデータセット１５Ｂは、第二の集団に属する複数の人のそれぞれのウェブサイト訪問行動／ウェブコンテンツ閲覧行動に関する特徴を記述するデータセットである。 According to the example described above, the first data set 15A is a data set describing the characteristics of purchasing behavior of each of the plurality of people belonging to the first group, and the second data set 15B is the second data set 15B. It is a data set describing the characteristics of website visit behavior/web content browsing behavior of each of a plurality of people belonging to a group.

しかしながら、第一のデータセット１５Ａ及び第二のデータセット１５Ｂの一方には、テレビ視聴行動などの人のメディア接触行動に関する特徴を記述するデータセットが用いられてもよいし、スマートフォン等の携帯端末の使用状況に関する特徴を記述するデータセットが用いられてもよい。 However, one of the first data set 15A and the second data set 15B may be a data set that describes characteristics of people's media contact behavior, such as television viewing behavior, or a mobile terminal such as a smartphone. A data set may be used that describes the usage characteristics of

更に言えば、第一のデータセット１５Ａ及び第二のデータセット１５Ｂの一方には、オフライン空間（すなわち現実空間）における人の移動、例えば複数の場所への訪問、移動経路、及び移動手段に関する特徴を記述するデータセットが用いられてもよいし、オンライン空間における人の移動、具体的には仮想現実（ＶＲ）空間の移動やネットサ―フィンに関する特徴を記述するデータセットが用いられてもよい。更に言えば、第一のデータセット１５Ａ及び第二のデータセット１５Ｂの一方には、アンケートにより収集されたデータに基づくデータセットが使用されてもよい。 Furthermore, one of the first data set 15A and the second data set 15B includes features related to human movement in the offline space (i.e., real space), such as visits to multiple locations, travel routes, and means of travel. may be used, or a dataset describing features related to movement of people in an online space, specifically movement in a virtual reality (VR) space and surfing the net may be used. Furthermore, a data set based on data collected by a questionnaire may be used as one of the first data set 15A and the second data set 15B.

第一のデータセット１５Ａと第二のデータセット１５Ｂとの組合せとしては、アンケートにより収集されたデータセットと、テレビ視聴行動に関するデータセットとの組合せ、移動履歴に関するデータセットと、購買に関するデータセットとの組合せなども考えられる。 Conceivable combinations of the first data set 15A and the second data set 15B also include a data set collected by a questionnaire combined with a data set on television viewing behavior, and a data set on movement history combined with a data set on purchases.

この他、上述した実施形態において、低次元特徴ベクトルＤｘ，Ｄｙの一群に対しては、ＺＣＡ白色化、正規化、及び、標準化などの処理が行われてもよい。 In addition, in the above-described embodiment, processing such as ZCA whitening, normalization, and standardization may be performed on the group of low-dimensional feature vectors Dx and Dy.

また、上記実施形態では、低次元特徴ベクトルＤｘ，Ｄｙの次元数Ｍが、設計者又はユーザにより定められるが、最適な次元数Ｍを探索するように、情報処理システム１は構成されてもよい。例えば、情報処理システム１は、図２に示す分析処理を、同一のデータセット１５Ａ，１５Ｂについて次元数Ｍを変更しながら繰返し実行して、Ｚ（Ω）の最大値を指標に、最適な次元数Ｍを自動選定するように構成されてもよい。 In the above embodiment, the number of dimensions M of the low-dimensional feature vectors Dx and Dy is determined by the designer or the user, but the information processing system 1 may be configured to search for the optimum number of dimensions M. For example, the information processing system 1 may be configured to repeatedly execute the analysis processing shown in FIG. 2 on the same data sets 15A and 15B while changing the number of dimensions M, and to automatically select the optimum number of dimensions M using the maximum value of Z(Ω) as an index.
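
The automatic selection of M can be sketched as a simple loop over candidate values. The callback `run_analysis`, which is assumed to run the analysis for a given M and return the maximum of Z(Ω), is hypothetical, and the scores returned by the stand-in below are invented purely to make the example executable.

```python
def select_dimension(candidates, run_analysis):
    """Run the analysis for each candidate M and keep the M with the largest score."""
    scores = {m: run_analysis(m) for m in candidates}
    best_m = max(scores, key=scores.get)
    return best_m, scores

def fake_run_analysis(m):
    """Stand-in for the real analysis; returns made-up maxima of Z for each M."""
    return {2: 0.41, 3: 0.57, 4: 0.52}[m]

best_m, scores = select_dimension([2, 3, 4], fake_run_analysis)
```

In an actual system the callback would perform S130 to S170 for the given M; whether the values of Z(Ω) need rescaling before being compared across different M is left open here.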

＜第二実施形態＞ <Second embodiment>
第二実施形態の情報処理システム１は、プロセッサ１１が図２に示す分析処理に代えて、図７に示す分析処理を実行するように構成される。以下では、第二実施形態の説明として、プロセッサ１１が実行する分析処理の詳細を選択的に説明する。本実施形態において言及されない情報処理システム１の構成は、第一実施形態と同じであると理解されてよい。 The information processing system 1 of the second embodiment is configured such that the processor 11 executes the analysis process shown in FIG. 7 instead of the analysis process shown in FIG. 2. Below, the details of the analysis processing executed by the processor 11 will be selectively described as a description of the second embodiment. It may be understood that the configuration of the information processing system 1 that is not mentioned in this embodiment is the same as in the first embodiment.

プロセッサ１１は、図７に示す分析処理を開始すると、第一実施形態と同様に、データフュージョン対象の第一のデータセット１５Ａと第二のデータセット１５Ｂとを取得する（Ｓ３１０，Ｓ３２０）。 When the analysis processing shown in FIG. 7 is started, the processor 11 acquires the first data set 15A and the second data set 15B to be subjected to data fusion, as in the first embodiment (S310, S320).

プロセッサ１１は、Ｓ１１０での処理と同様に、第一のデータセット１５Ａに基づいて、第一のエンティティ毎の特徴ベクトルｘを生成する（Ｓ３１０）。プロセッサ１１は、Ｓ１２０での処理と同様に、第二のデータセット１５Ｂに基づいて、第二のエンティティ毎の特徴ベクトルｙを生成する（Ｓ３２０）。 The processor 11 generates a feature vector x for each first entity based on the first data set 15A (S310), similar to the process at S110. The processor 11 generates a feature vector y for each second entity based on the second data set 15B (S320), similar to the process at S120.

更に、プロセッサ１１は、Ｓ１３０，Ｓ１４０での処理と同様に、次元削減処理によって、特徴ベクトルｘの一群に対応する低次元特徴ベクトルＤｘの一群を生成し、特徴ベクトルｙの一群に対応する低次元特徴ベクトルＤｙの一群を生成する（Ｓ３３０）。 Further, the processor 11 generates a group of low-dimensional feature vectors Dx corresponding to the group of feature vectors x, and a group of low-dimensional feature vectors Dx corresponding to the group of feature vectors y by the dimension reduction process, similarly to the processes in S130 and S140. A set of feature vectors Dy is generated (S330).

続くＳ３４０において、プロセッサ１１は、Ｓ１５０，Ｓ１６０，Ｓ１７０での処理と同様の処理を実行する。すなわち、プロセッサ１１は、低次元特徴ベクトルＤｘの一群を用いて、第一のエンティティの集合に関する類似度行列Ｋを生成し、低次元特徴ベクトルＤｙの一群を用いて、第二のエンティティの集合に関する類似度行列Ｌを生成する。 In subsequent S340, the processor 11 executes the same processes as those in S150, S160 and S170. That is, the processor 11 uses a group of low-dimensional feature vectors Dx to generate a similarity matrix K for the first set of entities, and uses a group of low-dimensional feature vectors Dy to generate a similarity matrix K for the second set of entities. Generate a similarity matrix L.

更に、プロセッサ１１は、類似度行列Ｋ及び類似度行列Ｌを用いて、第一実施形態で説明した値Ｚ（Ω）を最大化する行列Ωを行列Ω^＊として探索する（Ｓ３４０）。ここでは、探索された行列Ω^＊のことを、対応関係行列Ω^＊と表現する。 Further, the processor 11 uses the similarity matrix K and the similarity matrix L to search for the matrix Ω that maximizes the value Z(Ω) described in the first embodiment as the matrix Ω ^* (S340). Here, the searched matrix Ω ^* is expressed as a correspondence matrix Ω ^* .

その後、プロセッサ１１は、繰返し終了条件が満足されたか否かを判断する（Ｓ３５０）。繰返し終了条件が満足されていないと判断すると（Ｓ３５０でＮｏ）、プロセッサ１１は、Ｓ３６０の処理を実行する。 After that, the processor 11 determines whether or not the repetition end condition is satisfied (S350). When determining that the repetition end condition is not satisfied (No in S350), the processor 11 executes the process of S360.

Ｓ３６０において、プロセッサ１１は、Ｓ３４０で探索された対応関係行列Ω^＊を固定した状態で、Ｇｒｏｍｏｖ－Ｗａｓｓｅｒｓｔｅｉｎ距離のコストを最小化する次元削減方式を探索する。 At S360, the processor 11 searches for a dimensionality reduction scheme that minimizes the cost of the Gromov-Wasserstein distance while fixing the correspondence matrix Ω ^* searched at S340.

対応関係行列Ω^＊を固定した状態は、第一のエンティティと第二のエンティティとの間の対応関係を固定した状態に対応する。上述したように値Ｚ（Ω）を最大化する行列Ωを対応関係行列Ω^＊として探索することは、第二の特徴空間における第二のエンティティの分布に適合するように、第一の特徴空間上の複数の第一のエンティティを第二の特徴空間にマッピングするための写像を探索することに対応する。 Fixing the correspondence matrix Ω^* corresponds to fixing the correspondence between the first entities and the second entities. As described above, searching for the matrix Ω that maximizes the value Z(Ω) as the correspondence matrix Ω^* corresponds to searching for a map that maps the plurality of first entities in the first feature space to the second feature space so as to fit the distribution of the second entities in the second feature space.

Ｇｒｏｍｏｖ－Ｗａｓｓｅｒｓｔｅｉｎ距離のコストは、第一のエンティティの集合を、第二の特徴空間にマッピングしたときの第一のエンティティと第二のエンティティとの間の最適輸送問題における輸送コストに対応する。 The cost of the Gromov-Wasserstein distance corresponds to the transportation cost in the optimal transportation problem between the first entity and the second entity when mapping the first set of entities to the second feature space.

Ｇｒｏｍｏｖ－Ｗａｓｓｅｒｓｔｅｉｎ距離のコストは、類似度行列Ｋ，Ｌ及び対応関係行列Ω^＊を用いて算出可能である。類似度行列Ｋは、上述の通り、次元削減後の低次元特徴ベクトルＤｘに基づいて算出された第一のエンティティ間の類似度を要素に含む行列である。類似度行列Ｌは、次元削減後の低次元特徴ベクトルＤｙに基づいて算出された第二のエンティティ間の類似度を要素に含む行列である。 The cost of the Gromov-Wasserstein distance can be calculated using the similarity matrices K, L and the correspondence matrix Ω ^* . The similarity matrix K is, as described above, a matrix whose elements are the degrees of similarity between the first entities calculated based on the reduced-dimensional feature vector Dx. The similarity matrix L is a matrix whose elements are similarities between the second entities calculated based on the reduced-dimensional feature vector Dy.
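As a rough illustration of how such a cost can be computed from K, L, and a fixed correspondence matrix, the sketch below evaluates a Gromov-Wasserstein-style cost with a squared loss. The squared loss and the function name `gw_cost` are assumptions; the source text does not specify the loss function, and the value Z(Ω) of the first embodiment may be defined differently.

```python
import numpy as np

def gw_cost(K, L, omega):
    """Squared-loss Gromov-Wasserstein-style cost (assumed form):
    sum over i, j, k, l of (K[i, k] - L[j, l])**2 * omega[i, j] * omega[k, l].
    """
    p = omega.sum(axis=1)  # mass assigned to each first entity
    q = omega.sum(axis=0)  # mass assigned to each second entity
    term_k = p @ (K ** 2) @ p
    term_l = q @ (L ** 2) @ q
    cross = np.sum(omega * (K @ omega @ L.T))
    return term_k + term_l - 2.0 * cross

K = np.array([[1.0, 0.2, 0.1],
              [0.2, 1.0, 0.5],
              [0.1, 0.5, 1.0]])
perm = [2, 0, 1]
L = K[np.ix_(perm, perm)]  # second set = first set with its entities reordered

omega_good = np.zeros((3, 3))
omega_good[perm, np.arange(3)] = 1.0 / 3.0  # couples each entity with its true partner
omega_bad = np.eye(3) / 3.0                 # ignores the reordering

cost_good = gw_cost(K, L, omega_good)
cost_bad = gw_cost(K, L, omega_bad)
```

In this toy setup the correspondence that undoes the reordering yields a cost of zero up to rounding, while the mismatched correspondence yields a strictly positive cost.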

Ｇｒｏｍｏｖ－Ｗａｓｓｅｒｓｔｅｉｎ距離のコストを最小化する次元削減方式を探索することは、対応関係行列Ω^＊で示される第一のエンティティと第二のエンティティとの間の対応関係を最もよく正当化する低次元特徴ベクトルＤｘ，Ｄｙを生成するための次元削減方式を探索することに対応する。 Searching for a dimensionality reduction scheme that minimizes the cost of the Gromov-Wasserstein distance corresponds to searching for a dimensionality reduction scheme that generates the low-dimensional feature vectors Dx and Dy that best justify the correspondence between the first entities and the second entities indicated by the correspondence matrix Ω^*.

コストの最小化は、対応関係行列Ω^＊によれば、互いに対応する第一のエンティティと第二のエンティティとの間の特徴空間上の距離、換言すれば、第一のエンティティの低次元特徴ベクトルＤｘと、第二のエンティティの低次元特徴ベクトルＤｙとの間の特徴空間上の距離が短くなるように、次元削減方式を探索することに対応する。 Minimizing the cost corresponds to searching for a dimensionality reduction scheme that shortens the distance in the feature space between a first entity and a second entity that correspond to each other according to the correspondence matrix Ω^*, in other words, the distance in the feature space between the low-dimensional feature vector Dx of the first entity and the low-dimensional feature vector Dy of the second entity.

例えば、Ｍ１次元の特徴ベクトルｘを、Ｍ次元の低次元特徴ベクトルＤｘに変換する場合には、特徴ベクトルｘにＭ行Ｍ１列の変換行列Ｔｘを作用させる。Ｍ２次元の特徴ベクトルｙを、Ｍ次元の低次元特徴ベクトルＤｙに変換する場合には、特徴ベクトルｙにＭ行Ｍ２列の変換行列Ｔｙを作用させる。このとき、変換行列Ｔｘ，Ｔｙを構成するパラメータｍの数は、（Ｍ＊Ｍ１＋Ｍ＊Ｍ２）個である。 For example, when transforming an M1-dimensional feature vector x into an M-dimensional low-dimensional feature vector Dx, a transformation matrix Tx of M rows and M1 columns is applied to the feature vector x. When transforming an M2-dimensional feature vector y into an M-dimensional low-dimensional feature vector Dy, a transformation matrix Ty of M rows and M2 columns is applied to the feature vector y. At this time, the number of parameters m constituting the transformation matrices Tx and Ty is (M*M1+M*M2).

次元削減方式の探索は、例えば変換行列Ｔｘ，Ｔｙのパラメータｍとして、上述のコストを最小化するパラメータｍを、勾配法等を用いて探索することにより実現される。 The search for the dimensionality reduction scheme is realized, for example, by using a gradient method or the like to search for the parameters m of the transformation matrices Tx and Ty that minimize the above cost.
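The gradient-based search can be sketched as follows. This is a simplified surrogate, not the method of the source text: instead of the Gromov-Wasserstein cost, it minimizes the correspondence-weighted squared distance between Dx = Tx·x and Dy = Ty·y, following the interpretation that the cost shrinks as matched entities move closer in the feature space. The function names, step size, and toy data are assumptions. The surrogate has a trivial minimum at Tx = Ty = 0, so a practical scheme would need an additional normalization or constraint.

```python
import numpy as np

def matched_distance_cost(Tx, Ty, X, Y, omega):
    """Sum over i, j of omega[i, j] * ||Tx x_i - Ty y_j||^2 (assumed surrogate cost)."""
    Dx = X @ Tx.T  # rows are low-dimensional vectors Dx
    Dy = Y @ Ty.T  # rows are low-dimensional vectors Dy
    diff = Dx[:, None, :] - Dy[None, :, :]
    return float(np.sum(omega * np.sum(diff ** 2, axis=2)))

def gradient_step(Tx, Ty, X, Y, omega, lr=0.01):
    """One plain gradient-descent update of Tx and Ty with omega held fixed."""
    Dx = X @ Tx.T
    Dy = Y @ Ty.T
    p = omega.sum(axis=1)
    q = omega.sum(axis=0)
    grad_Dx = 2.0 * (p[:, None] * Dx - omega @ Dy)
    grad_Dy = 2.0 * (q[:, None] * Dy - omega.T @ Dx)
    return Tx - lr * grad_Dx.T @ X, Ty - lr * grad_Dy.T @ Y

rng = np.random.default_rng(0)
n, M1, M2, M = 6, 5, 4, 2
X = rng.normal(size=(n, M1))   # M1-dimensional feature vectors x
Y = rng.normal(size=(n, M2))   # M2-dimensional feature vectors y
omega = np.eye(n) / n          # fixed correspondence matrix (toy example)
Tx = rng.normal(size=(M, M1))  # M rows, M1 columns
Ty = rng.normal(size=(M, M2))  # M rows, M2 columns
num_params = Tx.size + Ty.size # M*M1 + M*M2 parameters in total

cost_before = matched_distance_cost(Tx, Ty, X, Y, omega)
for _ in range(50):
    Tx, Ty = gradient_step(Tx, Ty, X, Y, omega)
cost_after = matched_distance_cost(Tx, Ty, X, Y, omega)
```

With M = 2, M1 = 5, and M2 = 4, `num_params` equals 2*5 + 2*4 = 18, matching the parameter count (M*M1+M*M2) given in the text.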

その後、プロセッサ１１は、探索された次元削減方式（例えば変換行列Ｔｘ，Ｔｙ）で特徴ベクトルｘ，ｙを低次元化し、新たな低次元特徴ベクトルＤｘ，Ｄｙを算出する（Ｓ３７０）。 After that, the processor 11 reduces the dimensions of the feature vectors x and y by the searched dimension reduction method (for example, the transformation matrices Tx and Ty), and calculates new low-dimensional feature vectors Dx and Dy (S370).

プロセッサ１１は、新たな低次元特徴ベクトルＤｘに基づく類似度行列Ｋ、及び、新たな低次元特徴ベクトルＤｙに基づく類似度行列Ｌを用いて、値Ｚ（Ω）を最大化する行列Ωを、新たな対応関係行列Ω^＊として探索する（Ｓ３４０）。 The processor 11 then uses the similarity matrix K based on the new low-dimensional feature vectors Dx and the similarity matrix L based on the new low-dimensional feature vectors Dy to search for the matrix Ω that maximizes the value Z(Ω) as a new correspondence matrix Ω^* (S340).

プロセッサ１１は、このようにＳ３６０，Ｓ３７０，Ｓ３４０の処理を繰返し実行することによって、マッチング精度の高い対応関係行列Ω^＊を、より良い次元削減方式と共に再探索する。 By repeatedly executing the processes of S360, S370, and S340 in this manner, the processor 11 re-searches the correspondence matrix Ω ^* with high matching accuracy along with a better dimensionality reduction method.

プロセッサ１１は、繰返し終了条件が満足されると（Ｓ３５０でＹｅｓ）、Ｓ３８０の処理を実行する。繰返し終了条件は、例えば、Ｓ３４０の処理が所定回実行された場合に、あるいは、再探索による対応関係行列Ω^＊の変化量が一定未満になった場合に満足される。 When the repetition end condition is satisfied (Yes in S350), the processor 11 executes the process of S380. The repetition end condition is satisfied, for example, when the process of S340 is executed a predetermined number of times, or when the amount of change in the correspondence matrix Ω ^* due to the re-search becomes less than a certain amount.
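The control flow of S330 to S370, including both example end conditions, can be summarized in a short skeleton. The three callables (`reduce_init`, `search_omega`, `search_reduction`) and the parameter names are hypothetical placeholders for the processing the text describes; the toy stand-ins at the bottom only exercise the loop and do not perform real matching.

```python
import numpy as np

def alternating_search(X, Y, reduce_init, search_omega, search_reduction,
                       max_searches=20, tol=1e-6):
    """Alternate between searching omega and searching the reduction scheme."""
    Dx, Dy = reduce_init(X, Y)                  # S330: initial dimension reduction
    omega = search_omega(Dx, Dy)                # S340: first search for omega
    searches = 1
    while searches < max_searches:              # S350: count-based end condition
        Dx, Dy = search_reduction(X, Y, omega)  # S360, S370: refit and re-reduce
        new_omega = search_omega(Dx, Dy)        # S340: re-search omega
        searches += 1
        change = np.max(np.abs(new_omega - omega))
        omega = new_omega
        if change < tol:                        # S350: change-based end condition
            break
    return omega, searches

# Toy stand-ins (assumptions) that make the skeleton executable.
X = np.ones((3, 4))
Y = np.ones((3, 5))
uniform = lambda Dx, Dy: np.full((len(Dx), len(Dy)), 1.0 / (len(Dx) * len(Dy)))
identity_reduce = lambda X, Y, omega=None: (X, Y)
omega, searches = alternating_search(X, Y, identity_reduce, uniform, identity_reduce)
```

Because the toy search always returns the same matrix, the change-based condition stops the loop right after the first re-search.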

Ｓ３８０において、プロセッサ１１は、第一実施形態におけるＳ１８０の処理と同様に、繰返し処理の最後に算出された対応関係行列Ω^＊に基づいて、第一のエンティティのそれぞれを、第二のエンティティの少なくとも一つに対応付ける。プロセッサ１１は更に、第一のエンティティと第二のエンティティの対応関係を説明する対応表を記憶及び出力することができる。 In S380, similarly to the process of S180 in the first embodiment, the processor 11 associates each of the first entities with at least one of the second entities based on the correspondence matrix Ω^* calculated at the end of the iterative processing. The processor 11 can further store and output a correspondence table describing the correspondence between the first entities and the second entities.

その後、プロセッサ１１は、Ｓ１９０での処理と同様に、第一のデータセット１５Ａと、第二のデータセット１５Ｂとを結合して、拡張データセット１５Ｃを生成するデータフュージョン処理を実行し、生成した拡張データセット１５Ｃをストレージ１５に格納する（Ｓ３９０）。 After that, similarly to the process in S190, the processor 11 executes data fusion processing that combines the first data set 15A and the second data set 15B to generate the extended data set 15C, and stores the generated extended data set 15C in the storage 15 (S390).

以上に説明した第二実施形態の情報処理システム１は、上述の繰返し処理によって、更に精度よく、第一のエンティティと第二のエンティティとの間の対応付けを行うことができる。従って、対応付けの正しい精度の良い拡張データセット１５Ｃを生成することが可能である。 The information processing system 1 of the second embodiment described above can associate the first entity and the second entity with even higher accuracy through the iterative process described above. Therefore, it is possible to generate the extended data set 15C with correct correspondence and high accuracy.

＜第三実施形態＞ <Third Embodiment>
第三実施形態の情報処理システム１は、ユーザインタフェース１７を通じたユーザからの実行指示に基づき、プロセッサ１１が図８に示す評価処理を実行するように構成される。以下では、第三実施形態の説明として、プロセッサ１１が実行する評価処理の詳細を説明する。本実施形態において言及されない情報処理システム１の構成は、第一又は第二実施形態と同じであると理解されてよい。 The information processing system 1 of the third embodiment is configured such that the processor 11 executes the evaluation process shown in FIG. 8 based on an execution instruction from a user through the user interface 17. Details of the evaluation process executed by the processor 11 will be described below as a description of the third embodiment. The configuration of the information processing system 1 that is not mentioned in this embodiment may be understood to be the same as in the first or second embodiment.

評価処理は、評価対象のデータセットが、図２又は図７に示す分析処理での対応付け及びデータフュージョンを高精度に実行可能な優良なデータセットであるか否かを評価するために実行される。評価対象のデータセットは、分析処理で、第一のデータセット１５Ａ又は第二のデータセット１５Ｂとして使用され得るデータセットに対応する。 The evaluation process is executed to evaluate whether the data set to be evaluated is a good data set for which the matching and data fusion in the analysis process shown in FIG. 2 or FIG. 7 can be performed with high accuracy. The data set to be evaluated corresponds to a data set that can be used as the first data set 15A or the second data set 15B in the analysis process.

プロセッサ１１は、評価処理を開始すると、ユーザから実行指示と共に指定された評価対象のデータセットを取得する（Ｓ４１０）。プロセッサ１１は、ストレージ１５から指定された評価対象のデータセットを取得することができる。 When starting the evaluation process, the processor 11 acquires the data set to be evaluated that the user specified together with the execution instruction (S410). The processor 11 can acquire the specified data set from the storage 15.

その後、プロセッサ１１は、評価対象のデータセットに基づき、エンティティ毎に、第一の特徴ベクトルｘ＿１と、第二の特徴ベクトルｘ＿２と、を生成する（Ｓ４２０）。評価対象のデータセットは、エンティティ毎に、対応するエンティティの特徴を（Ｑ１＋Ｑ２）個の要素で表す特徴データを備えることができる。 Processor 11 then generates a first feature vector x_1 and a second feature vector x_2 for each entity based on the data set to be evaluated (S420). The data set to be evaluated may comprise, for each entity, feature data representing features of the corresponding entity with (Q1+Q2) elements.

プロセッサ１１は、（Ｑ１＋Ｑ２）個の要素を、Ｑ１個の要素からなる第一の要素群と、Ｑ２個の要素からなる第二の要素群と、に分割することができる。（Ｑ１＋Ｑ２）個の要素のそれぞれは、ランダムに、第一の要素群及び第二の要素群のいずれかに分類され得る。 The processor 11 can divide the (Q1+Q2) elements into a first element group consisting of Q1 elements and a second element group consisting of Q2 elements. Each of the (Q1+Q2) elements can be randomly classified into either the first element group or the second element group.

プロセッサ１１は、評価対象のデータセットに基づいて、エンティティ毎に、対応するエンティティの第一の要素群に関する特徴を記述した第一の特徴ベクトルｘ＿１と、対応するエンティティの第二の要素群に関する特徴を記述した第二の特徴ベクトルｘ＿２と、を生成することができる。 Based on the data set to be evaluated, the processor 11 can generate, for each entity, a first feature vector x_1 describing the features of the corresponding entity with respect to the first element group and a second feature vector x_2 describing the features of the corresponding entity with respect to the second element group.

例えば、評価対象のデータセットが、Ｓ１１０，Ｓ１２０，Ｓ３１０，又はＳ３２０で特徴ベクトルｖ＝（ｖ［１］，ｖ［２］，ｖ［３］，…，ｖ［Ｑ］）が生成され得る要素数Ｑ＝（Ｑ１＋Ｑ２）の特徴データをエンティティ毎に備える場合、Ｑ１個の要素を含む第一の特徴ベクトルｘ＿１＝（ｖ［１］，ｖ［２］，…，ｖ［Ｑ１］）及びＱ２個の要素を含む第二の特徴ベクトルｘ＿２＝（ｖ［Ｑ１＋１］，ｖ［Ｑ１＋２］，…，ｖ［Ｑ１＋Ｑ２］）が生成され得る。 For example, suppose that the data set to be evaluated includes, for each entity, feature data with Q=(Q1+Q2) elements from which a feature vector v=(v[1], v[2], v[3], ..., v[Q]) can be generated in S110, S120, S310, or S320. In this case, a first feature vector x_1=(v[1], v[2], ..., v[Q1]) containing Q1 elements and a second feature vector x_2=(v[Q1+1], v[Q1+2], ..., v[Q1+Q2]) containing Q2 elements can be generated.
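The split of one data set into two views of the same entities can be sketched in a few lines. The function name `split_features` and the matrix layout (one row per entity, one column per element) are assumptions for illustration. Passing a random generator reproduces the random assignment of elements to the two groups; omitting it reproduces the ordered split x_1=(v[1], ..., v[Q1]), x_2=(v[Q1+1], ..., v[Q1+Q2]).

```python
import numpy as np

def split_features(V, q1, rng=None):
    """Split the Q columns of V into a Q1-column view and a Q2-column view."""
    Q = V.shape[1]
    cols = np.arange(Q) if rng is None else rng.permutation(Q)
    first, second = cols[:q1], cols[q1:]
    return V[:, first], V[:, second]

V = np.arange(12.0).reshape(3, 4)  # 3 entities, Q = 4 elements each
x1, x2 = split_features(V, q1=3)   # ordered split: Q1 = 3, Q2 = 1
r1, r2 = split_features(V, q1=2, rng=np.random.default_rng(0))  # random split
```

Row i of both outputs describes the same entity, which is what makes the correct correspondence known in advance for the evaluation.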

第一の特徴ベクトルｘ＿１は、第一のエンティティの集合におけるエンティティ毎の特徴ベクトルｘに対応し、第二の特徴ベクトルｘ＿２は、第一のエンティティの集合と同一の第二のエンティティの集合におけるエンティティ毎の特徴ベクトルｙに対応する。 The first feature vector x_1 corresponds to the feature vector x of each entity in the first set of entities, and the second feature vector x_2 corresponds to the feature vector y of each entity in the second set of entities, which is identical to the first set of entities.

その後、プロセッサ１１は、Ｓ１３０～Ｓ１７０で実行される処理と同様の処理を、Ｓ４３０，Ｓ４４０において、第一の特徴ベクトルｘ＿１及び第二の特徴ベクトルｘ＿２に対して実行する。 After that, the processor 11 performs the same processing as the processing performed in S130 to S170 on the first feature vector x_1 and the second feature vector x_2 in S430 and S440.

Ｓ４３０において、プロセッサ１１は、Ｓ１３０，Ｓ１４０での処理と同様に、第一のエンティティ毎の第一の特徴ベクトルｘ＿１及び第二のエンティティ毎の第二の特徴ベクトルｘ＿２に対する次元削減処理を実行して、同次元数の低次元特徴ベクトルＤｘ＿１及び低次元特徴ベクトルＤｘ＿２を生成する。 In S430, similarly to the processes in S130 and S140, the processor 11 performs dimension reduction processing on the first feature vector x_1 of each first entity and the second feature vector x_2 of each second entity to generate low-dimensional feature vectors Dx_1 and Dx_2 having the same number of dimensions.

プロセッサ１１は、第一のエンティティ毎の低次元特徴ベクトルＤｘ＿１に基づき、類似度行列Ｋに対応する第一のエンティティ間の低次元特徴ベクトルＤｘ＿１の類似度を表す類似度行列を生成する。プロセッサ１１は更に、第二のエンティティ毎の低次元特徴ベクトルＤｘ＿２に基づき、類似度行列Ｌに対応する第二のエンティティ間の低次元特徴ベクトルＤｘ＿２の類似度を表す類似度行列を生成する。 The processor 11 generates a similarity matrix representing the similarity of the low-dimensional feature vectors Dx_1 between the first entities corresponding to the similarity matrix K based on the low-dimensional feature vectors Dx_1 for each first entity. The processor 11 further generates a similarity matrix representing the similarity of the low-dimensional feature vectors Dx_2 between the second entities corresponding to the similarity matrix L based on the low-dimensional feature vectors Dx_2 for each second entity.

プロセッサ１１は、これらの類似度行列に基づき、値Ｚ（Ω）を最大化する行列Ωを対応関係行列Ω^＊として探索する（Ｓ４４０）。 Based on these similarity matrices, the processor 11 searches for the matrix Ω that maximizes the value Z(Ω) as the correspondence matrix Ω ^* (S440).

その後、プロセッサ１１は、低次元特徴ベクトルＤｘ＿１の一群に対応する第一のエンティティの集合と、低次元特徴ベクトルＤｘ＿２の一群に対応する第二のエンティティの集合とに関して、対応関係行列Ω^＊が、第一のエンティティと第二のエンティティとの間の対応関係を正しく表している程度をスコアとして算出する（Ｓ４５０）。 After that, for the first set of entities corresponding to the group of low-dimensional feature vectors Dx_1 and the second set of entities corresponding to the group of low-dimensional feature vectors Dx_2, the processor 11 calculates, as a score, the degree to which the correspondence matrix Ω^* correctly represents the correspondence between the first entities and the second entities (S450).

これにより、プロセッサ１１は、評価対象のデータセットが分析処理による対応付け及びデータフュージョンを高精度に実行可能な優良なデータセットであるか否かを評価する（Ｓ４５０）。 Thereby, the processor 11 evaluates whether or not the data set to be evaluated is an excellent data set capable of performing matching and data fusion by analysis processing with high accuracy (S450).

プロセッサ１１は、予めＳ４２０で第一のエンティティ毎の特徴ベクトルｘ＿１及び第二のエンティティ毎の特徴ベクトルｘ＿２を生成する際に、第一のエンティティと第二のエンティティとの間の正しい対応関係を記憶しておくことができる。 When generating the feature vector x_1 of each first entity and the feature vector x_2 of each second entity in S420, the processor 11 can store in advance the correct correspondence between the first entities and the second entities.

プロセッサ１１は、このように対応関係の正解を記憶した環境で、Ｓ４３０，Ｓ４４０において分析処理と同様の処理を実行して対応関係行列Ω^＊を算出し、対応関係行列Ω^＊から特定される対応関係を正解と比較する。 With the correct correspondence stored in this way, the processor 11 executes processing similar to the analysis processing in S430 and S440 to calculate the correspondence matrix Ω^*, and compares the correspondence identified from the correspondence matrix Ω^* with the correct answer.

例えば、プロセッサ１１は、対応関係行列Ω^＊に基づいて、第一のエンティティのそれぞれを、第二のエンティティの一つと対応付ける処理を、Ｓ１８０，Ｓ３８０での処理と同様に実行する。 For example, the processor 11 performs the process of associating each of the first entities with one of the second entities based on the correspondence matrix Ω ^* in the same manner as in S180 and S380.

プロセッサ１１は、対応関係行列Ω^＊に基づいて対応付けられた第一のエンティティと第二のエンティティとが、評価対象のデータセットにおいて同一のエンティティである場合には、対応付けに成功したと判別し、同一のエンティティではない場合には、対応付けに失敗したと判別する。 The processor 11 determines that the association is successful when the first entity and the second entity associated based on the correspondence matrix Ω ^* are the same entity in the data set to be evaluated. If they are not the same entity, it is determined that the association has failed.

プロセッサ１１は、エンティティ全体のうち、対応付けに成功した割合を、評価対象のデータセットのスコアとして算出することができる（Ｓ４５０）。その後、プロセッサ１１は、算出したスコアを評価結果として出力し（Ｓ４６０）、評価処理を終了する。 The processor 11 can calculate the percentage of successful associations among all entities as the score of the data set to be evaluated (S450). After that, the processor 11 outputs the calculated score as an evaluation result (S460), and ends the evaluation process.
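The score of S450 can be sketched as the fraction of entities matched to themselves. The assignment rule used here (each first entity is matched to the second entity with the largest entry in its row of the correspondence matrix) is an assumption; the actual rule of S180 is defined in the first embodiment and may differ. The function name `matching_score` is likewise hypothetical.

```python
import numpy as np

def matching_score(omega):
    """Fraction of first entities whose best match is the same entity.

    Row i and column i of omega refer to the same underlying entity,
    because both views were split from one data set.
    """
    predicted = np.argmax(omega, axis=1)     # assumed assignment rule
    truth = np.arange(omega.shape[0])        # correct answer: i matches i
    return float(np.mean(predicted == truth))

omega = np.array([[0.30, 0.03, 0.00],
                  [0.02, 0.05, 0.26],
                  [0.01, 0.25, 0.07]])
score = matching_score(omega)  # only entity 0 is matched to itself
```

Here one of three entities is matched correctly, so the score is 1/3; a perfectly diagonal correspondence matrix would score 1.0.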

一つのデータセットに基づく対応付け及びデータフュージョンを高精度に実行できない場合には、そのデータセットが、集合の特徴に関して高精度な対応付け及びデータフュージョンを実現するために十分な情報又はデータ構造を有していないと推測できる。 If matching and data fusion based on a single data set cannot be performed with high accuracy, it can be inferred that the data set lacks sufficient information or data structure, with respect to the features of the set, to achieve highly accurate matching and data fusion.

この情報不足は、二つの異なるデータセットに関して分析処理を実行して、対応付け及びデータフュージョンを行う場合の精度にも影響する。従って、上記評価処理によれば、評価対象のデータセットが、共通変数なしのデータフュージョンを高精度に実行可能なデータセットであるかを、事前に推測することができる。 This lack of information also affects the accuracy of matching and data fusion when performing analytical processes on two different data sets. Therefore, according to the evaluation process described above, it is possible to infer in advance whether the data set to be evaluated is a data set in which data fusion without a common variable can be executed with high accuracy.

プロセッサ１１は、Ｓ４６０において、スコアの出力により、評価対象のデータセットが優良なデータセットであるか否かを情報処理システム１のユーザに伝達することができる。これにより、ユーザは、分析処理に、適切な第一のデータセット１５Ａ及び第二のデータセット１５Ｂの組合せを採用して、信頼性の高い拡張データセット１５Ｃを得ることができる。 In S460, the processor 11 can inform the user of the information processing system 1 whether or not the data set to be evaluated is an excellent data set by outputting the score. Thereby, the user can employ an appropriate combination of the first data set 15A and the second data set 15B for analysis processing to obtain the extended data set 15C with high reliability.

また、所望の拡張データセット１５Ｃを得るために、第二のデータセット１５Ｂに結合する第一のデータセット１５Ａとして、互いに類似する複数のデータセットのいずれかを採用すれば十分である環境が考えられる。 There may also be environments in which, to obtain the desired extended data set 15C, it is sufficient to adopt any one of a plurality of mutually similar data sets as the first data set 15A to be combined with the second data set 15B.

例えば、購買行動に関する第一のデータセット１５Ａと、ウェブサイト訪問行動／ウェブコンテンツ閲覧行動に関する第二のデータセット１５Ｂとを、結合して、拡張データセット１５Ｃを生成することを考える。この場合、第一のデータセット１５Ａとして、複数の流通組織のいずれか一組織の顧客の購買行動に関するデータセットを用いて、拡張データセット１５Ｃを生成すれば十分であることが考えられる。 For example, consider combining a first data set 15A on purchasing behavior and a second data set 15B on website visiting/web content browsing behavior to produce an extended data set 15C. In this case, it may be sufficient to generate the extended data set 15C using a data set relating to customer purchasing behavior of any one of the plurality of distribution organizations as the first data set 15A.

複数の流通組織の例には、複数のコンビニエンスストアチェーンが含まれる。各コンビニストアチェーンの購買に関するデータセットには、消費者の購買行動として、同種の購買行動に関する情報が含まれ得る。 Examples of multiple distribution organizations include multiple convenience store chains. A data set regarding purchases of each convenience store chain may include information regarding the same kind of purchasing behavior as consumers' purchasing behavior.

従って、第一のデータセット１５Ａとしては、複数のコンビニエンスストアチェーンのうちのいずれか一つの顧客の購買行動に関するデータセットを用いて、拡張データセット１５Ｃを生成すれば十分であることが考えられる。 Therefore, as the first data set 15A, it is considered sufficient to generate the extended data set 15C using a data set on customer purchasing behavior of any one of a plurality of convenience store chains.

上述の評価処理は、第一のデータセット１５Ａ（又は第二のデータセット１５Ｂ）の候補として、複数のデータセットが存在する場合に、これらの複数のデータセットから、対応付け及びデータフュージョンの精度の観点で最適なデータセットを、第一のデータセット１５Ａ（又は第二のデータセット１５Ｂ）として選択するために利用することができる。 When a plurality of data sets exist as candidates for the first data set 15A (or the second data set 15B), the evaluation process described above can be used to select, from among them, the data set that is optimal in terms of matching and data fusion accuracy as the first data set 15A (or the second data set 15B).

例えば、プロセッサ１１は、Ｓ１１０，Ｓ１２０，Ｓ３１０，Ｓ３２０のいずれかの処理において、必要に応じて、図９に示す選択処理を実行することにより、データフュージョン対象のデータセットの複数の候補から、一つの候補を、データフュージョン対象のデータセットとして採用することができる。Ｓ１１０，Ｓ３１０におけるデータフュージョン対象のデータセットは、第一のデータセット１５Ａに対応し、Ｓ１２０，Ｓ３２０におけるデータフュージョン対象のデータセットは、第二のデータセット１５Ｂに対応する。 For example, in any of the processes of S110, S120, S310, and S320, the processor 11 can execute the selection process shown in FIG. 9 as necessary to adopt one of a plurality of candidate data sets as the data set for data fusion. The data set for data fusion in S110 and S310 corresponds to the first data set 15A, and the data set for data fusion in S120 and S320 corresponds to the second data set 15B.

図９に示す選択処理を開始すると、プロセッサ１１は、データフュージョン対象のデータセットの複数の候補として、複数のデータセットを取得する（Ｓ５１０）。プロセッサ１１は、ユーザから指定された複数のデータセットを、ストレージ１５から取得することができる。 When the selection process shown in FIG. 9 is started, the processor 11 acquires a plurality of data sets as candidates for the data set for data fusion (S510). The processor 11 can acquire the plurality of data sets specified by the user from the storage 15.

その後、プロセッサ１１は、複数のデータセットのうちの一つを、評価対象のデータセットに設定して（Ｓ５２０）、図８に示す評価処理を実行する（Ｓ５３０）。プロセッサ１１は、複数のデータセットのすべてに関する評価処理を実行するまで（Ｓ５４０でＹｅｓ）、データセット毎に、これを評価対象のデータセットに設定して（Ｓ５２０）、評価処理（Ｓ５３０）を実行する処理を繰り返す。これにより、データセット毎に、Ｓ４５０で算出されるスコアを取得する。 After that, the processor 11 sets one of the plurality of data sets as the data set to be evaluated (S520) and executes the evaluation process shown in FIG. 8 (S530). Until the evaluation process has been executed for all of the plurality of data sets (Yes in S540), the processor 11 repeats, for each data set, setting it as the data set to be evaluated (S520) and executing the evaluation process (S530). As a result, the score calculated in S450 is obtained for each data set.

複数のデータセットのすべてに関して評価処理を実行し、スコアを取得すると（Ｓ５４０でＹｅｓ）、プロセッサ１１は、複数のデータセットのうち、最もスコアの高いデータセットを、データフュージョン対象のデータセットに採用する（Ｓ５５０）。その後、選択処理を終了する。Ｓ１１０，Ｓ１２０，Ｓ３１０，Ｓ３２０において、プロセッサ１１は、採用されたデータフュージョン対象のデータセットに基づく特徴ベクトル（ｘ又はｙ）を生成することができる。 When evaluation processing is performed on all of the plurality of data sets and scores are obtained (Yes in S540), the processor 11 adopts the data set with the highest score among the plurality of data sets as the data set to be subjected to data fusion. (S550). After that, the selection process ends. At S110, S120, S310, S320, processor 11 may generate a feature vector (x or y) based on the adopted data fusion target dataset.
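The selection process of S510 to S550 amounts to scoring every candidate and keeping the best one, as in the sketch below. The function `select_best_dataset`, the candidate names, and the stand-in `toy_evaluate` are hypothetical; in the described system the evaluation would be the process of FIG. 8, which returns the score of S450.

```python
def select_best_dataset(candidates, evaluate):
    """Score every candidate data set and return the best name with all scores."""
    scores = {name: evaluate(data) for name, data in candidates.items()}  # S520-S540
    best = max(scores, key=scores.get)                                    # S550
    return best, scores

# Toy stand-in: each "data set" is reduced to a precomputed score (assumption).
candidates = {"chain_a": 0.62, "chain_b": 0.81, "chain_c": 0.47}
toy_evaluate = lambda data: data
best, scores = select_best_dataset(candidates, toy_evaluate)
```

In this toy run `best` is `"chain_b"`, the candidate with the highest score.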

このように選択処理を実行して、複数の候補の中から最適なデータセットを選択することによれば、精度の高い拡張データセット１５Ｃを生成することが可能である。 By executing the selection process in this manner and selecting the optimum data set from a plurality of candidates, it is possible to generate the extended data set 15C with high accuracy.

付言すると、購買行動の例において、データフュージョン対象のデータセットの複数の候補には、消費者の購買行動を異なるパラメータで表すデータセットが含まれ得る。例えば、第一の候補は、エンティティ（消費者）毎に、商品毎の購入個数を要素に含む特徴ベクトルを生成可能なデータセットであり得る。第二の候補は、エンティティ（消費者）毎に、商品毎の購入金額を要素に含む特徴ベクトルを生成可能なデータセットであり得る。 Additionally, in the example of purchasing behavior, the plurality of candidate datasets for data fusion may include datasets representing consumer purchasing behavior with different parameters. For example, the first candidate may be a data set capable of generating a feature vector whose elements include the number of items purchased for each entity (consumer). A second candidate may be a data set capable of generating a feature vector whose elements include the purchase amount of each product for each entity (consumer).

こうした同種の特徴を異なるパラメータで説明する複数のデータセットを用意して、データフュージョンに適したデータセットを選択することは、より良い拡張データセット１５Ｃの生成に繋がる。 Preparing a plurality of data sets that explain similar features with different parameters and selecting a data set suitable for data fusion leads to generation of a better extended data set 15C.

＜第四実施形態＞ <Fourth Embodiment>
図１０に示す第四実施形態の配信システム３０は、第一実施形態又は第二実施形態のデータフュージョン技術を用いて、外部から提供されるデータセットである外部データセット３５Ａと、内部に保持するデータセットである内部データセット３５Ｂとを結合し、それにより生成される拡張データセット３５Ｃに基づいて、広告配信を行うシステムである。 The distribution system 30 of the fourth embodiment shown in FIG. 10 uses the data fusion technology of the first or second embodiment to combine an external data set 35A, which is a data set provided from the outside, with an internal data set 35B, which is a data set held internally, and distributes advertisements based on the resulting extended data set 35C.

配信システム３０は、図１０に示すように、プロセッサ３１と、メモリ３３と、ストレージ３５と、通信インタフェース３９とを備える。プロセッサ３１は、ストレージ３５に格納されたコンピュータプログラムＰｒ１に従う処理を実行する。ストレージ３５は、更に、内部データセット３５Ｂを備える。 The distribution system 30 includes a processor 31, a memory 33, a storage 35, and a communication interface 39, as shown in FIG. Processor 31 executes processing according to computer program Pr1 stored in storage 35 . The storage 35 further comprises an internal data set 35B.

内部データセット３５Ｂは、図１１に示すように、ユーザ毎に、対応するユーザの広告ＩＤに関連付けて、対応するユーザのオンライン行動の特徴を説明する特徴データを備える。広告ＩＤは、良く知られるように、広告のために使用される識別コードであって、情報端末に固有のＩＤである。 As shown in FIG. 11, the internal data set 35B includes, for each user, feature data describing the features of online behavior of the corresponding user in association with the advertisement ID of the corresponding user. The advertisement ID, as is well known, is an identification code used for advertisement and is an ID unique to the information terminal.

広告ＩＤに関連付けられた特徴データは、対応する広告ＩＤが割り当てられた情報端末を通じて観測されたユーザのオンライン行動の特徴を説明する。オンライン行動には、ウェブサイト訪問行動及びウェブコンテンツ閲覧行動が含まれる。 The feature data associated with the Advertisement ID describes features of the user's online behavior observed through the information terminal assigned the corresponding Advertisement ID. Online behavior includes website visit behavior and web content viewing behavior.

配信システム３０は、通信インタフェース３９を通じて広域ネットワークと接続され、広域ネットワークを介して、広告配信サービスを提供する。広告配信サービスを利用するサービス利用企業側システム４０は、配信システム３０に対して、配信対象の広告コンテンツと共に、配信指定情報を提供する。広告コンテンツは、広告用の情報コンテンツである。配信指定情報には、配信ターゲットを指定するターゲット指定情報、及び、配信数を指定する配信数指定情報が含まれる。 The distribution system 30 is connected to a wide area network through a communication interface 39 and provides an advertisement distribution service through the wide area network. A system 40 on the side of a service-using company that uses an advertisement distribution service provides the distribution system 30 with distribution designation information together with advertisement content to be distributed. Advertising content is information content for advertising. The distribution designation information includes target designation information that designates distribution targets and distribution number designation information that designates the number of distributions.

サービス利用企業側システム４０は更に、配信システム３０に対し、外部データセット３５Ａとして、配信先候補に対応する顧客の特徴を説明するデータセットである顧客データセットを提供する。 The service-using enterprise system 40 further provides the delivery system 30 with a customer data set, which is a data set describing the characteristics of the customer corresponding to the delivery destination candidate, as an external data set 35A.

顧客データセットは、例えば、サービス利用企業が運営する店舗を利用する顧客の購買行動に関する特徴を説明するデータセットであり得る。例えば、顧客データセットは、顧客毎の特徴データとして、複数の商品に関する、対応する顧客の商品毎の購買量を説明する特徴データを備えることができる。 The customer data set may be, for example, a data set describing the characteristics of the purchasing behavior of customers who use stores operated by service using companies. For example, the customer data set may comprise, as feature data for each customer, feature data describing the purchase volume for each item of the corresponding customer regarding a plurality of items.

プロセッサ３１は、通信インタフェース３９を通じてサービス利用企業側システム４０から配信要求が入力されると、コンピュータプログラムＰｒ１に基づいて図１２に示す配信制御処理を実行する。 When a distribution request is input from the service using company system 40 through the communication interface 39, the processor 31 executes the distribution control process shown in FIG. 12 based on the computer program Pr1.

配信制御処理を開始すると、プロセッサ３１は、サービス利用企業側システム４０から、配信対象の広告コンテンツと共に、ターゲット指定情報及び配信数指定情報を含む配信指定情報、外部データセット３５Ａとしての顧客データセットを取得する（Ｓ６１０）。 When the distribution control process is started, the processor 31 receives from the system 40 of the company using the service, the advertisement content to be distributed, the distribution designation information including the target designation information and the number of distribution designation information, and the customer data set as the external data set 35A. Acquire (S610).

その後、プロセッサ３１は、第一のデータセット１５Ａとして外部データセット３５Ａを用いて、更には、第二のデータセット１５Ｂとして内部データセット３５Ｂを用いて、分析処理におけるＳ１１０～Ｓ１９０の処理と同様の処理を実行することにより、外部データセット３５Ａと内部データセット３５Ｂとを結合し、拡張データセット３５Ｃを生成する（Ｓ６２０）。 After that, the processor 31 uses the external data set 35A as the first data set 15A and the internal data set 35B as the second data set 15B, and executes processing similar to S110 to S190 in the analysis process, thereby combining the external data set 35A and the internal data set 35B to generate the extended data set 35C (S620).

外部データセット３５Ａと内部データセット３５Ｂとの結合により、外部データセット３５Ａに含まれる顧客毎の特徴データには、内部データセット３５Ｂに含まれる顧客と同一人物である可能性の高いユーザの広告ＩＤが関連付けられる。 By combining the external data set 35A and the internal data set 35B, the feature data of each customer included in the external data set 35A is associated with the advertisement ID of a user in the internal data set 35B who is highly likely to be the same person as that customer.

拡張データセット３５Ｃは、エンティティ毎に、対応する顧客の外部データセット３５Ａが有する特徴データと、対応するユーザの内部データセット３５Ｂが有する特徴データとが結合された拡張データを備える。各拡張データには、内部データセット３５Ｂが有する対応するユーザの広告ＩＤが関連付けられる。 The extended data set 35C includes extended data obtained by combining the feature data of the corresponding customer's external data set 35A and the feature data of the corresponding user's internal data set 35B for each entity. Each extension data is associated with the corresponding user's advertisement ID in the internal data set 35B.

ここでいうエンティティは、データフュージョンにより互いに対応付けられた顧客とユーザとの組合せのことである。データフュージョンでは、顧客とユーザとが一対一で対応付けられる。例えば、拡張データセット３５Ｃは、図６に示す拡張データセット１５Ｃにおいて図示される「ＩＤ２＿１」「ＩＤ２＿２」「ＩＤ２＿３」の列に、広告ＩＤが記述された構成にされ得る。 An entity here is a combination of a customer and a user that are associated with each other by data fusion. Data fusion creates a one-to-one correspondence between customers and users. For example, the extended data set 35C may have a configuration in which advertisement IDs are described in columns of "ID2_1", "ID2_2", and "ID2_3" illustrated in the extended data set 15C shown in FIG.

プロセッサ３１は、その後、拡張データセット３５Ｃ内の各エンティティが配信ターゲットである可能性に関するスコアを算出する（Ｓ６３０）。例えば、外部データセット３５Ａが顧客の購買行動に関するデータセットであり、内部データセット３５Ｂがユーザのオンライン行動に関するデータセットである場合、プロセッサ３１は、拡張データセット３５Ｃ内の各エンティティの購買行動に関する特徴データとオンライン行動に関する特徴データとを所定の関数に入力して、対応するエンティティが配信ターゲットである可能性を数値化したスコアを算出する。 Processor 31 then calculates a score for the likelihood that each entity in extended dataset 35C is a delivery target (S630). For example, if the external data set 35A is a data set relating to customer purchasing behavior and the internal data set 35B is a data set relating to user online behavior, the processor 31 may determine the purchasing behavior characteristics of each entity in the extended data set 35C. The data and feature data about online behavior are input into a predetermined function to calculate a score that quantifies the likelihood that the corresponding entity is a distribution target.

配信ターゲットは、性別、年齢、購買傾向、オンライン行動傾向、興味、及び関心等の消費者を特徴付けるパラメータにより絞り込まれる配信先の消費者群であり、ターゲット指定情報を通じて指定される。 A distribution target is a group of consumers to whom distribution is to be narrowed down by parameters that characterize consumers such as gender, age, purchasing tendency, online behavior tendency, interest, and concern, and is designated through target designation information.

Ｓ６３０におけるスコア算出後、プロセッサ３１は、広告ＩＤが関連付けられているエンティティの一群（換言すれば顧客の一群）のうち、算出されたスコアが高い順に、サービス利用企業側システム４０から指定された配信数に対応する数のエンティティを、コンテンツ配信先に決定する（Ｓ６４０）。このようにして、プロセッサ３１は、外部データセット３５Ａに対応する複数の顧客のいずれかと対応付けられた内部データセット３５Ｂに対応する複数のユーザの少なくとも一部を、広告コンテンツの配信先に選択する。 After calculating the scores in S630, the processor 31 determines, as content distribution destinations, a number of entities corresponding to the number of distributions specified by the service-using company system 40, in descending order of the calculated score, from among the group of entities (in other words, the group of customers) with which advertisement IDs are associated (S640). In this way, the processor 31 selects, as distribution destinations of the advertising content, at least some of the plurality of users corresponding to the internal data set 35B who are associated with any of the plurality of customers corresponding to the external data set 35A.
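The destination selection of S640 can be sketched as a top-N selection over the scores from S630. The function name `choose_destinations`, the placeholder advertisement IDs, and the score values are hypothetical; the scoring function of S630 is not specified in the text and is represented here only by its output.

```python
import numpy as np

def choose_destinations(ad_ids, scores, num_deliveries):
    """Return the advertisement IDs of the highest-scoring entities."""
    order = np.argsort(-np.asarray(scores), kind="stable")  # descending by score
    return [ad_ids[i] for i in order[:num_deliveries]]

ad_ids = ["ad-001", "ad-002", "ad-003", "ad-004"]  # placeholder advertisement IDs
scores = [0.2, 0.9, 0.5, 0.7]                      # placeholder scores from S630
destinations = choose_destinations(ad_ids, scores, num_deliveries=2)
```

With these placeholder values the two selected destinations are `"ad-002"` and `"ad-004"`, in descending order of score.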

その後、プロセッサ３１は、決定したコンテンツ配信先の情報端末に、サービス利用企業側システム４０から提供された広告コンテンツを、広域ネットワークを通じて送信する（Ｓ６５０）。広告コンテンツは、コンテンツ配信先の広告ＩＤから識別される情報端末に配信される。その後プロセッサ３１は、配信制御処理を終了する。 After that, the processor 31 transmits the advertising content provided from the system 40 of the service user company to the determined information terminal of the content distribution destination through the wide area network (S650). The advertisement content is distributed to the information terminal identified from the advertisement ID of the content distribution destination. After that, the processor 31 ends the distribution control process.

以上に説明した第四実施形態の配信システム３０によれば、共通変数なしのデータフュージョン技術を用いて、外部データセット３５Ａと内部データセット３５Ｂとを結合することにより、広告ＩＤが不明な顧客の特徴データに対して広告ＩＤを関連付けることができる。これにより、広告ＩＤが不明な外部データセット３５Ａの顧客に対して、広告コンテンツを適切に配信することができる。 According to the distribution system 30 of the fourth embodiment described above, combining the external data set 35A and the internal data set 35B using the data fusion technique without common variables makes it possible to associate advertisement IDs with the feature data of customers whose advertisement IDs are unknown. As a result, advertising content can be appropriately distributed to the customers of the external data set 35A whose advertisement IDs are unknown.

＜第五実施形態＞ <Fifth Embodiment>
第五実施形態の配信システム３０は、プロセッサ３１が図１２に示す配信制御処理に代えて、図１３に示す配信制御処理を実行するように構成される。以下では、第五実施形態の説明として、プロセッサ３１が実行する配信制御処理の詳細を選択的に説明する。本実施形態において言及されない配信システム３０の構成は、第四実施形態と同じであると理解されてよい。 The distribution system 30 of the fifth embodiment is configured such that the processor 31 executes the distribution control process shown in FIG. 13 instead of the distribution control process shown in FIG. 12. Below, the details of the distribution control process executed by the processor 31 will be selectively described as a description of the fifth embodiment. The configuration of the distribution system 30 that is not mentioned in this embodiment may be understood to be the same as in the fourth embodiment.

In this embodiment, when a distribution request is input from the service user company system 40 through the communication interface 39, the processor 31 executes the distribution control process shown in FIG. 13.

When the distribution control process starts, the processor 31 acquires, from the service user company system 40, the advertising content to be distributed together with distribution designation information and a customer data set serving as the external data set 35A (S710).

However, the distribution designation information acquired here does not include target designation information; it includes only distribution number designation information. In addition, the customer data set acquired as the external data set 35A is a specific customer data set that describes the characteristics of the customer group corresponding to the distribution targets narrowed down by the service provider company.

The processor 31 then combines the external data set 35A and the internal data set 35B to generate the extended data set 35C, in the same manner as in S620 (S720). For each entity, the extended data set 35C contains extended data in which the feature data of the corresponding customer in the external data set 35A is combined with the feature data of the corresponding user in the internal data set 35B.

In this embodiment, however, not every user of the internal data set 35B is associated with a customer of the external data set 35A. The extended data set 35C of this embodiment therefore also includes, as the extended data of one entity, the feature data of each user who is not associated with any customer of the service user company. Such extended data is, in substance, the unextended feature data of that user as held in the internal data set 35B.
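The structure of such an extended data set can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the function name `build_extended_dataset`, the dictionary layout, the feature names `visits` and `spend`, and the `user_to_customer` mapping (standing in for the association obtained in S720) are all assumptions.

```python
from typing import Dict

Features = Dict[str, float]


def build_extended_dataset(
    internal: Dict[str, Features],
    external: Dict[str, Features],
    user_to_customer: Dict[str, str],
) -> Dict[str, Features]:
    """Merge each user's internal features with the matched customer's features."""
    extended: Dict[str, Features] = {}
    for user_id, user_features in internal.items():
        record = dict(user_features)  # copy so the internal data set is not mutated
        customer_id = user_to_customer.get(user_id)
        if customer_id is not None:
            # Matched user: append the customer's external features.
            record.update(external[customer_id])
        # Unmatched user: the record stays as the unextended internal features.
        extended[user_id] = record
    return extended


# Hypothetical data: user u1 is matched to customer c1, user u2 is unmatched.
internal = {"u1": {"visits": 3.0}, "u2": {"visits": 7.0}}
external = {"c1": {"spend": 120.0}}
extended = build_extended_dataset(internal, external, {"u1": "c1"})
print(extended)
```

In this sketch the record for `u2` contains only its internal features, which mirrors the unextended entries described above.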

In this embodiment, among the group of entities corresponding to the extended data set 35C, the group of entities associated with the customer group corresponding to the external data set 35A is referred to as the seed, and the remaining group of entities is referred to as the non-seed.

After S720, the processor 31 calculates, based on the extended data set 35C, the similarity between each non-seed entity and each seed entity with respect to the features indicated by the internal data set 35B (S730). The similarity can be calculated from the distance in the feature space between each non-seed entity and each seed entity.

After calculating the similarities, the processor 31 determines, as distribution destinations, a number of entities corresponding to the number of distributions specified in the distribution designation information, in descending order of similarity (S740). At this time, all entities belonging to the seed are also determined as distribution destinations.

In this way, the processor 31 selects, as distribution destinations of the advertising content, both the seed, which is the set of users associated with the plurality of customers corresponding to the external data set 35A, and the set of users, among the plurality of users corresponding to the internal data set 35B, whose features are similar to those of the seed.
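The seed-based selection of S730 and S740 can be sketched as below. Several points are assumptions made for illustration, since the text does not fix them: similarity is taken as the Euclidean distance to the nearest seed entity, the specified number of distributions is applied to non-seed entities only, and the name `select_destinations` and the sample feature vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def select_destinations(
    features: Dict[str, Sequence[float]],
    seed_ids: Set[str],
    num_non_seed: int,
) -> List[str]:
    """Return all seed entities plus the non-seed entities closest to the seed."""
    if not seed_ids:
        raise ValueError("at least one seed entity is required")
    seed_vectors = [features[s] for s in seed_ids]

    def distance_to_seed(entity_id: str) -> float:
        # Assumption: an entity's similarity to the seed is governed by its
        # distance to the nearest seed entity (smaller distance = more similar).
        return min(math.dist(features[entity_id], v) for v in seed_vectors)

    non_seed = [e for e in features if e not in seed_ids]
    non_seed.sort(key=lambda e: (distance_to_seed(e), e))
    # Every seed entity is a destination; non-seed entities are added by rank.
    return sorted(seed_ids) + non_seed[:num_non_seed]


# Hypothetical internal-feature vectors; u1 is the only seed entity.
features = {"u1": (0.0, 0.0), "u2": (1.0, 0.0), "u3": (5.0, 5.0), "u4": (0.5, 0.5)}
chosen = select_destinations(features, {"u1"}, num_non_seed=1)
print(chosen)  # ['u1', 'u4']
```

Other aggregation rules, such as the mean distance to all seed entities, would fit the same description; the nearest-seed rule is used here only because it is simple.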

The processor 31 then transmits the advertising content provided by the service user company system 40 to the information terminals of the content distribution destinations determined in S740 over the wide area network, in the same manner as in S650 (S750). The distribution control process then ends.

According to the distribution system 30 of this embodiment described above, advertising content can be distributed, on the basis of the customer group data set provided by the service user company system 40, to the information terminals of that customer group together with a larger set of consumers who show characteristics similar to those of the customer group. This embodiment therefore makes it possible to distribute advertisements efficiently to many consumers.

<Sixth Embodiment>
The distribution system 30 of the sixth embodiment is configured to provide a prediction service in addition to an advertisement distribution service similar to that of the distribution system 30 of the fourth or fifth embodiment.

Specifically, the processor 31 of this embodiment is configured to execute the prediction process shown in FIG. 14 in response to an execution request from the service user company system 40. In the following description of the sixth embodiment, the details of the prediction process executed by the processor 31 are selectively described. Any configuration of the distribution system 30 not mentioned in this embodiment may be understood to be the same as in the fourth or fifth embodiment.

When the prediction process starts, the processor 31 acquires the data set to be analyzed, together with analysis condition designation information, from the service user company system 40 through the communication interface 39 (S810). The data set to be analyzed contains feature data for each customer to be analyzed.

The analysis condition designation information may be information that designates the target product for which the customers' likelihood of purchase is to be evaluated. In the prediction process, the likelihood that each customer to be analyzed will purchase the designated target product is predicted by calculating a predicted value of the number of purchases of the target product. Prediction here corresponds to estimating the customer's behavior, and the predicted value corresponds to an estimated value regarding that behavior.

After S810, the processor 31 executes processing similar to S110 to S170 or S310 to S370 of the analysis process, using the data set to be analyzed as the first data set 15A and the internal data set 35B as the second data set 15B. It thereby calculates the correspondence matrix Ω^*, which indicates the correspondence between each customer to be analyzed and each user having feature data in the internal data set 35B (S820).

Based on the calculated correspondence matrix Ω^*, the processor 31 further extracts, for each customer to be analyzed, a predetermined number of users close to that customer. It then calculates the predicted value of the number of purchases of the target product by that customer as a weighted average of the numbers of purchases of the target product by the extracted users, which can be identified from the internal data set 35B (S830). In this way, the processor 31 estimates the customer's purchasing behavior from the purchasing behavior of the associated users. The internal data set 35B includes information from which the number of purchases of the target product by each user can be identified.

Each element of the correspondence matrix Ω^* indicates the similarity between a customer and a user as a value from 0 to 1. Specifically, the element in the i-th row and j-th column of the correspondence matrix Ω^* indicates, as a value from 0 to 1, the similarity between the i-th user in the set of users corresponding to the internal data set 35B and the j-th customer in the set of customers corresponding to the data set to be analyzed.
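Given that row-and-column convention, extracting the users closest to one customer amounts to ranking one column of the matrix. The sketch below assumes the matrix is held as a list of rows (one row per user); the function name `closest_users` and the sample values are hypothetical and are not taken from the disclosure.

```python
from typing import List, Sequence, Tuple


def closest_users(
    omega: Sequence[Sequence[float]], customer_index: int, k: int
) -> List[Tuple[int, float]]:
    """Return (user_index, similarity) for the k users most similar to a customer."""
    # Column j of the matrix holds the similarity of every user to customer j.
    column = [(row[customer_index], user_index) for user_index, row in enumerate(omega)]
    # Highest similarity first; ties broken by user index (an assumption).
    column.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(user_index, weight) for weight, user_index in column[:k]]


# Hypothetical 3-user by 2-customer correspondence matrix.
omega = [
    [0.1, 0.8],
    [0.7, 0.1],
    [0.2, 0.1],
]
print(closest_users(omega, customer_index=0, k=2))  # [(1, 0.7), (2, 0.2)]
```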

The weighted average is calculated using, for example, the similarities as weights. Assuming that a first, a second, and a third user have been extracted as the three users close to the customer, the weighted average can be calculated as follows.

That is, let the similarity between the customer and the first user be w1, the similarity between the customer and the second user be w2, and the similarity between the customer and the third user be w3. Let the number of purchases of the target product be p1 for the first user, p2 for the second user, and p3 for the third user. The predicted value pe of the number of purchases of the target product by the customer can then be calculated as pe = (w1·p1 + w2·p2 + w3·p3) / 3.
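As a worked instance of this formula, the sketch below evaluates it for three extracted users. The function name `predict_purchases` and the numeric values are illustrative assumptions. The divisor is the number of extracted users, exactly as in the formula above, rather than the sum of the weights.

```python
from typing import Sequence


def predict_purchases(similarities: Sequence[float], purchases: Sequence[float]) -> float:
    """Similarity-weighted sum of purchase counts divided by the number of users."""
    if not similarities or len(similarities) != len(purchases):
        raise ValueError("similarities and purchases must be non-empty and equal in length")
    weighted_sum = sum(w * p for w, p in zip(similarities, purchases))
    return weighted_sum / len(similarities)


# Hypothetical values: w1..w3 are similarities, p1..p3 are purchase counts.
w = (0.9, 0.6, 0.3)
p = (2, 1, 4)
pe = predict_purchases(w, p)  # (0.9*2 + 0.6*1 + 0.3*4) / 3
print(round(pe, 3))  # 1.2
```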

From the correspondence matrix Ω^*, the similarity to every user (in other words, the strength of the association) can be identified for each customer. Accordingly, the predicted value of the number of purchases of the target product by a customer may instead be calculated as a weighted average of the numbers of purchases of the target product by all users, without the process of extracting close users.
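This all-users variant can be sketched as one pass over each column of the matrix. The sketch assumes the matrix is a list of rows with one row per user and one column per customer, and it divides by the number of users by analogy with the three-user formula; the text does not state the divisor for this variant, so that choice, like the name `predict_for_all_customers`, is an assumption.

```python
from typing import List, Sequence


def predict_for_all_customers(
    omega: Sequence[Sequence[float]], purchases: Sequence[float]
) -> List[float]:
    """Predict purchase counts per customer from every user's purchase count."""
    num_users = len(omega)
    if num_users == 0 or num_users != len(purchases):
        raise ValueError("omega must have one row per user in purchases")
    num_customers = len(omega[0])
    predictions = []
    for j in range(num_customers):
        # Weight each user's purchase count by that user's similarity to customer j.
        weighted_sum = sum(omega[i][j] * purchases[i] for i in range(num_users))
        predictions.append(weighted_sum / num_users)
    return predictions


# Hypothetical 3-user by 2-customer matrix and per-user purchase counts.
omega_all = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]
preds = predict_for_all_customers(omega_all, [3, 6, 2])
print([round(x, 3) for x in preds])
```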

After S830, the processor 31 outputs prediction data describing the predicted number of purchases of the corresponding product for each customer to the source of the execution request for the prediction process (S840). The processor 31 then ends the prediction process shown in FIG. 14.

In another example, after S830, the processor 31 may, instead of or in addition to outputting the prediction data, distribute advertising content that promotes the purchase of the target product to a number of customers corresponding to the number of distributions specified by the service user company, in descending order of the predicted number of purchases of the corresponding product for each customer (S840).

The distribution system 30 of the sixth embodiment has been described above. According to this embodiment, data fusion technology without common variables can be used to provide a meaningful advertisement distribution service and, furthermore, a meaningful marketing solution.

[Others]
It goes without saying that the present disclosure is not limited to the embodiments described above and may take various forms. A function of one component in the above embodiments may be distributed among a plurality of components. Functions of a plurality of components may be integrated into one component. Part of the configuration of the above embodiments may be omitted. At least part of the configuration of one of the above embodiments may be added to, or substituted for, the configuration of another of the above embodiments. Every aspect included in the technical idea specified by the wording of the claims is an embodiment of the present disclosure.

REFERENCE SIGNS LIST: 1... information processing system; 11, 31... processor; 13, 33... memory; 15, 35... storage; 15A... first data set; 15B... second data set; 15C... extended data set; 17... user interface; 19, 39... communication interface; 30... distribution system; 35A... external data set; 35B... internal data set; 35C... extended data set; 40... service user company system; Pr, Pr1... computer program.

Claims

An information processing system comprising:
a first acquisition unit configured to acquire a first data set relating to a plurality of first entities, the first data set describing characteristics of each of the plurality of first entities;
a second acquisition unit configured to acquire a second data set relating to a plurality of second entities, the second data set describing characteristics of each of the plurality of second entities;
a dimensionality reduction unit configured to execute dimensionality reduction processing on a group of first feature vectors identified from the first data set, each of the first feature vectors representing features of a corresponding one of the plurality of first entities, and on a group of second feature vectors identified from the second data set, each of the second feature vectors representing features of a corresponding one of the plurality of second entities, thereby generating a group of first low-dimensional feature vectors corresponding to the group of first feature vectors, and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors and having the same number of dimensions as the group of first low-dimensional feature vectors; and
an association unit configured to associate each of the plurality of first entities with at least one of the plurality of second entities based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

The information processing system according to claim 1, wherein the association unit associates each of the plurality of first entities with at least one of the plurality of second entities based on the similarity between the first entities identified from the group of first low-dimensional feature vectors and the similarity between the second entities identified from the group of second low-dimensional feature vectors.

The information processing system according to claim 1 or 2, wherein
the first low-dimensional feature vectors are defined in a first feature space,
the second low-dimensional feature vectors are defined in a second feature space, and
the association unit searches for a mapping that maps the plurality of first entities in the first feature space to the second feature space such that the distribution of the plurality of first entities in the first feature space, identified from the group of first low-dimensional feature vectors, matches the distribution of the plurality of second entities in the second feature space, identified from the group of second low-dimensional feature vectors, and associates each of the plurality of first entities with at least one of the plurality of second entities based on the mapping.

The information processing system according to claim 1, wherein
the association unit is configured to search for, as a matrix Ω^*, a matrix Ω that maximizes the value Z(Ω) according to a formula comprising a matrix K, a matrix L, and a matrix H

[Formula omitted in the source text]

and to associate each of the plurality of first entities with at least one of the plurality of second entities based on the matrix Ω^*,
the number of the first entities is N, and the number of the second entities is equal to the number of the first entities,
the matrix K is an N-by-N first similarity matrix in which the value of the element in the i-th row and j-th column represents the similarity between the i-th entity and the j-th entity among the plurality of first entities, calculated based on the first low-dimensional feature vector of the i-th entity and the first low-dimensional feature vector of the j-th entity,
the matrix L is an N-by-N second similarity matrix in which the value of the element in the i-th row and j-th column represents the similarity between the i-th entity and the j-th entity among the plurality of second entities, calculated based on the second low-dimensional feature vector of the i-th entity and the second low-dimensional feature vector of the j-th entity, and
the matrix H is an N-by-N matrix in which the value of the element in the i-th row and j-th column is 1−1/N when i=j and 0 when i≠j.

The information processing system according to claim 4, wherein the association unit is configured to:
refine the matrix Ω^* by repeatedly executing, until a predetermined condition is satisfied, a re-search process in which the dimensionality reduction method used in the dimensionality reduction processing is changed based on the matrix Ω^*, the dimensionality reduction unit executes the dimensionality reduction processing with the changed dimensionality reduction method, and a matrix Ω that maximizes the value Z(Ω) is searched for as the matrix Ω^* based on the newly obtained group of first low-dimensional feature vectors and group of second low-dimensional feature vectors; and
associate each of the plurality of first entities with at least one of the plurality of second entities based on the refined matrix Ω^*.

The information processing system according to claim 5, wherein the association unit changes the dimensionality reduction method, based on the matrix Ω^*, such that the distance in the feature space between a first low-dimensional feature vector and a second low-dimensional feature vector that correspond to each other, among the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors, is shortened.

The information processing system according to any one of claims 1 to 6, wherein
the first data set includes a plurality of first feature data, each of the plurality of first feature data representing features of a corresponding one of the plurality of first entities,
the second data set includes a plurality of second feature data, each of the plurality of second feature data representing features of a corresponding one of the plurality of second entities, and
the information processing system further comprises a data fusion unit that generates an extended data set including a plurality of extended data by combining, based on the association made by the association unit, each of the plurality of first feature data included in the first data set with a corresponding one of the plurality of second feature data included in the second data set.

The information processing system according to any one of claims 1 to 7, wherein
the first entity and the second entity are people,
the first data set is a data set describing a first feature of each of a plurality of people belonging to a first group, and
the second data set is a data set describing a second feature of each of a plurality of people belonging to a second group.

The information processing system according to claim 8, wherein the combination of the first feature and the second feature is a combination of a feature relating to purchasing behavior and a feature relating to at least one of movement in at least one of an online space and an offline space and visits to a plurality of points in that space.

The information processing system according to any one of claims 1 to 9, wherein
the first entity and the second entity are people,
the second data set is associated with identification information of an information terminal corresponding to each of the plurality of second entities, and
the information processing system comprises:
a distribution unit that distributes, based on the identification information, information content to a set of information terminals corresponding to a set of second entities selected, from among the plurality of second entities, as distribution destinations of the information content; and
a selection unit that selects, as distribution destinations of the information content, at least part of the set of second entities associated with any of the plurality of first entities by the association unit.

The information processing system according to claim 10, wherein the selection unit selects, as distribution destinations of the information content, a first set, which is the set of second entities associated with any of the plurality of first entities by the association unit, and a second set, which is a set of second entities among the plurality of second entities whose features are similar to those of the first set.

The information processing system according to any one of claims 1 to 11, wherein
the first entity and the second entity are people,
the second data set describes behavior-related features of each of the plurality of second entities, and
the information processing system further comprises an estimation unit that calculates, for each of at least some of the plurality of first entities, an estimated value regarding the behavior of the entity based on the behavior-related feature, identified from the second data set, of at least one of the plurality of second entities associated with the entity by the association unit.

A computer program for causing a computer to function as the first acquisition unit, the second acquisition unit, the dimensionality reduction unit, and the association unit in the information processing system according to any one of claims 1 to 9.

A computer program for causing a computer to function as the first acquisition unit, the second acquisition unit, the dimensionality reduction unit, the association unit, the distribution unit, and the selection unit in the information processing system according to claim 10 or 11.

A computer program for causing a computer to function as the first acquisition unit, the second acquisition unit, the dimensionality reduction unit, the association unit, and the estimation unit in the information processing system according to claim 12.

A computer-implemented information processing method comprising:
obtaining a first data set relating to a plurality of first entities, the first data set describing characteristics of each of the plurality of first entities;
obtaining a second data set relating to a plurality of second entities, the second data set describing characteristics of each of the plurality of second entities;
executing dimensionality reduction processing on a group of first feature vectors identified from the first data set, each of the first feature vectors representing features of a corresponding one of the plurality of first entities, and on a group of second feature vectors identified from the second data set, each of the second feature vectors representing features of a corresponding one of the plurality of second entities, thereby generating a group of first low-dimensional feature vectors corresponding to the group of first feature vectors, and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors and having the same number of dimensions as the group of first low-dimensional feature vectors; and
associating each of the plurality of first entities with at least one of the plurality of second entities based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

The information processing method according to claim 16, wherein the associating comprises associating each of the plurality of first entities with one of the plurality of second entities, based on the similarity between the first entities identified from the group of first low-dimensional feature vectors and the similarity between the second entities identified from the group of second low-dimensional feature vectors, such that the correlation between the first entities in terms of similarity matches the correlation between the second entities.