JP7151009B1

JP7151009B1 - Information processing system, computer program, and information processing method

Info

Publication number: JP7151009B1
Application number: JP2022049442A
Authority: JP
Inventors: 雄介熊谷; 龍道本; 慎平三浦
Original assignee: Hakuhodo DY Holdings Inc
Current assignee: Hakuhodo DY Holdings Inc
Priority date: 2022-03-25
Filing date: 2022-03-25
Publication date: 2022-10-11
Anticipated expiration: 2042-03-25
Also published as: WO2023182163A1; JP2023142494A

Abstract

[Problem] To provide a novel technique capable of appropriately associating feature data between clusters. [Solution] For a plurality of clusters in a first group of entities, a first data set is acquired that comprises, for each cluster, first cluster feature data representing the features of the corresponding cluster. For a plurality of clusters in a second group of entities, a second data set is acquired that comprises, for each cluster, second cluster feature data representing the features of the corresponding cluster. Based on information that can identify the first cluster and the second cluster to which each common entity belongs, each of the plurality of first cluster feature data in the first data set is associated with one or more of the plurality of second cluster feature data in the second data set (S210 to S250). [Selected drawing] FIG. 6

Description

The present disclosure relates to an information processing system and an information processing method.

Data fusion techniques for associating multiple data sets about consumers collected by different means are already known. For example, the applicant has already disclosed a technique that associates feature data of a plurality of clusters defined by clustering a first group of consumers with feature data of a plurality of clusters defined by clustering a second group of consumers, based on the similarity of the clusters (see, for example, Patent Document 1).

Patent Document 1: JP 2016-126609 A

Replacing the feature data of individual consumers with cluster feature data is useful for exchanging consumer data while protecting personal information.

However, evaluating cluster similarity solely on the basis of the content of the feature data may not yield a highly accurate association. In other words, the conventional technology leaves room for improvement in association accuracy.

Accordingly, one aspect of the present disclosure desirably provides a novel technique capable of appropriately associating feature data between clusters, with respect to a plurality of first clusters defined by clustering a first group of entities and a plurality of second clusters defined by clustering a second group of entities.

According to one aspect of the present disclosure, an information processing system is provided. The information processing system includes a first acquisition unit, a second acquisition unit, and an association unit.

The first acquisition unit is configured to acquire a first data set that comprises, for each of a plurality of first clusters defined by clustering a first group of entities, first cluster feature data representing the features of the corresponding first cluster.

The second acquisition unit is configured to acquire a second data set that comprises, for each of a plurality of second clusters defined by clustering a second group of entities, second cluster feature data representing the features of the corresponding second cluster.

The association unit is configured to associate each of the plurality of first cluster feature data in the first data set with one or more of the plurality of second cluster feature data in the second data set. It does so based on information that can identify, for one or more common entities (entities common to the first group and the second group), the first cluster and the second cluster to which each common entity belongs.

An information processing system configured in this way can use information on the entities common to the first group of entities, which corresponds to the first data set, and the second group of entities, which corresponds to the second data set, to associate cluster feature data accurately.

According to one aspect of the present disclosure, for one or more specific clusters, that is, first clusters each containing at least one common entity, the association unit may be configured to associate the first cluster feature data of each specific cluster with the second cluster feature data of the second cluster containing the same entity.

An information processing system configured in this way can associate the first cluster feature data with the second cluster feature data consistently with the relationship between the first cluster and the second cluster to which a common entity belongs.
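The association of specific clusters through common entities can be illustrated with a minimal sketch. The function name `specific_pairs`, the dictionary layout, and the sample IDs are assumptions made for illustration only; the disclosure states only that the information must identify the first and second cluster of each common entity.

```python
from collections import defaultdict


def specific_pairs(first_membership, second_membership):
    """Link first clusters to second clusters that share a common entity.

    Both arguments are hypothetical dicts {entity_id: cluster_id} that
    cover only the common entities.
    """
    pairs = defaultdict(set)
    for entity_id, first_cluster in first_membership.items():
        second_cluster = second_membership.get(entity_id)
        if second_cluster is not None:
            pairs[first_cluster].add(second_cluster)
    return dict(pairs)


# Illustrative data: three common entities u1..u3.
first_membership = {"u1": "A1", "u2": "A2", "u3": "A2"}
second_membership = {"u1": "B3", "u2": "B1", "u3": "B2"}
links = specific_pairs(first_membership, second_membership)
```

In this sketch a first cluster that holds several common entities ends up linked to several second clusters, which matches the "one or more" wording of the association.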

According to one aspect of the present disclosure, the association unit may handle the remaining clusters as follows. The first clusters other than the specific clusters are one or more non-common first clusters, and the second clusters other than those associated with the specific clusters are one or more non-common second clusters. Based on the first cluster feature data of each non-common first cluster and the second cluster feature data of each non-common second cluster, the association unit may be configured to determine the relationship between each non-common first cluster and each non-common second cluster, and to associate the first cluster feature data of each non-common first cluster with the second cluster feature data of at least one non-common second cluster.

An information processing system configured in this way can associate the first cluster feature data with the second cluster feature data consistently with the presence or absence of common entities.

According to one aspect of the present disclosure, the first cluster feature data may be data defining the point of the corresponding first cluster in a first feature space. The second cluster feature data may be data defining the point of the corresponding second cluster in a second feature space.

According to one aspect of the present disclosure, the association unit may execute a process of searching, using an evaluation function, for a mapping that maps the point of each of the plurality of first clusters in the first feature space into the second feature space.

According to one aspect of the present disclosure, the evaluation function may be designed so that, for a specific pair, that is, a pair of a first cluster and a second cluster containing the same entity, its output changes further in a specific direction as the distance in the second feature space between the first cluster and the second cluster of the specific pair becomes shorter.

According to one aspect of the present disclosure, the evaluation function may be designed so that, for a pair of a first cluster and a second cluster other than the specific pairs, its output changes further in the direction opposite to the specific direction as the distance in the second feature space between the first cluster and the second cluster becomes shorter. In this case, the association unit may be configured to search for the mapping that drives the output of the evaluation function to its limit value on the specific-direction side.

According to one aspect of the present disclosure, the evaluation function may be designed to output a smaller value for a specific pair as the distance in the second feature space between the first cluster and the second cluster of that pair becomes shorter, and to output a larger value for a pair of a first cluster and a second cluster other than the specific pairs as the distance in the second feature space between them becomes shorter. In this case, the association unit may be configured to search for the mapping that minimizes the output of the evaluation function.
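One function with the two properties just described is a contrastive-style loss, sketched below. The linear mapping `W`, the squared-distance term, the hinge term with `margin`, and all names are assumptions chosen for illustration; the disclosure does not fix the form of the mapping or of the evaluation function.

```python
import numpy as np


def evaluation(W, X, Y, specific, margin=1.0):
    """Evaluation value for a hypothetical linear map W (lower is better).

    X: (n1, d1) first-cluster points, Y: (n2, d2) second-cluster points,
    W: (d2, d1) matrix, specific: set of (i, j) index pairs whose clusters
    share a common entity.
    """
    mapped = X @ W.T  # first-cluster points carried into the second space
    dist = np.linalg.norm(mapped[:, None, :] - Y[None, :, :], axis=2)
    total = 0.0
    for i in range(dist.shape[0]):
        for j in range(dist.shape[1]):
            if (i, j) in specific:
                # Specific pair: shorter distance gives a smaller value.
                total += dist[i, j] ** 2
            else:
                # Other pair: shorter distance gives a larger value.
                total += max(0.0, margin - dist[i, j]) ** 2
    return float(total)


X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([[2.0, 0.0], [0.0, 2.0]])
specific = {(0, 0), (1, 1)}
good = evaluation(2.0 * np.eye(2), X, Y, specific)  # maps each X[i] onto Y[i]
bad = evaluation(np.eye(2), X, Y, specific)         # leaves X unchanged
```

A search procedure (for example gradient descent over `W`) would then look for the mapping with the smallest value; that search is outside the scope of this sketch.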

According to one aspect of the present disclosure, the association unit may be configured to execute a process of associating each of the plurality of first cluster feature data with one or more of the plurality of second cluster feature data, based on the positional relationship between the point of each first cluster as mapped into the second feature space and the point of each second cluster in the second feature space.

An information processing system configured in this way focuses on the common entities to obtain an appropriate mapping from the first feature space to the second feature space and then associates the first cluster feature data with the second cluster feature data, which enables highly accurate association.

According to one aspect of the present disclosure, the association unit may be configured to execute a process of associating the first cluster feature data of each specific cluster with the second cluster feature data of the second cluster containing the same entity, while executing a process of associating the first cluster feature data of each non-common first cluster with the second cluster feature data of at least one non-common second cluster, based on the positional relationship between the point of each non-common first cluster as mapped into the second feature space and the point of each non-common second cluster in the second feature space.
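The two-stage association can be sketched as follows. Using the nearest point as the "positional relationship" is an assumption made here for illustration, as are the function name `associate` and the dictionary layout; the disclosure leaves the exact criterion open.

```python
import numpy as np


def associate(mapped_first, second_points, specific):
    """Two-stage association (illustrative).

    mapped_first: {first_id: point already mapped into the second space}
    second_points: {second_id: point in the second space}
    specific: {first_id: set of second_ids sharing a common entity}
    """
    used_second = {s for ids in specific.values() for s in ids}
    free_second = [s for s in second_points if s not in used_second]
    result = {}
    for first_id, point in mapped_first.items():
        if first_id in specific:
            # Specific cluster: follow the common-entity link directly.
            result[first_id] = sorted(specific[first_id])
        elif free_second:
            # Non-common cluster: nearest non-common second cluster.
            p = np.asarray(point, dtype=float)
            nearest = min(
                free_second,
                key=lambda s: np.linalg.norm(
                    p - np.asarray(second_points[s], dtype=float)
                ),
            )
            result[first_id] = [nearest]
        else:
            result[first_id] = []
    return result


mapped_first = {"A1": (0.0, 0.0), "A2": (5.0, 5.0)}
second_points = {"B1": (0.1, 0.0), "B2": (5.0, 4.0), "B3": (9.0, 9.0)}
table = associate(mapped_first, second_points, {"A1": {"B1"}})
```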

According to one aspect of the present disclosure, the one or more common entities may be a plurality of common entities. The plurality of first clusters may be a set of clusters defined by clustering the first group such that the plurality of common entities are distributed among the plurality of first clusters. The plurality of second clusters may be a set of clusters defined by clustering the second group such that the plurality of common entities are distributed among the plurality of second clusters.

If the first group and the second group are clustered so that the common entities are spread over a plurality of clusters and no single cluster contains many common entities, the correspondence between the first clusters and the second clusters can be determined accurately through the common entities, and the first cluster feature data can be accurately associated with the second cluster feature data.

According to one aspect of the present disclosure, the plurality of first clusters may be a set of clusters defined by clustering the first group such that each of the plurality of common entities belongs to a different first cluster. The plurality of second clusters may be a set of clusters defined by clustering the second group such that each of the plurality of common entities belongs to a different second cluster.

According to one aspect of the present disclosure, the first cluster feature data may be feature data generated by statistically integrating the feature data of a plurality of entities belonging to the corresponding first cluster. The second cluster feature data may be feature data generated by statistically integrating the feature data of a plurality of entities belonging to the corresponding second cluster.

To hide information about individual entities, a data holder may provide cluster feature data in which the features of the entities have been statistically aggregated. According to one aspect of the present disclosure, such statistically processed cluster feature data can be appropriately associated with each other.

According to one aspect of the present disclosure, the entities may be consumers. The feature data of an entity may be feature data describing the characteristics of a consumer.

According to one aspect of the present disclosure, a computer program for at least partially realizing the functions of the information processing system described above may be provided. According to one aspect of the present disclosure, a computer program may be provided for causing a computer to function as the first acquisition unit, the second acquisition unit, and the association unit of the information processing system described above. The computer program may be recorded on a computer-readable non-transitory recording medium.

According to one aspect of the present disclosure, an information processing method corresponding to the information processing system described above may be provided. According to one aspect of the present disclosure, an information processing method executed by a computer may be provided.

According to one aspect of the present disclosure, the information processing method may include acquiring a first data set that comprises, for each of a plurality of first clusters defined by clustering a first group of entities, first cluster feature data representing the features of the corresponding first cluster.

The information processing method may include acquiring a second data set that comprises, for each of a plurality of second clusters defined by clustering a second group of entities, second cluster feature data representing the features of the corresponding second cluster.

The information processing method may include associating each of the plurality of first cluster feature data in the first data set with one or more of the plurality of second cluster feature data in the second data set, based on information that can identify, for one or more common entities (entities common to the first group and the second group), the first cluster and the second cluster to which each common entity belongs.

Like the information processing system described above, this information processing method can use information on the entities common to the first group of entities, which corresponds to the first data set, and the second group of entities, which corresponds to the second data set, to associate cluster feature data accurately.

FIG. 1 is a block diagram showing the configuration of the information processing system.
FIG. 2A is a diagram explaining the operation of a data providing system, and FIG. 2B is a diagram showing the structure of exemplary cluster feature data.
FIG. 3 is an explanatory diagram of the clustering of a consumer group that includes common consumers (white circles) and non-common consumers (black circles).
FIG. 4 is a flowchart showing the acquisition-related processing executed by the processor of the information processing system.
FIG. 5A is a diagram showing a first cluster-consumer relationship table, FIG. 5B is a diagram showing a second cluster-consumer relationship table, and FIG. 5C is a diagram showing a cluster relationship table.
FIG. 6 is a flowchart showing the analysis processing executed by the processor of the information processing system.
FIG. 7A is a flowchart showing a first example of part of the analysis processing, and FIG. 7B is a flowchart showing a second example of part of the analysis processing.
FIG. 8A is a flowchart showing a first example of the clustering processing, and FIG. 8B is a flowchart showing a second example of the clustering processing.
FIG. 9 is a flowchart showing the analysis processing of a modification.

Exemplary embodiments of the present disclosure are described below with reference to the drawings.
The information processing system 1 of this embodiment is configured by installing a dedicated computer program Pr on a general-purpose computer. As shown in FIG. 1, the information processing system 1 includes a processor 11, a memory 13, a storage 15, a user interface 17, and a communication interface 19.

The processor 11 executes processing according to the computer program Pr stored in the storage 15. The memory 13 is a primary storage device including a RAM and is used as a work area when the processor 11 executes processing.

The storage 15 is a secondary storage device including, for example, a hard disk drive or a solid state drive. In addition to the computer program Pr, it stores various data used when processing according to the computer program Pr is executed.

The user interface 17 includes an input device for inputting operation signals from a user operating the information processing system 1 to the processor 11, and a display for presenting various information to the user. Examples of the input device include a keyboard and a pointing device.

The communication interface 19 is connected to a wide area network. The information processing system 1 is configured to be able to communicate with a first data providing system 31 and a second data providing system 32 through the communication interface 19.

By executing processing according to the computer program Pr, the processor 11 acquires a first data set 15A from the first data providing system 31 and a second data set 15B from the second data providing system 32 through the communication interface 19.

The processor 11 combines the second data set 15B with the acquired first data set 15A to generate an extended data set 15C, which is a data-enriched version of the first data set 15A.

The first data set 15A is a data set that comprises, for each of a plurality of first clusters defined by clustering a first group of entities, cluster feature data representing the features of the corresponding first cluster.

The second data set 15B is a data set that comprises, for each of a plurality of second clusters defined by clustering a second group of entities, cluster feature data representing the features of the corresponding second cluster.

In this embodiment, the entities are consumers. That is, the first data set 15A comprises cluster feature data for each of a plurality of first clusters defined by clustering a first group of consumers.

As shown in FIG. 2A, the first data providing system 31 stores consumer feature data for a plurality of consumers in the first group. Each piece of consumer feature data represents the features of the corresponding consumer. The first data providing system 31 executes clustering processing to cluster the first group into a plurality of first clusters.

The first data providing system 31 further performs, for each first cluster, statistical processing on the group of consumer feature data belonging to that first cluster, thereby generating cluster feature data for each first cluster.

Each piece of cluster feature data is generated by statistically integrating the consumer feature data of the plurality of consumers belonging to the corresponding cluster. The statistical processing is performed, for example, to protect personal information. The first data providing system 31 provides the information processing system 1 with the first data set 15A, which comprises the cluster feature data generated in this way for each first cluster.

The first data providing system 31 includes a processor 311 and a memory 312. The processor 311 executes the clustering processing and the statistical processing according to a computer program stored in the memory 312.

The second data providing system 32 likewise includes a processor 321 and a memory 322. As in the first data providing system 31, the processor 321 executes clustering processing and statistical processing on the group of consumer feature data held by the second data providing system 32, according to a computer program stored in the memory 322.

The second data providing system 32 thereby generates the second data set 15B, which comprises, for each of a plurality of second clusters defined by clustering a second group of consumers, cluster feature data representing the features of the corresponding second cluster, and provides it to the information processing system 1.

Consumer feature data expresses the features of the corresponding consumer as a vector of values of a plurality of elements. Cluster feature data likewise expresses the features of the corresponding cluster as a vector of values of a plurality of elements.

The value of each element in the cluster feature data is a statistical representative value of the corresponding element over the set of consumers belonging to the corresponding cluster. The representative value may include one or more of the mean, the median, a combination of the mean and the variance, and a combination of the maximum and the minimum of the corresponding element over the consumer set.

The cluster feature data shown in FIG. 2B describes, in association with a cluster ID that is the identification code of a cluster, the value of each element representing the features of the corresponding cluster and the cluster size. The cluster size corresponds to the number of consumers constituting the corresponding cluster.
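A record of this kind could be produced as in the following minimal sketch. It assumes the mean is used as the representative value and uses hypothetical field names (`cluster_id`, `elements`, `cluster_size`); the actual layout of FIG. 2B is not reproduced here.

```python
from statistics import mean


def cluster_feature_data(consumer_features, labels):
    """Aggregate per-consumer vectors into per-cluster records.

    consumer_features: list of equal-length numeric vectors.
    labels: cluster ID of each consumer, in the same order.
    """
    groups = {}
    for vector, cluster_id in zip(consumer_features, labels):
        groups.setdefault(cluster_id, []).append(vector)
    rows = []
    for cluster_id, vectors in sorted(groups.items()):
        rows.append({
            "cluster_id": cluster_id,
            # Element-wise mean over the consumers in this cluster.
            "elements": [mean(column) for column in zip(*vectors)],
            "cluster_size": len(vectors),
        })
    return rows


rows = cluster_feature_data(
    [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]],
    ["C1", "C1", "C2"],
)
```

Only the aggregated rows leave the data holder in this sketch, so individual consumer vectors are not exposed.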

Further, in this embodiment, the information processing system 1 communicates with the first data providing system 31 and the second data providing system 32, taking into account the possibility that some consumers are common to the first group of consumers, whose consumer feature data is held by the first data providing system 31, and the second group of consumers, whose consumer feature data is held by the second data providing system 32.

Through this communication, the information processing system 1 identifies the consumers present in both the first group and the second group as common consumers, and transmits a list of these common consumers to the first data providing system 31 and the second data providing system 32.

Based on the list of common consumers, the first data providing system 31 clusters the first group such that the common consumers are distributed among the plurality of first clusters. When the number of common consumers is equal to or less than the number of clusters, the first group is clustered such that each common consumer belongs to a different cluster.

FIG. 3 illustrates clustering a set of consumers, represented by white and black circles, so that each cluster contains as few common consumers as possible, ideally exactly one. Each region enclosed by a dashed circle corresponds to one cluster. Each white circle represents one common consumer, and each black circle represents one consumer who is not a common consumer, that is, a non-common consumer.

The second data providing system 32 similarly clusters the second group such that the common consumers are distributed among the plurality of second clusters. When the number of common consumers is equal to or less than the number of clusters, the second group is clustered such that each common consumer belongs to a different second cluster.
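One simple way to obtain such a clustering, shown here only as an assumption-laden sketch, is to treat each common consumer as the seed of its own cluster and assign every other consumer to the nearest seed. This is not presented as the clustering procedure of the embodiment; the function name `seeded_clusters` and the choice of one cluster per common consumer are illustrative.

```python
import numpy as np


def seeded_clusters(features, common_idx):
    """Assign each consumer to the cluster seeded by the nearest common consumer.

    features: (n, d) array-like of consumer feature vectors.
    common_idx: list of row indices of the common consumers.
    Returns an integer label per consumer; label k is the cluster seeded
    by common_idx[k].
    """
    features = np.asarray(features, dtype=float)
    seeds = features[common_idx]  # one seed per common consumer
    dist = np.linalg.norm(features[:, None, :] - seeds[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    # Pin every common consumer to its own cluster, even if two seeds coincide.
    labels[common_idx] = np.arange(len(common_idx))
    return labels


features = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.1, 5.0], [4.9, 5.2]]
labels = seeded_clusters(features, common_idx=[0, 2])
```

With this construction each cluster contains exactly one common consumer, which corresponds to the ideal case described for FIG. 3.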

The information processing system 1 acquires, from the first data providing system 31, the first data set 15A containing the cluster feature data generated for each first cluster by such clustering, and acquires, from the second data providing system 32, the second data set 15B containing the cluster feature data for each second cluster.

FIG. 4 shows the details of the acquisition-related processing that the processor 11 of the information processing system 1 executes to acquire the first data set 15A and the second data set 15B. The processor 11 starts the acquisition-related processing based on an instruction from the user through the user interface 17.

Upon starting the acquisition-related processing, the processor 11 acquires, from the first data providing system 31, a list of the consumers belonging to the first group as a first consumer list (S110). For each consumer belonging to the first group, the first consumer list holds a consumer ID, which is the identification code of that consumer.

The processor 11 further acquires, from the second data providing system 32, a list of the consumers belonging to the second group as a second consumer list (S120). For each consumer belonging to the second group, the second consumer list holds a consumer ID.

To allow the common consumers to be identified, the first consumer list and the second consumer list may describe consumer IDs of the same kind. The consumer ID may be, for example, an advertising ID. Alternatively, the consumer ID may be an ID of a kind that has been issued to a large number of consumers.

The first consumer ID, which is the consumer ID used in the first consumer list, may be an identification code of a different kind from the second consumer ID, which is the consumer ID used in the second consumer list. In this case, a table indicating the correspondence between first consumer IDs and second consumer IDs may be required in order to identify the same person.

In subsequent S130, the processor 11 compares the consumer IDs listed in the first consumer list with the consumer IDs listed in the second consumer list to identify one or more common consumers, that is, consumers who belong to both the first group and the second group, and generates a list of the common consumers.
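The comparison in S130 can be sketched as a set intersection. This is a minimal illustration, not the implementation described in the specification: the function name find_common_consumers, the argument id_table (standing in for the correspondence table between first and second consumer IDs) and the sample IDs are all hypothetical.

```python
def find_common_consumers(first_ids, second_ids, id_table=None):
    """Return the sorted first-list IDs that also appear in the second list.

    id_table (hypothetical) maps a first consumer ID to the matching second
    consumer ID when the two lists use different kinds of identifiers.
    """
    second = set(second_ids)
    common = set()
    for cid in first_ids:
        # Translate the ID only when a correspondence table is supplied.
        other = id_table.get(cid) if id_table is not None else cid
        if other in second:
            common.add(cid)
    return sorted(common)


# Same kind of ID in both lists.
first_list = ["a01", "a02", "a03", "a04"]
second_list = ["a03", "a05", "a01"]
common = find_common_consumers(first_list, second_list)

# Different kinds of ID, linked through a correspondence table.
common_via_table = find_common_consumers(
    ["x1", "x2"], ["y2"], id_table={"x1": "y1", "x2": "y2"}
)
```

The result is expressed in first consumer IDs; a real system would also need to decide which ID kind the list sent in S140 should use.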

In subsequent S140, the processor 11 transmits the list of common consumers created in S130 to the first and second data providing systems 31 and 32, and causes them to generate, based on this list, the first data set 15A and the second data set 15B described above.

In subsequent S150, the processor 11 acquires the first data set 15A from the first data providing system 31 and also acquires a first cluster-consumer relationship table. As shown in FIG. 5A, for example, the first cluster-consumer relationship table describes, for each consumer ID of a common consumer, a first cluster ID, which is the identification code of the first cluster to which the corresponding consumer belongs. The first data providing system 31 is configured to transmit the first cluster-consumer relationship table to the information processing system 1 together with the first data set 15A.

The processor 11 acquires the second data set 15B from the second data providing system 32 and also acquires a second cluster-consumer relationship table (S160). As shown in FIG. 5B, for example, the second cluster-consumer relationship table describes, for each consumer ID of a common consumer, a second cluster ID, which is the identification code of the second cluster to which the corresponding consumer belongs. The second data providing system 32 is configured to transmit the second cluster-consumer relationship table to the information processing system 1 together with the second data set 15B.

In subsequent S170, the processor 11 generates, based on the acquired first and second cluster-consumer relationship tables, the cluster relationship table shown in FIG. 5C, which indicates the combinations of a first cluster and a second cluster that contain the same consumer. The cluster relationship table corresponds to information from which the first cluster and the second cluster to which each common consumer belongs can be identified. The cluster relationship table illustrated in FIG. 5C indicates each combination of a first cluster and a second cluster that contain the same common consumer as a combination of the cluster ID of the first cluster and the cluster ID of the second cluster.
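The generation of the cluster relationship table in S170 amounts to joining the two cluster-consumer relationship tables on the consumer ID. The following is a minimal sketch under the assumption that each table is available as a mapping from consumer ID to cluster ID; the function name and the sample IDs are hypothetical.

```python
def build_cluster_relation_table(first_table, second_table):
    """Join the two cluster-consumer tables on consumer ID.

    Each argument maps consumer ID -> cluster ID for the common consumers.
    Returns the sorted, de-duplicated (first cluster ID, second cluster ID)
    pairs that share at least one common consumer.
    """
    pairs = set()
    for consumer_id, first_cluster in first_table.items():
        if consumer_id in second_table:
            pairs.add((first_cluster, second_table[consumer_id]))
    return sorted(pairs)


# Hypothetical contents in the spirit of FIG. 5A and FIG. 5B.
first_table = {"u1": "F1", "u2": "F2", "u3": "F2"}
second_table = {"u1": "S3", "u2": "S1", "u3": "S1"}
relation = build_cluster_relation_table(first_table, second_table)
```

Consumers u2 and u3 fall in the same pair of clusters, so the pair (F2, S1) appears only once in the result.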

In subsequent S180, the processor 11 stores the cluster relationship table generated in S170 in the storage 15 together with the first data set 15A and the second data set 15B, and ends the processing shown in FIG. 4.

Next, the details of the analysis processing executed by the processor 11 will be described with reference to FIG. 6. The analysis processing is executed, for example, based on an instruction from the user through the user interface 17.

Upon starting the analysis processing, the processor 11 reads the first data set 15A specified by the user from the storage 15 and generates an array D_f based on the cluster feature data for each first cluster contained in the first data set 15A that has been read (S210).

For example, when there are K1 clusters and each of the K1 pieces of cluster feature data represents the features of a cluster as an L1-dimensional feature vector having L1 elements, the array D_f is a two-dimensional array with K1 rows and L1 columns.

Each row vector D_f[i] = (x_i[1], x_i[2], x_i[3], ..., x_i[L1]) of the array D_f is the L1-dimensional feature vector of the corresponding i-th first cluster. Here, i corresponds to the row number of the array D_f and takes integer values from 1 to K1, where K1 is the number of clusters.

In the following, the L1-dimensional space corresponding to the feature vectors D_f[i] represented by the cluster feature data contained in the first data set 15A is referred to as the first feature space. The feature vector D_f[i] defines the point at which the corresponding first cluster is located in the first feature space. The first feature space is an L1-dimensional space having one dimension for each of the L1 elements of the feature vector D_f[i].

In subsequent S220, the processor 11 reads the second data set 15B specified by the user from the storage 15 and generates an array D_s based on the cluster feature data for each second cluster contained in the second data set 15B that has been read.

For example, when there are K2 clusters and each of the K2 pieces of cluster feature data represents the features of a cluster as an L2-dimensional feature vector having L2 elements, the array D_s is a two-dimensional array with K2 rows and L2 columns.

Each row vector D_s[j] = (y_j[1], y_j[2], y_j[3], ..., y_j[L2]) of the array D_s is the L2-dimensional feature vector of the corresponding j-th second cluster. Here, j corresponds to the row number of the array D_s and takes integer values from 1 to K2, where K2 is the number of clusters.

In the following, the L2-dimensional space corresponding to the feature vectors D_s[j] represented by the cluster feature data contained in the second data set 15B is referred to as the second feature space. The feature vector D_s[j] defines the point at which the corresponding second cluster is located in the second feature space. The second feature space is an L2-dimensional space having one dimension for each of the L2 elements of the feature vector D_s[j].

In subsequent S230, the processor 11 searches for a function g that minimizes the evaluation function E shown below. In doing so, it searches for the mapping that best maps the points corresponding to the common consumers in the first feature space, identified from the first data set 15A, to positions near the points of the same persons in the second feature space, identified from the second data set 15B.

In the above formula, m denotes the m-th consumer in the consumer group corresponding to the K1 first clusters whose cluster feature data is described in the first data set 15A.

The consumer group here may be the consumer group corresponding to the K1 first clusters as interpreted when each first cluster is regarded as containing a number of consumers equal to its cluster size. Alternatively, the consumer group may be a group of K1 consumers corresponding to the K1 first clusters, as interpreted when each cluster is regarded as containing a single representative consumer. A_f^-1[m] means that, among the K1 clusters, the first cluster to which the m-th consumer belongs is the A_f^-1[m]-th cluster.

In the above formula, n denotes the n-th consumer in the consumer group corresponding to the K2 second clusters whose cluster feature data is described in the second data set 15B. The consumer group here may be the consumer group corresponding to the K2 second clusters as interpreted when each second cluster is regarded as containing a number of consumers equal to its cluster size. Alternatively, the consumer group may be a group of K2 consumers corresponding to the K2 second clusters, as interpreted when each cluster is regarded as containing a single representative consumer. A_s^-1[n] means that, among the K2 clusters, the second cluster to which the n-th consumer belongs is the A_s^-1[n]-th cluster.

d(v1, v2) is a distance function and represents the distance (for example, the Mahalanobis distance) between the vector v1 and the vector v2. τ is a design parameter and is set by the user, in other words, by the analyst.

Each of the first and second terms of the evaluation function E represents a sum Σd(g(D_f[A_f^-1[m]]), D_s[A_s^-1[n]]) of distances d(g(D_f[A_f^-1[m]]), D_s[A_s^-1[n]]). Each distance is taken between two points in the second feature space. The first is the point g(D_f[A_f^-1[m]]), obtained by using the function g as a mapping to map the point D_f[A_f^-1[m]] in the first feature space of the A_f^-1[m]-th first cluster, to which the m-th consumer in the first group belongs. The second is the point D_s[A_s^-1[n]] of the A_s^-1[n]-th second cluster, to which the n-th consumer in the second group belongs.

However, the first term of the evaluation function E is the sum Σd(g(D_f[A_f^-1[m]]), D_s[A_s^-1[n]]) over the set M1 of combinations of m and n for which, according to the cluster relationship table, the same consumer exists in both the A_f^-1[m]-th first cluster and the A_s^-1[n]-th second cluster. That is, it is the sum Σd(g(D_f[A_f^-1[m]]), D_s[A_s^-1[n]]) over the combinations (m, n) that correspond to the same consumer.

The second term of the evaluation function E is the sum Σd(g(D_f[A_f^-1[m]]), D_s[A_s^-1[n]]) over the set M2 of combinations of m and n for which, according to the cluster relationship table, no common consumer exists in both the A_f^-1[m]-th first cluster and the A_s^-1[n]-th second cluster. That is, it is the sum Σd(g(D_f[A_f^-1[m]]), D_s[A_s^-1[n]]) over the combinations (m, n) that correspond to different consumers.

For a specific pair, that is, a pair of a first cluster and a second cluster that contain the same consumer, the evaluation function E outputs a smaller value as the distance in the second feature space between the first cluster and the second cluster constituting the specific pair becomes shorter. For a pair of a first cluster and a second cluster other than a specific pair, the evaluation function E outputs a larger value as the distance in the second feature space between the first cluster and the second cluster becomes shorter.
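The formula for the evaluation function E is not reproduced in this text. One form consistent with the description of its two terms is given here purely as an assumption, not as the formula of the specification. It takes the sum over M1 with a positive sign and the sum over M2 with a negative sign, weighted by τ:

```latex
E(g) = \sum_{(m,n) \in M_1} d\bigl(g(D_f[A_f^{-1}[m]]),\; D_s[A_s^{-1}[n]]\bigr)
     \;-\; \tau \sum_{(m,n) \in M_2} d\bigl(g(D_f[A_f^{-1}[m]]),\; D_s[A_s^{-1}[n]]\bigr),
\qquad \tau \ge 0
```

With this form, E decreases as the pairs in M1 move closer together and increases as the pairs in M2 move closer together, which matches the behaviour described. How τ actually enters the formula (as a weight, as assumed here, or for instance as a margin) cannot be determined from the description alone.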

Therefore, by searching for the function g that minimizes the evaluation function E, it is possible to find the mapping that best maps the points corresponding to the common consumers in the first feature space, identified from the first data set 15A, to positions close to the points of the same persons in the second feature space, identified from the second data set 15B. Under this mapping, the distance between non-identical persons becomes large in the second feature space. The function g may be an orthogonal transformation matrix or may be a multilayer perceptron.
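The search for g can be illustrated with a toy example that scores a few candidate mappings and keeps the one with the smallest value. The sketch rests on several assumptions: E is taken to be the sum over same-consumer pairs minus τ times the sum over the other pairs (the exact formula is not reproduced in this text), d is the Euclidean distance, each cluster is regarded as having one representative consumer so that the pairs can be written as cluster index pairs, and the candidates are two hand-picked orthogonal matrices. All function and variable names are hypothetical.

```python
import math


def euclidean(v1, v2):
    """Distance function d; the Euclidean distance is an assumption."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def evaluate_E(g, D_f, D_s, same_pairs, diff_pairs, tau=1.0, d=euclidean):
    """Assumed form of E: sum over same_pairs minus tau * sum over diff_pairs.

    same_pairs / diff_pairs hold (i, j) index pairs of a first cluster i and
    a second cluster j that do / do not share a common consumer.
    """
    same = sum(d(g(D_f[i]), D_s[j]) for i, j in same_pairs)
    diff = sum(d(g(D_f[i]), D_s[j]) for i, j in diff_pairs)
    return same - tau * diff


def apply_matrix(matrix):
    """Return g(v) = matrix * v for a matrix given as a list of rows."""
    return lambda v: [sum(w * x for w, x in zip(row, v)) for row in matrix]


# Toy data: two first clusters and two second clusters, both in 2-D.
D_f = [[1.0, 0.0], [0.0, 1.0]]
D_s = [[0.0, 1.0], [1.0, 0.0]]
same_pairs = [(0, 0), (1, 1)]   # pairs of clusters sharing a common consumer
diff_pairs = [(0, 1), (1, 0)]   # pairs of clusters sharing none

candidates = {
    "identity": apply_matrix([[1.0, 0.0], [0.0, 1.0]]),
    "swap": apply_matrix([[0.0, 1.0], [1.0, 0.0]]),
}
scores = {
    name: evaluate_E(g, D_f, D_s, same_pairs, diff_pairs)
    for name, g in candidates.items()
}
best = min(scores, key=scores.get)
```

In practice the candidates would not be enumerated by hand; the parameters of g (matrix entries or perceptron weights) would be fitted by a numerical optimizer.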

After searching in S230 for the function g that minimizes the evaluation function E, the processor 11 associates each of the plurality of first clusters with one or more of the second clusters (S240). The association is based on the positional relationship (in other words, the relative positions) between the points g(D_f[A_f^-1[m]]) in the second feature space of the K1 first clusters, as mapped by the function g that was found, and the points D_s[A_s^-1[n]] in the second feature space of the K2 second clusters.

For example, the processor 11 associates each of the plurality of first clusters with one or more of the second clusters by solving an optimal transport problem, regarding each point as holding a number of consumers equal to the cluster size of the corresponding cluster.

For example, the processor 11 can execute the processing shown in FIG. 7A in S240. In this processing, for each pair of a first cluster and a second cluster that contain the same consumer according to the cluster relationship table, the processor 11 sets the distance between the first cluster and the second cluster to a specific value ε (S241). The value ε may be, for example, zero or a value close to zero.

For each pair of a first cluster and a second cluster that do not contain the same consumer, the processor 11 uses, as is, the distance in the second feature space between the first cluster and the second cluster determined from the feature vector g(D_f[A_f^-1[m]]) and the feature vector D_s[A_s^-1[n]]. The processor 11 then solves the optimal transport problem described above, thereby associating each of the plurality of first clusters with one or more of the second clusters (S243).
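The flow of FIG. 7A (S241, S243) can be sketched as building a cost matrix in which same-consumer pairs receive the value ε and then computing a transport plan whose marginals are the cluster sizes. The sketch makes several assumptions: the Euclidean distance is used, the cluster sizes are normalized so that both sides sum to one, and an entropy-regularized Sinkhorn iteration is swapped in as an approximate solver in place of an exact optimal transport solver. The function names, the parameters reg and iters, and all sample values are hypothetical.

```python
import math


def build_cost_matrix(mapped_first, second, shared_pairs, eps=0.0):
    """Distance for every (first, second) cluster pair in the second feature space.

    Pairs listed in shared_pairs (clusters sharing a common consumer) get the
    fixed value eps instead of their geometric distance (S241).
    """
    return [
        [eps if (i, j) in shared_pairs else math.dist(p, q)
         for j, q in enumerate(second)]
        for i, p in enumerate(mapped_first)
    ]


def sinkhorn(cost, a, b, reg=0.1, iters=500):
    """Entropy-regularized approximation of the transport plan (not exact)."""
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    rows, cols = range(len(a)), range(len(b))
    u = [1.0] * len(a)
    for _ in range(iters):
        # Alternately rescale columns and rows to match the marginals.
        v = [b[j] / sum(K[i][j] * u[i] for i in rows) for j in cols]
        u = [a[i] / sum(K[i][j] * v[j] for j in cols) for i in rows]
    return [[u[i] * K[i][j] * v[j] for j in cols] for i in rows]


# Points g(D_f[i]) of two first clusters and points D_s[j] of three second clusters.
mapped_first = [[0.0, 0.0], [1.0, 1.0]]
second = [[0.1, 0.0], [1.0, 0.9], [0.5, 0.5]]
shared_pairs = {(0, 0)}           # first cluster 0 and second cluster 0 share a consumer

sizes_first = [30, 70]
sizes_second = [25, 50, 25]
a = [s / sum(sizes_first) for s in sizes_first]
b = [s / sum(sizes_second) for s in sizes_second]

cost = build_cost_matrix(mapped_first, second, shared_pairs)
plan = sinkhorn(cost, a, b)       # plan[i][j]: mass moved from first i to second j
```

A first cluster i can then be associated with every second cluster j for which plan[i][j] is not negligible, with plan[i][j] serving as the weight of the association.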

Alternatively, the processor 11 can execute the processing shown in FIG. 7B in S240. In this processing, for each pair of a first cluster and a second cluster that contain the same consumer, the processor 11 uniformly associates the first cluster with the second cluster of that pair, without referring to the cluster feature data (S245).

For the group of pairs of a first cluster and a second cluster that do not contain the same consumer, the processor 11 associates each of the plurality of first clusters with one or more of the second clusters based on, for each pair, the positional relationship between the point g(D_f[A_f^-1[m]]) of the first cluster in the second feature space, determined from the cluster feature data, and the point D_s[A_s^-1[n]] of the second cluster in the second feature space (S247).

For example, the processor 11 associates each of the plurality of first clusters with one or more of the second clusters by solving an optimal transport problem restricted to the group of pairs of a first cluster and a second cluster that do not contain the same consumer (S247).

Upon completing the processing in S240 in this way, the processor 11 combines the first data set 15A with the second data set 15B so that the cluster feature data for each first cluster contained in the first data set 15A is combined with the cluster feature data, contained in the second data set 15B, of the one or more second clusters associated with it by the processing in S240. Through this combination (that is, data fusion), the processor 11 generates an extended data set 15C and stores it in the storage 15 (S250). The processor 11 then ends the analysis processing.
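The combination in S250 can be sketched as follows. The specification does not state how the feature data of several associated second clusters is merged into one record, so this sketch assumes one possible rule: each first-cluster feature vector is extended with the weighted mean of the feature vectors of its associated second clusters, the weights being the association strengths obtained in S240. The function name fuse, the argument plan and the sample values are hypothetical.

```python
def fuse(first_features, second_features, plan):
    """Append to each first-cluster vector the weighted mean of second-cluster vectors.

    plan[i][j] is the (assumed) association weight between first cluster i and
    second cluster j; a weight of zero means "not associated".
    """
    width = len(second_features[0])
    extended = []
    for x, weights in zip(first_features, plan):
        total = sum(weights)
        mean = [
            sum(w * y[k] for w, y in zip(weights, second_features)) / total
            for k in range(width)
        ]
        extended.append(list(x) + mean)
    return extended


first_features = [[1.0, 2.0], [3.0, 4.0]]    # rows of D_f
second_features = [[10.0], [20.0]]           # rows of D_s
plan = [[0.5, 0.0],                          # first cluster 0 -> second cluster 0 only
        [0.25, 0.25]]                        # first cluster 1 -> both, equally
extended = fuse(first_features, second_features, plan)
```

Each row of the result plays the role of one record of the extended data set 15C: the original first-cluster features followed by features carried over from the second data set.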

According to the information processing system 1 of the present embodiment described above, the feature data of clusters can be associated with high accuracy by making use of information on the consumers common to the first group of consumers, which corresponds to the first data set 15A, and the second group of consumers, which corresponds to the second data set 15B. In particular, the first cluster feature data and the second cluster feature data can be associated so that a first cluster and a second cluster that contain the same consumer are associated with each other.

To improve the accuracy of the association, it is preferable that the clustering processing in the first data providing system 31 and the second data providing system 32 be executed so that each cluster contains as few common consumers as possible, ideally exactly one.

For this purpose, the first data providing system 31 and the second data providing system 32 may execute the clustering processing shown in FIG. 8A, which is a modification of the k-means method.

That is, as the clustering processing, the processor 311 of the first data providing system 31 can execute processing (S300) that clusters the first group into a predetermined number K of clusters so as to minimize the following penalized evaluation function H.

The first term of the evaluation function H is the evaluation function of the k-means method, and the second term of the evaluation function H is a term that adds a penalty λ when a plurality of common consumers exist in one cluster.

In the evaluation function H, m is the index of a consumer and k is the index of a cluster. A[k] is the set of consumers constituting the k-th cluster. The indicator function I(m ∈ A[k]) returns the value 1 when the m-th consumer is in the set of consumers constituting the k-th cluster and returns the value 0 otherwise. W[m] is the weight of the m-th consumer. The weight W may be taken to be 1.

C[k] is the center vector of the k-th cluster, and D[m] is the feature vector of the m-th consumer determined from the consumer feature data of the m-th consumer. A^-1[i] indicates that, among the K clusters, the cluster to which the i-th consumer belongs is the A^-1[i]-th cluster.

The indicator function I(A^-1[i] == A^-1[j]) returns the value 1 when the i-th consumer and the j-th consumer, both in the set Q of common consumers, are in the same cluster, and returns the value 0 when they are not in the same cluster.

That is, the second term of the evaluation function H is the sum of the values I(A^-1[i] == A^-1[j]) over the combinations of consumers in the set Q of common consumers with i ≠ j, multiplied by the penalty λ. The penalty λ may be set by the designer as a positive value.
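The formula for H is not reproduced in this text, but its two terms as described can be evaluated with a short function. The sketch below assumes that the k-means term is the weighted sum of squared Euclidean distances between each consumer and the center of its cluster, and that the penalty counts ordered pairs (i, j) with i ≠ j; neither detail is fixed by the description. The function name and the sample values are hypothetical.

```python
def penalized_objective(features, assignment, centers, common_ids, lam, weights=None):
    """Assumed form of H: weighted k-means cost + lam * (co-clustered common pairs).

    features[m]   : feature vector D[m] of consumer m
    assignment[m] : index of the cluster to which consumer m belongs
    centers[k]    : center vector C[k]
    common_ids    : indices of the common consumers (the set Q)
    """
    kmeans_term = 0.0
    for m, x in enumerate(features):
        w = 1.0 if weights is None else weights[m]
        c = centers[assignment[m]]
        kmeans_term += w * sum((p - q) ** 2 for p, q in zip(x, c))
    # Ordered pairs (i, j), i != j, of common consumers placed in the same cluster.
    penalty = sum(
        1
        for i in common_ids
        for j in common_ids
        if i != j and assignment[i] == assignment[j]
    )
    return kmeans_term + lam * penalty


features = [[0.0], [0.2], [1.0], [1.2]]
centers = [[0.1], [1.1]]
common_ids = [0, 1]                      # consumers 0 and 1 are common consumers

together = [0, 0, 1, 1]                  # tight clusters, common consumers co-located
apart = [0, 1, 1, 1]                     # looser clusters, common consumers separated

h_together = penalized_objective(features, together, centers, common_ids, lam=1.0)
h_apart = penalized_objective(features, apart, centers, common_ids, lam=1.0)
```

With λ = 1 the separated assignment has the lower value of H even though its k-means term is larger, which is the intended effect of the penalty; with λ = 0 the ordering reverts to that of plain k-means.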

Like the first data providing system 31, the processor 321 of the second data providing system 32 can also cluster the second group into a plurality of second clusters in accordance with the penalized evaluation function H.

As another example, the processor 311 of the first data providing system 31 may execute the clustering processing shown in FIG. 8B.

In the clustering processing shown in FIG. 8B, the processor 311 sets up a predetermined number K of empty clusters (S410). The processor 311 then randomly selects, as a target consumer, one of the common consumers in the first group of consumers, that is, one of the consumers shared with the second group (S420).

Next, the processor 311 determines whether the K clusters that have been set up include an empty cluster to which no common consumer has been assigned (S430). Upon determining that an empty cluster exists (Yes in S430), the processor 311 assigns the target consumer to one of the empty clusters (S440).

In subsequent S450, the processor 311 determines whether all of the common consumers have been assigned to one of the K clusters. Upon making a negative determination here (No in S450), the processor 311 selects one new common consumer as the target consumer in S420 and executes the processing from S430 onward for the selected target consumer.

In this way, the processor 311 executes the processing from S430 onward for each common consumer and assigns each of the common consumers to one of the K clusters.

When the number of common consumers is smaller than the number of clusters, every common consumer is assigned to one of the K clusters without a negative determination ever being made in S430. On the other hand, when the number of common consumers is larger than the number of clusters, the processor 311 determines in S430 that no empty cluster exists when, after K common consumers have been assigned to the empty clusters, one of the remaining common consumers is newly selected as the target consumer.

Upon determining in S430 that no empty cluster exists, the processor 311 assigns the target consumer to the cluster for which the following evaluation function R = d(C[k], D[i]) is smallest (S445).

In the evaluation function R, C[k] is the center vector of the k-th cluster, and D[i] is the feature vector of the i-th consumer, who is the target consumer, determined from the consumer feature data of the i-th consumer. d() is a distance function.

After assigning the cluster in S445, the processor 311 executes the processing of S450. Upon determining in S450 that all of the common consumers have been assigned to one of the K clusters, the processor 311 assigns each of the remaining non-common consumers in the first group to a nearby cluster, in the same manner as the k-means method, so that the sum of the distances between the feature vectors of the consumers and the center vectors of their clusters is minimized (S460).

By assigning each of the consumers belonging to the first group to one of the K clusters in this way, the processor 311 clusters the first group of consumers into K clusters. The processor 311 then ends the clustering processing shown in FIG. 8B.
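The flow of FIG. 8B can be sketched as follows. This is a simplified illustration under stated assumptions: the Euclidean distance is used for d, the center vector of a cluster is the mean of its current members, and S460 is reduced to a single pass that places each non-common consumer in the nearest non-empty cluster rather than a full k-means iteration. At least one common consumer is assumed to exist. The function name, the seed parameter and the sample data are hypothetical.

```python
import math
import random


def cluster_with_spread(features, common_ids, k, seed=0):
    """Spread common consumers over K clusters, then place the remaining consumers."""
    rng = random.Random(seed)
    members = [[] for _ in range(k)]                  # S410: K empty clusters

    def center(idx):
        pts = [features[m] for m in members[idx]]
        return [sum(col) / len(pts) for col in zip(*pts)]

    def nearest(x):
        # Only non-empty clusters have a center vector C[k].
        filled = [c for c in range(k) if members[c]]
        return min(filled, key=lambda c: math.dist(center(c), x))

    order = list(common_ids)
    rng.shuffle(order)                                # S420: random selection order
    for i in order:
        empty = [c for c in range(k) if not members[c]]
        if empty:                                     # S430: Yes -> S440
            members[empty[0]].append(i)
        else:                                         # S430: No -> S445
            members[nearest(features[i])].append(i)

    common = set(common_ids)
    for m in range(len(features)):                    # S460: non-common consumers
        if m not in common:
            members[nearest(features[m])].append(m)
    return members


features = [[0.0], [0.1], [5.0], [5.1], [10.0], [0.2]]
common_ids = [0, 1, 4]
clusters = cluster_with_spread(features, common_ids, k=3)
```

Because the number of common consumers here equals the number of clusters, each cluster ends up with exactly one common consumer regardless of the random selection order.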

With this clustering processing, when the number of common consumers is smaller than the number of clusters, the common consumers can be distributed across a plurality of clusters so that each cluster contains no more than one common consumer. The information processing system 1 can therefore associate the first clusters with the second clusters with high accuracy based on the information on the common consumers.

The technology of the present disclosure described above is not limited to the embodiment described above and may take various forms.

In the embodiment described above, the first clusters and the second clusters can be associated by searching for the mapping (the function g) even when the cluster feature data contained in the first data set 15A and the cluster feature data contained in the second data set 15B have no element in common that describes the features of a cluster.

On the other hand, the cluster feature data contained in the first data set 15A and the cluster feature data contained in the second data set 15B may have elements in common that describe the features of a cluster. That is, there may be variables common to the feature vector (x_i[1], x_i[2], x_i[3], ..., x_i[L1]) of the cluster feature data contained in the first data set 15A and the feature vector (y_j[1], y_j[2], y_j[3], ..., y_j[L2]) of the cluster feature data contained in the second data set 15B. In that case, the first clusters and the second clusters may be associated by solving an optimal transport problem based only on these common variables.

Specifically, the processor 11 can execute the analysis processing shown in FIG. 9 in place of the analysis processing shown in FIG. 6. Upon starting the analysis processing shown in FIG. 9, in S510 the processor 11 reads the specified first data set 15A from the storage 15 and, based on the cluster feature data for each first cluster contained in the first data set 15A that has been read, generates for each first cluster a feature vector D_f^*[i] = (x_i[1], x_i[2], x_i[3], ..., x_i[L0]) consisting only of the common variables.

Here, it is assumed that the common variables are L0 elements x_i[1], x_i[2], x_i[3], ..., x_i[L0] out of the L1 elements that describe the features of a first cluster in the first data set 15A, and L0 elements y_j[1], y_j[2], y_j[3], ..., y_j[L0] out of the L2 elements that describe the features of a second cluster in the second data set 15B.

In subsequent S520, the processor 11 reads the specified second data set 15B from the storage 15 and, based on the cluster feature data for each second cluster contained in the second data set 15B that has been read, generates for each second cluster a feature vector D_s^*[j] = (y_j[1], y_j[2], y_j[3], ..., y_j[L0]) consisting only of the common variables.

In subsequent S540, the processor 11 associates each of the plurality of first clusters with one or more of the second clusters by solving the optimal transport problem based on the distance in the feature space between the feature vector D_f^*[i] and the feature vector D_s^*[j] for each combination of a first cluster and a second cluster.
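One way the association in S540 could be computed is sketched below. The text does not name a solver, so this sketch swaps in Sinkhorn iterations, which solve an entropy-regularized approximation of the optimal transport problem rather than the exact problem. The Euclidean distance, the uniform cluster masses, the regularization strength `reg`, the cut-off `rel`, and every function name are assumptions made for illustration. For costs much larger than `reg` the exponential can underflow, so a production version would need a log-domain variant.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cost_matrix(d_f_star, d_s_star):
    """Distance between every first-cluster vector and every second-cluster vector."""
    return [[euclidean(u, v) for v in d_s_star] for u in d_f_star]


def sinkhorn_plan(cost, reg=0.05, iterations=500):
    """Approximate transport plan; uniform cluster masses are an assumption."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass assigned to each first cluster
    b = [1.0 / m] * m  # mass assigned to each second cluster
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns toward the target masses.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def associate(plan, rel=0.25):
    """Link each first cluster to the second clusters whose weight is at
    least rel times the largest weight in that first cluster's row."""
    return {
        i: [j for j, w in enumerate(row) if w >= rel * max(row)]
        for i, row in enumerate(plan)
    }


# Hypothetical common-variable vectors D_f^*[i] and D_s^*[j].
d_f_star = [[0.2, 0.8], [0.9, 0.1]]
d_s_star = [[0.3, 0.7], [0.8, 0.2], [0.5, 0.5]]
plan = sinkhorn_plan(cost_matrix(d_f_star, d_s_star))
association = associate(plan)
```

Because the largest entry of each row always passes the cut-off, every first cluster is linked to at least one second cluster, which matches the "one or more" wording of S540.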

In S540, as in the process shown in FIG. 7A, the processor 11 can solve the optimal transport problem after setting the distance between the first cluster and the second cluster to a specific value ε for each pair of a first cluster and a second cluster that contain the same consumer.
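The distance override described above can be sketched as a preprocessing step on the cost matrix. This is a minimal illustration, assuming cluster membership is available as sets of consumer identifiers; the function name, the identifiers, and the value chosen for ε are hypothetical, since this passage does not specify the value.

```python
def apply_shared_consumer_distance(cost, first_members, second_members, epsilon):
    """Return a copy of cost in which every pair of clusters that share at
    least one consumer has its distance replaced by epsilon."""
    adjusted = [list(row) for row in cost]
    for i, members_i in enumerate(first_members):
        for j, members_j in enumerate(second_members):
            if set(members_i) & set(members_j):
                adjusted[i][j] = epsilon
    return adjusted


# Hypothetical example: consumer "c2" belongs to first cluster 0 and to
# second cluster 1, so only that pair is overridden.
cost = [[0.5, 0.9], [0.7, 0.4]]
first_members = [{"c1", "c2"}, {"c3"}]
second_members = [{"c9"}, {"c2", "c8"}]
adjusted = apply_shared_consumer_distance(cost, first_members, second_members, 1e-6)
```

The adjusted matrix can then be handed to whichever optimal transport solver is in use; the original matrix is left unchanged so that both variants remain available for comparison.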

Alternatively, for each pair of a first cluster and a second cluster that contain the same consumer, the processor 11 can uniformly associate the first cluster with the second cluster of that pair without referring to the cluster feature data. Thereafter, for the group of pairs of a first cluster and a second cluster that do not contain the same consumer, the processor 11 can associate each of the plurality of first clusters with one or more of the second clusters by solving the optimal transport problem based on the distance in the feature space between the feature vector D_f^*[i] and the feature vector D_s^*[j].
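The first stage of this alternative, in which pairs sharing a consumer are associated directly, can be sketched as a partition of the clusters. This is an illustration under stated assumptions: membership is given as sets of consumer identifiers, and second clusters that have already been matched are set aside before the optimal transport step. The latter is only one possible reading; another reading would keep all second clusters available in the second stage. The function name and data are hypothetical.

```python
def split_by_shared_consumers(first_members, second_members):
    """Associate clusters that share a consumer and report what is left over.

    Returns (fixed, remaining_first, remaining_second), where fixed maps a
    first-cluster index to the second-cluster indices sharing a consumer.
    """
    fixed = {}
    for i, members_i in enumerate(first_members):
        partners = [
            j for j, members_j in enumerate(second_members)
            if set(members_i) & set(members_j)
        ]
        if partners:
            fixed[i] = partners
    matched_second = {j for partners in fixed.values() for j in partners}
    remaining_first = [i for i in range(len(first_members)) if i not in fixed]
    remaining_second = [
        j for j in range(len(second_members)) if j not in matched_second
    ]
    return fixed, remaining_first, remaining_second


# Hypothetical example: only consumer "c3" appears in both groups.
first_members = [{"c1"}, {"c2"}, {"c3"}]
second_members = [{"c3", "c7"}, {"c8"}, {"c9"}]
fixed, remaining_first, remaining_second = split_by_shared_consumers(
    first_members, second_members
)
```

The optimal transport problem would then be solved only over the rows in `remaining_first` and the columns in `remaining_second` of the distance matrix.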

In subsequent S550, the processor 11 combines the first data set 15A and the second data set 15B so that, in accordance with the association made in S540, the cluster feature data of each first cluster included in the first data set 15A is combined with the cluster feature data of the one or more associated second clusters included in the second data set 15B. This combination generates an extended data set 15C, which is saved in the storage 15. The processor 11 then ends the analysis process.
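The combination step in S550 can be sketched as a join keyed on the association. This is a minimal illustration: the record layout, the field names, and the function name `build_extended_dataset` are assumptions, and the passage does not state how the feature data of several associated second clusters are merged, so the sketch simply keeps them side by side.

```python
def build_extended_dataset(first_features, second_features, association):
    """Attach to each first cluster the feature data of its associated
    second clusters, following the given association."""
    extended = []
    for i, features in enumerate(first_features):
        partners = association.get(i, [])
        extended.append({
            "first_cluster": i,
            "first_features": list(features),
            "second_clusters": list(partners),
            "second_features": [list(second_features[j]) for j in partners],
        })
    return extended


# Hypothetical feature data and association (first cluster -> second clusters).
first_features = [[0.2, 0.8, 5.0], [0.9, 0.1, 3.0]]
second_features = [[0.3, 0.7, 42.0], [0.8, 0.2, 17.0], [0.5, 0.5, 30.0]]
association = {0: [0, 2], 1: [1]}
extended = build_extended_dataset(first_features, second_features, association)
```

Keeping the associated records separate leaves the choice of aggregation, for example a weighted average, to a later step.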

According to the analysis process shown in FIG. 9, data fusion between the first data set 15A and the second data set 15B can be realized by a low-load procedure that focuses on the common variables.

The above describes examples in which the technology of the present disclosure is applied to combining data sets that contain consumer feature data. However, the technology of the present disclosure can also be applied to various things other than consumers, including, for example, information terminals and automobiles. That is, the entities need not be consumers and need not be persons.

A function of one component in the above embodiments may be distributed among a plurality of components. Functions of a plurality of components may be integrated into one component. Part of the configuration of the above embodiments may be omitted. At least part of the configuration of one of the above embodiments may be added to, or substituted for, the configuration of another of the above embodiments. All aspects included in the technical idea specified by the wording of the claims are embodiments of the present disclosure.

Reference Signs List: 1…information processing system, 11…processor, 13…memory, 15…storage, 15A…first data set, 15B…second data set, 15C…extended data set, 17…user interface, 19…communication interface, 31…first data providing system, 32…second data providing system, 311…processor, 312…memory, 321…processor, 322…memory, Pr…computer program.

Claims

An information processing system comprising:
a first acquisition unit configured to acquire a first data set comprising, for each of a plurality of first clusters defined by clustering a first group of entities, first cluster feature data representing features of the corresponding first cluster;
a second acquisition unit configured to acquire a second data set comprising, for each of a plurality of second clusters defined by clustering a second group of entities, second cluster feature data representing features of the corresponding second cluster; and
an associating unit configured to associate each of the plurality of first cluster feature data included in the first data set with one or more of the plurality of second cluster feature data included in the second data set, based on information that identifies, for each of one or more common entities that are common to the first group and the second group, the first cluster and the second cluster to which that common entity belongs,
wherein the associating unit
associates each of the plurality of first cluster feature data with one or more of the plurality of second cluster feature data such that, of first pairs, each being a pair of a first cluster and a second cluster that have an entity in common, and second pairs, each being a pair of a first cluster and a second cluster that have no entity in common, the first cluster feature data and the second cluster feature data corresponding to the first pairs are associated with each other in preference to the second pairs, and,
for at least the second pairs, determines the pair of the first cluster and the second cluster to be associated with each other based on the positional relationship in a feature space between the first cluster and the second cluster, the positional relationship being determined from the first cluster feature data and the second cluster feature data, and associates the first cluster feature data with the second cluster feature data.

An information processing system comprising:
a first acquisition unit configured to acquire a first data set comprising, for each of a plurality of first clusters defined by clustering a first group of entities, first cluster feature data representing features of the corresponding first cluster;
a second acquisition unit configured to acquire a second data set comprising, for each of a plurality of second clusters defined by clustering a second group of entities, second cluster feature data representing features of the corresponding second cluster; and
an associating unit configured to associate each of the plurality of first cluster feature data included in the first data set with one or more of the plurality of second cluster feature data included in the second data set, based on information that identifies, for each of one or more common entities that are common to the first group and the second group, the first cluster and the second cluster to which that common entity belongs,
wherein the associating unit is configured to,
for one or more specific clusters, each being a first cluster among the plurality of first clusters that contains at least one of the common entities, associate the first cluster feature data of each of the specific clusters with the second cluster feature data of the second cluster containing the same entity, and,
for non-specific clusters, being the first clusters other than the specific clusters among the plurality of first clusters, determine the pair of the first cluster and the second cluster to be associated with each other based on the positional relationship in a feature space between the first cluster and the second cluster, the positional relationship being determined from the first cluster feature data and the second cluster feature data, and associate the first cluster feature data with the second cluster feature data.

An information processing system comprising:
a first acquisition unit configured to acquire a first data set comprising, for each of a plurality of first clusters defined by clustering a first group of entities, first cluster feature data representing features of the corresponding first cluster;
a second acquisition unit configured to acquire a second data set comprising, for each of a plurality of second clusters defined by clustering a second group of entities, second cluster feature data representing features of the corresponding second cluster; and
an associating unit configured to associate each of the plurality of first cluster feature data included in the first data set with one or more of the plurality of second cluster feature data included in the second data set, based on information that identifies, for each of one or more common entities that are common to the first group and the second group, the first cluster and the second cluster to which that common entity belongs,
wherein the associating unit is configured to,
for one or more specific clusters, each being a first cluster among the plurality of first clusters that contains at least one of the common entities, associate the first cluster feature data of each of the specific clusters with the second cluster feature data of the second cluster containing the same entity, and,
for one or more non-common first clusters, being the one or more remaining clusters other than the specific clusters among the plurality of first clusters, and one or more non-common second clusters, being the one or more remaining clusters other than the clusters associated with the specific clusters among the plurality of second clusters, determine a relationship between each of the non-common first clusters and each of the non-common second clusters based on the first cluster feature data of each of the one or more non-common first clusters and the second cluster feature data of each of the one or more non-common second clusters, and associate the first cluster feature data of each of the non-common first clusters with the second cluster feature data of at least one of the non-common second clusters.

An information processing system comprising:
a first acquisition unit configured to acquire a first data set comprising, for each of a plurality of first clusters defined by clustering a first group of entities, first cluster feature data representing features of the corresponding first cluster;
a second acquisition unit configured to acquire a second data set comprising, for each of a plurality of second clusters defined by clustering a second group of entities, second cluster feature data representing features of the corresponding second cluster; and
an associating unit configured to associate each of the plurality of first cluster feature data included in the first data set with one or more of the plurality of second cluster feature data included in the second data set, based on information that identifies, for each of one or more common entities that are common to the first group and the second group, the first cluster and the second cluster to which that common entity belongs,
wherein
the first cluster feature data defines a point in a first feature space for the corresponding first cluster,
the second cluster feature data defines a point in a second feature space for the corresponding second cluster,
the associating unit executes:
a process of searching, using an evaluation function, for a mapping that maps the point of each of the plurality of first clusters in the first feature space into the second feature space; and
a process of associating each of the plurality of first cluster feature data with one or more of the plurality of second cluster feature data based on the positional relationship between the point of each of the plurality of first clusters in the second feature space as mapped by the mapping and the point of each of the plurality of second clusters in the second feature space, and
the evaluation function is a function designed to
output, for a specific pair that is a pair of a first cluster and a second cluster containing the same entity, a value that changes in a specific direction as the distance in the second feature space between the first cluster and the second cluster constituting the specific pair becomes shorter, and
output, for a pair of a first cluster and a second cluster other than the specific pair, a value that changes in the direction opposite to the specific direction as the distance in the second feature space between the first cluster and the second cluster becomes shorter.

The information processing system according to claim 3, wherein
the first cluster feature data defines a point in a first feature space for the corresponding first cluster,
the second cluster feature data defines a point in a second feature space for the corresponding second cluster,
the associating unit executes:
a process of searching, using an evaluation function, for a mapping that maps the point of each of the plurality of first clusters, including the specific clusters, in the first feature space into the second feature space;
a process of associating the first cluster feature data of each of the specific clusters with the second cluster feature data of the second cluster containing the same entity; and
a process of associating the first cluster feature data of each of the non-common first clusters with the second cluster feature data of at least one of the non-common second clusters based on the positional relationship between the point of each of the non-common first clusters in the second feature space as mapped by the mapping and the point of each of the non-common second clusters in the second feature space, and
the evaluation function is a function designed to
output, for a specific pair that is a pair of a first cluster and a second cluster containing the same entity, a value that changes in a specific direction as the distance in the second feature space between the first cluster and the second cluster constituting the specific pair becomes shorter, and
output, for a pair of a first cluster and a second cluster other than the specific pair, a value that changes in the direction opposite to the specific direction as the distance in the second feature space between the first cluster and the second cluster becomes shorter.

The information processing system according to claim 4, wherein
the evaluation function is designed to
output, for the specific pair, a smaller value as the distance in the second feature space between the first cluster and the second cluster constituting the specific pair becomes shorter, and
output, for a pair of a first cluster and a second cluster other than the specific pair, a larger value as the distance in the second feature space between the first cluster and the second cluster becomes shorter, and
the associating unit searches for the mapping that minimizes the output of the evaluation function.

The information processing system according to any one of claims 1 to 6, wherein
the one or more common entities are a plurality of common entities,
the plurality of first clusters is a set of clusters defined by clustering the first group such that the plurality of common entities are distributed among the plurality of first clusters, and
the plurality of second clusters is a set of clusters defined by clustering the second group such that the plurality of common entities are distributed among the plurality of second clusters.

the one or more common entities are a plurality of common entities;
The plurality of first clusters is a set of clusters defined by clustering the first group such that each of the plurality of common entities belongs to a different first cluster;
The plurality of second clusters is a set of clusters defined by clustering the second group such that each of the plurality of common entities belongs to a different second cluster. Item 7. The information processing system according to any one of items 6.

The first cluster feature data is feature data generated by statistically integrating feature data of a plurality of entities belonging to the corresponding first cluster,
9. The second cluster feature data is feature data generated by statistically integrating feature data of a plurality of entities belonging to the corresponding second cluster. Information processing system as described.

10. The information processing system according to claim 9, wherein said entity is a consumer, and said feature data of said entity is feature data describing features of said consumer.

A computer program for causing a computer to function as the first acquisition unit, the second acquisition unit, and the associating unit in the information processing system according to any one of claims 1 to 10.

A computer-implemented information processing method comprising:
acquiring a first data set comprising, for each of a plurality of first clusters defined by clustering a first group of entities, first cluster feature data representing features of the corresponding first cluster;
acquiring a second data set comprising, for each of a plurality of second clusters defined by clustering a second group of entities, second cluster feature data representing features of the corresponding second cluster; and
associating each of the plurality of first cluster feature data included in the first data set with one or more of the plurality of second cluster feature data included in the second data set, based on information that identifies, for each of one or more common entities that are common to the first group and the second group, the first cluster and the second cluster to which that common entity belongs,
wherein the associating includes:
associating each of the plurality of first cluster feature data with one or more of the plurality of second cluster feature data such that, of first pairs, each being a pair of a first cluster and a second cluster that have an entity in common, and second pairs, each being a pair of a first cluster and a second cluster that have no entity in common, the first cluster feature data and the second cluster feature data corresponding to the first pairs are associated with each other in preference to the second pairs; and,
for at least the second pairs, determining the pair of the first cluster and the second cluster to be associated with each other based on the positional relationship in a feature space between the first cluster and the second cluster, the positional relationship being determined from the first cluster feature data and the second cluster feature data, and associating the first cluster feature data with the second cluster feature data.

A computer-implemented information processing method comprising:
acquiring a first data set comprising, for each of a plurality of first clusters defined by clustering a first group of entities, first cluster feature data representing features of the corresponding first cluster;
acquiring a second data set comprising, for each of a plurality of second clusters defined by clustering a second group of entities, second cluster feature data representing features of the corresponding second cluster; and
associating each of the plurality of first cluster feature data included in the first data set with one or more of the plurality of second cluster feature data included in the second data set, based on information that identifies, for each of one or more common entities that are common to the first group and the second group, the first cluster and the second cluster to which that common entity belongs,
wherein the associating includes:
for one or more specific clusters, each being a first cluster among the plurality of first clusters that contains at least one of the common entities, associating the first cluster feature data of each of the specific clusters with the second cluster feature data of the second cluster containing the same entity; and,
for non-specific clusters, being the first clusters other than the specific clusters among the plurality of first clusters, determining the pair of the first cluster and the second cluster to be associated with each other based on the positional relationship in a feature space between the first cluster and the second cluster, the positional relationship being determined from the first cluster feature data and the second cluster feature data, and associating the first cluster feature data with the second cluster feature data.

A computer-implemented information processing method comprising:
acquiring a first data set comprising, for each of a plurality of first clusters defined by clustering a first group of entities, first cluster feature data representing features of the corresponding first cluster;
acquiring a second data set comprising, for each of a plurality of second clusters defined by clustering a second group of entities, second cluster feature data representing features of the corresponding second cluster; and
associating each of the plurality of first cluster feature data included in the first data set with one or more of the plurality of second cluster feature data included in the second data set, based on information that identifies, for each of one or more common entities that are common to the first group and the second group, the first cluster and the second cluster to which that common entity belongs,
wherein the associating includes:
for one or more specific clusters, each being a first cluster among the plurality of first clusters that contains at least one of the common entities, associating the first cluster feature data of each of the specific clusters with the second cluster feature data of the second cluster containing the same entity; and,
for one or more non-common first clusters, being the one or more remaining clusters other than the specific clusters among the plurality of first clusters, and one or more non-common second clusters, being the one or more remaining clusters other than the clusters associated with the specific clusters among the plurality of second clusters, determining a relationship between each of the non-common first clusters and each of the non-common second clusters based on the first cluster feature data of each of the one or more non-common first clusters and the second cluster feature data of each of the one or more non-common second clusters, and associating the first cluster feature data of each of the non-common first clusters with the second cluster feature data of at least one of the non-common second clusters.

A computer-implemented information processing method comprising:
acquiring a first data set comprising, for each of a plurality of first clusters defined by clustering a first group of entities, first cluster feature data representing features of the corresponding first cluster;
acquiring a second data set comprising, for each of a plurality of second clusters defined by clustering a second group of entities, second cluster feature data representing features of the corresponding second cluster; and
associating each of the plurality of first cluster feature data included in the first data set with one or more of the plurality of second cluster feature data included in the second data set, based on information that identifies, for each of one or more common entities that are common to the first group and the second group, the first cluster and the second cluster to which that common entity belongs,
wherein
the first cluster feature data defines a point in a first feature space for the corresponding first cluster,
the second cluster feature data defines a point in a second feature space for the corresponding second cluster,
the associating includes:
searching, using an evaluation function, for a mapping that maps the point of each of the plurality of first clusters in the first feature space into the second feature space; and
associating each of the plurality of first cluster feature data with one or more of the plurality of second cluster feature data based on the positional relationship between the point of each of the plurality of first clusters in the second feature space as mapped by the mapping and the point of each of the plurality of second clusters in the second feature space, and
the evaluation function is a function designed to
output, for a specific pair that is a pair of a first cluster and a second cluster containing the same entity, a value that changes in a specific direction as the distance in the second feature space between the first cluster and the second cluster constituting the specific pair becomes shorter, and
output, for a pair of a first cluster and a second cluster other than the specific pair, a value that changes in the direction opposite to the specific direction as the distance in the second feature space between the first cluster and the second cluster becomes shorter.