JP2019070973A

JP2019070973A - Information processing system, information processing method, and program

Info

Publication number: JP2019070973A
Application number: JP2017197105A
Authority: JP
Inventors: 龍道本; Ryu Domoto; 良治見並; Ryoji Minami
Original assignee: Hakuhodo DY Holdings Inc
Current assignee: Hakuhodo DY Holdings Inc
Priority date: 2017-10-10
Filing date: 2017-10-10
Publication date: 2019-05-09
Anticipated expiration: 2037-10-10
Also published as: JP6302126B1

Abstract

To provide new technologies related to databases.
SOLUTION: A system (50) generates a database based on a first database and a second database. The first database (151) has feature data for each component in a first group, and the second database (351) has feature data for each component in a second group. The system divides a plurality of pairs of components between the first group and the second group into a plurality of clusters and provides cluster information; acquires first integrated feature data for each cluster from a first generation unit (10) that integrates the feature data of the first database for each cluster based on the cluster information; acquires second integrated feature data for each cluster from a second generation unit (30) that integrates the feature data of the second database for each cluster based on the cluster information; and generates a combined database (551) which has, for each cluster, data combining the first and second integrated feature data of the same cluster.
SELECTED DRAWING: Figure 1

Description

The present disclosure relates to an information processing system and an information processing method.

Analysis of customers' purchasing behavior based on product sales data has long been practiced. For use in commercial activities, customers' contact behavior with mass media and network content is also analyzed.

Diverse information is also collected through questionnaires and face-to-face interviews, including customers' purchasing behavior, their contact behavior with mass media and network content, and their lifestyles.

In recent years, individual companies have come to hold huge databases containing such customer data. However, companies are reluctant to provide this customer data externally, mainly to protect personal information. When the company that holds such data does provide it externally, the data is encrypted, has much of the information that could identify customers removed, or is deliberately altered to contain errors (noise) (see Patent Document 1).

Patent Document 1: JP 2014-109647 A

As described above, data-holding companies provide only limited customer data because they must protect personal information. With the prior art, it is therefore difficult to make effective use of the many kinds of data that exist in society.

Accordingly, one aspect of the present disclosure seeks to provide a new database-related technology that makes effective use of the various kinds of data existing in society.

An information processing system according to one aspect of the present disclosure generates a new database based on a first database and a second database. The first database has, for each constituent of a first group, feature data representing a first feature of that constituent. The second database has, for each constituent of a second group, feature data representing a second feature of that constituent.

According to one aspect of the present disclosure, the information processing system includes a clustering unit, a first acquisition unit, a second acquisition unit, and a combining unit. The clustering unit is configured to divide a plurality of constituent pairs into a plurality of clusters. It then provides cluster information indicating the cluster to which each constituent pair belongs. Each constituent pair consists of one constituent from the first group and one from the second group, and the two constituents at least correspond to each other.

As mentioned above, a constituent pair is a pair of a first-group constituent and a second-group constituent that at least correspond to each other. Here, the term "at least correspond" includes "match". Accordingly, the first-group constituent and the second-group constituent of one constituent pair may be one and the same entity. For example, each constituent pair may be a pair of constituents presumed to be the same entity. In that case, "a plurality of constituent pairs" may be read as "a plurality of constituents common to the first group and the second group".
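The following is a minimal sketch of the clustering unit's input and output. It assumes that each constituent pair is identified by a first-group code and a second-group code. The text does not specify at this point how clusters are formed. Purely for illustration, the sketch sorts the pairs and cuts them into consecutive clusters of a hypothetical fixed size `cluster_size`, so that every cluster aggregates several pairs. `ConstituentPair` and `cluster_pairs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstituentPair:
    first_id: str   # identifier of the constituent in the first group
    second_id: str  # identifier of the corresponding constituent in the second group


def cluster_pairs(pairs, cluster_size):
    """Split constituent pairs into consecutive clusters of `cluster_size`.

    The last cluster may be smaller. Returns cluster information as a dict
    mapping each pair to an integer cluster label.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be positive")
    # Sorting only makes the illustration deterministic; it carries no meaning.
    ordered = sorted(pairs, key=lambda p: (p.first_id, p.second_id))
    return {pair: index // cluster_size for index, pair in enumerate(ordered)}
```

The returned mapping plays the role of cluster information. It tells each downstream unit which cluster every pair belongs to, and it carries no feature data.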

The first acquisition unit is configured to acquire first integrated feature data for each cluster from a first generation unit. The first generation unit works from the cluster information acquired from the clustering unit. For each cluster, it uses statistical processing to integrate the feature data in the first database that corresponds to the constituent pairs. The result is the first integrated feature data, that is, feature data integrated per cluster.

The second acquisition unit is configured to acquire second integrated feature data for each cluster from a second generation unit. The second generation unit works from the cluster information acquired from the clustering unit. For each cluster, it uses statistical processing to integrate the feature data in the second database that corresponds to the constituent pairs. The result is the second integrated feature data, that is, feature data integrated per cluster.
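The per-cluster integration performed by the first and second generation units can be sketched as follows. The sketch assumes the statistical processing is an arithmetic mean. The text only says "statistical processing", so sums, medians, or proportions would fit equally well. Feature rows are keyed by identifier, and `integrate_features` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def integrate_features(features, cluster_info):
    """Integrate per-constituent feature data into per-cluster feature data.

    features:     dict mapping identifier -> {feature name: numeric value}
    cluster_info: dict mapping identifier -> cluster label
    Returns a dict mapping cluster label -> {feature name: mean value}.
    Constituents absent from cluster_info (not part of any pair) are ignored.
    """
    grouped = defaultdict(list)
    for member_id, cluster in cluster_info.items():
        if member_id in features:
            grouped[cluster].append(features[member_id])

    integrated = {}
    for cluster, rows in grouped.items():
        # Assumes every row in one database carries the same feature names.
        integrated[cluster] = {name: mean(row[name] for row in rows) for name in rows[0]}
    return integrated
```

Only the returned per-cluster values leave the data holder, so the individual rows can stay private.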

The combining unit works from the first integrated feature data for each cluster (acquired by the first acquisition unit) and the second integrated feature data for each cluster (acquired by the second acquisition unit). From these, it is configured to generate a combined database as the new database. For each cluster, the combined database holds combined data that joins the first and second integrated feature data of that same cluster.
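The following sketches the combining unit. It assumes that both inputs map cluster labels to `{feature name: value}` dictionaries. Two behaviors are illustrative choices, not taken from the text: the "first." / "second." name prefixes, and dropping clusters missing from either side (an inner join). `combine` is a hypothetical name.

```python
def combine(first_integrated, second_integrated):
    """Join per-cluster feature data from two sources on the cluster label.

    Feature names are prefixed with "first." / "second." so that features with
    the same name in both databases do not overwrite each other.
    Clusters present in only one input are dropped (an inner join).
    """
    combined = {}
    for cluster in sorted(first_integrated.keys() & second_integrated.keys()):
        row = {f"first.{name}": value for name, value in first_integrated[cluster].items()}
        row.update({f"second.{name}": value for name, value in second_integrated[cluster].items()})
        combined[cluster] = row
    return combined
```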

With this information processing system, the unprocessed feature data held in the first and second databases may be unobtainable. Even so, the first and second integrated feature data can be meaningfully combined into a combined database that corresponds to a join of the first and second databases.

The feature data in the combined database generated by this system are the first and second integrated feature data. These are obtained by integrating the feature data of the first and second databases per cluster through statistical processing. The information processing system of the present disclosure can therefore protect personal information. Thus, according to one aspect of the present disclosure, it is possible to provide a new database-related technology that makes effective use of the various kinds of data existing in society.

According to one aspect of the present disclosure, the constituents of the first and second groups may be consumers. In this case, the first database may have, for each consumer in the first group, feature data representing a first feature of that consumer. Likewise, the second database may have, for each consumer in the second group, feature data representing a second feature of that consumer.

According to one aspect of the present disclosure, each constituent of the first group may be assigned an individual first identification code. The first database may store the feature data of each first-group constituent in association with that constituent's first identification code. Likewise, each constituent of the second group may be assigned an individual second identification code. The second database may store the feature data of each second-group constituent in association with that constituent's second identification code.

According to one aspect of the present disclosure, the clustering unit may identify the constituent pairs from information indicating the correspondence between first and second identification codes, and then divide those pairs into a plurality of clusters. The clustering unit may give the first generation unit cluster information that indicates each pair's cluster in association with the first identification code. It may give the second generation unit the same cluster information in association with the second identification code.
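The sketch below expresses one set of cluster assignments twice: once keyed by first identification codes and once by second identification codes. Each generation unit therefore receives only codes it already knows. The sketch assumes the correspondence information is a list of (first code, second code) tuples. As an arbitrary illustration, it forms clusters by sorting and cutting into fixed-size groups. `build_cluster_info` and the sample codes are hypothetical.

```python
def build_cluster_info(correspondence, cluster_size):
    """Return (first_info, second_info) for the two generation units.

    correspondence: iterable of (first_code, second_code) tuples, one per constituent pair
    cluster_size:   number of pairs per cluster (illustrative fixed-size grouping)
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be positive")
    first_info, second_info = {}, {}
    for index, (first_code, second_code) in enumerate(sorted(set(correspondence))):
        cluster = index // cluster_size
        first_info[first_code] = cluster    # sent to the first generation unit
        second_info[second_code] = cluster  # sent to the second generation unit
    return first_info, second_info
```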

According to one aspect of the present disclosure, the first and second databases may use identification codes common to both databases. Each database stores a constituent's feature data in association with that constituent's code. In this case, a constituent pair corresponds to a pair of feature data records associated with the same identification code in the two databases. The clustering unit may be configured to divide these constituent pairs into a plurality of clusters. It then provides the first and second generation units with cluster information that indicates, in association with the identification code, the cluster to which each pair belongs.
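With shared identification codes, finding the constituent pairs reduces to a set intersection, and a single mapping can be sent to both generation units. The following minimal sketch assumes fixed-size clusters over the sorted shared codes, purely for illustration. `common_id_cluster_info` is a hypothetical name.

```python
def common_id_cluster_info(first_ids, second_ids, cluster_size):
    """Cluster the identification codes present in both databases.

    Because the databases share codes, a constituent pair is simply a code
    that appears in both; the returned mapping suits both generation units.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be positive")
    shared = sorted(set(first_ids) & set(second_ids))
    return {code: index // cluster_size for index, code in enumerate(shared)}
```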

According to one aspect of the present disclosure, the first generation unit may be configured to conceal or hash the personal information of each first-group constituent using a specific function. It then provides the clustering unit with a list of the first-group constituents that includes each constituent's concealed value or hash value.

Likewise, the second generation unit may be configured to conceal or hash the personal information of each second-group constituent using the same specific function. It then provides the clustering unit with a list of the second-group constituents that includes each constituent's concealed value or hash value.

The clustering unit may identify the constituent pairs by comparing the concealed or hash values in the two lists received from the first and second generation units. It may then provide both generation units with cluster information that indicates, in association with the concealed or hash value, the cluster to which each pair belongs.
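The text leaves the "specific function" unspecified. One common choice, assumed here, is a keyed hash (HMAC-SHA-256 from Python's standard library) whose secret key is shared by the two generation units but not by the clustering unit. A plain unkeyed hash of low-entropy personal data such as names and birth dates could be reversed by hashing candidate values. Without the key, the clustering unit can only compare the values. The normalization rule, `SHARED_KEY`, and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret agreed between the two data holders; never sent to the clustering unit.
SHARED_KEY = b"example-shared-secret"


def conceal(personal_info: str, key: bytes = SHARED_KEY) -> str:
    """Return a keyed hash (hex string) of normalized personal information."""
    # Normalize whitespace and case so both holders hash the same string for the same person.
    normalized = " ".join(personal_info.split()).lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def match_concealed(first_values, second_values):
    """Clustering-unit side: values present in both lists identify constituent pairs."""
    return sorted(set(first_values) & set(second_values))
```

Both holders must apply the same normalization. Otherwise the same person yields different values and the pair is missed.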

According to one aspect of the present disclosure, the clustering unit may be configured to divide the constituent pairs into a plurality of clusters based on the degree of similarity between them. Similarity-based clustering integrates the feature data of similar constituents. This limits the loss of valuable information caused by integrating feature data per cluster, so a more meaningful combined database can be generated.

According to one aspect of the present disclosure, the clustering unit may be configured to acquire similarity information. This information makes it possible to determine how similar the constituent pairs are with respect to at least one of the first and second features. In this case, the clustering unit may be configured to divide the pairs into a plurality of clusters based on that information. Pairs similar in at least one of the first and second features are then grouped together.
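As one concrete, assumed reading of similarity-based clustering, the sketch below applies plain k-means. It works on a numeric vector per constituent pair, for example coarse features or attributes that a data holder is willing to disclose. The text does not prescribe k-means; any clustering that groups similar pairs would fit. Initializing from the first k identifiers is a simplification for determinism, and `kmeans_pairs` is a hypothetical name.

```python
import math


def kmeans_pairs(points, k, iterations=20):
    """Cluster constituent pairs by similarity of their vectors.

    points: dict mapping pair identifier -> tuple of numbers (same length for all pairs)
    k:      number of clusters
    Returns a dict mapping pair identifier -> cluster label in range(k).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of pairs")
    ids = sorted(points)
    # Deterministic initialization: use the first k pairs as initial centroids.
    centroids = [tuple(points[i]) for i in ids[:k]]
    labels = {}
    for _ in range(iterations):
        # Assignment step: each pair joins the nearest centroid (Euclidean distance).
        labels = {
            i: min(range(k), key=lambda c: math.dist(points[i], centroids[c]))
            for i in ids
        }
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [points[i] for i in ids if labels[i] == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels
```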

According to one aspect of the present disclosure, the first generation unit may be configured to provide the clustering unit with a list of the first-group constituents. This list represents how similar those constituents are with respect to the first feature. The second generation unit may be configured to provide a corresponding list of the second-group constituents, representing their similarity with respect to the second feature. In this case, the clustering unit may be configured to divide the constituent pairs into a plurality of clusters based on the two lists. Constituents similar in the first and second features are then grouped together.

According to one aspect of the present disclosure, the first generation unit may be configured to provide the clustering unit with a list of the first-group constituents that includes a first attribute value for each constituent. The second generation unit may be configured to provide a list of the second-group constituents that includes a second attribute value for each constituent. In this case, the clustering unit may be configured to determine the similarity between constituent pairs from at least one of the first and second attribute values. It then divides the pairs into a plurality of clusters based on the determined similarity.
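When the lists carry attribute values, the simplest similarity measure is exact equality. The sketch below assumes categorical attributes, such as an age band from the first list and a region from the second. It places pairs with identical attribute combinations into the same cluster. This is a deliberately coarse stand-in for "determining similarity"; the attribute values and `cluster_by_attributes` are hypothetical.

```python
from collections import defaultdict


def cluster_by_attributes(pairs, first_attrs, second_attrs):
    """Group constituent pairs whose (first attribute, second attribute) match exactly.

    pairs:        list of (first_id, second_id) tuples
    first_attrs:  dict first_id -> attribute value from the first list
    second_attrs: dict second_id -> attribute value from the second list
    Returns a dict mapping each pair -> integer cluster label.
    """
    clusters = defaultdict(list)
    for first_id, second_id in pairs:
        key = (first_attrs[first_id], second_attrs[second_id])
        clusters[key].append((first_id, second_id))
    labels = {}
    # Number clusters in sorted key order so the labels are reproducible.
    for label, key in enumerate(sorted(clusters)):
        for pair in clusters[key]:
            labels[pair] = label
    return labels
```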

According to one aspect of the present disclosure, a computer program may be provided that causes a computer to function as at least one of the clustering unit, the first acquisition unit, the second acquisition unit, and the combining unit of the information processing system described above. The computer program may be stored on a computer-readable non-transitory recording medium.

According to one aspect of the present disclosure, an information processing system may be provided that can communicate with two external systems: a first external system having the first database and a second external system having the second database. This system may include a clustering unit. The clustering unit acquires a list of the first-group constituents from the first external system and a list of the second-group constituents from the second external system. Based on these lists, it divides the constituent pairs into a plurality of clusters. (Each pair is a first-group constituent and a second-group constituent that at least correspond to each other.) It then provides both external systems with cluster information indicating the cluster to which each pair belongs.

The first external system may be configured to use the cluster information acquired from the clustering unit to integrate, by statistical processing, the first database's feature data for the constituent pairs in each cluster. It then provides the resulting first integrated feature data for each cluster to the information processing system. The information processing system may include a first acquisition unit that acquires this first integrated feature data from the first external system.

The second external system may be configured to use the cluster information acquired from the clustering unit to integrate, by statistical processing, the second database's feature data for the constituent pairs in each cluster. It then provides the resulting second integrated feature data for each cluster to the information processing system. The information processing system may include a second acquisition unit that acquires this second integrated feature data from the second external system.

The information processing system may include a combining unit. It generates a combined database from the first integrated feature data for each cluster (acquired by the first acquisition unit) and the second integrated feature data for each cluster (acquired by the second acquisition unit). For each cluster, the combined database holds combined data that joins the first and second integrated feature data of that same cluster.
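The division of roles in this configuration can be simulated in a single process. The sketch below makes four assumptions: shared identification codes (as in the common-code variant described earlier), mean aggregation, fixed-size clusters, and distinct feature names in the two databases. Each `DataProvider` stands in for an external system and exposes only its member list and per-cluster aggregates. `CombiningSystem` plays the clustering, acquisition, and combining units. All class and method names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


class DataProvider:
    """Hypothetical stand-in for an external system that owns one database."""

    def __init__(self, records):
        self._records = records  # id -> {feature: value}; never handed out directly

    def member_list(self):
        return sorted(self._records)

    def integrated_features(self, cluster_info):
        # Aggregate per cluster so that only group-level values leave this object.
        grouped = defaultdict(list)
        for member_id, cluster in cluster_info.items():
            if member_id in self._records:
                grouped[cluster].append(self._records[member_id])
        return {cluster: {name: mean(r[name] for r in rows) for name in rows[0]}
                for cluster, rows in grouped.items()}


class CombiningSystem:
    """Plays the clustering unit, both acquisition units, and the combining unit."""

    def __init__(self, cluster_size):
        if cluster_size < 1:
            raise ValueError("cluster_size must be positive")
        self.cluster_size = cluster_size

    def build(self, first, second):
        shared = sorted(set(first.member_list()) & set(second.member_list()))
        cluster_info = {mid: i // self.cluster_size for i, mid in enumerate(shared)}
        first_data = first.integrated_features(cluster_info)    # first acquisition
        second_data = second.integrated_features(cluster_info)  # second acquisition
        # Combining: merge the two per-cluster rows (assumes distinct feature names).
        return {c: {**first_data[c], **second_data[c]} for c in sorted(first_data)}
```

Only the member lists and the per-cluster averages cross the boundary between the systems. The `_records` dictionaries never do.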

According to one aspect of the present disclosure, an information processing system that has the second database may be provided. It is configured to communicate with an external system that has the first database. This system may include a clustering unit, an acquisition unit, a generation unit, and a combining unit. The clustering unit divides the constituent pairs into a plurality of clusters. (Each pair is a first-group constituent and a second-group constituent that at least correspond to each other.) It then provides the external system with cluster information indicating the cluster to which each pair belongs.

The external system uses the cluster information received from the clustering unit to integrate, by statistical processing, the first database's feature data for the constituent pairs in each cluster. It then provides the resulting first integrated feature data for each cluster to the information processing system. The acquisition unit acquires this first integrated feature data from the external system.

The generation unit uses the cluster information to integrate, by statistical processing, the second database's feature data for the constituent pairs in each cluster. This produces second integrated feature data for each cluster. The combining unit then generates a combined database from the first integrated feature data (acquired by the acquisition unit) and the second integrated feature data (generated by the generation unit). For each cluster, the combined database holds combined data that joins the first and second integrated feature data of that same cluster.

According to one aspect of the present disclosure, an information processing method for generating a new database based on the first and second databases may be provided. The method may include the following four procedures.

A clustering procedure divides the constituent pairs into a plurality of clusters and provides cluster information indicating the cluster to which each pair belongs. (Each pair is a first-group constituent and a second-group constituent that at least correspond to each other.)

A first acquisition procedure acquires first integrated feature data for each cluster from a device. Using the cluster information provided by the clustering procedure, that device integrates the first database's feature data for the constituent pairs in each cluster by statistical processing.

A second acquisition procedure acquires second integrated feature data for each cluster from a device. Using the same cluster information, that device integrates the second database's feature data for the constituent pairs in each cluster by statistical processing.

A combining procedure generates a combined database from the first integrated feature data (acquired by the first acquisition procedure) and the second integrated feature data (acquired by the second acquisition procedure). For each cluster, the combined database holds combined data that joins the first and second integrated feature data of that same cluster.

FIG. 1 is a block diagram showing the configuration of an information processing system according to a first embodiment.
FIG. 2 is a diagram showing the configurations of a first database and a member list.
FIG. 3 is a diagram showing the configurations of a second database and a member list.
FIG. 4 is a flowchart showing a combining-related process executed by the combining device.
FIG. 5 is a flowchart showing a first data providing process executed by the first data providing system.
FIG. 6 is a flowchart showing a second data providing process executed by the second data providing system.
FIG. 7 is a diagram showing the configurations of first cluster information and second cluster information.
FIG. 8 is a flowchart showing a data processing operation executed by the first and second data providing systems.
FIG. 9A is an explanatory diagram of the processing of the first database, and FIG. 9B is an explanatory diagram of the processing of the second database.
FIG. 10 is a diagram showing the configuration of a combined database.
FIG. 11 is a flowchart showing a member list generation process executed by a first data providing system according to a second embodiment.
FIG. 12 is a flowchart showing a member list generation process executed by a second data providing system according to the second embodiment.
FIG. 13 is a flowchart showing a process executed by a combining system according to the second embodiment.
FIG. 14 is a flowchart showing a first data providing process executed by a first data providing system according to a third embodiment.
FIG. 15 is a flowchart showing a second data providing process executed by a second data providing system according to the third embodiment.
FIG. 16 is a flowchart showing a process executed by a combining system according to the third embodiment.
FIG. 17 is an explanatory diagram of a relation table held by a combining system according to a fourth embodiment.
FIG. 18 is a flowchart showing a combining-related process executed by the combining system according to the fourth embodiment.
FIG. 19 is a block diagram showing the configuration of an information processing system according to a fifth embodiment.
FIG. 20 is a flowchart showing a combining-related process executed by a combining system according to the fifth embodiment.

Exemplary embodiments of the present disclosure are described below with reference to the drawings.

[First Embodiment]
As shown in FIG. 1, the information processing system 1 of the present embodiment includes a first data providing system 10, a second data providing system 30, and a combining system 50. The combining system 50 is configured to generate a combined database 551 that combines the first database 151 and the second database 351. It does so from data on the first database 151 supplied by the first data providing system 10 and data on the second database 351 supplied by the second data providing system 30.

Generating the combined database 551 includes two processing steps. In one, the first data providing system 10 processes the data in the first database 151. In the other, the second data providing system 30 processes the data in the second database 351. This processing integrates the data on many individuals held in the first database 151 and the second database 351 and converts it into data on groups.

The combining system 50 provides information for controlling this processing (first cluster information 155 and second cluster information 355, described later) for the first database 151 and the second database 351. With this information, the combining system 50 can generate a meaningful combined database 551 about consumers without receiving data on individuals from the first data providing system 10 or the second data providing system 30.

For simplicity, FIG. 1 represents the first data providing system 10, the second data providing system 30, and the combining system 50 each as a single device (machine). However, each of them may be composed of multiple machines.

The first data providing system 10 includes a processor 11, a memory 13, and a storage device 15. It also includes a communication interface (not shown) and is configured to communicate with the coupling system 50 through the network NT.

The processor (CPU) 11 executes processing according to programs stored in the memory 13 or the storage device 15. The memory 13 includes ROM, RAM, and the like. The storage device 15 stores the first database 151.

The first database 151 holds, for each consumer belonging to a first group, feature data representing that consumer's features. The consumers belonging to the first group correspond to the constituents of the first group. Hereinafter, consumers belonging to the first group are also referred to as members of the first group.

The upper part of FIG. 2 conceptually shows the configuration of the first database 151. As shown there, the first database 151 stores each member's feature data (x1, x2, ...) in association with the member's customer number ID_A and connector ID_C.

The customer number ID_A and the connector ID_C are each member-specific identification codes for identifying the corresponding member. However, the customer number ID_A is an identification code dedicated to the first group and is not used in the second database 351. In this respect it differs from the connector ID_C, which is used in common by the first database 151 and the second database 351.

The feature data represents the features of the corresponding member by multiple elements x1, x2, .... Examples of these elements include the member's age, gender, area of residence, and hobbies, as well as the purchase experience and number of purchases for each product. When the first database 151 is managed by a specific company A, the per-product purchase experience and purchase counts in its feature data may concern products sold by company A.

The second data providing system 30 includes a processor 31, a memory 33, and a storage device 35. It also includes a communication interface (not shown) and is configured to communicate with the coupling system 50 through the network NT.

The processor (CPU) 31 executes processing according to programs stored in the memory 33 or the storage device 35. The memory 33 includes ROM, RAM, and the like. The storage device 35 stores the second database 351.

The second database 351 holds, for each consumer belonging to a second group, feature data representing that consumer's features. The consumers belonging to the second group correspond to the constituents of the second group. Hereinafter, consumers belonging to the second group are also referred to as members of the second group.

The upper part of FIG. 3 conceptually shows the configuration of the second database 351. As shown there, the second database 351 stores each member's feature data (y1, y2, ...) in association with the member's customer number ID_B and connector ID_C.

The customer number ID_B and the connector ID_C are each member-specific identification codes for identifying the corresponding member. However, the customer number ID_B is an identification code dedicated to the second group and is not used in the first database 151. As described above, the connector ID_C is an identification code used in common with the first database 151. In the second database 351, feature data on a consumer who also appears in the first database 151 is associated with a connector ID_C having the same value as in the first database 151.

The feature data in the second database 351 represents the features of a member by multiple elements y1, y2, .... Examples of these elements include the member's age, gender, area of residence, and hobbies, as well as the purchase experience and number of purchases for each product. However, at least some of the elements y1, y2, ... differ from the elements x1, x2, ... of the feature data in the first database 151. In that sense, for the same consumer, the feature data in the second database 351 and the feature data in the first database 151 represent different kinds of features. The per-product purchase experience and purchase counts given as examples of y1, y2, ... may concern, for example, products sold by a specific company B different from company A.

The connector ID_C is used to identify consumers common to multiple databases. Specifically, it is used to associate the feature data in the first database 151 with the feature data in the second database 351 for the same consumer.

In the example shown in the upper parts of FIG. 2 and FIG. 3, the consumers whose feature data is associated with customer numbers ID_A A0003, A0004, A0005, A0006, and A0007 in the first database 151 are the same consumers whose feature data is associated with customer numbers ID_B B0001, B0002, B0003, B0004, and B0005 in the second database 351.
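Conceptually, the connector ID_C acts as a join key between the two databases. The following Python sketch shows how records sharing an ID_C line up; the customer numbers follow the text, but the connector values (c01, c02, ...) are hypothetical, since the actual values in FIG. 2 and FIG. 3 are not given. It only illustrates the association: in the embodiment no single party performs this customer-number join, because the coupling system 50 only ever sees ID_C values.

```python
from typing import Dict, Optional

# Hypothetical rows: customer number -> connector ID_C (None means "unknown").
FIRST_DB: Dict[str, Optional[str]] = {"A0001": None, "A0002": "c90", "A0003": "c01", "A0004": "c02"}
SECOND_DB: Dict[str, Optional[str]] = {"B0001": "c01", "B0002": "c02", "B0006": "c77"}


def link_by_connector(
    first: Dict[str, Optional[str]], second: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Map ID_A to ID_B for consumers whose known ID_C appears in both databases."""
    # Index the second database by connector, ignoring unknown connectors.
    id_b_by_connector = {c: id_b for id_b, c in second.items() if c is not None}
    return {
        id_a: id_b_by_connector[c]
        for id_a, c in first.items()
        if c is not None and c in id_b_by_connector
    }


print(link_by_connector(FIRST_DB, SECOND_DB))  # {'A0003': 'B0001', 'A0004': 'B0002'}
```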

The connector ID_C may be, for example, the customer number of a third group that has more members than the first and second groups; the third group may be one to which many consumers belong. The connector ID_C may also be the identification code of a communication device owned by the consumer, such as a smartphone, or a consumer identifier used to track consumers on a network (for example, a cookie).

It is convenient if the connector ID_C is a consumer identification code that the administrator of the first database 151 and the administrator of the second database 351 can each obtain without cooperating with each other. The connector ID_C may be a concealed version of such an identification code, specifically a hashed value. Hashing can be performed with the same hash function in the first data providing system 10 and the second data providing system 30 so that the same consumer yields the same connector ID_C value. Note also that some members of the first group and of the second group need not have a connector ID_C. In that case, the feature data of such a member in the first database 151 or the second database 351 is associated with information indicating that the connector is unknown.
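A minimal Python sketch of the concealment step, assuming HMAC-SHA256 with a key agreed in advance by both data providing systems. The source only requires that both systems apply the same hash function; the keyed hash, the key value, and the function name `conceal_connector` are illustrative choices. A keyed hash is one option because raw identifiers such as device IDs can often be guessed and re-hashed by a third party.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key shared by the first and second data providing systems.
SHARED_KEY = b"example-shared-key"


def conceal_connector(raw_id: Optional[str], key: bytes = SHARED_KEY) -> Optional[str]:
    """Return the concealed (hashed) connector ID_C, or None if it is unknown."""
    if raw_id is None:
        # Keep the "connector unknown" marker instead of hashing a placeholder.
        return None
    return hmac.new(key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Both providers run the same function, so one consumer maps to one value.
print(conceal_connector("cookie-123"))
```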

The coupling system 50 includes a processor 51, a memory 53, and a storage device 55. It also includes a communication interface (not shown) and is configured to communicate with the first data providing system 10 and the second data providing system 30 through the network NT.

The processor (CPU) 51 executes processing according to programs stored in the memory 53 or the storage device 55. The memory 53 includes ROM, RAM, and the like. The storage device 55 stores the combined database 551 generated by the processing that the processor 51 executes.

When a user of the coupling system 50 inputs, through a user interface (not shown), an instruction to generate the combined database 551 from the first database 151 and the second database 351, the processor 51 of the coupling system 50 starts the combination-related process shown in FIG. 4.

On starting the combination-related process, the processor 51 sends a request signal asking for a member list to the first data providing system 10 and the second data providing system 30 through the network NT (S110).

On receiving this request signal, the processor 11 of the first data providing system 10 starts the first data providing process shown in FIG. 5. It generates a member list 153 that lists the members of the first group having feature data in the first database 151 (S310) and sends the generated member list 153 to the coupling system 50 through the network NT (S320).

Specifically, as shown in the lower part of FIG. 2, the processor 11 generates the member list 153 by representing each member of the first group by its connector ID_C. The connector ID_C may be hashed when the member list 153 is generated. Members of the first group whose connector ID_C is unknown are not listed in the member list 153. In the present embodiment, the feature data of members whose connector ID_C is unknown is not used to generate the combined database 551; in other words, such feature data is treated as nonexistent while the combined database 551 is being generated.
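The member-list step (S310, and likewise S410) can be sketched in Python as follows. The record layout with `customer_id` and `connector` fields and the optional `conceal` callable are hypothetical; only the identifier fields are shown. The point the sketch makes is that the list carries ID_C values only, so the customer numbers never leave the data providing system.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical record layout: only the identifier fields are shown.
Record = Dict[str, Optional[str]]


def build_member_list(
    records: List[Record],
    conceal: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Build a member list that identifies members by connector ID_C only."""
    member_list: List[str] = []
    for record in records:
        connector = record.get("connector")
        if connector is None:
            # Unknown ID_C: the member is omitted and treated as nonexistent.
            continue
        # The customer number (ID_A / ID_B) is deliberately not included.
        member_list.append(conceal(connector) if conceal else connector)
    return member_list


rows: List[Record] = [
    {"customer_id": "A0001", "connector": None},
    {"customer_id": "A0003", "connector": "c01"},
    {"customer_id": "A0004", "connector": "c02"},
]
print(build_member_list(rows))  # ['c01', 'c02']
```

Passing a hash function as `conceal` (for instance a keyed hash shared by both providers) yields the hashed variant of the member list.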

Similarly, on receiving the request signal, the processor 31 of the second data providing system 30 starts the second data providing process shown in FIG. 6. It generates a member list 353 that lists the members of the second group having feature data in the second database 351 (S410) and sends the generated member list 353 to the coupling system 50 through the network NT (S420). Specifically, as shown in the lower part of FIG. 3, the processor 31 generates the member list 353 by representing each member of the second group by its connector ID_C. When generating the member list 353, the connector ID_C may be hashed using the same hash function as in the first data providing system 10.

The processor 51 of the coupling system 50 receives the member list 153 of the first group and the member list 353 of the second group sent by the first data providing system 10 and the second data providing system 30 (S120), and executes clustering processing (S130) based on the received member lists 153 and 353.

In the clustering processing (S130), the processor 51 identifies, from the member lists 153 and 353 of the first and second groups, the members common to both groups (hereinafter "common members") (S131). This identification can be achieved by matching the connector IDs ID_C between the member lists 153 and 353.
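Because both member lists contain only connector IDs, the matching in S131 amounts to a set intersection. A minimal Python sketch (the function name `partition_members` is hypothetical) that also returns the non-common members of each group, which are clustered separately in S135 and S137:

```python
from typing import Iterable, Set, Tuple


def partition_members(
    list_first: Iterable[str], list_second: Iterable[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (common, first-only, second-only) sets of connector IDs."""
    first, second = set(list_first), set(list_second)
    common = first & second  # S131: members whose ID_C appears in both lists
    return common, first - common, second - common


common, first_only, second_only = partition_members(
    ["c01", "c02", "c90"], ["c01", "c02", "c77"]
)
print(sorted(common), sorted(first_only), sorted(second_only))
```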

The processor 51 then divides the common members into multiple clusters (S133). For example, the processor 51 can divide the common members, randomly or according to a predetermined rule, into clusters of a predetermined number of members. For instance, when the number of common members is M and the predetermined number of members per cluster is K, the processor can generate as many clusters as the quotient α obtained by dividing M by K. For personal-information protection, K is set to a value greater than 1. If there is a remainder β of one or more (β = M - αK), the β leftover members may be allocated, randomly or according to a predetermined rule, to any of the α clusters.
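A Python sketch of S133 using the random option described above. The `seed` parameter is a hypothetical convenience for reproducibility, and the behavior when M < K (so that α = 0) is not covered by the text; this sketch assumes no cluster is formed in that case rather than creating a cluster smaller than K.

```python
import random
from typing import List, Optional, Sequence


def split_into_clusters(
    members: Sequence[str], k: int, seed: Optional[int] = None
) -> List[List[str]]:
    """Split M members into alpha = M // K clusters of K, spreading the beta leftovers."""
    if k <= 1:
        raise ValueError("K must be greater than 1 to protect personal information")
    rng = random.Random(seed)
    shuffled = list(members)
    rng.shuffle(shuffled)  # random grouping, one of the options in the text
    alpha = len(shuffled) // k
    if alpha == 0:
        # Assumption: with fewer than K members, no cluster is formed.
        return []
    clusters = [shuffled[i * k:(i + 1) * k] for i in range(alpha)]
    # The remaining beta = M - alpha * K members join randomly chosen clusters.
    for member in shuffled[alpha * k:]:
        rng.choice(clusters).append(member)
    return clusters


clusters = split_into_clusters([f"c{i:02d}" for i in range(10)], k=3, seed=0)
print([len(c) for c in clusters])
```

With M = 10 and K = 3 this yields α = 3 clusters, and the single leftover member (β = 1) raises one cluster to four members, so no cluster ever falls below K.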

Furthermore, the processor 51 divides the members of the first group that are not common members, as identified from the member list 153 of the first group (that is, the non-common members of the first group), into multiple clusters (S135). The clustering in S135 can use the same method as the clustering in S133.

Likewise, the processor 51 divides the members of the second group that are not common members, as identified from the member list 353 of the second group (that is, the non-common members of the second group), into multiple clusters (S137). The clustering in S137 can use the same method as the clustering in S135.

After dividing the common members, the non-common members of the first group, and the non-common members of the second group into clusters in this way, the processor 51 generates first cluster information 155 and second cluster information 355, then sends the first cluster information 155 to the first data providing system 10 and the second cluster information 355 to the second data providing system 30 (S140).

As shown in FIG. 7, the first cluster information 155 is generated by attaching to the member list 153 received from the first data providing system 10 a cluster number, which is the identification code of the cluster each member belongs to. Likewise, the second cluster information 355 is generated by attaching the cluster number of each member's cluster to the member list 353 received from the second data providing system 30.
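The generation of cluster information in S140 can be sketched as below. Assumptions not stated in the text: cluster numbers are unique across the common, first-only, and second-only clusters, they are zero-padded three-digit strings modeled on the "002" style of FIG. 7, and the cluster information is represented as a mapping from ID_C to cluster number rather than as a member list with an extra column. Common-member clusters appear in both outputs under the same number; each non-common cluster is sent only to the provider that holds its members.

```python
from typing import Dict, List, Tuple

ClusterInfo = Dict[str, str]  # connector ID_C -> cluster number


def build_cluster_info(
    common_clusters: List[List[str]],
    first_only_clusters: List[List[str]],
    second_only_clusters: List[List[str]],
) -> Tuple[ClusterInfo, ClusterInfo]:
    """Number every cluster once and build first and second cluster information."""
    first_info: ClusterInfo = {}
    second_info: ClusterInfo = {}
    next_number = 1

    def assign(clusters: List[List[str]], targets: List[ClusterInfo]) -> None:
        nonlocal next_number
        for cluster in clusters:
            label = f"{next_number:03d}"  # e.g. "002", as in FIG. 7
            next_number += 1
            for member in cluster:
                for info in targets:
                    info[member] = label

    # Common-member clusters go to both providers under the same number.
    assign(common_clusters, [first_info, second_info])
    # Non-common clusters go only to the provider holding those members.
    assign(first_only_clusters, [first_info])
    assign(second_only_clusters, [second_info])
    return first_info, second_info


first_info, second_info = build_cluster_info(
    [["c01", "c02"]], [["c90", "c91"]], [["c77", "c78"]]
)
print(first_info)
print(second_info)
```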

On receiving the first cluster information 155 (S330), the processor 11 of the first data providing system 10 processes the first database 151 (S340). Specifically, the processor 11 executes the processing shown in FIG. 8.

That is, based on the received first cluster information 155, the processor 11 selects one of the clusters (S510). It then statistically integrates the feature data in the first database 151 of the members belonging to the selected cluster into a single piece of feature data for that cluster (hereinafter "integrated feature data") (S520).

The processor 11 repeats S510 and S520 until it determines that integrated feature data has been generated for all clusters, thereby producing integrated feature data for every cluster. Once integrated feature data has been generated for all clusters (Yes in S530), the processor 11 ends the processing (S340) and sends the processed first database 157, which contains the integrated feature data of each cluster, to the coupling system 50 (S350).
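The S510 to S530 loop can be sketched in Python as follows. For brevity the feature data is keyed by connector ID_C (in the embodiment it is keyed by ID_A with ID_C alongside), and the mean is used for every element as a simplification; the text lets the type of statistic differ per element. The member count is stored under `k`, matching the variable k of the integrated feature data in FIG. 9A.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

Features = Dict[str, float]  # element name -> value


def integrate_by_cluster(
    features: Dict[str, Features],  # connector ID_C -> feature data
    cluster_info: Dict[str, str],  # connector ID_C -> cluster number
) -> Dict[str, Features]:
    """Build a processed database: cluster number -> integrated feature data."""
    members_by_cluster: Dict[str, List[str]] = defaultdict(list)
    for connector, cluster_no in cluster_info.items():
        if connector in features:
            members_by_cluster[cluster_no].append(connector)

    processed: Dict[str, Features] = {}
    for cluster_no in sorted(members_by_cluster):  # S510: take each cluster in turn
        members = members_by_cluster[cluster_no]
        elements = features[members[0]].keys()
        # S520: one statistic per element (the mean here, as a simplification).
        integrated = {e: mean(features[m][e] for m in members) for e in elements}
        integrated["k"] = len(members)  # number of members in the cluster
        processed[cluster_no] = integrated
    return processed  # every cluster handled (Yes in S530)


feats = {
    "c01": {"x1": 30, "x2": 1},
    "c02": {"x1": 40, "x2": 3},
    "c05": {"x1": 20, "x2": 0},
    "c06": {"x1": 22, "x2": 2},
}
info = {"c01": "002", "c02": "002", "c05": "003", "c06": "003"}
print(integrate_by_cluster(feats, info))
```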

The processed first database 157 holds, in place of the per-member feature data of the first database 151, per-cluster integrated feature data obtained by statistically processing that feature data. FIG. 9A partially shows the configuration of the processed first database 157.

In S520, the processor 11 converts, element by element, the values of the elements x1, x2, ... in the feature data of the members belonging to the selected cluster into one statistical value per element, thereby generating one piece of integrated feature data for that cluster.

In the example of the first cluster information 155 shown in FIG. 7, the feature data of customer numbers A0003 and A0004 belong to the same cluster (cluster number 002). For this cluster, therefore, S520 converts the feature data of A0003 and A0004 into a statistical value for each element x1, x2, ..., producing integrated feature data that contains one statistical value per element.

As shown in FIG. 9A, for element x1, the value x1[3] of A0003 and the value x1[4] of A0004 are statistically processed, and the resulting statistic ST{x1[3], x1[4]} is recorded as the value of element x1 in the integrated feature data of cluster number 002. Here, ST{} denotes a statistic of the values inside the braces. The statistic may be a mean, a median, a mode, a maximum and/or minimum, or a composition ratio. The type of statistic is predetermined for each type of element x1, x2, ....

For example, when element x1 represents age, the statistic ST{x1[3], x1[4]} may be the mean of the ages x1[3] and x1[4]. When x1 represents gender, it may be the proportion of male and/or female members. When x1 represents purchase experience of a product, it may be the proportion of members who have purchased the product. When x1 represents the number of purchases of a product, it may be one or a combination of the mean, median, and maximum of the purchase counts.
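The per-element choice of statistic can be expressed as a lookup table fixed in advance. In this Python sketch the element names (`age`, `gender`, `purchased_p1`, `count_p1`) are hypothetical, and the statistics follow the examples just given: mean age, share of female members (one of the gender options), share of members with purchase experience, and the mean, median, and maximum of the purchase count.

```python
from statistics import mean, median
from typing import Any, Callable, Dict, List, Sequence


def share_equal_to(target: Any) -> Callable[[Sequence[Any]], float]:
    """Statistic: proportion of values equal to target (a composition ratio)."""
    def stat(values: Sequence[Any]) -> float:
        return sum(1 for v in values if v == target) / len(values)
    return stat


# Hypothetical element names; the statistic type is fixed in advance per element.
STATISTIC_BY_ELEMENT: Dict[str, Callable[[Sequence[Any]], Any]] = {
    "age": mean,                           # mean age
    "gender": share_equal_to("female"),    # share of female members
    "purchased_p1": share_equal_to(True),  # share with purchase experience
    "count_p1": lambda vs: {"mean": mean(vs), "median": median(vs), "max": max(vs)},
}


def integrate_cluster(member_rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply each element's statistic ST{...} across the cluster's members."""
    integrated = {
        name: stat([row[name] for row in member_rows])
        for name, stat in STATISTIC_BY_ELEMENT.items()
    }
    integrated["k"] = len(member_rows)
    return integrated


rows = [
    {"age": 30, "gender": "female", "purchased_p1": True, "count_p1": 2},
    {"age": 40, "gender": "male", "purchased_p1": False, "count_p1": 4},
]
print(integrate_cluster(rows))
```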

Similarly, in the example of the first cluster information 155 shown in FIG. 7, the feature data of customer numbers A0005, A0006, and A0007 belong to the same cluster (cluster number 003). For this cluster, S520 of the processing converts the feature data of A0005, A0006, and A0007 into a statistical value for each element x1, x2, ..., generating integrated feature data.

As shown in FIG. 9A, the integrated feature data additionally holds the value of a variable k. The variable k represents the number of members belonging to the cluster and is written into the integrated feature data when it is generated.

On receiving the second cluster information 355 from the coupling system 50, the second data providing system 30 executes the same processing as the first data providing system 10. That is, on receiving the second cluster information 355 (S430), the processor 31 of the second data providing system 30 processes the second database 351 (S440). The processing executed by the processor 31 is as shown in FIG. 8.

That is, the processor 31 integrates the feature data in the second database 351 cluster by cluster, based on the second cluster information 355, to generate integrated feature data for each cluster. Specifically, for each cluster, the processor 31 converts, element by element, the values of the elements y1, y2, ... in the feature data of the members belonging to that cluster into one statistical value per element, generating one piece of integrated feature data for the cluster. In this way the processor 31 generates a processed second database 357 containing per-cluster integrated feature data based on the second database 351, and then sends it to the coupling system 50 (S450).

In the example of the second cluster information 355 shown in FIG. 7, the feature data of customer numbers B0001 and B0002 belong to the same cluster (cluster number 002). For this cluster, therefore, S520 of the processing converts the feature data of B0001 and B0002 into a statistical value for each element y1, y2, ..., producing integrated feature data that contains one statistical value per element.

FIG. 9B shows an example of the integrated feature data in the processed second database 357. As shown there, for element y1, the value y1[1] of B0001 and the value y1[2] of B0002 are statistically processed, and the resulting statistic ST{y1[1], y1[2]} is recorded as the value of element y1 in the integrated feature data of cluster number 002. The type of statistic is predetermined for each type of element y1, y2, ....

The processor 51 of the coupling system 50 receives the processed first database 157 from the first data providing system 10 (S150) and the processed second database 357 from the second data providing system 30 (S160). It then combines the processed first database 157 and the processed second database 357 to generate the combined database 551 and stores the combined database 551 in the storage device 55 (S170). The process shown in FIG. 4 then ends.

FIG. 10 conceptually shows the configuration of the combined database 551. In S170, the processor 51 of the coupling system 50 generates the combined database 551 by joining the per-cluster integrated feature data of the processed first database 157 with the per-cluster integrated feature data of the processed second database 357, so that integrated feature data belonging to the same cluster are joined together.

The combined database 551 holds, for each cluster, combined data in which first integrated feature data and second integrated feature data are joined. Here, the first integrated feature data of a cluster is that cluster's integrated feature data in the processed first database 157, and the second integrated feature data is that cluster's integrated feature data in the processed second database 357.
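The join in S170 is keyed on the cluster number. A Python sketch under two assumptions not spelled out in this passage: element names are prefixed with `first.` and `second.` so that elements present in both databases (such as age) stay distinct, and clusters that exist on only one side (the non-common clusters) are kept with only that side's columns, as in an outer join.

```python
from typing import Any, Dict

ProcessedDB = Dict[str, Dict[str, Any]]  # cluster number -> integrated feature data


def combine_processed(first: ProcessedDB, second: ProcessedDB) -> ProcessedDB:
    """Join integrated feature data that share a cluster number."""
    combined: ProcessedDB = {}
    for cluster_no in sorted(first.keys() | second.keys()):
        row: Dict[str, Any] = {}
        # Prefix element names so identically named elements stay distinct.
        for prefix, db in (("first", first), ("second", second)):
            for name, value in db.get(cluster_no, {}).items():
                row[f"{prefix}.{name}"] = value
        combined[cluster_no] = row
    return combined


processed_first = {"002": {"x1": 35, "k": 2}, "004": {"x1": 50, "k": 3}}
processed_second = {"002": {"y1": 0.5, "k": 2}, "005": {"y1": 0.1, "k": 2}}
print(combine_processed(processed_first, processed_second))
```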

The information processing system 1 of the present embodiment has been described above. With this system, the coupling system 50 never obtains the individual feature data held in the first database 151 and the second database 351. Instead, it joins the per-cluster integrated feature data based on the first database 151 with the per-cluster integrated feature data based on the second database 351, producing a meaningful combined database 551 that corresponds to a combination of the two databases.

Converting individual feature data into cluster feature data (integrated feature data) helps protect personal information. The present technique therefore makes it relatively easy to obtain data even from database administrators who are reluctant to provide data for privacy reasons. Accordingly, the present embodiment can meaningfully combine data on consumers scattered across society into a useful combined database 551 while protecting personal information. The combined database 551 can, for example, be retrieved from the storage device 55 and used to analyze consumer behavior.

In addition, in the present embodiment the coupling system 50 controls, by providing the first and second cluster information 155 and 355, how the first database 151 is processed in the first data providing system 10 and how the second database 351 is processed in the second data providing system 30. Because of this control, the first data providing system 10 and the second data providing system 30 both supply, for the common members, integrated feature data for the same clusters, so the coupling system 50 can meaningfully join the integrated feature data cluster by cluster.

As a comparative example, consider a case in which the members of the first group and the members of the second group are clustered independently in the first data providing system and the second data providing system. In this case, the clusters covering the common members differ between the two systems. The coupling system of the comparative example would therefore build its combined database by joining per-cluster integrated feature data of the first group with per-cluster integrated feature data of the second group even though the joined clusters contain different members.

As this comparative example shows, the present embodiment, although it requires processing the first database 151 and the second database 351, can generate a combined database 551 that is more meaningful and valuable than the combined database of the comparative example.

Furthermore, in the comparative example, a situation may arise in which only a single member belongs to both of the two clusters corresponding to two pieces of integrated feature data that are combined with each other, in which case the combined data in effect describes that one individual. The likelihood of such a situation increases when three or more databases are linked. In the present embodiment, by contrast, even when many databases are linked, the clusters corresponding to the pieces of integrated feature data being linked consist of the same members, so this situation does not occur. Therefore, the present embodiment can also provide a combined database 551 that is superior from the standpoint of personal information protection.

[Second Embodiment]
Next, the information processing system 1 of the second embodiment will be described. The information processing system 1 of the second embodiment differs from that of the first embodiment in that the combining system 50 clusters the common members, the non-common members of the first group, and the non-common members of the second group using member lists with distance information provided by the first data providing system 10 and the second data providing system 30. In many other respects, the information processing system 1 of the second embodiment is configured in the same manner as that of the first embodiment.

Accordingly, the following description of the information processing system 1 of the second embodiment selectively describes the configurations that differ from the information processing system 1 of the first embodiment and omits description of components identical to it. Components given the same reference numerals as in the information processing system 1 of the first embodiment may be understood to have the same configuration as in the first embodiment unless otherwise described.

In the present embodiment, the processor 11 of the first data providing system 10 executes the member list generation process shown in FIG. 11 in S310 of the first data providing process (FIG. 5).

In this member list generation process, the processor 11 calculates the inter-member distance D1 for the members of the first group corresponding to the feature data to which the connector ID_C is attached in the first database 151 (S311).

The distance D1 is the distance between members in the feature space spanned by the elements x1, x2, ... of the feature data. It corresponds to the similarity between members with respect to the features defined by the elements x1, x2, ...: the smaller the distance D1, the more similar the corresponding members. The distance D1 may be, for example, the Euclidean distance. For example, the distance D1[i, j] between member i and member j can be calculated by the following equation, which is also shown in FIG. 11.

$$D1[i,j] = \Bigl\{ \sum_{n=1}^{N} \bigl( x_n[i] - x_n[j] \bigr)^{2} \Bigr\}^{1/2}$$
Here, the variable n takes values from 1 to N, where N is the number of elements in the feature data; xn denotes element x1 when n = 1, element x2 when n = 2, and so on. xn[i] and xn[j] are the values of element xn in the feature data of member i and member j, respectively, and all element values are assumed to be expressed numerically. The sum Σ(xn[i] − xn[j])² runs over n = 1 to n = N. In S311, the distance D1[i, j] is calculated for every combination of member i and member j.
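The S311 calculation can be sketched as follows. This is a minimal illustration, not the system's actual implementation: it assumes the feature data is available as a Python dict from a member key (for example the connector ID_C) to a list of numeric element values, and the function name `pairwise_distances` and its optional `elements` parameter are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(features, elements=None):
    """Return {(i, j): D[i, j]} for every unordered pair of members.

    features: maps a member key (e.g. a connector ID_C) to its numeric
              element values [x1, x2, ..., xN].
    elements: optional list of 0-based element indices to use; None means
              all N elements (a subset may be used instead).
    """
    distances = {}
    for i, j in combinations(sorted(features), 2):
        xi, xj = features[i], features[j]
        if len(xi) != len(xj):
            raise ValueError("all feature vectors must have the same length N")
        idx = range(len(xi)) if elements is None else elements
        # D[i, j] = {sum over n of (xn[i] - xn[j])^2}^(1/2)
        distances[(i, j)] = math.sqrt(sum((xi[n] - xj[n]) ** 2 for n in idx))
    return distances
```

Because D1[i, j] = D1[j, i], the sketch stores each unordered pair once; the number of pairs still grows quadratically with the number of members. The same function applies unchanged to the elements y1, y2, ... of the second database.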

Then, as in the first embodiment, the processor 11 generates a member list 1531 in which the members of the first group are represented by the connector ID_C; here the member list 1531 is annotated with the distances D1 calculated in S311 (S312). In the example shown in FIG. 11, the member list 1531 associates the connector of each member j with the distance D1[i, j] for each combination with any member i. In S320 (FIG. 5), the member list 1531 with distance information generated in this way is transmitted to the combining system 50.
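One possible in-memory layout for the member list 1531 with distance information is sketched below. The exact layout of FIG. 11 is not reproduced here, so the nested-dict structure and the name `member_list_with_distances` are assumptions for illustration. In this sketch only connectors and distances appear; the raw feature elements are not included.

```python
def member_list_with_distances(pair_distances):
    """Arrange pairwise distances as a member list keyed by connector ID_C.

    pair_distances: {(i, j): D1[i, j]} for unordered pairs of connectors.
    Returns {connector j: {connector i: D1[i, j]}}; because D1 is symmetric,
    each distance is recorded under both members.
    """
    member_list = {}
    for (i, j), d in pair_distances.items():
        member_list.setdefault(j, {})[i] = d
        member_list.setdefault(i, {})[j] = d
    return member_list
```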

Similarly, the processor 31 of the second data providing system 30 executes the member list generation process shown in FIG. 12 in S410 of the second data providing process (FIG. 6).

In this member list generation process, the processor 31 calculates the inter-member distance D2 for the members of the second group corresponding to the feature data to which the connector ID_C is attached in the second database 351 (S411).

The distance D2 is the distance between members in the feature space spanned by the elements y1, y2, ... of the feature data. It corresponds to the similarity between members with respect to the features defined by the elements y1, y2, ...: the smaller the distance D2, the more similar the corresponding members. The distance D2 may be, for example, the Euclidean distance. For example, the distance D2[i, j] between member i and member j can be calculated by the following equation, which is also shown in FIG. 12.

$$D2[i,j] = \Bigl\{ \sum_{n=1}^{N} \bigl( y_n[i] - y_n[j] \bigr)^{2} \Bigr\}^{1/2}$$
Here, the variable n takes values from 1 to N, where N is the number of elements in the feature data. yn[i] and yn[j] are the values of element yn in the feature data of member i and member j, respectively, and all element values are assumed to be expressed numerically. The sum Σ(yn[i] − yn[j])² runs over n = 1 to n = N. In S411, the distance D2[i, j] is calculated for every combination of member i and member j.

Then, as in the first embodiment, the processor 31 generates a member list 3531 in which the members of the second group are represented by the connector ID_C, annotated with the distances D2 calculated in S411. In the example shown in FIG. 12, the member list 3531 associates the connector of each member j with the distance D2[i, j] for each combination with any member i. In S420 (FIG. 6), the member list 3531 with distance information generated in this way is transmitted to the combining system 50.

Based on the member list 1531 received from the first data providing system 10 and the member list 3531 received from the second data providing system 30, the processor 51 of the combining system 50 executes the process shown in FIG. 13 in each of S133, S135, and S137 (see FIG. 4).

That is, in S133, the processor 51 calculates the inter-member distance D for the common members as the combined distance D = (D1² + D2²)^(1/2) of the distance D1 indicated by the member list 1531 and the distance D2 indicated by the member list 3531 (S610). This distance D corresponds to the similarity between members with respect to the features defined by the elements x1, x2, ..., y1, y2, .... The distance D = D[i, j] between member i and member j can be calculated by the following equation.

$$D[i,j] = \bigl( D1[i,j]^{2} + D2[i,j]^{2} \bigr)^{1/2}$$
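A minimal sketch of the S610 combination follows; the function name `combined_distance` is hypothetical. It treats a missing distance (`None`) as 0, which matches how the text handles D2 for first-group-only members and D1 for second-group-only members. When D1 and D2 are both Euclidean distances, squared distances add over disjoint sets of coordinates, so D equals the Euclidean distance over the concatenated elements x1, ..., y1, ...; the test checks exactly this.

```python
import math


def combined_distance(d1=None, d2=None):
    """Combined distance D = (D1^2 + D2^2)^(1/2).

    A distance that does not exist (None) is treated as 0, e.g. D2 for a
    member found only in the first group.
    """
    d1 = 0.0 if d1 is None else float(d1)
    d2 = 0.0 if d2 is None else float(d2)
    # math.hypot computes sqrt(d1**2 + d2**2) without intermediate overflow
    return math.hypot(d1, d2)
```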

Calculating the distance D does not make it possible to identify the absolute position of each member in the feature space corresponding to the elements x1, x2, ..., y1, y2, ..., but it does make it possible to identify the relative positions of the members, that is, their distribution.

Based on the distribution of the common members in the feature space identified from the distance D, the processor 51 divides the common members into a plurality of clusters by grouping members that are close in distance D, in other words members with similar features, a predetermined number at a time (S620). Each cluster thus consists of the predetermined number of members. This clustering can be performed using the well-known k-means method, or using other known clustering techniques.
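The text names the k-means method, which operates on coordinates and does not by itself guarantee clusters of a fixed size. Since the combining system 50 holds only pairwise distances, the sketch below swaps in a simple greedy heuristic that works directly from a distance function and produces clusters of a predetermined size. It is an illustrative alternative, not the method of the disclosure; the function name and the rule of merging leftover members into the last cluster are assumptions.

```python
def greedy_fixed_size_clusters(members, dist, size):
    """Split members into clusters of `size` members each (greedy, not k-means).

    members: list of unique member keys.
    dist:    function dist(a, b) returning the distance D between two members.
    size:    the predetermined number of members per cluster.

    Repeatedly takes the first unassigned member as a seed and groups it with
    its size - 1 nearest unassigned members. Leftover members (fewer than
    `size`) are merged into the last cluster so no cluster is smaller than
    `size`, unless there are fewer than `size` members in total.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    remaining = list(members)
    clusters = []
    while len(remaining) >= size:
        seed = remaining.pop(0)
        nearest = sorted(remaining, key=lambda m: dist(seed, m))[: size - 1]
        clusters.append([seed] + nearest)
        remaining = [m for m in remaining if m not in nearest]
    if remaining:
        if clusters:
            clusters[-1].extend(remaining)
        else:
            clusters.append(remaining)
    return clusters
```

Keeping every cluster at or above the predetermined size is a deliberate choice in this sketch, because a very small cluster would let its integrated feature data approach the data of an individual member.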

Similarly, in S135, the processor 51 calculates the inter-member distance D = (D1² + D2²)^(1/2) for the non-common members of the first group based on the distance D1 indicated by the member list 1531 (S610). Because no information corresponding to the distance D2 exists for the non-common members of the first group, the distance D can be calculated by treating D2 as 0. In other words, in S610 the distance D = D[i, j] between member i and member j can be set to D1[i, j].

Based on the distribution of the non-common members of the first group in the feature space identified from the distance D, the processor 51 divides the non-common members of the first group into a plurality of clusters by grouping members that are close in distance D, a predetermined number at a time (S620).

Similarly, in S137, the processor 51 calculates the inter-member distance D = (D1² + D2²)^(1/2) for the non-common members of the second group based on the distance D2 indicated by the member list 3531 (S610). Here, the distance D can be calculated by treating D1 as 0. In other words, in S610 the distance D = D[i, j] between member i and member j can be set to D2[i, j].

Based on the distribution of the non-common members of the second group in the feature space identified from the distance D, the processor 51 divides the non-common members of the second group into a plurality of clusters by grouping members that are close in distance D, a predetermined number at a time (S620).

After executing the above processing in the clustering process (S130), the processor 51 transmits the first cluster information 155 and the second cluster information 355 based on the results to the first data providing system 10 and the second data providing system 30, respectively (S140). Thereafter, the same processing as in the first embodiment is executed.

According to the present embodiment, the combining system 50 uses the information on the distance D to cluster the common members, the non-common members of the first group, and the non-common members of the second group so that members close in distance D (that is, with similar features) are grouped together. Therefore, compared with the first embodiment, in which clustering is performed without considering the distance D, a more meaningful combined database 551 can be generated. In other words, the feature data of multiple members can be integrated so that meaningful consumer information is not lost through statistical processing, and the combined database 551 can be generated from the resulting integrated feature data.

This concludes the description of the information processing system 1 of the second embodiment. Note that the distance D1 need not be calculated using all of the elements x1, x2, ... of the feature data in the first database 151; it may be calculated using only some of the elements. Similarly, the distance D2 may be calculated using only some of the elements y1, y2, ... of the feature data. Calculating the distances D1 and D2 from only some of the elements is useful for strengthening personal information protection.

[Third Embodiment]
Next, the information processing system 1 of the third embodiment will be described. Like the second embodiment, the information processing system 1 of the third embodiment is configured to cluster members based on distance information.

The following description of the information processing system 1 of the third embodiment selectively describes the configurations that differ from the information processing system 1 of the first embodiment and omits description of components identical to it. Components given the same reference numerals as in the information processing system 1 of the first embodiment may be understood to have the same configuration as in the first embodiment unless otherwise described.

In the present embodiment, when the processor 11 of the first data providing system 10 receives a request signal for the member list from the combining system 50, it executes the first data providing process shown in FIG. 14 instead of the process shown in FIG. 5. In this first data providing process, the processor 11 calculates an attribute value Z1 for each member of the first group corresponding to the feature data to which the connector ID_C is attached in the first database 151 (S710). Z1[i], shown in the right area of FIG. 14 and below, denotes the attribute value Z1 of member i.

The attribute value Z1[i] of member i is obtained, for example, by encoding a combination of basic attributes of member i (for example, demographic attributes) such as age, sex, area of residence, and occupation into a numerical value corresponding to that combination. For example, Z1[i] may be the hash value obtained by inputting the combination of basic attributes of member i into a hash function.
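The encoding of S710 can be sketched as below. The source only says that a hash function is used; this sketch assumes a keyed hash (HMAC-SHA-256 from Python's standard `hmac` module) with a secret key held only inside the first data providing system 10. The reason for that assumption: basic attribute combinations are few, so an unkeyed hash could be reversed by hashing every possible combination and comparing. The function name `attribute_value` and the example fields are hypothetical.

```python
import hashlib
import hmac
from typing import Sequence


def attribute_value(attributes: Sequence[str], key: bytes) -> str:
    """Encode a combination of basic attributes as an opaque attribute value Z.

    attributes: e.g. (age band, sex, area of residence, occupation); the
                choice and order of fields here is a hypothetical example.
                A one-element sequence hashes a single attribute only.
    key:        secret known only to the data provider (an assumption).
    """
    # A unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    message = "\x1f".join(attributes).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()
```

Members with the same attribute combination receive the same value Z, which is what allows a single conversion table entry to serve every pair of members sharing those combinations.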

However, the attribute value Z1 may instead be obtained by hashing only one of the basic attributes, for example only the member's area of residence. Hashing is performed to keep specific information about members' basic attributes secret from parties outside the first data providing system 10.

The information on the basic attributes may be extracted from the feature data in the first database 151, or may be acquired from a database that is stored in the storage device 15 separately from the first database 151 and holds membership information for the first group.

After calculating the attribute value Z1 of each member in S710, the processor 11 generates a member list 1532 in which the members of the first group are represented by the connector ID_C, annotated with the attribute values Z1 calculated in S710 (S720). In the example shown in the right area of FIG. 14, the member list 1532 is generated by associating the connector of each member i with the attribute value Z1[i] of member i.

Furthermore, the processor 11 creates a conversion table 1533 for converting a combination (Z1_p, Z1_q) of attribute values Z1 into a distance D1 (S730). As shown in the right area of FIG. 14, the conversion table 1533 lists, for each combination (Z1_p, Z1_q) of attribute values Z1, the distance D1 corresponding to that combination.

Using the conversion table 1533, when the attribute value Z1[i] of member i is Z1_p and the attribute value Z1[j] of member j is Z1_q, the distance D1 between member i and member j can be determined as the distance D1 associated with the corresponding combination (Z1_p, Z1_q) in the conversion table 1533. The distance D1 between member i and member j may be understood as the distance between the two members when they are placed in a feature space defined by the basic attributes. As in the second embodiment, this distance may be the Euclidean distance.

The processor 11 can generate the conversion table 1533 by calculating, for each combination (Z1_p, Z1_q) of attribute values Z1, the distance between the point in the feature space corresponding to the combination of basic attributes encoded by Z1_p and the point corresponding to the combination of basic attributes encoded by Z1_q, and recording it as the distance D1.
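The S730 table construction can be sketched as follows, assuming the data provider has a numeric encoding of each basic attribute combination (for example age in years and sex as 0 or 1); that encoding and the name `build_conversion_table` are assumptions for illustration. How the attribute axes are scaled determines which attribute dominates the distance, so the encoding is a design decision for the provider.

```python
import math
from itertools import combinations_with_replacement


def build_conversion_table(points_by_value):
    """Build {(Z_p, Z_q): distance} from attribute values to feature-space points.

    points_by_value: attribute value Z -> numeric coordinates of the basic
                     attribute combination it encodes (encoding is assumed).
    Both orderings of each pair and the pairs (Z, Z) are stored so that any
    two members' attribute values can be looked up directly.
    """
    table = {}
    for zp, zq in combinations_with_replacement(sorted(points_by_value), 2):
        d = math.dist(points_by_value[zp], points_by_value[zq])  # Euclidean
        table[(zp, zq)] = d
        table[(zq, zp)] = d
    return table
```

The table conveys distances between attribute combinations, while the combinations themselves stay behind the opaque values Z.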

The processor 11 transmits the generated member list 1532 and conversion table 1533 to the combining system 50 (S740). Thereafter, as in the first embodiment, the processor 11 receives the first cluster information 155 (S330), processes the first database 151 based on the first cluster information 155 to generate a processed first database 157 (S340), transmits the processed first database 157 to the combining system 50 (S350), and ends the first data providing process.

Similarly, when the processor 31 of the second data providing system 30 receives a request signal for the member list from the combining system 50, it executes the second data providing process shown in FIG. 15 instead of the process shown in FIG. 6. In this second data providing process, the processor 31 calculates the attribute value Z2 of each member of the second group in the same way as in S710 (S810). Z2[i], shown in the right area of FIG. 15 and below, denotes the attribute value Z2 of member i.

Like the attribute value Z1 described above, the attribute value Z2[i] of member i is obtained by encoding a combination of basic attributes of member i into a numerical value corresponding to that combination (for example, a hash value). The information on the basic attributes may be extracted from the feature data in the second database 351, or may be acquired from a database that is stored in the storage device 35 separately from the second database 351 and holds membership information for the second group.

After calculating the attribute value Z2 of each member in S810, the processor 31 generates a member list 3532 in which the members of the second group are represented by the connector ID_C, annotated with the attribute values Z2 calculated in S810 (S820). An example of the member list 3532 is shown in the right area of FIG. 15.

Furthermore, the processor 31 creates a conversion table 3533 for converting a combination (Z2_p, Z2_q) of attribute values Z2 into a distance D2 (S830). As shown in the right area of FIG. 15, the conversion table 3533 lists, for each combination (Z2_p, Z2_q) of attribute values Z2, the distance D2 corresponding to that combination.

The processor 31 can generate the conversion table 3533 by calculating, for each combination (Z2_p, Z2_q) of attribute values Z2, the distance between the point in the feature space corresponding to the combination of basic attributes encoded by Z2_p and the point corresponding to the combination of basic attributes encoded by Z2_q, and recording it as the distance D2.

The processor 31 transmits the generated member list 3532 and conversion table 3533 to the combining system 50 (S840). Thereafter, as in the first embodiment, the processor 31 receives the second cluster information 355 (S430), processes the second database 351 based on the second cluster information 355 to generate a processed second database 357 (S440), transmits the processed second database 357 to the combining system 50, and ends the second data providing process.

Based on the member list 1532 and conversion table 1533 received from the first data providing system 10 and the member list 3532 and conversion table 3533 received from the second data providing system 30, the processor 51 of the combining system 50 can execute the process shown in FIG. 16 in each of S133, S135, and S137 (see FIG. 4).

That is, in S133, the processor 51 calculates the inter-member distance D1 for the common members based on the member list 1532 and the conversion table 1533 (S910). It then calculates the inter-member distance D2 based on the member list 3532 and the conversion table 3533 (S920). Finally, from the calculated distances D1 and D2, it calculates the inter-member distance D = (D1² + D2²)^(1/2) (S930).
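Steps S910 to S930 can be sketched as two table lookups followed by the combination of the two distances. This assumes the member lists are dicts from connector ID_C to attribute value and that each conversion table contains both orderings of every pair of attribute values, including pairs of equal values; the function name is hypothetical.

```python
import math
from itertools import combinations


def combined_distances_from_tables(common, list1, table1, list2, table2):
    """Return {(a, b): D} for every unordered pair of common members a, b.

    list1, list2:   connector ID_C -> attribute value (Z1 or Z2), i.e. the
                    member lists 1532 and 3532.
    table1, table2: (Z_p, Z_q) -> distance, i.e. the conversion tables
                    1533 and 3533.
    """
    result = {}
    for a, b in combinations(sorted(common), 2):
        d1 = table1[(list1[a], list1[b])]    # S910: D1 via member list 1532
        d2 = table2[(list2[a], list2[b])]    # S920: D2 via member list 3532
        result[(a, b)] = math.hypot(d1, d2)  # S930: D = (D1^2 + D2^2)^(1/2)
    return result
```

In this arrangement the combining system 50 works only with opaque attribute values and distances; it never sees the basic attributes themselves.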

Thereafter, as in S620, the processor 51 divides the common members into a plurality of clusters based on the distribution of the common members in the feature space identified from the distance D, grouping members that are close in distance D, in other words members with similar features, a predetermined number at a time (S940).

In S135, the processor 51 likewise follows the procedure of FIG. 16 to divide the non-common members of the first group into a plurality of clusters, grouping members close in distance D a predetermined number at a time. In S137, it likewise divides the non-common members of the second group into a plurality of clusters in the same way. The distances D1 and D2 that cannot be determined in S135 and S137 are handled as in the second embodiment, that is, treated as 0.

This concludes the description of the information processing system 1 of the third embodiment. As in the second embodiment, the third embodiment performs clustering based on the distances between members in the feature space, and can therefore generate a meaningful combined database 551.

[Fourth Embodiment]
Next, the information processing system 1 of the fourth embodiment will be described. The following description of the information processing system 1 of the fourth embodiment selectively describes the configurations that differ from the information processing system 1 of the first embodiment and omits description of components identical to it. Components given the same reference numerals as in the information processing system 1 of the first embodiment may be understood to have the same configuration as in the first embodiment unless otherwise described.

In the information processing system 1 of the present embodiment, the combining system 50 stores in the storage device 55 a relation table 553 indicating the relationship between the customer numbers ID_A used in the first database 151 and the customer numbers ID_B used in the second database 351. The upper part of FIG. 17 conceptually shows the relation table 553.

That is, for each common member belonging to both the first group and the second group, the relation table 553 holds information associating that member's customer number ID_A with the member's customer number ID_B. The relation table 553 shown in the upper part of FIG. 17 also holds the connector ID_C of each member, but this information is optional and may be omitted.
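Identifying the common members from the relation table 553 (S131) can be sketched by indexing the table in both directions. The sketch assumes the table is a list of (ID_A, ID_B) pairs with a one-to-one correspondence, and rejects duplicates rather than guessing which mapping is correct; the function name and the error policy are assumptions.

```python
def index_relation_table(rows):
    """Index relation-table rows (id_a, id_b) for the common members.

    Returns (a_to_b, b_to_a). Raises ValueError if an ID_A or ID_B appears
    in more than one row, since a one-to-one mapping is assumed here.
    """
    a_to_b, b_to_a = {}, {}
    for id_a, id_b in rows:
        if id_a in a_to_b or id_b in b_to_a:
            raise ValueError(f"duplicate mapping for {id_a!r} / {id_b!r}")
        a_to_b[id_a] = id_b
        b_to_a[id_b] = id_a
    return a_to_b, b_to_a
```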

The relation table 553 may be generated based on information provided in advance by the administrators of the first database 151 and the second database 351, or based on information provided by another party. For example, a company that tracks user behavior on a network, such as access to web pages, may obtain through that tracking the customer numbers ID_A and ID_B that multiple companies assign to the same user, and the relation table 553 can be generated from information available from such a company. Alternatively, the relation table 553 may be generated based on information obtained by the combining system 50 itself tracking user behavior on the network.

In the present embodiment, the processor 51 of the combining system 50 executes the combining-related process shown in FIG. 18, instead of the process shown in FIG. 4, in response to an instruction from a user.

In this coupling-related process, the processor 51 neither requests nor receives member lists. Instead, in S1010, it performs a clustering process corresponding to S130. In this clustering process (S1010), the processor 51 identifies the common members by referring to the relation table 553 (S131), divides the common members into a plurality of clusters (S133), divides the non-common members of the first group into a plurality of clusters (S135), and divides the non-common members of the second group into a plurality of clusters (S137).
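A minimal Python sketch of the S1010 flow, for illustration only. The names `split_into_clusters`, `cluster_s1010`, and `min_size` are hypothetical, and simple order-based chunking into clusters of at least `min_size` members stands in for whatever division method S133 to S137 actually use, which this passage does not specify. Common members are carried as (ID_A, ID_B) pairs taken from the relation table 553.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_into_clusters(items: Sequence[T], min_size: int) -> List[List[T]]:
    """Chunk items in order into clusters of at least min_size members.

    A leftover chunk smaller than min_size is merged into the previous
    cluster; a lone cluster smaller than min_size is left as is.
    (Placeholder for the real division method of S133-S137.)
    """
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    clusters = [list(items[i:i + min_size]) for i in range(0, len(items), min_size)]
    if len(clusters) > 1 and len(clusters[-1]) < min_size:
        leftover = clusters.pop()
        clusters[-1].extend(leftover)
    return clusters

def cluster_s1010(
    relation_553: List[Tuple[str, str]],  # (ID_A, ID_B) per common member
    ids_a: List[str],                      # all first-group customer numbers
    ids_b: List[str],                      # all second-group customer numbers
    min_size: int,
):
    # S131: common members, as seen from each side of the relation table
    common_a = {a for a, _ in relation_553}
    common_b = {b for _, b in relation_553}
    pair_clusters = split_into_clusters(sorted(relation_553), min_size)     # S133
    a_only = split_into_clusters(sorted(set(ids_a) - common_a), min_size)  # S135
    b_only = split_into_clusters(sorted(set(ids_b) - common_b), min_size)  # S137
    return pair_clusters, a_only, b_only
```

Because common members stay paired, a cluster of common members can later be addressed by either customer number system.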

Next, in S1020, the processor 51 generates first cluster information 1554 and second cluster information 3554, transmits the first cluster information 1554 to the first data providing system 10, and transmits the second cluster information 3554 to the second data providing system 30.

As shown in the lower left area of FIG. 17, the first cluster information 1554 is generated by taking a member list of the first group, in which each member is represented by the first-group customer number ID_A used in the first database 151, and attaching to each member the number of the cluster to which it belongs. As shown in the lower right area of FIG. 17, the second cluster information 3554 is generated likewise from a member list of the second group, in which each member is represented by the second-group customer number ID_B used in the second database 351, by attaching to each member the number of the cluster to which it belongs.
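The first cluster information 1554 and second cluster information 3554 can be pictured as two mappings from customer number to cluster number. This is a hypothetical representation: the function name `build_cluster_info` and the numbering scheme (clusters of common members first, then the non-common clusters of each group) are assumptions. What it illustrates is that a cluster of common members carries the same number on both sides, which is what later allows the two processed databases to be joined.

```python
from typing import Dict, List, Tuple

def build_cluster_info(
    pair_clusters: List[List[Tuple[str, str]]],  # clusters of (ID_A, ID_B) pairs
    a_only_clusters: List[List[str]],            # non-common first-group clusters
    b_only_clusters: List[List[str]],            # non-common second-group clusters
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (first cluster info, second cluster info) as ID -> cluster number."""
    info_a: Dict[str, int] = {}
    info_b: Dict[str, int] = {}
    number = 0
    for cluster in pair_clusters:
        # A cluster of common members gets the same number on both sides.
        for id_a, id_b in cluster:
            info_a[id_a] = number
            info_b[id_b] = number
        number += 1
    for cluster in a_only_clusters:
        for id_a in cluster:
            info_a[id_a] = number  # appears only in the first cluster info
        number += 1
    for cluster in b_only_clusters:
        for id_b in cluster:
            info_b[id_b] = number  # appears only in the second cluster info
        number += 1
    return info_a, info_b
```

Neither mapping exposes the other side's customer numbers, so each data provider learns only cluster numbers for its own customers.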

In the subsequent S1030, the processor 51 receives the processed first database 157 from the first data providing system 10, and then receives the processed second database 357 from the second data providing system 30 (S1040). The processor 51 generates the combined database 551 by combining the processed first database 157 and the processed second database 357, and stores the combined database 551 in the storage device 55 (S1050). The process shown in FIG. 18 then ends.
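The combining step S1050 could look like the following sketch, which assumes each processed database is a mapping from cluster number to a dictionary of integrated feature values. The function `combine` and the `first.`/`second.` column prefixes are hypothetical. It performs an outer join on cluster number, so a cluster that exists on only one side (for example, one made of non-common members) keeps only that side's values. An inner join over shared cluster numbers would be an equally plausible design.

```python
from typing import Dict

FeatureRow = Dict[str, float]

def combine(
    processed_first: Dict[int, FeatureRow],   # processed first database 157
    processed_second: Dict[int, FeatureRow],  # processed second database 357
) -> Dict[int, FeatureRow]:
    """Outer-join the two processed databases on cluster number (sketch)."""
    combined: Dict[int, FeatureRow] = {}
    for cluster in sorted(processed_first.keys() | processed_second.keys()):
        row: FeatureRow = {}
        for prefix, source in (("first", processed_first), ("second", processed_second)):
            for name, value in source.get(cluster, {}).items():
                row[f"{prefix}.{name}"] = value  # prefix avoids name clashes
        combined[cluster] = row
    return combined
```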

The processor 11 of the first data providing system 10 skips the processes of S310 and S320 shown in FIG. 5. Upon receiving the first cluster information 1554 from the coupling system 50 (S330), it can generate and transmit the processed first database 157 (S340, S350) based on the information in the first cluster information 1554 that associates each customer number ID_A with a cluster number.

Likewise, the processor 31 of the second data providing system 30 skips the processes of S410 and S420 shown in FIG. 6. Upon receiving the second cluster information 3554 from the coupling system 50 (S430), it can generate and transmit the processed second database 357 (S440, S450) based on the information in the second cluster information 3554 that associates each customer number ID_B with a cluster number.
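On the data-provider side, S340 (and likewise S440) can be sketched as grouping each customer's feature data by the cluster number in the received cluster information, then averaging within each cluster. The arithmetic mean is an assumption chosen for illustration. This passage says only that the feature data are integrated per cluster by statistical processing. The name `build_processed_db` is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

FeatureRow = Dict[str, float]

def build_processed_db(
    feature_db: Dict[str, FeatureRow],  # customer number -> feature data
    cluster_info: Dict[str, int],       # customer number -> cluster number
) -> Dict[int, FeatureRow]:
    """Integrate feature data per cluster by averaging (sketch)."""
    grouped: Dict[int, List[FeatureRow]] = defaultdict(list)
    for customer_id, cluster in cluster_info.items():
        row = feature_db.get(customer_id)
        if row is not None:  # skip members without feature data
            grouped[cluster].append(row)
    processed: Dict[int, FeatureRow] = {}
    for cluster, rows in grouped.items():
        names = sorted({name for row in rows for name in row})
        # Average each feature over the members that actually have it.
        processed[cluster] = {
            name: mean(row[name] for row in rows if name in row) for name in names
        }
    return processed
```

Only the per-cluster aggregates leave the data provider; individual rows keyed by customer number stay in its own system.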

The information processing system 1 of the fourth embodiment has been described above. It achieves the same effects as the first embodiment.

[Fifth Embodiment]
Next, the information processing system 5 of the fifth embodiment will be described. Only the parts of its configuration that differ from the information processing system 1 of the first embodiment are described below, and descriptions of components identical to those of the first embodiment are omitted. Unless otherwise described, components given the same reference numerals as in the information processing system 1 of the first embodiment may be understood to have the same configuration as in the first embodiment.

As shown in FIG. 19, the information processing system 5 of the present embodiment is configured such that the function corresponding to the second data providing system 30 of the first embodiment is incorporated into a coupling system 80. Specifically, the information processing system 5 includes a data providing system 70, corresponding to the first data providing system 10 of the first embodiment, and a coupling system 80, corresponding to both the second data providing system 30 and the coupling system 50 of the first embodiment.

The data providing system 70 includes a processor 71, a memory 73, and a storage device 75. The storage device 75 stores the first database 151. The data providing system 70 may be understood to have the same configuration as the first data providing system 10 of the first embodiment, and the processing executed by the processor 71 may be understood to be basically the same as that executed by the processor 11 of the first embodiment.

The coupling system 80 is configured to communicate with the data providing system 70 through the network NT. The coupling system 80 includes a processor 81, a memory 83, and a storage device 85. The storage device 85 stores the second database 351, and also stores the combined database 551 once it has been generated by the processing executed by the processor 81.

When the user inputs an instruction to generate the combined database 551, the processor 81 executes the coupling-related process shown in FIG. 20 instead of the process shown in FIG. 4.

In this coupling-related process, the processor 81 transmits to the data providing system 70, through the network NT, a request signal requesting a member list (S1110), and acquires the member list 153 of the first group from the data providing system 70 (S1120).

The processor 81 then executes a clustering process based on the acquired member list 153 (S1130). In this clustering process (S1130), the processor 81 collates the member list 153 with the second database 351 and identifies the common members present in both the first group and the second group. It further identifies the members in the member list 153 of the first group other than the common members as the non-common members of the first group. In addition, it identifies the members of the second group that have feature data in the second database 351, other than the common members, as the non-common members of the second group.
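The collation in S1130 amounts to set operations on member identifiers. The sketch below assumes, as with the first embodiment's member lists, that both sides identify members by a shared key such as the connector ID_C. The function name `classify_members` is hypothetical.

```python
from typing import Iterable, List, Tuple

def classify_members(
    member_list_153: Iterable[str],  # first-group members from data providing system 70
    second_db_ids: Iterable[str],    # members with feature data in second database 351
) -> Tuple[List[str], List[str], List[str]]:
    """Return (common, first-group-only, second-group-only) member keys, sorted."""
    first = set(member_list_153)
    second = set(second_db_ids)
    common = sorted(first & second)
    first_only = sorted(first - second)   # non-common members of the first group
    second_only = sorted(second - first)  # non-common members of the second group
    return common, first_only, second_only
```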

The processor 81 divides the identified common members, the non-common members of the first group, and the non-common members of the second group each into a plurality of clusters, using the same method as the processes of S133, S135, and S137 in the first embodiment.

After finishing the clustering process in S1130 in this way, the processor 81 transmits the first cluster information 155 to the data providing system 70 (S1140), and acquires from the data providing system 70 the processed first database 157 generated based on the first cluster information 155 (S1150).

Furthermore, based on the result of the clustering process in S1130, the processor 81 integrates the feature data of the second database 351 in the storage device 85 by statistical processing for each cluster, generating integrated feature data for each cluster. In this way, the processor 81 generates the processed second database 357, which holds the integrated feature data of each cluster derived from the second database 351 (S1160).

The processor 81 then generates the combined database 551 by combining the processed first database 157 acquired from the data providing system 70 with the processed second database 357 that it generated itself, and stores the combined database 551 in the storage device 85 (S1170). The process shown in FIG. 20 then ends.

The combined database 551 generated by the coupling system 80 may be used by the company operating the coupling system 80 for consumer behavior analysis or for deciding advertisement delivery targets, or it may be provided to the company on the first database 151 side. As in the first embodiment, the technology of the information processing system 5 of the present embodiment makes it easier to obtain data from companies that are reluctant to provide data out of concern for personal information protection. It can also generate a combined database that is meaningful for consumer behavior analysis and the like, even though that database is built from statistically processed data.

The first to fifth embodiments have been described above, but the present disclosure is not limited to these embodiments and can take various forms. For example, the technical ideas of the second to fourth embodiments may be applied to the fifth embodiment. When the technical idea of the fourth embodiment is applied to the fifth embodiment, for example, the coupling system 80 can hold the relation table 553, and the processes of S1110 and S1120 in FIG. 20 can be omitted.

The connector ID_C need not be assigned to members in advance. It may instead be generated by encoding detailed personal information of each member held by the administrators of the first database 151 and the second database 351. The data providing systems 10 and 30 may have a function of storing such personal information and encoding it to generate the connector ID_C. Examples of such personal information include name, address, telephone number, and e-mail address. A hash function may be used for the encoding.

That is, the connector ID_C may be a hash value generated by inputting personal information into a hash function. If the personal information is encoded with a hash function shared by the first database 151 and the second database 351 to generate the connector ID_C, the same connector ID_C can be associated with both the feature data of the first database 151 and the feature data of the second database 351 for the same person. Using an irreversible hash value as the connector ID_C basically prevents personal information from leaking to the outside. The personal information converted into the hash value is not limited to the above examples, as long as it makes the connector ID_C approximately unique to each individual. Instead of the connector ID_C, part of the personal information, unencoded, may be provided to the coupling system together with the customer numbers ID_A and ID_B in order to identify the common members.
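A minimal sketch of generating the connector ID_C with Python's standard `hashlib` and `hmac` modules. The choice of fields and the normalization rules (Unicode NFKC, case folding, whitespace removal) are assumptions. Both data holders must apply exactly the same normalization, or the same person will hash to different values. The optional keyed variant (HMAC-SHA-256 with a secret shared only by the data holders) is a design choice not stated in the source. It is one way to resist dictionary attacks on low-entropy fields such as telephone numbers, which a plain unsalted hash does not prevent.

```python
import hashlib
import hmac
import unicodedata
from typing import Optional

def normalize(value: str) -> str:
    # NFKC folds full-width characters to standard forms, casefold() removes
    # case differences, and split/join drops all whitespace.
    return "".join(unicodedata.normalize("NFKC", value).casefold().split())

def connector_id(name: str, email: str, phone: str, key: Optional[bytes] = None) -> str:
    """Hash normalized personal information into a hex connector ID (sketch)."""
    material = "|".join(normalize(v) for v in (name, email, phone)).encode("utf-8")
    if key is None:
        return hashlib.sha256(material).hexdigest()  # plain SHA-256
    return hmac.new(key, material, hashlib.sha256).hexdigest()  # keyed HMAC-SHA-256
```

Both variants yield a 64-character hex string, and either can serve as the shared join key as long as both data providing systems 10 and 30 use the same variant and the same key.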

In the above embodiments, the plurality of members common to the first group and the second group are divided into a plurality of clusters. Alternatively, a plurality of pairs, each consisting of a member of the first group and a member of the second group, may be divided into a plurality of clusters, and the combined database 551 may be generated based on that clustering result. Each such pair may be a pair of members, one from each group, that at least correspond to each other. For example, it may be a pair of members presumed to be the same entity, or highly likely to be the same entity.

For example, in the fourth embodiment, the relation table 553 holds information associating the customer number ID_A and the customer number ID_B of each common member. The relation table 553 may instead be a table that indicates, by associating a customer number ID_A with a customer number ID_B, pairs of a first-group member and a second-group member that at least correspond to each other. For example, the relation table 553 may indicate in this way pairs of a first-group member and a second-group member that are presumed to be identical.

Such a relation table 553 can be generated, for example, by collating cookie lists. As is well known, cookies are used to identify persons who access web pages. By tracking consumer behavior on the network, a cookie list associated with each customer number ID_A and a cookie list associated with each customer number ID_B can be generated. If the cookie list associated with a customer number ID_A and the cookie list associated with a customer number ID_B match to a high degree, the first-group member corresponding to that ID_A and the second-group member corresponding to that ID_B are likely to be the same consumer.

Therefore, by presuming that a first-group member and a second-group member whose cookie lists match to at least a reference degree are the same person, a relation table 553 can be generated that indicates, by associating customer numbers ID_A and ID_B, pairs of first-group and second-group members presumed to be identical.
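The cookie-list collation can be sketched with Jaccard similarity as the degree of coincidence and a greedy one-to-one assignment. Both the similarity measure and the greedy matching are assumptions. The source specifies only that pairs whose coincidence is at or above a reference are presumed to be the same person. `estimate_pairs` and `threshold` are hypothetical names, and `threshold` should be greater than zero so that pairs sharing no cookies are never matched.

```python
from typing import Dict, List, Set, Tuple

def jaccard(x: Set[str], y: Set[str]) -> float:
    # |x & y| / |x | y|; defined as 0.0 when both sets are empty
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def estimate_pairs(
    cookies_a: Dict[str, Set[str]],  # customer number ID_A -> cookie list
    cookies_b: Dict[str, Set[str]],  # customer number ID_B -> cookie list
    threshold: float,
) -> List[Tuple[str, str]]:
    """Return (ID_A, ID_B) pairs presumed to be the same consumer."""
    scored = sorted(
        ((jaccard(ca, cb), id_a, id_b)
         for id_a, ca in cookies_a.items()
         for id_b, cb in cookies_b.items()),
        reverse=True,  # highest coincidence first
    )
    used_a: Set[str] = set()
    used_b: Set[str] = set()
    pairs: List[Tuple[str, str]] = []
    for score, id_a, id_b in scored:
        if score < threshold:
            break  # all remaining scores are lower
        if id_a in used_a or id_b in used_b:
            continue  # keep the relation table one-to-one
        used_a.add(id_a)
        used_b.add(id_b)
        pairs.append((id_a, id_b))
    return sorted(pairs)
```

The greedy pass is simple but not globally optimal. An assignment algorithm would be the more thorough choice if many cookie lists overlap.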

The techniques of the second and third embodiments, which perform clustering based on distance information, can also be applied to an information processing system 1 that uses such a relation table 553. In this case, the first data providing system 10 can transmit to the coupling system 50, in place of the member lists 1531 and 1532 shown in FIGS. 11 and 14, member lists that represent each member by the customer number ID_A instead of the connector ID_C. Similarly, the second data providing system 30 can transmit to the coupling system 50, in place of the member lists 3531 and 3532 shown in FIGS. 12 and 15, member lists that represent each member by the customer number ID_B.

Although the above embodiments describe an example of processing and combining two databases, the techniques of the above embodiments can of course also be applied when processing and combining three or more databases. The present disclosure may therefore be used to process and combine three or more databases. In this case, a plurality of databases may be combined around one central database, or a plurality of databases may be combined in series.

The databases may hold feature data of a group whose members are objects and/or places associated with consumers. In recent years, consumer behavior has become closely tied to portable terminals such as smartphones. Accordingly, the first database 151 and the second database 351 may hold feature data for each portable terminal corresponding to a consumer.

Furthermore, the combined database 551 may be configured as a database holding reference information to the processed first database 157 and the processed second database 357. That is, the combined database 551 need not hold the integrated feature data itself. Instead, the combined data of each cluster may consist of link information or address information pointing to the integrated feature data held in the processed first database 157 and the processed second database 357.
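A combined database 551 that holds references rather than copies could be sketched as follows. The `Ref` type, the store names `processed_first_157` and `processed_second_357`, and the functions `build_reference_db` and `resolve` are all hypothetical. They stand in for whatever link or address information an implementation would actually use.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Store = Dict[int, Dict[str, float]]  # cluster number -> integrated feature data

@dataclass(frozen=True)
class Ref:
    """Pointer to one cluster's row in a processed database (hypothetical)."""
    database: str  # key of the processed database
    cluster: int

def build_reference_db(clusters: Iterable[int]) -> Dict[int, Tuple[Ref, Ref]]:
    # Each cluster's combined data is a pair of references, not feature values.
    return {
        c: (Ref("processed_first_157", c), Ref("processed_second_357", c))
        for c in clusters
    }

def resolve(ref: Ref, stores: Dict[str, Store]) -> Dict[str, float]:
    # Follow the reference to the integrated feature data it points at.
    return stores[ref.database][ref.cluster]
```

The trade-off is that reading a combined row requires the referenced processed databases to remain available, in exchange for not duplicating their contents.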

In the second embodiment, clustering is performed using both the distance D1 and the distance D2, but only one of them may be used. In this case, the combined distance D may be calculated by treating the unused distance as zero, and the unused distance need not be provided from the data providing systems 10 and 30 to the coupling system 50. Similarly, for the attribute values Z1 and Z2 used instead of the distances D1 and D2 in the third embodiment, a modification that uses only one of them is conceivable.

A function of one component in the above embodiments may be distributed among a plurality of components, and the functions of a plurality of components may be integrated into one component. Part of the configuration of the above embodiments may be omitted. At least part of the configuration of one of the above embodiments may be added to, or may replace, the configuration of another. Every aspect included in the technical idea specified by the wording of the claims is an embodiment of the present disclosure.

The correspondence between terms is as follows. The processes of S130, S1010, and S1130 executed by the processors 51 and 81 of the coupling systems 50 and 80 correspond to an example of the processing realized by the clustering unit. The processes of S150, S1030, and S1150 executed by the processors 51 and 81 correspond to an example of the processing realized by the first acquisition unit. The processes of S160, S1040, and S1160 executed by the processors 51 and 81, together with the process in which the processor 81, in S1170, reads out the processed second database 357 generated in S1160, correspond to an example of the processing realized by the second acquisition unit. The processes of S170, S1050, and S1170 executed by the processors 51 and 81 correspond to an example of the processing realized by the combining unit. The process of S340 executed by the processor 11 of the first data providing system 10 corresponds to an example of the processing realized by the first generation unit. The process of S440 executed by the processor 31 of the second data providing system 30 and the process of S1160 executed by the processor 81 of the coupling system 80 correspond to an example of the processing realized by the second generation unit.

Reference signs: 1 ... information processing system, 5 ... information processing system, 10 ... first data providing system, 11 ... processor, 13 ... memory, 15 ... storage device, 30 ... second data providing system, 31 ... processor, 33 ... memory, 35 ... storage device, 50 ... coupling system, 51 ... processor, 53 ... memory, 55 ... storage device, 70 ... data providing system, 71 ... processor, 73 ... memory, 75 ... storage device, 80 ... coupling system, 81 ... processor, 83 ... memory, 85 ... storage device, 151 ... first database, 153 ... member list, 155 ... first cluster information, 157 ... processed first database, 351 ... second database, 353 ... member list, 355 ... second cluster information, 357 ... processed second database, 551 ... combined database, 553 ... relation table, 1531 ... member list, 1532 ... member list, 1533 ... conversion table, 1554 ... first cluster information, 3531 ... member list, 3532 ... member list, 3533 ... conversion table, 3554 ... second cluster information, NT ... network.

Claims

An information processing system that generates a new database based on a first database and a second database, wherein the first database has, for each component of a first group, feature data representing a first feature of that component, and the second database has, for each component of a second group, feature data representing a second feature of that component, the information processing system comprising:
a clustering unit that divides a plurality of component pairs into a plurality of clusters, each component pair being a pair of one component of the first group and one component of the second group that at least correspond to each other, and that provides cluster information representing the cluster to which each of the component pairs belongs;
a first acquisition unit that acquires first integrated feature data for each cluster from a first generation unit, the first generation unit generating the first integrated feature data as integrated feature data for each cluster by integrating, by statistical processing for each cluster and based on the cluster information acquired from the clustering unit, the feature data in the first database corresponding to the plurality of component pairs;
a second acquisition unit that acquires second integrated feature data for each cluster from a second generation unit, the second generation unit generating the second integrated feature data as integrated feature data for each cluster by integrating, by statistical processing for each cluster and based on the cluster information acquired from the clustering unit, the feature data in the second database corresponding to the plurality of component pairs; and
a combining unit that generates as the new database, based on the first integrated feature data for each cluster acquired by the first acquisition unit and the second integrated feature data for each cluster acquired by the second acquisition unit, a combined database including, for each cluster, combined data obtained by combining the first integrated feature data and the second integrated feature data of that cluster.

The information processing system according to claim 1, wherein
the components of the first and second groups are consumers, the first database has, for each consumer of the first group, feature data representing a first feature of that consumer, and the second database has, for each consumer of the second group, feature data representing a second feature of that consumer.

The information processing system according to claim 1 or 2, wherein
each component of the first group is assigned its own first identification code, and the first database stores the feature data of each component of the first group in association with the first identification code of that component;
each component of the second group is assigned its own second identification code, and the second database stores the feature data of each component of the second group in association with the second identification code of that component; and
the clustering unit identifies the plurality of component pairs based on information indicating the correspondence between the first identification codes and the second identification codes, divides the plurality of component pairs into the plurality of clusters, and, as the cluster information, provides the first generation unit with cluster information representing the cluster to which each of the plurality of component pairs belongs in association with the first identification code, and provides the second generation unit with cluster information representing the cluster to which each of the plurality of component pairs belongs in association with the second identification code.

The information processing system according to any one of claims 1 to 3, wherein
each of the plurality of component pairs is a pair of components presumed to be the same entity.

The information processing system according to claim 1 or 2, wherein
the first and second databases each store the feature data of each component in association with the identification code of that component, using identification codes common to the first database and the second database; and
the clustering unit divides, as the plurality of component pairs, a plurality of component pairs corresponding to pairs of feature data associated with the same identification code in the first database and the second database into the plurality of clusters, and provides the first and second generation units, as the cluster information, with cluster information representing the cluster to which each of the plurality of component pairs belongs in association with the identification code.

The information processing system according to any one of claims 1 to 5, wherein
the clustering unit divides the plurality of component pairs into the plurality of clusters based on the degree of similarity among the plurality of component pairs.

The information processing system according to claim 6, wherein
the clustering unit acquires similarity information from which the similarity of at least one of the first and second features among the plurality of component pairs can be determined, and, based on the acquired similarity information, divides the plurality of component pairs into the plurality of clusters such that component pairs similar in at least one of the first and second features are grouped together.

The information processing system according to claim 6 or 7, wherein
the first generation unit provides the clustering unit with a list of a plurality of components belonging to the first group, the list representing the similarity among those components with respect to the first feature;
the second generation unit provides the clustering unit with a list of a plurality of components belonging to the second group, the list representing the similarity among those components with respect to the second feature; and
the clustering unit, based on the lists acquired from the first generation unit and the second generation unit, divides the plurality of component pairs into the plurality of clusters such that component pairs similar in the first and second features are grouped together.

The information processing system according to claim 6, wherein
The first generation unit provides the clustering unit with a list of a plurality of components belonging to the first group, the list including an attribute value of each of the components,
The clustering unit determines the similarity between the plurality of component pairs based on the attribute values, and divides the plurality of component pairs into the plurality of clusters based on the determined similarity.

The information processing system according to claim 6, wherein
The first generation unit provides the clustering unit with a list of a plurality of components belonging to the first group, the list including a first attribute value of each of the components,
The second generation unit provides the clustering unit with a list of a plurality of components belonging to the second group, the list including a second attribute value of each of the components,
The clustering unit determines the similarity between the plurality of component pairs based on the first and second attribute values, and divides the plurality of component pairs into the plurality of clusters based on the determined similarity.
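When the similarity is judged from attribute values supplied in the two lists, the simplest possible rule treats two component pairs as similar only when both their first and second attribute values match. The sketch below assumes exactly that rule; the attribute examples (an age band and a region) and the name `cluster_by_attributes` are hypothetical.

```python
from typing import Dict, Hashable, Tuple


def cluster_by_attributes(first_attrs: Dict[str, Hashable],
                          second_attrs: Dict[str, Hashable]) -> Dict[str, int]:
    """Group component pairs whose (first, second) attribute values are identical.

    first_attrs: identification code -> first attribute value (first group's list)
    second_attrs: identification code -> second attribute value (second group's list)
    Cluster numbers are assigned in order of first appearance.
    """
    common = sorted(set(first_attrs) & set(second_attrs))
    key_to_cluster: Dict[Tuple[Hashable, Hashable], int] = {}
    cluster_info: Dict[str, int] = {}
    for pair_id in common:
        key = (first_attrs[pair_id], second_attrs[pair_id])
        if key not in key_to_cluster:
            key_to_cluster[key] = len(key_to_cluster)
        cluster_info[pair_id] = key_to_cluster[key]
    return cluster_info


# Hypothetical attribute lists from the first and second generation units.
first_list = {"u1": "20s", "u2": "20s", "u3": "30s"}
second_list = {"u1": "Tokyo", "u2": "Tokyo", "u3": "Tokyo"}
info = cluster_by_attributes(first_list, second_list)
```

Exact matching is only a starting point; a graded similarity (for example, treating neighboring age bands as close) would plug into the same structure.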

The information processing system according to any one of claims 1 to 10, wherein
The first generation unit and the first database are provided in a first external system, the second generation unit and the second database are provided in a second external system independent of the first external system, and the information processing system is configured to communicate with the first and second external systems.

An information processing system capable of communicating with first and second external systems, wherein
The first external system comprises a first database having, for each component of a first group, feature data representing a first feature of the component,
The second external system comprises a second database having, for each component of a second group, feature data representing a second feature of the component,
A clustering unit that acquires a list of a plurality of components belonging to the first group from the first external system and a list of a plurality of components belonging to the second group from the second external system; divides, based on the acquired lists, a plurality of component pairs into a plurality of clusters, each component pair being a pair of components between the first group and the second group whose two components at least correspond to each other; and provides the first and second external systems with cluster information representing the cluster to which each of the plurality of component pairs belongs;
A first acquisition unit that acquires first integrated feature data for each cluster from the first external system, the first external system generating the first integrated feature data by integrating, through statistical processing for each cluster, the feature data in the first database corresponding to the plurality of component pairs on the basis of the cluster information acquired from the clustering unit;
A second acquisition unit that acquires second integrated feature data for each cluster from the second external system, the second external system generating the second integrated feature data by integrating, through statistical processing for each cluster, the feature data in the second database corresponding to the plurality of component pairs on the basis of the cluster information acquired from the clustering unit;
A combining unit that generates, based on the first integrated feature data for each cluster acquired by the first acquisition unit and the second integrated feature data for each cluster acquired by the second acquisition unit, a combined database including, for each cluster, combined data obtained by combining the first integrated feature data and the second integrated feature data of the same cluster;
An information processing system comprising:
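The integrate-then-combine flow of the claim above can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the mean is used as the "statistical processing", each feature is a single number, and the names `integrate` and `combine` and the sample values (a purchase count and a media-contact count) are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def integrate(db: Dict[str, float], cluster_info: Dict[str, int]) -> Dict[int, float]:
    """Run inside one external system: average the feature data of each cluster.

    Only per-cluster statistics leave the external system, not individual records.
    """
    groups: Dict[int, List[float]] = defaultdict(list)
    for pair_id, cluster_id in cluster_info.items():
        if pair_id in db:
            groups[cluster_id].append(db[pair_id])
    return {cluster_id: mean(values) for cluster_id, values in groups.items()}


def combine(first: Dict[int, float],
            second: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Join the two integrated datasets on cluster number (the combining unit)."""
    return {cid: (first[cid], second[cid]) for cid in sorted(first.keys() & second.keys())}


# Hypothetical cluster information and per-component feature data.
cluster_info = {"u1": 0, "u2": 0, "u3": 1}
first_db = {"u1": 2.0, "u2": 4.0, "u3": 10.0}   # e.g. purchase counts
second_db = {"u1": 1.0, "u2": 3.0, "u3": 5.0}   # e.g. media-contact counts
combined_db = combine(integrate(first_db, cluster_info),
                      integrate(second_db, cluster_info))
```

Joining on the cluster number rather than on individual identification codes is what lets the combining unit build the combined database from aggregated data alone.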

An information processing system configured to communicate with an external system, wherein
The external system comprises a first database having, for each component of a first group, feature data representing a first feature of the component,
The information processing system comprises a second database having, for each component of a second group, feature data representing a second feature of the component,
The information processing system further includes:
A clustering unit that divides a plurality of component pairs into a plurality of clusters, each component pair being a pair of components between the first group and the second group whose two components at least correspond to each other, and provides the external system with cluster information representing the cluster to which each of the plurality of component pairs belongs;
An acquisition unit that acquires first integrated feature data for each cluster from the external system, the external system generating the first integrated feature data by integrating, through statistical processing for each cluster, the feature data in the first database corresponding to the plurality of component pairs on the basis of the cluster information received from the clustering unit;
A generation unit that generates second integrated feature data for each cluster by integrating, through statistical processing for each cluster, the feature data in the second database corresponding to the plurality of component pairs based on the cluster information;
A combining unit that creates, based on the first integrated feature data for each cluster acquired by the acquisition unit and the second integrated feature data for each cluster generated by the generation unit, a combined database having, for each cluster, combined data obtained by combining the first integrated feature data and the second integrated feature data of the same cluster;
An information processing system comprising:

An information processing system for generating a new database based on first and second databases, wherein the first database has, for each component of a first group, feature data representing a first feature of the component, the second database has, for each component of a second group, feature data representing a second feature of the component, and the first and second groups at least partially include a plurality of common components, the system having:
A clustering unit that divides the plurality of common components into a plurality of clusters and provides cluster information representing the cluster to which each of the plurality of common components belongs;
A first generation unit that, based on the cluster information acquired from the clustering unit, integrates the feature data of the plurality of common components in the first database through statistical processing for each cluster, thereby generating first integrated feature data as the integrated feature data for each cluster; and a first acquisition unit that acquires the first integrated feature data for each cluster from the first generation unit;
A second generation unit that, based on the cluster information acquired from the clustering unit, integrates the feature data of the plurality of common components in the second database through statistical processing for each cluster, thereby generating second integrated feature data as the integrated feature data for each cluster; and a second acquisition unit that acquires the second integrated feature data for each cluster from the second generation unit;
A combining unit that generates, as the new database, based on the first integrated feature data for each cluster acquired by the first acquisition unit and the second integrated feature data for each cluster acquired by the second acquisition unit, a combined database including, for each cluster, combined data obtained by combining the first integrated feature data and the second integrated feature data of that cluster;
An information processing system comprising:

A program for causing a computer to function as the clustering unit, the first acquisition unit, the second acquisition unit, and the combining unit included in the information processing system according to any one of claims 1 to 12 and claim 14.

An information processing method for generating a new database based on first and second databases,
The first database has, for each component of a first group, feature data representing a first feature of the component, and the second database has, for each component of a second group, feature data representing a second feature of the component,
The method is
A clustering procedure that divides a plurality of component pairs into a plurality of clusters, each component pair being a pair of components between the first group and the second group whose two components at least correspond to each other, and provides cluster information representing the cluster to which each of the plurality of component pairs belongs;
A first acquisition procedure for acquiring the first integrated feature data for each cluster from a device that, based on the cluster information provided by the clustering procedure, integrates the feature data in the first database corresponding to the plurality of component pairs through statistical processing for each cluster, thereby generating the first integrated feature data as the integrated feature data for each cluster;
A second acquisition procedure for acquiring the second integrated feature data for each cluster from a device that, based on the cluster information provided by the clustering procedure, integrates the feature data in the second database corresponding to the plurality of component pairs through statistical processing for each cluster, thereby generating the second integrated feature data as the integrated feature data for each cluster;
A combining procedure for creating, as the new database, based on the first integrated feature data for each cluster acquired by the first acquisition procedure and the second integrated feature data for each cluster acquired by the second acquisition procedure, a combined database having, for each cluster, combined data that combines the first integrated feature data and the second integrated feature data of the same cluster;
Information processing method including:

An information processing method for generating a new database based on first and second databases,
The first database has, for each component of a first group, feature data representing a first feature of the component, the second database has, for each component of a second group, feature data representing a second feature of the component, and the first and second groups at least partially include a plurality of common components,
The method is
A clustering procedure for dividing the plurality of common components into a plurality of clusters and providing cluster information representing the cluster to which each of the plurality of common components belongs;
A first acquisition procedure for acquiring the first integrated feature data for each cluster from a device that, based on the cluster information provided by the clustering procedure, integrates the feature data of the plurality of common components in the first database through statistical processing for each cluster, thereby generating the first integrated feature data as the integrated feature data for each cluster;
A second acquisition procedure for acquiring the second integrated feature data for each cluster from a device that, based on the cluster information provided by the clustering procedure, integrates the feature data of the plurality of common components in the second database through statistical processing for each cluster, thereby generating the second integrated feature data as the integrated feature data for each cluster;
A combining procedure for creating, as the new database, based on the first integrated feature data for each cluster acquired by the first acquisition procedure and the second integrated feature data for each cluster acquired by the second acquisition procedure, a combined database having, for each cluster, combined data that combines the first integrated feature data and the second integrated feature data of the same cluster;
Information processing method including: