JP2019175419A - Data providing system, data providing method, and computer program

Info

Publication number: JP2019175419A
Application number: JP2018180220A
Authority: JP
Inventors: 龍道本; Ryu Domoto
Original assignee: Hakuhodo DY Holdings Inc
Current assignee: Hakuhodo DY Holdings Inc
Priority date: 2018-09-26
Filing date: 2018-09-26
Publication date: 2019-10-10
Anticipated expiration: 2038-03-27
Also published as: JP6535128B1

Abstract

To provide a new database technique. SOLUTION: A method according to one aspect of the present disclosure includes combining a group of first feature data and a group of second feature data. Each piece of the first feature data represents features of one or more corresponding first constituents among a plurality of first constituents included in a first group, and is associated with a first identifier, which is an identifier of the one or more corresponding first constituents. Each piece of the second feature data corresponds to one of a plurality of clusters in a second group. The second feature data includes statistical data associated with a second identifier, which is an identifier of the two or more second constituents included in the corresponding cluster. The statistical data represents, as a statistic, a feature of the two or more second constituents included in the corresponding cluster. The combining (S150) is performed on the basis of the first identifier and the second identifier. SELECTED DRAWING: Figure 3

Description

The present disclosure relates to an information processing system, a data providing system, and related methods.

Conventionally, customers' purchasing behavior has been analyzed based on product sales data. Customers' exposure to mass media and network content has also been analyzed for use in commercial activities. A wide range of information, such as customers' purchasing behavior, exposure to mass media and network content, and lifestyles, has also been collected through questionnaires and face-to-face interviews.

In recent years, companies have come to hold huge databases containing such customer data. However, mainly for reasons of personal information protection, companies are reluctant to provide this customer data to outside parties. When such data is provided externally by the company that holds it, it is provided in encrypted form, with information that could identify customers largely removed, or after being altered so as to intentionally contain errors (noise) (see Patent Document 1).

Patent Document 1: JP 2014-109647 A

As described above, the provision of customer data by data-holding companies has conventionally been limited from the viewpoint of personal information protection.

Accordingly, in one aspect of the present disclosure, it is desirable to provide a new technique for data provision and data processing that takes personal information protection into account.

An information processing system according to one aspect of the present disclosure includes a storage unit, an acquisition unit, and a combining unit. The storage unit is configured to store a group of first feature data relating to a first group. The acquisition unit is configured to acquire a group of second feature data relating to a second group. The first group includes a plurality of first constituents. The second group includes a plurality of second constituents.

Each piece of the first feature data represents features of one or more corresponding first constituents. Each piece of the first feature data is associated with a first identifier, which is an identifier of the one or more corresponding first constituents.

Each piece of the second feature data corresponds to one of a plurality of clusters in the second group. Each of the plurality of clusters includes two or more of the plurality of second constituents. Each piece of the second feature data includes statistical data that represents, as statistics, the features of the two or more second constituents included in the corresponding cluster. The statistical data is associated with a second identifier, which is an identifier of the two or more second constituents included in the corresponding cluster.

The combining unit is configured to combine the group of first feature data with the group of second feature data such that, based on the first identifier associated with each piece of first feature data, each piece of first feature data is combined with the statistical data of the second feature data associated with the corresponding second identifier.
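The combining described above can be illustrated with a minimal Python sketch. The sketch is not part of the disclosure. It assumes, for simplicity, that a first constituent and its corresponding second constituent carry the same identifier; the identifiers, the parameter names (X1, X2, Y1_mean), and the function name combine are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical first feature data: first identifier -> individual features.
first_features: Dict[str, Dict[str, float]] = {
    "A001": {"X1": 1.0, "X2": 0.0},
    "A002": {"X1": 3.0, "X2": 1.0},
    "A003": {"X1": 2.0, "X2": 1.0},
}

# Hypothetical second feature data: one record per cluster, holding the
# second identifiers of the cluster members and the cluster statistics.
second_features: List[Tuple[List[str], Dict[str, float]]] = [
    (["A001", "A002"], {"Y1_mean": 10.0}),
    (["A003", "A004"], {"Y1_mean": 20.0}),
]


def combine(first, second):
    """Join each piece of first feature data with its cluster's statistics."""
    # Index the cluster statistics by every member identifier.
    stats_by_id = {}
    for member_ids, stats in second:
        for member_id in member_ids:
            stats_by_id[member_id] = stats
    # Assumption of this sketch: matching identifiers denote corresponding
    # constituents. Constituents without a matching cluster are left out.
    combined = {}
    for first_id, features in first.items():
        stats = stats_by_id.get(first_id)
        if stats is not None:
            combined[first_id] = {**features, **stats}
    return combined


combined = combine(first_features, second_features)
```

In this sketch every member of a cluster receives the same statistics, so the combined record reveals only cluster-level information about the second group.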

According to this information processing system, the second feature data relating to the second group is acquired as information-protected statistical data, yet the constituents to which the statistical data corresponds can be specifically identified based on the second identifier, and the statistical data can be combined with the first feature data of the corresponding constituents. Therefore, according to one aspect of the present disclosure, it is possible to realize meaningful data combination that conforms to the correspondence between the constituents of the first group and the constituents of the second group while protecting personal information. In this way, according to one aspect of the present disclosure, a meaningful data combining technique can be provided.

According to one aspect of the present disclosure, the acquisition unit may request a data providing system, which holds a plurality of pieces of individual feature data corresponding respectively to the plurality of second constituents, to cluster the plurality of second constituents into the plurality of clusters in accordance with a specified constraint condition, and may acquire from the data providing system the group of second feature data corresponding to the plurality of clusters that satisfy the constraint condition.

The data providing system may be configured to cluster the plurality of second constituents into the plurality of clusters in accordance with the constraint condition and to provide the group of second feature data corresponding to the plurality of clusters to the information processing system. According to one aspect of the present disclosure, the statistical data may be generated by the data providing system converting into statistics the features of the two or more second constituents indicated by the individual feature data of the two or more second constituents included in the corresponding cluster.
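The conversion of individual feature data into statistical data can be sketched as follows. This is an illustrative sketch only. The disclosure does not fix which statistic is used; the arithmetic mean is assumed here, and the identifiers, the parameter names (Y1, Y2), and the function name to_cluster_feature_data are hypothetical.

```python
from statistics import mean

# Hypothetical individual feature data held by the data providing system.
individual = {
    "B1": {"Y1": 4.0, "Y2": 1.0},
    "B2": {"Y1": 6.0, "Y2": 0.0},
    "B3": {"Y1": 1.0, "Y2": 1.0},
    "B4": {"Y1": 3.0, "Y2": 1.0},
}

# Hypothetical clustering result: each cluster lists its member identifiers.
clusters = [["B1", "B2"], ["B3", "B4"]]


def to_cluster_feature_data(individual, clusters):
    """Replace the members' individual features with per-cluster means."""
    result = []
    for member_ids in clusters:
        # Each cluster must contain two or more constituents so that the
        # statistics do not expose a single constituent's features.
        if len(member_ids) < 2:
            raise ValueError("a cluster needs two or more constituents")
        parameters = individual[member_ids[0]].keys()
        stats = {
            p: mean(individual[m][p] for m in member_ids) for p in parameters
        }
        # The statistics stay associated with the member identifiers.
        result.append({"ids": list(member_ids), "stats": stats})
    return result


cluster_feature_data = to_cluster_feature_data(individual, clusters)
```

Only the identifiers and the per-cluster statistics leave the data providing system in this sketch; the individual values do not.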

By having the information processing system specify the constraint condition to the data providing system, the information processing system can control the clustering performed by the data providing system. This suppresses the loss of meaningful information when the information contained in the plurality of pieces of individual feature data corresponding to one cluster is aggregated into statistical data. That is, it is possible to suppress degradation in the value of the information provided as statistical data by the data providing system due to inappropriate clustering in the data providing system.

According to one aspect of the present disclosure, the plurality of second constituents may include a plurality of corresponding constituents, each of which corresponds to one of the plurality of first constituents, and a plurality of non-corresponding constituents, which correspond to none of the plurality of first constituents. Each piece of the second feature data acquired by the acquisition unit may correspond to one of a plurality of clusters defined by clustering the plurality of corresponding constituents in the second group.

According to one aspect of the present disclosure, the acquisition unit may request the data providing system to cluster the plurality of corresponding constituents into the plurality of clusters. In response to this request, the data providing system may be configured to cluster the plurality of corresponding constituents into the plurality of clusters and to provide the group of second feature data corresponding to the plurality of clusters to the information processing system.

According to one aspect of the present disclosure, the acquisition unit may transmit a list of constituents to the data providing system. Specifically, the list may be a list of the plurality of first constituents or a list of the plurality of corresponding constituents. The data providing system may be configured to identify the plurality of corresponding constituents in the second group based on the list.

According to one aspect of the present disclosure, the acquisition unit may transmit, to the data providing system, distance information representing distances in a feature space between the plurality of constituents included in the list. The data providing system may be configured to cluster the plurality of corresponding constituents in the second group into the plurality of clusters based on the distance information. Providing such distance information helps the data providing system cluster the second group appropriately.
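One way the distance information could drive the clustering is sketched below. The disclosure does not specify a clustering algorithm at this point; the greedy nearest-neighbor pairing used here is an assumption chosen for brevity, and the identifiers, the distance values, and the function name cluster_by_distance are hypothetical.

```python
def cluster_by_distance(ids, distance):
    """Greedily pair each constituent with its nearest unassigned neighbor.

    `distance` maps frozenset({id_a, id_b}) to the distance between the two
    constituents in the feature space of the information processing system.
    """
    remaining = sorted(ids)
    clusters = []
    while len(remaining) >= 2:
        seed = remaining.pop(0)
        nearest = min(
            remaining, key=lambda other: distance[frozenset((seed, other))]
        )
        remaining.remove(nearest)
        clusters.append([seed, nearest])
    if remaining and clusters:
        # An odd constituent joins the cluster holding its nearest member,
        # so that no cluster has fewer than two constituents.
        leftover = remaining[0]
        best = min(
            clusters,
            key=lambda c: min(distance[frozenset((leftover, m))] for m in c),
        )
        best.append(leftover)
    return clusters


# Hypothetical pairwise distances supplied by the information processing system.
pairs = {
    ("A", "B"): 5, ("A", "C"): 1, ("A", "D"): 4, ("A", "E"): 9,
    ("B", "C"): 6, ("B", "D"): 2, ("B", "E"): 3,
    ("C", "D"): 7, ("C", "E"): 8, ("D", "E"): 10,
}
distance = {frozenset(pair): d for pair, d in pairs.items()}
clusters = cluster_by_distance(["A", "B", "C", "D", "E"], distance)
```

Because the data providing system receives only pairwise distances in this sketch, it can group similar constituents without receiving the first feature data itself.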

According to one aspect of the present disclosure, the acquisition unit may transmit, to the data providing system, classification information representing the classification of each of the plurality of constituents included in the list. The data providing system may be configured to cluster the plurality of corresponding constituents in the second group into the plurality of clusters based on the classification information such that corresponding constituents of different classifications are not mixed in a single cluster. Providing such classification information likewise helps the data providing system cluster the second group appropriately.
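The classification constraint can be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: constituents are first grouped by classification and each group is then split into clusters of at least min_size members. The identifiers, the classification labels, the min_size parameter, and the handling of a classification with too few members (it is skipped) are all hypothetical.

```python
from collections import defaultdict


def cluster_within_classes(class_of, min_size=2):
    """Form clusters so that no cluster mixes different classifications."""
    by_class = defaultdict(list)
    for constituent_id in sorted(class_of):
        by_class[class_of[constituent_id]].append(constituent_id)

    clusters = []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) < min_size:
            # Assumption of this sketch: a classification with too few
            # members yields no cluster rather than an undersized one.
            continue
        chunks = [
            members[i:i + min_size] for i in range(0, len(members), min_size)
        ]
        if len(chunks) > 1 and len(chunks[-1]) < min_size:
            # Merge an undersized tail into the preceding chunk.
            tail = chunks.pop()
            chunks[-1].extend(tail)
        clusters.extend(chunks)
    return clusters


# Hypothetical classification information sent with the list.
class_of = {
    "A1": "s1", "A2": "s1", "A3": "s1",
    "A4": "s2", "A5": "s2",
    "A6": "s3",
}
clusters = cluster_within_classes(class_of)
```

Every resulting cluster contains constituents of exactly one classification, so the per-cluster statistics remain interpretable per classification.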

According to one aspect of the present disclosure, the plurality of corresponding constituents in the second group may each be identical to one of the plurality of first constituents in the first group. The same identifier may be assigned to each of the plurality of corresponding constituents in both the first group and the second group. The combining unit may combine each piece of the first feature data with the statistical data of the second feature data associated with the same identifier.

According to one aspect of the present disclosure, the storage unit may store a correspondence between the first identifier and the second identifier. The combining unit may combine, in accordance with the correspondence, each piece of the first feature data with the statistical data of the second feature data associated with the corresponding second identifier.
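Combining through a stored correspondence can be sketched as follows. The sketch assumes the correspondence is held as a simple mapping from first identifiers to second identifiers; the identifiers, the parameter names, and the function name combine_with_correspondence are hypothetical.

```python
def combine_with_correspondence(first, second, first_to_second):
    """Join first feature data to cluster statistics via an identifier map."""
    # Index the cluster statistics by every second identifier in the cluster.
    stats_by_second_id = {}
    for member_ids, stats in second:
        for second_id in member_ids:
            stats_by_second_id[second_id] = stats

    combined = {}
    for first_id, features in first.items():
        # Translate the first identifier into the corresponding second one.
        second_id = first_to_second.get(first_id)
        if second_id in stats_by_second_id:
            combined[first_id] = {**features, **stats_by_second_id[second_id]}
    return combined


# Hypothetical data: the two groups use different identifier schemes.
first = {"A001": {"X1": 1.0}, "A002": {"X1": 3.0}, "A003": {"X1": 2.0}}
second = [(["B17", "B42"], {"Y1_mean": 10.0})]
correspondence = {"A001": "B42", "A002": "B17"}

combined = combine_with_correspondence(first, second, correspondence)
```

A first constituent with no entry in the correspondence, such as A003 here, is simply left out of the combined result in this sketch.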

According to one aspect of the present disclosure, each piece of the first feature data may correspond to one of a plurality of clusters in the first group. Each of the plurality of clusters may include two or more of the plurality of first constituents. The first feature data may include statistical data that represents, as statistics, the features of the two or more first constituents included in the corresponding cluster. The statistical data may be associated with the identifiers of the two or more first constituents included in the corresponding cluster.

According to one aspect of the present disclosure, in an environment in which the plurality of second constituents include a plurality of corresponding constituents each corresponding to one of the plurality of first constituents, the acquisition unit may operate as follows. That is, the acquisition unit may cluster the plurality of corresponding constituents into a plurality of clusters based on the first feature data of the corresponding first constituents, provide cluster information from which the cluster to which each of the plurality of corresponding constituents belongs can be identified to a data providing system holding a plurality of pieces of individual feature data corresponding respectively to the plurality of second constituents, and acquire from the data providing system a group of second feature data conforming to the cluster information. In this case, the data providing system may be configured to generate, for each cluster identified from the cluster information, the second feature data of that cluster by converting into statistics the features of the two or more second constituents indicated by the individual feature data of the two or more second constituents included in that cluster, and to provide the generated group of second feature data to the information processing system.

According to one aspect of the present disclosure, a data providing system including a storage unit, a clustering unit, a generation unit, and a providing unit may be provided. The storage unit may store a plurality of pieces of individual feature data corresponding to a plurality of constituents in a group.

The clustering unit may be configured to cluster the plurality of constituents into a plurality of clusters. The generation unit may be configured to generate a group of cluster feature data corresponding to the plurality of clusters. The providing unit may be configured to provide the group of cluster feature data generated by the generation unit to an information processing system.

Each piece of the individual feature data may represent the features of the corresponding constituent. Each piece of the individual feature data may be associated with an identifier of the corresponding constituent. Each piece of the cluster feature data may include statistical data that represents, as statistics, the features of the two or more constituents included in the corresponding cluster. The statistical data may be associated with the identifiers of the two or more constituents included in the corresponding cluster. The generation unit may be configured to generate the statistical data by converting into statistics the features of the two or more constituents indicated by the individual feature data of the two or more constituents included in the corresponding cluster.

This data providing system provides the features of the constituents in the second group to the information processing system as information-protected statistical data, and, by associating the second identifier with the statistical data, also enables the information processing system to specifically identify the constituents of the second group to which the statistical data corresponds. Therefore, the data providing system can provide the information processing system with meaningful information that is useful for data combination.

According to one aspect of the present disclosure, the clustering unit may cluster the plurality of constituents into the plurality of clusters in accordance with a constraint condition specified by the information processing system. According to one aspect of the present disclosure, the information processing system may be configured to transmit, to the data providing system, distance information representing distances in a feature space between the plurality of constituents. The clustering unit may cluster the plurality of constituents into the plurality of clusters based on the distance information from the information processing system. Based on the distance information, the data providing system can cluster the plurality of constituents more appropriately and can provide cluster feature data having high information value to the information processing system.

According to one aspect of the present disclosure, the information processing system may be configured to transmit, to the data providing system, classification information representing the classification of each of the plurality of constituents. The clustering unit may cluster the plurality of constituents into the plurality of clusters based on the classification information from the information processing system such that constituents of different classifications are not mixed in a single cluster. Based on the classification information, the data providing system can cluster the plurality of constituents more appropriately.

According to one aspect of the present disclosure, the information processing system may be configured to transmit a list of constituents to the data providing system. The data providing system may include an identification unit configured to identify, from among the plurality of constituents, a plurality of corresponding constituents each corresponding to one of the constituents included in the list. The clustering unit may cluster the plurality of corresponding constituents identified by the identification unit into the plurality of clusters.

According to one aspect of the present disclosure, an information processing method executed by a computer may be provided. The information processing method may include acquiring a group of first feature data relating to a first group including a plurality of first constituents, acquiring a group of second feature data relating to a second group including a plurality of second constituents, and combining the acquired group of first feature data with the acquired group of second feature data.

Each piece of the first feature data may represent features of one or more corresponding first constituents. Each piece of the first feature data may be associated with a first identifier, which is an identifier of the one or more corresponding first constituents.

Each piece of the second feature data may correspond to one of a plurality of clusters in the second group. Each of the plurality of clusters may include two or more of the plurality of second constituents. Each piece of the second feature data may include statistical data that represents, as statistics, the features of the two or more second constituents included in the corresponding cluster. The statistical data may be associated with a second identifier, which is an identifier of the two or more second constituents included in the corresponding cluster.

The combining may include combining the group of first feature data with the group of second feature data such that, based on the first identifier associated with each piece of first feature data, each piece of first feature data is combined with the statistical data of the second feature data associated with the corresponding second identifier.

According to one aspect of the present disclosure, acquiring the group of first feature data may include reading the group of first feature data from a storage device that stores the group of first feature data. Acquiring the group of second feature data may include acquiring the group of second feature data from a data providing system that provides the group of second feature data. This method provides the same effects as the information processing system described above.

According to one aspect of the present disclosure, a data providing method executed by a computer may be provided. The data providing method may include acquiring a plurality of pieces of individual feature data corresponding respectively to a plurality of constituents in a group, clustering the plurality of constituents into a plurality of clusters, generating a group of cluster feature data corresponding to the plurality of clusters, and providing the generated group of cluster feature data to an information processing system.

Each piece of the individual feature data may represent the features of the corresponding constituent. Each piece of the individual feature data may be associated with an identifier of the corresponding constituent. Each piece of the cluster feature data may include statistical data that represents, as statistics, the features of the two or more constituents included in the corresponding cluster. The statistical data may be associated with the identifiers of the two or more constituents included in the corresponding cluster.

The generating may include generating the statistical data by converting into statistics the features of the two or more constituents indicated by the individual feature data of the two or more constituents included in the corresponding cluster.

According to one aspect of the present disclosure, the clustering may include clustering the plurality of constituents into the plurality of clusters in accordance with a constraint condition specified by the information processing system.

According to one aspect of the present disclosure, the clustering may include clustering the plurality of constituents into the plurality of clusters based on distance information, received from the information processing system, that represents distances in a feature space between the plurality of constituents.

According to one aspect of the present disclosure, the clustering may include clustering the plurality of constituents into the plurality of clusters based on classification information, received from the information processing system, that represents the classification of each of the plurality of constituents, such that constituents of different classifications are not mixed in a single cluster.

According to one aspect of the present disclosure, acquiring the plurality of pieces of individual feature data may include reading the plurality of pieces of individual feature data from a storage device that stores the plurality of pieces of individual feature data.

According to one aspect of the present disclosure, a computer program for causing a computer to function as at least one of the acquisition unit and the combining unit of the information processing system described above may be provided. According to one aspect of the present disclosure, a computer program for causing a computer to function as at least one of the clustering unit, the generation unit, the identification unit, and the providing unit of the data providing system described above may be provided.

According to one aspect of the present disclosure, a computer program for causing a computer to execute the information processing method described above may be provided. According to one aspect of the present disclosure, a computer program for causing a computer to execute the data providing method described above may be provided. According to one aspect of the present disclosure, a non-transitory computer-readable recording medium storing any of the computer programs described above may be provided.

FIG. 1 is a block diagram showing the configuration of the data processing system of the first embodiment.
FIG. 2A is a diagram showing the configuration of the first database, and FIG. 2B is a diagram showing the configuration of the second database.
FIG. 3 is a flowchart of processing executed by the combining system.
FIG. 4 is a flowchart of processing executed by the data providing system.
FIG. 5 is an explanatory diagram relating to generation of cluster feature data.
FIG. 6 is a diagram showing the configuration of the post-processing database based on the second database.
FIG. 7 is a diagram showing the configuration of the combined database.
FIG. 8A is a diagram showing a specific example of cluster feature data, and FIG. 8B is a diagram showing a specific example of the combined database.
FIG. 9 is a block diagram showing the configuration of the data processing system of the second embodiment.
FIG. 10A is a diagram showing the configuration of the third database, and FIG. 10B is a diagram showing the configuration of the post-processing database based on the third database.
FIG. 11 is a diagram showing the configuration of the extended combined database.
FIG. 12A is a diagram showing the configuration of the second database in the third embodiment, and FIG. 12B is a diagram showing the configuration of the post-processing database in the third embodiment.
FIG. 13 is an explanatory diagram relating to the correspondence table.
FIG. 14 is a diagram showing the configuration of the combined database of the third embodiment.
FIG. 15 is a diagram showing the configuration of the list of common consumers with segment information.
FIG. 16 is a flowchart of processing executed by the data providing system of the fourth embodiment.
FIG. 17 is a flowchart of processing executed by the combining system of the fifth embodiment.
FIG. 18 is a flowchart of processing executed by the data providing system of the fifth embodiment.
FIG. 19 is a flowchart of processing executed by the combining system of the sixth embodiment.
FIG. 20 is a flowchart of processing executed by the data providing system of the sixth embodiment.

Exemplary embodiments of the present disclosure are described below with reference to the drawings.

[First Embodiment]
As shown in FIG. 1, the data processing system 1 of the first embodiment includes a combining system 10 and a data providing system 30. In the data processing system 1, a combined database 155 is generated based on a first database 151, which the combining system 10 holds and which concerns a first group of consumers, and a second database 351, which the data providing system 30 holds and which concerns a second group of consumers.

The combining system 10 includes a processor (CPU) 11, a memory 13, and a storage device 15. The combining system 10 also includes a communication interface (not shown) and is communicably connected to the data providing system 30 through a network NT. The processor 11 executes processing according to a computer program stored in the storage device 15. The memory 13 includes a ROM and a RAM. The storage device 15 holds the first database 151.

As shown in FIG. 2A, the first database 151 holds, for each consumer of the first group, feature data F1 of that individual consumer. The consumers of the first group correspond to the constituents of the first group. In the following, feature data of an individual consumer is referred to as individual feature data, and the individual feature data F1 held in the first database 151 is referred to as first individual feature data F1. In FIG. 2A, each piece of first individual feature data F1 is represented by one row. The first individual feature data F1 represents the features of the one corresponding consumer in association with that consumer's identification code. Specifically, the first individual feature data F1 represents the features of the corresponding consumer with a plurality of parameters X1, X2, X3.

In FIG. 2A, the first individual feature data F1 represents the features of the corresponding consumer with three parameters X1, X2, X3. However, this is merely an example simplified to fit the drawing. The first individual feature data F1 may represent the consumer's features with more than three parameters.

In FIG. 2A, the values of the parameters X1, X2, X3 of each consumer are shown as "PE". The notation "PE" indicates that the corresponding value is personal information to be protected, and it has the same meaning in the other drawings. Examples of the parameters X1, X2, X3 include parameters on demographic attributes of the corresponding consumer, such as age, sex, and area of residence, as well as parameters on the consumer's consumption behavior. Examples of the parameters on consumption behavior include information such as the stores used, the products purchased, the time of purchase, the number of purchases, and the purchase amount.

The storage device 15 further stores the combined database 155 generated by processing executed by the processor 11 (described in detail later).

As shown in FIG. 1, the data providing system 30 includes a processor 31, a memory 33, and a storage device 35. The data providing system 30 also includes a communication interface (not shown) and is communicably connected to the combining system 10 through the network NT.

The processor 31 executes processing according to a computer program stored in the storage device 35. The memory 33 includes a ROM and a RAM. The storage device 35 stores the second database 351.

The second database 351 holds individual feature data F2 for each consumer of the second group. The consumers of the second group correspond to the constituents of the second group. In the following, the individual feature data F2 held in the second database 351 is referred to as second individual feature data F2. As shown in FIG. 2B, the second individual feature data F2, like the first individual feature data F1, represents the features of the one corresponding consumer in association with that consumer's identification code. Specifically, the second individual feature data F2 represents the features of the corresponding consumer with a plurality of parameters Y1, Y2, Y3. As with the first individual feature data F1, the number of parameters is not limited to three.

The parameters Y1, Y2, Y3 include parameters that differ at least partially from the parameters X1, X2, X3 of the first individual feature data F1. For example, the parameters Y1, Y2, Y3 may include parameters on consumers' consumption behavior for products different from those covered by the first individual feature data F1.

The consumers of the second group include some of the same consumers as the first group. In the following, consumers belonging to both the first group and the second group, that is, consumers common to the two groups, are referred to as common consumers. The first individual feature data F1 and the second individual feature data F2 in the ranges marked C in FIGS. 2A and 2B are those of the common consumers.

As can be understood from FIGS. 2A and 2B, the first database 151 and the second database 351 use common identification codes for consumers. That is, the same identification code is associated with the first individual feature data F1 and the second individual feature data F2 of the same consumer.

To share identification codes in this way, the company that holds the first database 151, or the company that operates the combining system 10, can issue identification codes to the company that holds the second database 351. Examples of identification codes include cookie IDs. Identification codes can be shared by tracking consumers' terminal devices using cookie technology or the like.

Next, the processing executed by the combining system 10 and the data providing system 30 is described in detail. When an operator of the combining system 10 inputs an instruction to generate the combined database 155 through a user interface (not shown), the processor 11 of the combining system 10 executes the combined database generation process shown in FIG. 3. When the processor 31 of the data providing system 30 receives a request signal from the combining system 10, it executes the request reception process shown in FIG. 4.

In the combined database generation process, the processor 11 of the combining system 10 reads from the storage device 15 the first database 151, which is the one of the two databases to be combined (the first database 151 and the second database 351) that the combining system 10 itself holds (S110). The processor 11 then identifies the common consumers between the first group and the second group that correspond to the databases to be combined (S120).

Specifically, to identify the common consumers, the processor 11 can transmit to the data providing system 30 a list request signal requesting the consumer list of the second group (S121) and receive the consumer list of the second group from the data providing system 30 (S123).

As shown in FIG. 4, when the processor 31 of the data providing system 30 receives the list request signal from the combining system 10 (Yes in S210), it refers to the second database 351 and transmits to the combining system 10 the consumer list of the second group, that is, the list of consumers whose second individual feature data F2 is registered in the second database 351 (S215). The consumer list of the second group contains the identification codes of the second-group consumers but does not contain their personal information.

After receiving the consumer list, the processor 11 of the combining system 10 compares the identification codes of the second-group consumers in the received list with the identification codes of the first-group consumers in the first database 151 read in S110 (S125), and can thereby identify the common consumers between the first group and the second group.
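The comparison in S125 amounts to a set intersection over identification codes. The following is a minimal Python sketch under the assumption that identification codes are plain strings; the function name `find_common_consumers` and the sample codes are hypothetical and do not appear in the source.

```python
def find_common_consumers(first_group_ids, second_group_ids):
    """Return the codes present in both groups, keeping first-group order."""
    second = set(second_group_ids)  # set gives constant-time membership tests
    return [code for code in first_group_ids if code in second]


# Hypothetical sample data: only identification codes are exchanged,
# never the personal feature values themselves.
first_group = ["ID001", "ID002", "ID011", "ID012"]
second_group = ["ID002", "ID001", "ID015"]
common = find_common_consumers(first_group, second_group)
```

Because the list received in S123 carries only identification codes, this step can run on the combining side without any second-group feature values.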

Having identified the common consumers in S120, the processor 11 transmits a database request signal to the data providing system 30 (S130). The list of common consumers identified in S120 is attached to the database request signal. This list contains the identification codes of the common consumers included in the first group.

As shown in FIG. 4, when the processor 31 of the data providing system 30 receives the database request signal from the combining system 10 (Yes in S220), it refers to the second database 351 and, based on the list of common consumers attached to the database request signal, identifies the second individual feature data F2 of the common consumers (S230).

The processor 31 then clusters the common consumers into a plurality of clusters based on their second individual feature data F2 (S240). The clustering can be performed using, for example, the k-means method or another known technique.

The clustering is performed so that the number of consumers in each cluster (that is, the cluster size) is at least a predetermined number. The predetermined number is set from the standpoint of protecting personal information. Based on the distribution of the common consumers when they are placed in a feature space, the clustering groups consumers that are close to one another in the feature space into one cluster.
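The source names k-means as one option but does not say how the minimum cluster size is enforced. The sketch below is therefore not the patent's method: it swaps in a simple greedy nearest-neighbor grouping, chosen only because it makes the size guarantee easy to see. The function name `cluster_with_min_size` and the sample points are hypothetical.

```python
import math


def cluster_with_min_size(points, min_size):
    """Greedily group nearby points so every cluster has >= min_size members.

    points: dict mapping identification code -> numeric feature tuple.
    Returns a list of clusters, each a list of identification codes.
    """
    if min_size < 1 or len(points) < min_size:
        raise ValueError("too few consumers to form one cluster")
    unassigned = dict(points)
    clusters = []
    while len(unassigned) >= min_size:
        # Take the first remaining consumer as the seed of a new cluster.
        seed_id = next(iter(unassigned))
        seed = unassigned.pop(seed_id)
        # Add the (min_size - 1) remaining consumers closest to the seed.
        nearest = sorted(
            unassigned, key=lambda code: math.dist(seed, unassigned[code])
        )[: min_size - 1]
        for code in nearest:
            del unassigned[code]
        clusters.append([seed_id, *nearest])
    # Fewer than min_size consumers remain: attach each to the nearest seed.
    for code, point in unassigned.items():
        closest = min(clusters, key=lambda c: math.dist(point, points[c[0]]))
        closest.append(code)
    return clusters


sample = {
    "ID001": (0, 0), "ID002": (0, 1), "ID003": (1, 0),
    "ID004": (10, 10), "ID005": (10, 11), "ID006": (11, 10),
    "ID007": (10.5, 10.5),
}
clusters = cluster_with_min_size(sample, 3)
```

A production system would more likely run k-means first and then merge or rebalance undersized clusters; the greedy version only illustrates the constraint that no cluster falls below the predetermined number.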

In the clustering, the processor 31 can obtain the distance Dy[i, j] between common consumers as a Euclidean distance. The distance Dy[i, j] between consumer i and consumer j can be calculated according to the following equation.

$$D_y[i,j] = \left\{ \sum_{n=1}^{N} \left( y_n[i] - y_n[j] \right)^2 \right\}^{1/2}$$

Here, the sum runs over n = 1 to n = N. The value N is the number of parameters Y1, Y2, Y3 that represent the consumer's features in the second individual feature data F2. In the example of FIG. 2B, N = 3.

yn[i] is the value of parameter Yn for consumer i, and yn[j] is the value of parameter Yn for consumer j. The left region of FIG. 5 shows the second database 351, in which the values of the parameters Y1 and Y2 of the second individual feature data F2 for consumer i are written as y1[i] and y2[i] (i = 1, 2, 3, ...).
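The distance defined above can be written directly in Python. This is a minimal sketch that assumes every parameter has already been encoded as a number; the function name `distance_dy` and the sample values are hypothetical.

```python
import math


def distance_dy(y_i, y_j):
    """Euclidean distance Dy[i, j] between two consumers' parameter vectors."""
    if len(y_i) != len(y_j):
        raise ValueError("both consumers need the same N parameters")
    # Sum the squared differences over n = 1..N, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_i, y_j)))


# Hypothetical values of (Y1, Y2, Y3) for two consumers, so N = 3.
d = distance_dy((1.0, 2.0, 3.0), (4.0, 6.0, 3.0))
```

The source does not say how non-numeric parameters such as sex or residential area enter the distance; encoding and scaling them beforehand is an assumption of this sketch.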

After clustering the common consumers in the second group into a plurality of clusters in S240, the processor 31 proceeds to S250 and generates, for each cluster, one piece of cluster feature data FC2 that integrates the second individual feature data F2 of the consumers belonging to that cluster.

Specifically, the processor 31 converts the values of the parameters Y1, Y2, Y3 in the second individual feature data F2 of the consumers belonging to a cluster into one statistic ST per parameter, and thereby generates one piece of cluster feature data FC2 for that cluster.

The right region of FIG. 5 illustrates the group of cluster feature data FC2 generated from the group of second individual feature data F2 in the second database 351 shown in the left region of FIG. 5.

In the example shown in FIG. 5, the consumers with identification codes ID001, ID002, and ID003 are grouped into one cluster, and one piece of cluster feature data FC2 is generated from the second individual feature data F2 of these consumers.

In this example, the values y1[1], y1[2], y1[3] of parameter Y1 for identification codes ID001, ID002, ID003 are converted into one statistic ST{y1[1], y1[2], y1[3]}. Here, the notation ST{} may be understood to mean a statistic of the values inside the braces {}. Likewise, the values y2[1], y2[2], y2[3] of parameter Y2 are converted into one statistic ST{y2[1], y2[2], y2[3]}, and the values y3[1], y3[2], y3[3] of parameter Y3 are converted into one statistic ST{y3[1], y3[2], y3[3]}.

As a result, the second individual feature data F2 of the consumers with identification codes ID001, ID002, ID003, who belong to the same cluster, is converted into one piece of cluster feature data FC2 containing statistical data FS2. The statistical data FS2 represents the features of the consumers in the cluster with the statistics ST{y1[1], y1[2], y1[3]}, ST{y2[1], y2[2], y2[3]}, ST{y3[1], y3[2], y3[3]} of the parameters Y1, Y2, Y3. The cluster feature data FC2 holds the identification codes ID001, ID002, ID003 of the consumers belonging to the cluster in association with the statistical data FS2.

The statistic ST can be determined in advance for each type of parameter Y1, Y2, Y3. Examples of the statistic ST include a ratio, mean, median, maximum, standard deviation, and variance. The mean, median, and maximum correspond to representative values of the cluster. The statistic ST may be a combination of several different statistics.

For example, when a parameter represents sex, the statistic ST may be the ratio of men and/or women in the cluster. When a parameter represents age, the statistic ST may be the mean age in the cluster. When a parameter represents whether a product has been purchased, the statistic ST may be the ratio of consumers in the cluster who have purchased the product. When a parameter represents the number of units of a product purchased, the statistic ST may be one or a combination of the mean, median, and maximum of that number.
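The conversion of one cluster's individual records into a single cluster feature record can be sketched as follows. The parameter names (`gender`, `age`, `purchases`), the statistic chosen for each, the record layout, and the function name `make_cluster_feature` are all assumptions made for illustration; the source only states that a statistic is fixed in advance per parameter type.

```python
from statistics import mean, median

# Hypothetical mapping from parameter name to its predetermined statistic ST.
STATISTICS = {
    "gender": lambda values: sum(v == "F" for v in values) / len(values),  # ratio of women
    "age": mean,          # mean age in the cluster
    "purchases": median,  # median number of purchases
}


def make_cluster_feature(records):
    """Fold the individual records of one cluster into one cluster record.

    records: dict mapping identification code -> dict of parameter values.
    The result keeps the member codes but only aggregated statistics.
    """
    stats = {
        name: stat([rec[name] for rec in records.values()])
        for name, stat in STATISTICS.items()
    }
    return {"ids": sorted(records), "stats": stats}


cluster = {
    "ID001": {"gender": "F", "age": 30, "purchases": 2},
    "ID002": {"gender": "M", "age": 40, "purchases": 5},
    "ID003": {"gender": "F", "age": 50, "purchases": 3},
}
fc2 = make_cluster_feature(cluster)
```

No individual value survives in `fc2["stats"]`, which is the property the cluster size threshold is meant to protect.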

Likewise in FIG. 5, the consumers with identification codes ID004, ID005, ID006, and ID007 are grouped into one cluster, and one piece of cluster feature data FC2 representing the features of these consumers with statistics ST is generated. Similarly, the consumers with identification codes ID008, ID009, and ID010 are grouped into one cluster, and one piece of cluster feature data FC2 representing the features of these consumers with statistics ST is generated.

After the processing in S250, the processor 31 clusters the non-common consumers in the second group into a plurality of clusters (S260). In this way, the present embodiment clusters the non-common consumers in the second group separately from the common consumers, so that common and non-common consumers are never mixed in one cluster.
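Keeping the two populations apart comes down to partitioning the second database by the received list of common consumers before any clustering runs. A minimal sketch, assuming the database is a dict keyed by identification code; the function name `partition_second_group` and the sample data are hypothetical.

```python
def partition_second_group(second_db, common_ids):
    """Split second-group records into common and non-common consumers.

    second_db: dict mapping identification code -> feature tuple.
    common_ids: codes listed in the database request signal.
    """
    common_ids = set(common_ids)
    common = {c: r for c, r in second_db.items() if c in common_ids}
    non_common = {c: r for c, r in second_db.items() if c not in common_ids}
    return common, non_common


second_db = {"ID001": (1, 2), "ID002": (2, 3), "ID015": (9, 9)}
# ID011 is in the first group only, so it matches nothing here.
common, non_common = partition_second_group(second_db, ["ID001", "ID002", "ID011"])
```

Each partition would then be clustered on its own, which guarantees that every resulting cluster is either entirely common or entirely non-common.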

The clustering in S260 is performed in the same way as in S240. That is, it is performed so that the number of consumers in each cluster is at least the predetermined number. Based on the distribution of the non-common consumers when they are placed in the feature space, the clustering groups consumers that are close to one another in the feature space into one cluster.

After the processing in S260, the processor 31 proceeds to S270 and generates one piece of cluster feature data FC2 for each of the clusters of non-common consumers. The cluster feature data FC2 is generated in S270 in the same way as in S250.

In the example of FIG. 5, the non-common consumers with identification codes ID015 and ID016 are grouped into one cluster, and one piece of cluster feature data FC2 representing the features of these consumers with statistics ST is generated. Similarly, the consumers with identification codes ID017 and ID018 are grouped into one cluster, and one piece of cluster feature data FC2 representing the features of these consumers with statistics ST is generated.

In the present embodiment, several pieces of second individual feature data F2 are integrated into one piece of cluster feature data FC2 in order to protect consumers' personal information. The number of consumers in a cluster is therefore preferably more than two. The example here uses two consumers per cluster only to simplify the explanation and illustration.

After the processing in S270, the processor 31 transmits the group of cluster feature data FC2 for the common and non-common consumers generated in S250 and S270, as a processed database FP2, to the combining system 10 that requested the database (S280).

FIG. 6 shows an example of the processed database FP2. As FIG. 6 shows, the processed database FP2 holds cluster feature data FC2 for each cluster, for both the common consumers and the non-common consumers. In FIG. 6, the values of the parameters Y1, Y2, Y3 in the cluster feature data FC2 are shown as "ST", which indicates that the value of the corresponding parameter is a statistic ST. The thick solid line in FIG. 6 marks the boundary between common and non-common consumers.

When the processor 11 of the combining system 10 receives the processed database FP2 transmitted from the data providing system 30 in response to the database request signal sent in S130 (S140), it proceeds to S150. In S150, the processor 11 generates the combined database 155 by combining the first database 151 with the processed database FP2 received from the data providing system 30. FIG. 7 shows an example of the combined database 155.

In S150, the processor 11 combines the group of first individual feature data F1 with the group of cluster feature data FC2 to generate the combined database 155. Based on the identification code associated with each piece of first individual feature data F1, each piece is combined with the statistical data FS2 of the cluster feature data FC2 that is associated with the corresponding identification code.

That is, in S150, each piece of first individual feature data F1 is combined with the statistical data FS2 of the cluster feature data FC2 associated with the same identification code, and the combined database 155 is generated. However, the first individual feature data F1 of non-common consumers in the first group is registered in the combined database 155 without being combined with any cluster feature data FC2. Similarly, the cluster feature data FC2 of non-common consumers in the second group is registered in the combined database 155 without being combined with any first individual feature data F1. The combined database 155 is generated in this way.
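The combining step behaves like a full outer join keyed on identification code, where the right-hand side is keyed by a set of codes per cluster. A minimal sketch under assumed data layouts; the function name `combine`, the row format, and the sample values are hypothetical.

```python
def combine(first_db, processed_db):
    """Join individual records with cluster statistics on identification code.

    first_db: dict mapping identification code -> dict of X parameters.
    processed_db: list of {"ids": [...], "stats": {...}} cluster records.
    Returns a list of combined rows.
    """
    # Index every member code to the statistics of its cluster.
    stats_by_id = {code: c["stats"] for c in processed_db for code in c["ids"]}

    rows = []
    for code, features in first_db.items():
        # Non-common first-group consumers keep their row without statistics.
        rows.append({"id": code, "x": features, "y_stats": stats_by_id.get(code)})
    for cluster in processed_db:
        # Clusters of non-common second-group consumers are kept unjoined.
        if not any(code in first_db for code in cluster["ids"]):
            rows.append(
                {"id": tuple(cluster["ids"]), "x": None, "y_stats": cluster["stats"]}
            )
    return rows


first_db = {"ID001": {"X1": 1}, "ID002": {"X1": 2}, "ID011": {"X1": 3}}
processed_db = [
    {"ids": ["ID001", "ID002"], "stats": {"Y1": 0.5}},
    {"ids": ["ID015", "ID016"], "stats": {"Y1": 0.9}},
]
rows = combine(first_db, processed_db)
```

The check for unjoined clusters relies on the property stated earlier that common and non-common consumers never share a cluster.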

In addition, as shown in FIG. 8A, each piece of cluster feature data FC2 provided by the data providing system 30 may be configured to hold, as the statistical data FS2, statistical data FSI for each of the consumers belonging to the corresponding cluster. In this case, the statistical data FSI of the consumers in one cluster are identical copies of a single piece of statistical data and show the same statistics.

In this case, as shown in FIG. 8B, the combined database 155 combines, for each consumer, the first individual feature data F1 with the statistical data FSI of the corresponding cluster feature data FC2.

In the example shown in FIG. 7, each cluster is made up of consecutive identification codes to simplify the drawing and the explanation. However, the clustering groups consumers that are close in the feature space, that is, consumers with similar features, into one cluster. The consumers belonging to one cluster are therefore usually scattered across the first database 151 and the second database 351.

Providing the statistical data FSI per consumer is convenient because the consumers in the processed database FP2 and the combined database 155 need not be reordered so that consumers belonging to the same cluster are adjacent.
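Replicating a cluster's statistics into one record per member can be sketched as below. The record layout and the function name `expand_per_consumer` are hypothetical; the sample deliberately uses non-consecutive codes to reflect that cluster members are usually scattered.

```python
def expand_per_consumer(processed_db):
    """Copy each cluster's statistics into one record per member (FSI)."""
    return {
        code: dict(cluster["stats"])  # separate copy for each consumer
        for cluster in processed_db
        for code in cluster["ids"]
    }


processed_db = [
    {"ids": ["ID001", "ID007"], "stats": {"Y1": 0.5, "Y2": 31.0}},
    {"ids": ["ID003", "ID004"], "stats": {"Y1": 0.2, "Y2": 45.5}},
]
fsi = expand_per_consumer(processed_db)
```

With this layout the combining side can look up statistics by a single identification code and keep its original row order.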

In the present embodiment, the data providing system 30 can provide the combining system 10 with consumer data tagged with consumer identification codes while protecting personal information, and the combining system 10 can accurately combine multiple kinds of data on the same consumer based on the identification codes to generate a meaningful combined database 155. The combined database 155 is therefore very useful for consumer behavior analysis by companies that can refer to it.

[Second Embodiment]
Next, the data processing system 2 of the second embodiment is described. The data processing system 2 of the present embodiment is the data processing system 1 of the first embodiment with a further data providing system 50 added. As shown in FIG. 9, the data processing system 2 includes the further data providing system 50 in addition to the combining system 10 and the data providing system 30. In the following, the data providing system 30 is referred to as the first data providing system 30, and the data providing system 50 is referred to as the second data providing system 50. Description of configurations that are the same as in the first embodiment is omitted as appropriate.

The second data providing system 50 includes a processor 51, a memory 53, and a storage device 55. The second data providing system 50 is communicably connected to the combining system 10 through the network NT. The processor 51 executes processing according to a computer program stored in the storage device 55. The storage device 55 stores a third database 551 concerning a third group of consumers.

As shown in FIG. 10A, the third database 551 holds individual feature data F3 for each consumer of the third group. In the following, the individual feature data F3 held in the third database 551 is referred to as third individual feature data F3. The third individual feature data F3 represents the features of the one corresponding consumer in association with that consumer's identification code. Specifically, the third individual feature data F3 represents the features of the corresponding consumer with a plurality of parameters Z1, Z2, Z3. As with the first and second individual feature data F1, F2, the number of parameters is not limited to three.

The parameters Z1, Z2, Z3 differ at least partially from the parameters X1, X2, X3 of the first individual feature data F1 and the parameters Y1, Y2, Y3 of the second individual feature data F2. The parameters Z1, Z2, Z3 may include parameters on consumers' consumption behavior for products different from those covered by the first individual feature data F1 and the second individual feature data F2.

Also, as can be understood from FIG. 10A, the common consumers between the first group and the third group differ from the common consumers between the first group and the second group. In the example shown in FIG. 2B, the common consumers between the first group and the second group are the consumers with identification codes ID001 to ID010, whereas in the example shown in FIG. 10A, the common consumers between the first group and the third group are, as shown by range C, the consumers with identification codes ID005 to ID014.

In this embodiment, the processor 11 of the combined system 10 is configured to extend the combined database 155, which is based on the first database 151 and the processed database FP2 obtained from the first data providing system 30, by combining with it the processed database FP3, which is based on the third database 551 and obtained from the second data providing system 50. To that end, the processor 11 executes the processing of S120 to S140 shown in FIG. 3 for the second data providing system 50 as well, and acquires the processed database FP3 from the second data providing system 50.

Like the first data providing system 30, the processor 51 of the second data providing system 50 executes the processing shown in FIG. 4. When it receives a list request signal from the combined system 10 (Yes in S210), it provides the consumer list of the third group to the combined system 10 (S215).

When the processor 51 of the second data providing system 50 receives a database request signal from the combined system 10 (Yes in S220), it identifies the third individual feature data F3 of the common consumers between the first group and the third group (S230), clusters those common consumers into a plurality of clusters based on this third individual feature data F3 (S240), and generates cluster feature data FC3 for each cluster (S250). The cluster feature data FC3 is shown in FIG. 10B.

The clustering method and the method of generating the cluster feature data FC3 in the second data providing system 50 are the same as in the first data providing system 30. That is, for each cluster, the processor 51 converts the values of the parameters Z1, Z2, and Z3 indicated by the third individual feature data F3 of the consumers belonging to that cluster into one statistic ST per parameter, and thereby generates one piece of cluster feature data FC3 corresponding to the cluster (S250). The cluster feature data FC3 holds statistical data FS3, which contains the statistic ST for each parameter, in association with the identification codes of the consumers belonging to the cluster.
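The conversion in S250 can be sketched in Python as follows. This is only an illustration under stated assumptions: the passage does not fix which statistic ST is used, so the arithmetic mean is assumed here, and the function name, the dictionary layout, and the sample values are all hypothetical.

```python
from statistics import mean


def build_cluster_feature_data(individual_data, clusters):
    """Turn per-consumer parameters into one record per cluster.

    individual_data: {consumer_id: {parameter_name: value}}  (hypothetical layout)
    clusters: list of lists of consumer IDs, one inner list per cluster
    """
    records = []
    for member_ids in clusters:
        parameter_names = sorted(individual_data[member_ids[0]])
        # One statistic per parameter; the mean is an assumption, not the patent's choice.
        stats = {
            name: mean(individual_data[cid][name] for cid in member_ids)
            for name in parameter_names
        }
        # The statistics stay associated with the IDs of all cluster members.
        records.append({"ids": list(member_ids), "stats": stats})
    return records


# Made-up third individual feature data F3 for four consumers.
f3 = {
    "ID005": {"Z1": 1.0, "Z2": 4.0, "Z3": 0.0},
    "ID006": {"Z1": 3.0, "Z2": 6.0, "Z3": 2.0},
    "ID007": {"Z1": 10.0, "Z2": 0.0, "Z3": 5.0},
    "ID008": {"Z1": 12.0, "Z2": 2.0, "Z3": 7.0},
}
fc3 = build_cluster_feature_data(f3, [["ID005", "ID006"], ["ID007", "ID008"]])
```

Each record exposes only cluster-level statistics together with the member IDs, so individual parameter values do not leave the data providing side.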

After generating the group of cluster feature data FC3 for the common consumers in this way (S250), the processor 51 further clusters the consumers in the third group who are not common with the first group (the non-common consumers) into a plurality of clusters (S260) and generates one corresponding piece of cluster feature data FC3 for each cluster (S270). It then transmits the group of cluster feature data FC3 for the common and non-common consumers to the combined system 10 as the processed database FP3 (S280). The thick solid line in FIG. 10B indicates the boundary, within the processed database FP3, between the consumers common to the first and third groups and the non-common consumers.

Upon receiving the processed database FP3 from the second data providing system 50 (S140), the processor 11 of the combined system 10 extends the combined database 155 based on this processed database FP3 (S150). That is, the processor 11 extends the combined database 155, obtained by combining the processed database FP2 with the first database 151, by further combining with it the processed database FP3 based on the third database 551 (S150).

An example of the extended combined database 155 is shown in FIG. 11. As shown in FIG. 11, this combined database 155 represents consumer features by many parameters. Moreover, among their consumers not common with the first group, the second group and the third group include consumers common to both the second and third groups. The statistical data FS2 and FS3 on these consumers are combined in the extended combined database 155. The extended combined database 155 is therefore very useful for analyzing consumer behavior.
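The extension in S150 amounts to joining records on the identification code. The sketch below assumes a hypothetical in-memory layout (a dictionary of first individual feature data plus lists of cluster records with `ids` and `stats` keys); the function name and all values are made up for illustration.

```python
def combine(first_data, *processed_dbs):
    """Join individual data with cluster statistics by identification code.

    first_data: {consumer_id: {parameter_name: value}}
    processed_dbs: each is a list of {"ids": [...], "stats": {...}} records
    """
    combined = {cid: dict(params) for cid, params in first_data.items()}
    for db in processed_dbs:
        for record in db:
            for cid in record["ids"]:
                # Non-common consumers get a row holding statistics only.
                combined.setdefault(cid, {}).update(record["stats"])
    return combined


f1 = {"ID001": {"X1": 1.0}, "ID005": {"X1": 2.0}, "ID006": {"X1": 3.0}}
fp2 = [
    {"ids": ["ID001", "ID005"], "stats": {"Y1": 0.5}},   # common with first group
    {"ids": ["ID020", "ID021"], "stats": {"Y1": 0.9}},   # non-common
]
fp3 = [
    {"ids": ["ID005", "ID006"], "stats": {"Z1": 3.0}},   # common with first group
    {"ids": ["ID020", "ID021"], "stats": {"Z1": 7.0}},   # non-common
]
db155 = combine(f1, fp2, fp3)
```

In this toy data, ID020 and ID021 are absent from the first group but appear in both processed databases, so their FS2 and FS3 statistics end up in the same row, which mirrors the situation described for FIG. 11.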

[Third embodiment]
Next, a third embodiment will be described. The data processing system of the third embodiment is a modification of the data processing system 1 of the first embodiment. In the following, descriptions of configurations that are the same as in the first embodiment are omitted as appropriate.

According to the third embodiment, the consumers of the first group are assigned the same identification codes as in the first embodiment (hereinafter, first identification codes), whereas the consumers of the second group are assigned second identification codes of a different type from the first identification codes. Consequently, in the second database 351 of this embodiment shown in FIG. 12A, the second individual feature data F4 in range C belongs to the same consumers as the first individual feature data F1 shown in range C of FIG. 2A, but does not carry the same identification codes as the first group. The second identification codes illustrated in FIG. 12A show a certain regularity relative to the first identification codes illustrated in FIG. 2A, but this regularity is provided only for convenience, to make the correspondence easier to follow; ordinarily there is no regularity between first and second identification codes.

For this reason, in this embodiment a correspondence table representing the correspondence between first identification codes and second identification codes is prepared, and the common consumers between the first group and the second group are identified based on this table. The exemplary correspondence table shown in FIG. 13 lists, for each first identification code, the corresponding second identification code.

The combined system 10 can acquire the correspondence table from, for example, an external system. For example, when the first and second identification codes are user IDs of different websites or online stores, these user IDs can be linked through the terminal ID or cookie ID of the user terminal device that accesses the websites. The external system can monitor the wide area network (the Internet) to which the user terminal devices are connected and generate the correspondence table from the terminal IDs and cookie IDs. Alternatively, the correspondence table may be acquired from a company that provides an ID linkage service or from an issuer of redeemable points that has many partner companies.

In this embodiment, the processor 11 of the combined system 10 executes the processing shown in FIG. 3. In S120, it can acquire the correspondence table from the external system and identify the common consumers between the first group and the second group based on the acquired table. The processor 11 can store the acquired correspondence table in the memory 13 or the storage device 15.

The processor 11 can transmit a database request signal, accompanied by a list of the second identification codes of these common consumers, to the data providing system 30 (S130). This allows the data providing system 30 to identify the common consumers between the first group and the second group based on the second identification codes in the list (S230).

The data providing system 30 can then cluster the identified common consumers (S240), and generate and transmit the processed database FP4 shown in FIG. 12B, which holds cluster feature data FC4 associated with second identification codes (S250, S280). The processed database FP4 differs from the processed database FP2 of the first embodiment in that it does not include cluster feature data for non-common consumers.

Upon acquiring this processed database FP4 from the data providing system 30 (S140), the processor 11 of the combined system 10 generates the combined database 155 by referring to the correspondence table. That is, referring to the correspondence table, it combines each piece of first individual feature data F1 with the statistical data FS4 of the cluster feature data FC4 associated with the second identification code that corresponds to the first identification code of that first individual feature data F1, thereby generating the combined database 155.
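The table-driven join can be sketched in Python as below. The ID formats, the function name, and the data layout are hypothetical; the choice to keep first-group consumers that have no matching cluster with only their own parameters is likewise an assumption about the layout of the combined database.

```python
def combine_with_mapping(first_data, mapping, processed_db):
    """Join F1 rows to cluster statistics via a first-ID to second-ID table.

    first_data: {first_id: {parameter_name: value}}
    mapping: {first_id: second_id}  (the correspondence table)
    processed_db: list of {"ids": [second_id, ...], "stats": {...}} records
    """
    # Index the cluster statistics by every second identification code they cover.
    stats_by_second_id = {
        second_id: record["stats"]
        for record in processed_db
        for second_id in record["ids"]
    }
    combined = {}
    for first_id, params in first_data.items():
        # Consumers missing from the table, or from FP4, get no statistics.
        stats = stats_by_second_id.get(mapping.get(first_id), {})
        combined[first_id] = {**params, **stats}
    return combined


f1 = {"ID001": {"X1": 1.0}, "ID002": {"X1": 2.0}, "ID003": {"X1": 3.0}}
table = {"ID001": "U-901", "ID002": "U-417"}          # made-up second IDs
fp4 = [{"ids": ["U-901", "U-417"], "stats": {"Y1": 4.0}}]
db155 = combine_with_mapping(f1, table, fp4)
```

The resulting rows are keyed by first identification code only, so the second identification codes are needed just for the lookup.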

FIG. 14 shows the combined database 155 generated in this embodiment. This combined database 155 has no cluster feature data for non-common consumers. This configuration of the combined database 155 may also be applied to the first and second embodiments. In that case, the data providing systems 30 and 50 of the first and second embodiments may be configured not to provide cluster feature data for non-common consumers to the combined system 10.

According to the third embodiment, even when different identification codes are used in the first group and the second group, the first individual feature data F1 and the cluster feature data FC4 of the same consumer can be combined appropriately to generate a meaningful combined database 155.

[Fourth embodiment]
Next, a fourth embodiment will be described. The data processing system of the fourth embodiment is a modification of the data processing system 1 of the first embodiment. In the following, descriptions of configurations that are the same as in the first embodiment are omitted as appropriate.

According to the fourth embodiment, in S130 the processor 11 of the combined system 10 transmits to the data providing system 30 a database request signal accompanied by the common consumer list L4 shown in FIG. 15. The list L4 shown in FIG. 15 includes segment information representing the segment to which each common consumer belongs. The segment information corresponds to classification information representing the classification of each common consumer. Specifically, the list L4 records, for each segment, the identification codes of the common consumers belonging to that segment.

According to FIG. 15, a first segment (SEG1), a second segment (SEG2), and a third segment (SEG3) are defined in the list L4, and the identification codes of the common consumers belonging to each segment are recorded for that segment.

The segment information is used by the combined system 10 to control the clustering in the data providing system 30. Controlling the clustering here includes controlling it so that consumers of different segments are not mixed within a single cluster.

In S130, the processor 11 can classify the common consumers in the first group into a plurality of segments according to a predetermined rule. For example, when the gender of the common consumers can be determined by referring to the first individual feature data F1, the processor 11 can define a male segment and a female segment in the list L4 so that common consumers of different genders are not mixed in the same cluster. That is, in S130 the processor 11 can generate a list L4 that records, for each segment, the identification codes of the common consumers of the corresponding gender, and transmit it to the data providing system 30.

Alternatively, when the residential area of the common consumers can be determined by referring to the first individual feature data F1, the processor 11 can define a segment for each residential area so that common consumers from different residential areas are not mixed in the same cluster. That is, in S130 the processor 11 can generate a list L4 that records, for each segment, the identification codes of the common consumers in the corresponding residential area, and transmit it to the data providing system 30. To protect consumers' personal information, the list L4 can be generated so that it does not contain the details of the segments, that is, information from which the specific attributes (such as gender or residential area) of the consumers in a segment could be identified.
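One way to build such a list is to group the common consumers by an attribute and then expose only opaque labels. The following Python sketch assumes a hypothetical `gender` field in the first individual feature data and uses the SEG1, SEG2 label style of FIG. 15; the function name and sample data are illustrative only.

```python
def build_segment_list(common_ids, first_data, attribute):
    """Group common consumers by an attribute, exposing only opaque labels."""
    groups = {}
    for cid in common_ids:
        groups.setdefault(first_data[cid][attribute], []).append(cid)
    # The attribute values are dropped; only SEG1, SEG2, ... reach the list.
    return {
        f"SEG{index}": ids
        for index, (_, ids) in enumerate(sorted(groups.items()), start=1)
    }


f1 = {
    "ID001": {"gender": "F"},
    "ID002": {"gender": "M"},
    "ID003": {"gender": "F"},
}
l4 = build_segment_list(["ID001", "ID002", "ID003"], f1, "gender")
```

Because the labels are assigned in a fixed sort order here, a receiver who knows the rule could still infer the attribute; a real deployment would presumably need to assign labels in a way that does not leak it.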

The processor 31 of the data providing system 30, on receiving this common consumer list L4 with segment information, can execute the processing shown in FIG. 16 in S240 and S250 in place of the processing shown in FIG. 4. In the processing shown in FIG. 16, the common consumers are clustered into a plurality of clusters segment by segment, based on the list L4.

In the example shown in FIG. 16, the processor 31 first identifies the segment to which each common consumer in the second group belongs, based on the received list L4 (S241). It then selects one of the segments as the processing target segment (S243) and clusters the common consumers belonging to the selected segment into a plurality of clusters (S245). After S245, for each of the clusters in the processing target segment, the processor 31 generates one piece of cluster feature data FC2 based on the second individual feature data F2 of the consumers belonging to that cluster (S247).

The processor 31 then determines whether S245 and S247 have been executed for all segments (S249). If not (No in S249), the processor 31 selects one of the segments not yet selected as the new processing target segment (S243) and executes S245 and S247. When it determines that the processing has been executed for all segments (Yes in S249), it ends the processing shown in FIG. 16.
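The loop over S243 to S249 can be sketched in Python as follows. The clustering routine is a deliberately simple stand-in (sort and cut into fixed-size chunks), not the method of the first embodiment, and the mean is assumed as the statistic; function names, the data layout, and the sample values are hypothetical.

```python
from statistics import mean


def cluster_segment(ids, data, size=2):
    """Placeholder clustering: sort by parameter values, cut into chunks."""
    ordered = sorted(ids, key=lambda cid: tuple(data[cid][p] for p in sorted(data[cid])))
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    if len(chunks) > 1 and len(chunks[-1]) < size:
        # Fold a short trailing chunk into the previous cluster.
        tail = chunks.pop()
        chunks[-1].extend(tail)
    return chunks


def cluster_by_segment(segment_list, data):
    """Cluster each segment separately so no cluster mixes segments."""
    records = []
    for segment, ids in segment_list.items():        # S243: next target segment
        for members in cluster_segment(ids, data):   # S245: cluster within it
            names = sorted(data[members[0]])
            stats = {n: mean(data[c][n] for c in members) for n in names}  # S247
            records.append({"segment": segment, "ids": members, "stats": stats})
    return records                                   # all segments done (S249: Yes)


f2 = {
    "ID001": {"Y1": 1.0}, "ID002": {"Y1": 2.0}, "ID003": {"Y1": 9.0},
    "ID004": {"Y1": 5.0}, "ID005": {"Y1": 6.0},
}
l4 = {"SEG1": ["ID001", "ID002", "ID003"], "SEG2": ["ID004", "ID005"]}
fc2 = cluster_by_segment(l4, f2)
```

Since clustering runs once per segment on that segment's members only, the constraint that a cluster never mixes segments holds by construction, whatever clustering routine is plugged in.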

According to this embodiment, the combined system 10 can make meaningful use of the information in the first database 151 to control the clustering in the data providing system 30 so that desired information is not lost when the cluster feature data FC2 is generated. A combined database 155 better suited to data analysis can therefore be generated.

This embodiment includes the idea that the combined system 10 controls the clustering in the data providing system 30, and the idea that the data providing system 30 clusters a plurality of consumers in accordance with constraints specified by the combined system 10.

Accordingly, as a further modification, the combined system 10 may be configured to transmit, without providing segment information, a database request signal requesting clustering by gender or clustering by residential area. In this case, the processor 31 of the data providing system 30 can determine the segment (gender or residential area) of the common and non-common consumers based on the second individual feature data F2 in the second database 351, and perform segment-by-segment clustering in response to the request from the combined system 10.

[Fifth embodiment]
Next, a fifth embodiment will be described. The data processing system of the fifth embodiment is a modification of the data processing system 1 of the first embodiment. In the following, descriptions of configurations that are the same as in the first embodiment are omitted as appropriate.

According to the fifth embodiment, the processor 11 of the combined system 10 executes the processing shown in FIG. 17 in S130, thereby transmitting to the data providing system 30 a database request signal accompanied by a common consumer list L5 that includes distance information.

Specifically, the processor 11 calculates the distance Dx[i, j] in the feature space between common consumers by referring to the first individual feature data F1 in the first database 151 (S131). The distance Dx[i, j] is the distance between consumer i and consumer j based on the first individual feature data F1, and corresponds to the similarity between the features of consumer i and consumer j.

The processor 11 can obtain the distance Dx[i, j] in the feature space between common consumers as a Euclidean distance. The distance Dx[i, j] between consumer i and consumer j can be calculated according to the following equation.

$$D_x[i,j] = \left\{ \sum_{n=1}^{M} \left( x_n[i] - x_n[j] \right)^2 \right\}^{1/2}$$

Here, the sum of $(x_n[i] - x_n[j])^2$ is taken from n = 1 to n = M. The value M corresponds to the number of parameters X1, X2, and X3 that define consumer features in the first individual feature data F1; according to FIG. 2A, M = 3. $x_n[i]$ is the value of parameter Xn for consumer i, and $x_n[j]$ is the value of parameter Xn for consumer j. In S131, the distance Dx[i, j] is calculated for every combination of consumers i and j that are common consumers.
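The calculation in S131 can be sketched in Python using the standard library function `math.dist`, which computes the Euclidean distance between two points. The function name `pairwise_distances`, the dictionary layout, and the sample values are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(data, ids):
    """Euclidean distance Dx[i, j] for every unordered pair of consumers."""
    distances = {}
    for i, j in combinations(ids, 2):
        names = sorted(data[i])  # same parameter order for both consumers
        point_i = [data[i][n] for n in names]
        point_j = [data[j][n] for n in names]
        distances[(i, j)] = math.dist(point_i, point_j)
    return distances


# Made-up first individual feature data with M = 3 parameters.
f1 = {
    "ID001": {"X1": 0.0, "X2": 0.0, "X3": 0.0},
    "ID002": {"X1": 1.0, "X2": 2.0, "X3": 2.0},
    "ID003": {"X1": 4.0, "X2": 6.0, "X3": 2.0},
}
dx = pairwise_distances(f1, ["ID001", "ID002", "ID003"])
```

Only one entry per unordered pair is stored because the Euclidean distance is symmetric. In practice the parameters would probably need scaling to comparable ranges first, which this sketch omits.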

The processor 11 then generates a list L5 with distance information, that is, a list of the common consumers in which the information on the distance Dx calculated in S131 is attached to the identification codes of the common consumers (S133). In the example shown in FIG. 17, the list L5 associates the identification code of consumer j with the distance Dx[i, j] for each combination with a consumer i.

The processor 11 transmits the common consumer list L5 with distance information generated in S133 to the data providing system 30 (S135).

Meanwhile, the processor 31 of the data providing system 30, on receiving the list L5 with distance information, can execute the processing shown in FIG. 18 in S240. That is, the processor 31 calculates the distance Dy[i, j] between the consumers to be clustered (that is, between the common consumers) based on the second individual feature data F2. The distance Dy[i, j] can be calculated by the same method as described in the first embodiment (S310).

The processor 31 then calculates the composite distance D[i, j] between consumers according to the following equation (S320). The composite distance D[i, j] is the composite distance between consumer i and consumer j.

$$D[i,j] = \left( D_x[i,j]^2 + D_y[i,j]^2 \right)^{1/2}$$

After S320, based on the distribution of the consumers to be clustered as determined from the composite distances D[i, j], the processor 31 clusters those consumers into a plurality of clusters so that consumers whose composite distance D[i, j] is small are grouped into the same cluster (S330).
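S320 and S330 can be sketched in Python as below. The composite distance is the square root of the sum of the squared distance from the first data and the squared distance from the second data, computed here with `math.hypot`. The grouping step is an assumed stand-in: it links any two consumers whose composite distance is at or below a threshold (single-linkage grouping), which is only one of many ways to cluster from a distance table. All names and values are hypothetical.

```python
import math


def composite_distances(dx, dy):
    """D[i, j] = sqrt(Dx[i, j]^2 + Dy[i, j]^2) for every pair present in dx."""
    return {pair: math.hypot(dx[pair], dy[pair]) for pair in dx}


def threshold_clusters(ids, distances, threshold):
    """Group consumers connected by pairwise distances <= threshold."""
    parent = {cid: cid for cid in ids}

    def find(cid):
        # Follow parent links to the group representative, compressing the path.
        while parent[cid] != cid:
            parent[cid] = parent[parent[cid]]
            cid = parent[cid]
        return cid

    for (i, j), distance in distances.items():
        if distance <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for cid in ids:
        groups.setdefault(find(cid), []).append(cid)
    return sorted(groups.values())


ids = ["ID001", "ID002", "ID003", "ID004"]
dx = {("ID001", "ID002"): 3.0, ("ID001", "ID003"): 6.0, ("ID001", "ID004"): 9.0,
      ("ID002", "ID003"): 6.0, ("ID002", "ID004"): 9.0, ("ID003", "ID004"): 0.6}
dy = {("ID001", "ID002"): 4.0, ("ID001", "ID003"): 8.0, ("ID001", "ID004"): 12.0,
      ("ID002", "ID003"): 8.0, ("ID002", "ID004"): 12.0, ("ID003", "ID004"): 0.8}
d = composite_distances(dx, dy)
clusters = threshold_clusters(ids, d, threshold=6.0)
```

The data providing side works only with the received distances Dx and never sees the underlying parameters X1 to X3, which is the confidentiality property this embodiment relies on.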

According to the fifth embodiment, the combined system 10 keeps the personal information in the first database 151 confidential while providing the data providing system 30 with distance information between common consumers derived from the first database 151, that is, similarity information, so that better clustering is performed in the data providing system 30. A combined database 155 of higher informational value can therefore be generated while protecting personal information.

[Sixth embodiment]
Next, a sixth embodiment will be described. The data processing system of the sixth embodiment is a modification of the data processing system 1 of the first embodiment. In the following, descriptions of configurations that are the same as in the first embodiment are omitted as appropriate.

According to the sixth embodiment, the processor 11 of the combined system 10 is configured to cluster the common consumers into a plurality of clusters, provide the resulting cluster information to the data providing system 30, and acquire cluster feature data FC2 for each cluster. That is, the sixth embodiment differs from the first embodiment in that the clustering of the common consumers is performed by the combined system 10 rather than by the data providing system 30.

Specifically, in the combined system 10 of this embodiment, the processor 11 is configured to execute the combined database generation processing shown in FIG. 19 in place of the processing shown in FIG. 3. In the data providing system 30, the processor 31 is configured to execute the processing from S530 to S580 shown in FIG. 20 in place of the processing from S230 to S280 shown in FIG. 4.

On starting the processing shown in FIG. 19, the processor 11 of the combined system 10 reads the first database 151 from the storage device 15, as in S110 (S410). It then identifies the common consumers between the groups, as in S120 (S420).

The processor 11 then clusters the common consumers into a plurality of clusters based on their first individual feature data F1 (S425). Here, the common consumers can be clustered by the same method as in S240, except that the first individual feature data F1 is used in place of the second individual feature data F2.

After S425, the processor 11 transmits to the data providing system 30 a database request signal accompanied by a list of the second identification codes of the common consumers, the list including cluster information (S430). The cluster information is structured so that the data providing system 30 can identify the cluster to which each common consumer belongs.

On receiving the database request signal (Yes in S220), the processor 31 of the data providing system 30, as shown in FIG. 20, identifies the cluster to which each common consumer belongs based on the cluster information in the list, and identifies the second individual feature data F2 of each common consumer based on the second identification codes in the list (S530).

The processor 31 then generates, for each of the clusters identified from the cluster information, one piece of cluster feature data FC2 that integrates the second individual feature data F2 of the consumers belonging to that cluster (S550). In S550, the same processing as S250 in the first embodiment can be executed.
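S530 and S550 can be sketched in Python as follows, assuming the cluster information arrives as a hypothetical mapping from identification code to cluster label and that the statistic is the arithmetic mean. The function name, the labels C1 and C2, and the sample values are illustrative only.

```python
from statistics import mean


def aggregate_by_cluster_info(cluster_of, second_data):
    """Build one record per received cluster from provider-side data.

    cluster_of: {consumer_id: cluster_label}, as received with the request
    second_data: {consumer_id: {parameter_name: value}}
    """
    members = {}
    for cid, label in cluster_of.items():
        if cid in second_data:  # S530: locate the provider-side record
            members.setdefault(label, []).append(cid)
    records = []
    for label, ids in sorted(members.items()):  # S550: one record per cluster
        names = sorted(second_data[ids[0]])
        stats = {n: mean(second_data[c][n] for c in ids) for n in names}
        records.append({"cluster": label, "ids": ids, "stats": stats})
    return records


cluster_of = {"ID001": "C1", "ID002": "C1", "ID003": "C2", "ID004": "C2"}
f2 = {
    "ID001": {"Y1": 2.0}, "ID002": {"Y1": 4.0},
    "ID003": {"Y1": 10.0}, "ID004": {"Y1": 20.0},
}
fc2 = aggregate_by_cluster_info(cluster_of, f2)
```

Here the provider performs no clustering of its own for the common consumers; it only groups its records by the labels it was given.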

After S550, the processor 31 can execute, in S560 and S570, the same processing as S260 and S270 in the first embodiment to generate cluster feature data FC2 for each cluster of non-common consumers.

The processor 31 then transmits the group of cluster feature data FC2 for the common and non-common consumers generated in S550 and S570 to the combined system 10 as the processed database FP2 (S580).

The processor 11 of the combined system 10 receives the processed database FP2 transmitted from the data providing system 30 in this way (S440) and generates the combined database 155 (S450). In S440 and S450, the processor 11 can execute the same processing as S140 and S150 in the first embodiment.

The sixth embodiment can also generate a combined database 155 like that of the first embodiment. As the sixth embodiment shows, the clustering of the common consumers can be executed by either the combined system 10 or the data providing system 30. Relatedly, the technical idea of the fifth embodiment, in which the combined system 10 provides distance information to the data providing system 30, can also be applied to the sixth embodiment in a form in which the data providing system 30 provides distance information to the combined system 10.

In this case, the processor 31 of the data providing system 30 can generate distance information for the common and non-common consumers based on the second individual feature data F2 in the second database 351 and provide it to the combined system 10. The processor 11 of the combined system 10 can use this distance information to cluster the common consumers into a plurality of clusters. In this case, the processor 11 may additionally cluster the non-common consumers using the distance information.

［その他］ [Others]
以上、本開示の例示的実施形態を説明したが、本開示は、上記実施形態に限定されるものではなく、種々の態様を採り得ることは言うまでもない。 Although exemplary embodiments of the present disclosure have been described above, it goes without saying that the present disclosure is not limited to the above embodiments and can take various forms.

例えば、第一及び第二のグループの構成体は、消費者に限定されない。第一及び第二のグループの一方又は両方は、人の活動に関連する物及び場所の少なくとも一つの集合であってもよく、第一及び第二のグループの構成体は、これら集合の要素であってもよい。 For example, the constituents of the first and second groups are not limited to consumers. One or both of the first and second groups may be a set of at least one of things and places related to human activity, and the constituents of the first and second groups may be elements of such sets.

上述の実施形態では、結合システム１０がデータ提供システム３０から第二グループの消費者リストを取得して、共通消費者を識別するが、結合システム１０は、Ｓ１２０の処理を実行せず、Ｓ１３０において、第一グループ全体の消費者リストを送信するように構成されてもよい。この場合、データ提供システム３０は、Ｓ２３０において、第一グループ全体の消費者リストに含まれる識別コードと、第二グループの消費者の識別コードとを比較することにより、共通消費者を識別することができる。共通消費者の識別は、多種の方法で行うことができ、これらの方法のいずれが採用されてもよい。 In the above-described embodiments, the combined system 10 acquires the consumer list of the second group from the data providing system 30 and identifies the common consumers. However, the combined system 10 may be configured to skip the process of S120 and to transmit, in S130, the consumer list of the entire first group. In this case, the data providing system 30 can identify the common consumers in S230 by comparing the identification codes included in the consumer list of the entire first group with the identification codes of the consumers in the second group. The common consumers can be identified in a variety of ways, and any of these methods may be employed.
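The comparison of identification codes described above can be sketched as a simple set lookup. This is a minimal example under stated assumptions: the function name `identify_common_consumers` and the sample codes are hypothetical, and the two groups are assumed to assign the same identification code to the same consumer.

```python
def identify_common_consumers(first_group_ids, second_group_ids):
    """Split the second group into common and non-common consumers.

    A consumer is "common" when its identification code appears in both lists.
    """
    first = set(first_group_ids)
    common = [cid for cid in second_group_ids if cid in first]
    non_common = [cid for cid in second_group_ids if cid not in first]
    return common, non_common


# Hypothetical identification codes.
first_list = ["u01", "u02", "u03", "u04"]
second_list = ["u03", "u09", "u01", "u07"]
common, non_common = identify_common_consumers(first_list, second_list)
```

Here `common` keeps the order of the second group's list, which is convenient when the result is used to select rows of the second group's individual feature data.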

この他、上記実施形態における１つの構成要素が有する機能は、複数の構成要素に分散して設けられてもよい。複数の構成要素が有する機能は、１つの構成要素に統合されてもよい。上記実施形態の構成の一部は、省略されてもよい。上記実施形態の構成の少なくとも一部は、他の上記実施形態の構成に対して付加又は置換されてもよい。特許請求の範囲に記載の文言から特定される技術思想に含まれるあらゆる態様が本開示の実施形態である。 In addition, the function of one component in the above embodiment may be distributed among a plurality of components. Functions of a plurality of components may be integrated into one component. A part of the configuration of the above embodiment may be omitted. At least a part of the configuration of the embodiment may be added to or replaced with the configuration of the other embodiment. Any aspect included in the technical idea specified from the wording of the claims is an embodiment of the present disclosure.

最後に用語間の対応関係を説明する。結合システム１０は、情報処理システムの一例に対応する。結合システム１０がデータベース要求信号を送信してデータ提供システム３０，５０から加工後データベースＦＰ２，ＦＰ３，ＦＰ４を取得する処理は、情報処理システムの取得ユニットにより実現される処理の一例に対応する。共通消費者は、対応構成体の一例に対応する。データ提供システム３０，５０がＳ２３０において共通消費者の個別特徴データＦ２，Ｆ３を識別する処理は、識別ユニットにより実現される処理の一例に対応する。 Finally, the correspondence between terms is explained. The combined system 10 corresponds to an example of the information processing system. The process in which the combined system 10 transmits a database request signal and acquires the post-processing databases FP2, FP3, and FP4 from the data providing systems 30 and 50 corresponds to an example of the process realized by the acquisition unit of the information processing system. A common consumer corresponds to an example of a corresponding constituent. The process in which the data providing systems 30 and 50 identify the individual feature data F2 and F3 of the common consumers in S230 corresponds to an example of the process realized by the identification unit.

１，２…データ加工システム、１０…結合システム、１１…プロセッサ、１３…メモリ、１５…ストレージ装置、３０…データ提供システム、３１…プロセッサ、３３…メモリ、３５…ストレージ装置、５０…データ提供システム、５１…プロセッサ、５３…メモリ、５５…ストレージ装置、１５１…第一データベース、１５５…結合データベース、３５１…第二データベース、５５１…第三データベース、Ｆ１，Ｆ２，Ｆ３，Ｆ４…個別特徴データ、ＦＣ２，ＦＣ３，ＦＣ４…クラスタ特徴データ、ＦＰ２，ＦＰ３，ＦＰ４…加工後データベース、ＦＳ２，ＦＳ３，ＦＳ４，ＦＳＩ…統計データ、Ｌ４，Ｌ５…リスト。 DESCRIPTION OF SYMBOLS 1, 2 ... Data processing system, 10 ... Combined system, 11 ... Processor, 13 ... Memory, 15 ... Storage device, 30 ... Data providing system, 31 ... Processor, 33 ... Memory, 35 ... Storage device, 50 ... Data providing system, 51 ... Processor, 53 ... Memory, 55 ... Storage device, 151 ... First database, 155 ... Combined database, 351 ... Second database, 551 ... Third database, F1, F2, F3, F4 ... Individual feature data, FC2, FC3, FC4 ... Cluster feature data, FP2, FP3, FP4 ... Post-processing database, FS2, FS3, FS4, FSI ... Statistical data, L4, L5 ... List.

Claims

An information processing system comprising:
a storage unit configured to store a group of first feature data relating to a first group including a plurality of first constituents;
an acquisition unit configured to acquire a group of second feature data relating to a second group including a plurality of second constituents; and
a combining unit,
wherein each piece of the first feature data is associated with a first identifier that is an identifier of the corresponding one or more first constituents, and represents features of the corresponding one or more first constituents,
each piece of the second feature data corresponds to a respective one of a plurality of clusters in the second group, each of the plurality of clusters including two or more of the plurality of second constituents, and includes statistical data that is associated with a second identifier, which is an identifier of the two or more second constituents included in the corresponding cluster, and that represents features of the two or more second constituents included in the corresponding cluster as statistics, and
the combining unit is configured to combine the group of the first feature data and the group of the second feature data, based on the first identifier associated with the first feature data, such that each piece of the first feature data is combined with the statistical data of the second feature data associated with the corresponding second identifier.
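By way of a non-limiting sketch, the combining based on the first and second identifiers could look as follows. The dictionary layout, the field names `member_ids` and `statistics`, the function name `combine`, and the sample values are assumptions for illustration, and the first and second identifiers are assumed to be identical codes for corresponding constituents.

```python
def combine(first_feature_data, second_feature_data):
    """Join per-constituent features with the statistics of the matching cluster."""
    # Index each cluster's statistics by every member identifier it carries.
    stats_by_id = {}
    for cluster in second_feature_data:
        for member_id in cluster["member_ids"]:
            stats_by_id[member_id] = cluster["statistics"]

    combined = {}
    for first_id, features in first_feature_data.items():
        stats = stats_by_id.get(first_id)
        if stats is not None:  # constituents without a matching cluster are skipped
            combined[first_id] = {**features, **stats}
    return combined


# Hypothetical first feature data (one record per constituent).
first = {"c1": {"age": 34}, "c2": {"age": 51}, "c3": {"age": 28}}
# Hypothetical second feature data (one record per cluster).
second = [{"member_ids": ["c1", "c2"], "statistics": {"avg_purchases": 4.5}}]
combined = combine(first, second)
```

In this sketch every member of a cluster receives the same statistics, so the combined records carry cluster-level values rather than the individual values held by the data provider.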

The information processing system according to claim 1, wherein
the acquisition unit is configured to have a data providing system, which has a plurality of pieces of individual feature data each corresponding to a respective one of the plurality of second constituents, cluster the plurality of second constituents into the plurality of clusters in accordance with a specified constraint condition, and to acquire from the data providing system the group of the second feature data corresponding to the plurality of clusters that accord with the constraint condition, and
the data providing system is configured to cluster the plurality of second constituents into the plurality of clusters in accordance with the constraint condition and to provide the information processing system with the group of the second feature data corresponding to the plurality of clusters.

The information processing system according to claim 2, wherein
the statistical data is data in which the data providing system has converted the features of the two or more second constituents, indicated by the individual feature data of the two or more second constituents included in the corresponding cluster, into statistics.

The information processing system according to claim 1, wherein
the plurality of second constituents include a plurality of corresponding constituents, each corresponding to one of the plurality of first constituents, and a plurality of non-corresponding constituents that correspond to none of the plurality of first constituents, and
each piece of the second feature data acquired by the acquisition unit corresponds to a respective one of a plurality of clusters defined by clustering the plurality of corresponding constituents in the second group.

The information processing system according to claim 4, wherein
the acquisition unit requests a data providing system, which has a plurality of pieces of individual feature data each corresponding to a respective one of the plurality of second constituents, to cluster the plurality of corresponding constituents into the plurality of clusters, and
the data providing system is configured to cluster, in accordance with the request, the plurality of corresponding constituents into the plurality of clusters and to provide the information processing system with the group of the second feature data corresponding to the plurality of clusters.

The information processing system according to claim 5, wherein
the acquisition unit transmits, to the data providing system, a list of the plurality of first constituents or a list of the plurality of corresponding constituents as a list of constituents, and
the data providing system is configured to identify the plurality of corresponding constituents in the second group based on the list.

The information processing system according to claim 6, wherein
the acquisition unit transmits, to the data providing system, distance information representing distances in a feature space between the plurality of constituents included in the list, and
the data providing system is configured to cluster the plurality of corresponding constituents in the second group into the plurality of clusters based on the distance information.

The information processing system according to claim 6 or 7, wherein
the acquisition unit transmits, to the data providing system, classification information representing a classification of each of the plurality of constituents included in the list, and
the data providing system is configured to cluster, based on the classification information, the plurality of corresponding constituents in the second group into the plurality of clusters such that corresponding constituents of different classifications are not mixed in one cluster.
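The constraint that constituents of different classifications are not mixed in one cluster can be illustrated with the following minimal sketch. It assumes a hypothetical function `cluster_within_classes`, a hypothetical `min_size` parameter, and a simple chunking strategy inside each classification; a real system would typically also use feature distances when forming the clusters.

```python
from collections import defaultdict


def cluster_within_classes(classification, min_size=2):
    """Form clusters of at least `min_size` members without mixing classifications.

    `classification` maps each identifier to its classification label.
    """
    by_class = defaultdict(list)
    for constituent_id, label in classification.items():
        by_class[label].append(constituent_id)

    clusters = []
    for members in by_class.values():
        if len(members) < min_size:
            continue  # too few members to cluster without mixing classifications
        chunks = [members[i:i + min_size] for i in range(0, len(members), min_size)]
        if len(chunks[-1]) < min_size:
            # Fold an undersized final chunk into the previous one.
            tail = chunks.pop()
            chunks[-1].extend(tail)
        clusters.extend(chunks)
    return clusters


# Hypothetical classification information (for example, a demographic segment).
classification = {"a": "F", "b": "F", "c": "F", "d": "M", "e": "M", "f": "X"}
clusters = cluster_within_classes(classification)
```

In this sketch a classification with fewer than `min_size` members yields no cluster at all, which is one possible way to keep every cluster at two or more constituents.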

The information processing system according to any one of claims 4 to 8, wherein
each of the plurality of corresponding constituents in the second group is the same constituent as one of the plurality of first constituents in the first group,
the same identifier is assigned to each of the plurality of corresponding constituents in both the first group and the second group, and
the combining unit combines each piece of the first feature data with the statistical data of the second feature data associated with the same identifier.

The information processing system according to any one of claims 1 to 8, wherein
the storage unit stores a correspondence relationship between the first identifier and the second identifier, and
the combining unit combines, in accordance with the correspondence relationship, each piece of the first feature data with the statistical data of the second feature data associated with the corresponding second identifier.

The information processing system according to any one of claims 1 to 10, wherein
each piece of the first feature data corresponds to a respective one of a plurality of clusters in the first group, each of the plurality of clusters including two or more of the plurality of first constituents, and includes statistical data that is associated with identifiers of the two or more first constituents included in the corresponding cluster and that represents features of the two or more first constituents included in the corresponding cluster as statistics.

The information processing system according to claim 1, wherein
the plurality of second constituents include a plurality of corresponding constituents, each corresponding to one of the plurality of first constituents,
the acquisition unit clusters the plurality of corresponding constituents into a plurality of clusters based on the first feature data of the corresponding first constituents, provides cluster information representing the cluster to which each of the plurality of corresponding constituents belongs to a data providing system that has a plurality of pieces of individual feature data each corresponding to a respective one of the plurality of second constituents, and acquires from the data providing system the group of the second feature data according to the cluster information, and
the data providing system is configured to generate, for each cluster identified from the cluster information, the second feature data of the corresponding cluster by converting the features of the two or more second constituents, indicated by the individual feature data of the two or more second constituents included in the corresponding cluster, into statistics, and to provide the group of the generated second feature data to the information processing system.

A data providing system comprising:
a storage unit configured to store a plurality of pieces of individual feature data, each corresponding to a respective one of a plurality of constituents in a group;
a clustering unit configured to cluster the plurality of constituents into a plurality of clusters;
a generating unit configured to generate a group of cluster feature data corresponding to the plurality of clusters; and
a providing unit configured to provide the group of the cluster feature data generated by the generating unit to an information processing system,
wherein each piece of the individual feature data is associated with an identifier of the corresponding constituent and represents features of the corresponding constituent,
each piece of the cluster feature data includes statistical data that is associated with identifiers of the two or more constituents included in the corresponding cluster and that represents features of the two or more constituents included in the corresponding cluster as statistics, and
the generating unit generates the statistical data by converting the features of the two or more constituents, indicated by the individual feature data of the two or more constituents included in the corresponding cluster, into the statistics.
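The generation of cluster feature data can be sketched as follows, as an illustration only. The function name `build_cluster_feature_data`, the record layout, and the choice of the arithmetic mean as the statistic are assumptions; the claim only requires that the features be converted into statistics associated with the member identifiers.

```python
from statistics import mean


def build_cluster_feature_data(individual, clusters):
    """Replace individual feature values with per-cluster means.

    `individual` maps an identifier to a dict of numeric features;
    `clusters` is a list of identifier lists.
    """
    cluster_feature_data = []
    for members in clusters:
        feature_names = individual[members[0]].keys()
        stats = {
            name: mean(individual[member][name] for member in members)
            for name in feature_names
        }
        # The member identifiers stay attached so the recipient can join on them.
        cluster_feature_data.append({"member_ids": list(members), "statistics": stats})
    return cluster_feature_data


# Hypothetical individual feature data held by the data providing system.
individual = {
    "c1": {"purchases": 4.0, "visits": 10.0},
    "c2": {"purchases": 6.0, "visits": 20.0},
    "c3": {"purchases": 1.0, "visits": 3.0},
    "c4": {"purchases": 3.0, "visits": 5.0},
}
cluster_feature_data = build_cluster_feature_data(individual, [["c1", "c2"], ["c3", "c4"]])
```

Only the averaged values and the member identifiers leave the data providing system in this sketch; the individual records stay in its storage.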

The data providing system according to claim 13, wherein
the clustering unit clusters the plurality of constituents into the plurality of clusters in accordance with a constraint condition specified by the information processing system.

The data providing system according to claim 13, wherein
the information processing system is configured to transmit, to the data providing system, distance information representing distances in a feature space between the plurality of constituents, and
the clustering unit clusters the plurality of constituents into the plurality of clusters based on the distance information from the information processing system.

The data providing system according to claim 13 or 15, wherein
the information processing system is configured to transmit, to the data providing system, classification information representing a classification of each of the plurality of constituents, and
the clustering unit clusters, based on the classification information from the information processing system, the plurality of constituents into the plurality of clusters such that constituents of different classifications are not mixed in one cluster.

The data providing system according to claim 13 or 15, wherein
the information processing system is configured to transmit a list of constituents to the data providing system,
the data providing system further comprises an identification unit configured to identify, from among the plurality of constituents, a plurality of corresponding constituents each corresponding to one of the constituents included in the list, and
the clustering unit clusters the plurality of corresponding constituents identified by the identification unit into the plurality of clusters.

A data providing system comprising:
a storage unit configured to store a plurality of pieces of individual feature data, each corresponding to a respective one of a plurality of constituents in a group;
a clustering unit configured to cluster the plurality of constituents into a plurality of clusters;
a generating unit configured to generate a group of cluster feature data corresponding to the plurality of clusters; and
a providing unit configured to provide the group of the cluster feature data generated by the generating unit to an information processing system,
wherein each piece of the individual feature data represents features of the corresponding constituent,
each piece of the cluster feature data includes statistical data that represents features of the two or more constituents included in the corresponding cluster as statistics,
the clustering unit clusters the plurality of constituents into the plurality of clusters in accordance with a constraint condition specified by the information processing system, and
the generating unit generates the statistical data by converting the features of the two or more constituents, indicated by the individual feature data of the two or more constituents included in the corresponding cluster, into the statistics.

A data providing system comprising:
a storage unit configured to store a plurality of pieces of individual feature data, each corresponding to a respective one of a plurality of constituents in a group;
a clustering unit configured to cluster the plurality of constituents into a plurality of clusters;
a generating unit configured to generate a group of cluster feature data corresponding to the plurality of clusters; and
a providing unit configured to provide the group of the cluster feature data generated by the generating unit to an information processing system,
wherein each piece of the individual feature data represents features of the corresponding constituent,
each piece of the cluster feature data includes statistical data that represents features of the two or more constituents included in the corresponding cluster as statistics,
the information processing system is configured to transmit, to the data providing system, distance information representing distances in a feature space between the plurality of constituents,
the clustering unit clusters the plurality of constituents into the plurality of clusters based on the distance information from the information processing system, and
the generating unit generates the statistical data by converting the features of the two or more constituents, indicated by the individual feature data of the two or more constituents included in the corresponding cluster, into the statistics.

A data providing system comprising:
a storage unit configured to store a plurality of pieces of individual feature data, each corresponding to a respective one of a plurality of constituents in a group;
a clustering unit configured to cluster the plurality of constituents into a plurality of clusters;
a generating unit configured to generate a group of cluster feature data corresponding to the plurality of clusters; and
a providing unit configured to provide the group of the cluster feature data generated by the generating unit to an information processing system,
wherein each piece of the individual feature data represents features of the corresponding constituent,
each piece of the cluster feature data includes statistical data that represents features of the two or more constituents included in the corresponding cluster as statistics,
the information processing system is configured to transmit, to the data providing system, classification information representing a classification of each of the plurality of constituents,
the clustering unit clusters, based on the classification information from the information processing system, the plurality of constituents into the plurality of clusters such that constituents of different classifications are not mixed in one cluster, and
the generating unit generates the statistical data by converting the features of the two or more constituents, indicated by the individual feature data of the two or more constituents included in the corresponding cluster, into the statistics.

An information processing method executed by a computer, the method comprising:
acquiring a group of first feature data relating to a first group including a plurality of first constituents;
acquiring a group of second feature data relating to a second group including a plurality of second constituents; and
combining the acquired group of the first feature data and the acquired group of the second feature data,
wherein each piece of the first feature data is associated with a first identifier that is an identifier of the corresponding one or more first constituents, and represents features of the corresponding one or more first constituents,
each piece of the second feature data corresponds to a respective one of a plurality of clusters in the second group, each of the plurality of clusters including two or more of the plurality of second constituents, and includes statistical data that is associated with a second identifier, which is an identifier of the two or more second constituents included in the corresponding cluster, and that represents features of the two or more second constituents included in the corresponding cluster as statistics, and
the combining includes combining the group of the first feature data and the group of the second feature data, based on the first identifier associated with the first feature data, such that each piece of the first feature data is combined with the statistical data of the second feature data associated with the corresponding second identifier.

The information processing method according to claim 21, wherein
acquiring the group of the first feature data includes reading the group of the first feature data from a storage device that stores the group of the first feature data, and
acquiring the group of the second feature data includes acquiring the group of the second feature data from a data providing system that provides the group of the second feature data.

A data providing method executed by a computer, the method comprising:
acquiring a plurality of pieces of individual feature data, each corresponding to a respective one of a plurality of constituents in a group;
clustering the plurality of constituents into a plurality of clusters;
generating a group of cluster feature data corresponding to the plurality of clusters; and
providing the group of the generated cluster feature data to an information processing system,
wherein each piece of the individual feature data is associated with an identifier of the corresponding constituent and represents features of the corresponding constituent,
each piece of the cluster feature data includes statistical data that is associated with identifiers of the two or more constituents included in the corresponding cluster and that represents features of the two or more constituents included in the corresponding cluster as statistics, and
the generating includes generating the statistical data by converting the features of the two or more constituents, indicated by the individual feature data of the two or more constituents included in the corresponding cluster, into the statistics.

A data providing method executed by a computer, the method comprising:
acquiring a plurality of pieces of individual feature data, each corresponding to a respective one of a plurality of constituents in a group;
clustering the plurality of constituents into a plurality of clusters;
generating a group of cluster feature data corresponding to the plurality of clusters; and
providing the group of the generated cluster feature data to an information processing system,
wherein each piece of the individual feature data represents features of the corresponding constituent,
each piece of the cluster feature data includes statistical data that represents features of the two or more constituents included in the corresponding cluster as statistics,
the clustering includes clustering the plurality of constituents into the plurality of clusters in accordance with a constraint condition specified by the information processing system, and
the generating includes generating the statistical data by converting the features of the two or more constituents, indicated by the individual feature data of the two or more constituents included in the corresponding cluster, into the statistics.

A data providing method executed by a computer, the method comprising:
acquiring a plurality of pieces of individual feature data, each corresponding to a respective one of a plurality of constituents in a group;
clustering the plurality of constituents into a plurality of clusters;
generating a group of cluster feature data corresponding to the plurality of clusters; and
providing the group of the generated cluster feature data to an information processing system,
wherein each piece of the individual feature data represents features of the corresponding constituent,
each piece of the cluster feature data includes statistical data that represents features of the two or more constituents included in the corresponding cluster as statistics,
the clustering includes clustering the plurality of constituents into the plurality of clusters based on distance information that is received from the information processing system and that represents distances in a feature space between the plurality of constituents, and
the generating includes generating the statistical data by converting the features of the two or more constituents, indicated by the individual feature data of the two or more constituents included in the corresponding cluster, into the statistics.

A data providing method executed by a computer, the method comprising:
acquiring a plurality of pieces of individual feature data, each corresponding to a respective one of a plurality of constituents in a group;
clustering the plurality of constituents into a plurality of clusters;
generating a group of cluster feature data corresponding to the plurality of clusters; and
providing the group of the generated cluster feature data to an information processing system,
wherein each piece of the individual feature data represents features of the corresponding constituent,
each piece of the cluster feature data includes statistical data that represents features of the two or more constituents included in the corresponding cluster as statistics,
the clustering includes clustering the plurality of constituents into the plurality of clusters, based on classification information that is received from the information processing system and that represents a classification of each of the plurality of constituents, such that constituents of different classifications are not mixed in one cluster, and
the generating includes generating the statistical data by converting the features of the two or more constituents, indicated by the individual feature data of the two or more constituents included in the corresponding cluster, into the statistics.

The data providing method according to any one of claims 23 to 26, wherein
acquiring the plurality of pieces of individual feature data includes reading the plurality of pieces of individual feature data from a storage device that stores the plurality of pieces of individual feature data.

A computer program for causing a computer to function as the acquisition unit and the combining unit provided in the information processing system according to any one of claims 1 to 12.

A computer program for causing a computer to function as the clustering unit, the generating unit, and the providing unit provided in the data providing system according to any one of claims 13 to 16.

A computer program for causing a computer to function as the clustering unit, the generating unit, and the providing unit provided in the data providing system according to any one of claims 18 to 20.