TW202336607A

TW202336607A - Information processing system and method of processing information

Info

Publication number: TW202336607A
Application number: TW111142613A
Authority: TW
Inventors: 熊谷雄介; 道本龍; 野沢悠哉
Original assignee: 日商博報堂Ｄｙ控股股份有限公司
Priority date: 2021-11-09
Filing date: 2022-11-08
Publication date: 2023-09-16
Also published as: JP7227412B1; WO2023085279A1; JP2023070618A

Abstract

An information processing system obtains a first data set for a plurality of first entities. A second data set is obtained for a plurality of second entities. A dimensionality reduction operation is performed on a group of the first feature vectors identified from the first data set and a group of the second feature vectors identified from the second data set. Thereby, a group of first low-dimensional feature vectors corresponding to the group of the first feature vectors and a group of second low-dimensional feature vectors corresponding to the group of the second feature vectors are generated. Each of the first entities is associated with at least one of the second entities based on the group of the first low-dimensional feature vectors and the group of the second low-dimensional feature vectors.

Description

Information processing systems and information processing methods

The present invention relates to an information processing system and an information processing method.

Conventionally, customers' purchasing behavior has been analyzed based on product sales data. Customers' exposure to mass media and online content has also been analyzed. Various information about customers is collected through questionnaires and/or face-to-face interviews.

A data fusion technique is known that combines a plurality of data collected by different means based on common variables. In particular, a technique has been disclosed for data fusion between a first data set and a second data set, where the first data set has first characteristic data for each of a plurality of first customers and the second data set has second characteristic data for each of a plurality of second customers (see, for example, Patent Document 1). Data fusion combines the first characteristic data and the second characteristic data of similar customers based on variables common to the first data set and the second data set, for example the customers' demographic attributes.

[Patent Document]

Patent Document 1 is Japanese Patent Application Laid-Open No. 2016-126609.

In conventional data fusion techniques, similar customers are identified using common variables, so the first data set and the second data set to be combined must share common variables about the customers. Data sets that have no common variables therefore cannot be combined with each other.

Therefore, according to one aspect of the present invention, it is desirable to provide a technique that can associate first entities with second entities, without relying on common variables, based on a first data set regarding a plurality of first entities and a second data set regarding a plurality of second entities.

According to one aspect of the present invention, an information processing system is provided. The information processing system includes a first acquisition unit, a second acquisition unit, a dimensionality reduction unit, and a correspondence establishment unit. The first acquisition unit is configured to acquire a first data set regarding a plurality of first entities. The first data set may describe the characteristics of each of the plurality of first entities.

The second acquisition unit is configured to acquire a second data set regarding a plurality of second entities. The second data set may describe the characteristics of each of the plurality of second entities.

The dimensionality reduction unit is configured to perform dimensionality reduction processing on a group of first feature vectors identified from the first data set and a group of second feature vectors identified from the second data set, thereby generating a group of first low-dimensional feature vectors corresponding to the group of first feature vectors and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors. The group of second low-dimensional feature vectors may be a group of feature vectors having the same number of dimensions as the group of first low-dimensional feature vectors.

Each of the first feature vectors may represent the characteristics of a corresponding one of the plurality of first entities. Each of the second feature vectors may represent the characteristics of a corresponding one of the plurality of second entities.

The correspondence establishment unit is configured to associate each of the plurality of first entities with at least one of the plurality of second entities based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

When the set of first entities and the set of second entities are subsets of a common population or subsets of mutually related populations, dimensionality reduction makes it possible to express the characteristics of the first entities and the characteristics of the second entities by a combination of mutually common components or of mutually related components, even if the first feature vectors and the second feature vectors share no common variable.

That is, dimensionality reduction can extract, from the first feature vectors and the second feature vectors, principal feature components that are common to or related to each other. The compatibility between a first entity and a second entity can therefore be appropriately determined by comparing the low-dimensional feature vectors.
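
As a minimal sketch of the dimensionality reduction step, the following Python code reduces two groups of feature vectors with different original dimensions to the same number of dimensions. The text does not name a specific reduction technique; principal component analysis by power iteration is used here purely as an assumption, and the function names and sample vectors are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def principal_directions(rows, k, iters=200, seed=0):
    """Top-k principal directions of centered rows (power iteration + deflation)."""
    rng = random.Random(seed)
    d = len(rows[0])
    # Scatter matrix X^T X; its leading eigenvectors are the principal directions.
    scatter = [[sum(r[a] * r[b] for r in rows) for b in range(d)] for a in range(d)]
    directions = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
        for _ in range(iters):
            w = [sum(scatter[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # no variance left along this direction
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * scatter[a][b] * v[b] for a in range(d) for b in range(d))
        directions.append(v)
        # Deflation: remove the component just found before searching for the next one.
        scatter = [[scatter[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
                   for a in range(d)]
    return directions


def reduce_dimensions(rows, k):
    """Project each feature vector onto k principal directions of its own data set."""
    centered = center(rows)
    directions = principal_directions(centered, k)
    return [[sum(x * c for x, c in zip(row, direction)) for direction in directions]
            for row in centered]


# Hypothetical binary feature vectors with no shared variables.
first_vectors = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]              # 3 variables
second_vectors = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]  # 4 variables

first_low = reduce_dimensions(first_vectors, 2)
second_low = reduce_dimensions(second_vectors, 2)
```

Both groups end up with two dimensions per entity, which is the property the correspondence establishment unit relies on when it compares the two groups.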

Therefore, according to one aspect of the present invention, first entities and second entities can be appropriately associated, without relying on common variables, based on a first data set regarding a plurality of first entities and a second data set regarding a plurality of second entities.

According to one aspect of the present invention, the correspondence establishment unit can associate each of the plurality of first entities with at least one of the plurality of second entities, based on the similarities between first entities identified from the group of first low-dimensional feature vectors and the similarities between second entities identified from the group of second low-dimensional feature vectors, so that the interrelationship among the first entities with respect to similarity matches the interrelationship among the second entities.

When the set of first entities and the set of second entities are subsets of a common population or subsets of mutually related populations, the interrelationships among entities with respect to similarity are, as in the population, largely common to or related between the set of first entities and the set of second entities.

Therefore, associating each of the plurality of first entities with at least one of the plurality of second entities so that the interrelationship among the first entities with respect to similarity matches the interrelationship among the second entities makes it possible to associate each first entity with an appropriate second entity that is highly identical or strongly related to it.

According to one aspect of the present invention, the group of first low-dimensional feature vectors may be defined in a first feature space. The group of second low-dimensional feature vectors may be defined in a second feature space.

The correspondence establishment unit may search for a mapping that maps the plurality of first entities in the first feature space to the second feature space, such that the distribution of the plurality of first entities in the first feature space, identified from the group of first low-dimensional feature vectors, fits the distribution of the plurality of second entities in the second feature space, identified from the group of second low-dimensional feature vectors.

The correspondence establishment unit may be configured to associate each of the plurality of first entities with at least one of the plurality of second entities based on the mapping.
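
The following Python sketch illustrates how a mapping, once found, could be used to associate entities. It assumes that the mapping is a linear map given as a matrix and that each mapped first entity is paired with the nearest second entity by Euclidean distance; neither assumption comes from the text, the search for the mapping itself is not shown, and all names and values are hypothetical.

```python
import math


def apply_mapping(w, vec):
    """Map a vector from the first feature space to the second one (linear map w)."""
    return [sum(w[i][j] * vec[j] for j in range(len(vec))) for i in range(len(w))]


def associate_by_mapping(first_low, second_low, w):
    """Pair each first entity with the second entity nearest to its mapped position."""
    result = {}
    for first_id, vec in first_low.items():
        mapped = apply_mapping(w, vec)
        result[first_id] = min(
            second_low, key=lambda second_id: math.dist(mapped, second_low[second_id])
        )
    return result


# Hypothetical low-dimensional feature vectors keyed by entity ID.
first_low = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}
second_low = {"b1": [0.0, 1.0], "b2": [-1.0, 0.0]}

# Hypothetical mapping: a 90-degree rotation that aligns the two distributions.
rotation = [[0.0, -1.0], [1.0, 0.0]]

pairs = associate_by_mapping(first_low, second_low, rotation)
```

Here `a1` is paired with `b1` and `a2` with `b2`, because the rotation moves each first entity exactly onto one second entity.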

According to one aspect of the present invention, the correspondence establishment unit may be configured to search, as a matrix Ω^*, for the matrix Ω that maximizes the value Z(Ω) given by the following mathematical expression involving a matrix K, a matrix L, and a matrix H, and to associate each of the plurality of first entities with at least one of the plurality of second entities based on the matrix Ω^*. T is the transpose symbol. trace(X) is the sum of the diagonal elements of a matrix X.

[Mathematical Formula 1]

The matrix K may be a matrix with N rows and N columns. The number of first entities may be N. The number of second entities may be the same as the number of first entities. The matrix K may be a first similarity matrix in which the value of the element in row i and column j represents the similarity between the i-th entity and the j-th entity among the plurality of first entities.

The value of the element in row i and column j of the matrix K may be calculated based on the first low-dimensional feature vector of the i-th entity and the first low-dimensional feature vector of the j-th entity among the plurality of first entities.

The matrix L may be a matrix with N rows and N columns. The matrix L is a second similarity matrix in which the value of the element in row i and column j represents the similarity between the i-th entity and the j-th entity among the plurality of second entities.

The value of the element in row i and column j of the matrix L may be calculated based on the second low-dimensional feature vector of the i-th entity and the second low-dimensional feature vector of the j-th entity among the plurality of second entities.
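
As an illustration of how the similarity matrices K and L could be computed from low-dimensional feature vectors, the Python sketch below uses a Gaussian similarity exp(-γ‖x_i - x_j‖²). The text only states that each element is calculated from the two corresponding low-dimensional feature vectors, so the choice of similarity function, the parameter `gamma`, and the sample vectors are assumptions.

```python
import math


def similarity_matrix(low_dim_vectors, gamma=1.0):
    """N x N matrix whose (i, j) element is exp(-gamma * ||x_i - x_j||^2)."""
    n = len(low_dim_vectors)
    return [[math.exp(-gamma * math.dist(low_dim_vectors[i], low_dim_vectors[j]) ** 2)
             for j in range(n)]
            for i in range(n)]


# Hypothetical two-dimensional low-dimensional feature vectors.
first_low = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]]
second_low = [[1.0, 1.0], [3.0, 3.0], [1.0, 1.2]]

K = similarity_matrix(first_low)    # similarities among first entities
L = similarity_matrix(second_low)   # similarities among second entities
```

Each matrix is symmetric with ones on the diagonal, and entities that lie close together in the feature space receive a similarity near 1.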

The matrix H may be a matrix with N rows and N columns. The matrix H may be a matrix in which the element in row i and column j has the value 1-1/N when i=j and the value 0 when i≠j.
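
The mathematical expression itself is not reproduced in this text. The Python sketch below therefore assumes the form Z(Ω) = trace(H K H Ω^T H L H Ω), which is the objective used in kernelized sorting and is consistent with the symbols listed above; it should be read as an assumption, not as the expression of the publication. Ω is restricted to permutation matrices, and the exhaustive search is a stand-in that is practical only for very small N. The matrix H is built exactly as described in the text; the kernelized sorting literature usually uses a centering matrix with -1/N off the diagonal, which could be substituted in `matrix_h`. All function names and sample matrices are hypothetical.

```python
from itertools import permutations


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))


def matrix_h(n):
    """H as described in the text: 1 - 1/N on the diagonal, 0 elsewhere."""
    return [[1.0 - 1.0 / n if i == j else 0.0 for j in range(n)] for i in range(n)]


def permutation_matrix(match):
    """Omega[i][j] = 1 when first entity j is paired with second entity i = match[j]."""
    n = len(match)
    return [[1.0 if match[j] == i else 0.0 for j in range(n)] for i in range(n)]


def objective(k, l, omega):
    """Assumed form: Z = trace(H K H Omega^T H L H Omega)."""
    h = matrix_h(len(k))
    k_bar = matmul(matmul(h, k), h)
    l_bar = matmul(matmul(h, l), h)
    return trace(matmul(matmul(k_bar, transpose(omega)), matmul(l_bar, omega)))


def search_best_match(k, l):
    """Exhaustive search over all N! pairings; only feasible for very small N."""
    best_match, best_z = None, float("-inf")
    for match in permutations(range(len(k))):
        z = objective(k, l, permutation_matrix(match))
        if z > best_z:
            best_match, best_z = list(match), z
    return best_match, best_z


# Hypothetical similarity matrices: L is K with its entities reordered.
K = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
L = [[1.0, 0.2, 0.9], [0.2, 1.0, 0.1], [0.9, 0.1, 1.0]]

match, z = search_best_match(K, L)
omega_star = permutation_matrix(match)
```

In this example `match[j]` gives the index of the second entity paired with first entity j, and the search recovers the reordering that makes the similarity structure of L agree with that of K.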

According to one aspect of the present invention, the correspondence establishment unit may change the dimensionality reduction method used in the dimensionality reduction processing based on the matrix Ω^*. For example, the correspondence establishment unit may change the dimensionality reduction method so as to shorten the distance in the feature space between a first low-dimensional feature vector and a second low-dimensional feature vector that correspond to each other within the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

According to one aspect of the present invention, the correspondence establishment unit may be configured to improve the matrix Ω^* by repeatedly executing re-search processing for the matrix Ω^* until a predetermined condition is satisfied, and to associate each of the plurality of first entities with at least one of the plurality of second entities based on the improved matrix Ω^*.

The re-search processing may include changing the dimensionality reduction method used in the dimensionality reduction processing based on the matrix Ω^*. The re-search processing may include causing the dimensionality reduction unit to execute the dimensionality reduction processing using the changed dimensionality reduction method, and searching for the matrix Ω^* again based on the newly obtained group of first low-dimensional feature vectors and group of second low-dimensional feature vectors.
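
The control flow of the repeated re-search processing can be sketched in Python as follows. The text does not specify the predetermined condition, so the sketch assumes that the loop stops when the score no longer improves or when a maximum number of rounds is reached. The three callables and the toy stand-ins used to exercise the loop are hypothetical placeholders, not the actual reduction, search, or update procedures.

```python
def refine_correspondence(reduce_fn, search_fn, update_fn, method,
                          max_rounds=10, tol=1e-9):
    """Alternate dimensionality reduction and correspondence search.

    reduce_fn(method)              -> (first_low, second_low)
    search_fn(first_low, second_low) -> (match, score)
    update_fn(method, match)       -> changed dimensionality reduction method
    """
    best_match, best_score = None, float("-inf")
    for _ in range(max_rounds):
        first_low, second_low = reduce_fn(method)
        match, score = search_fn(first_low, second_low)
        if score <= best_score + tol:
            break  # assumed stop condition: no further improvement
        best_match, best_score = match, score
        method = update_fn(method, match)  # change the reduction method
    return best_match, best_score


# Toy stand-ins: the "method" is a counter and the score saturates at 3.
toy_reduce = lambda step: ([step], [step])
toy_search = lambda first_low, second_low: ("match@%d" % first_low[0], min(first_low[0], 3))
toy_update = lambda step, match: step + 1

result = refine_correspondence(toy_reduce, toy_search, toy_update, 0)
```

With these stand-ins the loop improves the score over four rounds and stops on the fifth, when the score stays at 3.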

According to the information processing system including the correspondence establishment unit configured as described above, the process of establishing correspondence between the first entities and the second entities can be executed with high accuracy.

According to one aspect of the present invention, the first data set may include a plurality of first characteristic data. Each of the plurality of first characteristic data may represent the characteristics of a corresponding one of the plurality of first entities. The second data set may include a plurality of second characteristic data. Each of the plurality of second characteristic data may represent the characteristics of a corresponding one of the plurality of second entities.

According to one aspect of the present invention, the information processing system may further include a data fusion unit. The data fusion unit may be configured to generate an extended data set by combining each of the plurality of first characteristic data with one of the plurality of second characteristic data, based on the correspondence between the plurality of first entities and the plurality of second entities established by the correspondence establishment unit. The extended data set may include a plurality of extended data. Each of the plurality of extended data may be data generated by combining a corresponding first characteristic data with a corresponding second characteristic data.
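
A minimal Python sketch of the data fusion step is shown below. It assumes that each characteristic data is a dictionary keyed by variable name and that the established correspondence is a dictionary from first entity ID to second entity ID; these structures, the IDs, and the values are hypothetical.

```python
def fuse(first_data, second_data, correspondence):
    """Combine each first characteristic data with its matched second characteristic data."""
    extended = {}
    for first_id, features in first_data.items():
        second_id = correspondence[first_id]
        merged = dict(features)                 # copy the first characteristic data
        merged.update(second_data[second_id])   # append the matched second data
        extended[first_id] = merged
    return extended


# Hypothetical data sets with no variable in common.
first_data = {"a1": {"P1": 1, "P2": 0}, "a2": {"P1": 0, "P2": 1}}
second_data = {"b1": {"S1": 1, "S2": 1}, "b2": {"S1": 0, "S2": 1}}
correspondence = {"a1": "b2", "a2": "b1"}

extended_data_set = fuse(first_data, second_data, correspondence)
```

Each entry of `extended_data_set` carries the variables of both data sets for one first entity.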

According to the above information processing system, a data set with a large amount of information can be generated by combining a plurality of data sets.

According to one aspect of the present invention, the first entities may be people. The second entities may be people. The first data set may be a data set describing first characteristics of each of a plurality of people belonging to a first group. The second data set may be a data set describing second characteristics of each of a plurality of people belonging to a second group.

It is considered that characteristics such as people's behavior and interests are strongly linked to demographic attributes, and that the distribution of characteristics corresponding to demographic attributes does not vary greatly even between different groups of people. Therefore, the information processing system according to one aspect of the present invention can appropriately associate people in different groups without requiring common variables.

According to one aspect of the present invention, the combination of the first characteristics and the second characteristics may be a combination of characteristics regarding purchasing behavior, characteristics regarding movement in at least one of online space and offline space, and/or characteristics regarding visits to a plurality of locations in a space. Associating entities, and further fusing data, based on data sets related to these characteristics helps in analyzing human behavior.

According to one aspect of the present invention, identification information of an information terminal corresponding to each of the plurality of second entities may be associated with the second data set.

According to one aspect of the present invention, the information processing system may include a selection unit. The selection unit selects, as transmission targets of information content, at least part of the set of second entities that, among the plurality of second entities, have been associated with any of the plurality of first entities by the correspondence establishment unit.

According to one aspect of the present invention, the information processing system may include a transmission unit. The transmission unit is configured to transmit the information content, based on the identification information, to the set of information terminals corresponding to the transmission targets of the information content.

This information processing system functions meaningfully when the first entities and the second entities are people. With the above transmission method, even when the relationship between the first entities and information terminals is unknown, the identification information of the information terminals associated with the second entities can be used to appropriately transmit the information content to the information terminals of the second entities corresponding to the first entities.

According to one aspect of the present invention, the selection unit may be configured to select a first set and a second set as transmission targets of the information content, where the first set is the set of second entities that have been associated with any of the plurality of first entities by the correspondence establishment unit, and the second set is a set of second entities, among the plurality of second entities, whose characteristics are similar to those of the first set. Selecting transmission targets in this way makes it possible to expand the transmission targets to an appropriate range based on the second data set and to transmit the information content.
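
The selection of transmission targets, including the expansion to similar second entities, can be sketched in Python as follows. The text does not define how similarity to the first set is judged; the sketch assumes a Euclidean distance threshold (`radius`) on hypothetical feature vectors of the second entities, and the terminal identifiers are likewise hypothetical.

```python
import math


def select_targets(correspondence, second_vectors, terminal_ids, radius=0.5):
    """Return terminal IDs for matched second entities plus similar second entities."""
    # First set: second entities associated with any first entity.
    first_set = set(correspondence.values())
    # Second set: unmatched second entities close to some member of the first set.
    second_set = {
        sid for sid, vec in second_vectors.items()
        if sid not in first_set
        and any(math.dist(vec, second_vectors[m]) <= radius for m in first_set)
    }
    targets = first_set | second_set
    return sorted(terminal_ids[sid] for sid in targets)


# Hypothetical correspondence, feature vectors, and terminal identification information.
correspondence = {"a1": "b1"}
second_vectors = {"b1": [0.0, 0.0], "b2": [0.1, 0.0], "b3": [5.0, 5.0]}
terminal_ids = {"b1": "dev-b1", "b2": "dev-b2", "b3": "dev-b3"}

targets = select_targets(correspondence, second_vectors, terminal_ids)
```

Here `b1` is selected because it is matched to a first entity, `b2` is added because it lies close to `b1`, and `b3` is left out.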

According to one aspect of the present invention, the second data set may be a data set describing characteristics regarding the behavior of each of the plurality of second entities. In this case, the information processing system may include an estimation unit. For one or more entities of interest, the estimation unit calculates, for each entity of interest, an estimated value regarding the behavior of that entity of interest. The one or more entities of interest may be at least some of the plurality of first entities. The estimated value may be calculated based on the characteristics regarding the behavior of at least one of the second entities that have been associated with the corresponding entity of interest. The first entities and the second entities may be people.

According to the information processing system including the estimation unit, behavior of a first entity that cannot be determined from the first data set alone can be estimated using the second data set. The estimation may be a prediction.
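
As a sketch of the estimation unit, the Python code below computes an estimated value for an entity of interest as the mean of a behavioral variable over the second entities associated with it. The text leaves the calculation open, so averaging is an assumption, and the IDs, variable names, and values are hypothetical.

```python
def estimate_behavior(entity_id, correspondence, second_data, variable):
    """Mean of one behavioral variable over the second entities matched to entity_id."""
    matched_ids = correspondence[entity_id]
    values = [second_data[sid][variable] for sid in matched_ids]
    return sum(values) / len(values)


# Hypothetical correspondence: each first entity is matched to one or more second entities.
correspondence = {"a1": ["b1", "b2"], "a2": ["b3"]}
second_data = {
    "b1": {"S1": 1, "S2": 0},
    "b2": {"S1": 0, "S2": 0},
    "b3": {"S1": 1, "S2": 1},
}

estimate_a1 = estimate_behavior("a1", correspondence, second_data, "S1")
estimate_a2 = estimate_behavior("a2", correspondence, second_data, "S2")
```

With binary variables such as site visits, the estimated value can be read as the share of matched second entities that showed the behavior.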

According to one aspect of the present invention, an information processing method corresponding to the method performed by the above information processing system may be provided. According to one aspect of the present invention, an information processing method executed by a computer may be provided. The information processing method may include acquiring a first data set, where the first data set is a data set regarding a plurality of first entities and describes the characteristics of each of the plurality of first entities.

The information processing method may include acquiring a second data set, where the second data set is a data set regarding a plurality of second entities and describes the characteristics of each of the plurality of second entities.

The information processing method may include performing dimensionality reduction processing on a group of first feature vectors identified from the first data set and a group of second feature vectors identified from the second data set, thereby generating a group of first low-dimensional feature vectors corresponding to the group of first feature vectors and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors. The group of second low-dimensional feature vectors may be a group of feature vectors having the same number of dimensions as the group of first low-dimensional feature vectors.

The information processing method may include associating each of the plurality of first entities with at least one of the plurality of second entities based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

According to one aspect of the present invention, the associating may include associating each of the plurality of first entities with at least one of the plurality of second entities, based on the similarities between first entities identified from the group of first low-dimensional feature vectors and the similarities between second entities identified from the group of second low-dimensional feature vectors, so that the interrelationship among the first entities with respect to similarity matches the interrelationship among the second entities.

According to the above information processing method, as with the above information processing system, first entities and second entities can be associated, without relying on common variables, based on a first data set regarding a plurality of first entities and a second data set regarding a plurality of second entities.

According to one aspect of the present invention, a computer program including instructions for causing a computer to execute the above information processing method may be provided. According to one aspect of the present invention, a computer-readable recording medium storing the computer program may be provided.

According to one aspect of the present invention, a computer-readable recording medium storing a computer program may be provided, characterized in that the above information processing method is carried out when a computer loads and executes the computer program.

According to one aspect of the present invention, a computer program product containing a computer program may be provided, characterized in that the above information processing method is carried out when a computer loads and executes the computer program.

Hereinafter, exemplary embodiments of the present invention will be described with reference to the drawings.

[First Embodiment]

The information processing system 1 of this embodiment is configured by installing a dedicated computer program Pr on a general-purpose computer. As shown in FIG. 1, the information processing system 1 includes a processor 11, a memory 13, a storage 15, a user interface 17, and a communication interface 19.

The processor 11 executes processing according to the computer program Pr stored in the storage 15. The memory 13 is a primary storage device including RAM and is used as a work area when the processor 11 executes processing.

The storage 15 is a secondary storage device including, for example, a hard disk drive or a solid-state drive. In addition to the computer program Pr, it stores various data used when processing is executed according to the computer program Pr.

The user interface 17 includes an input device and a display. The input device is provided to input operation signals from a user operating the information processing system 1 to the processor 11. The display is provided to display various information to the user. Examples of the input device include a keyboard and a pointing device.

The communication interface 19 includes a LAN (local area network) interface and a USB (universal serial bus) interface and is used to communicate with external devices. The information processing system 1 sends data to and receives data from external devices through the communication interface 19.

The processor 11 of the information processing system 1 generates an extended data set 15C by executing processing according to the computer program Pr. The extended data set 15C is generated by extending, with a second data set 15B, a first data set 15A acquired from an external device through the communication interface 19.

The extended data set 15C is a data set in which the information contained in the second data set 15B is added to the first data set 15A. The extension increases the amount of information on each entity described in the first data set 15A. The entities are, for example, people, in particular individuals. The amount of information is increased in order to perform human behavior analysis and/or advertisement delivery based on the extended data set 15C.

Specifically, when an execution instruction from the user is input through the user interface 17, the processor 11 of the information processing system 1 executes the analysis process shown in FIG. 2. When the analysis process shown in FIG. 2 starts, the processor 11 acquires the first data set 15A and the second data set 15B that are the targets of data fusion (S110, S120).

In S110 and S120, the processor 11 may read from the storage 15 the first data set 15A and the second data set 15B stored in advance in the storage 15. The processor 11 can thereby acquire the first data set 15A and the second data set 15B.

The first data set 15A and the second data set 15B to be acquired may be specified by the user. The user may collect in advance the first data set 15A and the second data set 15B to be fused and store them in the storage 15.

Alternatively, the processor 11 may acquire the first data set 15A from a first external device and the second data set 15B from a second external device by communicating through the communication interface 19.

第一資料集15A係關於複數個第一實體之資料集，且係記述第一實體各自的第一特徵之資料集。第一資料集15A係第一特徵資料的集合，各第一特徵資料分別表示複數個第一實體中與其對應之一個實體的第一特徵。The first data set 15A is a data set regarding a plurality of first entities, and is a data set describing first characteristics of each of the first entities. The first data set 15A is a set of first characteristic data, and each first characteristic data respectively represents the first characteristic of a corresponding entity among the plurality of first entities.

The second data set 15B is a data set on a plurality of second entities and describes a second characteristic of each second entity. The second characteristic may differ from the first characteristic. Specifically, the second data set 15B is a collection of second characteristic data, each item of which represents the second characteristic of the corresponding one of the plurality of second entities.

The set of first entities and the set of second entities are, for example, different subsets of a common population. The population may be a set of people or a set of consumers. For example, the set of first entities may be a set of people who are customers of a first company, and the set of second entities may be a set of people who are customers of a second company different from the first company.

Alternatively, the set of first entities may be a set of people whose first behavior is the subject of data collection, and the set of second entities may be a set of people whose second behavior is the subject of data collection.

The first data set 15A shown in FIG. 3A is data on a first set whose entities are people, and contains characteristic data on each person's purchasing behavior. Each item of characteristic data is associated with the ID of the corresponding person and indicates, with a binary value of 1 or 0, whether that person has purchased each of a plurality of products P1, P2, P3, ....

The second data set 15B shown in FIG. 3B is data on a second set whose entities are people, and contains characteristic data on each person's browsing of web content. Each item of characteristic data is associated with the ID of the corresponding person and indicates, for each of a plurality of websites S1, S2, S3, ..., with a binary value of 1 or 0, whether that person has visited the website and browsed its web content.

In S110, the processor 11 generates an M1-dimensional feature vector x = (x1, x2, x3, ...) for each first entity, based on the first characteristic data of each first entity contained in the acquired first data set 15A. In one example, the elements x1, x2, x3, ... of the feature vector x indicate whether the corresponding person has purchased the products P1, P2, P3, ..., respectively.

Similarly, in S120, the processor 11 generates an M2-dimensional feature vector y = (y1, y2, y3, ...) for each second entity, based on the second characteristic data of each second entity contained in the acquired second data set 15B. In one example, the elements y1, y2, y3, ... of the feature vector y indicate whether the corresponding person has browsed web content on the websites S1, S2, S3, ..., respectively.
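The construction of binary feature vectors described for S110 and S120 can be sketched as follows. This is a minimal illustration only: the record layout, the IDs, and the helper name `to_feature_matrix` are hypothetical and do not come from the specification.

```python
import numpy as np

# Hypothetical purchase records: person ID -> set of purchased product IDs.
purchases = {
    "A001": {"P1", "P3"},
    "A002": {"P2"},
    "A003": {"P1", "P2", "P3"},
}
products = ["P1", "P2", "P3"]  # fixed column order; M1 = len(products)


def to_feature_matrix(records, columns):
    """Return (ids, X) where X[i, j] is 1 if entity ids[i] has item columns[j], else 0."""
    ids = sorted(records)
    X = np.array([[1 if c in records[i] else 0 for c in columns] for i in ids])
    return ids, X


# Each row of X is one feature vector x = (x1, x2, x3, ...).
ids, X = to_feature_matrix(purchases, products)
```

The same helper could be applied to website-visit records to obtain the feature vectors y, with the site list S1, S2, S3, ... as the columns.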

The processor 11 then performs dimensionality reduction on the group of feature vectors x (S130), transforming each feature vector x from an M1-dimensional feature vector into a feature vector of dimension M smaller than M1, namely a low-dimensional feature vector Dx = (Dx1, Dx2, ...). The processor 11 thereby generates a group of low-dimensional feature vectors Dx corresponding to the group of feature vectors x. The lower right area of FIG. 3A shows an example of the low-dimensional feature vectors Dx in table form.

The processor 11 further performs dimensionality reduction on the group of feature vectors y (S140), transforming each feature vector y from an M2-dimensional feature vector into a feature vector of dimension M smaller than M2, namely a low-dimensional feature vector Dy = (Dy1, Dy2, ...). The processor 11 thereby generates a group of low-dimensional feature vectors Dy corresponding to the group of feature vectors y. The low-dimensional feature vector Dy has the same number of dimensions M as the low-dimensional feature vector Dx. The lower right area of FIG. 3B shows an example of the low-dimensional feature vectors Dy in table form.

Known examples of algorithms for mapping to a low-dimensional space include nonnegative matrix factorization, latent Dirichlet allocation, singular value decomposition, and probabilistic latent semantic analysis. Any one of these algorithms may be used for the dimensionality reduction in S130 and S140.

With these algorithms, the feature vectors can be reduced in dimensionality so as to extract the principal feature components that strongly characterize each entity. Alternatively, the feature vectors can be reduced in dimensionality in a way that loses little of the information needed to distinguish the entities from one another.
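As one possible illustration of S130 and S140, the sketch below uses singular value decomposition, one of the algorithms named above, to reduce both groups of feature vectors to the same number of dimensions M. The helper name `reduce_svd`, the random test data, and the choice M = 2 are assumptions made for the example; the specification does not prescribe this particular implementation.

```python
import numpy as np


def reduce_svd(X, m):
    """Project the rows of X (N x M_in) onto the top-m right singular vectors."""
    X = np.asarray(X, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # U[:, :m] * s[:m] equals X @ Vt[:m].T and has shape (N, m).
    return U[:, :m] * s[:m]


rng = np.random.default_rng(0)
X = (rng.random((6, 5)) > 0.5).astype(float)  # 6 first entities, M1 = 5
Y = (rng.random((6, 8)) > 0.5).astype(float)  # 6 second entities, M2 = 8

Dx = reduce_svd(X, 2)  # low-dimensional feature vectors Dx, M = 2
Dy = reduce_svd(Y, 2)  # low-dimensional feature vectors Dy, M = 2
```

Note that the two reductions are computed independently, so the components of Dx and Dy are not guaranteed to appear in the same order; this is the misalignment that the subsequent alignment processing addresses.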

The processor 11 then executes alignment processing (S150 to S180), which calculates the relationship between the first entities and the second entities based on the group of low-dimensional feature vectors Dx and the group of low-dimensional feature vectors Dy.

The alignment processing is performed using kernelized sorting. The details of alignment processing using kernelized sorting are described below. However, the alignment processing may instead be implemented using adversarial learning, Gromov-Wasserstein alignment, or unbalanced optimal transport.

In S150, the processor 11 uses the group of low-dimensional feature vectors Dx to generate a similarity matrix K for the set of first entities. The similarity matrix K is a square matrix with N rows and N columns, where N is the number of low-dimensional feature vectors Dx, in other words the number of first entities.

The similarity matrix K is defined such that the value Kij of the element in row i, column j represents the similarity between the i-th entity and the j-th entity in the set of first entities.

That is, the similarity matrix K describes the distribution of similarities between the entities in the set of first entities. In other words, for the set of first entities, it describes the distribution of the entities in the feature space, using the similarity between entities as the measure.

Specifically, the similarity is calculated as the value k(Dx[i], Dx[j]) obtained by substituting into a kernel function k(a, b) the low-dimensional feature vector Dx[i] of the i-th entity and the low-dimensional feature vector Dx[j] of the j-th entity. That is, Kij = k(Dx[i], Dx[j]).

Examples of the kernel function k(a, b) include the Gaussian RBF (radial basis function) kernel expressed by the following formula. A similarity calculated with this kernel function takes a value in the range from 0 to 1.

[Mathematical formula 2]

With this kernel function k(a, b), the value Kij of each element of the similarity matrix K satisfies 0 < Kij ≤ 1.
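The similarity matrix of S150 can be sketched as follows. The formula image itself is not reproduced in this text, so the sketch assumes the commonly used Gaussian RBF form k(a, b) = exp(-||a - b||^2 / (2σ^2)); the bandwidth parameter `sigma` and the function name `rbf_similarity` are assumptions of the example rather than details taken from the specification.

```python
import numpy as np


def rbf_similarity(D, sigma=1.0):
    """Return the N x N matrix whose (i, j) element is k(D[i], D[j]).

    Assumed kernel: k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)).
    """
    D = np.asarray(D, dtype=float)
    # Pairwise squared Euclidean distances between all rows of D.
    sq_dist = np.sum((D[:, None, :] - D[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# Three entities with two-dimensional low-dimensional feature vectors.
Dx = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_similarity(Dx)
```

Under this assumed form, the diagonal elements equal 1, the matrix is symmetric, and every element lies in the interval 0 < Kij ≤ 1, consistent with the range stated above.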

In S160, the processor 11 uses the group of low-dimensional feature vectors Dy to generate a similarity matrix L for the set of second entities. The similarity matrix L is a square matrix with N rows and N columns, where N is the number of low-dimensional feature vectors Dy, in other words the number of second entities. That is, the number of first entities and the number of second entities are the same.

Like the similarity matrix K, the similarity matrix L is defined such that the value Lij of the element in row i, column j represents the similarity between the i-th entity and the j-th entity in the set of second entities. That is, Lij = k(Dy[i], Dy[j]).

In the subsequent S170, the processor 11 uses the similarity matrix K and the similarity matrix L to search, according to the following formula, for the matrix Ω that maximizes the value Z(Ω), and takes it as the matrix Ω*.

[Mathematical formula 3]

Here, the matrix H is an N-row, N-column diagonal matrix whose element in row i, column j is 1-1/N when i = j and 0 when i ≠ j. T denotes transposition. trace(X) is the trace of the matrix X, that is, the sum of its diagonal elements. The similarity matrices K and L are symmetric matrices. The value Z(Ω) is maximized when an ideal Ω is found for which the matrix Ω^T L'Ω is the transpose of the matrix K'.
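The following sketch illustrates the kind of objective searched in S170. Because the formula is not reproduced in this text, the sketch assumes, based on the surrounding description, that K' = HKH, L' = HLH, and Z(Ω) = trace(K' Ω^T L' Ω); these forms, the function names, and the index orientation of Ω used here are assumptions. The matrix H follows the description in the text, with the centering matrix commonly used in the kernelized sorting literature offered as an alternative. The exhaustive search over permutations is for illustration with very small N only and is not the search procedure of the specification.

```python
import itertools
import numpy as np


def center(K, use_text_H=True):
    """Return H @ K @ H for the matrix H used in the objective."""
    n = K.shape[0]
    if use_text_H:
        H = (1.0 - 1.0 / n) * np.eye(n)      # H as described in the text
    else:
        H = np.eye(n) - np.ones((n, n)) / n  # common centering matrix (assumption)
    return H @ K @ H


def objective(K_c, L_c, omega):
    """Assumed objective Z(omega) = trace(K' omega^T L' omega)."""
    return np.trace(K_c @ omega.T @ L_c @ omega)


def brute_force_search(K_c, L_c):
    """Try every permutation matrix and keep the one with the largest Z."""
    n = K_c.shape[0]
    best, best_z = None, -np.inf
    for perm in itertools.permutations(range(n)):
        omega = np.zeros((n, n))
        omega[list(perm), np.arange(n)] = 1.0  # omega[perm[a], a] = 1
        z = objective(K_c, L_c, omega)
        if z > best_z:
            best, best_z = omega, z
    return best, best_z


# Toy data: L is K with its rows and columns reordered by a hidden permutation p.
rng = np.random.default_rng(1)
A = rng.random((4, 4))
K = (A + A.T) / 2
p = [2, 0, 3, 1]
L = np.zeros_like(K)
L[np.ix_(p, p)] = K  # L[p[a], p[b]] = K[a, b]

K_c, L_c = center(K), center(L)
best, best_z = brute_force_search(K_c, L_c)
```

In this toy case the maximizer recovers the hidden reordering, so that best.T @ L_c @ best reproduces K_c. For realistic N an exhaustive search is infeasible, and an iterative optimization would be needed instead.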

Searching for the matrix Ω* corresponds to associating each of the plurality of first entities with at least one of the plurality of second entities, based on the similarities among the first entities determined from the group of low-dimensional feature vectors Dx and the similarities among the second entities determined from the group of low-dimensional feature vectors Dy, such that the similarity relationships among the first entities fit the similarity relationships among the second entities.

In other words, searching for the matrix Ω* corresponds to searching for a mapping that maps the plurality of first entities in a first M-dimensional feature space to a second M-dimensional feature space such that the distribution of the first entities fits the distribution of the second entities. Here, the distribution of the first entities is their distribution, defined by the similarity between entities, in the first M-dimensional feature space determined from the group of low-dimensional feature vectors Dx. The distribution of the second entities is their distribution in the second M-dimensional feature space determined from the group of low-dimensional feature vectors Dy.

The left part of FIG. 4A conceptually shows the distribution of the first entities, and the left part of FIG. 4B conceptually shows the distribution of the second entities. In the example of FIGS. 4A and 4B, two-dimensional low-dimensional feature vectors Dx and Dy are defined purely to illustrate the technique. The points labeled E11, E12, E13, E14, E15, E16, and E17 indicate the positions of the respective first entities in the feature space. The points labeled E21, E22, E23, E24, E25, E26, and E27 indicate the positions of the respective second entities in the feature space.

As can be seen from FIG. 4B, in this example the component Dy1 of the low-dimensional feature vector Dy corresponds to the component Dx2 of the low-dimensional feature vector Dx, and the component Dy2 of the low-dimensional feature vector Dy corresponds to the component Dx1 of the low-dimensional feature vector Dx.

That is, in the example shown in FIG. 4A, the group of first entities and the group of second entities are defined in forms that differ between the similarity matrix K and the similarity matrix L only in the ordering of the entities and the ordering of the dimensions; in substance, they represent the similarity distribution of the same set of entities.

When the group of first entities and the group of second entities have common or mutually related group properties, for example because they come from the same population, reducing the dimensionality of the feature vectors x and y makes it possible to extract feature components that are essentially common to the entities, even if the first data set 15A and the second data set 15B serving as information sources have no common variables.

However, even after this dimensionality reduction, the low-dimensional feature vectors Dx and Dy merely have the same feature components; the order of those components is not aligned. Nor is the order of the entities aligned between the first data set 15A and the second data set 15B.

Searching for the matrix Ω* corresponds to searching for the correspondence between the unaligned feature vectors Dx and Dy, with respect to both the ordering of the entities and the ordering of the dimensions, using the sameness of the similarity distributions as a clue.

In the subsequent S180, the processor 11 associates each of the first entities with at least one of the second entities according to the matrix Ω*. The value of the element in row i, column j of the matrix Ω* indicates the degree or likelihood to which, according to the distribution of similarities, the i-th entity in the set of first entities and the j-th entity in the set of second entities correspond to each other.

Ideally, each element of the matrix Ω* takes the value 0 or 1, the element values in each row sum to 1, and the element values in each column sum to 1. When the matrix Ω* is such an ideal matrix, the first entity corresponding to the row index of an element with the value 1 and the second entity corresponding to the column index of that element correspond to each other.

That is, when the value of the element in row i, column j of the matrix Ω* is 1, the i-th entity in the set of first entities and the j-th entity in the set of second entities correspond to each other.

In numerical computation, however, the matrix Ω* rarely turns out to be such an ideal matrix. In S180, therefore, each of the plurality of first entities is associated with at least one of the second entities by either of the following methods.

(Method 1) In row i of the matrix Ω*, the element with the largest value is found. If that element is in column c, the i-th entity in the set of first entities is associated with the c-th entity in the set of second entities. This operation is performed for every row.

With this method, one of the second entities may be associated with a plurality of first entities. To reduce this possibility, a nearest-neighbor search may also be performed. The contextual dissimilarity measure is a known example of such a search.

(Method 2) To establish a strict one-to-one correspondence, the matrix Ω* is used as the input to an optimal assignment problem, and solving that problem associates each of the plurality of first entities with one distinct second entity.
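The two methods above can be sketched as follows for a small, non-ideal matrix. The function names and the example matrix are hypothetical. For Method 2, the sketch assumes that the assignment problem maximizes the sum of the selected elements of Ω*, and it solves the problem by exhaustive search, which is practical only for very small N; a polynomial-time solver such as the Hungarian algorithm would normally be used instead.

```python
import itertools
import numpy as np


def match_row_argmax(omega):
    """Method 1: for each row i, pick the column holding the largest value."""
    return np.argmax(omega, axis=1).tolist()


def match_one_to_one(omega):
    """Method 2 (brute force): the permutation of columns with the largest total value."""
    n = omega.shape[0]
    best = max(
        itertools.permutations(range(n)),
        key=lambda cols: sum(omega[i, c] for i, c in zip(range(n), cols)),
    )
    return list(best)


# A non-ideal matrix in which rows 0 and 1 both prefer column 0.
omega = np.array([
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
])

method1 = match_row_argmax(omega)  # second entity 0 is chosen twice
method2 = match_one_to_one(omega)  # every second entity is chosen exactly once
```

Here Method 1 assigns the same second entity to two first entities, whereas Method 2 resolves the conflict by giving up the slightly better match for the first row.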

In S180, the processor 11 can further output the correspondence table shown in FIG. 5 as a table describing the correspondence between the first entities and the second entities. That is, the processor 11 can output a correspondence table that lists, in association with the ID of each first entity, the ID of the corresponding second entity, and can store the correspondence table in the storage 15.

The processor 11 further executes data fusion processing (S190). In the data fusion processing, the processor 11 combines the first data set 15A and the second data set 15B based on the result of the association described above, or on the correspondence table, and generates the extended data set 15C.

The extended data set 15C contains a plurality of items of extended data. As shown in FIG. 6, each item of extended data is combined data generated by combining one item of first characteristic data and one item of second characteristic data that correspond to each other.

That is, according to the correspondence table, the processor 11 combines each of the plurality of items of first characteristic data contained in the first data set 15A with one of the plurality of items of second characteristic data contained in the second data set 15B, thereby generating the extended data set 15C.

When the i-th entity in the set of first entities has been associated with the j-th entity in the set of second entities, the processor 11, according to the correspondence table, combines the first characteristic data describing the characteristic of the i-th entity in the set of first entities with the second characteristic data describing the characteristic of the j-th entity in the set of second entities, thereby generating the extended data of the i-th entity.
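The combination step of S190 can be sketched as a simple join driven by the correspondence table. The IDs, field names, and dictionary layout below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical first characteristic data (purchases) keyed by first-entity ID.
first_data = {
    "A001": {"P1": 1, "P2": 0},
    "A002": {"P1": 0, "P2": 1},
}
# Hypothetical second characteristic data (site visits) keyed by second-entity ID.
second_data = {
    "B001": {"S1": 0, "S2": 1},
    "B002": {"S1": 1, "S2": 1},
}
# Correspondence table: first-entity ID -> ID of the associated second entity.
correspondence = {"A001": "B002", "A002": "B001"}


def fuse(first, second, table):
    """Return extended data: each first record joined with its matched second record."""
    extended = {}
    for first_id, second_id in table.items():
        extended[first_id] = {**first[first_id], **second[second_id]}
    return extended


extended_data = fuse(first_data, second_data, correspondence)
```

Each entry of `extended_data` then carries both the purchase flags of a first entity and the browsing flags of the second entity associated with it.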

The extended data set 15C generated in this way is stored in the storage 15. The extended data set 15C stored in the storage 15 is transmitted to another system through the communication interface 19, for example in accordance with an instruction input by the user through the user interface 17.

The other system may be, for example, an advertisement delivery system. Based on the extended data set 15C, the advertisement delivery system can identify the entities to be targeted for advertisement delivery and deliver advertisements to those entities.

When the data fusion processing in S190 ends, the processor 11 ends the analysis process shown in FIG. 2.

As described above, the information processing system 1 of this embodiment can appropriately associate the first entities with the second entities based on the distribution of similarities, even if the first data set 15A and the second data set 15B have no common variables.

For the association to be appropriate, it is preferable that the similarity distributions of the set of first entities and the set of second entities match, resemble, or correlate with each other.

When the set of first entities and the set of second entities are subsets of the same population, this preferred condition is largely satisfied. The technique of this embodiment therefore works meaningfully when the first entities and the second entities are people, that is, when data sets representing characteristics of people are processed as the first data set 15A and the second data set 15B.

Human behavior in particular tends to follow patterns that correspond to demographic attributes. Therefore, when the first data set 15A and the second data set 15B are generated from data collected from groups whose distributions of demographic attributes are presumed to be similar, the entities can be appropriately associated with each other.

For example, even if the first data set 15A and the second data set 15B have no common variables and describe characteristics of people belonging to different groups, or describe characteristics of different behaviors, the entities can be appropriately associated with each other. A data set useful for analyzing human psychology and behavior can therefore be generated as the extended data set 15C.

In the example described above, the first data set 15A describes characteristics of the purchasing behavior of each of a plurality of people belonging to a first group, and the second data set 15B describes characteristics of the website-visiting behavior and/or web-content-browsing behavior of each of a plurality of people belonging to a second group.

In other examples, a data set describing characteristics of people's media contact behavior, such as television viewing, may be used as one of the first data set 15A and the second data set 15B. A data set describing characteristics of the usage of portable terminals such as smartphones may also be used as one of the first data set 15A and the second data set 15B.

A data set describing characteristics of people's movement in offline space (that is, real space) may also be used as one of the first data set 15A and the second data set 15B. As characteristics of people's movement in offline space, the data set may describe, for example, characteristics of visits to a plurality of places, travel routes, and/or means of travel.

A data set describing characteristics of people's movement in online space may also be used as one of the first data set 15A and the second data set 15B. As characteristics of people's movement in online space, the data set may describe characteristics of movement in a virtual reality (VR) space or of net surfing. A data set generated from data collected through a questionnaire survey may also be used as one of the first data set 15A and the second data set 15B.

As the combination of the first data set 15A and the second data set 15B, a data set collected through a questionnaire survey may be combined with a data set on television viewing behavior, or a data set on movement history may be combined with a data set on purchases.

In the above embodiment, processing such as ZCA whitening, normalization, and standardization may also be applied to the groups of low-dimensional feature vectors Dx and Dy.

In the above embodiment, the number of dimensions M of the low-dimensional feature vectors Dx and Dy is specified by the designer or the user. However, the information processing system 1 may instead be configured to search for the most suitable number of dimensions M. For example, the information processing system 1 may be configured to repeat the analysis process shown in FIG. 2 on the same data sets 15A and 15B while changing the number of dimensions M, and to automatically select the most suitable number of dimensions M using the maximum value of Z(Ω) as the index.

[Second Embodiment]

The information processing system 1 of the second embodiment is configured such that the processor 11 executes the analysis process shown in FIG. 7 instead of the analysis process shown in FIG. 2. The second embodiment is described below by selectively explaining the details of the analysis process executed by the processor 11. Any part of the configuration of the information processing system 1 not mentioned in this embodiment may be understood to be the same as in the first embodiment.

On starting the analysis process shown in FIG. 7, the processor 11 acquires the first data set 15A and the second data set 15B that are the targets of data fusion, as in the first embodiment (S310, S320).

As in S110, the processor 11 generates the feature vector x of each first entity based on the first data set 15A (S310). As in S120, the processor 11 generates the feature vector y of each second entity based on the second data set 15B (S320).

As in S130 and S140, the processor 11 further performs dimensionality reduction to generate a group of low-dimensional feature vectors Dx corresponding to the group of feature vectors x and a group of low-dimensional feature vectors Dy corresponding to the group of feature vectors y (S330).

In the subsequent S340, the processor 11 executes the same processing as in S150, S160, and S170. That is, the processor 11 uses the group of low-dimensional feature vectors Dx to generate the similarity matrix K for the set of first entities, and uses the group of low-dimensional feature vectors Dy to generate the similarity matrix L for the set of second entities.

Further, the processor 11 uses the similarity matrix K and the similarity matrix L to search for the matrix Ω that maximizes the value Z(Ω) described in the first embodiment, and takes it as the matrix Ω* (S340). The matrix Ω* found in this way is hereinafter referred to as the correspondence matrix Ω*.

The processor 11 then determines whether a loop end condition is satisfied (S350). When it determines that the loop end condition is not satisfied (No in S350), the processor 11 executes the processing of S360.

In S360, with the correspondence matrix Ω* found in S340 held fixed, the processor 11 searches for a dimensionality reduction method that minimizes the cost of the Gromov-Wasserstein distance.

Holding the correspondence matrix Ω* fixed corresponds to holding fixed the correspondence between the first entities and the second entities. As described above, searching for the matrix Ω that maximizes the value Z(Ω) as the correspondence matrix Ω* corresponds to searching for a mapping that maps the plurality of first entities in the first feature space to the second feature space such that the plurality of first entities in the first feature space fit the distribution of the second entities in the second feature space.

The cost of the Gromov-Wasserstein distance corresponds to the transport cost in the optimal transport problem between the first entities and the second entities when the set of first entities is mapped to the second feature space.

The cost of the Gromov-Wasserstein distance can be calculated using the similarity matrices K and L and the correspondence matrix Ω*. As described above, the similarity matrix K is a matrix whose elements are the similarities between first entities calculated from the dimensionality-reduced low-dimensional feature vectors Dx. The similarity matrix L is a matrix whose elements are the similarities between second entities calculated from the dimensionality-reduced low-dimensional feature vectors Dy.
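A cost of this kind can be sketched as follows. The specification does not give the formula in this passage, so the sketch assumes the widely used squared-difference form of the Gromov-Wasserstein cost, namely the sum over i, j, k, l of (K[i, k] - L[j, l])^2 Ω[i, j] Ω[k, l], with rows of Ω indexing first entities and columns indexing second entities. The function name and the toy data are hypothetical, and the four-index array makes the sketch suitable for small N only.

```python
import numpy as np


def gw_cost(K, L, omega):
    """Assumed cost: sum_{i,j,k,l} (K[i,k] - L[j,l])**2 * omega[i,j] * omega[k,l]."""
    # diff[i, j, k, l] = K[i, k] - L[j, l]
    diff = K[:, None, :, None] - L[None, :, None, :]
    return float(np.einsum("ijkl,ij,kl->", diff ** 2, omega, omega))


# Toy data: L is K reordered by a hidden permutation p.
rng = np.random.default_rng(2)
A = rng.random((4, 4))
K = (A + A.T) / 2
p = [2, 0, 3, 1]
L = np.zeros_like(K)
L[np.ix_(p, p)] = K  # L[p[a], p[b]] = K[a, b]

omega_true = np.zeros((4, 4))
omega_true[np.arange(4), p] = 1.0  # first entity a <-> second entity p[a]

cost_true = gw_cost(K, L, omega_true)    # correct correspondence
cost_wrong = gw_cost(K, L, np.eye(4))    # incorrect correspondence
```

With the correct correspondence the cost vanishes, while a wrong correspondence yields a positive cost, which is the behavior exploited when the dimensionality reduction is tuned with Ω* held fixed.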

Searching for the dimensionality reduction method that minimizes the Gromov-Wasserstein distance corresponds to searching for a dimensionality reduction method that generates low-dimensional feature vectors Dx and Dy under which the correspondence between the first entities and the second entities indicated by the correspondence matrix Ω^* is most plausible.

Minimizing the cost corresponds to searching, based on the correspondence matrix Ω^*, for a dimensionality reduction method that shortens the distance in the feature space between first and second entities that correspond to each other; in other words, one that shortens the distance in the feature space between the low-dimensional feature vector Dx of a first entity and the low-dimensional feature vector Dy of the corresponding second entity.

For example, to transform an M1-dimensional feature vector x into an M-dimensional low-dimensional feature vector Dx, a transformation matrix Tx with M rows and M1 columns is applied to the feature vector x. To transform an M2-dimensional feature vector y into an M-dimensional low-dimensional feature vector Dy, a transformation matrix Ty with M rows and M2 columns is applied to the feature vector y. In this case, the number of parameters m constituting the transformation matrices Tx and Ty is (M*M1 + M*M2).

The search for the dimensionality reduction method is realized by searching, for example by a gradient method, for the parameters m of the transformation matrices Tx and Ty that minimize the above cost.
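The gradient search over Tx and Ty can be sketched as follows. As a simplification, the sketch does not minimize the Gromov-Wasserstein cost itself; it minimizes a surrogate suggested by the preceding paragraphs, namely the Ω-weighted squared distance between Tx·x and Ty·y for corresponding entities. The surrogate, the step size, and all function names are assumptions. Note that this surrogate alone is trivially minimized by all-zero matrices, so a practical implementation would need the actual cost or an additional constraint.

```python
def matvec(T, v):
    """Apply an M x D matrix T to a D-dimensional vector v."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


def parameter_count(Tx, Ty):
    """Number of parameters m: M*M1 + M*M2."""
    return len(Tx) * len(Tx[0]) + len(Ty) * len(Ty[0])


def matching_cost(Tx, Ty, xs, ys, omega):
    """Surrogate cost: sum of omega[i][j] * ||Tx x_i - Ty y_j||^2."""
    cost = 0.0
    for i, x in enumerate(xs):
        dx = matvec(Tx, x)
        for j, y in enumerate(ys):
            if omega[i][j]:
                dy = matvec(Ty, y)
                cost += omega[i][j] * sum((a - b) ** 2 for a, b in zip(dx, dy))
    return cost


def gradient_step(Tx, Ty, xs, ys, omega, lr):
    """One gradient-descent update of Tx and Ty with omega held fixed."""
    M = len(Tx)
    gx = [[0.0] * len(Tx[0]) for _ in range(M)]
    gy = [[0.0] * len(Ty[0]) for _ in range(M)]
    for i, x in enumerate(xs):
        dx = matvec(Tx, x)
        for j, y in enumerate(ys):
            w = omega[i][j]
            if not w:
                continue
            dy = matvec(Ty, y)
            for r in range(M):
                resid = 2.0 * w * (dx[r] - dy[r])
                for c, xc in enumerate(x):
                    gx[r][c] += resid * xc   # d cost / d Tx[r][c]
                for c, yc in enumerate(y):
                    gy[r][c] -= resid * yc   # d cost / d Ty[r][c]
    new_Tx = [[t - lr * g for t, g in zip(tr, gr)] for tr, gr in zip(Tx, gx)]
    new_Ty = [[t - lr * g for t, g in zip(tr, gr)] for tr, gr in zip(Ty, gy)]
    return new_Tx, new_Ty


# M = 2, M1 = 3, M2 = 2; entity i on each side is matched with entity i.
xs = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
ys = [[1.0, 1.0], [2.0, 0.0]]
omega = [[1.0, 0.0], [0.0, 1.0]]
Tx = [[0.5, 0.1, 0.0], [0.0, 0.3, 0.2]]
Ty = [[0.2, 0.4], [0.1, 0.0]]

print(parameter_count(Tx, Ty))
print(matching_cost(Tx, Ty, xs, ys, omega))
for _ in range(50):
    Tx, Ty = gradient_step(Tx, Ty, xs, ys, omega, lr=0.01)
print(matching_cost(Tx, Ty, xs, ys, omega))
```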

Thereafter, the processor 11 reduces the dimensionality of the feature vectors x and y using the dimensionality reduction method found (for example, the transformation matrices Tx and Ty), and calculates new low-dimensional feature vectors Dx and Dy (S370).

Using the similarity matrix K based on the new low-dimensional feature vectors Dx and the similarity matrix L based on the new low-dimensional feature vectors Dy, the processor 11 searches for the matrix Ω that maximizes the value Z(Ω) as a new correspondence matrix Ω^* (S340).

By repeatedly executing S360, S370, and S340 in this way, the processor 11 searches again for a correspondence matrix Ω^* with high matching accuracy while also searching for a better dimensionality reduction method.

When the loop end condition is satisfied (Yes in S350), the processor 11 executes S380. The loop end condition is satisfied, for example, when S340 has been executed a predetermined number of times, or when the amount of change in the correspondence matrix Ω^* found by the repeated search is less than a predetermined amount.
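The control flow of this loop can be sketched as below. The two callables `search_omega` (standing in for S340) and `update_embeddings` (standing in for S360 and S370) are hypothetical placeholders supplied by the caller, and measuring the change in Ω^* by its largest element-wise difference is an assumption; the description only requires that the change fall below a predetermined amount.

```python
def max_abs_change(a, b):
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def alternating_search(dx, dy, search_omega, update_embeddings,
                       max_iters=20, tol=1e-6):
    """Alternate S340 with S360/S370 until the S350 end condition holds.

    search_omega(dx, dy) -> omega            (hypothetical stand-in for S340)
    update_embeddings(dx, dy, omega) -> (dx, dy)  (stand-in for S360 + S370)
    Returns the last omega and the number of times S340 was executed.
    """
    omega = search_omega(dx, dy)                      # first S340
    executed = 1
    while executed < max_iters:                       # S350: execution count
        dx, dy = update_embeddings(dx, dy, omega)     # S360, S370
        new_omega = search_omega(dx, dy)              # S340 again
        executed += 1
        change = max_abs_change(new_omega, omega)
        omega = new_omega
        if change < tol:                              # S350: small change
            break
    return omega, executed


# Toy stand-ins: the correspondence never changes, so the loop stops early.
fixed = [[1.0, 0.0], [0.0, 1.0]]
omega, executed = alternating_search(
    [[0.0]], [[0.0]],
    search_omega=lambda dx, dy: fixed,
    update_embeddings=lambda dx, dy, om: (dx, dy),
)
print(executed)
```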

In S380, as in S180 of the first embodiment, the processor 11 associates each of the first entities with at least one of the second entities based on the correspondence matrix Ω^* calculated at the end of the loop. The processor 11 can further store and output a correspondence table describing the correspondence between the first entities and the second entities.

Thereafter, as in S190, the processor 11 executes data fusion processing to combine the first data set 15A and the second data set 15B, generates an extended data set 15C, and stores the generated extended data set 15C in the storage 15 (S390).

The information processing system 1 of the second embodiment described above can establish a more accurate correspondence between the first entities and the second entities through the loop processing described above. An extended data set 15C with better accuracy can therefore be generated.

[Third Embodiment] The information processing system 1 of the third embodiment is configured such that the processor 11 executes the evaluation process shown in FIG. 8 in response to an execution instruction issued by the user through the user interface 17. The details of the evaluation process executed by the processor 11 are described below as the description of the third embodiment. The configuration of the information processing system 1 not mentioned in this embodiment may be understood to be the same as in the first or second embodiment.

The evaluation process is executed to evaluate whether the data set under evaluation is a good data set, that is, one for which the association and data fusion in the analysis process shown in FIG. 2 or FIG. 7 can be performed with high accuracy. The data set under evaluation corresponds to a data set that can be used as the first data set 15A or the second data set 15B in the analysis process.

When the evaluation process starts, the processor 11 acquires the execution instruction from the user and the designated data set under evaluation (S410). The processor 11 can acquire the designated data set from the storage 15.

Thereafter, the processor 11 generates a first feature vector x_1 and a second feature vector x_2 for each entity based on the data set under evaluation (S420). The data set under evaluation can include, for each entity, feature data that represents the features of the corresponding entity with (Q1 + Q2) elements.

The processor 11 can divide the (Q1 + Q2) elements into a first element group containing Q1 elements and a second element group containing Q2 elements. Each of the (Q1 + Q2) elements may be assigned at random to either the first element group or the second element group.

Based on the data set under evaluation, the processor 11 can generate, for each entity, a first feature vector x_1 describing the features of the corresponding entity for the first element group and a second feature vector x_2 describing the features of the corresponding entity for the second element group.

For example, when the data set under evaluation contains, for each entity, feature data with Q = (Q1 + Q2) elements from which a feature vector v = (v[1], v[2], v[3], ..., v[Q]) could be generated in S110, S120, S310, or S320, the processor 11 can generate a first feature vector x_1 = (v[1], v[2], ..., v[Q1]) containing Q1 elements and a second feature vector x_2 = (v[Q1+1], v[Q1+2], ..., v[Q1+Q2]) containing Q2 elements.
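The split in S420 can be sketched as follows. The function name `split_features` and the use of a fixed random seed are illustrative assumptions; the seed reflects the assumption that the same partition of elements is applied to every entity so that x_1 and x_2 are comparable across entities.

```python
import random


def split_features(v, q1, shuffle=False, seed=0):
    """Split a feature vector v of length Q1 + Q2 into x_1 and x_2.

    With shuffle=False the first Q1 elements form x_1, as in the example
    x_1 = (v[1], ..., v[Q1]). With shuffle=True the elements are assigned
    at random; reusing the same seed reproduces the same partition.
    """
    indices = list(range(len(v)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    first, second = indices[:q1], indices[q1:]
    return [v[i] for i in first], [v[i] for i in second]


v = [10, 20, 30, 40, 50]          # Q = 5, with Q1 = 2 and Q2 = 3
x_1, x_2 = split_features(v, q1=2)
print(x_1, x_2)                   # [10, 20] [30, 40, 50]

r_1, r_2 = split_features(v, q1=2, shuffle=True, seed=42)
print(len(r_1), len(r_2))         # 2 3
```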

The first feature vector x_1 corresponds to the feature vector x of each entity in the set of first entities, and the second feature vector x_2 corresponds to the feature vector y of each entity in the set of second entities, which is the same set as the set of first entities.

Thereafter, in S430 and S440, the processor 11 performs on the first feature vectors x_1 and the second feature vectors x_2 the same processing as in S130 to S170.

In S430, as in S130 and S140, the processor 11 performs dimensionality reduction on the first feature vector x_1 of each first entity and the second feature vector x_2 of each second entity, and generates low-dimensional feature vectors Dx_1 and Dx_2 with the same number of dimensions.

Based on the low-dimensional feature vector Dx_1 of each first entity, the processor 11 generates a similarity matrix that corresponds to the similarity matrix K and represents the similarities of the low-dimensional feature vectors Dx_1 between first entities. Based on the low-dimensional feature vector Dx_2 of each second entity, the processor 11 further generates a similarity matrix that corresponds to the similarity matrix L and represents the similarities of the low-dimensional feature vectors Dx_2 between second entities.

Based on these similarity matrices, the processor 11 searches for the matrix Ω that maximizes the value Z(Ω) as the correspondence matrix Ω^* (S440).

Thereafter, for the set of first entities corresponding to the group of low-dimensional feature vectors Dx_1 and the set of second entities corresponding to the group of low-dimensional feature vectors Dx_2, the processor 11 calculates, as a score, the degree to which the correspondence matrix Ω^* correctly represents the correspondence between the first entities and the second entities (S450).

In this way, the processor 11 evaluates whether the data set under evaluation is a good data set for which the association and data fusion realized by the analysis process can be performed with high accuracy (S450).

When generating the feature vector x_1 of each first entity and the feature vector x_2 of each second entity in S420, the processor 11 can store in advance the correct correspondence between the first entities and the second entities.

With the correct correspondence stored in this way, the processor 11 executes in S430 and S440 the same processing as the analysis process to calculate the correspondence matrix Ω^*, and compares the correspondence specified by the correspondence matrix Ω^* with the correct answer.

For example, based on the correspondence matrix Ω^*, the processor 11 associates each of the first entities with one of the second entities in the same manner as in S180 and S380.

When a first entity and a second entity associated with each other based on the correspondence matrix Ω^* are the same entity in the data set under evaluation, the processor 11 determines that the association succeeded; when they are not the same entity, the processor 11 determines that the association failed.

The processor 11 can calculate, as the score of the data set under evaluation, the proportion of all entities for which the association succeeded (S450). The processor 11 then outputs the calculated score as the evaluation result (S460) and ends the evaluation process.
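The score in S450 can be sketched as a simple success rate. Two assumptions are made here for illustration: row i and column i of the correspondence matrix refer to the same underlying entity (because x_1 and x_2 were split from the same record), and each first entity is associated with the second entity holding the largest value in its row. The actual association rule of S180 and S380 is defined elsewhere in the description and may differ.

```python
def matching_score(omega):
    """Fraction of first entities matched back to themselves.

    omega[i][j] is the strength of the correspondence between first
    entity i and second entity j. Row-wise argmax is an assumed stand-in
    for the association rule of S180/S380.
    """
    hits = 0
    for i, row in enumerate(omega):
        best_j = max(range(len(row)), key=row.__getitem__)
        if best_j == i:          # same entity on both sides: success
            hits += 1
    return hits / len(omega)


omega = [
    [0.7, 0.2, 0.1],   # entity 0 matched to 0: success
    [0.1, 0.3, 0.6],   # entity 1 matched to 2: failure
    [0.2, 0.1, 0.7],   # entity 2 matched to 2: success
]
print(matching_score(omega))   # two of three associations succeed
```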

When association and data fusion cannot be performed with high accuracy on a single data set, it can be inferred that the data set lacks the information or data structure needed to achieve high-accuracy association and data fusion with respect to the features of the set.

This lack of information also affects accuracy when the analysis process is executed on two different data sets for association and data fusion. The evaluation process described above therefore makes it possible to estimate in advance whether the data set under evaluation is one on which data fusion without common variables can be performed with high accuracy.

In S460, by outputting the score, the processor 11 can inform the user of the information processing system 1 whether the data set under evaluation is a good data set. The user can thereby adopt an appropriate combination of the first data set 15A and the second data set 15B in the analysis process and obtain a highly reliable extended data set 15C.

There are situations in which, to obtain the desired extended data set 15C, it suffices to adopt any one of a plurality of mutually similar data sets as the first data set 15A to be combined with the second data set 15B.

For example, consider the case where a first data set 15A on purchasing behavior and a second data set 15B on website visiting behavior and web content browsing behavior are combined to generate the extended data set 15C. In this case, it may suffice to generate the extended data set 15C using, as the first data set 15A, a data set on the purchasing behavior of the customers of any one of a plurality of distribution organizations.

Examples of the plurality of distribution organizations include a plurality of convenience store chains. The data set on purchases at each convenience store chain may contain, as consumer purchasing behavior, information on purchasing behavior of the same kind as at the other convenience store chains.

It may therefore suffice to generate the extended data set 15C using, as the first data set 15A, a data set on customer purchasing behavior at any one of the plurality of convenience store chains.

When a plurality of data sets exist as candidates for the first data set 15A (or the second data set 15B), the evaluation process described above can be used to select from among them the data set best suited, in terms of the accuracy of association and data fusion, to serve as the first data set 15A (or the second data set 15B).

For example, in any of S110, S120, S310, and S320, the processor 11 may execute the selection process shown in FIG. 9 as needed, thereby adopting one of a plurality of candidates as the data set to be fused. The data set to be fused in S110 and S310 corresponds to the first data set 15A, and the data set to be fused in S120 and S320 corresponds to the second data set 15B.

When the selection process shown in FIG. 9 starts, the processor 11 acquires a plurality of data sets as the candidates for the data set to be fused (S510). The processor 11 can acquire the plurality of data sets designated by the user from the storage 15.

Thereafter, the processor 11 sets one of the plurality of data sets as the data set under evaluation (S520) and executes the evaluation process shown in FIG. 8 (S530). The processor 11 repeats setting a data set as the data set under evaluation (S520) and executing the evaluation process (S530) for each data set until the evaluation process has been executed for all of the data sets (Yes in S540). The score calculated in S450 is thereby obtained for each data set.

When the evaluation process has been executed and a score obtained for all of the data sets (Yes in S540), the processor 11 adopts the data set with the highest score as the data set to be fused (S550) and then ends the selection process. In S110, S120, S310, and S320, the processor 11 can generate the feature vectors (x or y) based on the adopted data set.
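The selection process of S510 to S550 reduces to scoring every candidate and keeping the best one, as in the sketch below. The `evaluate` callable is a hypothetical stand-in for the evaluation process of FIG. 8, and the candidate names and score values are made up for illustration.

```python
def select_dataset(candidates, evaluate):
    """Score every candidate data set and adopt the highest-scoring one.

    candidates: mapping from a candidate name to its data set.
    evaluate: callable returning the S450 score for one data set
              (hypothetical stand-in for the evaluation process).
    """
    scores = {}
    for name, data in candidates.items():   # S520 to S540: one pass each
        scores[name] = evaluate(data)       # S530
    best = max(scores, key=scores.get)      # S550: highest score wins
    return best, scores


# Illustrative candidates; the stored numbers stand in for real data sets.
candidates = {
    "purchase_counts": {"precomputed_score": 0.62},
    "purchase_amounts": {"precomputed_score": 0.71},
}
best, scores = select_dataset(candidates, lambda d: d["precomputed_score"])
print(best)   # purchase_amounts
```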

Executing the selection process described above and selecting the most suitable data set from the plurality of candidates makes it possible to generate an extended data set 15C with higher accuracy.

As a supplementary note, in the purchasing behavior example, the plurality of candidates for the data set to be fused may include a plurality of data sets that represent consumers' purchasing behavior with different parameters. For example, the first candidate may be a data set from which a feature vector whose elements are the number of units purchased of each product can be generated for each consumer as an entity. The second candidate may be a data set from which a feature vector whose elements are the amount spent on each product can be generated for each consumer as an entity.

Preparing a plurality of data sets that describe the same kind of features with different parameters in this way, and selecting the one suited to data fusion, helps generate a better extended data set 15C.

[Fourth Embodiment] The delivery system 30 of the fourth embodiment shown in FIG. 10 is a system that uses the data fusion technology of the first or second embodiment to combine an external data set 35A, which is a data set provided from outside the delivery system 30, with an internal data set 35B, which is a data set held inside the delivery system 30, and delivers advertisements based on the resulting extended data set 35C.

As shown in FIG. 10, the delivery system 30 includes a processor 31, a memory 33, a storage 35, and a communication interface 39. The processor 31 executes processing in accordance with a computer program Pr1 stored in the storage 35. The storage 35 further contains the internal data set 35B.

As shown in FIG. 11, the internal data set 35B contains, for each user, feature data that is associated with the advertising ID of the corresponding user and describes the features of that user's online behavior. As is well known, an advertising ID is an identifier used for advertising and is an ID unique to an information terminal.

The feature data associated with an advertising ID describes the features of the user's online behavior observed through the information terminal to which that advertising ID is assigned. Online behavior includes website visiting behavior and web content browsing behavior.

The delivery system 30 is connected to a local area network through the communication interface 39 and provides an advertisement delivery service via the local area network. The user-company system 40, which is the system of a company that uses the advertisement delivery service, provides the delivery system 30 with the advertising content to be delivered and delivery designation information. Advertising content is information content used for advertising. The delivery designation information includes target designation information designating the delivery target and delivery quantity designation information designating the delivery quantity.

The user-company system 40 further provides the delivery system 30, as the external data set 35A, with a customer data set describing the features of the customers who are candidates for delivery destinations.

The customer data set may be, for example, a data set describing features of the purchasing behavior of customers who use stores operated by the user company. For example, the customer data set may include, as the feature data of each customer, feature data describing the quantity of each of a plurality of products purchased by that customer.

When a delivery request is input from the user-company system 40 to the processor 31 through the communication interface 39, the processor 31 executes the delivery control process shown in FIG. 12 in accordance with the computer program Pr1.

When the delivery control process starts, the processor 31 acquires from the user-company system 40 the advertising content to be delivered, the delivery designation information including the target designation information and the delivery quantity designation information, and the customer data set serving as the external data set 35A (S610).

Thereafter, the processor 31 uses the external data set 35A as the first data set 15A and the internal data set 35B as the second data set 15B, and executes the same processing as S110 to S190 of the analysis process. The processor 31 thereby combines the external data set 35A and the internal data set 35B to generate the extended data set 35C (S620).

By combining the external data set 35A and the internal data set 35B, the feature data of each customer in the external data set 35A is associated with the advertising ID of the user in the internal data set 35B who is highly likely to be the same person as that customer.

The extended data set 35C contains, for each entity, extended data generated by combining the feature data of the corresponding customer in the external data set 35A with the feature data of the corresponding user in the internal data set 35B. Each piece of extended data is associated with the advertising ID of the corresponding user in the internal data set 35B.

An entity here means a pair of a customer and a user that have been associated with each other by the data fusion. In the data fusion, customers and users are associated one to one. For example, the extended data set 35C may be a data set structured like the extended data set 15C shown in FIG. 6, with the advertising ID of each entity recorded in the column containing "ID2_1", "ID2_2", and "ID2_3" in the figure.
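As a rough picture of how the extended data could be assembled once the one-to-one association is known, consider the sketch below. The record layout, the field names, and the function name `build_extended` are illustrative assumptions; the actual structure is the one shown in FIG. 6.

```python
def build_extended(external, internal, pairs):
    """Combine customer and user feature data for each associated pair.

    external: customer ID -> purchasing feature data (external data set).
    internal: advertising ID -> online feature data (internal data set).
    pairs: (customer ID, advertising ID) tuples from the association step.
    Each resulting record is keyed by the user's advertising ID.
    """
    extended = []
    for customer_id, ad_id in pairs:
        extended.append({
            "ad_id": ad_id,
            "purchase": external[customer_id],
            "online": internal[ad_id],
        })
    return extended


external = {"c1": [3, 0, 1], "c2": [0, 2, 2]}          # hypothetical customers
internal = {"ad_A": [0.4, 0.6], "ad_B": [0.9, 0.1]}    # hypothetical users
pairs = [("c1", "ad_B"), ("c2", "ad_A")]               # one-to-one association

for record in build_extended(external, internal, pairs):
    print(record)
```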

Thereafter, the processor 31 calculates, for each entity in the extended data set 35C, a score for the likelihood that the entity belongs to the delivery target (S630). For example, when the external data set 35A is a data set on customers' purchasing behavior and the internal data set 35B is a data set on users' online behavior, the processor 31 inputs the feature data on purchasing behavior and the feature data on online behavior of each entity in the extended data set 35C into a predetermined function, and calculates a score that quantifies the likelihood that the corresponding entity belongs to the delivery target.

The delivery target is the group of consumers to whom delivery is directed, narrowed down by parameters that characterize consumers, such as gender, age, purchasing tendency, online behavior tendency, and interests and concerns. The delivery target is designated by the target designation information.

After calculating the scores in S630, the processor 31 determines as content delivery destinations, from the group of entities associated with advertising IDs, as many entities as the delivery quantity designated by the user-company system 40, in descending order of the calculated score (S640). In this way, the processor 31 selects, as delivery destinations of the advertising content, at least some of the plurality of users corresponding to the internal data set 35B who have been associated with any of the plurality of customers corresponding to the external data set 35A.
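The selection in S640 amounts to ranking entities by score and keeping the designated number, as sketched below. The scores and advertising IDs are made-up values, and the predetermined scoring function of S630 is not modeled here.

```python
def pick_destinations(scores, quantity):
    """Return the advertising IDs of the top-scoring entities.

    scores: advertising ID -> score from S630 (illustrative values).
    quantity: delivery quantity from the delivery designation information.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [ad_id for ad_id, _ in ranked[:quantity]]


scores = {"ad_A": 0.2, "ad_B": 0.9, "ad_C": 0.5}
print(pick_destinations(scores, quantity=2))   # ['ad_B', 'ad_C']
```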

Thereafter, the processor 31 transmits the advertising content provided by the user-company system 40 via the local area network to the information terminals of the determined content delivery destinations (S650). The advertising content is delivered to the information terminals identified by the advertising IDs of the content delivery destinations. The processor 31 then ends the delivery control process.

According to the delivery system 30 of the fourth embodiment described above, combining the external data set 35A and the internal data set 35B using data fusion technology that requires no common variables makes it possible to associate advertising IDs with the feature data of customers whose advertising IDs are unknown. Advertising content can therefore be delivered appropriately to customers in the external data set 35A whose advertising IDs are unknown.

[Fifth Embodiment] The delivery system 30 of the fifth embodiment is configured such that the processor 31 executes the delivery control process shown in FIG. 13 instead of the delivery control process shown in FIG. 12. The details of the delivery control process executed by the processor 31 are described selectively below as the description of the fifth embodiment. The configuration of the delivery system 30 not mentioned in this embodiment may be understood to be the same as in the fourth embodiment.

In this embodiment, when a delivery request is input from the user-company system 40 to the processor 31 through the communication interface 39, the processor 31 executes the delivery control process shown in FIG. 13.

When the delivery control process starts, the processor 31 acquires from the user-company system 40 the advertising content to be delivered, the delivery designation information, and the customer data set serving as the external data set 35A (S710).

The delivery designation information acquired in S710 does not include target designation information and includes only delivery quantity designation information. The customer data set acquired as the external data set 35A is a specific customer data set describing the features of the customer group that corresponds to the delivery target already narrowed down by the user company.

之後，與在S620中的處理同樣地，處理器31對外部資料集35A和內部資料集35B進行結合，並生成擴展資料集35C (S720)。擴展資料集35C包含如下之擴展資料，該擴展資料係針對每個實體，使對應之顧客的外部資料集35A所具有的特徵資料與對應之用戶的內部資料集35B所具有的特徵資料進行結合而生成的資料。Thereafter, similarly to the process in S620, the processor 31 combines the external data set 35A and the internal data set 35B and generates the extended data set 35C (S720). The extended data set 35C includes the following extended data. The extended data is for each entity by combining the characteristic data of the corresponding customer's external data set 35A with the characteristic data of the corresponding user's internal data set 35B. generated information.

在本實施形態的S720的處理中，不產生外部資料集35A的顧客與內部資料集35B的所有用戶建立對應之結果。本實施形態的擴展資料集35C也包含未與利用企業方的顧客建立對應之用戶的特徵資料來作為一個實體的擴展資料。該擴展資料係實質上未被擴展的內部資料集35B所具有的該用戶的特徵資料。In the process of S720 in this embodiment, the result that the customers of the external data set 35A are associated with all the users of the internal data set 35B is not generated. The extended data set 35C of this embodiment also includes the extended data of a user that is not associated with the customer of the using company as one entity. The extended data is the characteristic data of the user contained in the internal data set 35B that is not substantially expanded.

In this embodiment, among the entities corresponding to the extended data set 35C, the group of entities associated with the customer group of the external data set 35A is labeled as seeds, and the group of all remaining entities is labeled as non-seeds.
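The seed and non-seed labeling can be sketched as follows. This is only an illustration: the function name `label_seeds`, the string IDs, and the dictionary format of the customer-to-user correspondence are assumptions and do not appear in the specification.

```python
def label_seeds(user_ids, correspondence):
    """Split the users of the internal data set into seeds and non-seeds.

    user_ids: IDs of all users in the internal data set (hypothetical IDs).
    correspondence: dict mapping a customer ID to the user ID it was
        associated with in S720 (this format is an assumption).
    """
    matched = set(correspondence.values())
    # Seeds are the users associated with at least one customer.
    seeds = [u for u in user_ids if u in matched]
    # Every other user is a non-seed.
    non_seeds = [u for u in user_ids if u not in matched]
    return seeds, non_seeds


users = ["u1", "u2", "u3", "u4", "u5"]
corr = {"c1": "u2", "c2": "u5"}
seeds, non_seeds = label_seeds(users, corr)
```

Here `u2` and `u5` become seeds, and the remaining three users become non-seeds.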

After S720, the processor 31 calculates, based on the extended data set 35C, the similarity between each non-seed entity and each seed entity with respect to the features represented in the internal data set 35B (S730). The similarity can be calculated from the distance in the feature space between each non-seed entity and each seed entity.

After calculating the similarities, the processor 31 determines as delivery targets, in descending order of similarity, a number of entities corresponding to the delivery quantity designated in the delivery designation information (S740). At this time, all entities corresponding to seeds are also determined as delivery targets.

As described above, the processor 31 selects, as the delivery targets of the advertisement content, the set of seeds, that is, the set of users associated with the plurality of customers of the external data set 35A, together with the set of users, among the plurality of users of the internal data set 35B, whose characteristics are similar to those of the seeds.
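A minimal sketch of S730 and S740 follows. It rests on several assumptions that the specification does not state: Euclidean distance in the feature space, a similarity of 1/(1 + distance), scoring each non-seed by its nearest seed, and a designated quantity that counts only the non-seed entities added to the seeds. The names `euclidean` and `select_targets` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_targets(features, seeds, quantity):
    """Return all seeds plus the `quantity` most similar non-seeds.

    features: dict mapping user ID to its feature vector.
    seeds: list of seed user IDs.
    quantity: number of non-seed users to add (an assumed interpretation
        of the designated delivery quantity).
    """
    seed_set = set(seeds)
    scores = {}
    for uid, vec in features.items():
        if uid in seed_set:
            continue
        # Assumption: a non-seed is scored by its distance to the nearest seed.
        nearest = min(euclidean(vec, features[s]) for s in seeds)
        scores[uid] = 1.0 / (1.0 + nearest)
    # Highest similarity first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return list(seeds) + ranked[:quantity]


features = {
    "u1": [0.0, 0.0],
    "u2": [1.0, 0.0],
    "u3": [5.0, 5.0],
    "u4": [0.9, 0.1],
    "u5": [10.0, 10.0],
}
targets = select_targets(features, seeds=["u2"], quantity=2)
```

With the single seed `u2`, the two closest non-seeds are `u4` and then `u1`, so the delivery targets are `u2`, `u4`, and `u1`. Other ways of aggregating the per-seed similarities, such as averaging over all seeds, would fit the description equally well.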

Thereafter, as in S650, the processor 31 sends the advertisement content provided by the user-company system 40 over the local area network to the information terminals of the delivery targets determined in S740 (S750). The processor 31 then ends the delivery control process.

According to the delivery system 30 of this embodiment described above, advertisement content can be delivered, on the basis of the data set of the customer group provided by the user-company system 40, to the information terminals of a larger set of consumers who exhibit characteristics similar to those of the customer group. This embodiment therefore allows advertisements to be delivered efficiently to a large number of consumers.

[Sixth Embodiment] The delivery system 30 of the sixth embodiment is configured to provide a prediction service in addition to the advertisement delivery service, which it provides in the same way as the delivery system 30 of the fourth or fifth embodiment.

In this embodiment, the processor 31 executes the prediction process shown in Fig. 14 in response to an execution request from the user-company system 40. The description of the sixth embodiment below focuses on the details of the prediction process executed by the processor 31. Any configuration of the delivery system 30 not mentioned in this embodiment may be understood to be the same as in the fourth or fifth embodiment.

When the processor 31 starts the prediction process, it acquires the data set to be analyzed and analysis condition designation information from the user-company system 40 through the communication interface 39 (S810). The data set to be analyzed contains the characteristic data of each customer to be analyzed.

The analysis condition designation information may be information designating the target product for which customers' purchase likelihood is to be evaluated. In the prediction process, the likelihood that each customer under analysis will purchase the designated target product is predicted by calculating a predicted value of the purchase quantity of the target product. The prediction described here corresponds to estimating the customer's behavior, and the predicted value corresponds to an estimated value regarding that behavior.

After S810, the processor 31 uses the data set to be analyzed as the first data set 15A and the internal data set 35B as the second data set 15B, and calculates the correspondence matrix Ω^* by executing the same processing as S110 to S170 or S310 to S370 of the analysis process (S820). The correspondence matrix Ω^* represents the correspondence between each customer under analysis and each user having characteristic data in the internal data set 35B.

Further, based on the calculated correspondence matrix Ω^*, the processor 31 extracts, for each customer under analysis, a predetermined number of users who approximate that customer. It then calculates a predicted value of the quantity of the target product that the customer will purchase as a weighted average of the quantities of the target product purchased by the extracted users, which can be identified from the internal data set 35B (S830). In this way, the processor 31 estimates a customer's purchasing behavior from the purchasing behavior of the associated users. The internal data set 35B contains information from which the quantity of the target product purchased by each user can be identified.

Each element of the correspondence matrix Ω^* expresses the similarity between a customer and a user as a value from 0 to 1. Specifically, the element in the i-th row and j-th column of Ω^* expresses the similarity between the i-th user in the set of users corresponding to the internal data set 35B and the j-th customer in the set of customers of the data set to be analyzed.

The weighted average is calculated, for example, using the similarities as weights. Assuming that a first user, a second user, and a third user have been extracted as the three users who approximate a customer, the weighted average can be calculated as follows.

That is, let w1, w2, and w3 be the similarities between the customer and the first, second, and third users, respectively, and let p1, p2, and p3 be the quantities of the target product purchased by the first, second, and third users, respectively. The predicted value pe of the customer's purchase quantity of the target product can then be calculated as pe = (w1·p1 + w2·p2 + w3·p3)/3.

From the correspondence matrix Ω^*, the similarity between each customer and every user (in other words, the strength of the correspondence) can be identified. The predicted value of the quantity of the target product purchased by a customer may therefore be calculated as a weighted average of the quantities purchased by all users, without the step of extracting approximating users.
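The calculation in S830 can be sketched as below, following the row and column convention described above (rows are users, columns are customers) and the formula pe = (w1·p1 + w2·p2 + w3·p3)/3. The function name `predict_quantity`, the nested-list representation of Ω^*, and the sample numbers are assumptions made for illustration.

```python
def predict_quantity(omega, purchases, customer, k=None):
    """Predict a customer's purchase quantity of the target product.

    omega: nested list where omega[i][j] is the similarity (0 to 1)
        between user i and customer j.
    purchases: purchases[i] is the quantity bought by user i.
    customer: column index j of the customer of interest.
    k: number of most similar users to extract; None uses all users.
    """
    # Column `customer` holds this customer's similarity to every user.
    weights = [row[customer] for row in omega]
    # Rank users by similarity, most similar first.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen = order if k is None else order[:k]
    total = sum(weights[i] * purchases[i] for i in chosen)
    # Divide by the number of users used, as in the formula above.
    return total / len(chosen)


omega = [[0.9, 0.1], [0.5, 0.2], [0.1, 0.8], [0.0, 0.6]]
purchases = [2, 4, 10, 0]
pe_top3 = predict_quantity(omega, purchases, customer=0, k=3)
pe_all = predict_quantity(omega, purchases, customer=0)
```

For customer 0 the three most similar users give (0.9·2 + 0.5·4 + 0.1·10)/3, about 1.6, and using all four users gives about 1.2. Dividing by the sum of the weights instead of the user count would be a different normalization that the specification does not describe.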

After S830, the processor 31 outputs, to the source of the execution request, prediction data describing the predicted value of the quantity of the target product purchased by each customer (S840). The processor 31 then ends the prediction process shown in Fig. 14.

In another example, after S830 the processor 31 may, instead of or in addition to outputting the prediction data, deliver advertisement content recommending purchase of the target product to a number of customers corresponding to the delivery quantity designated by the user company, selected in descending order of the predicted value of each customer's purchase quantity (S840).
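The selection of recipients in this alternative S840 amounts to a top-N ranking by predicted value. The sketch below assumes the predictions are held in a dictionary keyed by customer ID; the name `pick_recipients` and the sample values are hypothetical.

```python
def pick_recipients(predictions, quantity):
    """Return the customers with the largest predicted purchase quantity.

    predictions: dict mapping customer ID to predicted purchase quantity.
    quantity: delivery quantity designated by the user company.
    """
    # Sort customers by predicted value, largest first.
    ranked = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    return [cid for cid, _ in ranked[:quantity]]


predictions = {"c1": 1.6, "c2": 0.4, "c3": 3.2, "c4": 2.0}
recipients = pick_recipients(predictions, quantity=2)
```

With a designated quantity of 2, the recipients are `c3` and `c4`.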

The delivery system 30 of the sixth embodiment has been described above. According to this embodiment, a data fusion technique that requires no shared variables can be used to provide a meaningful advertisement delivery service and, further, a meaningful marketing solution.

[Others] The exemplary embodiments of the present invention are not limited to those described above and may take various forms. A function of one constituent element in the above embodiments may be distributed among a plurality of constituent elements. Functions of a plurality of constituent elements may be integrated into one constituent element. Part of the configuration of an above embodiment may be omitted. At least part of the configuration of an above embodiment may be added to, or substituted for, the configuration of another above embodiment. Every mode encompassed by the technical idea specified by the wording of the claims is an embodiment of the present invention.

[Technical ideas disclosed in this specification] The following technical ideas can be understood to be disclosed in this specification.

[Item 1] An information processing system comprising: a first acquisition unit configured to acquire a first data set regarding a plurality of first entities, the first data set describing the characteristics of each of the plurality of first entities; a second acquisition unit configured to acquire a second data set regarding a plurality of second entities, the second data set describing the characteristics of each of the plurality of second entities; a dimensionality reduction unit configured to perform dimensionality reduction processing on a group of first feature vectors identified from the first data set and a group of second feature vectors identified from the second data set, thereby generating a group of first low-dimensional feature vectors corresponding to the group of first feature vectors and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors, wherein each of the first feature vectors represents the characteristics of a corresponding one of the plurality of first entities, each of the second feature vectors represents the characteristics of a corresponding one of the plurality of second entities, and the group of second low-dimensional feature vectors has the same number of dimensions as the group of first low-dimensional feature vectors; and a correspondence establishment unit configured to associate each of the plurality of first entities with at least one of the plurality of second entities based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

[Item 2] The information processing system of Item 1, wherein the correspondence establishment unit associates each of the plurality of first entities with at least one of the plurality of second entities, based on the similarity among the first entities identified from the group of first low-dimensional feature vectors and the similarity among the second entities identified from the group of second low-dimensional feature vectors, so that the mutual relationship among the first entities with respect to similarity fits the mutual relationship among the second entities.

[Item 3] The information processing system of Item 1 or 2, wherein the group of first low-dimensional feature vectors is defined in a first feature space, the group of second low-dimensional feature vectors is defined in a second feature space, and the correspondence establishment unit searches for a mapping that maps the plurality of first entities in the first feature space to the second feature space so that the distribution of the plurality of first entities in the first feature space, identified from the group of first low-dimensional feature vectors, fits the distribution of the plurality of second entities in the second feature space, identified from the group of second low-dimensional feature vectors, and associates each of the plurality of first entities with at least one of the plurality of second entities based on the mapping.

[Item 4] The information processing system of Item 1 or 2, wherein the correspondence establishment unit is configured to search for, as a matrix Ω^*, the matrix Ω that maximizes the value Z(Ω) according to the following mathematical expression including a matrix K, a matrix L, and a matrix H, and to associate each of the plurality of first entities with at least one of the plurality of second entities based on the matrix Ω^*, [Mathematical Formula 1] where the number of first entities is N; the number of second entities is equal to the number of first entities; the matrix K is a first similarity matrix of N rows and N columns in which the value of the element in the i-th row and j-th column represents the similarity between the i-th entity and the j-th entity among the plurality of first entities and is calculated based on the first low-dimensional feature vector of the i-th entity and the first low-dimensional feature vector of the j-th entity among the plurality of first entities; the matrix L is a second similarity matrix of N rows and N columns in which the value of the element in the i-th row and j-th column represents the similarity between the i-th entity and the j-th entity among the plurality of second entities and is calculated based on the second low-dimensional feature vector of the i-th entity and the second low-dimensional feature vector of the j-th entity among the plurality of second entities; and the matrix H is a matrix of N rows and N columns in which the element in the i-th row and j-th column has the value 1-1/N when i = j and the value 0 when i ≠ j.

[Item 5] The information processing system of Item 4, wherein the correspondence establishment unit is configured to improve the matrix Ω^* by repeatedly executing re-search processing for the matrix Ω^* until a predetermined condition is satisfied, and to associate each of the plurality of first entities with at least one of the plurality of second entities based on the improved matrix Ω^*, the re-search processing including: changing the dimensionality reduction scheme of the dimensionality reduction processing based on the matrix Ω^*; and causing the dimensionality reduction unit to execute the dimensionality reduction processing according to the changed scheme and searching for the matrix Ω^* again based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors newly obtained thereby.

[Item 6] The information processing system of Item 5, wherein the correspondence establishment unit changes the dimensionality reduction scheme so as to shorten the distance in the feature space between a first low-dimensional feature vector and a second low-dimensional feature vector that correspond to each other in the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

[Item 7] The information processing system of any one of Items 1 to 6, wherein the first data set includes a plurality of first characteristic data, each of which represents the characteristics of a corresponding one of the plurality of first entities, the second data set includes a plurality of second characteristic data, each of which represents the characteristics of a corresponding one of the plurality of second entities, and the information processing system further comprises a data fusion unit that generates an extended data set by combining each of the plurality of first characteristic data with one of the plurality of second characteristic data based on the correspondence between the plurality of first entities and the plurality of second entities established by the correspondence establishment unit, the extended data set including a plurality of extended data, each of which is data generated by combining a corresponding one of the first characteristic data with a corresponding one of the second characteristic data.

[Item 8] The information processing system of any one of Items 1 to 7, wherein the first entities and the second entities are persons, the first data set is a data set describing a first characteristic of each of a plurality of persons belonging to a first group, and the second data set is a data set describing a second characteristic of each of a plurality of persons belonging to a second group.

[Item 9] The information processing system of Item 8, wherein the combination of the first characteristic and the second characteristic is a combination of a characteristic regarding purchasing behavior, a characteristic regarding movement in at least one of an online space and an offline space, and/or a characteristic regarding visits to a plurality of locations in that space.

[Item 10] The information processing system of any one of Items 1 to 9, wherein the first entities and the second entities are persons, identification information of an information terminal corresponding to each of the plurality of second entities is associated with the second data set, and the information processing system comprises: a selection unit that selects, as delivery targets of information content, at least part of the set of second entities that, among the plurality of second entities, have been associated by the correspondence establishment unit with any of the plurality of first entities; and a delivery unit configured to deliver, based on the identification information, the information content to the set of information terminals corresponding to the delivery targets of the information content.

[Item 11] The information processing system of Item 10, wherein the selection unit selects a first set and a second set as the delivery targets of the information content, the first set being the set of second entities that have been associated by the correspondence establishment unit with any of the plurality of first entities, and the second set being a set of second entities, among the plurality of second entities, whose characteristics are similar to those of the first set.

[Item 12] The information processing system of any one of Items 1 to 11, wherein the first entities and the second entities are persons, the second data set is a data set describing characteristics regarding the behavior of each of the plurality of second entities, and the information processing system further comprises an estimation unit that, for each of one or more entities of interest that are at least part of the plurality of first entities, calculates an estimated value regarding the behavior of that entity of interest, the estimated value being calculated based on the characteristics regarding the behavior of at least one second entity, among the plurality of second entities, that has been associated with that entity of interest.

[Item 13] An information processing method executed by a computer, comprising: acquiring a first data set regarding a plurality of first entities, the first data set describing the characteristics of each of the plurality of first entities; acquiring a second data set regarding a plurality of second entities, the second data set describing the characteristics of each of the plurality of second entities; performing dimensionality reduction processing on a group of first feature vectors identified from the first data set and a group of second feature vectors identified from the second data set, thereby generating a group of first low-dimensional feature vectors corresponding to the group of first feature vectors and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors, wherein each of the first feature vectors represents the characteristics of a corresponding one of the plurality of first entities, each of the second feature vectors represents the characteristics of a corresponding one of the plurality of second entities, and the group of second low-dimensional feature vectors has the same number of dimensions as the group of first low-dimensional feature vectors; and associating each of the plurality of first entities with at least one of the plurality of second entities based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

[Item 14] The information processing method of Item 13, wherein the associating associates each of the plurality of first entities with one of the plurality of second entities, based on the similarity among the first entities identified from the group of first low-dimensional feature vectors and the similarity among the second entities identified from the group of second low-dimensional feature vectors, so that the mutual relationship among the first entities with respect to similarity fits the mutual relationship among the second entities.

[Item 15] A computer program product storing a computer program, characterized in that the information processing method of Item 13 or Item 14 is carried out when a computer loads and executes the computer program.

[Item 16] A computer-readable recording medium storing a computer program, characterized in that the information processing method of Item 13 or Item 14 is carried out when a computer loads and executes the computer program.

1: information processing system
11, 31: processor
13, 33: memory
15, 35: storage
15A: first data set
15B: second data set
15C: extended data set
17: user interface
19, 39: communication interface
30: delivery system
35A: external data set
35B: internal data set
35C: extended data set
40: user-company system
E11, E12, E13, E14, E15, E16, E17, E21, E22, E23, E24, E25, E26, E27: points
Pr, Pr1: computer program
P1, P2, P3: products
P1, S2, S3: websites
S110, S120, S130, S140, S150, S160, S170, S180, S190, S310, S320, S330, S340, S350, S360, S370, S380, S390, S410, S420, S430, S440, S450, S460, S510, S520, S530, S540, S550, S610, S620, S630, S640, S650, S710, S720, S730, S740, S750, S810, S820, S830, S840: steps

[Fig. 1] A block diagram showing the configuration of the information processing system.
[Fig. 2] A flowchart showing the analysis process executed by the processor.
[Fig. 3A] A diagram illustrating the structure of the first data set.
[Fig. 3B] A diagram illustrating the structure of the second data set.
[Fig. 4A] A diagram explaining the search method for the matrix Ω.
[Fig. 4B] A diagram explaining the search method for the matrix Ω.
[Fig. 5] A diagram illustrating the structure of the correspondence table generated by the processor.
[Fig. 6] A diagram illustrating the structure of the extended data set generated by the processor.
[Fig. 7] A flowchart showing the analysis process executed by the processor in the second embodiment.
[Fig. 8] A flowchart showing the evaluation process executed by the processor in the third embodiment.
[Fig. 9] A flowchart showing the selection process executed by the processor in the third embodiment.
[Fig. 10] A block diagram showing the configuration of the delivery system of the fourth embodiment.
[Fig. 11] A diagram illustrating the structure of the internal data set in the fourth embodiment.
[Fig. 12] A flowchart showing the delivery control process executed by the processor in the fourth embodiment.
[Fig. 13] A flowchart showing the delivery control process executed by the processor in the fifth embodiment.
[Fig. 14] A flowchart showing the prediction process executed by the processor in the sixth embodiment.

Claims

An information processing system comprising: a first acquisition unit configured to acquire a first data set regarding a plurality of first entities, the first data set describing the characteristics of each of the plurality of first entities; a second acquisition unit configured to acquire a second data set regarding a plurality of second entities, the second data set describing the characteristics of each of the plurality of second entities; a dimensionality reduction unit configured to perform dimensionality reduction processing on a group of first feature vectors identified from the first data set and a group of second feature vectors identified from the second data set, thereby generating a group of first low-dimensional feature vectors corresponding to the group of first feature vectors and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors, wherein each of the first feature vectors represents the characteristics of a corresponding one of the plurality of first entities, each of the second feature vectors represents the characteristics of a corresponding one of the plurality of second entities, and the group of second low-dimensional feature vectors has the same number of dimensions as the group of first low-dimensional feature vectors; and a correspondence establishment unit configured to associate each of the plurality of first entities with at least one of the plurality of second entities based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

The information processing system of claim 1, wherein the correspondence establishment unit establishes the correspondence between each of the plurality of first entities and at least one of the plurality of second entities based on the similarity between first entities specified from the group of first low-dimensional feature vectors and the similarity between second entities specified from the group of second low-dimensional feature vectors, such that the mutual relationship among the first entities with respect to similarity conforms to the mutual relationship among the second entities.

The information processing system of claim 1, wherein the group of first low-dimensional feature vectors is defined on a first feature space, the group of second low-dimensional feature vectors is defined on a second feature space, and the correspondence establishment unit searches for a mapping that maps the plurality of first entities on the first feature space to the second feature space such that the distribution of the plurality of first entities in the first feature space, specified from the group of first low-dimensional feature vectors, conforms to the distribution of the plurality of second entities in the second feature space, specified from the group of second low-dimensional feature vectors, and establishes a correspondence between each of the plurality of first entities and at least one of the plurality of second entities based on the mapping.

The information processing system of claim 1, wherein the correspondence establishment unit is configured to search for, as a matrix Ω^*, the matrix Ω that maximizes the value Z(Ω) given by the following mathematical expression including a matrix K, a matrix L, and a matrix H, and to establish a correspondence between each of the plurality of first entities and at least one of the plurality of second entities according to the matrix Ω^*, [Mathematical Formula 1] wherein the number of first entities is N, the number of second entities is equal to the number of first entities, the matrix K is a first similarity matrix of N rows and N columns in which the value of the element in the i-th row and j-th column represents the similarity between the i-th entity and the j-th entity among the plurality of first entities, calculated from the first low-dimensional feature vector of the i-th entity and the first low-dimensional feature vector of the j-th entity among the plurality of first entities, the matrix L is a second similarity matrix of N rows and N columns in which the value of the element in the i-th row and j-th column represents the similarity between the i-th entity and the j-th entity among the plurality of second entities, calculated from the second low-dimensional feature vector of the i-th entity and the second low-dimensional feature vector of the j-th entity among the plurality of second entities, and the matrix H is a matrix of N rows and N columns in which the element in the i-th row and j-th column has the value 1-1/N when i=j and the value 0 when i≠j.
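
The search for Ω^* can be sketched as follows. Mathematical Formula 1 is not reproduced in this text, so the objective below is an assumption: it uses the trace form tr(HKH · Ωᵀ · HLH · Ω) known from kernelized sorting, with Ω restricted to permutation matrices. The Gaussian similarity, the brute-force search over all permutations (feasible only for very small N), and every function name are likewise hypothetical. The matrix H follows the wording of the claim as translated here.

```python
import itertools
import numpy as np

def similarity_matrix(vectors, gamma=1.0):
    """N x N Gaussian similarities between row vectors (assumed similarity measure)."""
    sq_dist = np.sum((vectors[:, None, :] - vectors[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)

def h_matrix(n):
    """H as worded in the claim text: 1 - 1/N on the diagonal, 0 elsewhere."""
    return (1.0 - 1.0 / n) * np.eye(n)

def objective(K, L, H, omega):
    """Assumed Z(omega) = tr(HKH omega^T HLH omega); not the claimed formula itself."""
    return float(np.trace(H @ K @ H @ omega.T @ H @ L @ H @ omega))

def search_omega(K, L, H):
    """Brute-force search over permutation matrices; practical only for small N."""
    n = K.shape[0]
    best_value, best_omega = -np.inf, None
    for perm in itertools.permutations(range(n)):
        omega = np.zeros((n, n))
        # omega[perm[j], j] = 1 pairs first entity j with second entity perm[j].
        omega[list(perm), np.arange(n)] = 1.0
        value = objective(K, L, H, omega)
        if value > best_value:
            best_value, best_omega = value, omega
    return best_omega, best_value

rng = np.random.default_rng(0)
first_low = rng.normal(size=(5, 2))     # hypothetical first low-dimensional vectors
shuffle = np.array([2, 0, 4, 1, 3])
second_low = first_low[shuffle]         # second entity i mirrors first entity shuffle[i]

K = similarity_matrix(first_low)
L = similarity_matrix(second_low)
omega_star, _ = search_omega(K, L, h_matrix(5))
assignment = omega_star.argmax(axis=0)  # second entity matched to each first entity
print(assignment)
```

Because the second group in this toy example is a shuffled copy of the first, the similarity structure of the two groups is identical up to ordering, and the search recovers the ordering from the similarity matrices alone. A practical system would replace the exhaustive search with an iterative solver, since the number of permutations grows factorially with N.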

The information processing system of claim 4, wherein the correspondence establishment unit is configured to repeatedly perform a re-search process for the matrix Ω^* until a predetermined condition is met, thereby improving the matrix Ω^*, and to establish a correspondence between each of the plurality of first entities and at least one of the plurality of second entities based on the improved matrix Ω^*, the re-search process including: changing the dimensionality reduction method used in the dimensionality reduction processing according to the matrix Ω^*; causing the dimensionality reduction unit to perform the dimensionality reduction processing according to the changed dimensionality reduction method; and searching for the matrix Ω^* again based on the newly obtained group of first low-dimensional feature vectors and group of second low-dimensional feature vectors.

The information processing system of claim 5, wherein the correspondence establishment unit changes the dimensionality reduction method such that, among the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors, the distance in the feature space between a first low-dimensional feature vector and a second low-dimensional feature vector that correspond to each other is shortened.

The information processing system of claim 1, wherein the first data set includes a plurality of first characteristic data, each of which represents a characteristic of a corresponding one of the plurality of first entities, the second data set includes a plurality of second characteristic data, each of which represents a characteristic of a corresponding one of the plurality of second entities, and the information processing system further comprises a data fusion unit that, based on the correspondence between the plurality of first entities and the plurality of second entities established by the correspondence establishment unit, combines each of the plurality of first characteristic data with a corresponding one of the plurality of second characteristic data, thereby generating an extended data set, the extended data set including a plurality of extended data, each of which is data generated by combining a corresponding first characteristic data with a corresponding second characteristic data.
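
The data fusion step can be sketched as a simple join driven by the established correspondence. This is a minimal illustration, not the claimed implementation: the dictionary layout, the entity identifiers, and the field names `purchases` and `tv_hours` are hypothetical, and the sketch assumes the two data sets do not share field names.

```python
def fuse(first_data, second_data, correspondence):
    """Build extended records by merging each matched pair of characteristic data.

    first_data / second_data: dicts mapping an entity id to a dict of characteristics.
    correspondence: list of (first_id, second_id) pairs from the matching step.
    Assumes the two data sets use disjoint field names.
    """
    extended = []
    for first_id, second_id in correspondence:
        record = {"first_id": first_id, "second_id": second_id}
        record.update(first_data[first_id])
        record.update(second_data[second_id])
        extended.append(record)
    return extended

# Hypothetical example data.
first_data = {"a1": {"purchases": 3}, "a2": {"purchases": 7}}
second_data = {"b1": {"tv_hours": 1.5}, "b2": {"tv_hours": 4.0}}
correspondence = [("a1", "b2"), ("a2", "b1")]

extended_data_set = fuse(first_data, second_data, correspondence)
print(extended_data_set[0])
```

Each extended record carries the characteristics of one first entity together with those of the second entity matched to it, which corresponds to one item of extended data in the extended data set.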

The information processing system of claim 1, wherein the first entities and the second entities are persons, the first data set is a data set describing a first feature of each of a plurality of persons belonging to a first group, and the second data set is a data set describing a second feature of each of a plurality of persons belonging to a second group.

The information processing system of claim 8, wherein the combination of the first feature and the second feature is a combination of a feature regarding purchasing behavior, a feature regarding movement in at least one of online space and offline space, and/or a feature regarding visits to a plurality of locations in the space.

The information processing system of any one of claims 1 to 9, wherein the first entities and the second entities are persons, identification information of an information terminal corresponding to each of the plurality of second entities is associated with the second data set, and the information processing system comprises: a selection unit that selects, as a transmission target of information content, at least part of a set of second entities, among the plurality of second entities, that have been associated with any of the plurality of first entities by the correspondence establishment unit; and a transmission unit configured to transmit the information content, based on the identification information, to a set of information terminals corresponding to the transmission target of the information content.

The information processing system of claim 10, wherein the selection unit selects a first set and a second set as transmission targets of the information content, the first set being a set of second entities that have been associated with any of the plurality of first entities by the correspondence establishment unit, and the second set being a set of second entities, among the plurality of second entities, that have characteristics similar to those of the first set.

The information processing system of any one of claims 1 to 9, wherein the first entities and the second entities are persons, the second data set is a data set describing characteristics of the behavior of each of the plurality of second entities, the information processing system further comprises an inference unit that, for each of one or more entities of interest, calculates an inferred value regarding the behavior of the corresponding entity of interest, the one or more entities of interest are at least part of the plurality of first entities, and the inferred value is calculated based on the characteristics of the behavior of at least one entity, among the plurality of second entities, that has been associated with the corresponding entity of interest.
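
The inference step can be sketched as follows. The claim only states that the inferred value is calculated from the behavior characteristics of the associated second entities; taking the arithmetic mean of a single numeric behavior feature is an assumption made for illustration, as are the function name and the example identifiers.

```python
from statistics import mean

def infer_behavior(entity_of_interest, correspondence, second_behavior):
    """Estimate a behavior value for a first entity from its matched second entities.

    correspondence: list of (first_id, second_id) pairs.
    second_behavior: dict mapping a second entity id to a numeric behavior feature.
    Returns None when the entity of interest has no matched second entity.
    Averaging is an assumed aggregation, not one stated in the claim.
    """
    matched = [s for f, s in correspondence if f == entity_of_interest]
    if not matched:
        return None
    return mean([second_behavior[s] for s in matched])

# Hypothetical example: first entity "a" is matched to two second entities.
correspondence = [("a", "x"), ("a", "y"), ("b", "z")]
second_behavior = {"x": 2.0, "y": 4.0, "z": 1.0}

print(infer_behavior("a", correspondence, second_behavior))
```

When a first entity is matched to a single second entity, the estimate reduces to that entity's observed behavior value.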

An information processing method executed by a computer, comprising: acquiring a first data set, the first data set being a data set regarding a plurality of first entities and describing characteristics of each of the plurality of first entities; acquiring a second data set, the second data set being a data set regarding a plurality of second entities and describing characteristics of each of the plurality of second entities; generating, by performing dimensionality reduction processing on a group of first feature vectors specified from the first data set and a group of second feature vectors specified from the second data set, a group of first low-dimensional feature vectors corresponding to the group of first feature vectors and a group of second low-dimensional feature vectors corresponding to the group of second feature vectors, wherein each of the first feature vectors represents a feature of a corresponding one of the plurality of first entities, each of the second feature vectors represents a feature of a corresponding one of the plurality of second entities, and the group of second low-dimensional feature vectors has the same number of dimensions as the group of first low-dimensional feature vectors; and establishing a correspondence between each of the plurality of first entities and at least one of the plurality of second entities based on the group of first low-dimensional feature vectors and the group of second low-dimensional feature vectors.

The information processing method of claim 13, wherein the establishing of the correspondence includes establishing a correspondence between each of the plurality of first entities and one of the plurality of second entities based on the similarity between first entities specified from the group of first low-dimensional feature vectors and the similarity between second entities specified from the group of second low-dimensional feature vectors, such that the mutual relationship among the first entities with respect to similarity conforms to the mutual relationship among the second entities.

A computer program product containing a computer program, wherein the information processing method of claim 13 or claim 14 is performed when the computer program is loaded and executed by a computer.

A computer-readable recording medium storing a computer program, wherein the information processing method of claim 13 or claim 14 is performed when the computer program is loaded and executed by a computer.