WO2024096054A1 - Information processing method, program, storage medium, and information processing device - Google Patents
- Publication number
- WO2024096054A1 (PCT/JP2023/039412)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- variables
- source
- information processing
- data source
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
Definitions
- the present invention relates to an information processing method, a program, a storage medium, and an information processing device.
- information has been collected, stored, and analyzed from a large number of users for purposes such as marketing.
- Information is collected from various perspectives, such as attribute information such as gender, age, and place of residence, information such as hobbies and preferences, and behavioral history such as purchase history and visiting location information, and is used according to the purpose.
- personal information related to Internet use, such as website visit history, mail-order purchase history, service usage history, and exposure history to Internet advertisements.
- cookie technology, which temporarily records site visit information on the user's information terminal, is widely used. Cookies make it possible to collect various behavioral data from the user's information terminal, so that marketing can draw on this behavioral data in addition to the user's attribute and preference information.
- Legal regulations include laws protecting users' personal information, which are being enacted in countries around the world and which, for example, make it mandatory to obtain consent for the collection and use of personal information.
- Technical regulations include, for example, restrictions on the use of third-party cookies in browsers and restrictions on the use of terminal IDs on mobile information terminals.
- Patent Document 1 (Patent No. 6511186) discloses estimation processing for delivering appropriate advertisements to users whose attribute information is difficult to link to cookies: attributes are estimated by comparing the results of breaking down known users' access information into specified patterns with the results of breaking down unknown users' access information into the same patterns.
- in Patent Document 2 (JP 2020-526828 A), event information related to advertisement viewing is collected without relying on personal identifiers, in order to measure conversions from Internet advertisements without using cookies.
- Patent Document 3 (Patent No. 5793794) discloses creating, from data on a single customer, virtual customer data that combines the characteristics of multiple customers, using parameters in the feature data that function as a gap. Multiple databases are then integrated by combining feature data with matching or similar characteristics based on the created feature data.
- Patent Document 1: Patent No. 6511186; Patent Document 2: JP 2020-526828 A; Patent Document 3: Patent No. 5793794
- methods based only on access information, such as that of Patent Document 1, have limited accuracy in attribute estimation.
- in Patent Document 2, the collected data is managed together with the identification information of the information terminal, which raises concerns about the protection of users' personal information.
- pseudo single-source data: by integrating separate data sources into pseudo single-source data, it is expected that both the protection of personal information and the provision of information useful for marketing can be achieved.
- the present invention was made in consideration of the above problems, and its purpose is to provide a technology for generating pseudo single-source data with high accuracy.
- the information processing method is characterized by having the following features.
- the present invention also employs the following configuration.
- the present invention also employs the following configuration.
- the information processing method is characterized by having the following features.
- the present invention provides a technology for generating pseudo single-source data with high accuracy.
- FIG. 2 is a block diagram illustrating the configuration of the control blocks according to the first embodiment.
- FIG. 3 is a diagram for explaining the processing flow according to the first embodiment.
- FIG. 4 is a diagram explaining an example of a variance-covariance matrix.
- FIG. 5 is a diagram showing the similarity of pseudo single-source data.
- FIG. 6 is a diagram for explaining the generation of pseudo single-source data in a first comparative example.
- FIG. 7 is a diagram for explaining the generation of pseudo single-source data in a second comparative example.
- the present invention is a technology for statistically generating a single data source (single-source data) from separate data sources, and relates to generating pseudo single-source data by simultaneously predicting multivariate data from variables common to multiple data groups.
- the present invention can be understood as an information processing method or information processing system for performing such processing.
- the present invention can also be understood as an information processing device used in such an information processing method or constituting such an information processing system.
- the present invention can also be understood as a method for controlling an information processing system or information processing device.
- the present invention can also be understood as a program that operates using the computational resources of an information processing device and executes each step of the information processing method, or a storage medium on which the program is stored.
- the storage medium may be a non-transitory storage medium readable by a computer.
- the user is the subject of the action and the subject (sample) of data collection.
- the user is a consumer who purchases products and services.
- the researcher collects various information about the user and generates real or simulated single-source data.
- Single-source data is data covering multiple aspects of the same user.
- examples of the information it contains include attribute information such as gender, age, place of residence, marital status, educational background, and occupation, as well as hobbies, preferences, values, cohabitants, possessions, income, lifestyle information, purchase history, and web-related information.
- Examples of the user's web-related information include website visit history, internet advertising exposure history, mail order purchase history, and service usage history.
- the types of information are not limited to these.
- the method of collecting data is arbitrary, and appropriate methods can be used depending on the type of information to be obtained, such as questionnaire surveys, panel surveys, sales data from retail stores, and web analysis.
- In FIG. 6(a), the first data source 501 is user purchase data, in which the variables gender 502, age 503, and purchase history 504 have been acquired for three users 505 to 507.
- In FIG. 6(b), the second data source 511 is user advertising contact data, in which the variables gender 512, age 513, advertising contact history 514, and contacted advertising medium 515 have been acquired for three users 516 to 518.
- the covariates of the first data source 501 and the second data source 511 are gender and age.
- the data of users with similar characteristics ("female, 30s"), such as user 505 in FIG. 6(a) and user 518 in FIG. 6(b) (shown with dashed lines), are statistically combined to generate the data of user 527 in FIG. 6(c).
- the result of this data combination is the third data source 521, which serves as pseudo single-source data.
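The covariate matching of the first method can be sketched as follows. This is a hypothetical illustration, not the patent's procedure: it assumes that "similar" means same gender and same generation, with the closest age as tie-breaker, and the records and field names are invented for the example.

```python
def age_band(age):
    """Map an age to a generation bucket, e.g. 34 -> 30 (the '30s')."""
    return (age // 10) * 10


def fuse(source_a, source_b, common=("gender", "age")):
    """Statistical matching (data fusion) on covariates.

    For each record of source_a, pick a donor record from source_b with the
    same gender and generation and the closest age, then merge the donor's
    non-common variables into the record. Records without a donor are skipped.
    """
    fused = []
    for rec_a in source_a:
        candidates = [rec_b for rec_b in source_b
                      if rec_b["gender"] == rec_a["gender"]
                      and age_band(rec_b["age"]) == age_band(rec_a["age"])]
        if not candidates:
            continue
        donor = min(candidates, key=lambda r: abs(r["age"] - rec_a["age"]))
        merged = dict(rec_a)
        merged.update({k: v for k, v in donor.items() if k not in common})
        fused.append(merged)
    return fused


# Hypothetical purchase data (first source) and ad-contact data (second source).
purchase_data = [
    {"id": 505, "gender": "F", "age": 34, "purchase": 1},
    {"id": 506, "gender": "M", "age": 52, "purchase": 0},
    {"id": 507, "gender": "M", "age": 27, "purchase": 1},
]
ad_data = [
    {"gender": "M", "age": 45, "ad_contact": 0, "medium": "web"},
    {"gender": "M", "age": 21, "ad_contact": 1, "medium": "TV"},
    {"gender": "F", "age": 30, "ad_contact": 1, "medium": "web"},
]
fused = fuse(purchase_data, ad_data)
```

Note that the choice of `age_band` and of "closest age" is exactly the kind of similarity setting that, as the next paragraph says, requires trial and error.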
- this first method requires a lot of trial and error to appropriately select the covariates used in data fusion (e.g., selecting gender and age) and to set the similarity of variable values (e.g., setting an evaluation index that determines that ages 34 and 30 are close).
- the first data source and the second data source are assumed to be conditionally independent, and then these are fused.
- to justify this assumption, one must find a covariate such that, when it is fixed, the first data source and the second data source are uncorrelated, and this must be checked against a third data source (single-source data) at hand that serves as the correct answer.
- consequently, within each covariate group of the resulting pseudo single-source data, the variables from the first data source and those from the second data source are uncorrelated.
- in typical marketing analysis, data is sorted by gender and age, and the hobbies, preferences, purchasing characteristics, and the relationships among them are analyzed for each group; the pseudo single-source data generated by the first method cannot be used for such ordinary marketing purposes.
- the pseudo single-source data generated by the first method may contain a spurious correlation between the first data source and the second data source that is induced via a covariate.
- for example, if a multivariate normal distribution is assumed as the generative model of the first data source and also as the generative model of the second data source, the generative model of the pseudo single-source data obtained by data fusion under the assumption of conditional independence is also a multivariate normal distribution, but it can be shown that a spurious correlation arises between the first data source and the second data source via the covariate.
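For the Gaussian case just mentioned, a standard result (added here for illustration; the patent does not give the derivation) makes the effect explicit. Write X for the covariates, Y_A for the variables of the first data source, Y_B for those of the second, and Σ for covariance blocks. If the fused data are generated so that Y_A and Y_B are conditionally independent given X, the law of total covariance gives:

```latex
\operatorname{Cov}(Y_A, Y_B)
  = \mathbb{E}\!\left[\operatorname{Cov}(Y_A, Y_B \mid X)\right]
  + \operatorname{Cov}\!\left(\mathbb{E}[Y_A \mid X],\, \mathbb{E}[Y_B \mid X]\right)
  = 0 + \operatorname{Cov}\!\left(\Sigma_{AX}\Sigma_{XX}^{-1}X,\; \Sigma_{BX}\Sigma_{XX}^{-1}X\right)
  = \Sigma_{AX}\,\Sigma_{XX}^{-1}\,\Sigma_{XB}
```

Whatever the true cross-covariance Σ_AB is, the fused data show exactly this covariate-induced value: any relationship between the two sources that does not pass through X is lost, and any that appears is produced by X alone.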
- in practice, however, the generative models of the first data source and the second data source are not known in advance, so such analytical verification is impossible.
- the basic attributes may be correlated with the data contained in the first data source 501 and the second data source 511 (e.g., purchase history 504, advertising exposure history 514, and advertising medium 515), but it cannot be proven that such correlations are not spurious.
- pseudo single-source data created through data fusion may lead its creator to mistake a spurious correlation, produced as an artifact of the fusion itself, for a meaningful one, resulting in marketing activities with no return on investment; this has a negative impact on marketing practice.
- Patent Document 3 (Patent No. 5793794), which creates virtual customer data using feature data, integrates multiple databases by combining feature data with matching or similar features based on the created feature data.
- Patent Document 3 thus uses data fusion as the means of combining data with similar features into a database, and can therefore be said to share the same problems as the first method.
- as a second method, predicting and imputing data separately for each variable (element) can be considered.
- in this method, a prediction model is constructed from the covariates of the separate data sources and used to generate the missing data.
- Fig. 7(a) and Fig. 7(b) show a first data source 501 and a second data source 511 similar to those in Fig. 6(a) and Fig. 6(b), respectively, and from these a third data source 521 having the variables shown in Fig. 7(c) is generated.
- however, the explanatory variables do not have significant power to predict the objective variables.
- gender and age are variables that cannot be excluded as explanatory variables, but there is little that can be explained by these two explanatory variables.
- identical data may be generated for all users with the same explanatory variables, such as "female, 30s". Such a data pattern is unrealistic and problematic for use in marketing and similar applications. To avoid generating identical data, Bernoulli random numbers or the like can be applied.
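A minimal sketch of the second method illustrates both failure modes described above. The per-variable model here is hypothetical: it simply estimates P(Y_j = 1 | gender, generation) as the group mean in the training data. Deterministic prediction gives every user in the same group identical values; Bernoulli sampling adds variation, but each variable is drawn independently, so correlations between objective variables are not reproduced.

```python
import random
from collections import defaultdict


def fit_group_means(records, targets):
    """Per-variable model: P(Y_j = 1 | gender, generation) as a group mean."""
    sums = defaultdict(lambda: [0.0] * len(targets))
    counts = defaultdict(int)
    for r in records:
        key = (r["gender"], r["generation"])
        counts[key] += 1
        for j, t in enumerate(targets):
            sums[key][j] += r[t]
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


def predict(model, users, targets, rng=None):
    """Return group means (rng=None) or independent Bernoulli draws per variable."""
    out = []
    for u in users:
        probs = model[(u["gender"], u["generation"])]
        if rng is None:
            out.append(dict(zip(targets, probs)))
        else:
            out.append({t: int(rng.random() < p) for t, p in zip(targets, probs)})
    return out


targets = ["y1", "y2"]
# Hypothetical training data: y1 and y2 are perfectly correlated.
teacher = [
    {"gender": "F", "generation": 30, "y1": 1, "y2": 1},
    {"gender": "F", "generation": 30, "y1": 0, "y2": 0},
    {"gender": "F", "generation": 30, "y1": 1, "y2": 1},
    {"gender": "F", "generation": 30, "y1": 0, "y2": 0},
]
model = fit_group_means(teacher, targets)
new_users = [{"gender": "F", "generation": 30}] * 3
deterministic = predict(model, new_users, targets)
sampled = predict(model, new_users, targets, rng=random.Random(0))
```

In the sampled output, y1 and y2 are drawn independently even though they always agree in the training data, which is the loss of structure the patent's approach is meant to avoid.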
- methods of generating pseudo single-source data that can be regarded as extensions of conventional technology therefore have problems with the accuracy of data combination and the quality of the generated data. There is thus a demand for a data generation method whose output has properties close to those of actual single-source data and is suitable for marketing and similar uses.
- Example 1 (System configuration): The overall configuration of an information processing system 1 according to the present invention will be described with reference to Fig. 1.
- the information processing system 1 includes an information processing device 10, a user terminal 20, a respondent terminal 30, a store terminal 40, and a data provider terminal 50.
- the information processing device 10 is interconnected with other components via a communication network such as the Web or a dedicated line so as to enable transmission and reception of information between the information processing device 10 and the other components.
- the researcher uses the information processing device 10 to carry out various information processing including single-source data generation.
- the information processing device 10 comprises computing resources including a control unit 1001 (e.g., a CPU), a storage unit 1002 (e.g., ROM, RAM, or an HDD), a communication unit 1003 (e.g., a communication adapter), an input unit 1004 (e.g., a mouse or keyboard), and a notification unit 1005 (e.g., a display or speaker).
- the information processing device 10 is preferably an information processing device such as a PC or workstation that operates according to instructions from a program deployed in memory or user instructions via an interface. Note that a cloud server that uses computing resources on the cloud may be used as the information processing device 10.
- the information processing device 10 may also be a combination of multiple PCs connected via an internet line or directly.
- the user terminal 20 is a terminal that allows the user to carry out various types of information processing on a daily basis, and can be a PC, a smartphone, a tablet device, etc.
- the user terminal 20 collects the user's website visit history, mail order purchase history, service usage history, exposure history to internet advertisements, etc., and transmits them to the researcher's information processing device 10.
- the respondent terminal 30 is an information terminal of a respondent who belongs to the researcher's monitoring organization. Respondents use the respondent terminal 30 to answer questionnaires about their basic attributes, hobbies, preferences, possessions, etc., and send them to the researcher's information processing device 10.
- the store terminal 40 is an information terminal installed in a store that sells products, which collects purchasing data from users and transmits it to the researcher's information processing device 10.
- the store may be a physical store or an online store.
- as the store terminal 40, a POS terminal that acquires POS (Point of Sale) information at checkout when a product is purchased may be used.
- the data provider terminal 50 is an information terminal used by a data provider, such as a vendor that sells data that can be used for single-source data.
- the data provider processes data that they have collected themselves or purchased into a format desired by the researcher and transmits it to the information processing device 10.
- the data about users received by the researcher from the user terminal 20, respondent terminal 30, store terminal 40, and data provider terminal 50 typically contains the values of various variables associated with the user's basic attributes.
- the information contained in the single-source data is not limited to the above example.
- the source of data acquisition is also not limited to the terminals in the above example. Needless to say, data acquisition is performed in compliance with laws and regulations, for example by obtaining the user's consent, regardless of whether the data is acquired by the researcher directly or by another party.
- Fig. 2 is a block diagram for explaining data transmission/reception and data processing realized by functional modules of a program in the information processing device 10.
- the block configuration is not limited to this as long as the information processing of the present invention can be realized.
- the control unit 1001 of this embodiment has a data acquisition unit 1010, a pseudo data generation unit 1020, and a data analysis unit 1030.
- the data acquisition unit 1010 includes a data classification unit 1011, a matrix calculation unit 1012, and a learning execution unit 1013.
- the pseudo data generation unit 1020 includes a data selection unit 1021, a model application unit 1022, and a data organization unit 1023.
- the data analysis unit 1030 includes an analysis setting unit 1031 and an analysis implementation unit 1032. The processing of each of these blocks will be described later.
- <Processing flow> The processing flow of this embodiment will be described with reference to Fig. 3.
- the researcher generates single-source data by integrating various data sources about a certain user, preferably with the user's basic attributes as common variables.
- the basic attributes are gender and age (generation).
- pseudo single-source data is created by integrating data obtained from each data source, such as the user terminal 20, respondent terminal 30, store terminal 40, and data provider terminal 50.
- In step S101, data from each data source is sent to the data acquisition unit 1010 via the communication unit 1003 of the information processing device 10.
- each data set includes gender and age (generation), which are used as explanatory variables as described later, although the variables are not limited to these. Because various data sources are used, each record has many blank fields.
- the data classification unit 1011 classifies and stores the data in a table. At this time, the data classification unit 1011 may perform various data shaping processes, such as converting the data into a specified format (for example, determining the generation from the age), removing outliers, and simply combining multiple similar data sources. Note that the table is defined on the database of the storage unit 1002.
- the table of single-source data in this embodiment holds various items, including at least gender and generation, as columns.
- the data classification unit 1011 adds the acquired data to the database as records, using gender and generation as common variables.
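The shaping and storage steps above can be sketched as follows, assuming a table whose columns always include gender and generation. The column names, the 10-year generation buckets, and the 0 to 120 age outlier rule are hypothetical choices for illustration; the patent only states that such conversions and outlier removal may be performed.

```python
COLUMNS = ("gender", "generation", "purchase", "ad_contact", "medium")  # hypothetical schema


def to_generation(age):
    """Convert an age in years to a generation bucket (34 -> 30, i.e. '30s')."""
    return (age // 10) * 10


def shape_record(raw, columns=COLUMNS, max_age=120):
    """Return a table row keyed by every column, or None for an outlier.

    Variables that a data source does not provide stay None (blank), which is
    why records from different data sources are sparse.
    """
    age = raw.get("age")
    if age is None or not 0 <= age <= max_age:
        return None  # hypothetical outlier rule
    row = dict.fromkeys(columns)
    row["gender"] = raw["gender"]
    row["generation"] = to_generation(age)
    for key, value in raw.items():
        if key in row and key not in ("gender", "generation"):
            row[key] = value
    return row


raw_records = [
    {"gender": "F", "age": 34, "purchase": 1},                    # purchase data
    {"gender": "M", "age": 27, "ad_contact": 0, "medium": "web"}, # ad-contact data
    {"gender": "M", "age": 250, "purchase": 1},                   # outlier, dropped
]
table = [row for row in map(shape_record, raw_records) if row is not None]
```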
- the matrix calculation unit 1012 calculates a variance-covariance matrix based on the first data source 201, as shown for example in FIG. 4(a).
- the first data source 201 is data consisting of 1000 records having n variables Y1 to Yn in addition to the user's gender and age.
- each variable is a binary variable that takes either a true (1) or a false (0) value.
- variables that are not originally in this form are classified by the data classification unit 1011 based on a predetermined criterion and converted into a format that the information processing device can process.
- the variance-covariance matrix for n variables is an n × n matrix whose diagonal elements (where a variable meets itself) hold the variance, i.e., the degree of dispersion of that variable's data, and whose off-diagonal elements (where two different variables meet) hold the covariance, an index of the relationship between the two variables. Therefore, if the variance of variable x is written as σx² and the covariance of variables x and y as Cov(x, y), the variance-covariance matrix 211 is expressed as shown in FIG. 4(b).
- the variables Y1 to Yn are also used as explanatory variables; for example, when predicting variable Y2, the other variables among Y1 to Yn can serve as explanatory variables. This makes it possible to exploit the correlations among Y1 to Yn in prediction, which is expected to improve prediction accuracy. Even in this case, it is not necessary to use all variables other than Y2, and explanatory variables may be selected as appropriate.
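The definition above translates directly into a few lines of NumPy. This sketch uses invented binary data and assumes the unbiased (n − 1) normalization, which is NumPy's default in `np.cov`; the patent does not say which normalization it uses.

```python
import numpy as np

# Hypothetical binary data: 5 users (rows) x 3 variables Y1..Y3 (columns).
Y = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [1, 1, 1],
              [0, 0, 0]], dtype=float)


def variance_covariance(Y):
    """n x n matrix: variances on the diagonal, covariances off the diagonal."""
    Yc = Y - Y.mean(axis=0)             # center each variable
    return Yc.T @ Yc / (Y.shape[0] - 1)  # unbiased normalization (n - 1)


S = variance_covariance(Y)
```

The result matches `np.cov(Y, rowvar=False)`, its diagonal equals the per-variable variances, and it is symmetric, as the definition requires.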
- the matrix calculation unit 1012 models the relationships between multiple variables in the first data source 201, which includes other variables in addition to gender and age as explanatory variables.
- when the matrix calculation unit 1012 reads the first data source 201, it calculates the variance-covariance matrix among the multiple variables that serve as training data and the mean value of each variable, and thereby indexes the relationships among the objective variables.
- In step S103, the data selection unit 1021 of the pseudo data generation unit 1020 selects a second data source different from the first data source 201 as the target to which the model is applied.
- the second data source includes explanatory variables in the model created in step S102.
- In step S104, the model application unit 1022 applies the model and optimization calculation to the second data source to generate values of the objective variables.
- an optimization problem of the following formulas (1) and (2) is defined so that the variance-covariance matrix and the average value in the teacher data are reproduced.
- the above formula (1) is an optimization problem for matching the relationships between explanatory variables and objective variables
- the above formula (2) is an optimization problem for matching the relationships between objective variables.
- optimization calculation processing is performed using the gradient method of the following formulas (3) and (4).
- ⁇ , ⁇ , and ⁇ are tuning parameters.
- the objective variable is predicted simultaneously while maintaining the variance-covariance matrix and the mean value.
- the variable X is an explanatory variable.
- the indices i and j refer to the explanatory-variable and objective-variable components, respectively: the explanatory variable in the i-th column and the objective variable in the j-th column.
- Formulas (1) and (3) correspond to the conventional process of restoring the structure of the portion enclosed by the dashed line 212 in FIG. 4(b).
- formulas (2) and (4) correspond to the process, adopted in the present invention, of restoring the structure of the portion enclosed by the dashed line 213.
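The training statistics that the optimization must reproduce can be gathered as below. This is a sketch that assumes the joint variance-covariance matrix is ordered with the explanatory variables X first and the objective variables Y after them, so that the X-by-Y block plays the role of the dashed-line-212 portion and the Y-by-Y block that of the dashed-line-213 portion; the function name and data are hypothetical.

```python
import numpy as np


def teacher_statistics(X, Y):
    """Mean of the objective variables and two blocks of the joint
    variance-covariance matrix of [X | Y] (explanatory columns first)."""
    Z = np.hstack([X, Y])
    S = np.cov(Z, rowvar=False)   # joint variance-covariance matrix (ddof=1)
    p = X.shape[1]
    return {
        "mean_y": Y.mean(axis=0),
        "cov_xy": S[:p, p:],      # explanatory x objective block (cf. dashed line 212)
        "cov_yy": S[p:, p:],      # objective x objective block (cf. dashed line 213)
    }


X = np.array([[0, 1], [1, 0], [1, 1], [0, 0], [1, 0]], dtype=float)
Y = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0], [1, 1, 1]], dtype=float)
stats = teacher_statistics(X, Y)
```

Conventional per-variable methods only target `cov_xy`; keeping `cov_yy` as well is what preserves the relationships among the objective variables.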
- the above formula (3) is a formula for solving the optimization problem of the above formula (1) using the gradient method
- the above formula (4) is a formula for solving the optimization problem of the above formula (2) using the gradient method as well.
- the objective variables are the multiple variables Y1 to Yn.
- the first term represents the error between the variance-covariance matrix of the training data and that of the generated prediction data. Since the absolute values of these errors are expected to be smaller than 1, the absolute value (the so-called L1 norm) is used so that such small errors are evaluated strictly: for |e| < 1, |e| > e², so the L1 norm penalizes them more heavily than the squared error would.
- the second term represents the error between the mean of the training data and the mean of the generated prediction data. Squared error is used here, but absolute values can also be used to evaluate the error more strictly.
- the third term expresses the constraint that the generated prediction data must be binary, either true (1) or false (0). The symbol ∘ denotes the Hadamard (element-wise) product. This constraint can also be viewed as a penalty term.
- Equation (3), which performs the optimization using the gradient method, searches for an optimal solution that minimizes the errors in the variance-covariance matrix and mean values while satisfying the binary constraint.
- the tuning parameters ⁇ , ⁇ , and ⁇ are parameters that control the optimization calculation process and can be determined by the researcher. A similar interpretation can be made for equation (4).
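Formulas (1) to (4) and the symbols of the tuning parameters did not survive in this text, so the following is only a sketch of an objective of the kind described above, not the patent's formulas. It assumes: an L1 error on the explanatory-by-objective and objective-by-objective covariance blocks, a squared error on the means, a penalty on the Hadamard product Y ∘ (1 − Y) to push values toward 0 or 1, and plain (sub)gradient descent on a relaxed Y clipped to [0, 1] and rounded at 0.5 at the end. The weights `w_cov`, `w_mean`, `w_bin` and the step size `lr` are hypothetical stand-ins for the tuning parameters.

```python
import numpy as np


def sample_cov(a, b):
    """Sample cross-covariance between the columns of a and b (ddof=1)."""
    return (a - a.mean(axis=0)).T @ (b - b.mean(axis=0)) / (a.shape[0] - 1)


def loss_and_grad(Y, X, cov_yy, cov_xy, mean_y, w_cov, w_mean, w_bin):
    """Relaxed objective and its (sub)gradient with respect to Y."""
    m = Y.shape[0]
    Yc, Xc = Y - Y.mean(axis=0), X - X.mean(axis=0)
    err_yy = Yc.T @ Yc / (m - 1) - cov_yy  # objective x objective block
    err_xy = Xc.T @ Yc / (m - 1) - cov_xy  # explanatory x objective block
    err_mean = Y.mean(axis=0) - mean_y
    had = Y * (1.0 - Y)                    # Hadamard product Y o (1 - Y)
    loss = (w_cov * (np.abs(err_yy).sum() + np.abs(err_xy).sum())  # L1 terms
            + w_mean * np.sum(err_mean ** 2)                       # squared error
            + w_bin * np.sum(had ** 2))                            # binary penalty
    s_yy = np.sign(err_yy)                 # subgradient of the L1 norm
    grad = (w_cov * (Yc @ (s_yy + s_yy.T) + Xc @ np.sign(err_xy)) / (m - 1)
            + w_mean * 2.0 * err_mean / m
            + w_bin * 2.0 * had * (1.0 - 2.0 * Y))
    return loss, grad


def generate_objectives(X, cov_yy, cov_xy, mean_y, n_iter=500, lr=0.05,
                        w_cov=50.0, w_mean=10.0, w_bin=1.0, seed=0):
    """Predict all objective variables at once by projected gradient descent."""
    rng = np.random.default_rng(seed)
    Y = rng.uniform(0.0, 1.0, size=(X.shape[0], mean_y.shape[0]))
    history = []
    for _ in range(n_iter):
        loss, grad = loss_and_grad(Y, X, cov_yy, cov_xy, mean_y, w_cov, w_mean, w_bin)
        history.append(loss)
        Y = np.clip(Y - lr * grad, 0.0, 1.0)
    return (Y >= 0.5).astype(int), history


# Hypothetical first data source: 2 explanatory and 3 binary objective variables.
rng = np.random.default_rng(42)
X_train = rng.integers(0, 2, size=(1000, 2)).astype(float)
u = rng.random((1000, 2))
Y_train = np.column_stack([
    u[:, 0] < 0.3 + 0.4 * X_train[:, 0],
    u[:, 0] < 0.5,                         # shares u[:, 0], so correlated with column 0
    u[:, 1] < 0.2 + 0.5 * X_train[:, 1],
]).astype(float)

# Hypothetical second data source: explanatory variables only.
X_new = rng.integers(0, 2, size=(300, 2)).astype(float)
out, history = generate_objectives(
    X_new, sample_cov(Y_train, Y_train), sample_cov(X_train, Y_train), Y_train.mean(axis=0))
```

Unlike the per-variable approach, every objective column is updated jointly, so the objective-by-objective covariance is part of what is fitted.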
- In step S105, the data organization unit 1023 stores, as single-source data, the second data source supplemented with the multiple predicted values of the variables of the first data source.
- the data may be integrated with other single-source data already stored in the storage unit 1002.
- Step S106 and subsequent steps are performed by the data analysis unit 1030.
- the analysis setting unit 1031 selects a data group having variables suitable for the research purpose from the single-source data stored in the storage unit 1002.
- the selected data may include actual single-source data in addition to pseudo single-source data.
- the analysis implementation unit 1032 analyzes the single-source data to obtain the information required by the researcher, and notifies the researcher via the notification unit 1005.
- the contents of the data analysis are not particularly limited, and any method may be adopted as long as it can provide information to the researcher.
- the objective variables are predicted simultaneously while maintaining the variance-covariance structure. This reduces the problem of changes in the relationships between data, which was a problem with conventional technology, and is expected to be useful for marketing by generating highly accurate pseudo single-source data.
- Figure 5(a) is a variance-covariance matrix based on actual single-source data of 33 variables.
- Figure 5(b) is a variance-covariance matrix based on pseudo single-source data using the method of this flow.
- Figure 5(c) is a variance-covariance matrix based on data obtained by a conventional method, that is, data generated by constructing a prediction model for each variable and using random numbers to introduce variance. In each figure, the higher the correlation, the brighter the color, and the lower the correlation, the darker the color.
- similarity can also be judged numerically, either by directly comparing the variance-covariance matrices of the training data and the generated data, or by summing the absolute differences of the matrix elements over the upper triangular part including the diagonal.
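The second similarity measure above, the sum of absolute element differences over the upper triangle including the diagonal, can be sketched as follows. Because a variance-covariance matrix is symmetric, this counts each distinct element once. The function name and example matrices are hypothetical.

```python
import numpy as np


def upper_triangle_l1(A, B):
    """Sum of |A_ij - B_ij| over the upper triangle, diagonal included."""
    i, j = np.triu_indices(A.shape[0])
    return float(np.abs(A[i, j] - B[i, j]).sum())


A = np.array([[0.25, 0.10],
              [0.10, 0.20]])   # e.g. training data
B = np.array([[0.20, 0.05],
              [0.05, 0.20]])   # e.g. generated data
distance = upper_triangle_l1(A, B)
```

A smaller value means the generated data reproduce the training data's variance-covariance structure more closely; identical matrices give 0.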
- the method of judging the similarity is not limited to this, and any method can be used.
- Example 2: Next, a second embodiment of the present invention will be described.
- the same components as those in the first embodiment are denoted by the same reference numerals, and the description thereof will be simplified.
- In Example 1, to solve the problems of the conventional technology described with reference to the comparative examples, a method was described for generating pseudo single-source data that maintains a statistical structure by utilizing the variance-covariance structure between objective variables. However, depending on the nature of the data and the problem, accurate pseudo single-source data can also be generated by maintaining another statistical structure in addition to (or instead of) the variance-covariance structure.
- the variance-covariance matrix can be considered as a second-order moment.
- when the data tend to follow a normal distribution, the method described in Example 1 is well suited to generating high-quality pseudo single-source data.
- data used for marketing does not necessarily follow a normal distribution.
- the asymmetry of the distribution can be expressed using skewness (the standardized third-order moment), and the heaviness of its tails using kurtosis (the standardized fourth-order moment). By using these, pseudo single-source data that maintains a more complex statistical structure can be generated.
- the first data source of this embodiment may include continuous variables as part of the variables.
- the matrix calculation unit 1012 reads the first data source, calculates the variance-covariance matrix between multiple variables that serve as training data, the mean value of each variable, and the skewness, and indexes the relationship between the objective variables.
- In step S103, the data selection unit 1021 of the pseudo data generation unit 1020 selects a second data source different from the first data source as the target to which the model is applied.
- the second data source includes explanatory variables in the model created in step S102.
- In step S104, the model application unit 1022 applies the model and optimization calculations to the second data source to generate values for the objective variables.
- the optimization problem is defined so as to reproduce the variance-covariance matrix, mean, and skewness in the training data.
- a term that reproduces the skewness is added to equation (1) in Example 1.
- a tuning parameter ⁇ required to reproduce the skewness is added to equation (2). If the variables include continuous variables, the constraints in the third terms of equations (1) and (2) are further modified to accommodate continuous variables. With these modifications, an optimal solution that minimizes the errors in the variance-covariance matrix, mean, and skewness is searched for.
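As a sketch of the added skewness term, assuming column-wise population skewness E[(y − μ)³]/σ³ and a weighted squared error as the extra loss term (the patent's exact form of the term is not given in this text; `skew_term` and `weight` are hypothetical):

```python
import numpy as np


def skewness(Y):
    """Column-wise skewness E[(y - mu)^3] / sigma^3 (population moments).

    Undefined (division by zero) for a constant column.
    """
    Yc = Y - Y.mean(axis=0)
    m2 = np.mean(Yc ** 2, axis=0)
    m3 = np.mean(Yc ** 3, axis=0)
    return m3 / m2 ** 1.5


def skew_term(Y, target_skew, weight):
    """Hypothetical extra loss term: weighted squared error of skewness."""
    return weight * float(np.sum((skewness(Y) - target_skew) ** 2))


Y_teacher = np.array([[0.0], [0.0], [1.0]])
target = skewness(Y_teacher)
```

For a binary column with share p of ones, skewness is (1 − 2p)/√(p(1 − p)); with p = 1/3 this is 1/√2, so the term is nonzero whenever generated data are too symmetric or skewed the other way.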
- In step S105, the data organization unit 1023 saves, as single-source data, the second data source supplemented with the multiple predicted values of the variables of the first data source.
- Effects can be obtained by performing processing using statistical distribution features such as the mean, variance-covariance matrix, skewness, and kurtosis according to the nature and purpose of the data.
- Example 3: In Examples 1 and 2 above, the objective variable is predicted while the statistical structure of the training data is preserved. This generates highly accurate pseudo single-source data that retains the structure of actual single-source data.
- however, highly accurate pseudo single-source data may raise privacy concerns. This embodiment therefore provides a "differential privacy" function, which aims to make data usable while removing privacy-related information, and describes a method for avoiding privacy issues.
- by adding appropriate noise, differential privacy makes it possible to transform the first data source into data whose privacy protection can be proven mathematically, while maintaining its statistical usefulness. This works because, with appropriate noise added, nearly the same statistics are output regardless of whether a particular individual is included in the data set.
- differential privacy is a mechanism that makes it difficult to guess personal data contained in the data while maintaining the statistical structure of the training data.
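The property described above corresponds to the standard definition of ε-differential privacy, shown below for reference together with the Laplace mechanism mentioned next. The symbols are the conventional ones from the differential-privacy literature and do not come from this description. Here M is the randomized mechanism, D and D' are any two data sets differing in one individual's record, S is any set of outputs, f is the query, and Δf is its sensitivity.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]
\quad \text{for all neighboring } D, D' \text{ and all } S \subseteq \operatorname{Range}(\mathcal{M})

\mathcal{M}_{\mathrm{Lap}}(D) = f(D) + \operatorname{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
\qquad \Delta f = \max_{D, D'} \lvert f(D) - f(D') \rvert
```

A smaller ε gives stronger privacy but requires more noise.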
- the noise-adding mechanism can be chosen as appropriate by the researcher; for example, the well-known Laplace, Gaussian, or exponential mechanism can be adopted.
- for example, the matrix calculation unit 1012 applies the Laplace mechanism to add a noise value to each user's income.
- the statistical structure of the noise-added data does not deviate significantly from that of the data before noise was added. The matrix calculation unit 1012 preferably adds noise by a known method so as to satisfy the requirements for the desired level of privacy. Data noised in this way consists of approximate values that do not exist as any individual's actual data, and is therefore privacy-protected.
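The per-user income noising described above can be sketched with NumPy's `Generator.laplace`. This is a minimal sketch under several assumptions. Incomes are clipped to a hypothetical range [`lower`, `upper`]. Each record is perturbed independently (the local form of differential privacy), so the sensitivity of one value is `upper - lower` and the Laplace scale is that sensitivity divided by ε. The bounds, ε, and the simulated incomes are hypothetical.

```python
import numpy as np


def laplace_scale(lower: float, upper: float, epsilon: float) -> float:
    """Laplace scale b = sensitivity / epsilon for one value clipped to [lower, upper]."""
    return (upper - lower) / epsilon


def privatize_incomes(incomes, lower, upper, epsilon, rng):
    """Add independent Laplace noise to each user's clipped income."""
    clipped = np.clip(np.asarray(incomes, dtype=float), lower, upper)
    scale = laplace_scale(lower, upper, epsilon)
    return clipped + rng.laplace(loc=0.0, scale=scale, size=clipped.shape)


rng = np.random.default_rng(42)
incomes = rng.uniform(2_000_000, 8_000_000, size=10_000)  # hypothetical annual incomes
noisy = privatize_incomes(incomes, 0, 10_000_000, epsilon=1.0, rng=rng)

# Individual values are heavily perturbed, but the sample mean shifts only slightly.
rel_shift = abs(noisy.mean() - incomes.mean()) / incomes.mean()
```

Per-record noise at this ε is large compared with any single income, yet the noise has zero mean. Aggregate statistics such as the mean therefore remain close to their original values. This is the sense in which the statistical structure is preserved.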
- the subsequent processing can be carried out in the same manner as in Examples 1 and 2. That is, by reading the noise-added first data source and indexing the relationships between the multiple variables that serve as training data, pseudo single-source data that is both highly accurate and privacy-protected can be generated.
- the first data source corresponds to a first group for which specific data has been acquired.
- the second data source corresponds to a second group for which that specific data has not been acquired.
- the data acquired for the first group can thus be expanded and applied to the second group, for which it has not been acquired.
- conversely, when specific data has been acquired for the second group but not for the first group, the data can be expanded and applied from the second group to the first group.
- pseudo single-source data is thus generated in such a way that multiple data sources, each with missing variables, complement one another, thereby improving data integrity.
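The two-way completion described above can be sketched with ordinary least-squares regression on the variables the two groups share. Variable A is learned in group 1 and predicted for group 2, and variable B is learned in group 2 and predicted for group 1. Plain regression here is a stand-in for the model, and the moment-reproducing optimization of step S104 is omitted. Regression predictions alone have a smaller variance than the observed variable whenever the fit is imperfect. All variables and function names are hypothetical.

```python
import numpy as np


def fit_linear(common, target):
    """Least-squares fit: target ~ intercept + common @ slopes."""
    design = np.column_stack([np.ones(len(common)), common])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_linear(common, coef):
    return np.column_stack([np.ones(len(common)), common]) @ coef


rng = np.random.default_rng(3)
# Hypothetical groups sharing two common variables; group 1 observed A, group 2 observed B.
common_1 = rng.normal(size=(300, 2))
a_1 = common_1 @ np.array([1.0, -0.5]) + rng.normal(scale=0.2, size=300)
common_2 = rng.normal(size=(200, 2))
b_2 = common_2 @ np.array([0.3, 0.9]) + rng.normal(scale=0.2, size=200)

# Expand A from group 1 to group 2, and B from group 2 to group 1.
a_2 = predict_linear(common_2, fit_linear(common_1, a_1))
b_1 = predict_linear(common_1, fit_linear(common_2, b_2))

pseudo_single_source = np.vstack([
    np.column_stack([common_1, a_1, b_1]),
    np.column_stack([common_2, a_2, b_2]),
])
print(pseudo_single_source.shape)  # (500, 4)
```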
- the pseudo single-source data created by the present invention can also be used for marketing using machine learning.
- in machine learning, a prediction formula is created by focusing on the relationship between explanatory-variable data and objective-variable data.
- the variance-covariance matrix indicated by the dashed line 212 in FIG. 4(b) is also a kind of relationship between data. The pseudo single-source data obtained by the present invention can therefore be used as a data set for marketing using machine learning.
- the pseudo single-source data of the present invention maintains the complex statistical structure seen in actual single-source data, and is therefore suitable as training data for machine learning.
- in the embodiments described above, pseudo single-source data is created for users.
- the technique of the present invention can also be used to create pseudo single-source data for various objects other than users.
- the present invention can be applied to various cases where there is a first data source and a second data source that includes related variables to complement the first data source.
- for example, in the case of buildings, an appropriate database can be provided by applying other building-related data sources on the basis of a first data source related to buildings. Specifically, variables such as each building's size, structure, age, equipment, purpose of use, geographical factors, and human factors serve as variables common to the multiple data groups. Data sources such as electricity usage data, gas usage data, equipment operation log data, and power generation data are integrated on these variables to generate pseudo single-source data. This makes it possible to calculate energy usage and energy efficiency (e.g., energy usage per square meter) for each building characteristic.
- This database is suitable for optimizing energy usage, deciding on capital investment, comparing with other buildings, and considering measures to improve energy efficiency.
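Once the building data sources have been integrated, energy efficiency per characteristic reduces to a grouped ratio. The following minimal sketch computes total energy divided by total floor area for each value of one characteristic. It assumes that electricity and gas usage are already expressed in the same unit (kWh), and the field names and figures are hypothetical. Dividing totals weights large buildings more heavily; averaging each building's own ratio would weight every building equally instead.

```python
from collections import defaultdict


def energy_intensity_by(records, key):
    """Total energy use divided by total floor area, per value of `key`."""
    energy = defaultdict(float)
    area = defaultdict(float)
    for r in records:
        energy[r[key]] += r["electricity_kwh"] + r["gas_kwh"]
        area[r[key]] += r["floor_area_m2"]
    return {k: energy[k] / area[k] for k in energy}


buildings = [
    {"structure": "RC", "floor_area_m2": 1000.0, "electricity_kwh": 90000.0, "gas_kwh": 30000.0},
    {"structure": "RC", "floor_area_m2": 500.0, "electricity_kwh": 40000.0, "gas_kwh": 20000.0},
    {"structure": "steel", "floor_area_m2": 800.0, "electricity_kwh": 56000.0, "gas_kwh": 8000.0},
]
print(energy_intensity_by(buildings, "structure"))  # {'RC': 120.0, 'steel': 80.0}
```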
- similarly, for moving bodies, an appropriate database can be provided by applying other data sources on the basis of a first data source related to moving bodies. Specifically, geographic and time variables serve as variables common to the multiple data groups. Data sources such as GPS data of moving bodies, traffic congestion data, construction information data, traffic accident data, weather data, and event information data are integrated on these variables to generate pseudo single-source data.
- This database is suitable for considering urban transportation operation strategies, such as optimizing operation routes to avoid traffic congestion, measures to increase the occupancy rate of public transportation, and fare adjustments.
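Integration on common geographic and time variables can be sketched as a keyed merge. The following minimal sketch assumes that each record has already been mapped to a hypothetical grid cell and hour, and it only joins on exact keys. The combination ("B2", 8) has no congestion record, which is the kind of gap the model-based completion described earlier would fill with predicted values. All source names and fields are hypothetical.

```python
from collections import defaultdict


def integrate_by_cell_hour(*sources):
    """Merge records from several named sources keyed on (grid cell, hour)."""
    merged = defaultdict(dict)
    for name, records in sources:
        for r in records:
            key = (r["cell"], r["hour"])
            # Prefix each field with its source name to avoid collisions.
            merged[key].update({f"{name}.{k}": v for k, v in r.items() if k not in ("cell", "hour")})
    return dict(merged)


gps = [{"cell": "A1", "hour": 8, "vehicles": 120}, {"cell": "B2", "hour": 8, "vehicles": 40}]
congestion = [{"cell": "A1", "hour": 8, "jam_km": 2.5}]
weather = [{"cell": "A1", "hour": 8, "rain_mm": 3.0}, {"cell": "B2", "hour": 8, "rain_mm": 0.0}]

table = integrate_by_cell_hour(("gps", gps), ("congestion", congestion), ("weather", weather))
print(table[("A1", 8)])  # {'gps.vehicles': 120, 'congestion.jam_km': 2.5, 'weather.rain_mm': 3.0}
```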
- 10: Information processing device, 1001: Control unit, 1010: Data acquisition unit, 1020: Pseudo data generation unit
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Strategic Management (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Algebra (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Operations Research (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
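| KR1020257018013A KR20250099220A (ko) | 2022-11-04 | 2023-11-01 | Information processing method, program, storage medium, and information processing device |
| JP2024520737A JP7541212B1 (ja) | 2022-11-04 | 2023-11-01 | Information processing method, program, storage medium, and information processing device |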
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022-177646 | 2022-11-04 | | |
| JP2022177646 | 2022-11-04 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024096054A1 (ja) | 2024-05-10 |
Family
ID=90930618
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2023/039412 Ceased WO2024096054A1 (ja) | 2022-11-04 | 2023-11-01 | Information processing method, program, storage medium, and information processing device |
Country Status (3)
| Country | Link |
|---|---|
| JP (1) | JP7541212B1 |
| KR (1) | KR20250099220A |
| WO (1) | WO2024096054A1 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2016038780A (ja) * | 2014-08-08 | 2016-03-22 | 株式会社博報堂Dyホールディングス | Information processing system and program |
| JP2018156299A (ja) * | 2017-03-16 | 2018-10-04 | 株式会社ビデオリサーチ | Survey data processing device and survey data processing method |
| WO2019142597A1 (ja) * | 2018-01-19 | 2019-07-25 | ソニー株式会社 | Information processing device, information processing method, and program |
| WO2023085279A1 (ja) * | 2021-11-09 | 2023-05-19 | 株式会社博報堂Dyホールディングス | Information processing system and information processing method |
2023
- 2023-11-01: WO PCT/JP2023/039412, WO2024096054A1 (ja), Ceased
- 2023-11-01: KR KR1020257018013A, KR20250099220A (ko), Pending
- 2023-11-01: JP JP2024520737A, JP7541212B1 (ja), Active
Non-Patent Citations (3)
| Title |
|---|
| HOSHINO, TAKAHIRO: "Statistical Methods for Validity and Dataset Integration: Data Fusion and Propensity Score Adjustment Method", TRANSACTIONS OF JAPANESE SOCIETY FOR INFORMATION AND SYSTEMS IN EDUCATION, vol. 24, no. 3, 1 July 2007 (2007-07-01), JP, pages 216 - 224, XP009555124, ISSN: 1341-4135, DOI: 10.14926/jsise.24.216 * |
| MACROMILL, INC.: "Developed a unique tool that quickly generates single-source data from huge amounts of consumer panel data. Also started offering a dashboard that enables multifaceted analysis", 24 November 2021 (2021-11-24), XP093169015, Retrieved from the Internet <URL:https://prtimes.jp/main/html/rd/p/000000590.000000624.html> * |
| MIHARA, YOSHIYUKI; TAKAYAMA, YUSAKU; OBU, RYUJI; IWAMI, TETSUO; KAMIYAMA, HAJIME; MATSUMOTO, SHIGERU; TOMURO, MASAHIRO: "Development of a Cross-Industry Data Aggregation Platform Centered on Municipalities", TRANSACTIONS OF INFORMATION PROCESSING SOCIETY OF JAPAN: CONSUMER DEVICES & SYSTEMS, vol. 10, no. 1, 3 March 2020 (2020-03-03), JP, pages 15 - 25, XP009555132, ISSN: 2186-5728 * |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20250099220A (ko) | 2025-07-01 |
| JPWO2024096054A1 | 2024-05-10 |
| JP7541212B1 (ja) | 2024-08-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN118379095B (zh) | | An advertisement placement adjustment system and method |
| Giráldez‐Cru et al. | | Modeling agent‐based consumers decision‐making with 2‐tuple fuzzy linguistic perceptions |
| CN113157752B (zh) | | A science and technology resource recommendation method and system based on user profiles and context |
| Cheng et al. | | Customer lifetime value prediction by a Markov chain based data mining model: Application to an auto repair and maintenance company in Taiwan |
| CN116823498A (zh) | | A personalized insurance product recommendation engine system based on big data analysis and machine learning |
| Wang et al. | | Knowledge fusion enhanced graph neural network for traffic flow prediction |
| Zhong et al. | | Distinguishing the land use effects of road pricing based on the urban form attributes |
| JP2004355616A (ja) | | Information providing system and information processing system |
| CN118710318A (zh) | | Blockchain-based trade data processing method and system |
| JP2004234646A (ja) | | Content-related information providing device, content-related information providing method, content-related information providing system, portable terminal, and information processing system |
| CN118941330A (zh) | | A method and system for mining potential business opportunities based on POI and customer data |
| Wilson et al. | | Linking transportation agent-based model (ABM) outputs with micro-urban social types (MUSTs) via typology transfer for improved community relevance |
| KR20240019901A (ko) | | Method for forecasting product demand using an artificial-intelligence-based user behavior measurement model |
| JP7541212B1 (ja) | | Information processing method, program, storage medium, and information processing device |
| JP2005025645A (ja) | | Information providing system and information processing system |
| Kudyba | | Information creation through analytics |
| TWI813888B (zh) | | Intelligent land valuation system |
| Kumar et al. | | Handbook of research on intelligent techniques and modeling applications in marketing analytics |
| Ghanim et al. | | Enhancing Spatial Legibility through Building Layout Optimization: The Case of Erbil City Shopping Malls |
| Roumpani et al. | | Data-driven modelling of public library infrastructure and usage in the United Kingdom |
| Guzman et al. | | E-commerce adoption and its influence on the business performance of micro and small enterprises |
| Lopes et al. | | The impact of the post-purchase experience on online cosmetic consumer satisfaction: case study Pluricosmética |
| Chaudhary et al. | | Artificial intelligence-based digital marketing for discovering shopping possibilities and enhancing customer experience |
| CN118037338B (zh) | | A machine-learning-based advertisement transaction simulation and prediction method and device |
| Lanjouw | | Estimating Geographically Disaggregated Welfare |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2024520737; Country of ref document: JP; Kind code of ref document: A |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23885808; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 20257018013; Country of ref document: KR; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | WIPO information: published in national office | Ref document number: 1020257018013; Country of ref document: KR |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 23885808; Country of ref document: EP; Kind code of ref document: A1 |