WO2017092696A1

WO2017092696A1 - Method for safe integration of big data without leaking privacy

Info

Publication number: WO2017092696A1
Application number: PCT/CN2016/108245
Authority: WO
Inventors: 周雍恺; 柴洪峰; 何朔; 何东杰; 刘国宝; 才华
Original assignee: 中国银联股份有限公司
Priority date: 2015-12-02
Filing date: 2016-12-01
Publication date: 2017-06-08
Also published as: TW201727516A; CN105590066B; TWI664538B; CN105590066A

Abstract

A method for safe integration of big data, comprising: a first party and a second party negotiating associated fields, the data items required by the first party and the second party, and a sorting rule; screening out, on the basis of the data items required by the first party and the second party, a first to-be-integrated data set and a second to-be-integrated data set from a first data set and a second data set, respectively; sorting the first to-be-integrated data set and the second to-be-integrated data set according to the sorting rule, and removing from each of them the data corresponding to the associated fields; submitting the first to-be-integrated data set and the second to-be-integrated data set to a third-party computing platform so as to form an integrated data set; and the third-party computing platform generating a result data set by analyzing and calculating on the integrated data set. The invention effectively prevents private data from being leaked while accomplishing the integration of big data, facilitating information sharing on the premise of ensuring data security.

Description

Big data security fusion method without revealing privacy

Technical field

The invention relates to a big data security fusion method.

Background art

With the introduction of the national “Internet Plus” strategy, the need for big data fusion between industries is becoming more urgent. On the one hand, institutions welcome big data sharing: introducing different types of data can produce new analysis results, and the value of the data is multiplied. On the other hand, both parties worry about disclosing private data during data fusion, because although the final analysis result is often only a statistical conclusion, the fusion calculation itself requires each party to expose all the details of its data to the other. This problem has become a major obstacle to the sharing of big data between industries.

Therefore, those skilled in the art desire a reliable big data security fusion method that effectively shields private data.

Summary of the invention

It is an object of the present invention to provide a big data security fusion method that effectively shields private data.

To achieve the above object, the present invention provides a technical solution as follows:

A big data security fusion method for fusing a first data set stored by a first party with a second data set stored by a second party, the method comprising the steps of: a) the first party and the second party negotiate the associated fields, the data items each requires, and a sorting rule; b) a first to-be-fused data set and a second to-be-fused data set are filtered from the first data set and the second data set, respectively, according to the respective required data items; c) the first to-be-fused data set and the second to-be-fused data set are sorted according to the sorting rule, and the data corresponding to the associated fields is removed from each of them; d) the first party and the second party respectively submit the first to-be-fused data set and the second to-be-fused data set to a third-party computing platform to form a merged data set; e) the third-party computing platform performs analysis and calculation on the merged data set to generate a result data set.

Preferably, the third party computing platform is independent of the first party and the second party, respectively.

Preferably, after the analysis calculation is completed, the first to-be-fused data set and the second to-be-fused data set are deleted from the computing system.

The big data security fusion method provided by the embodiment of the present invention effectively prevents the leakage of private data while realizing big data fusion, promotes information sharing on the premise of ensuring data security, and broadens the breadth and depth of application of big data fusion technology. In addition, the method is simple to implement and low in implementation cost, which favors its promotion and application in the industry.

Description of the drawings

FIG. 1 is a schematic flowchart of a big data security fusion method according to a first embodiment of the present invention.

Detailed description

It should be noted that, in accordance with various embodiments of the present disclosure, the first party stores the first data set in the first database, and the second party stores the second data set in the second database.

The first and second data sets respectively record different information, such as activity information of multiple users on different occasions. The first and second data sets have an intersection of information, such as user identity information, which can be extracted as an associated field.

The present invention provides various embodiments for performing big data fusion on first and second data sets.

As shown in FIG. 1 , a first embodiment of the present invention provides a big data security fusion method, which includes the following steps:

Step S10: The first party and the second party negotiate the associated fields, the data items each requires, and the sorting rule.

Specifically, the first party and the second party conduct a negotiation session and agree on the associated fields, the respective required data items, and the sorting rule.

The respective required data items include the data items that the first party wishes to obtain indirectly from the second party through the data fusion, and the data items that the second party wishes to obtain indirectly from the first party through the data fusion. From the respective required data items, the first party and the second party can determine in the negotiation session which users' information they are concerned with, and further agree on the identity information of these users.

The associated field can represent an intersection of the information in the first and second data sets, and can be taken directly from any one or more of the following: identity information of the user; card information held by the user; and/or other identifying information that uniquely determines the user.

The sorting rule determines the order in which the to-be-fused data sets are sorted in the subsequent fusion process. Once determined, the sorting rule cannot be changed arbitrarily; it can only be changed through a separate negotiation session. The agreed sorting rule also determines the correspondence between the data items in the first and second to-be-fused data sets.

The negotiation session can be initiated by either the first party or the second party, with the other party responding. Alternatively, the negotiation session may be initiated by an independent entity module distinct from the first party and the second party: after receiving its instruction, the first party and the second party conduct the negotiation session directly and, once the session is complete, notify the entity module.
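As a minimal sketch, the outcome of a negotiation session could be recorded as an immutable object, so that the sorting rule can only change through a new session. The class name `FusionAgreement`, its field names, and the `renegotiate` helper are hypothetical illustrations and are not taken from the patent text.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple


@dataclass(frozen=True)  # frozen: the agreement cannot be modified in place
class FusionAgreement:
    """Hypothetical record of what one negotiation session agreed on."""
    associated_fields: Tuple[str, ...]   # fields shared by both data sets, e.g. a user ID
    first_party_items: Tuple[str, ...]   # data items to be taken from the first data set
    second_party_items: Tuple[str, ...]  # data items to be taken from the second data set
    user_ids: FrozenSet[str]             # users whose information both parties care about
    sort_fields: Tuple[str, ...]         # sorting rule: fields to order by
    descending: bool = False             # sorting rule: direction


def renegotiate(agreement: FusionAgreement, **changes) -> FusionAgreement:
    """Model a separate negotiation session: return a new agreement, keep the old one intact."""
    return replace(agreement, **changes)


agreement = FusionAgreement(
    associated_fields=("user_id",),
    first_party_items=("spend",),
    second_party_items=("visits",),
    user_ids=frozenset({"u1", "u2", "u3"}),
    sort_fields=("user_id",),
)
revised = renegotiate(agreement, descending=True)
```

Assigning to a field of `agreement` directly raises an error, which mirrors the requirement that the sorting rule only change through a separate session.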

Step S20: Filter the first to-be-fused data set and the second to-be-fused data set from the first data set and the second data set, respectively, according to the respective required data items.

Specifically, based on the respective required data items determined in the negotiation session, the first to-be-fused data set is filtered out from the first data set, and the second to-be-fused data set is filtered out from the second data set. It can be understood that the first to-be-fused data set and the second to-be-fused data set have the same number of data items, and that each data item in the first to-be-fused data set has a corresponding data item in the second to-be-fused data set, and vice versa.
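The filtering in step S20 might be sketched as follows. This is an illustration under stated assumptions: each party holds exactly one record per agreed user, records are plain dictionaries, and the function name `filter_to_be_fused` and all field names and values are made up for the example.

```python
def filter_to_be_fused(data_set, associated_field, user_ids, required_items):
    """Keep only the agreed users and, for each, the associated field plus the required items."""
    keep = (associated_field, *required_items)
    return [
        {name: record[name] for name in keep}
        for record in data_set
        if record[associated_field] in user_ids
    ]


# Illustrative data only; the field names and values are invented.
first_data_set = [
    {"user_id": "u2", "spend": 40.0, "card_no": "6222-0002"},
    {"user_id": "u1", "spend": 120.0, "card_no": "6222-0001"},
    {"user_id": "u9", "spend": 15.0, "card_no": "6222-0009"},
]
second_data_set = [
    {"user_id": "u1", "visits": 5, "phone": "138-0001"},
    {"user_id": "u2", "visits": 1, "phone": "138-0002"},
    {"user_id": "u7", "visits": 8, "phone": "138-0007"},
]
agreed_users = {"u1", "u2"}  # users agreed on in the negotiation session

first_to_be_fused = filter_to_be_fused(first_data_set, "user_id", agreed_users, ["spend"])
second_to_be_fused = filter_to_be_fused(second_data_set, "user_id", agreed_users, ["visits"])
```

The associated field is still present at this point because the sorting step needs it; it is removed only afterwards.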

Step S30: Sort the first to-be-fused data set and the second to-be-fused data set according to the sorting rule, and remove the data corresponding to the associated field from the first to-be-fused data set and the second to-be-fused data set respectively.

This step S30 specifically includes a sorting step and a culling step.

According to a specific implementation, the sorting step may include: the first party and the second party respectively sort the first to-be-fused data set and the second to-be-fused data set according to the sorting rule.

The culling step may include: the first party and the second party respectively remove data corresponding to the associated field from the first to-be-fused data set and the second to-be-fused data set.

By performing the culling step, the first and second to-be-fused data sets no longer include user identity information, which effectively shields the private information; and by performing the sorting step, the data items in the first and second to-be-fused data sets already have a clear one-to-one correspondence.
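The sorting and culling steps can be illustrated with a short sketch that each party would run locally. It assumes, purely for illustration, that the agreed sorting rule is "order by the associated field"; the patent does not fix a particular rule, and the function name `sort_and_cull` is hypothetical.

```python
def sort_and_cull(to_be_fused, associated_field, descending=False):
    """Sort by the associated field (assumed sorting rule), then drop that field."""
    ordered = sorted(
        to_be_fused,
        key=lambda record: record[associated_field],
        reverse=descending,
    )
    # Culling: the identity column is removed, only the row order carries the linkage.
    return [
        {name: value for name, value in record.items() if name != associated_field}
        for record in ordered
    ]


first_to_be_fused = [{"user_id": "u2", "spend": 40.0}, {"user_id": "u1", "spend": 120.0}]
second_to_be_fused = [{"user_id": "u1", "visits": 5}, {"user_id": "u2", "visits": 1}]

first_submission = sort_and_cull(first_to_be_fused, "user_id")
second_submission = sort_and_cull(second_to_be_fused, "user_id")
```

Because both parties apply the same rule to the same set of users, row i of one submission and row i of the other refer to the same user, even though neither row names that user.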

Step S40: The first party and the second party respectively submit the first to-be-fused data set and the second to-be-fused data set to a computing platform set up by a third party to form a merged data set.

Specifically, the first party submits the first to-be-fused data set obtained after performing the sorting step and the culling step to the third-party computing platform through a dedicated communication line, and the second party performs a similar operation. The third-party computing platform is independent of both the first party and the second party.

Then, following the order obtained by performing the above sorting step, the data items in the first to-be-fused data set are combined one-to-one with the data items in the second to-be-fused data set to generate new data items, thereby forming a merged data set.

The merged data set thus formed includes user activity information from both the first party and the second party, but does not include user identity information. The third party therefore cannot know which user performed these activities.
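On the third-party platform, the one-to-one combination could look like the sketch below. It assumes the two submissions arrive already sorted and culled, and that their field names do not collide; the function name `merge_by_position` is hypothetical.

```python
def merge_by_position(first_submission, second_submission):
    """Combine row i of the first submission with row i of the second."""
    if len(first_submission) != len(second_submission):
        raise ValueError("both submissions must contain the same number of data items")
    merged = []
    for first_item, second_item in zip(first_submission, second_submission):
        overlap = first_item.keys() & second_item.keys()
        if overlap:
            raise ValueError(f"field names collide: {sorted(overlap)}")
        merged.append({**first_item, **second_item})
    return merged


merged_data_set = merge_by_position(
    [{"spend": 120.0}, {"spend": 40.0}],
    [{"visits": 5}, {"visits": 1}],
)
```

The platform never sees an identity field: the only thing linking the two halves of each merged item is their position in the submitted sequences.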

Step S50: The third-party computing platform analyzes and calculates the merged data set, and generates a result data set.

In step S50, the third-party computing platform analyzes and calculates on the merged data set to generate a result data set. The result data set may be the result of statistical analysis and is completely different from the first and second to-be-fused data sets. The result data set can be fed back to the first party and the second party, neither of which can restore the original data from it.

Further, after the foregoing analysis and calculation is completed, the third-party computing platform may delete the first to-be-fused data set and the second to-be-fused data set, thereby facilitating protection of data security and privacy.
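A minimal sketch of step S50 and the subsequent deletion is given below. The statistic chosen (average spend per visit-frequency group), the threshold, and the function name `analyze` are assumptions made for the example; the patent does not prescribe a particular analysis.

```python
from collections import defaultdict
from statistics import mean


def analyze(merged_data_set, frequent_threshold=3):
    """Return an aggregate only: average spend for frequent vs. occasional visitors."""
    spend_by_group = defaultdict(list)
    for item in merged_data_set:
        group = "frequent" if item["visits"] >= frequent_threshold else "occasional"
        spend_by_group[group].append(item["spend"])
    return {group: mean(values) for group, values in spend_by_group.items()}


# Submissions as received by the platform (already sorted and culled).
first_submission = [{"spend": 120.0}, {"spend": 40.0}, {"spend": 80.0}]
second_submission = [{"visits": 5}, {"visits": 1}, {"visits": 3}]
merged_data_set = [{**a, **b} for a, b in zip(first_submission, second_submission)]

result_data_set = analyze(merged_data_set)

# After the calculation, the platform discards what the two parties submitted,
# as described in the text above.
first_submission.clear()
second_submission.clear()
```

Only `result_data_set` would be fed back to the two parties; it contains group-level averages rather than any individual data item.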

The big data security fusion method provided by the embodiment shields the user's identity information while realizing big data fusion, thereby effectively preventing leakage of private data. This method of big data fusion is safe and reliable, and simple to implement.

The foregoing embodiment may be further improved as follows. Step S10 further includes: the first party indicates to the second party a field in the first data set that relates to user privacy information or otherwise needs to be protected. Correspondingly, step S30 further includes: deleting the data corresponding to that field from the first to-be-fused data set.

Similarly, the second party may also present to the first party a field in the second data set that relates to user privacy information or a field that needs to be protected.

This improved implementation provides enhanced protection of user privacy information, and is particularly suitable for use in applications where data protection is critical.
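The improved culling step might be sketched like this, assuming a party declares extra protected fields during negotiation. The function name `cull_fields` and the example field `merchant_city` are hypothetical.

```python
def cull_fields(to_be_fused, associated_fields, protected_fields=()):
    """Drop the associated fields and any additionally declared protected fields."""
    drop = set(associated_fields) | set(protected_fields)
    return [
        {name: value for name, value in record.items() if name not in drop}
        for record in to_be_fused
    ]


# Already sorted according to the agreed sorting rule; values are invented.
first_to_be_fused = [
    {"user_id": "u1", "spend": 120.0, "merchant_city": "Shanghai"},
    {"user_id": "u2", "spend": 40.0, "merchant_city": "Beijing"},
]
first_submission = cull_fields(
    first_to_be_fused,
    associated_fields=["user_id"],
    protected_fields=["merchant_city"],  # field the first party asked to protect
)
```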

The above description is only for the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Various modifications may be made by those skilled in the art without departing from the spirit of the invention and the appended claims.

Claims

1. A big data security fusion method for fusing a first data set stored by a first party with a second data set stored by a second party, the method comprising the following steps:

a) the first party and the second party negotiate the associated fields, the data items each requires, and a sorting rule;

b) filtering, according to the respective required data items, a first to-be-fused data set and a second to-be-fused data set from the first data set and the second data set, respectively;

c) sorting the first to-be-fused data set and the second to-be-fused data set according to the sorting rule, and removing the data corresponding to the associated fields from the first to-be-fused data set and the second to-be-fused data set, respectively;

d) the first party and the second party respectively submit the first to-be-fused data set and the second to-be-fused data set to a third-party computing platform to form a merged data set;

e) the third-party computing platform analyzes and calculates on the merged data set, and generates a result data set.
2. The method of claim 1, wherein said third-party computing platform is independent of both said first party and said second party.
3. The method of claim 1, wherein said step e) further comprises:

after the analysis and calculation is completed, the first to-be-fused data set and the second to-be-fused data set are deleted from the computing system.
4. The method according to claim 1, wherein the first data set and the second data set respectively record different activity information of a plurality of users, and the associated fields include:

user identity information;

card information of the user; and/or

other identification information that uniquely identifies the user.
5. The method according to claim 4, wherein said step a) further comprises:

the first party indicates to the second party a field in the first data set that relates to user privacy information;

and said step c) further comprises:

deleting the data corresponding to the field related to the user privacy information from the first to-be-fused data set.