CN113254989B

CN113254989B - Fusion method and device of target data and server

Info

Publication number: CN113254989B
Application number: CN202110457984.6A
Authority: CN
Inventors: 李漓春; 殷山; 尹栋
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2021-04-27
Filing date: 2021-04-27
Publication date: 2022-02-15
Anticipated expiration: 2041-04-27
Also published as: CN113254989A

Abstract

The specification provides a fusion method and device for target data, and a server. In some embodiments, a first server holding first target data and a second server holding second target data share a key root parameter and a hash salt parameter. The first server processes its first data set according to a preset processing rule by using the key root parameter, the hash salt parameter and a preset first secret key derivation function to obtain a processed first data set. Similarly, the second server processes its second data set according to the preset processing rule by using a preset second secret key derivation function to obtain a processed second data set. The fusion server acquires the processed first data set and the processed second data set and performs fusion processing on them by using a preset derivation rule to obtain a fused data set. In this way, target data that is not fused is prevented from being leaked to the fusion server during fusion processing.

Description

Fusion method and device of target data and server

Technical Field

The specification belongs to the technical field of artificial intelligence, and particularly relates to a target data fusion method, a target data fusion device and a server.

Background

In some application scenarios, different data parties may each hold different kinds of feature data about the same group of users. In such cases, a fusion party is often needed to fuse the different types of feature data of the same user held by the different data parties, so as to obtain fused feature data with richer and more comprehensive features, on which further data processing may subsequently be performed.

However, during such fusion, part of the feature data is not fused. To protect their data privacy, the data parties do not want this unfused feature data to be revealed to the fusion party.

Therefore, a method is needed that can better prevent feature data that is not fused from being leaked to the fusion party during fusion.

Disclosure of Invention

The specification provides a target data fusion method, a target data fusion device and a target data fusion server, which can effectively prevent target data that is not fused from being leaked to the fusion server during fusion processing, and can also prevent leakage of the data identifiers corresponding to the target data, thereby protecting data security during fusion processing.

The fusion method, device and server of target data provided by the specification are realized as follows:

A target data fusion method is applied to a first server holding a first data set, wherein the first data set comprises a plurality of first data groups, and each first data group comprises a data identifier and first target data corresponding to the data identifier. The method comprises: acquiring a key root parameter and a hash salt parameter; processing the first data set according to a preset processing rule by using the key root parameter, the hash salt parameter and a preset first secret key derivation function to obtain a processed first data set, wherein the processed first data set comprises a plurality of processed first data groups, and each processed first data group comprises a hash value of a data identifier, a first secret key component and ciphertext data of first target data; and sending the processed first data set to a fusion server. The fusion server further receives a processed second data set and obtains a fused data set through fusion processing according to a preset derivation rule, the processed first data set and the processed second data set. The processed second data set is obtained by a second server processing its held second data set according to the preset processing rule by using the key root parameter, the hash salt parameter and a preset second secret key derivation function, and the preset first secret key derivation function is associated with the preset second secret key derivation function.

A target data fusion method is applied to a fusion server and comprises: acquiring a processed first data set and a processed second data set, wherein the processed first data set comprises a plurality of processed first data groups obtained by a first server through processing based on a preset processing rule and a preset first secret key derivation function, the processed second data set comprises a plurality of processed second data groups obtained by a second server through processing based on the preset processing rule and a preset second secret key derivation function, and the preset first secret key derivation function is associated with the preset second secret key derivation function; screening out, from the processed first data set and the processed second data set, a processed first data group and a processed second data group that contain the same data identifier hash value as a matching data pair; and performing corresponding fusion processing on the matching data pair according to a preset derivation rule to obtain a fused data set.

A target data fusion method is applied to a fusion server and comprises: obtaining a plurality of processed data sets, wherein the processed data sets are obtained by a plurality of servers, comprising at least three servers, respectively processing their held data sets based on a preset processing rule and a preset key derivation function; screening out, from the plurality of processed data sets, at least a threshold number of processed data groups that contain the same data identifier hash value as a matching data pair; and performing corresponding fusion processing on the matching data pair according to a preset derivation rule to obtain a fused data set.
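The threshold screening step for three or more servers can be sketched roughly as follows. This is an illustration under stated assumptions, not the specification's implementation: each processed data group is assumed to be a tuple whose first element is the data identifier hash value, identifiers are assumed unique within each processed data set, and the name `threshold_matches` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical representation: a processed data group is a tuple whose first
# element is the salted hash of the data identifier.
ProcessedGroup = Tuple[Any, ...]


def threshold_matches(
    processed_sets: Sequence[Sequence[ProcessedGroup]], threshold: int
) -> Dict[Any, List[Tuple[int, ProcessedGroup]]]:
    """Return the hash values held by at least `threshold` servers, with their groups."""
    buckets: Dict[Any, List[Tuple[int, ProcessedGroup]]] = defaultdict(list)
    for server_index, processed_set in enumerate(processed_sets):
        for group in processed_set:
            # Bucket every group by its identifier hash, remembering its origin server.
            buckets[group[0]].append((server_index, group))
    # Keep only the hash values whose bucket reaches the threshold.
    return {h: groups for h, groups in buckets.items() if len(groups) >= threshold}


sets = [
    [("ha", "k1a", b"e1a"), ("hb", "k1b", b"e1b")],
    [("ha", "k2a", b"e2a"), ("hc", "k2c", b"e2c")],
    [("ha", "k3a", b"e3a"), ("hb", "k3b", b"e3b")],
]
matches = threshold_matches(sets, threshold=2)
print(sorted(matches))  # ['ha', 'hb']
```

With a threshold of 2, identifier hash "hc" (held by only one server) is left out, so its group is never passed on to fusion processing.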

An apparatus for fusing target data comprises: an obtaining module configured to obtain a key root parameter and a hash salt parameter; a processing module configured to process the first data set according to a preset processing rule by using the key root parameter, the hash salt parameter and a preset first secret key derivation function to obtain a processed first data set, wherein the processed first data set comprises a plurality of processed first data groups, and each processed first data group comprises a hash value of a data identifier, a first secret key component and ciphertext data of first target data; and a sending module configured to send the processed first data set to a fusion server. The fusion server further receives a processed second data set and obtains a fused data set through fusion processing according to a preset derivation rule, the processed first data set and the processed second data set. The processed second data set is obtained by a second server processing its held second data set according to the preset processing rule by using the key root parameter, the hash salt parameter and a preset second secret key derivation function, and the preset first secret key derivation function is associated with the preset second secret key derivation function.

An apparatus for fusing target data comprises: an acquisition module configured to acquire a processed first data set and a processed second data set, wherein the processed first data set comprises a plurality of processed first data groups obtained by a first server through processing based on a preset processing rule and a preset first secret key derivation function, the processed second data set comprises a plurality of processed second data groups obtained by a second server through processing based on the preset processing rule and a preset second secret key derivation function, and the preset first secret key derivation function is associated with the preset second secret key derivation function; a screening module configured to screen out, from the processed first data set and the processed second data set, a processed first data group and a processed second data group that contain the same data identifier hash value as a matching data pair; and a fusion module configured to perform corresponding fusion processing on the matching data pair according to a preset derivation rule to obtain a fused data set.

A server comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement the relevant steps of the fusion method of target data.

A computer storage medium having stored thereon computer instructions which, when executed, implement the relevant steps of the fusion method of the target data.

In the target data fusion method, device and server provided by the specification, a first server holding a first data set containing first target data and a second server holding a second data set containing second target data can cooperatively obtain a shared key root parameter and hash salt parameter according to a preset protocol. The first server may then process the first data set according to a preset processing rule by using the key root parameter, the hash salt parameter and a preset first secret key derivation function, to obtain a processed first data set including hash values of the data identifiers, first secret key components and ciphertext data of the first target data. Similarly, the second server may process the second data set according to the preset processing rule by using the key root parameter, the hash salt parameter and a preset second secret key derivation function associated with the preset first secret key derivation function, to obtain a corresponding processed second data set. The fusion server may obtain the processed first data set and the processed second data set and perform fusion processing on them by using a preset derivation rule matched with the preset first and second secret key derivation functions, so as to obtain a fused data set in which the first target data and the second target data are fused. In this way, target data that is not fused can be effectively prevented from being leaked to the fusion server during fusion processing, the data identifiers corresponding to the target data are also protected from leakage, and data security during fusion processing is preserved.

Drawings

In order to more clearly illustrate the embodiments of the present specification, the drawings needed to be used in the embodiments will be briefly described below, and the drawings in the following description are only some of the embodiments described in the present specification, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without any creative effort.

FIG. 1 is a diagram illustrating an embodiment of a structural component of a system to which a fusion method of target data provided by an embodiment of the present specification is applied;

FIG. 2 is a diagram illustrating an embodiment of a fusion method for target data provided by an embodiment of the present specification, in an example scenario;

FIG. 3 is a flow diagram illustrating a method for fusing target data provided by an embodiment of the present disclosure;

FIG. 4 is a flow diagram illustrating a method for fusing target data provided by an embodiment of the present disclosure;

FIG. 5 is a diagram illustrating an embodiment of a fusion method for target data according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural component diagram of a server provided in an embodiment of the present description;

FIG. 7 is a schematic structural composition diagram of a target data fusion device provided in an embodiment of the present specification.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.

An embodiment of the specification provides a target data fusion method, which may be applied to a system comprising a first server, a second server and a fusion server, as shown in FIG. 1. The first server, the second server and the fusion server may be connected in a wired or wireless manner for data interaction.

In this embodiment, the first server, the second server, and the fusion server may specifically include a background server that is applied to a network platform side and is capable of implementing functions such as data transmission and data processing. Specifically, the first server, the second server and the fusion server may be, for example, an electronic device having data operation, storage function and network interaction function. Alternatively, the first server, the second server and the fusion server may also be software programs running in the electronic device and providing support for data processing, storage and network interaction. In this embodiment, the number of the servers included in the first server, the second server, and the fusion server is not particularly limited. The first server, the second server and the fusion server may be specifically one server, or may be several servers, or a server cluster formed by several servers.

In this embodiment, the first server may be a server deployed on the side of a first data party (e.g., a shopping website). The first server may hold a first data set, which may include a plurality of first data groups. Each first data group may include a data identifier and first target data corresponding to the data identifier.

The second server may be a server deployed on the side of a second data party (e.g., a credit evaluation organization). The second server may hold a second data set, which may include a plurality of second data groups. Each second data group may include a data identifier and second target data corresponding to the data identifier.

The data identifier in the first data set and the data identifier in the second data set may be partially or completely the same.

In some embodiments, the data identifier may be an identity identifier of a user object, for example, the name or account name of a sample user object, or the mobile phone number used at registration.

Accordingly, the first target data may specifically be first type feature data of the user object (for example, a shopping place of the sample user object, a shopping time of the user object, a shopping amount of the user object, and the like), and/or first type tag data of the user object (for example, a shopping website membership grade tag of the sample user object, a payment overdue tag of the shopping amount of the user object, and the like).

The second target data may specifically be a second type of feature data of the user object (e.g., monthly income data of the sample user object, occupation of the user object, age of the user object, etc.), and/or a second type of label of the user object (e.g., credit rating label of the sample user object, default record label of the user object, etc.).

The fusion server (the third server in the system) may be a server deployed on the side of a fusion party jointly trusted by the first data party and the second data party. The fusion party is responsible for providing the corresponding target data fusion service, namely finding, based on the first data set and the second data set, the first target data and the second target data corresponding to the same data identifier, and fusing them together to construct a fused data set for subsequent use.

In this example scenario, the first data party wants to use the first data set it holds, in cooperation with the second data party holding the second data set and with the assistance of the fusion party, to complete data fusion based on the two data sets and obtain a fused data set. The first data party can then use the fused data set to train a target prediction model for predicting the transaction risk of users. At the same time, both data parties require that, during data fusion, neither the values of the target data that is not fused nor the data identifiers corresponding to that target data be leaked to the fusion party.

Specifically, referring to FIG. 2, the first server may generate a training request for the target prediction model and initiate it in the system as an initialization request.

In the first stage, in response to the initialization request, the first server and the second server may cooperatively obtain a shared key root parameter and hash salt parameter based on a preset protocol. Both parameters are kept secret from the fusion server.

Specifically, for example, first, the first server may generate a first random number (which may be denoted as r1) and a second random number (which may be denoted as c1) by using a random number generator in response to the initialization request. Meanwhile, the second server may generate a third random number (which may be denoted as r2) and a fourth random number (which may be denoted as c2) using a random number generator in response to the initialization request.

Then, the first server may send the first random number and the second random number to the second server. Meanwhile, the second server may send the third random number and the fourth random number to the first server.

Then, on the first server side, the first server may combine the first random number and the third random number to obtain the key root parameter, and combine the second random number and the fourth random number to obtain the hash salt parameter.

For example, the first server may use a sum obtained by adding r1 and r2 as the key root parameter r; the sum of c1 and c2 is taken as the hash salt parameter c.

Meanwhile, on the side of the second server, the second server may adopt the same combination mode, and obtain the same key root parameter by combining the first random number and the third random number; the same hash salt parameter is obtained by combining the second random number and the fourth random number.

Thus, the first server and the second server can cooperate to obtain the shared key root parameter and the hash salt parameter. The data processing of the first stage is completed.
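The additive combination in this example can be sketched in Python. This is an illustrative sketch only: the 128-bit share size and the function names `draw_shares` and `combine_shares` are assumptions, and the exchange of shares between the two servers is simulated locally rather than sent over a network.

```python
import secrets
from typing import Tuple


def draw_shares(bits: int = 128) -> Tuple[int, int]:
    """One server's random contribution: (key root share, hash salt share)."""
    return secrets.randbits(bits), secrets.randbits(bits)


def combine_shares(own: Tuple[int, int], peer: Tuple[int, int]) -> Tuple[int, int]:
    """Combine by addition, as in r = r1 + r2 and c = c1 + c2."""
    return own[0] + peer[0], own[1] + peer[1]


# Each server draws its own shares and sends them to the other server.
first_shares = draw_shares()   # (r1, c1) on the first server
second_shares = draw_shares()  # (r2, c2) on the second server

# Each side combines locally; addition is commutative, so both obtain the same (r, c).
r_first, c_first = combine_shares(first_shares, second_shares)
r_second, c_second = combine_shares(second_shares, first_shares)
print(r_first == r_second and c_first == c_second)  # True
```

Because only the two data servers exchange shares, the fusion server never sees r or c, which matches the requirement that both parameters stay secret from it.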

As another example, the first server first generates a first random number and a second random number, and determines the first random number as the key root parameter and the second random number as the hash salt parameter.

Then, the first server may encrypt the key root parameter and the hash salt parameter with the encryption key it holds, and send the encrypted key root parameter and the encrypted hash salt parameter to the second server. Here, the first server and the second server may generate in advance, based on a corresponding key generation algorithm, a key pair consisting of an encryption key and a decryption key, with the first server using the encryption key and the second server using the decryption key.

Then, the second server may perform a decryption process using the decryption key to obtain the key root parameter and the hash salt parameter. The data processing of the first stage is completed.

Of course, it should be noted that the above-listed manner of obtaining the shared key root parameter and the hash salt parameter is only an illustrative example. In specific implementation, according to specific situations and processing requirements, other suitable manners may also be adopted to cooperatively obtain the shared key root parameter and the hash salt parameter. The present specification is not limited to these.

In the second stage, the first server may process the first data set according to the preset processing rule by using the key root parameter, the hash salt parameter and the preset first secret key derivation function, to obtain the processed first data set, which comprises a plurality of processed first data groups. Unlike a first data group, a processed first data group may include the hash value of the data identifier, a first secret key component corresponding to that data identifier, and ciphertext data of the first target data obtained by encryption with the first secret key component.

Meanwhile, the second server may process the second data set according to the preset processing rule by using the shared key root parameter, the hash salt parameter and a preset second secret key derivation function, to obtain a processed second data set, which comprises a plurality of processed second data groups. Unlike a second data group, a processed second data group may include the hash value of the data identifier, a second secret key component corresponding to that data identifier, and ciphertext data of the second target data obtained by encryption with the second secret key component.

The preset first secret key derivation function and the preset second secret key derivation function may be configured as associated functions that satisfy a specific structural relationship. Taking the same key root parameter and the same data identifier as input, they derive two associated key components, namely the first secret key component and the second secret key component, that satisfy a specific numerical relationship.

In addition, a preset derivation rule matching the preset first and second secret key derivation functions may be configured in advance. Based on this rule and the specific numerical relationship between the associated key components, the two components can be combined to obtain a decryption key that is valid for both of them.

Specifically, for example, the preset first secret key derivation function may be expressed as k1(r, ID) = hash(ID || 1 || r), and the preset second secret key derivation function, which is associated with it through a specific structural relationship, may be expressed as k2(r, ID) = hash(ID || 2 || r). Here, hash() denotes a hash operation, r denotes the key root parameter, the symbol "||" denotes concatenation, and ID denotes the data identifier. The first secret key component k1 and the second secret key component k2 computed by these functions from the same data identifier and key root parameter satisfy the numerical relationship k1 + k2 = k', where k' denotes a decryption key valid for both k1 and k2. Accordingly, the matching preset derivation rule may be expressed as: add the first secret key component and the second secret key component to obtain the corresponding decryption key.
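The example derivation functions and rule above can be written out in Python as a rough sketch. Several details are assumptions not fixed by the text: hash() is taken to be SHA-256 with its digest read as a big-endian integer, the ID and the digit are concatenated as UTF-8 bytes, and r is appended as a decimal string; `derivation_rule` is a hypothetical name.

```python
import hashlib


def _hash_to_int(data: bytes) -> int:
    # Assumption: hash() is SHA-256 and its digest is read as a big-endian integer.
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def k1(r: int, data_id: str) -> int:
    """First derivation function: hash(ID || 1 || r)."""
    return _hash_to_int(data_id.encode() + b"1" + str(r).encode())


def k2(r: int, data_id: str) -> int:
    """Second derivation function: hash(ID || 2 || r)."""
    return _hash_to_int(data_id.encode() + b"2" + str(r).encode())


def derivation_rule(first_component: int, second_component: int) -> int:
    """Preset derivation rule: k' = k1 + k2."""
    return first_component + second_component


r = 123456789  # illustrative key root parameter
comp_1, comp_2 = k1(r, "user-001"), k2(r, "user-001")
k_prime = derivation_rule(comp_1, comp_2)
print(comp_1 != comp_2, k_prime == comp_1 + comp_2)  # True True
```

The fixed digits 1 and 2 are what make the two components differ for the same ID and r, while a party without r cannot compute either of them.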

It should be understood that the above-mentioned predetermined first secret key derivation function, the predetermined second secret key derivation function, and the predetermined derivation rule are only exemplary. In specific implementation, according to specific situations and processing requirements, other types of preset first secret key derivation functions, preset second secret key derivation functions, and preset derivation rules may also be configured and used. The present specification is not limited to these.

In some embodiments, a preset first secret key derivation function, a preset second secret key derivation function and a preset derivation rule satisfying the above relationship may be configured before implementation and written into the preset protocol. The first server may then obtain the preset first secret key derivation function based on the preset protocol, the second server may obtain the preset second secret key derivation function based on the preset protocol, and the fusion server may obtain the preset derivation rule based on the preset protocol.

Alternatively, the preset protocol may be configured so that, upon receiving a trigger request, the preset first secret key derivation function, the preset second secret key derivation function and the preset derivation rule are temporarily configured online and then automatically distributed: the preset first secret key derivation function to the first server, the preset second secret key derivation function to the second server, and the preset derivation rule to the fusion server.

In the second stage, on the first server side, the first server may process, in turn, each of the first data groups in the first data set (which may be written as {(ID1[i], data1[i])}) according to the preset processing rule and the preset first secret key derivation function, to obtain a corresponding processed first data group, thereby completing the processing of the first data set and obtaining the processed first data set.

Take the processing of a current first data group among the plurality of first data groups as an example. The current first data group may be represented as (ID1[i], data1[i]), where ID1[i] denotes the current data identifier in the current first data group, and data1[i] denotes the current first target data corresponding to the current data identifier.

First, the first server may perform a hash operation on the current data identifier ID1[i] by using the hash salt parameter, to obtain a hash value of the current data identifier, denoted as h(ID1[i]). This salted hash masks and hides the real value of the current data identifier.
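A minimal sketch of the salted identifier hash h(ID1[i]) follows. The text only says the identifier is hashed using the hash salt parameter; this sketch swaps in HMAC-SHA256 keyed by the salt, which is an assumption, and `salted_id_hash` is a hypothetical name. Because both servers hold the same salt, equal identifiers yield equal hash values, which is what later lets the fusion server match groups without seeing the identifiers themselves.

```python
import hashlib
import hmac


def salted_id_hash(salt: int, data_id: str) -> str:
    """Hash a data identifier under the shared hash salt parameter.

    Assumption: HMAC-SHA256 keyed with the salt serves as the salted hash.
    """
    return hmac.new(str(salt).encode(), data_id.encode(), hashlib.sha256).hexdigest()


salt = 987654321  # illustrative shared hash salt parameter
h_first = salted_id_hash(salt, "user-001")   # computed by the first server
h_second = salted_id_hash(salt, "user-001")  # computed by the second server
print(h_first == h_second)  # True: same salt and ID give the same hash value
print(h_first == salted_id_hash(salt + 1, "user-001"))  # False: another salt changes it
```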

Next, the first server may use the preset first secret key derivation function to calculate a current first secret key component (denoted as k1) for the current first data group from the current data identifier, the key root parameter r and the like. For example, the current data identifier and the key root parameter may be concatenated in an agreed order (e.g., the data identifier followed by the key root parameter) to obtain a concatenated string, which is then input into the preset first secret key derivation function; the result of the operation is used as the current first secret key component.

Then, the first server may encrypt the current first target data by using the obtained first secret key component, to obtain ciphertext data of the current first target data, denoted as E(data1[i]). This masks and hides the actual value of the first target data.

Finally, the first server may combine the hash value of the current data identifier, the first secret key component and the ciphertext data of the current first target data into a processed current first data group corresponding to the current first data group, which may be represented as (h(ID1[i]), k1(r, ID1[i]), E(data1[i])).

In this manner, the first server may process each first data group in the first data set to obtain a corresponding processed first data group, thereby obtaining the processed first data set, which may be represented as {(h(ID1[i]), k1(r, ID1[i]), E(data1[i]))}.
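The per-group processing described above (salted hash, key component, encryption, combination) can be strung together as follows. This is a hedged sketch, not the specification's implementation, and two points are assumptions. First, the cipher is a toy SHA-256-based XOR stream cipher standing in for a real symmetric cipher (a production system would use an authenticated cipher such as AES-GCM). Second, because each processed group carries its key component in the clear, the sketch encrypts under the combined key k' = k1 + k2, which the first server can compute since it holds r, so that the published component alone does not decrypt the data. All function names are hypothetical.

```python
import hashlib
import hmac
from typing import List, Tuple


def _hash_to_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def k1(r: int, data_id: str) -> int:
    return _hash_to_int(data_id.encode() + b"1" + str(r).encode())  # hash(ID || 1 || r)


def k2(r: int, data_id: str) -> int:
    return _hash_to_int(data_id.encode() + b"2" + str(r).encode())  # hash(ID || 2 || r)


def salted_id_hash(salt: int, data_id: str) -> str:
    # Assumption: HMAC-SHA256 keyed with the salt as the salted hash h(ID).
    return hmac.new(str(salt).encode(), data_id.encode(), hashlib.sha256).hexdigest()


def toy_xor_cipher(key: int, data: bytes) -> bytes:
    """Toy stream cipher (illustration only, NOT secure): XOR with SHA-256(key || counter).

    XOR is its own inverse, so the same call encrypts and decrypts.
    """
    key_bytes = key.to_bytes(33, "big")  # k1 + k2 < 2**257 fits in 33 bytes
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key_bytes + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


ProcessedGroup = Tuple[str, int, bytes]


def process_first_data_set(
    data_set: List[Tuple[str, str]], r: int, salt: int
) -> List[ProcessedGroup]:
    """Turn {(ID1[i], data1[i])} into {(h(ID1[i]), k1(r, ID1[i]), E(data1[i]))}."""
    processed = []
    for data_id, target in data_set:
        component = k1(r, data_id)
        # Assumption: encrypt under k' = k1 + k2, computable here because this
        # server holds r; only k1 is published, so k1 alone decrypts nothing.
        combined_key = component + k2(r, data_id)
        ciphertext = toy_xor_cipher(combined_key, target.encode())
        processed.append((salted_id_hash(salt, data_id), component, ciphertext))
    return processed


first_data_set = [("user-001", "shop=A"), ("user-002", "shop=B")]
processed_first = process_first_data_set(first_data_set, r=123456789, salt=987654321)
print(len(processed_first))  # 2
```

The second server would run the same loop with k2 as its published component, which is why the two servers' ciphertexts for the same identifier end up under the same key k'.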

On the second server side, in a manner similar to the processing of the first data set by the first server, the second server may process, in turn, each of the second data groups in the second data set ({(ID2[i], data2[i])}) according to the preset processing rule and the preset second secret key derivation function, to obtain corresponding processed second data groups ((h(ID2[i]), k2(r, ID2[i]), E(data2[i]))), thereby completing the processing of the second data set and obtaining the processed second data set ({(h(ID2[i]), k2(r, ID2[i]), E(data2[i]))}).

In the third stage, the first server may send the processed first data set to the fusion server, and the second server may send the processed second data set to the fusion server.

Correspondingly, the fusion server may receive the processed first data set and the processed second data set, screen out successfully matched data pairs from them, and perform fusion processing on the matching data pairs according to the preset derivation rule, to obtain the fused data set required by the first server.

The matching data pair may be specifically understood as a combination including a processed first data group and a processed second data group, and a hash value of a data identifier included in the processed first data group in the same combination is the same as a hash value of a data identifier included in the processed second data group.

Specifically, when screening matching data pairs, the fusion server may compare the hash value of the data identifier in each processed first data group of the processed first data set with the hash values of the data identifiers in the processed second data groups of the processed second data set. When a processed second data group with the same hash value is found, it is combined with the processed first data group as a successfully matched data pair.
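The screening step amounts to an equality join on the identifier hash value. A minimal sketch follows, assuming each processed group is a tuple whose first element is the hash value and that hash values are unique within each set; `find_matching_pairs` is a hypothetical name. Unmatched groups (here "h1" and "h3") never enter a pair, so their ciphertexts are never decrypted.

```python
from typing import Any, List, Tuple

ProcessedGroup = Tuple[Any, ...]  # (identifier hash, key component, ciphertext)


def find_matching_pairs(
    processed_first: List[ProcessedGroup], processed_second: List[ProcessedGroup]
) -> List[Tuple[ProcessedGroup, ProcessedGroup]]:
    """Pair processed groups whose identifier hash values are equal."""
    # Index the second set by identifier hash for constant-time lookups.
    second_by_hash = {group[0]: group for group in processed_second}
    return [
        (group, second_by_hash[group[0]])
        for group in processed_first
        if group[0] in second_by_hash
    ]


first = [("h1", 11, b"c11"), ("h2", 12, b"c12")]
second = [("h2", 22, b"c22"), ("h3", 23, b"c23")]
print(find_matching_pairs(first, second))
# [(('h2', 12, b'c12'), ('h2', 22, b'c22'))]
```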

Specifically, when performing the fusion processing, the fusion server may fuse the processed first data group and the processed second data group included in each matching data pair, respectively, to obtain fused data groups corresponding to each matching data pair, respectively, so as to obtain a fused data set.

The specific implementation is illustrated by taking the current matching data pair among the matching data pairs as an example. The current matching data pair may be represented as ((h(ID1[i]), k1(r, ID1[i]), E(data1[i])), (h(ID2[i]), k2(r, ID2[i]), E(data2[i]))), where h(ID1[i]) = h(ID2[i]) and, accordingly, ID1[i] = ID2[i].

The fusion server may extract from the current matching data pair the first secret key component k1(r, ID1[i]) (denoted k1), the second secret key component k2(r, ID2[i]) (denoted k2), the ciphertext data E(data1[i]) of the first target data, and the ciphertext data E(data2[i]) of the second target data.

Then, according to the preset derivation rule, the first secret key component and the second secret key component are used jointly to obtain the decryption key corresponding to the current matching data pair. This decryption key can decrypt both the ciphertext data of the first target data and the ciphertext data of the second target data, which were produced by encryption with the first secret key component and the second secret key component, respectively.

Specifically, for example, the fusion server may add the first secret key component and the second secret key component according to the preset derivation rule to obtain the corresponding decryption key, denoted k'.

Further, the fusion server may use the decryption key to decrypt the ciphertext data of the first target data and of the second target data, obtaining the first target data data1[i] and the second target data data2[i] in plaintext form.

Then, the fusion server may splice the first target data and the second target data in a preset splicing order (for example, the first target data followed by the second target data) to obtain fused target data (also called spliced target data), which may be represented as data1[i]-data2[i]. Combining the fused target data with the hash value of the data identifier of the current matching data pair (h(ID1[i]), or equivalently h(ID2[i])) gives the corresponding fused data group, which may be written as (h(ID1[i]), data1[i]-data2[i]).

In this way, the fusion server obtains a fused data group for each matching data pair through fusion processing, and thus the fused data set, for example {(h(ID1[i]), data1[i]-data2[i])}.

During the fusion processing, on the one hand, the fusion server only ever sees the hash values of the data identifiers and does not hold the corresponding hash salt parameter, so it cannot learn the real values of the data identifiers.

On the other hand, for a data group that is not successfully matched, i.e., that does not form a matching data pair with a processed data group from another processed data set, the target data it contains is not fused. Because the fusion server cannot obtain the corresponding decryption key from the single key component in that data group, it cannot learn the true value of the target data in that group.

Therefore, during the fusion processing, leakage of unfused target data to the fusion server can be effectively avoided, leakage of the data identifiers corresponding to the target data can also be avoided, and data security in the fusion process is protected.

After the target data fusion is completed and the corresponding fused data set is obtained according to the above manner, the fusion server may send the fused data set to the first server based on a preset protocol.

Correspondingly, the first server receives the fused data set. The first server may then perform de-salting processing on the fused data set using the hash salt parameter it holds, obtaining a de-salted data set. The de-salted data set comprises a plurality of data groups, each comprising a data identifier and the fused target data corresponding to that data identifier.
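A salted hash cannot be inverted directly, so one plausible reading of de-salting is that the first server recomputes the salted hashes of its own data identifiers and maps each fused data group back through that lookup table. The sketch below assumes h(ID) = SHA-256(salt || ID) in hex; that form and the names `salted_hash` and `desalt` are hypothetical, not taken from this specification.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def salted_hash(salt: bytes, data_id: str) -> str:
    """Assumed form of h(ID): SHA-256 over salt || ID, hex-encoded."""
    return hashlib.sha256(salt + data_id.encode("utf-8")).hexdigest()


def desalt(fused: Iterable[Tuple[str, list]], own_ids: Iterable[str],
           salt: bytes) -> List[Tuple[str, list]]:
    """Replace each hash value in the fused data set by the data identifier
    it was computed from, using the first server's own identifiers."""
    lookup: Dict[str, str] = {salted_hash(salt, i): i for i in own_ids}
    # Hash values the first server cannot reproduce are dropped.
    return [(lookup[h], data) for h, data in fused if h in lookup]


if __name__ == "__main__":
    salt = b"shared-salt"
    fused = [(salted_hash(salt, "user-2"), [0.3, 1.0, 7.0])]
    print(desalt(fused, ["user-1", "user-2"], salt))
```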

The first server can construct a training sample set and a test sample set from the de-salted data set, and perform model training with them to obtain a target prediction model for predicting users' transaction risk that benefits from the richer fused features.

Referring to fig. 3, an embodiment of the present disclosure provides a method for fusing target data. The method is particularly applied to a first server side holding a first data set. The first data set may specifically include a plurality of first data groups, and the first data group may further include a data identifier and first target data corresponding to the data identifier. The method, when embodied, may include the following.

S301: acquiring a key root parameter and a hash salt parameter;

S302: processing the first data set by using the key root parameter, the hash salt parameter and a preset first secret key derivation function according to a preset processing rule to obtain a processed first data set; the processed first data set comprises a plurality of processed first data groups, and the processed first data groups comprise hash values of data identifiers, first secret key components and ciphertext data of first target data;

S303: sending the processed first data set to a fusion server, where the fusion server also receives the processed second data set and obtains a fused data set through fusion processing according to a preset derivation rule, the processed first data set and the processed second data set; the processed second data set is obtained by the second server processing its held second data set according to the preset processing rule using the key root parameter, the hash salt parameter and a preset second secret key derivation function; and the preset first secret key derivation function is associated with the preset second secret key derivation function.

In some embodiments, the key root parameter and the hash salt parameter may be data that is shared by the first server and the second server and is kept secret from the fusion server.

The second server holds a second data set. The second data set includes a plurality of second data groups, each of which may include a data identifier and second target data corresponding to the data identifier. The data identifiers in the first data set may partially coincide with the data identifiers in the second data set.

In some embodiments, the preset first secret key derivation function is associated with the preset second secret key derivation function. Specifically, the two functions may be configured as associated secret key derivation functions that satisfy a specific structural relationship: taking the same key root parameter and the same data identifier as input, they derive two associated key components, namely the first secret key component and the second secret key component, that satisfy a specific numerical relationship.

In addition, a preset derivation rule matching the preset first and second secret key derivation functions can be configured. Based on this rule and the specific numerical relationship between the key components, the associated key components can be used jointly to obtain a decryption key that is valid for both of them.

In some embodiments, obtaining the key root parameter and the hash salt parameter may include: in response to an initialization request, generating a first random number and a second random number, while the second server, in response to the same initialization request, generates a third random number and a fourth random number; sending the first random number and the second random number to the second server; receiving the third random number and the fourth random number sent by the second server; combining the first random number and the third random number to obtain the key root parameter; and combining the second random number and the fourth random number to obtain the hash salt parameter.
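The exchange can be sketched in Python as below. The specification does not say how the random numbers are "combined"; this sketch assumes bytewise XOR of equal-length 32-byte values, and the names `make_shares`, `combine` and `derive_parameters` are hypothetical.

```python
import secrets
from typing import Tuple

Shares = Tuple[bytes, bytes]  # (share of key root, share of hash salt)


def make_shares(n_bytes: int = 32) -> Shares:
    """One server's contribution, generated in response to the initialization
    request: a random number for the key root and one for the hash salt."""
    return secrets.token_bytes(n_bytes), secrets.token_bytes(n_bytes)


def combine(a: bytes, b: bytes) -> bytes:
    """Assumed combination rule: bytewise XOR of two equal-length random numbers."""
    if len(a) != len(b):
        raise ValueError("random numbers must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def derive_parameters(own: Shares, peer: Shares) -> Tuple[bytes, bytes]:
    """Key root = first XOR third random number; hash salt = second XOR fourth."""
    key_root = combine(own[0], peer[0])
    hash_salt = combine(own[1], peer[1])
    return key_root, hash_salt


if __name__ == "__main__":
    first_server = make_shares()   # first and second random numbers
    second_server = make_shares()  # third and fourth random numbers
    # XOR is commutative, so both servers arrive at the same parameters.
    assert derive_parameters(first_server, second_server) == \
        derive_parameters(second_server, first_server)
```

XOR was chosen because it is symmetric and leaves the result uniformly random as long as at least one contribution is uniformly random and chosen independently of the other; hashing the concatenated numbers would be an equally plausible reading of "combining".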

The initialization request may be generated and initiated by any one of the first server, the second server, or the fusion server.

In some embodiments, processing the first data set according to the preset processing rule by using the key root parameter, the hash salt parameter and the preset first secret key derivation function to obtain the processed first data set may include processing each current first data group among the plurality of first data groups in the first data set as follows: according to the preset processing rule, hashing the current data identifier in the current first data group with the hash salt parameter to obtain the hash value of the current data identifier; calling the preset first secret key derivation function to compute, from the current data identifier and the key root parameter, the current first secret key component for the current first data group; encrypting the current first target data in the current first data group with the current first secret key component to obtain the ciphertext data of the current first target data; and combining the hash value of the current data identifier, the current first secret key component and the ciphertext data of the current first target data into the processed current first data group.
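The per-group processing rule can be sketched in Python, using the additive-sharing derivation functions k1(r, ID) = hash(ID || 1 || r) and k2(r, ID) = hash(ID || 2 || r) that this specification gives as an example. Several details are assumptions: SHA-256 as the hash, key components read as 256-bit integers added modulo 2^256, h(ID) = SHA-256(salt || ID), and a toy SHA-256 keystream cipher standing in for E(). One further assumption matters for security: if the transmitted component k1 were itself the encryption key, the fusion server could decrypt unmatched groups from k1 alone, so the sketch encrypts under the combined key k' = k1 + k2 (which the first server can compute because it holds r) and sends only k1. All function names are hypothetical.

```python
import hashlib
from typing import Tuple

MOD = 2 ** 256  # assumed modulus for key components


def h(salt: bytes, data_id: str) -> str:
    """Salted hash of the data identifier (assumed: SHA-256 over salt || ID)."""
    return hashlib.sha256(salt + data_id.encode("utf-8")).hexdigest()


def _hash_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def k1(r: bytes, data_id: str) -> int:
    """k1(r, ID) = hash(ID || 1 || r), read as an integer."""
    return _hash_int(data_id.encode("utf-8") + b"1" + r)


def k2(r: bytes, data_id: str) -> int:
    """k2(r, ID) = hash(ID || 2 || r), read as an integer."""
    return _hash_int(data_id.encode("utf-8") + b"2" + r)


def xor_stream(key: int, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256(key || counter) keystream. Encrypting and
    decrypting are the same call. Illustrative only; not authenticated."""
    key_bytes = key.to_bytes(32, "big")
    out = bytearray()
    for off in range(0, len(data), 32):
        pad = hashlib.sha256(key_bytes + (off // 32).to_bytes(4, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[off:off + 32], pad))
    return bytes(out)


def process_first_group(data_id: str, data1: bytes, r: bytes,
                        salt: bytes) -> Tuple[str, int, bytes]:
    """Turn (ID, data1) into (h(ID), first key component, ciphertext)."""
    component = k1(r, data_id)
    # Assumption: encrypt under k' = k1 + k2; only k1 leaves the server.
    enc_key = (component + k2(r, data_id)) % MOD
    return h(salt, data_id), component, xor_stream(enc_key, data1)
```

The second server's rule would be the mirror image, sending k2 instead of k1 while encrypting under the same k'.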

Each first data group contained in the first data set is processed in the above manner to obtain each corresponding processed first data group, so that the processed first data set can be obtained.

In some embodiments, the preset first secret key derivation function may specifically include: a key derivation function based on additive secret sharing, or a key derivation function based on Shamir secret sharing, etc. The preset second secret key derivation function is the same as the preset first secret key derivation function in function type.

The key derivation function based on additive secret sharing is derived from an additive secret sharing algorithm, inherits its properties, and suits application scenarios with only two data parties besides the fusion server. The key derivation function based on Shamir secret sharing is derived from the Shamir secret sharing algorithm, inherits its properties, and suits application scenarios in which multiple data parties (two or more) participate besides the fusion server.

In some embodiments, where the preset first secret key derivation function is based on additive secret sharing, the configured preset first and second secret key derivation functions may be expressed as $k_1(r, \mathrm{ID}) = \mathrm{hash}(\mathrm{ID} \,\|\, 1 \,\|\, r)$ and $k_2(r, \mathrm{ID}) = \mathrm{hash}(\mathrm{ID} \,\|\, 2 \,\|\, r)$, where $k_1(\cdot)$ denotes the preset first secret key derivation function, $k_2(\cdot)$ the preset second secret key derivation function, $r$ the key root parameter, ID the data identifier, $\mathrm{hash}(\cdot)$ a hash operation, and $\|$ the splicing (concatenation) operation.

Using the preset first and second key derivation functions, the first key component $k_1$ and the second key component $k_2$ computed from the same $r$ and ID satisfy the numerical relationship $k' = k_1 + k_2$, where $k'$ denotes a decryption key valid for both $k_1$ and $k_2$.

In some embodiments, where the preset first secret key derivation function is based on Shamir secret sharing, the preset first and second secret key derivation functions are configured to have the relationship $k_1(r, \mathrm{ID}) = a + k'(r, \mathrm{ID})$ and $k_2(r, \mathrm{ID}) = 2a + k'(r, \mathrm{ID})$, where $k'(r, \mathrm{ID})$ is the function expression of the decryption key to be solved and $a$ is a random value determined by $r$ and ID. $k_1(r, \mathrm{ID})$ and $k_2(r, \mathrm{ID})$ are the function expressions (i.e., the preset first and second key derivation functions) that compute the first and second key components from $r$ and ID; they may be built on a cryptographically secure hash function.

Using this relationship, a system of equations in $a$ and $k'$ can be constructed from the two key components $k_1$ and $k_2$; solving it yields $k'$, which serves as the decryption key. For example, $a$ and $k'$ may be defined as $a = \mathrm{hash}(\mathrm{ID} \,\|\, r \,\|\, 1)$ and $k'(r, \mathrm{ID}) = \mathrm{hash}(\mathrm{ID} \,\|\, r)$.
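Solving the two relations $k_1 = a + k'$ and $k_2 = 2a + k'$ for the two unknowns is elementary: a plain difference eliminates $k'$, and a weighted difference eliminates $a$.

$$
\begin{aligned}
k_2 - k_1 &= (2a + k') - (a + k') = a,\\
2k_1 - k_2 &= (2a + 2k') - (2a + k') = k'.
\end{aligned}
$$

Under this configuration, the preset derivation rule can therefore be read as $k' = 2k_1 - k_2$. If the components are computed modulo a prime (an assumption, though common in Shamir-style schemes), the same identities hold modulo that prime.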

Of course, it should be noted that the above-mentioned predetermined first secret key derivation function and the predetermined second secret key derivation function are only exemplary.

In some embodiments, calling the preset first secret key derivation function to compute the current first secret key component for the current first data group from the current data identifier and the key root parameter may include: splicing the current data identifier and the key root parameter into a spliced character string; and substituting the spliced character string into the preset first secret key derivation function to compute the current first secret key component.

In some embodiments, when computing the current first secret key component, the current data identifier and the key root parameter may first be mapped to a data identifier character string and a key root parameter character string according to a preset character mapping rule, and the two strings are then spliced in the corresponding splicing order to obtain the spliced character string. The spliced character string is fed as input into the preset first secret key derivation function, and the current first secret key component is computed by the derivation operation.

In some embodiments, the second server holding the second data set may process it in the same way as the first server processes the first data set in the above embodiments, obtaining the processed second data set; the details are not repeated here.

The second data set comprises a plurality of second data groups, and each second data group comprises a data identifier and second target data corresponding to the data identifier. The processed second data set comprises a plurality of processed second data groups, and each processed second data group comprises a hash value of the data identifier, a second secret key component corresponding to the hash value of the data identifier, and ciphertext data of second target data.

In some embodiments, the first target data may specifically include first-type feature data corresponding to the data identifier and/or first-type label data corresponding to the data identifier. Similarly, the second target data may include second-type feature data corresponding to the data identifier and/or second-type label data corresponding to the data identifier.

Specifically, for example, the first target data may be one of the different kinds of feature data of a sample user object, or one or more kinds of label data of that object. Similarly, the second target data may be one or more other kinds of feature data, or one or more other kinds of label data, of the sample user object.

In some embodiments, after obtaining the processed first data set, the first server may send the processed first data set to the fusion server in a wired or wireless manner. Meanwhile, the second server may send the processed second data set to the fusion server in a wired or wireless manner. The fusion server may perform specific fusion processing by using the processed first data set and the processed second data set according to a preset derivation rule matched with a preset first secret key derivation function and a preset second secret key derivation function, and finally obtain a fused data set. The specific procedure of the fusion process will be described later.

The fused data set comprises a plurality of fused data groups, and the fused data groups specifically comprise hash values of data identifiers and fused target data corresponding to the hash values of the data identifiers. The fused target data may specifically be richer target data obtained by fusing two target data, i.e., the first target data and the second target data.

In some embodiments, the fusion server may provide the fused data set to a server with the corresponding usage right according to a preset protocol. For example, if the preset protocol determines that the first server has the right to use the fused data set, the fusion server may send the fused data set to the first server.

Correspondingly, the first server receives and acquires the fused data set, and performs specific data processing by using the fused data set.

In some embodiments, the first server may perform de-salting processing on the fused data set using the hash salt parameter it holds to obtain a de-salted data set. The de-salted data set can then be used for joint statistics on the target data, or for model training to obtain a corresponding target prediction model.

As can be seen from the above, in the target data fusion method provided in the embodiments of this specification, a first server holding a first data set containing first target data and a second server holding a second data set containing second target data may cooperate under a relevant protocol to generate a shared key root parameter and hash salt parameter. The first server may then process the first data set according to a preset processing rule using the key root parameter, the hash salt parameter and a preset first secret key derivation function, obtaining a processed first data set that contains the hash values of the data identifiers, the first secret key components and the ciphertext data of the first target data. Similarly, the second server may process the second data set according to the preset processing rule using the key root parameter, the hash salt parameter and a preset second secret key derivation function associated with the preset first secret key derivation function, obtaining a corresponding processed second data set. The fusion server may obtain both processed data sets and fuse them using a preset derivation rule that matches the two preset secret key derivation functions, obtaining a fused data set in which the first target data and the second target data are fused. As a result, leakage of unfused target data to the fusion server can be effectively avoided during fusion processing, leakage of the data identifiers corresponding to the target data can also be avoided, and data security during fusion is protected.

Referring to fig. 4, an embodiment of the present disclosure provides a method for fusing target data. The method is particularly applied to the side of the fusion server. The method may be embodied as follows.

S401: acquiring a processed first data set and a processed second data set, where the processed first data set comprises a plurality of processed first data groups obtained by the first server through processing based on a preset processing rule and a preset first secret key derivation function; the processed second data set comprises a plurality of processed second data groups obtained by the second server through processing based on the preset processing rule and a preset second secret key derivation function; and the preset first secret key derivation function is associated with the preset second secret key derivation function.

S402: screening out, from the processed first data set and the processed second data set, a processed first data group and a processed second data group containing the same data identifier hash value as a matching data pair.

S403: performing corresponding fusion processing on the matching data pairs according to a preset derivation rule to obtain a fused data set.

In some embodiments, screening out, from the processed first data set and the processed second data set, the processed first and second data groups whose data identifiers have the same hash value as matching data pairs may include: retrieving the hash values of the data identifiers in the processed first data set and in the processed second data set; and, for each processed first data group, finding the processed second data group with the same hash value and combining the two into a matching data pair.

In some embodiments, performing corresponding fusion processing on the matching data pairs according to the preset derivation rule may include processing each current matching data pair as follows: extracting the first secret key component, the second secret key component, the ciphertext data of the first target data and the ciphertext data of the second target data from the current matching data pair; determining the decryption key corresponding to the current matching data pair from the first and second secret key components according to the preset derivation rule; decrypting the ciphertext data of the first target data and of the second target data with the decryption key to obtain the first target data and the second target data; and fusing the first target data and the second target data according to a preset fusion rule to obtain the fused data group corresponding to the current matching data pair.
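For the additive case, the per-pair procedure can be sketched in Python as below. The sketch assumes a processed data group is a tuple (hash value, key component, ciphertext), key components are integers added modulo 2^256, both ciphertexts were produced under the combined key, a toy SHA-256 keystream cipher stands in for E(), and the preset fusion rule is splicing with a "-" separator in first-then-second order. `fuse_pair` and `xor_stream` are illustrative names only.

```python
import hashlib
from typing import Tuple

MOD = 2 ** 256  # assumed modulus for additive key components
Group = Tuple[str, int, bytes]  # (id hash, key component, ciphertext)


def xor_stream(key: int, data: bytes) -> bytes:
    """Toy SHA-256 counter-mode keystream; the same call encrypts and decrypts.
    Illustrative stand-in for E(), not an authenticated cipher."""
    key_bytes = key.to_bytes(32, "big")
    out = bytearray()
    for off in range(0, len(data), 32):
        pad = hashlib.sha256(key_bytes + (off // 32).to_bytes(4, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[off:off + 32], pad))
    return bytes(out)


def fuse_pair(g1: Group, g2: Group) -> Tuple[str, bytes]:
    """Fuse one matching data pair into a fused data group."""
    h1, comp1, ct1 = g1  # extract hash, first key component, ciphertext
    h2, comp2, ct2 = g2  # extract hash, second key component, ciphertext
    if h1 != h2:
        raise ValueError("groups do not form a matching data pair")
    k_prime = (comp1 + comp2) % MOD   # additive derivation rule: k' = k1 + k2
    data1 = xor_stream(k_prime, ct1)  # decrypt first target data
    data2 = xor_stream(k_prime, ct2)  # decrypt second target data
    return h1, data1 + b"-" + data2   # splice: first target data, then second


if __name__ == "__main__":
    key = (5 + 7) % MOD
    g1 = ("hx", 5, xor_stream(key, b"age=30"))
    g2 = ("hx", 7, xor_stream(key, b"score=0.8"))
    print(fuse_pair(g1, g2))  # ('hx', b'age=30-score=0.8')
```

For the Shamir-based two-party variant, only the `k_prime` line would change, to the rule obtained by solving the corresponding equation system.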

By the above mode, corresponding fusion processing can be performed according to each matching data pair to obtain fused data groups corresponding to each matching data pair respectively, and further obtain fused data sets.

In some embodiments, the fusing the first target data and the second target data according to a preset fusion rule to obtain a fused data group corresponding to the current matching data pair, and the specific implementation may include the following: splicing the first target data and the second target data according to a preset splicing sequence to obtain fused target data; and combining the hash value of the data identifier in the current matching data pair with the fused target data to obtain a fused data group corresponding to the current matching data pair.

In some embodiments, determining the decryption key corresponding to the current matching data pair by using the first secret key component and the second secret key component may be implemented as follows: where the preset first and second secret key derivation functions are key derivation functions based on additive secret sharing, the first secret key component and the second secret key component are added according to the preset derivation rule to obtain the decryption key corresponding to the current matching data pair.

In some embodiments, determining, according to the preset derivation rule, the decryption key corresponding to the current matching data pair by using the first secret key component and the second secret key component may include: where the preset first and second secret key derivation functions are key derivation functions based on Shamir secret sharing, constructing a corresponding system of equations from the first secret key component and the second secret key component according to the preset derivation rule; and solving the system to obtain the decryption key corresponding to the current matching data pair.

In some embodiments, after the fused data set is obtained, the method may further include: sending the fused data set to the first server, which trains a target prediction model using the fused data set.

In some embodiments, where the target prediction model is a transaction risk prediction model and the fused target data is a plurality of items of transaction feature data of a user, the method further includes: the first server collects a plurality of items of transaction feature data of a target user to be assessed; the first server calls the target prediction model to process the transaction feature data and obtain a prediction result for the target user; and the first server determines from the prediction result whether the target user poses a transaction risk. Furthermore, the first server may set a corresponding risk label for the target user accordingly, and subsequently provide matching business services to the target user according to the risk label the user carries.

Referring to fig. 5, the present specification further provides another target data fusion method applied to a fusion server. The method may be embodied as follows.

S1: obtaining a plurality of processed data sets, each obtained by one of a plurality of servers processing the data set it holds based on a preset processing rule and a preset key derivation function, where the plurality of servers comprises at least three servers;

S2: screening out, from the plurality of processed data sets, at least a threshold number of processed data groups whose data identifiers have the same hash value, as a matching data pair;

S3: performing corresponding fusion processing on the matching data pairs according to a preset derivation rule to obtain a fused data set.

In some embodiments, the preset key derivation function may specifically be a key derivation function based on Shamir secret sharing.

In some embodiments, the plurality of servers may refer to three or more servers, for example N servers: server 1, server 2, server 3, ..., server N. Each of these servers holds a data set; for example, any server t among them holds a data set {(IDt[i], datat[i])}, where t is an integer with 1 ≤ t ≤ N.

In a specific implementation, each server may process its held data set using the configured, associated preset key derivation function to obtain a corresponding processed data set (e.g., {(h(IDt[i]), kt(r, IDt[i]), E(datat[i]))}), and then provide the processed data set to the fusion server. The processed data set includes a plurality of processed data groups, each of which may include the hash value of the data identifier, the key component corresponding to that hash value, and the ciphertext data of the target data obtained by encryption with the key component.

In some embodiments, the threshold number may be set to 2, or to a value greater than 2 but not exceeding the total number of servers.

Specifically, for example, the plurality of servers is 4 servers, and the threshold number is 3. Correspondingly, the fusion server receives 4 processed data sets, which are respectively denoted as a processed data set a, a processed data set B, a processed data set C, and a processed data set D.

When the fusion server searches the hash values of the data identifiers in the 4 processed data sets, it may find, for example, that the only processed data groups containing the data identifier hash value h1 are processed data group 1 in processed data set A and processed data group 2 in processed data set C; that is, the number of such data groups is less than the threshold number 3. Therefore, no matching data pair can be screened out for the hash value h1.

As another example, the processed data groups found to contain the same data identifier hash value h2 are processed data group 17 in processed data set A, processed data group 21 in processed data set C, and processed data group 30 in processed data set D; that is, the number of such data groups is not less than the threshold number 3. Therefore, processed data groups 17, 21 and 30 can be selected and combined into one matching data pair corresponding to the data identifier hash value h2.
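The threshold screening in this example can be sketched in Python by bucketing every processed data group by hash value and keeping only buckets with at least the threshold number of members. The group layout (hash value, key component, ciphertext), the function name `threshold_match`, and the use of the key-component slot to carry the example's group numbers are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Group = Tuple[str, int, bytes]  # (id hash, key component, ciphertext), assumed layout


def threshold_match(processed_sets: Dict[str, List[Group]],
                    threshold: int) -> Dict[str, List[Tuple[str, Group]]]:
    """Return, per hash value, the groups (tagged with the name of the
    processed data set they came from) held by at least `threshold` sets."""
    buckets: Dict[str, List[Tuple[str, Group]]] = defaultdict(list)
    for set_name, groups in processed_sets.items():
        for g in groups:
            buckets[g[0]].append((set_name, g))
    # Buckets below the threshold cannot yield a decryption key, so drop them.
    return {hv: members for hv, members in buckets.items()
            if len(members) >= threshold}


if __name__ == "__main__":
    # Mirrors the example: h1 appears in A and C only, h2 in A, C and D.
    # The int slot holds the example's group numbers as placeholders.
    sets = {
        "A": [("h1", 1, b""), ("h2", 17, b"")],
        "B": [("h9", 5, b"")],
        "C": [("h1", 2, b""), ("h2", 21, b"")],
        "D": [("h2", 30, b"")],
    }
    print(threshold_match(sets, 3))  # only "h2" survives
```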

In some embodiments, performing corresponding fusion processing on the matching data pairs according to the preset derivation rule may include processing each current matching data pair as follows: extracting the plurality of key components and the ciphertext data of the plurality of target data from the current matching data pair; determining, according to the preset derivation rule, the decryption key corresponding to the current matching data pair using at least a threshold number of those key components; decrypting the ciphertext data of the plurality of target data with the decryption key to obtain the plurality of target data; and fusing the plurality of target data according to a preset fusion rule to obtain the fused data group corresponding to the current matching data pair.

In some embodiments, based on the configured relationship characteristics of the preset key derivation function, a corresponding decryption key may be obtained by using a number of key components greater than or equal to the threshold number in combination with a matching preset derivation rule. Further, the decryption key can be used for decrypting ciphertext data of the target data in the corresponding matching data pair so as to obtain a plurality of target data in the matching data pair; and then fusing the plurality of target data to obtain fused target data.

Through the above embodiment, the target data fusion method provided in this specification can also be applied in more complex data fusion scenarios in which three or more servers cooperate with a fusion server. During fusion processing, leakage of unfused target data to the fusion server can be effectively avoided, as can leakage of the data identifiers corresponding to the target data.

Embodiments of this specification further provide a server, including a processor and a memory for storing processor-executable instructions, where the processor, when executing the instructions, may perform the following steps: acquiring a key root parameter and a hash salt parameter; processing the first data set according to a preset processing rule using the key root parameter, the hash salt parameter and a preset first secret key derivation function to obtain a processed first data set, where the processed first data set comprises a plurality of processed first data groups, each comprising the hash value of a data identifier, a first secret key component and ciphertext data of first target data; and sending the processed first data set to a fusion server. The fusion server also receives a processed second data set and obtains a fused data set through fusion processing according to a preset derivation rule, the processed first data set and the processed second data set. The processed second data set is obtained by the second server processing its held second data set according to the preset processing rule using the key root parameter, the hash salt parameter and a preset second secret key derivation function, and the preset first secret key derivation function is associated with the preset second secret key derivation function.

To carry out the above instructions more precisely, referring to FIG. 6, embodiments of this specification further provide another specific server. The server includes a network communication port 601, a processor 602, and a memory 603, which are connected through internal wiring so that they can exchange data with one another.

The network communication port 601 may be specifically configured to obtain a key root parameter and a hash salt parameter.

The processor 602 may be specifically configured to: process the first data set according to a preset processing rule by using the key root parameter, the hash salt parameter, and a preset first secret key derivation function to obtain a processed first data set, where the processed first data set comprises a plurality of processed first data groups, and each processed first data group comprises a hash value of a data identifier, a first secret key component, and ciphertext data of first target data; and send the processed first data set to a fusion server. The fusion server further receives a processed second data set and obtains a fused data set through fusion processing according to a preset derivation rule, the processed first data set, and the processed second data set. The processed second data set is obtained by a second server processing the second data set it holds according to the preset processing rule by using the key root parameter, the hash salt parameter, and a preset second secret key derivation function, and the preset first secret key derivation function is associated with the preset second secret key derivation function.

The memory 603 may be specifically configured to store a corresponding instruction program.

In this embodiment, the network communication port 601 may be a virtual port bound to a particular communication protocol, so that different data can be sent or received. For example, it may be a port responsible for web data communication, FTP data communication, or mail data communication. The network communication port may also be a physical communication interface or communication chip, for example, a wireless mobile network communication chip (such as GSM or CDMA), a Wi-Fi chip, or a Bluetooth chip.

In this embodiment, the processor 602 may be implemented in any suitable manner. For example, it may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so on. This specification is not limited in this respect.

In this embodiment, the memory 603 may exist at multiple levels. In a digital system, anything that can store binary data may serve as a memory; in an integrated circuit, a circuit with a storage function but no separate physical form, such as a RAM or a FIFO, is also called a memory; and in a system, a physical storage device, such as a memory module or a TF card, is also called a memory.

This specification further provides a computer storage medium based on the above target data fusion method. The computer storage medium stores computer program instructions which, when executed, implement: acquiring a key root parameter and a hash salt parameter; processing the first data set according to a preset processing rule by using the key root parameter, the hash salt parameter, and a preset first secret key derivation function to obtain a processed first data set, where the processed first data set comprises a plurality of processed first data groups, and each processed first data group comprises a hash value of a data identifier, a first secret key component, and ciphertext data of first target data; and sending the processed first data set to a fusion server. The fusion server further receives a processed second data set and obtains a fused data set through fusion processing according to a preset derivation rule, the processed first data set, and the processed second data set. The processed second data set is obtained by a second server processing the second data set it holds according to the preset processing rule by using the key root parameter, the hash salt parameter, and a preset second secret key derivation function, and the preset first secret key derivation function is associated with the preset second secret key derivation function.

Embodiments of this specification further provide another computer storage medium based on the above target data fusion method. The computer storage medium stores computer program instructions which, when executed, implement: acquiring a processed first data set and a processed second data set, where the processed first data set comprises a plurality of processed first data groups obtained by a first server through processing based on a preset processing rule and a preset first secret key derivation function, the processed second data set comprises a plurality of processed second data groups obtained by a second server through processing based on the preset processing rule and a preset second secret key derivation function, and the preset first secret key derivation function is associated with the preset second secret key derivation function; screening out, from the processed first data set and the processed second data set, a processed first data group and a processed second data group that contain the same hash value of the data identifier as a matching data pair; and performing corresponding fusion processing on the matching data pair according to a preset derivation rule to obtain a fused data set.
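The fusion server's steps above (screening matching data pairs by identifier hash, recovering the decryption key, decrypting, and splicing) can be sketched as follows. This is a hypothetical sketch: it assumes the additive derivation rule of claim 10 over an assumed prime `P`, a toy XOR keystream cipher with per-party labels `b"first"` and `b"second"`, and a splicing order of first target data followed by second target data (as in claim 9). None of these specifics come from the specification, and the function names are illustrative.

```python
import hashlib

P = 2**127 - 1  # assumed modulus, shared with the data servers


def toy_xor(key: int, label: bytes, data: bytes) -> bytes:
    """TOY cipher: XOR with a SHA-256 counter-mode keystream, so encryption and
    decryption are the same operation. Not secure; illustration only."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key.to_bytes(16, "big") + label + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


Group = tuple[str, int, bytes]  # (salted id hash, key component, ciphertext); assumed layout


def fuse(first: list[Group], second: list[Group]) -> list[tuple[str, bytes]]:
    """Match groups by id hash, recover each key additively, decrypt, and splice."""
    second_by_hash = {h: (k2, ct2) for h, k2, ct2 in second}
    fused = []
    for id_hash, k1, ct1 in first:
        if id_hash not in second_by_hash:
            continue  # unmatched: only one component, so the key stays unknown
        k2, ct2 = second_by_hash[id_hash]
        key = (k1 + k2) % P  # additive derivation rule (assumed, claim 10 style)
        t1 = toy_xor(key, b"first", ct1)
        t2 = toy_xor(key, b"second", ct2)
        fused.append((id_hash, t1 + t2))  # splice: first target data, then second
    return fused
```

Groups whose hash appears in only one data set are skipped; with a single pseudorandom additive component the fusion server has no way to obtain the key, so that target data stays encrypted.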

In this embodiment, the storage medium includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a cache, a hard disk drive (HDD), or a memory card. The memory may be used to store computer program instructions. The network communication unit may be an interface for network connection and communication, configured according to the standard prescribed by a communication protocol.

In this embodiment, the specific functions and effects of the program instructions stored in the computer storage medium can be understood by reference to the other embodiments and are not repeated here.

Referring to FIG. 7, at the software level, embodiments of this specification further provide a target data fusion apparatus, which may specifically include the following modules:

the obtaining module 701 may be specifically configured to obtain a key root parameter and a hash salt parameter;

the processing module 702 may be specifically configured to process the first data set according to a preset processing rule by using the key root parameter, the hash salt parameter, and a preset first secret key derivation function, so as to obtain a processed first data set; the processed first data set comprises a plurality of processed first data groups, and the processed first data groups comprise hash values of data identifiers, first secret key components and ciphertext data of first target data;

a sending module 703, which may be specifically configured to send the processed first data set to the fusion server; wherein the fusion server further receives a processed second data set and obtains a fused data set through fusion processing according to a preset derivation rule, the processed first data set, and the processed second data set; the processed second data set is obtained by a second server processing the second data set it holds according to the preset processing rule by using the key root parameter, the hash salt parameter, and a preset second secret key derivation function; and the preset first secret key derivation function is associated with the preset second secret key derivation function.

It should be noted that, the units, devices, modules, etc. illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. It is to be understood that, in implementing the present specification, functions of each module may be implemented in one or more pieces of software and/or hardware, or a module that implements the same function may be implemented by a combination of a plurality of sub-modules or sub-units, or the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Embodiments of this specification further provide another target data fusion apparatus, including: an acquisition module, which may be specifically configured to acquire a processed first data set and a processed second data set, where the processed first data set comprises a plurality of processed first data groups obtained by a first server through processing based on a preset processing rule and a preset first secret key derivation function, the processed second data set comprises a plurality of processed second data groups obtained by a second server through processing based on the preset processing rule and a preset second secret key derivation function, and the preset first secret key derivation function is associated with the preset second secret key derivation function; a screening module, which may be specifically configured to screen out, from the processed first data set and the processed second data set, a processed first data group and a processed second data group that contain the same hash value of the data identifier as a matching data pair; and a fusion module, which may be specifically configured to perform corresponding fusion processing on the matching data pair according to a preset derivation rule to obtain a fused data set.

As can be seen from the above, the target data fusion device provided in the embodiments of the present specification can effectively avoid leakage of target data that is not fused to a fusion server in a fusion processing process, and can also avoid leakage of a data identifier corresponding to the target data, thereby protecting data security in the fusion processing process.

Although this specification provides method steps as described in the embodiments or flowcharts, more or fewer steps may be included based on conventional or non-inventive means. The order of steps listed in the embodiments is merely one of many possible execution orders and does not represent the only one. When an actual apparatus or client product executes the method, the steps may be executed sequentially or in parallel (e.g., in a parallel-processor or multithreaded environment, or even in a distributed data processing environment) according to the embodiments or the methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. The terms first, second, etc. are used to denote names and do not imply any particular order.

Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functions can be implemented by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.

This specification may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, and the like that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.

From the above description of the embodiments, those skilled in the art can clearly understand that this specification can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in this specification may essentially be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes several instructions for causing a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of this specification.

The embodiments in this specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. This specification is operational with numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

Although this specification has been described by way of embodiments, those skilled in the art will appreciate that it admits numerous variations and modifications without departing from its spirit, and the appended claims are intended to cover such variations and modifications.

Claims

1. A target data fusion method, applied to a first server holding a first data set, wherein the first data set comprises a plurality of first data groups, and each first data group comprises a data identifier and first target data corresponding to the data identifier; the method comprising:

acquiring a key root parameter and a hash salt parameter;

processing the first data set by using the key root parameter, the hash salt parameter and a preset first secret key derivation function according to a preset processing rule to obtain a processed first data set; the processed first data set comprises a plurality of processed first data groups, and the processed first data groups comprise hash values of data identifiers, first secret key components and ciphertext data of first target data;

sending the processed first data set to a fusion server; wherein the fusion server further receives a processed second data set; the fusion server obtains a fused data set through fusion processing according to a preset derivation rule, the processed first data set, and the processed second data set; the processed second data set is obtained by a second server processing the second data set it holds according to the preset processing rule by using the key root parameter, the hash salt parameter, and a preset second secret key derivation function; and the preset first secret key derivation function is associated with the preset second secret key derivation function.

2. The method of claim 1, wherein acquiring the key root parameter and the hash salt parameter comprises:

generating a first random number and a second random number in response to an initialization request, wherein the second server generates a third random number and a fourth random number in response to the initialization request;

sending the first random number and the second random number to a second server; receiving a third random number and a fourth random number sent by a second server;

combining the first random number and the third random number to obtain a key root parameter; and combining the second random number and the fourth random number to obtain the hash salt parameter.

3. The method according to claim 1, wherein processing the first data set according to a preset processing rule by using the key root parameter, the hash salt parameter, and a preset first secret key derivation function to obtain a processed first data set includes:

processing each current first data group of the plurality of first data groups comprised in the first data set as follows:

according to a preset processing rule, carrying out hash processing on the current data identifier in the current first data group by using a hash salt parameter to obtain a hash value of the current data identifier;

calling the preset first secret key derivation function to calculate a current first secret key component for the current first data group according to the current data identifier and the key root parameter;

encrypting the current first target data in the current first data group by using the current first secret key component to obtain the ciphertext data of the current first target data;

and combining the hash value of the current data identifier, the current first secret key component and the ciphertext data of the current first target data to obtain a processed current first data group.

4. The method of claim 3, wherein the preset first secret key derivation function comprises: a key derivation function based on additive secret sharing, or a key derivation function based on Shamir secret sharing.

5. The method according to claim 3, wherein calling the preset first secret key derivation function to calculate the current first secret key component for the current first data group according to the current data identifier and the key root parameter comprises:

splicing the current data identifier and the key root parameter to obtain a spliced character string;

and substituting the spliced character string into a preset first secret key derivation function, and calculating to obtain a current first secret key component.

6. The method of claim 1, wherein the first target data comprises: first-type feature data corresponding to the data identifier and/or first-type label data corresponding to the data identifier.

7. A target data fusion method, applied to a fusion server and comprising:

acquiring a processed first data set and a processed second data set; wherein the processed first data set comprises a plurality of processed first data groups obtained by a first server through processing based on a preset processing rule and a preset first secret key derivation function; the processed second data set comprises a plurality of processed second data groups obtained by a second server through processing based on the preset processing rule and a preset second secret key derivation function; and the preset first secret key derivation function is associated with the preset second secret key derivation function;

screening out a processed first data group and a processed second data group which contain the same data identification hash value as a matching data pair according to the processed first data set and the processed second data set;

and performing corresponding fusion processing on the matching data pair according to a preset derivation rule to obtain a fused data set.

8. The method according to claim 7, wherein performing corresponding fusion processing on the matching data pairs according to a preset derivation rule includes:

performing corresponding fusion processing on the current matching data pair in the matching data pair according to the following mode:

extracting a first secret key component, a second secret key component, ciphertext data of first target data and ciphertext data of second target data according to the current matching data pair;

determining a decryption secret key corresponding to the current matching data pair by using the first secret key component and the second secret key component according to a preset derivation rule;

decrypting the ciphertext data of the first target data and the ciphertext data of the second target data by using the decryption key to obtain first target data and second target data;

and fusing the first target data and the second target data according to a preset fusion rule to obtain a fused data group corresponding to the current matching data pair.

9. The method according to claim 8, wherein fusing the first target data and the second target data according to the preset fusion rule to obtain the fused data group corresponding to the current matching data pair comprises:

splicing the first target data and the second target data according to a preset splicing sequence to obtain fused target data;

and combining the hash value of the data identifier in the current matching data pair with the fused target data to obtain a fused data group corresponding to the current matching data pair.

10. The method according to claim 8, wherein determining, according to a preset derivation rule, a decryption key corresponding to the current matching data pair by using the first secret key component and the second secret key component, comprises:

in a case where the preset first secret key derivation function and the preset second secret key derivation function are key derivation functions based on additive secret sharing, adding the first secret key component and the second secret key component according to the preset derivation rule to obtain the decryption key corresponding to the current matching data pair.

11. The method according to claim 8, wherein determining, according to a preset derivation rule, a decryption key corresponding to the current matching data pair by using the first secret key component and the second secret key component, comprises:

in a case where the preset first secret key derivation function and the preset second secret key derivation function are key derivation functions based on Shamir secret sharing, constructing a corresponding system of equations by using the first secret key component and the second secret key component according to the preset derivation rule, and solving the system of equations to obtain the decryption key corresponding to the current matching data pair.

12. The method of claim 9, further comprising, after obtaining the fused data set:

sending the fused data set to the first server, wherein the first server trains a target prediction model by using the fused data set.

13. A target data fusion method, applied to a fusion server and comprising:

obtaining a plurality of processed data sets; wherein the processed data sets are obtained by a plurality of servers respectively processing the data sets they hold based on a preset processing rule and a preset key derivation function, and the plurality of servers comprises at least three servers;

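screening out, according to the plurality of processed data sets, at least a threshold number of processed data groups that contain the same hash value of the data identifier as a matching data pair; and

performing corresponding fusion processing on the matching data pair according to a preset derivation rule to obtain a fused data set.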

14. The method according to claim 13, wherein performing corresponding fusion processing on the matching data pairs according to a preset derivation rule includes:

extracting a plurality of key components and ciphertext data of a plurality of target data from the current matching data pair;

determining a decryption key corresponding to the current matching data pair by using at least a threshold number of key components in the key components according to a preset derivation rule;

decrypting the ciphertext data of the plurality of target data by using the decryption key to obtain a plurality of target data;

and fusing the plurality of target data according to a preset fusion rule to obtain a fused data group corresponding to the current matching data pair.

15. An apparatus for fusing target data, comprising:

the obtaining module is used for obtaining a key root parameter and a hash salt parameter;

the processing module is used for processing the first data set according to a preset processing rule by using the key root parameter, the hash salt parameter, and a preset first secret key derivation function to obtain a processed first data set; the processed first data set comprises a plurality of processed first data groups, and each processed first data group comprises a hash value of a data identifier, a first secret key component, and ciphertext data of first target data;

the sending module is used for sending the processed first data set to a fusion server; wherein the fusion server further receives a processed second data set; the fusion server obtains a fused data set through fusion processing according to a preset derivation rule, the processed first data set, and the processed second data set; the processed second data set is obtained by a second server processing the second data set it holds according to the preset processing rule by using the key root parameter, the hash salt parameter, and a preset second secret key derivation function; and the preset first secret key derivation function is associated with the preset second secret key derivation function.

16. An apparatus for fusing target data, comprising:

the acquisition module is used for acquiring a processed first data set and a processed second data set; wherein the processed first data set comprises a plurality of processed first data groups obtained by a first server through processing based on a preset processing rule and a preset first secret key derivation function; the processed second data set comprises a plurality of processed second data groups obtained by a second server through processing based on the preset processing rule and a preset second secret key derivation function; and the preset first secret key derivation function is associated with the preset second secret key derivation function;

the screening module is used for screening out the processed first data group and the processed second data group which contain the same data identification hash value as a matching data pair according to the processed first data set and the processed second data set;

and the fusion module is used for carrying out corresponding fusion processing on the matching data pair according to a preset derivation rule so as to obtain a fused data set.

17. A server comprising a processor and a memory for storing processor-executable instructions that, when executed by the processor, implement the steps of the method of any one of claims 1 to 6, 7 to 12, or 13 to 14.

18. A computer storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 6, 7 to 12, or 13 to 14.