CN115118448B - Data processing method, device, equipment and storage medium

Info

Publication number: CN115118448B
Application number: CN202210427247.6A
Authority: CN
Inventors: 范晓亮; 蒋杰; 杨昱睿; 刘煜宏; 陈鹏; 陶阳宇
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-04-21
Filing date: 2022-04-21
Publication date: 2023-09-01
Anticipated expiration: 2042-04-21
Also published as: CN115118448A

Abstract

The application provides a data processing method, device, equipment and storage medium, and relates to the technical field of data processing. The method comprises the following steps: a first server determines M data groups of a first attribute data set; for each data group, the first server sends a plurality of data identification sets of the data group and a first ciphertext to a second server, wherein the second server holds a second attribute data set of the data object and is used for determining a plurality of second ciphertexts of the data group. Each second ciphertext is determined by the second server according to a preset aggregation algorithm, one of the plurality of data identification sets, the first ciphertext and the second attribute data set, and each data identification set consists of the data identifiers corresponding to first attribute data in an unmarked state randomly selected by the first server from the data group. The first server receives the plurality of second ciphertexts of the data group sent by the second server, and obtains an aggregation result of the second attribute data of the data group according to the aggregation algorithm, a private key and the plurality of second ciphertexts of the data group.

Description

Data processing method, device, equipment and storage medium

Technical Field

The embodiment of the application relates to the technical field of data processing, in particular to a data processing method, a device, equipment and a storage medium.

Background

Currently, in many data processing application scenarios, data is stored and maintained independently by different data parties (e.g., different institutions, or different departments of the same institution), and different data parties often hold different types of data for the same object. For example, data party A holds the names of data objects, and data party B holds an attribute score for the same data objects.

Grouping aggregation refers to splitting a data set into groups according to some criteria, applying some aggregation function or method to each group, and integrating the resulting new value into the result object. Due to privacy protection, data protection and other factors, data scattered on different data parties cannot be directly concentrated together for grouping and aggregation. When data of two data parties are grouped and aggregated to perform joint data statistics, the data of each data party needs privacy protection and cannot be revealed to the other party.

Therefore, a data processing method is needed that can perform grouping and aggregation over the different types of data that two data parties each hold for the same data objects, while protecting the data privacy of both parties.

Disclosure of Invention

The application provides a data processing method, device, equipment and storage medium, which perform grouping and aggregation using the different types of data that two data parties each hold for the same data objects, while protecting the data privacy of both parties and improving the security of data processing.

In a first aspect, the present application provides a data processing method, the method being applied to a first server, the first server holding a first attribute dataset of a data object, comprising:

determining M data sets of the first attribute data set, wherein each data set comprises identical first attribute data and data identifiers corresponding to the first attribute data, and M is a positive integer;

for each data set in the M data sets, sending a plurality of data identification sets of the data sets and a first ciphertext encrypted by using a public key to a second server, wherein the second server holds a second attribute data set of the data object and is used for determining a plurality of second ciphertexts of the data sets, and each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identification set in the plurality of data identification sets, the first ciphertext and the second attribute data set, and one data identification set consists of data identifications corresponding to first attribute data in an unmarked state randomly selected by the first server from the data sets;

sequentially receiving the plurality of second ciphertexts of the data set sent by the second server;

and obtaining an aggregation result of the second attribute data of the data set according to the aggregation algorithm, the private key and the plurality of second ciphertexts of the data set.

In a second aspect, the present application provides a data processing method, the method being applied to a second server, the second server holding a second attribute dataset of a data object, comprising:

receiving a plurality of data identification sets of a data set and a first ciphertext encrypted by a public key, wherein the data set is any one of M data sets of the first attribute data set, the data sets comprise the same first attribute data and data identifications corresponding to each first attribute data, M is a positive integer, and one data identification set consists of data identifications corresponding to first attribute data in an unmarked state randomly selected by the first server from the data sets;

determining a plurality of second ciphertexts of the data set, wherein each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identification set in the plurality of data identification sets, the first ciphertext and the second attribute data set;

and sequentially sending the plurality of second ciphertexts of the data set to the first server, wherein the second ciphertexts are used by the first server to obtain an aggregation result of the second attribute data of the data set according to the aggregation algorithm, a private key and the plurality of second ciphertexts of the data set.

In a third aspect, the present application provides a data processing apparatus holding a first set of attribute data for a data object, the apparatus comprising:

the determining module is used for determining M data sets of the first attribute data set, each data set comprises identical first attribute data and data identifiers corresponding to the first attribute data, and M is a positive integer;

the acquisition module is used for sending a plurality of data identification sets of the data sets and first ciphertext encrypted by using a public key to a second server for each data set in the M data sets, wherein the second server holds a second attribute data set of the data object and is used for determining a plurality of second ciphertext of the data sets, each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identification set in the plurality of data identification sets, the first ciphertext and the second attribute data set, and one data identification set consists of data identifications corresponding to first attribute data in an unmarked state selected randomly by the first server from the data sets;

the receiving module is used for sequentially receiving the plurality of second ciphertexts of the data set sent by the second server;

and the processing module is used for obtaining an aggregation result of the second attribute data of the data group according to the aggregation algorithm, the private key and the plurality of second ciphertext of the data group.

In a fourth aspect, the present application provides a data processing apparatus holding a second set of attribute data for a data object, the apparatus comprising:

the receiving module is used for receiving a plurality of data identification sets of a data set sent by a first server and a first ciphertext encrypted by a public key, wherein the first server holds a first attribute data set of the data object, the data set is any one of M data sets of the first attribute data set, the data sets comprise identical first attribute data and data identifications corresponding to each first attribute data, M is a positive integer, and one data identification set consists of data identifications corresponding to first attribute data in an untagged state selected randomly by the first server from the data sets;

the determining module is used for determining a plurality of second ciphertexts of the data set, and each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identification set in the plurality of data identification sets, the first ciphertext and the second attribute data set;

and the sending module is used for sequentially sending the plurality of second ciphertexts of the data set to the first server, so that the first server obtains an aggregation result of the second attribute data of the data set according to the aggregation algorithm, a private key and the plurality of second ciphertexts of the data set.

In a fifth aspect, the present application provides a data processing apparatus comprising: a processor and a memory for storing a computer program, the processor being for invoking and running the computer program stored in the memory to perform the method of the first or second aspect.

In a sixth aspect, the present application provides a computer-readable storage medium storing a computer program that causes a computer to perform the method of the first or second aspect.

In a seventh aspect, the application provides a computer program product comprising a computer program which, when executed by a processor, implements the method of the first or second aspect.

In summary, in the present application, the first server first determines M data groups of the first attribute data set. For each of the M data groups, the first server obtains a plurality of second ciphertexts of the data group from the second server, where each second ciphertext is determined by the second server according to a preset aggregation algorithm, a data identification set, a first ciphertext encrypted by the first server using a public key, and the second attribute data set, and each data identification set consists of the data identifiers corresponding to first attribute data in an unmarked state randomly selected by the first server from the data group. Finally, the first server obtains the aggregation result of the second attribute data of the data group according to the aggregation algorithm, a private key, and the plurality of second ciphertexts of the data group. Because the first server and the second server exchange only the encrypted first ciphertext and second ciphertexts, and because the data identifiers corresponding to the first attribute data of each data group are not sent to the second server all at once but in multiple batches, each batch being a randomly selected set of data identifiers (namely, a data identification set), the second server cannot learn the grouping information of the first attribute data set, the first server cannot learn the details of the second attribute data set, and the first server obtains the aggregation result of the second attribute data of each of the M data groups. Therefore, grouping aggregation is achieved while the data privacy of both data parties is protected and the security of data processing is improved.

Further, in the application, using a homomorphic encryption algorithm further ensures the correctness of the grouping aggregation result and the data security of the interaction process.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a data processing process according to an embodiment of the present application;

FIG. 3 is an interactive flowchart of a data processing method according to an embodiment of the present application;

FIG. 4 is an interactive flowchart of a data processing method according to an embodiment of the present application;

FIG. 5 is a flowchart illustrating a data processing method according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 8 is a schematic block diagram of a data processing apparatus 700 provided by an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Before the technical scheme of the application is introduced, the related knowledge of the application is introduced as follows:

1. Cloud computing is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to acquire computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". From the user's point of view, resources in the cloud appear infinitely expandable: they can be acquired at any time, used on demand, expanded at any time and paid for according to use.

A basic capability provider of cloud computing establishes a cloud computing resource pool (a cloud platform for short, generally called an IaaS (Infrastructure as a Service) platform), in which multiple types of virtual resources are deployed for external clients to select and use.

According to logical function division, a PaaS (Platform as a Service) layer can be deployed on the IaaS (Infrastructure as a Service) layer, and a SaaS (Software as a Service) layer can be deployed on the PaaS layer, or SaaS can be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or a web container. SaaS covers a wide variety of business software, such as web portals and bulk SMS senders. Generally, SaaS and PaaS are upper layers relative to IaaS.

2. A database can be regarded as an electronic filing cabinet, that is, a place for storing electronic files, in which users can add, query, update and delete data. A "database" is a collection of data that is stored together in a certain manner, can be shared by multiple users, has as little redundancy as possible, and is independent of applications. A database management system (DBMS) is a computer software system designed for managing databases, and generally has basic functions such as storage, retrieval, security and backup. Database management systems may be classified according to the database model they support, e.g., relational or XML (Extensible Markup Language); by the type of computer supported, e.g., server cluster or mobile phone; by the query language used, e.g., SQL (Structured Query Language) or XQuery; by performance emphasis, e.g., maximum scale or maximum speed; or by other classification means. Regardless of the classification used, some DBMSs span categories, for example by supporting multiple query languages simultaneously.

3. Homomorphic encryption is a cryptographic technique in which processing homomorphically encrypted data yields an output that, once decrypted, is the same as the result obtained by processing the unencrypted original data in the same way.

4. Grouping, which may also be referred to as data grouping, divides the data in a data table of a database into groups according to a certain column or a certain row.
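As an illustration of item 3, the additive case of the homomorphic property can be written as follows. The symbols Enc, Dec, pk, sk and the operator ⊕ are notation assumed here and do not come from the application. Paillier encryption is one well-known scheme with this property, where ⊕ is multiplication of ciphertexts modulo n² and the sum is taken modulo n.

```latex
\mathrm{Dec}_{sk}\bigl(\mathrm{Enc}_{pk}(a) \oplus \mathrm{Enc}_{pk}(b)\bigr) \equiv a + b \pmod{n}
```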

Referring to FIG. 1, FIG. 1 is a schematic diagram of the architecture of a data processing system according to an embodiment of the present application. The data processing system includes a first server 10 and a second server 20, which may be connected directly or indirectly through wired or wireless communication; this is not limited herein.

The first server 10 or the second server 20 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

Alternatively, the first server 10 or the second server 20 in this embodiment may be any other computing device having computing capabilities, for example, a terminal. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc.

In the application scenario of the embodiment of the present application, the data object is divided into a plurality of data groups according to the first attribute data set of the data object on the premise that the first server 10 and the second server 20 cannot acquire the attribute data of the data object held by the opposite party, and the second attribute data of the data object in each data group is aggregated according to a preset aggregation algorithm, so as to obtain an aggregation result of the second attribute data of each data group. The first attribute data set is a set formed by the first attribute data, and the second attribute data set is a set formed by the second attribute data.

In some embodiments, the data objects may be user objects, and in different application scenarios, the data objects may also be other types of data objects, such as merchandise objects, order objects, and so on, each of which may be specifically represented using an identification of the object. In the embodiment of the present application, a scenario in which a data object is a user object is taken as an example, and for other types of data objects, similar to the user object, details are not repeated.

Illustratively, Table one below shows the first attribute data set (a1-ai) of the data objects stored by the first server, which corresponds to data identifiers ID1-IDN, and Table two shows the second attribute data set (b1-bN) of the data objects stored by the second server, which likewise corresponds to data identifiers ID1-IDN. That is, the first attribute data set and the second attribute data set correspond to the same group of data identifiers.

Table one

Table two

Data identification	Second attribute data set
ID1	b1
ID2	b2
……	……
IDN	bN

More specifically, taking students as the data objects, the first server 10 stores the names of the students (i.e., the first attribute data set of the students), as shown in Table three below:

Table three: names of students

Data identification	Name
1	a
2	a
3	b
4	b
5	a
6	b
7	c
8	c
9	a

The second server 20 stores the examination results of the students (i.e., the second attribute data set of the students), as shown in Table four below:

Table four: examination results of students

The first attribute data set and the second attribute data set of the students correspond to the same group of data identifiers (i.e., 1-9). In the embodiment of the present application, on the premise that neither the first server 10 nor the second server 20 can acquire the attribute data held by the other party, the data objects are divided into a plurality of data sets according to the first attribute data set (here, three data sets for student a, student b and student c, according to the student names), and the second attribute data of the data objects in each data set is aggregated according to a preset aggregation algorithm to obtain the aggregation result of the second attribute data of each data set. For example, if the aggregation algorithm is summation, the result is the score sum of each student. FIG. 2 is a schematic diagram of a data processing process provided in an embodiment of the present application. As shown in FIG. 2, the first attribute data set is divided into three data sets for student a, student b and student c according to the student names, and the scores of each student are then summed to obtain the aggregation result of the second attribute data of each data set: the score sum of student a is 340, that of student b is 260, and that of student c is 130.
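For reference, the plaintext computation that the protocol is meant to reproduce can be sketched as follows. This is a minimal sketch that ignores privacy entirely: both tables are combined in one place, which is exactly what the method avoids. The names come from Table three; the individual scores are hypothetical values, chosen only so that the per-student sums match the totals stated for FIG. 2 (340, 260 and 130), because the contents of Table four are not reproduced here.

```python
from collections import defaultdict

# Table three: data identification -> student name (taken from the text).
names = {1: "a", 2: "a", 3: "b", 4: "b", 5: "a", 6: "b", 7: "c", 8: "c", 9: "a"}

# Hypothetical scores (NOT from the source), chosen so the sums match FIG. 2.
scores = {1: 80, 2: 90, 3: 85, 4: 95, 5: 85, 6: 80, 7: 60, 8: 70, 9: 85}


def group_sum(names, scores):
    """Group data identifications by name and sum the scores in each group."""
    totals = defaultdict(int)
    for data_id, name in names.items():
        totals[name] += scores[data_id]
    return dict(totals)


totals = group_sum(names, scores)
print(totals)
```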

It should be noted that Table one and Table two are only examples, and the same data object may have more attribute data sets. For example, the first server may hold two attribute data sets of the data object, and the second server may hold three attribute data sets of the data object, which is not limited in this embodiment of the present application.

To solve the above technical problem, the embodiments of the present application perform grouping aggregation using the different types of data that two data parties each hold for the same data objects, thereby protecting the data privacy of both parties and improving the security of data processing. In the embodiment of the present application, the first server 10 holds a first attribute data set of a data object, and the second server 20 holds a second attribute data set of the same data object. The first server 10 determines M data sets of the first attribute data set. For each of the M data sets, the first server 10 obtains a plurality of second ciphertexts of the data set from the second server 20, where each second ciphertext is determined by the second server 20 according to a preset aggregation algorithm, a data identification set, a first ciphertext encrypted by the first server 10 using a public key, and the second attribute data set, and each data identification set consists of the data identifiers corresponding to first attribute data in an unmarked state randomly selected by the first server 10 from the data set. Finally, the first server 10 obtains an aggregation result of the second attribute data of the data set according to the aggregation algorithm, a private key, and the plurality of second ciphertexts of the data set. Because the first server 10 and the second server 20 exchange only the encrypted first ciphertext and second ciphertexts, and because the data identifiers corresponding to the first attribute data of each data set are not sent to the second server 20 all at once but in multiple batches, each batch being a randomly selected set of data identifiers (i.e., a data identification set), the second server 20 cannot learn the grouping information of the first attribute data set, the first server 10 cannot learn the details of the second attribute data set, and the first server 10 obtains the aggregation result of the second attribute data of each of the M data sets. Therefore, grouping aggregation is achieved while the data privacy of both data parties is protected and the security of data processing is improved.

The technical scheme of the application is described in detail below:

FIG. 3 is an interaction flowchart of a data processing method according to an embodiment of the present application. In this embodiment, a first server holds a first attribute data set of a data object, and a second server holds a second attribute data set of the same data object. As shown in FIG. 3, the method includes the following steps:

S101, a first server determines M data sets of a first attribute data set, wherein each data set comprises identical first attribute data and data identifiers corresponding to the first attribute data, and M is a positive integer.

Specifically, in the embodiment of the present application, the first attribute data set and the second attribute data set correspond to the same group of data identifiers, and the method of this embodiment is executed on this basis. If the first attribute data set held by the first server and the second attribute data set held by the second server have not yet been intersected on their data identifiers, then optionally, before S101, the method of this embodiment may further include:

The first server generates a group of data identifiers corresponding to both the first attribute data set and the second attribute data set, and the second server generates the same group of data identifiers. For example, after a secure join, the first server and the second server each hold a group of data identifiers corresponding to both attribute data sets. For example, a virtual table T may be provided, where table T contains only the intersection of the first attribute data set and the second attribute data set, and each record of the virtual table T is linked through the Id column of the two attribute data sets (either a single Id or a composite Id). On the basis of table T, the method of this embodiment is performed to carry out federated grouping, and an aggregation operation is performed within each group.
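Leaving aside how the secure join itself is carried out, the effect of this alignment step can be sketched in plaintext as follows. This is only an illustration of what table T contains, not a secure protocol: a real deployment would compute the intersection without either party revealing its identifiers. The function name `align_ids` and the sample data are hypothetical.

```python
def align_ids(first_ids, second_ids):
    """Return the sorted data identifications present in both attribute data sets."""
    return sorted(set(first_ids) & set(second_ids))


# Hypothetical data held by each party before alignment.
first_attr = {1: "a", 2: "a", 3: "b", 4: "c"}
second_attr = {2: 90, 3: 85, 4: 70, 5: 60}

# Virtual table T: only the identifiers common to both parties.
table_t = align_ids(first_attr, second_attr)

# Each party keeps only its own column, restricted to table T.
first_view = {i: first_attr[i] for i in table_t}
second_view = {i: second_attr[i] for i in table_t}
```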

The first server determines M data sets of the first attribute data set, where each data set includes identical first attribute data and the data identifier corresponding to each first attribute data. Taking the first attribute data set shown in Table one as an example, the first attribute data a1 together with the data identifiers ID1 and ID2 corresponding to the two occurrences of a1 forms one data set.
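Step S101 can be pictured with the student names of Table three. The sketch below is a minimal illustration run locally on the first server; the function name `determine_groups` is hypothetical.

```python
from collections import defaultdict


def determine_groups(first_attribute_data):
    """Map each distinct first-attribute value to the list of its data identifications."""
    groups = defaultdict(list)
    for data_id, value in first_attribute_data.items():
        groups[value].append(data_id)
    return dict(groups)


# Table three: data identification -> student name.
names = {1: "a", 2: "a", 3: "b", 4: "b", 5: "a", 6: "b", 7: "c", 8: "c", 9: "a"}

groups = determine_groups(names)
M = len(groups)  # number of data sets
```

Here each key is one value of the first attribute data and each list holds the data identifications of that data set, so M equals the number of distinct names.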

S102, for each data group in M data groups, the first server sends a plurality of data identification sets of the data groups and a first ciphertext encrypted by a public key to the second server, wherein one data identification set consists of data identifications corresponding to first attribute data in an unmarked state randomly selected from the data groups by the first server.

Specifically, for each of the M data sets, the aggregation result of the second attribute data of each data set may be obtained through S102-S105.

In one implementation manner, for each of the M data sets, sending, to the second server, a plurality of data identification sets of the data sets and a first ciphertext encrypted using the public key may specifically include:

S1021, the first server determines one of the plurality of data identification sets in a first manner, where the first manner is: randomly selecting a plurality of first attribute data in the unmarked state from the data group, and forming a data identification set D from the data identifiers corresponding to the selected first attribute data.

Optionally, the first server may randomly select the plurality of first attribute data in the unmarked state from the data group as follows:

the first server determines a random number r within a preset value range, and randomly selects r pieces of first attribute data in the unmarked state from the data group.

For example, the value range of the random number may be 80 < r < 100. The random number is refreshed each time a data identification set is determined, which further ensures data security.
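The selection described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `pick_identification_set` is hypothetical, and the behavior when fewer than r unmarked identifiers remain (take all of them) is an assumption, since the text does not specify it.

```python
import random


def pick_identification_set(unmarked_ids, low=80, high=100, rng=random):
    """Pick r unmarked data identifications at random, with low < r < high.

    Assumption (not from the source): if fewer than r identifications remain,
    all remaining ones are taken.
    """
    r = rng.randint(low + 1, high - 1)  # a fresh random r on every call
    k = min(r, len(unmarked_ids))
    return set(rng.sample(sorted(unmarked_ids), k))


rng = random.Random(0)
unmarked = set(range(1000))
D = pick_identification_set(unmarked, rng=rng)
```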

S1022, the first server sends D and the first ciphertext to the second server.

S1023, when the first server receives a second ciphertext sent by the second server, the first server sets the first attribute data in the first attribute data set that corresponds to the data identifiers in D to a marked state, where the second ciphertext is sent by the second server when it determines that D meets a preset condition.

S1024, if the first server determines that first attribute data in the unmarked state still exists in the data group, the first server determines another data identification set in the first manner and sends that D and the first ciphertext to the second server, and repeats this until no first attribute data in the unmarked state remains in the data group or no data identification set meeting the preset condition can be formed.

Optionally, the method of this embodiment may further include: the first server receives the first ciphertext sent back by the second server, where the first ciphertext is sent by the second server when it determines that D does not meet the preset condition. Upon receiving the first ciphertext, the first server need not process it.

Optionally, the preset condition may be that the number N of data identifiers in D is greater than a preset value n.

Here, n is a preset positive integer greater than 1. Setting n to be greater than 1 prevents the second attribute data of individual data objects stored by the second server from being leaked to the first server.
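The loop formed by S1021 to S1024 can be sketched from the first server's side as follows. This is a sketch under several assumptions, not the application's implementation: the transport callback, the tuple it returns, the string used as a placeholder ciphertext, the small value range for r, and the choice to clip r so that every D already satisfies the preset condition are all hypothetical. `mock_second_server` only mimics the accept or reject decision and performs no aggregation.

```python
import random

PRESET_N = 2  # hypothetical value of the preset threshold n (must be > 1)


def mock_second_server(D, first_ciphertext):
    """Hypothetical stand-in for the second server: accepts D only if len(D) > n."""
    if len(D) > PRESET_N:
        return True, ("second-ciphertext-for", frozenset(D))
    return False, first_ciphertext  # echoes the first ciphertext back


def run_group(group_ids, first_ciphertext, send, n=PRESET_N, low=2, high=6, rng=random):
    """First-server loop for one data group; returns (second ciphertexts, leftover ids)."""
    unmarked = set(group_ids)
    second_ciphertexts = []
    # Stop when no identification set with more than n members can be formed.
    while len(unmarked) > n:
        r = rng.randint(low + 1, high - 1)             # refreshed every round
        r = min(max(r, n + 1), len(unmarked))          # assumption: clip r to a valid size
        D = set(rng.sample(sorted(unmarked), r))       # S1021
        accepted, reply = send(D, first_ciphertext)    # S1022
        if accepted:                                   # S1023
            unmarked -= D                              # mark the data in D
            second_ciphertexts.append(reply)
        # A rejected D is ignored and a new one is drawn (S1024).
    return second_ciphertexts, unmarked


rng = random.Random(1)
second_ciphertexts, leftover = run_group(range(1, 21), "enc-state", mock_second_server, rng=rng)
```

Note that, with this stopping rule, up to n identifiers may remain unmarked at the end of a group; how such a remainder is handled is not covered by this sketch.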

S103, the second server determines a plurality of second ciphertexts of the data set, and each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identification set in a plurality of data identification sets, the first ciphertext and a second attribute data set.

The preset aggregation algorithm may be any one of sum (Sum), maximum (Max), minimum (Min), average (Avg) and count (Count), or may be a user-defined aggregation function (UDAF).

Optionally, the aggregation algorithm in this embodiment may be preset in both the first server and the second server, or may be preset in the first server, where the first server may send the required aggregation algorithm to the second server at the same time when sending D and the first ciphertext to the second server each time.

Optionally, the determining, by the second server, the plurality of second ciphertexts of the data set may specifically include:

S1031, a second ciphertext corresponding to each data identification set D in the plurality of data identification sets is determined in a target mode, where the target mode comprises: when it is determined that D meets the preset condition, determining a second ciphertext according to the aggregation algorithm, D, the second attribute data set and the first ciphertext.

Optionally, determining a second ciphertext according to the aggregation algorithm, D, the second attribute data set, and the first ciphertext may specifically be:

searching second attribute data corresponding to the data identifier belonging to the D from the second attribute data set, aggregating the second attribute data corresponding to the data identifier belonging to the D according to an aggregation algorithm to obtain a first aggregation result, and encrypting the first aggregation result according to the first ciphertext, the public key and calculation logic corresponding to the aggregation algorithm to obtain a second ciphertext.

Specifically, for example, let the first ciphertext be [state] and the first aggregation result be delta. Then second ciphertext = merge([state], delta), where merge is the calculation logic corresponding to the aggregation algorithm. If the aggregation algorithm is sum (Sum), merge is addition and the second ciphertext = [state] + delta. If the aggregation algorithm is maximum (Max), merge takes the maximum and the second ciphertext = ([state] > delta ? [state] : delta), that is, the larger of delta and state. Because [state] is a homomorphic ciphertext, the second ciphertext obtained by aggregating with it is also a homomorphic ciphertext, so its content is not exposed to a third party when the second server sends the second ciphertext to the first server.
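The Sum case of merge can be illustrated with an additively homomorphic scheme. The following is a minimal sketch only: the source does not name a concrete homomorphic encryption scheme, so the use of a Paillier-style scheme, the tiny primes, and the names `encrypt`, `decrypt` and `merge_sum` are all assumptions made for illustration and are not secure for real use. The Max case is not shown, because comparison on ciphertexts depends on the scheme chosen.

```python
import math
import random

# Toy Paillier-style key pair with tiny primes (assumption, illustrative only, NOT secure).
P, Q = 1789, 1867
N = P * Q                      # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key part: lcm(p-1, q-1)
MU = pow(LAM, -1, N)           # private key part, valid for generator g = N + 1


def encrypt(m, rng=random):
    """Encrypt an integer m (0 <= m < N) under the public key."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c):
    """Decrypt a ciphertext with the private key (LAM, MU)."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def merge_sum(state_ct, delta):
    """Second ciphertext = [state] + delta, computed without decrypting [state]."""
    return state_ct * encrypt(delta) % N2


# First server: first ciphertext with initial value 0 for Sum.
first_ciphertext = encrypt(0)
# Second server: merges its plaintext partial sum into the ciphertext.
second_ciphertext = merge_sum(first_ciphertext, 42)
# First server: recovers the first aggregation result with the private key.
partial_sum = decrypt(second_ciphertext)
```

In this sketch the second server only ever handles the public key and ciphertexts, which matches the division of roles described above.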

S1032, determining the second ciphertext corresponding to each of the plurality of data identification sets as a plurality of second ciphertexts of the data set.

S104, the second server sequentially sends the plurality of second ciphertexts of the data set to the first server.

Optionally, the method of this embodiment may further include: when the second server determines that D does not meet the preset condition, the second server sends the first ciphertext to the first server.

S105, the first server obtains an aggregation result of the second attribute data of the data group according to the aggregation algorithm, the private key and the plurality of second ciphertexts of the data group.

In one implementation, S105 may specifically include:

S1051, decrypting the plurality of second ciphertexts of the data set by using the private key to obtain a plurality of first aggregation results of the data set.

S1052, aggregating the plurality of first aggregation results by using an aggregation algorithm to obtain an aggregation result of the second attribute data of the data group.

Optionally, the method of this embodiment may further include: the first server generates a public key and a private key and sends the public key to the second server.

Optionally, the first ciphertext is homomorphic encryption ciphertext, and a value of the first ciphertext is preset according to an aggregation algorithm. Accordingly, the second ciphertext is also homomorphic encrypted ciphertext. For example, when the aggregation algorithm is Sum (Sum), average (Avg) or Count (Count), the value of the first ciphertext is set to 0, when the aggregation algorithm is maximum (Max), the value of the first ciphertext may be set to a smaller value, and when the aggregation algorithm is minimum (Min), the value of the first ciphertext may be set to a larger value. When the aggregation algorithm is a user-defined aggregation function (UDAF), the value of the first ciphertext may be set according to the UDAF.
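The choice of initial value can be sketched as a lookup of a neutral starting value per aggregation algorithm. This is a hedged illustration: the function name `initial_state_value` and the bounds `lower` and `upper` are hypothetical, and the source only says "a smaller value" and "a larger value" without fixing concrete numbers.

```python
def initial_state_value(algorithm, lower=-10**9, upper=10**9):
    """Plaintext the first server would encrypt to form the first ciphertext.

    lower/upper are assumed bounds standing in for "a smaller value" and
    "a larger value"; they must lie outside the range of real attribute values.
    """
    neutral = {
        "sum": 0,
        "avg": 0,
        "count": 0,
        "max": lower,  # any real value is larger, so it wins the comparison
        "min": upper,  # any real value is smaller, so it wins the comparison
    }
    return neutral[algorithm.lower()]
```

With these starting values, merging the first real partial result into the state leaves that partial result unchanged, which is why the value depends on the aggregation algorithm.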

In this embodiment, by using the homomorphic encryption algorithm, the correctness of the packet aggregation result and the data security can be further ensured.

In the data processing method provided by this embodiment, the first server first determines M data groups of the first attribute data set and, for each of the M data groups, obtains a plurality of second ciphertexts of the data group from the second server. Each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identification set, the first ciphertext encrypted by the first server using the public key, and the second attribute data set. Each data identification set consists of the data identifiers corresponding to first attribute data in the unmarked state randomly selected by the first server from the data group. Finally, the first server obtains the aggregation result of the second attribute data of the data group according to the aggregation algorithm, the private key, and the plurality of second ciphertexts of the data group. The first server and the second server exchange only the encrypted first ciphertext and second ciphertexts, and the data identifiers corresponding to the first attribute data of each data group are not sent to the second server all at once; instead they are sent in several batches, each batch being a randomly selected data identification set. As a result, the second server cannot learn the grouping information of the first attribute data set, the first server cannot learn the details of the second attribute data set, and the first server still obtains the aggregation result of the second attribute data of each of the M data groups. Grouped aggregation is thus achieved while the data privacy of both data parties is protected and the security of data processing is improved.

In this embodiment, a multiparty scenario with N participants can be converted into N-1 two-party scenarios. For example, assume that there are 4 parties, namely party A, party B, party C, and party D, and that the grouping column belongs to party A. The 4-party scenario can be converted into 3 two-party scenarios, namely party A with party B, party A with party C, and party A with party D, and each two-party scenario can perform privacy-preserving grouped aggregation using the data processing method described above.
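The conversion into two-party scenarios can be sketched as pairing the owner of the grouping column with every other party. The function name `two_party_scenarios` is hypothetical and used only for illustration.

```python
def two_party_scenarios(parties, grouping_owner):
    """Pair the party that owns the grouping column with each other party."""
    return [(grouping_owner, p) for p in parties if p != grouping_owner]


scenarios = two_party_scenarios(["A", "B", "C", "D"], "A")
# scenarios == [("A", "B"), ("A", "C"), ("A", "D")]
```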

The data processing method provided by the application is described in detail below with reference to a specific embodiment.

Fig. 4 is an interactive flowchart of a data processing method according to an embodiment of the present application, and fig. 5 is a flowchart of an exemplary data processing method according to an embodiment of the present application. As shown in fig. 5, in this embodiment the first server stores a first attribute data set (a1-ai) of a data object, and the second server stores a second attribute data set (b1-bn) of the same data object. The second attribute data set and the first attribute data set have the same data identifiers (Id1-Idn). The encryption algorithm in this embodiment is a homomorphic encryption algorithm. As shown in fig. 4, the method may include the following steps:

S301, the first server generates homomorphic encryption public keys and private keys, and sends the homomorphic encryption public keys to the second server.

S302, a first server determines M data groups of a first attribute data set, each data group comprises the same first attribute data and a data identifier corresponding to each first attribute data, and an aggregation result of second attribute data of each data group is obtained through the following S3021-S3032.

For example, the first attribute data a1 together with the data identifiers ID1, ID2, and ID3 corresponding to it forms one data group.

S3021, the first server determines a random number r according to a preset value range of the random number.

Specifically, for example, 80< r <100.

S3022, the first server randomly selects r first attribute data in unmarked states from the target data set of the first attribute data set, and forms a data identification set D by data identifications corresponding to the r first attribute data in unmarked states.

Specifically, the target data set is any one of the M data groups. For example, a data group contains 50 first attribute data, each of which is user A, with 50 data identifiers (id1-id50). A random number r is determined for each selection and may differ from one selection to the next. If the first random number r is 5, then 5 first attribute data in the unmarked state (i.e., first attribute data that have not yet been selected) are randomly selected from the 50 first attribute data, and the data identifiers corresponding to these 5 first attribute data form a data identification set D, for example D includes id1, id2, id5, id7, and id9. The second selection is then made from the data identifiers corresponding to the remaining 45 first attribute data.
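One selection step of S3021 and S3022 can be sketched as follows. This is an illustrative sketch, not the implementation of the embodiment: the function name `select_identifier_set` and the inclusive bounds `r_min` and `r_max` are assumptions, and capping r at the number of remaining identifiers is a design choice of the sketch.

```python
import random


def select_identifier_set(unmarked_ids, r_min, r_max, rng=random):
    """Draw a fresh random number r and pick r unmarked identifiers as D."""
    r = rng.randint(r_min, r_max)          # refreshed on every call (S3021)
    size = min(r, len(unmarked_ids))       # assumption: cap r at what is left
    # sorted() gives a stable sequence to sample from, since sets are unordered
    return rng.sample(sorted(unmarked_ids), size)  # S3022


group = {f"id{i}" for i in range(1, 51)}   # 50 identifiers, as in the example
D = select_identifier_set(group, 5, 5, random.Random(0))
```

Passing a seeded `random.Random` instance, as in the usage above, makes a run reproducible for testing; a real deployment would presumably want an unpredictable source.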

S3023, the first server sends the D and the first ciphertext encrypted by the homomorphic encryption public key to the second server.

The value of the first ciphertext may be set according to a preset aggregation algorithm, for example, when the aggregation algorithm is Sum (Sum), average (Avg) or Count (Count), the value of the first ciphertext is set to 0, when the aggregation algorithm is maximum (Max), the value of the first ciphertext may be set to a smaller value, and when the aggregation algorithm is minimum (Min), the value of the first ciphertext may be set to a larger value. When the aggregation algorithm is a user-defined aggregation function (UDAF), the value of the first ciphertext may be set according to the UDAF.

S3024, the second server judges whether D meets the preset condition, that is, whether the number N of data identifiers in D is greater than a preset n.

Here n is a preset positive integer greater than 1; setting n greater than 1 prevents the second attribute data of a data object held by the second server from being leaked to the first server.

If N is greater than n, S3025 is performed; if N is less than or equal to n, S3029 described below is performed.

S3025, the second server searches the second attribute data corresponding to the data identifier belonging to the D from the second attribute data set.

Specifically, as shown in fig. 5, if D includes Id1, Id2, and Id3, the second attribute data corresponding to Id1, Id2, and Id3, for example b1, b2, and b3 respectively, are looked up in the second attribute data set.

S3026, the second server aggregates the second attribute data corresponding to the data identifier belonging to the D according to a preset aggregation algorithm to obtain a first aggregation result.

S3027, the second server encrypts the first aggregation result according to the first ciphertext, the public key and calculation logic corresponding to the aggregation algorithm to obtain a second ciphertext.

S3028, the second server sends the second ciphertext and first indication information to the first server, wherein the first indication information is used for indicating that D meets a preset condition.

S3029, the second server sends the first ciphertext and second indication information to the first server, wherein the second indication information is used for indicating that D does not meet the preset condition.
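The second server's handling of one request, S3024 to S3029, can be sketched as a single function. The names `handle_request`, `aggregate` and `merge` are hypothetical; `merge` stands for the calculation logic applied to the first ciphertext and is passed in as a callable so that the sketch does not commit to a particular encryption scheme.

```python
def handle_request(d, first_ciphertext, second_attributes, n, aggregate, merge):
    """Return (ciphertext, satisfied) for one data identification set d."""
    if len(d) <= n:
        # S3024 fails: return the first ciphertext unchanged (S3029)
        return first_ciphertext, False
    values = [second_attributes[i] for i in d]       # S3025: look up by identifier
    delta = aggregate(values)                        # S3026: first aggregation result
    return merge(first_ciphertext, delta), True      # S3027 and S3028


# Usage with a plaintext stand-in for the ciphertext (assumption, for illustration).
table = {"Id1": 10, "Id2": 20, "Id3": 30}
plain_merge = lambda state, delta: state + delta
accepted = handle_request(["Id1", "Id2", "Id3"], 0, table, 1, sum, plain_merge)
rejected = handle_request(["Id1"], 0, table, 1, sum, plain_merge)
```

The boolean in the returned pair plays the role of the first and second indication information.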

S3030, if the first server receives the second ciphertext and the first indication information, the first server sets the first attribute data in the first attribute data set that corresponds to the data identifiers in D to the marked state.

Specifically, setting the first attribute data corresponding to the data identifiers in D to the marked state indicates that this first attribute data has already been selected.

S3031, if the first server determines that first attribute data in the unmarked state still exists in the target data set, S3021 to S3030 are repeated until no first attribute data in the unmarked state remains in the target data set or no data identification set satisfying the preset condition exists.
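The repetition of S3021 to S3030 on the first server can be sketched as a loop over the unmarked identifiers of one target data set. This is a sketch under stated assumptions: `run_group` and `send` are hypothetical names, `send` stands for the round trip to the second server and returns a ciphertext together with a flag for the indication information, and the loop stops once n or fewer identifiers remain because no set D built from them could satisfy N > n.

```python
import random


def run_group(group_ids, send, r_low, r_high, n, rng=random):
    """Collect the second ciphertexts of one data group, batch by batch."""
    if r_low <= n:
        raise ValueError("r_low must exceed n, or no set could meet the condition")
    unmarked = set(group_ids)
    second_ciphertexts = []
    while len(unmarked) > n:                       # a qualifying D still exists
        r = rng.randint(r_low, r_high)             # S3021
        d = rng.sample(sorted(unmarked), min(r, len(unmarked)))  # S3022
        ciphertext, satisfied = send(d)            # S3023 to S3029
        if satisfied:                              # S3030: mark the selected data
            unmarked.difference_update(d)
            second_ciphertexts.append(ciphertext)
        # a rejected D is simply not processed, and a new D is drawn
    return second_ciphertexts, unmarked


# Usage with a stub second server that reports the batch size (illustration only).
ids = [f"id{i}" for i in range(1, 51)]
cts, leftover = run_group(ids, lambda d: (len(d), True), 3, 6, 1, random.Random(1))
```

The identifiers returned in `leftover` are those that could not be placed in any qualifying set; how they are treated is not specified in the text above.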

S3032, the first server uses the private key to decrypt the second ciphertexts of the target data set respectively to obtain a plurality of first aggregation results of the data set, and uses an aggregation algorithm to aggregate the first aggregation results to obtain an aggregation result of the second attribute data of the target data set.

Specifically, through S3031 the first server receives all second ciphertexts of the target data set, for example 10 second ciphertexts, each corresponding to one first aggregation result. The first server decrypts the 10 second ciphertexts respectively with the homomorphic encryption private key to obtain 10 first aggregation results. If the preset aggregation algorithm is sum (Sum), the first server adds the 10 first aggregation results to obtain the aggregation result of the second attribute data of the target data set. If the preset aggregation algorithm is maximum (Max), the first server takes the largest of the 10 first aggregation results as the aggregation result of the second attribute data of the target data set.
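The final step on the first server can be sketched as decrypting each second ciphertext and combining the partial results. The names `final_aggregate` and `decrypt` are hypothetical, and `decrypt` is passed in as a callable so the sketch stays independent of the encryption scheme. Avg is left out on purpose, since combining partial averages correctly would need the batch sizes, which the text does not describe.

```python
def final_aggregate(second_ciphertexts, decrypt, algorithm):
    """Decrypt each second ciphertext, then combine the first aggregation results."""
    combiners = {
        "sum": sum,
        "count": sum,   # partial counts add up to the total count
        "max": max,
        "min": min,
    }
    partials = [decrypt(c) for c in second_ciphertexts]
    return combiners[algorithm.lower()](partials)


# Usage with an identity function standing in for decryption (illustration only).
identity = lambda c: c
total = final_aggregate([3, 4, 5], identity, "Sum")
largest = final_aggregate([3, 4, 5], identity, "Max")
```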

Optionally, in this embodiment, the second server and the first server may preset the same aggregation algorithm. Alternatively, only the first server may preset the aggregation algorithm and send it to the second server before or when sending D, so that both sides perform the same aggregation processing.

In summary, through the above-described S3021 to S3032, an aggregation result of the second attribute data of each packet in the first attribute data set may be obtained.

In the data processing method provided by this embodiment, the first server and the second server exchange only the encrypted first ciphertext and second ciphertexts, and the data identifiers corresponding to the first attribute data of each data group are not sent to the second server all at once; instead they are sent in several batches, each batch being a randomly selected data identification set. As a result, the second server cannot learn the grouping information of the first attribute data set, the first server cannot learn the details of the second attribute data set, and the first server still obtains the aggregation result of the second attribute data of each of the M data groups. Grouped aggregation is thus achieved while the data privacy of both data parties is protected and the security of data processing is improved.

Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, where the data processing apparatus holds a first attribute data set of a data object, and as shown in fig. 6, the apparatus may include: a determination module 11, a transmission module 12, a reception module 13 and a processing module 14, wherein,

The determining module 11 is configured to determine M data sets of the first attribute data set, where each data set includes the same first attribute data and a data identifier corresponding to each first attribute data, and M is a positive integer;

the sending module 12 is configured to send, for each of the M data groups, a plurality of data identifier sets of the data group and a first ciphertext encrypted using a public key to a second server, where the second server holds a second attribute data set of the data object, and the second server is configured to determine a plurality of second ciphertexts of the data group, where each second ciphertext is determined by the second server according to a preset aggregation algorithm, one of the plurality of data identifier sets, the first ciphertext, and the second attribute data set, and one of the data identifier sets is composed of data identifiers corresponding to first attribute data in an unlabeled state that is randomly selected by the first server from the data groups;

the receiving module 13 is configured to sequentially receive a plurality of second ciphertexts of the data set sent by the second server;

the processing module 14 is configured to obtain an aggregation result of the second attribute data of the data set according to the aggregation algorithm, the private key, and the plurality of second ciphertexts of the data set.

Optionally, the sending module 12 is configured to: determining one of a plurality of data identification sets by a first method, the first method being: randomly selecting a plurality of first attribute data in unlabeled states from the data group, and forming a data identification set D by data identifications corresponding to the first attribute data in the unlabeled states;

sending the D and the first ciphertext to the second server;

when a second ciphertext sent by the second server is received, setting first attribute data corresponding to the data identifier in the D in the first attribute data set to the marked state, where the second ciphertext is determined by the second server when it determines that the D meets a preset condition;

if the first attribute data in the unlabeled state still exists in the data group, one of the data identification sets is continuously determined through the first mode, and the D and the first ciphertext are sent to the second server until the first attribute data in the unlabeled state does not exist in the data group or the data identification set meeting the preset condition does not exist.

Optionally, the receiving module 13 is further configured to:

and receiving a first ciphertext transmitted by the second server, wherein the first ciphertext is transmitted by the second server when the D is determined to not meet the preset condition.

Optionally, the sending module 12 is specifically configured to: determining a random number r according to a preset value range of the random number;

first attribute data of r unlabeled states are randomly selected from the data set.

Optionally, the sending module 12 is further configured to: and generating a public key and a private key, and sending the public key to the second server.

Optionally, the processing module 14 is configured to decrypt a plurality of second ciphertexts of the data set by using the private key, so as to obtain a plurality of first aggregation results of the data set;

and aggregating the plurality of first aggregation results by using an aggregation algorithm to obtain an aggregation result of the second attribute data of the data group.

Optionally, the first ciphertext is homomorphic encryption ciphertext, and a value of the first ciphertext is preset according to an aggregation algorithm.

Fig. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, where the data processing apparatus holds a second attribute data set of a data object, and as shown in fig. 7, the apparatus may include: a receiving module 21, a determining module 22 and a transmitting module 23, wherein,

the receiving module 21 is configured to receive a plurality of data identifier sets of a data set sent by a first server and a first ciphertext encrypted using a public key, where the first server holds a first attribute data set of a data object, the data set is any one of M data sets of the first attribute data set, the data set includes the same first attribute data and data identifiers corresponding to each first attribute data, M is a positive integer, and one data identifier set is composed of data identifiers corresponding to first attribute data in an untagged state randomly selected by the first server from the data sets;

The determining module 22 is configured to determine a plurality of second ciphertexts of the data set, where each second ciphertext is determined by the second server according to a preset aggregation algorithm, one of the plurality of data identifier sets, the first ciphertext, and the second attribute data set;

the sending module 23 is configured to send, in sequence, a plurality of second ciphertexts of the data set to the first server, and obtain an aggregation result of the second attribute data of the data set according to the aggregation algorithm, the private key, and the plurality of second ciphertexts of the data set by the first server.

Optionally, the determining module 22 is configured to: respectively determining a second ciphertext corresponding to each data identification set D in the plurality of data identification sets in a target mode, wherein the target mode comprises the following steps: when the D meets the preset condition, determining a second ciphertext according to the aggregation algorithm, the D, the second attribute data set and the first ciphertext;

and determining the second ciphertext corresponding to the plurality of data identification sets as a plurality of second ciphertexts of the data set.

Optionally, the determining module 22 is specifically configured to: searching second attribute data corresponding to the data identifier belonging to the D from the second attribute data set;

aggregating the second attribute data corresponding to the data identifier belonging to the D according to an aggregation algorithm to obtain a first aggregation result;

encrypting the first aggregation result according to the first ciphertext, the public key and calculation logic corresponding to the aggregation algorithm to obtain a second ciphertext.

Optionally, the sending module 23 is further configured to: send the first ciphertext to the first server when it is determined that the D does not meet the preset condition.

It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus shown in fig. 6 may perform a method embodiment corresponding to the first server, the apparatus shown in fig. 7 may perform a method embodiment corresponding to the second server, and the foregoing and other operations and/or functions of each module in the apparatus are respectively for implementing a method embodiment corresponding to the data processing device, which is not described herein for brevity.

The data processing apparatus according to the embodiment of the present application is described above in terms of functional blocks with reference to the accompanying drawings. It should be understood that the functional module may be implemented in hardware, or may be implemented by instructions in software, or may be implemented by a combination of hardware and software modules. Specifically, each step of the method embodiment in the embodiment of the present application may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in a software form, and the steps of the method disclosed in connection with the embodiment of the present application may be directly implemented as a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a well-established storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in a memory, and the processor reads information in the memory, and in combination with hardware, performs the steps in the above method embodiments.

As shown in fig. 8, the data processing apparatus 700 may include:

a memory 710 and a processor 720, the memory 710 being configured to store a computer program and to transfer the program code to the processor 720. In other words, the processor 720 may call and run a computer program from the memory 710 to implement the method in the embodiment of the present application.

For example, the processor 720 may be configured to perform the above-described method embodiments according to instructions in the computer program.

In some embodiments of the application, the processor 720 may include, but is not limited to:

a general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.

In some embodiments of the application, the memory 710 includes, but is not limited to:

volatile memory and/or nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).

In some embodiments of the application, the computer program may be partitioned into one or more modules that are stored in the memory 710 and executed by the processor 720 to perform the methods provided by the application. The one or more modules may be a series of computer program instruction segments capable of performing particular functions in describing the execution of the computer program in the data processing apparatus.

As shown in fig. 8, the data processing apparatus may further include:

a transceiver 730, the transceiver 730 being connectable to the processor 720 or the memory 710.

The processor 720 may control the transceiver 730 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. Transceiver 730 may include a transmitter and a receiver. Transceiver 730 may further include antennas, the number of which may be one or more.

It will be appreciated that the various components in the data processing apparatus are connected by a bus system comprising, in addition to a data bus, a power bus, a control bus and a status signal bus.

The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments described above.

When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.

The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.

The above is only a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A data processing method, the method being applied to a first server, the first server holding a first set of attribute data for a data object, the method comprising:

2. The method of claim 1, wherein said sending, for each of said M data sets, to a second server, a plurality of sets of data identifications of said data sets and a first ciphertext encrypted using a public key, comprises:

determining one of the plurality of data identification sets by a first method, the first method being: randomly selecting a plurality of first attribute data in an unlabeled state from the data group, and forming a data identification set D by data identifications corresponding to the first attribute data in the unlabeled state;

transmitting the D and the first ciphertext to the second server;

when receiving a second ciphertext sent by the second server, setting the first attribute data in the first attribute data set that corresponds to the data identifications in the D to a marked state, wherein the second ciphertext is determined by the second server when the D meets a preset condition;

if first attribute data in the unlabeled state still exists in the data group, continuing to determine one data identification set of the plurality of data identification sets by the first method, and sending the D and the first ciphertext to the second server, until no first attribute data in the unlabeled state exists in the data group or no data identification set meeting the preset condition exists.
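
The loop recited in claims 2 to 4 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `collect_second_ciphertexts`, the `send` callback standing in for the round trip to the second server, the `r_range` bounds, and the `max_rejects` cutoff are all hypothetical. In particular, a bounded run of consecutive rejections is only an assumed stand-in for "no data identification set meeting the preset condition exists", and a rejection is recognized by the first ciphertext being returned unchanged, as in claim 3.

```python
import random

def collect_second_ciphertexts(group_ids, first_ct, send,
                               r_range=(2, 4), max_rejects=20, rng=None):
    """Repeatedly draw a random identification set D from the unlabeled
    identifications of one data group and send it with the first ciphertext.

    `send(d, first_ct)` is a hypothetical stand-in for the second server: it
    returns a second ciphertext when D meets the preset condition and returns
    `first_ct` unchanged otherwise.
    """
    rng = rng or random.Random()
    unlabeled = set(group_ids)   # identifications still in the unlabeled state
    second_cts = []
    rejects = 0
    while unlabeled and rejects < max_rejects:
        # Claim 4: draw r from a preset range, then pick r unlabeled entries.
        r = min(rng.randint(*r_range), len(unlabeled))
        d = set(rng.sample(sorted(unlabeled), r))
        reply = send(d, first_ct)
        if reply == first_ct:    # D was rejected; try another random set
            rejects += 1
            continue
        unlabeled -= d           # set the entries of D to the marked state
        second_cts.append(reply)
        rejects = 0
    return second_cts, unlabeled
```

The function returns the second ciphertexts gathered for the data group together with any identifications that could not be placed in an accepted set.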

3. The method according to claim 2, wherein the method further comprises:

receiving the first ciphertext transmitted by the second server, wherein the first ciphertext is transmitted by the second server when the second server determines that the D does not meet the preset condition.

4. The method of claim 2, wherein said randomly selecting a plurality of first attribute data in an unlabeled state from the data group comprises:

determining a random number r according to a preset value range of the random number;

and randomly selecting r pieces of first attribute data in the unlabeled state from the data group.

5. The method according to claim 2, wherein the method further comprises:

generating a public key and the private key, and sending the public key to the second server.

6. The method of claim 1, wherein obtaining the aggregate result of the second attribute data of the data set based on the aggregation algorithm, the private key, and the plurality of second ciphertexts of the data set comprises:

decrypting the second ciphertexts of the data set by using the private key respectively to obtain a plurality of first aggregation results of the data set;

and aggregating the plurality of first aggregation results by using the aggregation algorithm to obtain an aggregation result of the second attribute data of the data set.
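
The two steps of claim 6 can be sketched in Python as follows. The names `aggregate_group`, `decrypt`, and `combine` are hypothetical: `decrypt` stands in for decryption with the private key, and `combine` stands in for a binary form of the preset aggregation algorithm (for example, addition for a sum or `max` for a maximum). The sketch assumes the aggregation algorithm can be applied pairwise to partial results.

```python
from functools import reduce

def aggregate_group(second_cts, decrypt, combine):
    """Decrypt each second ciphertext into a first aggregation result,
    then combine the partial results into the result for the data group."""
    if not second_cts:
        raise ValueError("no second ciphertexts received for this data group")
    first_results = [decrypt(ct) for ct in second_cts]
    return reduce(combine, first_results)
```

For a sum, `aggregate_group(cts, decrypt, operator.add)` adds the per-set partial sums; because every identification belongs to at most one accepted set, each piece of second attribute data is counted once.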

7. The method of claim 1, wherein the first ciphertext is homomorphic encrypted ciphertext, and wherein the value of the first ciphertext is preset according to the aggregation algorithm.

8. A data processing method, the method being applied to a second server holding a second set of attribute data for a data object, the method comprising:

9. The method of claim 8, wherein the determining the plurality of second ciphertexts for the data set comprises:

determining a second ciphertext corresponding to each data identification set D in the plurality of data identification sets respectively in a target manner, wherein the target manner comprises: when the D meets the preset condition, determining a second ciphertext according to the aggregation algorithm, the D, the second attribute data set and the first ciphertext;

and determining the second ciphertext corresponding to each of the plurality of data identification sets as a plurality of second ciphertexts of the data set.

10. The method of claim 9, wherein said determining a second ciphertext from said aggregation algorithm, said D, said second attribute data set, and said first ciphertext comprises:

searching second attribute data corresponding to the data identifier belonging to the D from the second attribute data set;

according to the aggregation algorithm, aggregating the second attribute data corresponding to the data identifier belonging to the D to obtain a first aggregation result;

and encrypting the first aggregation result according to the first ciphertext, the public key and calculation logic corresponding to the aggregation algorithm to obtain the second ciphertext.
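
The second-server steps of claim 10 can be illustrated with a toy Python sketch under several assumptions that the claims do not state: the homomorphic scheme is taken to be Paillier, the aggregation algorithm is taken to be a sum, and the first ciphertext is taken to be an encryption of 0, so that adding the plaintext partial sum to it homomorphically yields the second ciphertext. All function names are hypothetical, and the key size is far too small for real use.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key pair (tiny primes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)      # valid because the generator is g = n + 1
    return n, (lam, mu)       # public key n, private key (lam, mu)

def encrypt(n, m, rng=random):
    """Encrypt integer m (0 <= m < n) under public key n."""
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add_plain(n, c, m):
    """Homomorphically add plaintext m to ciphertext c (public key only)."""
    return c * (1 + m * n) % (n * n)

def second_ciphertext(n, first_ct, id_set, second_attrs):
    """Look up the second attribute data for the identifications in D,
    sum them, and fold the sum into the first ciphertext."""
    first_agg = sum(second_attrs[i] for i in id_set)
    return add_plain(n, first_ct, first_agg)
```

In this sketch the second server needs only the public key `n` and the first ciphertext; the first server later recovers the partial sum with `decrypt`.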

11. The method of claim 8, wherein the first ciphertext is homomorphic encrypted ciphertext, and wherein the value of the first ciphertext is preset according to the aggregation algorithm.

12. A data processing apparatus, the data processing apparatus holding a first set of attribute data for a data object, the apparatus comprising:

the sending module is used for sending a plurality of data identification sets of the data sets and first ciphertext encrypted by using a public key to a second server for each data set in the M data sets, wherein the second server holds a second attribute data set of the data object and is used for determining a plurality of second ciphertext of the data sets, each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identification set in the plurality of data identification sets, the first ciphertext and the second attribute data set, and one data identification set consists of data identifications corresponding to first attribute data in an unmarked state randomly selected by the first server from the data sets;

13. A data processing apparatus, the data processing apparatus holding a second set of attribute data for a data object, the apparatus comprising:

the determining module is used for determining a plurality of second ciphertexts of the data set, wherein each second ciphertext is determined by a second server according to a preset aggregation algorithm, one data identification set in the plurality of data identification sets, the first ciphertext and the second attribute data set;

14. A data processing apparatus, comprising:

a processor and a memory for storing a computer program, the processor being for invoking and running the computer program stored in the memory to perform the method of any of claims 1 to 7 or 8 to 11.

15. A computer readable storage medium storing a computer program for causing a computer to perform the method of any one of claims 1 to 7 or 8 to 11.