CN114154196A - Heterogeneous data processing method and device and electronic equipment - Google Patents

Publication number
CN114154196A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111462228.9A
Other languages
Chinese (zh)
Inventor
严强
廖飞强
李昊轩
王朝阳
李辉忠
张开翔
范瑞彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Application filed by WeBank Co Ltd
Priority to CN202111462228.9A
Publication of CN114154196A
Priority to PCT/CN2022/124375 (WO2023098294A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the application provide a heterogeneous data processing method, a heterogeneous data processing device, and electronic equipment. The method comprises: obtaining the blinded identifier set, produced by blinding processing, that each user side sends for a service to be processed; matching the blinded identifier sets according to a preset matching rule to obtain a matching identifier set; generating identifier record information for each user side according to the matching identifier set and each blinded identifier set; and sending each user side its identifier record information, so that the user side performs heterogeneous data decryption calculation according to the received identifier record information to obtain a plaintext processing result of the service to be processed. The embodiments can reduce the risk of user information leakage and thereby improve the user experience.

Description

Heterogeneous data processing method and device and electronic equipment
Technical Field
The embodiments of the application relate to the technical field of blockchain, and in particular to a heterogeneous data processing method and device and electronic equipment.
Background
With the development of internet technology, interaction among organizations is increasingly frequent, and implementing financial services may involve joint calculation over heterogeneous data held by different organizations.
In the prior art, joint calculation of heterogeneous data may first perform sample alignment by private set intersection to obtain the plaintext identification information of the sample records shared by all parties, and then run a secure multi-party computation or federated learning algorithm on the data set corresponding to that plaintext identification information, thereby implementing joint calculation of heterogeneous data across organizations.
However, in real business, sample alignment outputs the shared information in plaintext; that is, every organization participating in the calculation learns which records it has in common with the other organizations. This increases the risk of user information leakage, which in turn increases compliance risk and degrades the user experience.
Disclosure of Invention
The embodiment of the application provides a heterogeneous data processing method and device and electronic equipment, so that the risk of user information leakage is reduced.
In a first aspect, an embodiment of the present application provides a heterogeneous data processing method, applied to a cooperator side, including:
acquiring a blinded identifier set after blinded processing corresponding to a service to be processed sent by each user side;
matching each blinded identifier set according to a preset matching rule to obtain a matched identifier set;
and generating identifier record information corresponding to each user side according to the matching identifier set and each blinding identifier set, and respectively sending the identifier record information corresponding to each user side to the corresponding user side, so that the user side performs heterogeneous data decryption calculation according to the received identifier record information to obtain a plaintext processing result of the service to be processed.
Optionally, the performing matching processing on each blinded identifier set according to a preset matching rule to obtain a matching identifier set includes:
and comparing the blinded identifier sets corresponding to different user sides, and determining the intersection of the blinded identifier sets corresponding to different user sides to obtain a matching identifier set.
Optionally, the generating identifier record information corresponding to each user side according to the matching identifier set and each blinded identifier set includes:
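deleting the blinded identifiers corresponding to the matching identifier set from each blinded identifier set to obtain the private identifier set corresponding to each user side;
performing completion processing on the private identifier set corresponding to each user side according to a preset completion rule to obtain the completed private identifier set corresponding to each user side;
mixing the matching identifier set and the completed private identifier sets corresponding to the user sides to obtain an alignment sample set;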
for each alignment sample in the alignment sample set, if the alignment sample belongs to the matching identifier set, generating identifier record information corresponding to each user side according to a pre-stored first generation rule;
and if the alignment sample belongs to the private identifier set corresponding to each user side, generating identifier record information corresponding to each user side according to a pre-stored second generation rule.
Optionally, if the alignment sample belongs to the matching identifier set, generating identifier record information corresponding to each user side according to a pre-stored first generation rule, including:
if the alignment sample belongs to the matching identifier set, randomly generating a pair of first private fragment data that together represent the value 1;
and generating identifier record information corresponding to each user side according to the alignment sample and the first private fragment data.
Optionally, if the alignment sample belongs to the private identifier set corresponding to each user side, generating identifier record information corresponding to each user side according to a second pre-stored generation rule, including:
if the alignment sample belongs to the private identifier set corresponding to each user side, randomly generating a pair of second private fragment data that together represent the value 0;
and generating identifier record information corresponding to each user side according to the alignment sample and the second private fragment data.
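The two generation rules above can be illustrated with a minimal sketch. It assumes the "private fragment data" are two-party additive secret shares modulo a fixed prime; the source does not name a sharing scheme, and `MODULUS`, `share_pair`, `reconstruct`, and `make_records` are hypothetical names introduced only for this illustration.

```python
import secrets

# Assumption: additive sharing modulo a fixed prime (not specified in the source).
MODULUS = 2**61 - 1


def share_pair(value, modulus=MODULUS):
    """Split `value` into two random shares that sum to it modulo `modulus`."""
    first = secrets.randbelow(modulus)
    second = (value - first) % modulus
    return first, second


def reconstruct(first, second, modulus=MODULUS):
    """Recombine two additive shares."""
    return (first + second) % modulus


def make_records(alignment_samples, matching_ids):
    """Build one record list per user side: (alignment sample, share).

    Samples in the matching identifier set get shares of 1 (first rule);
    all other samples get shares of 0 (second rule).
    """
    records_a, records_b = [], []
    for sample in alignment_samples:
        flag = 1 if sample in matching_ids else 0
        share_a, share_b = share_pair(flag)
        records_a.append((sample, share_a))
        records_b.append((sample, share_b))
    return records_a, records_b
```

Under this assumption neither user side can tell from its own share alone whether a given alignment sample was matched, because each share taken by itself is uniformly random.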
Optionally, the performing, according to a preset completion rule, completion processing on the private identifier set corresponding to each user side to obtain a completed private identifier set corresponding to each user side includes:
determining a target private identifier set with the largest number of private identifiers in the private identifier sets corresponding to the user sides;
and according to the number of the private identifiers in the target private identifier set, performing completion processing on the private identifier sets other than the target private identifier set among the private identifier sets corresponding to the user sides to obtain the completed private identifier set corresponding to each user side.
Optionally, after the generating identifier record information corresponding to each user end according to the matching identifier set and each blinded identifier set, the method further includes:
converting the identifier record information corresponding to each user side according to a pre-stored hash function to obtain converted identifier record information;
and uploading the converted identifier record information to a preset block chain.
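As a rough illustration of the conversion step, the sketch below hashes a canonical serialization of the identifier record information with SHA-256. The source does not specify the hash function, the serialization, or the blockchain interface, so SHA-256, the JSON encoding, and the name `record_digest` are assumptions, and the upload itself is left out.

```python
import hashlib
import json


def record_digest(records):
    """Return a hex digest of the identifier record information.

    Assumption: records are JSON-serializable (e.g. lists of
    [blinded_identifier, share] pairs) and are serialized canonically
    so that equal records always hash to the same digest.
    """
    payload = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Only the digest would be written to the chain in such a design, so the record contents stay off-chain while remaining auditable.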
Optionally, before the acquiring the blinded identifier set after blinding processing corresponding to the service to be processed sent by each user side, the method further includes:
constructing a data transmission channel between the cooperator side and each user side according to a preset channel construction rule;
the obtaining of the blinded identifier set after the blinding process corresponding to the service to be processed sent by each user side includes:
and acquiring a blinded identifier set after blinding corresponding to the service to be processed sent by each user side through a pre-constructed data transmission channel.
In a second aspect, an embodiment of the present application provides a heterogeneous data processing method, applied to a user side, including:
performing blinding processing on identifier data corresponding to the service to be processed according to a pre-stored blinding processing rule to obtain a blinded identifier set;
sending the blinded identifier sets to a cooperator side so that the cooperator side performs matching processing according to the blinded identifier sets sent by the user sides to obtain matching identifier sets, and then generating identifier record information corresponding to the user sides according to the matching identifier sets and the blinded identifier sets;
and receiving a plurality of pieces of identifier record information sent by the cooperator side, and performing heterogeneous data decryption calculation according to the plurality of pieces of identifier record information to obtain a plaintext processing result of the service to be processed.
Optionally, the performing blinding processing on the identifier data corresponding to the service to be processed according to the pre-stored blinding processing rule to obtain a blinded identifier set includes:
determining a blinding parameter according to a preset blinding parameter processing rule;
processing the blinding parameter and the identifier data corresponding to the service to be processed according to a preset hash function to obtain an initial blinded identifier set;
generating, according to a preset noise generation rule, a preset number of noise identifiers that do not duplicate any identifier in the initial blinded identifier set;
and obtaining the blinded identifier set according to the noise identifiers and the initial blinded identifier set.
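The blinding steps above can be sketched as follows. The source only says "a preset hash function" and "a preset noise generation rule"; this sketch assumes HMAC-SHA256 keyed with the blinding parameter and uniformly random 32-byte noise values, and the names `blind_identifier` and `build_blinded_set` are hypothetical.

```python
import hashlib
import hmac
import secrets


def blind_identifier(identifier, blinding_param):
    """Blind one identifier with a keyed hash (assumption: HMAC-SHA256)."""
    return hmac.new(
        blinding_param, identifier.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def build_blinded_set(identifiers, blinding_param, noise_count):
    """Return (blinded identifier set, noise identifiers).

    Noise identifiers have the same format as real blinded identifiers
    and are regenerated if they collide with the initial blinded set.
    """
    initial = {blind_identifier(i, blinding_param) for i in identifiers}
    noise = set()
    while len(noise) < noise_count:
        candidate = secrets.token_hex(32)  # 64 hex characters, like a SHA-256 digest
        if candidate not in initial:
            noise.add(candidate)
    return initial | noise, noise
```

The user side keeps the returned noise set locally so that it can later recognize which positions in the identifier record information correspond to noise.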
Optionally, the determining the blinding parameter according to the preset blinding parameter processing rule includes:
randomly determining a first initial blinding parameter, and receiving second initial blinding parameters sent by other user sides;
and carrying out XOR operation on the first initial blinding parameter and the second initial blinding parameter to obtain a blinding parameter.
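A minimal sketch of the XOR combination follows, assuming every user side contributes a random byte string of the same length; `PARAM_BYTES`, `new_initial_parameter`, and `combine_parameters` are hypothetical names, and the exchange of parameters between user sides is not shown.

```python
import secrets

PARAM_BYTES = 32  # assumption: parameter length is not given in the source


def new_initial_parameter(length=PARAM_BYTES):
    """Randomly determine this user side's initial blinding parameter."""
    return secrets.token_bytes(length)


def combine_parameters(own, *received):
    """XOR the local parameter with those received from other user sides."""
    combined = own
    for other in received:
        if len(other) != len(combined):
            raise ValueError("all initial blinding parameters must have the same length")
        combined = bytes(x ^ y for x, y in zip(combined, other))
    return combined
```

Because XOR is commutative and associative, every user side derives the same blinding parameter regardless of the order in which the contributions arrive, while the cooperator side, which sees none of them, cannot reproduce it.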
Optionally, the identifier record information includes an identifier position, and the performing heterogeneous data decryption calculation according to the identifier record information to obtain a plaintext processing result of the service to be processed includes:
for each piece of identifier record information, judging according to the identifier position whether the identifier record information corresponds to a noise identifier;
if the identifier record information does not correspond to a noise identifier, determining first initial data corresponding to the identifier position according to the identifier position and the identifier data corresponding to the service to be processed;
jointly processing the first initial data and second initial data sent by other user sides according to a preset joint calculation processing rule to obtain an intermediate result fragment corresponding to the identifier record information;
and aggregating the intermediate result fragments corresponding to all pieces of identifier record information to obtain an intermediate result, and decrypting the intermediate result according to a preset secure multi-party computation protocol to obtain a plaintext processing result of the service to be processed.
Optionally, the identifier record information includes sub-private fragment data generated from the first private fragment data or the second private fragment data, and the jointly processing the first initial data and the second initial data sent by other user sides according to the preset joint calculation processing rule to obtain an intermediate result fragment corresponding to the identifier record information includes:
performing fragmentation processing on the first initial data according to a preset fragmentation rule to obtain a plurality of first initial sub-data;
sending a preset number of the plurality of first initial sub-data (the first outgoing initial sub-data) to the other user sides and retaining the rest as first remaining initial sub-data, while receiving a preset number of second outgoing initial sub-data sent by the other user sides, wherein the second outgoing initial sub-data is obtained by fragmenting the second initial data at the other user sides according to the preset fragmentation rule;
and jointly calculating, according to a preset secret sharing protocol, the first remaining initial sub-data, the second outgoing initial sub-data, and the first sub-private fragment data at the local user side together with the second remaining initial sub-data, the first outgoing initial sub-data, and the second sub-private fragment data at the other user sides, to obtain an intermediate result fragment corresponding to the identifier record information, wherein the first sub-private fragment data is the sub-private fragment data included in the identifier record information corresponding to the local user side, and the second sub-private fragment data is the sub-private fragment data included in the identifier record information corresponding to the other user sides.
Optionally, the method further includes:
and if the identifier record information corresponds to a noise identifier, setting the data corresponding to the identifier position to zero.
Optionally, after obtaining the plaintext processing result of the service to be processed, the method further includes:
converting the plaintext processing result of the service to be processed according to a pre-stored hash function to obtain a converted plaintext processing result;
and uploading the converted plaintext processing result to a preset block chain.
In a third aspect, an embodiment of the present application provides a heterogeneous data processing apparatus, applied to a cooperator side, including:
the acquiring module is used for acquiring a blinded identifier set after blinded processing corresponding to the service to be processed sent by each user side;
the processing module is used for matching each blinded identifier set according to a preset matching rule to obtain a matching identifier set;
the processing module is further configured to generate identifier record information corresponding to each user side according to the matching identifier set and each blinding identifier set, and send the identifier record information corresponding to each user side to the corresponding user side, so that the user side performs heterogeneous data decryption calculation according to the received identifier record information to obtain a plaintext processing result of the service to be processed.
In a fourth aspect, an embodiment of the present application provides a heterogeneous data processing apparatus, applied to a user side, including:
the processing module is used for performing blinding processing on the identifier data corresponding to the service to be processed according to a pre-stored blinding processing rule to obtain a blinded identifier set;
the sending module is used for sending the blinded identifier sets to a cooperator end so that the cooperator end performs matching processing according to the blinded identifier sets sent by the user ends to obtain matching identifier sets, and then generates identifier record information corresponding to the user ends according to the matching identifier sets and the blinded identifier sets;
the processing module is further configured to receive the plurality of identifier record information sent by the cooperator, and perform heterogeneous data decryption calculation according to the plurality of identifier record information to obtain a plaintext processing result of the service to be processed.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the heterogeneous data processing method of any one of the first and second aspects.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, where, when a processor executes the computer-executable instructions, the heterogeneous data processing method according to any one of the first and second aspects is implemented.
In a seventh aspect, an embodiment of the present application provides a computer program product including a computer program, where, when the computer program is executed by a processor, the heterogeneous data processing method according to any one of the first aspect and the second aspect is implemented.
The embodiments of the application provide a heterogeneous data processing method, a device, and electronic equipment. With this scheme, the cooperator side first obtains the blinded identifier set, produced by blinding processing, that each user side sends for the service to be processed. It then matches the blinded identifier sets according to a preset matching rule to obtain a matching identifier set, generates identifier record information for each user side according to the matching identifier set and each blinded identifier set, and sends each user side its identifier record information, so that the user side performs heterogeneous data decryption calculation according to the received identifier record information to obtain a plaintext processing result of the service to be processed. Because sample alignment is performed by an added cooperator side that handles only blinded, non-sensitive ciphertext data, the scheme avoids the need, inherent in traditional sample alignment, to disclose the sample identifier set shared by the parties. This reduces the risk of user information leakage, improves the security of blinded heterogeneous data processing, reduces compliance risk, and improves the user experience.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic architecture diagram of an application system of a heterogeneous data processing method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a heterogeneous data processing method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a heterogeneous data processing method according to another embodiment of the present application;
Fig. 4 is a schematic structural diagram of a heterogeneous data processing apparatus according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of including other sequential examples in addition to those illustrated or described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the prior art, different organizations can be represented by different terminal devices (which may be user sides). Different user sides may each hold heterogeneous data sets with different fields, and, on the premise that no party provides its sensitive data in plaintext to the others, aggregate statistics can be computed after sample alignment is performed on the identifiers recorded in the data sets. For example, user side A has 100 records (id, X1) containing sensitive data X1, and user side B has 50 records (id, X2) containing sensitive data X2. If the set of ids common to A and B has size 40, sample alignment finds the 40 common records, and the sum of the products of their sensitive data, Y = SUM(X1 * X2), can then be calculated, implementing a joint operation over heterogeneous data. However, in real business, sample alignment outputs the shared information in plaintext; that is, every organization participating in the calculation learns which records it has in common with the other organizations. This increases the risk of user information leakage, which in turn increases compliance risk and degrades the user experience.
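The two-party example above can be written out as a plaintext reference computation. This is only a toy illustration with made-up ids and values (the names `records_a`, `records_b`, and `joint_sum_of_products` are hypothetical); it computes the target quantity Y directly and, like traditional sample alignment, exposes the common ids in the clear.

```python
# Toy data: integer ids and invented sensitive values, for illustration only.
records_a = {i: i for i in range(100)}       # user side A: 100 records (id, X1)
records_b = {i: 2 for i in range(60, 110)}   # user side B: 50 records (id, X2)


def joint_sum_of_products(a, b):
    """Align the two record sets on id and return (|common ids|, SUM(X1 * X2))."""
    common_ids = a.keys() & b.keys()
    return len(common_ids), sum(a[i] * b[i] for i in common_ids)


common_count, y = joint_sum_of_products(records_a, records_b)
```

Here `common_ids` is exactly the information that both parties would learn under traditional alignment, which is the leakage the application sets out to avoid.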
To address this technical problem, the present application performs sample alignment by adding a cooperator side that processes only blinded, non-sensitive ciphertext data. This avoids the need, inherent in traditional sample alignment, to disclose the sample identifier set shared by the parties, reduces the risk of user information leakage, improves the security of blinded heterogeneous data processing, reduces compliance risk, and thereby improves the user experience.
Fig. 1 is a schematic architecture diagram of an application system of a heterogeneous data processing method provided in an embodiment of the present application. As shown in Fig. 1, the application system may include a cooperator side 101 and different user sides 102, where the number of user sides 102 may be two, three, or more. Each user side 102 may perform blinding processing on the locally stored identifier set corresponding to the service to be processed to obtain a blinded identifier set, and may then send the blinded identifier set to the cooperator side 101 for sample alignment processing, which generates identifier record information corresponding to each user side. The identifier record information corresponding to each user side is then sent to that user side, so that the user side performs heterogeneous data decryption calculation according to the received identifier record information to obtain a plaintext processing result of the service to be processed.
The cooperator side 101 may be a single server or a server cluster. The user side 102 may be an independent server, a server cluster, a personal computer, a smartphone, a tablet, or the like. The user sides may be the same kind of device or different kinds of devices.
The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a flowchart illustrating a heterogeneous data processing method according to an embodiment of the present application, where the method of this embodiment may be executed by the cooperator side 101. As shown in Fig. 2, the method of this embodiment may include:
S201: acquiring the blinded identifier set, produced by blinding processing, that each user side sends for the service to be processed.
In this embodiment, implementing the service to be processed may involve obtaining the data corresponding to the service from different user sides and then jointly processing the obtained heterogeneous data.
However, different user sides may hold many pieces of data, only some of which are needed to implement the service to be processed. Sample alignment therefore needs to be performed on the data to identify the data that the service actually requires. To improve the efficiency of sample alignment, a cooperator side may be added: each user side sends an identifier set representing its data to the cooperator side, which carries out the sample alignment.
However, data may leak while each user side sends its identifier set to the cooperator side. To reduce this risk, each user side may perform blinding processing on the identifier set corresponding to the service to be processed to obtain a blinded identifier set, and then send the blinded identifier set to the cooperator side.
The blinded identifier set may include a plurality of blinded identifiers, each of which is a data identifier after blinding processing. Illustratively, the data identifier may be a data id, and the blinded identifier may be the id after blinding processing. For example, the data identifier before blinding processing may be a0001, and the data identifier after blinding processing may be 0xAF12C3.
In addition, the blinding method can be customized according to the actual application scenario and is not limited in detail here.
S202: matching the blinded identifier sets according to a preset matching rule to obtain a matching identifier set.
In this embodiment, after the blinded identifier sets of the user sides are obtained, the blinded identifier sets of different user sides may be matched against each other to obtain a matching identifier set containing the identifier information that the different blinded identifier sets have in common.
Further, matching the blinded identifier sets according to the preset matching rule to obtain the matching identifier set may specifically include:
comparing the blinded identifier sets corresponding to different user sides, and determining the intersection of the blinded identifier sets corresponding to different user sides to obtain the matching identifier set.
Specifically, the blinded identifier sets corresponding to different user sides may be compared to determine the identifier information they have in common, and all of the common identifier information may then be collected into a new set to obtain the matching identifier set.
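The matching step amounts to a set intersection over blinded values, which can be sketched as follows. The mapping from a user-side label to its blinded identifier set and the name `match_blinded_sets` are assumptions made for this illustration.

```python
def match_blinded_sets(blinded_sets):
    """Return the identifiers present in every user side's blinded identifier set.

    `blinded_sets` maps a user-side label to an iterable of blinded identifiers.
    """
    sets = [set(s) for s in blinded_sets.values()]
    if not sets:
        return set()
    return set.intersection(*sets)
```

The cooperator side works only on blinded values here, so it learns how many identifiers match but not which underlying ids they are.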
S203: generating identifier record information corresponding to each user side according to the matching identifier set and each blinded identifier set, and sending the identifier record information corresponding to each user side to that user side, so that the user side performs heterogeneous data decryption calculation according to the received identifier record information to obtain a plaintext processing result of the service to be processed.
In this embodiment, after the matching identifier set is obtained, the identifier record information corresponding to each user side may be generated according to each piece of blinded identifier information in the matching identifier set and the blinded identifier information in each blinded identifier set, and then the identifier record information corresponding to each user side is sent to the corresponding user side, so that the user side performs processing according to the identifier record information to obtain a plaintext processing result of the service to be processed.
Further, generating identifier record information corresponding to each user side according to the matching identifier set and each blinded identifier set may specifically include:
Deleting, from each blinded identifier set, the blinded identifiers that appear in the matching identifier set, to obtain the private identifier set corresponding to each user side.
Specifically, for each user side, the blinded identifiers that also appear in the matching identifier set may be removed from that user side's blinded identifier set, leaving the private identifier set corresponding to that user side.
In addition, the number of private identifier sets may be zero, one, or more. If the blinded identifier sets of all user sides are identical, the number of private identifier sets is zero. If the blinded identifier set of only one user side contains blinded identifiers other than those in the matching identifier set, the number of private identifier sets is one.
And performing completion processing on the private identifier sets corresponding to the user sides according to a preset completion rule to obtain the completed private identifier sets corresponding to the user sides.
Specifically, when the private identifier set is complemented, the specific processing procedure may be:
and determining a target private identifier set with the maximum number of private identifiers in the private identifier sets corresponding to the user sides.
And according to the number of the private identifiers in the target private identifier set, performing complementation processing on other private identifier sets except the target private identifier set in the private identifier set corresponding to each user side to obtain the complemented private identifier set corresponding to each user side.
Correspondingly, the target private identifier set, that is, the one containing the largest number of private identifiers, may be determined from among the private identifier sets, and every other private identifier set may then be padded until it contains the same number of private identifiers as the target private identifier set. After completion, all private identifier sets contain the same number of private identifiers, which simplifies subsequent analysis and processing. The private identifiers in a private identifier set are blinded identifiers, i.e. identifiers that have already undergone blinding.
When a private identifier set is completed, it may be padded with noise identifiers that are generated according to an existing noise identifier generation rule and that do not duplicate any existing blinded identifier.
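A minimal sketch of the completion step follows, assuming noise identifiers are random hexadecimal strings in the same format as the blinded identifiers. The function name `pad_private_sets`, the identifier format and the placeholder values are assumptions for illustration only.

```python
import secrets


def pad_private_sets(private_sets, matched):
    """Pad every private set to the size of the largest one with fresh noise ids."""
    target = max((len(ids) for ids in private_sets.values()), default=0)
    used = set(matched)
    for ids in private_sets.values():
        used |= set(ids)
    padded = {}
    for party, ids in private_sets.items():
        out = list(ids)
        while len(out) < target:
            noise = "0x" + secrets.token_hex(3).upper()
            if noise not in used:  # never reuse an existing blinded identifier
                used.add(noise)
                out.append(noise)
        padded[party] = out
    return padded


matched = {"0xAF12C3", "0xCC6712", "0x2E341B"}
private_sets = {
    "A": ["0x91B0D4"],                          # placeholder values
    "B": ["0x5D20E7", "0x7A44F1", "0x0C93AB"],  # placeholder values
}
padded = pad_private_sets(private_sets, matched)
```

With these inputs the smaller set receives two noise identifiers, so both private identifier sets end up with three elements.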
And mixing the matching identifier set and the supplemented private identifier sets corresponding to the user sides to obtain an aligned sample set.
Specifically, after the private identifier sets are completed, and in order to improve the security of subsequent data processing, the data in the matching identifier set and the data in the completed private identifier sets corresponding to the user sides may be randomly mixed to obtain the aligned sample set. Each aligned sample in the aligned sample set has a corresponding sample number.
Mixing the data scrambles its previous ordering, which further reduces the possibility of data leakage and improves the security of data transmission.
And for each alignment sample in the alignment sample set, if the alignment sample belongs to the matching identifier set, generating identifier record information corresponding to each user side according to a pre-stored first generation rule.
And if the alignment sample belongs to the private identifier set corresponding to each user side, generating identifier record information corresponding to each user side according to a pre-stored second generation rule.
Specifically, after the mixed aligned sample set is obtained, an aligned sample may be randomly selected from it. The original source of the aligned sample is then determined from the sample identifier it contains, that is, whether the aligned sample belongs to the matching identifier set or to the private identifier sets corresponding to the user sides. The generation rule is chosen according to that source, and the identifier record information corresponding to each user side is generated according to the chosen rule.
Further, if the alignment sample belongs to the matching identifier set, generating identifier record information corresponding to each user end according to a pre-stored first generation rule, which may specifically include:
if the alignment sample belongs to the matching identifier set, randomly generating a pair of first private fragment data related to the value 1.
And generating identifier record information corresponding to each user side according to the alignment sample and the first private fragment data.
In addition, if the alignment sample belongs to the private identifier set corresponding to each user side, generating identifier record information corresponding to each user side according to a second pre-stored generation rule, which may specifically include:
and if the alignment sample belongs to the private identifier set corresponding to each user side, randomly generating a pair of second private fragment data related to the value 0.
And generating identifier record information corresponding to each user side according to the alignment sample and the second private fragment data.
The first private fragment data related to the value 1 is a group of private fragments whose sum is 1, and the number of fragments equals the number of user sides. For example, with two user sides, a random number r_w is first generated, and the first private fragment data is then generated as selector_w = (r_w, 1 - r_w). Similarly, the second private fragment data related to the value 0 is a group of private fragments whose sum is 0, and the number of fragments again equals the number of user sides. For example, with two user sides, a random number r_w is first generated, and the second private fragment data is then generated as selector_w = (r_w, -r_w).
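The fragment generation can be sketched as follows, assuming plain integer arithmetic and a hypothetical helper `make_selector_shares`. Secret sharing schemes typically work modulo a fixed number rather than over unbounded integers; the description does not specify this, so the sketch keeps the literal form (r_w, value - r_w).

```python
import secrets


def make_selector_shares(value, n_parties, bound=2**64):
    """Return n_parties fragments that sum to value (1 for a match, 0 otherwise)."""
    shares = [secrets.randbelow(bound) for _ in range(n_parties - 1)]
    shares.append(value - sum(shares))  # last fragment fixes the total
    return tuple(shares)


match_selector = make_selector_shares(1, 2)    # (r_w, 1 - r_w)
private_selector = make_selector_shares(0, 2)  # (r_w, -r_w)
```

Each fragment on its own is a random-looking number, so a user side that sees only its own fragment cannot tell whether the record is a real match.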
After the first or second private fragment data is obtained, the identifier record information corresponding to each user side can be generated from the alignment sample and the obtained private fragment data. The identifier record information for a given user side is the record with the information relating to the other user sides removed. The generated identifier record information is then sent to the corresponding user side.
For example, assume that there are two user sides, user side A and user side B. User side A has four data samples, listed in Table 1; each data sample includes identifier information and the corresponding specific data, which for user side A is the income level (income).
Table 1 Data sample table of user side A
id income
A0001 100
A0002 300
A0003 200
A0004 150
User side B has six data samples, listed in Table 2; each data sample includes identifier information and the corresponding specific data, which for user side B is the tax rate (tax_rate).
Table 2 Data sample table of user side B
id tax_rate
A0001 0.10
A0003 0.35
A0004 0.2
A0005 0.25
A0007 0.3
A0009 0.22
Then, user side A may blind its identifier information (i.e., id) to obtain its blinded identifier set id_bn_A = {identifier position pos_A_i, identifier data id_bn_A_i}. Likewise, user side B may blind its identifier information to obtain its blinded identifier set id_bn_B = {identifier position pos_B_i, identifier data id_bn_B_i}. User side A and user side B may then send their respective blinded identifier sets to the cooperator side C. After receiving them, the cooperator side C may compare the two blinded identifier sets and collect all identical blinded identifiers into the matching identifier set id_match. Here id_match contains three blinded identifiers, 0xAF12C3, 0xCC6712 and 0x2E341B, which are the blinded forms of A0001, A0003 and A0004. In addition, two private identifier sets are obtained: id_rest_A for user side A and id_rest_B for user side B. The cooperator side C never sees identifier plaintext, only blinded identifiers, which reduces the risk of identifier information leakage.
Further, the cooperator side C may take the size of the larger of id_rest_A and id_rest_B, size_rest_large = max(size(id_rest_A), size(id_rest_B)), and add non-repeating noise identifiers to the smaller set until it also has size_rest_large elements, so that the padded id_rest_A and id_rest_B are the same size. The corresponding identifier position field pos_A_i or pos_B_i is incremented accordingly. In this example, user side B has two more data samples than user side A, so the cooperator side C inserts two noise identifiers into id_rest_A to make it the same size as id_rest_B. The cooperator side C may then shuffle and mix the elements of the three sets id_match, id_rest_A and id_rest_B into a complete aligned sample set BAS, and randomly select the w-th group of aligned samples from it. If the w-th group comes from id_match, a pair of first private fragment data related to the value 1 is generated as selector_w = (r_w, 1 - r_w), where r_w is a fresh, independently generated random number each time, and the resulting aligned sample is (idx_w, pos_A_i, pos_B_j, selector_w). The aligned sample set is also referred to as the blinded aligned sample set.
In addition, if the w-th group of aligned samples does not come from id_match, one element is randomly selected from each of id_rest_A and id_rest_B, and a pair of second private fragment data related to the value 0 is generated as selector_w = (r_w, -r_w), where r_w is again a fresh, independently generated random number; the resulting aligned sample is (idx_w, pos_A_i, pos_B_j, selector_w). The selection is sampling without replacement: an element that has been selected is not selected again. The cooperator side C may then derive the identifier record information for user side A and user side B from each aligned sample (idx_w, pos_A_i, pos_B_j, selector_w): the record for user side A is (idx_w, pos_A_i, selector_w[0]) and the record for user side B is (idx_w, pos_B_j, selector_w[1]). The cooperator side C sends (idx_w, pos_A_i, selector_w[0]) to user side A through the preset Channel_AC and (idx_w, pos_B_j, selector_w[1]) to user side B through the preset Channel_BC.
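The record generation performed by the cooperator side can be sketched as below. This is only an illustration under stated assumptions: the function name `build_records` is hypothetical, positions are plain integers chosen for the example, the two private sets are already padded to equal size, and unmatched positions are paired one from each side as described above.

```python
import secrets


def build_records(matched, rest_a, rest_b):
    """matched: list of (pos_a, pos_b) pairs; rest_a, rest_b: equal-length position lists.

    Returns the per-side records (idx_w, pos, selector share) for A and for B.
    """
    rng = secrets.SystemRandom()
    rest_a, rest_b = list(rest_a), list(rest_b)
    rng.shuffle(rest_a)  # random pairing, each element used once
    rng.shuffle(rest_b)
    samples = [(pa, pb, 1) for pa, pb in matched]
    samples += [(pa, pb, 0) for pa, pb in zip(rest_a, rest_b)]
    rng.shuffle(samples)  # hide which samples are real matches
    records_a, records_b = [], []
    for idx, (pa, pb, value) in enumerate(samples):
        r = secrets.randbelow(2**64)      # fresh r_w for every sample
        selector = (r, value - r)         # sums to 1 for a match, 0 otherwise
        records_a.append((idx, pa, selector[0]))
        records_b.append((idx, pb, selector[1]))
    return records_a, records_b


matched_pairs = [(0, 0), (2, 1), (3, 2)]  # hypothetical positions
records_a, records_b = build_records(matched_pairs, [1, 4, 5], [3, 4, 5])
```

Each side receives only its own position and its own selector fragment, so neither side learns from its records alone which samples were actually matched.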
Further, the service to be processed may be a transfer, a balance inquiry, a loan, or the like. In this embodiment, the service to be processed is calculating the total tax, which is determined by the expression SUM(income × tax_rate). After receiving the identifier record information, user side A and user side B may therefore perform the heterogeneous data decryption calculation according to the received (idx_w, pos_A_i, selector_w[0]) and (idx_w, pos_B_j, selector_w[1]) to obtain the total tax.
With the above scheme, the cooperator side first obtains the blinded identifier sets, produced by blinding, that each user side sends for the service to be processed. It then matches the blinded identifier sets according to the preset matching rule to find the identical identifier information and obtains the matching identifier set. Next, it generates the identifier record information corresponding to each user side according to the matching identifier set and each blinded identifier set, and sends each user side its identifier record information, so that the user side performs the heterogeneous data decryption calculation according to the received identifier record information to obtain the plaintext processing result of the service to be processed. Sample alignment is thus achieved by adding a cooperator side that processes only non-sensitive, blinded ciphertext data. This avoids the need, in traditional sample alignment, to disclose the sample identifier set shared by multiple parties; it reduces the risk of user information leakage, improves the security of heterogeneous data processing, reduces compliance risk, and improves the user experience.
Based on the method of fig. 2, the present specification also provides some specific embodiments of the method, which are described below.
In another embodiment, after generating the identifier record information corresponding to each user terminal according to the matching identifier set and each blinded identifier set, the method may further include:
and performing conversion processing on the identifier record information corresponding to each user side according to a pre-stored hash function to obtain the converted identifier record information.
And uploading the converted identifier record information to a preset block chain.
In this embodiment, to improve data security, the identifier record information, once obtained, may be uploaded to a preset blockchain, which prevents it from being tampered with by other user sides and supports after-the-fact auditing. In addition, after each user side obtains its blinded identifier set, the set may likewise be converted according to a pre-stored hash function, and the converted blinded identifier set may then be uploaded to the preset blockchain. The blockchain may be implemented in any existing manner, which is not specifically limited here.
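A minimal sketch of the conversion step, assuming SHA-256 as the pre-stored hash function and a compact JSON encoding of the record; both are illustrative assumptions, as is the helper name `digest_record`. Uploading the digest to a blockchain depends on the chosen platform and is not shown.

```python
import hashlib
import json


def digest_record(record):
    """Return a hex digest of one identifier record (idx_w, pos, selector share)."""
    payload = json.dumps(list(record), separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record_a = (0, 2, 123456789)  # hypothetical record for user side A
digest = digest_record(record_a)
```

Only the digest would be stored on the chain, so the record can later be checked for tampering without the chain holding the record itself.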
In addition, in another embodiment, before the obtaining of the blinding identifier set after the blinding processing of the to-be-processed service sent by each user end, the method may further include:
and constructing a data transmission channel between the cooperator side and each user side according to a preset channel construction rule.
Obtaining a blinded identifier set after blinding corresponding to the service to be processed sent by each user side, which may specifically include:
and acquiring a blinded identifier set after blinding corresponding to the service to be processed sent by each user side through a pre-constructed data transmission channel.
In this embodiment, each user side may first obtain its own private key and public key, after which the user sides send their public keys to one another and construct the corresponding data transmission channels based on the received public keys. As an example, assume two user sides, user side A and user side B. User side A applies to the wayside certificate authority for a certificate containing the public key pk_A for its private key sk_A, and user side B likewise applies for a certificate containing the public key pk_B for its private key sk_B. User side A and user side B send their public keys to each other and verify their authenticity based on the associated certificates. They then construct an eavesdropping-resistant secure channel, the data transmission channel Channel_AB, based on each other's public keys. When user side A sends a message m to user side B over Channel_AB, the message is encrypted with user side B's public key pk_B before being sent; on receipt, user side B decrypts it with its private key sk_B to recover the plaintext, and vice versa. In addition, a third party that holds no sensitive data, the cooperator side C, can be introduced, and the above steps repeated to construct the secure channels Channel_AC and Channel_BC. Transmitting data through these data transmission channels improves the security of data transmission.
Fig. 3 is a flowchart illustrating a heterogeneous data processing method according to another embodiment of the present application, where the method of the present embodiment can be executed by the user terminal 102. As shown in fig. 3, the method of this embodiment may include:
S301: and performing blinding processing on the identifier data corresponding to the service to be processed according to a pre-stored blinding processing rule to obtain a blinded identifier set.
In this embodiment, at least one identifier data corresponding to the service to be processed in each user side may be processed according to a pre-stored blinding processing rule to obtain a blinding identifier set.
Further, the blinding processing is performed on the identifier data corresponding to the service to be processed according to a pre-stored blinding processing rule to obtain a blinding identifier set, which may specifically include:
and determining the blinding parameters according to a preset blinding parameter processing rule.
And processing the blinding parameters and the identifier data corresponding to the service to be processed according to a preset hash function to obtain an initial blinding identifier set.
And generating a preset number of noise identifiers which are not repeated with the initial blinding identifier set according to a preset noise generation rule.
And obtaining a blind identifier set according to the noise identifier and the initial blind identifier set.
Specifically, when the blinding parameter is determined according to the preset blinding parameter processing rule, a first initial blinding parameter may be randomly determined, and a second initial blinding parameter sent by another user side is received. And then carrying out XOR operation on the first initial blinding parameter and the second initial blinding parameter to obtain a blinding parameter. Wherein, the second initial blinding parameter may be one or more.
In addition, when a preset number of noise identifiers which are not repeated with the initial blinding identifier set are generated according to a preset noise generation rule, N noise identifiers which are not repeated with the blinding identifiers in the existing initial blinding identifier set can be generated according to a preset noise generation rate, then the N noise identifiers are added into the initial blinding identifier set, and the sequence of the N noise identifiers is disturbed, so that the blinding identifier set is obtained.
Illustratively, assume two user sides, user side A and user side B. User side A independently selects a random number b_A as its blinding seed fragment and sends it to user side B through Channel_AB. User side B independently selects a random number b_B as its blinding seed fragment and sends it to user side A through Channel_AB. User side A then calculates the blinding parameter b = b_A XOR b_B. User side A may then calculate id_b = Hash(id, b) in turn for the id field of each of its N_A records, obtaining the initial blinded identifier set id_b_A containing N_A blinded identifiers. For example, id_b = Hash("A0001", b) = 0xAF12C3 is the blinded string corresponding to "A0001", so no id plaintext is leaked during the sample alignment stage.
In addition, based on a noise rate rate_n that it estimates itself, user side A generates N_A × rate_n noise identifiers that do not duplicate any existing element of id_b_A, adds them to the initial blinded identifier set id_b_A, and then shuffles the order, finally obtaining the blinded identifier set id_bn_A = {identifier position pos_A_i, identifier data id_bn_A_i}. For example, if N_A = 4 and rate_n = 50%, then N_A × rate_n = 2, so 2 noise identifiers are added and the blinded identifier set id_bn_A contains 4 + 2 = 6 blinded identifiers.
Similarly, user side B may generate its own blinded identifier set id_bn_B = {identifier position pos_B_i, identifier data id_bn_B_i}.
Because an id common to user side A and user side B is the same input on both sides, and both sides use the same blinding parameter b, the blinded id values they generate are also the same. For example, both sides compute id_b = Hash("A0001", b) = 0xAF12C3 for "A0001", so the blinded values can be used in the subsequent blinded sample-id comparison to determine the matching identifier set.
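The blinding procedure can be sketched as follows. The sketch assumes Hash(id, b) is instantiated with HMAC-SHA256 keyed by b, that seed fragments are 16 random bytes, and that noise identifiers are random strings of the same length as the digests; the description fixes none of these. The names `blinding_parameter`, `blind` and `blinded_set` are hypothetical.

```python
import hashlib
import hmac
import secrets


def blinding_parameter(seed_a: bytes, seed_b: bytes) -> bytes:
    """b = b_A XOR b_B, computed bytewise over equal-length seed fragments."""
    if len(seed_a) != len(seed_b):
        raise ValueError("seed fragments must have equal length")
    return bytes(x ^ y for x, y in zip(seed_a, seed_b))


def blind(identifier: str, b: bytes) -> str:
    """Hash(id, b), instantiated here (as an assumption) with HMAC-SHA256."""
    return hmac.new(b, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def blinded_set(ids, b, noise_rate):
    """Return (entries, origin): entries[pos] is the blinded identifier sent out,
    origin[pos] is the plaintext id, or None for a locally added noise entry."""
    pairs = [(blind(i, b), i) for i in ids]
    seen = {blinded for blinded, _ in pairs}
    for _ in range(int(len(ids) * noise_rate)):
        noise = secrets.token_hex(32)
        while noise in seen:  # noise must not repeat an existing identifier
            noise = secrets.token_hex(32)
        seen.add(noise)
        pairs.append((noise, None))
    secrets.SystemRandom().shuffle(pairs)  # scramble the positions
    entries = [blinded for blinded, _ in pairs]
    origin = [plain for _, plain in pairs]
    return entries, origin


b_a, b_b = secrets.token_bytes(16), secrets.token_bytes(16)
b = blinding_parameter(b_a, b_b)
entries_a, origin_a = blinded_set(["A0001", "A0002", "A0003", "A0004"], b, 0.5)
```

Only `entries_a` would be sent to the cooperator side; `origin_a` stays local so that the user side can later tell real records from its own noise by position.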
S302: and sending the blind identifier sets to a cooperative party end so that the cooperative party end performs matching processing according to the blind identifier sets sent by the user ends to obtain matching identifier sets, and then generating identifier record information corresponding to the user ends according to the matching identifier sets and the blind identifier sets.
In this embodiment, after obtaining the blinded identifier set, the user side may send it to the cooperator side through a preset data transmission channel. The cooperator side also receives the blinded identifier sets sent by the other user sides, and after receiving the blinded identifier sets from all user sides it may process them to obtain the plurality of identifier record information corresponding to each user side. The specific processing has been described in detail in the foregoing embodiments and is not repeated here.
S303: and receiving a plurality of identifier recording information sent by the cooperative party end, and carrying out heterogeneous data decryption calculation according to the plurality of identifier recording information to obtain a plaintext processing result of the service to be processed.
In this embodiment, after obtaining the identifier record information corresponding to each user terminal, the cooperator may return the obtained plurality of identifier record information to the corresponding user terminal, and after receiving the returned identifier record information, the user terminal may perform heterogeneous data decryption calculation according to the plurality of identifier record information to obtain a plaintext processing result of the service to be processed.
Further, if the identifier recording information includes the identifier position, performing heterogeneous data decryption calculation according to the plurality of identifier recording information to obtain a plaintext processing result of the service to be processed, which may specifically include:
for each of the identifier recording information, it is determined whether the identifier recording information is a noise identifier from the identifier position.
And if the identifier recording information is not a noise identifier, determining first initial data corresponding to the identifier position according to the identifier position and the identifier data corresponding to the service to be processed.
And jointly processing the first initial data and second initial data sent by other user sides according to a preset joint calculation processing rule to obtain an intermediate result fragment corresponding to the identifier recording information.
And performing aggregation processing on the intermediate result fragments corresponding to each identifier record information to obtain an intermediate result, and performing decryption processing on the intermediate result according to a preset secure multiparty computing protocol to obtain a plaintext processing result of the service to be processed.
Specifically, each identifier record information includes an identifier position, namely pos _ a _ i or pos _ B _ j, and then it can be determined whether the identifier position is data in the initially acquired data set sample according to pos _ a _ i or pos _ B _ j. The determination method may be various, and only one specific implementation is illustrated here, and other determination methods are also within the scope of the present application.
Further, the identifier record information includes sub-private fragment data generated by first private fragment data or second private fragment data, and the combining processing is performed on the first initial data and second initial data sent by other user sides according to a preset combining computation processing rule to obtain an intermediate result fragment corresponding to the identifier record information, which may specifically include:
and carrying out fragmentation processing on the first initial data according to a preset fragmentation rule to obtain a plurality of first initial subdata.
Sending a preset number of the plurality of first initial subdata to the other user terminals as first outgoing initial subdata, leaving the first remaining initial subdata, and at the same time receiving a preset number of second outgoing initial subdata sent by the other user terminals, wherein the second outgoing initial subdata is obtained by the other user terminals by fragmenting the second initial data according to the preset fragmentation rule.
And jointly calculating the first remaining initial sub-data, the second outgoing initial sub-data, the first sub-private fragment data, and the second remaining initial sub-data, the first outgoing initial sub-data, and the second sub-private fragment data in the other user sides according to a preset private sharing protocol to obtain an intermediate result fragment corresponding to the identifier recording information, wherein the first sub-private fragment data is sub-private fragment data included in the identifier recording information corresponding to the local user side, and the second sub-private fragment data is sub-private fragment data included in the identifier recording information corresponding to the other user sides.
Specifically, after the other user side sends the second outgoing initial sub data, the remaining data may be referred to as second remaining initial sub data.
Further, the method may further include:
and if the identifier recording information is a noise identifier, setting data corresponding to the identifier position to be zero.
For example, taking user side A and user side B, user side A locally reads the record pointed to by pos_A_i in (idx_w, pos_A_i, selector_w[0]) and determines the corresponding income field in the initial data set sample, specifically as follows:
If pos_A_i exceeds the maximum number of records in the original data set sample of user side A, it indicates a noise record added by the cooperator side C, and income is set to 0.
If pos_A_i points to a noise record originally added by user side A itself, income is also set to 0.
Otherwise, the real income field corresponding to pos_A_i in the original data set sample is read.
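The three cases above can be sketched as a small lookup, assuming the user side keeps a local list `origin` that maps each position in its submitted blinded identifier set to the plaintext id, with None marking its own noise entries. That bookkeeping, the interpretation of "exceeds the maximum number of records" as falling outside this list, and the name `read_value` are assumptions for illustration.

```python
def read_value(pos, origin, table, field):
    """Return the real field value for a position, or 0 for any noise position."""
    if pos >= len(origin):       # padding added by the cooperator side
        return 0
    original_id = origin[pos]
    if original_id is None:      # noise added locally during blinding
        return 0
    return table[original_id][field]


# Table 1 of the example, plus a hypothetical position layout for user side A.
table_a = {
    "A0001": {"income": 100},
    "A0002": {"income": 300},
    "A0003": {"income": 200},
    "A0004": {"income": 150},
}
origin_a = ["A0003", None, "A0001", "A0004", None, "A0002"]

real_income = read_value(0, origin_a, table_a, "income")
own_noise = read_value(1, origin_a, table_a, "income")
cooperator_noise = read_value(7, origin_a, table_a, "income")
```

The same lookup applies to user side B with the tax_rate field.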
In addition, based on the selected secure multiparty computing protocol (optionally the classical secret sharing protocol), user side A may generate a new random number r_w, split the income value v1_w into a pair of random fragments v1_w = (r_w, v1_w - r_w), and send v1_w[0] (i.e., the first outgoing initial subdata) to user side B.
User side B locally reads the record pointed to by pos_B_j in (idx_w, pos_B_j, selector_w[1]) and determines the corresponding tax_rate field in the initial data set sample, specifically as follows:
If pos_B_j exceeds the maximum number of records in the original data set sample of user side B, it indicates a noise record added by the cooperator side C, and tax_rate is set to 0.
If pos_B_j points to a noise record originally added by user side B itself, tax_rate is also set to 0.
Otherwise, the real tax_rate field pointed to by pos_B_j is read.
In addition, user side B likewise generates a new random number r_w, splits the tax_rate value v2_w into a pair of random fragments v2_w = (r_w, v2_w - r_w), and sends v2_w[0] (i.e., the second outgoing initial subdata) to user side A.
Based on an existing secure multiparty computing protocol (e.g., the classical secret sharing protocol), user side A and user side B jointly compute the intermediate result fragments of v1_w × v2_w × selector_w from the fragment data held by user side A, namely v1_w[1] (the first remaining initial subdata), v2_w[0] (the second outgoing initial subdata) and selector_w[0] (the first sub-private fragment data), and the fragment data held by user side B, namely v1_w[0], v2_w[1] and selector_w[1] (the second sub-private fragment data), without revealing the values of v1_w, v2_w or selector_w.
The value of selector_w is 1 only when the w-th group contains correctly aligned data, in which case the calculation yields an intermediate result that is a ciphertext derived from real data, such as the data corresponding to A0001, A0003 and A0004 in the example. Otherwise the value of selector_w is 0 and the corresponding intermediate result is a ciphertext of 0, so that any noise identifier contributes 0 and its influence on the final result is eliminated.
The above process is repeated until the whole aligned sample set has been traversed. After the intermediate result fragments corresponding to each piece of identifier record information are aggregated, final decryption is performed based on the secure multiparty computing protocol (e.g., the classical secret sharing protocol) to obtain the final plaintext processing result of the service to be processed.
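The description leaves the joint computation to an existing secure multiparty computing protocol. As one possible illustration only, the sketch below simulates both parties in a single process and multiplies additively shared values with Beaver triples, a well-known technique for secret-shared multiplication. The prime modulus, the simulated trusted dealer that produces the triples, the fixed-point scale for tax_rate and all function names are assumptions; this is not presented as the protocol used by the embodiment.

```python
import secrets

P = 2**61 - 1  # prime modulus; an assumption, the text does not fix a field


def share(x):
    """Split x into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return (r, (x - r) % P)


def reconstruct(sh):
    """Add the two shares to recover the value."""
    return (sh[0] + sh[1]) % P


def beaver_mul(x_sh, y_sh):
    """Multiply two shared values with a one-time triple (a, b, a*b).

    The triple is produced inline by a simulated trusted dealer.
    """
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % P)
    # Each party publishes its share of x - a and y - b; d and e are public.
    d = sum(x_sh[i] - a_sh[i] for i in range(2)) % P
    e = sum(y_sh[i] - b_sh[i] for i in range(2)) % P
    z = [(c_sh[i] + d * b_sh[i] + e * a_sh[i]) % P for i in range(2)]
    z[0] = (z[0] + d * e) % P  # the public d*e term is added by one party only
    return tuple(z)


SCALE = 100  # fixed-point scale for tax_rate (assumption)
# (income, tax_rate * SCALE, selector) for each aligned sample of the example;
# the last three rows are non-matching or noise rows with selector 0.
aligned = [
    (100, 10, 1), (200, 35, 1), (150, 20, 1),
    (300, 25, 0), (0, 30, 0), (0, 22, 0),
]

acc = [0, 0]  # each party accumulates its own intermediate result fragment
for income, rate, selector in aligned:
    prod = beaver_mul(beaver_mul(share(income), share(rate)), share(selector))
    acc = [(acc[i] + prod[i]) % P for i in range(2)]

total_scaled = reconstruct(acc)  # final "decryption": add the two fragments
total_tax = total_scaled / SCALE
```

Rows whose selector is 0 contribute nothing to the reconstructed total, which is how noise and non-matching records drop out of the result without either party learning which rows they were.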
Based on the foregoing embodiment, table 3 is a sample-aligned logic summary table, which specifically includes identifier information and corresponding specific information common to the user side a and the user side B.
Table 3 Sample-aligned logical summary table
id income tax_rate
A0001 100 0.10
A0003 200 0.35
A0004 150 0.2
That is, the total tax SUM(income × tax_rate) = 100 × 0.10 + 200 × 0.35 + 150 × 0.2 = 110 is the plaintext processing result.
Optionally, user side A and user side B may each calculate an aggregate hash of their local fragment data and send the result to the blockchain for storage.
In addition, in another embodiment, after obtaining a plaintext processing result of the service to be processed, the method may further include:
converting the plaintext processing result of the service to be processed according to a pre-stored hash function to obtain a converted plaintext processing result; and
uploading the converted plaintext processing result to a preset blockchain.
By relying on the tamper resistance of the blockchain, the key cryptographic inputs and the final plaintext processing result are recorded on chain as evidence, which improves data security and supports after-the-fact auditing.
Based on the same idea, an embodiment of the present application further provides an apparatus corresponding to the foregoing method. Fig. 4 is a schematic structural diagram of a heterogeneous data processing apparatus provided in an embodiment of the present application, applied to a cooperator side. As shown in Fig. 4, the apparatus provided in this embodiment may include:
The obtaining module 401 is configured to obtain the blinded identifier set, produced by blinding processing, that each user side sends for the service to be processed.
The processing module 402 is configured to perform matching processing on the blinded identifier sets according to a preset matching rule to obtain a matching identifier set.
In this embodiment, the processing module 402 is further configured to:
compare the blinded identifier sets corresponding to different user sides and determine their intersection to obtain the matching identifier set.
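A minimal sketch of this matching step follows, assuming the blinded identifiers arrive as strings keyed by user side. The function name and the sample values are hypothetical.

```python
def match_identifiers(blinded_sets):
    """Return the blinded identifiers present in every user side's set."""
    sets = [set(s) for s in blinded_sets.values()]
    return set.intersection(*sets) if sets else set()

# "nA" and "nB" stand for noise identifiers, which never match by construction.
blinded_sets = {
    "A": {"h1", "h2", "h3", "nA"},
    "B": {"h1", "h3", "h4", "nB"},
}
matching = match_identifiers(blinded_sets)
```

Because the cooperator side only sees blinded values, the intersection reveals which blinded entries coincide but not the underlying identifiers.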
The processing module 402 is further configured to generate identifier record information corresponding to each user side according to the matching identifier set and each blinding identifier set, and send the identifier record information corresponding to each user side to the corresponding user side, so that the user side performs heterogeneous data decryption calculation according to the received identifier record information to obtain a plaintext processing result of the service to be processed.
In this embodiment, the processing module 402 is further configured to:
perform completion processing on the private identifier sets corresponding to the user sides according to a preset completion rule to obtain the completed private identifier sets corresponding to the user sides;
mix the matching identifier set and the completed private identifier sets corresponding to the user sides to obtain an aligned sample set;
for each alignment sample in the aligned sample set, if the alignment sample belongs to the matching identifier set, generate the identifier record information corresponding to each user side according to a pre-stored first generation rule; and
if the alignment sample belongs to a private identifier set corresponding to a user side, generate the identifier record information corresponding to each user side according to a pre-stored second generation rule.
Further, the processing module 402 is further configured to:
if the alignment sample belongs to the matching identifier set, randomly generate a pair of first private fragment data that together represent the value 1; and
generate the identifier record information corresponding to each user side according to the alignment sample and the first private fragment data.
In addition, the processing module 402 is further configured to:
if the alignment sample belongs to a private identifier set corresponding to a user side, randomly generate a pair of second private fragment data that together represent the value 0; and
generate the identifier record information corresponding to each user side according to the alignment sample and the second private fragment data.
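The two generation rules can be sketched as below. The sketch assumes that a "pair of private fragment data" is a pair of additive shares modulo a prime P, and that a record is a tuple of position, identifier and selector share; the modulus, the record layout and the function names are all hypothetical.

```python
import secrets

P = 2**61 - 1  # share modulus (an assumed parameter)

def selector_shares(value):
    """Return two random-looking shares that add up to value modulo P."""
    r = secrets.randbelow(P)
    return r, (value - r) % P

def build_records(aligned_samples, matching):
    """Build one record list per user side for the aligned sample set."""
    records_a, records_b = [], []
    for position, identifier in enumerate(aligned_samples):
        # First rule: shares of 1 for a match. Second rule: shares of 0 otherwise.
        s0, s1 = selector_shares(1 if identifier in matching else 0)
        records_a.append((position, identifier, s0))
        records_b.append((position, identifier, s1))
    return records_a, records_b

aligned = ["h1", "p7", "h3", "pad-1"]
records_a, records_b = build_records(aligned, {"h1", "h3"})
```

Neither share on its own indicates whether a sample matched, since each share is uniformly distributed modulo P.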
Furthermore, the processing module is further configured to:
determine, among the private identifier sets corresponding to the user sides, the target private identifier set containing the largest number of private identifiers; and
according to the number of private identifiers in the target private identifier set, perform completion processing on the private identifier sets other than the target private identifier set to obtain the completed private identifier set corresponding to each user side.
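One possible reading of this completion step is padding every smaller private identifier set up to the size of the largest one. The sketch below follows that reading; the form of the padding entries is not stated in this passage, so the random "pad-" identifiers and the function name are assumptions.

```python
import secrets

def pad_private_sets(private_sets):
    """Pad each private identifier set to the size of the largest set."""
    target = max((len(s) for s in private_sets.values()), default=0)
    padded = {}
    for side, identifiers in private_sets.items():
        filled = set(identifiers)
        while len(filled) < target:
            # Hypothetical dummy entry; random so that it cannot collide in practice.
            filled.add("pad-" + secrets.token_hex(8))
        padded[side] = filled
    return padded

private_sets = {"A": {"a1", "a2", "a3"}, "B": {"b1"}}
padded = pad_private_sets(private_sets)
```

After padding, the set sizes no longer reveal how many private identifiers each user side actually holds.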
Furthermore, in another embodiment, the processing module is further configured to:
convert the identifier record information corresponding to each user side according to a pre-stored hash function to obtain converted identifier record information; and
upload the converted identifier record information to a preset blockchain.
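The conversion step can be illustrated by hashing a canonical serialization of the record information. The text does not name the hash function or the record format, so SHA-256, the JSON encoding and the field names below are assumptions, and the upload itself is omitted because the blockchain interface is not specified.

```python
import hashlib
import json

def record_digest(records):
    """Hash a canonical JSON encoding of the identifier record information."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

records = [
    {"position": 0, "identifier": "h1", "selector_share": 12345},
    {"position": 1, "identifier": "p7", "selector_share": 67890},
]
digest = record_digest(records)  # this digest is what would be stored on chain
```

Sorting the keys keeps the digest stable across parties that build the same records in a different field order.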
Furthermore, in another embodiment, the processing module is further configured to:
construct a data transmission channel between the cooperator side and each user side according to a preset channel construction rule; and
obtain, through the pre-constructed data transmission channel, the blinded identifier set, produced by blinding processing, that each user side sends for the service to be processed.
In another embodiment, the present application further provides another heterogeneous data processing apparatus, applied to a user side, where the apparatus may include:
a processing module, configured to perform blinding processing on the identifier data corresponding to the service to be processed according to a pre-stored blinding processing rule to obtain a blinded identifier set.
In this embodiment, the processing module is further configured to:
determine a blinding parameter according to a preset blinding parameter processing rule;
process the blinding parameter and the identifier data corresponding to the service to be processed according to a preset hash function to obtain an initial blinded identifier set;
generate, according to a preset noise generation rule, a preset number of noise identifiers that do not duplicate any entry of the initial blinded identifier set; and
obtain the blinded identifier set from the noise identifiers and the initial blinded identifier set.
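A sketch of the blinding and noise steps follows. The text only requires a preset hash function over the blinding parameter and the identifier; HMAC-SHA-256 keyed with the blinding parameter is substituted here as one concrete choice, and the noise identifiers are random strings of the same length as a digest. Both choices, and the function names, are assumptions.

```python
import hashlib
import hmac
import secrets

def blind(identifier, blinding_param):
    """Blind one identifier with a keyed hash (HMAC-SHA-256, an assumed choice)."""
    return hmac.new(blinding_param, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def blinded_identifier_set(identifiers, blinding_param, noise_count):
    """Return the blinded identifiers plus noise_count non-colliding noise entries."""
    initial = {blind(i, blinding_param) for i in identifiers}
    noise = set()
    while len(noise) < noise_count:
        candidate = secrets.token_hex(32)  # 64 hex characters, like a SHA-256 digest
        if candidate not in initial:       # noise must not duplicate a real entry
            noise.add(candidate)
    return initial | noise

param = secrets.token_bytes(32)
blinded = blinded_identifier_set(["A0001", "A0002", "A0003"], param, noise_count=2)
```

User sides that share the same blinding parameter produce equal blinded values for equal identifiers, which is what allows matching on the blinded sets.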
Further, the processing module is further configured to:
randomly determine a first initial blinding parameter and receive the second initial blinding parameters sent by the other user sides; and
perform an XOR operation on the first initial blinding parameter and the second initial blinding parameters to obtain the blinding parameter.
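The XOR combination can be sketched for two user sides as follows. The 32-byte parameter length and the variable names are assumptions; the exchange of the parameters between the user sides is not shown.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blinding parameters must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

r_a = secrets.token_bytes(32)  # first initial blinding parameter, chosen by user side A
r_b = secrets.token_bytes(32)  # second initial blinding parameter, chosen by user side B

param_at_a = xor_bytes(r_a, r_b)  # computed by A after receiving r_b
param_at_b = xor_bytes(r_b, r_a)  # computed by B after receiving r_a
```

Since XOR is commutative, both user sides arrive at the same blinding parameter, and the cooperator side, which sees neither contribution, cannot recompute it.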
a sending module, configured to send the blinded identifier set to the cooperator side, so that the cooperator side performs matching processing on the blinded identifier sets sent by the user sides to obtain a matching identifier set, and then generates the identifier record information corresponding to each user side according to the matching identifier set and the blinded identifier sets.
The processing module is further configured to receive the plurality of identifier record information sent by the cooperator, and perform heterogeneous data decryption calculation according to the plurality of identifier record information to obtain a plaintext processing result of the service to be processed.
In this embodiment, the identifier record information includes an identifier position, and the processing module is further configured to:
for each piece of identifier record information, determine from the identifier position whether the identifier record information corresponds to a noise identifier;
if the identifier record information does not correspond to a noise identifier, determine the first initial data corresponding to the identifier position according to the identifier position and the identifier data corresponding to the service to be processed;
jointly process the first initial data and the second initial data sent by the other user sides according to a preset joint calculation processing rule to obtain the intermediate result fragment corresponding to the identifier record information; and
aggregate the intermediate result fragments corresponding to each piece of identifier record information to obtain an intermediate result, and decrypt the intermediate result according to a preset secure multi-party computation protocol to obtain the plaintext processing result of the service to be processed.
Further, the identifier record information includes sub-private fragment data generated from the first private fragment data or the second private fragment data, and the processing module is further configured to:
perform fragmentation processing on the first initial data according to a preset fragmentation rule to obtain a plurality of first initial sub-data;
send a preset number of the first initial sub-data to the other user sides, the first initial sub-data not sent being the first remaining initial sub-data, and at the same time receive a preset number of second outgoing initial sub-data sent by the other user sides, wherein the second outgoing initial sub-data are obtained by the other user sides performing fragmentation processing on the second initial data according to the preset fragmentation rule; and
jointly calculate, according to a preset secret sharing protocol, the first remaining initial sub-data, the second outgoing initial sub-data and the first sub-private fragment data held locally together with the second remaining initial sub-data, the first outgoing initial sub-data and the second sub-private fragment data held by the other user sides, to obtain the intermediate result fragment corresponding to the identifier record information, wherein the first sub-private fragment data is the sub-private fragment data included in the identifier record information corresponding to the local user side, and the second sub-private fragment data is the sub-private fragment data included in the identifier record information corresponding to the other user sides.
Furthermore, the processing module is further configured to:
if the identifier record information corresponds to a noise identifier, set the data corresponding to the identifier position to zero.
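How a user side might resolve an identifier position to its initial data, with noise positions mapped to zero, is sketched below. The text does not specify how the position encodes noise; the sketch assumes the user side keeps, in the order it sent its blinded identifiers, the original identifier for each position and None for a noise entry. The function name and data layout are hypothetical.

```python
def resolve_initial_data(position, sent_identifiers, values):
    """Return the local value for a record position, or 0 for a noise position.

    sent_identifiers[position] is the original identifier behind the blinded
    entry at that position, or None if the entry was locally generated noise.
    """
    original = sent_identifiers[position]
    return 0 if original is None else values[original]

sent_identifiers = ["A0001", None, "A0003"]   # position 1 was a noise identifier
values = {"A0001": 100, "A0003": 200}          # local data, e.g. income
inputs = [resolve_initial_data(p, sent_identifiers, values) for p in range(3)]
```

Feeding zero for noise positions keeps the number of protocol rounds identical for real and noise entries while leaving the aggregated result unchanged.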
Furthermore, in another embodiment, the processing module is further configured to:
convert the plaintext processing result of the service to be processed according to a pre-stored hash function to obtain a converted plaintext processing result; and
upload the converted plaintext processing result to a preset blockchain.
The apparatus provided in the embodiment of the present application can implement the method of the embodiment shown in fig. 2, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application. As shown in Fig. 5, the device 500 of this embodiment includes a processor 501 and a memory 502 communicatively coupled to the processor 501. The processor 501 and the memory 502 are connected by a bus 503.
In a specific implementation, the processor 501 executes the computer-executable instructions stored in the memory 502, so that the processor 501 performs the heterogeneous data processing method of the above method embodiments.
For the specific implementation process of the processor 501, reference may be made to the above method embodiments; the implementation principle and technical effect are similar and are not described again here.
In the embodiment shown in fig. 5, it should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise high speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the bus shown in the figures of the present application is not limited to a single bus or a single type of bus.
An embodiment of the present application further provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the heterogeneous data processing method of the above method embodiments is implemented.
An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for processing heterogeneous data as described above is implemented.
The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (20)

1. A heterogeneous data processing method is applied to a cooperator side, and comprises the following steps:
acquiring a blinded identifier set after blinded processing corresponding to a service to be processed sent by each user side;
matching each blinded identifier set according to a preset matching rule to obtain a matched identifier set;
and generating identifier record information corresponding to each user side according to the matching identifier set and each blinding identifier set, and respectively sending the identifier record information corresponding to each user side to the corresponding user side, so that the user side performs heterogeneous data decryption calculation according to the received identifier record information to obtain a plaintext processing result of the service to be processed.
2. The method according to claim 1, wherein the matching each blinded identifier set according to a preset matching rule to obtain a matching identifier set comprises:
and comparing the blinded identifier sets corresponding to different user sides, and determining the intersection of the blinded identifier sets corresponding to different user sides to obtain a matching identifier set.
3. The method according to claim 2, wherein the generating identifier record information corresponding to each user terminal according to the matching identifier sets and the blinded identifier sets comprises:
deleting the blinded identifiers corresponding to the matched identifier sets in the blinded identifier sets to obtain private identifier sets corresponding to the user sides;
according to a preset complementing rule, complementing the private identifier sets corresponding to the user sides to obtain the complemented private identifier sets corresponding to the user sides;
mixing the matching identifier set and the supplemented private identifier sets corresponding to the user sides to obtain an aligned sample set;
for each alignment sample in the alignment sample set, if the alignment sample belongs to the matching identifier set, generating identifier record information corresponding to each user side according to a pre-stored first generation rule;
and if the alignment sample belongs to the private identifier set corresponding to each user side, generating identifier record information corresponding to each user side according to a pre-stored second generation rule.
4. The method according to claim 3, wherein if the alignment sample belongs to the matching identifier set, generating identifier record information corresponding to each user terminal according to a first pre-stored generation rule, includes:
if the alignment sample belongs to the matching identifier set, randomly generating a pair of first private fragment data about a value 1;
and generating identifier record information corresponding to each user side according to the alignment sample and the first private fragment data.
5. The method according to claim 3, wherein if the alignment sample belongs to the set of private identifiers corresponding to the user terminals, generating identifier record information corresponding to the user terminals according to a second pre-stored generation rule comprises:
if the alignment sample belongs to the private identifier set corresponding to each user side, randomly generating a pair of second private fragment data related to the value 0;
and generating identifier record information corresponding to each user side according to the alignment sample and the second private fragment data.
6. The method according to claim 3, wherein the performing the complementation process on the private identifier set corresponding to each user side according to a preset complementation rule to obtain a complemented private identifier set corresponding to each user side comprises:
determining a target private identifier set with the largest number of private identifiers in the private identifier sets corresponding to the user sides;
and according to the number of the private identifiers in the target private identifier set, performing complementation processing on other private identifier sets except the target private identifier set in the private identifier set corresponding to each user side to obtain the complemented private identifier set corresponding to each user side.
7. The method according to any of claims 1-6, further comprising, after the generating identifier record information corresponding to each user terminal according to the matching identifier sets and the blinded identifier sets:
converting the identifier record information corresponding to each user side according to a pre-stored hash function to obtain converted identifier record information;
and uploading the converted identifier record information to a preset block chain.
8. The method according to any of claims 1-6, wherein before the obtaining the blinded identifier set after the blinded processing of the to-be-processed service sent by each user side, further comprising:
constructing a data transmission channel between the cooperator side and each user side according to a preset channel construction rule;
the obtaining of the blinded identifier set after the blinding process corresponding to the service to be processed sent by each user side includes:
and acquiring a blinded identifier set after blinding corresponding to the service to be processed sent by each user side through a pre-constructed data transmission channel.
9. A heterogeneous data processing method is applied to a user side and comprises the following steps:
performing blind processing on identifier data corresponding to the service to be processed according to a pre-stored blind processing rule to obtain a blind identifier set;
sending the blinded identifier sets to a cooperator side so that the cooperator side performs matching processing according to the blinded identifier sets sent by the user sides to obtain matching identifier sets, and then generating identifier record information corresponding to the user sides according to the matching identifier sets and the blinded identifier sets;
and receiving a plurality of identifier recording information sent by the cooperative party end, and carrying out heterogeneous data decryption calculation according to the plurality of identifier recording information to obtain a plaintext processing result of the service to be processed.
10. The method according to claim 9, wherein the blinding the identifier data corresponding to the service to be processed according to the pre-stored blinding rule to obtain a blinded identifier set, includes:
determining a blinding parameter according to a preset blinding parameter processing rule;
processing the blinding parameter and the identifier data corresponding to the service to be processed according to a preset hash function to obtain an initial blinding identifier set;
generating a preset number of noise identifiers which are not repeated with the initial blinding identifier set according to a preset noise generation rule;
and obtaining a blind identifier set according to the noise identifier and the initial blind identifier set.
11. The method according to claim 10, wherein the determining the blinding parameter according to the preset blinding parameter processing rule comprises:
randomly determining a first initial blinding parameter, and receiving second initial blinding parameters sent by other user sides;
and carrying out XOR operation on the first initial blinding parameter and the second initial blinding parameter to obtain a blinding parameter.
12. The method according to claim 9, wherein the identifier record information includes an identifier position, and the performing a decryption calculation on heterogeneous data according to the plurality of identifier record information to obtain a plaintext processing result of the service to be processed includes:
for each identifier recording information, judging whether the identifier recording information is a noise identifier according to the identifier position;
if the identifier recording information is not a noise identifier, determining first initial data corresponding to the identifier position according to the identifier position and identifier data corresponding to the service to be processed;
processing the first initial data and the second initial data sent by other user ends jointly according to a preset joint calculation processing rule to obtain an intermediate result fragment corresponding to the identifier recording information;
and performing aggregation processing on the intermediate result fragments corresponding to each identifier record information to obtain an intermediate result, and performing decryption processing on the intermediate result according to a preset secure multiparty computing protocol to obtain a plaintext processing result of the service to be processed.
13. The method according to claim 12, wherein the identifier record information includes sub-private fragment data generated by first private fragment data or second private fragment data, and the obtaining of the intermediate result fragment corresponding to the identifier record information by jointly processing the first initial data and the second initial data sent by other user sides according to a preset joint calculation processing rule includes:
carrying out fragmentation processing on the first initial data according to a preset fragmentation rule to obtain a plurality of first initial subdata;
sending a preset number of first initial subdata in the plurality of first initial subdata to other user terminals to obtain first residual initial subdata, and simultaneously receiving a preset number of second external initial subdata sent by the other user terminals, wherein the second external initial subdata is obtained by carrying out fragmentation processing on the second subdata in the other user terminals according to a preset fragmentation rule;
and jointly calculating the first remaining initial sub-data, the second outgoing initial sub-data, the first sub-private fragment data, and the second remaining initial sub-data, the first outgoing initial sub-data, and the second sub-private fragment data in the other user sides according to a preset private sharing protocol to obtain an intermediate result fragment corresponding to the identifier recording information, wherein the first sub-private fragment data is sub-private fragment data included in the identifier recording information corresponding to the local user side, and the second sub-private fragment data is sub-private fragment data included in the identifier recording information corresponding to the other user sides.
14. The method of claim 12, further comprising:
and if the identifier recording information is a noise identifier, setting data corresponding to the identifier position to be zero.
15. The method according to any of claims 9-14, further comprising, after said obtaining the plaintext processing result of the service to be processed, the steps of:
converting the plaintext processing result of the service to be processed according to a prestored Hash function to obtain a converted plaintext processing result;
and uploading the converted plaintext processing result to a preset block chain.
16. A heterogeneous data processing apparatus, applied to a cooperator side, includes:
the acquiring module is used for acquiring a blinded identifier set after blinded processing corresponding to the service to be processed sent by each user side;
the processing module is used for matching each blinded identifier set according to a preset matching rule to obtain a matching identifier set;
the processing module is further configured to generate identifier record information corresponding to each user side according to the matching identifier set and each blinding identifier set, and send the identifier record information corresponding to each user side to the corresponding user side, so that the user side performs heterogeneous data decryption calculation according to the received identifier record information to obtain a plaintext processing result of the service to be processed.
17. A heterogeneous data processing device applied to a user side comprises:
the processing module is used for carrying out blind processing on the identifier data corresponding to the service to be processed according to a pre-stored blind processing rule to obtain a blind identifier set;
the sending module is used for sending the blinded identifier sets to a cooperator end so that the cooperator end performs matching processing according to the blinded identifier sets sent by the user ends to obtain matching identifier sets, and then generates identifier record information corresponding to the user ends according to the matching identifier sets and the blinded identifier sets;
the processing module is further configured to receive the plurality of identifier record information sent by the cooperator, and perform heterogeneous data decryption calculation according to the plurality of identifier record information to obtain a plaintext processing result of the service to be processed.
18. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the heterogeneous data processing method of any of claims 1-8 or 9-15.
19. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, perform the heterogeneous data processing method of any one of claims 1-8 or 9-15.
20. A computer program product comprising a computer program, characterized in that the computer program realizes the heterogeneous data processing method according to any one of claims 1-8 or 9-15 when executed by a processor.
CN202111462228.9A 2021-12-02 2021-12-02 Heterogeneous data processing method and device and electronic equipment Pending CN114154196A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111462228.9A CN114154196A (en) 2021-12-02 2021-12-02 Heterogeneous data processing method and device and electronic equipment
PCT/CN2022/124375 WO2023098294A1 (en) 2021-12-02 2022-10-10 Heterogeneous data processing method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111462228.9A CN114154196A (en) 2021-12-02 2021-12-02 Heterogeneous data processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114154196A true CN114154196A (en) 2022-03-08

Family

ID=80456014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111462228.9A Pending CN114154196A (en) 2021-12-02 2021-12-02 Heterogeneous data processing method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN114154196A (en)
WO (1) WO2023098294A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023098294A1 (en) * 2021-12-02 2023-06-08 深圳前海微众银行股份有限公司 Heterogeneous data processing method and apparatus, and electronic device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842561B (en) * 2023-06-29 2024-05-24 上海零数众合信息科技有限公司 Privacy intersection system and method capable of dynamically adding and deleting data sets
CN117577248B (en) * 2024-01-15 2024-04-05 浙江大学 Medical data sharing method and system integrating blockchain and privacy intersection technology

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019098941A1 (en) * 2017-11-20 2019-05-23 Singapore Telecommunications Limited System and method for private integration of datasets
US11070366B2 (en) * 2018-05-08 2021-07-20 Nec Corporation Dynamic anonymous password-authenticated key exchange (APAKE)
CN110572253B (en) * 2019-09-16 2023-03-24 济南大学 Method and system for enhancing privacy of federated learning training data
CN114154196A (en) * 2021-12-02 2022-03-08 深圳前海微众银行股份有限公司 Heterogeneous data processing method and device and electronic equipment


Also Published As

Publication number Publication date
WO2023098294A1 (en) 2023-06-08

Similar Documents

Publication Publication Date Title
US11206132B2 (en) Multiparty secure computing method, device, and electronic device
US11290266B2 (en) Secure multi-party computation method and apparatus, and electronic device
US20210281402A1 (en) Multi-party security computing method and apparatus, and electronic device
US10122710B2 (en) Binding a data transaction to a person's identity using biometrics
CN114154196A (en) Heterogeneous data processing method and device and electronic equipment
CN103457732B (en) Private key generating means and method
CN111861473B (en) Electronic bidding system and method
CN110213059A (en) A kind of generation method of random number, generating means and storage medium
US20210344500A1 (en) Computer-implemented system and method for transferring access to digital resource
CN111008863A (en) Lottery drawing method and system based on block chain
CN116204912B (en) Data processing method and device based on isomorphic encryption
CN113536379B (en) Private data query method and device and electronic equipment
CN112003696A (en) SM9 key generation method, system, electronic equipment, device and storage medium
CN112597542B (en) Aggregation method and device of target asset data, storage medium and electronic device
CN112073196B (en) Service data processing method and device, electronic equipment and storage medium
US10530581B2 (en) Authenticated broadcast encryption
CN114785524B (en) Electronic seal generation method, device, equipment and medium
CN118160275A (en) Threshold signature scheme
WO2018105038A1 (en) Communication device and distributed ledger system
CN117478303B (en) Block chain hidden communication method, system and computer equipment
CN111917533A (en) Privacy preserving benchmark analysis with leakage reducing interval statistics
CN111401888B (en) Method and device for generating multi-signature wallet
CN111552950A (en) Software authorization method and device and computer readable storage medium
CN116681141A (en) Federated learning method, terminal and storage medium for privacy protection
CN116032639A (en) Message pushing method and device based on privacy calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination