WO2023098294A1 - Heterogeneous data processing method, apparatus and electronic device - Google Patents
Heterogeneous data processing method, apparatus and electronic device
- Publication number
- WO2023098294A1 (application PCT/CN2022/124375)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- identifier
- client
- data
- record information
- blinded
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
Definitions
- the present application relates to the technical field of blockchain, and in particular to a heterogeneous data processing method, device and electronic equipment.
- sample alignment can first be performed by means of private set intersection to obtain the plaintext identification information in the sample records shared by all parties, and the data sets corresponding to that plaintext identification information can then be fed into secure multi-party computation or federated learning algorithms to realize joint computation over heterogeneous data across institutions.
- the output is shared plaintext information, that is, all institutions participating in the calculation can obtain the information shared between other institutions and themselves, which increases the risk of user information leakage, thereby increasing compliance risks and affecting user experience.
- the purpose of this application is to provide a heterogeneous data processing method, device and electronic equipment, so as to reduce the risk of user information leakage.
- this application discloses a method for processing heterogeneous data, which is applied to the collaboration side, including:
- performing matching processing on each blinded identifier set according to a preset matching rule to obtain a matching identifier set includes:
- the blinded identifier sets corresponding to different user terminals are compared, and the intersection of the blinded identifier sets corresponding to different user terminals is determined to obtain a matching identifier set.
- the generating identifier record information corresponding to each client according to the matching identifier set and each blinded identifier set includes:
- if the alignment sample belongs to the private identifier set corresponding to a client, generating identifier record information corresponding to each client according to a pre-stored second generation rule.
- generating identifier record information corresponding to each client according to a pre-stored first generation rule includes:
- if the alignment sample belongs to the matching identifier set, randomly generating a pair of first private segment data with a value of 1;
- generating identifier record information corresponding to each client according to a pre-stored second generation rule includes:
- if the alignment sample belongs to the private identifier set corresponding to a client, randomly generating a pair of second private segment data with a value of 0;
- performing completion processing on the private identifier sets corresponding to each client according to a preset completion rule, to obtain the completed private identifier sets corresponding to each client including:
- the method further includes:
- the acquisition of the blinded identifier set corresponding to the service to be processed sent by each client after the blinding process includes:
- the blinded identifier sets corresponding to the services to be processed sent by each client after the blinding process are obtained through the pre-built data transmission channel.
- the present application discloses a method for processing heterogeneous data, which is applied to the client, including:
- the identifier set and the blinded identifier sets generate identifier record information corresponding to each client;
- performing blinding processing on the identifier data corresponding to the service to be processed according to a pre-stored blinding processing rule to obtain a blinded identifier set includes:
- a blinded identifier set is obtained according to the noise identifier and the initial blinded identifier set.
- the determining blinding parameters according to preset blinding parameter processing rules includes:
- an exclusive OR operation is performed on the first initial blinding parameter and the second initial blinding parameter to obtain the blinding parameter.
- the identifier record information includes the identifier position
- the heterogeneous data decryption calculation is performed according to the plurality of identifier record information to obtain the plaintext processing result of the service to be processed, including:
- for each piece of identifier record information, judging whether the identifier record information corresponds to a noise identifier according to the identifier position;
- if the identifier record information is not a noise identifier, determining the first initial data corresponding to the identifier position according to the identifier position and the identifier data corresponding to the service to be processed;
- the identifier record information includes sub-private segment data generated from the first private segment data or the second private segment data, and jointly processing the first initial data and the second initial data sent by other clients to obtain the intermediate result fragment corresponding to the identifier record information includes:
- the first remaining initial sub-data, the second outgoing initial sub-data and the first sub-private segment data, together with the second remaining initial sub-data, the first outgoing initial sub-data and the second sub-private segment data in the other client, are jointly calculated to obtain an intermediate result fragment corresponding to the identifier record information, wherein the first sub-private segment data is the sub-private segment data included in the identifier record information corresponding to the local client, and the second sub-private segment data is the sub-private segment data included in the identifier record information corresponding to the other client.
- if the identifier record information is a noise identifier,
- the data corresponding to the identifier position is set to zero.
- the method further includes:
- the present application discloses a heterogeneous data processing device, which is applied to the collaborating party, including:
- an acquisition module configured to acquire the blinded identifier set, obtained after blinding processing, that each client sends for the service to be processed;
- a processing module configured to perform matching processing on each blinded identifier set according to a preset matching rule to obtain a matching identifier set;
- the processing module is further configured to generate identifier record information corresponding to each client according to the matching identifier set and each blinded identifier set, and to send the identifier record information corresponding to each client to the corresponding client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed.
- the present application discloses a heterogeneous data processing device, which is applied to the client end, including:
- a processing module configured to perform blinding processing on the identifier data corresponding to the business to be processed according to pre-stored blinding processing rules, to obtain a blinded identifier set;
- a sending module configured to send the blinded identifier set to the collaborating party, so that the collaborating party performs matching processing on the blinded identifier sets sent by the clients to obtain a matching identifier set, and then generates identifier record information corresponding to each client according to the matching identifier set and each blinded identifier set;
- the processing module is further configured to receive the plurality of identifier record information sent by the collaborating party, and to perform heterogeneous data decryption calculation according to the plurality of identifier record information to obtain the plaintext processing result of the service to be processed.
- the present application discloses an electronic device, including: a processor, and a memory communicatively connected to the processor;
- the memory stores computer-executable instructions
- the processor executes the computer-executed instructions stored in the memory, so as to implement the heterogeneous data processing method according to any one of the first aspect and the second aspect.
- the present application discloses a computer-readable storage medium, the computer-readable storage medium stores computer-executable instructions, and when a processor executes the computer-executable instructions, the heterogeneous data processing method according to any one of the first aspect and the second aspect is implemented.
- the present application discloses a computer program product, including a computer program.
- when the computer program is executed by a processor, the heterogeneous data processing method according to any one of the first aspect and the second aspect is implemented.
- the embodiment of the present application provides a heterogeneous data processing method, device, and electronic equipment.
- the collaborating party can first obtain the blinded identifier set, obtained after blinding processing, that each client sends for the service to be processed, then perform matching processing on each blinded identifier set according to a preset matching rule to obtain a matching identifier set, generate identifier record information corresponding to each client according to the matching identifier set and each blinded identifier set, and send the identifier record information corresponding to each client to the corresponding client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed.
- FIG. 1 is a schematic diagram of the architecture of the application system of the heterogeneous data processing method provided by the embodiment of the present application;
- FIG. 2 is a schematic flowchart of a method for processing heterogeneous data provided in an embodiment of the present application
- FIG. 3 is a schematic flowchart of a heterogeneous data processing method provided in another embodiment of the present application.
- FIG. 4 is a schematic structural diagram of a heterogeneous data processing device provided in an embodiment of the present application.
- FIG. 5 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.
- different organizations can be represented by different terminal devices (which can be clients), and different clients can have heterogeneous data sets with different fields.
- the aggregation statistics operation is performed. For example, client party A has 100 records (id, X1) about sensitive data X1, and client party B has 50 records (id, X2) about sensitive data X2.
- the size of the id set jointly owned by both parties is 40.
- the joint operation of heterogeneous data can be realized.
- the output is shared plaintext information, that is, all institutions participating in the calculation can obtain the information shared between other institutions and themselves, which increases the risk of user information leakage, thereby increasing compliance risks and affecting user experience.
- this application implements sample alignment by adding a collaborating party that only processes non-sensitive ciphertext data after blinding processing, which overcomes the need, in the traditional sample alignment method, to disclose the set of sample identifiers shared by multiple parties. This reduces the risk of user information leakage, improves the security of blind processing of heterogeneous data, reduces compliance risks, and improves user experience.
- FIG. 1 is a schematic diagram of the architecture of the application system of the heterogeneous data processing method provided by the embodiment of the present application.
- each client 102 can locally perform blinding processing on the identifier set corresponding to the service to be processed to obtain a blinded identifier set, and then send the blinded identifier set to the collaborating party 101, which performs sample alignment processing and generates the identifier record information corresponding to each client. The identifier record information corresponding to each client is then sent to the corresponding client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed.
- the collaborating party 101 may be a single server, or may be a server cluster.
- the client 102 can be a single server, or a server cluster, or a personal computer, smart phone, tablet and other devices.
- each client may be the same device, or may be a different device.
- FIG. 2 is a schematic flowchart of a method for processing heterogeneous data provided by an embodiment of the present application, and the method of this embodiment may be executed by the collaborating party 101. As shown in FIG. 2, the method of this embodiment may include:
- S201 Obtain a set of blinded identifiers after blinding processing corresponding to services to be processed sent by each client.
- the data corresponding to the service to be processed can first be obtained from different clients, and the heterogeneous data obtained for the service to be processed can then be jointly processed so as to carry out the service to be processed.
- each client can first perform blinding processing on the identifier set corresponding to the service to be processed to obtain the blinded identifier set, and then send the blinded identifier set to the collaborating party.
- each blinding identifier is a data identifier after blinding processing.
- the data identifier may be a data id
- the blinding identifier may be an id after blinding processing.
- the data identifier before blinding processing may be A0001, and the data identifier after blinding processing is 0xAF12C3.
- blinding processing method can be customized and set according to actual application scenarios, and will not be defined in detail here.
- S202 Perform matching processing on each blinded identifier set according to a preset matching rule to obtain a matching identifier set.
- the blinded identifier sets of different clients can be matched to find the blinded identifiers that appear in the sets of different clients.
- each blinded identifier set is matched according to a preset matching rule to obtain a matching identifier set, which may specifically include:
- the blinded identifier sets corresponding to different user terminals are compared, and the intersection of the blinded identifier sets corresponding to different user terminals is determined to obtain a matching identifier set.
- the blinded identifier sets corresponding to different clients can first be compared to determine the identical identifier information they contain, and all of the identical identifier information can then be placed in a new set to obtain the matching identifier set.
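- the matching step above can be sketched as a plain set intersection. This is a minimal illustration, not the application's implementation: the function name `match_identifiers` and the blinded values other than 0xAF12C3, 0xCC6712 and 0x2E341B are hypothetical.

```python
def match_identifiers(blinded_sets):
    """Return the blinded identifiers that appear in every client's set."""
    sets = [set(ids) for ids in blinded_sets.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Blinded identifier sets as received by the collaborating party.
# Only the three shared values are taken from the text; the rest are made up.
blinded_sets = {
    "A": {"0xAF12C3", "0xCC6712", "0x2E341B", "0x51D0AA"},
    "B": {"0xAF12C3", "0xCC6712", "0x2E341B",
          "0x9B77E0", "0x10F3C4", "0x7D2A95"},
}

id_match = match_identifiers(blinded_sets)
```

The collaborating party only ever handles the blinded strings, so the comparison reveals which blinded values coincide but not the plaintext identifiers behind them.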
- S203 Generate identifier record information corresponding to each client according to the matching identifier set and each blinded identifier set, and send the identifier record information corresponding to each client to the corresponding client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed.
- the blinded identifier information in the matching identifier set and the blinded identifier information in each blinded identifier set can be used to generate the identifier record information corresponding to each client, and the identifier record information corresponding to each client is then sent to the corresponding client, so that the client can process it and obtain the plaintext processing result of the service to be processed.
- generating identifier record information corresponding to each client according to the matching identifier set and each blinded identifier set may specifically include:
- the blinded identifiers corresponding to the matched identifier sets in the blinded identifier sets corresponding to the user terminals may be respectively removed to obtain the private identifier sets corresponding to each user terminal.
- the number of private identifier sets can be zero, one, or more. If the blinded identifier sets of all clients are the same, the number of private identifier sets is zero. If the blinded identifier set of only one client contains blinded identifiers other than those in the matching identifier set, the number of private identifier sets is one.
- the private identifier set corresponding to each client is completed according to the preset completion rule, and the completed private identifier set corresponding to each client is obtained.
- the specific processing process can be as follows:
- the target private identifier set containing the largest number of private identifiers can first be determined from the private identifier sets, and identifiers can then be added to the other private identifier sets until their number of private identifiers is consistent with that of the target private identifier set, that is, all private identifier sets contain the same number of private identifiers after completion, which simplifies subsequent analysis and processing.
- the private identifiers in the private identifier set are blinded identifiers after blinding processing.
- the existing noise identifier generation rules can be used to generate noise identifiers that do not overlap with the existing blinded identifiers.
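- the removal and completion steps above can be sketched as follows. The helper names `private_sets` and `pad_with_noise`, the noise format, and the blinded values not quoted in the text are assumptions made for illustration; the application leaves the noise generation rule open.

```python
import secrets


def private_sets(blinded_sets, id_match):
    """Remove the matched identifiers from each client's blinded set."""
    return {client: set(ids) - id_match for client, ids in blinded_sets.items()}


def pad_with_noise(rest_sets, known_identifiers):
    """Pad every private set to the size of the largest one with fresh noise."""
    target = max(len(ids) for ids in rest_sets.values())
    used = set(known_identifiers)
    padded, noise = {}, {}
    for client, ids in rest_sets.items():
        padded[client] = set(ids)
        noise[client] = set()
        while len(padded[client]) < target:
            # Hypothetical noise format: a random 3-byte hex string.
            candidate = "0x" + secrets.token_hex(3).upper()
            if candidate in used:
                continue  # must not collide with any existing identifier
            used.add(candidate)
            padded[client].add(candidate)
            noise[client].add(candidate)
    return padded, noise


blinded_sets = {
    "A": {"0xAF12C3", "0xCC6712", "0x2E341B", "0x51D0AA"},
    "B": {"0xAF12C3", "0xCC6712", "0x2E341B",
          "0x9B77E0", "0x10F3C4", "0x7D2A95"},
}
id_match = {"0xAF12C3", "0xCC6712", "0x2E341B"}

rest = private_sets(blinded_sets, id_match)
known = set().union(*blinded_sets.values())
padded, noise = pad_with_noise(rest, known)
```

With these inputs client A holds one private identifier and client B holds three, so two noise identifiers are added on A's side and none on B's.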
- the matching identifier set and the completed private identifier set corresponding to each client are mixed to obtain an alignment sample set.
- the data in the matching identifier set can be randomly mixed with the data in the completed private identifier set corresponding to each client to obtain the mixed alignment sample set.
- each aligned sample in the aligned sample set has a corresponding sample number.
- for each alignment sample in the alignment sample set, if the alignment sample belongs to the matching identifier set, identifier record information corresponding to each client is generated according to a pre-stored first generation rule.
- if the alignment sample belongs to the private identifier set corresponding to a client, identifier record information corresponding to each client is generated according to a pre-stored second generation rule.
- an alignment sample can be randomly selected from the alignment sample set, and the original source of the alignment sample can then be determined according to the sample ID contained in the alignment sample, that is, it can be determined whether the alignment sample belongs to the matching identifier set or to the private identifier set corresponding to a client; the corresponding generation rule is then determined according to the original source of the alignment sample, and the identifier record information corresponding to each client is generated according to that rule.
- generating identifier record information corresponding to each client according to a pre-stored first generation rule may specifically include:
- a pair of first private segment data with a value of 1 is randomly generated.
- generating identifier record information corresponding to each client according to a pre-stored second generation rule may specifically include:
- the first private segment data with a value of 1 is a group of first private segment data whose sum is 1, and the number of the first private segment data corresponds to the number of clients.
- selector_w = (r_w, 1 - r_w).
- the second private segment data with a value of 0 is a set of second private segment data whose sum is 0, and the number of second private segment data corresponds to the number of clients.
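- the two generation rules can be sketched as additive sharing of the value 1 or 0 among the clients. The text only states that the segments sum to 1 or 0; working modulo a fixed prime (`MODULUS`) and the helper name `share_value` are assumptions of this sketch.

```python
import secrets

# Assumed share modulus; the application does not specify the number domain.
MODULUS = 2**61 - 1


def share_value(value, n_clients, modulus=MODULUS):
    """Split `value` into n_clients random segments that sum to it (mod modulus)."""
    shares = [secrets.randbelow(modulus) for _ in range(n_clients - 1)]
    shares.append((value - sum(shares)) % modulus)
    return tuple(shares)


# First generation rule: the sample is in the matching identifier set.
selector_match = share_value(1, 2)    # plays the role of (r_w, 1 - r_w)
# Second generation rule: the sample is private to one client.
selector_private = share_value(0, 2)  # segments that sum to 0
```

A single segment is a uniformly random value, so a client holding only its own segment cannot tell whether the sample was matched; the indicator only appears when the segments are combined.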
- the identifier record information corresponding to each client may be generated according to the alignment sample and the obtained first private segment data or second private segment data, that is, the information relating to other clients is removed from the record, and the resulting identifier record information corresponding to a client is sent to that client.
- client A has four data samples
- Table 1 is the data sample table of client A
- each data sample includes identifier information and the corresponding specific data.
- the specific data in the user terminal A is the income level (income).
- Client B has six data samples.
- Table 2 is the data sample table of client B.
- the data samples include identifier information and corresponding specific data.
- the specific data in the client B is tax (tax_rate).
- the collaborating party C can compare the blinded identifier sets of client A and client B to find all identical blinded identifiers and obtain the matching identifier set id_match.
- there are three identical blinded identifiers in the matching identifier set id_match, that is, the blinded identifiers 0xAF12C3, 0xCC6712, and 0x2E341B obtained after blinding processing of A0001, A0003, and A0004.
- two private identifier sets can also be obtained, that is, the private identifier set id_rest_A corresponding to the client A and the private identifier set id_rest_B corresponding to the client B.
- the collaborating party C cannot see the plain text of the identifier, but can only see the blinded identifier, which reduces the risk of identifier information leakage.
- the corresponding identifier position field pos_A_i or pos_B_i is correspondingly incremented.
- client B has 2 more pieces of data than client A, so the collaborating party C inserts 2 noise identifiers into id_rest_A so that it contains the same amount of data as id_rest_B. The collaborating party C can then shuffle and mix the elements of the three sets id_match, id_rest_A and id_rest_B into a complete alignment sample set BAS, and randomly select the w-th alignment sample from the alignment sample set.
- the aligned samples are also a blinded sample alignment set.
- the final alignment sample is (idx_w, pos_A_i, pos_B_j, selector_w).
- because samples are selected without replacement, data that has already been selected will not be selected again.
- the collaborating party C can obtain the identifier record information corresponding to client A and client B according to the alignment sample (idx_w, pos_A_i, pos_B_j, selector_w), that is, the identifier record information corresponding to client A is (idx_w, pos_A_i, selector_w[0]), the identifier record information corresponding to client B is (idx_w, pos_B_j, selector_w[1]).
- the collaborating party C can send (idx_w, pos_A_i, selector_w[0]) to client A through the preset Channel_AC, and can send (idx_w, pos_B_j, selector_w[1]) to client B through the preset Channel_BC.
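- the numbering of alignment samples and their split into per-client records can be sketched as below. The `AlignmentSample` type, the helper names, and the example positions and selector values are hypothetical; the tuple layouts follow (idx_w, pos_A_i, pos_B_j, selector_w) and the two per-client records given above.

```python
import random
from typing import List, NamedTuple, Tuple


class AlignmentSample(NamedTuple):
    idx: int                   # idx_w
    pos_a: int                 # pos_A_i
    pos_b: int                 # pos_B_j
    selector: Tuple[int, int]  # selector_w


def number_samples(pairs, rng=None) -> List[AlignmentSample]:
    """Draw the prepared pairs in random order, without replacement."""
    rng = rng or random.Random()
    order = rng.sample(pairs, k=len(pairs))
    return [AlignmentSample(w, pa, pb, sel) for w, (pa, pb, sel) in enumerate(order)]


def split_sample(sample: AlignmentSample):
    """Keep only the fields that each client is allowed to see."""
    record_a = (sample.idx, sample.pos_a, sample.selector[0])
    record_b = (sample.idx, sample.pos_b, sample.selector[1])
    return record_a, record_b


# Hypothetical (pos_A, pos_B, selector) pairs prepared by the collaborating party.
pairs = [
    (0, 0, (7, -6)),   # matched sample: segments sum to 1
    (2, 1, (-3, 4)),   # matched sample: segments sum to 1
    (1, 3, (5, -5)),   # private sample: segments sum to 0
]

samples = number_samples(pairs, random.Random(0))
records_a, records_b = zip(*(split_sample(s) for s in samples))
```

Each record carries one client's position and one selector segment only, so neither client learns the other's position or whether a given sample was matched.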
- the business to be processed may be business such as transfer, balance inquiry, and loan.
- for example, if the service to be processed is calculating the total tax amount, the total tax amount is determined by the expression SUM(income*tax_rate). Therefore, after receiving the identifier record information, client A and client B can perform heterogeneous data decryption calculation according to the received (idx_w, pos_A_i, selector_w[0]) and (idx_w, pos_B_j, selector_w[1]) to obtain the total tax amount.
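- the arithmetic behind SUM(income*tax_rate) can be checked with a small plaintext simulation. This is not the secure protocol: in the application the clients work on secret fragments and never pool their data, whereas this sketch recombines the selector segments in the clear purely to show that matched samples contribute income*tax_rate and private or noise samples contribute zero. All incomes, tax rates, positions and selector values are hypothetical, and `None` is used here as an assumed marker for a noise position.

```python
from fractions import Fraction

# Hypothetical local data: client A holds incomes, client B holds tax rates.
income = [5000, 8000, 12000, 3000]
tax_rate = [Fraction(10, 100), Fraction(20, 100), Fraction(5, 100),
            Fraction(15, 100), Fraction(10, 100), Fraction(25, 100)]

# (pos_A, pos_B, selector) per alignment sample; None marks a noise position.
samples = [
    (0, 0, (4, -3)),      # matched: segments sum to 1
    (2, 1, (-9, 10)),     # matched
    (3, 2, (1, 0)),       # matched
    (1, 3, (6, -6)),      # private to both sides: segments sum to 0
    (None, 4, (2, -2)),   # noise on A's side
    (None, 5, (-1, 1)),   # noise on A's side
]


def value_at(data, pos):
    """Data at a noise position is set to zero."""
    return 0 if pos is None else data[pos]


def total_tax(samples, income, tax_rate):
    total = Fraction(0)
    for pos_a, pos_b, (share_a, share_b) in samples:
        indicator = share_a + share_b  # 1 for matched samples, 0 otherwise
        total += indicator * value_at(income, pos_a) * value_at(tax_rate, pos_b)
    return total


total = total_tax(samples, income, tax_rate)
```

Here only the three matched samples contribute (500 + 2400 + 150), so `total` evaluates to 3050.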
- the collaborating party can first obtain the blinded identifier sets, obtained after blinding processing, that the clients send for the service to be processed, and then perform matching processing on each blinded identifier set according to the preset matching rule to obtain the identical identifier information and, from it, the matching identifier set. It then generates the identifier record information corresponding to each client based on the matching identifier set and each blinded identifier set, and sends the identifier record information corresponding to each client to the corresponding client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed.
- sample alignment is thus achieved by adding a collaborating party that only processes non-sensitive ciphertext data after blinding processing, which overcomes the problem in the traditional sample alignment method of disclosing the set of sample identifiers shared by multiple parties, and reduces user information leakage
- the method may further include:
- the identifier record information corresponding to each client is converted according to the pre-stored hash function to obtain the converted identifier record information.
- the identifier record information can be uploaded to the preset blockchain to prevent tampering by other clients and support post-event auditing.
- after each client obtains the blinded identifier set, it can also convert the blinded identifier set according to the pre-stored hash function to obtain the converted blinded identifier set, and then upload the converted blinded identifier set to the preset blockchain.
- the block chain can be implemented through existing methods, which will not be limited in detail here.
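- the hash conversion before upload can be sketched as computing a digest of each record. SHA-256, the JSON serialization and the helper name `record_digest` are assumptions of this sketch; the application only refers to a pre-stored hash function, and the upload itself is not shown.

```python
import hashlib
import json


def record_digest(record):
    """Return a hex digest of one identifier record, e.g. (idx_w, pos, segment)."""
    # Compact, deterministic serialization so equal records hash identically.
    payload = json.dumps(list(record), separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record_a = (0, 2, 7)  # hypothetical (idx_w, pos_A_i, selector_w[0])
digest_a = record_digest(record_a)
```

Storing only the digest lets an auditor later confirm that a disclosed record is the one that was issued, without the record itself being placed on the chain.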
- the method may further include:
- a data transmission channel between the coordinating party and each user terminal is constructed according to a preset channel construction rule.
- acquiring the blinded identifier set, obtained after blinding processing, that each client sends for the service to be processed may specifically include:
- the blinded identifier sets corresponding to the services to be processed sent by each client after the blinding process are obtained through the pre-built data transmission channel.
- each client can first obtain its own private key and public key, the clients then send their respective public keys to each other, and a corresponding data transmission channel is constructed based on the received public key. For example, suppose there are two clients, client A and client B. Client A applies to an authoritative certificate authority for a certificate containing the public key pk_A for its private key sk_A, and client B applies to the authoritative certificate authority for a certificate containing the public key pk_B for its own private key sk_B. Client A and client B send their respective public keys to each other and verify their authenticity based on the relevant certificates.
- client A and client B build an anti-eavesdropping secure channel based on the public key of the other party, that is, the data transmission channel Channel_AB.
- when client A sends a message m to client B, it is sent through Channel_AB, that is, m is first encrypted with B's public key pk_B and then sent to B.
- after B receives it, B decrypts it with its private key sk_B to obtain the plaintext of m, and vice versa.
- it is also possible to introduce a third party, the collaborating party C, which does not handle sensitive data, repeat the above steps to build the corresponding secure channels Channel_AC and Channel_BC, and transmit data through these data transmission channels, which improves the security of data transmission.
- FIG. 3 is a schematic flowchart of a method for processing heterogeneous data provided by another embodiment of the present application.
- the method of this embodiment can be executed by the client 102.
- the method of this embodiment may include:
- S301 Perform blinding processing on the identifier data corresponding to the service to be processed according to the pre-stored blinding processing rules to obtain a blinded identifier set.
- the at least one piece of identifier data corresponding to the service to be processed in each client may be processed according to a pre-stored blinding processing rule to obtain a blinded identifier set.
- the identifier data corresponding to the service to be processed is blinded according to the pre-stored blinding processing rules to obtain a blinded identifier set, which may specifically include:
- the blinding parameters are determined according to preset blinding parameter processing rules.
- the blinding parameters and the identifier data corresponding to the service to be processed are processed according to a preset hash function to obtain an initial blinding identifier set.
- a preset number of noise identifiers that are not repeated with the initial blinded identifier set are generated according to a preset noise generation rule.
- a blinded identifier set is obtained according to the noise identifier and the initial blinded identifier set.
- a first initial blinding parameter may be randomly determined, and at the same time, a second initial blinding parameter sent by other clients may be received.
- an exclusive OR operation is then performed on the first initial blinding parameter and the second initial blinding parameter to obtain the blinding parameter; there may be one or more second initial blinding parameters.
- N noise identifiers that do not duplicate any blinded identifier in the existing initial blinded identifier set can first be generated according to a preset noise generation rate.
- the N noise identifiers are then added to the initial blinded identifier set, and the order is shuffled to obtain the blinded identifier set.
- for example, there are two clients, client A and client B. Client A independently selects a random number b_A, uses it as a blinding seed fragment, and sends it to client B through Channel_AB.
- client B independently selects a random number b_B, uses it as a blinding seed fragment, and sends it to client A through Channel_AB.
- client A computes the blinding parameter b = b_A XOR b_B and then, for the id field of each of its records, computes id_b = Hash(id, b).
- for example, id_b = Hash("A0001", b) yields the blinded string 0xAF12C3 corresponding to "A0001", which can be used to determine the matching identifier set in the subsequent blind comparison of sample ids.
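The blinding step can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the description only says that a preset hash function is applied to the identifier and the blinding parameter, so the use of SHA-256 over `b || id`, the 16-byte fragment length, and the function names `combine_seed_fragments` and `blind_identifier` are all assumptions.

```python
import hashlib
import secrets


def combine_seed_fragments(b_a: bytes, b_b: bytes) -> bytes:
    """XOR two blinding seed fragments into the shared blinding parameter b."""
    if len(b_a) != len(b_b):
        raise ValueError("seed fragments must have equal length")
    return bytes(x ^ y for x, y in zip(b_a, b_b))


def blind_identifier(identifier: str, b: bytes) -> str:
    """Hash(id, b), instantiated here (as an assumption) with SHA-256 over b || id."""
    return hashlib.sha256(b + identifier.encode("utf-8")).hexdigest()


# Each client draws its own fragment and would exchange it over Channel_AB.
b_A = secrets.token_bytes(16)
b_B = secrets.token_bytes(16)
b = combine_seed_fragments(b_A, b_B)  # both clients derive the same b

# Identifiers of client A, as in the example data of this description.
ids_A = ["A0001", "A0002", "A0003", "A0004"]
id_b_A = [blind_identifier(i, b) for i in ids_A]
```

Because both clients derive the same `b`, equal plaintext ids map to equal blinded ids, which is what lets the coordinating party compare them without seeing the plaintext.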
- S302 Send the blinded identifier set to the coordinating party, so that the coordinating party performs matching processing according to the blinded identifier sets sent by each client to obtain a matching identifier set, and then generates identifier record information corresponding to each client according to the matching identifier set and each blinded identifier set.
- the client may send the blinded identifier set to the coordinating party through a preset data transmission channel.
- the coordinating party can also receive the blinded identifier sets sent by other clients at the same time.
- it can then process the blinded identifier sets sent by the clients to obtain several pieces of identifier record information corresponding to each client.
- S303 Receive several identifier record information sent by the coordinating party, and perform heterogeneous data decryption calculation according to the several identifier record information, and obtain the plaintext processing result of the service to be processed.
- the coordinating party can return the obtained pieces of identifier record information to the corresponding client. After the client receives the returned identifier record information, it can perform the heterogeneous data decryption calculation according to that information and obtain the plaintext processing result of the service to be processed.
- the heterogeneous data decryption calculation is performed according to the several identifier record information to obtain the plaintext processing result of the service to be processed, which may specifically include:
- for each piece of identifier record information, it is determined from the identifier position whether the identifier record information is a noise identifier.
- if the identifier record information is not a noise identifier, the first initial data corresponding to the identifier position is determined according to the identifier position and the identifier data corresponding to the service to be processed.
- the first initial data and the second initial data sent by other clients are jointly processed according to a preset joint calculation processing rule to obtain intermediate result fragments corresponding to the identifier record information.
- each piece of identifier record information includes an identifier position, that is, pos_A_i or pos_B_j, from which it can be determined whether the record is data from the initially acquired data set sample.
- the identifier record information includes sub-private fragment data generated from the first private fragment data or the second private fragment data, and jointly processing the first initial data and the second initial data sent by other clients to obtain the intermediate result fragment corresponding to the identifier record information may specifically include:
- the first initial data is fragmented according to a preset fragmentation rule to obtain several pieces of first initial sub-data.
- a preset number of the first initial sub-data is sent to the other client, leaving the first remaining initial sub-data, and at the same time a preset number of second outgoing initial sub-data sent by the other client is received.
- the first remaining initial sub-data, the second outgoing initial sub-data, and the first sub-private fragment data, together with the second remaining initial sub-data, the first outgoing initial sub-data, and the second sub-private fragment data in the other client, are jointly calculated according to a preset secret sharing protocol to obtain the intermediate result fragment corresponding to the identifier record information. The first sub-private fragment data is the sub-private fragment data included in the identifier record information corresponding to the local client, and the second sub-private fragment data is the sub-private fragment data included in the identifier record information corresponding to the other client.
- the data that the other client retains after sending may be referred to as the second remaining initial sub-data.
- the method may also include:
- if the identifier record information is a noise identifier, the data corresponding to the identifier position is set to zero.
- client A locally reads the record-related data pointed to by pos_A_i in (idx_w, pos_A_i, selector_w[0]) and determines the corresponding income field, specifically:
- if pos_A_i exceeds the maximum number of records in the original data set sample of client A, it indicates a noise record added by coordinating party C, and income is set to 0.
- client B locally reads the record-related data pointed to by pos_B_j in (idx_w, pos_B_j, selector_w[1]) and determines the corresponding tax_rate field, specifically:
- if pos_B_j exceeds the maximum number of records in the original data set sample of client B, it indicates a noise record added by coordinating party C, and tax_rate is set to 0.
- if pos_B_j points to a noise record originally added by client B, tax_rate is also set to 0.
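The noise handling above can be sketched as a small lookup. This is only an illustration under stated assumptions: positions are taken to be 0-based indices into the client's own shuffled list, the client's own noise entries are marked with `None`, and the name `field_value` is hypothetical.

```python
from typing import Optional, Sequence


def field_value(pos: int, local_values: Sequence[Optional[float]]) -> float:
    """Return the local field for a record position, or 0 for any noise record.

    local_values is the client's own list after it added and shuffled its
    noise identifiers; None marks a noise entry the client added itself.
    A position past the end of the list can only come from noise added
    later by the coordinating party.
    """
    if pos < 0 or pos >= len(local_values):
        return 0.0  # noise record added by the coordinating party
    value = local_values[pos]
    return 0.0 if value is None else value  # client's own noise maps to 0


# Hypothetical shuffled tax_rate list of client B with one self-added noise entry.
tax_rate_B = [0.10, None, 0.35, 0.2]
```

For example, `field_value(0, tax_rate_B)` reads a real record, while `field_value(1, tax_rate_B)` and `field_value(7, tax_rate_B)` both fall back to zero, so noise records cannot change the later sum.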
- client A and client B, based on an existing secure multi-party computing protocol (such as the classic secret sharing protocol), use the fragmented data held by client A, namely v1_w[1] (the first remaining initial sub-data), v2_w[0] (the second outgoing initial sub-data), and selector_w[0] (the first sub-private fragment data), together with the fragmented data held by client B, namely v1_w[0], v2_w[1], and selector_w[1] (the second sub-private fragment data), to jointly calculate the intermediate result fragment for v1_w*v2_w*selector_w without disclosing the plaintext values of v1_w, v2_w, or selector_w.
- only alignment samples for which selector_w is 1, that is, those from the matching identifier set, contribute to the result.
- the final decryption is performed based on a secure multi-party computing protocol (such as the classic secret sharing protocol) to obtain the final plaintext processing result of the service to be processed.
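The description names only "the classic secret sharing protocol" and gives no details, so the following is a single-process simulation of one possible instantiation, not the applicant's protocol. The assumptions are: additive sharing modulo a prime `P`, multiplication with Beaver triples from a trusted dealer, and tax rates scaled to integer hundredths. All function names are hypothetical.

```python
import secrets

P = 2**61 - 1  # prime modulus; an illustrative choice, not from the source


def share(x):
    """Split x into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return (r, (x - r) % P)


def open_value(sh):
    """Reconstruct a value from its two shares."""
    return (sh[0] + sh[1]) % P


def beaver_mul(x_sh, y_sh):
    """Multiply two shared values with a Beaver triple from a simulated dealer."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % P)
    # The parties open only the masked differences d = x - a and e = y - b.
    d = open_value(tuple((x_sh[i] - a_sh[i]) % P for i in range(2)))
    e = open_value(tuple((y_sh[i] - b_sh[i]) % P for i in range(2)))
    z = [(c_sh[i] + d * b_sh[i] + e * a_sh[i]) % P for i in range(2)]
    z[0] = (z[0] + d * e) % P  # exactly one party adds the public term d*e
    return tuple(z)


def joint_sum(records):
    """Sum v1_w * v2_w * selector_w over all aligned samples.

    Each party adds up its intermediate result fragments locally; only the
    aggregated total is opened at the end.
    """
    total = [0, 0]
    for v1, v2, selector in records:
        v1_sh, v2_sh, sel_sh = share(v1), share(v2), share(selector)
        prod = beaver_mul(beaver_mul(v1_sh, v2_sh), sel_sh)
        total = [(total[i] + prod[i]) % P for i in range(2)]
    return open_value(tuple(total))


# (income, tax_rate in hundredths, selector) for four aligned samples.
aligned = [(100, 10, 1), (300, 25, 0), (200, 35, 1), (150, 20, 1)]
scaled_total = joint_sum(aligned)  # divide by 100 to undo the scaling
```

In a real deployment the two share components would live on different machines and the triples would come from a dealer or a preprocessing phase. The simulation shows only the arithmetic that keeps v1_w, v2_w and selector_w hidden until the final sum is opened.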
- Table 3 is a logical summary table after sample alignment, specifically including identifier information shared by client A and client B and corresponding specific information.
- client A and client B can respectively calculate the aggregated Hash of the local fragmented data, and send the result to the blockchain for certificate storage.
- the method may further include:
- the plaintext processing result of the service to be processed is converted according to the pre-stored hash function to obtain the converted plaintext processing result.
- Figure 4 is a schematic structural diagram of a heterogeneous data processing device provided by an embodiment of this application, which is applied to the coordinating party. As shown in Figure 4, the device provided by this embodiment may include:
- the obtaining module 401 is configured to obtain a set of blinded identifiers after blinding processing corresponding to services to be processed sent by each client.
- the processing module 402 is configured to perform matching processing on each blinded identifier set according to a preset matching rule to obtain a matching identifier set.
- processing module 402 is further configured to:
- the blinded identifier sets corresponding to different user terminals are compared, and the intersection of the blinded identifier sets corresponding to different user terminals is determined to obtain a matching identifier set.
- the processing module 402 is further configured to generate identifier record information corresponding to each client according to the matching identifier set and each blinded identifier set, and to send the identifier record information corresponding to each client to that client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed.
- processing module 402 is further configured to:
- the private identifier set corresponding to each client is completed according to the preset completion rule, and the completed private identifier set corresponding to each client is obtained.
- the matching identifier set and the completed private identifier set corresponding to each client are mixed to obtain an alignment sample set.
- for each alignment sample in the alignment sample set, if the alignment sample belongs to the matching identifier set, identifier record information corresponding to each client is generated according to a prestored first generation rule.
- if the alignment sample belongs to the private identifier set corresponding to each client, identifier record information corresponding to each client is generated according to a prestored second generation rule.
- processing module 402 is also used for:
- if the alignment sample belongs to the matching identifier set, a pair of first private fragment data for the value 1 is randomly generated.
- processing module is also used for:
- the identifier record information corresponding to each client is converted according to the pre-stored hash function to obtain the converted identifier record information.
- processing module is also used for:
- a data transmission channel between the coordinating party and each user terminal is constructed according to a preset channel construction rule.
- the blinded identifier sets corresponding to the services to be processed sent by each client after the blinding process are obtained through the pre-built data transmission channel.
- the present application also provides another device for processing heterogeneous data, which is applied to a client, and the device may include:
- the processing module is configured to perform blinding processing on the identifier data corresponding to the service to be processed according to the pre-stored blinding processing rules to obtain a blinded identifier set.
- processing module is also used for:
- the blinding parameters are determined according to preset blinding parameter processing rules.
- the blinding parameters and the identifier data corresponding to the service to be processed are processed according to a preset hash function to obtain an initial blinding identifier set.
- a preset number of noise identifiers that are not repeated with the initial blinded identifier set are generated according to a preset noise generation rule.
- a blinded identifier set is obtained according to the noise identifier and the initial blinded identifier set.
- processing module is also used for:
- a first initial blinding parameter is randomly determined, and at the same time, a second initial blinding parameter sent by other user terminals is received.
- Exclusive OR operation is performed on the first initial blinding parameter and the second initial blinding parameter to obtain a blinding parameter.
- a sending module configured to send the blinded identifier set to the coordinating party, so that the coordinating party performs matching processing according to the blinded identifier sets sent by each client to obtain a matching identifier set, and then generates identifier record information corresponding to each client according to the matching identifier set and each blinded identifier set.
- the processing module is further configured to receive several identifier record information sent by the coordinating party, and perform heterogeneous data decryption calculation according to the several identifier record information, and obtain the plaintext processing result of the service to be processed.
- the identifier record information includes the identifier position, and the processing module is further configured to:
- for each piece of identifier record information, determine from the identifier position whether the identifier record information is a noise identifier.
- if the identifier record information is not a noise identifier, determine the first initial data corresponding to the identifier position according to the identifier position and the identifier data corresponding to the service to be processed.
- the first initial data and the second initial data sent by other clients are jointly processed according to a preset joint calculation processing rule to obtain intermediate result fragments corresponding to the identifier record information.
- the identifier record information includes sub-private fragment data generated from the first private fragment data or the second private fragment data, and the processing module is also used for:
- jointly calculating the first remaining initial sub-data, the second outgoing initial sub-data, and the first sub-private fragment data, together with the second remaining initial sub-data, the first outgoing initial sub-data, and the second sub-private fragment data in the other client, to obtain the intermediate result fragment corresponding to the identifier record information. The first sub-private fragment data is the sub-private fragment data included in the identifier record information corresponding to the local client, and the second sub-private fragment data is the sub-private fragment data included in the identifier record information corresponding to the other client.
- processing module is also used for:
- if the identifier record information is a noise identifier, the data corresponding to the identifier position is set to zero.
- processing module is also used for:
- the plaintext processing result of the service to be processed is converted according to the pre-stored hash function to obtain the converted plaintext processing result.
- the device provided in the embodiment of the present application can implement the method in the above embodiment as shown in FIG. 2 , and its implementation principle and technical effect are similar, and will not be repeated here.
- FIG. 5 is a schematic diagram of a hardware structure of an electronic device provided in an embodiment of the present application.
- a device 500 provided in this embodiment includes a processor 501 and a memory 502 communicatively connected to the processor 501, where the processor 501 and the memory 502 are connected through a bus 503.
- the processor 501 executes the computer-executed instructions stored in the memory 502, so that the processor 501 executes the heterogeneous data processing method in the foregoing method embodiments.
- the processor can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc.
- a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the method disclosed in conjunction with the invention can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
- the memory may include high-speed RAM memory, and may also include non-volatile storage NVM, such as at least one disk memory.
- the bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc.
- the bus can be divided into address bus, data bus, control bus and so on.
- the buses in the drawings of the present application are not limited to only one bus or one type of bus.
- the embodiment of the present application also provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when the processor executes the computer-executable instructions, the heterogeneous data processing method of the foregoing method embodiment is implemented .
- An embodiment of the present application further provides a computer program product, including a computer program, and when the computer program is executed by a processor, implements the heterogeneous data processing method as described above.
- the above-mentioned computer-readable storage medium can be realized by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
- readable storage media can be any available media that can be accessed by a general-purpose or special-purpose computer.
- an exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium.
- the readable storage medium can also be a component of the processor.
- the processor and the readable storage medium may be located in Application Specific Integrated Circuits (ASIC for short).
- the processor and the readable storage medium can also exist in the device as discrete components.
- the aforementioned program can be stored in a computer-readable storage medium.
- when executed, the program performs the steps of the above method embodiments; the aforementioned storage medium includes ROM, RAM, magnetic disks, optical disks, and other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Medical Informatics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
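Embodiments of the present application provide a heterogeneous data processing method, apparatus, and electronic device. The method includes: obtaining the blinded identifier sets, obtained by blinding processing, that the clients send for a service to be processed; matching the blinded identifier sets according to a preset matching rule to obtain a matching identifier set; generating identifier record information corresponding to each client according to the matching identifier set and the blinded identifier sets; and sending the identifier record information corresponding to each client to that client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed. The embodiments can reduce the risk of user information leakage and thereby improve the user experience.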
Description
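This application claims priority to Chinese patent application No. 202111462228.9, entitled "Heterogeneous data processing method, apparatus, and electronic device", filed with the China Patent Office on December 2, 2021, the entire contents of which are incorporated herein by reference.
The present application relates to the field of blockchain technology, and in particular to a heterogeneous data processing method, apparatus, and electronic device.
With the development of Internet technology, interactions between institutions are becoming increasingly frequent. Implementing a financial service may require joint computation over heterogeneous data held by different institutions.
In the prior art, joint computation over heterogeneous data can first align samples by private set intersection to obtain the plaintext identification information of the sample records shared by all parties, and then run secure multi-party computation or a federated learning algorithm on the data sets corresponding to that plaintext identification information, thereby achieving cross-institution joint computation over heterogeneous data.
However, in real services, sample alignment outputs the shared information in plaintext, so every participating institution can learn which information other institutions share with it. This increases the risk of user information leakage, which in turn increases compliance risk and degrades the user experience.
Summary of the Invention
The purpose of the present application is to provide a heterogeneous data processing method, apparatus, and electronic device that reduce the risk of user information leakage.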
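In a first aspect, the present application discloses a heterogeneous data processing method, applied to a coordinating party, including:
obtaining the blinded identifier sets, obtained by blinding processing, that the clients send for a service to be processed;
matching the blinded identifier sets according to a preset matching rule to obtain a matching identifier set;
generating identifier record information corresponding to each client according to the matching identifier set and the blinded identifier sets, and sending the identifier record information corresponding to each client to that client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed.
Optionally, matching the blinded identifier sets according to the preset matching rule to obtain the matching identifier set includes:
comparing the blinded identifier sets corresponding to different clients and determining their intersection to obtain the matching identifier set.
Optionally, generating the identifier record information corresponding to each client according to the matching identifier set and the blinded identifier sets includes:
deleting from each blinded identifier set the blinded identifiers that belong to the matching identifier set, to obtain the private identifier set corresponding to each client;
completing the private identifier set corresponding to each client according to a preset completion rule, to obtain the completed private identifier set corresponding to each client;
mixing the matching identifier set with the completed private identifier sets corresponding to the clients to obtain an alignment sample set;
for each alignment sample in the alignment sample set, if the alignment sample belongs to the matching identifier set, generating the identifier record information corresponding to each client according to a pre-stored first generation rule;
if the alignment sample belongs to the private identifier sets corresponding to the clients, generating the identifier record information corresponding to each client according to a pre-stored second generation rule.
Optionally, generating the identifier record information corresponding to each client according to the pre-stored first generation rule if the alignment sample belongs to the matching identifier set includes:
if the alignment sample belongs to the matching identifier set, randomly generating a pair of first private fragment data for the value 1;
generating the identifier record information corresponding to each client according to the alignment sample and the first private fragment data.
Optionally, generating the identifier record information corresponding to each client according to the pre-stored second generation rule if the alignment sample belongs to the private identifier sets corresponding to the clients includes:
if the alignment sample belongs to the private identifier sets corresponding to the clients, randomly generating a pair of second private fragment data for the value 0;
generating the identifier record information corresponding to each client according to the alignment sample and the second private fragment data.
Optionally, completing the private identifier set corresponding to each client according to the preset completion rule to obtain the completed private identifier set corresponding to each client includes:
determining, among the private identifier sets corresponding to the clients, the target private identifier set that contains the most private identifiers;
completing every private identifier set other than the target private identifier set according to the number of private identifiers in the target private identifier set, to obtain the completed private identifier set corresponding to each client.
Optionally, after generating the identifier record information corresponding to each client according to the matching identifier set and the blinded identifier sets, the method further includes:
converting the identifier record information corresponding to each client according to a pre-stored hash function to obtain converted identifier record information;
uploading the converted identifier record information to a preset blockchain.
Optionally, before obtaining the blinded identifier sets that the clients send for the service to be processed, the method further includes:
constructing a data transmission channel between the coordinating party and each client according to a preset channel construction rule;
in which case obtaining the blinded identifier sets, obtained by blinding processing, that the clients send for the service to be processed includes:
obtaining, through the pre-constructed data transmission channel, the blinded identifier sets, obtained by blinding processing, that the clients send for the service to be processed.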
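In a second aspect, the present application discloses a heterogeneous data processing method, applied to a client, including:
blinding the identifier data corresponding to a service to be processed according to a pre-stored blinding processing rule to obtain a blinded identifier set;
sending the blinded identifier set to a coordinating party, so that the coordinating party matches the blinded identifier sets sent by the clients to obtain a matching identifier set, and then generates identifier record information corresponding to each client according to the matching identifier set and the blinded identifier sets;
receiving several pieces of identifier record information sent by the coordinating party, and performing heterogeneous data decryption calculation according to them to obtain the plaintext processing result of the service to be processed.
Optionally, blinding the identifier data corresponding to the service to be processed according to the pre-stored blinding processing rule to obtain the blinded identifier set includes:
determining a blinding parameter according to a preset blinding parameter processing rule;
processing the blinding parameter and the identifier data corresponding to the service to be processed according to a preset hash function to obtain an initial blinded identifier set;
generating, according to a preset noise generation rule, a preset number of noise identifiers that do not duplicate the initial blinded identifier set;
obtaining the blinded identifier set from the noise identifiers and the initial blinded identifier set.
Optionally, determining the blinding parameter according to the preset blinding parameter processing rule includes:
randomly determining a first initial blinding parameter while receiving a second initial blinding parameter sent by other clients;
performing an exclusive OR operation on the first initial blinding parameter and the second initial blinding parameter to obtain the blinding parameter.
Optionally, the identifier record information contains an identifier position, and performing heterogeneous data decryption calculation according to the several pieces of identifier record information to obtain the plaintext processing result of the service to be processed includes:
for each piece of identifier record information, determining from the identifier position whether the identifier record information is a noise identifier;
if the identifier record information is not a noise identifier, determining the first initial data corresponding to the identifier position according to the identifier position and the identifier data corresponding to the service to be processed;
jointly processing the first initial data and the second initial data sent by other clients according to a preset joint calculation processing rule to obtain the intermediate result fragment corresponding to the identifier record information;
aggregating the intermediate result fragments corresponding to the pieces of identifier record information to obtain an intermediate result, and decrypting the intermediate result according to a preset secure multi-party computing protocol to obtain the plaintext processing result of the service to be processed.
Optionally, the identifier record information contains sub-private fragment data generated from the first private fragment data or the second private fragment data, and jointly processing the first initial data and the second initial data sent by other clients according to the preset joint calculation processing rule to obtain the intermediate result fragment corresponding to the identifier record information includes:
fragmenting the first initial data according to a preset fragmentation rule to obtain several pieces of first initial sub-data;
sending a preset number of the first initial sub-data to the other client, leaving the first remaining initial sub-data, while receiving a preset number of second outgoing initial sub-data sent by the other client, where the second outgoing initial sub-data is obtained by fragmenting the second sub-data in the other client according to the preset fragmentation rule;
jointly calculating, according to a preset secret sharing protocol, the first remaining initial sub-data, the second outgoing initial sub-data, and the first sub-private fragment data, together with the second remaining initial sub-data, the first outgoing initial sub-data, and the second sub-private fragment data in the other client, to obtain the intermediate result fragment corresponding to the identifier record information, where the first sub-private fragment data is the sub-private fragment data contained in the identifier record information corresponding to the local client, and the second sub-private fragment data is the sub-private fragment data contained in the identifier record information corresponding to the other client.
Optionally, the method further includes:
if the identifier record information is a noise identifier, setting the data corresponding to the identifier position to zero.
Optionally, after obtaining the plaintext processing result of the service to be processed, the method further includes:
converting the plaintext processing result of the service to be processed according to a pre-stored hash function to obtain a converted plaintext processing result;
uploading the converted plaintext processing result to a preset blockchain.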
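In a third aspect, the present application discloses a heterogeneous data processing apparatus, applied to a coordinating party, including:
an obtaining module, configured to obtain the blinded identifier sets, obtained by blinding processing, that the clients send for a service to be processed;
a processing module, configured to match the blinded identifier sets according to a preset matching rule to obtain a matching identifier set;
the processing module being further configured to generate identifier record information corresponding to each client according to the matching identifier set and the blinded identifier sets, and to send the identifier record information corresponding to each client to that client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed.
In a fourth aspect, the present application discloses a heterogeneous data processing apparatus, applied to a client, including:
a processing module, configured to blind the identifier data corresponding to a service to be processed according to a pre-stored blinding processing rule to obtain a blinded identifier set;
a sending module, configured to send the blinded identifier set to a coordinating party, so that the coordinating party matches the blinded identifier sets sent by the clients to obtain a matching identifier set, and then generates identifier record information corresponding to each client according to the matching identifier set and the blinded identifier sets;
the processing module being further configured to receive the several pieces of identifier record information sent by the coordinating party, and to perform heterogeneous data decryption calculation according to them to obtain the plaintext processing result of the service to be processed.
In a fifth aspect, the present application discloses an electronic device, including a processor and a memory communicatively connected to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the heterogeneous data processing method according to any one of the first aspect and the second aspect.
In a sixth aspect, the present application discloses a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the heterogeneous data processing method according to any one of the first aspect and the second aspect.
In a seventh aspect, the present application discloses a computer program product, including a computer program which, when executed by a processor, implements the heterogeneous data processing method according to any one of the first aspect and the second aspect.
Embodiments of the present application provide a heterogeneous data processing method, apparatus, and electronic device. With the above solution, the coordinating party first obtains the blinded identifier sets, obtained by blinding processing, that the clients send for the service to be processed, then matches the blinded identifier sets according to a preset matching rule to obtain a matching identifier set, generates identifier record information corresponding to each client according to the matching identifier set and the blinded identifier sets, and sends the identifier record information corresponding to each client to that client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed. Aligning samples through a coordinating party that processes only the non-sensitive ciphertext data produced by blinding overcomes the need in traditional sample alignment to disclose the set of sample identifiers shared by multiple parties. This reduces the risk of user information leakage, improves the security of blinded processing of heterogeneous data, reduces compliance risk, and thereby improves the user experience.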
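FIG. 1 is a schematic architecture diagram of an application system of the heterogeneous data processing method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of the heterogeneous data processing method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of the heterogeneous data processing method provided by another embodiment of the present application;
FIG. 4 is a schematic structural diagram of the heterogeneous data processing apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the hardware structure of the electronic device provided by an embodiment of the present application.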
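The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The terms "first", "second", "third", "fourth", and the like (if any) in the description, claims, and drawings of the present application distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
In the prior art, different terminal devices (which may be clients) can represent different institutions, and different clients can each hold heterogeneous data sets with different fields. Without providing the plaintext of their own sensitive data to one another, the clients can align samples by the identifiers of the data set records and then perform aggregate statistical operations. For example, the client of Party A holds 100 records (id, X1) about sensitive data X1, and the client of Party B holds 50 records (id, X2) about sensitive data X2. The set of ids held by both parties has size 40. After sample alignment, the 40 shared records can be found, and the sum of the products of their sensitive data, Y=SUM(X1*X2), can be computed, which achieves the joint computation over heterogeneous data. However, in real services, sample alignment outputs the shared information in plaintext, so every participating institution can learn which information other institutions share with it. This increases the risk of user information leakage, which in turn increases compliance risk and degrades the user experience.
In view of the above technical problem, the present application aligns samples through a coordinating party that processes only the non-sensitive ciphertext data produced by blinding. This overcomes the need in traditional sample alignment to disclose the set of sample identifiers shared by multiple parties, reduces the risk of user information leakage, improves the security of blinded processing of heterogeneous data, reduces compliance risk, and thereby improves the user experience.
FIG. 1 is a schematic architecture diagram of an application system of the heterogeneous data processing method provided by an embodiment of the present application. As shown in FIG. 1, the application system may include a coordinating party 101 and different clients 102, where there may be two, three, or more clients 102. Each client 102 can blind the locally stored identifier set corresponding to the service to be processed to obtain a blinded identifier set, and then send the blinded identifier set to the coordinating party 101 for sample alignment, which generates the identifier record information corresponding to each client. The identifier record information corresponding to each client is then sent to that client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed.
The coordinating party 101 may be a single server or a server cluster. A client 102 may be a single server, a server cluster, or a device such as a personal computer, smartphone, or tablet. The clients may be the same kind of device or different kinds of devices.
The technical solutions of the present application are described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.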
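FIG. 2 is a schematic flowchart of the heterogeneous data processing method provided by an embodiment of the present application. The method of this embodiment may be executed by the coordinating party 101. As shown in FIG. 2, the method of this embodiment may include:
S201: Obtain the blinded identifier sets, obtained by blinding processing, that the clients send for the service to be processed.
In this embodiment, to implement the service to be processed, the data corresponding to the service can first be obtained from the different clients, and the obtained heterogeneous data can then be processed jointly to implement the service.
However, each client may hold many data items, some of which are needed to implement the service to be processed and some of which are not. The data therefore needs sample alignment to determine the data actually needed. To make sample alignment more efficient, a coordinating party can be added, and sample alignment can be carried out by having the clients send identifier sets that represent their data to the coordinating party.
However, data may leak while the clients send identifier sets to the coordinating party. To reduce this risk, each client can first blind the identifier set corresponding to the service to be processed to obtain a blinded identifier set, and then send the blinded identifier set to the coordinating party.
A blinded identifier set may contain multiple blinded identifiers, each of which is a data identifier after blinding. For example, the data identifier may be a data id and the blinded identifier the id after blinding: the data identifier A0001 before blinding becomes 0xAF12C3 after blinding.
The blinding method can be customized for the actual application scenario and is not defined in further detail here.
S202: Match the blinded identifier sets according to a preset matching rule to obtain a matching identifier set.
In this embodiment, after the blinded identifier sets of the clients are obtained, the blinded identifier sets of different clients can be matched to obtain a matching identifier set containing the identifier information common to the different blinded identifier sets.
Further, matching the blinded identifier sets according to the preset matching rule to obtain the matching identifier set may specifically include:
Compare the blinded identifier sets corresponding to different clients and determine their intersection to obtain the matching identifier set.
Specifically, the blinded identifier sets corresponding to different clients can first be compared to determine the identical identifier information they contain, and all identical identifier information can then be placed in a new set to obtain the matching identifier set.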
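The matching step amounts to a set intersection over blinded values. The sketch below is illustrative only: the function name `match_blinded_sets` is hypothetical, and the blinded values other than 0xAF12C3, 0xCC6712 and 0x2E341B are made-up placeholders.

```python
def match_blinded_sets(blinded_sets):
    """Return the blinded identifiers that appear in every client's set."""
    sets = [set(s) for s in blinded_sets]
    return set.intersection(*sets) if sets else set()


# The coordinating party sees only blinded values, never plaintext ids.
id_bn_A = ["0xAF12C3", "0x91D0A7", "0xCC6712", "0x2E341B"]
id_bn_B = ["0x2E341B", "0x1B22F0", "0xAF12C3", "0x7C9E05", "0xCC6712", "0x33AA10"]

id_match = match_blinded_sets([id_bn_A, id_bn_B])
```

Because every client blinds with the same parameter and hash function, equal plaintext ids produce equal blinded ids, so the intersection can be found without revealing which real ids are shared.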
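S203: Generate identifier record information corresponding to each client according to the matching identifier set and the blinded identifier sets, and send the identifier record information corresponding to each client to that client, so that the client performs heterogeneous data decryption calculation according to the received identifier record information and obtains the plaintext processing result of the service to be processed.
In this embodiment, after the matching identifier set is obtained, the identifier record information corresponding to each client can be generated from each blinded identifier in the matching identifier set and the blinded identifiers in each blinded identifier set. The identifier record information corresponding to each client is then sent to that client, so that the client processes it to obtain the plaintext processing result of the service to be processed.
Further, generating the identifier record information corresponding to each client according to the matching identifier set and the blinded identifier sets may specifically include:
Delete from each blinded identifier set the blinded identifiers that belong to the matching identifier set, to obtain the private identifier set corresponding to each client.
Specifically, the blinded identifiers belonging to the matching identifier set can be removed from the blinded identifier set of each client to obtain the private identifier set corresponding to each client.
In addition, the number of private identifier sets may be zero, one, or more. If the blinded identifier sets of all clients are identical, the number of private identifier sets is zero. If the blinded identifier set of only one client contains blinded identifiers other than those in the matching identifier set, the number of private identifier sets is one.
Complete the private identifier set corresponding to each client according to a preset completion rule to obtain the completed private identifier set corresponding to each client.
Specifically, the private identifier sets may be completed as follows:
Determine, among the private identifier sets corresponding to the clients, the target private identifier set that contains the most private identifiers.
Complete every private identifier set other than the target private identifier set according to the number of private identifiers in the target private identifier set, to obtain the completed private identifier set corresponding to each client.
Correspondingly, the target private identifier set containing the most private identifiers can first be determined, and the other private identifier sets can then be completed until they contain the same number of private identifiers as the target private identifier set. After completion, all private identifier sets contain the same number of private identifiers, which facilitates subsequent analysis and processing. The private identifiers in a private identifier set are blinded identifiers produced by blinding.
When a private identifier set is completed, it can be filled with noise identifiers that are generated by an existing noise identifier generation rule and do not duplicate any existing blinded identifier.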
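The completion step can be sketched as padding every private set up to the size of the largest one. This is an illustration under assumptions: the description does not fix a noise generation rule, so random hex strings are used here, and the name `pad_private_sets` and the placeholder blinded values are hypothetical.

```python
import secrets


def pad_private_sets(private_sets, existing):
    """Pad each private set with fresh noise ids up to the largest set's size.

    private_sets maps a client name to its list of private blinded ids;
    existing holds every other blinded id already in use (for example the
    matching set), so generated noise never collides with a real value.
    """
    target = max((len(ids) for ids in private_sets.values()), default=0)
    used = set(existing)
    for ids in private_sets.values():
        used.update(ids)
    padded = {}
    for client, ids in private_sets.items():
        ids = list(ids)
        while len(ids) < target:
            noise = "0x" + secrets.token_hex(3).upper()  # assumed noise format
            if noise not in used:
                used.add(noise)
                ids.append(noise)
        padded[client] = ids
    return padded


id_match = {"0xAF12C3", "0xCC6712", "0x2E341B"}
rest = {"A": ["0x91D0A7"], "B": ["0x1B22F0", "0x7C9E05", "0x33AA10"]}
padded = pad_private_sets(rest, existing=id_match)
```

Here client A's private set grows from one entry to three, which hides from each client how many unmatched records the other side holds.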
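Mix the matching identifier set with the completed private identifier sets corresponding to the clients to obtain an alignment sample set.
Specifically, after the private identifier sets are completed, to improve the security of subsequent data processing, the data in the matching identifier set can be randomly mixed with the data in the completed private identifier sets corresponding to the clients to obtain the mixed alignment sample set. Each alignment sample in the alignment sample set has a corresponding sample number.
Mixing the data scrambles the previous order of the data, which further reduces the possibility of data leakage and improves the security of data transmission.
For each alignment sample in the alignment sample set, if the alignment sample belongs to the matching identifier set, generate the identifier record information corresponding to each client according to a pre-stored first generation rule.
If the alignment sample belongs to the private identifier sets corresponding to the clients, generate the identifier record information corresponding to each client according to a pre-stored second generation rule.
Specifically, after the mixed alignment sample set is obtained, an alignment sample can be randomly selected from it, and its original source can be determined from the sample identifier it contains, that is, whether it belongs to the matching identifier set or to the private identifier sets corresponding to the clients. The generation rule is then determined by the original source, and the identifier record information corresponding to each client is generated according to that rule.
Further, generating the identifier record information corresponding to each client according to the pre-stored first generation rule if the alignment sample belongs to the matching identifier set may specifically include:
If the alignment sample belongs to the matching identifier set, randomly generate a pair of first private fragment data for the value 1.
Generate the identifier record information corresponding to each client according to the alignment sample and the first private fragment data.
In addition, generating the identifier record information corresponding to each client according to the pre-stored second generation rule if the alignment sample belongs to the private identifier sets corresponding to the clients may specifically include:
If the alignment sample belongs to the private identifier sets corresponding to the clients, randomly generate a pair of second private fragment data for the value 0.
Generate the identifier record information corresponding to each client according to the alignment sample and the second private fragment data.
The first private fragment data for the value 1 is a group of first private fragment data whose sum is 1, and the number of first private fragment data corresponds to the number of clients. For example, with two clients, a random number r_w is generated first, and the first private fragment data is then selector_w=(r_w,1-r_w). Likewise, the second private fragment data for the value 0 is a group of second private fragment data whose sum is 0, and the number of second private fragment data corresponds to the number of clients. For example, with two clients, a random number r_w is generated first, and the second private fragment data is then selector_w=(r_w,-r_w).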
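The two generation rules differ only in the value being split. The sketch below generalizes the two-client formula to any number of clients. Working modulo a prime `P` is an assumption (the description writes the shares as plain numbers), and the name `selector_shares` is hypothetical.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus; the source does not specify a field


def selector_shares(bit, n_clients=2):
    """Split the selector value (1 for matched, 0 for private) into additive shares.

    For two clients this yields (r_w, 1 - r_w) or (r_w, -r_w) modulo P,
    with a fresh random r_w on every call.
    """
    if bit not in (0, 1):
        raise ValueError("selector must be 0 or 1")
    shares = [secrets.randbelow(P) for _ in range(n_clients - 1)]
    shares.append((bit - sum(shares)) % P)
    return tuple(shares)


matched = selector_shares(1)       # shares of 1 for a sample from id_match
private = selector_shares(0)       # shares of 0 for a sample from the private sets
three_way = selector_shares(1, 3)  # one share per client when there are three
```

A single share is a uniformly random number, so a client cannot tell from its own share whether a given sample was matched.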
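After the first or second private fragment data is obtained, the identifier record information corresponding to each client can be generated from the alignment sample and the obtained private fragment data, that is, the identifier record information that remains after the information related to the other clients is removed. The generated identifier record information is then sent to the corresponding client.
For example, suppose there are two clients, client A and client B. Client A has four data samples, shown in Table 1; each sample includes identifier information and the corresponding specific data, which for client A is the income level (income).
Table 1: Data samples of client A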
id | income |
A0001 | 100 |
A0002 | 300 |
A0003 | 200 |
A0004 | 150 |
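Client B has six data samples, shown in Table 2; each sample includes identifier information and the corresponding specific data, which for client B is the tax rate (tax_rate).
Table 2: Data samples of client B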
Id | tax_rate |
A0001 | 0.10 |
A0003 | 0.35 |
A0004 | 0.2 |
A0005 | 0.25 |
A0007 | 0.3 |
A0009 | 0.22 |
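Client A can then blind its identifier information (the id) to obtain its blinded identifier set id_bn_A={identifier position pos_A_i, identifier data id_bn_A_i}. Client B can likewise blind its identifier information to obtain its blinded identifier set id_bn_B={identifier position pos_B_i, identifier data id_bn_B_i}. Client A and client B can then send their blinded identifier sets to coordinating party C. After receiving them, coordinating party C can compare the two blinded identifier sets, find all identical blinded identifiers, and obtain the matching identifier set id_match. Here id_match contains three identical blinded identifiers, namely 0xAF12C3, 0xCC6712, and 0x2E341B, obtained by blinding A0001, A0003, and A0004. Two private identifier sets are also obtained: id_rest_A for client A and id_rest_B for client B. Coordinating party C cannot see the identifier plaintext, only the blinded identifiers, which reduces the risk of leaking identifier information.
Further, coordinating party C can take the size of the larger of id_rest_A and id_rest_B, size_rest_large=max(size(id_rest_A),size(id_rest_B)), and add non-duplicate noise identifiers to the smaller set until it reaches size_rest_large, yielding completed sets id_rest_A and id_rest_B of equal size. The corresponding identifier position field pos_A_i or pos_B_i is incremented accordingly. Here client B has 2 more records than client A, so coordinating party C inserts 2 noise identifiers into id_rest_A to make it the same size as id_rest_B. Coordinating party C can then shuffle the elements of id_match, id_rest_A, and id_rest_B into one complete alignment sample set BAS, and randomly select the w-th alignment sample from it. If the w-th alignment sample comes from id_match, a pair of first private fragment data for the value 1, selector_w=(r_w,1-r_w), is generated, where r_w is a random number generated independently each time, and the resulting alignment sample is (idx_w,pos_A_i,pos_B_j,selector_w). This alignment sample also belongs to the blinded sample alignment set.
If the w-th alignment sample does not come from id_match, one entry is randomly selected from each of id_rest_A and id_rest_B, and a pair of second private fragment data for the value 0, selector_w=(r_w,-r_w), is generated, where r_w is a random number generated independently each time, and the resulting alignment sample is (idx_w,pos_A_i,pos_B_j,selector_w). Each selection samples without replacement, so previously selected data is not selected again. Coordinating party C can then derive the identifier record information for client A and client B from the alignment sample (idx_w,pos_A_i,pos_B_j,selector_w): (idx_w,pos_A_i,selector_w[0]) for client A and (idx_w,pos_B_j,selector_w[1]) for client B. Coordinating party C can send (idx_w,pos_A_i,selector_w[0]) to client A through the pre-established Channel_AC, and (idx_w,pos_B_j,selector_w[1]) to client B through the pre-established Channel_BC.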
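The record construction performed by coordinating party C can be sketched as follows. This is a simplified illustration, not the applicant's implementation: shares are taken modulo an assumed prime `P`, positions are supplied as lookup tables from blinded id to position, and the name `build_records` and the sample data are hypothetical.

```python
import random
import secrets

P = 2**61 - 1  # illustrative modulus for the selector shares


def build_records(id_match, rest_A, rest_B, pos_A, pos_B):
    """Build per-client records (idx_w, pos, selector share) from aligned samples.

    Matched ids pair each client's position for the same blinded id and get
    shares of 1. Unmatched entries from the two completed private sets are
    paired at random, without replacement, and get shares of 0.
    """
    rng = random.SystemRandom()
    rest_A, rest_B = list(rest_A), list(rest_B)
    if len(rest_A) != len(rest_B):
        raise ValueError("private sets must be completed to equal size first")
    rng.shuffle(rest_A)
    rng.shuffle(rest_B)
    samples = [(pos_A[i], pos_B[i], 1) for i in id_match]
    samples += [(pos_A[a], pos_B[b], 0) for a, b in zip(rest_A, rest_B)]
    rng.shuffle(samples)  # mix matched and private samples
    records_A, records_B = [], []
    for idx_w, (p_a, p_b, bit) in enumerate(samples):
        r_w = secrets.randbelow(P)  # fresh randomness for every sample
        records_A.append((idx_w, p_a, r_w))
        records_B.append((idx_w, p_b, (bit - r_w) % P))
    return records_A, records_B


pos_A = {"m1": 0, "m2": 1, "a1": 2, "noise": 3}
pos_B = {"m1": 1, "m2": 0, "b1": 2, "b2": 3}
records_A, records_B = build_records(
    ["m1", "m2"], ["a1", "noise"], ["b1", "b2"], pos_A, pos_B
)
```

Each client receives only its own position and one selector share per sample, so neither client learns the other's position or whether a given sample was matched.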
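Furthermore, the service to be processed may be a transfer, a balance inquiry, a loan, or a similar service. In this embodiment, the service to be processed is the calculation of the total tax amount, determined by the expression SUM(income*tax_rate). After receiving the identifier record information, client A and client B can therefore perform the heterogeneous data decryption calculation according to the received (idx_w,pos_A_i,selector_w[0]) and (idx_w,pos_B_j,selector_w[1]) to obtain the total tax amount.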
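As a plaintext check of what the joint calculation should return for this example, the expected total can be worked out from Tables 1 and 2 alone. The matched identifiers are A0001, A0003 and A0004; writing v1_w and v2_w for the income and tax_rate values of the w-th alignment sample is a notational assumption made here for readability.

```latex
\begin{aligned}
\mathrm{SUM}(income \cdot tax\_rate)
  &= \sum_{w} v1_w \cdot v2_w \cdot selector_w \\
  &= 100 \times 0.10 + 200 \times 0.35 + 150 \times 0.2 \\
  &= 10 + 70 + 30 = 110
\end{aligned}
```

Samples drawn from id_rest_A and id_rest_B carry a selector of 0, so records such as A0002 or A0005 contribute nothing to the sum.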
采用上述方案后,协作方端可以先获取各用户端发送的待处理业务对应的盲化处理之后的盲化标识符集合,然后根据预设的匹配规则对各盲化标识符集合进行匹配处理,得到相同的标识符信息,并根据相同的标识符信息以及各盲化标识符集合得到匹配标识符集合,再根据匹配标识符集合以及各盲化标识符集合生成与各用户端对应的标识符记录信息,并将与各用户端对应的标识符记录信息分别发送至对应的用户端,以使用户端根据接收到的标识符记录信息进行异构数据解密计算,得到待处理业务的明文处理结果,通过加入只处理盲化处理之后的非敏感密文数据的协作方端来实现样本对齐的方式,克服了传统的样本对齐方式中需要披露多方共有的样本标识符集合的问题,降低了用户信息泄露的风险,提高了异构数据处理的安全性,降低了合规风险,进而提高了用户的使用体验。
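Based on the method of FIG. 2, the embodiments of this specification also provide some specific implementations of the method, described below.
In addition, in another embodiment, after the identifier record information corresponding to each client is generated from the matched identifier set and the blinded identifier sets, the method may further include:
converting the identifier record information corresponding to each client according to a pre-stored hash function to obtain converted identifier record information; and
uploading the converted identifier record information to a preset blockchain.
In this embodiment, to improve data security, the identifier record information may be uploaded to a preset blockchain after it is obtained, which prevents other clients from tampering with it and supports after-the-fact auditing. Similarly, after each client obtains its blinded identifier set, it may convert that set according to a pre-stored hash function and upload the converted blinded identifier set to the preset blockchain. The blockchain may be implemented in any existing manner and is not further limited here.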
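The hash conversion that precedes the on-chain upload can be illustrated with a short sketch. The choice of SHA-256, the JSON serialization, and the names `record_digest` and `batch_digest` are assumptions made for the example; the text only requires "a pre-stored hash function", and the upload itself depends on the blockchain used, so it is not shown.

```python
import hashlib
import json


def record_digest(record):
    """Return a SHA-256 hex digest of one identifier record (idx_w, pos, share)."""
    # A compact, deterministic serialization so equal records hash equally.
    payload = json.dumps(list(record), separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def batch_digest(records):
    """Fold the per-record digests, in order, into a single digest for attestation."""
    h = hashlib.sha256()
    for record in records:
        h.update(bytes.fromhex(record_digest(record)))
    return h.hexdigest()
```

Only the digest would be written to the chain, so an auditor holding the original records can later recompute it and detect tampering without the chain ever storing the records themselves.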
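In addition, in another embodiment, before the blinded identifier sets sent by the clients for the to-be-processed service are obtained, the method may further include:
building a data transmission channel between the coordinator and each client according to a preset channel construction rule.
Obtaining the blinded identifier sets sent by the clients for the to-be-processed service may then specifically include:
obtaining, over the pre-built data transmission channels, the blinded identifier sets sent by the clients for the to-be-processed service.
In this embodiment, each client may first obtain its own private key and public key; the clients then send each other their public keys, and each builds the corresponding data transmission channel based on the public key it receives. For example, suppose there are two clients, A and B. Client A applies to an authoritative certificate authority for a certificate containing public key pk_A for its private key sk_A, and client B applies for a certificate containing public key pk_B for its private key sk_B. Clients A and B send each other their public keys and verify their authenticity against the corresponding certificates. Based on each other's public keys, clients A and B then build an eavesdropping-resistant secure channel, the data transmission channel Channel_AB. When client A sends a message m to client B over Channel_AB, m is first encrypted with B's public key pk_B and then sent to B; on receipt, B decrypts it with its own private key sk_B to recover the plaintext of m, and vice versa. A third party that does not handle sensitive data, coordinator C, may also be introduced, and the above steps repeated to build the secure channels Channel_AC and Channel_BC. Transmitting data over these data transmission channels improves the security of data transmission.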
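FIG. 3 is a schematic flowchart of a heterogeneous data processing method provided by another embodiment of the present application. The method of this embodiment may be executed by the client 102. As shown in FIG. 3, the method of this embodiment may include:
S301: Blind the identifier data corresponding to the to-be-processed service according to a pre-stored blinding rule to obtain a blinded identifier set.
In this embodiment, each client may process the at least one piece of identifier data corresponding to the to-be-processed service according to the pre-stored blinding rule to obtain a blinded identifier set.
Further, blinding the identifier data corresponding to the to-be-processed service according to the pre-stored blinding rule to obtain the blinded identifier set may specifically include:
determining a blinding parameter according to a preset blinding parameter processing rule;
processing the blinding parameter and the identifier data corresponding to the to-be-processed service according to a preset hash function to obtain an initial blinded identifier set;
generating, according to a preset noise generation rule, a preset number of noise identifiers that do not duplicate any element of the initial blinded identifier set; and
obtaining the blinded identifier set from the noise identifiers and the initial blinded identifier set.
Specifically, when determining the blinding parameter according to the preset blinding parameter processing rule, the client may randomly determine a first initial blinding parameter and at the same time receive a second initial blinding parameter sent by another client.
The client then performs an XOR operation on the first initial blinding parameter and the second initial blinding parameter to obtain the blinding parameter. There may be one or more second initial blinding parameters.
In addition, when generating the preset number of non-duplicating noise identifiers according to the preset noise generation rule, the client may first generate, according to a preset noise generation rate, N noise identifiers that do not duplicate any blinded identifier in the existing initial blinded identifier set, then add the N noise identifiers to the initial blinded identifier set and shuffle its order to obtain the blinded identifier set.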
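For example, with two clients A and B, client A independently selects a random number b_A as its blinding seed share and sends it to client B over Channel_AB. Client B independently selects a random number b_B as its blinding seed share and sends it to client A over Channel_AB. Client A computes the blinding parameter b = b_A XOR b_B. For the id field of each of its N_A records, client A then computes id_b = Hash(id, b) in turn, obtaining the initial blinded identifier set id_b_A containing N_A blinded identifiers. For example, id_b = Hash("A0001", b) yields the blinded string 0xAF12C3 corresponding to "A0001", so no plaintext id is exposed during sample alignment.
In addition, based on its own preset noise rate rate_n, client A generates N_A*rate_n noise identifiers that do not duplicate any existing element of id_b_A, adds them to the initial blinded identifier set id_b_A, and shuffles the order. The final blinded identifier set is denoted id_bn_A = {identifier position pos_A_i, identifier data id_bn_A_i}.
For example, with N_A = 4 and rate_n = 50%, N_A*rate_n = 2, so 2 noise identifiers are added to id_b_A, and the resulting blinded identifier set id_bn_A contains 4 + 2 = 6 blinded identifiers.
Client B proceeds in the same way and generates its own blinded identifier set, denoted id_bn_B = {identifier position pos_B_i, identifier data id_bn_B_i}.
For an id shared by clients A and B, the inputs are identical, so the generated blinded id values are identical as well. For example, both clients compute id_b = Hash("A0001", b) and obtain the same blinded string 0xAF12C3 for "A0001", which allows the matched identifier set to be determined in the subsequent blinded comparison of sample ids.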
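The client-side blinding steps above can be sketched as follows. This is a minimal illustration under stated assumptions: Hash(id, b) is instantiated here as HMAC-SHA256 keyed with b, which is one reasonable choice the text does not mandate; the seed shares are equal-length byte strings; and the function names and the `local_index` bookkeeping (kept only on the client) are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_blinding_param(b_own: bytes, b_peer: bytes) -> bytes:
    """b = b_A XOR b_B; both clients obtain the same value."""
    return bytes(x ^ y for x, y in zip(b_own, b_peer))


def blind_id(identifier: str, b: bytes) -> str:
    """Hash(id, b), instantiated as HMAC-SHA256 keyed with b (an assumption)."""
    return hmac.new(b, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def build_blinded_set(ids, b, rate_n):
    """Return (id_bn, local_index).

    id_bn is the list of (pos, blinded_id) sent to the coordinator.
    local_index maps pos to the original id, or None for a noise record,
    and never leaves the client.
    """
    tagged = [(blind_id(i, b), i) for i in ids]
    existing = {h for h, _ in tagged}
    n_noise = int(len(ids) * rate_n)
    while len(tagged) < len(ids) + n_noise:
        candidate = secrets.token_hex(32)  # same length as a real digest
        if candidate not in existing:      # must not duplicate any element
            existing.add(candidate)
            tagged.append((candidate, None))
    secrets.SystemRandom().shuffle(tagged)
    id_bn = [(pos, h) for pos, (h, _) in enumerate(tagged)]
    local_index = {pos: src for pos, (_, src) in enumerate(tagged)}
    return id_bn, local_index


# Usage with N_A = 4 and rate_n = 50%; "A0002" is a hypothetical fourth id.
b_A, b_B = secrets.token_bytes(32), secrets.token_bytes(32)
b = derive_blinding_param(b_A, b_B)
id_bn_A, local_index_A = build_blinded_set(["A0001", "A0002", "A0003", "A0004"], b, 0.5)
```

Because XOR is commutative, client B computing `derive_blinding_param(b_B, b_A)` arrives at the same b, so shared ids blind to the same value on both sides.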
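S302: Send the blinded identifier set to the coordinator, so that the coordinator matches the blinded identifier sets sent by the clients to obtain a matched identifier set, and then generates the identifier record information corresponding to each client from the matched identifier set and the blinded identifier sets.
In this embodiment, after obtaining its blinded identifier set, the client may send it to the coordinator over the preset data transmission channel. The coordinator also receives the blinded identifier sets sent by the other clients and, once it has received the sets from all clients, processes them to obtain several pieces of identifier record information for each client. The specific processing has been described in detail in the foregoing embodiment and is not repeated here.
S303: Receive the several pieces of identifier record information sent by the coordinator, and perform the heterogeneous data decryption computation based on them to obtain the plaintext processing result of the to-be-processed service.
In this embodiment, after obtaining the identifier record information corresponding to each client, the coordinator may return it to the corresponding client. After receiving the returned identifier record information, the client may perform the heterogeneous data decryption computation based on it to obtain the plaintext processing result of the to-be-processed service.
Further, where the identifier record information contains an identifier position, performing the heterogeneous data decryption computation based on the several pieces of identifier record information to obtain the plaintext processing result of the to-be-processed service may specifically include:
for each piece of identifier record information, determining from the identifier position whether the identifier record information is a noise identifier;
if the identifier record information is not a noise identifier, determining the first initial data corresponding to the identifier position from the identifier position and the identifier data corresponding to the to-be-processed service;
jointly processing the first initial data and the second initial data sent by the other client according to a preset joint computation rule to obtain the intermediate result share corresponding to the identifier record information; and
aggregating the intermediate result shares corresponding to each piece of identifier record information to obtain an intermediate result, and decrypting the intermediate result according to a preset secure multi-party computation protocol to obtain the plaintext processing result of the to-be-processed service.
Specifically, each piece of identifier record information contains an identifier position, pos_A_i or pos_B_j, from which the client can determine whether the record refers to data in its original data set. There are multiple ways to make this determination; only one specific implementation is given here, and other ways also fall within the protection scope of the present application.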
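Furthermore, where the identifier record information contains a sub secret share generated from the first secret share data or the second secret share data, jointly processing the first initial data and the second initial data sent by the other client according to the preset joint computation rule to obtain the intermediate result share corresponding to the identifier record information may specifically include:
splitting the first initial data according to a preset splitting rule to obtain several pieces of first initial sub-data;
sending a preset number of the pieces of first initial sub-data to the other client, the remainder being the first remaining initial sub-data, and at the same time receiving a preset number of pieces of second outgoing initial sub-data sent by the other client, where the second outgoing initial sub-data is obtained by the other client splitting its second sub-data according to the preset splitting rule; and
jointly computing, according to a preset secret sharing protocol, on the first remaining initial sub-data, the second outgoing initial sub-data and the first sub secret share together with the second remaining initial sub-data, the first outgoing initial sub-data and the second sub secret share held by the other client, to obtain the intermediate result share corresponding to the identifier record information, where the first sub secret share is the sub secret share contained in the identifier record information of the local client, and the second sub secret share is the sub secret share contained in the identifier record information of the other client.
Specifically, the data that remains at the other client after it sends the second outgoing initial sub-data is referred to as the second remaining initial sub-data.
In addition, the method may further include:
if the identifier record information is a noise identifier, setting the data corresponding to the identifier position to zero.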
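For example, taking clients A and B, client A locally reads the record referred to by pos_A_i in (idx_w, pos_A_i, selector_w[0]) and determines the corresponding income field in its original data set, specifically as follows:
If pos_A_i exceeds the maximum record count of client A's original data set, the record is a noise record added by coordinator C, and income is set to 0.
If pos_A_i points to a noise record that client A itself added earlier, income is also set to 0.
Otherwise, client A reads the real income field that pos_A_i corresponds to in the original data set.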
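The three cases above can be sketched as a single lookup on the client. This is an illustrative sketch: `local_index` (a client-side map from position to original id, with None marking the client's own noise records) and the `table` layout are assumptions, as is the "A0001" row used in the usage lines.

```python
def resolve_value(pos, local_index, table, field):
    """Return the real field value at pos, or 0 for any kind of noise record.

    A position missing from local_index lies beyond the client's own
    blinded set, so it was padded in by the coordinator. A position mapped
    to None is noise the client added itself.
    """
    source_id = local_index.get(pos)
    if source_id is None:
        return 0
    return table[source_id][field]


# Hypothetical client-side state for illustration.
table_A = {"A0001": {"income": 100}}
local_index_A = {0: "A0001", 1: None}
income_real = resolve_value(0, local_index_A, table_A, "income")
income_own_noise = resolve_value(1, local_index_A, table_A, "income")
income_coord_noise = resolve_value(7, local_index_A, table_A, "income")
```

Client B would apply the same lookup to pos_B_j with the tax_rate field.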
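In addition, based on the chosen secure multi-party computation protocol (the classic secret sharing protocol may be used), client A generates a new random number r_w, splits the income value v1_w into a pair of random shares (v1_w[0], v1_w[1]) = (r_w, v1_w - r_w), and sends v1_w[0] (the first outgoing initial sub-data) to client B.
Client B locally reads the record referred to by pos_B_j in (idx_w, pos_B_j, selector_w[1]) and determines the corresponding tax_rate field in its original data set, specifically as follows:
If pos_B_j exceeds the maximum record count of client B's original data set, the record is a noise record added by coordinator C, and tax_rate is set to 0.
If pos_B_j points to a noise record that client B itself added earlier, tax_rate is also set to 0.
Otherwise, client B reads the real tax_rate field that pos_B_j points to.
In addition, client B splits the tax_rate field in the same way: it splits the tax_rate value v2_w into a pair of random shares (v2_w[0], v2_w[1]) = (r_w, v2_w - r_w), where r_w is a random number, and sends v2_w[0] (the second outgoing initial sub-data) to client A.
Based on an existing secure multi-party computation protocol (such as the classic secret sharing protocol), clients A and B jointly compute the intermediate result share for v1_w*v2_w*selector_w without revealing the plaintext values of v1_w, v2_w or selector_w. Client A contributes the shares it holds, v1_w[1] (the first remaining initial sub-data), v2_w[0] (the second outgoing initial sub-data) and selector_w[0] (the first sub secret share); client B contributes the shares it holds, v1_w[0], v2_w[1] and selector_w[1] (the second sub secret share).
The value of selector_w is 1 only when the w-th sample contains correctly aligned data, in which case the computation yields a ciphertext intermediate result derived from real data, such as the data for A0001, A0003 and A0004 in the example. Otherwise selector_w is 0 and the ciphertext intermediate result is also 0, since 0 multiplied by any noise value is 0, which eliminates the influence of noise identifiers on the final result.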
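The joint computation of v1_w*v2_w*selector_w can be illustrated with additive secret sharing and Beaver multiplication triples. This is a sketch under explicit assumptions, not the protocol the application specifies: the text only names "the classic secret sharing protocol", whereas this example works modulo an assumed prime `P`, obtains triples from a simulated trusted dealer, runs both parties in one process, and expresses tax rates as integer percentages so that all values are integers. The fourth row (999, 50, 0) is a hypothetical non-matching pair.

```python
import secrets

P = 2**61 - 1  # assumed prime modulus


def share(value):
    """Split value into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return (r, (value - r) % P)


def reconstruct(shares):
    """Recombine additive shares."""
    return sum(shares) % P


def beaver_multiply(x_sh, y_sh):
    """Return additive shares of x*y, given additive shares of x and y.

    The triple (a, b, a*b) comes from a simulated trusted dealer, which is
    an assumption of this sketch. Only the masked values d and e are opened.
    """
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % P)
    d = reconstruct([(x_sh[i] - a_sh[i]) % P for i in range(2)])  # x - a
    e = reconstruct([(y_sh[i] - b_sh[i]) % P for i in range(2)])  # y - b
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % P
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % P
    return (z0, z1)


def intermediate_share(v1_w, v2_w, selector_sh):
    """Shares of v1_w * v2_w * selector_w for one aligned sample."""
    product_sh = beaver_multiply(share(v1_w), share(v2_w))
    return beaver_multiply(product_sh, selector_sh)


# (income, tax_rate in percent, selector bit); the last row is a non-match.
rows = [(100, 10, 1), (200, 35, 1), (150, 20, 1), (999, 50, 0)]
total_sh = [0, 0]
for v1_w, v2_w, bit in rows:
    z = intermediate_share(v1_w, v2_w, share(bit))
    total_sh = [(total_sh[i] + z[i]) % P for i in range(2)]
total = reconstruct(total_sh)  # 11000, i.e. 110 after undoing the percent scaling
```

Each party only adds up its own shares locally; a single reconstruction at the end reveals the aggregate and nothing about individual rows, and the row with selector 0 contributes nothing to the sum.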
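The above process is repeated until the entire sample alignment set has been traversed. After the intermediate result shares corresponding to every piece of identifier record information are aggregated, the final decryption is performed based on the secure multi-party computation protocol (such as the classic secret sharing protocol), yielding the final plaintext processing result of the to-be-processed service.
Based on the foregoing embodiment, Table 3 is the logical summary table after sample alignment, containing the identifier information shared by clients A and B and the corresponding data.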
Table 3: Logical summary table after sample alignment
id | income | tax_rate |
---|---|---|
A0001 | 100 | 0.10 |
A0003 | 200 | 0.35 |
A0004 | 150 | 0.20 |
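That is, the plaintext processing result, the total tax payable, is SUM(income*tax_rate) = 100*0.10 + 200*0.35 + 150*0.20 = 10 + 70 + 30 = 110.
Optionally, clients A and B may each compute an aggregate hash of their local share data and send the result to the blockchain for attestation.
In addition, in another embodiment, after the plaintext processing result of the to-be-processed service is obtained, the method may further include:
converting the plaintext processing result of the to-be-processed service according to a pre-stored hash function to obtain a converted plaintext processing result; and
uploading the converted plaintext processing result to a preset blockchain.
By drawing on the tamper resistance of the blockchain, the key cryptographic inputs and the final plaintext processing result are attested on chain, which improves data security and supports after-the-fact auditing.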
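Based on the same idea, the embodiments of this specification also provide an apparatus corresponding to the above method. FIG. 4 is a schematic structural diagram of a heterogeneous data processing apparatus provided by an embodiment of the present application, applied to the coordinator. As shown in FIG. 4, the apparatus provided by this embodiment may include:
an obtaining module 401, configured to obtain the blinded identifier sets sent by the clients for the to-be-processed service; and
a processing module 402, configured to match the blinded identifier sets according to a preset matching rule to obtain a matched identifier set.
In this embodiment, the processing module 402 is further configured to:
compare the blinded identifier sets corresponding to different clients and determine their intersection to obtain the matched identifier set.
The processing module 402 is further configured to generate the identifier record information corresponding to each client from the matched identifier set and the blinded identifier sets, and to send the identifier record information corresponding to each client to the corresponding client, so that the client performs the heterogeneous data decryption computation on the received identifier record information to obtain the plaintext processing result of the to-be-processed service.
In this embodiment, the processing module 402 is further configured to:
pad the private identifier sets corresponding to the clients according to a preset padding rule to obtain the padded private identifier sets corresponding to the clients;
mix the matched identifier set and the padded private identifier sets corresponding to the clients to obtain an aligned sample set;
for each aligned sample in the aligned sample set, if the aligned sample belongs to the matched identifier set, generate the identifier record information corresponding to each client according to a pre-stored first generation rule; and
if the aligned sample belongs to the private identifier sets corresponding to the clients, generate the identifier record information corresponding to each client according to a pre-stored second generation rule.
Further, the processing module 402 is further configured to:
if the aligned sample belongs to the matched identifier set, randomly generate a pair of first secret share data of the value 1; and
generate the identifier record information corresponding to each client from the aligned sample and the first secret share data.
In addition, the processing module 402 is further configured to:
if the aligned sample belongs to the private identifier sets corresponding to the clients, randomly generate a pair of second secret share data of the value 0; and
generate the identifier record information corresponding to each client from the aligned sample and the second secret share data.
In addition, the processing module is further configured to:
determine, among the private identifier sets corresponding to the clients, the target private identifier set containing the largest number of private identifiers; and
pad the private identifier sets other than the target private identifier set according to the number of private identifiers in the target private identifier set, to obtain the padded private identifier sets corresponding to the clients.
In addition, in another embodiment, the processing module is further configured to:
convert the identifier record information corresponding to each client according to a pre-stored hash function to obtain converted identifier record information; and
upload the converted identifier record information to a preset blockchain.
In addition, in another embodiment, the processing module is further configured to:
build a data transmission channel between the coordinator and each client according to a preset channel construction rule; and
obtain, over the pre-built data transmission channels, the blinded identifier sets sent by the clients for the to-be-processed service.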
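In another embodiment, the present application further provides another heterogeneous data processing apparatus, applied to the client. The apparatus may include:
a processing module, configured to blind the identifier data corresponding to the to-be-processed service according to a pre-stored blinding rule to obtain a blinded identifier set.
In this embodiment, the processing module is further configured to:
determine a blinding parameter according to a preset blinding parameter processing rule;
process the blinding parameter and the identifier data corresponding to the to-be-processed service according to a preset hash function to obtain an initial blinded identifier set;
generate, according to a preset noise generation rule, a preset number of noise identifiers that do not duplicate any element of the initial blinded identifier set; and
obtain the blinded identifier set from the noise identifiers and the initial blinded identifier set.
Further, the processing module is further configured to:
randomly determine a first initial blinding parameter and at the same time receive a second initial blinding parameter sent by another client; and
perform an XOR operation on the first initial blinding parameter and the second initial blinding parameter to obtain the blinding parameter.
The apparatus may further include a sending module, configured to send the blinded identifier set to the coordinator, so that the coordinator matches the blinded identifier sets sent by the clients to obtain a matched identifier set, and then generates the identifier record information corresponding to each client from the matched identifier set and the blinded identifier sets.
The processing module is further configured to receive the several pieces of identifier record information sent by the coordinator and to perform the heterogeneous data decryption computation based on them to obtain the plaintext processing result of the to-be-processed service.
In this embodiment, the identifier record information contains an identifier position, and the processing module is further configured to:
for each piece of identifier record information, determine from the identifier position whether the identifier record information is a noise identifier;
if the identifier record information is not a noise identifier, determine the first initial data corresponding to the identifier position from the identifier position and the identifier data corresponding to the to-be-processed service;
jointly process the first initial data and the second initial data sent by the other client according to a preset joint computation rule to obtain the intermediate result share corresponding to the identifier record information; and
aggregate the intermediate result shares corresponding to each piece of identifier record information to obtain an intermediate result, and decrypt the intermediate result according to a preset secure multi-party computation protocol to obtain the plaintext processing result of the to-be-processed service.
Further, the identifier record information contains a sub secret share generated from the first secret share data or the second secret share data, and the processing module is further configured to:
split the first initial data according to a preset splitting rule to obtain several pieces of first initial sub-data;
send a preset number of the pieces of first initial sub-data to the other client, the remainder being the first remaining initial sub-data, and at the same time receive a preset number of pieces of second outgoing initial sub-data sent by the other client, where the second outgoing initial sub-data is obtained by the other client splitting its second sub-data according to the preset splitting rule; and
jointly compute, according to a preset secret sharing protocol, on the first remaining initial sub-data, the second outgoing initial sub-data and the first sub secret share together with the second remaining initial sub-data, the first outgoing initial sub-data and the second sub secret share held by the other client, to obtain the intermediate result share corresponding to the identifier record information, where the first sub secret share is the sub secret share contained in the identifier record information of the local client, and the second sub secret share is the sub secret share contained in the identifier record information of the other client.
In addition, the processing module is further configured to:
if the identifier record information is a noise identifier, set the data corresponding to the identifier position to zero.
In addition, in another embodiment, the processing module is further configured to:
convert the plaintext processing result of the to-be-processed service according to a pre-stored hash function to obtain a converted plaintext processing result; and
upload the converted plaintext processing result to a preset blockchain.
The apparatus provided by the embodiments of the present application can implement the method of the embodiment shown in FIG. 2 above; its implementation principle and technical effect are similar and are not repeated here.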
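FIG. 5 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application. As shown in FIG. 5, the device 500 provided by this embodiment includes a processor 501 and a memory 502 communicatively connected to the processor. The processor 501 and the memory 502 are connected by a bus 503.
In a specific implementation, the processor 501 executes the computer-executable instructions stored in the memory 502, causing the processor 501 to perform the heterogeneous data processing method of the above method embodiments.
For the specific implementation of the processor 501, refer to the above method embodiments; the implementation principle and technical effect are similar and are not repeated here.
In the embodiment shown in FIG. 5 above, it should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory may include high-speed RAM and may also include non-volatile memory (NVM), for example at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, the bus in the drawings of the present application is not limited to a single bus or a single type of bus.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the heterogeneous data processing method of the above method embodiments.
An embodiment of the present application further provides a computer program product comprising a computer program which, when executed by a processor, implements the heterogeneous data processing method described above.
The above computer-readable storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The readable storage medium may be any available medium accessible to a general-purpose or special-purpose computer.
An exemplary readable storage medium is coupled to the processor so that the processor can read information from, and write information to, the readable storage medium. The readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application specific integrated circuit (ASIC). The processor and the readable storage medium may also exist in the device as discrete components.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.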
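Claims (20)
- A heterogeneous data processing method, characterized in that it is applied to a coordinator and comprises: obtaining the blinded identifier sets sent by the clients for a to-be-processed service; matching the blinded identifier sets according to a preset matching rule to obtain a matched identifier set; and generating, from the matched identifier set and the blinded identifier sets, the identifier record information corresponding to each client, and sending the identifier record information corresponding to each client to the corresponding client, so that the client performs a heterogeneous data decryption computation on the received identifier record information to obtain the plaintext processing result of the to-be-processed service.
- The method according to claim 1, characterized in that matching the blinded identifier sets according to the preset matching rule to obtain the matched identifier set comprises: comparing the blinded identifier sets corresponding to different clients and determining their intersection to obtain the matched identifier set.
- The method according to claim 2, characterized in that generating, from the matched identifier set and the blinded identifier sets, the identifier record information corresponding to each client comprises: deleting from each blinded identifier set the blinded identifiers corresponding to the matched identifier set to obtain the private identifier set corresponding to each client; padding the private identifier sets corresponding to the clients according to a preset padding rule to obtain the padded private identifier sets corresponding to the clients; mixing the matched identifier set and the padded private identifier sets corresponding to the clients to obtain an aligned sample set; for each aligned sample in the aligned sample set, if the aligned sample belongs to the matched identifier set, generating the identifier record information corresponding to each client according to a pre-stored first generation rule; and if the aligned sample belongs to the private identifier sets corresponding to the clients, generating the identifier record information corresponding to each client according to a pre-stored second generation rule.
- The method according to claim 3, characterized in that, if the aligned sample belongs to the matched identifier set, generating the identifier record information corresponding to each client according to the pre-stored first generation rule comprises: if the aligned sample belongs to the matched identifier set, randomly generating a pair of first secret share data of the value 1; and generating the identifier record information corresponding to each client from the aligned sample and the first secret share data.
- The method according to claim 3 or 4, characterized in that, if the aligned sample belongs to the private identifier sets corresponding to the clients, generating the identifier record information corresponding to each client according to the pre-stored second generation rule comprises: if the aligned sample belongs to the private identifier sets corresponding to the clients, randomly generating a pair of second secret share data of the value 0; and generating the identifier record information corresponding to each client from the aligned sample and the second secret share data.
- The method according to any one of claims 3 to 5, characterized in that padding the private identifier sets corresponding to the clients according to the preset padding rule to obtain the padded private identifier sets corresponding to the clients comprises: determining, among the private identifier sets corresponding to the clients, the target private identifier set containing the largest number of private identifiers; and padding the private identifier sets other than the target private identifier set according to the number of private identifiers in the target private identifier set, to obtain the padded private identifier sets corresponding to the clients.
- The method according to any one of claims 1 to 6, characterized in that, after generating, from the matched identifier set and the blinded identifier sets, the identifier record information corresponding to each client, the method further comprises: converting the identifier record information corresponding to each client according to a pre-stored hash function to obtain converted identifier record information; and uploading the converted identifier record information to a preset blockchain.
- The method according to any one of claims 1 to 7, characterized in that, before obtaining the blinded identifier sets sent by the clients for the to-be-processed service, the method further comprises: building a data transmission channel between the coordinator and each client according to a preset channel construction rule; and obtaining the blinded identifier sets sent by the clients for the to-be-processed service comprises: obtaining, over the pre-built data transmission channels, the blinded identifier sets sent by the clients for the to-be-processed service.
- A heterogeneous data processing method, characterized in that it is applied to a client and comprises: blinding the identifier data corresponding to a to-be-processed service according to a pre-stored blinding rule to obtain a blinded identifier set; sending the blinded identifier set to a coordinator, so that the coordinator matches the blinded identifier sets sent by the clients to obtain a matched identifier set, and then generates the identifier record information corresponding to each client from the matched identifier set and the blinded identifier sets; and receiving the several pieces of identifier record information sent by the coordinator, and performing a heterogeneous data decryption computation based on the several pieces of identifier record information to obtain the plaintext processing result of the to-be-processed service.
- The method according to claim 9, characterized in that blinding the identifier data corresponding to the to-be-processed service according to the pre-stored blinding rule to obtain the blinded identifier set comprises: determining a blinding parameter according to a preset blinding parameter processing rule; processing the blinding parameter and the identifier data corresponding to the to-be-processed service according to a preset hash function to obtain an initial blinded identifier set; generating, according to a preset noise generation rule, a preset number of noise identifiers that do not duplicate any element of the initial blinded identifier set; and obtaining the blinded identifier set from the noise identifiers and the initial blinded identifier set.
- The method according to claim 10, characterized in that determining the blinding parameter according to the preset blinding parameter processing rule comprises: randomly determining a first initial blinding parameter and at the same time receiving a second initial blinding parameter sent by another client; and performing an XOR operation on the first initial blinding parameter and the second initial blinding parameter to obtain the blinding parameter.
- The method according to any one of claims 9 to 11, characterized in that the identifier record information contains an identifier position, and performing the heterogeneous data decryption computation based on the several pieces of identifier record information to obtain the plaintext processing result of the to-be-processed service comprises: for each piece of identifier record information, determining from the identifier position whether the identifier record information is a noise identifier; if the identifier record information is not a noise identifier, determining the first initial data corresponding to the identifier position from the identifier position and the identifier data corresponding to the to-be-processed service; jointly processing the first initial data and the second initial data sent by the other client according to a preset joint computation rule to obtain the intermediate result share corresponding to the identifier record information; and aggregating the intermediate result shares corresponding to each piece of identifier record information to obtain an intermediate result, and decrypting the intermediate result according to a preset secure multi-party computation protocol to obtain the plaintext processing result of the to-be-processed service.
- The method according to claim 12, characterized in that the identifier record information contains a sub secret share generated from the first secret share data or the second secret share data, and jointly processing the first initial data and the second initial data sent by the other client according to the preset joint computation rule to obtain the intermediate result share corresponding to the identifier record information comprises: splitting the first initial data according to a preset splitting rule to obtain several pieces of first initial sub-data; sending a preset number of the pieces of first initial sub-data to the other client, the remainder being the first remaining initial sub-data, and at the same time receiving a preset number of pieces of second outgoing initial sub-data sent by the other client, wherein the second outgoing initial sub-data is obtained by the other client splitting its second sub-data according to the preset splitting rule; and jointly computing, according to a preset secret sharing protocol, on the first remaining initial sub-data, the second outgoing initial sub-data and the first sub secret share together with the second remaining initial sub-data, the first outgoing initial sub-data and the second sub secret share held by the other client, to obtain the intermediate result share corresponding to the identifier record information, wherein the first sub secret share is the sub secret share contained in the identifier record information of the local client, and the second sub secret share is the sub secret share contained in the identifier record information of the other client.
- The method according to claim 12 or 13, characterized in that it further comprises: if the identifier record information is a noise identifier, setting the data corresponding to the identifier position to zero.
- The method according to any one of claims 9 to 14, characterized in that, after obtaining the plaintext processing result of the to-be-processed service, the method further comprises: converting the plaintext processing result of the to-be-processed service according to a pre-stored hash function to obtain a converted plaintext processing result; and uploading the converted plaintext processing result to a preset blockchain.
- A heterogeneous data processing apparatus, characterized in that it is applied to a coordinator and comprises: an obtaining module, configured to obtain the blinded identifier sets sent by the clients for a to-be-processed service; and a processing module, configured to match the blinded identifier sets according to a preset matching rule to obtain a matched identifier set; wherein the processing module is further configured to generate, from the matched identifier set and the blinded identifier sets, the identifier record information corresponding to each client, and to send the identifier record information corresponding to each client to the corresponding client, so that the client performs a heterogeneous data decryption computation on the received identifier record information to obtain the plaintext processing result of the to-be-processed service.
- A heterogeneous data processing apparatus, characterized in that it is applied to a client and comprises: a processing module, configured to blind the identifier data corresponding to a to-be-processed service according to a pre-stored blinding rule to obtain a blinded identifier set; and a sending module, configured to send the blinded identifier set to a coordinator, so that the coordinator matches the blinded identifier sets sent by the clients to obtain a matched identifier set, and then generates the identifier record information corresponding to each client from the matched identifier set and the blinded identifier sets; wherein the processing module is further configured to receive the several pieces of identifier record information sent by the coordinator, and to perform a heterogeneous data decryption computation based on the several pieces of identifier record information to obtain the plaintext processing result of the to-be-processed service.
- An electronic device, characterized in that it comprises a processor and a memory, wherein the memory is configured to store program code, and the processor is configured to invoke the program code stored in the memory to implement the heterogeneous data processing method according to any one of claims 1 to 8 or 9 to 15.
- A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when run on a computer, implement the heterogeneous data processing method according to any one of claims 1 to 8 or 9 to 15.
- A computer program, characterized in that it comprises program code which, when the computer program is run by a computer, implements the heterogeneous data processing method according to any one of claims 1 to 8 or 9 to 15.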
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111462228.9 | 2021-12-02 | ||
CN202111462228.9A CN114154196A (zh) | 2021-12-02 | 2021-12-02 | 异构数据处理方法、装置及电子设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023098294A1 true WO2023098294A1 (zh) | 2023-06-08 |
Family
ID=80456014
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/124375 WO2023098294A1 (zh) | 2021-12-02 | 2022-10-10 | 异构数据处理方法、装置及电子设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114154196A (zh) |
WO (1) | WO2023098294A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116842561A (zh) * | 2023-06-29 | 2023-10-03 | 上海零数众合信息科技有限公司 | 一种数据集可动态增删的隐私求交系统和方法 |
CN117577248A (zh) * | 2024-01-15 | 2024-02-20 | 浙江大学 | 融合区块链与隐私求交技术的医疗数据共享方法及系统 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114154196A (zh) * | 2021-12-02 | 2022-03-08 | 深圳前海微众银行股份有限公司 | 异构数据处理方法、装置及电子设备 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190349191A1 (en) * | 2018-05-08 | 2019-11-14 | NEC Laboratories Europe GmbH | Dynamic anonymous password-authenticated key exchange (apake) |
CN110572253A (zh) * | 2019-09-16 | 2019-12-13 | 济南大学 | 一种联邦学习训练数据隐私性增强方法及系统 |
US20200401726A1 (en) * | 2017-11-20 | 2020-12-24 | Singapore Telecommunications Limited | System and method for private integration of datasets |
CN114154196A (zh) * | 2021-12-02 | 2022-03-08 | 深圳前海微众银行股份有限公司 | 异构数据处理方法、装置及电子设备 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11494506B2 (en) * | 2018-04-19 | 2022-11-08 | Google Llc | Security measures for determination of private set intersections |
CN110348231B (zh) * | 2019-06-18 | 2020-08-14 | 阿里巴巴集团控股有限公司 | 实现隐私保护的数据同态加解密方法及装置 |
CN113556733B (zh) * | 2020-04-14 | 2023-09-22 | 大唐移动通信设备有限公司 | 签约隐藏标识符生成、解密方法及相关装置 |
CN113282934B (zh) * | 2021-05-07 | 2022-05-03 | 深圳大学 | 数据处理方法及装置 |
CN113434888B (zh) * | 2021-07-06 | 2022-08-26 | 建信金融科技有限责任公司 | 数据共享方法、装置、设备及系统 |
-
2021
- 2021-12-02 CN CN202111462228.9A patent/CN114154196A/zh active Pending
-
2022
- 2022-10-10 WO PCT/CN2022/124375 patent/WO2023098294A1/zh unknown
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116842561A (zh) * | 2023-06-29 | 2023-10-03 | 上海零数众合信息科技有限公司 | 一种数据集可动态增删的隐私求交系统和方法 |
CN116842561B (zh) * | 2023-06-29 | 2024-05-24 | 上海零数众合信息科技有限公司 | 一种数据集可动态增删的隐私求交系统和方法 |
CN117577248A (zh) * | 2024-01-15 | 2024-02-20 | 浙江大学 | 融合区块链与隐私求交技术的医疗数据共享方法及系统 |
CN117577248B (zh) * | 2024-01-15 | 2024-04-05 | 浙江大学 | 融合区块链与隐私求交技术的医疗数据共享方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
CN114154196A (zh) | 2022-03-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22900101 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |