WO2023124945A1 - Multi-party data processing method, system, electronic device and storage medium - Google Patents

Multi-party data processing method, system, electronic device and storage medium

Info

Publication number
WO2023124945A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
calculation
computing
initiator
sub
Application number
PCT/CN2022/138422
Other languages
English (en)
French (fr)
Inventor
李伟
邱炜伟
刘敬
汪小益
蔡亮
Original Assignee
杭州趣链科技有限公司 (Hangzhou Qulian Technology Co., Ltd.)
Application filed by 杭州趣链科技有限公司
Publication of WO2023124945A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials

Definitions

  • The present application relates to the field of data processing, and in particular to a multi-party data processing method, system, electronic device and storage medium.
  • Secure multi-party computation solves the problem of how to compute an agreed function safely in the absence of a trusted third party.
  • Each participant can complete the computation of an agreed function that requires the raw data of multiple participants without disclosing its own raw data to the other parties or to a third party.
  • Existing computing strategies load the data directly into memory for calculation. As the amount of data grows, the only remedy is to add more memory; once the data reaches the terabyte (TB) or even petabyte (PB) scale, adding memory no longer solves the problem, and because computing power is limited, calculation becomes very slow when the data volume is too large.
  • SPDZ (MP-SPDZ: A Versatile Framework for Multi-Party Computation)
  • ABY3 (ABY3: A Mixed Protocol Framework for Machine Learning)
  • A multi-party data processing method, system, electronic device and storage medium are provided.
  • In a first aspect, the present application provides a multi-party data processing method, including:
  • the initiator and the participant each obtain their own data set and calculation operator;
  • the data set is divided according to the calculation operator to obtain sub-data sets, the sub-data sets are distributed to the party's own computing slave nodes, and each computing slave node executes the corresponding calculation logic according to the calculation operator;
  • each initiator computing slave node obtains a calculation result from the corresponding calculation data provided by each participant computing slave node, and sends the calculation result to the initiator computing master node;
  • the initiator computing master node aggregates the calculation results according to the calculation operator to obtain aggregated data.
  • The method further includes:
  • performing data segmentation on the data set according to the data cutting function to obtain the sub-data sets and labeling them, assigning the sub-data sets to the party's own computing slave nodes, and executing the corresponding calculation logic between the initiator computing slave node and the participant computing slave node that hold the same sub-data set label;
  • the initiator computing slave node obtains the calculation result from the corresponding calculation data provided by each participant computing slave node, and sends the calculation result to the initiator computing master node.
  • Obtaining, by the initiator computing slave node, the calculation result from the corresponding calculation data provided by each participant computing slave node includes:
  • the participant computing slave node sends the calculation data and the corresponding sub-data set label to the participant computing master node;
  • the participant computing master node sends the calculation data and the corresponding sub-data set label to the initiator computing master node, and the initiator computing master node forwards the calculation data to the corresponding initiator computing slave node, which obtains the calculation result from the corresponding calculation data provided by each participant computing slave node.
  • Performing data segmentation on the data set according to the data cutting function in the calculation operator to obtain the sub-data sets and labeling the sub-data sets includes:
  • if the data set is a vector data set, it is cut into segments, and the sub-data set label includes the corresponding segment sequence number;
  • if the data set is a set-type (collection) data set, each element in the data set is read, its bucket number is calculated by taking the hash value modulo the number of buckets, and the element is written into the corresponding sub-data set file; the sub-data set label includes the bucket number.
  • Obtaining the respective calculation operators by the initiator and the participant includes:
  • the initiator obtains a calculation operator, parses the configuration file in the calculation operator, reads the algorithm type and version number in it, and overwrites any existing calculation operator of the same algorithm type;
  • the initiator distributes the calculation operator to each participant.
  • The calculation operator includes a data cutting function, a data aggregation function, calculation logic, and a message type for the calculation data, each of which is set independently.
  • The present application further provides a multi-party data processing system including an initiator and a participant. The initiator includes an initiator computing master node and initiator computing slave nodes, and the participant includes a participant computing master node and participant computing slave nodes, wherein:
  • the initiator and the participant each obtain their own data set and calculation operator, divide the data set into sub-data sets according to the calculation operator, and distribute the sub-data sets to their own computing slave nodes; each computing slave node executes the corresponding calculation logic according to the calculation operator;
  • each initiator computing slave node obtains the calculation result from the corresponding calculation data provided by each participant computing slave node and sends it to the initiator computing master node, and the initiator computing master node aggregates the calculation results according to the calculation operator to obtain aggregated data.
  • The computing nodes include computing master nodes and computing slave nodes; each computing node includes a scheduler, and the scheduler includes a virtual machine and a network component:
  • the virtual machine executes the multi-party data processing method described in any one of the above;
  • the network component is used for data communication between the computing nodes.
  • The present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the computer program, the multi-party data processing method described in the first aspect is implemented.
  • The present application further provides a storage medium storing a computer program which, when executed by a processor, implements the multi-party data processing method described in the first aspect.
  • Fig. 1 is a flowchart of a multi-party data processing method according to an embodiment of the present application.
  • Fig. 2 is a flowchart of a multi-party data processing method according to another embodiment of the present application.
  • Fig. 3 is a flowchart of a multi-party data interaction method according to another embodiment of the present application.
  • Fig. 4 is a flowchart of a multi-party data interaction method according to yet another embodiment of the present application.
  • Fig. 5 is a schematic diagram of a multi-party data processing method according to an optional embodiment of the present application.
  • Fig. 6 is a structural block diagram of a multi-party data processing system according to an embodiment of the present application.
  • Fig. 7 is a structural block diagram of a computing node according to an embodiment of the present application.
  • FIG. 1 is a flowchart of a multi-party data processing method according to an embodiment of the present application. As shown in FIG. 1, the process includes the following steps:
  • Step S101: the initiator and the participant each obtain their own data set and calculation operator.
  • The initiator and each participant of the multi-party computation obtain the calculation operator sent to them; the calculation operator contains the cutting function for each party's data set and the calculation logic.
  • The initiator and each participant also obtain the data set uploaded by their own party for calculation.
  • Step S102: divide the data set according to the calculation operator to obtain sub-data sets, distribute the sub-data sets to the party's own computing slave nodes, and have each computing slave node execute the corresponding calculation logic according to the calculation operator.
  • After receiving the uploaded data sets, the initiator and each participant split their respective data sets into sub-data sets according to the cutting function in their calculation operators.
  • The initiator and each participant send the sub-data sets to the computing slave nodes owned by that party.
  • The number of computing slave nodes owned by the initiator and by each participant may be the same or different.
  • The initiator's slave nodes may take part in the data calculation; alternatively, in some embodiments, if the initiator has no data to upload, it may act only as a manager that schedules and summarizes the data calculation of the participants.
  • After the computing slave nodes of all parties obtain their sub-data sets, they perform the corresponding calculations on them according to the calculation logic in the calculation operator. Note that these calculations include not only simple arithmetic but also other kinds of data processing, such as filtering, comparing, or set operations on the data.
  • Step S103: the initiator computing slave nodes obtain calculation results from the corresponding calculation data provided by the participant computing slave nodes, and send the calculation results to the initiator computing master node.
  • The participant computing slave nodes send their calculation data to the corresponding initiator computing slave nodes. Each initiator computing slave node collects the corresponding calculation data from the participants and then computes on it together with the data held in that initiator computing slave node to obtain the calculation result.
  • Each initiator computing slave node sends its calculation result to the initiator computing master node.
  • Step S104: the initiator computing master node aggregates the calculation results according to the calculation operator to obtain aggregated data.
  • The initiator computing master node calls the aggregation function in the calculation operator to aggregate the calculation results from all computing slave nodes into the final aggregated data, which is also the final result of the multi-party computation.
  • The calculation operator includes an independently set data cutting function, data aggregation function, calculation logic, and message type for the calculation data.
  • This provides a highly versatile multi-party data processing method for multi-party privacy computing.
  • Because the cutting function, calculation logic, aggregation function, and message type of the calculation data are set independently in the calculation operator, data segmentation, efficient computation, and aggregation are achieved even when the data volume is huge.
  • Algorithm developers do not need to write data cutting logic, aggregation logic, or inter-node data transmission; they only need to write or modify the calculation logic.
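To make steps S101 to S104 concrete, below is a minimal single-process sketch in Python. It is an illustration under strong assumptions, not the applicant's implementation. `CalculationOperator`, `segment_cut`, `spread` and `run` are hypothetical names. There is only one participant, "slave nodes" are plain loops, and the calculation logic works on plaintext; a real deployment would instead exchange protocol messages, such as secret shares, between nodes. The operator bundles the four independently set parts named in the text, so the orchestration code never has to change when the calculation logic does.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class CalculationOperator:
    # Hypothetical shape of an operator: the four independently set parts.
    cut: Callable[[List[Any]], Dict[int, List[Any]]]        # data cutting function -> {label: sub-data set}
    compute: Callable[[List[Any], List[Any]], List[Any]]    # calculation logic for one label
    aggregate: Callable[[Dict[int, List[Any]]], List[Any]]  # data aggregation function
    message_type: Callable[[Any], Any] = list               # stand-in: wraps calculation data before it is "sent"

def segment_cut(seg_len: int):
    # Vector cutting: the label is the segment sequence number (0, 1, 2, ...).
    def cut(vec):
        return {i // seg_len: vec[i:i + seg_len] for i in range(0, len(vec), seg_len)}
    return cut

def spread(labels, n_slaves):
    # Round-robin assignment of sub-data set labels to slave nodes.
    nodes = [[] for _ in range(n_slaves)]
    for i, label in enumerate(sorted(labels)):
        nodes[i % n_slaves].append(label)
    return nodes

def run(op, initiator_data, participant_data, n_initiator_slaves=2):
    mine = op.cut(initiator_data)          # S102: each party cuts its own data set
    theirs = op.cut(participant_data)
    sub_results = {}                       # what the initiator master node receives
    for node_labels in spread(mine.keys(), n_initiator_slaves):
        for label in node_labels:          # S103: pair sub-data sets with the same label
            data = op.message_type(theirs.get(label, []))
            sub_results[label] = op.compute(mine[label], data)
    return op.aggregate(sub_results)       # S104: master node aggregates

vector_add = CalculationOperator(
    cut=segment_cut(3),
    compute=lambda a, b: [x + y for x, y in zip(a, b)],
    aggregate=lambda parts: [v for label in sorted(parts) for v in parts[label]],
)
print(run(vector_add, [1, 2, 3, 4, 5, 6, 7], [10, 20, 30, 40, 50, 60, 70]))
# → [11, 22, 33, 44, 55, 66, 77]
```

Swapping in a different operator, for example one with a hash-bucket `cut` and a set-union `aggregate`, reuses `run` unchanged. That reuse is the point the text makes about developers only writing calculation logic.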
  • FIG. 2 is a flowchart of a multi-party data processing method according to another embodiment of the present application. As shown in FIG. 2, the method further includes the following steps:
  • Step S201: perform data segmentation on the data set to obtain sub-data sets and label them, assign the sub-data sets to the party's own computing slave nodes, and execute the corresponding calculation logic between the initiator computing slave node and the participant computing slave node that hold the same sub-data set label.
  • The initiator and all participants split their own data sets into multiple sub-data sets according to the operator's data cutting function, and return the id of each sub-data set, that is, the sub-data set label.
  • Two optional data cutting methods are provided in the data cutting function.
  • If the data set is a vector data set, it is cut into segments, and the sub-data set labels include the corresponding segment numbers.
  • A vector operation computes on the data that carry the same sub-data set label across the parties, for example vector addition or multiplication.
  • If the data set is a set-type (collection) data set, each element in the data set is read, its bucket number is calculated by taking the hash value modulo the number of buckets, and the element is written into the corresponding sub-data set file; the sub-data set label includes the bucket number.
  • A hash function maps a target key, through some mapping method or function operation f, to a target hash value; the function f is called the hash function.
  • The hash bucket algorithm addresses hash collisions, that is, cases in which different target keys yield the same hash value after mapping.
  • The hash bucket algorithm is essentially the separate chaining (chained address) method of resolving collisions.
  • For example, setting the number of buckets to 5 means that f(key) modulo 5 can take 5 values, and this value can be used as the bucket index.
  • If f(key) gives 1, 2, 3, 4 and 5 for five keys, taking the results modulo 5 gives 1, 2, 3, 4 and 0, so these keys are placed at the head addresses of buckets 1, 2, 3, 4 and 0 respectively.
  • A hash bucket operation is performed on each element of the data set, the element is written into the sub-data set file for its calculated bucket number, and the bucket number is used as the label of that sub-data set.
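The hash-bucket cut can be sketched in Python as follows. Several assumptions apply. Elements are strings. SHA-256 truncated to 64 bits stands in for the unspecified hash function. Buckets are kept in memory rather than written to sub-data set files. `sha256_key` and `hash_bucket_cut` are hypothetical names. The one essential requirement is that every party uses the same hash function and bucket count, so that equal elements get the same label everywhere.

```python
import hashlib
from typing import Any, Callable, Dict, Iterable, List

def sha256_key(element: str) -> int:
    # Deterministic on every machine; Python's built-in hash() on str is salted
    # per process, so two parties using it would disagree on bucket numbers.
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def hash_bucket_cut(elements: Iterable[Any], n_buckets: int,
                    f: Callable[[Any], int] = sha256_key) -> Dict[int, List[Any]]:
    # Each bucket is a chain: colliding elements are appended to the same list.
    buckets: Dict[int, List[Any]] = {}
    for e in elements:
        label = f(e) % n_buckets            # bucket number = sub-data set label
        buckets.setdefault(label, []).append(e)
    return buckets

# Worked example from the text: f(key) = 1..5 with 5 buckets
print(hash_bucket_cut([1, 2, 3, 4, 5], 5, f=lambda k: k))
# → {1: [1], 2: [2], 3: [3], 4: [4], 0: [5]}
```

With `f=lambda k: k`, the keys 1, 6 and 11 all land in bucket 1 and share one chain. This is the collision case that the chaining method handles.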
  • Because the two preset data segmentation functions provided in the above embodiments cover most data segmentation scenarios, no additional work on the data segmentation function is needed when calling the calculation operator provided by this method.
  • If another data cutting method is needed during multi-party computation, a custom data cutting function can be written, configured, and called by modifying the data cutting function part of the calculation operator.
  • The above data cutting functions do not limit the data splitting methods to which this method is applicable.
  • Step S202: the initiator computing slave node obtains the calculation result from the corresponding calculation data provided by each participant computing slave node, and sends the calculation result to the initiator computing master node. Because the sub-data sets are divided according to the data cutting function in the calculation operator, sub-data sets that need to exchange protocol data during multi-party computation carry the same sub-data set label.
  • All computing slave nodes of the initiator start executing the initiator logic of the calculation operator at the same time, and all computing slave nodes of the participants start executing the participant logic of the calculation operator at the same time; the corresponding algorithm logic is executed between the initiator and participant nodes that hold the same sub-data set id, and data interaction may occur between them.
  • Because the sub-data sets requiring multi-party interaction have the same sub-data set labels at the initiator and at every participant, the efficiency of secure multi-party computation is further improved.
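Set intersection shows why pairing only same-label sub-data sets loses nothing. Equal elements hash to the same bucket, so the global intersection is exactly the union of the per-bucket intersections. The Python sketch below checks this. Plaintext `&` stands in for whatever privacy-preserving intersection protocol the operator's calculation logic would actually run. `bucket_of`, `cut` and `bucketed_intersection` are hypothetical names, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Dict, Iterable, Set

def bucket_of(element: str, n_buckets: int) -> int:
    # Same deterministic hash on every party, so equal elements share a label.
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

def cut(elements: Iterable[str], n_buckets: int) -> Dict[int, Set[str]]:
    parts: Dict[int, Set[str]] = {}
    for e in elements:
        parts.setdefault(bucket_of(e, n_buckets), set()).add(e)
    return parts

def bucketed_intersection(a: Iterable[str], b: Iterable[str], n_buckets: int = 4) -> Set[str]:
    initiator, participant = cut(a, n_buckets), cut(b, n_buckets)
    result: Set[str] = set()
    # Only sub-data sets with the same label are paired; a label present on
    # one side only cannot contribute to the intersection.
    for label in initiator.keys() & participant.keys():
        result |= initiator[label] & participant[label]   # stand-in for a PSI protocol
    return result

print(sorted(bucketed_intersection({"alice", "bob", "carol", "dave"}, {"bob", "dave", "erin"})))
# → ['bob', 'dave']
```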
  • FIG. 3 is a flowchart of a multi-party data interaction method according to another embodiment of the present application.
  • Obtaining, by the initiator computing slave node, the calculation result from the corresponding calculation data provided by each participant computing slave node includes:
  • Step S301: the participant computing slave node sends the calculation data and the corresponding sub-data set label to the participant computing master node;
  • Step S302: the participant computing master node sends the calculation data and the corresponding sub-data set label to the initiator computing master node; the initiator computing master node forwards the calculation data, according to the sub-data set label, to the corresponding initiator computing slave node; and the initiator computing slave node obtains the calculation result from the corresponding calculation data provided by each participant computing slave node.
  • In this data interaction method, data exchanged between the initiator and each participant during the calculation must be relayed through each party's computing master node and cannot be transmitted directly between the parties' computing slave nodes, which improves the security of the calculation data.
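The master-node relay in steps S301 and S302 can be modelled with two objects, as in the Python sketch below. This is a hedged, in-process model, not a network implementation. `Message`, `MasterNode`, `send_from_own_slave`, `receive_from_peer` and the slave-node names are hypothetical. Each master keeps a label-to-slave table, and a slave node never addresses the other party's slaves directly; it only hands labelled data to its own master.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Message:
    label: int      # sub-data set label the payload belongs to
    payload: Any    # calculation data

@dataclass
class MasterNode:
    name: str
    label_to_slave: Dict[int, str]   # which of this party's slave nodes holds each label
    inboxes: Dict[str, List[Message]] = field(default_factory=dict)
    # The peer link is cyclic, so keep it out of repr/eq to avoid infinite recursion.
    peer: Optional["MasterNode"] = field(default=None, repr=False, compare=False)

    def send_from_own_slave(self, msg: Message) -> None:
        # S301: a slave node hands data plus its label to its own master.
        # S302 (first half): the master relays it to the other party's master.
        if self.peer is None:
            raise RuntimeError(f"{self.name} has no peer master")
        self.peer.receive_from_peer(msg)

    def receive_from_peer(self, msg: Message) -> None:
        # S302 (second half): route by label to the local slave holding that label.
        slave = self.label_to_slave[msg.label]
        self.inboxes.setdefault(slave, []).append(msg)

initiator = MasterNode("initiator", {0: "init-slave-A", 1: "init-slave-B", 2: "init-slave-A"})
participant = MasterNode("participant", {0: "part-slave-1", 1: "part-slave-2", 2: "part-slave-3"})
initiator.peer, participant.peer = participant, initiator

# part-slave-3 holds label 2; it never contacts init-slave-A directly.
participant.send_from_own_slave(Message(label=2, payload=[7, 8]))
print(initiator.inboxes)
# → {'init-slave-A': [Message(label=2, payload=[7, 8])]}
```

The two parties may own different numbers of slave nodes, as the text allows. Only the label has to match, because each master resolves the label against its own table.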
  • FIG. 4 is a flowchart of a multi-party data interaction method according to yet another embodiment of the present application. As shown in FIG. 4, obtaining their respective calculation operators by the initiator and the participant includes:
  • Step S401: the initiator obtains a calculation operator, parses the configuration file in it, reads the algorithm type and version number, and overwrites any existing calculation operator of the same algorithm type.
  • The code of the calculation operator is packaged and uploaded to the system of the initiator that calls the operator.
  • The system parses the configuration file in the calculation operator package and reads the algorithm type and version number; if a calculation operator package of the same type and version already exists, it is overwritten, so that the new package replaces the old one.
  • Step S402: the initiator distributes the calculation operator to each participant.
  • The distribution of calculation operators is not limited to the initiator obtaining, updating, and distributing them.
  • This embodiment provides a registration process for calculation operators and supports their dynamic upgrade and replacement, which makes the system easy to operate and maintain.
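The operator registration and distribution of steps S401 and S402 can be sketched in Python with an in-memory zip file standing in for the uploaded operator package. These details are assumptions. The config file name `operator.json`, its `algorithm_type` and `version` fields, and the names `make_package` and `OperatorRegistry` are all hypothetical. The registry is keyed by algorithm type, following the wording of step S401; the text elsewhere also mentions matching on type and version.

```python
import io
import json
import zipfile
from typing import Dict, Iterable, Tuple

CONFIG_NAME = "operator.json"  # hypothetical name of the configuration file in a package

def make_package(algorithm_type: str, version: str, code: str = "# calculation logic\n") -> bytes:
    # Build an in-memory zip standing in for an uploaded calculation operator package.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(CONFIG_NAME, json.dumps({"algorithm_type": algorithm_type, "version": version}))
        zf.writestr("operator.py", code)
    return buf.getvalue()

class OperatorRegistry:
    def __init__(self) -> None:
        self._ops: Dict[str, Tuple[str, bytes]] = {}  # algorithm type -> (version, package)

    def register(self, package: bytes) -> bool:
        """S401: parse the config, then overwrite any operator of the same type.
        Returns True if an existing operator was replaced."""
        with zipfile.ZipFile(io.BytesIO(package)) as zf:
            config = json.loads(zf.read(CONFIG_NAME))
        algorithm_type, version = config["algorithm_type"], config["version"]
        replaced = algorithm_type in self._ops
        self._ops[algorithm_type] = (version, package)
        return replaced

    def version_of(self, algorithm_type: str) -> str:
        return self._ops[algorithm_type][0]

    def distribute(self, participants: Iterable["OperatorRegistry"]) -> None:
        # S402: push every registered package to each participant's registry.
        for registry in participants:
            for _, package in self._ops.values():
                registry.register(package)

initiator = OperatorRegistry()
initiator.register(make_package("psi", "1.0"))
initiator.register(make_package("psi", "1.1"))   # dynamic upgrade replaces 1.0
participant = OperatorRegistry()
initiator.distribute([participant])
print(participant.version_of("psi"))             # → 1.1
```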
  • FIG. 5 is a schematic diagram of a multi-party data processing method according to an optional embodiment of the present application. As shown in FIG. 5, the calculation operator execution flow of multi-party secure privacy computing is as follows:
  • Step S1: the initiator of the privacy computation and each participant upload their data sets to their own systems.
  • Step S2: the initiator of the privacy computation distributes the operators used in this call to all participants, ensuring that every party has the latest operators.
  • For example, institution 1, which initiates the privacy computation, acts as the initiator, institution 2 acts as a participant, and so on.
  • Step S3: the initiator and all participants split their own data sets into multiple sub-data sets according to the operator's data cutting function, and return the label of each sub-data set, that is, the sub-data set id.
  • For example, the data sets of institution 1 and institution 2 are each divided into sub-data set 0, sub-data set 1, and sub-data set 2.
  • The data cutting function provides two default, widely applicable data cutting methods. A vector data set is cut directly into segments, and the sub-data set id includes the corresponding sequence number. A set-type data set, used for set operations such as multi-party intersection, union, or complement, is processed by reading each element, calculating with the hash bucket operation the bucket number of the sub-data set it belongs to, and writing the element into the corresponding sub-data set file; the sub-data set id includes the calculated bucket number.
  • Step S4: the initiator and each participant distribute their sub-data sets evenly among all privacy computing slave nodes owned by that party.
  • Even distribution further improves computing efficiency by preventing large differences in the calculation load of the computing slave nodes.
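Step S4 asks for an even distribution so that no computing slave node carries a much larger load than the others. The simplest reading is equal counts per node, for example round robin. When sub-data sets differ in size, as hash buckets often do, a common heuristic is greedy longest-processing-time (LPT) assignment. LPT is a swapped-in technique, not one the text specifies. The Python sketch below assumes record count is a fair proxy for calculation load; `assign_sub_datasets` is a hypothetical name.

```python
import heapq
from typing import Dict, List

def assign_sub_datasets(sizes: Dict[int, int], n_slaves: int) -> Dict[int, List[int]]:
    """Greedy LPT assignment: take sub-data sets from largest to smallest and
    give each one to the currently least-loaded slave node."""
    heap = [(0, node) for node in range(n_slaves)]    # (current load, node index)
    heapq.heapify(heap)
    plan: Dict[int, List[int]] = {node: [] for node in range(n_slaves)}
    for label in sorted(sizes, key=lambda k: (-sizes[k], k)):
        load, node = heapq.heappop(heap)
        plan[node].append(label)
        heapq.heappush(heap, (load + sizes[label], node))
    return plan

sizes = {0: 8, 1: 7, 2: 6, 3: 5, 4: 4}   # sub-data set label -> number of records
print(assign_sub_datasets(sizes, 2))
# → {0: [0, 3, 4], 1: [1, 2]}
```

When all sub-data sets are the same size, this reduces to an equal-count split.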
  • Step S5: all computing slave nodes of the initiator used for privacy computing start executing the initiator logic of the calculation operator at the same time, and all computing slave nodes of the participants start executing the participant logic at the same time; each node works on the sub-data sets allocated to it by its master node.
  • Step S6: the corresponding privacy computing algorithm logic is executed between the initiator and participant nodes that hold the same sub-data set id.
  • A slave node cannot send data directly to the slave nodes of another institution. It must first send the data to its own party's computing master node, which sends the data, together with the id of the sub-data set to which it belongs, to the other party's computing master node; on receipt, the other party's computing master node forwards the data to its computing slave node that corresponds to that sub-data set id.
  • Step S7: after the algorithm logic has been executed on all sub-data sets, all computing slave nodes of the initiator send their calculation results to the computing master node.
  • The initiator computing master node calls the aggregation function in the operator package to aggregate all sub-results into a complete calculation result.
  • The system natively supports scaling to larger data volumes: developers of privacy computing algorithms do not need to write the logic for splitting data, aggregating results, or transmitting data between nodes during the algorithm, and can concentrate on writing the algorithm logic itself.
  • The method supports dynamic upgrade and replacement of privacy computing operators, so the system remains easy to operate and maintain even when the number of nodes grows to dozens.
  • Step S1 and step S2 can be performed in either order.
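Step S7 leaves the choice of aggregation function to the operator, and the text does not list concrete ones. As a hedged illustration, the Python sketch below shows three plausible aggregation functions for the calculations mentioned above. Each takes the per-label sub-results that the initiator master node receives. The function names are hypothetical.

```python
from typing import Dict, List, Set

def concat_by_label(sub_results: Dict[int, List[int]]) -> List[int]:
    # Vector operations: stitch segments back together in segment-number order,
    # regardless of the order in which slave nodes reported them.
    return [v for label in sorted(sub_results) for v in sub_results[label]]

def union_of_buckets(sub_results: Dict[int, Set[str]]) -> Set[str]:
    # Per-bucket set results (e.g. intersections): the full answer is their union.
    merged: Set[str] = set()
    for part in sub_results.values():
        merged |= part
    return merged

def sum_of_partials(sub_results: Dict[int, int]) -> int:
    # Dot product: each segment contributes a partial sum.
    return sum(sub_results.values())

print(concat_by_label({2: [7], 0: [1, 2, 3], 1: [4, 5, 6]}))       # → [1, 2, 3, 4, 5, 6, 7]
print(sorted(union_of_buckets({0: {"bob"}, 3: {"dave"}, 1: set()})))  # → ['bob', 'dave']
# [1, 2, 3, 4, 5] · [6, 7, 8, 9, 10] cut into segments of 2: partials 20, 60, 50
print(sum_of_partials({0: 1 * 6 + 2 * 7, 1: 3 * 8 + 4 * 9, 2: 5 * 10}))  # → 130
```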
  • This embodiment also provides a multi-party data processing system, which is used to implement the foregoing embodiments and optional implementations.
  • As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
  • Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
  • FIG. 6 is a structural block diagram of a multi-party data processing system according to an embodiment of the present application.
  • The system includes an initiator 60 and a participant 64; in an actual application there may be more than one initiator and participant.
  • The initiator 60 includes an initiator computing master node 62 and an initiator computing slave node 63.
  • The participant 64 includes a participant computing master node 66 and a participant computing slave node 67.
  • The initiator 60 and the participant 64 obtain their respective data sets and calculation operators, divide the data sets into sub-data sets according to the calculation operators, and distribute the sub-data sets to their own computing slave nodes, namely the initiator computing slave node 63 and the participant computing slave node 67; each computing slave node executes the corresponding calculation logic according to the calculation operator.
  • While the initiator computing slave node 63 and the participant computing slave node 67 execute the calculation logic, the initiator computing slave node 63 obtains the calculation result from the corresponding calculation data provided by each participant computing slave node 67 and sends it to the initiator computing master node 62, which aggregates the calculation results according to the calculation operator to obtain the aggregated data.
  • FIG. 7 is a structural block diagram of a computing node according to an embodiment of the present application.
  • The computing nodes include computing master nodes and computing slave nodes. Each computing node includes a scheduler 72, which includes a virtual machine 74 and a network component 76; the virtual machine 74 is configured to execute the multi-party data processing methods of the foregoing embodiments and optional embodiments.
  • The network component 76 handles data communication between computing nodes, such as data interaction and distribution of calculation operators.
  • The initiator 60 and the participant 64 divide their data sets into sub-data sets according to the data cutting function and label them, and assign the sub-data sets to their own computing slave nodes; the corresponding calculation logic is then executed between the initiator computing slave node 63 and the participant computing slave node 67 that hold the same sub-data set label.
  • The participant computing slave node 67 sends its calculation data to the initiator computing slave node 63 with the same sub-data set label, and the initiator computing slave node 63 obtains the calculation result from the corresponding calculation data provided by the participant computing slave node 67 and sends it to the initiator computing master node 62.
  • The participant computing slave node 67 sends the calculation data and the corresponding sub-data set label to the participant computing master node 66, and the participant computing master node 66 sends the calculation data and the corresponding sub-data set label to the initiator computing master node 62.
  • Each of the above modules may be a functional module or a program module, and may be implemented in software or hardware.
  • The above modules may be located in the same processor, or distributed across different processors in any combination.
  • This embodiment also provides an electronic device, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps of any one of the above method embodiments.
  • The electronic device may further include a transmission device and an input/output device, both connected to the processor.
  • The processor may be configured to perform the following steps through the computer program:
  • the initiator and the participant each obtain their own data set and calculation operator;
  • the data set is divided to obtain sub-data sets, the sub-data sets are distributed to the party's own computing slave nodes, and each computing slave node executes the corresponding calculation logic according to the calculation operator;
  • the initiator computing slave node obtains the calculation result from the corresponding calculation data provided by each participant computing slave node, and sends the calculation result to the initiator computing master node;
  • the initiator computing master node aggregates the calculation results according to the calculation operator to obtain the aggregated data.
  • The processor may also be configured to perform the following steps through the computer program:
  • performing data segmentation on the data set according to the data cutting function to obtain the sub-data sets and labeling them, assigning the sub-data sets to the party's own computing slave nodes, and executing the corresponding calculation logic between the initiator computing slave node and the participant computing slave node that hold the same sub-data set label;
  • the initiator computing slave node obtains the calculation result from the corresponding calculation data provided by each participant computing slave node, and sends the calculation result to the initiator computing master node.
  • The processor may also be configured to perform the following steps through the computer program:
  • the participant computing slave node sends the calculation data and the corresponding sub-data set label to the participant computing master node;
  • the participant computing master node sends the calculation data and the corresponding sub-data set label to the initiator computing master node, and the initiator computing master node forwards the calculation data to the corresponding initiator computing slave node, which obtains the calculation result from the corresponding calculation data provided by each participant computing slave node.
  • The processor may also be configured to perform the following steps through the computer program:
  • if the data set is a vector data set, it is cut into segments, and the sub-data set label is its corresponding segment sequence number;
  • if the data set is a set-type (collection) data set, each element in the data set is read, its bucket number is calculated by taking the hash value modulo the number of buckets, and the element is written into the corresponding sub-data set file; the sub-data set number is the bucket number.
  • The processor may also be configured to perform the following steps through the computer program:
  • the initiator obtains a calculation operator, parses the configuration file in the calculation operator, reads the algorithm type and version number in it, and overwrites any existing calculation operator of the same algorithm type;
  • the initiator distributes the calculation operator to each participant.
  • The calculation operator includes a data cutting function, a data aggregation function, calculation logic, and a message type for the calculation data, each of which is set independently.
  • this embodiment of the present application may provide a storage medium for implementation.
  • a computer program is stored on the storage medium; when the computer program is executed by a processor, any one of the multi-party data processing methods in the foregoing embodiments is implemented.
  • the initiator and the participant each obtain their own data set and computing operator;
  • the data set is split according to the data splitting function in the computing operator to obtain sub-datasets, the sub-datasets are distributed to the party's own computing slave nodes, and each computing slave node executes the corresponding computing logic according to the computing operator;
  • the initiator computing slave node obtains a computing result according to the corresponding computing data provided by each participant computing slave node, and sends the computing result to the initiator computing master node;
  • the initiator computing master node aggregates the computing results according to the computing operator to obtain aggregated data.
  • the above computer program implements the following steps when executed by a processor:
  • the data set is split according to the data splitting function to obtain sub-datasets, each sub-dataset is labeled, the sub-datasets are assigned to the party's own computing slave nodes, and the initiator computing slave node and the participant computing slave node that hold the same sub-dataset label execute the corresponding computing logic;
  • the initiator computing slave node obtains a computing result according to the corresponding computing data provided by each participant computing slave node, and sends the computing result to the initiator computing master node.
  • the above computer program implements the following steps when executed by a processor:
  • the participant computing slave node sends the computing data and the corresponding sub-dataset label to the participant computing master node;
  • the participant computing master node sends the computing data and the corresponding sub-dataset label to the initiator computing master node, the initiator computing master node sends the computing data to the corresponding initiator computing slave node according to the sub-dataset label, and that slave node obtains the computing result according to the corresponding computing data provided by each participant computing slave node.
  • the above computer program implements the following steps when executed by a processor:
  • if the data set is a vector-type data set, it is cut into segments, and the sub-dataset label is the corresponding segment sequence number;
  • if the data set is a set-type data set, each element in the data set is read, a bucket number is computed by hashing the element and taking the result modulo the number of buckets, and the element is written into the corresponding sub-dataset file; the sub-dataset label is the bucket number.
  • the above computer program implements the following steps when executed by a processor:
  • the initiator obtains a computing operator, parses the configuration file in the computing operator, reads the algorithm type and version number therein, and overwrites the existing computing operator if one of the same algorithm type already exists;
  • the initiator distributes the computing operator to each of the participants.
  • the computing operator includes an independently set data splitting function, data aggregation function, computing logic, and message type of the computing data.

Abstract

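A multi-party data processing method, comprising: obtaining, by an initiator and a participant, their respective data sets and computing operators; splitting the data set according to a data splitting function in the computing operator to obtain sub-datasets, and distributing the sub-datasets to the party's own computing slave nodes, each of which executes the corresponding computing logic according to the computing operator; obtaining, by an initiator computing slave node, a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sending the computing result to an initiator computing master node; and aggregating, by the initiator computing master node, the computing results according to the computing operator to obtain aggregated data.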

Description

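Multi-party data processing method, system, electronic device and storage medium
Related Application
This application claims priority to Chinese patent application No. 202111631336.4, filed on December 28, 2021 and entitled "Multi-party data processing method, system, electronic device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing, and in particular to a multi-party data processing method, system, electronic device and storage medium.
Background
Secure multi-party computation addresses the problem of how to securely compute an agreed function when no trusted third party exists. In secure multi-party computation, the participants can jointly compute an agreed function that requires the raw data of several participants without revealing their raw data to one another or to a third party. Related open-source privacy computing frameworks, such as SPDZ (MP-SPDZ: A Versatile Framework for Multi-Party Computation) and ABY3 (ABY3: A Mixed Protocol Framework for Machine Learning), load the data into memory and compute on it directly. As the volume of data to be computed grows, the only remedy is to add memory; once the data reaches the TB (terabyte) or even PB (petabyte) scale, adding memory no longer solves the problem, and because the computing power of a single machine is limited, computation becomes very slow when the data volume is too large.
No effective solution has yet been proposed for the low efficiency of multi-party computation in the related art.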
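Summary
According to various embodiments of the present application, a multi-party data processing method, system, device and storage medium are provided.
In a first aspect, the present application provides a multi-party data processing method, comprising:
obtaining, by an initiator and a participant, their respective data sets and computing operators;
splitting the data set according to a data splitting function in the computing operator to obtain sub-datasets, and distributing the sub-datasets to the party's own computing slave nodes, each computing slave node executing the corresponding computing logic according to the computing operator;
obtaining, by an initiator computing slave node, a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sending the computing result to the initiator computing master node;
aggregating, by the initiator computing master node, the computing results according to the computing operator to obtain aggregated data.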
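In some of the embodiments, the method further comprises:
splitting the data set according to the data splitting function to obtain the sub-datasets and labeling each with a sub-dataset label, distributing the sub-datasets to the party's own computing slave nodes, and executing the corresponding computing logic between the initiator computing slave node and the participant computing slave node that hold the same sub-dataset label;
obtaining, by the initiator computing slave node, a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sending the computing result to the initiator computing master node.
In some of the embodiments, obtaining, by the initiator computing slave node, the computing result according to the corresponding computing data provided by the computing slave nodes of each participant comprises:
sending, by the participant computing slave node, the computing data and the corresponding sub-dataset label to a participant computing master node;
sending, by the participant computing master node, the computing data and the corresponding sub-dataset label to the initiator computing master node; sending, by the initiator computing master node, the computing data to the corresponding initiator computing slave node according to the sub-dataset label; and obtaining, by the initiator computing slave node, the computing result according to the corresponding computing data provided by the computing slave nodes of each participant.
In some of the embodiments, splitting the data set according to the data splitting function in the computing operator to obtain the sub-datasets and labeling the sub-datasets comprises:
when the data set is a vector-type data set, cutting it into segments, the sub-dataset label comprising the corresponding segment sequence number;
when the data set is a set-type data set, reading each element in the data set, computing a bucket number by hashing the element and taking the result modulo the number of buckets, and writing the element into the corresponding sub-dataset file, the sub-dataset label comprising the bucket number.
In some of the embodiments, obtaining, by the initiator and the participant, their respective computing operators comprises:
obtaining, by the initiator, a computing operator, parsing the configuration file in the computing operator, reading the algorithm type and version number therein, and overwriting the existing computing operator if one of the same algorithm type already exists;
distributing, by the initiator, the computing operator to each participant.
In some of the embodiments, the computing operator comprises an independently set data splitting function, data aggregation function, computing logic, and message type of the computing data.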
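In a second aspect, the present application provides a multi-party data processing system, comprising an initiator and a participant, the initiator comprising an initiator computing master node and initiator computing slave nodes, and the participant comprising a participant computing master node and participant computing slave nodes, wherein:
the initiator and the participant each obtain their respective data sets and computing operators, split the data set according to the computing operator to obtain sub-datasets, and distribute the sub-datasets to the party's own computing slave nodes, each computing slave node executing the corresponding computing logic according to the computing operator;
the initiator computing slave node obtains a computing result according to the corresponding computing data provided by the computing slave nodes of each participant and sends the computing result to the initiator computing master node, and the initiator computing master node aggregates the computing results according to the computing operator to obtain aggregated data.
In some of the embodiments, the computing nodes comprise computing master nodes and computing slave nodes, each computing node comprises a scheduler, and the scheduler comprises a virtual machine and a network component;
the virtual machine is configured to execute the multi-party data processing method according to any one of the above;
the network component is configured for data communication between the computing nodes.
In a third aspect, the present application provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the multi-party data processing method according to the first aspect.
In a fourth aspect, the present application provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the multi-party data processing method according to the first aspect.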
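Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below, so that other features, objects and advantages of the present application become more readily understandable.
Brief Description of the Drawings
To better describe and illustrate the embodiments and/or examples of the present application disclosed herein, reference may be made to one or more of the drawings. The additional details or examples used to describe the drawings should not be regarded as limiting the scope of any of the disclosed application, the presently described embodiments and/or examples, or the presently understood best mode of the application.
FIG. 1 is a flowchart of a multi-party data processing method according to an embodiment of the present application.
FIG. 2 is a flowchart of a multi-party data processing method according to another embodiment of the present application.
FIG. 3 is a flowchart of a multi-party data interaction method according to yet another embodiment of the present application.
FIG. 4 is a flowchart of a multi-party data interaction method according to a further embodiment of the present application.
FIG. 5 is a schematic diagram of a multi-party data processing method according to an optional embodiment of the present application.
FIG. 6 is a structural block diagram of a multi-party data processing system according to an embodiment of the present application.
FIG. 7 is a structural block diagram of a computing node according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the present application is described and illustrated below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided herein without creative effort fall within the scope of protection of the present application. It should also be understood that, although the effort made in such a development process may be complex and lengthy, for a person of ordinary skill in the art related to the content disclosed herein, changes in design, manufacture, or production made on the basis of the technical content disclosed in the present application are merely conventional technical means and should not be construed as an insufficient disclosure.
Reference in the present application to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment mutually exclusive of other embodiments. A person of ordinary skill in the art understands, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments where no conflict arises.
Unless otherwise defined, the technical or scientific terms used in the present application have the ordinary meaning understood by a person of ordinary skill in the art to which the present application belongs. Words such as "a", "an", "one", and "the" do not indicate a limitation of quantity and may denote the singular or the plural. The terms "comprise", "include", "have", and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or modules (units) is not limited to the listed steps or units, but may further include steps or units that are not listed, or other steps or units inherent to such a process, method, product, or device. Words such as "connect", "connected", and "coupled" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. The terms "first", "second", "third", and the like merely distinguish similar objects and do not denote a specific ordering of the objects.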
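This embodiment provides a multi-party data processing method. FIG. 1 is a flowchart of the multi-party data processing method according to an embodiment of the present application. As shown in FIG. 1, the flow comprises the following steps:
Step S101: the initiator and the participant each obtain their respective data sets and computing operators. The initiator and the participants of the multi-party computation each obtain the computing operator sent to them, which contains the splitting function for each party's data set and the computing logic. In addition, the initiator and the participant each obtain the data set that their own party has uploaded for the computation.
Step S102: the data set is split according to the computing operator to obtain sub-datasets, the sub-datasets are distributed to the party's own computing slave nodes, and each computing slave node executes the corresponding computing logic according to the computing operator. After receiving the uploaded data set, the initiator and the participants each split their data set with the splitting function in their computing operator to obtain sub-datasets, and send the sub-datasets to the computing slave nodes they own. The initiator and the participants may own the same number of computing slave nodes or different numbers. The initiator's slave nodes may take part in the data computation; alternatively, in some embodiments in which the initiator uploads no data, the initiator may act as a manager that only schedules and aggregates the participants' computations. After obtaining its sub-dataset, each party's computing slave node performs the corresponding computation on it according to the computing logic in the computing operator. Note that "computation" here covers not only simple arithmetic but also other data processing such as filtering, comparison, and set operations.
Step S103: the initiator computing slave node obtains a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sends the computing result to the initiator computing master node. While executing the corresponding computing logic, each participant computing slave node sends its computing data to the corresponding initiator computing slave node; each initiator computing slave node collects the corresponding computing data from the participants and computes it together with its own data to obtain a computing result. Each initiator computing slave node then sends its computing result to the initiator computing master node.
Step S104: the initiator computing master node aggregates the computing results according to the computing operator to obtain aggregated data. The initiator computing master node calls the aggregation function in the computing operator to aggregate the computing results of all computing slave nodes into the final aggregated data, which is the final result of the multi-party computation.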
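Steps S101 to S104 can be traced with a minimal single-process sketch. It assumes one initiator and one participant, element-wise vector addition as the computing logic, and fixed-length segmentation as the splitting function; `split_vector`, `compute`, `aggregate` and `run` are hypothetical names. Slave nodes are modelled as dictionary entries, and no secure-computation protocol, encryption or networking is involved, so the sketch shows only the data flow and not the privacy guarantees.

```python
from typing import Dict, List

Vector = List[int]


def split_vector(vec: Vector, seg_len: int) -> Dict[int, Vector]:
    """Hypothetical splitting function: contiguous segments keyed by segment number."""
    return {i // seg_len: vec[i:i + seg_len] for i in range(0, len(vec), seg_len)}


def compute(own: Vector, peer: Vector) -> Vector:
    """Hypothetical computing logic on one initiator slave node (element-wise add)."""
    return [a + b for a, b in zip(own, peer)]


def aggregate(results: Dict[int, Vector]) -> Vector:
    """Hypothetical aggregation function on the initiator master node."""
    out: Vector = []
    for label in sorted(results):  # stitch the segments back in label order
        out.extend(results[label])
    return out


def run(initiator_data: Vector, participant_data: Vector, seg_len: int) -> Vector:
    # S101/S102: both parties split their own data with the same operator.
    init_parts = split_vector(initiator_data, seg_len)
    part_parts = split_vector(participant_data, seg_len)
    # S103: each initiator slave combines its segment with the participant
    # segment that carries the same label (one dict entry per slave here).
    results = {label: compute(init_parts[label], part_parts[label]) for label in init_parts}
    # S104: the initiator master aggregates the per-slave results.
    return aggregate(results)


print(run([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], seg_len=2))  # [11, 22, 33, 44, 55]
```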
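The computing operator comprises an independently set data splitting function, data aggregation function, computing logic, and message type of the computing data.
Through the above steps, a highly general data processing method for multi-party privacy computing is provided. Because the splitting function, computing logic, aggregation function, and message type of the computing data are set independently in the computing operator, data can be split, processed efficiently, and aggregated even when the data volume is very large. Moreover, because these four parts are set up as separate blocks, algorithm developers do not need to write the data splitting logic, the aggregation logic, or the transfer of data between nodes; they only write and modify the computing logic.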
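The separation of the four operator parts can be sketched as a record of independent callables plus a message type. This is an illustrative assumption, not the operator package format of the application: the `Operator` fields, `segment_split`, `concat_in_label_order` and `execute` are hypothetical. The generic `execute` driver knows only the four parts, so switching from vector addition to vector multiplication changes only the computing logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[int]


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator bundle; each of the four parts is set independently."""
    split: Callable[[Vector], Dict[int, Vector]]      # data splitting function
    compute: Callable[[Vector, Vector], Vector]       # computing logic per label
    aggregate: Callable[[Dict[int, Vector]], Vector]  # data aggregation function
    message_type: type                                # type of exchanged computing data


def segment_split(vec: Vector, seg_len: int = 2) -> Dict[int, Vector]:
    return {i // seg_len: vec[i:i + seg_len] for i in range(0, len(vec), seg_len)}


def concat_in_label_order(results: Dict[int, Vector]) -> Vector:
    return [x for label in sorted(results) for x in results[label]]


def execute(op: Operator, mine: Vector, theirs: Vector) -> Vector:
    """Generic driver: knows nothing about the algorithm, only the four parts."""
    ours, peer = op.split(mine), op.split(theirs)
    partial = {label: op.compute(ours[label], peer[label]) for label in ours}
    for value in partial.values():
        if not isinstance(value, op.message_type):  # message-type check
            raise TypeError(f"expected {op.message_type.__name__}")
    return op.aggregate(partial)


# Two operators that reuse the same splitting and aggregation parts and differ
# only in the computing logic an algorithm developer would write.
vector_add = Operator(segment_split, lambda a, b: [x + y for x, y in zip(a, b)],
                      concat_in_label_order, list)
vector_mul = Operator(segment_split, lambda a, b: [x * y for x, y in zip(a, b)],
                      concat_in_label_order, list)

print(execute(vector_add, [1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
print(execute(vector_mul, [1, 2, 3], [4, 5, 6]))  # [4, 10, 18]
```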
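In some of the embodiments, FIG. 2 is a flowchart of a multi-party data processing method according to another embodiment of the present application. As shown in FIG. 2, the method further comprises the following steps:
Step S201: the data set is split according to the data splitting function to obtain sub-datasets, each sub-dataset is labeled with a sub-dataset label, the sub-datasets are distributed to the party's own computing slave nodes, and the initiator computing slave node and the participant computing slave node that hold the same sub-dataset label execute the corresponding computing logic. In this step, the initiator and all participants each split their own data set into multiple sub-datasets according to the operator's data splitting function and return the id of each sub-dataset, i.e., the sub-dataset label.
Optionally, the data splitting function provides built-in splitting modes. When the data set is a vector-type data set, it is cut into segments, and the sub-dataset label comprises the corresponding segment sequence number. Vector operations, such as vector addition and multiplication, are performed on the parties' data that share the same sub-dataset label.
When the data set is a set-type data set, each element in the data set is read, a bucket number is computed by hashing the element and taking the result modulo the number of buckets, and the element is written into the corresponding sub-dataset file; the sub-dataset label comprises the bucket number. A hash function maps a target key, through some mapping or functional operation f, to the key's hash value; the function f is called the hash function. The hash bucket algorithm is used to resolve hash collisions, that is, cases in which different keys map to the same hash value, and it is in fact the separate chaining method of collision resolution. For example, suppose the number of buckets is 5, i.e., f(key) can take 5 values (here f(key) = key mod 5), so that the hash value can serve as the bucket index. The keys 1, 2, 3, 4, 5 are mapped by f(key) to 1, 2, 3, 4, 0 and placed in the memory pointed to by the head addresses of buckets 1, 2, 3, 4, 0. When the key 6 is processed next, its hash value is 1, so it must go into bucket 1, but the head of bucket 1 already holds element 1. A block of memory is therefore allocated for each bucket to hold all keys with the same hash value, and colliding keys are stored in a singly linked list, which resolves the collision. To look up a key, the key is used to index the corresponding bucket, and the linked list is traversed in order from the node at the bucket's head address, comparing keys until the matching key is found. In this embodiment, a hash bucket operation is applied to the elements of the data set, the computed bucket number determines the sub-dataset file each element is written to, and the bucket number serves as the sub-dataset label.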
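The separate-chaining example above (5 buckets, keys 1 to 6) can be reproduced with a small sketch. It assumes integer keys and f(key) = key mod 5, which matches the mapping of 1, 2, 3, 4, 5 to 1, 2, 3, 4, 0 given in the text; the class and method names are hypothetical.

```python
from typing import List, Optional


class Node:
    """One entry of a bucket's singly linked list."""
    def __init__(self, key: int) -> None:
        self.key = key
        self.next: Optional["Node"] = None


class ChainedHashTable:
    def __init__(self, num_buckets: int = 5) -> None:
        self.heads: List[Optional[Node]] = [None] * num_buckets

    def f(self, key: int) -> int:
        # Hash value = key mod bucket count; it doubles as the bucket index.
        return key % len(self.heads)

    def insert(self, key: int) -> None:
        idx = self.f(key)
        node = Node(key)
        if self.heads[idx] is None:
            self.heads[idx] = node  # empty bucket: the key becomes the head
            return
        cur = self.heads[idx]
        while cur.next is not None:  # collision: append to the bucket's list
            cur = cur.next
        cur.next = node

    def contains(self, key: int) -> bool:
        cur = self.heads[self.f(key)]  # index the bucket, then walk its list
        while cur is not None:
            if cur.key == key:
                return True
            cur = cur.next
        return False

    def chain(self, idx: int) -> List[int]:
        keys, cur = [], self.heads[idx]
        while cur is not None:
            keys.append(cur.key)
            cur = cur.next
        return keys


table = ChainedHashTable(5)
for k in [1, 2, 3, 4, 5, 6]:
    table.insert(k)
print(table.chain(1))                        # [1, 6]
print(table.contains(6), table.contains(7))  # True False
```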
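Note that the two preset data splitting functions provided in the above embodiment cover most data splitting scenarios, so when the computing operator provided by this method is used, no additional splitting function needs to be written or configured. If other splitting modes are needed during the multi-party computation, they can be implemented and invoked by modifying the data splitting function part of the computing operator; the above splitting functions do not limit the splitting modes to which this method applies.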
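The two default splitting modes can be sketched as follows. The sketch assumes string elements for set-type data and keeps the buckets in memory instead of writing sub-dataset files; the function names and the choice of SHA-256 are assumptions. Every party must use the same deterministic hash and the same bucket count; otherwise equal elements would receive different labels at different parties.

```python
import hashlib
from typing import Dict, Iterable, List


def split_vector(vec: List[float], seg_len: int) -> Dict[int, List[float]]:
    """Vector-type data: contiguous segments; label = segment sequence number."""
    return {i // seg_len: vec[i:i + seg_len] for i in range(0, len(vec), seg_len)}


def bucket_of(element: str, num_buckets: int) -> int:
    # Deterministic hash: Python's built-in hash() of a str is salted per
    # process, so two parties using it would not agree on bucket numbers.
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def split_set(elements: Iterable[str], num_buckets: int) -> Dict[int, List[str]]:
    """Set-type data: hash, take the modulo, append to that bucket; label = bucket number.
    Buckets are kept in memory here instead of being written to files."""
    buckets: Dict[int, List[str]] = {}
    for element in elements:
        buckets.setdefault(bucket_of(element, num_buckets), []).append(element)
    return buckets


print(split_vector([1.0, 2.0, 3.0, 4.0, 5.0], seg_len=2))
# {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0]}
```

Because both parties call `bucket_of` with identical arguments, an element present in two data sets always lands in sub-datasets that share the same label.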
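Step S202: the initiator computing slave node obtains a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sends the computing result to the initiator computing master node. Because the sub-datasets are produced by the data splitting function in the computing operator, the sub-datasets that need to exchange protocol data during the multi-party computation have the same sub-dataset label. All computing slave nodes of the initiator start executing the initiator logic of the computing operator at the same time, and all computing slave nodes of the participants start executing the participant logic of the computing operator at the same time. The initiator and participant nodes that hold the same sub-dataset id execute the corresponding algorithm logic and may exchange data.
In this embodiment, the sub-datasets produced by splitting are labeled, so that the sub-datasets that need multi-party interaction carry the same sub-dataset label at the initiator and at every participant. This further improves the efficiency of secure multi-party computation.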
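Labeling pays off because equal elements always hash to the same bucket. For a set intersection, only sub-datasets with matching labels need to interact, and the union of the per-label results equals the full intersection. The sketch below checks this in plaintext. It is not a private set intersection protocol, and all names and sample identifiers are hypothetical.

```python
import hashlib
from typing import Dict, Set


def bucket_of(element: str, num_buckets: int) -> int:
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def split_set(elements: Set[str], num_buckets: int) -> Dict[int, Set[str]]:
    parts: Dict[int, Set[str]] = {}
    for element in elements:
        parts.setdefault(bucket_of(element, num_buckets), set()).add(element)
    return parts


def intersect_by_label(initiator: Set[str], participant: Set[str], num_buckets: int) -> Set[str]:
    mine, theirs = split_set(initiator, num_buckets), split_set(participant, num_buckets)
    result: Set[str] = set()
    # Only sub-datasets with the same label are paired; a label that is absent
    # on either side cannot contribute any common element.
    for label in mine.keys() & theirs.keys():
        result |= mine[label] & theirs[label]  # per-slave partial result
    return result                              # union plays the aggregation role


org1 = {"id01", "id02", "id03", "id04"}
org2 = {"id03", "id04", "id05"}
print(sorted(intersect_by_label(org1, org2, num_buckets=3)))  # ['id03', 'id04']
```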
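In some of the embodiments, FIG. 3 is a flowchart of a multi-party data interaction method according to yet another embodiment of the present application. As shown in FIG. 3, obtaining, by the initiator computing slave node, the computing result according to the corresponding computing data provided by the computing slave nodes of each participant comprises:
Step S301: the participant computing slave node sends the computing data and the corresponding sub-dataset label to the participant computing master node;
Step S302: the participant computing master node sends the computing data and the corresponding sub-dataset label to the initiator computing master node; the initiator computing master node sends the computing data to the corresponding initiator computing slave node according to the sub-dataset label; and the initiator computing slave node obtains the computing result according to the corresponding computing data provided by the computing slave nodes of each participant.
This embodiment provides a data interaction mode: during computation, data exchanged between the initiator and the participants must be transmitted through each party's computing master node rather than directly between the parties' computing slave nodes, which improves the security of the computing data.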
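The relay in S301 and S302 can be sketched with in-process objects that stand in for the master nodes. The class names, the `Message` record and the label-to-slave table are assumptions. A real deployment would carry the messages over the network component between institutions instead of making a direct method call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, DefaultDict, Dict, List


@dataclass
class Message:
    label: int    # sub-dataset label the payload belongs to
    payload: Any  # computing data produced by a participant slave node


class InitiatorMaster:
    """Delivers each incoming message to the local slave holding the same label."""
    def __init__(self, label_to_slave: Dict[int, str]) -> None:
        self.label_to_slave = label_to_slave
        self.inboxes: DefaultDict[str, List[Any]] = defaultdict(list)

    def receive(self, msg: Message) -> None:
        self.inboxes[self.label_to_slave[msg.label]].append(msg.payload)


class ParticipantMaster:
    """Single exit point of the participant: its slaves only talk to it."""
    def __init__(self, initiator_master: InitiatorMaster) -> None:
        self.initiator_master = initiator_master

    def forward(self, msg: Message) -> None:
        # In a deployment this would be a network call between institutions.
        self.initiator_master.receive(msg)


def participant_slave_send(master: ParticipantMaster, label: int, data: Any) -> None:
    """S301: a participant slave hands its data plus label to its own master."""
    master.forward(Message(label, data))


init_master = InitiatorMaster({0: "init-slave-A", 1: "init-slave-B"})
part_master = ParticipantMaster(init_master)
participant_slave_send(part_master, 1, [30, 40])
participant_slave_send(part_master, 0, [10, 20])
print(init_master.inboxes["init-slave-A"])  # [[10, 20]]
print(init_master.inboxes["init-slave-B"])  # [[30, 40]]
```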
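In some embodiments, FIG. 4 is a flowchart of a multi-party data interaction method according to a further embodiment of the present application. As shown in FIG. 4, obtaining, by the initiator and the participant, their respective computing operators comprises:
Step S401: the initiator obtains the computing operator, parses the configuration file in the computing operator, reads the algorithm type and version number therein, and overwrites the existing computing operator if one of the same algorithm type already exists. The finished computing operator code is packaged and uploaded to the system of the initiator that invokes the operator; the system parses the configuration file in the operator package, reads the algorithm type and version number, and, if an operator package of the same type and version already exists, overwrites it, replacing the old operator package.
Step S402: the initiator distributes the computing operator to each participant. The way computing operators are distributed is not limited to acquisition, updating, and distribution by the initiator.
This embodiment provides a registration flow for computing operators that supports dynamic upgrading and replacement of operators, which simplifies system operation and maintenance when there are many participants and each party's computing slave nodes scale to dozens.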
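The registration flow can be sketched as a registry keyed by algorithm type. The configuration format (JSON with `algorithm` and `version` fields), the class name and the sample algorithm name `psi` are hypothetical. The sketch follows the summary's wording and overwrites on a matching algorithm type.

```python
import json
from typing import Dict, List, Tuple


class OperatorRegistry:
    """One operator package per algorithm type; re-registering overwrites it."""
    def __init__(self) -> None:
        self._packages: Dict[str, Tuple[str, bytes]] = {}

    def register(self, config_json: str, package: bytes) -> None:
        config = json.loads(config_json)
        # "algorithm" and "version" are hypothetical config field names.
        algorithm, version = config["algorithm"], config["version"]
        self._packages[algorithm] = (version, package)  # overwrite same type

    def version_of(self, algorithm: str) -> str:
        return self._packages[algorithm][0]

    def distribute(self, participants: List["OperatorRegistry"]) -> None:
        """S402: push every registered package to each participant."""
        for participant in participants:
            for algorithm, (version, package) in self._packages.items():
                config = json.dumps({"algorithm": algorithm, "version": version})
                participant.register(config, package)


initiator = OperatorRegistry()
initiator.register('{"algorithm": "psi", "version": "1.0"}', b"old code")
initiator.register('{"algorithm": "psi", "version": "1.1"}', b"new code")
org2 = OperatorRegistry()
initiator.distribute([org2])
print(initiator.version_of("psi"), org2.version_of("psi"))  # 1.1 1.1
```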
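The embodiments of the present application are described and illustrated below by means of an optional embodiment.
FIG. 5 is a schematic diagram of a multi-party data processing method according to an optional embodiment of the present application. As shown in FIG. 5, the execution flow of the computing operator for multi-party secure privacy computing is as follows:
Step S1: the initiator of the computing operator and each participant upload their data sets to their own systems.
Step S2: the initiator of the privacy computation distributes the operator invoked in this run to all participants, ensuring that every party has the latest operator. FIG. 5 shows only institution 1 and institution 2 taking part in the computation; when institution 1 initiates the privacy computation, institution 1 is the initiator and institution 2 is a participant, and so on.
Step S3: the initiator and all participants each split their own data set into multiple sub-datasets according to the operator's data splitting function and return the sub-dataset label, i.e., the sub-dataset id, of each sub-dataset. As shown in FIG. 5, after splitting, the data sets of institution 1 and institution 2 each yield sub-dataset 0, sub-dataset 1, and sub-dataset 2.
The data splitting function provides two widely applicable default splitting modes. For a vector-type data set, the data set is cut directly into segments, and the sub-dataset id comprises the corresponding sequence number. For a set-type data set, used for set operations such as multi-party intersection, union, or complement, each element in the data set is read, the bucket number of the sub-dataset it belongs to is computed by the hash bucket operation, and the element is written into the corresponding sub-dataset file; the sub-dataset id comprises the computed bucket number.
Step S4: the initiator and every participant distribute their sub-datasets evenly to all privacy computing slave nodes they own. Even distribution further improves computing efficiency by preventing large differences in workload among the computing slave nodes; in practice, the way sub-datasets are distributed is not limited. As shown in FIG. 5, each computing node, including the computing master node and the computing slave nodes, is assigned one sub-dataset.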
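The even distribution in step S4 can be sketched as a round-robin assignment of sub-dataset labels to slave nodes. This is only one possible policy, since the text leaves the distribution mode open. The function and node names are hypothetical.

```python
from typing import Dict, List


def assign_evenly(labels: List[int], slaves: List[str]) -> Dict[str, List[int]]:
    """Round-robin: per-node sub-dataset counts differ by at most one."""
    plan: Dict[str, List[int]] = {slave: [] for slave in slaves}
    for i, label in enumerate(sorted(labels)):
        plan[slaves[i % len(slaves)]].append(label)
    return plan


print(assign_evenly([0, 1, 2, 3, 4], ["slave-1", "slave-2"]))
# {'slave-1': [0, 2, 4], 'slave-2': [1, 3]}
```

Parties are paired by sub-dataset label, not by node index. Each party can therefore apply this with a different number of slave nodes, which matches the statement in step S102 that the node counts may differ.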
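Step S5: every computing slave node of the initiator used for privacy computing starts executing the initiator logic of the computing operator at the same time, and every computing slave node of the participants used for privacy computing starts executing the participant logic of the computing operator at the same time; the data used are the sub-datasets assigned by the master node.
Step S6: the initiator and participant nodes holding the same sub-dataset id execute the algorithm logic of the privacy computation. During data exchange between the parties, a slave node cannot send data directly to a slave node of another institution. It must first send the data to its own party's computing master node, which sends the data together with its sub-dataset id to the other party's computing master node; on receipt, that master node forwards the data, according to the sub-dataset id, to the computing slave node that should receive it. As shown in FIG. 5, data is transmitted between the computing master nodes of institution 1 and institution 2, whereas computing slave nodes assigned the same sub-dataset label, for example computing slave node 1 of institution 1 and computing slave node 2 of institution 2, exchange no data directly, although they execute the corresponding algorithm agreed in the computing operator. A computing slave node transfers data only with the computing master node of its own institution.
Step S7: after the algorithm logic has been executed on all sub-datasets, all computing slave nodes of the initiator send their computing results to the computing master node, and the initiator computing master node calls the aggregation function in the operator package to aggregate all sub-results into the complete computing result.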
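With the above optional embodiment, the system natively supports scaling of the data volume. Privacy computing algorithm developers do not need to write the data splitting, aggregation, or inter-node data transfer logic in the algorithm flow and can focus on the logic of the algorithm itself. The method supports dynamic upgrading and replacement of privacy computing operators, which simplifies system operation and maintenance when the number of nodes scales to dozens.
Note that the steps shown in the above flow or in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order. For example, the order of steps S1 and S2 may be interchanged.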
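This embodiment further provides a multi-party data processing system for implementing the above embodiments and optional implementations. As used below, the terms "module", "unit", "subunit", and the like may denote a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 6 is a structural block diagram of the multi-party data processing system according to an embodiment of the present application. As shown in FIG. 6, the system comprises an initiator 60 and a participant 64; in practical applications there may be more than one initiator and participant. The initiator 60 comprises an initiator computing master node 62 and initiator computing slave nodes 63, and the participant 64 comprises a participant computing master node 66 and participant computing slave nodes 67:
the initiator 60 and the participant 64 each obtain their respective data sets and computing operators, split the data set according to the computing operator to obtain sub-datasets, and distribute the sub-datasets to the party's own computing slave nodes, namely the initiator computing slave nodes 63 and the participant computing slave nodes 67, each computing slave node executing the corresponding computing logic according to the computing operator;
while the initiator computing slave nodes 63 and the participant computing slave nodes 67 execute the computing logic, the initiator computing slave node 63 obtains a computing result according to the corresponding computing data provided by each participant computing slave node 67 and sends the computing result to the initiator computing master node 62, and the initiator computing master node 62 aggregates the computing results according to the computing operator to obtain aggregated data.
In some embodiments, FIG. 7 is a structural block diagram of a computing node according to an embodiment of the present application. As shown in FIG. 7, the computing nodes comprise computing master nodes and computing slave nodes, and each computing node comprises a scheduler 72, which comprises a virtual machine 74 and a network component 76. The virtual machine 74 is configured to execute the multi-party data processing method of the above embodiments and optional embodiments. The network component 76 is configured for data communication between computing nodes, implementing data exchange, distribution of computing operators, and the like.
In some embodiments, the initiator 60 and the participant 64 split the data set according to the data splitting function to obtain sub-datasets, label the sub-datasets, and distribute them to their own computing slave nodes, namely the initiator computing slave nodes 63 and the participant computing slave nodes 67; the initiator computing slave node 63 and the participant computing slave node 67 that hold the same sub-dataset label execute the corresponding computing logic;
while executing the computing logic, the participant computing slave node 67 sends the computing data to the initiator computing slave node 63 that holds the same sub-dataset label, and the initiator computing slave node 63 obtains a computing result according to the corresponding computing data provided by each participant computing slave node 67 and sends the computing result to the initiator computing master node 62.
In some embodiments, the participant computing slave node 67 sends the computing data and the corresponding sub-dataset label to the participant computing master node 66; the participant computing master node 66 sends the computing data and the corresponding sub-dataset label to the initiator computing master node 62; the initiator computing master node 62 sends the computing data to the corresponding initiator computing slave node 63 according to the sub-dataset label; and the initiator computing slave node 63 obtains a computing result according to the corresponding computing data provided by the participant computing slave nodes.
Note that each of the above modules may be a functional module or a program module and may be implemented in software or in hardware. For modules implemented in hardware, the modules may all be located in the same processor, or they may be located in different processors in any combination. For specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.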
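This embodiment further provides an electronic device comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.
Optionally, the electronic device may further comprise a transmission device and an input/output device, both of which are connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform the following steps through the computer program:
the initiator and the participant each obtain their respective data sets and computing operators;
the data set is split according to the data splitting function in the computing operator to obtain sub-datasets, the sub-datasets are distributed to the party's own computing slave nodes, and each computing slave node executes the corresponding computing logic according to the computing operator;
the initiator computing slave node obtains a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sends the computing result to the initiator computing master node;
the initiator computing master node aggregates the computing results according to the computing operator to obtain aggregated data.
In some embodiments, the processor may further be configured to perform the following steps through the computer program:
the data set is split according to the data splitting function to obtain the sub-datasets, each sub-dataset is labeled with a sub-dataset label, the sub-datasets are distributed to the party's own computing slave nodes, and the initiator computing slave node and the participant computing slave node that hold the same sub-dataset label execute the corresponding computing logic;
the initiator computing slave node obtains a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sends the computing result to the initiator computing master node.
In some embodiments, the processor may further be configured to perform the following steps through the computer program:
the participant computing slave node sends the computing data and the corresponding sub-dataset label to the participant computing master node;
the participant computing master node sends the computing data and the corresponding sub-dataset label to the initiator computing master node, the initiator computing master node sends the computing data to the corresponding initiator computing slave node according to the sub-dataset label, and the initiator computing slave node obtains the computing result according to the corresponding computing data provided by the computing slave nodes of each participant.
In some embodiments, the processor may further be configured to perform the following steps through the computer program:
when the data set is a vector-type data set, it is cut into segments, and the sub-dataset label is the corresponding segment sequence number;
when the data set is a set-type data set, each element in the data set is read, a bucket number is computed by hashing the element and taking the result modulo the number of buckets, and the element is written into the corresponding sub-dataset file; the sub-dataset label is the bucket number.
In some embodiments, the processor may further be configured to perform the following steps through the computer program:
the initiator obtains a computing operator, parses the configuration file in the computing operator, reads the algorithm type and version number therein, and overwrites the existing computing operator if one of the same algorithm type already exists;
the initiator distributes the computing operator to each participant.
In some embodiments, the computing operator comprises an independently set data splitting function, data aggregation function, computing logic, and message type of the computing data.
Note that for specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.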
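In addition, in combination with the multi-party data processing method in the above embodiments, an embodiment of the present application may provide a storage medium for implementing the method. A computer program is stored on the storage medium; when executed by a processor, the computer program implements any one of the multi-party data processing methods in the above embodiments.
Optionally, in this embodiment, the computer program, when executed by a processor, implements the following steps:
the initiator and the participant each obtain their respective data sets and computing operators;
the data set is split according to the data splitting function in the computing operator to obtain sub-datasets, the sub-datasets are distributed to the party's own computing slave nodes, and each computing slave node executes the corresponding computing logic according to the computing operator;
the initiator computing slave node obtains a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sends the computing result to the initiator computing master node;
the initiator computing master node aggregates the computing results according to the computing operator to obtain aggregated data.
In some embodiments, the computer program, when executed by a processor, implements the following steps:
the data set is split according to the data splitting function to obtain the sub-datasets, each sub-dataset is labeled with a sub-dataset label, the sub-datasets are distributed to the party's own computing slave nodes, and the initiator computing slave node and the participant computing slave node that hold the same sub-dataset label execute the corresponding computing logic;
the initiator computing slave node obtains a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sends the computing result to the initiator computing master node.
In some embodiments, the computer program, when executed by a processor, implements the following steps:
the participant computing slave node sends the computing data and the corresponding sub-dataset label to the participant computing master node;
the participant computing master node sends the computing data and the corresponding sub-dataset label to the initiator computing master node, the initiator computing master node sends the computing data to the corresponding initiator computing slave node according to the sub-dataset label, and the initiator computing slave node obtains the computing result according to the corresponding computing data provided by the computing slave nodes of each participant.
In some embodiments, the computer program, when executed by a processor, implements the following steps:
when the data set is a vector-type data set, it is cut into segments, and the sub-dataset label is the corresponding segment sequence number;
when the data set is a set-type data set, each element in the data set is read, a bucket number is computed by hashing the element and taking the result modulo the number of buckets, and the element is written into the corresponding sub-dataset file; the sub-dataset label is the bucket number.
In some embodiments, the computer program, when executed by a processor, implements the following steps:
the initiator obtains a computing operator, parses the configuration file in the computing operator, reads the algorithm type and version number therein, and overwrites the existing computing operator if one of the same algorithm type already exists;
the initiator distributes the computing operator to each participant.
In some embodiments, the computing operator comprises an independently set data splitting function, data aggregation function, computing logic, and message type of the computing data.
Those skilled in the art should understand that the technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be regarded as within the scope of this specification.
The above embodiments express only several implementations of the present application. Although their description is specific and detailed, it should not be construed as limiting the scope of the patent application. A person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.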

Claims (15)

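  1. A multi-party data processing method, characterized by comprising:
    obtaining, by an initiator and a participant, their respective data sets and computing operators;
    splitting the data set according to a data splitting function in the computing operator to obtain sub-datasets, and distributing the sub-datasets to the party's own computing slave nodes, each computing slave node executing the corresponding computing logic according to the computing operator;
    obtaining, by an initiator computing slave node, a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sending the computing result to the initiator computing master node;
    aggregating, by the initiator computing master node, the computing results according to the computing operator to obtain aggregated data.
  2. The multi-party data processing method according to claim 1, wherein the method further comprises:
    splitting the data set according to the data splitting function to obtain the sub-datasets and labeling each with a sub-dataset label, distributing the sub-datasets to the party's own computing slave nodes, and executing the corresponding computing logic between the initiator computing slave node and the participant computing slave node that hold the same sub-dataset label;
    obtaining, by the initiator computing slave node, a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sending the computing result to the initiator computing master node.
  3. The multi-party data processing method according to claim 2, wherein obtaining, by the initiator computing slave node, the computing result according to the corresponding computing data provided by the computing slave nodes of each participant comprises:
    sending, by the participant computing slave node, the computing data and the corresponding sub-dataset label to a participant computing master node;
    sending, by the participant computing master node, the computing data and the corresponding sub-dataset label to the initiator computing master node; sending, by the initiator computing master node, the computing data to the corresponding initiator computing slave node according to the sub-dataset label; and obtaining, by the initiator computing slave node, the computing result according to the corresponding computing data provided by the computing slave nodes of each participant.
  4. The multi-party data processing method according to claim 2, wherein splitting the data set according to the data splitting function in the computing operator to obtain the sub-datasets and labeling the sub-datasets comprises:
    when the data set is a vector-type data set, cutting it into segments, the sub-dataset label comprising the corresponding segment sequence number;
    when the data set is a set-type data set, reading each element in the data set, computing a bucket number by hashing the element and taking the result modulo the number of buckets, and writing the element into the corresponding sub-dataset file, the sub-dataset label comprising the bucket number.
  5. The multi-party data processing method according to claim 2, wherein obtaining, by the initiator and the participant, their respective computing operators comprises:
    obtaining, by the initiator, a computing operator, parsing the configuration file in the computing operator, reading the algorithm type and version number therein, and overwriting the existing computing operator if one of the same algorithm type already exists;
    distributing, by the initiator, the computing operator to each participant.
  6. The multi-party data processing method according to claim 1, wherein the computing operator comprises an independently set data splitting function, data aggregation function, computing logic, and message type of the computing data.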
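  7. A multi-party data processing system, comprising an initiator and a participant, the initiator comprising an initiator computing master node and initiator computing slave nodes, and the participant comprising a participant computing master node and participant computing slave nodes, wherein:
    the initiator and the participant each obtain their respective data sets and computing operators, split the data set according to the computing operator to obtain sub-datasets, and distribute the sub-datasets to the party's own computing slave nodes, each computing slave node executing the corresponding computing logic according to the computing operator;
    the initiator computing slave node obtains a computing result according to the corresponding computing data provided by the computing slave nodes of each participant and sends the computing result to the initiator computing master node, and the initiator computing master node aggregates the computing results according to the computing operator to obtain aggregated data.
  8. The multi-party data processing system according to claim 7, wherein the computing nodes comprise computing master nodes and computing slave nodes, each computing node comprises a scheduler, and the scheduler comprises a virtual machine and a network component;
    the virtual machine is configured to execute the multi-party data processing method according to any one of claims 1 to 6;
    the network component is configured for data communication between the computing nodes.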
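  9. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to perform the multi-party data processing method according to claim 1.
  10. The electronic device according to claim 9, wherein the processor is configured to run the computer program to perform:
    splitting the data set according to the data splitting function to obtain the sub-datasets and labeling each with a sub-dataset label, distributing the sub-datasets to the party's own computing slave nodes, and executing the corresponding computing logic between the initiator computing slave node and the participant computing slave node that hold the same sub-dataset label;
    obtaining, by the initiator computing slave node, a computing result according to the corresponding computing data provided by the computing slave nodes of each participant, and sending the computing result to the initiator computing master node.
  11. The electronic device according to claim 10, wherein the processor is configured to run the computer program to perform:
    sending, by the participant computing slave node, the computing data and the corresponding sub-dataset label to a participant computing master node;
    sending, by the participant computing master node, the computing data and the corresponding sub-dataset label to the initiator computing master node; sending, by the initiator computing master node, the computing data to the corresponding initiator computing slave node according to the sub-dataset label; and obtaining, by the initiator computing slave node, the computing result according to the corresponding computing data provided by the computing slave nodes of each participant.
  12. The electronic device according to claim 10, wherein the processor is configured to run the computer program to perform:
    when the data set is a vector-type data set, cutting it into segments, the sub-dataset label being the corresponding segment sequence number;
    when the data set is a set-type data set, reading each element in the data set, computing a bucket number by hashing the element and taking the result modulo the number of buckets, and writing the element into the corresponding sub-dataset file, the sub-dataset label being the bucket number.
  13. The electronic device according to claim 10, wherein the processor is configured to run the computer program to perform:
    obtaining, by the initiator, a computing operator, parsing the configuration file in the computing operator, reading the algorithm type and version number therein, and overwriting the existing computing operator if one of the same algorithm type already exists;
    distributing, by the initiator, the computing operator to each participant.
  14. The electronic device according to claim 9, wherein the computing operator comprises an independently set data splitting function, data aggregation function, computing logic, and message type of the computing data.
  15. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to perform, when run, the multi-party data processing method according to claim 1.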
PCT/CN2022/138422 2021-12-28 2022-12-12 Multi-party data processing method, system, electronic device and storage medium WO2023124945A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111631336.4A CN114296922A (zh) 2021-12-28 2021-12-28 Multi-party data processing method, system, electronic device and storage medium
CN202111631336.4 2021-12-28

Publications (1)

Publication Number Publication Date
WO2023124945A1 true WO2023124945A1 (zh) 2023-07-06

Family

ID=80972193

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138422 WO2023124945A1 (zh) 2021-12-28 2022-12-12 Multi-party data processing method, system, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114296922A (zh)
WO (1) WO2023124945A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252676A (zh) * 2023-11-20 2023-12-19 成都新希望金融信息有限公司 Service processing method and apparatus, electronic device, and indicator policy system

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
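CN114296922A (zh) * 2021-12-28 2022-04-08 杭州趣链科技有限公司 Multi-party data processing method, system, electronic device and storage medium
CN114884709B (zh) * 2022-04-25 2024-01-23 北京原语科技有限公司 Data conversion method for a secure multi-party computation protocol
CN115994161B (zh) * 2023-03-21 2023-06-06 杭州金智塔科技有限公司 Data aggregation system and method based on secure multi-party computation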

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395642A (zh) * 2020-11-20 2021-02-23 湖南智慧政务区块链科技有限公司 Secure multi-party privacy computing method, apparatus, device and storage medium
CN113239403A (zh) * 2021-06-03 2021-08-10 光大科技有限公司 Data sharing method and apparatus
CN113472538A (zh) * 2021-09-02 2021-10-01 富算科技(上海)有限公司 Method, apparatus, device and medium for detecting result privacy in secure multi-party computation
US20220029971A1 (en) * 2019-12-13 2022-01-27 TripleBlind, Inc. Systems and Methods for Providing a Modified Loss Function in Federated-Split Learning
CN114296922A (zh) * 2021-12-28 2022-04-08 杭州趣链科技有限公司 Multi-party data processing method, system, electronic device and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
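CN117252676A (zh) * 2023-11-20 2023-12-19 成都新希望金融信息有限公司 Service processing method and apparatus, electronic device, and indicator policy system
CN117252676B (zh) * 2023-11-20 2024-02-02 成都新希望金融信息有限公司 Service processing method and apparatus, electronic device, and indicator policy system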

Also Published As

Publication number Publication date
CN114296922A (zh) 2022-04-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22914200

Country of ref document: EP

Kind code of ref document: A1