WO2021051629A1 - Federated learning privacy data processing method, device, system and storage medium - Google Patents

Federated learning privacy data processing method, device, system and storage medium

Info

Publication number
WO2021051629A1
WO2021051629A1 (PCT/CN2019/119237)
Authority
WO
WIPO (PCT)
Prior art keywords
mask
model parameter
parameter update
update
masked
Prior art date
Application number
PCT/CN2019/119237
Other languages
English (en)
French (fr)
Inventor
程勇
刘洋
陈天健
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021051629A1 publication Critical patent/WO2021051629A1/zh


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • This application relates to the field of data processing technology, and in particular to a federated learning privacy data processing method, device, system, and storage medium.
  • In practical applications of horizontal federated learning, the local model parameter updates (for example, neural network model weights or gradient information) sent by the participants to the coordination device can be obtained by the coordinator.
  • Since the reliability of the coordinator cannot be guaranteed, the participants' private data information and the trained machine learning model may be leaked to the coordinator.
  • To prevent this, the participants can use encryption methods, for example homomorphic encryption technology, secret sharing technology, or differential privacy technology, to send model parameter updates to the coordinator; if the coordinator cannot decrypt them, it cannot obtain the model weights or gradient information, thereby ensuring that no information is leaked to the coordinator.
  • However, the use of encryption technology significantly increases the length of the information that needs to be transmitted.
  • The length of the ciphertext obtained (measured in bits) is at least twice the length of the plaintext; that is, encryption at least doubles the communication bandwidth requirement compared with no encryption.
  • In some scenarios, communication bandwidth is severely limited, and the additional bandwidth required by the participants' encryption operations may not be available, or will at least significantly increase the communication delay.
  • The main purpose of this application is to provide a federated learning privacy data processing method, device, system, and storage medium, aiming to implement a security mechanism so that the participants' information is not leaked to the coordinator and the communication bandwidth requirement is not significantly increased.
  • To achieve the above purpose, this application provides a federated learning privacy data processing method.
  • The federated learning privacy data processing method is applied to a coordinating device.
  • The coordinating device includes a trusted execution environment (TEE) module, and the coordinating device communicates with multiple participating devices.
  • The federated learning privacy data processing method includes the following steps:
  • receiving the masked model parameter updates sent by each participating device, where each participating device adds a mask to the model parameter update obtained by its training based on the first mask it generates, obtaining each masked model parameter update;
  • in the TEE module, generating a second mask that is the same as the first mask, and removing the mask from each masked model parameter update based on the second mask to obtain each model parameter update;
  • The step of generating a second mask that is the same as the first mask, and removing the mask from each masked model parameter update based on the second mask to obtain each model parameter update, includes:
  • using the first preset mask generator to generate the second mask;
  • removing the mask from each masked model parameter update based on the second mask to obtain each model parameter update,
  • where each participating device uses its local second preset mask generator to generate the first mask at least according to the iteration index of this model update, and the first preset mask generator is the same as the second preset mask generator.
  • Alternatively, the step of generating a second mask that is the same as the first mask, and removing the mask from each masked model parameter update based on the second mask to obtain each model parameter update, includes:
  • using the first preset mask generator to generate a second mask corresponding to each participating device;
  • removing the mask from each masked model parameter update sent by each participating device based on the second mask corresponding to that participating device, to obtain each model parameter update,
  • where each participating device uses its local second preset mask generator to generate its first mask at least according to the iteration index of this model update and its own device number, and the first preset mask generator is the same as the second preset mask generator.
  • in the TEE module, fusing each model parameter update to obtain a global model parameter update;
  • The step of using the generated third mask to add a mask to the global model parameter update to obtain the masked global model parameter update includes:
  • completing the third mask by a preset completion method, and adding the completed third mask to the global model parameter update to obtain the masked global model parameter update, where the length of the completed third mask is the same as the length of the global model parameter update.
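One simple completion method is to repeat the short mask cyclically until it covers the parameter update; this is an illustrative assumption, since the text only requires that some preset method bring the mask to the needed length:

```python
def complete_mask(mask, target_length):
    """Extend a short mask so its length equals the parameter update's length.

    Cyclic repetition is one illustrative preset completion method; the
    patent leaves the concrete method as a design choice.
    """
    if len(mask) >= target_length:
        return mask[:target_length]
    repeats = -(-target_length // len(mask))  # ceiling division
    return (mask * repeats)[:target_length]

# A length-2 mask completed to cover a length-5 global model parameter update.
assert complete_mask([5, 9], 5) == [5, 9, 5, 9, 5]
```

Because the completed mask is derived deterministically from the short mask, both sides can perform the same completion locally without transmitting the extended mask.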
  • the method further includes:
  • if it is detected that the model to be trained converges, ending the training of the model to be trained; or, if the number of iterations reaches the preset maximum number of iterations, ending the training of the model to be trained; or, if the training time reaches the preset maximum training time, ending the training of the model to be trained.
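The three stopping conditions can be checked together; in the sketch below the function name and the default thresholds are illustrative placeholders, since the text only requires preset maxima:

```python
import time

def should_stop(converged, iteration, start_time,
                max_iterations=100, max_seconds=3600.0):
    """True if training should end: convergence, iteration cap, or time cap."""
    return (
        converged
        or iteration >= max_iterations
        or (time.monotonic() - start_time) >= max_seconds
    )

start = time.monotonic()
assert should_stop(True, 1, start)        # convergence detected
assert should_stop(False, 100, start)     # preset maximum iterations reached
assert not should_stop(False, 1, start)   # otherwise, keep training
```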
  • This application also provides a federated learning privacy data processing method.
  • The federated learning privacy data processing method is applied to participating devices, and the participating devices are in communication connection with the coordination device.
  • The federated learning privacy data processing method includes the following steps:
  • adding a mask to the model parameter update using the locally generated first mask of this model update, obtaining the masked model parameter update, and sending it to the coordination device, where the coordination device includes a trusted execution environment (TEE) module;
  • where the masked global model parameter update is obtained as follows: after the coordinating device receives the masked model parameter updates sent by each participating device, its TEE module generates second masks that are the same as the first masks of each participating device for this model update, removes the masks from each masked model parameter update based on the second masks to obtain each model parameter update, fuses each model parameter update to obtain the global model parameter update, and uses the generated third mask to add a mask to the global model parameter update, obtaining the masked global model parameter update of this model update.
  • Alternatively, the masked global model parameter update is obtained as follows: the coordinating device receives the masked model parameter updates sent by each participating device and merges the masked model parameter updates to obtain the masked global model parameter update.
  • In addition, the present application also provides a device; the device is a coordination device and includes: a memory, a processor, and a federated learning privacy data processing program stored in the memory and runnable on the processor; when the federated learning privacy data processing program is executed by the processor, the steps of the above federated learning privacy data processing method are implemented.
  • In addition, this application also provides a device; the device is a participating device and includes: a memory, a processor, and a federated learning privacy data processing program stored in the memory and runnable on the processor; when the federated learning privacy data processing program is executed by the processor, the steps of the above federated learning privacy data processing method are implemented.
  • In addition, this application also provides a federated learning privacy data processing system, which includes: at least one coordination device as described above and at least one participating device as described above.
  • In addition, this application also proposes a computer-readable storage medium, with a federated learning privacy data processing program stored on the computer-readable storage medium; when the federated learning privacy data processing program is executed by a processor, the steps of the above federated learning privacy data processing method are implemented.
  • In this application, each participating device adds a mask to the model parameter update obtained by its training based on the first mask it generates, obtaining its masked model parameter update; the coordination device receives the masked model parameter updates sent by each participating device.
  • In the TEE module, the coordination device generates second masks that are the same as the first masks and removes the masks from each masked model parameter update based on the second masks to obtain each model parameter update; still in the TEE module, the model parameter updates are fused to obtain the global model parameter update, and the generated third mask is used to add a mask to the global model parameter update, obtaining the masked global model parameter update.
  • The masked global model parameter update is sent to each participating device, so that each participating device removes the mask from it based on a locally generated fourth mask that is the same as the third mask, obtaining the global model parameter update.
  • In this way, the coordination device cannot obtain the model parameter updates or the global model parameter update of each participating device, yet it can obtain the participating devices' model parameter updates inside the TEE module and perform the fusion operation there, realizing the federated learning model update process without revealing private data to the coordinating device. Through the masking technology, the model parameter updates and the global model parameter update can be transmitted securely without increasing the communication bandwidth requirement. Moreover, because the coordinating device and the participating devices each generate their masks locally, the masks used for adding and removing masks are guaranteed to be the same, so no additional communication overhead is needed between a participating device and the coordinating device, between participating devices, or between a participating device and a third-party server to negotiate mask consistency; especially in the scenario where the mask is replaced in each model update, this greatly reduces communication overhead and power consumption.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in a solution of an embodiment of the present application
  • Figure 2 is a schematic flowchart of the first embodiment of the federated learning privacy data processing method of this application;
  • FIG. 3 is a schematic diagram of visible content in a coordination device according to an embodiment of the application.
  • FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application.
  • the device in the embodiment of the present application is a coordination device
  • the coordination device may be devices such as a smart phone, a personal computer, and a server, and there is no specific limitation here.
  • the device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as a magnetic disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • The structure shown in FIG. 1 does not constitute a limitation on the device; the device may include more or fewer components than shown in the figure, or combine certain components, or use a different component arrangement.
  • As a computer storage medium, the memory 1005 may include an operating system, a network communication module, a user interface module, a federated learning privacy data processing program, and a TEE (Trusted Execution Environment) module.
  • The operating system is a program that manages and controls the device's hardware and software resources, and supports the operation of the federated learning privacy data processing program and other software or programs.
  • A TEE is a secure area within the main processor; it runs in an isolated environment in parallel with the operating system and ensures that the confidentiality and integrity of the code and data loaded in the TEE are protected.
  • Trusted applications running in the TEE can access all the functions of the device's main processor and memory, and hardware isolation protects these components from the user-installed applications running in the main operating system.
  • The TEE module can be implemented in many ways, such as Intel's Software Guard Extensions (SGX), AMD's Secure Encrypted Virtualization (SEV), ARM's TrustZone, or MIT's Sanctum.
  • The attestation and authentication of the TEE module can be done through a third-party security server. For example, when the TEE uses Intel's SGX, the TEE can be attested by the Intel security server, thereby guaranteeing the security of the TEE.
  • In the device shown in FIG. 1, the user interface 1003 is mainly used to communicate with the client; the network interface 1004 is mainly used to establish communication connections with the participating devices; and the processor 1001 can be used to call the federated learning privacy data processing program stored in the memory 1005 and perform the following operations:
  • receiving the masked model parameter updates sent by each participating device, where each participating device adds a mask to the model parameter update obtained by its training based on the first mask it generates, obtaining each masked model parameter update;
  • in the TEE module, generating a second mask that is the same as the first mask, and removing the mask from each masked model parameter update based on the second mask to obtain each model parameter update;
  • The step of generating a second mask that is the same as the first mask, and removing the mask from each masked model parameter update based on the second mask to obtain each model parameter update, includes:
  • using the first preset mask generator to generate the second mask;
  • removing the mask from each masked model parameter update based on the second mask to obtain each model parameter update,
  • where each participating device uses its local second preset mask generator to generate the first mask at least according to the iteration index of this model update, and the first preset mask generator is the same as the second preset mask generator.
  • Alternatively, the step of generating a second mask that is the same as the first mask, and removing the mask from each masked model parameter update based on the second mask to obtain each model parameter update, includes:
  • using the first preset mask generator to generate a second mask corresponding to each participating device;
  • removing the mask from each masked model parameter update sent by each participating device based on the second mask corresponding to that participating device, to obtain each model parameter update,
  • where each participating device uses its local second preset mask generator to generate its first mask at least according to the iteration index of this model update and its own device number, and the first preset mask generator is the same as the second preset mask generator.
  • in the TEE module, fusing each model parameter update to obtain a global model parameter update;
  • The step of using the generated third mask to add a mask to the global model parameter update to obtain the masked global model parameter update includes:
  • completing the third mask by a preset completion method, and adding the completed third mask to the global model parameter update to obtain the masked global model parameter update, where the length of the completed third mask is the same as the length of the global model parameter update.
  • Further, the processor 1001 may also be used to call the federated learning privacy data processing program stored in the memory 1005 and execute the following steps:
  • if it is detected that the model to be trained converges, ending the training of the model to be trained; or, if the number of iterations reaches the preset maximum number of iterations, ending the training of the model to be trained; or, if the training time reaches the preset maximum training time, ending the training of the model to be trained.
  • In addition, an embodiment of the present application also proposes a participating device in communication connection with a coordination device; the participating device includes: a memory, a processor, and a federated learning privacy data processing program stored on the memory and runnable on the processor; when the federated learning privacy data processing program is executed by the processor, the following steps of the federated learning privacy data processing method are implemented:
  • adding a mask to the model parameter update using the locally generated first mask of this model update, obtaining the masked model parameter update, and sending it to the coordination device, where the coordination device includes a trusted execution environment (TEE) module;
  • where the masked global model parameter update is obtained as follows: after the coordinating device receives the masked model parameter updates sent by each participating device, its TEE module generates second masks that are the same as the first masks of each participating device for this model update, removes the masks from each masked model parameter update based on the second masks to obtain each model parameter update, fuses each model parameter update to obtain the global model parameter update, and uses the generated third mask to add a mask to the global model parameter update, obtaining the masked global model parameter update of this model update.
  • Alternatively, the masked global model parameter update is obtained as follows: the coordinating device receives the masked model parameter updates sent by each participating device and merges the masked model parameter updates to obtain the masked global model parameter update.
  • In addition, an embodiment of the present application also proposes a federated learning privacy data processing system, which includes at least one coordination device as described above and at least one participating device as described above.
  • In addition, an embodiment of the present application also proposes a computer-readable storage medium; the storage medium stores a federated learning privacy data processing program, and when the federated learning privacy data processing program is executed by a processor, the steps of the federated learning privacy data processing method are implemented.
  • Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the federated learning privacy data processing method of this application.
  • The embodiment of this application provides an embodiment of the federated learning privacy data processing method. It should be noted that although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.
  • The federated learning privacy data processing method in the first embodiment of this application is applied to a coordination device.
  • the coordination device communicates with multiple participating devices.
  • the coordination device includes a TEE module.
  • The coordination device and the participating devices in the embodiment of this application can be smart phones, personal computers, or other devices that support the training of the federated learning model; there is no specific restriction here.
  • The federated learning privacy data processing method includes:
  • Step S10: receiving the masked model parameter updates sent by each participating device, where each participating device adds a mask to the model parameter update obtained by its training based on the first mask it generates, to obtain each masked model parameter update;
  • In this embodiment, mask technology is used to perform security processing on data; a mask is also called a perturbation.
  • the mask can be a vector, the elements of the vector can be one or more, the element type can be integer or floating point, and the mask can be randomly generated, that is, each element in the vector is randomly generated.
  • The process of adding a mask to data can be: for a target vector to be masked (when the number of elements in the target vector is the same as the number of elements in the mask, that is, when the length of the target vector is the same as the length of the mask), for each element of the target vector, add (or subtract) the element at the corresponding position in the mask, obtaining the masked target vector.
  • The process of removing the mask from data can be: for a masked target vector, subtract (or add) the element at the corresponding position in the mask from each element of the vector, obtaining the original target vector. After the masked target vector has its mask removed, the original target vector is recovered, and adding a mask to the target vector does not increase the length of the target vector. When only the masked target vector is obtained, the original target vector cannot be known, thereby ensuring the security of the data.
  • the above operations of adding and removing masks may also include modulo operations.
  • the modulo operation can ensure that the result of the operation stays in a finite integer domain.
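The add-mask and remove-mask operations described above, including the optional modulo step, can be sketched as follows; this is a minimal illustration, and the names `add_mask`, `remove_mask`, and the modulus value are assumptions, not taken from the patent:

```python
MODULUS = 2 ** 32  # the modulo operation keeps results in a finite integer domain

def add_mask(target, mask):
    """Element-wise add the mask to the target vector, modulo MODULUS."""
    assert len(target) == len(mask), "vector and mask lengths must match"
    return [(t + m) % MODULUS for t, m in zip(target, mask)]

def remove_mask(masked, mask):
    """Element-wise subtract the mask, modulo MODULUS, restoring the original."""
    assert len(masked) == len(mask), "vector and mask lengths must match"
    return [(x - m) % MODULUS for x, m in zip(masked, mask)]

update = [17, 4096, 31]              # a (quantized) model parameter update
mask = [987654321, 123456789, 555]
masked = add_mask(update, mask)
assert remove_mask(masked, mask) == update  # round trip restores the original
assert len(masked) == len(update)           # masking does not change the length
```

Note that the masked vector has the same length as the original, which is why masking, unlike encryption, does not inflate the communication bandwidth requirement.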
  • the coordination device and each participating device may establish a communication connection in advance through handshake and identity authentication, and determine the model to be trained for this federated learning.
  • the model to be trained may be a machine learning model, such as a neural network model.
  • The coordination device and the participating devices cooperate to perform multiple iterations of model updates until the model to be trained finally converges, at which point the training process of the model to be trained can be ended.
  • In a model update, each participating device performs local training on the model to be trained based on the global model parameter update of this model update and its own local training data to obtain its respective local model parameter update; each then generates its own first mask, adds the mask to its respective model parameter update to obtain the masked model parameter update, and sends the masked model parameter update to the coordination device.
  • The model parameter update can be the weight parameters of the connections between nodes of a neural network, or it can be the gradient information of the federated learning model, for example the gradient information in a neural network gradient descent algorithm; the gradient information can be gradient values or compressed gradient values.
  • The model parameter update is a vector that includes multiple elements. For example, when the model parameter update consists of weight parameters, the elements of the vector are the weight parameters, and the number of elements in the vector is the length of the model parameter update. Each participating device can generate a different first mask for each model update, and the first masks generated by the participating devices can be the same as or different from one another; each participating device can generate its own first mask through a preset mask generation method.
  • The preset mask generation method can be set in advance as needed, for example using a mask generator, which can be a commonly used pseudo-random number generator, such as ANSI X9.17 or a linear-congruential-style pseudo-random number generator, or a generator that produces a random mask according to a specific distribution, for example a random mask that conforms to a Gaussian distribution.
  • The lengths of the first masks generated by the participating devices can be the same or different, and the length of each participating device's first mask can be preset; the length may be less than or equal to the length of the model parameter update, so as to reduce the computational complexity of generating the mask.
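Since both sides hold the same preset mask generator, a mask can be regenerated from the iteration index (and, where needed, the device number) without ever being transmitted. A minimal sketch, using Python's built-in PRNG as a stand-in for a generator such as ANSI X9.17; the seed-mixing formula and all names are illustrative assumptions:

```python
import random

def generate_mask(iteration_index, device_id, length, modulus=2 ** 32):
    """Deterministically derive a mask from the iteration index and device number.

    Python's Mersenne Twister stands in for the preset pseudo-random number
    generator; the seed-mixing formula below is an assumption for illustration.
    """
    rng = random.Random(iteration_index * 1_000_003 + device_id)
    return [rng.randrange(modulus) for _ in range(length)]

# Participant and coordinator TEE run the same generator independently and
# obtain identical masks -- no negotiation traffic is needed.
first_mask = generate_mask(iteration_index=7, device_id=3, length=5)
second_mask = generate_mask(iteration_index=7, device_id=3, length=5)
assert first_mask == second_mask
# A different iteration index yields a fresh mask for the next model update.
assert generate_mask(8, 3, 5) != first_mask
```

In production, a cryptographically secure generator would replace the Mersenne Twister; the point here is only the shared-seed mechanism that keeps the first and second masks identical.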
  • the coordination device receives masked model parameter updates sent by each participating device.
  • Step S20: in the TEE module, generating a second mask that is the same as the first mask, and removing the mask from each masked model parameter update based on the second mask, to obtain each model parameter update;
  • The coordination device generates, in the TEE module, a second mask that is the same as the first mask, and removes the mask from each masked model parameter update based on the second mask to obtain each model parameter update. It should be noted that if the first masks generated by the participating devices are the same, the coordinating device generates one second mask that is the same as that first mask; if the first masks generated by the participating devices are different, the coordination device generates multiple second masks corresponding to the first masks of the respective participating devices.
  • If a participating device generates a different first mask for each model update, that is, the first mask generated in its last model update is different from the first mask generated in this model update, then the coordinating device generates a second mask that is the same as the first mask generated in this model update of that participating device.
  • the TEE module of the coordinating device can be preset with the same mask generation mode as each participating device, so that the second mask generated by the coordinating device in the TEE module is the same as the first mask.
  • Because the coordination device uses, in the TEE module, a second mask that is the same as the first mask to perform the mask removal operation on the masked model parameter updates, the original model parameter updates of the participating devices can be restored.
  • Because the mask removal operation is performed in the TEE module of the coordination device, the model parameter updates obtained by removing the masks are visible only inside the TEE module; the coordination device itself can only obtain the masked model parameter updates and cannot obtain the participating devices' model parameter updates, so the privacy of the participating devices cannot be stolen and will not be leaked to the coordinating device.
  • Step S30: in the TEE module, fusing each model parameter update to obtain the global model parameter update, and using the generated third mask to add a mask to the global model parameter update to obtain the masked global model parameter update;
  • the coordination device integrates the model parameter updates to obtain a global model parameter update, and generates a third mask.
  • the third mask is used to add a mask to the global model parameter update to obtain a masked global model parameter update.
  • fusing each model parameter update to obtain a global model parameter update may be performed by fusing each model parameter update through a fusion function, and the fusion function may be a function for performing a weighted average operation.
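A weighted-average fusion function of the kind mentioned above might look like this; the function name and the sample-count weighting are illustrative assumptions:

```python
def fuse_updates(updates, weights=None):
    """Fuse per-participant model parameter updates by weighted average.

    `updates` is a list of equal-length vectors; `weights` could be, for
    example, each participant's local sample count (an assumption here).
    """
    if weights is None:
        weights = [1.0] * len(updates)
    total = sum(weights)
    return [
        sum(w * u[i] for w, u in zip(weights, updates)) / total
        for i in range(len(updates[0]))
    ]

# Two participants; the second carries twice the weight.
global_update = fuse_updates([[1.0, 2.0], [4.0, 8.0]], weights=[1, 2])
assert global_update == [3.0, 6.0]
```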
  • The coordination device can generate a different third mask in each model update. If a different third mask is generated in each model update, then, because the global model parameter update obtained by the fusion is used for the next model update, the third mask generated by the coordination device corresponds to the next model update.
  • the TEE module of the coordination device can use the same mask generation method as the second mask to generate the third mask; the length of the third mask can also be preset, which can be the same as the length of the first mask, or It may be different. Similarly, in order to reduce the computational complexity of generating the mask, the length of the third mask may be less than or equal to the length of the global model parameter update.
  • Step S40: sending the masked global model parameter update to each participating device, so that each participating device removes the mask from the masked global model parameter update based on a locally generated fourth mask that is the same as the third mask, to obtain the global model parameter update.
  • the coordination device obtains the masked global model parameter update from the TEE module, and sends the masked global model parameter update to each participating device. If the coordination device detects convergence of the model to be trained in this model update, it can send the masked global model parameter update to each participating device, so that each participating device can determine the model to be trained based on the masked global model parameter update The final parameters of the end of this federation study.
• the coordination device can send the masked global model parameter update to each participating device, and each participating device performs the next model update according to the masked global model parameter update;
• after each participating device receives the masked global model parameter update sent by the coordinating device, each locally generates a fourth mask that is the same as the third mask of the coordinating device, and uses the fourth mask to perform a mask-removal operation on the masked global model parameter update to obtain the global model parameter update.
• the mask generation method can be preset in each participating device and is the same as the method used to generate the third mask in the coordinating device, so that the fourth mask generated by each participating device is the same as the third mask generated by the coordinating device.
• because the participating device uses a fourth mask that is the same as the third mask of the coordinating device to remove the mask from the masked global model parameter update, the participating device can recover the original global model parameter update produced in the TEE module of the coordinating device, ensuring that the participating devices obtain accurate global model parameter updates without data deviation. Moreover, because the coordination device only obtains the masked global model parameter update from the TEE module, while the original global model parameter update is visible only inside the TEE module, the coordination device cannot learn the original global model parameter update and thus cannot steal the private data of any participating device.
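The add-mask/remove-mask round trip described above can be sketched as follows. Additive masking and a shared deterministic generator are illustrative assumptions; the text does not fix the masking operation or the generator:

```python
import random

def gen_mask(seed, length):
    # Both sides run the same generator with the same seed, so the mask
    # never has to be transmitted. Integer mask values keep the
    # add/remove round trip exact in floating point.
    rng = random.Random(seed)
    return [rng.randint(-10**6, 10**6) for _ in range(length)]

def add_mask(update, mask):
    return [u + m for u, m in zip(update, mask)]

def remove_mask(masked, mask):
    return [v - m for v, m in zip(masked, mask)]

update = [0.5, -1.25, 3.0]                # a device's local update
first_mask = gen_mask(seed=7, length=3)   # generated on the participant
masked = add_mask(update, first_mask)     # all the coordinator ever sees
second_mask = gen_mask(seed=7, length=3)  # regenerated inside the TEE module
assert remove_mask(masked, second_mask) == update  # original recovered
```

The asserts show the key property: the mask cancels exactly, yet the transmitted vector `masked` alone reveals nothing about `update` to a party that cannot reproduce the mask.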
• each participating device adds a mask to the model parameter update obtained by its local training, based on its locally generated first mask, to obtain its masked model parameter update; the coordination device receives the masked model parameter updates sent by the participating devices;
• in the TEE module, the coordination device generates a second mask that is the same as the first mask, and removes the mask from each masked model parameter update based on the second mask to obtain each model parameter update;
• the model parameter updates are fused to obtain the global model parameter update, and the generated third mask is used to add a mask to the global model parameter update to obtain the masked global model parameter update; the masked global model parameter update is sent to each participating device, so that each participating device removes the mask from it based on a locally generated fourth mask that is the same as the third mask, obtaining the global model parameter update;
• in this way, the coordination device itself cannot obtain the model parameter updates of the participating devices or the global model parameter update, yet its TEE module can obtain the model parameter updates and perform the fusion operation. This realizes the model update process of federated learning without leaking privacy to the coordinating device. Through the masking technique, the model parameter updates and the global model parameter update can be transmitted safely without increasing communication bandwidth requirements. And because the coordinating device and the participating devices each generate their masks locally, the masks used for the add-mask and remove-mask operations are guaranteed to be the same, so no additional communication overhead is needed between a participating device and the coordinating device, between participating devices, or between a participating device and a third-party server to negotiate mask consistency; especially in the scenario where the mask is replaced in each model update, this greatly reduces communication overhead and power consumption.
  • the coordination device determines whether the model to be trained has converged according to the global model parameter update
  • the operation of the coordination device to determine whether the model to be trained has converged is also performed in the TEE module. Specifically, after step S30, it further includes:
• Step S301: in the TEE module, judge whether the model to be trained for federated learning has converged according to the global model parameter update;
• Step S302: if the model to be trained has converged, terminate its training; alternatively, terminate the training if the number of iterations reaches the preset maximum number of iterations, or if the training time reaches the maximum training time.
• after the coordination device obtains the global model parameter update in the TEE module, it continues, in the TEE module, to judge whether the model to be trained has converged according to the global model parameter update. Specifically, it can determine whether the difference between the global model parameter update obtained in this model update and the one obtained in the previous model update is less than a preset difference; if it is less than the preset difference, the model to be trained is determined to have converged, otherwise it is determined not to have converged.
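A hedged sketch of such a convergence check. The metric (here the largest element-wise change) and the threshold value are illustrative assumptions; the text only requires comparing the change against a preset difference:

```python
def has_converged(current_update, previous_update, preset_difference=1e-4):
    # Converged when every element of the global model parameter update
    # changed by less than the preset difference since the last update.
    return all(abs(c - p) < preset_difference
               for c, p in zip(current_update, previous_update))

print(has_converged([0.5, 1.0], [0.50005, 1.00002]))  # True: tiny change
print(has_converged([0.5, 1.0], [0.6, 1.0]))          # False: large change
```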
• if it is determined that the model to be trained has converged, the coordination device can end its training, that is, send the masked global model parameter update obtained in the TEE module in this model update to each participating device as the final parameters of the model to be trained. If it is determined that the model to be trained has not converged, the coordination device sends the masked global model parameter update obtained in the TEE module in this model update to each participating device as the starting point of a new model update, and each participating device performs the new model update according to the masked global model parameter update. The loop iterates until the coordination device determines in the TEE module that the model to be trained has converged.
• alternatively, if the coordination device detects in the TEE module that the number of iterations has reached the preset maximum number of iterations, or that the training time has reached the maximum training time, the training of the model to be trained is terminated.
• because the coordination device judges whether the model to be trained has converged according to the global model parameter update inside the TEE module, and the global model parameter update is visible only in the TEE module, the coordination device cannot learn the global model parameter update. This ensures that the private data of the participating devices is not leaked to the coordinating device, while the normal progress of federated learning is still guaranteed.
  • the model to be trained may be a neural network model for credit risk estimation.
  • the input of the neural network model may be user characteristic data
  • the output may be the risk score of the user
• the participating devices may be the devices of multiple banks, each locally holding sample data of multiple users, and the coordination device may be a third-party server independent of the banks.
  • the coordination device and each participating device perform training of the model to be trained according to the process of federated learning in the foregoing embodiment, and obtain a neural network model that is finally converged and used for credit risk estimation.
  • Each bank can use the trained neural network model to estimate the user's credit risk, and input the user's characteristic data into the trained model to obtain the user's risk score.
• during the federated learning process, through the combination of masking technology and TEE technology, the coordination device cannot obtain the user privacy data of any bank. The masking technology allows the model parameter updates and the global model parameter update to be transmitted safely without increasing communication bandwidth requirements, reducing each bank's equipment deployment cost. Moreover, because the coordinating device and the participating devices each generate their masks locally, the masks used for the add-mask and remove-mask operations are guaranteed to be the same, so the bank devices and the coordination device need no additional communication overhead to negotiate mask consistency; especially in the scenario where the mask is changed in every model update, this greatly reduces communication overhead and power consumption, and further reduces each bank's equipment deployment cost.
• the model to be trained can also be used in application scenarios other than credit risk estimation, such as performance level prediction and paper value evaluation; the embodiments of this application do not limit this.
  • the step S20 includes:
• Step S201: use the first preset mask generator to generate a second mask at least according to the iteration index of this model update;
• Step S202: remove the mask from each masked model parameter update based on the second mask to obtain each model parameter update, where each participating device generates its first mask using its local second preset mask generator at least according to the iteration index of this model update, and the first preset mask generator is the same as the second preset mask generator.
• the iteration index refers to the serial number of the model update and identifies how many model updates have been performed.
• the coordinating device can number each model update and use the number as the iteration index. When sending a masked global model parameter update to each participating device to start a new model update, it can also send the iteration index. A participating device can carry the iteration index when returning its model parameter update for this model update, ensuring that the coordinating device and the participating devices stay synchronized on the update count.
• after each participating device performs local training of the model to be trained based on the global model parameter update of this model update and its local training data, and obtains its model parameter update, it can generate the first mask using its local second preset mask generator at least according to the iteration index of this model update.
  • the second preset mask generator in each participating device is the same.
• Each participating device inputs the iteration index into the second preset mask generator, and the second preset mask generator uses the iteration index as a seed to generate the first mask.
  • the length of the first mask can be set in advance by configuring the parameters of the second preset mask generator, that is, the number of elements of the first mask can be set by setting the parameters of the second preset mask generator.
• since the mask generators used are the same, the first masks generated by all participating devices are identical; but because the iteration index differs across model updates, the first mask a participating device generates differs in each model update. The coordinating device therefore cannot compare a participating device's masked model parameter updates from two adjacent model updates to infer the original model parameter update, further strengthening the protection of the participating devices' private data.
  • Each participating device uses the generated first mask to perform an add mask operation on its model parameter update, and sends the obtained masked model parameter update to the coordination device.
  • the coordination device performs the following operations in the TEE module (that is, the following operations are only visible in the TEE module):
  • the first preset mask generator is used to generate the second mask.
  • the first preset mask generator may be a preset mask generator, such as ANSI X9.17, and the first preset mask generator is the same as the second preset mask generator.
• the iteration index of this model update is input to the first preset mask generator, and the first preset mask generator uses the iteration index as a seed to generate the second mask.
  • the parameters of the mask generator can be configured in advance so that the length of the second mask generated by the mask generator is the same as the first mask.
  • the generated second mask is the same as the first mask.
• each masked model parameter update then has its mask removed, yielding each model parameter update. Since the masked model parameter update was produced by an add-mask operation using the first mask, the mask can be removed using a second mask identical to the first mask, recovering the original model parameter update.
• the coordination device performs the above operations in the TEE module. Therefore, the coordination device itself can only obtain the masked model parameter updates, not the original ones, so the privacy of the participating devices is not leaked to the coordination device; meanwhile, the TEE module of the coordination device can obtain the model parameter update of each participating device and fuse them, ensuring the normal progress of federated learning. In addition, the coordinating device and each participating device generate the same mask locally, so no extra communication overhead is needed to negotiate mask consistency, which greatly reduces communication overhead and power consumption.
  • the coordination device may also perform the following operations in the TEE module: perform a fusion operation on the obtained model parameter updates of each participating device to obtain a global model parameter update.
  • the third preset mask generator is used to generate the third mask according to the iteration index of the next model update.
  • the third preset mask generator may be a preset mask generator, and may be the same as or different from the first preset mask generator.
• an add-mask operation is performed on the global model parameter update using the third mask to obtain the masked global model parameter update.
  • the coordination device sends the masked global model parameter update to each participating device, and can carry the iteration index of the next model update to start the next model update.
• after receiving the masked global model parameter update of the new model update, each participating device uses the fourth preset mask generator to generate the fourth mask according to the iteration index of this model update carried in the message, uses the fourth mask to remove the mask from the masked global model parameter update to obtain the global model parameter update, and performs the local training of this model update according to the global model parameter update.
  • the fourth preset mask generator in each participating device is set to be the same, and is set to be the same as the third preset mask generator of the coordinating device. Since each participating device and coordinating device use the same mask generator to generate the mask according to the same iteration index, the fourth mask and the third mask are the same.
• since the masked global model parameter update was obtained by adding a mask with the third mask, the participating device can remove the mask using a fourth mask identical to the third mask and obtain the original global model parameter update, so the normal progress of federated learning is ensured while the privacy of the participating devices is not leaked to the coordinating device.
• suppose the coordinating device and K participating devices perform federated learning and determine that the length of the mask is L, where L is less than or equal to the length N of the model parameter update and the global model parameter update.
  • t is the iterative index of the model update, and identifies the number of model updates.
  • the coordination device generates the same mask m(k,t) in the TEE module as the participating device, and uses m(k,t) to remove the mask from v(k,t) to obtain w(k,t).
• the coordinating device sends u(t) to each participating device. Since both w(k,t) and w(t) are obtained in the TEE module, the coordination device cannot know w(k,t) and w(t). And since the masks m(k,t) and p(t) are also generated in the TEE module, the coordination device cannot infer w(k,t) and w(t) from v(k,t) and u(t).
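In the notation above, this first scheme can be summarized by the following relations, written here with additive masking as one possible instantiation (the text does not fix the masking operation); w(k,t) is the k-th device's update, m(k,t) its mask, a_k its fusion weight, and p(t) the third mask:

```latex
v(k,t) = w(k,t) + m(k,t)                 % participant k adds its first mask
w(k,t) = v(k,t) - m(k,t)                 % TEE removes it with the identical second mask
w(t)   = \sum_{k=1}^{K} a_k \, w(k,t)    % fusion, e.g. a weighted average
u(t)   = w(t) + p(t)                     % TEE adds the third mask
w(t)   = u(t) - p(t)                     % participants remove it with the fourth mask
```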
  • the step S20 includes:
• Step S203: use the first preset mask generator to generate the second mask corresponding to each participating device at least according to the iteration index of this model update and the device number of each participating device;
• Step S204: based on the second mask corresponding to each participating device, remove the mask from each masked model parameter update sent by each participating device to obtain each model parameter update, wherein each participating device generates its first mask using its local second preset mask generator at least according to the iteration index of this model update and its device number, and the first preset mask generator is the same as the second preset mask generator.
• the coordinating device can assign a device number to each participating device participating in the federated learning, which can be a numeric or alphabetic identifier and may also be negotiated with each participating device in advance, and send each participating device its device number. For a participating device newly added during the model training process, the coordinating device can assign it a number that differs from the existing ones, so as to manage the participating devices throughout the federated learning process.
  • the respective local second preset mask generator is used to generate the first mask.
  • the second preset mask generator in each participating device is the same.
• Each participating device inputs the iteration index and its device number into the second preset mask generator, and the second preset mask generator uses the iteration index and the device number as a seed to generate the first mask.
  • the length of the first mask can be set in advance by configuring the parameters of the second preset mask generator, that is, the number of elements of the first mask can be set by setting the parameters of the second preset mask generator. Since the device numbers of the participating devices are different, the first masks generated by the participating devices are different.
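A sketch of a generator seeded by both the iteration index and the device number, so that each participating device gets a distinct mask in each model update. The string-based seed derivation and `random.Random` are illustrative assumptions standing in for the real preset generator:

```python
import random

MASK_LENGTH = 4  # preset mask length (assumed)

def second_preset_mask_generator(iteration_index, device_number):
    # Seeding with both values gives every participating device a distinct
    # mask in every model update, while the TEE module can reproduce each
    # device's mask from the same (index, number) pair.
    rng = random.Random(f"{iteration_index}:{device_number}")
    return [rng.randint(-10**6, 10**6) for _ in range(MASK_LENGTH)]

# Different device numbers -> different first masks:
assert second_preset_mask_generator(3, "A") != second_preset_mask_generator(3, "B")
# The TEE module regenerates device A's mask exactly:
assert second_preset_mask_generator(3, "A") == second_preset_mask_generator(3, "A")
```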
  • Each participating device uses the generated first mask to perform an add mask operation on its model parameter update, and sends the obtained masked model parameter update to the coordination device.
  • the coordination device performs the following operations in the TEE module (that is, the following operations are only visible in the TEE module):
  • the first preset mask generator is used to generate a second mask corresponding to each participating device. Specifically, since the first mask of each participating device is different, for each participating device, according to the iteration index and the device number of the participating device, the first preset mask generator is used to generate the first mask corresponding to the participating device. Two masks.
  • the first preset mask generator may be a preset mask generator, such as ANSI X9.17, and the first preset mask generator is the same as the second preset mask generator.
• the iteration index of this model update and the device number of the participating device are input into the first preset mask generator, which uses the iteration index and the device number as a seed to generate a second mask corresponding to the participating device.
  • the parameters of the mask generator can be configured in advance so that the length of the second mask generated by the mask generator is the same as the first mask.
• the second mask generated for each participating device is therefore the same as that participating device's first mask.
• the mask is removed from each masked model parameter update sent by the participating devices according to the corresponding second mask, yielding the model parameter update of each participating device. Since the masked model parameter update was produced by an add-mask operation using the first mask, it can be unmasked using a second mask identical to the first mask, recovering the original model parameter update.
• the coordination device performs the above operations in the TEE module. Therefore, the coordination device itself can only obtain the masked model parameter updates, not the original ones, so the privacy of the participating devices is not leaked to the coordination device; meanwhile, the TEE module of the coordination device can obtain the model parameter update of each participating device and fuse them, ensuring the normal progress of federated learning. In addition, the coordinating device and each participating device generate the same mask locally, so no extra communication overhead is needed to negotiate mask consistency, which greatly reduces communication overhead and power consumption.
  • the coordination device may also perform the following operations in the TEE module: perform a fusion operation on the obtained model parameter updates of each participating device to obtain a global model parameter update.
  • the third preset mask generator is used to generate a third mask corresponding to each participating device according to the iteration index of the next model update and the device number of each participating device.
  • the third preset mask generator may be a preset mask generator, and may be the same as or different from the first preset mask generator.
• add-mask operations are performed on the global model parameter update using the third mask corresponding to each participating device, obtaining the masked global model parameter updates.
  • the coordination device sends the masked global model parameter update to each participating device, and can carry the iteration index of the next model update to start the next model update.
• after each participating device receives the masked global model parameter update of the new model update, it uses the fourth preset mask generator to generate the fourth mask according to the iteration index of this model update carried in the message and its own device number, and uses the fourth mask to remove the mask from the masked global model parameter update to obtain the global model parameter update; it then performs the local training of this model update according to the global model parameter update.
• the fourth preset mask generator in each participating device is set to be the same, and is set to be the same as the third preset mask generator of the coordinating device. Since each participating device and the coordinating device use the same mask generator and the same iteration index and device number to generate the mask, the fourth mask generated by each participating device is the same as the third mask corresponding to that participating device.
• since the masked global model parameter update was obtained by adding a mask with the third mask, the participating device can remove the mask using a fourth mask identical to the third mask and obtain the original global model parameter update, so the normal progress of federated learning is ensured while the privacy of the participating devices is not leaked to the coordinating device.
  • the coordination device and K participating devices determine that the length of the mask is L before performing federated learning, and L is less than or equal to the length N of the model parameter update and the global model parameter update.
  • t is the iterative index of the model update, and identifies the number of model updates.
  • the coordination device generates the same mask m(k, t) in the TEE module as the participating device, and uses m(k, t) to remove the mask from v(k, t) to obtain w(k, t).
  • the coordinating device sends u(k, t) to the k-th participating device.
• FIG. 3 shows the contents visible in the TEE module and in the other parts of the coordination device. Since both w(k,t) and w(t) are obtained in the TEE module, the coordination device cannot know w(k,t) and w(t). And since the masks m(k,t) and p(k,t) are also generated in the TEE module, the coordination device cannot infer w(k,t) and w(t) from v(k,t) and u(k,t).
• the technical solution of this application is also applicable to vertical federated learning, that is, to scenarios where the machine learning model structures trained by the participating devices may differ; for example, each participating device trains a different neural network model.
  • step S30 includes:
• Step S301: fuse each model parameter update to obtain a global model parameter update, and use a third preset mask generator to generate a third mask;
• Step S302: complete the third mask by a preset completion method, and add a mask to the global model parameter update using the completed third mask to obtain the masked global model parameter update, where the length of the completed third mask is the same as the length of the model parameter update.
• after the coordination device obtains the model parameter update of each participating device in the TEE module, it can also perform the following operations in the TEE module:
  • the model parameter updates are merged to obtain the global model parameter update, and the third preset mask generator is used to generate the third mask.
• the third mask is completed by the preset completion method, and the mask is added to the global model parameter update using the completed third mask, yielding the masked global model parameter update; the length of the completed third mask is the same as the length of the model parameter update.
  • the preset completion method can be preset, such as using a zero-padding method.
• zero-padding makes the length of the third mask the same as the length of the model parameter update. For example, if the length of the model parameter update is 100 and the length of the third mask is 90, then 10 elements with a value of zero can be appended to the third mask so that its length becomes 100.
  • the length of the mask can be less than the length of the model parameter update, thereby further reducing the computational complexity of the mask.
• because the completion is performed in the TEE module, the actual length of the mask is visible only there, preventing the coordinating device from inferring the completed part based on the mask length and then inferring the global model parameter update from the completed part; the privacy data of the participating devices is thus not leaked to the coordinating device.
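The zero-padding completion and the subsequent add-mask operation can be sketched as follows (toy values; additive masking is an assumed instantiation):

```python
def complete_mask(mask, target_length):
    # Preset completion method: zero-pad the third mask until its length
    # matches that of the global model parameter update.
    return mask + [0] * (target_length - len(mask))

third_mask = [5, -3, 8]                    # toy length-3 mask
global_update = [0.1, 0.2, 0.3, 0.4, 0.5]  # toy length-5 global update
padded = complete_mask(third_mask, len(global_update))
masked_update = [g + m for g, m in zip(global_update, padded)]

assert padded == [5, -3, 8, 0, 0]
# Zero-padded positions pass through the add-mask operation unchanged:
assert masked_update[3:] == [0.4, 0.5]
```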
• similarly, if the length of the first mask is less than the length of the model parameter update, the participating device can complete the first mask and use the completed first mask to perform an add-mask operation on the model parameter update, obtaining the masked model parameter update.
• a third embodiment of the federated learning privacy data processing method of this application is proposed.
• the federated learning privacy data processing method is applied to a participating device, and the participating device is in communication with the coordinating device.
• the coordination device in this embodiment is as described in the foregoing embodiments; the participating devices can be devices such as smart phones, personal computers, and servers, as long as they can support training of the federated learning model, with no specific restriction here.
• the federated learning privacy data processing method includes the following steps:
  • Step A10 Receive the masked global model parameter update of this model update sent by the coordination device;
  • the coordination device and each participating device can establish a communication connection in advance through handshake and identity authentication, and determine the model to be trained for this federated learning.
• the coordination device and the participating devices cooperate to perform multiple iterations of model updates; once the model to be trained finally converges, the training process can be ended.
  • the coordination device sends the masked global model parameter update for this model update to each participating device, and each participating device receives the masked global model parameter update for each model update sent by the coordination device.
  • Step A20 remove the mask from the masked global model parameter update to obtain the global model parameter update
• the participating device removes the mask from the masked global model parameter update to obtain the global model parameter update.
• the participating device may use the first mask generated locally during the last model update, the mask it then used to add a mask to its model parameter update, to perform the mask-removal operation on the masked global model parameter update.
  • Step A30 Perform local training on the to-be-trained model of federated learning according to the local training data of the participating device and the global model parameter update to obtain model parameter updates;
  • Participating devices locally store training data for local training of the model to be trained, and perform local training on the model to be trained for federated learning based on local training data and global model parameter updates to obtain model parameter updates.
  • the specific local training process is the same as the process of using local data to train the model to be trained by the participating devices in the existing federated learning, and will not be described in detail here.
  • Step A40 Use the locally generated first mask of this model update to add a mask to the model parameter update to obtain the masked model parameter update and send it to the coordination device.
  • Participating equipment generates the first mask of this model update, uses the first mask to add mask operation to the model parameter update, obtains the masked model parameter update, and sends the masked model parameter update to the coordination device .
• participating devices can generate a different first mask for each model update, and the first masks generated by different participating devices can be the same or different; each participating device can generate its first mask through a preset mask generation method, which can be set in advance as needed. The lengths of the first masks generated by the participating devices may be the same or different; the length of each participating device's first mask can be preset and may be less than or equal to the length of the model parameter update, to reduce the computational complexity of generating the mask.
  • each participating device Because the participating device sends a masked model parameter update to the coordinating device, and the coordinating device cannot learn the first mask of the participating device, nor the mask generation method of the participating device, so the model parameters of the participating device cannot be learned Update, therefore, the private data of the participating devices will not be leaked to the coordinating device. In addition, each participating device generates the first mask locally, so that no additional communication overhead is needed to negotiate the consistency of the mask between the participating devices, thereby reducing communication overhead and power consumption.
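The add-mask and remove-mask operations on a model parameter update can be sketched as follows; the Gaussian mask and the element-wise addition are illustrative assumptions consistent with the description, not the patent's mandated implementation:

```python
import numpy as np

def add_mask(update, mask):
    """Mask a model parameter update by element-wise addition."""
    return update + mask

def remove_mask(masked_update, mask):
    """Recover the original update by subtracting the same mask."""
    return masked_update - mask

rng = np.random.default_rng(seed=7)       # hypothetical local PRNG
update = np.array([0.5, -1.2, 3.0])       # toy model parameter update
mask = rng.normal(size=update.shape)      # first mask, generated locally

masked = add_mask(update, mask)           # this is what is sent to the coordinator
recovered = remove_mask(masked, mask)     # only possible with the mask in hand
assert np.allclose(recovered, update)
```

Note that the masked vector has the same length as the original update, which is why masking adds no communication bandwidth.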
  • Further, if the first masks generated by the participating devices in the same model update are identical, step A20 may include:
  • Step A201: Remove the mask from the masked global model parameter update with the first mask of the previous model update to obtain the global model parameter update.
  • In the previous model update, the coordination device received the masked model parameter updates sent by the participating devices and fused them into the masked global model parameter update.
  • That is, in a model update, the coordination device can receive the masked model parameter updates sent by the participating devices and perform the fusion operation directly on them; because it fuses masked updates, the fusion result is the masked global model parameter update.
  • The coordination device sends the masked global model parameter update to each participating device, so that each starts a new model update based on it.
  • After receiving the masked global model parameter update, each participating device starts this model update. Specifically, it removes the mask with the first mask of the previous model update. Because the devices' first masks in the same round are identical, each can use its own first mask to unmask the masked global update, and the resulting global model parameter update equals the one obtained by directly fusing the devices' model parameter updates.
  • The principle: suppose there are two participating devices, device 1 and device 2, which in the previous model update obtained updates w1 and w2 and each generated the same first mask m; they send v1 = w1 + m and v2 = w2 + m, the coordination device averages them into u = (w1 + w2)/2 + m, and each device computes u - m = (w1 + w2)/2, which is exactly the average of the original updates.
  • The coordination device can thus fuse the participating devices' model parameter updates without learning them, ensuring the normal progress of federated learning.
  • The length of a masked model parameter update does not increase, and therefore no additional communication bandwidth is required.
  • Each participating device generates the homomorphic mask locally, so no additional communication overhead is needed to negotiate mask consistency between participating devices, which greatly reduces communication and power overhead.
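A toy numeric check of this averaging principle, with two devices sharing one per-round mask (the concrete values and the seeded NumPy generator are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
w1 = np.array([1.0, 2.0])        # device 1's model parameter update (toy values)
w2 = np.array([3.0, 6.0])        # device 2's model parameter update
m = rng.normal(size=2)           # identical first mask generated on both devices

v1, v2 = w1 + m, w2 + m          # masked updates sent to the coordinator
u = (v1 + v2) / 2                # coordinator averages the masked updates
# u equals (w1 + w2)/2 + m, so each device removes its own copy of the mask:
w_global = u - m
assert np.allclose(w_global, (w1 + w2) / 2)
```

The coordinator only ever sees v1, v2, and u, none of which reveal w1 or w2 without m.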
  • Further, a fourth embodiment of the federated learning privacy data processing method of this application is proposed.
  • In this embodiment, the coordination device includes a TEE module.
  • Step A20 includes:
  • Step A203: Generate a fourth mask identical to the third mask of the coordination device.
  • Step A204: Remove the mask from the masked global model parameter update with the fourth mask to obtain the global model parameter update. In the previous model update, the coordination device received the masked model parameter updates sent by the participating devices, generated in the TEE module second masks identical to the devices' first masks of that round, removed the masks based on the second masks to obtain each model parameter update, fused them into the global model parameter update, and masked it with a generated third mask to obtain the masked global model parameter update of this model update.
  • That is, the coordination device receives the masked model parameter updates sent by the participating devices and, in the TEE module, generates second masks identical to the devices' first masks of this round, then removes the masks based on them to obtain each model parameter update.
  • If the first masks generated by the participating devices are identical, one second mask identical to that first mask suffices; if they differ, the coordination device generates multiple second masks, each corresponding to one device's first mask.
  • The TEE module of the coordination device can be preset with the same mask generation method as the participating devices, so that the second masks it generates equal the first masks.
  • Because the coordination device removes the masks inside the TEE module with second masks identical to the first masks, the original model parameter update of each participating device can be recovered.
  • Because the unmasking is performed in the TEE module, the unmasked model parameter updates are visible only there; the coordination device itself can obtain only the masked updates, not the participating devices' model parameter updates, so the devices' privacy cannot be stolen and is not leaked to the coordination device.
  • The coordination device then fuses the model parameter updates into a global model parameter update, generates a third mask, and masks the global update with the third mask to obtain the masked global model parameter update.
  • Fusion of the model parameter updates into the global model parameter update may be performed by a fusion function over the individual updates, for example a weighted-average function.
  • The coordination device may generate a different third mask in every model update; since the fused global update is used in the next model update, the third mask it generates corresponds to the next round.
  • The TEE module may generate the third mask with the same generation method as the second mask; the third mask's length may also be preset, the same as or different from the first mask's, and likewise, to reduce the computational complexity of mask generation, it may be less than or equal to the length of the global model parameter update.
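A minimal sketch of the TEE-side round just described: unmask each update with the matching second mask, fuse by equal-weight averaging, and re-mask with the third mask before anything leaves the TEE. The function name and the mean-based fusion are assumptions for illustration:

```python
import numpy as np

def coordinator_tee_round(masked_updates, second_masks, third_mask):
    """Runs conceptually inside the TEE: unmask, fuse, re-mask.

    Only the returned masked global update would leave the TEE;
    the intermediate plaintext updates stay inside it."""
    updates = [v - m for v, m in zip(masked_updates, second_masks)]
    global_update = np.mean(updates, axis=0)  # equal-weight average fusion
    return global_update + third_mask

w = [np.array([1.0, 3.0]), np.array([5.0, 7.0])]          # toy device updates
masks = [np.array([0.5, -0.5]), np.array([2.0, 2.0])]     # per-device second masks
third = np.array([9.0, 9.0])                              # third mask
u = coordinator_tee_round([wk + mk for wk, mk in zip(w, masks)], masks, third)
assert np.allclose(u - third, np.array([3.0, 5.0]))       # devices recover the average
```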
  • The coordination device obtains the masked global model parameter update from the TEE module and sends it to each participating device. If it detects in this round that the model to be trained has converged, it can send the masked global update so that each participating device determines from it the final parameters of the model, ending this federated learning task.
  • Otherwise, it sends the masked global model parameter update so that each participating device performs the next model update based on it.
  • After receiving the masked global model parameter update sent by the coordination device, each participating device locally generates a fourth mask identical to the coordination device's third mask and uses the fourth mask to remove the mask, obtaining the global model parameter update.
  • A mask generation method identical to the one that generates the third mask in the coordination device can be preset in each participating device, so that the fourth mask generated by the participating device equals the coordination device's third mask.
  • Because the participating device unmasks with a fourth mask identical to the coordination device's third mask, it can recover the original global model parameter update produced in the coordination device's TEE module, ensuring it obtains an accurate global update without data deviation. And because the coordination device obtains only the masked global update from the TEE module, while the original is visible only inside it, the coordination device cannot learn the original global model parameter update and thus cannot steal the participating devices' private data.
  • In summary: each participating device masks the model parameter update it obtained through training with its own first mask, yielding its masked model parameter update; the coordination device receives the masked updates and, in the TEE module, generates second masks identical to the first masks and removes the masks to obtain each model parameter update; it fuses them into a global model parameter update, masks it with a generated third mask, and sends the masked global update to each participating device, which removes the mask with a fourth mask identical to the third to obtain the global model parameter update.
  • The coordination device cannot obtain the participating devices' model parameter updates or the global model parameter update, yet can obtain and fuse the updates inside the TEE module, completing the federated learning model update process without the devices' privacy being leaked to it. Through masking, the model parameter updates and global updates can be transmitted securely without increasing the communication bandwidth requirement. And because the coordination device and the participating devices each generate the masks locally, the masks used for adding and removing are guaranteed identical, so no extra communication overhead is needed to negotiate mask consistency between a participating device and the coordination device, between participating devices, or between a participating device and a third-party server; especially in the scenario where the mask is replaced in every model update, this greatly reduces communication and power overhead.
  • The technical solution of this application, in essence or in the part that contributes over the existing technology, can be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) that includes several instructions enabling a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the method described in each embodiment of this application.

Abstract

A federated learning privacy data processing method, device, system, and storage medium. The method includes: receiving the masked model parameter updates sent by the participating devices, where each participating device masks the model parameter update it obtained through training using a first mask it generated itself, yielding its masked model parameter update (S10); in the TEE module, generating a second mask identical to the first mask and, based on the second mask, removing the mask from each masked model parameter update to obtain each model parameter update (S20); in the TEE module, fusing the model parameter updates into a global model parameter update and masking it with a generated third mask to obtain the masked global model parameter update (S30); and sending the masked global model parameter update to each participating device, so that each participating device, using a fourth mask it generates that is identical to the third mask, removes the mask from the masked global model parameter update to obtain the global model parameter update (S40). The method implements a security mechanism under which the participating devices' information is not leaked to the coordination device and no significant increase in communication bandwidth is required.

Description

Federated learning privacy data processing method, device, system, and storage medium
This application claims priority to Chinese patent application No. 201910892806.9, entitled "Federated learning privacy data processing method, device, system, and storage medium", filed with the China Patent Office on September 20, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data processing technology, and in particular to a federated learning privacy data processing method, device, system, and storage medium.
Background
With the development of artificial intelligence, the concept of "federated learning" was proposed to solve the problem of data silos: the parties to the federation can jointly train a model and obtain its parameters without handing over their own data, avoiding the leakage of data privacy.
In practical horizontal federated learning scenarios, the local model parameter updates (for example, neural network weights or gradient information) that a participant sends to the coordination device are obtained by the coordinator; in scenarios where the coordinator's reliability cannot be guaranteed, the participant's privacy, data, and the trained machine learning model may be leaked to the coordinator. At present, to ensure that the participants' private information is not leaked, a participant can send model parameter updates in encrypted form, for example using homomorphic encryption, secret sharing, or differential privacy techniques; a coordinator that cannot decrypt them cannot obtain the model weights or gradient information, which guarantees that no information is leaked to it.
However, encryption significantly increases the length of the information to be transmitted. For example, with homomorphic encryption using the most common Paillier algorithm, the ciphertext obtained (measured in bits) is at least twice the length of the plaintext; that is, encryption at least doubles the communication bandwidth requirement compared with no encryption. In some practical applications, such as IoT, the mobile internet, remote sensing, and commercial satellite communication links, communication bandwidth is severely limited; the additional bandwidth required by the participants' encryption operations may well not be satisfiable, or will at least significantly increase communication latency.
Summary
The main purpose of this application is to provide a federated learning privacy data processing method, device, system, and storage medium, aiming to implement a security mechanism under which the participants' information is not leaked to the coordinator and no significant increase in communication bandwidth is required.
To achieve the above purpose, this application provides a federated learning privacy data processing method applied to a coordination device that includes a trusted execution environment (TEE) module and is communicatively connected to multiple participating devices. The method includes the following steps:
receiving the masked model parameter updates sent by the participating devices, where each participating device masks the model parameter update it obtained through training using a first mask it generated itself, yielding its masked model parameter update;
in the TEE module, generating a second mask identical to the first mask and, based on the second mask, removing the mask from each masked model parameter update to obtain each model parameter update;
in the TEE module, fusing the model parameter updates into a global model parameter update and masking it with a generated third mask to obtain the masked global model parameter update;
sending the masked global model parameter update to each participating device, so that each participating device, using a fourth mask it generates that is identical to the third mask, removes the mask from the masked global model parameter update to obtain the global model parameter update.
Optionally, the step of generating a second mask identical to the first mask and removing the masks based on it includes:
generating the second mask with a first preset mask generator based at least on the iteration index of this model update;
removing the mask from each masked model parameter update based on the second mask to obtain each model parameter update, where each participating device generates its first mask with its own local second preset mask generator based at least on the iteration index of this model update, and the first and second preset mask generators are identical.
Optionally, the step may instead include:
generating, with the first preset mask generator, a second mask corresponding to each participating device based at least on the iteration index of this model update and each participating device's device number;
removing the mask from the masked model parameter updates sent by each participating device based on that device's corresponding second mask, where each participating device generates its own first mask with its local second preset mask generator based at least on the iteration index and its own device number, and the first and second preset mask generators are identical.
Optionally, when the length of the third mask is smaller than the length of the model parameter update, the step of fusing the updates and masking the result includes:
fusing the model parameter updates into the global model parameter update and generating the third mask with a third preset mask generator;
completing the third mask with a preset completion method and masking the global model parameter update with the completed third mask to obtain the masked global model parameter update, where the completed third mask has the same length as the model parameter update.
Optionally, after the step of fusing the model parameter updates into the global model parameter update, the method further includes:
judging in the TEE module, based on the global model parameter update, whether the model to be trained by federated learning has converged;
ending the training of the model if it has converged, or if the number of iterations reaches a preset maximum number of iterations, or if the training time reaches the maximum training time.
To achieve the above purpose, this application also provides a federated learning privacy data processing method applied to a participating device communicatively connected to a coordination device. The method includes the following steps:
receiving the masked global model parameter update of this model update sent by the coordination device;
removing the mask from the masked global model parameter update to obtain the global model parameter update;
locally training the model to be trained by federated learning based on the participating device's local training data and the global model parameter update, to obtain a model parameter update;
masking the model parameter update with a locally generated first mask for this model update, obtaining a masked model parameter update, and sending it to the coordination device.
Optionally, the coordination device includes a trusted execution environment (TEE) module, and the step of removing the mask from the masked global model parameter update includes:
generating a fourth mask identical to the coordination device's third mask;
removing the mask with the fourth mask to obtain the global model parameter update, where in the previous model update the coordination device received the masked model parameter updates sent by the participating devices, generated in the TEE module second masks identical to the devices' first masks of the previous model update, removed the masks based on the second masks to obtain each model parameter update, fused them into the global model parameter update, and masked it with a generated third mask to obtain the masked global model parameter update of this model update.
Optionally, the step of removing the mask from the masked global model parameter update may instead include:
removing the mask with the first mask of the previous model update to obtain the global model parameter update, where in the previous model update the coordination device received the masked model parameter updates sent by the participating devices and fused them directly into the masked global model parameter update.
To achieve the above purpose, this application also provides a device, the device being a coordination device, including a memory, a processor, and a federated learning privacy data processing program stored in the memory and runnable on the processor; when executed by the processor, the program implements the steps of the federated learning privacy data processing method described above.
This application also provides a device, the device being a participating device, including a memory, a processor, and a federated learning privacy data processing program stored in the memory and runnable on the processor; when executed by the processor, the program implements the steps of the federated learning privacy data processing method described above.
This application also provides a federated learning privacy data processing system including at least one coordination device as described above and at least one participating device as described above.
In addition, this application also proposes a computer-readable storage medium storing a federated learning privacy data processing program that, when executed by a processor, implements the steps of the federated learning privacy data processing method described above.
In this application, each participating device masks the model parameter update it obtained through training with its own first mask, yielding its masked model parameter update; the coordination device receives the masked updates and, in the TEE module, generates a second mask identical to the first mask and removes the masks to obtain each model parameter update; in the TEE module it fuses them into a global model parameter update, masks it with a generated third mask, and sends the masked global update to each participating device, which removes the mask with a fourth mask it generates that is identical to the third mask, obtaining the global model parameter update. In this embodiment, by combining masking and TEE technology, the coordination device cannot obtain the participating devices' model parameter updates or the global model parameter update, yet can obtain and fuse the updates inside the TEE module, completing the federated learning model update process without leaking privacy to the coordination device. Masking lets the model parameter updates and global updates be transmitted securely without increasing the communication bandwidth requirement. And because the coordination device and the participating devices each generate the masks locally, the masks used for adding and removing are guaranteed to be identical, so no extra communication overhead is needed to negotiate mask consistency between a participating device and the coordination device, between participating devices, or between a participating device and a third-party server; especially in the scenario where the mask is replaced in every model update, this greatly reduces communication and power overhead.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of the hardware operating environment involved in the embodiments of this application;
Fig. 2 is a schematic flowchart of the first embodiment of the federated learning privacy data processing method of this application;
Fig. 3 is a schematic diagram of the content visible in a coordination device according to an embodiment of this application.
The realization of the purpose, functional characteristics, and advantages of this application will be further explained with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
As shown in Fig. 1, Fig. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application.
It should be noted that the device in the embodiments of this application is the coordination device, which may be a smartphone, a personal computer, a server, or similar equipment; no specific limitation is made here.
As shown in Fig. 1, the device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a WI-FI interface). The memory 1005 may be high-speed RAM, or stable non-volatile memory such as disk storage; optionally it may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the device structure shown in Fig. 1 does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, a federated learning privacy data processing program, and a TEE (Trusted Execution Environment) module. The operating system is the program that manages and controls the device's hardware and software resources and supports the running of the federated learning privacy data processing program and other software. The TEE is a secure area inside the main processor that runs in an independent environment in parallel with the operating system; it ensures that the confidentiality and integrity of the code and data loaded in the TEE are protected. Trusted applications running in the TEE can access the full capabilities of the device's main processor and memory, while hardware isolation protects these components from user-installed applications running in the main operating system. In this embodiment, the TEE module can be implemented in various ways, such as Intel's Software Guard Extensions (SGX), AMD's Secure Encrypted Virtualization (SEV), ARM's TrustZone, or MIT's Sanctum. Authentication and authorization of the TEE module can be completed by a third-party security server; for example, when the TEE uses Intel's SGX, the TEE can be authenticated by Intel's security server, guaranteeing the security of the TEE.
In the device shown in Fig. 1, the user interface 1003 is mainly used for data communication with clients; the network interface 1004 is mainly used for establishing communication connections with the participating devices; and the processor 1001 can be used to call the federated learning privacy data processing program stored in the memory 1005 and perform the following operations:
receiving the masked model parameter updates sent by the participating devices, where each participating device masks the model parameter update it obtained through training using a first mask it generated itself, yielding its masked model parameter update;
in the TEE module, generating a second mask identical to the first mask and, based on the second mask, removing the mask from each masked model parameter update to obtain each model parameter update;
in the TEE module, fusing the model parameter updates into a global model parameter update and masking it with a generated third mask to obtain the masked global model parameter update;
sending the masked global model parameter update to each participating device, so that each participating device, using a fourth mask it generates that is identical to the third mask, removes the mask to obtain the global model parameter update.
Further, the second mask may be generated with a first preset mask generator based at least on the iteration index of this model update, with each participating device generating its first mask using an identical local second preset mask generator driven by the same iteration index; or second masks may be generated per participating device based at least on the iteration index and each device's device number, with the mask removed from each device's masked update using that device's corresponding second mask, each device having generated its own first mask from the same inputs with an identical generator.
Further, when the length of the third mask is smaller than the length of the model parameter update, the third mask generated by a third preset mask generator is completed by a preset completion method, and the completed third mask, whose length equals that of the model parameter update, is used to mask the global model parameter update.
Further, after the step of fusing the model parameter updates into the global model parameter update, the processor 1001 may also call the federated learning privacy data processing program stored in the memory 1005 to perform the following steps:
judging in the TEE module, based on the global model parameter update, whether the model to be trained by federated learning has converged;
ending the training of the model if it has converged, or if the number of iterations reaches the preset maximum number of iterations, or if the training time reaches the maximum training time.
In addition, the embodiments of this application also propose a participating device communicatively connected to a coordination device, the participating device including a memory, a processor, and a federated learning privacy data processing program stored in the memory and runnable on the processor; when executed by the processor, the program implements the following steps of the federated learning privacy data processing method:
receiving the masked global model parameter update of this model update sent by the coordination device;
removing the mask from the masked global model parameter update to obtain the global model parameter update;
locally training the model to be trained by federated learning based on the participating device's local training data and the global model parameter update, to obtain a model parameter update;
masking the model parameter update with a locally generated first mask for this model update, obtaining a masked model parameter update, and sending it to the coordination device.
Further, when the coordination device includes a trusted execution environment (TEE) module, the mask-removal step includes generating a fourth mask identical to the coordination device's third mask and using it to remove the mask to obtain the global model parameter update, where in the previous model update the coordination device received the participating devices' masked model parameter updates, generated in the TEE module second masks identical to the devices' first masks of that round, unmasked and fused the updates into the global model parameter update, and masked it with a generated third mask to obtain the masked global model parameter update of this model update.
Alternatively, the mask-removal step includes removing the mask with the first mask of the previous model update to obtain the global model parameter update, where in the previous model update the coordination device received the participating devices' masked model parameter updates and fused them directly into the masked global model parameter update.
In addition, the embodiments of this application also propose a federated learning privacy data processing system including at least one coordination device as described above and at least one participating device as described above.
The embodiments of this application also propose a computer-readable storage medium storing a federated learning privacy data processing program that, when executed by a processor, implements the steps of the federated learning privacy data processing method described below.
For the embodiments of the coordination device, the participating device, the federated learning privacy data processing system, and the computer-readable storage medium of this application, reference may be made to the embodiments of the federated learning privacy data processing method of this application, which are not repeated here.
Based on the above structure, the embodiments of the federated learning privacy data processing method are proposed.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the first embodiment of the federated learning privacy data processing method of this application.
The embodiments of this application provide embodiments of the federated learning privacy data processing method. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that here.
The method of the first embodiment of this application is applied to a coordination device that is communicatively connected to multiple participating devices and includes a TEE module. The coordination device and the participating devices in the embodiments of this application may be smartphones, personal computers, servers, and similar equipment, and the participating devices support training of federated learning models; no specific limitation is made here. In this embodiment, the federated learning privacy data processing method includes:
Step S10: receive the masked model parameter updates sent by the participating devices, where each participating device masks the model parameter update it obtained through training using a first mask it generated itself, yielding its masked model parameter update.
In the following embodiments, masking (also called perturbation) is used to process the data securely. A mask may be a vector with one or more elements of integer or floating-point type, and may be randomly generated, i.e., each element of the vector is randomly generated. Adding a mask to data works as follows: for a target vector to be masked (when the number of elements in the target vector equals the number in the mask, i.e., their lengths are equal), add to or subtract from each of its elements the element at the corresponding position of the mask, yielding the masked target vector. Removing the mask works in reverse: for the masked target vector, subtract from or add to each element the element at the corresponding position of the mask, recovering the target vector. After adding and then removing the mask the original target vector is still obtained, and masking does not increase the target vector's length. Given only the masked target vector, the original target vector cannot be learned, which guarantees the security of the data.
It should be noted that if the operands are integers, i.e., the operations are performed in the integer domain, the adding and removing operations above may also include a modulo operation, which guarantees that the results stay within a finite integer field.
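When masking operates in the integer domain, the add and remove operations can include a modulo step to keep results in a finite integer field; a sketch (the field size 2**16 is an illustrative assumption):

```python
MOD = 2**16  # hypothetical finite integer field size

def add_mask_mod(update, mask, mod=MOD):
    """Mask integer parameters element-wise, reduced modulo the field size."""
    return [(u + m) % mod for u, m in zip(update, mask)]

def remove_mask_mod(masked, mask, mod=MOD):
    """Subtracting the same mask modulo the field recovers the original."""
    return [(v - m) % mod for v, m in zip(masked, mask)]

update = [7, 40000, 123]
mask = [65000, 30000, 1]
assert remove_mask_mod(add_mask_mod(update, mask), mask) == update
```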
In this embodiment, the coordination device and the participating devices may establish communication connections in advance through handshaking and identity authentication, and determine the model to be trained by this federated learning task; the model may be a machine learning model such as a neural network model. During federated learning, the coordination device and the participating devices cooperate to iteratively update the model many times until the model converges, at which point the training process ends. In one model update, each participating device locally trains the model based on the global model parameter update of this round and its own local training data, obtains its local model parameter update, locally generates a first mask, masks its model parameter update to obtain the masked model parameter update, and sends the masked model parameter update to the coordination device.
The model parameter update may be the weight parameters of the connections between nodes of a neural network, or the gradient information of the federated learning model, for example the gradient information in a neural network gradient descent algorithm; the gradient information may be gradient values or compressed gradient values. A model parameter update is a vector with multiple elements; when the update consists of weight parameters, the elements of the vector are the individual weights, and the number of elements in the vector is the length of the model parameter update. A participating device may generate a different first mask for every model update, and the first masks generated by different devices may be the same or different. Each device may generate its first mask by a preset mask generation method, which can be set in advance as needed, for example using a mask generator such as a common pseudo-random number generator (e.g., ANSI X9.17 or a linear congruential generator), or by generating random masks from a particular distribution, for example masks following a Gaussian distribution. The lengths of the first masks generated by the devices may be the same or different and may be preset; a length less than or equal to that of the model parameter update reduces the computational complexity of mask generation.
The coordination device receives the masked model parameter updates sent by the participating devices.
Step S20: in the TEE module, generate a second mask identical to the first mask and, based on the second mask, remove the mask from each masked model parameter update to obtain each model parameter update.
The coordination device generates in the TEE module a second mask identical to the first mask and removes the masks accordingly to obtain each model parameter update. It should be noted that if the participating devices' first masks are identical, the coordination device needs only one second mask identical to that first mask; if they differ, it generates multiple second masks, each corresponding to one participating device's first mask. If the devices generate a different first mask in every model update, i.e., a device's first mask from the previous round differs from this round's, the coordination device generates second masks identical to the first masks generated in this round. The same mask generation method as in the participating devices can be preset in the TEE module of the coordination device, so that the second masks generated in the TEE module equal the first masks.
Because the coordination device removes the masks inside the TEE module with second masks identical to the first masks, the participating devices' original model parameter updates can be recovered. And because the unmasking is performed in the TEE module, the unmasked model parameter updates are visible only there; the coordination device itself can obtain only the masked updates and cannot obtain the devices' model parameter updates, so the participating devices' privacy is not stolen and is not leaked to the coordination device.
Step S30: in the TEE module, fuse the model parameter updates into a global model parameter update and mask it with a generated third mask to obtain the masked global model parameter update.
In the TEE module, the coordination device fuses the model parameter updates into a global model parameter update, generates a third mask, and masks the global update with the third mask to obtain the masked global model parameter update. Fusion may be performed by a fusion function over the individual model parameter updates, for example a weighted-average function. The coordination device may generate a different third mask in every model update; since the fused global update is used in the next model update, the third mask it generates corresponds to the next round. The TEE module may generate the third mask with the same generation method as the second mask. The third mask's length may also be preset, the same as or different from the first mask's; likewise, to reduce the computational complexity of mask generation, the third mask's length may be less than or equal to that of the global model parameter update.
Step S40: send the masked global model parameter update to each participating device, so that each participating device, using a fourth mask it generates that is identical to the third mask, removes the mask from the masked global model parameter update to obtain the global model parameter update.
The coordination device obtains the masked global model parameter update from the TEE module and sends it to the participating devices. If the coordination device detects in this model update that the model to be trained has converged, it can send the masked global update to the devices so that each determines from it the final parameters of the model, ending this federated learning task. If it detects that the model has not converged, it sends the masked global update so that the devices perform the next model update based on it. After receiving the masked global model parameter update sent by the coordination device, each participating device locally generates a fourth mask identical to the coordination device's third mask and removes the mask with the fourth mask to obtain the global model parameter update. A mask generation method can be preset in each participating device, identical to the method that generates the third mask in the coordination device, so that the fourth mask generated by the device equals the coordination device's third mask.
Because the participating device unmasks the masked global model parameter update with a fourth mask identical to the coordination device's third mask, it can recover the original global model parameter update produced in the coordination device's TEE module, ensuring that it obtains an accurate global update without data deviation. And because the coordination device obtains only the masked global update from the TEE module, while the original global update is visible only inside the TEE module, the coordination device cannot learn the original global model parameter update and thus cannot steal the participating devices' private data.
In this embodiment, each participating device masks its trained model parameter update with its own first mask, yielding its masked model parameter update; the coordination device receives the masked updates, generates identical second masks in the TEE module, removes the masks to obtain each model parameter update, fuses them into a global model parameter update, masks it with a generated third mask, and sends the masked global update to the participating devices, which remove the mask with fourth masks identical to the third to obtain the global model parameter update. By combining masking and TEE technology, the coordination device cannot obtain the individual model parameter updates or the global model parameter update yet can obtain and fuse them inside the TEE module, completing the federated learning model update process without leaking privacy to the coordination device; masking allows the model parameter updates and global updates to be transmitted securely without increasing the communication bandwidth requirement; and because the coordination and participating devices each generate masks locally, the masks used for adding and removing are guaranteed identical, so no extra communication overhead is needed to negotiate mask consistency between a participating device and the coordination device, between participating devices, or between a participating device and a third-party server, which, especially in the scenario where the mask is replaced in every model update, greatly reduces communication and power overhead.
Further, if the coordination device determines from the global model parameter update whether the model to be trained has converged, that judgment is also performed in the TEE module. Specifically, after step S30, the method further includes:
Step S301: in the TEE module, judge from the global model parameter update whether the model to be trained by federated learning has converged;
Step S302: end the training of the model if it has converged, or if the number of iterations reaches the preset maximum number of iterations, or if the training time reaches the maximum training time.
After obtaining the global model parameter update in the TEE module, the coordination device continues in the TEE module to judge from it whether the model has converged. Specifically, it can judge whether the difference between the global model parameter update of this round and the joint model of the previous round is smaller than a preset difference; if it is, the model has converged, and if not, it has not.
If it is determined that the model has converged, the coordination device can end the training, that is, send the masked global model parameter update of this round obtained from the TEE module to the participating devices as the model's final parameters. If the model has not converged, the coordination device sends the masked global update obtained from the TEE module to the devices as the global model parameter update of a new round, and each device performs a new model update based on it, iterating until the coordination device determines in the TEE module that the model has converged.
Alternatively, the coordination device ends the training if it detects in the TEE module that the number of iterations has reached the preset maximum, or that the training time has reached the maximum training time.
In this embodiment, because the coordination device judges convergence from the global model parameter update inside the TEE module, the global update is visible only there and the coordination device cannot learn it, which guarantees that the participating devices' private data is not leaked to the coordination device while federated learning proceeds normally.
Further, in one embodiment, the model to be trained may be a neural network model for credit risk estimation, whose input may be a user's feature data and whose output may be a risk score for the user; the participating devices may be the devices of multiple banks, each locally holding the sample data of many users, and the coordination device is a third-party server independent of the banks. The coordination device and the participating devices train the model through the federated learning process of the above embodiments, obtaining a finally converged neural network model for credit risk estimation. Each bank can then use the trained neural network model to estimate a user's credit risk, feeding the user's feature data into the trained model to obtain the user's risk score. Because masking and TEE technology are combined during federated learning, the coordination device cannot obtain the banks' users' private data; masking lets the model parameter updates and global updates be transmitted securely without increasing the communication bandwidth requirement, lowering the banks' equipment deployment costs; and because the coordination and participating devices each generate the masks locally, the masks used for adding and removing are guaranteed identical, so no extra communication overhead is needed between the bank devices and the coordination device to negotiate mask consistency, which, especially in the scenario where the mask is replaced in every model update, greatly reduces communication and power overhead and thus the banks' deployment costs.
It should be noted that the model to be trained may also serve application scenarios other than credit risk estimation, such as performance grade prediction and paper value evaluation; the embodiments of this application are not limited in this respect.
Further, based on the first embodiment above, a second embodiment of the federated learning privacy data processing method of this application is proposed, in which step S20 includes:
Step S201: generate the second mask with a first preset mask generator based at least on the iteration index of this model update;
Step S202: remove the mask from each masked model parameter update based on the second mask to obtain each model parameter update, where each participating device generates its first mask with its own local second preset mask generator based at least on the iteration index of this model update, and the first and second preset mask generators are identical.
The iteration index is the number of the model update, identifying which round it is. The coordination device can number each model update as the iteration index and, when sending the masked global model parameter update to the participating devices to start a new model update, send the iteration index to the devices; each device can carry this iteration index when returning its model parameter update for the round, keeping the coordination device and the participating devices synchronized on the round count.
After locally training the model with this round's global model parameter update and its local training data and obtaining its model parameter update, each participating device can generate the first mask with its local second preset mask generator based at least on the iteration index of this model update. The second preset mask generators in all participating devices are identical. Each device feeds the iteration index into the second preset mask generator, which uses the iteration index as the base to produce the first mask. The length of the first mask, i.e., its number of elements, can be set in advance by configuring the generator's parameters. Because the devices share the same iteration index and the same generator in a given round, their first masks are identical; but for any single device, because the iteration index differs, the first mask differs from round to round, so the coordination device cannot infer the original model parameter update from a device's two consecutive masked updates, further strengthening the protection of the participating devices' private data.
Each participating device masks its model parameter update with the generated first mask and sends the masked model parameter update to the coordination device.
The coordination device performs the following operations in the TEE module (i.e., the following operations are visible only there):
generate the second mask with the first preset mask generator based at least on the iteration index of this model update. The first preset mask generator may be a preset generator, such as ANSI X9.17, and is identical to the second preset mask generator. Specifically, the iteration index of this model update is fed into the first preset mask generator, which uses the iteration index as the base to produce the second mask; the generator's parameters can be preconfigured so that the second mask has the same length as the first.
Because the same mask generator as in the participating devices is used, with the same iteration index as input, the generated second mask equals the first mask.
Remove the mask from each masked model parameter update based on the second mask, obtaining each model parameter update. Since the masked updates were produced with the first mask, unmasking with a second mask identical to the first recovers the original model parameter updates. And because the coordination device performs these operations inside the TEE module, it can itself obtain only the masked updates and not the originals, so the participating devices' privacy is not leaked to it; meanwhile the coordination device's TEE module can obtain the devices' model parameter updates and fuse them, guaranteeing the normal progress of federated learning. Since the coordination device and the participating devices each generate the matching masks locally, no extra communication overhead is needed to negotiate mask consistency, greatly reducing communication and power overhead.
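A sketch of masks generated deterministically from the iteration index, so that participant and coordinator derive identical masks without ever exchanging them. The seeded NumPy generator stands in for a pseudo-random generator such as ANSI X9.17, and the seeding scheme is an assumption:

```python
import numpy as np

def make_mask(iteration_index, length, device_id=0):
    """Deterministic mask generator shared by all parties (a sketch).

    Any party feeding in the same iteration index (and, optionally,
    device number) obtains the same mask."""
    seed = hash((iteration_index, device_id)) % (2**32)
    return np.random.default_rng(seed).normal(size=length)

# Same iteration index -> identical masks on participant and coordinator.
assert np.allclose(make_mask(3, 5), make_mask(3, 5))
# A different iteration index -> a different mask each round.
assert not np.allclose(make_mask(3, 5), make_mask(4, 5))
```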
Further, the coordination device can also perform the following in the TEE module: fuse the obtained model parameter updates of the participating devices into the global model parameter update; generate the third mask with a third preset mask generator from the iteration index of the next model update, where the third preset mask generator may be a preset generator, the same as or different from the first preset mask generator; and mask the global model parameter update with the third mask to obtain the masked global model parameter update.
The coordination device sends the masked global model parameter update to the participating devices, optionally carrying the iteration index of the next model update, to start the next round.
After receiving the masked global model parameter update of a new round, each participating device generates a fourth mask with a fourth preset mask generator from the iteration index of this model update carried in the message, removes the mask from the masked global update with the fourth mask to obtain the global model parameter update, and performs this round's local training based on it. The fourth preset mask generators in all participating devices are configured identically, and identically to the coordination device's third preset mask generator. Because the devices and the coordination device use the same mask generator with the same iteration index, the fourth mask equals the third mask. Since the masked global update was produced with the third mask, unmasking with a fourth mask identical to the third recovers the original global model parameter update, guaranteeing the normal progress of federated learning while the participating devices' privacy is not leaked to the coordination device.
A specific example follows:
1. Before federated learning, the coordination device and the K participating devices fix the mask length L, with L less than or equal to the length N of the model parameter update and the global model parameter update. t is the iteration index of the model update, identifying which round it is.
2. In round t, the k-th participating device trains to obtain model parameter update w(k,t), generates mask m(t), computes the masked update v(k,t) = w(k,t) + m(t), and sends it to the coordination device.
3. In the TEE module, the coordination device generates the same mask m(t) as the participating devices and removes it from v(k,t), obtaining w(k,t). Still in the TEE module, it fuses the w(k,t) sent by the devices into the global model parameter update w(t), generates mask p(t), and computes the masked global update u(t) = w(t) + p(t). The coordination device sends u(t) to the participating devices. Because w(k,t) and w(t) are obtained only inside the TEE module, the coordination device cannot learn them; and because the masks m(t) and p(t) are also generated inside the TEE module, the coordination device cannot infer w(k,t) or w(t) from v(k,t) and u(t).
4. The k-th participating device generates the same p(t) as in the coordination device's TEE module and removes it from u(t), obtaining w(t) = u(t) - p(t).
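The worked example above, with a shared per-round mask m(t) and third mask p(t), can be checked end-to-end; the seed choices are illustrative assumptions:

```python
import numpy as np

def mask_for(t, length, seed_base=0):
    """Round-keyed mask: same (t, seed_base) -> same mask on every party."""
    return np.random.default_rng(seed_base + t).normal(size=length)

K, N, t = 3, 4, 7                                   # devices, update length, round
w = [np.random.default_rng(k).normal(size=N) for k in range(K)]  # toy w(k,t)

m_t = mask_for(t, N)                                # shared per-round mask m(t)
v = [wk + m_t for wk in w]                          # masked updates v(k,t)

# Inside the TEE: regenerate m(t), unmask, fuse by averaging, re-mask with p(t).
w_fused = np.mean([vk - mask_for(t, N) for vk in v], axis=0)
p_t = mask_for(t + 1, N, seed_base=1000)            # third mask p(t)
u = w_fused + p_t                                   # only u leaves the TEE

# Each device regenerates p(t) locally and unmasks.
w_global = u - mask_for(t + 1, N, seed_base=1000)
assert np.allclose(w_global, np.mean(w, axis=0))
```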
Further, based on the first embodiment above, a third embodiment of the federated learning privacy data processing method of this application is proposed, in which step S20 includes:
Step S203: generate, with the first preset mask generator, a second mask corresponding to each participating device based at least on the iteration index of this model update and each participating device's device number;
Step S204: remove the mask from the masked model parameter updates sent by each participating device based on that device's corresponding second mask, obtaining each model parameter update, where each participating device generates its own first mask with its local second preset mask generator based at least on the iteration index of this model update and its own device number, and the first and second preset mask generators are identical.
Before the model training of federated learning begins, the coordination device can assign a device number to each participating device, using numeric or alphabetic numbering or distinct identifiers negotiated with the devices in advance, and send each device its number. For a participating device that newly joins during training, the coordination device can assign it a number, guaranteeing that every device's number is distinct, which enables the coordination device to manage the participating devices during federated learning.
After locally training the model with this round's global model parameter update and its local training data and obtaining its model parameter update, each participating device can generate its first mask with its local second preset mask generator based at least on the iteration index of this round and its own device number. The second preset mask generators are identical across devices; each device feeds the iteration index and its device number into the generator, which uses them as the base to produce the first mask. The first mask's length can be set in advance via the generator's parameters. Because the device numbers differ, the first masks generated by different devices differ.
Each participating device masks its model parameter update with its generated first mask and sends the masked model parameter update to the coordination device.
The coordination device performs the following operations in the TEE module (i.e., they are visible only there):
generate, with the first preset mask generator, the second mask corresponding to each participating device based at least on the iteration index of this model update and that device's device number. Specifically, since the devices' first masks differ, for each participating device the first preset mask generator is driven by the iteration index and that device's number to produce the corresponding second mask. The first preset mask generator may be a preset generator, such as ANSI X9.17, and is identical to the second preset mask generator; its parameters can be preconfigured so that each second mask's length matches the first mask's.
Because the same mask generator as in the participating devices is used, with the same iteration index and device number as inputs, each device's generated second mask equals that device's first mask.
Remove the mask from each device's masked model parameter update with that device's corresponding second mask, obtaining each device's model parameter update. Since the masked updates were produced with the first masks, unmasking with identical second masks recovers the original updates. Because these operations happen inside the TEE module, the coordination device itself obtains only the masked updates and not the originals, so the participating devices' privacy is not leaked to it; meanwhile the TEE module can obtain and fuse the devices' model parameter updates, guaranteeing the normal progress of federated learning. And since the matching masks are generated locally on both sides, no extra communication overhead is needed to negotiate mask consistency, greatly reducing communication and power overhead.
Further, the coordination device can also perform the following in the TEE module: fuse the obtained model parameter updates into the global model parameter update; generate, with a third preset mask generator (which may be the same as or different from the first), a third mask corresponding to each participating device from the iteration index of the next model update and each device's number; and mask the global model parameter update with the third masks to obtain the masked global model parameter updates.
The coordination device sends the masked global model parameter update to each participating device, optionally carrying the iteration index of the next model update, to start the next round.
After receiving the masked global model parameter update of a new round, each participating device generates a fourth mask with a fourth preset mask generator from the iteration index of this round carried in the message and its own device number, removes the mask with the fourth mask to obtain the global model parameter update, and performs this round's local training based on it. The fourth preset mask generators are configured identically across devices, and identically to the coordination device's third preset mask generator. Because each device and the coordination device use the same generator with the same iteration index and device number, each device's fourth mask equals its corresponding third mask; unmasking therefore recovers the original global model parameter update, guaranteeing the normal progress of federated learning while the participating devices' privacy is not leaked to the coordination device.
A specific example follows:
1. Before federated learning, the coordination device and the K participating devices fix the mask length L, with L less than or equal to the length N of the model parameter update and the global model parameter update. t is the iteration index of the model update, identifying which round it is.
2. In round t, the k-th participating device trains to obtain model parameter update w(k,t), generates mask m(k,t), computes the masked update v(k,t) = w(k,t) + m(k,t), and sends it to the coordination device.
3. In the TEE module, the coordination device generates the same mask m(k,t) as the participating device and removes it from v(k,t), obtaining w(k,t). Still in the TEE module, it fuses the w(k,t) sent by the devices into the global model parameter update w(t), generates mask p(k,t), and computes the masked global update u(k,t) = w(t) + p(k,t). The coordination device sends u(k,t) to the k-th participating device.
Fig. 3 shows what is visible in the TEE module and in the rest of the coordination device, respectively. Because w(k,t) and w(t) are obtained only inside the TEE module, the coordination device cannot learn them; and because the masks m(k,t) and p(k,t) are also generated inside the TEE module, the coordination device cannot infer w(k,t) or w(t) from v(k,t) and u(k,t).
4. The k-th participating device generates the same p(k,t) as in the coordination device's TEE module and removes it from u(k,t), obtaining w(t) = u(k,t) - p(k,t).
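The per-device variant, with masks m(k,t) and p(k,t) keyed by both the iteration index and the device number, can be checked the same way; the seeding scheme is an assumption:

```python
import numpy as np

def mask_for(t, k, length, tag=0):
    """Mask keyed by iteration index t and device number k (a sketch)."""
    seed = hash((tag, t, k)) % (2**32)
    return np.random.default_rng(seed).normal(size=length)

K, N, t = 2, 3, 5
w = [np.random.default_rng(10 + k).normal(size=N) for k in range(K)]  # toy w(k,t)
v = [w[k] + mask_for(t, k, N) for k in range(K)]      # per-device m(k,t)

# TEE: regenerate each m(k,t), unmask, fuse, then add per-device p(k,t).
w_fused = np.mean([v[k] - mask_for(t, k, N) for k in range(K)], axis=0)
u = [w_fused + mask_for(t + 1, k, N, tag=1) for k in range(K)]

# Device k removes its own p(k,t) and recovers the fused global update.
for k in range(K):
    assert np.allclose(u[k] - mask_for(t + 1, k, N, tag=1), np.mean(w, axis=0))
```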
In particular, when different third masks are used for different participating devices, the technical solution of this application also applies to vertical federated learning scenarios, i.e., scenarios in which the structures of the machine learning models trained by the participating devices may differ, for example when the devices train different neural network models.
Further, when the length of the third mask is smaller than the length of the model parameter update, step S30 includes:
Step S301: fuse the model parameter updates into the global model parameter update and generate the third mask with a third preset mask generator;
Step S302: complete the third mask with a preset completion method and mask the global model parameter update with the completed third mask to obtain the masked global model parameter update, where the completed third mask has the same length as the model parameter update.
When the preset length of the third mask is smaller than that of the model parameter update, after obtaining the participating devices' model parameter updates in the TEE module, the coordination device can also perform the following operations there:
fuse the model parameter updates into the global model parameter update and generate the third mask with the third preset mask generator; complete the third mask with a preset completion method and mask the global update with the completed third mask, the completed mask having the same length as the model parameter update. The preset completion method can be configured in advance, for example zero-filling: the part by which the third mask falls short is filled with zeros so that the third mask's length matches that of the model parameter update. For instance, if the model parameter update has length 100 and the third mask length 90, ten zero-valued elements can be appended to the third mask to bring its length to 100. Completing the mask in this way allows the mask to be shorter than the model parameter update, further lowering the computational complexity of mask generation. Moreover, the mask's length is visible only inside the coordination device's TEE module, preventing the coordination device from inferring the completed part from the mask length and the global model parameter update from the completed part, and guaranteeing that the participating devices' private data is not leaked to it.
By the same completion principle, when the first mask is shorter than the model parameter update, a participating device can complete the first mask and use the completed first mask to mask its model parameter update, obtaining the masked model parameter update.
进一步地,提出本申请联邦学习隐私数据处理方法第三实施例,在本实施例中,所述联邦学习隐私数据处理方法应用于参与设备,参与设备与协调设备通信连接,本申请实施例协调设备和参与设备可以是智能手机、个人计算机和服务器等设备,参与设备可支持联邦学习模型的训练,在此不做具体限制。在本实施例中,联邦学习隐私数据处理方法包括以下步骤:
步骤A10,接收协调设备发送的本次模型更新的带掩码的全局模型参数更新;
在本实施例中,协调设备与各参与设备可通过握手、身份认证预先建立通信连接,并确定本次联邦学习的待训练模型。在联邦学习过程中,协调设备与参与设备通过相互配合,对待训练模型进行多次迭代更新,得到最终收敛的待训练模型,即可结束对待训练模型的训练过程。在一次模型更新中,协调设备向各个参与设备发送本次模型更新的带掩码的全局模型参数更新,各个参与设备接收协调设备发送各本次模型更新的带掩码的全局模型参数更新。
步骤A20,对带掩码的全局模型参数更新去除掩码得到全局模型参数更新;
参与设备对带掩码的全局模型参数更新去除掩码得到全局模型参数更新。具体地,参与设备可采用上一次模型更新过程中,本地生成的对模型参数更新进行添加掩码操作的第一掩码,对带掩码的全局模型参数更新进行去除掩码操作。
步骤A30,根据参与设备本地的训练数据和全局模型参数更新对联邦学习的待训练模型进行本地训练,得到模型参数更新;
参与设备本地存储有用于对待训练模型进行本地训练的训练数据,根据本地的训练数据和全局模型参数更新,对联邦学习的待训练模型进行本地训练,得到模型参数更新。具体的本地训练过程与现有的联邦学习中参与设备采用本地数据训练待训练模型的过程相同,在此不进行详细赘述。
步骤A40,采用本地生成的本次模型更新的第一掩码对模型参数更新添加掩码,得到带掩码的模型参数更新并发送给协调设备。
参与设备生成本次模型更新的第一掩码,采用第一掩码对模型参数更新进行添加掩码操作,得到带掩码的模型参数更新,并将带掩码的模型参数更新发送给协调设备。参与设备可以是每一次模型更新都生成不同的第一掩码,各个参与设备生成的第一掩码可以相同也可以不相同;各个参与设备可通过预设的掩码生成方式生成各自的第一掩码,其中,预设的掩码生成方式可以是预先根据需要进行设置;各个参与设备生成的第一掩码的长度可以相同也可以不相同,可以预先设置各个参与设备的第一掩码的长度,长度可以小于或者等于模型参数更新的长度,以降低生成掩码的计算复杂度。
由于参与设备向协调设备发送的是带掩码的模型参数更新,且协调设备中无法获知参与设备的第一掩码,也无法获知参与设备的掩码生成方式,从而无法获知参与设备的模型参数更新,因此,参与设备的隐私数据不会泄露给协调设备。并且,各个参与设备之间各自在本地生成第一掩码,使得不用增加额外的通信开销来协商参与设备之间掩码的一致性,从而降低了通信开销和电量开销。
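步骤A10~A40描述的参与设备侧单轮流程,可以用如下示意性 Python 代码概括。该代码仅为草图:以"采用上一次模型更新的第一掩码去除掩码"的方案为例,本地训练用一步梯度下降代替,`participant_round`、`local_grad`、学习率 `lr` 等均为示意而假设的名称与参数,并非本申请实现:

```python
import numpy as np

def participant_round(u: np.ndarray, m_prev: np.ndarray, m_new: np.ndarray,
                      local_grad: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """单个参与设备在一次模型更新中的示意流程(对应步骤A10~A40)。"""
    w_global = u - m_prev                   # A20: 用上一次的第一掩码去除掩码
    w_local = w_global - lr * local_grad    # A30: 本地训练(此处以一步梯度下降示意)
    return w_local + m_new                  # A40: 加本次第一掩码后发送给协调设备

rng = np.random.default_rng(1)
u, m_prev, m_new, g = (rng.standard_normal(5) for _ in range(4))
v = participant_round(u, m_prev, m_new, g)  # A10 收到的 u 经处理后得到上传的 v
```

整个过程中参与设备对外只暴露带掩码的 v,本地训练结果 w_local 不离开设备。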
进一步地,在一实施例中,若各个参与设备在同一次的模型更新中生成的第一掩码都相同,则步骤A20可包括:
步骤A201,采用上一次模型更新中的第一掩码对带掩码的全局模型参数更新去除掩码,得到全局模型参数更新,其中,协调设备在上一次模型更新中,接收各参与设备发送的带掩码的模型参数更新,并融合各带掩码的模型参数更新得到带掩码的全局模型参数更新。
协调设备可以是在一次模型更新中,接收各个参与设备发送的带掩码的模型参数更新,并直接对各个带掩码的模型参数更新进行融合操作,由于协调设备对带掩码的模型参数更新进行融合操作,因此,融合得到的是带掩码的全局模型参数更新。协调设备将得到的带掩码的全局模型参数更新发送给各个参与设备,以使各个参与设备依据该带掩码的全局模型参数更新开始新一次的模型更新。
各个参与设备在接收到带掩码的全局模型参数更新后,开始本次模型更新。具体地,参与设备采用上一次模型参数更新中的第一掩码对带掩码的全局模型参数更新进行去除掩码,得到全局模型参数更新。由于各个参与设备在同一次模型更新中的第一掩码相同,因此,各个参与设备可采用各自的第一掩码对带掩码的全局模型参数更新进行去除掩码,所得到的全局模型参数更新,与直接对各个参与设备的模型参数更新进行融合操作得到的全局模型参数更新相同,原理是:假设参与设备有两个,分别是设备1和设备2,设备1与设备2在上次模型更新时,分别得到了模型参数更新w1和w2,并分别各自产生相同的第一掩码m,分别采用m对w1和w2进行添加掩码操作,得到带掩码的模型参数更新v1=w1+m和v2=w2+m发送给协调设备;协调设备对v1和v2进行融合操作,如平均,得到带掩码的全局模型参数更新u=(w1+w2)/2+m,发送给设备1和设备2;设备1和设备2分别采用上一次模型更新时的第一掩码m,对u进行去掩码操作w=u-m,得到全局模型参数更新w=(w1+w2)/2;而直接对w1和w2进行融合操作,如平均,得到的结果也是(w1+w2)/2。
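上段"相同掩码在平均后仍可整体抵消"的原理,可以用如下几行示意性 Python 代码验证(假设掩码为实数向量、融合操作为等权平均,仅为草图而非本申请实现):

```python
import numpy as np

rng = np.random.default_rng(42)
w1, w2 = rng.standard_normal(4), rng.standard_normal(4)  # 设备1、2 的模型参数更新
m = rng.standard_normal(4)                               # 同一次更新中相同的第一掩码

v1, v2 = w1 + m, w2 + m   # 各自添加掩码后上传
u = (v1 + v2) / 2         # 协调设备直接对带掩码的更新求平均: (w1+w2)/2 + m
w_global = u - m          # 参与设备用上一次的第一掩码 m 去除掩码

assert np.allclose(w_global, (w1 + w2) / 2)  # 与直接平均 w1、w2 的结果一致
```

之所以成立,是因为等权平均(或任何权重和为 1 的加权平均)后掩码 m 的系数仍为 1,故减去一份 m 即可完全去除掩码。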
基于上述原理,在本实施例中,可实现在协调设备不能获知参与设备的模型参数更新的情况下,即参与设备的隐私数据不泄露给协调设备的情况下,协调设备能够对参与设备的模型参数更新进行融合处理,保证联邦学习的正常进行。并且,带掩码的模型参数更新的长度并不会增加,因此,不会造成额外的通信带宽要求。并且,各个参与设备在各自本地生成相同的掩码,无需增加额外的通信开销来协商各个参与设备之间掩码的一致性,极大地降低了通信开销和电量开销。
进一步地,基于上述第三实施例,提出本申请联邦学习隐私数据处理方法第四实施例,在本申请联邦学习隐私数据处理方法第四实施例中,提出一种与上述步骤A201中不同的方案,以实现参与设备的隐私数据不会泄露给协调设备。具体地,协调设备中包括TEE模块,所述步骤A20包括:
步骤A203,生成与协调设备的第三掩码相同的第四掩码;
步骤A204,采用第四掩码对带掩码的全局模型参数更新去除掩码得到全局模型参数更新,其中,协调设备在上一次模型更新中,接收各参与设备发送的带掩码的模型参数更新,并在TEE模块中生成与各参与设备上一次模型更新的第一掩码相同的第二掩码,基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新,融合各模型参数更新得到全局模型参数更新,采用生成的第三掩码对全局模型参数更新添加掩码,得到本次模型更新的带掩码的全局模型参数更新。
协调设备在一次模型参数更新中,接收各个参与设备发送的带掩码的模型参数更新,并在TEE模块中,生成与本次模型更新中各个参与设备的第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新。需要说明的是,若各参与设备各自生成的第一掩码相同,则协调设备生成一个与该第一掩码相同的第二掩码即可,若参与设备各自生成的第一掩码不相同,则协调设备生成多个第二掩码,分别与各个参与设备的第一掩码对应相同。协调设备的TEE模块中可预置与各参与设备相同的掩码生成方式,使得协调设备在TEE模块中生成的第二掩码与第一掩码相同。
由于协调设备在TEE模块中采用与第一掩码相同的第二掩码对带掩码的模型参数更新进行去除掩码操作,因此,能够还原得到各参与设备的原始模型参数更新。并且,由于是在协调设备的TEE模块中进行去除掩码操作,去除掩码得到的模型参数更新只在TEE模块中可见,协调设备只能获得带掩码的模型参数更新,无法获得各参与设备的模型参数更新,从而不会窃取参与设备的隐私,保证了参与设备的隐私不会泄露给协调设备。
协调设备在TEE模块中,融合各模型参数更新得到全局模型参数更新,并生成第三掩码,采用第三掩码对全局模型参数更新添加掩码,得到带掩码的全局模型参数更新。其中,融合各模型参数更新得到全局模型参数更新,可以是通过融合函数对各个模型参数更新进行融合,融合函数可以是进行加权平均操作的函数。协调设备可以是每一次模型更新中都生成不同的第三掩码,若每一次模型更新中都生成不同的第三掩码,由于融合得到的全局模型参数更新用于下一次的模型更新,因此,协调设备生成的第三掩码对应下一次模型更新。协调设备的TEE模块中可采用与生成第二掩码相同的掩码生成方式生成第三掩码;第三掩码的长度也可以是预先进行设置,可以与第一掩码的长度相同,也可以不相同,同样地,为了降低生成掩码的计算复杂度,第三掩码的长度可以小于或等于全局模型参数更新的长度。
协调设备从TEE模块中获取带掩码的全局模型参数更新,将带掩码的全局模型参数更新发送给各个参与设备。若协调设备在本次模型更新中检测到待训练模型收敛,则可以将带掩码的全局模型参数更新发送给各个参与设备,供各个参与设备根据带掩码的全局模型参数更新确定待训练模型的最终参数,结束本次联邦学习。若协调设备在本次模型更新中未检测到待训练模型收敛,则可以将带掩码的全局模型参数更新发送给各个参与设备,各个参与设备根据带掩码的全局模型参数更新进行下一次模型更新;各参与设备在接收到协调设备发送的带掩码的全局模型参数更新后,各自本地生成与协调设备的第三掩码相同的第四掩码,采用第四掩码,对带掩码的全局模型参数更新进行去除掩码操作,得到全局模型参数更新。各参与设备中可预置掩码生成方式,该掩码生成方式与协调设备中生成第三掩码的掩码生成方式相同,以使得参与设备生成的第四掩码与协调设备的第三掩码相同。
由于参与设备是采用与协调设备的第三掩码相同的第四掩码对带掩码的全局模型参数更新进行去除掩码操作,因此,参与设备能够还原得到协调设备TEE模块中的原始全局模型参数更新,从而保证了参与设备获取到准确的全局模型参数更新,不会造成数据的偏差;并且,由于协调设备是从TEE模块中获取带掩码的全局模型参数更新,而原始的全局模型参数更新只能在TEE模块中可见,因此,协调设备无法获知原始的全局模型参数更新,从而无法窃取各个参与设备的隐私数据。
在本实施例中,通过各参与设备基于各自生成的第一掩码对各自训练得到的模型参数更新添加掩码,得到各自带掩码的模型参数更新;协调设备接收各参与设备发送的带掩码的模型参数更新,在TEE模块中,生成与第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新;在TEE模块中,融合各模型参数更新得到全局模型参数更新,并采用生成的第三掩码对全局模型参数更新添加掩码,得到带掩码的全局模型参数更新;将带掩码的全局模型参数更新发送给各参与设备,各参与设备基于各自生成的与第三掩码相同的第四掩码,对带掩码的全局模型参数更新去除掩码得到全局模型参数更新。本实施例中,通过融合掩码技术和TEE技术,使得协调设备无法获得各参与设备的模型参数更新和全局模型参数更新,但能够在TEE模块中得到参与设备的模型参数更新并进行融合操作,实现了在不泄露给协调设备隐私的情况下,完成联邦学习的模型更新过程;并通过掩码技术,使得模型参数更新和全局模型参数更新既能够安全传输,又不会增加通信带宽要求;并且,通过协调设备和参与设备各自在本地生成掩码,保证生成用于添加掩码操作和去除掩码操作的掩码相同,使得参与设备与协调设备之间、参与设备和参与设备之间、或参与设备与第三方服务器之间,无需增加额外的通信开销去协商掩码的一致性,特别是在每一次模型更新中都更换掩码的场景,极大地降低了通信开销和电量开销。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例所述的方法。
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。

Claims (20)

  1. 一种联邦学习隐私数据处理方法,其中,所述联邦学习隐私数据处理方法应用于协调设备,协调设备中包括可信执行环境TEE模块,协调设备与多个参与设备通信连接,所述联邦学习隐私数据处理方法包括以下步骤:
    接收各参与设备发送的带掩码的模型参数更新,其中,各参与设备基于各自生成的第一掩码对各自训练得到的模型参数更新添加掩码,得到各自带掩码的模型参数更新;
    在TEE模块中,生成与第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新;
    在TEE模块中,融合各模型参数更新得到全局模型参数更新,并采用生成的第三掩码对全局模型参数更新添加掩码,得到带掩码的全局模型参数更新;以及,
    将带掩码的全局模型参数更新发送给各参与设备,以供各参与设备基于各自生成的与第三掩码相同的第四掩码,对带掩码的全局模型参数更新去除掩码得到全局模型参数更新。
  2. 如权利要求1所述的联邦学习隐私数据处理方法,其中,所述生成与第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新的步骤包括:
    至少根据本次模型更新的迭代索引,采用第一预设掩码生成器生成第二掩码;
    基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新,其中,各参与设备至少根据本次模型更新的迭代索引,采用各自本地的第二预设掩码生成器生成第一掩码,第一预设掩码生成器与第二预设掩码生成器相同。
  3. 如权利要求1所述的联邦学习隐私数据处理方法,其中,所述生成与第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新的步骤包括:
    至少根据本次模型更新的迭代索引和各参与设备的设备编号,采用第一预设掩码生成器生成与各参与设备对应的各第二掩码;
    分别基于每个参与设备对应的第二掩码,对每个参与设备发送的各带掩码的模型参数更新去除掩码,得到各模型参数更新,其中,各参与设备至少根据本次模型更新的迭代索引和各自的设备编号,采用各自本地的第二预设掩码生成器生成各自的第一掩码,第一预设掩码生成器与第二预设掩码生成器相同。
  4. 如权利要求1所述的联邦学习隐私数据处理方法,其中,当第三掩码的长度小于模型参数更新的长度时,所述融合各模型参数更新得到全局模型参数更新,并采用生成的第三掩码对全局模型参数更新添加掩码,得到带掩码的全局模型参数更新的步骤包括:
    融合各模型参数更新得到全局模型参数更新,并采用第三预设掩码生成器生成第三掩码;
    通过预设补全方法对第三掩码进行补全,采用补全后的第三掩码对全局模型参数更新添加掩码,得到带掩码的全局模型参数更新,其中,补全后的第三掩码的长度与模型参数更新的长度相同。
  5. 如权利要求1所述的联邦学习隐私数据处理方法,其中,所述融合各模型参数更新得到全局模型参数更新的步骤之后,还包括:
    在TEE模块中根据全局模型参数更新判断联邦学习的待训练模型是否收敛;
    若待训练模型收敛则结束对待训练模型的训练,或者若迭代次数达到预设最大迭代次数则结束对待训练模型的训练,或者若训练时间达到最大训练时间则结束对待训练模型的训练。
  6. 一种联邦学习隐私数据处理方法,其中,所述联邦学习隐私数据处理方法应用于参与设备,参与设备与协调设备通信连接,所述联邦学习隐私数据处理方法包括以下步骤:
    接收协调设备发送的本次模型更新的带掩码的全局模型参数更新;
    对带掩码的全局模型参数更新去除掩码得到全局模型参数更新;
    根据参与设备本地的训练数据和全局模型参数更新对联邦学习的待训练模型进行本地训练,得到模型参数更新;以及,
    采用本地生成的本次模型更新的第一掩码对模型参数更新添加掩码,得到带掩码的模型参数更新并发送给协调设备。
  7. 如权利要求6所述的联邦学习隐私数据处理方法,其中,协调设备中包括可信执行环境TEE模块,
    所述对带掩码的全局模型参数更新去除掩码得到全局模型参数更新的步骤包括:
    生成与协调设备的第三掩码相同的第四掩码;
    采用第四掩码对带掩码的全局模型参数更新去除掩码得到全局模型参数更新,其中,协调设备在上一次模型更新中,接收各参与设备发送的带掩码的模型参数更新,并在TEE模块中生成与各参与设备上一次模型更新的第一掩码相同的第二掩码,基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新,融合各模型参数更新得到全局模型参数更新,采用生成的第三掩码对全局模型参数更新添加掩码,得到本次模型更新的带掩码的全局模型参数更新。
  8. 如权利要求6所述的联邦学习隐私数据处理方法,其中,所述对带掩码的全局模型参数更新去除掩码得到全局模型参数更新的步骤包括:
    采用上一次模型更新中的第一掩码对带掩码的全局模型参数更新去除掩码,得到全局模型参数更新,其中,协调设备在上一次模型更新中,接收各参与设备发送的带掩码的模型参数更新,并融合各带掩码的模型参数更新得到带掩码的全局模型参数更新。
  9. 一种设备,其中,所述设备是协调设备,协调设备中包括可信执行环境TEE模块,协调设备与多个参与设备通信连接,所述设备包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的联邦学习隐私数据处理程序,所述联邦学习隐私数据处理程序被所述处理器执行时实现如下步骤:
    接收各参与设备发送的带掩码的模型参数更新,其中,各参与设备基于各自生成的第一掩码对各自训练得到的模型参数更新添加掩码,得到各自带掩码的模型参数更新;
    在TEE模块中,生成与第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新;
    在TEE模块中,融合各模型参数更新得到全局模型参数更新,并采用生成的第三掩码对全局模型参数更新添加掩码,得到带掩码的全局模型参数更新;以及,
    将带掩码的全局模型参数更新发送给各参与设备,以供各参与设备基于各自生成的与第三掩码相同的第四掩码,对带掩码的全局模型参数更新去除掩码得到全局模型参数更新。
  10. 如权利要求9所述的设备,其中,所述生成与第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新的步骤包括:
    至少根据本次模型更新的迭代索引,采用第一预设掩码生成器生成第二掩码;
    基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新,其中,各参与设备至少根据本次模型更新的迭代索引,采用各自本地的第二预设掩码生成器生成第一掩码,第一预设掩码生成器与第二预设掩码生成器相同。
  11. 如权利要求9所述的设备,其中,所述生成与第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新的步骤包括:
    至少根据本次模型更新的迭代索引和各参与设备的设备编号,采用第一预设掩码生成器生成与各参与设备对应的各第二掩码;
    分别基于每个参与设备对应的第二掩码,对每个参与设备发送的各带掩码的模型参数更新去除掩码,得到各模型参数更新,其中,各参与设备至少根据本次模型更新的迭代索引和各自的设备编号,采用各自本地的第二预设掩码生成器生成各自的第一掩码,第一预设掩码生成器与第二预设掩码生成器相同。
  12. 如权利要求9所述的设备,其中,当第三掩码的长度小于模型参数更新的长度时,所述融合各模型参数更新得到全局模型参数更新,并采用生成的第三掩码对全局模型参数更新添加掩码,得到带掩码的全局模型参数更新的步骤包括:
    融合各模型参数更新得到全局模型参数更新,并采用第三预设掩码生成器生成第三掩码;
    通过预设补全方法对第三掩码进行补全,采用补全后的第三掩码对全局模型参数更新添加掩码,得到带掩码的全局模型参数更新,其中,补全后的第三掩码的长度与模型参数更新的长度相同。
  13. 如权利要求9所述的设备,其中,所述融合各模型参数更新得到全局模型参数更新的步骤之后,还包括:
    在TEE模块中根据全局模型参数更新判断联邦学习的待训练模型是否收敛;
    若待训练模型收敛则结束对待训练模型的训练,或者若迭代次数达到预设最大迭代次数则结束对待训练模型的训练,或者若训练时间达到最大训练时间则结束对待训练模型的训练。
  14. 一种设备,其中,所述设备是参与设备,参与设备与协调设备通信连接,所述设备包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的联邦学习隐私数据处理程序,所述联邦学习隐私数据处理程序被所述处理器执行时实现如下步骤:
    接收协调设备发送的本次模型更新的带掩码的全局模型参数更新;
    对带掩码的全局模型参数更新去除掩码得到全局模型参数更新;
    根据参与设备本地的训练数据和全局模型参数更新对联邦学习的待训练模型进行本地训练,得到模型参数更新;以及,
    采用本地生成的本次模型更新的第一掩码对模型参数更新添加掩码,得到带掩码的模型参数更新并发送给协调设备。
  15. 如权利要求14所述的设备,其中,协调设备中包括可信执行环境TEE模块,
    所述对带掩码的全局模型参数更新去除掩码得到全局模型参数更新的步骤包括:
    生成与协调设备的第三掩码相同的第四掩码;
    采用第四掩码对带掩码的全局模型参数更新去除掩码得到全局模型参数更新,其中,协调设备在上一次模型更新中,接收各参与设备发送的带掩码的模型参数更新,并在TEE模块中生成与各参与设备上一次模型更新的第一掩码相同的第二掩码,基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新,融合各模型参数更新得到全局模型参数更新,采用生成的第三掩码对全局模型参数更新添加掩码,得到本次模型更新的带掩码的全局模型参数更新。
  16. 如权利要求14所述的设备,其中,所述对带掩码的全局模型参数更新去除掩码得到全局模型参数更新的步骤包括:
    采用上一次模型更新中的第一掩码对带掩码的全局模型参数更新去除掩码,得到全局模型参数更新,其中,协调设备在上一次模型更新中,接收各参与设备发送的带掩码的模型参数更新,并融合各带掩码的模型参数更新得到带掩码的全局模型参数更新。
  17. 一种联邦学习隐私数据处理系统,其中,所述联邦学习隐私数据处理系统包括:至少一个协调设备和至少一个参与设备,所述协调设备为权利要求9所述的设备,所述参与设备为权利要求14所述的设备。
  18. 一种计算机可读存储介质,其中,所述计算机可读存储介质上存储有联邦学习隐私数据处理程序,所述联邦学习隐私数据处理程序被处理器执行时实现如下步骤:
    接收各参与设备发送的带掩码的模型参数更新,其中,各参与设备基于各自生成的第一掩码对各自训练得到的模型参数更新添加掩码,得到各自带掩码的模型参数更新;
    在TEE模块中,生成与第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新;
    在TEE模块中,融合各模型参数更新得到全局模型参数更新,并采用生成的第三掩码对全局模型参数更新添加掩码,得到带掩码的全局模型参数更新;以及,
    将带掩码的全局模型参数更新发送给各参与设备,以供各参与设备基于各自生成的与第三掩码相同的第四掩码,对带掩码的全局模型参数更新去除掩码得到全局模型参数更新。
  19. 如权利要求18所述的计算机可读存储介质,其中,所述生成与第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新的步骤包括:
    至少根据本次模型更新的迭代索引,采用第一预设掩码生成器生成第二掩码;
    基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新,其中,各参与设备至少根据本次模型更新的迭代索引,采用各自本地的第二预设掩码生成器生成第一掩码,第一预设掩码生成器与第二预设掩码生成器相同。
  20. 如权利要求18所述的计算机可读存储介质,其中,所述生成与第一掩码相同的第二掩码,并基于第二掩码对各带掩码的模型参数更新去除掩码,得到各模型参数更新的步骤包括:
    至少根据本次模型更新的迭代索引和各参与设备的设备编号,采用第一预设掩码生成器生成与各参与设备对应的各第二掩码;
    分别基于每个参与设备对应的第二掩码,对每个参与设备发送的各带掩码的模型参数更新去除掩码,得到各模型参数更新,其中,各参与设备至少根据本次模型更新的迭代索引和各自的设备编号,采用各自本地的第二预设掩码生成器生成各自的第一掩码,第一预设掩码生成器与第二预设掩码生成器相同。
PCT/CN2019/119237 2019-09-20 2019-11-18 联邦学习隐私数据处理方法、设备、系统及存储介质 WO2021051629A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910892806.9 2019-09-20
CN201910892806.9A CN110674528B (zh) 2019-09-20 2019-09-20 联邦学习隐私数据处理方法、设备、系统及存储介质

Publications (1)

Publication Number Publication Date
WO2021051629A1 true WO2021051629A1 (zh) 2021-03-25

Family

ID=69077085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/119237 WO2021051629A1 (zh) 2019-09-20 2019-11-18 联邦学习隐私数据处理方法、设备、系统及存储介质

Country Status (2)

Country Link
CN (1) CN110674528B (zh)
WO (1) WO2021051629A1 (zh)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871160A (zh) * 2016-09-26 2018-04-03 谷歌公司 通信高效联合学习
WO2018174873A1 (en) * 2017-03-22 2018-09-27 Visa International Service Association Privacy-preserving machine learning
CN109308418A (zh) * 2017-07-28 2019-02-05 阿里巴巴集团控股有限公司 一种基于共享数据的模型训练方法及装置
CN109753820A (zh) * 2019-01-10 2019-05-14 贵州财经大学 数据开放共享的方法、装置及系统
US20190227980A1 (en) * 2018-01-22 2019-07-25 Google Llc Training User-Level Differentially Private Machine-Learned Models

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150324690A1 (en) * 2014-05-08 2015-11-12 Microsoft Corporation Deep Learning Training System
JP6535112B2 (ja) * 2016-02-16 2019-06-26 日本電信電話株式会社 マスク推定装置、マスク推定方法及びマスク推定プログラム
GB201610883D0 (en) * 2016-06-22 2016-08-03 Microsoft Technology Licensing Llc Privacy-preserving machine learning
CN109871702A (zh) * 2019-02-18 2019-06-11 深圳前海微众银行股份有限公司 联邦模型训练方法、系统、设备及计算机可读存储介质
CN110263936B (zh) * 2019-06-14 2023-04-07 深圳前海微众银行股份有限公司 横向联邦学习方法、装置、设备及计算机存储介质
CN110263908B (zh) * 2019-06-20 2024-04-02 深圳前海微众银行股份有限公司 联邦学习模型训练方法、设备、系统及存储介质


Also Published As

Publication number Publication date
CN110674528B (zh) 2024-04-09
CN110674528A (zh) 2020-01-10

