CN117592584A - Random multi-model privacy protection method based on federated learning

Random multi-model privacy protection method based on federated learning

Info

Publication number
CN117592584A
CN117592584A
Authority
CN
China
Prior art keywords
model
data
server
participants
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311689003.6A
Other languages
Chinese (zh)
Inventor
张泽飞
惠蓉
王崇文
董银环
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
West Yunnan University Of Applied Sciences
Original Assignee
West Yunnan University Of Applied Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by West Yunnan University Of Applied Sciences filed Critical West Yunnan University Of Applied Sciences
Priority to CN202311689003.6A priority Critical patent/CN117592584A/en
Publication of CN117592584A publication Critical patent/CN117592584A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a random multi-model privacy protection method based on federated learning, comprising a participant selection step, a model sharing pool construction step, a randomization step, and a model evaluation step. Compared with the prior art, the invention has the following advantages: it not only addresses the imbalanced data distribution among the parties but also strengthens data privacy protection through a differentiation mechanism; the shared model pool concept avoids hidden dangers such as model attacks; the differentiation concept lets different models be selected for training according to each participant's actual situation; enhanced randomization confuses the server's memory of each participant's data, thereby protecting participant data privacy; and a heuristic selection strategy selects participants according to the symmetry of data from the same application across different users.

Description

Random multi-model privacy protection method based on federated learning
Technical Field
The invention relates to the field of privacy protection in federated learning, and in particular to a random multi-model privacy protection method based on federated learning.
Background
Federated learning offers many benefits: because the parties involved in training do not need to exchange raw data directly, it protects user privacy and data security to some extent. However, as internet technology develops, users' demands for privacy protection grow, and federated learning models face new challenges in data privacy and security protection.
As an important branch of artificial intelligence, machine learning has achieved notable results in fields such as intelligent transportation, financial analysis, recommendation systems, and intelligent healthcare. However, in mainstream machine learning training and use, data constantly faces the risk of leakage, posing great challenges to personal privacy and data security. For example, events such as the Facebook and Yahoo data leaks have attracted considerable attention in industry and academia, making privacy protection and data security key issues for machine learning applications. With the rapid development of artificial intelligence, the demand for effective data sharing and fusion keeps growing; federated learning emerged to address the challenges of data privacy and data silos, and it has been widely applied in fields such as healthcare and financial analysis.
Federated learning is a distributed machine learning framework oriented to multi-user scenarios, designed to address the data silo and privacy protection problems faced by artificial intelligence. Under this technique, a server coordinates multiple participants to jointly train a global model without requiring the participants to upload their raw data, so that the data remains available but invisible. Training data in federated learning is distributed across the data owners' devices; the owners train locally and share only model parameters with the service provider, which aggregates the owners' updates with some algorithm (e.g., FedAvg or FedProx) to train the global model.
Typical federated learning generally comprises the following steps: (1) all participants download the latest model from the server; (2) each participant computes gradients using its local data, encrypts them, and uploads them to the server, which aggregates the gradients to update the model parameters; (3) the server distributes the updated model to each participant; (4) all participants update their local models. Steps (1)-(4) are iterated in turn until the model converges or a specified termination condition is reached.
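Steps (1)-(4) above can be sketched as a minimal simulation. Everything here is an illustrative assumption, not the patented method: the toy "gradient" (the mean residual toward a local target), the function names, and the use of plain lists as model parameters; aggregation is sample-weighted averaging in the style of FedAvg.

```python
# Illustrative sketch of one federated learning loop with FedAvg-style
# aggregation. Names, the toy gradient, and the data layout are assumptions.

def local_update(global_params, data, lr=0.1):
    """Step 2: a participant computes a toy 'gradient' from local data
    (here: the residual toward a local target) and updates the parameters."""
    grad = [p - d for p, d in zip(global_params, data)]
    return [p - lr * g for p, g in zip(global_params, grad)]

def fed_avg(updates, weights):
    """Server aggregates participant parameters by weighted average."""
    total = sum(weights)
    dim = len(updates[0])
    return [sum(w * u[i] for u, w in zip(updates, weights)) / total
            for i in range(dim)]

def training_round(global_params, participants):
    """One iteration of steps (1)-(4): download, local update, aggregate."""
    updates = [local_update(global_params, data) for data, _ in participants]
    weights = [n for _, n in participants]  # weight by local sample count
    return fed_avg(updates, weights)

# participants: (local data target, number of local samples)
participants = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
params = [0.0, 0.0]
for _ in range(5):  # iterate until convergence or a termination condition
    params = training_round(params, participants)
```

In a real deployment the uploaded gradients would be encrypted, as the step description notes; that is omitted here for brevity.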
At present, federated learning privacy protection still faces the following problems:
(1) Privacy protection mechanisms are costly
In recent years, researchers have proposed a large number of privacy protection methods based on techniques such as differential privacy, secret sharing, homomorphic encryption, and secure multiparty computation. However, these methods tend to carry large computational and communication overheads, which hurt the usability and real-time performance of applications. Designing efficiency-balanced privacy protection tailored to the specific requirements of internet services, especially a mechanism that can satisfy real-time data queries, therefore remains a challenge for privacy protection scheme design.
(2) Privacy protection mechanisms are not robust
Differential privacy, homomorphic encryption, secure multiparty computation, and similar techniques can improve the security of federated learning, but these privacy protection mechanisms also affect model performance and accuracy, reducing the robustness of federated learning and, to some extent, the applicability of the model. Data in real federated learning scenarios is often not independent and identically distributed (non-IID), yet existing federated learning privacy protection mechanisms assume IID data to some degree. Designing data security and privacy protection mechanisms under non-IID data is therefore another major challenge for federated learning.
(3) Lack of an effective incentive mechanism
Ensuring that the server and the local participants are honest rather than malicious is a further challenge in privacy protection.
Disclosure of Invention
The technical problem to be solved by the invention is that, for more and more federated learning applications to be deployed safely, designing an efficiency-balanced privacy protection mechanism is imperative. The invention combines a heuristic participant selection method, a shared model pool, randomization, and related techniques to provide a random multi-model privacy protection mechanism based on federated learning, so as to improve on existing federated learning privacy protection mechanisms and provide better privacy protection for federated learning applications.
In order to solve the above technical problems, the invention provides the following technical scheme: a random multi-model privacy protection method based on federated learning, comprising the following steps:
step one, a participant selection step, in which the server sends heuristic information to clients, confirms each client's communication condition, data quality, and willingness to participate from the clients' feedback, and selects some of the clients as participants in model training;
step two, a model sharing pool construction step, in which a model sharing pool containing multiple models is built, and different models are selected to participate in training according to the participants' data distributions;
step three, a randomization step, used to confuse the server's memory of the participants' parameter information so that the server cannot associate parameter information with particular participants;
and step four, a model evaluation step, used to evaluate each participant's contribution and data quality and to provide a participant selection mechanism for training.
Compared with the prior art, the invention has the following advantages: it not only addresses the imbalanced data distribution among the parties but also strengthens data privacy protection through a differentiation mechanism; the shared model pool concept avoids hidden dangers such as model attacks; the differentiation concept lets different models be selected for training according to each participant's actual situation; enhanced randomization confuses the server's memory of each participant's data, thereby protecting participant data privacy; and a heuristic selection strategy selects participants according to the symmetry of data from the same application across different users.
Further, in the method, the server provides a shared model pool, and different participants can select training models from the pool according to their data volume and data distribution.
Further, in step four, the symmetry principle of training results on the same type of data is used to evaluate the quality of model training; through repeated iteration, poorly performing participants are periodically removed and new participants are randomly added.
Further, the method adopts a hypothesis-test approach: let M_ALL be the model trained by centralized machine learning on the combined data of all participants, and M_ONLY the model trained independently by a participant. If P_ALL and P_ONLY denote the performance of M_ALL and M_ONLY respectively, and there exists a non-negative real number L_P such that |P_ALL - P_ONLY| < L_P, then the participant may continue to participate in the next round of training.
Further, the participant selection in the method uses a heuristic selection strategy: if the server wants to randomly select N clients to participate in model training, it sends heuristic messages to more than N devices, and the clients reply to the server after receiving them; the server selects N participants from the replying clients based on the response speed and data information returned with the messages. Concretely, the server sends exploration information and issues invitations to the explored clients; a client that receives an invitation and intends to participate in training returns an acknowledgement together with a self-evaluation to the server, and the server selects a suitable number of clients for model training from the acknowledging clients according to the returned data. The exploration information sent by the server contains model information, and participants are then selected according to the symmetry of data from the same application across different users.
Drawings
FIG. 1 is a schematic diagram of the architecture of the federated-learning-based random multi-model privacy protection method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
In a specific implementation of the invention, as shown in FIG. 1, the invention provides a random multi-model privacy protection method based on federated learning. During training, the server assigns different models to participants with different data distribution characteristics; each participant uploads its trained model parameters to the server, and the server integrates the feature parameters by weighted averaging. In the end the server knows only each participant's parameters, not each participant's model or data. Because the training models selected by the participants differ, the participants do not need to perform operations such as enforcing data consistency. The algorithm also adds a participant training-quality evaluation mechanism: new participants are continuously added during training to replace poorly performing ones, so that a well-performing model can genuinely be trained from small amounts of data. The algorithm therefore both addresses the imbalanced data distribution among the parties and strengthens data privacy protection through the differentiation mechanism.
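The server-side integration described above can be sketched as follows. The grouping of uploads by a model identifier, the tuple layout, and the sample-count weighting are all illustrative assumptions; the point is that the server receives only (model id, parameters, sample count) and averages parameters per model type, never seeing raw data.

```python
# Sketch of weighted-average integration when participants train
# different models. Names and the upload format are assumptions.
from collections import defaultdict

def integrate(uploads):
    """uploads: list of (model_id, params, n_samples).
    Returns {model_id: sample-weighted average of the parameters}."""
    groups = defaultdict(list)
    for model_id, params, n in uploads:
        groups[model_id].append((params, n))
    merged = {}
    for model_id, items in groups.items():
        total = sum(n for _, n in items)
        dim = len(items[0][0])
        merged[model_id] = [
            sum(n * p[i] for p, n in items) / total for i in range(dim)
        ]
    return merged

uploads = [
    ("cnn", [1.0, 1.0], 10),  # two participants trained the "cnn" model
    ("cnn", [3.0, 3.0], 30),
    ("mlp", [5.0], 20),       # a third trained a different model
]
merged = integrate(uploads)   # the server learns parameters only, per model
```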
In one embodiment of the invention, as shown in FIG. 1, the invention introduces the concept of a shared model pool. In existing federated learning mechanisms, all participants train the same model. That scheme is poorly suited to participants whose data is not independent and identically distributed, and it also opens the possibility of model attacks by malicious participants: the trained global model may leak to other participants or even to devices that never participated in training. Meanwhile, statistical heterogeneity significantly increases the complexity of problem modeling, theoretical analysis, and evaluation of solutions. In the invention, the server provides a shared model pool from which different participants select training models according to their own conditions, such as data volume and data distribution. This strategy greatly alleviates the adverse effects of maldistributed data, such as the omission of data-starved participants. In addition, because the server only knows which training model each participant has currently selected, hidden dangers such as model attacks are well avoided.
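The shared model pool idea can be sketched as below. The pool contents, the minimum-sample thresholds, and the label-entropy heuristic for measuring data balance are all assumptions invented for illustration; the patent only specifies that participants choose models according to data volume and distribution.

```python
# Sketch of a shared model pool: the server publishes candidate models
# with data requirements, and each participant picks one that fits its
# local data. Pool entries and thresholds are illustrative assumptions.
import math

MODEL_POOL = {
    "small_linear": {"min_samples": 0},
    "medium_tree":  {"min_samples": 100},
    "large_net":    {"min_samples": 10_000},
}

def label_entropy(labels):
    """Shannon entropy of the local label distribution (a balance measure)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def pick_model(n_samples, labels):
    """Choose a model the participant's data volume qualifies for;
    data-starved participants still get a model instead of being omitted."""
    eligible = [name for name, req in MODEL_POOL.items()
                if n_samples >= req["min_samples"]]
    if label_entropy(labels) < 0.5:
        return eligible[0]  # skewed data -> the simplest eligible model
    # balanced data -> the largest model the data volume supports
    return max(eligible, key=lambda m: MODEL_POOL[m]["min_samples"])

choice = pick_model(500, [0, 1, 0, 1])  # balanced labels, 500 samples
```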
In one embodiment of the invention, as shown in FIG. 1, the invention introduces the concept of differentiation. Differentiation refers to the differences among the participants' devices and among their data distributions in federated learning. Because of these differences, having different participants train the same model, as in classical federated learning, has certain drawbacks; different models should therefore be chosen for training according to each participant's actual situation.
In one embodiment of the invention, as shown in FIG. 1, the invention enhances the concept of randomization. A curious server may learn private information about client training data, for example by means of generative adversarial networks. The shared model pool and differentiated training provided by the invention already improve training efficiency and participant data privacy to a great extent; however, because the participants' model selections and parameter updates can be recorded by the server, a malicious or curious server could still obtain each participant's private data. The invention therefore enhances the randomization technique. This differs from prior-art federated learning, where the server merely selects participants at random: the randomization here aims to confuse the server's memory of individual participant data, thereby protecting participant data privacy.
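One plausible way to realize "confusing the server's memory", sketched below, is to strip client identities and shuffle the upload order before the server sees a round's updates. This mixing step is an assumption about how the randomization could work, not a construction given in the patent.

```python
# Sketch of enhanced randomization: client ids are removed and the
# upload order is shuffled, so the server cannot build a per-participant
# record of model choices and parameter updates. The mixing scheme and
# names are illustrative assumptions.
import random

def anonymize_round(uploads, seed=None):
    """uploads: list of (client_id, model_id, params).
    Returns the same updates with client ids removed, in random order."""
    rng = random.Random(seed)
    stripped = [(model_id, params) for _, model_id, params in uploads]
    rng.shuffle(stripped)
    return stripped

uploads = [("alice", "cnn", [1.0]), ("bob", "mlp", [2.0]), ("carol", "cnn", [3.0])]
anon = anonymize_round(uploads, seed=7)
# the server can still aggregate per model_id, but cannot tell which
# participant produced which update
```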
In one embodiment of the invention, as shown in FIG. 1, the invention evaluates the quality of model training by the symmetry principle of training results on the same type of data. Through repeated iteration, poorly performing participants are periodically removed and new participants are randomly added. The invention adopts a hypothesis-test approach: let M_ALL be the model trained by centralized machine learning on the combined data of all participants, and M_ONLY the model trained independently by a participant. If P_ALL and P_ONLY denote the performance of M_ALL and M_ONLY respectively, and there exists a non-negative real number L_P such that |P_ALL - P_ONLY| < L_P, then the participant may continue to participate in the next round of training.
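The criterion |P_ALL - P_ONLY| < L_P can be expressed directly in code. The performance numbers and participant names below are placeholders; only the comparison rule and the prune-then-replace loop structure come from the description above.

```python
# Sketch of the evaluation rule: a participant stays in the next round
# only if the gap between the centralized model's performance P_ALL and
# its own model's performance P_ONLY is below the tolerance L_P.

def keeps_participating(p_all, p_only, l_p):
    """Return True iff |P_ALL - P_ONLY| < L_P (the symmetry criterion)."""
    if l_p < 0:
        raise ValueError("L_P must be a non-negative real number")
    return abs(p_all - p_only) < l_p

def prune_round(p_all, participants, l_p):
    """Drop poorly performing participants; the caller then randomly
    recruits replacements, per the repeated-iteration scheme above."""
    return [name for name, p_only in participants
            if keeps_participating(p_all, p_only, l_p)]

survivors = prune_round(0.90, [("a", 0.88), ("b", 0.60)], l_p=0.05)
# participant "b" is removed: |0.90 - 0.60| = 0.30 >= 0.05
```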
In one embodiment of the invention, as shown in FIG. 1, participant selection uses a heuristic selection strategy. If the server wants to randomly select N clients to participate in model training, it sends heuristic messages to more than N devices, and the clients reply to the server after receiving them; the server selects N participants from the replying clients based on the response speed and data information returned with the messages. Concretely, the server sends exploration information and issues invitations to the explored clients; a client that receives an invitation and intends to participate in training returns an acknowledgement together with a self-evaluation to the server, and the server selects a suitable number of clients for model training from the acknowledging clients according to the returned data. The exploration information sent by the server contains model information, and participants are then selected according to the symmetry of data from the same application across different users.
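The probe-then-select flow above can be sketched as follows. The scoring rule (prefer higher self-reported data quality, then lower latency) is an illustrative assumption; the patent only says the server uses the returned speed and data information.

```python
# Sketch of the heuristic selection strategy: probe more than N devices,
# collect replies from willing clients, keep the N best responders.
# The reply format and scoring formula are assumptions.

def select_participants(replies, n):
    """replies: list of (client_id, latency_ms, data_quality in [0, 1])
    from clients that acknowledged willingness to train.
    Returns the ids of the n most suitable clients."""
    if len(replies) < n:
        raise ValueError("probe more than N devices before selecting")
    # higher self-reported data quality first, then lower latency
    scored = sorted(replies, key=lambda r: (-r[2], r[1]))
    return [client_id for client_id, _, _ in scored[:n]]

replies = [
    ("c1", 120, 0.9),  # acknowledging clients: (id, latency, data quality)
    ("c2", 40,  0.9),
    ("c3", 30,  0.4),
    ("c4", 200, 0.7),
]
chosen = select_participants(replies, n=2)  # -> ["c2", "c1"]
```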
While the fundamental principles, main features, and advantages of the invention have been shown and described, it will be understood by those skilled in the art that the invention is not limited to the foregoing embodiments; the foregoing description merely illustrates the principles of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. A random multi-model privacy protection method based on federated learning, characterized by comprising the following steps:
step one, a participant selection step, in which the server sends heuristic information to clients, confirms each client's communication condition, data quality, and willingness to participate from the clients' feedback, and selects some of the clients as participants in model training;
step two, a model sharing pool construction step, in which a model sharing pool containing multiple models is built, and different models are selected to participate in training according to the participants' data distributions;
step three, a randomization step, used to confuse the server's memory of the participants' parameter information so that the server cannot associate parameter information with particular participants;
and step four, a model evaluation step, used to evaluate each participant's contribution and data quality and to provide a participant selection mechanism for training.
2. The random multi-model privacy protection method based on federated learning as claimed in claim 1, wherein: the server provides a shared model pool, and different participants can select training models from the pool according to their data volume and data distribution.
3. The random multi-model privacy protection method based on federated learning as claimed in claim 1, wherein: in step four, the symmetry principle of training results on the same type of data is used to evaluate the quality of model training; through repeated iteration, poorly performing participants are periodically removed and new participants are randomly added.
4. The random multi-model privacy protection method based on federated learning as claimed in claim 1, wherein: the method adopts a hypothesis-test approach: let M_ALL be the model trained by centralized machine learning on the combined data of all participants, and M_ONLY the model trained independently by a participant; if P_ALL and P_ONLY denote the performance of M_ALL and M_ONLY respectively, and there exists a non-negative real number L_P such that |P_ALL - P_ONLY| < L_P, then the participant may continue to participate in the next round of training.
5. The random multi-model privacy protection method based on federated learning as claimed in claim 1, wherein: participant selection uses a heuristic selection strategy: if the server wants to randomly select N clients to participate in model training, it sends heuristic messages to more than N devices, and the clients reply to the server after receiving them; the server selects N participants from the replying clients based on the response speed and data information returned with the messages; the server sends exploration information and issues invitations to the explored clients; a client that receives an invitation and intends to participate in training returns an acknowledgement together with a self-evaluation to the server, and the server selects a suitable number of clients for model training from the acknowledging clients according to the returned data; the exploration information sent by the server contains model information, and participants are then selected according to the symmetry of data from the same application across different users.
CN202311689003.6A 2023-12-11 2023-12-11 Random multi-model privacy protection method based on federal learning Pending CN117592584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311689003.6A CN117592584A (en) 2023-12-11 2023-12-11 Random multi-model privacy protection method based on federal learning

Publications (1)

Publication Number Publication Date
CN117592584A true CN117592584A (en) 2024-02-23

Family

ID=89915114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311689003.6A Pending CN117592584A (en) 2023-12-11 2023-12-11 Random multi-model privacy protection method based on federal learning

Country Status (1)

Country Link
CN (1) CN117592584A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051608A (en) * 2021-03-11 2021-06-29 佳讯飞鸿(北京)智能科技研究院有限公司 Method for transmitting virtualized sharing model for federated learning
WO2021158313A1 (en) * 2020-02-03 2021-08-12 Intel Corporation Systems and methods for distributed learning for wireless edge dynamics
CN113536382A (en) * 2021-08-09 2021-10-22 北京理工大学 Block chain-based medical data sharing privacy protection method by using federal learning
CN114462090A (en) * 2022-02-18 2022-05-10 北京邮电大学 Tightening method for differential privacy budget calculation in federal learning
CN114841364A (en) * 2022-04-14 2022-08-02 北京理工大学 Federal learning method capable of meeting personalized local differential privacy requirements
CN115099421A (en) * 2022-04-07 2022-09-23 中国联合网络通信集团有限公司 Group-oriented federal learning system
CN115775010A (en) * 2022-11-23 2023-03-10 国网江苏省电力有限公司信息通信分公司 Electric power data sharing method based on horizontal federal learning
CN115906162A (en) * 2022-11-17 2023-04-04 重庆邮电大学 Privacy protection method based on heterogeneous representation and federal factorization machine
CN115964706A (en) * 2021-10-12 2023-04-14 中国电信股份有限公司 Training data poisoning defense method under federal learning scene
CN116796832A (en) * 2023-06-27 2023-09-22 西安电子科技大学 Federal learning method, system and equipment with high availability under personalized differential privacy scene
EP4266220A1 (en) * 2022-04-21 2023-10-25 E-GROUP ICT SOFTWARE Informatikai Zrt. Method for efficient machine learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DAVID BYRD et al.: "Differentially private secure multi-party computation for federated learning in financial applications", ICAIF '20: Proceedings of the First ACM International Conference on AI in Finance, no. 16, 7 October 2021 (2021-10-07), pages 1-9 *
SHULAI ZHANG et al.: "Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection", ICPP '21: Proceedings of the 50th International Conference on Parallel Processing, no. 83, 5 October 2021 (2021-10-05), pages 1-10 *
LI Cong: "Resource optimization allocation for multi-model federated learning", China Masters' Theses Full-text Database, Information Science and Technology, no. 07, 15 July 2023 (2023-07-15), pages 140-24 *

Similar Documents

Publication Publication Date Title
Liu et al. A secure federated learning framework for 5G networks
CN111935156B (en) Data privacy protection method for federated learning
Ma et al. Transparent contribution evaluation for secure federated learning on blockchain
Kim A model and case for supporting participatory public decision making in e-democracy
Azouvi et al. Betting on blockchain consensus with fantomette
Yang et al. A practical cross-device federated learning framework over 5g networks
Chen et al. Secure collaborative deep learning against GAN attacks in the Internet of Things
CN112765631B (en) Safe multi-party computing method based on block chain
CN112990987B (en) Information popularization method and device, electronic equipment and storage medium
CN115795518B (en) Block chain-based federal learning privacy protection method
CN112597542B (en) Aggregation method and device of target asset data, storage medium and electronic device
Alwen et al. Collusion-free multiparty computation in the mediated model
CN111966976A (en) Anonymous investigation method based on zero knowledge proof and block chain
Petruzzi et al. Experiments with social capital in multi-agent systems
CN117592584A (en) Random multi-model privacy protection method based on federal learning
Zhou et al. Mobile augmented reality with federated learning in the metaverse
CN110365671A (en) A kind of intelligent perception incentive mechanism method for supporting secret protection
CN115361196A (en) Service interaction method based on block chain network
Filippov et al. Online protest mobilization: building a computational model
CN113657616A (en) Method and device for updating federal learning model
Xu et al. Privacy-preserving task-matching and multiple-submissions detection in crowdsourcing
CN113657614B (en) Updating method and device of federal learning model
Chandran et al. Comparison-based MPC in Star Topology.
CN116614273B (en) Federal learning data sharing system and model construction method in peer-to-peer network based on CP-ABE
Chen et al. Practical multi-party private set intersection cardinality and intersection-sum protocols under arbitrary collusion 1

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination