CN115310121B - Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles - Google Patents
- Publication number
- CN115310121B (application number CN202210816716.3A)
- Authority
- CN
- China
- Prior art keywords
- model
- data
- mepc
- federal
- gradient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2463/00—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
- H04L2463/062—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00 applying encryption of the keys
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a real-time reinforced federal learning data privacy security method based on a MePC-F model in the Internet of Vehicles, which comprises the following steps: building multiple edge servers E_i and a cloud server CS; each edge server E_i downloads the initial encrypted type A gradient [G_k^A] from the cloud server CS, decrypts it into G_k^A, randomly initializes a type B gradient G_{i,k}^B, and carries out local model training; E_i obtains, through a decoding function, the partial gradient information to be retained from its trained type A gradient G_{i,k}^A, homomorphically encrypts the remaining gradient information, and broadcasts it to all other edge servers E_j through the MePC algorithm; the type A gradient information updated and shared by all edge servers, denoted G*_{i,k}^A, is uploaded to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; the above steps are repeated until a termination condition is reached. The invention prevents data leakage between terminals, realizes data privacy protection, and reduces communication overhead while preventing leakage of the original data.
Description
Technical Field
The invention relates to the technical field of real-time security behavior analysis performed cooperatively by connected-vehicle users, and in particular to a real-time reinforced federal learning data privacy security method based on a MePC-F model in the Internet of Vehicles.
Background
With the development of the various real-time communications and services supported by the Internet of Vehicles, the data volume generated by interconnected equipment such as on-board units has become unprecedentedly large, the vehicle-user-oriented data are highly heterogeneous, and device computing capacities differ. Federal learning provides an effective solution for meeting the data security protection requirement in the real-time training of network models: different edge devices can cooperatively train a machine learning model without exposing their raw data.
Edge computing couples massive data tightly with users' personal privacy: data such as user trajectories, credit cards and bills directly concern user privacy, and any data leakage poses a serious security risk to the user. Federal learning can protect data to some extent, but the risk of information leakage still exists, in four types: 1) membership leakage; 2) unintended feature leakage; 3) leakage of class representatives of the original data; and 4) leakage of the original data itself. The last type of leakage is the least acceptable to privacy-sensitive participants.
To protect the data privacy of mobile users and address the leakage of raw data, researchers have conducted extensive research on cryptography-based data security protection: differential privacy, homomorphic encryption, and secure multi-party computation. Differential privacy generally uses three noise-addition mechanisms: the Laplace mechanism, the Gaussian mechanism, and the exponential mechanism. Adding noise perturbs the context information to protect data privacy, but if too much noise is added, the performance of model training suffers. Homomorphic encryption is commonly additive or multiplicative: research shows that with Paillier additive homomorphic encryption the noise doubles per operation, while with El Gamal multiplicative homomorphic encryption the noise grows quadratically. To increase data availability and overcome the noise problem, researchers introduced bootstrapping, which reduces noise by setting an encryption/decryption threshold so that the scheme can evaluate an unlimited number of operations; the noise problem can also be mitigated by batching, parallel homomorphic computation, or ciphertext compression. Secure multi-party computation concerns multiple participants securely evaluating an agreed function without a trusted third party; its main purpose is to ensure that each party's private input remains independent during the computation and that no local data is leaked. Research proves that secure multi-party computation can solve the gradient leakage problem in federal learning, and that exchanging information only for the first hidden layer suffices to protect data while preserving accuracy. However, the information exchange is P2P, which incurs a large communication overhead.
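By way of illustration, the noise-addition step of the Laplace and Gaussian mechanisms can be sketched as follows (a minimal illustration only; the noise scale shown is a placeholder, whereas a formal differential-privacy guarantee would calibrate it to the gradient sensitivity and the privacy budget):

```python
import numpy as np

def perturb_gradient(grad: np.ndarray, mechanism: str = "laplace",
                     scale: float = 1.0) -> np.ndarray:
    """Add Laplace or Gaussian noise to a gradient vector before sharing.

    `scale` is a placeholder; a real DP deployment derives it from the
    sensitivity of the gradient and the privacy budget (epsilon, delta).
    """
    if mechanism == "laplace":
        noise = np.random.laplace(loc=0.0, scale=scale, size=grad.shape)
    elif mechanism == "gaussian":
        noise = np.random.normal(loc=0.0, scale=scale, size=grad.shape)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return grad + noise
```

As noted above, a larger `scale` hides more context information but degrades model training performance.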
Most cryptography-based data security research is a centralized solution that aims to control time overhead while protecting data security: federal learning allows edge devices to co-train machine learning models without exposing raw data. Federal learning typically employs a parameter-server architecture in which clients train local models synchronized by the parameter server. It is usually realized synchronously: the central server sends the global model to multiple clients at the same time, and the clients return their updated models to the central server after training on local data. This can be slow because of stragglers. Global synchronization is very difficult, especially in a federal learning scenario, owing to limited computing power and battery life and to device availability and completion times that vary from device to device. A new asynchronous joint-optimization algorithm has been proposed that solves a regularized local problem to ensure convergence, so that multiple devices and a server can cooperatively and efficiently train the model without revealing privacy.
Despite much research on data security, most existing methods are limited to protecting the original data. How to simultaneously satisfy the privacy and availability goals of mobile users' big data in the complex Internet-of-Vehicles space, and how to design an effective federal learning algorithm that reduces communication overhead while preventing data from being recovered after a gradient leak, remain open problems.
First, data in federal learning is stored at the local node, which reduces the risk of raw-data leakage in transmission. But even though only gradient information is transmitted, the possibility remains that the original data can be recovered. Data interaction in secure multi-party computation spreads the data over multiple parties, reducing the chance that a sample can be reconstructed after gradient information leaks. However, in existing secure multi-party computation, every user sends information to every other user — in short, a unicast pattern — which brings a high time overhead. It is therefore important to find a solution that reduces the risk of data being attacked and recovered while also reducing transmission delay, so as to satisfy vehicle users' data security and real-time needs. Second, because data and equipment differ across edge servers, the training accuracy of the overall model must also be improved in a targeted manner during training. Global parameter aggregation in the typical synchronous federal-averaging mode is slow because of stragglers. While balancing communication and computation time, it is also important to guarantee global accuracy through personalized training of multiple models. Yet most data-security-oriented federal learning algorithms rely on a synchronous aggregation algorithm, whose high latency challenges the real-time requirements of the Internet of Vehicles. A federal learning algorithm based on reinforcement learning is therefore necessary to reduce delay, improve accuracy, and guarantee data security.
Disclosure of Invention
The invention aims to solve the technical problem of providing a real-time reinforced federal learning data privacy security method based on a MePC-F model in the internet of vehicles aiming at the defects in the prior art.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the invention provides a real-time reinforced federal learning data privacy security method based on a MePC-F model in the internet of vehicles, which comprises the following steps:
S1, constructing a plurality of edge servers E_i and a cloud server CS; acquiring vehicle data D = {D_1, D_2, …, D_n}, with each edge server E_i acquiring its corresponding vehicle data D_i;

S2, in the k-th round of the federal task, edge server E_i downloads the initial encrypted type A gradient [G_k^A] from the cloud server CS (where [·] denotes a homomorphic ciphertext) and decrypts it into G_k^A, and randomly initializes a type B gradient G_{i,k}^B; E_i computes gradients in local network model training according to its vehicle data D_i, and records the gradient information after completing T rounds of local training as G_{i,k}^A;

S3, edge server E_i obtains, through a decoding function f_{i,k}, the partial gradient information R_{i,k} that needs to be retained from G_{i,k}^A, homomorphically encrypts the remaining gradient information as [S_{i,k}], and broadcasts it to all other edge servers E_j through the MePC algorithm; E_i likewise obtains, through its decoding function, the corresponding partial gradient information from the other edge servers E_j; the type A gradient information updated and shared by all edge servers is denoted G*_{i,k}^A, i ∈ [1, n], where n is the total number of edge servers;

S4, all edge servers upload [G*_{i,k}^A] to the cloud server CS, and the cloud server CS aggregates the global parameters through the PreFLa algorithm; the PreFLa algorithm selects the optimal parameter weight ratio a_{i,k} of each edge server E_i by maximizing the return obtained through reinforcement learning, and the global gradient parameter G_{k+1}^A is aggregated according to a_{i,k}; the uploading and downloading of parameters proceed in parallel, and all parameters are encrypted by HE;

S5, repeating steps S2 to S4 until a termination condition is reached; the cloud server CS computes the final global gradient parameter and issues it to each edge server; each edge server extracts features from the vehicle data, and the accuracy and optimal loss function of the MePC-F model are computed to obtain the trained MePC-F model, completing the whole training process; the model is output in real time to the corresponding Internet-of-Vehicles service.
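By way of illustration, the round structure of steps S1 to S5 can be sketched as the following simplified simulation (illustrative only: encryption, the MePC exchange and the PreFLa weight selection are abstracted away, and uniform aggregation weights stand in for a_{i,k}):

```python
import numpy as np

rng = np.random.default_rng(0)

def local_train(w, X, y, eta=0.1, T=5):
    """T rounds of least-squares gradient descent on one server's local data."""
    for _ in range(T):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of ||Xw - y||^2 / m
        w = w - eta * grad
    return w

def federated_loop(datasets, K=20):
    """Schematic MePC-F round structure: local training, then aggregation."""
    dim = datasets[0][0].shape[1]
    w_global = np.zeros(dim)                        # analogue of G_1^A
    for k in range(K):                              # K rounds of federal tasks
        local_ws = [local_train(w_global.copy(), X, y) for X, y in datasets]
        a = np.ones(len(datasets)) / len(datasets)  # placeholder for a_{i,k}
        w_global = sum(ai * wi for ai, wi in zip(a, local_ws))
    return w_global

# toy data split across three "edge servers"
data = [(rng.normal(size=(32, 4)), rng.normal(size=32)) for _ in range(3)]
print(federated_loop(data))
```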
Further, in step S2 of the present invention, a specific method for training the local network model is as follows:
employing a deep neural network DNN model, the DNN performing end-to-end feature learning and classifier training by taking different vehicle data as raw inputs, using stochastic gradient descent as a subroutine to minimize the loss value in each local training;
E_i downloads the base-layer parameters from the cloud server CS in the k-th round of communication, namely the initial encrypted type A gradient [G_k^A], and decrypts it into the type A gradient G_k^A; it randomly initializes a type B gradient G_{i,k}^B; here k ∈ [1, K], where K represents the total number of rounds of the federal task; if it is the first round of the federal task, the CS randomly initializes [G_1^A]; before local training, E_i uses homomorphic decryption to recover G_k^A from [G_k^A];

The loss function of the local model is set as follows:

L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})^2

where l() represents the loss of the network, the second term is the L2 regularization term, and λ is the regularization coefficient; w_i represents the total weight information of the local model, w_{i,t} is the weight information of the local model at time t, and w_{i,t+1} is the weight information of the local model at time t+1;

E_i initializes G_k and replaces the weight parameters w_i of the model, and local model training continues by minimizing the loss function as follows:

w_i = w_i − ηG_k

where η is the learning rate and G_k is the combination of G_k^A and G_{i,k}^B; here G_{i,k}^B is randomly initialized;

After edge server E_i completes T rounds of local training, the accuracy acc_{i,k} of each local model is obtained, together with G_{i,k}^A and G_{i,k}^B.
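As a concrete illustration of the local objective and update above, the sketch below implements one step of w_i = w_i − ηG_k with the L2 term λ(w_{i,t} − w_{i,t+1})^2 treated as a proximal penalty that keeps the new weights close to the previous ones (an interpretation of the formula; the task loss and its gradient are placeholders):

```python
import numpy as np

def local_step(w: np.ndarray, grad_l: np.ndarray, w_prev: np.ndarray,
               eta: float = 0.01, lam: float = 0.1) -> np.ndarray:
    """One SGD step on L(w) = l(w) + lam * ||w - w_prev||^2.

    grad_l is the gradient of the task loss l() at w; the regularizer
    contributes 2 * lam * (w - w_prev) to the total gradient G_k.
    """
    g_k = grad_l + 2.0 * lam * (w - w_prev)   # gradient of the full loss
    return w - eta * g_k                      # w_i = w_i - eta * G_k
```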
further, in step S3 of the present invention, a specific method of the MePC algorithm is as follows:
In the k-th round of the federal task, all edge servers use MePC to exchange the base-layer gradients {[G_{1,k}^A], …, [G_{n,k}^A]}, where [G_{n,k}^A] represents the encrypted type A data of the n-th edge server in the k-th round of the federal task, [G_{i,k}^A] represents the encrypted type A data of the i-th edge server in the k-th round, and [S_{i,k}] represents the type A encrypted data that the i-th edge server broadcasts to the other edge servers in the k-th round, i.e., [G_{i,k}^A] with the encrypted data retained by the server itself removed;

To avoid the risk of the data being cracked, in each round a random proportion χ of the gradient G_{i,k}^A is taken as the part R_{i,k} to be retained, with χ kept the same within the same federal round; the rest is then encrypted as [S_{i,k}]; the random proportion χ varies across different rounds of the federal task, and χ ∈ (0, 1/n]; the remaining homomorphically encrypted gradient [S_{i,k}] is divided into n−1 parts [S_{i,k}^{(1)}], …, [S_{i,k}^{(n−1)}];

Only R_{i,k} is retained at E_i; the other parts and the random parameter χ are broadcast in ciphertext form to the other edge servers E_j; in this manner, even if portions of the transmitted content are attacked, the original data G_{i,k}^A is not leaked;
Further, in step S3 of the present invention, a specific method for locally performing data verification includes:
In the k-th round of the federal task, verification is performed using a corresponding "multiplication" method; each edge server designs two decoding functions by itself, binary masks f_{i,k} and f'_{i,k} of lengths L_0 and L' respectively, where L_0 is the length of the retained part R_{i,k} and L' is the length of the broadcast part [S_{i,k}]; the subscript k of the decoding functions indicates the decoding functions of the k-th round of the federal task;

L_0 = χ·L

where L is the length of G_{i,k}^A. The decoding functions of all edge servers are required to partition each data packet: the "intersection" (bitwise AND) of the decoding functions on the same data packet yields all 0s, and the "union" (bitwise OR) yields all 1s, i.e., the masks are pairwise disjoint and jointly cover every position;

The decoding functions are first initialized in a fixed pattern; a received data packet is multiplied by the corresponding decoding function of the receiving server; since the binary bits equal to 0 multiply to 0, E_i is guaranteed to obtain only its own partial data packet; where the binary bit of the decoding function is 1, the ciphertext of the gradient information at the corresponding position is obtained;

E_i adds the data-packet arrays obtained from all other edge servers E_j at the corresponding positions to obtain the complete ciphertext data, which is updated as the final ciphertext [G*_{i,k}^A];

Each time a secure multi-party computation is performed, as k increases, the decoding function of every E_i is left-cyclically shifted by m units, which ensures the dynamics of sharing G_{i,k}^A, divides it equally among E_1, E_2, …, E_n, and ensures that no part of the data information is repeated.
Further, in step S4 of the present invention, the specific method of the PreFla algorithm is as follows:
PreFLa adopts reinforcement learning (RL) to adaptively select the optimal parameter weight ratio a_{i,k} for aggregating the global parameter G_{k+1}^A;

In the uplink communication stage, each edge server not only trains a local model but also uploads its local parameters to the cloud server CS for joint aggregation; after the MePC algorithm is executed in the k-th round of the federal task, E_i uploads the parameters [G*_{i,k}^A] and acc_{i,k} to the CS over a TLS/SSL secure channel; in the aggregation stage, owing to the unbalanced distribution and heterogeneity of each edge server's data, the model parameters used for aggregation have a crucial influence on the convergence speed of this stage; it is therefore necessary to consider the parameter weight ratio a_{i,k} of each participant E_i in the k-th round of federal aggregation;

DQN-based reinforcement learning is used to predict the parameter weight ratios, with information stored through a Q function to prevent the curse of dimensionality; to better realize model personalization and reduce the waiting time for uploading weights in MePC-F, DQN is used to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameter in the CS; the reinforcement learning comprises: states, actions, a reward function, and feedback.
Further, in step S4 of the present invention, the states, actions, reward function and feedback are specified as follows:

The state: the state of the k-th round is s_{i,k}, which contains the accuracy difference Δacc_{i,k} of edge server E_i, expressed as:

Δacc_{i,k} = acc_{i,k} − acc_{i,k−1}
The actions: the parameter weight ratio a_{i,k} is taken as the action of the k-th round of the federal task; to avoid being trapped in a locally optimal solution, an ε-greedy algorithm is adopted to optimize the action-selection process and obtain a_{i,k}:

a_{i,k} = argmax_{a ∈ [P]} Q(s_{i,k}, a) if rand > ε, and a random action from [P] otherwise

where P is the set of weight permutations, rand is a random number with rand ∈ [0,1], and Q(s_{i,k}, a_{i,k}) denotes the cumulative discounted return obtained when the agent takes action a_{i,k} in state s_{i,k}; once the DQN is trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ [P]} for all actions in the k-th round; each action value represents the maximum expected return the agent can achieve by selecting the particular action a_{i,k} in state s_{i,k};
The reward: the reward observed at the end of the k-th round of the federal task is set to:

r_k = β^{Δacc_{i,k}} − 1

where β is a positive number, ensuring that r_k grows exponentially with the test-accuracy difference Δacc_{i,k}; the first term incentivizes the agent to select equipment capable of achieving higher test accuracy, and the exponent controls the change of r_k as Δacc_{i,k} increases; the constant −1 encourages the agent to improve model accuracy, since when Δacc_{i,k} < 0 we have r_k ∈ (−1, 0);

The DQN agent is trained to maximize the expectation of the cumulative discounted reward, as shown by:

E[ Σ_{k=1}^{K} γ^{k−1} r_k ]

where γ ∈ (0, 1] is the factor discounting future rewards;
After obtaining r_k, the cloud server CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of the federal task; the optimal action-value function Q(s_{i,k}, a_{i,k}) is the optimum sought by the RL agent, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:

Q(s_{i,k}, a_{i,k}) = E( r_k + γ max_a Q(s_{i,k+1}, a) | s_{i,k}, a_{i,k} )

A function-approximation technique is used to learn a parameterized value function Q(s_{i,k}, a_{i,k}; w_k) approximating the optimal function Q(s_{i,k}, a_{i,k}); here r_k + γ max_a Q(s_{i,k+1}, a) is the target that Q(s_{i,k}, a_{i,k}; w_k) learns; a DNN is used to represent the function approximator; the RL learning problem then becomes minimizing the MSE loss between the target and the approximator, defined as:

l(w_k) = ( r_k + γ max_a Q(s_{i,k+1}, a; w_k) − Q(s_{i,k}, a_{i,k}; w_k) )^2

The CS updates the global parameter w_k as:

w_{k+1} = w_k − η∇l(w_k)

where η ≥ 0 is the step size;

After the cloud server CS obtains the optimal learning model, the weight-ratio sequence a_{i,k} of the k-th round is obtained, and the global parameter is updated as:

G_{k+1}^A = Σ_{i=1}^{n} a_{i,k} · G*_{i,k}^A
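The reward and the ε-greedy selection rule reconstructed above can be sketched as follows (β and ε are example values; the Q function is represented as a plain dictionary standing in for the trained DQN):

```python
import random

def reward(delta_acc: float, beta: float = 3.0) -> float:
    """r_k = beta ** delta_acc - 1: exponential in the accuracy gain,
    and in (-1, 0) whenever delta_acc < 0."""
    return beta ** delta_acc - 1.0

def epsilon_greedy(q_values: dict, epsilon: float = 0.1):
    """Exploit the largest Q-value or explore a random weight ratio."""
    if random.random() <= epsilon:
        return random.choice(list(q_values))   # explore
    return max(q_values, key=q_values.get)     # exploit: argmax_a Q(s, a)

print(reward(0.02), reward(-0.02))             # small gain vs. small drop
print(epsilon_greedy({0.1: 0.4, 0.2: 0.9, 0.7: 0.3}))
```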
Further, the HE encryption method in the method of the present invention specifically includes:
The encryption schemes for the weight matrix and the bias vector follow the same idea; the additive homomorphic encryption of a real number a is denoted a_E, and in additive homomorphic encryption, for any two numbers a and b, a_E + b_E = (a + b)_E; any real number r is converted into an encoded fixed-point rational number v as:

v = ⌊r·2^d⌉ mod 2^{H+2d}

Each encoded real number r in the gradient can be represented as a rational H-bit number consisting of one sign bit, z integer bits and d fractional bits; thus each encodable rational number is defined by its H = 1 + z + d bits; the encoding is designed to allow multiplication, which requires operating modulo 2^{H+2d} to avoid overflow;

The decoding is defined as:

r = v/2^d if v < 2^{H+2d−1}, and r = (v − 2^{H+2d})/2^d otherwise

Multiplying two encoded numbers leaves an extra factor 2^d that must be removed; when Paillier additive encryption is used, the result of an encoded multiplication can be computed exactly, but only one homomorphic multiplication can be guaranteed; for simplicity, the extra factor is removed at decoding time;

The largest encryptable integer is V − 1, so the largest encryptable real number must be taken into account; the integer bits z and fractional bits d are therefore chosen as follows:

V ≥ 2^{H+2d} = 2^{1+z+3d}
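Under the reconstruction above, a minimal fixed-point encoder/decoder can be written as follows (the rounding convention and the treatment of wrapped-around negatives are assumptions consistent with the stated bit layout):

```python
def encode(r: float, d: int = 16, h: int = 64) -> int:
    """Fixed-point encode: v = round(r * 2**d) mod 2**(h + 2*d)."""
    return round(r * (1 << d)) % (1 << (h + 2 * d))

def decode(v: int, d: int = 16, h: int = 64, mults: int = 0) -> float:
    """Invert encode(); `mults` counts encoded multiplications, each of
    which leaves one extra 2**d factor to strip at decoding time."""
    modulus = 1 << (h + 2 * d)
    v %= modulus
    if v >= modulus // 2:                 # wrapped-around negative value
        v -= modulus
    return v / float(1 << (d * (1 + mults)))

assert abs(decode(encode(3.25) + encode(-1.5)) - 1.75) < 1e-4     # addition
assert abs(decode(encode(3.25) * encode(-1.5), mults=1) + 4.875) < 1e-4
```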
Further, the optimal loss function in step S5 of the present invention is

F = min (1/n) Σ_{i=1}^{n} L(w_i)

where L(w_i) denotes the loss of E_i's network.
The invention has the following beneficial effects:
(1) A federal learning model for multi-party broadcast security computing (MePC-F) is presented. The model combines the MePC algorithm and the PreFLa algorithm to solve the problems of federal-learning training-data security and communication overhead in the Internet of Vehicles. The combined advantages of homomorphic encryption and secure multi-party computation are exploited to prevent data leakage between terminals and to lower the degree to which the original data can be reconstructed after an attack, thereby realizing data privacy protection to the maximum extent.

(2) A secure broadcast multi-party computation, MePC, is presented. For secure multi-party computation, sharing only the gradient information of the first layer greatly reduces the risk of data being recovered and reduces traffic. During sharing, the edge servers take their respective parts through decoding functions in a broadcast manner, which lowers the time complexity from O(n^2) to O(n) and reduces communication overhead while preventing leakage of the raw data.

(3) A weight-ratio-based federal learning algorithm, PreFLa, is proposed. PreFLa finds the optimal gradient weight ratios to aggregate the global parameters, and the reward function is designed from the accuracy difference of each edge server, so that the action selection with the maximum overall return gives the weight ratios of each federal round. An L2 regularization term is added to the loss function to promote edge-server cooperation and reduce the delay and performance problems caused by data heterogeneity. The global model thereby generalizes better and converges faster.
Drawings
The invention will be further described with reference to the following drawings and examples, in which:
FIG. 1 is a MePC-F model of an embodiment of the present invention;
FIG. 2 is a flow chart of a MePC-F model of an embodiment of the present invention;
FIG. 3 is a MePC algorithm of an embodiment of the present invention;
- FIG. 4 shows the DLG results on MNIST when the first hidden layer is hidden and not hidden by four methods according to an embodiment of the present invention: (a) FL; (b) MePC-F; (c) PeMPC; (d) Gaussian; (e) Laplacian;
- FIG. 5 shows the performance of DLG on MNIST when the gradient of the first hidden layer is replaced by four methods (Gaussian distribution, Laplace distribution, PeMPC and MePC-F) according to an embodiment of the present invention;
- FIG. 6 is a graph of the average accuracy and loss on Non-IID MNIST data for an embodiment of the present invention;
- FIG. 7 is a graph of the average accuracy and loss on Non-IID CIFAR-10 data for an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The parameters involved in the examples of the invention are described below:
TABLE 1 description of the parameters
where E_i denotes the current edge server, E_j denotes the edge servers other than the current one, and E_s denotes all edge servers.
The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles comprises the following steps:
S1, constructing a plurality of edge servers E_i and a cloud server CS; acquiring vehicle data D = {D_1, D_2, …, D_n}, with each edge server E_i acquiring its corresponding vehicle data D_i;

S2, in the k-th round of the federal task, edge server E_i downloads the initial encrypted type A gradient [G_k^A] from the cloud server CS and decrypts it into G_k^A, and randomly initializes a type B gradient G_{i,k}^B; E_i computes gradients in local network model training according to its vehicle data D_i, and records the gradient information after completing T rounds of local training as G_{i,k}^A;

S3, edge server E_i obtains, through the decoding function f_{i,k}, the partial gradient information R_{i,k} that needs to be retained from G_{i,k}^A, homomorphically encrypts the remaining gradient information as [S_{i,k}], and broadcasts it to all other edge servers E_j through the MePC algorithm; E_i obtains, through its decoding function, the corresponding partial gradient information from the other edge servers E_j; the type A gradient information updated and shared by all edge servers is G*_{i,k}^A, i ∈ [1, n], where n is the total number of edge servers;

S4, all edge servers upload [G*_{i,k}^A] to the cloud server CS, and the cloud server CS aggregates the global parameters through the PreFLa algorithm; the PreFLa algorithm selects the optimal parameter weight ratio a_{i,k} of each edge server E_i by maximizing the return obtained through reinforcement learning, and the global parameter G_{k+1}^A is aggregated according to a_{i,k}; the uploading and downloading of parameters proceed in parallel, and all parameters are encrypted by HE;

S5, repeating steps S2 to S4 until a termination condition is reached, completing the whole training process. The termination condition may be a maximum number of training rounds, convergence of the loss function, or another user-defined condition. Finally, the optimal loss function is obtained according to the following equation (1):

F = min (1/n) Σ_{i=1}^{n} L(w_i)   (1)

where L(w_i) denotes the loss of E_i's network.
The specific method of local training is as follows:
In the local model phase, a deep neural network (DNN) is employed to learn the cloud model and the edge-server models. The DNN performs end-to-end feature learning and classifier training by taking different user data as raw inputs. Stochastic gradient descent is used in the proposed algorithm as a subroutine to minimize the loss value in each round of local training.

In the downlink communication phase, E_i downloads the base-layer parameters [G_k^A] from the CS in the k-th round of communication (k ∈ [1, K]) and randomly initializes G_{i,k}^B, where K represents the total number of rounds of the federal task. If it is the first round of the federal task, the CS randomly initializes [G_1^A]. Before local training, E_i needs to decrypt [G_k^A] into G_k^A using homomorphic encryption (equation (4)).
In order to better embody the model personalization, the loss function of the local model is set as follows:
L(w i )=l(w i )+λ(w i,t -w i,t+1 ) 2 (16)
where l() represents the loss of the network, e.g., the cross-entropy loss of the classification task. The second term is an L2 regularization term, which not only preserves the model's personalization capability but also improves the efficiency of cooperation with the other participants. λ is the regularization coefficient.
E_i initializes G_k and replaces the weight parameters w_i of the model, continuing the local model training as follows:

w_i = w_i − ηG_k   (17)

where η is the learning rate and G_k is the combination of G_k^A and G_{i,k}^B; here G_{i,k}^B is randomly initialized.
After E_i completes T rounds of local training, the accuracy acc_{i,k} of each local model can be obtained, together with G_{i,k}^A and G_{i,k}^B. Direct sharing of user information between terminals is forbidden, and data in the edge server must be encrypted before communication to prevent it from being attacked. This process uses HE to avoid information leakage; the application of HE to real numbers is shown below. The encryption schemes for the weight matrix and the bias vector follow the same idea, and the additive homomorphic encryption of a real number a is denoted a_E. In additive homomorphic encryption, for any two numbers a and b, a_E + b_E = (a + b)_E. Any real number r is converted into an encoded fixed-point rational number v as:
v = ⌊r·2^d⌉ mod 2^{H+2d}

Each encoded real number r in the gradient can be represented as a rational H-bit number consisting of one sign bit, z integer bits and d fractional bits. Thus each encodable rational number is defined by its H = 1 + z + d bits. The encoding is designed to allow multiplication, which requires operating modulo 2^{H+2d} to avoid overflow.

The decoding is defined as:

r = v/2^d if v < 2^{H+2d−1}, and r = (v − 2^{H+2d})/2^d otherwise

Multiplying two encoded numbers leaves an extra factor 2^d that must be removed. When Paillier additive encryption is used, the result of an encoded multiplication can be computed exactly, but only one homomorphic multiplication can be guaranteed. For simplicity, the extra factor is removed at decoding time.

The result is correct provided only one encoded multiplication has taken place. Since the largest encryptable integer is V − 1, the largest encryptable real number must take this into account. Therefore, the integer bits z and the fractional bits d must be chosen as follows:

V ≥ 2^{H+2d} = 2^{1+z+3d}   (5)
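The additive property a_E + b_E = (a + b)_E can be exercised with, for example, the open-source python-paillier (`phe`) package; the sketch below assumes that library is installed and only demonstrates the property, not the patent's full scheme:

```python
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

a_enc = public_key.encrypt(3.5)     # a_E
b_enc = public_key.encrypt(-1.25)   # b_E
sum_enc = a_enc + b_enc             # ciphertext addition yields (a + b)_E

assert abs(private_key.decrypt(sum_enc) - 2.25) < 1e-9
weighted = 0.3 * a_enc              # scalar multiplication, useful for
                                    # weighted aggregation of ciphertexts
```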
The specific method of the MePC algorithm is shown in FIG. 3.
In the k-th round of the federal task, the base-layer gradients {[G_{1,k}^A], …, [G_{n,k}^A]} are exchanged using MePC. To avoid the risk of the data being cracked, in each round a random proportion χ of the gradient G_{i,k}^A is taken as the retained part R_{i,k}, with χ kept the same within the same federal round. Across different rounds of the federal task, the random proportion χ (χ ∈ (0, 1/n]) changes. The remaining gradient is homomorphically encrypted as [S_{i,k}] and divided into n−1 equal parts, as shown in FIG. 3:

[S_{i,k}] = {[S_{i,k}^{(1)}], …, [S_{i,k}^{(n−1)}]}

Only R_{i,k} is retained at E_i; the other parts and the random parameter χ are broadcast to the other edge servers in ciphertext form. In this manner, even if portions of the transmitted content are attacked, the original data G_{i,k}^A will not leak. In particular, if an attacker wants to acquire the data G_{i,k}^A, it must obtain all of its parts; but R_{i,k} remains with participant E_i, and χ is kept in ciphertext form through homomorphic encryption during communication between E_i and the receivers E_j.
When E_i receives the data packets sent by the other servers, it performs data verification locally. In particular, it uses a corresponding "multiplication" method for verification. Each edge server designs two decoding functions by itself, binary masks of lengths L_0 and L', as follows:

L_0 = χ·L   (9)

where L is the length of G_{i,k}^A. The decoding functions of all the edge servers are required to partition each data packet: the "intersection" (bitwise AND) of the decoding functions on the same data packet yields all 0s, and their "union" (bitwise OR) yields all 1s, i.e., the masks are pairwise disjoint and jointly cover every position.

The decoding functions are first initialized. Note that at initialization, the decoding functions that different E_i apply to the transmitted data packets within the same federal task are the same function.

Each data packet is multiplied by the corresponding decoding functions in the other servers. Since the binary bits equal to 0 multiply to 0, E_i is guaranteed to obtain only its own partial data packet. Where the binary bit of the decoding function is 1, the ciphertext of the gradient information at the corresponding position is obtained.

E_i adds all the data-packet arrays obtained from the other edge servers at the corresponding positions to obtain the complete ciphertext data, updated as the final [G*_{i,k}^A].

Each time a secure multi-party computation is performed, as k increases, the decoding function of each E_i is left-cyclically shifted by m units, which ensures the dynamics of sharing G_{i,k}^A, divides it equally among E_1, E_2, …, E_n, and ensures that no part of the data information is repeated.
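Treating ciphertext packets as arrays and decoding functions as the binary masks above, the local recombination step can be sketched as follows (illustrative only):

```python
import numpy as np

def recombine(own_part: np.ndarray, packets: list, masks: list) -> np.ndarray:
    """E_i keeps its own share and, for each received packet, keeps only
    the positions where the corresponding decoding mask is 1, adding
    everything position-wise into the final (still encrypted) gradient."""
    total = own_part.astype(float).copy()
    for packet, mask in zip(packets, masks):
        total += packet * mask        # mask bits of 0 discard foreign data
    return total

own = np.array([1.0, 0.0, 0.0, 2.0])
pkts = [np.array([9.0, 5.0, 9.0, 9.0]), np.array([9.0, 9.0, 7.0, 9.0])]
msks = [np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0])]
print(recombine(own, pkts, msks))     # -> [1. 5. 7. 2.]
```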
The specific method of the PreFla algorithm comprises the following steps:
Data distribution in the Internet of Vehicles is dispersed, and the data are unbalanced and heterogeneous, which makes it difficult to improve personalized service while meeting real-time requirements. To prevent user privacy from being leaked during communication between different edge servers, HE is used to encrypt the parameters in transit. To better realize personalized training for different user data, the first layer is set as the base layer and trained cooperatively using the existing federal learning method, while the other layers are trained locally as personalization layers, so that the personal information of different edge-server devices can be captured. In this way, after the joint training process, the globally shared base layer can be transferred into each edge server to build its own personalized deep learning model with its unique personalization layers. Only the base-layer parameters [G_k^A] are downloaded from the CS; the parameters of the personalization layers are generated randomly and fine-tuned on local data. To meet the real-time requirement and realize the personalization needs of the edge servers, PreFLa adopts reinforcement learning (RL) to adaptively select the optimal parameter weight ratio a_{i,k} and aggregate the global parameter G_{k+1}^A.

In the uplink communication phase, each edge server not only trains the local model but also uploads the local parameters to the CS for joint aggregation. After the MePC algorithm is executed in the k-th round of the federal task, E_i uploads the parameters [G*_{i,k}^A] and acc_{i,k} to the CS over a TLS/SSL secure channel. In the aggregation stage, owing to the unbalanced distribution and heterogeneity of each edge server's data, the model parameters used for aggregation have a crucial influence on the convergence speed of this stage. It is therefore necessary to consider the parameter weight ratio a_{i,k} of participant E_i in the k-th round of federal aggregation.

In the invention, DQN-based reinforcement learning is used to predict the parameter weight ratios, and information is stored through a Q function instead of the table storage of Q-Learning, to prevent the curse of dimensionality. To better realize model personalization and reduce the waiting time for uploading weights in MePC-F, DQN is used to select the optimal parameter weight ratio a_{i,k} and aggregate the global parameter in the CS. The reinforcement learning contains states, actions, a reward function, and feedback; the state of the k-th round is s_{i,k}, which contains the accuracy difference Δacc_{i,k} = acc_{i,k} − acc_{i,k−1} of E_i, and the remaining elements are defined as follows:
The actions: the parameter weight ratio a_{i,k} is taken as the action of the k-th round of the federal task. To avoid being trapped in a locally optimal solution, an ε-greedy algorithm is adopted to optimize the action-selection process, giving a_{i,k}:

a_{i,k} = argmax_{a ∈ [P]} Q(s_{i,k}, a) if rand > ε, and a random action from [P] otherwise

where P is the set of weight permutations and rand is a random number (rand ∈ [0,1]); Q(s_{i,k}, a_{i,k}) denotes the cumulative discounted return when the agent takes action a_{i,k} in state s_{i,k}. Once the DQN is trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ [P]} for all actions in the k-th round. Each action value represents the maximum expected return the agent can achieve by selecting the particular action a_{i,k} in state s_{i,k}.
The reward: the reward observed at the end of the k-th round of the federal task is set to

r_k = β^{Δacc_{i,k}} − 1

where β is a positive number, ensuring that r_k grows exponentially with the test-accuracy difference Δacc_{i,k}. The first term incentivizes the agent to select devices that achieve higher test accuracy, and the exponent controls the change of r_k as Δacc_{i,k} increases. In general, as machine learning training progresses, model accuracy increases at a slower rate; in the federal cooperative task, however, model accuracy may even decrease because of data-distribution imbalance and heterogeneity. Thus, as FL enters its late stage, the exponential term amplifies the marginal accuracy increase. The second term, −1, encourages the agent to improve model accuracy, because when Δacc_{i,k} < 0 we have r_k ∈ (−1, 0).
The DQN agent is trained to maximize the expectation of the cumulative discounted reward, as shown by

E[ Σ_{k=1}^{K} γ^{k−1} r_k ]

where γ ∈ (0, 1] is the factor discounting future rewards.

After obtaining r_k, the CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of the federal task. The optimal action-value function Q(s_{i,k}, a_{i,k}) is the optimum sought by the RL agent, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:

Q(s_{i,k}, a_{i,k}) = E( r_k + γ max_a Q(s_{i,k+1}, a) | s_{i,k}, a_{i,k} )   (22)

Then, a function-approximation technique can be applied to learn a parameterized value function Q(s_{i,k}, a_{i,k}; w_k) approximating the optimal function Q(s_{i,k}, a_{i,k}). Here r_k + γ max_a Q(s_{i,k+1}, a) is the target that Q(s_{i,k}, a_{i,k}; w_k) learns. Generally, a DNN is used to represent the function approximator. The RL learning problem becomes minimizing the MSE loss between the target and the approximator, defined as:

l(w_k) = ( r_k + γ max_a Q(s_{i,k+1}, a; w_k) − Q(s_{i,k}, a_{i,k}; w_k) )^2   (23)

The CS updates the global parameter w_k as:

w_{k+1} = w_k − η∇l(w_k)

where η ≥ 0 is the step size.

The CS repeats the above steps to obtain the best learning model, from which it obtains the weight-ratio sequence a_{i,k} of the k-th round; the global parameter is then updated as:

G_{k+1}^A = Σ_{i=1}^{n} a_{i,k} · G*_{i,k}^A
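The TD target of equation (23) and the weighted global update can be sketched as follows (the Q function is a dictionary standing in for the DQN, and plain gradient arrays stand in for the ciphertext parameters):

```python
import numpy as np

def td_loss(q: dict, transition, actions, gamma: float = 0.9) -> float:
    """Squared error between the bootstrapped target and Q(s, a)."""
    s, a, r, s_next = transition      # B_k = (s_ik, a_ik, r_k, s_ik+1)
    target = r + gamma * max(q.get((s_next, a2), 0.0) for a2 in actions)
    return (target - q.get((s, a), 0.0)) ** 2

def aggregate(gradients: list, weights: list) -> np.ndarray:
    """Global update: G_{k+1}^A = sum_i a_{i,k} * G*_{i,k}^A."""
    return sum(w * g for w, g in zip(weights, gradients))

print(aggregate([np.ones(3), 2 * np.ones(3)], [0.3, 0.7]))        # [1.7 ...]
print(td_loss({}, ("s0", "a0", 0.5, "s1"), actions=["a0", "a1"]))  # 0.25
```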
Experimental test examples:
To verify the validity of the proposed mechanism, experimental results and analysis are given. Consider a system with 1 cloud server and 10 edge servers. The experimental learning rate α is 0.01 and the discount factor γ is 0.9. The positive number β in the reward is taken as 3. The parameter values are shown in Table 2.
TABLE 2 parameter settings
The validity of the proposed model was verified on two data sets: MNIST and CIFAR-10. The performance of the proposed federal learning model MePC-F was evaluated based on DLG-reconstructed images, average accuracy, and average loss. First, the performance of five schemes in defending against DLG attacks was evaluated, and then the proposed federal learning model MePC-F was compared with the centralized scheme and PeMPC. All results in the following scenarios are the average of 1000 independent experiments.
1) Performance against DLG attacks
This section evaluates the effectiveness of MePC-F and compares it with the FL, PeMPC and DP algorithms (Gaussian- and Laplace-distributed noise) on DLG-reconstructed images. The common gradient of the network is computed for a single image on the MNIST dataset; the results of the different schemes are shown in FIG. 4. Since the study [17] indicates that hiding the gradient of the first layer can hinder reconstruction of the data, the gradient of the first layer (weight and bias terms) is replaced by four methods — the proposed MePC-F, PeMPC, Gaussian-distributed (μ=0, σ=1) noise, and Laplace-distributed (μ=0, σ=1) noise — to observe the behavior of DLG. After the gradients of the first layer are hidden, DLG uses the remaining gradients to recover the image that created the commonly shared gradient.
As can be seen from FIG. 4, the DLG process can accurately reconstruct the training data when no method is used to hide the gradient of the first layer (FL in FIG. 4(a)). When the gradient of the first layer is protected by the proposed method MePC-F, information leakage is effectively prevented, as shown in FIG. 4(b): even when the number of iteration steps reaches 500, DLG still cannot construct the image. FIG. 4(c) shows results similar to FIG. 4(b): PeMPC can also defend against the DLG attack. As can be seen from FIG. 4(d), with Gaussian noise added to the first layer, the reconstructed image is partially visible from round 15 to round 20, by which point the basic contour of the original image has been constructed; as the number of iteration rounds increases to 500, the image can be clearly restored. The Laplace noise in FIG. 4(e) exhibits a similar phenomenon.
As can be seen from FIG. 5, if a malicious server receives the gradients of all hidden layers as plaintext, the reconstruction process obtains the lowest gradient loss and image MSE (green line in FIG. 5). As the number of rounds increases, PeMPC and MePC-F do not converge to zero, and the MSE of the image reaches 10^7. Adding Laplace or Gaussian noise to the original gradient converges to 10^{-5}; FIG. 4 likewise demonstrates that the data can be reconstructed by round 20. The larger the MSE of the image, the less likely the image is to be reconstructed.
Based on the above experimental results, it is verified that adding Laplace or Gaussian noise to the original gradient can prevent partial gradient leakage early on, but as the number of rounds increases, the original data can still be recovered through deep leakage. In contrast, PeMPC and MePC-F effectively prevent DLG attacks from reconstructing the original data regardless of the number of training rounds.
2) Performance comparison of average accuracy and average loss
In this section, the effectiveness of MePC-F is evaluated and compared with the centralized scheme and PeMPC in terms of average accuracy and average loss on the MNIST and CIFAR-10 data sets.
FIG. 6(a) shows the number of rounds required by each model to achieve 98% accuracy on the MNIST dataset. The average accuracy of all three methods increases with the number of training rounds. The centralized approach requires 25 rounds to achieve the target accuracy on MNIST data, PeMPC requires 140 rounds, and MePC-F requires 40 rounds — 71.2% fewer training rounds than PeMPC. The reason is that the proposed reinforced federal learning algorithm PreFLa finds better aggregation parameter weights a_{i,k} through interaction with the environment, copes better with Non-IID data, accelerates model convergence, and reaches the target accuracy sooner. The centralized scheme trains on all the data combined, so its accuracy is higher than that of the federal learning algorithms; but the figure shows that the convergence of PeMPC can almost reach the centralized accuracy.
FIG. 6(b) shows that the average loss of the three schemes decreases as the number of training rounds increases. For the centralized scheme, the average loss falls from 0.233 to 0.052. The average loss of PeMPC falls from 0.35 to 0.084. Meanwhile, the average loss of the proposed MePC-F falls to 0.06, which is 28.6% lower than that of PeMPC. When the number of training rounds reaches 100, the proposed MePC-F almost reaches the centralized loss value.
FIG. 7(a) shows the number of rounds required for each model to achieve the target accuracy of 50% on CIFAR-10, with results similar to those of FIG. 6(a). The average accuracy of all three models increases until the target value is reached. For the centralized scheme, the average accuracy increases from 0.42 to 0.5 in 23 rounds. The average accuracy of PeMPC increases from 0.372 to 0.5 in 89 rounds. Meanwhile, the proposed MePC-F reaches the target accuracy at 41 rounds, 53.9% fewer than PeMPC. FIG. 7(a) shows that MePC-F updates the global model with better weights a_{i,k} than PeMPC, which results in a faster convergence speed.
As can be seen from FIG. 7(b), the average loss of the three schemes decreases until a stable value is reached. The centralized scheme, MePC-F, and PeMPC reach their minimum loss values in that order, so the time efficiency of the proposed MePC-F is better than that of PeMPC.
TABLE 3 Top accuracy of the three schemes within 100 rounds

| Scheme      | MNIST | CIFAR-10 |
|-------------|-------|----------|
| centralized | 98.4% | 51.4%    |
| MePC-F      | 98.2% | 51.1%    |
| PeMPC       | 97.6% | 49.2%    |
Table 3 gives the accuracy of the three schemes within 100 rounds. For the MNIST data, the average accuracy of the proposed MePC-F is 98.2%, which is 0.6% higher than that of PeMPC; the accuracy of PeMPC almost reaches that of centralized training. For the CIFAR-10 data, the average accuracy of MePC-F at 100 rounds reaches 0.511, 1.9% higher than that of PeMPC. This shows that MePC-F, by updating with the optimal weights a_{i,k}, aggregates the global parameters better than PeMPC, resulting in higher accuracy that is closer to the centralized accuracy.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (8)
1. A real-time reinforced Federal learning data privacy security method based on a MePC-F model in the Internet of vehicles is characterized by comprising the following steps:
S1, constructing a plurality of edge servers E_i and a cloud server CS; acquiring vehicle data D = {D_1, D_2, …, D_n}, each edge server E_i acquiring its corresponding vehicle data D_i;

S2, in the k-th round of the federal task, edge server E_i downloading the initial encrypted type A gradient [G_k^A] from the cloud server CS and decrypting it into G_k^A, and randomly initializing a type B gradient G_{i,k}^B; edge server E_i computing gradients in local network model training according to its vehicle data D_i, and recording the gradient information after completing T rounds of local training as G_{i,k}^A;

S3, edge server E_i obtaining, through a decoding function f_{i,k}, the partial gradient information R_{i,k} that needs to be retained from G_{i,k}^A, homomorphically encrypting the remaining gradient information as [S_{i,k}], and broadcasting it to all other edge servers E_j through the MePC algorithm; edge server E_i obtaining, through its decoding function, the corresponding partial gradient information from the other edge servers E_j; the type A gradient information updated and shared by all edge servers being G*_{i,k}^A, i ∈ [1, n], where n is the total number of edge servers;

S4, all edge servers uploading [G*_{i,k}^A] to the cloud server CS, the cloud server CS aggregating the global parameters through the PreFLa algorithm, the PreFLa algorithm selecting the optimal parameter weight ratio a_{i,k} of each edge server E_i by maximizing the return obtained through reinforcement learning, and the global gradient parameter G_{k+1}^A being aggregated according to a_{i,k}; the uploading and downloading of parameters proceeding in parallel, and all parameters being encrypted by HE;

S5, repeating steps S2 to S4 until a termination condition is reached; the cloud server CS computing the final global gradient parameter and issuing it to each edge server; each edge server extracting features from the vehicle data, computing the accuracy and optimal loss function of the MePC-F model to obtain the trained MePC-F model, completing the whole training process, and outputting the model in real time to the corresponding Internet-of-Vehicles service.
2. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 1, wherein in the step S2, the specific method for training the local network model is as follows:
employing a deep neural network DNN model, the DNN performing end-to-end feature learning and classifier training by taking different vehicle data as raw inputs, using stochastic gradient descent as a subroutine to minimize the loss value in each local training;
$E_i$ downloads the base-layer parameters from the cloud server CS in the k-th round of communication, namely the encrypted initial class-A gradient $(G_A^{k-1})_E$, decrypts it into the class-A gradient $G_A^{k-1}$, and randomly initializes the class-B gradient $G_{B,i}^{k}$, wherein $k \in [1, K]$ and K represents the total number of rounds of the federal task; if it is the first round of the federal task, the CS randomly initializes $(G_A^{0})_E$; before local training, $E_i$ decrypts $(G_A^{k-1})_E$ by using homomorphic encryption and records it as $G_A^{k-1}$;
The loss function of the local model is set as follows:
$$L(w_i) = l(w_i) + \lambda (w_{i,t} - w_{i,t+1})^2$$
where $l(\cdot)$ represents the loss of the network, the second term is the L2 regularization term, and λ is the regularization coefficient; $w_i$ represents the total weight information in the local model, $w_{i,t}$ is the weight information of the local model at time t, and $w_{i,t+1}$ is the weight information of the local model at time t+1;
$E_i$ updates $G_i^{k}$ and replaces the weight parameter $w_i$ of the model, performing the local model training by minimizing the loss function as follows:

$$w_i = w_i - \eta G_i^{k}$$

where η is the learning rate.
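For illustration, a minimal PyTorch sketch of one local training step under the loss $L(w_i) = l(w_i) + \lambda(w_{i,t} - w_{i,t+1})^2$; the cross-entropy base loss, the hyperparameter values, and the snapshot-based implementation of the consecutive-weight penalty are assumptions, not fixed by the claim:

```python
import torch

def local_training_step(model, x, y, w_prev, lam=0.01, eta=0.05):
    # L(w_i) = l(w_i) + lambda * ||w_{i,t} - w_{i,t+1}||^2, with the
    # penalty implemented against a snapshot of the previous weights
    base_loss = torch.nn.functional.cross_entropy(model(x), y)
    reg = sum(((p - q) ** 2).sum()
              for p, q in zip(model.parameters(), w_prev))
    loss = base_loss + lam * reg
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= eta * p.grad             # w_i = w_i - eta * G_i^k
            p.grad = None
    return loss.item()

model = torch.nn.Linear(10, 2)
w_prev = [p.detach().clone() for p in model.parameters()]
x, y = torch.randn(4, 10), torch.tensor([0, 1, 0, 1])
local_training_step(model, x, y, w_prev)
```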
3. the real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein in the step S3, the specific method of the MePC algorithm is as follows:
in the k-th round of the federal task, all the edge servers use MePC to exchange the base-layer gradients $\{(G_{A,1}^{k})_E, (G_{A,2}^{k})_E, \ldots, (G_{A,n}^{k})_E\}$, wherein $(G_{A,n}^{k})_E$ represents the encrypted class-A data of the n-th edge server in the k-th round of the federal task, $(G_{A,i}^{k})_E$ represents the encrypted class-A data of the i-th edge server in the k-th round of the federal task, and $(\tilde G_{A,i}^{k})_E$ indicates the encrypted class-A data that the i-th edge server broadcasts to the other edge servers in the k-th round of the federal task, i.e. $(G_{A,i}^{k})_E$ with the encrypted data retained by the server itself removed;
to avoid the risk of the data being cracked, a random ratio χ of the gradient $G_{A,i}^{k}$ is retained in each network as $G_{A,i,i}^{k}$, the random ratio χ being kept the same within the same federal round; $G_{A,i,i}^{k}$ is then encrypted as $(G_{A,i,i}^{k})_E$; the random ratio χ varies across different rounds of the federal task, with $\chi \in (0, 1/n]$; the remaining gradient is homomorphically encrypted and divided into n − 1 parts, namely:

$$(G_{A,i}^{k})_E = (G_{A,i,i}^{k})_E \,\cup\, \{\, (G_{A,i,j}^{k})_E \mid j \in [1,n],\ j \neq i \,\}$$
only $(G_{A,i,i}^{k})_E$ is retained at $E_i$; the other parts and the random parameter χ are broadcast to the other edge servers $E_j$ in ciphertext form; in this manner, even if portions of the transmitted content are attacked, the original data $G_{A,i}^{k}$ cannot be leaked;
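A minimal sketch of this splitting step, assuming gradients are flat lists and using a caller-supplied placeholder for the homomorphic encryption routine; the random position shuffle and the round-robin division into n − 1 shares are illustrative choices the claim leaves open:

```python
import random

def mepc_split(grad, n, chi, he_encrypt):
    # retain a random chi-fraction of positions locally (chi <= 1/n),
    # split the rest into n-1 ciphertext shares for broadcast
    L = len(grad)
    L0 = int(chi * L)                       # L0 = chi * L
    pos = list(range(L))
    random.shuffle(pos)
    keep, rest = pos[:L0], pos[L0:]
    shares = [rest[j::n - 1] for j in range(n - 1)]   # n-1 broadcast parts
    retained = {p: he_encrypt(grad[p]) for p in keep}
    broadcast = [{p: he_encrypt(grad[p]) for p in s} for s in shares]
    return retained, broadcast

retained, broadcast = mepc_split([0.5] * 12, n=4, chi=0.25,
                                 he_encrypt=lambda x: x)  # identity stand-in
```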
4. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles as claimed in claim 3, wherein the specific method for locally performing data verification in the step S3 is as follows:
in the k-th round of the federal task, verification is performed using the corresponding "multiplication" method, with each edge server designing two decoding functions on its own, binary vectors of length L, of which $f_{i,k} \in \{0,1\}^{L}$ marks with $L_0$ ones the positions belonging to $E_i$;
wherein $L_0$ is the length of the retained portion $(G_{A,i,i}^{k})_E$, and L is the length of $(G_{A,i}^{k})_E$; the subscript k of the decoding function represents the decoding function in the k-th round of the federal task;
$$L_0 = \chi \cdot L$$
it is required that the decoding functions of all the edge servers, applied to the same data packet, yield all 0s under the bitwise "and" operation and all 1s under the bitwise "or" operation, namely:

$$\bigwedge_{i=1}^{n} f_{i,k} = \mathbf{0}, \qquad \bigvee_{i=1}^{n} f_{i,k} = \mathbf{1}$$
first, initial decoding functions satisfying these two conditions are constructed for the first round;
the data packet received from $E_j$ is multiplied position-wise with the corresponding decoding function; since every position where the binary bit of $f_{i,k}$ is 0 is multiplied by 0, $E_i$ ensures that it obtains only its own partial data packet; where the binary bit of $f_{i,k}$ is 1, the ciphertext of the gradient information at the corresponding position is obtained:

$$(G_{A,j,i}^{k})_E = f_{i,k} \odot (\tilde G_{A,j}^{k})_E$$
$E_i$ adds the data packet arrays obtained from the other edge servers $E_j$ at the corresponding positions to obtain the complete ciphertext data, and updates it into the final ciphertext data $(\bar G_{A,i}^{k})_E$, namely:

$$(\bar G_{A,i}^{k})_E = \sum_{j=1}^{n} (G_{A,j,i}^{k})_E$$

where $(G_{A,i,i}^{k})_E$ is the portion retained by $E_i$ itself;
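A sketch of decoding functions satisfying the all-0 "and" / all-1 "or" conditions, built here from contiguous disjoint blocks, which is one possible construction the claim does not fix, together with the multiply-and-add assembly described above:

```python
def make_decoding_functions(n, L):
    # disjoint contiguous blocks: AND of all masks = all 0, OR = all 1
    block = L // n
    masks = []
    for i in range(n):
        lo, hi = i * block, ((i + 1) * block if i < n - 1 else L)
        masks.append([1 if lo <= p < hi else 0 for p in range(L)])
    return masks

def assemble(mask, packets):
    # multiply each received packet by the own decoding function and add
    # the results at corresponding positions (the "multiplication" method)
    out = [0] * len(mask)
    for pkt in packets:
        for p, bit in enumerate(mask):
            out[p] += bit * pkt[p]
    return out

masks = make_decoding_functions(n=3, L=9)
packets = [[j + 1] * 9 for j in range(3)]       # dummy ciphertext packets
final = assemble(masks[0], packets)
```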
5. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein in the step S4, the specific method of the PreFLa algorithm is as follows:
the PreFLa algorithm adopts reinforcement learning RL for adaptation to select the optimal parameter weight ratio $a_{i,k}$ for aggregating the global parameters $(G_A^{k})_E$;
in the uplink communication stage, each edge server not only trains the local model but also uploads the local parameters to the cloud server CS for joint aggregation; after the execution of the MePC algorithm in the k-th round of the federal task, $E_i$ uploads the parameters $(\bar G_{A,i}^{k})_E$ and $(G_{B,i}^{k})_E$ to the CS over a TLS/SSL secure channel; in the aggregation stage, owing to the unbalanced distribution and the heterogeneity of the data of each edge server, the model parameters used for aggregation have a crucial influence on the convergence speed of the aggregation stage; therefore, the parameter weight ratio $a_{i,k}$ of the participant $E_i$ in the k-th round of federal aggregation needs to be considered;
the parameter weight ratio is predicted by using DQN-based reinforcement learning, and information is stored through a Q function to prevent the curse of dimensionality of the state space; in order to better realize model personalization and reduce the waiting time of the uploading weights in MePC-F, the DQN is used to select the optimal parameter weight ratio $a_{i,k}$ for aggregating the global parameters $(G_A^{k})_E$ in the CS update; the reinforcement learning comprises: states, actions, reward functions, and feedback.
6. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 5, wherein in the step S4, the specific methods of the states, actions, reward functions and feedback are as follows:
the states: the state of the k-th round is $s_{i,k}$, wherein $\Delta acc_{i,k}$ is the accuracy difference, expressed as:

$$\Delta acc_{i,k} = acc_{i,k} - acc_{i,k-1}$$

where $acc_{i,k}$ is the test accuracy of $E_i$ in the k-th round;
the actions: the parameter weight ratio $a_{i,k}$ is expressed as the action of the k-th round of the federal task; in order to avoid being trapped in a locally optimal solution, the ε-greedy algorithm is adopted to optimize the action selection process to obtain $a_{i,k}$:

$$a_{i,k} = \begin{cases} \arg\max_{a \in P} Q(s_{i,k}, a), & rand > \epsilon \\ \text{random action from } P, & rand \leq \epsilon \end{cases}$$
where P is the set of weight permutations, rand is a random number with $rand \in [0,1]$, and $Q(s_{i,k}, a_{i,k})$ denotes the cumulative discounted benefit over time of the agent taking action $a_{i,k}$ in state $s_{i,k}$; once the DQN is trained to approximate $Q(s_{i,k}, a_{i,k})$, during testing the DQN agent computes $\{Q(s_{i,k}, a_{i,k}) \mid a_{i,k} \in P\}$ for all actions in the k-th round; each action value represents the maximum expected return obtained by the agent selecting the particular action $a_{i,k}$ in state $s_{i,k}$;
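A minimal sketch of the ε-greedy selection over the weight-permutation set P; the dictionary representation of the Q-values and the ε default are assumptions:

```python
import random

def epsilon_greedy_action(q_values, P, epsilon=0.1):
    # explore with probability epsilon, otherwise exploit max Q
    if random.random() <= epsilon:
        return random.choice(P)
    return max(P, key=lambda a: q_values[a])

P = [0.1, 0.2, 0.3, 0.4]                    # candidate weight ratios
q = {a: random.random() for a in P}         # Q(s_{i,k}, a) estimates
a_ik = epsilon_greedy_action(q, P)
```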
the reward: the reward observed at the end of the k-th round of the federal task is set as:

$$r_k = \xi^{\Delta acc_{i,k}} - 1$$

wherein ξ is a positive number greater than 1, ensuring that $r_k$ grows exponentially with the training accuracy difference $\Delta acc_{i,k}$; this incentivizes the agent to select the devices that can achieve higher test accuracy, and ξ also controls how $r_k$ changes as $\Delta acc_{i,k}$ increases; when $\Delta acc_{i,k} < 0$, $r_k \in (-1, 0)$;
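For illustration, the reward can be realized as follows, assuming the reconstructed form $r_k = \xi^{\Delta acc_{i,k}} - 1$ with an arbitrary base ξ > 1; the specific base is not fixed by the claim:

```python
def reward(delta_acc, xi=64.0):
    # r_k = xi**delta_acc - 1: exponential in the accuracy difference;
    # for delta_acc < 0 (and xi > 1) this stays inside (-1, 0)
    return xi ** delta_acc - 1.0

print(reward(0.02), reward(-0.02))
```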
the DQN agent is trained to maximize the expectation of the cumulative discounted reward, as shown by:

$$\max\; \mathbb{E}\Big[\sum_{k=1}^{K} \gamma^{k-1} r_k\Big]$$

wherein $\gamma \in (0,1]$ represents a factor discounting future rewards;
after obtaining $r_k$, the cloud server CS saves the multi-dimensional quadruplet $B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1})$ for each round of the federal task; the optimal action-value function $Q(s_{i,k}, a_{i,k})$ is the memo sought by the RL agent, defined as the maximum expectation of the cumulative discounted return starting from $s_{i,k}$:

$$Q(s_{i,k}, a_{i,k}) = \mathbb{E}\big[\, r_k + \gamma \max_{a'} Q(s_{i,k+1}, a') \,\big|\, s_{i,k}, a_{i,k} \big]$$
a parameterized value function $Q(s_{i,k}, a_{i,k}; w_k)$ is learned by using function approximation techniques to approximate the optimal value function $Q(s_{i,k}, a_{i,k})$; $r_k + \gamma \max_{a'} Q(s_{i,k+1}, a'; w_k)$ is the learning target of $Q(s_{i,k}, a_{i,k}; w_k)$; a DNN is used to represent the function approximator; the RL learning problem then becomes minimizing the MSE loss between the target and the approximator, defined as:

$$l(w_k) = \big( r_k + \gamma \max_{a'} Q(s_{i,k+1}, a'; w_k) - Q(s_{i,k}, a_{i,k}; w_k) \big)^2$$
the CS updates the DQN parameter $w_k$ by a gradient step as follows:

$$w_{k+1} = w_k - \eta \nabla_{w_k} l(w_k)$$

wherein $\eta \geq 0$ is the step size;
after the cloud server CS obtains the optimal learning model, the global parameter is updated with the k-th round weight ratio sequence $a_{i,k}$ as follows:

$$(G_A^{k})_E = \sum_{i=1}^{n} a_{i,k} \, (\bar G_{A,i}^{k})_E$$
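A compact PyTorch sketch of this DQN update on a batch of stored quadruplets; the network architecture, the batch handling, and the absence of a separate target network and replay buffer are simplifications, not the patent's fixed design:

```python
import torch

def dqn_update(q_net, batch, gamma=0.9, eta=1e-3):
    s, a, r, s_next = batch                    # quadruplet B_k tensors
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                      # bootstrapped target
        target = r + gamma * q_net(s_next).max(dim=1).values
    loss = torch.nn.functional.mse_loss(q_sa, target)   # l(w_k)
    loss.backward()
    with torch.no_grad():
        for p in q_net.parameters():
            p -= eta * p.grad                  # w_{k+1} = w_k - eta * grad
            p.grad = None
    return loss.item()

q_net = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 5))
batch = (torch.randn(8, 4), torch.randint(0, 5, (8,)),
         torch.randn(8), torch.randn(8, 4))
dqn_update(q_net, batch)
```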
7. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein the HE encryption method in the method specifically comprises the following steps:
the encryption schemes of the weight matrix and the offset vector follow the same idea; the additive homomorphic encryption of a real number a is expressed as $a_E$, and in additive homomorphic encryption, for any two numbers a and b, $a_E + b_E = (a+b)_E$; any real number r is converted into an encoded rational fixed-point number v as:

$$v = \lfloor r \cdot 2^{d} \rceil \bmod V$$
considering the gradient $G_{A,i}^{k}$, each encoded real number r in it can be expressed as a rational H-bit number consisting of one sign bit, z integer bits, and d fractional bits; thus, each encodable rational number is defined by its H = 1 + z + d bits; the encoding is performed so as to allow multiplication operations, which require the operation modulus to be H + 2d bits in order to avoid overflow;
the decoding is defined as:

$$r = \begin{cases} v/2^{d}, & v < V/2 \\ (v - V)/2^{d}, & v \geq V/2 \end{cases}$$
the multiplication of these encoded numbers requires the removal of one scaling factor $2^{d}$; when Paillier additive encryption is used, the condition for encoded multiplication can be computed accurately, but homomorphic multiplication can be guaranteed only once; for simplicity, this removal is processed at decoding time;
the largest encryptable integer is V − 1, so the largest encodable real number must be taken into account; the integer bit width z and the fractional bit width d are therefore chosen such that:

$$V \geq 2^{H+2d} = 2^{1+z+3d}$$
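A self-contained sketch of this sign/integer/fraction fixed-point coding, with the encoding and decoding maps reconstructed as in the equations above; the rounding convention is an assumption:

```python
def encode(r, d, V):
    # fixed-point: scale by 2^d, round, reduce modulo V
    return int(round(r * (1 << d))) % V

def decode(v, d, V):
    # residues above V/2 represent negative numbers
    if v >= V // 2:
        v -= V
    return v / (1 << d)

z, d = 10, 8                     # integer / fractional bit widths
H = 1 + z + d                    # sign + integer + fraction
V = 1 << (H + 2 * d)             # modulus large enough for one product
a, b = encode(1.5, d, V), encode(-2.0, d, V)
# one multiplication of encodings leaves a spare 2^d scale factor;
# as in the claim, its removal is deferred to decoding time:
assert abs(decode(a * b % V, 2 * d, V) - (1.5 * -2.0)) < 1e-6
```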
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210816716.3A CN115310121B (en) | 2022-07-12 | 2022-07-12 | Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115310121A CN115310121A (en) | 2022-11-08 |
CN115310121B true CN115310121B (en) | 2023-04-07 |
Family
ID=83857637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210816716.3A Active CN115310121B (en) | 2022-07-12 | 2022-07-12 | Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115310121B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115731424B (en) * | 2022-12-03 | 2023-10-31 | 北京邮电大学 | Image classification model training method and system based on enhanced federal domain generalization |
CN115860789B (en) * | 2023-03-02 | 2023-05-30 | 国网江西省电力有限公司信息通信分公司 | CES day-ahead scheduling method based on FRL |
CN116610958B (en) * | 2023-06-20 | 2024-07-26 | 河海大学 | Unmanned aerial vehicle group reservoir water quality detection oriented distributed model training method and system |
CN117812564B (en) * | 2024-02-29 | 2024-05-31 | 湘江实验室 | Federal learning method, device, equipment and medium applied to Internet of vehicles |
CN117873402B (en) * | 2024-03-07 | 2024-05-07 | 南京邮电大学 | Collaborative edge cache optimization method based on asynchronous federal learning and perceptual clustering |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111160573B (en) * | 2020-04-01 | 2020-06-30 | 支付宝(杭州)信息技术有限公司 | Method and device for protecting business prediction model of data privacy joint training by two parties |
CN111611610B (en) * | 2020-04-12 | 2023-05-30 | 西安电子科技大学 | Federal learning information processing method, system, storage medium, program, and terminal |
CN112100295A (en) * | 2020-10-12 | 2020-12-18 | 平安科技(深圳)有限公司 | User data classification method, device, equipment and medium based on federal learning |
CN112199702B (en) * | 2020-10-16 | 2024-07-26 | 鹏城实验室 | Privacy protection method, storage medium and system based on federal learning |
CN112015749B (en) * | 2020-10-27 | 2021-02-19 | 支付宝(杭州)信息技术有限公司 | Method, device and system for updating business model based on privacy protection |
CN112347500B (en) * | 2021-01-11 | 2021-04-09 | 腾讯科技(深圳)有限公司 | Machine learning method, device, system, equipment and storage medium of distributed system |
CN113037460B (en) * | 2021-03-03 | 2023-02-28 | 北京工业大学 | Federal learning privacy protection method based on homomorphic encryption and secret sharing |
CN113435472A (en) * | 2021-05-24 | 2021-09-24 | 西安电子科技大学 | Vehicle-mounted computing power network user demand prediction method, system, device and medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||