CN115310121A - Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles - Google Patents
- Publication number: CN115310121A
- Application number: CN202210816716.3A
- Authority: CN (China)
- Prior art keywords: model, data, federal, mepc, gradient
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06N20/00—Machine learning
- H04L63/0428—Network security protocols wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. networks in vehicles
- H04L9/008—Cryptographic mechanisms or arrangements involving homomorphic encryption
- H04L2463/062—Network security details applying encryption of the keys
- Y02T10/40—Engine management systems
Abstract
The invention discloses a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles, which comprises the following steps: build multiple edge servers E_i and a cloud server CS; each edge server E_i downloads the encrypted initial class-A gradient from the cloud server CS, decrypts it, randomly initializes its class-B gradient, and carries out local model training; E_i extracts, through a decoding function, the partial gradient information it retains, homomorphically encrypts the remaining gradient information, and broadcasts it to all other edge servers E_j through the MePC algorithm; after all edge servers have updated and shared their class-A gradient information, they upload it to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; the above steps are repeated until a termination condition is reached. The invention prevents data leakage between terminals, realizes privacy protection of the data, and reduces communication overhead while preventing leakage of the original data.
Description
Technical Field
The invention relates to the technical field of real-time security behavior analysis through cooperative processing by connected-vehicle users, and in particular to a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles.
Background
With the development of the various real-time communications and services supported by the Internet of Vehicles, the volume of data generated by connected equipment such as on-board units has become unprecedentedly large, accompanied by masses of heterogeneous vehicle-user-oriented data and differences in device computing capacity. Federated learning provides an effective solution for meeting the data-security requirements of real-time network-model training: different edge devices can cooperatively train a machine learning model without exposing the original data.
Massive edge-computing data is closely tied to users' personal privacy; for example, a user's trajectory, credit-card and billing data directly concern the user's privacy, and any leakage poses a great potential safety hazard. Federated learning can protect data to some extent, but the risk of information leakage still exists, in four forms: 1) membership leakage; 2) unexpected feature leakage; 3) leakage of class representatives of the original data; and 4) leakage of the original data itself. The last type of leakage is the least acceptable for privacy-sensitive participants.
In order to protect the data privacy of mobile users and address the leakage of raw data, researchers have studied cryptography-based data protection extensively: differential privacy, homomorphic encryption, and secure multi-party computation. Differential privacy generally uses one of three noise-addition mechanisms: the Laplace mechanism, the Gaussian mechanism, or the exponential mechanism. Context information is perturbed by adding noise to protect data privacy, but if too much noise is added, the performance of model training suffers. Common homomorphic encryption schemes are additive and multiplicative: research shows that noise doubles under Paillier additive homomorphic encryption, while it grows quadratically under El Gamal multiplicative homomorphic encryption. To increase data availability and overcome the noise problem, researchers introduced bootstrapping, which reduces noise by setting thresholds for encryption and decryption so that the scheme can evaluate an unlimited number of operations; batching, parallel homomorphic computation, or ciphertext compression can also be used. Secure multi-party computation lets multiple participants securely compute an agreed function without a trusted third party; its main purpose is to keep each party's private input independent during the computation and to leak no local data. Research has shown that secure multi-party computation can solve the gradient-leakage problem in federated learning, and that exchanging information only for the first hidden layer suffices to protect the data while preserving accuracy. However, the information exchange is peer-to-peer (P2P), which incurs a large communication overhead.
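Of the three cryptographic approaches surveyed above, differential privacy is the simplest to illustrate concretely. The sketch below is our own illustration, not part of the patented method; the function names are ours. It implements the Laplace mechanism and shows the noise/utility trade-off the text warns about: a smaller privacy budget epsilon means a larger noise scale and less accurate released answers.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample, built as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_value with epsilon-DP by adding Laplace(sensitivity/epsilon) noise."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
# Smaller epsilon (stronger privacy) -> larger noise scale -> noisier answers.
noisy_strong_privacy = [laplace_mechanism(100.0, 1.0, 0.1, rng) for _ in range(2000)]
noisy_weak_privacy = [laplace_mechanism(100.0, 1.0, 10.0, rng) for _ in range(2000)]
err_strong = sum(abs(x - 100.0) for x in noisy_strong_privacy) / 2000
err_weak = sum(abs(x - 100.0) for x in noisy_weak_privacy) / 2000
```

The mean absolute error tracks the noise scale sensitivity/epsilon, which is exactly why piling on noise for privacy eventually hurts model training.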
Most cryptography-based data-protection research is centralized, aiming to control time overhead while protecting data: federated learning allows edge devices to co-train machine learning models without exposing raw data. Federated learning typically adopts a parameter-server architecture, in which the parameter server synchronizes the local models of the clients: the central server sends the global model to multiple clients synchronously, and after training on local data, the clients synchronously return their updated models. This can be slow because of stragglers, and global synchronization is very difficult, especially in a federated-learning scenario, owing to limited computing power and battery life and to device availability and completion times that vary from device to device. A new asynchronous joint-optimization algorithm has therefore been proposed that solves a regularized local problem to guarantee convergence, so that multiple devices and a server can train a model cooperatively and efficiently without revealing privacy.
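The synchronous parameter-server aggregation just described (federated averaging) can be sketched in a few lines. This is a generic illustration with hypothetical toy data and function names of our own choosing, not the patent's PreFLa aggregation: each round the server broadcasts the global weights, every client trains locally, and the server averages the returned weights by sample count.

```python
from typing import List, Tuple

def local_update(weights: List[float], data: List[Tuple[float, float]],
                 lr: float = 0.05, epochs: int = 5) -> List[float]:
    """One client's local SGD on a least-squares fit y ~ w[0]*x + w[1]."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = w[0] * x + w[1] - y
            w[0] -= lr * err * x
            w[1] -= lr * err
    return w

def fedavg_round(global_w: List[float],
                 clients: List[List[Tuple[float, float]]]) -> List[float]:
    """One synchronous round: broadcast the global model, train locally on
    each client, then average the returned models weighted by sample count."""
    total = sum(len(c) for c in clients)
    updates = [local_update(global_w, c) for c in clients]
    return [sum(len(c) * u[j] for c, u in zip(clients, updates)) / total
            for j in range(len(global_w))]

# Two clients whose private data follow the same rule y = 2x + 1; neither
# ever sends raw samples to the server, only model weights.
clients = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
w = [0.0, 0.0]
for _ in range(300):
    w = fedavg_round(w, clients)
```

Because the server must wait for every client each round, a single straggler stalls the whole round, which is the latency problem the asynchronous variant addresses.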
Although there have been many studies on data security, most are limited to the safety of the original data, and how to simultaneously satisfy the privacy and usability requirements of mobile users' big data in the complex Internet-of-Vehicles space remains open.

First, because data in federated learning stays on local nodes, the risk of original-data leakage during transmission is reduced; but even when only gradient information is transmitted, the original data may still be recovered. Data interaction in secure multi-party computation spreads the data among multiple parties, reducing the possibility that a sample can be reconstructed after gradient information leaks. In existing secure multi-party computation, however, every user sends information to every other user (in short, unicast), which brings high time overhead. When addressing vehicle users' data-security and real-time requirements, it is therefore important to find a solution that reduces both the risk of data being attacked and recovered and the transmission delay. Second, because different edge servers differ in data and equipment, the training precision of the overall model must also be improved in a targeted way during training. Global parameter aggregation in the typical synchronous federated-averaging manner is slowed by stragglers. While balancing communication time against computation, it is also important to guarantee global precision through personalized training of multiple models. Most data-security-oriented federated learning algorithms, however, rely on synchronous aggregation, whose high latency challenges the real-time requirements of the Internet of Vehicles. A federated learning algorithm based on reinforcement learning is therefore needed to reduce latency, improve accuracy, and guarantee data security.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, against the defects of the prior art, a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles.

The technical scheme adopted by the invention to solve the technical problem is as follows:

The invention provides a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles, comprising the following steps:
S1. Construct a plurality of edge servers E_i and a cloud server CS; acquire the vehicle data D = {D_1, D_2, …, D_n}, with each edge server E_i acquiring its corresponding vehicle data D_i;
S2. In the k-th round of the federated task, edge server E_i downloads the encrypted initial class-A gradient from the cloud server CS, decrypts it, and randomly initializes its class-B gradient; E_i computes gradients in local network model training according to its vehicle data D_i, and records the resulting gradient information after completing T rounds of local training;
S3. Edge server E_i uses its decoding function to extract from its class-A gradient the partial gradient information to be retained locally, homomorphically encrypts the remaining gradient information, and broadcasts it to all other edge servers E_j through the MePC algorithm; E_i likewise obtains, according to the decoding functions, the corresponding partial gradient information from the other edge servers E_j. After all edge servers have updated and shared, each holds its updated class-A gradient information, i ∈ [1, n], where n is the total number of edge servers;
S4. All edge servers upload their class-A gradients to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; PreFLa maximizes the reward through reinforcement learning to select each edge server E_i's optimal parameter weight ratio a_{i,k}, and the global gradient parameter is aggregated according to a_{i,k}. The uploading and downloading of parameters proceed in parallel, and all parameters are encrypted with HE;
S5. Repeat steps S2 to S4 until a termination condition is reached. The cloud server CS then computes the final global gradient parameter and issues it to each edge server; each edge server extracts features from its vehicle data and computes the accuracy and the optimal loss function of the MePC-F model, yielding the trained MePC-F model. The whole training process is thus completed, and the model is output in real time to the corresponding Internet-of-Vehicles service.
Further, in step S2 of the present invention, the specific method for training the local network model is as follows:

A deep neural network (DNN) model is employed. The DNN performs end-to-end feature learning and classifier training by taking different vehicle data as raw input, using stochastic gradient descent as a subroutine to minimize the loss value in each round of local training.

In the k-th round of communication, E_i downloads the base-layer parameters, i.e., the encrypted initial class-A gradient, from the cloud server CS, decrypts them into the class-A gradient, and randomly initializes the class-B gradient, where k ∈ [1, K] and K denotes the total number of rounds of the federated task. If this is the first round of the federated task, the CS randomly initializes the class-A gradient. Before local training, E_i decrypts the downloaded parameters using homomorphic encryption and records the result.
The loss function of the local model is set as follows:

L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})^2

where l(·) denotes the loss of the network, the second term is an L2 regularization term, and λ is the regularization coefficient; w_i denotes the overall weight information of the local model, w_{i,t} is the weight information of the local model at time t, and w_{i,t+1} is the weight information at time t+1;
E_i initializes G_k and substitutes it for the model's weight parameter w_i, continuing local model training by minimizing the loss function as follows:

w_i = w_i − ηG_k

After completing T rounds of local training, edge server E_i obtains the accuracy acc_{i,k} of its local model together with the corresponding gradient information.
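The local update above can be sketched as a plain gradient step in which the gradient of the proximal L2 term λ(w_{i,t} − w_{i,t+1})^2, namely 2λ(w − w_prev), is folded into G. The quadratic toy objective and function names below are our own illustration, not the patent's network loss.

```python
from typing import Callable, List

def proximal_sgd_step(w: List[float], w_prev: List[float],
                      grad_l: Callable[[List[float]], List[float]],
                      eta: float, lam: float) -> List[float]:
    """w <- w - eta * G, where G adds the gradient of the proximal penalty
    lam * (w_prev - w)^2, i.e. 2*lam*(w - w_prev), to the task gradient.
    The penalty discourages large jumps between consecutive iterates."""
    g_task = grad_l(w)
    return [wi - eta * (gi + 2.0 * lam * (wi - pi))
            for wi, gi, pi in zip(w, g_task, w_prev)]

# Toy quadratic l(w) = (w - 3)^2 in one dimension; its minimizer is w = 3.
grad = lambda w: [2.0 * (w[0] - 3.0)]
w, w_prev = [0.0], [0.0]
for _ in range(200):
    w, w_prev = proximal_sgd_step(w, w_prev, grad, eta=0.1, lam=0.05), w
```

The proximal term leaves the minimizer unchanged (it vanishes once the iterates stop moving) while damping oscillation between rounds, which is why it helps cooperation across heterogeneous edge servers.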
Further, in step S3 of the present invention, the specific method of the MePC algorithm is as follows:

In the k-th round of the federated task, all edge servers use MePC to exchange their base-layer (class-A) gradients: the i-th edge server broadcasts its encrypted class-A data to the other edge servers, withholding only the portion it retains for itself.

To avoid the risk of the data being cracked, each edge server retains a random fraction χ of its gradient, with the same χ used by all servers within the same federated round; the remaining gradient is homomorphically encrypted. The random fraction χ varies across different rounds of the federated task, and χ ∈ (0, 1/n]. The homomorphically encrypted remaining gradient is divided into n − 1 parts.

Only the retained fraction stays at E_i; the other parts, together with the random parameter χ, are broadcast in ciphertext form to the other servers E_j. In this way, even if part of the transmitted content is attacked, the original data is not leaked.
Further, in step S3 of the present invention, the specific method for local data verification is as follows:

In the k-th round of the federated task, verification is performed using the corresponding "multiplication" method, with each edge server designing its own decoding functions, where L_0 is the length of the retained portion and L is the length of the full gradient; the subscript k of a decoding function denotes the decoding function of the k-th round of the federated task:

L_0 = χ·L

It is required that the decoding functions of all edge servers, applied to the same data packet, yield all 0s under the bitwise AND operation and all 1s under their union (OR); that is, the binary masks are pairwise disjoint and jointly cover every position.
First, initial decoding functions are constructed. Each data packet is multiplied by the corresponding decoding functions of the other servers; since the 0 bits of a decoding function zero out the positions they cover, E_i is guaranteed to obtain only its own portion of the data packet. Where a decoding function's binary bit is 1, the ciphertext of the gradient information at the corresponding position is obtained.

E_i adds all the data-packet arrays obtained from the other edge servers E_j into the corresponding positions to obtain the complete ciphertext data, and updates it as the final ciphertext data.

As k increases, each time a secure multi-party computation is performed, each E_i's decoding function is cyclically shifted left by m units. This keeps the sharing dynamic and divides the shared gradient equally among E_1, E_2, …, E_n with no repeated data between the parts.
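The decoding-function idea (disjoint binary masks whose pairwise AND is all 0s and whose union is all 1s, with ownership rotated each round) can be sketched with plaintext stand-ins for the ciphertexts. The partition rule and function names below are illustrative assumptions, not the patent's exact construction.

```python
from typing import List

def make_masks(length: int, n: int, shift: int) -> List[List[int]]:
    """Binary decoding masks for n servers over `length` gradient positions.
    Any two masks AND to all zeros; together they OR to all ones. `shift`
    cyclically rotates ownership from one federated round to the next."""
    masks = [[0] * length for _ in range(n)]
    for pos in range(length):
        masks[(pos + shift) % n][pos] = 1
    return masks

def collect(broadcasts: List[List[float]], mask: List[int]) -> List[float]:
    """A receiver keeps, from every broadcast gradient, only the positions
    its own mask selects; all other positions read as zero to it."""
    n, length = len(broadcasts), len(mask)
    return [sum(broadcasts[i][p] for i in range(n)) if mask[p] else 0.0
            for p in range(length)]

# Three servers, gradients of length 6, each broadcast once (O(n) messages
# instead of O(n^2) pairwise unicasts).
grads = [[1.0] * 6, [2.0] * 6, [4.0] * 6]
masks = make_masks(6, n=3, shift=0)
pieces = [collect(grads, m) for m in masks]
# Summing every server's collected piece recovers the full elementwise sum,
# yet no single server saw more than its masked share of each broadcast.
combined = [sum(p[j] for p in pieces) for j in range(6)]
```

Rotating `shift` each round changes which server owns which positions, so no server accumulates the same slice of everyone's gradient across rounds.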
Further, in step S4 of the present invention, the specific method of the PreFLa algorithm is as follows:

PreFLa uses reinforcement learning (RL) to adaptively select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters.

In the uplink communication stage, each edge server not only trains its local model but also uploads its local parameters to the cloud server CS for joint aggregation. After the MePC algorithm is executed in the k-th federated round, E_i uploads its parameters to the CS over a TLS/SSL secure channel. In the aggregation stage, because of the imbalanced distribution and heterogeneity of each edge server's data, the model parameters used for aggregation have a crucial influence on the convergence speed; it is therefore necessary to consider each participant E_i's parameter weight ratio a_{i,k} in the k-th round of federated aggregation.

Reinforcement learning based on DQN is used to predict the parameter weight ratios, storing the information in an approximated Q function to avoid the curse of dimensionality. To better achieve model personalization and reduce the latency of uploading weights in MePC-F, the DQN selects the optimal parameter weight ratio a_{i,k} for aggregating the global parameters in the updated CS. The reinforcement learning comprises: states, actions, a reward function, and feedback.
Further, in step S4 of the present invention, the state, action, reward function and feedback are specified as follows:

State: the state of the k-th round, which includes the accuracy difference of each edge server.

Action: the parameter weight ratio a_{i,k} represents the action of the k-th round of the federated task. To avoid falling into a locally optimal solution, an ε-greedy algorithm is adopted to optimize the action-selection process and obtain a_{i,k}:

where P is the set of weight permutations, rand is a random number with rand ∈ [0, 1], and Q(s_{i,k}, a_{i,k}) denotes the cumulative discounted return obtained when the agent takes action a_{i,k} in state s_{i,k}. Once the DQN has been trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ P} for all actions in the k-th round; each action value represents the maximum expected reward the agent can obtain by selecting a particular action a_{i,k} in state s_{i,k};
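The ε-greedy selection described above can be sketched as follows. This is an illustrative stand-in: `q_values` represents the row of Q values over the candidate actions (here, the candidate weight ratios in P), and the function name is ours.

```python
import random

def epsilon_greedy(q_values, epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random action (explore); otherwise
    pick the action with the highest estimated Q value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(42)
q = [0.1, 0.7, 0.3]
greedy_pick = epsilon_greedy(q, 0.0, rng)              # epsilon = 0: pure exploitation
picks = [epsilon_greedy(q, 0.3, rng) for _ in range(1000)]
```

With epsilon = 0.3 the best action is still chosen about 80% of the time, but the occasional random pick keeps the agent from locking onto a locally optimal weight ratio.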
Reward: the reward r_k observed at the end of the k-th federated round is set so that it grows exponentially with the test-accuracy change Δacc_{i,k}; its base is a positive constant. The first role of the reward is to encourage the agent to select devices that can achieve higher test accuracy; the base also controls how r_k changes as Δacc_{i,k} increases. When Δacc_{i,k} < 0, r_k ∈ (−1, 0).

The DQN agent is trained to maximize the expected cumulative discounted reward, where γ ∈ (0, 1] is the factor that discounts future rewards;
After obtaining r_k, the cloud server CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of the federated task. The optimal action-value function Q(s_{i,k}, a_{i,k}) is what the RL agent seeks, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:

Q(s_{i,k}, a_{i,k}) = E(r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a') | s_{i,k}, a_{i,k})

A parameterized value function Q(s_{i,k}, a_{i,k}; w_k) is learned with function-approximation techniques to approximate the optimal value function Q(s_{i,k}, a_{i,k}); r_k + γ max_{a'} Q(s_{i,k+1}, a') is the learning target of Q(s_{i,k}, a_{i,k}; w_k), and a DNN is used as the function approximator. The RL learning problem then becomes minimizing the MSE loss between the target and the approximator, defined as:

l(w_k) = (r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a'; w_k) − Q(s_{i,k}, a_{i,k}; w_k))^2
The CS updates the global parameter w_k by a gradient step on this loss, where η ≥ 0 is the step size.

After the cloud server CS obtains the optimal learning model, the weight-ratio sequence a_{i,k} of the k-th round is obtained and the global parameter is updated accordingly.
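For a tabular Q, the TD target r + γ max_{a'} Q(s', a') and the squared-error update above reduce to the classic Q-learning rule. The two-state toy MDP below is our own illustration of that update, not the PreFLa environment.

```python
from typing import Dict, List

def td_target(r: float, q_next: List[float], gamma: float) -> float:
    """Bellman target r + gamma * max_a' Q(s', a')."""
    return r + gamma * max(q_next)

def q_update(q: Dict[int, List[float]], s: int, a: int, r: float,
             s_next: int, alpha: float, gamma: float) -> None:
    """Step down the squared TD error (target - Q(s,a))^2; for a table this
    is the update Q(s,a) += alpha * (target - Q(s,a))."""
    q[s][a] += alpha * (td_target(r, q[s_next], gamma) - q[s][a])

gamma, alpha = 0.9, 0.5
q = {0: [0.0, 0.0], 1: [0.0, 0.0]}   # state 1 is terminal, its Q stays 0
for _ in range(200):
    q_update(q, 0, 1, 1.0, 1, alpha, gamma)  # action 1: reward 1, terminate
    q_update(q, 0, 0, 0.0, 0, alpha, gamma)  # action 0: stay, no reward
```

The action-value of terminating with reward 1 converges to 1, while staying converges to γ times the best continuation, illustrating how the discount γ propagates future reward back through the target.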
Further, the HE encryption method used in the present invention is specifically as follows:

The encryption schemes for the weight matrix and the bias vector follow the same idea. The additive homomorphic encryption of a real number a is denoted a_E; in additive homomorphic encryption, for any two numbers a and b, a_E + b_E = (a + b)_E. Any real number r is converted into an encoded fixed-point rational number v.

Each encoded real number r in the gradient can be represented as a rational H-bit number consisting of one sign bit, z integer bits, and d fractional bits; thus each encodable rational number is defined by its H = 1 + z + d bits. The encoding is designed to allow multiplication, which requires the operating modulus to be on the order of H + 2d bits to avoid overflow.
Decoding is defined as the inverse mapping. Multiplication of these encoded numbers requires removing a factor of 1/2^d; when Paillier additive encryption is used, the conditions for encoded multiplication can be computed exactly, but homomorphic multiplication can be guaranteed only once. For simplicity, this factor is handled at decoding time.

The largest encryptable integer is V − 1, so the largest encryptable real number must be taken into account; the integer bits z and fractional bits d are therefore chosen such that:

V ≥ 2^(H+2d) ≥ 2^(1+z+3d)
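The sign/integer/fraction fixed-point encoding above can be sketched as follows. Representing negative values by wrapping into the top half of the ring is a standard choice we assume here, and taking V as a power of two is for illustration only; the additive homomorphism a_E + b_E = (a + b)_E survives the encoding because addition commutes with the mod-V fixed-point map.

```python
def encode(r: float, d: int, V: int) -> int:
    """Map real r to a fixed-point residue: v = round(r * 2^d) mod V.
    Negative values wrap into the top half of the ring (a sign in disguise)."""
    return round(r * (1 << d)) % V

def decode(v: int, d: int, V: int) -> float:
    """Undo encode: residues above V/2 are negative; divide out the 2^d factor."""
    if v > V // 2:
        v -= V
    return v / (1 << d)

d, V = 16, 1 << 64            # d fractional bits; ring size comfortably above 2^(H+2d)
a, b = 3.25, -1.5
# Adding encodings (as Paillier would do under encryption) decodes to a + b.
s = (encode(a, d, V) + encode(b, d, V)) % V
```

After a multiplication the result carries an extra 2^d factor, which is why the text removes 1/2^d at decoding time and why the modulus must leave 2d bits of headroom.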
Further, the optimal loss function in step S5 of the present invention is defined in terms of the local losses, where L(w_i) denotes the loss of E_i's network.
The invention has the following beneficial effects:
(1) A federated learning model for multi-party broadcast secure computation (MePC-F) is presented. The model combines the MePC and PreFLa algorithms to address both the security of federated-learning training data and the communication overhead in the Internet of Vehicles. By combining the advantages of homomorphic encryption and secure multi-party computation, data leakage between terminals is prevented and the degree to which original data can be reconstructed after an attack is reduced, realizing data privacy protection to the greatest extent.

(2) A secure broadcast multi-party computation, MePC, is presented. For secure multi-party computation, sharing only the gradient information of the first layer greatly reduces both the risk of data recovery and the traffic. During sharing, the edge servers take their respective portions through decoding functions in a broadcast manner, reducing the time complexity from O(n^2) to O(n) and thus the communication overhead, while still preventing leakage of the original data.

(3) A weight-ratio-based federated learning algorithm, PreFLa, is proposed. PreFLa finds the optimal gradient weight ratios for aggregating the global parameters, and the reward function is designed from the accuracy difference of each edge server, so that the action selection with the maximum overall return yields the weight ratios of each federated round. An L2 regularization term is added to the loss function to promote edge-server cooperation and to reduce the latency and performance problems caused by data heterogeneity, so the global model generalizes better and convergence is accelerated.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a MePC-F model of an embodiment of the present invention;
FIG. 2 is a flow chart of a MePC-F model of an embodiment of the present invention;
FIG. 3 is a MePC algorithm of an embodiment of the present invention;
FIG. 4 shows DLG results on MNIST according to the embodiment of the present invention when the first hidden layer is hidden and when it is not: (a) FL; (b) MePC-F; (c) PeMPC; (d) Gaussian; (e) Laplacian;

FIG. 5 shows the performance of DLG on MNIST when the gradient of the first hidden layer is replaced by four methods (Gaussian distribution, Laplacian distribution, PeMPC and MePC-F);

FIG. 6 shows the average accuracy and loss on non-IID MNIST data of an embodiment of the present invention;

FIG. 7 shows the average accuracy and loss on non-IID CIFAR-10 data of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The parameters involved in the examples of the invention are described below:
TABLE 1 description of the parameters
where E_i denotes the current edge server, E_j denotes an edge server other than the current one, and E_s denotes all edge servers.
The real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles comprises the following steps:

S1. Construct a plurality of edge servers E_i and a cloud server CS; acquire the vehicle data D = {D_1, D_2, …, D_n}, with each edge server E_i acquiring its corresponding vehicle data D_i;

S2. In the k-th round of the federated task, edge server E_i downloads the encrypted initial class-A gradient from the cloud server CS, decrypts it, and randomly initializes its class-B gradient; E_i computes gradients in local network model training according to its vehicle data D_i and records the gradient information obtained after completing T rounds of local training;

S3. Edge server E_i uses its decoding function to extract the partial gradient information to be retained locally, homomorphically encrypts the remaining gradient information, and broadcasts it to all other edge servers E_j through the MePC algorithm; E_i likewise obtains, according to the decoding functions, the corresponding partial gradient information from the other edge servers E_j; after all edge servers have updated and shared, each holds its updated class-A gradient information, i ∈ [1, n], with n the total number of edge servers;

S4. All edge servers upload their class-A gradients to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; PreFLa maximizes the reward through reinforcement learning to select each edge server E_i's optimal parameter weight ratio a_{i,k}, and the global parameter is aggregated according to a_{i,k}; the uploading and downloading of parameters proceed in parallel, and all parameters are encrypted with HE;

S5. Repeat steps S2 to S4 until a termination condition is reached, completing the whole training process. The termination condition may be a maximum number of training rounds, convergence of the loss function, or another user-defined condition. Finally, the optimal loss function can be obtained according to equation (1), where L(w_i) denotes the loss of E_i's network.
The specific method of local training is as follows:
in the local model phase, a Deep Neural Network (DNN) is employed to learn the cloud model and the ES model. DNN performs end-to-end feature learning and classifier training by taking different user data as raw inputs. Random gradient descent will be used as a subroutine in the proposed algorithm to minimize the loss value in each local training.
In a downstream communication phase E i At k (k E [1,K)]) Downloading base layer parameters from CS in round-robin communicationAnd randomly initializingWhere K represents the total number of rounds of the federated task. If the task is the first federal task, the CS initializes randomlyBefore local training, E i It is necessary to use the pair of homomorphic encryptions (formula (4)) forIs decrypted intoAnd is marked as
In order to better embody the model personalization, the loss function of the local model is set as follows:
L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})²   (16)
where l () represents the loss of the network, e.g., the cross-entropy loss of the classification task. The second term is an L2 regularization term, which can not only keep the individuation capability of the second term, but also improve the cooperation efficiency with other participants. λ is the regularization coefficient.
E_i initializes G_k and updates the model's weight parameters w_i, continuing local model training as follows:
w_i = w_i − ηG_k   (17)
where η is the learning rate and G_k is the general expression for the class-A and class-B gradients; here the class-B gradient is randomly initialized.
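The local update of Eqs. (16)-(17) can be sketched as follows. This is a minimal illustration with scalar weights, assuming the L2 term penalizes the distance between the current weights and the previous round's weights; the function name `local_step` and all parameter values are illustrative, not from the source.

```python
# Sketch of one local training step from Eqs. (16)-(17), assuming the term
# lambda*(w_{i,t} - w_{i,t+1})^2 penalizes drift from the previous weights.
# Scalar weights and all constants are illustrative assumptions.

def local_step(w, grad_task, w_prev, lam=0.1, eta=0.01):
    # G_k combines the task-loss gradient with the regularizer's gradient
    # d/dw [lam * (w - w_prev)^2] = 2 * lam * (w - w_prev).
    g = grad_task + 2.0 * lam * (w - w_prev)
    # Eq. (17): w_i = w_i - eta * G_k
    return w - eta * g

w_new = local_step(w=1.0, grad_task=0.5, w_prev=0.8)
print(round(w_new, 4))  # 0.9946
```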
After E_i completes T rounds of local training, it obtains the accuracy acc_{i,k} of its local model together with the updated class-A and class-B gradients. Direct sharing of user information between terminals is prohibited, and data in the edge servers must be encrypted before communication so that it cannot be attacked in transit. This process uses HE to avoid information leakage. The additive HE over real numbers is shown below. The encryption schemes for the weight matrix and the bias vector follow the same idea; the additive homomorphic encryption of a real number a is denoted a^E. In additive homomorphic encryption, for any two numbers a and b, a^E + b^E = (a + b)^E. The method of converting an arbitrary real number r into an encoded rational fixed-point number v is:
Consider that each encoded real number r in the gradient can be represented as a rational H-bit number consisting of one sign bit, z integer bits, and d fractional bits. Thus each encodable rational number is defined by its H = 1 + z + d bits. The encoding is designed to allow multiplication, which requires the working modulus to be at least H + 2d bits to avoid overflow.
The decoding is defined as:
Multiplying two of these encoded numbers introduces an extra scaling factor of 2^d that must be divided out. When Paillier additive encryption is used, the encoded multiplication can be computed exactly, but only one homomorphic multiplication can be guaranteed. For simplicity, the rescaling is handled at decoding time.
This is correct provided that at most one encoded multiplication has taken place. Since the largest encryptable integer is V − 1, the largest encodable real number must take this into account. Therefore, the integer width z and the fractional width d must be chosen such that:
V ≥ 2^(H+2d) ≥ 2^(1+z+3d)   (5)
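The fixed-point encoding and its additive homomorphism can be illustrated with the following sketch. The widths z and d and the modulus V are illustrative choices that satisfy Eq. (5); this is a plaintext demonstration of the encoding only, not the patent's Paillier implementation.

```python
# Sketch of the fixed-point encoding used before additive HE (e.g. Paillier),
# under the convention H = 1 + z + d bits (sign, integer, fraction).
# The parameters z, d and the modulus V are illustrative assumptions.

def encode(r: float, d: int, V: int) -> int:
    """Encode real r as a fixed-point integer modulo V (negatives wrap)."""
    v = round(r * (1 << d))        # shift by d fractional bits
    return v % V                   # negatives map to the top of the ring

def decode(v: int, d: int, V: int) -> float:
    """Decode, interpreting values above V//2 as negative."""
    if v > V // 2:
        v -= V
    return v / (1 << d)

z, d = 16, 16                      # assumed integer/fraction widths
H = 1 + z + d
V = 1 << (H + 2 * d)               # satisfies V >= 2**(H+2d) from Eq. (5)

a, b = 3.25, -1.5
# Additive homomorphism over the encodings: enc(a) + enc(b) decodes to a + b.
s = (encode(a, d, V) + encode(b, d, V)) % V
print(decode(s, d, V))             # 1.75
```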
the specific method of the MePC algorithm is shown in FIG. 3.
In the k-th federated task, the base-layer gradients are exchanged using MePC. To avoid the risk of the data being cracked, a random ratio χ of the gradient is retained in each network, and the same random ratio χ is kept within the same federated round. Across different rounds of the federated task, the random ratio χ (χ ∈ [1,1/n]) varies. The remaining gradient is divided equally into n − 1 portions. As shown in FIG. 3, the gradient values are divided as follows:
Only the retained portion stays at E_i; the other portions and the random parameter χ are broadcast to the other ESs in ciphertext form. In this way, even if part of the transmitted content is attacked, the original data will not leak. In particular, an attacker who wants to obtain the data must acquire all of the portions. However, the portions and χ remain in ciphertext form, protected by homomorphic encryption, throughout the communication between participant E_i and receiver E_j.
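A toy sketch of the share-splitting idea follows, in plaintext only (HE omitted). The function name, the masking-by-index scheme, and the reconstruction check are illustrative assumptions; the patent defines the split via binary decoding functions over encrypted packets.

```python
import random

def split_gradient(grad, n, chi, rng=random.Random(0)):
    """Partition gradient indices: a fraction chi is kept locally, the rest is
    split evenly into n-1 disjoint shares for the other edge servers."""
    idx = list(range(len(grad)))
    rng.shuffle(idx)
    keep_cnt = max(1, int(chi * len(grad)))
    kept = set(idx[:keep_cnt])
    rest = idx[keep_cnt:]
    shares = [set(rest[j::n - 1]) for j in range(n - 1)]
    # Each share is the gradient masked to its index set (zeros elsewhere).
    masked = lambda s: [g if i in s else 0.0 for i, g in enumerate(grad)]
    return masked(kept), [masked(s) for s in shares]

grad = [0.5, -1.0, 2.0, 0.25, 3.0, -0.75]
kept, shares = split_gradient(grad, n=3, chi=1/3)
# Summing the retained part and all shares reconstructs the full gradient.
total = [sum(parts) for parts in zip(kept, *shares)]
print(total == grad)  # True
```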
When E_i receives a data packet sent by another server, it performs data verification locally. Specifically, it uses the corresponding "multiplication" method for verification. Each edge server designs two decoding functions by itself, as follows:
L_0 = χ·L   (9)
It is required that the decoding functions of all ESs, applied to the same data packet, yield all 0s under the bitwise AND operation and all 1s under the bitwise OR operation, that is:
First, the decoding function is initialized as follows,
Note that at initialization, different E_i use the same decoding function for the data packets transmitted within the same federated task.
The data packet is multiplied by the corresponding decoding function in each of the other servers. Since the binary 0 bits multiply to 0, E_i can guarantee that only its own partial data packet is obtained. Where the binary bit is 1, the ciphertext of the gradient information at the corresponding position is obtained as follows:
E i adding all data packet arrays obtained from other ES to corresponding positions to obtain all ciphertext data, and updating the ciphertext data into the final ciphertext dataNamely, it is
Each time a secure multiparty computation is performed, as k increases, each E_i's decoding function is cyclically shifted left by m units. This keeps the sharing dynamic and allows the shares to be divided equally among E_1, E_2, …, E_n with no repetition of data between the parts.
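The cyclic left shift of the decoding mask can be sketched as follows; the binary-list representation and the name `rotate_mask` are illustrative.

```python
def rotate_mask(mask, m):
    """Cyclically shift a binary decoding mask left by m positions each
    federated round, so the retained/shared index sets change over time."""
    m %= len(mask)
    return mask[m:] + mask[:m]

mask = [1, 1, 0, 0, 0, 0]          # 1 = positions E_i keeps, 0 = shared
print(rotate_mask(mask, 2))        # [0, 0, 0, 0, 1, 1]
```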
The specific method of the PreFla algorithm is as follows:
Data in the Internet of Vehicles is dispersed, unbalanced, and heterogeneous, making it difficult to improve personalized service while meeting real-time requirements. To prevent user privacy from leaking during communication between different edge servers, HE is used to encrypt the parameters in transit. To better realize personalized training for different user data, the first layer is set as the base layer and trained cooperatively using the existing federated learning method, while the other layers serve as personalization layers trained locally, so that the personal information of different ES devices can be captured. In this way, after the joint training process, the globally shared base layer can be transferred into each ES to build its own personalized deep learning model using its unique personalization layers. Only the base-layer parameters are downloaded from the CS; the personalization-layer parameters are randomly generated and fine-tuned using local data. To meet the real-time requirement and realize ES personalization, PreFLa adopts reinforcement learning (RL) to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters.
In the uplink communication phase, each ES not only trains its local model but also uploads its local parameters to the CS for joint aggregation. After executing the MePC algorithm in the k-th federated round, E_i uploads its parameters to the CS over a TLS/SSL secure channel. In the aggregation stage, because each ES's data distribution is unbalanced and heterogeneous, the model parameters used for aggregation have a crucial influence on the convergence speed of this stage. Therefore, participant E_i's parameter weight ratio a_{i,k} must be considered in the k-th round of federated aggregation.
In the invention, DQN-based reinforcement learning is used to predict the parameter weight ratio; information is stored through a Q function instead of the table storage of Q-Learning, to avoid the curse of dimensionality. To better realize model personalization and reduce the waiting time for uploading weights in MePC-F, DQN is used to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters when updating the CS. The reinforcement learning setting contains the state, action, reward function, and feedback, defined as follows:
State: the state of the k-th round, where the accuracy difference term is expressed as:
Action: the parameter weight ratio a_{i,k} represents the action of the k-th round of federated tasks. To avoid becoming trapped in a local optimum, an ε-greedy algorithm is adopted to optimize the action-selection process, yielding a_{i,k}:
where P is a set of weight permutations and rand is a random number (rand ∈ [0,1]); Q(s_{i,k}, a_{i,k}) denotes the cumulative discounted return when the agent takes action a_{i,k} in state s_{i,k}. Once DQN is trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ [P]} for all actions in the k-th round. Each action value represents the maximum expected return the agent can achieve by selecting a particular action a_{i,k} in state s_{i,k}.
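The ε-greedy selection over the candidate weight ratios can be sketched as follows; the Q-values and the helper name `epsilon_greedy` are illustrative assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random.Random(0)):
    """epsilon-greedy action selection over the candidate set P:
    explore with probability epsilon, otherwise pick the argmax action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

q = [0.1, 0.7, 0.3]                 # Q(s_{i,k}, a) for each candidate a_{i,k}
print(epsilon_greedy(q, epsilon=0.0))  # 1 (greedy choice)
```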
Reward: the observed reward at the end of the k-th federated round is set as:
where the base is a positive number ensuring that r_k grows exponentially with the test-accuracy change Δacc_{i,k}. The first term incentivizes the agent to select devices that achieve higher test accuracy, and it controls how r_k changes as Δacc_{i,k} increases. In general, as machine-learning training progresses, model accuracy increases more slowly. However, in the federated cooperative task, model accuracy may decrease due to data-distribution imbalance and heterogeneity. Thus, as FL enters its late stage, an exponential term is used to amplify the marginal accuracy increase. The second term, −1, encourages the agent to improve model accuracy, because when Δacc_{i,k} < 0 we have r_k ∈ (−1, 0).
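The reward can be sketched with the closed form r_k = ξ^Δacc − 1, which is an assumption consistent with the stated properties (exponential growth in Δacc_{i,k}; r_k ∈ (−1, 0) when Δacc_{i,k} < 0); ξ = 3 follows the positive constant used in the experiments section.

```python
# Hedged sketch of the round-k reward: r_k = xi**delta_acc - 1, where xi > 1
# is the positive base (3 in the experiments) and delta_acc is the
# round-over-round test-accuracy change. The exact closed form is assumed.

def reward(delta_acc: float, xi: float = 3.0) -> float:
    return xi ** delta_acc - 1.0

print(reward(0.0))                 # 0.0  (no accuracy change, no reward)
print(-1.0 < reward(-0.2) < 0.0)   # True (penalty bounded in (-1, 0))
```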
The DQN agent is trained to maximize the expectation of the cumulative discounted reward, as shown in:
where γ ∈ (0,1] is a factor that discounts future rewards.
After obtaining r_k, the CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of federated tasks. The optimal action-value function Q(s_{i,k}, a_{i,k}) is the quantity sought by the RL agent, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:
Q(s_{i,k}, a_{i,k}) = E(r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a') | s_{i,k}, a_{i,k})   (22)
A parameterized value function Q(s_{i,k}, a_{i,k}; w_k) can then be learned using function-approximation techniques to approximate the optimal function Q(s_{i,k}, a_{i,k}). Here r_k + γ max_{a'} Q(s_{i,k+1}, a'; w_k) is the learning target for Q(s_{i,k}, a_{i,k}; w_k). Typically, a DNN is used as the function approximator. The RL learning problem then becomes minimizing the MSE loss between the target and the approximator, defined as:
l(w_k) = (r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a'; w_k) − Q(s_{i,k}, a_{i,k}; w_k))²   (23)
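The MSE objective of Eq. (23) reduces to a squared temporal-difference error; the following is a minimal sketch with scalar Q-values, all names and numbers illustrative.

```python
def td_loss(q_sa, r, q_next_max, gamma=0.9):
    """Squared TD error of Eq. (23): target = r + gamma * max_a' Q(s', a')."""
    target = r + gamma * q_next_max
    return (target - q_sa) ** 2

# Example: Q(s,a)=0.5, reward 1.0, best next-state value 0.8, gamma=0.9.
print(round(td_loss(q_sa=0.5, r=1.0, q_next_max=0.8), 4))  # 1.4884
```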
The CS updates the global parameter w_k as:
where η ≥ 0 is the step size.
The CS repeats the above steps to obtain the best learning model. The CS then obtains the k-th round weight-ratio sequence a_{i,k}, and the global parameters are updated as:
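The weighted global aggregation using the RL-selected ratios a_{i,k} can be sketched as follows, assuming the weights sum to 1; the function name and the toy gradients are illustrative.

```python
def aggregate(local_grads, weights):
    """Weighted aggregation of the edge servers' base-layer gradients using
    the RL-selected weight ratios a_{i,k} (assumed to sum to 1)."""
    dim = len(local_grads[0])
    return [sum(w * g[j] for w, g in zip(weights, local_grads))
            for j in range(dim)]

grads = [[1.0, 2.0], [3.0, 4.0]]   # decrypted base-layer gradients from 2 ESs
a = [0.25, 0.75]                   # a_{i,k} selected by PreFLa
print(aggregate(grads, a))         # [2.5, 3.5]
```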
Experimental test examples:
To verify the validity of the proposed mechanism, experimental results and analysis are given. Consider a system with 1 cloud server and 10 edge servers. The experimental learning rate α is 0.01 and the discount factor γ is 0.9. The positive base in the reward is set to 3. The parameter values are shown in Table 2.
TABLE 2 parameter settings
The validity of the proposed model was verified on two data sets: MNIST and CIFAR-10. The performance of the proposed federated learning model MePC-F was evaluated in terms of DLG-reconstructed images, average accuracy, and average loss. First, the five schemes' performance in defending against DLG attacks is evaluated; then the proposed federated learning model MePC-F is compared with the centralized scheme and PeMPC. All results below are the mean of 1000 independent experiments.
1) Performance against DLG attacks
This section evaluates the effectiveness of MePC-F against DLG image reconstruction and compares it with the FL, PeMPC, and DP algorithms (Gaussian- and Laplace-distributed noise). The common gradient of the network is computed for a single image on the MNIST dataset; the results of the different schemes are shown in FIG. 4. Since the study [17] indicates that hiding the gradient of the first layer can reduce reconstruction of the data, the gradient of the first layer (weight and bias terms) is replaced by four methods: the proposed MePC-F, PeMPC, Gaussian-distributed noise (μ = 0, σ = 1), and Laplace-distributed noise (μ = 0, σ = 1), to observe the behavior of DLG. After the first layer's gradients are hidden, DLG uses the remaining gradients to attempt to recover the image that produced the common shared gradient.
As can be seen from FIG. 4, without any method to hide the first layer's gradient (FL in FIG. 4(a)), the DLG process can accurately reconstruct the training data. When the first layer's gradient is protected by the proposed method, MePC-F, information leakage is effectively prevented (FIG. 4(b)): even when the number of iteration steps reaches 500, DLG still cannot construct an image. FIG. 4(c) shows results similar to FIG. 4(b); PeMPC can also defend against the DLG attack. As seen in FIG. 4(d), when Gaussian noise is added to the first layer, the reconstructed image becomes partially visible from round 15 to round 20, at which point the basic contour of the original image has been constructed; as the number of iteration rounds increases to 500, the image can be restored clearly. The Laplacian noise in FIG. 4(e) exhibits a similar phenomenon to the Gaussian noise.
As can be seen from FIG. 5, if a malicious server receives the gradients of all hidden layers as plaintext, the reconstruction process attains the lowest gradient loss and image MSE (green line in FIG. 5). As the number of rounds increases, PeMPC and MePC-F do not converge to zero, and the image MSE reaches 10^7. Adding Laplacian or Gaussian noise to the original gradient converges to 10^-5; FIG. 4 likewise demonstrates that the data can be reconstructed by around round 20. The larger the image MSE, the less likely the image is to be reconstructed.
Based on the above experimental results, it is verified that adding Laplacian or Gaussian noise to the original gradient can prevent early partial gradient leakage, but as the number of rounds increases, the original data is still recovered through deep leakage. By contrast, PeMPC and MePC-F are effective methods for preventing DLG attacks from reconstructing the raw data, regardless of the number of training rounds.
2) Performance comparison of average accuracy and average penalty
In this section, the effectiveness of MePC-F is evaluated and compared with the centralized scheme and PeMPC in terms of average accuracy and average loss on the MNIST and CIFAR-10 data sets.
FIG. 6(a) shows the number of rounds the models require to achieve 98% accuracy on the MNIST dataset. The average accuracy of all three methods increases with the number of training rounds. The centralized approach requires 25 rounds to reach the target accuracy on MNIST, PeMPC requires 140 rounds, and MePC-F requires 40 rounds; MePC-F thus needs 71.2% fewer training rounds than PeMPC. The reason is that the proposed reinforced federated learning algorithm PreFLa can find better aggregation parameter weights a_{i,k} through interaction with the environment, handle non-IID data better, accelerate model convergence, and reach the target accuracy sooner. The centralized scheme is trained on all the data combined, so its accuracy is higher than that of the federated learning algorithms. But the figure shows that PeMPC's convergence can almost reach centralized accuracy.
FIG. 6(b) shows that the average loss of all three schemes decreases as the number of training rounds increases. For the centralized scheme, the average loss decreases from 0.233 to 0.052. The average loss of PeMPC decreases from 0.35 to 0.084. Meanwhile, the average loss of the proposed MePC-F decreases to 0.06, which is 28.6% lower than PeMPC's. The proposed MePC-F almost reaches the centralized loss value when the number of training rounds reaches 100.
FIG. 7(a) shows the number of rounds required for the models to achieve the 50% target accuracy on CIFAR-10, with results similar to FIG. 6(a). The average accuracy of all three models increases until the target value is reached. For the centralized scheme, the average accuracy increases from 0.42 to 0.5 over 23 rounds. The average accuracy of PeMPC increases from 0.372 to 0.5 over 89 rounds. Meanwhile, the proposed MePC-F reaches the target accuracy at 41 rounds, 53.9% fewer than PeMPC. FIG. 7(a) shows that MePC-F updates the global model with better weights a_{i,k} than PeMPC, which results in faster convergence.
As can be seen from FIG. 7(b), the average loss of the three schemes decreases until a stable value is reached. The centralized scheme, MePC-F, and PeMPC reach the minimum loss value in that order, so the time efficiency of the proposed MePC-F is better than PeMPC's.
TABLE 3 Top accuracy of the three schemes within 100 rounds

| | MNIST | CIFAR-10 |
|---|---|---|
| centralized | 98.4% | 51.4% |
| MePC-F | 98.2% | 51.1% |
| PeMPC | 97.6% | 49.2% |
Table 3 gives the top accuracy of the three schemes within 100 rounds. For the MNIST data, the average accuracy of the proposed MePC-F is 98.2%, 0.6% higher than PeMPC's, while PeMPC's accuracy almost reaches that of centralized training. For the CIFAR-10 data, the average accuracy of MePC-F reaches 0.511 at 100 rounds, 1.9% higher than PeMPC's. This shows that MePC-F aggregates the global parameters with better, optimally updated weights a_{i,k} than PeMPC, resulting in higher accuracy that is closer to the centralized accuracy.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (8)
1. A real-time reinforced Federal learning data privacy security method based on a MePC-F model in the Internet of vehicles is characterized by comprising the following steps:
S1, constructing a plurality of edge servers E_i and a cloud server CS; acquiring vehicle data D = {D_1, D_2, …, D_i}, with edge server E_i acquiring the corresponding vehicle data D_i;
S2, in the k-th round of the federated task, edge server E_i downloads the initial class-A gradient from the cloud server CS and decrypts it, and randomly initializes the class-B gradient; edge server E_i computes gradients in local network model training according to its vehicle data D_i, and records the gradient information after completing T rounds of local training;
S3, edge server E_i uses its decoding function to extract from the shared gradient the partial gradient information to be retained locally, homomorphically encrypts the remaining gradient information, and broadcasts it to all other edge servers E_j through the MePC algorithm; edge server E_i obtains, according to the decoding functions, the corresponding partial gradient information from the other edge servers E_j; the class-A gradient information after all edge servers have updated and shared is obtained for each i ∈ [1,n], where n is the total number of edge servers;
S4, all edge servers upload the shared class-A gradient information to the cloud server CS; the cloud server CS aggregates the global parameters through the PreFLa algorithm, which uses reinforcement learning to maximize the reward and thereby selects edge server E_i's optimal parameter weight ratio a_{i,k}; the global gradient parameters are aggregated according to a_{i,k}; the uploading and downloading of parameters proceed in parallel, and all parameters are encrypted with HE;
and S5, repeating steps S2 to S4 until a termination condition is reached; the cloud server CS computes the final global gradient parameters and issues them to each edge server; each edge server extracts the accuracy and optimal loss function of the MePC-F model according to the characteristics of the multiple vehicle data, obtaining the trained MePC-F model; the whole training process is completed, and the model is output in real time to the corresponding Internet-of-Vehicles service.
2. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 1, wherein in the step S2, the specific method for training the local network model is as follows:
employing a deep neural network (DNN) model, the DNN performing end-to-end feature learning and classifier training by taking different vehicle data as raw input, and using stochastic gradient descent as a subroutine to minimize the loss value in each local training round;
E_i downloads the base-layer parameters from the cloud server CS in the k-th round of communication, i.e., the initial class-A gradient before decryption, and decrypts it into the class-A gradient; it randomly initializes the class-B gradient, where k ∈ [1,K] and K represents the total number of rounds of the federated task; if the task is the first round of the federated task, the CS randomly initializes the class-A gradient; before local training, E_i decrypts it using the homomorphic encryption pair, the result being recorded accordingly;
The loss function of the local model is set as follows:
L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})²
where l(·) represents the loss of the network, the second term is the L2 regularization term, and λ is the regularization coefficient; w_i represents the total weight information of the local model, w_{i,t} is the weight information of the local model at time t, and w_{i,t+1} is the weight information of the local model at time t + 1;
E_i updates G_k and replaces the model's weight parameters w_i, continuing local model training by minimizing the loss function as follows:
w_i = w_i − ηG_k
where η is the learning rate and G_k is the general expression for the class-A and class-B gradients; here the class-B gradient is randomly initialized;
3. the real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein in the step S3, the specific method of the MePC algorithm is as follows:
in the k-th federated task, all edge servers use MePC to exchange the base-layer gradients; the exchanged quantities denote, respectively, the class-A encrypted data of the n-th edge server in the k-th round of the federated task, the class-A encrypted data of the i-th edge server in the k-th round of the federated task, the class-A encrypted data that the i-th edge server in the k-th round broadcasts to the other edge servers, and the encrypted data that remains after removing the portion retained by the server itself;
to avoid the risk of the data being cracked, a random ratio χ of the gradient is retained in each network, and the same random ratio χ is kept within the same federated round; the retained portion is then encrypted; the random ratio χ varies across different rounds of the federated task, with χ ∈ [1,1/n]; the remaining gradient is homomorphically encrypted and divided into n − 1 portions, the values being divided as follows:
only the retained portion stays at E_i; the other portions and the random parameter χ are broadcast to the other E_j in ciphertext form; in this way, even if part of the transmitted content is attacked, the original data will not leak;
4. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 3, wherein in the step S3, the specific method for locally performing data verification is as follows:
in the k-th round of federated tasks, verification is performed using the corresponding "multiplication" method, and each edge server designs two decoding functions by itself, as follows:
where L_0 and L' denote the respective lengths of the corresponding gradient portions; the subscript k of the decoding function denotes the decoding function in the k-th round of the federated task;
L_0 = χ·L
it is required that the decoding functions of all edge servers, applied to the same data packet, yield all 0s under the bitwise AND operation and all 1s under the bitwise OR operation, namely:
first, the initial decoding function is as follows:
the data packet is multiplied by the corresponding decoding functions in the other servers; since the binary 0 bits multiply to 0, E_i ensures that only its own partial data packet is obtained; where the binary bit is 1, the ciphertext of the gradient information at the corresponding position is obtained as follows:
E_i adds all data-packet arrays obtained from the other edge servers E_j at the corresponding positions to obtain the complete ciphertext data and updates it as the final ciphertext, namely:
5. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein in the step S4, the specific method of the PreFla algorithm is as follows:
PreFLa adopts reinforcement learning (RL) to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters;
in the uplink communication phase, each edge server not only trains its local model but also uploads the local parameters to the cloud server CS for joint aggregation; after executing the MePC algorithm in the k-th federated round, E_i uploads its parameters to the CS over a TLS/SSL secure channel; in the aggregation stage, because each ES's data distribution is unbalanced and heterogeneous, the model parameters used for aggregation have a crucial influence on the convergence speed of the aggregation stage; therefore, participant E_i's parameter weight ratio a_{i,k} must be considered in the k-th round of federated aggregation;
DQN-based reinforcement learning is used to predict the parameter weight ratio, storing information through a Q function to avoid the curse of dimensionality; to better realize model personalization and reduce the waiting time for uploading weights in MePC-F, DQN is used to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters when updating the CS; the reinforcement learning comprises: state, action, reward function, and feedback.
6. The real-time reinforced federal learning data privacy security method in internet of vehicles based on the MePC-F model as claimed in claim 5, wherein in the step S4, the specific methods of status, action, reward function and feedback are as follows:
the state: the state of the k-th round, where the accuracy difference term is expressed as:
the action: the parameter weight ratio a_{i,k} represents the action of the k-th round of federated tasks; to avoid becoming trapped in a local optimum, an ε-greedy algorithm is adopted to optimize the action-selection process, obtaining a_{i,k}:
where P is a set of weight permutations and rand is a random number, rand ∈ [0,1]; Q(s_{i,k}, a_{i,k}) denotes the cumulative discounted return when the agent takes action a_{i,k} in state s_{i,k}; once DQN is trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ [P]} for all actions in the k-th round; each action value represents the maximum expected return the agent obtains by selecting a particular action a_{i,k} in state s_{i,k};
reward: the observed reward at the end of the k-th federated round is set as:
where the base is a positive number ensuring that r_k grows exponentially with the training-accuracy change Δacc_{i,k}; the first term incentivizes the agent to select devices that achieve higher test accuracy, and it controls how r_k changes as Δacc_{i,k} increases; when Δacc_{i,k} < 0, r_k ∈ (−1, 0);
Training the DQN agent to maximize the expectation of a cumulative discount reward, as shown by:
where γ ∈ (0,1] is a factor that discounts future rewards;
after obtaining r_k, the cloud server CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of federated tasks; the optimal action-value function Q(s_{i,k}, a_{i,k}) is the quantity sought by the RL agent, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:
Q(s_{i,k}, a_{i,k}) = E(r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a') | s_{i,k}, a_{i,k})
a parameterized value function Q(s_{i,k}, a_{i,k}; w_k) is learned using function-approximation techniques to approximate the optimal function Q(s_{i,k}, a_{i,k}); r_k + γ max_{a'} Q(s_{i,k+1}, a') is the learning target of Q(s_{i,k}, a_{i,k}; w_k); a DNN is used as the function approximator; the RL learning problem becomes minimizing the MSE loss between the target and the approximator, defined as:
l(w_k) = (r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a'; w_k) − Q(s_{i,k}, a_{i,k}; w_k))²
the CS updates the global parameter w_k as:
where η ≥ 0 is the step size;
after the cloud server CS obtains the optimal learning model, it obtains the k-th round weight-ratio sequence a_{i,k}, and the global parameters are updated as:
7. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein the HE encryption method in the method specifically comprises the following steps:
the encryption schemes for the weight matrix and the bias vector follow the same idea, and the additive homomorphic encryption of a real number a is denoted a^E; in additive homomorphic encryption, for any two numbers a and b, a^E + b^E = (a + b)^E; the method of converting an arbitrary real number r into an encoded rational fixed-point number v is:
each encoded real number r in the gradient can be represented as a rational H-bit number consisting of one sign bit, z integer bits, and d fractional bits; thus each encodable rational number is defined by its H = 1 + z + d bits; the encoding is performed to allow multiplication, which requires the working modulus to be at least H + 2d bits to avoid overflow;
the decoding is defined as:
multiplying these encoded numbers introduces an extra scaling factor of 2^d that must be divided out; when Paillier additive encryption is used, the encoded multiplication can be computed exactly, but only one homomorphic multiplication can be guaranteed; for simplicity, the rescaling is handled at decoding time;
the largest encryptable integer is V − 1, so the largest encodable real number must take this into account; therefore the integer width z and the fractional width d are chosen as follows:
V ≥ 2^(H+2d) ≥ 2^(1+z+3d).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210816716.3A CN115310121B (en) | 2022-07-12 | 2022-07-12 | Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210816716.3A CN115310121B (en) | 2022-07-12 | 2022-07-12 | Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115310121A true CN115310121A (en) | 2022-11-08 |
CN115310121B CN115310121B (en) | 2023-04-07 |
Family
ID=83857637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210816716.3A Active CN115310121B (en) | 2022-07-12 | 2022-07-12 | Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115310121B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115731424A (en) * | 2022-12-03 | 2023-03-03 | 北京邮电大学 | Image classification model training method and system based on enhanced federal domain generalization |
CN115860789A (en) * | 2023-03-02 | 2023-03-28 | 国网江西省电力有限公司信息通信分公司 | FRL-based CES day-ahead scheduling method |
CN117812564A (en) * | 2024-02-29 | 2024-04-02 | 湘江实验室 | Federal learning method, device, equipment and medium applied to Internet of vehicles |
CN117873402A (en) * | 2024-03-07 | 2024-04-12 | 南京邮电大学 | Collaborative edge cache optimization method based on asynchronous federal learning and perceptual clustering |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111160573A (en) * | 2020-04-01 | 2020-05-15 | 支付宝(杭州)信息技术有限公司 | Method and device for protecting business prediction model of data privacy joint training by two parties |
CN111611610A (en) * | 2020-04-12 | 2020-09-01 | 西安电子科技大学 | Federal learning information processing method, system, storage medium, program, and terminal |
CN112100295A (en) * | 2020-10-12 | 2020-12-18 | 平安科技(深圳)有限公司 | User data classification method, device, equipment and medium based on federal learning |
CN112199702A (en) * | 2020-10-16 | 2021-01-08 | 鹏城实验室 | Privacy protection method, storage medium and system based on federal learning |
CN112347500A (en) * | 2021-01-11 | 2021-02-09 | 腾讯科技(深圳)有限公司 | Machine learning method, device, system, equipment and storage medium of distributed system |
CN113037460A (en) * | 2021-03-03 | 2021-06-25 | 北京工业大学 | Federal learning privacy protection method based on homomorphic encryption and secret sharing |
CN113435472A (en) * | 2021-05-24 | 2021-09-24 | 西安电子科技大学 | Vehicle-mounted computing power network user demand prediction method, system, device and medium |
US20220129700A1 (en) * | 2020-10-27 | 2022-04-28 | Alipay (Hangzhou) Information Technology Co., Ltd. | Methods, apparatuses, and systems for updating service model based on privacy protection |
Non-Patent Citations (1)
Title |
---|
SUN Shuang: "A survey of federated learning security and privacy protection in different scenarios", 《计算机应用研究》 (Application Research of Computers) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||