CN115310121B - Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles - Google Patents


Info

Publication number
CN115310121B
CN115310121B (application CN202210816716.3A)
Authority
CN
China
Prior art keywords
model
data
mepc
federal
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210816716.3A
Other languages
Chinese (zh)
Other versions
CN115310121A (en)
Inventor
朱容波
李梦瑶
刘浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong Agricultural University
Original Assignee
Huazhong Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong Agricultural University filed Critical Huazhong Agricultural University
Priority to CN202210816716.3A priority Critical patent/CN115310121B/en
Publication of CN115310121A publication Critical patent/CN115310121A/en
Application granted granted Critical
Publication of CN115310121B publication Critical patent/CN115310121B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/062 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00 applying encryption of the keys
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a real-time reinforced federated learning data privacy security method based on a MePC-F model in the Internet of Vehicles, comprising the following steps: build multiple edge servers E_i and a cloud server CS; each edge server E_i downloads the encrypted initial class-A gradient [g^A_k] from the cloud server CS (square brackets denote a homomorphic ciphertext), decrypts it to g^A_k, randomly initializes a class-B gradient g^B_{i,k}, and carries out local model training; E_i obtains, through a decoding function, the partial gradient information g^A_{i,k,0} to be retained from its trained gradient g^A_{i,k}, homomorphically encrypts the remaining gradient information g^A_{i,k,r} as [g^A_{i,k,r}], and broadcasts it to all other edge servers E_j through the MePC algorithm; the class-A gradient information updated and shared by all edge servers is [g^A_{i,k}]; all edge servers upload [g^A_{i,k}] to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; the above steps are repeated until a termination condition is reached. The invention prevents data leakage between terminals, realizes data privacy protection, and reduces communication overhead while preventing leakage of the original data.

Description

Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles
Technical Field
The invention relates to the technical field of real-time security behavior analysis through cooperative processing by networked vehicle users, and in particular to a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles.
Background
With the development of the various real-time communications and services supported by the Internet of Vehicles, the volume of data generated by interconnected equipment such as on-board units is unprecedentedly large; this vehicle-user-oriented data is highly heterogeneous, and device computing capacities vary widely. Federated learning provides an effective solution for meeting the data-security requirements of real-time network-model training: different edge devices can cooperatively train a machine learning model without exposing their raw data.
Massive edge-computing data is closely tied to users' personal privacy; data such as a user's trajectory, credit card, and bills directly concern the user's privacy, and any data leakage brings great potential safety hazards. Federated learning can protect data to some extent, but the risk of information leakage still exists, in four types: 1) membership leakage; 2) unintended feature leakage; 3) leakage of class representatives of the original data; and 4) leakage of the original data itself. The last type of leakage is the least acceptable to privacy-sensitive participants.
To protect the data privacy of mobile users and address the leakage of raw data, researchers have conducted extensive research on cryptography-based data protection: differential privacy, homomorphic encryption, and secure multi-party computation. Differential privacy generally uses one of three noise-addition mechanisms: the Laplace mechanism, the Gaussian mechanism, and the exponential mechanism. Context information is perturbed by adding noise to protect data privacy, but if too much noise is added, the performance of model training suffers. Homomorphic encryption is commonly additive or multiplicative: research shows that under Paillier additive homomorphic encryption the noise doubles, while under ElGamal multiplicative homomorphic encryption the noise grows quadratically. To increase data availability and overcome the noise problem, researchers introduced bootstrapping, which reduces noise by setting an encryption/decryption threshold and allows a scheme to compute an unlimited number of operations; the noise problem can also be mitigated by batch processing, parallel homomorphic computation, or compression. Secure multi-party computation concerns multiple participants securely computing an agreed function without a trusted third party; its main purpose is to ensure that each party's private input remains independent and that no local data is leaked during the computation. Research has shown that secure multi-party computation can solve the gradient-leakage problem in federated learning, and that exchanging information only for the first hidden layer suffices to protect the data while preserving accuracy.
However, the information-interaction process is peer-to-peer, so the communication overhead is large.
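As an illustration of the noise-addition idea described above, the following is a minimal sketch of the Laplace mechanism; the function name and parameters follow generic differential-privacy conventions and are assumptions, not taken from the patent.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Perturb a query result with Laplace noise of scale sensitivity/epsilon;
    a smaller epsilon means stronger privacy but a noisier output."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)
```

As the text notes, increasing the noise (lowering epsilon) degrades the utility of the perturbed values for model training.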
Most cryptography-based data-security research is centralized and aims to reduce time overhead while protecting data: federated learning allows edge devices to co-train machine learning models without exposing raw data. Federated learning typically employs a parameter-server architecture in which the parameter server synchronizes the clients' local models. Training is usually synchronous: the central server sends the global model to multiple clients, and the clients, after training on their local data, return the updated models to the central server in lockstep. This can be slow because of stragglers. Global synchronization is very difficult, especially in a federated-learning scenario, because computing power and battery time are limited and availability and completion times vary from device to device. A new asynchronous joint-optimization algorithm has been proposed that solves a regularized local problem to ensure convergence, so that multiple devices and servers can cooperatively and efficiently train the model without revealing privacy.
Despite the extensive research on data security, most existing methods are limited to protecting the original data. How to design an effective federated learning algorithm that simultaneously meets the privacy and availability goals of mobile users' big data in the complex Internet-of-Vehicles space, and that prevents data from being recovered after gradient leakage while reducing communication overhead, remains an open problem.
First, data in federated learning is stored at the local nodes, which reduces the risk of raw-data leakage during transmission. But even though only gradient information is transmitted, the possibility remains that the original data can be recovered from it. Data interaction in secure multi-party computation spreads the data across multiple parties, reducing the possibility that a sample is recovered after gradient information leaks. However, in existing secure multi-party computation, every user sends information to every other user, i.e. a unicast pattern, which brings high time overhead. Finding a suitable scheme that reduces the risk of data being attacked and recovered while also reducing transmission delay is important for meeting vehicle users' data-security and real-time needs. Second, because the data and equipment of different edge servers differ, the training precision of the overall model must also be improved in a targeted manner during training. Global parameter aggregation in the typical synchronous federated-averaging mode is slow because of stragglers. While balancing communication and computation time, it is also important to guarantee global precision through personalized training of multiple models. However, most data-security-oriented federated learning algorithms rely on a synchronous aggregation algorithm, whose high latency challenges the real-time requirements of the Internet of Vehicles. A reinforcement-learning-based federated learning algorithm is therefore necessary to reduce delay, improve accuracy, and guarantee data security.
Disclosure of Invention
The invention aims to solve the technical problem of providing, in view of the defects of the prior art, a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles.
The technical scheme adopted by the invention to solve the technical problem is as follows:
The invention provides a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles, comprising the following steps:
S1, construct a plurality of edge servers E_i and a cloud server CS; acquire vehicle data D = {D_1, D_2, …, D_n}; each edge server E_i acquires its corresponding vehicle data D_i.
S2, in the k-th round of the federated task, edge server E_i downloads the encrypted initial class-A gradient [g^A_k] from the cloud server CS (square brackets denote a homomorphic ciphertext) and decrypts it to g^A_k; it randomly initializes a class-B gradient g^B_{i,k}. Edge server E_i computes gradients in local network-model training according to its vehicle data D_i, and records the gradient information g^A_{i,k} after completing T rounds of local training.
S3, edge server E_i obtains, through the decoding function f_{i,k}, the partial gradient information g^A_{i,k,0} to be retained from g^A_{i,k}, homomorphically encrypts the remaining gradient information g^A_{i,k,r} as [g^A_{i,k,r}], and broadcasts it to all other edge servers E_j through the MePC algorithm; edge server E_i obtains, through the decoding function f'_{i,k}, the corresponding partial gradient information [g^A_{j,k,i}] from the other edge servers E_j. The class-A gradient information updated and shared by all edge servers is [g^A_{i,k}], i ∈ [1, n], where n is the total number of edge servers.
S4, all edge servers upload [g^A_{i,k}] to the cloud server CS, and the cloud server CS aggregates the global parameters through the PreFLa algorithm: PreFLa selects the optimal parameter weight ratio a_{i,k} of each edge server E_i by maximizing the return through reinforcement learning, and the global gradient parameter [g^A_{k+1}] is aggregated according to a_{i,k}. The upload and download of the parameters proceed in parallel, and all parameters are encrypted with HE.
S5, steps S2 to S4 are repeated until a termination condition is reached; the cloud server CS then computes the final global gradient parameter and issues it to each edge server. Each edge server performs feature extraction on its vehicle data and computes the accuracy and the optimal loss function of the MePC-F model, obtaining the trained MePC-F model, completing the whole training process, and serving the corresponding Internet-of-Vehicles application in real time.
Further, in step S2 of the present invention, the specific method for training the local network model is as follows:
A deep neural network (DNN) model is adopted. The DNN performs end-to-end feature learning and classifier training by taking the different vehicle data as raw input, using stochastic gradient descent as a subroutine to minimize the loss value in each local training round.
In the k-th communication round, E_i downloads the base-layer parameters from the cloud server CS, i.e. the encrypted initial class-A gradient [g^A_k], decrypts it into the class-A gradient g^A_k, and randomly initializes a class-B gradient g^B_{i,k}, where k ∈ [1, K] and K is the total number of rounds of the federated task. If it is the first round of the federated task, the CS randomly initializes [g^A_1]. Before local training, E_i uses the homomorphic key to decrypt [g^A_k] into g^A_k, recorded as g^A_{i,k}.
The loss function of the local model is set as:
L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})²
where l(·) represents the loss of the network, the second term is an L2 regularization term, and λ is the regularization coefficient; w_i represents the total weight information of the local model, w_{i,t} the weight information of the local model at time t, and w_{i,t+1} the weight information at time t + 1.
E_i initializes G_k, the combination of g^A_{i,k} and the randomly initialized g^B_{i,k}, and substitutes it for the weight parameter w_i of the model; local model training continues by minimizing the loss function:
w_i = w_i − η G_k
where η is the learning rate.
After T rounds of local training, edge server E_i obtains the accuracy acc_{i,k} of its local model and the gradient g^A_{i,k}.
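A minimal sketch of the local update above, using a linear model in place of the DNN and reading the λ(w_{i,t} − w_{i,t+1})² term as a proximal penalty toward the weights at the start of the round (an interpretive simplification; all names and shapes are illustrative assumptions):

```python
import numpy as np

def local_train(w0, X, y, T=5, eta=0.01, lam=0.1):
    # One local round: T SGD steps minimizing l(w) + lam * ||w - w0||^2,
    # where w0 are the weights downloaded at the start of the round.
    w = w0.copy()
    grad = np.zeros_like(w)
    for _ in range(T):
        residual = X @ w - y                     # linear model stands in for the DNN
        grad_task = 2 * X.T @ residual / len(y)  # gradient of the task loss l(w)
        grad = grad_task + 2 * lam * (w - w0)    # add the L2 proximal gradient
        w -= eta * grad                          # SGD step with learning rate eta
    return w, grad                               # grad plays the role of g^A_{i,k}
```

The proximal term keeps each local model close to the round's shared starting point, which is what lets heterogeneous edge servers cooperate without drifting apart.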
further, in step S3 of the present invention, a specific method of the MePC algorithm is as follows:
in the k-th federal task, all edge servers use MePC to exchange base layer gradients
Figure BDA00037409514200000512
Wherein it is present>
Figure BDA00037409514200000513
Encrypted data in class A representing the nth edge server in the kth federated task, and +>
Figure BDA00037409514200000514
Encrypted data in class A representing the ith edge server in the kth federated task, and +>
Figure BDA00037409514200000515
Express the kth round of federationThe ith edge server broadcasts the A-type encrypted data sent to other edge servers in the task, and the encrypted data is judged to be on or off>
Figure BDA00037409514200000516
I.e. is->
Figure BDA00037409514200000517
The encrypted data reserved by the user is removed;
to avoid the risk of data being cracked, a random ratio χ is taken in each network
Figure BDA0003740951420000061
The gradient is then->
Figure BDA0003740951420000062
And keeping the random proportion χ of the same federal round the same, and then will ^ be ^ or ^ be ^ ed>
Figure BDA0003740951420000063
Encrypted is->
Figure BDA0003740951420000064
The random proportion χ varies across different rounds of federal mission, and χ ∈ [1,1/n ]];
Figure BDA0003740951420000065
The remaining gradient is ^ encrypted by homomorphism>
Figure BDA0003740951420000066
Is divided into n-1 parts
Figure BDA0003740951420000067
The values of (a) are divided into:
Figure BDA0003740951420000068
only is provided with
Figure BDA0003740951420000069
Is retained at E i In the method, other parts and the random parameter χ are broadcast and transmitted to other E in the form of ciphertext j (ii) a In this manner, even if portions of the transmitted content are attacked, the original data ≧ is>
Figure BDA00037409514200000610
The leakage is avoided;
sharing to other E j The gradient information of (A) is
Figure BDA00037409514200000611
Figure BDA00037409514200000612
When Ei receives data packet sent by other server
Figure BDA00037409514200000613
It performs data authentication locally.
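A sketch of the share-splitting step just described, under stated assumptions: the gradient is flattened to a vector of positions, encryption is elided (plaintext index sets stand in for ciphertext packets), and the helper name is hypothetical.

```python
import numpy as np

def split_gradient(grad_len, n, chi, rng):
    """Keep a random chi-fraction of the positions locally (g^A_{i,k,0})
    and split the remaining positions into n-1 shares to broadcast."""
    keep_len = int(chi * grad_len)           # L0 = chi * L positions retained
    idx = rng.permutation(grad_len)
    kept = np.sort(idx[:keep_len])
    shares = [np.sort(s) for s in np.array_split(idx[keep_len:], n - 1)]
    return kept, shares
```

With n servers and χ ≤ 1/n, each recipient holds roughly a 1/n fraction of E_i's gradient positions, so no single compromised server can reconstruct the full gradient.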
Further, in step S3 of the present invention, the specific method for performing data verification locally is as follows:
In the k-th round of the federated task, verification is performed using a corresponding "multiplication" method; each edge server designs two decoding functions: f_{i,k}, a binary mask of length L_0, where L_0 is the length of g^A_{i,k,0}; and f'_{i,k}, a binary mask of length L', where L' is the length of [g^A_{i,k,r}]. The subscript k of a decoding function denotes the decoding function in the k-th round of the federated task, and
L_0 = χ · L
where L is the length of g^A_{i,k}; the lengths of the broadcast packets [g^A_{j,k,r}] are equal for all j.
It is required that the decoding functions of all edge servers partition the same data packet: executing the 'intersection' (bitwise AND) operation over them yields all 0s, and executing the 'union' (bitwise OR) operation yields all 1s:
f'_{1,k} ∧ f'_{2,k} ∧ … ∧ f'_{n,k} = 00…0
f'_{1,k} ∨ f'_{2,k} ∨ … ∨ f'_{n,k} = 11…1
First, the initial decoding function assigns each edge server a contiguous block of bit positions. A data packet [g^A_{j,k,r}] is multiplied element-wise by the corresponding decoding function in each server; because the binary 0 bits multiply everything to 0, E_i is guaranteed to obtain only its own part of the packet. Where a binary bit of f'_{i,k} is 1, the ciphertext of the gradient information at the corresponding position is obtained:
[g^A_{j,k,i}] = f'_{i,k} · [g^A_{j,k,r}]
E_i adds, at the corresponding positions, all the packet arrays obtained from the other edge servers E_j to recover the complete ciphertext data, updated as the final [g^A_{i,k}]:
[g^A_{i,k}] = [g^A_{i,k,0}] + Σ_{j≠i} [g^A_{j,k,i}]
Each time a secure multi-party computation is performed, as k increases, the decoding function f'_{i,k} of every E_i is cyclically shifted left by m units, ensuring that the sharing of g^A_{i,k,r} is dynamic while still dividing it equally among E_1, E_2, …, E_n with no part of the data repeated.
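The partition-and-shift behaviour of the decoding functions can be sketched as binary masks (a hypothetical realization; beyond the contiguous-block-plus-left-rotation description, the exact mask layout is an assumption):

```python
import numpy as np

def decoding_masks(n, length, shift):
    """n binary masks that partition `length` positions into contiguous
    blocks, each cyclically left-shifted by `shift` units per round k."""
    masks = np.zeros((n, length), dtype=int)
    bounds = np.linspace(0, length, n + 1).astype(int)
    for i in range(n):
        masks[i, bounds[i]:bounds[i + 1]] = 1  # server i's contiguous block
    return np.roll(masks, -shift, axis=1)      # left circular shift each round
```

Because the masks partition the positions, combining them across all servers covers every position exactly once; multiplying a received packet element-wise by mask i extracts only server i's part.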
Further, in step S4 of the present invention, the specific method of the PreFLa algorithm is as follows:
PreFLa adopts reinforcement learning (RL) to select the optimal parameter weight ratios a_{i,k} for aggregating the global parameter [g^A_{k+1}].
In the uplink communication stage, each edge server not only trains the local model but also uploads the local parameters to the cloud server CS for joint aggregation. After the MePC algorithm is executed in the k-th federation round, E_i uploads the parameters [g^A_{i,k}] and acc_{i,k} to the CS over a TLS/SSL secure channel. In the aggregation stage, because of the unbalanced distribution and heterogeneity of each edge server's data, the way each ES's model parameters are used in aggregation has a crucial influence on the convergence speed; it is therefore necessary to consider the parameter weight ratio a_{i,k} of participant E_i in the k-th round of federated aggregation.
DQN-based reinforcement learning is used to predict the parameter weight ratios, storing information via a Q function to avoid the curse of dimensionality of the state space. To better personalize the models and reduce the waiting time for uploading weights in MePC-F, DQN selects the optimal weight ratio a_{i,k} to aggregate the global parameter [g^A_{k+1}] in the CS. The reinforcement learning comprises: states, actions, a reward function, and feedback.
Further, in step S4 of the present invention, the specific method of the states, actions, reward function, and feedback is as follows:
State: the state of the k-th round is s_{i,k} = Δacc_{i,k}, where Δacc_{i,k} is the accuracy difference, expressed as:
Δacc_{i,k} = acc_{i,k} − acc_{i,k−1}
Action: the parameter weight ratio a_{i,k} is taken as the action of the k-th federated round. To avoid falling into a locally optimal solution, an ε-greedy algorithm is adopted to optimize the action-selection process and obtain a_{i,k}:
a_{i,k} = argmax_{a ∈ P} Q(s_{i,k}, a) if rand > ε, otherwise a random element of P
where P is the set of weight permutations, rand ∈ [0, 1] is a random number, and Q(s_{i,k}, a_{i,k}) denotes the accumulated discounted return obtained when the agent takes action a_{i,k} in state s_{i,k}. Once DQN is trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ P} for all actions in the k-th round; each action value represents the maximum expected reward the agent can achieve by selecting the particular action a_{i,k} in state s_{i,k}.
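The ε-greedy selection above can be sketched as follows (a generic implementation; the weight-ratio set P and the Q-value store are illustrative stand-ins for the trained DQN):

```python
import random

def epsilon_greedy(q_values, actions, eps=0.1, rng=random):
    """With probability eps explore a random action from P; otherwise
    exploit the action with the largest estimated Q(s, a)."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values[a])
```

A common refinement is to decay eps over rounds, favoring exploration early and exploitation once the Q-estimates stabilize.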
Reward: the reward observed at the end of the k-th federated round is set as:
r_k = Ξ^{Δacc_{i,k}} − 1
where Ξ is a positive constant greater than 1, ensuring that r_k grows exponentially with the test-accuracy change Δacc_{i,k}. The first term encourages the agent to select devices that achieve higher test accuracy and controls how r_k changes as Δacc_{i,k} increases; when Δacc_{i,k} < 0, r_k ∈ (−1, 0).
The DQN agent is trained to maximize the expectation of the cumulative discounted reward:
E[ Σ_k γ^k r_k ]
where γ ∈ (0, 1] is the factor discounting future rewards.
After obtaining r_k, the cloud server CS saves, for each round of the federated task, the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}). The optimal action-value function Q(s_{i,k}, a_{i,k}), the quantity sought by the RL agent, is defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:
Q(s_{i,k}, a_{i,k}) = E( r_{i,k} + γ max Q(s_{i,k+1}, a_{i,k}) | s_{i,k}, a_{i,k} )
A function-approximation technique is used to learn a parameterized value function Q(s_{i,k}, a_{i,k}; w_k) approximating the optimal function Q(s_{i,k}, a_{i,k}); r_k + γ max Q(s_{i,k+1}, a_{i,k}) is the target that Q(s_{i,k}, a_{i,k}; w_k) learns. A DNN is used as the function approximator, so the RL learning problem becomes minimizing the MSE loss between the target and the approximator, defined as:
l(w_k) = (r_{i,k} + γ max Q(s_{i,k+1}, a_{i,k}; w_k) − Q(s_{i,k}, a_{i,k}; w_k))²
The CS updates the global parameter w_k as:
w_{k+1} = w_k − η ∇l(w_k)
where η ≥ 0 is the step size.
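The learning target and MSE loss above can be sketched with a tabular Q standing in for the DNN approximator (an illustrative simplification; states and actions are assumed discretized):

```python
import numpy as np

def td_target(Q, r, s_next, gamma=0.9):
    return r + gamma * np.max(Q[s_next])         # r + gamma * max_a' Q(s', a')

def dqn_step(Q, s, a, r, s_next, eta=0.1, gamma=0.9):
    """One step on the loss l = (target - Q(s, a))^2; for a tabular Q
    this reduces to Q(s, a) += eta * (target - Q(s, a))."""
    target = td_target(Q, r, s_next, gamma)
    Q[s, a] += eta * (target - Q[s, a])          # move Q(s, a) toward the TD target
    return Q
```

A full DQN additionally samples the stored quadruples B_k from a replay buffer and uses a slowly updated target network, which stabilizes learning of the approximator.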
After the cloud server CS obtains the optimal learning model, it obtains the weight-ratio sequence a_{i,k} of the k-th round, and the global parameter [g^A_{k+1}] is updated as:
[g^A_{k+1}] = Σ_{i=1}^{n} a_{i,k} · [g^A_{i,k}]
All edge servers then update to the global parameter [g^A_{k+1}] and begin the next T rounds of local training.
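The weighted aggregation can be sketched as follows (plaintext arrays stand in for the HE ciphertexts; an additively homomorphic scheme such as Paillier supports exactly this scaled sum on encrypted values):

```python
import numpy as np

def prefla_aggregate(gradients, ratios):
    """Aggregate the uploaded class-A gradients with the RL-chosen
    weight ratios a_{i,k}, normalized to sum to 1."""
    a = np.asarray(ratios, dtype=float)
    a = a / a.sum()                              # normalize the weight ratios
    return sum(ai * g for ai, g in zip(a, gradients))
```

With equal ratios this reduces to federated averaging; PreFLa's gain comes from letting the RL agent skew the ratios toward servers whose updates improve test accuracy.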
Further, the HE encryption method in the method of the present invention is specifically as follows:
The encryption schemes for the weight matrix and the bias vector follow the same idea. The additive homomorphic encryption of a real number a is denoted a_E; in additive homomorphic encryption, for any two numbers a and b, a_E + b_E = (a + b)_E. Any real number r is converted into an encoded fixed-point rational value v by:
v = ⌊r · 2^d⌉ mod 2^{H+2d}
Each encoded real number r in the gradient g^A can be represented as a rational H-bit number consisting of one sign bit, z integer bits, and d fractional bits; thus every encodable rational number is determined by its precision H = 1 + z + d. The encoding is performed so as to allow multiplication operations, which require working modulo 2^{H+2d} to avoid wrap-around.
Decoding is defined as:
r ≈ v / 2^d, with v interpreted as a signed value modulo 2^{H+2d}
Multiplying these encoded numbers requires removing an extra factor of 2^d. When Paillier additive encryption is used, the encoded multiplication can be computed exactly, but only one homomorphic multiplication can be guaranteed; for simplicity, this is handled at decoding time.
The largest encryptable integer is V − 1, so the largest encryptable real number must be taken into account; the integer bits z and fractional bits d are therefore chosen such that:
V ≥ 2^{H+2d} ≥ 2^{1+z+3d}
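A sketch of the fixed-point encoding described above, under assumed parameter values (the modulus 2^{H+2d} and the H = 1 + z + d convention follow the text; the concrete z and d are illustrative):

```python
Z, D = 16, 16                      # integer and fractional bits (illustrative)
H = 1 + Z + D                      # one sign bit + z integer + d fractional bits
MOD = 1 << (H + 2 * D)             # operations are taken modulo 2^(H+2d)

def encode(r):
    """Map a real r to a fixed-point residue with d fractional bits."""
    return round(r * (1 << D)) % MOD

def decode(v):
    """Inverse map, interpreting v as a signed value modulo 2^(H+2d)."""
    if v >= MOD // 2:
        v -= MOD
    return v / (1 << D)
```

Because the encoding is linear, adding two encoded values modulo 2^{H+2d} and then decoding recovers the real sum, which is what lets Paillier's additive homomorphism operate on the encoded gradients.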
further, the optimal loss function in step S5 of the present invention is
Figure BDA0003740951420000104
Wherein, L (w) i ) Denotes E i Loss of the network.
The invention has the following beneficial effects:
(1) A federated learning model for multi-party broadcast secure computation (MePC-F) is presented. The model combines the MePC algorithm and the PreFLa algorithm to address both the security of federated training data and the communication overhead in the Internet of Vehicles. The combined advantages of homomorphic encryption and secure multi-party computation prevent data leakage between terminals and reduce how much of the original data can be reconstructed after an attack, realizing data privacy protection to the greatest extent.
(2) A secure broadcast multi-party computation scheme, MePC, is presented. Sharing only the gradient information of the first layer greatly reduces the risk of data recovery as well as the traffic volume. During sharing, each edge server takes its own part through a decoding function in a broadcast mode, which lowers the time complexity from O(n²) to O(n), reducing communication overhead while preventing leakage of the raw data.
(3) A weight-ratio-based federated learning algorithm, PreFLa, is proposed. PreFLa finds the optimal gradient weight ratios for aggregating the global parameters, and the reward function is designed from the accuracy difference of each edge server, so that the action selection maximizing the overall return yields the weight ratios of each federated round. An L2 regularization term is added to the loss function to promote edge-server cooperation and reduce the delay and performance problems caused by data heterogeneity, so that the global model generalizes better and converges faster.
Drawings
The invention will be further described with reference to the following drawings and examples, in which:
FIG. 1 is a MePC-F model of an embodiment of the present invention;
FIG. 2 is a flow chart of a MePC-F model of an embodiment of the present invention;
FIG. 3 is a MePC algorithm of an embodiment of the present invention;
FIG. 4 shows DLG results on MNIST when the first hidden layer is and is not hidden by the four methods, according to an embodiment of the present invention; (a) FL; (b) MePC-F; (c) PeMPC; (d) Gaussian; (e) Laplacian;
FIG. 5 is the performance of DLG on MNIST when the gradient of the first hidden layer is replaced by the four methods (Gaussian distribution, Laplace distribution, PeMPC and MePC-F) according to an embodiment of the present invention;
FIG. 6 shows the average accuracy and loss on Non-IID MNIST data for an embodiment of the present invention;
FIG. 7 shows the average accuracy and loss on Non-IID CIFAR-10 data for an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The parameters involved in the examples of the invention are described below:
TABLE 1 Description of the parameters

(The parameter table is rendered as an image in the original document and is not reproduced here.)

Wherein E_i denotes the current edge server, E_j denotes an edge server other than the current edge server, and E_s denotes all edge servers.
The real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles comprises the following steps:
S1. Construct a plurality of edge servers E_i and a cloud server CS; acquire the vehicle data D = {D_1, D_2, …, D_i}, with each edge server E_i acquiring its corresponding vehicle data D_i.
S2. In the k-th round of the federated task, edge server E_i downloads the encrypted initial type-A gradient from the cloud server CS, decrypts it, and randomly initializes a type-B gradient. Edge server E_i computes gradients in local network-model training according to its vehicle data D_i and records the gradient information after finishing T rounds of local training.
S3. Through its decoding function, edge server E_i obtains from its local type-A gradient the partial gradient information to be retained, while the remaining gradient information is homomorphically encrypted and broadcast to all the other edge servers E_j through the MePC algorithm. According to its decoding function, E_i in turn obtains the corresponding partial gradient information sent by the other edge servers E_j. The type-A gradient information updated and shared by all the edge servers is thereby assembled, for i ∈ [1, n], where n is the total number of edge servers.
S4. All edge servers upload their shared type-A gradient information to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm. PreFLa selects the optimal parameter weight ratio a_{i,k} of each edge server E_i by maximizing the return through reinforcement learning, and the global parameters are aggregated according to a_{i,k}. The uploading and downloading of parameters proceed in parallel, and all parameters are encrypted with HE.
S5. Repeat steps S2–S4 until a termination condition is reached, finishing the whole training process. The termination condition may be a maximum number of training rounds, convergence of the loss function, or another user-defined condition. Finally, the optimal loss function is obtained according to equation (1):

min_w (1/n) Σ_{i=1}^{n} L(w_i)    (1)

where L(w_i) denotes the loss of E_i's network.
The specific method of local training is as follows:
In the local model phase, a deep neural network (DNN) is employed to learn the cloud model and the ES model. The DNN performs end-to-end feature learning and classifier training by taking different user data as raw input. Stochastic gradient descent is used in the proposed algorithm as a subroutine to minimize the loss value in each round of local training.
In the downstream communication phase, E_i downloads the encrypted base-layer parameters from the CS in the k-th round of communication (k ∈ [1, K]) and randomly initializes the remaining parameters, where K represents the total number of rounds of the federated task. If it is the first round of the federated task, the CS initializes the base-layer parameters randomly. Before local training, E_i needs to decrypt the downloaded parameters using homomorphic encryption (equation (4)) and record the result.
In order to better embody model personalization, the loss function of the local model is set as:

L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})²    (16)

where l(·) represents the loss of the network, e.g. the cross-entropy loss of the classification task. The second term is an L2 regularization term, which both preserves the model's personalization capability and improves the efficiency of cooperation with the other participants. λ is the regularization coefficient.
E_i initializes G_k and replaces the weight parameters w_i of the model, continuing the local model training as

w_i = w_i − ηG_k    (17)

where η is the learning rate and G_k denotes the type-A and type-B gradients collectively, the type-B gradient being randomly initialized. After E_i completes T rounds of local training, the accuracy acc_{i,k} of each local model and the updated type-A and type-B gradients are obtained.
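The local-training loop above can be sketched in NumPy. This is a minimal illustration, not the invention's DNN: a logistic model stands in for the network, and the regularizer λ(w_{i,t} − w_{i,t+1})² is read as a proximal term pulling the weights toward the previous round's weights (an assumption of this sketch).

```python
import numpy as np

def local_train(w, X, y, w_prev, eta=0.1, lam=0.01, T=50):
    """One edge server's local training: T rounds of gradient descent on a
    logistic model, with an L2 proximal term pulling w toward the
    previous-round weights w_prev (one reading of the loss in eq. (16))."""
    for _ in range(T):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid predictions
        grad_l = X.T @ (p - y) / len(y)           # cross-entropy gradient
        grad = grad_l + 2.0 * lam * (w - w_prev)  # G_k with proximal term
        w = w - eta * grad                        # w_i = w_i - eta * G_k
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (X @ true_w > 0).astype(float)                # separable toy labels
w0 = np.zeros(5)
w = local_train(w0, X, y, w_prev=w0)
acc = np.mean(((X @ w) > 0) == (y > 0.5))         # local accuracy acc_{i,k}
```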
Direct sharing of user information between the terminals is forbidden, so the data in each edge server must be encrypted before communication to prevent it from being attacked in transit. This process uses HE to avoid information leakage. The process of additive HE over real numbers is shown below; the encryption scheme of the weight matrix and the offset vector follows the same idea. The additive homomorphic encryption of a real number a is written a^E. In additive homomorphic encryption, for any two numbers a and b, a^E + b^E = (a + b)^E. Any real number r is first converted into an encoded rational fixed point v:

v = round(r · 2^d) mod V

Each encoded real number r in the gradient can be represented as a rational H-bit number consisting of one sign bit, z integer bits and d fractional bits; each encodable rational number is thus defined by its H = 1 + z + d bits. The encoding is chosen to allow multiplication, which requires a modulus of at least 2^(H+2d) to avoid overflow.

The decoding is defined as

D(v) = v / 2^d if v ≤ V/2, and (v − V) / 2^d otherwise,

values above V/2 representing negatives. Multiplying two encoded numbers introduces a factor of 2^d that must be removed. When Paillier additive encryption is used, this condition of the encoded multiplication can be computed exactly, but homomorphic multiplication can be guaranteed only once; for simplicity it is handled at decoding time, which is correct if only one encoded multiplication has taken place. Since the largest encryptable integer is V − 1, the largest encryptable real number must take this into account, and the integer bits z and fractional bits d must therefore be chosen such that:

V ≥ 2^(H+2d) ≥ 2^(1+z+3d)    (5)
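The fixed-point encoding above can be sketched as follows. The specific parameter choices (d, z, and hence V) are illustrative assumptions, picked so that V ≥ 2^(H+2d) ≥ 2^(1+z+3d) holds with equality:

```python
# Fixed-point encoding for additive HE (illustrative parameter choices).
d = 16                      # fractional bits
z = 15                      # integer bits
H = 1 + z + d               # one sign bit + z integer bits + d fraction bits
V = 1 << (H + 2 * d)        # modulus: V = 2^(H+2d) = 2^(1+z+3d) here

def encode(r: float) -> int:
    """Map a real number r to a fixed-point residue modulo V."""
    return round(r * (1 << d)) % V

def decode(v: int) -> float:
    """Invert encode(); residues above V/2 represent negative reals."""
    if v > V // 2:
        v -= V
    return v / (1 << d)

a, b = 3.25, -1.5
# Additivity: encodings add modulo V exactly like the plaintexts
# (as long as no overflow past V/2 occurs).
s = (encode(a) + encode(b)) % V
```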
After encryption, the local type-A and type-B gradients and the accuracy acc_{i,k} are denoted by their respective ciphertext forms, marked with the superscript E.
The specific method of the MePC algorithm is shown in FIG. 3.
In the k-th federated task, MePC is used to exchange the base-layer gradients. To avoid the risk of the data being cracked, a random proportion χ of each network's base-layer gradient is retained, with the same random proportion χ kept throughout one federated round; in different rounds of the federated task, the random ratio χ (χ ∈ [1/n, 1]) changes. The remaining gradient is divided equally into n − 1 parts, so that, as shown in fig. 3, the gradient values are split into the retained part (a fraction χ of the entries) and n − 1 equal broadcast shares.
Only the retained part stays at E_i; the other parts and the random parameter χ are broadcast to the other ESs in ciphertext form. In this way, even if portions of the transmitted content are attacked, the original data will not leak. In particular, an attacker who wants to recover the data must obtain all of the shares, but the shares and χ are kept in ciphertext form through homomorphic encryption during the communication between participant E_i and receiver E_j. The gradient information shared to the other ESs therefore consists of the n − 1 encrypted shares together with the encrypted random proportion χ.
When E_i receives the data packets sent by the other servers, it performs data verification locally, using the corresponding "multiplication" method. Each edge server designs two decoding functions (binary mask vectors) by itself, of lengths

L_0 = χ · L    (9)
L' = (1 − χ) · L / (n − 1)

where L is the length of the full base-layer gradient, L_0 is the length of the retained part, and L' is the length of each broadcast share, the shares all being of equal length. The decoding functions of all the ESs are required to be complementary: executing the "AND" operation of the masks on the same data packet yields all 0s, and executing the "XOR" operation yields all 1s.
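One way to realize the complementary decoding functions described above is as disjoint 0/1 masks whose blocks exactly cover the gradient's index set; the "AND" of all masks is then all 0s and the "XOR" all 1s. The sizes below are illustrative assumptions chosen so every block length is an integer:

```python
import numpy as np

def make_masks(L, chi, n):
    """Build n complementary 0/1 decoding masks over a gradient of length L:
    one retained block of length L0 = chi*L and n-1 equal broadcast shares
    of length (1-chi)*L/(n-1) each (sizes assumed to divide evenly)."""
    L0 = int(chi * L)
    share = (L - L0) // (n - 1)
    masks, lo = [], 0
    for j in range(n):
        width = L0 if j == 0 else share   # block 0 = retained part
        m = np.zeros(L, dtype=int)
        m[lo:lo + width] = 1
        masks.append(m)
        lo += width
    return np.array(masks)

masks = make_masks(L=100, chi=0.2, n=5)
and_all = np.bitwise_and.reduce(masks)    # disjoint blocks -> all zeros
xor_all = np.bitwise_xor.reduce(masks)    # exact cover     -> all ones
```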
First, the decoding functions are initialized. Note that, at initialization, the decoding of the data packets transmitted by different E_i within the same federated task uses the same function.

Each data packet is multiplied by the corresponding decoding function in the other servers: because the 0 bits of the mask multiply the packet entries by 0, E_i is guaranteed to obtain only its own partial data packet, and wherever a mask bit is 1 the ciphertext of the gradient information at the corresponding position is obtained. E_i then adds the data-packet groups obtained from all the other ESs at the corresponding positions to collect all of the ciphertext data, which is updated as the final shared type-A gradient.

Each time a secure multi-party computation is performed, the decoding function of each E_i is cyclically left-shifted by m units as k increases, which keeps the sharing dynamic and divides the data equally among E_1, E_2, …, E_n without repeating any part of the data information.
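The packet reassembly and the per-round cyclic shift can be sketched as follows. The sizes, the shift amount m, and the equal-block masks are illustrative assumptions; each server multiplies the broadcast packet by its 0/1 mask and the masked pieces are summed position-wise:

```python
import numpy as np

# Minimal sketch of MePC packet reassembly and mask rotation.
L, n, m = 12, 3, 4
grad = np.arange(1.0, L + 1.0)                  # stand-in base-layer gradient

base = np.zeros(L, dtype=int)
base[: L // n] = 1                              # one block of length L/n
masks = [np.roll(base, j * (L // n)) for j in range(n)]

pieces = [grad * mk for mk in masks]            # each server's masked packet
rebuilt = np.sum(pieces, axis=0)                # position-wise addition

next_masks = [np.roll(mk, -m) for mk in masks]  # cyclic left shift for k+1
cover = np.sum(next_masks, axis=0)              # still an exact, disjoint cover
```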
The specific method of the PreFLa algorithm is as follows:
The data distribution in the Internet of Vehicles is dispersed, and the data are unbalanced and heterogeneous, which makes it difficult to improve personalized service while meeting real-time requirements. To prevent user privacy from leaking during the communication between different edge servers, HE is used to encrypt the parameters in transit. To better realize personalized training for different user data, the first layer is set as the base layer and trained cooperatively with the existing federated learning method, while the other layers are trained locally as personalization layers, so that the personal information of different ES devices can be captured. In this way, after the joint training process, the globally shared base layer can be transferred into each ES to build its own personalized deep learning model with its unique personalization layers. Only the base-layer parameters are downloaded from the CS; the personalization-layer parameters are randomly generated and fine-tuned on local data. To meet the real-time requirement while realizing the personalization requirement of the ESs, PreFLa adopts reinforcement learning (RL) to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters.
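The base-layer / personalization-layer split described above can be illustrated as follows. The layer names are assumptions for the sketch; only the first layer's parameters are selected for exchange with the CS, while the rest stay local:

```python
# Illustrative split of a model's parameters into the shared base layer
# (first layer, uploaded/downloaded via the CS) and the local
# personalization layers (kept on the edge server).
model = {
    "layer1.weight": [[0.1, 0.2], [0.3, 0.4]],   # base layer: shared
    "layer1.bias":   [0.0, 0.0],
    "layer2.weight": [[1.0], [1.0]],             # personalization: local
    "layer2.bias":   [0.5],
}

def split_params(params):
    """Partition a parameter dict into (base, personalization) parts."""
    base = {k: v for k, v in params.items() if k.startswith("layer1.")}
    personal = {k: v for k, v in params.items() if k not in base}
    return base, personal

base, personal = split_params(model)
```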
In the upstream communication phase, each ES not only trains the local model but also uploads its local parameters to the CS for joint aggregation. After the MePC algorithm is executed in the k-th federated round, E_i uploads its encrypted parameters and accuracy to the CS over a TLS/SSL secure channel. In the aggregation stage, because each ES's data distribution is unbalanced and heterogeneous, the model parameters used for aggregation have a crucial influence on the convergence speed of this stage. It is therefore necessary to consider the parameter weight ratio a_{i,k} of each participant E_i in the k rounds of federated aggregation.
In the present invention, DQN-based reinforcement learning is used to predict the parameter weight ratio; information is stored through a Q function instead of the table storage of Q-learning, to avoid the curse of dimensionality. To better realize model personalization and reduce the waiting time for uploading weights in MePC-F, DQN is used to select the optimal parameter weight ratio a_{i,k} for aggregating and updating the global parameters in the CS.
The reinforcement learning setting comprises the state, action, reward function and feedback, defined as follows:
State: the state s_{i,k} of the k-th round is built from the accuracy difference Δacc_{i,k}, expressed as the change in E_i's test accuracy between consecutive rounds:

Δacc_{i,k} = acc_{i,k} − acc_{i,k−1}
Action: the parameter weight ratio a_{i,k} is the action of the k-th round of the federated task. To avoid being trapped in a local optimum, the ε-greedy algorithm is adopted to optimize the action-selection process: with probability ε a random action is drawn from P, and otherwise a_{i,k} = argmax Q(s_{i,k}, a) over a ∈ P, where P is the set of weight permutations, a random number rand ∈ (0, 1] decides the branch, and Q(s_{i,k}, a_{i,k}) denotes the cumulative discounted return of taking action a_{i,k} in state s_{i,k}. Once the DQN is trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ P} for all actions in the k-th round; each action value represents the maximum expected return the agent can achieve by selecting the particular action a_{i,k} in state s_{i,k}.
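The ε-greedy selection can be sketched as follows; the Q-value table here is a stand-in for the trained DQN's outputs, and the example values are assumptions:

```python
import random

def select_action(q_values, actions, epsilon, rng=random):
    """Epsilon-greedy: explore a random weight ratio from P with
    probability epsilon, otherwise exploit argmax_a Q(s, a).
    q_values: dict action -> Q(s, a); actions: the permutation set P."""
    if rng.random() < epsilon:
        return rng.choice(actions)                     # explore
    return max(actions, key=lambda a: q_values[a])     # exploit

P = [0.1, 0.2, 0.3, 0.4]                 # candidate weight ratios
Q = {0.1: 0.05, 0.2: 0.40, 0.3: 0.15, 0.4: 0.22}
greedy = select_action(Q, P, epsilon=0.0)   # pure exploitation -> argmax
```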
Reward: the observed reward at the end of the k-th federated round is set to

r_k = ξ^(Δacc_{i,k}) − 1

where ξ is a positive number greater than 1, ensuring that r_k grows exponentially with the test-accuracy difference Δacc_{i,k}. The first term encourages the agent to select devices that enable higher test accuracy, and ξ controls how r_k changes as Δacc_{i,k} increases. In general, model accuracy increases more slowly as machine-learning training progresses; in the federated cooperative task, moreover, the model accuracy may even decrease because of the unbalanced and heterogeneous data distribution. Therefore, as FL enters the late stage, the exponential term amplifies marginal accuracy gains. The second term, −1, encourages the agent to improve model accuracy, since r_k ∈ (−1, 0) whenever Δacc_{i,k} < 0.
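The reward described in the text grows exponentially with the accuracy difference and falls in (−1, 0) when the accuracy drops. One formula with exactly these properties, reconstructed here as r_k = ξ^(Δacc) − 1 with ξ = 3 as in the experimental settings, can be checked directly:

```python
def reward(delta_acc, xi=3.0):
    """Reward with the stated properties: exponential in delta_acc,
    zero when accuracy is unchanged, in (-1, 0) when accuracy drops.
    The exact closed form is a reconstruction from the text."""
    return xi ** delta_acc - 1.0

r_gain = reward(0.05)    # accuracy improved by 5 points -> positive reward
r_drop = reward(-0.05)   # accuracy dropped by 5 points  -> reward in (-1, 0)
```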
The DQN agent is trained to maximize the expectation of the cumulative discounted reward

E[ Σ_k γ^k · r_k ]

where γ ∈ (0, 1] is the factor discounting future rewards.
After obtaining r_k, the CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of the federated task. The optimal action-value function Q(s_{i,k}, a_{i,k}) is what the RL agent seeks, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:

Q(s_{i,k}, a_{i,k}) = E( r_{i,k} + γ · max_a Q(s_{i,k+1}, a) | s_{i,k}, a_{i,k} )    (22)

A function-approximation technique can then be applied to learn a parameterized value function Q(s_{i,k}, a_{i,k}; w_k) approximating the optimal function Q(s_{i,k}, a_{i,k}). In each step, r_k + γ · max_a Q(s_{i,k+1}, a; w_k) is the learning target of Q(s_{i,k}, a_{i,k}; w_k). Generally, a DNN is used to represent the function approximator. The RL learning problem becomes minimizing the MSE loss between the target and the approximator, defined as:

l(w_k) = ( r_{i,k} + γ · max_a Q(s_{i,k+1}, a; w_k) − Q(s_{i,k}, a_{i,k}; w_k) )²    (23)
The CS updates the global parameter w_k by a gradient step on this loss:

w_{k+1} = w_k − η · ∇l(w_k)

where η ≥ 0 is the step size.
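One DQN-style update on a stored quadruple (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) can be sketched with a tabular Q array standing in for the neural approximator; this illustrates the TD target inside the squared loss (23), not the invention's network:

```python
import numpy as np

def td_update(Q, s, a, r, s_next, gamma=0.9, eta=0.1):
    """One temporal-difference step: move Q(s, a) toward the target
    r + gamma * max_a' Q(s', a'), i.e. a gradient step on the MSE loss."""
    target = r + gamma * np.max(Q[s_next])   # r + gamma * max_a' Q(s', a')
    td_error = target - Q[s, a]              # the term squared in loss (23)
    Q[s, a] += eta * td_error                # step of size eta on the MSE
    return Q

Q = np.zeros((3, 2))                         # 3 states x 2 actions
Q = td_update(Q, s=0, a=1, r=1.0, s_next=2)
```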
The CS repeats the above steps to obtain the best learning model. Having obtained the weight-ratio sequence a_{i,k} of the k-th round, the CS updates the global parameters as the aggregate of the uploaded parameters weighted by a_{i,k}. All the ESs then update their global parameters accordingly and begin the next T rounds of local training.
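The weighted aggregation at the CS can be sketched as follows; the assumption that the weight ratios a_{i,k} sum to 1 is an illustrative convention of this sketch:

```python
import numpy as np

def aggregate(params, weights):
    """Weighted global aggregation: combine per-ES parameter vectors
    using the learned weight ratios a_{i,k} (assumed to sum to 1)."""
    weights = np.asarray(weights, dtype=float)
    assert abs(weights.sum() - 1.0) < 1e-9
    return sum(w * p for w, p in zip(weights, np.asarray(params, dtype=float)))

uploads = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
a_k = [0.5, 0.3, 0.2]                     # weight ratios from PreFLa
w_glob = aggregate(uploads, a_k)          # aggregated global parameters
```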
Experimental test examples:
To verify the validity of the proposed mechanism, experimental results and analysis are given. Consider a system with 1 cloud server and 10 edge servers. The experimental learning rate α is 0.01 and the discount factor γ is 0.9. The positive integer ξ is taken as 3. The parameter values are shown in Table 2.

TABLE 2 Parameter settings

(The parameter table is rendered as an image in the original document and is not reproduced here.)
The validity of the proposed model was verified on two data sets: MNIST and CIFAR-10. The performance of the proposed federated learning model MePC-F was evaluated on the reconstructed images, average accuracy and average loss of DLG. First, the performance of five schemes in defending against DLG attacks was evaluated; then the proposed federated learning model MePC-F was compared with the centralized scheme and PeMPC. All results in the following scenarios are the average of 1000 independent experiments.
1) Performance against DLG attacks
This section evaluates the effectiveness of MePC-F and compares it with the FL, PeMPC and DP algorithms (Gaussian- and Laplace-distributed noise) on DLG-reconstructed images. The common gradient of the network is computed for a single image on the MNIST dataset; the results of the different schemes are shown in fig. 4. Since studies [17] indicate that hiding the gradient of the first layer can hinder reconstruction of the data, the gradient of the first layer (weight and bias terms) is replaced by four methods: the proposed MePC-F, PeMPC, Gaussian-distributed (μ = 0, σ = 1) noise, and Laplace-distributed (μ = 0, σ = 1) noise, to observe the behavior of DLG. After the gradients of the first layer are hidden, DLG uses these gradients to recover the image that created the common shared gradient.
As can be seen from fig. 4, the DLG process can accurately reconstruct the training data when no method hides the gradient of the first layer (FL, fig. 4(a)). When the gradient of the first layer is protected by the proposed method MePC-F, information leakage is effectively prevented (fig. 4(b)): even when the number of iteration steps reaches 500, DLG still cannot construct an image. Fig. 4(c) shows results similar to fig. 4(b); PeMPC can also defend against the DLG attack. As can be seen from fig. 4(d), with Gaussian noise added to the first layer, the reconstructed image is partially revealed from the 15th to the 20th round, by which point the basic contour of the original image has been formed; as the number of iterations increases to 500 rounds, the image can be restored clearly. The Laplacian noise in fig. 4(e) shows a similar phenomenon.
As can be seen from fig. 5, if a malicious server receives the gradients of all hidden layers in plain text, the reconstruction process attains the lowest gradient loss and image MSE (green line in fig. 5). As the number of rounds increases, PeMPC and MePC-F do not converge to zero, and the MSE of the image reaches 10^7. Adding Laplacian or Gaussian noise to the original gradient converges to 10^-5; fig. 4 likewise demonstrates that the data can be reconstructed by around round 20. The larger the MSE of the image, the less likely the image is to be reconstructed.
Based on the above experimental results, it is verified that adding Laplacian or Gaussian noise to the original gradient can prevent early partial gradient leakage, but as the number of rounds increases the original data can still be recovered through deep leakage. PeMPC and MePC-F, by contrast, are effective at preventing DLG attacks from reconstructing the original data regardless of the number of training rounds.
2) Performance comparison of average accuracy and average penalty
In this section, the effectiveness of MePC-F is evaluated and compared with the centralized scheme and PeMPC in terms of average accuracy and average loss on the MNIST and CIFAR-10 data sets.
Fig. 6(a) shows the number of rounds required by each model to achieve 98% accuracy on the MNIST dataset. The average accuracy of all three methods increases with the number of training rounds. The centralized approach requires 25 rounds to achieve the target accuracy on MNIST, PeMPC requires 140 rounds, and MePC-F requires 40 rounds; MePC-F thus needs 71.2% fewer training rounds than PeMPC. The reason is that the proposed reinforced federated learning algorithm PreFLa can find a better aggregation parameter weight a_{i,k} through interaction with the environment, deal better with Non-IID data, accelerate model convergence, and reach the target accuracy sooner. The centralized scheme trains on the combination of all the data, so its accuracy is higher than that of the federated learning algorithms; but as the figure shows, PeMPC converges to nearly the centralized accuracy.
Fig. 6(b) shows that the average loss of the three schemes decreases as the number of training rounds increases. For the centralized scheme, the average loss drops from 0.233 to 0.052; the average loss of PeMPC drops from 0.35 to 0.084. Meanwhile, the average loss of the proposed MePC-F drops to 0.06, which is 28.6% lower than that of PeMPC. When the number of training rounds reaches 100, the proposed MePC-F almost reaches the centralized loss value.
Fig. 7(a) shows the number of rounds required for each model to achieve the target accuracy of 50% on CIFAR-10, with results similar to fig. 6(a). The average accuracy of all three models keeps increasing until the target value is reached. For the centralized scheme, the average accuracy increases from 0.42 to 0.5 in 23 rounds; the average accuracy of PeMPC increases from 0.372 to 0.5 in 89 rounds. Meanwhile, the proposed MePC-F reaches the target accuracy in 41 rounds, 53.9% fewer than PeMPC. Fig. 7(a) shows that MePC-F uses a better weight a_{i,k} than PeMPC to update the global model, which results in a faster convergence speed.
As can be seen from fig. 7(b), the average loss of the three schemes keeps decreasing until a stable value is reached. The centralized scheme, MePC-F and PeMPC reach their minimum loss values in that order, so the proposed MePC-F is more time-efficient than PeMPC.
TABLE 3 Top accuracy of the three schemes within 100 rounds

Scheme        MNIST    CIFAR-10
centralized   98.4%    51.4%
MePC-F        98.2%    51.1%
PeMPC         97.6%    49.2%
Table 3 gives the accuracy of the three schemes within 100 rounds. For the MNIST data, the average accuracy of the proposed MePC-F is 98.2%, 0.6% higher than that of PeMPC, and PeMPC nearly reaches the accuracy of centralized training. For the CIFAR-10 data, the average accuracy of MePC-F at 100 rounds reaches 0.511, 1.9% higher than that of PeMPC. This shows that MePC-F, by aggregating the global parameters with the optimal weights a_{i,k}, achieves higher accuracy than PeMPC, closer to the centralized accuracy.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (8)

1. A real-time reinforced federated learning data privacy security method based on a MePC-F model in the Internet of Vehicles, characterized by comprising the following steps:
S1. Construct a plurality of edge servers E_i and a cloud server CS; acquire the vehicle data D = {D_1, D_2, …, D_i}, with each edge server E_i acquiring its corresponding vehicle data D_i.
S2. In the k-th round of the federated task, edge server E_i downloads the encrypted initial type-A gradient from the cloud server CS, decrypts it, and randomly initializes a type-B gradient. Edge server E_i computes gradients in local network-model training according to its vehicle data D_i and records the gradient information after finishing T rounds of local training.
S3. Through its decoding function, edge server E_i obtains from its local gradient the partial gradient information that needs to be retained, while the remaining gradient information is encrypted and broadcast to all the other edge servers E_j through the MePC algorithm. According to its decoding function, E_i obtains the corresponding partial gradient information from the other edge servers E_j. The type-A gradient information updated and shared by all the edge servers is thereby assembled, for i ∈ [1, n], where n is the total number of edge servers.
S4. All edge servers upload their shared type-A gradient information to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm. PreFLa selects the optimal parameter weight ratio a_{i,k} of each edge server E_i by maximizing the return through reinforcement learning, and the global gradient parameters are aggregated according to a_{i,k}. The uploading and downloading of parameters proceed in parallel, and all parameters are encrypted with HE.
S5. Repeat steps S2–S4 until a termination condition is reached; the cloud server CS computes the final global gradient parameters and issues them to each edge server, and each edge server extracts the accuracy and the optimal loss function of the MePC-F model according to the characteristics of the plurality of vehicle data, obtaining the trained MePC-F model, completing the whole training process, and outputting the model in real time to the corresponding Internet-of-Vehicles service.
2. The real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles according to claim 1, wherein in step S2 the specific method for training the local network model is as follows:
a deep neural network (DNN) model is employed; the DNN performs end-to-end feature learning and classifier training by taking different vehicle data as raw input, using stochastic gradient descent as a subroutine to minimize the loss value in each round of local training;
E_i downloads the base-layer parameters, i.e. the encrypted initial type-A gradient, from the cloud server CS in the k-th round of communication, decrypts them into the type-A gradient, and randomly initializes a type-B gradient, where k ∈ [1, K] and K represents the total number of rounds of the federated task; if it is the first round of the federated task, the CS initializes the type-A gradient randomly; before local training, E_i decrypts the downloaded parameters using homomorphic encryption and records the result;
the loss function of the local model is set as:
L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})²
where l(·) represents the loss of the network, the second term is the L2 regularization term, and λ is the regularization coefficient; w_i represents the total weight information of the local model, w_{i,t} is the weight information of the local model at time t, and w_{i,t+1} is the weight information of the local model at time t + 1;
E_i updates G_k and replaces the weight parameters w_i of the model, performing the local model training by minimizing the loss function as:
w_i = w_i − ηG_k
where η is the learning rate and G_k denotes the type-A and type-B gradients collectively, the type-B gradient being randomly initialized;
after edge server E_i completes T rounds of local training, the accuracy acc_{i,k} of each local model and the updated local gradients are obtained.
3. the real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein in the step S3, the specific method of the MePC algorithm is as follows:
in the k-th federal task, all edge servers use MePC to exchange base layer gradients
Figure FDA0003740951410000031
Wherein it is present>
Figure FDA0003740951410000032
Encrypted class A data representing the nth edge server in the kth round of federated tasks, <' >>
Figure FDA0003740951410000033
Encrypted data in class A representing the ith edge server in the kth federated task, and +>
Figure FDA0003740951410000034
Indicating that the ith edge server in the kth round of federated task broadcasts encrypted data of class A to other edge servers, and->
Figure FDA0003740951410000035
I.e. is>
Figure FDA0003740951410000036
The encrypted data reserved by the user is removed;
to avoid the risk of the data being cracked, a random proportion χ of the gradient g^A_{i,k} is retained in each network, and the random proportion χ is kept the same within the same federated round; g^A_{i,k} is then encrypted as (g^A_{i,k})^E; the random proportion χ varies across different rounds of federated tasks, and χ ∈ (0, 1/n];
the remaining gradient is homomorphically encrypted and divided into n−1 parts, the value of (g^A_{i,k})^E being divided as:

(g^A_{i,k})^E = (g^A_{i→i,k})^E + Σ_{j≠i} (g^A_{i→j,k})^E

only (g^A_{i→i,k})^E is retained at E_i; the other parts, together with the random parameter χ, are broadcast in ciphertext form to the other E_j; in this manner, even if portions of the transmitted content are attacked, the original data g^A_{i,k} is not leaked;
the gradient information shared to the other E_j is {(g^A_{i→j,k})^E | j ≠ i}; when E_i receives the data packets sent by the other servers, it performs data verification locally.
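The splitting step of the claim — retain a random proportion χ of the gradient and divide the remainder into n−1 parts for the other servers — can be sketched as follows; `split_gradient` and the index-based partition are illustrative assumptions, and the homomorphic encryption is omitted:

```python
import numpy as np

def split_gradient(g, n, chi, rng):
    """Illustrative split of a gradient vector: a random fraction chi of
    the positions is retained locally, and the rest is divided into n-1
    disjoint parts to broadcast to the other edge servers (in the patent
    these parts travel as homomorphic ciphertexts; plaintext here)."""
    idx = rng.permutation(len(g))
    keep = max(1, int(chi * len(g)))
    retained = idx[:keep]
    shared = np.array_split(idx[keep:], n - 1)
    return retained, shared

rng = np.random.default_rng(0)
g = rng.normal(size=12)
retained, shared = split_gradient(g, n=4, chi=0.25, rng=rng)
# the retained slice plus the n-1 shares cover every gradient position once
```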
4. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles as claimed in claim 3, wherein the specific method for locally performing data verification in the step S3 is as follows:

in the k-th round of federated tasks, verification is performed using the corresponding "multiplication" method, each edge server designing two binary decoding functions f_{i,k} and f'_{i,k} on its own, of lengths L_0 and L' respectively, where L_0 is the length of the retained part (g^A_{i→i,k})^E and L' is the length of each shared part (g^A_{i→j,k})^E; the subscript k of a decoding function denotes the k-th round of federated tasks;

L_0 = χ·L

where L is the length of the gradient g^A_{i,k}, and the shared parts (g^A_{i→j,k})^E are all of equal length;

it is required that the decoding functions of all edge servers, applied to the same data packet, yield all 0s under the bitwise AND operation and all 1s under the bitwise OR operation, namely:

f_{1,k} ∧ f_{2,k} ∧ … ∧ f_{n,k} = (0, 0, …, 0)
f_{1,k} ∨ f_{2,k} ∨ … ∨ f_{n,k} = (1, 1, …, 1)
first, the initial decoding functions assign to each edge server a distinct segment of all-1 bits, with 0s elsewhere;

each data packet (g^A_{j→i,k})^E is multiplied bitwise with the corresponding decoding function in the receiving server; since the binary 0 bits of the decoding function multiply everything to 0, E_i is guaranteed to obtain only its own portion of the data packet; where the binary bits of the decoding function are 1, the ciphertext of the gradient information at the corresponding positions is obtained;

E_i adds all the data packet arrays obtained from the other edge servers E_j into the corresponding positions to obtain the complete ciphertext data, which is updated into the final ciphertext data (g^A_k)^E, namely:

(g^A_k)^E = (g^A_{i→i,k})^E + Σ_{j≠i} (g^A_{j→i,k})^E
each time a secure multiparty computation is performed, as k increases the decoding function f_{i,k} of each E_i is cyclically left-shifted by m units, ensuring the dynamics of the sharing of g^A_{i,k} and dividing it equally among E_1, E_2, …, E_n with no part of the data information repeated.
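The decoding-function conditions above (AND of the masks all 0s, OR all 1s, cyclic shifting between rounds) amount to a rotating partition of packet positions; a plaintext sketch with assumed helper names, not the patented verification protocol:

```python
import numpy as np

def make_masks(length, n):
    """Build n disjoint binary decoding masks covering every position:
    the AND of any two masks is all 0s and the OR of all masks is all 1s,
    mirroring the consistency conditions stated in the claim."""
    masks = np.zeros((n, length), dtype=int)
    for i, part in enumerate(np.array_split(np.arange(length), n)):
        masks[i, part] = 1
    return masks

def shift_masks(masks, m):
    """Left-cyclic shift by m units between rounds, so each server's
    slice of the shared data changes from round to round."""
    return np.roll(masks, -m, axis=1)

masks = make_masks(12, 3)
packet = np.arange(12)
# multiplying the packet by a mask keeps only that server's positions;
# summing the masked copies into their positions recovers the full packet
recovered = sum(masks[i] * packet for i in range(3))
```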
5. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein in the step S4, the specific method of the PreFLa algorithm is as follows:

PreFLa adopts reinforcement learning (RL) to adaptively select the optimal parameter weight ratio a_{i,k} for aggregating the global parameter w^G_k;

in the uplink communication stage, each edge server not only trains a local model but also uploads its local parameters to the cloud server CS for joint aggregation; after execution of the MePC algorithm in the k-th round of federated tasks, E_i uploads the parameters w_{i,k} and acc_{i,k} to the CS over a TLS/SSL secure channel; in the aggregation stage, owing to the unbalanced distribution and data heterogeneity of each ES, the model parameters each ES contributes to aggregation have a crucial influence on the convergence speed; therefore, the parameter weight ratio a_{i,k} of participant E_i in the k-th round of federated aggregation must be considered;

DQN-based reinforcement learning is used to predict the parameter weight ratio, storing information through a Q function to prevent the curse of dimensionality; to better achieve model personalization and reduce the waiting time for uploading weights in MePC-F, DQN is used to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameter w^G_k in the CS update; the reinforcement learning comprises: states, actions, a reward function, and feedback.
6. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 5, wherein in the step S4, the specific methods of the states, actions, reward function, and feedback are as follows:

the states: the state of the k-th round is s_{i,k}, which contains the accuracy difference Δacc_{i,k}, expressed as:

Δacc_{i,k} = acc_{i,k} − acc_{i,k−1}
the actions: the parameter weight ratio a_{i,k} is the action of the k-th round of federated tasks; to avoid being trapped in a locally optimal solution, an ε-greedy algorithm is adopted to optimize the action-selection process to obtain a_{i,k}:

a_{i,k} = argmax_{a∈P} Q(s_{i,k}, a) if rand ≥ ε; a random action in P if rand < ε

where P is the set of weight permutations, rand is a random number with rand ∈ [0,1], and Q(s_{i,k}, a_{i,k}) is the cumulative discounted return obtained over time when the agent takes action a_{i,k} in state s_{i,k}; once the DQN is trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ P} for all actions in the k-th round; each action value represents the maximum expected return obtained by the agent selecting the particular action a_{i,k} in state s_{i,k};
the reward: the reward observed at the end of the k-th round of federated tasks is set as:

r_k = ξ^{Δacc_{i,k}} − 1

where ξ is a positive constant greater than 1, ensuring that r_k grows exponentially with the training accuracy difference Δacc_{i,k}; this first incentivizes the agent to select devices able to achieve higher test accuracy; ξ also controls how r_k changes as Δacc_{i,k} increases; when Δacc_{i,k} < 0, r_k ∈ (−1, 0);
the DQN agent is trained to maximize the expectation of the cumulative discounted reward, as shown by:

E[ Σ_{k=1}^{K} γ^{k−1} · r_k ]

where γ ∈ (0,1] is the factor discounting future rewards;
after obtaining r_k, the cloud server CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of federated tasks; the optimal action-value function Q(s_{i,k}, a_{i,k}) is the objective sought by the RL agent, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:

Q(s_{i,k}, a_{i,k}) = E( r_{i,k} + γ max_a Q(s_{i,k+1}, a) | s_{i,k}, a_{i,k} )

a parameterized value function Q(s_{i,k}, a_{i,k}; w_k) is learned by function approximation to approximate the optimal value function Q(s_{i,k}, a_{i,k}); r_k + γ max_a Q(s_{i,k+1}, a; w_k) is the learning target of Q(s_{i,k}, a_{i,k}; w_k); a DNN is used as the function approximator; the RL learning problem then becomes minimizing the MSE loss between the target and the approximator, defined as:

l(w_k) = ( r_{i,k} + γ max_a Q(s_{i,k+1}, a; w_k) − Q(s_{i,k}, a_{i,k}; w_k) )²

the CS updates the DQN parameter w_k as:

w_{k+1} = w_k − η·∇l(w_k)

where η ≥ 0 is the step size;
after the cloud server CS obtains the optimal learning model, with the k-th round weight-ratio sequence a_{i,k}, the global parameter w^G_k is updated as:

w^G_k = Σ_{i=1}^{n} a_{i,k} · w_{i,k}

all edge servers update to the global parameter w^G_k, and the next T rounds of local training begin.
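The ε-greedy selection and the TD target used in the loss above can be sketched as follows; the Q-table, seed, and helper names are illustrative assumptions (a real DQN would replace the table with a neural approximator):

```python
import numpy as np

def epsilon_greedy(q_row, epsilon, rng):
    """epsilon-greedy action choice over the discrete weight-ratio set P:
    explore uniformly with probability epsilon, otherwise take the
    greedy action argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def td_target(r, q_next_row, gamma):
    """One-step target r + gamma * max_a Q(s', a); the DQN loss regresses
    Q(s, a; w) toward this value."""
    return r + gamma * float(np.max(q_next_row))

rng = np.random.default_rng(1)
Q = np.array([[0.1, 0.5, 0.2],    # Q(s_k, .)
              [0.0, 0.3, 0.9]])   # Q(s_{k+1}, .)
a = epsilon_greedy(Q[0], epsilon=0.0, rng=rng)  # greedy pick
y = td_target(1.0, Q[1], gamma=0.5)             # 1.0 + 0.5 * 0.9
```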
7. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein the HE encryption method in the method specifically comprises the following steps:

the encryption schemes of the weight matrix and the bias vector follow the same idea; the additively homomorphic encryption of a real number a is denoted a^E, and under additively homomorphic encryption, for any two numbers a and b, a^E + b^E = (a+b)^E; any real number r is converted into an encoded rational fixed-point number v as:

v = ⌊r · 2^d⌉, taken modulo the operation modulus

each encoded real number r in the gradient g^A_{i,k} can be expressed as a rational H-bit number consisting of one sign bit, z integer bits, and d fractional bits; thus each encodable rational number is defined by its H = 1 + z + d bits; the encoding permits multiplication operations, which require the operation modulus to be H + 2d bits to avoid overflow;

the decoding is defined as:

r = v / 2^d, with values in the upper half of the modulus range interpreted as negative

multiplication of these encoded numbers requires removal of the factor 1/2^d; when Paillier additive encryption is used, the condition for encoded multiplication can be computed exactly, but only one homomorphic multiplication can be guaranteed; for simplicity, this is handled at decoding time;

the largest encryptable integer is V−1, so the largest encodable real number must be taken into account; the integer bits z and fractional bits d are therefore chosen such that:

V ≥ 2^{H+2d} = 2^{1+z+3d}
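The fixed-point encoding and its additive property can be illustrated without any actual encryption; `encode`/`decode` and the modulus below are assumptions sketching the scheme's arithmetic, not the patented method:

```python
def encode(r, d, modulus):
    """Fixed-point encoding of a real r with d fractional bits, reduced
    modulo the plaintext modulus (negatives wrap to the top of the range,
    as in Paillier-style additive schemes; no actual encryption here)."""
    return int(round(r * (1 << d))) % modulus

def decode(v, d, modulus):
    """Invert encode(); values above modulus // 2 map back to negatives."""
    if v > modulus // 2:
        v -= modulus
    return v / (1 << d)

M = 1 << 32
a = encode(1.5, 16, M)
b = encode(-0.25, 16, M)
s = (a + b) % M   # addition on encodings matches addition of the reals
```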
8. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 1, wherein the optimal loss function in the step S5 is:

min_w Σ_{i=1}^{n} L(w_i)

where L(w_i) represents the loss of the network of E_i.
CN202210816716.3A 2022-07-12 2022-07-12 Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles Active CN115310121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210816716.3A CN115310121B (en) 2022-07-12 2022-07-12 Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles

Publications (2)

Publication Number Publication Date
CN115310121A CN115310121A (en) 2022-11-08
CN115310121B true CN115310121B (en) 2023-04-07

Family

ID=83857637

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant