CN115310121A - Real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles - Google Patents

Real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles

Info

Publication number
CN115310121A
CN115310121A (application CN202210816716.3A)
Authority
CN
China
Prior art keywords
model
data
federal
mepc
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210816716.3A
Other languages
Chinese (zh)
Other versions
CN115310121B (en)
Inventor
朱容波
李梦瑶
刘浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong Agricultural University
Original Assignee
Huazhong Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong Agricultural University filed Critical Huazhong Agricultural University
Priority to CN202210816716.3A priority Critical patent/CN115310121B/en
Publication of CN115310121A publication Critical patent/CN115310121A/en
Application granted granted Critical
Publication of CN115310121B publication Critical patent/CN115310121B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/062 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00 applying encryption of the keys
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles, comprising the following steps: build multiple edge servers E_i and a cloud server CS; each edge server E_i downloads the encrypted initial class-A gradient [G_A^k] from the cloud server CS, decrypts it into G_A^k, randomly initializes a class-B gradient G_B,i^k, and carries out local model training; E_i uses a decoding function to obtain from [G_A,i^k] the partial gradient information to be retained, homomorphically encrypts the remaining gradient information, and broadcasts it through the MePC algorithm to all other edge servers E_j; the class-A gradient information after all edge servers have updated and shared is [G_A,i^k], which all edge servers upload to the cloud server CS, and the cloud server CS aggregates the global parameters through the PreFLa algorithm; the above steps are repeated until a termination condition is reached. The invention prevents data leakage between terminals, realizes data privacy protection, and reduces communication overhead while preventing leakage of the original data.

Description

Real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles
Technical Field
The invention relates to the technical field of real-time security behavior analysis for cooperative processing by networked vehicle users, and in particular to a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles.
Background
With the development of the various real-time communications and services supported by the Internet of Vehicles, the volume of data generated by connected devices such as on-board units has grown unprecedentedly. Faced with massive heterogeneous vehicle-user data and differences in device computing capacity, federated learning provides an effective solution for protecting data during the real-time training of a network model: different edge devices can cooperatively train a machine learning model without exposing their raw data.
Edge computing brings massive data into close contact with users' personal privacy; for example, a user's trajectory, credit card and billing data directly concern the user's privacy security, and leakage of such data would pose great safety hazards. Federated learning can protect data to some extent, but the risk of information leakage still exists, in four types: 1) membership leakage; 2) unintended feature leakage; 3) leakage of class representatives of the original data; and 4) leakage of the original data itself. The last type of leakage is the least acceptable to privacy-sensitive participants.
In order to protect the data privacy of mobile users and solve the problem of raw-data leakage, researchers have conducted extensive research on cryptography-based data protection: differential privacy, homomorphic encryption, and secure multi-party computation. Differential privacy generally uses one of three noise-addition mechanisms: the Laplace mechanism, the Gaussian mechanism, and the exponential mechanism. Context information is perturbed by adding noise to protect data privacy, but if too much noise is added, the performance of model training suffers. Homomorphic encryption commonly comes in additive and multiplicative forms: research shows that noise doubles under Paillier additive homomorphic computation and grows quadratically under El Gamal multiplicative homomorphic computation. To increase data availability and overcome the noise problem, researchers introduced bootstrapping, which reduces noise by setting thresholds for encryption and decryption so that a scheme can evaluate an unlimited number of operations; batch processing, parallel homomorphic computation, or ciphertext compression can also mitigate the noise problem. Secure multi-party computation addresses the problem of multiple participants jointly and securely evaluating an agreed function without a trusted third party; its main purpose is to keep each party's private input independent during the computation and to leak no local data. Research has shown that secure multi-party computation can solve the gradient-leakage problem in federated learning, and that exchanging information only for the first hidden layer already protects the data while preserving accuracy.
However, the information exchange is peer-to-peer, which incurs a large communication overhead.
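The additive homomorphism that such schemes rely on (ciphertext "addition" realized as multiplication of Paillier ciphertexts) can be illustrated with a toy implementation; the tiny fixed primes below are for exposition only and offer no security:

```python
import math
import random

# Toy Paillier keypair (tiny primes -- illustration only, not secure).
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1                       # standard choice g = n + 1
lam = math.lcm(p - 1, q - 1)    # lambda = lcm(p-1, q-1)
# mu = (L(g^lambda mod n^2))^-1 mod n, with L(x) = (x - 1) // n
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts.
a, b = 1234, 5678
c = (encrypt(a) * encrypt(b)) % n2
print(decrypt(c))  # 6912 = 1234 + 5678
```

The doubling of ciphertext noise mentioned above does not appear in plain Paillier (which is exact as long as the plaintext sum stays below n); it concerns the noise growth of lattice-based schemes that bootstrapping addresses.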
Most cryptography-based data security research follows a centralized solution and aims to control the time overhead while protecting data security: federated learning allows edge devices to co-train machine learning models without exposing raw data, typically under a parameter-server architecture in which clients synchronize their local models with a parameter server. This is usually realized synchronously: the central server sends the global model to multiple clients at once, and after training on their local data the clients return the updated models together. Because of stragglers this can be slow; global synchronization is very difficult, especially in a federated learning scenario, owing to limited computing power and battery life and to device availability and completion times that vary widely. A new asynchronous joint-optimization algorithm has therefore been proposed that solves a regularized local problem to guarantee convergence, so that multiple devices and the server can cooperatively and efficiently train the model without revealing privacy.
Although there have been many studies on data security, most are limited to protecting the raw data, and how to simultaneously satisfy the privacy and usability requirements of mobile users' big data in the complex Internet of Vehicles remains open.
First, data in federated learning stays on the local node, which reduces the risk of raw-data leakage during transmission. But even when only gradient information is transmitted, the original data may still be recovered. Data interaction in secure multi-party computation spreads the data across multiple parties, reducing the chance that a sample can be recovered after gradient information leaks. In existing secure multi-party computation, however, every user sends information to every other user, i.e., unicast is used, which brings high time overhead. Therefore, when addressing vehicle users' data security and real-time requirements, it is important to find a suitable solution that both reduces the risk of data being attacked and recovered and lowers transmission delay. Second, because different edge servers differ in data and equipment, the training precision of the overall model must be improved in a targeted way during training. Global parameter aggregation in the typical synchronous federated-averaging manner is slow because of stragglers. While balancing communication time overhead, it is also important to guarantee global precision through personalized training of multiple models. However, most data-security-oriented federated learning algorithms rely on synchronous aggregation, whose high latency challenges the real-time requirements of the Internet of Vehicles. A reinforcement learning based federated learning algorithm is therefore needed to reduce delay, improve accuracy, and guarantee data security.
Disclosure of Invention
The invention aims to solve the technical problem of providing, in view of the defects of the prior art, a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles.
The technical scheme adopted by the invention for solving the technical problems is as follows:
The invention provides a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles, comprising the following steps:
S1. Construct multiple edge servers E_i and a cloud server CS; acquire the vehicle data D = {D_1, D_2, …, D_n}, with each edge server E_i acquiring its corresponding vehicle data D_i.
S2. In the k-th round of the federated task, edge server E_i downloads the encrypted initial class-A gradient [G_A^k] from the cloud server CS and decrypts it into G_A^k, and randomly initializes the class-B gradient G_B,i^k; edge server E_i computes the gradients in local network model training according to its vehicle data D_i, and records the gradient information G_A,i^k after completing T rounds of local training.
S3. Edge server E_i uses its decoding function f_i,k to obtain from [G_A,i^k] the partial gradient information [G_A,i^k]_i to be retained, homomorphically encrypts the remaining gradient information as [G_A,i^k]_-i, and broadcasts it through the MePC algorithm to all other edge servers E_j; using the decoding function, edge server E_i obtains from the other edge servers E_j the corresponding partial gradient information [G_A,j^k]_i. The class-A gradient information after all edge servers have updated and shared is [G_A,i^k], i ∈ [1, n], where n is the total number of edge servers.
S4. All edge servers upload [G_A,i^k] to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; the PreFLa algorithm maximizes a reward through reinforcement learning to select the optimal parameter weight ratio a_i,k of each edge server E_i, and the global gradient parameter [G_A^(k+1)] is aggregated according to a_i,k. The uploading and downloading of the parameters proceed in parallel, and all parameters are encrypted with HE.
S5. Repeat steps S2 to S4 until a termination condition is reached; the cloud server CS computes the final global gradient parameter and issues it to each edge server; the edge servers extract features from the vehicle data and compute the accuracy and the optimal loss function of the MePC-F model to obtain the trained MePC-F model, completing the whole training process, and the model output serves the corresponding Internet of Vehicles service in real time.
Further, in step S2 of the present invention, the specific method for training the local network model is as follows:
A deep neural network (DNN) model is employed; taking the different vehicle data as raw input, the DNN performs end-to-end feature learning and classifier training, using stochastic gradient descent as a subroutine to minimize the loss value in each round of local training.
In the k-th communication round, E_i downloads the base-layer parameters from the cloud server CS, i.e., the encrypted initial class-A gradient [G_A^k], and decrypts it into the class-A gradient G_A^k; it randomly initializes the class-B gradient G_B,i^k, where k ∈ [1, K] and K denotes the total number of rounds of the federated task. If this is the first round of the federated task, CS randomly initializes G_A^1. Before local training, E_i decrypts [G_A^k] with the homomorphic key and records the result as G_A^k.
The loss function of the local model is set as follows:
L(w_i) = l(w_i) + λ(w_i,t − w_i,t+1)^2
where l(·) denotes the loss of the network, the second term is the L2 regularization term, and λ is the regularization coefficient; w_i denotes the total weight information of the local model, w_i,t the weight information of the local model at time t, and w_i,t+1 that at time t+1.
E_i initializes G_k and substitutes it for the model weight parameter w_i, continuing the local model training by minimizing the loss function as follows:
w_i = w_i − ηG_k
where η is the learning rate and G_k stands for either G_A^k or G_B,i^k; G_B,i^k is randomly initialized.
After edge server E_i completes T rounds of local training, it obtains the accuracy acc_i,k of its local model and the updated gradient G_A,i^k.
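The local update above can be sketched in NumPy for a toy linear model. Reading the regularization term, as in proximal-style federated training, as a penalty keeping the local weights near the weights downloaded at the start of the round is an assumption, and the dataset, λ, η, and T below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy local dataset D_i of one edge server: linear-regression targets.
X = rng.normal(size=(64, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.01 * rng.normal(size=64)

def mse(w):
    return float(np.mean((X @ w - y) ** 2))

def local_train(w_global, T=50, eta=0.05, lam=0.1):
    """T rounds of local training minimizing l(w) + lam*||w - w_global||^2
    with the update w <- w - eta * G_k (plain gradient descent here)."""
    w = w_global.copy()
    for _ in range(T):
        grad_l = 2.0 / len(X) * X.T @ (X @ w - y)    # gradient of the MSE loss
        G = grad_l + 2.0 * lam * (w - w_global)       # plus the proximal term
        w -= eta * G
    return w

w0 = np.zeros(5)        # weights downloaded from the cloud server
w_T = local_train(w0)
print(round(mse(w0), 3), round(mse(w_T), 3))
```

With λ > 0 the local optimum is pulled toward the downloaded weights, which damps client drift under heterogeneous data.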
Further, in step S3 of the present invention, the specific method of the MePC algorithm is as follows:
In the k-th federated task, all edge servers use MePC to exchange the base-layer gradients {[G_A,1^k], …, [G_A,n^k]}, where [G_A,n^k] denotes the class-A encrypted data of the n-th edge server in the k-th federated task, [G_A,i^k] the class-A encrypted data of the i-th edge server, and [G_A,i^k]_-i the class-A encrypted data that the i-th edge server broadcasts to the other edge servers, namely [G_A,i^k] with the retained encrypted data of the server itself removed.
To avoid the risk of the data being cracked, in each network a random proportion χ of the gradient G_A,i^k is retained, the same random proportion χ being used by all servers within one federated round, and the remainder of G_A,i^k is then encrypted as [G_A,i^k]_-i. The random proportion χ varies across different rounds of the federated task, with χ ∈ (0, 1/n].
The remaining gradient is homomorphically encrypted and divided into n − 1 parts:
[G_A,i^k]_-i = {[G_A,i^k]_1, …, [G_A,i^k]_(i−1), [G_A,i^k]_(i+1), …, [G_A,i^k]_n}
Only [G_A,i^k]_i is retained at E_i; the other parts and the random parameter χ are broadcast in ciphertext form to the other E_j. In this way, even if part of the transmitted content is attacked, the original data G_A,i^k is not leaked.
The gradient information shared with the other E_j is {[G_A,1^k]_-1, …, [G_A,n^k]_-n}. When E_i receives the data packets sent by the other servers, it performs data verification locally.
Further, in step S3 of the present invention, the specific method for performing data verification locally is as follows:
In the k-th round of the federated task, verification is performed using the corresponding "multiplication" method, with each edge server designing two decoding functions of its own, realized as binary masks,
f_i,k and its complement, where L_0 is the length of the retained part [G_A,i^k]_i and L′ is the length of the broadcast part [G_A,i^k]_-i; the subscript k of the decoding function indicates the k-th round of the federated task;
L_0 = χ·L, L′ = L − L_0
where L is the length of [G_A,i^k], and all [G_A,i^k] are of equal length.
It is required that applying the bitwise AND of all edge servers' decoding functions to the same data packet yields all 0s, and applying the bitwise OR yields all 1s, i.e.:
f_1,k ∧ f_2,k ∧ … ∧ f_n,k = 00…0
f_1,k ∨ f_2,k ∨ … ∨ f_n,k = 11…1
First, an initial decoding function f_i,1 is fixed. A received data packet is multiplied bitwise with the corresponding decoding function in each server; since positions where the binary bit of f_i,k is 0 are multiplied by 0, E_i is guaranteed to obtain only its own part of the data packet, and where the binary bit of f_i,k is 1 it obtains the ciphertext of the gradient information at the corresponding position.
E_i adds all the data-packet arrays obtained from the other edge servers E_j into the corresponding positions to obtain all the ciphertext data, and updates it as the final ciphertext data [G_A,i^k].
As k increases, each time a secure multi-party computation is performed, the decoding function f_i,k of every E_i is cyclically shifted left by m units, ensuring the dynamics of the sharing of [G_A,i^k] and dividing the data equally among E_1, E_2, …, E_n with no part of the data information repeated.
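The masking, broadcast, and selection steps above can be sketched over plaintext vectors (encryption omitted for clarity). The final combination rule, in which each server ends up holding the aggregate of all servers' gradients at the positions its mask selects, is a simplified reading of the scheme, and n, L, and m are illustrative:

```python
import numpy as np

n, L, m = 4, 12, 1      # n edge servers, gradient length L, shift m per round
rng = np.random.default_rng(1)
grads = rng.integers(0, 10, size=(n, L)).astype(float)   # G_A,i of each server

def masks(round_k):
    """Decoding functions f_{i,k}: binary masks partitioning the L positions
    (chi = 1/n here), cyclically shifted left by m units every round."""
    base = np.zeros((n, L))
    for i in range(n):
        base[i, i * (L // n):(i + 1) * (L // n)] = 1
    return np.roll(base, -m * (round_k - 1), axis=1)

f = masks(round_k=1)
# Partition property: bitwise OR of all masks is all 1s, AND is all 0s.
assert np.all(f.max(axis=0) == 1) and np.all(f.prod(axis=0) == 0)

# Each server keeps f_i * g_i and broadcasts the remainder (1 - f_i) * g_i;
# a receiver selects its share by multiplying the packet with its own mask.
shared = np.zeros((n, L))
for i in range(n):
    shared[i] += f[i] * grads[i]                  # retained portion
    for j in range(n):
        if j != i:
            packet = (1 - f[j]) * grads[j]        # E_j's broadcast packet
            shared[i] += f[i] * packet            # E_i takes its positions

# Because the masks are disjoint, server i holds gradient information only
# at its own positions and learns nothing outside them.
assert np.allclose(shared[0], f[0] * grads.sum(axis=0))
```

One broadcast per server replaces the n − 1 unicasts of the peer-to-peer variant, which is where the O(n^2) to O(n) reduction claimed later comes from.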
Further, in step S4 of the present invention, the specific method of the PreFLa algorithm is as follows:
PreFLa adopts reinforcement learning (RL) to select the optimal parameter weight ratio a_i,k for aggregating the global parameter [G_A^(k+1)]. In the uplink communication stage, each edge server not only trains a local model but also uploads its local parameters to the cloud server CS for joint aggregation; after execution of the MePC algorithm in the k-th federated round, E_i uploads its parameters to the CS over a TLS/SSL secure channel. In the aggregation stage, owing to the unbalanced distribution and heterogeneity of each edge server's data, the model parameters each edge server contributes to aggregation have a crucial influence on the convergence speed; it is therefore necessary to consider the parameter weight ratio a_i,k of each participant E_i in the k-th round of federated aggregation.
DQN-based reinforcement learning is used to predict the parameter weight ratios, with a Q function storing the information to avoid the curse of dimensionality of the state space. To better achieve model personalization and to reduce the latency of weight uploading in MePC-F, DQN is used to select the optimal parameter weight ratio a_i,k for aggregating the global parameter in the updated CS. The reinforcement learning comprises: states, actions, a reward function, and feedback.
Further, in step S4 of the present invention, the specific methods of the states, actions, reward function and feedback are as follows:
State: the state of the k-th round is s_i,k, built from the accuracy difference Δacc_i,k of each edge server's local model between consecutive rounds.
Action: the parameter weight ratio a_i,k is the action of the k-th round of the federated task. To avoid being trapped in a locally optimal solution, the ε-greedy algorithm is adopted to optimize the action-selection process: a_i,k = argmax_{a ∈ P} Q(s_i,k, a) when rand ≥ ε, and a random element of P otherwise, where P is the set of candidate weight ratios, rand ∈ [0, 1] is a random number, and Q(s_i,k, a_i,k) denotes the cumulative discounted return the agent obtains by taking action a_i,k in state s_i,k. Once the DQN has been trained to approximate Q(s_i,k, a_i,k), during testing the DQN agent computes {Q(s_i,k, a_i,k) | a_i,k ∈ P} for all actions in the k-th round; each action value represents the maximum expected reward the agent can achieve by selecting the particular action a_i,k in state s_i,k.
Reward: the reward observed at the end of the k-th federated round is set so that r_k grows exponentially with the test-accuracy difference Δacc_i,k, the base being a positive number to guarantee this. The first factor incentivizes the agent to select devices that can achieve higher test accuracy; a second factor controls how r_k changes as Δacc_i,k increases; when Δacc_i,k < 0, r_k ∈ (−1, 0).
The DQN agent is trained to maximize the expectation of the cumulative discounted reward, E[Σ_k γ^(k−1) r_k], where γ ∈ (0, 1] is the factor discounting future rewards.
After obtaining r_k, the cloud server CS saves the multi-dimensional quadruple B_k = (s_i,k, a_i,k, r_k, s_i,k+1) of each round of the federated task. The optimal action-value function Q(s_i,k, a_i,k) is the memo sought by the RL agent, defined as the maximum expectation of the cumulative discounted return starting from s_i,k:
Q(s_i,k, a_i,k) = E(r_i,k + γ max_a Q(s_i,k+1, a) | s_i,k, a_i,k)
A parameterized value function Q(s_i,k, a_i,k; w_k) is learned with function-approximation techniques to approximate the optimal value function Q(s_i,k, a_i,k); r_k + γ max_a Q(s_i,k+1, a) is the target that Q(s_i,k, a_i,k; w_k) learns, and a DNN is used as the function approximator. The RL learning problem then becomes minimizing the MSE loss between the target and the approximator, defined as:
l(w_k) = (r_i,k + γ max_a Q(s_i,k+1, a; w_k) − Q(s_i,k, a_i,k; w_k))^2
CS updates the parameter w_k as:
w_k+1 = w_k − η∇l(w_k)
where η ≥ 0 is the step size.
After the cloud server CS obtains the optimal learning model, it obtains the weight-ratio sequence a_i,k of the k-th round, and the global parameter is updated as:
[G_A^(k+1)] = Σ_{i=1}^{n} a_i,k · [G_A,i^k]
All edge servers update the global parameter [G_A^(k+1)] and begin the next T rounds of local training.
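The ε-greedy selection and the TD target r + γ·max Q used by PreFLa can be sketched with a tabular stand-in for the DQN; the two-state toy environment, the candidate set P, and the reward shape (peaking at a hypothetical best ratio of 0.3) are all illustrative assumptions:

```python
import random

random.seed(0)
P = [0.1, 0.2, 0.3, 0.4]       # candidate parameter weight ratios a_{i,k}
eps, gamma, eta = 0.1, 0.9, 0.5
Q = {}                         # tabular stand-in for the DQN Q(s, a)

def choose_action(s):
    """epsilon-greedy: exploit argmax_a Q(s, a) with prob 1-eps, else explore."""
    if random.random() < eps:
        return random.choice(P)
    return max(P, key=lambda a: Q.get((s, a), 0.0))

def update(s, a, r, s_next):
    """One TD step toward the target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(Q.get((s_next, b), 0.0) for b in P)
    Q[(s, a)] = Q.get((s, a), 0.0) + eta * (target - Q.get((s, a), 0.0))

# Toy environment: two alternating states; the reward peaks when the chosen
# weight ratio matches the (hypothetical) best ratio 0.3.
for k in range(2000):
    s = k % 2
    a = choose_action(s)
    r = 1.0 - abs(a - 0.3)
    update(s, a, r, (k + 1) % 2)

print(max(P, key=lambda a: Q.get((0, a), 0.0)))
```

After enough rounds the greedy action converges to the ratio with the highest reward, mirroring how the agent settles on the weight ratios that maximize the overall return.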
Further, the HE encryption method in the method of the present invention is specifically as follows:
The encryption schemes of the weight matrix and the bias vector follow the same idea. The additive homomorphic encryption of a real number a is denoted a^E; in additive homomorphic encryption, for any two numbers a and b, a^E + b^E = (a + b)^E. Any real number r is converted into an encoded rational fixed-point value v.
Each encoded real number r in the gradient G_A,i^k can be represented as a rational H-bit number consisting of one sign bit, z integer bits and d fractional bits; each encodable rational number is thus defined by its H = 1 + z + d bits. The encoding is performed so as to allow multiplication operations, which require the operation modulus to span H + 2d bits to avoid overflow.
Decoding is defined as removing the scale factor, i.e., multiplying by 2^(−d) after undoing the modular wrap for negative numbers. Multiplication of these encoded numbers requires removal of one factor of 2^(−d); when Paillier additive encryption is used, the coded multiplication can be computed exactly, but only one homomorphic multiplication can be guaranteed; for simplicity it is handled at decoding time.
The largest encryptable integer is V − 1, so the largest encryptable real number must be taken into account; the integer bits z and fractional bits d are therefore chosen such that:
V ≥ 2^(H+2d) ≥ 2^(1+z+3d)
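The encoding can be sketched as follows; the rounding rule and the handling of negatives via modular wrap-around are assumptions consistent with the sign-bit/z/d layout and the decode-by-2^(−d) description, and z = d = 8 is illustrative:

```python
z, d = 8, 8                  # integer and fractional bits
H = 1 + z + d                # sign bit + z integer bits + d fractional bits
V = 2 ** (H + 2 * d)         # modulus, V >= 2^(H+2d) >= 2^(1+z+3d)

def encode(r: float) -> int:
    """Real -> fixed point: scale by 2^d, round, reduce mod V (negatives wrap)."""
    return round(r * 2 ** d) % V

def decode(v: int) -> float:
    """Fixed point -> real: undo the wrap for negatives, divide by 2^d."""
    if v > V // 2:
        v -= V
    return v / 2 ** d

# Addition of codes mod V matches addition of the underlying reals,
# which is what the additive HE layer operates on.
a, b = 3.25, -1.5
print(decode((encode(a) + encode(b)) % V))  # 1.75
```

Two encoded numbers multiply to a value scaled by 2^(2d); the modulus reserves the extra 2d bits for exactly this case, and the surplus scale factor is removed at decode time, matching the single guaranteed homomorphic multiplication noted above.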
further, the optimal loss function in step S5 of the present invention is
Figure BDA0003740951420000104
Wherein, L (w) i ) Denotes E i Loss of the network.
The invention has the following beneficial effects:
(1) A federated learning model for multi-party broadcast secure computation (MePC-F) is presented. The model combines the MePC algorithm and the PreFLa algorithm to solve the problems of training-data security and communication overhead in federated learning in the Internet of Vehicles, and exploits the combined advantages of homomorphic encryption and secure multi-party computation to prevent data leakage between terminals and to reduce the degree to which the original data can be reconstructed after an attack, thereby realizing data privacy protection to the greatest extent.
(2) A secure broadcast multi-party computation, MePC, is presented. For secure multi-party computation, sharing only the gradient information of the first layer greatly reduces the risk of data being recovered and reduces traffic. In the sharing process, each edge server takes its respective part through a decoding function in broadcast mode, so the time complexity is reduced from O(n^2) to O(n), cutting communication overhead while preventing leakage of the original data.
(3) A weight-proportion-based federated learning algorithm, PreFLa, is proposed. PreFLa finds the optimal gradient weight ratios for aggregating the global parameters, and the reward function is designed from the accuracy difference of each edge server, so that the action selection maximizing the overall return yields the weight ratios of each federated round. An L2 regularization term is added to the loss function to promote edge-server cooperation and to reduce the delay and performance problems brought by data heterogeneity; the global model therefore generalizes better and convergence is accelerated.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a MePC-F model of an embodiment of the present invention;
FIG. 2 is a flow chart of a MePC-F model of an embodiment of the present invention;
FIG. 3 is a MePC algorithm of an embodiment of the present invention;
FIG. 4 shows the DLG results on MNIST when the gradient of the first hidden layer is left unhidden or hidden by four methods according to the embodiment of the present invention; (a) FL; (b) MePC-F; (c) PeMPC; (d) Gaussian; (e) Laplacian;
FIG. 5 is the performance of DLG on MNIST of an embodiment of the present invention when the gradient of the first hidden layer is replaced by four methods (Gaussian distribution, Laplacian distribution, PeMPC and MePC-F);
FIG. 6 is the average accuracy and loss on Non-IID MNIST data for an embodiment of the present invention;
FIG. 7 is the average accuracy and loss on Non-IID CIFAR-10 data for an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The parameters involved in the examples of the invention are described below:
TABLE 1 description of the parameters
Figure BDA0003740951420000121
where E_i denotes the current edge server, E_j denotes the edge servers other than the current one, and E_s denotes all the edge servers.
The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles comprises the following steps:
S1, constructing a plurality of edge servers E_i and a cloud server CS; acquiring the vehicle data D = {D_1, D_2, …, D_i}, with each edge server E_i acquiring its corresponding vehicle data D_i.
S2, in the k-th round of the federal task, edge server E_i downloads the encrypted initial type-A gradient from the cloud server CS and decrypts it, and randomly initializes its type-B gradient. Edge server E_i computes gradients in local network model training according to its vehicle data D_i, and records its gradient information after finishing T rounds of local training.
S3, edge server E_i extracts, by means of its decoding function, the partial gradient information to be kept from its type-A gradient, homomorphically encrypts the remaining gradient information, and broadcasts it to all the other edge servers E_j through the MePC algorithm; edge server E_i likewise obtains, according to the decoding function, the corresponding partial gradient information from the other edge servers E_j. The updated and shared type-A gradient information of all edge servers is thereby obtained for i ∈ [1, n], where n is the total number of edge servers.
S4, all edge servers upload their shared type-A gradient information to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; the PreFLa algorithm selects the optimized parameter weight ratio a_{i,k} of each edge server E_i by maximizing the return obtained through reinforcement learning, and the global parameters are aggregated according to a_{i,k}. The uploading and downloading of parameters proceed in parallel, and all parameters are encrypted by HE.
S5, repeating steps S2 to S4 until a termination condition is reached, at which point the whole training process is finished. The termination condition may be a maximum number of training rounds, convergence of the loss function, or another user-defined condition. Finally, the optimal loss function is obtained according to formula (1):
Figure BDA00037409514200001315
where L(w_i) denotes the loss of the network of E_i.
The specific method of local training is as follows:
In the local model phase, a deep neural network (DNN) is employed to learn the cloud model and the ES models. The DNN performs end-to-end feature learning and classifier training by taking the different users' data as raw inputs. Stochastic gradient descent is used as a subroutine in the proposed algorithm to minimize the loss value in each round of local training.
In the downlink communication phase, E_i downloads the base-layer parameters from the CS in the k-th (k ∈ [1, K]) round of communication and randomly initializes the personalization-layer parameters, where K denotes the total number of rounds of the federal task. If this is the first federal task, the CS initializes the base-layer parameters randomly. Before local training, E_i must decrypt the downloaded base-layer parameters by homomorphic decryption (formula (4)) and record the resulting plaintext.
In order to better embody the model personalization, the loss function of the local model is set as follows:
L(w i )=l(w i )+λ(w i,t -w i,t+1 ) 2 (16)
where l () represents the loss of the network, e.g., the cross-entropy loss of the classification task. The second term is an L2 regularization term, which can not only keep the individuation capability of the second term, but also improve the cooperation efficiency with other participants. λ is the regularization coefficient.
E_i initializes G_k, replaces the model's weight parameters w_i, and continues local model training as
w_i = w_i − ηG_k    (17)
where η is the learning rate and G_k is the collective notation for the base-layer and personalization-layer gradients, the personalization-layer part being randomly initialized. After E_i completes T rounds of local training, the accuracy acc_{i,k} of each local model and the corresponding base-layer and personalization-layer gradients are obtained.
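The local update of equations (16)-(17) can be sketched as follows; this is an illustrative sketch only (hypothetical scalar weights and plain Python lists standing in for the network tensors), not the patent's implementation:

```python
# Illustrative sketch: the local loss adds an L2 proximal term
# lambda * (w_t - w_{t+1})^2 to the task loss (16), and the weights
# take one SGD step w <- w - eta * G (17).
def local_loss(task_loss, w_prev, w_curr, lam):
    # L(w_i) = l(w_i) + lambda * (w_{i,t} - w_{i,t+1})^2
    return task_loss + lam * (w_prev - w_curr) ** 2

def sgd_step(weights, grads, eta):
    # elementwise w_i = w_i - eta * G_k
    return [w - eta * g for w, g in zip(weights, grads)]

w = sgd_step([0.5, -0.2], [0.1, -0.4], eta=0.01)
assert abs(w[0] - 0.499) < 1e-12 and abs(w[1] + 0.196) < 1e-12
assert abs(local_loss(1.0, 0.5, 0.4, lam=2.0) - 1.02) < 1e-9
```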
Direct sharing of user information between the terminals is prohibited, and the data in the edge server must be encrypted before communication so that it cannot be attacked in transit. This process uses HE to avoid information leakage; the procedure for applying additive HE to real numbers is shown below. The encryption schemes for the weight matrix and the bias vector follow the same idea. The additive homomorphic encryption of a real number a is written a^E; in additive homomorphic encryption, for any two numbers a and b, a^E + b^E = (a + b)^E. Any real number r is converted into an encoded fixed-point rational number v as:
v = round(r · 2^d) mod V
Each encoded real number r in the gradient can be represented as a rational H-bit number consisting of one sign bit, z integer bits and d fractional bits, so each encodable rational number is defined by its H = 1 + z + d bits. The encoding is designed to permit multiplication, which requires the operating modulus to span H + 2d bits to avoid overflow.
Decoding is defined as:
decode(v) = v / 2^d if v < V/2, and (v − V) / 2^d otherwise    (4)
Multiplying two encoded numbers introduces an extra scale factor of 2^d that must be removed. When Paillier additive encryption is used, the encoded multiplication can be computed exactly, but only one homomorphic multiplication can be guaranteed; for simplicity, the extra factor is handled at decoding time, which is correct provided only one encoded multiplication has taken place. Since the largest encryptable integer is V − 1, the largest encryptable real number must take this into account; the integer bits z and the fractional bits d must therefore be chosen such that:
V ≥ 2^(H+2d) ≥ 2^(1+z+3d)    (5)
After encryption, the base-layer gradient of E_i and the accuracy acc_{i,k} are represented by their respective ciphertexts, written with the superscript E.
the specific method of the MePC algorithm is shown in FIG. 3.
In the k-th federal task, MePC is used to exchange the base-layer gradients. To avoid the risk of the data being cracked, a random proportion χ of the base-layer gradient is kept in each network, and the same random proportion χ is used within the same federal round; across different rounds of the federal task, the random proportion χ (0 < χ ≤ 1/n) varies. The remaining gradient is divided equally into n − 1 parts, as shown in FIG. 3. Only the kept proportion is retained at E_i; the other parts and the random parameter χ are broadcast to the other ESs in ciphertext form. In this way, even if part of the transmitted content is attacked, the original data does not leak: an attacker who wants to obtain the data must acquire all of its parts, yet the parts and χ are kept in ciphertext form through homomorphic encryption during communication between participant E_i and receiver E_j. The gradient information shared to the other ESs thus consists of the n − 1 encrypted parts.
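The keep-and-share split described above can be sketched as follows; the contiguous index layout is an illustrative assumption, and encryption is omitted:

```python
# Hedged sketch of the MePC split: keep a fraction chi of the base-layer
# gradient locally (L0 = chi * L) and divide the remainder into n - 1
# equal shares for the other edge servers.
def split_gradient(grad, chi, n):
    L = len(grad)
    L0 = int(chi * L)                      # length kept at E_i
    kept, rest = grad[:L0], grad[L0:]
    share = len(rest) // (n - 1)           # equal length for each peer
    shares = [rest[i * share:(i + 1) * share] for i in range(n - 1)]
    return kept, shares

g = list(range(10))
kept, shares = split_gradient(g, chi=0.2, n=5)
assert len(kept) == 2 and all(len(s) == 2 for s in shares)
# the kept part plus all shares reconstruct the original gradient
assert kept + [x for s in shares for x in s] == g
```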
When E_i receives a data packet sent by another server, it performs data verification locally, using a corresponding "multiplication" method. Each edge server designs its own two decoding functions, where L_0 is the length of the part kept locally and L' is the length of each shared part:
L_0 = χ · L    (9)
L' = (1 − χ) · L / (n − 1)
where L is the length of the full base-layer gradient and all the shared parts are of equal length. The decoding functions of all the ESs are required to yield all 0s under the "AND" operation on the same data packet and all 1s under the "OR" operation, i.e. the masks of different servers are disjoint and together cover every position.
First, the decoding functions are initialized. Note that at initialization, the decoding functions that different E_i apply to the data packets transmitted in the same federal task are the same.
Each data packet is multiplied by the corresponding decoding function in the other servers. Because every binary 0 bit of the decoding function multiplies the packet entry by 0, E_i is guaranteed to obtain only its own part of the data packet; wherever the binary bit of the decoding function is 1, the ciphertext of the gradient information at the corresponding position is obtained. E_i adds all the data-packet arrays obtained from the other ESs at the corresponding positions to obtain the complete ciphertext data and updates it as the final shared gradient. Each time a secure multi-party computation is performed, the decoding function of each E_i is cyclically shifted left by m units as k increases, which keeps the sharing dynamic and allows the gradient to be divided equally among E_1, E_2, …, E_n with no part of the data information repeated.
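The mask behaviour of the decoding functions can be illustrated as follows; the modular partition and the rotation step are hypothetical choices standing in for the patent's functions:

```python
# Illustrative sketch: each server holds a 0/1 mask; masks of different
# servers are pairwise disjoint and jointly cover every position, so
# multiplying a packet by a mask yields exactly one server's share.
def make_masks(length, n):
    # position p belongs to server p % n -- one hypothetical partition
    return [[1 if p % n == i else 0 for p in range(length)] for i in range(n)]

def extract(packet, mask):
    # a 0 bit wipes the entry, a 1 bit keeps the value at that position
    return [x * b for x, b in zip(packet, mask)]

def rotate_left(mask, m):
    # cyclic left shift by m units between rounds keeps the sharing dynamic
    return mask[m:] + mask[:m]

masks = make_masks(6, 3)
# disjoint cover: every position is claimed by exactly one server
assert all(sum(m[p] for m in masks) == 1 for p in range(6))
packet = [10, 20, 30, 40, 50, 60]
assert extract(packet, masks[0]) == [10, 0, 0, 40, 0, 0]
assert rotate_left(masks[0], 2) == masks[1]
```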
The specific method of the PreFLa algorithm is as follows:
Data distribution in the Internet of Vehicles is dispersed, unbalanced and heterogeneous, which makes it difficult to improve personalized service while meeting real-time requirements. To prevent user privacy from leaking during communication between different edge servers, HE is used to encrypt the parameters in transit. To better realize personalized training for different user data, the first layer is set as the base layer and trained cooperatively with the existing federal learning method, while the other layers serve as personalization layers trained locally, so that the personal information of different ES devices can be captured. In this way, after the joint training process, the globally shared base layer can be transferred into each ES to build its own personalized deep learning model with its unique personalization layers. Only the base-layer parameters are downloaded from the CS; the personalization-layer parameters are randomly generated and fine-tuned on local data. To meet the real-time requirement and the personalization needs of the ESs, PreFLa adopts reinforcement learning (RL) to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters.
In the uplink communication phase, each ES not only trains its local model but also uploads its local parameters to the CS for joint aggregation. After the MePC algorithm is executed in the k-th federal round, E_i uploads its parameters to the CS over a TLS/SSL secure channel. In the aggregation stage, owing to the unbalanced distribution and heterogeneity of each ES's data, the model parameters used for aggregation have a crucial influence on the convergence speed; it is therefore necessary to consider the parameter weight ratio a_{i,k} of participant E_i in the k-th round of federal aggregation.
In the present invention, DQN-based reinforcement learning is used to predict the parameter weight ratio, storing information through a Q function rather than the table storage of Q-Learning so as to avoid the curse of dimensionality. To better realize model personalization and reduce the waiting time for uploading weights in MePC-F, the DQN selects the optimal parameter weight ratio a_{i,k} for aggregating the global parameters updated in the CS.
The reinforcement learning setting contains states, actions, a reward function and feedback, defined as follows:
State: the state s_{i,k} of the k-th round is constructed from the accuracy difference Δacc_{i,k} of each edge server, i.e. the change in its test accuracy between rounds.
Action: the parameter weight ratio a_{i,k} is taken as the action of the k-th round of the federal task. To avoid falling into a locally optimal solution, an ε-greedy algorithm is adopted to optimize the action-selection process and obtain a_{i,k}: with probability ε a random action is drawn from P, and otherwise the action maximizing Q(s_{i,k}, a_{i,k}) is selected, where P is the set of weight permutations, rand is a random number (rand ∈ [0,1]) compared against ε, and Q(s_{i,k}, a_{i,k}) denotes the cumulative discounted return of the agent taking action a_{i,k} in state s_{i,k}. Once the DQN has been trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ P} for all actions in the k-th round; each action value represents the maximum expected return the agent can achieve by selecting the particular action a_{i,k} in state s_{i,k}.
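The ε-greedy selection can be sketched as follows; the set P and the Q table are hypothetical stand-ins for the weight-ratio permutations and the trained DQN:

```python
import random

# Sketch of epsilon-greedy action selection: explore with probability
# epsilon, otherwise take the argmax of Q over the candidate actions.
def epsilon_greedy(q, P, epsilon, rng=random.random):
    if rng() < epsilon:
        return random.choice(P)        # explore: random weight ratio from P
    return max(P, key=lambda a: q[a])  # exploit: argmax_a Q(s, a)

P = [0.1, 0.2, 0.3]
q = {0.1: 0.5, 0.2: 1.2, 0.3: 0.9}
# with rand above epsilon, the greedy action (largest Q value) is chosen
assert epsilon_greedy(q, P, epsilon=0.1, rng=lambda: 0.99) == 0.2
# with rand below epsilon, some random action from P is chosen
assert epsilon_greedy(q, P, epsilon=0.1, rng=lambda: 0.0) in P
```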
Reward: the reward observed at the end of the k-th federal round is set as
r_k = ξ^(Δacc_{i,k}) − 1
where the base ξ is a positive number greater than 1, ensuring that r_k grows exponentially with the test-accuracy difference Δacc_{i,k}. The first term encourages the agent to select devices that achieve higher test accuracy, with ξ controlling how r_k changes as Δacc_{i,k} grows. In general, model accuracy increases ever more slowly as machine-learning training progresses; in the federal cooperative task, however, accuracy may even decrease because of imbalanced and heterogeneous data distributions, so as FL enters its late stage the exponential term amplifies small gains in accuracy. The second term, −1, encourages the agent to improve model accuracy, because when Δacc_{i,k} < 0 we have r_k ∈ (−1, 0).
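A hedged sketch of a reward with the stated properties (the function form and the symbol for the base are our reconstruction; the experimental section sets the positive constant to 3):

```python
# Hypothetical round-k reward: r_k = xi ** delta_acc - 1 for a base
# xi > 1, giving exponential growth in the accuracy gain and
# r_k in (-1, 0) whenever the accuracy drops.
def reward(delta_acc, xi=3.0):
    return xi ** delta_acc - 1.0

assert reward(0.0) == 0.0
assert -1.0 < reward(-0.5) < 0.0        # accuracy drop: r_k in (-1, 0)
assert reward(0.2) > reward(0.1) > 0.0  # larger gains earn larger rewards
```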
The DQN agent is trained to maximize the expectation of the cumulative discounted reward
E[Σ_k γ^k · r_k]
where γ ∈ (0, 1] is the factor discounting future rewards.
After obtaining r_k, the CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of the federal task. The optimal action-value function Q(s_{i,k}, a_{i,k}) is what the RL agent seeks; it is defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:
Q(s_{i,k}, a_{i,k}) = E(r_{i,k} + γ max Q(s_{i,k+1}, a_{i,k+1}) | s_{i,k}, a_{i,k})    (22)
A parameterized value function Q(s_{i,k}, a_{i,k}; w_k) can then be learned with function-approximation techniques to approximate the optimal function Q(s_{i,k}, a_{i,k}); in each step, r_k + γ max Q(s_{i,k+1}, a_{i,k+1}; w_k) is the target of the learning. Generally, a DNN is used as the function approximator, and the RL learning problem becomes minimizing the MSE loss between the target and the approximator, defined as:
l(w_k) = (r_{i,k} + γ max Q(s_{i,k+1}, a_{i,k+1}; w_k) − Q(s_{i,k}, a_{i,k}; w_k))^2    (23)
The CS updates the global parameter w_k by gradient descent:
w_k ← w_k − η∇l(w_k)
where η ≥ 0 is the step size.
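The temporal-difference objective of equation (23) can be sketched as follows; a dict-of-dicts Q table stands in for the parameterized network:

```python
# Sketch of the DQN TD loss: the target is r + gamma * max_a' Q(s', a')
# and the loss is its squared error against Q(s, a).
def td_loss(q, s, a, r, s_next, gamma):
    target = r + gamma * max(q[s_next].values())
    return (target - q[s][a]) ** 2

q = {"s0": {"a0": 0.5, "a1": 0.2},
     "s1": {"a0": 1.0, "a1": 0.4}}
loss = td_loss(q, "s0", "a0", r=0.1, s_next="s1", gamma=0.9)
assert abs(loss - (0.1 + 0.9 * 1.0 - 0.5) ** 2) < 1e-12
```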
The CS repeats the above steps to obtain the best learning model. Having obtained the k-th-round weight-ratio sequence a_{i,k}, the CS updates the global parameters as the aggregation of the uploaded parameters weighted by a_{i,k}. All the ESs then download the updated global parameters and begin the next T rounds of local training.
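The weighted global aggregation can be sketched in a few lines; flat lists stand in for the parameter tensors, and the weights are assumed to sum to 1:

```python
# Minimal sketch: the CS combines the uploaded base-layer parameters
# using the learned weight ratios a_{i,k}.
def aggregate(uploads, weights):
    assert abs(sum(weights) - 1.0) < 1e-9
    dim = len(uploads[0])
    return [sum(w * u[j] for w, u in zip(weights, uploads))
            for j in range(dim)]

uploads = [[1.0, 2.0], [3.0, 4.0]]
assert aggregate(uploads, [0.25, 0.75]) == [2.5, 3.5]
```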
Experimental test examples:
To verify the validity of the proposed mechanism, experimental results and analysis are given. Consider a system with 1 cloud server and 10 edge servers. The experimental learning rate α is 0.01 and the discount factor γ is 0.9; the positive base of the reward function is set to 3. The parameter values are shown in Table 2.
TABLE 2 parameter settings
Figure BDA0003740951420000205
The validity of the proposed model was verified on two datasets: MNIST and CIFAR-10. The performance of the proposed federal learning model MePC-F was evaluated in terms of DLG-reconstructed images, average accuracy and average loss. First, the defensive performance of five schemes against DLG attacks was examined; then the proposed federal learning model MePC-F was compared with the centralized scheme and PeMPC. All results in the following scenarios are mean values over 1000 independent experiments.
1) Performance against DLG attacks
This section evaluates the effectiveness of MePC-F against DLG image reconstruction and compares it with FL, PeMPC and DP algorithms (Gaussian- and Laplace-distributed noise). The common gradient of the network is computed for a single image on the MNIST dataset; the results of the different schemes are shown in FIG. 4. Since studies [17] indicate that hiding the gradient of the first layer can hinder reconstruction of the data, the gradient of the first layer (weight and bias terms) is replaced by four methods, namely the proposed MePC-F, PeMPC, Gaussian-distributed (μ=0, σ=1) noise and Laplace-distributed (μ=0, σ=1) noise, to observe the behavior of DLG. After the first-layer gradient is hidden, DLG uses these gradients to recover the image that created the common shared gradient.
As can be seen from FIG. 4, the DLG process can accurately reconstruct the training data when no method hides the gradient of the first layer (FL in FIG. 4(a)). When the gradient of the first layer is protected by the proposed method MePC-F, information leakage is effectively prevented (FIG. 4(b)); even after 500 iteration steps, DLG still cannot construct the image. FIG. 4(c) shows results similar to FIG. 4(b): PeMPC can also defend against the DLG attack. As can be seen from FIG. 4(d), with Gaussian noise added to the first layer, the reconstructed image is partially revealed from round 15 to round 20, by which point the basic contour of the original image has been constructed; as the number of iteration rounds increases to 500, the image can be restored clearly. The Laplacian noise in FIG. 4(e) shows a similar phenomenon.
As can be seen from FIG. 5, if a malicious server receives the gradients of all hidden layers in plain text, the reconstruction process obtains the lowest gradient loss and image MSE (green line in FIG. 5). As the number of rounds increases, PeMPC and MePC-F do not converge to zero, and the MSE of the image reaches 10^7. Adding Laplacian or Gaussian noise to the original gradient converges to 10^-5; FIG. 4 likewise shows that the data can be reconstructed by around round 20. The larger the MSE of the image, the less likely it is that the image can be reconstructed.
Based on the above experimental results, it is verified that adding Laplacian or Gaussian noise to the original gradient can prevent early partial gradient leakage, but as the number of rounds increases the original data is still recovered through deep leakage. In contrast, PeMPC and MePC-F are effective at preventing DLG attacks from reconstructing the raw data no matter how many training rounds elapse.
2) Performance comparison of average accuracy and average penalty
In this section, the effectiveness of MePC-F is evaluated and compared with the centralized scheme and PeMPC in terms of average accuracy and average loss on the MNIST and CIFAR-10 datasets.
FIG. 6(a) shows the number of rounds required by each model to achieve 98% accuracy on the MNIST dataset. The average accuracy of all three methods increases with the number of training rounds. The centralized approach needs 25 rounds to reach the target accuracy on MNIST, PeMPC needs 140 rounds, and MePC-F needs 40 rounds, i.e. MePC-F requires 71.2% fewer training rounds than PeMPC. The reason is that the proposed reinforced federal learning algorithm PreFLa can find a better aggregation parameter weight a_{i,k} through interaction with the environment, cope better with Non-IID data, accelerate model convergence and reach the target accuracy sooner. The centralized scheme is trained on the combination of all data, so its accuracy is higher than that of the federal learning algorithms; nevertheless, the figure shows that the converged accuracy of PeMPC almost reaches the centralized accuracy.
FIG. 6(b) shows that the average loss of the three schemes decreases as the number of training rounds increases. For the centralized scheme, the average loss falls from 0.233 to 0.052; the average loss of PeMPC falls from 0.35 to 0.084; meanwhile, the average loss of the proposed MePC-F falls to 0.06, which is 28.6% lower than that of PeMPC. The proposed MePC-F almost reaches the centralized loss value when the number of training rounds reaches 100.
FIG. 7(a) shows the number of rounds required for the models to achieve the target accuracy of 50% on CIFAR-10, with results similar to FIG. 6(a). The average accuracy of all three models increases until the target value is reached. For the centralized scheme, the average accuracy rises from 0.42 to 0.5 in 23 rounds; for PeMPC, from 0.372 to 0.5 in 89 rounds; meanwhile, the proposed MePC-F reaches the target accuracy at 41 rounds, 53.9% fewer than PeMPC. FIG. 7(a) shows that MePC-F updates the global model with a better weight a_{i,k} than PeMPC, which yields a faster convergence speed.
As can be seen from FIG. 7(b), the average loss of the three schemes decreases until a stable value is reached. The centralized scheme, MePC-F and PeMPC reach their minimum loss values in that order, so the time efficiency of the proposed MePC-F is better than that of PeMPC.
TABLE 3 Top accuracy of the three schemes within 100 rounds

Scheme        MNIST    CIFAR-10
Centralized   98.4%    51.4%
MePC-F        98.2%    51.1%
PeMPC         97.6%    49.2%

Table 3 gives the accuracy of the three schemes within 100 rounds. On MNIST, the average accuracy of the proposed MePC-F is 98.2%, 0.6% higher than that of PeMPC, and almost reaches the accuracy of centralized training. On CIFAR-10, the average accuracy of MePC-F reaches 0.511 at 100 rounds, 1.9% higher than that of PeMPC. This shows that MePC-F aggregates the global parameters with the optimal weight a_{i,k} better than PeMPC, resulting in higher accuracy, closer to the centralized accuracy.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (8)

1. A real-time reinforced federal learning data privacy security method based on a MePC-F model in the Internet of Vehicles, characterized by comprising the following steps:
S1, constructing a plurality of edge servers E_i and a cloud server CS; acquiring the vehicle data D = {D_1, D_2, …, D_i}, with each edge server E_i acquiring its corresponding vehicle data D_i;
S2, in the k-th round of the federal task, edge server E_i downloads the encrypted initial type-A gradient from the cloud server CS and decrypts it, and randomly initializes its type-B gradient; edge server E_i computes gradients in local network model training according to its vehicle data D_i, and records its gradient information after finishing T rounds of local training;
S3, edge server E_i extracts, by means of its decoding function, the partial gradient information to be kept from its type-A gradient, homomorphically encrypts the remaining gradient information, and broadcasts it to all the other edge servers E_j through the MePC algorithm; edge server E_i obtains, according to the decoding function, the corresponding partial gradient information from the other edge servers E_j; the updated and shared type-A gradient information of all edge servers is thereby obtained for i ∈ [1, n], where n is the total number of edge servers;
S4, all edge servers upload their shared type-A gradient information to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; the PreFLa algorithm selects the optimized parameter weight ratio a_{i,k} of each edge server E_i by maximizing the return obtained through reinforcement learning, and the global gradient parameter is aggregated according to a_{i,k}; the uploading and downloading of parameters proceed in parallel, and all parameters are encrypted by HE;
S5, repeating steps S2 to S4 until a termination condition is reached; the cloud server CS calculates the final global gradient parameter and issues it to each edge server; the edge servers extract the accuracy and the optimal loss function of the MePC-F model according to the characteristics of the plurality of vehicle data, obtaining the trained MePC-F model, completing the whole training process, and outputting the model in real time to the corresponding Internet-of-Vehicles service.
2. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the Internet of Vehicles according to claim 1, characterized in that in step S2, the specific method for training the local network model is as follows:
a deep neural network (DNN) model is employed; the DNN performs end-to-end feature learning and classifier training by taking different vehicle data as raw inputs, using stochastic gradient descent as a subroutine to minimize the loss value in each round of local training;
E_i downloads the base-layer parameters, namely the encrypted initial type-A gradient, from the cloud server CS in the k-th round of communication, decrypts them into the type-A gradient, and randomly initializes the type-B gradient, where k ∈ [1, K] and K denotes the total number of rounds of the federal task; if this is the first round of the federal task, the CS initializes the type-A gradient randomly; before local training, E_i decrypts the downloaded ciphertext by homomorphic decryption and records the result;
the loss function of the local model is set as:
L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})^2
where l(·) denotes the loss of the network, the second term is an L2 regularization term, and λ is the regularization coefficient; w_i denotes the total weight information of the local model, w_{i,t} is the weight information of the local model at time t, and w_{i,t+1} is the weight information of the local model at time t + 1;
E_i updates G_k and replaces the weight parameters w_i of the model, continuing local model training by minimizing the loss function as:
w_i = w_i − ηG_k
where η is the learning rate and G_k is the collective notation for the type-A and type-B gradients, the type-B part being randomly initialized;
after edge server E_i completes T rounds of local training, the accuracy acc_{i,k} of each local model and the corresponding gradients are obtained.
3. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein in the step S3, the specific method of the MePC algorithm is as follows:

in the kth federal task, all edge servers use MePC to exchange the base-layer gradients [formula image], where G_{k,n}^A denotes the class-A encrypted data of the nth edge server in the kth round of the federal task, G_{k,i}^A denotes the class-A encrypted data of the ith edge server in the kth round, and G_{k,i}^{A*} denotes the class-A encrypted data that the ith edge server broadcasts to the other edge servers in the kth round; G_{k,i}^{A*} is G_{k,i}^A with the encrypted data retained by the server itself removed;

to avoid the risk of the data being cracked, a random proportion χ of the gradient is retained in each network, the random proportion χ being kept equal within the same federal round, and the gradient is then encrypted; the random proportion χ varies across different rounds of the federal task, with χ ∈ [1/n, 1];

the remaining gradient is homomorphically encrypted, and the resulting ciphertext is divided into n − 1 parts: [formula image]; only the server's own share is retained at E_i, while the other parts and the random parameter χ are broadcast to the other servers E_j in ciphertext form; in this way, even if part of the transmitted content is attacked, the original data is not leaked;

the gradient information shared to the other E_j is [formula images]; when E_i receives the data packets sent by the other servers, it performs data verification locally.
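The retain-and-split step above can be sketched as follows (the shapes, the permutation-based split, and the identity copy standing in for Paillier-style homomorphic encryption are all assumptions for illustration):

```python
import numpy as np

def split_gradient(grad, n, chi, rng):
    """Retain a random fraction chi of the gradient locally and split the
    remainder into n - 1 shares for the other edge servers.
    (The plain copy below stands in for the HE ciphertext.)"""
    L = grad.size
    keep = int(chi * L)                        # L0 = chi * L retained positions
    perm = rng.permutation(L)
    retained_idx = perm[:keep]
    shared_idx = np.array_split(perm[keep:], n - 1)
    enc = grad.copy()                          # placeholder for encryption
    retained = (retained_idx, enc[retained_idx])
    shares = [(idx, enc[idx]) for idx in shared_idx]   # broadcast these
    return retained, shares

rng = np.random.default_rng(0)
grad = rng.normal(size=20)
retained, shares = split_gradient(grad, n=5, chi=0.2, rng=rng)
print(len(retained[0]), [len(idx) for idx, _ in shares])  # 4 [4, 4, 4, 4]
```

Because each server broadcasts only n − 1 partial shares and keeps its own fraction, an attacker who intercepts some packets never sees the complete gradient, which is the property the claim relies on.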
4. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 3, wherein in the step S3, the specific method for performing data verification locally is as follows:

in the kth round of the federal task, verification is performed using the corresponding "multiplication" method, with each edge server designing two decoding functions of its own, as follows: [formula images], where L_0 is the length of the retained packet and L′ is the length of each shared packet; the subscript k of the decoding function denotes the decoding function in the kth round of the federal task;

L_0 = χ·L and L′ = (1 − χ)·L/(n − 1), where L is the length of the gradient ciphertext, and the two decoding functions are of equal length;

it is required that the decoding functions of all the edge servers, applied to the same data packet, yield all 1s under the "union" operation and all 0s under the "intersection" operation, i.e., the masks jointly cover every position and are pairwise disjoint: [formula images];

first, the initial decoding function is as follows: [formula image];

the data packet [formula image] is multiplied by the corresponding decoding functions in the other servers; since the binary bits equal to 0 multiply the corresponding positions to 0, E_i is guaranteed to obtain only its own part of the data packet; when a binary bit in the decoding function is 1, the ciphertext of the gradient information at the corresponding position is obtained: [formula image];

E_i adds, at the corresponding positions, all the data-packet arrays obtained from the other edge servers E_j to recover all the ciphertext data, and updates it into the final ciphertext [formula image], namely: [formula image];

as k increases, each time a secure multiparty computation is performed, the decoding function of each E_i is cyclically shifted left by m units, ensuring the dynamics of the sharing and dividing it equally among E_1, E_2, …, E_n with no part's data information repeated.
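A toy sketch of the decoding-mask properties above (assuming, as the claim requires, binary masks that are pairwise disjoint, jointly cover every position, and are cyclically shifted between rounds; the contiguous split is illustrative):

```python
import numpy as np

def make_masks(n, L):
    """n disjoint binary masks of length L that together cover all bits:
    the intersection of any two is all 0s, the union of all is all 1s."""
    masks = np.zeros((n, L), dtype=int)
    for j, idx in enumerate(np.array_split(np.arange(L), n)):
        masks[j, idx] = 1
    return masks

def rotate(mask, m):
    """Cyclic left shift by m units between rounds of secure computation."""
    return np.roll(mask, -m)

L, n = 12, 3
masks = make_masks(n, L)
packet = np.arange(1, L + 1)                   # stand-in ciphertext array
parts = [packet * masks[j] for j in range(n)]  # positionwise "multiplication"
recovered = sum(parts)                         # add shares position-wise
print(np.array_equal(recovered, packet))       # True: all data recovered
```

Multiplying by a mask zeroes every position the server does not own, and summing the masked arrays recovers the full packet exactly once per position, which mirrors the union/intersection requirement in the claim.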
5. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein in the step S4, the specific method of the PreFLa algorithm is as follows:

PreFLa adopts reinforcement learning (RL) to adaptively select the optimal parameter weight ratio a_{i,k} for aggregating the global parameter [formula image];

in the uplink communication stage, each edge server not only trains a local model but also uploads its local parameters to the cloud server CS for joint aggregation; after the MePC algorithm is executed in the kth federal round, E_i uploads the parameters [formula images] to the CS over a TLS/SSL secure channel; in the aggregation stage, owing to the unbalanced distribution and heterogeneity of the data at each ES, the model parameters each ES contributes to the aggregation have a crucial influence on the convergence speed; it is therefore necessary to consider the parameter weight ratio a_{i,k} of participant E_i in the kth round of federal aggregation;

DQN-based reinforcement learning is used to predict the parameter weight ratio, storing information through a Q function to avoid the curse of dimensionality of the state space; to better realize model personalization and reduce the waiting time for uploading weights in MePC-F, DQN is used to select the optimal parameter weight ratio a_{i,k} for aggregating and updating the global parameter [formula image] in the CS; the reinforcement learning comprises: states, actions, a reward function, and feedback.
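The weighted aggregation performed by the CS can be sketched as follows (hypothetical shapes; in the claim the weights a_{i,k} come from the DQN of claim 6 rather than being fixed):

```python
import numpy as np

def aggregate(local_params, weights):
    """Global parameter = sum_i a_{i,k} * w_i, with weights normalized to 1."""
    a = np.asarray(weights, dtype=float)
    a = a / a.sum()
    return sum(ai * wi for ai, wi in zip(a, local_params))

local_params = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
g = aggregate(local_params, weights=[1, 3])
print(g)  # [2.5 3.5]
```

Skewing the weights toward better-performing participants (here the second server gets 0.75 of the mass) is exactly the degree of freedom the RL agent optimizes.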
6. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 5, wherein in the step S4, the specific methods of the states, actions, reward function and feedback are as follows:

state: the state of the kth round is [formula image], where the accuracy difference Δacc_{i,k} is expressed as: [formula image];

action: the parameter weight ratio a_{i,k} is the action of the kth round of the federal task; to avoid being trapped in a local optimum, an ε-greedy algorithm is adopted to optimize the action-selection process and obtain a_{i,k}: [formula image], where P is the set of weight permutations, rand is a random number with rand ∈ [0,1], and Q(s_{i,k}, a_{i,k}) denotes the accumulated discounted return the agent obtains by taking action a_{i,k} in state s_{i,k}; once the DQN has been trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ P} for all actions in the kth round; each action value represents the maximum expected return obtained by the agent selecting a particular action a_{i,k} in state s_{i,k};
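A minimal sketch of the ε-greedy selection above (the Q-values and ε are illustrative; in the claim the Q-values come from the trained DQN over the weight-permutation set P):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon,
    otherwise the greedy action argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(42)
q = np.array([0.1, 0.9, 0.3])
picks = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(1000)]
# the greedy action 1 dominates; others appear only through exploration
print(picks.count(1) > 850)
```

The occasional random pick is what keeps the agent from locking onto a locally optimal weight ratio, which is the stated purpose of ε-greedy here.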
reward: the reward observed at the end of the kth federal round is set as: [formula image], where the base constant [formula image] is a positive number ensuring that r_k grows exponentially with the training accuracy gain Δacc_{i,k}; the first term incentivizes the agent to select devices that can achieve higher test accuracy; the second constant [formula image] controls how r_k changes as Δacc_{i,k} increases; when Δacc_{i,k} < 0, r_k ∈ (−1, 0);

the DQN agent is trained to maximize the expectation of the cumulative discounted reward, as shown by: [formula image], where γ ∈ (0, 1] is the factor discounting future rewards;
after obtaining r_k, the cloud server CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of the federal task; the optimal action-value function Q(s_{i,k}, a_{i,k}) is the objective sought by the RL agent, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:

Q(s_{i,k}, a_{i,k}) = E(r_{i,k} + γ·max Q(s_{i,k+1}, a_{i,k+1}) | s_{i,k}, a_{i,k})

a parameterized value function Q(s_{i,k}, a_{i,k}; w_k) is learned using function-approximation techniques to approximate the optimal function Q(s_{i,k}, a_{i,k}); r_k + γ·max Q(s_{i,k+1}, a_{i,k+1}; w_k) is the learning target of Q(s_{i,k}, a_{i,k}; w_k); a DNN is used to represent the function approximator; the RL learning problem then becomes minimizing the MSE loss between the target and the approximator, defined as:

l(w_k) = (r_{i,k} + γ·max Q(s_{i,k+1}, a_{i,k+1}; w_k) − Q(s_{i,k}, a_{i,k}; w_k))²

the CS updates the global parameter w_k as: [formula image], where η ≥ 0 is the step size;

after the cloud server CS obtains the optimal learning model, the global parameter [formula image] under the kth-round weight-ratio sequence a_{i,k} is updated as: [formula image];

all the edge servers update the global parameter [formula image] and begin the next T rounds of local training.
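A tabular sketch of the TD learning step above (a lookup table stands in for the DNN approximator parameterized by w_k; states, actions, and constants are illustrative):

```python
import numpy as np

def td_update(Q, s, a, r, s_next, gamma=0.9, eta=0.1):
    """Minimize (r + gamma * max_a' Q(s', a') - Q(s, a))^2 by taking one
    gradient step of size eta on the entry Q(s, a)."""
    target = r + gamma * Q[s_next].max()   # the learning target
    td_error = target - Q[s, a]
    Q[s, a] += eta * td_error              # the w_k update, step size eta
    return td_error ** 2                   # the MSE loss before the step

Q = np.zeros((2, 2))                       # 2 states x 2 actions
loss_first = td_update(Q, s=0, a=1, r=1.0, s_next=1)
loss_later = td_update(Q, s=0, a=1, r=1.0, s_next=1)
print(loss_first > loss_later)  # loss shrinks as Q(s, a) nears the target
```

Repeating the same transition drives Q(s, a) toward the target r + γ·max Q(s′, ·), so the squared TD error, the loss l(w_k) in the claim, decreases monotonically here.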
7. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein the HE encryption method in the method specifically comprises the following steps:

the encryption schemes of the weight matrix and the bias vector follow the same idea; the additively homomorphic encryption of a real number a is denoted a^E, and under additive homomorphic encryption, for any two numbers a and b, a^E + b^E = (a + b)^E; the method of converting any real number r into an encoded rational fixed-point number v is: [formula image];

each encoded real number r in the gradient [formula image] can be represented as a rational H-bit number consisting of one sign bit, z integer bits, and d fractional bits; thus each encodable rational number is defined by its H = 1 + z + d bits; the encoding permits multiplication operations, which require working modulo 2^{H+2d} to avoid wrap-around;

decoding is defined as: [formula image];

multiplication of these encoded numbers requires the removal of a factor of 2^d; when Paillier additive encryption is used, the coded multiplication can be computed exactly, but only a single homomorphic multiplication can be guaranteed; for simplicity, this is handled at decoding time;

the largest encryptable integer is V − 1, so the largest encryptable real number must be taken into account; the integer bits z and fractional bits d are therefore chosen such that:

V ≥ 2^{H+2d} ≥ 2^{1+z+3d}
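A sketch of the fixed-point encode/decode scheme above (z = d = 8 is an illustrative choice; the modulus 2^{H+2d} follows the inequality in the claim, and plain integers stand in for Paillier ciphertexts):

```python
def encode(r, z=8, d=8):
    """Map a real r to a fixed-point integer mod 2^(H+2d), H = 1 + z + d.
    Negative values wrap around the modulus, two's-complement style."""
    H = 1 + z + d
    mod = 2 ** (H + 2 * d)
    return round(r * 2 ** d) % mod

def decode(v, z=8, d=8, mult_depth=0):
    """Invert encode; after one coded multiplication an extra 2^d scale
    factor must be removed (mult_depth=1), as the claim handles at decode."""
    H = 1 + z + d
    mod = 2 ** (H + 2 * d)
    if v > mod // 2:                 # recover the sign from the wrap-around
        v -= mod
    return v / 2 ** (d * (1 + mult_depth))

a, b = 3.5, -1.25
prod = (encode(a) * encode(b)) % (2 ** (1 + 8 + 8 + 16))
print(decode(encode(a)), decode(prod, mult_depth=1))  # 3.5 -4.375
```

Each product of two d-fractional-bit numbers carries 2d fractional bits, which is why a single multiplication is supported and the extra 2^d factor is stripped at decoding time.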
8. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 1, wherein the optimal loss function in the step S5 is: [formula image], where L(w_i) denotes the loss of the network of E_i.
CN202210816716.3A 2022-07-12 2022-07-12 Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles Active CN115310121B (en)

Publications (2)

Publication Number Publication Date
CN115310121A true CN115310121A (en) 2022-11-08
CN115310121B CN115310121B (en) 2023-04-07

Family

ID=83857637



Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160573A (en) * 2020-04-01 2020-05-15 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy joint training by two parties
CN111611610A (en) * 2020-04-12 2020-09-01 西安电子科技大学 Federal learning information processing method, system, storage medium, program, and terminal
CN112100295A (en) * 2020-10-12 2020-12-18 平安科技(深圳)有限公司 User data classification method, device, equipment and medium based on federal learning
CN112199702A (en) * 2020-10-16 2021-01-08 鹏城实验室 Privacy protection method, storage medium and system based on federal learning
CN112347500A (en) * 2021-01-11 2021-02-09 腾讯科技(深圳)有限公司 Machine learning method, device, system, equipment and storage medium of distributed system
CN113037460A (en) * 2021-03-03 2021-06-25 北京工业大学 Federal learning privacy protection method based on homomorphic encryption and secret sharing
CN113435472A (en) * 2021-05-24 2021-09-24 西安电子科技大学 Vehicle-mounted computing power network user demand prediction method, system, device and medium
US20220129700A1 (en) * 2020-10-27 2022-04-28 Alipay (Hangzhou) Information Technology Co., Ltd. Methods, apparatuses, and systems for updating service model based on privacy protection


Non-Patent Citations (1)

Title
SUN Shuang: "A survey of federated learning security and privacy protection in different scenarios", Application Research of Computers (《计算机应用研究》) *

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN115731424A (en) * 2022-12-03 2023-03-03 北京邮电大学 Image classification model training method and system based on enhanced federal domain generalization
CN115731424B (en) * 2022-12-03 2023-10-31 北京邮电大学 Image classification model training method and system based on enhanced federal domain generalization
CN115860789A (en) * 2023-03-02 2023-03-28 国网江西省电力有限公司信息通信分公司 FRL (fast recovery loop) -based CES (Cyclic emergency separation) day-ahead scheduling method
CN115860789B (en) * 2023-03-02 2023-05-30 国网江西省电力有限公司信息通信分公司 CES day-ahead scheduling method based on FRL
CN117812564A (en) * 2024-02-29 2024-04-02 湘江实验室 Federal learning method, device, equipment and medium applied to Internet of vehicles
CN117812564B (en) * 2024-02-29 2024-05-31 湘江实验室 Federal learning method, device, equipment and medium applied to Internet of vehicles
CN117873402A (en) * 2024-03-07 2024-04-12 南京邮电大学 Collaborative edge cache optimization method based on asynchronous federal learning and perceptual clustering
CN117873402B (en) * 2024-03-07 2024-05-07 南京邮电大学 Collaborative edge cache optimization method based on asynchronous federal learning and perceptual clustering


Similar Documents

Publication Publication Date Title
CN115310121B (en) Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles
Han et al. Logistic regression on homomorphic encrypted data at scale
CN109684855B (en) Joint deep learning training method based on privacy protection technology
US11017322B1 (en) Method and system for federated learning
US11449753B2 (en) Method for collaborative learning of an artificial neural network without disclosing training data
Wang et al. Authenticated garbling and efficient maliciously secure two-party computation
EP3958158B1 (en) Privacy-preserving machine learning
Zhu et al. Distributed additive encryption and quantization for privacy preserving federated deep learning
CN110572253A (en) Method and system for enhancing privacy of federated learning training data
CN111160573A (en) Method and device for protecting business prediction model of data privacy joint training by two parties
CN112989368A (en) Method and device for processing private data by combining multiple parties
CN111291411B (en) Safe video anomaly detection system and method based on convolutional neural network
Kundu et al. Learning to linearize deep neural networks for secure and efficient private inference
Darwish A modified image selective encryption-compression technique based on 3D chaotic maps and arithmetic coding
CN114363043B (en) Asynchronous federal learning method based on verifiable aggregation and differential privacy in peer-to-peer network
CN114254386A (en) Federated learning privacy protection system and method based on hierarchical aggregation and block chain
Zhu et al. Enhanced federated learning for edge data security in intelligent transportation systems
CN112949741B (en) Convolutional neural network image classification method based on homomorphic encryption
Ghavamipour et al. Federated synthetic data generation with stronger security guarantees
Li et al. An Adaptive Communication‐Efficient Federated Learning to Resist Gradient‐Based Reconstruction Attacks
CN117294469A (en) Privacy protection method for federal learning
CN116582242A (en) Safe federal learning method of ciphertext and plaintext hybrid learning mode
CN110737907A (en) Anti-quantum computing cloud storage method and system based on alliance chain
Qiu et al. Privacy preserving federated learning using ckks homomorphic encryption
Xu et al. Privacy-preserving outsourcing decision tree evaluation from homomorphic encryption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant