CN115310121A - Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles - Google Patents
- Publication number: CN115310121A
- Application number: CN202210816716.3A
- Authority: CN (China)
- Prior art keywords: model, data, federal, mepc, gradient
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06N20/00—Machine learning
- H04L63/0428—Network security protocols wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. networks in vehicles
- H04L9/008—Cryptographic mechanisms or arrangements involving homomorphic encryption
- H04L2463/062—Network security details applying encryption of the keys
- Y02T10/40—Engine management systems
Abstract
The invention discloses a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles, which comprises the following steps: build multiple edge servers E_i and a cloud server CS; each edge server E_i downloads the encrypted initial class-A gradient from the cloud server CS, decrypts it, randomly initializes its class-B gradient, and carries out local model training; E_i extracts, through a decoding function, the partial gradient information it retains, homomorphically encrypts the remaining gradient information, and broadcasts it to all other edge servers E_j through the MePC algorithm; after all edge servers have updated and shared their class-A gradient information, they upload it to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; the above steps are repeated until a termination condition is reached. The invention prevents data leakage between terminals, realizes privacy protection of the data, and reduces communication overhead while preventing leakage of the original data.
Description
Technical Field
The invention relates to the technical field of real-time security behavior analysis through cooperative processing by connected-vehicle users, and in particular to a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles.
Background
With the development of the various real-time communications and services supported by the Internet of Vehicles, the volume of data generated by connected equipment such as on-board units has become unprecedentedly large, accompanied by masses of heterogeneous vehicle-user-oriented data and differences in device computing capacity. Federated learning provides an effective solution for meeting the data-security requirements of real-time network-model training: different edge devices can cooperatively train a machine learning model without exposing the original data.
Massive edge-computing data is closely tied to users' personal privacy; for example, a user's trajectory, credit-card and billing data directly concern the user's privacy, and any leakage poses a great potential safety hazard. Federated learning can protect data to some extent, but the risk of information leakage still exists, in four forms: 1) membership leakage; 2) unexpected feature leakage; 3) leakage of class representatives of the original data; and 4) leakage of the original data itself. The last type of leakage is the least acceptable for privacy-sensitive participants.
In order to protect the data privacy of mobile users and address the leakage of raw data, researchers have studied cryptography-based data protection extensively: differential privacy, homomorphic encryption, and secure multi-party computation. Differential privacy generally uses one of three noise-addition mechanisms: the Laplace mechanism, the Gaussian mechanism, or the exponential mechanism. Context information is perturbed by adding noise to protect data privacy, but if too much noise is added, the performance of model training suffers. Common homomorphic encryption schemes are additive and multiplicative: research shows that noise doubles under Paillier additive homomorphic encryption, while it grows quadratically under El Gamal multiplicative homomorphic encryption. To increase data availability and overcome the noise problem, researchers introduced bootstrapping, which reduces noise by setting thresholds for encryption and decryption so that the scheme can evaluate an unlimited number of operations; batching, parallel homomorphic computation, or ciphertext compression can also be used. Secure multi-party computation lets multiple participants securely compute an agreed function without a trusted third party; its main purpose is to keep each party's private input independent during the computation and to leak no local data. Research has shown that secure multi-party computation can solve the gradient-leakage problem in federated learning, and that exchanging information only for the first hidden layer suffices to protect the data while preserving accuracy. However, the information exchange is peer-to-peer (P2P), which incurs a large communication overhead.
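Of the three cryptographic approaches surveyed above, differential privacy is the simplest to illustrate concretely. The sketch below is our own illustration, not part of the patented method; the function names are ours. It implements the Laplace mechanism and shows the noise/utility trade-off the text warns about: a smaller privacy budget epsilon means a larger noise scale and less accurate released answers.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample, built as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_value with epsilon-DP by adding Laplace(sensitivity/epsilon) noise."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
# Smaller epsilon (stronger privacy) -> larger noise scale -> noisier answers.
noisy_strong_privacy = [laplace_mechanism(100.0, 1.0, 0.1, rng) for _ in range(2000)]
noisy_weak_privacy = [laplace_mechanism(100.0, 1.0, 10.0, rng) for _ in range(2000)]
err_strong = sum(abs(x - 100.0) for x in noisy_strong_privacy) / 2000
err_weak = sum(abs(x - 100.0) for x in noisy_weak_privacy) / 2000
```

The mean absolute error tracks the noise scale sensitivity/epsilon, which is exactly why piling on noise for privacy eventually hurts model training.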
Most cryptography-based data-protection research is centralized, aiming to control time overhead while protecting data: federated learning allows edge devices to co-train machine learning models without exposing raw data. Federated learning typically adopts a parameter-server architecture, in which the parameter server synchronizes the local models of the clients: the central server sends the global model to multiple clients synchronously, and after training on local data, the clients synchronously return their updated models. This can be slow because of stragglers, and global synchronization is very difficult, especially in a federated-learning scenario, owing to limited computing power and battery life and to device availability and completion times that vary from device to device. A new asynchronous joint-optimization algorithm has therefore been proposed that solves a regularized local problem to guarantee convergence, so that multiple devices and a server can train a model cooperatively and efficiently without revealing privacy.
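The synchronous parameter-server aggregation just described (federated averaging) can be sketched in a few lines. This is a generic illustration with hypothetical toy data and function names of our own choosing, not the patent's PreFLa aggregation: each round the server broadcasts the global weights, every client trains locally, and the server averages the returned weights by sample count.

```python
from typing import List, Tuple

def local_update(weights: List[float], data: List[Tuple[float, float]],
                 lr: float = 0.05, epochs: int = 5) -> List[float]:
    """One client's local SGD on a least-squares fit y ~ w[0]*x + w[1]."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = w[0] * x + w[1] - y
            w[0] -= lr * err * x
            w[1] -= lr * err
    return w

def fedavg_round(global_w: List[float],
                 clients: List[List[Tuple[float, float]]]) -> List[float]:
    """One synchronous round: broadcast the global model, train locally on
    each client, then average the returned models weighted by sample count."""
    total = sum(len(c) for c in clients)
    updates = [local_update(global_w, c) for c in clients]
    return [sum(len(c) * u[j] for c, u in zip(clients, updates)) / total
            for j in range(len(global_w))]

# Two clients whose private data follow the same rule y = 2x + 1; neither
# ever sends raw samples to the server, only model weights.
clients = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
w = [0.0, 0.0]
for _ in range(300):
    w = fedavg_round(w, clients)
```

Because the server must wait for every client each round, a single straggler stalls the whole round, which is the latency problem the asynchronous variant addresses.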
Although there have been many studies on data security, most are limited to the safety of the original data, and how to simultaneously satisfy the privacy and usability requirements of mobile users' big data in the complex Internet-of-Vehicles space remains open.

First, because data in federated learning stays on local nodes, the risk of original-data leakage during transmission is reduced; but even when only gradient information is transmitted, the original data may still be recovered. Data interaction in secure multi-party computation spreads the data among multiple parties, reducing the possibility that a sample can be reconstructed after gradient information leaks. In existing secure multi-party computation, however, every user sends information to every other user (in short, unicast), which brings high time overhead. When addressing vehicle users' data-security and real-time requirements, it is therefore important to find a solution that reduces both the risk of data being attacked and recovered and the transmission delay. Second, because different edge servers differ in data and equipment, the training precision of the overall model must also be improved in a targeted way during training. Global parameter aggregation in the typical synchronous federated-averaging manner is slowed by stragglers. While balancing communication time against computation, it is also important to guarantee global precision through personalized training of multiple models. Most data-security-oriented federated learning algorithms, however, rely on synchronous aggregation, whose high latency challenges the real-time requirements of the Internet of Vehicles. A federated learning algorithm based on reinforcement learning is therefore needed to reduce latency, improve accuracy, and guarantee data security.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, against the defects of the prior art, a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles.

The technical scheme adopted by the invention to solve the technical problem is as follows:

The invention provides a real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles, comprising the following steps:
S1. Construct a plurality of edge servers E_i and a cloud server CS; acquire the vehicle data D = {D_1, D_2, …, D_n}, with each edge server E_i acquiring its corresponding vehicle data D_i;
S2. In the k-th round of the federated task, edge server E_i downloads the encrypted initial class-A gradient from the cloud server CS, decrypts it, and randomly initializes its class-B gradient; E_i computes gradients in local network model training according to its vehicle data D_i, and records the resulting gradient information after completing T rounds of local training;
S3. Edge server E_i uses its decoding function to extract from its class-A gradient the partial gradient information to be retained locally, homomorphically encrypts the remaining gradient information, and broadcasts it to all other edge servers E_j through the MePC algorithm; E_i likewise obtains, according to the decoding functions, the corresponding partial gradient information from the other edge servers E_j. After all edge servers have updated and shared, each holds its updated class-A gradient information, i ∈ [1, n], where n is the total number of edge servers;
S4. All edge servers upload their class-A gradients to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; PreFLa maximizes the reward through reinforcement learning to select each edge server E_i's optimal parameter weight ratio a_{i,k}, and the global gradient parameter is aggregated according to a_{i,k}. The uploading and downloading of parameters proceed in parallel, and all parameters are encrypted with HE;
S5. Repeat steps S2 to S4 until a termination condition is reached. The cloud server CS then computes the final global gradient parameter and issues it to each edge server; each edge server extracts features from its vehicle data and computes the accuracy and the optimal loss function of the MePC-F model, yielding the trained MePC-F model. The whole training process is thus completed, and the model is output in real time to the corresponding Internet-of-Vehicles service.
Further, in step S2 of the present invention, the specific method for training the local network model is as follows:

A deep neural network (DNN) model is employed. The DNN performs end-to-end feature learning and classifier training by taking different vehicle data as raw input, using stochastic gradient descent as a subroutine to minimize the loss value in each round of local training.

In the k-th round of communication, E_i downloads the base-layer parameters, i.e., the encrypted initial class-A gradient, from the cloud server CS, decrypts them into the class-A gradient, and randomly initializes the class-B gradient, where k ∈ [1, K] and K denotes the total number of rounds of the federated task. If this is the first round of the federated task, the CS randomly initializes the class-A gradient. Before local training, E_i decrypts the downloaded parameters using homomorphic encryption and records the result.
The loss function of the local model is set as follows:

L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})^2

where l(·) denotes the loss of the network, the second term is an L2 regularization term, and λ is the regularization coefficient; w_i denotes the overall weight information of the local model, w_{i,t} is the weight information of the local model at time t, and w_{i,t+1} is the weight information at time t+1;
E_i initializes G_k and substitutes it for the model's weight parameter w_i, continuing local model training by minimizing the loss function as follows:

w_i = w_i − ηG_k

After completing T rounds of local training, edge server E_i obtains the accuracy acc_{i,k} of its local model together with the corresponding gradient information.
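The local update above can be sketched as a plain gradient step in which the gradient of the proximal L2 term λ(w_{i,t} − w_{i,t+1})^2, namely 2λ(w − w_prev), is folded into G. The quadratic toy objective and function names below are our own illustration, not the patent's network loss.

```python
from typing import Callable, List

def proximal_sgd_step(w: List[float], w_prev: List[float],
                      grad_l: Callable[[List[float]], List[float]],
                      eta: float, lam: float) -> List[float]:
    """w <- w - eta * G, where G adds the gradient of the proximal penalty
    lam * (w_prev - w)^2, i.e. 2*lam*(w - w_prev), to the task gradient.
    The penalty discourages large jumps between consecutive iterates."""
    g_task = grad_l(w)
    return [wi - eta * (gi + 2.0 * lam * (wi - pi))
            for wi, gi, pi in zip(w, g_task, w_prev)]

# Toy quadratic l(w) = (w - 3)^2 in one dimension; its minimizer is w = 3.
grad = lambda w: [2.0 * (w[0] - 3.0)]
w, w_prev = [0.0], [0.0]
for _ in range(200):
    w, w_prev = proximal_sgd_step(w, w_prev, grad, eta=0.1, lam=0.05), w
```

The proximal term leaves the minimizer unchanged (it vanishes once the iterates stop moving) while damping oscillation between rounds, which is why it helps cooperation across heterogeneous edge servers.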
Further, in step S3 of the present invention, the specific method of the MePC algorithm is as follows:

In the k-th round of the federated task, all edge servers use MePC to exchange their base-layer (class-A) gradients: the i-th edge server broadcasts its encrypted class-A data to the other edge servers, withholding only the portion it retains for itself.

To avoid the risk of the data being cracked, each edge server retains a random fraction χ of its gradient, with the same χ used by all servers within the same federated round; the remaining gradient is homomorphically encrypted. The random fraction χ varies across different rounds of the federated task, and χ ∈ (0, 1/n]. The homomorphically encrypted remaining gradient is divided into n − 1 parts.

Only the retained fraction stays at E_i; the other parts, together with the random parameter χ, are broadcast in ciphertext form to the other servers E_j. In this way, even if part of the transmitted content is attacked, the original data is not leaked.
Further, in step S3 of the present invention, the specific method for local data verification is as follows:

In the k-th round of the federated task, verification is performed using the corresponding "multiplication" method, with each edge server designing its own decoding functions, where L_0 is the length of the retained portion and L is the length of the full gradient; the subscript k of a decoding function denotes the decoding function of the k-th round of the federated task:

L_0 = χ·L

It is required that the decoding functions of all edge servers, applied to the same data packet, yield all 0s under the bitwise AND operation and all 1s under their union (OR); that is, the binary masks are pairwise disjoint and jointly cover every position.
First, initial decoding functions are constructed. Each data packet is multiplied by the corresponding decoding functions of the other servers; since the 0 bits of a decoding function zero out the positions they cover, E_i is guaranteed to obtain only its own portion of the data packet. Where a decoding function's binary bit is 1, the ciphertext of the gradient information at the corresponding position is obtained.

E_i adds all the data-packet arrays obtained from the other edge servers E_j into the corresponding positions to obtain the complete ciphertext data, and updates it as the final ciphertext data.

As k increases, each time a secure multi-party computation is performed, each E_i's decoding function is cyclically shifted left by m units. This keeps the sharing dynamic and divides the shared gradient equally among E_1, E_2, …, E_n with no repeated data between the parts.
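The decoding-function idea (disjoint binary masks whose pairwise AND is all 0s and whose union is all 1s, with ownership rotated each round) can be sketched with plaintext stand-ins for the ciphertexts. The partition rule and function names below are illustrative assumptions, not the patent's exact construction.

```python
from typing import List

def make_masks(length: int, n: int, shift: int) -> List[List[int]]:
    """Binary decoding masks for n servers over `length` gradient positions.
    Any two masks AND to all zeros; together they OR to all ones. `shift`
    cyclically rotates ownership from one federated round to the next."""
    masks = [[0] * length for _ in range(n)]
    for pos in range(length):
        masks[(pos + shift) % n][pos] = 1
    return masks

def collect(broadcasts: List[List[float]], mask: List[int]) -> List[float]:
    """A receiver keeps, from every broadcast gradient, only the positions
    its own mask selects; all other positions read as zero to it."""
    n, length = len(broadcasts), len(mask)
    return [sum(broadcasts[i][p] for i in range(n)) if mask[p] else 0.0
            for p in range(length)]

# Three servers, gradients of length 6, each broadcast once (O(n) messages
# instead of O(n^2) pairwise unicasts).
grads = [[1.0] * 6, [2.0] * 6, [4.0] * 6]
masks = make_masks(6, n=3, shift=0)
pieces = [collect(grads, m) for m in masks]
# Summing every server's collected piece recovers the full elementwise sum,
# yet no single server saw more than its masked share of each broadcast.
combined = [sum(p[j] for p in pieces) for j in range(6)]
```

Rotating `shift` each round changes which server owns which positions, so no server accumulates the same slice of everyone's gradient across rounds.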
Further, in step S4 of the present invention, the specific method of the PreFLa algorithm is as follows:

PreFLa uses reinforcement learning (RL) to adaptively select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters.

In the uplink communication stage, each edge server not only trains its local model but also uploads its local parameters to the cloud server CS for joint aggregation. After the MePC algorithm is executed in the k-th federated round, E_i uploads its parameters to the CS over a TLS/SSL secure channel. In the aggregation stage, because of the imbalanced distribution and heterogeneity of each edge server's data, the model parameters used for aggregation have a crucial influence on the convergence speed; it is therefore necessary to consider each participant E_i's parameter weight ratio a_{i,k} in the k-th round of federated aggregation.

Reinforcement learning based on DQN is used to predict the parameter weight ratios, storing the information in an approximated Q function to avoid the curse of dimensionality. To better achieve model personalization and reduce the latency of uploading weights in MePC-F, the DQN selects the optimal parameter weight ratio a_{i,k} for aggregating the global parameters in the updated CS. The reinforcement learning comprises: states, actions, a reward function, and feedback.
Further, in step S4 of the present invention, the state, action, reward function and feedback are specified as follows:

State: the state of the k-th round, which includes the accuracy difference of each edge server.

Action: the parameter weight ratio a_{i,k} represents the action of the k-th round of the federated task. To avoid falling into a locally optimal solution, an ε-greedy algorithm is adopted to optimize the action-selection process and obtain a_{i,k}:

where P is the set of weight permutations, rand is a random number with rand ∈ [0, 1], and Q(s_{i,k}, a_{i,k}) denotes the cumulative discounted return obtained when the agent takes action a_{i,k} in state s_{i,k}. Once the DQN has been trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ P} for all actions in the k-th round; each action value represents the maximum expected reward the agent can obtain by selecting a particular action a_{i,k} in state s_{i,k};
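The ε-greedy selection described above can be sketched as follows. This is an illustrative stand-in: `q_values` represents the row of Q values over the candidate actions (here, the candidate weight ratios in P), and the function name is ours.

```python
import random

def epsilon_greedy(q_values, epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random action (explore); otherwise
    pick the action with the highest estimated Q value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(42)
q = [0.1, 0.7, 0.3]
greedy_pick = epsilon_greedy(q, 0.0, rng)              # epsilon = 0: pure exploitation
picks = [epsilon_greedy(q, 0.3, rng) for _ in range(1000)]
```

With epsilon = 0.3 the best action is still chosen about 80% of the time, but the occasional random pick keeps the agent from locking onto a locally optimal weight ratio.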
Reward: the reward r_k observed at the end of the k-th federated round is set so that it grows exponentially with the test-accuracy change Δacc_{i,k}; its base is a positive constant. The first role of the reward is to encourage the agent to select devices that can achieve higher test accuracy; the base also controls how r_k changes as Δacc_{i,k} increases. When Δacc_{i,k} < 0, r_k ∈ (−1, 0).

The DQN agent is trained to maximize the expected cumulative discounted reward, where γ ∈ (0, 1] is the factor that discounts future rewards;
After obtaining r_k, the cloud server CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of the federated task. The optimal action-value function Q(s_{i,k}, a_{i,k}) is what the RL agent seeks, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:

Q(s_{i,k}, a_{i,k}) = E(r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a') | s_{i,k}, a_{i,k})

A parameterized value function Q(s_{i,k}, a_{i,k}; w_k) is learned with function-approximation techniques to approximate the optimal value function Q(s_{i,k}, a_{i,k}); r_k + γ max_{a'} Q(s_{i,k+1}, a') is the learning target of Q(s_{i,k}, a_{i,k}; w_k), and a DNN is used as the function approximator. The RL learning problem then becomes minimizing the MSE loss between the target and the approximator, defined as:

l(w_k) = (r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a'; w_k) − Q(s_{i,k}, a_{i,k}; w_k))^2
The CS updates the global parameter w_k by a gradient step on this loss, where η ≥ 0 is the step size.

After the cloud server CS obtains the optimal learning model, the weight-ratio sequence a_{i,k} of the k-th round is obtained and the global parameter is updated accordingly.
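For a tabular Q, the TD target r + γ max_{a'} Q(s', a') and the squared-error update above reduce to the classic Q-learning rule. The two-state toy MDP below is our own illustration of that update, not the PreFLa environment.

```python
from typing import Dict, List

def td_target(r: float, q_next: List[float], gamma: float) -> float:
    """Bellman target r + gamma * max_a' Q(s', a')."""
    return r + gamma * max(q_next)

def q_update(q: Dict[int, List[float]], s: int, a: int, r: float,
             s_next: int, alpha: float, gamma: float) -> None:
    """Step down the squared TD error (target - Q(s,a))^2; for a table this
    is the update Q(s,a) += alpha * (target - Q(s,a))."""
    q[s][a] += alpha * (td_target(r, q[s_next], gamma) - q[s][a])

gamma, alpha = 0.9, 0.5
q = {0: [0.0, 0.0], 1: [0.0, 0.0]}   # state 1 is terminal, its Q stays 0
for _ in range(200):
    q_update(q, 0, 1, 1.0, 1, alpha, gamma)  # action 1: reward 1, terminate
    q_update(q, 0, 0, 0.0, 0, alpha, gamma)  # action 0: stay, no reward
```

The action-value of terminating with reward 1 converges to 1, while staying converges to γ times the best continuation, illustrating how the discount γ propagates future reward back through the target.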
Further, the HE encryption method used in the present invention is specifically as follows:

The encryption schemes for the weight matrix and the bias vector follow the same idea. The additive homomorphic encryption of a real number a is denoted a_E; in additive homomorphic encryption, for any two numbers a and b, a_E + b_E = (a + b)_E. Any real number r is converted into an encoded fixed-point rational number v.

Each encoded real number r in the gradient can be represented as a rational H-bit number consisting of one sign bit, z integer bits, and d fractional bits; thus each encodable rational number is defined by its H = 1 + z + d bits. The encoding is designed to allow multiplication, which requires the operating modulus to be on the order of H + 2d bits to avoid overflow.
Decoding is defined as the inverse mapping. Multiplication of these encoded numbers requires removing a factor of 1/2^d; when Paillier additive encryption is used, the conditions for encoded multiplication can be computed exactly, but homomorphic multiplication can be guaranteed only once. For simplicity, this factor is handled at decoding time.

The largest encryptable integer is V − 1, so the largest encryptable real number must be taken into account; the integer bits z and fractional bits d are therefore chosen such that:

V ≥ 2^(H+2d) ≥ 2^(1+z+3d)
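The sign/integer/fraction fixed-point encoding above can be sketched as follows. Representing negative values by wrapping into the top half of the ring is a standard choice we assume here, and taking V as a power of two is for illustration only; the additive homomorphism a_E + b_E = (a + b)_E survives the encoding because addition commutes with the mod-V fixed-point map.

```python
def encode(r: float, d: int, V: int) -> int:
    """Map real r to a fixed-point residue: v = round(r * 2^d) mod V.
    Negative values wrap into the top half of the ring (a sign in disguise)."""
    return round(r * (1 << d)) % V

def decode(v: int, d: int, V: int) -> float:
    """Undo encode: residues above V/2 are negative; divide out the 2^d factor."""
    if v > V // 2:
        v -= V
    return v / (1 << d)

d, V = 16, 1 << 64            # d fractional bits; ring size comfortably above 2^(H+2d)
a, b = 3.25, -1.5
# Adding encodings (as Paillier would do under encryption) decodes to a + b.
s = (encode(a, d, V) + encode(b, d, V)) % V
```

After a multiplication the result carries an extra 2^d factor, which is why the text removes 1/2^d at decoding time and why the modulus must leave 2d bits of headroom.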
Further, the optimal loss function in step S5 of the present invention is defined in terms of the local losses, where L(w_i) denotes the loss of E_i's network.
The invention has the following beneficial effects:
(1) A federated learning model for multi-party broadcast secure computation (MePC-F) is presented. The model combines the MePC and PreFLa algorithms to address both the security of federated-learning training data and the communication overhead in the Internet of Vehicles. By combining the advantages of homomorphic encryption and secure multi-party computation, data leakage between terminals is prevented and the degree to which original data can be reconstructed after an attack is reduced, realizing data privacy protection to the greatest extent.

(2) A secure broadcast multi-party computation, MePC, is presented. For secure multi-party computation, sharing only the gradient information of the first layer greatly reduces both the risk of data recovery and the traffic. During sharing, the edge servers take their respective portions through decoding functions in a broadcast manner, reducing the time complexity from O(n^2) to O(n) and thus the communication overhead, while still preventing leakage of the original data.

(3) A weight-ratio-based federated learning algorithm, PreFLa, is proposed. PreFLa finds the optimal gradient weight ratios for aggregating the global parameters, and the reward function is designed from the accuracy difference of each edge server, so that the action selection with the maximum overall return yields the weight ratios of each federated round. An L2 regularization term is added to the loss function to promote edge-server cooperation and to reduce the latency and performance problems caused by data heterogeneity, so the global model generalizes better and convergence is accelerated.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a MePC-F model of an embodiment of the present invention;
FIG. 2 is a flow chart of a MePC-F model of an embodiment of the present invention;
FIG. 3 is a MePC algorithm of an embodiment of the present invention;
FIG. 4 shows DLG results on MNIST according to the embodiment of the present invention when the first hidden layer is hidden and when it is not: (a) FL; (b) MePC-F; (c) PeMPC; (d) Gaussian; (e) Laplacian;

FIG. 5 shows the performance of DLG on MNIST when the gradient of the first hidden layer is replaced by four methods (Gaussian distribution, Laplacian distribution, PeMPC and MePC-F);

FIG. 6 shows the average accuracy and loss on non-IID MNIST data of an embodiment of the present invention;

FIG. 7 shows the average accuracy and loss on non-IID CIFAR-10 data of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The parameters involved in the examples of the invention are described below:
TABLE 1 description of the parameters
where E_i denotes the current edge server, E_j denotes an edge server other than the current one, and E_s denotes all edge servers.
The real-time reinforced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles comprises the following steps:

S1. Construct a plurality of edge servers E_i and a cloud server CS; acquire the vehicle data D = {D_1, D_2, …, D_n}, with each edge server E_i acquiring its corresponding vehicle data D_i;

S2. In the k-th round of the federated task, edge server E_i downloads the encrypted initial class-A gradient from the cloud server CS, decrypts it, and randomly initializes its class-B gradient; E_i computes gradients in local network model training according to its vehicle data D_i and records the gradient information obtained after completing T rounds of local training;

S3. Edge server E_i uses its decoding function to extract the partial gradient information to be retained locally, homomorphically encrypts the remaining gradient information, and broadcasts it to all other edge servers E_j through the MePC algorithm; E_i likewise obtains, according to the decoding functions, the corresponding partial gradient information from the other edge servers E_j; after all edge servers have updated and shared, each holds its updated class-A gradient information, i ∈ [1, n], with n the total number of edge servers;

S4. All edge servers upload their class-A gradients to the cloud server CS, which aggregates the global parameters through the PreFLa algorithm; PreFLa maximizes the reward through reinforcement learning to select each edge server E_i's optimal parameter weight ratio a_{i,k}, and the global parameter is aggregated according to a_{i,k}; the uploading and downloading of parameters proceed in parallel, and all parameters are encrypted with HE;

S5. Repeat steps S2 to S4 until a termination condition is reached, completing the whole training process. The termination condition may be a maximum number of training rounds, convergence of the loss function, or another user-defined condition. Finally, the optimal loss function can be obtained according to equation (1), where L(w_i) denotes the loss of E_i's network.
The specific method of local training is as follows:
in the local model phase, a Deep Neural Network (DNN) is employed to learn the cloud model and the ES model. DNN performs end-to-end feature learning and classifier training by taking different user data as raw inputs. Random gradient descent will be used as a subroutine in the proposed algorithm to minimize the loss value in each local training.
In a downstream communication phase E i At k (k E [1,K)]) Downloading base layer parameters from CS in round-robin communicationAnd randomly initializingWhere K represents the total number of rounds of the federated task. If the task is the first federal task, the CS initializes randomlyBefore local training, E i It is necessary to use the pair of homomorphic encryptions (formula (4)) forIs decrypted intoAnd is marked as
In order to better embody the model personalization, the loss function of the local model is set as follows:
L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})²   (16)
where l () represents the loss of the network, e.g., the cross-entropy loss of the classification task. The second term is an L2 regularization term, which can not only keep the individuation capability of the second term, but also improve the cooperation efficiency with other participants. λ is the regularization coefficient.
E_i initializes G_k and updates the model's weight parameters w_i, continuing local model training as follows:
w_i = w_i − ηG_k   (17)
where η is the learning rate and G_k is the general expression for the class-A and class-B gradients; here the class-B gradient is randomly initialized.
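The local update of Eqs. (16)-(17) can be sketched as follows. This is a minimal illustration with scalar weights, assuming the L2 term penalizes the distance between the current weights and the previous round's weights; the function name `local_step` and all parameter values are illustrative, not from the source.

```python
# Sketch of one local training step from Eqs. (16)-(17), assuming the term
# lambda*(w_{i,t} - w_{i,t+1})^2 penalizes drift from the previous weights.
# Scalar weights and all constants are illustrative assumptions.

def local_step(w, grad_task, w_prev, lam=0.1, eta=0.01):
    # G_k combines the task-loss gradient with the regularizer's gradient
    # d/dw [lam * (w - w_prev)^2] = 2 * lam * (w - w_prev).
    g = grad_task + 2.0 * lam * (w - w_prev)
    # Eq. (17): w_i = w_i - eta * G_k
    return w - eta * g

w_new = local_step(w=1.0, grad_task=0.5, w_prev=0.8)
print(round(w_new, 4))  # 0.9946
```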
After E_i completes T rounds of local training, it obtains the accuracy acc_{i,k} of its local model together with the updated class-A and class-B gradients. Direct sharing of user information between terminals is prohibited, and data in the edge servers must be encrypted before communication so that it cannot be attacked in transit. This process uses HE to avoid information leakage. The additive HE over real numbers is shown below. The encryption schemes for the weight matrix and the bias vector follow the same idea; the additive homomorphic encryption of a real number a is denoted a^E. In additive homomorphic encryption, for any two numbers a and b, a^E + b^E = (a + b)^E. The method of converting an arbitrary real number r into an encoded rational fixed-point number v is:
Consider that each encoded real number r in the gradient can be represented as a rational H-bit number consisting of one sign bit, z integer bits, and d fractional bits. Thus each encodable rational number is defined by its H = 1 + z + d bits. The encoding is designed to allow multiplication, which requires the working modulus to be at least H + 2d bits to avoid overflow.
The decoding is defined as:
Multiplying two of these encoded numbers introduces an extra scaling factor of 2^d that must be divided out. When Paillier additive encryption is used, the encoded multiplication can be computed exactly, but only one homomorphic multiplication can be guaranteed. For simplicity, the rescaling is handled at decoding time.
This is correct provided that at most one encoded multiplication has taken place. Since the largest encryptable integer is V − 1, the largest encodable real number must take this into account. Therefore, the integer width z and the fractional width d must be chosen such that:
V ≥ 2^(H+2d) ≥ 2^(1+z+3d)   (5)
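The fixed-point encoding and its additive homomorphism can be illustrated with the following sketch. The widths z and d and the modulus V are illustrative choices that satisfy Eq. (5); this is a plaintext demonstration of the encoding only, not the patent's Paillier implementation.

```python
# Sketch of the fixed-point encoding used before additive HE (e.g. Paillier),
# under the convention H = 1 + z + d bits (sign, integer, fraction).
# The parameters z, d and the modulus V are illustrative assumptions.

def encode(r: float, d: int, V: int) -> int:
    """Encode real r as a fixed-point integer modulo V (negatives wrap)."""
    v = round(r * (1 << d))        # shift by d fractional bits
    return v % V                   # negatives map to the top of the ring

def decode(v: int, d: int, V: int) -> float:
    """Decode, interpreting values above V//2 as negative."""
    if v > V // 2:
        v -= V
    return v / (1 << d)

z, d = 16, 16                      # assumed integer/fraction widths
H = 1 + z + d
V = 1 << (H + 2 * d)               # satisfies V >= 2**(H+2d) from Eq. (5)

a, b = 3.25, -1.5
# Additive homomorphism over the encodings: enc(a) + enc(b) decodes to a + b.
s = (encode(a, d, V) + encode(b, d, V)) % V
print(decode(s, d, V))             # 1.75
```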
the specific method of the MePC algorithm is shown in FIG. 3.
In the k-th federated task, the base-layer gradients are exchanged using MePC. To avoid the risk of the data being cracked, a random ratio χ of the gradient is retained in each network, and the same random ratio χ is kept within the same federated round. Across different rounds of the federated task, the random ratio χ (χ ∈ [1,1/n]) varies. The remaining gradient is divided equally into n − 1 portions. As shown in FIG. 3, the gradient values are divided as follows:
Only the retained portion stays at E_i; the other portions and the random parameter χ are broadcast to the other ESs in ciphertext form. In this way, even if part of the transmitted content is attacked, the original data will not leak. In particular, an attacker who wants to obtain the data must acquire all of the portions. However, the portions and χ remain in ciphertext form, protected by homomorphic encryption, throughout the communication between participant E_i and receiver E_j.
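A toy sketch of the share-splitting idea follows, in plaintext only (HE omitted). The function name, the masking-by-index scheme, and the reconstruction check are illustrative assumptions; the patent defines the split via binary decoding functions over encrypted packets.

```python
import random

def split_gradient(grad, n, chi, rng=random.Random(0)):
    """Partition gradient indices: a fraction chi is kept locally, the rest is
    split evenly into n-1 disjoint shares for the other edge servers."""
    idx = list(range(len(grad)))
    rng.shuffle(idx)
    keep_cnt = max(1, int(chi * len(grad)))
    kept = set(idx[:keep_cnt])
    rest = idx[keep_cnt:]
    shares = [set(rest[j::n - 1]) for j in range(n - 1)]
    # Each share is the gradient masked to its index set (zeros elsewhere).
    masked = lambda s: [g if i in s else 0.0 for i, g in enumerate(grad)]
    return masked(kept), [masked(s) for s in shares]

grad = [0.5, -1.0, 2.0, 0.25, 3.0, -0.75]
kept, shares = split_gradient(grad, n=3, chi=1/3)
# Summing the retained part and all shares reconstructs the full gradient.
total = [sum(parts) for parts in zip(kept, *shares)]
print(total == grad)  # True
```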
When E_i receives a data packet sent by another server, it performs data verification locally. Specifically, it uses the corresponding "multiplication" method for verification. Each edge server designs two decoding functions by itself, as follows:
L_0 = χ·L   (9)
It is required that the decoding functions of all ESs, applied to the same data packet, yield all 0s under the bitwise AND operation and all 1s under the bitwise OR operation, that is:
First, the decoding function is initialized as follows,
Note that at initialization, different E_i use the same decoding function for the data packets transmitted within the same federated task.
The data packet is multiplied by the corresponding decoding function in each of the other servers. Since the binary 0 bits multiply to 0, E_i can guarantee that only its own partial data packet is obtained. Where the binary bit is 1, the ciphertext of the gradient information at the corresponding position is obtained as follows:
E i adding all data packet arrays obtained from other ES to corresponding positions to obtain all ciphertext data, and updating the ciphertext data into the final ciphertext dataNamely, it is
Each time a secure multiparty computation is performed, as k increases, each E_i's decoding function is cyclically shifted left by m units. This keeps the sharing dynamic and allows the shares to be divided equally among E_1, E_2, …, E_n with no repetition of data between the parts.
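The cyclic left shift of the decoding mask can be sketched as follows; the binary-list representation and the name `rotate_mask` are illustrative.

```python
def rotate_mask(mask, m):
    """Cyclically shift a binary decoding mask left by m positions each
    federated round, so the retained/shared index sets change over time."""
    m %= len(mask)
    return mask[m:] + mask[:m]

mask = [1, 1, 0, 0, 0, 0]          # 1 = positions E_i keeps, 0 = shared
print(rotate_mask(mask, 2))        # [0, 0, 0, 0, 1, 1]
```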
The specific method of the PreFla algorithm is as follows:
Data in the Internet of Vehicles is dispersed, unbalanced, and heterogeneous, making it difficult to improve personalized service while meeting real-time requirements. To prevent user privacy from leaking during communication between different edge servers, HE is used to encrypt the parameters in transit. To better realize personalized training for different user data, the first layer is set as the base layer and trained cooperatively using the existing federated learning method, while the other layers serve as personalization layers trained locally, so that the personal information of different ES devices can be captured. In this way, after the joint training process, the globally shared base layer can be transferred into each ES to build its own personalized deep learning model using its unique personalization layers. Only the base-layer parameters are downloaded from the CS; the personalization-layer parameters are randomly generated and fine-tuned using local data. To meet the real-time requirement and realize ES personalization, PreFLa adopts reinforcement learning (RL) to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters.
In the uplink communication phase, each ES not only trains its local model but also uploads its local parameters to the CS for joint aggregation. After executing the MePC algorithm in the k-th federated round, E_i uploads its parameters to the CS over a TLS/SSL secure channel. In the aggregation stage, because each ES's data distribution is unbalanced and heterogeneous, the model parameters used for aggregation have a crucial influence on the convergence speed of this stage. Therefore, participant E_i's parameter weight ratio a_{i,k} must be considered in the k-th round of federated aggregation.
In the invention, DQN-based reinforcement learning is used to predict the parameter weight ratio; information is stored through a Q function instead of the table storage of Q-Learning, to avoid the curse of dimensionality. To better realize model personalization and reduce the waiting time for uploading weights in MePC-F, DQN is used to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters when updating the CS. The reinforcement learning setting contains the state, action, reward function, and feedback, defined as follows:
State: the state of the k-th round, where the accuracy difference term is expressed as:
Action: the parameter weight ratio a_{i,k} represents the action of the k-th round of federated tasks. To avoid becoming trapped in a local optimum, an ε-greedy algorithm is adopted to optimize the action-selection process, yielding a_{i,k}:
where P is a set of weight permutations and rand is a random number (rand ∈ [0,1]); Q(s_{i,k}, a_{i,k}) denotes the cumulative discounted return when the agent takes action a_{i,k} in state s_{i,k}. Once DQN is trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ [P]} for all actions in the k-th round. Each action value represents the maximum expected return the agent can achieve by selecting a particular action a_{i,k} in state s_{i,k}.
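The ε-greedy selection over the candidate weight ratios can be sketched as follows; the Q-values and the helper name `epsilon_greedy` are illustrative assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random.Random(0)):
    """epsilon-greedy action selection over the candidate set P:
    explore with probability epsilon, otherwise pick the argmax action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

q = [0.1, 0.7, 0.3]                 # Q(s_{i,k}, a) for each candidate a_{i,k}
print(epsilon_greedy(q, epsilon=0.0))  # 1 (greedy choice)
```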
Reward: the observed reward at the end of the k-th federated round is set as:
where the base is a positive number ensuring that r_k grows exponentially with the test-accuracy change Δacc_{i,k}. The first term incentivizes the agent to select devices that achieve higher test accuracy, and it controls how r_k changes as Δacc_{i,k} increases. In general, as machine-learning training progresses, model accuracy increases more slowly. However, in the federated cooperative task, model accuracy may decrease due to data-distribution imbalance and heterogeneity. Thus, as FL enters its late stage, an exponential term is used to amplify the marginal accuracy increase. The second term, −1, encourages the agent to improve model accuracy, because when Δacc_{i,k} < 0 we have r_k ∈ (−1, 0).
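The reward can be sketched with the closed form r_k = ξ^Δacc − 1, which is an assumption consistent with the stated properties (exponential growth in Δacc_{i,k}; r_k ∈ (−1, 0) when Δacc_{i,k} < 0); ξ = 3 follows the positive constant used in the experiments section.

```python
# Hedged sketch of the round-k reward: r_k = xi**delta_acc - 1, where xi > 1
# is the positive base (3 in the experiments) and delta_acc is the
# round-over-round test-accuracy change. The exact closed form is assumed.

def reward(delta_acc: float, xi: float = 3.0) -> float:
    return xi ** delta_acc - 1.0

print(reward(0.0))                 # 0.0  (no accuracy change, no reward)
print(-1.0 < reward(-0.2) < 0.0)   # True (penalty bounded in (-1, 0))
```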
The DQN agent is trained to maximize the expectation of the cumulative discounted reward, as shown in:
where γ ∈ (0,1] is a factor that discounts future rewards.
After obtaining r_k, the CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of federated tasks. The optimal action-value function Q(s_{i,k}, a_{i,k}) is the quantity sought by the RL agent, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:
Q(s_{i,k}, a_{i,k}) = E(r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a') | s_{i,k}, a_{i,k})   (22)
A parameterized value function Q(s_{i,k}, a_{i,k}; w_k) can then be learned using function-approximation techniques to approximate the optimal function Q(s_{i,k}, a_{i,k}). Here r_k + γ max_{a'} Q(s_{i,k+1}, a'; w_k) is the learning target for Q(s_{i,k}, a_{i,k}; w_k). Typically, a DNN is used as the function approximator. The RL learning problem then becomes minimizing the MSE loss between the target and the approximator, defined as:
l(w_k) = (r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a'; w_k) − Q(s_{i,k}, a_{i,k}; w_k))²   (23)
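The MSE objective of Eq. (23) reduces to a squared temporal-difference error; the following is a minimal sketch with scalar Q-values, all names and numbers illustrative.

```python
def td_loss(q_sa, r, q_next_max, gamma=0.9):
    """Squared TD error of Eq. (23): target = r + gamma * max_a' Q(s', a')."""
    target = r + gamma * q_next_max
    return (target - q_sa) ** 2

# Example: Q(s,a)=0.5, reward 1.0, best next-state value 0.8, gamma=0.9.
print(round(td_loss(q_sa=0.5, r=1.0, q_next_max=0.8), 4))  # 1.4884
```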
The CS updates the global parameter w_k as:
where η ≥ 0 is the step size.
The CS repeats the above steps to obtain the best learning model. The CS then obtains the k-th round weight-ratio sequence a_{i,k}, and the global parameters are updated as:
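The weighted global aggregation using the RL-selected ratios a_{i,k} can be sketched as follows, assuming the weights sum to 1; the function name and the toy gradients are illustrative.

```python
def aggregate(local_grads, weights):
    """Weighted aggregation of the edge servers' base-layer gradients using
    the RL-selected weight ratios a_{i,k} (assumed to sum to 1)."""
    dim = len(local_grads[0])
    return [sum(w * g[j] for w, g in zip(weights, local_grads))
            for j in range(dim)]

grads = [[1.0, 2.0], [3.0, 4.0]]   # decrypted base-layer gradients from 2 ESs
a = [0.25, 0.75]                   # a_{i,k} selected by PreFLa
print(aggregate(grads, a))         # [2.5, 3.5]
```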
Experimental test examples:
To verify the validity of the proposed mechanism, experimental results and analysis are given. Consider a system with 1 cloud server and 10 edge servers. The experimental learning rate α is 0.01 and the discount factor γ is 0.9. The positive base in the reward is set to 3. The parameter values are shown in Table 2.
TABLE 2 parameter settings
The validity of the proposed model was verified on two data sets: MNIST and CIFAR-10. The performance of the proposed federated learning model MePC-F was evaluated in terms of DLG-reconstructed images, average accuracy, and average loss. First, the five schemes' performance in defending against DLG attacks is evaluated; then the proposed federated learning model MePC-F is compared with the centralized scheme and PeMPC. All results below are the mean of 1000 independent experiments.
1) Performance against DLG attacks
This section evaluates the effectiveness of MePC-F against DLG image reconstruction and compares it with the FL, PeMPC, and DP algorithms (Gaussian- and Laplace-distributed noise). The common gradient of the network is computed for a single image on the MNIST dataset; the results of the different schemes are shown in FIG. 4. Since the study [17] indicates that hiding the gradient of the first layer can reduce reconstruction of the data, the gradient of the first layer (weight and bias terms) is replaced by four methods: the proposed MePC-F, PeMPC, Gaussian-distributed noise (μ = 0, σ = 1), and Laplace-distributed noise (μ = 0, σ = 1), to observe the behavior of DLG. After the first layer's gradients are hidden, DLG uses the remaining gradients to attempt to recover the image that produced the common shared gradient.
As can be seen from FIG. 4, without any method to hide the first layer's gradient (FL in FIG. 4(a)), the DLG process can accurately reconstruct the training data. When the first layer's gradient is protected by the proposed method, MePC-F, information leakage is effectively prevented (FIG. 4(b)): even when the number of iteration steps reaches 500, DLG still cannot construct an image. FIG. 4(c) shows results similar to FIG. 4(b); PeMPC can also defend against the DLG attack. As seen in FIG. 4(d), when Gaussian noise is added to the first layer, the reconstructed image becomes partially visible from round 15 to round 20, at which point the basic contour of the original image has been constructed; as the number of iteration rounds increases to 500, the image can be restored clearly. The Laplacian noise in FIG. 4(e) exhibits a similar phenomenon to the Gaussian noise.
As can be seen from FIG. 5, if a malicious server receives the gradients of all hidden layers as plaintext, the reconstruction process attains the lowest gradient loss and image MSE (green line in FIG. 5). As the number of rounds increases, PeMPC and MePC-F do not converge to zero, and the image MSE reaches 10^7. Adding Laplacian or Gaussian noise to the original gradient converges to 10^-5; FIG. 4 likewise demonstrates that the data can be reconstructed by around round 20. The larger the image MSE, the less likely the image is to be reconstructed.
Based on the above experimental results, it is verified that adding Laplacian or Gaussian noise to the original gradient can prevent early partial gradient leakage, but as the number of rounds increases, the original data is still recovered through deep leakage. By contrast, PeMPC and MePC-F are effective methods for preventing DLG attacks from reconstructing the raw data, regardless of the number of training rounds.
2) Performance comparison of average accuracy and average penalty
In this section, the effectiveness of MePC-F is evaluated and compared with the centralized scheme and PeMPC in terms of average accuracy and average loss on the MNIST and CIFAR-10 data sets.
FIG. 6(a) shows the number of rounds the models require to achieve 98% accuracy on the MNIST dataset. The average accuracy of all three methods increases with the number of training rounds. The centralized approach requires 25 rounds to reach the target accuracy on MNIST, PeMPC requires 140 rounds, and MePC-F requires 40 rounds; MePC-F thus needs 71.2% fewer training rounds than PeMPC. The reason is that the proposed reinforced federated learning algorithm PreFLa can find better aggregation parameter weights a_{i,k} through interaction with the environment, handle non-IID data better, accelerate model convergence, and reach the target accuracy sooner. The centralized scheme is trained on all the data combined, so its accuracy is higher than that of the federated learning algorithms. But the figure shows that PeMPC's convergence can almost reach centralized accuracy.
FIG. 6(b) shows that the average loss of all three schemes decreases as the number of training rounds increases. For the centralized scheme, the average loss decreases from 0.233 to 0.052. The average loss of PeMPC decreases from 0.35 to 0.084. Meanwhile, the average loss of the proposed MePC-F decreases to 0.06, which is 28.6% lower than PeMPC's. The proposed MePC-F almost reaches the centralized loss value when the number of training rounds reaches 100.
FIG. 7(a) shows the number of rounds required for the models to achieve the 50% target accuracy on CIFAR-10, with results similar to FIG. 6(a). The average accuracy of all three models increases until the target value is reached. For the centralized scheme, the average accuracy increases from 0.42 to 0.5 over 23 rounds. The average accuracy of PeMPC increases from 0.372 to 0.5 over 89 rounds. Meanwhile, the proposed MePC-F reaches the target accuracy at 41 rounds, 53.9% fewer than PeMPC. FIG. 7(a) shows that MePC-F updates the global model with better weights a_{i,k} than PeMPC, which results in faster convergence.
As can be seen from FIG. 7(b), the average loss of the three schemes decreases until a stable value is reached. The centralized scheme, MePC-F, and PeMPC reach the minimum loss value in that order, so the time efficiency of the proposed MePC-F is better than PeMPC's.
TABLE 3 Top accuracy of the three schemes within 100 rounds

| | MNIST | CIFAR-10 |
|---|---|---|
| centralized | 98.4% | 51.4% |
| MePC-F | 98.2% | 51.1% |
| PeMPC | 97.6% | 49.2% |
Table 3 gives the top accuracy of the three schemes within 100 rounds. For the MNIST data, the average accuracy of the proposed MePC-F is 98.2%, 0.6% higher than PeMPC's, while PeMPC's accuracy almost reaches that of centralized training. For the CIFAR-10 data, the average accuracy of MePC-F reaches 0.511 at 100 rounds, 1.9% higher than PeMPC's. This shows that MePC-F aggregates the global parameters with better, optimally updated weights a_{i,k} than PeMPC, resulting in higher accuracy that is closer to the centralized accuracy.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (8)
1. A real-time reinforced Federal learning data privacy security method based on a MePC-F model in the Internet of vehicles is characterized by comprising the following steps:
S1, constructing a plurality of edge servers E_i and a cloud server CS; acquiring vehicle data D = {D_1, D_2, …, D_i}, with edge server E_i acquiring the corresponding vehicle data D_i;
S2, in the k-th round of the federated task, edge server E_i downloads the initial class-A gradient from the cloud server CS and decrypts it, and randomly initializes the class-B gradient; edge server E_i computes gradients in local network model training according to its vehicle data D_i, and records the gradient information after completing T rounds of local training;
S3, edge server E_i uses its decoding function to extract from the shared gradient the partial gradient information to be retained locally, homomorphically encrypts the remaining gradient information, and broadcasts it to all other edge servers E_j through the MePC algorithm; edge server E_i obtains, according to the decoding functions, the corresponding partial gradient information from the other edge servers E_j; the class-A gradient information after all edge servers have updated and shared is obtained for each i ∈ [1,n], where n is the total number of edge servers;
S4, all edge servers upload the shared class-A gradient information to the cloud server CS; the cloud server CS aggregates the global parameters through the PreFLa algorithm, which uses reinforcement learning to maximize the reward and thereby selects edge server E_i's optimal parameter weight ratio a_{i,k}; the global gradient parameters are aggregated according to a_{i,k}; the uploading and downloading of parameters proceed in parallel, and all parameters are encrypted with HE;
and S5, repeating steps S2 to S4 until a termination condition is reached; the cloud server CS computes the final global gradient parameters and issues them to each edge server; each edge server extracts the accuracy and optimal loss function of the MePC-F model according to the characteristics of the multiple vehicle data, obtaining the trained MePC-F model; the whole training process is completed, and the model is output in real time to the corresponding Internet-of-Vehicles service.
2. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 1, wherein in the step S2, the specific method for training the local network model is as follows:
employing a deep neural network (DNN) model, the DNN performing end-to-end feature learning and classifier training by taking different vehicle data as raw input, and using stochastic gradient descent as a subroutine to minimize the loss value in each local training round;
E_i downloads the base-layer parameters from the cloud server CS in the k-th round of communication, i.e., the initial class-A gradient before decryption, and decrypts it into the class-A gradient; it randomly initializes the class-B gradient, where k ∈ [1,K] and K represents the total number of rounds of the federated task; if the task is the first round of the federated task, the CS randomly initializes the class-A gradient; before local training, E_i decrypts it using the homomorphic encryption pair, the result being recorded accordingly;
The loss function of the local model is set as follows:
L(w_i) = l(w_i) + λ(w_{i,t} − w_{i,t+1})²
where l(·) represents the loss of the network, the second term is the L2 regularization term, and λ is the regularization coefficient; w_i represents the total weight information of the local model, w_{i,t} is the weight information of the local model at time t, and w_{i,t+1} is the weight information of the local model at time t + 1;
E_i updates G_k and replaces the model's weight parameters w_i, continuing local model training by minimizing the loss function as follows:
w_i = w_i − ηG_k
where η is the learning rate and G_k is the general expression for the class-A and class-B gradients; here the class-B gradient is randomly initialized;
3. the real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein in the step S3, the specific method of the MePC algorithm is as follows:
in the k-th federated task, all edge servers use MePC to exchange the base-layer gradients; the exchanged quantities denote, respectively, the class-A encrypted data of the n-th edge server in the k-th round of the federated task, the class-A encrypted data of the i-th edge server in the k-th round of the federated task, the class-A encrypted data that the i-th edge server in the k-th round broadcasts to the other edge servers, and the encrypted data that remains after removing the portion retained by the server itself;
to avoid the risk of the data being cracked, a random ratio χ of the gradient is retained in each network, and the same random ratio χ is kept within the same federated round; the retained portion is then encrypted; the random ratio χ varies across different rounds of the federated task, with χ ∈ [1,1/n]; the remaining gradient is homomorphically encrypted and divided into n − 1 portions, the values being divided as follows:
only the retained portion stays at E_i; the other portions and the random parameter χ are broadcast to the other E_j in ciphertext form; in this way, even if part of the transmitted content is attacked, the original data will not leak;
4. The real-time reinforced federal learning data privacy security method in the internet of vehicles based on the MePC-F model as claimed in claim 3, wherein in the step S3, the specific method for locally performing data verification is as follows:
in the k-th round of federated tasks, verification is performed using the corresponding "multiplication" method, and each edge server designs two decoding functions by itself, as follows:
where L_0 and L' denote the respective lengths of the corresponding gradient portions; the subscript k of the decoding function denotes the decoding function in the k-th round of the federated task;
L_0 = χ·L
it is required that the decoding functions of all edge servers, applied to the same data packet, yield all 0s under the bitwise AND operation and all 1s under the bitwise OR operation, namely:
first, the initial decoding function is as follows:
the data packet is multiplied by the corresponding decoding functions in the other servers; since the binary 0 bits multiply to 0, E_i ensures that only its own partial data packet is obtained; where the binary bit is 1, the ciphertext of the gradient information at the corresponding position is obtained as follows:
E_i adds all data-packet arrays obtained from the other edge servers E_j at the corresponding positions to obtain the complete ciphertext data and updates it as the final ciphertext, namely:
5. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein in the step S4, the specific method of the PreFla algorithm is as follows:
PreFLa adopts reinforcement learning (RL) to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters;
in the uplink communication phase, each edge server not only trains its local model but also uploads the local parameters to the cloud server CS for joint aggregation; after executing the MePC algorithm in the k-th federated round, E_i uploads its parameters to the CS over a TLS/SSL secure channel; in the aggregation stage, because each ES's data distribution is unbalanced and heterogeneous, the model parameters used for aggregation have a crucial influence on the convergence speed of the aggregation stage; therefore, participant E_i's parameter weight ratio a_{i,k} must be considered in the k-th round of federated aggregation;
DQN-based reinforcement learning is used to predict the parameter weight ratio, storing information through a Q function to avoid the curse of dimensionality; to better realize model personalization and reduce the waiting time for uploading weights in MePC-F, DQN is used to select the optimal parameter weight ratio a_{i,k} for aggregating the global parameters when updating the CS; the reinforcement learning comprises: state, action, reward function, and feedback.
6. The real-time reinforced federal learning data privacy security method in internet of vehicles based on the MePC-F model as claimed in claim 5, wherein in the step S4, the specific methods of status, action, reward function and feedback are as follows:
the state: the state of the k-th round, where the accuracy difference term is expressed as:
the action: the parameter weight ratio a_{i,k} represents the action of the k-th round of federated tasks; to avoid becoming trapped in a local optimum, an ε-greedy algorithm is adopted to optimize the action-selection process, obtaining a_{i,k}:
where P is a set of weight permutations and rand is a random number, rand ∈ [0,1]; Q(s_{i,k}, a_{i,k}) denotes the cumulative discounted return when the agent takes action a_{i,k} in state s_{i,k}; once DQN is trained to approximate Q(s_{i,k}, a_{i,k}), during testing the DQN agent computes {Q(s_{i,k}, a_{i,k}) | a_{i,k} ∈ [P]} for all actions in the k-th round; each action value represents the maximum expected return the agent obtains by selecting a particular action a_{i,k} in state s_{i,k};
reward: the observed reward at the end of the k-th federated round is set as:
where the base is a positive number ensuring that r_k grows exponentially with the training-accuracy change Δacc_{i,k}; the first term incentivizes the agent to select devices that achieve higher test accuracy, and it controls how r_k changes as Δacc_{i,k} increases; when Δacc_{i,k} < 0, r_k ∈ (−1, 0);
Training the DQN agent to maximize the expectation of a cumulative discount reward, as shown by:
where γ ∈ (0,1] is a factor that discounts future rewards;
after obtaining r_k, the cloud server CS saves the multi-dimensional quadruple B_k = (s_{i,k}, a_{i,k}, r_k, s_{i,k+1}) for each round of federated tasks; the optimal action-value function Q(s_{i,k}, a_{i,k}) is the quantity sought by the RL agent, defined as the maximum expectation of the cumulative discounted return starting from s_{i,k}:
Q(s_{i,k}, a_{i,k}) = E(r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a') | s_{i,k}, a_{i,k})
a parameterized value function Q(s_{i,k}, a_{i,k}; w_k) is learned using function-approximation techniques to approximate the optimal function Q(s_{i,k}, a_{i,k}); r_k + γ max_{a'} Q(s_{i,k+1}, a') is the learning target of Q(s_{i,k}, a_{i,k}; w_k); a DNN is used as the function approximator; the RL learning problem becomes minimizing the MSE loss between the target and the approximator, defined as:
l(w_k) = (r_{i,k} + γ max_{a'} Q(s_{i,k+1}, a'; w_k) − Q(s_{i,k}, a_{i,k}; w_k))²
the CS updates the global parameter w_k as:
where η ≥ 0 is the step size;
after the cloud server CS obtains the optimal learning model, it obtains the k-th round weight-ratio sequence a_{i,k}, and the global parameters are updated as:
7. The real-time reinforced federal learning data privacy security method based on the MePC-F model in the internet of vehicles according to claim 1, wherein the HE encryption method in the method specifically comprises the following steps:
the encryption schemes for the weight matrix and the bias vector follow the same idea, and the additive homomorphic encryption of a real number a is denoted a^E; in additive homomorphic encryption, for any two numbers a and b, a^E + b^E = (a + b)^E; the method of converting an arbitrary real number r into an encoded rational fixed-point number v is:
each encoded real number r in the gradient can be represented as a rational H-bit number consisting of one sign bit, z integer bits, and d fractional bits; thus each encodable rational number is defined by its H = 1 + z + d bits; the encoding is performed to allow multiplication, which requires the working modulus to be at least H + 2d bits to avoid overflow;
the decoding is defined as:
multiplying these encoded numbers introduces an extra scaling factor of 2^d that must be divided out; when Paillier additive encryption is used, the encoded multiplication can be computed exactly, but only one homomorphic multiplication can be guaranteed; for simplicity, the rescaling is handled at decoding time;
the largest encryptable integer is V − 1, so the largest encodable real number must take this into account; therefore the integer width z and the fractional width d are chosen as follows:
V ≥ 2^(H+2d) ≥ 2^(1+z+3d).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210816716.3A CN115310121B (en) | 2022-07-12 | 2022-07-12 | Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210816716.3A CN115310121B (en) | 2022-07-12 | 2022-07-12 | Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115310121A true CN115310121A (en) | 2022-11-08 |
CN115310121B CN115310121B (en) | 2023-04-07 |
Family
ID=83857637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210816716.3A Active CN115310121B (en) | 2022-07-12 | 2022-07-12 | Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115310121B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115731424A (en) * | 2022-12-03 | 2023-03-03 | 北京邮电大学 | Image classification model training method and system based on enhanced federal domain generalization |
CN115860789A (en) * | 2023-03-02 | 2023-03-28 | 国网江西省电力有限公司信息通信分公司 | FRL-based CES day-ahead scheduling method |
CN117812564A (en) * | 2024-02-29 | 2024-04-02 | 湘江实验室 | Federal learning method, device, equipment and medium applied to Internet of vehicles |
CN117873402A (en) * | 2024-03-07 | 2024-04-12 | 南京邮电大学 | Collaborative edge cache optimization method based on asynchronous federal learning and perceptual clustering |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111160573A (en) * | 2020-04-01 | 2020-05-15 | 支付宝(杭州)信息技术有限公司 | Method and device for protecting business prediction model of data privacy joint training by two parties |
CN111611610A (en) * | 2020-04-12 | 2020-09-01 | 西安电子科技大学 | Federal learning information processing method, system, storage medium, program, and terminal |
CN112100295A (en) * | 2020-10-12 | 2020-12-18 | 平安科技(深圳)有限公司 | User data classification method, device, equipment and medium based on federal learning |
CN112199702A (en) * | 2020-10-16 | 2021-01-08 | 鹏城实验室 | Privacy protection method, storage medium and system based on federal learning |
CN112347500A (en) * | 2021-01-11 | 2021-02-09 | 腾讯科技(深圳)有限公司 | Machine learning method, device, system, equipment and storage medium of distributed system |
CN113037460A (en) * | 2021-03-03 | 2021-06-25 | 北京工业大学 | Federal learning privacy protection method based on homomorphic encryption and secret sharing |
CN113435472A (en) * | 2021-05-24 | 2021-09-24 | 西安电子科技大学 | Vehicle-mounted computing power network user demand prediction method, system, device and medium |
US20220129700A1 (en) * | 2020-10-27 | 2022-04-28 | Alipay (Hangzhou) Information Technology Co., Ltd. | Methods, apparatuses, and systems for updating service model based on privacy protection |
Non-Patent Citations (1)
Title |
---|
SUN Shuang: "A survey of federated learning security and privacy protection in different scenarios", 《计算机应用研究》 (Application Research of Computers) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||