CN115310121B

CN115310121B - Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles

Info

Publication number: CN115310121B
Application number: CN202210816716.3A
Authority: CN
Inventors: 朱容波; 李梦瑶; 刘浩
Original assignee: Huazhong Agricultural University
Current assignee: Huazhong Agricultural University
Priority date: 2022-07-12
Filing date: 2022-07-12
Publication date: 2023-04-07
Anticipated expiration: 2042-07-12
Also published as: CN115310121A

Abstract

The invention discloses a real-time reinforced federal learning data privacy security method based on a MePC-F model in the internet of vehicles, which comprises the following steps: building multiple edge servers E _i And a cloud server CS; edge server E _i Downloading initial type A gradients from cloud server CS

And decrypted into

Random initialization of type B gradients

Carrying out local model training; edge server E _i By a decoding function from

To obtain partial gradient information to be preserved

And the remaining gradient information is used

Is homomorphically encrypted as

Then broadcast and send to all other edge servers E through MePC algorithm _j (ii) a The class A gradient information after all the edge servers are updated and shared is respectively

All edge servers will

Uploading the global parameters to a cloud server CS, and aggregating the global parameters by the cloud server CS through a PreFLa algorithm; the above steps are repeated until a termination condition is reached. The invention prevents data leakage between terminals, realizes data privacy safety protection, and reduces communication overhead while preventing original data leakage.

Description

Real-time enhanced federated learning data privacy security based on MePC-F model in Internet of Vehicles Method

技术领域Technical Field

本发明涉及联网车辆用户协同处理实时安全行为分析技术领域，尤其涉及一种车联网中基于MePC-F模型的实时强化联邦学习数据隐私安全方法。The present invention relates to the technical field of real-time safety behavior analysis for collaborative processing by networked vehicle users, and in particular to a real-time enhanced federated learning data privacy security method based on a MePC-F model in a network of vehicles.

背景技术Background Art

随着车联网支持各种实时通信和服务的发展，通过车载单元等互联设备生成的数据量空前巨大，面向车辆用户的大量异构性数据和设备计算能力的差异性，联邦学习为满足网络模型实时训练过程中数据安全保护要求提供了一种有效的解决方案，它可以让不同的边缘设备在不暴露原始数据的情况下协同训练机器学习模型。With the development of the Internet of Vehicles supporting various real-time communications and services, the amount of data generated by interconnected devices such as vehicle-mounted units is unprecedentedly huge. Faced with the large amount of heterogeneous data of vehicle users and the differences in device computing capabilities, federated learning provides an effective solution to meet the data security protection requirements during real-time training of network models. It allows different edge devices to collaboratively train machine learning models without exposing the original data.

边缘计算海量数据与用户个人隐私联合紧密，例如，用户的轨迹、信用卡、账单等数据，切实关系到用户隐私安全，如发生数据泄露，将给用户带来重大安全隐患。联邦学习可以在一定程度上保护数据，但依旧存在信息泄露的风险，主要有四种类型:1)成员泄露，2)非预期特征泄露，3)代表原始数据泄露的类，4)原始数据泄露。最后一种类型的数据泄漏对于隐私敏感的参与者来说是最不可接受的。The massive data of edge computing is closely related to the personal privacy of users. For example, the user's trajectory, credit card, bill and other data are closely related to the privacy and security of users. If data leakage occurs, it will bring major security risks to users. Federated learning can protect data to a certain extent, but there is still the risk of information leakage, which mainly includes four types: 1) member leakage, 2) unexpected feature leakage, 3) class representing the original data leakage, and 4) original data leakage. The last type of data leakage is the most unacceptable for privacy-sensitive participants.

为了保护移动用户的数据隐私性，解决上述原始数据泄露问题，研究者们在基于密码学的数据安全保护进行了大量的研究：差分隐私、同态加密、多方安全计算。差分隐私通常使用三种噪声添加机制：分别为拉普拉斯机制、高斯机制和指数机制。通过添加噪声来干扰上下文信息，以保护数据的隐私，但如果噪声增加过多则会影响模型训练的性能。同态加密中常用的是加法和乘法同态加密：研究表明，使用Paillier加法同态加密计算时，噪声会增加一倍，而使用El Gamal乘法同态加密计算时，噪声呈二次增长。为了增加数据的可用性并克服噪声问题，研究者引入bootstrapping，通过设定阈值进行加密解密降低噪声，从而允许该方案计算无限次的操作。还可以进行批处理，或者进行并行同态计算或删除对的压缩来解决噪声问题。安全多方计算是指在无可信第三方的条件下多方参与者安全地计算一个约定函数的问题，主要目的是在计算过程中必须保证各方私密输入独立，计算时不泄露任何本地数据。有研究证明使用安全多方计算可以解决联邦学习中的梯度泄露问题，并证明只需对第一隐藏层进行信息交换就可在保证精确度的同时进行数据安全保护。但它的信息交互的过程是P2P的，所以会出现通信开销大的问题。In order to protect the data privacy of mobile users and solve the above-mentioned original data leakage problem, researchers have conducted a lot of research on data security protection based on cryptography: differential privacy, homomorphic encryption, and multi-party secure computing. Differential privacy usually uses three noise addition mechanisms: Laplace mechanism, Gaussian mechanism, and exponential mechanism. By adding noise to interfere with contextual information, the privacy of data is protected, but if the noise increases too much, the performance of model training will be affected. Additive and multiplicative homomorphic encryption are commonly used in homomorphic encryption: studies have shown that when using Paillier additive homomorphic encryption, the noise will double, while when using El Gamal multiplicative homomorphic encryption, the noise will increase quadratically. In order to increase the availability of data and overcome the noise problem, researchers introduced bootstrapping, which reduces noise by setting a threshold for encryption and decryption, allowing the scheme to calculate an unlimited number of operations. Batch processing, parallel homomorphic computing, or deletion of compression can also be performed to solve the noise problem. Secure multi-party computing refers to the problem of multiple participants securely calculating an agreed function without a trusted third party. The main purpose is to ensure that the private inputs of all parties are independent during the calculation process and no local data is leaked during the calculation. Research has shown that using secure multi-party computing can solve the gradient leakage problem in federated learning, and that only information exchange on the first hidden layer is needed to ensure accuracy while protecting data security. However, the information exchange process is P2P, so there will be a problem of high communication overhead.

大部分基于密码学的数据安全保护研究都是集中式解决方法，为了在数据安全保护的同时解决时间开销问题：联邦学习可以让边缘设备在不暴露原始数据的情况下协同训练机器学习模型。联邦学习通常采用参数服务器架构，其中客户端由参数服务器同步局部模型训练。通常使用同步方法实现，即中央服务器将全局模型同步发送给多个客户机，多个客户机基于本地数据训练模型后同步将更新后的模型返回中央服务器。这可能会因为掉队而变得缓慢。由于计算能力和电池时间有限，可用性和完成时间因设备而异，因此全局同步非常困难，尤其是在联合学习场景中。有人提出了一种新的联合优化异步算法来解决正则化局部问题以保证收敛，使得多个设备和服务器能够在不泄露隐私的情况下协同高效地训练模型。Most cryptography-based data security protection research is a centralized solution. In order to solve the time overhead problem while protecting data security: Federated learning allows edge devices to collaboratively train machine learning models without exposing raw data. Federated learning usually adopts a parameter server architecture, in which the client synchronizes local model training by the parameter server. It is usually implemented using a synchronous method, that is, the central server sends the global model to multiple clients synchronously, and multiple clients train the model based on local data and then synchronously return the updated model to the central server. This may become slow due to falling behind. Due to limited computing power and battery time, availability and completion time vary from device to device, so global synchronization is very difficult, especially in federated learning scenarios. A new joint optimization asynchronous algorithm has been proposed to solve the regularization local problem to ensure convergence, so that multiple devices and servers can collaboratively and efficiently train models without leaking privacy.

尽管在数据安全方面有很多的研究。但大多数都限制于解决原始数据安全问题，如何在复杂车联网空间中，同时满足移动用户大数据隐私性和可用性为目标，设计一个有效的联邦学习算法来减少通信开销的同时防止梯度泄露后导致数据被恢复的问题依旧是开放的。Although there are many studies on data security, most of them are limited to solving the original data security problem. How to meet the privacy and availability of mobile users' big data in the complex Internet of Vehicles space, and design an effective federated learning algorithm to reduce communication overhead while preventing data recovery due to gradient leakage are still open issues.

首先，联邦学习中数据都存储在本地节点中，可以减少数据传输中原始数据泄露的风险问题。但仅仅只传输梯度信息，依旧会出现原始数据被恢复的可能性。安全多方计算中数据交互可以使得多方拥有数据，降低梯度信息被泄露后样本被信息恢复的可能性。但现有的安全多方计算中用户交互信息的方式是所有用户都发送给其他用户，简单讲就是使用单播的方式，这样就会带来较高的时间开销。所以在应对车辆用户的数据安全和实时性需求时，找到一个合适的解决方案降低数据被攻击和被恢复的风险，并降低传输时延就很重要。其次，由于不同边缘服务器的数据和设备差异性，在训练过程中有针对性的提高整个模型训练精度也是很有必要的。采用典型的联邦平均同步方式进行全局参数聚合会出现掉队现象而变得缓慢。在平衡计算通信时间开销的同时，多个模型个性化训练来保障全局精度也是很重要。然而，大多数基于数据安全的联邦学习算法依赖于同步聚合算法会带来高时延对满足车联网的实时性需求具有挑战性。因此一个基于强化学习的联邦学习算法来降低时延提高精确度并保障数据安全是有必要的。First, in federated learning, data is stored in local nodes, which can reduce the risk of original data leakage during data transmission. However, if only gradient information is transmitted, there is still a possibility that the original data can be recovered. Data interaction in secure multi-party computing allows multiple parties to have data, reducing the possibility of sample information recovery after gradient information is leaked. However, the existing way of user interaction information in secure multi-party computing is that all users send it to other users, which is simply to use unicast, which will bring high time overhead. Therefore, when dealing with the data security and real-time requirements of vehicle users, it is important to find a suitable solution to reduce the risk of data being attacked and recovered, and reduce transmission delay. Secondly, due to the differences in data and devices of different edge servers, it is also necessary to improve the training accuracy of the entire model in a targeted manner during the training process. Using the typical federated average synchronization method for global parameter aggregation will cause falling behind and become slow. While balancing the computational communication time overhead, it is also important to train multiple models in a personalized way to ensure global accuracy. However, most federated learning algorithms based on data security rely on synchronous aggregation algorithms, which will bring high latency and are challenging to meet the real-time requirements of the Internet of Vehicles. Therefore, a federated learning algorithm based on reinforcement learning is necessary to reduce latency, improve accuracy and ensure data security.

发明内容Summary of the invention

本发明要解决的技术问题在于针对现有技术中的缺陷，提供一种车联网中基于MePC-F模型的实时强化联邦学习数据隐私安全方法。The technical problem to be solved by the present invention is to provide a real-time enhanced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles in response to the defects in the prior art.

本发明解决其技术问题所采用的技术方案是：The technical solution adopted by the present invention to solve the technical problem is:

本发明提供一种车联网中基于MePC-F模型的实时强化联邦学习数据隐私安全方法，该方法包括以下步骤：The present invention provides a real-time enhanced federated learning data privacy security method based on the MePC-F model in an Internet of Vehicles, the method comprising the following steps:

S1、构建多个边缘服务器E_i和一个云服务器CS；获取车辆数据D＝{D₁，D₂，…，D_i}，边缘服务器E_i获取对应车辆数据D_i；S1. Build multiple edge servers E _i and a cloud server CS; obtain vehicle data D = {D ₁ , D ₂ , ..., D _i }, and the edge server E _i obtains the corresponding vehicle data D _i ;

S2、在第k轮联邦任务中，边缘服务器E_i从云服务器CS中下载初始A型梯度

并解密为

随机初始化B型梯度

边缘服务器E_i根据其车辆数据D_i的最小化损失函数来计算本地网络模型训练中的梯度，边缘服务器E_i完成T轮本地训练完后的梯度信息记为

S2. In the kth round of federated tasks, the edge server E _i downloads the initial A-type gradient from the cloud server CS

and decrypted to

Randomly initialize B-type gradient

The edge server E _i calculates the gradient in the local network model training according to the minimized loss function of its vehicle data D _i . The gradient information of the edge server E _i after completing T rounds of local training is recorded as

S3、边缘服务器E_i通过解码函数

从

中获取需要保留的部分梯度信息

并将剩余的梯度信息

经同态加密为

再通过MePC算法广播发送给其它所有的边缘服务器E_j；边缘服务器E_i根据解码函数

获取来自其它边缘服务器E_j的对应部分梯度信息

所有边缘服务器更新共享后的A类梯度信息分别为

i∈[1，n]，n为边缘服务器的总数；S3, edge server E _i through decoding function

from

Get some gradient information that needs to be retained

And the remaining gradient information

After homomorphic encryption,

Then, the MePC algorithm is used to broadcast the message to all other edge servers _Ej ; the edge server _Ei receives the message according to the decoding function

Get the corresponding partial gradient information from other edge servers _Ej

The updated and shared A-type gradient information of all edge servers are

i∈[1,n], n is the total number of edge servers;

S4、所有边缘服务器将

上传到云服务器CS，云服务器CS通过PreFLa算法聚合全局参数，PreFLa算法通过强化学习获得最大化回报来选择边缘服务器E_i的最优参数权重比a_i，k，全局梯度参数

根据a_i，k进行聚合；参数的上传和下载过程是并行的，所有参数都经过HE加密；S4, all edge servers will

Uploaded to the cloud server CS, the cloud server CS aggregates global parameters through the PreFLa algorithm, and the PreFLa algorithm selects the optimal parameter weight ratio a _{i, k} of the edge server E _i and the global gradient parameter by maximizing the return through reinforcement learning

Aggregate according to a _{i, k} ; the parameter upload and download process is parallel, and all parameters are HE encrypted;

S5、重复步骤S2-S4，直到达到终止条件，云服务器CS计算最终的全局梯度参数，下发给各边缘服务器，边缘服务器根据多个车辆数据的特征提取，计算MePC-F模型的精确度和最优损失函数，得到训练好的MePC-F模型，完成整个训练过程，实时输出给车联网对应的服务。。S5. Repeat steps S2-S4 until the termination condition is reached. The cloud server CS calculates the final global gradient parameters and sends them to each edge server. The edge server extracts features from multiple vehicle data, calculates the accuracy and optimal loss function of the MePC-F model, obtains the trained MePC-F model, completes the entire training process, and outputs it to the corresponding service of the Internet of Vehicles in real time.

进一步地，本发明的所述步骤S2中，本地网络模型训练的具体方法为：Furthermore, in step S2 of the present invention, the specific method of training the local network model is:

采用深度神经网络DNN模型，DNN通过将不同车辆数据作为原始输入来执行端到端的特征学习和分类器训练，使用随机梯度下降作为子程序来最小化每个本地训练中损失值；A deep neural network (DNN) model is used. The DNN performs end-to-end feature learning and classifier training by taking different vehicle data as raw input, and uses stochastic gradient descent as a subroutine to minimize the loss value in each local training.

E_i在第k轮通信中从云服务器CS下载基础层参数，即解密前的初始A型梯度

并解密为A型梯度

随机初始化B型梯度

其中，k∈[1，K]，K表示联邦任务的总轮数；若为第一轮联邦任务，CS随机初始化

在本地训练之前，E_i通过使用同态加密对

解密为

并记为

E _i downloads the base layer parameters from the cloud server CS in the kth round of communication, i.e., the initial A-type gradient before decryption

and decrypted into a type A gradient

Randomly initialize B-type gradient

Among them, k∈[1, K], K represents the total number of rounds of the federated task; if it is the first round of the federated task, CS is randomly initialized

Before local training, E _i is encrypted using homomorphic encryption

Decrypted to

And record it as

局部模型的损失函数设置如下：The loss function of the local model is set as follows:

L(w_i)＝l(w_i)+λ(w_i，t-w_i，t+1)² L( _wi )=l( _wi )+λ( _wi,t - _wi,t+1 ) ²

其中，l()表示网络的损失，第二项是L2正则化项，λ是正则化系数；w_i表示局部模型中的总权重信息，w_i，t是局部模型在t时刻的权重信息，w_i，t+1是局部模型在t+1时刻的权重信息；Among them, l() represents the loss of the network, the second term is the L2 regularization term, and λ is the regularization coefficient; _wi represents the total weight information in the local model, _wi,t is the weight information of the local model at time t, and _wi,t+1 is the weight information of the local model at time t+1;

E_i初始化G_k并替换模型的权重参数w_i，通过最小化损失函数继续进行局部模型训练如下：E _i initializes G _k and replaces the model's weight parameters w _i , and continues local model training by minimizing the loss function as follows:

w_i＝w_i-ηG_k w _i = w _i −ηG _k

其中，η是学习率，G_k是

和

的总表示，这里的

随机初始化；Where η is the learning rate and _Gk is

and

The total representation here is

Random initialization;

边缘服务器E_i在达到T轮本地训练后，此时会得到每个局部模型的准确率acc_i，k和

After the edge server E _i reaches T rounds of local training, the accuracy of each local model acc _i,k and

进一步地，本发明的所述步骤S3中，MePC算法的具体方法为：Furthermore, in step S3 of the present invention, the specific method of the MePC algorithm is:

第k轮联邦任务中，所有的边缘服务器使用MePC来交换基础层梯度

其中，

表示第k轮联邦任务中第n个边缘服务器的A类加密数据，

表示第k轮联邦任务中第i个边缘服务器的A类加密数据，

表示第k轮联邦任务中第i个边缘服务器广播发给其他边缘服务器的A类加密数据，

即为

去除了自己保留的那一份后的加密数据；In the kth round of federated tasks, all edge servers use MePC to exchange base layer gradients.

in,

represents the encrypted data of type A of the nth edge server in the kth round of federated tasks,

represents the encrypted data of type A of the i-th edge server in the k-th round of federated tasks,

It indicates the encrypted data of type A broadcasted by the i-th edge server to other edge servers in the k-th round of federated tasks.

That is

The encrypted data after removing the copy you keep;

为了避免数据被破解的风险，在每个网络中取随机比例χ的

梯度即为

并保持同一轮联邦的随机比例χ相同，再将

加密为

在不同轮的联邦任务中，随机比例χ是变化的，χ∈[1，1/n]；

剩下的梯度通过同态加密为

被均分为n-1份

的值划分为：In order to avoid the risk of data being cracked, a random ratio of χ is taken in each network.

The gradient is

Keep the random proportion χ of the federation in the same round the same, and then

Encrypted as

In different rounds of federated tasks, the random ratio χ varies, χ∈[1, 1/n];

The remaining gradients are homomorphically encrypted as

Divided equally into n-1 parts

The values are divided into:

只有

被保留在E_i中，其它部分和随机参数χ将会以密文的形式广播发送给其它E_j；通过这种方式，即使部分传输内容被攻击，最初的数据

也不会泄露；only

is retained in E _i , and the other parts and random parameters χ will be broadcasted to other E _j in the form of ciphertext; in this way, even if part of the transmission content is attacked, the original data

It will not be leaked;

共享给其它E_j的梯度信息是

The gradient information shared to other E _j is

当Ei接收到由其它服务器发送的数据包

它在本地执行数据验证。When Ei receives a data packet sent by other servers

It performs data validation locally.

进一步地，本发明的所述步骤S3中，在本地执行数据验证的具体方法为：Furthermore, in step S3 of the present invention, the specific method of performing data verification locally is:

在第k轮联邦任务中，使用相应的“乘法”方法进行验证，每个边缘服务器自己设计两个解码函数，如下：In the kth round of federated tasks, the corresponding "multiplication" method is used for verification, and each edge server designs two decoding functions by itself, as follows:

其中，L₀是

的长度，L’是

的长度；解码函数的下标k，表示第k轮联邦任务中的解码函数；Among them, _L0 is

The length, L' is

The length of the decoding function; the subscript k of the decoding function represents the decoding function in the k-th round of federated tasks;

L₀＝χ·LL ₀ = x·L

其中，L是

的长度，

与

的长度相等；Where L is

Length,

and

are of equal length;

要求

满足所有边缘服务器的解码函数对同一个数据包执行“并”操作得到全0，并执行“交”操作得到全1，即：Require

The decoding functions that satisfy all edge servers perform a "and" operation on the same data packet to obtain all 0s, and perform a "cross" operation to obtain all 1s, that is:

首先，初始化解码函数如下：First, initialize the decoding function as follows:

将数据包

与其它服务器中相应的解码函数相乘；由于

中0的二进制位被乘得0，所以E_i保证只得到它自己的部分数据包；当

中的二进制位为1时，得到对应位置的梯度信息的密文，如下：Packet

Multiply with the corresponding decoding function in other servers; since

The 0 bits in are multiplied by 0, so E _i is guaranteed to get only its own part of the data packet; when

When the binary bit in is 1, the ciphertext of the gradient information at the corresponding position is obtained as follows:

E_i将从其它边缘服务器E_j获取的所有数据包数组添加到对应位置，得到所有密文数据，并更新为最后

即：E _i adds all the data packet arrays obtained from other edge servers E _j to the corresponding positions, obtains all the ciphertext data, and updates to the final

Right now:

每次进行安全多方计算时，随着k的增加，将每个E_i中的解码函数

的二进制向左循环移动m个单位，以保证

共享的动态性，并将它们平均划分到E₁，E₂，…，E_n中，且每个部分的数据信息不重复。Each time a secure multi-party computation is performed, as k increases, the decoding function in each E _i is replaced by

The binary is circularly shifted to the left by m units to ensure

The dynamics of sharing are evenly divided into E ₁ , E ₂ , …, E _n , and the data information of each part is not repeated.

进一步地，本发明的所述步骤S4中，PreFla算法的具体方法为：Furthermore, in step S4 of the present invention, the specific method of the PreFla algorithm is:

PreFLa采用强化学习RL进行适配来选择最优参数权重比a_i，k聚合全局参数

PreFLa uses reinforcement learning RL to adapt to select the optimal parameter weight ratio a _i,k to aggregate global parameters

在上行通信阶段，每个边缘服务器不仅训练局部模型，还将本地参数上传到云服务器CS进行联合聚合；在第k轮联邦中执行MePC算法后，E_i通过TLS/SSL安全通道将参数

和

上传到CS；在聚合阶段，由于每个ES的分布不平衡和数据异构性，其模型参数用于聚合对该阶段的收敛速度具有至关重要的影响；因此，有必要考虑k轮联邦聚合中参与者E_i的参数权重比a_i，k；In the uplink communication phase, each edge server not only trains the local model, but also uploads the local parameters to the cloud server CS for joint aggregation; after executing the MePC algorithm in the kth round of federation, E _i transmits the parameters to the cloud server CS through the TLS/SSL secure channel.

and

Upload to CS; In the aggregation stage, due to the imbalanced distribution and data heterogeneity of each ES, its model parameters used for aggregation have a crucial impact on the convergence speed of this stage; therefore, it is necessary to consider the parameter weight ratio a _i,k of participant E _i in the k-round federated aggregation;

使用基于DQN的强化学习去预测参数权重比，通过Q函数来存储信息，以防止空间多维灾难；为了更好地实现模型个性化，减少MePC-F中上传权重的等待时间，用DQN来选择最优参数权重比a_i，k，聚合更新CS中的全局参数

强化学习包括：状态、动作、奖励函数以及反馈。Use DQN-based reinforcement learning to predict parameter weight ratios and store information through Q functions to prevent spatial multidimensional disasters; in order to better personalize the model and reduce the waiting time for uploading weights in MePC-F, use DQN to select the optimal parameter weight ratio a _{i, k} and aggregate to update the global parameters in CS

Reinforcement learning includes: state, action, reward function, and feedback.

进一步地，本发明的所述步骤S4中，状态、动作、奖励函数以及反馈的具体方法为：Furthermore, in step S4 of the present invention, the specific methods of state, action, reward function and feedback are:

状态：第k轮的状态

其中，

是精度差，表示为：Status: Status of round k

in,

is the precision difference, expressed as:

动作：参数权重占比a_i，k表示为第k轮联邦任务的动作；为避免陷入局部最优解，采用ε-贪心算法优化动作选择过程，得到a_i，k：Action: The parameter weight ratio _ai,k represents the action of the federated task in round k. To avoid falling into the local optimal solution, the ε-greedy algorithm is used to optimize the action selection process and obtain _ai,k :

其中P是权重排列的集合，rand是一个随机数，rand∈[0，1]，Q(s_i，k，a_i，k)指代理在状态s_i，k下采取行动a_i，k时的累积折现收益；一旦DQN在测试期间被训练为近似Q(s_i，k，a_i，k)，DQN代理将为第k轮中的所有动作计算{Q(s_i，k，a_i，k)|a_i，k∈[P]}；每个动作值表示代理通过在状态s_i，k选择特定动作a_i，k可以获得的最大预期回报；where P is a set of weight permutations, rand is a random number, rand ∈ [0, 1], Q( _{si, k} , ai _{, k} ) refers to the cumulative discounted return of the agent when taking action ai _{, k} in state si _{, k} ; once the DQN is trained to approximate Q(si _{, k} , ai _{, k} ) during testing, the DQN agent will calculate {Q(si _{, k} , ai _{, k} )|ai _{, k} ∈ [P]} for all actions in the kth round; each action value represents the maximum expected return that the agent can obtain by choosing a specific action ai _, _{k in state si,} k;

奖励：将第k轮联邦结束时观察到的奖励设置为：Reward: Set the reward observed at the end of the kth federation round to:

其中，

是一个正数，确保r_k随着测试准确度Δacc_i，k呈指数增长；第一项激励代理选择能够实现更高测试精度的设备；

用来控制随着Δacc_i，k增长r_k的变化；当Δacc_i，k小于0时，有r_k∈(-1，0)；in,

is a positive number, ensuring that r _k grows exponentially with the test accuracy Δacc _i,k ; the first term motivates the agent to select equipment that can achieve higher test accuracy;

It is used to control the change of r _k as Δacc _i,k increases; when Δacc _i,k is less than 0, r _k ∈(-1,0);

训练DQN代理以最大化累积折扣奖励的期望，如下式所示：The DQN agent is trained to maximize the expectation of the cumulative discounted reward as shown below:

其中，γ∈(0，1]，表示一个折扣未来奖励的因子；Among them, γ∈(0,1], represents a factor that discounts future rewards;

在获得r_k之后，云服务器CS为每轮联邦任务保存多维四元组B_k＝(s_i，k，a_i，k，r_k，s_i，k+1)；最优动作值函数Q(s_i，k，a_i，k)是RL代理寻求的备忘单，定义为从s_i，k开始的累积折现收益的最大期望：After obtaining r _k , the cloud server CS saves the multidimensional quadruple B _k = (s _{i, k} , a _i, k , r _k , s _{i, k + 1} ) for each round of federated tasks; the optimal action-value function Q(s _{i, k} , a _{i, k} ) is the cheat sheet sought by the RL agent and is defined as the maximum expectation of the cumulative discounted return starting from s _{i, k} :

Q(s_i，k，a_i，k)＝E(r_i，k+γmax Q(s_i，k+1，a_i，k)|s_i，k，a_i，k)Q(s _i,k ,a _i,k )=E(r _i,k +γmax Q(s _i,k+1 ,a _i,k )|s _i,k ,a _i,k )

应用函数逼近技术学习一个参数化的值函数Q(s_i，k，a_i，k；w_k)逼近最优值函数Q(s_i，k，a_i，k)；r_k+γmax Q(s_i，k+1，a_i，k)是Q(s_i，k，a_i，k；w_k)学习的目标；DNN用于表示函数逼近器；RL学习问题变成最小化目标和逼近器之间的MSE损失，定义为：Function approximation techniques are applied to learn a parameterized value function Q(s _{i, k} , a _{i, k} ; w _k ) to approximate the optimal value function Q(s _{i, k} , a _{i, k} ); r _k +γmax Q(s _{i, k+1} , a _{i, k} ) is the goal of learning Q(s _{i, k} , a _{i, k} ; w _k ); DNN is used to represent the function approximator; the RL learning problem becomes minimizing the MSE loss between the target and the approximator, defined as:

l(w_k)＝(r_i，k+γmax Q(s_i，k+1，a_i，k；w_k)-Q(s_i，k，a_i，k；w_k))² l( _wk )=(ri _,k +γmax Q(si _,k+1 ,ai _,k ; _wk )-Q(si _,k ,ai _,k ; _wk )) ²

CS更新全局参数w_k为：CS updates the global parameter _wk as:

其中，η≥0是步长；Where η≥0 is the step size;

云服务器CS得到最佳学习模型后，得到第k轮权重比序列的a_i，k，将全局参数

更新为：After the cloud server CS obtains the best learning model, it obtains the k-th round weight ratio sequence a _{i, k} and sets the global parameter

Updated to:

所有边缘服务器更新全局参数

并开始接下来的T轮本地训练。All edge servers update global parameters

And start the next T rounds of local training.

进一步地，本发明的该方法中HE加密的方法具体为：Furthermore, the HE encryption method in the method of the present invention is specifically as follows:

权重矩阵和偏置向量的加密方案遵循相同的思想，实数a的加法同态加密表示为a^E，在加法同态加密中，对于任意两个数a和b，有a^E+b^E＝(a+b)^E；将任何实数r转换为编码的有理数不动点v的方法是：The encryption schemes for the weight matrix and the bias vector follow the same idea. The additive homomorphic encryption of a real number a is represented by a ^E . In additive homomorphic encryption, for any two numbers a and b, a ^E + b ^E = (a+b) ^E . The method to convert any real number r into an encoded rational fixed point v is:

认为梯度

中的每个编码实数r可以表示为有理定理的H位数，由一个符号位、z位整数位和d位小数位组成；因此，每个可以编码的有理数都由其H＝1+z+d位定义；执行编码以允许乘法运算，这需要运算模数为H+2d以避免比较；Think gradient

Each coded real number r in can be represented as an H-bit rational theorem, consisting of a sign bit, z integer bits, and d fraction bits; therefore, each coded rational number is defined by its H = 1 + z + d bits; the encoding is performed to allow multiplication operations, which require operations modulo H + 2d to avoid comparisons;

解码定义为：Decode is defined as:

这些编码数字的乘法需要去除因子1/2d；使用Paillier加法加密时，可以准确计算编码乘法的情况，但只能保证一次同态乘法；为简单起见，在解码时处理它；Multiplication of these encoded numbers requires the removal of the factor 1/2d; when using Paillier addition encryption, the encoded multiplication can be calculated exactly, but only one homomorphic multiplication is guaranteed; for simplicity, it is handled during decoding;

最大的可加密整数是V-1，所以最大的可加密实数必须考虑到这一点，因此，整数z和小数d的选择条件如下：The largest encryptable integer is V-1, so the largest encryptable real number must take this into account. Therefore, the integer z and the decimal d are selected as follows:

V≥2^H+2d≥2^1+z+3d。V≥2 ^H+2d ≥2 ^1+z+3d .

进一步地，本发明的所述步骤S5中最优的损失函数为Furthermore, the optimal loss function in step S5 of the present invention is

其中，L(w_i)表示E_i网络的损失。Where L( _wi ) represents the loss of the _Ei network.

本发明产生的有益效果是：The beneficial effects produced by the present invention are:

(1)提出一种多方广播安全计算的联邦学习模型(MePC-F)。该模型将MePC算法和PreFla算法相结合解决车联网中联邦学习训练数据安全和通信开销问题。并考虑同态加密以及安全多方计算的混合优势来防止端与端之间的数据泄露，在数据被攻击后降低原始数据的还原度，以最大限度实现数据的隐私安全保护。(1) A federated learning model with multi-party broadcast secure computing (MePC-F) is proposed. This model combines the MePC algorithm and the PreFla algorithm to solve the problems of data security and communication overhead in federated learning training in the Internet of Vehicles. The hybrid advantages of homomorphic encryption and secure multi-party computing are considered to prevent data leakage between ends, and reduce the degree of restoration of original data after the data is attacked, so as to maximize the privacy and security protection of data.

(2)提出一种安全广播多方计算MePC。针对安全多方计算，只共享第一层的梯度信息就可以大大降低数据被恢复的风险并减少通信量。在共享过程中采用广播的方式，边缘服务器模型经过解码函数取各自的部分，可以将时间复杂度从O(n²)降低到O(n)，在防止原始数据泄露的同时减小通信开销。(2) A secure broadcast multi-party computation (MePC) is proposed. For secure multi-party computation, only sharing the gradient information of the first layer can greatly reduce the risk of data recovery and reduce the amount of communication. In the sharing process, the broadcast method is adopted. The edge server model takes its own part through the decoding function, which can reduce the time complexity from O(n ² ) to O(n), preventing the leakage of original data while reducing the communication overhead.

(3)提出了基于权重占比的联邦学习算法PreFla。利用PreFla找到最优的梯度权重占比去聚合全局参数，使用每个边缘服务器的精确度差值来设计奖励函数，使得整体回报最大的动作选择即为每轮联邦的权重占比。并在损失函数中添加L2正则化项来促进边缘服务器协作并降低数据异构性带来的时延和性能问题。从而更好的泛化全局模型、加速收敛。(3) A federated learning algorithm PreFla based on weight ratio is proposed. PreFla is used to find the optimal gradient weight ratio to aggregate global parameters, and the accuracy difference of each edge server is used to design the reward function, so that the action selection with the largest overall return is the weight ratio of each round of federation. An L2 regularization term is added to the loss function to promote edge server collaboration and reduce the latency and performance issues caused by data heterogeneity. This can better generalize the global model and accelerate convergence.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

下面将结合附图及实施例对本发明作进一步说明，附图中：The present invention will be further described below with reference to the accompanying drawings and embodiments, in which:

图1是本发明实施例的MePC-F模型；FIG1 is a MePC-F model according to an embodiment of the present invention;

图2是本发明实施例的MePC-F模型的流程图；FIG2 is a flow chart of a MePC-F model according to an embodiment of the present invention;

图3是本发明实施例的MePC算法；FIG3 is a MePC algorithm according to an embodiment of the present invention;

图4是本发明实施例的MNIST上四种方法隐藏第一个隐藏层不隐藏时的DLG结果；(a)FL；(b)MePC-F；(c)PeMPC；(d)Gaussian；(e)Laplacian；FIG4 is a DLG result of four methods on MNIST in an embodiment of the present invention when the first hidden layer is not hidden; (a) FL; (b) MePC-F; (c) PeMPC; (d) Gaussian; (e) Laplacian;

图5是本发明实施例的DLG在MNIST上的性能，当第一个隐藏层的梯度被四种方法(高斯分布、拉普拉斯分布、PEMPC和MePC-F)替换时；FIG5 is the performance of DLG on MNIST according to an embodiment of the present invention when the gradient of the first hidden layer is replaced by four methods (Gaussian distribution, Laplace distribution, PEMPC, and MePC-F);

图6是本发明实施例的No-IID MNIST数据的平均准确率和损失；FIG6 is an average accuracy and loss of No-IID MNIST data according to an embodiment of the present invention;

图7是本发明实施例的No-IID CAFIR-10数据的平均准确率和损失。FIG. 7 shows the average accuracy and loss of No-IID CAFIR-10 data according to an embodiment of the present invention.

具体实施方式DETAILED DESCRIPTION

为了使本发明的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本发明进行进一步详细说明。应当理解，此处所描述的具体实施例仅用以解释本发明，并不用于限定本发明。In order to make the purpose, technical solution and advantages of the present invention more clearly understood, the present invention is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not used to limit the present invention.

本发明实施例中涉及的参数说明如下：The parameters involved in the embodiments of the present invention are described as follows:

表1参数说明Table 1 Parameter Description

其中，E_i表示当前的边缘服务器，E_j表示除当前边缘服务器之外的边缘服务器，E_s表示所有边缘服务器。Among them, E _i represents the current edge server, E _j represents the edge server other than the current edge server, and _Es represents all edge servers.

本发明实施例的车联网中基于MePC-F模型的实时强化联邦学习数据隐私安全方法，包括以下步骤：The real-time enhanced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles of the embodiment of the present invention includes the following steps:

并解密为

随机初始化B型梯度

and decrypted to

Randomly initialize B-type gradient

S3、边缘服务器E_i通过解码函数

从

中获取需要保留的部分梯度信息

并将剩余的梯度信息

经同态加密为

获取来自其它边缘服务器E_j的对应部分梯度信息

所有边缘服务器更新共享后的A类梯度信息分别为

from

Get some gradient information that needs to be retained

And the remaining gradient information

After homomorphic encryption,

Get the corresponding partial gradient information from other edge servers _Ej

The updated and shared A-type gradient information of all edge servers are

i∈[1,n], n is the total number of edge servers;

S4、所有边缘服务器将

上传到云服务器CS，云服务器CS通过PreFLa算法聚合全局参数，PreFLa算法通过强化学习获得最大化回报来选择边缘服务器E_i的最优参数权重比a_i，k，全局参数

Uploaded to the cloud server CS, the cloud server CS aggregates the global parameters through the PreFLa algorithm, and the PreFLa algorithm selects the optimal parameter weight ratio a _{i, k} of the edge server E _i by maximizing the return through reinforcement learning, and the global parameters

S5、重复步骤S2-S4，直到达到终止条件，完成整个训练过程。终止条件可以是最大训练周期数、损失函数的收敛性或其他用户定义的条件。最后，可以根据以下公式(1)得到最优的损失函数。S5. Repeat steps S2-S4 until the termination condition is reached and the entire training process is completed. The termination condition can be the maximum number of training cycles, the convergence of the loss function, or other user-defined conditions. Finally, the optimal loss function can be obtained according to the following formula (1).

本地训练的具体方法为：The specific method of local training is:

在局部模型阶段，采用深度神经网络(DNN)来学习云模型和ES模型。DNN通过将不同用户数据作为原始输入来执行端到端的特征学习和分类器训练。将在提出的算法中使用随机梯度下降作为子程序来最小化每个本地训练中损失值。In the local model stage, a deep neural network (DNN) is used to learn the cloud model and ES model. DNN performs end-to-end feature learning and classifier training by taking different user data as raw input. Stochastic gradient descent is used as a subroutine in the proposed algorithm to minimize the loss value in each local training.

在下行通信阶段E_i在第k(k∈[1，K])轮通信中从CS下载基础层参数

并随机初始化

其中，K表示联邦任务的总轮数。若为第一轮联邦任务，CS随机初始化

在本地训练之前，E_i需要通过使用同态加密(公式(4))对

解密为

并记为

In the downlink communication phase, E _i downloads the base layer parameters from CS in the kth (k∈[1, K]) round of communication.

And randomly initialize

Where K represents the total number of rounds of the federated task. If it is the first round of the federated task, CS is randomly initialized.

Before local training, E _i needs to be encrypted using homomorphic encryption (Formula (4)).

Decrypted to

And record it as

为了更好的体现模型个性化，局部模型的损失函数设置如下：In order to better reflect the model personalization, the loss function of the local model is set as follows:

L(w_i)＝l(w_i)+λ(w_i，t-w_i，t+1)² (16)L( _wi )=l( _wi )+λ( _wi,t - _wi,t+1 ) ² (16)

其中l()表示网络的损失，例如分类任务的交叉熵损失。第二项是L2正则化项，既可以保留自己的个性化能力，又可以提高与其他参与者的协作效率。λ是正则化系数。Where l() represents the loss of the network, such as the cross entropy loss of the classification task. The second term is the L2 regularization term, which can both retain its own personalization ability and improve the efficiency of cooperation with other participants. λ is the regularization coefficient.

E_i初始化G_k并替换模型的权重参数w_i，继续进行局部模型训练如下E _i initializes G _k and replaces the model's weight parameter w _i , and continues local model training as follows

w_i＝w_i-ηG_k (17)w _i = w _i - ηG _k (17)

其中，η是学习率，G_k是

和

的总表示。这里的

随机初始化。Where η is the learning rate and _Gk is

and

Here is the total representation of

Random initialization.

E_i在达到T轮本地训练后，此时会得到每个局部模型的准确率acc_i，k、

和

端与端之间直接共享用户信息是禁止的，边缘服务器中的数据在通信前需要进行加密，防止数据在通信前被攻击。这个过程使用HE来避免信息泄露。以下将展示使用实数进行加法HE的过程。权重矩阵和偏置向量的加密方案遵循相同的思想，实数a的加法同态加密表示为a^E。在加法同态加密中，对于任意两个数a和b，有a^E+b^E＝(a+b)^E。将任何实数r转换为编码的有理数不动点v的方法是：After E _i reaches T rounds of local training, the accuracy of each local model acc _i,k ,

and

Direct sharing of user information between ends is prohibited. Data in the edge server needs to be encrypted before communication to prevent data from being attacked before communication. This process uses HE to avoid information leakage. The following will show the process of additive HE using real numbers. The encryption schemes for weight matrices and bias vectors follow the same idea. The additive homomorphic encryption of a real number a is represented as a ^E . In additive homomorphic encryption, for any two numbers a and b, a ^E +b ^E ＝(a+b) ^E . The method to convert any real number r into an encoded rational number fixed point v is:

认为梯度

中的每个编码实数r可以表示为有理定理的H位数，由一个符号位、z位整数位和d位小数位组成。因此，每个可以编码的有理数都由其H＝1+z+d位定义。执行编码以允许乘法运算，这需要运算模数为H+2d以避免比较。Think gradient

Each coded real number r in can be represented as an H-bit rational theorem, consisting of a sign bit, z integer bits, and d fraction bits. Therefore, each coded rational number is defined by its H = 1 + z + d bits. The encoding is performed to allow multiplication operations, which require operations modulo H + 2d to avoid comparisons.

解码定义为：Decode is defined as:

这些编码数字的乘法需要去除因子1/2d。使用Paillier加法加密时，可以准确计算编码乘法的情况，但只能保证一次同态乘法。为简单起见，在解码时处理它。Multiplication of these encoded numbers requires the removal of the factor 1/2d. When encrypting with Paillier addition, the encoded multiplication can be calculated exactly, but only one homomorphic multiplication is guaranteed. For simplicity, it is handled during decoding.

如果只发生了一次代码乘法，则它是正确的。因为最大的可加密整数是V-1，所以最大的可加密实数必须考虑到这一点。因此，整数z和小数d的选择必须如下：If only one code multiplication occurs, then it is correct. Since the largest encryptable integer is V-1, the largest encryptable real number must take this into account. Therefore, the integer z and the decimal d must be chosen as follows:

V≥2^H+2d≥2^1+z+3d (5)V≥2 ^H+2d ≥2 ^1+z+3d (5)

加密过后的

和aCC_i，k分别表示为

和

Encrypted

and aCC _i,k are represented as

and

MePC算法的具体方法如图3所示。The specific method of the MePC algorithm is shown in Figure 3.

第k轮联邦任务中，使用MePC来交换基础层梯度

为了避免数据被破解的风险，在每个网络中取随机比例χ的

梯度即为

并保持同一轮联邦的随机比例χ相同。在不同轮的联邦任务中，随机比例χ(χ∈[1，1/n])是变化的，

剩下的梯度被均分为n-1份

如图3所示，

的值划分为：In the kth round of federated tasks, MePC is used to exchange the base layer gradients

In order to avoid the risk of data being cracked, a random ratio of χ is taken in each network.

The gradient is

And keep the random ratio χ of the same round of federation the same. In different rounds of federation tasks, the random ratio χ (χ∈[1, 1/n]) is changed.

The remaining gradient is divided equally into n-1 parts

As shown in Figure 3,

The values are divided into:

只有

被保留在E_i中，其他部分和随机参数χ将会以密文的形式广播发送给其他ESs。通过这种方式，即使部分传输内容被攻击，最初的数据

也不会泄露。具体来说，如果攻击者想要获取数据

就必须获取

的所有部分。但是，

和χ在参与者E_i和接收者E_j之间通信时都是通过同态加密保持密文形式。only

is retained in E _i , and the other parts and random parameters χ will be broadcasted to other ESs in the form of ciphertext. In this way, even if part of the transmission content is attacked, the original data

Specifically, if an attacker wants to obtain data

You must obtain

However,

When communicating between participant E _i and receiver E _j, both χ and χ are kept in ciphertext form through homomorphic encryption.

共享给其他ESs的梯度信息是

The gradient information shared to other ESs is

当E_i接收到由其他服务器发送的数据包

它在本地执行数据验证。具体来说，它使用相应的“乘法”方法进行验证。每个边缘服务器自己设计两个解码功能，如下：When E _i receives a data packet sent by other servers

It performs data verification locally. Specifically, it uses the corresponding "multiplication" method for verification. Each edge server designs two decoding functions by itself, as follows:

其中，L₀是

的长度，L’是

的长度。Among them, _L0 is

length, L' is

Length.

L₀＝χ·L (9)L ₀ = χ·L (9)

其中，L是

的长度，

与

的长度相等。Where L is

Length,

and

are of equal length.

要求

满足所有ES的解码函数对同一个数据包执行“并”操作得到全0，并执行“交”操作得到全1，即Require

The decoding function that satisfies all ESs performs a "and" operation on the same data packet to obtain all 0s, and performs a "cross" operation to obtain all 1s, that is,

首先，初始化解码函数如下，First, initialize the decoding function as follows,

需要说明的是，在初始化的时候，不同E_i发送的数据包在同一个联邦任务中的数据解码也是同一个函数。It should be noted that, at the time of initialization, the data packets sent by different E _i in the same federated task also use the same function for data decoding.

将数据包

与其他服务器中相应的解码函数相乘。由于

中0的二进制位被乘得0，所以E_i可以保证只得到它自己的部分数据包。当

中的二进制位为1时，可以得到对应位置的梯度信息的密文，如下：Packet

Multiply by the corresponding decoding function in the other servers.

The 0 bits in are multiplied to 0, so E _i can ensure that it only gets its own part of the data packet.

When the binary bit in is 1, the ciphertext of the gradient information at the corresponding position can be obtained as follows:

E_i将从其他ES获取的所有数据包数组添加到对应位置，得到所有密文数据，并更新为最后

即E _i adds all the data packet arrays obtained from other ES to the corresponding positions, obtains all the ciphertext data, and updates to the last

Right now

每次进行安全多方计算时，随着k的增加将每个E_i中的解码函数

的二进制向左循环移动m个单位，以保证

共享的动态性，并可以将它们平均划分到E₁，E₂，…，E_n中，且每个部分的数据信息不重复。Each time a secure multi-party computation is performed, the decoding function in each E _i is replaced by

The binary is circularly shifted to the left by m units to ensure

The shared dynamics can be evenly divided into E ₁ , E ₂ , …, E _n , and the data information in each part is not repeated.

PreFla算法的具体方法为：The specific method of the PreFla algorithm is:

车联网中的数据分布较为分散且数据的不平衡和异构化问题，在满足实时性要求的同时提高个性化服务需求就很困难。为了防止不同边缘服务器通信过程中的用户隐私泄露，在通信过程中使用HE对参数进行加密。为了更好的实现针对不同的用户数据进行个性化训练，将第一层设置为基础层，使用现有的联邦学习方法以协作方式进行训练，而其他层作为个性化层在本地训练，从而能够捕获不同ES设备的个人信息。这样，在联合训练过程之后，全局共享的基础层可以转移到ES中，以构建自己的个性化深度学习模型，并使用其独特的个性化层。仅从CS下载基础层参数

个性化层的参数

随机生成并使用本地数据进行微调。为了满足实时性要求，实现ES的个性化需求，PreFLa采用强化学习(RL)进行适配来选择最优参数权重比a_i，k聚合全局参数

The data distribution in the Internet of Vehicles is relatively scattered, and the data is unbalanced and heterogeneous, making it difficult to meet real-time requirements while improving personalized service needs. In order to prevent user privacy leakage during communication between different edge servers, HE is used to encrypt parameters during the communication process. In order to better achieve personalized training for different user data, the first layer is set as the base layer and trained in a collaborative manner using the existing federated learning method, while the other layers are trained locally as personalized layers, thereby being able to capture personal information of different ES devices. In this way, after the joint training process, the globally shared base layer can be transferred to ES to build its own personalized deep learning model and use its unique personalized layer. Only download the base layer parameters from CS

Parameters of the personalization layer

Randomly generate and fine-tune using local data. In order to meet the real-time requirements and realize the personalized needs of ES, PreFLa uses reinforcement learning (RL) to adapt to select the optimal parameter weight ratio a _i,k aggregate global parameters

在上行通信阶段，每个ES不仅训练局部模型，还将本地参数上传到CS进行联合聚合。在第k轮联邦中执行MePC算法后，E_i通过TLS/SSL安全通道将参数

和

上传到BS。在聚合阶段，由于每个ES的分布不平衡和数据异构性，其模型参数用于聚合对该阶段的收敛速度具有至关重要的影响。因此，有必要考虑k轮联邦聚合中参与者E_i的参数权重比a_i，k。In the uplink communication phase, each ES not only trains the local model, but also uploads the local parameters to the CS for joint aggregation. After executing the MePC algorithm in the kth round of federation, E _i transmits the parameters to the CS through the TLS/SSL secure channel.

and

Upload to BS. In the aggregation stage, due to the imbalanced distribution and data heterogeneity of each ES, its model parameters used for aggregation have a crucial impact on the convergence speed of this stage. Therefore, it is necessary to consider the parameter weight ratio a _i,k of participant E _i in the k-round federated aggregation.

本发明中使用基于DQN的强化学习去预测参数权重比，通过Q函数而不是Q-Learning中的表存储来存储信息，以防止空间多维灾难。为了更好地实现模型个性化，减少MePC-F中上传权重的等待时间，用DQN来选择最优参数权重比a_i，k来，聚合更新CS中的全局参数

强化学习中核心内容有：状态、动作、奖励函数以及反馈，定义如下：In this invention, DQN-based reinforcement learning is used to predict parameter weight ratios, and information is stored through Q functions instead of table storage in Q-Learning to prevent spatial multidimensional disasters. In order to better realize model personalization and reduce the waiting time for uploading weights in MePC-F, DQN is used to select the optimal parameter weight ratio a _{i, k} to aggregate and update the global parameters in CS.

The core contents of reinforcement learning are: state, action, reward function and feedback, which are defined as follows:

状态：第k轮的状态

其中，

是精度差，表示为：Status: Status of round k

in,

is the precision difference, expressed as:

动作：参数权重占比a_i，k表示为第k轮联邦任务的动作。为避免陷入局部最优解，采用ε-贪心算法优化动作选择过程，可以得到a_i，k：Action: The parameter weight ratio _ai,k represents the action of the federated task in round k. To avoid falling into the local optimal solution, the ε-greedy algorithm is used to optimize the action selection process, and ai _,k can be obtained:

其中P是权重排列的集合，rand是一个随机数(rand∈[0，1])，Q(s_i，k，a_i，k)指代理在状态s_i，k下采取行动a_i，k时的累积折现收益。一旦DQN在测试期间被训练为近似Q(s_i，k，a_i，k)，DQN代理将为第k轮中的所有动作计算{Q(s_i，k，a_i，k)|a_i，k∈[P]}。每个动作值表示代理通过在状态s_i，k选择特定动作a_i，k可以获得的最大预期回报。Where P is a set of weight permutations, rand is a random number (rand ∈ [0, 1]), and Q( _{si, k} , ai _{, k} ) refers to the cumulative discounted return of the agent when taking action ai _{, k} in state si _, k. Once DQN is trained to approximate Q(si _{, k} , ai _{, k} ) during testing, the DQN agent computes {Q(si _{, k} , ai _{, k} )|ai _{, k} ∈ [P]} for all actions in the kth round. Each action value represents the maximum expected return that the agent can obtain by choosing a specific action _ai, _{k in state si,} k.

奖励：将第k轮联邦结束时观察到的奖励设置为Reward: Set the reward observed at the end of the kth federation round to

其中，

是一个正数，确保r_k随着测试准确度Δacc_i，k呈指数增长。第一项激励代理选择能够实现更高测试精度的设备。

用来控制随着Δacc_i，k增长r_k的变化。一般来说，随着机器学习训练的进行，模型准确率会以较慢的速度增加。但在联邦合作任务中，由于数据分布不平衡和异质性，模型精度可能会降低。因此，随着FL进入后期阶段，使用指数项来放大边界准确度的增加。第二项-1用来鼓励智能体提高模型准确性，因为当Δacc_i，k小于0时，有r_k∈(-1，0)。in,

is a positive number that ensures that r _k grows exponentially with the test accuracy Δacc _i,k . The first term motivates the agent to select devices that can achieve higher test accuracy.

It is used to control the change of r _k as Δacc _i,k grows. In general, as machine learning training progresses, model accuracy increases at a slower rate. However, in federated cooperative tasks, model accuracy may decrease due to imbalanced data distribution and heterogeneity. Therefore, as FL enters the later stages, an exponential term is used to amplify the increase in boundary accuracy. The second term -1 is used to encourage the agent to improve model accuracy because when Δacc _i,k is less than 0, there is r _k ∈(-1, 0).

训练DQN代理以最大化累积折扣奖励的期望，如下式所示The DQN agent is trained to maximize the expectation of the cumulative discounted reward as shown below

其中，γ∈(0，1]一个折扣未来奖励的因子。where γ∈(0,1] is a factor that discounts future rewards.

在获得r_k之后，CS为每轮联邦任务保存多维四元组B_k＝(s_i，k，a_i，k，r_k，s_i，k+1)。最优动作值函数Q(s_i，k，a_i，k)是RL代理寻求的备忘单，定义为从s_i，k开始的累积折现收益的最大期望：After obtaining r _k , CS saves a multidimensional quadruple B _k = (s _{i , k} , a _{i , k} , r _k , s _{i , k + 1} ) for each round of federated tasks. The optimal action-value function Q (s _{i , k} , a _{i , k} ) is the cheat sheet sought by the RL agent and is defined as the maximum expectation of the cumulative discounted return starting from s _{i , k} :

Q(s_i，k，a_i，k)＝E(r_i，k+γmax Q(s_i，k+1，a_i，k)|s_i，k，a_i，k) (22)Q(s _i,k ,a _i,k )=E(r _i,k +γmax Q(s _i,k+1 ,a _i,k )|s _i,k ,a _i,k ) (22)

然后，可以应用函数逼近技术学习一个参数化的值函数Q(s_i，k，a_i，k；w_k)逼近最优值函数Q(s_i，k，a_i，k)。第一步的r_k+γmax Q(s_i，k+1，a_i，k)是Q(s_i，k，a_i，k；w_k)学习的目标。通常，DNN用于表示函数逼近器。RL学习问题变成最小化目标和逼近器之间的MSE损失，定义为：Then, function approximation techniques can be applied to learn a parameterized value function Q(s _{i, k} , a _{i, k} ; w _k ) to approximate the optimal value function Q(s _{i, k} , a _{i, k} ). The first step r _k +γmax Q(s _{i, k+1} , a _{i, k} ) is the target of Q(s _{i, k} , a _{i, k} ; w _k ) learning. Typically, DNN is used to represent the function approximator. The RL learning problem becomes minimizing the MSE loss between the target and the approximator, defined as:

l(w_k)＝(r_i，k+γmax Q(s_i，k+1，a_i，k；w_k)-Q(s_i，k，a_i，k；w_k))² (23)l( _wk )=(ri _,k +γmax Q(si _,k+1 ,ai _,k ; _wk )-Q(si _,k ,ai _,k ; _wk )) ² (23 )

CS更新全局参数w_k为：CS updates the global parameter _wk as:

其中，η≥0是步长。where η ≥ 0 is the step size.

CS重复上述步骤以获得最佳学习模型。CS可以得到第k轮权重比序列的a_i，k，将全局参数

更新为：CS repeats the above steps to obtain the best learning model. CS can obtain the k-th round weight ratio sequence a _{i, k} and set the global parameter

Updated to:

所有ES更新全局参数

并开始接下来的T轮本地训练。All ES update global parameters

And start the next T rounds of local training.

实验测试实施例：Experimental test example:

为了验证所提出的机制的有效性，给出了实验结果和分析。分别考虑具有1个云服务器、10个边缘服务器的系统。实验学习率α为0.01，折扣因子γ为0.9。正整数

取3。各参数取值见表2。In order to verify the effectiveness of the proposed mechanism, experimental results and analysis are given. Consider a system with 1 cloud server and 10 edge servers respectively. The experimental learning rate α is 0.01 and the discount factor γ is 0.9. Positive integer

The value is 3. See Table 2 for the values of each parameter.

表2参数设置Table 2 Parameter settings

在两个数据集上验证了提出的模型的有效性：MNIST和CIFAR-10。所提出的联邦学习模型MePC-F的性能是根据DLG的重建图像、平均精度和平均损失来评估的。首先，防御DLG攻击的五种方案的性能，然后将所提出的联邦学习模型MePC-F的性能与集中式和PeMPC进行比较。以下场景中的所有结果都是1000次独立实验的平均值。The effectiveness of the proposed model is verified on two datasets: MNIST and CIFAR-10. The performance of the proposed federated learning model MePC-F is evaluated based on the reconstructed images, average precision, and average loss of DLG. First, the performance of five schemes for defending against DLG attacks is presented, and then the performance of the proposed federated learning model MePC-F is compared with centralized and PeMPC. All results in the following scenarios are the average of 1000 independent experiments.

1)防御DLG攻击的性能1) Performance of defending against DLG attacks

这一节评估MePC-F的有效性并与FL、PeMPC和DP算法(Gaussian分布噪声和Laplace分布噪声)在DLG重建图像方面进行比较。为MNIST数据集上的单个图像计算网络的公共梯度，不同方案的结果如图4所示。由于研究[17]表明隐藏第一层的梯度可以减少数据的重建，用四种方法替换第一层的梯度(权重和偏置项)：提出的MePC-F、PeMPC、高斯分布(μ＝0，σ＝1)噪声和拉普拉斯分布(μ＝0，σ＝1)噪声以查看DLG的行为。在完成隐藏第一层的梯This section evaluates the effectiveness of MePC-F and compares it with FL, PeMPC, and DP algorithms (Gaussian distributed noise and Laplace distributed noise) on DLG reconstructed images. The common gradient of the network is calculated for a single image on the MNIST dataset, and the results of different schemes are shown in Figure 4. Since the study [17] showed that the gradient of the hidden first layer can reduce the reconstruction of the data, four methods are used to replace the gradient of the first layer (weights and bias terms): the proposed MePC-F, PeMPC, Gaussian distributed (μ = 0, σ = 1) noise, and Laplace distributed (μ = 0, σ = 1) noise to see the behavior of DLG. After completing the gradient of the hidden first layer

度后，DLG使用这些梯度来恢复创建公共共享梯度的图像。After quantization, DLG uses these gradients to recover the image creating a common shared gradient.

从图4可以看出，在没有任何方法隐藏第一层的梯度(图4(a)中的FL)时，DLG过程可以准确地重建训练数据。当第一层的梯度用本发明提出的方法MePC-F进行保护时，图4(b)中可以有效地防止信息泄漏。当迭代步数达到500时，DLG仍然无法构建图像。从图4(c)中可以看到与图4(b)类似的结果，PeMPC也可以防御图4中的DLG攻击。从图4(d)可以看出，通过在第一层加入高斯噪声，重建图像从第15轮开始有部分显示，到第20轮已经构建了原图的基本轮廓。随着迭代轮数增加到500轮，可以清晰地恢复图像。图4(e)中的拉普拉斯噪声与驾驭高斯噪声也有类似的现象。As can be seen from Figure 4, when there is no method to hide the gradient of the first layer (FL in Figure 4(a)), the DLG process can accurately reconstruct the training data. When the gradient of the first layer is protected by the method MePC-F proposed in the present invention, information leakage can be effectively prevented in Figure 4(b). When the number of iterations reaches 500, DLG still cannot construct the image. From Figure 4(c), it can be seen that the results are similar to those in Figure 4(b), and PeMPC can also defend against the DLG attack in Figure 4. From Figure 4(d), it can be seen that by adding Gaussian noise to the first layer, the reconstructed image is partially displayed from the 15th round, and the basic outline of the original image has been constructed by the 20th round. As the number of iterations increases to 500, the image can be clearly restored. The Laplace noise in Figure 4(e) also has similar phenomena to the driven Gaussian noise.

从图5可以看出，如果恶意服务器将所有隐藏层的梯度作为纯文本接收，则重建过程能够获得最低梯度损失和图像的MSE(图5中的绿线)。随着轮数的增加，PeMPC和MePC-F不会收敛到零，图像的MSE达到10⁷。在原始梯度上添加拉普拉斯和高斯噪声会收敛到10^-5，图4也证明了当达到20轮时数据可以被重构。图像的MSE越大，表明图像被重建的可能性就越小。As can be seen from Figure 5, if the malicious server receives all the gradients of the hidden layers as plain text, the reconstruction process is able to obtain the lowest gradient loss and the MSE of the image (the green line in Figure 5). As the number of rounds increases, PeMPC and MePC-F do not converge to zero, and the MSE of the image reaches 10 ⁷ . Adding Laplace and Gaussian noise to the original gradient converges to 10 ^-5 , and Figure 4 also proves that the data can be reconstructed when 20 rounds are reached. The larger the MSE of the image, the less likely it is that the image is reconstructed.

基于以上实验结果验证了在原始梯度上加入拉普拉斯和高斯噪声可以防止早期的部分梯度泄漏，但是随着轮数的增加，由于深度泄漏仍然会恢复原始数据。然而，PeMPC和MePC-F无论训练轮数多久都是可以防止DLG攻击重建原始数据的有效方法。Based on the above experimental results, it is verified that adding Laplace and Gaussian noise to the original gradient can prevent partial gradient leakage in the early stage, but as the number of rounds increases, the original data will still be restored due to deep leakage. However, PeMPC and MePC-F are effective methods to prevent DLG attacks from reconstructing the original data regardless of the number of training rounds.

2)平均准确率和平均损失的性能比较2) Performance comparison of average accuracy and average loss

在本节中，评估MePC-F的有效性，并与集中式和PeMPC在平均准确度、MNIST和CIFAR-10数据集上的平均损失方面进行比较。In this section, the effectiveness of MePC-F is evaluated and compared with Centralized and PeMPC in terms of mean average accuracy and mean loss on MNIST and CIFAR-10 datasets.

从图6(a)中可以看出模型在MNIST数据集上达到98％的准确度所需的轮数。三个方法的平均准确率都随着训练轮次的增加而增加。在MNIST数据上实现目标准确度集中式方法需要25轮，PeMPC需要140轮和MePC-F需要40轮，MePC-F需要的训练轮数比PeMPC低71.2％。原因是所提出的强化联邦学习算法PreFLa可以通过与环境的交互找到更好的聚合参数权重a_i,k，可以更好地应对No-IID数据，加速模型收敛，达到目标精度。集中式是在在所有数据的组合上训练，所以它的准确性将高于联邦学习算法的准确性。但是从图中可以看出PeMPC的收敛几乎可以达到集中式的精度。Figure 6(a) shows the number of rounds required for the model to achieve 98% accuracy on the MNIST dataset. The average accuracy of the three methods increases with the increase in training rounds. To achieve the target accuracy on the MNIST data, the centralized method requires 25 rounds, PeMPC requires 140 rounds, and MePC-F requires 40 rounds. The number of training rounds required for MePC-F is 71.2% lower than that of PeMPC. The reason is that the proposed reinforced federated learning algorithm PreFLa can find better aggregate parameter weights a _i,k through interaction with the environment, which can better cope with No-IID data, accelerate model convergence, and achieve the target accuracy. The centralized method is trained on a combination of all data, so its accuracy will be higher than that of the federated learning algorithm. However, it can be seen from the figure that the convergence of PeMPC can almost reach the accuracy of the centralized method.

从图6(b)中，可以看到三个方案的平均损失随着训练轮数的增加而减少。对于集中式，平均损失从0.233减少到0.052。PeMPC的平均损失从0.35减少到0.084。同时，本发明提出的MePC-F的平均损失降低到0.06，低于PeMPC的28.6％。当训练轮数达到100轮时，提出的MePC-F几乎可以达到集中式的损失值。From Figure 6(b), we can see that the average losses of the three schemes decrease as the number of training rounds increases. For the centralized scheme, the average loss decreases from 0.233 to 0.052. The average loss of PeMPC decreases from 0.35 to 0.084. At the same time, the average loss of MePC-F proposed in the present invention is reduced to 0.06, which is lower than 28.6% of PeMPC. When the number of training rounds reaches 100, the proposed MePC-F can almost reach the loss value of the centralized scheme.

从图7(a)中可以看出，模型在CAFIR-10上达到50％的目标准确度所需的轮数。可以看到与图6(a)类似的结果。三个模型的平均准确率都在增加，直到达到目标值。对于集中式，平均准确度从0.42增加到0.5，共23轮。PeMPC的平均准确率从0.372上升到0.5，共89轮。同时，所提出的MePC-F的平均精度在41轮时达到目标精度，比PeMPC低53.9％。图7(a)表明，MePC-F使用比PeMPC更优的权重a_i,k来更新全局模型，这导致更快的收敛速度。From Figure 7(a), we can see the number of rounds required for the model to reach the target accuracy of 50% on CAFIR-10. Similar results can be seen as in Figure 6(a). The average accuracy of the three models increases until the target value is reached. For the centralized model, the average accuracy increases from 0.42 to 0.5 in 23 rounds. The average accuracy of PeMPC increases from 0.372 to 0.5 in 89 rounds. Meanwhile, the average accuracy of the proposed MePC-F reaches the target accuracy at 41 rounds, which is 53.9% lower than PeMPC. Figure 7(a) shows that MePC-F uses better weights a _i,k than PeMPC to update the global model, which leads to faster convergence.

从图7(b)可以看出，三种方案的平均损失正在减少，直到达到稳定值。集中式、MePC-F、PeMPC依次达到损失最小值，提出的MePC-F的时间效率在PeMPC下更好。As can be seen from Figure 7(b), the average loss of the three schemes is decreasing until they reach a stable value. Centralized, MePC-F, and PeMPC reach the minimum loss in turn, and the time efficiency of the proposed MePC-F is better under PeMPC.

表3 100轮内三种方案的最高精度Table 3 The highest accuracy of the three schemes within 100 rounds

MNISTMNIST CIFAR-10CIFAR-10 centralizedcentralized 98.4％98.4% 51.4％51.4% MePC-FMePC-F 98.2％98.2% 51.1％51.1% PeMPCPcP 97.6％97.6% 49.2％49.2%

表3给出三种方案在100轮内的精度。对于MNIST数据，所提出的MePC-F的平均准确率为98.2％，比PeMPC高0.6％。PeMPC的准确率几乎可以达到集中训练的准确率。对于CAFIR-10数据，MePC-F在100轮时的平均准确度高达0.511，比PeMPC高1.9％。它表明，MePC-F可以通过最优权重更新a_i,k,比PeMPC更好地聚合全局参数，从而获得更高的准确度，更接近集中准确度。Table 3 gives the accuracy of the three schemes within 100 rounds. For MNIST data, the average accuracy of the proposed MePC-F is 98.2%, which is 0.6% higher than PeMPC. The accuracy of PeMPC can almost reach the accuracy of centralized training. For CAFIR-10 data, the average accuracy of MePC-F in 100 rounds is as high as 0.511, which is 1.9% higher than PeMPC. It shows that MePC-F can better aggregate global parameters than PeMPC through the optimal weight update a _i,k , thereby achieving higher accuracy and closer to the centralized accuracy.

应当理解的是，对本领域普通技术人员来说，可以根据上述说明加以改进或变换，而所有这些改进和变换都应属于本发明所附权利要求的保护范围。It should be understood that those skilled in the art can make improvements or changes based on the above description, and all these improvements and changes should fall within the scope of protection of the appended claims of the present invention.

Claims

1. A real-time enhanced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles, characterized in that the method comprises the following steps:

S1. Build multiple edge servers E _i and a cloud server CS; obtain vehicle data D = {D ₁ , D ₂ , ..., D _i }, and the edge server E _i obtains the corresponding vehicle data D _i ;

and decrypted to

Randomly initialize B-type gradient

S3, edge server E _i through decoding function

from

Get some gradient information that needs to be retained

And the remaining gradient information

After homomorphic encryption,

Get the corresponding partial gradient information from other edge servers _Ej

The updated and shared A-type gradient information of all edge servers are

i∈[1,n], n is the total number of edge servers;

S4, all edge servers will

S5. Repeat steps S2-S4 until the termination condition is reached. The cloud server CS calculates the final global gradient parameters and sends them to each edge server. The edge server extracts features from multiple vehicle data, calculates the accuracy and optimal loss function of the MePC-F model, obtains the trained MePC-F model, completes the entire training process, and outputs it to the corresponding service of the Internet of Vehicles in real time.

2. According to the method for real-time enhanced federated learning data privacy security based on the MePC-F model in the Internet of Vehicles according to claim 1, it is characterized in that in the step S2, the specific method of training the local network model is:

A deep neural network (DNN) model is used. The DNN performs end-to-end feature learning and classifier training by taking different vehicle data as raw input, and uses stochastic gradient descent as a subroutine to minimize the loss value in each local training.

and decrypted into a type A gradient

Randomly initialize B-type gradient

Before local training, E _i is encrypted using homomorphic encryption

Decrypted to

And record it as

The loss function of the local model is set as follows:

L( _wi )=l( _wi )+λ( _wi,t - _wi,t+1 ) ²

Among them, l() represents the loss of the network, the second term is the L2 regularization term, and λ is the regularization coefficient; _wi represents the total weight information in the local model, wi _,t is the weight information of the local model at time t, and _wi,t+1 is the weight information of the local model at time t+1;

E _i updates G _k and replaces the model's weight parameters w _i , and continues local model training by minimizing the loss function as follows:

w _i = w _i −ηG _k

Where η is the learning rate and _Gk is

and

The total representation here is

Random initialization;

3. According to claim 1, the real-time enhanced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles is characterized in that in step S3, the specific method of the MePC algorithm is:

In the kth round of federated tasks, all edge servers use MePC to exchange base layer gradients.

in,

That is

The encrypted data after removing the copy you keep;

The gradient is

Encrypted as

The remaining gradients are homomorphically encrypted as

Divided equally into n-1 parts

The values are divided into:

only

It will not be leaked;

The gradient information shared to other E _j is

When E _i receives a data packet sent by other servers

It performs data validation locally.

4. The real-time enhanced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles according to claim 3 is characterized in that in the step S3, the specific method of performing data verification locally is:

In the kth round of federated tasks, the corresponding "multiplication" method is used for verification, and each edge server designs two decoding functions by itself, as follows:

Among them, _L0 is

The length, L' is

L ₀ = x·L

Where L is

Length,

and

are of equal length;

Require

First, initialize the decoding function as follows:

Packet

Multiply with the corresponding decoding function in other servers; since

E _i adds all the data packet arrays obtained from other edge servers E _j to the corresponding positions, obtains all the ciphertext data, and updates to the final

Right now:

Each time a secure multi-party computation is performed, as k increases, the decoding function in each E _i is replaced by

The binary is circularly shifted to the left by m units to ensure

5. The real-time enhanced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles according to claim 1 is characterized in that in the step S4, the specific method of the PreFla algorithm is:

In the uplink communication phase, each edge server not only trains the local model, but also uploads the local parameters to the cloud server CS for joint aggregation; after executing the MePC algorithm in the kth round of federation, E _i transmits the parameters to the cloud server CS through the TLS/SSL secure channel.

and

Use DQN-based reinforcement learning to predict parameter weight ratios and store information through Q functions to prevent spatial multidimensional disasters; in order to better personalize the model and reduce the waiting time for uploading weights in MePC-F, use DQN to select the optimal parameter weight ratio a _{i, k} and aggregate to update the global parameters in CS

Reinforcement learning includes: state, action, reward function, and feedback.

6. The real-time enhanced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles according to claim 5 is characterized in that in step S4, the specific methods of state, action, reward function and feedback are:

Status: Status of round k

in,

is the precision difference, expressed as:

Action: The parameter weight ratio _ai,k represents the action of the federated task in round k. To avoid falling into the local optimal solution, the ε-greedy algorithm is used to optimize the action selection process and obtain _ai,k :

where P is a set of weight permutations, rand is a random number, rand ∈ [0, 1], Q( _{si, k} , ai _{, k} ) refers to the cumulative discounted return of the agent when taking action ai _{, k} in state si _{, k} ; once the DQN is trained to approximate Q(si _{, k} , ai _{, k} ) during testing, the DQN agent will calculate {Q(si _{, k} , ai _{, k} )|ai _{, k} ∈ [P]} for all actions in the kth round; each action value represents the maximum expected return that the agent can obtain by choosing a specific action ai _, _{k in state si,} k;

Reward: Set the reward observed at the end of the kth federation round to:

in,

is a positive number that ensures that r _k grows exponentially with the training accuracy Δacc _i,k ; the first term incentivizes the agent to choose a device that can achieve higher test accuracy;

The DQN agent is trained to maximize the expectation of the cumulative discounted reward as shown below:

Among them, γ∈(0,1], represents a factor that discounts future rewards;

After obtaining r _k , the cloud server CS saves the multidimensional quadruple B _k = (s _{i, k} , a _i, k , r _k , s _{i, k + 1} ) for each round of federated tasks; the optimal action-value function Q(s _{i, k} , a _{i, k} ) is the cheat sheet sought by the RL agent and is defined as the maximum expectation of the cumulative discounted return starting from s _{i, k} :

Q(s _i,k ,a _i,k )=E(r _i,k +γmax Q(s _i,k+1 ,a _i,k )|s _i,k ,a _i,k )

Function approximation techniques are applied to learn a parameterized value function Q(s _{i, k} , a _{i, k} ; w _k ) to approximate the optimal value function Q(s _{i, k} , a _{i, k} ); r _k +γmax Q(s _{i, k+1} , a _{i, k} ) is the goal of learning Q(s _{i, k} , a _{i, k} ; w _k ); DNN is used to represent the function approximator; the RL learning problem becomes minimizing the MSE loss between the target and the approximator, defined as:

l( _wk )=(ri _,k +γmax Q(Si _,k+1 ,ai _,k ; _wk )-Q(si _,k ,ai _,k ; _wk )) ²

CS updates the global parameter _wk as:

Where η≥0 is the step size;

After the cloud server CS obtains the best learning model, it obtains the k-th round weight ratio sequence a _{i, k} and sets the global parameter

Updated to:

All edge servers update global parameters

And start the next T rounds of local training.

7. The real-time enhanced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles according to claim 1 is characterized in that the HE encryption method in the method is specifically:

The encryption schemes for the weight matrix and the bias vector follow the same idea. The additive homomorphic encryption of a real number a is represented by a ^E . In additive homomorphic encryption, for any two numbers a and b, a ^E + b ^E = (a+b) ^E . The method to convert any real number r into an encoded rational fixed point v is:

Think gradient

Decode is defined as:

Multiplication of these encoded numbers requires the removal of the factor 1/2d; when using Paillier addition encryption, the encoded multiplication can be calculated exactly, but only one homomorphic multiplication is guaranteed; for simplicity, it is handled during decoding;

The largest encryptable integer is V-1, so the largest encryptable real number must take this into account. Therefore, the integer z and the decimal d are selected as follows:

V≥2 ^H+2d ≥2 ^1+z+3d .

8. The real-time enhanced federated learning data privacy security method based on the MePC-F model in the Internet of Vehicles according to claim 1 is characterized in that the optimal loss function in step S5 is

Where L( _wi ) represents the loss of the _Ei network.