CN117010484A - Personalized federated learning generalization method, device and application based on attention mechanism - Google Patents
- Publication number: CN117010484A (application CN202311277193.0A)
- Authority: CN (China)
- Prior art keywords: parameters, personalized, client, server, attention mechanism
- Prior art date: 2023-10-07
- Legal status: Granted
Classifications
- G06N 3/098: Computing arrangements based on biological models; neural networks; learning methods; distributed learning, e.g. federated learning
- G06F 18/24: Electric digital data processing; pattern recognition; analysing; classification techniques
- G06N 3/0464: Computing arrangements based on biological models; neural networks; architecture; convolutional networks [CNN, ConvNet]
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a personalized federated learning generalization method, device, and application based on an attention mechanism.
Background
Federated learning trains a common model by sharing the parameters or gradients obtained from training on each client's data, under the premise of "data islands" (that is, data is neither exchanged between clients nor uploaded to the server), thereby protecting client data privacy. Personalized federated learning is a commonly used federated learning approach whose goal is to retain personalized model parameters for each client, so that the local model adapts to that client's particular data distribution and achieves better local performance.
Personalized federated learning involves an important problem: how to guarantee the generalization ability of the model. Specifically, when a new client is added, especially one with little trainable data, the new client's performance is often hard to guarantee. The reason is that when data are scarce, directly training all parameters of the local model is prone to overfitting, which degrades the model.
Chinese Patent Publication No. CN115600686A discloses a federated learning system based on a personalized Transformer. That application sets up a hypernetwork on the server, assigns a randomly initialized embedding vector to each newly joined client, and then uses local data to train the new client's personalized model. However, randomly initialized trainable embedding vectors do not converge easily, and the client model structure lacks flexibility: the scheme applies only to local models with attention layers, such as Transformers.
Summary of the invention
The purpose of the present invention is to overcome the above defects of the prior art by providing a personalized federated learning generalization method, device, and application based on an attention mechanism, which improves the convergence of new clients and the training effect by mitigating overfitting.
The object of the present invention can be achieved through the following technical solutions.
One aspect of the present invention provides a personalized federated learning generalization method based on an attention mechanism, applied to a server and comprising the following steps:
initializing the shared parameters of the global model and sending them to at least one pre-connected client; receiving and storing each client's locally trained shared parameters and personalized parameters; updating the server's shared parameters based on the clients' shared parameters; and repeating this step until a termination condition is reached;
sending the personalized parameters of each existing client and the server's shared parameters to an untrained new client; on the new client, generating personalized parameters with a hypernetwork based on an attention mechanism, and training the hypernetwork on the new client's local data to complete the local update of the personalized parameters produced by the new client's hypernetwork.
As a preferred technical solution, the termination condition is that the number of communication rounds reaches a preset value.
As a preferred technical solution, the input of the hypernetwork is the personalized parameters of the existing clients, and its output is the personalized parameters of the new client.
As a preferred technical solution, the hypernetwork based on an attention mechanism comprises:
a fully connected layer for generating hidden vectors;
multiple normalization layers, and multiple self-attention layers arranged between the normalization layers, for generating the new client's personalized parameters from the hidden vectors.
As a preferred technical solution, the new client's shared parameters adopt the server's shared parameters.
As a preferred technical solution, the method further comprises the following step:
receiving the shared parameters and personalized parameters of multiple clients, including the newly initialized client, and updating the server's shared parameters with a weighted update based on each client's shared parameters.
As a preferred technical solution, the server's shared parameters are updated by weighted aggregation of the clients' shared parameters.
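Purely as an illustration of the weighted aggregation described above (this code does not appear in the patent, and the function and variable names are assumptions), a minimal PyTorch sketch that weights each client by its training data volume might look like this:

```python
import torch
from typing import Dict, List

def aggregate_shared(client_shared: List[Dict[str, torch.Tensor]],
                     num_samples: List[int]) -> Dict[str, torch.Tensor]:
    """Average the clients' shared parameters, weighted by data volume."""
    total = float(sum(num_samples))
    weights = [n / total for n in num_samples]
    return {
        name: sum(w * params[name] for w, params in zip(weights, client_shared))
        for name in client_shared[0]
    }
```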
Another aspect of the present invention provides a personalized federated learning generalization method based on an attention mechanism, applied to an untrained new client and comprising the following steps:
receiving the personalized parameters of multiple clients that have already been trained locally, as well as the server's shared parameters obtained by weighted aggregation;
training and updating the parameters of a hypernetwork based on an attention mechanism with local data; generating the new client's personalized parameters with the trained hypernetwork from the personalized parameters of the locally trained clients; and adopting the server's aggregated shared parameters as the new client's shared parameters;
uploading the updated personalized parameters and shared parameters to the server.
Another aspect of the present invention provides an electronic device comprising one or more processors and a memory in which one or more programs are stored, the one or more programs comprising instructions for executing the above personalized federated learning generalization method based on an attention mechanism.
Another aspect of the present invention provides an application of the above personalized federated learning generalization method based on an attention mechanism to an Internet of Vehicles comprising a server and at least one vehicle-mounted terminal. The method is applied to the server; the server deploys a global model, and each vehicle-mounted terminal deploys a local model comprising shared parameters and personalized parameters. The vehicle-mounted terminal further comprises a hypernetwork for generating the personalized parameters when joining the Internet of Vehicles.
Compared with the prior art, the present invention has the following advantages:
(1) Improved convergence and training effect for new clients: compared with initializing a new client with an ordinary globally averaged model and then training locally, the present invention generates the new client's personalized parameters with a hypernetwork based on an attention mechanism. This ensures fast convergence of the new client's model, avoids the overfitting caused by data scarcity in local training, and preserves the generalization ability that the global model gains from its broad coverage of data. Unlike existing schemes that assign each client an embedding vector to be trained, the hypernetwork here takes the personalized parameters of the already trained clients as its training input, which converges easily.
(2) Applicable to scenarios with multiple client model structures: unlike some existing schemes that restrict clients to a particular network structure, the present invention does not constrain each client's local model structure. For example, it may be a CNN, a Transformer, or any other structure, with the personalized layers of the network serving as the hypernetwork's output. Local training on the client can therefore be more flexible and is not limited by computing-power or similar constraints. Moreover, the hypernetwork of this application resides on the client rather than the server, so whether to use it can be decided flexibly according to the client's local situation.
Description of the drawings
Figure 1 is a flow chart of the federated learning generalization method applied to the server in the embodiment;
Figure 2 is a schematic diagram of the structure of the hypernetwork in the embodiment;
Figure 3 is a flow chart of the federated learning generalization method applied to a new client in the embodiment;
Figure 4 is a flow chart of the parameter update process of an existing client in the embodiment;
Figure 5 is a schematic diagram of the structure of the electronic device in the embodiment.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
It should be noted that, provided there is no conflict, the features in the following embodiments and implementations may be combined with one another.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device create means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Example 1
To solve, or partially solve, the prior-art problem that the model performance of a new client is hard to guarantee when it joins federated learning, this embodiment provides a personalized federated learning generalization method based on an attention mechanism, applied to the server. The method uses an attention mechanism based on model-weight similarity: on the new client, a hypernetwork whose attention is based on the correlation of model parameters aggregates and trains over the models of the original clients to obtain the new client's model parameters.
In this embodiment, there are N clients, one server, and K communication rounds.
Referring to Figure 1, the method comprises the following steps:
S1: randomly initialize the shared parameters θ_s of the server's global model and the clients' personalized parameters {θ_p^1, θ_p^2, ..., θ_p^N};
S2: the server sends the initialized parameters to each client;
S3: each client receives and updates the shared parameters θ_s, then trains and updates its local parameters (comprising shared and personalized parts) on its local data, yielding {θ_s^1, θ_s^2, ..., θ_s^N} and {θ_p^1, θ_p^2, ..., θ_p^N};
S4: each client uploads its updated local parameters {θ_s^1, ..., θ_s^N} and {θ_p^1, ..., θ_p^N} to the server;
S5: the server receives the parameters uploaded by the clients and performs a weighted aggregation of {θ_s^1, ..., θ_s^N} according to each client's amount of training data to obtain a new θ_s; return to step S2 until the number of iterations reaches the preset number of communication rounds K;
S6: a new client N+1 joins the training, and the shared parameters θ_s and personalized parameters {θ_p^1, ..., θ_p^N} stored on the server are transmitted to the new client;
S7: on the new client, a hypernetwork based on an attention mechanism is built to generate the local model parameters, and the hypernetwork is trained with local data to obtain the parameters of the local model;
S8: the parameters of the local model are transmitted to the server; the server receives the model parameters of each client and performs weighted aggregation according to each client's amount of training data.
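Steps S1 to S5 amount to a standard server-side federated training loop. The following sketch is illustrative only: `init_shared`, `broadcast`, and `collect` are assumed callables standing in for the initialization and transport logic the patent leaves unspecified, and `aggregate_shared` is the weighted aggregation sketched earlier.

```python
def run_server(init_shared, broadcast, collect, rounds_k: int):
    """Server side of steps S1-S5; the transport callables are placeholders."""
    theta_s = init_shared()          # S1: random initialization of shared params
    personalized_list = []
    for _ in range(rounds_k):        # repeat for K communication rounds
        broadcast(theta_s)           # S2: send current parameters to every client
        shared_list, personalized_list, counts = collect()  # S3/S4: client uploads
        theta_s = aggregate_shared(shared_list, counts)     # S5: weighted aggregation
    # theta_s and personalized_list are stored for new-client onboarding (S6-S8).
    return theta_s, personalized_list
```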
Figure 2 is a schematic diagram of the structure of the hypernetwork based on an attention mechanism. The model's input is the personalized parameters {θ_p^1, θ_p^2, ..., θ_p^N} of the existing clients, and its output is the personalized parameters θ_p^(N+1) of the new client N+1. The model comprises, connected in sequence: a fully connected layer, normalization layer 1, self-attention layer 1, normalization layer 2, self-attention layer 2, and normalization layer 3. The fully connected layer generates, from the existing clients' personalized parameters, a set of hidden vectors whose number matches the number of existing clients. It should be emphasized that the type and number of layers in this embodiment may be varied; for example, additional pairs of normalization and self-attention layers may be used.
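A minimal PyTorch rendering of this architecture might look as follows; it is a sketch, not code from the patent. The layer sequence follows Figure 2, while the hidden dimension, the number of attention heads, and the final pooling-plus-projection used to map back to the personalized-parameter dimension are assumptions, since the patent specifies only the order of the layers.

```python
import torch
import torch.nn as nn

class AttentionHyperNetwork(nn.Module):
    """Sketch of the Fig. 2 hypernetwork: FC -> (norm, self-attention) x 2 -> norm."""

    def __init__(self, param_dim: int, hidden_dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(param_dim, hidden_dim)  # one hidden vector per client
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.attn1 = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.attn2 = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm3 = nn.LayerNorm(hidden_dim)
        self.head = nn.Linear(hidden_dim, param_dim)   # assumed output projection

    def forward(self, client_params: torch.Tensor) -> torch.Tensor:
        # client_params: (1, N, param_dim), the flattened personalized parameters
        # of the N existing clients; attention mixes information across clients.
        h = self.embed(client_params)
        x = self.norm1(h)
        h, _ = self.attn1(x, x, x)
        x = self.norm2(h)
        h, _ = self.attn2(x, x, x)
        h = self.norm3(h)
        # Pool over the client axis and project to the new client's parameters.
        return self.head(h.mean(dim=1))                # (1, param_dim)
```

For example, with 10 existing clients whose personalized layers flatten to 4096 values each, `AttentionHyperNetwork(param_dim=4096)` maps an input of shape `(1, 10, 4096)` to an output of shape `(1, 4096)`; the self-attention layers let the generated parameters weight each existing client according to the correlation of its personalized parameters with the others.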
Figure 4 shows the parameter update process of an existing client, which comprises the following steps:
S1: receive and update the shared parameters θ_s;
S2: train and update the local parameters (comprising shared and personalized parts) on local data, yielding {θ_s^1, ..., θ_s^N} and {θ_p^1, ..., θ_p^N};
S3: upload the updated local parameters {θ_s^1, ..., θ_s^N} and {θ_p^1, ..., θ_p^N} to the server.
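As an illustrative sketch of one such client round (again with assumed names rather than code from the patent), the keys of the model's state dict that appear in the server's `shared_state` are treated as the shared layers, and all remaining parameters as the personalized layers:

```python
import torch
import torch.nn.functional as F

def local_update(model, shared_state, loader, epochs=1, lr=0.01):
    """S1: adopt the server's shared parameters; S2: train locally; S3: upload."""
    model.load_state_dict(shared_state, strict=False)   # only shared layers overwritten
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            opt.step()
    state = model.state_dict()
    shared = {k: v for k, v in state.items() if k in shared_state}
    personalized = {k: v for k, v in state.items() if k not in shared_state}
    return shared, personalized                         # both parts go to the server
```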
This method takes the relationships between clients into account; specifically, it introduces an attention mechanism, and a single hypernetwork takes the personalized parameters of multiple original clients as input to generate the personalized parameters of the new client.
To illustrate the advantages of this method, a server-side update algorithm for federated learning is provided below as a comparison. It comprises the following steps:
Step 1: randomly initialize the parameters of the global model;
Step 2: send the global model parameters to each client;
Step 3: each client receives the global parameters and updates its local parameters;
Step 4: the server receives each client's parameters and performs weighted aggregation according to each client's amount of training data; return to Step 2 until the number of iterations reaches the preset number of communication rounds K.
It can be seen that, compared with using an ordinary globally averaged model as the new client's initialization and then training locally, the present invention's use of a hypernetwork based on an attention mechanism both ensures fast convergence of the new client's model and avoids the overfitting caused by data scarcity in local training, while preserving the generalization ability that the global model gains from its broad coverage of data. The reason is as follows: if the initialized client model is directly trained over its full set of parameters, the client's model as a whole moves toward a local optimum biased toward the local data distribution; when local data are scarce, this optimum can be very far from the global optimum, which harms the local model. In contrast, the hypernetwork's input is the other clients' models, so the output model is constrained by the global training results, which greatly alleviates overfitting while still guaranteeing convergence for the new client.
In a specific application scenario, for an Internet of Vehicles comprising a server and at least one vehicle-mounted terminal, the above personalized federated learning generalization method is applied to the server: the server deploys a global model, each vehicle-mounted terminal deploys a local model comprising shared parameters and personalized parameters, and the vehicle-mounted terminal further comprises a hypernetwork for generating the personalized parameters when joining the Internet of Vehicles.
When the present invention constructs the new client's hypernetwork, the hypernetwork refers simultaneously to the personalized parameters of all the models. This introduces information about the correlations among the clients' personalized parameters and improves the final result, unlike previous schemes, whose training does not consider the correlation between models.
Example 2
Building on Example 1 and referring to Figure 3, this embodiment provides a personalized federated learning generalization method based on an attention mechanism, applied to a new (i.e., not yet trained) client. The method comprises the following steps:
S1: receive the personalized parameters of the multiple existing client models and the server's shared parameters obtained by weighted aggregation;
S2: train and update the hypernetwork parameters on local data to obtain the personalized-layer parameters of the local model; the other layers use the globally averaged shared parameters θ_s;
S3: construct the hypernetwork based on an attention mechanism, whose input is the personalized parameters {θ_p^1, θ_p^2, ..., θ_p^N} of the client models and whose output is the personalized-layer parameters θ_p^(N+1) of the local model;
S4: upload the new parameters to the server.
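The key difference from ordinary local training is that the optimizer updates the hypernetwork rather than the local model: the local loss is backpropagated through the generated personalized parameters into the hypernetwork. The sketch below is an assumption-laden illustration; in particular, `apply_personalized` is a hypothetical helper that applies the generated flat parameter vector to the model's personalized layers in a differentiable way (for instance via `torch.func.functional_call`) and returns the model output.

```python
import torch
import torch.nn.functional as F

def onboard_new_client(hypernet, model, existing_params, shared_state,
                       loader, epochs=1, lr=1e-3):
    """S1-S4 for a new client: only the hypernetwork's parameters are trained."""
    model.load_state_dict(shared_state, strict=False)      # shared layers from server
    inputs = torch.stack(existing_params).unsqueeze(0)     # (1, N, param_dim)
    opt = torch.optim.Adam(hypernet.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            theta_p = hypernet(inputs).squeeze(0)          # generated personalized params
            out = apply_personalized(model, theta_p, x)    # hypothetical helper (see text)
            loss = F.cross_entropy(out, y)
            opt.zero_grad()
            loss.backward()                                # gradients reach the hypernet
            opt.step()
    theta_p = hypernet(inputs).squeeze(0).detach()         # final personalized params
    return theta_p, shared_state                           # S4: upload both to the server
```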
To illustrate the advantages of this method, a client-side update algorithm for federated learning is provided below as a comparison. It comprises the following steps:
Step 31: receive the global model parameters as the local model parameters, with the personalization layers retaining their original parameters;
Step 32: train and update the local model on local data to obtain updated local model parameters;
Step 33: transmit the updated local parameters, except for those of the personalization layers, to the server.
By using a hypernetwork based on an attention mechanism, the present invention both ensures fast convergence of the new client's model and avoids the overfitting caused by data scarcity in local training, while preserving the generalization ability that the global model gains from its broad coverage of data.
Example 3
This embodiment provides an electronic device comprising one or more processors and a memory in which one or more programs are stored, the one or more programs comprising computer program instructions for executing the personalized federated learning generalization method based on an attention mechanism described in Example 1 or Example 2.
The methods or devices set forth in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Figure 5 is a schematic diagram of the structure of an electronic device. At the hardware level, the electronic device comprises a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also comprise other hardware required by the service. The non-volatile storage stores instructions for executing the personalized federated learning generalization method of Example 1 or Example 2; the processor reads the corresponding computer program from the non-volatile storage into memory and runs it, implementing the method described above with reference to Figure 1. Besides a software implementation, the present invention does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logical units and may also be hardware or logic devices.
Memory may include non-persistent storage in computer-readable media, in the form of random-access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Example 4
This embodiment provides a computer-readable storage medium comprising one or more programs for execution by one or more processors of an electronic device, the one or more programs comprising computer program instructions for executing the personalized federated learning generalization method based on an attention mechanism described in Example 1 or Example 2.
For the personalized federated learning generalization method of Example 1, the computer program instructions are:
S1: randomly initialize the shared parameters θ_s of the server's global model and the clients' personalized parameters {θ_p^1, θ_p^2, ..., θ_p^N};
S2: the server sends the initialized parameters to each client;
S3: receive the clients' updated local parameters {θ_s^1, ..., θ_s^N} and {θ_p^1, ..., θ_p^N};
S4: the server receives the parameters uploaded by the clients and performs a weighted aggregation of {θ_s^1, ..., θ_s^N} according to each client's amount of training data to obtain a new θ_s; return to step S2 until the number of iterations reaches the preset number of communication rounds K;
S5: when a new client N+1 joins the training, transmit the shared parameters θ_s and the personalized parameters {θ_p^1, ..., θ_p^N} stored on the server to the new client;
S6: on the new client, build a hypernetwork based on an attention mechanism to generate the local model parameters, train the hypernetwork with local data to obtain the parameters of the local model, receive the parameters of the local model, and perform weighted aggregation of the clients' model parameters according to each client's amount of training data.
For the personalized federated learning generalization method of Example 2, the computer program instructions are:
S1: receive the personalized parameters of the multiple existing client models and the server's shared parameters obtained by weighted aggregation;
S2: train and update the hypernetwork parameters on local data to obtain the personalized-layer parameters of the local model; the other layers use the globally averaged shared parameters θ_s;
S3: construct the hypernetwork based on an attention mechanism, whose input is the personalized parameters {θ_p^1, ..., θ_p^N} of the client models and whose output is the personalized-layer parameters θ_p^(N+1) of the local model;
S4: upload the new parameters to the server.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Example 5
This embodiment provides a personalized federated learning generalization system based on an attention mechanism, comprising N clients and one server.
The server is configured to perform the following process:
S1: randomly initialize the shared parameters θ_s of the server's global model and the clients' personalized parameters {θ_p^1, θ_p^2, ..., θ_p^N};
S2: the server sends the initialized parameters to each client;
S3: receive the clients' updated local parameters {θ_s^1, ..., θ_s^N} and {θ_p^1, ..., θ_p^N};
S4: the server receives the parameters uploaded by the clients and performs a weighted aggregation of {θ_s^1, ..., θ_s^N} according to each client's amount of training data to obtain a new θ_s; return to step S2 until the number of iterations reaches the preset number of communication rounds K.
When a new client N+1 joins the system, the server is further configured to perform:
S5: when the new client N+1 joins the training, transmit the shared parameters θ_s and the personalized parameters {θ_p^1, ..., θ_p^N} stored on the server to the new client;
S6: on the new client, build a hypernetwork based on an attention mechanism to generate the local model parameters, train the hypernetwork with local data to obtain the parameters of the local model, receive the parameters of the local model, and perform weighted aggregation of the clients' model parameters according to each client's amount of training data.
The client is configured to perform the following process:
S1: receive the personalized parameters of the multiple existing client models and the server's shared parameters obtained by weighted aggregation;
S2: train and update the hypernetwork parameters on local data to obtain the personalized-layer parameters of the local model; the other layers use the globally averaged shared parameters θ_s;
S3: construct the hypernetwork based on an attention mechanism, whose input is the personalized parameters {θ_p^1, ..., θ_p^N} of the client models and whose output is the personalized-layer parameters θ_p^(N+1) of the local model;
S4: upload the new parameters to the server.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and such modifications or substitutions shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311277193.0A | 2023-10-07 | 2023-10-07 | Personalized federated learning generalization method, device and application based on attention mechanism
Publications (2)

Publication Number | Publication Date
---|---
CN117010484A | 2023-11-07
CN117010484B | 2024-01-26
Family
ID=88562183
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202311277193.0A (granted as CN117010484B, active) | Personalized federated learning generalization method, device and application based on attention mechanism | 2023-10-07 | 2023-10-07

Country Status (1)

Country | Link
---|---
CN | CN117010484B
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021115480A1 (en) * | 2020-06-30 | 2021-06-17 | 平安科技(深圳)有限公司 | Federated learning method, device, equipment, and storage medium |
CN112329940A (en) * | 2020-11-02 | 2021-02-05 | 北京邮电大学 | A personalized model training method and system combining federated learning and user portraits |
WO2023284387A1 (en) * | 2021-07-15 | 2023-01-19 | 卡奥斯工业智能研究院(青岛)有限公司 | Model training method, apparatus, and system based on federated learning, and device and medium |
CN113297396A (en) * | 2021-07-21 | 2021-08-24 | 支付宝(杭州)信息技术有限公司 | Method, device and equipment for updating model parameters based on federal learning |
CN115169575A (en) * | 2022-06-23 | 2022-10-11 | 深圳前海环融联易信息科技服务有限公司 | Personalized federal learning method, electronic device and computer readable storage medium |
CN115086399A (en) * | 2022-07-28 | 2022-09-20 | 深圳前海环融联易信息科技服务有限公司 | Federal learning method and device based on hyper network and computer equipment |
CN115840900A (en) * | 2022-09-16 | 2023-03-24 | 河海大学 | Personalized federal learning method and system based on self-adaptive clustering layering |
CN115600686A (en) * | 2022-10-18 | 2023-01-13 | 上海科技大学(Cn) | Personalized Transformer-based federated learning model training method and federated learning system |
CN116227623A (en) * | 2023-01-29 | 2023-06-06 | 深圳前海环融联易信息科技服务有限公司 | Federal learning method, federal learning device, federal learning computer device, and federal learning storage medium |
Non-Patent Citations (1)
Title |
---|
Gao Yujia et al., "Personalized federated learning method based on attention-enhanced meta-learning network" (基于注意力增强元学习网络的个性化联邦学习方法), Journal of Computer Research and Development (计算机研究与发展) *
Cited By (4)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117892805A * | 2024-03-18 | 2024-04-16 | Tsinghua University | Personalized federated learning method based on hypernetwork and layer-level collaborative graph aggregation
CN117892805B * | 2024-03-18 | 2024-05-28 | Tsinghua University | Personalized federated learning method based on hypernetwork and layer-level collaborative graph aggregation
CN118870395A * | 2024-09-14 | 2024-10-29 | Jinan University | A method and system for task offloading and resource allocation in Internet of Vehicles based on self-attention mechanism
CN119294443A * | 2024-10-12 | 2025-01-10 | South China University of Technology | Transformer federated learning acceleration method and system based on personalized attention
Also Published As
Publication number | Publication date |
---|---|
CN117010484B (en) | 2024-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117010484B (en) | 2024-01-26 | Personalized federated learning generalization method, device and application based on attention mechanism |
Liu et al. | From distributed machine learning to federated learning: A survey | |
US11928595B2 (en) | Method of managing data representation for deep learning, method of processing data for deep learning and deep learning system performing the same | |
CN112784995B (en) | Federal learning method, apparatus, device and storage medium | |
CN110874648A (en) | Federal model training method and system and electronic equipment | |
US20240135191A1 (en) | Method, apparatus, and system for generating neural network model, device, medium, and program product | |
CN115510316A (en) | A privacy-preserving cross-domain recommendation system based on federated representation learning | |
WO2022272064A1 (en) | Attestation- as-a-service for confidential computing | |
Li et al. | Multi-task offloading scheme for UAV-enabled fog computing networks | |
CN110490317B (en) | Neural network operation device and operation method | |
Qu et al. | Blockchained dual-asynchronous federated learning services for digital twin empowered edge-cloud continuum | |
Zhang et al. | Blockchain-based secure communication of internet of things in space–air–ground integrated network | |
CN112817898A (en) | Data transmission method, processor, chip and electronic equipment | |
CN117874815A (en) | A privacy-preserving method for generative adversarial network trajectories based on game strategy | |
CN116405204A (en) | A privacy-preserving ensemble learning method based on secret sharing | |
CN117851967A (en) | A multi-modal information fusion method, device, equipment and storage medium | |
CN117034008A (en) | Efficient federal large model adjustment method, system and related equipment | |
Yan et al. | FSSC: Federated Learning of Transformer Neural Networks for Semantic Image Communication | |
CN116962085B (en) | Robust personalized federal learning method, device, system and storage medium | |
CN115860135B (en) | Heterogeneous federation learning method, equipment and medium based on super network | |
Wang et al. | Blockchain-based secure and efficient federated learning with three-phase consensus and unknown device selection | |
CN113868662A (en) | Secure execution of machine learning networks | |
CN119494385A (en) | Language model training method, device, storage medium and product | |
EP4446917A1 (en) | Nft-based firmware management | |
CN118332591A (en) | Federated graph personalized learning method, system and medium based on graph structure characteristics |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant