CN117010484B - Personalized federated learning generalization method, device and application based on attention mechanism - Google Patents

Personalized federated learning generalization method, device and application based on attention mechanism

Info

Publication number
CN117010484B
CN117010484B
Authority
CN
China
Prior art keywords
parameters
personalized
client
attention mechanism
sharing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311277193.0A
Other languages
Chinese (zh)
Other versions
CN117010484A (en)
Inventor
张璐 (Zhang Lu)
杨耀 (Yang Yao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311277193.0A
Publication of CN117010484A
Application granted
Publication of CN117010484B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/098 Distributed learning, e.g. federated learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Transfer Between Computers (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a personalized federated learning generalization method, device and application based on an attention mechanism, comprising the following steps: initializing the shared parameters of the global model, sending the shared parameters to pre-connected clients, receiving the shared parameters and personalized parameters of each client after local training, and updating the server's shared parameters based on each client's shared parameters; then sending the personalized parameters of the existing clients and the server's shared parameters to an untrained new client, and generating personalized parameters at the new client with an attention-based hypernetwork. The new client trains on local data to update the hypernetwork's parameters rather than the local model's parameters directly: the shared-parameter part is left unchanged, and the new client's personalized parameters are generated through hypernetwork learning. When the new client's hypernetwork is constructed, it attends to the personalized parameters of all existing models simultaneously, introducing correlation information among the clients' personalized parameters and improving the final result.

Description

Personalized federated learning generalization method, device and application based on attention mechanism
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a personalized federated learning generalization method, device and application based on an attention mechanism.
Background
Federated learning trains a common model under the constraint of data silos (i.e., data is not exchanged between clients or uploaded to the server) by sharing the parameters or gradients trained on each client's data, thereby protecting client data privacy. Personalized federated learning is a common variant of federated learning that retains personalized model parameters for each client's distinct data distribution, adapting to that distribution and improving the local model's performance.
An important issue in personalized federated learning is how to guarantee model generalization. In particular, when a new client joins, especially one with little trainable data, the new client's model performance is often hard to guarantee: with scarce data, directly training all of the local model's parameters easily leads to overfitting, degrading the model.
Chinese patent publication No. CN115600686A discloses a federated learning system based on personalized Transformers, which trains a personalized model for a new client by setting up a hypernetwork at the server and distributing randomly initialized embedding vectors to the newly added client for training with local data. However, randomly initialized trainable embedding vectors do not converge easily, and the model structure of each client lacks flexibility: the approach only applies to local models with an attention layer, such as a Transformer.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a personalized federated learning generalization method, device and application based on an attention mechanism, which improve the convergence of a new client by alleviating overfitting and improve the training effect.
The aim of the invention can be achieved by the following technical scheme:
The invention provides a personalized federated learning generalization method based on an attention mechanism, applied to a server, comprising the following steps:
initializing the shared parameters of the global model, sending the shared parameters to at least one pre-connected client, receiving and storing the shared parameters and personalized parameters of each client after local training, and updating the server's shared parameters based on each client's shared parameters; these steps are executed repeatedly until a termination condition is reached;
sending the personalized parameters of all existing clients and the server's shared parameters to an untrained new client, generating personalized parameters at the new client with an attention-based hypernetwork, and training the hypernetwork on the new client's local data, completing the local update of the new client's personalized parameters.
As a preferred technical solution, the termination condition is that the number of communication rounds reaches a preset value.
As a preferred technical solution, the input of the hypernetwork is the personalized parameters of each existing client, and the output is the personalized parameters of the new client.
As a preferred technical solution, the attention-based hypernetwork comprises:
a fully connected layer for generating hidden vectors;
a plurality of normalization layers, with self-attention layers arranged between them, for generating the personalized parameters of the new client from the hidden vectors.
As a preferred technical solution, the new client adopts the server's shared parameters as its own shared parameters.
As a preferred technical solution, the method further comprises the following step:
receiving the shared parameters and personalized parameters of a plurality of clients, including the new client whose parameters have been initialized, and updating the server's shared parameters based on a weighted combination of the clients' shared parameters.
As a preferred technical solution, the server's shared parameters are updated by weighted aggregation of the clients' shared parameters.
In another aspect of the present invention, there is provided a personalized federated learning generalization method based on an attention mechanism, applied to an untrained new client, comprising the following steps:
receiving the personalized parameters of a plurality of locally trained clients and the server's shared parameters obtained by weighted aggregation;
training on local data to update the parameters of the attention-based hypernetwork, generating the new client's personalized parameters with the trained hypernetwork from the personalized parameters of the locally trained clients, and adopting the server's aggregated shared parameters as the new client's shared parameters;
uploading the updated personalized parameters and shared parameters to the server.
In another aspect of the present invention, there is provided an electronic device comprising one or more processors and a memory, the memory storing one or more programs comprising instructions for performing the personalized federated learning generalization method based on an attention mechanism described above.
The invention further provides an application of the personalized federated learning generalization method based on the attention mechanism, applied to a server and at least one vehicle-mounted terminal in an Internet of Vehicles comprising the server, wherein the server is deployed with a global model and each vehicle-mounted terminal is deployed with a local model comprising shared parameters and personalized parameters; the vehicle-mounted terminal further comprises a hypernetwork for generating the personalized parameters when joining the vehicle network.
Compared with the prior art, the invention has the following advantages:
(1) The convergence of new-client training is improved, along with the training effect: compared with schemes that initialize a new client's model from an ordinary global average model and then train locally, the invention generates the new client's personalized parameters with an attention-based hypernetwork. This ensures rapid convergence of the new client's model, avoids the overfitting caused by data scarcity in local training, and preserves the generalization ability the global model gains from its broad data coverage. Unlike the existing scheme of distributing a trainable embedding vector to each client, the hypernetwork's training input is the personalized parameters of the already-trained clients, so it converges easily.
(2) The method suits scenarios with diverse client model structures and has strong applicability: unlike some existing schemes that restrict clients to a particular network structure, the invention does not constrain each client's local model structure, which may be, for example, a CNN, a Transformer, or another architecture. A personalized layer in the network serves as the hypernetwork's output, so the clients' local training can be more flexible and is not limited by computational conditions and the like.
Drawings
FIG. 1 is a flowchart of a federal learning generalization method applied to a server in an embodiment;
FIG. 2 is a schematic diagram of a super network in an embodiment;
FIG. 3 is a flow chart of a federal learning generalization method applied to new clients in an embodiment;
FIG. 4 is a flowchart of a parameter update process of an existing client in an embodiment;
FIG. 5 is a schematic structural diagram of an electronic device in an embodiment.
Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings; evidently, the described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
The features of the following examples and embodiments may be combined with each other without any conflict.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Embodiment 1
To solve, or partially solve, the prior-art problem that the model performance of a newly added client in federated learning is hard to guarantee, this embodiment provides a personalized federated learning generalization method based on an attention mechanism, applied to a server. The method uses, at the new client, a hypernetwork whose attention mechanism captures the correlation (weight similarity) among model parameters, aggregating the models of the original clients and training on them to obtain the new client's model parameters.
In this embodiment, there are N clients, 1 server, and K communication rounds.
Referring to fig. 1, the method comprises the steps of:
S1: randomly initialize the shared parameters θ_g of the server's global model and the personalized parameters {v_1, v_2, …, v_N} of the clients;
S2: the server sends the initialized parameters θ_g to each client;
S3: each client i receives and adopts the shared parameters θ_g, then updates its local parameters (the shared parameters θ_i and the personalized parameters v_i) based on its local data, obtaining {θ_1, θ_2, …, θ_N} and {v_1, v_2, …, v_N};
S4: each client uploads its updated local parameters {θ_1, θ_2, …, θ_N} and {v_1, v_2, …, v_N} to the server;
S5: the server receives the parameters uploaded by each client and performs weighted aggregation of {θ_1, θ_2, …, θ_N} according to each client's amount of training data, obtaining a new θ_g; jump back to step S2 until the number of cycles reaches the preset communication round count K;
S6: when a new client N+1 joins the training, the server sends the stored shared parameters θ_g and personalized parameters {v_1, v_2, …, v_N} to the new client;
S7: an attention-based hypernetwork is constructed at the new client to generate the local model parameters, and the hypernetwork is trained with local data to obtain the parameters of the local model;
S8: the parameters of the local model are transmitted to the server, which receives the model parameters of each client and performs weighted aggregation according to each client's amount of training data.
Referring to fig. 2, a schematic diagram of the structure of the attention-based hypernetwork is shown. The model's input is the personalized parameters {v_1, v_2, …, v_N} of the existing clients, and its output is the personalized parameters v_{N+1} of the new client N+1. The model comprises, connected in sequence: a fully connected layer, normalization layer 1, self-attention layer 1, normalization layer 2, self-attention layer 2, and normalization layer 3. The fully connected layer generates, from the existing clients' personalized parameters, a set of hidden vectors matching the number of existing clients. It should be emphasized that the kind and number of layers may vary in this embodiment; for example, additional pairs of normalization and self-attention layers may be stacked.
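A minimal PyTorch sketch of this structure follows, assuming each client's personalized parameters are flattened into a d_in-dimensional vector. The mean-pooling and the final output projection are assumptions added so that a single parameter vector is produced; fig. 2 itself only specifies the fully connected, normalization, and self-attention layers:

```python
import torch
import torch.nn as nn

class AttentionHyperNetwork(nn.Module):
    """FC -> Norm 1 -> Self-Attn 1 -> Norm 2 -> Self-Attn 2 -> Norm 3 (fig. 2).
    Input:  personalized parameters of the N existing clients, shape (N, d_in).
    Output: generated personalized parameters v_{N+1}, shape (d_out,)."""

    def __init__(self, d_in, d_hidden, d_out, n_heads=4):
        super().__init__()
        self.fc = nn.Linear(d_in, d_hidden)      # one hidden vector per client
        self.norm1 = nn.LayerNorm(d_hidden)
        self.attn1 = nn.MultiheadAttention(d_hidden, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_hidden)
        self.attn2 = nn.MultiheadAttention(d_hidden, n_heads, batch_first=True)
        self.norm3 = nn.LayerNorm(d_hidden)
        self.out = nn.Linear(d_hidden, d_out)    # assumed projection to v_{N+1}

    def forward(self, v_existing):               # v_existing: (N, d_in)
        h = self.fc(v_existing).unsqueeze(0)     # (1, N, d_hidden)
        h = self.norm1(h)
        a, _ = self.attn1(h, h, h)               # attention across the N clients
        h = self.norm2(a)
        a, _ = self.attn2(h, h, h)
        h = self.norm3(a)
        return self.out(h.mean(dim=1)).squeeze(0)  # assumed mean-pool over clients

# Usage: hyper = AttentionHyperNetwork(d_in=1024, d_hidden=256, d_out=1024)
#        v_new = hyper(torch.stack([v1, v2, v3]))   # three existing clients
```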
Referring to fig. 4, the parameter updating process of the existing client includes the following steps:
S1: receive and adopt the shared parameters θ_g;
S2: update the local parameters (the shared parameters θ_i and the personalized parameters v_i) based on the local data;
S3: upload the updated local parameters θ_i and v_i to the server.
The method takes the relations among the clients into account; specifically, it introduces an attention mechanism, and the input of the shared hypernetwork is the personalized parameters of the multiple original clients, from which the new client's personalized parameters are generated.
To illustrate the advantages of the method, a conventional federated learning server-side update algorithm is provided below as a comparative example, comprising the following steps:
Step 1: randomly initialize the parameters of the global model;
Step 2: send the global model parameters to each client;
Step 3: each client receives the global parameters and updates its local parameters;
Step 4: the server receives each client's parameters, performs weighted aggregation according to each client's amount of training data, and jumps back to Step 2 until the number of cycles reaches the preset communication round count K.
Therefore, compared with initializing a new client's model from an ordinary global average model and then directly training locally, the invention's attention-based hypernetwork ensures rapid convergence of the new client's model, avoids the overfitting caused by data scarcity in local training, and preserves the generalization ability the global model gains from broad data coverage. The reason is that if the initialized client model is trained directly over its complete parameter set, the client's overall model drifts toward a local optimum biased to the local data distribution; when local data is scarce, that optimum lies far from the global optimum, hurting the local model's performance. In contrast, because the hypernetwork's input is the models of the other clients, the output model is constrained by the global training results, which greatly mitigates overfitting while still guaranteeing the new client's convergence.
In a specific application scenario, for an Internet of Vehicles comprising a server and at least one vehicle-mounted terminal, the described personalized federated learning generalization method is applied to the server. The server is deployed with a global model; each vehicle-mounted terminal is deployed with a local model comprising shared parameters and personalized parameters, and further comprises a hypernetwork for generating the personalized parameters when joining the vehicle network.
When the new client's hypernetwork is constructed, it attends to the personalized parameters of all existing models simultaneously, so the correlation information among the clients' personalized parameters can be introduced and the final result improved. This differs from previous schemes, which do not consider the correlation between models during training.
Embodiment 2
Based on embodiment 1 and referring to fig. 3, this embodiment provides a personalized federated learning generalization method based on an attention mechanism, applied to a new (i.e., untrained) client, comprising the following steps:
S1: receive the personalized parameters {v_1, v_2, …, v_N} of the existing client models and the server's shared parameters θ_g obtained by weighted aggregation;
S2: construct an attention-based hypernetwork whose input is the personalized parameters {v_1, v_2, …, v_N} of the client models and whose output is the local model's personalized-layer parameters v_{N+1};
S3: train on local data to update the hypernetwork's parameters, obtaining the local model's personalized-layer parameters; the other layers use the globally averaged shared parameters θ_g;
S4: upload the new parameters to the server.
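A sketch of this new-client procedure follows. It assumes the personalized layer is the local model's final linear layer (named head here) and uses torch.func.functional_call so the loss stays differentiable with respect to the hypernetwork's parameters; note that only the hypernetwork is optimized, never the local model directly:

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def new_client_update(hyper, local_model, v_existing, theta_g, loader,
                      epochs=1, lr=0.01):
    """Fig. 3 sketch: generate the personalized-layer parameters v_{N+1} with
    the hypernetwork (S2) and train only the hypernetwork on local data (S3)."""
    v_stack = torch.stack(v_existing)              # (N, d_in), one row per client
    opt = torch.optim.SGD(hyper.parameters(), lr=lr)   # hypernetwork params only
    n_out, n_in = local_model.head.weight.shape
    for _ in range(epochs):
        for x, y in loader:
            v_new = hyper(v_stack)                 # generate v_{N+1}
            params = dict(theta_g)                 # frozen global shared layers
            params["head.weight"] = v_new[: n_out * n_in].view(n_out, n_in)
            params["head.bias"] = v_new[n_out * n_in :]
            # functional_call applies v_new without mutating local_model, so
            # gradients from the loss flow back into the hypernetwork
            loss = F.cross_entropy(functional_call(local_model, params, (x,)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return hyper(v_stack).detach(), theta_g        # S4: upload v_{N+1} and theta_g
```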
To illustrate the advantages of the present method, a conventional federated learning client-side update algorithm is provided below as a comparative example, comprising the following steps:
Step 31: receive the global model parameters as the local model parameters, with the personalized layer retaining its original parameters;
Step 32: train and update the local model on the local data to obtain the updated local model parameters;
Step 33: transmit the updated local parameters, other than the personalized layer, to the server.
The invention uses an attention-based hypernetwork to ensure rapid convergence of the new client's model, to avoid the overfitting caused by data scarcity in local training, and to preserve the generalization ability the global model gains from broad data coverage.
Embodiment 3
This embodiment provides an electronic device comprising one or more processors and a memory, the memory storing one or more programs comprising computer program instructions for performing the personalized federated learning generalization method based on an attention mechanism as described in embodiment 1 or embodiment 2.
The method or apparatus set forth in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Referring to fig. 5, the schematic structural diagram of the electronic device includes a processor, an internal bus, a network interface, memory, and nonvolatile storage, and may also include hardware required by other services. The nonvolatile storage holds instructions for executing the personalized federated learning generalization method of embodiment 1 or embodiment 2; the processor reads the corresponding computer program from the nonvolatile storage into memory and then runs it to implement the method described above. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded; that is, the execution subject of the processing flows is not limited to logic units and may also be hardware or logic devices.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media) such as modulated data signals and carrier waves.
Embodiment 4
This embodiment provides a computer-readable storage medium comprising one or more programs for execution by one or more processors of an electronic device, the one or more programs comprising computer program instructions for performing the personalized federated learning generalization method based on an attention mechanism as described in embodiment 1 or embodiment 2.
When executing the personalized federated learning generalization method of embodiment 1, the computer program instructions perform:
S1: randomly initialize the shared parameters θ_g of the server's global model and the personalized parameters {v_1, v_2, …, v_N} of the clients;
S2: the server sends the initialized parameters θ_g to each client;
S3: receive the local parameters {θ_1, θ_2, …, θ_N} and {v_1, v_2, …, v_N} updated by the clients;
S4: the server receives the parameters uploaded by each client and performs weighted aggregation of {θ_1, θ_2, …, θ_N} according to each client's amount of training data, obtaining a new θ_g; jump back to step S2 until the number of cycles reaches the preset communication round count K;
S5: when a new client N+1 joins the training, send the shared parameters θ_g and personalized parameters {v_1, v_2, …, v_N} stored in the server to the new client;
S6: an attention-based hypernetwork is constructed at the new client to generate the local model parameters, and the hypernetwork is trained with local data to obtain the parameters of the local model; the server receives the local model parameters and performs weighted aggregation of the clients' model parameters according to each client's amount of training data.
When executing the personalized federated learning generalization method of embodiment 2, the computer program instructions perform:
S1: receive the personalized parameters {v_1, v_2, …, v_N} of the existing client models and the server's shared parameters θ_g obtained by weighted aggregation;
S2: construct an attention-based hypernetwork whose input is the personalized parameters {v_1, v_2, …, v_N} of the client models and whose output is the local model's personalized-layer parameters v_{N+1};
S3: train on local data to update the hypernetwork's parameters, obtaining the local model's personalized-layer parameters; the other layers use the globally averaged shared parameters θ_g;
S4: upload the new parameters to the server.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Embodiment 5
This embodiment provides a personalized federated learning generalization system based on an attention mechanism, comprising N clients and 1 server.
The server is configured to execute the following process:
S1: randomly initialize the shared parameters θ_g of the server's global model and the personalized parameters {v_1, v_2, …, v_N} of the clients;
S2: the server sends the initialized parameters θ_g to each client;
S3: receive the local parameters {θ_1, θ_2, …, θ_N} and {v_1, v_2, …, v_N} updated by the clients;
S4: the server receives the parameters uploaded by each client and performs weighted aggregation of {θ_1, θ_2, …, θ_N} according to each client's amount of training data, obtaining a new θ_g; jump back to step S2 until the number of cycles reaches the preset communication round count K.
When a new client N+1 joins the system, the server is further configured to perform:
S5: when the new client N+1 joins the training, send the shared parameters θ_g and personalized parameters {v_1, v_2, …, v_N} stored in the server to the new client;
S6: an attention-based hypernetwork is constructed at the new client to generate the local model parameters, and the hypernetwork is trained with local data to obtain the parameters of the local model; the server receives the local model parameters and performs weighted aggregation of the clients' model parameters according to each client's amount of training data.
The client is configured to execute the following process:
S1: receive the personalized parameters {v_1, v_2, …, v_N} of the existing client models and the server's shared parameters θ_g obtained by weighted aggregation;
S2: construct an attention-based hypernetwork whose input is the personalized parameters {v_1, v_2, …, v_N} of the client models and whose output is the local model's personalized-layer parameters v_{N+1};
S3: train on local data to update the hypernetwork's parameters, obtaining the local model's personalized-layer parameters; the other layers use the globally averaged shared parameters θ_g;
S4: upload the new parameters to the server.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is defined by the appended claims.

Claims (9)

1. A personalized federated learning generalization method based on an attention mechanism, characterized in that it is applied to a server and comprises the following steps:
initializing the shared parameters of the global model, sending the shared parameters to at least one pre-connected client, receiving and storing the shared parameters and personalized parameters of each client after local training, and updating the server's shared parameters based on each client's shared parameters, these steps being executed repeatedly until a termination condition is reached;
sending the personalized parameters of each existing client and the server's shared parameters to an untrained new client, generating personalized parameters at the new client with an attention-based hypernetwork deployed at the new client, and training the hypernetwork based on the new client's local data to complete the local update of the new client's personalized parameters,
wherein the input of the hypernetwork is the personalized parameters of each existing client and the output is the personalized parameters of the new client.
2. The personalized federated learning generalization method based on an attention mechanism according to claim 1, wherein the termination condition is that the number of communication rounds reaches a preset value.
3. The personalized federated learning generalization method based on an attention mechanism according to claim 1, wherein the attention-based hypernetwork comprises:
a fully connected layer for generating hidden vectors;
a plurality of normalization layers, with self-attention layers arranged between them, for generating the personalized parameters of the new client from the hidden vectors.
4. The personalized federated learning generalization method based on an attention mechanism according to claim 1, wherein the new client adopts the server's shared parameters as its shared parameters.
5. The personalized federated learning generalization method based on an attention mechanism according to claim 1, further comprising the following step:
receiving the shared parameters and personalized parameters of a plurality of clients, including the new client whose parameters have been initialized, and updating the server's shared parameters based on a weighted combination of the clients' shared parameters.
6. The personalized federated learning generalization method based on an attention mechanism according to claim 1, wherein the server's shared parameters are updated by weighted aggregation of the clients' shared parameters.
7. A personalized federated learning generalization method based on an attention mechanism, characterized in that it is applied to an untrained new client and comprises the following steps:
receiving the personalized parameters of a plurality of locally trained clients and the server's shared parameters obtained by weighted aggregation;
training on local data to update the parameters of the attention-based hypernetwork, generating the new client's personalized parameters with the trained hypernetwork from the personalized parameters of the locally trained clients, and adopting the server's aggregated shared parameters as the new client's shared parameters;
uploading the updated personalized parameters and shared parameters to the server;
wherein the attention-based hypernetwork is deployed at the new client, its input is the personalized parameters of each existing client, and its output is the personalized parameters of the new client.
8. An application of the personalized federated learning generalization method based on an attention mechanism according to any one of claims 1 to 7, wherein, for an Internet of Vehicles comprising a server, the personalized federated learning generalization method is applied to the server and at least one vehicle-mounted terminal; the server is deployed with a global model, the vehicle-mounted terminal is deployed with a local model comprising shared parameters and personalized parameters, and the vehicle-mounted terminal further comprises a hypernetwork for generating the personalized parameters when joining the vehicle network.
9. An electronic device, comprising: one or more processors and a memory, the memory storing one or more programs, the one or more programs comprising instructions for performing the personalized federated learning generalization method based on an attention mechanism according to any one of claims 1-7.
CN202311277193.0A 2023-10-07 2023-10-07 Personalized federated learning generalization method, device and application based on attention mechanism Active CN117010484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311277193.0A CN117010484B (en) Personalized federated learning generalization method, device and application based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311277193.0A CN117010484B (en) Personalized federated learning generalization method, device and application based on attention mechanism

Publications (2)

Publication Number Publication Date
CN117010484A CN117010484A (en) 2023-11-07
CN117010484B 2024-01-26

Family

ID=88562183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311277193.0A Active CN117010484B (en) 2023-10-07 2023-10-07 Personalized federated learning generalization method, device and application based on attention mechanism

Country Status (1)

Country Link
CN (1) CN117010484B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117892805B (en) * 2024-03-18 2024-05-28 清华大学 Personalized federal learning method based on supernetwork and hierarchy collaborative graph aggregation


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021115480A1 (en) * 2020-06-30 2021-06-17 平安科技(深圳)有限公司 Federated learning method, device, equipment, and storage medium
CN112329940A (en) * 2020-11-02 2021-02-05 Personalized model training method and system combining federated learning and user portrait
WO2023284387A1 (en) * 2021-07-15 2023-01-19 Model training method, apparatus, and system based on federated learning, and device and medium
CN113297396A (en) * 2021-07-21 2021-08-24 Method, device and equipment for updating model parameters based on federated learning
CN115169575A (en) * 2022-06-23 2022-10-11 Personalized federated learning method, electronic device and computer readable storage medium
CN115086399A (en) * 2022-07-28 2022-09-20 Federated learning method and device based on hypernetwork and computer equipment
CN115840900A (en) * 2022-09-16 2023-03-24 Personalized federated learning method and system based on adaptive hierarchical clustering
CN115600686A (en) * 2022-10-18 2023-01-13 Personalized Transformer-based federated learning model training method and federated learning system
CN116227623A (en) * 2023-01-29 2023-06-06 Federated learning method, federated learning device, federated learning computer device, and federated learning storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Personalized Federated Learning Method Based on Attention-Enhanced Meta-Learning Network; Gao Yujia et al.; Journal of Computer Research and Development; full text *

Also Published As

Publication number Publication date
CN117010484A (en) 2023-11-07

Similar Documents

Publication Publication Date Title
CN117010484B (en) Personalized federal learning generalization method, device and application based on attention mechanism
CN110942154B (en) Data processing method, device, equipment and storage medium based on federal learning
CN113297396B (en) Method, device and equipment for updating model parameters based on federal learning
CN112765677B (en) Federal learning method, device and system based on blockchain
CN110210514B (en) Generative confrontation network training method, image completion method, device and storage medium
CN110874637A (en) Multi-target fusion learning method, device and system based on privacy data protection
CN110874650B (en) Alliance learning method, device and system fusing public domain data and private data
CN114945044B (en) Method, device and equipment for constructing digital twin platform based on federal learning
EP3855388A1 (en) Image processing device and operation method thereof
WO2018050045A1 (en) Animation clip splicing method, and information sending method and device
CN114492746A (en) Federal learning acceleration method based on model segmentation
CN113673446A (en) Image recognition method and device, electronic equipment and computer readable medium
CN110489955B (en) Image processing, device, computing device and medium applied to electronic equipment
CN116486493A (en) Living body detection method, device and equipment
CN113808157B (en) Image processing method and device and computer equipment
CN116233844A (en) Physical layer equipment identity authentication method and system based on channel prediction
CN114638998A (en) Model updating method, device, system and equipment
Izumi et al. Distributed Hybrid Controllers for Multi‐Agent Mass Games By A Variable Number of Player Agents
CN116911403B (en) Federal learning server and client integrated training method and related equipment
CN112817898A (en) Data transmission method, processor, chip and electronic equipment
CN112990299A (en) Depth map acquisition method based on multi-scale features, electronic device and storage medium
CN115760563A (en) Image super-resolution model training method and device and computer-readable storage medium
CN116610868B (en) Sample labeling method, end-edge cloud cooperative training method and device
CN116611536B (en) Model training method and device, electronic equipment and storage medium
CN110390354B (en) Prediction method and device for defense capability of deep network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant