CN115860135A - Method, apparatus, and medium for solving heterogeneous federated learning using a super network - Google Patents

Method, apparatus, and medium for solving heterogeneous federated learning using a super network

Info

Publication number
CN115860135A
Authority
CN
China
Prior art keywords
model
network
client
heterogeneous
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211432123.3A
Other languages
Chinese (zh)
Other versions
CN115860135B (en)
Inventor
任皓
李达
刘敏超
刘通泽
段振飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese PLA General Hospital
Original Assignee
Chinese PLA General Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese PLA General Hospital filed Critical Chinese PLA General Hospital
Priority to CN202211432123.3A priority Critical patent/CN115860135B/en
Publication of CN115860135A publication Critical patent/CN115860135A/en
Application granted granted Critical
Publication of CN115860135B publication Critical patent/CN115860135B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method, device, and medium for solving heterogeneous federated learning using a hypernetwork. The hypernetwork generates model weights for the heterogeneous models on edge devices, helping convolutional neural network models with different structures on devices such as cameras and smartphones complete their federated learning tasks accurately and quickly, while reducing the memory space occupied during federated learning training. The method reasonably partitions the CIFAR-10 dataset; generates heterogeneous models; performs heterogeneous federated learning; and repeats steps S2 and S3 until the client heterogeneous models converge, at which point the hypernetwork generates model weights that let each client accurately complete the target recognition task, and the hypernetwork H can be deployed directly on new edge devices to generate weights for new heterogeneous models and complete new image processing tasks. The method effectively completes image processing tasks, reduces memory usage, and greatly reduces communication cost.

Description

Method, apparatus, and medium for solving heterogeneous federated learning using a super network
Technical Field
The invention belongs to the technical field of federated learning, and particularly relates to a method, an electronic device, and a computer storage medium for solving heterogeneous federated learning using a hypernetwork.
Background
Mobile devices and Internet-of-Things devices are becoming the primary computing resources for billions of users worldwide. These devices generate large amounts of data that can be used to improve many existing applications. From privacy and economic perspectives, storing data and training models locally is becoming increasingly attractive as the computing power of these devices continues to grow. Federated Learning (FL) is a distributed machine learning framework that enables many clients to jointly produce a global inference model without sharing local data, by aggregating locally trained model parameters. A widely accepted assumption is that the local models must share the same architecture as the global model in order for federated learning to produce a single global inference model. Under this assumption, the complexity of the global model must be limited to what the least computationally capable client can train on its data. In practice, each client has different computing resources, computing power, and communication bandwidth, and heterogeneous deep learning models need to be deployed on different clients to meet their respective performance requirements; solving the heterogeneous-client problem is therefore very important.
Heterogeneous federated learning (also called personalized federated learning, PFL) aims to provide a heterogeneous local model for each client in the federation, which can effectively solve the heterogeneous-client problem. Unlike strategies that train a single global model, this type of approach trains a separate heterogeneous model on each client; the goal is to achieve efficient aggregation of heterogeneous models by modifying the FL model aggregation process. The current state-of-the-art PFL method is the federated hypernetwork, pFedHN, which shares parameters across clients by training a central hypernetwork model, generating a unique heterogeneous model for each client while retaining the ability to produce unique and diverse heterogeneous models. However, pFedHN has an obvious drawback: during the client training stage, the network model parameters before and after training must be stored and the gradient of the network model computed in order to update the hypernetwork. Once the client network model is complex, storing it occupies a large amount of memory, and training may even fail due to insufficient memory. There is therefore a need to reduce the memory occupied on each edge device during training while still using a hypernetwork to provide each edge device with a heterogeneous model that fits its current computing resources.
Disclosure of Invention
In view of the defects of existing solutions, an object of the embodiments of the present invention is to provide a method for solving heterogeneous federated learning using a hypernetwork, so as to generate model weights for the heterogeneous models on edge devices through the hypernetwork, help models on devices such as cameras and smartphones complete image processing tasks more accurately and quickly, and reduce the memory space occupied during federated learning training.
Another object of the embodiments of the present invention is to provide an electronic device.
It is a further object of embodiments of the present invention to provide a computer storage medium.
In order to solve the above technical problems, the technical solution adopted by the invention is a method for solving heterogeneous federated learning using a hypernetwork, carried out according to the following steps:
S1, generating a heterogeneous model and initializing model parameters;
S2, generating heterogeneous model parameter weights;
S3, performing heterogeneous federated learning;
S4, repeating steps S2 and S3; when the client heterogeneous model converges, the hypernetwork generates model weights for the client so that the client model can accurately complete the target recognition task, and the hypernetwork H can then be deployed directly on new edge devices to generate weights for new heterogeneous models and complete new image processing tasks.
Another technical solution adopted by the present invention is an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.
Yet another technical solution adopted by the present invention is a computer storage medium, in which at least one program instruction is stored, and the at least one program instruction is loaded and executed by a processor to implement the steps of the method.
The invention has the beneficial effects that:
(1) The method realizes end-to-end gradient descent training of the hypernetwork and the client heterogeneous models, improves the accuracy of the client heterogeneous models, helps the heterogeneous models effectively complete image processing tasks, and, compared with pFedHN, reduces the memory space occupied during training.
(2) The method aggregates and updates the hypernetwork parameters only at the server side, so each client only needs to send its hypernetwork parameters to the server; since the hypernetwork is only a simple linear network, the communication cost is greatly reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a line graph comparing the accuracy of the present invention with that of the pFedHN method.
Fig. 2 is a flow chart of heterogeneous federated learning.
FIG. 3 is a schematic diagram of a training process framework according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention targets intelligent edge devices with limited computing resources and aims to provide each edge device with a heterogeneous model that fits its current computing resources. The idea of the invention is as follows:
Before training begins, the server configures the heterogeneous model of each client in advance at the cloud and sends a unified hypernetwork model to each client; the hypernetwork is used to generate an independent set of model weights for each client's heterogeneous model. Each local client holds a unique set of embedding vectors that are passed as input to the hypernetwork to generate the heterogeneous model weights. The client trains its own heterogeneous model on the data it currently holds, and during training the hypernetwork and the heterogeneous model perform gradient descent end to end, which makes the method more effective. After local training finishes, the client sends the updated hypernetwork parameters to the server, which updates the server-side hypernetwork parameters.
The method specifically comprises the following steps:
s1, generating a heterogeneous model and initializing model parameters:
in this embodiment, the server side initially selects and generates a heterogeneous model according to the computing capability of the edge device and initializes heterogeneous model parameters, and then distributes the corresponding heterogeneous model to each edge device, where each edge device has its own image data set, and these image data sets are only used for training of the heterogeneous model and the super-network on the current device and are not shared with other edge devices and servers.
S2, generating heterogeneous model parameters:
the method specifically comprises the following steps:
s2.1, initializing a federal learning training environment: setting an integral training round E, local data T and the number M of integral clients participating in federal learning, wherein the number of the clients participating in the federal learning selected in each round is M, and initializing a super network model H;
s2.2, the server is used for each client M at the cloud end i Setting a corresponding client heterogeneous model C i The main process is as shown in fig. 2, the server side sends a unified hyper-network model H to the client side i After receiving the extranet, the client starts the federal learningTraining, namely, after one round of training is finished, the supernet model H is put in i Returning to the server side;
s2.3, generating heterogeneous model parameters;
the method specifically comprises the following substeps:
s2.3.1, at the current client, the super network model H i Is a linear network with L layers, and L is set according to the complexity and the structure of a client model, such as 2 layers. Client heterogeneous model C i The convolution network accords with the ith client computing resource, and the convolution kernel of the convolution network contains most of model parameters, so the ultra-network model H i Only the parameter weights of the convolution kernels of the convolution network need to be generated. Each convolution kernel in the client network contains K in ×K out Filters, each filter having a dimension f size ×f size Size. It is assumed that these convolution kernel parameter weights are all stored in a matrix
Figure BDA0003945552120000041
In the network, each layer of the client network is l =1 i The number of layers of (a). Based on each layer of the client network>
Figure BDA0003945552120000042
Extra-net model H i Will receive a layer insert->
Figure BDA0003945552120000043
As input and generates +>
Figure BDA0003945552120000044
Written as follows:
Figure BDA0003945552120000045
wherein R is a real number, and R is a real number,
Figure BDA0003945552120000046
is to save the parameter weight of the current layer convolution kernel of the client heterogeneous model>
Figure BDA0003945552120000047
F of (a) size K in ×f size K out A dimension real number parameter, and>
Figure BDA0003945552120000048
is described in description C i Embedded vector of each layer of network information, supernet H i By learning>
Figure BDA0003945552120000049
And generating the weight of the current model l layer of the client.
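To make the construction concrete, the following is a minimal PyTorch sketch of this per-layer weight generator, assuming an L = 2 linear hypernetwork; the class name LayerHyperNet, the embedding size, and the hidden width are illustrative assumptions, not the patent's reference implementation:

```python
import torch
import torch.nn as nn

class LayerHyperNet(nn.Module):
    """Sketch of H_i: a small linear network mapping a learnable layer
    embedding z^l to the convolution kernel matrix K^l of one layer of C_i."""
    def __init__(self, embed_dim: int, k_in: int, k_out: int, f_size: int, hidden: int = 64):
        super().__init__()
        self.kernel_shape = (k_out, k_in, f_size, f_size)
        out_dim = (f_size * k_in) * (f_size * k_out)  # |K^l|
        self.net = nn.Sequential(                     # L = 2 linear layers
            nn.Linear(embed_dim, hidden),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # K^l = H_i(z^l), reshaped into a conv kernel (K_out, K_in, f, f)
        return self.net(z).view(self.kernel_shape)

# one learnable embedding z^l per layer of the client network
z_l = nn.Parameter(torch.randn(16))
h = LayerHyperNet(embed_dim=16, k_in=3, k_out=32, f_size=3)
K_l = h(z_l)   # generated weights for one conv layer of C_i
```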
S2.3.2, the previous step described in detail how the hypernetwork model H_i generates the weight K^l of layer l of the client heterogeneous model; the process of generating the weights of the entire heterogeneous model is described next.

The hypernetwork H_i takes the embedding vector z^l and, after a linear operation, outputs a^1, which is passed as the input of the next layer of H_i. The last layer of H_i is a simple linear operation: it takes a^{L-1}, the output of layer L-1 of H_i, as input and linearly projects it onto K^l, the layer-l weight of the i-th client heterogeneous model. Therefore, K^l can be written as:

$$a^{1} = W^{1} z^{l} + b^{1}$$
$$\cdots$$
$$a^{L-1} = W^{L-1} a^{L-2} + b^{L-1}$$
$$K^{l} = W^{L} a^{L-1} + b^{L}$$

where (W, b) are the weights of H_i and θ_i = {K^l}_{l=1}^{D} is the weight of the entire i-th client heterogeneous model.

In these formulas, the learnable parameters are the (W, b) of H_i and all the embeddings z^l, so θ_i is a function of the parameters of H_i; therefore, during training, the hypernetwork performs gradient descent end to end together with the client heterogeneous model.
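Because θ_i is produced by H_i, gradients of the client loss flow back through the generated weights into (W, b) and the embeddings. Continuing the illustrative LayerHyperNet sketch above, this end-to-end path can be exercised with a functional convolution; the stand-in data and loss are assumptions:

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 3, 32, 32)          # a batch of client images (stand-in)
K_l = h(z_l)                           # kernel generated by the hypernetwork
feat = F.conv2d(x, K_l, padding=1)     # the client layer uses generated weights
loss = feat.pow(2).mean()              # stand-in for the real task loss
loss.backward()                        # gradients reach H_i's (W, b) and z^l
print(z_l.grad is not None)            # True: the embedding is trained too
```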
S3, performing heterogeneous federated learning
The method specifically comprises the following steps:
s3.1, training the client by taking the current heterogeneous model as a target network, and generating weights for the heterogeneous model on one side of a super network, wherein the weights are represented as shown in a figure 3; while carrying out gradient descent training end to end together with the heterogeneous model and updating the parameter H of the ultra-net model i The objective function is formulated as follows:
Figure BDA0003945552120000051
the extranet parameter updating formula is as follows:
Figure BDA0003945552120000052
wherein, L (-) is a loss function, X and Y respectively correspond to the image and the label of the data set, N is the number of the data sets held by the client, and ζ is the learning rate of the extranet model.
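Putting S3.1 together, here is a sketch of one client's local update loop under the stated objective, continuing the illustrative sketches above; the optimizer choice, the stand-in forward pass, and the synthetic data are assumptions:

```python
import torch
import torch.nn.functional as F

zeta = 1e-2                                      # hypernetwork learning rate
params = list(h.parameters()) + [z_l]            # (W, b) of H_i plus embedding z^l
opt = torch.optim.SGD(params, lr=zeta)

def client_forward(X, kernel):
    # stand-in for C_i: one generated conv layer + global pooling as logits
    feat = F.conv2d(X, kernel, padding=1)
    return feat.mean(dim=(2, 3))[:, :10]         # (batch, 10) pseudo-logits

for _ in range(5):                               # a few local steps
    X = torch.randn(8, 3, 32, 32)                # synthetic stand-ins for (X, Y)
    Y = torch.randint(0, 10, (8,))
    loss = F.cross_entropy(client_forward(X, h(z_l)), Y)
    opt.zero_grad()
    loss.backward()                              # end-to-end gradient descent
    opt.step()                                   # H_i <- H_i - zeta * grad
```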
S3.2, after client training finishes, each of the m selected clients returns its hypernetwork model parameters H_i, i = 1, …, m; the server performs hypernetwork aggregation, updates the hypernetwork model parameters, and issues the updated hypernetwork model parameters to the corresponding clients:

$$H = \frac{1}{m}\sum_{i=1}^{m} H_i$$

where H is the server-side hypernetwork model.
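On the server side, the aggregation above is a plain average of the returned hypernetwork parameters. A minimal sketch follows, with the helper name being an assumption:

```python
import torch

def aggregate_hypernets(client_states):
    """FedAvg-style mean of the hypernetwork state dicts returned by the
    participating clients: H = (1/m) * sum_i H_i."""
    avg = {}
    for key in client_states[0]:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(0)
    return avg

# e.g. global_state = aggregate_hypernets([h_1.state_dict(), h_2.state_dict()])
# then each selected client loads global_state before the next round
```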
and S4, repeating the steps S2 and S3, wherein the weight is generated for the client heterogeneous model when the client heterogeneous model is trained, so that when the client heterogeneous model is converged, the fact that the client heterogeneous model generates the model weight for the client by the super network is proved to enable the client model to accurately complete the target recognition task, and the super network H can be directly deployed in new edge equipment to generate the weight for the new heterogeneous model so as to complete a new image processing task.
This embodiment provides a program stored on an edge device which, when executed, implements the above method for solving heterogeneous federated learning using a hypernetwork. The hypernetwork can directly generate model weights for the heterogeneous model on the current edge device and help models on devices such as cameras and smartphones complete image processing tasks.
The method of the present embodiment, if implemented in the form of software functional modules and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that substantially contributes to the prior art may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method for solving heterogeneous federated learning using a hypernetwork according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
In this embodiment, the CIFAR-10 dataset is used for training and testing, further verifying that, compared with pFedHN, the embodiment of the present invention not only reduces the memory space occupied during training but also improves the accuracy of the client heterogeneous models.
This example compares a ResNet network against pFedHN on the CIFAR-10 dataset in terms of accuracy. As shown in FIG. 1, the accuracy of the present invention is consistently higher than that of pFedHN under the same settings.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on differences from other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (7)

1. A method for solving heterogeneous federated learning using a hypernetwork, characterized by comprising the following steps:
S1, generating a heterogeneous model and initializing model parameters;
S2, generating heterogeneous model parameters;
S3, performing heterogeneous federated learning;
S4, repeating steps S2 and S3; when the client heterogeneous model converges, the hypernetwork generates model weights for the client so that the client model can accurately complete the target recognition task, and the hypernetwork H can then be deployed directly on new edge devices to generate weights for new heterogeneous models and complete new image processing tasks.
2. The method for solving heterogeneous federated learning using a hypernetwork as claimed in claim 1, wherein in S1 the server first generates heterogeneous models according to the computing power of the edge devices and initializes the heterogeneous model parameters, then distributes the corresponding heterogeneous model to each edge device; each edge device has its own image dataset, and these image datasets are used only for training the heterogeneous model and the hypernetwork on the current device and are not shared with other edge devices or the server.
3. The method for solving heterogeneous federated learning using a hypernetwork as claimed in claim 1, wherein said S2 comprises:
S2.1, initializing the federated learning training environment: setting the overall number of training rounds E, the local data T, and the total number M of clients participating in federated learning, where m clients are selected to participate in each round, and initializing the hypernetwork model H;
S2.2, at the cloud, the server sets a corresponding client heterogeneous model C_i for each client M_i; the server sends a unified hypernetwork model H to client i, the client starts federated learning training after receiving the hypernetwork, and after one round of training finishes it returns the hypernetwork model H_i to the server;
S2.3, generating the heterogeneous model parameters.
4. The method for solving heterogeneous federated learning using a hypernetwork as claimed in claim 3, wherein said S2.3 comprises:
S2.3.1, at the current client, the number of layers of the linear hypernetwork is set according to the complexity and structure of the client model, and the hypernetwork model H_i generates the parameter weights of the convolution kernels of the convolutional network; each convolution kernel in the client network contains K_in × K_out filters, each of size f_size × f_size; assuming that these convolution kernel parameter weights are all stored in a matrix

$$K^{l} \in \mathbb{R}^{f_{size}K_{in} \times f_{size}K_{out}}, \qquad l = 1, \ldots, D,$$

where D is the number of layers of the client network C_i, then for each layer l of the client network the hypernetwork model H_i receives a layer embedding z^l as input and generates K^l, written as:

$$K^{l} = H_i\left(z^{l}\right)$$

wherein K^l is the real-valued f_size K_in × f_size K_out matrix holding the parameter weights of the current layer's convolution kernels of the client heterogeneous model, z^l is an embedding vector describing the layer-l network information of C_i, and the hypernetwork H_i generates the weight of layer l of the current client model by learning z^l;
S2.3.2, the hypernetwork H_i takes the embedding vector z^l and, after a linear operation, outputs a^1, which is passed as the input of the next layer of H_i; the last layer of H_i is a linear operation that takes a^{L-1}, the output of layer L-1 of H_i, as input and linearly projects it onto K^l, the layer-l weight of the i-th client heterogeneous model; therefore K^l can be written as:

$$a^{1} = W^{1} z^{l} + b^{1}$$
$$\cdots$$
$$a^{L-1} = W^{L-1} a^{L-2} + b^{L-1}$$
$$K^{l} = W^{L} a^{L-1} + b^{L}$$

wherein (W, b) are the weights of H_i and θ_i = {K^l}_{l=1}^{D} is the weight of the entire i-th client heterogeneous model.
5. The method for solving heterogeneous federated learning using a hypernetwork as claimed in claim 1, wherein said S3 comprises:
S3.1, the client trains with the current heterogeneous model as the target network while the hypernetwork generates weights for the heterogeneous model; the hypernetwork performs gradient descent training end to end together with the heterogeneous model and updates the hypernetwork model parameters H_i, with the objective function:

$$\arg\min_{H_i,\, z}\ \frac{1}{N}\sum_{n=1}^{N} L\left(X_n, Y_n;\ \theta_i\right), \qquad \theta_i = H_i\left(z^{1}, \ldots, z^{D}\right)$$

and the hypernetwork parameter update formula:

$$H_i \leftarrow H_i - \zeta\, \nabla_{H_i} \frac{1}{N}\sum_{n=1}^{N} L\left(X_n, Y_n;\ \theta_i\right)$$

wherein L(·) is the loss function, X and Y correspond respectively to the images and labels of the dataset, N is the number of data samples held by the client, and ζ is the learning rate of the hypernetwork model;
S3.2, after client training finishes, each of the m selected clients returns its hypernetwork model parameters H_i, i = 1, …, m; the server performs hypernetwork aggregation, updates the hypernetwork model parameters, and issues the updated hypernetwork model parameters to the corresponding clients:

$$H = \frac{1}{m}\sum_{i=1}^{m} H_i$$

wherein H is the server-side hypernetwork model.
6. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method according to any of claims 1 to 5 when executing the computer program.
7. A computer storage medium, characterized in that at least one program instruction is stored in the storage medium, which at least one program instruction is loaded and executed by a processor to implement the steps of the method according to any of claims 1 to 5.
CN202211432123.3A 2022-11-16 2022-11-16 Heterogeneous federation learning method, equipment and medium based on super network Active CN115860135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211432123.3A CN115860135B (en) 2022-11-16 2022-11-16 Heterogeneous federation learning method, equipment and medium based on super network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211432123.3A CN115860135B (en) 2022-11-16 2022-11-16 Heterogeneous federation learning method, equipment and medium based on super network

Publications (2)

Publication Number Publication Date
CN115860135A (en) 2023-03-28
CN115860135B CN115860135B (en) 2023-08-01

Family

ID=85663617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211432123.3A Active CN115860135B (en) 2022-11-16 2022-11-16 Heterogeneous federation learning method, equipment and medium based on super network

Country Status (1)

Country Link
CN (1) CN115860135B (en)


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554201A (en) * 2020-04-23 2021-10-26 山东大学 Grading prediction system and method based on hyper-network and federal learning
US20210374502A1 (en) * 2020-06-01 2021-12-02 Nvidia Corporation Technique to perform neural network architecture search with federated learning
CN114548353A (en) * 2020-11-25 2022-05-27 共达地创新技术(深圳)有限公司 Model training method, electronic device and storage medium
US20210110140A1 (en) * 2020-12-22 2021-04-15 Juan Munoz Environment specific model delivery
CN113570027A (en) * 2021-06-24 2021-10-29 华为技术有限公司 Method, apparatus, system, device, medium and program product for generating a neural network model
US20220036194A1 (en) * 2021-10-18 2022-02-03 Intel Corporation Deep neural network optimization system for machine learning model scaling
CN114358316A (en) * 2022-01-14 2022-04-15 中国人民解放军总医院 Federal learning system and large-scale image training method and device thereof
CN114819152A (en) * 2022-01-19 2022-07-29 浙江大学 Graph embedding expert entity alignment method based on reinforcement learning enhancement
CN114404977A (en) * 2022-01-25 2022-04-29 腾讯科技(深圳)有限公司 Training method of behavior model and training method of structure expansion model
CN114882335A (en) * 2022-05-05 2022-08-09 河北工业大学 Intelligent image sensing device for sensing and computing cloud integration based on federal learning framework
CN115065728A (en) * 2022-06-13 2022-09-16 福州大学 Multi-strategy reinforcement learning-based multi-target content storage method
CN115086399A (en) * 2022-07-28 2022-09-20 深圳前海环融联易信息科技服务有限公司 Federal learning method and device based on hyper network and computer equipment
CN115238879A (en) * 2022-08-16 2022-10-25 南京大学 Architecture search method of deep neural network and hardware accelerator
CN115271101A (en) * 2022-08-26 2022-11-01 天津大学 Personalized federal learning method based on graph convolution hyper-network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AVIV SHAMSIAN et al.: "Personalized Federated Learning using Hypernetworks", arXiv, pages 1-14 *
DIXI YAO et al.: "Federated Model Search via Reinforcement Learning", 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), pages 830-840 *
HANGYU ZHU et al.: "From federated learning to federated neural architecture search: a survey", Complex & Intelligent Systems, vol. 7, pages 639-657, XP093015035, DOI: 10.1007/s40747-020-00247-z *
MIKHAIL KHODAK et al.: "Weight-Sharing for Hyperparameter Optimization in Federated Learning", International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2020, pages 1-9 *

Also Published As

Publication number Publication date
CN115860135B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
Chen et al. DNNOff: offloading DNN-based intelligent IoT applications in mobile edge computing
US12008445B2 (en) Black-box optimization using neural networks
CN107609652B (en) Execute the distributed system and its method of machine learning
CN111406264B (en) Neural architecture search
CN110366734A (en) Optimization neural network framework
CN107292352B (en) Image classification method and device based on convolutional neural network
WO2019018375A1 (en) Neural architecture search for convolutional neural networks
CN107958285A (en) The mapping method and device of the neutral net of embedded system
CN111178517B (en) Model deployment method, system, chip, electronic equipment and medium
CN109360097A (en) Prediction of Stock Index method, apparatus, equipment and storage medium based on deep learning
CN109983480A (en) Use cluster loss training neural network
US11423307B2 (en) Taxonomy construction via graph-based cross-domain knowledge transfer
CN110428046A (en) Acquisition methods and device, the storage medium of neural network structure
CN110689136B (en) Deep learning model obtaining method, device, equipment and storage medium
CN111191789B (en) Model optimization deployment system, chip, electronic equipment and medium
CN116450312A (en) Scheduling strategy determination method and system for pipeline parallel training
CN109903100A (en) A kind of customer churn prediction technique, device and readable storage medium storing program for executing
CN112132279A (en) Convolutional neural network model compression method, device, equipment and storage medium
CN111292262A (en) Image processing method, image processing apparatus, electronic device, and storage medium
Chen et al. Computing offloading decision based on DDPG algorithm in mobile edge computing
CN106169961A (en) The network parameter processing method and processing device of neutral net based on artificial intelligence
CN114595815A (en) Transmission-friendly cloud-end cooperation training neural network model method
CN109670579A (en) Model generating method and device
Tong et al. Study on mindspore deep learning framework
CN115860135A (en) Method, apparatus, and medium for solving heterogeneous federated learning using a super network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant