CN114386570A - Heterogeneous federated learning training method based on multi-branch neural network model - Google Patents

Heterogeneous federated learning training method based on multi-branch neural network model

Info

Publication number
CN114386570A
CN114386570A
Authority
CN
China
Prior art keywords
model
branch
cloud
training
global
Prior art date
Legal status
Pending
Application number
CN202111575862.3A
Other languages
Chinese (zh)
Inventor
陈旭
崔嘉洛
周知
Current Assignee
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202111575862.3A
Publication of CN114386570A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a heterogeneous federated learning training method based on a multi-branch neural network model. A multi-branch neural network model is introduced as the shared global model, so that a suitable sub-branch model can be matched to the computing power of each device; this adapts well to scenarios with heterogeneous computing resources and makes full use of the computing resources of different devices, thereby effectively improving the performance and efficiency of the whole heterogeneous federated learning training system. Aiming at the characteristics of the multi-branch model, the invention provides a multi-branch model aggregation method based on shared-layer parameters, which aggregates the different sub-branch models into a global multi-branch model so that model parameters can be shared effectively among the different device models. On the basis of parameter aggregation, distillation learning is introduced to resolve the performance fluctuation after model aggregation, which accelerates the convergence of the global model, reduces the number of required training rounds, and saves communication cost.

Description

Heterogeneous federated learning training method based on multi-branch neural network model
Technical Field
The invention relates to the technical field of federated learning, and in particular to a heterogeneous federated learning training method based on a multi-branch neural network model.
Background
Existing federated learning algorithms such as FedAvg and its variants, while enabling distributed machine learning under data-privacy constraints, require all models deployed on the participating devices to be homogeneous because of the limitations of model gradient aggregation. In practice, however, device heterogeneity is inevitable, so the size and capacity of the training model are dictated by the device with the weakest computing capability in the system, causing resource waste and a performance bottleneck. Many researchers have tried to address federated learning training of heterogeneous models by introducing knowledge distillation. For example, FedMD transfers knowledge by using a labeled public data set and averaging the output logits of the device models, so that each device model can be trained individually, but it cannot aggregate a shared model on the central server as classical federated learning does. In the FedDF work, ensemble distillation for model aggregation is proposed by averaging all device models and their output logits, and several heterogeneous shared models can be obtained through cloud aggregation. However, if device heterogeneity grows, the number of device model types inevitably increases, so the cost of server maintenance and heterogeneous model training rises sharply.
Disclosure of Invention
The invention provides a heterogeneous federated learning training method based on a multi-branch neural network model to overcome the defects in the prior art and to effectively improve the performance and efficiency of the whole heterogeneous federated learning training system.
In order to solve the technical problems, the invention adopts the following technical scheme: a heterogeneous federated learning training method based on a multi-branch neural network model comprises the following steps:
S1, initialization training of the cloud multi-branch model: the cloud server holds a global model with a plurality of branches and a public data set D_0; the branch models each retain their own output layer while sharing part of the common hidden layers; the global multi-branch model is first pre-trained on the cloud public data set D_0 for initialization;
S2, matching of the sub-branch neural network models: before federated learning starts, every device requesting to participate reports its available computing and storage information to the cloud server; the cloud server then determines from the collected device information the branch model best suited to each device and distributes the matched single-branch model to the corresponding device;
S3, local training of the device models: after each participant device receives the single-branch model issued by the cloud, it adopts the received model as its local model and then trains the current device model on its local private data set;
S4, aggregation of the cloud multi-branch neural network model: after local training, each participant device uploads its model parameters to the cloud server, and a weighted average is taken over the parameters that the device models share with the corresponding layers of the global branch models, so that the cloud global multi-branch model is aggregated and updated;
S5, distillation training of the cloud multi-branch neural network model: knowledge distillation is performed on the aggregated multi-branch network model using the cloud public data set and the model gradients uploaded by the devices; after distillation training, the model parameters of each branch of the multi-branch model are issued to the corresponding devices for the next round of local training;
S6, application of the cloud multi-branch neural network model: steps S3 to S5 are repeated until a preset number of training rounds is reached, and the final global multi-branch neural network model is obtained.
In one embodiment, the global multi-branch model has overall parameters W_s, where the parameters of each sub-branch model are denoted W_s^k, k = 1, ..., K; on the device side there are N participating devices, and each device i holds a private local data set D_i and a local model W_c^i.
In said step S1, the cross-entropy loss between the output of each branch model and the ground truth is computed, and the weighted average of the per-branch loss values (with branch weights β_k, Σ_k β_k = 1) is taken as the total loss:
L_total = Σ_{k=1}^{K} β_k · L_CE(W_s^k; D_0)    (Equation 1)
Finally, the parameters are updated by back-propagation on the total loss until the model converges (lr denotes the learning rate):
W_s ← W_s - lr · ∇_{W_s} L_total    (Equation 2)
in one embodiment, in step S2, when each device i reports its maximum satisfiable parameter number PiThen, match one to satisfy Pi≥PkThen the selected sub-branch model k is issued to the device i, where P iskIs the parameter number of the sub-branch model k.
In one embodiment, in step S3, a constraint is imposed on the difference between the parameters of the current local model and the original cloud model, so that under the constraint of the cloud model the shared hidden layers of the device models obtain parameter distributions that are as similar as possible during training.
In one embodiment, imposing a constraint on the difference between the parameters of the current local model and the original cloud model specifically comprises: adding an L2 regularization loss to the original cross-entropy loss function to measure the difference in parameter distribution between the local model and the cloud model; when device i receives the sub-branch model k, it initializes its local model before local training as
W_c^i ← W_s^k
Assuming that the sub-branch model k shares H network layers with the cloud main-branch model, the L2 regularization loss of the local training of device i is expressed as (l_h denotes the index of the h-th shared layer):
L_L2^i = Σ_{h=1}^{H} || W_c^i[l_h] - W_s^k[l_h] ||_2^2    (Equation 3)
Combining this with the loss function on the local data set gives the total loss function of device i:
L_i = L_CE(W_c^i; D_i) + η · L_L2^i    (Equation 4)
wherein η is a hyper-parameter that weighs the L2 regularization loss against the overall loss;
finally, the local model parameters of device i are updated according to the following formula (lr denotes the learning rate):
W_c^i ← W_c^i - lr · ∇_{W_c^i} L_i    (Equation 5)
in one embodiment, in step S4, the polymerization method using multi-branch joint averaging specifically includes: when the cloud server receives the model parameter sets uploaded by all the participant devices
Figure BDA0003424771300000035
Then multi-branch polymerization is carried out; firstly, acquiring a dictionary set W of all parameter layers in a global multi-branch modelSIf the number of layers of the model is K, the index of the K-th layer is lkCorrespondingly, the parameter of the k-th layer is denoted as WS[lk](ii) a Then traverse WSAll parameter layers in (1): for each layer k, the layer parameter W is calculatedS[lk]Firstly, initializing to zero value, and counting the index l of the layerkUploading a set of models
Figure BDA0003424771300000036
The total number of occurrences in (1) is recorded as Countk]Simultaneously for each presence of lkIndex layer device model
Figure BDA0003424771300000037
Adding the parameters of its corresponding layer to the global model, i.e.Order to
Figure BDA0003424771300000038
Traversing and accumulating each layer of the global multi-branch model to obtain and output a new global model parameter WS
In one embodiment, the loss function in distillation training comprises two parts: a cross-entropy loss against the real labels, as in Equation 1, and a KL-divergence loss against the soft labels output by the device models, as in Equation 6 below (p_i denotes the soft-label distribution produced by device model i on the public data set and p_S^k the output distribution of the corresponding global branch k):
L_KL = KL( p_i || p_S^k )    (Equation 6)
The total loss of the distillation training is given by Equation 7, where the hyper-parameter α sets the weight ratio between the cross-entropy loss and the KL-divergence loss:
L_distill = α · L_CE + (1 - α) · L_KL    (Equation 7)
The global branch model optimized in distillation training is obtained from Equation 8 (lr denotes the learning rate):
W_S ← W_S - lr · ∇_{W_S} L_distill    (Equation 8)
in one embodiment, in step S6, the output global multi-branch neural network model can be matched with the adaptive branch model according to the resource limitation and the accuracy requirement of different devices, so as to meet the applications of different devices; when the computing resources of the device side are insufficient or insufficient, the device side can only process the computation of the shared parameter part of the local model, then sends the intermediate result to the cloud global network for the computation of the rest part, and finally returns the cloud result to the edge device.
The present invention also provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the heterogeneous federated learning training method described above.
The present invention also provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the heterogeneous federated learning training method described above.
Compared with the prior art, the beneficial effects are:
1. In the invention, a multi-branch neural network model is introduced as the shared global model, and suitable sub-branch models can be matched to the computing power of different devices. Compared with traditional federated learning methods, the method adapts well to scenarios with heterogeneous computing resources and makes full use of the computing resources of different devices, thereby effectively improving the performance and efficiency of the whole heterogeneous federated learning training system;
2. aiming at the characteristics of a multi-branch model, the invention provides a multi-branch model aggregation method based on shared layer parameters, which can aggregate different sub-branch models to form a global multi-branch model, so that model parameters can be effectively shared among different equipment models;
3. on the basis of parameter aggregation, distillation learning is introduced to solve the problem of performance fluctuation after model aggregation, so that the convergence speed of the global model is accelerated, the number of required training rounds is reduced, and communication consumption is saved;
4. In the invention, the multi-branch neural network model obtained by cloud training can flexibly match the application requirements of various types of devices and supports device-edge collaborative inference.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is a schematic workflow diagram of the multi-branch model training system of the present invention.
FIG. 3 is a diagram illustrating the computation of a loss function in local training according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention; obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The drawings are for illustrative purposes only, represent schematic rather than physical forms, and should not be construed as limiting this patent; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged, or reduced and do not represent the size of an actual product; and it will be understood by those skilled in the art that certain well-known structures in the drawings, and their descriptions, may be omitted.
In the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by the terms "upper", "lower", "left", "right", etc. based on the orientation or positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but it is not intended to indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limiting the present patent, and the specific meaning of the terms may be understood by those skilled in the art according to specific circumstances. In addition, if there is a description of "first", "second", etc. in an embodiment of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the meaning of "and/or" appearing throughout is to include three juxtapositions, exemplified by "A and/or B" including either scheme A, or scheme B, or a scheme in which both A and B are satisfied.
Example 1:
As shown in FIG. 1, a heterogeneous federated learning training method based on a multi-branch neural network model includes the following steps:
Step 1, initialization training of the cloud multi-branch model. In the invention, most of the data resides in the private data sets of the participant-side devices, and only a small public data set is stored in the cloud server. A global multi-branch neural network model, shown in FIG. 2, resides in the cloud server, in which the different branch models retain their own output layers while sharing part of the common hidden layers. The model size of each branch therefore differs, which accommodates devices with different computing resources. The global multi-branch model is then pre-trained on the cloud public data set so that its initialization already has a certain level of performance, which improves the convergence speed and training accuracy of the subsequent federated learning.
Specifically, the system of the invention is divided into a cloud side and a device side. The cloud server holds a global multi-branch model with K branches; the overall parameters of the model are denoted W_s, the parameters of each sub-branch model are denoted W_s^k, k = 1, ..., K, and the public data set is denoted D_0. On the device side there are N participating devices, and each device i holds a private local data set D_i and a local model W_c^i.
Since the target tasks of the branches in the multi-branch model are identical, they are jointly optimized on the public data set D_0. First, the cross-entropy loss between the output of each sub-branch model and the ground truth is computed, and the weighted average of the per-branch loss values (with branch weights β_k, Σ_k β_k = 1) is taken as the total loss, as shown in Equation 1:
L_total = Σ_{k=1}^{K} β_k · L_CE(W_s^k; D_0)    (Equation 1)
Finally, the parameters are updated by back-propagation on the total loss until the model converges, as shown in Equation 2 (lr denotes the learning rate):
W_s ← W_s - lr · ∇_{W_s} L_total    (Equation 2)
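To make the multi-branch structure and the joint pre-training objective concrete, the following PyTorch-style sketch shows one possible instantiation. It is only a minimal illustration under assumed details (a fully connected trunk; the names MultiBranchNet and pretrain_step are invented for the example), not the exact network of the patent; each deeper branch reuses all hidden layers of the shallower ones, matching the shared-hidden-layer description above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiBranchNet(nn.Module):
        """Illustrative global multi-branch model: K branches share a prefix of
        hidden layers and each branch attaches its own output layer."""
        def __init__(self, in_dim=784, hidden=128, num_classes=10, num_branches=3):
            super().__init__()
            # Shared hidden layers; branch k uses the first (k + 1) of them.
            self.shared = nn.ModuleList(
                [nn.Linear(in_dim if i == 0 else hidden, hidden) for i in range(num_branches)]
            )
            # One output head per branch.
            self.heads = nn.ModuleList(
                [nn.Linear(hidden, num_classes) for _ in range(num_branches)]
            )

        def forward(self, x):
            outputs, h = [], x
            for layer, head in zip(self.shared, self.heads):
                h = F.relu(layer(h))
                outputs.append(head(h))      # logits of branch k
            return outputs                   # K logit tensors

    def pretrain_step(model, batch, optimizer, branch_weights):
        """One joint pre-training step on the cloud public data set D_0:
        total loss = weighted average of per-branch cross-entropy (Eq. 1),
        followed by one back-propagation update (Eq. 2)."""
        x, y = batch
        optimizer.zero_grad()
        losses = [F.cross_entropy(logits, y) for logits in model(x)]
        total = sum(w * l for w, l in zip(branch_weights, losses))
        total.backward()
        optimizer.step()
        return total.item()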
step 2, matching of the sub-branch neural network model: before the start of federal learning, all devices requesting to participate in federal learning should report the available computing resource information of the devices to the cloud server, and the model parameters are used as the measurement of computing capacity in the system of the invention. Thus, when each device i reports its maximum satisfiable parameter number PiThen, match one to satisfy Pi≥PkMaximum sub-branch model k, P ofkIs the parameter quantity of the sub-branch model k, and then the selected sub-branch model k is issued to the device i.
Step 3, local training of the device models: after each participant device receives the single-branch model issued by the cloud, it adopts the received model as its local model and then trains the current device model on its local private data set. In each round of local training, however, a constraint is imposed on the difference between the parameters of the current local model and the original cloud model, as shown in FIG. 3. Under the constraint of the cloud model, the shared hidden layers of the device models thus obtain parameter distributions that are as similar as possible during training, which effectively reduces noisy parameters in the subsequent parameter-aggregation step.
In a traditional federated learning algorithm, a device runs inference on its local data with its local model to obtain predictions, computes a loss between the predictions and the ground truth with a conventional loss function (such as cross-entropy), and finally updates the model parameters according to the resulting loss. In the system of the invention, however, the local models of the devices are heterogeneous, so the parameter distributions obtained by their respective local training may diverge. To make the models of different devices end local training with parameter distributions that are as similar as possible, a constraint must be imposed on the parameter update during training. Specifically, an L2 regularization loss is added to the original cross-entropy loss to measure the difference in parameter distribution between the local model and the cloud model. When device i receives the sub-branch model k, it initializes its local model before local training as
W_c^i ← W_s^k
Assuming that the sub-branch model k shares H network layers with the cloud main-branch model, the L2 regularization loss of the local training of device i can be expressed as (l_h denotes the index of the h-th shared layer):
L_L2^i = Σ_{h=1}^{H} || W_c^i[l_h] - W_s^k[l_h] ||_2^2    (Equation 3)
Combining this with the loss function on the local data set gives the total loss function of device i, shown in Equation 4, where η is a hyper-parameter that weighs the L2 regularization loss against the overall loss:
L_i = L_CE(W_c^i; D_i) + η · L_L2^i    (Equation 4)
Finally, the local model parameters of device i are updated according to Equation 5 (lr denotes the learning rate):
W_c^i ← W_c^i - lr · ∇_{W_c^i} L_i    (Equation 5)
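A minimal sketch of one such local update, under the same illustrative assumptions as the earlier snippet (invented helper names, PyTorch-style), might look as follows; the frozen copy of the cloud shared-layer parameters plays the role of W_s^k in Equations 3 and 4:

    import torch
    import torch.nn.functional as F

    def local_train_step(local_model, cloud_shared_params, batch, optimizer, eta):
        """One local step on device i: cross-entropy on the private batch plus an
        L2 penalty keeping the H shared layers close to the cloud parameters
        received at the start of the round (Eqs. 3-5).
        cloud_shared_params: dict mapping shared-layer parameter names to frozen tensors."""
        x, y = batch
        optimizer.zero_grad()
        loss = F.cross_entropy(local_model(x), y)           # loss on local private data
        l2_reg = sum(torch.sum((p - cloud_shared_params[n]) ** 2)
                     for n, p in local_model.named_parameters()
                     if n in cloud_shared_params)           # only the shared layers
        loss = loss + eta * l2_reg                          # Eq. 4
        loss.backward()
        optimizer.step()                                    # Eq. 5
        return loss.item()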
step 4, aggregation of the cloud multi-branch lifting network model: after local training is completed, each participant device uploads parameters of the respective model to the cloud server, and weighted average is performed on the device model and the parameters of the same part of the plurality of global branch models, so that the global branch models of the cloud are aggregated and updated.
Since the models uploaded by the devices belong to different sub-branches and those sub-branch models are heterogeneous, they cannot be aggregated directly with the traditional federated learning method. Considering that the device models, although heterogeneous, are all parts of the global multi-branch model, a multi-branch joint-averaging aggregation method is proposed whose core is to average the shared-layer parameters across the device models. The specific procedure is as follows:
When the cloud server has received the model parameter sets {W_c^i} uploaded by all participant devices, multi-branch aggregation is carried out. First, the dictionary (set of named parameter layers) W_S of the global multi-branch model is obtained; if the model has K parameter layers, the index of the k-th layer is l_k and the corresponding layer parameters are denoted W_S[l_k]. Then all parameter layers of W_S are traversed: for each layer k, the layer parameters W_S[l_k] are first initialized to zero, the number of uploaded device models {W_c^i} that contain the index l_k is counted and recorded as Count[l_k], and for every device model W_c^i that contains the layer with index l_k, the parameters of its corresponding layer are added to the global model, i.e.
W_S[l_k] ← W_S[l_k] + W_c^i[l_k] / Count[l_k]
After every layer of the global multi-branch model has been traversed and accumulated in this way, the new global model parameters W_S are obtained and output.
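The following sketch shows how this multi-branch joint averaging could operate on PyTorch state dictionaries. It is an illustration under assumed names; as a small practical deviation from the literal zero-initialization above, layers that appear in no uploaded model keep their previous cloud values instead of being zeroed out.

    import torch

    def aggregate_multibranch(global_state, device_states):
        """Average every parameter layer l_k of the global multi-branch model over
        the Count[l_k] uploaded device models that contain it (step 4)."""
        new_state = {}
        for layer_name, old_param in global_state.items():
            contributions = [ds[layer_name] for ds in device_states if layer_name in ds]
            if contributions:                               # Count[l_k] > 0
                new_state[layer_name] = torch.stack(contributions).mean(dim=0)
            else:                                           # no device trained this layer this round
                new_state[layer_name] = old_param.clone()
        return new_state

    # Usage sketch:
    # new_global = aggregate_multibranch(global_model.state_dict(),
    #                                    [m.state_dict() for m in device_models])
    # global_model.load_state_dict(new_global)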
Step 5, distillation training of the cloud multi-branch neural network model: because the structures of the different branch models are inconsistent, the parameter distributions of the device models are also inconsistent, so parameter aggregation inevitably introduces parameter noise into the global model and causes its performance to fluctuate. Knowledge distillation is therefore performed on the aggregated multi-branch network model, based on the cloud public data set and the model gradients uploaded by the devices, so as to restore and improve the prediction accuracy of the multi-branch model. After distillation training, the model parameters of each branch of the multi-branch model are sent to the corresponding devices for the next round of local training.
Specifically, after the new global multi-branch model has been generated by cloud aggregation, distillation training is carried out on the cloud public data set D_0 using the model parameters of each device. As before, the loss function of the distillation training has two parts: the first is the cross-entropy loss against the real labels, as in Equation 1; the second is the KL-divergence loss against the soft labels output by the device models, as in Equation 6, where p_i denotes the soft-label distribution produced by device model i on the public data and p_S^k the output distribution of the corresponding global branch k:
L_KL = KL( p_i || p_S^k )    (Equation 6)
The total loss of the distillation training is therefore given by Equation 7, where the hyper-parameter α sets the weight ratio between the cross-entropy loss and the KL-divergence loss:
L_distill = α · L_CE + (1 - α) · L_KL    (Equation 7)
The global branch model optimized in the distillation training is then obtained from Equation 8 (lr denotes the learning rate):
W_S ← W_S - lr · ∇_{W_S} L_distill    (Equation 8)
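A minimal sketch of one such distillation step for a single branch is given below, under the same illustrative assumptions as the earlier snippets; in particular, the alpha / (1 - alpha) weighting and the absence of temperature scaling follow the reconstruction of Equation 7 above and are assumptions rather than specifics from the patent.

    import torch
    import torch.nn.functional as F

    def distill_step(global_model, branch_idx, device_logits, batch, optimizer, alpha):
        """One cloud-side distillation step for branch `branch_idx` on a public-data
        batch: cross-entropy against the true labels plus KL divergence between the
        branch output and the soft labels produced by the matched device model
        (Eqs. 6-8)."""
        x, y = batch
        optimizer.zero_grad()
        student_logits = global_model(x)[branch_idx]
        ce = F.cross_entropy(student_logits, y)
        # F.kl_div expects log-probabilities first, target probabilities second,
        # i.e. this computes KL(device soft labels || global branch prediction).
        kl = F.kl_div(F.log_softmax(student_logits, dim=1),
                      F.softmax(device_logits, dim=1),
                      reduction="batchmean")
        loss = alpha * ce + (1.0 - alpha) * kl              # Eq. 7
        loss.backward()
        optimizer.step()                                    # Eq. 8
        return loss.item()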
and after the distillation training is finished, model parameters of each branch in the global multi-branch model are sent to corresponding equipment for the next round of local training.
Step 6, application of the cloud multi-branch neural network model: steps S3 to S5 are repeated until a preset number of training rounds is reached, and the final global multi-branch neural network model is obtained. When devices need to be deployed or newly added for an application, sub-branch models with different inference accuracy can be matched to the resource limitations and requirements of the different devices, giving flexibility and diversity in model deployment. In addition, the multi-branch neural network model also supports device-edge collaborative inference: for example, when the computing resources of the device side are insufficient or low latency is required, the device side may process only the computation of the shared-parameter part of its local model, send the intermediate result to the cloud global network for computation of the remaining part, and finally receive the cloud result back at the edge device, thereby achieving accelerated inference through device-edge cooperation.
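To make the collaborative-inference idea concrete, the sketch below splits one forward pass between the device and the cloud, reusing the illustrative MultiBranchNet layout from the first snippet; all attribute and function names are assumptions for the example, not the patent's own interfaces.

    import torch

    def device_partial_inference(local_model, x, num_shared_layers):
        """Device side: run only the shared hidden layers of the local single-branch
        model and return the intermediate feature to be sent to the cloud."""
        h = x
        for layer in local_model.shared[:num_shared_layers]:
            h = torch.relu(layer(h))
        return h

    def cloud_finish_inference(global_model, h, branch_idx, num_shared_layers):
        """Cloud side: continue from the received intermediate feature through the
        remaining layers of the requested branch and return the prediction,
        which is then sent back to the edge device."""
        for layer in global_model.shared[num_shared_layers:branch_idx + 1]:
            h = torch.relu(layer(h))
        logits = global_model.heads[branch_idx](h)
        return logits.argmax(dim=1)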
Example 2
The present embodiment provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the heterogeneous federated learning training method of embodiment 1.
Example 3
The present embodiment provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the heterogeneous federated learning training method of embodiment 1.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A heterogeneous federated learning training method based on a multi-branch neural network model is characterized by comprising the following steps:
S1, initialization training of the cloud multi-branch model: the cloud server holds a global model with a plurality of branches and a public data set D_0; the branch models each retain their own output layer while sharing part of the common hidden layers; the global multi-branch model is first pre-trained on the cloud public data set D_0 for initialization;
S2, matching of the sub-branch neural network models: before federated learning starts, every device requesting to participate reports its available computing and storage information to the cloud server; the cloud server then determines from the collected device information the branch model best suited to each device and distributes the matched single-branch model to the corresponding device;
S3, local training of the device models: after each participant device receives the single-branch model issued by the cloud, it adopts the received model as its local model and then trains the current device model on its local private data set;
S4, aggregation of the cloud multi-branch neural network model: after local training, each participant device uploads its model parameters to the cloud server, and a weighted average is taken over the parameters that the device models share with the corresponding layers of the global branch models, so that the cloud global multi-branch model is aggregated and updated;
S5, distillation training of the cloud multi-branch neural network model: knowledge distillation is performed on the aggregated multi-branch network model using the cloud public data set and the model gradients uploaded by the devices; after distillation training, the model parameters of each branch of the multi-branch model are issued to the corresponding devices for the next round of local training;
S6, application of the cloud multi-branch neural network model: steps S3 to S5 are repeated until a preset number of training rounds is reached, and the final global multi-branch neural network model is obtained.
2. The method of claim 1, wherein the global multi-branch model has overall parameters W_s, the parameters of each sub-branch model are denoted W_s^k, k = 1, ..., K, and on the device side there are N participating devices, each device i holding a private local data set D_i and a local model W_c^i;
in said step S1, the cross-entropy loss between the output of each branch model and the ground truth is computed, and the weighted average of the per-branch loss values (with branch weights β_k, Σ_k β_k = 1) is taken as the total loss:
L_total = Σ_{k=1}^{K} β_k · L_CE(W_s^k; D_0)    (Equation 1)
finally, the parameters are updated by back-propagation on the total loss until the model converges (lr denotes the learning rate):
W_s ← W_s - lr · ∇_{W_s} L_total    (Equation 2)
3. The method for heterogeneous federated learning training based on a multi-branch neural network model as claimed in claim 2, wherein in step S2, when a device i reports its maximum supportable parameter count P_i, it is matched with the largest sub-branch model k that satisfies P_i ≥ P_k, and the selected sub-branch model k is then issued to device i, where P_k is the parameter count of sub-branch model k.
4. The method of claim 2, wherein in step S3 a constraint is imposed on the difference between the parameters of the current local model and the original cloud model, so that under the constraint of the cloud model the shared hidden layers of the device models obtain parameter distributions that are as similar as possible during training.
5. The method according to claim 4, wherein imposing a constraint on the difference between the parameters of the current local model and the original cloud model specifically comprises: adding an L2 regularization loss to the original cross-entropy loss function to measure the difference in parameter distribution between the local model and the cloud model; when device i receives the sub-branch model k, it initializes its local model before local training as
W_c^i ← W_s^k
assuming that the sub-branch model k shares H network layers with the cloud main-branch model, the L2 regularization loss of the local training of device i is expressed as (l_h denotes the index of the h-th shared layer):
L_L2^i = Σ_{h=1}^{H} || W_c^i[l_h] - W_s^k[l_h] ||_2^2    (Equation 3)
combining this with the loss function on the local data set gives the total loss function of device i:
L_i = L_CE(W_c^i; D_i) + η · L_L2^i    (Equation 4)
wherein η is a hyper-parameter that weighs the L2 regularization loss against the overall loss;
finally, the local model parameters of device i are updated according to the following formula (lr denotes the learning rate):
W_c^i ← W_c^i - lr · ∇_{W_c^i} L_i    (Equation 5)
6. The method for heterogeneous federated learning training based on a multi-branch neural network model as claimed in claim 5, wherein in step S4 the aggregation method of multi-branch joint averaging specifically comprises: when the cloud server has received the model parameter sets {W_c^i} uploaded by all participant devices, multi-branch aggregation is carried out; first, the dictionary (set of named parameter layers) W_S of the global multi-branch model is obtained; if the model has K parameter layers, the index of the k-th layer is l_k and the corresponding layer parameters are denoted W_S[l_k]; then all parameter layers of W_S are traversed: for each layer k, the layer parameters W_S[l_k] are first initialized to zero, the number of uploaded device models {W_c^i} that contain the index l_k is counted and recorded as Count[l_k], and for every device model W_c^i that contains the layer with index l_k, the parameters of its corresponding layer are added to the global model, i.e.
W_S[l_k] ← W_S[l_k] + W_c^i[l_k] / Count[l_k]
after every layer of the global multi-branch model has been traversed and accumulated, the new global model parameters W_S are obtained and output.
7. The method of claim 6, wherein the loss function in distillation training comprises: a cross-entropy loss against the real labels, as in Equation 1, and a KL-divergence loss against the soft labels output by the device models, as in Equation 6 below (p_i denotes the soft-label distribution produced by device model i on the public data set and p_S^k the output distribution of the corresponding global branch k):
L_KL = KL( p_i || p_S^k )    (Equation 6)
the total loss of the distillation training is given by Equation 7, where the hyper-parameter α sets the weight ratio between the cross-entropy loss and the KL-divergence loss:
L_distill = α · L_CE + (1 - α) · L_KL    (Equation 7)
the global branch model optimized in distillation training is obtained from Equation 8 (lr denotes the learning rate):
W_S ← W_S - lr · ∇_{W_S} L_distill    (Equation 8)
8. The method for heterogeneous federated learning training based on a multi-branch neural network model of claim 7, wherein in step S6 the output global multi-branch neural network model can be matched with an adapted branch model according to the resource limitations and accuracy requirements of different devices, so as to serve the applications of different types of devices; when the computing resources of the device side are insufficient or low latency is required, the device side processes only the computation of the shared-parameter part of its local model, then sends the intermediate result to the cloud global network for computation of the remaining part, and finally the cloud result is returned to the edge device.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the heterogeneous federated learning training method based on a multi-branch neural network model as claimed in any one of claims 1 to 8.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the heterogeneous federated learning training method based on a multi-branch neural network model as claimed in any one of claims 1 to 8.
Application CN202111575862.3A, priority date 2021-12-21, filing date 2021-12-21: Heterogeneous federated learning training method based on multi-branch neural network model. Status: Pending. Publication: CN114386570A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111575862.3A CN114386570A (en) 2021-12-21 2021-12-21 Heterogeneous federated learning training method based on multi-branch neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111575862.3A CN114386570A (en) 2021-12-21 2021-12-21 Heterogeneous federated learning training method based on multi-branch neural network model

Publications (1)

Publication Number Publication Date
CN114386570A true CN114386570A (en) 2022-04-22

Family

ID=81198395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111575862.3A Pending CN114386570A (en) 2021-12-21 2021-12-21 Heterogeneous federated learning training method based on multi-branch neural network model

Country Status (1)

Country Link
CN (1) CN114386570A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925829A (en) * 2022-07-18 2022-08-19 山东海量信息技术研究院 Neural network training method and device, electronic equipment and storage medium
WO2024017001A1 (en) * 2022-07-21 2024-01-25 华为技术有限公司 Model training method and communication apparatus
CN115034836A (en) * 2022-08-12 2022-09-09 腾讯科技(深圳)有限公司 Model training method and related device
CN115034836B (en) * 2022-08-12 2023-09-22 腾讯科技(深圳)有限公司 Model training method and related device
CN116614484A (en) * 2023-07-19 2023-08-18 北京邮电大学 Heterogeneous data federal learning method based on structure enhancement and related equipment
CN116614484B (en) * 2023-07-19 2023-11-10 北京邮电大学 Heterogeneous data federal learning method based on structure enhancement and related equipment

Similar Documents

Publication Publication Date Title
CN114386570A (en) Heterogeneous federated learning training method based on multi-branch neural network model
Nie et al. Network traffic prediction based on deep belief network in wireless mesh backbone networks
CN110533183A (en) The model partition and task laying method of heterogeneous network perception in a kind of assembly line distribution deep learning
CN113010305B (en) Federal learning system deployed in edge computing network and learning method thereof
CN106815782A (en) A kind of real estate estimation method and system based on neutral net statistical models
WO2021036414A1 (en) Co-channel interference prediction method for satellite-to-ground downlink under low earth orbit satellite constellation
CN108122048B (en) Transportation path scheduling method and system
CN108111335A (en) A kind of method and system dispatched and link virtual network function
CN113992676A (en) Incentive method and system for layered federal learning under terminal edge cloud architecture and complete information
CN113708969B (en) Collaborative embedding method of cloud data center virtual network based on deep reinforcement learning
CN110134507B (en) A kind of cooperative computing method under edge calculations system
CN110119399B (en) Business process optimization method based on machine learning
CN104021315A (en) Method for calculating station service power consumption rate of power station on basis of BP neutral network
CN109034232A (en) The automation output system and control method of urban planning condition verification achievement Report
JP2020077090A (en) Decentralized processing system and decentralized processing method
CN115001978B (en) Cloud tenant virtual network intelligent mapping method based on reinforcement learning model
CN116502709A (en) Heterogeneous federal learning method and device
CN106789163A (en) A kind of network equipment power information monitoring method, device and system
CN111292062A (en) Crowdsourcing garbage worker detection method and system based on network embedding and storage medium
CN114997422B (en) Grouping type federal learning method of heterogeneous communication network
CN116362327A (en) Model training method and system and electronic equipment
CN116302481A (en) Resource allocation method and system based on sparse knowledge graph link prediction
Sen et al. A Data and Model Parallelism based Distributed Deep Learning System in a Network of Edge Devices
CN114816755A (en) Scheduling method, scheduling device, processing core, electronic device and readable medium
CN114599043A (en) Air-space-ground integrated network resource allocation method based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination